Streams files from a Google Drive folder
This library streams files from a Google Drive folder and optionally applies conversions to them.
I wrote this library to codify my solution to a painful problem: 403 errors during larger folder downloads. The bare 403 status code was not enough to pinpoint the issue, so it was puzzling at first, but I solved it by streaming the files serially. The original problem was most likely that I was making too many requests in a very short period, so the Google Drive service rejected them with a 403 by policy.
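The fix described above, serial streaming, can be sketched with Node's `stream.Readable.from` and an async generator. This is an illustration of the technique under stated assumptions, not this library's source: `streamSerially` and the `download` function it receives are hypothetical, with `download` standing in for whatever fetches one file. Because the generator awaits each download before moving to the next file, at most one request is in flight at a time.

```javascript
import { Readable } from 'node:stream';

/**
 * Sketch: stream download results one file at a time (serially).
 * `download` is a hypothetical async (file) => data function supplied by the caller;
 * it is not part of @localnerve/google-drive-folder.
 */
function streamSerially (files, download) {
  async function * generate () {
    for (const file of files) {
      // Wait for this download to finish before starting the next one,
      // so the Drive service never sees a burst of parallel requests.
      yield await download(file);
    }
  }
  // Readable.from wraps the async iterable in an object-mode stream.
  return Readable.from(generate());
}
```

By contrast, something like `Promise.all(files.map(download))` starts every request at once, which is the burst pattern that presumably triggered the 403s.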
This library uses the Google APIs Node.js client and therefore requires credentials. By default, it relies on the environment variable SVC_ACCT_CREDENTIALS to point to a valid Google service account credential file. That credential must also have permission to impersonate the user email passed in as userId for the folder folderId.
If you have different requirements (OAuth2, JWT, etc.), you can instead supply your own pre-resolved auth reference by passing it as the auth option. All options are detailed below.
By default, no conversion occurs: data is passed through as a Buffer (if binary) or a utf-8 encoded string. To change this, supply a transformer function, plus an optional exportMimeMap if the source file on Google Drive is a Google Workspace file type (Docs, Spreadsheet, Presentation, Drawing, etc.). Both are supplied in the Options object.
The library exports a single function that returns a Promise that resolves to a Stream.
Here's the signature:
Promise<Stream> GoogleDriveFolder (folderId, userId, Options)
A quick overview of all the inputs and their types. All options are optional; a detailed explanation of each input follows below.
folderId: String,
userId: String,
[Options: Object]
  outputDirectory: String,
  scopes: Array<String>,
  fileQuery: String,
  auth: GoogleAuth | OAuth2Client | JWT | String,
  exportMimeMap: Object<key:String, value:String>,
  transformer: Function
The returned Promise resolves to a Stream in object mode that delivers one data object per downloaded file as it arrives.
The data objects you receive on data events have this format:
{
input: {
data, // Buffer | String, data from Google Drive, Buffer if binary, 'utf-8' String otherwise
name, // String of the file name
ext, // String of the file extension
binary, // Boolean, true if binary data content
downloadMeta // Object: { method: 'get' | 'export', parameters: { mimeType } for export, { alt } for get }
},
// If converted is true, the converted data. Otherwise, a reference to input
output: {
data, // String (utf-8) or Buffer if input was binary
name, // String of the file name
ext // String of the file extension
},
converted // Boolean, true if conversion occurred, false otherwise
}
The input is data as downloaded from Google Drive.
The output is data as converted by a transformer.
If no conversion occurs (converted === false), output is a reference to input.
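Because every data object carries input, output, and converted, a consumer can read output for every file whether or not it was converted. The sketch below assumes the returned Stream is a Node.js Readable in object mode, which supports `for await...of`; `summarize` is a hypothetical helper for illustration, not part of the library.

```javascript
/**
 * Hypothetical helper: collect one summary entry per data object from an
 * object-mode stream (or any async iterable) of stream data format objects.
 */
async function summarize (stream) {
  const summary = [];
  for await (const { input, output, converted } of stream) {
    summary.push({
      source: input.name,
      result: output.name,
      converted,
      // Per the stream data format, output === input when no conversion occurred.
      passedThrough: output === input
    });
  }
  return summary;
}
```

Usage would look like `const summary = await summarize(await googleDriveFolder(folderId, userId));`.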
The SVC_ACCT_CREDENTIALS environment variable must point to a valid Google Service Account credential file UNLESS auth is supplied using the auth option.
- folderId {String} - Uniquely identifies your folder in the Google Drive service. Found in the folder's URL on the web.
- userId {String} - The email address of the folder owner that SVC_ACCT_CREDENTIALS will impersonate. If you supply the auth option, this parameter is ignored.
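The folderId is the last path segment of a folder's web URL, for example https://drive.google.com/drive/folders/<folderId>. A small sketch that extracts it, assuming URLs of that shape (optionally with a /u/<n>/ account segment or a query string); `folderIdFromUrl` is hypothetical and not part of this library.

```javascript
/**
 * Hypothetical helper: extract a Google Drive folder id from a folder URL such as
 * https://drive.google.com/drive/folders/<folderId>?usp=sharing
 */
function folderIdFromUrl (url) {
  // URL.pathname excludes the query string and hash.
  const match = /\/folders\/([A-Za-z0-9_-]+)/.exec(new URL(url).pathname);
  if (!match) {
    throw new Error(`No folder id found in '${url}'`);
  }
  return match[1];
}
```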
All options are optional.
- [Options] {Object} - The general options object.
- [Options.scopes] {Array} - Scopes to use for auth, for special cases that need something beyond the default. Defaults to the drive.readonly scope.
- [Options.fileQuery] {String} - A file query string used to filter the files to download by specific characteristics. Defaults to downloading all files in the folderId that are NOT deleted (trashed = false). See the Google Drive file reference search terms.
- [Options.auth] {GoogleAuth | OAuth2Client | JWT | String} - Given directly to the auth option of the Google Drive Node.js client. Use this option to override the default behavior of using the SVC_ACCT_CREDENTIALS environment variable as the path to service account credentials. Supplying it also causes the userId parameter to be ignored.
- [Options.outputDirectory] {String} - Absolute path to the output directory. Defaults to falsy (no files are written). If supplied, files are written out as data arrives. The library does not touch the directory other than to write files, and the directory must already exist.
- [Options.exportMimeMap] {Object} - A name-value map from Google Workspace mime-types to the mime-type the Google Drive service should convert each file to before the data is sent to the transformer function. If this option is supplied, the Google Drive Files 'export' method is used, so the types are presumed to conform to the service capabilities outlined in the Google Export Reference. For details on Google Workspace mime-types, see Google Workspace MimeTypes. If this option is not supplied, the Google Drive Files 'get' method is used for download.
- [Options.transformer] {Function} - Transforms the input after download (or optional export conversion) and before it goes out to the stream (and to the optional outputDirectory, if supplied). Defaults to pass-through. Must return a Promise that resolves to an object that conforms to the stream data format.
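The exportMimeMap rule above (export when the map is supplied, get otherwise) can be pictured with the sketch below, which builds the downloadMeta object described in the transformer input format. It illustrates the documented behavior and is not the library's source; `downloadMetaFor` is hypothetical, and the spreadsheet, presentation, and drawing entries are common Drive export targets you should confirm against the Google Export Reference. How the library treats file types missing from the map is not documented here, so this sketch simply throws.

```javascript
// Example map: Google Workspace source type -> export format requested from Drive.
const exportMimeMap = {
  'application/vnd.google-apps.document': 'text/plain',
  'application/vnd.google-apps.spreadsheet': 'text/csv',
  'application/vnd.google-apps.presentation': 'application/pdf',
  'application/vnd.google-apps.drawing': 'image/png'
};

/**
 * Hypothetical sketch: with a map, use files.export and the mapped mimeType;
 * without one, use files.get with alt='media'.
 */
function downloadMetaFor (fileMimeType, map) {
  if (!map) {
    return { method: 'get', parameters: { alt: 'media' } };
  }
  const mimeType = map[fileMimeType];
  if (!mimeType) {
    // Assumption: unmapped types are treated as an error in this sketch.
    throw new Error(`No export mimeType mapped for '${fileMimeType}'`);
  }
  return { method: 'export', parameters: { mimeType } };
}
```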
A transformer function receives input from the download and returns a Promise that resolves to an object in the stream data format.
The single object a supplied transformer function receives from the download has the following format:
{
name: String,
ext: String,
data: <Buffer | String>, // Buffer if binary, 'utf-8' String otherwise
binary: Boolean,
downloadMeta: {
method: String, // 'get' or 'export'
parameters: Object // parameters used in the download (mimeType from exportMimeMap, or alt='media')
}
}
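A transformer only has to resolve to an object in the stream data format. As a minimal sketch (hypothetical, not shipped with the library), this one upper-cases text content and passes binary files through unconverted, so output is a reference to input as the stream data format describes:

```javascript
/**
 * Hypothetical transformer: upper-cases utf-8 text, passes binary data through.
 * Declared async so it always returns a Promise, as a transformer must.
 */
async function upperCaseTransformer (input) {
  if (input.binary) {
    // No conversion: output is the same object as input, and converted is false.
    return { input, output: input, converted: false };
  }
  return {
    input,
    output: {
      name: input.name,
      ext: input.ext,
      data: input.data.toUpperCase()
    },
    converted: true
  };
}
```

It would be passed as `transformer: upperCaseTransformer` in the Options object.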
This example downloads all files from the given Google Drive folder.
It presumes they are all Google Docs files containing markdown: it downloads them as text/plain, converts them to HTML, and outputs the result to the stream.
import googleDriveFolder from '@localnerve/google-drive-folder';
// Use remark for markdown to html conversion.
// ES module imports (not require), since this example also uses top-level await.
import { remark } from 'remark';
import remarkHtml from 'remark-html';
process.env.SVC_ACCT_CREDENTIALS = '/path/to/svcacctcredential.json';
const folderId = 'ThEfOlDeRiDyOuSeEiNyOuRbRoWsErOnGoOgLeDrIvE';
const userId = 'email-of-the-folder-owner@will-be-impersonated.by-svc-acct';
/**
* The transformer definition.
* Receives an input object from the Google Drive download, outputs a conversion object
* in the stream data format defined in this README.
*
* @param input {Object} The file input object
* @param input.name {String} The name of the file downloaded
* @param input.ext {String} The extension of the file download (if available)
* @param input.data {String} The utf-8 encoded string data from the 'text/plain' download.
* @param input.binary {Boolean} False in this case, true if content binary.
* @param input.downloadMeta {Object} download meta data
* @param input.downloadMeta.method {String} 'export' or 'get', 'export' in this case.
* @param input.downloadMeta.parameters {Object} the download parameters
* @param input.downloadMeta.parameters.mimeType {String} The mimeType used, 'text/plain' in this case.
* @param input.downloadMeta.parameters.alt {String} 'media' when the 'get' method is used, undefined in this case.
*/
function myTransformer (input) {
return new Promise((resolve, reject) => {
remark()
.use(remarkHtml)
.process(input.data, (err, res) => {
if (err) {
err.message = `Failed to convert '${input.name}': ${err.message}`;
return reject(err);
}
resolve({
input,
output: {
name: input.name,
ext: '.html',
data: String(res)
},
converted: true
});
});
});
}
// let's do this:
try {
// blocks until object flow begins
const stream = await googleDriveFolder(folderId, userId, {
exportMimeMap: {
'application/vnd.google-apps.document': 'text/plain'
},
transformer: myTransformer
});
stream.on('data', data => {
// data.input.data has markdown
// data.output.data has html
console.log(`Received converted data for '${data.output.name}'`, data);
});
stream.on('end', () => {
console.log('downloads are done, we got all the files.');
});
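stream.on('error', e => {
// Stream errors arrive asynchronously, so the surrounding try/catch cannot catch a rethrow here.
console.error(e);
});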
}
catch (e) {
console.error(e); // something went wrong
}
This simpler example downloads every non-trashed file in the folder as-is, with no conversion:
import googleDriveFolder from '@localnerve/google-drive-folder';
process.env.SVC_ACCT_CREDENTIALS = '/path/to/svcacctcredential.json';
const folderId = 'ThEfOlDeRiDyOuSeEiNyOuRbRoWsErOnGoOgLeDrIvE';
const userId = 'email-of-the-folder-owner@will-be-impersonated.by-svc-acct';
try {
// Blocks until object flow begins (while auth completes and the file list is downloaded)
const stream = await googleDriveFolder(folderId, userId);
stream.on('data', data => {
console.log(`Received a data object for '${data.input.name}'`, data);
});
stream.on('end', () => {
console.log('downloads are done, we got all the files');
});
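stream.on('error', e => {
// Stream errors arrive asynchronously, so the surrounding try/catch cannot catch a rethrow here.
console.error(e);
});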
} catch (e) {
console.error(e); // something went wrong
}
All the prerequisites and possible arguments as code (except for the auth option):
import googleDriveFolder from '@localnerve/google-drive-folder';
import myTransformer from './myTransformer'; // your transform function
process.env.SVC_ACCT_CREDENTIALS = '/path/to/svcacctcredential.json';
const folderId = 'ThEfOlDeRiDyOuSeEiNyOuRbRoWsErOnGoOgLeDrIvE';
const userId = 'email-of-the-folder-owner@will-be-impersonated.by-svc-acct';
// all optional, if outputDirectory omitted, returns ReadableStream
const options = {
outputDirectory: '/tmp/mydrivefolder/mustexist',
scopes: [
'special/google.auth/scope/you/might/need/other/than/drive.readonly',
'https://www.googleapis.com/auth/drive.readonly'
],
fileQuery: 'name contains ".md"', // download GoogleDocs markdown files
exportMimeMap: {
'application/vnd.google-apps.document': 'text/plain'
},
transformer: myTransformer
};
try {
// Blocks until object flow begins
const stream = await googleDriveFolder(folderId, userId, options);
stream.on('data', data => {
console.log(`Received a data object for '${data.input.name}'`, data);
});
stream.on('end', () => {
console.log('downloads are done, we got all the files');
});
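stream.on('error', e => {
// Stream errors arrive asynchronously, so the surrounding try/catch cannot catch a rethrow here.
console.error(e);
});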
} catch (e) {
console.error(e); // something went wrong
}
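Since outputDirectory must already exist, one approach is to create it before calling the library. A sketch using Node's `fs/promises` `mkdir` with `recursive: true`, which also succeeds if the directory is already there; `ensureOutputDirectory` is a hypothetical helper, and the path in the usage line is the placeholder from the example above.

```javascript
import { mkdir } from 'node:fs/promises';
import { isAbsolute } from 'node:path';

/**
 * Hypothetical helper: make sure an absolute output directory exists
 * before it is passed as Options.outputDirectory.
 */
async function ensureOutputDirectory (dir) {
  if (!isAbsolute(dir)) {
    // The library documents outputDirectory as an absolute path.
    throw new Error(`outputDirectory must be absolute: '${dir}'`);
  }
  // recursive: true creates missing parents and does not fail if dir already exists.
  await mkdir(dir, { recursive: true });
  return dir;
}
```

For example: `const outputDirectory = await ensureOutputDirectory('/tmp/mydrivefolder/mustexist');`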