This is a DBOS library for working with Amazon Web Services Simple Storage Service (S3).
In order to store and retrieve objects in S3, it is necessary to:
- Register with AWS, create an S3 bucket, and create access credentials. (See Getting started with Amazon S3 in AWS documentation.)
- If browser-based clients will read from S3 directly, set up S3 CORS. (See Using cross-origin resource sharing (CORS) in AWS documentation.)
First, ensure that the DBOS S3 component is installed into the application:
```shell
npm install --save @dbos-inc/component-aws-s3
```
Second, ensure that the library class is imported and exported from an application entrypoint source file:
```typescript
import { S3Ops } from "@dbos-inc/component-aws-s3";
export { S3Ops };
```
Third, place appropriate configuration into the `dbos-config.yaml` file; the following example will pull the AWS information from the environment:
```yaml
application:
  aws_s3_configuration: aws_config # Optional if the section is called `aws_config`
  aws_config:
    aws_region: ${AWS_REGION}
    aws_access_key_id: ${AWS_ACCESS_KEY_ID}
    aws_secret_access_key: ${AWS_SECRET_ACCESS_KEY}
```
If a different configuration file section should be used for S3, the `aws_s3_configuration` setting can be changed to name that section. If multiple configurations are to be used, the application code must name and configure each of them (see the second-instance example below).
The application will need at least one S3 bucket. The bucket name can also be placed in the `application` section of `dbos-config.yaml`, under a key chosen by the application.
`S3Ops` is a configured class. The AWS configuration (or config file key name) and bucket name must be provided when a class instance is created, for example:
```typescript
const defaultS3 = configureInstance(S3Ops, 'myS3Bucket', {awscfgname: 'aws_config', bucket: 'my-s3-bucket', ...});
```
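If multiple buckets or AWS configurations are in use, additional instances can be configured in the same way. A minimal sketch (the instance name, config section name, and bucket name below are hypothetical):
```typescript
import { configureInstance } from "@dbos-inc/dbos-sdk";
import { S3Ops } from "@dbos-inc/component-aws-s3";

// Hypothetical second instance, pointing at a different bucket and configuration section
const archiveS3 = configureInstance(S3Ops, 'archiveS3Bucket', {awscfgname: 'aws_config_archive', bucket: 'my-archive-bucket'});
```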
The `S3Ops` class provides several communicator wrappers for S3 functions.
A string can be written to an S3 key with the following:
```typescript
const putres = await ctx.invoke(defaultS3).put('/my/test/key', "Test string from DBOS");
```
Note that the arguments to `put` will be logged to the database. Consider having the client upload to S3 with a presigned post if the data is generated outside of DBOS or if the data is larger than a few megabytes.
A string can be read from an S3 key with the following:
```typescript
const getres = await ctx.invoke(defaultS3).get('/my/test/key');
```
Note that the return value from `get` will be logged to the database. Consider reading directly from S3 with a signed URL if the data is large.
An S3 key can be removed/deleted with the following:
```typescript
const delres = await ctx.invoke(defaultS3).delete('/my/test/key');
```
As shown below, DBOS provides a convenient and powerful way to track and manage S3 objects. However, it is often not convenient to send or retrieve large S3 object contents through DBOS; it would be preferable to have the client talk directly to S3. S3 accommodates this use case very well, using a feature called "presigned URLs".
In these cases, the client can place a request to DBOS that produces a presigned GET / POST URL, which the client can use for a limited time and purpose for S3 access.
A presigned GET URL can be created for an S3 key with the following:
```typescript
const geturl = await ctx.invoke(defaultS3).presignedGetURL('/my/test/key', 30 /*expiration, in seconds*/);
```
The resulting URL string can be used in the same way as any other URL for placing HTTP GET requests.
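For example, a client could download the object contents with any HTTP library. A minimal sketch using Axios (the `downloadFromS3` helper name is illustrative):
```typescript
import axios from "axios";

// Download the object contents via a presigned GET URL obtained as shown above
async function downloadFromS3(presignedGetUrl: string): Promise<Buffer> {
  const response = await axios.get(presignedGetUrl, { responseType: 'arraybuffer' });
  return Buffer.from(response.data);
}
```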
It may be desired to return the `Content-Type`, `Content-Disposition`, or other headers to the client that ultimately makes the S3 GET request. This can be controlled by providing response options.
```typescript
// Let the client know it is downloading a .zip file
const geturl = await ctx.invoke(defaultS3).presignedGetURL('/my/test/key', 30 /*expiration, in seconds*/, {ResponseContentType: 'application/zip', ResponseContentDisposition: `attachment; filename="file.zip"`});
```
A presigned POST URL can be created for an S3 key with the following:
```typescript
const presignedPost = await ctx.invoke(defaultS3).createPresignedPost(
  '/my/test/key', 30 /*expiration*/, {contentType: 'text/plain'} /*size/content restrictions*/);
```
The resulting `PresignedPost` object is slightly more involved than a regular URL, as it contains not just the URL to be used, but also the required form fields that must accompany the POST request. An example of using the `PresignedPost` in TypeScript with Axios is:
```typescript
import axios from "axios";
import FormData from "form-data";
import fs from "fs";
import { PresignedPost } from "@aws-sdk/s3-presigned-post"; // type describing the presigned POST URL and fields

async function uploadToS3(presignedPostData: PresignedPost, filePath: string) {
  const formData = new FormData();
  // Append all the fields from the presigned post data
  Object.keys(presignedPostData.fields).forEach(key => {
    formData.append(key, presignedPostData.fields[key]);
  });
  // Append the file you want to upload
  const fileStream = fs.createReadStream(filePath);
  formData.append('file', fileStream);
  // POST the form (fields and file) to the presigned URL
  return await axios.post(presignedPostData.url, formData, { headers: formData.getHeaders() });
}
```
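For instance, the helper above could be called with the `presignedPost` obtained earlier (the local file path is illustrative):
```typescript
await uploadToS3(presignedPost, '/path/to/local/file.txt');
```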
In many cases, an application wants to keep track of objects that have been stored in S3. S3 is, as the name implies, simple storage, and it doesn't track file attributes, permissions, fine-grained ownership, dependencies, etc.
Keeping an indexed set of file metadata records, including referential links to their owners, is a "database problem". And, while keeping the database in sync with the contents of S3 sounds like it may be tricky, DBOS Transact Workflows provide the perfect tool for accomplishing this, even in the face of client or server failures.
The `S3Ops` class provides workflows that can be used to ensure that a table of file records is maintained for an S3 bucket. This table can have any schema suitable to the application (an example table schema can be found in `s3_utils.test.ts`), because the application provides the code to maintain it as a set of callback functions that will be triggered from the workflow.
The interface for the workflow functions (described below) allows for the following callbacks:
```typescript
export interface FileRecord {
  key: string; // AWS S3 Key
}

export interface S3Config {
  s3Callbacks?: {
    /* Called when a new active file is added to S3 */
    newActiveFile: (ctx: WorkflowContext, rec: FileRecord) => Promise<unknown>;
    /* Called when a new pending (still undergoing workflow processing) file is added to S3 */
    newPendingFile: (ctx: WorkflowContext, rec: FileRecord) => Promise<unknown>;
    /* Called when pending file becomes active */
    fileActivated: (ctx: WorkflowContext, rec: FileRecord) => Promise<unknown>;
    /* Called when a file is deleted from S3 */
    fileDeleted: (ctx: WorkflowContext, rec: FileRecord) => Promise<unknown>;
  },
  //... remainder of S3 Config
}
```
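As a sketch of how these callbacks might be wired up, the example below assumes a hypothetical application-provided `FileStatusOps` class (not shown) whose transactions maintain the file-records table; only the `s3Callbacks` wiring itself comes from the interface above.
```typescript
import { WorkflowContext, configureInstance } from "@dbos-inc/dbos-sdk";
import { S3Ops, FileRecord } from "@dbos-inc/component-aws-s3";

// FileStatusOps and its transactions (insertFileRecord, markFileActive, deleteFileRecord)
// are hypothetical application code, not part of the library.
const trackedS3 = configureInstance(S3Ops, 'trackedS3Bucket', {
  awscfgname: 'aws_config',
  bucket: 'my-s3-bucket',
  s3Callbacks: {
    newActiveFile: async (ctx: WorkflowContext, rec: FileRecord) => {
      return ctx.invoke(FileStatusOps).insertFileRecord(rec, 'ACTIVE');
    },
    newPendingFile: async (ctx: WorkflowContext, rec: FileRecord) => {
      return ctx.invoke(FileStatusOps).insertFileRecord(rec, 'PENDING');
    },
    fileActivated: async (ctx: WorkflowContext, rec: FileRecord) => {
      return ctx.invoke(FileStatusOps).markFileActive(rec);
    },
    fileDeleted: async (ctx: WorkflowContext, rec: FileRecord) => {
      return ctx.invoke(FileStatusOps).deleteFileRecord(rec);
    },
  },
});
```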
The `saveStringToFile` workflow stores a string to an S3 key, and runs the callback function to update the database. If anything goes wrong during the workflow, S3 will be cleaned up and the database will be unchanged by the workflow.
```typescript
await ctx.invokeWorkflow(defaultS3).saveStringToFile(fileDBRecord, 'This is my file');
```
This workflow performs the following actions:
- Puts the string in S3
- If there is difficulty with S3, ensures that no entry is left there and throws an error
- Invokes the callback for a new active file record
Note that the arguments to `saveStringToFile` will be logged to the database. Consider having the client upload to S3 with a presigned post if the data is larger than a few megabytes.
The workflow function `readStringFromFile(ctx: WorkflowContext, fileDetails: FileRecord)` will retrieve the contents of an S3 object as a `string`.
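Assuming a `fileDBRecord` like the one used above, a brief usage sketch:
```typescript
const contents: string = await ctx.invokeWorkflow(defaultS3).readStringFromFile(fileDBRecord);
```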
This workflow currently performs no additional operations outside of a call to `get(fileDetails.key)`.
Note that the return value from `readStringFromFile` will be logged to the database. Consider having the client read from S3 with a signed URL if the data is larger than a few megabytes.
The `deleteFile` workflow function removes a file from both S3 and the database.
```typescript
await ctx.invokeWorkflow(defaultS3).deleteFile(fileDBRecord);
```
This workflow performs the following actions:
- Invokes the callback for a deleted file record
- Removes the key from S3
The workflow to allow application clients to upload to S3 directly is more involved than other workflows, as it runs concurrently with the client. The workflow interaction generally proceeds as follows:
- The client makes some request to a DBOS handler.
- The handler decides that the client will be uploading a file, and starts the upload workflow.
- The upload workflow sends the handler a presigned post, which the handler returns to the client. The workflow continues in the background, waiting to hear that the client upload is complete.
- The client (ideally) uploads the file using the presigned post and notifies the application; if not, the workflow times out and cleans up. (The workflow timeout should be set to occur after the presigned post expires.)
- If successful, the database is updated to reflect the new file, otherwise S3 is cleaned of any partial work.
The workflow can be used in the following way:
```typescript
// Start the workflow
const wfHandle = await ctx.startWorkflow(defaultS3)
  .writeFileViaURL(fileDBRec, 60 /*expiration*/, {contentType: 'text/plain'} /*content restrictions*/);
// Get the presigned post from the workflow
const ppost = await ctx.getEvent<PresignedPost>(wfHandle.getWorkflowUUID(), "uploadkey");
// Return the ppost object to the client for use as in 'Presigned POSTs' section above
// The workflow UUID should also be known in some way
```
Upon a completion call from the client, the following should be performed to notify the workflow to proceed:
```typescript
// Look up wfHandle by the workflow UUID
const wfHandle = ctx.retrieveWorkflow(uuid);
// Notify workflow - truthy means success, any falsy value indicates failure / cancel
await ctx.send<boolean>(wfHandle.getWorkflowUUID(), true, "uploadfinish");
// Optionally, await completion of the workflow; this ensures that the database record is written,
//  or will throw an error if anything went wrong in the workflow
await wfHandle.getResult();
```
The workflow function `getFileReadURL(ctx: WorkflowContext, fileDetails: FileRecord, expiration: number, options: S3GetResponseOptions)` returns a signed URL for retrieving object contents from S3, valid for `expiration` seconds, and using any provided response options.
This workflow currently performs no additional operations outside of a call to `presignedGetURL(fileDetails.key, expiration, options)`.
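A brief usage sketch, following the same invocation pattern as the other workflows (the expiration and response options shown are illustrative):
```typescript
const fileUrl = await ctx.invokeWorkflow(defaultS3).getFileReadURL(fileDBRecord, 30 /*expiration, in seconds*/, {ResponseContentType: 'text/plain'});
```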
Do not reuse S3 keys. Assigning unique identifiers to files is a much better idea (see the sketch after this list); if a "name" is to be reused, it can be reused in the lookup database. Reasons why S3 keys should not be reused:
- S3 caches the key contents. Even a response of "this key doesn't exist" can be cached. If you reuse keys, you may get a stale value.
- Workflow operations against an old use of a key may still be in progress... for example, a delete workflow may still be attempting to delete the old object at the same time a new file is being placed under the same key.
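As a minimal sketch of unique key assignment, using Node's built-in `crypto.randomUUID` (the `uploads/` prefix is illustrative):
```typescript
import { randomUUID } from "node:crypto";

// Generate a unique S3 key per upload; the human-readable name belongs in the lookup database record
const fileKey = `uploads/${randomUUID()}`;
```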
DBOS Transact logs function parameters and return values to the system database. Some functions above treat the data contents of the S3 object as a parameter or return value, and are therefore only suitable for small data items (kilobytes, maybe megabytes, not gigabytes). For large data, use workflows where the data is sent directly to and from S3 using presigned URLs.
The `s3_utils.test.ts` file included in the source repository can be used to upload and download files to/from S3 using various approaches. Before running, set the following environment variables:
- `S3_BUCKET`: The S3 bucket for setting / retrieving test objects
- `AWS_REGION`: The AWS region to use
- `AWS_ACCESS_KEY_ID`: The access key with permission to use the S3 service
- `AWS_SECRET_ACCESS_KEY`: The secret access key corresponding to `AWS_ACCESS_KEY_ID`
- For a detailed DBOS Transact tutorial, check out our programming quickstart.
- To learn how to deploy your application to DBOS Cloud, visit our cloud quickstart.
- To learn more about DBOS, take a look at our documentation or our source code.