s3-concat

Version 1.2.1

s3-concat is a library for concatenating multiple files stored in AWS S3 into a single file using multipart upload. This is particularly useful for handling large datasets and optimizing S3 operations. Files larger than 5MiB are uploaded using multipart upload, while files smaller than 5MiB are concatenated via streaming (5MiB is the minimum size S3 accepts for any part of a multipart upload except the last). The order in which the source files are concatenated can also be controlled.
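
As a rough illustration of the 5MiB rule, the sketch below partitions a list of files by size. It is not the library's internal code: `SourceFile`, `planParts`, and the strategy labels are hypothetical names, and it assumes a file of exactly 5MiB is treated as multipart-eligible, which the description above does not specify.

```typescript
// Hypothetical shape for a listed S3 object; not part of the s3-concat API.
interface SourceFile {
  key: string;
  size: number; // size in bytes
}

type Strategy = 'multipart' | 'stream';

const FIVE_MIB = 5 * 1024 * 1024;

// Label each file with the handling it would get under the 5MiB rule.
// Assumption: a file of exactly 5MiB counts as multipart-eligible.
function planParts(files: SourceFile[]): { key: string; strategy: Strategy }[] {
  return files.map((file) => {
    const strategy: Strategy = file.size >= FIVE_MIB ? 'multipart' : 'stream';
    return { key: file.key, strategy };
  });
}

const plan = planParts([
  { key: 'tmp/large.json', size: 12 * 1024 * 1024 }, // 12MiB
  { key: 'tmp/small.json', size: 1024 }, // 1KiB
]);
```

Here `plan` labels `tmp/large.json` as `multipart` and `tmp/small.json` as `stream`.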

Inspired by the s3-concat project on PyPI.

Installation

npm install s3-concat

Usage

Example

Example 1: Concatenating into a Single File

This example shows how to concatenate all files into a single file without using the minSize option.

import { S3Client } from '@aws-sdk/client-s3';
import { S3Concat } from 's3-concat';

const s3Client = new S3Client({});
const srcBucketName = process.env.srcBucketName!;
const dstBucketName = process.env.dstBucketName!;
const dstPrefix = 'output';

const main = async () => {
  const s3Concat = new S3Concat({
    s3Client,
    srcBucketName: srcBucketName,
    dstBucketName: dstBucketName,
    dstPrefix,
    concatFileName: 'final_concat.json',
  });

  await s3Concat.addFiles('tmp/1gb');
  await s3Concat.concat();
};

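main()
  .then(() => console.log('success'))
  .catch((err) => {
    // Surface failures instead of leaving an unhandled rejection.
    console.error(err);
    process.exitCode = 1;
  });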

In this example, all files from the tmp/1gb prefix in the source bucket will be concatenated into a single file named final_concat.json.

Example 2: Concatenating into Multiple Files with minSize

This example shows how to use the minSize option to split the concatenated output into multiple files if the total size exceeds the specified limit.

import { S3Client } from '@aws-sdk/client-s3';
import { S3Concat } from 's3-concat';

const s3Client = new S3Client({});
const srcBucketName = process.env.srcBucketName!;
const dstBucketName = process.env.dstBucketName!;
const dstPrefix = 'output';

const main = async () => {
  const s3Concat = new S3Concat({
    s3Client,
    srcBucketName: srcBucketName,
    dstBucketName: dstBucketName,
    dstPrefix,
    concatFileNameCallback: (i) => `concat_${i}.json`,
    minSize: '5GiB',
  });

  await s3Concat.addFiles('tmp/1gb');
  await s3Concat.concat();
};

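main()
  .then(() => console.log('success'))
  .catch((err) => {
    // Surface failures instead of leaving an unhandled rejection.
    console.error(err);
    process.exitCode = 1;
  });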

In this example, files from the tmp/1gb prefix in the source bucket will be concatenated and split into multiple files if the total size exceeds 5GiB. The concatenated files will be named using the callback function, resulting in names like concat_1.json, concat_2.json, etc.
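
One way to picture the splitting is a greedy grouping pass, sketched below. This is an assumption about the behavior, not the library's actual algorithm: `groupByMinSize` and `SourceFile` are hypothetical names, and the sketch assumes a group is closed as soon as its accumulated size reaches the threshold.

```typescript
// Hypothetical shape for a listed S3 object; not part of the s3-concat API.
interface SourceFile {
  key: string;
  size: number; // size in bytes
}

// Greedy grouping: keep adding files to the current group and close the
// group as soon as its accumulated size reaches minSizeBytes.
function groupByMinSize(files: SourceFile[], minSizeBytes: number): SourceFile[][] {
  const groups: SourceFile[][] = [];
  let current: SourceFile[] = [];
  let currentSize = 0;

  for (const file of files) {
    current.push(file);
    currentSize += file.size;
    if (currentSize >= minSizeBytes) {
      groups.push(current);
      current = [];
      currentSize = 0;
    }
  }

  // Files left over after the last full group form a final, smaller group.
  if (current.length > 0) {
    groups.push(current);
  }
  return groups;
}

const GIB = 1024 * 1024 * 1024;

// Twelve 1GiB files with a 5GiB threshold.
const sampleFiles: SourceFile[] = Array.from({ length: 12 }, (_, i) => ({
  key: `tmp/1gb/part_${i}.json`,
  size: GIB,
}));
const sampleGroups = groupByMinSize(sampleFiles, 5 * GIB);
```

Under these assumptions, the twelve files fall into three groups of 5, 5, and 2 files, each of which would become one output file.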

Example 3: Custom Join Order

You can control the join order with the joinOrder option. The presets keyNameDsc and keyNameAsc are supported, and you can also supply your own comparison function that conforms to the type JoinOrderCompareFn (e.g., JoinOrderCompareFn<{ key: string; size: number; lastModified: Date }>).

// Descending order by keyName
const s3Concat = new S3Concat({
  s3Client,
  srcBucketName: srcBucketName,
  dstBucketName: dstBucketName,
  dstPrefix,
  concatFileNameCallback: (i) => `concat_${i}.json`,
  joinOrder: 'keyNameDsc', // use builtin keyword
});

// Ascending order by lastModified (oldest first)
const s3Concat = new S3Concat({
  s3Client,
  srcBucketName: srcBucketName,
  dstBucketName: dstBucketName,
  dstPrefix,
  concatFileNameCallback: (i) => `concat_${i}.json`,
  joinOrder: (a, b) => a.lastModified.getTime() - b.lastModified.getTime(),
});

// Descending order by size
const s3Concat = new S3Concat({
  s3Client,
  srcBucketName: srcBucketName,
  dstBucketName: dstBucketName,
  dstPrefix,
  concatFileNameCallback: (i) => `concat_${i}.json`,
  joinOrder: (a, b) => b.size - a.size,
});
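
A comparator can be checked locally before it is passed as joinOrder. The sketch below assumes the comparator follows the Array.prototype.sort convention (a negative result places a first), which is consistent with the `b.size - a.size` descending example. `FileMeta` is a local stand-in for the metadata shape named above, and `newestFirst` is a hypothetical comparator, not a preset of the library.

```typescript
// Local stand-in for the metadata shape named in the text.
type FileMeta = { key: string; size: number; lastModified: Date };

// Newest first; files with the same timestamp are ordered by key, ascending.
const newestFirst = (a: FileMeta, b: FileMeta): number => {
  const byTime = b.lastModified.getTime() - a.lastModified.getTime();
  if (byTime !== 0) return byTime;
  return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
};

const metas: FileMeta[] = [
  { key: 'b.json', size: 10, lastModified: new Date('2024-01-01T00:00:00Z') },
  { key: 'a.json', size: 20, lastModified: new Date('2024-01-01T00:00:00Z') },
  { key: 'c.json', size: 30, lastModified: new Date('2024-02-01T00:00:00Z') },
];

// Sort a copy so the sample data stays untouched.
const orderedKeys = [...metas].sort(newestFirst).map((m) => m.key);
```

With this sample data, `orderedKeys` is `['c.json', 'a.json', 'b.json']`: the February file comes first, and the two January files are tie-broken by key.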

License

This project is licensed under the MIT License.

Contributing

Contributions are welcome! Please open an issue or submit a pull request with any changes or improvements.

Collaborators

  • shuntaka9576