parq
0.1.1 • Public • Published

parq is a Parquet reader written in JavaScript. Install it from npm with npm i parq.

Usage

You can build a reader and then iterate over its contents, yielding a Uint8Array for each value:

import { buildReader, flatIterate } from 'parq';

const bytes = /* Uint8Array from somewhere */;
const pr = await buildReader(bytes);

// iterate over the data in rows 100-200 of column zero
const it = flatIterate(pr, 0, 100, 200);

let i = 100;
for await (const value of it) {
  console.info(`col0 row${i}=`, value);
  ++i;
}

It's a bit awkward to receive a Uint8Array per value, but it matches how Parquet works: alongside a variety of primitive data types, it has the BYTE_ARRAY type, which has variable length. This type is usually used for UTF-8 encoded strings.
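
As a rough sketch of turning those per-value bytes into JavaScript values: the helpers below are hypothetical and not part of parq. They assume each value arrives in Parquet's PLAIN layout (little-endian for fixed-width types, raw UTF-8 for string-like BYTE_ARRAY columns), which is worth verifying against your own file before relying on it.

```typescript
// Hypothetical helpers, not part of parq. They assume each value arrives as
// the raw bytes of Parquet's PLAIN layout: little-endian for fixed-width
// types, and plain UTF-8 for string-like BYTE_ARRAY columns.

const utf8 = new TextDecoder('utf-8');

/** Decodes a BYTE_ARRAY value that holds a UTF-8 string. */
function decodeString(value: Uint8Array): string {
  return utf8.decode(value);
}

/** Wraps the value without copying, respecting its offset into the buffer. */
function viewOf(value: Uint8Array): DataView {
  return new DataView(value.buffer, value.byteOffset, value.byteLength);
}

/** Decodes a 4-byte little-endian signed integer (INT32). */
function decodeInt32(value: Uint8Array): number {
  return viewOf(value).getInt32(0, true);
}

/** Decodes an 8-byte little-endian IEEE 754 double (DOUBLE). */
function decodeDouble(value: Uint8Array): number {
  return viewOf(value).getFloat64(0, true);
}
```

Which decoder to apply depends on the column's declared type, so pick it once per column rather than once per value.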

To find out what type is used per-column, check pr.info().columns for their name, type, and so on, before indexing.
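
A small sketch of looking up a column by name before iterating. The ColumnInfo shape here is an assumption: the text above only says each entry has a name, a type, "and so on", so the exact fields and their types may differ. columnIndexByName is a hypothetical helper, not a parq export.

```typescript
// Assumed shape: each column entry is only known to carry a name and a type.
// The exact fields and their types here are a guess.
interface ColumnInfo {
  name: string;
  type: unknown;
}

/** Returns the index of the named column, or throws if it is absent. */
function columnIndexByName(columns: readonly ColumnInfo[], name: string): number {
  const index = columns.findIndex((c) => c.name === name);
  if (index === -1) {
    const known = columns.map((c) => c.name).join(', ');
    throw new Error(`no column named "${name}" (have: ${known})`);
  }
  return index;
}
```

With a reader in hand, the result of columnIndexByName(pr.info().columns, 'city') could then be passed as the column argument to flatIterate, where 'city' stands in for one of your own column names.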

Advanced Usage

You can access the low-level methods on ParquetReader to read raw page data directly. These take a little more work before the data can be rendered, but they let you process it more efficiently.

You can also pass a Reader implementation to buildReader instead of raw bytes. This is a method which reads bytes in a specific range, which is useful if you are processing large files and don't want to read them from disk or the network all at once.
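
To illustrate the idea of a range-based reader, here is a sketch that fetches byte ranges over HTTP. The RangeReader signature is an assumption (the Reader type is not spelled out here, so check parq's type declarations for the real shape), and rangeHeader, httpRangeReader and memoryRangeReader are hypothetical names. It also assumes the server honours Range requests.

```typescript
// Assumed signature: this sketch treats a reader as "give me the bytes in
// [start, end)". The real Reader type in parq may differ.
type RangeReader = (start: number, end: number) => Promise<Uint8Array>;

/** Builds the HTTP Range header value for the half-open range [start, end). */
function rangeHeader(start: number, end: number): string {
  if (start < 0 || end <= start) {
    throw new RangeError(`invalid range ${start}..${end}`);
  }
  // HTTP byte ranges are inclusive at both ends, hence end - 1.
  return `bytes=${start}-${end - 1}`;
}

/** Reads ranges from a URL on demand instead of downloading the whole file. */
function httpRangeReader(url: string): RangeReader {
  return async (start, end) => {
    const response = await fetch(url, {
      headers: { Range: rangeHeader(start, end) },
    });
    if (response.status !== 206) {
      throw new Error(`expected 206 Partial Content, got ${response.status}`);
    }
    return new Uint8Array(await response.arrayBuffer());
  };
}

/** The same interface over bytes already in memory, handy for tests. */
function memoryRangeReader(bytes: Uint8Array): RangeReader {
  return async (start, end) => bytes.subarray(start, end);
}
```

Because Parquet keeps its metadata in a footer at the end of the file, a reader like this lets a client fetch only the footer and the pages it needs.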

Demo

There's a simple demo on GitHub Pages, with the source in ./demo. This uses a Worker to process Parquet data remotely, which means that this code can trivially handle files of a gigabyte or more. It implements a remote ParquetReader that connects to the worker.

Install

npm i parq

Collaborators

  • samthor