MotherDuck is a managed DuckDB-in-the-cloud service.
DuckDB Wasm brings DuckDB to every browser thanks to WebAssembly.
The MotherDuck Wasm Client library enables using MotherDuck through DuckDB Wasm in your own browser applications.
Example projects and live demos can be found here.
Please note that the MotherDuck Wasm Client library is in an early stage of active development. Its structure and API may change considerably.
Our current intention is to align more closely with the DuckDB Wasm API in the future, to make using MotherDuck with DuckDB Wasm as easy as possible.
- The MotherDuck Wasm Client library uses the same version of DuckDB Wasm as the MotherDuck web UI. Since the DuckDB Wasm assets are fetched dynamically, and the MotherDuck web UI is updated weekly and adopts new DuckDB versions promptly, the DuckDB version used could change even without upgrading the MotherDuck Wasm Client library. Check pragma version to see which DuckDB version is in use.
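For example, once you have created a connection (as described later in this document), a quick sketch for checking the running DuckDB version looks like this; see the Results section below for how to read the result object:
const versionResult = await connection.evaluateQuery('pragma version;');
console.log('DuckDB version', versionResult);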
Install the library with npm:
npm install @motherduck/wasm-client
To facilitate efficient communication across worker threads, the MotherDuck Wasm Client library currently uses advanced browser features, including SharedArrayBuffer.
Due to security requirements of modern browsers, these features require applications to be cross-origin isolated.
To use the MotherDuck Wasm Client library, your application must be in cross-origin isolation mode, which is enabled when it is served with the following headers:
Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp
You can check whether your application is in this mode by examining the crossOriginIsolated property in the browser console.
Note that applications in this mode are restricted in some ways. In particular, resources from different origins can only be loaded if they are served with a Cross-Origin-Resource-Policy (CORP) header with the value cross-origin.
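How you set these headers depends on your server or bundler. As an illustration only (Express is an assumption here, not a requirement of this library), a development server might add them like this:
import express from 'express';

const app = express();

// Send the headers required for cross-origin isolation on every response.
app.use((_req, res, next) => {
  res.setHeader('Cross-Origin-Opener-Policy', 'same-origin');
  res.setHeader('Cross-Origin-Embedder-Policy', 'require-corp');
  next();
});

// Serve the built application; 'dist' is a placeholder for your output directory.
app.use(express.static('dist'));
app.listen(8080);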
The MotherDuck Wasm Client library depends on apache-arrow as a peer dependency. If you use npm version 7 or later to install @motherduck/wasm-client, then apache-arrow will automatically be installed if it is not already. If you already have apache-arrow installed, then @motherduck/wasm-client will use it, as long as it is a compatible version (^14.0.x at the time of this writing).
Optionally, you can use a variant of @motherduck/wasm-client that bundles apache-arrow instead of relying on it as a peer dependency. Don't use this option if you are using apache-arrow elsewhere in your application, because different copies of this library don't work together. To use this variant, change your imports to:
import '@motherduck/wasm-client/with-arrow';
instead of:
import '@motherduck/wasm-client';
The MotherDuck Wasm Client library is written in TypeScript and exposes full TypeScript type definitions. These instructions assume you are using it from TypeScript.
Once you have installed @motherduck/wasm-client, you can import the main class, MDConnection, as follows:
import { MDConnection } from '@motherduck/wasm-client';
To create a connection to a MotherDuck-connected DuckDB instance, call the create static method:
const connection = MDConnection.create({
  mdToken: token
});
The mdToken parameter is required and should be set to a valid MotherDuck access token. You can create a MotherDuck access token in the MotherDuck UI. For more information, see Authenticating to MotherDuck.
The create call returns immediately, but starts the process of loading the DuckDB Wasm assets from https://app.motherduck.com and starting the DuckDB Wasm worker.
This initialization process happens asynchronously. Any query evaluated before initialization is complete will be queued.
To determine whether initialization is complete, call the isInitialized method, which returns a promise resolving to true when DuckDB Wasm is initialized:
await connection.isInitialized();
Multiple connections can be created. Connections share a DuckDB Wasm instance, so creating subsequent connections will not repeat the initialization process.
Queries evaluated on different connections happen concurrently; queries evaluated on the same connection are queued sequentially.
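For example, the following sketch creates a second connection that reuses the already-initialized DuckDB Wasm instance and runs two queries concurrently (the query text is illustrative):
const connectionA = MDConnection.create({ mdToken: token });
const connectionB = MDConnection.create({ mdToken: token }); // shares the same DuckDB Wasm instance

// Queries on different connections run concurrently.
const [resultA, resultB] = await Promise.all([
  connectionA.evaluateQuery('SELECT 1 AS a'),
  connectionB.evaluateQuery('SELECT 2 AS b')
]);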
To evaluate a query, call the evaluateQuery method on the connection object:
try {
  const result = await connection.evaluateQuery(sql);
  console.log('query result', result);
} catch (err) {
  console.log('query failed', err);
}
The evaluateQuery method returns a promise for the result. In an async function, you can use the await syntax as above. Or, you can use the then and/or catch methods:
connection.evaluateQuery(sql).then((result) => {
  console.log('query result', result);
}).catch((reason) => {
  console.log('query failed', reason);
});
See Results below for the structure of the result object.
To create a prepared statement for later evaluation, use the prepareQuery method:
const prepareResult = await connection.prepareQuery('SELECT v + ? FROM generate_series(0, 10000) AS t(v);');
This returns an AsyncPreparedStatement, which can be evaluated later using the send method:
const arrowStream = await prepareResult.send(234);
Note: The query method of the AsyncPreparedStatement should not be used, because it can lead to deadlock when combined with the MotherDuck extension.
To immediately evaluate a prepared statement, call the evaluatePreparedStatement method:
const result = await connection.evaluatePreparedStatement('SELECT v + ? FROM generate_series(0, 10000) AS t(v);', [234]);
This returns a materialized result, as described in Results below.
To evaluate a query that can be canceled, use the enqueueQuery and evaluateQueuedQuery methods:
const queryId = connection.enqueueQuery(sql);
const result = await connection.evaluateQueuedQuery(queryId);
To cancel a query evaluated in this fashion, use the cancelQuery method, passing the queryId returned by enqueueQuery:
const queryWasCanceled = await connection.cancelQuery(queryId);
The cancelQuery method returns a promise for a boolean indicating whether the query was successfully canceled.
The result promise of a canceled query will be rejected with an error message. The cancelQuery method takes an optional second argument for controlling this message:
const queryWasCanceled = await connection.cancelQuery(queryId, 'custom error message');
The query methods above return fully materialized results. To evaluate a query and return a stream of results, use evaluateStreamingQuery or evaluateStreamingPreparedStatement:
const result = await connection.evaluateStreamingQuery(sql);
See Results below for the structure of the result object.
The query result promises returned by evaluateQuery, evaluatePreparedStatement, evaluateQueuedQuery, and evaluateStreamingQuery will be rejected in the case of an error.
For convenience, "safe" variants of these methods are provided that catch this error and always resolve to a value indicating success or failure. For example:
const result = await connection.safeEvaluateQuery(sql);
if (result.status === 'success') {
  console.log('rows', result.rows);
} else {
  console.log('error', result.err);
}
A successful query result may either be fully materialized, or it may contain a stream.
Use the type property of the result object, which is either 'materialized' or 'streaming', to distinguish these.
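For example, a result from one of the evaluation methods above can be dispatched on as sketched here (this assumes the data and stream accessors are properties of the result object, as the descriptions below suggest):
if (result.type === 'materialized') {
  // Use result.data, described next.
  console.log('row count', result.data.rowCount);
} else {
  // 'streaming': consume result.dataStream, result.dataReader, or result.arrowStream (see below).
}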
A materialized result contains a data property, which provides several methods for getting the results. The number of columns and rows in the result are available through the columnCount and rowCount properties of data.
Column names and types can be retrieved using the columnName(columnIndex) and columnType(columnIndex) methods.
Individual values can be accessed using the value(columnIndex, rowIndex) method. See below for details about the forms values can take.
Several convenience methods also simplify common access patterns; see singleValue(), columnNames(), deduplicatedColumnNames(), and toRows().
The toRows() method is especially useful in many cases. It returns the result as an array of row objects. Each row object has one property per column, named after that column. (Multiple columns with the same name are deduplicated with suffixes.)
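Putting the data interface together, a sketch of reading a materialized result might look like this:
const data = result.data;

// Column-and-index access:
console.log('columns', data.columnNames());
for (let rowIndex = 0; rowIndex < data.rowCount; rowIndex++) {
  console.log(data.value(0, rowIndex)); // value in the first column of each row
}

// Or as an array of row objects keyed by (deduplicated) column name:
for (const row of data.toRows()) {
  console.log(row);
}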
The type of each column property of a row object depends on the type of the corresponding column in DuckDB.
Many values are converted to a JavaScript primitive type, such as boolean, number, or string.
Some numeric values too large to fit in a JavaScript number (e.g., a DuckDB BIGINT) are converted to a JavaScript bigint.
Some DuckDB types, such as DATE, TIME, TIMESTAMP, and DECIMAL, are converted to JavaScript objects implementing an interface specific to that type. Nested types such as DuckDB LIST, MAP, and STRUCT are also exposed through special JavaScript objects.
These objects all implement toString to return a string representation. For primitive types, this representation is identical to DuckDB's string conversion (e.g. using CAST to VARCHAR). For nested types, the representation is equivalent to the syntax used to construct these types.
They also have properties exposing the underlying value. For example, the object for a DuckDB TIME has a microseconds property (of type bigint). See the TypeScript type definitions for details.
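As a sketch (in TypeScript you may need to narrow the value's type before accessing type-specific properties such as microseconds):
const result = await connection.evaluateQuery("SELECT TIME '12:34:56' AS t");
const row = result.data.toRows()[0];

console.log(row.t.toString());   // '12:34:56', matching DuckDB's string conversion
console.log(row.t.microseconds); // the underlying value, a bigint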
Note that these result types differ from those returned by DuckDB Wasm without the MotherDuck Wasm Client library. The MotherDuck Wasm Client library implements custom conversion logic to preserve the full range of some types.
A streaming result provides three ways to consume the results: arrowStream, dataStream, and dataReader. The first two (arrowStream and dataStream) implement the async iterator protocol and return items representing batches of rows, but they return different kinds of batch objects. Batches correspond to DuckDB DataChunks, which contain no more than 2048 rows. The third (dataReader) wraps dataStream and makes consuming multiple batches easier.
The dataStream iterator returns a sequence of data objects, each of which implements the same interface as the data property of a materialized query result, described above.
The dataReader implements the same data interface, but also adds useful methods such as readAll and readUntil, which can be used to read at least a given number of rows, possibly across multiple batches.
The arrowStream property provides access to the underlying Arrow RecordBatch stream reader. This can be useful if you need the underlying Arrow representation. Also, this stream has convenience methods such as readAll to materialize all batches.
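For example, a sketch of consuming a streaming result batch by batch through dataStream (for await works because it implements the async iterator protocol):
const result = await connection.evaluateStreamingQuery(sql);
if (result.type === 'streaming') {
  for await (const batch of result.dataStream) {
    // Each batch implements the same data interface as a materialized result.
    console.log('batch rows', batch.rowCount);
    console.log(batch.toRows());
  }
}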
Note, however, that for certain DuckDB types, especially dates, times, and decimals, Arrow's conversion of the underlying data to JavaScript types can be lossy. Also, converting Arrow values to strings will not always match DuckDB's string conversion.
Note that results of remote queries are not streamed end-to-end yet. Results of remote queries are fully materialized on the client upstream of this API. So the first batch will not be returned from this API until all results have been received by the client. End-to-end streaming of remote query results is on our roadmap.
To access the underlying DuckDB Wasm instance, use the getAsyncDuckDb function. Note that this function returns (a Promise to) a singleton instance of DuckDB Wasm also used by the MotherDuck Wasm Client.
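A sketch (the named import from the package root is an assumption; check the package's TypeScript definitions for the exact export):
import { getAsyncDuckDb } from '@motherduck/wasm-client';

// Resolves to the singleton DuckDB Wasm instance shared with the MotherDuck Wasm Client.
const db = await getAsyncDuckDb();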