DagDB
This project is pre-release. Do not use it in production; breaking changes may still occur without notice.
DagDB is a portable and syncable database for the Web.
It can run as a distributed database in Node.js, including Serverless environments using AWS services as a backend.
It also runs in the browser. In fact, there is no "client and server" in DagDB: everything is just a DagDB database replicating from another database. In this way, it's closer to git than a traditional database workflow.
Creating Databases
At an abstract level, DagDB databases operate on top of two storage interfaces. The first is the block store, a relatively simple key/value store. The second is the updater, a single mutable reference to the current root of the database.
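As a rough illustration, the two interfaces can be sketched in memory like this (the method names below are illustrative, not DagDB's actual API):

```javascript
// A minimal in-memory sketch of the two interfaces DagDB builds on.
// The method names here are illustrative, not DagDB's actual API.

// 1. Block store: a simple key/value store mapping addresses to bytes.
const createBlockStore = () => {
  const blocks = new Map()
  return {
    put: async (key, bytes) => { blocks.set(key, bytes) },
    get: async (key) => blocks.get(key)
  }
}

// 2. Updater: a single mutable reference to the current database root.
const createUpdater = () => {
  let root = null
  return {
    update: async (newRoot) => { root = newRoot },
    root: async () => root
  }
}

const main = async () => {
  const store = createBlockStore()
  const updater = createUpdater()
  await store.put('block-a', Buffer.from('some serialized data'))
  await updater.update('block-a') // point the mutable reference at the new root
  return updater.root()
}
main().then(root => console.log(root)) // prints "block-a"
```

Every storage backend below is just some implementation of these two pieces.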
The following methods are available to simplify the process of creating a new database on a number of storage and updater backends.
Create an in-memory database.
const db = await dagdb.create('inmem') /* 'inmemory' also works */
Create a database in a Browser
const db = await dagdb.create('browser')
If you want to have multiple unique databases stored in the browser, you can use the updateKey option.
// the default updateKey is "root"
const db = await dagdb.create('browser', { updateKey: 'custom-key' })
Create a database in S3
const S3 = require('aws-sdk/clients/s3')
const Bucket = 'bucketName'
const s3 = new S3({ params: { Bucket } })
let db = await dagdb.create({ s3 })
This uses S3 for block storage and for the update transaction. This will work fine as long as you don't try to update the same database with a lot of concurrency; beyond that, you might encounter eventual consistency issues with S3. An updater built on top of DynamoDB that can perform transactional updates is planned in order to resolve these concerns.
Create a database from a leveldown interface.
This allows you to store DagDB data in a wide variety of storage backends.
const leveldown = require('memdown')('unique-id') // memdown takes a unique identifier
const db = await dagdb.create({ leveldown })
Create a database at a remote URL (no local caching or storage).
const db = await dagdb.create('http://website.com/db')
Opening a Database
Opening a remote database
const db = await dagdb.open('http://website.com/db')
Opening a database in the Browser
const db = await dagdb.open('browser')
If you want to have multiple unique databases stored in the browser, you can use the updateKey option.
// the default updateKey is "root"
const db = await dagdb.open('browser', { updateKey: 'custom-key' })
Opening a database in S3
const S3 = require('aws-sdk/clients/s3')
const Bucket = 'bucketName'
const s3 = new S3({ params: { Bucket } })
let db = await dagdb.open({ s3 })
Opening a leveldown database
const leveldown = require('memdown')('unique-id')
const db = await dagdb.open({ leveldown })
Key Value Storage
DagDB's primary storage system is a simple key-value store. Keys can be any string, and values can be almost anything.
For instance, all JSON types are natively supported as values.
let db = await dagdb.create('inmem')
await db.set('hello', 'world')
console.log(await db.get('hello')) // prints "world"
As you can see, you can set and get values immediately. Something to note about this example is that, while the "hello" key is available, it is actually coming out of a staging area that has not yet been committed to the database.
Every instance of DagDB is bound to an immutable database state. We then add, remove, or change keys in that database until finally updating it, which will return us a new DagDB instance for the newly updated immutable state.
let db = await dagdb.create('inmem')
await db.set('hello', 'world')
db = await db.update()
console.log(await db.get('hello')) // prints "world"
Now that we know how to set values and update the database, let's work with some more advanced values.
const now = new Date()
await db.set('profile', {
  name: 'Mikeal Rogers',
  created: now.toISOString(),
  tags: ['node', 'dagdb'],
  nested: { objects: { are: { fine: true } } },
  binary: new Uint8Array([1, 2, 3])
})
As you can see, we can use all JSON types and there's no limit to how far we can nest values inside of objects. In addition to JSON types, we support efficient binary serialization, so you can use Uint8Array for any binary you have.
Links
So far we haven't shown you anything you can't do with any other key-value store. Now let's look at some features unique to DagDB and the primitives it's built on.
const link = await db.link({ name: 'Earth', size: 3958.8 })
await db.set('mikeal', { name: 'Mikeal Rogers', planet: link })
await db.set('chris', { name: 'Chris Hafey', planet: link })
db = await db.update()

const howBigIsYourPlanet = async key => {
  const person = await db.get(key)
  const planet = await person.planet()
  console.log(`${person.name} lives on a planet w/ a radius of ${planet.size}mi`)
}
await howBigIsYourPlanet('mikeal') // prints "Mikeal Rogers lives on a planet w/ a radius of 3958.8mi"
await howBigIsYourPlanet('chris') // prints "Chris Hafey lives on a planet w/ a radius of 3958.8mi"
Pretty cool!
As you can see, link values are decoded by DagDB as async functions that will return the decoded value from the database.
The great thing about links is that the data is de-duplicated across the database. DagDB uses a technique called "content addressing" that links data by hashing the value. This means that, even if you create the link again with the same data, the link will be the same and the data will be deduplicated.
You can also compare links in order to tell if they refer to the same data.
const link1 = await db.link({ name: 'Earth', size: 3958.8 })
const link2 = await db.link({ name: 'Earth', size: 3958.8 })
console.log(link1.equals(link2)) // prints true

const samePlanet = async (key1, key2) => {
  const person1 = await db.get(key1)
  const person2 = await db.get(key2)
  if (person1.planet.equals(person2.planet)) {
    console.log(`${person1.name} is on the same planet as ${person2.name}`)
  } else {
    console.log(`${person1.name} is not on the same planet as ${person2.name}`)
  }
}
await samePlanet('mikeal', 'chris') // prints "Mikeal Rogers is on the same planet as Chris Hafey"
As you can see, links are more than addresses, they are useful values for comparison.
There's no limit to the number of links or the depth at which you nest your values. Most importantly, you can use linked data in any other value with zero copy overhead; it's just a small update to the link value.
Streams
Since it is often problematic to store large amounts of binary as a single value, DagDB also natively supports storing streams of binary data.
DagDB treats any async generator as a binary stream. Node.js Streams are valid async generators so they work right away.
const fs = require('fs')
const reader = fs.createReadStream('file.tar.gz')
await db.set('file.tar.gz', reader)
db = await db.update()

const printFile = async key => {
  const value = await db.get(key)
  for await (const chunk of value) {
    process.stdout.write(chunk)
  }
}
Note that, while you can use any Stream interface that is a valid async generator (like Node.js Streams) to store the data, when you retrieve the stream it will be returned as a common async generator (not a Node.js Stream).
The size of every chunk in the stream is preserved. However, this may change: some transports have issues with block sizes larger than 1MB, so we may change the defaults to keep each chunk below 1MB.
Nesting Databases
Another really cool thing you can do is use DagDB databases as values in other databases.
let db1 = await dagdb.create('inmem')
let db2 = await dagdb.create('inmem')

await db1.set('hello', 'world')
db1 = await db1.update()
await db2.set('db1', db1)
db2 = await db2.update()

const db = await db2.get('db1')
console.log(await db.get('hello')) // prints "world"
This feature uses a very flexible system that can be extended in the future to feature all kinds of new data types.
Custom Types
DagDB's support for nesting databases lends itself to supporting other types of embeddings as well. This is a powerful feature that is used internally to allow embedding of builtin types and classes, but it can also be used to embed arbitrary custom types.
The API is pretty simple: it requires the caller to specify a type string, and an init function that takes two arguments, namely the root cid of the custom object and the underlying store. For example:
const initCustomType = async (root, store) => {
  return await store.get(root)
}
Additionally, the custom type/object must support the following interface:
From the internal docs:
Encoders, both here and in special types, are async generators that yield as many blocks as they like, as long as the very last thing they yield is NOT a Block. This is so that the final root of each node can be embedded in a parent. This contract MUST be adhered to by all special types. Additionally, the _dagdb property specifies the type name for v1 of the interface (leaving room for future interface changes), and is used to look up the in-memory custom type mapping.
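As a sketch of that contract, with plain objects standing in for real Blocks, an encoder might look like this (illustrative only; real DagDB encoders work with actual Block objects and CIDs):

```javascript
// Illustrative encoder following the contract above: yield any number
// of Block-like objects, then make the very last yield a non-Block
// (here, the root's address) so a parent node can embed it.
const exampleEncoder = async function * (values) {
  let rootAddress = null
  for (const value of values) {
    const block = { bytes: JSON.stringify(value), address: `addr:${value}` } // stand-in for a real Block
    rootAddress = block.address
    yield block
  }
  yield rootAddress // NOT a Block: the final root for the parent to embed
}

const collect = async () => {
  const yielded = []
  for await (const item of exampleEncoder(['a', 'b'])) yielded.push(item)
  return yielded
}
collect().then(items => console.log(items[items.length - 1])) // prints "addr:b"
```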
To register the custom type, you simply call register on the database:
let db = await dagdb.create('inmem')
db.register('custom-type', initCustomType)

const value = new CustomType(...args) // your custom type's constructor
await db.set('custom', value)
db = await db.update()

const custom = await db.get('custom')
custom.method()
Replication
Replication in DagDB is quite different from replication in traditional databases. Since there isn't a client and a server, just databases everywhere, replication is a key component of how you access data.
The closest thing to DagDB replication you're already familiar with is git: the way changes are merged from one branch to another and from one remote to another. We even have a system for keeping track of remote databases that feels a lot like git.
Let's start by adding and pulling from a remote.
const url = 'http://website.com/db'
const remoteDatabase = await dagdb.open(url)
await remoteDatabase.set('hello', 'world')

let db = await dagdb.create('inmem')
await db.remotes.add('origin', url)
await db.remotes.pull('origin')
db = await db.update()

console.log(await db.get('hello')) // prints "world"
Using remotes for replication is an efficient way to move data around because it keeps track of the last changeset and can easily pull only the changes since that time. However, if you have two database instances locally you can easily merge one into the other without using the remote system.
let db1 = await dagdb.create('inmem')
let db2 = await dagdb.create('inmem')

await db1.set('hello', 'world')
db1 = await db1.update()
db2 = await db2.merge(db1)

console.log(await db2.get('hello')) // prints "world"
Replicate remote to key
So far, we've been using replication to merge an entire database's keyspace into our own. But as we've already seen, you can use a DagDB database as a value, so it would make sense to use a remote to replicate into a key rather than merging into our entire local namespace.
const url = 'http://website.com/db'
const remoteDatabase = await dagdb.open(url)
await remoteDatabase.set('hello', 'world')

let db = await dagdb.create('inmem')
await db.remotes.add('web', url) // this remote replicates into the "web" key
await db.remotes.pull('web')
db = await db.update()

const webdb = await db.get('web')
console.log(await webdb.get('hello')) // prints "world"
Running the HTTP Service in Node.js
If you're using Node.js, it's quite easy to get an HTTP handler you can pass to http.createServer for any database instance.
const http = require('http')
const db = await dagdb.create('inmem')
const handler = db.getHandler() // returns a (req, res) handler for this database
const server = http.createServer(handler)
server.listen(8080)