Parse GTF data. This is a simplified version of @gmod/gtf with just basic parsing and no use of the Node.js stream module.
$ npm install --save gtf-nostream
const { parseStringSync } = require('gtf-nostream')
// or in ES6 (recommended)
import { parseStringSync } from 'gtf-nostream'
const fs = require('fs')
// parse a string of gtf synchronously
const stringOfGTF = fs.readFileSync('my_annotations.gtf', 'utf8')
const arrayOfThings = parseStringSync(stringOfGTF)
In GTF, features can have more than one location. We parse each feature as an array of all the lines that share that feature's ID. Values that are "." in the GTF are null in the output.
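To make the column-to-field mapping concrete, here is a minimal sketch of turning one GTF data line into an object shaped like the output examples that follow. It is not the library's implementation: parseGtfLine is a hypothetical name, converting start, end, and score to numbers while keeping phase as a string is an assumption based on those examples, and the attribute splitting assumes no semicolons appear inside quoted values.

```javascript
// Hypothetical helper, NOT part of gtf-nostream's API: parses one GTF data
// line into an object shaped like the parser output shown in this README.
function parseGtfLine(line) {
  const cols = line.split('\t')
  if (cols.length !== 9) {
    throw new Error(`expected 9 tab-separated columns, got ${cols.length}`)
  }
  const [seqId, source, type, start, end, score, strand, phase, attrText] = cols
  // A lone "." means "no value" and becomes null
  const orNull = (s) => (s === '.' ? null : s)
  const numOrNull = (s) => (s === '.' ? null : Number(s)) // assumption: numeric columns become numbers

  // GTF attributes look like: gene_id "g1"; transcript_id "t1";
  // Every value is collected into an array, so repeated keys are kept.
  const attributes = {}
  for (const part of attrText.split(';')) {
    const trimmed = part.trim()
    if (!trimmed) continue
    const space = trimmed.indexOf(' ')
    const key = space === -1 ? trimmed : trimmed.slice(0, space)
    const raw = space === -1 ? '' : trimmed.slice(space + 1).trim()
    const value = raw.replace(/^"(.*)"$/, '$1') // strip surrounding quotes
    if (!attributes[key]) attributes[key] = []
    attributes[key].push(value)
  }

  return {
    seq_id: orNull(seqId),
    source: orNull(source),
    type: orNull(type),
    start: numOrNull(start),
    end: numOrNull(end),
    score: numOrNull(score),
    strand: orNull(strand),
    phase: orNull(phase), // kept as a string, e.g. "0", as in the examples
    attributes,
    child_features: [],
    derived_features: [],
  }
}

const line = 'ctg123\t.\tCDS\t1201\t1500\t.\t+\t0\tgene_id "g1"; transcript_id "t1";'
const f = parseGtfLine(line)
console.log(f.source) // null
console.log(JSON.stringify(f.attributes)) // {"gene_id":["g1"],"transcript_id":["t1"]}
```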
A simple feature that's located in just one place:
[
  {
    "seq_id": "ctg123",
    "source": null,
    "type": "gene",
    "start": 1000,
    "end": 9000,
    "score": null,
    "strand": "+",
    "phase": null,
    "attributes": {
      "ID": ["gene00001"],
      "Name": ["EDEN"]
    },
    "child_features": [],
    "derived_features": []
  }
]
A CDS called cds00001 located in two places:
[
  {
    "seq_id": "ctg123",
    "source": null,
    "type": "CDS",
    "start": 1201,
    "end": 1500,
    "score": null,
    "strand": "+",
    "phase": "0",
    "attributes": {
      "ID": ["cds00001"],
      "Parent": ["mRNA00001"]
    },
    "child_features": [],
    "derived_features": []
  },
  {
    "seq_id": "ctg123",
    "source": null,
    "type": "CDS",
    "start": 3000,
    "end": 3902,
    "score": null,
    "strand": "+",
    "phase": "0",
    "attributes": {
      "ID": ["cds00001"],
      "Parent": ["mRNA00001"]
    },
    "child_features": [],
    "derived_features": []
  }
]
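The grouping described earlier, where every line that shares an ID ends up in one array, can be sketched as follows. groupById is a hypothetical helper, not part of gtf-nostream. It keys on the ID attribute used in these examples by default; which attribute the library actually groups GTF lines on is not stated here, so the key is a parameter.

```javascript
// Hypothetical helper, NOT part of gtf-nostream's API: groups parsed line
// objects so all lines sharing an ID become one array (one feature),
// preserving the order in which each ID was first seen.
function groupById(lines, idKey = 'ID') {
  const byId = new Map()
  const result = []
  for (const line of lines) {
    const ids = line.attributes && line.attributes[idKey]
    if (!ids || ids.length === 0) {
      // A line without an ID is a single-location feature on its own
      result.push([line])
      continue
    }
    let group = byId.get(ids[0])
    if (!group) {
      group = []
      byId.set(ids[0], group)
      result.push(group)
    }
    group.push(line)
  }
  return result
}

const gene = { type: 'gene', start: 1000, end: 9000, attributes: { ID: ['gene00001'] } }
const cdsA = { type: 'CDS', start: 1201, end: 1500, attributes: { ID: ['cds00001'] } }
const cdsB = { type: 'CDS', start: 3000, end: 3902, attributes: { ID: ['cds00001'] } }
const features = groupById([gene, cdsA, cdsB])
console.log(JSON.stringify(features.map((feat) => feat.length))) // [1,2]
```

Using a Map keeps lookup constant-time while the separate result array preserves input order, so the output lists features in the order their first line appeared.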
Parser options
Whether to resolve "derives from" references between features.
Type: boolean
Text encoding of the input GTF. Default: 'utf8'.
Type: BufferEncoding
Whether to parse features. Default: true.
Type: boolean
Whether to parse directives. Default: false.
Type: boolean
Whether to parse comments. Default: false.
Type: boolean
Whether to parse sequences. Default: true.
Type: boolean
Parse all features, directives, comments, and sequences; this overrides the other parsing options. Default: false.
Type: boolean
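How the defaults above interact with the parse-all override can be sketched as a small flag-resolution function. The option names used here (features, directives, comments, sequences, all) and resolveFlags itself are hypothetical stand-ins, since the real option names are not given in this section; only the defaults and the override rule come from the descriptions above.

```javascript
// Hypothetical option names, NOT gtf-nostream's real ones: illustrative
// stand-ins used only to show how defaults and the parse-all switch combine.
const DEFAULTS = {
  features: true,
  directives: false,
  comments: false,
  sequences: true,
  all: false,
}

function resolveFlags(options = {}) {
  const merged = { ...DEFAULTS, ...options }
  if (merged.all) {
    // The parse-all switch overrides every individual flag
    return { features: true, directives: true, comments: true, sequences: true }
  }
  return {
    features: merged.features,
    directives: merged.directives,
    comments: merged.comments,
    sequences: merged.sequences,
  }
}

console.log(JSON.stringify(resolveFlags()))
// {"features":true,"directives":false,"comments":false,"sequences":true}
console.log(JSON.stringify(resolveFlags({ all: true, features: false })))
// {"features":true,"directives":true,"comments":true,"sequences":true}
```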