markdown-tree-parser

A powerful JavaScript library and CLI tool for parsing and manipulating markdown files as tree structures. Built on top of the battle-tested remark/unified ecosystem.

🚀 Features

🌳 Tree-based parsing - Parses markdown into an Abstract Syntax Tree (AST) that you can query and modify
✂️ Section extraction - Extract specific sections with automatic boundary detection
🔍 Powerful search - CSS-like selectors and custom search functions
📚 Batch processing - Process multiple sections at once
🛠️ CLI & Library - Use as a command-line tool or JavaScript library
📊 Document analysis - Get statistics and generate table of contents
🎯 TypeScript ready - Full type definitions included
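
To make "tree-based" concrete, here is a rough sketch of the kind of mdast-style tree that remark produces for a two-line document. The tree is hand-written rather than parsed, and `textOf` is a hypothetical helper used only for illustration, not part of this package's API.

```javascript
// A hand-written mdast-style tree for the markdown:
//   # Title
//   Hello *world*.
const tree = {
  type: 'root',
  children: [
    {
      type: 'heading',
      depth: 1,
      children: [{ type: 'text', value: 'Title' }],
    },
    {
      type: 'paragraph',
      children: [
        { type: 'text', value: 'Hello ' },
        { type: 'emphasis', children: [{ type: 'text', value: 'world' }] },
        { type: 'text', value: '.' },
      ],
    },
  ],
};

// Hypothetical helper: concatenate the text of a node and its descendants.
function textOf(node) {
  if (typeof node.value === 'string') return node.value;
  return (node.children || []).map(textOf).join('');
}

console.log(tree.children.map((n) => `${n.type}: ${textOf(n)}`));
// [ 'heading: Title', 'paragraph: Hello world.' ]
```

Every feature listed above amounts to walking, slicing, or rewriting a structure of this shape.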

📦 Installation

Global Installation (for CLI usage)

# Using npm
npm install -g @kayvan/markdown-tree-parser

# Using pnpm (may require approval for build scripts)
pnpm install -g @kayvan/markdown-tree-parser
pnpm approve-builds -g  # If prompted

# Using yarn
yarn global add @kayvan/markdown-tree-parser

Local Installation (for library usage)

npm install @kayvan/markdown-tree-parser

🔧 CLI Usage

After global installation, use the md-tree command:

List all headings

md-tree list README.md
md-tree list README.md --format json

Extract specific sections

# Extract one section
md-tree extract README.md "Installation"

# Write the extracted section to a file under ./sections
md-tree extract README.md "Installation" --output ./sections

Extract all sections at a level

# Extract all level-2 sections
md-tree extract-all README.md 2

# Extract to separate files
md-tree extract-all README.md 2 --output ./sections

Show document structure

md-tree tree README.md

Search with CSS-like selectors

# Find all level-2 headings
md-tree search README.md "heading[depth=2]"

# Find all links
md-tree search README.md "link"

Document statistics

md-tree stats README.md

Check links

md-tree check-links README.md
md-tree check-links README.md --recursive

Generate table of contents

md-tree toc README.md --max-level 3
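
As a rough illustration of what a maximum level means for a table of contents, the sketch below builds a nested bullet list from a flat list of headings. The names `slugify` and `buildToc`, the `{ depth, text }` input shape, and the output format are all assumptions made for this example; the actual output of `md-tree toc` may differ.

```javascript
// Hypothetical helper: turn heading text into a simple anchor slug.
function slugify(text) {
  return text
    .toLowerCase()
    .trim()
    .replace(/[^\w\s-]/g, '') // drop punctuation
    .replace(/\s+/g, '-');    // whitespace runs become hyphens
}

// Build a nested bullet list from { depth, text } entries, skipping
// headings deeper than maxLevel.
function buildToc(headings, maxLevel = 3) {
  return headings
    .filter((h) => h.depth <= maxLevel)
    .map((h) => `${'  '.repeat(h.depth - 1)}- [${h.text}](#${slugify(h.text)})`)
    .join('\n');
}

const headings = [
  { depth: 1, text: 'Guide' },
  { depth: 2, text: 'CLI Usage' },
  { depth: 4, text: 'Too deep' },
];

console.log(buildToc(headings, 3));
// - [Guide](#guide)
//   - [CLI Usage](#cli-usage)
```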

Complete CLI options

md-tree help

📚 Library Usage

Basic Usage

import { MarkdownTreeParser } from '@kayvan/markdown-tree-parser';

const parser = new MarkdownTreeParser();

// Parse markdown into AST
const markdown = `
# My Document
Some content here.

## Section 1
Content for section 1.

## Section 2
Content for section 2.
`;

const tree = await parser.parse(markdown);

// Extract a specific section
const section = parser.extractSection(tree, 'Section 1');
const sectionMarkdown = await parser.stringify(section);

console.log(sectionMarkdown);
// Output:
// ## Section 1
// Content for section 1.
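
One plausible reading of "automatic boundary detection" is that a section runs from its heading up to the next heading of the same or a shallower depth. The sketch below implements that rule on a hand-built mdast-style tree. `sliceSection`, `textOf`, and the exact-match comparison are assumptions for illustration; the library's own matching rules are not documented here and may differ.

```javascript
// Hypothetical helper: concatenate the text of a node and its descendants.
function textOf(node) {
  if (typeof node.value === 'string') return node.value;
  return (node.children || []).map(textOf).join('');
}

// Return a new root holding the matching heading and everything up to
// (but not including) the next heading of the same or a shallower depth.
function sliceSection(tree, headingText) {
  const nodes = tree.children;
  const start = nodes.findIndex(
    (n) => n.type === 'heading' && textOf(n) === headingText
  );
  if (start === -1) return null;

  const depth = nodes[start].depth;
  let end = start + 1;
  while (
    end < nodes.length &&
    !(nodes[end].type === 'heading' && nodes[end].depth <= depth)
  ) {
    end++;
  }
  return { type: 'root', children: nodes.slice(start, end) };
}

// Small builders for the hand-written example tree.
const heading = (depth, value) => ({
  type: 'heading',
  depth,
  children: [{ type: 'text', value }],
});
const para = (value) => ({
  type: 'paragraph',
  children: [{ type: 'text', value }],
});

const tree = {
  type: 'root',
  children: [
    heading(1, 'My Document'),
    para('Some content here.'),
    heading(2, 'Section 1'),
    para('Content for section 1.'),
    heading(3, 'Details'),
    para('Nested content.'),
    heading(2, 'Section 2'),
    para('Content for section 2.'),
  ],
};

const section = sliceSection(tree, 'Section 1');
console.log(section.children.map(textOf).join(' | '));
// Section 1 | Content for section 1. | Details | Nested content.
```

Under this rule the nested level-3 heading stays inside "Section 1", and the slice stops at "Section 2" because it has the same depth.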

Advanced Usage

import { MarkdownTreeParser, createParser, extractSection } from '@kayvan/markdown-tree-parser';

// Create parser with custom options
const parser = createParser({
  bullet: '-',      // Use '-' for lists
  emphasis: '_',    // Use '_' for emphasis
  strong: '__'      // Use '__' for strong
});

// Extract all sections at level 2
const tree = await parser.parse(markdown);
const sections = parser.extractAllSections(tree, 2);

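// Use for...of rather than forEach: forEach does not wait for async callbacks,
// so sections could print out of order or after later code has run
for (const [index, section] of sections.entries()) {
  const heading = parser.getHeadingText(section.heading);
  const content = await parser.stringify(section.tree);
  console.log(`Section ${index + 1}: ${heading}`);
  console.log(content);
}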

// Use convenience functions
const sectionMarkdown = await extractSection(markdown, 'Installation');

Search and Manipulation

// CSS-like selectors
const headings = parser.selectAll(tree, 'heading[depth=2]');
const links = parser.selectAll(tree, 'link');
const codeBlocks = parser.selectAll(tree, 'code');

// Custom search
const customNode = parser.findNode(tree, (node) => {
  return node.type === 'heading' &&
         parser.getHeadingText(node).includes('API');
});

// Transform content
parser.transform(tree, (node) => {
  if (node.type === 'heading' && node.depth === 1) {
    node.depth = 2; // Convert h1 to h2
  }
});

// Get document statistics
const stats = parser.getStats(tree);
console.log(`Document has ${stats.wordCount} words and ${stats.headings.total} headings`);

// Generate table of contents
const toc = parser.generateTableOfContents(tree, 3);
console.log(toc);
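
Custom search, transforms, and statistics can all be understood as a depth-first walk over the tree. The following is a minimal sketch of that idea on a hand-built tree. `walk` and `collectStats` are hypothetical stand-ins rather than the library's implementation, and the `byDepth` field is an assumption (only `wordCount` and `headings.total` appear in the example above).

```javascript
// Depth-first walk that calls visitor(node) for every node in the tree.
function walk(node, visitor) {
  visitor(node);
  (node.children || []).forEach((child) => walk(child, visitor));
}

// Count words in text nodes and tally headings per depth.
function collectStats(tree) {
  const stats = { wordCount: 0, headings: { total: 0, byDepth: {} } };
  walk(tree, (node) => {
    if (node.type === 'text') {
      stats.wordCount += node.value.split(/\s+/).filter(Boolean).length;
    } else if (node.type === 'heading') {
      stats.headings.total++;
      stats.headings.byDepth[node.depth] =
        (stats.headings.byDepth[node.depth] || 0) + 1;
    }
  });
  return stats;
}

const heading = (depth, value) => ({
  type: 'heading',
  depth,
  children: [{ type: 'text', value }],
});
const para = (value) => ({
  type: 'paragraph',
  children: [{ type: 'text', value }],
});

const tree = {
  type: 'root',
  children: [
    heading(1, 'API Guide'),
    para('Read this first.'),
    heading(2, 'Setup'),
  ],
};

// Transform in place: demote every h1 to h2.
walk(tree, (node) => {
  if (node.type === 'heading' && node.depth === 1) node.depth = 2;
});

const stats = collectStats(tree);
console.log(stats.wordCount, stats.headings.total, stats.headings.byDepth[2]);
// 6 2 2
```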

Working with Files

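import fs from 'fs/promises';
import { MarkdownTreeParser } from '@kayvan/markdown-tree-parser';

const parser = new MarkdownTreeParser();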

// Read and process a file
const content = await fs.readFile('README.md', 'utf-8');
const tree = await parser.parse(content);

// Extract all sections and save to files
const sections = parser.extractAllSections(tree, 2);

for (let i = 0; i < sections.length; i++) {
  const section = sections[i];
  const filename = `section-${i + 1}.md`;
  const markdown = await parser.stringify(section.tree);
  await fs.writeFile(filename, markdown);
}
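
If numbered names like `section-1.md` are too opaque, one option is to derive the filename from the heading text. The helper below is a hypothetical sketch and not part of the library; in the loop above it could be fed the result of `parser.getHeadingText(section.heading)`.

```javascript
// Hypothetical helper: build a filesystem-friendly name from a heading,
// prefixed with a zero-padded index so files sort in document order.
function sectionFilename(headingText, index) {
  const slug = headingText
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-') // collapse anything non-alphanumeric
    .replace(/^-+|-+$/g, '');    // trim leading and trailing hyphens
  const prefix = String(index + 1).padStart(2, '0');
  return `${prefix}-${slug || 'section'}.md`;
}

console.log(sectionFilename('🔧 CLI Usage', 1));   // 02-cli-usage.md
console.log(sectionFilename('API Reference', 9)); // 10-api-reference.md
```

The fallback to `section` covers headings that contain no ASCII letters or digits at all.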

🎯 Use Cases

📖 Documentation Management - Split large docs into manageable sections
🌐 Static Site Generation - Process markdown for blogs and websites
📝 Content Organization - Restructure and reorganize markdown content
🔍 Content Analysis - Analyze document structure and extract insights
📋 Documentation Tools - Build custom documentation processing tools
🚀 Content Migration - Extract and transform content between formats

🏗️ API Reference

MarkdownTreeParser

Constructor

new MarkdownTreeParser(options = {})

Methods

parse(markdown) - Parse markdown into AST
stringify(tree) - Convert AST back to markdown
extractSection(tree, headingText, level?) - Extract specific section
extractAllSections(tree, level) - Extract all sections at level
select(tree, selector) - Find first node matching CSS selector
selectAll(tree, selector) - Find all nodes matching CSS selector
findNode(tree, condition) - Find node with custom condition
getHeadingText(headingNode) - Get text content of heading
getHeadingsList(tree) - Get all headings with metadata
getStats(tree) - Get document statistics
generateTableOfContents(tree, maxLevel) - Generate TOC
transform(tree, visitor) - Transform tree with visitor function

Convenience Functions

createParser(options) - Create new parser instance
extractSection(markdown, sectionName, options) - Quick section extraction
getHeadings(markdown, options) - Quick heading extraction
generateTOC(markdown, maxLevel, options) - Quick TOC generation

🔗 CSS-Like Selectors

The library supports powerful CSS-like selectors for searching:

// Element selectors
parser.selectAll(tree, 'heading')     // All headings
parser.selectAll(tree, 'paragraph')  // All paragraphs
parser.selectAll(tree, 'link')       // All links

// Attribute selectors
parser.selectAll(tree, 'heading[depth=1]')    // H1 headings
parser.selectAll(tree, 'heading[depth=2]')    // H2 headings
parser.selectAll(tree, 'link[url*="github"]') // Links containing "github"

// Pseudo selectors
parser.selectAll(tree, ':first-child')  // First child elements
parser.selectAll(tree, ':last-child')   // Last child elements
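
To show what the element and attribute selectors above mean in terms of the tree, here is a small self-contained matcher that handles only `type`, `type[attr=value]`, and `type[attr*="value"]`. All names in it (`parseSelector`, `matches`, `selectAllSketch`) are hypothetical; the library's real selector engine supports more than this sketch and is not reproduced here.

```javascript
// Parse `type`, `type[attr=value]`, or `type[attr*="value"]`.
function parseSelector(selector) {
  const match = /^(\w+)(?:\[(\w+)(\*?=)"?([^"\]]*)"?\])?$/.exec(selector);
  if (!match) throw new Error(`Unsupported selector: ${selector}`);
  const [, type, attr, op, value] = match;
  return { type, attr, op, value };
}

// Test one node against a parsed selector.
function matches(node, { type, attr, op, value }) {
  if (node.type !== type) return false;
  if (!attr) return true;                // bare element selector
  if (!(attr in node)) return false;     // attribute missing on this node
  const actual = String(node[attr]);
  return op === '*=' ? actual.includes(value) : actual === value;
}

// Collect every node in the tree that matches the selector.
function selectAllSketch(tree, selector) {
  const parsed = parseSelector(selector);
  const found = [];
  (function visit(node) {
    if (matches(node, parsed)) found.push(node);
    (node.children || []).forEach(visit);
  })(tree);
  return found;
}

const tree = {
  type: 'root',
  children: [
    { type: 'heading', depth: 1, children: [{ type: 'text', value: 'Title' }] },
    { type: 'heading', depth: 2, children: [{ type: 'text', value: 'Links' }] },
    {
      type: 'paragraph',
      children: [
        { type: 'link', url: 'https://github.com/ksylvan/markdown-tree-parser', children: [] },
        { type: 'link', url: 'https://example.com', children: [] },
      ],
    },
  ],
};

console.log(selectAllSketch(tree, 'heading').length);             // 2
console.log(selectAllSketch(tree, 'heading[depth=2]').length);    // 1
console.log(selectAllSketch(tree, 'link[url*="github"]').length); // 1
```

Attribute values are compared as strings here, which is why `depth=2` matches the numeric `depth: 2` on a heading node.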

🧪 Testing

# Run tests
npm test

# Test CLI
npm run test:cli

# Run examples
npm run example

🔧 Development

Prerequisites

Node.js 18+
npm

Setup

# Clone the repository
git clone https://github.com/ksylvan/markdown-tree-parser.git
cd markdown-tree-parser

# Install dependencies
npm install

# Run tests
npm test

# Run linting
npm run lint

# Format code
npm run format

# Test CLI functionality
npm run test:cli

CI/CD

This project uses GitHub Actions for continuous integration. The workflow automatically:

Tests against Node.js versions 18.x, 20.x, and 22.x
Runs linting with ESLint
Executes the full test suite
Tests CLI functionality
Verifies the package can be published

The CI badge in the README shows the current build status and links to the Actions page.

🤝 Contributing

Contributions are welcome! Please read our Contributing Guide for details.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Built on top of the excellent unified ecosystem:

remark - Markdown processing
mdast - Markdown AST specification
unist - Universal syntax tree utilities

📞 Support

Made with ❤️ by Kayvan Sylvan

@kayvan/markdown-tree-parser