seta-indexer
TypeScript icon, indicating that this package has built-in type declarations

0.1.1 • Public • Published

seta-indexer

A standalone CLI tool for vector database indexing and semantic search over documentation. Supports PDF, Markdown, text, and more. Powered by local embeddings and LanceDB.

Features

  • Indexes PDF, Markdown, MDX, TXT, JSON, YAML, XML, CSV
  • Intelligent chunking with sentence/paragraph boundary detection
  • Fast local embeddings with all-MiniLM-L6-v2 (via @xenova/transformers)
  • Vector similarity search with filtering (library, topic, difficulty)
  • Incremental updates and file change detection
  • Flexible file selection with --include and --exclude glob patterns

Installation

npm install -g seta-indexer
# or use npx
npx seta-indexer <folder> [options]

Usage

Index a documentation folder

npx seta-indexer /path/to/docs

Local usage (from cloned repo)

# Clone and setup
git clone https://github.com/techformist/seta-indexer.git
cd seta-indexer
npm install

# Build the project
npm run build

# Run locally with node
node dist/cli.js /path/to/docs

# Or use the dev script for development
npm run dev -- index /path/to/docs

Available commands (local usage)

# Index documents
node dist/cli.js index /path/to/docs --verbose

# Search indexed content
node dist/cli.js search "your query" /path/to/docs

# Show database statistics
node dist/cli.js stats /path/to/docs

# Clean/remove database
node dist/cli.js clean /path/to/docs

# Run tests
npm test

With options

  • --verbose, -v : Detailed logging
  • --force : Force re-index all files
  • --chunk-size <size> : Chunk size (default: 1000)
  • --chunk-overlap <overlap> : Overlap (default: 200)
  • --model <model> : Embedding model (default: all-MiniLM-L6-v2)
  • --db-path <path> : Custom DB path
  • --include <patterns...> : Glob patterns to include (e.g. **/*.md docs/**/*.pdf)
  • --exclude <patterns...> : Glob patterns to exclude (e.g. **/drafts/**)

Supported File Types

  • .pdf, .md, .mdx, .txt, .json, .yaml, .yml, .xml, .csv (by default)
  • Use --include/--exclude for custom file selection

Output Examples

Indexing:

🚀 Starting indexing process for: /docs
📁 Documentation path: /docs
🗄️  Database path: /docs/.seta_lancedb
📋 Loading existing index state...
🔍 Scanning documentation files...
📄 Found 25 documentation files
🧠 Initializing embedding model...
🔗 Connecting to LanceDB...
⚙️  Processing documentation files...
   📄 Processing: main_guide.md
   📝 Generated 12 chunks
   ✅ Generated 12 embedded chunks
✅ Indexing completed

Troubleshooting

  • Ensure all dependencies are installed (npm install)
  • For PDF extraction errors, check file integrity
  • For embedding errors, ensure enough RAM and disk space for model caching
  • For DB errors, use --force to re-index from scratch

License

MIT

Package Sidebar

Install

npm i seta-indexer

Weekly Downloads

12

Version

0.1.1

License

MIT

Unpacked Size

27.4 kB

Total Files

13

Last publish

Collaborators

  • techformist