@mimik/dataset-cli

1.0.0 • Public • Published

A CLI tool that parses a PDF file, generates text chunks with embeddings, and saves them as JSON for Retrieval-Augmented Generation (RAG).

Installation

To install the tool globally from npm, use the following command:

npm install -g @mimik/dataset-cli

Usage

After installing the tool globally, you can use the dataset-cli command to process a PDF file.

Options

• -i, --input: (required) Path to the input PDF file.
• -o, --output: (required) Path to the output MDF file.
• -u, --url: (optional) Embedding model URL. Default is http://localhost:8083/api/mim/v1/embeddings.
• -k, --apiKey: (optional) API key for the embedding model.
• -m, --model: (optional) Embedding model name. Default is "nomic-embed-text-v1.5.Q8_0".
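
Because -u, -k, and -m are optional, a small wrapper can pass them only when they are set and otherwise leave the tool on its defaults. The sketch below is one possible way to do that. It is not part of the package: the function name run_dataset_cli and the environment variables EMBEDDING_URL, EMBEDDING_API_KEY, EMBEDDING_MODEL, and DATASET_CLI are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper around dataset-cli (not shipped with the package).
# Optional flags are forwarded only when the matching variable is non-empty,
# so an unset variable leaves the tool's documented default in effect.
run_dataset_cli() {
  input="$1"
  output="$2"

  # Start with the two required flags.
  set -- -i "$input" -o "$output"

  # Append optional flags from the (hypothetical) environment variables.
  if [ -n "${EMBEDDING_URL:-}" ]; then set -- "$@" -u "$EMBEDDING_URL"; fi
  if [ -n "${EMBEDDING_API_KEY:-}" ]; then set -- "$@" -k "$EMBEDDING_API_KEY"; fi
  if [ -n "${EMBEDDING_MODEL:-}" ]; then set -- "$@" -m "$EMBEDDING_MODEL"; fi

  # DATASET_CLI is an assumed override hook for dry runs; defaults to dataset-cli.
  "${DATASET_CLI:-dataset-cli}" "$@"
}
```

With the function loaded, `run_dataset_cli input.pdf output.mdf` runs with the defaults, and `DATASET_CLI=echo run_dataset_cli input.pdf output.mdf` prints the arguments instead of running the tool. Reading the API key from a variable also keeps it out of the typed command line.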

Example Command

dataset-cli -i /path/to/your/input.pdf -o /path/to/your/output.mdf -u http://localhost:1234/v1/embeddings -k your-api-key -m your-model-name
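
The command handles one PDF per invocation, so converting a folder of documents takes a loop. The following is a minimal sketch under stated assumptions: the function name convert_all and the DATASET_CLI override are hypothetical, and the output naming (same base name with an .mdf extension) is only one possible convention.

```shell
#!/bin/sh
# Hypothetical helper: run dataset-cli once for every PDF in a directory,
# writing <name>.mdf next to each <name>.pdf.
convert_all() {
  dir="$1"
  for pdf in "$dir"/*.pdf; do
    # If the directory holds no PDFs the pattern stays unexpanded; skip it.
    [ -e "$pdf" ] || continue
    # DATASET_CLI is an assumed override hook for dry runs; defaults to dataset-cli.
    # Stop at the first file that fails to convert.
    "${DATASET_CLI:-dataset-cli}" -i "$pdf" -o "${pdf%.pdf}.mdf" || return 1
  done
}
```

Calling `convert_all /path/to/your/pdfs` uses the default embedding URL and model for every file. Add -u, -k, or -m to the command inside the loop if a different endpoint is needed.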

License

ISC

Collaborators

  • sasan.raisdana
  • miburger
  • hofachiang
  • mimik-npm-editor
  • mimikopensource