@fnet/ocr-text-coords

This project extracts text from images, together with the bounding box coordinates of each word, using the Tesseract.js library. It accepts input as a file path, a URL, or a Base64-encoded string and can recognize text in multiple languages.

How It Works

The project uses Tesseract.js to perform Optical Character Recognition (OCR) on an image. You supply the image as a file path, URL, or Base64 string, optionally with a language code (the default is "eng"). The project then recognizes the text and records a bounding box for each detected word. The result is a structured object containing the full extracted text and a list of words with their coordinates.
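The README does not say how the library tells the three input kinds apart. As a rough illustration only, a classifier along these lines could route the input; `detectInputKind` is a hypothetical helper, not part of @fnet/ocr-text-coords, and this sketch assumes Base64 data arrives in data-URL form (`data:<mime>;base64,...`).

```javascript
// Hypothetical helper (not part of @fnet/ocr-text-coords): a rough guess at
// how an imageInput string could be classified before it is handed to OCR.
function detectInputKind(imageInput) {
  if (typeof imageInput !== 'string' || imageInput.length === 0) {
    throw new TypeError('imageInput must be a non-empty string');
  }
  // Assumption: Base64 input is a data URL with a "data:<mime>;base64," prefix.
  if (/^data:[^;,]+;base64,/i.test(imageInput)) {
    return 'base64';
  }
  // Only http(s) URLs are treated as remote images.
  if (/^https?:\/\//i.test(imageInput)) {
    return 'url';
  }
  // Anything else is assumed to be a file-system path.
  return 'path';
}

console.log(detectInputKind('https://example.com/image.jpg')); // url
console.log(detectInputKind('./local-image.png')); // path
```

A bare Base64 string without the `data:` prefix would be classified as a path by this sketch; the real library may handle that case differently.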

Key Features

Text Extraction: Automatically extracts text from images.
Bounding Box Coordinates: Provides the coordinates for each detected word.
Multiple Input Formats: Works with file paths, URLs, or Base64 image data.
Language Support: Allows recognition in multiple languages by specifying language codes.
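The exact shape of the returned bounding boxes is not documented here. Assuming they follow the Tesseract.js convention of corner coordinates `{ x0, y0, x1, y1 }` in pixels (origin at the top-left), a box can be converted to the more common position-plus-size form. `toRect` is a hypothetical helper for illustration.

```javascript
// Assumption: each bounding box uses Tesseract.js-style corner coordinates
// { x0, y0, x1, y1 }, in pixels, with the origin at the image's top-left.
// toRect is a hypothetical helper, not an export of @fnet/ocr-text-coords.
function toRect({ x0, y0, x1, y1 }) {
  return {
    x: x0,
    y: y0,
    width: x1 - x0,
    height: y1 - y0,
  };
}

// Example: a word spanning pixels 10..58 horizontally and 20..42 vertically.
const rect = toRect({ x0: 10, y0: 20, x1: 58, y1: 42 });
console.log(rect); // { x: 10, y: 20, width: 48, height: 22 }
```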

Conclusion

This project offers a straightforward way to extract and locate text within images across several input formats and languages. It exposes Tesseract.js through a single function and returns structured results that can be passed directly to further processing or analysis.

Developer Guide for @fnet/ocr-text-coords

Overview

The @fnet/ocr-text-coords library extracts text from images along with the bounding box coordinates of each word. It performs Optical Character Recognition (OCR) with Tesseract.js and accepts images as file paths, URLs, or Base64 strings, which makes it suitable for applications that need to read text from visual media.

Installation

You can install the @fnet/ocr-text-coords library using either npm or yarn:

npm install @fnet/ocr-text-coords

Or with yarn:

yarn add @fnet/ocr-text-coords

Usage

The library exports a single default function, extractTextWithCoords, which takes an options object and resolves to the extracted text and word coordinates:

Basic Usage

First, import the library into your project:

import extractTextWithCoords from '@fnet/ocr-text-coords';

You can then use the function to process an image:

(async () => {
  try {
    const result = await extractTextWithCoords({ imageInput: 'path/to/your/image.jpg', language: 'eng' });
    console.log(result.text); // Outputs full extracted text
    console.log(result.words); // Outputs array of words with bounding boxes
  } catch (error) {
    console.error('Error extracting text:', error.message);
  }
})();
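A common next step is to keep only the words that fall inside a region of interest, such as a form field or a page footer. This sketch assumes each entry of `result.words` looks like `{ text, bbox: { x0, y0, x1, y1 } }`, mirroring Tesseract.js word objects; the actual field names may differ. `wordsInRegion` is a hypothetical helper and the sample words are made up for illustration.

```javascript
// Assumption: each entry of result.words has the shape
// { text: string, bbox: { x0, y0, x1, y1 } }, mirroring Tesseract.js word objects.
// wordsInRegion is a hypothetical helper for post-processing, not a library export.
function wordsInRegion(words, region) {
  // Keep a word only if its whole box lies inside the region.
  return words.filter(({ bbox }) =>
    bbox.x0 >= region.x0 &&
    bbox.y0 >= region.y0 &&
    bbox.x1 <= region.x1 &&
    bbox.y1 <= region.y1
  );
}

// Illustrative input only; real values would come from extractTextWithCoords.
const words = [
  { text: 'Invoice', bbox: { x0: 40, y0: 30, x1: 160, y1: 60 } },
  { text: 'Total:', bbox: { x0: 40, y0: 400, x1: 110, y1: 425 } },
  { text: '42.00', bbox: { x0: 120, y0: 400, x1: 190, y1: 425 } },
];

// Keep only words that lie entirely inside the bottom band of the page.
const footer = wordsInRegion(words, { x0: 0, y0: 380, x1: 600, y1: 450 });
console.log(footer.map((w) => w.text).join(' ')); // prints: Total: 42.00
```

Requiring full containment avoids picking up words that merely touch the region; an overlap test would be the looser alternative.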

Parameters

imageInput (required): The image to process, given as a file path, a URL, or a Base64-encoded string.
language (optional): A string specifying the language(s) for OCR. The default is "eng". You can specify multiple languages by joining their codes with a plus sign (e.g., "eng+tur").
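When the set of languages comes from configuration or user input, it can help to assemble the `language` value programmatically. The helper below is a hypothetical sketch, not part of the package: it joins codes with a plus sign, drops blanks and duplicates while keeping order, and falls back to the documented default "eng".

```javascript
// Hypothetical helper (not part of the package): builds the `language` value
// from a list of Tesseract language codes, e.g. ['eng', 'tur'] -> 'eng+tur'.
// Falls back to 'eng', the documented default, when no codes are given.
function buildLanguage(codes = []) {
  const cleaned = codes
    .map((code) => String(code).trim())
    .filter((code) => code.length > 0);
  // Remove duplicates while preserving the caller's order.
  const unique = [...new Set(cleaned)];
  return unique.length > 0 ? unique.join('+') : 'eng';
}

console.log(buildLanguage(['eng', 'tur'])); // eng+tur
console.log(buildLanguage([' eng ', 'spa', 'eng'])); // eng+spa
console.log(buildLanguage()); // eng
```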

Examples

Here are a few practical examples demonstrating typical use cases:

Extracting Text from a Local File

import extractTextWithCoords from '@fnet/ocr-text-coords';

// Process an image from a local file (language defaults to "eng")
(async () => {
  const result = await extractTextWithCoords({ imageInput: './local-image.png' });
  console.log(result);
})();

Extracting Text from a URL

import extractTextWithCoords from '@fnet/ocr-text-coords';

// Process an image from a URL, recognizing English and Spanish text
(async () => {
  const result = await extractTextWithCoords({ imageInput: 'https://example.com/image.jpg', language: 'eng+spa' });
  console.log(result);
})();

Extracting Text from a Base64 String

import extractTextWithCoords from '@fnet/ocr-text-coords';

// Process an image from a Base64 string
(async () => {
  const base64String = 'data:image/png;base64,iVBORw0KGgoAAAANS...';
  const result = await extractTextWithCoords({ imageInput: base64String });
  console.log(result);
})();
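If the image is already in memory as raw bytes, a data URL like the one above can be built in Node.js with the global `Buffer`. `toDataUrl` is a hypothetical helper, and the MIME type must be supplied by the caller because this sketch does not inspect the bytes.

```javascript
// Hypothetical helper (not part of the package): wraps raw image bytes in a
// data URL of the form "data:<mime>;base64,<payload>".
// Assumes Node.js, where Buffer is available as a global.
function toDataUrl(bytes, mimeType = 'image/png') {
  const base64 = Buffer.from(bytes).toString('base64');
  return `data:${mimeType};base64,${base64}`;
}

// The first four bytes of every PNG file: 0x89 'P' 'N' 'G'.
const pngSignature = Uint8Array.from([0x89, 0x50, 0x4e, 0x47]);
console.log(toDataUrl(pngSignature)); // data:image/png;base64,iVBORw==
```

In practice you might pass the Buffer returned by `fs.readFileSync('./local-image.png')`, although passing the file path directly, as in the local-file example, avoids the conversion entirely.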

Acknowledgement

This library uses Tesseract.js for OCR processing. Tesseract.js is an open-source JavaScript port of the Tesseract OCR engine that runs in both Node.js and the browser.

Input Schema

$schema: https://json-schema.org/draft/2020-12/schema
type: object
properties:
  imageInput:
    type: string
    description: The path, URL, or Base64 string of the image to be processed.
  language:
    type: string
    description: The language code(s) to use for OCR (e.g., "eng", "tur", or "eng+tur").
    default: eng
required:
  - imageInput
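The schema above can be enforced with a JSON Schema validator such as Ajv. For a dependency-free illustration, the following hand-written check mirrors the same rules: `imageInput` is a required string and `language` is an optional string defaulting to "eng". `normalizeOptions` is hypothetical; the package may validate its input differently.

```javascript
// Minimal hand-written check that mirrors the input schema above: imageInput
// is a required string, and language is an optional string defaulting to "eng".
// normalizeOptions is hypothetical; the package may validate differently.
function normalizeOptions(options) {
  if (options === null || typeof options !== 'object') {
    throw new TypeError('options must be an object');
  }
  // Apply the schema default only when language is absent (undefined).
  const { imageInput, language = 'eng' } = options;
  if (typeof imageInput !== 'string') {
    throw new TypeError('imageInput is required and must be a string');
  }
  if (typeof language !== 'string') {
    throw new TypeError('language must be a string');
  }
  return { imageInput, language };
}

console.log(normalizeOptions({ imageInput: './local-image.png' }));
// { imageInput: './local-image.png', language: 'eng' }
```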