This project extracts text and its corresponding bounding box coordinates from images using the Tesseract.js library. It accepts input as a file path, URL, or Base64-encoded string and can recognize multiple languages.
The project uses Tesseract.js to perform Optical Character Recognition (OCR) on a given image. You provide an image through a file path, URL, or Base64 string, along with a language specification. The project then processes the image, identifying text and marking each word's location with a bounding box. This information is returned as a structured object containing both the extracted text and coordinates.
- Text Extraction: Automatically extracts text from images.
- Bounding Box Coordinates: Provides the coordinates for each detected word.
- Multiple Input Formats: Works with file paths, URLs, or Base64 image data.
- Language Support: Allows recognition in multiple languages by specifying language codes.
This project offers a straightforward way to extract and locate text within images, applicable to a variety of formats and languages. It provides a simple interface to access Tesseract.js capabilities, returning structured results that can be easily integrated into further processing or analysis tasks.
The @fnet/ocr-text-coords library helps developers extract textual content from images along with its bounding box coordinates. Using Optical Character Recognition (OCR) powered by Tesseract.js, it supports images provided as file paths, URLs, or Base64 strings. Developers can integrate it into applications that need to extract text from visual media.
You can install the @fnet/ocr-text-coords library using either npm or yarn:
npm install @fnet/ocr-text-coords
Or with yarn:
yarn add @fnet/ocr-text-coords
The library exposes a single public function for extracting text from images.
First, import the library into your project:
import extractTextWithCoords from '@fnet/ocr-text-coords';
You can then use the function to process an image:
(async () => {
  try {
    const result = await extractTextWithCoords({ imageInput: 'path/to/your/image.jpg', language: 'eng' });
    console.log(result.text); // Outputs full extracted text
    console.log(result.words); // Outputs array of words with bounding boxes
  } catch (error) {
    console.error('Error extracting text:', error.message);
  }
})();
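Once the words and their boxes are available, a common next step is to select the words that fall inside a region of the image. The sketch below illustrates that idea on hand-written sample data. It assumes each entry in `result.words` carries a `text` string and a `bbox` object with `x0`, `y0`, `x1`, `y1` corners, which is how Tesseract.js describes words; the exact shape returned by this library is not documented here, so treat the field names as an assumption. The helpers `toRect` and `wordsInRegion` are hypothetical and not part of the library.

```javascript
// Assumed word shape (mirrors Tesseract.js; not confirmed for this library):
// { text: string, bbox: { x0, y0, x1, y1 } }

// Convert a corner-based box into x/y/width/height form.
function toRect(bbox) {
  return {
    x: bbox.x0,
    y: bbox.y0,
    width: bbox.x1 - bbox.x0,
    height: bbox.y1 - bbox.y0,
  };
}

// Keep only the words whose boxes lie fully inside the given region.
function wordsInRegion(words, region) {
  return words.filter(({ bbox }) =>
    bbox.x0 >= region.x0 &&
    bbox.y0 >= region.y0 &&
    bbox.x1 <= region.x1 &&
    bbox.y1 <= region.y1
  );
}

// Hand-written sample data standing in for result.words.
const sampleWords = [
  { text: 'Invoice', bbox: { x0: 10, y0: 10, x1: 90, y1: 30 } },
  { text: 'Total', bbox: { x0: 10, y0: 200, x1: 60, y1: 220 } },
];

const header = wordsInRegion(sampleWords, { x0: 0, y0: 0, x1: 100, y1: 50 });
console.log(header.map((w) => w.text)); // [ 'Invoice' ]
console.log(toRect(sampleWords[0].bbox)); // { x: 10, y: 10, width: 80, height: 20 }
```

If the real word objects use different field names, only the property accesses in these two helpers would need to change.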
- imageInput: A string representing the image input. This can be a file path, a URL, or a Base64-encoded string.
- language (optional): A string specifying the language(s) for OCR. The default is "eng". You can specify multiple languages by joining their codes with a plus sign (e.g., "eng+tur").
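When the languages come from configuration or user input, the plus-joined string can be assembled programmatically. The following is a minimal sketch; `buildLanguage` is a hypothetical helper, not something the library provides, and the fallback to "eng" simply mirrors the documented default.

```javascript
// Hypothetical helper, not part of @fnet/ocr-text-coords.
// Joins language codes with "+", dropping blanks and duplicates.
function buildLanguage(codes = []) {
  const cleaned = codes
    .map((code) => String(code).trim())
    .filter((code) => code.length > 0);
  const unique = [...new Set(cleaned)];
  // Fall back to the documented default when nothing usable was given.
  return unique.length > 0 ? unique.join('+') : 'eng';
}

console.log(buildLanguage(['eng', 'tur'])); // eng+tur
console.log(buildLanguage(['eng', ' eng ', ''])); // eng
console.log(buildLanguage([])); // eng
```

The resulting string can be passed as the `language` parameter.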
Here are a few practical examples demonstrating typical use cases:
import extractTextWithCoords from '@fnet/ocr-text-coords';
// Process an image from a local file
(async () => {
  const result = await extractTextWithCoords({ imageInput: './local-image.png' });
  console.log(result);
})();
import extractTextWithCoords from '@fnet/ocr-text-coords';
// Process an image from a URL
(async () => {
  const result = await extractTextWithCoords({ imageInput: 'https://example.com/image.jpg', language: 'eng+spa' });
  console.log(result);
})();
import extractTextWithCoords from '@fnet/ocr-text-coords';
// Process an image from a Base64 string
(async () => {
  const base64String = '...';
  const result = await extractTextWithCoords({ imageInput: base64String });
  console.log(result);
})();
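The Base64 example above leaves the string itself elided. As a rough illustration of where such a string might come from in Node.js, the sketch below encodes raw bytes with the built-in `Buffer`. The `toBase64` helper is hypothetical, and the four bytes used are only the start of the PNG file signature, not a complete image. Whether the library expects a bare Base64 string or a `data:` URI is not stated here, so check that before relying on either form.

```javascript
// Hypothetical helper: encode raw image bytes as a Base64 string.
// Buffer is a Node.js global, so no import is needed.
function toBase64(bytes) {
  return Buffer.from(bytes).toString('base64');
}

// The first four bytes of the PNG file signature, used as stand-in data.
const pngSignatureStart = Uint8Array.from([0x89, 0x50, 0x4e, 0x47]);
const encoded = toBase64(pngSignatureStart);
console.log(encoded); // iVBORw==
```

For a real file, the bytes would typically come from `fs.readFileSync(path)`, which returns a `Buffer` that can be encoded the same way.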
This library uses Tesseract.js for OCR processing. Tesseract.js is an open-source JavaScript library that runs the Tesseract OCR engine and is supported by an active developer community.
$schema: https://json-schema.org/draft/2020-12/schema
type: object
properties:
  imageInput:
    type: string
    description: The path, URL, or Base64 string of the image to be processed.
  language:
    type: string
    description: The language code(s) to use for OCR (e.g., "eng", "tur", or "eng+tur").
    default: eng
required:
  - imageInput
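To show how the schema above translates into runtime checks, here is a minimal hand-written sketch that enforces the required `imageInput` string and applies the `eng` default. `normalizeInput` is a hypothetical function for illustration; it is not a general JSON Schema validator and is not claimed to be how the library validates its arguments.

```javascript
// Hypothetical helper mirroring the schema above; not a general JSON Schema validator.
function normalizeInput(input) {
  if (input === null || typeof input !== 'object') {
    throw new TypeError('input must be an object');
  }
  if (typeof input.imageInput !== 'string') {
    throw new TypeError('imageInput is required and must be a string');
  }
  // Apply the schema default when language is omitted.
  const language = input.language === undefined ? 'eng' : input.language;
  if (typeof language !== 'string') {
    throw new TypeError('language must be a string');
  }
  return { imageInput: input.imageInput, language };
}

console.log(normalizeInput({ imageInput: './local-image.png' }));
// { imageInput: './local-image.png', language: 'eng' }
```

A full validator such as a JSON Schema library would be the more robust choice in production; the hand-written version only keeps the example dependency-free.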