A simple recipe ingredient and instruction parser that avoids regexes as much as possible.
Install the package from npmjs.com:
npm install @jlucaspains/sharp-recipe-parser
# or
yarn add @jlucaspains/sharp-recipe-parser
Then:
import { parseIngredient, parseInstruction } from '@jlucaspains/sharp-recipe-parser';
// with default options
parseIngredient('300g flour', 'en');
// results in
// {
// quantity: 300,
// quantityText: '300',
// minQuantity: 300,
// maxQuantity: 300,
// unit: 'gram',
// unitText: 'g',
// ingredient: 'flour',
// extra: '',
// alternativeQuantities: []
// }
// with explicit options
parseIngredient('300g flour, very fine', 'en', { includeAlternativeUnits: true, includeExtra: true });
// results in
// {
// quantity: 300,
// quantityText: '300',
// minQuantity: 300,
// maxQuantity: 300,
// unit: 'gram',
// unitText: 'g',
// ingredient: 'flour',
// extra: 'very fine',
// alternativeQuantities: [
// {
// quantity: 0.6614,
// unit: 'lb',
// unitText: 'pound',
// minQuantity: 0.6614,
// maxQuantity: 0.6614
// },
// {
// quantity: 0.3,
// unit: 'kg',
// unitText: 'kilogram',
// minQuantity: 0.3,
// maxQuantity: 0.3
// },
// {
// quantity: 10.5822,
// unit: 'oz',
// unitText: 'ounce',
// minQuantity: 10.5822,
// maxQuantity: 10.5822
// },
// {
// quantity: 300000,
// unit: 'mg',
// unitText: 'milligram',
// minQuantity: 300000,
// maxQuantity: 300000
// }
// ]
// }
// with default options
parseInstruction('Bake at 400F for 30 minutes.');
// results in
// {
// totalTimeInSeconds: 1800,
// timeItems: [ { timeInSeconds: 1800, timeUnitText: 'minutes', timeText: '30' } ],
// temperature: 400,
// temperatureText: '400',
// temperatureUnit: 'fahrenheit',
// temperatureUnitText: 'F'
// }
// with explicit options
parseInstruction('Bake at 400F for 30 minutes.', { includeAlternativeTemperatureUnit: true });
// results in
// {
// totalTimeInSeconds: 1800,
// timeItems: [ { timeInSeconds: 1800, timeUnitText: 'minutes', timeText: '30' } ],
// temperature: 400,
// temperatureText: '400',
// temperatureUnit: 'fahrenheit',
// temperatureUnitText: 'F',
// alternativeTemperatures: [
// {
// quantity: 204.4444,
// unit: 'C',
// minQuantity: 204.4444,
// maxQuantity: 204.4444
// }
// ]
// }
sharp-recipe-parser tokenizes ingredient and instruction phrases with a simple technique that preserves words and punctuation. After tokenization, rules developed specifically for recipe parsing run in this order:
- Look through the tokens for numbers (e.g. 1, 10, 1.5, 1/2, 1 1/4, one, etc)
- Fractions are parsed using fraction.js
- Word numbers (e.g. one, two, etc) are looked up in a language-specific dictionary
- Ranges for min and max are determined by markers defined in a language-specific dictionary (e.g. -, to)
- If no numbers are found, reset the index so next step starts at token 0
- Assume the next word is a UOM, singularize it, and look it up in the language-specific UOM dictionary
- Assume the next words up to a comma are the ingredient description
- Anything after the comma is an extra
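The steps above can be sketched roughly as follows. This is a minimal illustration, not the library's actual implementation: `tokenizeIngredient`, `parseIngredientSketch`, and the two tiny dictionaries are hypothetical names made up for this example, and the sketch only handles whole numbers, decimals, and word numbers (no fractions or ranges).

```typescript
// Hypothetical sketch of the tokenizer-plus-rules idea; not the library's internals.
type IngredientSketch = {
  quantity: number;
  unit: string;
  ingredient: string;
  extra: string;
};

// Tiny stand-ins for the language-specific dictionaries (assumed contents).
const WORD_NUMBERS = new Map<string, number>([["one", 1], ["two", 2], ["three", 3]]);
const UNITS = new Map<string, string>([["g", "gram"], ["gram", "gram"], ["cup", "cup"]]);

const isDigit = (ch: string): boolean => ch >= "0" && ch <= "9";

// Split without regex: whitespace separates tokens, a comma is its own token,
// and a switch between digits and letters starts a new token ("300g" -> "300", "g").
function tokenizeIngredient(text: string): string[] {
  const tokens: string[] = [];
  let current = "";
  let currentIsNumber = false;
  const flush = (): void => {
    if (current !== "") tokens.push(current);
    current = "";
  };
  for (const ch of text) {
    if (ch === " " || ch === "\t") {
      flush();
    } else if (ch === ",") {
      flush();
      tokens.push(ch);
    } else {
      const numeric = isDigit(ch) || ch === ".";
      if (current !== "" && numeric !== currentIsNumber) flush();
      currentIsNumber = numeric;
      current += ch;
    }
  }
  flush();
  return tokens;
}

// Naive singularization, good enough for "cups" -> "cup" in this sketch.
function singularize(word: string): string {
  return word.length > 1 && word.endsWith("s") ? word.slice(0, -1) : word;
}

function parseIngredientSketch(text: string): IngredientSketch {
  const tokens = tokenizeIngredient(text.toLowerCase());
  let index = 0;
  let quantity = 0;

  // Step 1: look for a number (digits or a word number) in the first token.
  const first = tokens[0] ?? "";
  const wordValue = WORD_NUMBERS.get(first);
  if (isDigit(first.charAt(0))) {
    quantity = Number(first);
    index = 1;
  } else if (wordValue !== undefined) {
    quantity = wordValue;
    index = 1;
  }
  // If no number was found, index stays at 0 so the next step starts at token 0.

  // Step 2: assume the next word is a UOM and look up its singular form.
  let unit = "";
  const unitName = UNITS.get(singularize(tokens[index] ?? ""));
  if (unitName !== undefined) {
    unit = unitName;
    index += 1;
  }

  // Step 3: words up to the first comma are the ingredient; the rest is the extra.
  const comma = tokens.indexOf(",", index);
  const end = comma === -1 ? tokens.length : comma;
  const ingredient = tokens.slice(index, end).join(" ");
  const extra = comma === -1 ? "" : tokens.slice(comma + 1).join(" ");

  return { quantity, unit, ingredient, extra };
}

console.log(parseIngredientSketch("300g flour, very fine"));
// { quantity: 300, unit: 'gram', ingredient: 'flour', extra: 'very fine' }
```

Keeping the dictionaries as plain lookup tables is what makes the approach translatable: supporting another language would mean swapping the tables, not rewriting the rules.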
- Identify the quantity from whole numbers (2), decimals (1.5), fractions (1/2), Unicode fractions (½), composite fractions (1 1/2), and ranges (1-2)
- Identify 58 notations of English-language UOMs plus appropriate plural words (e.g. cup, cups, g, gram, grams, etc). See all UOMs in units.en.ts in the source code.
- Calculate alternative quantity UOMs
- Identify the ingredient
- Note that text in parentheses is ignored, so 1 cup (150g) flour will only identify flour as the ingredient
- Automatically remove prepositions from ingredients (e.g. 10g of flour; only flour is identified as the ingredient)
- Identify extra instructions (e.g. 1 cup of carrots, cut small; cut small becomes extra)
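To illustrate the quantity formats listed above, here is a small sketch that turns already-split tokens into a min/max pair. It is an assumption-laden example: the library is described as using fraction.js for fractions, whereas this sketch does the arithmetic by hand to stay self-contained, and `parseNumberToken`, `readValue`, `parseQuantity`, and the dictionary contents are hypothetical names, not the library's API. It also assumes the tokenizer has already separated range markers, so `1-2` arrives as `1`, `-`, `2`.

```typescript
// Hypothetical sketch: quantity parsing over pre-split tokens, without regex.
type QuantitySketch = { min: number; max: number };

// Assumed dictionary entries; a real dictionary would list many more.
const UNICODE_FRACTIONS = new Map<string, number>([["½", 0.5], ["¼", 0.25], ["¾", 0.75]]);
const RANGE_MARKERS = new Set<string>(["-", "to"]);

// Parse a single token such as "2", "1.5", "1/2", or "½".
function parseNumberToken(token: string): number | undefined {
  const unicode = UNICODE_FRACTIONS.get(token);
  if (unicode !== undefined) return unicode;
  const parts = token.split("/");
  if (parts.length > 2) return undefined;
  const values = parts.map((part) => (part.trim() === "" ? NaN : Number(part)));
  if (values.some((value) => Number.isNaN(value))) return undefined;
  const numerator = values[0] ?? NaN;
  const denominator = values[1] ?? 1;
  if (denominator === 0) return undefined;
  return numerator / denominator;
}

// Read one value starting at `start`, merging "1" + "1/2" into 1.5 (composite fraction).
function readValue(tokens: string[], start: number): { value: number; next: number } | undefined {
  const first = parseNumberToken(tokens[start] ?? "");
  if (first === undefined) return undefined;
  const nextToken = tokens[start + 1] ?? "";
  const second = parseNumberToken(nextToken);
  const nextIsFraction = nextToken.includes("/") || UNICODE_FRACTIONS.has(nextToken);
  if (second !== undefined && nextIsFraction && Number.isInteger(first)) {
    return { value: first + second, next: start + 2 };
  }
  return { value: first, next: start + 1 };
}

// Read a quantity, treating "<value> <marker> <value>" as a min/max range.
function parseQuantity(tokens: string[]): QuantitySketch | undefined {
  const low = readValue(tokens, 0);
  if (low === undefined) return undefined;
  if (RANGE_MARKERS.has(tokens[low.next] ?? "")) {
    const high = readValue(tokens, low.next + 1);
    if (high !== undefined) return { min: low.value, max: high.value };
  }
  return { min: low.value, max: low.value };
}

console.log(parseQuantity(["1", "1/2", "cups"])); // { min: 1.5, max: 1.5 }
console.log(parseQuantity(["1", "-", "2", "cups"])); // { min: 1, max: 2 }
console.log(parseQuantity(["½", "cup"])); // { min: 0.5, max: 0.5 }
```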
- Identify instances of time units in minutes, hours, and days
- "Bake for 30 minutes"
- "Rise for 2 hours"
- "Wait 3 days"
- Identify the temperature in Fahrenheit or Celsius.
- "180C"
- "350F"
- "180°C"
- "180 degree celsius"
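The time and temperature examples above can be recognized with the same tokenizer-and-dictionary idea. The sketch below is illustrative only: `tokenizeInstruction`, `parseInstructionSketch`, and the dictionary contents are hypothetical, the result shape is a simplified stand-in for the library's output, and only whole numbers are handled (a decimal such as 1.5 would be split at the dot).

```typescript
// Hypothetical sketch: time and temperature detection without regex.
type TemperatureUnit = "fahrenheit" | "celsius";
type InstructionSketch = {
  totalTimeInSeconds: number;
  temperature: number | undefined;
  temperatureUnit: TemperatureUnit | undefined;
};

// Assumed dictionary entries for this example.
const TIME_UNITS = new Map<string, number>([
  ["minute", 60], ["minutes", 60],
  ["hour", 3600], ["hours", 3600],
  ["day", 86400], ["days", 86400],
]);
const TEMPERATURE_UNITS = new Map<string, TemperatureUnit>([
  ["f", "fahrenheit"], ["fahrenheit", "fahrenheit"],
  ["c", "celsius"], ["celsius", "celsius"],
]);
const DEGREE_MARKERS = new Set<string>(["°", "degree", "degrees"]);

type CharKind = "digit" | "letter" | "space" | "other";

function kindOf(ch: string): CharKind {
  if (ch >= "0" && ch <= "9") return "digit";
  if (ch.toLowerCase() !== ch.toUpperCase()) return "letter"; // cased characters only
  if (ch.trim() === "") return "space";
  return "other";
}

// Group runs of digits and runs of letters; any other symbol is its own token.
function tokenizeInstruction(text: string): string[] {
  const tokens: string[] = [];
  let current = "";
  let currentKind: CharKind = "space";
  for (const ch of text) {
    const kind = kindOf(ch);
    if (kind === currentKind && (kind === "digit" || kind === "letter")) {
      current += ch;
      continue;
    }
    if (current !== "") tokens.push(current);
    current = kind === "space" ? "" : ch;
    currentKind = kind;
  }
  if (current !== "") tokens.push(current);
  return tokens;
}

function parseInstructionSketch(text: string): InstructionSketch {
  const tokens = tokenizeInstruction(text.toLowerCase());
  const result: InstructionSketch = {
    totalTimeInSeconds: 0,
    temperature: undefined,
    temperatureUnit: undefined,
  };
  for (let i = 0; i < tokens.length; i++) {
    const token = tokens[i] ?? "";
    if (kindOf(token.charAt(0)) !== "digit") continue;
    const value = Number(token);
    // Skip an optional degree marker between the number and its unit.
    let next = i + 1;
    while (DEGREE_MARKERS.has(tokens[next] ?? "")) next++;
    const unitToken = tokens[next] ?? "";
    const seconds = TIME_UNITS.get(unitToken);
    const temperatureUnit = TEMPERATURE_UNITS.get(unitToken);
    if (seconds !== undefined) {
      result.totalTimeInSeconds += value * seconds;
    } else if (temperatureUnit !== undefined) {
      result.temperature = value;
      result.temperatureUnit = temperatureUnit;
    }
  }
  return result;
}

console.log(parseInstructionSketch("Bake at 400F for 30 minutes."));
// { totalTimeInSeconds: 1800, temperature: 400, temperatureUnit: 'fahrenheit' }
```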
- By default, regex is not allowed. If a regex is absolutely necessary, it will be reviewed on a case-by-case basis
- All changes need to have appropriate translation in the same PR
- Open an issue describing the problem you are trying to fix before opening a PR. That should help ensure all PRs are reviewed and approved.
- Please be nice. This is a work of love, not money.
- Why not use AI? I've tried a few models, such as the New York Times Ingredient Phrase Tagger, but none yielded satisfactory results. Mostly, they introduced dependencies that were less than ideal.
- Where is this used? The library was developed side-by-side with Sharp Cooking. As soon as version 1 of the library is released, I expect Sharp Cooking will leverage it in PROD as well.
- Why not regex? Regex quickly becomes clunky and hard to understand at a glance. There are reasonably simple rules that can be followed to parse and understand a recipe using a tokenizer and dictionaries to look up specific data.