mongo-checker

1.0.0 • Public • Published

MIT License

mongoChecker

CLI tool that finds duplicate values in a MongoDB collection by a chosen field.

Features

  • Finds duplicates by any user-defined field ("createdAt", "text", "price", etc.).
  • Dates are formatted as ISO. Arrays and objects are output via JSON.stringify.
  • Works with a configuration file containing: uri, db, collection, field, allowDiskUse, maxDuplicatesToShow.
  • Informative logs:

The screenshot shows an example check of the "posts" collection (10,000,000 documents) by the "createdAt" field. The documents were created in turboMaker with timeStepMs: 0.

How it works

Aggregation pipeline:

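const duplicates = await coll.aggregate(
  [
    // Group documents by the value of the chosen field and count each group.
    { $group: { _id: `$${field}`, count: { $sum: 1 } } },
    // Keep only values that occur more than once.
    { $match: { count: { $gt: 1 } } },
    // Most frequent duplicates first.
    { $sort: { count: -1 } }
  ],
  { allowDiskUse }
).toArray();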

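In the $group stage, _id is the value of the chosen field (date, string, number, object, or array), so documents with equal values fall into the same group. The $match stage then keeps only groups with more than one document, and $sort orders them by count, highest first.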
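To make the grouping logic concrete, here is a minimal in-memory sketch of what the three pipeline stages compute. It is only an illustration: findDuplicates and sampleDocs are hypothetical names that are not part of mongo-checker, and the string key used for grouping is a simplification of how MongoDB compares values.

```javascript
// In-memory illustration of $group -> $match -> $sort.
// findDuplicates is a hypothetical helper, not part of mongo-checker.
function findDuplicates(docs, field) {
  const groups = new Map();
  for (const doc of docs) {
    const value = doc[field];
    // Build a string key so equal dates/objects/arrays land in one group
    // (a simplification of MongoDB's own value comparison).
    const key = value instanceof Date
      ? `date:${value.toISOString()}`
      : `${typeof value}:${JSON.stringify(value)}`;
    const entry = groups.get(key) ?? { _id: value, count: 0 };
    entry.count += 1; // equivalent of count: { $sum: 1 }
    groups.set(key, entry);
  }
  return [...groups.values()]
    .filter((group) => group.count > 1)  // $match: { count: { $gt: 1 } }
    .sort((a, b) => b.count - a.count);  // $sort: { count: -1 }
}

const sampleDocs = [
  { text: "a" }, { text: "b" }, { text: "a" },
  { text: "c" }, { text: "b" }, { text: "a" },
];

// "a" occurs 3 times and "b" twice; "c" is unique and is filtered out.
console.log(findDuplicates(sampleDocs, "text"));
```

On a real collection the database does this work, so the sketch is useful only for understanding the shape of the result: an array of { _id, count } objects.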

Output formatting:
Date → ISO string
Array/Object → JSON.stringify
Other → String(value)
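The formatting rules above could be sketched as a single function. This is an assumption about how such a formatter might look, not the package's actual source; formatValue is a hypothetical name.

```javascript
// Hypothetical formatter following the three rules listed above.
function formatValue(value) {
  // Date -> ISO string
  if (value instanceof Date) return value.toISOString();
  // Array/Object -> JSON.stringify (null is excluded: typeof null === "object")
  if (Array.isArray(value) || (value !== null && typeof value === "object")) {
    return JSON.stringify(value);
  }
  // Everything else -> String(value)
  return String(value);
}

console.log(formatValue(new Date(0)));     // 1970-01-01T00:00:00.000Z
console.log(formatValue([1, 2]));          // [1,2]
console.log(formatValue({ price: 10 }));   // {"price":10}
console.log(formatValue(42));              // 42
```

The Date check comes first because a Date is also an object and would otherwise be passed to JSON.stringify, which wraps the ISO string in quotes.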

Installation & Usage

  1. Install the package:
npm i mongo-checker
  2. Add a script to your package.json:
"scripts": {
  "mongoChecker": "mongo-checker"
}
  3. In the project root, create a file named mongo-checker.config.js.

Example file contents:

export default {
  uri: "mongodb://127.0.0.1:27017",
  db: "crystalTest",
  collection: "posts",
  field: "createdAt",
  allowDiskUse: true,
  maxDuplicatesToShow: 5
};

⚠️ All parameters are required. If any is missing, the tool throws an error.

  4. Run from the project root:
npm run mongoChecker
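The rule that every config parameter is required can be pictured with a small validation sketch. The function name, the key list constant, and the error message here are assumptions for illustration; the tool's real check and wording may differ.

```javascript
// Keys named in the Features section of this README.
const REQUIRED_KEYS = [
  "uri", "db", "collection", "field", "allowDiskUse", "maxDuplicatesToShow",
];

// Hypothetical validator: throws if any required key is absent.
function validateConfig(config) {
  // Compare against undefined so that allowDiskUse: false still counts as set.
  const missing = REQUIRED_KEYS.filter((key) => config[key] === undefined);
  if (missing.length > 0) {
    // The message text is illustrative, not the tool's actual output.
    throw new Error(`Missing config parameters: ${missing.join(", ")}`);
  }
  return config;
}

const exampleConfig = {
  uri: "mongodb://127.0.0.1:27017",
  db: "crystalTest",
  collection: "posts",
  field: "createdAt",
  allowDiskUse: false,
  maxDuplicatesToShow: 5,
};

validateConfig(exampleConfig); // passes, returns the config unchanged
```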

Config parameters explained

allowDiskUse: true

MongoDB is allowed to write intermediate data to temporary files on disk when an aggregation stage needs more memory than it is allowed to use.

When to enable:

  • On servers with a small amount of RAM.
  • For large collections (tens of millions of documents or more), to avoid out-of-memory errors.

Drawbacks:

  • Disk is slower than RAM → query execution can be significantly slower.
  • If the disk is heavily used, other operations may slow down as well.

allowDiskUse: false

MongoDB processes data only in RAM.

  • For small collections (up to ~1M documents), this is usually faster.
  • For huge collections, the operation may fail with an out-of-memory error.

In-memory operations are often much faster than disk-based ones, so prefer allowDiskUse: false when the data fits in memory and switch to allowDiskUse: true only when it does not.

maxDuplicatesToShow

Limits the maximum number of duplicate values displayed in the output.
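As a rough sketch of the limiting step, assuming the sorted aggregation result is simply truncated before printing (the helper name and the hidden counter are hypothetical and may not match what the tool reports):

```javascript
// Hypothetical helper: keep the first maxDuplicatesToShow entries of the
// already sorted result and count how many were left out.
function limitDuplicates(duplicates, maxDuplicatesToShow) {
  const shown = duplicates.slice(0, maxDuplicatesToShow);
  const hidden = Math.max(0, duplicates.length - maxDuplicatesToShow);
  return { shown, hidden };
}

const allDuplicates = [
  { _id: "a", count: 9 },
  { _id: "b", count: 4 },
  { _id: "c", count: 2 },
];

// With a limit of 2, the two most frequent values are shown and one is hidden.
console.log(limitDuplicates(allDuplicates, 2));
```

Because the pipeline sorts by count in descending order, truncating the array keeps the most frequent duplicates.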

An example of mongoChecker in operation:

Collaborators

  • andrewshedov