doc-link-checker

1.1.0 • Public • Published

Doc Link Checker

Doc Link Checker is designed to verify links in your documentation. Primarily, this is targeted at verifying internal (relative) references, to ensure broken links are detected early.

At the moment, detection is limited to links and definitions in Markdown files. Support for images and link references is planned, as is reStructuredText support. Please see the ideas list for a full list of currently planned features.

Doc Link Checker is 100% native TypeScript.

If you just want to use this as a CLI tool, I recommend taking a look at doc-link-checker-cli.

Install

With yarn:

$ yarn add doc-link-checker

Or with npm:

$ npm add doc-link-checker

Usage

There are two parts to this package - Scanning and Checking.

Scanning

Scanning involves searching through a selection of files for any kind of link. In this context, a link is defined as any kind of reference to another part of the same document or any other document.

import { scanFiles, ScanOptions } from "doc-link-checker";

// You don't have to pass all options (hence the use of Partial).
// These are all the defaults, which you can override as desired.
const options: Partial<ScanOptions> = {
  basePath: process.cwd(),
  mdType: "commonmark",
  mdFileExts: new Set([".md", ".mdown", ".markdown"]),
  caseSensitive: false,
  globConcurrency: 0,
};

const scan = scanFiles(
  // An array of globs for files that should be scanned
  ["**/*.md"],
  // An array of globs for files that would be matched by include globs,
  // but that should actually be excluded
  ["path/to/*.md"],
  options,
);

// The return value of scanFiles() is an async generator.
// It will only include results for files that it thinks are Markdown files,
// even if the supplied globs match other files.
for await (const result of scan) {
  // Each result object contains two items.
  // The first is a VFile object for the parsed Markdown file.
  console.log(result.file.path);

  // The second is a generator for links found in the parsed document.
  for (const link of result.links) {
    // Links have three properties.
    // If the link is a valid URL with a protocol, it will include an actual URL object.
    // If it isn't, link.url will be null.
    // Optional chaining avoids a crash when link.url is null.
    console.log(link.url?.origin);

    // The href is the actual contents of the link in the raw document.
    console.log(link.href);

    // The position contains information about where in the document the link appears.
    // This is useful for linting tools wishing to provide feedback to users.
    // If no position could be determined, link.position will be null. Each position
    // object can include both start and end references with line and column numbers.
    console.log(link.position?.start.line);
    console.log(link.position?.end.column);
  }
}

If your documents target GitHub Flavored Markdown, supply "gfm" for the mdType option. This affects how the Markdown files are parsed.

The mdFileExts option determines which files are treated as Markdown files. Only files with these extensions will be yielded by the scanner, even if other files match the supplied globs. Files with no extension will never be yielded.
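
To make the extension rule concrete, here is a minimal sketch of a filter that behaves as described. The helper names (fileExtension, isMarkdownFile) are hypothetical and are not part of the package's API, and the exact-match comparison is an assumption; the package's real implementation may differ.

```typescript
// Hypothetical helpers for illustration only; not exported by doc-link-checker.
const defaultExts = new Set([".md", ".mdown", ".markdown"]);

function fileExtension(filePath: string): string {
  // Take the last path segment, accepting both "/" and "\" separators.
  const baseName = filePath.split(/[\\/]/).pop() ?? "";
  const dotIndex = baseName.lastIndexOf(".");
  // A leading dot (".gitignore") or no dot at all counts as "no extension".
  return dotIndex > 0 ? baseName.slice(dotIndex) : "";
}

function isMarkdownFile(filePath: string, exts: Set<string> = defaultExts): boolean {
  const ext = fileExtension(filePath);
  // Assumption: extensions are compared exactly, without case folding.
  return ext !== "" && exts.has(ext);
}

console.log(isMarkdownFile("docs/intro.md")); // true
console.log(isMarkdownFile("LICENSE")); // false
```

A file such as notes.txt would be rejected by this sketch even if a glob like **/* matched it, which mirrors the behaviour described above.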

The following options map directly to options supported by the fast-glob package, which is used under the hood to find files:

  • caseSensitive
  • globConcurrency

Checking

Checking links happens on a per-file basis.

Right now, there is no support for checking links that are URLs. These will automatically be skipped.

import {
  verifyLinks,
  VerifyLinksOptions,
  FileCheckResponse,
  AnchorCheckResponse,
} from "doc-link-checker";

import { read } from "to-vfile";

const basePath = process.cwd();
// Normally the VFile object and the links iterable would be obtained
// directly from the scanner, rather than retrieving these ourselves.
const readme = await read("README.md");
// VFile objects passed to verifyLinks() must be relative to basePath,
// so we fix it up here manually. This isn't a problem when obtaining
// results from the scanner.
readme.path = "README.md";
const links = [
  // Positions omitted for brevity (null means no position is known).
  { href: "docs/intro.md", url: null, position: null },
  { href: "docs/advanced.md", url: null, position: null },
];

// You don't have to pass all options (hence the use of Partial).
// These are all the defaults, which you can override as desired.
const options: Partial<VerifyLinksOptions> = {
  mdType: "commonmark",
  mdFileExts: new Set([".md", ".mdown", ".markdown"]),
};

const verify = verifyLinks(basePath, readme, links, options);

// The return value of verifyLinks() is an async generator. It
// will only include results for actual errors. If there are no
// errors, there will be no results.
for await (const verifyError of verify) {
  // There are two types of errors - those that relate to filenames,
  // and those that relate to anchors.
  if (verifyError.errorType === "file") {
    console.log("error matching filename in link");
    console.log(verifyError.link.href);

    // You can optionally match against error codes, to provide fine-grained
    // feedback to the user. All error codes are described in detail below.
    if (verifyError.errorCode === FileCheckResponse.FILE_OUTSIDE_BASE) {
      console.log("must not target files outside the repository");
    }
  } else if (verifyError.errorType === "anchor") {
    console.log("error matching anchor in link");
    console.log(verifyError.link.href);

    // Errors related to anchors have a different set of error codes. All
    // error codes are described in detail below.
    if (verifyError.errorCode === AnchorCheckResponse.BINARY_FILE) {
      console.log("cannot target binary files with anchors");
    }
  }
}

The Markdown-related options have the same meaning as they would for scanning.

The mdFileExts option also determines which anchors are valid in links:

  • Links targeting documents can only use heading anchors
  • Links targeting binary files cannot have anchors
  • Links targeting non-document text files can only have valid line number targets
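
The rules above can be sketched as a small decision function. This is only an illustration: the type and function names are hypothetical, whether a file is binary is taken as an input because this document does not say how it is detected, and the order of the checks is an assumption. The "undiscoverable" case reflects the documented behaviour for targets with no file extension.

```typescript
// Hypothetical types and helper for illustration; not part of the package's API.
type AnchorRule = "heading" | "line-number" | "none" | "undiscoverable";

const documentExts = new Set([".md", ".mdown", ".markdown"]);

function anchorRuleFor(ext: string, isBinary: boolean): AnchorRule {
  // No extension: it cannot easily be determined what the file is.
  if (ext === "") return "undiscoverable";
  // Documents (as defined by mdFileExts) can only use heading anchors.
  if (documentExts.has(ext)) return "heading";
  // Binary files cannot have anchors at all.
  if (isBinary) return "none";
  // Any other text file can only use line number targets.
  return "line-number";
}

console.log(anchorRuleFor(".md", false));
console.log(anchorRuleFor(".png", true));
console.log(anchorRuleFor(".ts", false));
```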

Error codes

There are two types of errors that can be returned by the checker:

  • File errors
  • Anchor errors

File errors

A file error indicates there was a problem locating the file referenced in the link.

1 - file doesn't exist

The file targeted by a link does not exist.

2 - file exists outside base directory

The file targeted by a link exists, but is outside of the base directory (basePath). This is likely a sign of a mistake.

3 - convert to pure anchor

The file targeted by the link is the file which contains the link. In other words, it points to itself. The link should be converted to a pure anchor.
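
As a rough illustration of codes 2 and 3, the sketch below resolves a link target and classifies it. It is not the package's implementation: classifyTarget and its return values are hypothetical, it assumes links resolve relative to the directory of the file that contains them, and it uses POSIX path semantics so the result does not depend on the host platform. The existence check behind code 1 is left out.

```typescript
import * as path from "node:path";

// Hypothetical result type for illustration; not part of the package's API.
type TargetCheck = "ok" | "outside-base" | "self-link";

function classifyTarget(basePath: string, sourceFile: string, href: string): TargetCheck {
  // Drop any anchor so that only the file part of the link is resolved.
  const targetPart = href.split("#")[0] ?? "";
  if (targetPart === "") return "ok"; // Already a pure anchor.

  const sourceAbs = path.posix.resolve(basePath, sourceFile);
  // Assumption: relative links resolve against the containing file's directory.
  const targetAbs = path.posix.resolve(path.posix.dirname(sourceAbs), targetPart);

  // A relative path that climbs out of basePath starts with "..".
  const rel = path.posix.relative(basePath, targetAbs);
  if (rel === ".." || rel.startsWith("../")) return "outside-base";

  // The link points back at the file that contains it.
  if (targetAbs === sourceAbs) return "self-link";
  return "ok";
}

console.log(classifyTarget("/repo", "docs/intro.md", "../README.md")); // ok
console.log(classifyTarget("/repo", "docs/intro.md", "../../etc/passwd")); // outside-base
console.log(classifyTarget("/repo", "docs/intro.md", "intro.md#usage")); // self-link
```

In the last case, the suggested fix is to rewrite the link as #usage.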

Anchor errors

An anchor error indicates the file referenced in the link exists, but there is a problem with the heading or line number referred to after the # in the link.

0 - empty anchor

The link includes a #, but there's nothing after the #.

1 - binary file

The link is targeting a binary file, which means there is no useful way to target individual sections of the file with an anchor.

2 - anchor undiscoverable

The link is targeting a file with no file extension, which means it cannot easily be determined what the file is or if it contains valid anchor targets.

3 - no anchors in filetype

The link is targeting a non-document file with an anchor that isn't supported for non-document files.

5 - heading match fail

The link is targeting a document file, and the anchor points to a heading that doesn't exist.

7 - line target fail

The link is targeting a non-document text file, and the anchor is a line number reference that doesn't exist. Either it points to a single line whose line number is greater than the number of lines in the target file, or it points to a multi-line range whose end line number is greater than the number of lines in the target file.

8 - line target is invalid

The link is targeting a non-document text file, and the anchor looks like a line number reference, but isn't a valid line number reference.

This is a specialised case of error code 3, designed to help with debugging for end users.

9 - multi-line target range is invalid

The link is targeting a non-document text file, and the anchor is a multi-line range. The start number of the range is greater than or equal to the end number of the range.
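
The conditions behind codes 7 and 9 can be written down as a short check. This is a sketch under assumptions: checkLineTarget is a hypothetical helper, it receives line numbers that have already been parsed (this document does not specify the anchor syntax), and the order in which the two conditions are tested is a guess.

```typescript
// Hypothetical helper for illustration; not part of the package's API.
// Returns the documented anchor error code, or null when the target is valid.
function checkLineTarget(totalLines: number, start: number, end?: number): 7 | 9 | null {
  if (end !== undefined) {
    // Code 9: the start of the range is greater than or equal to its end.
    if (start >= end) return 9;
    // Code 7: the end of the range is past the last line of the file.
    if (end > totalLines) return 7;
    return null;
  }
  // Code 7: a single line number past the last line of the file.
  return start > totalLines ? 7 : null;
}

console.log(checkLineTarget(10, 5)); // null
console.log(checkLineTarget(10, 11)); // 7
console.log(checkLineTarget(10, 8, 3)); // 9
```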

Putting it all together

It's rare that you would want to use the scanner and the checker separately. Here's an example of how to use the two together.

import { scanFiles, verifyLinks } from "doc-link-checker";

const scanOptions = { basePath: "/path/to/documents" };

let foundAnyError = false;
const scan = scanFiles(["**/*.md"], [], scanOptions);
for await (const result of scan) {
  const verify = verifyLinks(scanOptions.basePath, result.file, result.links);
  for await (const verifyError of verify) {
    if (verifyError.errorType === "anchor") {
      // We don't care about anchor errors for some reason.
      continue;
    }

    foundAnyError = true;
    console.log(`file ${result.file.path} has invalid link ${verifyError.link.href}`);
  }
}
if (foundAnyError) {
  process.exit(1);
}
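
When reporting errors from a loop like the one above, it can help to turn error codes into readable messages. The sketch below maps the numeric codes listed in the Error codes section to their titles. The describeError helper is hypothetical, and it assumes that errorCode values are the plain numbers documented there.

```typescript
// Hypothetical lookup tables for illustration; the texts are the titles
// from the "Error codes" section of this document.
const fileErrorText = new Map<number, string>([
  [1, "file doesn't exist"],
  [2, "file exists outside base directory"],
  [3, "convert to pure anchor"],
]);

const anchorErrorText = new Map<number, string>([
  [0, "empty anchor"],
  [1, "binary file"],
  [2, "anchor undiscoverable"],
  [3, "no anchors in filetype"],
  [5, "heading match fail"],
  [7, "line target fail"],
  [8, "line target is invalid"],
  [9, "multi-line target range is invalid"],
]);

function describeError(errorType: "file" | "anchor", errorCode: number): string {
  const table = errorType === "file" ? fileErrorText : anchorErrorText;
  // Fall back to a generic message for codes this table does not know about.
  return table.get(errorCode) ?? `unknown ${errorType} error code ${errorCode}`;
}

console.log(describeError("file", 2));
console.log(describeError("anchor", 5));
```

Inside the loop, a call such as describeError(verifyError.errorType, verifyError.errorCode) could then replace the fixed message.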

Other interfaces

Defaults

The default values for options related to Markdown in scanFiles and verifyLinks can be imported from the package, should you wish to use them in your code.

import { mdDefaultType, mdDefaultFileExts } from "doc-link-checker";

Backwards compatibility

This project aims to follow semantic versioning.

The only public interface for this package is what can be imported directly from the package's main file. Nested imports are not supported, and the internal organisation of the code could change at any time.

What the checker reports as an error may change with minor version bumps. The maintainers endeavour to ensure it will not change with patch version bumps, except where there are genuine bugs or regressions in behaviour.

Development

TypeScript

The code is written in TypeScript. You can check that the code compiles successfully by running tsc like so:

$ yarn run build

Linting

The tool xo is used for linting the code. This wraps eslint and prettier with a strict set of default rules. You can run xo like so:

$ yarn run lint

Tests

The tests are written using mocha and chai. You can run them like so:

$ yarn run test

Debugging

You can get the full tree of Markdown nodes used when scanning files quickly and easily by running the following command:

$ yarn run md-tree path/to/file.md

Contributing

All contributions are welcome! Please make sure that any code changes are accompanied by updated tests. I also recommend running prettier before committing, like so:

$ yarn run reformat

License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 3 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.

Collaborators

  • djmattyg007