comby-reducer
A program and data format reducer for arbitrary language syntax. Produces human-comprehensible output. Define declarative transformations with ease.
Quick start
Install the comby-reducer binary on your path with npm:
npm install -g @comby-tools/comby-reducer
Invoke it like this:
comby-reducer <file-to-reduce> --transforms ./transforms -- <crashing-program> @@
To feed the file input to <crashing-program> via stdin, invoke it like this instead:
comby-reducer <file-to-reduce> --stdin --transforms ./transforms -- <crashing-program>
Alternative local install
Install comby-reducer in a local directory at ./node_modules/.bin/comby-reducer. If you see some warnings, just ignore them.
npm install @comby-tools/comby-reducer
Example
Let's say you just ran a program that crashed a compiler and want to find a smaller example program that triggers the same crash. We'll simulate how to find a smaller example program with comby-reducer.
Step 1. Clone the repository
git clone https://github.com/comby-tools/comby-reducer
In example/program.c you'll find the program we'll reduce:
#include<stdio.h>
#include<string.h>
int main(int argc, char **argv) {
    if (argv[1]) {
        printf("I can't believe it's not butter");
    }
    // But I want to believe it's not butter...
    memset(NULL, 1, 1);
}
The memset statement causes a crash when we run this program. There's some junk in there that we don't need to trigger the crash. Let's get started.
Step 2. Go into the example directory
cd example
Next, we'll use a "pretend compiler" that crashes when it "compiles" our program (in reality, our "compiler" crashes when it runs a valid C program, not when actually compiling it, but we'll suspend our greater knowledge for now).
Step 3. Run this command to crash the compiler
./compiler.sh program.c
You'll see something like this at the end: ./compiler.sh: line 7: 41936 Segmentation fault: 11 ./program
Step 4. Reduce the program
comby-reducer program.c --file /tmp/in.c --lang .c --transforms ../transforms -- ./compiler.sh @@
You should see:
[+] Loaded 22 transformation rules
[+] Did pass 0 pass
[+] Did pass 1 pass
[+] Did pass 2 pass
[+] Did pass 3 pass
[+] Result:
#include<string.h>
void main() {
memset(NULL, 1, 1);
}
Nice, our program is smaller! comby-reducer found that a smaller valid program keeps crashing our "compiler", but without the cruft.
Let's break down the command invocation:
- The part after -- is the command we want to run that causes a crash. In our case, ./compiler.sh @@
  - The @@ part is substituted with a file containing a program (like program.c).
  - To feed input from stdin, remove the @@ and add the --stdin command line flag.
- --file /tmp/in.c says that the file we substitute for @@ should be named /tmp/in.c. The .c extension may matter if our compiler expects a file with a .c extension, for example. comby-reducer will try to borrow the extension of the original file, but --file exists to give you control over the file name that your program sees.
- --lang .c says that the language we want to reduce is C-like. comby-reducer uses language definitions to parse input according to some language. This matters so that our transforms accurately match only code blocks and avoid bothering with not-actually-code syntax that comes up in comments and strings. This may not be a big deal. You can use --lang .generic if you have some DSL or smart contract language. Here's the list of specific language parsers.
- --transforms <dir> loads transform definitions from .toml files in the specified dir (default dir is transforms). Transforms are specified in a TOML format using comby syntax. See Usage below for more details.
Usage
Transformations
comby-reducer makes it easy to write rules for transformation using comby syntax. A handful of defaults are included in transforms/config.toml that will probably get you very far already. Here are some examples.
[empty_paren]
match='(:[1])'
rewrite='()'
rule='where nested'
This transform matches any content between balanced parentheses (including newlines) and deletes the content. The :[1] is a variable that can be used in the rewrite part. By default, comby-reducer will try to apply this transformation at the top-level of a file, wherever it sees (...). The rule='where nested' tells comby-reducer that it should also attempt to reduce nested matches of (...) inside other matched (...). In general, parentheses are a common syntax to nest expressions in programs, so it makes sense to add rule='where nested'.
Another transform preserves the first comma-separated element inside parentheses:
[preserve_first_paren_element]
match='(:[1],:[2])'
rewrite='(:[1])'
Program syntax often uses call or function-like syntax that comma-separates parameters or arguments inside parentheses. This transformation attempts to remove elements in such syntax. This transform doesn't have a rule part, since it might not be as fruitful to attempt nested reductions inside of :[1] or :[2]. But we could easily add it.
A last example uses a special form :[var:e] which matches "expression-like" syntax.
[remove_first_expression_for_colon_sep_space]
match=':[1:e], :[2:e]'
rewrite=':[2]'
Expression-like syntax matches contiguous non-whitespace characters like foo or foo.bar, as well as contiguous character sequences that include valid code block structures like balanced parentheses in function(foo, bar) (notice how whitespace is allowed inside the parentheses). The transform above will attempt to remove expression-like syntax between commas, which often separate expressions inside objects, records, or lists.
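To try a transform of your own, one option is to drop a .toml file into a directory and point --transforms at it. The sketch below writes a hypothetical brace-emptying rule modeled on the empty_paren example; the helper name write_brace_transform, the file name braces.toml, and the rule name empty_brace are all made up for illustration, and the rule itself is untested against real inputs.

```shell
# Write a hypothetical transform file into a transforms directory.
# The helper, file, and rule names here are illustrative only.
write_brace_transform() (
  dir="$1"
  mkdir -p "$dir" || exit 1
  # The quoted heredoc delimiter keeps the TOML text literal (no shell expansion).
  cat > "$dir/braces.toml" <<'EOF'
[empty_brace]
match='{:[1]}'
rewrite='{}'
rule='where nested'
EOF
)

# Example:
#   write_brace_transform ./my-transforms
#   comby-reducer <file-to-reduce> --transforms ./my-transforms -- <crashing-program> @@
```

It is worth checking a new rule on comby.live before relying on it in a reduction.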
More info. You can learn more about the underlying matching engine at comby.dev. You can try out transformations on comby.live to check that a transformation behaves the way you want it to.
Limitations and known issues.
- Although regular expression matching is possible with :[...] syntax in comby, it's not yet possible to write regular expression holes in comby-reducer transforms.
- Some inputs may trigger a stack overflow in node. Post a GH issue with the input if you run into this.
Tips
Customize crash criteria with scripts
comby-reducer expects a program to exit with status 139 or 134 to consider it a crash (these are the statuses a shell reports for a segmentation fault and an abort, respectively). Many programs that crash won't exit with these values, however. For example, the Solidity compiler exits with status 1. Even more challenging, the exit status 1 may mean that the program crashes, or that the program doesn't compile (and we want the program to still compile). The exit status 1 is not a reliable way to know that the program crashed "for real". What to do?
It'll depend on your program, but you generally want to define some criteria that constitute a valid crash, and wrap that logic in a script. For Solidity, a valid program that crashes the compiler will emit something like:
Internal compiler error during compilation:
/solidity/libsolidity/ast/Types.h(797): Throw in function virtual std::unique_ptr<ReferenceType>
We can use this information in a script, and exit with the expected crash code to signal a crash. Here's one I used before, called compile.sh, that will exit the script with status 139 when it sees the Internal compiler error message:
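#!/bin/bash
# Run the compiler on the candidate file ($1), capturing stdout and stderr.
RESULT=$(~/solidity/build/solc/solc "$1" 2>&1)
# Count the output lines that contain the crash message.
MATCH=$(echo "$RESULT" | grep -c "Internal compiler error")
if [ "$MATCH" -eq 0 ]; then
    exit 0 # no match, program doesn't cause expected crash
fi
exit 139 # crash message seen: report a crash to comby-reducer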
You can get very fancy with your script, and can use it to further refine program reduction. For more inspiration, read up on interestingness tests covered by C-Reduce.
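The same idea can be factored into a reusable function. This is only a sketch: the name crash_if_output_has is hypothetical, and it assumes that a fixed string in the command's combined stdout and stderr is a good enough crash criterion for your program.

```shell
# Hypothetical helper: return 139 when the command's combined stdout/stderr
# contains the given fixed string, and 0 otherwise.
crash_if_output_has() (
  pattern="$1"
  shift
  # Ignore the command's own exit status; only its output matters here.
  output=$("$@" 2>&1) || true
  if printf '%s\n' "$output" | grep -qF -- "$pattern"; then
    exit 139   # leaves only this subshell; becomes the function's status
  fi
  exit 0
)

# In a wrapper script one might then write:
#   crash_if_output_has "Internal compiler error" ~/solidity/build/solc/solc "$1"
#   exit $?
```

Because the function body runs in a subshell, the exit calls end the function without ending the calling script, so several criteria can be chained before the script decides on its final status.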
Output
Output the final reduced program by piping the comby-reducer command to a file. The final program is sent to stdout, the informative messages are printed to stderr.
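As a small sketch of that split, the helper below runs any command and saves its stdout and stderr to separate files. The name save_reduction and the file names reduced.c and reduce.log are hypothetical; a plain shell redirect (> reduced.c 2> reduce.log) on the comby-reducer command does the same thing.

```shell
# Hypothetical helper: run a command, saving stdout (the reduced program)
# and stderr (the informative messages) to separate files.
save_reduction() (
  out="$1"
  log="$2"
  shift 2
  "$@" > "$out" 2> "$log"
)

# Example, using the flags from the walkthrough:
#   save_reduction reduced.c reduce.log \
#     comby-reducer program.c --file /tmp/in.c --lang .c --transforms ../transforms -- ./compiler.sh @@
```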
Options
Some additional command line flags:
--record is an optional flag that emits the program at each step of a successful reduction, in the form <num>.step, in the current directory. You can replay the transformations by running comby-reducer-replay in the current directory. See more on comby-reducer-replay below.
--lang <extension> is a flag that determines how the source file is parsed. Using an extension like .c or .go will make comby-reducer parse the input according to that language.
Accepted extensions:
.s Assembly
.sh Bash
.c C
.cs C#
.css CSS
.dart Dart
.dyck Dyck
.clj Clojure
.elm Elm
.erl Erlang
.ex Elixir
.f Fortran
.fsx F#
.go Go
.html HTML
.hs Haskell
.java Java
.js JavaScript
.jsx JSX
.json JSON
.jsonc JSONC
.gql GraphQL
.dhall Dhall
.jl Julia
.kt Kotlin
.tex LaTeX
.lisp Lisp
.nim Nim
.ml OCaml
.paren Paren
.pas Pascal
.php PHP
.py Python
.re Reason
.rb Ruby
.rs Rust
.scala Scala
.sql SQL
.swift Swift
.txt Text
.ts TypeScript
.tsx TSX
.xml XML
.generic Generic
--transforms <dir> will use .toml transform definitions in the specified dir.
--debug will emit the reduced program after each step, and the transformation that succeeded, to stderr.
comby-reducer-replay
comby-reducer-replay is the answer to "How was my program reduced?". comby-reducer-replay is installed along with comby-reducer and should be available based on how you installed it.
After running comby-reducer with --record, simply run comby-reducer-replay in the current directory, and step through the transformed program at each step (left and right arrow keys). Try running comby-reducer-replay inside replay-example to step through a recording of a previous crash reduction for a Solidity compiler bug.
By default, replays will use git diff to render changes. To override the default, a custom diff command can be entered on the command-line like this:
comby-reducer-replay colordiff -y
where colordiff -y shows a side-by-side colored diff of changes. Underneath the hood, the .step files will be appended to the command, like colordiff -y 000.step 001.step.
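As a rough picture of what appending .step files to the command amounts to, the sketch below runs a diff command over each consecutive pair of step files in a directory. It is not the tool's implementation: the function name diff_steps is hypothetical, and it assumes step files sort in order by name (as zero-padded names like 000.step do).

```shell
# Hypothetical sketch: run a diff command on each consecutive pair of
# .step files found in a directory.
diff_steps() (
  dir="$1"
  shift
  prev=""
  for step in "$dir"/*.step; do
    [ -e "$step" ] || continue      # no .step files: the glob stays literal
    if [ -n "$prev" ]; then
      # Diff tools usually exit non-zero when files differ; that is not an error here.
      "$@" "$prev" "$step" || true
    fi
    prev="$step"
  done
)

# Example: diff_steps . colordiff -y
```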
Some sensible default flags are included for common diff tools, which you can explore by entering only the name of the tool and no other extra command line flags:
comby-reducer-replay git # the default
comby-reducer-replay patdiff # an enhanced patience diff tool
comby-reducer-replay colordiff # colordiff, configured to render side-by-side
comby-reducer-replay diff # plain old diff, configured to render side-by-side
I recommend installing patdiff for an enhanced viewing experience. patdiff simply understands diffs a bit better. To get patdiff, you'll have to:
- Install opam with sh <(curl -sL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh)
- Run eval $(opam env)
- Run opam install patdiff
And patdiff should now be available on your path.
Development
- Install npm.
- Install npx.
npm i typescript @types/node minimist @types/minimist @iarna/toml @types/iarna__toml
npx tsc