Advanced Usage
Rules
Comby includes a small rule language that you can use to perform additional
operations for matches and rewrites. Rules start with the word where
, and can
perform equality checks, rewriting, or
nested pattern matching.
Equality
A rule can check whether two variables are syntactically equal. For example, we can check for duplicate expressions in if-conditions with the following match template and rule:
if (:[left_side] && :[right_side])
where :[left_side] == :[right_side]
This matches code where the programmer perhaps made a mistake and duplicated an
expression without changing a variable like x
to y
:
if (x == 500 && x == 500)
playground ↗
You can use the !=
operator to check inequality. Multiple conditions can be
separated by a comma, and mean "logical and". The following adds a condition to
ignore our match case above:
where :[left_side] == :[right_side], :[left_side] != "x == 500"
playground ↗
Variables can be compared to other variables or string contents (enclosed by double quotes).
Rewrite expressions
A rewrite { ... }
expression can rewrite syntax captured in a hole. This is
useful for rewriting repetitions of a pattern. This example converts arguments
of a dict to a JSON-like format, where dict(foo=bar,baz=qux)
becomes
{"foo": bar, "baz": qux}
:
dict(:[args])
=> {:[args]}
where rewrite :[args] { ":[[k]]=:[[v]]" -> "\":[k]\": :[v]" }
playground ↗
The pattern rewrites every matching instance of :[[k]]=:[[v]] to ":[k]": :[v]
.
The contents of the :[args]
hole are overwritten if the rewrite pattern fires.
Note that the left and right hand sides inside the { ... }
need enclosing
string quotes. This means that our pattern needs to escape the double quotes on
the right hand side.
Conceptually, a rewrite rule works the same way as a toplevel match and rewrite template, but only for a particular hole, and has the effect of overwriting the hole contents when there are substitutions.
It is possible to have sequences of rewrite expressions in a rule. Here a second
rewrite expression adds quotes around :[v]
:
where
rewrite :[args] { ":[[k]]=:[[v]]" -> "\":[k]\": :[v]" },
rewrite :[args] { ": :[[v]]" -> ": \":[v]\"" }
playground ↗
The rewrite expressions are evaluated in a left-to-right sequence and overwrite
:[args]
in every case where expressions succeed. Rewrite expressions always
return true, even if they don’t succeed in rewriting a pattern. What this means
for the example above is that the first rewrite expression will be attempted on
:[args]
. Even if it does not succeed in rewriting any patterns, the second
rewrite expression will also be attempted. If neither rewrite expression change
the contents of :[args]
, it remains unchanged in the output of the toplevel
rewrite template.
It is not currently possible to nest rewrite statements.
Pattern Match expressions
Pattern match expressions are in active development and may change slightly in meaning or syntax, but are currently available to use or experiment with.
Here is an example using the nested matching syntax:
where match :[left_side] {
| "x == 600" -> false
| "x == 500" -> true
}
The match { ... }
says to match the text bound to :[left_side]
against each
of the match cases | match_case, and to perform the filter on the right-hand
side of the ->
when the pattern matches. Nested matching statements can nest:
where match :[left_side] {
| "x == 500" ->
match :[right_side] {
| "x == 500" -> true
| "x == 600" -> false
}
| "x == 600" -> false
}
Submatching with regular expressions
Use regular expressions on extracted contents using match patterns like so:
where match :[hole] {
| ":[_~\\d+]" -> true
| ":[_]" -> false
}
Note that patterns are quoted, so \
needs to be escaped for regex classes like \d
.
Custom language definitions
Hopefully the language you’re interested is already supported or works with the generic matcher. If you have your own DSL or data format, you can define a small language definition for it in a simple JSON file, and pass it as a custom matcher. Just define the following supported language constructs in JSON, like this:
{
"user_defined_delimiters":[
[
"case",
"esac"
]
],
"escapable_string_literals":{
"delimiters":[
"\""
],
"escape_character":"\\"
},
"raw_string_literals": [],
"comments":[
[
"Multiline",
"/*",
"*/"
],
[
"Until_newline",
"//"
]
]
}
Put the contents above in a JSON file, like my-language.json
, and then specify
your file with the -custom-matcher
flag. Here’s how to run the custom language
rewrite on all files with the extension .newlang
:
comby -custom-matcher my-language.json 'match...' 'rewrite...' .newlang
If you want your missing language to be built into Comby, open a feature request, or have a look at the languages file which can be modified for additional languages.
Note that languages can currently be added and expanded with respect to syntactic code structures that Comby recognizes: balanced delimiters, comments, and kinds of string literals. By design, it currently isn’t possible to further refine the meaning of syntax into keywords or high-level structures like functions.
Custom comby metasyntax
You can change comby
's internal syntax, like :[var]
for variables, to be
something else, like $var
. This is done by defining a JSON file for your
metasyntax. Below is the JSON definition for the default syntax. You can edit it
in a file like mine.json
, and then invoke comby
like this:
comby -custom-metasynax mine.json ...
{
"syntax": [
// :[var]
[ "Hole", [ "Everything" ], [ "Delimited", ":[", "]" ] ],
// :[var:e]
[ "Hole", [ "Expression" ], [ "Delimited", ":[", ":e]" ] ],
// :[[var]]
[ "Hole", [ "Alphanum" ], [ "Delimited", ":[[", "]]" ] ],
// :[var.]
[ "Hole", [ "Non_space" ], [ "Delimited", ":[", ".]" ] ],
// :[var\n]
[ "Hole", [ "Line" ], [ "Delimited", ":[", "\\n]" ] ],
// :[ var]
[ "Hole", [ "Blank" ], [ "Delimited", ":[ ", "]" ] ],
// :[var~regex]
[ "Regex", ":[", "~", "]" ],
// String aliases for the "Everything" hole.
[ "Hole", [ "Everything" ],
[ "Reserved_identifiers",
[ "Γ", "Δ", "Θ", "Λ", "Ξ", "Π", "Σ", "Φ", "Ψ", "Ω" ]
]
],
// String aliases for the "Expression" hole.
[ "Hole", [ "Expression" ],
[ "Reserved_identifiers",
[
"α", "β", "γ", "δ", "ε", "ζ", "η", "θ", "ι", "κ", "λ",
"μ", "ξ", "π", "ρ", "ς", "σ", "τ", "υ", "φ", "χ", "ψ",
"ω"
]
]
]
],
// characters allowed for hole names like "var"
"identifier":
"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_"
}
Here’s what the fields do, and ways that you can customize the syntax.
Custom syntax is only partially supported in rules. If you make use of rules, it’s advised to use the default syntax until full support is available.
Custom hole delimiters
The line
[ "Hole", [ "Everything" ], [ "Delimited", ":[", "]" ] ],
Makes it so that the hole that corresponds to matching “Everything” has a left
delimiter :[
and a right delimiter ]
. The “Everything” kind hole corresponds to
the behavior of :[var]
in the first row of the syntax reference.
To change the syntax so that $var$
does this matching instead, redefine it like this:
[ "Hole", [ "Everything" ], [ "Delimited", "$", "$" ] ]
If you only want the prefix, like $var
, then you can make the right delimiter null
:
[ "Hole", [ "Everything" ], [ "Delimited", "$", null ] ]
You can also make the left delimiter null, if you want var$
instead:
[ "Hole", [ "Everything" ], [ "Delimited", null, "$" ] ]
You can define multiple of these definitions for the same kind. You can also omit any that you don’t want.
Custom regex hole
The definition for embedding regular expressions is slightly different, because
comby
needs some kind of syntax to understand when to stop parsing a regular
expression pattern. The current format is supported with a definition like this:
[ "Regex", ":[", "~", "]" ]
You must define both the left and right delimiters, :[
and ]
respectively,
as well as a character separator like ~
. Regular expression patterns will
then be recognized in your template as :[
var
~
regex-pattern
]
. It’s
not currently possible to otherwise change the structure of this syntax.
The order of definitions are significant if syntax overlaps. For example, in the default syntax,
:[var]
and:[var~regex]
overlap in the prefix:[var
. To ensure things work correctly, define the short and simple syntax first, followed by longer variations.
Custom identifier characters
By default, comby
accepts any alphanumeric string, and _
as valid
identifiers for a hole like :[var]
. By changing the identifiers
field in the
JSON, you can restrict or grow the set of allowed characters. For example, to allow only
capital letters for holes, like $VAR
, define the set as follows:
"identifier": "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
Now, if comby
scans sees a pattern in your match template like $var
, it will
not think it is a variable, and instead it will match $var
literally in the
input program.
The
_
character reserves special meaning for matching when it is included in the identifier set.
In general, using the same variable twice like (:[x], :[x])
means that both
:[x]
must match the same thing. The exception is when the variable is :[_]
,
where (:[_], :[_])
may or may not match the same thing–the _
makes the hole
a wildcard matcher. This behavior cannot currently be customized.
Custom hole aliases
Hole aliases are reserved strings and can be optionally defined. When hole
aliases are defined, comby
will treat the reserved strings as variables, and
assign any matching content to that variable name.
[ "Hole", [ "Expression" ], [ "Reserved_identifiers", [ "α", "αα", "β", "ββ" ] ] ]
This way, you can simply match the input and substitute values by using a
variable α
, and avoid any syntax conflicts. You can even use emojis if you like.
If you want to define only
Reserved_identifiers
for holes, you’ll currently need to add a placeholder for at least oneDelimited
definition, like[ "Everything" ], [ "Delimited", "ignore-me", null ]
It’s not currently possible to associate custom syntax with other matching behaviors, or alias syntax to a regular expression.
Substituting fresh identifiers
Rewrite templates may contain the syntax :[id()]
which generates a random
alphanumeric identifier in the output. This is useful, for example,
when creating fresh variable names during a refactor.
var a_:[id()] = 42
var :[left] = :[[right]] + 1
playground ↗
To reference the same identifier in multiple places in the template, simply supply a label to id
, like so:
anon_:[id(my_label)] = func(){:[body]}
anon_:[id(my_label)]()
playground ↗