Help solving regex errors for syntax highlighting

I am writing a Kakoune highlighter for the Koka programming language, and I am basing it on the highlighting rules from the Koka VS Code extension. So far I have done most of the highlighting, but I keep running into the same error with a few of the regexes. I should also mention that I am not the greatest at regex myself.

I am getting an “unclosed parenthesis” error on quite a few of them, and I am not sure what is wrong. Here are the regexes causing issues:

(?:c|cs|js|inline)\s+(?:inline\s+)?(?:(?:file|header-file|header-end-file)\s+)?(?=["{]|r#*")
(finally|initially)\s*(?=->|[{\(])
(>)|(?=[\)\{\}\[\]=;"`]|(infix|infixr|infixl|inline|noinline|fip|fbip|tail|type|co|rec|effect|context|ambient|alias|extern|fn|fun|function|val|var|con|if|then|else|elif|match|inject|mask|named|handle|handler|return|module|import|as|pub|abstract)(?![\w\-?']))

This next regex has a different problem: I get a “Quantifiers cannot be used in lookarounds” error right at the very end of the regex:

(\?(?:[@a-z][\w\-@]*/#?)*)([@a-z][\w\-@]*[']*|\([$%&\*\+@!/\\\\^~=\.:\-?\|<>]+\))(?=\s*[=])

Any help is appreciated.

Your second question is easier to answer, so I’ll answer it first.

I get a “Quantifiers cannot be used in lookarounds” right at the very end of the regex

This is the look-around assertion (this one is specifically a “look-ahead”) at the end of the regex:

(?=\s*[=])

It means “after this point in the match, the following text should match the regex \s*[=]”; that is, zero-or-more (*) whitespace characters (\s) followed by an equal sign (=)¹.

A quantifier tells the regex engine how many times the preceding item may repeat; the only quantifier in this lookaround is the zero-or-more asterisk, so that’s probably what it’s complaining about. Some regex engines can handle quantifiers in lookarounds, but Kakoune’s deliberately doesn’t, so that it can use a faster regex-matching algorithm.

In this case, since it’s just for syntax highlighting, I’d probably just move the \s* bit outside the assertion, like this:

\s*(?=[=])

That means that the whitespace will be coloured as part of whatever token this is, but since syntax highlighting generally changes the foreground colour of each token rather than the background colour, that’s probably fine.
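To see what that change does to the match, here is a small sketch in Python. Python’s re module, unlike Kakoune’s engine, accepts quantifiers inside look-aheads, so both versions can be compared side by side. The sample text and the simplified identifier pattern [a-z]\w* are made up for illustration; they are not the full regex from the question.

```python
import re

# Made-up input: an identifier, some spaces, then an equal sign.
text = "answer   = 42"

# Original shape: the whitespace is only checked inside the look-ahead,
# so it is not part of the match (Python allows this; Kakoune does not).
inside = re.compile(r"[a-z]\w*(?=\s*[=])")

# Suggested rewrite: the whitespace is consumed by the match itself,
# leaving just a single character class inside the look-ahead.
outside = re.compile(r"[a-z]\w*\s*(?=[=])")

m1 = inside.search(text)
m2 = outside.search(text)

print(m1.start(), repr(m1.group()))  # 0 'answer'
print(m2.start(), repr(m2.group()))  # 0 'answer   '

# Stripping the trailing spaces recovers the original token.
print(m2.group().rstrip() == m1.group())  # True
```

Both versions start matching at the same place; the only difference is that the rewritten one also covers the trailing spaces, which is the trade-off described above.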

For your first question, let’s start with the shortest regex:

(finally|initially)\s*(?=->|[{\(])

If I just paste that into Kakoune, I get the following error:

regex parse error: unclosed parenthesis at '(finally|initially)\s*(?=->|<<<HERE>>>[{\(])'

By deleting different parts of the regex and simplifying it, I can reproduce basically the same error with:

(?=a|b)
regex parse error: unclosed parenthesis at '(?=a|<<<HERE>>>b)'

I think this is another consequence of Kakoune’s limited support for look-around assertions. Consulting the documentation (:doc regex zero-width-assertions), we find:

For performance reasons, lookaround contents must be a sequence of literals, character classes, or any character (.); quantifiers are not supported.

It doesn’t actually list the alternation operator as unsupported, but it clearly doesn’t work. As we saw with quantifiers above, other unsupported operators get more helpful error messages; I’m not sure why this one is different.

Here, the fix is not as simple as moving the offending item outside the look-ahead. I would probably just try removing the look-ahead entirely or, if that messed up the highlighting too much, split it into two otherwise identical highlighters:

(finally|initially)\s*(?=->)
(finally|initially)\s*(?=[{\(])

It looks like the other two problem regexes also have alternations in their look-arounds, so the same approach applies there. The first one also has a quantifier (the * in r#*") inside its look-ahead.
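With a whole grammar’s worth of regexes to port, it may help to find these problems automatically. The sketch below is a hypothetical helper written for this post (lookaround_problems is not part of Kakoune or any library): it scans a regex string and reports alternations, quantifiers and nested groups that appear inside a look-around, which the documentation quoted above rules out. It only understands escapes, character classes and the (?=, (?!, (?<= and (?<! openers, and it ignores corner cases such as ] as the first character of a class, so treat its output as a hint and confirm by pasting the regex into Kakoune.

```python
# Hypothetical helper for this post: a heuristic scanner, not a full regex parser.
LOOKAROUND_OPENERS = ("(?<=", "(?<!", "(?=", "(?!")


def lookaround_problems(regex):
    """Return (index, char) pairs for '|', quantifiers and nested groups
    found inside a look-around, which Kakoune's docs say is unsupported."""
    problems = []
    stack = []        # one entry per open group: True if it is a look-around
    in_class = False  # currently inside a [...] character class
    i = 0
    while i < len(regex):
        c = regex[i]
        inside = any(stack)
        if c == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if in_class:
            in_class = c != "]"  # an unescaped ']' closes the class
            i += 1
            continue
        if c == "[":
            in_class = True
        elif c == "(":
            if inside:
                problems.append((i, c))  # only literals/classes/. are allowed
            opener = next((o for o in LOOKAROUND_OPENERS if regex.startswith(o, i)), None)
            stack.append(opener is not None)
            if opener:
                i += len(opener)
                continue
            if regex.startswith("(?:", i):
                i += 3  # don't mistake the '?' of '(?:' for a quantifier
                continue
        elif c == ")":
            if stack:
                stack.pop()
        elif inside and c in "*+?{|":
            problems.append((i, c))
        i += 1
    return problems


# The four regexes from the question (dictionary keys are just labels).
regexes = {
    "first": r'(?:c|cs|js|inline)\s+(?:inline\s+)?(?:(?:file|header-file|header-end-file)\s+)?(?=["{]|r#*")',
    "finally": r"(finally|initially)\s*(?=->|[{\(])",
    "keywords": r'''(>)|(?=[\)\{\}\[\]=;"`]|(infix|infixr|infixl|inline|noinline|fip|fbip|tail|type|co|rec|effect|context|ambient|alias|extern|fn|fun|function|val|var|con|if|then|else|elif|match|inject|mask|named|handle|handler|return|module|import|as|pub|abstract)(?![\w\-?']))''',
    "definition": r"(\?(?:[@a-z][\w\-@]*/#?)*)([@a-z][\w\-@]*[']*|\([$%&\*\+@!/\\\\^~=\.:\-?\|<>]+\))(?=\s*[=])",
}

for name, regex in regexes.items():
    print(name, [c for _, c in lookaround_problems(regex)])
```

On the question’s regexes it reports the | and * in the first one, the | in the finally/initially one, the * in the last one, and a long list for the keyword regex, including its two nested groups.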

¹: Technically the equal sign is in a character class, but since it’s the only character in the class, it’s effectively the same as the character on its own.

EDIT: I filed a bug: [BUG] Alternations in lookarounds produce an unhelpful error message (mawww/kakoune issue #5203)

Thank you, this was really helpful.