I see that in the asciidoc.kak file, the regex of the highlighter for italicized text is:
\b_[^\n]+?_\b
meaning that between the underscore delimiters, the text to be italicized cannot contain a line break (\n).
Question: why prohibit line breaks? AFAIK, the asciidoc spec allows italicized (or bold) text to span multiple lines. Was this done for performance reasons (e.g., to avoid scanning a 1GB asciidoc document because of a single stray underscore)?
Suggestion: what about allowing italicized text over multiple lines, while setting a reasonable limit on how many characters are scanned before giving up on a stray underscore? For example, with a limit of 3 * 80 = 240 characters (meaning at most 3 lines of 80 chars each), the regex for italicized text would be:
\b_.{1,240}?_\b
This would allow a (long) journal title in italics, for example, to span 2 or 3 lines.
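To make the difference concrete, here is a minimal sketch in Python rather than Kakoune, so take it as an illustration only. It assumes Python's re module behaves like Kakoune's regex engine for these two patterns, except that Python needs re.DOTALL for "." to match a line break. The sample strings and variable names are made up for the example.

```python
import re

# Current pattern from asciidoc.kak: no line break allowed between the underscores.
SINGLE_LINE = re.compile(r"\b_[^\n]+?_\b")

# Proposed pattern: at most 240 characters between the underscores.
# re.DOTALL is a Python-specific flag so that "." also matches "\n".
BOUNDED = re.compile(r"\b_.{1,240}?_\b", re.DOTALL)

# Hypothetical inputs: a title wrapped over two lines, and a stray underscore
# whose nearest "closing" underscore is more than 240 characters away.
title = "See _Journal of Very Long\nand Wrapped Titles_ for details."
stray = "A stray _underscore, then " + "x" * 300 + " another_ one."

print(SINGLE_LINE.search(title))  # no match: the title wraps onto a second line
print(BOUNDED.search(title))      # matches the whole wrapped title
print(BOUNDED.search(stray))      # no match: the scan gives up after 240 characters
```

The bound caps how far a stray underscore can spill, but any two underscores within 240 characters of each other would still be paired up.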
Maybe for simplicity, and to avoid false positives.
Makes sense :-). But can you think of a strong disadvantage of my proposal? It would allow italics/bold to straddle lines, and would still limit false positives to some extent.
If other regions are correctly defined (such as code blocks), couldn't it be a region, like the ones we have for quoted strings in programming languages?
A simple regex doesn’t look like the right way to represent italicized text, because other formats such as links can be contained in it, and a regex highlighter will swallow the whole span.
Is . or [^\n] here really the correct character class? I was under the impression that the syntax is such that _word_ is allowed but _ word _ is not.
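One way to express that constraint, sketched in Python for illustration: require a non-space character right after the opening underscore and right before the closing one, using lookarounds. The pattern name and test strings are hypothetical, the 240-character bound is borrowed from the suggestion above, and I have not checked whether the pattern ports to Kakoune's regex engine unchanged.

```python
import re

# Hypothetical stricter pattern: the character after the opening underscore
# and the character before the closing underscore must both be non-whitespace.
# re.DOTALL is Python-specific and lets "." match a line break.
STRICT_ITALIC = re.compile(r"\b_(?=\S).{1,240}?(?<=\S)_\b", re.DOTALL)

print(STRICT_ITALIC.search("some _word_ here"))      # matches "_word_"
print(STRICT_ITALIC.search("some _ word _ here"))    # no match: spaces hug the delimiters
print(STRICT_ITALIC.search("call snake_case_name"))  # no match: underscores sit inside a word
```

This still says nothing about nested markup such as links, so it does not replace the region-based approach mentioned earlier.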
I have not found a formal reference that would be 100% rigorous on this topic, but the common docs make it intuitively clear that you are right. The current regex in asciidoc.kak is incorrect. More generally, asciidoc.kak is very different from markdown.kak; the latter uses regions, the former does not. Sooner or later somebody will need to improve asciidoc.kak 