Kakoune is pretty flexible about integrating with other tools; as long as a program can read from stdin and write to stdout, you can probably use it to write a Kakoune plugin. Because a lot of plugins need to send commands to Kakoune to execute, Kakoune uses a somewhat unusual quoting scheme designed to be easy to implement in any language:
- replace all instances of an apostrophe with two apostrophes
- wrap the whole thing in a pair of apostrophes
For example, the :info command takes exactly one argument, the string to display. If you want to display the name of They Might Be Giants' first single, you have to make it a single string and escape the apostrophes, like this:
    :info 'Don''t Let''s Start'
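The two rules above can be sketched in shell with sed doing the substitution. This is only an illustration: the name kak_quote_sed is made up for this example, and it is not necessarily the sed variant that was benchmarked.

```shell
# Hypothetical helper, not taken from the original benchmarks.
kak_quote_sed() {
    # Rule 1: double every apostrophe.
    # Rule 2: put one apostrophe before the first line and one after the last.
    printf '%s\n' "$*" | sed "s/'/''/g; 1s/^/'/; \$s/\$/'/"
}

# Build a complete info command from arbitrary text.
title="Don't Let's Start"
printf 'info %s\n' "$(kak_quote_sed "$title")"
# prints: info 'Don''t Let''s Start'
```

The output ends with a newline, which command substitution strips. Forking sed for every string is convenient but costs a process each time, which matters for the speed question below.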
However, although Kakoune is designed to integrate with most languages, the only language guaranteed to be present is POSIX shell. And although Kakoune's quoting system is designed to be easy to implement in most languages, POSIX shell is surprisingly deficient at string processing, especially for a language that is all about banging strings together. So an interesting question is, what's the best way to do Kakoune-quoting in pure POSIX shell? Or since "best" is difficult to define, at least what's the fastest?
I originally started looking into this in issue 3340, where I came up with a few different approaches: a short one based on sed, a longer one based on the shell slicing operators, and some variants like "is backslash-quoting in a shell script slower or faster than double-quote-quoting?". I also came up with a benchmarking harness that executed each implementation continuously for 5 seconds, and counted the number of iterations (which means that you can run a fast implementation a statistically significant number of times, without a slow implementation making the tests take forever).
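The idea of a time-boxed benchmark can be sketched roughly like this. It is not the harness from the issue: the name bench_for_seconds and the batch size of 100 are assumptions for this example, and it relies on date supporting +%s, as the GNU and BSD versions do.

```shell
# Hypothetical harness: run the given command repeatedly for about
# $1 seconds and print how many calls completed.
bench_for_seconds() {
    duration=$1
    shift
    count=0
    deadline=$(( $(date +%s) + duration ))
    while [ "$(date +%s)" -lt "$deadline" ]; do
        # Run a batch between clock checks so that forking date
        # does not dominate the measurement.
        batch=0
        while [ "$batch" -lt 100 ]; do
            "$@" >/dev/null
            batch=$((batch + 1))
        done
        count=$((count + 100))
    done
    printf '%s\n' "$count"
}
```

You would call it as, say, `bench_for_seconds 5 some_quoter "Don't Let's Start"`, where some_quoter stands for whichever function you want to measure. Because the clock only has one-second resolution, the count is a rough figure rather than a precise one.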
The fastest implementation I was able to come up with was this one, which I called single_builtin_quoter_no_backslashes_ntmp: "single" means it only quotes a single value, not each argument individually; "builtin_quoter" means it uses only POSIX shell constructs; "no_backslashes" means it uses double-quotes to protect apostrophes; and I forget what "ntmp" means.
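    single_builtin_quoter_no_backslashes_ntmp() {
        # Join all arguments into one string (separated by the first
        # character of IFS, normally a space).
        text="$*"
        # Opening apostrophe.
        printf "'"
        while true; do
            case "$text" in
                *"'"*)
                    # Emit everything before the first apostrophe, then a
                    # doubled apostrophe, and continue with the remainder.
                    printf "%s''" "${text%%"'"*}"
                    text=${text#*"'"}
                    ;;
                *)
                    # No apostrophes left: emit the rest and the closing
                    # apostrophe (followed by a space).
                    printf "%s' " "$text"
                    break
                    ;;
            esac
        done
    }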
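The loop above leans on two POSIX parameter-expansion operators: `%%` removes the longest matching suffix, and `#` removes the shortest matching prefix. A small illustration of how they split a string at its first apostrophe (the variable names head and tail are just for this sketch):

```shell
text="Don't Let's Start"

# Longest suffix matching "'*" removed: the part before the first apostrophe.
head=${text%%"'"*}

# Shortest prefix matching "*'" removed: the part after the first apostrophe.
tail=${text#*"'"}

printf 'head=[%s] tail=[%s]\n' "$head" "$tail"
# prints: head=[Don] tail=[t Let's Start]
```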
However, yesterday somebody named "arachsys" on GitHub showed up with more implementations, the best of which according to my benchmark is this:
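    arachsys_single_builtin_quoter_2() {
        # $1 holds the text still to be processed, $2 the quoted output so far.
        set -- "$1" ""
        # Loop while stripping up to the first apostrophe changes $1,
        # i.e. while $1 still contains an apostrophe.
        while [ "${1#*\'}" != "$1" ]; do
            set -- "${1#*\'}" "$2${1%%\'*}''"
        done
        printf "'%s' \n" "$2$1"
    }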
Rather than a case statement, it compares the string to a sliced version of itself, which I assumed would make for some kind of O(n²) behaviour as it worked through the string, but at least in this microbenchmark it's really fast.
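The comparison trick is worth isolating, since it works as a general "does this string contain X" test in plain POSIX shell. A minimal sketch, with has_apostrophe being a name invented for this example:

```shell
# Hypothetical helper: succeeds if "$1" contains an apostrophe.
has_apostrophe() {
    # Stripping the shortest prefix that ends in an apostrophe only
    # changes the string when there is an apostrophe to strip up to.
    [ "${1#*\'}" != "$1" ]
}

has_apostrophe "Don't Let's Start" && echo "needs escaping"
has_apostrophe "no quotes here" || echo "nothing to escape"
```

Each test builds a sliced copy of the remaining string, which is where the worry about quadratic behaviour comes from.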
They also pointed out another implementation approach: if you're using a machine where /bin/sh is more capable than basic POSIX shell, such as bash or zsh, you might have a built-in search-and-replace operation:
    arachsys_single_bash_quoter() {
        set -- "$*"
        printf "'%s' \n" "${1//\'/\'\'}"
    }
That is a lot shorter, and does all the hard work in C, so it should be really fast. But let's benchmark it, just to check. I ran the benchmarks on my ancient netbook, because performance matters more there than on the latest Ryzen desktop. I'm running up-to-date Debian Testing, with dash 0.5.12-9 and bash 5.2.32-1+b2. Using the testing framework I linked above, and taking the best of three trials for each implementation:
Implementation | dash (iterations in 5 s) | bash (iterations in 5 s) |
---|---|---|
single_builtin_quoter_no_backslashes_ntmp | 78,804 | 18,498 |
arachsys_single_builtin_quoter_2 | 89,146 | 25,242 |
arachsys_single_bash_quoter | n/a | 41,545 |
If you know your script is going to run in bash, you should use the bash-specific quoting mechanism: it's simple, compact, and roughly 1.6 times as fast as anything else I tested in bash. If you don't know you'll be running in bash, you should consider arachsys' plain POSIX implementation; it's currently the fastest quoting implementation I know of.