Scripting Kakoune with external tools

I love the way Kakoune integrates with other scripting languages instead of requiring a particular language, but choosing which language to use requires some thought.

POSIX shell

Plugins can assume this is present, since Kakoune itself requires it. It has the basics (loops, conditionals), some very simple string parsing (case and prefix/suffix trimming) and formatting (printf), but there’s no support for data structures, which is pretty limiting.
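As a rough illustration of what case and prefix/suffix trimming buy you, here is a minimal sketch that picks apart a Kakoune selection description. It assumes the anchor_line.anchor_column,cursor_line.cursor_column format of $kak_selection_desc, and the function names parse_desc and describe_desc are made up for this example. With no data structures available, the four numbers have to come back as one space-separated string.

```shell
# Hypothetical helper: split a selection description such as "12.3,14.9"
# using only parameter expansion and printf.
parse_desc() {
    anchor=${1%%,*}   # drop the longest suffix matching ",*"  -> 12.3
    cursor=${1#*,}    # drop the shortest prefix matching "*," -> 14.9
    printf '%s %s %s %s\n' \
        "${anchor%%.*}" "${anchor#*.}" "${cursor%%.*}" "${cursor#*.}"
}

# Hypothetical helper: case does glob-style matching, which is enough to
# validate the shape of the string before comparing its two halves.
describe_desc() {
    case $1 in
        *.*,*.*) ;;
        *) echo invalid; return 1 ;;
    esac
    if [ "${1%%,*}" = "${1#*,}" ]; then
        echo single-char   # anchor and cursor are on the same character
    else
        echo range
    fi
}
```

For example, parse_desc 12.3,14.9 prints 12 3 14 9, which the caller then has to split again.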

This little laptop can launch /bin/dash 3,398 times in five seconds.
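“N launches in five seconds” is just another way of writing a per-launch cost: 5 s / 3,398 ≈ 1.47 ms per dash launch. The post doesn’t say how the count was taken; one way to take it in plain sh is the hypothetical count_launches helper below. It assumes a date(1) that understands +%s (GNU and BSD date both do), and its timing window is only accurate to about a second.

```shell
# Hypothetical helper: count how many times a command can be launched in
# roughly $1 seconds. Assumes date(1) supports +%s.
count_launches() {
    seconds=$1; shift
    end=$(( $(date +%s) + $seconds ))
    n=0
    while [ "$(date +%s)" -lt "$end" ]; do
        # Launch in batches of 10 so the date(1) calls in the loop
        # condition don't dominate the measurement.
        i=0
        while [ "$i" -lt 10 ]; do
            "$@" >/dev/null 2>&1
            i=$((i + 1))
        done
        n=$((n + 10))
    done
    echo "$n"
}
```

Usage would look like count_launches 5 /bin/dash -c ''.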

sed

sed is often most people’s first tool when it comes to Unix text processing. While I’m pretty sure it’s Turing-complete, that seems more accidental than deliberate. sed is reasonable if you need basic search-and-replace, but for anything more complex than that you’ll need to enjoy intense puzzle-solving.
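For the basic search-and-replace case, sed is hard to beat for brevity. For instance, this hypothetical strip_trailing_ws helper removes trailing whitespace from every line of its input with a single substitution.

```shell
# Hypothetical helper: delete any run of whitespace at the end of each line.
strip_trailing_ws() {
    sed 's/[[:space:]]*$//'
}
```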

AWK

AWK is the great-granddaddy of Unix scripting. Not only does it have all the usual programming features, it has built-in support for processing tabular data and regexes, and it supports associative arrays too.
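As a small example of those built-ins, this sketch counts words using AWK’s automatic field splitting ($1 … $NF) and an associative array; word_counts is a hypothetical name. The output goes through sort because the iteration order of for (w in count) is unspecified.

```shell
# Hypothetical helper: count how often each whitespace-separated word
# appears on stdin, one "word count" pair per line, sorted.
word_counts() {
    awk '
        { for (i = 1; i <= NF; i++) count[$i]++ }
        END { for (w in count) print w, count[w] }
    ' | sort
}
```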

Unfortunately, various implementations of awk are… unreliable for handling UTF-8 data. Take the following AWK program, for example:

BEGIN {
    print substr("üu", 1, 1)
}

This prints the first character in üu, which I would expect to be ü. Here’s how various implementations fared:

  • mawk (Debian’s default): a lone byte (the first half of ü’s two-byte UTF-8 encoding)
  • nawk (the original implementation): also a lone byte
  • plan9port awk: üu (!)
  • gawk (GNU awk): ü

Of those, only gawk produces the answer I expected. Sadly, it’s also the slowest implementation:

  • mawk: 2714 invocations in 5 seconds (faster than GNU sed!)
  • plan9port awk: 2158
  • gawk: 1113
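The difference in the substr test comes down to bytes versus characters: in UTF-8, ü is two bytes, so an awk whose substr counts bytes returns only the first of them. The sketch below (hexbytes is a hypothetical helper) makes that visible by dumping bytes with od, and runs the same test program under whichever of mawk, nawk and gawk are installed. One caveat: gawk only counts characters when it runs in a UTF-8 locale; under LC_ALL=C it works on bytes too.

```shell
# Print the bytes of stdin as lowercase hex, with no spaces or newlines.
hexbytes() {
    od -A n -t x1 | tr -d ' \n'
}

# "üu" in UTF-8: ü is two bytes (c3 bc), u is one (75).
printf 'üu' | hexbytes; echo

# A byte-oriented substr(s, 1, 1) keeps only the first byte, c3,
# which is not a complete character on its own.
printf 'üu' | dd bs=1 count=1 2>/dev/null | hexbytes; echo

# Run the test program under each awk that happens to be installed.
# The trailing 0a in each result is the newline added by print.
for awk in mawk nawk gawk; do
    command -v "$awk" >/dev/null 2>&1 || continue
    printf '%s: ' "$awk"
    "$awk" 'BEGIN { print substr("üu", 1, 1) }' | hexbytes; echo
done
```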

Perl

Perl was designed to be (as I understand it) the ultra-AWK: just as flexible, but more powerful and more sophisticated. It has a module system, it supports object-oriented programming, it’s still under active maintenance… but for modern tastes it’s very baroque. AWK and sed have the excuse that they’re very limited languages, but Perl is vastly larger and still feels very quirky.

At least Perl is broadly available (the perl-base package is “Priority: required” in Debian), and it has possibly the best Unicode support of any language on any platform.

Perl 5.32.1 launches 1272 times in 5 seconds, on par with gawk and just a little slower than GNU sed.

Python 3

But enough about all these archaic and crufty languages: Kakoune lets us write plugins in any language, so why not pick something clean and modern?

Python3 is broadly available, has a decent standard library and excellent third-party library support. It’s a great fit for just about anything your Kakoune plugin needs. Python 3.9.2 launches… wait, really? Is that a typo?

Apparently, Python 3.9.2 launches 54 times in five seconds, 1/23rd the speed of Perl 5. So, uh, maybe don’t use that for any hooks that fire often.

Conclusion

AWK is a pretty decent tool for intermediate Kakoune scripting, so long as you don’t mind depending on gawk instead of “any awk”.

Otherwise, maybe it’s about time I looked into learning Perl, even though it’s so… Perly.


It’s such a shame that the Python interpreter’s startup time is so long. Hopefully something comes out of this: “Improve startup time”, issue #32 in faster-cpython/ideas on GitHub.

Let’s not forget about Lua. It’s small, readable and available on almost every distro. While its UTF-8 story is not great (you need an external library), it has other advantages: it starts very quickly, there is a fantastic PEG library for parsing (LPeg), and we already have a nice integration plugin.


I ran an extremely unscientific benchmark comparing awk, Lua 5.4 and Python 3.9, launching each interpreter both directly (raw exec) and through a shell.

go test -bench . -benchtime 5s -cpu 1
goos: linux
goarch: amd64
pkg: tst
cpu: AMD Ryzen 9 5900X 12-Core Processor
BenchmarkRaw/Lua         	   14853	    404017 ns/op
BenchmarkRaw/Awk         	    4809	   1240155 ns/op
BenchmarkRaw/Python      	     808	   7380447 ns/op
BenchmarkShell/Lua       	    9108	    681058 ns/op
BenchmarkShell/Awk       	    3782	   1597863 ns/op
BenchmarkShell/Python    	     786	   7720292 ns/op
PASS
ok  	tst	42.209s

There’s also Bash, which has been extremely powerful since v4.4 (2016): arrays, hash tables, pattern matching and substitution, functions, dynamic scoping, data and code serialization, multiprocessing and automatic error checking come to mind. Yes, it’s quirky to learn to use properly, but for a lot of things it’s actually more powerful than “real” programming languages. Now I’m curious how fast it is (startup-wise, at least) compared to Dash these days.

The discussion here pointed me to the -S flag for the Python interpreter: it looks like you can speed up startup (significantly, on some systems) with it, as long as you don’t need any third-party modules installed in site-packages. I think that would be safe to assume for Kakoune plugins.

Benchmarking with hyperfine (hyperfine -w 5 "python3 -c ''"), my (decently specced) desktop takes 6.5 ms ± 0.2 ms, versus 4.5 ms ± 0.1 ms with -S. On my (old) laptop it is 13.0 ms ± 0.6 ms versus 8.7 ms ± 0.4 ms. Given that some folks have startup times as high as 90 ms (5 s / 54 launches), I’d be interested to learn how this switch affects them.

(In a slightly more realistic scenario that only measures the startup time of smooth-scroll.py, the improvement is from 11.4 ms ± 0.2 ms to 10.5 ms ± 0.1 ms on my desktop, with a similar improvement on the laptop, so it is not very significant. However, I might as well update it to use the switch, as it might help slower systems.)


Perl newb here, but I was inspired to write the substr example in Perl to measure a non-trivial startup of the perl interpreter, and timed it with hyperfine:

$ hyperfine "perl -C -Mutf8 -Mv5.28.0 -e '$U'" "gawk '$T'"
Benchmark #1: perl -C -Mutf8 -Mv5.28.0 -e 'say substr("üu", 0, 1)'
  Time (mean ± σ):       3.3 ms ±   0.4 ms    [User: 1.9 ms, System: 1.4 ms]
  Range (min … max):     2.1 ms …   4.4 ms    655 runs
 
  Warning: Command took less than 5 ms to complete. Results might be inaccurate.
 
Benchmark #2: gawk 'BEGIN { print substr("üu", 1, 1) }'
  Time (mean ± σ):       2.9 ms ±   0.3 ms    [User: 1.6 ms, System: 1.3 ms]
  Range (min … max):     1.8 ms …   4.0 ms    702 runs
 
  Warning: Command took less than 5 ms to complete. Results might be inaccurate.
 
Summary
  'gawk 'BEGIN { print substr("üu", 1, 1) }'' ran
    1.15 ± 0.20 times faster than 'perl -C -Mutf8 -Mv5.28.0 -e 'say substr("üu", 0, 1)''

Technically, I didn’t need to store the Perl expression in a variable to avoid quoting issues, since qq(abc) (“quote quote”) can stand in for "abc". -C is needed to specify that stdout is UTF-8, -Mutf8 is needed to specify that the source is UTF-8, and -Mv5.28.0 enables say.
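The post doesn’t show how $U and $T were set, but the expanded commands in the “Benchmark #1” and “Benchmark #2” lines reveal their values, so a reconstruction looks something like this. Note that Perl’s substr counts from 0 while AWK’s counts from 1. The hyperfine call is guarded so the snippet still runs where hyperfine isn’t installed.

```shell
# Values inferred from the benchmark names in the hyperfine output.
U='say substr("üu", 0, 1)'
T='BEGIN { print substr("üu", 1, 1) }'

# Double quotes splice $U and $T into each command string; the inner
# single quotes protect them when hyperfine hands the command to a shell.
perl_cmd="perl -C -Mutf8 -Mv5.28.0 -e '$U'"
gawk_cmd="gawk '$T'"

if command -v hyperfine >/dev/null 2>&1; then
    hyperfine "$perl_cmd" "$gawk_cmd"
fi
```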

Thanks for the suggestions and feedback, everyone!

My initial concern about Lua was that it uses 1-based indexing, but so does AWK, so I can’t complain about that anymore. :)

My bigger worry about Lua is that, as I understand it, it’s intended to be vendored into each system that uses it, so the ecosystem is pretty lax about breaking changes. Debian offers me five separate versions of Lua (5.0, 5.1, 5.2, 5.3, 5.4), but only one version of Python (3.9), two versions of Perl (5.30 and 5.32), one version of Ruby (2.7) and one version of Tcl (8.6). Also, my understanding is that Lua’s standard library is very small: if I write a plugin using Python or Perl, I can call just about every function in the C standard library and a bunch of other useful stuff besides (XML, JSON, MIME, date arithmetic) without having to teach anybody about a language-specific packaging tool.

Of course, that’s not a hard rule; I hope there are and continue to be Kakoune plugins that draw deeply from CPAN and PyPI and crates.io and RubyGems and LuaRocks and NPM, but for my own plugins I want to make installation as painless as possible.

My expectation is that bash is not as widely available as one might hope; certainly it’s everywhere on Linux, but I think the BSDs hate it and macOS only ships an ancient version because Apple is terrified of GPLv3. But, I haven’t actually done a survey or anything, so maybe it would actually be practical as a plugin helper?
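If a plugin did want to lean on Bash, it could check for a new-enough version at runtime and fall back to plain sh otherwise. This is a minimal sketch, assuming the 4.4 cut-off mentioned earlier in the thread; have_bash_44 is a hypothetical name, and macOS’s bundled /bin/bash (3.2) would fail the check.

```shell
# Hypothetical runtime check: succeed only if a bash of version 4.4 or
# newer is on $PATH. BASH_VERSINFO is only expanded inside bash itself.
have_bash_44() {
    command -v bash >/dev/null 2>&1 || return 1
    bash -c '
        major=${BASH_VERSINFO[0]} minor=${BASH_VERSINFO[1]}
        [ "$major" -gt 4 ] || { [ "$major" -eq 4 ] && [ "$minor" -ge 4 ]; }
    '
}
```

A plugin’s %sh{} block could then call have_bash_44 and take the plain-sh path whenever it returns non-zero.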

Whoa, that’s amazing. I hadn’t heard of hyperfine before, but it’s pretty, and it lets you benchmark shell functions! That’s pretty great.

Unfortunately, it seems to gripe about timing anything that takes less than 5 ms, which is annoying since ideally all of these interpreters would start up in less than 5 ms, but at least it’s a more relatable number than “startups in 5 seconds”.

Anyway, thanks to hyperfine here’s the same interpreter startup times as before, plus the new ones people have mentioned:

Benchmark #1: bash -c ''
  Time (mean ± σ):       2.7 ms ±   0.4 ms    [User: 1.7 ms, System: 1.2 ms]
Benchmark #2: python3 -S -c ''
  Time (mean ± σ):      21.4 ms ±   0.7 ms    [User: 15.8 ms, System: 5.5 ms]
Benchmark #3: lua5.4 -e ''
  Time (mean ± σ):       2.4 ms ±   0.3 ms    [User: 1.5 ms, System: 1.1 ms]
Benchmark #4: perl -e ''
  Time (mean ± σ):       3.8 ms ±   0.4 ms    [User: 1.9 ms, System: 2.2 ms]
Benchmark #5: gawk 'BEGIN {}'
  Time (mean ± σ):       3.9 ms ±   0.4 ms    [User: 1.9 ms, System: 2.2 ms]
Benchmark #6: mawk 'BEGIN {}'
  Time (mean ± σ):       1.7 ms ±   0.3 ms    [User: 1.1 ms, System: 0.8 ms]
Benchmark #7: sed -e '' </dev/null
  Time (mean ± σ):       2.4 ms ±   0.4 ms    [User: 1.4 ms, System: 1.2 ms]
Benchmark #8: /bin/dash -c ''
  Time (mean ± σ):       1.3 ms ±   0.4 ms    [User: 1.0 ms, System: 0.6 ms]
Benchmark #9: python3 -c ''
  Time (mean ± σ):      74.5 ms ±   3.7 ms    [User: 61.7 ms, System: 11.8 ms]

Summary
  '/bin/dash -c ''' ran
    1.29 ± 0.46 times faster than 'mawk 'BEGIN {}''
    1.85 ± 0.60 times faster than 'lua5.4 -e '''
    1.85 ± 0.62 times faster than 'sed -e '' </dev/null'
    2.08 ± 0.68 times faster than 'bash -c '''
    2.90 ± 0.90 times faster than 'perl -e '''
    2.93 ± 0.93 times faster than 'gawk 'BEGIN {}''
   16.22 ± 4.85 times faster than 'python3 -S -c '''
   56.86 ± 13.73 times faster than 'python3 -c '''
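To line the earlier “launches in five seconds” counts up with these per-run means, divide five seconds by the count. Here is a tiny sketch in pure shell arithmetic, working in integer microseconds since POSIX sh has no floating point; per_launch_us is a hypothetical name.

```shell
# Hypothetical helper: turn "N launches in S seconds" (S defaults to 5)
# into whole microseconds per launch. POSIX sh arithmetic is integer-only,
# so the result is truncated rather than rounded.
per_launch_us() {
    count=$1
    seconds=${2:-5}
    echo $(( seconds * 1000000 / count ))
}
```

For example, per_launch_us 3398 prints 1471 (about 1.5 ms per dash launch) and per_launch_us 54 prints 92592 (about 93 ms per Python launch), in the same ballpark as the hyperfine means of 1.3 ms and 74.5 ms above.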

Mostly out of curiosity, I’ve been screwing around with implementing my first plugin (mru-files) in pure POSIX shell, and wanted to see how far one can go without resorting to external utilities. It turns out you can emulate a lot of text processing (head, grep, simple awk / cut) using just shell pipes and subshells, which should be really fast, and which is where the shell still shines compared to other languages.

I’ve extracted this shell code into a reusable library / kak module that can be require'd by other shell-based plugins: k9s0ke-shlib. Maybe I’ll announce it separately (I hope I’m not completely off the mark doing this; writing shell sounds so… 1970s).
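To give a flavor of that kind of emulation, here is a minimal sketch of head -n N and a fixed-string grep written with nothing but read, case and printf. It is not code from k9s0ke-shlib; sh_head and sh_grep are hypothetical names, and they use global variables because POSIX sh has no local.

```shell
# Hypothetical stand-in for `head -n N`: print the first $1 lines of stdin.
sh_head() {
    n=$1
    while [ "$n" -gt 0 ]; do
        # Stop at end of input, but still print a final unterminated line.
        IFS= read -r line || [ -n "$line" ] || break
        printf '%s\n' "$line"
        n=$((n - 1))
    done
}

# Hypothetical stand-in for `grep -F`: print lines containing $1 literally.
sh_grep() {
    while IFS= read -r line || [ -n "$line" ]; do
        # Quoting "$1" makes *, ? and [ in the needle match literally.
        case $line in
            *"$1"*) printf '%s\n' "$line" ;;
        esac
    done
}
```

IFS= together with read -r keeps leading whitespace and backslashes intact, and the extra [ -n "$line" ] test handles a last line with no trailing newline.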


That’s pretty cool, but if you make a shell library for helping with Kakoune scripts, I heartily recommend adding functions to Kakoune-quote and shell-quote strings, based on the “multi_builtin_quoter_all_backslashes_ntmp” function in this issue.


Thanks for the suggestion, @Screwtapello. I commented on that issue (though it’s closed): I think you get even better performance (around 15%) if you build up the string incrementally and printf it at the end. I’ll probably add the functions to my… humble plugin… though it seems like something Kakoune should provide, for reasons of trust and convenience.

See my kak_quoter_benchmark.sh snippet, function multi_builtin_quoter_allbkslash_nt1p (there’s also a recursive *_rec version just for completeness; it’s a bit worse than _ntmp and requires an auxiliary function)
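For reference, here is a minimal sketch of such quoting helpers in plain POSIX sh. It is not the multi_builtin_quoter_* code from the linked issue or snippet (which isn’t reproduced here); it only assumes Kakoune’s rule that a ' inside a single-quoted string is written as '', and it follows the “build up the string incrementally and printf at the end” approach. The names quote_with, kak_quote and shell_quote are hypothetical.

```shell
# Hypothetical helper: wrap each argument in single quotes, replacing every
# embedded ' with the escape sequence given as $1, then print the result
# with a single printf.
quote_with() {
    esc=$1; shift
    out=
    sep=
    for arg in "$@"; do
        quoted=
        rest=$arg
        # Peel off one ' at a time, appending the text before it plus $esc.
        while :; do
            case $rest in
                *\'*)
                    quoted=$quoted${rest%%\'*}$esc
                    rest=${rest#*\'}
                    ;;
                *)
                    quoted=$quoted$rest
                    break
                    ;;
            esac
        done
        out=$out$sep\'$quoted\'
        sep=' '
    done
    printf '%s\n' "$out"
}

# Kakoune: a ' inside a '...' string is written as ''.
kak_quote() { quote_with "''" "$@"; }

# POSIX sh: close the quote, emit an escaped ', reopen the quote: '\''
shell_quote() { quote_with "'\\''" "$@"; }
```

For example, kak_quote "it's" prints 'it''s', and shell_quote "it's" prints 'it'\''s', which eval turns back into the original word.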


@Screwtapello maybe it’s a good time to look into babashka: it is a Clojure interpreter, compiled with GraalVM into a native binary, with a selection of Clojure and Java libraries baked in. It starts instantly and allows scripting in a very good (IMO) language, with a focus on immutability and concurrency, which actually makes it really good for writing robust programs in a functional style.

Lua is really great small language too, one of my favorite languages at the moment actually, thanks to Fennel.

Lua 5.3 and 5.4 have a built-in utf8 table with these functions: char, codepoint, codes, len, offset.

That’s because Lua doesn’t have arrays. In C, array indexing is pointer arithmetic, so 0-based indexing is meaningful there: the first element sits at offset 0 from the start of the memory block. In Lua, tables are not mapped to memory, so 0-based indexing makes little sense (similarly in AWK). Lua actually allows a 0 index in its tables without any problem, as well as negative indexes, which is very handy. Lua itself uses this feature:

$ echo "return print(arg[-1], arg[0], arg[1])" > l.lua
$ lua l.lua -i
lua     l.lua   -i

That’s because each Lua table combines a hashed associative part and a sequential indexed part, so negative indexes actually live in the associative part, but they are still iterable with a custom iterator.

If using anything but sh, won’t the startup cost include spawning sh itself (which then hands over to the other interpreter / binary)? Is there a way to avoid going through %sh{} to run commands?

Yes. As a general rule, Kakoune values implementation simplicity over efficiency. That’s not always true (the regex engine uses the more complex and more efficient finite-automata model rather than the more straightforward backtracking model), but it’s true here.

It’s not that big of a deal in general. I have a field in my status bar that shows the Unicode code-point under the cursor, and I do that in pure sh because it executes on every keystroke. On the other hand, I don’t care if :make launches some heavy-weight language to do output post-processing because it’s going to be a tiny fraction of the time taken by the compiler and linker.
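A pure-sh formatter for that kind of status-bar field can be a single printf call, which is a builtin in dash and bash, so no extra process is exec’d. This is a minimal sketch, assuming the code point arrives as a decimal integer (inside a %sh{} block, Kakoune’s $kak_cursor_char_value should provide one); format_codepoint is a hypothetical name.

```shell
# Hypothetical helper: format a decimal code point as U+XXXX (at least four
# uppercase hex digits) using only the builtin printf.
format_codepoint() {
    printf 'U+%04X\n' "$1"
}
```

For example, format_codepoint 252 prints U+00FC, the code point of ü.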

Obviously, but I was considering Lua for “on every keystroke”-type jobs. If startup cost doubles, that’s not so great.

Why not have an %exec{} that translates directly to execve() (with the specified args as argv[], obviously)? It wouldn’t change any of the reasons sh was chosen as the glue, and implementing it would take next to… well, nothing?
