I’ve been poking at my half-finished GTK UI for Kakoune, and I just hit a crash that surprised me. Basically, if I’m editing a Python file (with kak-lsp and the Python language server) and I paste a Unicode character into my document (I’ve been using ＠, the full-width @, but it’s not the only character that triggers this), the character appears, and then the screen updates and my GTK UI crashes because Kakoune sent it invalid UTF-8.
Specifically, the invalid fragment looks like this:
{ "face": { "fg": "default", "bg": "default", "attributes": [] }, "contents": "\xef" },
{ "face": { "fg": "red", "bg": "default", "attributes": [] }, "contents": "\xbc\xa0" },
Note that “\xef\xbc\xa0” is the UTF-8 encoding of ＠ (U+FF20).
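As a quick sanity check, here is a small Python sketch confirming that those three bytes are one character, and that neither of the two "contents" strings is valid UTF-8 on its own. The helper name `is_valid_utf8` is just something made up for this illustration.

```python
# U+FF20 FULLWIDTH COMMERCIAL AT, the character pasted into the document.
full_width_at = "\uff20"
encoded = full_width_at.encode("utf-8")
print(encoded)  # b'\xef\xbc\xa0'


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The two fragments from the message, checked separately and rejoined.
print(is_valid_utf8(b"\xef"))                # False: truncated sequence
print(is_valid_utf8(b"\xbc\xa0"))            # False: starts with a continuation byte
print(is_valid_utf8(b"\xef" + b"\xbc\xa0"))  # True: the complete character
```

So the bytes themselves are fine; the problem is only that a face boundary falls in the middle of the character.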
If I had to guess, I’d say that the Python language server is detecting an invalid identifier and reporting it in character coördinates, while something later (kak-lsp? Kakoune itself?) is interpreting those as byte coördinates, and trying to apply different syntax-highlighting to different bytes of a UTF-8-encoded stream.
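To illustrate that guess, here is a minimal Python sketch of how a character column misread as a byte offset would produce exactly this kind of split. It is only a model of the suspected bug, not code from kak-lsp or Kakoune; the example line, the column value, and both function names are assumptions invented for the illustration.

```python
def char_col_to_byte_offset(line: str, char_col: int) -> int:
    """Convert a character column into a byte offset in the UTF-8 encoding."""
    return len(line[:char_col].encode("utf-8"))


def split_at_byte(line: str, byte_offset: int) -> tuple[bytes, bytes]:
    """Split the UTF-8 bytes of a line where a face boundary would fall."""
    raw = line.encode("utf-8")
    return raw[:byte_offset], raw[byte_offset:]


# Hypothetical line: the full-width @ used where an identifier should be.
line = "\uff20 = 1"
char_col = 1  # assumed position reported in characters: just after the first one

# Suspected bug: the character column is used directly as a byte offset.
wrong = split_at_byte(line, char_col)
print(wrong)  # (b'\xef', b'\xbc\xa0 = 1')

# Converting to a byte offset first keeps the character in one piece.
right = split_at_byte(line, char_col_to_byte_offset(line, char_col))
print(right)  # (b'\xef\xbc\xa0', b' = 1')
```

The "wrong" split has the same shape as the fragment above: a lone "\xef" in one atom and "\xbc\xa0" starting the next.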
Has anybody else ever come across behaviour like this? Does anybody happen to know which software is at fault?