Call for testing: better position/range handling in kak-lsp


As you may already know, position handling in LSP is a bit of a problem in both clients and servers because of unnecessary inconsistency introduced by the spec. See for more details on the topic.

As for kak-lsp in particular, the way it currently handles Position.character is just to convert between 0-based and 1-based columns (LSP is 0-based, Kakoune is 1-based) and between exclusive (LSP) and inclusive (Kakoune) ranges. It doesn’t care what Position.character actually counts: bytes, code units or code points, in UTF-8, UTF-16 or whatever. But because Kakoune itself treats a column as a byte offset in most places, kak-lsp is effectively working with UTF-8 code units, i.e. plain byte offsets.
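To make the base and inclusiveness conversion concrete, here is a minimal sketch for a column range on a single line. The type and function names are hypothetical and not taken from the kak-lsp source; it also assumes single-byte characters and non-empty ranges.

```rust
/// Column range on one line as Kakoune reports it:
/// 1-based, and the end column is part of the selection.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct KakColumns {
    pub start: usize,
    pub end: usize,
}

/// Column range on one line as LSP expects it:
/// 0-based, and the end column is NOT part of the range.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct LspColumns {
    pub start: usize,
    pub end: usize,
}

/// Assumes `k.start >= 1` (Kakoune columns start at 1).
pub fn kak_to_lsp(k: KakColumns) -> LspColumns {
    LspColumns {
        // 1-based -> 0-based.
        start: k.start - 1,
        // -1 for the base change, +1 to make the end exclusive:
        // the two adjustments cancel out.
        end: k.end,
    }
}

/// Assumes a non-empty range (`l.end > l.start`); an empty LSP range
/// has no direct inclusive equivalent.
pub fn lsp_to_kak(l: LspColumns) -> KakColumns {
    KakColumns {
        start: l.start + 1,
        end: l.end,
    }
}
```

For example, selecting the first three characters of a line is columns 1 to 3 in Kakoune and the range 0 to 3 in LSP. Nothing here looks at the buffer content, which is exactly why this approach alone cannot tell bytes from code points.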

It works well with language servers which violate the protocol in the same way (e.g. pyls or bingo) but leads to problems like or when the language server conforms to the protocol or violates it in a different way (e.g. RLS, which uses UTF-8 code points) as soon as a line contains characters outside the Basic Latin set.
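A small sketch of why the three ways of counting only disagree once a line leaves Basic Latin. The helper name is made up for illustration; it returns the offset of a character measured in bytes, in code points and in UTF-16 code units.

```rust
/// Offset of the first occurrence of `needle` in `line`, measured as
/// (UTF-8 bytes, Unicode code points, UTF-16 code units).
/// Returns `None` if `needle` does not occur in `line`.
pub fn offsets_of(line: &str, needle: char) -> Option<(usize, usize, usize)> {
    // `str::find` returns a byte index, which is what Kakoune uses for columns.
    let byte = line.find(needle)?;
    let prefix = &line[..byte];
    Some((
        byte,
        // What a server counting code points would report.
        prefix.chars().count(),
        // What a spec-conforming (UTF-16) server would report.
        prefix.encode_utf16().count(),
    ))
}
```

For the `x` in `"a b x"` all three offsets are 4. In `"a é x"` they are (5, 4, 4), because `é` takes two bytes in UTF-8. In `"a 😀 x"` they are (7, 4, 5), because the emoji lies outside the Basic Multilingual Plane and needs four UTF-8 bytes and two UTF-16 code units.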

In the branch I made a few steps towards solving that problem:

  1. A copy of the latest version of the buffer (which was already being sent to kak-lsp for the didOpen and didChange notifications) is now stored for further analysis.
  2. When a position/range arrives from the Kakoune side or is about to be sent to Kakoune, it is converted from/to a byte offset based on the buffer content treated as UTF-8.
  3. The default mode of conversion is to treat LSP Position.character as an offset in UTF-8 code points. It means that kak-lsp should now work with spec-conforming servers within the entire Basic Multilingual Plane (where one code point is exactly one UTF-16 code unit), and even outside it with language servers which violate the spec by using UTF-8 code points (e.g. RLS).
  4. For language servers which use UTF-8 code units (e.g. pyls) there is a new option which can be set in kak-lsp.toml: offset_encoding = "utf-8" (like this). Why just “utf-8” when a UTF-8 offset could be expressed in either code units or code points? Because it is inspired by this convention. However, please note that the offset_encoding option only influences kak-lsp behaviour and is not sent to the language server!
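The conversion in steps 2 to 4 can be sketched roughly as follows. This is not the code from the branch: the enum, the function names and the choice to clamp out-of-range offsets to the end of the line are all assumptions made for illustration. Offsets here are 0-based and relative to a single line.

```rust
/// How the language server counts `Position.character` (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum OffsetEncoding {
    /// Default mode: offset counted in code points.
    CodePoints,
    /// `offset_encoding = "utf-8"`: offset counted in bytes.
    Utf8Bytes,
}

/// Convert an LSP `Position.character` into a 0-based byte offset in `line`.
/// Out-of-range values are clamped to the line length (an assumption).
pub fn lsp_character_to_byte(line: &str, character: usize, enc: OffsetEncoding) -> usize {
    match enc {
        OffsetEncoding::Utf8Bytes => character.min(line.len()),
        OffsetEncoding::CodePoints => line
            .char_indices()
            .nth(character)
            .map(|(byte, _)| byte)
            .unwrap_or(line.len()),
    }
}

/// Convert a 0-based byte offset in `line` into an LSP `Position.character`.
pub fn byte_to_lsp_character(line: &str, byte: usize, enc: OffsetEncoding) -> usize {
    match enc {
        OffsetEncoding::Utf8Bytes => byte.min(line.len()),
        // Count the characters that start before the given byte offset.
        OffsetEncoding::CodePoints => line
            .char_indices()
            .take_while(|(start, _)| *start < byte)
            .count(),
    }
}
```

With the line `"a é x"`, the `x` sits at byte offset 5. In the default mode that maps to character 4 and back to byte 5; with the byte-based mode the value passes through unchanged. Converting to Kakoune's 1-based columns would then be a separate step on top of the byte offset.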

This branch is not ready to be merged into master yet, because I want to write a bit more documentation and add a few unit tests; a few places also need a decision about how graceful the error handling should be. But it is functionally complete, and I’d like to ask you the favour of trying it with your typical workflow if you are keen. There are several things you can do:

  1. If you like Rust and want to spend time reading kak-lsp code, I’d appreciate a code review.
  2. If your projects don’t use characters outside of Basic Latin, please check that kak-lsp works for you as before.
  3. If your projects do use characters outside of Basic Latin, kak-lsp should now work with them properly within the Basic Multilingual Plane! Beware that you might need to add offset_encoding = "utf-8" to the config if your language server uses bytes to encode Position.character.

Thanks for the help!


Sure! I’m going to test it; the software project I’m working on has a lot of strings with non-Latin symbols.

Thanks for your hard work, @ulis!