Tree editors

One of Kakoune’s great strengths is the combination of multi-selections and powerful text objects. In short, one could say that makes it a good fit for structured editing.

But what if the concept could go even further, like coding directly on the AST instead of raw text?
This article attempts to give an overview of all the existing solutions in this field and I think we may find some interesting bits:


When I was writing Clojure, I was infinitely delighted by operating on the AST with ParEdit; it was a productivity multiplier for juggling code, and I miss it a lot. I would totally love to do the same in other languages. Now I even have an itch to start implementing such a plugin on top of, say, tree-sitter.

At one extreme, you have editors like JetBrains IDEs that have a full and deep understanding of a particular language and can do very powerful things very easily, but require a team of developers to implement and maintain for each language.

At the other extreme, you have Kakoune, which knows almost nothing about any language, but has a bunch of primitives like <a-w> and ]p and regexes that can be composed for each editing task. Kakoune’s primitives are versatile, but it’s always bothered me that Kakoune (and Vim before it) are so heavily based on regexes when they’re often used to edit structured data, and regexes can’t match structured data. Sure, Kakoune provides some special-case structured primitives like m and ]c, but it’s easy to fool m with grouping characters inside strings.
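To make the "grouping characters inside strings" problem concrete, here is a minimal sketch in Python. It is not how Kakoune implements m; both functions are hypothetical, and the string handling assumes double-quoted strings with backslash escapes only. The first matcher counts raw characters, the second tracks whether it is inside a string literal.

```python
def naive_match(text, open_pos):
    """Find the ')' matching text[open_pos] by counting raw characters only."""
    depth = 0
    for i in range(open_pos, len(text)):
        if text[i] == "(":
            depth += 1
        elif text[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    return None


def string_aware_match(text, open_pos):
    """Same search, but ignore parentheses inside double-quoted strings.

    Assumption: strings are delimited by '"' and use backslash escapes.
    """
    depth = 0
    in_string = False
    i = open_pos
    while i < len(text):
        ch = text[i]
        if in_string:
            if ch == "\\":
                i += 1  # skip the escaped character
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i
        i += 1
    return None


code = 'print(")")'
fooled = naive_match(code, 5)         # stops at the ')' inside the string
correct = string_aware_match(code, 5)  # reaches the real closing ')'
```

Even the second version only knows one string syntax, which is the point of the complaint: every language would need its own special cases unless the editor works from a real parse tree.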

I’d like to try (maybe not use, but try) an editor that is to parse-trees what Kakoune is to regexes - you’d need a syntax for defining grammars not much more complex than regex syntax, operators to find text chunks matching a grammar, that kind of thing. Even better, if you had a grammar for some particular file-format, you could use that for syntax highlighting, or to allow structural navigation (imagine consecutive presses of a key selecting the surrounding word, then the surrounding expression, then the surrounding if block…). You could even let the user reference existing syntax nodes in ad-hoc grammar expressions, so you could say “select string literals” without having to on-the-fly invent a regex for this particular language’s string syntax.

I imagine the result could be something like tree-sitter, but going by its “Creating Parsers” documentation, writing a grammar is a lot more heavy-weight than writing a regex.

I have an idea about annotating the buffer with parse-tree information (which is a bit more basic than the highlighters), then the highlighters can feed off that, and so can text objects. It’s kind of the tree-sitter thing. Basically, some range is spat out by the parse tool saying like, from this point to this point is a function definition (aka ‘F’), then <a-a>F will work inside it (or just before it).

I have some specific ideas on how to store the parse tree information (in a “triplet store”). But I’m still thinking this through, and I’m still building my basic Clojure environment…

Okay, I’ve got a PoC of using tree-sitter with the buffer content and cursor position provided by Kakoune.

Now it’s time to decide:

  1. How to map a text object to a shell call, to save the plugin from duplicating a lot of Kakoune’s core functionality. @mawww what do you think?
  2. How to group node kinds into text objects because exposing them all could be a bit overwhelming. Example of node kinds emitted by tree-sitter for Rust: function_item, identifier, parameters, parameter, type_identifier, block, line_comment, let_declaration, mutable_specifier, call_expression, scoped_identifier, arguments, reference_expression etc. etc. etc.

Would be nice to come up with several essential groups of node kinds and then map text objects to them. That would allow nice structural selection and editing in Kakoune.

Part of the reason Kakoune has such a rich array of text objects is because different languages use different patterns to describe their structure. Since Kakoune doesn’t have hard-coded knowledge of any language, its text objects need to be (a) generic enough to be useful in many situations, and (b) small enough in number that a human doesn’t suffer analysis-paralysis trying to decide which one to use.

When you do have hard-coded knowledge of a language, the trade-offs are very different. Just off the top of my head, I’d like to see “objects” like:

  • select the sibling object following each selected object
  • select the sibling object preceding each selected object
  • select the first child object of each selected object
  • select the immediate parent object of each selected object
  • repeatedly select the immediate parent of the deepest selected object until every selection has the same depth
  • select the node whose path is the longest common prefix of the paths to all the selections
  • check all the paths to all the selections, and from all the node-kinds that are common to all paths:
    • pick the node-kind that (on average) appears deepest in each path
    • for each selection, select the nearest ancestor node of that kind (function call, expression, statement, etc.)
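Several of the operations in this list can be sketched in a few lines of Python if each selected node is identified by its path from the root (a tuple of child indices). This is only an illustration under that assumption: the Node class and every function name below are hypothetical and not taken from tree-sitter or any plugin.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    children: list = field(default_factory=list)


def node_at(root, path):
    """Follow a tuple of child indices down from the root."""
    node = root
    for index in path:
        node = node.children[index]
    return node


def parent(path):
    """Immediate parent; the root is its own parent here."""
    return path[:-1] if path else path


def next_sibling(root, path):
    if not path:
        return None
    siblings = node_at(root, path[:-1]).children
    return path[:-1] + (path[-1] + 1,) if path[-1] + 1 < len(siblings) else None


def prev_sibling(path):
    return path[:-1] + (path[-1] - 1,) if path and path[-1] > 0 else None


def first_child(root, path):
    return path + (0,) if node_at(root, path).children else None


def common_ancestor(paths):
    """Longest common prefix of the paths to all the selections."""
    prefix = []
    for steps in zip(*paths):
        if len(set(steps)) != 1:
            break
        prefix.append(steps[0])
    return tuple(prefix)


def level_to_same_depth(paths):
    """Climb from the deepest selections until every selection has the same depth."""
    shallowest = min(len(p) for p in paths)
    return [p[:shallowest] for p in paths]


# A made-up tree using node kinds of the sort tree-sitter emits for Rust.
tree = Node("function_item", [
    Node("identifier"),
    Node("parameters", [Node("parameter"), Node("parameter")]),
    Node("block", [
        Node("let_declaration"),
        Node("call_expression", [Node("identifier"), Node("arguments")]),
    ]),
])

shared = common_ancestor([(2, 1, 0), (2, 1, 1), (2, 0)])
shared_kind = node_at(tree, shared).kind  # the enclosing "block"
```

With paths as the representation, multi-selection falls out naturally: each operation is just applied to every path in the selection list.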

To keep the process open from the very beginning:

At the moment only one command is available, tree-select-node. This command extends each selection to the range of the deepest node that encloses it, with one exception: if the selection already covers a node exactly, it is extended to the parent node. This lets you grow a selection’s scope by just repeating the command. Selection directions are not preserved yet.
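The rule described here can be sketched as follows. This is a Python illustration of the described behaviour, not kak-tree’s actual code: the Node class, the half-open offset ranges, and the function name grow_selection are all assumptions. One extra assumption is that ancestors with the same range as the selection are skipped too, so that repeating the step always makes progress.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    start: int  # inclusive offset
    end: int    # exclusive offset
    children: list = field(default_factory=list)


def grow_selection(root, start, end):
    """Extend (start, end) to the deepest enclosing node, or to an ancestor
    if the selection already covers that node exactly."""
    if not (root.start <= start and end <= root.end):
        return (start, end)  # outside the tree: leave the selection alone
    # Collect the chain of nodes enclosing the selection, root first.
    chain = [root]
    while True:
        node = chain[-1]
        child = next(
            (c for c in node.children if c.start <= start and end <= c.end),
            None,
        )
        if child is None:
            break
        chain.append(child)
    # Walk outward from the deepest node, skipping exact matches.
    for node in reversed(chain):
        if (node.start, node.end) != (start, end):
            return (node.start, node.end)
    return (start, end)  # already covering the root


# Made-up ranges for a small function.
doc = Node("source_file", 0, 40, [
    Node("function_item", 0, 30, [
        Node("block", 10, 30, [
            Node("call_expression", 12, 20, [Node("identifier", 12, 13)]),
        ]),
    ]),
])

selection = (12, 12)  # a bare cursor
steps = []
for _ in range(5):
    selection = grow_selection(doc, *selection)
    steps.append(selection)
```

Repeating the step walks the selection outward through identifier, call_expression, block, function_item and finally the whole file.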



That is super-cool. I built it just to try it out, and repeatedly running :tree-select-node works beautifully.

However, then I tried:

map global object <a-a> ': tree-select-node<ret>'

…and then when I hit <a-a><a-a>, the cursor moves to the end of the line, and Kakoune says “no selections remaining”.

This must be some quirk of mapping in the object semi-mode.

Does object mode work with arbitrary commands? I’m not sure.

One thing I see as an immediate improvement for tree-select-node is to allow blacklisting/whitelisting node kinds, so that expansion happens only for scopes that matter.

Does object mode work with arbitrary commands? I’m not sure.

Apparently it does, but user mode is the only one where Kakoune explicitly goes back to normal mode before executing the mapping. This works fine:

map global object <a-a> '<esc>: tree-select-node<ret>'

Another thing that surprised me: when you select an entire (Rust) function and activate tree-select-node again, it doesn’t select the whole file. I assumed there would be an implicit “rust-document” top-level node covering the entire file, but I guess not?

One thing I see as an immediate improvement for tree-select-node is to allow blacklisting/whitelisting node kinds, so that expansion happens only for scopes that matter.

You want to be able to jump straight from (say) identifier to statement, without all the intermediate expression nodes?

But it isn’t a real text object, is it? I guess it doesn’t allow you to do e.g. ] with it; it always just results in tree-select-node. It would be nice to have some way to use that selection as a real text object, so the plugin doesn’t have to duplicate Kakoune’s operations on objects.

There is an implicit root node, I believe it’s just a bug in kak-lsp which prevents entire buffer selection.

Yes, exactly.

Update on kak-tree:

  • Added tree-select-next-node and tree-select-prev-node for sibling selection. These commands select the sibling instead of extending the selection to it; extending will be added later.
  • Added support for per-filetype white/blacklisting node kinds to skip uninteresting selection steps.
  • Added tree-node-sexp for tree-parser output inspection.
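The effect of filtering node kinds can be sketched like this in Python. The filter table and the function name are hypothetical and do not reflect kak-tree’s real configuration format; the sketch only shows how a per-filetype whitelist or blacklist lets a selection step skip uninteresting ancestors.

```python
# Hypothetical per-filetype filters; kak-tree's actual config may look different.
FILTERS = {
    "rust": {
        "whitelist": None,  # None means "every kind is allowed"
        "blacklist": {"call_expression", "arguments", "reference_expression"},
    },
}


def first_interesting(ancestor_kinds, filetype, filters=FILTERS):
    """Return the nearest ancestor kind that passes the filetype's filters.

    ancestor_kinds is ordered from the innermost node outward.
    """
    rules = filters.get(filetype, {})
    whitelist = rules.get("whitelist")
    blacklist = rules.get("blacklist") or set()
    for kind in ancestor_kinds:
        if kind in blacklist:
            continue
        if whitelist is not None and kind not in whitelist:
            continue
        return kind
    return None


# Ancestors of an identifier inside `let x = f(&y);`, innermost first.
ancestors = ["arguments", "call_expression", "let_declaration", "block", "function_item"]
target = first_interesting(ancestors, "rust")
```

With these example filters the step from an identifier lands on the let_declaration directly, without stopping at the intermediate expression nodes.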

@ulis will you expose kak-tree to provide highlighting, and later folds?

I think that if tree-sitter can be used the same way language servers are, it would be good to have a plugin for Kakoune, since tree-sitter can provide neat ways to navigate, plus all the semantic information for highlighting. More than that, since tree-sitter is something of a pioneer here, I think at some point there will be other solutions built on the same idea, so maybe Kakoune needs to be ready to support such things.

Tbch I am not interested in syntax highlighting and folds as I don’t use them, and they sound orthogonal enough to be implemented as separate plugins even if they all will be based on tree-sitter.

That said, I can imagine the reason to have a tree-sitter-based Swiss Army knife instead of a collection of tools, which is efficiency:

  1. Including a bunch of tree-sitter parsers in a binary blows it up to dozens of megabytes. Why do that three times?
  2. Having separate plugins for structural selection, syntax highlighting and folds would lead to triple parsing on every buffer change and triple memory consumption, especially if using incremental parsing which requires holding buffer’s tree in memory during the entire buffer life cycle.

I would be happy to make an effort and convert kak-tree into a platform suitable for the efficient integration of tree-sitter-based functionality. I just don’t have enough resources, both time-wise and interest-wise, to build things like syntax highlighting and folds on top of it; someone else should take ownership of those features.