Beyond Vim and Emacs: A Scalable UI Paradigm


I found this 15-minute video, presented at EmacsConf 2020 last weekend, quite mesmerizing.

The narrator starts by explaining that there’s a ground level, insert mode, where almost everything you type is reflected on screen, and a meta level above it, normal mode, where pressing keys triggers commands. Nothing fancy here, just a recap of how modal editors like Kakoune work.

This leads to a reflection that many of us have probably had about the “kitchen sink” nature of normal mode. Then comes a simple principle: if you can name something on screen, it’s a noun, and therefore it’s a mode. The examples are “word mode”, “paragraph mode”, “sentence mode”, “window mode”… and in each of them, pressing h, j, k, l has the same predictable behavior.
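To make the “every noun is a mode” idea concrete, here is a minimal Python sketch, assuming the buffer is a plain string. `SPLITTERS`, `STEPS` and `NounMode` are hypothetical names. Treating h/k as “previous unit” and j/l as “next unit” is a simplifying assumption, not how the talk or Kakoune defines these keys. The point is that only the splitter changes from mode to mode; the keys keep their meaning.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical splitters: each nameable "noun" becomes a mode that knows
# how to cut the buffer into units of that noun.
SPLITTERS: Dict[str, Callable[[str], List[str]]] = {
    "word": lambda text: re.findall(r"\S+", text),
    "line": lambda text: text.split("\n"),
    "paragraph": lambda text: [p for p in re.split(r"\n\s*\n", text) if p.strip()],
}

# The same four keys mean the same thing in every mode (a simplification:
# h/k select the previous unit, j/l the next one).
STEPS = {"h": -1, "k": -1, "j": 1, "l": 1}


@dataclass
class NounMode:
    noun: str
    index: int = 0  # position of the currently selected unit

    def press(self, key: str, text: str) -> str:
        units = SPLITTERS[self.noun](text)
        # Clamp so that pressing past either end keeps the first/last unit.
        self.index = max(0, min(len(units) - 1, self.index + STEPS[key]))
        return units[self.index]
```

For example, with the text `"one two\nthree\n\nfour five"`, pressing j in word mode selects `"two"`, while pressing j in paragraph mode selects `"four five"`.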

This is where it gets interesting, and meta. If “mode” is itself a noun, it can be manipulated, so there is a mode mode. In the author’s jargon, this stack of modes forms a Tower (the parallel in Kakoune is the mode stack with its associated push/pop hooks). And you can go one step further by entering tower mode to organize your towers…
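The tower can be modeled as a plain stack of modes with hooks that fire on every push and pop. The following toy Python sketch uses hypothetical names: `ModeStack`, `hooks` and the ground-level default are assumptions. It only loosely mirrors Kakoune’s mode stack and is not Kakoune’s API.

```python
from typing import Callable, List


class ModeStack:
    """Toy model of a mode stack (a 'Tower' in the talk's terms)."""

    def __init__(self, base: str = "insert") -> None:
        self.stack: List[str] = [base]
        # Each hook receives (action, old_mode, new_mode).
        self.hooks: List[Callable[[str, str, str], None]] = []

    @property
    def current(self) -> str:
        return self.stack[-1]

    def _fire(self, action: str, old: str, new: str) -> None:
        for hook in self.hooks:
            hook(action, old, new)

    def push(self, mode: str) -> None:
        old = self.current
        self.stack.append(mode)
        self._fire("push", old, mode)

    def pop(self) -> str:
        # Never pop the ground level: there is always a mode to be in.
        if len(self.stack) == 1:
            return self.current
        old = self.stack.pop()
        self._fire("pop", old, self.current)
        return self.current
```

Registering a hook such as `lambda action, old, new: log.append(f"{action}:{old}:{new}")` records entries like `push:normal:word`, which resembles the parameter format of Kakoune’s `ModeChange` hook. In “mode mode”, the units being selected would simply be the entries of `stack`, which is what makes the tower itself something you can edit.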

I still need time to digest these concepts and see how this kind of reasoning could lead to practical applications in text editing, but nonetheless, this meta journey was quite fun.


This… just… I am going to sit and think for a bit.

I like it (although the meta-planes got a bit too philosophical while not seeming very powerful).

Instead of going linearly from window to buffer to line to char to insert, one step at a time, I think it makes more sense to jump directly to the mode you want. Then we end up with Kakoune user modes. The philosophical shift, though, is that normal mode would no longer have a special status: all user modes would be on an equal footing.

Btw, did you also hear the sound of a pet drinking water towards the end of the video? :smiley:

My only addition is that I’m not sure about a linear set of modes. A tree of modes seems more interesting. For example, while in word mode, I might press Escape to go up to normal mode, and from there drill directly into any of several modes: word, paragraph, line, etc.
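A tree of modes can be modeled as a parent map. Escape climbs one level, and any child of the current mode can be entered directly. The following minimal Python sketch uses a hypothetical `ModeTree` class and a hypothetical `PARENT` table; the `char` child under `word` is only there to show deeper drilling.

```python
from typing import Dict, List, Optional


class ModeTree:
    """Toy tree of modes: Escape goes to the parent, children are entered directly."""

    def __init__(self, parent: Dict[str, Optional[str]], root: str) -> None:
        self.parent = parent
        self.current = root

    def children(self, mode: str) -> List[str]:
        return [m for m, p in self.parent.items() if p == mode]

    def escape(self) -> str:
        up = self.parent[self.current]
        if up is not None:  # the root has no parent: stay put
            self.current = up
        return self.current

    def enter(self, mode: str) -> str:
        if mode not in self.children(self.current):
            raise ValueError(f"{mode!r} is not reachable from {self.current!r}")
        self.current = mode
        return self.current


# Hypothetical layout: normal is the root; word/line/paragraph hang off it.
PARENT: Dict[str, Optional[str]] = {
    "normal": None,
    "word": "normal",
    "line": "normal",
    "paragraph": "normal",
    "char": "word",
}
```

Unlike a linear tower, any sibling is at most two keystrokes away: Escape up to the shared parent, then drill down.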

Wow, the idea is really inspiring. It just gave me an idea that I really want to try in my config. Here’s the systrace of my idea:

  • At first, I wondered whether the paradigm (word mode, paragraph mode) could translate directly into very efficient key bindings.
  • But probably not. It would be annoying to have to type two keys to select a single word (enter word mode, then hjkl).
  • There’s a workaround, though. For example, we could let w perform two actions at once: select the next word, and enter word mode.
  • But wait, we don’t need so many modes to achieve that! We can modify the behavior of hjkl as follows:
    • Normally, they work as usual: one character/line in the respective direction.
    • But if the last command before pressing hjkl happens to be another selection command, like w, then hjkl still select something in the respective direction, but by the same unit as that last command. For example, wlll is equivalent to wwww.
  • What’s the benefit, then? One obvious benefit is that hjkl are the most comfortable keys to press, so when you are selecting many words/paragraphs interactively (where it’s hard to know in advance exactly how much to select), this would make the operation easier.
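The “hjkl reuse the last selection unit” rule can be sketched as a function that expands keystrokes into (unit, direction) moves. This is a minimal Python sketch. `SELECTORS`, `HJKL` and `expand` are hypothetical, and treating only w/b as unit-carrying selection commands is an assumption made to keep it short.

```python
from typing import List, Optional, Tuple

Move = Tuple[str, int]  # (unit, direction)

# Hypothetical selection commands that carry a unit: w = next word, b = previous word.
SELECTORS = {"w": ("word", 1), "b": ("word", -1)}
# Usual meaning of hjkl: one character or one line in the respective direction.
HJKL = {"h": ("char", -1), "l": ("char", 1), "k": ("line", -1), "j": ("line", 1)}


def expand(keys: str) -> List[Move]:
    moves: List[Move] = []
    sticky_unit: Optional[str] = None  # unit of the most recent selection command
    for key in keys:
        if key in SELECTORS:
            unit, direction = SELECTORS[key]
            sticky_unit = unit
            moves.append((unit, direction))
        elif key in HJKL:
            default_unit, direction = HJKL[key]
            # hjkl reuse the last selection unit, else keep their usual meaning.
            moves.append((sticky_unit or default_unit, direction))
        else:
            sticky_unit = None  # any other command ends the streak
    return moves
```

With this rule, `expand("wlll")` and `expand("wwww")` both produce four forward word moves. A plain `jj` still moves two lines.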

This makes the most sense to me. If you start messing around with nested modes, you’ll lose track of how many times you have to hit Escape/Enter to reach the mode you want. I feel like the key bindings should simply be something like:

  • <leader>w to enter word-mode
  • <leader>l to enter line-mode
  • <leader>p to enter paragraph-mode
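The flat alternative gives every mode a single leader binding, reachable from any other mode. The following minimal Python sketch models the bindings listed above; `LEADER_BINDINGS` and `switch_mode` are hypothetical names. In Kakoune itself this would be expressed with `declare-user-mode` and `map`, not with code like this.

```python
from typing import Dict

# Hypothetical bindings mirroring the list above.
LEADER_BINDINGS: Dict[str, str] = {"w": "word", "l": "line", "p": "paragraph"}
LEADER = "<leader>"


def switch_mode(current: str, keys: str) -> str:
    """Jump straight to the bound mode from wherever we are; no Escape counting."""
    if keys.startswith(LEADER):
        target = LEADER_BINDINGS.get(keys[len(LEADER):])
        if target is not None:
            return target
    return current  # unbound sequence: stay in the current mode
```

The result depends only on the keys pressed, never on the path taken to the current mode, which is what keeps this scheme predictable.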

In fact, this looks pretty easy to do in Kakoune thanks to user modes.

I don’t favor changing behavior based on the last command entered, because predictability is one of the traits I value most in an editor, and I find it much easier to keep track of which mode I’m in than of which command I entered last.

Check this out. I just wrote a user mode for lines and paragraphs. It feels so smooth to be able to select lines/paragraphs with j/k. I haven’t implemented moving paragraphs around yet, but moving lines with J/K just feels so good.

The one weird thing though is that with paragraphs you can’t really select a left/right paragraph, so h/l don’t really make sense.

Indeed. There’s one thing I forgot to mention: the “based on last command” hjkl behavior should be subject to a timeout. That way the behavior would be much more intuitive.
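A timeout can be added by remembering when the last selection command, or the last hjkl press, happened. The following minimal Python sketch uses a hypothetical `StickyUnit` class. Timestamps are passed in explicitly to keep it testable; a real editor would read a monotonic clock such as `time.monotonic()`. Refreshing the window on each hjkl press is a design assumption, so that a run like wlll does not expire mid-streak.

```python
from typing import Optional


class StickyUnit:
    """Remembers the unit of the last selection command for `timeout` seconds."""

    def __init__(self, timeout: float = 1.0) -> None:
        self.timeout = timeout
        self.unit: Optional[str] = None
        self.stamp = 0.0  # time of the last selection command or hjkl press

    def selection_command(self, unit: str, now: float) -> None:
        self.unit = unit
        self.stamp = now

    def hjkl_unit(self, key: str, now: float) -> str:
        if self.unit is not None and now - self.stamp <= self.timeout:
            self.stamp = now  # each hjkl press keeps the window open
            return self.unit
        self.unit = None  # timed out: fall back to the usual meaning
        return "char" if key in "hl" else "line"
```

After a pause longer than the timeout, hjkl quietly return to moving by character or line, so the unusual behavior only appears right after a selection command.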

Maybe h/l could swap paragraphs? When I rewrote my tmux config, I ended up binding <cmd>h and <cmd>l to move tabs, and <cmd>j and <cmd>k to swap tabs. I duplicated this for Chrome, and in both cases it’s very useful.

I think J/K make the most sense for swapping paragraphs, just as they swap lines.
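Once the buffer is cut into units, moving a line or a paragraph with J/K is the same operation: swap the selected unit with its neighbor. This is a minimal Python sketch. `swap_with_neighbor`, `move_line` and `move_paragraph` are hypothetical names, and the sketch assumes paragraphs are separated by exactly one blank line.

```python
from typing import List, Tuple


def swap_with_neighbor(units: List[str], index: int, direction: int) -> Tuple[List[str], int]:
    """Swap units[index] with the unit `direction` steps away (-1 = up/K, +1 = down/J)."""
    target = index + direction
    if not 0 <= target < len(units):
        return list(units), index  # already at the edge: nothing to swap
    swapped = list(units)
    swapped[index], swapped[target] = swapped[target], swapped[index]
    return swapped, target  # the selection follows the moved unit


def move_line(text: str, index: int, direction: int) -> Tuple[str, int]:
    lines, new_index = swap_with_neighbor(text.split("\n"), index, direction)
    return "\n".join(lines), new_index


def move_paragraph(text: str, index: int, direction: int) -> Tuple[str, int]:
    # Assumes paragraphs are separated by exactly one blank line.
    paragraphs, new_index = swap_with_neighbor(text.split("\n\n"), index, direction)
    return "\n\n".join(paragraphs), new_index
```

Because both commands share one helper, J/K behave identically in line mode and paragraph mode, which is the consistency argued for above.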


As for h/l in paragraph mode, I would map h to <a-s> and l to <a-s>gi, or something like that.
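In Kakoune, <a-s> splits each selection on line boundaries, and gi goes to the first non-blank character of the line. In paragraph mode, these bindings would therefore make h drop down into the paragraph’s lines. Below is a rough Python model over character offsets. The names `split_on_lines` and `first_non_blank` are hypothetical. It approximates, rather than reproduces, Kakoune’s exact selection semantics.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open [start, end) offsets into the buffer


def split_on_lines(text: str, sel: Range) -> List[Range]:
    """Rough model of <a-s>: one selection per line inside `sel` (newline included)."""
    start, end = sel
    ranges: List[Range] = []
    line_start = start
    while line_start < end:
        newline = text.find("\n", line_start, end)
        line_end = end if newline == -1 else newline + 1
        ranges.append((line_start, line_end))
        line_start = line_end
    return ranges


def first_non_blank(text: str, sel: Range) -> Range:
    """Rough model of gi: collapse to the first non-blank character of the line."""
    pos = sel[0]
    while pos < sel[1] and text[pos] in " \t":
        pos += 1
    return (pos, pos + 1)
```

Applying `first_non_blank` to every range returned by `split_on_lines` approximates <a-s>gi: one cursor at the start of each line of the paragraph.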