Kakoune, Accessibility (Merged Threads)

I am losing my vision, which has motivated me to consider the best way to write code while blind. There are a lot of perspectives on this in the accessibility community, but generally it involves the popular tools (VoiceOver and NVDA) and heavy doses of autocomplete assistance in tools like VS Code.

This works well enough and I am exploring it. But the more I considered it, the more I thought a light scripted layer built around “say” (available on macOS), which lets you pipe text to it to be read aloud, might be worth more for a code-focused workflow. Certain things could be done easily, like calling out indent depth, or custom names for code-specific things that you might not want elsewhere. It would allow announcements like “on selection 1/9, content foo”, “rotated selection”, “applied to 7 locations”, “opened insert on 7 lines”, and would allow new commands useful to blind people, like “read all selected lines”. I think terminal editors have often been overlooked by the accessibility community, as they have a tendency to just fill screens with text.
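To make the idea concrete, here is a hypothetical sketch (none of this exists yet, and it is macOS-only because of `say`) of piping selection feedback out of a Kakoune hook:

```kak
# Hypothetical sketch: speak basic selection feedback through macOS `say`.
# NormalIdle fires once input goes quiet, so this reads the primary
# selection after each burst of movement. All wording is illustrative.
hook global NormalIdle .* %{ nop %sh{
    # e.g. "selection 1 of 9: foo"
    printf 'selection 1 of %s: %s\n' "$kak_selection_count" "$kak_selection" \
        | say > /dev/null 2>&1 &
}}
```

The background `&` and redirection keep Kakoune from blocking while the text is spoken.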

I am very new to this way of thinking, so I am mostly thinking out loud and do not speak for the accessibility community, but I am curious what ideas the community might have on how to use kakoune if your monitor was turned off…

Brainstorming ideas?


Hey @robertmeta, sad to hear you are losing your vision.

While I never designed Kakoune with blind coding in mind, I think the fundamental editing and extension model could be well adapted to that use case. The focus on a simple, predictable and scriptable set of normal mode commands should already go a long way, as Kakoune is already usable for complex editing tasks with reduced feedback.

Feedback is obviously still needed; for that, I hope the existing extension model can enable Kakoune to interact with tools such as a braille reader or a voice-based system. Where we might still be lacking good alternatives is for non-buffer content information, which is not always easily available or easy to send to external tools.

This is not something I have thought about for a long time, so these are just initial thoughts. In any case, if we can make it easier to use Kakoune without relying on a display, I think that’s worthwhile.


The simple macOS command “say” uses the system voice and intonation, so just piping text to it works great. The hard spots are around handling things like help and popup lists. The basic framing of the plugin will be just hooking events and piping data to say. Even idle time can be used to do neat stuff like emit status information. The more I look at it, the more I think Kakoune could be near the ideal editor for the blind, as its model of always knowing what you are working on is key. Additionally, the menu system could be ideal for screen readers as well… you are given the top-level context first, so you can hit the key if you know it, or you can wait for it to be read. I might be going too far, but I think the gap between where it is and near-ideal isn’t that far… Some things I am curious about the best way to do: file finding, etc. Possibly the best way for a blind user is to simply populate a buffer with grep-style output and let it be treated normally for opening files, which also makes going back to it easy. Lots of gaps, but I think I am going to start building out the basic tooling and see how it goes. I would love Kakoune to be the default editor of the vision impaired.
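As a sketch of the idle idea (all hypothetical, macOS-only), something like this could emit status information whenever Kakoune goes quiet in normal mode:

```kak
# Hypothetical: announce buffer name and position on idle, via macOS `say`.
# bufname, cursor_line and buf_line_count are built-in Kakoune values.
hook global NormalIdle .* %{ nop %sh{
    printf '%s, line %s of %s\n' \
        "$kak_bufname" "$kak_cursor_line" "$kak_buf_line_count" \
        | say > /dev/null 2>&1 &
}}
```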


Wow that’s rough, I’m sorry.

It looks like the Orca screen reader reads whatever text is drawn on the terminal.
For example whenever the modeline is redrawn, it reads that (doing that on every movement quickly gets annoying so I set an empty modeline).
Orca also reads the completion list and sometimes even info boxes.

Proper Kakoune integration sounds much better though.
A plugin probably wants hooks that have a widget’s text as argument. Luckily, there are only a few relevant widget types (I think):

  • info boxes
  • prompt line (echo/fail or navigating prompt history)
  • modeline (though it’s probably better to have shortcuts for the individual attributes)
  • insert/prompt completer
  • menu (not used that much AFAIK)

There are also highlighters like line-flags, but I can’t imagine them being necessary.

Somewhat related: here is someone who hacked Emacs to work impressively well with voice input. I found that really inspiring.

Another exciting idea is https://oskars.org/ , a braille keyboard for smartphones.

FWIW I use a (grayscale) eink screen because it’s easier on the eyes. I’ve been using this colorscheme lately:

# Black-on-white colorscheme for minimal distraction & maximal contrast.
# Works well with e-ink screens, also in sunlight.

# For Code
face global value default
face global type default
face global variable default
face global module default
face global function default
face global string default
face global keyword default
face global operator default
face global attribute default
face global comment default
face global documentation comment
face global meta default
face global builtin default

# For markup
face global title default
face global header default
face global mono default
face global block default
face global link default
face global bullet default
face global list default

# builtin faces
face global Default default,default
face global PrimarySelection black,rgb:cccccc+fg
face global SecondarySelection black,rgb:e0e0e0+fg
face global PrimaryCursor default,default+rfg
face global SecondaryCursor white,rgb:777777+fg
face global PrimaryCursorEol black,rgb:777777+fg
face global SecondaryCursorEol black,rgb:777777+fg
face global LineNumbers default,default
face global LineNumberCursor default,default+r
face global MenuForeground white,black
face global MenuBackground black,rgb:dddddd
face global MenuInfo default
face global Information black,white
face global Error white,black
face global DiagnosticError default
face global DiagnosticWarning default
face global StatusLine default,default
face global StatusLineMode default,default
face global StatusLineInfo default,default
face global StatusLineValue default,default
face global StatusCursor default,default+r
face global Prompt default,default+r
face global MatchingChar default,default+b
face global Whitespace default,default+fd
face global BufferPadding default,default

Yeah, the idea would be to disable the screen reader while in Kakoune, because Kakoune would provide all the needed feedback, making it possible to tune the output to the task. Only data piped to say would be heard, so it comes down to hooking things like which buffer has the input.

I don’t want to change the UI too much either, because when pairing it is good for the other developer to be able to see it regularly.

The other thing I am struggling with is the best way to REPL and do a test cycle; this might finally force me into using proper test-driven development! Generally I just run another terminal right next door and send commands to it, but with the screen reader off those changes would be invisible. Maybe just routing them to a file and reading the file would work.

I use :make for that, it’s very convenient because I can quickly cycle through stack traces.

Hi @robertmeta, I came across Microsoft’s accessibility resources a while back, and given your predicament you might find them useful in your quest for Kakoune and programming with accessibility services in mind.

I have not read the title; it is rather a resource for when my programming matures past algorithms and data structures. The title below is a high-level overview of software design with a focus on the .NET Framework.

You may enjoy it or not and that’s ok, it’s an idea from the other side of the fence which may help in your quest.

Microsoft 2009, ‘Engineering Software for Accessibility’, Microsoft Press, viewed 04 April 2022, https://www.microsoft.com/en-us/download/details.aspx?id=19262

I also have access to a large multinational database network at my university if you need anything (research material) just ask. Even ‘pm’ me with a bunch of defined keywords and I can screenshot you back a list of resources or a specific title to broaden your knowledge base.

Looking forward to Kakoune Accessibility, bye :wave:.

Hey all! I have been on an editor walkabout, learning what tools exist in the industry and how stuff is generally sewn together, and I have even contributed patches to Emacspeak! I am talking to some Neovimmers about getting an accessibility mode up and running there, and … you guys know me, I am starting a blind tech community Discourse and Discord server … but I digress.

Anyway, Emacs has some great features, and honestly Emacspeak is the best tool I have found for coding with vision impairments, but … it isn’t Kakoune! Armed with enough information to be dangerous, and mildly clueful now, I am likely to embark on (and look for assistance with) getting Kakoune all the way tooled up for the visually impaired.

Here are my random thoughts so far … as much for myself as you folks…

The Emacspeak model is very simple: text over stdin is how Emacspeak talks to its “servers”, which can be written in any language. It has a very simple wire protocol with many years of use, and multiple servers written to link to the various TTS engines. (TTS Servers, Emacspeak User’s Manual — 2nd Edition.)

At a basic level, it only does a few things…

  1. Speak text (letters or sentences, different speeds and intonations, etc)
  2. “Altered” text (special effects like echo not directly supported by the TTS engine, which requires using a library like SoX). I have become convinced this isn’t worth the complexity at this point.
  3. Playing “audio icons” which are simply sound files, aiff, ogg, mp3, wav, etc
  4. Playing “tones” – exactly what it sounds like, a tone.

That is basically it. One nice thing about this simple interface is that with a port forward and a bit of redirection it works over SSH!

So, I don’t think we need to reinvent the wheel here. I would love to see both Neovim and Kakoune using the same servers as Emacspeak so they can grow more robust. Which goes into what is needed… hooks, hooks and hooks!

I have high hopes that with hooks and some elbow grease the basic Kakoune package can be made usable in relatively short order; we might find some hooks we need to add, and some other cracks and crevices that need to be filled in to help identify location and context.

My biggest concern is in regard to plugins. One of the magic bits of Emacspeak is Lisp, and the fact that Emacspeak can modify other plugins without directly touching them, just with the defadvice system and other machinery. That is incredibly powerful and lets Emacspeak work with a large number of plugins by simply hooking into the points where their functions are run.

On the basic untouched Kakoune front, with no plugins, it will just be a question of whether we have the hooks to emit the audio information that is required. What is emitted should be useful and timely; unlike visual editing, we cannot just toss a bunch of stuff on screen.

Due to multiple cursors, we are going to have some unique challenges, but I think all of them are surmountable; I just need to start digging into Kakoune to see how we launch a subprocess and keep its STDIN open for an entire editor session.
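One plausible way to keep a subprocess’s STDIN open for the whole session is a FIFO created at startup. This is an untested sketch; `speech-server` stands in for whatever Emacspeak-style server binary is used, and `speak` is a made-up command name:

```kak
# Create a FIFO at startup, start the speech server reading from it,
# and let hooks write protocol lines to it for the whole session.
declare-option -hidden str tts_fifo
hook global KakBegin .* %{ evaluate-commands %sh{
    fifo=$(mktemp -d)/tts.fifo
    mkfifo "$fifo"
    # Hold the write end open so the server never sees EOF between writes
    # (alternatively, the server could reopen the FIFO on EOF).
    tail -f /dev/null > "$fifo" &
    speech-server < "$fifo" > /dev/null 2>&1 &
    printf 'set-option global tts_fifo %s\n' "$fifo"
}}
define-command speak -params 1 -docstring 'queue text on the speech server' %{
    nop %sh{ printf 'q %s\nd\n' "$1" > "$kak_opt_tts_fifo" }
}
```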

Core implementation will of course be:

  • Read line on line switch
  • Read selected content
  • Read selected content with multiple cursors, with an indicator of which selection you are on
  • Read chars as they are typed or deleted (with speed multiplier)
  • Read by movements without selecting (this might be impossible now?)
  • Read by word as completed
  • Read autocomplete options (in editing or like file picker)
  • Read help options (needs to be skippable to next one, prev one)
  • Read file name
  • Read position in file
  • … a lot of other corners and things that will have to be looked at.
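A couple of the items above map directly onto existing hooks. For instance (again hypothetical, with `speak` standing in for a command that forwards text to the speech server):

```kak
# "Read file name" whenever a window is displayed, and "read position
# in file" on demand; `speak` is an assumed command, not stock Kakoune.
hook global WinDisplay .* %{ speak %val{bufname} }
define-command read-position -docstring 'speak cursor position' %{
    speak "line %val{cursor_line}, column %val{cursor_column}"
}
```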

So, as documented elsewhere on here, I have had to deal with vision loss recently, and I ended up in Emacs due to its incredible support for accessibility, the result of a 30+ year quest primarily by one developer. I am incredibly grateful and have become proficient in Emacs, but it isn’t for me at the end of the day. I now understand enough about the Emacspeak “voice servers” to start using them from Kakoune, but I am not sure exactly where to start.

So the Emacspeak “servers” are very simple processes that act as bridges to various voice engines; they have a small set of plain-text commands and are driven over STDIN. You run the process, keep its STDIN open and spam it with messages. A huge advantage of this is that it can be forwarded across SSH connections, just like a dumb telnet port.

Audio icons - basically wav and mp3 files that can be played (queued or instant)
Tones - tones that can be played (wavelength / duration)
Voice - speech can be generated (queued or instant), with a small set of additional embedded commands for delaying speech, changing pitch and a few other things depending on the underlying tech.

These abilities are served via an incredibly minimal STDIN protocol:

  • a audio_icon_name.wav :: plays it
  • c embed_code :: queue a special embed code, like echo
  • d :: dispatch everything in queue
  • s :: halt speaking (my implementation also stops audio icons and tones, but this is a bug in mine)
  • p audio_icon_name.wav :: play sound instantly not queued
  • l letter :: say the name of the letter rather than making its sound
  • q speech :: queue speech to be said when speaker is done
  • sh silence_duration :: be silent for a number of milliseconds for a pause
  • t wavelength duration :: play a tone
  • version :: say the version out loud
  • tts_exit :: exit the speech server (only in mine, not in other emacspeak servers)
  • tts_pause :: pause entire tts system
  • tts_reset :: reset back to default
  • tts_resume :: unpause
  • tts_say words :: stop all other speaking and speak this instantly
  • tts_set_character_scale decimal :: how much faster letters should go than words, as a multiple
  • tts_set_punctuations string :: what punctuation to speak (none, some, all)
  • tts_set_speech_rate rate :: how many words per minute
  • tts_split_caps bool :: should it treat CamelCase like two words Camel Case
  • tts_sync_state :: lets you set a few options together (2x caps settings, speech rate, punctuation)
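Putting the commands above together, a hypothetical session with such a server might look like this (one command per line over STDIN): set the rate, queue two phrases separated by a short pause, dispatch them, then interrupt with an urgent announcement.

```
tts_set_speech_rate 320
tts_set_punctuations some
q selection 1 of 9
sh 200
q fish
d
s
tts_say rotated selection
```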

So… the question of how to design an interface is far from solved in the visually impaired community; most of ’em suck. Emacspeak does a good job of trying to make tradeoffs that make sense, but I think there is a place for true greenfield thought in this regard.

Important: being able to read stuff has to be separate from selecting it in any manner; imagine if you couldn’t view anything without selecting it. Read by letter, word, sentence, paragraph, indent, etc. is all fantastic; basically, anything you might want to select, you might also want to read (read between (), etc.).

I got some basic ideas, but wondering if anyone has crazy ideas …


I don’t think reading something without selecting it would make sense in kakoune. “Read this line” is the vim way of doing stuff and would be hard to get working in kakoune.

Some great way to save and restore selections would probably be better and easier to implement. There are many ways such a plugin could work:

  • a plugin that allows you to have separate reading and editing selections and sync them at will.
  • a plugin that replaces the standard selection saving bound to z with a last-in, first-out stack

But don’t trust anything I say too much; I have no experience with screen readers or any of that, and I’m writing this on my phone so I couldn’t try out any ideas.

I am trying to think through how that would work. So the default state of Kak would be silence; on selection it would read the selected text… so instead of hitting down and having it automatically read the next line, I would either have to change the down motion to select the next line or explicitly select it… now, as you said, the hard part is around restoring the selection… because let’s say I selected “foo” on a line and I think it is the correct line, but I need to read the line before and the line after to confirm, and once I confirm I would want to snap back to where I was…

But interestingly, by keeping the selection as the holy grail (which I would like), it makes stuff like expanding out in both directions work really well…

So the core would be making all selection events emit to the server and then as you said, a great and easy way to go back to what was selected before… which is great for blind and sighted alike.

Regarding restoring selections, you could either use c-h and c-k or write your own plugin.
Watching for selection changes looks to be resource-heavy; you would have to continuously save the previous selection. You could try to hook into NormalIdle, but that would read any time you stop typing.

Maybe the cost of hooking all selections could be reduced in the core code, as that is absolutely required to make this workable. Low latency is key in this type of thing: when I go down one row, the reading has to begin instantly.

Yes, creating a new hook for selection changes would be a good idea and would probably benefit other plugins too. In kakoune/src/selection at line 223 there is a selection update function keeping track of changes made to the selection. So all the work needed to detect selection changes is already being done; all you would have to do is create the hook.
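Until such a hook lands in core, a script-side workaround could approximate it by caching the last selection description in an option and comparing on idle. A sketch only: `speak` is a hypothetical command, and selection content containing unbalanced braces would need proper escaping.

```kak
# Speak the primary selection on idle, but only when the selections
# actually changed since the last idle event.
declare-option -hidden str last_selections
hook global NormalIdle .* %{ evaluate-commands %sh{
    if [ "$kak_selections_desc" != "$kak_opt_last_selections" ]; then
        printf 'set-option window last_selections %s\n' "$kak_selections_desc"
        printf 'speak %%{%s}\n' "$kak_selection"
    fi
}}
```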

Trying to consider all the things I need to hook to have full UI accessible via audio.

  • selection (audio icon for expanding, shrinking, etc)
  • tooltip / help (amazing for blind users; if you could re-walk it / replay it or go 1 by 1, even more amazing)
  • auto-complete (this is a hard one to decide how to do)
  • command line (: )

After that it is a series of specific read commands, like filename, status line, stuff like that, which could all be under a binding for “info” or something.

Selections are kind of strange because of the various types of selections that exist. How to verbally deal with multiple selections … not sure exactly how to call those out.

“On selection 1, ‘fish’ on line 1, column 11” … very wordy

Looks good. Regarding multiple selection there are a few ways I can think of doing it:

  • read surrounding text in order to give some context, i.e. “fish, friendly interactive shell”
  • read line column and selection index without saying what the values mean i.e. “1, 1, 11 fish”

How to handle multiple selections will be a problem for audio icons. If you have 5 selections and half of them shrank and half of them grew, what audio icon do you play?
I was honestly surprised how short that list was. But thanks to Kakoune’s simplicity, there really isn’t anything else you will need.

So, one way to do it is to read the entire line and use pitch changes for the selections. That works, but I am trying to think of the “least noisy” way to do it. Maybe it is just a language around selections (read after, read before, read line).

We don’t have too many tools, basically 3, so finding the most minimal representations is absolutely key. Another sort of problematic thing is that Kakoune primarily runs inside a terminal and offloads key features to other tools (like window management), which adds a whole new layer of complexity for accessibility … but using buffers and piping stuff in and out helps a lot with rereading, things like a tail mode, etc.

I have no good idea around autocomplete at this point, as it is so prolific.

To read the line I am on, having to select it and then undo the selection feels a bit painful, and if I snap it back instantly, it would change the selection and hence change what is being read. I guess the selection thing, and making it work smoothly, is still my #1 concern. If the only things ever made audible are selections, that gives us all the selection tools, which are great, but there is obviously a notion of having something read that isn’t selected. Is selecting and then undoing good enough?

One very interesting idea is to have a line read word by word, which would give a unique capability: as soon as I hear something I want to interact with, I could stop it and it would leave the word selection where it is … that could be a very interesting and good experience for those navigating by voice, instant mid-line interaction during readback. But that would stack tons of stuff on the undo buffer … which would make going back to before the line read hard unless I stashed the info away.

I’m starting to see the problem with reading the current selection. Maybe the selection should only be read if you hit a key? Adding that would make the entire command to read this line and restore the selection xr<c-k>, if you bind reading the current selection to r. That is one keystroke longer than the vim translation rr, which might not be acceptable. One solution might be to add an additional key mapping that reads the selection and undoes the last selection change. If that was mapped to R, reading this line would become xR. That’s as short as vim, without auto-reading on selection changes!
But auto-undo wouldn’t work in cases where you used multiple keystrokes to select what you’re trying to read. We could instead save the selection and then go back to that selection when reading. But that might just cause some unpredictable behaviour?
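The R idea could be prototyped as a tiny mapping. Hypothetical: `speak` stands in for whatever command forwards text to the TTS server, and <c-k> is the selection-undo key discussed above.

```kak
# Speak the current selection, then restore the previous selection.
define-command read-and-restore -docstring 'speak selection, then restore' %{
    speak %val{selection}
    execute-keys <c-k>
}
map global normal R ': read-and-restore<ret>'
```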

Reading and being able to stop in the middle of a line seems like a really good idea. Couldn’t the whole line be selected while reading, and a single word be selected only if you decide to stop the reading?

Reading has to be the default. The sighted equivalent would be only seeing lines when you hit a key (ed comeback time). This is why I initially considered a second reading cursor.

As for mid-line interception, the issue is the voiceover engine generally doesn’t give feedback on what word it is on, a sentence is dispatched to the reader and read at the speed that user likes, in my case between 600 and 800 WPM.

In other news, I dug into why a SelectionChanged hook doesn’t exist, all the way back to 2018, and it seems back then it was a very messy process because the selection-change code was not centralized.

I wonder if using the alternative UI subsystem, while obviously a lot more work, might give a more complete solution. Could it run at the same time as standard console Kak?