Insert unicode characters by codepoint with printf

Short-time Kakoune user here.

I’m trying to use :insert-output with printf in order to insert unicode characters by codepoint in Kakoune. This would be useful for platforms which don’t have a system-wide mechanism for this, as seems to be the case in Termux. However, it doesn’t seem to work.

Example:

When I enter printf '\u2026' in the shell, I get “…” (dot-dot-dot).

When I try the same in Kakoune’s :insert-output, I get “\u2026” (without the backslash if I omit the quotes).

What am I missing?

:insert-output isn’t a standard Kakoune command, I assume you mean the ! normal-mode command whose prompt is “insert-output”.

printf is typically a shell builtin, so exactly how it behaves depends on the shell. The POSIX standard for printf only requires support for the escape sequences \\, \a, \b, \f, \n, \r, \t, \v, and \ddd in the format string (where ddd is an octal number; %b arguments use \0ddd instead), not \u.

I’m guessing that your interactive shell is something like bash or zsh, both of which support significant extensions beyond POSIX, including \u.

Because Kakoune scripts should work the same way whether a user is using bash or zsh or something weirder like fish or PowerShell, Kakoune executes all commands with a basic POSIX shell, so things like \u may not be available.

Instead of printf '\u2026' you might try something like bash -c "printf '\u2026'" to ensure you’re using bash’s version of printf, or even /usr/bin/printf '\u2026' if you’re OK assuming that it’s the GNU coreutils version of printf and not some BSD version. For even more portability, I’m sure there’s a way to do it in Perl.
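If neither bash nor GNU printf can be assumed, the octal escapes that POSIX does guarantee are enough to emit the UTF-8 bytes by hand. Here is a minimal sketch of that idea; the function name u8 is made up for illustration, and it assumes the argument is a valid hexadecimal code point and that the buffer is UTF-8:

```shell
#!/bin/sh
# Hypothetical helper (not part of Kakoune): print the UTF-8 encoding of a
# hexadecimal code point using only POSIX shell arithmetic and the octal
# escapes that POSIX printf is required to support.
u8() {
    cp=$((0x$1))    # assumes $1 is plain hex, e.g. 2026
    if [ "$cp" -lt 128 ]; then
        # 1 byte: 0xxxxxxx
        set -- "$cp"
    elif [ "$cp" -lt 2048 ]; then
        # 2 bytes: 110xxxxx 10xxxxxx
        set -- $((192 | (cp >> 6))) $((128 | (cp & 63)))
    elif [ "$cp" -lt 65536 ]; then
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        set -- $((224 | (cp >> 12))) $((128 | ((cp >> 6) & 63))) \
               $((128 | (cp & 63)))
    else
        # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        set -- $((240 | (cp >> 18))) $((128 | ((cp >> 12) & 63))) \
               $((128 | ((cp >> 6) & 63))) $((128 | (cp & 63)))
    fi
    # Emit each byte as a three-digit octal escape in the format string.
    for byte in "$@"; do
        printf "\\$(printf '%03o' "$byte")"
    done
}
```

With this loaded, u8 2026 should write the three bytes of “…” to stdout. It does no input validation, so treat it as a starting point rather than a finished tool.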

Brilliant, that’s exactly what I was missing. Calling bash’s printf or /usr/bin/printf explicitly works with \u. Thank you for the detailed and informative reply!

> :insert-output isn’t a standard Kakoune command, I assume you mean the ! normal-mode command whose prompt is “insert-output”.

Yes, I was confused about that.

> For even more portability, I’m sure there’s a way to do it in Perl.

I would guess more than one.

And for anybody with the same use case who finds this thread, here is my best shot at a custom keybinding:

map global user "u" "!/usr/bin/printf '\u'<left>"

Usage: Hit space and u, enter codepoint, hit enter.

If you do it often you can implement a “digraphs” mode:

declare-user-mode digraphs

define-command enter_digraphs_mode %{
  enter-user-mode digraphs
}

define-command insert_text -params 1 %{
  evaluate-commands -save-regs '"' %{
    set-register '"' %arg{1}
    execute-keys -draft ';P'
  }
}

define-command open_insert_unicode_character_prompt %{
  prompt unicode_code_point: %{
    insert_text %sh{bash -c 'printf "\u$kak_text"'}
  }
}

map -docstring 'insert unicode character' global digraphs '=' ':open_insert_unicode_character_prompt<ret>'
map -docstring 'â' global digraphs 'q' ':insert_text â<ret>'
map -docstring 'é' global digraphs 'w' ':insert_text é<ret>'
map -docstring 'É' global digraphs 'W' ':insert_text É<ret>'
map -docstring 'è' global digraphs 'e' ':insert_text è<ret>'
map -docstring 'ë' global digraphs 'E' ':insert_text ë<ret>'
map -docstring 'ê' global digraphs 'r' ':insert_text ê<ret>'
map -docstring 'ù' global digraphs 'u' ':insert_text ù<ret>'
map -docstring 'û' global digraphs 'U' ':insert_text û<ret>'
map -docstring 'î' global digraphs 'i' ':insert_text î<ret>'
map -docstring 'ï' global digraphs 'I' ':insert_text ï<ret>'
map -docstring 'ô' global digraphs 'o' ':insert_text ô<ret>'
map -docstring 'ê' global digraphs '[' ':insert_text ê<ret>'
map -docstring 'ë' global digraphs ']' ':insert_text ë<ret>'
map -docstring '«' global digraphs '{' ':insert_text «<ret>'
map -docstring '»' global digraphs '}' ':insert_text »<ret>'
map -docstring 'à' global digraphs 'a' ':insert_text à<ret>'
map -docstring 'À' global digraphs 'A' ':insert_text À<ret>'
map -docstring 'â' global digraphs 's' ':insert_text â<ret>'
map -docstring 'ç' global digraphs 'c' ':insert_text ç<ret>'
map -docstring 'Ç' global digraphs 'C' ':insert_text Ç<ret>'
map -docstring 'æ' global digraphs 'z' ':insert_text æ<ret>'
map -docstring 'œ' global digraphs 'x' ':insert_text œ<ret>'
map -docstring '…' global digraphs '.' ':insert_text …<ret>'
map -docstring '’' global digraphs '<space>' ':insert_text ’<ret>'
map -docstring '“' global digraphs '<lt>' ':insert_text “<ret>'
map -docstring '”' global digraphs '<gt>' ':insert_text ”<ret>'
map -docstring '—' global digraphs '<minus>' ':insert_text —<ret>'
map -docstring '–' global digraphs '_' ':insert_text –<ret>'

Example configuration:

map -docstring 'enter digraphs mode' global insert <c-k> '<a-;>:enter_digraphs_mode<ret>'

That’s great. However, I’m very used to typing codepoints and will stick to it, especially since it works on my desktop as well (with ctrl+shift+u).

Out of interest: Where do the digraph mappings (â-q, é-w, etc.) come from? Did you make them up? Is it a standard?

And by the way, since this functionality is, of course, most needed in insert mode, here is a variation of my approach above for insert mode (shortcut alt+u):

map global insert "<a-u>" "<a-;>! /usr/bin/printf '\u'<left>"

Note that this command inserts the character before the selected text, not before the cursor.

So it does! I can see how that would trip me up. My first naive solution would be to discard the selection first:

map global insert "<a-u>" "<a-;>;<a-;>! /usr/bin/printf '\u'<left>"

But then, of course, I lose the selection.

Looking into your code, I learned about execute-keys -draft as the solution to that. And about a few other nifty things like :prompt and :evaluate-commands. Thanks! I trimmed your code down to not have the custom mode and the digraphs, and it seems to be working perfectly for me.

That said, I am surprised how complex it is to make Kakoune insert text at the cursor in insert mode.

You can use <c-r> in insert mode to insert contents from a register at the cursor positions.

define-command open_insert_unicode_character_prompt %{
  prompt unicode_code_point: %{
    evaluate-commands -save-regs '"' %{
      set-register '"' %sh{bash -c 'printf "\\u$kak_text"'}
      execute-keys '<c-r>"'
    }
  }
}

map -docstring 'insert unicode character' global insert <a-u> '<a-;>:open_insert_unicode_character_prompt<ret>'
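
One thing the prompt above does not do is check what was typed before handing it to printf. A minimal sketch of such a check in plain POSIX sh follows; the function name is_codepoint is hypothetical, and how you wire it into the %sh{} block is up to you:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if $1 looks like a usable Unicode
# code point written in hexadecimal (no "U+" or "0x" prefix).
is_codepoint() {
    case $1 in
        '' | *[!0-9A-Fa-f]*) return 1 ;;   # empty or contains non-hex characters
    esac
    [ "${#1}" -le 6 ] || return 1          # U+10FFFF needs at most 6 digits
    cp=$((0x$1))
    [ "$cp" -le 1114111 ] || return 1      # above U+10FFFF
    if [ "$cp" -ge 55296 ] && [ "$cp" -le 57343 ]; then
        return 1                           # surrogate range U+D800..U+DFFF
    fi
    return 0
}
```

For example, is_codepoint 2026 succeeds while is_codepoint xyz and is_codepoint D800 fail. Also keep in mind that bash’s \u reads at most four hex digits, so code points above U+FFFF need \U instead.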