Flygrep like grepping in Kakoune

andreyorst · July 18, 2019, 11:00am

There’s a plugin for Vim, and many plugins for Emacs, that do this kind of thing. I did this via fzf.kak through skim, but I thought it would be nice to have such a grep without plugins:

define-command -override -docstring "flygrep: run grep on every key" \
flygrep %{
    edit -scratch *grep*
    prompt "flygrep: " -on-change %{
        flygrep-call-grep %val{text}
    } nop
}

define-command -override flygrep-call-grep -params 1 %{ evaluate-commands %sh{
    length=$(printf "%s" "$1" | wc -m)
    # escape the pattern: double ' for the inner '...' string and & for the outer %&...& block
    text=$(printf "%s\n" "$1" | sed -e "s/'/''/g" -e "s/&/&&/g")
    if [ ${length:-0} -gt 2 ]; then
        printf "%s\n" "info"
        printf "%s\n" "evaluate-commands %&grep '$text'&"
    else
        printf "%s\n" "info -title flygrep %{$((3-${length:-0})) more chars}"
    fi
}}

It reuses Kakoune’s original grep command to do the search. Most of the code here is just escaping and character counting. As a result, this command lets you type a pattern and search for it in semi-real time.
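The escaping is the subtle part: the pattern ends up inside a '...' string, which itself sits inside a %&...& block, and Kakoune escapes a quote or a non-nesting % delimiter by doubling it. Below is a minimal standalone sketch of that step, assuming those doubling rules; `flygrep_command` is a hypothetical helper name, not part of the script above.

```shell
#!/bin/sh
# Hypothetical helper: print the command that flygrep-call-grep would send
# back to Kakoune for a given search pattern.
flygrep_command() {
    # Double ' (for the inner '...' string) and & (for the outer %&...& block).
    text=$(printf "%s\n" "$1" | sed -e "s/'/''/g" -e "s/&/&&/g")
    printf "%s\n" "evaluate-commands %&grep '$text'&"
}

flygrep_command "foo"      # evaluate-commands %&grep 'foo'&
flygrep_command "a&b"      # evaluate-commands %&grep 'a&&b'&
flygrep_command "don't"    # evaluate-commands %&grep 'don''t'&
```

Without the quote doubling, a pattern such as don't would close the '...' string early and Kakoune would report a parse error instead of searching.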

ftonneau · July 18, 2019, 1:11pm

Hi Andrey,

Honest question (no trolling ;-). Why do you use:

length=$(printf "%s" "$1" | wc -m)

to compute string length in chars instead of:

${#1}

which should work in any POSIX shell, not just in Bash, and has the performance advantage of not calling any external program? Is it because people on Solaris (for example) do not have a POSIX shell by default (although one is available and could be given priority in their PATH)?
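With the suggested expansion, the length check in flygrep-call-grep needs no subprocess at all. A minimal sketch, keeping the original 3-character threshold; `chars_missing` is a hypothetical helper name, and what ${#1} counts for non-ASCII input depends on the shell.

```shell
#!/bin/sh
# Hypothetical helper: how many more characters flygrep should wait for
# before running grep (0 means "search now"). ${#1} is expanded by the
# shell itself, so no wc process is started.
chars_missing() {
    length=${#1}
    if [ "$length" -gt 2 ]; then
        echo 0
    else
        echo $((3 - length))
    fi
}

chars_missing ""       # 3
chars_missing "ab"     # 1
chars_missing "abc"    # 0
```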

More general question: which of the two ways would be more portable and/or recommended for Kakoune scripts?

andreyorst · July 18, 2019, 1:35pm

I didn’t know about this approach.

Kakoune is designed to work in a POSIX environment, so I guess that in order to use Kakoune, Solaris users would need a POSIX-compliant shell.

ftonneau · July 18, 2019, 2:28pm

I was thinking that perhaps you had other reasons (e.g., portability of calling wc versus using ${#…} on Solaris). In any case, you are right about Kakoune’s logic, so the best practice in kak scripts would be to use the POSIX shell expansion, ${#…}.

Other expansions that are valid in Bash (e.g., general pattern replacement in a string) are not POSIX, however, so in the latter case calling sed would be the best option.
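For example, the & escaping in flygrep could be written with Bash’s ${parameter//pattern/string} expansion, which POSIX does not define, so a POSIX script hands the job to sed instead. A minimal sketch:

```shell
#!/bin/sh
s="a&b&c"
# Bash only (not POSIX sh):  escaped=${s//&/&&}
# POSIX: let sed do the replacement. In a sed replacement "&" stands for
# the whole match, so "&&" writes the matched "&" twice.
escaped=$(printf "%s\n" "$s" | sed "s/&/&&/g")
printf "%s\n" "$escaped"    # a&&b&&c
```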

andreyorst · July 18, 2019, 3:06pm

That’s not entirely true. Kakoune uses UTF-8 strings, and their length depends on what you count, because characters do not have a fixed size. For example:

What is the length of this string? (try selecting it with your mouse)

नमस्ते

Once you have your guess, look at this:

$ s="नमस्ते"
$ echo $s
नमस्ते
$ echo ${#s}
6
$ printf "%s" "$s" | wc --bytes
18
$ printf "%s" "$s" | wc --chars
6
$ echo $s | sed "s/\(.\)/'\1'\n/g"
'न'
'म'
'स'
'्'
'त'
'े'

There are 3 syllables: न, म, स्ते, which are 4 grapheme clusters: न, म, स्, ते, which, in turn, are 6 Unicode code points: ‘न’, ‘म’, ‘स’, ‘्’, ‘त’, ‘े’, which in UTF-8 are 18 separate bytes: 224, 164, 168, 224, 164, 174, 224, 164, 184, 224, 165, 141, 224, 164, 164, 224, 165, 135

So neither the ${#var} approach nor the wc approach is fully reliable here.
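If the number you want is code points (what wc --chars printed above), one locale-independent trick is to count only the bytes that start a UTF-8 sequence: every code point has exactly one such byte, and all other bytes are continuation bytes in the range 0x80 to 0xBF. A sketch, assuming the input is valid UTF-8; `utf8_length` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: count Unicode code points in a UTF-8 string without
# relying on the current locale. Deleting the continuation bytes (octal
# 200-277, i.e. 0x80-0xBF) leaves one byte per code point.
utf8_length() {
    count=$(printf "%s" "$1" | LC_ALL=C tr -d '\200-\277' | wc -c)
    # some wc implementations pad the number with spaces; arithmetic drops them
    echo $(( $count ))
}

utf8_length "नमस्ते"     # 6
utf8_length "flygrep"    # 7
```

This still counts code points, not grapheme clusters, so it agrees with wc --chars and answers 6 for the example string.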

As for calling printf and wc in this particular function, I think the time grep takes to display its results is much larger than the time spent calculating the length, so avoiding wc here would be a premature optimization.

andreyorst · July 18, 2019, 3:13pm

Actually, I would appreciate it if some experienced POSIX script writer could proofread my Kakoune scripts and point out things like this one.

ftonneau · July 18, 2019, 3:51pm

I had thought about the UTF-8 issue before posting, but found that ${#…} gave the same answer as wc --chars … on Bash (which is aware of the current locale). I just tried it with Dash to double-check, and sure enough it fails!

Morals: yes, wc is more portable if we are not sure that the string is plain ASCII. And yes, in this case the pure shell solution is worse. In other cases, though (e.g. a script with a lot of looping over plain ASCII strings), the pure shell solution may be worth considering.
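As an illustration of such a pure shell solution, here is a sketch of global literal replacement built only from the POSIX ${parameter%%word} and ${parameter#word} expansions, so no sed process is started per call. `replace_all` is a hypothetical helper name, and for brevity it uses global variables.

```shell
#!/bin/sh
# Hypothetical helper: print $1 with every occurrence of $2 replaced by $3,
# using only POSIX parameter expansions. Quoting "$2" inside the patterns
# makes it match literally rather than as a glob.
replace_all() {
    if [ -z "$2" ]; then
        # an empty search string would never advance the loop below
        printf "%s\n" "$1"
        return
    fi
    rest=$1
    result=
    while :; do
        case $rest in
            *"$2"*)
                # keep the text before the first match, then the replacement
                result=$result${rest%%"$2"*}$3
                # drop everything up to and including the first match
                rest=${rest#*"$2"}
                ;;
            *)
                break
                ;;
        esac
    done
    printf "%s\n" "$result$rest"
}

replace_all "a&b&c" "&" "&&"    # a&&b&&c
replace_all "don't" "'" "''"    # don''t
```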

ftonneau · July 19, 2019, 11:54am

In this example I think you are looking at string width on a terminal, not string length in Unicode code points. And the correct answer should be 4, right? (On my system your example string renders as four rectangles.) There is an option of GNU wc that does seem to give the correct answer:

wc -L

However, this is specific to the GNU implementation, and determining display width in a relatively portable way is always a pain. See:

andreyorst · July 19, 2019, 12:40pm

Someone doesn’t have a full Unicode font on their system.

Yeah, there are 4 grapheme clusters (as opposed to 6 code points), and I don’t know of any POSIX way to calculate that. Usually I use wc because it is portable and consistent, and most of the time that’s enough.

mawww · July 22, 2019, 12:27am

Kakoune should automatically select the POSIX shell on any compliant platform; this is why it goes through confstr(_CS_PATH) to try to find the shell instead of just using /bin/sh. (POSIX does not mandate /bin/sh to be a POSIX shell AFAIK, but it does mandate that such a shell be available.)
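From a shell, the same search path is available through the POSIX getconf utility: getconf PATH prints the value of confstr(_CS_PATH). The sketch below looks up sh on that path only; it mirrors the idea described above and is not Kakoune’s own implementation.

```shell
#!/bin/sh
# getconf PATH prints confstr(_CS_PATH): a PATH value under which all the
# POSIX standard utilities, including a conforming sh, can be found.
posix_path=$(getconf PATH)
# Look sh up on that path only; the command substitution runs in a
# subshell, so the script's own PATH is left unchanged.
posix_sh=$(PATH=$posix_path; command -v sh)
printf "POSIX path: %s\nPOSIX sh:   %s\n" "$posix_path" "$posix_sh"
```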