extra_word_chars is used for two unrelated things

extra_word_chars is used for movement but also for LSP identifier extraction. I think these should be decoupled by introducing a separate option such as extra_word_chars_id, unless there’s a better idea.

The issue is that I change extra_word_chars to suit my movement workflow (underscore is not part of a literal word, but is part of an identifier “word”), but this breaks LSP, which no longer correctly detects the thing at point for definition lookup etc.

Since extra_word_chars for identifiers depends on the filetype, it further conflates what a word is grammatically (which is fixed) with what a word is as a programming-language unit (which varies).
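
To illustrate how the "identifier" definition varies per filetype, here is a minimal sketch. The filetypes and character choices are assumptions picked for illustration, not settings taken from this thread.

```kak
# Sketch only: filetypes and characters below are illustrative assumptions.
# In CSS-like languages, hyphens usually belong to identifiers.
hook global BufSetOption filetype=css %{
    set-option buffer extra_word_chars '_' '-'
}
# In prose, one might want apostrophes (but not underscores) inside a word.
hook global BufSetOption filetype=markdown %{
    set-option buffer extra_word_chars "'"
}
```

Because the same option drives both movement and identifier detection, any per-filetype value chosen for one purpose also changes the other.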

Personally if I want to treat underscore (or any non-whitespace chars) as part of a word there’s <a-w> for just that purpose, so to me the default extra_word_chars value should be empty (well, a dummy value as it can’t be empty for whatever reason).

Thoughts?

extra_word_chars is used for “movement by words” (the w, b, and e keys), for word objects in the <a-a> and <a-i> menus, for generating word-based completions (the word=all or word=buffer values in the completers option) and (I think) for detecting if you’ve accepted a completion (by typing a non-word character). It is very much about making Kakoune’s idea of a “word” conform to the “identifier” rules for the language in the current buffer, much like Vim’s iskeyword option.

Perhaps you could make mappings for w, b, and e that search forward or backward for a regex like [A-Za-z']+? Then they’d skip over other things that aren’t grammatical words, like punctuation, numbers, and mathematical operators.
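
A rough sketch of what such mappings might look like follows. The key choices are hypothetical, and the mappings live in user mode so the built-in w and b stay available. Since they go through the search prompt, they select the whole next or previous match rather than extending from the cursor the way w does, and they overwrite the search register.

```kak
# Sketch only: hypothetical user-mode keys, built on the / and <a-/> search prompts.
# Select the next run of letters and apostrophes.
map global user w "/[A-Za-z']+<ret>"
# Select the previous run of letters and apostrophes.
map global user b "<a-/>[A-Za-z']+<ret>"
```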

I do wish extra_word_chars defaulted to _, instead of defaulting to empty and having the special rule “if it’s empty, it behaves as if it were set to an underscore”. That would make writing plugins that respect extra_word_chars much simpler.

EDIT: Oh, there’s already a snippet on the wiki: Selections · mawww/kakoune Wiki · GitHub

Sadly, the wiki snippet doesn’t behave like the w and b motions; it acts like a text object instead. I managed to implement this:

define-command -params 1 with-subword %{
    evaluate-commands %{
        # note: this is an unused char U+FFFFF
        set-option local extra_word_chars '￿'
        execute-keys %val{count} %arg|@|
    }
}

map global normal <c-b> ':with-subword b<ret>'
map global normal <c-B> ':with-subword B<ret>'
map global normal <c-w> ':with-subword w<ret>'
map global normal <c-W> ':with-subword W<ret>'
map global normal <c-e> ':with-subword e<ret>'
map global normal <c-E> ':with-subword E<ret>'

It can account for snake case and kebab case, but not camel/Pascal case.

OK so in essence it should’ve been called extra_keyword_chars or extra_symbol_chars. The description doesn’t allude to this either:

extra_word_chars codepoint-list
a list of all additional codepoints that should be considered part of a word, for the purposes of the w, b, and e commands…

Let’s assume we update the description at least, then things are less surprising/confusing. However, it’s inconsistent with <a-w> which only cares about whitespace and ignores all code boundaries and so to me it’s more in the grammar realm. But it’s a minor quirk I guess.

I can hack around it by rebinding <w> etc like you say but it doesn’t sit right with me. For one thing it’ll behave differently when I move around vs word-completion etc or even break them perhaps…

I might actually test out flipping it around and try using the default behaviour/value of <w>/extra_word_chars but remap <a-w> to a subword jump (thanks @ficd) as I don’t use <a-w> that much, and if I get used to jumping over code words then, combined with f<spc> or /<spc>nnn, I might never miss the default <a-w> 99% of the time. I might get annoyed if I end up holding Alt 99% of the time though :laughing:… shouldn’t be that bad, we’ll see.
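
A minimal sketch of that flipped setup, assuming the with-subword command posted earlier in the thread is already defined. The <a-b> and <a-e> mappings are an extrapolation for symmetry, and all three shadow the built-in WORD motions.

```kak
# Sketch only: requires the with-subword command from earlier in the thread.
# These shadow the built-in <a-w>, <a-b>, <a-e> (WORD) motions.
map global normal <a-w> ':with-subword w<ret>'
map global normal <a-b> ':with-subword b<ret>'
map global normal <a-e> ':with-subword e<ret>'
```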

Actually, for that remaining 1%, is there a way to quickly toggle reverting to the default bindings? …without having to create a mode lock and remap all my keys… I’m hoping there’s a simple native way to temporarily disable all user bindings.

Thanks for your input guys.

Thanks! The wiki does link to Delapouite/kakoune-hump on GitHub (“Commands to select subwords ("humps") for kakoune”), which seems like a good fit and supports camel case, in case you missed it.

I’ve used that plugin. My issue with it is that we end up with something more like a text object than a motion. For example, if my cursor is in the middle of a subword, I’d like to be able to select it by pressing be. kakoune-hump only works if I move my cursor out of the word completely and then invoke it.

This is a quirk inherited from vi, I believe. In vi, w moves past spans of alphanumeric or punctuation characters (so “abc(def)” is four stops), while W moves past spans of non-whitespace characters. vi isn’t very configurable, so it provides two different approximations of “a word” that could be used in different situations. Mnemonically, vi users refer to those two patterns as “words” and “WORDs”.

Kakoune inherited those two patterns (and you can still see “word” and “WORD” listed in the <a-i> menu), but because of how it standardises the Shift modifier, the keys are now w for “word” and <a-w> for “WORD”. Kakoune also adds extra_word_chars to alter the definition of “words”, so instead of having “words” and “WORDs” that are about equally useful, we now have “words” that are almost always the ones you want, and “WORDs” that are reserved for special occasions and old habits.

If Kakoune were being designed from scratch today, we might not have <a-w> at all, but now we have it there’s not much point getting rid of it.

I don’t believe there is. The backslash prefix disables hooks, but not mappings. That might be a useful feature to have, though, if the user accidentally maps : to something so they can’t get back to the prompt to unmap it. I don’t think <a-\> has a function yet.

That might be a useful feature to have, though

I think it would fit in nicely as <a-\>. I remapped a handful of other keys and it’s always in the back of my mind that one day it’ll cause me a headache debugging why a script I wrote behaves differently vs interactively.

Thanks for explaining the origins of <a-w>, makes more sense now - I never used/knew about it even after over a decade of daily vim; only discovered it in kak thanks to the consistent mapping scheme and great on-the-fly docs.

I see what you mean. In fact I can’t even use it because if the cursor is at the start of a word [a]pple then jumping a hump results in a[pple] instead of [apple] as with regular w. I tried fixing that but it just introduces new bugs.

I’ll stick to your version and continue to curse camelCase :slight_smile:

…one second later and I just realised that’s the same issue you were describing :face_with_open_eyes_and_hand_over_mouth:
