How should indentation be implemented in kak?

Guest0 · February 4, 2020, 4:41pm

OCaml support in kak is currently quite poor (only incomplete syntax highlighting), and I would like to improve it. The common approach to OCaml integration in other editors is to use merlin, a powerful semantics-based tool. However, merlin integration requires quite a lot of work, and current merlin LSP wrappers don’t seem to support formatting or highlighting.
After going through some builtin filetype plugins, I have written a more complete syntax highlighter, but I am not sure how indentation should be implemented in kak (in particular, how to insert and delete indentation regardless of whether it consists of tabs or spaces).

prion · February 4, 2020, 11:48pm

There’s a plugin for it (smarttab.kak), although you may want to implement things yourself to avoid the dependency and the complexity of the plugin and its shell calls.

The general gist (see the expandtab-impl function) is to add a hook on \t insertion that replaces the inserted characters with spaces using the built-in @ command. When you delete a space, you want to check whether all the preceding characters are spaces (<a-h><a-k>^\h+.\z) and, if so, de-indent using <lt>. You’ll also want a BufCreate hook for *.ml files that sets up the hooks. The plugin seems to handle a few other cases (and inserts an extra space for some reason?) but I’ll admit that I can’t figure out their purpose. @andreyorst could probably explain though.

EDIT: The extra space is probably because the delete space event you’ve hooked into still needs something to delete.
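
To make the deletion rule concrete, here is a minimal sketch in Python rather than Kakoune script. It is not the plugin's code: the function name and the default width of 4 are assumptions made for illustration. It only shows the decision described above: if everything left of the cursor is spaces, remove enough of them to land on the previous indent stop; otherwise delete a single character.

```python
def backspace_soft_tab(text_before_cursor: str, indentwidth: int = 4) -> str:
    """Return the text left of the cursor after one backspace.

    Hypothetical helper: when only spaces precede the cursor, delete back
    to the previous multiple of `indentwidth`; otherwise delete one char.
    """
    only_spaces = text_before_cursor != "" and text_before_cursor.strip(" ") == ""
    if only_spaces:
        # Distance to the previous indent stop (a full step if already on one).
        extra = len(text_before_cursor) % indentwidth or indentwidth
        return text_before_cursor[:-extra]
    return text_before_cursor[:-1]
```

For example, with a width of 4, six leading spaces shrink to four, and eight shrink to four as well.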

andreyorst · February 5, 2020, 3:47am

@prion I don’t think smarttab.kak is a good choice for understanding how semantic indentation works in Kakoune, because smarttab.kak only changes the characters that are being inserted and how spaces are deleted, while file type hooks are still a primary source for indentation rules.

@Guest0 Usually you just copy the indentation of the previous line and check some previous lines for a specific pattern. If something is found, you increase or decrease the indentation depending on the pattern.
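
As a rough illustration of that heuristic, here is a small Python sketch (not Kakoune script, and not taken from any builtin filetype plugin). The trigger patterns and the two-space indent unit are assumptions picked for the example; a real OCaml rule set would need more cases.

```python
import re

INDENT = "  "  # assumed indent unit for this sketch

# Hypothetical triggers: a line ending in '=', '->' or a block keyword
# opens a new indentation level on the following line.
OPENS_BLOCK = re.compile(r"(?:=|->|\b(?:then|else|do|begin|struct|sig))\s*$")


def indent_for_new_line(previous_line: str) -> str:
    """Return the indentation for a new line, given the line above it."""
    # Step 1: copy the previous line's leading whitespace.
    base = re.match(r"[ \t]*", previous_line).group(0)
    # Step 2: add one level if the previous line matches a trigger pattern.
    if OPENS_BLOCK.search(previous_line):
        return base + INDENT
    return base
```

Decreasing the indentation works the same way, except that the pattern is checked against the line being typed (for example a line starting with `in` or `end`).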

prion · February 5, 2020, 4:04am

AFAIK Kakoune doesn’t have support for tabs as spaces though, right? You can only change the display size of tabs. I’m pretty sure you’re not supposed to use tab characters in OCaml, which is why I suggested taking a look at your plugin’s implementation of spaces as tabs.

I probably should have been clearer about why I was bringing it up.

andreyorst · February 5, 2020, 5:17am

Ah, I did not know about spaces in OCaml. Then yeah, for deleting spaces as fake tabstops my plugin is (I hope) a good source. Scheme and Lisp should also be indented with spaces, though I don’t remember whether that is how Kakoune does it.

Guest0 · February 5, 2020, 5:18am

Well, thanks for the suggestions.
I have some ideas, but I am not sure whether they are applicable or how to implement them.
As for the tab/space problem, I found an appealing approach (although I don’t understand all its details) in the builtin Haskell plugin, which works by selecting \A and \z and aligning them. I wonder if anyone knows how this works, as it avoids the tab/space problem.
Also, regex-based indentation is doomed to be very limited. As an example, consider the following properly indented OCaml code:

let x =
   1
in x

let y =
   some_function
       some_argument
in y

In the first example, my experimental implementation can properly reduce the indentation by one level at the ‘in’. In the second example, however, the indentation of ‘in’ needs to be reduced by two levels, and that number can be arbitrarily large. I conjecture that the correct behavior cannot be obtained without actually parsing the source.
So my next question is whether it is possible to use external, language-specific formatting tools (ocamlformat, in this case) to do indentation. (This is also how formatting is supposed to be supported in the ocaml-lsp server, see this issue)

andreyorst · February 5, 2020, 5:51am

Can ‘in’ be indented to the right, or is it always on the same level as ‘let’? If it is always on the same level, I don’t see the regex problem: we can just look at the current line, and if it has some specific keyword relationship, copy the needed indent level from the let block or some other block.
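
A sketch of that idea in Python (not Kakoune script): walk backwards from the ‘in’ line, pair every earlier ‘in’ with a ‘let’, and copy the indentation of the first unpaired ‘let’. The function name is made up for the example, and the sketch assumes every ‘let’ above has its own ‘in’ and that no keyword appears inside a string or comment, which is exactly where a real parser would be needed.

```python
import re

KEYWORD = re.compile(r"\b(let|in)\b")


def indent_of_matching_let(lines, in_line_index):
    """Return the leading whitespace of the line holding the `let`
    that pairs with the `in` starting line `in_line_index`."""
    depth = 0  # number of `in` keywords still waiting for their `let`
    for i in range(in_line_index - 1, -1, -1):
        # Scan each line right to left so same-line pairs close in order.
        for keyword in reversed(KEYWORD.findall(lines[i])):
            if keyword == "in":
                depth += 1
            elif depth == 0:
                return re.match(r"[ \t]*", lines[i]).group(0)
            else:
                depth -= 1
    return ""  # no unmatched `let` found: fall back to column 0
```

Because the indentation is copied rather than reduced by a fixed number of levels, it does not matter how deeply the lines between ‘let’ and ‘in’ are indented.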

Guest0 · February 5, 2020, 6:47am

There are actually two styles in OCaml, used interchangeably in different situations, both very common.
In one style, as in my example above, ‘let’ and ‘in’ are paired (like parentheses) and share the same indentation. The other common style does not put ‘in’ on a separate line but lets it follow the last line. The first style is more common, AFAIK.
This is a piece of code taken from the OCaml compiler itself:

let exp =
    { pexp_desc = Pexp_constant (Pconst_string(ds.ds_body, ds.ds_loc, None));
      pexp_loc = ds.ds_loc;
      pexp_loc_stack = [];
      pexp_attributes = []; }
in

Here the ‘in’ line’s indentation is indeed reduced by two levels (one for let, one for the curly braces)

Guest0 · February 5, 2020, 6:57am

I have found an interesting tool which seems promising enough for the job of indentation. However I believe builtin plugins for kak should not have any external dependencies… So maybe there should be a separate OCaml plugin, or perhaps a ‘use kak for OCaml’ tutorial, if the customization process is simple enough…

gainhad · February 7, 2020, 6:09pm

There’s already an ocp-indent kakoune plugin. It works okay.

As for improving OCaml support in Kakoune, I’ve been looking into the same thing! I’ve only done a little bit of work on syntax highlighting, but I want to put more time into it. Let me know if you want to collaborate.

Guest0 · February 8, 2020, 5:17am

Oh, I see the plugin. My own version uses a different approach: I pass the ‘--numeric’ parameter to ocp-indent and apply the indentation by temporarily setting the indentwidth option to 1 and using the ‘>’ key.
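
Outside the editor, the same idea can be sketched in Python. This is only an illustration, not the Kakoune implementation: the function name is hypothetical, it assumes the tool prints one integer per input line, and the numbers in the usage example are hand-written sample values, not real ocp-indent output.

```python
def apply_numeric_indent(source: str, numeric_output: str) -> str:
    """Re-indent `source` using one indentation width per line.

    `numeric_output` is assumed to hold one integer per source line,
    in the style of `ocp-indent --numeric`.
    """
    widths = [int(field) for field in numeric_output.split()]
    lines = source.splitlines()
    if len(widths) != len(lines):
        raise ValueError("expected one indentation value per source line")
    result = []
    for line, width in zip(lines, widths):
        body = line.lstrip(" \t")  # drop whatever indentation was there
        # Keep blank lines empty instead of padding them with spaces.
        result.append(" " * width + body if body else "")
    return "\n".join(result) + "\n"


# Hand-written sample values for illustration only.
print(apply_numeric_indent("let x =\n1\n  in x\n", "0\n3\n0\n"), end="")
```

Setting indentwidth to 1 and pressing ‘>’ the given number of times achieves the same effect as the `" " * width` step here.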

As for syntax highlighting, I am now using the builtin one with the addition of module names, constructors and operators. In merlin, the highlighter can identify modules and constructors, but I believe this needs semantic based analysis. And I am personally OK with highlighting modules and constructors using the same color.

I did try to identify modules and constructors (approximately) with regexes, by highlighting anything that follows ‘((module(\h+type)?)|open)\h+’ or is followed by ‘.’, but functor applications are not highlighted properly, and module names followed by ‘\h+\.’ cannot be highlighted properly, due to the lack of quantifiers in lookaround regexes.
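
One possible way around the lookaround restriction is to match the keyword as ordinary text and put only the module name in a capture group, then colour that group alone. The Python sketch below shows the pattern shape; the names are made up for the example, and whether the same regex behaves identically in Kakoune’s engine is an untested assumption.

```python
import re

# Keyword and whitespace are matched normally; only group 1 (the module
# name) is meant to be coloured, so no lookbehind is needed.
MODULE_NAME = re.compile(
    r"\b(?:module(?:[ \t]+type)?|open)[ \t]+([A-Z][A-Za-z0-9_']*)"
)


def module_name_spans(text: str):
    """Return (start, end) offsets of every module name found in `text`."""
    return [match.span(1) for match in MODULE_NAME.finditer(text)]


print(module_name_spans("module type S = sig"))
```

Since Kakoune’s regex highlighter takes a capture number together with a face, the equivalent there would presumably be to map capture 1 instead of capture 0 to the module face.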

I have not dived into the group system of kak yet, but it should be able to do some parser-like jobs. However, I am actually pinning my hopes on LSP; there are issues about that. Since merlin already does syntax highlighting, I believe it would not be hard for LSP server frontends of merlin to provide it (once syntax highlighting eventually becomes part of LSP, though…)

gainhad · February 8, 2020, 11:33pm

Are you sure that merlin provides syntax highlighting? I haven’t seen that

Guest0 · February 9, 2020, 1:58am

Well, I have not dug into merlin to check. My experience of using merlin in neovim includes very good syntax highlighting (for example, the module/constructor distinction mentioned above), and I vaguely remember that I once manually turned off the builtin OCaml support of vim for debugging and the highlighting remained. Anyway, I have no direct evidence for that, so correct me if I am wrong. I would also be glad if we could learn from the vim builtin OCaml support.

gainhad · February 9, 2020, 6:59pm

Here is what I see in vim without merlin:

And here is the OCaml syntax file that comes with vim: runtime/syntax/ocaml.vim in the vim/vim repository on GitHub

Guest0 · February 10, 2020, 7:27am

Oh, I had forgotten that spaces can be highlighted without any visual change. Now I can properly highlight module paths (dots) and simple module/signature definitions, but functor definitions and applications are still not supported. For these I suppose the region system is needed to imitate a parser.

Guest0 · February 11, 2020, 2:12pm

I have made an improved version of the OCaml syntax highlighting, including extra support for number literals, constructors, operators and (part of) module names. I have also fixed some bugs (nested comments). This is how an OCaml file looks with the new highlighters:

Currently I am mapping number literals and constructors to the ‘value’ face, but I am not sure whether there is a better choice.

gainhad · February 11, 2020, 2:40pm

This looks really nice! Do you have this publicly available?

Guest0 · February 12, 2020, 3:11am

I have not published it yet, but I am planning to commit it to the official kak repo.

gainhad · February 12, 2020, 2:41pm

Awesome! Do you mind posting here when you do? I’d love to pull down your changes and test before it’s merged. Thanks for doing this

Guest0 · February 13, 2020, 12:36am

Well, I am not sure how to upload a text file here, so I’ll post the differences manually.
Replace the comment line with this (for nested comments):

add-highlighter shared/ocaml/comment region -recurse \Q(* \Q(* \Q*) fill comment

Add the following lines in the ‘Highlighter’ part and before the ‘Macro’ part:

# integer literals
add-highlighter shared/ocaml/code/ regex \b[0-9][0-9_]*([lLn]?)\b                          0:value
add-highlighter shared/ocaml/code/ regex \b0[xX][A-Fa-f0-9][A-Fa-f0-9_]*([lLn]?)\b         0:value
add-highlighter shared/ocaml/code/ regex \b0[oO][0-7][0-7_]*([lLn]?)\b                     0:value
add-highlighter shared/ocaml/code/ regex \b0[bB][01][01_]*([lLn]?)\b                       0:value
# float literals
add-highlighter shared/ocaml/code/ regex \b[0-9][0-9_]*(\.[0-9_]*)?([eE][+-]?[0-9][0-9_]*)?                        0:value
add-highlighter shared/ocaml/code/ regex \b0[xX][A-Fa-f0-9][A-Fa-f0-9_]*(\.[a-fA-F0-9_]*)?([pP][+-]?[0-9][0-9_]*)? 0:value
# constructors. must be put before any module name highlighter, as a fallback for capitalized identifiers
add-highlighter shared/ocaml/code/ regex \b[A-Z][a-zA-Z0-9_]*\b                            0:value
# the module name in a module path, e.g. 'M' in 'M.x'
add-highlighter shared/ocaml/code/ regex (\b[A-Z][a-zA-Z0-9_]*[\h\n]*)(?=\.)               0:module
# (simple) module declarations
add-highlighter shared/ocaml/code/ regex (?<=module)([\h\n]+[A-Z][a-zA-Z0-9_]*)          0:module
# (simple) signature declarations. 'type' is also highlighted, due to the lack of quantifiers in lookarounds.
# Hence we must put keyword highlighters after this to recover proper highlight for 'type'
add-highlighter shared/ocaml/code/ regex (?<=module)([\h\n]+type[\h\n]+[A-Z][a-zA-Z0-9_]*) 0:module
# (simple) open statements
add-highlighter shared/ocaml/code/ regex (?<=open)([\h\n]+[A-Z][a-zA-Z0-9_]*)              0:module
# operators
add-highlighter shared/ocaml/code/ regex [@!$%%^&*\-+=|<>/?]+                              0:operator