CodeCompletion plugin performance: usage of KMP search algorithm (tokenizer.cpp)



J.:
I am wondering why the KMP search algorithm is used in the CodeCompletion plugin (tokenizer.cpp).

The KMP algorithm is designed to find a pattern at any position in an arbitrary string, whereas my understanding is that

* the CB tokenizer passes tokens (i.e. words) to the parser, and
* matches can only occur at the beginning of a token (which would make KMP unnecessary).
Moreover, the KMP preprocessing routine (building the failure table) is called for every next-match search, regardless of whether the search word (key) has changed. That is a performance downside.
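To make the preprocessing point concrete, here is a minimal KMP sketch in which the failure table is built once per key and then reused for every subsequent search. `KmpMatcher` is a hypothetical name for illustration, not the CB tokenizer API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not the CB tokenizer code): the failure table is
// computed once in the constructor, so repeated searches for the same key
// skip the preprocessing step entirely.
class KmpMatcher {
public:
    explicit KmpMatcher(std::string key)
        : key_(std::move(key)), fail_(key_.size(), 0) {
        // fail_[i] = length of the longest proper prefix of key_[0..i]
        // that is also a suffix of key_[0..i].
        std::size_t k = 0;
        for (std::size_t i = 1; i < key_.size(); ++i) {
            while (k > 0 && key_[i] != key_[k])
                k = fail_[k - 1];
            if (key_[i] == key_[k])
                ++k;
            fail_[i] = k;
        }
    }

    // Index of the first occurrence of the key in `text` at or after `from`,
    // or std::string::npos if there is none.
    std::size_t find(const std::string& text, std::size_t from = 0) const {
        if (key_.empty())
            return from <= text.size() ? from : std::string::npos;
        std::size_t k = 0;  // number of key characters currently matched
        for (std::size_t i = from; i < text.size(); ++i) {
            while (k > 0 && text[i] != key_[k])
                k = fail_[k - 1];  // fall back without re-reading text
            if (text[i] == key_[k])
                ++k;
            if (k == key_.size())
                return i + 1 - k;
        }
        return std::string::npos;
    }

private:
    std::string key_;
    std::vector<std::size_t> fail_;
};
```

Keeping one `KmpMatcher` alive per key and calling `find(text, lastHit + 1)` for the next match would avoid rebuilding the table on every call.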

... or is it intended to move to another, more robust and better-performing lexer/parser combination for CB in the near future?

Any views?

oBFusCATed:

--- Quote from: J. on November 20, 2018, 11:26:47 pm ---... or is it intended to move to another, more robust and better-performing lexer/parser combination for CB in the near future?

--- End quote ---
My personal goal is to move to a language server plugin and use clangd for parsing. I have a working prototype, but it needs a lot of polishing, and clangd also has to implement a lot of features before it is really useful.

ollydbg:
The KMP algorithm in the tokenizer class is only used for macro string expansion.
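As an illustration of why macro expansion needs more than a plain substring match, here is a hypothetical sketch (not the CB tokenizer code) that substitutes one macro parameter inside a macro body. Every substring hit, whether found with KMP or `std::string::find`, still has to be checked for identifier boundaries so that `x` inside `max` or `x1` is left untouched. The sketch assumes `param` is a valid identifier and ignores string literals and the `#`/`##` operators.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// True if c can be part of a C/C++ identifier.
static bool isIdentChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Hypothetical helper: replace every whole-identifier occurrence of `param`
// in `body` with `arg`; substring hits inside longer identifiers are kept.
std::string substituteParam(const std::string& body,
                            const std::string& param,
                            const std::string& arg) {
    std::string out;
    std::size_t pos = 0;
    while (true) {
        std::size_t hit = body.find(param, pos);
        if (hit == std::string::npos || param.empty()) {
            out.append(body, pos, std::string::npos);  // copy the tail
            return out;
        }
        std::size_t end = hit + param.size();
        bool startOk = hit == 0 || !isIdentChar(body[hit - 1]);
        bool endOk = end == body.size() || !isIdentChar(body[end]);
        out.append(body, pos, hit - pos);  // text before the hit
        if (startOk && endOk)
            out += arg;                    // whole identifier: substitute
        else
            out.append(body, hit, param.size());  // part of a longer name
        pos = end;
    }
}
```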

Parsing C++ source code is quite hard. I have been maintaining the parsing algorithm for a long time, but there are still a lot of things to implement.

If you have the motivation and time, we welcome any contribution. Thanks!

J.:

--- Quote from: oBFusCATed on November 20, 2018, 11:43:44 pm ---My personal goal is to move to a language server plugin and use clangd for parsing. I have a working prototype, but it needs a lot of polishing, and clangd also has to implement a lot of features before it is really useful.

--- End quote ---
That's intriguing. Assuming that clangd would then essentially replace the current code completion module, from my point of view not too much effort should be put into the current module until then.

J.:

--- Quote from: ollydbg on November 21, 2018, 01:38:47 am ---The KMP algorithm in the tokenizer class is only used for macro string expansion.

Parsing C++ source code is quite hard. I have been maintaining the parsing algorithm for a long time, but there are still a lot of things to implement.

--- End quote ---
But C++ parsing, together with C preprocessor (cpp) macro expansion, would then be covered by clangd once it is implemented, no? See above.

--- Quote ---If you have the motivation and time, we welcome any contribution. Thanks!

--- End quote ---
The question is when clangd can be expected to become part of CB. Then, if I am correct here, life would become much easier with regard to C/C++ parsing, including the cpp (preprocessor) stages.
I will see if I can spare some time for the code completion part. I am inclined to think that KMP is overkill, since we are not looking for substrings inside words: the matches are always at the beginning of a word. My hope is that dropping KMP and using PCRE regular expressions would speed things up. At the moment I am not sure whether the advanced wx regexp flavour is sufficient for what I have in mind.
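If matches really are only needed at the beginning of a word, a scan can skip every position that does not start an identifier token and compare the key once per token. This is a minimal sketch under that assumption; `findAtTokenStart` is a hypothetical helper, and whether it beats KMP or a regex inside the real tokenizer would have to be measured.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// True if c can be part of a C/C++ identifier.
static bool isIdentChar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Hypothetical helper: returns the start of the first identifier token that
// equals `key` (wholeToken == true) or begins with `key` (wholeToken == false),
// or std::string::npos. Only token starts are ever compared against the key.
std::size_t findAtTokenStart(const std::string& text, const std::string& key,
                             bool wholeToken) {
    if (key.empty())
        return std::string::npos;
    std::size_t i = 0;
    while (i < text.size()) {
        if (!isIdentChar(text[i])) {
            ++i;  // skip spaces, operators and punctuation
            continue;
        }
        std::size_t start = i;  // first character of a token
        while (i < text.size() && isIdentChar(text[i]))
            ++i;                // i is now one past the end of the token
        std::size_t len = i - start;
        bool lengthOk = wholeToken ? len == key.size() : len >= key.size();
        if (lengthOk && text.compare(start, key.size(), key) == 0)
            return start;
    }
    return std::string::npos;
}
```

A regular expression with word boundaries (for example `\bkey\b`) expresses the same whole-word constraint; the hand-written scan simply avoids compiling and running a regex engine for each key.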
