CodeCompletion plugin performance: usage of KMP search algorithm (tokenizer.cpp)
J.:
I am wondering why it is the KMP search algorithm that is used in the CodeCompletion plugin (tokenizer.cpp).
The KMP algorithm optimizes the search for any sub-matches in arbitrary strings, while it is my understanding that
* the CB tokenizer passes tokens (i.e. words) to the parser and
* matches can only happen at the beginning of a token (which would then make KMP obsolete).
Moreover, the KMP preprocessing routine is called for every next-match search, regardless of whether the search word (key) has changed. That is a performance downside.
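To illustrate the preprocessing point, here is a minimal sketch of a KMP search that rebuilds its failure table only when the key changes. It is not the code from tokenizer.cpp: the class name CachedKmp, its members and the build counter are hypothetical and exist only for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, NOT the actual tokenizer.cpp code: a KMP searcher
// that rebuilds its failure table only when the key differs from the
// previous call.
class CachedKmp
{
public:
    // Returns the index of the first match of key in text at or after
    // start, or std::string::npos if there is none.
    std::size_t Find(const std::string& text, const std::string& key,
                     std::size_t start = 0)
    {
        if (key.empty())
            return start <= text.size() ? start : std::string::npos;

        // Comparing the key is still linear in its length, but it avoids
        // rebuilding (and reallocating) the table for repeated searches.
        if (key != m_Key)
        {
            m_Key = key;
            BuildTable();
        }

        std::size_t k = 0; // number of key characters currently matched
        for (std::size_t i = start; i < text.size(); ++i)
        {
            while (k > 0 && text[i] != m_Key[k])
                k = m_Fail[k - 1];
            if (text[i] == m_Key[k])
                ++k;
            if (k == m_Key.size())
                return i + 1 - k;
        }
        return std::string::npos;
    }

    // How often the failure table was built (for illustration only).
    std::size_t TableBuilds() const { return m_Builds; }

private:
    // Standard KMP preprocessing: m_Fail[i] is the length of the longest
    // proper prefix of m_Key[0..i] that is also a suffix of it.
    void BuildTable()
    {
        m_Fail.assign(m_Key.size(), 0);
        std::size_t k = 0;
        for (std::size_t i = 1; i < m_Key.size(); ++i)
        {
            while (k > 0 && m_Key[i] != m_Key[k])
                k = m_Fail[k - 1];
            if (m_Key[i] == m_Key[k])
                ++k;
            m_Fail[i] = k;
        }
        ++m_Builds;
    }

    std::string m_Key;
    std::vector<std::size_t> m_Fail;
    std::size_t m_Builds = 0;
};
```

With this, a next-match loop that keeps calling Find with the same key pays for the preprocessing once; only a new key triggers another table build.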
... or is the plan to move CB to another, more robust and more performant lexer/parser combination in the near future?
Any views?
oBFusCATed:
--- Quote from: J. on November 20, 2018, 11:26:47 pm ---... or is the plan to move CB to another, more robust and more performant lexer/parser combination in the near future?
--- End quote ---
My personal goal is to move to a language server plugin and use clangd for parsing. I have a working prototype, but I still have to spend a lot of time polishing it, and clangd itself has to implement a lot of features before it is really useful.
ollydbg:
The KMP algorithm in the Tokenizer class is only used for macro string expansion.
Parsing C++ source code is quite hard. I have been maintaining the parsing algorithm for a long time, but there is still a lot left to implement.
If you have the motivation and time, we welcome any contribution. Thanks.
J.:
--- Quote from: oBFusCATed on November 20, 2018, 11:43:44 pm ---My personal goal is to move to a language server plugin and use clangd for parsing. I have a working prototype, but I still have to spend a lot of time polishing it, and clangd itself has to implement a lot of features before it is really useful.
--- End quote ---
That's intriguing. Assuming that clangd would then essentially replace the current code completion module, from my point of view not too much effort should be put into that module until then.
J.:
--- Quote from: ollydbg on November 21, 2018, 01:38:47 am ---The KMP algorithm in the Tokenizer class is only used for macro string expansion.
Parsing C++ source code is quite hard. I have been maintaining the parsing algorithm for a long time, but there is still a lot left to implement.
--- End quote ---
But C++ parsing, together with cpp macro string expansion, would then be covered by clangd once it is implemented, no? See above.
--- Quote ---If you have motivation and time, we welcome any contribution, thanks
--- End quote ---
The question is when clangd can be expected to become part of CB. Then, if I am correct here, life would become much easier with regard to C/C++ parsing, including the cpp stages.
I will see if I can spare some time for the code completion part. I am inclined to think that KMP is over the top, since we are not looking for substrings inside words; the matches are always at the beginning of a word. My hope is that dropping KMP and using PCRE regular expressions would speed things up. Currently I am not sure whether the advanced wx regexp flavour is sufficient for what I have in mind.
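As a rough sketch of what "matches only at the beginning of a word" could look like, the scan below is a plain hand-written loop (neither KMP nor a regular expression). The function names IsIdentChar and FindTokenWithPrefix are made up for this example and do not exist in the plugin, and treating a "word" as a run of alphanumerics and underscores is my assumption.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: characters that may appear inside a word.
// Assumption: a word is a run of alphanumerics and underscores.
inline bool IsIdentChar(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Hypothetical helper, not part of the CodeCompletion plugin: returns the
// index of the first word in text, at or after start, that begins with
// key, or std::string::npos if there is none. The key is only compared
// at word starts, never in the middle of a word.
inline std::size_t FindTokenWithPrefix(const std::string& text,
                                       const std::string& key,
                                       std::size_t start = 0)
{
    std::size_t i = start;

    // If start falls in the middle of a word, skip the rest of that word.
    if (i > 0 && i < text.size() && IsIdentChar(text[i - 1]))
        while (i < text.size() && IsIdentChar(text[i]))
            ++i;

    while (i < text.size())
    {
        if (!IsIdentChar(text[i]))
        {
            ++i;
            continue;
        }
        // i is now at the first character of a word.
        if (text.compare(i, key.size(), key) == 0)
            return i;
        // No match here: jump to the end of this word.
        while (i < text.size() && IsIdentChar(text[i]))
            ++i;
    }
    return std::string::npos;
}
```

For the text "int xFOO = FOO(1) + FOOBAR;" and the key "FOO", this skips xFOO and reports the words FOO and FOOBAR, while the key "BAR" is not found at all, because it only occurs inside a word. Whether a prefix match or a whole-word match is what macro expansion actually needs is something I would still have to check in the code.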