Code::Blocks Forums

Developer forums (C::B DEVELOPMENT STRICTLY!) => Development => Topic started by: J. on November 20, 2018, 11:26:47 pm

Title: CodeCompletion plugin performance: usage of KMP search algorithm (tokenizer.cpp)
Post by: J. on November 20, 2018, 11:26:47 pm
I am wondering why the KMP search algorithm is used in the CodeCompletion plugin (tokenizer.cpp).

The KMP algorithm optimizes the search for any sub-matches in arbitrary strings, while it is my understanding that

Moreover, the KMP preprocessing routine is called for every next-match search, regardless of whether the search word (key) has changed. That is a performance downside.
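
To illustrate the point about repeated preprocessing, here is a minimal sketch of a KMP searcher that rebuilds its failure table only when the key changes. This is not the code from tokenizer.cpp; the class name CachedKmpSearcher and all of its members are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, NOT taken from tokenizer.cpp: a KMP searcher that
// caches the failure table and rebuilds it only when the key changes.
class CachedKmpSearcher
{
public:
    // Returns the index of the first match of 'key' in 'text' at or after
    // 'from', or std::string::npos if there is no match.
    std::size_t Find(const std::string& text, const std::string& key,
                     std::size_t from = 0)
    {
        if (key.empty() || from >= text.size())
            return std::string::npos;

        if (key != m_key) // preprocess only for a new key
        {
            m_key = key;
            BuildTable();
            ++m_tableBuilds;
        }

        std::size_t matched = 0; // number of key characters matched so far
        for (std::size_t i = from; i < text.size(); ++i)
        {
            while (matched > 0 && text[i] != m_key[matched])
                matched = m_fail[matched - 1];
            if (text[i] == m_key[matched])
                ++matched;
            if (matched == m_key.size())
                return i + 1 - matched;
        }
        return std::string::npos;
    }

    // How often the preprocessing step actually ran (for demonstration).
    int TableBuilds() const { return m_tableBuilds; }

private:
    // Standard KMP failure function: m_fail[i] is the length of the longest
    // proper prefix of m_key[0..i] that is also a suffix of it.
    void BuildTable()
    {
        m_fail.assign(m_key.size(), 0);
        std::size_t k = 0;
        for (std::size_t i = 1; i < m_key.size(); ++i)
        {
            while (k > 0 && m_key[i] != m_key[k])
                k = m_fail[k - 1];
            if (m_key[i] == m_key[k])
                ++k;
            m_fail[i] = k;
        }
    }

    std::string m_key;
    std::vector<std::size_t> m_fail;
    int m_tableBuilds = 0;
};
```

Searching the same text twice for the same key, for example Find(text, "FOO") followed by Find(text, "FOO", 1), builds the table once. The key comparison itself still costs time proportional to the key length, so the gain only matters when the same key is searched for repeatedly.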

... or is it intended to move to another, more robust and more performant lexer/parser combination in the near future for CB?

Any views?
Title: Re: CodeCompletion plugin performance: usage of KMP search algorithm (tokenizer.cpp)
Post by: oBFusCATed on November 20, 2018, 11:43:44 pm
Quote
... or is it intended to move to another, more robust and more performant lexer/parser combination in the near future for CB?
My personal goal is to move to a language server plugin and use clangd for parsing. I have a working prototype, but I have to spend a lot of time polishing it, and clangd also has to implement a lot of features to be really useful.
Title: Re: CodeCompletion plugin performance: usage of KMP search algorithm (tokenizer.cpp)
Post by: ollydbg on November 21, 2018, 01:38:47 am
The KMP algorithm in the Tokenizer class is only used for macro string expansion.

Parsing C++ source code is quite hard. I have been maintaining the parsing algorithm for a long time, but there are still a lot of things left to implement.

If you have the motivation and time, we welcome any contribution. Thanks.
Title: Re: CodeCompletion plugin performance: usage of KMP search algorithm (tokenizer.cpp)
Post by: J. on November 26, 2018, 03:48:17 pm
Quote
My personal goal is to move to a language server plugin and use clangd for parsing. I have a working prototype, but I have to spend a lot of time polishing it, and clangd also has to implement a lot of features to be really useful.
That's intriguing. Assuming that clangd would then essentially replace the current code completion module, from my point of view not too much effort should be put into that module until then.
Title: Re: CodeCompletion plugin performance: usage of KMP search algorithm (tokenizer.cpp)
Post by: J. on November 26, 2018, 03:58:40 pm
Quote
The KMP algorithm in the Tokenizer class is only used for macro string expansion.

Parsing C++ source code is quite hard. I have been maintaining the parsing algorithm for a long time, but there are still a lot of things left to implement.
But C++ parsing, together with cpp macro string expansion, would then be covered by clangd once implemented, no? See above.
Quote
If you have the motivation and time, we welcome any contribution. Thanks.
The question is when clangd can be expected to be part of CB. Then, if I am correct here, life would become much easier with regard to C/C++ parsing, including the cpp stages.
I will see if I can spare some time for the code completion part. I am inclined to think that KMP is overkill, since we are not looking for substrings inside words; the matches are always at the beginning of a word. My hope is that dropping KMP and using PCRE regular expressions would speed things up. Currently I am not sure whether the advanced wx regexp flavour is sufficient for what I have in mind.
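
As a rough illustration of matching only at the beginning of a word, the check can be done with a direct comparison and needs neither KMP nor a regular expression. This is a minimal sketch under that assumption; the helper names IsIdentChar and MatchesAtWordStart are hypothetical and are not taken from tokenizer.cpp.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helpers for illustration only; NOT taken from tokenizer.cpp.

// True for characters that can be part of a C/C++ identifier.
inline bool IsIdentChar(char c)
{
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// True if 'key' occurs in 'text' exactly at position 'pos' and 'pos' is the
// beginning of a word (start of text, or preceded by a non-identifier char).
// The key may be a prefix of a longer word; the end of the word is not checked.
inline bool MatchesAtWordStart(const std::string& text, std::size_t pos,
                               const std::string& key)
{
    if (key.empty() || pos > text.size() || text.size() - pos < key.size())
        return false;
    if (pos > 0 && IsIdentChar(text[pos - 1]))
        return false; // 'pos' is in the middle of a word
    return text.compare(pos, key.size(), key) == 0;
}
```

Each call costs at most one comparison of key length and needs no preprocessing. Whether this covers everything the macro expansion code requires is an open assumption.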
Title: Re: CodeCompletion plugin performance: usage of KMP search algorithm (tokenizer.cpp)
Post by: oBFusCATed on November 26, 2018, 05:38:23 pm
Quote
The question is when clangd can be expected to be part of CB...
When it is ready, as always. I hope to spend some time on the LSP plugin soon...
But clangd as a server has limitations, too. If you want to contribute to a parser, contributing to clangd is a good option. :)
Title: Re: CodeCompletion plugin performance: usage of KMP search algorithm (tokenizer.cpp)
Post by: ollydbg on November 27, 2018, 06:36:06 am
Language Server Protocol - Wikipedia (https://en.wikipedia.org/wiki/Language_Server_Protocol) sounds interesting, and it was initially designed by Microsoft. :)
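
For readers unfamiliar with the protocol, here is a minimal sketch of how a client such as an LSP plugin would frame a message for a server like clangd. The base protocol sends a Content-Length header, a blank line, and then a JSON-RPC body. The function names FrameLspMessage and MakeShutdownRequest are hypothetical and do not come from any Code::Blocks plugin; the "shutdown" method name is from the LSP specification.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: wraps a JSON-RPC payload in the LSP base-protocol
// framing. Content-Length is the size of the body in bytes.
inline std::string FrameLspMessage(const std::string& jsonBody)
{
    return "Content-Length: " + std::to_string(jsonBody.size()) +
           "\r\n\r\n" + jsonBody;
}

// Hypothetical helper: builds a "shutdown" request, which takes no params.
// The id is a placeholder chosen by the client.
inline std::string MakeShutdownRequest(int id)
{
    return "{\"jsonrpc\":\"2.0\",\"id\":" + std::to_string(id) +
           ",\"method\":\"shutdown\"}";
}
```

A real client would build the JSON with a proper library and would also have to parse framed responses; this only shows the wire format.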