Developer forums (C::B DEVELOPMENT STRICTLY!) > Development

Wrong spell checker on russian (and probably other languages)

<< < (7/10) > >>

oBFusCATed:
1. m_pSpellHelper->HasStyleToBeChecked(lang, stc->GetStyleAt(pos)) in OnlineSpellChecker::DoSetIndications is wrong. It should be using wordstart
2. pos++ in the same loop is suspicious. Why don't you move with a whole word if HasStyleToBeChecked returns false?
3. do you need to WordEndPosition if WordStartPosition fails?
4. wordstart and wordend variables could have smaller scope, they aren't used outside of the loop
5. m_invalidatedRangesStart and m_invalidatedRangesEnd are (almost) already at word boundaries, why don't you take advantage of this, or even don't do the word boundaries adjustment for them
6. What happens if some of the new function calls return wrong result in SpellCheckerPlugin::OnThesaurus? Is no the selection case handled correctly there?
7. Same in SpellCheckerPlugin::OnCamelCase

BlueHazzard:

--- Quote ---1. m_pSpellHelper->HasStyleToBeChecked(lang, stc->GetStyleAt(pos)) in OnlineSpellChecker::DoSetIndications is wrong. It should be using wordstart
--- End quote ---
fixed

--- Quote ---2. pos++ in the same loop is suspicious. Why don't you move with a whole word if HasStyleToBeChecked returns false?
--- End quote ---
because this can end in an infinite loop.
if i use this code:

--- Code: ---pos = wordend + 1;
--- End code ---
I tested it and found this case: End of line

--- Code: ---*\n\r
--- End code ---
1) Word start is 0 ('*') word end is  0 ('*')
1.1) pos = wordend (0) + 1 == 1
2) Word start is 1 ('\n') word end is 1 ('\n')
2.1) pos = wordend (1) + 1 == 2
3) Word start is 1 ('\n') word end is 1 ('\n') because '\r' is not used as word start
3.1) pos = wordend (1) + 1 == 2
4 ...... (endless loop)

pos++ guarantees that strange word combinations are skipped (for example faulty utf code points scintilla can not parse, purely speculative)....


--- Quote ---3. do you need to WordEndPosition if WordStartPosition fails?
--- End quote ---
This really sounds like premature optimization (and what did Knuth tell us about that?) Anyway, i added a check for it


--- Quote ---4. wordstart and wordend variables could have smaller scope, they aren't used outside of the loop
--- End quote ---
fixed


--- Quote ---5. m_invalidatedRangesStart and m_invalidatedRangesEnd are (almost) already at word boundaries, why don't you take advantage of this, or even don't do the word boundaries adjustment for them
--- End quote ---
This two variables contain the whole invalidated/modified text. This range is also used to remove the marker. So the whole block that has to be rechecked is invalidated at one call. If we split this range into single words, we get rid of two messages calls and loose this advantages, because we call a lot more times IndicatorSetStyle (for every word, that can also be a single charachter)... No gain for speed here...


--- Quote ---6. What happens if some of the new function calls return wrong result in SpellCheckerPlugin::OnThesaurus? Is no the selection case handled correctly there?
--- End quote ---
Yes and i added additional checks


--- Quote ---7. Same in SpellCheckerPlugin::OnCamelCase
--- End quote ---
see above

I have committed the code now (revision: 11888)... I think we are at a good point and i would like to have it tested in the next nightly....
Thank you again!

oBFusCATed:

--- Quote from: BlueHazzard on October 28, 2019, 11:20:58 pm ---
--- Quote ---2. pos++ in the same loop is suspicious. Why don't you move with a whole word if HasStyleToBeChecked returns false?
--- End quote ---
because this can end in an infinite loop.
if i use this code:

--- Code: ---pos = wordend + 1;
--- End code ---
I tested it and found this case: End of line

--- Code: ---*\n\r
--- End code ---
1) Word start is 0 ('*') word end is  0 ('*')
1.1) pos = wordend (0) + 1 == 1
2) Word start is 1 ('\n') word end is 1 ('\n')
2.1) pos = wordend (1) + 1 == 2
3) Word start is 1 ('\n') word end is 1 ('\n') because '\r' is not used as word start
3.1) pos = wordend (1) + 1 == 2
4 ...... (endless loop)

--- End quote ---
You can always do a pos++ if you detect you're not moving. Keep in mind that checks are inexpensive relatively to calls to Scintilla! My guess would 10:1 the difference in execution time. Scintilla calls are non inlined, there 1 or even 2 big switches and then there is more code for the actual operation.


--- Quote from: BlueHazzard on October 28, 2019, 11:20:58 pm ---
--- Quote ---3. do you need to WordEndPosition if WordStartPosition fails?
--- End quote ---
This really sounds like premature optimization (and what did Knuth tell us about that?) Anyway, i added a check for it

--- End quote ---
Are you sure you're quoting Knuth correctly? Do you actually know the context where he formulated it and where it was valid? This quote is probably from different time and it is not valid any more.

Computers are fast, really fast, software is just painfully inefficient...

BlueHazzard:
ok, your last comment made me wonder, and indeed;

--- Code: ---diff --git a/src/plugins/contrib/SpellChecker/OnlineSpellChecker.cpp b/src/plugins/contrib/SpellChecker/OnlineSpellChecker.cpp
index b08b91f51..2753582e6 100644
--- a/src/plugins/contrib/SpellChecker/OnlineSpellChecker.cpp
+++ b/src/plugins/contrib/SpellChecker/OnlineSpellChecker.cpp
@@ -198,6 +198,8 @@ void OnlineSpellChecker::DoSetIndications(cbEditor* ctrl) const
             // remove styling:
             stc->IndicatorClearRange(start, end - start);

+            int oldWordStart = 0;
+            int posToAdd = 0;
             for ( int pos = start ;  pos < end ; pos++)
             {
                 EditorColourSet* colour_set = Manager::Get()->GetEditorManager()->GetColourSet();
@@ -215,7 +217,17 @@ void OnlineSpellChecker::DoSetIndications(cbEditor* ctrl) const
                      m_pSpellHelper->HasStyleToBeChecked(lang, stc->GetStyleAt(wordstart)) )
                 {
                     DissectWordAndCheck(stc, wordstart, wordend);
-                    pos = wordend;
+                }
+                pos = wordend;
+                if ( oldWordStart == wordstart )
+                {
+                    posToAdd++;
+                    pos += posToAdd;
+                }
+                else
+                {
+                    posToAdd = 0;
+                    oldWordStart = wordstart;
                 }

             }

--- End code ---
this patch makes loading the main.cpp file nearly 33% faster...

oBFusCATed:
I don't really understand what it does...  :o
It would be good if you either add a comment explaining it or even better make it simpler and more clear.

Navigation

[0] Message Index

[#] Next page

[*] Previous page

Go to full version