Author Topic: Spellchecker Issues  (Read 45836 times)

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13413
Re: Spellchecker Issues
« Reply #30 on: August 06, 2015, 11:15:30 pm »
Maybe the wrong single quote is used?
Re-read White-Tiger's post. He seems to have found the reason.
(most of the time I ignore long posts)
[strangers don't send me private messages, I'll ignore them; post a topic in the forum, but first read the rules!]

Offline White-Tiger

  • Multiple posting newcomer
  • *
  • Posts: 83
Re: Spellchecker Issues
« Reply #31 on: August 10, 2015, 01:44:13 pm »
Hm, I've wondered why "doesn't" is detected as misspelled.
Can you post a patch with your second suggestion?
Which second suggestion exactly?
If you're talking about the wxIspunct and unification as mentioned, I've come to the conclusion that wxIspunct should be OK here. It was introduced in r10014 (spellchecker: replace hardcoded character set with unicode compatible calls, improves checking accuracy in utf8 comments) by alpha0010.
"wxIspunct" seems to handle everything that isn't a word character. This includes ' and other characters that might be "part" of a word in some languages. So just filter those few characters out and it'll be fine.

Making use of wxIspunct at HunspellInterface.cpp:130 might be a bit troublesome, as it requires rewriting the code so that we manually loop over the string and search/parse words.

I've also taken a peek at Firefox's spell checker and it's also using ispunct, but the apostrophe is a special case: it counts as punctuation if it stands alone or is not between two "words",
that is, " Windows' " is seen as " Windows " as there's no letter after the apostrophe (and thus is successfully checked for spelling). "Windows'" is not part of the dictionary because it's not required if those rules are followed.

So something like "IsWhiteSpace()" returning 0 for non-space (word) characters, 1 for space and 2 for "special". When it returns 2, we check whether the next character is also a non-space character: if so, the special character is part of the word; otherwise it is treated as a space.
Though Firefox uses ispunct() together with IsConditionalPunctuation() which returns true for ', 0x2019 /*RIGHT SINGLE QUOTATION MARK*/ and 0x00B7 /*MIDDLE DOT*/
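
A minimal sketch of the three-way classification described above, written as standalone C++ rather than against the actual C::B sources. The names CharClass, Classify and SplitWords are hypothetical, and IsConditionalPunctuation here is only a stand-in that covers the three characters listed above, not Firefox's implementation. It additionally requires a word character before the special character, so a leading quote is dropped as well.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>
#include <vector>

// 0 = word character, 1 = space/punctuation, 2 = "special" (decided by context).
enum class CharClass { Word = 0, Space = 1, Conditional = 2 };

// Stand-in for the conditional set: ', U+2019 and U+00B7.
bool IsConditionalPunctuation(wchar_t ch)
{
    return ch == L'\'' || ch == 0x2019 || ch == 0x00B7;
}

CharClass Classify(wchar_t ch)
{
    if (IsConditionalPunctuation(ch))
        return CharClass::Conditional;
    if (std::iswspace(ch) || std::iswpunct(ch))
        return CharClass::Space;
    return CharClass::Word;
}

// Splits text into words. A conditional character stays inside a word only
// when it sits between two word characters ("doesn't"); otherwise it acts
// as a separator ("Windows'" yields "Windows").
std::vector<std::wstring> SplitWords(const std::wstring& text)
{
    std::vector<std::wstring> words;
    std::wstring current;
    for (std::size_t i = 0; i < text.size(); ++i)
    {
        CharClass cls = Classify(text[i]);
        if (cls == CharClass::Conditional)
        {
            const bool prevIsWord = !current.empty();
            const bool nextIsWord = i + 1 < text.size()
                                 && Classify(text[i + 1]) == CharClass::Word;
            cls = (prevIsWord && nextIsWord) ? CharClass::Word : CharClass::Space;
        }
        if (cls == CharClass::Word)
        {
            current += text[i];
        }
        else if (!current.empty())
        {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty())
        words.push_back(current);
    return words;
}
```

With this, SplitWords(L"Windows' doesn't") should produce "Windows" and "doesn't", so neither needs a dictionary entry with a trailing apostrophe.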
Windoze 8.1 x86_64 16GiB RAM, wxWidgets-2.8x (latest,trunk), MinGW-builds (latest, posix-threads)
Code::Blocks (x86 , latest , selection length patch , build option fixes/additions , toggle comments)

Offline MortenMacFly

  • Administrator
  • Lives here!
  • *****
  • Posts: 9694
Re: Spellchecker Issues
« Reply #32 on: August 18, 2015, 01:09:09 pm »
...
Well, I am a bit lost now. Could you please briefly state once again which changes will fix the original bug reported? Maybe you can even provide a patch? It's easy to do: check out from SVN, make the changes in the working copy, and run this command at the root of your working copy:
svn diff > diff.patch
(...assuming you have the SVN executable in the path.)
Compiler logging: Settings->Compiler & Debugger->tab "Other"->Compiler logging="Full command line"
C::B Manual: https://www.codeblocks.org/docs/main_codeblocks_en.html
C::B FAQ: https://wiki.codeblocks.org/index.php?title=FAQ

Offline MortenMacFly

  • Administrator
  • Lives here!
  • *****
  • Posts: 9694
Re: The new 16.01 spellchecker not work
« Reply #33 on: January 29, 2016, 07:18:04 am »
The new version 16.01 spell checker also confuses words, threatening the collapse of the whole program, of course.
Well, I have figured out in the meantime that this is not a C::B issue but a Hunspell issue (that's the lib we use for spellchecking). I told the Hunspell maintainers, but I got nothing in return so far...

Offline White-Tiger

  • Multiple posting newcomer
  • *
  • Posts: 83
Re: Spellchecker Issues
« Reply #34 on: January 29, 2016, 04:06:20 pm »
why do you think it's a hunspell issue?

Offline MortenMacFly

  • Administrator
  • Lives here!
  • *****
  • Posts: 9694
Re: Spellchecker Issues
« Reply #35 on: January 29, 2016, 10:45:18 pm »
why do you think it's a hunspell issue?
Well to be more precise: We actually have two issues here:
1.) (hunspell): If the dictionaries are in a path with non-ASCII characters hunspell is unable to pick up any dictionary.
2.) The Russian words are broken due to the way we find word boundaries in OnlineSpellChecker.cpp. Here (search for the comment "//find recheck range end:") we check for whitespace in a way that does not work for e.g. Russian (see SpellCheckHelper::IsWhiteSpace(ch)).

We can do something about the latter, but not about the former. Unfortunately, both lead to Russian spell checking being broken.
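
To illustrate the second issue, here is a rough sketch of a recheck-range search that treats every non-ASCII character as part of a word, so Cyrillic text is not split in the middle. This is not the code from OnlineSpellChecker.cpp: IsSeparator and FindRecheckRange are hypothetical names, and the rule "anything at or above 0x80 is a word character" is a deliberate simplification (it would, for example, also glue non-ASCII punctuation to words).

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for a whitespace/boundary test.
// Only ASCII characters that are not letters or digits separate words;
// all non-ASCII characters (e.g. Cyrillic) count as word characters.
bool IsSeparator(wchar_t ch)
{
    if (ch >= 0x80)
        return false;
    const bool isAsciiAlnum = (ch >= L'0' && ch <= L'9')
                           || (ch >= L'A' && ch <= L'Z')
                           || (ch >= L'a' && ch <= L'z');
    return !isAsciiAlnum;
}

// Expands pos to the half-open range [start, end) of the word around it.
std::pair<std::size_t, std::size_t> FindRecheckRange(const std::wstring& text,
                                                     std::size_t pos)
{
    std::size_t start = pos < text.size() ? pos : text.size();
    std::size_t end = start;
    // find recheck range start
    while (start > 0 && !IsSeparator(text[start - 1]))
        --start;
    // find recheck range end
    while (end < text.size() && !IsSeparator(text[end]))
        ++end;
    return {start, end};
}
```

A boundary test that only knows a fixed set of Latin letters would cut such a word into fragments, and each fragment would then be reported as misspelled.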

Offline White-Tiger

  • Multiple posting newcomer
  • *
  • Posts: 83
Re: Spellchecker Issues
« Reply #36 on: January 30, 2016, 01:33:37 am »
1) is also a C::B issue.. see: http://forums.codeblocks.org/index.php/topic,20195.msg139323.html#msg139323
Hunspell has the means to support such paths; C::B is simply not using them.
One could argue that they could have supported wchar_t* directly, though their way is a bit less platform-dependent.

So something like this in HunspellInterface.cpp:61-63
Code: cpp
    wxCharBuffer affixFileCharBuffer = ConvertToUnicode(_T("\\\\?\\") + strAffixFile);
    wxCharBuffer dictionaryFileCharBuffer = ConvertToUnicode(_T("\\\\?\\") + strDictionaryFile);
    m_pHunspell = new Hunspell(affixFileCharBuffer, dictionaryFileCharBuffer);
would work for Windows (this is what I'm using locally, and as far as I can tell, it seems to work)

Offline MortenMacFly

  • Administrator
  • Lives here!
  • *****
  • Posts: 9694
Re: Spellchecker Issues
« Reply #37 on: January 30, 2016, 07:48:28 am »
So something like this in HunspellInterface.cpp:61-63
Code: cpp
    wxCharBuffer affixFileCharBuffer = ConvertToUnicode(_T("\\\\?\\") + strAffixFile);
    wxCharBuffer dictionaryFileCharBuffer = ConvertToUnicode(_T("\\\\?\\") + strDictionaryFile);
    m_pHunspell = new Hunspell(affixFileCharBuffer, dictionaryFileCharBuffer);
would work for Windows (this is what I'm using locally, and as far as I can tell, it seems to work)
I've applied a cross-platform compatible version of this - for me that really seems to work. Nice catch!

So now what's missing is umlauts and Unicode... at least we are getting closer...