Author Topic: Spellchecker Issues (Read 46628 times)

oBFusCATed · « **Reply #30 on:** August 06, 2015, 11:15:30 pm »

Quote from: stahta01 on August 06, 2015, 10:53:00 pm

Maybe the wrong single quote is used?

Re-read White-Tiger's post. He seems to have found the reason.

White-Tiger · « **Reply #31 on:** August 10, 2015, 01:44:13 pm »

Quote from: oBFusCATed on August 06, 2015, 08:56:05 pm

Hm, I've wondered why "doesn't" is detected as misspelled.
Can you post a patch with your second suggestion?

Which second suggestion exactly?
If you're talking about the wxIspunct and unification as mentioned, I've come to the conclusion that wxIspunct should be ok here. It got introduced in r10014 (spellchecker: replace hardcoded character set with unicode compatible calls, improves checking accuracy in utf8 comments) by alpha0010.
"wxIspunct" seems to handle everything that isn't a word character. This includes ' and other characters that might be "part" of a word in some languages, so just filter those few characters out and it'll be fine.

Making HunspellInterface.cpp:130 use wxIspunct might be a bit troublesome, as it would require rewriting the code so that we manually loop over the string and search/parse words.

I've also taken a peek at Firefox's spell checker, and it also uses ispunct, but the apostrophe is a special case: it counts as punctuation if it stands alone or is not between two "words".
That is, " Windows' " is seen as " Windows " because there's no letter after the apostrophe, and is thus checked successfully for spelling. "Windows'" is not part of the dictionary because it isn't needed if those rules are followed.

So something like "IsWhiteSpace()" returning 0 for non-space/word characters, 1 for space and 2 for "special" characters. When it returns 2, we check whether another !IsWhiteSpace character follows: if so, the special character is part of the word and not a space; otherwise it acts as a space.
Firefox, though, uses ispunct() together with IsConditionalPunctuation(), which returns true for ', 0x2019 /*RIGHT SINGLE QUOTATION MARK*/ and 0x00B7 /*MIDDLE DOT*/.
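The tri-state classification and the apostrophe rule described above could be sketched roughly as follows, using a manual loop over the string as mentioned for HunspellInterface.cpp. This is a standalone illustration, not Code::Blocks or Firefox code: the names CharClass, IsLetter, Classify and SplitWords are hypothetical, and IsLetter only approximates a real Unicode letter test with ASCII alphanumerics, the Latin-1/Latin Extended letter range and the Cyrillic block. The conditional set (', U+2019, U+00B7) is the one Firefox's IsConditionalPunctuation() is described as using.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// 0 = word character, 1 = separator ("space"), 2 = "special" (conditional).
enum class CharClass { Word, Separator, Conditional };

// Hypothetical, deliberately rough letter test: ASCII alphanumerics,
// Latin-1/Latin Extended letters (minus the multiplication/division signs)
// and the Cyrillic block. Not a full Unicode classification.
bool IsLetter(char32_t c) {
    if ((c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z') || (c >= U'0' && c <= U'9'))
        return true;
    if (c >= 0x00C0 && c <= 0x024F && c != 0x00D7 && c != 0x00F7)
        return true;
    return c >= 0x0400 && c <= 0x04FF;
}

// Apostrophe, RIGHT SINGLE QUOTATION MARK and MIDDLE DOT, as listed in the thread.
bool IsConditionalPunctuation(char32_t c) {
    return c == U'\'' || c == 0x2019 || c == 0x00B7;
}

CharClass Classify(char32_t c) {
    if (IsLetter(c)) return CharClass::Word;
    if (IsConditionalPunctuation(c)) return CharClass::Conditional;
    return CharClass::Separator;
}

// Split text into words. A conditional character is kept inside a word only
// when the current word is non-empty (a letter precedes it) and the next
// character is a letter; otherwise it is treated like a separator.
std::vector<std::u32string> SplitWords(const std::u32string& text) {
    std::vector<std::u32string> words;
    std::u32string current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        CharClass cls = Classify(text[i]);
        bool keep = (cls == CharClass::Word);
        if (cls == CharClass::Conditional) {
            bool nextIsLetter = i + 1 < text.size() && Classify(text[i + 1]) == CharClass::Word;
            keep = !current.empty() && nextIsLetter;
        }
        if (keep) {
            current += text[i];
        } else if (!current.empty()) {
            words.push_back(current);  // word ends at this separator
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}
```

With this rule, "doesn't" stays a single word while "Windows'" is checked as "Windows". A real implementation would presumably rely on wxWidgets' Unicode-aware classification instead of hardcoded ranges.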

MortenMacFly · « **Reply #32 on:** August 18, 2015, 01:09:09 pm »

Quote from: White-Tiger on August 10, 2015, 01:44:13 pm

...

Well, I am a bit lost now. Could you please briefly state once again what changes will fix the originally reported bug? Maybe you can even provide a patch? It's easy to do: check out from SVN, make the changes in the working copy, and run this command at the root of your working copy:
svn diff > diff.patch
(...assuming you have the SVN executable in the path.)

MortenMacFly · « **Reply #33 on:** January 29, 2016, 07:18:04 am »

Quote from: Khram on January 29, 2016, 01:35:43 am

The new version 16.01 spell checker also confuses words, threatening the collapse of the whole program, of course.

Well, I figured out meanwhile that this is not a C::B issue but a Hunspell issue (that's the lib we use for spellchecking). I told the Hunspell maintainers, but I've got nothing in return so far...

White-Tiger · « **Reply #34 on:** January 29, 2016, 04:06:20 pm »

why do you think it's a hunspell issue?

MortenMacFly · « **Reply #35 on:** January 29, 2016, 10:45:18 pm »

Quote from: White-Tiger on January 29, 2016, 04:06:20 pm

why do you think it's a hunspell issue?

Well, to be more precise, we actually have two issues here:
1.) (Hunspell): If the dictionaries are in a path with non-ASCII characters, Hunspell is unable to pick up any dictionary.
2.) The Russian words are broken due to the way we find word boundaries in OnlineSpellChecker.cpp. There (search for the comment "//find recheck range end:") we check for whitespace in a way that does not work for, e.g., Russian (see SpellCheckHelper::IsWhiteSpace(ch)).

We can do something about the latter... the former we can't. Both lead to Russian spellchecking being broken, unfortunately.
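The thread does not show the body of SpellCheckHelper::IsWhiteSpace, so the following is only a hypothetical illustration of the kind of failure described in 2.): if the scan for the end of the recheck range classifies characters with an ASCII-only test, every byte of a UTF-8 encoded Cyrillic word looks like a boundary. IsWordByteNaive, IsWordByte and FindWordEnd are invented for this sketch and assume UTF-8 input. Treating every non-ASCII byte as part of a word is a deliberately crude fix that would also swallow non-ASCII punctuation such as « or U+2019.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Naive check: only ASCII letters/digits count as word characters, so every
// byte of a UTF-8 encoded Cyrillic word is treated as a boundary.
bool IsWordByteNaive(unsigned char b) {
    return b < 0x80 && std::isalnum(b);
}

// Tolerant check (assumption: input is UTF-8): bytes >= 0x80 are lead or
// continuation bytes of a multi-byte sequence, so keep them in the word.
bool IsWordByte(unsigned char b) {
    return b >= 0x80 || std::isalnum(b);
}

// Scan forward from pos to the first byte the predicate rejects,
// i.e. the end of the current word.
template <typename Pred>
std::size_t FindWordEnd(const std::string& utf8, std::size_t pos, Pred isWord) {
    while (pos < utf8.size() && isWord(static_cast<unsigned char>(utf8[pos])))
        ++pos;
    return pos;
}
```

For the UTF-8 bytes of "слово test", the naive predicate stops at offset 0, whereas the tolerant one reaches offset 10, the end of the Cyrillic word.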

White-Tiger · « **Reply #36 on:** January 30, 2016, 01:33:37 am »

1) is also a C::B issue, see: http://forums.codeblocks.org/index.php/topic,20195.msg139323.html#msg139323
Hunspell has means to support such paths; C::B is simply not using them.
One could argue that they could have supported wchar_t* directly, though their way is a bit less platform-dependent.

So something like this in HunspellInterface.cpp:61-63

Code: cpp

    wxCharBuffer affixFileCharBuffer = ConvertToUnicode(_T("\\\\?\\") + strAffixFile);
    wxCharBuffer dictionaryFileCharBuffer = ConvertToUnicode(_T("\\\\?\\") + strDictionaryFile);
    m_pHunspell = new Hunspell(affixFileCharBuffer, dictionaryFileCharBuffer);

would work for Windows (this is what I'm using locally, and as far as I can tell, it seems to work)
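The snippet above adds the `\\?\` prefix unconditionally, which only makes sense on Windows. A minimal sketch of a platform guard follows, assuming (as the post implies) that Hunspell on Windows treats a path starting with `\\?\` as UTF-8, and that other platforms accept UTF-8 paths as they are. ToHunspellPath is a hypothetical helper, not the cross-platform change that was actually committed to C::B.

```cpp
#include <string>

// Hypothetical helper: on Windows, prefix a UTF-8 path with "\\?\" so Hunspell
// can open dictionaries in non-ASCII directories; elsewhere return it unchanged.
std::string ToHunspellPath(const std::string& utf8Path) {
#ifdef _WIN32
    static const std::string kPrefix = "\\\\?\\";  // the four characters \\?\ 
    if (utf8Path.compare(0, kPrefix.size(), kPrefix) != 0)  // not prefixed yet
        return kPrefix + utf8Path;
#endif
    return utf8Path;
}
```

The results could then be passed as `new Hunspell(affPath.c_str(), dicPath.c_str())`, after converting the wxString paths to UTF-8.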

MortenMacFly · « **Reply #37 on:** January 30, 2016, 07:48:28 am »

Quote from: White-Tiger on January 30, 2016, 01:33:37 am

So something like this in HunspellInterface.cpp:61-63
Code: cpp
    wxCharBuffer affixFileCharBuffer = ConvertToUnicode(_T("\\\\?\\") + strAffixFile);
    wxCharBuffer dictionaryFileCharBuffer = ConvertToUnicode(_T("\\\\?\\") + strDictionaryFile);
    m_pHunspell = new Hunspell(affixFileCharBuffer, dictionaryFileCharBuffer);
would work for Windows (this is what I'm using locally, and as far as I can tell, it seems to work)

I've applied a cross-platform compatible version of this - for me that really seems to work. Nice catch!

So now what's missing is umlauts and Unicode... at least we are getting closer...

Code::Blocks Forums
