If making a guess, the issue might be here:
bool SpellCheckHelper::IsWhiteSpace(const wxChar &ch)
{
return wxIsspace(ch) || wxIspunct(ch) || wxIsdigit(ch);
}
The dictionary is available on Yandex.Disk: https://yadi.sk/d/glyHPzRKgsZkQ
A test file and a screenshot of the wrong behaviour are attached.
Here are the C::B version string and locale settings:
Code::Blocks svn build rev 10309 May 25 2015, 10:02:04 - wx2.8.12 (Linux, unicode) - 64 bit
alatar@al_work:~% locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
alatar@al_work:~% uname -a
Linux al_work 3.17.7-gentoo #1 SMP PREEMPT Mon Mar 30 18:24:07 MSK 2015 x86_64 Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz GenuineIntel GNU/Linux
@Alatar:
I've tried the hunspell binary and it cannot correctly spell-check the Russian part of the file using your dictionary.
I tried something like:
$ copy the dictionary to /usr/share/hunspell
$ hunspell -d Russian-English -i utf-8 /tmp/spellcheck_check.txt
I guess this is the problem:
error: unknown encoding Windows-1251: using iso88591 as fallback
Please keep in mind that hunspell uses iconv to do the conversions.
If you can reproduce the problem with hunspell in a console, then you should talk to either the hunspell devs or the vendor of your dictionary.
I'm running this test on Gentoo Linux.
Well, the way Code::Blocks opens files seems to be fine. I'm not sure whether Code::Blocks uses Unicode / wchar_t / TCHAR on Windows, but SpellChecker does find and successfully open the dictionaries; otherwise this wouldn't work:
SpellChecker: Thesaurus files 'C:\Users\René\AppData\Roaming\codeblocks\SpellChecker\th_en_GB.idx' not found!
SpellChecker: Loading 'C:\Users\René\AppData\Roaming\codeblocks\SpellChecker\th_en_US.idx' instead...
So parts of SpellChecker seem to work, while others don't.
edit: My bet:
HunspellInterface.cpp:61-62: should both prefix the path with "\\?\" to let Hunspell handle UTF-8 paths on Windows.. (Windows only)
see:
/* Hunspell(aff, dic) - constructor of Hunspell class
* input: path of affix file and dictionary file
*
* In WIN32 environment, use UTF-8 encoded paths started with the long path
* prefix \\\\?\\ to handle system-independent character encoding and very
* long path names (without the long path prefix Hunspell will use fopen()
* with system-dependent character encoding instead of _wfopen()).
*/
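A minimal sketch of the prefixing step in portable C++, assuming the path is already UTF-8 encoded, absolute, and uses backslashes. WithLongPathPrefix is a hypothetical helper name, not something that exists in Code::Blocks or Hunspell; it only shows the string handling and leaves already-prefixed paths alone.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Code::Blocks or Hunspell): prepends the
// Windows long-path prefix unless the path already starts with it.
std::string WithLongPathPrefix(const std::string& utf8Path)
{
    // Four characters: backslash, backslash, question mark, backslash.
    static const std::string prefix = "\\\\?\\";
    if (utf8Path.compare(0, prefix.size(), prefix) == 0)
        return utf8Path;                 // already prefixed, return unchanged
    return prefix + utf8Path;
}

// Small usage example; the file name is made up for illustration.
void ExampleLongPathPrefix()
{
    const std::string dic = "C:\\Users\\Ren\xC3\xA9\\en_GB.dic";  // "René" in UTF-8
    const std::string prefixed = WithLongPathPrefix(dic);
    assert(prefixed.compare(0, 4, "\\\\?\\") == 0);
    assert(WithLongPathPrefix(prefixed) == prefixed);             // idempotent
}
```

The idempotence check matters if the same path can pass through the helper twice, for example once for the affix file and once when reloading a dictionary.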
If making a guess, the issue might be here:
bool SpellCheckHelper::IsWhiteSpace(const wxChar &ch)
{
return wxIsspace(ch) || wxIspunct(ch) || wxIsdigit(ch);
}
This was actually the root of a problem I've encountered after fixing my file path issue above.
I had to change the "wxIspunct(ch)" part into "(wxIspunct(ch) && ch!='\'')" because words such as "doesn't" were also flagged as misspelled.
I suggest unifying the source code and using something like what is seen in HunspellInterface.cpp:130 (which uses a list of known "non-word" chars):
wxString strDelimiters = _T(" \t\r\n.,?!@#$%^&*()-=_+[]{}\\|;:\"<>/~0123456789");
wxStringTokenizer tkz(strText, strDelimiters);
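A rough sketch of what a shared delimiter-based check could look like, written against plain std::wstring rather than wxString so it compiles without wxWidgets. The names kDelimiters, IsDelimiter and SplitWords are made up for illustration; the delimiter list is the one quoted above, which does not contain the apostrophe, so "doesn't" stays one word.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Delimiter list copied from the snippet quoted above (no apostrophe in it).
static const std::wstring kDelimiters =
    L" \t\r\n.,?!@#$%^&*()-=_+[]{}\\|;:\"<>/~0123456789";

// Hypothetical replacement for the wxIsspace/wxIspunct/wxIsdigit test:
// a character separates words only if it is in the explicit list.
bool IsDelimiter(wchar_t ch)
{
    return kDelimiters.find(ch) != std::wstring::npos;
}

// Splits text into words using the same rule, skipping empty tokens.
std::vector<std::wstring> SplitWords(const std::wstring& text)
{
    std::vector<std::wstring> words;
    std::wstring current;
    for (wchar_t ch : text) {
        if (IsDelimiter(ch)) {
            if (!current.empty()) {
                words.push_back(current);
                current.clear();
            }
        } else {
            current += ch;
        }
    }
    if (!current.empty())
        words.push_back(current);
    return words;
}

// Usage example: the apostrophe does not break the word apart.
void ExampleSplitWords()
{
    const std::vector<std::wstring> words = SplitWords(L"It doesn't work, 42 times!");
    assert(words.size() == 4);
    assert(words[1] == L"doesn't");
}
```

Using one list in both places would keep the editor highlighting and the Hunspell tokenizer in agreement about where a word starts and ends.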
I've also noticed that SpellChecker doesn't seem to handle UTF-8 at all. At least, when I try to correct the word "doesn¾" and pick the suggested "doesn't", I end up with "doesn't\xBE".
The menu item also only showed "doesn" without any visible char after it (so only the first half of the UTF-8 char).
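The leftover byte would be consistent with a replacement range counted in characters but applied to UTF-8 bytes ("¾" is U+00BE, encoded as the two bytes C2 BE). That is only a guess, not something confirmed from the SpellChecker source. A minimal sketch of the difference, with Utf8ByteOffset as a hypothetical helper and valid UTF-8 input assumed:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: returns the byte offset at which the code point with
// the given index starts, or s.size() if the string has fewer code points.
// Assumes s is valid UTF-8; continuation bytes have the form 10xxxxxx.
std::size_t Utf8ByteOffset(const std::string& s, std::size_t codePointIndex)
{
    std::size_t seen = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const bool isContinuation =
            (static_cast<unsigned char>(s[i]) & 0xC0) == 0x80;
        if (!isContinuation) {
            if (seen == codePointIndex)
                return i;
            ++seen;
        }
    }
    return s.size();
}

// Usage example: replacing a 6-character word that occupies 7 bytes.
void ExampleReplaceWord()
{
    const std::string text = "doesn\xC2\xBE";      // "doesn" followed by U+00BE

    std::string wrong = text;
    wrong.replace(0, 6, "doesn't");                // 6 used as a byte count
    assert(wrong == "doesn't\xBE");                // stray second byte remains

    std::string right = text;
    right.replace(0, Utf8ByteOffset(text, 6), "doesn't");
    assert(right == "doesn't");
}
```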
1) is also a C::B issue.. see: http://forums.codeblocks.org/index.php/topic,20195.msg139323.html#msg139323
Hunspell has means to support such paths.. C::B is simply not using them.
One could argue that they could have supported wchar_t* directly.. though their way is a bit less platform dependent
So something like this in HunspellInterface.cpp:61-63:
wxCharBuffer affixFileCharBuffer = ConvertToUnicode(_T("\\\\?\\") + strAffixFile);
wxCharBuffer dictionaryFileCharBuffer = ConvertToUnicode(_T("\\\\?\\") + strDictionaryFile);
m_pHunspell = new Hunspell(affixFileCharBuffer, dictionaryFileCharBuffer);
would work on Windows (this is what I'm using locally and, as far as I can tell, it seems to work).