Spellchecker Issues

raynebc:
One insight I can offer is that the traditional C file I/O functions like fopen (at least on Windows with MinGW) tend not to support file paths containing Unicode or extended ASCII characters. It's been a huge thorn in my side for some time now. Third-party I/O functions (like the ones in the Allegro game library) can open such files with absolutely no problem. Non-cross-platform implementations like the ones in Visual Studio probably support such file paths too, because I've never run into any Windows-specific application with that limitation.
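
A minimal sketch of one common workaround, assuming the path is available as UTF-8: on Windows the path is converted to UTF-16 and passed to _wfopen, elsewhere plain fopen is used. The helper name OpenUtf8Path is hypothetical, and the Windows branch assumes a toolchain that exposes _wfopen.

```cpp
#include <cstdio>
#include <string>
#ifdef _WIN32
#include <windows.h>
#include <wchar.h>
#endif

// Hypothetical helper (not from Code::Blocks): open a file whose path is
// given as a UTF-8 encoded string.
std::FILE* OpenUtf8Path(const std::string& utf8Path, const char* mode)
{
#ifdef _WIN32
    // Convert a UTF-8 string to UTF-16 so the wide-character API can be used.
    auto widen = [](const std::string& s) -> std::wstring {
        if (s.empty())
            return std::wstring();
        const int len = MultiByteToWideChar(CP_UTF8, 0, s.data(),
                                            static_cast<int>(s.size()), nullptr, 0);
        std::wstring w(static_cast<std::size_t>(len), L'\0');
        MultiByteToWideChar(CP_UTF8, 0, s.data(),
                            static_cast<int>(s.size()), &w[0], len);
        return w;
    };
    return _wfopen(widen(utf8Path).c_str(), widen(mode).c_str());
#else
    // On most other platforms fopen takes the bytes of the path as they are.
    return std::fopen(utf8Path.c_str(), mode);
#endif
}
```

Called as OpenUtf8Path("C:\\Users\\Ren\xC3\xA9\\file.txt", "r"), this avoids the dependency on the system code page that plain fopen has on Windows.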

White-Tiger:
well.. the way Code::Blocks opens files seems to be fine... not sure if Code::Blocks uses Unicode / wchar_t / TCHAR on Windows, but the thing is that SpellChecker finds and successfully opens the dictionaries... otherwise this shouldn't work:

--- Code: ---SpellChecker: Thesaurus files 'C:\Users\René\AppData\Roaming\codeblocks\SpellChecker\th_en_GB.idx' not found!
SpellChecker: Loading 'C:\Users\René\AppData\Roaming\codeblocks\SpellChecker\th_en_US.idx' instead...
--- End code ---

So parts of SpellChecker seem to work, while others don't.

edit: My bet:
HunspellInterface.cpp:61-62 should both prefix the path with "\\?\" so that Hunspell can handle UTF-8 paths on Windows (Windows only).
see:
--- Quote from: hunspell/hunspell.hxx --- /* Hunspell(aff, dic) - constructor of Hunspell class
* input: path of affix file and dictionary file
*
* In WIN32 environment, use UTF-8 encoded paths started with the long path
* prefix \\\\?\\ to handle system-independent character encoding and very
* long path names (without the long path prefix Hunspell will use fopen()
* with system-dependent character encoding instead of _wfopen()).
*/
--- End quote ---

White-Tiger:

--- Quote from: Alpha on May 14, 2015, 04:22:37 am ---If making a guess, the issue might be here:

--- Code: (cpp) ---bool SpellCheckHelper::IsWhiteSpace(const wxChar &ch)
{
    return wxIsspace(ch) || wxIspunct(ch) || wxIsdigit(ch);
}
--- End code ---

--- End quote ---
This was actually the root of a problem I encountered after fixing my file path issue above.
I had to change the "wxIspunct(ch)" part into "(wxIspunct(ch) && ch!='\'')" because words such as "doesn't" also showed up as misspelled.
I suggest unifying the source code and using something like what is seen in HunspellInterface.cpp:130 (which uses a list of known "non-word" chars):

--- Code: (cpp) --- wxString strDelimiters = _T(" \t\r\n.,?!@#$%^&*()-=_+[]{}\\|;:\"<>/~0123456789");
wxStringTokenizer tkz(strText, strDelimiters);
--- End code ---
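
A minimal sketch of what a shared delimiter check could look like, using standard C++ in place of the wx types so it stands alone. The names kDelimiters, IsWordDelimiter and SplitWords are hypothetical; the delimiter list is the one quoted above, which does not contain the apostrophe.

```cpp
#include <string>
#include <vector>

// Delimiter list copied from the snippet above; note that ' is not in it.
static const std::wstring kDelimiters =
    L" \t\r\n.,?!@#$%^&*()-=_+[]{}\\|;:\"<>/~0123456789";

// Hypothetical shared predicate that both code paths could use.
bool IsWordDelimiter(wchar_t ch)
{
    return kDelimiters.find(ch) != std::wstring::npos;
}

// Split text into words, treating every listed delimiter as a separator.
std::vector<std::wstring> SplitWords(const std::wstring& text)
{
    std::vector<std::wstring> words;
    std::wstring current;
    for (wchar_t ch : text)
    {
        if (IsWordDelimiter(ch))
        {
            if (!current.empty())
            {
                words.push_back(current);
                current.clear();
            }
        }
        else
        {
            current += ch;
        }
    }
    if (!current.empty())
        words.push_back(current);
    return words;
}
```

With a single predicate like this, "doesn't" stays one word in both the helper and the tokenizer instead of being split at the apostrophe in one of them.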

I've further noticed that SpellChecker doesn't seem to handle UTF-8 at all. At least when I try to correct the word "doesn¾" and use the suggested "doesn't", I end up with "doesn'txBE".
The menu item also only showed "doesn" without any visible char after it (so only the first half of the UTF-8 char).
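
One way such a leftover byte can arise, shown as a sketch under the assumption (not confirmed from the plugin source) that a character count is used where a byte length is expected. In UTF-8, "¾" is the two bytes 0xC2 0xBE, so "doesn¾" has 6 characters but 7 bytes. The helper names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Count code points in a UTF-8 string by skipping continuation bytes,
// which all have the bit pattern 10xxxxxx.
std::size_t Utf8CodePoints(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char c : s)
        if ((c & 0xC0) != 0x80)
            ++count;
    return count;
}

// Replace a range of a UTF-8 buffer; start and length are measured in bytes.
std::string ReplaceBytes(std::string buffer, std::size_t start,
                         std::size_t lengthInBytes, const std::string& replacement)
{
    buffer.replace(start, lengthInBytes, replacement);
    return buffer;
}
```

Passing Utf8CodePoints(word) as the length replaces only the first 6 bytes of "doesn\xC2\xBE" and leaves the trailing 0xBE behind, while passing word.size() replaces the whole word.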

oBFusCATed:
Hm, I've wondered why "doesn't" is detected as misspelled.
Can you post a patch with your second suggestion?

stahta01:

--- Quote from: oBFusCATed on August 06, 2015, 08:56:05 pm ---Hm, I've wondered why "doesn't" is detected as misspelled.
Can you post a patch with your second suggestion?

--- End quote ---

Maybe the wrong single quote is used?

Tim S.
