Topic: Spellchecker Issues

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13413
    • Travis build status
Re: Spellchecker Issues
« Reply #15 on: April 14, 2015, 02:16:52 am »
Can someone try the ru-ru dictionary that comes with LibreOffice to spell-check some of the files in the attached project on Windows?

@Khram: It will be easier if you post the dictionary files yourself, so that others can use them to debug the issue.
(most of the time I ignore long posts)
[strangers don't send me private messages, I'll ignore them; post a topic in the forum, but first read the rules!]

Offline MortenMacFly

  • Administrator
  • Lives here!
  • *****
  • Posts: 9694
Re: Spellchecker Issues
« Reply #16 on: April 14, 2015, 07:59:27 am »
Quote from: oBFusCATed
Can someone try the ru-ru dictionary that comes with LibreOffice to spell-check some of the files in the attached project on Windows?
Well, I picked the first ru_RU dictionary I found, and the files are not spell-checked correctly. Maybe I picked the wrong one?

@Khram: What dictionary do you use exactly?
Compiler logging: Settings->Compiler & Debugger->tab "Other"->Compiler logging="Full command line"
C::B Manual: https://www.codeblocks.org/docs/main_codeblocks_en.html
C::B FAQ: https://wiki.codeblocks.org/index.php?title=FAQ

Offline Alpha

  • Developer
  • Lives here!
  • *****
  • Posts: 1513
Re: Spellchecker Issues
« Reply #17 on: May 14, 2015, 04:22:37 am »
If making a guess, the issue might be here:
Code: cpp
bool SpellCheckHelper::IsWhiteSpace(const wxChar &ch)
{
    return wxIsspace(ch) || wxIspunct(ch) || wxIsdigit(ch);
}

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13413
    • Travis build status
Re: Spellchecker Issues
« Reply #18 on: May 14, 2015, 09:29:00 am »
Quote
In the new nightly build (10253), SpellChecker is not working.
Of course it is not working - no one has fixed it, because they can't reproduce it.

Please post a source file and a dictionary file that should be used to reproduce the problem.
Also (probably) post a screenshot with your regional settings.

Offline Alatar

  • Multiple posting newcomer
  • *
  • Posts: 60
Re: Spellchecker Issues
« Reply #19 on: May 26, 2015, 11:35:10 am »
Please find the dictionary on Yandex.Disk: https://yadi.sk/d/glyHPzRKgsZkQ
A test file and a screenshot of the wrong behaviour are attached.

Here are the C::B version string and locale settings:

Code
Code::Blocks svn build  rev 10309 May 25 2015, 10:02:04 - wx2.8.12 (Linux, unicode) - 64 bit

alatar@al_work:~% locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=

alatar@al_work:~% uname -a
Linux al_work 3.17.7-gentoo #1 SMP PREEMPT Mon Mar 30 18:24:07 MSK 2015 x86_64 Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz GenuineIntel GNU/Linux

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13413
    • Travis build status
Re: Spellchecker Issues
« Reply #20 on: May 27, 2015, 12:58:46 am »
@Alatar:
I've tried the hunspell binary, and it cannot correctly spell-check the Russian part of the file using your dictionary.
I tried something like this:
Code
# first copy the dictionary's .aff and .dic files to /usr/share/hunspell
$ hunspell -d Russian-English -i utf-8 /tmp/spellcheck_check.txt

I guess this is the problem:
Code
error: unknown encoding Windows-1251: using iso88591 as fallback

Please keep in mind that hunspell uses iconv to do the conversions.
If you can reproduce the problem with hunspell in a console, then you should talk to either the hunspell devs or the vendor of your dictionary.

I'm running this test on Gentoo Linux.
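
A Hunspell affix (.aff) file declares its character encoding on a "SET" line, and the fallback message above suggests that this dictionary declares "Windows-1251" under a name the local conversion library may not recognise. As a minimal sketch in standard C++ (ReadAffEncoding is a hypothetical helper name, not part of Hunspell), the declared encoding can be read like this before testing a dictionary:

```cpp
#include <istream>
#include <sstream>
#include <string>

// Returns the encoding named by the first "SET <encoding>" line of a
// Hunspell affix (.aff) stream, or an empty string if no such line exists.
// ReadAffEncoding is a hypothetical helper name, not a Hunspell function.
std::string ReadAffEncoding(std::istream& aff)
{
    std::string line;
    while (std::getline(aff, line))
    {
        std::istringstream fields(line);
        std::string keyword, encoding;
        // Whitespace-delimited extraction also drops a trailing '\r'
        // left over from Windows line endings.
        if (fields >> keyword >> encoding && keyword == "SET")
            return encoding;
    }
    return std::string();
}
```

If the returned name is one that the installed iconv does not list, that would match the fallback to ISO-8859-1 reported above; this is an assumption to check, not a confirmed diagnosis.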

Offline White-Tiger

  • Multiple posting newcomer
  • *
  • Posts: 83
Re: Spellchecker Issues
« Reply #21 on: July 23, 2015, 04:53:58 pm »
I also have problems with SpellChecker in C::B r10341.
Basically it's not working at all; the only thing that works is the "user dictionary".

I'm not even using any kind of weird language.. I only need the English spell checking to work as that's the main language used by developers.
Not sure what's wrong here though.. I don't see any errors in the CB consoles (only when I delete the th_* files as they can't be found, or if I switch to the GB dictionary because it's then loading the US one)

Not only does it highlight everything that is not in the custom dictionary, but Edit->Spelling... doesn't provide suggestions or the like either.
The source files I've checked with aren't even UTF-8 yet, they are still plain ASCII without special chars in them

Here are the dicts I've tried to use on my Windows machine with dictionary path set to %AppData%\codeblocks\SpellChecker : https://db.tt/pSVUEisr
Windoze 8.1 x86_64 16GiB RAM, wxWidgets-2.8x (latest,trunk), MinGW-builds (latest, posix-threads)
Code::Blocks (x86 , latest , selection length patch , build option fixes/additions , toggle comments)

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13413
    • Travis build status
Re: Spellchecker Issues
« Reply #22 on: July 23, 2015, 06:38:15 pm »
@White-Tiger:
I just tried them and they work as expected in both r10333 and r10358 on Linux.
Do you have any other hunspell-based apps in which you can check whether these dictionaries work correctly?

Also is there a nightly that just works with this dictionary?

Offline White-Tiger

  • Multiple posting newcomer
  • *
  • Posts: 83
Re: Spellchecker Issues
« Reply #23 on: July 29, 2015, 06:52:29 pm »
well... yes those dictionaries work in Miranda NG (IM)
I've also just tried the last stable release of Code::Blocks, and that one didn't work either... going to boot up my XP VM now and try it there

edit:
I tried my XP VM with the r10341 nightly and SpellChecker seemed to work at first, which led me to the reason. The problem lies in the path: "%AppData%\codeblocks\SpellChecker" by itself is fully functional, but my user name includes a special character: "é"
So as soon as there's any non-ASCII char in the path, it fails to work.
Normally I wouldn't choose such a Windows user name, but Windows simply used my real name the moment I signed in with my Microsoft account. So far I haven't had a program that couldn't handle it (and Code::Blocks works in most cases).
« Last Edit: July 29, 2015, 07:19:35 pm by White-Tiger »

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13413
    • Travis build status
Re: Spellchecker Issues
« Reply #24 on: July 29, 2015, 08:48:01 pm »
Interesting. I guess someone running Windows will have to debug this.

Offline raynebc

  • Almost regular
  • **
  • Posts: 217
Re: Spellchecker Issues
« Reply #25 on: July 30, 2015, 07:45:33 pm »
One insight I can offer is that the traditional file I/O C functions like fopen (at least in Windows with MinGW) tend to not support file paths containing Unicode or extended ASCII characters.  It's been a huge thorn in my side for some time now.  Third party I/O functions (like the ones in the Allegro game library) can open such files with absolutely no problem.  Non cross-platform implementations like the ones in Visual Studio also probably support such file paths because I've never run into any Windows-specific application with that limitation.

Offline White-Tiger

  • Multiple posting newcomer
  • *
  • Posts: 83
Re: Spellchecker Issues
« Reply #26 on: August 01, 2015, 01:21:00 pm »
well.. the way Code::Blocks opens files seems to be fine... not sure if Code::Blocks uses Unicode / wchar_t / TCHAR on Windows, but the thing is that SpellChecker finds and successfully opens the dictionaries... otherwise this shouldn't work:
Code
SpellChecker: Thesaurus files 'C:\Users\René\AppData\Roaming\codeblocks\SpellChecker\th_en_GB.idx' not found!
SpellChecker: Loading 'C:\Users\René\AppData\Roaming\codeblocks\SpellChecker\th_en_US.idx' instead...

So parts of SpellChecker seem to work, while others don't.

edit: My bet:
HunspellInterface.cpp:61-62 should both prefix the path with "\\?\" so that Hunspell handles UTF-8 paths on Windows (Windows only).
see:
Quote from: hunspell/hunspell.hxx
  /* Hunspell(aff, dic) - constructor of Hunspell class
   * input: path of affix file and dictionary file
   *
   * In WIN32 environment, use UTF-8 encoded paths started with the long path
   * prefix \\\\?\\ to handle system-independent character encoding and very
   * long path names (without the long path prefix Hunspell will use fopen()
   * with system-dependent character encoding instead of _wfopen()).
   */
« Last Edit: August 01, 2015, 01:53:20 pm by White-Tiger »

Offline White-Tiger

  • Multiple posting newcomer
  • *
  • Posts: 83
Re: Spellchecker Issues
« Reply #27 on: August 06, 2015, 03:48:57 pm »
Quote from: Alpha
If making a guess, the issue might be here:
Code: cpp
bool SpellCheckHelper::IsWhiteSpace(const wxChar &ch)
{
    return wxIsspace(ch) || wxIspunct(ch) || wxIsdigit(ch);
}
This was actually the root of a problem I've encountered after fixing my file path issue above.
I had to change the "wxIspunct(ch)" part into "(wxIspunct(ch) && ch!='\'')" because words such as "doesn't" also showed up as misspelled.
I suggest unifying the source code and using something like what is seen in HunspellInterface.cpp:130 (which uses a list of known "non-word" chars):
Code: cpp
  wxString strDelimiters = _T(" \t\r\n.,?!@#$%^&*()-=_+[]{}\\|;:\"<>/~0123456789");
  wxStringTokenizer tkz(strText, strDelimiters);
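
A rough sketch of what a unified, delimiter-list-based check could look like, written in standard C++ rather than wxWidgets so that it stands alone. IsWordDelimiter and SplitWords are hypothetical names; the delimiter list is the one quoted from HunspellInterface.cpp, which does not contain the apostrophe.

```cpp
#include <string>
#include <vector>

// Characters that end a word. The list mirrors the one quoted from
// HunspellInterface.cpp; the apostrophe is deliberately absent.
static const std::wstring kDelimiters =
    L" \t\r\n.,?!@#$%^&*()-=_+[]{}\\|;:\"<>/~0123456789";

// Hypothetical replacement for the wxIsspace/wxIspunct/wxIsdigit test.
bool IsWordDelimiter(wchar_t ch)
{
    return kDelimiters.find(ch) != std::wstring::npos;
}

// Splits text into words at delimiter characters.
std::vector<std::wstring> SplitWords(const std::wstring& text)
{
    std::vector<std::wstring> words;
    std::wstring current;
    for (wchar_t ch : text)
    {
        if (IsWordDelimiter(ch))
        {
            if (!current.empty())
                words.push_back(current);
            current.clear();
        }
        else
        {
            current += ch;
        }
    }
    if (!current.empty())
        words.push_back(current);
    return words;
}
```

With this list, "doesn't" stays a single word, while digits and the listed punctuation still split words.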

I've also noticed that SpellChecker doesn't seem to handle UTF-8 at all. At least, when I correct the word "doesn¾" with the suggested "doesn't", I end up with "doesn't\xBE".
The menu item also showed only "doesn", without any visible char after it (so only the first half of the UTF-8 char).
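
A leftover byte like this is consistent with a character count being used where a byte count is needed: "¾" (U+00BE) is the two bytes C2 BE in UTF-8, so "doesn¾" is 6 characters but 7 bytes. The sketch below only demonstrates that arithmetic in standard C++. Whether SpellChecker actually makes this mistake is an assumption, and CountCodePoints and ReplaceWord are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Counts code points in a UTF-8 string by skipping continuation bytes
// (bytes of the form 10xxxxxx).
std::size_t CountCodePoints(const std::string& utf8)
{
    std::size_t count = 0;
    for (unsigned char byte : utf8)
    {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

// Replaces the word at the start of a UTF-8 byte buffer.
// `length` is the number of BYTES to replace, not characters.
std::string ReplaceWord(std::string buffer, std::size_t length,
                        const std::string& replacement)
{
    return buffer.replace(0, length, replacement);
}
```

Passing the character count (6) as the length replaces only "doesn" plus the first byte of "¾" and leaves the byte BE behind; passing the byte count (7) replaces the whole word.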

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13413
    • Travis build status
Re: Spellchecker Issues
« Reply #28 on: August 06, 2015, 08:56:05 pm »
Hm, I've wondered why "doesn't" is detected as misspelled.
Can you post a patch with your second suggestion?

Offline stahta01

  • Lives here!
  • ****
  • Posts: 7582
    • My Best Post
Re: Spellchecker Issues
« Reply #29 on: August 06, 2015, 10:53:00 pm »
Quote from: oBFusCATed
Hm, I've wondered why "doesn't" is detected as misspelled.
Can you post a patch with your second suggestion?

Maybe the wrong single quote is used?

Tim S.
C Programmer working to learn more about C++ and Git.
On Windows 7 64 bit and Windows 10 64 bit.
--
When in doubt, read the CB WiKi FAQ. http://wiki.codeblocks.org