Code::Blocks Forums
Developer forums (C::B DEVELOPMENT STRICTLY!) => Development => Topic started by: Jenna on March 01, 2009, 01:16:30 pm
-
I'm currently experimenting with Mozilla's charset detection for C::B (see this thread: http://forums.codeblocks.org/index.php/topic,10159.msg70493.html#msg70493).
I'm looking for files whose encodings are not correctly recognized by C::B's encoding detection.
I mean any files that can only be opened after conversion to UTF-8, by forcing a special fallback encoding, or by bypassing C::B's autodetection.
I'm especially interested in files that contain Chinese, Japanese, Cyrillic, Eastern European or Hebrew characters.
It would be nice to have both a native and a UTF-8 version, to check whether the characters are detected and displayed correctly.
Please don't attach such files to your posts; send them by mail to "chardet at jenslody dot de" instead.
That way we avoid unnecessary server load.
I will put them on my server so others can test them, if they want.
They will be available at http://chardet.jenslody.de/ (empty at the moment).
If you don't want the files to be published, please add a short note to the mail.
I'm interested in single files and, of course, also in complete (short example) projects/workspaces.
-
OK, I can report some files in the Code::Blocks source folder:
src/plugins/codecompletion/parser/tokenizer.cpp
src/sdk/wxscintilla/src/scintilla/src/LexMatlab.cxx
src/sdk/wxscintilla/src/scintilla/src/LexErlang.cxx
src/sdk/wxscintilla/src/scintilla/src/Editor.cxx
src/sdk/resources/lexers/lexer_css.xml
src/plugins/compilergcc/compilergcc.cpp
Thank you!
-
The last two files are identified correctly both by plain trunk and by the Mozilla detection (one as UTF-8 with BOM and the other as UTF-8 without BOM).
The others only open on trunk with the system fallback encoding, and the Mozilla detector identifies them as CP1252 (Windows-1252).
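The two outcomes described here (UTF-8 with or without BOM recognized, everything else ending up in an 8-bit code page) can be illustrated with a minimal C++ sketch. This is not C::B's detection code and not Mozilla's statistical detector; `guessEncoding`, `isValidUtf8` and the `"CP1252"` default are hypothetical names chosen for illustration. The sketch checks for the UTF-8 BOM (`EF BB BF`), then validates the bytes as UTF-8, and otherwise returns a fallback encoding.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from C::B): true if buf[start..] is well-formed UTF-8.
bool isValidUtf8(const std::vector<unsigned char>& buf, std::size_t start) {
    std::size_t i = start;
    while (i < buf.size()) {
        unsigned char c = buf[i];
        std::size_t extra;  // number of continuation bytes expected
        unsigned int cp;    // decoded code point, used to reject overlongs
        if (c < 0x80) { ++i; continue; }                  // plain ASCII
        else if ((c & 0xE0) == 0xC0) { extra = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
        else return false;  // stray continuation byte or invalid lead byte
        if (i + extra >= buf.size()) return false;        // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80) return false;        // must be 10xxxxxx
            cp = (cp << 6) | (cc & 0x3F);
        }
        // Reject overlong forms, UTF-16 surrogates and values above U+10FFFF.
        static const unsigned int minCp[] = {0, 0x80, 0x800, 0x10000};
        if (cp < minCp[extra] || (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return false;
        i += extra + 1;
    }
    return true;
}

// Hypothetical three-step guess: UTF-8 with BOM, UTF-8 without BOM,
// otherwise a legacy fallback encoding supplied by the caller.
std::string guessEncoding(const std::vector<unsigned char>& buf,
                          const std::string& fallback = "CP1252") {
    if (buf.size() >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
        return "UTF-8 (BOM)";
    if (isValidUtf8(buf, 0))
        return "UTF-8";
    return fallback;
}
```

For example, `guessEncoding({0xC3, 0xA0})` (an `à` in UTF-8) yields `"UTF-8"`, while `guessEncoding({0xE0, ' '})` (an `à` in CP1252 followed by a space) yields the fallback, because `0xE0` starts a three-byte UTF-8 sequence that never completes. Pure ASCII also passes, so it is reported as UTF-8. A structural check like this cannot tell CP1252 from cp936 or any other legacy encoding; that is the gap a statistical detector such as Mozilla's is meant to fill.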
-
ok :D
These files come from this bug report, posted a week ago:
http://forums.codeblocks.org/index.php/topic,10130.msg70316.html#msg70316
-
I sent one.
-
Thanks, nanyu.
With the Mozilla detection, C::B detects the non-UTF-8 file as Chinese Simplified (cp936). The encoding detector actually reports GB18030, but I map it internally to cp936 (Windows-936), because wxWidgets only knows that one.
The trunk version only opens the UTF-8 file correctly on my system (detected as UTF-8 with BOM).
In my test version all characters are identical in both files, but some seem to be missing: lines 18 to 22 show a square as the first character.
That's most likely a limitation of the character set on my system, because Iceweasel (Debian's name for Firefox) shows the same.
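The renaming step described in this post (the detector reports GB18030, which is passed on as cp936) could look like the following C++ sketch. `toSupportedCharset` and `upperAscii` are hypothetical names; only the GB18030 to CP936 entry comes from this post, and any other name is passed through unchanged.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical helper: upper-case an ASCII charset name so lookups ignore case.
std::string upperAscii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}

// Hypothetical mapping from names reported by the detector to names the
// editor's conversion layer understands. Only GB18030 -> CP936 is taken
// from the thread; unknown names are returned as they were detected.
std::string toSupportedCharset(const std::string& detected) {
    static const std::map<std::string, std::string> aliases = {
        {"GB18030", "CP936"},
    };
    auto it = aliases.find(upperAscii(detected));
    return it != aliases.end() ? it->second : detected;
}
```

One caveat with this substitution: GB18030 is a superset of cp936 (GBK), and characters encoded with its four-byte sequences have no cp936 equivalent, so files that use them would not round-trip through cp936.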
-
Quote: ... but some seem to miss: line 18 to 22 show a square as first character....
:D Don't worry about it! Those four squares really ARE meant to be square characters.
-
:D Yes, maybe Jens' system can't display Chinese characters.
-
My Linux system at home can display them, but my Windows system can't (even after installing support for Chinese characters in XP).
Maybe I'm missing something.
<EDIT>
After installing support for East Asian languages, it works in C::B. Windows seems to need more than just new fonts to display them correctly.
</EDIT>
But I can't read Chinese, so I didn't know whether the squares were intended or just replacement glyphs.
(My father was able to read and speak a little Chinese, but he died 15 months ago, so he cannot help me.)
-
Those squares are intended, not replacements. Now you see?
-
I've just sent a file with characters used in Italian (à, è, é, ì, ò and ù).
It doesn't work in SVN 5696 and 5716.
It works in 5678 and older.
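To illustrate why such characters trip up detection, assume (the post does not say) that the file was saved in an 8-bit encoding such as ISO-8859-1 or CP1252. There, `à` is the single byte `0xE0`, whereas UTF-8 encodes it as `C3 A0`. The hypothetical `latin1ToUtf8` below sketches the conversion; it treats the input as ISO-8859-1, which agrees with CP1252 for all of the vowels listed above.

```cpp
#include <string>

// Hypothetical helper: convert ISO-8859-1 text to UTF-8.
// Every Latin-1 byte equals its Unicode code point, so bytes below 0x80 are
// copied unchanged and the rest become two-byte sequences (110xxxxx 10xxxxxx).
// CP1252 assigns different characters to 0x80-0x9F; this sketch ignores that.
std::string latin1ToUtf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (char ch : in) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));    // lead byte
            out += static_cast<char>(0x80 | (c & 0x3F));  // continuation byte
        }
    }
    return out;
}
```

With this, `latin1ToUtf8("\xE0")` produces the two bytes `C3 A0`, and `"\xF9"` (`ù`) becomes `C3 B9`.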
-
Thanks, I found the cause of your problem; the answer is here: http://forums.codeblocks.org/index.php/topic,10912.msg74883/topicseen.html#msg74883