Author Topic: F() function is not thread safe?  (Read 33204 times)

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13413
    • Travis build status
Re: F() function is not thread safe?
« Reply #15 on: June 10, 2014, 10:30:10 am »
I'd prefer if you start a branch on github or show a patch before committing.
And please name the function SafeF or ThreadedF.

Also why don't you just use wxString::Format it does the same but it is a bit longer to type?
(most of the time I ignore long posts)
[strangers don't send me private messages, I'll ignore them; post a topic in the forum, but first read the rules!]

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 6025
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: F() function is not thread safe?
« Reply #16 on: June 11, 2014, 04:02:57 am »
I'd prefer if you start a branch on github or show a patch before committing.
OK, I will show a patch file here later.

Quote
And please name the function SafeF or ThreadedF.

Also why don't you just use wxString::Format it does the same but it is a bit longer to type?
Oh, yes, We can use this build-in wxWidgets functions, but I'm not sure it handle the %ls and %s replacement well:
Code
+#if wxCHECK_VERSION(2,9,0) && wxUSE_UNICODE
+// in wx >=  2.9 unicode-build (default) we need the %ls here, or the strings get
+// cut after the first character
+    wxString tmp(msg);
+    tmp.Replace(s, ls);
+    tmp = wxString::FormatV(tmp.wx_str(), arg_list)
+#else
+    wxString tmp(wxString::FormatV(msg, arg_list));
+#endif
I don't really understand why we need %ls for wx verison >2.9.0.
If we use wxString::Format, what should we use there? %ls or %s ?
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13413
    • Travis build status
Re: F() function is not thread safe?
« Reply #17 on: June 11, 2014, 09:36:54 am »
Why don't you test it?
(most of the time I ignore long posts)
[strangers don't send me private messages, I'll ignore them; post a topic in the forum, but first read the rules!]

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 6025
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: F() function is not thread safe?
« Reply #18 on: June 11, 2014, 11:43:27 am »
Why don't you test it?
Good idea. Before I did some test, I firstly search the Google, and I hit one discussion:
wxChar wxStringCharType error, this link to our forum thread Compiling C::B wxChar wxStringCharType error.

The core reason is: Under Linux, the wx 2.9.0 to 2.9.3 by default use UTF8 as its wxString internal encoding, which is actually char for wxChar, but wx 2.9.4 and wx 3.0.x later switched to wchar_t as its default wxString encoding, so actually wxChar is wchar_t.

See this reply: Re: wxChar wxStringCharType error
Quote
Form the docs:

    const wxStringCharType* wxString::wx_str() const
    Explicit conversion to C string in the internal representation (either wchar_t* or UTF-8-encoded char*, depending on the build).


The wxString internal representation in Unix has been for a long time a char sequence, encoded in UTF8 (as GTK does).
So, you may try to recompile with wxUSE_UNICODE_WCHAR==1.

Anyhow, I know this internal representation has been changed (wx294 or wx-svn, I don't remember) to use wchar_t.
I would try with wx-svn.

Under Windows, there is no issue about this, because all the wx unicode build default to wchar_t.

Now, I think C::B won't support build against wx2.9.0 to 2.9.3, so the preprocessor hack should be totally removed.


EDIT, by the way, I can't find official document about %s and %ls, but it looks like I get the answer from this post: Unicode and printf %s formatter - Google Groups, %s is used for a wide string, and %ls is used for wxString.
« Last Edit: June 11, 2014, 03:32:09 pm by ollydbg »
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13413
    • Travis build status
Re: F() function is not thread safe?
« Reply #19 on: June 11, 2014, 09:13:01 pm »
Then probably place a upper limit to the version check macro or just remove it.
(most of the time I ignore long posts)
[strangers don't send me private messages, I'll ignore them; post a topic in the forum, but first read the rules!]

Offline BlueHazzard

  • Developer
  • Lives here!
  • *****
  • Posts: 3353
Re: F() function is not thread safe?
« Reply #20 on: June 11, 2014, 10:09:21 pm »
[OT] I think wxWidgets has made the badest decision they could have made. Switching to wchar_t is the box of the Pandora, because you can't say  it it is 16bit or 32bit or even 64bit. The only right decision would be to have used UTF8 with char also on windows, and add some translation api... [\OT]

I recommend you to not use the buffer directly but always strings and the append or stream functions to combine strings and other values. This can maybe slow down the whole process, but it is a clean and save way...

greetings

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 6025
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: F() function is not thread safe?
« Reply #21 on: June 12, 2014, 04:57:35 am »
Then probably place a upper limit to the version check macro or just remove it.
I prefer remove it.
But I see this already happens years ago, but later reverted.

See:
Code
Revision: a3cea10c6e1ba93287aef84a7dc3c34ab879e8da
Author: jenslody <jenslody@2a5c6006-c6dd-42ca-98ab-0921f2732cef>
Date: 2013-6-1 1:14:14
Message:
* Revert last commit, because logging is still broken in wx2.9 unicode (at least in 64bit)

git-svn-id: http://svn.code.sf.net/p/codeblocks/code/trunk@9125 2a5c6006-c6dd-42ca-98ab-0921f2732cef
----
Modified: src/include/logmanager.h
Modified: src/plugins/contrib/envvars/envvars_common.cpp

Revision: 67a9f7cee0ca671cc7baafe78436f319b89395be
Author: biplab <biplab@2a5c6006-c6dd-42ca-98ab-0921f2732cef>
Date: 2013-5-31 22:20:59
Message:
* Reverted: Rev 8259 as it may not be needed anymore. However if original bug persists then revert this commit.

git-svn-id: http://svn.code.sf.net/p/codeblocks/code/trunk@9124 2a5c6006-c6dd-42ca-98ab-0921f2732cef
----
Modified: src/include/logmanager.h
Modified: src/plugins/contrib/envvars/envvars_common.cpp



@Jens or other devs, can you confirm change in rev9124  is OK with 64 bit Linux system (with wx 3.0)? Thanks.


[OT] I think wxWidgets has made the badest decision they could have made. Switching to wchar_t is the box of the Pandora, because you can't say  it it is 16bit or 32bit or even 64bit. The only right decision would be to have used UTF8 with char also on windows, and add some translation api... [\OT]
I don't have an option, because the different OS have too many string encodings. Maybe, wx guys just want wxString in wx 3.0 be consistent with 2.8.x.

Quote
I recommend you to not use the buffer directly but always strings and the append or stream functions to combine strings and other values. This can maybe slow down the whole process, but it is a clean and save way...
greetings
Not only the F() function, but there are many places in CC use directly wxChar*. It is OK if wxString use fixed length code encoding, in this case, wxChar == wchar_t.
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline BlueHazzard

  • Developer
  • Lives here!
  • *****
  • Posts: 3353
Re: F() function is not thread safe?
« Reply #22 on: June 12, 2014, 07:41:19 am »
And exactly this is the problem. Wchar_t isn't a fixed length encoding (at least under windows). For fixed length you need utf-32, but windows uses utf-16 and wchar_t with 16 bit. So you can`t use one buffer entry as one character... This is a common error and miss interpration. I can't add some links because i'm writhing this on my old phone, but you have only to google a bit (something like utf16 vs utf8)

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 6025
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: F() function is not thread safe?
« Reply #23 on: June 12, 2014, 08:08:57 am »
And exactly this is the problem. Wchar_t isn't a fixed length encoding (at least under windows). For fixed length you need utf-32, but windows uses utf-16 and wchar_t with 16 bit. So you can`t use one buffer entry as one character... This is a common error and miss interpration. I can't add some links because i'm writhing this on my old phone, but you have only to google a bit (something like utf16 vs utf8)
Hi, thanks, I do understand the truth that utf16 is not the same thing as one buffer entry as one character. Under Windows, say wchar_t is 16 bit, and one buffer entry can hold at most 65536 different characters. But in-fact we have more than 65536 characters, so some special case, a character occupies two wchar_t entries. See: Surrogates and Supplementary Characters (Windows) and c++ - Size of wchar_t* for surrogate pair (Unicode character out of BMP) on Windows - Stack Overflow

I remember that wxwidgets just drop those special case, which means they believe we seldom have surrogate pairs in the wxStrings. See: wxWidgets: wxString Overview
Quote
For simplicity of implementation, wxString uses per code unit indexing instead of per code point indexing when using UTF-16, i.e. in the default wxUSE_UNICODE_WCHAR==1 build under Windows and doesn't know anything about surrogate pairs. In other words it always considers code points to be composed by 1 code unit, while this is really true only for characters in the BMP (Basic Multilingual Plane), as explained in more details in the Unicode Representations and Terminology section. Thus when iterating over a UTF-16 string stored in a wxString under Windows, the user code has to take care of surrogate pairs himself. (Note however that Windows itself has built-in support for surrogate pairs in UTF-16, such as for drawing strings on screen.)

If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.