Today, I carefully read this article:
Optimizations That Aren't (In a Multithreaded World)This is a very interesting article about how to use a shared object(e.g. string) in muti-thread cases.
After reading this article, I just try to see what has done in wxWidgets' wxString, I just open the file:
wxWidgets-2.8.12\include\wx\string.h
I see such code:
struct WXDLLIMPEXP_BASE wxStringData
{
int nRefs; // reference count
size_t nDataLength, // actual string length
nAllocLength; // allocated memory size
// mimics declaration 'wxChar data[nAllocLength]'
wxChar* data() const { return (wxChar*)(this + 1); }
// empty string has a special ref count so it's never deleted
bool IsEmpty() const { return (nRefs == -1); }
bool IsShared() const { return (nRefs > 1); }
// lock/unlock
void Lock() { if ( !IsEmpty() ) nRefs++; }
// VC++ will refuse to inline Unlock but profiling shows that it is wrong
#if defined(__VISUALC__) && (__VISUALC__ >= 1200)
__forceinline
#endif
// VC++ free must take place in same DLL as allocation when using non dll
// run-time library (e.g. Multithreaded instead of Multithreaded DLL)
#if defined(__VISUALC__) && defined(_MT) && !defined(_DLL)
void Unlock() { if ( !IsEmpty() && --nRefs == 0) Free(); }
// we must not inline deallocation since allocation is not inlined
void Free();
#else
void Unlock() { if ( !IsEmpty() && --nRefs == 0) free(this); }
#endif
// if we had taken control over string memory (GetWriteBuf), it's
// intentionally put in invalid state
void Validate(bool b) { nRefs = (b ? 1 : 0); }
bool IsValid() const { return (nRefs != 0); }
};
You see, the wxString use reference counting, but look at its function Lock(), and Unlock(), I see that it just operation on the normal member variable: int nRefs, no special protection is used.
Compared to what the article said:
Using a COW String: Multi-Threaded
Now let's return to our friend, the Error subsystem. Here's the problematic client code again:
// thread A
String newerr = SetError( "A" );
newerr += " (set by A)";
cout << newerr << endl;
// thread B
String newerr = SetError( "B" );
newerr += " (set by B)";
cout << newerr << endl;
As before, the issue is that there has to be some serialization (using, for example, a mutex lock) in order to avoid corrupting the Error module's internal strings and produce reasonable output. But who should be responsible for doing the serialization? Consider the two levels again:
1. Do locking within String member functions. As before, there are two problems with this: It's at the wrong granularity to solve the SetError function's interleaving, and it can seriously degrade performance because the locking would be done at least once for every mutating (and possibly even nonmutating) String operation. What's different from before, however, is that this time Option 1 is necessary anyway (see below).
2. Do locking in the code that owns/manipulates a String object. Again, this is necessary, otherwise we haven't solved the SetError interleaving problem.
3. [NECESSARY] Do both. This time the story isn't so good: Option 2 is necessary, but we have to do Option 1 too because the code that owns/manipulates a COW string can't. Specifically, client code alone can never lock the right strings in the right way because it has no way of knowing which visible string objects might be sharing a representation at any given time. Look again at the thread-safe Error subsystem code from Example 2--the locking that it does is still necessary but is now insufficient, because even though we serialize all access to the internal err error string object, the client threads are taking and manipulating copies of that object, which copies might share representation.
Pay attention of the quoted text above, especially the option 3,
[NECESSARY] Do both, this means if the string class is reference counted (copy on write, COW), both the object and the operation on the object should be protected in a multiply thread case.
The article later said that the option 1 is : the reference counter should be protected by some locker. (the reference counter should be thread safe).
But compared with the wxwidgets 2.8.12's code, I see that wxString's reference counter is
NOT protected.
Now, wxWidgets 2.9.x, as I remembered, the wx guys have announced that the wxString in 2.9 or 3.x was internally use std::basic_string.
So, lets see what std::basic_string did:
I open the file:
E:\code\gcc\PCXMinGW463\lib\gcc\i686-w64-mingw32\4.6.3\include\c++\bits\basic_string.h
See:
template<typename _CharT, typename _Traits, typename _Alloc>
class basic_string
{
typedef typename _Alloc::template rebind<_CharT>::other _CharT_alloc_type;
// Types:
public:
typedef _Traits traits_type;
typedef typename _Traits::char_type value_type;
typedef _Alloc allocator_type;
typedef typename _CharT_alloc_type::size_type size_type;
typedef typename _CharT_alloc_type::difference_type difference_type;
typedef typename _CharT_alloc_type::reference reference;
typedef typename _CharT_alloc_type::const_reference const_reference;
typedef typename _CharT_alloc_type::pointer pointer;
typedef typename _CharT_alloc_type::const_pointer const_pointer;
typedef __gnu_cxx::__normal_iterator<pointer, basic_string> iterator;
typedef __gnu_cxx::__normal_iterator<const_pointer, basic_string>
const_iterator;
typedef std::reverse_iterator<const_iterator> const_reverse_iterator;
typedef std::reverse_iterator<iterator> reverse_iterator;
private:
// _Rep: string representation
// Invariants:
// 1. String really contains _M_length + 1 characters: due to 21.3.4
// must be kept null-terminated.
// 2. _M_capacity >= _M_length
// Allocated memory is always (_M_capacity + 1) * sizeof(_CharT).
// 3. _M_refcount has three states:
// -1: leaked, one reference, no ref-copies allowed, non-const.
// 0: one reference, non-const.
// n>0: n + 1 references, operations require a lock, const.
// 4. All fields==0 is an empty string, given the extra storage
// beyond-the-end for a null terminator; thus, the shared
// empty string representation needs no constructor.
struct _Rep_base
{
size_type _M_length;
size_type _M_capacity;
_Atomic_word _M_refcount;
};
struct _Rep : _Rep_base
{
// Types:
typedef typename _Alloc::template rebind<char>::other _Raw_bytes_alloc;
Hey, it use "_Atomic_word _M_refcount;", and later, I see:
void
_M_dispose(const _Alloc& __a)
{
#ifndef _GLIBCXX_FULLY_DYNAMIC_STRING
if (__builtin_expect(this != &_S_empty_rep(), false))
#endif
{
// Be race-detector-friendly. For more info see bits/c++config.
_GLIBCXX_SYNCHRONIZATION_HAPPENS_BEFORE(&this->_M_refcount);
if (__gnu_cxx::__exchange_and_add_dispatch(&this->_M_refcount,
-1) <= 0)
{
_GLIBCXX_SYNCHRONIZATION_HAPPENS_AFTER(&this->_M_refcount);
_M_destroy(__a);
}
}
} // XXX MT
see the "__gnu_cxx::__exchange_and_add_dispatch", this is the atomic operation (mostly it will be an CPU instruction in my case)
Back to the original article I read, it looks like the COW have some issue about multiply thread access, so when assigning a wxString, we should do: (if we have a shared string)
wxString new_string = mutithread_shared_string.wx_str();
then use new_string in the same thread
This way, we can let it do a deep copy of the buffer.
Any ideas?
EDIT: there are a lot of discussion in the wxString forum:
Re: Please help with GUI crash problem?As our CC are a typical multithread case, we should take care about this.