WoW64 layer (bottleneck)
It's true that the WoW64 layer adds considerable overhead, especially when compiling or CC parsing (RtlInitializeExceptionChain, anyone?), in particular because we create many more threads than necessary, and thread creation is extra expensive under WoW64. Unfortunately, this is something we will have to live with for some time.
We have two half-working platform implementations (one for Windows, one for Unix) meant to replace the wxWidgets code that handles the pipes of child processes, but turning them into something that actually works is so far wishful thinking due to lack of time.
wxWidgets simply spawns one extra thread per process to listen on the pipe handles (or was it even one thread per pipe? I don't remember...). This works, but it also stinks.
Our half-working replacement implementations handle all pipes in one thread, and use WaitForMultipleObjects and select, respectively. Since there are 4 handles to watch per process, and WaitForMultipleObjects accepts at most 64 handles (MAXIMUM_WAIT_OBJECTS) per call, this limits us to 16 processes maximum (also, the Unix implementation has a bug with return codes). Once upon a time, when only a few people had quad-core processors, 16 concurrent processes seemed "huge", but with present-day computers this doesn't look like an implementation that we might want to publish.
Basically, what would be needed is an implementation around IOCP and epoll_wait/poll, which doesn't have such a limitation, and preferably one that coalesces partial reads into single message blocks that are passed to the application after the process has exited (no use in having standard output trickle in "live" line by line as it does now, really: every line currently creates one message which goes through the wxWidgets message system, including the 4 or 5 copies that are made along the way).
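To make the idea a bit more concrete, here is a rough sketch of the "one thread, all pipes, coalesce until exit" approach. It is Python rather than the C++ it would have to be in Code::Blocks, it is Unix-only (the selectors module picks epoll or poll where available; the Windows side would need IOCP instead), and the name run_all and the 64 KiB read size are made up for illustration. It watches only stdout and stderr per process, and is in no way the actual half-working implementation mentioned above.

```python
import os
import selectors
import subprocess


def run_all(commands):
    """Run all commands concurrently, watching every pipe from one thread.

    Returns {index: (returncode, stdout_bytes, stderr_bytes)}. Partial reads
    are appended to per-process buffers and only handed back as one block
    once both pipes of a process have reached EOF and it has been reaped.
    """
    sel = selectors.DefaultSelector()  # epoll on Linux, kqueue/poll elsewhere
    procs, buffers, open_pipes = {}, {}, {}

    for idx, cmd in enumerate(commands):
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        procs[idx] = proc
        buffers[idx] = {"out": bytearray(), "err": bytearray()}
        open_pipes[idx] = 2
        sel.register(proc.stdout, selectors.EVENT_READ, (idx, "out"))
        sel.register(proc.stderr, selectors.EVENT_READ, (idx, "err"))

    results = {}
    while len(results) < len(commands):
        for key, _events in sel.select():
            idx, stream = key.data
            chunk = os.read(key.fd, 65536)  # hypothetical chunk size
            if chunk:
                # Partial read: just accumulate, no message is posted yet.
                buffers[idx][stream] += chunk
                continue
            # Empty read means EOF on this pipe.
            sel.unregister(key.fileobj)
            key.fileobj.close()
            open_pipes[idx] -= 1
            if open_pipes[idx] == 0:
                # Both pipes closed: reap the process and deliver one block.
                returncode = procs[idx].wait()
                results[idx] = (returncode,
                                bytes(buffers[idx]["out"]),
                                bytes(buffers[idx]["err"]))
    sel.close()
    return results
```

Calling run_all with a list of 20 or 30 compiler command lines works the same as with 4, since nothing here is capped at 64 handles; the application would get exactly one "process finished" message per process instead of one per output line.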
But sure enough, no time for approaching that before Christmas here, and even then I won't promise. Though maybe someone else is bored to death and wants to give it a try?
And not talking about the addressability (available memory when you have a RAM-consuming app).
Well, I don't know about the size of your projects, or anyone else's, but for me Code::Blocks very rarely uses more than 250 megabytes of committed memory, and I've never run out of address space. That said, VMMap tells me that Code::Blocks has 1.5 GiB of free address space right now.
It's true that the linker will regularly run out of address space with LTO, but LTO is not really usable except for "hello world" anyway. At least for me it doesn't ever seem to work as soon as you link more than 2 libraries or have more than 10 source files...
The only program that I've ever used which truly needs 64 bits is WorldMachine, and once you generate maps which are that big, the recalculations become so painfully slow (as in, click "build" and go to lunch, and when you come back it still takes 2 hours) that you don't want to use it at all any more anyway.
The one big advantage of 64-bit mode is that you have twice as many registers (16 general-purpose registers instead of 8). I've had some number crunching code speed up by 20 times due to that.