Developer forums (C::B DEVELOPMENT STRICTLY!) > Development
small refactoring in globals.cpp
frithjofh:
Here is the corrected patch. Search and destroy, please... :)
@obfuscated:
--- Quote ---The code runs faster because you've removed the calls to wxString::Remove.
This call is removing characters from the beginning of the string and copying the rest of the string to the front.
So the improvement is not because of better cache locality, but because there is less copying.
--- End quote ---
Agreed about removing wxString::Remove. The main gain here lies in avoiding the copying, and also in avoiding modification of the wxString in place; building the output piece by piece is faster. As to cache locality: I was just making an assumption, nothing more. I am not a professional programmer, and I did not measure cache misses.
--- Quote ---Probably the compiler has re-written the loop and has extracted the test outside of it.
I'm sure that if you remove the check from the original version it won't change its performance!
--- End quote ---
I wouldn't know, as I did not measure this particular part of the code before and after. Perhaps the compiler does this optimization on its own, perhaps not; I did not look at the generated machine code. If there is a performance gain here, I think it is dwarfed by the gain from avoiding the copying above anyway, by a long way. I find the new version easier to read, though, and easier to reason about.
--- Quote ---Why do you need to add elements and then remove them?
--- End quote ---
Because, as above, I think it makes the code a lot easier to read and reason about. And strictly speaking, the elements are not removed from the container local to the function, just copied around (and at some point in the future only moved around). The return value is an unnamed object constructed directly from the resulting range of valid entries.
--- Quote ---The return statement is doing a redundant copy.
--- End quote ---
Correct; that is what an earlier version of the patch did. I took up the hint given and changed it accordingly.
@all:
Thanks for all the input, and thanks for not ignoring my efforts altogether. Apologies for my beginner mistakes.
Please give me more bashing, so I will eventually learn and improve... ;)
oBFusCATed:
I've done some benchmarking and profiling.
orig - the current version in Code::Blocks
forum - your version, plus a fix for when find returns npos
mine - obviously, my implementation
The results printed by google's benchmark lib:
--- Code: ---> bin/release/wx_routine_bench
Success: 13 tests passed.
Test time: 0.00 seconds.
Run on (4 X 3599.88 MHz CPU s)
2016-02-06 17:15:06
Benchmark Time(ns) CPU(ns) Iterations
-----------------------------------------------------------
BM_orig/1 742 741 958904
BM_orig/8 7456 7457 93333
BM_orig/64 85563 85504 8140
BM_orig/512 2997799 3000000 233
BM_orig/4k 173072815 173000000 4
BM_orig/32k 28505832434 28504000000 1
BM_orig/64k 139696512699 139686000000 1
BM_forum/1 774 774 886076
BM_forum/8 5031 5029 140000
BM_forum/64 33001 32993 21277
BM_forum/512 256982 256831 2745
BM_forum/4k 2011854 2014368 348
BM_forum/32k 19614743 19611111 36
BM_forum/64k 43754816 43764706 17
BM_mine/1 246 246 2834008
BM_mine/8 1321 1320 530303
BM_mine/64 9476 9481 74468
BM_mine/512 69313 69295 10145
BM_mine/4k 537183 536715 1321
BM_mine/32k 6133614 6135135 111
BM_mine/64k 12912220 12907407 54
BM_mine_vector/1 217 217 3225806
BM_mine_vector/8 1447 1445 479452
BM_mine_vector/64 9060 9064 77778
BM_mine_vector/512 68974 68972 10294
BM_mine_vector/4k 539424 538986 1321
BM_mine_vector/32k 6113907 6115044 113
BM_mine_vector/64k 13649380 13666667 51
BM_no_trim_orig/1 383 383 1861702
BM_no_trim_orig/8 4165 4166 168269
BM_no_trim_orig/64 61783 61825 11290
BM_no_trim_orig/512 2833573 2829960 247
BM_no_trim_orig/4k 167948902 168000000 4
BM_no_trim_orig/32k 29410523891 29410000000 1
BM_no_trim_orig/64k 137438740492 137432000000 1
BM_no_trim_forum/1 357 357 1949861
BM_no_trim_forum/8 1751 1753 400000
BM_no_trim_forum/64 11392 11384 61404
BM_no_trim_forum/512 85726 85749 8140
BM_no_trim_forum/4k 676759 676385 1029
BM_no_trim_forum/32k 8598947 8597561 82
BM_no_trim_forum/64k 19704975 19714286 35
BM_no_trim_mine/1 250 250 2788845
BM_no_trim_mine/8 1527 1528 457516
BM_no_trim_mine/64 10476 10470 66667
BM_no_trim_mine/512 78441 78449 8974
BM_no_trim_mine/4k 606494 606684 1167
BM_no_trim_mine/32k 7209706 7218750 96
BM_no_trim_mine/64k 15034847 15021739 46
BM_no_trim_mine_vector/1 224 224 3139013
BM_no_trim_mine_vector/8 1649 1649 419162
BM_no_trim_mine_vector/64 9681 9672 72165
BM_no_trim_mine_vector/512 74088 74147 9589
BM_no_trim_mine_vector/4k 585973 585750 1207
BM_no_trim_mine_vector/32k 7304332 7305263 95
BM_no_trim_mine_vector/64k 15878234 15883721 43
--- End code ---
See the attached file for the source code.
Notes:
1. There are cases where the original version is faster than the forum version.
2. My version is always faster.
3. When trimming is enabled, it is about 3x faster. :)
4. Obviously, faster code is uglier. :)
frithjofh:
Wow, you really took this to heart...
Well, yes, you are right, there is one case where the original version is faster... it slipped past me... ;)
And yes, your version is faster in every regard. It is nice to see what a real programmer can do; a little competitive spirit always works wonders. Three iterations over the vector == three times slower than only one :o
I tried to think about the purpose of my changes to this particular function while benchmarking. I wondered whether testing the function with 5,000 entries really makes sense, or is even realistic. Perhaps the interesting range is from 64 to 4,000 entries, at least if the string being tokenized is some normal file, but I don't really know.
I like your version; post a patch. It is a bit of a pain in the neck to read, and it replaces 2 functions with 6, but the speed gain is great, and those helper functions could surely find use in other functions too...
Have you also checked the user functions I changed with the last patch?
oBFusCATed:
--- Quote from: frithjofh on February 06, 2016, 05:43:46 pm ---Wow, you really took this to heart...
--- End quote ---
Not really, I'm just playing with perf, and I also wanted to try Google's benchmark library a bit.
And, for sure, I wanted to verify that what I'm talking about is correct.
It turned out that the difference is not as big as I thought it would be.
The reason is that most of the time is spent constructing wxString/std::string objects.
BTW, the wx3.0 version is quite a lot slower, because it uses std::string.
--- Quote from: frithjofh on February 06, 2016, 05:43:46 pm ---Have you also checked the user functions I changed with the last patch?
--- End quote ---
Not yet.
For the GetStringFromArray one, my guess is that there won't be any difference. :)
oBFusCATed:
Heh, it turned out that your version of GetStringFromArray is a lot slower than the original, because you're using operator+. ::)