Generalizing programming language patterns in CodeBlocks

beqroson:

--- Quote from: dmoore on November 11, 2013, 06:22:47 pm ---With both this stuff about comments and your UTF-82 talk, I think you are WAY overcomplicating things.

--- End quote ---

Overcomplicating... nope, I am not

--- Quote from: dmoore on November 11, 2013, 06:22:47 pm ---To me, the potential "win" here is to create set of standardized translation tables

--- End quote ---

Standardized translation? Noooo...

--- Quote from: dmoore on November 11, 2013, 06:22:47 pm ---Comments, especially the doc strings for toolkits like wxWidgets, would be nice, but they aren't necessary to get a program to compile and dealing with them in the right way has to be part of a much larger translation effort.

--- End quote ---

Agreed, comments need more effort from the developer than from any translation tool creator.

--- Quote from: dmoore on November 11, 2013, 06:22:47 pm ---To reiterate, you don't really need to integrate this into C::B to make your proof of concept.

--- End quote ---

No, that is true, to the no...

--- Quote from: dmoore on November 11, 2013, 06:22:47 pm ---And you shouldn't because if it is useful to C::B users it will be useful to programmers more generally.

--- End quote ---

No, I should not. But can I keep my sticky fingers from it? Nope.

--- Quote from: dmoore on November 11, 2013, 06:22:47 pm ---Why don't you start by writing a simple tool that takes the user's foreign language source files (UTF-8), a specified translation table, and outputs the English programming language equivalent (and vice versa). From there it would be easy enough to integrate into the GCC and other toolchains. Then turn it into a library and IDEs will be able to take advantage of it too.

--- End quote ---

No, that I will do. So, no. Wtf, I mean yes, YES.

beqroson:

--- Quote from: dmoore on November 11, 2013, 06:22:47 pm ---Comments, especially the doc strings for toolkits like wxWidgets, would be nice, but they aren't necessary to get a program to compile and dealing with them in the right way has to be part of a much larger translation effort.

--- End quote ---

I think you are reading too fast. My point all along was that I will not touch comments; the translator will just skip over them.

beqroson:

--- Quote from: dmoore on November 11, 2013, 06:22:47 pm ---your UTF-82 talk

--- End quote ---

8)
Well, you are probably correct about overcomplicating that stuff. I hope the UTF82 is just a temporary hiccup. If I decide to implement it, I will go way beyond my available time, not to mention that it would be unnecessarily complex.

beqroson:

--- Quote from: beqroson on November 05, 2013, 08:18:20 pm ---The general idea is that any programming language should be usable in the codeblocks IDE with very little preparation. I am not talking about just C++, but ANY language. By this I mean compiling, code-completion, highlighting, everything that a developer needs.

--- End quote ---

Wtf!! What the hell was I talking about? My mouth must be running at a higher clock speed than my mind. I mean, OK for an idea, but it is not easy to implement in an afternoon. I knew that already, but I talk too much!

beqroson:

However, I am working on the algorithm for the translation phase. This is what I have come up with so far:

In order to translate terms in the source document, the basic algorithm is as follows:

* Use a source string and a destination string for the pass.
* Scan the document for comments and copy every comment unchanged to the destination.
* For each term found outside a comment, create a hash as follows:
* For each byte that is not within a comment, check whether it is greater than or equal to 0x80. If so, it is a character to check.
* If it is less than 0x80, use a lookup table to decide whether it is a character to check, e.g. [A-Za-z0-9].
* For all characters to check, build up a hash value by round robin over a uint64_t.
* When a byte that is not a character to check is encountered, stop building the hash and save the length of the byte sequence.
* Use the length of the byte sequence as an index into a lookup array.
* The lookup array yields two values, a lower bound and an upper bound, which delimit a subset of one long, sorted list of hashes.
* We now know in what range of the long hash list a match can be found.
* Using binary search, the lower and upper bounds converge until a match is found or ruled out.
* If the hash value is found in the list, it is very likely the correct match, but that should probably be confirmed by double-checking against the actual bytes.
* Use the index of the matched hash as an index into another lookup table, retrieve another index and a length that point into a compact table of replacement strings, and write the replacement to the destination.
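
A rough C++ sketch of the steps above, to make them concrete. Several things here are assumptions and not settled design: the names (Translator, Entry, hashTerm, isTermByte) are made up for illustration, comments are assumed to be C-style (// and /* */), "round robin over a uint64_t" is read as "rotate left by one byte, then XOR in the next byte", the ASCII check uses plain comparisons where a 128-entry lookup table would go, and replacements are kept as std::string instead of a compact index/length table.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// A byte is part of a term if it is >= 0x80 (inside a UTF-8 multi-byte
// sequence) or an ASCII letter or digit. A 128-entry table could replace
// the comparisons for the ASCII half.
static bool isTermByte(unsigned char c) {
    if (c >= 0x80) return true;
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
}

// Assumed reading of "round robin": rotate the 64-bit value left by 8 bits
// and XOR in the next byte.
static std::uint64_t hashTerm(const std::string& s, std::size_t begin, std::size_t end) {
    std::uint64_t h = 0;
    for (std::size_t i = begin; i < end; ++i) {
        h = (h << 8) | (h >> 56);
        h ^= static_cast<unsigned char>(s[i]);
    }
    return h;
}

struct Entry { std::uint64_t hash; std::string from; std::string to; };
using Range = std::pair<std::size_t, std::size_t>;  // [first, last) into entries_

class Translator {
public:
    explicit Translator(const std::vector<std::pair<std::string, std::string>>& pairs) {
        for (const auto& p : pairs)
            entries_.push_back({hashTerm(p.first, 0, p.first.size()), p.first, p.second});
        // Sort by (length, hash) so each length owns one contiguous, sorted range.
        std::sort(entries_.begin(), entries_.end(), [](const Entry& a, const Entry& b) {
            if (a.from.size() != b.from.size()) return a.from.size() < b.from.size();
            return a.hash < b.hash;
        });
        std::size_t maxLen = entries_.empty() ? 0 : entries_.back().from.size();
        bounds_.assign(maxLen + 1, Range(0, 0));
        for (std::size_t i = 0; i < entries_.size(); ++i) {
            Range& b = bounds_[entries_[i].from.size()];
            if (b.first == b.second) b.first = i;  // first entry of this length
            b.second = i + 1;
        }
    }

    std::string translate(const std::string& src) const {
        std::string dst;
        std::size_t i = 0;
        while (i < src.size()) {
            std::size_t e;
            if (src.compare(i, 2, "//") == 0) {          // line comment: copy as-is
                e = src.find('\n', i);
                if (e == std::string::npos) e = src.size();
                dst.append(src, i, e - i);
            } else if (src.compare(i, 2, "/*") == 0) {   // block comment: copy as-is
                e = src.find("*/", i + 2);
                e = (e == std::string::npos) ? src.size() : e + 2;
                dst.append(src, i, e - i);
            } else if (isTermByte(static_cast<unsigned char>(src[i]))) {
                e = i;
                while (e < src.size() && isTermByte(static_cast<unsigned char>(src[e]))) ++e;
                const std::string* repl = find(src, i, e);
                if (repl) dst += *repl; else dst.append(src, i, e - i);
            } else {                                     // punctuation, whitespace, ...
                e = i + 1;
                dst += src[i];
            }
            i = e;
        }
        return dst;
    }

private:
    // Length picks the range, binary search finds the hash, a byte compare confirms.
    const std::string* find(const std::string& src, std::size_t begin, std::size_t end) const {
        assert(begin < end);
        std::size_t len = end - begin;
        if (len >= bounds_.size()) return nullptr;
        const Entry* first = entries_.data() + bounds_[len].first;
        const Entry* last = entries_.data() + bounds_[len].second;
        std::uint64_t h = hashTerm(src, begin, end);
        const Entry* it = std::lower_bound(first, last, h,
            [](const Entry& e, std::uint64_t v) { return e.hash < v; });
        for (; it != last && it->hash == h; ++it)
            if (src.compare(begin, len, it->from) == 0) return &it->to;
        return nullptr;
    }

    std::vector<Entry> entries_;
    std::vector<Range> bounds_;  // indexed by term length in bytes
};
```

With a table mapping "si" to "if" and "sino" to "else", Translator(table).translate("si (x) a; // si no\nsino y;") leaves the comment alone and rewrites only the two keywords. Note that the sketch does not treat string literals specially, and the byte compare after the hash match is what guards against collisions for terms longer than 8 bytes.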
