Generalizing programming language patterns in CodeBlocks


beqroson:
The question is whether the found hash value needs to be double-checked at all. Since the hash value is equal to the byte sequence for the first eight bytes, all short strings will have an exact hash match. For any string of nine bytes or more, the probability of a hash collision must be very small, and the double check takes a relatively large amount of processing time. The shortest eight-byte string, however, is two UTF-8 characters with very high Unicode code points, and the question is how often those will show up. More common eight-byte UTF-8 strings will be four or more characters long.

An example of a hash collision is the pair of terms "templates" and "semplatet". Among nine-byte strings, the only anagrams that collide are those where the first and last letters are interchanged. The question is whether one should wait until a collision is detected by the user, and then let the user switch to a checked version of the hash. I know it is substandard to take shortcuts like this. That is why I am asking.
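
The posts on this page do not restate the hash function itself. One scheme that is consistent with the description (the first eight bytes stored verbatim, and a collision when the first and ninth letters are swapped) is to XOR byte i into slot i % 8 of a 64-bit value. The sketch below assumes exactly that; `fold_hash` is a hypothetical name, not code from this thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// ASSUMPTION: the hash packs the first eight bytes verbatim into a 64-bit
// value and XOR-folds every later byte back onto slot (i % 8).
// This is one scheme that matches the description, not a confirmed design.
std::uint64_t fold_hash(const std::string& s) {
    std::uint64_t h = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const std::uint64_t byte = static_cast<unsigned char>(s[i]);
        h ^= byte << (8 * (i % 8));  // byte 8 lands on the same slot as byte 0
    }
    return h;
}
```

Under this assumed scheme, `fold_hash("templates")` equals `fold_hash("semplatet")` because the ninth byte folds back onto the first, while the eight-byte prefixes "template" and "semplate" still hash differently.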

beqroson:
Probably it is going to be checked; that is the only reasonable way. Besides, the search string to check can reside in the same table as the replacement string, right before it. Thus the term is first checked and then replaced by simply continuing through the same table, using two length values and one index value.
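
A minimal sketch of that layout, assuming the table is one flat character buffer in which each search term is stored immediately before its replacement. The `Entry` struct and `checked_replacement` are hypothetical names chosen for illustration; only the "one index, two lengths" idea comes from the post above.

```cpp
#include <cstddef>
#include <string>

// One index and two lengths describe a search/replacement pair.
struct Entry {
    std::size_t index;      // where the search term starts in the table
    std::size_t searchLen;  // length of the search term in bytes
    std::size_t replLen;    // length of the replacement that follows it
};

// Verifies the candidate against the stored search term, then reads the
// replacement by continuing in the same table. Returns false on a mismatch
// (for example a hash collision) or if the entry runs past the table.
bool checked_replacement(const std::string& table, const Entry& e,
                         const std::string& candidate, std::string& out) {
    if (e.index + e.searchLen + e.replLen > table.size()) return false;
    if (candidate.size() != e.searchLen) return false;
    if (table.compare(e.index, e.searchLen, candidate) != 0) return false;
    out = table.substr(e.index + e.searchLen, e.replLen);
    return true;
}
```

With a table holding "functionfunzione" and an entry of index 0 and two lengths of 8, the candidate "function" yields "funzione", while a colliding but different candidate is rejected by the comparison.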

beqroson:
Great, the inner loop in its most basic form seems to become somewhat defined. Now, the next item to solve in the list will be how to treat variable scope.

beqroson:
One way to treat variable scope is to... not treat variable scope. Avoiding variable scope means avoiding parsing technology, and avoiding parsing technology means creating a one-for-all mechanism for the translation.

The idea is to use several versions of the source files as follows:

+----------------------------------------------------------------+
| Specific language cpp-source file, *.hscrp.h and *.cppscrp.cpp |
+----------------------------------------------------------------+
            ^
            |  Lossless bidirectional translation
            v
+----------------------------------------------------------------+
| Common language cpp-source file, *.hlang.h and *.cpplang.cpp   |
+----------------------------------------------------------------+
            |
            |  One-way translation
            v
+---------------------------------------+
| Normal cpp-source file, *.h and *.cpp |
+---------------------------------------+
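
The lossless bidirectional step can be pictured as a one-to-one word swap without any parsing. The following is only a sketch under stated assumptions: the dictionary is a bijection, none of its target words occur untranslated in the source, and a "word" is any run of letters, digits, underscores, or non-ASCII bytes. `swap_words` and `invert` are hypothetical helper names, and `std::map` stands in for the hash table for brevity.

```cpp
#include <cctype>
#include <map>
#include <string>

typedef std::map<std::string, std::string> Dict;

// Replaces every whole word found in dict; everything else is copied as-is.
std::string swap_words(const std::string& src, const Dict& dict) {
    std::string out, word;
    auto flush = [&]() {
        Dict::const_iterator it = dict.find(word);
        out += (it != dict.end()) ? it->second : word;
        word.clear();
    };
    for (char ch : src) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (std::isalnum(c) || c == '_' || c >= 0x80) {
            word += ch;   // still inside a word (UTF-8 bytes included)
        } else {
            flush();      // word ended: translate it, then copy the separator
            out += ch;
        }
    }
    flush();
    return out;
}

// Builds the reverse dictionary; only meaningful if dict is one-to-one.
Dict invert(const Dict& dict) {
    Dict inv;
    for (const auto& kv : dict) inv[kv.second] = kv.first;
    return inv;
}
```

Translating "funzione FareQualcosa()" with a dictionary that maps "funzione" to "function" and "FareQualcosa" to "DoSomething" gives "function DoSomething()", and applying the inverted dictionary restores the original text. Because there is no scope handling, the same word is always swapped the same way wherever it appears.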

thomas:

--- Quote from: beqroson on November 10, 2013, 04:55:51 pm ---Yes, my definition was that both the programming language and the native language can be one. Such as if I write wholly in English, ie "function DoSomething()" or in Italian, ie "funzione FareQualcosa()", then both the native and the programming sentences could be categorized as "Italian" language.

Now, in the world of programming, I was thinking that the translation can be only to exchange words one by one, straight.

--- End quote ---
Don't get me wrong on that, but this is the most stupid idea I've heard in a while.

Not only that, but it also won't work. Languages do not translate word by word, and languages have grossly different grammar. Many languages have characters that do not exist in others. What if someone writes Tagalog or Chinese and you expect Italian or German? How is this supposed to work? Do you expect comments to be magically translated as well?
Quite a few terms translate awkwardly, to say the least, even when done by professional translators. I regularly have to stop and think what they're trying to say when I see IT translations from English to my native language done by professionals working for multi-million-dollar companies. Let alone word-by-word computer translation.

Plus, most people who are moderately familiar with programming are also proficient in English.

So much for natural languages. As far as "any programming language" goes, I can think of at least 6 grossly different categories of languages, and these are certainly not all:

* compiler based bytecode languages (e.g. Java)
* compiler and linker based languages (e.g. C or C++)
* interpreted/bytecode languages without explicit compiler (e.g. Python, Lua,... )
* interpreted/bytecode embedded languages (e.g. AngelScript, Squirrel, and again Python, Lua)
* interpreted/bytecode remote languages (e.g. PHP)
* weirdo languages that do near unpredictable stuff (e.g. bash script, perl)
* weirdo languages that nobody can understand (e.g. Lisp)
Some of these need a compiler invoked, some of them need the executable to be linked afterwards. Some need the binaries and resources packed in a zip file and a bytecode interpreter launched afterwards instead.
Some need an interpreter launched, some need a host application (including bindings).
Some need files to be uploaded to a different machine where an interpreter runs as a server process.
Some need ... something else.

All of these categories are so grossly different that it is hardly possible to pack them all into one unified build process or one unified notion of a "project".
