lexer file loading ...

thomas:

--- Quote from: Michael on April 05, 2006, 09:51:14 am ---Maybe lexers could be handled the same way file associations are. Depending on which kind of file a user opens, C::B loads the corresponding lexer automatically [...]
--- End quote ---
I tried to implement exactly that last evening, but it is not as easy as you think. First, you don't know what a lexer refers to without loading it. Thus, you would have to encode this information somewhere. Keeping an extra map file around for this would work best, but then you are building up a dependency, which is not good: when adding a new lexer, you have to update the map, or it won't work.
One could think about putting the handled extensions into the lexer's filename, but most lexers handle several (up to 6) file types, so the filenames would become quite cluttered (though it would still be possible).

--- Quote ---By default C::B has predefined associations (stored in the C::B config file?).

--- End quote ---
Hardcoded at present. We discussed this in January when restructuring the file association code, but decided to leave it hardcoded for now so as not to complicate things further.

--- Quote ---Each file type has its own lexer. The user can modify this list, either by adding a new lexer together with its file type association and/or by modifying an existing one. User-specific lexers could be stored separately in the C::B config file (or in a separate lexer config file).
--- End quote ---
That's basically how it used to be in the dark ages when all lexers were copied to the configuration. Currently, only differences are stored to the config.

My current plan is to scan the lexer folder and load all lexers once. That gives us a mapping of extensions to lexers, which can be saved in the config file. On subsequent starts, Code::Blocks will know which lexer to load when opening a specific file type, so lexers can indeed be loaded on demand. When installing a new lexer, one would have to hit a "refresh" button to force the map to be rebuilt. That way, you don't need to configure anything, which is a good thing. I am still looking for a weak spot in this approach, but I think it might just work fine.
What do you think about this approach?
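
A minimal C++ sketch of the scan-once, load-on-demand idea described above. It assumes a hypothetical `ExtensionReader` callback that stands in for the expensive TinyXML parse of a single lexer file; `LexerMap` and its methods are illustrative names, not Code::Blocks API. `Rebuild()` is what the "refresh" button would trigger, `Snapshot()`/`Restore()` represent saving the map to and reading it back from the config, and `LexerFor()` is the cheap lookup done when a file is opened.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: fully loads (parses) one lexer file and returns the
// extensions it declares. Injected so the sketch stays self-contained.
using ExtensionReader = std::function<std::vector<std::string>(const std::string&)>;

class LexerMap {
public:
    // Full scan: parse every lexer once and rebuild extension -> lexer file.
    void Rebuild(const std::vector<std::string>& lexerFiles,
                 const ExtensionReader& readExtensions) {
        map_.clear();
        for (const auto& file : lexerFiles)
            for (const auto& ext : readExtensions(file))
                map_[ext] = file;  // if two lexers claim an extension, the last one scanned wins
    }

    // What would be written to the config file after a scan.
    const std::map<std::string, std::string>& Snapshot() const { return map_; }

    // On later startups: restore the saved map instead of parsing every lexer.
    void Restore(std::map<std::string, std::string> saved) { map_ = std::move(saved); }

    // Lookup at file-open time; empty string means "no lexer known".
    std::string LexerFor(const std::string& ext) const {
        auto it = map_.find(ext);
        return it == map_.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, std::string> map_;
};
```

Only the lexer returned by `LexerFor()` would then actually be parsed, so the per-lexer parse cost is paid once per scan rather than at every startup.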

--- Quote ---TinyXML parsing is slow: roughly 200 ms per lexer. [...]
SciTE loads way more lexers than C::B, and the SciTE lexers also have more features.
--- End quote ---
You're comparing apples and oranges again. SciTE lexers are collections of single-line key/value pairs, while Code::Blocks lexers are XML documents that are checked for well-formedness. Of course it takes time to validate a document; this is not surprising.
The same goes for your network load story. You're missing the point here, too.
We are making on the order of 13,000 isolated file accesses during a "normal" startup. On a local file system, much of this can be cached, but it is absolutely not surprising that this is a major performance bottleneck over a network.
wxWidgets makes on the order of 10,000 distinct file accesses alone to load the XRC files. You can easily verify this using FileMon if you have any doubts about it.
To get back to TinyXML which is so terribly slow: the configuration file loads with about 6-7 file accesses, and all lexers are loaded using about 100 distinct file accesses. The time that TinyXML takes to parse those files is just ridiculous compared to the network latency of 10k accesses...

Michael:

--- Quote from: thomas on April 05, 2006, 11:50:19 am ---
--- Quote from: Michael on April 05, 2006, 09:51:14 am ---Maybe lexers could be handled the same way file associations are. Depending on which kind of file a user opens, C::B loads the corresponding lexer automatically [...]
--- End quote ---
First, you don't know what a lexer refers to without loading it.

--- End quote ---

My idea is a table that lists, for each lexer, the file extensions it supports. But instead of storing just the lexer name, you store its path (possibly relative) and its name (or, alternatively, the lexer filename and the lexer folder path stored separately), similar to how include and library paths are handled. That way C::B knows which lexer it has to load without first parsing all the lexers.

--- Quote from: thomas on April 05, 2006, 11:50:19 am ---Thus, you would have to encode this information somewhere. Keeping an extra map file around for this would work best, but then you are building up a dependency, which is not good: when adding a new lexer, you have to update the map, or it won't work.
One could think about putting the handled extensions into the lexer's filename, but most lexers handle several (up to 6) file types, so the filenames would become quite cluttered (though it would still be possible).

--- End quote ---

The information could be stored in an XML file. When C::B starts, it loads and parses the XML file and fills the table. When a user adds or modifies a lexer/extension, this can easily be saved back to the XML file. Maybe a multimap could be used, with the lexer "name" as the key and the extensions as the values.

The disadvantage is that you build up dependencies, which is not good.
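
A sketch of the multimap table suggested above, using `std::multimap` with the (possibly relative) lexer path as the key and one entry per handled extension. `LexerTable`, `FindLexerForExtension` and `RemoveLexer` are hypothetical names for illustration. Because the key is the lexer, finding the lexer for a given extension means scanning the whole table; a map keyed by extension avoids that scan at the cost of making "all extensions of one lexer" the slower query.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical table: lexer path (key) -> one entry per handled extension (value).
using LexerTable = std::multimap<std::string, std::string>;

// Lookup by extension walks every entry, since the table is keyed by lexer.
std::string FindLexerForExtension(const LexerTable& table, const std::string& ext) {
    for (const auto& entry : table)
        if (entry.second == ext)
            return entry.first;
    return std::string();  // no lexer registered for this extension
}

// Removing a lexer drops all of its extension entries in one call:
// multimap::erase(key) removes every element with that key.
void RemoveLexer(LexerTable& table, const std::string& lexer) {
    table.erase(lexer);
}
```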

--- Quote from: thomas on April 05, 2006, 11:50:19 am ---My current plan is to scan the lexer folder and load all lexers once. That gives us a mapping of extensions to lexers, which can be saved in the config file. On subsequent starts, Code::Blocks will know which lexer to load when opening a specific file type, so lexers can indeed be loaded on demand. When installing a new lexer, one would have to hit a "refresh" button to force the map to be rebuilt. That way, you don't need to configure anything, which is a good thing. I am still looking for a weak spot in this approach, but I think it might just work fine.
What do you think about this approach?

--- End quote ---

I think it is a good alternative :). The question is how to handle updates to the map (addition, deletion, or modification of a lexer). E.g., if you add or modify a lexer, would C::B re-parse all the lexers again, or just the new/modified one? If you re-scan all the lexers (the easiest solution), it would take time, and the user might not appreciate that. Maybe a low-priority thread could be used to manage the update process.
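
One way to re-parse only what changed, sketched under the assumption that the modification time of each lexer file is remembered from the previous scan (it could come from `std::filesystem::last_write_time` or from wxWidgets; here it is reduced to a plain integer stamp). `FileStamps`, `LexerChanges` and `DiffLexerFolder` are hypothetical names. The sketch only computes the difference; the actual parsing, and any low-priority background thread, are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Lexer file name -> last modification stamp recorded during a scan.
using FileStamps = std::map<std::string, std::int64_t>;

struct LexerChanges {
    std::vector<std::string> toParse;  // new or modified lexer files
    std::vector<std::string> removed;  // files that disappeared since the last scan
};

// Compare the stamps from the last scan with the current folder listing.
LexerChanges DiffLexerFolder(const FileStamps& previous, const FileStamps& current) {
    LexerChanges changes;
    for (const auto& [file, stamp] : current) {
        auto it = previous.find(file);
        if (it == previous.end() || it->second != stamp)
            changes.toParse.push_back(file);  // unseen before, or changed on disk
    }
    for (const auto& entry : previous)
        if (current.find(entry.first) == current.end())
            changes.removed.push_back(entry.first);  // its extensions should be unmapped
    return changes;
}
```

Only the files in `toParse` would need the expensive XML parse; entries for `removed` files could simply be dropped from the extension map.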

Anyway, as you say, it should work fine :). To spot problems early, a simple implementation could be used at first. If no major problems are reported, it could then be extended and improved. It would not be good to invest a large amount of time up front, only to find out that the idea does not work. Better to begin with a simple solution and extend it step by step.

Best wishes,
Michael

thomas:

--- Quote ---I think it is a good alternative :). The question is how to handle updates to the map (addition, deletion, or modification of a lexer). E.g., if you add or modify a lexer, would C::B re-parse all the lexers again, or just the new/modified one? If you re-scan all the lexers (the easiest solution), it would take time, and the user might not appreciate that. Maybe a low-priority thread could be used to manage the update process.
--- End quote ---
Modifying a lexer should not matter at all (unless you change the file mapping). Reparsing everything from scratch is very attractive, as it is simple to implement. It may take 3-5 seconds, but so what... you don't add new lexers every day :)
Deletion should not be a problem: if the file is not found, you simply return the same value (called LEX_NONE or something) that is returned when a lexer is not known at all.
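
A minimal sketch of the deletion case. `kLexNone` is a hypothetical stand-in for the "LEX_NONE or something" value and `ResolveLexer` is a hypothetical helper; the point is that an unknown extension and a lexer file that was deleted after the map was built produce the same result, so the caller needs no special handling.

```cpp
#include <cassert>
#include <fstream>
#include <map>
#include <string>

// Hypothetical sentinel standing in for "LEX_NONE or something": empty = no lexer.
const std::string kLexNone;

// extToFile: cached extension -> lexer file map (e.g. restored from the config).
std::string ResolveLexer(const std::map<std::string, std::string>& extToFile,
                         const std::string& ext) {
    auto it = extToFile.find(ext);
    if (it == extToFile.end())
        return kLexNone;  // extension not known at all
    std::ifstream probe(it->second);
    if (!probe)
        return kLexNone;  // lexer file was deleted since the map was built
    return it->second;    // caller can now load (parse) this lexer on demand
}
```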

Putting those extension/file mappings into the config is probably the least painful. I would not want to require the user to edit a configuration file by hand just to add a lexer. Also, this would not work well with internet update/install. To modify an external file, we would need to either implement a complete parser or distribute a tool like sed or something with Code::Blocks. On the other hand, allowing the updater to fire a "reload lexers" event is trivial and 100% safe.

Michael:

--- Quote from: thomas on April 05, 2006, 03:08:19 pm ---
--- Quote ---I think it is a good alternative :). The question is how to handle updates to the map (addition, deletion, or modification of a lexer). E.g., if you add or modify a lexer, would C::B re-parse all the lexers again, or just the new/modified one? If you re-scan all the lexers (the easiest solution), it would take time, and the user might not appreciate that. Maybe a low-priority thread could be used to manage the update process.

--- End quote ---
Modifying a lexer should not matter at all (unless you change the file mapping). Reparsing everything from scratch is very attractive, as it is simple to implement. It may take 3-5 seconds, but so what... you don't add new lexers every day :)
Deletion should not be a problem: if the file is not found, you simply return the same value (called LEX_NONE or something) that is returned when a lexer is not known at all.

--- End quote ---

If it takes around 5 seconds, I think it is not an issue. And yes, you do not add a lexer every day :).

--- Quote from: thomas on April 05, 2006, 03:08:19 pm ---Putting those extension/file mappings into the config is probably the least painful. I would not want to require the user to edit a configuration file by hand just to add a lexer. Also, this would not work well with internet update/install. To modify an external file, we would need to either implement a complete parser or distribute a tool like sed or something with Code::Blocks. On the other hand, allowing the updater to fire a "reload lexers" event is trivial and 100% safe.

--- End quote ---

The user would not touch the XML file where the lexers and their associations are stored, only the table in C::B; the modifications are then saved by C::B. But if this could cause problems, a 100% safe solution like "reload lexers" is better :).

Best wishes,
Michael

takeshimiya:

--- Quote from: thomas on April 05, 2006, 11:50:19 am ---
--- Quote ---TinyXML parsing is slow: roughly 200 ms per lexer. [...]
SciTE loads way more lexers than C::B, and the SciTE lexers also have more features.
--- End quote ---
You're comparing apples and oranges again. SciTE lexers are collections of single-line key/value pairs, while Code::Blocks lexers are XML documents that are checked for well-formedness. Of course it takes time to validate a document; this is not surprising.

--- End quote ---
Of course it's not surprising, and of course I'm comparing apples to oranges... because they're different formats.
But that was my point.
Note that the rough 200 ms per XML lexer was measured on a local disk; guess how long it takes to parse more than 50 C::B XML lexers (at that rate, 50 lexers alone would take 10 seconds).

--- Quote from: thomas on April 05, 2006, 11:50:19 am ---The same goes for your network load story. You're missing the point here, too.
We are making on the order of 13,000 isolated file accesses during a "normal" startup. On a local file system, much of this can be cached, but it is absolutely not surprising that this is a major performance bottleneck over a network.

--- End quote ---
Yes, that's another point; further improvements can be made with caching.

--- Quote from: thomas on April 05, 2006, 11:50:19 am ---wxWidgets makes on the order of 10,000 distinct file accesses alone to load the XRC files.

--- End quote ---
I thought the XRCs were loaded from the zips and then read from memory instead of from disk
(zips from disk, XRCs from memory, uncompressed).

--- Quote from: thomas on April 05, 2006, 11:50:19 am ---To get back to TinyXML which is so terribly slow: the configuration file loads with about 6-7 file accesses, and all lexers are loaded using about 100 distinct file accesses. The time that TinyXML takes to parse those files is just ridiculous compared to the network latency of 10k accesses...

--- End quote ---
True, but if network latency were the only issue, why do the SciTE lexers take only 1 second over a LAN, even though there are more of them?
What is SciTE somehow doing that reduces the impact of network latency? Perhaps what Michael suggested?

--- Quote from: mandrav on April 05, 2006, 11:28:14 am ---Revision 2306 has fixed the delay when opening "Settings->Editor". That's a start ;).

--- End quote ---
Great :D