Improving "search in files" with a word index? And other ideas with metadata



Jenna:

--- Quote from: ollydbg on October 13, 2011, 03:18:24 am ---
--- Code: ---@CBNOTE:45
--- End code ---
Doxygen already supports putting a link to an image or a LaTeX-style formula in a comment, so we only need to interpret that special command and show the image when the mouse hovers over it.

--- End quote ---
But that would mean that todos written for developers become part of the doxygen-generated documentation. Or is there a way to exclude some of them?

rickg22:

--- Quote from: ollydbg on October 13, 2011, 03:18:24 am ---So, it looks like you want to implement a plain text search, not a regex search, right?
The dictionary could contain something like:

--- Code: ---keyword(string) -> [file index(int), offset in the file(int)]

--- End code ---
That's all.

--- End quote ---

Yes, it would be for normal searches (not regex). To find a string, we would just use the index to check whether all of its words (or tokens, if you prefer) are present in the file. This way we can discard files that cannot possibly contain our search string.

But I'm wondering how to do it in the most efficient and least convoluted way. I think maintaining a global index would be overkill; perhaps doing it on a per-file basis would be best. This way, each time a file is saved, only its index would be updated. Otherwise, we would need to use a database engine for it.

Maybe we could allow the user to have an (optional) SQL engine (with username, password) to store the offset values instead of flat data - or do we have an SQLite engine running with C::B already?

So, instead of searching through all the files, we would just consult each file's index for the search.

oBFusCATed:

--- Quote from: rickg22 on October 15, 2011, 01:26:54 am ---This way, each time a file was saved, only its index would be updated. Otherwise, we would need to use a database engine for it.

--- End quote ---
I think it would be better to update the index/db when the user searches and the timestamp of the file is newer than that of the database.
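
A rough C++ sketch of that lazy update, under stated assumptions: `IndexCache` and `RefreshIfStale` are made-up names, the timestamp is tracked per file rather than for the database as a whole, and the caller is assumed to obtain the file's modification time from whatever file API it already uses. The actual re-indexing is left as a comment.

```cpp
#include <ctime>
#include <map>
#include <string>

// Hypothetical cache remembering the modification time each file had
// when it was last indexed.
class IndexCache
{
public:
    // Called at search time. Returns true if the file had to be re-indexed,
    // false if the stored index is still up to date.
    bool RefreshIfStale(const std::string& path, std::time_t fileModified)
    {
        const auto it = m_indexedAt.find(path);
        if (it != m_indexedAt.end() && it->second >= fileModified)
            return false; // index is at least as new as the file

        // The real index rebuild for this file would go here.
        m_indexedAt[path] = fileModified;
        return true;
    }

private:
    std::map<std::string, std::time_t> m_indexedAt; // path -> mtime at indexing
};
```

With this approach saving a file costs nothing extra; the price is paid on the first search after the file changed, and only for files that actually changed.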

--- Quote from: rickg22 on October 15, 2011, 01:26:54 am ---Maybe we could allow the user to have an (optional) SQL engine (with username, password) to store the offset values instead of flat data - or do we have an SQLite engine running with C::B already?

--- End quote ---
No SQLite used in C::B at the moment, but this engine is pretty slow.

eranif:

--- Quote from: oBFusCATed on October 15, 2011, 10:15:52 am ---No SQLite used in C::B at the moment, but this engine is pretty slow.

--- End quote ---

I recently used QDBM for a project of mine, and I can tell you that it is *way* faster than SQLite.
You interact directly with the B-tree, it has cursor functionality, and it is even a transactional storage ;)

The Odeum API (is exactly what you are looking for):
http://fallabs.com/qdbm/spex.html#odeumapi

Villa API (B-tree API with transaction support):
http://fallabs.com/qdbm/spex.html#villaapi
It is licensed under the LGPL, which I guess is OK for C::B.

I used the Villa API because I needed the cursor functionality (it allows a very fast search for a given prefix).
It also supports an inverted index for full-text search.

You can also replace its default compare function per search, so you could perform case-sensitive or case-insensitive searches.
Eran
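
To illustrate what a cursor-style prefix search over an ordered index with a replaceable compare function looks like, here is a C++ sketch that uses `std::map` as a stand-in for the B-tree. It does not use the QDBM API at all; `KeysWithPrefix` and `CaseInsensitiveLess` are hypothetical names, and the `int` mapped value is just a placeholder for whatever the index would store.

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>
#include <vector>

// Comparator that orders keys while ignoring ASCII case.
struct CaseInsensitiveLess
{
    bool operator()(const std::string& a, const std::string& b) const
    {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](unsigned char x, unsigned char y)
            { return std::tolower(x) < std::tolower(y); });
    }
};

// Jump to the first key that is not less than the prefix, then walk forward
// (like a cursor) until a key no longer starts with the prefix.
template <typename Compare>
std::vector<std::string> KeysWithPrefix(
    const std::map<std::string, int, Compare>& index,
    const std::string& prefix)
{
    std::vector<std::string> result;
    const Compare less = index.key_comp();
    for (auto it = index.lower_bound(prefix); it != index.end(); ++it)
    {
        const std::string head = it->first.substr(0, prefix.size());
        // "Starts with" under the map's own ordering: neither is less.
        if (less(head, prefix) || less(prefix, head))
            break; // matching keys are contiguous, so we can stop here
        result.push_back(it->first);
    }
    return result;
}
```

One difference from the per-search compare function described above: a `std::map` fixes its comparator when the container is created, so in this sketch case-sensitive and case-insensitive lookups would need two separate maps.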

ollydbg:
I just found that CodeLite now has a branch using QDBM. :D

But what I see on the QDBM main page is:

--- Quote ---Copyright (C) 2000-2007 Mikio Hirabayashi
Last Update: Thu, 26 Oct 2006 15:00:20 +0900
--- End quote ---

Sounds like it has had no updates in the last five years. :(
