Author Topic: Bug Report: [#18755] C::B hangs for 20 seconds while opening large project...  (Read 62950 times)

Offline dmoore

  • Developer
  • Lives here!
  • *****
  • Posts: 1576
Fine with me. I won't have time to test anything new for at least several days.

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
GOOD NEWS EVERYONE!

I'm very close to fixing the bug!

I made a relative path cache: If a file is in an already-tested directory, its relative path is calculated based on its directory's relative path. When a file is not found in the cache, its paths are calculated using MortenMacFly's slow-but-sure path calculation via wxFileName.

Here are the preliminary results:

WITHOUT THE PATH CACHE:
Code
C:\projects\cb\src\sdk\cbproject.cpp::void cbProject::CalculateCommonTopLevelPath():394  took : 50 ms
Project's common toplevel path: C:\htdocs\SIG\SGPG\
C:\projects\cb\src\sdk\cbproject.cpp::void cbProject::CalculateCommonTopLevelPath():460  took : 13496 ms
C:\projects\cb\src\sdk\cbproject.cpp::void cbProject::CalculateCommonTopLevelPath():461: Num Files: 1927; Num Cache Misses: 1927

WITH THE PATH CACHE:
Code
C:\projects\cb\src\sdk\cbproject.cpp::void cbProject::CalculateCommonTopLevelPath():404  took : 55 ms
Project's common toplevel path: C:\htdocs\SIG\SGPG\
C:\projects\cb\src\sdk\cbproject.cpp::void cbProject::CalculateCommonTopLevelPath():464  took : 2437 ms
C:\projects\cb\src\sdk\cbproject.cpp::void cbProject::CalculateCommonTopLevelPath():465: Num Files: 1927; Num Cache Misses: 318

Now, since the files in the project are not in order (because we're using a HashMap and not an ordered map), many directories still trigger cache misses. Even so, execution time has disminished by 80%.

I was able to diminish the load time by a further 2% by aditionally checking the parent directory in the cache; Then again, the randomized file order doesn't help. So I'm rewriting the algorithm by using a cleaned-up recursive version of the cache so that we'll have to use wxFileName's functions only twice: Once for the working directory, and once for the project's common top level path; the rest will be mere string comparisons and concatenations. I'll keep you updated when I get the algorithm working.
« Last Edit: November 03, 2012, 03:37:24 am by rickg22 »

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
UPDATE:

I finished polishing the Path Cache. There are some edge cases I haven't tested, mainly at root directories or the top directories of other drives, both for filenames and for the project file.

For standard projects with all files under the project file's directory, things work Okay.

So... ladies and gentlemen... drum rolls, please...

Code
Loading project files...
1927 files loaded
Done loading project in 2042ms
Project's base path: C:\htdocs\SIG\SGPG\
C:\projects\cb\src\sdk\cbproject.cpp::void cbProject::CalculateCommonTopLevelPath():530  took : 50 ms
Project's common toplevel path: C:\htdocs\SIG\SGPG\
C:\projects\cb\src\sdk\cbproject.cpp::void cbProject::CalculateCommonTopLevelPath():570  took : 120 ms
C:\projects\cb\src\sdk\cbproject.cpp::void cbProject::CalculateCommonTopLevelPath():571: Num Files: 1927; Num Cache Misses: 0

Ta-da!  8)

I'm attaching the patch. It needs some styling (you know, braces, tabs, etc), but I hope you like it.


Offline dmoore

  • Developer
  • Lives here!
  • *****
  • Posts: 1576
Attached is an untested refinement to Morten's patch that uses the old fast method in the more normal case where files are below the project dir, and uses MakeRelativeTo otherwise. This could be combined with Rick's patch to reduce the amount of cache usage.

Here's the changed code:

Code
        if (   (prjHasUNCName && fileHasUNCName)
            || (   !prjHasUNCName
                && !fileHasUNCName
                && vol.IsSameAs(f->file.GetVolume()) ) )
        {
            if (!fileName.StartsWith(m_CommonTopLevelPath))
            {
                wxFileName relFileCTLP(f->file);
                relFileCTLP.MakeRelativeTo( m_CommonTopLevelPath );
                f->relativeToCommonTopLevelPath = relFileCTLP.GetFullPath();
            }
            else
                f->relativeToCommonTopLevelPath = fileName.Right(fileName.Length() - m_CommonTopLevelPath.Length());

            if (!fileName.StartsWith(GetBasePath()))
            {
                wxFileName relFileBase(f->file);
                relFileBase.MakeRelativeTo( GetBasePath() );
                f->relativeFilename             = relFileBase.GetFullPath();
            }
            else
                f->relativeFilename = fileName.Right(fileName.Length() - GetBasePath().Length());
        }

I am a little bit confused by this:

Code
+            // The commented (old) method to obtain the relativeToCommonTopLevelPath is fast, but does *not* work, if you save
+            // the project on a different drive in a sub-folder of an existing source file on that (different) drive:
+            // I.e.: Project on C:\Folder\Project.cbp has file C:\Folder\SubFolder\foo.cpp and D:\Folder\bar.cpp
+            // Saved the project under D:\Folder\SubFolder\ProjectNew.cbp would cause a wrong computation of bar.cpp otherwise!!

In this case should D:\Folder be the new common top level path? I'm not sure why the old method doesn't work unless the common top level path isn't being calculated correctly. Maybe the problem is really that f->relativeFilename isn't being calculated correctly?

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
Attached is an untested refinement to Morten's patch that uses the old fast method in the more normal case where files are below the project dir, and uses MakeRelativeTo otherwise. This could be combined with Rick's patch to reduce the amount of cache usage.

Here's the changed code:

Code
...

Hmm... maybe that code will make my patch redundant, but I like having a separate cache for relative paths. It could become useful later, and it does separate all the path checking.

Quote
In this case should D:\Folder be the new common top level path? I'm not sure why the old method doesn't work unless the common top level path isn't being calculated correctly. Maybe the problem is really that f->relativeFilename isn't being calculated correctly?

Perhaps. Anyway I got another idea to speed up project loading, I'll post it in the next reply.

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
OK guys, I found another bottleneck in project loading!

Code
ProjectFile* cbProject::AddFile(int targetIndex, const wxString& filename, bool compile, bool link, unsigned short int /*weight*/)
{
...
if (!wxFileExists(fullFilename))
pf->SetFileState(fvsMissing);
else if (!wxFile::Access(fullFilename.c_str(), wxFile::write)) // readonly
pf->SetFileState(fvsReadOnly);

In other words, C::B checks for the existence of ALL the project files on load! Just to set the little "broken" or readonly flags on the project tree.  Imagine doing this for a ten-thousand-files java project where you only got to modify the configuration -_-.

I changed it to this:

Code
namespace cbProjectAuxVariables {
bool check_for_existence_of_files = true;
};

...

ProjectFile* cbProject::AddFile(int targetIndex, const wxString& filename, bool compile, bool link, unsigned short int /*weight*/)
{
...

// Check for the file's status if the User chose to in the Environment settings dialog;
// or if the file is being added manually and not as part of the project loading.
    if(!m_CurrentlyLoading || cbProjectAuxVariables::check_for_existence_of_files) {
if (!wxFileExists(fullFilename))
pf->SetFileState(fvsMissing);
else if (!wxFile::Access(fullFilename.c_str(), wxFile::write)) // readonly
pf->SetFileState(fvsReadOnly);
    }


cbProjectAuxVariables::check_for_existence_of_files is modified on project load by checking a new Environment setting:

Code
<object class="sizeritem">
<object class="wxCheckBox" name="chkMissingFilesOnLoad">
<label>Check for missing files on project load</label>
<tooltip>When checked, Code::Blocks will check that all the project files exist,&#x0A;and update their status on the Project Tree.&#x0A;Useful at times, but slower for very large projects.</tooltip>
<checked>1</checked>
</object>
<flag>wxTOP|wxLEFT|wxEXPAND|wxALIGN_LEFT|wxALIGN_TOP</flag>
<border>8</border>
</object>

And here's the result, with my previous patch and the file checking on load disabled:

Code
1927 files loaded
Done loading project in 365ms
Project's base path: C:\htdocs\SIG\SGPG\
C:\projects\cb\src\sdk\cbproject.cpp::void cbProject::CalculateCommonTopLevelPath():539  took : 50 ms
Project's common toplevel path: C:\htdocs\SIG\SGPG\
C:\projects\cb\src\sdk\cbproject.cpp::void cbProject::CalculateCommonTopLevelPath():579  took : 130 ms
C:\projects\cb\src\sdk\cbproject.cpp::void cbProject::CalculateCommonTopLevelPath():580: Num Files: 1927; Num Cache Misses: 0

(Total project loading time: 2.50s)  8) ;D 8) ;D 8)

I'm attaching the full patch, including my previous changes. Enjoy.

[attachment deleted by admin]
« Last Edit: November 03, 2012, 03:47:20 pm by rickg22 »

Offline dmoore

  • Developer
  • Lives here!
  • *****
  • Posts: 1576
Hmm... maybe that code will make my patch redundant, but I like having a separate cache for relative paths. It could become useful later, and it does separate all the path checking.

Even if we do use the cache, it would make sense to use the logic to create the path (and add it to the cache) when there are cache missses to reduce the number of MakeRelativeTo calls.

Offline dmoore

  • Developer
  • Lives here!
  • *****
  • Posts: 1576
OK guys, I found another bottleneck in project loading!

Another good catch. But instead of having an environment setting, why not make this a project setting and/or offer an annoying dialog for projects with a file count above a certain size to disable the check for the project? In general, I would think that if you have one big project open and some smaller ones open, you would still want the checks on the smaller projects.

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
Actually, I was thinking on moving the whole file checking stuff to an event triggered by opening a node in the project tree. In the meantime, I think that having a user-modifiable setting will suffice. Users accustomed to having small projects won't bother changing the setting (which defaults to 'always check'), and users accustomed to large projects are free to trade between load speed and file checking.

We could also add a treshold number to define small and large projects (i.e. 100, 200 files?). In any case, at least now we have a choice. We can refine it later in further commits.

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283

Even if we do use the cache, it would make sense to use the logic to create the path (and add it to the cache) when there are cache missses to reduce the number of MakeRelativeTo calls.

Yes, that's exactly what the cache does. Everytime it doesn't find a path, it creates it in memory and adds it to the list; (with the exception of files outside the base path i.e. the Common Top Level Path, because otherwise we could run in the risk of adding a parent path to the whole tree; actually, I think that extra check is unnecessary; we could cache external paths without problem, but we'd need to do some tests after disabling that check.)

Offline dmoore

  • Developer
  • Lives here!
  • *****
  • Posts: 1576

Even if we do use the cache, it would make sense to use the logic to create the path (and add it to the cache) when there are cache missses to reduce the number of MakeRelativeTo calls.

Yes, that's exactly what the cache does.

Sorry, wasn't clear. What I meant was whenever you have a cache miss, instead of using MakeRelativeTo in all cases, use the faster method when possible.

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283

Sorry, wasn't clear. What I meant was whenever you have a cache miss, instead of using MakeRelativeTo in all cases, use the faster method when possible.

Ah, my apologies. I forgot to explain how my Cache works, and what I consider a "cache miss".

It's not really a full path cache; it's a PARENT PATH Cache. And it's recursive :D

Let's say that your filename has a full path + name like this:

C:\myProjects\myProject1\src\plugins\myplugin\includes\myheader.h

What my cache does is searching first for the parent path, and if it doesn't find it, go to its own parent path, and so on, like this:

Code
C:\myProjects\myProject1\src\plugins\myplugin\includes
C:\myProjects\myProject1\src\plugins\myplugin
C:\myProjects\myProject1\src\plugins
C:\myProjects\myProject1\src
C:\myProjects\myProject1

At that point, C:\myProjects\myProject1 is actually our project common top level path. So it's found (which is not considered a cache miss, because we can construct the relative path without resorting to wxFilename's functions). Then, as the cache returns from its call stack, adds the items in the reverse order in which they were searched:

Code
C:\myProjects\myProject1\src => src
C:\myProjects\myProject1\src\plugins => src\plugins
C:\myProjects\myProject1\src\plugins\myplugin => src\plugins\myplugin
C:\myProjects\myProject1\src\plugins\myplugin\includes => src\plugins\myplugin\includes
C:\myProjects\myProject1\src\plugins\myplugin\includes\myheader.h => src\plugins\myplugin\includes\myheader.h (not added to the cache, just calculated)

The actual filename, C:\myProjects\myProject1\src\plugins\myplugin\includes\myheader.h is never added to the cache, because it will only be requested ONCE. However,

Code
C:\myProjects\myProject1\src\plugins\myplugin\includes\myheader2.h
C:\myProjects\myProject1\src\plugins\myplugin\includes\myheader3.h
C:\myProjects\myProject1\src\plugins\myplugin\includes\myheader4.h

WILL be requested. Their common denominator is their path, which is already cached. And so are their parent directories.

This means that at most, ONE cache miss will be present: The project's common top level path itself, but that's already added by default!  8)

The real cache misses will be files outside the project's common top level path, which require using wxFilename anyway, so they're not added to the cache. BUT I might add them, I just wasn't sure if adding them would trigger some misbehavior in the cache...
« Last Edit: November 04, 2012, 12:15:52 am by rickg22 »

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
I just realized something.

With minor modifications to my cache's constructor, we can add all the parent paths from the project's CTLP to the root directory, and we will cover ALL cases!

Code
C:\projects\myProject1 => ''
C:\projects => '..'
C:\ => '..\..'

which means that

Code
C:\somepathoutside\someotherpath\whatever\myexternalfile.cpp => ..\..\somepathoutside\someotherpath\whatever\myexternalfile.cpp
At this point, using wxFilename won't be necessary because it's already certain that the file we're searching for and our project reside in the same unit (that's the big "if" that was added in the latest SVN, for cases with different units, the absolute paths are added verbatim). And since they're in the same unit, we have no cache misses AT ALL! For ANY FILE, either inside or outside our CTLP.

(The only caveat is if we have a symlink circular reference around... I'd need to think about how to solve that)
« Last Edit: November 04, 2012, 12:44:50 am by rickg22 »

Offline rickg22

  • Lives here!
  • ****
  • Posts: 2283
OK, I'm adding a *disabled* version of the path cache to trunk (to enable it, you have to #define use_path_cache_for_ctlp). I moved the path cache code to a separate file, so the code changes can be more obvious.

I'm also adding the "check for missing files" option.

Should I commit, or just post another patch? The code's ready.
« Last Edit: November 04, 2012, 02:41:39 am by rickg22 »

Offline MortenMacFly

  • Administrator
  • Lives here!
  • *****
  • Posts: 9694
Should I commit, or just post another patch? The code's ready.
It is tested against UNC path's again and also against different project tree layout options?

BTW: I had a few days off and I am trying to catch up - what's with the other patches in the other posts of this thread? Does this one cover all?

Oh - and btw: I don't know if I personally would sacrifice the read-only flag for speed. When working with SVN controlled projects its often vital to see if a file is write-protected (i.e. locked). So I would definitely like to see a global option to be able to always have this check on in one place.
Compiler logging: Settings->Compiler & Debugger->tab "Other"->Compiler logging="Full command line"
C::B Manual: https://www.codeblocks.org/docs/main_codeblocks_en.html
C::B FAQ: https://wiki.codeblocks.org/index.php?title=FAQ