Topic: (excessively) large source editing

heilpern

  • Single posting newcomer
  • *
  • Posts: 7
(excessively) large source editing
« on: October 25, 2007, 05:55:58 pm »
Using SVN 4554, the precompiled build for Windows...

I opened a source file someone provided me with... this file is HUGE... 129 kB. In CodeBlocks, the editor does not display any of its contents; instead it acts as if the file contains a single empty line.

I know the file is ridiculously large and certainly not typical of what a code editor should ever see, but if the editor does encounter such a file, I'd hope it would either still handle it or display an error.

Is this a known issue?

Biplab

  • Developer
  • Lives here!
  • *****
  • Posts: 1874
    • Biplab's Blog
Re: (excessively) large source editing
« Reply #1 on: October 25, 2007, 06:20:31 pm »

129 kB is not that large, but I suspect the file's encoding is not being detected properly. Which encoding is detected? (You can see it in the status bar.)

Could you try opening the same file in a precompiled nightly older than revision 4548?

Also, if possible, please post the zipped file. That would help us trace the error.
Be a part of the solution, not a part of the problem.

heilpern

Re: (excessively) large source editing
« Reply #2 on: October 25, 2007, 07:10:14 pm »
The file encoding is interpreted as "UTF-8". Is that likely to be the problem?

I haven't had a chance to try an older binary yet; please tell me if you think that would still be helpful. I've isolated the offending line of code (a comment) that causes the file to be interpreted as UTF-8, and verified that a new file containing just this line still can't be opened. I've attached a zip of the file here.

[attachment deleted by admin]

Biplab

Re: (excessively) large source editing
« Reply #3 on: October 25, 2007, 07:31:49 pm »

Thanks a lot for pointing out the offending line. The encoding detection code has been improved, but it's still not perfect and may give some false positives.

It finds what looks like a UTF-8 signature in some bytes and assumes that the whole file is UTF-8 encoded. We don't use any charset detection library for this purpose, so false positives like this can appear.
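To illustrate the kind of false positive described above, here is a minimal Python sketch contrasting a naive heuristic, which declares a buffer UTF-8 as soon as it sees one well-formed multibyte sequence, with a strict check that requires the whole buffer to decode. This is not C::B's actual detection code; both function names are hypothetical, and the naive version only looks at 2-byte sequences to keep it short.

```python
def looks_like_utf8_naive(data: bytes) -> bool:
    """Hypothetical heuristic: True if ANY well-formed 2-byte UTF-8 sequence appears.

    Checks only 2-byte sequences (lead 0xC2-0xDF, continuation 0x80-0xBF),
    which is enough to show how one lucky byte pair triggers a false positive.
    """
    for i in range(len(data) - 1):
        lead, cont = data[i], data[i + 1]
        if 0xC2 <= lead <= 0xDF and 0x80 <= cont <= 0xBF:
            return True
    return False


def is_valid_utf8(data: bytes) -> bool:
    """Strict check: the entire buffer must decode as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # 0xC2 0xB1 is a well-formed UTF-8 pair, but the 0x93 that follows
    # cannot start a UTF-8 sequence, so the whole buffer is not valid UTF-8.
    sample = b"// \xc2\xb1 \x93\xfa"
    print(looks_like_utf8_naive(sample))  # True
    print(is_valid_utf8(sample))          # False
```

The strict check is roughly what the later conversion step has to succeed at anyway, so running it before committing to UTF-8 would presumably catch the mismatch before the editor ends up with an empty buffer.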

Could you do the following steps?
1) Save the file (without the offending line) in UTF-8 encoding first.
2) Now write the offending line and press save.
3) Close and reopen the file to see whether C::B detects the encoding and reopens the file properly. Also check whether the file was saved correctly.
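To see what steps 1 to 3 should produce on disk, independently of the editor, a minimal Python sketch like the following writes a file as UTF-8, appends a non-ASCII comment line, and reads the raw bytes back to confirm they decode to the same text. The function name `save_append_reload` and any file name or comment text you pass in are placeholders, not the original attachment.

```python
from pathlib import Path


def save_append_reload(path: Path, body: str, extra_line: str) -> bool:
    """Mimic steps 1-3: save as UTF-8, append a line, save, then reload and compare.

    Uses byte-level I/O so no newline translation happens on Windows.
    """
    path.write_bytes(body.encode("utf-8"))       # step 1: save without the extra line
    with path.open("ab") as f:                   # step 2: add the line and save again
        f.write(extra_line.encode("utf-8"))
    raw = path.read_bytes()                      # step 3: "reopen" the file from disk
    return raw.decode("utf-8") == body + extra_line
```

For example, `save_append_reload(Path("sample.cpp"), "int main() { return 0; }\n", "// ﾂｱ 日本\n")` should return `True`; if the editor's saved file differs from this byte-for-byte, the save step is the problem rather than the detection.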

Biplab

Re: (excessively) large source editing
« Reply #4 on: October 25, 2007, 08:12:48 pm »
After some tests, it turns out that bytes 15 and 16 pass the UTF-8 signature test, so C::B assumes the whole file is UTF-8. The encoding conversion routine then fails, which is why you're greeted with a blank line.
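This failure mode can be reproduced in Python with a made-up CP932 comment (not the attached file, and not at the same byte offsets): two half-width katakana characters encode to the byte pair 0xC2 0xB1, which also happens to be a well-formed UTF-8 sequence, while the kanji that follow are not valid UTF-8. A detector that trusts the pair picks UTF-8 and the conversion then fails; the hypothetical `load_as_detected` shows how silently swallowing that error yields an empty buffer.

```python
def load_as_detected(data: bytes, detected: str) -> str:
    """Hypothetical loader: convert with the detected encoding, return '' on failure."""
    try:
        return data.decode(detected)
    except UnicodeDecodeError:
        return ""  # silent failure: the editor would show a single empty line


comment = "// ﾂｱ 日本"          # made-up sample, not the original offending line
data = comment.encode("cp932")  # half-width katakana ﾂ, ｱ become 0xC2, 0xB1

print(data[3:5])                               # b'\xc2\xb1' (also valid UTF-8 for "±")
print(repr(load_as_detected(data, "utf-8")))   # '' because 0x93 (from 日) breaks UTF-8
print(load_as_detected(data, "cp932"))         # // ﾂｱ 日本
```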

The file is actually encoded in CP932, an encoding that C::B cannot detect.
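Since CP932 cannot be detected, one possible approach (an assumption here, not something C::B is stated to do) is to accept UTF-8 only when the whole buffer decodes strictly, and otherwise try a short, ordered list of candidate encodings. The function `decode_with_fallback` and its default candidate list are illustrative only.

```python
from typing import Optional, Sequence, Tuple


def decode_with_fallback(
    data: bytes, candidates: Sequence[str] = ("utf-8", "cp932")
) -> Optional[Tuple[str, str]]:
    """Return (encoding, text) for the first candidate that decodes the whole buffer.

    Order matters: strict UTF-8 is tried first because it rarely succeeds by accident
    on non-UTF-8 data, whereas a codec like latin-1 decodes ANY byte string and
    would only make sense as a last resort.
    """
    for enc in candidates:
        try:
            return enc, data.decode(enc)
        except UnicodeDecodeError:
            continue
    return None  # nothing matched: report an error rather than show a blank buffer
```

With the made-up CP932 comment from the previous example, UTF-8 is rejected and `("cp932", "// ﾂｱ 日本")` comes back; a buffer that is genuinely UTF-8 still returns `"utf-8"` first.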

I'll try to add an option so that users can load a file with a predefined encoding that they specify themselves. :)
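The option proposed here could behave roughly like the following Python sketch: a user-chosen encoding overrides detection, and a failed conversion raises an error that the UI could display (which is what the original post asked for) instead of producing an empty document. `open_source_file`, `EncodingError`, and the strict UTF-8 default are hypothetical, not C::B's own code.

```python
from pathlib import Path
from typing import Optional


class EncodingError(Exception):
    """Raised when a file cannot be converted with the chosen encoding."""


def open_source_file(path: Path, encoding: Optional[str] = None) -> str:
    """Load a file, honouring a user-chosen encoding; fall back to strict UTF-8.

    Raises EncodingError instead of returning an empty buffer on failure.
    An unknown encoding name still raises Python's own LookupError.
    """
    data = path.read_bytes()
    enc = encoding or "utf-8"  # hypothetical default when the user picks nothing
    try:
        return data.decode(enc)
    except UnicodeDecodeError as exc:
        raise EncodingError(
            f"{path.name}: cannot decode as {enc} ({exc.reason} at byte {exc.start})"
        ) from exc
```

Opening the made-up CP932 sample with no encoding would then report something like `sample.cpp: cannot decode as utf-8 (invalid start byte at byte 6)`, while `open_source_file(path, "cp932")` loads it normally.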
« Last Edit: October 25, 2007, 08:14:30 pm by Biplab »

heilpern

Re: (excessively) large source editing
« Reply #5 on: October 25, 2007, 10:15:41 pm »
Actually, I don't know how to specify the encoding when saving a file. If you can tell me how, I'd be happy to do what you asked (assuming you can still use that info).
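One possible workaround, not suggested in the thread itself, is to convert the file to UTF-8 outside the editor before opening it. A minimal Python sketch, assuming the source really is CP932 as diagnosed above; the function name `reencode` and the file names are placeholders.

```python
from pathlib import Path


def reencode(src: Path, dst: Path, from_enc: str = "cp932", to_enc: str = "utf-8") -> None:
    """Rewrite src into dst using a different encoding.

    Decoding is strict, so undecodable bytes raise UnicodeDecodeError instead of
    being silently dropped. Byte-level I/O keeps the original line endings.
    """
    text = src.read_bytes().decode(from_enc)
    dst.write_bytes(text.encode(to_enc))
```

For example, `reencode(Path("original.cpp"), Path("original_utf8.cpp"))` leaves the original untouched and writes a UTF-8 copy that strict UTF-8 detection should accept.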

The editor I was able to open the original file in is Source Insight. I don't know whether any C::B developers have seen that tool, but it's a really nice (commercial) programmer's editor with features worth emulating.