Author Topic: EOL issue, the EOL should be automatically detected, and make consistent  (Read 32580 times)

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5910
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
I just want to have a feature request:
When I open a LF file, when I hit enters, it should add many lines of LF.
When I open a CRLF file, when I hit enters, it should add many lines of CRLF.
But currently, I see that I can't do that, the EOL is fixed, and either I result a mixed EOF file, or I need to set the EOL in the editor when I open each projects.

This is really annoying.

I have see such report 6 years ago, see: Feature Request: Auto EOL Mode, but I don't think it was implemented.

BTW: I see that Notepad++ have such feature. Will C::B finally implement it?

EDIT: I also find on in the feature request:
Leave EOL format be

EDIT: I just tested the SciTE 3.2.1, and it works as what I want "leave as it is".
« Last Edit: August 24, 2012, 11:31:21 am by ollydbg »
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5910
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
I dig the code a little, and see that the EOL is set in the editor before the file is loaded.

sdk\cbeditor.cpp

Code
// static
void cbEditor::InternalSetEditorStyleBeforeFileOpen(cbStyledTextCtrl* control)
{
    if (!control)
        return;

    ......
    ......


    // NOTE: duplicate line in editorconfigurationdlg.cpp (ctor)
    control->SetEOLMode(                  mgr->ReadInt(_T("/eol/eolmode"),                   platform::windows ? wxSCI_EOL_CRLF : wxSCI_EOL_LF)); // Windows takes CR+LF, other platforms LF only
    control->SetScrollWidthTracking(      mgr->ReadBool(_T("/margin/scroll_width_tracking"), false));
    control->SetMultipleSelection(        mgr->ReadBool(_T("/selection/multi_select"),       false));
    control->SetAdditionalSelectionTyping(mgr->ReadBool(_T("/selection/multi_typing"),       false));
    if (mgr->ReadBool(_T("/selection/use_vspace"), false))
        control->SetVirtualSpaceOptions(wxSCI_SCVS_RECTANGULARSELECTION | wxSCI_SCVS_USERACCESSIBLE);
    else
        control->SetVirtualSpaceOptions(wxSCI_SCVS_NONE);
}
Here, it just read the "eol/eolmode" from the configure file.

So, my question is: If we have add an option like "leave the eol as it is", then we need to Set the EOL after the file is loaded, right?

If I remove the line "control->SetEOLMode....", I think the control should have a eol which is consistent to the opened file?
« Last Edit: August 24, 2012, 11:07:08 am by ollydbg »
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5910
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Hi, guys, I dig it further, and I found that the SciTE use this code:

In its source: C:\scite321\scite\src\SciTEIO.cxx

Code
void SciTEBase::DiscoverEOLSetting() {
SetEol();
if (props.GetInt("eol.auto")) {
int linesCR;
int linesLF;
int linesCRLF;
CountLineEnds(linesCR, linesLF, linesCRLF);
if (((linesLF >= linesCR) && (linesLF > linesCRLF)) || ((linesLF > linesCR) && (linesLF >= linesCRLF)))
wEditor.Call(SCI_SETEOLMODE, SC_EOL_LF);
else if (((linesCR >= linesLF) && (linesCR > linesCRLF)) || ((linesCR > linesLF) && (linesCR >= linesCRLF)))
wEditor.Call(SCI_SETEOLMODE, SC_EOL_CR);
else if (((linesCRLF >= linesLF) && (linesCRLF > linesCR)) || ((linesCRLF > linesLF) && (linesCRLF >= linesCR)))
wEditor.Call(SCI_SETEOLMODE, SC_EOL_CRLF);
}
}

So, it change the EOL dynamically.

BTW: currently, by default, under Windows, it use CRLF, see:
[scintilla] default eol mode on OS X - Google Groups
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline thomas

  • Administrator
  • Lives here!
  • *****
  • Posts: 3979
I'm confused (though I think you want the correct thing, but brain is not working well during the last weeks, so I'm not sure).

To clarify, with "add many LF" do you mean "change editor's EOL mode to file's EOL mode" or do you mean "change all line endings in the file"? You basically want anything typed into a file to be whatever the file was before, correct?

Also, what if the file, like many files from OSS projects is mixed already?
"We should forget about small efficiencies, say about 97% of the time: Premature quotation is the root of public humiliation."

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5910
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
I'm confused (though I think you want the correct thing, but brain is not working well during the last weeks, so I'm not sure).
Hi, thomas, sorry about the confusion, I'm not a English native speaker, so my English may be poor.

Quote
To clarify, with "add many LF" do you mean "change editor's EOL mode to file's EOL mode" or do you mean "change all line endings in the file"?
I say "add many LF" which means when I hit the enter key many times, the editor add many lines which have LF styled EOL.

Here is an example, when I open a file, which has LF styled EOL, by default, if I have the configuration with "CR LF" in the editor setting, the new added line in the editor will have "CR LF" ending, so, the file becomes "LF" and "CR LF" mixed.

See the image below: (The new added empty line have CRLF styled EOL, but the original file have all LF styled EOL)


On the other side, if you set the editor as "LF" format, then you will have the same issue that an original CRLF styled file will contains many LF styled lines.
Quote
You basically want anything typed into a file to be whatever the file was before, correct?
Yes, What I want is to let the editor automatically detect which EOL is used in the original file, then the new added line should still use same type of EOL.

Currently, both SciTE and NotePad++ has this feature, so I believe C::B should have this feature too.
Quote
Also, what if the file, like many files from OSS projects is mixed already?
For how to detect a files EOL, I think it is simple, just scan the first lines of the file, and set the EOL of the whole file as it is, I think SciTe use this kind of detection.

OK about my brain?  :)

EDIT:
It looks like to determine which EOL is used for a mixed EOL files, SciTe use a voting
Code
void SciTEBase::CountLineEnds(int &linesCR, int &linesLF, int &linesCRLF) {
linesCR = 0;
linesLF = 0;
linesCRLF = 0;
int lengthDoc = LengthDocument();
char chPrev = ' ';
TextReader acc(wEditor);
char chNext = acc.SafeGetCharAt(0);
for (int i = 0; i < lengthDoc; i++) {
char ch = chNext;
chNext = acc.SafeGetCharAt(i + 1);
if (ch == '\r') {
if (chNext == '\n')
linesCRLF++;
else
linesCR++;
} else if (ch == '\n') {
if (chPrev != '\r') {
linesLF++;
}
} else if (i > 1000000) {
return;
}
chPrev = ch;
}
}


« Last Edit: August 24, 2012, 04:43:49 pm by ollydbg »
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline dmoore

  • Developer
  • Lives here!
  • *****
  • Posts: 1576
Looks like you are on the right track. Why don't you make a patch? For mixed files, voting makes sense.

Do we have room on the status bar for the EOL mode? (Or is it there already) Some sort of warning, especially for mixed sources would be nice (though not everyone likes warnings and they shouldn't be intrusive).

Offline thomas

  • Administrator
  • Lives here!
  • *****
  • Posts: 3979
OK about my brain?  :)
Yours is hopefully ok, it's mine that isn't good.

The approach sounds very good, though I would like to also reserve a way to always have a particular encoding and line ending (for example I'm using UTF-8 and LF everywhere, for everything, even under Windows -- don't want to have the editor interfere with this in any way, even hypothetically).

Maybe these options would be a good solution:
  • use system default
  • use XYZ line ending
  • use file's line ending (and fall back to system default)

To detect the encoding, I'd just scan over all lines (does not really take that much longer!) and take whichever encoding occurs more often. On a tie, fall back to system default.
"We should forget about small efficiencies, say about 97% of the time: Premature quotation is the root of public humiliation."

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5910
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Thank you thomas and dmoore for the constructive suggestion.

OK, I will take the job to implement this, but before that, I need to firstly comprehend what the current way do. (Now, I even do not understand what does "Ensure consistent EOLs" used for, so it will take some time to read the source)
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline dmoore

  • Developer
  • Lives here!
  • *****
  • Posts: 1576
(Now, I even do not understand what does "Ensure consistent EOLs" used for, so it will take some time to read the source)

Converts all EOLs in the doc to whatever is the currently set EOL mode of that editor when you save.

Note that I put in some features related to EOLs in the editortweaks plugin so users could temporarly change from the default manually, per editor. Would be nice not to break that, but I suspect what you are doing will be fine. (If installed, those options are in Edit->Editor tweaks)

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5910
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Here is the draft patch, I add an item "AUTO" in the editor setting, this has the value "3".
The changed file are: src\sdk\cbeditor.cpp and src\sdk\resources\editor_configuration.xrc
Note, I have problems to generate the patch file, because wxsmith change the resource file a lot(strange, it convert the spaces to tabs, I manually change them back to spaces), so I add the xrc file. Both of the files were generated by tortoiseGit, which have LF format.
I added a static function, which I think is not a good idea, right?
Comments are welcome. Thanks.

EDIT: I remove the old attachment, and the patch against the latest svn revision is below:
Code
Index: cbeditor.cpp
===================================================================
--- cbeditor.cpp (revision 8249)
+++ cbeditor.cpp (working copy)
@@ -731,6 +731,39 @@
 
 END_EVENT_TABLE()
 
+
+// Count the EOL in the opened file
+static void CountLineEnds(cbStyledTextCtrl* control, int &linesCR, int &linesLF, int &linesCRLF)
+{
+    linesCR = 0;
+    linesLF = 0;
+    linesCRLF = 0;
+
+    //int GetCharAt(int pos)
+    //int GetLength() const;
+    int lengthDoc = control->GetLength();
+    char chPrev = ' ';
+
+    char chNext = control->GetCharAt(0);
+    for (int i = 0; i < lengthDoc; i++) {
+        char ch = chNext;
+        chNext = control->GetCharAt(i + 1);
+        if (ch == '\r') {
+            if (chNext == '\n')
+                linesCRLF++;
+            else
+                linesCR++;
+        } else if (ch == '\n') {
+            if (chPrev != '\r') {
+                linesLF++;
+            }
+        } else if (i > 1000000) {
+            return;
+        }
+        chPrev = ch;
+    }
+}
+
 // class constructor
 cbEditor::cbEditor(wxWindow* parent, const wxString& filename, EditorColourSet* theme)
     : EditorBase(parent, filename),
@@ -1557,7 +1590,6 @@
         control->SetMarginWidth(C_CHANGEBAR_MARGIN, 0);
 
     // NOTE: duplicate line in editorconfigurationdlg.cpp (ctor)
-    control->SetEOLMode(                  mgr->ReadInt(_T("/eol/eolmode"),                   platform::windows ? wxSCI_EOL_CRLF : wxSCI_EOL_LF)); // Windows takes CR+LF, other platforms LF only
     control->SetScrollWidthTracking(      mgr->ReadBool(_T("/margin/scroll_width_tracking"), false));
     control->SetMultipleSelection(        mgr->ReadBool(_T("/selection/multi_select"),       false));
     control->SetAdditionalSelectionTyping(mgr->ReadBool(_T("/selection/multi_typing"),       false));
@@ -1575,6 +1607,23 @@
 
     ConfigManager* mgr = Manager::Get()->GetConfigManager(_T("editor"));
 
+    // set the EOL, fall back value: Windows takes CR+LF, other platforms LF only
+    int eolMode = mgr->ReadInt(_T("/eol/eolmode"), platform::windows ? wxSCI_EOL_CRLF : wxSCI_EOL_LF);
+    if (eolMode == 3) //auto detect the eol
+    {
+        int linesCR;
+        int linesLF;
+        int linesCRLF;
+        CountLineEnds(control, linesCR, linesLF, linesCRLF);
+        if (((linesLF >= linesCR) && (linesLF > linesCRLF)) || ((linesLF > linesCR) && (linesLF >= linesCRLF)))
+            eolMode = wxSCI_EOL_LF;
+        else if (((linesCR >= linesLF) && (linesCR > linesCRLF)) || ((linesCR > linesLF) && (linesCR >= linesCRLF)))
+            eolMode = wxSCI_EOL_CR;
+        else if (((linesCRLF >= linesLF) && (linesCRLF > linesCR)) || ((linesCRLF > linesLF) && (linesCRLF >= linesCR)))
+            eolMode = wxSCI_EOL_CRLF;
+    }
+    control->SetEOLMode(eolMode);
+
     // Interpret #if/#else/#endif to grey out code that is not active
     control->SetProperty(_T("lexer.cpp.track.preprocessor"), mgr->ReadBool(_T("/track_preprocessor"), true) ? _T("1") : _T("0"));
 
Index: resources/editor_configuration.xrc
===================================================================
--- resources/editor_configuration.xrc (revision 8249)
+++ resources/editor_configuration.xrc (working copy)
@@ -200,6 +200,7 @@
                                                 <item>CR LF</item>
                                                 <item>CR</item>
                                                 <item>LF</item>
+                                                <item>AUTO</item>
                                               </content>
                                               <style>wxCB_READONLY</style>
                                             </object>


« Last Edit: August 27, 2012, 04:59:40 pm by ollydbg »
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5910
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: EOL issue, the EOL should be automatically detected, and make consistent
« Reply #10 on: August 27, 2012, 11:38:32 am »
Damn, I found is that my local git clone was merged around 2012-08-10 (it is the code base my git patch was against on), and after that day( in 2012-08-14) there is a new svn commit 8230 in the xrc file, see:
WebSVN - codeblocks - Path Comparison - /trunk/src/sdk/resources/editor_configuration.xrc Rev 8142 and /trunk/src/sdk/resources/editor_configuration.xrc Rev 8230

So, I think lots of the "extra change"(I believed the bug of wxsmith) in my attachment file in the previous post did the same thing as the commit 8230.

Sorry about the noise. :-[
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline ironhead

  • Almost regular
  • **
  • Posts: 210
Re: EOL issue, the EOL should be automatically detected, and make consistent
« Reply #11 on: August 27, 2012, 08:20:43 pm »
I've manually applied the patch against r.8250 (some of the offsets were wrong) and it's working really well!  Thank you for implementing this feature, I hope to see it applied to the official repository. ;)

Offline dmoore

  • Developer
  • Lives here!
  • *****
  • Posts: 1576
Re: EOL issue, the EOL should be automatically detected, and make consistent
« Reply #12 on: August 28, 2012, 05:03:06 am »
I don't have time to test right now, but the code looks good. I have one small suggestion to use the info window to alert the user to the outcome of the voting logic if there are mixed line endings. In your logic that sets the line ending, you could modify it as follows:

Code
        if (((linesLF >= linesCR) && (linesLF > linesCRLF)) || ((linesLF > linesCR) && (linesLF >= linesCRLF)))
        {
            eolMode = wxSCI_EOL_LF;
            if((linesCR>0) || (linesCRLF>0))
            {
                wxBell();
                InfoWindow::Display(_("Mixed Line Endings"), _("Mixed line endings found, setting mode \"LF\""), 1000);
            }
        }
        else if (((linesCR >= linesLF) && (linesCR > linesCRLF)) || ((linesCR > linesLF) && (linesCR >= linesCRLF)))
        {
            eolMode = wxSCI_EOL_CR;
            if((linesLF>0) || (linesCRLF>0))
            {
                wxBell();
                InfoWindow::Display(_("Mixed Line Endings"), _("Mixed line endings found, setting mode \"CR\""), 1000);
            }
        }
        else if (((linesCRLF >= linesLF) && (linesCRLF > linesCR)) || ((linesCRLF > linesLF) && (linesCRLF >= linesCR)))
        {
            eolMode = wxSCI_EOL_CRLF;
            if((linesCR>0) || (linesLF>0))
            {
                wxBell();
                InfoWindow::Display(_("Mixed Line Endings"), _("Mixed line endings found, setting mode \"CR-LF\""), 1000);
            }
        }
« Last Edit: August 28, 2012, 05:09:37 am by dmoore »

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 5910
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: EOL issue, the EOL should be automatically detected, and make consistent
« Reply #13 on: August 28, 2012, 07:59:13 am »
Thanks ironhead for the test.
Thanks dmoore for the suggestion, below is the refined patch. (combined your suggestion, code style and comments refined)
Code
Index: cbeditor.cpp
===================================================================
--- cbeditor.cpp (revision 8249)
+++ cbeditor.cpp (working copy)
@@ -731,6 +731,42 @@
 
 END_EVENT_TABLE()
 
+// Count lines of EOL style in the opened file
+static void CountLineEnds(cbStyledTextCtrl* control, int &linesCR, int &linesLF, int &linesCRLF)
+{
+    linesCR = 0;
+    linesLF = 0;
+    linesCRLF = 0;
+
+    int lengthDoc = control->GetLength();
+    const int maxLineNumber = 1000000;
+    char chPrev = ' ';
+    char chNext = control->GetCharAt(0);
+    for (int i = 0; i < lengthDoc; i++)
+    {
+        char ch = chNext;
+        chNext = control->GetCharAt(i + 1);
+        if (ch == '\r')
+        {
+            if (chNext == '\n')
+                linesCRLF++;
+            else
+                linesCR++;
+        }
+        else if (ch == '\n')
+        {
+            if (chPrev != '\r')
+            {
+                linesLF++;
+            }
+        }
+        else if (i > maxLineNumber)     // stop the loop if the file contains too many lines
+            return;
+
+        chPrev = ch;
+    }
+}
+
 // class constructor
 cbEditor::cbEditor(wxWindow* parent, const wxString& filename, EditorColourSet* theme)
     : EditorBase(parent, filename),
@@ -1557,7 +1593,6 @@
         control->SetMarginWidth(C_CHANGEBAR_MARGIN, 0);
 
     // NOTE: duplicate line in editorconfigurationdlg.cpp (ctor)
-    control->SetEOLMode(                  mgr->ReadInt(_T("/eol/eolmode"),                   platform::windows ? wxSCI_EOL_CRLF : wxSCI_EOL_LF)); // Windows takes CR+LF, other platforms LF only
     control->SetScrollWidthTracking(      mgr->ReadBool(_T("/margin/scroll_width_tracking"), false));
     control->SetMultipleSelection(        mgr->ReadBool(_T("/selection/multi_select"),       false));
     control->SetAdditionalSelectionTyping(mgr->ReadBool(_T("/selection/multi_typing"),       false));
@@ -1575,6 +1610,53 @@
 
     ConfigManager* mgr = Manager::Get()->GetConfigManager(_T("editor"));
 
+    // set the EOL, fall back value: Windows takes CR+LF, other platforms LF only
+    int eolMode = mgr->ReadInt(_T("/eol/eolmode"), platform::windows ? wxSCI_EOL_CRLF : wxSCI_EOL_LF);
+
+    // The code snippet of auto detect EOL is copied from SciTE's source, CountLineEnds() function
+    // scans the current source file, and counts lines of each EOL style, then finally sets the EOL
+    // by voting logic. In the case of mixed EOL files, we give the user a beep and InfoWindow notification.
+    if (eolMode == 3) //auto detect the EOL
+    {
+        int linesCR;
+        int linesLF;
+        int linesCRLF;
+
+        // count lines of each EOL style
+        CountLineEnds(control, linesCR, linesLF, linesCRLF);
+
+        //voting logic
+        unsigned int delay = 2000;
+        if (((linesLF >= linesCR) && (linesLF > linesCRLF)) || ((linesLF > linesCR) && (linesLF >= linesCRLF)))
+        {
+            eolMode = wxSCI_EOL_LF;
+            if((linesCR>0) || (linesCRLF>0))
+            {
+                wxBell();
+                InfoWindow::Display(_("Mixed Line Endings"), _("Mixed line endings found, setting mode \"LF\""), delay);
+            }
+        }
+        else if (((linesCR >= linesLF) && (linesCR > linesCRLF)) || ((linesCR > linesLF) && (linesCR >= linesCRLF)))
+        {
+            eolMode = wxSCI_EOL_CR;
+            if((linesLF>0) || (linesCRLF>0))
+            {
+                wxBell();
+                InfoWindow::Display(_("Mixed Line Endings"), _("Mixed line endings found, setting mode \"CR\""), delay);
+            }
+        }
+        else if (((linesCRLF >= linesLF) && (linesCRLF > linesCR)) || ((linesCRLF > linesLF) && (linesCRLF >= linesCR)))
+        {
+            eolMode = wxSCI_EOL_CRLF;
+            if((linesCR>0) || (linesLF>0))
+            {
+                wxBell();
+                InfoWindow::Display(_("Mixed Line Endings"), _("Mixed line endings found, setting mode \"CR-LF\""), delay);
+            }
+        }
+    }
+    control->SetEOLMode(eolMode);
+
     // Interpret #if/#else/#endif to grey out code that is not active
     control->SetProperty(_T("lexer.cpp.track.preprocessor"), mgr->ReadBool(_T("/track_preprocessor"), true) ? _T("1") : _T("0"));
 
Index: resources/editor_configuration.xrc
===================================================================
--- resources/editor_configuration.xrc (revision 8249)
+++ resources/editor_configuration.xrc (working copy)
@@ -200,6 +200,7 @@
                                                 <item>CR LF</item>
                                                 <item>CR</item>
                                                 <item>LF</item>
+                                                <item>AUTO</item>
                                               </content>
                                               <style>wxCB_READONLY</style>
                                             </object>


I would like to show the filename of the control in the InfoWindow::Display(), I see cbEditor(editorbase) have a member function:
Code
virtual const wxString& GetShortName() const { return m_Shortname; }
What a pity is that
Code
// static
void cbEditor::InternalSetEditorStyleAfterFileOpen(cbStyledTextCtrl* control)
This is a static member function, so no access to the filename.  :(
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline MortenMacFly

  • Administrator
  • Lives here!
  • *****
  • Posts: 9694
Re: EOL issue, the EOL should be automatically detected, and make consistent
« Reply #14 on: August 28, 2012, 08:02:50 am »
... you shouldn't not forget to do nothing (!) if the file dos not contain a line-feed, btw. For this to work, you can return true/false in the method you've introduced.
Compiler logging: Settings->Compiler & Debugger->tab "Other"->Compiler logging="Full command line"
C::B Manual: https://www.codeblocks.org/docs/main_codeblocks_en.html
C::B FAQ: https://wiki.codeblocks.org/index.php?title=FAQ