Author Topic: EOL issue, the EOL should be automatically detected, and make consistent  (Read 34573 times)

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 6034
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: EOL issue, the EOL should be automatically detected, and make consistent
« Reply #15 on: August 28, 2012, 09:47:14 am »
... you shouldn't not forget to do nothing (!) if the file dos not contain a line-feed, btw. For this to work, you can return true/false in the method you've introduced.
Thanks, here is an improved patch, which handle them in the "else" clause.

Code
Index: cbeditor.cpp
===================================================================
--- cbeditor.cpp (revision 8249)
+++ cbeditor.cpp (working copy)
@@ -731,6 +731,42 @@
 
 END_EVENT_TABLE()
 
+// Count lines of EOL style in the opened file
+static void CountLineEnds(cbStyledTextCtrl* control, int &linesCR, int &linesLF, int &linesCRLF)
+{
+    linesCR = 0;
+    linesLF = 0;
+    linesCRLF = 0;
+
+    int lengthDoc = control->GetLength();
+    const int maxLengthDoc = 1000000;
+    char chPrev = ' ';
+    char chNext = control->GetCharAt(0);
+    for (int i = 0; i < lengthDoc; i++)
+    {
+        char ch = chNext;
+        chNext = control->GetCharAt(i + 1);
+        if (ch == '\r')
+        {
+            if (chNext == '\n')
+                linesCRLF++;
+            else
+                linesCR++;
+        }
+        else if (ch == '\n')
+        {
+            if (chPrev != '\r')
+            {
+                linesLF++;
+            }
+        }
+        else if (i > maxLengthDoc)     // stop the loop if the file contains too many characters
+            return;
+
+        chPrev = ch;
+    }
+}
+
 // class constructor
 cbEditor::cbEditor(wxWindow* parent, const wxString& filename, EditorColourSet* theme)
     : EditorBase(parent, filename),
@@ -1557,7 +1593,6 @@
         control->SetMarginWidth(C_CHANGEBAR_MARGIN, 0);
 
     // NOTE: duplicate line in editorconfigurationdlg.cpp (ctor)
-    control->SetEOLMode(                  mgr->ReadInt(_T("/eol/eolmode"),                   platform::windows ? wxSCI_EOL_CRLF : wxSCI_EOL_LF)); // Windows takes CR+LF, other platforms LF only
     control->SetScrollWidthTracking(      mgr->ReadBool(_T("/margin/scroll_width_tracking"), false));
     control->SetMultipleSelection(        mgr->ReadBool(_T("/selection/multi_select"),       false));
     control->SetAdditionalSelectionTyping(mgr->ReadBool(_T("/selection/multi_typing"),       false));
@@ -1575,6 +1610,79 @@
 
     ConfigManager* mgr = Manager::Get()->GetConfigManager(_T("editor"));
 
+    // set the EOL, fall back value: Windows takes CR+LF, other platforms LF only
+    int eolMode = mgr->ReadInt(_T("/eol/eolmode"), platform::windows ? wxSCI_EOL_CRLF : wxSCI_EOL_LF);
+
+    // The code snippet of auto detect EOL is copied from SciTE's source, CountLineEnds() function
+    // scans the current source file, and counts lines of each EOL style, then finally sets the EOL
+    // by voting logic. In the case of mixed EOL files, we give the user a beep and InfoWindow notification.
+    if (eolMode == 3) //auto detect the EOL
+    {
+        int linesCR;
+        int linesLF;
+        int linesCRLF;
+
+        // count lines of each EOL style
+        CountLineEnds(control, linesCR, linesLF, linesCRLF);
+
+        //voting logic
+        unsigned int delay = 2000;
+        if (((linesLF >= linesCR) && (linesLF > linesCRLF)) || ((linesLF > linesCR) && (linesLF >= linesCRLF)))
+        {
+            eolMode = wxSCI_EOL_LF;
+            if((linesCR>0) || (linesCRLF>0))
+            {
+                wxBell();
+                InfoWindow::Display(_("Mixed Line Endings"), _("Mixed line endings found, setting mode \"LF\""), delay);
+            }
+        }
+        else if (((linesCR >= linesLF) && (linesCR > linesCRLF)) || ((linesCR > linesLF) && (linesCR >= linesCRLF)))
+        {
+            eolMode = wxSCI_EOL_CR;
+            if((linesLF>0) || (linesCRLF>0))
+            {
+                wxBell();
+                InfoWindow::Display(_("Mixed Line Endings"), _("Mixed line endings found, setting mode \"CR\""), delay);
+            }
+        }
+        else if (((linesCRLF >= linesLF) && (linesCRLF > linesCR)) || ((linesCRLF > linesLF) && (linesCRLF >= linesCR)))
+        {
+            eolMode = wxSCI_EOL_CRLF;
+            if((linesCR>0) || (linesLF>0))
+            {
+                wxBell();
+                InfoWindow::Display(_("Mixed Line Endings"), _("Mixed line endings found, setting mode \"CR-LF\""), delay);
+            }
+        }
+        else // in the case of the file does not contain any line-feed, or the file have equally counts of EOL styles
+        {
+            //set to the fall back mode
+            wxString eolModeStr;
+            if (platform::windows)
+            {
+                eolMode =  wxSCI_EOL_CRLF;
+                eolModeStr = _T("\"CR-LF\"");
+            }
+            else
+            {
+                eolMode =  wxSCI_EOL_LF;
+                eolModeStr = _T("\"LF\"");
+            }
+
+            if((linesCR>0) || (linesLF>0) || (linesCRLF>0)) //equal counts
+            {
+                wxBell();
+                InfoWindow::Display(_("Mixed Line Endings"), _("Mixed line endings found, setting mode ") + eolModeStr, delay);
+            }
+            else //no line-feed found
+            {
+                wxBell();
+                InfoWindow::Display(_("No Line Endings"), _("No Line Endings found, setting mode ") + eolModeStr, delay);
+            }
+        }
+    }
+    control->SetEOLMode(eolMode);
+
     // Interpret #if/#else/#endif to grey out code that is not active
     control->SetProperty(_T("lexer.cpp.track.preprocessor"), mgr->ReadBool(_T("/track_preprocessor"), true) ? _T("1") : _T("0"));
 
Index: resources/editor_configuration.xrc
===================================================================
--- resources/editor_configuration.xrc (revision 8249)
+++ resources/editor_configuration.xrc (working copy)
@@ -200,6 +200,7 @@
                                                 <item>CR LF</item>
                                                 <item>CR</item>
                                                 <item>LF</item>
+                                                <item>AUTO</item>
                                               </content>
                                               <style>wxCB_READONLY</style>
                                             </object>


I have tested several cases, the only minor issue can be: e.g. Under Windows, a file have: linesCR = 0; linesLF = 5; linesCRLF = 5; --------> this will set the eol mode to "LF"......Logicially, I think it should set the eol mode to "CRLF", because under Windows, the CRLF take precedence. Any good idea about the voting logic? I can't find a better, simple and clean way. ;)
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline MortenMacFly

  • Administrator
  • Lives here!
  • *****
  • Posts: 9693
Re: EOL issue, the EOL should be automatically detected, and make consistent
« Reply #16 on: August 28, 2012, 10:13:56 am »
Logicially, I think it should set the eol mode to "CRLF", because under Windows, the CRLF take precedence. Any good idea about the voting logic? I can't find a better, simple and clean way. ;)
Well you can initialise eolMode according the platform and then only change it if you are sure you counted enough different EOL's (so not >=, but only >).
Compiler logging: Settings->Compiler & Debugger->tab "Other"->Compiler logging="Full command line"
C::B Manual: https://www.codeblocks.org/docs/main_codeblocks_en.html
C::B FAQ: https://wiki.codeblocks.org/index.php?title=FAQ

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13406
    • Travis build status
Re: EOL issue, the EOL should be automatically detected, and make consistent
« Reply #17 on: August 28, 2012, 10:14:30 am »
Is it possible to put the new code in a separate function?
(most of the time I ignore long posts)
[strangers don't send me private messages, I'll ignore them; post a topic in the forum, but first read the rules!]

Offline MortenMacFly

  • Administrator
  • Lives here!
  • *****
  • Posts: 9693
Re: EOL issue, the EOL should be automatically detected, and make consistent
« Reply #18 on: August 28, 2012, 10:19:31 am »
Is it possible to put the new code in a separate function?
Maybe even rename CountLineEnds to DetectLineEnds (or alike) and have have it return what eolMode should be set.

Which makes me wonder: Could it be, that scintilla has such a function, too?
Compiler logging: Settings->Compiler & Debugger->tab "Other"->Compiler logging="Full command line"
C::B Manual: https://www.codeblocks.org/docs/main_codeblocks_en.html
C::B FAQ: https://wiki.codeblocks.org/index.php?title=FAQ

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 6034
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: EOL issue, the EOL should be automatically detected, and make consistent
« Reply #19 on: August 28, 2012, 10:29:53 am »
Is it possible to put the new code in a separate function?
Maybe even rename CountLineEnds to DetectLineEnds (or alike) and have have it return what eolMode should be set.

Which makes me wonder: Could it be, that scintilla has such a function, too?
Scintilla does not have such feature, but SciTE does....
Scintilla should keep as simple as it can. (From my point of view), so it leave this function to client.

Is it possible to put the new code in a separate function?
OK, I will did this in the next patch.
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 6034
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: EOL issue, the EOL should be automatically detected, and make consistent
« Reply #20 on: August 28, 2012, 04:43:16 pm »
OK, I will did this in the next patch.
Ok, see this patch. Any further comments?
Code
Index: cbeditor.cpp
===================================================================
--- cbeditor.cpp (revision 8249)
+++ cbeditor.cpp (working copy)
@@ -731,6 +731,95 @@
 
 END_EVENT_TABLE()
 
+// Count lines of EOL style in the opened file
+static void CountLineEnds(cbStyledTextCtrl* control, int &linesCR, int &linesLF, int &linesCRLF)
+{
+    linesCR = 0;
+    linesLF = 0;
+    linesCRLF = 0;
+
+    int lengthDoc = control->GetLength();
+    const int maxLengthDoc = 1000000;
+    char chPrev = ' ';
+    char chNext = control->GetCharAt(0);
+    for (int i = 0; i < lengthDoc; i++)
+    {
+        char ch = chNext;
+        chNext = control->GetCharAt(i + 1);
+        if (ch == '\r')
+        {
+            if (chNext == '\n')
+                linesCRLF++;
+            else
+                linesCR++;
+        }
+        else if (ch == '\n')
+        {
+            if (chPrev != '\r')
+            {
+                linesLF++;
+            }
+        }
+        else if (i > maxLengthDoc)     // stop the loop if the file contains too many characters
+            return;
+
+        chPrev = ch;
+    }
+}
+
+static int DetectLineEnds(cbStyledTextCtrl* control)
+{
+    int eolMode;
+    wxString eolModeStr;
+    // initial EOL mode depend on OS
+    if (platform::windows)
+    {
+        eolMode =  wxSCI_EOL_CRLF;
+        eolModeStr = _T("\"CR-LF\"");
+    }
+    else
+    {
+        eolMode =  wxSCI_EOL_LF;
+        eolModeStr = _T("\"LF\"");
+    }
+
+    int linesCR;
+    int linesLF;
+    int linesCRLF;
+    // count lines of each EOL style
+    CountLineEnds(control, linesCR, linesLF, linesCRLF);
+
+    // voting logic
+    // if the file does not contain any line-feed or the most largest counts are equal( e.g.: linesLF=5,
+    // linesCRLF=5, linesCR=0 ), then we will use the initial EOL mode
+    if ( (linesLF > linesCR) && (linesLF > linesCRLF) )
+    {
+        eolMode = wxSCI_EOL_LF;
+        eolModeStr = _T("\"LF\"");
+    }
+    else if ( (linesCR > linesLF) && (linesCR > linesCRLF) )
+    {
+        eolMode = wxSCI_EOL_CR;
+        eolModeStr = _T("\"CR\"");
+    }
+    else if ( (linesCRLF > linesLF) && (linesCRLF > linesCR))
+    {
+        eolMode = wxSCI_EOL_CRLF;
+        eolModeStr = _T("\"CR-LF\"");
+    }
+
+    unsigned int delay = 2000;
+    if (  ( (linesCR>0) && (linesCRLF>0) )
+       || ( (linesLF>0) && (linesCRLF>0) )
+       || ( (linesCR>0) && (linesLF>0) ) )
+    {
+        //In mixed EOL file, give the user a beep and InfoWindow notification.
+        wxBell();
+        InfoWindow::Display(_("Mixed Line Endings"), _("Mixed line endings found, setting mode ") + eolModeStr, delay);
+    }
+    return eolMode;
+}
+
 // class constructor
 cbEditor::cbEditor(wxWindow* parent, const wxString& filename, EditorColourSet* theme)
     : EditorBase(parent, filename),
@@ -1557,7 +1646,6 @@
         control->SetMarginWidth(C_CHANGEBAR_MARGIN, 0);
 
     // NOTE: duplicate line in editorconfigurationdlg.cpp (ctor)
-    control->SetEOLMode(                  mgr->ReadInt(_T("/eol/eolmode"),                   platform::windows ? wxSCI_EOL_CRLF : wxSCI_EOL_LF)); // Windows takes CR+LF, other platforms LF only
     control->SetScrollWidthTracking(      mgr->ReadBool(_T("/margin/scroll_width_tracking"), false));
     control->SetMultipleSelection(        mgr->ReadBool(_T("/selection/multi_select"),       false));
     control->SetAdditionalSelectionTyping(mgr->ReadBool(_T("/selection/multi_typing"),       false));
@@ -1575,6 +1663,14 @@
 
     ConfigManager* mgr = Manager::Get()->GetConfigManager(_T("editor"));
 
+    // set the EOL, fall back value: Windows takes CR+LF, other platforms LF only
+    int eolMode = mgr->ReadInt(_T("/eol/eolmode"), platform::windows ? wxSCI_EOL_CRLF : wxSCI_EOL_LF);
+
+    if (eolMode == 3) //auto detect the EOL
+        eolMode = DetectLineEnds(control);
+
+    control->SetEOLMode(eolMode);
+
     // Interpret #if/#else/#endif to grey out code that is not active
     control->SetProperty(_T("lexer.cpp.track.preprocessor"), mgr->ReadBool(_T("/track_preprocessor"), true) ? _T("1") : _T("0"));
 
Index: resources/editor_configuration.xrc
===================================================================
--- resources/editor_configuration.xrc (revision 8249)
+++ resources/editor_configuration.xrc (working copy)
@@ -200,6 +200,7 @@
                                                 <item>CR LF</item>
                                                 <item>CR</item>
                                                 <item>LF</item>
+                                                <item>AUTO</item>
                                               </content>
                                               <style>wxCB_READONLY</style>
                                             </object>

If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline Jenna

  • Administrator
  • Lives here!
  • *****
  • Posts: 7252
Re: EOL issue, the EOL should be automatically detected, and make consistent
« Reply #21 on: August 28, 2012, 05:17:13 pm »
Code
+    // set the EOL, fall back value: Windows takes CR+LF, other platforms LF only

The default line-ending on Mac is CR as far as I know.

Sorry to make it a little more complicated.

Offline MortenMacFly

  • Administrator
  • Lives here!
  • *****
  • Posts: 9693
Re: EOL issue, the EOL should be automatically detected, and make consistent
« Reply #22 on: August 28, 2012, 07:12:07 pm »
The default line-ending on Mac is CR as far as I know.
Sorry to make it a little more complicated.
EOL's and he 21st century... still feeling like stone-age... ::)
Compiler logging: Settings->Compiler & Debugger->tab "Other"->Compiler logging="Full command line"
C::B Manual: https://www.codeblocks.org/docs/main_codeblocks_en.html
C::B FAQ: https://wiki.codeblocks.org/index.php?title=FAQ

Offline oBFusCATed

  • Developer
  • Lives here!
  • *****
  • Posts: 13406
    • Travis build status
Re: EOL issue, the EOL should be automatically detected, and make consistent
« Reply #23 on: August 28, 2012, 09:06:04 pm »
I think CR or LF are supported on MacOSX, prior to 10.xx, the CR was the default and now with the FreeBSD heritage it is the LF.
But I think most softwares support both and users have files with both endings.
(most of the time I ignore long posts)
[strangers don't send me private messages, I'll ignore them; post a topic in the forum, but first read the rules!]

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 6034
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: EOL issue, the EOL should be automatically detected, and make consistent
« Reply #24 on: August 29, 2012, 02:49:35 am »
I think CR or LF are supported on MacOSX, prior to 10.xx, the CR was the default and now with the FreeBSD heritage it is the LF.
But I think most softwares support both and users have files with both endings.


The default line-ending on Mac is CR as far as I know.

Adding the fall-back on Mac as CR is quite easy. But I'm mostly follow what Scintilla did and a discussion on their forum:
See:[scintilla] default eol mode on OS X - Google Groups

And in the source code: Document.cxx
Code
Document::Document() {
refCount = 0;
#ifdef _WIN32
eolMode = SC_EOL_CRLF;
#else
eolMode = SC_EOL_LF;
#endif
...
...
So, I believe Mac can support LF without any problem. :)

If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline dmoore

  • Developer
  • Lives here!
  • *****
  • Posts: 1576
Re: EOL issue, the EOL should be automatically detected, and make consistent
« Reply #25 on: August 29, 2012, 03:25:56 am »
Don't we set a CR for wxMAC in a few places? (E.g. the default when the user first uses C::B on a mac?)I think whatever behavior we have should be consistent in all places.

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 6034
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: EOL issue, the EOL should be automatically detected, and make consistent
« Reply #26 on: August 29, 2012, 03:37:57 am »
Don't we set a CR for wxMAC in a few places? (E.g. the default when the user first uses C::B on a mac?)I think whatever behavior we have should be consistent in all places.
This is the old behavior in cbEditor.cpp
Code
control->SetEOLMode(                  mgr->ReadInt(_T("/eol/eolmode"),                   platform::windows ? wxSCI_EOL_CRLF : wxSCI_EOL_LF)); // Windows takes CR+LF, other platforms LF only
If the user does not select an eol mode, or read eol from configure manager failed, we use either CRLF(windows) or LF(non-windows). So, I think my new patch is just consistent with the old behavior.

EDIT:
I just search the C::B source, and it looks like the CR is the old mac EOL style, see:
plugins\astyle\astyle\astyle_main.h
Code
int eolWindows;        // number of Windows line endings, CRLF
int eolLinux;          // number of Linux line endings, LF
int eolMacOld;         // number of old Mac line endings. CR
eolMacOld

« Last Edit: August 29, 2012, 03:40:39 am by ollydbg »
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 6034
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: EOL issue, the EOL should be automatically detected, and make consistent
« Reply #27 on: September 03, 2012, 04:37:34 am »
Committed in trunk rev 8332, thanks for everyone's suggestion.
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.