Author Topic: What about using debugging symbols for code completion?  (Read 1404 times)

Offline Calmarius

  • Multiple posting newcomer
  • *
  • Posts: 32
What about using debugging symbols for code completion?
« on: December 16, 2015, 01:12:46 pm »
I don't know whether this has already been discussed here, but I'll give it a try:

Parsing C++ is extremely difficult. On top of that, the compiler, the makefile or the build system may define extra preprocessor symbols that the parser doesn't know about, which easily derails it.

But the information we can use for code completion is already there inside or next to every executable: the debug information.

This debug information contains basically everything one may need for code completion: a complete symbol graph, with line and scope information.

So far I have experience with PDB files and the DIA SDK. I believe the same graph is available in DWARF files as well.

Knowing the source file and the line number, we can look up the scope that line is in and walk the symbol graph from there, so we don't need to bother with parsing. And we would always get the right list of symbols.

But it has some drawbacks:

- You need to compile your program to get the information.
- The editor needs to keep track of changed lines to be able to correctly match the line numbers to the built binary.
- Macros and documentation comments won't be available.
- Symbols that are optimized out may not be available.
- You will see macro expanded symbol names like "MessageBoxW" instead of the preferred "MessageBox".
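
To make the line-tracking drawback concrete, here is a minimal sketch of how an editor could remember which built line each current line came from. LineTracker and its methods are hypothetical names of mine, not part of Code::Blocks or any debug-info library, and it only handles whole-line inserts and deletes:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not an existing editor API): for every line currently
// in the editor, remembers which line it was in the file that was last built.
class LineTracker {
public:
    // Right after a build, editor line i corresponds to built line i.
    explicit LineTracker(std::size_t builtLineCount) : builtLine_(builtLineCount) {
        std::iota(builtLine_.begin(), builtLine_.end(), 1);
    }

    // A new line now occupies editor line `line` (1-based).
    // It has no counterpart in the built file, so it is recorded as 0.
    void insertLine(std::size_t line) {
        assert(line >= 1 && "editor lines are 1-based");
        const std::size_t index = std::min(line - 1, builtLine_.size());
        builtLine_.insert(builtLine_.begin() + static_cast<std::ptrdiff_t>(index), 0);
    }

    // Editor line `line` (1-based) was deleted; out-of-range lines are ignored.
    void removeLine(std::size_t line) {
        if (line >= 1 && line <= builtLine_.size())
            builtLine_.erase(builtLine_.begin() + static_cast<std::ptrdiff_t>(line - 1));
    }

    // Returns the line number in the built file,
    // or 0 if the line is new since the build (or out of range).
    int toBuiltLine(std::size_t line) const {
        return (line >= 1 && line <= builtLine_.size()) ? builtLine_[line - 1] : 0;
    }

private:
    std::vector<int> builtLine_;
};
```

A lookup would first translate the cursor line with toBuiltLine() and only then consult the debug info. Lines that return 0 have no debug info at all until the next build.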

What are your opinions about this idea?
« Last Edit: December 16, 2015, 01:16:43 pm by Calmarius »

Offline ollydbg

  • Developer
  • Lives here!
  • *****
  • Posts: 4909
  • OpenCV and Robotics
    • Chinese OpenCV forum moderator
Re: What about using debugging symbols for code completion?
« Reply #1 on: December 16, 2015, 03:31:39 pm »
I think it is hard to get code completion from analysing the debug info. Have you tried it? As far as I can see, even the gdb debugger has very limited code completion support. Have you seen plugins like clangcc, which give really semantic code completion?
If some piece of memory should be reused, turn them to variables (or const variables).
If some piece of operations should be reused, turn them to functions.
If they happened together, then turn them to classes.

Offline Calmarius

Re: What about using debugging symbols for code completion?
« Reply #2 on: December 16, 2015, 07:00:22 pm »
I wrote this program to test with:

Code: [Select]
int a;
int b;

typedef struct
{
    struct
    {
        int a;
        union
        {
            struct
            {
                int x, y, z;
            } v;
            float k;
        } u;
    } x;
} NestedStruct;

namespace whatever
{
    template <class T> class Tmpl
    {
        T a;
        T b;
    };
}

using namespace whatever;

int main()
{
    int c = 0;
    int d = c;
    volatile int a;
    volatile int b;
    Tmpl<int> templated;

    if (a < b)
    {
        int itsSmaller = 666;
    }
    else
    {
        int itsLarger = 777;
    }

    return 0;
}

Then I downloaded dwarfdump, dumped the debug symbols, and got this:

Code: [Select]
.debug_info

COMPILE_UNIT<header overall offset = 0x00000000>:
< 0><0x0000000b>  DW_TAG_compile_unit
                    DW_AT_producer              "GNU C++ 4.8.1 -mtune=generic -march=x86-64 -g -fstack-protector"
                    DW_AT_language              DW_LANG_C_plus_plus
                    DW_AT_name                  "main.cpp"
                    DW_AT_comp_dir              "/home/calmarius/stuff/source/crucible"
                    DW_AT_low_pc                0x004004d0
                    DW_AT_high_pc               <offset-from-lowpc>55
                    DW_AT_stmt_list             0x00000000

LOCAL_SYMBOLS:
< 1><0x0000002d>    DW_TAG_base_type
                      DW_AT_byte_size             0x00000004
                      DW_AT_encoding              DW_ATE_signed
                      DW_AT_name                  "int"
< 1><0x00000034>    DW_TAG_base_type
                      DW_AT_byte_size             0x00000004
                      DW_AT_encoding              DW_ATE_float
                      DW_AT_name                  "float"
< 1><0x0000003b>    DW_TAG_namespace
                      DW_AT_name                  "whatever"
                      DW_AT_decl_file             0x00000001 /home/calmarius/stuff/source/crucible/main.cpp
                      DW_AT_decl_line             0x00000015
                      DW_AT_sibling               <0x0000006b>
< 2><0x00000046>      DW_TAG_class_type
                        DW_AT_name                  "Tmpl<int>"
                        DW_AT_byte_size             0x00000008
                        DW_AT_decl_file             0x00000001 /home/calmarius/stuff/source/crucible/main.cpp
                        DW_AT_decl_line             0x00000016
< 3><0x0000004e>        DW_TAG_member
                          DW_AT_name                  "a"
                          DW_AT_decl_file             0x00000001 /home/calmarius/stuff/source/crucible/main.cpp
                          DW_AT_decl_line             0x00000018
                          DW_AT_type                  <0x0000002d>
                          DW_AT_data_member_location  0
< 3><0x00000058>        DW_TAG_member
                          DW_AT_name                  "b"
                          DW_AT_decl_file             0x00000001 /home/calmarius/stuff/source/crucible/main.cpp
                          DW_AT_decl_line             0x00000019
                          DW_AT_type                  <0x0000002d>
                          DW_AT_data_member_location  4
< 3><0x00000062>        DW_TAG_template_type_parameter
                          DW_AT_name                  "T"
                          DW_AT_type                  <0x0000002d>
< 1><0x0000006b>    DW_TAG_imported_module
                      DW_AT_decl_file             0x00000001 /home/calmarius/stuff/source/crucible/main.cpp
                      DW_AT_decl_line             0x0000001d
                      DW_AT_import                <0x0000003b>
< 1><0x00000072>    DW_TAG_subprogram
                      DW_AT_external              yes(1)
                      DW_AT_name                  "main"
                      DW_AT_decl_file             0x00000001 /home/calmarius/stuff/source/crucible/main.cpp
                      DW_AT_decl_line             0x0000001f
                      DW_AT_type                  <0x0000002d>
                      DW_AT_low_pc                0x004004d0
                      DW_AT_high_pc               <offset-from-lowpc>55
                      DW_AT_frame_base            len 0x0001: 9c: DW_OP_call_frame_cfa
                      DW_AT_GNU_all_call_sites    yes(1)
                      DW_AT_sibling               <0x00000128>
< 2><0x00000093>      DW_TAG_lexical_block
                        DW_AT_low_pc                0x004004d4
                        DW_AT_high_pc               <offset-from-lowpc>49
< 3><0x000000a4>        DW_TAG_variable
                          DW_AT_name                  "c"
                          DW_AT_decl_file             0x00000001 /home/calmarius/stuff/source/crucible/main.cpp
                          DW_AT_decl_line             0x00000021
                          DW_AT_type                  <0x0000002d>
                          DW_AT_location              len 0x0002: 9150: DW_OP_fbreg -48
< 3><0x000000b0>        DW_TAG_variable
                          DW_AT_name                  "d"
                          DW_AT_decl_file             0x00000001 /home/calmarius/stuff/source/crucible/main.cpp
                          DW_AT_decl_line             0x00000022
                          DW_AT_type                  <0x0000002d>
                          DW_AT_location              len 0x0002: 9154: DW_OP_fbreg -44
< 3><0x000000bc>        DW_TAG_variable
                          DW_AT_name                  "a"
                          DW_AT_decl_file             0x00000001 /home/calmarius/stuff/source/crucible/main.cpp
                          DW_AT_decl_line             0x00000023
                          DW_AT_type                  <0x00000128>
                          DW_AT_location              len 0x0002: 914c: DW_OP_fbreg -52
< 3><0x000000c8>        DW_TAG_variable
                          DW_AT_name                  "b"
                          DW_AT_decl_file             0x00000001 /home/calmarius/stuff/source/crucible/main.cpp
                          DW_AT_decl_line             0x00000024
                          DW_AT_type                  <0x00000128>
                          DW_AT_location              len 0x0002: 9160: DW_OP_fbreg -32
< 3><0x000000d4>        DW_TAG_variable
                          DW_AT_name                  "templated"
                          DW_AT_decl_file             0x00000001 /home/calmarius/stuff/source/crucible/main.cpp
                          DW_AT_decl_line             0x00000025
                          DW_AT_type                  <0x00000046>
                          DW_AT_location              len 0x0002: 9160: DW_OP_fbreg -32
< 3><0x000000e2>        DW_TAG_lexical_block
                          DW_AT_low_pc                0x004004f0
                          DW_AT_high_pc               <offset-from-lowpc>7
                          DW_AT_sibling               <0x00000106>
< 4><0x000000f7>          DW_TAG_variable
                            DW_AT_name                  "itsSmaller"
                            DW_AT_decl_file             0x00000001 /home/calmarius/stuff/source/crucible/main.cpp
                            DW_AT_decl_line             0x00000029
                            DW_AT_type                  <0x0000002d>
                            DW_AT_location              len 0x0002: 9158: DW_OP_fbreg -40
< 3><0x00000106>        DW_TAG_lexical_block
                          DW_AT_low_pc                0x004004f9
                          DW_AT_high_pc               <offset-from-lowpc>7
< 4><0x00000117>          DW_TAG_variable
                            DW_AT_name                  "itsLarger"
                            DW_AT_decl_file             0x00000001 /home/calmarius/stuff/source/crucible/main.cpp
                            DW_AT_decl_line             0x0000002d
                            DW_AT_type                  <0x0000002d>
                            DW_AT_location              len 0x0002: 915c: DW_OP_fbreg -36
< 1><0x00000128>    DW_TAG_volatile_type
                      DW_AT_type                  <0x0000002d>
< 1><0x0000012d>    DW_TAG_variable
                      DW_AT_name                  "a"
                      DW_AT_decl_file             0x00000001 /home/calmarius/stuff/source/crucible/main.cpp
                      DW_AT_decl_line             0x00000001
                      DW_AT_type                  <0x0000002d>
                      DW_AT_external              yes(1)
                      DW_AT_location              len 0x0009: 031c10600000000000: DW_OP_addr 0x0060101c
< 1><0x00000140>    DW_TAG_variable
                      DW_AT_name                  "b"
                      DW_AT_decl_file             0x00000001 /home/calmarius/stuff/source/crucible/main.cpp
                      DW_AT_decl_line             0x00000002
                      DW_AT_type                  <0x0000002d>
                      DW_AT_external              yes(1)
                      DW_AT_location              len 0x0009: 032010600000000000: DW_OP_addr 0x00601020

.debug_line: line number info for a single cu
Source lines (from CU-DIE at .debug_info offset 0x0000000b):

<pc>        [row,col] NS BB ET PE EB IS= DI= uri: "filepath"
NS new statement, BB new basic block, ET end of text sequence
PE prologue end, EB epilogue begin
IA=val ISA number, DI=val discriminator value
0x004004d0  [  32, 0] NS uri: "/home/calmarius/stuff/source/crucible/main.cpp"
0x004004d4  [  33, 0] NS
0x004004db  [  34, 0] NS
0x004004e1  [  39, 0] NS
0x004004f0  [  41, 0] NS
0x004004f9  [  45, 0] NS
0x00400500  [  48, 0] NS
0x00400505  [  49, 0] NS
0x00400507  [  49, 0] NS ET

.debug_pubnames

.debug_macinfo

.debug_string
name at offset 0x00000000, length   37 is '/home/calmarius/stuff/source/crucible'
name at offset 0x00000026, length   10 is 'itsSmaller'
name at offset 0x00000031, length   63 is 'GNU C++ 4.8.1 -mtune=generic -march=x86-64 -g -fstack-protector'
name at offset 0x00000071, length    9 is 'templated'
name at offset 0x0000007b, length    8 is 'main.cpp'
name at offset 0x00000084, length    4 is 'main'
name at offset 0x00000089, length    9 is 'Tmpl<int>'
name at offset 0x00000093, length    9 is 'itsLarger'
name at offset 0x0000009d, length    5 is 'float'
name at offset 0x000000a3, length    8 is 'whatever'

.debug_aranges

COMPILE_UNIT<header overall offset = 0x00000000>:
< 0><0x0000000b>  DW_TAG_compile_unit
                    DW_AT_producer              "GNU C++ 4.8.1 -mtune=generic -march=x86-64 -g -fstack-protector"
                    DW_AT_language              DW_LANG_C_plus_plus
                    DW_AT_name                  "main.cpp"
                    DW_AT_comp_dir              "/home/calmarius/stuff/source/crucible"
                    DW_AT_low_pc                0x004004d0
                    DW_AT_high_pc               <offset-from-lowpc>55
                    DW_AT_stmt_list             0x00000000


arange starts at 0x004004d0, length of 0x00000037, cu_die_offset = 0x0000000b
arange end
.debug_frame

.debug_static_func

.debug_static_vars

.debug_weaknames


You can see that function and variable names are recorded quite well, external variables too. But unused things are stripped (you won't find NestedStruct).

You can also see the location of each declaration, plus extra info such as the position within the struct (DW_AT_data_member_location) or the address relative to the stack frame (DW_AT_location).
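
The DW_AT_type references are what member completion would follow. A rough sketch, assuming the DIEs have already been read into a table keyed by offset: Die, Tag, DieTable, membersOf and sampleDies are simplified stand-ins I made up for illustration, not the API of libdwarf or any other reader, and only the volatile qualifier is handled.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

enum class Tag { BaseType, ClassType, Member, VolatileType, Variable, TemplateTypeParameter };

// Simplified stand-in for a DWARF DIE; a real reader exposes far more.
struct Die {
    Tag tag;
    std::string name;                    // DW_AT_name, empty if absent
    std::uint32_t type;                  // DW_AT_type offset, 0 if absent
    std::vector<std::uint32_t> children; // offsets of child DIEs
};

using DieTable = std::map<std::uint32_t, Die>;

// Lists the names to offer after "variable." by following DW_AT_type,
// skipping volatile qualifiers until a class type is reached.
std::vector<std::string> membersOf(const DieTable& dies, std::uint32_t variableOffset) {
    std::vector<std::string> result;
    auto it = dies.find(variableOffset);
    if (it == dies.end()) return result;
    assert(it->second.tag == Tag::Variable && "expected a variable DIE");
    it = dies.find(it->second.type);
    while (it != dies.end() && it->second.tag == Tag::VolatileType)
        it = dies.find(it->second.type);
    if (it == dies.end() || it->second.tag != Tag::ClassType) return result;
    for (std::uint32_t childOffset : it->second.children) {
        auto child = dies.find(childOffset);
        if (child != dies.end() && child->second.tag == Tag::Member)
            result.push_back(child->second.name);
    }
    return result;
}

// The relevant DIEs from the dump above, keyed by their offsets.
DieTable sampleDies() {
    return {
        {0x2d,  {Tag::BaseType, "int", 0, {}}},
        {0x46,  {Tag::ClassType, "Tmpl<int>", 0, {0x4e, 0x58, 0x62}}},
        {0x4e,  {Tag::Member, "a", 0x2d, {}}},
        {0x58,  {Tag::Member, "b", 0x2d, {}}},
        {0x62,  {Tag::TemplateTypeParameter, "T", 0x2d, {}}},
        {0x128, {Tag::VolatileType, "", 0x2d, {}}},
        {0xbc,  {Tag::Variable, "a", 0x128, {}}},
        {0xd4,  {Tag::Variable, "templated", 0x46, {}}},
    };
}
```

With this data, membersOf(sampleDies(), 0xd4) gives "a" and "b" for templated, while the volatile int at 0xbc gives an empty list because its type chain ends in a base type.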

In the .debug_line section you can see an address-to-line map.

Of course, some work is needed to build a reverse map that gets an address from a line, and a lookup structure that turns addresses into symbols to find the current scope. But that's easier than doing the parsing ourselves.
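
Roughly what I have in mind, as a sketch only, with the rows and address ranges copied by hand from the dump above. All the type and function names here are mine. Mapping a line without code to the nearest preceding line that has code is a heuristic I'm assuming, not something DWARF prescribes, and globals and DW_AT_decl_line filtering are left out.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <string>
#include <vector>

struct LineRow { std::uint64_t pc; int line; };  // one row of .debug_line

struct Scope {                                    // subprogram or lexical block
    std::uint64_t lowPc, highPc;                  // covers [lowPc, highPc)
    std::vector<std::string> symbols;
    std::vector<Scope> children;
};

using LineMap = std::map<int, std::uint64_t>;

// Reverse map: source line -> lowest address generated for that line.
LineMap buildLineMap(const std::vector<LineRow>& rows) {
    LineMap result;
    for (const LineRow& row : rows) {
        auto inserted = result.emplace(row.line, row.pc);
        if (!inserted.second && row.pc < inserted.first->second)
            inserted.first->second = row.pc;
    }
    return result;
}

// Heuristic (an assumption of this sketch): a line without code of its own
// uses the address of the nearest preceding line that has some.
bool addressForLine(const LineMap& lines, int line, std::uint64_t& pc) {
    auto it = lines.upper_bound(line);
    if (it == lines.begin()) return false;  // before the first line with code
    pc = std::prev(it)->second;
    return true;
}

// Collects the symbols of every scope that contains pc, outermost first.
void collectVisible(const Scope& scope, std::uint64_t pc, std::vector<std::string>& out) {
    assert(scope.lowPc <= scope.highPc);
    if (pc < scope.lowPc || pc >= scope.highPc) return;
    out.insert(out.end(), scope.symbols.begin(), scope.symbols.end());
    for (const Scope& child : scope.children) collectVisible(child, pc, out);
}

std::vector<std::string> completionsAt(const Scope& function, const LineMap& lines, int line) {
    std::vector<std::string> out;
    std::uint64_t pc = 0;
    if (addressForLine(lines, line, pc)) collectVisible(function, pc, out);
    return out;
}

// Line table rows from the .debug_line section above.
std::vector<LineRow> sampleRows() {
    return {{0x4004d0, 32}, {0x4004d4, 33}, {0x4004db, 34}, {0x4004e1, 39}, {0x4004f0, 41},
            {0x4004f9, 45}, {0x400500, 48}, {0x400505, 49}, {0x400507, 49}};
}

// main() and its lexical blocks; each highPc is low_pc plus the dumped offset.
Scope sampleMain() {
    Scope smaller{0x4004f0, 0x4004f7, {"itsSmaller"}, {}};
    Scope larger{0x4004f9, 0x400500, {"itsLarger"}, {}};
    Scope body{0x4004d4, 0x400505, {"c", "d", "a", "b", "templated"}, {smaller, larger}};
    return Scope{0x4004d0, 0x400507, {}, {body}};
}
```

For line 41 of main.cpp this resolves to address 0x004004f0 and offers c, d, a, b, templated and itsSmaller. For line 45 it offers itsLarger instead of itsSmaller.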

So it seems DWARF debug info has the same capabilities as PDB debug info. So far it looks like a perfect candidate to base code completion on.
« Last Edit: December 16, 2015, 07:08:31 pm by Calmarius »

Offline l_inc

  • Multiple posting newcomer
  • *
  • Posts: 56
Re: What about using debugging symbols for code completion?
« Reply #3 on: December 16, 2015, 11:19:07 pm »
Calmarius
Quote
So far it looks like a perfect candidate to base code completion on.
Considering all the disadvantages you mentioned, it's far from perfect. Because of how laggy it would be, I'd argue it would be even less usable than the standard Code::Blocks CC plugin. That plugin, however, is almost perfect, and it already works well.

Offline Calmarius

Re: What about using debugging symbols for code completion?
« Reply #4 on: December 16, 2015, 11:55:43 pm »
There's also the problem of libraries built without debug symbols...

I'll probably abandon the idea, then...