Minggu, 06 Desember 2015
Devirtualization in C++, part 7 (Enforcing One Definition Rule)
Minggu, 06 Desember 2015 by Alfanet Fx
This time I am going to write about a feature that I implemented as a side effect of the devirtualization machinery: the verification of One Definition Rule on types.One Definition Rule (ODR) is one of more interesting cases where C and C++ differs in very subtle way. Wikipedia describes it as follows:
In short, the ODR states that:This means that C++ type names are global and should not be re-used in different ways in different source files. (In C doing so is perfectly valid and common, types across units are considered equivalent if their structure is.)Some violations of the ODR must be diagnosed by the compiler. Other violations, particularly those that span translation units, are not required to be diagnosed.[1]
- In any translation unit, a template, type, function, or object can have no more than one definition. Some of these can have any number of declarations. A definition provides an instance.
- In the entire program, an object or non-inline function cannot have more than one definition; if an object or function is used, it must have exactly one definition. You can declare an object or function that is never used, in which case you don't have to provide a definition. In no event can there be more than one definition.
- Some things, like types, templates, and extern inline functions, can be defined in more than one translation unit. For a given entity, each definition must be the same. Non-extern objects and functions in different translation units are different entities, even if their names and types are the same.
One of motivations for ODR is to make name mangling possible. Type names are used, for example, as program wide identifiers to distinguish functions and methods of same name but taking different types of arguments (function overloading):
struct A {int a;};This compilation unit will result in exporting two function symbols: _Z6getval1A and _Z6getval1B. A and B comes from name of the argument's type.
int getval (struct A a) {return a.a;}
struct B {char a;};
int getval (struct B a) { return a.a;}
Now, obviously, if another unit defines completely different structure A and completely different getval taking it as argument, the function will also be called _Z6getval1A and things will fail miserably. General violations of ODR can not be diagnosed by compiler working on single translation unit in isolation. If one is lucky, ODR violation turns into linker error about duplicated symbol name. With presence of inline functions however this will not happen and one of the two functions will be eliminated from program. This may result in invocation of a function on wrong data crashing program in random ways.
-detect-odr-violations in gold
One ODR checking tool I am aware of is gold's -detect-odr-violations:gold uses a heuristic to find potential ODR violations: if the same symbol is seen defined in two different input files, and the two symbols have different sizes, then gold looks at the debugging information in the input objects. If the debugging information suggests that the symbols were defined in different source files, gold reports a potential ODR violation. This approach has both false negatives and false positives. However, it is reasonably reliable at detecting problems when linking unoptimized code. It is much easier to find these problems at link time than to debug cases where the wrong symbol.This is very simplistic analysis, but linker can hardly do a lot better; it lacks knowledge of types. Moreover there are some valid cases where the difference can happen and thus the warning can get false positives.
Ian Lance Taylor: New ELF linker, Proceedings of the GCC Developer's Summit 2008
-Wodr: detecting ODR violations in GCC
As discussed in earlier post about construction of type inheritance graph, establishing type equivalency based on their names is important step in building cross-compilation-unit type inheritance graph. It is tricky to do for templates and thus in GCC 4.8 I used simple trick of identifying types based on mangled names of their associated virtual tables. This works only on types with virtual methods and thus with help of our C++ maintainer, Jason Merill, I extended the mehanizm to all class and union types: Before streaming out LTO data I simply invoke C++ frontend langhook to compute mangled name of a type and save it into the LTO object file. At linktime I can use the names to establish equivalence of all named types. This way middle-end does not need to know details of C++ namespaces and templates with the exception of being able to identify types in anonymous namespace and disable merging for those.This streaming is not free, it seems to add about 2% of extra memory use and comile time for the mangled names. One day I plan to use the mechanism to strengthen the type based alias analysis for C++ and also to reduce redundancies in debug info produced (the second is actually done by LLVM to help fight memory usage explosion).
Once ODR based type equivalency is established, I can actually compare the types that become equivalent as part of LTO linking process. Doing so is a simple recursive walk over the GCC's type representation (that is sufficiently close to parse tree of declarations) and compare it. This step is not completely trivial because one needs to get past some differences that are considered valid. In particular complete and incomplete types of the same name are equivalent and there are few cases that we want to tolerate - such as use of attributes and perhaps differences caused by command line options, such as -fsigned-char/-funsigned-char even if those are ABI breaking and could lead to wrong code.
Making the warning useful
My first attempt resulted in plenty of warning looking as follows:/aux/hubicka/firefox4/modules/libjar/zipstruct.h:38:16: warning: type ?struct ZipCentral_? violates one definition rule [-Wodr]
typedef struct ZipCentral_
^
../../dist/include/zipstruct.h:38:0: note: a different type is defined in another translation unit
typedef struct ZipCentral_
^
My best guess of average Joe's attempt to understand the message is "those types are the same! Stupid GCC!". It is a common case that types declared same way and in the same header are actually different because the difference is carried in from earlier includes. Without being able to report why the types are different, the warning would probably be ignored and declared bogus by most developers. For this reason I ended up adding about 200 lines of code just doing kind of structural diff and trying to output useful information about the difference.In this particular cases it winds down to:
/aux/hubicka/firefox4/modules/libjar/zipstruct.h:38:16: warning: type ?struct ZipCentral_? violates one definition rule [-Wodr]
typedef struct ZipCentral_
^
../../dist/include/zipstruct.h:38:0: note: a different type is defined in another translation unit
typedef struct ZipCentral_
^
/aux/hubicka/firefox4/modules/libjar/zipstruct.h:47:25: note: the first difference of corresponding definitions is field ?MOZ_Z_crc32?
unsigned char crc32 [4];
^
../../dist/include/zipstruct.h:47:0: note: a field with different name is defined in another translation unit
unsigned char crc32 [4];
^
A fair try, but still not perfect :). Luckily, the problem is now not hard to guess. There are two zlib implementations linked into the Firefox and one of them redefine crc32 to MOZ_Z_crc32 (apparently to avoid ODR conflicts identified earlier by Gold). I thus decided that such warnings are verbose enough and did not add further information into this particular case. I expect to improve the diagnostic as we get more experience about individual errors reported.Limitations of ODR detection
Even my ODR verifier is not a complete ODR nazi - it verifies only violations for types that survive into actual program and only for types where name matters (i.e. is part of type mangling). For example, if one unit definetypedef int mytype;and the other:
typedef char mytype;the violation will not be reported. This is because typedefs are ignored for type mangling purposes and thus I have no way to figure out that those two types are intended to be equivalent.
Other problem is, of course, that detection is limited to one binary. More conflicts can be introduced when linking with non-LTO library either statically, dynamically or at runtime via dlopen.
Detecting virtual table ODR violations
ODR relates not only to types, but also to functions and methods. While it is very hard to compare two methods coming from different unit because the body may differ in representation (because each is compiled with different flags or optimized in different context) but it may have the same semanticsI can however easily do is to compare virtual table. Mismatches in those are especially deadly because they may result in dispatch to a wrong virtual function.Here is an example of ODR violation in Firefox:
../../dist/include/mozilla/a11y/DocAccessible.h:40:0: warning: type ?struct DocAccessible? violates one definition rule [enabled by default]
class DocAccessible : public HyperTextAccessibleWrap,
^
/aux/hubicka/firefox/accessible/src/generic/DocAccessible.h:40:0: note: a type with the same name but different virtual table is defined in another translation unit
class DocAccessible : public HyperTextAccessibleWrap,
^
/aux/hubicka/firefox/accessible/src/generic/DocAccessible.cpp:1232:0: note: the first different method is ?HandleAccEvent?
DocAccessible::HandleAccEvent(AccEvent* aEvent)
^
/aux/hubicka/firefox/accessible/src/atk/AccessibleWrap.cpp:956:0: note: a conflicting method is ?HandleAccEvent?
AccessibleWrap::HandleAccEvent(AccEvent* aEvent)
^
How common are ODR violations?
Fairly common :) Out of 3 programs (GCC, Firefox and Libreoffice) I tried to build with ODR checking, each of them had type ODR violations one of them (Firefox) had violation in virtual table. Today I leant that LLVM team has fixed ODR violations based on new warning, too. What are the most common causes?Random type name clashes
Each time you create type with short name in global namespace (say hash_t), you risk clash with other units doing the same for completely unrelated reason. This is a good motivation to use namespaces and anonymous namespaces (though the anonymous namespaces are causing problems with GDB debugging, see PR16874). A firefox example looks as follows:/aux/hubicka/firefox4/rdf/base/nsInMemoryDataSource.cpp:149:0: warning: type ?struct Entry? violates one definition rule [-Wodr]
struct Entry {
^
/aux/hubicka/firefox4/gfx/skia/trunk/src/core/SkFlattenable.cpp:66:0: note: a different type is defined in another translation unit
struct Entry {
^
/aux/hubicka/firefox4/rdf/base/nsInMemoryDataSource.cpp:150:0: note: the first difference of corresponding definitions is field ?mHdr?
PLDHashEntryHdr mHdr;
^
/aux/hubicka/firefox4/gfx/skia/trunk/src/core/SkFlattenable.cpp:67:0: note: a field with different name is defined in another translation unit
const char* fName;
^
Some examples come in this thread: http://nabble.documentfoundation.org/Re-Performance-samples-for-LibreOffice-td4119856.html.
Cut&paste
Copying code from one unit to another and adjusting the datastructures for new purpose is a common way to introduce name clashes. For example in GCC we have plenty of structures expr and occr that all originate from gcse.c that implemented the first dataflow based global optimizer.Again, a Firefox example:
/aux/hubicka/firefox4/netwerk/sctp/datachannel/DataChannel.h:57:0: warning: type ?struct BufferedMsg? violates one definition rule [-Wodr]
class BufferedMsg
^
../../../../dist/include/mozilla/net/DataChannel.h:57:0: note: a different type is defined in another translation unit
/aux/hubicka/firefox4/netwerk/sctp/datachannel/DataChannel.h:64:26: note: the first difference of corresponding definitions is field ?mSpa?
struct sctp_sendv_spa *mSpa;
^
../../../../dist/include/mozilla/net/DataChannel.h:64:0: note: a field of same name but different type is defined in another translation unit
/aux/hubicka/firefox4/netwerk/sctp/datachannel/../src/usrsctp.h:198:8: note: type ?struct sctp_sendv_spa? should match type ?struct sctp_sendv_spa? but is defined in different namespace
struct sctp_sendv_spa {
^
../../../../dist/include/mozilla/net/DataChannel.h:60:0: note: the incompatible type is defined here
Here the type declaration is unchanged, but one of types it is built from has changed. This case shows also a namespace clash."Upgrading" codebase from C to C++
Because C does not have ODR, updating compiling code that was originally written as a C code and later upgraded to C++ needs audit for ODR violations. This is what the warning found for GCC: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01402.htmlLinking multiple versions of given library into the program
This is particularly common case for large programs. For example:/aux/hubicka/firefox4/content/media/fmp4/ffmpeg/libav53/../FFmpegDecoderModule.h:18:0: note: a type with different memory representation is defined in another translation unit
class FFmpegDecoderModule : public PlatformDecoderModule
^
/aux/hubicka/firefox4/content/media/fmp4/ffmpeg/libav53/include/libavcodec/avcodec.h:985:0: warning: type ?struct AVFrame? violates one definition rule [-Wodr]
typedef struct AVFrame {
^
/aux/hubicka/firefox4/content/media/fmp4/ffmpeg/libav54/include/libavcodec/avcodec.h:989:0: note: a different type is defined in another translation unit
typedef struct AVFrame {
^
/aux/hubicka/firefox4/content/media/fmp4/ffmpeg/libav53/include/libavcodec/avcodec.h:997:0: note: the first difference of corresponding definitions is field ?data?
uint8_t *data[AV_NUM_DATA_POINTERS];
^
is one of many warnings issued during Firefox build. Conditional compilation
In some cases, that looked particularly dangerous for me, there are fields that are compiled conditionally (such #ifdef DEBUG) and the condition is not set consistently across the whole program.This is also quite common case in Firefox. It usually shows itself as a warning at same position but complaining about mismatch of field names.Non-LTO and whole program ODR checker
It seems that ODR violations are very common bugs in large C++ programs. Majority of the ODR problems found are "safe" in a way that they will not result in wrong code: the types in question are never used in overloaded functions or do not have methods attached. There are was however few real bugs and many of those random clashes have a potential to become real bugs later.Maintainers of the programs usually welcome the reported issues and fixed them quite promptly. Especially welcome response came from Michael Meeks:
It seems to me it would make sense to implement ODR checking that is not dependent on LTO and do not suffer from limitations of gold in the following way:Jan - wow - that is a nice error =) are there any other ODR issues ?
they habitually bite us hard so ... great to get libmerged debugged even
more. CC'ing the list too.
- Extend DWARF info by mangled names of types.
- Implement ODR based debug info merging in linker and warn on mismatches.
Merging of ODR types would likely reduce debug info size considerably (judging from GCC and LLVM memory footprint issues).
Linker has also chance to examine shared libraries linked into the program as long as they do have the annotated debug info available. This is not possible in my LTO implementation. - Implement dynamic library that will optionally verify types at dynamic linking time. This way one can report issues caused by dlopen.
reff : http://hubicka.blogspot.com/2014/09/devirtualization-in-c-part-6-enforcing.html Tags:
Langganan:
Posting Komentar (Atom)
0 Responses to “Devirtualization in C++, part 7 (Enforcing One Definition Rule)”
Posting Komentar