Fragile binary interface problem
The fragile binary interface problem, or FBI, is a shortcoming of certain object-oriented programming language compilers, in which internal changes to an underlying class library can cause descendant libraries or programs to cease working. It is an example of software brittleness.
Note that this problem is more often called the fragile base class problem, or FBC; however, that term also has a different (but related) sense. (See fragile base class.)
Cause
The problem occurs due to a "shortcut" used with compilers for many common object-oriented (OO) languages. This design feature was kept when OO languages evolved from earlier non-OO structured programming languages such as C and Pascal.
In these languages there were no objects in the modern sense. There was, however, a similar construct known as a record (or "struct" in C) that held a variety of related information in one piece of memory. The parts within a particular record were accessed by keeping track of the starting location of the record and knowing the offset from that starting point to the part in question. For instance, a "person" record might have a first name, a last name and a middle initial. To access the initial, the programmer writes thisPerson.middleInitial, which the compiler turns into something like a = location(thisPerson) + offset(middleInitial). Modern CPUs typically include instructions for this common sort of access.
When object-oriented language compilers were first being developed, much of the existing compiler technology was reused, and objects were built on top of the record concept. In these languages objects were referred to by their starting point. Their public data, known as "fields", were accessed through known offsets. In effect, the only change was to add another field to the record, one that lists the various methods (functions), so that the record knows about both its data and its functions. When the program is compiled, these offsets are used to access both the data and the code.
Symptoms
This leads to a problem in larger programs when they are constructed from libraries. Suppose the author of a library changes the size or layout of the public fields within an object. The offsets compiled into existing programs then become invalid, and those programs will no longer work correctly. This is the FBI problem.
Changes in implementation may be expected to cause problems. The insidious thing about FBI is that nothing the client depends on visibly changed; only the layout of the object, hidden in a compiled library, is different. One might expect that changing doSomething to doSomethingElse could cause a problem. With FBI, however, problems can arise without changing doSomething at all. Something as simple as moving lines of source code around for clarity can trigger them. Worse, the programmer has little or no control over the layout the compiler generates, which keeps the problem almost completely hidden from view.
In complex object-oriented programs or libraries, the highest-level classes may inherit from tens of classes. Each of those base classes could in turn be inherited by hundreds of other classes. These base classes are fragile because a small change to one of them can cause problems for any class that inherits from it, either directly or by inheriting from another class that does. A single change to a base class can therefore damage many classes at once, making the library collapse like a house of cards. If the inheritance tree is complex, the problem may go unnoticed while the modifications are being written.
Languages
The best solution to the fragile binary interface problem is a language designed with the problem in mind, one that does not let it happen in the first place. Most custom-written OO languages, as opposed to those evolved from earlier languages, construct all of their offset tables at load time, so changes to the layout of a library are "noticed" at that point. Other OO languages, like Self, construct everything at runtime by copying and modifying the objects found in the libraries. They therefore do not really have a base class that can be fragile. Some languages, like Java, have extensive documentation on which changes are safe to make without causing FBI problems.
Another solution is to have the compiler record the offsets and other layout information as metadata alongside the compiled code. The linker or loader then uses this information to fix up the offsets when the library is loaded into an application. Platforms such as .NET do this.
However, the market has favored programming languages such as C++ that are indeed "position dependent" and therefore exhibit FBI. Even in these languages there are a number of solutions to the problem. One puts the burden on the library author, who inserts a number of "placeholder" fields in case additional functionality needs to be added in the future. This can be seen in the structs used in the DirectX library. This solution works well until the placeholders run out, and the author does not want to add too many of them, because they take up memory.
Linkers
Another solution requires a smarter linker. In Objective-C, the library format allowed for multiple versions of one library and included some functionality for selecting the proper library when called. However, this was not always needed, because offsets were only needed for fields. Methods were looked up at runtime and could not cause FBI. Since methods tend to change more often than fields, Objective-C had few FBI problems in the first place, and those it did have could be corrected with the versioning system. The TOM language extended this even further, using runtime-collected offsets for everything and making FBI impossible.
Using static instead of dynamic libraries where possible is another solution, as the library then cannot be modified without also recompiling the application and updating the offsets it uses. However, static libraries have serious problems of their own, such as larger binaries and the inability to use newer versions of the library "automatically" as they are introduced.
The vast majority of programming languages in use today do nothing to protect the programmer from FBI. This is somewhat surprising, as the problem has been known about since the 1980s.
Architecture
In languages that do not prevent FBI outright, the problem can be lessened in two ways. One is enforcing single inheritance, as this reduces the complexity of the inheritance tree. The other is using interfaces instead of base classes with virtual functions. Interfaces themselves do not contain code, only a guarantee that each method signature the interface declares will be supported by every object that implements the interface.
Distribution method
The whole problem disappears if the source code of the libraries used is available, since the application can then simply be recompiled against the new version of the library.
External links
- BeOS's paper on the problem and their solution