Software archaeology
Software archaeology or software archeology is the study of poorly documented or undocumented legacy software implementations, as part of software maintenance. Software archaeology, named by analogy with archaeology, includes the reverse engineering of software modules, and the application of a variety of tools and processes for extracting and understanding program structure and recovering design information. Software archaeology may reveal dysfunctional team processes which have produced poorly designed or even unused software modules. The term has been in use for several decades, and reflects a fairly natural metaphor: a programmer reading legacy code may feel that he or she is in the same situation as an archaeologist exploring the rubble of an ancient civilization.
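To make "extracting program structure" concrete, the following is a minimal Python sketch built on the standard-library `ast` module. It lists the top-level functions and the methods of top-level classes, each paired with the names it calls. It assumes the legacy code is Python, although the idea applies to any language that has a parser. The names `outline` and `_callees` are hypothetical. The sketch only sees direct, syntactic calls. Calls made through variables or dynamic dispatch are missed, and a method call such as `f.read()` is recorded only as `read`.

```python
from __future__ import annotations

import ast

# Hypothetical helpers for illustration; not part of any existing tool.


def _callees(node: ast.AST) -> list[str]:
    """Names of everything called inside `node`, sorted and de-duplicated."""
    names = set()
    for sub in ast.walk(node):
        if isinstance(sub, ast.Call):
            if isinstance(sub.func, ast.Name):        # plain call: parse(...)
                names.add(sub.func.id)
            elif isinstance(sub.func, ast.Attribute):  # method call: f.read()
                names.add(sub.func.attr)
    return sorted(names)


def outline(source: str) -> dict[str, list[str]]:
    """Map top-level functions and methods of top-level classes to their callees."""
    result: dict[str, list[str]] = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            result[node.name] = _callees(node)
        elif isinstance(node, ast.ClassDef):
            for item in node.body:
                if isinstance(item, (ast.FunctionDef, ast.AsyncFunctionDef)):
                    result[f"{node.name}.{item.name}"] = _callees(item)
    return result


if __name__ == "__main__":
    demo = "def load(path):\n    return parse(open(path).read())\n"
    print(outline(demo))  # {'load': ['open', 'parse', 'read']}
```

Running `outline` over every file in a legacy tree gives a rough first map of who calls whom. That map can then be checked against the code by hand.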
Techniques
A workshop on Software Archaeology at the 2001 OOPSLA (Object-Oriented Programming, Systems, Languages & Applications) conference identified the following software archaeology techniques, some of which are specific to object-oriented programming:
- Scripting languages to build static reports and for filtering diagnostic output
- Ongoing documentation in HTML pages or Wikis
- Synoptic signature analysis, statistical analysis, and software visualization tools
- Reverse-engineering tools
- Operating-system-level tracing via truss or strace
- Search engines and tools to search for keywords in source files
- IDE file browsing
- Test harnesses such as JUnit and CppUnit
- API documentation generation using tools such as Javadoc and doxygen
- Debuggers
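Synoptic signature analysis, one of the techniques listed above, can be approximated in a few lines. This hypothetical sketch reduces each file to its semicolons and curly braces and prints one line per file, so that files with an unusual shape stand out. The kept character set and the names `synoptic_signature` and `survey` are assumptions. The sketch also keeps punctuation that appears inside comments and string literals, which a more careful tool would skip.

```python
from __future__ import annotations

# Hypothetical helpers for illustration. Assumption: C-family code, where
# braces and semicolons trace the shape of blocks and statements.
SIGNATURE_CHARS = "{};"


def synoptic_signature(source: str, keep: str = SIGNATURE_CHARS) -> str:
    """Drop everything except the characters in `keep`, preserving order.

    Naive: punctuation inside comments and string literals is kept too."""
    return "".join(ch for ch in source if ch in keep)


def survey(files: dict[str, str]) -> str:
    """One line per file: the file name, then its signature."""
    width = max((len(name) for name in files), default=0)
    return "\n".join(
        f"{name:<{width}}  {synoptic_signature(text)}"
        for name, text in sorted(files.items())
    )


if __name__ == "__main__":
    print(survey({
        "util.c": "int add(int a, int b) { return a + b; }",
        "main.c": "int main(void) { if (x) { y(); } return 0; }",
    }))
    # main.c  {{;};}
    # util.c  {;}
```

Nesting depth shows up as runs of `{`, and long statement sequences show up as runs of `;`. Scanning a few hundred of these lines gives the overall "feel" of a code base without reading any of it.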
More generally, Andy Hunt and Dave Thomas note the importance of version control, dependency management, text indexing tools such as GLIMPSE and SWISH-E, and "[drawing] a map as you begin exploring."
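A text index pays off because the source tree is scanned once and then queried many times. The following toy inverted index over identifiers illustrates the idea. It is a minimal sketch that assumes files are supplied as a mapping from path to text. It is not how GLIMPSE or SWISH-E are implemented, and `build_index` and `lookup` are hypothetical names.

```python
from __future__ import annotations

import re
from collections import defaultdict

# Hypothetical toy index for illustration; not the GLIMPSE or SWISH-E design.
# An identifier-like token: letter or underscore, then letters, digits, underscores.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_index(files: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """Scan every file once; map lower-cased token -> {path: [line numbers]}."""
    index: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            # A set, so a token repeated on one line is recorded once.
            for token in {t.lower() for t in TOKEN.findall(line)}:
                index[token][path].append(lineno)
    # Convert to plain dicts so lookups of unknown words do not add entries.
    return {token: dict(paths) for token, paths in index.items()}


def lookup(index: dict[str, dict[str, list[int]]], word: str) -> dict[str, list[int]]:
    """Files and line numbers where `word` occurs (case-insensitive)."""
    return index.get(word.lower(), {})


if __name__ == "__main__":
    idx = build_index({
        "billing.c": "int total;\nTotal = tax(total);\n",
        "report.c": "print(total);\n",
    })
    print(lookup(idx, "TOTAL"))  # {'billing.c': [1, 2], 'report.c': [1]}
```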
Like true archaeology, software archaeology involves investigative work to understand the thought processes of one's predecessors. At the OOPSLA workshop, Ward Cunningham suggested a synoptic signature analysis technique, which gives an overall "feel" for a program by showing only its punctuation, such as semicolons and curly braces. In the same vein, Cunningham has suggested viewing programs in a 2-point font in order to understand their overall structure. Another technique identified at the workshop was the use of aspect-oriented programming tools such as AspectJ to systematically introduce tracing code without directly editing the legacy program.
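AspectJ is a Java tool, so the following is only an analogue and not aspect-oriented programming proper. In Python, a tracing wrapper can be installed around a module's functions at run time without editing the module's source, which captures the same idea of adding tracing from outside the legacy code. The helper names `trace_module` and `_traced` are hypothetical. The sketch assumes the legacy code is a loaded Python module.

```python
from __future__ import annotations

import functools
import types

# Hypothetical helpers for illustration; a Python stand-in for AspectJ-style tracing.


def _traced(func, log: list[str]):
    """Wrap `func` so each call and its return value are appended to `log`."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        log.append(f"call {func.__name__}{args}")
        result = func(*args, **kwargs)
        log.append(f"return {func.__name__} -> {result!r}")
        return result
    return wrapper


def trace_module(module: types.ModuleType, log: list[str]) -> None:
    """Install tracing wrappers around every function defined in `module`.

    The module's source file is never modified; only its namespace is patched."""
    for name, obj in list(vars(module).items()):
        # Skip functions the module merely imported from elsewhere.
        if isinstance(obj, types.FunctionType) and obj.__module__ == module.__name__:
            setattr(module, name, _traced(obj, log))


if __name__ == "__main__":
    # Stand-in for a legacy module; in practice this would be an import.
    legacy = types.ModuleType("legacy")
    exec("def area(w, h):\n    return w * h\n", legacy.__dict__)
    calls: list[str] = []
    trace_module(legacy, calls)
    legacy.area(2, 3)
    print(calls)  # ['call area(2, 3)', 'return area -> 6']
```

The wrappers replace the originals in the module's namespace, and functions look up globals at call time. Calls made from inside the module are therefore traced as well. References obtained earlier, for example with `from legacy import area`, still point to the unwrapped function.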
Network and temporal analysis techniques can reveal the patterns of collaborative activity by the developers of legacy software, which in turn may shed light on the strengths and weaknesses of the software artifacts produced.
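The following is a minimal sketch of both kinds of analysis. It assumes commit history has already been exported, for example from a version-control log, as hypothetical `(author, path, year)` records. The network view links two developers once for every file both of them modified. The temporal view counts commits per author per year. The record format and the function names `collaboration_edges` and `activity_by_year` are assumptions, not a standard tool.

```python
from __future__ import annotations

from collections import defaultdict
from itertools import combinations

# Hypothetical record format: (author, path, year). Real histories would
# first need to be parsed out of the version-control system.


def collaboration_edges(commits: list[tuple[str, str, int]]) -> dict[tuple[str, str], int]:
    """Network view: link two authors once for every file both have modified."""
    authors_by_file: dict[str, set[str]] = defaultdict(set)
    for author, path, _year in commits:
        authors_by_file[path].add(author)
    edges: dict[tuple[str, str], int] = defaultdict(int)
    for authors in authors_by_file.values():
        # sorted() gives each pair a canonical order, so (a, b) and (b, a) merge.
        for pair in combinations(sorted(authors), 2):
            edges[pair] += 1
    return dict(edges)


def activity_by_year(commits: list[tuple[str, str, int]]) -> dict[int, dict[str, int]]:
    """Temporal view: number of commits per author in each year."""
    counts: dict[int, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for author, _path, year in commits:
        counts[year][author] += 1
    return {year: dict(per_author) for year, per_author in sorted(counts.items())}
```

A file that appears in no edge was only ever touched by one developer. Read together with the yearly counts, this might hint at knowledge that has since left the team.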
Michael Rozlog of Embarcadero Technologies has described software archaeology as a six-step process which enables programmers to answer questions such as "What have I just inherited?" and "Where are the scary sections of the code?" These steps, similar to those identified by the OOPSLA workshop, include using visualization to obtain a visual representation of the program's design, using software metrics to look for design and style violations, using unit testing and profiling to look for bugs and performance bottlenecks, and assembling design information recovered by the process. Software archaeology can also be a service provided to programmers by external consultants.
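The software-metrics step can be sketched with Python's standard `ast` module. This hypothetical example reports each function's length in lines and a rough approximation of McCabe's cyclomatic complexity, computed as one plus the number of decision points. It then flags functions that exceed assumed thresholds. Real metric tools count more precisely. For instance, this sketch counts a chain such as `a and b and c` as a single decision, and it counts a nested function's decisions in its parent as well as in itself.

```python
from __future__ import annotations

import ast

# Hypothetical metric helpers; thresholds below are assumptions, not standards.
# Nodes treated as decision points (an approximation of McCabe's count).
DECISIONS = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp,
             ast.BoolOp, ast.ExceptHandler)


def function_metrics(source: str) -> list[tuple[str, int, int]]:
    """(name, length in lines, approximate cyclomatic complexity) per function."""
    rows = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1  # needs Python 3.8+
            decisions = sum(isinstance(n, DECISIONS) for n in ast.walk(node))
            rows.append((node.name, length, 1 + decisions))
    return rows


def flag(source: str, max_lines: int = 50, max_complexity: int = 10) -> list[str]:
    """Names of functions exceeding either threshold."""
    return [name for name, length, complexity in function_metrics(source)
            if length > max_lines or complexity > max_complexity]
```

Flagged functions are candidates for the "scary sections of the code". They are natural places to write characterization tests before anything is changed.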
Software archaeology has continued to be a topic of discussion at more recent software engineering conferences.
See also
- Code refactoring
- Software brittleness
- Software rot
External links
- Position papers, OOPSLA 2001 Workshop on Software Archeology: Understanding Large Systems
- Writing code, reading code and software archeology, Once More into the Code blog at Computerworld, September 23, 2009
- How To Apply Software Archeology To Your Development Process, presentation by Michael Rozlog, March 13, 2008
- OOPSLA 2008 Podcast with Grady Booch on software archaeology and related topics