ILLIAC IV
The ILLIAC IV was one of the most infamous supercomputers ever built. One of a series of research machines, the ILLIACs from the University of Illinois, the ILLIAC IV design featured fairly high parallelism with up to 256 processors, which allowed the machine to work on large data sets in what would later be known as vector processing. The machine was finally ready for operation in 1976, after a decade of development, by which time it was late, over budget, and outperformed by existing commercial machines like the Cray-1.
Background
By the early 1960s computer designs were approaching the point of diminishing returns. At the time, computer design focused on adding as many instructions as possible to the machine's CPU, a concept known as "orthogonality", which made programs smaller and more efficient in their use of memory. It also made the computers themselves fantastically complex, and in an era when many CPUs were hand-wired from individual transistors, the cost of additional orthogonality was often very high. Adding instructions could potentially slow the machine down; maximum speed was defined by the signal timing in the hardware, which was in turn a function of the overall size of the machine. The state-of-the-art hardware design techniques of the time used individual transistors to build up logic circuits, so any increase in logic processing meant a larger machine. CPU speeds appeared to be reaching a plateau.
Several solutions to these problems were explored in the 1960s. One, then known as overlap but today known as an instruction pipeline, allows a single CPU to work on small parts of several instructions at a time. Normally the CPU would fetch an instruction from memory, "decode" it, run the instruction and then write the results back to memory. While the machine is working on any one stage, say decoding, the other portions of the CPU are not being used. Pipelining allows the CPU to start the fetch and decode stages (for instance) on the "next" instruction while still working on the last one and writing it out. Pipelining was a major feature of Seymour Cray's groundbreaking design, the CDC 7600, which outperformed almost all other machines by about ten times when it was introduced.
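The benefit of overlap can be illustrated with a rough cycle count. The sketch below assumes an idealized machine in which every stage takes exactly one cycle and nothing ever stalls; the function names are hypothetical and the model does not describe the timing of any real machine.

```python
def unpipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """The first instruction fills the pipeline; after that one completes per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


if __name__ == "__main__":
    # Four stages as described in the text: fetch, decode, execute, write back.
    for n in (1, 10, 1000):
        print(n, unpipelined_cycles(n, 4), pipelined_cycles(n, 4))
```

Under these assumptions the speedup approaches the number of stages as the instruction stream grows, which is why the technique pays off mainly on long runs of instructions.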
Another solution to the problem was parallel computing: building a computer out of a number of general purpose CPUs. The "computer" as a whole would have to be able to keep all of the CPUs busy, asking each one to work on a small part of the problem and then collecting up the results at the end into a single "answer". Not all tasks can be handled in this fashion, and extracting performance from multiple processors remains a problem even today, yet the concept has the advantage of having no theoretical limit to speed: if you need more performance, simply add more CPUs. General purpose CPUs were very expensive, however, so any "massively parallel" design would either be too expensive to be worth it, or have to use a much simpler CPU design.
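Solomon
Westinghouse explored the latter solution in a project known as Solomon. Since the highest performing computers were being used primarily for math processing in science and engineering, they decided to focus their CPU design on math alone. They designed a system in which the instruction stream was fetched and decoded by a single CPU, the "control unit" or CU. The CU was attached to an array of processors built to handle floating point math only, the "processing elements", or PEs. Since much of the complexity of a CPU is due to the instruction fetching and decoding process, Solomon's PEs ended up being much simpler than the CU, so many of them could be built without driving up the price. Modern microprocessor designs are quite similar to this layout in general terms, with a single instruction decoder feeding a number of subunits dedicated to processing certain types of data. Where Solomon differed from modern designs was in the number of subunits: a modern CPU might have three or four integer units and a similar number of floating point units, whereas in Solomon there were 256 PEs, all dedicated to floating point.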
Solomon would read instructions from memory, decode them, and then hand them off to the PEs for processing. Each PE had its own memory for holding operands and results, the PE Memory module, or PEM. The CU could access the entire memory via a dedicated memory bus, whereas the PEs could only access their own PEM. Although there are problems, known as embarrassingly parallel, that can be handled by entirely independent units, these problems are generally rare. To allow results from one PE to be used as inputs in another, a separate network connected each PE to its eight closest neighbors. Similar arrangements were common on massively parallel machines in the 1980s.
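An eight-neighbor network can be pictured as follows. This is only a sketch: it assumes the 256 PEs are laid out as a 16 by 16 grid and that the edges wrap around, neither of which is stated in the text, and the function name is hypothetical.

```python
from typing import List, Tuple


def eight_neighbors(row: int, col: int, rows: int = 16, cols: int = 16) -> List[Tuple[int, int]]:
    """Return the grid positions of the eight PEs surrounding (row, col).

    Assumption: a rows x cols grid whose edges wrap around (a torus).
    """
    result = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # a PE is not its own neighbor
            result.append(((row + dr) % rows, (col + dc) % cols))
    return result


if __name__ == "__main__":
    print(eight_neighbors(0, 0))
```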
Unlike modern designs, Solomon's PEs could only run a single instruction at a time, and every PE had to be running the same instruction. That means the system was only useful when working on data sets that had "wide" arrays that could be spread out over the PEs. These sorts of problems are not uncommon in scientific processing, and are very common today when working with multimedia data. The concept of applying a single instruction to a large number of data elements at once is now common to most microprocessor designs, where it is referred to as SIMD, for "Single Instruction, Multiple Data". In Solomon, the CU would normally load up the PEMs with data, scatter the instructions across the PEMs, and then start feeding the instructions to the PEs, one at every clock cycle.
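The lock-step arrangement described above can be modeled in a few lines. The following is a toy sketch, not Solomon's actual instruction set: the `load`, `add` and `store` operations, the single working register and the three-word PEM layout are all assumptions made for illustration.

```python
from typing import Callable, List


class ProcessingElement:
    """Toy PE: a private memory (PEM) and a single working register."""

    def __init__(self, pem: List[float]) -> None:
        self.pem = pem
        self.acc = 0.0


# An "instruction" here is any function that acts on one PE (hypothetical model).
Instruction = Callable[[ProcessingElement], None]


def load(addr: int) -> Instruction:
    def op(pe: ProcessingElement) -> None:
        pe.acc = pe.pem[addr]
    return op


def add(addr: int) -> Instruction:
    def op(pe: ProcessingElement) -> None:
        pe.acc += pe.pem[addr]
    return op


def store(addr: int) -> Instruction:
    def op(pe: ProcessingElement) -> None:
        pe.pem[addr] = pe.acc
    return op


def run_program(pes: List[ProcessingElement], program: List[Instruction]) -> None:
    """Control unit: issue one instruction per step; every PE executes it."""
    for instruction in program:
        for pe in pes:  # conceptually simultaneous across all PEs
            instruction(pe)


def vector_add_demo() -> List[float]:
    # Each PEM holds [a_i, b_i, slot for the result].
    pes = [ProcessingElement([float(i), 10.0 * i, 0.0]) for i in range(1, 5)]
    run_program(pes, [load(0), add(1), store(2)])
    return [pe.pem[2] for pe in pes]


if __name__ == "__main__":
    print(vector_add_demo())
```

Three broadcast instructions add two four-element vectors here; the same three instructions would serve for 256 elements if there were 256 PEs, which is the appeal of the design for "wide" data.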
Under a contract from the US Air Force's RADC research arm, Westinghouse built a breadboard prototype machine in 1964, but the RADC contract ended and Westinghouse decided not to follow it up on its own.
ILLIAC IV
When Solomon ended, the principal investigator, Daniel Slotnick, managed to gain the interest of Burroughs, who at that time was not able to serve the high-end scientific market. However, development of such a machine for an unknown customer base was risky, and Slotnick arranged for the University of Illinois to be both initial customer and development partner. As the performance of the machine was much more than the University could make good use of, it was expected that time on the machine would be rented out to commercial users. In 1964 the University signed a contract with ARPA to fund the effort, which became known as ILLIAC IV, following in line from a number of earlier research machines developed there. Development started in 1965, and a first-pass design was completed in 1966.
In many ways the machine was treated as an experimental design, so it included the most advanced features then available. The logic circuits were based on ECL integrated circuits (ICs), whereas many machines of the era still relied on individual transistors or low-speed ICs. Texas Instruments was contracted for the ECL-based ICs. Each PE was given 2,048 words of 240 ns thin film memory (later replaced with semiconductor memory) for storing results. Burroughs also supplied the specialized disk drives, which featured a separate stationary head for every track, offered transfer rates of up to 500 Mbit/s, and stored about 80 MB per 36-inch disk. They also provided a Burroughs B6500 mainframe to act as a front-end controller. Connected to the B6500 was a laser optical recording medium, a write-once system that stored up to 1 Tbit on a plastic disk covered with a thin metal film.
The ILLIAC was a 64-bit design, in a pre-ASCII era when 48-bit machines were more common and no word length could be considered "standard". The CPU had sixty-four 64-bit registers and another four 64-bit accumulators. The PEs had only six 64-bit registers, each with a special purpose. One of these, RGR, was used for communicating data to neighboring PEs, moving one "hop" per clock cycle. Another, RGD, indicated whether or not that PE was currently active. The PEs had instruction formats for 64, 32 and 8-bit data, and could be placed into a 32-bit mode that made it appear that there were 128 PEs.
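The roles of the two registers named above can be sketched as follows. The register names RGR and RGD come from the text; everything else is an assumption for illustration, in particular the one-dimensional ring used for routing, which says nothing about the machine's real interconnect.

```python
from typing import List


def masked_add(values: List[int], increments: List[int], active: List[bool]) -> List[int]:
    """Apply the same add in every PE, but only where the RGD-style flag is set."""
    return [v + inc if on else v for v, inc, on in zip(values, increments, active)]


def route_one_hop(rgr: List[int]) -> List[int]:
    """Move each PE's RGR value one hop to the next PE (assumed ring topology)."""
    return rgr[-1:] + rgr[:-1]


if __name__ == "__main__":
    print(masked_add([1, 2, 3, 4], [10, 10, 10, 10], [True, False, True, False]))
    print(route_one_hop([1, 2, 3, 4]))
```

The activity flag is what lets a lock-step machine express a conditional: all PEs receive the instruction, and the inactive ones simply leave their data unchanged.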
The design goal called for a computer with the ability to process 1 billion floating point operations per second, or in today's terminology, 1 GFLOPS. To do this the basic design would require 256 PEs running on a 13 MHz clock, driven by four CPUs. Originally they intended to house all 256 PEs in a single large mainframe, but the project quickly ran behind schedule. Instead, a modification was made to divide the ALUs into quadrants of 64 with a single CU each, housed in separate cabinets. Eventually it became clear that only one quadrant would become available in any realistic timeframe, reducing performance from 1 GFLOPS to about 200 MFLOPS.
Work at the University was primarily aimed at ways to efficiently fill the PEs with data. Unless the "problem" being fed into the computer could be parallelized in SIMD fashion, the ILLIAC would be no faster than any other computer, and much slower than designs from companies like Control Data, which featured much higher clock rates. To make this as easy as possible, several new computer languages were created: IVTRAN and TRANQUIL were parallelized versions of FORTRAN, and Glypnir was a similar conversion of ALGOL. Generally these languages provided support for loading arrays of data "across" the PEs to be executed in parallel, and some even supported the unwinding of loops into array operations.
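Unwinding a loop into array operations can be illustrated with a small sketch. It is written in Python rather than in IVTRAN, TRANQUIL or Glypnir, whose syntax is not shown here; the function names and the strip width of 64 (one element per PE in a single quadrant) are assumptions for illustration.

```python
from typing import List


def scalar_loop(a: List[float], b: List[float]) -> List[float]:
    """Conventional form: one element is handled per loop iteration."""
    c = [0.0] * len(a)
    for i in range(len(a)):
        c[i] = a[i] * 2.0 + b[i]
    return c


def array_form(a: List[float], b: List[float], n_pes: int = 64) -> List[float]:
    """Array form: the data is cut into strips of n_pes elements, and within a
    strip every element receives the same operation, one element per PE."""
    c: List[float] = []
    for start in range(0, len(a), n_pes):
        strip_a = a[start:start + n_pes]
        strip_b = b[start:start + n_pes]
        # On a SIMD machine this whole strip would be one broadcast operation.
        c.extend(x * 2.0 + y for x, y in zip(strip_a, strip_b))
    return c


if __name__ == "__main__":
    a = [float(i) for i in range(130)]
    b = [1.0] * 130
    print(scalar_loop(a, b) == array_form(a, b))
```

The rewrite is only valid because each iteration is independent of the others; a loop in which one iteration consumes the result of the previous one cannot be unwound this way.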
ILLIAC moves
When the computer was being built in the late 1960s, it was met with hostility by protesters who were suspicious of the University's tie with the Department of Defense (through ARPA), and felt that the University had sold out to a conspiracy. The protests reached a boiling point on 9 May 1970, in a day of "Illiaction". Three months after the August 24th bombing at a University of Wisconsin mathematics building, the University of Illinois decided to back out of the project and have the machine moved to a more secure location. The work was picked up by NASA, then still cash-flush in the post-Apollo years and interested in almost anything "high tech". NASA formed a new Advanced Computing division, and had the machine moved to Moffett Field, California, home of Ames Research Center.
The move slowed development, and the machine was not completed until 1972. By this time the original $8 million estimated from the first design in 1966 had risen to $31 million, while the performance had dropped even further, from 1 GFLOPS to 250 MFLOPS to perhaps 100 MFLOPS with peaks of 150. NASA also decided to replace the B6500 with a PDP-10, a type in common use at Ames, but this required the development of new compilers and support software. When the ILLIAC was finally turned on in 1972 it was found to be barely operable, failing continually. Efforts to improve reliability allowed it to run its first complete program in 1974, and go into full operation in 1975. Even "full operation" was somewhat limited; the machine was operated only Monday to Friday and had up to 40 hours of planned maintenance a week. The first full application was run on the machine in 1976, the same year the Cray-1 was released with roughly the same performance.
Nevertheless the ILLIAC was increasingly used over the next few years, and Ames added their own FORTRAN version, CFD. On problems that could be parallelized the machine was still the fastest in the world, outperforming the CDC 7600 by two to six times, and it is generally credited as the fastest machine in the world until 1981. For NASA the machine was "perfect", as its performance was tuned for programs running the same operation on lots of data, which is exactly what computational fluid dynamics is all about.
The machine was eventually decommissioned in 1982, and NASA's advanced computing division ended with it.
Burroughs was able to use the basic design for only one commercial system, the Parallel Element Processing Ensemble, or PEPE. PEPE was designed to allow high-accuracy tracking of 288 incoming ICBM warheads, each one assigned to a modified PE. Burroughs built only one PEPE system, although a follow-on design was built by Bell Labs.
The ILLIAC IV control unit and one processing element chassis are now at the Computer History Museum in Mountain View, California.
are excellent examples of the "classic" ILLIAC IV concept, although they also included far better interconnectivity between their PE's in order to avoid data bottlenecks that reduced the problem set suitable for use on the ILLIAC.
Most supercomputers of the era took another approach to higher performance, using a single very high speed vector processor. Similar to the ILLIAC in concept at least, these processor designs loaded up many data elements into a single custom processor instead of a large number of low-powered ones. The classic example of this design is the Cray-1, which had performance similar to the ILLIAC, but was able to provide this high performance on a much wider variety of problems, not just those that were highly parallel. There was more than a little "backlash" against the ILLIAC design as a result, and for some time the supercomputer market looked on massively parallel designs with disdain, even when they were successful. As Seymour Cray famously quipped, "If you were plowing a field, which would you rather use? Two strong oxen or 1024 chickens?"
But time has proven the ILLIAC approach to be the better one for almost all scientific computing. Today, supercomputers are almost universally made up from large numbers of commodity computers, precisely the concept that the ILLIAC pioneered. Progress in compiler technology explains much of this, although the rapid, and perhaps unexpected, continued improvement in microprocessor design rendered custom vector designs slower in most workloads.
ILLIAC IV
When Solomon ended, the principal investigator, Daniel SlotnickDaniel Slotnick
Daniel Leonid Slotnick was a mathematician and computer architect. Slotnick, in papers published with John Cocke in 1958, discussed the use of parallelism in numerical calculations for the first time. He later served as the chief architect of the ILLIAC IV supercomputer.-References:...
, managed to gain the interest of Burroughs, who at that time was not able to serve the high-end scientific market. However, development of such a machine for an unknown customer base was risky, and Slotnick arranged for the University of Illinois
University of Illinois at Urbana-Champaign
The University of Illinois at Urbana–Champaign is a large public research-intensive university in the state of Illinois, United States. It is the flagship campus of the University of Illinois system...
to be both initial customer and development partner. As the performance of the machine was much more than the University could make good use of, it was expected that time on the machine would be rented out to commercial users. In 1964 the University signed a contract with DARPA to fund the effort, which became known as ILLIAC IV, following in line from a number of earlier research machines developed there. Development started in 1965, and a first-pass design was completed in 1966.
In many ways the machine was treated as an experimental design, so it included the most advanced features then available. The logic circuits were based on ECL
Emitter coupled logic
In electronics, emitter-coupled logic , is a logic family that achieves high speed by using an overdriven BJT differential amplifier with single-ended input, whose emitter current is limited to avoid the slow saturation region of transistor operation....
integrated circuit
Integrated circuit
An integrated circuit or monolithic integrated circuit is an electronic circuit manufactured by the patterned diffusion of trace elements into the surface of a thin substrate of semiconductor material...
s (ICs), whereas many machines of the era still relied on individual transistor
Transistor
A transistor is a semiconductor device used to amplify and switch electronic signals and power. It is composed of a semiconductor material with at least three terminals for connection to an external circuit. A voltage or current applied to one pair of the transistor's terminals changes the current...
s or low-speed ICs. Texas Instruments
Texas Instruments
Texas Instruments Inc. , widely known as TI, is an American company based in Dallas, Texas, United States, which develops and commercializes semiconductor and computer technology...
was contracted for the ECL based ICs. Each PE was given 2048-words of 240 ns thin film memory
Thin film memory
Thin-film memory is a high-speed variation of core memory developed by Sperry Rand in a government-funded research project.Instead of threading individual ferrite cores on wires, thin-film memory consisted of 4 micrometre thick dots of permalloy, an iron-nickel alloy, deposited on small glass...
(later replaced with semiconductor
Semiconductor
A semiconductor is a material with electrical conductivity due to electron flow intermediate in magnitude between that of a conductor and an insulator. This means a conductivity roughly in the range of 103 to 10−8 siemens per centimeter...
memory) for storing results. Burroughs also supplied the specialized disk drives, which featured a separate stationary head for every track and could offer speeds up to 500 Mbit/s and stored about 80 MB
Megabyte
The megabyte is a multiple of the unit byte for digital information storage or transmission with two different values depending on context: bytes generally for computer memory; and one million bytes generally for computer storage. The IEEE Standards Board has decided that "Mega will mean 1 000...
per 36" disk. They also provided a Burroughs B6500 mainframe to act as a front-end controller. Connected to the B6500 was a laser optical recording medium, a write-once system that stored up to 1 Tbit
Terabit
The terabit is a multiple of the unit bit for digital information or computer storage. The prefix tera is defined in the International System of Units as a multiplier of 1012 , and therefore...
on a plastic disk covered with a thin metal film.
The ILLIAC was a 64-bit
64-bit
64-bit is a word size that defines certain classes of computer architecture, buses, memory and CPUs, and by extension the software that runs on them. 64-bit CPUs have existed in supercomputers since the 1970s and in RISC-based workstations and servers since the early 1990s...
design, in a pre-ASCII
ASCII
The American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...
era when 48-bit
Bit
A bit is the basic unit of information in computing and telecommunications; it is the amount of information stored by a digital device or other physical system that exists in one of two possible distinct states...
machines were more common and no word length could be considered "standard". The CPU had sixty-four 64-bit registers
Processor register
In computer architecture, a processor register is a small amount of storage available as part of a CPU or other digital processor. Such registers are addressed by mechanisms other than main memory and can be accessed more quickly...
and another four 64-bit accumulators. The PEs had only six 64-bit registers, each with a special purpose. One of these, RGR, was used for communicating data to neighboring PEs, moving one "hop" per clock cycle. Another, RGD, indicated whether or not that PE was currently active. The PEs had instruction formats for 64, 32 and 8-bit data, and could be placed into a 32-bit mode that made it appear that there were 128 PEs.
The design goal called for a computer with the ability to process 1 billion floating point operations per second, or in today's terminology, 1 GFLOPS. To do this the basic design would require 256 PEs running on a 13 MHz clock, driven by four CPUs. The designers originally intended to house all 256 PEs in a single large mainframe, but the project quickly ran behind schedule. Instead, the design was modified to divide the ALUs into quadrants of 64 with a single CU each, housed in separate cabinets. Eventually it became clear that only one quadrant would become available in any realistic timeframe, reducing performance from 1 GFLOPS to about 200 MFLOPS.
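The figures in this paragraph can be checked with a little arithmetic. The Python sketch below uses only the numbers given in the text (256 PEs, 64 PEs per quadrant, a 13 MHz clock, a 1 GFLOPS target); the function names are hypothetical, and the assumption that throughput scales linearly with the number of PEs is a simplification. It derives the implied number of clock cycles per floating point operation and then scales the target down to a single quadrant.

```python
PES_FULL = 256          # PEs in the full design
PES_PER_QUADRANT = 64   # PEs in the single quadrant that was built
CLOCK_HZ = 13e6         # 13 MHz clock
TARGET_FLOPS = 1e9      # 1 GFLOPS design goal


def cycles_per_flop(num_pes, clock_hz, target_flops):
    """Clock cycles each PE may spend per operation to hit the target."""
    return num_pes * clock_hz / target_flops


def scaled_peak(num_pes, clock_hz, cycles_per_op):
    """Peak operations per second, assuming perfectly linear scaling."""
    return num_pes * clock_hz / cycles_per_op


budget = cycles_per_flop(PES_FULL, CLOCK_HZ, TARGET_FLOPS)       # about 3.3 cycles
one_quadrant = scaled_peak(PES_PER_QUADRANT, CLOCK_HZ, budget)   # about 250 MFLOPS

print(f"cycles per operation: {budget:.2f}")
print(f"one-quadrant peak: {one_quadrant / 1e6:.0f} MFLOPS")
```

A straight four-way division of the target gives roughly 250 MFLOPS, the same order as the "about 200 MFLOPS" quoted above.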
Work at the University was primarily aimed at ways to efficiently fill the PEs with data. Unless the "problem" being fed into the computer could be parallelized in SIMD fashion, the ILLIAC would be no faster than any other computer, and much slower than designs from companies like Control Data, which featured much higher clock rates. In order to make this as easy as possible, several new computer languages were created; IVTRAN and TRANQUIL were parallelized versions of FORTRAN, and Glypnir was a similar conversion of ALGOL. Generally these languages provided support for loading arrays of data "across" the PEs to be executed in parallel, and some even supported the unwinding of loops into array operations.
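The idea of spreading an array "across" the PEs and turning a loop into array operations can be sketched in a few lines. The following Python example is an illustration only and does not reproduce the syntax or behavior of IVTRAN, TRANQUIL or Glypnir; the function names `strips`, `scalar_loop` and `array_form` are hypothetical. It assumes 64 PEs and counts one lockstep step for each strip of up to 64 elements.

```python
NUM_PES = 64  # assumed number of PEs working in lockstep


def strips(data, num_pes=NUM_PES):
    """Yield successive slices of at most num_pes elements, one element per PE."""
    for start in range(0, len(data), num_pes):
        yield data[start:start + num_pes]


def scalar_loop(a, b):
    """The loop as written for a conventional machine: one element per step."""
    c = []
    for i in range(len(a)):
        c.append(a[i] + b[i])
    return c


def array_form(a, b, num_pes=NUM_PES):
    """The same loop rewritten as whole-strip operations.

    Returns the result and the number of lockstep steps used.
    """
    c, steps = [], 0
    for strip_a, strip_b in zip(strips(a, num_pes), strips(b, num_pes)):
        # Conceptually every PE performs its addition at the same time.
        c.extend(x + y for x, y in zip(strip_a, strip_b))
        steps += 1
    return c, steps


a = list(range(200))
b = [10] * 200
result, steps = array_form(a, b)
assert result == scalar_loop(a, b)
print(steps)  # 4 strips for 200 elements, versus 200 scalar iterations
```

The last strip holds only 8 elements, so 56 PEs would sit idle during that step; keeping the PEs full was exactly the problem the language work was trying to address.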
ILLIAC moves
When the computer was being built in the late 1960s, it was met with hostility by protesters who were suspicious of the University's tie with the Department of Defense (through ARPA), and felt that the University had sold out to a conspiracy. The protests reached a boiling point on 9 May 1970, in a day of "Illiaction". Three months after the 24 August 1970 bombing at a University of Wisconsin mathematics building, the University of Illinois decided to back out of the project, and have it moved to a more secure location. The work was picked up by NASA, then still cash-flush in the post-Apollo years and interested in almost anything "high tech". They formed a new Advanced Computing division, and had the machine moved to Moffett Field, California, home of Ames Research Center.
The move slowed development, and the machine was not completed until 1972. By this time the original $8 million estimated from the first design in 1966 had risen to $31 million, while the performance had dropped even further, from 1 GFLOPS to 250 MFLOPS to perhaps 100 MFLOPS with peaks of 150. NASA also decided to replace the B6500 with a PDP-10, a type of machine in common use at Ames, but this required the development of new compilers and support software. When the ILLIAC was finally turned on in 1972 it was found to be barely operable, failing continually. Efforts to improve its reliability allowed it to run its first complete program in 1974, and go into full operation in 1975. Even "full operation" was somewhat limited; the machine was operated only Monday to Friday and had up to 40 hours of planned maintenance a week. The first full application was run on the machine in 1976, the same year the Cray-1 was released with roughly the same performance.
Nevertheless, the ILLIAC was increasingly used over the next few years, and Ames added their own FORTRAN version, CFD. On problems that could be parallelized the machine was still the fastest in the world, outperforming the CDC 7600 by two to six times, and it is generally credited as the fastest machine in the world until 1981. For NASA the machine was "perfect", as its performance was tuned for programs running the same operation on lots of data, which is exactly what computational fluid dynamics is all about.
The machine was eventually decommissioned in 1982, and NASA's advanced computing division ended with it.
Burroughs was able to use the basic design for only one commercial system, the Parallel Element Processing Ensemble, or PEPE. PEPE was designed to allow high-accuracy tracking of 288 incoming ICBM warheads, each one assigned to a modified PE. Burroughs built only one PEPE system, although a follow-on design was built by Bell Labs.
The ILLIAC IV control unit and one processing element chassis are now at the Computer History Museum in Mountain View, California.
Aftermath
Although the ILLIAC effort ended in uninspiring results, attempts to understand the reasons for the failure of the ILLIAC IV architecture pushed forward research in parallel computing. During the 1980s a number of companies used the same approach to build even more parallel machines, with compilers that could make better use of the parallelism. The Thinking Machines CM-1 and CM-2 are excellent examples of the "classic" ILLIAC IV concept, although they also included far better interconnectivity between their PEs in order to avoid the data bottlenecks that had limited the set of problems suitable for the ILLIAC.
Most supercomputers of the era took another approach to higher performance, using a single very high speed vector processor. Similar to the ILLIAC in concept at least, these processor designs loaded up many data elements into a single custom processor instead of a large number of low-powered ones. The classic example of this design is the Cray-1, which had performance similar to the ILLIAC, but was able to provide this high performance on a much wider variety of problems, not just those that were highly parallel. There was more than a little "backlash" against the ILLIAC design as a result, and for some time the supercomputer market looked on massively parallel designs with disdain, even when they were successful. As Seymour Cray famously quipped, "If you were plowing a field, which would you rather use? Two strong oxen or 1024 chickens?"
But time has proven the ILLIAC approach to be the better one for almost all scientific computing. Today, supercomputers are almost universally made up from large numbers of commodity computers, precisely the concept that the ILLIAC pioneered. Progress in compiler technology explains much of this, although the rapid, and perhaps unexpected, continued improvement in microprocessor design rendered custom vector designs slower in most workloads.
See also
- Amdahl's law
- ORDVAC
- ILLIAC I
- ILLIAC II
- ILLIAC III
- Parallel Element Processing Ensemble
External links
- ILLIAC IV documentation at bitsavers.org
- Oral history interview with Ivan Sutherland, Charles Babbage Institute, University of Minnesota. Sutherland describes his tenure from 1963-65 as head of the Information Processing Techniques Office (IPTO) and new initiatives such as ILLIAC IV.