CDC STAR-100
Encyclopedia
The STAR-100 was a vector
supercomputer
designed, manufactured, and marketed by Control Data Corporation
(CDC). It was one of the first machines to use a vector processor
to improve performance on appropriate scientific applications.
The name STAR was a construct of the words STrings and ARrays. The 100 came from 100 million floating point operations per second (MFLOPS), the speed at which the machine was designed to operate. The computer was announced very early during the 1970s and was supposed to be several times faster than the CDC 7600
, which was then the world's fastest supercomputer with a peak performance of 36 MFLOPS. On August 17, 1971, CDC announced that General Motors
had placed the first commercial order for a STAR-100.
A number of basic design features of the machine meant that its "real world" performance was much lower than expected when first used commercially in 1974, and was one of the primary reasons CDC was pushed from its former dominance in the supercomputer market when the Cray-1
was announced in 1975.
was supported by a number of peripheral processors that offloaded housekeeping tasks and allowed the CPU to crunch numbers as quickly as possible. In the STAR, both the CPU and peripheral processors were deliberately simplified, however, to lower the cost and complexity of implementation. The STAR also differed from the earlier designs by being based on a 64-bit architecture instead of 60-bit, a side effect of the increasing use of 8-bit ASCII
processing. Also unlike previous machines, the STAR made heavy use of microcode
and also supported a virtual memory
capability.
The main innovation in the STAR was the inclusion of instructions for vector processing. These new and more complex instructions approximated what was available to users of the APL programming language and operated on huge vectors that were stored in consecutive locations in the main memory. The CPU was designed to use these instructions to set up additional hardware that fed in data from the main memory as quickly as possible. For instance, a program could use single instruction with a few parameters to add all the elements in two vectors that could be as long as 65,535 elements. The CPU only had to decode a single instruction, set up the memory hardware, and start feeding the data into the math units. As with instruction pipelines in general, the performance of any one instruction was no better than it was before, but since the CPU was effectively working on a number of instructions at once (or in this case, data points) the overall performance dramatically improves due to the assembly line nature of the task.
The main memory had a capacity of 65,536 superwords (SWORDs), which are 512-bit words. The main memory was 32-way interleaved
to pipeline memory accesses. It was constructed from core memory with an access time
of 1.28 μs. The main memory was accessed via a 512-bit bus, controlled by the storage access controller (SAC), which handled requests from the stream unit. The stream unit accesses the main memory through the SAC via three 128-bit data buses, two for reads, and one for writes. Additionally, there is a 128-bit data bus for instruction fetch, I/O, and control vector access. The stream unit serves as the control unit, fetching and decoding instructions, initiating memory accesses on the behalf of the pipelined functional units, and controlling instruction execution, among other tasks. It also contains two read buffers and one write buffer for streaming data to the execution units.
The STAR-100 has two pipelines where arithmetic is performed. The first pipeline contains a floating point adder and multiplier, whereas the second pipeline is multifunctional, capable of executing all scalar instructions. It also contains a floating point adder, multiplier, and divider. Both pipelines are 64-bit for floating point operations and are controlled by microcode. The STAR-100 can split its floating point pipelines into four 32-bit pipelines, doubling the peak performance of the system to 100 MFLOPS at the expense of half the precision.
The STAR-100 uses I/O processors to offload I/O from the CPU. Each I/O processor is a 16-bit minicomputer
with its own main memory of 65,536 words of 16 bits each, which is implemented with core memory. The I/O processors all share a 128-bit data bus to the SAC.
When the machine was released in 1974, it quickly became apparent that the general performance was nowhere near what people expected. Very few programs can be effectively vectorized into a series of single instructions; nearly all calculations will rely on the results of some earlier instruction, yet the results had to clear the pipelines before they could be fed back in. This forced most programs to hit the high setup cost of the vector units, and generally the ones that did "work" were extreme examples. Making matters worse was that the basic scalar performance was sacrificed in order to improve vector performance. Any time that the program had to run scalar instructions, the overall performance of the machine dropped dramatically. (See Amdahl's Law
.)
Two STAR-100 systems were eventually delivered to the Lawrence Livermore National Laboratory
and one to NASA Langley Research Center
. In preparation for the STAR deliveries, LLNL programmers developed a library of subroutine
s, called STACKLIB, on the 7600 to emulate the vector operations of the STAR. In the process of developing STACKLIB, it was noticed that STACKLIB-based applications could run even faster on the 7600 than they had prior to the integration of the vector library. This discovery placed further pressures on the performance problems of the STAR.
The STAR-100 was a disappointment to everyone involved, and Jim Thornton, the chief designer, left CDC to form Network Systems Corporation
. An updated version was later released in 1979 as the Cyber 203, followed by the Cyber 205 in 1980, but by this point systems from Cray Research with considerably higher performance were on the market. The failure of the STAR led to CDC being pushed from its former dominance in the supercomputer market, something they tried to address with the formation of ETA Systems
in September 1983.
Vector processor
A vector processor, or array processor, is a central processing unit that implements an instruction set containing instructions that operate on one-dimensional arrays of data called vectors. This is in contrast to a scalar processor, whose instructions operate on single data items...
supercomputer
Supercomputer
A supercomputer is a computer at the frontline of current processing capacity, particularly speed of calculation.Supercomputers are used for highly calculation-intensive tasks such as problems including quantum physics, weather forecasting, climate research, molecular modeling A supercomputer is a...
designed, manufactured, and marketed by Control Data Corporation
Control Data Corporation
Control Data Corporation was a supercomputer firm. For most of the 1960s, it built the fastest computers in the world by far, only losing that crown in the 1970s after Seymour Cray left the company to found Cray Research, Inc....
(CDC). It was one of the first machines to use a vector processor
Vector processor
A vector processor, or array processor, is a central processing unit that implements an instruction set containing instructions that operate on one-dimensional arrays of data called vectors. This is in contrast to a scalar processor, whose instructions operate on single data items...
to improve performance on appropriate scientific applications.
The name STAR was a construct of the words STrings and ARrays. The 100 came from 100 million floating point operations per second (MFLOPS), the speed at which the machine was designed to operate. The computer was announced very early during the 1970s and was supposed to be several times faster than the CDC 7600
CDC 7600
The CDC 7600 was the Seymour Cray-designed successor to the CDC 6600, extending Control Data's dominance of the supercomputer field into the 1970s. The 7600 ran at 36.4 MHz and had a 65 Kword primary memory using core and variable-size secondary memory...
, which was then the world's fastest supercomputer with a peak performance of 36 MFLOPS. On August 17, 1971, CDC announced that General Motors
General Motors
General Motors Company , commonly known as GM, formerly incorporated as General Motors Corporation, is an American multinational automotive corporation headquartered in Detroit, Michigan and the world's second-largest automaker in 2010...
had placed the first commercial order for a STAR-100.
A number of basic design features of the machine meant that its "real world" performance was much lower than expected when first used commercially in 1974, and was one of the primary reasons CDC was pushed from its former dominance in the supercomputer market when the Cray-1
Cray-1
The Cray-1 was a supercomputer designed, manufactured, and marketed by Cray Research. The first Cray-1 system was installed at Los Alamos National Laboratory in 1976, and it went on to become one of the best known and most successful supercomputers in history...
was announced in 1975.
Description
In general organization, the STAR was similar to CDC's earlier supercomputers, where a simple RISC-like CPUCentral processing unit
The central processing unit is the portion of a computer system that carries out the instructions of a computer program, to perform the basic arithmetical, logical, and input/output operations of the system. The CPU plays a role somewhat analogous to the brain in the computer. The term has been in...
was supported by a number of peripheral processors that offloaded housekeeping tasks and allowed the CPU to crunch numbers as quickly as possible. In the STAR, both the CPU and peripheral processors were deliberately simplified, however, to lower the cost and complexity of implementation. The STAR also differed from the earlier designs by being based on a 64-bit architecture instead of 60-bit, a side effect of the increasing use of 8-bit ASCII
ASCII
The American Standard Code for Information Interchange is a character-encoding scheme based on the ordering of the English alphabet. ASCII codes represent text in computers, communications equipment, and other devices that use text...
processing. Also unlike previous machines, the STAR made heavy use of microcode
Microcode
Microcode is a layer of hardware-level instructions and/or data structures involved in the implementation of higher level machine code instructions in many computers and other processors; it resides in special high-speed memory and translates machine instructions into sequences of detailed...
and also supported a virtual memory
Virtual memory
In computing, virtual memory is a memory management technique developed for multitasking kernels. This technique virtualizes a computer architecture's various forms of computer data storage , allowing a program to be designed as though there is only one kind of memory, "virtual" memory, which...
capability.
The main innovation in the STAR was the inclusion of instructions for vector processing. These new and more complex instructions approximated what was available to users of the APL programming language and operated on huge vectors that were stored in consecutive locations in the main memory. The CPU was designed to use these instructions to set up additional hardware that fed in data from the main memory as quickly as possible. For instance, a program could use single instruction with a few parameters to add all the elements in two vectors that could be as long as 65,535 elements. The CPU only had to decode a single instruction, set up the memory hardware, and start feeding the data into the math units. As with instruction pipelines in general, the performance of any one instruction was no better than it was before, but since the CPU was effectively working on a number of instructions at once (or in this case, data points) the overall performance dramatically improves due to the assembly line nature of the task.
The main memory had a capacity of 65,536 superwords (SWORDs), which are 512-bit words. The main memory was 32-way interleaved
Interleaved memory
Interleaved memory is a technique for compensating the relatively slow speed of DRAM. The CPU can access alternative sections immediately without waiting for memory to be cached. Multiple memory banks take turns supplying data....
to pipeline memory accesses. It was constructed from core memory with an access time
Access time
Access time is the time delay or latency between a request to an electronic system, and the access being completed or the requested data returned....
of 1.28 μs. The main memory was accessed via a 512-bit bus, controlled by the storage access controller (SAC), which handled requests from the stream unit. The stream unit accesses the main memory through the SAC via three 128-bit data buses, two for reads, and one for writes. Additionally, there is a 128-bit data bus for instruction fetch, I/O, and control vector access. The stream unit serves as the control unit, fetching and decoding instructions, initiating memory accesses on the behalf of the pipelined functional units, and controlling instruction execution, among other tasks. It also contains two read buffers and one write buffer for streaming data to the execution units.
The STAR-100 has two pipelines where arithmetic is performed. The first pipeline contains a floating point adder and multiplier, whereas the second pipeline is multifunctional, capable of executing all scalar instructions. It also contains a floating point adder, multiplier, and divider. Both pipelines are 64-bit for floating point operations and are controlled by microcode. The STAR-100 can split its floating point pipelines into four 32-bit pipelines, doubling the peak performance of the system to 100 MFLOPS at the expense of half the precision.
The STAR-100 uses I/O processors to offload I/O from the CPU. Each I/O processor is a 16-bit minicomputer
Minicomputer
A minicomputer is a class of multi-user computers that lies in the middle range of the computing spectrum, in between the largest multi-user systems and the smallest single-user systems...
with its own main memory of 65,536 words of 16 bits each, which is implemented with core memory. The I/O processors all share a 128-bit data bus to the SAC.
Real world performance, users and impact
The STAR-100's architecture meant that its real world performance was a fraction of its peak performance. This was due to a number of reasons. Firstly, the vector instructions, being memory-to-memory, had a relatively long startup time, since the pipeline from the memory to the functional units was very long. In contrast to the register-based pipelined functional units in the 7600, the STAR pipelines were much deeper. The problem was compounded by the fact that the STAR had a slower cycle time than the 7600 (40 ns vs 27.5 ns). So the vector length needed for the STAR to run faster than the 7600 occurred at about 50 elements; if the loops were working on data sets with fewer elements, the cost of setting up the vector pipeline was higher than the savings provided by the vector instruction(s).When the machine was released in 1974, it quickly became apparent that the general performance was nowhere near what people expected. Very few programs can be effectively vectorized into a series of single instructions; nearly all calculations will rely on the results of some earlier instruction, yet the results had to clear the pipelines before they could be fed back in. This forced most programs to hit the high setup cost of the vector units, and generally the ones that did "work" were extreme examples. Making matters worse was that the basic scalar performance was sacrificed in order to improve vector performance. Any time that the program had to run scalar instructions, the overall performance of the machine dropped dramatically. (See Amdahl's Law
Amdahl's law
Amdahl's law, also known as Amdahl's argument, is named after computer architect Gene Amdahl, and is used to find the maximum expected improvement to an overall system when only part of the system is improved...
.)
Two STAR-100 systems were eventually delivered to the Lawrence Livermore National Laboratory
Lawrence Livermore National Laboratory
The Lawrence Livermore National Laboratory , just outside Livermore, California, is a Federally Funded Research and Development Center founded by the University of California in 1952...
and one to NASA Langley Research Center
Langley Research Center
Langley Research Center is the oldest of NASA's field centers, located in Hampton, Virginia, United States. It directly borders Poquoson, Virginia and Langley Air Force Base...
. In preparation for the STAR deliveries, LLNL programmers developed a library of subroutine
Subroutine
In computer science, a subroutine is a portion of code within a larger program that performs a specific task and is relatively independent of the remaining code....
s, called STACKLIB, on the 7600 to emulate the vector operations of the STAR. In the process of developing STACKLIB, it was noticed that STACKLIB-based applications could run even faster on the 7600 than they had prior to the integration of the vector library. This discovery placed further pressures on the performance problems of the STAR.
The STAR-100 was a disappointment to everyone involved, and Jim Thornton, the chief designer, left CDC to form Network Systems Corporation
Network Systems Corporation
Network Systems Corporation was an early manufacturer of high-performance computer networking products. Founded in 1974, NSC produced hardware products that connected IBM and Control Data Corporation mainframe computers to peripherals at remote locations...
. An updated version was later released in 1979 as the Cyber 203, followed by the Cyber 205 in 1980, but by this point systems from Cray Research with considerably higher performance were on the market. The failure of the STAR led to CDC being pushed from its former dominance in the supercomputer market, something they tried to address with the formation of ETA Systems
ETA Systems
ETA Systems was a supercomputer company spun off from Control Data Corporation in the early 1980s in order to regain a footing in the supercomputer business. They successfully delivered an excellent machine, the ETA-10, but lost money continually while doing so...
in September 1983.
Further reading
- R.G. Hintz and D.P. Tate, "Control Data STAR-100 processor design," Proc. Compcon, 1972, pp. 1–4.
- P.B. Schneck, Supercomputer Architecture, Kluwer Academic, 1987, pp. 99–118.
External links
- Neil R. Lincoln with 18 Control Data Corporation (CDC) engineers on computer architecture and design, Charles Babbage InstituteCharles Babbage InstituteThe Charles Babbage Institute is a research center at the University of Minnesota specializing in the history of information technology, particularly the history since 1935 of digital computing, programming/software, and computer networking....
, University of Minnesota. Engineers include Robert Moe, Wayne Specker, Dennis Grinna, Tom Rowan, Maurice Hutson, Curt Alexander, Don Pagelkopf, Maris Bergmanis, Dolan Toth, Chuck Hawley, Larry Krueger, Mike Pavlov, Dave Resnick, Howard Krohn, Bill Bhend, Kent Steiner, Raymon Kort, and Neil R. Lincoln. Discussion topics include CDC 1604CDC 1604The CDC 1604 was a 48-bit computer designed and manufactured by Seymour Cray and his team at the Control Data Corporation. The 1604 is known as the first commercially successful transistorized computer. Legend has it that the 1604 designation was chosen by adding CDC's first street address to...
, CDC 6600CDC 6600The CDC 6600 was a mainframe computer from Control Data Corporation, first delivered in 1964. It is generally considered to be the first successful supercomputer, outperforming its fastest predecessor, IBM 7030 Stretch, by about three times...
, CDC 7600CDC 7600The CDC 7600 was the Seymour Cray-designed successor to the CDC 6600, extending Control Data's dominance of the supercomputer field into the 1970s. The 7600 ran at 36.4 MHz and had a 65 Kword primary memory using core and variable-size secondary memory...
, CDC 8600CDC 8600The CDC 8600 was the last of Seymour Cray's supercomputer designs while working for the Control Data Corporation. The "natural successor" to the CDC 6600 and CDC 7600, the 8600 was intended to be about 10 times as fast as the 7600, already the fastest computer on the market.Development started in...
, CDC STAR-100 and Seymour CraySeymour CraySeymour Roger Cray was an American electrical engineer and supercomputer architect who designed a series of computers that were the fastest in the world for decades, and founded Cray Research which would build many of these machines. Called "the father of supercomputing," Cray has been credited...
.