MasPar
Encyclopedia
MasPar Computer Corporation was a minisupercomputer
vendor that was founded in 1987 by Jeff Kalb. The company was based in Sunnyvale, California
.
While Kalb was the Vice-President of DEC
's VLSI chip-building division, some researchers in that division were building a supercomputer based on the Goodyear MPP
(massively parallel processor) supercomputer. The DEC researchers enhanced the architecture by:
After Digital decided not to commercialize the research project, Kalb decided to start a company to sell this minisupercomputer. In 1990, the first generation product MP-1 was delivered. In 1992, the follow-on MP-2 was shipped. The company shipped more than 200 systems. Samples of MasPar MPs, from the NASA
Goddard Space Flight Center
, are in storage at the Computer History Museum
.
Maspar offered a family of SIMD machines, second sourced by DEC. The processor units are proprietary.
There was no MP-3. MasPar exited the computer hardware business in June 1996, halting all hardware development and transforming itself into a new data mining
software company called Neovista Software. In 1999, Neovista was acquired by Accrue Software.
supercomputers (as opposed to vector machines). In this approach, a collection of ALU
's listen to a program broadcast from a central source. The ALUs can do their own data fetch, but are all under control of a central Array Control Unit. There is a central clock. The emphasis is on communications efficiency, and low latency. The MasPar architecture is designed to scale, and balance processing, memory, and communication.
Maspar uses a full custom
CMOS
chip, the MP-2 PE, designed in-house, and fabricated by various vendors such as HP or TI
.
The Array Control Unit (ACU) handles instruction fetch. It is a load-store architecture. The MasPar architecture is Harvard
in a broad sense. The ACU implements a microcode
d instruction fetch, but achieves a RISC-like 1 instruction per clock. The Arithmetic units, ALU's with data fetch capability, are implemented 32 to a chip. Each ALU is connected in a nearest neighbor fashion to 8 others. The edge connections are brought off-chip. In this scheme, the perimeters can be toroid
-wrapped. Up to 16,384 units can be connected within the confines of a cabinet. A global router, essentially a cross-bar switch, provides external I/O to the processor array.
The MP-2 PE chip contains 32 processor elements, each a full 32-bit ALU with floating point, registers, and a barrel shifter
. Only the instruction fetch feature is removed, and placed in the ACU. The PE design is literally replicated 32 times on the chip. The chip is designed to interface to DRAM
, to other processor array chips, and to communication router chips.
Each ALU, called a PE slice, contains sixty four 32 bit registers that are used for both integer and floating point. The registers are, interestingly, bit and byte
addressable. The floating point unit handles single precision and double precision
arithmetic on IEEE format numbers. Each PE slice contains two registers for data memory address, and the data. Each PE also has two one-bit serial ports, one for inbound and one for outbound communication to its nearest neighbor. The direction of communication is controlled globally. The PEs also have inbound and outbound paths to a global router for I/O. A broadcast port allows a single instance of data to be "promoted" to parallel data. Alternately, global data can be 'or-ed' to a scalar result.
The serial links support 1 Mbyte/s bit-serial communication that allows coordinated register-register communication between processors. Each processor has its own local memory, implemented in DRAM. No internal memory is included on the processors. Microcoded instruction decode is used.
The 32 PEs on a chip are clustered into two groups sharing a common memory interface, or M-machine, for access. A global scoreboard keeps track of memory and register usage. The path to memory is 16 bits wide. Both big and little endian formats are supported. Each processor has its own 64 Kbyte of memory. Both direct and indirect data memory addressing are supported.
The chip is implemented in 1.0-micrometre
, two-level, metal CMOS, dissipates 0.8 watt, and is packaged in a 208-pin PQFP
. A relatively low clock rate of 12.5 MHz is used.
The Maspar machines are front ended by a host machine, usually a VAX
. They are accessed by extensions to Fortran
and C
. Full IEEE single- and double-precision floating point are supported.
There is no cache for the ALU's. Cache is not required, due to the memory interface operating at commensurate speed with the ALU data accesses.
The ALU's do not implement memory management
for data memory. The ACU uses demand paged virtual memory
for the instruction memory.
criticized the open government support, by DARPA, of competitors Intel for their hypercube Personal SuperComputers (iPSC) and the Thinking Machines Connection Machine
on the pages of Datamation
.
Minisupercomputer
Minisupercomputers constituted a short-lived class of computers that emerged in the mid-1980s. As scientific computing using vector processors became more popular, the need for lower-cost systems that might be used at the departmental level instead of the corporate level created an opportunity for...
vendor that was founded in 1987 by Jeff Kalb. The company was based in Sunnyvale, California
Sunnyvale, California
Sunnyvale is a city in Santa Clara County, California, United States. It is one of the major cities that make up the Silicon Valley located in the San Francisco Bay Area...
.
While Kalb was the Vice-President of DEC
Digital Equipment Corporation
Digital Equipment Corporation was a major American company in the computer industry and a leading vendor of computer systems, software and peripherals from the 1960s to the 1990s...
's VLSI chip-building division, some researchers in that division were building a supercomputer based on the Goodyear MPP
Goodyear MPP
The Goodyear Massively Parallel Processor was amassively parallel processing supercomputer built by Goodyear Aerospacefor the NASA Goddard Space Flight Center.It was designed to deliver enormous computational power at lower cost than...
(massively parallel processor) supercomputer. The DEC researchers enhanced the architecture by:
- making the processor elements to be 4-bitBitA bit is the basic unit of information in computing and telecommunications; it is the amount of information stored by a digital device or other physical system that exists in one of two possible distinct states...
instead of 1-bit - increasing the connectivity of each processor element to 8 neighbors from 4.
- adding a global interconnect for all of the processing elements, which was a triple-redundant switch which was easier to implement than a full crossbar switchCrossbar switchIn electronics, a crossbar switch is a switch connecting multiple inputs to multiple outputs in a matrix manner....
.
After Digital decided not to commercialize the research project, Kalb decided to start a company to sell this minisupercomputer. In 1990, the first generation product MP-1 was delivered. In 1992, the follow-on MP-2 was shipped. The company shipped more than 200 systems. Samples of MasPar MPs, from the NASA
NASA
The National Aeronautics and Space Administration is the agency of the United States government that is responsible for the nation's civilian space program and for aeronautics and aerospace research...
Goddard Space Flight Center
Goddard Space Flight Center
The Goddard Space Flight Center is a major NASA space research laboratory established on May 1, 1959 as NASA's first space flight center. GSFC employs approximately 10,000 civil servants and contractors, and is located approximately northeast of Washington, D.C. in Greenbelt, Maryland, USA. GSFC,...
, are in storage at the Computer History Museum
Computer History Museum
The Computer History Museum is a museum established in 1996 in Mountain View, California, USA. The Museum is dedicated to preserving and presenting the stories and artifacts of the information age, and exploring the computing revolution and its impact on our lives.-History:The museum's origins...
.
Maspar offered a family of SIMD machines, second sourced by DEC. The processor units are proprietary.
There was no MP-3. MasPar exited the computer hardware business in June 1996, halting all hardware development and transforming itself into a new data mining
Data mining
Data mining , a relatively young and interdisciplinary field of computer science is the process of discovering new patterns from large data sets involving methods at the intersection of artificial intelligence, machine learning, statistics and database systems...
software company called Neovista Software. In 1999, Neovista was acquired by Accrue Software.
Hardware
MasPar is unique in being a manufacturer of SIMDSIMD
Single instruction, multiple data , is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data simultaneously...
supercomputers (as opposed to vector machines). In this approach, a collection of ALU
Arithmetic logic unit
In computing, an arithmetic logic unit is a digital circuit that performs arithmetic and logical operations.The ALU is a fundamental building block of the central processing unit of a computer, and even the simplest microprocessors contain one for purposes such as maintaining timers...
's listen to a program broadcast from a central source. The ALUs can do their own data fetch, but are all under control of a central Array Control Unit. There is a central clock. The emphasis is on communications efficiency, and low latency. The MasPar architecture is designed to scale, and balance processing, memory, and communication.
Maspar uses a full custom
Full custom
Full-custom design is a methodology for designing integrated circuits by specifying the layout of each individual transistor and the interconnections between them...
CMOS
CMOS
Complementary metal–oxide–semiconductor is a technology for constructing integrated circuits. CMOS technology is used in microprocessors, microcontrollers, static RAM, and other digital logic circuits...
chip, the MP-2 PE, designed in-house, and fabricated by various vendors such as HP or TI
Texas Instruments
Texas Instruments Inc. , widely known as TI, is an American company based in Dallas, Texas, United States, which develops and commercializes semiconductor and computer technology...
.
The Array Control Unit (ACU) handles instruction fetch. It is a load-store architecture. The MasPar architecture is Harvard
Harvard architecture
The Harvard architecture is a computer architecture with physically separate storage and signal pathways for instructions and data. The term originated from the Harvard Mark I relay-based computer, which stored instructions on punched tape and data in electro-mechanical counters...
in a broad sense. The ACU implements a microcode
Microcode
Microcode is a layer of hardware-level instructions and/or data structures involved in the implementation of higher level machine code instructions in many computers and other processors; it resides in special high-speed memory and translates machine instructions into sequences of detailed...
d instruction fetch, but achieves a RISC-like 1 instruction per clock. The Arithmetic units, ALU's with data fetch capability, are implemented 32 to a chip. Each ALU is connected in a nearest neighbor fashion to 8 others. The edge connections are brought off-chip. In this scheme, the perimeters can be toroid
Toroid
Toroid may refer to*Toroid , a doughnut-like solid whose surface is a torus.*Toroidal inductors and transformers which have wire windings on circular ring shaped magnetic cores.*Vortex ring, a toroidal flow in fluid mechanics....
-wrapped. Up to 16,384 units can be connected within the confines of a cabinet. A global router, essentially a cross-bar switch, provides external I/O to the processor array.
The MP-2 PE chip contains 32 processor elements, each a full 32-bit ALU with floating point, registers, and a barrel shifter
Barrel shifter
A barrel shifter is a digital circuit that can shift a data word by a specified number of bits in one clock cycle. It can be implemented as a sequence of multiplexers , and in such an implementation the output of one mux is connected to the input of the next mux in a way that depends on the shift...
. Only the instruction fetch feature is removed, and placed in the ACU. The PE design is literally replicated 32 times on the chip. The chip is designed to interface to DRAM
Dram
Dram or DRAM may refer to:As a unit of measure:* Dram , an imperial unit of mass and volume* Armenian dram, a monetary unit* Dirham, a unit of currency in several Arab nationsOther uses:...
, to other processor array chips, and to communication router chips.
Each ALU, called a PE slice, contains sixty four 32 bit registers that are used for both integer and floating point. The registers are, interestingly, bit and byte
Byte
The byte is a unit of digital information in computing and telecommunications that most commonly consists of eight bits. Historically, a byte was the number of bits used to encode a single character of text in a computer and for this reason it is the basic addressable element in many computer...
addressable. The floating point unit handles single precision and double precision
Double precision
In computing, double precision is a computer number format that occupies two adjacent storage locations in computer memory. A double-precision number, sometimes simply called a double, may be defined to be an integer, fixed point, or floating point .Modern computers with 32-bit storage locations...
arithmetic on IEEE format numbers. Each PE slice contains two registers for data memory address, and the data. Each PE also has two one-bit serial ports, one for inbound and one for outbound communication to its nearest neighbor. The direction of communication is controlled globally. The PEs also have inbound and outbound paths to a global router for I/O. A broadcast port allows a single instance of data to be "promoted" to parallel data. Alternately, global data can be 'or-ed' to a scalar result.
The serial links support 1 Mbyte/s bit-serial communication that allows coordinated register-register communication between processors. Each processor has its own local memory, implemented in DRAM. No internal memory is included on the processors. Microcoded instruction decode is used.
The 32 PEs on a chip are clustered into two groups sharing a common memory interface, or M-machine, for access. A global scoreboard keeps track of memory and register usage. The path to memory is 16 bits wide. Both big and little endian formats are supported. Each processor has its own 64 Kbyte of memory. Both direct and indirect data memory addressing are supported.
The chip is implemented in 1.0-micrometre
Micrometre
A micrometer , is by definition 1×10-6 of a meter .In plain English, it means one-millionth of a meter . Its unit symbol in the International System of Units is μm...
, two-level, metal CMOS, dissipates 0.8 watt, and is packaged in a 208-pin PQFP
PQFP
PQFP, or plastic quad flat pack, is a type of IC packaging. PQFP is a special case of QFP, as is the thinner TQFP package.PQFP packages can vary in thickness from 2.0 mm to 3.8 mm.-References:*...
. A relatively low clock rate of 12.5 MHz is used.
The Maspar machines are front ended by a host machine, usually a VAX
VAX
VAX was an instruction set architecture developed by Digital Equipment Corporation in the mid-1970s. A 32-bit complex instruction set computer ISA, it was designed to extend or replace DEC's various Programmed Data Processor ISAs...
. They are accessed by extensions to Fortran
Fortran
Fortran is a general-purpose, procedural, imperative programming language that is especially suited to numeric computation and scientific computing...
and C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....
. Full IEEE single- and double-precision floating point are supported.
There is no cache for the ALU's. Cache is not required, due to the memory interface operating at commensurate speed with the ALU data accesses.
The ALU's do not implement memory management
Memory management
Memory management is the act of managing computer memory. The essential requirement of memory management is to provide ways to dynamically allocate portions of memory to programs at their request, and freeing it for reuse when no longer needed. This is critical to the computer system.Several...
for data memory. The ACU uses demand paged virtual memory
Virtual memory
In computing, virtual memory is a memory management technique developed for multitasking kernels. This technique virtualizes a computer architecture's various forms of computer data storage , allowing a program to be designed as though there is only one kind of memory, "virtual" memory, which...
for the instruction memory.
Business History
MasPar along with NCUBENCUBE
nCUBE was a series of parallel computing computers from the company of the same name. Early generations of the hardware used a custom microprocessor...
criticized the open government support, by DARPA, of competitors Intel for their hypercube Personal SuperComputers (iPSC) and the Thinking Machines Connection Machine
Connection Machine
The Connection Machine was a series of supercomputers that grew out of Danny Hillis' research in the early 1980s at MIT on alternatives to the traditional von Neumann architecture of computation...
on the pages of Datamation
Datamation
Datamation was a print computer magazine published in the United States between 1957 and 1998. When first published it wasn't clear there would be a significant market for a computer magazine given how few computers there were...
.