GT.M
GT.M is a high-throughput key-value database
engine optimized for transaction processing. (It is a type also referred to as "schema-less", "schema-free," or "NoSQL.") GT.M is also an application development platform and a compiler for the ISO standard M language, also known as MUMPS.
GT.M, an abbreviation for Greystone Technology M, was developed by the Greystone Technology Corp in the 1980s. It is an implementation of ANSI standard M for various UNIX systems and OpenVMS. In addition to preserving the traditional features of M, GT.M also offers an optimizing compiler that produces object code that does not require internal interpreters during execution.
The database engine, made open source in 2000, is maintained by Fidelity Information Services. GT.M is used as the backend of their FIS Profile banking application, and it powers ING DIRECT banks in the United States, Canada, Spain, France and Italy. It is also used as an open source backend for the Electronic Health Record system WorldVistA and other open source EHRs such as Medsphere's OpenVista. It is listed as an open source healthcare solution partner of Red Hat. Today it consists of approximately 2 million lines of code.
Technical Overview
GT.M consists of a language subsystem, a database subsystem, and utility programs. The language subsystem and database subsystem are closely integrated, but each is usable without the other. The language and database subsystems share common data organization and typing.
Data Organization and Typing
GT.M has only two data types - canonical numbers and strings. A string is any arbitrary sequence of bytes (including nulls). A string such as "42" is a canonical number. Data typing is dynamic and conversion between the two types is performed on the fly as needed: 1+"42" yields the result 43, and the first character of 43 is 4.
There is only one data structure - multi-dimensional sparse arrays (key-value nodes, sub-trees, and associative memory are all equally valid descriptions) with up to 32 subscripts. A scalar can be thought of as an array element with zero subscripts. Nodes with varying numbers of subscripts (including one node with no subscripts) can freely co-exist in the same array. For example, if one wanted to represent the national capitals of the United States:
- Set Capital("United States")="Washington"
- Set Capital("United States",1774,1776)="Philadelphia"
- Set Capital("United States",1776,1777)="Baltimore"
Variables are created on demand when first assigned to. Thus, the first Set command above would create the variable Capital. Variables have scope in the language, and are called local variables. A database access looks like an array access, for example:
- Set ^Capital("United States")="Washington"
but the caret (^) means that it is a database access. Variables used for database access have a single global scope, and they persist and are shared between processes. They are called global variables. The first 31 characters of a variable name are significant.
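The Kill and ZKill commands are used to delete values: Kill deletes a node together with its subtree of descendants, while ZKill deletes only the value at the node itself.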
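The array semantics described above can be modelled outside of M. The following Python sketch is only an illustration and not GT.M code: it assumes a dictionary keyed by subscript tuples, and the names SparseArray, set, kill and zkill are hypothetical. It takes Kill to remove a node together with all of its descendants and ZKill to remove only the value stored at the node itself.

```python
class SparseArray:
    """Toy model of one M array: nodes keyed by a tuple of subscripts."""

    def __init__(self):
        self.nodes = {}  # e.g. ("United States", 1774, 1776) -> "Philadelphia"

    def set(self, value, *subscripts):
        # Nodes are created on demand; () would be the unsubscripted node.
        self.nodes[subscripts] = value

    def kill(self, *subscripts):
        # Remove the node and every node whose subscripts start with it.
        n = len(subscripts)
        for key in [k for k in self.nodes if k[:n] == subscripts]:
            del self.nodes[key]

    def zkill(self, *subscripts):
        # Remove only this node's value; descendants are left in place.
        self.nodes.pop(subscripts, None)


capital = SparseArray()
capital.set("Washington", "United States")
capital.set("Philadelphia", "United States", 1774, 1776)
capital.set("Baltimore", "United States", 1776, 1777)

capital.zkill("United States")   # value gone, the two dated nodes remain
remaining_after_zkill = len(capital.nodes)

capital.kill("United States")    # whole subtree gone
remaining_after_kill = len(capital.nodes)
```

Nodes with one subscript and nodes with three subscripts live side by side in the same dictionary, which is the sense in which the array is sparse.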
Numbers in GT.M are accurate to 18 significant digits; accuracy to 3 digits to the right of the decimal point is assured when there are 15 or fewer digits to the left. Scientific notation is supported for larger numbers.
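The on-the-fly conversion between strings and numbers mentioned earlier (1+"42" yields 43, whose first character is 4) can be imitated in Python. This is a simplified sketch under stated assumptions, not GT.M's conversion routine: the helper name m_number is hypothetical, it reads only an optional sign, digits and an optional fraction from the start of the string, it ignores scientific notation, and it does not model the 18-digit precision limit.

```python
import re

# Simplified numeric prefix: optional sign, digits, optional fraction.
_NUMERIC_PREFIX = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)")


def m_number(text):
    """Return the numeric interpretation of a string, or 0 if it has none."""
    match = _NUMERIC_PREFIX.match(text)
    if not match:
        return 0
    value = float(match.group())
    return int(value) if value.is_integer() else value


total = 1 + m_number("42")   # a string used as a number
first_char = str(total)[0]   # a number used as a string
```

In this model a string with no leading numeric part, such as "abc", counts as 0, and "3 apples" counts as 3.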
GT.M uses Unicode (ISO/IEC-10646) for international character set support.
Database Subsystem
The logical database of a GT.M process consists of one or more global variable name spaces, each consisting of an unlimited number of global variables. For each global variable name space, a global directory maps global variables to the database files where they actually reside. An unlimited number of global variables can fit within one database file; a global variable must fit in one database file.
A database file consists of up to 224M (234,881,024) database blocks. A database block is a multiple of 512 bytes, with a maximum size of 65,024 bytes. Commonly used block sizes are 4KB, 8KB and 16KB - so, with an 8KB block size, an individual global variable can grow to 1,792GB. A global variable node (global variable, subscripts plus value) must fit in one database block and each block has a 16 byte overhead. So, the largest node that will fit in a database with a 4KB block size is 4,080 bytes. A key (global variable plus subscripts) can be up to 255 bytes.
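The size limits above follow from simple arithmetic, worked through in this short Python sketch. It assumes that "224M" means 224 × 2^20 blocks and that KB and GB are binary units; the function names are hypothetical.

```python
MI = 2 ** 20          # "M" read as 2**20 (assumption)
KB = 1024
GB = 1024 ** 3

MAX_BLOCKS = 224 * MI
BLOCK_OVERHEAD = 16   # bytes of overhead per block, as stated in the text


def max_file_size_gb(block_size):
    """Upper bound on one database file, in GB, for a block size in bytes."""
    return MAX_BLOCKS * block_size // GB


def largest_node(block_size):
    """Largest node (key plus value) that fits in a single block."""
    return block_size - BLOCK_OVERHEAD


print(MAX_BLOCKS)                 # 234881024
print(max_file_size_gb(8 * KB))   # 1792
print(largest_node(4 * KB))       # 4080
```

Because a global variable must fit in one database file, the file size bound is also the bound on a single global variable.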
The database engine is daemonless and processes accessing the database operate with normal user and group ids - a process has access to a database file if and only if the ownership and permissions of that database file (plus any layered access control such as SELinux) permit access. Each process has within its address space all the logic needed to manage the database, and processes cooperate with one another to manage database files. When a database file is journaled, updates are written to journal files before being written to database files, and in the event of a system crash, database files can be recovered from journal files.
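The journal-before-database ordering described above is a form of write-ahead logging. The Python sketch below illustrates only that ordering; it is not GT.M's journal format or recovery procedure, both stores are kept in memory, and the class and method names are hypothetical.

```python
class JournaledStore:
    """Toy write-ahead journal: record every update before applying it."""

    def __init__(self):
        self.journal = []    # stands in for the journal file
        self.database = {}   # stands in for the database file

    def update(self, key, value, crash_before_apply=False):
        self.journal.append((key, value))   # 1. journal first
        if crash_before_apply:
            return                          # simulated crash
        self.database[key] = value          # 2. then the database

    def recover(self):
        # Replay the journal in order; replaying an update that was
        # already applied simply writes the same value again.
        for key, value in self.journal:
            self.database[key] = value


store = JournaledStore()
store.update("Capital(France)", "Paris")
store.update("Country(Paris)", "France", crash_before_apply=True)
lost_before_recovery = "Country(Paris)" not in store.database
store.recover()
```

After recover() the second update, which never reached the database before the simulated crash, is present again.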
The database engine also supports transaction processing. So, code such as:
- TStart
- Set ^Capital("France")="Paris"
- Set ^Country("Paris")="France"
- TCommit
implements an ACID transaction. GT.M uses optimistic concurrency control to manage transactions.
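Optimistic concurrency control in general works by letting a transaction proceed without locks, checking at commit time whether anything it read has changed, and restarting it if so. The Python sketch below shows that generic technique in a single process; it is not GT.M's implementation, and the Store, Transaction and run names, the per-key version counters and the retry limit are all assumptions made for illustration.

```python
class Store:
    def __init__(self):
        self.data = {}      # key -> committed value
        self.version = {}   # key -> number of committed writes


class Transaction:
    def __init__(self, store):
        self.store = store
        self.reads = {}     # key -> version seen when first read
        self.writes = {}    # updates buffered until commit

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        self.reads[key] = self.store.version.get(key, 0)
        return self.store.data.get(key)

    def set(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Validate: fail if anything this transaction read has changed.
        for key, seen in self.reads.items():
            if self.store.version.get(key, 0) != seen:
                return False
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.version[key] = self.store.version.get(key, 0) + 1
        return True


def run(store, body, max_attempts=4):
    """Run body(txn, attempt) until it commits; return the attempt count."""
    for attempt in range(1, max_attempts + 1):
        txn = Transaction(store)
        body(txn, attempt)
        if txn.commit():
            return attempt
    raise RuntimeError("transaction did not commit")


store = Store()


def increment(txn, attempt):
    current = txn.get("counter") or 0
    if attempt == 1:
        # Simulate another process committing during the first attempt.
        other = Transaction(store)
        other.set("counter", 10)
        other.commit()
    txn.set("counter", current + 1)


attempts = run(store, increment)
```

The first attempt fails validation because the counter changed after it was read; the second attempt sees the new value and commits, so no update is lost.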
A plug-in architecture allows the database to be encrypted in order to protect data at rest. GT.M is distributed with a reference plug-in that uses GnuPG.
Language Subsystem
Unlike the database where global variable nodes must fit within a database block, local variable strings can grow to 1MB. The GT.M run-time provides dynamic storage allocation with garbage collection. The number of local variables and the number of nodes in local variables are limited only by storage available to the process. The default scope of a local variable is the lifetime of a process. Local variables created within routines using the New command have more limited scope.
GT.M routines are dynamically compiled and linked for execution in the address space of each process. With the exception of the 32-bit implementation of GT.M for the x86 GNU/Linux platform, object modules can also be placed in shared libraries with the standard ld command, in which case the memory used is shared. This is important because an application such as VistA has over 20,000 routines whose compiled object code exceeds 200MB. A large hospital running VistA can have thousands of concurrently running user processes.
With a couple of small exceptions, GT.M includes a nearly complete implementation of ISO standard M (affectionately known as MUMPS for historical reasons).
In GT.M, M code can freely call out to C code (or code in other languages with a C compatible interface), and C code can freely call in to M code (so the top level program can be a C main). For example, there is a GT.M module in CPAN and m_python for access from Python.
Web services written in GT.M can be deployed under an Internet super server such as inetd or xinetd. Web enabled applications can use layered software such as EWD.
Platforms
GT.M is fully supported on the following platforms (in alphabetic order):
- AIX on IBM System p
- GNU/Linux on Itanium, x86-64 and IA-32 (x86) architectures
- HP-UX on Itanium
- Solaris on SPARC
- z/OS on IBM System z
GT.M is also supported on the following platforms. Although bugs are fixed, releases get new functionality only when the code changes are portable to the platforms with no extra work:
- HP-UX on HP 9000 (PA-RISC)
- OpenVMS on Alpha/AXP
- Tru64 UNIX on Alpha/AXP
On the latter set of platforms, and on GNU/Linux on the IA-32 (x86) architecture, GT.M is a 32-bit application; on all others, it is a 64-bit application.
The code base for GT.M on GNU/Linux on IA-32 (x86) includes changes needed to run on Cygwin on Microsoft Windows, but this is not yet considered a supported platform.
Licensing
On GNU/Linux on x86-64 & IA-32 (x86), and on OpenVMS on Alpha/AXP, GT.M is released as Free / Open Source Software (FOSS) under the terms of the GNU Affero General Public License, version 3. On other platforms, it is available under proprietary licenses.
Common applications
GT.M is predominantly used in the healthcare and financial services industries. The first production use of GT.M was in 1986 at the Elvis Presley Memorial Trauma Center in Memphis, Tennessee. Through FIS Profile, it powers ING DIRECT banks in the United States, Canada, Spain, France and Italy.
SQL and ODBC access to GT.M databases exists as separate commercial products.