Metakit
Metakit is an embedded database library with a small footprint. It fills the gap between flat-file, relational, object-oriented, and tree-structured databases, supporting relational joins, serialization, nested structures, and instant schema evolution. The most widely used interfaces are those for C++ (native), Python, and Tcl.
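Nested structures and instant schema evolution can be illustrated with a small plain-Python sketch. It is not the Mk4py API, and it does not claim Metakit implements schema evolution exactly this way. `Table`, `add_column`, `append`, `get`, `row`, and `album_table` are hypothetical names.

The sketch works as follows:
- Each column is its own list.
- A column can hold whole subtables, which gives the nested structure.
- Adding a column records only an empty list plus the current row count. Existing rows are never rewritten; they simply report a default value.

```python
from typing import Any, Callable


class Table:
    """Hypothetical column-oriented table (illustration only, not Mk4py).

    Each column is its own Python list; rows are reassembled on demand.
    """

    def __init__(self) -> None:
        self.columns: dict[str, list[Any]] = {}
        self.offsets: dict[str, int] = {}  # row count when the column was added
        self.defaults: dict[str, Callable[[], Any]] = {}
        self.size = 0

    def add_column(self, name: str, default: Callable[[], Any]) -> None:
        # O(1) schema change: no existing row is rewritten. Rows that
        # predate the column report the default value when read.
        self.columns[name] = []
        self.offsets[name] = self.size
        self.defaults[name] = default

    def append(self, **values: Any) -> None:
        # Fields that are not supplied get a fresh default value.
        for name, col in self.columns.items():
            col.append(values[name] if name in values else self.defaults[name]())
        self.size += 1

    def get(self, i: int, name: str) -> Any:
        offset = self.offsets[name]
        if i < offset:
            return self.defaults[name]()
        return self.columns[name][i - offset]

    def row(self, i: int) -> dict[str, Any]:
        return {name: self.get(i, name) for name in self.columns}


def album_table() -> Table:
    # Schema of the nested "albums" subtable (hypothetical example data).
    t = Table()
    t.add_column("title", str)
    t.add_column("year", int)
    return t


# Nested structure: every artist row holds its own albums subtable.
artists = Table()
artists.add_column("name", str)
artists.add_column("albums", album_table)

debut = album_table()
debut.append(title="Debut", year=1997)
artists.append(name="Ann", albums=debut)
artists.append(name="Bob")  # receives an empty albums subtable

# Schema evolution after rows exist; earlier rows see the default.
artists.add_column("country", lambda: "unknown")
artists.append(name="Cor", country="NL")

print(artists.row(0)["country"])        # unknown
print(artists.row(2)["country"])        # NL
print(artists.row(0)["albums"].row(0))  # {'title': 'Debut', 'year': 1997}
```

Defaults for rows that predate a column are produced fresh on each read. The sketch therefore assumes immutable defaults, such as strings, for columns added after data exists.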
History
Metakit was written by Jean-Claude Wippler, a software developer from the Netherlands. Its development started around 1997, and in 2001 it was released as open source under the MIT X11 license. The author provides commercial support. In the last few years, however, Wippler has spent less time on Metakit and more on his other projects.
The database is used in several commercial products, including Address Book in Mac OS X 10.4 and earlier. It is also used in several open-source projects, among them KDE's feed reader Akregator, and in in-house projects, typically through the Python or Tcl interface. A related project also written by Wippler, Starkit (a virtual file system for Tcl), became popular among Tcl programmers.
The Metakit mailing list has active subscribers, and Wippler posts to it regularly. Other developers have contributed bug fixes and suggestions to the project.
Features
Unlike most other database systems, which store the rows of a database table in one place (row-oriented architecture), Metakit stores individual columns separately (column-oriented architecture). For many years only linear access to tables was possible, with O(1) complexity for access and O(N) for search. Later, hash structures and B-tree-like structures were added, reducing typical search complexity to O(1) with hashing. Relational operations such as group-by and joins were also added over the years.
Table data can be combined and processed through flexible mechanisms called views. The stored data are portable across platforms. Metakit's disk-space overhead is very low, because several techniques are applied automatically to keep it as small as possible. A viewer for Metakit database structures, called Kitview, is also provided.
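The storage and search behaviour described above can be sketched in plain Python. This is an illustration under stated assumptions, not Metakit code. `ColumnStore` and its methods are hypothetical. A Python dict stands in for Metakit's hash structures, and `group_by` is a simple stand-in for a derived view.

The sketch shows four operations:
- O(1) positional access within a single column.
- An O(N) linear scan.
- An O(1) average hashed lookup.
- A grouping computed from two columns only.

```python
from collections import defaultdict
from typing import Any


class ColumnStore:
    """Hypothetical column-oriented table (illustration only, not Metakit's API)."""

    def __init__(self, *names: str) -> None:
        # One Python list per column; a "row" is just a shared position.
        self.cols: dict[str, list[Any]] = {n: [] for n in names}
        # Optional hash indexes: column name -> {value: [row positions]}.
        self.index: dict[str, dict[Any, list[int]]] = {}

    def append(self, **row: Any) -> None:
        pos = len(next(iter(self.cols.values()), []))
        for name, col in self.cols.items():
            col.append(row[name])
        # Keep any existing hash index in step with the new row.
        for name, idx in self.index.items():
            idx.setdefault(row[name], []).append(pos)

    def value(self, i: int, name: str) -> Any:
        # O(1): positional access is a single list lookup in one column.
        return self.cols[name][i]

    def scan(self, name: str, key: Any) -> list[int]:
        # O(N): a linear search reads every value of one column.
        return [i for i, v in enumerate(self.cols[name]) if v == key]

    def build_hash(self, name: str) -> None:
        # Hash index over one column: O(1) average lookup afterwards.
        idx: dict[Any, list[int]] = defaultdict(list)
        for i, v in enumerate(self.cols[name]):
            idx[v].append(i)
        self.index[name] = dict(idx)

    def lookup(self, name: str, key: Any) -> list[int]:
        return list(self.index[name].get(key, []))

    def group_by(self, key: str, value: str) -> dict[Any, list[Any]]:
        # A derived result, computed on demand from just two columns.
        groups: dict[Any, list[Any]] = defaultdict(list)
        for k, v in zip(self.cols[key], self.cols[value]):
            groups[k].append(v)
        return dict(groups)


people = ColumnStore("name", "city")
for name, city in [("Ann", "Delft"), ("Bob", "Utrecht"), ("Cor", "Delft")]:
    people.append(name=name, city=city)

print(people.value(1, "name"))          # Bob
print(people.scan("city", "Delft"))     # [0, 2]
people.build_hash("city")
people.append(name="Dirk", city="Delft")
print(people.lookup("city", "Delft"))   # [0, 2, 3]
print(people.group_by("city", "name"))  # {'Delft': ['Ann', 'Cor', 'Dirk'], 'Utrecht': ['Bob']}
```

Keeping each column in its own list means a search or grouping touches only the columns it needs. The price is that reassembling a full row requires one lookup per column.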
The practical limit on database size is around 1 GB, even on 64-bit platforms. Multithreaded and multiuser access requires explicit support from the programmer and is discouraged (with C++, Tcl, and Python, a single global lock is used automatically). Combinations of more advanced features are often untested and may fail. It is possible to obtain somewhat better performance than with other databases (published benchmarks include comparisons with SQLite and Berkeley DB), but doing so requires extensive testing and a good knowledge of Metakit internals. Metakit's API is low-level compared to SQL.
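Suppose an application does share one database among threads. The single-global-lock approach mentioned in the paragraph above then amounts to serializing every operation behind one lock. The sketch below is a generic Python pattern, not the code of any Metakit binding. `LockedStore` and `run` are hypothetical names, and a plain dict stands in for a Metakit storage object.

```python
import threading
from typing import Any, Callable


class LockedStore:
    """Hypothetical wrapper: one global lock serializes every operation
    on a shared store object that is not itself thread-safe."""

    _lock = threading.Lock()  # class attribute: shared by all instances

    def __init__(self, store: Any) -> None:
        self._store = store

    def run(self, op: Callable[[Any], Any]) -> Any:
        # Only one thread at a time may touch the underlying store.
        with LockedStore._lock:
            return op(self._store)


store = LockedStore({"count": 0})


def bump(s: dict) -> None:
    # Read-modify-write that would race without the lock.
    s["count"] = s["count"] + 1


def worker() -> None:
    for _ in range(1000):
        store.run(bump)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(store.run(lambda s: s["count"]))  # 8000
```

A single lock is simple and safe, but it lets only one thread work at a time. This is one reason heavy multithreaded use of an embedded store like this tends to be discouraged.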
Metakit's biggest weakness is its spotty and sometimes obsolete documentation. Fully understanding its API and tuning its performance require close study of the library's source code. Metakit's terminology differs in many ways from standard database terminology, and the API and file format have changed several times.
Metakit is tested on Windows, Unix, and Mac OS X.
Language bindings
- C++ (native): Metakit is written in C++ without relying on newer language features, so even very old compilers can handle it.
- Python: called Mk4py.
- Tcl: called Mk4tcl, with an optional object-oriented binding on top called Oomk.
- Other languages can be interfaced with the help of SWIG.
External links
- Metakit site
- e4Graph: a library that allows programs to store and manipulate graph-like data persistently, built on top of Metakit