MongoDB
MongoDB is an open source, high-performance, schema-free, document-oriented database written in the C++ programming language. It manages collections of BSON documents that can be nested in complex hierarchies and still be easy to query and index, which allows many applications to store data in a natural way that matches their native data types and structures.
10gen began developing MongoDB in October 2007. The first public release was in February 2009.
Features
Among the features are:
- Consistent UTF-8 encoding. Non-UTF-8 data can be saved, queried, and retrieved with a special binary data type.
- Cross-platform support: binaries are available for Windows, Linux, OS X, and Solaris. MongoDB can be compiled on almost any little-endian system.
- Type-rich: supports dates, regular expressions, code, binary data, and more (all BSON types)
- Cursors for query results
More features:
Ad hoc queries
In MongoDB, any field can be queried at any time. MongoDB supports range queries, regular expression searches, and other special types of queries in addition to exactly matching fields. Queries can also include user-defined JavaScript functions (if the function returns true, the document matches).
Queries can return specific fields of documents (instead of the entire document), and their results can be sorted, skipped, and limited.
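To make these query semantics concrete, here is a minimal in-memory sketch in JavaScript, the language of the mongo shell. It is not MongoDB's implementation. The helper names `matches` and `find` are hypothetical, and so is the restriction to a handful of operators: exact matches, `$gt`/`$gte`/`$lt`/`$lte`, regular expressions, and a `$where`-style predicate that sees the document as `this`.

```javascript
// Hypothetical in-memory sketch of ad hoc query matching; not MongoDB's code.
// Supports exact matches, $gt/$gte/$lt/$lte, regular expressions, and a
// $where-style predicate that receives the document as `this`.
const RANGE_OPS = {
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
};

function matchesValue(value, condition) {
  if (condition instanceof RegExp) {
    return typeof value === "string" && condition.test(value);
  }
  if (condition !== null && typeof condition === "object") {
    // Every operator in the condition object must hold for the value.
    return Object.entries(condition).every(([op, operand]) => {
      const compare = RANGE_OPS[op];
      if (!compare) throw new Error(`unsupported operator ${op}`);
      return value !== undefined && compare(value, operand);
    });
  }
  return value === condition; // exact match on a scalar
}

function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) =>
    field === "$where"
      ? Boolean(condition.call(doc))
      : matchesValue(doc[field], condition)
  );
}

// Filter, then sort (single key, 1 = ascending, -1 = descending),
// then skip and limit, then keep only the requested fields.
function find(docs, query, { fields, sort, skip = 0, limit = Infinity } = {}) {
  let result = docs.filter((d) => matches(d, query));
  if (sort) {
    const [[key, dir]] = Object.entries(sort);
    result = [...result].sort((a, b) =>
      a[key] < b[key] ? -dir : a[key] > b[key] ? dir : 0
    );
  }
  result = result.slice(skip, skip + limit);
  if (fields) {
    result = result.map((d) =>
      Object.fromEntries(Object.keys(fields).filter((k) => k in d).map((k) => [k, d[k]]))
    );
  }
  return result;
}

const people = [
  { name: "Ann", age: 34, x: 1, y: 1 },
  { name: "Bob", age: 27, x: 1, y: 2 },
  { name: "Anton", age: 41, x: 3, y: 3 },
];

// Range + regex, projected to `name`, oldest first:
// returns [{ name: "Anton" }, { name: "Ann" }]
const older = find(people, { age: { $gte: 30 }, name: /^An/ }, { fields: { name: 1 }, sort: { age: -1 } });
// A $where-style predicate, similar in spirit to the shell example under
// server-side JavaScript execution:
const equalXY = find(people, { $where: function () { return this.x === this.y; } });
```

Embedded-document conditions and most other operators are deliberately left out to keep the sketch short.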
Querying nested fields
Queries can "reach into" embedded objects and arrays. If the following object is inserted into the users collection:
Indexing
The software supports secondary indexes, including single-key, compound, unique, non-unique, and geospatial indexes. Nested fields (as described above in the ad hoc query section) can also be indexed, and indexing an array type will index each element of the array.
MongoDB's query optimizer will try a number of different query plans when a query is run and select the fastest, periodically resampling. Developers can see the index being used with the `explain` function and choose a different index with the `hint` function.
Indexes can be created or removed at any time.
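Dot-notation paths and array indexing can be illustrated with a small in-memory index. This is a simplification under stated assumptions, not MongoDB's index structure. The hypothetical `getPathValues` walks a path such as `"address.city"` through embedded objects and arrays. The hypothetical `buildIndex` adds one entry per array element, mirroring the statement that indexing an array indexes each element.

```javascript
// Hypothetical helpers illustrating dot-notation paths and multikey indexing;
// a plain Map stands in for MongoDB's real index structure.

// Resolve a path such as "address.city", descending into embedded objects and
// applying the next key to every element when the path passes through an array.
// (Positional paths such as "tags.0" are not handled in this sketch.)
function getPathValues(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const value of current) {
      const candidates = Array.isArray(value) ? value : [value];
      for (const c of candidates) {
        if (c !== null && typeof c === "object" && key in c) next.push(c[key]);
      }
    }
    current = next;
  }
  // A terminal array contributes each element separately (multikey behaviour).
  return current.flatMap((v) => (Array.isArray(v) ? v : [v]));
}

// Map each indexed value to the _ids of the documents containing it.
function buildIndex(docs, path) {
  const index = new Map();
  for (const doc of docs) {
    // Set: a document is listed once even if an array repeats a value.
    for (const value of new Set(getPathValues(doc, path))) {
      if (!index.has(value)) index.set(value, []);
      index.get(value).push(doc._id);
    }
  }
  return index;
}

const users = [
  { _id: 1, name: "ann", address: { city: "Oslo" }, tags: ["admin", "dev"] },
  { _id: 2, name: "bob", address: { city: "Lima" }, tags: ["dev"] },
];
const byCity = buildIndex(users, "address.city"); // nested-field index
const byTag = buildIndex(users, "tags");           // one entry per array element
// byTag.get("dev") is [1, 2]; byCity.get("Oslo") is [1]
```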
Aggregation
In addition to ad hoc queries, the database supports a couple of tools for aggregation, including MapReduce and a group function similar to SQL's GROUP BY.
File storage
The software implements a protocol called GridFS that is used to store and retrieve files from the database. This file storage mechanism has been used in plugins for nginx and lighttpd.
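GridFS stores a file as fixed-size chunks, each an ordinary document, plus one metadata document per file. The sketch below mimics that layout in memory. The split into `fs.files` and `fs.chunks`, and the chunk fields `files_id`, `n`, and `data`, follow the GridFS convention as commonly documented. The helper names `putFile` and `getFile`, the numeric ids, and the tiny 4-byte chunk size are illustrative assumptions; real deployments use chunks of a few hundred kilobytes.

```javascript
// In-memory sketch of GridFS-style storage (hypothetical helpers).
// A file becomes one metadata document plus N chunk documents.
const files = [];  // stands in for the fs.files collection
const chunks = []; // stands in for the fs.chunks collection

function putFile(filename, buffer, chunkSize = 4) {
  const fileId = files.length + 1; // toy id; real GridFS uses an ObjectId
  files.push({ _id: fileId, filename, length: buffer.length, chunkSize });
  for (let n = 0; n * chunkSize < buffer.length; n++) {
    chunks.push({
      files_id: fileId,
      n, // chunk sequence number, used to reassemble the file in order
      data: buffer.subarray(n * chunkSize, (n + 1) * chunkSize),
    });
  }
  return fileId;
}

function getFile(filename) {
  const meta = files.find((f) => f.filename === filename);
  if (!meta) return null;
  const parts = chunks
    .filter((c) => c.files_id === meta._id)
    .sort((a, b) => a.n - b.n)
    .map((c) => c.data);
  return Buffer.concat(parts, meta.length);
}

// 13 bytes with chunkSize 4 -> 4 chunks (4 + 4 + 4 + 1 bytes)
const id = putFile("hello.txt", Buffer.from("hello, gridfs"));
```

Keeping chunks as separate small documents is what lets a server plugin stream a large file, or serve a byte range, without loading it whole.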
Server-side JavaScript execution
JavaScript is the lingua franca of MongoDB: it can be used in queries and aggregation functions (such as MapReduce), and it can be sent directly to the database to be executed.
Example of JavaScript in a query:
> db.foo.find({$where : function() { return this.x == this.y; }})
Example of code sent to the database to be executed:
> db.eval(function(name) { return "Hello, "+name; }, ["Joe"])
This returns "Hello, Joe".
JavaScript variables can also be stored in the database and used by any other JavaScript as a global variable. Any legal JavaScript type, including functions and objects, can be stored in MongoDB so that JavaScript can be used to write "stored procedures."
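MapReduce, one of the places JavaScript is used, pairs a map function with a reduce function. The map function emits key/value pairs for each document. The reduce function folds together all values emitted for one key. The following single-process emulation of that contract is a sketch, not how the server schedules or re-reduces work. The helper name `mapReduce` is our own, and unlike the shell, where `emit` is a global, `emit` is passed in as an argument here.

```javascript
// Single-process emulation of the map/reduce contract (illustrative only).
// map runs with `this` bound to each document and calls emit(key, value);
// here emit is passed as an argument instead of being a global.
// reduce(key, values) should return a value of the same shape it receives,
// because a real MapReduce may reduce partial results again.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map.call(doc, emit);
  const results = [];
  for (const [key, values] of groups) {
    // A key with a single emitted value is passed through without reduce.
    results.push({ _id: key, value: values.length === 1 ? values[0] : reduce(key, values) });
  }
  return results;
}

const orders = [
  { cust: "a", amount: 5 },
  { cust: "b", amount: 7 },
  { cust: "a", amount: 3 },
];
// totals: [{ _id: "a", value: 8 }, { _id: "b", value: 7 }]
const totals = mapReduce(
  orders,
  function (emit) { emit(this.cust, this.amount); },
  function (key, values) { return values.reduce((sum, v) => sum + v, 0); }
);
```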
Capped collections
MongoDB supports fixed-size collections called capped collections. A capped collection is created with a set size and, optionally, a maximum number of elements. Capped collections are the only type of collection that maintains insertion order: once the specified size has been reached, a capped collection behaves like a circular queue, overwriting its oldest entries.
A special type of cursor, called a tailable cursor, can be used with capped collections. Named after the `tail -f` command, it does not close when it finishes returning results but continues to wait, returning new results as they are inserted into the capped collection.
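The circular-queue behaviour and the tailable cursor can be modelled together in a few lines. Several details here are illustrative assumptions: the class `CappedCollection`, its limit by document count rather than byte size, and the `tail()` method. The model shows two things. Once the collection is full, the oldest entries are dropped in insertion order. A tailable reader resumes from its last position instead of closing.

```javascript
// Illustrative model of a capped collection bounded by document count
// (real capped collections are bounded primarily by size in bytes).
class CappedCollection {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];   // kept in insertion order
    this.nextSeq = 0; // monotonically increasing insertion counter
  }

  insert(doc) {
    this.docs.push({ seq: this.nextSeq++, doc });
    // Once full, drop the oldest entry: circular-queue behaviour.
    if (this.docs.length > this.maxDocs) this.docs.shift();
  }

  // Hypothetical tailable cursor: remembers the last sequence number it
  // returned, so each next() yields only documents inserted since then.
  tail() {
    let lastSeq = -1;
    const coll = this;
    return {
      next() {
        const fresh = coll.docs.filter((e) => e.seq > lastSeq);
        if (fresh.length > 0) lastSeq = fresh[fresh.length - 1].seq;
        return fresh.map((e) => e.doc);
      },
    };
  }
}

const log = new CappedCollection(3);
const cursor = log.tail();
["a", "b", "c", "d"].forEach((msg) => log.insert({ msg }));
const batch1 = cursor.next(); // b, c, d ("a" was overwritten)
log.insert({ msg: "e" });
const batch2 = cursor.next(); // e only
const batch3 = cursor.next(); // empty: the cursor stays usable and waits for more
```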
Deployment
MongoDB can be built and installed from source, but it is more commonly installed from a binary package. Many Linux package management systems now include a MongoDB package, including CentOS and Fedora, Debian and Ubuntu, Gentoo and Arch Linux. It can also be acquired through the official website.
MongoDB uses memory-mapped files, limiting data size to 2 GB on 32-bit machines (64-bit systems have a much larger data size). The MongoDB server can only be used on little-endian systems, although most of the drivers work on both little-endian and big-endian systems.
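The endianness point is easier to see at the byte level, because BSON itself stores multi-byte integers in little-endian order. The sketch below hand-encodes a one-field document such as `{ a: 1 }`, with the value stored as a 32-bit integer, using a `DataView` and setting the little-endian flag on each write. `encodeSingleInt32` is a hypothetical helper that covers only this one case; it is not a BSON library.

```javascript
// Hand-encode a BSON document with a single int32 field, e.g. { a: 1 }
// (hypothetical helper; handles only this one element type).
// Layout: int32 total length | 0x10 (int32 type) | key bytes | 0x00 | int32 value | 0x00
function encodeSingleInt32(key, value) {
  const keyBytes = Buffer.from(key, "utf8"); // keys are UTF-8 C strings
  const total = 4 + 1 + keyBytes.length + 1 + 4 + 1;
  const buf = new ArrayBuffer(total);
  const view = new DataView(buf);
  const bytes = new Uint8Array(buf);
  let off = 0;
  view.setInt32(off, total, true); off += 4; // true = little-endian
  view.setUint8(off, 0x10); off += 1;        // element type: 32-bit integer
  bytes.set(keyBytes, off); off += keyBytes.length;
  view.setUint8(off, 0x00); off += 1;        // key terminator
  view.setInt32(off, value, true); off += 4; // value, little-endian
  view.setUint8(off, 0x00);                  // end of document
  return bytes;
}

// hex === "0c0000001061000100000000": the length 12 is stored as 0c 00 00 00
const hex = Buffer.from(encodeSingleInt32("a", 1)).toString("hex");
```

A big-endian machine reading those first four bytes natively would see 0x0c000000 rather than 12, so every integer would need byte-swapping.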
Language support
MongoDB has official drivers for:
- C
- C++
- C#
- Erlang
- Haskell
- Java
- JavaScript
- Lisp
- Perl
- PHP
- Python
- Ruby
- Scala
There are also a large number of unofficial drivers (some of them alternatives to the official ones) for C# and .NET, ColdFusion, Delphi, Erlang, Factor, Fantom, Go, JVM languages (Clojure, Groovy, Scala, etc.), Lua, node.js, HTTP REST, Ruby, Racket, and Smalltalk.
Replication
MongoDB supports master-slave replication. A master can perform reads and writes. A slave copies data from the master and can only be used for reads or backup (not writes).
MongoDB allows developers to guarantee that an operation has been replicated to at least N servers on a per-operation basis.
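The per-operation "replicated to at least N servers" guarantee comes down to waiting until enough servers report having applied the operation. The sketch below checks that condition against a hypothetical map of each server's last applied operation number. The names `isReplicatedTo` and `status`, and the convention that the master counts toward N, are assumptions for illustration.

```javascript
// Has the operation with sequence number `opSeq` reached at least `n` servers?
// `appliedUpTo` maps server name -> highest operation sequence it has applied.
// Assumption (illustrative): the master is included and counts toward n.
function isReplicatedTo(appliedUpTo, opSeq, n) {
  let count = 0;
  for (const seq of Object.values(appliedUpTo)) {
    if (seq >= opSeq) count++;
  }
  return count >= n;
}

// Hypothetical replication status: op 40 is on master and slaveA only,
// so a request for N = 2 is satisfied but N = 3 would still be waiting.
const status = { master: 42, slaveA: 42, slaveB: 37 };
```

A client asking for this guarantee would, in effect, re-check such a condition until it holds or a timeout expires.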
Master-slave
As operations are performed on the master, the slave will replicate any changes to the data.
Example: starting a master/slave pair locally:
$ mkdir -p ~/dbs/master ~/dbs/slave
$ ./mongod --master --port 10000 --dbpath ~/dbs/master
$ ./mongod --slave --port 10001 --dbpath ~/dbs/slave --source localhost:10000
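In broad strokes, a slave replicates by reading the master's ordered log of operations (its "oplog") and applying each entry it has not yet seen. The following toy model shows that loop with plain objects. The `Master`/`Slave` classes, the operation format, and `pull()` are hypothetical, and the model ignores networking, batching, and failures. It also shows why the slave is read-only: its data changes only through `pull()`.

```javascript
// Toy model of log-shipping replication (not MongoDB's wire protocol).
class Master {
  constructor() {
    this.data = new Map();
    this.oplog = []; // ordered record of every write
  }
  set(key, value) {
    this.data.set(key, value);
    this.oplog.push({ op: "set", key, value });
  }
  remove(key) {
    this.data.delete(key);
    this.oplog.push({ op: "remove", key });
  }
}

class Slave {
  constructor(master) {
    this.master = master; // plays the role of --source in the example above
    this.data = new Map();
    this.applied = 0;     // number of oplog entries already applied
  }
  // Fetch and apply every operation not yet seen; returns how many were applied.
  pull() {
    const pending = this.master.oplog.slice(this.applied);
    for (const entry of pending) {
      if (entry.op === "set") this.data.set(entry.key, entry.value);
      else this.data.delete(entry.key);
    }
    this.applied += pending.length;
    return pending.length;
  }
}

const m = new Master();
const s = new Slave(m);
m.set("x", 1);
m.set("y", 2);
m.remove("x");
const appliedNow = s.pull(); // 3 operations; s.data now matches m.data
```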
Replica sets
Replica sets are similar to master-slave, but they incorporate the ability for the slaves to elect a new master if the current one goes down.
Sharding
MongoDB scales horizontally using a system called sharding, which is very similar to the BigTable and PNUTS scaling model. The developer chooses a shard key, which determines how the data in a collection will be distributed. The data is split into ranges (based on the shard key) and distributed across multiple shards. (A shard is a master with one or more slaves.)
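Range-based partitioning on a shard key can be sketched as a sorted list of split points, with each range owned by one shard. Routing a key is then a binary search. The split points, shard names, and helpers `rangeIndex`, `routeToShard`, and `shardsForRange` below are made up for illustration. In a real cluster this metadata is maintained by the cluster itself rather than hard-coded.

```javascript
// Hypothetical chunk table for a collection sharded on a string key.
// Range i covers keys k with splits[i - 1] <= k < splits[i]; the first and
// last ranges are open-ended, so there is one more range than split point.
const splits = ["g", "p"];
const owners = ["shard0", "shard1", "shard2"]; // owners[i] holds range i

// Binary search: index of the first split point strictly greater than the key.
function rangeIndex(shardKey) {
  let lo = 0;
  let hi = splits.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (shardKey < splits[mid]) hi = mid;
    else lo = mid + 1;
  }
  return lo;
}

// A request that includes the shard key can go to exactly one shard.
function routeToShard(shardKey) {
  return owners[rangeIndex(shardKey)];
}

// A range query on the shard key touches every range between its bounds.
function shardsForRange(minKey, maxKey) {
  return [...new Set(owners.slice(rangeIndex(minKey), rangeIndex(maxKey) + 1))];
}
```

This is also why an operation without the shard key cannot be pinned to a single shard: there is nothing to search on.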
The developer's application must know that it is talking to a sharded cluster when performing some operations. For example, a "findAndModify" query must contain the shard key if the queried collection is sharded. The application talks to a special routing process called `mongos` that presents the same interface as a single MongoDB server. This `mongos` process knows what data is on each shard and routes the client's requests appropriately. All requests flow through this process: it not only forwards requests and responses but also performs any necessary final data merges or sorts. Any number of `mongos` processes can be run; running one per application server is usually recommended.
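The final merge that `mongos` performs for a sorted query can be pictured as a k-way merge. Each shard returns its matching documents already sorted, and the router interleaves them and applies the overall limit. The hypothetical helper `mergeSorted` below does this with a linear scan over the shard heads, which keeps the sketch short; a production router would more plausibly use a heap and stream results.

```javascript
// Merge per-shard result lists (each already sorted by `key` ascending)
// into one sorted list, stopping once `limit` documents have been produced.
function mergeSorted(shardResults, key, limit = Infinity) {
  const positions = shardResults.map(() => 0); // read position per shard
  const merged = [];
  while (merged.length < limit) {
    let best = -1;
    // Pick the shard whose next document has the smallest key.
    for (let i = 0; i < shardResults.length; i++) {
      const doc = shardResults[i][positions[i]];
      if (doc === undefined) continue; // this shard is exhausted
      if (best === -1 || doc[key] < shardResults[best][positions[best]][key]) best = i;
    }
    if (best === -1) break; // every shard is exhausted
    merged.push(shardResults[best][positions[best]++]);
  }
  return merged;
}

// Hypothetical per-shard results for a query sorted on `ts`.
const fromShards = [
  [{ ts: 1 }, { ts: 4 }, { ts: 9 }],
  [{ ts: 2 }, { ts: 3 }],
  [{ ts: 5 }],
];
// mergeSorted(fromShards, "ts", 4) returns the documents with ts 1, 2, 3, 4
```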
Official tools
The main management tool is the database shell, mongo. The shell lets developers view, insert, remove, and update data in their databases, as well as get replication information, set up sharding, shut down servers, execute JavaScript, and more. mongo is built on SpiderMonkey, Mozilla's JavaScript engine, so it is a full JavaScript shell as well as a client that can connect to MongoDB servers.
Administrative information can also be accessed through the admin interface: a simple HTML page that serves information about the current server status. By default, this interface listens on the port 1000 above the database port (http://localhost:28017 when the database uses its default port, 27017), and it can be turned off with the --norest option.
mongostat is a command-line tool that displays a simple list of stats about the last second: how many inserts, updates, removes, queries, and commands were performed, as well as what percentage of the time the database was locked and how much memory it is using.
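Per-second figures like these are naturally computed as differences between two snapshots of cumulative counters taken a known interval apart. The sketch below shows that arithmetic. The snapshot shape (one cumulative count per operation type) and the helper `ratesPerSecond` are assumptions for illustration, not mongostat's source.

```javascript
// Turn two snapshots of cumulative operation counters into per-second rates.
// Snapshot shape is hypothetical: { insert, update, remove, query, command }.
function ratesPerSecond(prev, curr, intervalMs) {
  const seconds = intervalMs / 1000;
  const rates = {};
  for (const op of Object.keys(curr)) {
    rates[op] = (curr[op] - (prev[op] ?? 0)) / seconds;
  }
  return rates;
}

// Two hypothetical snapshots taken one second apart.
const t0 = { insert: 100, update: 40, remove: 5, query: 900, command: 60 };
const t1 = { insert: 130, update: 40, remove: 7, query: 1020, command: 61 };
// ratesPerSecond(t0, t1, 1000) -> { insert: 30, update: 0, remove: 2, query: 120, command: 1 }
```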
mongosniff sniffs network traffic going to and from MongoDB.
Monitoring
There are monitoring plugins available for MongoDB:
- munin
- ganglia
- scout
- cacti
GUIs
Several GUIs have been created by MongoDB's developer community to help users visualize their data. Some popular ones are:
- Fang of Mongo – a web-based UI built with Django and jQuery.
- Futon4Mongo – a clone of the CouchDB Futon web interface for MongoDB.
- JMongoBrowser – a desktop application for all platforms.
- Mongo3 – a Ruby-based interface.
- MongoHub – a native OS X application for managing MongoDB.
- Opricot – a browser-based MongoDB shell written in PHP.
- Database Master – a Windows-based MongoDB management studio that also supports relational databases.
Licensing and support
MongoDB is available for free under the GNU Affero General Public License. The language drivers are available under the Apache License.
Prominent users
- MTV Networks
- craigslist
- Disney Interactive Media Group
- Wordnik
- diaspora
- Shutterfly
- foursquare
- bit.ly
- The New York Times
- SourceForge
- Business Insider
- Etsy
- CERN LHC
- Thumbtack
- AppScale
- Uber
External links
- Official MongoDB Project Website
- MongoDB with ZanPHP Spanish Documentation
- mongoDB User Group on LinkedIn
- MongoDB news and articles on myNoSQL
- Eric Lai. (2009, July 1). No to SQL? Anti-database movement gains steam
- MongoDB articles on NoSQLDatabases.com
- June 2009 San Francisco NOSQL Meetup Page
- Designing for the Cloud at MIT Technology Review
- EuroPython Conference Presentation
- Interview with Mike Dirolf on The Changelog about MongoDB background and design decisions
- MongoMvc - A MongoDB Demo App with ASP.NET MVC
- FAQs about MongoDB