Mondrian data analysis
Mondrian is a general-purpose statistical data-visualization system. It provides visualization techniques for data of almost any kind and, compared with other tools, is particularly strong when working with categorical data, geographical data and large data sets.
All plots in Mondrian are fully linked, and offer various interactions and queries. Any case selected in a plot in Mondrian is highlighted in all other plots.
The currently implemented plots are mosaic plots, scatterplots and scatterplot matrices (SPLOM), maps, barcharts, histograms, missing value plots, parallel coordinates/boxplots and boxplots y by x.
Mondrian works with data in standard tab-delimited or comma-separated ASCII files and can load data from R workspaces. There is basic support for working directly on data in databases.
Mondrian links to R and offers statistical procedures such as interactive density estimation, scatterplot smoothers, multidimensional scaling (MDS) and principal component analysis (PCA).
Overview
Development of Mondrian started in 1997, initially with a focus on visualization techniques for categorical data and enhanced selection techniques. Over the years, a complete suite of visualizations for univariate and multivariate data measured on any scale was added. The link to R offers well-tested statistical procedures, which integrate seamlessly into the interactive graphics. Today, geographical data is also supported through highly interactive maps.
Supported data sources
Mondrian works on plain text files with tab-separated columns and a header line of variable names, as exported from Microsoft Excel as ".txt". If the Rserve link and R are present, Mondrian also reads data directly from R workspace files (.RData files).
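As an illustration of the file layout described above, the following Python sketch writes a small table as tab-delimited text with a header line of variable names and reads it back. The function names and the example data are hypothetical and are not part of Mondrian; the sketch only shows the general shape of such a file.

```python
import csv
import io


def to_tab_delimited(header, rows):
    """Return the table as tab-delimited text, with the variable names first."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(header)   # header line: one name per variable
    writer.writerows(rows)    # one line per case
    return buf.getvalue()


def read_tab_delimited(text):
    """Parse tab-delimited text back into (header, rows)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    return header, [row for row in reader]


# Hypothetical example data (illustration only).
header = ["Class", "Sex", "Age", "Survived"]
rows = [
    ["1st", "Male", "Adult", "Yes"],
    ["3rd", "Female", "Child", "No"],
]

text = to_tab_delimited(header, rows)
print(text)
```

Saving `text` to a file with a ".txt" extension gives a plain tab-delimited file of the kind described in this section.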
Visualizations
- 1-d: Barchart, Spineplot, Histogram, Spinogram, Boxplot
- 2-d: Scatterplot, Boxplot y by x
- High-D:
  - Multivariate continuous: Scatterplot matrix, Parallel coordinates
  - Multivariate categorical: Mosaic plot (see also Treemapping)
- Geographical: Map
- Special: Missing value plot
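Because all plots are linked, a case selected in one plot is highlighted in every other plot. The Python sketch below illustrates the general idea with a shared selection of case indices from which each view derives its highlighted counts. It is a conceptual sketch under stated assumptions, not Mondrian's actual implementation; the class `LinkedData`, its methods and the example records are hypothetical.

```python
from collections import Counter


class LinkedData:
    """A data set with one shared selection that every view reads from."""

    def __init__(self, records):
        self.records = records   # list of dicts, one per case
        self.selected = set()    # indices of the currently selected cases

    def select_where(self, variable, value):
        """Select all cases whose `variable` equals `value`."""
        self.selected = {
            i for i, rec in enumerate(self.records) if rec[variable] == value
        }

    def barchart_counts(self, variable):
        """Return {level: (highlighted, total)} for a barchart of `variable`."""
        total = Counter(rec[variable] for rec in self.records)
        highlighted = Counter(self.records[i][variable] for i in self.selected)
        return {level: (highlighted.get(level, 0), total[level]) for level in total}


# Hypothetical example records (illustration only).
data = LinkedData([
    {"Sex": "Male", "Survived": "No"},
    {"Sex": "Male", "Survived": "Yes"},
    {"Sex": "Female", "Survived": "Yes"},
    {"Sex": "Female", "Survived": "Yes"},
    {"Sex": "Female", "Survived": "No"},
])

# Selecting a bar in one view ...
data.select_where("Sex", "Female")
# ... changes the highlighted portion of every other view.
survived_view = data.barchart_counts("Survived")
print(survived_view)
```

Keeping the selection in one shared place, rather than in each plot, is what lets any number of views stay consistent with one another.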
Further reading
- Theus, M. (2002). Interactive Data Visualization using Mondrian, in Journal of Statistical Software 7 (11): 1–9.
- Theus, M. and Urbanek, S. (2008). Interactive Graphics for Data Analysis: Principles and Examples (Computer Science and Data Analysis), Chapman & Hall / CRC.
External links
- Mondrian: Graphical Data Analysis Software
- Homepage for the book “Interactive Graphics for Data Analysis – Principles and Examples” - the book is heavily based on Mondrian
- theusrus - the homepage of Martin Theus