JHepWork
jHepWork is an interactive framework for scientific computation, data analysis and data visualization designed for scientists, engineers and students. jHepWork is multiplatform since it is written in Java, thus it runs on any operating system where the Java virtual machine can be installed.
The program is designed for interactive scientific plots in 2D and 3D and contains numerical scientific libraries implemented in Java for mathematical functions, random numbers, statistical analysis, curve fitting and other data mining algorithms.
jHepWork uses the high-level programming language Jython (Python implemented in Java), but Java code can also be used to call the jHepWork numerical and graphical libraries.
jHepWork is an attempt to create a data-analysis environment using open-source packages with a coherent user interface, and to create a tool competitive with commercial programs.
The idea behind the project is to incorporate open-source mathematical and numerical software packages with GUI-type user interfaces into a coherent program in which the main user interface is based on short-named Java/Python classes. This was required to build an analysis environment based on the Java scripting concept. A typical example will be shown below.
jHepWork runs on any platform (Windows, Mac, Linux, etc.) where Java can be installed.
Scripts and Java code (in the case of Java programming) can be run either in the GUI editor of jHepWork or as batch programs.
The graphical libraries of jHepWork can be used to create applets. All charts (or "Canvases") used for data representation can be embedded into Web browsers.
jHepWork can be used wherever the analysis of large numerical data volumes, data mining, statistical data analysis and mathematics are essential. The program can be used in the natural sciences, engineering, and the modeling and analysis of financial markets.
jHepWork is considered among the five best free and open-source data-mining software packages.
Several other reviews of jHepWork are available.
While the program falls into the category of open-source software, it is not completely free for commercial usage (see below).
Overview
jHepWork has several features useful for data analysis:
- uses Jython scripting, BeanShell or standard Java;
- can be integrated with the Web in the form of applets or Java Web Start applications, which makes it suited to a distributed analysis environment via the Internet;
- is designed from the ground up to support programming with multiple threads;
- has a full-featured IDE with syntax highlighting, a syntax checker, code completion and a code analyser, and includes a version of the IDE for small-screen devices;
- includes a help system with code completion based on Java reflection technology;
- uses platform-neutral I/O based on Google's Protocol Buffers, so data can be written in C++ and analyzed using Java/Jython;
- supports databases (object databases and SQL-based databases);
- has a browser for serialized objects and objects created using Google Protocol Buffers;
- includes packages for statistical calculations;
- supports symbolic calculations similar to those found in the GNU Octave project, but rewritten in Java.
Data-analysis features
The package supports several mathematical, data-analysis and data-mining features:
- 2D and 3D interactive visualization of data, functions, histograms and charts
- analytic calculations using MATLAB or Octave syntax
- histograms in 2D and 3D, as well as profile histograms
- random numbers and statistical samples
- functions, including parametric equations in 3D
- contour plots, scatter plots
- neural networks
- linear regression and curve fitting using several minimization techniques
- clustering analysis (K-means clustering (single and multi pass), the fuzzy C-means algorithm, agglomerative hierarchical clustering)
- input/output for all data objects (arrays, functions, histograms) based on Java serialization. There is also support for I/O from/to C++ and other languages using Google's Protocol Buffers format. Several databases are supported (Java-object databases and SQL-based)
- cellular automata
- output to high-quality vector graphics, with support for PostScript, EPS, PDF and raster formats
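As an illustration of the linear-regression item in the list above, here is a minimal sketch in plain Python of an ordinary least-squares straight-line fit. It does not use jHepWork: the function name `fit_line` is hypothetical and is not part of the jhplot API, and the closed-form formula shown is only one of the "several minimization techniques" a package of this kind might offer.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / float(n)
    mean_y = sum(ys) / float(n)
    # Sum of squared deviations in x, and of cross deviations in x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Points lying exactly on y = 2x + 1
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

Because the four sample points lie exactly on a line, the fit recovers its slope and intercept; with noisy data the same function returns the line that minimizes the sum of squared vertical residuals.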
Symbolic and numeric calculations
- solving systems of polynomial equations
- vectors and matrix algebra
- factorization
- derivatives
- integrals (rational functions)
- Boolean algebra
- simplification
- geometric algebra
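To give a flavour of the "derivatives" item above, the following plain-Python sketch differentiates a polynomial stored as a list of coefficients. It is an assumption-laden toy, not jHepWork's symbolic engine: the helper names `poly_derivative` and `poly_eval` and the coefficient-list representation are chosen here purely for illustration.

```python
def poly_derivative(coeffs):
    """coeffs[k] is the coefficient of x**k; return the derivative's coefficients."""
    # d/dx (c * x**k) = k * c * x**(k - 1); the constant term drops out
    return [k * c for k, c in enumerate(coeffs)][1:] or [0]


def poly_eval(coeffs, x):
    """Evaluate the polynomial at x using Horner's scheme."""
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# p(x) = 3 + 2x + 5x^3, so p'(x) = 2 + 15x^2
p = [3, 2, 0, 5]
dp = poly_derivative(p)
```

A real symbolic package works on expression trees rather than coefficient lists, which is what allows it to handle rational functions, simplification and factorization as well.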
History
jHepWork has its roots in particle physics, where data mining is a primary task. jHepWork was initially written for data analysis in particle physics using the Java software concept for the International Linear Collider project developed at SLAC. Later versions of jHepWork were modified for general public use (for scientists, engineers and students, for educational purposes) since the International Linear Collider project has stalled. Currently, jHepWork is a community-supported program. The main source of reference is the book "Scientific Data analysis using Jython Scripting and Java", which discusses in depth data analysis methods using Java and Jython scripting.
The string "HEP" in the project name "jHepWork" abbreviates "High-Energy Physics". Because of the program's wide popularity outside this area of physics, there is a trend to shorten the project name to jWork, dropping the abbreviation "HEP".
License terms
The core source code of the numerical and graphical libraries is licensed under the GNU General Public License. The interactive development environment (IDE) used by jHepWork has some restrictions on commercial usage, since the language files, documentation files, examples, installer, code-assist databases and interactive help are licensed under a Creative Commons license. Full members of the jHepWork project have several benefits, such as a license for commercial usage, access to the source repository, an extended help system, a user script repository and access to the complete documentation.
External links
Technical manual
- Scientific Data analysis using Jython Scripting and Java. Book. 497 pp, by S.V.Chekanov (Springer-Verlag, 2010, ISBN 978-1-84996-286-5)
Examples of Jython scripts
Here is a simple example which illustrates how to fill a 2D histogram and display it on a canvas. The script also exports the figure to an image file.
This script illustrates how to glue and mix native Java classes (from the package java.util) and jHepWork classes (the package jhplot) inside a script written using Python syntax.
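from java.util import Random
from jhplot import *

# Create an interactive 3D canvas and label it
c1 = HPlot3D("Canvas")
c1.setGTitle("Global title")
c1.setNameX("X")
c1.setNameY("Y")
c1.visible()
c1.setAutoRange()

# Book a 2D histogram: 25 bins in [-3, 3] on each axis
h1 = H2D("2D histogram", 25, -3.0, 3.0, 25, -3.0, 3.0)
rand = Random()
for i in range(200):
    # Fill with pairs of Gaussian-distributed random numbers
    h1.fill(rand.nextGaussian(), rand.nextGaussian())

# Draw the histogram and export the figure to a file
c1.draw(h1)
c1.export("jhplot3d.png")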
This script can be run either in the jHepWork IDE or with stand-alone Jython after adding the jHepWork libraries to the classpath.
Here is the output of this script:
- http://jwork.org/jhepwork/examples/jhplot3d.png
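For readers without jHepWork installed, the binning that the script relies on can be approximated in plain Python. The sketch below is illustrative only: the class `Hist2D` is hypothetical and is not the jhplot `H2D` API, and its conventions (equal-width bins, lower edge inclusive, out-of-range entries counted separately) are assumptions rather than documented jHepWork behaviour. Python's `random.Random.gauss` stands in for `java.util.Random.nextGaussian`.

```python
import random


class Hist2D(object):
    """Minimal 2D histogram with equal-width bins (illustrative, not the jhplot H2D API)."""

    def __init__(self, nx, xmin, xmax, ny, ymin, ymax):
        self.nx, self.xmin, self.xmax = nx, xmin, xmax
        self.ny, self.ymin, self.ymax = ny, ymin, ymax
        self.counts = [[0] * ny for _ in range(nx)]
        self.outside = 0  # entries falling outside the booked range

    def _index(self, value, lo, hi, nbins):
        # Map a value in [lo, hi) to a bin index; the clamp guards
        # against floating-point rounding at the upper edge
        return min(int((value - lo) / (hi - lo) * nbins), nbins - 1)

    def fill(self, x, y):
        if self.xmin <= x < self.xmax and self.ymin <= y < self.ymax:
            ix = self._index(x, self.xmin, self.xmax, self.nx)
            iy = self._index(y, self.ymin, self.ymax, self.ny)
            self.counts[ix][iy] += 1
        else:
            self.outside += 1

    def entries(self):
        """Number of entries that landed inside the booked range."""
        return sum(sum(row) for row in self.counts)


# Same booking as in the script: 25 bins in [-3, 3] on each axis, 200 fills
h = Hist2D(25, -3.0, 3.0, 25, -3.0, 3.0)
rng = random.Random(1)  # fixed seed so repeated runs give the same histogram
for _ in range(200):
    h.fill(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
```

The sketch covers only the counting step; drawing the surface and exporting the image are what the jhplot canvas adds on top.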
See also
- ROOT – a C++ data analysis framework developed at CERN
- Java Analysis Studio – a Java-based AIDA-compliant data analysis system