ECL, data-centric programming language for Big Data
ECL is a declarative, data-centric programming language designed in 2000 to allow a team of programmers to process Big Data across a high-performance computing cluster without the programmer being involved in many of the lower-level, imperative decisions.
History
ECL was initially designed and developed in 2000 by David Bayliss as an in-house productivity tool within Seisint Inc. and was considered to be a ‘secret weapon’ that allowed Seisint to gain market share in its data business. Equifax had an SQL-based process for predicting who would go bankrupt in the next 30 days, but it took 26 days to process the data. The first ECL implementation solved the same problem in 6 minutes. The technology was cited as a driving force behind the acquisition of Seisint by LexisNexis, and then again as a major source of synergies when LexisNexis acquired ChoicePoint Inc.
Language Constructs
ECL, at least in its purest form, is a declarative, data-centric language. Programs, in the strictest sense, do not exist. Rather, an ECL application specifies a number of core datasets (or data values) and then the operations that are to be performed on those values.
Hello world
ECL is designed to offer succinct solutions to problems and sensible defaults. The ‘Hello World’ program is characteristically short:
```ecl
'Hello World';
```
Perhaps a more flavorful example would take a list of strings, sort them into order, and then return that as a result instead.
```ecl
// First declare a dataset with one column containing a list of strings
// Datasets can also be binary, CSV, XML or externally defined structures
D := DATASET([{'ECL'},{'Declarative'},{'Data'},{'Centric'},{'Programming'},{'Language'}],{STRING Value;});
// Sort the records by the Value field
SD := SORT(D,Value);
// Return the sorted dataset as the result
OUTPUT(SD);
```
Statements containing := are attribute definitions in ECL. They do not denote an action but rather the definition of a term. Thus, logically, an ECL program can be read “bottom to top”:
OUTPUT(SD);
What is an SD?
SD := SORT(D,Value);
SD is D sorted by ‘Value’.
What is a D?
D := DATASET([{'ECL'},{'Declarative'},{'Data'},{'Centric'},{'Programming'},{'Language'}],{STRING Value;});
D is a dataset with one column labeled ‘Value’, containing the six strings listed in its definition.
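The same definitional style extends to further results. The sketch below is a minimal illustration, not canonical usage: it rebuilds D and SD and then derives three more definitions with other dataset primitives, namely CHOOSEN (first n records), PROJECT with an inline TRANSFORM (a new record layout) and TOPN (the n largest records by a sort key). The record names ValueRec and LenRec and the definition names FirstThree, WithLen and Longest are hypothetical, chosen only for illustration.

```ecl
// Hypothetical named layout, equivalent to the inline {STRING Value;} above
ValueRec := RECORD
  STRING Value;
END;

D := DATASET([{'ECL'},{'Declarative'},{'Data'},{'Centric'},{'Programming'},{'Language'}], ValueRec);
SD := SORT(D, Value);

// CHOOSEN: keep only the first 3 records of the sorted dataset
FirstThree := CHOOSEN(SD, 3);

// Hypothetical layout for the projected records: the string plus its length
LenRec := RECORD
  STRING Value;
  UNSIGNED Len;
END;

// PROJECT applies the TRANSFORM to every record of D independently;
// SELF := LEFT copies the remaining fields (here, Value) from the input
WithLen := PROJECT(D, TRANSFORM(LenRec, SELF.Len := LENGTH(LEFT.Value); SELF := LEFT));

// TOPN: the 2 records with the largest Len (the minus sign means descending)
Longest := TOPN(WithLen, 2, -Len);

OUTPUT(FirstThree);
OUTPUT(Longest);
```

As in the sorting example, FirstThree, WithLen and Longest are definitions; the two OUTPUT statements are the actions that request results.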
ECL Primitives
ECL primitives that act upon datasets include: SORT, ROLLUP, DEDUP, ITERATE, PROJECT, JOIN, NORMALIZE, DENORMALIZE, PARSE, CHOOSEN, ENTH, TOPN, DISTRIBUTE.
ECL Encapsulation
Whilst ECL is terse, and LexisNexis claims that one line of ECL is roughly equivalent to 120 lines of C++, it still has significant support for large-scale programming, including data encapsulation and code reuse. The constructs available include: MODULE, FUNCTION, INTERFACE, MACRO, EXPORT, SHARED.
Support for Parallelism in ECL
In the HPCC implementation, by default, most ECL constructs execute in parallel across the hardware being used. Many of the primitives also have a LOCAL option to specify that the operation is to occur locally on each node.
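As a hedged sketch of how the encapsulation constructs and the LOCAL option can combine, the module below first hash-distributes records with DISTRIBUTE so that equal strings land on the same node, then sorts and deduplicates each node's records with LOCAL. The module name DistinctValues, its members ByValue and Run, and the sample data are hypothetical; a plain DEDUP(SORT(ds, Value), Value) would yield the same set of values using global operations.

```ecl
// Record layout shared by the module and its callers
ValueRec := RECORD
  STRING Value;
END;

// MODULE groups related definitions: EXPORT members are visible to callers,
// SHARED members only to other members of this module
DistinctValues := MODULE
  // Hash-distribute so that equal Value strings land on the same node
  SHARED ByValue(DATASET(ValueRec) ds) := DISTRIBUTE(ds, HASH32(Value));

  // FUNCTION keeps its intermediate definitions private to the function body
  EXPORT Run(DATASET(ValueRec) ds) := FUNCTION
    spread := ByValue(ds);
    // LOCAL: each node sorts and dedups only its own records, with no further
    // data movement; this is safe because duplicates are already co-located
    localSort := SORT(spread, Value, LOCAL);
    RETURN DEDUP(localSort, Value, LOCAL);
  END;
END;

Words := DATASET([{'ECL'},{'Data'},{'ECL'},{'Big'},{'Data'},{'ECL'}], ValueRec);
UniqueWords := DistinctValues.Run(Words);
OUTPUT(UniqueWords);
```

Because each node sorts only its own records, the combined result should contain the three distinct values Big, Data and ECL, but not necessarily in a single global sort order.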
Comparison to Map-Reduce
The Hadoop Map-Reduce paradigm actually consists of three phases, which correlate to ECL primitives as follows (the shuffle phase is split into its two steps):

Hadoop Name/Term | ECL equivalent | Comments |
---|---|---|
MAP (within the mapper) | PROJECT/TRANSFORM | Takes a record and converts it to a different format; in the Hadoop case the conversion is into a key-value pair |
SHUFFLE (Phase 1) | DISTRIBUTE | The records from the mapper are distributed according to the KEY value |
SHUFFLE (Phase 2) | SORT | The records arriving at a particular reducer are sorted into KEY order |
REDUCE | ROLLUP | The records for a particular KEY value are combined together |
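The correspondence in the table can be sketched as a word count written directly in ECL, with one definition per Hadoop phase. This is a minimal illustration that assumes the input has already been split into one word per record; the layouts WordRec and KVRec, the transform AddCounts and the sample words are hypothetical.

```ecl
WordRec := RECORD
  STRING Term;
END;

// Key-value layout produced by the "map" step
KVRec := RECORD
  STRING Term;
  UNSIGNED Cnt;
END;

Words := DATASET([{'ecl'},{'data'},{'ecl'},{'big'},{'data'},{'ecl'}], WordRec);

// MAP: PROJECT/TRANSFORM turns each record into a (key, value) pair
Mapped := PROJECT(Words, TRANSFORM(KVRec, SELF.Term := LEFT.Term; SELF.Cnt := 1));

// SHUFFLE phase 1: DISTRIBUTE sends all records with the same key to one node
Shuffled := DISTRIBUTE(Mapped, HASH32(Term));

// SHUFFLE phase 2: SORT by key within each node
KeyOrdered := SORT(Shuffled, Term, LOCAL);

// REDUCE: ROLLUP merges adjacent records whose keys match, summing the counts
KVRec AddCounts(KVRec L, KVRec R) := TRANSFORM
  SELF.Cnt := L.Cnt + R.Cnt;
  SELF := L;
END;
Counts := ROLLUP(KeyOrdered, LEFT.Term = RIGHT.Term, AddCounts(LEFT, RIGHT), LOCAL);

OUTPUT(Counts);
```

With the sample input, Counts should hold ecl with 3, data with 2 and big with 1. Dropping the DISTRIBUTE and the LOCAL flags would let ECL perform a global SORT and ROLLUP instead, giving the same counts; the explicit version is shown only to mirror the Hadoop phases one to one.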