GLite
gLite is a middleware computer software project for grid computing used by the CERN LHC experiments and other scientific domains. It was implemented by collaborative efforts of more than 80 people in 12 different academic and industrial research centers in Europe. gLite provides a framework for building applications tapping into distributed computing and storage resources across the Internet. The gLite services were adopted by more than 250 computing centres and used by more than 15000 researchers in Europe and around the world.
History
After prototyping phases in 2004 and 2005, convergence with the LHC Computing Grid (LCG-2) distribution was reached in May 2006, when gLite 3.0 was released and became the official middleware of the Enabling Grids for E-sciencE (EGEE) project, which ended in 2010.
Security
The gLite user community is grouped into Virtual Organisations (VOs). A user must join a VO supported by the infrastructure running gLite to be authenticated and authorized to use grid resources.
The Grid Security Infrastructure (GSI) in WLCG/EGEE enables secure authentication and communication over an open network. GSI is based on public key encryption, X.509 certificates, and the Secure Sockets Layer (SSL) communication protocol, with extensions for single sign-on and delegation.
To authenticate, a user needs a digital X.509 certificate issued by a Certification Authority (CA) trusted by the infrastructure running the middleware.
The authorisation of a user on a specific Grid resource can be done in two different ways. The first is simpler, and relies on the grid-mapfile mechanism. The second way relies on the Virtual Organisation Membership Service (VOMS) and the LCAS/LCMAPS mechanism, which allow for a more detailed definition of user privileges.
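The grid-mapfile mechanism can be illustrated with a minimal Python sketch. It assumes the common plain-text layout in which each line holds a quoted certificate subject followed by a local account name; the subjects and account names below are hypothetical, and the functions `parse_grid_mapfile` and `authorize` are illustrative names rather than part of gLite.

```python
import re

# One entry per line: a double-quoted certificate subject, then a local account.
ENTRY = re.compile(r'^"([^"]+)"\s+(\S+)$')

def parse_grid_mapfile(text):
    """Return a dict mapping certificate subject to local account name."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = ENTRY.match(line)
        if match is None:
            continue  # this sketch silently ignores malformed entries
        subject, account = match.groups()
        mapping[subject] = account
    return mapping

def authorize(subject, mapping):
    """Return the mapped local account, or None if the subject is not listed."""
    return mapping.get(subject)

# Hypothetical entries, for illustration only.
EXAMPLE = '''
# subject                               local account
"/C=CH/O=Example/CN=Alice Example" alice
"/C=IT/O=Example/CN=Bob Example" bob
'''

if __name__ == "__main__":
    accounts = parse_grid_mapfile(EXAMPLE)
    print(authorize("/C=CH/O=Example/CN=Alice Example", accounts))
```

A flat lookup like this grants or denies access per user only, which is why the VOMS-based mechanism is described as allowing a more detailed definition of privileges.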
User interface
The access point to the gLite Grid is the User Interface (UI). This can be any machine where users have a personal account and where their user certificate is installed. From a UI, a user can be authenticated and authorized to use the WLCG/EGEE resources, and can access the functionalities offered by the Information, Workload and Data management systems. It provides CLI tools to perform some basic Grid operations:
- list all the resources suitable to execute a given job;
- submit jobs for execution;
- cancel jobs;
- retrieve the output of finished jobs;
- show the status of submitted jobs;
- retrieve the logging and bookkeeping information of jobs;
- copy, replicate and delete files from the Grid;
- retrieve the status of different resources from the Information System.
Computing element
A Computing Element (CE), in Grid terminology, is some set of computing resources localized at a site (e.g. a cluster or a computing farm). A CE includes a Grid Gate (GG), which acts as a generic interface to the cluster; a Local Resource Management System (LRMS), sometimes called a batch system; and the cluster itself, a collection of Worker Nodes (WNs), the nodes where the jobs are run.
There are two GG implementations in gLite 3.1: the LCG CE, developed by EDG and used in LCG-2, and the gLite CE, developed by EGEE. Sites can choose what to install, and some of them provide both types. The GG is responsible for accepting jobs and dispatching them for execution on the WNs via the LRMS.
In gLite 3.1, the supported LRMS types were OpenPBS/PBSPro, Platform LSF, Maui/Torque, BQS, Condor, and Sun Grid Engine.
Storage element
A Storage Element (SE) provides uniform access to data storage resources. The Storage Element may control simple disk servers, large disk arrays or tape-based Mass Storage Systems (MSS). Most WLCG/EGEE sites provide at least one SE.
Storage Elements can support different data access protocols and interfaces. Simply speaking, GSIFTP (a GSI-secure FTP) is the protocol for whole-file transfers, while local and remote file access is performed using RFIO or gsidcap.
Most storage resources are managed by a Storage Resource Manager (SRM), a middleware service providing capabilities like transparent file migration from disk to tape, file pinning, space reservation, etc. However, different SEs may support different versions of the SRM protocol and the capabilities can vary.
There are a number of SRM implementations in use, with varying capabilities. The Disk Pool Manager (DPM) is used for fairly small SEs with disk-based storage only, while CASTOR is designed to manage large-scale MSS, with front-end disks and back-end tape storage. dCache is targeted at both MSS and large-scale disk array storage systems. Other SRM implementations are in development, and the SRM protocol specification itself is also evolving.
Classic SEs, which do not have an SRM interface, provide a simple disk-based storage model. They are in the process of being phased out.
Information service
The Information Service (IS) provides information about the WLCG/EGEE Grid resources and their status. This information is essential for the operation of the whole Grid, as it is via the IS that resources are discovered. The published information is also used for monitoring and accounting purposes.
Much of the data published to the IS conforms to the GLUE Schema, which defines a common conceptual data model to be used for Grid resource monitoring and discovery.
The Information System that is used in gLite 3.1 inherits its main concepts from the Globus Monitoring and Discovery Service (MDS). However, the GRIS and GIIS in MDS have been replaced by the Berkeley Database Information Index (BDII), which is essentially an OpenLDAP server that is updated by an external process.
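Because the BDII is an LDAP server, the information it publishes can be represented as LDIF text. The following Python sketch reads a small, hand-written LDIF sample and extracts the number of waiting jobs per CE. The host names and values are invented; the attribute names follow the GLUE 1.x naming style. The parser is deliberately simplified and does not handle LDIF line continuations or base64-encoded values.

```python
def parse_ldif(text):
    """Split LDIF text into entries; each entry maps attribute -> list of values."""
    entries = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current entry.
            if current:
                entries.append(current)
                current = {}
            continue
        if line.startswith("#"):
            continue  # comment line
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not an "attribute: value" line
        current.setdefault(name.strip(), []).append(value.strip())
    if current:
        entries.append(current)
    return entries

def waiting_jobs(entries):
    """Return {CE identifier: waiting job count} for entries that describe a CE."""
    result = {}
    for entry in entries:
        if "GlueCEUniqueID" in entry and "GlueCEStateWaitingJobs" in entry:
            result[entry["GlueCEUniqueID"][0]] = int(entry["GlueCEStateWaitingJobs"][0])
    return result

# Invented sample data, for illustration only.
SAMPLE = """\
dn: GlueCEUniqueID=ce01.example.org:2119/jobmanager-pbs-short,mds-vo-name=site,o=grid
GlueCEUniqueID: ce01.example.org:2119/jobmanager-pbs-short
GlueCEStateRunningJobs: 12
GlueCEStateWaitingJobs: 3

dn: GlueCEUniqueID=ce02.example.org:2119/jobmanager-lsf-long,mds-vo-name=site,o=grid
GlueCEUniqueID: ce02.example.org:2119/jobmanager-lsf-long
GlueCEStateRunningJobs: 40
GlueCEStateWaitingJobs: 25
"""

if __name__ == "__main__":
    for ce, count in waiting_jobs(parse_ldif(SAMPLE)).items():
        print(ce, count)
```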
Workload management
The purpose of the Workload Management System (WMS) is to accept user jobs, to assign them to the most appropriate Computing Element, to record their status and retrieve their output. The Resource Broker (RB) is the machine where the WMS services run.
Jobs to be submitted are described using the Job Description Language (JDL), which specifies, for example, which executable to run and its parameters, files to be moved to and from the Worker Node on which the job is run, input Grid files needed, and any requirements on the CE and the Worker Node.
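As an illustration of what a job description contains, the Python sketch below renders a dictionary of attributes into JDL-style `Name = value;` lines. The attribute names used in the example (`Executable`, `Arguments`, `StdOutput`, `InputSandbox`, `OutputSandbox`, `Requirements`) are commonly used JDL attributes. The helper names `Expr`, `jdl_value` and `render_jdl` are hypothetical, and the requirement expression is only an example.

```python
class Expr:
    """Wraps a raw JDL expression that must be emitted without quotes."""
    def __init__(self, text):
        self.text = text

def jdl_value(value):
    """Format a Python value as a JDL literal, list, or raw expression."""
    if isinstance(value, Expr):
        return value.text
    if isinstance(value, (list, tuple)):
        return "{" + ", ".join(jdl_value(v) for v in value) + "}"
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    # Everything else is treated as a string; escape embedded quotes.
    return '"' + str(value).replace('"', '\\"') + '"'

def render_jdl(attributes):
    """Render a dict of attributes as one 'Name = value;' line per attribute."""
    return "\n".join(f"{name} = {jdl_value(value)};" for name, value in attributes.items())

if __name__ == "__main__":
    job = {
        "Executable": "/bin/hostname",
        "Arguments": "-f",
        "StdOutput": "std.out",
        "StdError": "std.err",
        "InputSandbox": ["run.sh"],                  # files sent to the Worker Node
        "OutputSandbox": ["std.out", "std.err"],     # files retrieved afterwards
        # Example requirement on the CE; the threshold is arbitrary.
        "Requirements": Expr("other.GlueCEStateWaitingJobs < 10"),
    }
    print(render_jdl(job))
```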
The choice of CE to which the job is sent is made in a process called match-making, which first selects, among all available CEs, those which fulfill the requirements expressed by the user and which are close to specified input Grid files. It then chooses the CE with the highest rank, a quantity derived from the CE status information which expresses the goodness of a CE (typically a function of the numbers of running and queued jobs).
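The two match-making steps, filtering and then ranking, can be sketched in Python as follows. This is a toy model under stated assumptions, not the WMS algorithm itself: the `CE` fields, the `match` function and the rank formula (fewer running and queued jobs is better) are hypothetical choices made for the example.

```python
from dataclasses import dataclass

@dataclass
class CE:
    name: str
    max_cpu_time: int      # hypothetical policy attribute, in minutes
    running_jobs: int
    waiting_jobs: int
    close_ses: frozenset   # storage elements considered close to this CE

def default_rank(ce):
    """Hypothetical rank: the fewer running and queued jobs, the higher the rank."""
    return -(ce.running_jobs + ce.waiting_jobs)

def match(ces, requirement, input_se=None, rank=default_rank):
    """Select the CEs that satisfy the requirement (and are close to the input
    data, if given), then return the one with the highest rank, or None."""
    candidates = [ce for ce in ces if requirement(ce)]
    if input_se is not None:
        candidates = [ce for ce in candidates if input_se in ce.close_ses]
    if not candidates:
        return None
    return max(candidates, key=rank)

# Invented resources, for illustration only.
CES = [
    CE("ce-a", 120, 10, 5, frozenset({"se1"})),
    CE("ce-b", 30, 0, 0, frozenset({"se1"})),
    CE("ce-c", 240, 2, 1, frozenset({"se2"})),
    CE("ce-d", 240, 4, 2, frozenset({"se1"})),
]

if __name__ == "__main__":
    chosen = match(CES, lambda ce: ce.max_cpu_time >= 60, input_se="se1")
    print(chosen.name if chosen else "no matching CE")
```

Keeping the requirement and the rank as separate callables mirrors the description above: the requirement decides which CEs are eligible at all, and the rank only orders the eligible ones.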
The RB locates the Grid input files specified in the job description using a service called the Data Location Interface (DLI), which provides a generic interface to a file catalogue. In this way, the Resource Broker can talk to file catalogues other than LFC (provided that they have a DLI interface).
The most recent implementation of the WMS from EGEE allows not only the submission of single jobs, but also collections of jobs (possibly with dependencies between them) in a much more efficient way than the old LCG-2 WMS, and has many other new options.
Finally, the Logging and Bookkeeping service (LB) tracks jobs managed by the WMS. It collects events from many WMS components and records the status and history of the job.
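The idea of deriving a job's status and history from collected events can be sketched in Python as follows. This is a strong simplification: each event here already carries the resulting state, whereas a real bookkeeping service would derive the state from typed events. The class `Bookkeeper`, the component labels and the state names in the usage example are illustrative assumptions.

```python
class Bookkeeper:
    """Toy event store: keeps per-job events and reports status and history."""

    def __init__(self):
        self._events = {}

    def log_event(self, job_id, timestamp, source, state):
        """Record that component `source` reported `state` for the job at `timestamp`."""
        self._events.setdefault(job_id, []).append((timestamp, source, state))

    def history(self, job_id):
        """Return the job's events ordered by timestamp.

        Events may arrive out of order because they come from several components.
        """
        return sorted(self._events.get(job_id, []))

    def status(self, job_id):
        """Return the state from the most recent event, or None for an unknown job."""
        events = self.history(job_id)
        return events[-1][2] if events else None

if __name__ == "__main__":
    lb = Bookkeeper()
    # Events deliberately logged out of order.
    lb.log_event("job-1", 3, "LRMS", "Running")
    lb.log_event("job-1", 1, "UI", "Submitted")
    lb.log_event("job-1", 2, "WMS", "Scheduled")
    print(lb.status("job-1"))
    for event in lb.history("job-1"):
        print(event)
```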
Software components
Some gLite components and services with the contributing partners:
- VOMS and VOMSAdmin (INFN)
- Proxy and attribute certificate renewal (CESNET)
- Shibboleth interoperability: SLCS, VASH, STS (SWITCH)
- LCAS/LCMAPS (NIKHEF)
- gLExec (NIKHEF)
- Delegation Framework (CERN, HIP, STFC)
- CGSI_gSOAP (CERN)
- gsoap-plugin (CESNET)
- Trustmanager (HIP)
- Util-java (HIP)
- Gridsite (STFC)
- Authorization Framework (HIP, INFN, NIKHEF, SWITCH)
- BDII (CERN)
- Grid Laboratory Uniform Environment (CERN)
- R-GMA (STFC)
- CREAM (INFN)
- CEMon (INFN)
- BLAH (INFN)
- WMS (INFN, ElsagDatamat)
- LB (CESNET)
- DPM (CERN)
- GFAL (CERN)
- LFC (CERN)
- FTS (CERN)
- lcg_utils (CERN)
- EDS and Hydra (HIP)
- AMGA (CERN, KISTI, INFN)