Comparison of regular expression engines
Libraries
Library | Official website | Programming language | Software license |
---|---|---|---|
Boost.Regex | Boost C++ Libraries | C++ | Boost Software License |
Boost.Xpressive | Boost C++ Libraries | C++ | Boost Software License |
CL-PPCRE | Edi Weitz | Common Lisp | BSD |
cppre | Jeff Stuart | C++ | GPL |
DEELX | RegExLab | C++ | "free for personal use and commercial use" |
FREJ | Fuzzy Regular Expressions for Java | Java | LGPL |
GLib/GRegex | Marco Barisione | C | LGPL |
GRETA | Microsoft Research | C++ | |
ICU | International Components for Unicode | C/C++/Java | ICU license |
Jakarta/Regexp | The Apache Jakarta Project | Java | Apache License |
JRegex | JRegex | Java | BSD |
Oniguruma | Kosako | C | BSD |
Pattwo | Stevesoft | Java (compatible with Java 1.0) | LGPL |
PCRE | Philip Hazel | C/C++ | BSD |
Qt/QRegExp | http://doc.trolltech.com/4.7/qregexp.html | C++ | Qt GNU GPL v. 3.0 / Qt GNU LGPL v. 2.1 / Qt Commercial |
regex - Henry Spencer's regular expression libraries | ArgList | C | BSD |
re2 | Google Code | C++ | BSD |
TRE | Ville Laurikari | C | BSD |
TPerlRegEx | TPerlRegEx VCL Component | Object Pascal | MPLv1.1 |
TRegExpr | RegExp Studio | Object Pascal | double licensed: Freeware or LGPL with static linking exception |
RGX | RGX | C++ based component library | P6R license |
Languages
Language | Official website | Software license | Remarks |
---|---|---|---|
.NET | MSDN | Proprietary | |
C++ | | | since ISO/IEC 14882:2011(E) |
D | D | Boost Software License | |
Go | Golang.org | BSD-style license | |
Haskell | Haskell.org | BSD3 | Not included in the language report, nor in GHC's Hierarchical Libraries |
Java | Java | GNU General Public License | REs are written as strings in source code (all backslashes must be doubled, hurting readability). |
JavaScript/ECMAScript | | | Limited, but REs are first-class citizens of the language with a specific /.../mod syntax. |
Lua | Lua.org | MIT License | Uses a simplified, limited dialect. Can be bound to a more powerful library, like PCRE, or an alternative parser like LPeg. |
Object Pascal (Free Pascal) | www.freepascal.org | LGPL with static linking exception | Free Pascal 2.6+ ships with TRegExpr from Sorokin as well as with 2 other regular expression libraries. See http://wiki.lazarus.freepascal.org/Regexpr |
Objective-C (Cocoa on iOS only) | Apple | Proprietary | Currently only available on iOS 4+ |
OCaml | Caml | LGPL | |
Perl | Perl.com | Artistic License or the GNU General Public License | Full, central part of the language. |
PHP | PHP.net | PHP License | Has two implementations, with PCRE being the more efficient (speed, functionalities). |
Python | python.org | Python Software Foundation License | |
Ruby | ruby-doc.org | GNU Library General Public License | Ruby 1.8 and 1.9 use different engines; Ruby 1.9 integrates Oniguruma. |
SAP ABAP | SAP.com | | |
Tcl 8.4 | tcl.tk | Tcl/Tk License (Permissive, similar to BSD) | |
ActionScript 3 | | | |
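
The remark on Java about doubled backslashes applies to any language that writes regular expressions as ordinary string literals. As a minimal sketch, the Python snippet below spells one pattern both ways: once as an ordinary string with doubled backslashes, and once as a raw string, which plays a role loosely similar to the dedicated literal syntax mentioned for JavaScript. The pattern, the sample text, and the name `find_versions` are illustrative assumptions, not taken from the table.

```python
import re

# Ordinary string literal: every regex backslash has to be doubled,
# which is the readability problem the Java remark describes.
escaped = "\\bv\\d+\\.\\d+\\b"

# Raw string literal: the pattern is written exactly as the engine sees it.
raw = r"\bv\d+\.\d+\b"

# Both literals produce the same pattern text.
assert escaped == raw


def find_versions(text):
    """Return all tokens of the form v<digits>.<digits> (hypothetical helper)."""
    return re.findall(raw, text)


print(find_versions("upgrade from v1.8 to v1.9"))  # ['v1.8', 'v1.9']
```
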
Language features
NOTE: An application that uses a library for regular expression support does not necessarily offer the library's full feature set. For example, GNU Grep, which uses PCRE, does not offer lookahead support, though PCRE does.
Part 1
"+" quantifier | Negated character classes | Recursion | Lookahead | Lookbehind | >9 indexable captures | ||||
---|---|---|---|---|---|---|---|---|---|
Boost.Regex | |||||||||
Boost.Xpressive | |||||||||
CL-PPCRE | |||||||||
EmEditor | |||||||||
FREJ | |||||||||
GLib/GRegex | |||||||||
GNU Grep | |||||||||
Haskell | |||||||||
Java | |||||||||
ICU Regex | |||||||||
JGsoft | |||||||||
.NET | |||||||||
OCaml | |||||||||
OmniOutliner 3.6.2 | |||||||||
PCRE | |||||||||
Perl | |||||||||
PHP | |||||||||
Python | |||||||||
Qt/QRegExp | |||||||||
re2 | |||||||||
Ruby | |||||||||
TRE | |||||||||
Vim | |||||||||
RGX | |||||||||
TRegExpr |
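
To make the column names of Part 1 concrete, here is a small sketch using Python's standard `re` module. It only illustrates what each feature means; it does not fill in any cell of the table for other engines. The sample strings and variable names are assumptions chosen for the example. Recursion is omitted because the standard `re` module does not provide it.

```python
import re

# "+" quantifier: one or more repetitions of the preceding item.
plus = re.findall(r"\d+", "a1 bb22 ccc333")

# Negated character class: runs of characters that are not commas.
fields = re.findall(r"[^,]+", "x,yy,zzz")

# Lookahead: digits only when followed by "px"; "px" itself is not consumed.
pixels = re.findall(r"\d+(?=px)", "10px 20em 30px")

# Lookbehind: digits only when immediately preceded by a dollar sign.
dollars = re.findall(r"(?<=\$)\d+", "$5 and 7 and $12")

# More than 9 indexable captures: twelve groups, read back by index.
twelve = re.match("".join(r"(\w)" for _ in range(12)), "abcdefghijkl")
tenth, twelfth = twelve.group(10), twelve.group(12)

print(plus, fields, pixels, dollars, tenth, twelfth)
```
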
Part 2
Conditionals | Comments | Embedded code | Fuzzy matching | Unicode property support http://www.unicode.org/reports/tr18/ | |||||
---|---|---|---|---|---|---|---|---|---|
Boost.Regex | |||||||||
Boost.Xpressive | |||||||||
CL-PPCRE | |||||||||
EmEditor | |||||||||
FREJ | |||||||||
GLib/GRegex | |||||||||
GNU Grep | |||||||||
Haskell | |||||||||
Java | |||||||||
ICU Regex | |||||||||
JGsoft | |||||||||
.NET | |||||||||
OCaml | |||||||||
OmniOutliner 3.6.2 | |||||||||
PCRE | |||||||||
Perl | |||||||||
PHP | |||||||||
Python | |||||||||
Qt/QRegExp | |||||||||
re2 | ? | ||||||||
Ruby | |||||||||
TRE | |||||||||
Vim | |||||||||
RGX |
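
Two of the Part 2 features, conditionals and comments, can be sketched with Python's standard `re` module. The patterns and the helper name `is_balanced` below are illustrative assumptions. Embedded code, fuzzy matching, and Unicode property escapes are left out because the standard `re` module does not offer them.

```python
import re

# Conditional: the closing parenthesis is required only if group 1
# (the opening parenthesis) took part in the match.
balanced = re.compile(r"(\()?\d+(?(1)\))")


def is_balanced(s):
    """True for "42" or "(42)", False for "(42" or "42)" (hypothetical helper)."""
    return balanced.fullmatch(s) is not None


# Comments, inline form: (?#...) groups are ignored by the engine.
inline = re.compile(r"\d{4}(?#year)-\d{2}(?#month)")

# Comments, verbose form: whitespace and "#" comments are ignored.
verbose = re.compile(
    r"""
    \d{4}   # year
    -       # separator
    \d{2}   # month
    """,
    re.VERBOSE,
)

print(is_balanced("(42)"), is_balanced("(42"))
print(bool(inline.fullmatch("2011-08")), bool(verbose.fullmatch("2011-08")))
```
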
API features
Native UTF-16 support | Native UTF-8 support | Non-linear input support | Dot-matches-newline option | Anchor-matches-newline option | |
---|---|---|---|---|---|
Boost.Regex | |||||
GLib/GRegex | |||||
ICU Regex | |||||
Java | |||||
.NET | |||||
PCRE | |||||
Qt/QRegExp | |||||
TRE | |||||
RGX |
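
As an illustration of the last two columns, the sketch below shows how a dot-matches-newline option and an anchor-matches-newline option change matching, using Python's standard `re` module, where they are spelled `re.DOTALL` and `re.MULTILINE`. Other engines in the table expose these options under their own names; the sample text is an assumption for the example.

```python
import re

text = "first line\nsecond line"

# Without the option, "." stops at the newline, so the pattern cannot span lines.
dot_default = re.search(r"first.*second", text)

# Dot-matches-newline: "." also matches "\n".
dot_all = re.search(r"first.*second", text, re.DOTALL)

# Without the option, "^" anchors only at the start of the whole input.
anchor_default = re.findall(r"^\w+", text)

# Anchor-matches-newline: "^" also anchors right after each newline.
anchor_multiline = re.findall(r"^\w+", text, re.MULTILINE)

print(dot_default is None, dot_all is not None)
print(anchor_default, anchor_multiline)
```
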
External links
- Regular Expression Flavor Comparison — Detailed comparison of the most popular regular expression flavors
- Regexp Syntax Summary