RegExp (regular expression) object

Updated: June 27th, 2011

Regular expressions are a powerful tool for performing pattern matches in strings in JavaScript. Tasks that once required lengthy procedures can be done with just a few lines of code using regular expressions. Regular expressions can be created in JavaScript in two ways:

Literal syntax:

//match a run of 7 digits (only the first one found, as there is no "g" flag)
var phonenumber= /\d{7}/

Dynamically, with the RegExp() constructor:

//match all 7 digit numbers (note how "\d" is defined as "\\d")
var phonenumber=new RegExp("\\d{7}", "g")

A pattern passed to RegExp() should be enclosed in quotes, with each backslash doubled so that it survives string parsing and retains its meaning (i.e. "\d" must be written as "\\d"). The RegExp() constructor allows you to dynamically construct the search pattern as a string, and is useful when the pattern is not known ahead of time.
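
As a rough illustration of why the constructor is handy, the sketch below builds a pattern from a word that is only known at run time. The helper names escapeRegExp() and countWord() are made up for this example and are not built-in functions, and the escaping step assumes the word should be matched literally rather than treated as a pattern of its own.

```javascript
// Hypothetical helper: put a backslash in front of every character
// that has a special meaning inside a regular expression
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: count whole-word occurrences of a word, ignoring case
function countWord(haystack, word) {
  // "\\b" in the string becomes "\b" (word boundary) in the pattern
  const pattern = new RegExp("\\b" + escapeRegExp(word) + "\\b", "gi");
  const matches = haystack.match(pattern);
  return matches ? matches.length : 0;
}

const pains = countWord("No pain, no gain. PAIN is temporary.", "pain");
// pains is 2 ("gain" is not counted because of the word boundary)
```

Without the escaping step, a word such as "a.b" would be treated as a pattern in which the dot matches any character.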

Pattern flags (switches)

Property Description Example
 i Ignore the case of characters. /The/i matches "the" and "The" and "tHe"
 g Global search for all occurrences of a pattern. /ain/g matches both "ain"s in "No pain no gain", instead of just the first.
 gi Global search, ignore case. /it/gi matches all "it"s in "It is our IT department".
 m Multiline mode. Causes ^ to match the beginning of each line as well as the beginning of the string, and $ to match the end of each line as well as the end of the string. JavaScript 1.5+ only. /hip$/m matches "hip" in "hip" as well as in "hip\nhop"
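
The flags above can be tried out with a few one-liners. This is only a small sketch; the variable names are arbitrary and the sample strings are taken from the table.

```javascript
// "i": ignore case
const ignoreCase = /the/i.test("THE END");            // true

// "g": match() returns every occurrence instead of only the first
const allAins = "No pain no gain".match(/ain/g);      // ["ain", "ain"]
const firstAin = "No pain no gain".match(/ain/);      // firstAin[0] is "ain", firstAin.index is 4

// "m": $ also matches just before a line break
const withoutM = /hip$/.test("hip\nhop");             // false
const withM = /hip$/m.test("hip\nhop");               // true
```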

Position Matching

Symbol Description Example
 ^ Only matches the beginning of a string. /^The/ matches "The" in "The night" but not "In The Night"
 $ Only matches the end of a string. /and$/ matches "and" in "Land" but not "landing"
 \b Matches any word boundary (test characters must exist at the beginning or end of a word within the string) /ly\b/ matches "ly" in "This is really cool."
 \B Matches any non-word boundary. /\Bor/ matches “or” in "normal" but not "origami."
(?=pattern) A positive lookahead. Requires that the given pattern follows at the current position in the input. The pattern is not included as part of the actual match. /\d+(?= kids)/ matches any digits when they are followed by the word " kids", such as 2 in "I have 2 kids", though not in "Chapter 2".
(?!pattern) A negative lookahead. Requires that the given pattern does not follow at the current position in the input. The pattern is not included as part of the actual match. /JavaScript(?! Kit)/ matches any occurrence of the word "JavaScript" except when it's part of the phrase "JavaScript Kit"
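
To see position matching in action, here is a short sketch that runs a pattern for each symbol against sample strings. The variable names are arbitrary.

```javascript
// ^ and $ anchor the match to the start and end of the string
const atStart = /^The/.test("The night");                       // true
const notAtStart = /^The/.test("In The Night");                 // false
const atEnd = /and$/.test("Land");                              // true
const notAtEnd = /and$/.test("landing");                        // false

// \b requires a word boundary, \B requires that there is none
const boundary = "This is really cool.".match(/ly\b/).index;    // 12
const insideWord = /\Bor/.test("normal");                       // true
const startOfWord = /\Bor/.test("origami");                     // false

// Lookaheads check what follows without including it in the match
const kids = "I have 2 kids".match(/\d+(?= kids)/)[0];          // "2"
const noKids = /\d+(?= kids)/.test("Chapter 2");                // false
const notKit = /JavaScript(?! Kit)/.test("JavaScript Kit");     // false
const other = /JavaScript(?! Kit)/.test("JavaScript rocks");    // true
```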

Literals

Symbol Description
Alphanumeric All alphabetical and numerical characters match themselves literally. So /2 days/ will match "2 days" inside a string.
\0 Matches NUL character.
 \n Matches a new line character
 \f Matches a form feed character
 \r Matches carriage return character
 \t Matches a tab character
 \v Matches a vertical tab character
[\b] Matches a backspace.
 \xxx Matches the ASCII character expressed by the octal number xxx.

"\50" matches the left parenthesis character "("
 \xdd Matches the ASCII character expressed by the hex number dd.

"\x28" matches the left parenthesis character "("
 \uxxxx Matches the Unicode character expressed by the four digit hex number xxxx.

"\u00A3" matches "£".

The backslash (\) is also used when you wish to match a special character literally. For example, if you wish to match the symbol "$" literally instead of having it signal the end of the string, backslash it: /\$/
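
The following sketch exercises a few of the literal escapes, including the difference between "$" and "\$". The variable names and sample strings are made up for illustration.

```javascript
// \xdd and \uxxxx: character codes written in hex
const hexParen = /\x28/.test("(");                           // true
const poundSign = /\u00A3/.test(String.fromCharCode(163));   // true, 163 is hex A3

// \t and \n match the tab and new line characters themselves
const hasTab = /\t/.test("name\tvalue");                     // true
const lineCount = "one\ntwo\nthree".split(/\n/).length;      // 3

// A backslash turns a special character into a literal one
const price = "Total: $25".match(/\$\d+/)[0];                // "$25"
const endAnchor = /$25/.test("Total: $25");                  // false, "$" here means end of string
```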

Character Classes

Symbol Description Example
 [xyz] Match any one character enclosed in the character set. You may use a hyphen to denote a range. For example, /[a-z]/ matches any lowercase letter in the alphabet, /[0-9]/ any single digit. /[AN]BC/ matches "ABC" and "NBC" but not "BBC" since the leading “B” is not in the set.
 [^xyz] Match any one character not enclosed in the character set. The caret indicates that none of the characters in the set should be matched.

NOTE: the caret used within a character class is not to be confused with the caret that denotes the beginning of a string. Negation is only performed within the square brackets.

/[^AN]BC/ matches "BBC" but not "ABC" or "NBC".
 . (Dot). Match any character except newline or another Unicode line terminator. /b.t/ matches "bat", "bit", "bet" and so on.
 \w Match any single alphanumeric character including the underscore. Equivalent to [a-zA-Z0-9_]. /\w+/ matches "200" in "200%"
 \W Match any single non-word character. Equivalent to [^a-zA-Z0-9_]. /\W/ matches "%" in "200%"
 \d Match any single digit. Equivalent to [0-9].
 \D Match any single non-digit. Equivalent to [^0-9]. /\D/ matches "N" in "No 342222"
 \s Match any single whitespace character. Equivalent to [ \t\r\n\v\f] for ASCII text (other Unicode whitespace, such as the non-breaking space, also matches).
 \S Match any single non-whitespace character. The opposite of \s.
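
A quick sketch of the character classes at work, using the sample strings from the table. The variable names are arbitrary.

```javascript
const names = ["ABC", "NBC", "BBC"];

// [AN] accepts only "A" or "N" as the first letter, [^AN] anything else
const inSet = names.filter(function (name) { return /[AN]BC/.test(name); });     // ["ABC", "NBC"]
const notInSet = names.filter(function (name) { return /[^AN]BC/.test(name); }); // ["BBC"]

// Each shorthand class matches one character; add + to match a run of them
const word = "200%".match(/\w+/)[0];                 // "200"
const nonWord = "200%".match(/\W/)[0];               // "%"
const digits = "No 342222".match(/\d+/)[0];          // "342222"
const collapsed = "a \t b".replace(/\s+/g, " ");     // "a b"

// The dot does not match a new line
const dotOnNewline = /b.t/.test("b\nt");             // false
```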
 

Repetition

Symbol Description Example
{x} Match exactly x occurrences of a regular expression. /\d{5}/ matches 5 digits.
{x,} Match x or more occurrences of a regular expression. /\s{2,}/ matches at least 2 whitespace characters.
{x,y} Matches x to y number of occurrences of a regular expression. /\d{2,4}/ matches at least 2 but no more than 4 digits.
? Match zero or one occurrences. Equivalent to {0,1}. /a\s?b/ matches "ab" or "a b".

"?" can also be used following one of the quantifiers *, +, ?, or {} to make that quantifier non-greedy, so that it matches the minimum number of times instead of the default maximum. For example, using the string "He counted 12345", the expression /\d+/ matches "12345", while /\d+?/ would match just "1", or the minimum match.

/\d{2,4}?/ matches "12" in the string "12345" instead of "1234" due to "?" at the end of the quantifier.


* Match zero or more occurrences. Equivalent to {0,}. /we*/ matches "w" in "why" and "wee" in "between", but nothing in "bad"
+ Match one or more occurrences. Equivalent to {1,}. /fe+d/ matches both "fed" and "feed"
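
The difference between greedy and non-greedy repetition is easiest to see side by side. The sketch below reuses the strings from the table and adds one made-up HTML string for illustration.

```javascript
const text = "He counted 12345";
const greedy = text.match(/\d+/)[0];                 // "12345"
const lazy = text.match(/\d+?/)[0];                  // "1"
const lazyRange = "12345".match(/\d{2,4}?/)[0];      // "12"

// A common place where the difference matters: matching a single tag
const html = "<b>bold</b> and <i>italic</i>";
const greedyTag = html.match(/<.+>/)[0];             // the whole string, up to the last ">"
const lazyTag = html.match(/<.+?>/)[0];              // "<b>"

// * allows zero occurrences, + requires at least one
const star = "why".match(/we*/)[0];                  // "w"
const plus = /we+/.test("why");                      // false
```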

Alternation & Grouping

Symbol Description Example
( ) Grouping characters together to create a clause. May be nested. /(abc)+(def)/ matches one or more occurrences of "abc" followed by one occurrence of "def".
( ) Apart from grouping characters (see above), parentheses also serve to capture the desired subpattern within a pattern. The values of the subpatterns can then be retrieved using RegExp.$1, RegExp.$2 etc after the pattern itself is matched or compared (they are also available as elements 1, 2 etc of the array returned by match() or exec()). For example, the following matches "2 chapters" in "We read 2 chapters in 3 days", and furthermore isolates the value "2":

var mystring="We read 2 chapters in 3 days"
var needle=/(\d+) chapters/

mystring.match(needle) //matches "2 chapters"
alert(RegExp.$1) //alerts captured subpattern, or "2"

The subpattern can also be back referenced later within the main pattern. See "Back References" below.

The following finds the text "John Doe" and swaps their positions, so it becomes "Doe John":

"John Doe".replace(/(John) (Doe)/, "$2 $1")

(?:x) Matches x but does not capture it. In other words, no numbered references are created for the items within the parentheses. /(?:.d){2}/ matches but doesn't capture "cdad".
 
x(?=y) Positive lookahead: Matches x only if it's followed by y. Note that y is not included as part of the match, acting only as a required condition. /George(?= Bush)/ matches "George" in "George Bush" but not "George Michael" or "George Orwell".

/Java(?=Script|Hut)/ matches "Java" in "JavaScript" or "JavaHut" but not "JavaLand".

x(?!y) Negative lookahead: Matches x only if it's NOT followed by y. Note that y is not included as part of the match, acting only as a required condition. /^\d+(?! years)/ matches "5" in "5 days" or "5 oranges", but not "5 years".

 

| Alternation combines clauses into one regular expression and then matches any of the individual clauses. Similar to "OR" statement. /forever|young/ matches "forever" or "young"

/(ab)|(cd)|(ef)/ matches and remembers "ab" or "cd" or "ef".
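
As a small sketch of capturing versus non-capturing groups, the code below reads captured values from the array returned by match() instead of from RegExp.$1. The variable names are arbitrary.

```javascript
// Element 0 is the whole match, element 1 the first captured subpattern
const found = "We read 2 chapters in 3 days".match(/(\d+) chapters/);
const whole = found[0];                                       // "2 chapters"
const count = found[1];                                       // "2"

// $1 and $2 in the replacement text refer to the captured subpatterns
const swapped = "John Doe".replace(/(John) (Doe)/, "$2 $1");  // "Doe John"

// (?: ) groups without capturing, so the result has no element 1
const nonCapturing = "cdad".match(/(?:.d){2}/);               // ["cdad"]
const capturing = "cdad".match(/(.d){2}/);                    // ["cdad", "ad"]

// | picks whichever alternative matches
const either = "stay young".match(/forever|young/)[0];        // "young"
```

Note that a repeated capturing group such as (.d){2} remembers only its last repetition, which is why the second element above is "ad".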

Back references

Symbol Description
( )\n "\n" (where n is a number from 1 to 9), when placed in a regular expression pattern after a parenthesized subpattern, allows you to back reference that subpattern, so the text matched by the subpattern is remembered and used as part of the matching. A subpattern is created by surrounding it with parentheses within the pattern. Think of "\n" as a dynamic variable that is replaced with the value of the subpattern it references. For example:

/(hubba)\1/

is equivalent to the pattern /hubbahubba/, as "\1" is replaced with the value of the first subpattern within the pattern, or (hubba), to form the final pattern.

Let's say you want to match any word that occurs twice in a row, such as "hubba hubba." The expression to use would be:

/(\w+)\s+\1/

"\1" is replaced with the value of the first subpattern's match to essentially mean "match any word, followed by whitespace, followed by the same word again".

If there were more than one set of parentheses in the pattern string you would use \2 or \3 to match the desired subpattern based on the order of the left parenthesis for that subpattern. In the example:

/(a (b (c)))/

"\1" references (a (b (c))), "\2" references (b (c)), and "\3" references (c).

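Here is a short sketch of back references in use. The added \b word boundaries are an assumption on top of the pattern above, included to show how to avoid matching a repeated fragment inside longer words; the sample sentences are made up.

```javascript
// The pattern from above finds a word that is immediately repeated
const doubled = /(\w+)\s+\1/;
const hubba = doubled.test("hubba hubba");                    // true
const bubba = doubled.test("hubba bubba");                    // false

// It also matches "is is" inside "this is", since \1 only needs the same characters
const fragment = doubled.test("this is");                     // true
// Word boundaries restrict the match to whole words
const wholeWords = /\b(\w+)\s+\1\b/;
const noFragment = wholeWords.test("this is");                // false
const repeated = "Paris in the the spring".match(wholeWords); // repeated[0] is "the the", repeated[1] is "the"

// Groups are numbered by the position of their left parenthesis
const nested = "a b c".match(/(a (b (c)))/);                  // ["a b c", "a b c", "b c", "c"]
```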
Regular Expression methods

Method Description Example
String.match(regular expression) Executes a search for a match within a string based on a regular expression. It returns an array of information or null if no match is found.

Note: Also updates the $1…$9 properties in the RegExp object.

var oldstring="Peter has 8 dollars and Jane has 15"
var newstring=oldstring.match(/\d+/g)
//returns the array ["8","15"]
RegExp.exec(string) Similar to String.match() above in that it returns an array of information or null if no match is found. Unlike String.match() however, the parameter entered should be a string, not a regular expression pattern.

var match = /s(amp)le/i.exec("Sample text")
//returns ["Sample","amp"]
String.replace(regular expression, replacement text) Searches and replaces the regular expression portion (match) with the replaced text instead. For the "replacement text" parameter, you can use the keywords $1 to $99 to replace the original text with values from subpatterns defined within the main pattern.

The following finds the text "John Doe" and swaps their positions, so it becomes "Doe John":

var newname="John Doe".replace(/(John) (Doe)/, "$2 $1")

The following characters carry special meaning inside "replacement text":

  • $1 to $99: References the submatched substrings inside parenthesized expressions within the regular expression. With it you can capture the result of a match and use it within the replacement text.
  • $&: References the entire substring that matched the regular expression
  • $`: References the text that precedes the matched substring
  • $': References the text that follows the matched substring
  • $$: A literal dollar sign

The "replacement text" parameter can also be substituted with a callback function instead. See example below.

var oldstring="(304)434-5454"
var newstring=oldstring.replace(/[\(\)-]/g, "")
//returns "3044345454" (removes "(", ")", and "-")
String.split(string literal or regular expression) Breaks up a string into an array of substrings based on a regular expression or fixed string.

var oldstring="1,2, 3,  4,   5"
var newstring=oldstring.split(/\s*,\s*/)
//returns the array ["1","2","3","4","5"]
String.search(regular expression) Tests for a match in a string. It returns the index of the first match, or -1 if not found. Does NOT support global searches (i.e. the "g" flag is ignored).

"Amy and George".search(/george/i)
//returns 8
RegExp.test(string) Tests if the given string matches the RegExp, and returns true if matching, false if not.

var pattern=/george/i
pattern.test("Amy and George")
//returns true
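
The methods above can be combined in a few different ways. The sketch below, with arbitrary variable names, shows exec() called in a loop with the "g" flag to collect each match together with its position, followed by replace() with a callback function and with the special "$" sequences.

```javascript
const sentence = "Peter has 8 dollars and Jane has 15";

// With the "g" flag, each exec() call continues where the previous one stopped
const numberPattern = /\d+/g;
const found = [];
let result;
while ((result = numberPattern.exec(sentence)) !== null) {
  found.push({ text: result[0], index: result.index });
}
// found holds "8" at index 10 and "15" at index 33

// A callback receives each match and returns its replacement
const doubledAmounts = sentence.replace(/\d+/g, function (match) {
  return String(Number(match) * 2);
});
// "Peter has 16 dollars and Jane has 30"

// $& inserts the matched text, $$ a literal dollar sign
const bracketed = "John Doe".replace(/Doe/, "[$&]");    // "John [Doe]"
const withDollar = "Cost: 5".replace(/\d+/, "$$$&");    // "Cost: $5"
```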

Example- Replace "<", ">", "&" and quotes (" and ') with the equivalent HTML entity instead

function html2entities(sometext){
 var re=/[<>"'&]/g //the characters to replace
 return sometext.replace(re, replacechar) //replacechar() is called once for each match
}

function replacechar(match){
 if (match=="<")
  return "&lt;"
 else if (match==">")
  return "&gt;"
 else if (match=="\"")
  return "&quot;"
 else if (match=="'")
  return "&#039;"
 else if (match=="&")
  return "&amp;"
}

//replace "<", ">", "&" and quotes in a form field with corresponding HTML entity instead
document.form.namefield.value=html2entities(document.form.namefield.value)
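
For comparison, here is one possible variation on the same idea that looks up each character in a table instead of using a chain of if statements, and that can be run without a form on the page. The names HTML_ENTITIES and escapeHtml() are made up for this sketch.

```javascript
// Hypothetical lookup table: character to replace -> HTML entity
const HTML_ENTITIES = {
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#039;",
  "&": "&amp;"
};

// The callback is invoked once per matched character
function escapeHtml(text) {
  return text.replace(/[<>"'&]/g, function (ch) {
    return HTML_ENTITIES[ch];
  });
}

const escaped = escapeHtml('<a href="x">Tom & Jerry</a>');
// "&lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;"
```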


CopyRight 1998-2014 JavaScript Kit. NO PART may be reproduced without author's permission.