About Regex Tester
Regular expressions (regex) are powerful patterns for matching and manipulating text. Whether you're validating input, extracting data, replacing text, or parsing files, our Regex Tester helps you write, test, and debug regex patterns instantly. Supports JavaScript regex syntax with live matching, capture groups, lookahead/lookbehind assertions, and detailed match explanations. Test against sample text, see all matches highlighted, extract capture groups, and preview replacements before applying them. All processing happens locally in your browser.
What is Regular Expression (Regex)?
A regular expression is a pattern used to match character combinations in strings. Regex allows you to define search patterns, validate input formats, extract information from text, and perform find-and-replace operations. The pattern is composed of: (1) Literal characters (exact matches): "cat" matches "cat" in "The cat sat". (2) Character classes (define character groups): [a-z] matches any lowercase letter. (3) Quantifiers (specify how many): + means one or more, * means zero or more. (4) Anchors (match positions): ^ matches the start of the string and $ matches the end (or the start and end of each line with the m flag). (5) Alternation (either/or): cat|dog matches either "cat" or "dog". (6) Groups (capture or organize): (color|colour) matches both spellings. Regular expressions are supported in virtually all programming languages (JavaScript, Python, Java, C#, PHP, Ruby, Go, Rust, etc.) with slight syntax variations. In JavaScript, a regex literal uses forward slashes: /pattern/flags. Regex is immensely powerful but has a learning curve. As Jamie Zawinski put it: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." Mastering regex patterns saves countless hours of manual string processing.
How to Use This Tool
- Enter Regex Pattern: Type your regex pattern (e.g., /\d+/g for all numbers)
- Choose Flags: Select flags (global, case-insensitive, multiline, dotall, sticky)
- Enter Test String: Paste text you want to test the pattern against
- See Live Matches: All matches highlight in real-time as you type
- View Capture Groups: See extracted groups for each match
- Test Replacement: Enter replacement text and preview the result
- Explain Pattern: Get breakdown of what each part does
- Save Patterns: Store frequently used patterns for quick access
- Common Patterns: Browse library of pre-made regex patterns
- Language Support: Test how pattern works in different languages
- Performance Metrics: See how many iterations, time taken
- Export: Copy pattern or test results to clipboard
Regex Pattern Syntax Fundamentals
Regex patterns consist of literal characters and special metacharacters. Literals match exactly: the pattern "hello" matches only that string. Metacharacters have special meaning and must be escaped with a backslash if you want to match them literally: . * + ? [ ] { } ( ) ^ $ | \ . (dot) matches any character except newline. * (asterisk) matches zero or more of previous element. + (plus) matches one or more of previous element. ? (question mark) matches zero or one of previous element. [abc] matches any character in brackets (a or b or c). [^abc] matches any character NOT in brackets. [a-z] matches range from a to z. (group) creates a capture group, useful for extracting parts. | (pipe) means OR - alternative patterns. ^ (caret) matches start of string. $ (dollar) matches end of string. \ (backslash) escapes the next character. Combinations work together: hello+ matches "hello", "helloo", "hellooo" etc. [a-z]+ matches one or more lowercase letters (for example, an all-lowercase word). \d+ matches one or more digits (non-negative integers). \w+ matches one or more word characters (letters, digits, underscore). \s matches whitespace (space, tab, newline). Patterns are read left-to-right, and quantifiers match greedily by default (taking as much as possible) unless you use the lazy versions *?, +?, ??.
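As a quick illustration of the basics above, here is a minimal sketch in JavaScript. The sample strings and variable names are made up for this example and are not part of the tool.

```javascript
// An unescaped dot matches any character; an escaped dot matches only ".".
const unescapedDot = /file.txt/.test("fileXtxt");
const escapedDot = /file\.txt/.test("fileXtxt");

// [a-z]+ finds runs of lowercase letters (the capital "T" is skipped).
const lowercaseRuns = "The cat sat".match(/[a-z]+/g);

// \d+ finds runs of digits; \w+ finds runs of letters, digits, and underscore.
const digitRuns = "Order 66 shipped 12 items".match(/\d+/g);
const wordRuns = "snake_case and x2".match(/\w+/g);

console.log(unescapedDot, escapedDot); // true false
console.log(lowercaseRuns);            // [ 'he', 'cat', 'sat' ]
console.log(digitRuns, wordRuns);
```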
Character Classes and Escape Sequences
- \d matches any digit (0-9), \D matches any non-digit
- \w matches any word character (a-z, A-Z, 0-9, _), \W matches non-word
- \s matches whitespace (space, tab, newline, carriage return), \S matches non-whitespace
- \b matches word boundary (between word and non-word), \B matches non-boundary
- [a-z] matches lowercase, [A-Z] matches uppercase, [0-9] matches digits
- [a-zA-Z0-9_] matches word characters (same as \w in most engines)
- [^...] negated class - matches anything NOT in brackets
- . (dot) matches any character except newline (or any if dotall flag)
- [abc] character class - matches a, b, or c
- [a-c] character range - matches a, b, or c
- [a-zA-Z0-9] alphanumeric - letters and digits
- \d+\.\d+ matches decimal numbers (escaped dot = literal dot)
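The classes listed above can be tried out with a short sketch like the following. The input strings are invented for illustration.

```javascript
// \d+\.\d+ needs digits, a literal dot, then more digits.
const decimals = "Price: 45.67 USD, qty 3".match(/\d+\.\d+/g);

// Splitting on \d+ removes every run of digits.
const letters = "a1b22c".split(/\d+/);

// A negated class: runs of characters that are neither vowels nor whitespace.
const consonantRuns = "hello world".match(/[^aeiou\s]+/g);

// \b keeps "cat" from matching inside "catalog".
const wholeWord = /\bcat\b/.test("catalog");

console.log(decimals);      // [ '45.67' ]
console.log(letters);       // [ 'a', 'b', 'c' ]
console.log(consonantRuns); // [ 'h', 'll', 'w', 'rld' ]
console.log(wholeWord);     // false
```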
Quantifiers: Controlling Matches
Quantifiers specify how many times an element should match. * (zero or more): a* matches "", "a", "aa", "aaa" etc. + (one or more): a+ matches "a", "aa", "aaa" but NOT "" (empty). ? (zero or one): a? matches "" or "a" but nothing else. {n} (exactly n): a{3} matches exactly "aaa". {n,} (n or more): a{2,} matches "aa", "aaa", "aaaa" etc. {n,m} (between n and m): a{2,4} matches "aa", "aaa", or "aaaa". Greedy vs Lazy: By default, quantifiers are greedy (match as much as possible). a+ in "aaaaaa" matches all 6 a's. Use ? after quantifier to make lazy (match as little as possible): a+? in "aaaaaa" matches just one "a". Example: <.*> matches entire <h1>hello</h1> (greedy), but <.*?> matches just <h1> (lazy). Greedy matching often causes unexpected results with wildcards. Common pattern: <div>(.*?)</div> uses lazy matching to extract content between tags. Quantifiers apply to the element immediately before them: abc+ matches "abc", "abcc", "abccc" (only c is repeated), not "ab" and not "abcabc" as a whole. Use groups for repeating multiple characters: (abc)+ matches "abc", "abcabc", "abcabcabc" etc.
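The greedy/lazy difference and the scope of a quantifier are easy to check in JavaScript. This is a small sketch using invented sample strings.

```javascript
const html = "<h1>hello</h1>";
const greedy = html.match(/<.*>/)[0];   // runs to the last ">"
const lazy = html.match(/<.*?>/)[0];    // stops at the first ">"

// Lazy capture between tags: only the first <div> body is returned.
const inner = "<div>one</div><div>two</div>".match(/<div>(.*?)<\/div>/)[1];

// {2,4} with word boundaries: runs of exactly 2 to 4 a's.
const counted = "a aa aaa aaaa aaaaa".match(/\ba{2,4}\b/g);

// The quantifier binds to the previous element only, unless a group is used.
const lastCharOnly = "abcabc".match(/abc+/)[0];
const wholeGroup = "abcabc".match(/(abc)+/)[0];

console.log(greedy, lazy);             // <h1>hello</h1> <h1>
console.log(inner);                    // one
console.log(counted);                  // [ 'aa', 'aaa', 'aaaa' ]
console.log(lastCharOnly, wholeGroup); // abc abcabc
```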
Groups, Capture Groups, and Backreferences
Groups use parentheses () to group elements together or capture matched text. (abc) creates a group matching "abc". Capture groups are numbered starting at 1, in the order of their opening parentheses: first group is \1, second is \2, etc. In JavaScript, a group that did not take part in the match still has a slot in the result, with the value undefined. Non-capturing group (?:...) groups without capturing - useful when you only need grouping. Example: ([a-z]+)@([a-z]+\.com) captures email parts - group 1 is username, group 2 is domain. In JavaScript: "test@example.com".match(/([a-z]+)@([a-z]+\.[a-z]+)/); returns array with full match and two captured groups. Backreferences refer back to captured groups: (\d+)\s\1 matches repeated digits like "123 123" but NOT "123 456". Used for finding duplicates or repeated patterns. Replace with groups: "firstName lastName".replace(/(\w+)\s(\w+)/, "$2, $1"); produces "lastName, firstName". Named capture groups (?<name>...): JavaScript supports capturing with names - useful for clarity. (?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2}) captures date parts with names. Accessing: match.groups.year gives you the year. Nested groups: ((a)(b)) creates 3 groups - group 1 is "ab" (the outer group), group 2 is "a", group 3 is "b".
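The group features described above can be sketched as follows. The date string and variable names are assumptions made for this example.

```javascript
// Named groups: parts of the match are available on match.groups.
const dateMatch = "2024-03-15".match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/);
const { year, month, day } = dateMatch.groups;

// Numbered groups in a replacement string.
const swapped = "firstName lastName".replace(/(\w+)\s(\w+)/, "$2, $1");

// A backreference requires the same text to appear again.
const sameTwice = /(\d+)\s\1/.test("123 123");
const different = /(\d+)\s\1/.test("123 456");

// An optional group that did not participate is reported as undefined.
const optional = "ac".match(/a(b)?c/);

// Nested groups are numbered by their opening parenthesis.
const nested = "ab".match(/((a)(b))/);

console.log(year, month, day);        // 2024 03 15
console.log(swapped);                 // lastName, firstName
console.log(sameTwice, different);    // true false
console.log(optional[1]);             // undefined
console.log(nested.slice(1));         // [ 'ab', 'a', 'b' ]
```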
Anchors and Assertions
Anchors match positions, not characters. ^ (caret) matches the start of the string. $ (dollar) matches the end of the string. \b (word boundary) matches between word and non-word characters. \B (non-word boundary) matches where a word boundary doesn't exist. Without anchors, pattern abc matches anywhere: "xabcx" has a match. ^abc anchors to start, so "abcdef" matches but "xabcdef" does not. abc$ anchors to end, so "xabc" matches but "xabcy" does not. ^abc$ anchors to both start and end, so ONLY exactly "abc" matches. Lookahead (?=...) asserts that what follows matches pattern without consuming it: \d+(?=px) matches digits followed by "px" but "px" is not included in match. Lookbehind (?<=...) asserts what precedes matches pattern: (?<=\$)\d+ matches digits preceded by "$" but "$" not included. Negative lookahead (?!...) asserts what follows does NOT match: \d+(?!px) matches digits NOT followed by "px". Because the engine can backtrack, it still matches the "1" in "10px" (that "1" is followed by "0", not "px"); use \d+(?!\d|px) to reject the whole number. Negative lookbehind (?<!...) asserts what precedes does NOT match: (?<!\$)\d+ matches digits NOT preceded by "$" (with the same partial-match caveat: it matches "0" in "$10"). Example: match password with 8+ chars, 1 uppercase, 1 number: ^(?=.*[A-Z])(?=.*\d).{8,}$ uses multiple lookaheads. Assertions don't consume characters, so remaining pattern continues from same position.
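A short sketch of anchors and lookarounds in JavaScript, using invented strings. It also shows the backtracking surprise with negative lookahead, and one possible way to avoid it.

```javascript
// Anchors pin the match to the start and/or end of the string.
const exact = /^abc$/.test("abc");
const notExact = /^abc$/.test("xabc");

// Lookahead and lookbehind check context without including it in the match.
const beforePx = "width: 10px".match(/\d+(?=px)/)[0];
const afterDollar = "cost $25".match(/(?<=\$)\d+/)[0];

// Negative lookahead can backtrack: "1" is followed by "0", not "px".
const partial = "10px".match(/\d+(?!px)/)[0];

// Also forbidding a following digit rejects "10px" as a whole.
const strict = "10px 20em".match(/\d+(?!\d|px)/g);

console.log(exact, notExact);       // true false
console.log(beforePx, afterDollar); // 10 25
console.log(partial);               // 1
console.log(strict);                // [ '20' ]
```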
Regex Flags and Modifiers
- g (global) - Match all occurrences, not just first one. Without g: "aaa".match(/a/) returns ["a"]. With g: "aaa".match(/a/g) returns ["a", "a", "a"]
- i (case-insensitive) - Match ignoring case. /hello/i matches "hello", "HELLO", "HeLLo"
- m (multiline) - Treat ^ and $ as line start/end, not string start/end. Without m: pattern ^test only matches at the start of the string. With m: pattern ^test also matches right after any newline
- s (dotall) - Make . (dot) match newlines. Without s: . matches any char except \n. With s: . matches everything including \n
- u (unicode) - Enable Unicode support, properly handle emoji and extended characters
- y (sticky) - Match must start at current lastIndex position. Used for tokenizing
- In JavaScript: /pattern/flags or new RegExp("pattern", "flags")
- Multiple flags: /pattern/gi means global and case-insensitive
- Most useful combinations: /pattern/g (all matches), /pattern/gi (all matches, case-insensitive)
- Flags change behavior significantly - always consider which flags needed
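The effect of each flag can be seen side by side in a small sketch. The strings here are invented, and the emoji is written as an escape so the snippet stays ASCII-only.

```javascript
const all = "aaa".match(/a/g);                 // g: every match
const anyCase = /hello/i.test("HeLLo");        // i: ignore case

const noMultiline = /^test/.test("one\ntest"); // ^ is string start only
const multiline = /^test/m.test("one\ntest");  // m: ^ also matches after \n

const dotPlain = /a.b/.test("a\nb");           // . skips newlines
const dotAll = /a.b/s.test("a\nb");            // s: . matches newlines too

const emoji = "\u{1F600}";                     // one character, two UTF-16 units
const noUnicode = /^.$/.test(emoji);
const unicode = /^.$/u.test(emoji);            // u: . matches a full code point

// y: the match must start exactly at lastIndex.
const sticky = /\d+/y;
sticky.lastIndex = 4;
const stickyHit = sticky.exec("abc 123");
sticky.lastIndex = 0;
const stickyMiss = sticky.exec("abc 123");

console.log(all, anyCase, noMultiline, multiline);
console.log(dotPlain, dotAll, noUnicode, unicode);
console.log(stickyHit && stickyHit[0], stickyMiss); // 123 null
```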
Common Regex Patterns and Use Cases
- Email: /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/ (basic validation, not RFC compliant)
- Phone: /\d{3}-\d{3}-\d{4}/ matches XXX-XXX-XXXX format
- URL: /^https?:\/\/.+/ matches http or https URLs (simplified)
- IP Address: /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}/ matches IPv4 (no validation)
- Hex Color: /#[0-9a-f]{6}/i matches #RRGGBB color codes
- Date (YYYY-MM-DD): /\d{4}-\d{2}-\d{2}/ (format check only)
- Number (with decimals): /^-?\d+(\.\d+)?$/ matches -123, 45.67, 0
- HTML Tags: /<[^>]+>/g matches any HTML tag
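- Remove Extra Spaces: /\s+/g matches each run of whitespace; replace it with a single space
- Capitalize Words: /\b\w/g with a replacer function, e.g. str.replace(/\b\w/g, c => c.toUpperCase()) (a replacement string like "$1" cannot call toUpperCase)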
- Extract Numbers: /\d+/g gets all number sequences
- Password Strength: /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)[a-zA-Z\d]{8,}$/ requires lowercase, uppercase, digit, 8+ chars
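A few of the patterns above, wrapped in small helper functions as a sketch. The function names are hypothetical, and the hex check adds ^ and $ anchors (not in the list entry) so it validates a whole string rather than finding a color inside text.

```javascript
// Collapse every run of whitespace into one space.
const collapseSpaces = (s) => s.replace(/\s+/g, " ");

// Uppercase the first word character after each word boundary.
// Caveat: an apostrophe is a boundary too, so "don't" becomes "Don'T".
const capitalizeWords = (s) => s.replace(/\b\w/g, (c) => c.toUpperCase());

// All digit runs, or an empty array when there are none.
const extractNumbers = (s) => s.match(/\d+/g) ?? [];

// Anchored variant of the hex color pattern.
const isHexColor = (s) => /^#[0-9a-f]{6}$/i.test(s);

console.log(collapseSpaces("a   b \n c"));       // a b c
console.log(capitalizeWords("hello big world")); // Hello Big World
console.log(extractNumbers("a1b22"));            // [ '1', '22' ]
console.log(isHexColor("#FFaa00"), isHexColor("#FFF")); // true false
```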
Lookahead and Lookbehind Assertions (Advanced)
Lookahead and lookbehind assertions (lookarounds) are zero-width assertions - they don't consume characters but check if pattern exists. Positive lookahead (?=pattern): matches if pattern follows, without including pattern in match. Example: /\d+(?=px)/ applied to "10px" matches only "10", not "px". This is useful when you want digits only if followed by specific unit. Negative lookahead (?!pattern): matches if pattern does NOT follow. Example: /\d+(?!px)/ matches numbers NOT followed by "px" (because of backtracking it can still match part of a number, such as the "1" in "10px"). Positive lookbehind (?<=pattern): matches if pattern precedes. Example: /(?<=\$)\d+/ matches "10" in "$10" but not in "€10". JavaScript supports lookbehind since ES2018, but some older environments don't. Negative lookbehind (?<!pattern): matches if pattern does NOT precede. Example: /(?<!\$)\d+/ matches numbers not preceded by "$". Multiple lookarounds: /^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!@#$])(?!.*abc).{8,}$/ enforces multiple password requirements in one pattern. Lookarounds are powerful for complex validation but can hurt readability - use comments or explanation in production code. Lookarounds containing .* rescan the input for each check, which can add up on very large texts - use simpler patterns if possible. Lookaround support varies between engines (for example, some restrict lookbehind to fixed-width patterns) - test that your engine supports the lookarounds you rely on.
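To address the readability concern, one option is to keep the combined pattern but also split it into one small regex per rule, so a failing rule can be reported by name. The following is only a sketch; the rule names and the failedRules helper are hypothetical.

```javascript
// The combined pattern from the text above.
const strongPassword = /^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!@#$])(?!.*abc).{8,}$/;

// The same requirements, one per rule, for readable error reporting.
const rules = [
  { name: "uppercase letter", re: /[A-Z]/ },
  { name: "lowercase letter", re: /[a-z]/ },
  { name: "digit", re: /\d/ },
  { name: "symbol (!@#$)", re: /[!@#$]/ },
  { name: "no 'abc' sequence", re: /^(?!.*abc)/ },
  { name: "at least 8 characters", re: /^.{8,}$/ },
];

// Returns the names of the rules the password does not satisfy.
function failedRules(password) {
  return rules.filter((rule) => !rule.re.test(password)).map((rule) => rule.name);
}

console.log(strongPassword.test("Secur3!pass")); // true
console.log(failedRules("Secur3!pass"));         // []
console.log(failedRules("Secur3!abc"));          // [ "no 'abc' sequence" ]
console.log(failedRules("weakpass"));
```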
Regex Performance and Optimization
Regex performance depends on: (1) Pattern complexity - simpler patterns are faster. (2) String length - longer strings take longer. (3) Number of matches - finding all matches is slower than finding the first. (4) Backtracking - when the regex engine tries many paths before matching/failing. Catastrophic backtracking (ReDoS - Regular Expression Denial of Service): a pattern like (a+)+b against a run of a's with no "b", such as "aaaaaaaaac", causes exponential backtracking. Each extra "a" roughly doubles the work, so inputs of only a few dozen characters can take seconds or longer. Avoid nested quantifiers: /(a+)+/ is dangerous, /a+/ matches the same text safely. Atomic groups and possessive quantifiers prevent backtracking in engines that support them (such as PCRE and Java); JavaScript does not support them. Optimize by: (1) Using specific patterns instead of wildcards. To match a quoted string, /"[^"]*"/ backtracks less than /".*"/. (2) Anchoring patterns when you mean to match the whole string: /^pattern$/ can fail early instead of being retried at every position (note that anchoring also changes what matches). (3) Alternation order: put most common alternative first: /cat|dog|bird/ instead of /bird|dog|cat/ if "cat" is the most frequent. (4) Avoiding excessive backtracking: .* is greedy and backtracks a lot, use .*? (lazy) or, better, a specific character class such as [^>]*. (5) Using character classes: [a-z0-9] is faster than ([a-z]|[0-9]). Test regex on realistically-sized strings before deploying. JavaScript engines generally do not put a time limit on matching, so patterns that hang your browser should be rewritten. Use online regex testers with a debugger (regex101.com) to analyze backtracking behavior on your patterns.
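A rewrite is only safe if it accepts and rejects the same inputs. The sketch below compares a risky pattern with its simpler equivalent on a few short, made-up samples (kept short on purpose so the risky pattern cannot hang).

```javascript
// Nested quantifier vs. the equivalent single quantifier.
const risky = /(a+)+b/;
const safe = /a+b/;

// Alternation of classes vs. one merged class.
const slowClass = /^([a-z]|[0-9])+$/;
const fastClass = /^[a-z0-9]+$/;

const samples = ["ab", "aaab", "aaaaaaaaac", "b", "", "abc123", "ABC"];

// Both versions of each pattern should agree on every sample.
const sameQuantifier = samples.every((s) => risky.test(s) === safe.test(s));
const sameClass = samples.every((s) => slowClass.test(s) === fastClass.test(s));

console.log(sameQuantifier, sameClass); // true true
```

Agreement on a handful of samples is not a proof of equivalence, but it is a cheap regression check to keep next to a rewritten pattern.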
Regex in Programming Languages
- JavaScript: /pattern/flags, string.match(), string.replace(), string.split(), /pattern/.test(), new RegExp()
- Python: import re; re.match(), re.search(), re.findall(), re.sub(). Uses r"pattern" for raw strings
- Java: Pattern.compile(), matcher.matches(), matcher.find(), matcher.replaceAll()
- C#: Regex.Match(), Regex.Matches(), Regex.Replace(), Regex.IsMatch()
- PHP: preg_match(), preg_match_all(), preg_replace(), uses /pattern/ delimiters
- Go: regexp.Compile(), regexp.MatchString(), and methods on the compiled *Regexp such as FindAllString() and ReplaceAllString()
- Ruby: /pattern/, string.match(), string.scan(), string.gsub(), =~
- Rust: regex crate, Regex::new(), is_match(), find(), find_iter(), replace()
- Swift: NSRegularExpression, uses Foundation framework
- Syntax differences: Some engines support features others don't. Lookbehind not supported in old JavaScript. Named groups syntax differs (Python: ?P<name>, Java: ?<name>). PCRE (Perl Compatible Regex) is most feature-rich, JavaScript is most common in web development. Test patterns in your target language before using in production.
Common Regex Mistakes and Best Practices
- Wrong: Matching email with simple pattern - RFC 5322 is extremely complex, use validation library
- Wrong: Using . without escaping where you mean literal dot - /file.txt/ matches "fileXtxt", use /file\.txt/
- Wrong: Forgetting to escape special chars in user input before using in regex - untrusted input can break pattern
- Wrong: Greedy matching when lazy needed - /.*<div>/ matches too much, use /.*?<div>/
- Wrong: Not anchoring patterns - /hello/ matches "hello" anywhere, maybe want /^hello$/
- Wrong: Case-sensitive when should be case-insensitive - /Hello/ won't match "HELLO", use /hello/i
- Right: Test regex thoroughly on sample data before using in production
- Right: Use online tools (regex101.com) with detailed explanation of pattern
- Right: Break complex patterns into comments explaining each part
- Right: Consider using validation libraries for complex formats (email, URL, phone)
- Right: Use raw strings in languages that support (r"pattern" in Python)
- Right: Use lookarounds instead of consuming characters when you need to check but not match
Frequently Asked Questions
What's the difference between . and [^\n]?
. matches any character except newline (in most regex engines without dotall flag). [^\n] matches any character except \n, explicitly. They're nearly equivalent, with one difference in JavaScript: without the s flag, . also excludes other line terminators (\r, \u2028, \u2029), while [^\n] matches them. With the s (dotall) flag, . matches newlines too, but [^\n] still doesn't. Use . for simplicity - it's the standard way. Use [^\n] when you want "anything but a newline" regardless of whether the dotall flag is set. In practice, . is preferred because it's more readable and universally understood.
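The differences are small but observable in JavaScript. This sketch uses escaped control characters in made-up strings.

```javascript
// Without the s flag, "." also refuses a carriage return; [^\n] accepts it.
const dotCr = /a.b/.test("a\rb");
const classCr = /a[^\n]b/.test("a\rb");

// With the s flag, "." matches a newline; [^\n] still does not.
const dotAllNl = /a.b/s.test("a\nb");
const classNl = /a[^\n]b/s.test("a\nb");

console.log(dotCr, classCr);    // false true
console.log(dotAllNl, classNl); // true false
```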
When should I use \b (word boundary) vs ^ and $?
^ and $ are anchors that match string/line start and end. \b matches the boundary between word and non-word characters. Use ^ and $ to ensure pattern matches entire string or line: /^hello$/ matches ONLY "hello". Use \b to match word boundaries: /\bhello\b/ matches "hello" in "hello world" but not in "helloworld". \b is useful for matching whole words without anchoring to start/end. Example: /\bcat\b/ matches "cat" in "the cat sat" but not in "catch" or "catty". For email validation, use anchors: /^[\w._%+-]+@[\w.-]+\.[a-z]{2,}$/. For word search in text, use word boundaries: /\bsearch\b/g.
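A minimal sketch contrasting the two, with an invented sentence:

```javascript
const sentence = "the cat sat on the catty catch";

// \b finds the standalone word only.
const wholeWords = sentence.match(/\bcat\b/g);

// Without boundaries, "cat" is also found inside "catty" and "catch".
const anywhere = sentence.match(/cat/g);

// ^...$ demands that the entire string be the word.
const entireString = /^hello$/.test("hello world");
const wordInside = /\bhello\b/.test("hello world");
const glued = /\bhello\b/.test("helloworld");

console.log(wholeWords);      // [ 'cat' ]
console.log(anywhere.length); // 3
console.log(entireString, wordInside, glued); // false true false
```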
Why does my regex hang/freeze the browser?
This is catastrophic backtracking (ReDoS - Regular Expression Denial of Service). Your pattern is causing exponential backtracking - the regex engine tries millions of paths before failing or matching. Example: (a+)+b against "aaaaaaaaac" makes the engine try on the order of 2^n combinations, where n is the number of a's. Avoid nested quantifiers: /(a+)+/ is dangerous, /(a)+/ or /a+/ is safe. Also avoid repeated alternation with overlapping branches: /(a|a)*b/ or /(a|aa)*b/ can split the same run of a's in many ways and cause excessive backtracking. Solution: rewrite the pattern so each piece of input can be matched in only one way, for example /a+b/ instead of /(a+)+b/, or a specific class like [^>]* instead of .* before a delimiter. Test patterns on regex101.com, whose debugger shows the backtracking steps. Never accept user-supplied regex patterns directly without strict validation - it's a security vulnerability.
How do I match email addresses correctly?
Email validation with regex is deceptively hard. RFC 5322 (official email spec) is extremely complex - regexes that attempt full compliance run to thousands of characters. For practical purposes, use a simplified pattern: /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/ This accepts the vast majority of real-world addresses but also permits some technically invalid formats (such as consecutive dots in the domain). Never trust email validation regex completely - a well-formed address says nothing about whether the mailbox exists or belongs to the user. The approach most real-world apps use: a simple regex on the client for immediate feedback on the format, then server-side verification that the address actually works by sending a confirmation link or code.
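As a sketch of the "basic format check" step only, the simplified pattern can be wrapped in a helper. The looksLikeEmail name is hypothetical, and passing this check does not mean the address exists.

```javascript
// Simplified pattern from the text above: a format hint, not RFC 5322 validation.
const basicEmail = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Trim surrounding whitespace, then test the format.
function looksLikeEmail(input) {
  return basicEmail.test(input.trim());
}

console.log(looksLikeEmail(" test@example.com ")); // true
console.log(looksLikeEmail("no-at-sign.example")); // false
console.log(looksLikeEmail("a@b"));                // false (no top-level domain)

// A technically invalid address that the simplified pattern still accepts.
console.log(looksLikeEmail("user@example..com"));  // true
```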
What are non-capturing groups (?:...) and when should I use them?
Non-capturing groups (?:...) group patterns without capturing the matched text. Regular groups (name)+ create a capture - the matched text is stored. Non-capturing groups (?:name)+ group without capturing. Use non-capturing when you need grouping but don't need to extract the matched text. Example: /(?:cat|dog)+ flies/ matches "catdog flies" and produces no capture groups; the parentheses only let + apply to the whole alternation. Benefits: (1) Performance - capturing adds overhead, so non-capturing can be slightly faster. (2) Clarity - shows intent that you're grouping, not capturing. (3) Simplicity - your capture group numbering doesn't get messed up by internal groups. Example: /^(?:https?|ftp):\/\/([a-z]+)$/ captures only the host name (as group 1), not the protocol part. In replace: "cat and dog".replace(/(cat|dog)/g, "$1 is an animal") gives the same result as "cat and dog".replace(/(?:cat|dog)/g, "$& is an animal"), where $& is the entire match.
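The effect on group numbering is easiest to see by comparing the match arrays. A small sketch with an invented URL-like string:

```javascript
const url = "ftp://files";

// Capturing: the protocol takes slot 1 and pushes the host to slot 2.
const capturing = url.match(/^(https?|ftp):\/\/([a-z]+)$/);

// Non-capturing: the host is the only group, so it is slot 1.
const nonCapturing = url.match(/^(?:https?|ftp):\/\/([a-z]+)$/);

// In a replacement, $1 (capturing) and $& (whole match) give the same text here.
const withGroup = "cat and dog".replace(/(cat|dog)/g, "$1 is an animal");
const withWhole = "cat and dog".replace(/(?:cat|dog)/g, "$& is an animal");

console.log(capturing.slice(1));      // [ 'ftp', 'files' ]
console.log(nonCapturing.slice(1));   // [ 'files' ]
console.log(withGroup === withWhole); // true
```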
How do lookahead and lookbehind work exactly?
Lookahead (?=...) checks if pattern follows WITHOUT consuming it - the match cursor stays in same place. Example: /\d+(?=px)/ matches digits in "10px" - the match is "10", and the cursor ends after "0", not after "px". Lookbehind (?<=...) checks if pattern precedes WITHOUT consuming it. Example: /(?<=\$)\d+/ matches digits in "$10" - the match is "10"; the "$" is checked but not included. They're "zero-width" - they don't add characters to the match, just assert the surrounding context. Negative versions: (?!...) asserts pattern does NOT follow. (?<!...) asserts pattern does NOT precede. Common use: /\d+(?!px)/ matches numbers NOT followed by "px". Or: /(?<!\$)\d+/ matches numbers NOT preceded by "$". Watch out for backtracking with the negative forms: /\d+(?!px)/ still matches "1" in "10px", so add a digit to the lookahead, /\d+(?!\d|px)/, to reject the whole number. Performance note: lookarounds can be slow on large texts because the engine may re-check them at every position. Limitation: lookbehind must be fixed-width in some engines (not /(?<=a+)/), but JavaScript supports variable-width lookbehind. Use sparingly for performance-sensitive regex.
What's the difference between greedy and lazy quantifiers?
Greedy quantifiers match as MUCH as possible. /a+/ in "aaaaaa" matches all 6 a's. Lazy quantifiers match as LITTLE as possible. /a+?/ in "aaaaaa" matches just 1 a. Greedy: *, +, ?, {n,m} (default). Lazy: *?, +?, ??, {n,m}? (add ? after). Example: /".*"/ in "She said \"hello\" and \"goodbye\"" matches from the first quote to the last quote ("\"hello\" and \"goodbye\""), greedy. /".*?"/ matches just the first "\"hello\"", lazy - stops at first closing quote. This is why lazy matching is crucial for HTML: /<.*>/ greedily matches from the first < to the last > on a line, but /<.*?>/ matches individual tags. Greedy is default because most of the time it's what you want - match as much as possible of a pattern. Use lazy only when you need to stop at a specific boundary. Performance: lazy quantifiers are sometimes faster because they match less, but can be slower in some contexts - test both.
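The quoted-string example can be run directly. The sketch also includes a negated character class, which is a common alternative to the lazy quantifier for this case.

```javascript
const s = 'She said "hello" and "goodbye"';

// Greedy: from the first quote to the last quote.
const greedyQuote = s.match(/".*"/)[0];

// Lazy: each quoted string separately.
const lazyQuotes = s.match(/".*?"/g);

// Negated class: same result here, without relying on laziness.
const classQuotes = s.match(/"[^"]*"/g);

console.log(greedyQuote); // "hello" and "goodbye"
console.log(lazyQuotes);  // [ '"hello"', '"goodbye"' ]
console.log(classQuotes); // [ '"hello"', '"goodbye"' ]
```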
How do I extract multiple parts from a string using groups?
Use capture groups (parentheses) to extract parts. Example: parsing "John, 30, john@example.com": const pattern = /([A-Za-z]+),\s*(\d+),\s*([a-z@.]+)/; const match = "John, 30, john@example.com".match(pattern); Then access: match[1] = "John", match[2] = "30", match[3] = "john@example.com". With global flag, use matchAll: [...str.matchAll(pattern)].forEach(m => { ... }) to process all matches. In replace, use $1, $2 etc: str.replace(/(\w+)\s(\w+)/, "$2, $1"); swaps names. Named groups (modern JavaScript): const pattern = /(?<name>[A-Za-z]+),\s*(?<age>\d+)/; const match = str.match(pattern); Then: match.groups.name and match.groups.age. Named groups are more readable in complex patterns. Pro tip: test groups on regex101.com which shows exactly what each group captures - invaluable for debugging group-heavy patterns.
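Putting the pieces together, here is a sketch that parses several records with matchAll and named groups. The two-line input and the email group name are assumptions added for this example.

```javascript
// The g flag is required by matchAll; named groups label each field.
const record = /(?<name>[A-Za-z]+),\s*(?<age>\d+),\s*(?<email>[a-z@.]+)/g;

const input = "John, 30, john@example.com\nJane, 27, jane@example.org";

// Turn every match into a plain object; age is converted to a number.
const people = [...input.matchAll(record)].map((m) => ({
  name: m.groups.name,
  age: Number(m.groups.age),
  email: m.groups.email,
}));

console.log(people.length);   // 2
console.log(people[0].name);  // John
console.log(people[1].age);   // 27
console.log(people[1].email); // jane@example.org
```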
Why is my regex pattern case-sensitive when I expected case-insensitive?
By default, regex patterns are case-sensitive: /hello/ matches "hello" but NOT "Hello" or "HELLO". To make case-insensitive, add the i flag: /hello/i matches "hello", "Hello", "HELLO", "hElLo", etc. In JavaScript: "HELLO".match(/hello/); returns null. "HELLO".match(/hello/i); returns ["HELLO"]. In languages without regex literals, pass the flag separately: Python: re.match(r"hello", "HELLO", re.IGNORECASE). Common mistake: forgetting the flag on important validations. Keyword search: /Password/ won't match "password" - if case shouldn't matter, use /password/i. URLs: domain names are case-insensitive so /https:\/\/example\.com/i is safer. Remember: for user input you usually want /pattern/i, because users don't think about case. Exception: if case matters (like comparing password hashes or case-sensitive filenames), don't add the i flag.
How do I prevent my regex from breaking on special characters in user input?
If you're building a regex from user input, special characters like . * + [ ] ( ) { } ^ $ | \ can break your pattern. Example: user enters "C++" and you do new RegExp(input) - this throws a SyntaxError because the second + has nothing to repeat. Solution: Escape special characters before using them. JavaScript: const escapeRegex = s => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");. After escaping, you can safely use it: new RegExp(escapeRegex(userInput)). Most languages have built-in escape functions: Python: re.escape(), Java: Pattern.quote(), Ruby: Regexp.escape(). Best practice: NEVER accept user-supplied regex patterns directly - it's a security vulnerability (ReDoS attacks). If you must run user-supplied patterns, restrict them (length limits, timeouts, or an engine that guarantees linear-time matching such as RE2). If you only need literal matching from user input, use string methods (indexOf, includes) instead of regex.
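The escaping approach above can be sketched end to end as follows. The sample input "C++" comes from the text; the surrounding variable names are illustrative.

```javascript
// Prefix every regex metacharacter with a backslash.
const escapeRegex = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

const userInput = "C++";

// Unescaped, the input is not even a valid pattern.
let rawThrows = false;
try {
  new RegExp(userInput);
} catch (err) {
  rawThrows = err instanceof SyntaxError;
}

// Escaped, it matches the literal text.
const literal = new RegExp(escapeRegex(userInput), "g");
const found = "I like C++ and C".match(literal);

console.log(rawThrows);              // true
console.log(escapeRegex(userInput)); // C\+\+
console.log(found);                  // [ 'C++' ]
```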