10/31/2024
Help: RegEx
finding substrings in files and strings
Some people, when confronted with a problem, think "I know, I'll use regular expressions."
Now they have two problems.
- Jamie Zawinski
Synopsis:
1.0 Regular Expression Specifiers
1.1 Metacharacters
Metacharacter | Meaning |
---|---|
. | period matches any character except a newline |
* | star matches zero or more of the preceding character or group |
+ | plus matches one or more of the preceding character or group |
? | question mark matches zero or one instance of the preceding character or group |
^ | caret matches the start of the test string, i.e., the character that follows it must be the first. |
$ | dollar matches the end of the test string, i.e., the preceding character must be the last. |
\ | back-slash escapes a metacharacter so it is treated as an ordinary character. |
\| | alternation character represents an "either or", as in cat\|dog matches either cat or dog. |
use regex::Regex;

/* Returns true if reg_ex_str is valid and matches somewhere in test_str */
fn test_reg_ex(test_str: &str, reg_ex_str: &str) -> bool {
    /*
      Regex::new(reg_ex_str) returns the compiled Regex state machine
      wrapped in Ok if reg_ex_str is valid, otherwise Err
    */
    if let Ok(pattern) = Regex::new(reg_ex_str) {
        /* valid reg_ex_str: true if match, else false */
        pattern.is_match(test_str)
    } else {
        false /* not Ok, invalid reg_ex_str */
    }
}

/* Prints whether test_str matches reg_ex_str */
fn show_match_op(test_str: &str, reg_ex_str: &str) {
    let m = test_reg_ex(test_str, reg_ex_str);
    if m {
        println!("{} matches RegEx", test_str);
    } else {
        println!("{} did not match RegEx", test_str);
    }
}
Test literal string matching
RegEx string: Rust|rust|Language|language
rust matches RegEx
Language matches RegEx
foo did not match RegEx
Test metacharacters matching
RegEx string: R.s+
Rust matches RegEx
Rbst matches RegEx
Rcsttt matches RegEx
Rctttt did not match RegEx
rctttt did not match RegEx
1.2 Metastrings
Metastrings | Meaning |
---|---|
\d | matches any digit |
\D | matches any non-digit character |
\w | matches any word character, i.e., [a-zA-Z0-9_] for ASCII text |
\W | matches any non-word character, i.e., [^a-zA-Z0-9_] for ASCII text |
\b | matches any word boundary, a transition from word character to non-word character or vice versa |
\B | matches a non-word boundary |
\s | matches any white-space character, e.g., [ \t\r\n\f] |
\S | matches any non-white-space character, e.g., [^ \t\r\n\f] |
{n} | matches exactly n occurrences of the preceding character or group, e.g., \d{3} matches three adjacent digits |
1.3 Character Classes
Character Classes | Meaning |
---|---|
[rst] | matches "r", "s", or "t" |
[b-y] | matches any lower case letter between "b" and "y", including first and last |
[^A-Z] | matches any character that is not an upper case ASCII letter |
(...) | captures a specific part of the test string, e.g., (\d{3}) captures the first three digits. |
^ | caret matches the start of the test string, i.e., the character that follows it must be the first. |
$ | dollar matches the end of the test string, i.e., the preceding character must be the last. |
\ | back-slash escapes a metacharacter so it is treated as an ordinary character. |
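/* Prints capture groups 1 to 3 of the first match; needs use regex::Regex; */
fn show_capture(test_str: &str, reg_ex_str: &str) {
    match Regex::new(reg_ex_str) {
        Ok(re) => {
            /* captures returns None if reg_ex_str does not match test_str */
            if let Some(caps) = re.captures(test_str) {
                /* group 0 is the whole match, groups 1.. are the parenthesized parts */
                if let Some(group1) = caps.get(1) {
                    println!("Group 1: {}", group1.as_str());
                }
                if let Some(group2) = caps.get(2) {
                    println!("Group 2: {}", group2.as_str());
                }
                if let Some(group3) = caps.get(3) {
                    println!("Group 3: {}", group3.as_str());
                }
            }
        }
        Err(_e) => {
            println!("Invalid RegExStr");
        }
    }
}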
Test capture
RegEx string: (\d{3})-(\d{3})-(\d{4})
test_str: 012-345-6789
Group 1: 012
Group 2: 345
Group 3: 6789
2.0 Regular Expression State Machines
RegEx Method | Explanation |
---|---|
Regex::new(regex_spec) | Compiles the regex_spec string into a state machine, but does not execute it. Returns Ok(Regex) if there are no errors building the state machine, otherwise Err(regex::Error). |
is_match(test_string) | Executes the state machine to see if test_string matches regex_spec. |
find(test_string) | Searches for the first match of regex_spec in test_string, returning an Option that holds the Match if one is found. |
find_iter(text_str) | Returns an iterator over all non-overlapping matches in text_str. |
References:
Link | Comments |
---|---|
Crate regex | Crate documentation for regex 1.1 |
regex tutorial from rust-cookbook | Several examples of regex applications. |
RegEx Notes - Ray Toal | Contains most of the content to be summarized here |