What are Regular Expressions?
• A regular expression (regex) is a sequence of characters
used in pattern matching on text
• Parse number descriptions and read raw data
Why use Regular Expressions?
• Ex: “TAPE AND REEL”
• TAPE AND REEL, TAPEANDREEL, T AND R, TR, T R,
TANDR, T & R, T&R, TAPE & REEL, TAPE&REEL
• T(APE)?(\s)*((AND)|&)?(\s)*R(EEL)?
Terminology
• String: text to search through
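• Index i: a position between the characters of our string,
counting from 0 before the first character
• Ex:
• Regex: KEMET
• String: “KEMET electronic components”
• The match lies between index 0 and index 5
• Group: part of a regex enclosed in parentheses
• T(APE): the group is (APE)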
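As a small illustration of indices and groups, the sketch below uses Python's standard `re` module (an assumption; the slides do not name a language). The variable names are chosen here for illustration only.

```python
import re

text = "KEMET electronic components"

# search() returns the first match; span() gives the indices that
# sit just before and just after the matched characters.
match = re.search(r"KEMET", text)
start, end = match.span()
matched_text = text[start:end]

# Parentheses create a group; group(0) is the whole match and
# group(1) is the text captured by the first group.
group_match = re.search(r"T(APE)", "TAPE")
whole = group_match.group(0)
group_one = group_match.group(1)

print(start, end, matched_text)  # 0 5 KEMET
print(whole, group_one)          # TAPE APE
```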
Regex Components
String Literals
• The most basic form of pattern matching
• Matches a regex to an exact string or part of a string
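• Regex KEM matches “KEM”; it matches “kem” only when case-insensitive
matching is turned on
• With case-insensitive matching, regex CAP matches the string “The
shorthand for capacitance is cap” twice: between indices 18 and 21, and
between indices 33 and 36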
• Regex bcd234 will not match the string “abcde12345”
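The literal examples can be checked with a short sketch, assuming Python's `re` module. Case-insensitive matching is requested explicitly with `re.IGNORECASE`, since Python matches case-sensitively by default.

```python
import re

text = "The shorthand for capacitance is cap"

# finditer() yields every non-overlapping match; span() reports the
# indices before the first and after the last matched character.
spans = [m.span() for m in re.finditer(r"CAP", text, flags=re.IGNORECASE)]

# Without the flag, the upper-case literal finds nothing in this string.
case_sensitive = re.search(r"CAP", text)

# A literal must appear exactly: "bcd234" is not a substring of "abcde12345".
no_match = re.search(r"bcd234", "abcde12345")

print(spans)           # [(18, 21), (33, 36)]
print(case_sensitive)  # None
print(no_match)        # None
```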
Character Classes
• Use brackets ‘[ ]’ to group characters together as an option for a
single character
• Simple classes: set characters side-by-side, all are available options
• [O0]8[O0]5 will match to “O8O5” and “0805”
• [O0]8[O0]5 does not match to “O8O05”
• Ranges: shortcuts for defining a character class containing a range
of values
• [a-d] matches one of the letters a, b, c, or d
• [1-5] matches one digit between 1 and 5
• [a-c1-3] matches a, b, c, 1, 2, or 3
• pop[2-5] matches “pop3” but not “pop6” or “poptart”
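A minimal sketch of the character-class examples, assuming Python's `re` module. `fullmatch()` is used so that the whole candidate string has to fit the pattern.

```python
import re

# Simple class: each bracket accepts either the letter O or the digit 0.
size_code = re.compile(r"[O0]8[O0]5")
size_results = {s: bool(size_code.fullmatch(s)) for s in ["O8O5", "0805", "O8O05"]}

# Range: the last character must be one of 2, 3, 4, 5.
pop = re.compile(r"pop[2-5]")
pop_results = {s: bool(pop.search(s)) for s in ["pop3", "pop6", "poptart"]}

# Two ranges combined in one class.
mixed = re.findall(r"[a-c1-3]", "abcd1234")

print(size_results)  # {'O8O5': True, '0805': True, 'O8O05': False}
print(pop_results)   # {'pop3': True, 'pop6': False, 'poptart': False}
print(mixed)         # ['a', 'b', 'c', '1', '2', '3']
```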
Predefined Character Classes
• Character class shortcuts
• 1\d3
• Matches 123 and 183, but not 1a3
• 5\D8
• Matches 5!8 and 5a8, but not 528
• \w
• Matches A, _, and 1, but not * or !
• \W2
• Matches &2 and %2, but not J2 or 32
Construct   Description
\d          A digit: [0-9]
\D          Any non-digit
\w          A word character: [a-zA-Z_0-9]
\W          A non-word character
\s          A whitespace character
\S          A non-whitespace character
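The predefined classes can be tried out with a sketch like the one below, assuming Python's `re` module. The helper `matching` is a hypothetical name introduced here. Note that in Python 3, `\d` and `\w` also accept non-ASCII digits and letters unless the `re.ASCII` flag is passed.

```python
import re

def matching(pattern, candidates):
    """Return the candidates that the pattern matches in full."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

digit = matching(r"1\d3", ["123", "183", "1a3"])
non_digit = matching(r"5\D8", ["5!8", "528", "5a8"])
word = matching(r"\w", ["*", "A", "!", "_", "1"])
non_word = matching(r"\W2", ["&2", "J2", "%2", "32"])

print(digit)      # ['123', '183']
print(non_digit)  # ['5!8', '5a8']
print(word)       # ['A', '_', '1']
print(non_word)   # ['&2', '%2']
```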
Metacharacters
• Special characters that affect the way a pattern is matched
• b.t matches to “bat”, “bgt”, “b2t”, etc., since ‘.’ will match to
any character
• ^Volts
• Matches “Volts and Amps”
• Does not match “10 Volts”
• volt$
• Matches “1 volt”
• Does not match “voltage”
• [^456]
• Matches 3 and 9
• Does not match 4
• (CAP)|(IND) matches to “CAP” or “IND”
Metacharacter   Description
. (period)      Any character
^ (caret)       Start of string
$               End of string
[^…]            Negation
x|y             OR operator
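Each metacharacter can be exercised with a short sketch, assuming Python's `re` module. `search()` is used for the anchor examples because the anchors themselves decide where the match may sit. The extra candidates “bt” and “RES” are added here as counter-examples.

```python
import re

# '.' stands for exactly one character, so "bt" is too short.
dot = [s for s in ["bat", "bgt", "b2t", "bt"] if re.fullmatch(r"b.t", s)]

# '^' anchors the match to the start of the string.
starts = [s for s in ["Volts and Amps", "10 Volts"] if re.search(r"^Volts", s)]

# '$' anchors the match to the end of the string.
ends = [s for s in ["voltage", "1 volt"] if re.search(r"volt$", s)]

# '[^...]' accepts any single character that is NOT listed.
negated = [s for s in ["4", "3", "9"] if re.fullmatch(r"[^456]", s)]

# '|' accepts either alternative.
either = [s for s in ["CAP", "IND", "RES"] if re.fullmatch(r"(CAP)|(IND)", s)]

print(dot)      # ['bat', 'bgt', 'b2t']
print(starts)   # ['Volts and Amps']
print(ends)     # ['1 volt']
print(negated)  # ['3', '9']
print(either)   # ['CAP', 'IND']
```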
Quantifiers
• Number of occurrences
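• (\+|-)?10%
• +10%, -10%, 10%
• The plus sign is written \+ because a bare + is itself a quantifier
• 12\.3(4)*
• 12.3, 12.34, 12.344, 12.3444, etc.
• Not 12. or 12
• 0\.(3)+ W
• 0.3 W, 0.33 W, 0.333 W, etc.
• The period is written \. so that it matches only a literal period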
Quantifier   Meaning
X?           X, 0 or 1 occurrences
X*           X, 0 or more occurrences
X+           X, 1 or more occurrences
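The three quantifiers can be sketched as follows, assuming Python's `re` module. The literal plus sign and period are escaped (`\+`, `\.`) so that they are read as plain characters and not as metacharacters. The non-matching candidates are added here for contrast.

```python
import re

# '?': the sign group may appear zero times or once.
tolerance = re.compile(r"(\+|-)?10%")
tolerance_hits = [s for s in ["+10%", "-10%", "10%", "*10%"] if tolerance.fullmatch(s)]

# '*': the digit 4 may appear zero or more times.
decimal = re.compile(r"12\.3(4)*")
decimal_hits = [s for s in ["12.3", "12.34", "12.3444", "12.", "12"] if decimal.fullmatch(s)]

# '+': the digit 3 must appear at least once.
power = re.compile(r"0\.(3)+ W")
power_hits = [s for s in ["0.3 W", "0.333 W", "0. W"] if power.fullmatch(s)]

print(tolerance_hits)  # ['+10%', '-10%', '10%']
print(decimal_hits)    # ['12.3', '12.34', '12.3444']
print(power_hits)      # ['0.3 W', '0.333 W']
```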
T(APE)?(\s)*((AND)|&)?(\s)*R(EEL)?
• TAPE AND REEL, TAPEANDREEL, T AND R, TR, T R,
TANDR, T & R, T&R, TAPE & REEL, TAPE&REEL
• Built up one piece at a time:
• T(APE)
• T(APE)?
• T(APE)?(\s)
• T(APE)?(\s)*
• T(APE)?(\s)*((AND)|&)
• T(APE)?(\s)*((AND)|&)?
• T(APE)?(\s)*((AND)|&)?(\s)*
• T(APE)?(\s)*((AND)|&)?(\s)*R(EEL)?
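To confirm that the finished pattern covers every listed spelling, a sketch like the following can be used, assuming Python's `re` module. The names `TAPE_AND_REEL` and `variants` are illustrative, and “TAPE” and “BULK” are added here as strings that should be rejected.

```python
import re

TAPE_AND_REEL = re.compile(r"T(APE)?(\s)*((AND)|&)?(\s)*R(EEL)?")

variants = [
    "TAPE AND REEL", "TAPEANDREEL", "T AND R", "TR", "T R",
    "TANDR", "T & R", "T&R", "TAPE & REEL", "TAPE&REEL",
]

# fullmatch() requires the entire string to fit the pattern.
all_match = all(TAPE_AND_REEL.fullmatch(v) is not None for v in variants)

# "TAPE" has no R, and "BULK" does not start with T.
rejected = [s for s in ["TAPE", "BULK"] if TAPE_AND_REEL.fullmatch(s) is None]

print(all_match)  # True
print(rejected)   # ['TAPE', 'BULK']
```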
Recap
• We use regexes as a pattern to search through text
• Character classes, metacharacters, quantifiers
• Can get complicated!
• M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})
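The pattern above recognizes Roman numerals. It uses one quantifier not listed earlier: X{m,n} means X repeated between m and n times. A minimal check, assuming Python's `re` module, with sample numerals chosen here for illustration:

```python
import re

ROMAN = re.compile(r"M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")

samples = ["XIV", "MCMXCIV", "MMXXIV", "IIII", "VX", "IC"]

# fullmatch() makes sure no characters are left over after the pattern.
accepted = [s for s in samples if ROMAN.fullmatch(s)]
refused = [s for s in samples if not ROMAN.fullmatch(s)]

print(accepted)  # ['XIV', 'MCMXCIV', 'MMXXIV']
print(refused)   # ['IIII', 'VX', 'IC']
```

Every part of this pattern is optional, so it also accepts the empty string; a real validator would need to rule that case out separately.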
So, why is this important for us?
• In general, regexes are very confusing to write
• Not industry specific
• Value Expressions
• 10 pF, 100 nF, .1 uF
• \vFarad
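The `\vFarad` construct is not defined in this section, so the sketch below does not try to reproduce it. It only shows, as an assumption, how a plain regex in Python's `re` module could read the three example values. The pattern, the name `CAPACITANCE`, and the function `parse_capacitance` are all hypothetical.

```python
import re

# Hypothetical pattern: a number that may start with a bare decimal
# point, optional whitespace, then the prefix p, n, or u followed by F.
CAPACITANCE = re.compile(r"(\d*\.?\d+)\s*([pnu])F")

def parse_capacitance(text):
    """Return (number, unit) for a capacitance string, or None."""
    m = CAPACITANCE.fullmatch(text.strip())
    if m is None:
        return None
    return float(m.group(1)), m.group(2) + "F"

parsed = [parse_capacitance(s) for s in ["10 pF", "100 nF", ".1 uF", "10 ohms"]]
print(parsed)  # [(10.0, 'pF'), (100.0, 'nF'), (0.1, 'uF'), None]
```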