Regular Expressions

Published on February 2017
Software Team

What are Regular Expressions?
• A regular expression (regex) is a sequence of characters that defines a search pattern for text
• Parse part number descriptions and read raw data

Why use Regular Expressions?
• Ex: the packaging option “TAPE AND REEL” can be written many ways:
• TAPE AND REEL, TAPEANDREEL, T AND R, TR, T R, TANDR, T & R, T&R, TAPE & REEL, TAPE&REEL
• One regex matches all of them: T(APE)?(\s)*((AND)|&)?(\s)*R(EEL)?
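A quick way to check that claim is to run the pattern against every variant. This is a minimal sketch using Python's standard `re` module (the slides do not name a language); `TAPE_AND_REEL` and `variants` are illustrative names. It uses `fullmatch`, which requires the whole string to match, and shows one caveat: `search` finds the pattern anywhere, including inside unrelated words.

```python
import re

# The pattern from the slide, compiled once for reuse.
TAPE_AND_REEL = re.compile(r"T(APE)?(\s)*((AND)|&)?(\s)*R(EEL)?")

variants = [
    "TAPE AND REEL", "TAPEANDREEL", "T AND R", "TR", "T R",
    "TANDR", "T & R", "T&R", "TAPE & REEL", "TAPE&REEL",
]

# fullmatch() succeeds only if the entire string fits the pattern.
for text in variants:
    print(f"{text!r:16} -> {TAPE_AND_REEL.fullmatch(text) is not None}")

# Caveat: search() scans for the pattern anywhere, so the "TR" inside an
# unrelated word is also found.
print(TAPE_AND_REEL.search("STRAP"))
```

All ten variants print `True`. Matching is case-sensitive, so lowercase input such as "tape and reel" would additionally need the `re.IGNORECASE` flag.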

Terminology
• String: the text to search through
• Index i: a position between two characters of the string; index 0 is just before the first character
• Ex:
• Regex: KEMET
• String: “KEMET electronic components”
• Match: from index 0 to index 5

• Group: part of a regex enclosed in parentheses
• T(APE)
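The index convention above is the same one Python's `re` module uses for match positions, so it can be checked directly. A small sketch (Python chosen for illustration; `text`, `match` and `group_match` are illustrative names): `span()` returns the start and end indices, and `group(1)` returns what the parenthesized group matched.

```python
import re

text = "KEMET electronic components"

# span() returns (start, end): the indices between characters that
# bracket the match, counting from 0 before the first character.
match = re.search(r"KEMET", text)
print(match.span())                      # (0, 5)
print(text[match.start():match.end()])   # KEMET

# Parentheses create a group; group(0) is the whole match and
# group(1) is the text matched by (APE).
group_match = re.fullmatch(r"T(APE)", "TAPE")
print(group_match.group(0), group_match.group(1))  # TAPE APE
```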

Regex Components

String Literals
• The most basic form of pattern matching
• Matches a regex to an exact string or part of a string
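• Matching is case-sensitive by default: regex KEM matches “kem” only when case-insensitive matching is turned on
• With case-insensitive matching, regex CAP matches the string “The shorthand for capacitance is cap” twice, between indices 18 and 21 and between indices 33 and 36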
• Regex bcd234 will not match the string “abcde12345”
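The literal-matching rules above can be sketched with Python's `re` module (an assumption; the slides use no particular language). The `re.IGNORECASE` flag is how Python turns on case-insensitive matching, and `finditer` reports each match's span using the between-characters indices from the Terminology slide.

```python
import re

sentence = "The shorthand for capacitance is cap"

# Case-sensitive by default: there is no uppercase "CAP" in the sentence.
print(re.findall(r"CAP", sentence))  # []

# Case-insensitive: "CAP" is found inside "capacitance" and as "cap".
spans = [m.span() for m in re.finditer(r"CAP", sentence, re.IGNORECASE)]
print(spans)  # [(18, 21), (33, 36)]

# A literal must appear as one contiguous run of characters;
# "abcde12345" has "bcd" and "234", but never "bcd234".
print(re.search(r"bcd234", "abcde12345"))  # None
```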

Character Classes
• Use brackets ‘[ ]’ to list the characters allowed at a single position
• Simple classes: characters written side by side are all valid options
• [O0]8[O0]5 matches “O8O5” and “0805”
• [O0]8[O0]5 does not match “O8O05”

• Ranges: shorthand for a character class containing a range of values
• [a-d] matches one of the letters a, b, c, or d
• [1-5] matches one digit between 1 and 5
• [a-c1-3] matches a, b, c, 1, 2, or 3
• pop[2-5] matches “pop3” but not “pop6” or “poptart”
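The class and range examples above can be tried with Python's `re` module; this is a sketch for illustration, and `size` and `pop` are made-up names. Each class consumes exactly one character, which is why the extra character in “O8O05” breaks the match.

```python
import re

# [O0] accepts either the letter O or the digit zero at that position,
# a common mix-up in part numbers such as the 0805 case size.
size = re.compile(r"[O0]8[O0]5")
for text in ["O8O5", "0805", "O8O05"]:
    print(text, size.fullmatch(text) is not None)

# A range: "pop" followed by exactly one digit from 2 to 5.
pop = re.compile(r"pop[2-5]")
for text in ["pop3", "pop6", "poptart"]:
    print(text, pop.fullmatch(text) is not None)

# One class may combine several ranges; findall() lists every hit.
print(re.findall(r"[a-c1-3]", "a1d4c3"))  # ['a', '1', 'c', '3']
```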

Predefined Character Classes
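• Character class shortcuts
• 1\d3 matches “123” and “183”, but not “1a3”
• 5\D8 matches “5!8” and “5a8”, but not “528”
• \w matches “A”, “_”, and “1”, but not “*” or “!”
• \W2 matches “&2” and “%2”, but not “J2” or “32”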

Construct   Description
\d          A digit: [0-9]
\D          Any non-digit
\w          A word character: [a-zA-Z_0-9]
\W          A non-word character
\s          A whitespace character
\S          A non-whitespace character
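The predefined classes and their examples can be checked in Python, with one caveat: Python 3 string patterns are Unicode-aware, so `\d` and `\w` also accept non-ASCII digits and letters unless the `re.ASCII` flag is set. The sketch below sets it to match the ASCII definitions in the table; `matches` and `examples` are illustrative names.

```python
import re

def matches(pattern: str, text: str) -> bool:
    # fullmatch: the whole text must match. re.ASCII restricts \d to [0-9]
    # and \w to [a-zA-Z_0-9], as in the table above.
    return re.fullmatch(pattern, text, re.ASCII) is not None

examples = {
    r"1\d3": ["123", "183", "1a3"],
    r"5\D8": ["5!8", "528", "5a8"],
    r"\w": ["*", "A", "!", "_", "1"],
    r"\W2": ["&2", "J2", "%2", "32"],
}

for pattern, texts in examples.items():
    hits = [t for t in texts if matches(pattern, t)]
    print(pattern, hits)
# 1\d3 ['123', '183']
# 5\D8 ['5!8', '5a8']
# \w ['A', '_', '1']
# \W2 ['&2', '%2']

# \s matches spaces, tabs and newlines; splitting on runs of it
# normalizes irregular spacing.
print(re.split(r"\s+", "TAPE \t AND\nREEL"))  # ['TAPE', 'AND', 'REEL']
```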

Metacharacters
• Special characters that affect the way a pattern is matched
• b.t matches “bat”, “bgt”, “b2t”, etc., since ‘.’ matches any character
• ^Volts matches “Volts and Amps” but not “10 Volts”
• volt$ matches “1 volt” but not “voltage”
• [^456] matches “3” and “9” but not “4”
• (CAP)|(IND) matches “CAP” or “IND”

Metacharacter   Description
. (period)      Any character
^ (caret)       Start of string
$               End of string
[^…]            Negation: any character not listed
x|y             OR operator: x or y
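Each metacharacter example above can be reproduced with Python's `re` module; this is an illustrative sketch, and the part-code line "CAP-0805, IND-0603, RES-0402" is made-up sample text. `search` is used for the anchors so that `^` and `$` do the work of pinning the pattern to the start or end.

```python
import re

# '.' matches any single character (except a newline, by default).
print([w for w in ["bat", "bgt", "b2t", "bt"] if re.fullmatch(r"b.t", w)])
# ['bat', 'bgt', 'b2t']

# '^' anchors the pattern to the start of the string, '$' to the end.
starts = [s for s in ["Volts and Amps", "10 Volts"] if re.search(r"^Volts", s)]
ends = [s for s in ["voltage", "1 volt"] if re.search(r"volt$", s)]
print(starts, ends)  # ['Volts and Amps'] ['1 volt']

# [^456] matches one character that is not 4, 5 or 6.
print([c for c in "439" if re.fullmatch(r"[^456]", c)])  # ['3', '9']

# '|' chooses between alternatives; group(0) is the whole match.
line = "CAP-0805, IND-0603, RES-0402"
print([m.group(0) for m in re.finditer(r"(CAP)|(IND)", line)])  # ['CAP', 'IND']
```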

Quantifiers
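• Specify the number of occurrences of the preceding character or group
• (\+|-)?10% matches “+10%”, “-10%”, and “10%” (+ is escaped as \+ because + is itself a quantifier)
• 12\.3(4)* matches “12.3”, “12.34”, “12.344”, “12.3444”, etc., but not “12.” or “12” (\. matches only a literal period)
• 0\.(3)+ W matches “0.3 W”, “0.33 W”, “0.333 W”, etc.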

Quantifier   Meaning
X?           X, 0 or 1 occurrences
X*           X, 0 or more occurrences
X+           X, 1 or more occurrences
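The quantifier examples can be verified with Python's `re` module (an illustrative choice; `tolerance`, `star` and `plus` are made-up names). The sketch also shows why the literal `+` and `.` need escaping: an unescaped leading `+` is rejected as an invalid pattern, and an unescaped `.` accepts any character.

```python
import re

# '?': an optional sign. '+' must be escaped because it is a quantifier.
tolerance = re.compile(r"(\+|-)?10%")
print([t for t in ["+10%", "-10%", "10%", "~10%"] if tolerance.fullmatch(t)])
# ['+10%', '-10%', '10%']

# '*': zero or more 4s after "12.3"; \. is a literal period.
star = re.compile(r"12\.3(4)*")
print([t for t in ["12.3", "12.34", "12.3444", "12.", "12", "1273"] if star.fullmatch(t)])
# ['12.3', '12.34', '12.3444']

# '+': one or more 3s.
plus = re.compile(r"0\.(3)+ W")
print([t for t in ["0. W", "0.3 W", "0.333 W"] if plus.fullmatch(t)])
# ['0.3 W', '0.333 W']

# Without the escapes: '.' matches any character, and a bare '+' has
# nothing to repeat, so compiling the pattern fails.
print(re.fullmatch(r"12.3", "1273") is not None)  # True
try:
    re.compile(r"(+|-)?10%")
except re.error as err:
    print("invalid pattern:", err)
```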

T(APE)?(\s)*((AND)|&)?(\s)*R(EEL)?
• TAPE AND REEL, TAPEANDREEL, T AND R, TR, T R, TANDR, T & R, T&R, TAPE & REEL, TAPE&REEL

Building the regex one piece at a time:
• T(APE): a T followed by the group APE
• T(APE)?: the APE group is optional
• T(APE)?(\s): followed by one whitespace character
• T(APE)?(\s)*: any number of whitespace characters, including none
• T(APE)?(\s)*((AND)|&): then either AND or &
• T(APE)?(\s)*((AND)|&)?: the AND/& part is optional
• T(APE)?(\s)*((AND)|&)?(\s)*: any number of whitespace characters again
• T(APE)?(\s)*((AND)|&)?(\s)*R(EEL)?: an R, optionally followed by EEL
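One way to watch the pattern grow is to apply each intermediate step to the same input and print how much of it is matched. A sketch with Python's `re.match`, which anchors at the start of the string; the input "TAPE & REEL" and the name `steps` are illustrative choices.

```python
import re

# Each step adds one piece to the previous pattern.
steps = [
    r"T(APE)",
    r"T(APE)?",
    r"T(APE)?(\s)",
    r"T(APE)?(\s)*",
    r"T(APE)?(\s)*((AND)|&)",
    r"T(APE)?(\s)*((AND)|&)?",
    r"T(APE)?(\s)*((AND)|&)?(\s)*",
    r"T(APE)?(\s)*((AND)|&)?(\s)*R(EEL)?",
]

text = "TAPE & REEL"
for pattern in steps:
    # match() anchors at the start; group(0) is the prefix matched so far.
    m = re.match(pattern, text)
    print(f"{pattern:36} {m.group(0)!r}")
```

The matched prefix grows from 'TAPE' to 'TAPE ' to 'TAPE &' to 'TAPE & ' and finally to the whole 'TAPE & REEL'. The optional steps (?, *) do not lengthen the match for this input, but they let the shorter variants such as "TR" match as well.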

Recap
• We use regexes as a pattern to search through text
• Character classes, metacharacters, quantifiers
• Can get complicated!
• M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})
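The recap's pattern appears to be the well-known regex for Roman numerals: thousands, then hundreds, tens and ones. It also uses the X{n,m} quantifier, meaning "between n and m occurrences of X". A short sketch in Python (`ROMAN` is an illustrative name) shows it accepting valid numerals and rejecting malformed ones.

```python
import re

# Thousands, hundreds, tens, ones. X{n,m} means "n to m occurrences of X".
ROMAN = re.compile(r"M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")

for numeral in ["MMXVII", "MCMXCIV", "XLII", "IIII", "IC"]:
    print(numeral, ROMAN.fullmatch(numeral) is not None)
# MMXVII True, MCMXCIV True, XLII True, IIII False, IC False
```

One quirk: every part is optional, so the empty string also matches. In practice a check for non-empty input would be added.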

So, why is this important for us?
• In general, regexes are very confusing to write
• Not industry specific
• Value Expressions
• 10 pF, 100 nF, .1 uF
• \vFarad

• Shortcut
• http://localhost:8080/definition-manager/regex
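The slides mention value expressions such as 10 pF, 100 nF and .1 uF, along with a \vFarad shortcut in the team's definition manager. The actual definition behind that shortcut is not shown, so the sketch below uses a hypothetical stand-in pattern to illustrate the idea. `FARAD_VALUE`, `MULTIPLIER` and `parse_farads` are made-up names, and only the p, n and u prefixes are assumed. In standard Python regex syntax, \v means a vertical tab, so the \vFarad shortcut is presumably expanded by the tool before matching.

```python
import re
from typing import Optional

# HYPOTHETICAL stand-in for a capacitance value expression. It is not the
# definition behind the \vFarad shortcut, which the slides do not show.
# (?:...) groups without capturing, so groups() returns (number, prefix).
FARAD_VALUE = re.compile(r"(\d+(?:\.\d+)?|\.\d+)\s*([pnu])F")

# SI multipliers for the prefixes this pattern accepts.
MULTIPLIER = {"p": 1e-12, "n": 1e-9, "u": 1e-6}

def parse_farads(text: str) -> Optional[float]:
    """Return the value in farads, or None if text is not a value expression."""
    m = FARAD_VALUE.fullmatch(text.strip())
    if m is None:
        return None
    number, prefix = m.groups()
    return float(number) * MULTIPLIER[prefix]

for text in ["10 pF", "100 nF", ".1 uF", "10 pf"]:
    print(text, parse_farads(text))
```

100 nF and .1 uF come out as the same value, up to floating-point rounding. The lowercase "10 pf" is rejected because the pattern is case-sensitive.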
