To be honest, I’m writing this post for myself. A place where I can store many little tips and bits of information about regex that can be useful on everyday life. If you want something more dense and better documented, I recommend to skip to the end of this post where you can find some great references, including websites that teaches regex exclusively. I’m not here to compete with them.
What is regex?
A regular expression (shortened as regex or regexp; also referred to as rational expression) is a sequence of characters that specifies a search pattern. Usually such patterns are used by string-searching algorithms for “find” or “find and replace” operations on strings, or for input validation. It is a technique developed in theoretical computer science and formal language theory.
From Wikipedia
How can it improve your life?
Regex is “Find” or “Find and Replace” but with Steroids. Regex allows us to use a LOT of search patterns that makes it a Swiss knife for the daily uses (when you get used to it).
I won’t bullshit, it is NOT human readable and the learning curve is quite steep, but regex is that kind of with that after learned, you will use for the rest of your life in a computer. Editing documents, converting text to table, programming, or even studying and reading on the web.
Some basics
Many of the examples bellow are from other websites that I found and saved for myself. If you want it removed or credited, please ask me that.
Special characters
There are 12 characters with special meanings:
- the backslash \,
- the caret ^,
- the dollar sign $,
- the period or dot .,
- the vertical bar or pipe symbol |,
- the question mark ?,
- the asterisk or star *,
- the plus sign +,
- the opening parenthesis (,
- the closing parenthesis ),
- the opening square bracket [, and
- and the opening curly brace {
Special characters in regex must always be preceded by “\” . For example: if I want to find “US$ 2.5 (diet coke)” it would be written: “US\$ 2\.5 \(diet coke\)”.
Things get even crazier when you start to declare common characters using escape backslash and special characters, eg.: search for “\\SE+VER.23(beta)” would be written “\\\\SE\+VER\.23\(beta\)”.
the backslash \
\+plus = +plus
\\esc = \esc
\t = "tab" (the special character)
Code language: JavaScript (javascript)
the caret ^
^text = text (in the beginning of a line)
^text = text is something
^text ≠ something is text (text is not on the beginning of the line)
the period or dot .
ton. = tone
ton. = ton#
ton. = ton4
ton. ≠ tones
. = any char except newline
\. = the actual dot character
.? = .{0,1} = match any char except newline zero or one times
.* = .{0,} = match any char except newline zero or more times
.+ = .{1,} = match any char except newline one or more times
Code language: PHP (php)
the vertical bar or pipe symbol |
pand(abc|123) = pandora OR pand123
the asterisk or star *
tre*= tree (e is found 2 times)
tre* = tre (e is found 1 time)
tre* = tr (e is found 0 times)
tre* ≠ trees
the plus sign +
tre+ = tree (e is found 2 times)
tre+ = tre (e is found 1 time)
tre+ ≠ tr (e is found 0 times)
the opening and closing parenthesis ( )
Find:
(something) = There's something around here. (finds something)
Replace:
I can taste $1. = I can taste something.
Find:
(frog) = There's a frog in the pound. (finds frog)
Replace:
I can taste $1. = I can taste frog.
Code language: PHP (php)
the opening square bracket [
[a-z] = mice (will find 4 entries: m, i, c, e)
[a-z]+ = mice (will find mice)
[a-z]+ = car monkey dinamite (will find 3 entries: car, monkey, and dinamite)
[A-Z] = mice (won't find anything)
[A-Z]+ = Mice (won't find anything)
[A-Z]+ = MICE (will find one entry)
[a-zA-Z]+ = MICE (will find MICE)
[a-zA-Z]+ = mice (will find mice)
[a-zA-Z]+ = Mice (will find Mice)
[a-zA-Z]+ = MICE Mice mice (will find 3 entries: MICE, Mice, and mice)
Code language: PHP (php)
and the opening curly brace {
a
The next post
For today that’s all. On the next post I’ll show some examples and useful tips that I learned while using regex.