How regex changed my life (for good) – always learning – match exact number of letters/numbers

On April 21st, 2022 in Tech

I was working today on a Google Spreadsheet with more than 5,000 lines and I had to filter some columns. Google already have a pretty good filter by value, but you can also use conditions or custom functions.

I was looking for a way to filter every cell that doesn’t begins with exact 2 numbers. That one was pretty simple and fast (please note that I use Google Spreadsheets with my locale in Brazil, so you may have to adapt the function):

=NO(REGEXMATCH(B2;"^[0-9]{2}"))Code language: JavaScript (javascript)

Next, I would like to filter other column for avoiding cells that begins with “+”, AND is not blank, AND doesn’t contains 8 or 9 numbers. The 1st and 2nd rules were easy, but I had some trouble with the 3rd.

Searching on the internet, I found this Stackoverflow post that solved my problem. This is my function:

=NOT(OR(REGEXMATCH(AO2;"^\+");REGEXMATCH(AO2;"^(?:[^0-9]*[0-9]){8,9}[^0-9]*$");ISBLANK(AO2)))Code language: JavaScript (javascript)

What I learned today was how to use (?:…). “(?:…) is a non-capturing group that allows quantifying a sequence of subpatterns and is thus easily adjustable to match 3, 4 or more specific sequences in a string” as Mr. Wiktor Stribiżew explained on his answer.

I had already used (?:…) for the purpose of only non-capturing group sometime ago, but for “quantifying a sequence of subpatterns and is thus easily adjustable to match 3, 4 or more specific sequences” was new to me.

You can try his demo at https://regex101.com/r/DyP5e7/1.

I hope it may help somebody else too.

Tags: regex

How regex changed my life (for good)

On April 17th, 2021 in Tech

To be honest, I’m writing this post for myself. A place where I can store many little tips and bits of information about regex that can be useful on everyday life. If you want something more dense and better documented, I recommend to skip to the end of this post where you can find some great references, including websites that teaches regex exclusively. I’m not here to compete with them.

What is regex?

A regular expression (shortened as regex or regexp; also referred to as rational expression) is a sequence of characters that specifies a search pattern. Usually such patterns are used by string-searching algorithms for “find” or “find and replace” operations on strings, or for input validation. It is a technique developed in theoretical computer science and formal language theory.

From Wikipedia

How can it improve your life?

Regex is “Find” or “Find and Replace” but with Steroids. Regex allows us to use a LOT of search patterns that makes it a Swiss knife for the daily uses (when you get used to it).

I won’t bullshit, it is NOT human readable and the learning curve is quite steep, but regex is that kind of with that after learned, you will use for the rest of your life in a computer. Editing documents, converting text to table, programming, or even studying and reading on the web.

Some basics

Many of the examples bellow are from other websites that I found and saved for myself. If you want it removed or credited, please ask me that.

Special characters

There are 12 characters with special meanings:

  • the backslash \,
  • the caret ^,
  • the dollar sign $,
  • the period or dot .,
  • the vertical bar or pipe symbol |,
  • the question mark ?,
  • the asterisk or star *,
  • the plus sign +,
  • the opening parenthesis (,
  • the closing parenthesis ),
  • the opening square bracket [, and
  • and the opening curly brace {

Special characters in regex must always be preceded by “\” . For example: if I want to find “US$ 2.5 (diet coke)” it would be written: “US\$ 2\.5 \(diet coke\)”.

Things get even crazier when you start to declare common characters using escape backslash and special characters, eg.: search for “\\SE+VER.23(beta)” would be written “\\\\SE\+VER\.23\(beta\)”.

the backslash \

\+plus = +plus
\\esc = \esc
\t = "tab" (the special character)Code language: JavaScript (javascript)

the caret ^

^text = text (in the beginning of a line)
^text = text is something
^text ≠ something is text (text is not on the beginning of the line)

the period or dot .

ton. = tone
ton. = ton#
ton. = ton4
ton. ≠ tones

. = any char except newline
\. = the actual dot character
.? = .{0,1} = match any char except newline zero or one times
.* = .{0,} = match any char except newline zero or more times
.+ = .{1,} = match any char except newline one or more timesCode language: PHP (php)

the vertical bar or pipe symbol |

pand(abc|123) = pandora OR pand123

the asterisk or star *

tre*= tree (e is found 2 times)
tre* = tre (e is found 1 time)
tre* = tr (e is found 0 times)
tre* ≠  trees

the plus sign +

tre+ = tree (e is found 2 times)
tre+ = tre (e is found 1 time)
tre+ ≠  tr (e is found 0 times)

the opening and closing parenthesis ( )

Find:
(something) = There's something around here. (finds something)
Replace:
I can taste $1. = I can taste something.

Find:
(frog) = There's a frog in the pound. (finds frog)
Replace:
I can taste $1. = I can taste frog.Code language: PHP (php)

the opening square bracket [

[a-z] = mice (will find 4 entries: m, i, c, e)

[a-z]+ = mice (will find mice)
[a-z]+ = car monkey dinamite (will find 3 entries: car, monkey, and dinamite)

[A-Z] = mice (won't find anything)
[A-Z]+ = Mice (won't find anything)
[A-Z]+ = MICE (will find one entry)

[a-zA-Z]+ = MICE (will find MICE)
[a-zA-Z]+ = mice (will find mice)
[a-zA-Z]+ = Mice (will find Mice)
[a-zA-Z]+ = MICE Mice mice (will find 3 entries: MICE, Mice, and mice)Code language: PHP (php)

and the opening curly brace {

a

The next post

For today that’s all. On the next post I’ll show some examples and useful tips that I learned while using regex.

List of regex references

List of regex tools

Tags: regex