Python Regex Tutorial with Examples

A Regular Expression, commonly referred to as RegEx, is a sequence of characters that defines a search pattern. RegEx can be used to check whether a string contains the specified search pattern.

By matching a pattern against a text, you can detect whether the pattern is present or absent, and you can also break a match into smaller components.

Python Regex Module

In Python, regular expressions are managed using the built-in module known as "re." You can utilize the import statement to bring this module into your code.

Python Regex Module Syntax

It has the following syntax:

Example

# importing the re module

import re

How to Use RegEx in Python?

The example below demonstrates how to locate the term "platform" within a specified string and outputs both its starting and ending indices:

Example

# simple example to show the use of regular expression

# importing the re module

import re

# given string

str_1 = 'Result: An amazing platform to learn coding'

# searching for specified pattern in given string

matched_str = re.search(r'platform', str_1) # using the search() method

# printing the starting and ending index

print('Beginning Index:', matched_str.start())

print('Ending Index:', matched_str.end())

Output:

Beginning Index: 19

Ending Index: 27

Explanation:

In this example, we used the search function from the re module to find the given pattern in the provided string. The r prefix in r'platform' marks the pattern as a raw string, so Python passes any backslashes to the regex engine unchanged. We then called the start and end methods of the returned match object to get the indices where the match begins and ends. The ending index is the position just after the last matched character.
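To see why the r prefix matters, here is a minimal sketch. The sample text and the \b word-boundary sequence are illustrative choices and are not part of the example above. In a raw string the backslash reaches the regex engine unchanged, whereas in a normal string Python first turns "\b" into a backspace character.

```python
import re

text = "a platform for learners"  # illustrative sample text

# r"\b" is two characters (backslash + b); "\b" is one (a backspace).
raw_len = len(r"\b")
plain_len = len("\b")
print(raw_len, plain_len)  # 2 1

# Raw string: the regex engine receives \b and treats it as a word boundary.
raw_match = re.search(r"\bplatform\b", text)
print(raw_match.group())  # platform

# Normal string: the pattern contains backspace characters, so nothing matches.
plain_match = re.search("\bplatform\b", text)
print(plain_match)  # None
```

For this reason, writing regex patterns as raw strings is usually the safer habit.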

RegEx Functions in Python

In Python, the re module encompasses a variety of functions that enable users to search, match, and modify strings according to defined patterns.

The subsequent table highlights several of these functions:

RegEx Function    Description
re.search()       Locates the first occurrence of a pattern in a string.
re.findall()      Finds all non-overlapping matches and returns them as a list.
re.compile()      Compiles a regular expression into a reusable pattern object.
re.split()        Splits the string at each occurrence of a specific character or pattern.
re.sub()          Replaces all occurrences of a character or pattern with a specified replacement string.
re.subn()         Works like re.sub() but also returns the number of substitutions made.
re.escape()       Escapes special characters so that a string can be matched literally.

Let us understand the working of these functions with the help of examples.

re.search

The re.search method is utilized to identify the initial instance of a specified pattern. This method scans the complete string and produces a match object when a match is discovered. In the absence of a match, it returns None to indicate that no match was located.

Python re.search Function Example

Let us consider an example that illustrates how to find a specific pattern within a provided string.

Example

# importing the re module

import re

# given string

str_1 = 'I have been working as a Web Developer since 2023.'

regex_pattern = r"([a-zA-Z]+) (\d+)"

# searching for specified pattern in given string

matched_str = re.search(regex_pattern, str_1) # using the search() method

# checking the returned object

if matched_str:

  # printing the matched pattern details

  print('Match Found:', matched_str.group())

  print('Beginning Index:', matched_str.start())

  print('Ending Index:', matched_str.end())

else:

  print('Match Not Found')

Output:

Match Found: since 2023

Beginning Index: 39

Ending Index: 49

Explanation:

In the preceding example, we employed the search method from the re module to identify the presence of a word that is succeeded by a number. As this specific pattern appears in the provided string as 'since 2023', it was successfully retrieved. Subsequently, we utilized the group method to display the matched pattern, along with its starting and ending indices, which were obtained using the start and end methods, respectively.
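The parentheses in the pattern above also create capture groups, so each part of the match can be read separately. A short sketch reusing the same string and pattern (the variable names word and year are illustrative):

```python
import re

text = 'I have been working as a Web Developer since 2023.'
match = re.search(r"([a-zA-Z]+) (\d+)", text)

# search() returns None when nothing matches, so check before using the result
if match:
    word = match.group(1)   # text captured by the first pair of parentheses
    year = match.group(2)   # text captured by the second pair of parentheses
    print(word)             # since
    print(year)             # 2023
    print(match.groups())   # ('since', '2023')
```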

re.findall

The re.findall method is utilized to generate a list of non-overlapping matches found within the specified string. In contrast to the search method, which provides only the initial match, the findall function retrieves all matches and presents them in a list format.

Python re.findall Function Example

Let us now take a look at a simple example:

Example

# importing the re module

import re

# given string

str_1 = """My house no. is 4567 and

          my office no. is 8910."""

regex_pattern = r"([a-zA-Z]+) (\d+)"

# searching all occurrences in the given string

matched_str_list = re.findall(regex_pattern, str_1)

# checking the returned object

if matched_str_list:

  print(matched_str_list)

else:

  print("No match found")

Output:

[('is', '4567'), ('is', '8910')]

Explanation:

In this example, we used the findall function from the re module to find every occurrence of the pattern in the given string and stored the result in a variable. Because the pattern contains two capture groups, each item in the returned list is a tuple holding the text captured by each group. We then printed that list.
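The shape of the list returned by findall depends on how many capture groups the pattern has. The following sketch, using a single-line variant of the string above, compares the three cases:

```python
import re

text = "My house no. is 4567 and my office no. is 8910."

# No capture groups: each item is the whole match as a string
numbers = re.findall(r"\d+", text)
print(numbers)           # ['4567', '8910']

# One capture group: each item is only the text of that group
numbers_after_is = re.findall(r"is (\d+)", text)
print(numbers_after_is)  # ['4567', '8910']

# Two capture groups: each item is a tuple with one entry per group
pairs = re.findall(r"([a-zA-Z]+) (\d+)", text)
print(pairs)             # [('is', '4567'), ('is', '8910')]
```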

re.compile

The function re.compile serves to convert a regular expression pattern into an object that can be reused, enabling us to employ its methods (including search, findall, and others) repeatedly without the need to redefine the pattern each time.

Python re.compile Function Example

Below is an illustration demonstrating the application of the compile function:

Example

# importing the re module

import re

# given string

str_1 = "Welcome to our tutorial."

# using compile() function

regex_pattern = re.compile('[a-e]')

# searching all occurrences in the given string

matched_str_list = regex_pattern.findall(str_1)

if matched_str_list:

  print(matched_str_list)

else:

  print("No match found")

Output:

['e', 'c', 'e', 'a']

Explanation:

In this illustration, we utilized the re.compile method to transform a regular expression pattern into a regex object that can be reused. Following this, we invoked the findall method to identify all instances within the specified string.
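The benefit of compiling shows up when one pattern is applied to several strings. A minimal sketch, assuming an illustrative list of words and a vowel pattern that are not part of the example above:

```python
import re

# Compile once, then reuse the pattern object on several strings
vowel_pattern = re.compile(r"[aeiou]")

words = ["regex", "python", "rhythm"]  # illustrative sample data

# Count the vowels in each word with the same compiled pattern
counts = {word: len(vowel_pattern.findall(word)) for word in words}
print(counts)  # {'regex': 2, 'python': 1, 'rhythm': 0}

# The compiled object offers the same methods as the module-level functions
no_vowel = vowel_pattern.search("rhythm")
print(no_vowel)  # None
```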

re.split

The re.split function serves the purpose of dividing a string at each point where a regex pattern corresponds, akin to the functionality of the str.split method; nevertheless, it offers enhanced capabilities for pattern matching.

Python re.split Function Example

Let’s examine the subsequent example to illustrate how the re.split function operates in Python.

Example

# importing the re module

import re

# given string

str_1 = "mango banana,apple;orange,cherry"

# pattern: a semicolon, a comma, or any whitespace character

regex_pattern = r'[;,\s]'

# using the split() function to split wherever the pattern matches

matched_str_list = re.split(regex_pattern, str_1)

if matched_str_list:

  print(matched_str_list)

else:

  print("No match found")

Output:

['mango', 'banana', 'apple', 'orange', 'cherry']

Explanation:

In this instance, we utilized the split function from the re module to divide the provided string at every point where the defined regex pattern corresponds.
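Two further options of re.split are often useful: the maxsplit argument, which limits the number of splits, and a capture group around the pattern, which keeps the delimiters in the result. A brief sketch using the same string:

```python
import re

text = "mango banana,apple;orange,cherry"

# maxsplit limits the number of splits; the remainder stays in the last item
first_two = re.split(r"[;,\s]", text, maxsplit=2)
print(first_two)    # ['mango', 'banana', 'apple;orange,cherry']

# A capture group around the pattern keeps the delimiters in the result
with_delims = re.split(r"([;,\s])", text)
print(with_delims)
# ['mango', ' ', 'banana', ',', 'apple', ';', 'orange', ',', 'cherry']
```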

re.sub

The re.sub method is a feature of the re module that facilitates the substitution of every occurrence of a regex pattern within a string with a specified replacement string. This function operates similarly to a "find and replace" mechanism utilizing regex patterns.

Python re.sub Function Example

In this section, we will examine a specific example to illustrate the usage of the re.sub function in Python.

Example

# importing the re module

import re

# given string

original_str = "Roses are red, Violets are blue."

print("Original String:", original_str)

# pattern and replacement

pattern = "red"

replacement = "white"

# using the sub() function

new_str = re.sub(pattern, replacement, original_str)

print("New String:", new_str)

Output:

Original String: Roses are red, Violets are blue.

New String: Roses are white, Violets are blue.

Explanation:

In this instance, we have utilized the sub function from the re module to identify the defined pattern within the provided string and substitute it with the designated replacement.
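Beyond a fixed replacement string, re.sub also accepts a count argument, backreferences to capture groups, and a function as the replacement. The sketch below uses illustrative sample strings to show each of these:

```python
import re

text = "red roses, red apples, red wine"  # illustrative sample text

# count limits how many matches are replaced (here only the first one)
once = re.sub(r"red", "white", text, count=1)
print(once)     # white roses, red apples, red wine

# \1 and \2 in the replacement refer to the capture groups of the pattern
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "user@example")
print(swapped)  # example@user

# The replacement can also be a function that receives each match object
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 cats, 10 dogs")
print(doubled)  # 6 cats, 20 dogs
```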

re.subn

The subn function is an additional function found in the re module, operating similarly to the sub function. Nevertheless, it provides a tuple that includes both the modified string and the count of substitutions made within the specified string.

Python re.subn Function Example

Let's examine the subsequent example to grasp how the re.subn function operates.

Example

# importing the re module

import re

# given string

original_str = "This building has 4 floors. There are 3 flats on each floor."

# pattern and replacement

pattern = r'\d+'

replacement = 'many'

# using the subn() function

new_str, num_subs = re.subn(pattern, replacement, original_str)

# printing the results

print("Original string:", original_str)

print("New string:", new_str)

print("Number of substitutions:", num_subs)

Output:

Original string: This building has 4 floors. There are 3 flats on each floor.

New string: This building has many floors. There are many flats on each floor.

Number of substitutions: 2

Explanation:

In this illustration, we utilized the re.subn function to substitute every instance of the defined pattern within the provided string with the specified replacement. Subsequently, we saved both the modified string and the count of substitutions made, and then we printed those results.

re.escape

The escape function belongs to the re module and is utilized to escape every special character within a string, ensuring it can be safely treated as a literal in a regular expression.

Python re.escape Function Example

Below is an illustration demonstrating the application of the re.escape function within Python.

Example

# importing the re module

import re

# using the re.escape() function

print(re.escape("Welcome to our tutorial"))

print(re.escape("We've \t learned various [a-9] concepts of& Python ^!"))

Output:

Welcome\ to\ our\ tutorial

We've\ \	\ learned\ various\ \[a\-9\]\ concepts\ of\&\ Python\ \^!

Explanation:

In this instance, we have utilized the re.escape function to appropriately escape any special characters present within the provided strings.
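A common use of re.escape is searching for text that should be matched literally, such as user input. A minimal sketch, where the price string stands in for hypothetical user-supplied text:

```python
import re

text = "Total: $5.00 (approx.)"
literal = "$5.00"  # hypothetical user-supplied search text

# Unescaped, "$" means end of string and "." means any character,
# so the pattern cannot match the literal text
unescaped_match = re.search(literal, text)
print(unescaped_match)   # None

# Escaped, every special character is treated as a literal
safe_pattern = re.escape(literal)
print(safe_pattern)      # \$5\.00

escaped_match = re.search(safe_pattern, text)
print(escaped_match.group())  # $5.00
```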

Meta-characters in Python Regex

In regular expressions, meta-characters are symbols with a special meaning that control how a pattern is interpreted. They are not treated as literal characters unless they are prefixed with a backslash "\" to escape their special meaning.

Presented below is a table that includes different meta-characters utilized in Python regular expressions:

Meta-Character    Description
\                 Drops the special meaning of the character that follows it
[]                Represents a character class
^                 Matches the beginning of the string
$                 Matches the end of the string
.                 Matches any character except a newline
|                 Means OR (matches any one of the patterns separated by it)
?                 Matches zero or one occurrence
*                 Matches any number of occurrences (including zero)
+                 Matches one or more occurrences
{}                Specifies how many occurrences of the preceding regex to match
()                Encloses a group of regex

Meta-characters Example in Python Regex

Let us examine an illustration of meta-characters utilized in Python regular expressions.

Example

# importing the re module

import re

# The . meta-character matches any character (except newline)

text = "cat bat sat mat"

pattern = r".at"  # Matches any single character followed by 'at'

matches = re.findall(pattern, text)

print(matches)

# The ^ meta-character matches the start of the string

text = "The quick brown fox"

pattern = r"^The" # Matches 'The' only if it is at the beginning

matches = re.findall(pattern, text)

print(matches)

# The $ meta-character matches the end of the string

text = "The quick brown fox"

pattern = r"fox$" # Matches 'fox' only if it is at the end

matches = re.findall(pattern, text)

print(matches)

# The * meta-character matches zero or more occurrences of the preceding character

text = "ab abb abbb"

pattern = r"ab*" # Matches 'a' followed by zero or more 'b's

matches = re.findall(pattern, text)

print(matches)

# The + meta-character matches one or more occurrences of the preceding character

text = "ab abb abbb"

pattern = r"ab+" # Matches 'a' followed by one or more 'b's

matches = re.findall(pattern, text)

print(matches)

Output:

['cat', 'bat', 'sat', 'mat']

['The']

['fox']

['ab', 'abb', 'abbb']

['ab', 'abb', 'abbb']

Explanation:

In this illustration, various meta-characters are demonstrated, including ., ^, $, *, and +. We have applied these meta-characters within distinct regex patterns and utilized the findall function to identify all occurrences within the specified strings.
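The remaining meta-characters from the table can be tried in the same way. The following sketch uses illustrative sample strings that are not part of the example above:

```python
import re

# ? makes the preceding character optional (zero or one occurrence)
optional = re.findall(r"colou?r", "color colour")
print(optional)         # ['color', 'colour']

# {m,n} matches between m and n repetitions of the preceding item
repeated = re.findall(r"\d{2,3}", "1 22 333 4444")
print(repeated)         # ['22', '333', '444']

# | matches the alternative on either side of it
either = re.findall(r"cat|dog", "cat dog bird")
print(either)           # ['cat', 'dog']

# [] matches any one character from the class
char_class = re.findall(r"[bc]at", "cat bat sat mat")
print(char_class)       # ['cat', 'bat']

# () groups characters so that a quantifier applies to the whole group
grouped = re.search(r"(ab)+", "xababx")
print(grouped.group())  # abab

# \ removes the special meaning of the character that follows it
literal_dots = re.findall(r"\.", "a.b.c")
print(literal_dots)     # ['.', '.']
```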

Special Sequences in Python Regex

Special sequences are shorthand for frequently used character classes or positions. Each sequence begins with a backslash "\" followed by a specific letter.

Below is a compilation of frequently utilized special sequences in regular expressions (regex):

Special Sequence    Description
\d                  Digit (0 - 9)
\D                  Non-digit
\w                  Word character (letters, digits, underscore)
\W                  Non-word character
\s                  Whitespace (space, tab, newline)
\S                  Non-whitespace
\b                  Word boundary
\B                  Not a word boundary
\A                  Start of string
\Z                  End of string

Special Sequences Example in Python Regex

Next, let us look at an example of special sequences in Python regex.

Example

# importing the re module

import re

# \d Matches a digit

txt = "Welcome to our tutorial 123 Tech"

x = re.findall(r"\d", txt)

print(x)

# \S Matches a non-whitespace character

txt = "logicpractice tech"

x = re.findall(r"\S", txt)

print(x)

# \w Matches a word character (alphanumeric + underscore)

txt = "logicpractice world_123"

x = re.findall(r"\w", txt)

print(x)

# \b Matches the boundary between a word and a non-word character

txt = "logicpractice tech"

x = re.findall(r"\btech\b", txt)

print(x)

Output:

['1', '2', '3']

['l', 'o', 'g', 'i', 'c', 'p', 'r', 'a', 'c', 't', 'i', 'c', 'e', 't', 'e', 'c', 'h']

['l', 'o', 'g', 'i', 'c', 'p', 'r', 'a', 'c', 't', 'i', 'c', 'e', 'w', 'o', 'r', 'l', 'd', '_', '1', '2', '3']

['tech']

Explanation:

In this illustration, we observe the application of various special sequences such as \d, \S, \w, and \b within regular expressions (regex).
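The other sequences from the table behave in a similar way. A short sketch with an illustrative sample string:

```python
import re

txt = "Room 42, Floor 7"  # illustrative sample text

# \D matches any character that is not a digit
non_digits = re.findall(r"\D+", txt)
print(non_digits)   # ['Room ', ', Floor ']

# \s matches a whitespace character
whitespace = re.findall(r"\s", txt)
print(whitespace)   # [' ', ' ', ' ']

# \W matches any character that is not a word character
non_word = re.findall(r"\W", txt)
print(non_word)     # [' ', ',', ' ', ' ']

# \A anchors the match to the start of the string
at_start = re.findall(r"\ARoom", txt)
print(at_start)     # ['Room']

# \Z anchors the match to the end of the string
at_end = re.findall(r"\d\Z", txt)
print(at_end)       # ['7']
```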

Match Objects in Python Regex

When employing functions such as re.search or re.match, a match object is returned if a match is detected. This object contains various details about the match, such as its content and location within the string.
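The two functions differ in where they look: re.match only tries to match at the beginning of the string, while re.search scans the whole string. A brief sketch with an illustrative sample string:

```python
import re

text = "say hello"  # illustrative sample text

# match() only succeeds if the pattern matches at the very start
match_result = re.match(r"hello", text)
print(match_result)           # None

# search() finds the first match anywhere in the string
search_result = re.search(r"hello", text)
print(search_result.span())   # (4, 9)
```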

Match Objects Example in Python Regex

Let’s explore an example that illustrates the match objects in Python’s regular expressions (regex).

Example

# importing the re module

import re

# given string

str_1 = "He is a boy and he plays cricket everyday"

# regex pattern

regex_pattern = r"he"

# Finding the first occurrence and return a match object

match_obj = re.search(regex_pattern, str_1, re.IGNORECASE)

if match_obj:

    print(f"Search match object: {match_obj}")

    print(f"Match start index: {match_obj.start()}")

    print(f"Match end index: {match_obj.end()}")

    print(f"Matched string: {match_obj.group(0)}")

Output:

Search match object: <re.Match object; span=(0, 2), match='He'>

Match start index: 0

Match end index: 2

Matched string: He

Explanation:

In this illustration, we can observe the application of various match object methods within regular expressions (regex).

Below is a compilation of frequently utilized methods associated with match objects in Python's regular expressions:

Method       Description
.start()     Returns the start index of the match
.end()       Returns the end index of the match
.span()      Returns a tuple of (start, end)
.group()     Returns the matched string
.groups()    Returns all capture groups as a tuple
.group(n)    Returns the nth capture group
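The methods in this table can be tried together on a single match. The sketch below assumes an illustrative string and a pattern with two capture groups:

```python
import re

text = "Contact: admin@example.com"  # illustrative sample text
match = re.search(r"(\w+)@(\w+)\.com", text)

if match:
    print(match.group())    # admin@example.com
    print(match.group(1))   # admin
    print(match.group(2))   # example
    print(match.groups())   # ('admin', 'example')
    print(match.span())     # (9, 26)
    # span() is the same as the pair (start(), end())
    print(match.start(), match.end())  # 9 26
```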

Conclusion

In this tutorial, we explored the concept of regular expressions within the Python programming language. We gained insights into the functionality of regex in Python and examined its multiple functions. Additionally, we delved into other essential topics such as meta-characters, special sequences, and match objects.

Python Regular Expression MCQs

  1. Which module is used for regular expressions in Python?
  • re
  • regex
  • pattern
  • string
  2. What does \d match in regex?
  • A letter
  • A space
  • A digit
  • A special character
  3. What function is used to find all matches of a pattern in a string?
  • match
  • findall
  • search
  • compile

Response: b) re.findall

  4. Which special sequence matches a single whitespace character?
  5. What does the . (dot) meta-character match?
  • A digit only
  • End of a string
  • A whitespace character
  • Any single character except newline
