A regular expression, commonly called RegEx, is a sequence of characters that defines a search pattern. RegEx can be used to check whether a string contains the specified pattern.
By matching a pattern against a text, we can detect whether the pattern is present or absent and break the matched text into smaller components.
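As a minimal sketch of these three uses (presence, absence, and breaking a match into components), the snippet below relies on Python's built-in re module. The sample text and the date pattern are assumptions chosen only for illustration.

```python
import re

# Hypothetical sample text for illustration
text = "Order 66 shipped on 2024-05-17"

# Presence check: search() returns a match object, or None if nothing matches
has_date = re.search(r"\d{4}-\d{2}-\d{2}", text) is not None
print(has_date)  # True

# Absence check: the word "cancelled" does not occur in the text
is_cancelled = re.search(r"cancelled", text) is not None
print(is_cancelled)  # False

# Breaking the match into components with capture groups
m = re.search(r"(\d{4})-(\d{2})-(\d{2})", text)
year, month, day = m.groups()
print(year, month, day)  # 2024 05 17
```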
Python Regex Module
In Python, regular expressions are handled by the built-in re module. Use the import statement to bring this module into your code.
Python Regex Module Syntax
It has the following syntax:
# importing the re module
import re
How to Use RegEx in Python?
The example below demonstrates how to locate the term "platform" within a specified string and outputs both its starting and ending indices:
Example
# simple example to show the use of regular expression
# importing the re module
import re
# given string
str_1 = 'Result: An amazing platform to learn coding'
# searching for specified pattern in given string
matched_str = re.search(r'platform', str_1) # using the search() method
# printing the starting and ending index
print('Beginning Index:', matched_str.start())
print('Ending Index:', matched_str.end())
Output:
Beginning Index: 19
Ending Index: 27
Explanation:
In this example, we used the search() function from the re module to find the pattern in the given string. The r prefix in r'platform' marks the pattern as a raw string, so Python passes any backslashes through to the regex engine unchanged. We then called the start() and end() methods of the match object to get the starting and ending indices of the match.
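To see why the raw-string prefix matters, the short sketch below compares a raw and a normal string that both contain \b. The sample text is an assumption made for illustration.

```python
import re

# In a normal string, "\b" is the backspace character (one character)
print(len("\b"))   # 1
# In a raw string, r"\b" stays as a backslash plus 'b' (two characters)
print(len(r"\b"))  # 2

text = "a platform for learning"  # hypothetical sample text

# Raw string: the regex engine receives \b and treats it as a word boundary
raw_hit = re.search(r"\bplatform\b", text)
# Normal string: the pattern contains backspace characters, so nothing matches
plain_hit = re.search("\bplatform\b", text)

print(raw_hit is not None)  # True
print(plain_hit is None)    # True
```

The pattern r'platform' contains no backslashes, so the prefix makes no difference there; using it consistently simply avoids surprises once a pattern does include them.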
RegEx Functions in Python
In Python, the re module encompasses a variety of functions that enable users to search, match, and modify strings according to defined patterns.
The subsequent table highlights several of these functions:
| RegEx Function | Description |
|---|---|
| re.search() | It is used to locate the first occurrence of a pattern. |
| re.findall() | It is used to find all non-overlapping matches and return them as a list. |
| re.compile() | It is used to compile the regular expressions into pattern objects. |
| re.split() | It is used to split the string on the basis of the occurrences of a specific character or pattern. |
| re.sub() | It is used to replace all occurrences of a character or pattern with a specified replacement string. |
| re.escape() | It is used to escape special characters. |
Let us understand the working of these functions with the help of examples.
re.search
The re.search method is utilized to identify the initial instance of a specified pattern. This method scans the complete string and produces a match object when a match is discovered. In the absence of a match, it returns None to indicate that no match was located.
Python re.search Function Example
Let us consider an example that illustrates how to find a specific pattern within a provided string.
Example
# importing the re module
import re
# given string
str_1 = 'I have been working as a Web Developer since 2023.'
regex_pattern = r"([a-zA-Z]+) (\d+)"
# searching for specified pattern in given string
matched_str = re.search(regex_pattern, str_1) # using the search() method
# checking the returned object
if matched_str:
    # printing the matched pattern details
    print('Match Found:', matched_str.group())
    print('Beginning Index:', matched_str.start())
    print('Ending Index:', matched_str.end())
else:
    print('Match Not Found')
Output:
Match Found: since 2023
Beginning Index: 39
Ending Index: 49
Explanation:
In the preceding example, we employed the search method from the re module to identify the presence of a word that is succeeded by a number. As this specific pattern appears in the provided string as 'since 2023', it was successfully retrieved. Subsequently, we utilized the group method to display the matched pattern, along with its starting and ending indices, which were obtained using the start and end methods, respectively.
re.findall
The re.findall method is utilized to generate a list of non-overlapping matches found within the specified string. In contrast to the search method, which provides only the initial match, the findall function retrieves all matches and presents them in a list format.
Python re.findall Function Example
Let us now take a look at a simple example:
Example
# importing the re module
import re
# given string
str_1 = """My house no. is 4567 and
my office no. is 8910."""
regex_pattern = r"([a-zA-Z]+) (\d+)"
# searching all occurrences in the given string
matched_str_list = re.findall(regex_pattern, str_1)
# checking the returned object
if matched_str_list:
    print(matched_str_list)
else:
    print("No match found")
Output:
[('is', '4567'), ('is', '8910')]
Explanation:
In this example, we used the findall() function from the re module to find every occurrence of the pattern in the given string and stored the result in a variable. Because the pattern contains two capture groups, each match is returned as a tuple of the group values. We then printed the list of matches.
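The shape of the list returned by findall depends on how many capture groups the pattern has. The sketch below reuses the sentence from the example (joined onto one line) with three variants of the pattern; the variable names are chosen for illustration.

```python
import re

text = "My house no. is 4567 and my office no. is 8910."

# No capture groups: each item is the whole match
whole = re.findall(r"[a-zA-Z]+ \d+", text)
print(whole)    # ['is 4567', 'is 8910']

# One capture group: each item is that group's text
numbers = re.findall(r"[a-zA-Z]+ (\d+)", text)
print(numbers)  # ['4567', '8910']

# Two capture groups: each item is a tuple of the group values
pairs = re.findall(r"([a-zA-Z]+) (\d+)", text)
print(pairs)    # [('is', '4567'), ('is', '8910')]
```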
re.compile
The function re.compile serves to convert a regular expression pattern into an object that can be reused, enabling us to employ its methods (including search, findall, and others) repeatedly without the need to redefine the pattern each time.
Python re.compile Function Example
Below is an illustration demonstrating the application of the compile function:
Example
# importing the re module
import re
# given string
str_1 = "Welcome to our tutorial."
# using compile() function
regex_pattern = re.compile('[a-e]')
# searching all occurrences in the given string
matched_str_list = regex_pattern.findall(str_1) # calling findall() on the compiled pattern
if matched_str_list:
    print(matched_str_list)
else:
    print("No match found")
Output:
['e', 'c', 'e', 'a']
Explanation:
In this illustration, we utilized the re.compile method to transform a regular expression pattern into a regex object that can be reused. Following this, we invoked the findall method to identify all instances within the specified string.
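A compiled pattern object exposes the same operations as the module-level functions, so one object can be reused across several calls. The sketch below assumes a case-insensitive vowel pattern, which is not part of the original example.

```python
import re

# Compile once; the flag and the vowel pattern are assumptions for illustration
vowel_re = re.compile(r"[aeiou]", re.IGNORECASE)

text = "Welcome to our tutorial."

# search(): first vowel and its position
first = vowel_re.search(text)
print(first.group(), first.start())  # e 1

# findall(): every vowel in the text
vowel_count = len(vowel_re.findall(text))
print(vowel_count)  # 10

# sub(): the same compiled pattern applied to a different string
masked = vowel_re.sub("*", "Regex")
print(masked)  # R*g*x
```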
re.split
The re.split function serves the purpose of dividing a string at each point where a regex pattern corresponds, akin to the functionality of the str.split method; nevertheless, it offers enhanced capabilities for pattern matching.
Python re.split Function Example
Let’s examine the subsequent example to illustrate how the re.split function operates in Python.
Example
# importing the re module
import re
# given string
str_1 = "mango banana,apple;orange,cherry"
# regex pattern: semicolon, comma, or whitespace
regex_pattern = r'[;,\s]'
# using the split() function
matched_str_list = re.split(regex_pattern, str_1)
if matched_str_list:
    print(matched_str_list)
else:
    print("No match found")
Output:
['mango', 'banana', 'apple', 'orange', 'cherry']
Explanation:
In this instance, we utilized the split function from the re module to divide the provided string at every point where the defined regex pattern corresponds.
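When delimiters appear back to back, a single-character pattern produces empty strings in the result. The sketch below uses a hypothetical input with a comma followed by a space and a double space, and shows two common adjustments: the + quantifier and the maxsplit argument.

```python
import re

# Hypothetical input with consecutive delimiters
text = "mango, banana;apple  orange"

# Each delimiter character splits on its own, leaving empty strings
single = re.split(r"[;,\s]", text)
print(single)   # ['mango', '', 'banana', 'apple', '', 'orange']

# Adding + treats a run of delimiters as one separator
merged = re.split(r"[;,\s]+", text)
print(merged)   # ['mango', 'banana', 'apple', 'orange']

# maxsplit limits the number of splits performed
limited = re.split(r"[;,\s]+", text, maxsplit=1)
print(limited)  # ['mango', 'banana;apple  orange']
```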
re.sub
The re.sub method is a feature of the re module that facilitates the substitution of every occurrence of a regex pattern within a string with a specified replacement string. This function operates similarly to a "find and replace" mechanism utilizing regex patterns.
Python re.sub Function Example
In this section, we will examine a specific example to illustrate the usage of the re.sub function in Python.
Example
# importing the re module
import re
# given string
original_str = "Roses are red, Violets are blue."
print("Original String:", original_str)
# pattern and replacement
pattern = "red"
replacement = "white"
# using the sub() function
new_str = re.sub(pattern, replacement, original_str)
print("New String:", new_str)
Output:
Original String: Roses are red, Violets are blue.
New String: Roses are white, Violets are blue.
Explanation:
In this instance, we have utilized the sub function from the re module to identify the defined pattern within the provided string and substitute it with the designated replacement.
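Beyond a fixed replacement string, re.sub also accepts backreferences to capture groups, a replacement function, and a count limit. The following sketch illustrates all three; the sample strings and the helper name double are assumptions for illustration.

```python
import re

text = "Due dates: 2024-05-17 and 2025-01-03"  # hypothetical sample text

# \1, \2, \3 in the replacement refer to the pattern's capture groups
swapped = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)
print(swapped)  # Due dates: 17/05/2024 and 03/01/2025

# A function receives each match object and returns the replacement text
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, "4 floors, 3 flats")
print(doubled)  # 8 floors, 6 flats

# count limits how many replacements are made
first_only = re.sub(r"\d+", "many", "4 floors, 3 flats", count=1)
print(first_only)  # many floors, 3 flats
```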
re.subn
The subn function is an additional function found in the re module, operating similarly to the sub function. Nevertheless, it provides a tuple that includes both the modified string and the count of substitutions made within the specified string.
Python re.subn Function Example
Let's examine the subsequent example to grasp how the re.subn function operates.
Example
# importing the re module
import re
# given string
original_str = "This building has 4 floors. There are 3 flats on each floor."
# pattern and replacement
pattern = r'\d+'
replacement = 'many'
# using the subn() function
new_str, num_subs = re.subn(pattern, replacement, original_str)
# printing the results
print("Original string:", original_str)
print("New string:", new_str)
print("Number of substitutions:", num_subs)
Output:
Original string: This building has 4 floors. There are 3 flats on each floor.
New string: This building has many floors. There are many flats on each floor.
Number of substitutions: 2
Explanation:
In this illustration, we utilized the re.subn function to substitute every instance of the defined pattern within the provided string with the specified replacement. Subsequently, we saved both the modified string and the count of substitutions made, and then we printed those results.
re.escape
The escape function belongs to the re module and is utilized to escape every special character within a string, ensuring it can be safely treated as a literal in a regular expression.
Python re.escape Function Example
Below is an illustration demonstrating the application of the re.escape function within Python.
Example
# importing the re module
import re
# using the re.escape() function
print(re.escape("Welcome to our tutorial"))
print(re.escape("We've \t learned various [a-9] concepts of& Python ^!"))
Output:
Welcome\ to\ our\ tutorial
We've\ \ \ learned\ various\ \[a\-9\]\ concepts\ of\&\ Python\ \^!
Explanation:
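In this example, we used the re.escape function to escape the special characters in the given strings. In Python 3.7 and later, only characters that can have special meaning in a regular expression are escaped, which is why the apostrophe and the exclamation mark are left unchanged. The tab character in the second string is also escaped, so it appears in the output as a backslash followed by a tab.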
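A typical reason to call re.escape is searching for literal text that happens to contain meta-characters. The sketch below assumes a hypothetical user-supplied search term, "1+1", and compares the unescaped and escaped searches.

```python
import re

text = "Is 1+1=2? Some say 1+1=11."
literal = "1+1"  # hypothetical user-supplied search text

# Unescaped, + is a quantifier: the pattern means "one or more 1s, then a 1"
unescaped = re.findall(literal, text)
print(unescaped)  # ['11']

# Escaped, the + is matched literally
escaped = re.findall(re.escape(literal), text)
print(escaped)    # ['1+1', '1+1']
```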
Meta-characters in Python Regex
In regular expressions, meta-characters are symbols with a special meaning that controls how a pattern is interpreted. They are not treated as literal characters unless they are prefixed with a backslash "\" to remove their special meaning.
Presented below is a table that includes different meta-characters utilized in Python regular expressions:
| Meta-Character | Description |
|---|---|
| \ | It is used to drop the special meaning of the character following it |
| [] | It represents a character class |
| ^ | It is used to match the beginning |
| $ | It is used to match the end |
| . | It is used to match any character except newline |
| \| | It means OR (matches either of the patterns separated by it) |
| ? | It matches zero or one occurrence |
| * | It matches any number of occurrences (including 0 occurrences) |
| + | It matches one or more occurrences |
| {} | It indicates the number of occurrences of a preceding regex to match |
| () | It encloses a group of regex |
Meta-characters Example in Python Regex
Let us examine an illustration of meta-characters utilized in Python regular expressions.
Example
# importing the re module
import re
# The . meta-character matches any character (except newline)
text = "cat bat sat mat"
pattern = r".at" # Matches any single character followed by 'at'
matches = re.findall(pattern, text)
print(matches)
# The ^ meta-character matches the start of the string
text = "The quick brown fox"
pattern = r"^The" # Matches 'The' only if it is at the beginning
matches = re.findall(pattern, text)
print(matches)
# The $ meta-character matches the end of the string
text = "The quick brown fox"
pattern = r"fox$" # Matches 'fox' only if it is at the end
matches = re.findall(pattern, text)
print(matches)
# The * meta-character matches zero or more occurrences of the preceding character
text = "ab abb abbb"
pattern = r"ab*" # Matches 'a' followed by zero or more 'b's
matches = re.findall(pattern, text)
print(matches)
# The + meta-character matches one or more occurrences of the preceding character
text = "ab abb abbb"
pattern = r"ab+" # Matches 'a' followed by one or more 'b's
matches = re.findall(pattern, text)
print(matches)
Output:
['cat', 'bat', 'sat', 'mat']
['The']
['fox']
['ab', 'abb', 'abbb']
['ab', 'abb', 'abbb']
Explanation:
In this illustration, various meta-characters are demonstrated, including ., ^, $, *, and +. We have applied these meta-characters within distinct regex patterns and utilized the findall function to identify all occurrences within the specified strings.
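The remaining meta-characters from the table ([], |, ?, {}, (), and \) can be sketched in the same way. The sample strings and variable names below are assumptions chosen for illustration.

```python
import re

# [] character class: any one of the listed characters
class_hits = re.findall(r"[bc]at", "cat bat sat")
print(class_hits)   # ['cat', 'bat']

# | alternation: either alternative may match
alt_hits = re.findall(r"cat|dog", "cat dog bird")
print(alt_hits)     # ['cat', 'dog']

# ? makes the preceding character optional
opt_hits = re.findall(r"colou?r", "color colour")
print(opt_hits)     # ['color', 'colour']

# {m,n} matches between m and n repetitions of the preceding item
rep_hits = re.findall(r"\d{2,3}", "1 22 333 4444")
print(rep_hits)     # ['22', '333', '444']

# () grouping: findall returns only the text captured by the group
group_hits = re.findall(r"(\w+)@example\.com", "ann@example.com, bob@example.com")
print(group_hits)   # ['ann', 'bob']

# \ removes the special meaning of the character that follows it
dot_hits = re.findall(r"\.", "a.b.c")
print(dot_hits)     # ['.', '.']
```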
Special Sequences in Python Regex
Special sequences are shorthand notations for frequently used character classes or positions. Each sequence begins with a backslash "\" followed by a specific letter or symbol.
Below is a compilation of frequently utilized special sequences in regular expressions (regex):
| Special Sequence | Description |
|---|---|
| \d | Digit (0 - 9) |
| \D | Non-digit |
| \w | Word character (letters, digits, underscore) |
| \W | Non-word character |
| \s | Whitespace (space, tab, newline) |
| \S | Non-whitespace |
| \b | Word boundary |
| \B | Not a word boundary |
| \A | Start of string |
| \Z | End of string |
Special Sequences Example in Python Regex
Next, let us look at an example of special sequences in Python regex.
Example
# importing the re module
import re
# \d Matches a digit
txt = "Welcome to our tutorial 123 Tech"
x = re.findall(r"\d", txt)
print(x)
# \S Matches a non-whitespace character
txt = "logicpractice tech"
x = re.findall(r"\S", txt)
print(x)
# \w Matches a word character (alphanumeric + underscore)
txt = "logicpractice world_123"
x = re.findall(r"\w", txt)
print(x)
# \b Matches the boundary between a word and a non-word character
txt = "logicpractice tech"
x = re.findall(r"\btech\b", txt)
print(x)
Output:
['1', '2', '3']
['l', 'o', 'g', 'i', 'c', 'p', 'r', 'a', 'c', 't', 'i', 'c', 'e', 't', 'e', 'c', 'h']
['l', 'o', 'g', 'i', 'c', 'p', 'r', 'a', 'c', 't', 'i', 'c', 'e', 'w', 'o', 'r', 'l', 'd', '_', '1', '2', '3']
['tech']
Explanation:
In this illustration, we observe the application of various special sequences such as \d, \S, \w, and \b within regular expressions (regex).
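The example above covers \d, \S, \w, and \b. As a rough sketch, the other sequences from the table (\D, \W, \s, \B, \A, and \Z) behave as follows; the sample strings are assumptions for illustration.

```python
import re

txt = "Room 42, Floor 7"  # hypothetical sample text

# \D matches any non-digit; + collects runs of them
non_digits = re.findall(r"\D+", txt)
print(non_digits)  # ['Room ', ', Floor ']

# \W matches any non-word character (the underscore counts as a word character)
non_word = re.findall(r"\W", "a-b_c!")
print(non_word)    # ['-', '!']

# \s matches whitespace such as space, tab, and newline
parts = re.split(r"\s+", "a b\tc\nd")
print(parts)       # ['a', 'b', 'c', 'd']

# \B matches only where there is no word boundary, so the standalone "at" is skipped
inner = re.findall(r"\Bat", "at cat bat")
print(inner)       # ['at', 'at']

# \A and \Z anchor the pattern to the start and end of the string
start_ok = re.search(r"\ARoom", txt) is not None
end_ok = re.search(r"7\Z", txt) is not None
print(start_ok, end_ok)  # True True
```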
Match Objects in Python Regex
When employing functions such as re.search or re.match, a match object is returned if a match is detected. This object contains various details about the match, such as its content and location within the string.
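The two functions differ in where they look: re.match only tries to match at the beginning of the string, while re.search scans the whole string. A minimal sketch, assuming a shortened version of the sentence used in this section:

```python
import re

text = "He is a boy and he plays cricket"

# match() only succeeds if the pattern matches at position 0
at_start = re.match(r"He", text)
print(at_start.span())  # (0, 2)

# "boy" is not at the beginning, so match() returns None
not_at_start = re.match(r"boy", text)
print(not_at_start)     # None

# search() finds the same pattern anywhere in the string
anywhere = re.search(r"boy", text)
print(anywhere.span())  # (8, 11)
```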
Match Objects Example in Python Regex
Let’s explore an example that illustrates the match objects in Python’s regular expressions (regex).
Example
# importing the re module
import re
# given string
str_1 = "He is a boy and he plays cricket everyday"
# regex pattern
regex_pattern = r"he"
# Finding the first occurrence and return a match object
match_obj = re.search(regex_pattern, str_1, re.IGNORECASE)
if match_obj:
    print(f"Search match object: {match_obj}")
    print(f"Match start index: {match_obj.start()}")
    print(f"Match end index: {match_obj.end()}")
    print(f"Matched string: {match_obj.group(0)}")
Output:
Search match object: <re.Match object; span=(0, 2), match='He'>
Match start index: 0
Match end index: 2
Matched string: He
Explanation:
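In this example, re.search returns a match object for the first case-insensitive occurrence of "he" in the string. We then used the start(), end(), and group() methods of that object to read the position and the text of the match.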
Below is a compilation of frequently utilized methods associated with match objects in Python's regular expressions:
| Method | Description |
|---|---|
| .start() | It returns the start index of the match |
| .end() | It returns the end index of the match |
| .span() | It returns a tuple of (start, end) |
| .group() | It returns the matched string |
| .groups() | It returns all capture groups as a tuple |
| .group(n) | It returns the nth capture group |
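The methods in the table that deal with capture groups can be sketched with the pattern and sentence from the re.search example earlier in this tutorial:

```python
import re

m = re.search(r"([a-zA-Z]+) (\d+)",
              "I have been working as a Web Developer since 2023.")

print(m.span())    # (39, 49)
print(m.group())   # since 2023
print(m.groups())  # ('since', '2023')
print(m.group(1))  # since
print(m.group(2))  # 2023

# start() and end() also accept a group number
print(m.start(2), m.end(2))  # 45 49
```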
Conclusion
In this tutorial, we explored the concept of regular expressions within the Python programming language. We gained insights into the functionality of regex in Python and examined its multiple functions. Additionally, we delved into other essential topics such as meta-characters, special sequences, and match objects.
Python Regular Expression MCQs
- Which module is used for regular expressions in Python?
  - a) re
  - b) regex
  - c) pattern
  - d) string
- What does \d match in regex?
  - a) A letter
  - b) A space
  - c) A digit
  - d) A special character
- What function is used to find all matches of a pattern in a string?
  - a) match
  - b) findall
  - c) search
  - d) compile
  - Answer: b) re.findall
- Which option matches a single whitespace character?
- What does the . (dot) meta-character match?
  - a) A digit only
  - b) End of a string
  - c) A whitespace character
  - d) Any single character except newline