Comprehensive tutorial on Python's re.sub function with practical examples
last modified April 20, 2025
The re.sub function is a powerful tool in Python’s re module for performing substitutions using regular expressions. It searches for patterns in strings and replaces them with specified text.
This function is essential for text processing tasks like data cleaning, formatting, and transformation. It offers more flexibility than simple string replacement methods.
re.sub can use both literal replacements and callbacks for dynamic substitutions. It supports flags to modify matching behavior and can reference matched groups in replacements.
The basic syntax of re.sub has three required parameters and two optional ones:
re.sub(pattern, repl, string, count=0, flags=0)
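pattern is the regex to match. repl is the replacement, either a string or a function called for each match. string is the input text to process.
The optional count limits the number of replacements; the default of 0 replaces every match. flags modify matching behavior. The function returns a new string and leaves the input unchanged.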
Let’s start with a simple example replacing colors in a sentence.
basic_sub.py
#!/usr/bin/python
import re
text = "The sky is blue and the grass is green"
result = re.sub(r'blue', 'gray', text)
print(result)
This replaces all occurrences of ‘blue’ with ‘gray’ in the input text. The replacement is case-sensitive by default.
result = re.sub(r'blue', 'gray', text)
The first argument is the pattern to match. The second is the replacement string. The third is the text to process.
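Two details from the explanation above are easy to check: matching is case-sensitive by default, and the input string itself is never modified. The following is a minimal sketch; the sample sentence is made up for illustration.

```python
import re

text = "Blue skies, blue water"

# Only the lowercase occurrence matches; "Blue" is left untouched.
result = re.sub(r"blue", "gray", text)

print(result)  # Blue skies, gray water
print(text)    # Blue skies, blue water (the original is unchanged)
```

Because strings are immutable, the return value must be assigned to a variable for the substitution to have any effect.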
re.sub shines when using regex patterns for matching.
regex_pattern.py
#!/usr/bin/python
import re
text = "Order 12345 shipped, Order 67890 processing"
result = re.sub(r'Order \d+', 'Order XXXX', text)
print(result)
This replaces all order numbers with ‘XXXX’. The \d+ pattern matches one or more digits.
The example demonstrates how regex patterns can match variable text for consistent replacements.
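The same idea applies to other kinds of variable text. As a small sketch, the \s+ pattern below matches any run of whitespace, so mixed spaces and tabs collapse to single spaces; the input string is an invented example.

```python
import re

messy = "too    many   spaces\tand\ttabs"

# \s+ matches one or more whitespace characters (spaces, tabs, newlines).
clean = re.sub(r"\s+", " ", messy)

print(clean)  # too many spaces and tabs
```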
We can reference matched groups in the replacement string.
group_reference.py
#!/usr/bin/python
import re
text = "2023-04-20"
result = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\2/\3/\1', text)
print(result)
This reformats a date from YYYY-MM-DD to MM/DD/YYYY. Parentheses create capture groups referenced as \1, \2, etc.
r'(\d{4})-(\d{2})-(\d{2})'
The pattern captures year, month, and day as separate groups. Each \d matches a single digit, and {n} requires exactly n repetitions.
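Numbered references get hard to read as patterns grow. One alternative is to name the groups with (?P<name>...) and refer to them as \g<name> in the replacement. The sketch below reuses the date from the example above; the group names are chosen here for illustration.

```python
import re

text = "2023-04-20"

# Named groups document what each part of the pattern captures.
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
result = re.sub(pattern, r"\g<month>/\g<day>/\g<year>", text)
print(result)  # 04/20/2023

# \g<1> also separates a group number from a literal digit that follows it;
# r"\10" would be read as a reference to group 10 instead.
padded = re.sub(r"(\d+)", r"\g<1>0", "7")
print(padded)  # 70
```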
For dynamic replacements, we can use a callback function.
callback_function.py
#!/usr/bin/python
import re
def double_match(match):
    # match.group() returns the matched digits as a string
    return str(int(match.group()) * 2)

text = "Scores: 10, 20, 30"
result = re.sub(r'\d+', double_match, text)
print(result)
This doubles all numbers in the text. The callback receives a match object and returns the replacement string.
The function approach enables complex transformations based on matched content. It’s more flexible than static replacement strings.
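A callback can also decide per match whether to change anything at all. In this sketch, the PRICES table and the add_price helper are hypothetical names invented for the illustration; words missing from the table are returned unchanged.

```python
import re

# Hypothetical lookup table, made up for this example.
PRICES = {"apple": 0.5, "pear": 0.75}

def add_price(match):
    name = match.group(1)
    price = PRICES.get(name)
    if price is None:
        # Returning the full match leaves unknown words as they were.
        return match.group(0)
    return f"{name} (${price:.2f})"

text = "apple, kiwi, pear"
result = re.sub(r"\b(\w+)\b", add_price, text)

print(result)  # apple ($0.50), kiwi, pear ($0.75)
```

The callback must always return a string, which is why the price is formatted rather than returned as a number.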
The count parameter limits how many substitutions occur.
count_parameter.py
#!/usr/bin/python
import re
text = "apple apple apple apple"
result = re.sub(r'apple', 'orange', text, count=2)
print(result)
This replaces only the first two occurrences of ‘apple’. The remaining matches stay unchanged.
Controlling replacement count is useful when you want partial substitutions or to process only certain matches.
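A common case is changing only the first separator in a line. The sketch below uses an invented key=value string. Passing count as a keyword argument keeps the call readable, and Python 3.13 deprecates passing count and flags positionally.

```python
import re

line = "key=value=with=equals"

# count=1 stops after the first match, so later "=" signs are kept.
result = re.sub(r"=", ": ", line, count=1)

print(result)  # key: value=with=equals
```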
Flags like re.IGNORECASE modify matching behavior.
case_insensitive.py
#!/usr/bin/python
import re
text = "Python is GREAT, really great!"
result = re.sub(r'great', 'awesome', text, flags=re.IGNORECASE)
print(result)
This replaces all case variants of ‘great’ with ‘awesome’. The flag makes the match case-insensitive.
Flags can be combined using bitwise OR (|) when multiple behaviors are needed simultaneously.
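As a sketch of combining flags, the example below joins re.IGNORECASE with re.MULTILINE so that ^ anchors at the start of every line and the match ignores case. The three-line input is made up for illustration.

```python
import re

text = "note: first\nNOTE: second\nA note: third"

# MULTILINE lets ^ match at each line start; IGNORECASE covers "NOTE:".
result = re.sub(r"^note:", "TODO:", text, flags=re.IGNORECASE | re.MULTILINE)

print(result)
# TODO: first
# TODO: second
# A note: third
```

The third line is untouched because its "note:" does not sit at the start of the line.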
Here’s a more complex example swapping word positions.
word_swap.py
#!/usr/bin/python
import re
text = "John Doe, Jane Smith, Mike Johnson"
result = re.sub(r'(\w+) (\w+)', r'\2, \1', text)
print(result)
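This swaps first and last names, adding a comma between them. The \w+ pattern matches word characters. The output is Doe, John, Smith, Jane, Johnson, Mike; since the input entries are also comma-separated, a different entry separator would make the result easier to read.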
The example shows how regex groups can restructure text in powerful ways. This technique is useful for data reformatting.
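The reverse transformation works the same way. This sketch assumes the entries are written as "Last, First" and separated by semicolons, an input format invented here to keep the two kinds of separator distinct.

```python
import re

text = "Doe, John; Smith, Jane"

# Group 1 is the last name, group 2 the first name; the comma is dropped.
result = re.sub(r"(\w+), (\w+)", r"\2 \1", text)

print(result)  # John Doe; Jane Smith
```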
When using re.sub, consider these best practices:
Use raw strings (r'...') for patterns to avoid escaping issues
Pre-compile patterns with re.compile if reused frequently
Be specific with patterns to avoid unintended matches
Use callback functions for complex replacement logic
Test patterns thoroughly with various input cases
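Two of the practices above can be sketched together: a pattern compiled once with re.compile, and word boundaries (\b) that keep it from matching inside longer words. The sample sentence and the WORD_CAT name are made up for illustration.

```python
import re

text = "concatenate the cat"

# A loose pattern also matches "cat" inside "concatenate".
loose = re.sub(r"cat", "dog", text)
print(loose)   # condogenate the dog

# Compiled once; \b restricts the match to the whole word.
WORD_CAT = re.compile(r"\bcat\b")
strict = WORD_CAT.sub("dog", text)
print(strict)  # concatenate the dog
```

A compiled pattern object has its own sub method, which takes the replacement and the input string but no pattern argument.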
re.sub performance depends on pattern complexity and input size. Simple patterns on small texts are fast, while complex patterns on large texts may need optimization.
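For repeated substitutions, pre-compiling the pattern with re.compile avoids looking the pattern up on every call. The gain is usually modest, because the re module already caches recently compiled patterns. Avoid unnecessary capturing groups when possible; a non-capturing group (?:...) is enough when the group is needed only for alternation or repetition.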
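A non-capturing group is written (?:...). In the sketch below the title alternatives are grouped without being captured, so the surname remains group 1. The sentence is an invented example, and no timing claim is made for it.

```python
import re

text = "Mr. Smith met Ms. Jones"

# (?:Mr|Mrs|Ms) groups the alternatives but does not create a numbered group.
result = re.sub(r"(?:Mr|Mrs|Ms)\. (\w+)", r"\1", text)

print(result)  # Smith met Jones
```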
This tutorial covered the essential aspects of Python’s re.sub function. Mastering pattern substitution will enhance your text processing capabilities significantly.
My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have authored over 1,400 articles and 8 e-books. I possess more than ten years of experience in teaching programming.