Comprehensive tutorial on Python's Match.span method with practical examples
last modified April 20, 2025
The Match.span method is part of Python’s re module. It returns a tuple containing the start and end positions of a match.
This method is available on match objects returned by regex operations. It provides precise location information about where matches occur in text.
Understanding span is crucial for text processing tasks that require position information, such as highlighting or extracting substrings.
The syntax for Match.span is straightforward:
Match.span(group=0)
The optional group parameter specifies which capture group to return positions for. Default is 0 (the entire match).
Let’s start with a simple example of finding a word’s position in text.
basic_span.py
#!/usr/bin/python
import re
text = “The quick brown fox jumps over the lazy dog” pattern = re.compile(r’fox')
match = pattern.search(text) if match: start, end = match.span() print(f"Found ‘fox’ from position {start} to {end}") print(f"Matched text: ‘{text[start:end]}’")
This example shows how to get the start and end positions of a match. The span is used to extract the matched substring from the original text.
start, end = match.span()
The span method returns a tuple with two integers. The first is the start index, the second is the end index.
text[start:end]
Using the span indices, we can slice the original string to get exactly the matched portion. This is more efficient than using group.
The span method can also return positions for specific groups.
group_span.py
#!/usr/bin/python
import re
text = “Date: 2023-12-25” pattern = re.compile(r’(\d{4})-(\d{2})-(\d{2})’)
match = pattern.search(text) if match: print(f"Full match span: {match.span()}") print(f"Year span: {match.span(1)}") print(f"Month span: {match.span(2)}") print(f"Day span: {match.span(3)}")
This demonstrates getting spans for different capture groups. Group 0 is always the entire match, while groups 1+ are captures.
When processing multiple matches, span helps locate each one.
multi_span.py
#!/usr/bin/python
import re
text = “cat bat hat mat” pattern = re.compile(r’[a-z]at’)
for match in pattern.finditer(text): start, end = match.span() word = text[start:end] print(f"Found ‘{word}’ at positions {start}-{end}")
The finditer method returns match objects for all occurrences. We use span to get each match’s position in the original text.
Named groups make span positions more readable and maintainable.
named_span.py
#!/usr/bin/python
import re
text = “John Doe, age 30” pattern = re.compile(r’(?P<first>\w+) (?P<last>\w+), age (?P<age>\d+)’)
match = pattern.search(text) if match: print(f"Name span: {match.span(‘first’)} to {match.span(’last’)}") print(f"Age span: {match.span(‘age’)}")
Named groups allow accessing spans by meaningful names instead of numbers. This makes code more self-documenting and less prone to errors.
Span information can guide complex text replacement operations.
replace_span.py
#!/usr/bin/python
import re
text = “Error 404: Not Found; Error 500: Server Error” pattern = re.compile(r’Error (\d{3}): ([A-Za-z ]+)’)
def replace_error(match): code_span = match.span(1) desc_span = match.span(2) code = match.group(1) desc = match.group(2).lower() return f"Code {code} ({desc})"
result = pattern.sub(replace_error, text) print(result)
This example uses span information within a replacement function. The spans help understand the structure of each match for processing.
The span method helps identify overlapping matches.
overlap_span.py
#!/usr/bin/python
import re
text = “ababababab” pattern = re.compile(r’(?=(abab))’)
for i, match in enumerate(pattern.finditer(text), 1): start, end = match.span(1) print(f"Match {i}: ‘{match.group(1)}’ at {start}-{end}")
This finds all overlapping occurrences of ‘abab’ in the text. The lookahead pattern with span captures each position.
Handling multiline text requires understanding how span counts positions.
multiline_span.py
#!/usr/bin/python
import re
text = “““First line Second line Third line””” pattern = re.compile(r’^(\w+)’, re.MULTILINE)
for match in pattern.finditer(text): start, end = match.span() line = text[start:end] print(f"Found ‘{line}’ at positions {start}-{end}")
The re.MULTILINE flag makes ^ match line starts. Span positions are counted across the entire string, including newlines.
When using Match.span, follow these best practices:
Always check if a match was found before calling span
Use named groups for better readability with span positions
Remember span indices follow Python’s slicing rules (end is exclusive)
Combine with group when you need both text and position
Handle Unicode characters carefully as they may affect position counts
The span method is highly optimized and has minimal overhead. It’s more efficient than using start and end separately.
For very large texts, be mindful that span positions are byte offsets. With Unicode, character counts may differ from byte positions.
Python Match.span() documentation
This tutorial covered the essential aspects of Python’s Match.span method. Mastering position retrieval will enhance your text processing capabilities.
My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have authored over 1,400 articles and 8 e-books. I possess more than ten years of experience in teaching programming.
List all Python tutorials.