Python XML with SAX tutorial shows how to use the SAX API for event-driven XML parsing in Python.
last modified February 15, 2025
In this article, we show how to use SAX (Simple API for XML) in Python for event-driven XML parsing. SAX is a memory-efficient approach to parsing XML documents, making it suitable for large files. Unlike DOM (Document Object Model), SAX does not load the entire XML document into memory. Instead, it processes the document sequentially and triggers events as it encounters start tags (together with their attributes), end tags, and text.
The xml.sax module is part of Python’s standard library, so no additional installation is required.
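To make the event-driven model concrete, the following minimal sketch records each callback in the order the parser fires it. The EventLogger class and the tiny note document are hypothetical names chosen for illustration; only xml.sax.ContentHandler and xml.sax.parseString come from the standard library.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Hypothetical handler that records every SAX callback in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Attributes arrive together with the start-element event.
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Whitespace-only chunks are skipped to keep the trace short.
        if content.strip():
            self.events.append(("text", content))


logger = EventLogger()
xml.sax.parseString(b'<note lang="en"><to>Ann</to></note>', logger)

for event in logger.events:
    print(event)
```

The trace starts with the start event of the root element and ends with its end event; nothing is kept in memory except what the handler chooses to store.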
The following example demonstrates how to parse an XML document using SAX. We create a custom handler class to handle events such as start elements, end elements, and character data.
main.py
import xml.sax
from io import StringIO


class MyHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.current_element = ""
        self.current_data = ""

    # Called when an element starts
    def startElement(self, tag, attributes):
        self.current_element = tag
        if tag == "book":
            print("Book Id:", attributes["id"])

    # Called when an element ends
    def endElement(self, tag):
        if tag == "title":
            print("Title:", self.current_data)
        elif tag == "author":
            print("Author:", self.current_data)
        elif tag == "year":
            print("Year:", self.current_data)
        self.current_data = ""

    # Called when character data is found
    def characters(self, content):
        if self.current_element in ["title", "author", "year"]:
            self.current_data += content.strip()


xml_data = """
<catalog>
    <book id="1">
        <title>The Great Gatsby</title>
        <author>F. Scott Fitzgerald</author>
        <year>1925</year>
    </book>
    <book id="2">
        <title>1984</title>
        <author>George Orwell</author>
        <year>1949</year>
    </book>
</catalog>
"""

parser = xml.sax.make_parser()
handler = MyHandler()
parser.setContentHandler(handler)

parser.parse(StringIO(xml_data))
In this program, the MyHandler class inherits from xml.sax.ContentHandler and overrides the startElement, endElement, and characters methods to handle XML events.
parser.parse(StringIO(xml_data))
StringIO wraps the xml_data string in an in-memory file-like object, so parser.parse can read the XML data as if it came from a file.
$ python main.py
Book Id: 1
Title: The Great Gatsby
Author: F. Scott Fitzgerald
Year: 1925
Book Id: 2
Title: 1984
Author: George Orwell
Year: 1949
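As an alternative to creating a parser and wrapping the string in StringIO, the standard library also offers xml.sax.parseString, which does both steps in one call. The sketch below is a minimal illustration, not a replacement for the example above; the TitleCollector class is a hypothetical handler, and the data is encoded to bytes on the assumption that this is the most widely supported input type.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Hypothetical handler that gathers the text of every <title> element."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._chunks = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._chunks = []

    def characters(self, content):
        # Only collect text that belongs to a <title> element.
        if self._in_title:
            self._chunks.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._chunks))
            self._in_title = False


xml_data = "<catalog><book><title>1984</title></book></catalog>"

collector = TitleCollector()
# parseString creates the parser and the in-memory source in one call.
xml.sax.parseString(xml_data.encode("utf-8"), collector)
print(collector.titles)
```

Collecting results on the handler instead of printing them inside the callbacks makes the parsed values available to the rest of the program.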
The following example demonstrates how to handle attributes in XML elements using SAX.
main.py
import xml.sax
from io import StringIO


class MyHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.current_element = ""

    # Called when an element starts
    def startElement(self, tag, attributes):
        self.current_element = tag
        if tag == "book":
            print("Book Id:", attributes["id"])
            print("Category:", attributes["category"])

    # Called when an element ends
    def endElement(self, tag):
        pass

    # Called when character data is found
    def characters(self, content):
        pass


xml_data = """
<catalog>
    <book id="1" category="fiction">
        <title>The Great Gatsby</title>
        <author>F. Scott Fitzgerald</author>
        <year>1925</year>
    </book>
    <book id="2" category="dystopian">
        <title>1984</title>
        <author>George Orwell</author>
        <year>1949</year>
    </book>
    <book id="3" category="fiction">
        <title>War and Peace</title>
        <author>Leo Tolstoy</author>
        <year>1869</year>
    </book>
</catalog>
"""

parser = xml.sax.make_parser()
handler = MyHandler()
parser.setContentHandler(handler)

parser.parse(StringIO(xml_data))
In this program, the startElement method is used to handle attributes of the book element, such as id and category.
$ python main.py
Book Id: 1
Category: fiction
Book Id: 2
Category: dystopian
Book Id: 3
Category: fiction
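Indexing the attributes object, as in attributes["category"], raises a KeyError when the attribute is absent. The following sketch shows one way to tolerate optional attributes by using the mapping-style get method instead. The BookAttributes class, the "unknown" fallback value, and the shortened catalog are assumptions made for this illustration.

```python
import xml.sax


class BookAttributes(xml.sax.ContentHandler):
    """Hypothetical handler that collects (id, category) pairs."""

    def __init__(self):
        super().__init__()
        self.books = []

    def startElement(self, name, attrs):
        if name == "book":
            # get() returns the fallback instead of raising KeyError.
            category = attrs.get("category", "unknown")
            self.books.append((attrs.get("id"), category))


# The second book deliberately has no category attribute.
xml_data = b"""<catalog>
  <book id="1" category="fiction"/>
  <book id="2"/>
</catalog>"""

handler = BookAttributes()
xml.sax.parseString(xml_data, handler)
print(handler.books)
```

Attribute values are always delivered as strings, so numeric ids need an explicit conversion such as int() if they are used as numbers.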
The following example demonstrates how to parse an XML file using SAX. This approach is memory-efficient because it processes the file sequentially without loading it entirely into memory.
products.xml
<products>
    <product>
        <id>1</id>
        <name>Product 1</name>
        <price>10.99</price>
        <quantity>30</quantity>
    </product>
    <product>
        <id>2</id>
        <name>Product 2</name>
        <price>20.99</price>
        <quantity>130</quantity>
    </product>
    <product>
        <id>4</id>
        <name>Product 4</name>
        <price>24.59</price>
        <quantity>350</quantity>
    </product>
    <product>
        <id>5</id>
        <name>Product 5</name>
        <price>9.9</price>
        <quantity>650</quantity>
    </product>
    <product>
        <id>6</id>
        <name>Product 6</name>
        <price>45</price>
        <quantity>290</quantity>
    </product>
</products>
This is the products.xml input file.
main.py
from xml.sax import make_parser, ContentHandler


class ProductHandler(ContentHandler):
    def __init__(self):
        super().__init__()
        self.current_data = ""
        self.product = {}

    # Reset the text buffer; start a new record for each product
    def startElement(self, name, attrs):
        self.current_data = ""
        if name == "product":
            self.product = {}

    # Accumulate the text of the current element
    def characters(self, content):
        self.current_data += content.strip()

    # Store the field value, or print the finished product
    def endElement(self, name):
        if name != "product":
            self.product[name] = self.current_data
        else:
            print(f"Id: {self.product['id']}, Name: {self.product['name']}")


parser = make_parser()
parser.setContentHandler(ProductHandler())
parser.parse("products.xml")
In this program, the parser.parse method receives the file name products.xml and reads the XML file directly. The SAX parser processes the file sequentially, so only the product currently being read is held in memory, which makes the approach suitable for large files.
$ python main.py
Id: 1, Name: Product 1
Id: 2, Name: Product 2
Id: 4, Name: Product 4
Id: 5, Name: Product 5
Id: 6, Name: Product 6
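A parser may deliver the text of one element in several characters calls, so stripping each chunk separately can drop whitespace inside a value. The sketch below is one possible variation, not the only way to do it: it buffers the chunks, joins them when the element ends, and collects the products in a list rather than printing them. The ProductCollector class and the shortened inline sample are assumptions for this illustration; the inline data stands in for products.xml so that the block runs on its own.

```python
import xml.sax


class ProductCollector(xml.sax.ContentHandler):
    """Hypothetical handler that gathers products into a list of dicts."""

    def __init__(self):
        super().__init__()
        self.products = []
        self._product = None
        self._chunks = []

    def startElement(self, name, attrs):
        self._chunks = []
        if name == "product":
            self._product = {}

    def characters(self, content):
        # May be called several times per text node, so only collect here.
        self._chunks.append(content)

    def endElement(self, name):
        if name == "product":
            self.products.append(self._product)
            self._product = None
        elif self._product is not None:
            # Join first and strip once, so inner whitespace survives.
            self._product[name] = "".join(self._chunks).strip()
        self._chunks = []


# Shortened stand-in for products.xml (assumption for this sketch).
sample = b"""<products>
  <product><id>1</id><name>Product 1</name><price>10.99</price></product>
  <product><id>2</id><name>Product 2</name><price>20.99</price></product>
</products>"""

collector = ProductCollector()
xml.sax.parseString(sample, collector)

# Values arrive as strings; convert them where numbers are needed.
total = sum(float(p["price"]) for p in collector.products)
print(len(collector.products), "products, total price", round(total, 2))
```

Keeping every product in a list gives up some of the memory advantage of SAX, so for very large files it may be preferable to process each product in endElement and discard it.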
In this article, we have shown how to use the SAX API in Python for event-driven XML parsing. The SAX approach is memory-efficient and suitable for large XML files.
My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have authored over 1,400 articles and 8 e-books. I possess more than ten years of experience in teaching programming.