Python XML with SAX

Python XML with SAX tutorial shows how to use the SAX API for event-driven XML parsing in Python.

Python XML with SAX

Python XML with SAX

last modified February 15, 2025

In this article, we show how to use the SAX (Simple API for XML) in Python for event-driven XML parsing. SAX is a memory-efficient approach to parsing XML documents, making it suitable for large files. Unlike DOM (Document Object Model), SAX does not load the entire XML document into memory. Instead, it processes the document sequentially and triggers events as it encounters elements, attributes, and text.

The xml.sax module is part of Python’s standard library, so no additional installation is required.

Basic SAX Parsing

The following example demonstrates how to parse an XML document using SAX. We create a custom handler class to handle events such as start elements, end elements, and character data.

main.py

import xml.sax from io import StringIO

class MyHandler(xml.sax.ContentHandler): def init(self): self.current_element = "" self.current_data = ""

# Called when an element starts
def startElement(self, tag, attributes):
    self.current_element = tag
    if tag == "book":
        print("Book Id:", attributes["id"])

# Called when an element ends
def endElement(self, tag):
    if tag == "title":
        print("Title:", self.current_data)
    elif tag == "author":
        print("Author:", self.current_data)
    elif tag == "year":
        print("Year:", self.current_data)
    self.current_data = ""

# Called when character data is found
def characters(self, content):
    if self.current_element in ["title", "author", "year"]:
        self.current_data += content.strip()

XML data

xml_data = """ <catalog> <book id=“1”> <title>The Great Gatsby</title> <author>F. Scott Fitzgerald</author> <year>1925</year> </book> <book id=“2”> <title>1984</title> <author>George Orwell</author> <year>1949</year> </book> </catalog> """

Create a SAX parser

parser = xml.sax.make_parser() handler = MyHandler() parser.setContentHandler(handler)

Parse the XML data

parser.parse(StringIO(xml_data))

In this program, the MyHandler class inherits from xml.sax.ContentHandler and overrides the startElement, endElement, and characters methods to handle XML events.

parser.parse(StringIO(xml_data))

The StringIO is used to create an in-memory file-like object from the xml_data string. This allows the parser.parse method to read the XML data as if it were reading from a file.

$ python main.py Book Id: 1 Title: The Great Gatsby Author: F. Scott Fitzgerald Year: 1925 Book Id: 2 Title: 1984 Author: George Orwell Year: 1949

Handling Attributes

The following example demonstrates how to handle attributes in XML elements using SAX.

main.py

import xml.sax from io import StringIO

import xml.sax

class MyHandler(xml.sax.ContentHandler): def init(self): self.current_element = ""

# Called when an element starts
def startElement(self, tag, attributes):
    self.current_element = tag
    if tag == "book":
        print("Book Id:", attributes["id"])
        print("Category:", attributes["category"])

# Called when an element ends
def endElement(self, tag):
    pass

# Called when character data is found
def characters(self, content):
    pass

XML data

xml_data = """ <catalog> <book id=“1” category=“fiction”> <title>The Great Gatsby</title> <author>F. Scott Fitzgerald</author> <year>1925</year> </book> <book id=“2” category=“dystopian”> <title>1984</title> <author>George Orwell</author> <year>1949</year> </book> <book id=“3” category=“fiction”> <title>War and Peace</title> <author>Leo Tolstoy</author> <year>1869</year> </book> </catalog> """

Create a SAX parser

parser = xml.sax.make_parser() handler = MyHandler() parser.setContentHandler(handler)

Parse the XML data

parser.parse(StringIO(xml_data))

In this program, the startElement method is used to handle attributes of the book element, such as id and category.

$ python main.py Book Id: 1 Category: fiction Book Id: 2 Category: dystopian Book Id: 3 Category: fiction

Parsing XML Files

The following example demonstrates how to parse an XML file using SAX. This approach is memory-efficient because it processes the file sequentially without loading it entirely into memory.

products.xml

<products> <product> <id>1</id> <name>Product 1</name> <price>10.99</price> <quantity>30</quantity> </product> <product> <id>2</id> <name>Product 2</name> <price>20.99</price> <quantity>130</quantity> </product> <product> <id>4</id> <name>Product 4</name> <price>24.59</price> <quantity>350</quantity> </product> <product> <id>5</id> <name>Product 5</name> <price>9.9</price> <quantity>650</quantity> </product> <product> <id>6</id> <name>Product 6</name> <price>45</price> <quantity>290</quantity> </product> </products>

This is the file.

main.py

from xml.sax import make_parser, ContentHandler

class ProductHandler(ContentHandler): def init(self): self.current_data = "" self.product = {}

def startElement(self, name, attrs): self.current_data = "" if name == “product”: self.product = {}

def characters(self, content): self.current_data += content.strip()

def endElement(self, name): if name != “product”: self.product[name] = self.current_data else: print(f"Id: {self.product[‘id’]}, Name: {self.product[’name’]}")

parser = make_parser() parser.setContentHandler(ProductHandler()) parser.parse(“products.xml”)

In this program, the parser.parse method is used to parse a XML file named products.xml. The SAX parser processes the file sequentially, making it suitable for large files.

$ python main.py Id: 1, Name: Product 1 Id: 2, Name: Product 2 Id: 4, Name: Product 4 Id: 5, Name: Product 5 Id: 6, Name: Product 6

Source

Python SAX - Documentation

In this article, we have shown how to use the SAX API in Python for event-driven XML parsing. The SAX approach is memory-efficient and suitable for large XML files.

Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have authored over 1,400 articles and 8 e-books. I possess more than ten years of experience in teaching programming.

List all Python tutorials.

ad ad