Polars String Operations

Python tutorial on Polars, covering string operations for data manipulation with practical examples.

polars

Aug 29, 2025

Polars String Operations

last modified March 1, 2025

Polars is a fast DataFrame library in Rust with Python bindings. It is designed for efficient data manipulation and analysis. String operations are essential for cleaning and transforming text data in DataFrames. This tutorial covers common string operations in Polars with practical examples.

String operations include tasks like substring extraction, case conversion, and pattern matching. Polars provides a rich set of string methods for these tasks, making it a powerful tool for text data processing.

Convert to Lowercase

This example shows how to convert a column of strings to lowercase.

lowercase.py

import polars as pl

df = pl.DataFrame({ “text”: [“Hello”, “WORLD”, “Polars”, “Tutorial”] })

df = df.with_column(pl.col(“text”).str.to_lowercase().alias(“lowercase_text”)) print(df)

The str.to_lowercase method converts all strings in the ’text' column to lowercase. This is useful for standardizing text data.

Extract Substrings

This example demonstrates how to extract substrings from a column.

substring.py

import polars as pl

df = pl.DataFrame({ “text”: [“Hello World”, “Polars Tutorial”, “Data Science”] })

df = df.with_column(pl.col(“text”).str.slice(0, 5).alias(“substring”)) print(df)

The str.slice(0, 5) method extracts the first 5 characters from each string in the ’text’ column. This is useful for extracting fixed-length data.

Replace Substrings

This example shows how to replace substrings in a column.

replace.py

import polars as pl

df = pl.DataFrame({ “text”: [“Hello World”, “Polars Tutorial”, “Data Science”] })

df = df.with_column(pl.col(“text”).str.replace(“World”, “Universe”).alias(“replaced_text”)) print(df)

The str.replace(“World”, “Universe”) method replaces ‘World’ with ‘Universe’ in the ’text’ column. This is useful for correcting or updating text.

Split Strings

This example demonstrates how to split strings into lists based on a delimiter.

split.py

import polars as pl

df = pl.DataFrame({ “text”: [“Hello,World”, “Polars,Tutorial”, “Data,Science”] })

df = df.with_column(pl.col(“text”).str.split(",").alias(“split_text”)) print(df)

The str.split(",") method splits each string in the ’text’ column by commas. This is useful for parsing CSV-like data.

Concatenate Strings

This example shows how to concatenate strings from multiple columns.

concat.py

import polars as pl

df = pl.DataFrame({ “first_name”: [“John”, “Jane”, “Alice”], “last_name”: [“Doe”, “Smith”, “Johnson”] })

df = df.with_column((pl.col(“first_name”) + " " + pl.col(“last_name”)).alias(“full_name”)) print(df)

The + operator concatenates ‘first_name’ and ’last_name’ columns with a space in between. This is useful for creating full names or combined text.

Check for Substrings

This example demonstrates how to check if a substring exists in a column.

contains.py

import polars as pl

df = pl.DataFrame({ “text”: [“Hello World”, “Polars Tutorial”, “Data Science”] })

df = df.with_column(pl.col(“text”).str.contains(“World”).alias(“contains_world”)) print(df)

The str.contains(“World”) method checks if ‘World’ exists in each string of the ’text’ column. This is useful for filtering or flagging data.

Trim Whitespace

This example shows how to trim leading and trailing whitespace from strings.

trim.py

import polars as pl

df = pl.DataFrame({ “text”: [" Hello “, " Polars “, " Data Science “] })

df = df.with_column(pl.col(“text”).str.strip().alias(“trimmed_text”)) print(df)

The str.strip method removes leading and trailing whitespace from each string in the ’text’ column. This is useful for cleaning messy data.

Regular Expression Matching

This example demonstrates how to use regular expressions to extract patterns.

regex.py

import polars as pl

df = pl.DataFrame({ “text”: [“Hello123”, “Polars456”, “Data789”] })

df = df.with_column(pl.col(“text”).str.extract(r”\d+”).alias(“extracted_numbers”)) print(df)

The str.extract(r”\d+") method extracts numeric sequences from each string in the ’text’ column. This is useful for pattern-based extraction.

Best Practices for String Operations

Standardize Text: Use lowercase or uppercase for consistency.
Handle Missing Data: Check for null values before operations.
Use Regular Expressions Wisely: Test regex patterns thoroughly.
Optimize Performance: Use vectorized operations for large datasets.

Source

Polars Documentation

In this article, we have explored how to perform string operations in Polars.

Author

My name is Jan Bodnar, and I am a passionate programmer with extensive programming experience. I have been writing programming articles since 2007. To date, I have authored over 1,400 articles and 8 e-books. I possess more than ten years of experience in teaching programming.

List all Polars tutorials.