Pandas

Pandas tutorial shows how to do basic data analysis in Python with Pandas library. Pandas is a library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

python

Aug 29, 2025

Pandas

last modified January 29, 2024

In this article we show how to do basic data analysis in Python with Pandas library. The code examples and the data are available at the author’s Github repository.

Pandas

Pandas is an open source, BSD-licensed library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language.

The name of the library comes from the term “panel data”, which is an econometrics term for data sets that include observations over multiple time periods for the same individuals.

It offers data structures and operations for manipulating numerical tables and time series. The main two data types are: Series and DataFrame.

DataFrame is a two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). It is a spreadsheet-like data structure. Series is a single column of the DataFrame. A DataFrame can be thought of as a dictionary of Series objects.

Python Pandas installation

Pandas is installed with the following command:

$ pip install pandas

We use the pip3 command to install pandas module.

$ pip install numpy

Some examples also use numpy.

Pandas simple example

The following is a simple Pandas example.

simple.py

#!/usr/bin/python

import pandas as pd

data = [[‘Alex’, 10], [‘Ronald’, 18], [‘Jane’, 33]] df = pd.DataFrame(data, columns=[‘Name’, ‘Age’])

print(df)

In the program, we create a simple DataFrame and print it to the console.

import pandas as pd

We import the Pandas library.

data = [[‘Alex’, 10], [‘Ronald’, 18], [‘Jane’, 33]]

This is the data to be displayed in the frame. Each nested list is a row in the table. Note that there are many ways how to initialize a Pandas DataFrame.

df = pd.DataFrame(data, columns=[‘Name’, ‘Age’])

A DataFrame is created from the data. We give the frame column names with columns property.

$ python simple.py Name Age 0 Alex 10 1 Ronald 18 2 Jane 33

This is the output. The first column are row indexes.

Pandas changing index

We can update the index so that it does not start from 0.

change_index.py

#!/usr/bin/python

import pandas as pd

data = [[‘Alex’, 10], [‘Ronald’, 18], [‘Jane’, 33]] df = pd.DataFrame(data, columns=[‘Name’, ‘Age’]) df.index = df.index + 1

print(df)

In the example, we add 1 to the index.

$ python change_index.py Name Age 1 Alex 10 2 Ronald 18 3 Jane 33

Pandas scalar series

The following example creates a series of a scalar value.

series_scalar.py

#!/usr/bin/python

import pandas as pd

s = pd.Series(5, index=[0, 1, 2, 3]) print(s)

We have a column containing fives.

$ python series_scalar.py 0 5 1 5 2 5 3 5 dtype: int64

The left column is the index.

Pandas series ndarray

We can create a series object from a numpy ndarray .

series_numpy.py

#!/usr/bin/python

import pandas as pd import numpy as np

data = np.array([‘a’, ‘b’, ‘c’, ’d’]) s = pd.Series(data)

print(s)

The example creates a column of letters from an ndarray.

$ python series_numpy.py 0 a 1 b 2 c 3 d dtype: object

Pandas series dict

A series can be created from a Python dictionary.

series_dict.py

#!/usr/bin/python

import pandas as pd import numpy as np

data = {‘coins’ : 22, ‘pens’ : 3, ‘books’ : 28} s = pd.Series(data)

print(s)

The example creates a series object from a dicionary of items.

$ python series_dict.py coins 22 pens 3 books 28 dtype: int64

The index consits of the names of the items.

Pandas series retrieve

The following example retrieves values form a series object.

series_retrieve.py

#!/usr/bin/python

import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=[‘a’, ‘b’, ‘c’, ’d’, ’e’])

print(s[0]) print(’———————–’)

print(s[1:4]) print(’———————–’)

print(s[[‘a’,‘c’,’d’]])

The example retrieves values from a series object.

print(s[0])

Here we get a single value.

print(s[1:4])

We retrieve rows by their indexes.

print(s[[‘a’,‘c’,’d’]])

Here we get the values by the index labels.

$ python series_retrieve.py 1

b 2 c 3 d 4 dtype: int64

a 1 c 3 d 4 dtype: int64

Pandas custom index

The index column does not have to be numerical. We can create our own custom index.

custom_index.py

#!/usr/bin/python

import pandas as pd

data = {“country”: [“Brazil”, “Russia”, “India”, “China”, “South Africa”], “capital”: [“Brasilia”, “Moscow”, “New Dehli”, “Beijing”, “Pretoria”], “area”: [8.516, 17.10, 3.286, 9.597, 1.221], “population”: [200.4, 143.5, 1252, 1357, 52.98]}

frame = pd.DataFrame(data) print(frame)

print(’——————————’)

frame.index = [“BR”, “RU”, “IN”, “CH”, “SA”] print(frame)

In the example, we create a data frame from a data dictionary. We print the data frame and then we change the index column with index property.

$ python custom_index.py country capital area population 0 Brazil Brasilia 8.516 200.40 1 Russia Moscow 17.100 143.50 2 India New Dehli 3.286 1252.00 3 China Beijing 9.597 1357.00 4 South Africa Pretoria 1.221 52.98

     country    capital    area  population

BR Brazil Brasilia 8.516 200.40 RU Russia Moscow 17.100 143.50 IN India New Dehli 3.286 1252.00 CH China Beijing 9.597 1357.00 SA South Africa Pretoria 1.221 52.98

Pandas index, columns & values

Pandas DataFrame has three basic parts: index, columns, and values.

index_vals_cols.py

#!/usr/bin/python

import pandas as pd

data = [[‘Alex’, 10], [‘Ronald’, 18], [‘Jane’, 33]] df = pd.DataFrame(data, columns=[‘Name’, ‘Age’])

print(f’Index: {df.index}’) print(f’Columns: {df.columns}’) print(f’Values: {df.values}’)

The example prints the index, columns, and values of a data frame.

$ python index_vals_cols.py Index: RangeIndex(start=0, stop=3, step=1) Columns: Index([‘Name’, ‘Age’], dtype=‘object’) Values: [[‘Alex’ 10] [‘Ronald’ 18] [‘Jane’ 33]]

Pandas sum and max value

The following example calculates the sum and the maximum of values in a data frame column. It uses also numpy library.

sum_max.py

#!/usr/bin/python

import pandas as pd import numpy as np

df = pd.DataFrame(np.arange(0, 1200, 2), columns=[‘A’])

df.index = df.index + 1

print(sum(df[‘A’])) print(max(df[‘A’]))

print(df)

The example calculates the maximum and the sum of values. It uses numpy’s arange fuction to generate an array of values.

print(sum(df[‘A’]))

When we compute the sum value, we refer to the column by its name.

$ sum_max.py 359400 1198

Pandas read CSV

Pandas reads data from a CSV file with read_csv.

military_spending.csv

Pos, Country, Amount (Bn. $), GDP 1, United States, 610.0, 3.1 2, China, 228.0, 1.9 3, Saudi Arabia, 69.4, 10.0 4, Russia, 66.3, 4.3 5, India, 63.9, 2.5 6, France, 57.8, 2.3 7, United Kingdom, 47.2, 1.8 8, Japan, 45.4, 0.9 9, Germany, 44.3, 1.2 10, South Korea, 39.2, 2.6 11, Brazil, 29.3, 1.4 12, Italy Italy, 29.2, 1.5 13, Australia Australia, 27.5, 2.0 14, Canada Canada, 20.6, 1.3 15, Turkey Turkey, 18.2, 2.2

This is a simple CSV file containing data about military spending of countries.

Note: CSV files may have optional column names in the first row.

read_from_csv.py

#!/usr/bin/python

import pandas as pd

df = pd.read_csv(“military_spending.csv”)

print(df.to_string(index=False))

The example reads all data from the military_spending.csv file and prints it in tabular format to the console. It uses read_csv method.

print(df.to_string(index=False))

Since we have positions column, we hide the index from the output.

$ python read_from_csv.py Pos Country Amount (Bn. $) GDP 1 United States 610.0 3.1 2 China 228.0 1.9 3 Saudi Arabia 69.4 10.0 4 Russia 66.3 4.3 5 India 63.9 2.5 6 France 57.8 2.3 7 United Kingdom 47.2 1.8 8 Japan 45.4 0.9 9 Germany 44.3 1.2 10 South Korea 39.2 2.6 11 Brazil 29.3 1.4 12 Italy Italy 29.2 1.5 13 Australia Australia 27.5 2.0 14 Canada Canada 20.6 1.3 15 Turkey Turkey 18.2 2.2

Pandas write CSV

A DataFrame is written to a CSV file with to_csv.

write_csv.py

#!/usr/bin/python

import pandas as pd

data = [[‘Alex’, 10], [‘Ronald’, 18], [‘Jane’, 33]] df = pd.DataFrame(data, columns=[‘Name’, ‘Age’])

df.to_csv(“users.csv”, index=False)

The example writes data to the users.csv file.

Pandas random rows

Random rows from the data frame can be selected with sample.

random_sample.py

#!/usr/bin/python

import pandas as pd

df = pd.read_csv(“military_spending.csv”)

print(df.sample(3))

In the example, we print three random rows from the data frame.

Pandas to_dict function

The to_dict transforms a data frame to a Python dictionary. The dictionary can be shown in different data outputs.

data_orient.py

#!/usr/bin/python

import pandas as pd

data = [[‘Alex’, 10], [‘Ronald’, 18], [‘Jane’, 33]] df = pd.DataFrame(data, columns=[‘Name’, ‘Age’])

print(’list’) print(df.to_dict(orient=‘list’))