As a data analyst or data scientist, you have probably needed to iterate over the rows of a Pandas DataFrame. In this article, we will discuss how to iterate over the rows of a Pandas DataFrame and some best practices for doing so.
Loading a Sample Pandas Dataframe
Copy the code below if you’d like to follow along with an example DataFrame. The DataFrame is deliberately small so that it can be printed in full. Iterating over a DataFrame of this size will not cause any noticeable speed issues, but the cost of row-by-row iteration becomes more apparent as your dataset grows.
import pandas as pd
# create a dictionary of sample data
data = {'name': ['Alice', 'Bob', 'Charlie'],
        'age': [25, 30, 35],
        'country': ['USA', 'Canada', 'Australia']}
# create a Pandas DataFrame from the dictionary
df = pd.DataFrame(data)
# print the DataFrame
print(df)
In this example, we create a dictionary of sample data with three columns: name, age, and country. We then create a Pandas DataFrame from the dictionary using the pd.DataFrame() constructor. Finally, we print the DataFrame to verify that it was created correctly.
How to Vectorize Instead of Iterating Over Rows
To vectorize operations in Pandas instead of iterating over rows, you can use built-in functions and methods that operate on entire columns or subsets of data. Here are some examples:
import pandas as pd
# create a dictionary of sample data
data = {'name': ['Alice', 'Bob', 'Charlie'],
        'age': [25, 30, 35],
        'country': ['USA', 'Canada', 'Australia']}
# create a Pandas DataFrame from the dictionary
df = pd.DataFrame(data)
# vectorize operations instead of iterating over rows
df['age_squared'] = df['age'] ** 2
df['is_adult'] = df['age'] >= 18
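# numeric_only=True skips the text column 'name', which recent pandas versions refuse to average
grouped_df = df.groupby('country').mean(numeric_only=True)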
# print the DataFrame
print(df)
print(grouped_df)
Output:
#       name  age    country  age_squared  is_adult
# 0    Alice   25        USA          625      True
# 1      Bob   30     Canada          900      True
# 2  Charlie   35  Australia         1225      True
#             age  age_squared  is_adult
# country
# Australia  35.0       1225.0       1.0
# Canada     30.0        900.0       1.0
# USA        25.0        625.0       1.0
In this example, we first create a DataFrame from a dictionary of sample data. Instead of iterating over rows to apply a function, we use vectorization to compute the square of all values in the age column and store the result in a new column called age_squared. We also create a new column called is_adult that checks whether the value in the age column is greater than or equal to 18. We then use the groupby() method to group the data by the values in the country column and compute the mean of the numeric columns in each group. Finally, we print the original DataFrame and the grouped DataFrame to verify that the operations were performed correctly.
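To make the contrast with a row-by-row loop concrete, the sketch below builds the same column twice: once with an explicit Python loop and once with NumPy's np.where. The age_group labels and the cutoff of 30 are assumptions made up for illustration; they are not part of the example above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'],
                   'age': [25, 30, 35],
                   'country': ['USA', 'Canada', 'Australia']})

# Loop version: one Python-level comparison per row
labels = []
for age in df['age']:
    labels.append('30+' if age >= 30 else 'under 30')
df['age_group_loop'] = labels

# Vectorized version: a single call evaluated over the whole column
df['age_group_vec'] = np.where(df['age'] >= 30, '30+', 'under 30')

# Check that both approaches produced the same labels
same_labels = bool((df['age_group_loop'] == df['age_group_vec']).all())
print(same_labels)
```

Both columns hold the same values; the vectorized form is simply shorter and pushes the work into NumPy instead of the Python interpreter.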
How to Use Pandas iterrows to Iterate over Dataframe Rows
The Pandas .iterrows() method iterates over the rows of a Pandas DataFrame. It returns a generator that yields one tuple per row, and each tuple contains the row's index label followed by the row's values as a Series. Keep in mind that .iterrows() does not preserve data types across a row: because each row is returned as a single Series, values from columns of different types may be converted to a common type. If you need to keep data types, see the next section on .itertuples().
Let’s see the .iterrows() method in action:
import pandas as pd
# create a sample DataFrame
df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'],
                   'age': [25, 30, 35],
                   'country': ['USA', 'Canada', 'Australia']})
# iterate over rows using iterrows()
for index, row in df.iterrows():
    print(f"Name: {row['name']}, Age: {row['age']}, Country: {row['country']}")
Output:
# Name: Alice, Age: 25, Country: USA
# Name: Bob, Age: 30, Country: Canada
# Name: Charlie, Age: 35, Country: Australia
In this example, we first create a sample DataFrame with three columns: name, age, and country. We then iterate over each row of the DataFrame using the iterrows() method, which yields a tuple containing the index of the row and a Series object representing the row itself. We then access the values in each row by column label, which is specified as a string. In this case, we print the name, age, and country for each row using f-strings to format the output.
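The point about data types is easier to see on a purely numeric DataFrame. The sketch below uses a hypothetical orders frame (the quantity and price columns are invented for this illustration) and compares the type of the same value as seen through iterrows() and through itertuples().

```python
import pandas as pd

# Hypothetical frame mixing an integer column and a float column
orders = pd.DataFrame({'quantity': [1, 2], 'price': [1.5, 2.5]})

# iterrows() packs each row into one Series, so the integers are upcast to float
_, first_row = next(orders.iterrows())
print(orders['quantity'].dtype)      # dtype of the column itself
print(first_row.dtype)               # dtype of the row Series
print(type(first_row['quantity']))   # a float type, not an integer

# itertuples() takes each value from its own column, so quantity stays an integer
first_tuple = next(orders.itertuples(index=False))
print(type(first_tuple.quantity))
```

With the name, age, and country sample data the effect is less visible, because a row that mixes strings and numbers is stored as a generic object Series.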
How to Use Pandas itertuples to Iterate over Dataframe Rows
The itertuples() method returns an iterator yielding a namedtuple for each row in the DataFrame. Each namedtuple contains the index of the row as the first element, followed by the values of each column in the row.
Here’s an example:
import pandas as pd
# create a dictionary of sample data
data = {'name': ['Alice', 'Bob', 'Charlie'],
        'age': [25, 30, 35],
        'country': ['USA', 'Canada', 'Australia']}
# create a Pandas DataFrame from the dictionary
df = pd.DataFrame(data)
# iterate over the rows of the DataFrame using itertuples()
for row in df.itertuples():
    print(row.Index, row.name, row.age, row.country)
Output:
# 0 Alice 25 USA
# 1 Bob 30 Canada
# 2 Charlie 35 Australia
Note that row.Index refers to the index of the row, while row.name, row.age, and row.country refer to the values in the “name”, “age”, and “country” columns, respectively, for that row.
To print only the values of the “country” column of the DataFrame, you can modify the previous code so that it prints only row.country inside the loop.
Here’s an example:
import pandas as pd
# create a dictionary of sample data
data = {'name': ['Alice', 'Bob', 'Charlie'],
        'age': [25, 30, 35],
        'country': ['USA', 'Canada', 'Australia']}
# create a Pandas DataFrame from the dictionary
df = pd.DataFrame(data)
# iterate over the rows of the DataFrame using itertuples()
for row in df.itertuples():
    print(row.country)
Output:
# USA
# Canada
# Australia
This code demonstrates how you can use itertuples() to iterate over the rows of a DataFrame and access the values in a specific column.
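itertuples() also accepts two optional parameters, index and name. As a small sketch, passing index=False leaves out the row index and name=None returns plain tuples instead of namedtuples, which lets you unpack each row directly in the for statement. The sentence printed inside the loop is just an illustrative format.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'],
                   'age': [25, 30, 35],
                   'country': ['USA', 'Canada', 'Australia']})

# index=False drops the row index; name=None yields plain tuples
for name, age, country in df.itertuples(index=False, name=None):
    print(f"{name} ({age}) lives in {country}")

# The same iterator can be materialized as a list of tuples
rows = list(df.itertuples(index=False, name=None))
print(rows[0])
```

Unpacking works only when the number of loop variables matches the number of columns, so this style suits narrow DataFrames best.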
How to Use Pandas items to Iterate over Dataframe Columns
The Pandas .items() method iterates over a DataFrame column by column rather than row by row. It returns a generator that yields one (column label, Series) tuple per column, where the Series holds every value in that column. To reach individual row values with .items(), you therefore loop over the columns first and then over the values inside the column you need.
Here’s an example:
import pandas as pd
# create a dictionary of sample data
data = {'name': ['Alice', 'Bob', 'Charlie'],
        'age': [25, 30, 35],
        'country': ['USA', 'Canada', 'Australia']}
# create a Pandas DataFrame from the dictionary
df = pd.DataFrame(data)
# iterate over the columns of the DataFrame using items()
for column_name, column in df.items():
    # keep only the 'country' column, then print each of its values
    if column_name == 'country':
        for value in column:
            print(value)
Output:
# USA
# Canada
# Australia
This code uses a for loop that iterates over the columns of the DataFrame using items(). The items() method yields a tuple for each column, which contains the column label and a Series object holding that column’s data. The loop skips every column except “country” and prints each value in it. Note that column['country'] would raise a KeyError here, because each Series is indexed by row label (0, 1, 2) rather than by column name.
In this case, column_name refers to the label of the column being iterated over, and it is used only to pick out the “country” column.
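Since items() hands back one whole column at a time, it tends to fit per-column summaries better than row access. The sketch below is one such use, counting the distinct values in each column; the unique_counts name is an assumption chosen for this example.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alice', 'Bob', 'Charlie'],
                   'age': [25, 30, 35],
                   'country': ['USA', 'Canada', 'Australia']})

# items() yields one (column label, Series) pair per column
unique_counts = {}
for label, column in df.items():
    # nunique() counts the distinct values in this column
    unique_counts[label] = column.nunique()

print(unique_counts)
```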
How to Use a For Loop to Iterate over Dataframe Rows
To get the same result with a plain for loop, we can loop over the row positions of the DataFrame and use the Pandas .iloc accessor to access each row by its integer position.
Here’s an example:
import pandas as pd
# create a dictionary of sample data
data = {'name': ['Alice', 'Bob', 'Charlie'],
        'age': [25, 30, 35],
        'country': ['USA', 'Canada', 'Australia']}
# create a Pandas DataFrame from the dictionary
df = pd.DataFrame(data)
# iterate over the rows of the dataframe using .iloc
for i in range(len(df)):
    print(df.iloc[i]['name'], df.iloc[i]['age'], df.iloc[i]['country'])
Output:
# Alice 25 USA
# Bob 30 Canada
# Charlie 35 Australia
This code imports the pandas library, creates a dictionary called data with sample data, creates a pandas dataframe from the dictionary, and then uses a for loop with .iloc to iterate over the rows of the dataframe. Within the loop, we print out the values for the name, age, and country columns for each row, using .iloc to access the row by its integer position. The range(len(df)) call generates a range object from 0 up to, but not including, the number of rows in the dataframe, which is then used to iterate over the dataframe. We can access a column’s value by the column name as df.iloc[i]['column_name'].
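To get a feel for how these approaches scale, the sketch below times the same calculation (the sum of squared ages) done with an .iloc loop, with itertuples(), and with a vectorized expression. The 10,000-row frame and the helper function names are assumptions made up for this comparison, and the timings will vary from machine to machine, so treat them as indicative only.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical larger frame; 10,000 rows is an arbitrary size chosen for the sketch
big = pd.DataFrame({'age': np.arange(10_000) % 90})

def total_iloc(frame):
    # positional lookup of every row, one at a time
    total = 0
    for i in range(len(frame)):
        total += frame.iloc[i]['age'] ** 2
    return total

def total_itertuples(frame):
    # one lightweight namedtuple per row
    return sum(row.age ** 2 for row in frame.itertuples(index=False))

def total_vectorized(frame):
    # whole-column arithmetic, no Python-level loop
    return (frame['age'] ** 2).sum()

results = {}
timings = {}
for label, func in [('iloc', total_iloc),
                    ('itertuples', total_itertuples),
                    ('vectorized', total_vectorized)]:
    start = time.perf_counter()
    results[label] = func(big)
    timings[label] = time.perf_counter() - start
    print(f"{label}: {timings[label]:.4f} s")
```

All three functions return the same total. On most machines the vectorized version should finish first and the .iloc loop last, which is why iteration is best kept for cases that cannot be expressed as column operations.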
Wrap up
To learn more about the Pandas .iterrows() method, check out the official documentation:
https://pandas.pydata.org/docs
Thanks for reading. Happy coding!