In this guide, we’ll explore the basics of CSV files, how to read them with Pandas, and some useful tips and tricks to make your data analysis smoother and more efficient. CSV (Comma-Separated Values) files are a common format for storing and exchanging data, especially in the data science and machine learning fields. Pandas, a Python library for data manipulation and analysis, makes it easy to read, manipulate and analyze CSV files.

What is a CSV file?

A CSV file is a plain-text document that holds tabular data: each row is a record and each column is a field of that record. CSV files are a popular format for storing and exchanging data between applications and systems because they are simple to read and write. Plain text editors and spreadsheet programs such as Microsoft Excel, Google Sheets, and LibreOffice Calc can all open and edit CSV files.

This tutorial explains several ways to read CSV files into Python using the following CSV file named example.csv:

				
Name,Income,Gender
Alice,20000,female
Bob,80000,male
Charlie,70000,male
David,150000,male
Sarah,40000,female

				
			
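If you want to follow along, the file can be created with a few lines of Python. This is only a convenience sketch: it writes the listing above to example.csv in the current working directory, which is an assumption about where you run the examples.

```python
from pathlib import Path

# Contents of example.csv, copied from the listing above
csv_text = """Name,Income,Gender
Alice,20000,female
Bob,80000,male
Charlie,70000,male
David,150000,male
Sarah,40000,female
"""

# Write the file into the current working directory
Path("example.csv").write_text(csv_text, encoding="utf-8")
```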

Example 1: Read CSV File into pandas DataFrame

To read a CSV file into a pandas DataFrame, you can use the read_csv() function from the pandas library. Here’s an example of how to use it:

				
import pandas as pd

# Read the CSV file into a pandas DataFrame
df = pd.read_csv('example.csv')

# Display the first few rows of the DataFrame
print(df.head())
				
			

Output:

				
#      Name  Income  Gender
#0    Alice   20000  female
#1      Bob   80000    male
#2  Charlie   70000    male
#3    David  150000    male
#4    Sarah   40000  female
				
			
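After loading, it can help to confirm that pandas parsed the file as expected. The following is a minimal sketch that uses io.StringIO as a stand-in for example.csv, so the snippet runs without a file on disk; the variable names are illustrative.

```python
import io

import pandas as pd

# In-memory copy of example.csv (stand-in for the file on disk)
csv_text = """Name,Income,Gender
Alice,20000,female
Bob,80000,male
Charlie,70000,male
David,150000,male
Sarah,40000,female
"""

df = pd.read_csv(io.StringIO(csv_text))

# Number of rows and columns that were read
print(df.shape)

# Column types inferred by pandas; Income should be numeric
print(df.dtypes)

# A numeric column can be used in calculations right away
mean_income = df["Income"].mean()
print(mean_income)
```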

Example 2: Read Specific Columns from CSV File

To read specific columns from a CSV file into a pandas DataFrame, you can use the usecols parameter of the read_csv() function.

Here’s an example:

				
import pandas as pd

# Read only specific columns from the CSV file
df = pd.read_csv('example.csv', usecols=['Name', 'Gender'])

# Display the first few rows of the DataFrame
print(df.head())

				
			

Output:

				
#      Name  Gender
#0    Alice  female
#1      Bob    male
#2  Charlie    male
#3    David    male
#4    Sarah  female
				
			

As a second option, you can pass zero-based column indices instead of column names:

				
import pandas as pd

# Read only the first and third columns (indices 0 and 2) from the CSV file
df = pd.read_csv('example.csv', usecols=[0,2])

# Display the first few rows of the DataFrame
print(df.head())

				
			
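Both forms of usecols select the same data. The sketch below checks this on an in-memory copy of example.csv (io.StringIO is used here only so the snippet is self-contained). Note that usecols does not reorder columns: the result follows the column order of the file.

```python
import io

import pandas as pd

# In-memory copy of example.csv (stand-in for the file on disk)
csv_text = """Name,Income,Gender
Alice,20000,female
Bob,80000,male
Charlie,70000,male
David,150000,male
Sarah,40000,female
"""

# Select columns by name
by_name = pd.read_csv(io.StringIO(csv_text), usecols=["Name", "Gender"])

# Select the same columns by zero-based position
by_index = pd.read_csv(io.StringIO(csv_text), usecols=[0, 2])

# Both DataFrames hold the same columns and values
print(by_name.equals(by_index))
```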

Example 3: Specify Header Row when Importing CSV File

To specify the header row when importing a CSV file using Pandas, you can use the header parameter of the read_csv() function. This parameter allows you to specify which row of the CSV file should be used as the header row. By default, Pandas assumes that the first row of the CSV file contains the column names.

Here’s an example:

				
import pandas as pd

# read CSV file with header row at index 0
df = pd.read_csv('example.csv', header=0)

# Display the first few rows of the DataFrame
print(df.head())

				
			

Output:

				
					#      Name  Income  Gender
#0    Alice   20000  female
#1      Bob   80000    male
#2  Charlie   70000    male
#3    David  150000    male
#4    Sarah   40000  female
				
			

In this example, the header parameter is set to 0, which tells Pandas to use the first row of the CSV file as the header row. If the column names are on a different line, pass that line's zero-based index instead; Pandas discards the lines above it.
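A header row that is not on the first line can be sketched as follows. The sample text, including its title line, is made up for illustration and is read from memory with io.StringIO rather than from a file.

```python
import io

import pandas as pd

# Hypothetical file whose first line is a title, not the column names
csv_text = """Exported report
Name,Income,Gender
Alice,20000,female
Bob,80000,male
"""

# header=1: the line at index 1 holds the column names,
# and the title line above it is dropped
df = pd.read_csv(io.StringIO(csv_text), header=1)

print(df)
```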

If your CSV file doesn’t have a header row, you can set the header parameter to None and then use the names parameter to specify the column names:

				
import pandas as pd

# read CSV file without header row and supply the column names
df = pd.read_csv('example.csv', header=None, names=['Name', 'Income', 'Gender'])

# Display the first few rows of the DataFrame
print(df.head())

				
			

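In this example, the header parameter is set to None to indicate that the CSV file doesn’t have a header row. The names parameter is then used to specify the column names as a list of strings. Use this only on files that start directly with data: example.csv as shown above does contain a header line, so reading it with header=None would load that line as the first data row.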
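Here is a small sketch of the headerless case. It assumes a file that starts directly with data, simulated with io.StringIO, and shows what the columns look like with and without the names parameter.

```python
import io

import pandas as pd

# Hypothetical headerless file: the first line is already data
csv_text = """Alice,20000,female
Bob,80000,male
"""

# Without names, pandas labels the columns 0, 1, 2
default_df = pd.read_csv(io.StringIO(csv_text), header=None)
print(list(default_df.columns))

# With names, the supplied labels are used instead
named_df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["Name", "Income", "Gender"],
)
print(named_df)
```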

Example 4: Skip Rows when Importing CSV File

To skip rows when importing a CSV file using Pandas, you can use the skiprows parameter of the read_csv() function. When given a list, it skips the lines with those zero-based line numbers, where the header line counts as line 0.

Here’s an example:

				
import pandas as pd

# read CSV and skip the line at index 1 (the first data row, Alice)
df = pd.read_csv('example.csv', skiprows=[1])

# Display the first few rows of the DataFrame
print(df.head())

				
			

Output:

				
					#      Name  Income  Gender
#0      Bob   80000    male
#1  Charlie   70000    male
#2    David  150000    male
#3    Sarah   40000  female
				
			

The following code shows how to skip the second and third lines (indices 1 and 2) when importing the CSV file:

				
import pandas as pd

# read CSV and skip the lines at index 1 and 2 (Alice and Bob)
df = pd.read_csv('example.csv', skiprows=[1,2])

# Display the first few rows of the DataFrame
print(df.head())

				
			

Output:

				
					#      Name  Income  Gender
#0  Charlie   70000    male
#1    David  150000    male
#2    Sarah   40000  female
				
			
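skiprows also accepts a single integer, in which case that many lines are skipped from the start of the file. This is useful when a file begins with a few lines of preamble before the header. A minimal sketch, with made-up preamble lines read from memory via io.StringIO:

```python
import io

import pandas as pd

# Hypothetical file with two preamble lines before the header
csv_text = """Generated by a hypothetical export tool
Report date: unknown
Name,Income,Gender
Alice,20000,female
Bob,80000,male
"""

# skiprows=2 drops the first two lines; the next line becomes the header
df = pd.read_csv(io.StringIO(csv_text), skiprows=2)

print(df)
```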

Skip rows based on a condition

The skiprows parameter also accepts a callable. Pandas calls it with each line's zero-based index (an integer, with the header line at index 0), not with the line's text, and skips every line for which it returns True. For example, you can keep the header and skip every even-numbered line after it:

				
import pandas as pd

# skiprows passes the line index (an int), not the line's contents
def should_skip(index):
    # keep the header (index 0); skip lines 2, 4, ...
    return index > 0 and index % 2 == 0

# read the CSV file and skip the lines selected by should_skip()
df = pd.read_csv('example.csv', skiprows=should_skip)

# display the DataFrame
print(df.head())
				
			

In this example, should_skip() returns True for the lines at index 2 and 4, so Bob and David are skipped and only Alice, Charlie, and Sarah are imported. Because the callable never sees a line's contents, calling a string method such as startswith() on its argument raises an error. To drop lines based on their contents, such as comment lines beginning with a # symbol, use the comment parameter of read_csv() (comment='#') rather than skiprows.
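For comment lines specifically, read_csv() also has a comment parameter: a line that begins with the given character is ignored entirely. A minimal sketch, assuming comments start with # and using made-up sample text read from memory via io.StringIO:

```python
import io

import pandas as pd

# Hypothetical file with comment lines mixed in after the header
csv_text = """Name,Income,Gender
# the values below are sample data
Alice,20000,female
# another comment line
Bob,80000,male
"""

# comment='#' tells pandas to ignore lines that start with '#'
df = pd.read_csv(io.StringIO(csv_text), comment="#")

print(df)
```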

Example 5: Read CSV Files with Custom Delimiter

Sometimes you may have a CSV file with a delimiter other than a comma.

To read such a file with Pandas, use the sep parameter of the read_csv() function or its alias, delimiter. Both do the same thing; by default, Pandas assumes that the file is comma-separated.

Here’s an example of how to read a CSV file with a tab delimiter using the delimiter parameter:

				
import pandas as pd

# read CSV file with tab delimiter
df = pd.read_csv('my_data.csv', delimiter='\t')

# display the DataFrame
print(df.head())

				
			

In this example, the delimiter parameter is set to \t, which tells Pandas to use a tab character as the delimiter.

Alternatively, you can use the sep parameter to specify the delimiter:

				
import pandas as pd

# read CSV file with pipe delimiter
df = pd.read_csv('my_data.csv', sep='|')

# display the DataFrame
print(df.head())

				
			

In this example, the sep parameter is set to |, which tells Pandas to use a pipe character as the delimiter.
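Since my_data.csv is not shown in this tutorial, here is a self-contained sketch with made-up pipe-separated text, read from memory via io.StringIO. It also checks that sep and delimiter give the same result.

```python
import io

import pandas as pd

# Hypothetical pipe-separated data (stand-in for my_data.csv)
pipe_text = """Name|Income|Gender
Alice|20000|female
Bob|80000|male
"""

# sep and delimiter are two names for the same option
with_sep = pd.read_csv(io.StringIO(pipe_text), sep="|")
with_delimiter = pd.read_csv(io.StringIO(pipe_text), delimiter="|")

print(with_sep)
print(with_sep.equals(with_delimiter))
```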

You can also specify a regular expression pattern as the delimiter using the sep parameter. For example, if your CSV file uses a delimiter that consists of multiple spaces, you can specify a regular expression pattern to match the delimiter:

				
import pandas as pd

# read CSV file with multiple-space delimiter
# (raw string, so Python does not treat \s as an escape sequence)
df = pd.read_csv('my_data.csv', sep=r'\s+')

# display the DataFrame
print(df.head())

				
			

In this example, the sep parameter is set to \s+, which is a regular expression pattern that matches one or more whitespace characters. This allows Pandas to correctly parse the CSV file even if the delimiter consists of multiple spaces.
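The whitespace pattern can be tried on a small in-memory sample. The text below is made up for illustration, with columns padded by a varying number of spaces; the pattern is written as a raw string so that Python passes the backslash through unchanged.

```python
import io

import pandas as pd

# Hypothetical data with columns separated by runs of spaces
space_text = """Name    Income  Gender
Alice   20000   female
Bob     80000   male
"""

# r"\s+" matches one or more whitespace characters between fields
df = pd.read_csv(io.StringIO(space_text), sep=r"\s+")

print(df)
```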

Wrap up


Thanks for reading. Happy coding!