How to Read Excel File in Python

In this tutorial, we will explore how to read an Excel file with Python by utilizing different libraries and modules. Excel is a spreadsheet software created by Microsoft. It serves as a user-friendly tool for organizing, analyzing, and storing data within tables. Its widespread application can be found in numerous settings, including offices, educational institutions, and universities, to effectively manage data.

Reading from an Excel file

Initially, we must install the xlrd library by executing the command below:

Example

pip install xlrd

Creating a Workbook

An Excel file encompasses a complete workbook that contains all the data. You have the option to generate a new workbook from the ground up, or you can conveniently create a workbook based on an already existing Excel file.

Reading an Excel file using the Xlrd module

Note: The Xlrd module only supports .xls files. Earlier, it used to support .xlsx files too, but the support was terminated from version 2.0.0, although we have shown an example for a better understanding.

In a similar vein, there are libraries such as xlwt and xlutils that exclusively cater to .xls file formats, which are not pertinent at this time.

Example

#importing the xlrd module

import xlrd

# writing the path of the file

file = r"C:\Users\mk\Desktop\hello_logicpractice_tech.xls"

# Open workbook

wb = xlrd.open_workbook(file)

# Select first sheet

sheet = wb.sheet_by_index(0)

# Get value at row 0, column 1

cell_value = sheet.cell_value(0, 1)

#printing the value from the Excel sheet

print(cell_value)

Output:

Output

Example

Explanation

In the example provided above, we utilized the xlrd module to access the .xls file. We extracted and displayed the value found in the first row (0th index) and the second column (1st index) of the Excel document.

What if the XLRD module is used with the .xlsx extension

If you attempt to execute the XLRD module using the .xlsx file extension, you will encounter an error, as illustrated in the following section:

Output:

Output

XLRDError                                 Traceback (most recent call last)

Cell In[2], line 6

      4 file = r"C:\Users\mk\Desktop\hello_logicpractice_tech.xlsx"

      5 # Open workbook

----> 6 wb = xlrd.open_workbook(file)

      8 # Select first sheet

      9 sheet = wb.sheet_by_index(0)

File c:\Users\mk\AppData\Local\Programs\Python\Python313\Lib\site-packages\xlrd\_init_.py:170, in open_workbook(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows, ignore_workbook_corruption)

    167 # We have to let unknown file formats pass through here, as some ancient

    168 # files that xlrd can parse don't start with the expected signature.

    169 if file_format and file_format != 'xls':

--> 170     raise XLRDError(FILE_FORMAT_DESCRIPTIONS[file_format]+'; not supported')

    172 bk = open_workbook_xls(

    173     filename=filename,

    174     logfile=logfile,

   (...)    182     ignore_workbook_corruption=ignore_workbook_corruption,

    183 )

    185 return bk

XLRDError: Excel xlsx file; not supported

1. Reading from the Pandas

Pandas is characterized as a free and open-source library that operates on the foundation of the NumPy library. It facilitates rapid data analysis, data cleansing, and data preparation for users, while also accommodating both xls and xlsx file formats sourced from URLs.

We can also read the CSV file using Pandas.

This package for Python offers a valuable data structure known as a data frame.

Example: Reading the .xlsx file using Pandas

To illustrate the process of reading an .xlsx file utilizing Pandas, let’s consider an example.

Input File: hellologicpracticetech.xlsx

Code:

Example

import pandas as pd

# Specify the correct file path

file_path = r"C:\Users\mk\Desktop\hello_logicpractice_tech.xlsx"

logicpractice = pd.read_excel(file_path, engine='openpyxl')

print(logicpractice)

Output:

Explanation:

In the preceding example, the contents of the Excel file have been accessed utilizing the Pandas library. We have included the pandas library with the alias pd.

At this point, we will utilize the Pandas library to read the CSV file.

Example: Reading the CSV file using Pandas

Let’s consider an example to illustrate the process of reading a CSV file with the help of Pandas.

Example

import pandas as pd

# specify the path of the CSV file

file_path = r"C:\Users\mk\Desktop\hello_logicpractice_tech.csv"

# Read the file

data = pd.read_csv(file_path, low_memory=False)

# Output the number of rows

print("Total rows: {0}".format(len(data)))

# Print the list of column headers

print(list(data.columns))

Output:

Output

Total rows: 0

['Hello..', 'Example', 'Tech', 'Pvt Ltd']

Explanation:

In the preceding example, we accessed the data contained within the CSV file through the Pandas library. We initiated the Pandas library by importing it as pd. This illustrates that with the utilization of Pandas, we have the capability to read not only CSV files but also Excel files.

2. Reading from the openpyxl

Using openpyxl, it is possible to extract information from an already established spreadsheet. This library not only facilitates data retrieval but also enables users to execute computations and incorporate additional content that was not included in the initial dataset.

Installing the openpyxl library

To begin with, it is necessary to install the openpyxl library via pip through the command line interface.

Example

pip install openpyxl

Python Example to Read Data from the openpyxl

Let us consider an example to illustrate how to extract data using the openpyxl library in Python.

Example

#importing the openpyxl library

import openpyxl

# writing the full raw path of our file

file_path = r"C:\Users\mk\Desktop\hello_logicpractice_tech.xlsx"

# Here we are Loading workbook

workbook = openpyxl.load_workbook(file_path)

# Select active sheet

sheet = workbook.active

# Printing all rows of the Excel file

for row in sheet.iter_rows(values_only=True):

    print(row)

Output:

Explanation

In the preceding example, the openpyxl library has been utilized to access and display the contents of the Excel file titled "hellologicpracticetech.xlsx".

For further insights on openpyxl, please explore our comprehensive tutorial. Click Here. We have covered important information in this guide.

3. Reading from the Xlwings

The Xlwings library is an additional tool that accommodates the .xlsx file format. This library not only enables us to input data but also facilitates data retrieval. Let's explore an example of how to utilize Xlwings for data entry.

Example

Example

#importing the xlwings

import xlwings as xw

# Open the Excel file

wb = xw.Book(r"C:\Users\mk\Desktop\hello_logicpractice_tech.xlsx")

# Select a sheet

sheet = wb.sheets['Sheet1']

# Insert a list of lists into multiple rows into Name, role, and attendance format

sheet.range("A3:D5").value = [

    ['Prashant Y.','SEO', 85],

    ['Sam A.','Graphic Designer',92],

    ['David J','Java Developer', 83]

]

# Save and close

wb.save()

wb.close()

Output: Before executing the code

Final Outcome: Upon modifying the information within the Excel spreadsheet:

Explanation:

In the preceding example, the xlwings library was utilized to alter or update the information contained within the Excel spreadsheet. The modifications were made to the data spanning from cell A3 to D5. We presented the Excel file prior to the execution of the code as well as subsequent to the code's execution.

Python Read Excel File FAQs

1. Which libraries are commonly used to read Excel files in Python?

  • Pandas: pandas is the most popular and easiest library used to read Excel.
  • Openpyxl: Openpyxl is also a popular library used to work with .xlsx files (read/write)
  • Xlrd: Xlrd is a Legacy library, which means that it is not used now.
  • 2. How can we read an Excel file using pandas?

We can utilize pandas to read an Excel file. Let's explore this with an example:

Example

import pandas as pd

df = pd.read_excel("data.xlsx")

print(df.head())  # Shows first 5 rows

3. How can we read a specific sheet from an Excel file?

To access a particular sheet within an Excel document, we can utilize the following method:

Example

df = pd.read_excel("data.xlsx", sheet_name="Sheet2")

4. How can we read Excel files with openpyxl instead of pandas?

Example

from openpyxl import load_workbook

wb = load_workbook("data.xlsx")

sheet = wb["Sheet1"]

for row in sheet.iter_rows(values_only=True):

    print(row)

5. How can we read only specific columns from Excel?

To retrieve data from particular columns in an Excel file, we can utilize the following method:

Example

df = pd.read_excel("data.xlsx", usecols=["Name", "Age"])

Input Required

This code uses input(). Please provide values below: