In this tutorial, we will explore how to read an Excel file with Python by utilizing different libraries and modules. Excel is a spreadsheet software created by Microsoft. It serves as a user-friendly tool for organizing, analyzing, and storing data within tables. Its widespread application can be found in numerous settings, including offices, educational institutions, and universities, to effectively manage data.
Reading from an Excel file
Initially, we must install the xlrd library by executing the command below:
pip install xlrd
Creating a Workbook
An Excel file encompasses a complete workbook that contains all the data. You have the option to generate a new workbook from the ground up, or you can conveniently create a workbook based on an already existing Excel file.
Reading an Excel file using the Xlrd module
Note: The Xlrd module only supports .xls files. Earlier, it used to support .xlsx files too, but the support was terminated from version 2.0.0, although we have shown an example for a better understanding.
In a similar vein, there are libraries such as xlwt and xlutils that exclusively cater to .xls file formats, which are not pertinent at this time.
#importing the xlrd module
import xlrd
# writing the path of the file
file = r"C:\Users\mk\Desktop\hello_logicpractice_tech.xls"
# Open workbook
wb = xlrd.open_workbook(file)
# Select first sheet
sheet = wb.sheet_by_index(0)
# Get value at row 0, column 1
cell_value = sheet.cell_value(0, 1)
#printing the value from the Excel sheet
print(cell_value)
Output:
Example
Explanation
In the example provided above, we utilized the xlrd module to access the .xls file. We extracted and displayed the value found in the first row (0th index) and the second column (1st index) of the Excel document.
What if the XLRD module is used with the .xlsx extension
If you attempt to execute the XLRD module using the .xlsx file extension, you will encounter an error, as illustrated in the following section:
Output:
XLRDError Traceback (most recent call last)
Cell In[2], line 6
4 file = r"C:\Users\mk\Desktop\hello_logicpractice_tech.xlsx"
5 # Open workbook
----> 6 wb = xlrd.open_workbook(file)
8 # Select first sheet
9 sheet = wb.sheet_by_index(0)
File c:\Users\mk\AppData\Local\Programs\Python\Python313\Lib\site-packages\xlrd\_init_.py:170, in open_workbook(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows, ignore_workbook_corruption)
167 # We have to let unknown file formats pass through here, as some ancient
168 # files that xlrd can parse don't start with the expected signature.
169 if file_format and file_format != 'xls':
--> 170 raise XLRDError(FILE_FORMAT_DESCRIPTIONS[file_format]+'; not supported')
172 bk = open_workbook_xls(
173 filename=filename,
174 logfile=logfile,
(...) 182 ignore_workbook_corruption=ignore_workbook_corruption,
183 )
185 return bk
XLRDError: Excel xlsx file; not supported
1. Reading from the Pandas
Pandas is characterized as a free and open-source library that operates on the foundation of the NumPy library. It facilitates rapid data analysis, data cleansing, and data preparation for users, while also accommodating both xls and xlsx file formats sourced from URLs.
We can also read the CSV file using Pandas.
This package for Python offers a valuable data structure known as a data frame.
Example: Reading the .xlsx file using Pandas
To illustrate the process of reading an .xlsx file utilizing Pandas, let’s consider an example.
Input File: hellologicpracticetech.xlsx
Code:
import pandas as pd
# Specify the correct file path
file_path = r"C:\Users\mk\Desktop\hello_logicpractice_tech.xlsx"
logicpractice = pd.read_excel(file_path, engine='openpyxl')
print(logicpractice)
Output:
Explanation:
In the preceding example, the contents of the Excel file have been accessed utilizing the Pandas library. We have included the pandas library with the alias pd.
At this point, we will utilize the Pandas library to read the CSV file.
Example: Reading the CSV file using Pandas
Let’s consider an example to illustrate the process of reading a CSV file with the help of Pandas.
import pandas as pd
# specify the path of the CSV file
file_path = r"C:\Users\mk\Desktop\hello_logicpractice_tech.csv"
# Read the file
data = pd.read_csv(file_path, low_memory=False)
# Output the number of rows
print("Total rows: {0}".format(len(data)))
# Print the list of column headers
print(list(data.columns))
Output:
Total rows: 0
['Hello..', 'Example', 'Tech', 'Pvt Ltd']
Explanation:
In the preceding example, we accessed the data contained within the CSV file through the Pandas library. We initiated the Pandas library by importing it as pd. This illustrates that with the utilization of Pandas, we have the capability to read not only CSV files but also Excel files.
2. Reading from the openpyxl
Using openpyxl, it is possible to extract information from an already established spreadsheet. This library not only facilitates data retrieval but also enables users to execute computations and incorporate additional content that was not included in the initial dataset.
Installing the openpyxl library
To begin with, it is necessary to install the openpyxl library via pip through the command line interface.
pip install openpyxl
Python Example to Read Data from the openpyxl
Let us consider an example to illustrate how to extract data using the openpyxl library in Python.
#importing the openpyxl library
import openpyxl
# writing the full raw path of our file
file_path = r"C:\Users\mk\Desktop\hello_logicpractice_tech.xlsx"
# Here we are Loading workbook
workbook = openpyxl.load_workbook(file_path)
# Select active sheet
sheet = workbook.active
# Printing all rows of the Excel file
for row in sheet.iter_rows(values_only=True):
print(row)
Output:
Explanation
In the preceding example, the openpyxl library has been utilized to access and display the contents of the Excel file titled "hellologicpracticetech.xlsx".
For further insights on openpyxl, please explore our comprehensive tutorial. Click Here. We have covered important information in this guide.
3. Reading from the Xlwings
The Xlwings library is an additional tool that accommodates the .xlsx file format. This library not only enables us to input data but also facilitates data retrieval. Let's explore an example of how to utilize Xlwings for data entry.
Example
#importing the xlwings
import xlwings as xw
# Open the Excel file
wb = xw.Book(r"C:\Users\mk\Desktop\hello_logicpractice_tech.xlsx")
# Select a sheet
sheet = wb.sheets['Sheet1']
# Insert a list of lists into multiple rows into Name, role, and attendance format
sheet.range("A3:D5").value = [
['Prashant Y.','SEO', 85],
['Sam A.','Graphic Designer',92],
['David J','Java Developer', 83]
]
# Save and close
wb.save()
wb.close()
Output: Before executing the code
Final Outcome: Upon modifying the information within the Excel spreadsheet:
Explanation:
In the preceding example, the xlwings library was utilized to alter or update the information contained within the Excel spreadsheet. The modifications were made to the data spanning from cell A3 to D5. We presented the Excel file prior to the execution of the code as well as subsequent to the code's execution.
Python Read Excel File FAQs
1. Which libraries are commonly used to read Excel files in Python?
- Pandas: pandas is the most popular and easiest library used to read Excel.
- Openpyxl: Openpyxl is also a popular library used to work with .xlsx files (read/write)
- Xlrd: Xlrd is a Legacy library, which means that it is not used now.
2. How can we read an Excel file using pandas?
We can utilize pandas to read an Excel file. Let's explore this with an example:
import pandas as pd
df = pd.read_excel("data.xlsx")
print(df.head()) # Shows first 5 rows
3. How can we read a specific sheet from an Excel file?
To access a particular sheet within an Excel document, we can utilize the following method:
df = pd.read_excel("data.xlsx", sheet_name="Sheet2")
4. How can we read Excel files with openpyxl instead of pandas?
from openpyxl import load_workbook
wb = load_workbook("data.xlsx")
sheet = wb["Sheet1"]
for row in sheet.iter_rows(values_only=True):
print(row)
5. How can we read only specific columns from Excel?
To retrieve data from particular columns in an Excel file, we can utilize the following method:
df = pd.read_excel("data.xlsx", usecols=["Name", "Age"])