Most of the world today relies on Excel for a lot of processes. We all know Excel is used for organizing data, carrying out computations, financial analysis, visualization, etc. However, sometimes there comes a need to automate a few repetitive tasks which are difficult to carry out manually. And here comes Python to rescue us from monotonous tasks and help automate. One useful library of Python is Openpyxl which we will be learning about in this article.
Openpyxl is a Python library that is used to read from an Excel file or write to an Excel file. Data scientists use Openpyxl for data analysis, data copying, data mining, drawing charts, styling sheets, adding formulas, and more.
Workbook: A spreadsheet is represented as a workbook in openpyxl. A workbook consists of one or more sheets.
Sheet: A sheet is a single page composed of cells for organizing data.
Cell: The intersection of a row and a column is called a cell. Usually represented by A1, B5, etc.
Row: A row is a horizontal line represented by a number (1,2, etc.).
Column: A column is a vertical line represented by a capital letter (A, B, etc.).
Openpyxl can be installed using the pip command and it is recommended to install it in a virtual environment.
pip install openpyxl
We start by creating a new spreadsheet, which is called a workbook in Openpyxl. We import the workbook module from Openpyxl and use the function Workbook()
which creates a new workbook.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
from openpyxl
import Workbook
#creates a new workbook
wb = Workbook()
#Gets the first active worksheet
ws = wb.active
#creating new worksheets by using the create_sheet method
ws1 = wb.create_sheet("sheet1", 0) #inserts at first position
ws2 = wb.create_sheet("sheet2") #inserts at last position
ws3 = wb.create_sheet("sheet3", -1) #inserts at penultimate position
#Renaming the sheet
ws.title = "Example"
#save the workbook
wb.save(filename = "example.xlsx")
We load the file using the function load_Workbook()
which takes the filename as an argument. The file must be saved in the same working directory.
1 2
#loading a workbook wb = openpyxl.load_workbook("example.xlsx")
1 2 3 4 5 6 7 8 9 10 11 12 13 14
#getting sheet names wb.sheetnames result = ['sheet1', 'Sheet', 'sheet3', 'sheet2'] #getting a particular sheet sheet1 = wb["sheet2"] #getting sheet title sheet1.title result = 'sheet2' #Getting the active sheet sheetactive = wb.active result = 'sheet1'
1
2
3
4
5
6
7
8
9
10
11
#get a cell from the sheet
sheet1["A1"] <
Cell 'Sheet1'.A1 >
#get the cell value
ws["A1"].value 'Segment'
#accessing cell using row and column and assigning a value
d = ws.cell(row = 4, column = 2, value = 10)
d.value
10
1
2
3
4
5
6
7
8
9
10
11
12
13
#looping through each row and column
for x in range(1, 5):
for y in range(1, 5):
print(x, y, ws.cell(row = x, column = y)
.value)
#getting the highest row number
ws.max_row
701
#getting the highest column number
ws.max_column
19
There are two functions for iterating through rows and columns.
1
2
3
4
5
Iter_rows() => returns the rows
Iter_cols() => returns the columns {
min_row = 4, max_row = 5, min_col = 2, max_col = 5
} => This can be used to set the boundaries
for any iteration.
Example:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
#iterating rows
for row in ws.iter_rows(min_row = 2, max_col = 3, max_row = 3):
for cell in row:
print(cell) <
Cell 'Sheet1'.A2 >
<
Cell 'Sheet1'.B2 >
<
Cell 'Sheet1'.C2 >
<
Cell 'Sheet1'.A3 >
<
Cell 'Sheet1'.B3 >
<
Cell 'Sheet1'.C3 >
#iterating columns
for col in ws.iter_cols(min_row = 2, max_col = 3, max_row = 3):
for cell in col:
print(cell) <
Cell 'Sheet1'.A2 >
<
Cell 'Sheet1'.A3 >
<
Cell 'Sheet1'.B2 >
<
Cell 'Sheet1'.B3 >
<
Cell 'Sheet1'.C2 >
<
Cell 'Sheet1'.C3 >
To get all the rows of the worksheet we use the method worksheet.rows and to get all the columns of the worksheet we use the method worksheet.columns. Similarly, to iterate only through the values we use the method worksheet.values.
Example:
1 2 3
for row in ws.values: for value in row: print(value)
Writing to a workbook can be done in many ways such as adding a formula, adding charts, images, updating cell values, inserting rows and columns, etc… We will discuss each of these with an example.
1 2 3 4 5
#creates a new workbook wb = openpyxl.Workbook() #saving the workbook wb.save("new.xlsx")
1
2
3
4
5
6
7
8
9
10
11
12
13
14
#creating a new sheet
ws1 = wb.create_sheet(title = "sheet 2")
#creating a new sheet at index 0
ws2 = wb.create_sheet(index = 0, title = "sheet 0")
#checking the sheet names
wb.sheetnames['sheet 0', 'Sheet', 'sheet 2']
#deleting a sheet
del wb['sheet 0']
#checking sheetnames
wb.sheetnames['Sheet', 'sheet 2']
1
2
3
4
5
6
7
8
9
10
#checking the sheet value
ws['B2'].value
null
#adding value to cell
ws['B2'] = 367
#checking value
ws['B2'].value
367
We often require formulas to be included in our Excel datasheet. We can easily add formulas using the Openpyxl module just like you add values to a cell.
For example:
1 2 3 4 5 6 7 8 9 10
import openpyxl from openpyxl import Workbook wb = openpyxl.load_workbook("new1.xlsx") ws = wb['Sheet'] ws['A9'] = '=SUM(A2:A8)' wb.save("new2.xlsx")
The above program will add the formula (=SUM(A2:A8)) in cell A9. The result will be as below.
Two or more cells can be merged to a rectangular area using the method merge_cells(), and similarly, they can be unmerged using the method unmerge_cells().
For example:
Merge cells
1 2 3
#merge cells B2 to C9 ws.merge_cells('B2:C9') ws['B2'] = "Merged cells"
Adding the above code to the previous example will merge cells as below.
1 2
#unmerge cells B2 to C9 ws.unmerge_cells('B2:C9')
The above code will unmerge cells from B2 to C9.
To insert an image we import the image function from the module openpyxl.drawing.image. We then load our image and add it to the cell as shown in the below example.
Example:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
import openpyxl
from openpyxl
import Workbook
from openpyxl.drawing.image
import Image
wb = openpyxl.load_workbook("new1.xlsx")
ws = wb['Sheet']
#loading the image(should be in same folder)
img = Image('logo.png')
ws['A1'] = "Adding image"
#adjusting size
img.height = 130
img.width = 200
#adding img to cell A3
ws.add_image(img, 'A3')
wb.save("new2.xlsx")
Result:
Charts are essential to show a visualization of data. We can create charts from Excel data using the Openpyxl module chart. Different forms of charts such as line charts, bar charts, 3D line charts, etc., can be created. We need to create a reference that contains the data to be used for the chart, which is nothing but a selection of cells (rows and columns). I am using sample data to create a 3D bar chart in the below example:
Example
1
2
3
4
5
6
7
8
9
10
11
12
13
14
import openpyxl
from openpyxl
import Workbook
from openpyxl.chart
import BarChart3D, Reference, series
wb = openpyxl.load_workbook("example.xlsx")
ws = wb.active
values = Reference(ws, min_col = 3, min_row = 2, max_col = 3, max_row = 40)
chart = BarChart3D()
chart.add_data(values)
ws.add_chart(chart, "E3")
wb.save("MyChart.xlsx")
Result