Splitting a pdf to single pages with PyPDF2
Splitting Pdfs with Python
In this post, we are going to have a look at how to split all pages from a single pdf into one-page pdf files. Splitting a pdf into a few parts can easily be done with almost any pdf tool worth its salt. However, splitting every page into its own file is usually a manual operation, and if you have to do it for several pdfs, an automated tool makes sense. This is where PyPDF2 comes in handy. If you just want the complete code without all the fancy explanations, you can find it at the end.
Preparations
If you haven't done so already, fire up your command prompt, PowerShell or terminal and install PyPDF2 with pip.
pip install pypdf2
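Currently I am running 32-bit Python 3.8 with PyPDF2 version 1.26.0 on Windows 10. The code works on this setup, and it should also work on other operating systems. Note that PyPDF2 3.0 and later removed the PdfFileReader and PdfFileWriter names used in this post (they were renamed to PdfReader and PdfWriter), so the code below needs one of the older releases.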
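Because the code in this post was written against one specific PyPDF2 release, it can be handy to check which version is actually installed before running it. The following is only a minimal sketch using the standard library; the helper name installed_version is made up for this example, and importlib.metadata needs Python 3.8 or newer.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is missing.

    Hypothetical helper for illustration; not part of PyPDF2.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the installed PyPDF2 version, or None when it is not installed.
    print(installed_version("PyPDF2"))
```

If this prints None, the pip command above has not been run in the Python environment you are using.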
Code line by line
Imports
We start with importing PdfFileWriter and PdfFileReader so that we can read the existing pdf and later write new pdfs. We also need to import os so that we can check what files we have in our working directory.
from PyPDF2 import PdfFileWriter, PdfFileReader
import os
Getting the pdf files to split
First we use a list comprehension over os.listdir(".") that keeps an entry only if os.path.isfile(f) confirms that it is a file and not a folder. After that we keep only the pdf files from the list files with files = list(filter(lambda f: f.lower().endswith((".pdf")), files)). Calling lower() first means that files ending in .PDF are matched as well.
files = [f for f in os.listdir(".") if os.path.isfile(f)]
files = list(filter(lambda f: f.lower().endswith((".pdf")), files))
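The two steps above can also be wrapped in one small function, which makes them easier to reuse and to try out on another folder. This is just a sketch of the same idea, not the code used in the rest of this post: the name list_pdfs and the directory parameter are assumptions made for the example, and the result is sorted only to make the order predictable.

```python
import os


def list_pdfs(directory="."):
    """Return the sorted names of the pdf files directly inside directory.

    Hypothetical helper for illustration; it combines the isfile check
    and the extension filter from the post in a single loop.
    """
    names = []
    for entry in os.listdir(directory):
        # Join with the directory so the check also works outside the
        # current working directory.
        full_path = os.path.join(directory, entry)
        # Keep regular files only and match the extension case-insensitively.
        if os.path.isfile(full_path) and entry.lower().endswith(".pdf"):
            names.append(entry)
    return sorted(names)
```

Calling list_pdfs() without arguments looks in the current working directory, just like os.listdir(".") does above.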
Splitting and creating new pdf
Now it is time to process all our pdf files. We go through each pdf in files with a for loop, for pdf in files:. We then open the pdf in binary read mode with the statement with open(pdf, "rb") as f: and load it into a PdfFileReader object with inputpdf = PdfFileReader(f).
Now it is time to start the splitting. With another for loop, for i in range(inputpdf.numPages):, we loop through all pages in the pdf; numPages gives us the number of pages. For each page we create a new PdfFileWriter object named output and add the current page to it with output.addPage(inputpdf.getPage(i)). We name the output pdf after the original file, adding -Page and the page number: name = pdf[:-4]+"-Page "+str(i)+".pdf". Note that i starts at 0, so the first page ends up in a file ending in -Page 0.pdf. Finally, we save the output.
with open(name, "wb") as outputStream:
    output.write(outputStream)
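The file naming can be pulled out into a small helper if you prefer page numbers that start at 1. The sketch below is a variation and not what the code in this post does: the name page_filename is made up for the example, the numbering is shifted by one on purpose, and os.path.splitext replaces the pdf[:-4] slice so the extension is removed regardless of its letter case.

```python
import os


def page_filename(pdf_name, page_index):
    """Build the output name for one page of pdf_name.

    Hypothetical helper for illustration. Unlike the code in the post,
    it numbers pages from 1 instead of 0.
    """
    # splitext separates the base name from the extension.
    base, _extension = os.path.splitext(pdf_name)
    # page_index is zero-based, so add 1 for a human-friendly page number.
    return f"{base}-Page {page_index + 1}.pdf"
```

Inside the page loop, name = page_filename(pdf, i) would then take the place of the string concatenation.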
Complete code
from PyPDF2 import PdfFileWriter, PdfFileReader
import os

files = [f for f in os.listdir(".") if os.path.isfile(f)]
files = list(filter(lambda f: f.lower().endswith((".pdf")), files))

for pdf in files:
    with open(pdf, "rb") as f:
        inputpdf = PdfFileReader(f)
        for i in range(inputpdf.numPages):
            output = PdfFileWriter()
            output.addPage(inputpdf.getPage(i))
            name = pdf[:-4]+"-Page "+str(i)+".pdf"
            with open(name, "wb") as outputStream:
                output.write(outputStream)