There are many scenarios where you want to split a PDF document into several files automatically, such as invoices, official company reports, and other documents.
In a previous tutorial, we saw how you can merge multiple PDF documents into one. In this tutorial, you will learn how to split PDF documents with Python using the pikepdf library.
To get started, let's install pikepdf by running pip install pikepdf in your terminal.
Open up a new Python file and let's import it:
First of all, let's make a Python dictionary that maps the new PDF file index with the original PDF file's page range:
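A minimal sketch of such a dictionary is shown here. The name file2pages and the three ranges come from the description in this tutorial; the last two lines are only an illustration of how a [start, stop] pair maps to page indices.

```python
# Maps each new PDF file index to a [start, stop) page range of the original PDF.
file2pages = {
    0: [0, 9],     # 1st file: pages 0 to 9 (9 is not included)
    1: [9, 11],    # 2nd file: pages 9 (included) to 11 (not included)
    2: [11, 100],  # 3rd file: page 11 until the end, capped at page 100
}

# Unpacking a pair into range() gives the page indices that belong to a file.
second_file_pages = list(range(*file2pages[1]))
print(second_file_pages)
```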
With the above setting, we're going to split our PDF file into 3 new PDF documents. The first contains the first 9 pages, from 0 to 9 (9 is not included). The second contains the pages from 9 (included) to 11 (not included), and the last contains the pages from 11 until the end of the document, or until page 100 if the document is that long.
This way, we ensure maximum flexibility, since everyone has their own use case. If you want to split each page into its own PDF document, you can simply replace [0, 9] with [0, 1], so that the range covers only the first page, and so on for the remaining pages.
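Under the half-open convention used for [0, 9], a range that covers exactly page i is [i, i + 1]. A one-page-per-file mapping can therefore be generated instead of typed by hand. The helper name one_page_per_file below is hypothetical and is not part of pikepdf:

```python
def one_page_per_file(num_pages):
    """Build a mapping that assigns every page to its own output file."""
    return {i: [i, i + 1] for i in range(num_pages)}


# Example: a 3-page document becomes three single-page ranges.
print(one_page_per_file(3))
```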
This is the file we're going to split (you can get it here if you want to follow along):
Loading the file:
Next, we make the resulting PDF files (3 in this case) as a list:
To make a new PDF file, you simply call the Pdf.new() method. The new_pdf_index variable is the index of the file; it will only be incremented when we're done with making the previous file. Diving into the main loop:
First, we iterate over all the pages of the original PDF using the pdf.pages attribute. If the page index falls within the current file's page range in the file2pages dictionary, we simply add the page to our new file. Otherwise, we know we're done with the previous file, so we save it to disk using the save() method, move on to the next file, and continue the loop until all pages are assigned to their files. Finally, we save the last file outside the loop.
Here's the output when I run the code:
And indeed, the new PDF files are created:
And there you go! I hope this quick guide helped you split your PDF file into several documents. You can check the full code here. If you want to merge several PDF files into one, then this tutorial will definitely help you.
For more PDF handling guides in Python, you can check our Practical Python PDF Processing EBook, where we dive deeper into PDF document manipulation with Python. Make sure to check it out here if you're interested!
Happy coding ♥