Extract Tables from Notion with Python

Yu Yang
2 min read · Aug 22, 2021

I have become a fan of Notion and use it every day to write my journal. Each journal page contains two tables (Time Log and Time Budget) and five text blocks, as shown in later examples.

To reflect on how I spend my time each month, I want to extract these tables. Notion's csv export is slow and inconvenient for batch operations. Since Notion cannot work offline, I already export my daily journals to pdf files every month and save them locally. I therefore decided on the following two-step approach.

  1. export Notion documents to pdf files
  2. extract tables from the pdf files and save in the csv format

The second step, however, turned out to be harder than I expected. I tried several Python packages, and most of them did not meet my needs. In this post, I first describe the method I settled on and then discuss the failed experiments with the other tools.

Throughout the post, I use one of my daily journals as an example. Note that I made the time log long enough to span two pages, which can easily happen in practice and makes the extraction more general.

pdfplumber

I highly recommend giving pdfplumber a try. It can extract not only text but also tables, and it comes with detailed documentation and visual debugging tools. Take a quick look at the "Demonstration of pdfplumber’s extract_table method" to see this tool in action.

The following is the code for processing one pdf file.
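
As a minimal sketch of this step (not necessarily the exact code used for my journal), the snippet below collects every table that pdfplumber finds with `page.extract_tables()` and stitches together tables that continue onto the next page. The function names, the `known_headers` argument, and the column names in the usage note are illustrative assumptions; adapt them to the headers of your own tables.

```python
def clean_cell(cell):
    """Normalize one cell: None becomes '', wrapped lines are joined by a space."""
    if cell is None:
        return ""
    return " ".join(str(cell).split())


def merge_tables(tables, known_headers):
    """Group page-level tables into logical tables keyed by their header row.

    A table whose first row is in `known_headers` starts (or resumes) a logical
    table; any other table is treated as a continuation of the previous one,
    e.g. a long Time Log that spills onto a second page.
    """
    merged = {}
    current = None
    for table in tables:
        rows = [[clean_cell(c) for c in row] for row in table]
        if not rows:
            continue
        first = tuple(rows[0])
        if first in known_headers:
            current = first
            merged.setdefault(current, [])
            rows = rows[1:]  # drop the header row itself
        if current is not None:
            merged[current].extend(rows)
    return merged


def extract_tables(pdf_path, known_headers):
    """Read all tables from one pdf file and merge them by header."""
    import pdfplumber  # third-party; imported lazily so the helpers above work without it

    tables = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            tables.extend(page.extract_tables())
    return merge_tables(tables, known_headers)


def to_dataframe(header, rows):
    """Build a pandas DataFrame from one merged table (assumes pandas is installed)."""
    import pandas as pd

    return pd.DataFrame(rows, columns=list(header))
```

With hypothetical headers such as `("Start", "End", "Activity")` for the Time Log, calling `extract_tables("journal.pdf", {("Start", "End", "Activity")})` would return a dictionary mapping that header to its rows, which `to_dataframe` can then turn into a dataframe and save with `DataFrame.to_csv`.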

The resulting dataframe is as follows. Well done!

The same approach can easily be adapted for batch processing.
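
One way to sketch the batch version is to loop over a folder of exported pdf files and write one csv file per extracted table. Here `batch_convert` and its `extract` argument are hypothetical names: `extract` stands for any callable that takes a pdf path and returns `(name, header, rows)` tuples, for instance a thin wrapper around pdfplumber.

```python
import csv
from pathlib import Path


def batch_convert(pdf_dir, out_dir, extract):
    """Run `extract` on every pdf in `pdf_dir` and write one csv per table.

    `extract(pdf_path)` must return an iterable of (name, header, rows) tuples.
    Returns the list of csv paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # Sort so that files are processed in a stable, predictable order.
    for pdf_path in sorted(Path(pdf_dir).glob("*.pdf")):
        for name, header, rows in extract(pdf_path):
            csv_path = out_dir / f"{pdf_path.stem}_{name}.csv"
            with open(csv_path, "w", newline="", encoding="utf-8") as f:
                writer = csv.writer(f)
                writer.writerow(header)
                writer.writerows(rows)
            written.append(csv_path)
    return written
```

Passing the extractor in as an argument keeps the file handling separate from the pdf parsing, so the loop can be checked with a stub before running it on a whole month of journals.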

Failed Experiments

I also tried camelot and tabula, following "How to Extract Tables from PDF in Python", but neither worked reliably for Notion-exported pdf files.

  • camelot couldn’t extract any content from the pdf file except ‘Untitled’.
  • tabula succeeded for some documents, but for others it failed to convert the pdf file into csv files and raised a ParserError. For your reference, I have also included my code, which succeeded on some documents but failed on others.

I hope this post helps you extract the tables you need from Notion more quickly. Thanks!
