# tabula-py can now extract remote PDFs and multiple tables at once

(Note: Oct 7th, 2019) As of October 2019, I have launched a documentation site and a Google Colab notebook for tabula-py. The FAQ is a good place to look for tips on getting accurate extraction results.

tabula-py is a Python library that lets you extract tables from PDFs into pandas DataFrames. Today I released v0.8.0. In this post, I will introduce the improvements made since my previous post about tabula-py. If you are not familiar with tabula-py, please read that post first.

### Change Notes

• Read remote PDFs by passing a URL
• [Experimental] Add multiple_tables mode
• Add batch conversion method: convert_into_by_batch()
• Add encoding option
• Add java_options option
• read_pdf_table() will be deprecated
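The `encoding` and `java_options` entries make more sense once you recall that tabula-py wraps tabula-java, which runs as a separate `java -jar` process. Below is a minimal sketch, assuming a subprocess-style wrapper: `build_command` and `decode_output` are hypothetical helpers, not tabula-py's API. JVM flags such as `-Xmx256m` have to go before `-jar`, tabula-java's own options (here `--pages`) follow the jar, and the encoding is what turns tabula-java's raw output bytes into text.

```python
from typing import List, Optional, Sequence, Union


def build_command(
    jar_path: str,
    pdf_path: str,
    tabula_options: Sequence[str] = (),
    java_options: Optional[Union[str, Sequence[str]]] = None,
) -> List[str]:
    """Hypothetical helper: assemble a `java ... -jar <jar> ... <pdf>` command."""
    if java_options is None:
        jvm_flags: List[str] = []
    elif isinstance(java_options, str):
        # Accept a single string such as "-Xmx256m -Djava.awt.headless=true".
        jvm_flags = java_options.split()
    else:
        jvm_flags = list(java_options)
    # JVM flags must precede -jar; everything after the jar goes to tabula-java.
    return ["java", *jvm_flags, "-jar", jar_path, *tabula_options, pdf_path]


def decode_output(raw: bytes, encoding: str = "utf-8") -> str:
    """Hypothetical helper: decode tabula-java's stdout bytes with `encoding`."""
    return raw.decode(encoding)


if __name__ == "__main__":
    cmd = build_command("tabula.jar", "data.pdf", ["--pages", "2"], java_options="-Xmx256m")
    print(cmd)
    # ['java', '-Xmx256m', '-jar', 'tabula.jar', '--pages', '2', 'data.pdf']
```

A real wrapper would then hand `cmd` to `subprocess.check_output` and pass the returned bytes through `decode_output`, which is where a wrong encoding would show up as garbled text or a decoding error.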

Let me walk through the most important features.

#### Read a remote PDF by passing a URL

If the PDF you want to extract a DataFrame from is on the internet, you can pass its URL directly instead of downloading the file manually.

```python
from tabula import read_pdf
df = read_pdf("https://github.com/tabulapdf/tabula-java/raw/master/src/test/resources/technology/tabula/12s0324.pdf")  # a URL works like a local path
```
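Passing a URL works because the PDF still has to reach tabula-java as a local file. Here is a minimal sketch of that idea, assuming the remote file is simply downloaded to a temporary path first; `is_url` and `localize_file` are hypothetical names, not necessarily how tabula-py implements it.

```python
import os
import shutil
import tempfile
from typing import Tuple
from urllib.parse import urlparse
from urllib.request import urlopen


def is_url(path: str) -> bool:
    """Hypothetical helper: True if `path` looks like a URL, not a local path."""
    return urlparse(path).scheme in ("http", "https", "ftp", "file")


def localize_file(path: str) -> Tuple[str, bool]:
    """Hypothetical helper: download `path` to a temp file if it is a URL.

    Returns (local_path, is_temporary); the caller deletes temporary files.
    """
    if not is_url(path):
        return path, False
    fd, tmp_path = tempfile.mkstemp(suffix=".pdf")
    try:
        # Stream the remote bytes into the temporary file.
        with os.fdopen(fd, "wb") as out, urlopen(path) as response:
            shutil.copyfileobj(response, out)
    except BaseException:
        # Do not leave a half-written temp file behind on failure.
        os.remove(tmp_path)
        raise
    return tmp_path, True
```

Once extraction is done, the caller should remove the local copy whenever the second return value is `True`.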


#### [Experimental] Add "multiple_tables" mode

Because tabula-py is a simple wrapper around tabula-java, it was hard to handle multiple tables on a single page. Now you can extract multiple tables from one page with the multiple_tables option.

```python
from tabula import read_pdf
dfs = read_pdf('tests/resources/data.pdf', pages=2, multiple_tables=True)  # a list of DataFrames, one per table
```


This mode builds a list of DataFrames from tabula-java's JSON output, so if tabula-java changes its JSON format, the output could break. If you get a CParserError, try setting the multiple_tables option.
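To see why the JSON route copes with several tables per page, here is a sketch that turns tabula-java-style JSON into one list of rows per table. It assumes each table object carries a `data` list of rows whose cells have a `text` field; the sample input and the `tables_from_json` name are hypothetical and trimmed down for illustration.

```python
import json
from typing import Any, Dict, List

# Hypothetical, trimmed-down sample in the assumed tabula-java JSON shape:
# a list of tables, each with a "data" list of rows of {"text": ...} cells.
SAMPLE_JSON = """
[
  {"data": [[{"text": "Name"}, {"text": "Score"}],
            [{"text": "Alice"}, {"text": "90"}]]},
  {"data": [[{"text": "Item"}, {"text": "Qty"}, {"text": "Unit"}],
            [{"text": "Pen"}, {"text": "3"}, {"text": "pcs"}]]}
]
"""


def tables_from_json(raw: str) -> List[List[List[str]]]:
    """Turn tabula-java-style JSON into one list of text rows per table."""
    tables: List[Dict[str, Any]] = json.loads(raw)
    return [
        # Keep each table separate; missing "text" fields become "".
        [[cell.get("text", "") for cell in row] for row in table["data"]]
        for table in tables
    ]


if __name__ == "__main__":
    for rows in tables_from_json(SAMPLE_JSON):
        print(rows)
```

Each inner list can then become its own DataFrame, e.g. `[pandas.DataFrame(rows) for rows in tables]`. The two sample tables have different column counts; stacking such rows into a single CSV is one plausible source of the CParserError mentioned above, and keeping the tables apart sidesteps it.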

#### Add batch conversion method: "convert_into_by_batch()"

Starting with tabula-java v0.9.2, tables can be extracted from PDFs in batch. You can use this feature through the convert_into_by_batch() method.

```python
from tabula import convert_into_by_batch
convert_into_by_batch("path/to/pdf_dir", output_format='csv')  # placeholder: a directory of PDFs
```


Pass the path of a directory containing PDFs, not the path of a single PDF file.

tabula-py writes the extracted tables to the same directory as the input files.
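The directory rule and the output location can be illustrated with a small path-mapping sketch. It follows the behaviour described above (a folder goes in, outputs land next to their PDFs); naming each output `<stem>.<format>` is an assumption, and `batch_output_paths` is a hypothetical helper that performs no extraction itself.

```python
from pathlib import Path
from typing import Dict


def batch_output_paths(dir_path: str, output_format: str = "csv") -> Dict[Path, Path]:
    """Hypothetical helper: map each PDF in `dir_path` to an output beside it."""
    directory = Path(dir_path)
    if not directory.is_dir():
        # Mirrors the rule above: a directory is expected, not a single PDF.
        raise ValueError(f"{dir_path!r} is not a directory; pass a folder of PDFs")
    extension = {"csv": ".csv", "tsv": ".tsv", "json": ".json"}[output_format.lower()]
    return {
        pdf: pdf.with_suffix(extension)  # same folder, same stem, new extension
        for pdf in sorted(directory.iterdir())
        if pdf.suffix.lower() == ".pdf"
    }
```

Non-PDF files in the folder are skipped, and passing a file path fails early instead of silently producing nothing.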

### TODOs

There are several problems that may be fixed after tabula-java 0.9.3 is released, e.g., handling embedded fonts, including Japanese…