In an effort to map supply networks, we found that many supplier lists were published as structured tables but were available only in PDF reports, making them difficult to convert into spreadsheets.
To tackle this challenge, CERTH developed a PDF Table Extractor tool towards the end of 2017 to extract the tables in such PDFs into spreadsheet format. The tool is built on Tabula's Java library and is available online.
The tool addresses a need shared by the ChainReact consortium and by many others working in ESG and supply chain transparency: scraping public data from tables in PDF reports.
It has enabled the extraction and subsequent mapping of around 13,000 relationships between suppliers and the companies sourcing from them.
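
To make the mapping step concrete, the following is a minimal sketch of how rows from an extracted table might be turned into supplier relationships. It is not the consortium's actual pipeline: the function name `rows_to_relationships`, the `Supplier` column header, and the sample data are all hypothetical, and it assumes the extracted table has already been saved as CSV text.

```python
import csv
import io


def rows_to_relationships(csv_text, buyer, supplier_column="Supplier"):
    """Turn one extracted supplier table into unique (buyer, supplier) pairs.

    Hypothetical helper: the column name and matching rules are assumptions
    made for illustration, not part of the PDF Table Extractor itself.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    relationships = []
    for row in reader:
        # Collapse stray whitespace that often survives PDF extraction.
        name = " ".join((row.get(supplier_column) or "").split())
        if not name:
            continue  # skip rows with no supplier name
        key = name.casefold()  # case-insensitive duplicate check
        if key in seen:
            continue
        seen.add(key)
        relationships.append((buyer, name))
    return relationships


# Invented sample data standing in for a table extracted from a PDF report.
sample = """Supplier,Country
Acme Textiles,Bangladesh
acme textiles ,Bangladesh
,Vietnam
Delta Garments,Vietnam
"""

pairs = rows_to_relationships(sample, buyer="Example Brand")
for buyer, supplier in pairs:
    print(f"{buyer} sources from {supplier}")
```

Deduplicating on a normalized name keeps a supplier that appears twice in one report from being counted as two relationships; real supplier lists would likely need more careful entity matching than this.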