In our efforts to empower citizen groups around the globe to hold their governments accountable, we here at PWYP-US aim to provide you with every tool you might need in the fight for transparency. And while there are many tools for investigating extractive industry payment disclosures (QGIS, for example), often the biggest hurdle is simply extracting the data from its location on the web into a usable format. This short guide covers one tool that can simplify that process for web-based tables: the Scraper extension for Google Chrome. A web scraper automatically collects information from the internet, allowing users to easily transfer that information into a format they can analyze or manipulate. This can be much faster than copying data by hand and reduces the possibility of error.
The Scraper extension is free and can be found here in the Chrome web store. It is easily installed by clicking “Add to Chrome” in the top right corner and following the short instructions.
Once Scraper is successfully installed, it can be found in the top right corner of your browser near the Menu button.
How and When to Use Scraper
Using Scraper is truly simple, but it is not always clear where it is best employed. Many online data tables can simply be copied and pasted directly into a spreadsheet program and used from there. Scraper comes in when the web page formatting gets in the way of a simple copy-and-paste operation.
For the rest of the demonstration, we will use tables found in the 2015 Form 20-F from Total S.A., filed with the US Securities and Exchange Commission. It can be found here, with the pertinent tables on page S-18.
The “Report on the payments made to governments” contains information on Total’s payments to governments in all countries of operation. This includes payment streams such as royalties, taxes, rents, and fees. The data is categorized into payments by project and payments to various levels of government in a country. Once it is in a machine-readable format (such as an Excel file), we can compare it to other data we may have, be it from USEITI, government sources, or other countries’ EITI reports.
To use Scraper, first identify a table from which you want to extract data. We will use table 3.1: “Reporting by country and type of payment” on page S-19.
Once you have your table identified, use your cursor to highlight part of the table, being sure to select two or more full rows.
Then, leaving the cells highlighted, we have two options:
Option 1: Go to the Scraper icon in the top right corner of the browser and select “Scrape similar…” from the dropdown menu
Option 2: Right click and select the “Scrape similar…” option from the cursor dropdown menu
From there, Scraper will open a new window containing the scraped data. Scraper will automatically locate the bounds of the table and find the column names from only the few rows we highlighted in the previous step.
We can now view the data in the right panel and check that it has been scraped properly. If we decide it looks good enough to export, we simply click “Copy to clipboard”.
Then go to our preferred spreadsheet program and paste with Ctrl + V.
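As an aside for readers comfortable with Python: the same kind of table extraction can also be scripted, which is handy if you need to repeat it. The sketch below uses the pandas library’s read_html function; the URL is a placeholder rather than the real filing address, and in practice you may need to point the script at a saved local copy of the page.

    import pandas as pd

    # read_html() returns a list of DataFrames, one for each HTML table on the page.
    # The address below is a placeholder; substitute the page you are scraping,
    # or the path to a copy of it saved to disk.
    url = "https://www.sec.gov/example/total-2015-20f.htm"
    tables = pd.read_html(url)

    payments = tables[0]                           # pick the table you want by position
    payments.to_csv("payments.csv", index=False)   # save it for spreadsheet work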
Now our data is in Microsoft Excel. But before we start any analysis, we need to perform some data checking and cleaning to make sure the data is 100% correct and usable. We can tell from a first look that there are a lot of empty columns and the column headers are not lined up with the values.
First, we need to move the column headers into the right positions. By cross-checking with the original HTML table on the SEC website, we can simply drag and drop the headers in the Excel spreadsheet into their correct locations.
For example, we see that “License bonus” in the Excel spreadsheet has ended up over the “License fees” column.
Now, before we delete the empty columns, we need to make sure they are really empty. In doing this, we will find that one cell has been split incorrectly.
Checking back to the original table, we see that (1) is a footnote indicating that the (599) is a negative payment. We can fix the inconsistency by clearing both cells and replacing “(599” with -599. Now we can delete the empty columns as well as the first row.
Our spreadsheet should look like this:
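If you expect to repeat this kind of cleanup, the fixes can also be scripted. Below is a minimal pandas sketch, assuming the scraped table was saved as payments.csv (a hypothetical filename); realigning the headers is usually still easiest to do by hand.

    import pandas as pd

    df = pd.read_csv("payments.csv")

    # Drop columns that contain no values at all.
    df = df.dropna(axis="columns", how="all")

    # Repair the split cell: clear the stray "(1)" footnote marker and turn
    # the truncated "(599" into the negative number -599.
    df = df.replace({"(1)": float("nan"), "(599": -599})

    # Drop the stray first row, mirroring the manual cleanup described above.
    df = df.iloc[1:]

    df.to_csv("payments_clean.csv", index=False)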
Finally, we need to check that the data matches correctly. By using the spreadsheet’s =sum() function, we can compare the imported “TotalofPayments” in column I to a total we calculate ourselves using the imported values in columns B through H.
This is done by selecting the empty cell in column J and entering =sum(B2:H2)
This gives us a total calculated using the numbers in the spreadsheet that should equal the imported number under “TotalofPayments”. We can check this visually or set an equality function in the next cell by entering =J2=I2 (meaning: is J2 equal to I2?)
The result is “TRUE”, which means that the data in that row imported correctly. To check the remaining rows, we can easily expand the =sum function and the equality function to the other cells in columns J and K.
First, select both cells:
Then click and hold the lower right-hand corner of cell K2, drag down to the last row of our data, and release.
This action copies the formulas in J2 and K2 to all the cells below while automatically updating the row references. Now it is very easy to see that all of the cells in column K are TRUE and that our data has been imported 100% correctly.
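For anyone working outside of Excel, the same consistency check can be expressed in a few lines of pandas. This sketch assumes the cleaned table was saved as payments_clean.csv, with the individual payment streams in spreadsheet columns B through H and the reported total in column I, so the columns are addressed by position.

    import pandas as pd

    df = pd.read_csv("payments_clean.csv")

    streams = df.iloc[:, 1:8]           # spreadsheet columns B through H
    calculated = streams.sum(axis=1)    # our own row totals (column J above)
    reported = df.iloc[:, 8]            # the imported TotalofPayments column I

    matches = calculated.eq(reported)   # one TRUE/FALSE per row (column K above)
    print(matches.all())                # True only if every row checks out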
Tools like Scraper can turn data collection from an arduous chore into a simple task, greatly increasing efficiency and accuracy and leaving us more time to examine the numbers. Stay tuned for a walkthrough of a similar program, Tabula, to learn how to scrape tables from PDF reports.
Tommy Morrison is a Research Assistant at PWYP-US. Follow us on Twitter @pwypusa
Want updates from Extract-A-Fact? Sign up for our newsletter
On May 19, PWYP-US hosted the first of two data skills training workshops exploring ways to use the USEITI data with the open-source QGIS mapping software. In the first session, our trainer, Diana Parker, a PhD candidate in Geographical Sciences at the University of Maryland, introduced the basics of organizing data and gave an introductory overview of geographic information systems (GIS).
With the growing availability of voluntary disclosures through the Extractive Industries Transparency Initiative (EITI) and mandatory disclosure requirements coming into effect in a number of countries (and soon in the United States), the pool of available natural resource data is expanding rapidly. With these trainings, PWYP-US sets out to empower interested actors and provide them with the tools to apply this wealth of data to improve governance of the natural resource sector. The dataset from the United States implementation of EITI (USEITI) was used in this training as an example of how geographically delineated data can be cleaned, sorted, and mapped.
Attendees were guided through the process of preparing the USEITI data for use in QGIS, followed by an introductory tutorial on making basic maps. The first step of cleaning and sorting the data is critical to ensure interoperability with applications like QGIS. Not all datasets are created equal: datasets concerning oil production and revenues from different sources will more than likely present the data in slightly different ways. The initial step of any data analysis is therefore often to work in software such as Excel, using tools like pivot tables, to prepare the data file in a way that data analysis and visualization software can interpret. This critical first step of preparing the data was covered in detail in this introductory training.
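For those curious what this preparation step looks like outside of Excel, here is a minimal pandas sketch of the same pivot-table idea; the file and column names are placeholders rather than the actual USEITI field names.

    import pandas as pd

    revenues = pd.read_csv("useiti_revenues.csv")   # hypothetical export of the dataset

    # Aggregate revenue so each county appears exactly once, which is the shape
    # a GIS application expects when joining tabular data to county boundaries.
    by_county = pd.pivot_table(
        revenues,
        index="county_fips",     # one row per county (placeholder column name)
        values="revenue_usd",    # the figure to aggregate (placeholder column name)
        aggfunc="sum",
    )

    by_county.to_csv("revenue_by_county.csv")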
Once organized, the data itself can yield some insights, but visualizing the information can be an important next step in raising questions and drawing conclusions from the data. Through visualizations, the information can be made more accessible to a broader set of stakeholders. There are a number of ways to visualize data that should be familiar to anyone who has worked with even the most basic financial or budget spreadsheets, and there are some useful shortcuts for creating quick bar graphs and pie charts, such as the ‘Explore’ button in online apps like Google Sheets. Organizing geographically tagged data and visualizing it in a map can be a very effective way of gaining a better understanding of the underlying data.
The basic map below (one of the outputs of the first training session) visually depicts which counties across the United States have the highest revenue from natural resource extraction.
As one example, advocates can draw conclusions from such a map and decide how best to direct their campaigns to assist communities most affected by mining activity.
There is much more that can be done with QGIS using the USEITI data. For those interested in learning more, the next training will be held June 7 (RSVP here) and will instruct participants on how to construct more complex maps in QGIS.
Click here to watch the training video and follow this link to download the instruction manual and datasets. Instructions for installing the free QGIS software are also available in the linked Google folder. If you take our course, tell us what you think by filling out our post-course survey.
Waseem Mardini is the Policy Advisor at PWYP-US. Follow him on Twitter @pwypusa
Want updates from Extract-A-Fact? Sign up for our newsletter
Are we getting a good deal on our natural resources?
Extract-A-Fact will provide training modules detailing useful and creative ways to find, analyze, and visualize extractives data, as well as blog posts from PWYP-US and our partners as we dig deeper into oil, gas, and mining sector data to answer questions critical to communities impacted by natural resource extraction.
Over the last 14 years Publish What You Pay coalitions around the world have advocated for a more transparent extractives sector by petitioning governments to require oil, gas, and mining companies to publish what they pay for the right to explore, develop, and extract natural resources. There are now mandatory disclosure laws in force in over 30 countries, and the first company reports were released in 2015. These reports can be analyzed alongside other data sources to get a more complete picture of a country’s natural resource sector.
How much did Big Oil pay (or not pay) in taxes last year?
The goal of Extract-A-Fact is to enable civil society organizations, citizens, journalists, government officials, academics, and other stakeholders to effectively analyze this data and put it to use to hold both companies and governments accountable for how natural resources are extracted and managed.
In the coming weeks and months, we will feature blogs and training modules that will help readers enhance their ability to work with and explore the data. But we also want to hear from you - Extract-A-Fact is intended to be a collaborative space for those interested in working with the growing set of extractives sector data. Please tell us what questions you would like to answer, or if you wish to collaborate on a training or story.
How do I make a map to show mining revenues owed vs. revenues paid to my community?
These are exciting times for the transparency movement, as the books are now being opened on the payments that oil, gas and mining companies make around the world. It is now time for us to translate this new data into accountability. We hope you will join us in this effort and that Extract-A-Fact will be a useful resource in your work.
Help us spread the word - use #ExtractAFact to tell folks about this project!
Jana Morgan is the Director of PWYP-US. Follow her on Twitter @janalmorgan and @pwypusa
Want updates from Extract-A-Fact? Sign up for our newsletter.
Click here for the archives to see our full list of posts.