In our first video training session, we presented a walkthrough of how to organize USEITI data for use in the open source mapping software QGIS. Fortunately, that dataset included geographic identifiers called Federal Information Processing Standard (FIPS) county codes: five-digit codes identifying counties and county equivalents throughout the United States. However, not every dataset will include a geographic identifier alongside data attributed to a location. Google Refine is a powerful and versatile tool that allows users to clean, manipulate, and transform their data. In this post we will walk through the process of using Google Refine to add geographic coordinates to a dataset.
Step 1 - Download and Install Google Refine
Navigate to the OpenRefine download page, and download Google Refine 2.5 for your operating system. Google Refine operates as a hybrid desktop and web application. When you run Google Refine, a browser window should open automatically and present you with the Google Refine web interface. Despite operating within a web browser window, Google Refine does not require an active internet connection to work. As long as the Google Refine application is running, you can navigate to http://127.0.0.1:3333/ to access the web interface.
Before we move to the next step, take a moment to download the following .csv file. This dataset was downloaded from ResourceProjects.org and was reduced to include only 2015 projects carried out by Tullow Oil. Google Refine is a powerful piece of software; however, it can quickly get bogged down with very large datasets. This file was limited to one company for the purposes of this tutorial.
Step 2 - Upload your dataset to Google Refine
To get started, click ‘Create Project’. You will be presented with a number of options for data inputs. We will create a new project using data from ‘this computer.’ Select the file downloaded in the step above, and click ‘Next’ to start uploading the dataset.
Step 3 - Add a new column to fetch location information
With the dataset uploaded, Google Refine will present a preview of the entries. Review the data and headers to make sure everything appears as it should. At the bottom of the window, check that the ‘Parse next’ box is ticked so that the first row is parsed as column headers.
Click the ‘Create Project’ button in the upper right corner to proceed to the main working space of Google Refine. As noted above, we will be adding additional geographic information to this dataset. To do so, click the triangle in the ‘Paid to’ column and navigate to ‘Edit column’ > ‘Add column by fetching URLs…’
A window will pop up as shown below. Name the column and enter the following text into the ‘Expression’ box. (Click here to learn more about General Refine Expression Language.)
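The exact expression will depend on the geocoding service you query. As a rough sketch, assuming the Google Maps Geocoding API endpoint (the URL here is our assumption), a GREL expression for this step looks something like:

"https://maps.googleapis.com/maps/api/geocode/json?address=" + escape(value, "url")

Here value refers to the contents of each cell in the ‘Paid to’ column, and escape(value, "url") URL-encodes that text so it can be safely appended to the request. Depending on when you try this, Google’s Geocoding API may also require an API key parameter on the request.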
Click ‘OK’ and the expression will produce a column containing what is essentially the output of a query to the Google Maps application programming interface (API) for each term in the ‘Paid to’ column. This operation will typically take several minutes to complete, depending on the size of the dataset. While you wait for the process to complete, you can experiment to get a better sense of how this function works. Enter the expression we just used, leaving off the last portion, into the address bar of another browser window.
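Assuming the same Geocoding API endpoint as in the sketch above, that base URL looks something like this:

https://maps.googleapis.com/maps/api/geocode/json?address=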
Fill in the name of any location around the world after the “=” and you will see a page with all the relevant location information for that place. This should give you a better sense of what is happening under the hood with the ‘Add column by fetching URLs’ function in Google Refine.
Step 4 - Add another column to parse the information from the previous step
Once the process has completed, you will see a column filled with a long string of text and numbers.
To clean this up, we will add another column that parses that data. Click on the triangle in the new column you created in Step 3 containing all the Google Maps information, and select ‘Edit column’ > ‘Add column based on this column…’ Write in a title for this new column and enter the following text into the ‘Expression’ box:
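Again as a sketch, assuming the JSON structure returned by the Google Maps Geocoding API, an expression along these lines pulls out the coordinates:

with(value.parseJson().results[0].geometry.location, pair, pair.lat + ", " + pair.lng)

This parses the JSON stored in each cell, navigates to the geometry.location object of the first result, and joins its lat and lng values with a comma.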
Click ‘OK’ and the new column will populate with a neat set of latitude and longitude coordinates, separated by a comma, derived from the data in the column we produced in Step 3.
Step 5 - Export your project
The final step is to click ‘Export’ in the upper right corner of the Google Refine window. Select ‘Comma-separated value’ or ‘Excel’ from the dropdown list of file types.
You can then open the exported file in a desktop application and delete the column containing the unparsed location information, keeping the second column we created with the latitude and longitude coordinates. Google Refine is ideal for refining, cleaning, and adding to a dataset, but operations like deleting rows and columns should be done in programs like Excel.
While this post demonstrates how latitude and longitude coordinates can be derived from a country name, the exact same process can be carried out for any other type of location. If the dataset contained the names of cities or provinces instead of country names, the same steps could be used to obtain their latitude and longitude coordinates. Location information can help you create persuasive maps and other visualizations of your data. To learn more about what can be done with extractives data and mapping, navigate to the training section of Extract-A-Fact.
Always on the lookout for interesting data, I was excited when I recently came across a comprehensive trove of data on offshore production in the Gulf of Mexico from the Bureau of Ocean Energy Management (BOEM). The datasets at data.boem.gov include:
However, because the data is spread across four datasets, it is downright complicated to find out which leases a company owns, what the lease attributes are, and how much oil and gas has been produced from each lease. With that in mind, I have created an interactive map application that combines the geographic data, the ownership data, and the production data into one easy-to-use tool.
See the full-sized map here.
This map allows the user to view:
Take a look at the map, hosted on our GitHub page at the following link: https://pwypusa.github.io/pages/gulf_explorer.html
Click around and let us know what you find out, here or on Twitter @PWYPUSA!
There are two types of disclosures. One is disclosure for the sake of transparency, while the other is disclosure that actually works for the people it is intended to help. Ensuring the latter is the philosophy Bantay Kita has applied to its engagement with natural resources data.
When the Philippine Extractive Industries Transparency Initiative (PHEITI) Country Report was first published in 2014, an incredible amount of data was made available to the public. As a civil society representative to the PHEITI Multi-Stakeholder Group, Bantay Kita was tasked with making sure that the data was used to facilitate greater accountability.
In July 2016, Bantay Kita soft-launched its DATA Portal - short for Demanding Action, Transparency, and Accountability Portal. The portal was conceptualized in Jakarta, Indonesia during the first Publish What You Pay Data Extractors Program workshop, and later brought to life in the second Data Extractors workshop in Harare, Zimbabwe.
How much did it cost? $0
What programming language was used? I'm not even sure - I don't know a single one.
How long did it take to make the prototype? Minus the snack breaks and random Facebook checking, about four hours.
All it really took was some creativity and a handful of free web tools, which can be learned in one to six hours, depending on a person’s willingness to learn.
The DATA Portal is community-targeted, so it uses project-level, provincial, and regional data related to oil, gas, and mining extraction. Data from the Philippine EITI Country Report and other sources, such as the Mines and Geosciences Bureau, was analyzed for the portal.
The portal includes data for all 18 administrative regions in the Philippines. Each "Regional Page" has subpages for its Extractive Projects Database, News about Extractives, and Extractives Statistics. On top of this, there are "Company Pages" where data for individual projects can be found, such as production numbers, government payments, employee demographics, social and environmental spending, community demographics, poverty incidence, transparency measure, and so on.
There are digital metric tons of data available on extractives, but not all of it is relevant to specific communities. Since our soft launch in July, we have been traveling the Philippines to conduct open data workshops and collect locally translated versions of "data user templates." We ask local CSOs to fill them out with a specific advocacy goal in mind, telling us what kind of data would best help them influence decision makers and other stakeholders to support that goal. This makes the data we produce not only specific to a certain community, but also relevant to it.
What's quite unique about the DATA Portal is that it's not simply a box full of big datasets and tables of data, but rather data visualizations and infographics that make the information easier to understand. This makes the numbers less intimidating (especially for those traumatized by college algebra).
The next step for the DATA Portal is to make Action and Accountability happen. To that end, we hope to begin providing communities with quarterly data - production and sales - disclosed by extractive companies through SMS. Since most taxes and royalties are based on sales, this information will enable local communities to estimate the subnational transfers they should receive from mining activities, and thus to plan for the following year. This information can also be used by indigenous peoples’ organizations that receive royalties. Though the schedule of payments can vary, at a minimum these disclosures will allow communities to validate the accuracy of the funds they receive.
Making a difference takes more than making data open. Stakeholders need to make these disclosures relevant to communities to actually make an impact. Through our DATA Portal, Bantay Kita hopes to do just that.
Marco Zaplan is the Research and Communications Officer for Bantay Kita. Follow him @zaplanmarco
By Jana Morgan, Director, Publish What You Pay - United States
This post originally appeared on the 2016 International Open Data Conference blog on October 5, 2016
2016 is an historic year for transparency advocates and data geeks alike.
After fourteen years of campaigning by the global Publish What You Pay (PWYP) coalition, laws requiring oil, gas and mining companies to publicly disclose project-level payments to governments for access to natural resources are now in force in over 30 countries. This means we will know how much each covered company paid in royalties, taxes, and fees, along with other vital information, for every project in every country of operation.
Too often, natural resource wealth has served to enrich corrupt corporate execs and political elites, while the owners of those natural resources - citizens - have been left footing the bill for environmental destruction, community displacements, and lost economic opportunities. These reports present a treasure trove of information for advocacy groups, data scientists, journalists and citizens. Now, they can dig deep into oil, gas and mining data that was previously shrouded in secrecy.
In fact, some of the world’s largest extractive companies either listed or incorporated in the European Union, such as Shell (UK), BP (UK), Total (France), Statoil (Norway) and Rio Tinto (UK), have already published their first reports. We can expect reporting from Canadian listed/incorporated companies in 2017, with the bulk of US-listed company reports disclosed in 2019.
Transparency advocates are beginning the hard work of sifting through this newly released data, and have already begun asking important questions like:
At Publish What You Pay – United States (PWYP-US), we launched a website in June – Extract-A-Fact – with the intention of empowering citizens, activists and journalists to harness oil, gas, and mining data and use it as a tool to demand accountability from governments and extractive companies. Extract-A-Fact does this by providing training modules detailing useful and creative ways to find, analyze, and visualize extractives data, as well as blog posts from PWYP-US and our partners as we dig deeper into oil, gas, and mining sector data to answer questions critical to communities impacted by natural resources.
Over the past year, other important initiatives and tools have sprung up across the globe:
These new initiatives are already empowering activists to demand accountability for how their country’s natural resources are managed. But civil society groups are not the only actors who recognize the importance of these disclosures. Recently, the law firm Holland & Knight (citing research done by a PWYP Data Extractor) wrote that payment transparency could lead to increased voluntary Foreign Corrupt Practices Act (FCPA) disclosures, aid law-enforcement officials in investigating possible FCPA violations, and have a deterrent effect on corruption.
This new era in extractives transparency is why I am so excited for IODC 2016. If you believe that a country’s natural resources belong to the citizens who live there, and that profits made from those resources should be used to benefit those people, rather than swell the pockets of corrupt government officials and shady executives – let’s talk. I want to work with you to brainstorm ways we can fight corruption by translating this data into accountability.
Interested to know more? Follow us on Twitter - @pwypusa @janalmorgan @pwyptweets #ExtractAFact #NoSecretDeals