In 2014, Niger announced it had successfully renegotiated uranium extraction contracts with French state-owned company Areva to secure a greater share of the wealth deriving from their uranium resources. Three years later, an analysis carried out by Oxfam based on data released by Areva calls into question the benefits for Niger in the contract renegotiation.
This analysis was carried out as part of the data extractor program developed by Publish What You Pay.
You can read more about Areva in Niger and more in the English version of “Beyond Transparency: Investigating the New Extractive Industry Disclosures.” This report was published by Publish What You Pay France, Oxfam France, ONE, and Sherpa.
Understanding the context: why is Nigerien uranium so important for Areva?
Uranium is a strategic commodity for France. More than 75% of the electricity produced in France comes from nuclear power, and most of the uranium used for French nuclear fuel is supplied by Areva. As many as 1 in 5 lightbulbs in France may be lit thanks to Nigerien uranium.
For years, civil society organizations have called out Areva for its uneven partnership with Niger. Despite vast uranium resources, Niger has yet to convert this valuable resource into tangible wealth: the country still ranks second to last in the Human Development Index.
The renegotiation: a game-changer for Niger?
In 2013, Oxfam and ROTAB, a Nigerien NGO – both members of Publish What You Pay – launched a campaign denouncing the unbalanced partnership between Areva and Niger and calling for the renegotiation of the contracts. Oxfam and ROTAB specifically pointed out that Areva’s contracts included a sweetheart clause enabling Areva to pay a lower royalty rate than the regime otherwise applicable in Niger. Royalties make up the majority of uranium mining revenues to the Nigerien government.
In 2014, after months of pressure from civil society organizations around the world, Areva and Niger agreed to a new contract without the sweetheart clause. In June 2014, a Strategic Partnership Agreement signed between Areva and Niger stressed that Areva would be subject to the legal royalty regime, raising hopes of a fairer share of the revenues for Niger. The agreement was published in the Journal Officiel, the official gazette of the Republic of Niger, where major legal texts are published.
In August 2016, Areva disclosed for the first time the payments it makes to the governments of the countries where it mines uranium, as required by new EU regulations. For Niger, this was the public’s first look at Areva’s payments since the renegotiation. And the results are surprising:
This tutorial is the first in a series that will help you learn how to analyse data about the extractives industry using Python.
We start off with a few FAQs before diving into the tutorial.
What are Python, R, Jupyter Notebooks, and Pandas?
Python and R are the most widely used programming languages for data science.
Python and R are just languages; to actually use them, you need a development environment, a program on your computer where you can write, test, and run code. For Python, the most popular development environment is Jupyter Notebooks (which runs right in a web browser), and for R it is RStudio.
On top of these languages, there are libraries that allow you to do specialised things. In Python, for example, the Pandas library is the most widely used library for data analysis.
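To see what this looks like in practice, here is a minimal sketch of Pandas in action (assuming Pandas is installed, e.g. via `pip install pandas`). The column names and figures are hypothetical, not taken from any real payments dataset:

```python
import pandas as pd

# A tiny, made-up payments table, the kind of data this series will work with.
payments = pd.DataFrame({
    "country": ["Niger", "Niger", "Canada"],
    "payment_type": ["royalty", "tax", "royalty"],
    "amount_usd": [30_000_000, 5_000_000, 12_000_000],
})

# Total payments per country: one line instead of manual spreadsheet work.
totals = payments.groupby("country")["amount_usd"].sum()
print(totals)
```

This kind of grouping and summing is exactly the sort of repetitive task that becomes a one-liner once the data is loaded into a Pandas DataFrame.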
Why are they useful?
How are they better than Excel, Tableau, etc?
What is the difference between R and Python?
Python is a general purpose programming language, which means that in addition to doing data science, you can write anything in Python ranging from web apps to online games. People with a background in programming usually find Python to be easier to learn than R.
R is a programming language that is especially made for data science. As a result you can’t write all kinds of things like games and apps with it, but it is awesome if you want to work on data. People coming from a data analysis or statistics background usually find R to be easier to learn than Python.
In terms of working with data, such as wrangling, analysing, and visualising, both Python and R work great for pretty much anything you can think of. The two languages are very different in terms of style and syntax, but their functionality is very similar.
What do I need to get started?
For this tutorial, we will just stick to Python, and we will just run it using an online version of Jupyter Notebooks that does not require installing anything on your computer. You can get started by going here: https://try.jupyter.org/
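Once a notebook is open, you type Python into a cell and run it with Shift+Enter. A first cell might look like this (the production figures are invented, purely for illustration):

```python
# A first notebook cell: plain Python arithmetic, no libraries needed.
production_tonnes = [2509, 2331, 2116]  # hypothetical annual output figures

total = sum(production_tonnes)
average = total / len(production_tonnes)

print("Total:", total)            # Total: 6956
print("Average:", round(average, 1))  # Average: 2318.7
```

Each cell remembers the variables defined before it, which is what makes notebooks convenient for step-by-step data analysis.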
To learn more, head over to GitHub for the full tutorial.
A change in government often brings significant shifts in policy. Major initiatives taken up by a previous administration can be slowed or reversed, and information that was once publicly available may be taken down or censored. The White House webpage provides some clear examples of this phenomenon. Following the inauguration this past January, the press reported that the Trump Administration White House homepage underwent some changes, such as striking references to climate change and removing a Spanish-language option. Fortunately, if a user wants to view the content from the White House homepage of President Barack Obama, it is still possible to do so by navigating to https://obamawhitehouse.archives.gov.
Citizens can also use the Internet Archive “Wayback Machine” to access www.WhiteHouse.gov and see content for any given day going back several years. These archive solutions are helpful for viewing web content, but hosted files on these pages still have the potential to get lost. Documents that are hosted on pages can become inaccessible as other content is changed.
Since 2010, the Publish What You Pay coalition, academics, industry, investors, and other actors have submitted hundreds of comment letters to the Securities and Exchange Commission (SEC) to influence the agency’s Section 1504 rulemaking. Every comment submitted to the SEC is available on the regulatory agency’s website. The comments are available as pdf files on four separate comment records: 2010, 2010-2012, 2013-2015, and 2015-2016. Because of the current wave of government self-censorship, we wanted to make sure we could preserve the evidence in the Section 1504 record. This post will provide the steps to download all linked documents, such as pdf files, from a website. The SEC comment record will be used as an example, but the same steps can be used to download and preserve files hosted on any site.
As with other data scraping and organizing processes, the steps described in this post could be carried out manually. For example, scraping data from a company pdf report can be done manually, with a user entering data line by line into a spreadsheet, but that is a time-consuming process. As we described previously on Extract-A-Fact, there are tools to help speed up data scraping. To automate the downloading of all linked files on a website, we will use the Google Chrome extension, Chrono Download Manager - see the tutorial below.
Step 1 - Install the Chrome extension
Navigate to the Chrome web store page for the Chrono Download Manager and click the ‘Add to Chrome’ button in the upper right. A notice will pop up and you can safely click ‘Add extension’ to confirm installation. When the installation completes you should find a new icon in the upper right corner of your Chrome browser.
Step 2 - Download linked files
Before proceeding, we recommend you set a dedicated folder for downloads. Navigate to chrome://settings in your Chrome browser and set a specific downloads folder. See the image below for an example.
Next, navigate to the page with the files you intend to download. In this case we will use the most recent 1504 comment record. Once on the page, click the Chrono Download Manager icon in the upper right. Select the ‘Document’ tab in the window that pops up.
The ‘Document’ window presents a list of all the links on the page that are interpreted as documents. In this case, we are only concerned with downloading the pdf files. To narrow the selection, click the ‘pdf’ check box as shown below.
Once you’ve selected all the relevant documents you can click ‘Start all’ in the lower right of the window to download the files into the folder you selected in the Chrome browser settings.
Optional Step 3 - Categorize the downloaded files
If you follow the steps above you will be able to successfully download all of the files from a webpage, which will simply be listed by their filename (e.g. s72515-1.pdf). To help organize the files, you can have Chrono Download Manager automatically attach the descriptive text corresponding to each file. Click the first document highlighted in green (see image above), scroll down to the last pdf and press shift+left mouse button on the last highlighted pdf. With all of the pdf files checkmarked and selected, click the ‘Task Properties’ tab as shown below.
Click the text box next to ‘Naming Mask’ and select ‘*text*.*ext*’ then click ‘Start All’ to download all of the files. You’ll find that the downloaded files will now appear in the folder with a descriptive title (e.g. Jana L. Morgan, Director, Publish What You Pay – United States) rather than the numbered file name.
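If you prefer a scripted alternative to a browser extension, the same job can be sketched in a few lines of Python using only the standard library. The snippet below shows the core idea, collecting every link ending in `.pdf` from a page's HTML; the sample link is illustrative, and you should check a site's usage policies before bulk-downloading from it:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlretrieve  # used to fetch each file, see note below


class PdfLinkCollector(HTMLParser):
    """Collects href values that end in .pdf, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.lower().endswith(".pdf"):
                    self.links.append(urljoin(self.base_url, value))


def extract_pdf_links(html, base_url):
    parser = PdfLinkCollector(base_url)
    parser.feed(html)
    return parser.links


if __name__ == "__main__":
    # A hand-written sample page; a real script would download the page first.
    page = '<a href="/comments/s72515-1.pdf">Comment 1</a>'
    links = extract_pdf_links(page, "https://www.sec.gov")
    print(links)
    # To actually save the files: for url in links: urlretrieve(url, filename)
```

This is a sketch, not a polished downloader: a production version would add error handling, polite rate limiting, and the descriptive-filename step that Chrono Download Manager handles for you.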
In our first video training session, we presented a walkthrough of how to organize USEITI data for use in the open source mapping software QGIS. Fortunately, that dataset included geographic identifiers called Federal Information Processing Standard (FIPS) county codes, five-digit codes identifying counties and county equivalents throughout the United States. However, not every dataset will include a geographic identifier alongside data attributed to a location. Google Refine is a powerful and versatile tool that can allow users to clean, manipulate, and transform their data. In this post we will walk through the process of using Google Refine to add geographic coordinates to a dataset.
Step 1 - Download and Install Google Refine
Navigate to the OpenRefine download page, and download Google Refine 2.5 for your operating system. Google Refine operates as a hybrid desktop and web application. When you run Google Refine, a browser window should open automatically and present you with the Google Refine web interface. Despite operating within a web browser window, Google Refine does not require an active internet connection to work. As long as the Google Refine application is running, you can navigate to http://127.0.0.1:3333/ to access the web interface.
Before we move to the next step, take a moment to download the following .csv file. This dataset was downloaded from ResourceProjects.org, and was reduced to only include 2015 projects carried out by Tullow Oil. Google Refine is a powerful piece of software, however, it can quickly get bogged down with very large sets of data. This file was limited to one company for the purposes of this tutorial.
Step 2 - Upload your dataset to Google Refine
To get started, click ‘Create Project’. You will be presented with a number of options for data inputs. We will create a new project using data from ‘this computer.’ Select the file downloaded in the step above, and click next to start the process of uploading the dataset.
Step 3 - Add a new column to fetch location information
With the dataset uploaded, Google Refine will present a preview of the entries. Review the data and headers to make sure everything appears as it should. At the bottom of the window check that the ‘Parse next’ box is ticked so that the first row entries are parsed as column headers.
Click the ‘create project’ button in the upper right corner to proceed to the main working space of Google Refine. As noted above, we will be adding in additional geographic information to this dataset. To do so, click the triangle in the ‘Paid to’ column and navigate to ‘Edit column’ > ‘Add column by fetching URLs…’
A window will pop up as shown below. Name the column and enter in the following text into the ‘Expression’ box. (Click here to learn more about General Refine Expression Language)
Click ‘OK’ and the expression will produce a column containing what is essentially the output of a Google Maps application programming interface (API) query for each term in the ‘Paid to’ column. This operation will typically take several minutes to complete depending on the size of the dataset. While you wait for the process to complete you can experiment to get a better sense of how this function works. Enter the expression we just used, leaving off the last portion, into the address bar of another browser window:
Fill in the name of any location around the world after the “=” and you will see a page with all the relevant location information. This should give you a better sense of what is happening under the hood with the fetching URLs function in Google Refine.
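The same fetch-and-parse sequence can be sketched in Python. The URL pattern and JSON layout below follow the classic Google Maps Geocoding API as described at the time of this tutorial; the modern API requires an API key, so treat the exact request format and response shape as assumptions to verify against the current documentation:

```python
import json
from urllib.parse import urlencode


def build_geocode_url(place):
    """Build the kind of request URL Refine fetches for each 'Paid to' value."""
    return ("https://maps.googleapis.com/maps/api/geocode/json?"
            + urlencode({"address": place}))


def parse_lat_lng(response_text):
    """Pull the 'lat, lng' pair out of a Geocoding API JSON response,
    mirroring Refine's 'Add column based on this column' step."""
    data = json.loads(response_text)
    location = data["results"][0]["geometry"]["location"]
    return "{}, {}".format(location["lat"], location["lng"])


# A hand-written sample response in the API's typical shape (not a live result).
sample = json.dumps({
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 17.6078, "lng": 8.0817}}}],
})

print(build_geocode_url("Niger"))
print(parse_lat_lng(sample))  # 17.6078, 8.0817
```

Refine's fetch-then-parse workflow maps directly onto these two functions: the fetched column holds the raw JSON, and the second column applies the equivalent of `parse_lat_lng` to each cell.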
Step 4 - Add another column to parse the information from the previous step
Once the process has completed, you will see a column filled with a long string of text and numbers.
To clean this up we will add another column parsing through that data. Click on the triangle in the new column you created in Step 3 containing all the Google maps information, and select ‘Edit column’ > ‘Add column based on this column…’ Write in a title for this new column and enter in the following text into the ‘Expression’ box:
Click ‘OK’ and the new column will populate with a neat set of latitude and longitude coordinates, separated by a comma, derived from the data in the column we produced in Step 3.
Step 5 - Export your project
The final step is to click ‘Export’ in the upper right corner of the Google Refine window. Select ‘Comma-separated value’ or ‘Excel’ from the dropdown list of file types.
You can then open the exported file in a desktop application to delete the column containing the unparsed location information while leaving the second column we created that includes the latitude and longitude coordinates. Google Refine is ideal for refining, cleaning and adding to a dataset, but operations like deleting rows and columns should be done in programs like Excel.
While this post demonstrates how latitude and longitude coordinates can be derived from a country name, the exact same process can be carried out for any other location. If instead of country names the dataset contained the names of cities or provinces, the same steps can be used to obtain the latitude and longitude coordinates. Location information can help you to create persuasive maps and other visualizations of your data. To learn more about what can be done with extractives data and mapping, navigate to the training section of Extract-A-Fact.