Extract-A-Fact
  • Blog
  • Training
  • Data Tools
  • Maps
  • About
A project of Publish What You Pay - United States

Digging Deep into Oil, Gas, and Mining Data

Introduction to Python

6/9/2017

By Yan Naung Oak, Phandeeyar
The full version of this tutorial is available on GitHub

This tutorial is the first in a series that will help you learn how to analyse data about the extractives industry using Python.

We start off with a few FAQs before diving into the tutorial.

What are Python, R, Jupyter Notebooks, and Pandas?
Python and R are the most widely used programming languages for data science.

Python and R are only languages. To use them, you need a Development Environment: a program that lets you write, test, and run code in those languages. For Python, the most popular Development Environment is Jupyter Notebooks (which runs in a web browser), and for R, it is RStudio.

On top of these languages, there are libraries that allow you to do specialised things. In Python, for example, Pandas is the most widely used library for data analysis.
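To give a sense of what working with a library looks like, here is a minimal sketch that uses Pandas to add up payments per country. The table, its column names, and all the numbers are made up for illustration, and the sketch assumes Pandas is already installed in your environment.

```python
import pandas as pd

# Made-up payment records, for illustration only (not real data).
payments = pd.DataFrame(
    {
        "country": ["Country A", "Country A", "Country B", "Country B"],
        "payment_type": ["Royalties", "Taxes", "Royalties", "Taxes"],
        "amount_usd": [120.0, 300.0, 80.0, 150.0],
    }
)

# Group the rows by country and sum the amounts in each group.
totals = payments.groupby("country")["amount_usd"].sum()
print(totals)
```

The grouping and summing take a single line, which is the kind of shortcut a data analysis library provides.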

Why are they useful?
  • Used by professional data scientists
  • Open source
  • Huge communities that are constantly improving them
  • You can do pretty much anything you want to do with your data using R and Python

How are they better than Excel, Tableau, etc.?
  • Everything you can do in Excel, Tableau, or other applications, you can probably do with Python and R
  • Python and R are both free to use, unlike Excel, Tableau, etc.
  • Repetitive tasks that get tedious in point-and-click software can be automated. For example, making 100 different charts by hand in Excel takes a very long time. With a script, you write the steps once and can run them as many times as you want
  • Because you are writing commands for every step, you have a record of exactly what you did, and that makes it easier to see if you have made any mistakes
  • It is easier to share your work with others, and to incorporate scripts that others have written into your work
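The point about automation can be sketched in a few lines of plain Python. This example builds a short text report per country rather than a chart, to keep it free of extra libraries. The records, the `build_report` function, and the country names are all hypothetical.

```python
from collections import defaultdict

# Made-up records (country, year, amount in USD), for illustration only.
records = [
    ("Country A", 2015, 100.0),
    ("Country A", 2016, 140.0),
    ("Country B", 2015, 60.0),
    ("Country B", 2016, 90.0),
]


def build_report(country, rows):
    """Return a short text summary for one country."""
    total = sum(amount for _, amount in rows)
    lines = [f"Report for {country}"]
    for year, amount in sorted(rows):
        lines.append(f"  {year}: {amount:.1f} USD")
    lines.append(f"  Total: {total:.1f} USD")
    return "\n".join(lines)


# Group the rows by country.
rows_by_country = defaultdict(list)
for country, year, amount in records:
    rows_by_country[country].append((year, amount))

# The same function runs once per country, whether there are 2 or 100.
reports = {c: build_report(c, rows) for c, rows in rows_by_country.items()}
for text in reports.values():
    print(text)
```

Adding more countries to `records` produces more reports without any change to the code, and the script itself is the record of each step that was taken.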

What is the difference between R and Python?
Python is a general purpose programming language, which means that in addition to doing data science, you can write anything in Python ranging from web apps to online games. People with a background in programming usually find Python to be easier to learn than R.

R is a programming language made specifically for data science. As a result, it is not well suited to writing things like games and apps, but it is excellent for working with data. People coming from a data analysis or statistics background usually find R to be easier to learn than Python.

In terms of working with data, such as wrangling, analysing, and visualising, both Python and R work well for pretty much anything you can think of. The two languages are very different in terms of style and syntax, but their functionality is very similar.

What do I need to get started?
This tutorial sticks to Python, run in an online version of Jupyter Notebooks that does not require installing anything on your computer. You can get started by going here: https://try.jupyter.org/
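Once a notebook is open, you type code into a cell and run it. As a rough idea of what a first cell might contain, here is a small sketch that is not taken from the full tutorial. The payment amount and the 5% rate are made-up values.

```python
# Variables hold values; both numbers here are made up for the example.
payment_usd = 1500000
rate = 0.05  # hypothetical royalty rate of 5%

# Python works like a calculator on the variables.
royalty = payment_usd * rate

# An f-string inserts the result into text, with a thousands separator.
message = f"Royalty due: {royalty:,.0f} USD"
print(message)
```

Changing either number and running the cell again updates the result straight away.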

To learn more, head over to GitHub for the full tutorial.

​

Copyright 2018
Publish What You Pay - United States
Banner photo by Daniel Sallai, available under a Creative Commons license.