This tutorial is the first in a series that will help you learn how to analyse data about the extractives industry using Python.
We start off with a few FAQs before diving into the tutorial.
What are Python, R, Jupyter Notebooks, and Pandas?
Python and R are two of the most widely used programming languages for data science.
Python and R are just languages. To write, test, and run code in them, you need a program called a development environment. For Python, one of the most popular development environments for data work is Jupyter Notebooks (which run right in a web browser), and for R, the most popular is RStudio.
On top of these languages, there are libraries that allow you to do specialised things. In Python, for example, Pandas is the most widely used library for data analysis.
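To give a flavour of what working with Pandas looks like, here is a minimal sketch. The country names and figures are made up purely for illustration and are not real extractives data; the column names are assumptions too.

```python
import pandas as pd

# Made-up figures for illustration only; not real production data.
production = pd.DataFrame(
    {
        "country": ["A", "A", "B", "B"],
        "year": [2014, 2015, 2014, 2015],
        "barrels_per_day": [100, 120, 80, 90],
    }
)

# Add up output per country across both years.
totals = production.groupby("country")["barrels_per_day"].sum()
print(totals)
```

A few lines are enough to load a small table and summarise it by group, which is the kind of task the rest of this series builds on.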
Why are they useful?
How are they better than Excel, Tableau, etc?
What is the difference between R and Python?
Python is a general-purpose programming language, which means that in addition to data science, you can use it to write almost anything, from web apps to online games. People with a background in programming usually find Python easier to learn than R.
R is a programming language made specifically for data science. As a result, it is not well suited to building things like games and apps, but it is excellent for working with data. People coming from a data analysis or statistics background usually find R easier to learn than Python.
When it comes to working with data, such as wrangling, analysing, and visualising, both Python and R work well for pretty much anything you can think of. The two languages are very different in style and syntax, but their functionality is very similar.
What do I need to get started?
For this tutorial, we will stick to Python and run it in an online version of Jupyter Notebooks, so you do not need to install anything on your computer. You can get started by going here: https://try.jupyter.org/
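Once you have a notebook open, a first cell along these lines is a quick way to check that everything runs. This is only a sketch: the payment figures are invented for the check and do not come from any dataset. Type it into a cell and press Shift+Enter to run it.

```python
import sys

# Show which version of Python the notebook is running.
print(sys.version)

# Invented payment amounts (in millions), used only to test the notebook.
payments = [12.5, 7.0, 3.5]
total = sum(payments)
print("Total payments:", total)
```

If the cell prints a version string followed by a total, the notebook is working and you are ready for the tutorial.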
To learn more, head over to GitHub for the full tutorial.