Accessing Aleph’s wealth of information requires some understanding of how best to use the search engine. While we are working hard to make the two million documents inside Aleph as easy to search as possible, I have already given many examples in a previous blog post of why you should…
1) Be exact when selecting search terms
2) Narrow down search results
3) Use Aleph to find the particular, and not the common
4) Use the language of companies
5) Be creative and playful
Here I would like to add to this list of tips and tricks, highlight new features of Aleph’s latest release and suggest a few more interesting search examples. So let me continue with…
6) Sorting by filing date
The latest release of Aleph allows you to sort any search result by “relevance”, “newest” or “oldest”. We chose “relevance” as the default; it is determined by both the number of search matches inside a document and the document’s filing date. Yet there are many scenarios in which you might want to find the most recent document. Let’s say you are interested in the latest on the Jubilee field in Ghana: a quick search for the name of the project, sorted by newest, will let you find recent production figures, planned infrastructure developments in nearby blocks, and even gross sales volumes for the past months.
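The three sort orders can be sketched in a few lines of Python. This is a minimal illustration, not Aleph’s code: the `Document` record, its fields and the sample titles are hypothetical, and because the post does not say how Aleph weighs match counts against filing dates, the “relevance” branch simply ranks by number of matches and uses the filing date to break ties.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    # Hypothetical search-result record; Aleph's real data model may differ.
    title: str
    filing_date: date
    matches: int  # how often the search terms occur in the document


def sort_results(docs, order="relevance"):
    """Return a new list of documents sorted by "relevance", "newest" or "oldest"."""
    if order == "newest":
        return sorted(docs, key=lambda d: d.filing_date, reverse=True)
    if order == "oldest":
        return sorted(docs, key=lambda d: d.filing_date)
    if order == "relevance":
        # Assumed scoring: more matches first; among equal counts, newer filings first.
        return sorted(docs, key=lambda d: (d.matches, d.filing_date), reverse=True)
    raise ValueError(f"unknown sort order: {order!r}")


results = [
    Document("Operational update", date(2014, 3, 1), 5),
    Document("Quarterly report", date(2015, 6, 1), 2),
    Document("Prospectus", date(2013, 1, 1), 5),
]
for doc in sort_results(results, "newest"):
    print(doc.filing_date, doc.title)
```

Sorting by a tuple keeps the tie-breaking rule explicit; a real engine would more likely compute a single blended score.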
7) Filtering by company or filing type
Now suppose you are only interested in a particular company. In that case it helps to filter Aleph’s search results so that they only contain documents from that company. Let’s say you only want to read about one of Jubilee’s shareholders, Kosmos Energy. All you have to do is click on the “company facet” in the right sidebar and select Kosmos from the list of all companies that have documents in Aleph matching your search terms. If this still doesn’t bring you to what you are searching for, for example Kosmos’ latest annual report (10-K filing), try filtering by filing type as well. When doing so, keep in mind that filing type names vary not only between stock exchanges, but also from year to year. In other words, keep an eye out for the different names a filing type can go by.
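Combining a company filter with a filing-type filter that tolerates several names for the same kind of filing might look like the sketch below. The `Filing` record and sample rows are hypothetical, and the alias set is an illustrative assumption (it mixes SEC form names such as 10-K and 20-F with a generic label), not a list taken from Aleph.

```python
from dataclasses import dataclass


@dataclass
class Filing:
    # Hypothetical search-result record for illustration only.
    company: str
    filing_type: str
    title: str


# Illustrative names an annual report might be filed under on different
# exchanges or in different years; extend it for the filings you care about.
ANNUAL_REPORT_TYPES = {"10-k", "10-k405", "20-f", "annual report"}


def filter_results(results, company=None, filing_types=None):
    """Keep results from `company` (if given) whose type is in `filing_types` (if given)."""
    wanted = {t.lower() for t in filing_types} if filing_types else None
    kept = []
    for r in results:
        if company is not None and r.company != company:
            continue
        # Compare case-insensitively so "Annual Report" and "annual report" both match.
        if wanted is not None and r.filing_type.lower() not in wanted:
            continue
        kept.append(r)
    return kept


results = [
    Filing("Kosmos Energy", "10-K", "Annual report"),
    Filing("Kosmos Energy", "8-K", "Current report"),
    Filing("Example Oil plc", "Annual Report", "Annual report and accounts"),
]
print(filter_results(results, company="Kosmos Energy", filing_types=ANNUAL_REPORT_TYPES))
```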
8) Continue to search within documents
Another tip for making the best use of Aleph is to continue searching within documents. This new feature is made possible by Aleph’s built-in document viewer, which loads once you open a particular search result. Suppose you have identified a document that matches your interest; let’s stick with the example of Kosmos’ annual report. How do you now find what matters to you within the document? Let’s say you want to know which corporate risks Kosmos mentioned. All you have to do is type “risks” into the search bar in the right corner, which will direct you to the various mentions of risks, and in particular to the so-called risks section. This method lets you put your initial findings into the context of the whole document.
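Searching within a single document boils down to finding every occurrence of a term and showing it with some surrounding text. The sketch below does a plain case-insensitive substring search, so “risk” also matches “risks”; the sample text is made up, and this is not how Aleph’s viewer is actually implemented.

```python
import re


def find_mentions(text, term, context=30):
    """Return (offset, snippet) pairs for each case-insensitive occurrence of `term`."""
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    hits = []
    for m in pattern.finditer(text):
        # Show a little surrounding text so each hit can be read in context.
        start = max(m.start() - context, 0)
        end = min(m.end() + context, len(text))
        hits.append((m.start(), text[start:end]))
    return hits


report = "Risk Factors. Our operations face many risks. Political risk in Ghana is one."
for offset, snippet in find_mentions(report, "risk", context=10):
    print(offset, repr(snippet))
```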
9) Keep it simple
Last but not least, keep in mind that Aleph only returns documents that exactly match your query. If there is a typo in your query, for example, Aleph will not lead you to what you are looking for. Also, Aleph is not designed to be a Google alternative where you can enter a question like “what are the gold reserves of Ghana?”. Instead, keep it simple. Enter the terms “gold”, “reserves” and “Ghana”, and Aleph will point you to the resource and reserve statements of all listed companies that are active in Ghana.
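To make the “keep it simple” advice concrete, the sketch below strips a question down to its key terms and then applies exact, whole-word matching, so a misspelled term finds nothing. The stop-word list and sample document are assumptions for illustration; Aleph’s own query handling is not described in the post.

```python
import re

# Small illustrative stop-word list (an assumption, not Aleph's).
STOP_WORDS = {"what", "which", "how", "much", "are", "is", "the", "of", "in", "a", "an"}


def words(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def to_keywords(question):
    """Turn a natural-language question into a short list of search terms."""
    return [w for w in words(question) if w not in STOP_WORDS]


def matches_all(document_text, keywords):
    """Exact, whole-word matching: every keyword must appear in the document."""
    present = set(words(document_text))
    return all(k.lower() in present for k in keywords)


doc = "Mineral resource and reserve statement: gold reserves at our Ghana operations."
terms = to_keywords("What are the gold reserves of Ghana?")
print(terms)
print(matches_all(doc, terms))
print(matches_all(doc, ["gold", "reserves", "ghanna"]))  # a typo finds nothing
```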
Keep these tips and tricks in mind, and you may be surprised by what you find.
By David Mihalyi and Chris Perry, Natural Resource Governance Institute
This post originally appeared on www.resourcegovernance.org on April 1, 2016
NRGI is excited to launch the public alpha version of ResourceProjects.org.
ResourceProjects.org is an open-source repository of data on oil, gas and mining projects across the world. It provides a platform to collect, display, download and search extractive project information using open data. It aims to harvest data on project-by-project payments to governments—based on recent mandatory disclosure legislation in the EU, U.S. and Canada as well as EITI reports—and link it to associated information about the project from a variety of sources. The platform will make it easier for journalists, CSOs, researchers and government officials to search, access and download relevant data.
As we continue to develop the platform and connect it to new data sources, we are inviting contributors and collaborators to get involved.
Why does project-level data matter?
Projects are the physical, tangible presence of extractive operations in a country. A project is the mine that people see out of their window or the oil field along their coastline. But a project also has a concession area where it is located, one or more participating companies, contract documents detailing their obligations, and payment information that gives insight into their economic contribution.
Governments and citizens’ groups can also use project data to model revenues and forecast budgets, as in Ghana, where all interested parties could see how different oil prices affected the money available for the budget. Others, such as CCSI, Global Witness and Open Oil, have modelled contracts to evaluate extractive deals, while IMF economists routinely use project-level information for fiscal design and technical assistance using their publicly available FARI model. Project information has many applications beyond fiscal modelling. It can be tied to spatial data to help better understand local impacts or environmental consequences, as highlighted by recent academic papers.
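As a rough illustration of the price-scenario modelling mentioned above, the sketch below computes a government’s take from one oil field under different oil prices. The two-instrument fiscal regime (a royalty plus a profit tax) and every figure in it are hypothetical placeholders, not any real country’s terms; real tools such as the IMF’s FARI model are far more detailed.

```python
def government_revenue(price, barrels, royalty_rate, tax_rate, cost_per_barrel):
    """Royalty plus profit tax from one field in one year (highly simplified)."""
    gross = price * barrels
    royalty = gross * royalty_rate
    # Taxable profit after royalty and operating costs; no tax is paid on losses.
    profit = max(gross - royalty - cost_per_barrel * barrels, 0.0)
    return royalty + profit * tax_rate


# Placeholder terms for illustration only.
for price in (40, 60, 80):
    revenue = government_revenue(price, barrels=1_000_000, royalty_rate=0.05,
                                 tax_rate=0.35, cost_per_barrel=20)
    print(f"${price}/bbl -> ${revenue:,.0f}")
```

Running the same function over a range of prices is the simplest form of the sensitivity analysis described in the Ghana example.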
Why did NRGI build this tool?
Information on extractive projects is scattered across different company and government websites, EITI reports, and databases compiled by regulators, international organizations and civil society. It comes in multiple formats: PDFs, spreadsheets and computer-queryable databases. These sources are rarely linked to each other at all.
ResourceProjects.org brings this information into one place. We are also working on linking the data gathered to other repositories on related entities, such as OpenCorporates for associated companies, ResourceContracts for oil and mining contracts, and Open Oil’s concession map. All information on the platform is stored with details on what source it came from and how it was retrieved. By bringing this information together in a standardized and accessible format, we are allowing users to explore extractive projects with greater depth.
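The sketch below shows one way to keep provenance attached to every data point while linking records about the same project from different sources. The record layout, the name-normalization rule, the sample values and the example.org URLs are all hypothetical; this is not ResourceProjects.org’s actual schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Source:
    # Where a data point came from and how it was retrieved (hypothetical fields).
    name: str
    url: str
    retrieved_on: date
    method: str


@dataclass(frozen=True)
class ProjectFact:
    project: str
    attribute: str
    value: str
    source: Source


def link_by_project(facts):
    """Group facts from different sources under a normalized project name."""
    linked = {}
    for fact in facts:
        key = " ".join(fact.project.lower().split())  # "example  field " -> "example field"
        linked.setdefault(key, []).append(fact)
    return linked


eiti = Source("EITI report", "https://example.org/eiti.pdf", date(2016, 1, 15), "manual entry")
company = Source("Company disclosure", "https://example.org/payments.csv", date(2016, 3, 2), "CSV import")
facts = [
    ProjectFact("Example Field", "operator", "Example Oil plc", company),
    ProjectFact("example  field ", "royalties paid", "1,000,000", eiti),
]
for fact in link_by_project(facts)["example field"]:
    print(fact.attribute, "=", fact.value, "| from", fact.source.name, "via", fact.source.method)
```

Because each value carries its own `Source`, merging records never loses track of where a figure came from.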
How to get involved?
We are now looking for people who are interested in getting involved with the site. By the end of April, we will have added the U.K. company disclosures that are starting to be released. Beyond the U.K., many companies are beginning to release project-by-project tax payment data. We would welcome any organisations or individuals who wish to lead on sourcing data from upcoming mandatory disclosures in specific countries.
Additionally, we are inviting feedback, as well as collaborators interested in helping develop the site and its content. Further features and enhancements will be rolled out in the coming weeks, and we are looking for partners who want to get more closely involved.
Finally, we are seeking to support the growing community of data users. Please sign up to the ResourceProjects mailing list if you want to keep up to date with what’s happening and how different organizations are using project-level information for improving resource governance.
If you are interested in getting involved, please contact NRGI economic analyst David Mihalyi at email@example.com.
David Mihalyi is an economic analyst and Chris Perry is an open data analyst with NRGI.
With new disclosure laws in effect in the European Union, Norway, Canada, and now the United States, there has never been a better time to be on the front lines of the fight for transparency and accountability in the extractive industries. Project-level payments are being disclosed by oil and gas companies such as BP, Statoil and Shell. However, as any seasoned data extractor knows, this information is often released in PDF format, which makes it difficult to transfer the data tables into a spreadsheet application where they can be put to use. While companies listed and registered in the UK are required to disclose this information in XML format (company reports are made available for download as .csv files), not all countries require this, which can make data extraction an arduous task.
Tabula is a powerful and extremely useful open-source web application for extracting data locked in tables in PDF documents. Similar to the Google Scraper application we introduced in an earlier post, Tabula can expedite the process of getting information into a usable format when copying and pasting is not an option.
NOTE: Tabula only works on text-based PDF documents (including scans that have been processed with optical character recognition, or OCR), not on image-based documents. Put simply, a PDF with a text layer has content that is searchable and interpretable by software. Even if you don’t know which type your document is, Tabula will prevent you from uploading the wrong kind.
Tabula can be downloaded at tabula.technology and works on Windows and Mac. Follow the instructions on the page, and make sure you have Java installed.
Once you have Tabula installed, double click the "tabula" application file in the Tabula folder and it will open up to a page in your web browser.
NOTE: a command prompt window will open and run for a few seconds before the page opens in the browser.
The first step is to import into Tabula the PDF from which you want to extract data. For the remainder of this post, we will be using the BHP Billiton Economic contribution and payments to governments Report 2015, which the company released voluntarily in September 2015. Some company reports include this information in .csv files, but so far BHP Billiton has only provided PDF reports.
If we try a simple copy and paste operation into a spreadsheet application with any of the tables in the document, we will find that all the information is imported into a single column or a single cell.
Download the report from the company site linked above, then use the “Browse” button in the Tabula page to find the saved PDF file. Select it and click “Import”.
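Tabula is the right tool for this job, but when a table pasted straight from a PDF lands in a single column, a quick fallback is sometimes possible: if the pasted text still has at least two spaces between columns, those gaps can be used to split it back into cells. This is a sketch under that assumption only (whether the spacing survives the copy depends on the PDF), and the sample figures are made up.

```python
import csv
import io
import re


def split_pasted_rows(pasted_text):
    """Split each pasted line into cells wherever two or more spaces appear."""
    rows = []
    for line in pasted_text.splitlines():
        line = line.strip()
        if line:  # skip blank lines
            rows.append(re.split(r"\s{2,}", line))
    return rows


def to_csv(rows):
    """Write rows as CSV text that a spreadsheet application can open."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


pasted = """Country    Taxes    Royalties
Example Country    1,234    567
"""
print(to_csv(split_pasted_rows(pasted)))
```

The `csv` writer quotes cells such as "1,234" automatically, so thousands separators do not create extra columns.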
For today’s post we are sharing an easy way to see the full list of mandatory disclosure reports submitted to the Companies House Extractives Service in the United Kingdom (UK). This site houses the project-level payments reports filed in compliance with the UK implementation of the EU Accounting and Transparency Directives, which require oil, gas and mining companies listed on a UK stock exchange (or large companies incorporated in the UK) to report their project-level payments to governments.
International oil majors BP and Shell, as well as mining giant Rio Tinto, have now made their reports publicly available on the UK register in a machine-readable format (Excel files). However, there is a slight problem with how the Companies House site presents the reports. When you navigate to the page, you are met with the following:
As you can see above, there is nothing more than a search tool and no way to browse the list of companies that have filed reports. To find a company report, you need to search for that company’s name specifically.
Fortunately, we have found a way to access the full list of companies that have reported!
There are two ways to do this:
Inputting this search query gives you a list of the currently available reports in no discernible order. There are dates attached to each report but you will have to click on each company to find them:
Mandatory disclosure reporting is new, and it is important that companies report at the appropriate level of granularity and that governments make these disclosures easily accessible and usable for the public. The best practice for presenting mandatory disclosure data is to provide machine-readable files, as the UK site does. This allows stakeholders to skip the steps of scraping and cleaning the data that were necessary with the Total report, which was filed under France’s transposition of the EU Directives.
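To show what machine-readable files make possible, here is a sketch that totals payments per government in a few lines once a report has been saved as CSV. The column names “Government” and “Amount” and the sample rows are assumptions for illustration; check the headers of the actual file, which may differ, and note that Excel downloads would first need to be saved as CSV.

```python
import csv
import io
from collections import defaultdict


def total_by_government(csv_text, government_col="Government", amount_col="Amount"):
    """Sum payment amounts per receiving government from a CSV export."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Strip thousands separators such as "1,000" before converting.
        totals[row[government_col]] += float(row[amount_col].replace(",", ""))
    return dict(totals)


sample = """Government,Project,Payment type,Amount
Government of A,Project 1,Taxes,"1,000"
Government of A,Project 2,Royalties,250
Government of B,Project 3,Taxes,400
"""
print(total_by_government(sample))
```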
In a future post we will describe the process for accessing extractives data in the countries where reports are available. In the meantime, information on where to find company disclosure data can be found on our resources page.