Scoda openrefine-directordata


Published in: Technology
  • A recipe for grabbing director information from OpenCorporates using OpenRefine, given an OpenCorporates company ID or OpenCorporates company page URL. For more information, contact:
  • Here’s the sort of thing we’re starting with: a list of companies…
  • Here’s the sort of thing we want – lists of directors associated with each company (where that information is available).
  • The first step is to create a web address/URL to call the OpenCorporates API and ask it for data about a particular company. OpenRefine can create a new column populated with the contents of calls made to a URL contained in, or generated from, another column.
  • The URLs should take the form: If you already have company page URLs in a column, add a column based on that column using: value.replace('http://','http://api.') If you have JURISDICTION/COMPANY_ID in a column, use the formula: ""+value
  • The data comes back as JSON, which we will need to process. Each JSON result contains the data for a single company. The data relating to the directors can be found as a list down the path value.parseJson()['results']['company']['officers']
  • Let’s parse the JSON data and put the directors’ information into another column…
  • What we are aiming for is a contrivance based on the form:
      32866743::SIMON ALAN CONSTANT-GLEMAS::director::2010-04-07::null
      32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22
      32866745::ANDREW WILLIAM LONGDEN::director::2003-11-03::null
      …
    where we list director ID, name, position, appointment date, termination date.
  • This function will parse the data into a string with the form:
      32866743::SIMON ALAN CONSTANT-GLEMAS::director::2010-04-07::null||32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22||32866745::ANDREW WILLIAM LONGDEN::director::2003-11-03::null||…
    The function reads as follows: “for each officer, join their ID, name, position, start date and end date with ::, then join each of these director descriptions using ||”. The use of two different (and hopefully unique) delimiters means we can split the data on each delimiter type separately.
  • The parsed data is put into a new column in this combined list form.
  • We can then split the data so that we create a new row for each director using the delimiter we defined: ||
  • Note that values from the other columns will not be copied into any newly created rows – we will have to do that ourselves either now, or later.
  • For each director, we now want to split their details out across several columns, one for each data field (ID, name, position, appointment date, termination date).
  • We can do this by splitting on the other separator type we used: ::
  • The newly created columns are labeled with automatically generated names. It would probably make sense to rename them to something slightly more convenient.
  • Finally, we can do a little more tidying. For any columns we want to export, such as company name, or company ID, we can Fill down using the corresponding values from the original row the directors’ information was pulled from.
  • If you want to know more, contact us…
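
For readers who want to check the URL and JSON steps outside OpenRefine, here is a minimal Python sketch of the same logic. It makes no network calls. The sample response is hypothetical: its shape is assumed from the path ['results']['company']['officers'] given in the notes, the id, name and start_date keys are assumptions based on the fields the notes list, and the company page URL is a made-up placeholder.

```python
import json


def api_url_from_page_url(page_url):
    """Turn a company page URL into an API URL by inserting the 'api.' host prefix."""
    return page_url.replace("http://", "http://api.")


def pack_officers(response_text):
    """Join each officer's fields with '::' and join the officers with '||'."""
    data = json.loads(response_text)
    officers = data["results"]["company"]["officers"]
    records = []
    for entry in officers:
        officer = entry["officer"]
        # Key names are assumptions (see the lead-in above).
        fields = [
            officer["id"],
            officer["name"],
            officer["position"],
            officer["start_date"],
            officer["end_date"],
        ]
        # Write missing values as the text 'null', as in the deck's example output.
        records.append("::".join("null" if f is None else str(f) for f in fields))
    return "||".join(records)


# Hypothetical response, using director details quoted in the notes.
sample_response = json.dumps({
    "results": {"company": {"officers": [
        {"officer": {"id": 32866743, "name": "SIMON ALAN CONSTANT-GLEMAS",
                     "position": "director", "start_date": "2010-04-07",
                     "end_date": None}},
        {"officer": {"id": 32866744, "name": "KARIN JACQUELINE HAWKINS",
                     "position": "director", "start_date": "2006-01-17",
                     "end_date": "2012-02-22"}},
    ]}}
})

# Placeholder company number, not a real company.
api_url = api_url_from_page_url("http://opencorporates.com/companies/gb/00000000")
packed = pack_officers(sample_response)
print(api_url)
print(packed)
```

Using two different delimiters keeps the packed string reversible: splitting on || recovers one record per director, and splitting each record on :: recovers the individual fields.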

    1. Grabbing Director Data
    2. forEach(value.parseJson()['results']['company']['officers'], v, [v.officer.id, v.officer.name, v.officer.position, v.officer.start_date, v.officer.end_date].join('::')).join('||')
    3. (It would probably make sense to rename the newly created columns.)
    4. Starting with the first row, Fill down will fill blank rows in a column with the value in the preceding row… (so we can fill down company names and ID columns for each corresponding director)
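
The three tidy-up steps (split into rows on ||, fill down, split into columns on ::) can be sketched in Python as below. This is an illustration of the idea under stated assumptions, not a description of how OpenRefine implements these operations: rows are modelled as plain dicts, and the column names in FIELDS, the "company" and "directors" keys, and the company names are all hypothetical.

```python
# Hypothetical names for the columns produced by the final split.
FIELDS = ["officer_id", "name", "position", "start_date", "end_date"]


def split_into_rows(rows, column, sep="||"):
    """Create one row per packed record; new rows leave the other columns blank."""
    out = []
    for row in rows:
        parts = [p for p in (row[column] or "").split(sep) if p] or [""]
        for i, part in enumerate(parts):
            # Only the first row keeps the original values of the other columns.
            new_row = dict(row) if i == 0 else {key: "" for key in row}
            new_row[column] = part
            out.append(new_row)
    return out


def fill_down(rows, column):
    """Fill blank cells in a column with the nearest non-blank value above."""
    last = ""
    for row in rows:
        if row[column] == "":
            row[column] = last
        else:
            last = row[column]
    return rows


def split_into_columns(rows, column, sep="::"):
    """Replace the packed column with one column per field."""
    for row in rows:
        values = row.pop(column).split(sep)
        values += [""] * (len(FIELDS) - len(values))  # pad short records
        row.update(zip(FIELDS, values))
    return rows


table = [
    {"company": "EXAMPLE LTD",
     "directors": "32866743::SIMON ALAN CONSTANT-GLEMAS::director::2010-04-07::null"
                  "||32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22"},
    {"company": "OTHER LTD",
     "directors": "32866745::ANDREW WILLIAM LONGDEN::director::2003-11-03::null"},
]

table = split_into_rows(table, "directors")
table = fill_down(table, "company")
table = split_into_columns(table, "directors")
for row in table:
    print(row)
```

Filling down before or after the column split gives the same result here; the point is that the company columns have to be repopulated explicitly, because the row split does not copy them.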