Scoda openrefine-directordata

  • A recipe for grabbing director information from OpenCorporates using OpenRefine, given an OpenCorporates company ID or OpenCorporates company page URL. For more information, contact: schoolOfData.org
  • Here’s the sort of thing we’re starting with: a list of companies…
  • Here’s the sort of thing we want – lists of directors associated with each company (where that information is available).
  • The first step is to create a web address/URL to call the OpenCorporates API and ask it for data about a particular company. OpenRefine can create a new column populated with the contents of calls made to a URL contained in, or generated from, another column.
  • The URLs should take the form:
    http://api.opencorporates.com/companies/JURISDICTION/COMPANY_ID
    If you already have company page URLs in a column, add a column based on that column using:
    value.replace('http://','http://api.')
    If you have JURISDICTION/COMPANY_ID in a column, use the formula:
    "http://api.opencorporates.com/companies/"+value
  • The data comes back as JSON, which we will need to process. Each JSON result contains the data for a single company. The data relating to the directors can be found as a list down the path value.parseJson()['results']['company']['officers']
  • Let’s parse the JSON data and put the directors’ information into another column…
  • What we are aiming for is a list of the form:
    32866743::SIMON ALAN CONSTANT-GLEMAS::director::2010-04-07::null
    32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22
    32866745::ANDREW WILLIAM LONGDEN::director::2003-11-03::null
    …
    where we list director ID, name, position, appointment date, termination date.
  • This function will parse the data into a string of the form:
    32866743::SIMON ALAN CONSTANT-GLEMAS::director::2010-04-07::null||32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22||32866745::ANDREW WILLIAM LONGDEN::director::2003-11-03::null||…
    The function reads as follows: “for each officer, join their ID, name, position, start date and end date with ::, then join each of these director descriptions using ||”. The use of two different (and hopefully unique) delimiters means we can split the data on each delimiter type separately.
  • The parsed data is put into a new column in this combined list form.
  • We can then split the data so that we create a new row for each director using the delimiter we defined: ||
  • Note that values from the other columns will not be copied into any newly created rows – we will have to do that ourselves either now, or later.
  • For each director, we now want to split their details out across several columns, one for each data field (ID, name, position, appointment date, termination date).
  • We can do this by splitting on the other separator type we used: ::
  • The newly created columns are labeled with automatically generated names. It would probably make sense to rename them to something slightly more convenient.
  • Finally, we can do a little more tidying. For any columns we want to export, such as company name, or company ID, we can Fill down using the corresponding values from the original row the directors’ information was pulled from.
  • If you want to know more, contact us…
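
For readers who want to check the logic outside OpenRefine, the URL-building and packing steps described in the notes above can be sketched in Python. This is only an illustrative analogue of the GREL expressions, not part of the recipe itself. The function names api_url and pack_officers, the company ID gb/01234567, and the hand-built sample response are assumptions made for the example; the response shape is inferred from the path ['results']['company']['officers'] quoted in the slides, and no live API call is made.

```python
import json

API_PREFIX = "http://api.opencorporates.com/companies/"


def api_url(value):
    """Build an API URL from a company page URL or a JURISDICTION/COMPANY_ID string."""
    if value.startswith("http://"):
        # Analogue of the GREL expression value.replace('http://','http://api.')
        return value.replace("http://", "http://api.", 1)
    # Analogue of "http://api.opencorporates.com/companies/"+value
    return API_PREFIX + value


def pack_officers(response_text):
    """Join each officer's fields with '::' and the officers with '||'."""
    officers = json.loads(response_text)["results"]["company"]["officers"]
    records = []
    for v in officers:
        o = v["officer"]
        fields = [o["id"], o["name"], o["position"], o["start_date"], o["end_date"]]
        # A missing value is written as the text "null", as in the slide's example output.
        records.append("::".join("null" if f is None else str(f) for f in fields))
    return "||".join(records)


# Hypothetical company ID, used only to show the URL shape.
url_from_id = api_url("gb/01234567")
url_from_page = api_url("http://opencorporates.com/companies/gb/01234567")

# Hand-built sample response (assumed shape), reusing two directors from the slides.
sample = json.dumps({"results": {"company": {"officers": [
    {"officer": {"id": 32866743, "name": "SIMON ALAN CONSTANT-GLEMAS",
                 "position": "director", "start_date": "2010-04-07", "end_date": None}},
    {"officer": {"id": 32866744, "name": "KARIN JACQUELINE HAWKINS",
                 "position": "director", "start_date": "2006-01-17", "end_date": "2012-02-22"}},
]}}})
packed = pack_officers(sample)
print(url_from_id)
print(packed)
```

Both ways of building the URL should give the same address, and the packed string uses the same two delimiters as the OpenRefine column.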

Transcript

  • 1. Grabbing Director Data
  • 2. forEach(value.parseJson()['results']['company']['officers'], v, [v.officer.id, v.officer.name, v.officer.position, v.officer.start_date, v.officer.end_date].join('::')).join('||')
  • 3. (It would probably make sense to rename the newly created columns.)
  • 4. Starting with the first row, Fill down will fill blank cells in a column with the value in the preceding row… (so we can fill down company names and ID columns for each corresponding director)
  • 5. SchoolOfData.org
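
The splitting and Fill down behaviour described in the slides can also be mimicked in a few lines of Python, which may help when checking what the final table should look like. This is a rough sketch of the idea, not a description of how OpenRefine implements it. The names split_rows and fill_down, the column labels, and the company name EXAMPLE LTD are all made up for the example.

```python
FIELDS = ["id", "name", "position", "start_date", "end_date"]


def split_rows(company, packed):
    """Split on '||' into one row per director, then on '::' into columns.

    Only the first row keeps the company name; later rows get a blank
    (None), mirroring how newly created rows do not inherit other columns.
    """
    rows = []
    for i, record in enumerate(packed.split("||")):
        row = dict(zip(FIELDS, record.split("::")))
        row["company"] = company if i == 0 else None
        rows.append(row)
    return rows


def fill_down(rows, column):
    """Fill each blank cell in a column with the nearest non-blank value above it."""
    last = None
    for row in rows:
        if row[column] is None:
            row[column] = last
        else:
            last = row[column]
    return rows


packed = ("32866743::SIMON ALAN CONSTANT-GLEMAS::director::2010-04-07::null"
          "||32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22")

rows = split_rows("EXAMPLE LTD", packed)       # hypothetical company name
blank_before_fill = rows[1]["company"]         # None: not copied into the new row
rows = fill_down(rows, "company")
for row in rows:
    print(row["company"], row["name"], row["start_date"], row["end_date"])
```

Note that the termination date stays as the text "null" after splitting; converting it to a true blank would be a separate tidying step.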