Scoda openrefine-directordata

Usage rights: CC Attribution License

  • A recipe for grabbing director information from OpenCorporates using OpenRefine, given an OpenCorporates company ID or OpenCorporates company page URL. For more information, contact: schoolOfData.org
  • Here's the sort of thing we're starting with: a list of companies…
  • Here's the sort of thing we want: lists of directors associated with each company (where that information is available).
  • The first step is to create a web address (URL) to call the OpenCorporates API and ask it for data about a particular company. OpenRefine can create a new column populated with the contents of calls made to a URL contained in, or generated from, another column.
  • The URLs should take the form:
    http://api.opencorporates.com/companies/JURISDICTION/COMPANY_ID
    If you already have company page URLs in a column, add a column by fetching URLs based on that column, using the expression:
    value.replace('http://', 'http://api.')
    If you have JURISDICTION/COMPANY_ID values in a column, use the expression:
    "http://api.opencorporates.com/companies/" + value
  • The data comes back as JSON, which we will need to process. Each JSON result contains the data for a single company. The data relating to the directors can be found as a list at the path:
    value.parseJson()['results']['company']['officers']
  • Let's parse the JSON data and put the directors' information into another column…
  • What we are aiming for is a representation of the form:
    32866743::SIMON ALAN CONSTANT-GLEMAS::director::2010-04-07::null
    32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22
    32866745::ANDREW WILLIAM LONGDEN::director::2003-11-03::null
    …
    where we list director ID, name, position, appointment date and termination date.
  • This function will parse the data into a string of the form:
    32866743::SIMON ALAN CONSTANT-GLEMAS::director::2010-04-07::null||32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22||32866745::ANDREW WILLIAM LONGDEN::director::2003-11-03::null||…
    The function reads as follows: "for each officer, join their ID, name, position, start date and end date with ::, then join each of these director descriptions using ||". Using two different (and hopefully unique) delimiters means we can split the data on each delimiter type separately.
  • The parsed data is put into a new column in this combined list form.
  • We can then split the data so that we create a new row for each director, using the delimiter we defined: ||
  • Note that values from the other columns will not be copied into any newly created rows; we will have to do that ourselves, either now or later.
  • For each director, we now want to split their details out across several columns, one for each data field (ID, name, position, appointment date, termination date).
  • We can do this by splitting on the other separator type we used: ::
  • The newly created columns are labeled with automatically generated names. It would probably make sense to rename them to something more convenient.
  • Finally, we can do a little more tidying. For any columns we want to export, such as company name or company ID, we can use Fill down to copy in the corresponding values from the original row the directors' information was pulled from.
  • If you want to know more, contact us…
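The URL-building step in the recipe can be sketched in Python as a rough equivalent of its two GREL expressions. This is an illustration, not part of the original recipe. It assumes page URLs begin with http:// or https://, and the identifier gb/01234567 used below is hypothetical. The live OpenCorporates API may also expect https, a version prefix or an API token; this sketch does not handle any of those.

```python
import json
import urllib.request

# Base address as given in the recipe; the current API may differ (assumption).
API_BASE = "http://api.opencorporates.com/companies/"


def api_url_from_page_url(page_url: str) -> str:
    """Turn a company page URL into an API URL.

    Mirrors value.replace('http://', 'http://api.'), but also accepts https://.
    """
    for scheme in ("http://", "https://"):
        if page_url.startswith(scheme):
            return scheme + "api." + page_url[len(scheme):]
    raise ValueError(f"not an http(s) URL: {page_url!r}")


def api_url_from_id(jurisdiction_and_id: str) -> str:
    """Mirrors "http://api.opencorporates.com/companies/" + value."""
    return API_BASE + jurisdiction_and_id.strip("/")


def fetch_company_json(url: str) -> dict:
    """Fetch and decode one company record (performs a real network call)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)
```

For example, `api_url_from_id("gb/01234567")` builds `http://api.opencorporates.com/companies/gb/01234567`. In OpenRefine itself, the fetching is done by the "add column by fetching URLs" operation rather than by code like `fetch_company_json`.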
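The slides describe the officer-flattening function only in words ("join each officer's fields with ::, then join the officers with ||"), so here is a minimal Python sketch of the same idea. It is not the recipe's own GREL. It assumes that each item in the `officers` list wraps its fields in an `"officer"` object, and that the fields are named `id`, `name`, `position`, `start_date` and `end_date`. Missing values are written as the text `null` to match the example strings in the recipe.

```python
import json

# Assumed key names for the five fields listed in the recipe.
OFFICER_FIELDS = ("id", "name", "position", "start_date", "end_date")


def officers_to_string(raw_json: str) -> str:
    """Join each officer's fields with '::' and the officers with '||'."""
    company = json.loads(raw_json)["results"]["company"]
    descriptions = []
    for entry in company.get("officers") or []:
        # Assumption: each list item looks like {"officer": {...}}.
        officer = entry.get("officer", entry)
        fields = [officer.get(key) for key in OFFICER_FIELDS]
        # Render missing values as the literal text "null", as in the slides.
        descriptions.append("::".join("null" if f is None else str(f) for f in fields))
    return "||".join(descriptions)


sample = json.dumps({"results": {"company": {"officers": [
    {"officer": {"id": 32866743, "name": "SIMON ALAN CONSTANT-GLEMAS",
                 "position": "director", "start_date": "2010-04-07", "end_date": None}},
]}}})
print(officers_to_string(sample))
# 32866743::SIMON ALAN CONSTANT-GLEMAS::director::2010-04-07::null
```

Putting `::` inside each record and `||` between records keeps the two split steps independent, as long as neither delimiter occurs in a director's name or dates.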
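The remaining steps can be modelled on plain Python dictionaries: split into rows on ||, split into columns on ::, rename, then fill down. This is only a rough model of OpenRefine's behaviour. Rows created by the row split get blank cells in every other column, and `fill_down` then repairs them. The column names (`company_id`, `director_id` and so on) are hypothetical choices for the renaming step.

```python
from typing import Dict, List, Optional, Sequence

Row = Dict[str, Optional[str]]

# Hypothetical names for the renamed director columns.
DIRECTOR_COLUMNS = ("director_id", "director_name", "position", "appointed", "terminated")


def split_into_rows(rows: List[Row], column: str = "directors", sep: str = "||") -> List[Row]:
    """One row per value; only the first keeps the other columns' values."""
    result: List[Row] = []
    for row in rows:
        values = (row.get(column) or "").split(sep)
        for i, value in enumerate(values):
            # New rows start blank, like rows created by OpenRefine's split.
            new_row = dict(row) if i == 0 else {key: None for key in row}
            new_row[column] = value
            result.append(new_row)
    return result


def split_into_columns(rows: List[Row], column: str = "directors", sep: str = "::",
                       names: Sequence[str] = DIRECTOR_COLUMNS) -> List[Row]:
    """Replace the combined column with one named column per field."""
    result: List[Row] = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key != column}
        combined = row.get(column)
        parts = combined.split(sep) if combined else []
        # Pad short records with None so every row gets every column.
        parts += [None] * (len(names) - len(parts))
        new_row.update(zip(names, parts))
        result.append(new_row)
    return result


def fill_down(rows: List[Row], columns: Sequence[str]) -> List[Row]:
    """Copy the last non-blank value into blank cells below it (in place)."""
    last: Row = {}
    for row in rows:
        for col in columns:
            if row.get(col) in (None, ""):
                row[col] = last.get(col)
            else:
                last[col] = row[col]
    return rows
```

A typical chain is `fill_down(split_into_columns(split_into_rows(table)), ["company_id"])`, where `table` is a list of company rows carrying the combined `directors` string.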