Scoda openrefine-directordata


Transcript

  • 1. A recipe for grabbing director information from OpenCorporates using OpenRefine, given an OpenCorporates company ID or OpenCorporates company page URL. For more information, contact: schoolOfData.org
  • 2. Here's the sort of thing we're starting with: a list of companies…
  • 3. Here's the sort of thing we want: lists of directors associated with each company (where that information is available).
  • 4. The first step is to create a web address/URL to call the OpenCorporates API and ask it for data about a particular company. OpenRefine can create a new column populated with the contents of calls made to a URL contained in, or generated from, another column.
  • 5. The URLs should take the form:
    http://api.opencorporates.com/companies/JURISDICTION/COMPANY_ID
    If you already have company page URLs in a column, add a column based on that column using:
    value.replace('http://', 'http://api.')
    If you have JURISDICTION/COMPANY_ID in a column, use the formula:
    "http://api.opencorporates.com/companies/" + value
  • 6. The data comes back as JSON, which we will need to process. Each JSON result contains the data for a single company. The data relating to the directors can be found as a list down the path value.parseJson()['results']['company']['officers']
  • 7. Let's parse the JSON data and put the directors' information into another column…
  • 8. What we are aiming for is a contrivance based on the form:
    32866743::SIMON ALAN CONSTANT-GLEMAS::director::2010-04-07::null
    32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22
    32866745::ANDREW WILLIAM LONGDEN::director::2003-11-03::null
    …
    where we list director ID, name, position, appointment date, termination date.
  • 9. This function will parse the data into a string with the form:
    32866743::SIMON ALAN CONSTANT-GLEMAS::director::2010-04-07::null||32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22||32866745::ANDREW WILLIAM LONGDEN::director::2003-11-03::null||…
    The function reads as follows: "for each officer, join their ID, name, position, start date and end date with ::, then join each of these director descriptions using ||". The use of two different, and hopefully unique, delimiters means we can split the data on each delimiter type separately.
  • 10. The parsed data is put into a new column in this combined list form.
  • 11. We can then split the data so that we create a new row for each director using the delimiter we defined: ||
  • 12. Note that values from the other columns will not be copied into any newly created rows. We will have to do that ourselves, either now or later.
  • 13. For each director, we now want to split their details out across several columns, one for each data field (ID, name, position, appointment date, termination date).
  • 14. We can do this by splitting on the other separator type we used: ::
  • 15. The newly created columns are labeled with automatically generated names. It would probably make sense to rename them to something slightly more convenient.
  • 16. Finally, we can do a little more tidying. For any columns we want to export, such as company name or company ID, we can Fill down using the corresponding values from the original row the directors' information was pulled from.
  • 17. If you want to know more, contact us…
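
For readers who want to check the logic outside OpenRefine, the whole recipe (build the API URL, pull the officers list out of the JSON, join with :: and ||, then split back into rows and columns with the company ID filled down) can be sketched in Python. This is an illustrative analogue, not the GREL expression shown on the slides. The function names, the company ID gb/00000000, and the sample payload are hypothetical; only the ['results']['company']['officers'] path comes from the slides, and the inner field names (officer, id, name, position, start_date, end_date) are assumptions about the API response shape. No network call is made.

```python
import json

API_BASE = "http://api.opencorporates.com/companies/"


def api_url_from_page_url(page_url):
    """Turn a company page URL into an API URL (slide 5, first formula)."""
    return page_url.replace("http://", "http://api.", 1)


def api_url_from_id(jurisdiction_and_id):
    """Build an API URL from JURISDICTION/COMPANY_ID (slide 5, second formula)."""
    return API_BASE + jurisdiction_and_id


def officers_to_string(raw_json):
    """Join each officer's fields with '::' and the officers with '||' (slide 9)."""
    officers = json.loads(raw_json)["results"]["company"]["officers"]
    records = []
    for item in officers:
        officer = item["officer"]  # assumed wrapper key
        fields = [officer.get(key) for key in
                  ("id", "name", "position", "start_date", "end_date")]
        # Missing values are written as the literal text "null", as on slide 8.
        records.append("::".join("null" if f is None else str(f) for f in fields))
    return "||".join(records)


def explode_rows(company_id, combined):
    """Split on '||' into rows, then on '::' into columns (slides 11 to 16)."""
    rows = []
    for record in combined.split("||"):
        if not record:
            continue  # skip empty pieces, e.g. from a trailing delimiter
        officer_id, name, position, start_date, end_date = record.split("::")
        rows.append({
            "company_id": company_id,  # the "fill down" step
            "officer_id": officer_id,
            "name": name,
            "position": position,
            "start_date": start_date,
            "end_date": end_date,
        })
    return rows


# Hypothetical payload built from the director examples on slide 8.
SAMPLE = json.dumps({"results": {"company": {"officers": [
    {"officer": {"id": 32866743, "name": "SIMON ALAN CONSTANT-GLEMAS",
                 "position": "director", "start_date": "2010-04-07",
                 "end_date": None}},
    {"officer": {"id": 32866744, "name": "KARIN JACQUELINE HAWKINS",
                 "position": "director", "start_date": "2006-01-17",
                 "end_date": "2012-02-22"}},
    {"officer": {"id": 32866745, "name": "ANDREW WILLIAM LONGDEN",
                 "position": "director", "start_date": "2003-11-03",
                 "end_date": None}},
]}}})

if __name__ == "__main__":
    print(api_url_from_id("gb/00000000"))
    combined = officers_to_string(SAMPLE)
    print(combined)
    for row in explode_rows("gb/00000000", combined):
        print(row)
```

The two-delimiter design only works if neither :: nor || appears inside a field value, which is the same "hopefully unique" caveat made on slide 9.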