Scoda openrefine-directordata
Published in: Technology, Business

Transcript

  • 1. A recipe for grabbing director information from OpenCorporates using OpenRefine, given an OpenCorporates company ID or OpenCorporates company page URL. For more information, contact: schoolOfData.org
  • 2. Here's the sort of thing we're starting with: a list of companies…
  • 3. Here's the sort of thing we want: lists of directors associated with each company (where that information is available).
  • 4. The first step is to create a web address/URL to call the OpenCorporates API and ask it for data about a particular company. OpenRefine can create a new column populated with the contents of calls made to a URL contained in, or generated from, another column.
  • 5. The URLs should take the form:
       http://api.opencorporates.com/companies/JURISDICTION/COMPANY_ID
       If you already have company page URLs in a column, add a column based on that column using:
       value.replace('http://','http://api.')
       If you have JURISDICTION/COMPANY_ID in a column, use the formula:
       "http://api.opencorporates.com/companies/"+value
  • 6. The data comes back as JSON, which we will need to process. Each JSON result contains the data for a single company. The data relating to the directors can be found as a list down the path value.parseJson()['results']['company']['officers']
  • 7. Let's parse the JSON data and put the directors' information into another column…
  • 8. What we are aiming for is a contrivance based on the form:
       32866743::SIMON ALAN CONSTANT-GLEMAS::director::2010-04-07::null
       32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22
       32866745::ANDREW WILLIAM LONGDEN::director::2003-11-03::null
       …
       where we list director ID, name, position, appointment date, termination date.
  • 9. This function will parse the data into a string with the form:
       32866743::SIMON ALAN CONSTANT-GLEMAS::director::2010-04-07::null||32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22||32866745::ANDREW WILLIAM LONGDEN::director::2003-11-03::null||…
       The function reads as follows: "for each officer, join their ID, name, position, start date and end date with ::, then join each of these director descriptions using ||".
       The use of two different (and hopefully unique) delimiters means we can split the data on each delimiter type separately.
  • 10. The parsed data is put into a new column in this combined list form.
  • 11. We can then split the data so that we create a new row for each director using the delimiter we defined: ||
  • 12. Note that values from the other columns will not be copied into any newly created rows; we will have to do that ourselves, either now or later.
  • 13. For each director, we now want to split their details out across several columns, one for each data field (ID, name, position, appointment date, termination date).
  • 14. We can do this by splitting on the other separator type we used: ::
  • 15. The newly created columns are labeled with automatically generated names. It would probably make sense to rename them to something slightly more convenient.
  • 16. Finally, we can do a little more tidying. For any columns we want to export, such as company name or company ID, we can Fill down using the corresponding values from the original row the directors' information was pulled from.
  • 17. If you want to know more, contact us…
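
For readers who want to check the logic of slides 5 to 9 outside OpenRefine, the following is a minimal Python sketch of the same steps, not the expression used on the slides. The officer field names (`id`, `name`, `position`, `start_date`, `end_date`), the `officer` wrapper object, the company ID `gb/00000000`, and the sample response are all assumptions made for illustration; check them against a real API response before relying on them.

```python
import json

API_BASE = "http://api.opencorporates.com/companies/"
# Assumed officer field names; check them against a real API response.
FIELDS = ("id", "name", "position", "start_date", "end_date")


def api_url_from_page_url(page_url):
    """Slide 5, first formula: insert 'api.' after the scheme."""
    return page_url.replace("http://", "http://api.", 1)


def api_url_from_id(jurisdiction_and_id):
    """Slide 5, second formula: prefix a 'JURISDICTION/COMPANY_ID' value."""
    return API_BASE + jurisdiction_and_id


def officers_to_string(raw_json):
    """Slides 6-9: join each officer's fields with '::', then officers with '||'."""
    officers = json.loads(raw_json)["results"]["company"]["officers"]
    described = []
    for entry in officers:
        # Assumption: each list entry wraps its fields in an "officer" object.
        officer = entry["officer"]
        values = [officer.get(name) for name in FIELDS]
        # A missing or null value is written as the text "null", as on slide 8.
        described.append("::".join("null" if v is None else str(v) for v in values))
    return "||".join(described)


# Hand-written stand-in for an API response, reusing values shown on slide 8.
SAMPLE_RESPONSE = json.dumps({"results": {"company": {"officers": [
    {"officer": {"id": 32866743, "name": "SIMON ALAN CONSTANT-GLEMAS",
                 "position": "director", "start_date": "2010-04-07",
                 "end_date": None}},
    {"officer": {"id": 32866744, "name": "KARIN JACQUELINE HAWKINS",
                 "position": "director", "start_date": "2006-01-17",
                 "end_date": "2012-02-22"}},
]}}})

if __name__ == "__main__":
    print(api_url_from_id("gb/00000000"))  # placeholder company ID
    print(officers_to_string(SAMPLE_RESPONSE))
```

The sketch makes no network calls; in the recipe itself, OpenRefine fetches each URL when it populates the new column.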
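
The splitting and tidying steps of slides 11 to 16 can be sketched in Python in the same hedged spirit. This is a rough imitation of what the OpenRefine menu operations do to a two-column table, not OpenRefine's own implementation; the company names and the column list are hypothetical.

```python
# Hypothetical column names for the five '::'-separated fields (slide 15).
COLUMNS = ("director_id", "name", "position", "start_date", "end_date")


def split_into_rows(table):
    """Slides 11-12: one row per '||' part; only the first row keeps the company."""
    rows = []
    for company, combined in table:
        parts = combined.split("||") if combined else [""]
        for index, part in enumerate(parts):
            # New rows get no company value, mirroring the note on slide 12.
            rows.append([company if index == 0 else None, part])
    return rows


def split_into_columns(rows):
    """Slides 13-14: split each director description on '::' into fields."""
    result = []
    for company, part in rows:
        fields = part.split("::") if part else []
        # Pad short or empty descriptions so every row has the same width.
        fields += [""] * (len(COLUMNS) - len(fields))
        result.append([company] + fields)
    return result


def fill_down(rows, column=0):
    """Slide 16: copy the last seen value into blank cells below it."""
    filled, last = [], None
    for row in rows:
        row = list(row)
        if row[column] is None:
            row[column] = last
        else:
            last = row[column]
        filled.append(row)
    return filled


# Hypothetical companies paired with combined strings in the slide 9 format.
TABLE = [
    ("EXAMPLE CO A",
     "32866743::SIMON ALAN CONSTANT-GLEMAS::director::2010-04-07::null"
     "||32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22"),
    ("EXAMPLE CO B",
     "32866745::ANDREW WILLIAM LONGDEN::director::2003-11-03::null"),
]

if __name__ == "__main__":
    for row in fill_down(split_into_columns(split_into_rows(TABLE))):
        print(row)
```

Keeping the three steps as separate functions mirrors the order of the recipe, so each intermediate table can be inspected the way you would inspect the grid in OpenRefine after each operation.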
