Scoda openrefine-directordata


Published in: Technology, Business

Transcript of "Scoda openrefine-directordata"

  1. A recipe for grabbing director information from OpenCorporates using OpenRefine, given an OpenCorporates company ID or OpenCorporates company page URL. For more information, contact: schoolOfData.org
  2. Here’s the sort of thing we’re starting with: a list of companies…
  3. Here’s the sort of thing we want: lists of directors associated with each company (where that information is available).
  4. The first step is to create a web address/URL to call the OpenCorporates API and ask it for data about a particular company. OpenRefine can create a new column populated with the contents of calls made to a URL contained in, or generated from, another column.
  5. The URLs should take the form:
     http://api.opencorporates.com/companies/JURISDICTION/COMPANY_ID
     If you already have company page URLs in a column, add a column based on that column using:
     value.replace('http://','http://api.')
     If you have JURISDICTION/COMPANY_ID in a column, use the formula:
     "http://api.opencorporates.com/companies/"+value
  6. The data comes back as JSON, which we will need to process. Each JSON result contains the data for a single company. The data relating to the directors can be found as a list down the path value.parseJson()['results']['company']['officers']
  7. Let’s parse the JSON data and put the directors’ information into another column…
  8. What we are aiming for is a combined representation of the form:
     32866743::SIMON ALAN CONSTANT-GLEMAS::director::2010-04-07::null
     32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22
     32866745::ANDREW WILLIAM LONGDEN::director::2003-11-03::null
     …
     where we list director ID, name, position, appointment date, termination date.
  9. This function will parse the data into a string with the form:
     32866743::SIMON ALAN CONSTANT-GLEMAS::director::2010-04-07::null||32866744::KARIN JACQUELINE HAWKINS::director::2006-01-17::2012-02-22||32866745::ANDREW WILLIAM LONGDEN::director::2003-11-03::null||…
     The function reads as follows: “for each officer, join their ID, name, position, start date and end date with ::, then join each of these director descriptions using ||”. The use of two different (and hopefully unique) delimiters means we can split the data on each delimiter type separately.
  10. The parsed data is put into a new column in this combined list form.
  11. We can then split the data so that we create a new row for each director, using the delimiter we defined: ||
  12. Note that values from the other columns will not be copied into any newly created rows. We will have to do that ourselves, either now or later.
  13. For each director, we now want to split their details out across several columns, one for each data field (ID, name, position, appointment date, termination date).
  14. We can do this by splitting on the other separator type we used: ::
  15. The newly created columns are labeled with automatically generated names. It would probably make sense to rename them to something slightly more convenient.
  16. Finally, we can do a little more tidying. For any columns we want to export, such as company name or company ID, we can Fill down using the corresponding values from the original row the directors’ information was pulled from.
  17. If you want to know more, contact us…
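
The transcript does not include the OpenRefine expression shown on slide 9, so the following is only a rough Python sketch of the same data flow, not the author's function. It builds the API URL, joins each officer's fields with :: and the officers with ||, then splits the result back into one row per director with the company reference filled down. The function names, the sample company reference gb/00000000, the "officer" wrapper key, and the field names id, name, position, start_date and end_date are all assumptions made for illustration; check them against the JSON you actually receive.

```python
import json

API_PREFIX = "http://api.opencorporates.com/companies/"


def api_url_from_page_url(page_url):
    """Mirror value.replace('http://', 'http://api.') from slide 5."""
    return page_url.replace("http://", "http://api.", 1)


def api_url_from_id(jurisdiction_and_id):
    """Mirror "http://api.opencorporates.com/companies/" + value."""
    return API_PREFIX + jurisdiction_and_id


def officers_to_string(raw_json):
    """Join each officer's fields with '::' and the officers with '||'."""
    officers = json.loads(raw_json)["results"]["company"]["officers"]
    records = []
    for entry in officers:
        # Assumption: each list element wraps its details in an "officer" key.
        officer = entry["officer"]
        # Assumption: these five field names match the API response.
        keys = ("id", "name", "position", "start_date", "end_date")
        fields = [officer.get(key) for key in keys]
        # Missing values are written as the literal text "null".
        records.append("::".join("null" if f is None else str(f) for f in fields))
    return "||".join(records)


def split_to_rows(company_ref, combined):
    """One row per director, with the company reference filled down."""
    return [
        [company_ref] + record.split("::")
        for record in combined.split("||")
        if record
    ]


# Hypothetical response fragment, using the director values from slide 8.
SAMPLE_JSON = json.dumps({
    "results": {"company": {"officers": [
        {"officer": {"id": 32866743, "name": "SIMON ALAN CONSTANT-GLEMAS",
                     "position": "director", "start_date": "2010-04-07",
                     "end_date": None}},
        {"officer": {"id": 32866744, "name": "KARIN JACQUELINE HAWKINS",
                     "position": "director", "start_date": "2006-01-17",
                     "end_date": "2012-02-22"}},
    ]}}
})

if __name__ == "__main__":
    combined = officers_to_string(SAMPLE_JSON)
    for row in split_to_rows("gb/00000000", combined):
        print(row)
```

Using two distinct delimiters keeps the two splits independent: the || split yields one record per director, and the :: split then yields one field per column, which is the same reasoning the recipe gives for OpenRefine.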
