Data Mining the Largest Library Database in the World

Presented at the OCLC EMEA Regional Council Meeting, 26 February 2013, Strasbourg, France


  1. Leveraging WorldCat: Data Mining the Largest Library Database in the World. Roy Tennant, OCLC Research
  2. Algorithmically constructed from WorldCat records. Worldcat.org/identities/ [Footer: Europe, Middle East & Africa Regional Council]
  3. A Union database of authority records. Viaf.org
  4. The Responsible Party: Thom Hickey, Chief Scientist, OCLC Research
  5. 290+ million records
  6. Language Coverage (30 June 2012)
     Total: 274 million
     Percentage of records for non-English materials: 60.2%
     • German: 36.5 million
     • French: 25.5 million
     • Spanish: 11.3 million
     • Italian: 4.7 million
     • Dutch: 4.3 million
     • Russian: 3.6 million
     • Latin: 3.5 million
  10. (J.K. Rowling) (Diana Gabaldon) (Galileo)
  13. Viaf.org
  14. VIAF Participants
  16. “Super” Authority File
  19. Our Cataloging Future: “Moving from cataloging to catalinking” (Eric Miller, Zepheira)
  21. Some Lessons
      • Widespread collaboration is essential
      • Normalizing the data is essential
      • Normalizing the data is complicated
      • Everything is interrelated:
        – You can’t bring names together if titles don’t match
        – You can’t bring titles together if names don’t match
      • Batch mode processing still rules (but we’re getting better and faster at it)
  22. Conclusions
      • Data mining isn’t just useful, it’s essential
      • Extracting data from MARC that is useful in other contexts is possible, but will require sophisticated processing
      • Only very large organizations (e.g., OCLC, national libraries) have the data and resources to do this work
      • Thankfully, we are doing it, but there is much more to be done
  23. Roy Tennant, tennantr@oclc.org, @rtennant, roytennant.com
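
The "Some Lessons" slide notes that normalizing the data is both essential and complicated, and that names and titles can only be brought together jointly. The sketch below is one way to picture that idea; it is not OCLC's actual processing, which the slides do not describe. The function names (`normalize`, `name_key`, `cluster`) and the sample records are hypothetical, chosen only for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """Build a crude matching key: strip diacritics, case, and punctuation."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    lowered = no_marks.casefold()
    # Replace every run of non-alphanumeric characters with a single space.
    cleaned = re.sub(r"[\W_]+", " ", lowered)
    return cleaned.strip()


def name_key(name: str) -> str:
    """Order-insensitive key, so inverted and direct name forms agree."""
    return " ".join(sorted(normalize(name).split()))


def cluster(records):
    """Group (name, title) pairs whose name key AND title key both match."""
    groups = defaultdict(list)
    for name, title in records:
        groups[(name_key(name), normalize(title))].append((name, title))
    return dict(groups)


# Hypothetical sample records (assumed data, not taken from WorldCat).
records = [
    ("Rowling, J. K.", "Harry Potter and the Philosopher's Stone"),
    ("J.K. Rowling", "Harry Potter and the philosopher's stone."),
    ("Galilei, Galileo", "Dialogo"),
    ("Galileo Galilei", "Diálogo"),
]

clusters = cluster(records)
for key, members in clusters.items():
    print(key, len(members))
```

Because the grouping key combines the name key and the title key, a record joins a cluster only when both agree, which mirrors the slide's point that neither names nor titles can be matched in isolation. Real bibliographic data would need far more than this (transliteration, pseudonyms, uniform titles, editions), which is where the "complicated" part comes in.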