Data Mining the Largest Library Database in the World

Presented at the OCLC EMEA Regional Council Meeting, 26 February 2013, Strasbourg, France

Transcript

  • 1. Leveraging WorldCat: Data Mining the Largest Library Database in the World. Roy Tennant, OCLC Research
  • 2. Algorithmically constructed from WorldCat records. Worldcat.org/identities/ (Slide footer: Europe, Middle East & Africa Regional Council)
  • 3. A union database of authority records. Viaf.org
  • 4. The Responsible Party: Thom Hickey, Chief Scientist, OCLC Research
  • 5. 290+ million records
  • 6. Language Coverage (30 June 2012). Total: 274 million records. Percentage of records for non-English materials: 60.2%. German 36.5 million; French 25.5 million; Spanish 11.3 million; Italian 4.7 million; Dutch 4.3 million; Russian 3.6 million; Latin 3.5 million
  • 10. (J.K. Rowling) (Diana Gabaldon) (Galileo)
  • 13. Viaf.org
  • 14. VIAF Participants
  • 16. “Super” Authority File
  • 19. Our Cataloging Future: “Moving from cataloging to catalinking” (Eric Miller, Zepheira)
  • 21. Some Lessons
      • Widespread collaboration is essential
      • Normalizing the data is essential
      • Normalizing the data is complicated
      • Everything is interrelated:
          – You can’t bring names together if titles don’t match
          – You can’t bring titles together if names don’t match
      • Batch mode processing still rules (but we’re getting better and faster at it)
  • 22. Conclusions
      • Data mining isn’t just useful, it’s essential
      • Extracting data from MARC that is useful in other contexts is possible, but will require sophisticated processing
      • Only very large organizations (e.g., OCLC, national libraries) have the data and resources to do this work
      • Thankfully, we are doing it, but there is much more to be done
  • 23. Roy Tennant, tennantr@oclc.org, @rtennant, roytennant.com
