2012.03.20 ihr farquhar v03


Published in: Education, Technology

  1. 1. Digital Scholarship, The British Library @ The Future of History
       Adam Farquhar, Head of Digital Scholarship, The British Library
  2. 2. Outline
       • The British Library’s Digital Scholarship Department
       • Trends and requirements for research
       • Projects to address the requirements
       • Conclusion
  3. 3. A New Department: Digital Scholarship
       • Develop clear strategies and operating models for the British Library’s role in/contribution to digital scholarship
       • Develop innovative models for digital scholarship exploiting digital content and new technologies
       • Develop a coherent strategy for digitisation
       • Engage with new and existing user communities
       • Strengthen the Library’s capabilities
  4. 4. Digital Scholarship Collection Areas – Maps
  5. 5. Digital Scholarship Collection Areas – Arts, Sound, Video, Music
  6. 6. Digital Scholarship - Digitisation
  7. 7. Digital Scholarship – International Dunhuang Project
  8. 8. Digital Scholarship – Digital Curator Team
  9. 9. Digital Scholarship
       Definition
       • The production, use and integration of digital content, services and tools to facilitate scholarship and research
       • Allow research areas to be investigated in new ways, using new tools, leading to new discoveries and analysis to generate new understanding
       Requirements
       • Comprehensive digital collections
       • The ability to apply the tools of scholarship to digital collections: annotation, citation, comparison
       • Infrastructure to store, preserve, discover, access
       • The ability to apply new tools for analysis, visualisation, and experimentation
       • Collaboration through social networking tools, social bookmarking, wikis, sharing drafts with commentary
       • Non-traditional forms of outreach to draw attention to research
  10. 10. Research trends (Trend: Requirement)
       • More digital content: Mass and focused digitisation
       • More cross-disciplinary: Improved discovery
       • More collaborative: Interfaces for sharing and building services, annotation
       • More analysis: Visualisation tools
       • More data-driven: Conversion to data and analysis tools
       • More repurposing of content: Open licenses & APIs, documented formats
  11. 11. Creating thematic content – First World War
       Europeana – SB Berlin
       • Centenary of the outbreak of the First World War
       • Will create a European corpus of digitised materials concerning the First World War in all its aspects
       • Will contribute to Europeana a substantial collection of more than 400,000 outstanding sources
       User generated content
       • Roadshows in 10 countries to create unique pan-European archive
       • Preston event produced more than 2300 images from letters, diaries, medals, pictures, trench art, and more
  12. 12. Creating massive digitised collections through partnership
       The British Newspaper Archive
       • A partnership between the British Library and brightsolid online publishing
       • Will digitise up to 40 million newspaper pages from the British Library’s collection over 10 years
       • Collection includes runs of most newspapers published in the UK since 1800
       • Over 4m pages added since launch
       Google Books
       • A 6 year project starting June 2011
       • 250,000 books, 1700-1870
       • From the French Revolution to the end of slavery
       • Material in major European languages
       • Focus on books that are not yet freely available in digital form online
       • Access via Google Books and BL
       • Storage at Google and BL
       • Contract and terms available on the web!
  13. 13. Making broadcast news more accessible through speech-to-text
       Broadcast News
       • Broadcast News
          Television and radio news programmes receivable in the UK, recorded by the British Library since May 2010
          Currently record 37 hours per day from 15 TV channels and 2 radio channels such as Al-Jazeera English, CNN, France 24, Russia Today
          Innovative search across subtitles (where available)
          Launch in reading rooms May 2012
       • Opening up speech archives
          AHRC-funded project looking at speech-to-text technologies for opening up audio and video archives
          Project will index 3,00 hours of TV news and 3,000 hours of radio content
       IMPACT Historic Text
       • Improve the digital accessibility of printed text produced before 1900
          State-of-the-art OCR does not produce satisfactory results for old books, magazines and newspapers
          Commercial OCR focuses on modern documents
          Historic material has archaic fonts, complex layouts, warped or degraded pages
          Manual post-correction is slow and expensive
  14. 14. Visualising Personal Digital Archives and Web Archives
       Personal digital archives
       • Data analysis beyond documents
       • Use computer forensics techniques
       • Capture, management, description, and preservation of personal digital collections to facilitate access and analysis
       • Archives range from poets (W Cope) and playwrights (H Pinter) to computer scientists (D Michie) and biologists
       Web archives
       • Create a research collection of UK websites
       • Develop high-impact data analytical access services
       • Demonstrate the potential of domain level web archives, or the “haystacks”
       • UK web domain > 9m .uk domain names
       • Estimate 110TB/crawl
  15. 15. Making maps accessible through crowd-sourced geo-referencing
       • Goal
          Make maps easy to find, access, use
          Crowd-sourcing map geo-referencing
          Built on previous crowd-sourcing projects
          Addressed key challenges – awareness, engagement, productivity
       • Approach
          Accessible and convenient application
          Immediate results and feedback
          Competitive tools
          Recognition and visible contribution
       • Results:
          725 maps assigned spatial metadata over 5 days
          Publicity minimal – social media key
          ~90 participants
          Top five completed half the work
          Data quality good: <3% had errors >.005
  16. 16. Conclusion – Support for Historical Research
       • Massive collections of digitised historical material
       • Increased integration of images, sound, video
       • Improved conversion to text
       • Improved support for entity extraction
       • Improved linkage across content silos
       • Improved frameworks to bring analysis tools to data
       • Improved tools for visualisation
       • Improved frameworks for annotation and sharing
       • Improved integration with research tools
  17. 17. Leveraging the power of digital
       “reading individual works is as irrelevant as describing the architecture of a building from a single brick, or the layout of a city from a single church” – Franco Moretti, Stanford
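
The headline figures on slides 14 and 15 imply some rough rates that the slides themselves do not state. The Python sketch below works them out. The derived numbers are back-of-envelope averages, not results reported in the talk. It assumes exactly 90 participants (the slide says "~90"), exactly 9 million domains (the slide says "> 9m", so the per-domain figure is an upper bound), an even split of work within each group of contributors, and decimal units (1 TB = 1,000,000 MB). All names are illustrative.

```python
# Figures quoted on the slides (approximate where the slide says "~" or ">").
MAPS_GEOREFERENCED = 725      # slide 15: maps assigned spatial metadata
CAMPAIGN_DAYS = 5             # slide 15: duration of the campaign
PARTICIPANTS = 90             # slide 15 says "~90" (assumed exact here)
TOP_CONTRIBUTORS = 5          # slide 15: "top five completed half the work"
TOP_SHARE = 0.5

UK_DOMAINS = 9_000_000        # slide 14 says "> 9m" (lower bound)
CRAWL_TB = 110                # slide 14: estimated size of one crawl


def georeferencing_rates():
    """Average throughput implied by the slide 15 results."""
    top_maps = MAPS_GEOREFERENCED * TOP_SHARE
    other_maps = MAPS_GEOREFERENCED - top_maps
    return {
        "maps_per_day": MAPS_GEOREFERENCED / CAMPAIGN_DAYS,
        "maps_per_participant": MAPS_GEOREFERENCED / PARTICIPANTS,
        # Assumes the top five shared their half of the work evenly.
        "maps_per_top_contributor": top_maps / TOP_CONTRIBUTORS,
        # Assumes everyone else shared the other half evenly.
        "maps_per_other_participant": other_maps / (PARTICIPANTS - TOP_CONTRIBUTORS),
    }


def crawl_mb_per_domain():
    """Average crawl size per .uk domain, in decimal megabytes."""
    return CRAWL_TB * 1_000_000 / UK_DOMAINS


if __name__ == "__main__":
    for name, value in georeferencing_rates().items():
        print(f"{name}: {value:.1f}")
    print(f"crawl_mb_per_domain: {crawl_mb_per_domain():.1f}")
```

Under these assumptions the campaign averaged 145 maps per day, and each of the top five contributors handled roughly 70 maps against about 4 for everyone else, which is consistent with the slide's emphasis on recognition and competitive tools.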