Austrian Books Online. The Austrian National Library's Large-Scale Digitisation Public-Private Partnership with Google

Library Science Talk in Geneva and Bern, Switzerland, 15 & 16 October, 2012

Transcript

  • 1. Austrian Books Online: The Austrian National Library’s large-scale digitisation public-private partnership with Google. Max Kaiser, Head of R&D, Austrian National Library. Library Science Talk, Geneva, 15 October 2012; Bern, 16 October 2012. @maxkaiser
  • 2. Austrian Books Online: www.onb.ac.at/ev/austrianbooksonline/
  • 3. www.slideshare.net/maxkaiser
  • 4. digitisation of the entire historical book holdings of the Austrian National Library
  • 5. largest Austrian public-private partnership in the cultural sector
  • 6. Austrian National Library
  • 7. history back to the 14th century
  • 8. one of the world’s most significant collections
  • 9. “legal deposit” Source: http://commons.wikimedia.org/wiki/File:Austria_Hungary_ethnic_de.svg
  • 11. legal deposit today → print publications → online publications → web archiving
  • 12. seven special collections
  • 13. → Picture Archives and Graphics Department → Map Department → Music Department → Literary Archives → Papyri Department → Department of Planned Languages → Department of Rare Books and Manuscripts
  • 14. music department
  • 15. music department
  • 16. four museums
  • 17. → State Hall → Papyrus Museum → Globe Museum → Esperanto Museum
  • 18. papyrus department & museum
  • 19. Department of Planned Languages & Esperanto Museum
  • 20. Globe Museum
  • 24. 16 reading rooms
  • 25. 9 am to 9 pm, 7 days/week
  • 26. library as social space
  • 29. services for researchers
  • 31. access for everyone from anywhere
  • 32. historical newspapers & legal texts: 10+ mio. pages
  • 36. several 100k images
  • 37. 140k portraits
  • 38. 100k* posters (* by end 2012)
  • 39. papyri…
  • 42. → September 2012 http://www.onb.ac.at/about/21043.htm
  • 43. Vision 2025: Knowledge for the world of tomorrow. Our holdings are digitized. We collect and sustain knowledge. Access to our knowledge is simple. With us, research is more faceted and effective. We enrich cultural and social life.
  • 44. Our holdings are digitized → substantial parts of holdings digitized → cooperation with private partners → full-text search → added-value services like semantic search → unified access system
  • 45. We collect and sustain knowledge → focal point of collection policy is digital → preference for digital versions of publications → user-generated content and social networks → digital photography → preservation of analogue and digital collections → scalable digital archive
  • 46. Access to our knowledge is simple → unified access system for all collections → focus of cataloguing: metadata enrichment → linking of metadata with external resources → open data → APIs and support for third-party apps
  • 47. With us, research is more faceted and effective → integration of digital content in virtual research environments → support for digital humanities → strong research collections and libraries → cooperation with universities and research centres
  • 48. We enrich cultural and social life → digital services, reading rooms and museums → innovative interfaces → mobile services → cooperation with private partners: reuse of data for innovative services → reinforce library as social space
  • 49. Austrian Books Online
  • 50. 600,000 volumes, 200 mio. pages
  • 51. 16th century to 2nd half of 19th century
  • 52. Google Books / Digital Library, Austrian National Library
  • 53. Partner Program / Google Books Library Program
  • 54. 13 libraries in Europe, 5 national libraries: Italy, Austria, The Netherlands, Czech Republic, Great Britain
  • 55. > 20 mio. books; > 50% non-English; ~75% from libraries; ~2 mio. books from European libraries; > 3 mio. books public domain
  • 56. some strategy and policy considerations…
  • 57. policy slides ahead!
  • 59. what is a public-private partnership?
  • 60. ≠ service contract or service outsourcing → long duration of the relationship → substantial investment by private partner → distribution of risks
  • 61. rationales for PPPs → private funding for the public sector → benefit from know-how and working methods of the private sector → but not a “miracle solution” for the public sector (EC Green Paper on Public-Private Partnerships, 2004)
  • 62. public-private partnerships in the cultural sector
  • 63. objectives for public partners → funding for digitisation → enhanced access → engaging new audiences → access to technology → access to private-sector competencies → commercial income through user fees, royalties or revenue share → lobbying effort to increase public funding
  • 64. objectives for private partners → commercial objectives: access to new markets or customer groups, association with strong public brands, access to (rare, unique) content → corporate social responsibility
  • 65. benefits for citizens → increased online access → democratisation of access to knowledge → added-value services → benefit for learning and tourism → new creative endeavours
  • 66. 10 January 2011 http://ec.europa.eu/information_society/activities/digital_libraries/doc/reflection_group/final_report_%20cds.pdf
  • 67. “Stimulating the flow of private funds for the digitisation of cultural assets through equitable public private partnerships appears as a viable and sustainable way of tackling the pressing question of making Europe’s cultural wealth accessible online and preserving it for future generations.”
  • 68. “The key question is not whether public-private partnerships for digitisation should be encouraged, but how, and under which conditions.”
  • 69. 27 October 2011
  • 70. “(...) recommends that Member States (...) encourage partnerships between cultural institutions and the private sector in order to create new ways of funding digitisation of cultural material and to stimulate innovative uses of the material, while ensuring that public private partnerships for digitisation are fair and balanced (…).”
  • 71. key principles: 1. respect for intellectual property rights → ONB-Google: only public-domain works digitised; 2. non-exclusivity → ONB-Google: ONB free to digitise material with other partners; 3. transparency of the process → ONB-Google: public tender
  • 72. key principles: 4. transparency of agreements → ONB-Google: very detailed FAQs online; 5. accessibility through Europeana → ONB-Google: all files available for non-commercial use, access via platforms like Europeana, provision to research partners; 6. key criteria → [next slide]
  • 73. key criteria for assessing PPPs → total investment by private partner / effort of public partner → (free) access to material for general public, including through Europeana → cross-border access → length of any period of preferential commercial use by private partner → quality of digital copies for public partner → usage conditions for public partner in non-commercial context → time-scale of project
  • 74. additional key elements in the ONB-Google cooperation: → selection of books by library → Institute for Conservation involved → termination
  • 76. “Genuine PPPs currently not a widespread method for financing digitisation by cultural institutions in Europe.” Commission Staff Working Paper accompanying the document Commission Recommendation on the digitisation and online accessibility of cultural material and digital preservation, p. 18, http://ec.europa.eu/information_society/activities/digital_libraries/doc/recommendation/recom28nov_all_versions/staff_working_paper.pdf
  • 77. aim to maximize access and re-use via digitisation / access restrictions and re-use limitations in PPPs
  • 78. public-private partnerships as commodification of the cultural commons?
  • 79. Cultural Commons → body of work freely available to the public for legal use, sharing, repurposing, and remixing → source for cultural creativity → http://creativecommons.org/culture
  • 81. Public Domain → material to derive knowledge and create new cultural works → essential for society and economy
  • 82. http://www.europeana-libraries.eu/web/europeana-project/publications
  • 83. Public Domain Mark: “This work has been identified as being free of known restrictions under copyright law, including all related and neighbouring rights. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.” http://creativecommons.org/publicdomain/mark/1.0/
  • 84. Public Domain Charter: “Public-Private Partnerships have become one option for funding large scale digitisation efforts. Commercial content aggregators pay for the digitisation in exchange for privileged access to the digitised collections. These activities are seen as a reason for attempting to exercise as much control as possible over digital reproductions of Public Domain works. Organisations are claiming exclusive rights in digitised versions of Public Domain works and are entering into exclusive relationships with commercial partners that hinder free access.”
  • 86. Public Sector Information → information produced, collected and held by public institutions → single largest source of information in Europe → should be widely re-used to foster economy and creativity
  • 87. PSI Directive → EC “Directive on the Re-Use of Public Sector Information” (31 Dec. 2003) → aim: foster re-use of PSI → legally binding document → implemented by all Member States in 2008 → currently: cultural & research institutions excluded from directive
  • 88. key provisions of PSI Directive → clear procedures for re-use requests → upper limit for charging → transparency of conditions and standard charges for re-use → avoid discrimination between players → prohibition of exclusive agreements
  • 89. → 12 Dec. 2011: Commission proposal for PSI Directive amendment
  • 90. proposed changes → withdraw current exemption for cultural institutions → restrict public sector bodies to only apply charges for re-use based on marginal costs (exemption for libraries, archives, museums) → prohibit agreement of terms for re-use which grant exclusive rights to any one party
  • 91. → discussion in Council Working Groups under Danish and Cypriot Presidencies → latest published draft: 1 Oct. 2012
  • 92. http://register.consilium.europa.eu/pdf/en/12/st13/st13162.en12.pdf
  • 93. → Working Draft, 1 Oct. 2012: Article 11
  • 94. → Working Draft, 1 Oct. 2012: Article 11
  • 95. the project …
  • 96. who is paying for what? http://www.bildarchivaustria.at/downl/1148453/layout/CE%2043_3.jpg
  • 97. costs → full-text digitisation: very expensive → report by Collections Trust for Comité des Sages http://ec.europa.eu/information_society/activities/digital_libraries/doc/refgroup/annexes/digiti_report.pdf
  • 98. Google: → transport → insurance → scanning → OCR → image processing → quality control → Google Books
  • 99. Austrian National Library: → provision of metadata → selection → internal logistics → conservational assessment → barcoding → metadata adjustments → data download and control → data storage & digital preservation → Digital Library
  • 100. → conservation → preservation http://www.mediathek.at/akustische-chronik/popup/popup.php?document_id=1000115&zone_id=1000043&template_id=1000016&zone_name=IMAGE_ZONE1
  • 101. which books?
  • 102. entire historical book holdings, 16th to 19th century
  • 103. 200,000 volumes, State Hall
  • 104. Department of Manuscripts and Rare Books; Map Department. Source: http://deu.archinform.net/projekte/107
  • 105. Department of Music
  • 106. Theatre Museum. Source: http://commons.wikimedia.org/wiki/File:Palais_Lobkowitz_Vienna_Oct._2006_006.jpg
  • 107. Fidei Commiss Library
  • 109. 7 work packages: book logistics; metadata / catalogues; conservation / restoration; data download / quality control; access; IT infrastructure; project management
  • 110. preparatory project, mid to end 2010 → integration with organisational processes → personnel resources → logistics workflows
  • 111. internal communication → change processes → re-evaluation of workflows → availability of internal resources
  • 112. consultation with other Google partners. Source: http://commons.wikimedia.org/wiki/File:M%C3%BCnchen_Bayerische_Staatsbibliothek_001.JPG
  • 113. 70+ staff members, 20+ exclusively for the project → book logistics → metadata adaptation → cataloguing → conservation / restoration → quality control → software implementation → project management
  • 114. end of 2010: test shipment & start of operational project; spring 2011: start of digitisation
  • 115. no individual selection …
  • 116. size
  • 117. size
  • 118. condition
  • 119. preparation
  • 120. conservational evaluation
  • 121. value
  • 122. book flow
  • 123. logistics in the State Hall
  • 124. logistics in the State Hall
  • 125. logistics in the State Hall
  • 126. challenges…
  • 127. challenges…
  • 128. challenges…
  • 129. challenges…
  • 130. challenges…
  • 131. logistics in the “Aurum” Depot
  • 132. logistics in the “Aurum” Depot
  • 133. preparation for digitisation
  • 134. manipulation area …
  • 135. barcoding
  • 136. adaptation of metadata
  • 138. 8 minutes / volume
  • 139. books
  • 140. hours
  • 141. working days
  • 142. person years
  • 143. complex cases …
  • 144. bound-togethers …
  • 145. bound-togethers …
  • 146. bound-togethers …
  • 147. „slim“ volumes …
  • 148. special collections …
  • 149. conservational protection
  • 150. conservational protection
  • 151. conservational protection
  • 152. conservational protection
  • 153. cataloguing the Fidei Commiss Library
  • 154. cataloguing the Fidei Commiss Library
  • 155. ready for digitisation …
  • 156. digitisation → scanning centre in Germany → procedures agreed → Austrian Federal Office for Monuments involved → each volume checked after return → books unavailable to users for ~3 months
  • 159. where are we today?
  • 160. today: 100,000 volumes digitized
  • 161. by end 2013: 185,000 volumes digitized
  • 162. of 100,000 volumes: 9.19% 16th century; 14.24% 17th century; 31.48% 18th century; 43.01% 19th century; 2.07% [no year of publication]
  • 163. of 100,000 volumes: 33.41% German; 31.31% Latin; 15.55% French; 13.78% Italian; 2.73% English
  • 164. digital book flow
  • 165. book logistics → digitisation → data download and quality control in ADOCO (Austrian Books Online Download & Control) → storage → access
  • 166. up to … digitised items / day
  • 167. quality control
  • 168. quality control → goal: automated jobs → representative samples → IT-assisted discovery of error clusters → error candidates checked manually → detect systematic and critical errors
  • 169. error model, informed by the “Validating Quality in Large-Scale Digitization” project of the Univ. of Michigan & Univ. of Minnesota, http://hathitrust-quality.projects.si.umich.edu/ → level 1: data / information → image (thick, broken) → illustration (scanner effects, tone, color etc.) → full text (OCR errors per page image) → level 2: entire page → blur / warp / skew → cropping → obscured / cleaned → colorization → full text (OCR error patterns at page level)
  • 170. error model: → level 3: whole volume → order of pages → missing pages → duplicate pages → false pages → full text (OCR error patterns at volume level)
  • 171. use cases, informed by the “Validating Quality in Large-Scale Digitization” project → reading online images → printing on demand → processing full-text data → managing collections
  • 174. non-critical errors
  • 175. bleed-through
  • 178. errors
  • 179. cropping error
  • 180. quality control via sampling → re-processing → re-download
  • 181. cropping error FIXED
  • 182. big data processing… http://blogs.loc.gov/digitalpreservation/files/2012/05/3875300483_a8875fea1c-500.jpg
  • 183. technical slides ahead!
  • 184. technologies and workflows from EC co-funded FP7 projects: → SCAPE (Scalable Preservation Environments), http://www.scape-project.eu/ → IMPACT (Improving Access to Text), http://www.impact-project.eu/
  • 185. experimental cluster
  • 186. Hadoop / MapReduce: MASTER (Job Tracker, Name Node); SLAVE 1, SLAVE 2, …, SLAVE n (each: Task Tracker, Data Node); Hadoop Distributed File System (HDFS) → experimental 5-server cluster at ONB → 40 cores in total → 30 cores assigned to task trackers
  • 187. use case 1: duplicate pages in one book → books with duplicated pages due to scanning process & post-processing → use key points of images to determine structural image similarity
  • 188. use case 1: duplicate pages in one book
  • 189. use case 1: duplicate pages in one book
  • 190. use case 2: book comparison based on image similarity → different instances of one book, e.g. from different downloads of one book at different points in time → book similarity measure based on comparison of book page images from two different book instances
  • 191. use case 2: book comparison based on image similarity → measure for book similarity based on book page image similarity → helps find prominent changes in book re-downloads
  • 192. large-scale document processing → extract image metadata using Exiftool → large-scale batch processing using the Apache Hadoop Streaming API → bash script using Exiftool is executed on the cluster → book page image data is accessible from each node of the cluster → parallelisation of batch processing
  • 193. Jp2PathCreator: find … on the NAS lists the JP2 page images (e.g. /NAS/Z119585409/00000001.jp2, /NAS/Z117655409/00000001.jp2; 1.4 GB) → HadoopStreamingExiftoolRead: reading files from NAS, one line per page with page ID and extracted value (e.g. Z119585409/00000001 2345, Z119585409/00000002 2340; 1.2 GB) → 60,000 books, 24 mio. pages: ~5 h + ~38 h = ~43 h
  • 194. large-scale document processing → principle: store once in HDFS and read many times → small files (TXT, HTML) stored in HDFS: files of each file type stored as one big file (SequenceFile) → example: storing OCR results of 24 mio. pages (ca. 60,000 books): reading data from the file server and storing it on the cluster takes more than 1 day; subsequent processing of a Map/Reduce job (calculate average block width) takes 6 hours
  • 195. HtmlPathCreator: find … on the NAS lists the HTML OCR files (e.g. /NAS/Z119585409/00000707.html; 1.4 GB) → SequenceFileCreator: reading files from NAS and writing them into a SequenceFile keyed by page ID (e.g. Z119585409/00000707; 997 GB uncompressed) → 60,000 books, 24 mio. pages: ~5 h + ~24 h = ~29 h
  • 196. example Map/Reduce job: calculate average block width. HadoopAvBlockWidthMapReduce: SequenceFile → Map/Reduce over (page ID, block width) pairs, e.g. Z119585409/00000001 2100, 2200, 2250, 2300, 2400 → text file with one result per page ID; ~6 h
  • 197. combine MySQL and Apache Hive: book-level metadata (DB) and page-level metadata, linked via Sqoop
  • 198. storage and access…
  • 199. data: average size of a data package (~ book): 101 MB; colour data package: 187 MB; grayscale data package: 82 MB; 101 MB × 600,000 = 60 TB
  • 200. storage & access → data storage: in-house → JPEG 2000 master files stored redundantly → access copies generated on the fly → URN resolver for permanent identification
  • 201. book viewer; catalogue / “Quick Search”; [mobile apps]; full-text search
  • 202. components: User, Book Viewer, Fulltext Index Server, Quick Search, Image Server, URN Resolver, Catalogue, Digital Repository (master images), Google, ADOCO
  • 203. outlook → full text: new possibilities for research → data enrichment → named entity recognition → linked data → new data-centric research in the humanities & social sciences → http://www.diggingintodata.org/
  • 205. DM2E → http://dm2e.eu/ → European Commission co-funded project → stimulate creation of new tools and services for re-use of Europeana data in the Digital Humanities → implementation of semantic annotation tool → Austrian Books Online data part of the project
  • 206. next steps → 80,000 books already accessible via Google Books → spring 2013: launch of Austrian Books Online Viewer → full-text search
  • 207. http://books.google.at/books?vid=ONB%2BZ15367990X
  • 208. http://books.google.at/books?vid=ONB%2BZ155606704
  • 209. http://books.google.at/books?vid=ONB%2BZ174115105
  • 210. http://books.google.at/books?vid=ONB%2BZ158211101
  • 211. http://books.google.at/books?vid=ONB%2BZ169472305
  • 212. http://books.google.at/books?vid=ONB%2BZ164893308
  • 213. more information: www.onb.ac.at/ev/austrianbooksonline, www.onb.ac.at/ev/austrianbooksonline/faq.htm, twitter.com/abooksonline
  • 214. thank you! max.kaiser@onb.ac.at, www.onb.ac.at, www.slideshare.net/maxkaiser, www.linkedin.com/in/maxkaiser, gplus.to/maxkaiser, twitter.com/maxkaiser
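Slides 168 and 180 describe quality control through representative samples, IT-assisted discovery of error clusters and manual checking of error candidates, followed by re-processing or re-download. The Python sketch below illustrates only that sampling step. The function names, the per-book sample size and the failure-rate threshold are assumptions for illustration and do not come from ADOCO. The automated page checks themselves are not shown.

```python
import random


def sample_pages(pages_by_book, sample_size, seed=0):
    """Draw a reproducible random sample of page IDs from each book (hypothetical helper)."""
    rng = random.Random(seed)
    sample = {}
    for book_id, pages in pages_by_book.items():
        # Books with fewer pages than the sample size are checked completely.
        k = min(sample_size, len(pages))
        sample[book_id] = rng.sample(list(pages), k)
    return sample


def flag_books(check_results, threshold):
    """Return books whose share of failed sampled pages reaches the threshold.

    check_results maps book_id -> list of booleans (True = the page failed an
    automated check). Flagged books are the error candidates for manual checking
    and, if an error is confirmed, for re-processing / re-download.
    """
    flagged = []
    for book_id, failures in check_results.items():
        if failures and sum(failures) / len(failures) >= threshold:
            flagged.append(book_id)
    return sorted(flagged)
```

In this sketch an "error cluster" is simply a high failure rate within one book. Grouping the sampled results by another key, such as the download they came from, would work the same way.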
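The level-3 checks in the error model on slide 170 (whole volume: missing pages and related errors) can partly be done without looking at pixels. The sketch below compares the page file names of one volume against an expected page count. It assumes that page images are named with an 8-digit, 1-based sequence number, as in the paths shown on slide 193 (e.g. 00000001.jp2). The expected count would have to come from metadata and is a hypothetical input here.

```python
import re

# Assumption: one file per page, named with an 8-digit 1-based sequence number.
PAGE_NAME = re.compile(r"^(\d{8})\.jp2$")


def check_volume_pages(file_names, expected_count):
    """Report missing, unexpected and malformed page files for one volume."""
    present = set()
    malformed = []
    for name in file_names:
        match = PAGE_NAME.match(name)
        if match:
            present.add(int(match.group(1)))
        else:
            malformed.append(name)
    expected = set(range(1, expected_count + 1))
    return {
        "missing": sorted(expected - present),     # gaps in the page sequence
        "unexpected": sorted(present - expected),  # numbers beyond the expected count
        "malformed": sorted(malformed),            # files not following the naming scheme
    }
```

Duplicate or out-of-order page images cannot be detected from file names alone. They need a comparison of image content such as the one described in use case 1 on slide 187.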
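Use case 1 (slides 187 to 189) finds duplicated pages within one book by comparing key points of page images. Key-point matching needs an image-processing library, so the sketch below substitutes a simpler technique: an average hash (a basic perceptual hash) compared by Hamming distance. It assumes that each page has already been reduced to a small grayscale grid, for example 8×8 values from 0 to 255. That resizing step is not shown. The names and the distance threshold are hypothetical.

```python
from itertools import combinations


def average_hash(gray):
    """Hash a small grayscale grid (list of rows of 0-255 ints): one bit per pixel above the mean."""
    flat = [v for row in gray for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


def find_duplicate_pages(pages, max_distance=2):
    """pages: dict page_id -> grayscale grid. Return page-ID pairs whose hashes nearly match."""
    hashes = {pid: average_hash(img) for pid, img in pages.items()}
    return [
        (p, q)
        for p, q in combinations(sorted(hashes), 2)
        if hamming(hashes[p], hashes[q]) <= max_distance
    ]
```

An average hash tolerates small brightness changes but, unlike key points, not cropping or rotation. That is the trade-off of this simpler substitute.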
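Use case 2 (slides 190 and 191) measures the similarity of two instances of the same book, such as two downloads made at different times, from the similarity of their page images. A minimal sketch follows. It assumes that each instance is already available as a list of per-page integer fingerprints in page order (for example average hashes, a substitute for the key-point similarity named on slide 187). It also assumes that pages are compared by position. The function name and threshold are hypothetical.

```python
def compare_book_instances(old_hashes, new_hashes, max_distance=2):
    """Compare two instances of one book page by page.

    Returns (similarity between 0 and 1, list of 1-based page positions that changed).
    A page present in only one instance counts as changed.
    """
    length = max(len(old_hashes), len(new_hashes))
    if length == 0:
        return 1.0, []
    changed = []
    for pos in range(length):
        if pos >= len(old_hashes) or pos >= len(new_hashes):
            changed.append(pos + 1)
        elif bin(old_hashes[pos] ^ new_hashes[pos]).count("1") > max_distance:
            changed.append(pos + 1)
    return 1 - len(changed) / length, changed
```

Because pages are compared by position, one inserted or removed page shifts every following comparison. Aligning the two page sequences first would avoid this, at the cost of a more complex sketch.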
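Slides 192 and 193 describe running Exiftool over the page images with the Apache Hadoop Streaming API, using a bash script. The sketch below expresses the same idea in Python instead of bash. It reads one JP2 path per input line and emits a tab-separated page ID and metadata value, which is the line format Hadoop Streaming expects from a mapper. The helper names are hypothetical. The extracted tag (`ImageWidth`) is also an assumption, since the slides do not say which field values such as 2345 represent.

```python
import subprocess
import sys
from pathlib import PurePosixPath


def page_key(path):
    """Turn /NAS/Z119585409/00000001.jp2 into the page ID Z119585409/00000001."""
    p = PurePosixPath(path.strip())
    return f"{p.parent.name}/{p.stem}"


def read_tag(path, tag="ImageWidth"):
    """Ask exiftool for a single tag value (-s3 prints the value only)."""
    result = subprocess.run(
        ["exiftool", "-s3", f"-{tag}", path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def map_lines(lines, reader=read_tag):
    """Yield 'pageID<TAB>value' for every non-empty input line (one JP2 path each)."""
    for line in lines:
        path = line.strip()
        if path:
            yield f"{page_key(path)}\t{reader(path)}"


def main():
    # Hadoop Streaming sends input records on stdin and reads key/value lines from stdout.
    for out_line in map_lines(sys.stdin):
        print(out_line)
```

Saved as a script that calls `main()` at module level, this can be passed to Hadoop Streaming through its `-mapper` option. The `reader` parameter allows the Exiftool call to be replaced when testing without Exiftool installed.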
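Slides 194 and 195 follow the principle "store once in HDFS and read many times". Millions of small OCR files are packed into one large SequenceFile keyed by page ID. The sketch below illustrates only that packing idea with a simple length-prefixed key/value layout. It is not Hadoop's SequenceFile format, and the function names are hypothetical.

```python
import io
import struct


def write_records(records, f):
    """Append (key, bytes) records to a binary file object as [len][key][len][value]."""
    for key, value in records:
        k = key.encode("utf-8")
        f.write(struct.pack(">I", len(k)) + k)          # 4-byte big-endian key length + key
        f.write(struct.pack(">I", len(value)) + value)  # 4-byte big-endian value length + value


def read_records(f):
    """Yield (key, bytes) records from a file written by write_records."""
    while True:
        header = f.read(4)
        if not header:  # end of file reached cleanly
            return
        (klen,) = struct.unpack(">I", header)
        key = f.read(klen).decode("utf-8")
        (vlen,) = struct.unpack(">I", f.read(4))
        yield key, f.read(vlen)


if __name__ == "__main__":
    buf = io.BytesIO()
    write_records([("Z119585409/00000707", b"<html></html>")], buf)
    buf.seek(0)
    print(list(read_records(buf)))
```

Reading one large file sequentially avoids opening each small file separately. The slides identify this per-file cost on the file server as the slow part (more than a day to load 24 mio. pages), compared with the later Map/Reduce pass.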
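Slide 196 calculates the average block width per page as a Map/Reduce job over (page ID, block width) pairs. The pure-Python sketch below simulates the three phases locally: map, shuffle (grouping by key) and reduce. It assumes that the block widths have already been extracted from the OCR HTML, which is not shown. The example values are the ones listed on the slide, and the function names are hypothetical.

```python
from collections import defaultdict


def map_blocks(page_id, block_widths):
    """Map: emit one (page ID, width) pair per text block on the page."""
    for width in block_widths:
        yield page_id, width


def shuffle(pairs):
    """Shuffle: group all values by key, as Hadoop does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_average(page_id, widths):
    """Reduce: one average block width per page."""
    return page_id, sum(widths) / len(widths)


def average_block_width(pages):
    """pages: dict page_id -> list of block widths. Returns dict page_id -> average width."""
    pairs = (pair for page_id, widths in pages.items() for pair in map_blocks(page_id, widths))
    return dict(reduce_average(k, v) for k, v in sorted(shuffle(pairs).items()))


print(average_block_width({"Z119585409/00000001": [2100, 2200, 2250, 2300, 2400]}))
# → {'Z119585409/00000001': 2250.0}
```

An average is not directly combinable. A combiner running before the reducer would therefore have to pass (sum, count) pairs instead of partial averages.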
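Slide 199 estimates the total data volume as 101 MB × 600,000 = 60 TB. The sketch below reproduces that arithmetic in decimal units (1 TB = 1,000,000 MB), which gives 60.6 TB before rounding. It also adds two hypothetical extensions. `copies` accounts for the redundant storage of master files mentioned on slide 200, although the slides do not give the number of copies. `implied_colour_share` estimates the colour share implied by the three package sizes, under the assumption that every package is either a colour or a greyscale package.

```python
def total_storage_tb(avg_package_mb, volumes, copies=1):
    """Total storage in decimal terabytes; copies is a hypothetical redundancy factor."""
    return avg_package_mb * volumes * copies / 1_000_000


def implied_colour_share(avg_mb, colour_mb, grey_mb):
    """Share of colour packages implied by the average package size,
    assuming only colour and greyscale packages exist."""
    return (avg_mb - grey_mb) / (colour_mb - grey_mb)


print(total_storage_tb(101, 600_000))  # → 60.6
```

On the same basis, all-colour and all-greyscale collections would bound the total at 112.2 TB and 49.2 TB. The implied colour share is 19/105, roughly 18%.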