DiggiCORE: Digging into Connected Repositories

This is a presentation I gave in Bristol describing the DiggiCORE project and the challenges it addresses.

  • What happens in the box: A metadata interoperability layer, metadata, content, enrichment, presentation layer
  • All text mining takes place at this phase

Transcript

  • 1. DiggiCORE: Digging into Connected Repositories. Petr Knoth, Knowledge Media Institute, The Open University
  • 2. Outline
    1. Connecting by aggregating Open Access (OA) publications
       • Why aggregate and who is it for
       • The added value of aggregations
    2. The CORE system
    3. Supporting research in mining databases of scientific publications
  • 3. Outline
    1. Connecting by aggregating Open Access (OA) publications
       • Why aggregate and who is it for
       • The added value of aggregations
    2. The CORE system
    3. Supporting research in mining databases of scientific publications
  • 4. The rapid rise of OA articles. The graph (from Laakso and Björk's paper, BMC Medicine 2012, 10:124) shows the number of papers published in three different types of online open access journals from 2000 to 2011.
  • 5. Growth of Open Access repositories
  • 6. Why do we need aggregations? “Each individual repository is of limited value for research: the real power of Open Access lies in the possibility of connecting and tying together repositories, which is why we need interoperability. In order to create a seamless layer of content through connected repositories from around the world, Open Access relies on interoperability, the ability for systems to communicate with each other and pass information back and forth in a usable format. Interoperability allows us to exploit today's computational power so that we can aggregate, data mine, create new tools and services, and generate new knowledge from repository content.” [COAR manifesto]
  • 7. Access to information according to the level of abstraction [Diagram. Labels: Repository, Metadata, Content, Metadata Transfer Interoperability, Aggregation, Semantic Enrichment, Interfaces, OLTP, OLAP, Raw data access, Transaction information access, Analytical information access]
  • 8. Who should be supported by aggregations? The following user groups (divided according to the level of abstraction of information they need):
    • Raw data access: developers, DLs, DL researchers, companies …
    • Transaction information access: researchers, students, life-long learners …
    • Analytical information access: funders, government, business intelligence …
  • 9. What is it all about?
  • 10. Outline
    1. Connecting by aggregating Open Access (OA) publications – why, how, what for?
    2. The CORE system
    3. Supporting research in mining databases of scientific publications
  • 11. CORE objective: CORE aims to provide a technical infrastructure for Open Access scholarly publications that will support access and reuse of scholarly materials at different levels of abstraction.
  • 12. CORE functionality: content harvesting, processing
  • 13. CORE functionality: semantic enrichment
  • 14. CORE functionality: providing services
  • 15. What does CORE provide at different access levels? [Diagram. Labels: Repository, Metadata, Content, Metadata Transfer Interoperability, Aggregation, Enrichment, Interfaces, OLTP, OLAP, Repository Analytics, Analytical information access, CORE Portal, CORE Mobile, CORE Plugin, Transaction information access, CORE API, Raw data access]
  • 16. CORE Applications: CORE Portal – allows searching and navigating scientific publications aggregated from Open Access repositories.
  • 17. CORE Applications: CORE Mobile – allows searching and navigating scientific publications aggregated from Open Access repositories.
  • 18. CORE Applications: CORE Plugin – a plugin for systems that provides recommendations for related items.
  • 19. CORE Applications: Repository Analytics – an analytical tool supporting providers of open access content (in particular repository managers).
  • 20.
  • 21. CORE Applications: CORE API – enables external systems and services to interact with the CORE repository.
    • Search service
    • PDF and plain text service
    • Similarity service
    • Classification service
    • Citation service
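
The slides do not say how the similarity service scores documents. As a rough illustration only, the kind of computation such a service could perform is cosine similarity over term-frequency vectors. The sketch below is an assumption for illustration, not CORE's algorithm or API; the function names `tokenize`, `cosine_similarity` and `most_similar` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-frequency vectors of two texts."""
    tf_a, tf_b = Counter(tokenize(text_a)), Counter(tokenize(text_b))
    # Only terms present in both texts contribute to the dot product.
    dot = sum(tf_a[term] * tf_b[term] for term in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(count * count for count in tf_a.values()))
    norm_b = math.sqrt(sum(count * count for count in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


def most_similar(query, documents, top_n=3):
    """Rank documents (a dict of id -> text) by similarity to the query text."""
    scored = [(doc_id, cosine_similarity(query, text))
              for doc_id, text in documents.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

A production service would more likely weight terms (for example with TF-IDF) and work on full texts rather than short strings; plain term frequency is used here only to keep the sketch self-contained.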
  • 22. CORE Applications: CORE API registered users:
    • British Education Index
    • Cottagelabs
    • UKCORR
    • Europeana
    • ULCC
    • Library, The Open University
    • Los Alamos National Laboratory, USA
    • University of Manchester Library
    • Universidad de los Andes, Bogotá, Colombia
    • UNESCO
  • 23. CORE visits (October 2012): more than 6000 visits per day
  • 24. Outline
    1. Connecting by aggregating Open Access (OA) publications – why, how, what for?
    2. The CORE system
    3. Supporting research in mining databases of scientific publications
  • 25. Objective: software for exploration and analysis of very large and fast-growing amounts of research publications stored across Open Access Repositories (OAR).
  • 26. DiggiCORE networks. Three networks: (a) semantically related papers, (b) citation network, (c) author citation network
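
To make the relationship between networks (b) and (c) concrete, here is a minimal sketch of how an author citation network can be derived from a paper citation network. The toy records, the field names `authors` and `cites`, and both function names are hypothetical and chosen for illustration; the slides do not describe DiggiCORE's actual data model.

```python
from collections import defaultdict

# Hypothetical toy records: paper id -> its authors and the papers it cites.
papers = {
    "p1": {"authors": ["Ann"], "cites": ["p2", "p3"]},
    "p2": {"authors": ["Bob", "Cy"], "cites": ["p3"]},
    "p3": {"authors": ["Ann", "Dee"], "cites": []},
}


def citation_network(papers):
    """Directed paper-level edges (citing, cited), skipping unknown targets."""
    return {
        (citing, cited)
        for citing, record in papers.items()
        for cited in record["cites"]
        if cited in papers
    }


def author_citation_network(papers):
    """Weighted author-level edges: how often one author cites another."""
    weights = defaultdict(int)
    for citing, cited in citation_network(papers):
        # Every author of the citing paper cites every author of the cited paper.
        for source in papers[citing]["authors"]:
            for target in papers[cited]["authors"]:
                weights[(source, target)] += 1
    return dict(weights)
```

In this sketch self-citations are kept as ordinary edges (here `("Ann", "Ann")`, because p1 cites p3); whether to keep or drop them is an analysis decision the slides leave open.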
  • 27. The problem of result transparency: Google Scholar, Microsoft Academic Search
  • 28. DiggiCORE objectives: allow researchers to use this platform to analyse publications. Why?
    • To identify patterns in the behaviour of research communities
    • To detect trends in research disciplines
    • To gain new insights into the citation behaviour of researchers
    • To discover features that distinguish papers with high impact
  • 29. Questions the system can help answer
    • What are the attributes of high-impact publications?
    • Do these attributes differ in the humanities, social sciences and computer sciences?
    • What are the features of research groups within disciplines and how do these features relate to contributions generated by the group?
    • What are the attributes of high-impact authors and what is their role within the group?
    • What are the dynamics of successful research groups?
  • 30. Questions the system can help answer
    • What is the mechanism of cross-fertilisation within disciplines, especially between the humanities and the sciences?
    • Who are the authors whose work is worth monitoring because they contribute to the achievements of their own discipline and also inspire other disciplines?
    • How should the novice in the discipline get acquainted with key achievements in the discipline?
    • How should he/she search for the most important publications?
  • 31. Challenges
    • Technical issues of quick Open Access harvesting
    • Lack of understanding of Open Access licenses among publishers and academics
    • Explaining the added value of full-text vs metadata aggregations:
      • User experience
      • Text-mining
  • 32. The power of full-text aggregations (WorldCat vs CORE)
  • 33. Text-mining: “There are currently over 144,000 full time equivalent academic professionals (teaching and research) working in UK higher education. Using data from the Higher Education Statistics Agency (HESA) for UK academic salaries, the median salary for a UK academic falls into a band of between £42k and £55k, which translates to between £26 and £33 per working hour. If text mining enabled just a 2% increase in productivity – corresponding to only 45 minutes per academic per working week (and looking at CIBER’s analysis of the impact of eJournals, this is very much an underestimate), this would imply over 4.7 million working hours and additional productivity worth between £123.5m and £156.8m in working time per year.” [McDonald & Kelly, 2012] – JISC report on text-mining
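
The arithmetic behind the quoted figures can be retraced in a few lines. The headcount, the minutes saved and the hourly rates come from the quote; the number of working weeks per year is not stated there, so the value of 44 used below is an assumption, picked because it closely reproduces the quoted totals.

```python
ACADEMICS_FTE = 144_000        # from the quoted report
MINUTES_SAVED_PER_WEEK = 45    # from the quoted report (the 2% gain)
HOURLY_RATE_RANGE = (26, 33)   # GBP per working hour, from the quoted report
WORKING_WEEKS_PER_YEAR = 44    # assumption: not stated in the quote

# Total hours saved across all academics in one year.
hours_saved = ACADEMICS_FTE * MINUTES_SAVED_PER_WEEK / 60 * WORKING_WEEKS_PER_YEAR

# Value of those hours at the low and high ends of the hourly rate band.
value_low, value_high = (hours_saved * rate for rate in HOURLY_RATE_RANGE)

print(f"Hours saved per year: {hours_saved:,.0f}")
print(f"Value: GBP {value_low / 1e6:.1f}m to GBP {value_high / 1e6:.1f}m")
```

Under that assumption the total is 4,752,000 hours, worth about £123.6m to £156.8m, which agrees with the quoted "over 4.7 million working hours" and "£123.5m and £156.8m" up to rounding. Note also that 45 minutes being 2% of the week implies a 37.5-hour working week.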
  • 34. Cost of Gold OA http://rossmounce.co.uk/2012/09/04/the-gold-oa-plot-v0-2/
  • 35. Summary
    • Aggregations should serve the needs of different user groups.
    • Transparency is crucial.
    • Machine access to publications provides lots of new opportunities.
    • We can have many services that are part of the infrastructure, but they should work with the same data.
    • CORE aims to
      • prepare the way for innovative open access services
      • demonstrate the benefits of programmable access to publications
      • data mine publications for impact characteristics
  • 36. Partners, Advisory Board
  • 37. Questions?
  • 38.