Web-Scale Discovery: From start to for sale in one year

A presentation on building a web-scale discovery solution in one year delivered at the 2010 Access Conference in Winnipeg.

  • Here is the content provider
  • Here is how they transmit the data. It flows in one direction, but not always reliably
  • Then you have to acquire the data and try to make some sense of it
  • Then you need to refine the data so that it can be used
  • Then you need to provide the data to your users in a relevant fashion
  • Overall, a very messy job

Presentation Transcript

  • WEB-SCALE DISCOVERY
    FROM START TO FOR-SALE IN ONE YEAR
  • It ain’t easy
    “We will so feel their pain, but I hope technology and content provider engagement have improved to make it a bit easier for them!”
    Miriam Blake, Los Alamos National Laboratory Research Library, writing on Code4Lib on Jun 30, 2010, about Summon and other discovery tools in light of her library's own experience building a unified index.
    2
  • She goes on to say…
    “Aside from the contracts, I can also attest to the major amount of work it has been. We have 95M bibliographic records, stored in > 75TB of disk, and counting. Its all running on SOLR, with a local interface and the distributed aDORe repository on backend. ~ 2 FTE keep it running in production now.”
    3
  • Did she say 95 Million Records?
    That’s a drop in the bucket for what a “Unified Discovery Index” needs to provide to be successful.
    4
  • Requirements
    Strong Publisher Relations
    Plenty of funding
    Fresh team with lots of experience
    5
  • Content Acquisitions
    Met with hundreds of content providers from all over the globe
    Nearly 7000 publishers represented in Summon index
    6
  • Content Acquisition Methods
    7
  • Merged Deduplication
    8
  • Data Normalization
    Cleanup is important
    Dates
    Does October 15, 2010 = Fall 2010?
    Author Names
    Publication Title
    Is it really a journal article or a book review?
    Is an obituary really a newspaper article?
    9
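The date question on the slide above can be made concrete with a minimal sketch. It assumes one possible approach, not necessarily the one used for the Summon index: turn every date string into a start/end range, then treat two dates as compatible when the ranges overlap. The helper names `to_date_range` and `may_match` are hypothetical, and the season boundaries are an assumed month-based, northern-hemisphere convention; winter is left out because it spans two calendar years.

```python
import calendar
import re
from datetime import date, datetime

# Assumed season boundaries as (first month, last month). This is an
# illustrative convention only; real publisher data would need its own rules.
SEASONS = {"spring": (3, 5), "summer": (6, 8), "fall": (9, 11), "autumn": (9, 11)}


def to_date_range(text):
    """Return the (start, end) dates a date string covers, or None if unknown."""
    text = text.strip()
    # An exact date such as "October 15, 2010" covers a single day.
    try:
        day = datetime.strptime(text, "%B %d, %Y").date()
        return day, day
    except ValueError:
        pass
    # A season issue such as "Fall 2010" covers several months.
    match = re.fullmatch(r"([A-Za-z]+)\s+(\d{4})", text)
    if match and match.group(1).lower() in SEASONS:
        first_month, last_month = SEASONS[match.group(1).lower()]
        year = int(match.group(2))
        last_day = calendar.monthrange(year, last_month)[1]
        return date(year, first_month, 1), date(year, last_month, last_day)
    return None


def may_match(a, b):
    """True when both strings parse and their date ranges overlap."""
    range_a, range_b = to_date_range(a), to_date_range(b)
    if range_a is None or range_b is None:
        return False
    return range_a[0] <= range_b[1] and range_b[0] <= range_a[1]


print(may_match("October 15, 2010", "Fall 2010"))
print(may_match("October 15, 2010", "Spring 2010"))
```

Under these assumed boundaries the first pair overlaps and the second does not. An overlap only says the two records could describe the same issue; other fields such as author names and publication title would still have to agree.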
  • Complex Indexing Models
    Planning
    Maintenance
    Hardware
    Planning
    Analysis
    Planning
    10
  • Better example
    Let’s pretend Data is Crude Oil
    11
  • Content Provider
    12
  • Content Acquisition
    13
  • Content Acquisition and Cleanup
    14
  • 15
  • Relevancy
    16
  • Very Messy Job
    17