SlideShare is now on Android. 15 million presentations at your fingertips.  Get the app

×
  • Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
 

Indexing big data in the cloud

by on May 14, 2012

  • 1,467 views

Amazon Web Services offers a quick and easy way to build a scalable search platform, a flexibility is especially useful when an initial data load is required but the hardware is no longer needed for ...

Amazon Web Services offers a quick and easy way to build a scalable search platform, a flexibility is especially useful when an initial data load is required but the hardware is no longer needed for day-to-day searching and adding new documents. This presentation will cover one such approach capable of enlisting hundreds of worker nodes to ingest data, track their progress, and relinquish them back to the cloud when the job is done. The data set that will be discussed is the collection of published patent grants available through Google Patents. A single Solr instance can easily handle searching the roughly 1 million patents issued between 2010 and 2005, but up to 50 worker nodes were necessary to load that data in a reasonable amount of time. Also, the same basic approach was used to make three sizes of PNG thumbnails of the patent grant TIFF images. In that case 150 worker nodes were used to generate 1.6 Tb of data over the course of three days. In this session, attendees will learn how to leverage EC2 as a scalable indexer and tricks for using XSLT on very large XML documents.

Statistics

Views

Total Views
1,467
Views on SlideShare
1,466
Embed Views
1

Actions

Likes
0
Downloads
8
Comments
0

1 Embed 1

https://si0.twimg.com 1

Accessibility

Categories

Upload Details

Uploaded via SlideShare as Microsoft PowerPoint

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
Post Comment
Edit your comment

Indexing big data in the cloud Indexing big data in the cloud Presentation Transcript