How is the Government Spending Your Money? How GCE is Using Lucene and the GCE Big Data Cloud

Presented by Seshu Simhadri | Global Computer Enterprises - See conference video - http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012

A leader in bringing innovative technologies to the Federal Government, GCE looks to open source tools to drive down cost and provide the foundation for building value-added services for its customers. This talk will discuss GCE’s use of Lucene/Solr combined with the GCE Big Data Cloud to open up access to Federal spending data. This data is in wide use across the Federal government, the Federal contracting community, the media and press, and Capitol Hill. GCE has used this toolset to deliver the kind of capability that users typically find only in consumer web applications. This session will highlight the technical challenges of implementing these tools across a large user community and data set in a Cloud environment.

Transcript

  • 1. Lucene in the Cloud: Leveraging the Power of Search and Big Data to Shed Light on Government Spending. Seshubabu Simhadri, Chief Technology Officer, GCE. Confidential, Do Not Disclose. Property of Global Computer Enterprises, Inc.
  • 2. Overview: Background; What is USASpending.gov?; Moving to our Big Data cloud; Some of the design decisions (tool selection, cluster design, hardware design); Limitations and enhancements.
  • 3. What is USASpending.gov?
  • 4. U.S. Government Spending vs. Other Entities
  • 5. Distribution of U.S. Government Spending
  • 6. What can users do on the site? Analytics (stats, top-K); free text search (with auto-suggestions); large data feeds; APIs.
  • 7. Who are the users of the site? Public; media; Congress; value-added resellers.
  • 8. GCE Big Data and Analytics Cloud: Leveraging the industry-leading open source platform to deliver cost savings and scalability within a Cloud computing model.
  • 9. What’s Inside the GCE Cloud? Start by looking at the usual suspects: Hadoop (for indexing and downloads); distributed Solr (analytics, free text search); Drupal static content; visualization.
  • 10. Solr Node Sizing: The greatest challenge is how to optimally design a node. Which combination of CPUs, memory, and shard size delivers the desired performance?
  • 11. Solr Node Sizing: Multiple index types (different types of spending, varying sizes). Break the complete dataset into shards as small as required to meet the response times; choose shard size based on response times. Single Solr instance with multiple cores, or multiple Solr instances each with a single core?
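The sizing rule on slide 11 (shrink shards until the response-time target is met) can be sketched as a search over candidate shard sizes. This is an illustrative sketch only, not GCE's procedure: `pick_shard_size`, `shards_needed`, and `fake_latency_ms` are hypothetical names, and the latency model stands in for a real benchmark against a Solr shard.

```python
from typing import Callable, Iterable, Optional


def pick_shard_size(
    candidate_sizes: Iterable[int],
    measure_latency_ms: Callable[[int], float],
    target_ms: float,
) -> Optional[int]:
    """Return the largest candidate shard size (in documents) whose
    measured latency meets the target, or None if no candidate does."""
    for size in sorted(candidate_sizes, reverse=True):
        if measure_latency_ms(size) <= target_ms:
            return size
    return None


def shards_needed(total_docs: int, shard_size: int) -> int:
    """Number of shards required to hold total_docs (integer ceiling)."""
    return (total_docs + shard_size - 1) // shard_size


def fake_latency_ms(docs: int) -> float:
    # Hypothetical model for illustration: 20 ms fixed overhead
    # plus 40 ms per million documents. Replace with a real benchmark.
    return 20.0 + 40.0 * docs / 1_000_000


if __name__ == "__main__":
    size = pick_shard_size([1_000_000, 5_000_000, 10_000_000], fake_latency_ms, 250.0)
    print(size, shards_needed(48_000_000, size))
```

Picking the largest size that still meets the target keeps the shard count, and so the fan-out per query, as low as the latency budget allows.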
  • 12. Solr Cluster Design: How do you design the cluster? Which nodes are individual nodes and which are aggregators?
  • 13. Solr Cluster Design: Should all shards be treated equal? User → Aggregator Nodes → Shards. Different requirements for nodes collecting the data and nodes serving a specific dataset. Aggregator nodes 1, 2, 3, …, m: large Solr instances, no local index. Shard nodes 1, 2, 3, …, 100, …, n: small Solr instances with index.
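The User → Aggregator → Shards flow on slide 13 maps onto Solr's distributed search, where a request sent to one node lists the shards to fan out to in the `shards` parameter. The sketch below only builds such a request URL. The host names, the `/solr/select` path, and the helper name `build_distributed_query` are assumptions for illustration; the slides do not show GCE's actual endpoints.

```python
from urllib.parse import urlencode


def build_distributed_query(aggregator, shard_hosts, query, rows=10):
    """Build a Solr select URL that asks `aggregator` to fan the query
    out to every host in `shard_hosts` and merge the results."""
    params = {
        "q": query,
        "rows": rows,
        "wt": "json",
        # Solr's distributed-search parameter: comma-separated shard addresses.
        "shards": ",".join(shard_hosts),
    }
    return f"http://{aggregator}/solr/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical hosts: one aggregator with no local index, two shard nodes.
    url = build_distributed_query(
        "aggregator1:8983",
        ["shard1:8983/solr", "shard2:8983/solr"],
        "agency:DOD",
    )
    print(url)
```

Because the aggregator only merges results, it can be sized for memory and network rather than for index storage, which matches the split between large aggregator instances and small shard instances described above.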
  • 14. What configuration did we choose? Separate Solr instances; multiple hard drives per server; solid state disks; InfiniBand.
  • 15. Solr Enhancements: Enhanced faceting, enabling aggregation by more than one field. Will be contributed to the Solr project.
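Slide 15 gives no detail on how the enhanced faceting works. As a rough illustration of what "aggregation by more than one field" means in a sharded setup, the sketch below totals a numeric field per combination of field values on each shard and then merges the per-shard totals, as an aggregator would. It is plain Python, not Solr code and not GCE's implementation; the field names (`agency`, `state`, `amount`) and sample records are invented.

```python
from collections import Counter


def facet_by_fields(docs, fields, value_field):
    """Sum `value_field` for each distinct combination of `fields`."""
    totals = Counter()
    for doc in docs:
        key = tuple(doc[f] for f in fields)
        totals[key] += doc[value_field]
    return totals


def merge_shard_facets(per_shard_totals):
    """Combine per-shard totals into one result (the aggregator's job)."""
    merged = Counter()
    for totals in per_shard_totals:
        merged.update(totals)  # Counter.update adds values for matching keys
    return merged


if __name__ == "__main__":
    # Invented sample data, one list per shard.
    shard_a = [
        {"agency": "DOD", "state": "VA", "amount": 100},
        {"agency": "DOD", "state": "VA", "amount": 50},
        {"agency": "HHS", "state": "MD", "amount": 70},
    ]
    shard_b = [
        {"agency": "DOD", "state": "VA", "amount": 25},
        {"agency": "HHS", "state": "VA", "amount": 10},
    ]
    fields = ("agency", "state")
    merged = merge_shard_facets(
        facet_by_fields(s, fields, "amount") for s in (shard_a, shard_b)
    )
    print(merged.most_common(1))
```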
  • 16. Solr Data Importer: Why Not? As the number of shards increases, managing the SQL queries inside Solr becomes a challenge. External data importer using Hadoop.
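One thing an external importer has to decide is which shard each record belongs to. The slides do not say how GCE routes records, so the sketch below assumes a simple stable-hash scheme purely for illustration; `shard_for`, `partition`, and the `id` field are hypothetical. In a Hadoop job this kind of logic would typically sit in the partitioning step.

```python
import hashlib
from collections import defaultdict


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard index using a stable hash.
    (Python's built-in hash() is randomized per process, so it is
    not suitable for routing that must be repeatable across runs.)"""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def partition(records, num_shards, id_field="id"):
    """Group records by target shard so each group can be indexed separately."""
    buckets = defaultdict(list)
    for record in records:
        buckets[shard_for(record[id_field], num_shards)].append(record)
    return dict(buckets)


if __name__ == "__main__":
    # Invented records for illustration.
    records = [{"id": f"award-{i}", "amount": i * 10} for i in range(20)]
    for shard, docs in sorted(partition(records, 4).items()):
        print(shard, len(docs))
```

Keeping the routing outside Solr means the per-shard SQL configuration does not have to be maintained on every Solr instance, which is the management problem the slide points to.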
  • 17. Utilizing Large Commodity Servers: Solr in the Cloud required building a cost-effective, high-performance infrastructure. Small vs. large commodity servers.
  • 18. Disadvantages of higher-capacity servers: Failure of one node results in failure of multiple shards; careful design is required.
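The failure concern on slide 18 can be made concrete by checking, for a given layout, which shards disappear when one server goes down. The slides do not describe GCE's mitigation; the sketch below assumes, as one possible "careful design", that copies of a shard may be placed on more than one server. Server and shard names are invented.

```python
def lost_shards(placement, failed_server):
    """Return the shards that have no surviving copy once
    `failed_server` goes down.

    `placement` maps server name -> list of shard names hosted there.
    """
    surviving = set()
    for server, shards in placement.items():
        if server != failed_server:
            surviving.update(shards)
    return set(placement[failed_server]) - surviving


if __name__ == "__main__":
    # No copies: losing one large server takes three shards offline.
    packed = {"big1": ["s1", "s2", "s3"], "big2": ["s4", "s5", "s6"]}
    print(sorted(lost_shards(packed, "big1")))

    # Assumed layout with every shard copied onto a second server.
    spread = {
        "big1": ["s1", "s2", "s3"],
        "big2": ["s3", "s4", "s1"],
        "big3": ["s2", "s4"],
    }
    print(sorted(lost_shards(spread, "big1")))
```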
  • 19. Summary: Sharded architecture; multiple Solr instances per server, each handling small datasets; aggregator nodes + shards; Hadoop for data indexing and data feeds; large commodity servers (48-core, 256 GB RAM, SSD, InfiniBand).
  • 20. Come build the future of Big Data. GCECloud.com. We’re hiring!
  • 21. Questions? ssimhadri at GCECloud.com. Visit us at www.GCECloud.com