Building Search Engines

Search engines take a user's query, collect matching information, and present it to the user. How do you go about building a search engine in the first place?

Published in: Internet

  1. Building Search Engines: Comparing Lucene / SolR / Elastic & Cloud Search Providers
  2. Business Platform Success: We design, build, and manage business platforms by leveraging DataStax, Sitecore, Salesforce, Quickbooks and other cloud software.
  3. Agenda
     • Challenge - Why does this matter?
     • Info Retrieval - Retrieval / Routing
     • Lucene - More than meets the eye ...
     • Search Engine - 30k Foot View
     • On Premise - Lucene / SolR / Elastic
     • Cloud Providers - Amazon / Azure
  4. Challenge - Why does this matter?
     [Diagram labels. Categories: Knowledge, Project Information, Client Service Information, Corporate Guides, Collaborative Documents, Assets & Files, Corporate Resources. Tools: Appleseed Framework (Portal, Base, Search), G Drive, Delta, DropBox, Nutshell, Freshbooks, G Sites (KB), Workflowy, Evernote, OwnCloud, Pocket, Leaves, AIC (WP), Anant (WP).]
  5. Information Retrieval
     ● Document Retrieval
       ○ Google Search
       ○ Amazon Search
       ○ LinkedIn Search
       ○ *CMS Search
       ○ *Portal Search
       ○ *CRM Search
       ○ * Search
     ● Document Routing
       ○ Google Alerts
       ○ Amazon’s Recommendations
       ○ Netflix Recommendations
       ○ LinkedIn Recommendations
  6. Lucene - Inverted Index
  7. Lucene - More than meets the eye
     Who Next? Think of it like a “NoSQL” database that has great indexing everywhere.
  8. Search Engine - 30 Thousand Foot View
     The search index is only as good as your processed data. If you put everything you find in your index, you are going to spend a lot of time telling people how to search.
  9. On Premise - Lucene / ES / SolR
     Lucene
       • Library
       • File System
       • Format
       • Fast
       • Embeddable*
       • Indexing Anywhere
       • Need to really know Lucene
       • No Interface
       • No server
       • Lots of housekeeping
     SolR
       • Server
       • Admin / REST Interface
       • Configurable
       • Scalable
       • Great at Text*
       • Truly Open
       • 10+ Years
       • Good ecosystem
       • Too customizable
       • Schemas*
       • Zookeeper Needed
     ElasticSearch
       • Server
       • Configurable
       • Scalable
       • Good ecosystem
       • Built in Clustering
       • Grouping / Filtering
       • Great for Logs
       • Started as a Cloud Tool
       • No great OTS Interface
       • Only REST Interface
  10. Cloud Search - Amazon / Azure
      Amazon
        • SolRCloud*
        • AWS* Ecosystem
        • 5 QParsers
        • Dynamic Fields
        • 100% Completely Managed
        • Been Around for a While
        • Data / Read Writes
        • No nested Objects
      Azure
        • ElasticSearch*
        • Azure* Ecosystem
        • 2 QParsers
        • 100% Completely Managed
        • Good SDK
        • Few Years Old
        • Data / Read Writes
        • No nested Objects
        • Not so Dynamic Fields
  11. Data & Analytics: Cassandra, DataStax, Kafka, Spark
      Customer Experience: Sitecore
      Information Systems: Salesforce, Quickbooks, and more
      www.anant.us | solutions@anant.us | (855) 262-6826
      3 Washington Circle, NW | Suite 301 | Washington, DC 20037
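
Slide 6 names the inverted index, and slide 8 stresses that the index is only as good as the processed data. Here is a minimal sketch of both ideas in plain Python. It is not Lucene's actual implementation or API. The stop-word list, the sample documents, and the names `tokenize`, `build_index`, and `search` are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is"}


def tokenize(text):
    """Lowercase the text, split on non-alphanumerics, drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOP_WORDS]


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # index.get avoids creating empty entries for unknown terms.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Made-up sample documents.
docs = {
    1: "Lucene is a search library",
    2: "Solr is a search server built on Lucene",
    3: "Cloud search is a managed service",
}
index = build_index(docs)
print(sorted(search(index, "lucene search")))  # [1, 2]
```

The lookup goes from term to documents rather than scanning every document per query, which is what makes the structure "inverted". The `tokenize` step stands in for the data processing slide 8 refers to: whatever it lets through ends up searchable.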
