Intro to Search

A one-hour intro to search, Apache Lucene and Solr, and LucidWorks Search. Contains a quick start with LucidWorks Search and a demo using financial data (see the GitHub project: http://bit.ly/lws-financial), as well as some basic vocabulary and search explanations.

Published in: Technology
  • The bar is raised: when we first started Lucid, the problems were all about standing up Lucene or Solr or dealing with performance issues. Now the large majority of them are about taking search to the next level: better relevance, personalization, recommendations, etc.
  • What is Lucene? What is Solr?
  • Service-Oriented Architecture; Stateless; Failover/Fault Tolerant; Lightweight Coordination and Messaging; Smart about Updates; Document store is: Distributed, Scalable; Analysis: Batch, Near Real-Time
  • Intro to Search

    1. Intro to Search. Grant Ingersoll, CTO, LucidWorks (@gsingers). © Copyright 2013
    2. Search is dead, long live search
       • Search is Everywhere!
       • The Bar is Raised
         - Keyword search is a commodity
       • Holistic view of the data AND the users is critical
       • Scalable Search, Discovery and Analytics are the key to unlocking this view of users and data
       (Diagram labels: Documents User Interaction Access Content Relationships)
       © 2013 LucidWorks
    3. Search is good for…
       • Traditional: Fast, fuzzy text matching across a large document collection
       • De-normalized data
         - “light” relational
       • Top N problems
         - Key-value (top 1)
         - Recommendations
         - “Good enough” classification, clustering
       • Faceting, slicing and dicing of enumerated data
       • Spatial, spell checking, record linkage, highlighting
       • NoSQL
    4. Common Use Cases
       • eCommerce
         - Search + Recs + Analysis of users
       • Knowledge Management
         - Financial, transportation, pharma
       • Fraud detection
       • Social media
         - Trend monitoring
       • Information technology
         - Log monitoring, analysis
       • Healthcare
         - DNA Analysis
    5. http://bit.ly/get-lws
    6. Topics
       • Intros
       • First 5 Minutes with LucidWorks Search (Solr++)
       • Search Concepts
       • Demo Deep Dive
       • Level Up
       • Resources
    7. LucidWorks Overview
       › Founded in 2007 to be the go-to company for Lucene/Solr expertise
       › 250+ customers (many Fortune 500)
       › 100% year-over-year growth
       › Over 40% of the active Apache Lucene/Solr Committers
       › Host of the fast-growing Lucene/Solr Revolution User Conference (400+ attendees)
    8. LucidWorks Product Suite
       • [Product name not captured in the transcript]
         - Description: Massively adopted open source search technology
         - Version: Version 4.3 released May 2013
         - LucidWorks Offering: Annual Support Subscriptions; Professional Services; Training; Inside Sales Model
       • LucidWorks Search
         - Description: Enterprise Search platform built on Lucene/Solr
         - Version: Version 2.5 ships December 2012 GA
         - LucidWorks Offering: Free trial; On-prem or cloud; Inside sales model
       • LucidWorks Big Data
         - Description: Unified development platform for Big Data applications
         - Version: Version 1.1 released Feb. 2013
         - LucidWorks Offering: Free Trial; On-prem or cloud; Enterprise sales model
    9. 5 Minutes to Search
       1. Install LWS
          1. Unpack, double click to launch Installer
          2. Launch, wait for startup
       2. http://localhost:8989/
       3. Choose “Quick Start”
       4. Choose a Data Source
          1. For me: /Users/grantingersoll/Desktop/reading
       5. Quick Search
       6. Search with Flare
          1. http://localhost:8989/flare/catalog/quickstart
       7. Quick Changes:
          1. Add a Facet
          2. Change Display Results
    10. Prepare Deep Dive Demo
        1. https://github.com/LucidWorks/lws-financial-demo/blob/master/README.md
        2. cd src/main/python
        3. python setup.py -n setup -a TWITTER_ACCESS_TOKEN -c TWITTER_CONSUMER_KEY -s TWITTER_CONSUMER_SECRET -t TWITTER_ACCESS_TOKEN_SECRET -p ../../../data/sp500List-30.txt -A -l Finance --data_dir ../../../data
        4. python python.py
    11. • Java APIs for building search applications
        • Fast, efficient, flexible
        • Modules to add functionality:
          - Lang. Analysis
          - Faceting
          - Highlighting, spell checking
          - Much more
        • Lucene best practices
        • HTTP-based service
          - Many client bindings
        • Faceting
        • Distributed, fault-tolerant
        • Many No-SQL features
    12. LucidWorks Search Goals
        • IT Ready Open Source
          - Installation, provisioning, monitoring, administration, integration
        • Enterprise Grade
          - A robust connector framework
            » Including a wide assortment of prebuilt connectors to popular data sources
          - Enterprise security framework
            » Leverages SSL, LDAP, Active Directory
            » Document level access control
        • Business Friendly
          - Rich graphical administration console
            » Speeds up search application development, deployment and management
          - Expressive Business Logic
            » Processing information through filters for better, more accurate results
          - Relevancy Work Bench
        • Full power of Apache Lucene and Solr
    13. Reference Architecture (diagram; labels recovered in slide order): Shards 1, 2, 3, N; Search View (Documents, Users, Logs); Document Store; Analytic Services; View into numeric/historic data; Classification; Recommendation; Personalization & Machine Learning Services; Classification Models; In memory; Replicated; Multi-tenant; Discovery & Enrichment (clustering, classification, NLP, topic identification, search log analysis, user behavior); Content Acquisition (ETL, batch or near real-time); Access APIs; Data (LucidWorks Search connectors, Push)
    14. Basic Vocab
        • Documents
          - Fields
            » Tokens
              ▪ Payloads
        • Query
          - Many diff. kinds: term, phrase, regex, spatial, function
        • Facets & Filters
        • Collection
          - Index
            » Shard
              ▪ Segment
    15. Search Concepts: Indexing
    16. Search Concepts: Ranking
        • Search is optimized for solving top N problems
        • Hand Waving Algo:
          - Parse query
          - For Each Term
            » Look up documents containing term
          - Rank documents according to similarity
          - Return top X
    17. Search Concepts: Faceting
        • Dynamically slice and dice query results in a variety of ways:
          - Term
          - Range (date and numeric)
          - Pivot
          - Function
          - Multi-select
        • Gather Stats
    18. Demo Deep Dive
        • Application:
          - Stock Insights
          - Twitter Bootstrap + Python Flask + LWS
          - http://localhost:5000
        • Goals:
          - Explore data sources, scheduling, other features
          - Automate setup via script and LWS APIs
        • Data:
          - Company Info (Symbol, Company, Industry, City, State)
          - Twitter, websites
          - Historical Stock Prices from Y! Finance
        • http://github.com/lucidworks/lws-financial-demo
          - README covers setup
    19. Level Up
        • Explore our APIs:
          - http://bit.ly/lws-apis
        • Build your own UI or extend ours
        • Write a custom connector
        • Customize Solr!
        • Scale with SolrCloud
        • Explore Solr Marketplace:
          - http://bit.ly/solr-market
    20. Where to Next?
        • http://www.lucidworks.com
        • http://lucene.apache.org/solr
        • Training: http://bit.ly/lws-training
        • LWS more info: http://bit.ly/lws-more-info
        • LWS Documentation: http://bit.ly/lws-docs
        • Twitter: @gsingers, @LucidWorks
        • Taming Text: http://www.manning.com/ingersoll
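
Slide 11 describes Solr as an HTTP-based service with many client bindings. As a minimal sketch of what that can look like from a client, the Python below builds a query URL for Solr's standard `/select` handler using the `q`, `rows`, and `wt` parameters. The host, port, collection name (`collection1`), and the `title` field are assumptions for illustration and do not come from the deck; a LucidWorks Search install may expose Solr at a different address.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, query, rows=10):
    """Build a URL for Solr's /select request handler.

    base_url and collection are placeholders supplied by the caller.
    """
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/{collection}/select?{params}"


# Hypothetical host, port, collection, and field name; adjust to your install.
url = build_select_url("http://localhost:8983/solr", "collection1", "title:lucene")
print(url)
```

Fetching that URL with any HTTP client would return the matching documents as JSON, which is why client bindings for Solr are easy to write in most languages.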
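
The "Hand Waving Algo" on slide 16 (parse the query, look up documents per term, rank by similarity, return the top X) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions: whitespace tokenization, an in-memory inverted index, and a plain TF-IDF sum as the similarity. It is not Lucene's actual scoring, and the sample documents and function names are made up for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, num_docs, query, top_x=3):
    """Score documents with a simple TF-IDF sum and return the top X ids."""
    scores = defaultdict(float)
    for term in tokenize(query):            # parse query
        postings = index.get(term, {})      # look up documents containing term
        if not postings:
            continue
        idf = math.log(1 + num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf      # accumulate similarity
    # Rank by score (descending), breaking ties by document id.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ranked[:top_x]]


# Made-up sample documents.
docs = {
    "d1": "solr is a search server built on lucene",
    "d2": "lucene is a java search library",
    "d3": "stock prices and company news",
}
index = build_index(docs)
results = search(index, len(docs), "java search", top_x=2)
print(results)
```

Only documents that contain at least one query term are ever scored, which is what makes the top-N approach cheap: the index narrows the candidate set before any ranking happens.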
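
Slide 17 lists term and range faceting as ways to slice query results. The sketch below shows the idea in plain Python over an in-memory result list: a term facet counts distinct field values, and a range facet counts values per fixed-width bucket. The field names, sample records, and helper functions are hypothetical, and integer field values are assumed; a search engine such as Solr computes these counts inside the index rather than in client code.

```python
from collections import Counter


def term_facet(results, field):
    """Count how many results carry each distinct value of a field."""
    return Counter(doc[field] for doc in results if field in doc)


def range_facet(results, field, start, end, gap):
    """Count results per [lo, lo + gap) bucket between start and end."""
    buckets = {lo: 0 for lo in range(start, end, gap)}
    for doc in results:
        value = doc.get(field)
        if value is None or not (start <= value < end):
            continue
        lo = start + ((value - start) // gap) * gap
        buckets[lo] += 1
    return buckets


# Made-up query results for illustration.
results = [
    {"symbol": "AAA", "industry": "Tech", "price": 12},
    {"symbol": "BBB", "industry": "Tech", "price": 48},
    {"symbol": "CCC", "industry": "Energy", "price": 55},
]
industry_counts = term_facet(results, "industry")
price_counts = range_facet(results, "price", 0, 100, 50)
print(industry_counts, price_counts)
```

The counts are computed over the current result set, so they change with every query; that is what lets a UI offer "narrow by industry" links that always reflect what the user is looking at.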
