© Copyright 2013
Intro to Search
Grant Ingersoll
CTO, LucidWorks
@gsingers
© 2013 LucidWorks
Search is dead, long live search
• Search is Everywhere!
• The Bar is Raised
- Keyword search is a commodity
• A holistic view of the data AND the users is critical
• Scalable Search, Discovery and Analytics are the key to unlocking this view of users and data
Diagram: Documents, User, Interaction, Access, Content, Relationships
Search is good for…
• Traditional: Fast, fuzzy text matching across a large document collection
• De-normalized data
- “light” relational
• Top N problems
- Key-value (top 1)
- Recommendations
- “Good enough” classification, clustering
• Faceting, slicing and dicing of enumerated data
• Spatial, spell checking, record linkage, highlighting
• NoSQL
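Fuzzy text matching is usually built on edit distance: a term matches if it can be turned into the query term with a small number of single-character insertions, deletions or substitutions. The sketch below shows the idea with plain Levenshtein distance and a brute-force scan of a vocabulary. Lucene's fuzzy queries use a much faster automaton-based approach; the vocabulary and the max_edits=2 cutoff here are illustrative assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def fuzzy_terms(query: str, vocabulary: list[str], max_edits: int = 2) -> list[str]:
    """Return vocabulary terms within max_edits of the query, closest first."""
    scored = sorted((levenshtein(query, term), term) for term in vocabulary)
    return [term for dist, term in scored if dist <= max_edits]


vocab = ["search", "starch", "research", "seared", "solr"]
print(fuzzy_terms("serch", vocab))  # ['search', 'starch']
```

"research" and "seared" are three edits away from "serch", so the cutoff excludes them.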
Common Use Cases
• eCommerce
- Search + Recs + Analysis of users
• Knowledge Management
- Financial, transportation, pharma
• Fraud detection
• Social media
- Trend monitoring
• Information technology
- Log monitoring, analysis
• Healthcare
- DNA Analysis
http://bit.ly/get-lws
Topics
• Intros
• First 5 Minutes with LucidWorks Search (Solr++)
• Search Concepts
• Demo Deep Dive
• Level Up
• Resources
LucidWorks Overview
› Founded in 2007 to be the go-to company for Lucene/Solr expertise
› 250+ customers (many Fortune 500)
› 100% year-over-year growth
› Over 40% of the active Apache Lucene/Solr committers
› Hosts the fast-growing Lucene/Solr Revolution user conference (400+ attendees)
LucidWorks Product Suite
• Apache Lucene/Solr
- Description: Massively adopted open source search technology
- Version: 4.3, released May 2013
- LucidWorks offering: annual support subscriptions, professional services, training; inside sales model
• LucidWorks Search
- Description: Enterprise search platform built on Lucene/Solr
- Version: 2.5, shipped December 2012
- LucidWorks offering: free trial; on-prem or cloud; inside sales model
• LucidWorks Big Data
- Description: Unified development platform for Big Data applications
- Version: GA version 1.1, released Feb. 2013
- LucidWorks offering: free trial; on-prem or cloud; enterprise sales model
5 Minutes to Search
1. Install LWS
   1. Unpack, then double-click to launch the installer
   2. Launch, wait for startup
2. Open http://localhost:8989/
3. Choose “Quick Start”
4. Choose a data source
   1. For me: /Users/grantingersoll/Desktop/reading
5. Quick Search
6. Search with Flare
   1. http://localhost:8989/flare/catalog/quickstart
7. Quick changes:
   1. Add a facet
   2. Change the display of results
Prepare Deep Dive Demo
1. Read https://github.com/LucidWorks/lws-financial-demo/blob/master/README.md
2. cd src/main/python
3. python setup.py -n setup -a TWITTER_ACCESS_TOKEN -c TWITTER_CONSUMER_KEY -s TWITTER_CONSUMER_SECRET -t TWITTER_ACCESS_TOKEN_SECRET -p ../../../data/sp500List-30.txt -A -l Finance --data_dir ../../../data
   (replace the TWITTER_* placeholders with your own Twitter credentials)
4. python python.py
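Step 3 can also be scripted so the credentials never have to be pasted on the command line. This is a minimal sketch, assuming the four Twitter credentials are exported as environment variables named after the slide's placeholders (an assumption; the demo README is the authority on what each flag means). build_setup_command and run_setup are hypothetical helper names, and passing an argument list to subprocess avoids shell-quoting problems with secrets.

```python
import os
import subprocess
from collections.abc import Mapping

# Flag/placeholder pairs exactly as they appear on the slide.
CREDENTIAL_FLAGS = [
    ("-a", "TWITTER_ACCESS_TOKEN"),
    ("-c", "TWITTER_CONSUMER_KEY"),
    ("-s", "TWITTER_CONSUMER_SECRET"),
    ("-t", "TWITTER_ACCESS_TOKEN_SECRET"),
]


def build_setup_command(creds: Mapping[str, str] = os.environ,
                        data_dir: str = "../../../data") -> list[str]:
    """Assemble the slide's setup.py invocation as an argument list.

    Assumes the credentials live in `creds` (the environment by default);
    raises KeyError if one is missing.
    """
    cmd = ["python", "setup.py", "-n", "setup"]
    for flag, name in CREDENTIAL_FLAGS:
        cmd += [flag, creds[name]]
    cmd += ["-p", f"{data_dir}/sp500List-30.txt", "-A", "-l", "Finance",
            "--data_dir", data_dir]
    return cmd


def run_setup(cwd: str = "src/main/python") -> None:
    """Run the setup step from the demo's Python directory (hypothetical helper)."""
    subprocess.run(build_setup_command(), cwd=cwd, check=True)
```

Call run_setup() from the repository root after exporting the four variables; the relative data paths then resolve exactly as in step 3.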
Apache Lucene
• Java APIs for building search applications
• Fast, efficient, flexible
• Modules to add functionality:
- Language analysis
- Faceting
- Highlighting, spell checking
- Much more
Apache Solr
• Lucene best practices
• HTTP-based service
- Many client bindings
• Faceting
• Distributed, fault-tolerant
• Many NoSQL features
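Because Solr is an HTTP service, any language with an HTTP client can query it. The sketch below uses only the Python standard library and Solr's standard /select handler with the q, rows and wt parameters. The base URL is an assumption: port 8983 and the collection1 core are the stock Solr 4.x example defaults, so adjust both for your own install.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: stock Solr 4.x example defaults; change for your install.
SOLR_BASE = "http://localhost:8983/solr/collection1"


def select_url(query: str, rows: int = 10, **extra: str) -> str:
    """Build a URL for Solr's /select handler, asking for a JSON response."""
    params = {"q": query, "rows": rows, "wt": "json"}
    params.update(extra)  # e.g. fl="id,title" to limit the returned fields
    return f"{SOLR_BASE}/select?{urlencode(params)}"


def search(query: str, rows: int = 10) -> list[dict]:
    """Send the query over HTTP and return the matching documents."""
    with urlopen(select_url(query, rows)) as resp:
        body = json.load(resp)
    return body["response"]["docs"]
```

For example, select_url("title:lucene", rows=5, fl="id,title") produces a URL that any HTTP client (or a browser) can send, which is exactly why Solr has so many client bindings.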
LucidWorks Search Goals
• IT Ready Open Source
- Installation, provisioning, monitoring, administration, integration
• Enterprise Grade
- A robust connector framework
» Including a wide assortment of prebuilt connectors to popular data sources
- Enterprise security framework
» Leverages SSL, LDAP, Active Directory
» Document-level access control
• Business Friendly
- Rich graphical administration console
» Speeds up search application development, deployment and management
- Expressive business logic
» Processes information through filters for better, more accurate results
- Relevancy Work Bench
• Full power of Apache Lucene and Solr
Reference Architecture
Diagram components, as labeled:
• Data
• Content Acquisition: ETL, batch or near real-time
- LucidWorks Search connectors
- Push
• Shards 1, 2, 3 … N
• Document Store
• Search View: documents, users, logs
• Analytic Services: view into numeric/historic data
• Discovery & Enrichment: clustering, classification, NLP, topic identification, search log analysis, user behavior
• Classification, Recommendation, Personalization & Machine Learning Services
- Classification models
• In memory, replicated, multi-tenant
• Access APIs
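The shard boxes in the diagram imply two jobs: routing each document to one shard at index time, and merging per-shard results at query time. The toy sketch below hashes the document id with CRC32 modulo the shard count and merges score-sorted shard results into a global top N. SolrCloud's default router instead hashes ids with MurmurHash3 into per-shard hash ranges, so treat this as an illustration of the idea, not SolrCloud's implementation. A stable checksum is used because Python's built-in hash() is salted per process for strings.

```python
import heapq
import itertools
import zlib
from collections import defaultdict


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard with a stable checksum (same answer in every process)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def route(doc_ids: list[str], num_shards: int) -> dict[int, list[str]]:
    """Group document ids by the shard that would index them."""
    shards: dict[int, list[str]] = defaultdict(list)
    for doc_id in doc_ids:
        shards[shard_for(doc_id, num_shards)].append(doc_id)
    return dict(shards)


def merge_top_n(per_shard: list[list[tuple[float, str]]],
                n: int) -> list[tuple[float, str]]:
    """Merge (score, doc_id) lists, each already sorted by descending score."""
    merged = heapq.merge(*per_shard, key=lambda hit: hit[0], reverse=True)
    return list(itertools.islice(merged, n))
```

Because every shard returns its own results already sorted, the coordinator only has to walk the heads of the lists until it has N hits.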
Basic Vocab
• Documents
- Fields
» Tokens
▪ Payloads
• Query
- Many different kinds: term, phrase, regex, spatial, function
• Facets & Filters
• Collection
- Index
» Shard
▪ Segment
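The document hierarchy at the top of this list can be pictured as nested data. The toy model below uses hypothetical Token and Document classes (not Lucene's API). It analyzes a field value into lowercase tokens with positions and lets a caller attach an optional byte payload to each token, mirroring the fact that Lucene payloads are arbitrary bytes stored per token position.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Token:
    text: str                        # normalized term text
    position: int                    # offset of the token within its field
    payload: Optional[bytes] = None  # optional bytes attached to this position


@dataclass
class Document:
    doc_id: str
    fields: dict[str, list[Token]] = field(default_factory=dict)

    def add_field(self, name: str, value: str,
                  payload_fn: Optional[Callable[[str], Optional[bytes]]] = None) -> None:
        """Analyze a raw value: split on word characters, lowercase, number positions."""
        words = re.findall(r"\w+", value)
        self.fields[name] = [
            Token(w.lower(), pos, payload_fn(w) if payload_fn else None)
            for pos, w in enumerate(words)
        ]


doc = Document("talk-1")
# Hypothetical payload: flag words that were capitalized in the original text.
doc.add_field("title", "Intro to Search",
              payload_fn=lambda w: b"CAP" if w[0].isupper() else None)
print([(t.text, t.position, t.payload) for t in doc.fields["title"]])
# [('intro', 0, b'CAP'), ('to', 1, None), ('search', 2, b'CAP')]
```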
Search Concepts: Indexing
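Indexing boils down to building an inverted index: a map from each term to the documents (and positions) where it appears, so a query looks up terms instead of scanning documents. This is a minimal sketch with a toy lowercase-and-split analyzer; real Lucene indexes add compression, segments and much richer analysis. Keeping positions is what makes phrase queries possible.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Toy analyzer: lowercase and split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """Map each term to {doc_id: [positions]}, i.e. a positional postings list."""
    index: dict[str, dict[str, list[int]]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return dict(index)


docs = {
    "d1": "Solr is built on Lucene",
    "d2": "Lucene is a Java search library",
}
index = build_inverted_index(docs)
print(index["lucene"])  # {'d1': [4], 'd2': [0]}
print(index["is"])      # {'d1': [1], 'd2': [1]}
```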
Search Concepts: Ranking
• Search is optimized for solving top N problems
• Hand-waving algorithm:
- Parse the query
- For each term
» Look up documents containing the term
- Rank documents according to similarity
- Return the top X
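The hand-waving algorithm translates almost line for line into code. This sketch scores with a simplified TF-IDF (square root of term frequency times squared inverse document frequency), loosely inspired by Lucene's classic similarity but without its length normalization and other factors, so it is not Lucene's exact formula. A heap returns the top X without sorting every match.

```python
import heapq
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Toy analyzer: lowercase and split on non-word characters."""
    return re.findall(r"\w+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, dict[str, int]]:
    """Map term -> {doc_id: term frequency}."""
    index: dict[str, dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return dict(index)


def search(query: str, index: dict[str, dict[str, int]], n_docs: int,
           top_x: int = 10) -> list[tuple[str, float]]:
    scores: dict[str, float] = defaultdict(float)
    for term in tokenize(query):                # parse the query
        postings = index.get(term)              # documents containing the term
        if not postings:
            continue
        idf = 1.0 + math.log(n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():     # accumulate a similarity score
            scores[doc_id] += math.sqrt(tf) * idf * idf
    # Keep only the top X with a heap instead of sorting every match.
    return heapq.nlargest(top_x, scores.items(), key=lambda item: item[1])


docs = {
    "d1": "Lucene is a search library",
    "d2": "Solr is a search server built on Lucene",
    "d3": "Flask is a Python web framework",
}
index = build_index(docs)
print([doc_id for doc_id, _ in search("lucene search server", index, len(docs))])
# ['d2', 'd1']
```

d2 wins because it matches all three query terms, including "server", the rarest and therefore highest-weighted one; d3 matches nothing and never enters the heap.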
Search Concepts: Faceting
• Dynamically slice and dice query results in a variety of ways:
- Term
- Range (date and numeric)
- Pivot
- Function
- Multi-select
• Gather Stats
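Term and range facets are, at heart, counts over the documents that matched a query, and stats are aggregates over a numeric field. The sketch below computes all three over an in-memory result list. The field names and sample records are hypothetical, loosely modeled on the demo's company data. Range buckets use the [lower, upper) convention that matches Solr's default range-facet behavior.

```python
import math
from collections import Counter


def term_facet(docs: list[dict], field: str, min_count: int = 1) -> list[tuple[str, int]]:
    """Count each distinct value of a field across the matched documents."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    return [(value, n) for value, n in counts.most_common() if n >= min_count]


def range_facet(docs: list[dict], field: str, start: float, end: float,
                gap: float) -> list[tuple[float, float, int]]:
    """Count numeric values in [lower, upper) buckets of width gap."""
    n_buckets = math.ceil((end - start) / gap)
    counts = [0] * n_buckets
    for doc in docs:
        value = doc.get(field)
        if value is not None and start <= value < end:
            counts[int((value - start) // gap)] += 1
    return [(start + i * gap, start + (i + 1) * gap, c) for i, c in enumerate(counts)]


def field_stats(docs: list[dict], field: str) -> dict[str, float]:
    """Min/max/mean/count of a numeric field across the matched documents."""
    values = [doc[field] for doc in docs if field in doc]
    if not values:
        return {"count": 0}
    return {"min": min(values), "max": max(values),
            "mean": sum(values) / len(values), "count": len(values)}


# Hypothetical query results (made-up symbols and prices).
results = [
    {"symbol": "AAA", "industry": "Banks", "price": 12.5},
    {"symbol": "BBB", "industry": "Software", "price": 48.0},
    {"symbol": "CCC", "industry": "Banks", "price": 31.0},
    {"symbol": "DDD", "industry": "Retail", "price": 55.0},
]
print(term_facet(results, "industry"))
print(range_facet(results, "price", 0, 60, 20))
print(field_stats(results, "price"))
```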
Demo Deep Dive
• Application:
- Stock Insights
- Twitter Bootstrap + Python Flask + LWS
- http://localhost:5000
• Goals:
- Explore data sources, scheduling, other features
- Automate setup via script and LWS APIs
• Data:
- Company Info (Symbol, Company, Industry, City, State)
- Twitter, websites
- Historical Stock Prices from Yahoo! Finance
• http://github.com/lucidworks/lws-financial-demo
- README covers setup
Level Up
• Explore our APIs:
- http://bit.ly/lws-apis
• Build your own UI or extend ours
• Write a custom connector
• Customize Solr!
• Scale with SolrCloud
• Explore Solr Marketplace:
- http://bit.ly/solr-market
Where to Next?
• http://www.lucidworks.com
• http://lucene.apache.org/solr
• Training: http://bit.ly/lws-training
• LWS more info: http://bit.ly/lws-more-info
• LWS Documentation: http://bit.ly/lws-docs
• Twitter: @gsingers, @LucidWorks
• Taming Text: http://www.manning.com/ingersoll


Editor's Notes

  • #3 The bar is raised: when we first started Lucid, the problems were all about standing up Lucene or Solr or dealing with performance issues. Now the large majority of them are about taking search to the next level: better relevance, personalization, recommendations, etc.
  • #12 What is Lucene? What is Solr?
  • #14
    - Service-oriented architecture
    - Stateless
    - Failover/fault tolerant
    - Lightweight coordination and messaging
    - Smart about updates
    - Document store is distributed and scalable
    - Analysis: batch and near real-time