Intro to Search

© Copyright 2013
Grant Ingersoll
CTO, LucidWorks
@gsingers

© 2013 LucidWorks
Search is dead, long live search
• Search is Everywhere!
• The Bar is Raised
- Keyword search is a commodity
• Holistic view of the data AND the users is critical
• Scalable Search, Discovery and Analytics are the key to unlocking this view of users and data
[Diagram labels: Documents, User Interaction, Access, Content Relationships]

Search is good for…
• Traditional: Fast, fuzzy text matching across a large document collection
• De-normalized data
- “light” relational
• Top N problems
- Key-value (top 1)
- Recommendations
- “Good enough” classification, clustering
• Faceting, slicing and dicing of enumerated data
• Spatial, spell checking, record linkage, highlighting
• NoSQL

Common Use Cases
• eCommerce
- Search + Recs + Analysis of users
• Knowledge Management
- Financial, transportation, pharma
• Fraud detection
• Social media
- Trend monitoring
• Information technology
- Log monitoring, analysis
• Healthcare
- DNA Analysis

http://bit.ly/get-lws
Topics
• Intros
• First 5 Minutes with LucidWorks Search (Solr++)
• Search Concepts
• Demo Deep Dive
• Level Up
• Resources

LucidWorks Overview
› Founded in 2007 to be the go-to company for Lucene/Solr expertise
› 250+ customers (many Fortune 500)
› 100% year-over-year growth
› Over 40% of the active Apache Lucene/Solr Committers
› Host fast-growing Lucene/Solr Revolution User Conference (400+ attendees)

LucidWorks Product Suite
• [Product name not captured in the extraction; apparently Apache Lucene/Solr]
- Description: Massively adopted open source search technology
- Version: 4.3, released May 2013
- LucidWorks offering: annual support subscriptions, professional services, training, inside sales model
• LucidWorks Search
- Description: Enterprise search platform built on Lucene/Solr
- Version: 2.5, ships December 2012
- LucidWorks offering: free trial, on-prem or cloud, inside sales model
• LucidWorks Big Data
- Description: Unified development platform for Big Data applications
- Version: GA version 1.1, released Feb. 2013
- LucidWorks offering: free trial, on-prem or cloud, enterprise sales model

5 Minutes to Search
1. Install LWS
    1. Unpack, double-click to launch the installer
    2. Launch, wait for startup
2. Open http://localhost:8989/
3. Choose “Quick Start”
4. Choose a Data Source
    1. For me: /Users/grantingersoll/Desktop/reading
5. Quick Search
6. Search with Flare
    1. http://localhost:8989/flare/catalog/quickstart
7. Quick Changes:
    1. Add a Facet
    2. Change Display Results

Prepare Deep Dive Demo
1. https://github.com/LucidWorks/lws-financial-demo/blob/master/README.md
2. cd src/main/python
3. python setup.py -n setup -a TWITTER_ACCESS_TOKEN -c TWITTER_CONSUMER_KEY -s TWITTER_CONSUMER_SECRET -t TWITTER_ACCESS_TOKEN_SECRET -p ../../../data/sp500List-30.txt -A -l Finance --data_dir ../../../data
4. python python.py

[Slide headings were logos and were not captured; the two columns appear to describe Apache Lucene and Apache Solr]
Lucene
• Java APIs for building search applications
• Fast, efficient, flexible
• Modules to add functionality:
- Language analysis
- Faceting
- Highlighting, spell checking
- Much more
Solr
• Lucene best practices
• HTTP-based service
- Many client bindings
• Faceting
• Distributed, fault-tolerant
• Many NoSQL features
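
As a rough illustration of the "HTTP-based service" point, the sketch below builds a query URL for a Solr `select` request using only the Python standard library. The base URL (host, port, collection name), the field names, and the helper `build_select_url` are assumptions made for the example; `q`, `rows`, `wt`, `facet`, and `facet.field` are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Assumed location of a Solr collection; adjust to the actual deployment.
SOLR_BASE = "http://localhost:8983/solr/collection1"


def build_select_url(query, rows=10, facet_field=None):
    """Return a Solr /select URL for `query` (hypothetical helper)."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if facet_field:
        # Ask Solr to also count the values of one field across the matches.
        params += [("facet", "true"), ("facet.field", facet_field)]
    return f"{SOLR_BASE}/select?{urlencode(params)}"


# "title" and "industry" are made-up field names.
url = build_select_url("title:search", rows=5, facet_field="industry")
print(url)
```

Because the service speaks plain HTTP, any language with an HTTP client can issue the same request, which is what makes the many client bindings possible.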
LucidWorks Search Goals
• IT Ready Open Source
- Installation, provisioning, monitoring, administration, integration
• Enterprise Grade
- A robust connector framework
» Including a wide assortment of prebuilt connectors to popular data sources
- Enterprise security framework
» Leverages SSL, LDAP, Active Directory
» Document-level access control
• Business Friendly
- Rich graphical administration console
» Speeds up search application development, deployment and management
- Expressive Business Logic
» Processing information through filters for better, more accurate results
- Relevancy Work Bench
• Full power of Apache Lucene and Solr

Reference Architecture
[Diagram; recoverable component labels follow]
• Data
- LucidWorks Search connectors
- Push
• Content Acquisition: ETL, batch or near real-time
• Discovery & Enrichment: clustering, classification, NLP, topic identification, search log analysis, user behavior
• Shards: 1, 2, 3, N
• Search View: Documents, Users, Logs
• Document Store
• Analytic Services: view into numeric/historic data
• Classification, Recommendation, Personalization & Machine Learning Services
• Classification Models
• In memory, replicated, multi-tenant
• Access APIs

Basic Vocab
• Documents
- Fields
» Tokens
▪ Payloads
• Query
- Many different kinds: term, phrase, regex, spatial, function
• Facets & Filters
• Collection
- Index
» Shard
▪ Segment
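
The document, field, token nesting in this vocabulary can be pictured with a small sketch. It is purely illustrative: the toy lowercase tokenizer, the sample document, and the names `Token` and `tokenize` are assumptions for the example, not Lucene's or Solr's actual analysis chain.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Token:
    text: str                        # normalized term text
    position: int                    # position of the token within its field
    payload: Optional[bytes] = None  # optional per-token metadata


def tokenize(value: str) -> List[Token]:
    """Toy analyzer: lowercase, then split on non-alphanumeric characters."""
    words = re.findall(r"[a-z0-9]+", value.lower())
    return [Token(text=word, position=i) for i, word in enumerate(words)]


# A document is a set of named fields; each field value is analyzed into tokens.
document: Dict[str, str] = {
    "symbol": "XYZ",                    # made-up sample values
    "company": "Example Search Corp.",
}
analyzed = {name: tokenize(value) for name, value in document.items()}

for name, tokens in analyzed.items():
    print(name, [(t.position, t.text) for t in tokens])
```

Queries and facets then operate on these tokens rather than on the raw field text.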

Search Concepts: Ranking
• Search is optimized for solving top N problems
• Hand Waving Algo:
- Parse query
- For Each Term
» Look up documents containing term
- Rank documents according to similarity
- Return top X
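
The hand-waving algorithm can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Lucene scores documents: the tokenizer is a toy, the similarity is a plain TF-IDF sum chosen for brevity, and the names `build_index` and `search` plus the sample documents are made up for the example.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Toy analyzer: lowercase, then split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Inverted index: term -> {doc_id: term frequency}."""
    index: Dict[str, Dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(query: str, index: Dict[str, Dict[str, int]],
           num_docs: int, top_x: int = 10) -> List[Tuple[str, float]]:
    scores: Dict[str, float] = defaultdict(float)
    for term in tokenize(query):            # parse query
        postings = index.get(term, {})      # look up documents containing term
        if not postings:
            continue
        idf = math.log(1 + num_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf      # accumulate similarity
    # Rank by score (ties broken by doc id) and return the top X.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_x]


docs = {
    "d1": "Solr is a search server built on Lucene",
    "d2": "Lucene is a Java search library",
    "d3": "Python Flask web application",
}
index = build_index(docs)
results = search("solr search", index, num_docs=len(docs), top_x=2)
print(results)
```

Only documents that contain at least one query term are ever scored, which is why the lookup step keeps top N retrieval fast even on large collections.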

Search Concepts: Faceting
• Dynamically slice and dice query results in a variety of ways:
- Term
- Range (date and numeric)
- Pivot
- Function
- Multi-select
• Gather Stats
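
To make term facets, range facets, and stats concrete, here is a small sketch that computes them over an in-memory result set. It only illustrates the idea and is not Solr's implementation; the functions `term_facet`, `range_facet`, and `field_stats` and the sample records are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List


def term_facet(results: List[dict], field: str) -> Counter:
    """Count how many results carry each value of `field`."""
    return Counter(doc[field] for doc in results if field in doc)


def range_facet(results: List[dict], field: str,
                start: int, end: int, gap: int) -> Dict[int, int]:
    """Count results per [lower, lower + gap) bucket between start and end."""
    counts = {lower: 0 for lower in range(start, end, gap)}
    for doc in results:
        value = doc.get(field)
        if value is None or not (start <= value < end):
            continue
        lower = start + ((value - start) // gap) * gap
        counts[lower] += 1
    return counts


def field_stats(results: List[dict], field: str) -> Dict[str, float]:
    """Basic statistics over a numeric field of the results."""
    values = [doc[field] for doc in results if field in doc]
    return {"min": min(values), "max": max(values), "mean": mean(values)}


# Made-up query results for illustration.
results = [
    {"symbol": "AAA", "industry": "Tech", "price": 12},
    {"symbol": "BBB", "industry": "Tech", "price": 48},
    {"symbol": "CCC", "industry": "Energy", "price": 51},
    {"symbol": "DDD", "industry": "Health", "price": 99},
]
by_industry = term_facet(results, "industry")
by_price = range_facet(results, "price", start=0, end=100, gap=50)
price_stats = field_stats(results, "price")
print(by_industry, by_price, price_stats)
```

The counts are computed over whatever the query matched, so changing the query or adding a filter changes the facets with it.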

Demo Deep Dive
• Application:
- Stock Insights
- Twitter Bootstrap + Python Flask + LWS
- http://localhost:5000
• Goals:
- Explore data sources, scheduling, other features
- Automate setup via script and LWS APIs
• Data:
- Company Info (Symbol, Company, Industry, City, State)
- Twitter, websites
- Historical Stock Prices from Yahoo! Finance
• http://github.com/lucidworks/lws-financial-demo
- README covers setup

Level Up
• Explore our APIs:
- http://bit.ly/lws-apis
• Build your own UI or extend ours
• Write a custom connector
• Customize Solr!
• Scale with SolrCloud
• Explore Solr Marketplace:
- http://bit.ly/solr-market

Where to Next?
• http://www.lucidworks.com
• http://lucene.apache.org/solr
• Training: http://bit.ly/lws-training
• LWS more info: http://bit.ly/lws-more-info
• LWS Documentation: http://bit.ly/lws-docs
• Twitter: @gsingers, @LucidWorks
• Taming Text: http://www.manning.com/ingersoll
