This Ain't Your Parent's Search Engine

In just a few short years, search has evolved from a small text box in the nether regions of a website to being front and center in our lives. Increasingly, search engine technology is also being used for practical, real-time recommendations, event processing, complex spatial functionality, and time series analysis, capable not only of matching users' queries in text but also of driving real-time decision making and analytics. In fact, open source Apache Lucene/Solr can do all of this and more by taking advantage of new data structures and algorithms that complement more traditional IR approaches. In this demo-driven talk, Lucene committer Grant Ingersoll will take a look at some of the new and exciting ways users are leveraging Lucene/Solr and related technology to drive deeper insight into information needs that go beyond keywords in a text box.


Usage Rights

© All Rights Reserved

  • Highlights: Joins, stats, pivot faceting
  • http://localhost:3334/#/dashboard/solr/Trading (time series, joins)
  • TARDIS: http://2.bp.blogspot.com/-ysN8JskY4WM/UEZNhBywQKI/AAAAAAAABdg/gXE0A9OO6Mk/s1600/13881_doctor_who.jpg
    Work under way to formalize
  • but not as a search engine for content; more like a search engine for behavior

Presentation Transcript

  • Confidential and Proprietary © Copyright 2013
    This Ain’t Your Parent’s Search Engine
    Grant Ingersoll, CTO, LucidWorks
    Twitter: @gsingers
  • Search is dead.
  • Long live search
  • Search is good for…
      • Traditional: fast, fuzzy text matching across a large document collection
      • De-normalized data
          - “light” relational
      • Top N problems
          - Key-value (n=1)
          - Recommendations
          - “Good enough” classification, clustering
      • Faceting, aggregations, analytical slicing and dicing of data
      • Spatial, record/event linkage, alerting
    http://cheezburger.com/5243950080
  • Foundational Changes in Lucene/Solr 4
      • Reduced memory usage
      • Pluggable codecs/similarity
      • FS(A|T)
      • Doc Values (column oriented)
      • Spatial upgrade
      • New facets and functions
      • Cursors (deep paging)
      • Distributed capabilities
      • Joins/Grouping
  • Search + Hadoop
      • What’s Old is New Again
      • “Traditional” use cases:
          - Build/store indexes
          - https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
      • Enrichment and signal processing
          - PageRank, Statistically Interesting Phrases, etc.
  • LucidWorks + Hadoop
      • Ingestion help
          - Flexible Map-Reduce content ingestion supporting:
              » Directory of files
              » CSV, Writable, etc.
              » LogStash
              » Build Your Own
      • Pig Load/Store and UDFs
      • Hive 2-way support
      • http://www.lucidworks.com/search-for-hadoop/
          - Open source this summer
  • [Architecture diagram. Labels: Data Sources, Connectors, Servers; LucidWorks SiLK, LucidWorks Search, JDBC Connector, Web/File System Crawl, Data Warehouse, Hadoop Connectors, Clickstream, Networking]
  • Search Analytics: Data Ingestion & Visualization
    [Diagram. Labels: Solr/Solr Cloud; Gateway (Reverse Proxy); Solr Output Writer for LogStash (Http); Search Logs; Visualization; Configurable Dashboards; Hadoop Connector; GrokIngestMapper; LogStash]
  • LucidWorks Open Source
      • Logstash for Solr: https://github.com/LucidWorks/solrlogmanager
      • Banana (Kibana for Solr): https://github.com/LucidWorks/banana
      • Effortless AWS deployment and monitoring: http://www.github.com/lucidworks/solr-scale-tk
      • Data Quality Toolkit: https://github.com/LucidWorks/data-quality
  • Demos
  • Fly the friendly skies
    http://www.ibm.com/developerworks/library/j-solr-lucene/index.html
  • Make $$$
      • Leverage time series data and visualization using LucidWorks SiLK
      • Monitor Social
      • Traditional Research
    https://github.com/lucidworks/lws-financial-demo
  • Cure what ails you
  • Space-Time Continuum
      • Leverage Solr’s spatial capabilities to index non-spatial data, such as time ranges
          - Useful for Open Hours, Shifts, etc.
      • Query using rectangle intersections
          - q = shift:"Intersects(0 19 23 365)"
    https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/
  • Signal Processing for Search and Discovery
      • Signals power modern relevance
          - Clicks, conversions, sharing, history, signatures
      • LucidWorks 5 makes it easy to capture and leverage signals
          - Recommendations, analytics, discovery
      • Simplifies your data workflow
      • Simplifies your operational footprint
  • Solr Powered Signal Processing
      • Use case: eCommerce
      • Data:
          - Product catalog (~1.2M items)
          - Click data (~3.9M clicks)
  • Meta
      • http://www.lucidworks.com
          - grant@lucidworks.com
          - @gsingers
      • Lucene/Solr Revolution
          - Washington DC, Nov 11-14
          - http://www.lucenerevolution.org
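
The "Space-Time Continuum" slide describes indexing time ranges as spatial data and querying them with rectangle intersections. The following is a minimal Python sketch of that idea, not the talk's implementation. It assumes each range is stored as the 2D point (start, end), and that the slide's query `Intersects(0 19 23 365)` lists minX minY maxX maxY over a 0 to 365 domain; both readings are assumptions. The helper names and the sample shifts are hypothetical, and matching is done in memory rather than against a Solr server.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    """Axis-aligned rectangle, in the assumed order minX minY maxX maxY."""
    min_x: int
    min_y: int
    max_x: int
    max_y: int


def overlap_rect(q_start: int, q_end: int, lo: int = 0, hi: int = 365) -> Rect:
    """Rectangle matching every (start, end) point whose range overlaps
    [q_start, q_end]: start <= q_end (x axis) and end >= q_start (y axis).
    The domain bounds lo/hi are assumed, not taken from the slides."""
    return Rect(lo, q_start, q_end, hi)


def contains(rect: Rect, point: Tuple[int, int]) -> bool:
    """True if the point (start, end) lies inside the rectangle."""
    x, y = point
    return rect.min_x <= x <= rect.max_x and rect.min_y <= y <= rect.max_y


def solr_query(field: str, rect: Rect) -> str:
    """Format the rectangle in the query style shown on the slide."""
    return f'{field}:"Intersects({rect.min_x} {rect.min_y} {rect.max_x} {rect.max_y})"'


# Hypothetical shifts, each stored as the point (start, end).
shifts = {"early": (10, 20), "late": (24, 30), "long": (1, 300)}

rect = overlap_rect(19, 23)
matches = sorted(name for name, pt in shifts.items() if contains(rect, pt))

print(solr_query("shift", rect))
print(matches)
```

Under these assumptions, a range overlaps the query window exactly when its point falls inside the rectangle, so an overlap test becomes a single spatial intersection rather than two separate range clauses.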