Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine

Published in: Technology
  • I chose LogStash for data transformation and import for two reasons:

    - It provides a powerful framework for extracting, grokking and transforming log data into a structured format that Solr can consume and that SiLK can use for dashboards.
    - LucidWorks’ Hadoop Connectors have a GrokIngestMapper that lets me reuse the same LogStash filters on larger volumes of files on HDFS (more details on this in a future article).
  • Highlights: Joins, stats, pivot faceting
  • http://localhost:3334/#/dashboard/solr/Trading

    Time series, joins
  • TARDIS: http://2.bp.blogspot.com/-ysN8JskY4WM/UEZNhBywQKI/AAAAAAAABdg/gXE0A9OO6Mk/s1600/13881_doctor_who.jpg

    Work under way to formalize
  • but not as a search engine for content
    more like a search engine for behavior
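
The LogStash note above describes turning raw log lines into structured documents that Solr can consume. The sketch below illustrates that idea in plain Python with a regular expression and named groups, which is similar in spirit to a grok pattern. It is not LogStash or GrokIngestMapper code: the log format, the field names, and the `parse_line` helper are all assumptions made for illustration.

```python
import json
import re

# Hypothetical access-log pattern; the named groups play the role that
# named captures play in a grok expression.
LOG_PATTERN = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line):
    """Return a dict of typed fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    doc = match.groupdict()
    # Convert numeric fields so a search engine could range-query or facet on them.
    doc["status"] = int(doc["status"])
    doc["bytes"] = 0 if doc["bytes"] == "-" else int(doc["bytes"])
    return doc


# Made-up sample line, used only to exercise the pattern.
sample = '127.0.0.1 - - [10/Jun/2014:13:55:36 -0500] "GET /select?q=solr HTTP/1.1" 200 2326'
print(json.dumps(parse_line(sample), indent=2))
```

Each resulting dictionary is a flat set of typed fields, which is the shape a dashboard needs for time series and facet panels.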
  • Chicago Solr Meetup - June 10th: This Ain't Your Parents' Search Engine

    1. Confidential and Proprietary © Copyright 2013. This Ain’t Your Parents’ Search Engine. Grant Ingersoll, CTO, LucidWorks. Twitter: @gsingers
    2. Search is dead.
    3. Long live search
    4. Search is good for…
       • Traditional: fast, fuzzy text matching across a large document collection
       • De-normalized data
         - “light” relational
       • Top N problems
         - Key-value (n=1)
         - Recommendations
         - “Good enough” classification, clustering
       • Faceting, aggregations, analytical slicing and dicing of data
       • Spatial, record/event linkage, alerting
       http://cheezburger.com/5243950080
    5. Foundational Changes in Lucene/Solr 4
       • Reduced memory usage
       • Pluggable codecs/similarity
       • FS(A|T)
       • Doc Values (column oriented)
       • Spatial upgrade
       • New facets and functions
       • Cursors (deep paging)
       • Distributed capabilities
       • Joins/Grouping
    6. Search + Hadoop
       • What’s Old is New Again
       • “Traditional” Use Cases:
         - Build/Store indexes
         - https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS
       • Enrichment and Signal processing
         - PageRank, Statistically Interesting Phrases, etc.
    7. LucidWorks + Hadoop
       • Ingestion Help
         - Flexible Map-Reduce content ingestion supporting:
           » Directory of files
           » CSV, Writable, etc.
           » LogStash
           » Build Your Own
       • Pig Load/Store and UDFs
       • Hive 2-way support
       • http://www.lucidworks.com/search-for-hadoop/
         - Open source this summer
    8. LucidWorks SiLK LucidWorks Search JDBC Connector Web/File System Crawl Data Warehouse Hadoop Connectors Clickstream Networking Data Sources Connectors Servers
    9. Solr/Solr Cloud Search Analytics: Data Ingestion & Visualization Gateway (Reverse Proxy) Solr Output Writer for LogStash (Http) Search Logs Visualization Configurable Dashboards Hadoop Connector GrokIngestMapper LogStash
    10. LucidWorks Open Source
        • Logstash for Solr: https://github.com/LucidWorks/solrlogmanager
        • Banana (Kibana for Solr): https://github.com/LucidWorks/banana
        • Effortless AWS deployment and monitoring: http://www.github.com/lucidworks/solr-scale-tk
        • Data Quality Toolkit: https://github.com/LucidWorks/data-quality
    11. Demos
    12. Fly the friendly skies http://www.ibm.com/developerworks/library/j-solr-lucene/index.html
    13. Make $$$
        • Leverage time series data and visualization using LucidWorks SiLK
        • Monitor Social
        • Traditional Research
        https://github.com/lucidworks/lws-financial-demo
    14. Cure what ails you
    15. Space-Time Continuum
        • Leverage Solr’s spatial capabilities to index non-spatial data, such as time ranges
          - Useful for Open Hours, Shifts, etc.
        • Query using rectangle intersections
          - q = shift:"Intersects(0 19 23 365)"
        https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/
    16. Signal Processing for Search and Discovery
        • Signals power modern relevance
          - Clicks, conversions, sharing, history, signatures
        • LucidWorks 5 makes it easy to capture and leverage signals
          - Recommendations, analytics, discovery
        • Simplifies your data workflow
        • Simplifies your operational footprint
    17. Solr Powered Signal Processing
        • Use Case: eCommerce
        • Data:
          - Product catalog (~1.2M items)
          - Click data (~3.9M clicks)
    18. Meta
        • http://www.lucidworks.com
          - grant@lucidworks.com
          - @gsingers
        • Sales
          - Steve Drane (based here in Chicago)
          - steve.drane@lucidworks.com
        • Lucene/Solr Revolution
          - Washington DC, Nov 11-14
          - http://www.lucenerevolution.org
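
The rectangle query on slide 15 can be illustrated with a small geometric sketch. It assumes one possible encoding, not confirmed by the slides: each time range is indexed as a 2D point (start, end), and the four numbers in `Intersects(...)` are read as minX minY maxX maxY. Under that assumption, `Intersects(0 19 23 365)` matches ranges that start at or before 23 and end at or after 19, that is, ranges overlapping [19, 23]. The code is plain Python rather than Solr, and the names `Rect`, `range_as_point`, `intersects`, and `overlap_query` are hypothetical.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    """Axis-aligned query rectangle, in minX minY maxX maxY order."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def range_as_point(start: float, end: float) -> Tuple[float, float]:
    """Assumed encoding: a time range becomes the 2D point (start, end)."""
    return (start, end)


def intersects(rect: Rect, point: Tuple[float, float]) -> bool:
    """True when the point lies inside the rectangle, borders included."""
    x, y = point
    return rect.min_x <= x <= rect.max_x and rect.min_y <= y <= rect.max_y


def overlap_query(q_start: float, q_end: float, lo: float = 0, hi: float = 365) -> Rect:
    """Rectangle matching every range that overlaps [q_start, q_end].

    A range overlaps the query when it starts no later than q_end (x axis)
    and ends no earlier than q_start (y axis). lo and hi are the assumed
    bounds of the indexed space.
    """
    return Rect(lo, q_start, q_end, hi)


query = overlap_query(19, 23)  # same numbers as the slide's rectangle
for shift in [(18, 20), (0, 10), (24, 30)]:
    print(shift, intersects(query, range_as_point(*shift)))
```

The design point is that an overlap test between two ranges becomes a single point-in-rectangle test, which a spatial index can answer without comparing every pair of ranges.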