Search Analytics

      Business Value
            &
      NoSQL Backend


Otis Gospodnetić – Sematext International
  @otisg ◦ @sematext ◦ sematext.com

    sematext.com/search-analytics
About Otis Gospodnetić
• ASF Member: Lucene, Solr, Nutch, Mahout

• Author: Lucene in Action 1 & 2


• Entrepreneur: Sematext, Simpy




                                                                     2
               Copyright 2011 Sematext Int'l. All rights reserved.
Sematext Metrics
●   100% organic: no GMO, no VC
●   4 years old
●   < 10 people
●   7 countries
●   3 timezones
●   2 continents
●   > 100 customers


                                                                         3
About Sematext
    Products & Services
    Consulting, Development, Tech Support:

●   Search (Lucene, Solr, ElasticSearch...)
●   Big Data (Hadoop, HBase, Voldemort...)
●   Web Crawling (Nutch, Droids)
●   Machine Learning (Mahout)


                                                                        4
Agenda

●   What is Search Analytics and why it matters
●   Example reports and their value
●   What we built, why, and how




                                                                          5
Communication
●   twitter.com/sematext
●   twitter.com/otisg
●   hash tags: #stsa or #stanalytics
●   http://sematext.com/search-analytics/index.html
●   Raise your hand!
●   otis@sematext.com



                                                                        6
The Compass


     Search logs are your Map
     Search Analytics is your Compass




                                                                 7
High Level Why


                         search
                          users


                      search
                    experience



                       search
                      providers




                                                                8
High Level Why
                                                             This search sucks!
                                                   It takes 17 tries to find anything here!
                                                              F!?@#$%^&?!?


                         search
                          users


                      search
                    experience



                       search
                      providers
                                                          Cool, the latest search tweaks
                                                           made our site really sticky!
                                                                     Awesome!



                                                                                           9
Don't Be Like This Dude




                                                                10
Got Clue?

                Performance Monitoring




    Tuning      Search Analytics                                   UI




                   Quality Assurance




                                                                        11
More Concrete Why
●   Measure and monitor everything. Introspection.
●   Supports (re)design, navigation choices
●   Helps with content acquisition & enhancement
●   Improve search experience
●   Mula (i.e., money)




                                                                       12
The Moment of Truth
       Question for the audience #1

   What do you use for Search Analytics?

   a) Home grown stuff
   b) Google Analytics
   c) Omniture
   d) Webtrends
   e) Other
   f) Nothing

                                                                   13
Search Analytics Outline
●   Collect: queries & clicks & interactions & ...
●   Analyze: actions / transactions / conversions
●   Output: reports – over time
●   Output++: feedback loop (remember this)




●   The means, not the goal
●   Ongoing, not one-off
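To make the Collect → Analyze → Output steps concrete, here is a minimal Python sketch. It assumes a hypothetical `SearchEvent` record (the field names are illustrative, not the schema of any product): collected events are analyzed into per-hour query volume, which is the raw material for a report over time. In keeping with "ongoing, not one-off", a real system would run this continuously over new data.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class SearchEvent:
    """One collected search interaction (hypothetical schema)."""
    timestamp: datetime
    query: str
    num_hits: int
    clicked_rank: Optional[int] = None  # 1-based rank of the clicked result, if any


def hourly_volume(events):
    """Analyze: count searches per hour; the sorted output feeds a volume-over-time report."""
    counts = Counter(
        e.timestamp.replace(minute=0, second=0, microsecond=0) for e in events
    )
    return sorted(counts.items())


events = [
    SearchEvent(datetime(2011, 10, 1, 9, 15), "solr", 12),
    SearchEvent(datetime(2011, 10, 1, 9, 40), "hbase", 3, clicked_rank=1),
    SearchEvent(datetime(2011, 10, 1, 10, 5), "flume", 0),
]
volume = hourly_volume(events)  # two searches in the 9:00 hour, one in the 10:00 hour
```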


                                                                                        14
Search vs. Web Analytics
●   Search: explicit user intent and information needs vs. Web: inferring intent from behavior
●   Hand in hand
●   Ideally you can relate data from both or even
    unify it




                                                                         15
Example Core Reports
●   Rate & Volume, Latency (mean, median, 90th percentile)
●   Click Through Rate, Mean Reciprocal Rank
●   Top Queries by count, clicks, 0 hits...
●   Query Trending
●   Top Seen Docs, Top Clicked Docs (msft)
●   Page & Click Depth
●   Facet & Sort Usage
●   ...
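Several of these core reports reduce to small formulas. The sketch below is one way to compute them in Python, assuming each search is logged with its latency and the 1-based rank of its first clicked result (`None` when nothing was clicked). Two conventions are our assumptions rather than anything the slide specifies: the 90th percentile uses the nearest-rank method, and searches without a click contribute 0 to Mean Reciprocal Rank.

```python
import math
from statistics import mean, median


def percentile(values, p):
    """Nearest-rank percentile of a non-empty list, e.g. p=90 for the 90th."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_summary(latencies_ms):
    return {
        "mean": mean(latencies_ms),
        "median": median(latencies_ms),
        "p90": percentile(latencies_ms, 90),
    }


def click_through_rate(first_click_ranks):
    """Share of searches with at least one click.

    first_click_ranks holds one entry per search: the 1-based rank of the
    first clicked result, or None if nothing was clicked.
    """
    if not first_click_ranks:
        return 0.0
    clicked = sum(1 for r in first_click_ranks if r is not None)
    return clicked / len(first_click_ranks)


def mean_reciprocal_rank(first_click_ranks):
    """Average of 1/rank over all searches; unclicked searches count as 0."""
    if not first_click_ranks:
        return 0.0
    return sum(1 / r for r in first_click_ranks if r is not None) / len(first_click_ranks)


ranks = [1, 2, None, 4]
ctr = click_through_rate(ranks)    # 3 of 4 searches had a click
mrr = mean_reciprocal_rank(ranks)  # (1 + 1/2 + 0 + 1/4) / 4
```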
                                                                        16
More Reports in More Detail
●   See "Search Analytics What? Why? How?"

    http://blog.sematext.com/tag/analytics/




                                                                        17
Part Dos
     Switching gears... Juno digs NoSQL




                                                                  18
What We've Built
●   Search Analytics SaaS
    ●   Numerous reports (e.g. query volume,
        rate, latency, term frequencies /
        comparisons, hit buckets, search origins,
        etc.)
    ●   Trending over time
    ●   Comparisons of time periods
    ●   Top N reports
    ●   Filter, slice and dice
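Top N reports and comparisons of time periods can both be built on plain query counts. A minimal Python sketch, assuming each period's queries are available as a list of normalized query strings; the function names are hypothetical, and a production system would read precomputed aggregates instead of raw lists.

```python
from collections import Counter


def top_n(queries, n):
    """Top N report: the n most frequent queries with their counts."""
    return Counter(queries).most_common(n)


def compare_periods(previous, current, n=10):
    """For the current period's top N queries, report (query, count, change vs. previous)."""
    before = Counter(previous)  # missing queries count as 0
    return [(q, c, c - before[q]) for q, c in Counter(current).most_common(n)]


last_week = ["solr", "solr", "hbase"]
this_week = ["solr", "hbase", "hbase", "hbase", "flume", "flume"]
report = compare_periods(last_week, this_week, n=2)
```

Queries that are new in the current period (here `flume`) show their full count as the change, which is exactly what a trending report wants to surface.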


                                                                            19
Who Needs a Compass?
●   We need it
    ●   search-hadoop.com & search-lucene.com

●   Our customers need it!

●   You?




                                                                         20
Sematext Search Analytics




                                                                21
Big Dreams
●   SaaS
●   Multitenant
●   Large Scale – Massive Data
●   Cloud




                                                                        22
Storage Choices
●   RDBMS: MySQL, PostgreSQL
●   HDFS
●   Hive
●   HBase
●   Cassandra




                                                                      23
SaaS vs. In-House
     Question for the audience #2

     SaaS vs in-house Search Analytics?

     a) SaaS
     b) in-house




                                                                  24
Sematext Search Analytics




                                                                25
Sematext Search Analytics




                                                                26
Sematext Search Analytics




                                                                27
Sematext Search Analytics




                                                                28
Data Flow
●   See Search Analytics with Flume and HBase
     http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/




                                                                                  29
Data Collection
●   See Search Analytics with Flume and HBase
    http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/




                                                                                     30
Core Tech
●   JavaScript Beacons
●   Metric Capture Web App aka Receiver
●   Flume Agents, Collectors, Sinks
●   HBase
●   MapReduce Aggregations
●   Search Analytics Reporting Web App
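A common way for a JavaScript beacon to report a search is to request a tiny resource from the receiver with the search details encoded in the URL query string. Below is a Python sketch of the receiver-side parsing, assuming hypothetical parameter names (`site`, `q`, `hits`, `lat`); the actual beacon format is not described in these slides.

```python
from urllib.parse import parse_qs, urlsplit


def parse_beacon(url):
    """Turn one beacon request URL into an event dict.

    The parameter names (site, q, hits, lat) are hypothetical placeholders.
    """
    params = parse_qs(urlsplit(url).query)

    def first(name, default):
        values = params.get(name)
        return values[0] if values else default

    return {
        "site": first("site", None),
        "query": first("q", ""),  # parse_qs decodes '+' to a space
        "hits": int(first("hits", "0")),
        "latency_ms": int(first("lat", "0")),
    }


event = parse_beacon(
    "http://receiver.example.com/b.gif?site=demo&q=hbase+row+keys&hits=42&lat=17"
)
```

The receiver would then hand such events to a Flume agent. Validation is left out here: a non-numeric `hits` value would make `int()` raise `ValueError`.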



                                                                       31
What is Flume
●   Distributed data/log collection service
●   Scalable, configurable, extensible
●   Centrally manageable, open source

●   Agents get data from app, Collectors save it
●   Abstractions: Source → Decorator(s) → Sink
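The Source → Decorator(s) → Sink abstraction is essentially function composition over a stream of events. The following sketch models it with Python generators. It illustrates the idea only; it is not Flume's (Java) API, and all names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, str]
Decorator = Callable[[Iterator[Event]], Iterator[Event]]


def line_source(lines: Iterable[str]) -> Iterator[Event]:
    """Source: wrap each raw log line as an event."""
    for line in lines:
        yield {"body": line}


def drop_empty(events: Iterator[Event]) -> Iterator[Event]:
    """Decorator: filter out events whose body is blank."""
    return (e for e in events if e["body"].strip())


def add_field(name: str, value: str) -> Decorator:
    """Decorator factory: add a fixed field (e.g. the host) to every event."""
    def decorate(events: Iterator[Event]) -> Iterator[Event]:
        for event in events:
            yield {**event, name: value}
    return decorate


def run_pipeline(source: Iterator[Event], decorators: List[Decorator],
                 sink: Callable[[Event], None]) -> None:
    """Chain source -> decorators -> sink; events stream through one at a time."""
    stream = source
    for decorate in decorators:
        stream = decorate(stream)
    for event in stream:
        sink(event)


collected: List[Event] = []
run_pipeline(line_source(["q=solr", "", "q=hbase"]),
             [drop_empty, add_field("host", "web1")],
             collected.append)
```

Because every stage has the same shape (events in, events out), decorators can be added, removed, or reordered without touching the source or the sink.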



                                                                         32
What is HBase
●   Scalable, reliable, distributed, column-oriented DB
●   On top of HDFS
●   Usable as MapReduce input and output
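Because HBase tables can be read and written by MapReduce jobs, aggregations such as daily query counts fit the map → shuffle → reduce model. Here is a single-process Python sketch of that model. It is not Hadoop's API, and the `(day, raw query)` record layout is an assumption.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Mapper: emit ((day, normalized query), 1) for one (day, raw query) record."""
    day, query = record
    yield (day, query.strip().lower()), 1


def shuffle(pairs):
    """Group values by key, as the framework does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the partial counts for one key."""
    return key, sum(values)


def run_job(records):
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())


daily_counts = run_job([
    ("2011-10-01", "Solr"),
    ("2011-10-01", " solr "),
    ("2011-10-02", "HBase"),
])
```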




                                                                        33
Data Flow, Detailed




                                                                 34
Why Flume
●   Reliable delivery
    ●   e.g. queue msgs locally if destination unreachable
●   Easy, centralized management via Web UI or
    console
●   Good community, good progress, now @ASF
●   But: more complex, more moving parts
●   On Flume: slideshare.net/cloudera/inside-flume
●   Alternatives: Kafka, Scribe...
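Queueing messages locally when the destination is unreachable is a store-and-forward pattern. A minimal in-memory Python sketch follows, assuming a `send` callable that raises `ConnectionError` on failure. The class name is hypothetical, and a real agent would persist the queue to disk so messages survive a restart.

```python
from collections import deque


class StoreAndForwardSender:
    """Queue messages locally while the destination is down; deliver them in order later."""

    def __init__(self, send):
        self._send = send  # callable that raises ConnectionError if the destination is unreachable
        self._pending = deque()

    def submit(self, message):
        self._pending.append(message)
        self.flush()

    def flush(self):
        """Try to deliver queued messages oldest first; stop at the first failure."""
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                return  # keep the message queued and retry on the next flush
            self._pending.popleft()  # remove only after a successful send

    @property
    def backlog(self):
        return len(self._pending)
```

Removing a message only after `send` succeeds is what prevents loss. The flip side is that a send that succeeds but is not acknowledged can be retried, so duplicates are possible (at-least-once delivery).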

                                                                            35
Why HBase
●   Scalable raw & aggregate data storage
●   MapReduce data input
●   Fast scans for time ranges, fast key lookups
●   Easy storage and compute power expansion
●   Good looking roadmap, community, progress
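Fast time-range scans and key lookups follow from HBase keeping rows sorted by row key bytes. If a key starts with a tenant id followed by a fixed-width big-endian timestamp, one tenant's events for a time range are contiguous. The sketch below uses a hypothetical key layout and a toy in-memory `SortedTable` standing in for an HBase table; it is not the HBase client API.

```python
import bisect
import struct


def row_key(tenant_id: int, ts_millis: int) -> bytes:
    """Hypothetical layout: 4-byte tenant id + 8-byte big-endian timestamp.

    Fixed-width big-endian integers sort bytewise in numeric order, so a
    tenant's rows are contiguous and ordered by time.
    """
    return struct.pack(">IQ", tenant_id, ts_millis)


class SortedTable:
    """Toy stand-in for an HBase table: rows kept sorted by key bytes."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def put(self, key: bytes, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key: bytes):
        return self._values.get(key)

    def scan(self, start: bytes, stop: bytes):
        """Rows with start <= key < stop, in key order (start inclusive, stop exclusive)."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


table = SortedTable()
table.put(row_key(1, 1000), "q=solr")
table.put(row_key(2, 1500), "q=lucene")  # a different tenant
table.put(row_key(1, 2000), "q=hbase")
table.put(row_key(1, 3000), "q=flume")
in_range = table.scan(row_key(1, 1000), row_key(1, 3000))
```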




                                                                        36
Open Sourcing
●   2 open-source projects:
    github.com/sematext/HBaseWD
    github.com/sematext/HBaseHUT
●   See sematext.com/open-source/index.html

●   Patches for Flume and HBase
    blog.sematext.com/tag/flume/
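Strictly increasing row keys, such as keys that begin with a timestamp, send every new write to the same region. HBaseWD addresses this by spreading sequential writes across buckets via a row key prefix. The Python sketch below shows the general prefix-salting technique under our own assumptions (4 buckets and a byte-sum hash). HBaseWD itself is a Java library with its own distributor classes, which this does not reproduce.

```python
import bisect

BUCKETS = 4  # hypothetical bucket count; more buckets spread writes wider


def salted_key(original: bytes, bucket_count: int = BUCKETS) -> bytes:
    """Prefix the key with one salt byte derived from the key itself."""
    bucket = sum(original) % bucket_count  # cheap deterministic hash
    return bytes([bucket]) + original


def scan_all_buckets(sorted_keys, start: bytes, stop: bytes,
                     bucket_count: int = BUCKETS):
    """Range scan over salted keys: one sub-scan per bucket, merged back into
    original key order. Returns the unsalted keys in [start, stop)."""
    results = []
    for bucket in range(bucket_count):
        prefix = bytes([bucket])
        lo = bisect.bisect_left(sorted_keys, prefix + start)
        hi = bisect.bisect_left(sorted_keys, prefix + stop)
        results.extend(key[1:] for key in sorted_keys[lo:hi])
    return sorted(results)


# Ten sequential 8-byte timestamps: unsalted, they would all sort next to each other.
keys = [ts.to_bytes(8, "big") for ts in range(100, 110)]
stored = sorted(salted_key(k) for k in keys)
```

The cost moves to the read side: every range scan fans out into one sub-scan per bucket, and the results must be merged.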


                                                                        37
Challenges
●   Data size. Solutions:
    ●   Compression (4-5x smaller with LZO)
    ●   Data pruning (variable levels)
●   Query string distribution: very long-tail
    ●   Lots of data to process, update, aggregate
●   Young tools: Flume, HBase
●   Poor IO on EC2
●   Hadoop distributions
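"Data pruning (variable levels)" can be read as keeping recent data at fine granularity and rolling older data up into coarser buckets. Here is a Python sketch of that reading. It assumes per-minute counts as input and uses hypothetical retention windows: 1 day at minute level, 30 days at hour level, and daily beyond that.

```python
from collections import Counter
from datetime import datetime, timedelta


def truncate(ts: datetime, level: str) -> datetime:
    """Round a timestamp down to the start of its minute, hour, or day."""
    if level == "minute":
        return ts.replace(second=0, microsecond=0)
    if level == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if level == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown level: {level}")


def prune(minute_counts, now,
          keep_minutes=timedelta(days=1), keep_hours=timedelta(days=30)):
    """Roll per-minute counts up to coarser levels as they age.

    Result keys are (level, bucket_start) so that buckets of different
    granularity never collide. The retention windows are hypothetical defaults.
    """
    pruned = Counter()
    for ts, count in minute_counts.items():
        age = now - ts
        if age <= keep_minutes:
            level = "minute"
        elif age <= keep_hours:
            level = "hour"
        else:
            level = "day"
        pruned[(level, truncate(ts, level))] += count
    return dict(pruned)


now = datetime(2011, 11, 1, 12, 0)
counts = {
    datetime(2011, 11, 1, 11, 58): 3,
    datetime(2011, 11, 1, 11, 59): 2,
    datetime(2011, 10, 20, 8, 5): 4,
    datetime(2011, 10, 20, 8, 40): 1,
    datetime(2011, 8, 1, 3, 0): 7,
    datetime(2011, 8, 1, 22, 0): 1,
}
rolled_up = prune(counts, now)
```

Totals are preserved; only resolution is lost as data ages. This also bounds storage for the very long-tail query distribution, since each old bucket holds one count instead of one per minute.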

                                                                           38
Output++
●   AutoComplete - $MM improvement
●   Better DYM (Did You Mean) spellchecker
●   Related Searches
●   Recommendations
●   Relevance Feedback
●   ...
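The feedback loop turns logged queries back into search features. As one example, autocomplete can rank completions by how often users issued each query. Below is a minimal Python sketch with hypothetical names. It scans all queries on every request; a production suggester would use a prefix index such as a trie or a Lucene/Solr suggester component.

```python
from collections import Counter


def build_suggester(query_log, min_count=1):
    """Count normalized queries once and return a suggest(prefix, k) function."""
    counts = Counter(q.strip().lower() for q in query_log if q.strip())

    def suggest(prefix, k=5):
        p = prefix.strip().lower()
        matches = [(q, c) for q, c in counts.items()
                   if q.startswith(p) and c >= min_count]
        # Most frequent first; ties broken alphabetically for stable output.
        matches.sort(key=lambda qc: (-qc[1], qc[0]))
        return [q for q, _ in matches[:k]]

    return suggest


log = ["hbase", "HBase", "hadoop", "hbase row key", "hadoop", "hadoop", "solr"]
suggest = build_suggester(log)
```

Raising `min_count` is a simple way to keep rare, possibly misspelled long-tail queries out of the suggestions.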



                                                                      39
Closing the Loop

                         search
                          users



                      search
                    experience




                        search
                       providers




                                                                40
Resource
                                      Search Analytics for Your Site
                                                  Louis Rosenfeld




           http://rosenfeldmedia.com/books/searchanalytics/




                                                                       41
We're Hiring
    Dig Search?
    Dig Analytics?
    Dig Big Data?
    Dig Performance?
    Dig working with and in open-source?
    We're hiring world-wide!
    http://sematext.com/about/jobs.html


                                                                  42
Contact
      sematext.com
      blog.sematext.com
      @sematext
      @otisg
      otis@sematext.com

      Want SA? Grab me or go to:
          sematext.com/search-analytics

      Hash tags: #stsa or #stanalytics
                                                                  43
"Impact of front-end architecture on development cost", Viktor Turskyi
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 

Search Analytics Business Value & NoSQL Backend

  • 1. Search Analytics Business Value & NoSQL Backend
      Otis Gospodnetić – Sematext International
      @otisg ◦ @sematext ◦ sematext.com
      sematext.com/search-analytics
  • 2. About Otis Gospodnetić
      ● ASF Member: Lucene, Solr, Nutch, Mahout
      ● Author: Lucene in Action 1 & 2
      ● Entrepreneur: Sematext, Simpy
  • 3. Sematext Metrics
      ● 100% organic: no GMO, no VC
      ● 4 years old
      ● < 10 people
      ● 7 countries
      ● 3 timezones
      ● 2 continents
      ● > 100 customers
  • 4. About Sematext
      Products & Services
      Consulting, Development, Tech Support:
      ● Search (Lucene, Solr, ElasticSearch...)
      ● Big Data (Hadoop, HBase, Voldemort...)
      ● Web Crawling (Nutch, Droids)
      ● Machine Learning (Mahout)
  • 5. Agenda
      ● What is Search Analytics and why it matters
      ● Example reports and their value
      ● What we built, why, and how
  • 6. Communication
      ● twitter.com/sematext
      ● twitter.com/otisg
      ● Hashtags: #stsa or #stanalytics
      ● http://sematext.com/search-analytics/index.html
      ● Raise your hand!
      ● otis@sematext.com
  • 7. The Compass
      Search logs are your Map
      Search Analytics is your Compass
  • 8. High-Level Why
      [Diagram: search users, search experience, search providers]
  • 9. High-Level Why
      Search users: "This search sucks! It takes 17 tries to find anything here! F!?@#$%^&?!?"
      Search providers: "Cool, the latest search tweaks made our site really sticky! Awesome!"
      [Diagram: search users, search experience, search providers]
  • 10. Don't Be Like This Dude
  • 11. Got Clue?
      [Diagram: Performance Monitoring, Tuning, Search Analytics, UI, Quality Assurance]
  • 12. More Concrete Why
      ● Measure and monitor everything. Introspection.
      ● Supports (re)design and navigation choices
      ● Helps with content acquisition & enhancement
      ● Improves the search experience
      ● Mula
  • 13. The Moment of Truth
      Question for the audience #1: What do you use for Search Analytics?
      a) Homegrown stuff
      b) Google Analytics
      c) Omniture
      d) Webtrends
      e) Other
      f) Nothing
  • 14. Search Analytics Outline
      ● Collect: queries & clicks & interactions & ...
      ● Analyze: actions / transactions / conversions
      ● Output: reports over time
      ● Output++: feedback loop (remember this!)
      ● The means, not the goal
      ● Ongoing, not one-off
  • 15. Search vs. Web Analytics
      ● Search reveals user intent and information needs; web analytics has to infer them
      ● The two go hand in hand
      ● Ideally you can relate data from both, or even unify it
  • 16. Example Core Reports
      ● Rate & Volume, Latency (mean, avg, 90th percentile)
      ● Click-Through Rate, Mean Reciprocal Rank
      ● Top Queries by count, clicks, zero hits...
      ● Query Trending
      ● Top Seen Docs, Top Clicked Docs (msft)
      ● Page & Click Depth
      ● Facet & Sort Usage
      ● ...
  • 17. More Reports in More Detail
      ● See "Search Analytics: What? Why? How?": http://blog.sematext.com/tag/analytics/
  • 18. Part Dos
      Switching gears...
      Juno digs NoSQL
  • 19. What We've Built
      ● Search Analytics SaaS
      ● Numerous reports (e.g. query volume, rate, latency, term frequencies / comparisons, hit buckets, search origins, etc.)
      ● Trending over time
      ● Comparisons of time periods
      ● Top N reports
      ● Filter, slice and dice
  • 20. Who Needs a Compass?
      ● We need it
          ● search-hadoop.com & search-lucene.com
      ● Our customers need it!
      ● You?
  • 21. Sematext Search Analytics
  • 22. Big Dreams
      ● SaaS
      ● Multitenant
      ● Large scale, massive data
      ● Cloud
  • 23. Storage Choices
      ● RDBMS: MySQL, PostgreSQL
      ● HDFS
      ● Hive
      ● HBase
      ● Cassandra
  • 24. SaaS vs. In-House
      Question for the audience #2: SaaS vs. in-house Search Analytics?
      a) SaaS
      b) in-house
  • 25. Sematext Search Analytics
  • 26. Sematext Search Analytics
  • 27. Sematext Search Analytics
  • 28. Sematext Search Analytics
  • 29. Data Flow
      ● See "Search Analytics with Flume and HBase": http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/
  • 30. Data Collection
      ● See "Search Analytics with Flume and HBase": http://blog.sematext.com/2010/10/16/search-analytics-hadoop-world-flume-hbase/
  • 31. Core Tech
      ● JavaScript Beacons
      ● Metric Capture Web App (a.k.a. Receiver)
      ● Flume Agents, Collectors, Sinks
      ● HBase
      ● MapReduce Aggregations
      ● Search Analytics Reporting Web App
  • 32. What is Flume
      ● Distributed data/log collection service
      ● Scalable, configurable, extensible
      ● Centrally manageable, open source
      ● Agents get data from the app, Collectors save it
      ● Abstractions: Source → Decorator(s) → Sink
  • 33. What is HBase
      ● Scalable, reliable, distributed, column-oriented DB
      ● On top of HDFS
      ● MapReduce-able
  • 34. Data Flow, Detailed
  • 35. Why Flume
      ● Reliable delivery
          ● e.g. queue messages locally if the destination is unreachable
      ● Easy, centralized management via Web UI or console
      ● Good community, good progress, now @ASF
      ● But: more complex, more moving parts
      ● On Flume: slideshare.net/cloudera/inside-flume
      ● Alternatives: Kafka, Scribe...
  • 36. Why HBase
      ● Scalable raw & aggregate data storage
      ● MapReduce data input
      ● Fast scans for time ranges, fast key lookups
      ● Easy storage and compute power expansion
      ● Good-looking roadmap, community, progress
  • 37. Open Sourcing
      ● 2 open-source projects:
          ● github.com/sematext/HBaseWD
          ● github.com/sematext/HBaseHUT
      ● See sematext.com/open-source/index.html
      ● Patches for Flume and HBase: blog.sematext.com/tag/flume/
  • 38. Challenges
      ● Data size. Solutions:
          ● Compression (4-5x smaller with LZO)
          ● Data pruning (variable levels)
      ● Query string distribution: very long tail
      ● Lots of data to process, update, aggregate
      ● Young tools: Flume, HBase
      ● Poor I/O on EC2
      ● Hadoop distributions
  • 39. Output++
      ● AutoComplete: $MM improvement
      ● Better DYM ("Did You Mean") spellchecker
      ● Related Searches
      ● Recommendations
      ● Relevance Feedback
      ● ...
  • 40. Closing the Loop
      [Diagram: search users, search experience, search providers]
  • 41. Resource
      "Search Analytics for Your Site", Louis Rosenfeld
      http://rosenfeldmedia.com/books/searchanalytics/
  • 42. We're Hiring
      Dig Search? Dig Analytics? Dig Big Data? Dig Performance? Dig working with and in open-source?
      We're hiring worldwide!
      http://sematext.com/about/jobs.html
  • 43. Contact
      ● sematext.com
      ● blog.sematext.com
      ● @sematext @otisg
      ● otis@sematext.com
      Want SA? Grab me or go to: sematext.com/search-analytics
      Hashtags: #stsa or #stanalytics
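Slide 16 lists latency, click-through rate and Mean Reciprocal Rank among the core reports. The sketch below shows one common way to compute them from a batch of logged searches. The `SearchEvent` record and its fields are hypothetical, and the nearest-rank method used for the percentile is only one of several percentile definitions.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SearchEvent:
    """One logged search (hypothetical schema, for illustration only)."""
    query: str
    latency_ms: float
    clicked_rank: Optional[int]  # 1-based rank of the first clicked result; None if no click


def click_through_rate(events: List[SearchEvent]) -> float:
    """Fraction of searches that led to at least one click."""
    if not events:
        return 0.0
    return sum(1 for e in events if e.clicked_rank is not None) / len(events)


def mean_reciprocal_rank(events: List[SearchEvent]) -> float:
    """Average of 1/rank of the first click; searches without a click contribute 0."""
    if not events:
        return 0.0
    return sum(1.0 / e.clicked_rank for e in events if e.clicked_rank is not None) / len(events)


def latency_percentile(events: List[SearchEvent], pct: float) -> float:
    """Nearest-rank percentile of latency, e.g. pct=90 for the 90th percentile."""
    latencies = sorted(e.latency_ms for e in events)
    if not latencies:
        raise ValueError("no events")
    rank = max(1, math.ceil(pct / 100 * len(latencies)))
    return latencies[rank - 1]


if __name__ == "__main__":
    log = [
        SearchEvent("hbase scan", 40.0, 1),
        SearchEvent("flume sink", 55.0, 2),
        SearchEvent("solr facet", 30.0, None),
        SearchEvent("lucene", 120.0, 4),
    ]
    print(f"CTR: {click_through_rate(log):.2f}")
    print(f"MRR: {mean_reciprocal_rank(log):.3f}")
    print(f"mean latency: {sum(e.latency_ms for e in log) / len(log):.1f} ms")
    print(f"p90 latency: {latency_percentile(log, 90):.1f} ms")
```

MRR rewards putting the clicked result near the top: a first click at rank 1 scores 1.0, and a first click at rank 4 scores only 0.25. CTR alone cannot distinguish these two cases.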
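Slides 16 and 19 mention Top N reports (top queries by count and by zero hits) and comparisons of time periods. Below is a minimal in-memory sketch using `collections.Counter`. It assumes each logged search is a hypothetical `(query, hit_count)` pair. At the data volumes described on slide 22, the same counting would run as a batch aggregation rather than inside one process.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical record shape: (raw query string, number of hits returned).
Search = Tuple[str, int]


def normalize(query: str) -> str:
    """Collapse case and whitespace so 'HBase  Scan' and 'hbase scan' count together."""
    return " ".join(query.lower().split())


def top_queries(searches: Iterable[Search], n: int) -> List[Tuple[str, int]]:
    """Top N queries by how often they were issued."""
    return Counter(normalize(q) for q, _ in searches).most_common(n)


def top_zero_hit_queries(searches: Iterable[Search], n: int) -> List[Tuple[str, int]]:
    """Top N queries that returned nothing: candidates for synonyms or new content."""
    return Counter(normalize(q) for q, hits in searches if hits == 0).most_common(n)


def compare_periods(previous: Iterable[Search], current: Iterable[Search]) -> Dict[str, int]:
    """Change in count per query between two periods (positive means trending up)."""
    before = Counter(normalize(q) for q, _ in previous)
    after = Counter(normalize(q) for q, _ in current)
    # A Counter returns 0 for missing keys, so queries seen in only one period still work.
    return {q: after[q] - before[q] for q in before.keys() | after.keys()}


if __name__ == "__main__":
    last_week = [("HBase scan", 12), ("solr facets", 3), ("flume", 0)]
    this_week = [("hbase  scan", 9), ("HBase Scan", 4), ("flume", 0), ("mahout", 5)]
    print(top_queries(this_week, 3))
    print(top_zero_hit_queries(this_week, 3))
    print(compare_periods(last_week, this_week))
```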
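Slide 32 describes Flume's core abstraction as Source → Decorator(s) → Sink. The plain-Python sketch below only illustrates that pattern. It is not Flume's Java API, and every function and class name in it (`list_source`, `add_field`, `lowercase_query`, `CollectingSink`, `run_flow`) is hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Event = Dict[str, str]  # a log event as a flat dict (assumption for illustration)
Decorator = Callable[[Event], Event]


def list_source(lines: Iterable[str]) -> Iterator[Event]:
    """Source: turns raw beacon lines such as 'q=hbase&hits=3' into events."""
    for line in lines:
        pairs = (kv.split("=", 1) for kv in line.strip().split("&") if "=" in kv)
        yield {k: v for k, v in pairs}


def add_field(name: str, value: str) -> Decorator:
    """Decorator factory: tags every event with a fixed field, e.g. a tenant id."""
    def decorate(event: Event) -> Event:
        return {**event, name: value}
    return decorate


def lowercase_query(event: Event) -> Event:
    """Decorator: normalizes the query string before it is stored."""
    return {**event, "q": event.get("q", "").lower()}


class CollectingSink:
    """Sink: keeps events in memory; a real sink would write to HBase, HDFS, etc."""

    def __init__(self) -> None:
        self.events: List[Event] = []

    def append(self, event: Event) -> None:
        self.events.append(event)


def run_flow(source: Iterable[Event], decorators: List[Decorator], sink: CollectingSink) -> None:
    """Push each event from the source through every decorator in order, then into the sink."""
    for event in source:
        for decorate in decorators:
            event = decorate(event)
        sink.append(event)
```

Keeping decorators as small single-purpose functions means the chain can be reconfigured (add a tenant tag, drop a normalization step) without touching the source or the sink. This separation is the point of the abstraction on the slide.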
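Slide 35 gives "queue messages locally if the destination is unreachable" as an example of reliable delivery. Below is a minimal store-and-forward sketch of that idea. The `send` callable and the `StoreAndForward` class are hypothetical. Unlike a production agent, this sketch keeps the queue in memory rather than on local disk, so it would not survive a restart.

```python
from collections import deque
from typing import Callable, Deque


class StoreAndForward:
    """Buffers messages while the destination is down, then delivers them oldest-first.

    `send` is any callable that delivers one message and raises ConnectionError
    when the destination is unreachable (a hypothetical stand-in for a collector).
    """

    def __init__(self, send: Callable[[str], None]) -> None:
        self.send = send
        # In-memory only for this sketch; a real agent would persist this queue to disk.
        self.pending: Deque[str] = deque()

    def submit(self, message: str) -> None:
        """Queue the message, then try to deliver everything that is pending."""
        self.pending.append(message)
        self.flush()

    def flush(self) -> int:
        """Deliver queued messages in order; stop at the first failure. Returns count sent."""
        delivered = 0
        while self.pending:
            try:
                self.send(self.pending[0])
            except ConnectionError:
                break  # destination unreachable: keep the message for the next attempt
            self.pending.popleft()  # remove only after a successful send
            delivered += 1
        return delivered
```

Removing a message only after `send` succeeds is what preserves order and avoids loss. The trade-off is that a message may be sent twice if the destination received it but the acknowledgment failed.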
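Slide 36 credits HBase with fast scans over time ranges, and slide 37 lists HBaseWD, one of Sematext's open-source projects for distributing writes. The sketch below illustrates the general idea in plain Python, without any HBase client. Each row key is made of a bucket byte, the tenant, and a big-endian timestamp. The bucket prefix spreads monotonically increasing keys across regions, and a time-range read becomes one scan per bucket. The key layout, `NUM_BUCKETS`, and the CRC32 bucketing are all assumptions for illustration, not HBaseWD's actual API or key format.

```python
import struct
import zlib
from typing import List, Tuple

NUM_BUCKETS = 8  # assumed; more buckets spread writes wider but mean more scans per read


def row_key(tenant: str, ts_millis: int, event_id: str) -> bytes:
    """Bucket byte + tenant + NUL separator + big-endian timestamp + event id.

    Big-endian packing makes byte order match numeric order, so within one bucket
    a tenant's keys sort by time. The bucket comes from a hash of the event id,
    so consecutive timestamps land in different key ranges. Assumes tenant ids
    contain no NUL bytes.
    """
    bucket = zlib.crc32(event_id.encode()) % NUM_BUCKETS
    return bytes([bucket]) + tenant.encode() + b"\x00" + struct.pack(">Q", ts_millis) + event_id.encode()


def scan_ranges(tenant: str, start_millis: int, stop_millis: int) -> List[Tuple[bytes, bytes]]:
    """One half-open [start, stop) key range per bucket covering the time window."""
    prefix = tenant.encode() + b"\x00"
    return [
        (bytes([b]) + prefix + struct.pack(">Q", start_millis),
         bytes([b]) + prefix + struct.pack(">Q", stop_millis))
        for b in range(NUM_BUCKETS)
    ]
```

A reader would issue all `NUM_BUCKETS` scans, ideally in parallel, and merge the results by timestamp. This extra read cost is what buys evenly spread writes.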
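Slide 38 lists data pruning at variable levels as one answer to data size, and slide 31 mentions MapReduce aggregations. A common pattern that combines the two works in two steps. First, roll raw events up into minute, hour and day buckets. Then keep each granularity for a different length of time. The single-process sketch below assumes that pattern, and its retention periods are invented for illustration. In the system described on the slides, the counting would presumably run as MapReduce jobs over HBase rather than in a `Counter`.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple

# Hypothetical retention per granularity: fine-grained data is kept briefly,
# coarse rollups much longer.
RETENTION: Dict[str, timedelta] = {
    "minute": timedelta(days=2),
    "hour": timedelta(days=60),
    "day": timedelta(days=3650),
}


def truncate(ts: datetime, granularity: str) -> datetime:
    """Round a timestamp down to the start of its minute, hour or day."""
    if granularity == "minute":
        return ts.replace(second=0, microsecond=0)
    if granularity == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if granularity == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unknown granularity: {granularity}")


def rollup(events: Iterable[Tuple[datetime, str]], granularity: str) -> Counter:
    """Count (time bucket, query) pairs, as the reduce step of an aggregation job would."""
    return Counter((truncate(ts, granularity), query) for ts, query in events)


def prune(counts: Counter, granularity: str, now: datetime) -> Counter:
    """Drop buckets older than the retention window for this granularity."""
    cutoff = now - RETENTION[granularity]
    return Counter({key: n for key, n in counts.items() if key[0] >= cutoff})
```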
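Slide 39 names autocomplete as one payoff of feeding search analytics back into search. One simple approach takes the prefix the user has typed so far and ranks the completions for it by how often users issued each full query. Below is a minimal in-memory sketch of that approach. The function names, the prefix-length cap, and the normalization are assumptions. A production setup would more likely use a trie or the search engine's own suggester.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def build_autocomplete(queries: Iterable[str], max_prefix_len: int = 10, k: int = 5) -> Dict[str, List[str]]:
    """Map each query prefix to the k most frequently logged queries starting with it.

    Precomputing suggestions per prefix makes lookups a single dict access at the
    cost of memory; max_prefix_len caps how many prefixes are stored per query.
    """
    counts = Counter(" ".join(q.lower().split()) for q in queries)
    by_prefix: Dict[str, Counter] = defaultdict(Counter)
    for query, n in counts.items():
        for i in range(1, min(len(query), max_prefix_len) + 1):
            by_prefix[query[:i]][query] += n
    return {prefix: [q for q, _ in c.most_common(k)] for prefix, c in by_prefix.items()}


def suggest(index: Dict[str, List[str]], typed: str) -> List[str]:
    """Suggestions for what the user has typed so far (empty list if none)."""
    return index.get(typed.lower(), [])
```

Ranking by logged frequency is the simplest signal. Click-through data from the same logs could be used to demote popular queries that rarely lead to a click.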