SlideShare a Scribd company logo
1 of 21
Crowd Sourcing Reflected
Intelligence Using Search and Big
Data
Ted Dunning
Grant Ingersoll

©MapR Technologies - Confidential   1
Grant’s Background

     Co-founder:
       –   LucidWorks – Chief Scientist
       –   Apache Mahout
     Long time Lucene/Solr committer
     Author: Taming Text
     Background in IR and NLP
       –   Built CLIR, QA and a variety of other search-based apps




©MapR Technologies - Confidential             2
Ted’s Background

     Academia, Startups
       –   Aptex, MusicMatch, ID Analytics, Veoh
       –   Big data since before big
     Open source
       –   since the dark ages before the internet
       –   Mahout, Zookeeper, Drill
       –   bought the beer at first HUG
     MapR
       –   Chief Application Architect
     Founding member of Apache Drill




©MapR Technologies - Confidential             3
Agenda

     Intro
     Search Evolution and Search Revolution
     Reflected Intelligence Use Cases
     Building a Next Generation Search and Discovery Platform
       –   MapR
       –   LucidWorks
     1+1=3




©MapR Technologies - Confidential        4
Search is Dead, Long Live Search


     Search is a system building block                       Content
       –   text is only a part of the story


     If the algorithms fit,
                             use them!          Content                    User
                                              Relationships             Interaction


     Embrace fuzziness!


     Scoring features are everywhere                         Access




©MapR Technologies - Confidential                  5
Search (R)evolution

     Search use leads to search abuse
       –   denormalization frees your mind
       –   scoring is just a sparse matrix multiply

     Lucene/Solr evolution
       –   non free text usages abound
       –   many DB-like features
       –   noSQL before NoSQL was cool
       –   flexible indexing
       –   finite State Transducers FTW!

     Scale

     “This ain’t your father’s relevance anymore”


©MapR Technologies - Confidential                 6
Add (Lots of) Water

     Large-scale analysis is key to reflected intelligence
       –   correlation analysis
            • based on queries, clicks, mouse tracks,
                   even explicit feedback
            • produce clusters, trends, topics, SIP’s              Search
       –   start with engineered knowledge,
                 refine with user feedback
     Large-scale discovery features
        encourage experimentation

     Always test, always enrich!                      Analytics            Discovery




©MapR Technologies - Confidential                  7
Social Media Analysis in Telecom

     Correlate mobile traffic analysis with social media analysis
       –   events cause traffic micro-bursts
       –   participants tweet the events ahead of time
     Deploy operations faster to predict outages and better handle
      emergency situations
       –   high cost bandwidth augmentation can be marshaled as the traffic appears
       –   anticipation beats reaction




©MapR Technologies - Confidential            8
Provenance is 80% of value

     Analysis of social media to determine advertising reach and
      response


     In one case the same untargeted advertising was worth 5x if sold
      with supporting data.




©MapR Technologies - Confidential     9
Claims Analysis

     Goal
       –   Insurance claims processing and analysis
       –   fraud analysis
     Method
       –   Combine free text search with metadata analysis to identify high risk
           activities across the country
       –   Integrate with corporate workflows to detect and fix outliers in customer
           relations
     Results
       –   Questions that took 24-48 hours now take seconds to answer




©MapR Technologies - Confidential            10
Virginia Tech - Help the World

     Grab data around crisis
     Search immediately
     Large-scale analysis enriches data to find
      ways to improve responses and
      understanding
     http://www.ctrnet.net




©MapR Technologies - Confidential      11
Bright Planet - Catch the Bad Guys

     Online Drug Counterfeit detection
     Identify commonly used language indicating counterfeits
       –   you know it when you see it
       –   and you know you have seen it
     Feed to analyst via search-driven application
       –   enrich based on analysts feedback




©MapR Technologies - Confidential              12
Veoh - Cross Recommendations

     Cross recommendation as search
       –   with search used to build cross recommendation!
     Recommend content to people who exhibit certain behaviors
      (clicks, query terms, other)
     (Ab)use of a search engine
       –   but not as a search engine for content
       –   more like a search engine for behavior




©MapR Technologies - Confidential            13
What Platform Do You Need?

     Fast, efficient, scalable search
       –   bulk and near real-time indexing
       –   handle billions of records with sub-second search and faceting


     Large scale, cost effective storage and processing capabilities

     NLP and machine learning tools that scale to enhance discovery
      and analysis


     Integrated log analysis workflows that close the loop between the
      raw data and user interactions


©MapR Technologies - Confidential            14
Reference Architecture
                                                                    Access APIs
                                                                                    •View into
                                               Search View             Analytic      numeric/histo   Personalization &
                                                                                     ric data
                               1                                       Services                      Machine Learning
                                     2                                                                   Services
                          Shards           3                N
                                                                                                            •Classification
                                                                                                            •Recommendation

                                                                         Document       •Documents   Classification Models
                                    Discovery &                                         •Users
                                    Enrichment                             Store                        In memory
                                                                                        •Logs           Replicated
                                    Clustering,
                                    classification, NLP,                                                Multi-tenant
                                    topic identification,
                                    search log analysis,
                                    user behavior
                                                                Content Acquisition
                                                                   ETL, batch or near
                                                                   real-time


                                    Data
                    • LucidWorks Search
                      connectors
                    • Push




©MapR Technologies - Confidential                                           15
MapR

     MapR provides the technology leading Hadoop distribution
       –   full eco-system distribution
       –   integrated data platform
       –   complete solution for data integrity
     MapR clusters also provide tight integration with search
      technologies like LucidWorks
       –   integration is key for effective ops




©MapR Technologies - Confidential                 16
LucidWorks

     LucidWorks provides the leading packaging of Apache Lucene and
      Solr
       –   build your own, we support
       –   founded by the most prominent Lucene/Solr experts
     LucidWorks Search
       –   “Solr++”
            •    UI, REST API, MapR connectors, relevance tools, much more
     LucidWorks Big Data
       –   Big Data as a Service
       –   Integrated LucidWorks Search, Hadoop, machine learning with prebuilt
           workflows for many of these tasks




©MapR Technologies - Confidential                  17
LucidWorks Big Data Architecture

                                              Uniform ReST API

                     Content        Search – Discovery – Analytics                                System
                                      • LucidWorks Search
                    Acquisition       • Machine Learning (classification, clustering,           Management
                                        recommendations)
                                                                                             • Administration
                                      • Natural Language Processing
            • Enterprise
                                      • SQL (Hive) Interface
              Repository                                                                     • Provisioning
                                      • Data Workflows (ETL, log analysis, common metrics)
                                      • Extensible
            • Social Media                                                                   • Monitoring

            • Databases                  Big Data Operating System                           • Configuration

            • HDFS                                                                           • Service Management

            • Cloud (S3)                                                                     • Data Management

            • Push                                                                           • Security
                                     Hadoop/HBase        Search            Search
                                                          Logs            Indexes




©MapR Technologies - Confidential                          18
Easy Wins

     Analyze logs from application stored in MapR
     Seamlessly store search indexes in MapR
       –   and feed to Pig, Mahout and others
       –   use mirrors + NFS to directly deploy indexes
     Snapshots make backups a snap
     LucidWorks 2.5 (2013 Q1) easily connects with MapR




©MapR Technologies - Confidential             19
1+1=3



©MapR Technologies - Confidential     20
Learn More

     More information
       http://www.mapr.com/company/events/lucidworks-12-13-2012
     Vote for this topic for Hadoop Summit EU:
       http://bit.ly/128tLQe
     Talk to Ted
       @ted_dunning
       tdunning@maprtech.com
     Talk to Grant
       @gsingers
     MapR and Lucid Works
       http://www.mapr.com
       http://www.lucidworks.com


©MapR Technologies - Confidential      21

More Related Content

What's hot

Drill Bay Area HUG 2012-09-19
Drill Bay Area HUG 2012-09-19Drill Bay Area HUG 2012-09-19
Drill Bay Area HUG 2012-09-19
jasonfrantz
 
Azure_Business_Opportunity
Azure_Business_OpportunityAzure_Business_Opportunity
Azure_Business_Opportunity
Nojan Emad
 

What's hot (20)

Drill lightning-london-big-data-10-01-2012
Drill lightning-london-big-data-10-01-2012Drill lightning-london-big-data-10-01-2012
Drill lightning-london-big-data-10-01-2012
 
Dealing with an Upside Down Internet
Dealing with an Upside Down InternetDealing with an Upside Down Internet
Dealing with an Upside Down Internet
 
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
Big Data Meets HPC - Exploiting HPC Technologies for Accelerating Big Data Pr...
 
MapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document DatabaseMapR-DB – The First In-Hadoop Document Database
MapR-DB – The First In-Hadoop Document Database
 
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
Hadoop Summit Amsterdam 2014: Capacity Planning In Multi-tenant Hadoop Deploy...
 
MapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community EditionMapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR 5.2: Getting More Value from the MapR Converged Community Edition
 
Big Data Performance and Capacity Management
Big Data Performance and Capacity ManagementBig Data Performance and Capacity Management
Big Data Performance and Capacity Management
 
Hadoop on Azure, Blue elephants
Hadoop on Azure,  Blue elephantsHadoop on Azure,  Blue elephants
Hadoop on Azure, Blue elephants
 
Philly DB MapR Overview
Philly DB MapR OverviewPhilly DB MapR Overview
Philly DB MapR Overview
 
Designing HPC, Deep Learning, and Cloud Middleware for Exascale Systems
Designing HPC, Deep Learning, and Cloud Middleware for Exascale SystemsDesigning HPC, Deep Learning, and Cloud Middleware for Exascale Systems
Designing HPC, Deep Learning, and Cloud Middleware for Exascale Systems
 
Best Practices for Protecting Sensitive Data Across the Big Data Platform
Best Practices for Protecting Sensitive Data Across the Big Data PlatformBest Practices for Protecting Sensitive Data Across the Big Data Platform
Best Practices for Protecting Sensitive Data Across the Big Data Platform
 
Hug france-2012-12-04
Hug france-2012-12-04Hug france-2012-12-04
Hug france-2012-12-04
 
Architectural Overview of MapR's Apache Hadoop Distribution
Architectural Overview of MapR's Apache Hadoop DistributionArchitectural Overview of MapR's Apache Hadoop Distribution
Architectural Overview of MapR's Apache Hadoop Distribution
 
Realtime Computation with Storm
Realtime Computation with StormRealtime Computation with Storm
Realtime Computation with Storm
 
Drill Bay Area HUG 2012-09-19
Drill Bay Area HUG 2012-09-19Drill Bay Area HUG 2012-09-19
Drill Bay Area HUG 2012-09-19
 
Inside MapR's M7
Inside MapR's M7Inside MapR's M7
Inside MapR's M7
 
Hadoop as a Platform for Genomics
Hadoop as a Platform for GenomicsHadoop as a Platform for Genomics
Hadoop as a Platform for Genomics
 
Back to School - St. Louis Hadoop Meetup September 2016
Back to School - St. Louis Hadoop Meetup September 2016Back to School - St. Louis Hadoop Meetup September 2016
Back to School - St. Louis Hadoop Meetup September 2016
 
Azure_Business_Opportunity
Azure_Business_OpportunityAzure_Business_Opportunity
Azure_Business_Opportunity
 
Hadoop programming
Hadoop programmingHadoop programming
Hadoop programming
 

Viewers also liked

P-BuyingaHomeSummer2016
P-BuyingaHomeSummer2016P-BuyingaHomeSummer2016
P-BuyingaHomeSummer2016
Jean Venezia
 
P-SellingYourHouseSummer2016
P-SellingYourHouseSummer2016P-SellingYourHouseSummer2016
P-SellingYourHouseSummer2016
Jean Venezia
 

Viewers also liked (15)

Cach duoi ruoi hieu qua nhat - Tinhdautram.info
Cach duoi ruoi hieu qua nhat - Tinhdautram.infoCach duoi ruoi hieu qua nhat - Tinhdautram.info
Cach duoi ruoi hieu qua nhat - Tinhdautram.info
 
Muralidhar
MuralidharMuralidhar
Muralidhar
 
Giacomo bertoldi seminar_30_padova_08
Giacomo bertoldi seminar_30_padova_08Giacomo bertoldi seminar_30_padova_08
Giacomo bertoldi seminar_30_padova_08
 
V1v2v3v4 verb
V1v2v3v4 verbV1v2v3v4 verb
V1v2v3v4 verb
 
lam the nao de sua van xa nuoc bon cau - Williamcuong.com
lam the nao de sua van xa nuoc bon cau - Williamcuong.comlam the nao de sua van xa nuoc bon cau - Williamcuong.com
lam the nao de sua van xa nuoc bon cau - Williamcuong.com
 
8 cach tri ran da sau sinh hieu qua bang phuong phap tu nhien
8 cach tri ran da sau sinh hieu qua bang phuong phap tu nhien8 cach tri ran da sau sinh hieu qua bang phuong phap tu nhien
8 cach tri ran da sau sinh hieu qua bang phuong phap tu nhien
 
Sharing scores with Scorewave!
Sharing scores with Scorewave!Sharing scores with Scorewave!
Sharing scores with Scorewave!
 
Cach su dung tinh dau đúng cách - Tinhdautram.info
Cach su dung tinh dau đúng cách - Tinhdautram.infoCach su dung tinh dau đúng cách - Tinhdautram.info
Cach su dung tinh dau đúng cách - Tinhdautram.info
 
Cach tri nghet mũi cho tre so sinh hieu qua - tinhdautram.info
Cach tri nghet mũi cho tre so sinh hieu qua - tinhdautram.infoCach tri nghet mũi cho tre so sinh hieu qua - tinhdautram.info
Cach tri nghet mũi cho tre so sinh hieu qua - tinhdautram.info
 
Cannabis y psoriasis
Cannabis y psoriasisCannabis y psoriasis
Cannabis y psoriasis
 
Mô hình-osi
Mô hình-osiMô hình-osi
Mô hình-osi
 
chương 4 - TCP/IP - mạng máy tính
chương 4 - TCP/IP - mạng máy tínhchương 4 - TCP/IP - mạng máy tính
chương 4 - TCP/IP - mạng máy tính
 
Chap002
Chap002Chap002
Chap002
 
P-BuyingaHomeSummer2016
P-BuyingaHomeSummer2016P-BuyingaHomeSummer2016
P-BuyingaHomeSummer2016
 
P-SellingYourHouseSummer2016
P-SellingYourHouseSummer2016P-SellingYourHouseSummer2016
P-SellingYourHouseSummer2016
 

Similar to MapR LucidWorks Joint Webinar 121211

Large-Scale Search Discovery Analytics with Hadoop, Mahout, Solr
Large-Scale Search Discovery Analytics with Hadoop, Mahout, SolrLarge-Scale Search Discovery Analytics with Hadoop, Mahout, Solr
Large-Scale Search Discovery Analytics with Hadoop, Mahout, Solr
DataWorks Summit
 
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and SolrLarge Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
Grant Ingersoll
 
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and SolrLarge Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
Grant Ingersoll
 

Similar to MapR LucidWorks Joint Webinar 121211 (20)

MapR lucidworks joint webinar
MapR lucidworks joint webinarMapR lucidworks joint webinar
MapR lucidworks joint webinar
 
Crowd-Sourced Intelligence Built into Search over Hadoop
Crowd-Sourced Intelligence Built into Search over HadoopCrowd-Sourced Intelligence Built into Search over Hadoop
Crowd-Sourced Intelligence Built into Search over Hadoop
 
MapR and Lucidworks Joint Webinar 2012
MapR and Lucidworks Joint Webinar 2012MapR and Lucidworks Joint Webinar 2012
MapR and Lucidworks Joint Webinar 2012
 
Large-Scale Search Discovery Analytics with Hadoop, Mahout, Solr
Large-Scale Search Discovery Analytics with Hadoop, Mahout, SolrLarge-Scale Search Discovery Analytics with Hadoop, Mahout, Solr
Large-Scale Search Discovery Analytics with Hadoop, Mahout, Solr
 
Leveraging Solr and Mahout
Leveraging Solr and MahoutLeveraging Solr and Mahout
Leveraging Solr and Mahout
 
Hadoop Summit EU - Crowd Sourcing Reflected Intelligence
Hadoop Summit EU - Crowd Sourcing Reflected IntelligenceHadoop Summit EU - Crowd Sourcing Reflected Intelligence
Hadoop Summit EU - Crowd Sourcing Reflected Intelligence
 
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and SolrLarge Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
 
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and SolrLarge Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
Large Scale Search, Discovery and Analytics with Hadoop, Mahout and Solr
 
Crowd sourced intelligence built into search over hadoop
Crowd sourced intelligence built into search over hadoopCrowd sourced intelligence built into search over hadoop
Crowd sourced intelligence built into search over hadoop
 
Splunk hunkbeta
Splunk hunkbetaSplunk hunkbeta
Splunk hunkbeta
 
20120419 linkedopendataandteamsciencemcguinnesschicago
20120419 linkedopendataandteamsciencemcguinnesschicago20120419 linkedopendataandteamsciencemcguinnesschicago
20120419 linkedopendataandteamsciencemcguinnesschicago
 
OpenSearchLab and the Lucene Ecosystem
OpenSearchLab and the Lucene EcosystemOpenSearchLab and the Lucene Ecosystem
OpenSearchLab and the Lucene Ecosystem
 
Teradata Loom Introductory Presentation
Teradata Loom Introductory PresentationTeradata Loom Introductory Presentation
Teradata Loom Introductory Presentation
 
DATAWEEK KEYNOTE: LARGE SCALE SEARCH, DISCOVERY AND ANALYSIS IN ACTION
DATAWEEK KEYNOTE: LARGE SCALE SEARCH, DISCOVERY AND ANALYSIS IN ACTIONDATAWEEK KEYNOTE: LARGE SCALE SEARCH, DISCOVERY AND ANALYSIS IN ACTION
DATAWEEK KEYNOTE: LARGE SCALE SEARCH, DISCOVERY AND ANALYSIS IN ACTION
 
Neo4j GraphTalk Oslo - Introduction to Graphs
Neo4j GraphTalk Oslo - Introduction to GraphsNeo4j GraphTalk Oslo - Introduction to Graphs
Neo4j GraphTalk Oslo - Introduction to Graphs
 
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
 
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
The Search Is Over: Integrating Solr and Hadoop in the Same Cluster to Simpli...
 
Data Governance for Data Lakes
Data Governance for Data LakesData Governance for Data Lakes
Data Governance for Data Lakes
 
ODSC and iRODS
ODSC and iRODSODSC and iRODS
ODSC and iRODS
 
Big data beyond the hype may 2014
Big data beyond the hype may 2014Big data beyond the hype may 2014
Big data beyond the hype may 2014
 

More from MapR Technologies

More from MapR Technologies (20)

Converging your data landscape
Converging your data landscapeConverging your data landscape
Converging your data landscape
 
ML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & EvaluationML Workshop 2: Machine Learning Model Comparison & Evaluation
ML Workshop 2: Machine Learning Model Comparison & Evaluation
 
Self-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your DataSelf-Service Data Science for Leveraging ML & AI on All of Your Data
Self-Service Data Science for Leveraging ML & AI on All of Your Data
 
Enabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data CaptureEnabling Real-Time Business with Change Data Capture
Enabling Real-Time Business with Change Data Capture
 
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...
 
ML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning LogisticsML Workshop 1: A New Architecture for Machine Learning Logistics
ML Workshop 1: A New Architecture for Machine Learning Logistics
 
Machine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model ManagementMachine Learning Success: The Key to Easier Model Management
Machine Learning Success: The Key to Easier Model Management
 
Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action Data Warehouse Modernization: Accelerating Time-To-Action
Data Warehouse Modernization: Accelerating Time-To-Action
 
Live Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIsLive Tutorial – Streaming Real-Time Events Using Apache APIs
Live Tutorial – Streaming Real-Time Events Using Apache APIs
 
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale StorageBringing Structure, Scalability, and Services to Cloud-Scale Storage
Bringing Structure, Scalability, and Services to Cloud-Scale Storage
 
Live Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn PredictionLive Machine Learning Tutorial: Churn Prediction
Live Machine Learning Tutorial: Churn Prediction
 
An Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data PlatformAn Introduction to the MapR Converged Data Platform
An Introduction to the MapR Converged Data Platform
 
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...
 
Best Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in HealthcareBest Practices for Data Convergence in Healthcare
Best Practices for Data Convergence in Healthcare
 
Geo-Distributed Big Data and Analytics
Geo-Distributed Big Data and AnalyticsGeo-Distributed Big Data and Analytics
Geo-Distributed Big Data and Analytics
 
MapR Product Update - Spring 2017
MapR Product Update - Spring 2017MapR Product Update - Spring 2017
MapR Product Update - Spring 2017
 
3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics3 Benefits of Multi-Temperature Data Management for Data Analytics
3 Benefits of Multi-Temperature Data Management for Data Analytics
 
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsCisco & MapR bring 3 Superpowers to SAP HANA Deployments
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments
 
MapR and Cisco Make IT Better
MapR and Cisco Make IT BetterMapR and Cisco Make IT Better
MapR and Cisco Make IT Better
 
Evolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQLEvolving from RDBMS to NoSQL + SQL
Evolving from RDBMS to NoSQL + SQL
 

Recently uploaded

Recently uploaded (20)

2024 May Patch Tuesday
2024 May Patch Tuesday2024 May Patch Tuesday
2024 May Patch Tuesday
 
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
(Explainable) Data-Centric AI: what are you explaininhg, and to whom?
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
Intro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptxIntro to Passkeys and the State of Passwordless.pptx
Intro to Passkeys and the State of Passwordless.pptx
 
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
ASRock Industrial FDO Solutions in Action for Industrial Edge AI _ Kenny at A...
 
ERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage Intacct
 
TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024
 
Overview of Hyperledger Foundation
Overview of Hyperledger FoundationOverview of Hyperledger Foundation
Overview of Hyperledger Foundation
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentation
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptx
 
Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024
 
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
Secure Zero Touch enabled Edge compute with Dell NativeEdge via FDO _ Brad at...
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptx
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
 
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
 
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdfIntroduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
Introduction to FDO and How It works Applications _ Richard at FIDO Alliance.pdf
 

MapR LucidWorks Joint Webinar 121211

  • 1. Crowd Sourcing Reflected Intelligence Using Search and Big Data Ted Dunning Grant Ingersoll ©MapR Technologies - Confidential 1
  • 2. Grant’s Background  Co-founder: – LucidWorks – Chief Scientist – Apache Mahout  Long time Lucene/Solr committer  Author: Taming Text  Background in IR and NLP – Built CLIR, QA and a variety of other search-based apps ©MapR Technologies - Confidential 2
  • 3. Ted’s Background  Academia, Startups – Aptex, MusicMatch, ID Analytics, Veoh – Big data since before big  Open source – since the dark ages before the internet – Mahout, Zookeeper, Drill – bought the beer at first HUG  MapR – Chief Application Architect  Founding member of Apache Drill ©MapR Technologies - Confidential 3
  • 4. Agenda  Intro  Search Evolution and Search Revolution  Reflected Intelligence Use Cases  Building a Next Generation Search and Discovery Platform – MapR – LucidWorks  1+1=3 ©MapR Technologies - Confidential 4
  • 5. Search is Dead, Long Live Search  Search is a system building block Content – text is only a part of the story  If the algorithms fit, use them! Content User Relationships Interaction  Embrace fuzziness!  Scoring features are everywhere Access ©MapR Technologies - Confidential 5
  • 6. Search (R)evolution  Search use leads to search abuse – denormalization frees your mind – scoring is just a sparse matrix multiply  Lucene/Solr evolution – non free text usages abound – many DB-like features – noSQL before NoSQL was cool – flexible indexing – finite State Transducers FTW!  Scale  “This ain’t your father’s relevance anymore” ©MapR Technologies - Confidential 6
  • 7. Add (Lots of) Water  Large-scale analysis is key to reflected intelligence – correlation analysis • based on queries, clicks, mouse tracks, even explicit feedback • produce clusters, trends, topics, SIP’s Search – start with engineered knowledge, refine with user feedback  Large-scale discovery features encourage experimentation  Always test, always enrich! Analytics Discovery ©MapR Technologies - Confidential 7
  • 8. Social Media Analysis in Telecom  Correlate mobile traffic analysis with social media analysis – events cause traffic micro-bursts – participants tweet the events ahead of time  Deploy operations faster to predict outages and better handle emergency situations – high cost bandwidth augmentation can be marshaled as the traffic appears – anticipation beats reaction ©MapR Technologies - Confidential 8
  • 9. Provenance is 80% of value  Analysis of social media to determine advertising reach and response  In one case the same untargeted advertising was worth 5x if sold with supporting data. ©MapR Technologies - Confidential 9
  • 10. Claims Analysis  Goal – Insurance claims processing and analysis – fraud analysis  Method – Combine free text search with metadata analysis to identify high risk activities across the country – Integrate with corporate workflows to detect and fix outliers in customer relations  Results – Questions that took 24-48 hours now take seconds to answer ©MapR Technologies - Confidential 10
  • 11. Virginia Tech - Help the World  Grab data around crisis  Search immediately  Large-scale analysis enriches data to find ways to improve responses and understanding  http://www.ctrnet.net ©MapR Technologies - Confidential 11
  • 12. Bright Planet - Catch the Bad Guys  Online Drug Counterfeit detection  Identify commonly used language indicating counterfeits – you know it when you see it – and you know you have seen it  Feed to analyst via search-driven application – enrich based on analysts feedback ©MapR Technologies - Confidential 12
  • 13. Veoh - Cross Recommendations  Cross recommendation as search – with search used to build cross recommendation!  Recommend content to people who exhibit certain behaviors (clicks, query terms, other)  (Ab)use of a search engine – but not as a search engine for content – more like a search engine for behavior ©MapR Technologies - Confidential 13
  • 14. What Platform Do You Need?  Fast, efficient, scalable search – bulk and near real-time indexing – handle billions of records with sub-second search and faceting  Large scale, cost effective storage and processing capabilities  NLP and machine learning tools that scale to enhance discovery and analysis  Integrated log analysis workflows that close the loop between the raw data and user interactions ©MapR Technologies - Confidential 14
  • 15. Reference Architecture Access APIs •View into Search View Analytic numeric/histo Personalization & ric data 1 Services Machine Learning 2 Services Shards 3 N •Classification •Recommendation Document •Documents Classification Models Discovery & •Users Enrichment Store In memory •Logs Replicated Clustering, classification, NLP, Multi-tenant topic identification, search log analysis, user behavior Content Acquisition ETL, batch or near real-time Data • LucidWorks Search connectors • Push ©MapR Technologies - Confidential 15
  • 16. MapR  MapR provides the technology leading Hadoop distribution – full eco-system distribution – integrated data platform – complete solution for data integrity  MapR clusters also provide tight integration with search technologies like LucidWorks – integration is key for effective ops ©MapR Technologies - Confidential 16
  • 17. LucidWorks  LucidWorks provides the leading packaging of Apache Lucene and Solr – build your own, we support – founded by the most prominent Lucene/Solr experts  LucidWorks Search – “Solr++” • UI, REST API, MapR connectors, relevance tools, much more  LucidWorks Big Data – Big Data as a Service – Integrated LucidWorks Search, Hadoop, machine learning with prebuilt workflows for many of these tasks ©MapR Technologies - Confidential 17
  • 18. LucidWorks Big Data Architecture Uniform ReST API Content Search – Discovery – Analytics System • LucidWorks Search Acquisition • Machine Learning (classification, clustering, Management recommendations) • Administration • Natural Language Processing • Enterprise • SQL (Hive) Interface Repository • Provisioning • Data Workflows (ETL, log analysis, common metrics) • Extensible • Social Media • Monitoring • Databases Big Data Operating System • Configuration • HDFS • Service Management • Cloud (S3) • Data Management • Push • Security Hadoop/HBase Search Search Logs Indexes ©MapR Technologies - Confidential 18
  • 19. Easy Wins  Analyze logs from application stored in MapR  Seamlessly store search indexes in MapR – and feed to Pig, Mahout and others – use mirrors + NFS to directly deploy indexes  Snapshots make backups a snap  LucidWorks 2.5 (2013 Q1) easily connects with MapR ©MapR Technologies - Confidential 19
  • 20. 1+1=3 ©MapR Technologies - Confidential 20
  • 21. Learn More  More information http://www.mapr.com/company/events/lucidworks-12-13-2012  Vote for this topic for Hadoop Summit EU: http://bit.ly/128tLQe  Talk to Ted @ted_dunning tdunning@maprtech.com  Talk to Grant @gsingers  MapR and Lucid Works http://www.mapr.com http://www.lucidworks.com ©MapR Technologies - Confidential 21

Editor's Notes

  1. TED: We can tighten or loosen as necessary.
  2. TED: I think that the agenda needs to go here because it otherwise breaks up some key flow
  3. TED: This is a money slide where people should say “Wow man”. They shouldn’t understand the implications of this, but they should be very, very aware that something big just slide into the room.Tech Building Block: Not just textNot just users + queriesEmbrace Fuzziness: Esp. in Big Data, it is the only way you are going to survive.TED: I think that this should make the case for advanced that is still search at its heart. The idea that search can be radically changed should be on the next slide.
  4. Search Abuse Can discuss how I started just doing free text, but then a curious thing happened, started to see people using the engine for things like: key/value, denormalized DBs, browsing engines, plagiarism detection, teaching languages, record linkage and much, much moreSearch has added more DB features over the yearsTED: We need to introduce the idea of *REVOLUTION* somewhere in here.
  5. All that revolution is good, but what the heck does this have to do w/ Big Data?
  6. GSI: needs a bit more meat
  7. Service-Oriented ArchitectureStatelessFailover/Fault TolerantLightweight Coordination and MessagingSmart about UpdatesDocument store isDistributedScalableAnalysisBatchNear Real-Time