MapR LucidWorks Joint Webinar 121211

Crowd Sourcing Reflected Intelligence Using Search and Big Data
Ted Dunning
Grant Ingersoll

©MapR Technologies - Confidential 1

Grant’s Background

■ Co-founder:
– LucidWorks (Chief Scientist)
– Apache Mahout
■ Longtime Lucene/Solr committer
■ Author: Taming Text
■ Background in IR and NLP
– Built cross-language IR (CLIR), question answering (QA) and a variety of other search-based apps


Ted’s Background

■ Academia, startups
– Aptex, MusicMatch, ID Analytics, Veoh
– Big data since before it was big
■ Open source
– since the dark ages before the internet
– Mahout, ZooKeeper, Drill
– bought the beer at the first HUG (Hadoop User Group)
■ MapR
– Chief Application Architect
■ Founding member of Apache Drill


Agenda

■ Intro
■ Search Evolution and Search Revolution
■ Reflected Intelligence Use Cases
■ Building a Next Generation Search and Discovery Platform
– MapR
– LucidWorks
■ 1+1=3


Search is Dead, Long Live Search

■ Search is a system building block
– text is only a part of the story

■ If the algorithms fit, use them!

■ Embrace fuzziness!

■ Scoring features are everywhere

[Diagram: Content, User Interaction, Relationships, Access]


Search (R)evolution

■ Search use leads to search abuse
– denormalization frees your mind
– scoring is just a sparse matrix multiply

■ Lucene/Solr evolution
– non-free-text usages abound
– many DB-like features
– noSQL before NoSQL was cool
– flexible indexing
– finite state transducers (FSTs) FTW!

■ Scale

■ “This ain’t your father’s relevance anymore”
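The point that scoring is just a sparse matrix multiply can be made concrete. A minimal sketch, assuming a toy vector-space model with made-up weights (not Lucene's actual similarity function): the inverted index is a sparse term-by-document matrix stored as one postings list per term, and scoring a weighted query is that matrix times the query vector, reading only the postings of the query's terms. Document IDs, terms, and weights are hypothetical.

```python
from collections import defaultdict


def build_postings(doc_term_weights):
    """Invert {doc: {term: weight}} into {term: [(doc, weight), ...]}.

    The result is a sparse term-by-document matrix stored one row (postings
    list) per term, which is what an inverted index is.
    """
    postings = defaultdict(list)
    for doc, terms in doc_term_weights.items():
        for term, weight in terms.items():
            postings[term].append((doc, weight))
    return postings


def score(postings, query_weights):
    """Return {doc: sum over query terms of q[term] * A[term][doc]}.

    Only the postings for the query's terms are read, so the cost depends on
    the query's nonzero entries, not on the size of the collection.
    """
    scores = defaultdict(float)
    for term, q_weight in query_weights.items():
        for doc, d_weight in postings.get(term, []):
            scores[doc] += q_weight * d_weight
    return dict(scores)


if __name__ == "__main__":
    # Hypothetical toy collection: document -> term weights.
    docs = {
        "d1": {"hadoop": 1.0, "search": 0.5},
        "d2": {"search": 2.0},
        "d3": {"mahout": 1.0},
    }
    postings = build_postings(docs)
    print(score(postings, {"search": 1.0, "hadoop": 2.0}))  # prints {'d1': 2.5, 'd2': 2.0}
```

Documents that share no term with the query (d3 here) are never touched, which is why search engines can score billions of documents quickly.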


Add (Lots of) Water

■ Large-scale analysis is key to reflected intelligence
– correlation analysis
• based on queries, clicks, mouse tracks, even explicit feedback
• produce clusters, trends, topics, SIPs
– start with engineered knowledge, refine with user feedback
■ Large-scale discovery features encourage experimentation

■ Always test, always enrich!

[Diagram: Search, Analytics, Discovery]
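The slide does not name a correlation test. One common choice for query and click co-occurrence, and the one Apache Mahout implements, is the log-likelihood ratio (G²) test on a 2×2 contingency table. A minimal sketch, assuming counts over user sessions where k11 is sessions containing both event A and event B, k12 is A without B, k21 is B without A, and k22 is neither:

```python
import math


def x_log_x(x):
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def unnormalized_entropy(*counts):
    """N * H(p) for the distribution given by raw counts, where N = sum(counts)."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 table of co-occurrence counts.

    Large values mean A and B co-occur far more (or less) often than
    independence predicts; values near 0 mean no evidence of association.
    """
    row = unnormalized_entropy(k11 + k12, k21 + k22)
    col = unnormalized_entropy(k11 + k21, k12 + k22)
    mat = unnormalized_entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))


if __name__ == "__main__":
    print(log_likelihood_ratio(10, 10, 10, 10))   # independent counts: near 0
    print(log_likelihood_ratio(100, 0, 0, 100))   # perfectly associated: about 277.3
```

Ranking candidate pairs by this score keeps the anomalous co-occurrences (the "interesting" correlations) and discards ones explained by sheer popularity.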


Social Media Analysis in Telecom

■ Correlate mobile traffic analysis with social media analysis
– events cause traffic micro-bursts
– participants tweet about the events ahead of time
■ Predict outages and deploy operations faster to better handle emergency situations
– high-cost bandwidth augmentation can be marshaled as the traffic appears
– anticipation beats reaction
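The slide does not describe the detector itself. As an illustration only, a minimal sketch of the "anticipation" step, assuming social media mentions have already been extracted and tagged by location (hypothetical inputs): compare recent mention counts per location with a baseline window and flag large ratios as candidates for pre-provisioning bandwidth. The threshold values are hypothetical; a production system would model the baseline statistically rather than use a fixed ratio.

```python
from collections import Counter


def flag_upcoming_bursts(baseline_mentions, recent_mentions, min_ratio=3.0, min_count=10):
    """Flag locations whose recent mentions far exceed their baseline.

    baseline_mentions / recent_mentions: mapping of location -> mention count
    over comparable time windows. A location with no baseline is treated as
    having a baseline of 1 to avoid division by zero. Returns (location, ratio)
    pairs sorted by descending ratio.
    """
    flagged = []
    for loc, recent in recent_mentions.items():
        if recent < min_count:  # too few mentions to be meaningful
            continue
        ratio = recent / max(baseline_mentions.get(loc, 0), 1)
        if ratio >= min_ratio:
            flagged.append((loc, ratio))
    flagged.sort(key=lambda pair: -pair[1])
    return flagged


if __name__ == "__main__":
    # Hypothetical counts of event mentions per cell-site area.
    baseline = Counter({"stadium": 20, "downtown": 100})
    recent = Counter({"stadium": 240, "downtown": 110, "airport": 8})
    print(flag_upcoming_bursts(baseline, recent))  # prints [('stadium', 12.0)]
```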


Provenance is 80% of value

■ Analysis of social media to determine advertising reach and response

■ In one case, the same untargeted advertising was worth 5x as much when sold with supporting data.


Claims Analysis

■ Goal
– Insurance claims processing and analysis
– fraud analysis
■ Method
– Combine free-text search with metadata analysis to identify high-risk activities across the country
– Integrate with corporate workflows to detect and fix outliers in customer relations
■ Results
– Questions that took 24-48 hours to answer now take seconds
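The slide does not show the implementation. Since LucidWorks Search is built on Solr, one way to express "free-text search plus metadata analysis" is a Solr select request: free text goes in `q`, each metadata constraint in its own `fq` filter query, and facet counts help spot outliers. A minimal sketch that only builds the request URL; the core name `claims` and the fields `state`, `amount`, and `adjuster_id` are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def build_claims_query(base_url, text, filter_queries, facet_fields=(), rows=20):
    """Build a Solr /select URL that mixes free-text search with metadata filters.

    text           -> q: ranked free-text search (e.g. over claim notes)
    filter_queries -> one fq per metadata constraint (unranked, cacheable filters)
    facet_fields   -> facet counts per field value, useful for spotting outliers
    """
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    params += [("fq", fq) for fq in filter_queries]
    if facet_fields:
        params.append(("facet", "true"))
        params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical core and field names; adapt to the real schema.
    url = build_claims_query(
        "http://localhost:8983/solr/claims",
        "water damage",
        ["state:TX", "amount:[10000 TO *]"],
        facet_fields=["adjuster_id"],
    )
    print(url)
    print(parse_qs(urlparse(url).query)["fq"])  # prints ['state:TX', 'amount:[10000 TO *]']
```

Keeping metadata in `fq` rather than in `q` means the constraints do not distort relevance scores, and Solr can cache them across queries.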


Virginia Tech - Help the World

■ Grab data around a crisis
■ Search immediately
■ Large-scale analysis enriches data to find ways to improve responses and understanding
■ http://www.ctrnet.net


Bright Planet - Catch the Bad Guys

■ Online drug counterfeit detection
■ Identify commonly used language indicating counterfeits
– you know it when you see it
– and you know you have seen it
■ Feed results to analysts via a search-driven application
– enrich based on analysts' feedback


Veoh - Cross Recommendations

■ Cross recommendation as search
– with search used to build cross recommendation!
■ Recommend content to people who exhibit certain behaviors (clicks, query terms, other)
■ (Ab)use of a search engine
– but not as a search engine for content
– more like a search engine for behavior
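The idea of cross recommendation as a search for behavior can be sketched in a few lines. Assuming hypothetical sessions that record one kind of behavior (query terms) and another kind of outcome (videos watched), each item becomes a pseudo-document whose "indicator" field lists the behaviors that co-occur with it, and a user's recent behavior is the query. This is a toy: a real system would keep only significant co-occurrences (for example via the LLR test) rather than apply a raw count threshold, and would let the search engine index and rank the indicator documents instead of the simple overlap count used here.

```python
from collections import Counter, defaultdict


def build_indicator_docs(sessions, min_count=2):
    """sessions: list of (behaviors, items) pairs, e.g. (query terms, videos watched).

    Returns {item: set of behaviors} keeping behaviors that co-occurred with the
    item in at least min_count sessions. Each entry plays the role of a search
    document whose 'indicators' field would be indexed.
    """
    pair_counts = Counter()
    for behaviors, items in sessions:
        for item in set(items):
            for behavior in set(behaviors):
                pair_counts[(item, behavior)] += 1
    docs = defaultdict(set)
    for (item, behavior), n in pair_counts.items():
        if n >= min_count:
            docs[item].add(behavior)
    return dict(docs)


def recommend(indicator_docs, recent_behaviors, exclude=(), top_n=3):
    """Use the user's recent behaviors as a query against the indicator field.

    Score = number of matching indicators (a stand-in for search-engine ranking);
    ties are broken by item ID for deterministic output.
    """
    query = set(recent_behaviors)
    scored = []
    for item, indicators in indicator_docs.items():
        if item in exclude:
            continue
        overlap = len(query & indicators)
        if overlap:
            scored.append((overlap, item))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in scored[:top_n]]


if __name__ == "__main__":
    # Hypothetical sessions: (query terms typed, videos watched).
    sessions = [
        (["soccer", "highlights"], ["v1", "v2"]),
        (["soccer", "goals"], ["v1"]),
        (["cooking", "pasta"], ["v3"]),
        (["cooking", "sauce"], ["v3", "v4"]),
        (["soccer", "highlights"], ["v2"]),
    ]
    docs = build_indicator_docs(sessions)
    print(recommend(docs, ["soccer", "highlights"]))  # prints ['v2', 'v1']
```

The recommendation step is just a query, so it inherits the search engine's speed, filtering, and scaling, which is the (ab)use the slide describes.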


What Platform Do You Need?

■ Fast, efficient, scalable search
– bulk and near real-time indexing
– handle billions of records with sub-second search and faceting

■ Large-scale, cost-effective storage and processing capabilities

■ NLP and machine learning tools that scale to enhance discovery and analysis

■ Integrated log analysis workflows that close the loop between the raw data and user interactions
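Closing the loop means turning raw interaction logs back into signals the search side can use. A minimal sketch, assuming a hypothetical log format of (query, result count, clicked document or None) records: it surfaces zero-result queries as content gaps and per-query click counts as candidate relevance boosts. At the scale this slide has in mind, the same aggregation would run as a Hadoop job (for example in Pig or Hive) rather than in a single process.

```python
from collections import Counter, defaultdict


def analyze_search_log(records):
    """records: iterable of (query, num_results, clicked_doc_or_None) tuples.

    Returns (zero_results, clicks):
      zero_results: Counter of normalized queries that returned nothing (content gaps)
      clicks: {normalized query: Counter(doc -> click count)} (candidate boosts)
    """
    zero_results = Counter()
    clicks = defaultdict(Counter)
    for query, num_results, clicked in records:
        q = query.strip().lower()  # simple normalization so variants aggregate
        if num_results == 0:
            zero_results[q] += 1
        elif clicked is not None:
            clicks[q][clicked] += 1
    return zero_results, dict(clicks)


if __name__ == "__main__":
    # Hypothetical log records.
    log = [
        ("Hadoop snapshots", 12, "doc7"),
        ("hadoop snapshots", 12, "doc7"),
        ("hadoop snapshots", 12, "doc3"),
        ("drill odbc", 0, None),
        ("Drill ODBC ", 0, None),
        ("mahout llr", 5, None),
    ]
    zero, clicks = analyze_search_log(log)
    print(zero.most_common())                               # prints [('drill odbc', 2)]
    print(clicks["hadoop snapshots"].most_common(1))        # prints [('doc7', 2)]
```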


Reference Architecture
[Diagram: reference architecture]
– Access APIs
– Search Services (shards 1, 2, 3, …, N)
– Analytic Services: view into numeric/historic data
– Personalization & Machine Learning Services: classification, recommendation
– Store: documents, users, logs; classification models; in memory, replicated, multi-tenant
– Document Discovery & Enrichment: clustering, classification, NLP, topic identification, search log analysis, user behavior
– Content Acquisition: ETL, batch or near real-time
– Data: LucidWorks Search connectors, push


MapR

■ MapR provides the technology-leading Hadoop distribution
– full ecosystem distribution
– integrated data platform
– complete solution for data integrity
■ MapR clusters also provide tight integration with search technologies like LucidWorks
– integration is key for effective ops


LucidWorks

■ LucidWorks provides the leading packaging of Apache Lucene and Solr
– build your own, we support
– founded by the most prominent Lucene/Solr experts
■ LucidWorks Search
– “Solr++”
• UI, REST API, MapR connectors, relevance tools, much more
■ LucidWorks Big Data
– Big Data as a Service
– Integrated LucidWorks Search, Hadoop, machine learning with prebuilt workflows for many of these tasks


LucidWorks Big Data Architecture

[Diagram: LucidWorks Big Data architecture]
– Uniform REST API
– Content Acquisition: Enterprise Repository, Social Media, Databases, HDFS, Cloud (S3), Push
– Search – Discovery – Analytics: LucidWorks Search; Machine Learning (classification, clustering, recommendations); Natural Language Processing; SQL (Hive) Interface; Data Workflows (ETL, log analysis, common metrics); Extensible
– System Management: Administration, Provisioning, Monitoring, Configuration, Service Management, Data Management, Security
– Big Data Operating System: Hadoop/HBase, Search Logs, Search Indexes


Easy Wins

■ Analyze application logs stored in MapR
■ Seamlessly store search indexes in MapR
– and feed them to Pig, Mahout and others
– use mirrors + NFS to deploy indexes directly
■ Snapshots make backups a snap
■ LucidWorks 2.5 (2013 Q1) easily connects with MapR


1+1=3


Learn More

■ More information
http://www.mapr.com/company/events/lucidworks-12-13-2012
■ Vote for this topic for Hadoop Summit EU:
http://bit.ly/128tLQe
■ Talk to Ted
@ted_dunning
tdunning@maprtech.com
■ Talk to Grant
@gsingers
■ MapR and LucidWorks
http://www.mapr.com
http://www.lucidworks.com

