Hadoop Summit EU - Crowd Sourcing Reflected Intelligence
 

A live replay of the webinar that Grant and I gave last fall, this time at Hadoop Summit EU.

© All Rights Reserved

  • TED: We can tighten or loosen as necessary.
  • TED: I think that the agenda needs to go here because it otherwise breaks up some key flow
  • Search Abuse: Can discuss how I started out just doing free text, but then a curious thing happened: I started to see people using the engine for things like key/value stores, denormalized DBs, browsing engines, plagiarism detection, teaching languages, record linkage, and much, much more. Search has added more DB features over the years.
  • TED: We need to introduce the idea of *REVOLUTION* somewhere in here.
  • All that revolution is good, but what the heck does this have to do w/ Big Data?
  • GSI: needs a bit more meat
  • Service-Oriented Architecture: Stateless; Failover/Fault Tolerant; Lightweight Coordination and Messaging; Smart about Updates. Document store is: Distributed; Scalable. Analysis: Batch; Near Real-Time.

Presentation Transcript

  • 1. Crowd Sourcing Reflected Intelligence Using Search and Big Data (©MapR Technologies - Confidential)
      • Ted Dunning
      • Ivan Provalov
  • 2. Ted’s Background
      • Academia, Startups
          – Aptex, MusicMatch, ID Analytics, Veoh
          – Big data since before big
      • Open source
          – since the dark ages before the internet
          – Mahout, Zookeeper, Drill
          – bought the beer at first HUG
      • MapR
          – Chief Application Architect
      • Founding member of Apache Drill
  • 3. Ivan’s Background
      • Sr. Research Engineer at LucidWorks
      • NLP, IR
      • Agile development
  • 4. Agenda
      • Intro
      • Search Evolution and Search Revolution
      • Reflected Intelligence Use Cases
      • Building a Next Generation Search and Discovery Platform
          – MapR
          – LucidWorks
      • 1+1=3
  • 5. User Interactions With Big Data
      • [Diagram. Data stores: DFS, Key Value Store, Index. Interfaces: Command Line, Query Language, Keyword Search. Users: System Administrator, Engineer, End User.]
  • 6. Is Search Enough?
      • Keyword search is a commodity
      • Holistic view of the data and the user interactions with that data
      • Search, Discovery and Analytics are the key to unlocking this view of users and data
      • [Search box example: "new pope"]
  • 7. User Interactions With Big Data
      • [Diagram. Data stores: DFS, Key Value Store, Index. Interfaces: Command Line, Query Language, Keyword Search. Users: System Administrator, Engineer, End User. Added label: Reflected Intelligence.]
  • 8. Search (R)evolution
      • Search use leads to search abuse
          – denormalization frees your mind
          – scoring is just a sparse matrix multiply
      • Lucene/Solr evolution
          – non free text usages abound
          – many DB-like features
          – noSQL before NoSQL was cool
          – flexible indexing
          – finite State Transducers FTW!
      • Scale
      • “This ain’t your father’s relevance anymore”
  • 9. Search, Discovery and Analytics
      • Large-scale analysis is key to reflected intelligence
          – correlation analysis
              • based on queries, clicks, mouse tracks, even explicit feedback
              • produce clusters, trends, topics, SIP’s
          – start with engineered knowledge, refine with user feedback
      • Large-scale discovery features encourage experimentation
      • Always test, always enrich!
      • [Diagram labels: Search, Discovery, Analytics]
  • 10. Social Media Analysis in Telecom
      • Correlate mobile traffic analysis with social media analysis
          – events cause traffic micro-bursts
          – participants tweet the events ahead of time
      • Deploy operations faster to predict outages and better handle emergency situations
          – high cost bandwidth augmentation can be marshaled as the traffic appears
          – anticipation beats reaction
  • 11. Provenance is 80% of Value
      • Analysis of social media to determine advertising reach and response
      • In one case the same untargeted advertising was worth 5x if sold with supporting data.
  • 12. Claims Analysis
      • Goal
          – Insurance claims processing and analysis
          – fraud analysis
      • Method
          – Combine free text search with metadata analysis to identify high risk activities across the country
          – Integrate with corporate workflows to detect and fix outliers in customer relations
      • Results
          – Questions that took 24-48 hours now take seconds to answer
  • 13. Virginia Tech - Help the World
      • Grab data around crisis
      • Search immediately
      • Large-scale analysis enriches data to find ways to improve responses and understanding
      • http://www.ctrnet.net
  • 14. Bright Planet - Catch the Bad Guys
      • Online Drug Counterfeit detection
      • Identify commonly used language indicating counterfeits
          – you know it when you see it
          – and you know you have seen it
      • Feed to analyst via search-driven application
          – enrich based on analysts feedback
  • 15. Veoh - Cross Recommendations
      • Cross recommendation as search
          – with search used to build cross recommendation!
      • Recommend content to people who exhibit certain behaviors (clicks, query terms, other)
      • (Ab)use of a search engine
          – but not as a search engine for content
          – more like a search engine for behavior
  • 16. What Platform Do You Need?
      • Fast, efficient, scalable search
          – bulk and near real-time indexing
          – handle billions of records with sub-second search and faceting
      • Large scale, cost effective storage and processing capabilities
      • NLP and machine learning tools that scale to enhance discovery and analysis
      • Integrated log analysis workflows that close the loop between the raw data and user interactions
  • 17. Reference Architecture
      • [Diagram; component labels listed in extraction order]
      • Shards: 1, 2, 3, N
      • Search View: Documents, Users, Logs
      • Document Store
      • Analytic Services: View into numeric/historic data
      • Classification, Recommendation
      • Personalization & Machine Learning Services: Classification Models; In memory; Replicated; Multi-tenant
      • Discovery & Enrichment: Clustering, classification, NLP, topic identification, search log analysis, user behavior
      • Content Acquisition: ETL, batch or near real-time
      • Access APIs
      • Data: LucidWorks Search connectors; Push
  • 18. MapR
      • MapR provides the technology leading Hadoop distribution
          – full eco-system distribution
          – integrated data platform
          – complete solution for data integrity
      • MapR clusters also provide tight integration with search technologies like LucidWorks
          – integration is key for effective ops
  • 19. LucidWorks
      • LucidWorks provides the leading packaging of Apache Lucene and Solr
          – build your own, we support
          – founded by the most prominent Lucene/Solr experts
      • LucidWorks Search
          – “Solr++”
              • UI, REST API, MapR connectors, relevance tools, much more
      • LucidWorks Big Data
          – Big Data as a Service
          – Integrated LucidWorks Search, Hadoop, machine learning with prebuilt workflows for many of these tasks
  • 20. LucidWorks Big Data
      • [Diagram; labels listed in extraction order: Inputs; API; Mgmt; Search, Discovery, Analytics; Processing & Storage; Analytics Service; Document Service; Big Data; LucidWorks; Web; HDFS; Admin Service; Mgmt; Data Mgmt; Provisioning, Monitoring & Configuration]
  • 21. Easy Wins
      • Analyze logs from application stored in MapR
      • Seamlessly store search indexes in MapR
          – and feed to Pig, Mahout and others
          – use mirrors + NFS to directly deploy indexes
      • Snapshots make backups a snap
          – A lot like version control for data
      • LucidWorks 2.5.2 easily connects with MapR
  • 22. 1 + 1 = 3
  • 23. Learn More
      • Talk to Ted (we are hiring)
          – @ted_dunning
          – tdunning@maprtech.com
      • Talk to Ivan (we are hiring)
          – @iprovalov
      • MapR and Lucid Works
          – http://www.mapr.com
          – http://www.lucidworks.com
      • Hash Tags
          – #mapr #hadoopsummit #lucidworks
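
The Search (R)evolution slide says that scoring is just a sparse matrix multiply, and the Veoh slide describes using a search engine as a search engine for behavior. The slides do not show how either works, so the following is only a minimal sketch of one way to read those two claims together. It assumes items are indexed by behavior terms and that a user's recent behavior acts as the query. The item names, term names, weights, and helper functions are all hypothetical, and no Lucene/Solr API is used.

```python
from collections import defaultdict

# Hypothetical toy data: each item (document) maps behavior terms to weights.
# This is the sparse item-by-term matrix, stored row by row.
ITEM_INDICATORS = {
    "video_a": {"clicked:video_b": 1.0, "query:hadoop": 0.5},
    "video_b": {"clicked:video_a": 1.0, "query:search": 0.5},
    "video_c": {"query:hadoop": 1.0, "query:search": 1.0},
}


def build_inverted_index(items):
    """Transpose the sparse matrix into term -> {item: weight}."""
    index = defaultdict(dict)
    for item, terms in items.items():
        for term, weight in terms.items():
            index[term][item] = weight
    return index


def score(index, query):
    """Sparse matrix-vector product: only terms present in the query are visited."""
    scores = defaultdict(float)
    for term, q_weight in query.items():
        for item, d_weight in index.get(term, {}).items():
            scores[item] += q_weight * d_weight
    # Highest score first; ties broken by item name for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


INDEX = build_inverted_index(ITEM_INDICATORS)

# The user's recent behavior plays the role of the query vector.
USER_HISTORY = {"query:hadoop": 1.0, "clicked:video_a": 1.0}
RANKED = score(INDEX, USER_HISTORY)
print(RANKED)
```

In this reading, the inverted index is the transposed sparse matrix, and ranking a query only touches the columns named in the query, which is why a search engine can double as a recommender when the "documents" are items described by behavior.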