Hadoop Summit EU - Crowd Sourcing Reflected Intelligence


A live replay of the webinar that Grant and I gave last fall, this time at Hadoop Summit EU.

  • TED: We can tighten or loosen as necessary.
  • TED: I think the agenda needs to go here, because otherwise it breaks up some of the key flow.
  • Search abuse: Can discuss how I started out doing just free text, but then a curious thing happened: I started to see people using the engine for things like key/value lookup, denormalized DBs, browsing engines, plagiarism detection, teaching languages, record linkage and much, much more. Search has added more DB features over the years. TED: We need to introduce the idea of *REVOLUTION* somewhere in here.
  • All that revolution is good, but what the heck does this have to do w/ Big Data?
  • GSI: needs a bit more meat
  • Service-oriented architecture: stateless; failover/fault tolerant; lightweight coordination and messaging; smart about updates. Document store is distributed and scalable. Analysis: batch and near real-time.

    1. Crowd Sourcing Reflected Intelligence Using Search and Big Data
       Ted Dunning, Ivan Provalov
       ©MapR Technologies - Confidential
    2. Ted’s Background
       • Academia, startups
         – Aptex, MusicMatch, ID Analytics, Veoh
         – big data since before big
       • Open source
         – since the dark ages before the internet
         – Mahout, ZooKeeper, Drill
         – bought the beer at the first HUG
       • MapR
         – Chief Application Architect
       • Founding member of Apache Drill
    3. Ivan’s Background
       • Sr. Research Engineer at LucidWorks
       • NLP, IR
       • Agile development
    4. Agenda
       • Intro
       • Search Evolution and Search Revolution
       • Reflected Intelligence Use Cases
       • Building a Next-Generation Search and Discovery Platform
         – MapR
         – LucidWorks
       • 1 + 1 = 3
    5. User Interactions With Big Data
       [Diagram: data held in a DFS, a key-value store, and an index, reached via the command line, a query language, and keyword search by a system administrator, an engineer, and an end user, respectively]
    6. Is Search Enough?
       • Keyword search is a commodity
       • Holistic view of the data and of the user interactions with that data
       • Search, discovery and analytics together are the key to unlocking this view of users and data
       [Image: search box with the query "new pope"]
    7. User Interactions With Big Data
       [Diagram: the slide 5 diagram with an added "Reflected Intelligence" element]
    8. Search (R)evolution
       • Search use leads to search abuse
         – denormalization frees your mind
         – scoring is just a sparse matrix multiply
       • Lucene/Solr evolution
         – non-free-text usages abound
         – many DB-like features
         – NoSQL before NoSQL was cool
         – flexible indexing
         – finite state transducers FTW!
       • Scale
       • “This ain’t your father’s relevance anymore”
    9. Search, Discovery and Analytics
       • Large-scale analysis is key to reflected intelligence
         – correlation analysis
           • based on queries, clicks, mouse tracks, even explicit feedback
           • produces clusters, trends, topics, SIPs
         – start with engineered knowledge, refine with user feedback
       • Large-scale discovery features encourage experimentation
       • Always test, always enrich!
       [Diagram: Search, Discovery, Analytics]
    10. Social Media Analysis in Telecom
        • Correlate mobile traffic analysis with social media analysis
          – events cause traffic micro-bursts
          – participants tweet about the events ahead of time
        • Deploy operations faster to predict outages and better handle emergency situations
          – high-cost bandwidth augmentation can be marshaled as the traffic appears
          – anticipation beats reaction
    11. Provenance is 80% of Value
        • Analysis of social media to determine advertising reach and response
        • In one case, the same untargeted advertising was worth 5x when sold with supporting data
    12. Claims Analysis
        • Goal
          – insurance claims processing and analysis
          – fraud analysis
        • Method
          – combine free-text search with metadata analysis to identify high-risk activities across the country
          – integrate with corporate workflows to detect and fix outliers in customer relations
        • Results
          – questions that took 24–48 hours now take seconds to answer
    13. Virginia Tech - Help the World
        • Grab data around a crisis
        • Search immediately
        • Large-scale analysis enriches the data to find ways to improve responses and understanding
        • http://www.ctrnet.net
    14. Bright Planet - Catch the Bad Guys
        • Online drug counterfeit detection
        • Identify commonly used language indicating counterfeits
          – you know it when you see it
          – and you know you have seen it
        • Feed to analysts via a search-driven application
          – enrich based on analysts’ feedback
    15. Veoh - Cross Recommendations
        • Cross recommendation as search
          – with search used to build cross recommendation!
        • Recommend content to people who exhibit certain behaviors (clicks, query terms, other)
        • (Ab)use of a search engine
          – but not as a search engine for content
          – more like a search engine for behavior
    16. What Platform Do You Need?
        • Fast, efficient, scalable search
          – bulk and near-real-time indexing
          – handle billions of records with sub-second search and faceting
        • Large-scale, cost-effective storage and processing capabilities
        • NLP and machine learning tools that scale to enhance discovery and analysis
        • Integrated log analysis workflows that close the loop between the raw data and user interactions
    17. Reference Architecture
        [Diagram with the following components]
        • Search: shards 1, 2, 3, ..., N
        • View: documents, users, logs
        • Document store
        • Analytic services: view into numeric/historic data
        • Personalization & machine learning services: classification, recommendation; classification models (in memory, replicated, multi-tenant)
        • Discovery & enrichment: clustering, classification, NLP, topic identification, search log analysis, user behavior
        • Content acquisition: ETL, batch or near real-time
        • Access APIs
        • Data: LucidWorks Search connectors, push
    18. MapR
        • MapR provides the technology-leading Hadoop distribution
          – full ecosystem distribution
          – integrated data platform
          – complete solution for data integrity
        • MapR clusters also provide tight integration with search technologies like LucidWorks
          – integration is key for effective ops
    19. LucidWorks
        • LucidWorks provides the leading packaging of Apache Lucene and Solr
          – build your own; we support it
          – founded by the most prominent Lucene/Solr experts
        • LucidWorks Search
          – “Solr++”
            • UI, REST API, MapR connectors, relevance tools, much more
        • LucidWorks Big Data
          – Big Data as a Service
          – integrated LucidWorks Search, Hadoop, and machine learning, with prebuilt workflows for many of these tasks
    20. LucidWorks Big Data
        [Diagram: inputs; API; management; search, discovery, analytics; processing & storage; analytics service; document service; admin service; WebHDFS; data management; provisioning, monitoring & configuration]
    21. Easy Wins
        • Analyze application logs stored in MapR
        • Seamlessly store search indexes in MapR
          – and feed them to Pig, Mahout and others
          – use mirrors + NFS to deploy indexes directly
        • Snapshots make backups a snap
          – a lot like version control for data
        • LucidWorks 2.5.2 easily connects with MapR
    22. 1 + 1 = 3
    23. Learn More
        • Talk to Ted (we are hiring): @ted_dunning, tdunning@maprtech.com
        • Talk to Ivan (we are hiring): @iprovalov
        • MapR and LucidWorks: http://www.mapr.com, http://www.lucidworks.com
        • Hashtags: #mapr #hadoopsummit #lucidworks
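
The "Search (R)evolution" slide says scoring is just a sparse matrix multiply. A minimal sketch of that idea, assuming raw term-frequency weights and whitespace tokenization: the inverted index stores a sparse document-by-term matrix A column by column, the query is a sparse weight vector q, and the document scores are A·q. The function names are hypothetical, and Lucene's real scoring (TF-IDF or BM25, plus length norms) involves more factors than this.

```python
from collections import defaultdict

def build_index(docs):
    """Build term -> {doc_id: term frequency}.

    This is the sparse document-by-term matrix A stored column by
    column (an inverted index), so scoring only visits the columns
    for terms that appear in the query.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index

def score(index, query_weights):
    """Compute the sparse product A @ q; documents scoring 0 are omitted."""
    scores = defaultdict(float)
    for term, q_w in query_weights.items():
        # .get avoids inserting empty columns into the defaultdict
        for doc_id, a_w in index.get(term, {}).items():
            scores[doc_id] += a_w * q_w
    return dict(scores)

docs = {
    "d1": "new pope elected",
    "d2": "pope visits new york",
    "d3": "hadoop summit europe",
}
index = build_index(docs)
result = score(index, {"pope": 2.0, "york": 1.0})
ranked = sorted(result.items(), key=lambda kv: kv[1], reverse=True)
print(ranked)  # [('d2', 3.0), ('d1', 2.0)]
```

Only the query's non-zero terms are visited, which is why a sparse multiply over an inverted index stays fast even when A has billions of cells.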
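
The "Search, Discovery and Analytics" slide suggests starting with engineered knowledge and refining it with user feedback. One way to sketch that, assuming a log of (query, clicked document) pairs, is to add a damped click-count boost to each document's engineered score. The blending formula, the beta weight, and the function names are hypothetical illustrations, not the speakers' method.

```python
import math
from collections import Counter

def click_counts(click_log):
    """Count how often each (query, doc_id) pair was clicked."""
    return Counter((query.lower(), doc_id) for query, doc_id in click_log)

def rerank(query, base_scores, clicks, beta=0.5):
    """Blend engineered relevance scores with crowd-sourced click feedback.

    base_scores: {doc_id: score from the engineered ranking}
    The log1p damping (a few clicks matter, thousands don't swamp the
    base score) and the weight beta are illustrative assumptions.
    """
    q = query.lower()
    blended = {
        doc_id: s + beta * math.log1p(clicks[(q, doc_id)])
        for doc_id, s in base_scores.items()
    }
    return sorted(blended, key=blended.get, reverse=True)

base = {"d1": 2.0, "d2": 1.8}
events = [("new pope", "d2")] * 3 + [("New Pope", "d1")]
clicks = click_counts(events)
print(rerank("new pope", base, clicks))     # ['d2', 'd1']
print(rerank("new pope", base, Counter()))  # ['d1', 'd2']
```

With no feedback the engineered ranking stands; as clicks accumulate, the crowd's behavior is reflected back into the order.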
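
The Veoh slide describes cross recommendation as search over behavior. A minimal sketch, assuming per-user sets of query terms (one behavior) and viewed items (another): for each item, keep the query terms whose per-user co-occurrence with it is anomalously high, then match a user's recent terms against those indicators. The slide doesn't say how indicators are chosen; Dunning's log-likelihood ratio (LLR) test is swapped in here, and the threshold, data, and function names are hypothetical. In a real deployment the indicator sets would be a field in the search index and the final step would be an ordinary search query; a set-overlap count stands in for that here.

```python
import math
from collections import defaultdict

def x_log_x(x):
    return x * math.log(x) if x > 0 else 0.0

def entropy(*counts):
    """Unnormalized entropy: N log N - sum(k log k)."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))

def build_indicators(users, threshold=1.0):
    """users: {user_id: (set of query terms, set of items viewed)}.

    Returns {item: set of query terms} whose per-user co-occurrence
    with the item is higher than chance, as judged by LLR.
    """
    n = len(users)
    term_users, item_users = defaultdict(set), defaultdict(set)
    for user, (terms, items) in users.items():
        for t in terms:
            term_users[t].add(user)
        for i in items:
            item_users[i].add(user)
    indicators = defaultdict(set)
    for item, iu in item_users.items():
        for term, tu in term_users.items():
            k11 = len(iu & tu)         # users with both term and item
            k12 = len(tu - iu)         # term but not item
            k21 = len(iu - tu)         # item but not term
            k22 = n - k11 - k12 - k21  # neither
            # LLR also flags anti-correlation, so require k11 above its
            # expected count (k11 + k12) * (k11 + k21) / n.
            positive = k11 * n > (k11 + k12) * (k11 + k21)
            if positive and llr(k11, k12, k21, k22) > threshold:
                indicators[item].add(term)
    return dict(indicators)

def recommend(indicators, recent_terms):
    """Stand-in for a search over the indicator field: rank by overlap."""
    scores = {item: len(terms & recent_terms) for item, terms in indicators.items()}
    hits = [item for item, s in scores.items() if s > 0]
    return sorted(hits, key=lambda item: scores[item], reverse=True)

users = {
    "u1": ({"pope", "vatican"}, {"v_conclave"}),
    "u2": ({"pope"}, {"v_conclave"}),
    "u3": ({"hadoop"}, {"v_summit"}),
    "u4": ({"hadoop", "mapr"}, {"v_summit"}),
    "u5": ({"vatican"}, {"v_conclave"}),
    "u6": ({"mapr"}, {"v_summit"}),
}
indicators = build_indicators(users)
print(recommend(indicators, {"pope"}))  # ['v_conclave']
```

The search engine is used twice, as the slide says: offline analysis builds the indicators, and the online recommendation is just a query against them.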
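
The platform slide calls for sub-second search and faceting, and the claims-analysis slide combines free-text search with metadata. A toy sketch of both, assuming in-memory documents with a text field and flat metadata fields: a boolean AND keyword match, exact-match metadata filters, and value counts over one facet field for the matching set. The claim records and the function name are made up for illustration; a search engine such as Solr computes facet counts inside the index rather than by scanning documents like this.

```python
from collections import Counter

def search_with_facets(docs, query, facet_field, filters=None):
    """Boolean-AND keyword search plus metadata filters and one facet.

    docs: {doc_id: {"text": str, <metadata field>: value, ...}}
    Returns (matching doc ids, {facet value: count over the matches}).
    """
    terms = query.lower().split()
    filters = filters or {}
    hits = []
    for doc_id, doc in docs.items():
        words = set(doc["text"].lower().split())
        if all(t in words for t in terms) and all(
            doc.get(field) == value for field, value in filters.items()
        ):
            hits.append(doc_id)
    facet_counts = Counter(docs[d].get(facet_field) for d in hits)
    return hits, dict(facet_counts)

claims = {
    "c1": {"text": "water damage basement flood", "state": "TX", "type": "home"},
    "c2": {"text": "flood damage to vehicle", "state": "TX", "type": "auto"},
    "c3": {"text": "hail damage roof", "state": "CO", "type": "home"},
    "c4": {"text": "flood damage kitchen", "state": "LA", "type": "home"},
}
print(search_with_facets(claims, "flood damage", "state"))
# (['c1', 'c2', 'c4'], {'TX': 2, 'LA': 1})
print(search_with_facets(claims, "flood damage", "state", {"type": "home"}))
# (['c1', 'c4'], {'TX': 1, 'LA': 1})
```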
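
The reference architecture splits the search index into shards 1 through N. A minimal scatter-gather sketch, assuming each shard is an in-memory dict and a per-document score that doesn't depend on other documents: the query fans out to every shard in parallel, each shard returns its local top k, and the coordinator merges those into the global top k. The shard layout, the term-count scoring, and the function names are hypothetical.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def shard_top_k(shard, terms, k):
    """Score one shard and return its k best (score, doc_id) pairs.

    The score is a plain count of query-term occurrences, a stand-in
    for real relevance scoring.
    """
    scored = []
    for doc_id, text in shard.items():
        words = text.lower().split()
        s = sum(words.count(t) for t in terms)
        if s > 0:
            scored.append((s, doc_id))
    return heapq.nlargest(k, scored)

def distributed_search(shards, query, k=3):
    """Scatter the query to all shards in parallel, then gather and merge."""
    terms = query.lower().split()
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(lambda shard: shard_top_k(shard, terms, k), shards))
    # The global top k is always contained in the union of per-shard top k.
    merged = [hit for part in partials for hit in part]
    return heapq.nlargest(k, merged)

shards = [
    {"a1": "hadoop summit hadoop", "a2": "solr search"},
    {"b1": "hadoop mapr", "b2": "lucene solr solr"},
    {"c1": "search hadoop search"},
]
print(distributed_search(shards, "hadoop search"))
# [(3, 'c1'), (2, 'a1'), (1, 'b1')]
```

Merging per-shard top k is exact here only because each score depends on one document alone; scores built on corpus-wide statistics such as IDF can differ from shard to shard unless those statistics are shared.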