Hadoop summit EU - Crowd Sourcing Reflected Intelligence


Published on

A live replay of the webinar that Grant and I gave last fall, this time at Hadoop Summit EU.

Published in: Technology
  • Be the first to comment

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide
  • TED: We can tighten or loosen as necessary.
  • TED: I think that the agenda needs to go here because it otherwise breaks up some key flow
  • Search Abuse Can discuss how I started just doing free text, but then a curious thing happened, started to see people using the engine for things like: key/value, denormalized DBs, browsing engines, plagiarism detection, teaching languages, record linkage and much, much moreSearch has added more DB features over the yearsTED: We need to introduce the idea of *REVOLUTION* somewhere in here.
  • All that revolution is good, but what the heck does this have to do w/ Big Data?
  • GSI: needs a bit more meat
  • Service-Oriented ArchitectureStatelessFailover/Fault TolerantLightweight Coordination and MessagingSmart about UpdatesDocument store isDistributedScalableAnalysisBatchNear Real-Time
  • Hadoop summit EU - Crowd Sourcing Reflected Intelligence

    1. 1. Crowd Sourcing ReflectedIntelligence Using Search and BigDataTed DunningIvan Provalov©MapR Technologies - Confidential 1
    2. 2. Ted’s Background Academia, Startups – Aptex, MusicMatch, ID Analytics, Veoh – Big data since before big Open source – since the dark ages before the internet – Mahout, Zookeeper, Drill – bought the beer at first HUG MapR – Chief Application Architect Founding member of Apache Drill©MapR Technologies - Confidential 2
    3. 3. Ivan’s Background Sr. Research Engineer at LucidWorks NLP, IR Agile development©MapR Technologies - Confidential 3
    4. 4. Agenda Intro Search Evolution and Search Revolution Reflected Intelligence Use Cases Building a Next Generation Search and Discovery Platform – MapR – LucidWorks 1+1=3©MapR Technologies - Confidential 4
    5. 5. User Interactions With Big Data Command System Data DFS Line Administrator Key Value Query Data Engineer Store Language Keyword Data Index Search End User©MapR Technologies - Confidential 5
    6. 6. Is Search Enough? • Keyword search is a commodity • Holistic view of the data and the user interactions with that data • Search, Discovery and Analytics are the key to unlocking this view of users and data new pope Search©MapR Technologies - Confidential 6
    7. 7. User Interactions With Big Data Command System Data DFS Line Administrator Key Value Query Data Engineer Store Language Keyword Search Data Index End User Reflected Intelligence©MapR Technologies - Confidential 7
    8. 8. Search (R)evolution Search use leads to search abuse – denormalization frees your mind – scoring is just a sparse matrix multiply Lucene/Solr evolution – non free text usages abound – many DB-like features – noSQL before NoSQL was cool – flexible indexing – finite State Transducers FTW! Scale “This ain’t your father’s relevance anymore”©MapR Technologies - Confidential 8
    9. 9. Search, Discovery and Analytics Large-scale analysis is key to reflected intelligence – correlation analysis • based on queries, clicks, mouse tracks, even explicit feedback • produce clusters, trends, topics, SIP’s Search – start with engineered knowledge, refine with user feedback Large-scale discovery features encourage experimentation Always test, always enrich! Analytics Discovery©MapR Technologies - Confidential 9
    10. 10. Social Media Analysis in Telecom Correlate mobile traffic analysis with social media analysis – events cause traffic micro-bursts – participants tweet the events ahead of time Deploy operations faster to predict outages and better handle emergency situations – high cost bandwidth augmentation can be marshaled as the traffic appears – anticipation beats reaction©MapR Technologies - Confidential 10
    11. 11. Provenance is 80% of Value Analysis of social media to determine advertising reach and response In one case the same untargeted advertising was worth 5x if sold with supporting data.©MapR Technologies - Confidential 11
    12. 12. Claims Analysis Goal – Insurance claims processing and analysis – fraud analysis Method – Combine free text search with metadata analysis to identify high risk activities across the country – Integrate with corporate workflows to detect and fix outliers in customer relations Results – Questions that took 24-48 hours now take seconds to answer©MapR Technologies - Confidential 12
    13. 13. Virginia Tech - Help the World Grab data around crisis Search immediately Large-scale analysis enriches data to find ways to improve responses and understanding http://www.ctrnet.net©MapR Technologies - Confidential 13
    14. 14. Bright Planet - Catch the Bad Guys Online Drug Counterfeit detection Identify commonly used language indicating counterfeits – you know it when you see it – and you know you have seen it Feed to analyst via search-driven application – enrich based on analysts feedback©MapR Technologies - Confidential 14
    15. 15. Veoh - Cross Recommendations Cross recommendation as search – with search used to build cross recommendation! Recommend content to people who exhibit certain behaviors (clicks, query terms, other) (Ab)use of a search engine – but not as a search engine for content – more like a search engine for behavior©MapR Technologies - Confidential 15
    16. 16. What Platform Do You Need? Fast, efficient, scalable search – bulk and near real-time indexing – handle billions of records with sub-second search and faceting Large scale, cost effective storage and processing capabilities NLP and machine learning tools that scale to enhance discovery and analysis Integrated log analysis workflows that close the loop between the raw data and user interactions©MapR Technologies - Confidential 16
    17. 17. Reference Architecture Access APIs View into Search View Analytic numeric/histor 1 Services ic data 2 Shards 3 N Personalization & Classification Machine Learning Recommendation Services Document •Documents Discovery & •Users Enrichment Store •Logs Clustering, classification, NLP, topic identification, Classification Models search log analysis, In memory user behavior Content Acquisition Replicated ETL, batch or near Multi-tenant real-time Data • LucidWorks Search connectors • Push©MapR Technologies - Confidential 17
    18. 18. MapR MapR provides the technology leading Hadoop distribution – full eco-system distribution – integrated data platform – complete solution for data integrity MapR clusters also provide tight integration with search technologies like LucidWorks – integration is key for effective ops©MapR Technologies - Confidential 18
    19. 19. LucidWorks LucidWorks provides the leading packaging of Apache Lucene and Solr – build your own, we support – founded by the most prominent Lucene/Solr experts LucidWorks Search – “Solr++” • UI, REST API, MapR connectors, relevance tools, much more LucidWorks Big Data – Big Data as a Service – Integrated LucidWorks Search, Hadoop, machine learning with prebuilt workflows for many of these tasks©MapR Technologies - Confidential 19
    20. 20. LucidWorks Big Data API Big Data LucidWorks Web HDFS Inputs Search, Discovery, Analytics Mgmt Analytics Service Document Service Admin Service Processing & Storage Mgmt Data Mgmt Provisioning, Monitoring & Configuration ©MapR Technologies - Confidential 20
    21. 21. Easy Wins Analyze logs from application stored in MapR Seamlessly store search indexes in MapR – and feed to Pig, Mahout and others – use mirrors + NFS to directly deploy indexes Snapshots make backups a snap – A lot like version control for data LucidWorks 2.5.2 easily connects with MapR©MapR Technologies - Confidential 21
    22. 22. 1+1=3©MapR Technologies - Confidential 22
    23. 23. Learn More Talk to Ted (we are hiring) @ted_dunning tdunning@maprtech.com Talk to Ivan (we are hiring) @iprovalov MapR and Lucid Works http://www.mapr.com http://www.lucidworks.com Hash Tags #mapr #hadoopsummit #lucidworks©MapR Technologies - Confidential 23