Crowd Sourcing ReflectedIntelligence Using Search and BigDataTed DunningIvan Provalov©MapR Technologies - Confidential   1
Ted’s Background     Academia, Startups       –   Aptex, MusicMatch, ID Analytics, Veoh       –   Big data since before b...
Ivan’s Background     Sr. Research Engineer at LucidWorks     NLP, IR     Agile development©MapR Technologies - Confide...
Agenda     Intro     Search Evolution and Search Revolution     Reflected Intelligence Use Cases     Building a Next G...
User Interactions With Big Data                                                    Command    System                      ...
Is Search Enough?    • Keyword search is a commodity    • Holistic view of the data and the      user interactions with th...
User Interactions With Big Data                                                    Command        System                  ...
Search (R)evolution     Search use leads to search abuse       –   denormalization frees your mind       –   scoring is j...
Search, Discovery and Analytics     Large-scale analysis is key to reflected intelligence       –   correlation analysis ...
Social Media Analysis in Telecom     Correlate mobile traffic analysis with social media analysis       –   events cause ...
Provenance is 80% of Value     Analysis of social media to determine advertising reach and      response     In one case...
Claims Analysis     Goal       –   Insurance claims processing and analysis       –   fraud analysis     Method       – ...
Virginia Tech - Help the World     Grab data around crisis     Search immediately     Large-scale analysis enriches dat...
Bright Planet - Catch the Bad Guys     Online Drug Counterfeit detection     Identify commonly used language indicating ...
Veoh - Cross Recommendations     Cross recommendation as search       –   with search used to build cross recommendation!...
What Platform Do You Need?     Fast, efficient, scalable search       –   bulk and near real-time indexing       –   hand...
Reference Architecture                                                                      Access APIs                   ...
MapR     MapR provides the technology leading Hadoop distribution       –   full eco-system distribution       –   integr...
LucidWorks     LucidWorks provides the leading packaging of Apache Lucene and      Solr       –   build your own, we supp...
LucidWorks Big Data                                                            API                        Big Data        ...
Easy Wins     Analyze logs from application stored in MapR     Seamlessly store search indexes in MapR       –   and fee...
1+1=3©MapR Technologies - Confidential     22
Learn More     Talk to Ted (we are hiring)       @ted_dunning       tdunning@maprtech.com     Talk to Ivan (we are hirin...
Upcoming SlideShare
Loading in …5
×

Crowd-Sourced Intelligence Built into Search over Hadoop

1,361 views
1,237 views

Published on

Search is increasingly being used to gather intelligence on multi-structured data leveraging distributed platforms such as Hadoop in the background. This session will provide details on how search engines can be abused to use not text, but mathematically derived tokens to build models that implement reflected intelligence. The session will describe how to integrate Apache Solr/Lucene with Hadoop. Then we will show how crowd-sourced search behavior can be looped back into analysis and how constantly self-correcting models can be created and deployed. Finally, we will show how these models can respond with intelligent behavior in realtime.

Published in: Technology
0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
1,361
On SlideShare
0
From Embeds
0
Number of Embeds
20
Actions
Shares
0
Downloads
0
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide
  • TED: We can tighten or loosen as necessary.
  • TED: I think that the agenda needs to go here because it otherwise breaks up some key flow
  • Search Abuse Can discuss how I started just doing free text, but then a curious thing happened, started to see people using the engine for things like: key/value, denormalized DBs, browsing engines, plagiarism detection, teaching languages, record linkage and much, much moreSearch has added more DB features over the yearsTED: We need to introduce the idea of *REVOLUTION* somewhere in here.
  • All that revolution is good, but what the heck does this have to do w/ Big Data?
  • GSI: needs a bit more meat
  • Service-Oriented ArchitectureStatelessFailover/Fault TolerantLightweight Coordination and MessagingSmart about UpdatesDocument store isDistributedScalableAnalysisBatchNear Real-Time
  • Crowd-Sourced Intelligence Built into Search over Hadoop

    1. 1. Crowd Sourcing ReflectedIntelligence Using Search and BigDataTed DunningIvan Provalov©MapR Technologies - Confidential 1
    2. 2. Ted’s Background Academia, Startups – Aptex, MusicMatch, ID Analytics, Veoh – Big data since before big Open source – since the dark ages before the internet – Mahout, Zookeeper, Drill – bought the beer at first HUG MapR – Chief Application Architect Founding member of Apache Drill©MapR Technologies - Confidential 2
    3. 3. Ivan’s Background Sr. Research Engineer at LucidWorks NLP, IR Agile development©MapR Technologies - Confidential 3
    4. 4. Agenda Intro Search Evolution and Search Revolution Reflected Intelligence Use Cases Building a Next Generation Search and Discovery Platform – MapR – LucidWorks 1+1=3©MapR Technologies - Confidential 4
    5. 5. User Interactions With Big Data Command System Data DFS Line Administrator Key Value Query Data Engineer Store Language Keyword Data Index Search End User©MapR Technologies - Confidential 5
    6. 6. Is Search Enough? • Keyword search is a commodity • Holistic view of the data and the user interactions with that data • Search, Discovery and Analytics are the key to unlocking this view of users and data new pope Search©MapR Technologies - Confidential 6
    7. 7. User Interactions With Big Data Command System Data DFS Line Administrator Key Value Query Data Engineer Store Language Keyword Search Data Index End User Reflected Intelligence©MapR Technologies - Confidential 7
    8. 8. Search (R)evolution Search use leads to search abuse – denormalization frees your mind – scoring is just a sparse matrix multiply Lucene/Solr evolution – non free text usages abound – many DB-like features – noSQL before NoSQL was cool – flexible indexing – finite State Transducers FTW! Scale “This ain’t your father’s relevance anymore”©MapR Technologies - Confidential 8
    9. 9. Search, Discovery and Analytics Large-scale analysis is key to reflected intelligence – correlation analysis • based on queries, clicks, mouse tracks, even explicit feedback • produce clusters, trends, topics, SIP’s Search – start with engineered knowledge, refine with user feedback Large-scale discovery features encourage experimentation Always test, always enrich! Analytics Discovery©MapR Technologies - Confidential 9
    10. 10. Social Media Analysis in Telecom Correlate mobile traffic analysis with social media analysis – events cause traffic micro-bursts – participants tweet the events ahead of time Deploy operations faster to predict outages and better handle emergency situations – high cost bandwidth augmentation can be marshaled as the traffic appears – anticipation beats reaction©MapR Technologies - Confidential 10
    11. 11. Provenance is 80% of Value Analysis of social media to determine advertising reach and response In one case the same untargeted advertising was worth 5x if sold with supporting data.©MapR Technologies - Confidential 11
    12. 12. Claims Analysis Goal – Insurance claims processing and analysis – fraud analysis Method – Combine free text search with metadata analysis to identify high risk activities across the country – Integrate with corporate workflows to detect and fix outliers in customer relations Results – Questions that took 24-48 hours now take seconds to answer©MapR Technologies - Confidential 12
    13. 13. Virginia Tech - Help the World Grab data around crisis Search immediately Large-scale analysis enriches data to find ways to improve responses and understanding http://www.ctrnet.net©MapR Technologies - Confidential 13
    14. 14. Bright Planet - Catch the Bad Guys Online Drug Counterfeit detection Identify commonly used language indicating counterfeits – you know it when you see it – and you know you have seen it Feed to analyst via search-driven application – enrich based on analysts feedback©MapR Technologies - Confidential 14
    15. 15. Veoh - Cross Recommendations Cross recommendation as search – with search used to build cross recommendation! Recommend content to people who exhibit certain behaviors (clicks, query terms, other) (Ab)use of a search engine – but not as a search engine for content – more like a search engine for behavior©MapR Technologies - Confidential 15
    16. 16. What Platform Do You Need? Fast, efficient, scalable search – bulk and near real-time indexing – handle billions of records with sub-second search and faceting Large scale, cost effective storage and processing capabilities NLP and machine learning tools that scale to enhance discovery and analysis Integrated log analysis workflows that close the loop between the raw data and user interactions©MapR Technologies - Confidential 16
    17. 17. Reference Architecture Access APIs View into Search View Analytic numeric/histor 1 Services ic data 2 Shards 3 N Personalization & Classification Machine Learning Recommendation Services Document •Documents Discovery & •Users Enrichment Store •Logs Clustering, classificat ion, NLP, topic identification, searc Classification Models h log analysis, user In memory behavior Content Acquisition Replicated ETL, batch or near Multi-tenant real-time Data • LucidWorks Search connectors • Push©MapR Technologies - Confidential 17
    18. 18. MapR MapR provides the technology leading Hadoop distribution – full eco-system distribution – integrated data platform – complete solution for data integrity MapR clusters also provide tight integration with search technologies like LucidWorks – integration is key for effective ops©MapR Technologies - Confidential 18
    19. 19. LucidWorks LucidWorks provides the leading packaging of Apache Lucene and Solr – build your own, we support – founded by the most prominent Lucene/Solr experts LucidWorks Search – “Solr++” • UI, REST API, MapR connectors, relevance tools, much more LucidWorks Big Data – Big Data as a Service – Integrated LucidWorks Search, Hadoop, machine learning with prebuilt workflows for many of these tasks©MapR Technologies - Confidential 19
    20. 20. LucidWorks Big Data API Big Data LucidWorks Web HDFS Inputs Search, Discovery, Analytics Mgmt Analytics Service Document Service Admin Service Processing & Storage Mgmt Data Mgmt Provisioning, Monitoring & Configuration ©MapR Technologies - Confidential 20
    21. 21. Easy Wins Analyze logs from application stored in MapR Seamlessly store search indexes in MapR – and feed to Pig, Mahout and others – use mirrors + NFS to directly deploy indexes Snapshots make backups a snap – A lot like version control for data LucidWorks 2.5.2 easily connects with MapR©MapR Technologies - Confidential 21
    22. 22. 1+1=3©MapR Technologies - Confidential 22
    23. 23. Learn More Talk to Ted (we are hiring) @ted_dunning tdunning@maprtech.com Talk to Ivan (we are hiring) @iprovalov MapR and Lucid Works http://www.mapr.com http://www.lucidworks.com Hash Tags #mapr #hadoopsummit #lucidworks©MapR Technologies - Confidential 23

    ×