Big Data Use Cases


Everyone is awash in the new buzzword, Big Data, and it seems as if you can’t escape it wherever you go. But there are real companies with real use cases creating real value for their businesses by using big data. This talk will discuss some of the more compelling current or recent projects, their architectures and the systems they use, and their successful outcomes.

  • SCRIPT: You can see from the Word Count example that MapReduce is a low-level construct. Typical applications require more complex processing, which is accomplished by chaining multiple stages of MapReduce. Here is an example of a Hadoop system that uses machine learning models to detect account fraud after a security breach. (*) Each step is its own MapReduce program. We’ll return to this example in more detail later. [Don’t explain the algorithm here; just highlight the MapReduce stages.] (*) User transaction data is loaded into a distributed datastore for massive tables, such as HBase running on Hadoop, or the native tables available in MapR’s M7 distribution. (*) A training phase teaches the system what normal transactions look like. (*) Later, individual user transactions are scored against the “normal behavior” pattern. (*) Then, transactions with highly anomalous behavior are singled out as candidate events for analysts to review manually for potential fraud. In your data flow, any place you have a group-by, join, filter, or count-occurrences step typically equates to one or more MapReduce jobs.
  • MapR provides a complete distribution for Apache Hadoop. MapR has integrated, tested, and hardened a broad array of packages as part of this distribution, including Hive, Pig, Oozie, and Sqoop, plus additional packages such as Cascading. We have spent over two years in a well-funded effort to make deep architectural improvements and create the next-generation distribution for Hadoop. MapR has combined these significant updates with a dozen open source packages. All of the innovations MapR has delivered maintain 100% compatibility with the Apache Hadoop APIs. By contrast, the alternative distributions from Cloudera, Hortonworks, and Apache are all essentially equivalent to one another.

    1. Big Data Use Cases* (DevNexus Conference, 2/18/2013). *Fully buzzword-compliant title
    2. whoami • Brad Anderson • Solutions Architect at MapR (Atlanta) • ATLHUG co-chair • NoSQL East Conference 2009 • “boorad” most places (Twitter, GitHub) • banderson@maprtech.com
    3. Mobile, Virtualization, Social Media, B2B, Application Service Provider, Cloud, Client/Server, Web 2.0, Service Bureau, Software-as-a-Service
    4. BIG DATA
    5. (no slide text)
    6. Business Value
    7. Business Value
    8. Big Data is not new! But the tools are.
    9. Ship the Function to the Data. [Diagram: Traditional Architecture, where functions pull data from an RDBMS on SAN/NAS storage, versus Distributed Computing, where each function runs on the node that holds its data.]
    10. Variation: Multiple MapReduces. Example: Fraud Detection in User Transactions. [Diagram: transaction data stored in HBase / MapR M7 Edition passes through MapReduce stages (LDA training, LDA scoring, G2 score, 95%-ile, LDA anomaly) that produce candidate events for analyst review.] http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation
    11. MapR Distribution for Apache Hadoop • Complete Hadoop distribution • Comprehensive management suite • Industry-standard interfaces • Enterprise-grade dependability • Higher performance
    12. Big Data Ecosystem
    13. Use Case • Company • Data Source(s) • Technique(s) • Business Value
    14. Proactive Monitoring
    15. Data Sources • Server Telemetry • Monitoring Logs • Network Flow
    16. Techniques • Pattern Recognition • Proactive Monitoring • Early Alert Delivery
    17. Business Value
    18. Telecommunications Giant: ETL Offload
    19. Telecommunications Data Sources • Customer Records • Contract Data • Purchase Orders • Call Center
    20. Telecommunications Techniques: Analytics, ETL
    21. Telecommunications Techniques: ETL (Hadoop) + Analytics (Teradata)
    22. Telecommunications Business Value
    23. Credit Card Issuer Data Sources • Customer Purchase History • Merchant Designations • Merchant Special Offers
    24. Credit Card Issuer Techniques. [Diagram labels: Hadoop; Purchase History; Export (4 hrs); Merchant Information; Recommendation Engine Results (Mahout); Presentation Data Store (DB2); Import (4 hrs); Merchant Offers; Apps]
    25. Credit Card Issuer Techniques. [Diagram labels: Hadoop; Purchase History; Merchant Information; Recommendation Engine Results (Mahout); Index Update (2 min); Recommendation Search Index (Solr); Merchant Offers; Apps]
    26. Credit Card Issuer Business Value
    27. Waste & Recycling Leader: Idle Alerts
    28. Data Sources • Truck Geolocation Data (20,000 trucks, 5 sec interval) • Landfill Geographic Boundaries
    29. Techniques • Truck Geolocation Data → Realtime Stream Computation (Storm) → Immediate Alerts • Hadoop Storage → Batch Computation (MapReduce) → Tax Reduction Reporting • Shortest Path Graph Algorithm → Route Optimization
    30. Business Value
    31. Fraud Detection: Data Lake
    32. Data Sources • Anti-Money Laundering • Consumer Transactions
    33. Techniques • Anti-Money Laundering System • Consumer Transactions System
    34. Techniques • [Diagram: AML and Consumer Transactions feed a Data Lake (Hadoop) that surfaces Suspicious Events to an Analyst] • Latent Dirichlet Allocation • Bayesian Learning • Neural Network • Peer Group Analysis
    35. Business Value
    36. Machine Learning, Search Relevance, DNA Matching
    37. Data Sources • Birth, Death, Census, Military, Immigration records • Search Behavior Activity • DNA SNPs (“snips”)
    38. Techniques • Record Linking • Search Relevance • Clickstream Behavior • Security Forensics • DNA Matching
    39. Business Value
    40. Traffic Analytics
    41. Data Sources • Inrix Road Segment Data (average speed per minute per segment; reference speeds) • Road Segment Geolocation Data
    42. Techniques • Bottleneck Detection Algorithm (time offset correlations; alternate routes) • Predictive Congestion Analysis (growth and term assumptions)
    43. (no slide text)
    44. (no slide text)
    45. Business Value
    46. Similar Characteristics • Lots of Data • Structured, Semi-Structured, Unstructured • Varied Systems Interoperating (Hadoop, Storm, Solr, MPP, Visualizations) • Increase Revenue • Decrease Costs
    47. Thank You
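The speaker notes for slides 9 and 10 describe MapReduce through a Word Count example and observe that every group-by, join, filter, or count step becomes one or more MapReduce jobs. As a minimal single-process sketch (plain Python, not Hadoop code; all function names are hypothetical), the three phases of a word-count job look like this:

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """One full MapReduce pass: map, then shuffle, then reduce."""
    return reduce_phase(shuffle_phase(map_phase(lines)))
```

For example, `word_count(["the data", "the function"])` returns `{'the': 2, 'data': 1, 'function': 1}`. In Hadoop the shuffle is done by the framework across machines; a multi-stage pipeline such as the one on slide 10 chains jobs by feeding one job's reduce output into the next job's map input.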
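Slide 10's pipeline ends with a G2 score and a 95th-percentile cut-off. G2 is the log-likelihood ratio statistic for a contingency table; the sketch below computes it for a 2×2 table using the entropy formulation that Apache Mahout also uses, then flags scores strictly above the 95th percentile. The deck does not say how each table is built from the LDA output, so the reading of the counts (for example, a user's transactions matching a behavior pattern versus the population's) and every function name here are hypothetical.

```python
import math


def _x_log_x(x: float) -> float:
    """x * ln(x), with the convention 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def _entropy(*counts: float) -> float:
    """Unnormalized Shannon entropy of a set of counts (N times H)."""
    return _x_log_x(sum(counts)) - sum(_x_log_x(c) for c in counts)


def g2_score(k11: int, k12: int, k21: int, k22: int) -> float:
    """Log-likelihood ratio G2 for the 2x2 table [[k11, k12], [k21, k22]].

    0 when rows and columns are independent; grows as the counts diverge
    from independence.
    """
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    cells = _entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - cells))


def percentile(values: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of values <= it."""
    if not values:
        raise ValueError("percentile of an empty list")
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # ceil(pct * n / 100) in integer math
    return ordered[rank - 1]


def flag_anomalies(scores: dict[str, float], pct: int = 95) -> list[str]:
    """Return the ids whose score is strictly above the pct-th percentile."""
    threshold = percentile(list(scores.values()), pct)
    return sorted(key for key, score in scores.items() if score > threshold)
```

With 100 distinct scores, the 95th percentile is the 95th-smallest value, so `flag_anomalies` returns the top five ids, which then become candidate events for analyst review.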
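Slides 14 to 16 describe proactive monitoring: pattern recognition over server telemetry, monitoring logs, and network flow, with early alert delivery. The deck names no method. As a swapped-in illustration only, the sketch below keeps an exponentially weighted moving average (EWMA) and an exponentially weighted variance per metric and alerts when a new sample lies more than `k` standard deviations from that baseline. The smoothing factor, `k`, the warm-up length, and the class name are hypothetical.

```python
import math


class EwmaMonitor:
    """Flags samples that deviate sharply from an exponentially weighted baseline."""

    def __init__(self, alpha: float = 0.1, k: float = 4.0, warmup: int = 10):
        self.alpha = alpha    # weight given to the newest sample
        self.k = k            # alert when |x - mean| > k * std
        self.warmup = warmup  # samples to observe before any alert is allowed
        self.mean = None
        self.var = 0.0
        self.count = 0

    def update(self, x: float) -> bool:
        """Feed one sample; return True if it should raise an early alert."""
        self.count += 1
        if self.mean is None:
            self.mean = x  # first sample seeds the baseline
            return False
        deviation = x - self.mean
        std = math.sqrt(self.var)
        alert = self.count > self.warmup and std > 0 and abs(deviation) > self.k * std
        # Update the baseline after scoring, so an outlier cannot hide itself.
        self.mean += self.alpha * deviation
        self.var = (1 - self.alpha) * (self.var + self.alpha * deviation ** 2)
        return alert
```

One monitor instance per (server, metric) pair keeps the state small, which suits a streaming setting where alerts must go out as soon as a sample arrives.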
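Slides 18 to 21 describe offloading ETL into Hadoop ahead of analytics in Teradata, and the speaker notes point out that a join typically becomes a MapReduce job. The deck does not show the jobs themselves; the sketch below simulates a reduce-side join in one process: each mapper tags its records with their source, the shuffle groups everything by customer id, and the reducer emits one joined row per customer/call pair (an inner join). The record layouts and field names are hypothetical.

```python
from collections import defaultdict


def map_customers(rows):
    """Map: key customer records by id and tag them with their source."""
    for row in rows:
        yield row["customer_id"], ("customer", row)


def map_calls(rows):
    """Map: key call-center records by the same id, tagged differently."""
    for row in rows:
        yield row["customer_id"], ("call", row)


def reduce_side_join(*mapped_streams):
    """Shuffle tagged pairs by key, then inner-join customers with their calls."""
    groups = defaultdict(list)
    for stream in mapped_streams:
        for key, tagged in stream:
            groups[key].append(tagged)
    for key in sorted(groups):
        customers = [row for tag, row in groups[key] if tag == "customer"]
        calls = [row for tag, row in groups[key] if tag == "call"]
        for customer in customers:
            for call in calls:
                yield {"customer_id": key, "name": customer["name"], "reason": call["reason"]}
```

Keys present on only one side (a customer with no calls, or a call with an unknown id) produce no output, which is the inner-join behavior; an outer join would emit those rows with empty fields instead.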
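Slides 23 to 25 describe a Mahout recommendation engine fed by customer purchase history and merchant information, with results served first from DB2 and then from a Solr search index. The deck does not say which Mahout algorithm was used. As a stand-in, the sketch below ranks merchants a customer has not yet visited by plain co-occurrence counts (how often each merchant appears alongside the customer's merchants in other histories). The data shape and names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def cooccurrence(histories: dict[str, list[str]]) -> dict[str, Counter]:
    """Count, for each merchant, how often every other merchant shares a history with it."""
    co: dict[str, Counter] = defaultdict(Counter)
    for merchants in histories.values():
        for a, b in permutations(sorted(set(merchants)), 2):
            co[a][b] += 1
    return co


def recommend(user: str, histories: dict[str, list[str]], co: dict[str, Counter], k: int = 3) -> list[str]:
    """Top-k unseen merchants, scored by summed co-occurrence with the user's merchants."""
    seen = set(histories[user])
    scores: Counter = Counter()
    for merchant in seen:
        for other, count in co.get(merchant, Counter()).items():
            if other not in seen:
                scores[other] += count
    # Highest score first; ties broken alphabetically for a stable result.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [merchant for merchant, _ in ranked[:k]]
```

The per-merchant co-occurrence rows are computed in batch; serving them from a fast lookup store rather than reloading a database is the kind of change the move from slide 24 (4-hour export and import) to slide 25 (2-minute index update) suggests, though the slides do not show the data layout.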
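Slides 27 to 29 describe idle alerts computed in real time (Storm) from truck geolocation data, with landfill boundaries as a second source. At 20,000 trucks reporting every 5 seconds, the stream carries about 4,000 readings per second, or 20,000 × 17,280 = 345,600,000 readings per day. The deck does not give the alert rule, so the sketch below assumes one: alert once when a truck stays within a small radius for at least a threshold time while outside every landfill polygon. The threshold, radius, planar-coordinate simplification, and all names are hypothetical; in a Storm topology this per-truck state would live inside a bolt, which is not shown.

```python
import math


def point_in_polygon(point: tuple[float, float], polygon: list[tuple[float, float]]) -> bool:
    """Ray casting: count polygon edges crossed by a ray from the point toward +x."""
    x, y = point
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # this edge straddles the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


class IdleDetector:
    """Tracks each truck's anchor position and emits at most one alert per idle episode."""

    def __init__(self, landfills, idle_seconds: int = 300, radius: float = 0.0005):
        self.landfills = landfills  # list of polygons, each a list of (x, y) vertices
        self.idle_seconds = idle_seconds
        self.radius = radius  # hypothetical: movement below this counts as idle
        self.state: dict[str, tuple[tuple[float, float], int, bool]] = {}

    def observe(self, truck_id: str, ts: int, pos: tuple[float, float]):
        """Process one reading; return (truck_id, idle_since, ts) when an alert fires, else None."""
        prev = self.state.get(truck_id)
        if prev is None or math.dist(prev[0], pos) > self.radius:
            # First reading, or the truck moved: start a new idle episode here.
            self.state[truck_id] = (pos, ts, False)
            return None
        anchor, since, alerted = prev
        if alerted or ts - since < self.idle_seconds:
            return None
        if any(point_in_polygon(pos, poly) for poly in self.landfills):
            return None  # idling inside a landfill boundary is not alerted
        self.state[truck_id] = (anchor, since, True)
        return (truck_id, since, ts)
```

Because each reading touches only one truck's state, the work partitions cleanly by truck id, which is what lets a stream processor spread 4,000 readings per second across many workers.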
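Slide 29 lists a shortest-path graph algorithm for route optimization without naming it. Dijkstra's algorithm is one standard choice for road networks with non-negative edge weights; the sketch below uses that swapped-in technique on a hypothetical graph whose weights could be drive times in minutes.

```python
import heapq


def shortest_path(graph: dict[str, dict[str, float]], source: str, target: str):
    """Dijkstra's algorithm; returns (total_cost, path), or (inf, []) if unreachable."""
    dist = {source: 0.0}
    prev: dict[str, str] = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale entry left behind by a later, cheaper push
        visited.add(node)
        if node == target:
            break
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    if target not in dist:
        return float("inf"), []
    # Walk predecessor links back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]
```

For example, with `graph = {"depot": {"A": 4, "B": 1}, "B": {"A": 2, "landfill": 5}, "A": {"landfill": 1}}`, the call `shortest_path(graph, "depot", "landfill")` returns `(4.0, ["depot", "B", "A", "landfill"])`: the detour through B beats both direct alternatives.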
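Slide 34 names peer group analysis among the anti-money-laundering techniques. One common form compares an account's behavior with that of similar accounts. The sketch below assumes a single numeric metric per account (for example, monthly transfer volume) and flags accounts whose z-score against the other members of their peer group exceeds a hypothetical cutoff of 3. The grouping key, metric, cutoff, and names are illustrative, not from the deck.

```python
import statistics
from collections import defaultdict


def peer_group_outliers(accounts, cutoff: float = 3.0):
    """accounts: iterable of (account_id, peer_group, value).

    Flags accounts whose value is more than `cutoff` standard deviations from
    the mean of the *other* members of their peer group (leave-one-out, so an
    outlier does not inflate its own baseline).
    """
    groups = defaultdict(list)
    for account_id, group, value in accounts:
        groups[group].append((account_id, value))
    flagged = []
    for members in groups.values():
        for account_id, value in members:
            peers = [v for other, v in members if other != account_id]
            if len(peers) < 2:
                continue  # too few peers to estimate a spread
            mean = statistics.fmean(peers)
            sd = statistics.stdev(peers)
            if sd == 0:
                continue
            z = (value - mean) / sd
            if abs(z) > cutoff:
                flagged.append((account_id, round(z, 1)))
    return flagged
```

Flagged accounts would correspond to the "Suspicious Events" an analyst reviews on slide 34; the score explains itself ("many standard deviations above similar accounts"), which helps when an analyst must justify a case.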
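Slides 37 and 38 list record linking across birth, death, census, military, and immigration records. The deck gives no method; the sketch below assumes records with a name and a birth year, normalizes names into lowercase tokens, and links two records when their token Jaccard similarity and birth-year difference both pass hypothetical thresholds. The `Record` fields, thresholds, and function names are all illustrative.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    source: str  # hypothetical field, e.g. "census" or "military"
    name: str
    birth_year: int


def name_tokens(name: str) -> frozenset[str]:
    """Lowercase the name, drop punctuation and initials' dots, and split into tokens."""
    return frozenset(re.findall(r"[a-z]+", name.lower()))


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Size of the intersection over size of the union (0 for two empty sets)."""
    return len(a & b) / len(a | b) if a or b else 0.0


def link_records(left, right, min_similarity: float = 0.5, year_tolerance: int = 1):
    """Return (left, right) record pairs whose names and birth years are close enough."""
    links = []
    for a in left:
        for b in right:
            if abs(a.birth_year - b.birth_year) > year_tolerance:
                continue  # cheap filter before comparing names
            if jaccard(name_tokens(a.name), name_tokens(b.name)) >= min_similarity:
                links.append((a, b))
    return links
```

Token sets make "Smith, John" and "John A. Smith" comparable despite different ordering and punctuation. The nested loop is quadratic; at the scale of national record collections, a MapReduce job would usually group candidates by a blocking key (such as surname plus birth decade) first, an assumption about deployment rather than something the slides state.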
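Slides 41 and 42 describe bottleneck detection from per-minute average speeds and reference speeds for each road segment, using time offset correlations. The deck gives no formulas, so the sketch below assumes two simple pieces: a minute counts as congested when a segment's speed drops below a hypothetical 60% of its reference speed, and the time offset between two segments is the lag in minutes that maximizes the Pearson correlation of their speed series. All thresholds and names are hypothetical.

```python
import math


def congested_minutes(speeds: list[float], reference: float, ratio: float = 0.6) -> list[int]:
    """Indices of minutes whose average speed is below ratio * reference speed."""
    return [i for i, s in enumerate(speeds) if s < ratio * reference]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length series (0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def best_lag(a: list[float], b: list[float], max_lag: int) -> tuple[int, float]:
    """Lag k in [0, max_lag] maximizing corr(a[t], b[t + k]).

    A result of k means changes in segment a tend to appear in segment b
    k minutes later.
    """
    n = min(len(a), len(b))
    best = (0, pearson(a[:n], b[:n]))
    for k in range(1, max_lag + 1):
        if n - k < 3:
            break  # too few overlapping minutes for a meaningful correlation
        r = pearson(a[: n - k], b[k:n])
        if r > best[1]:
            best = (k, r)
    return best
```

For a speed series `a` and a copy `b` delayed by two minutes, `best_lag(a, b, 4)` picks a lag of 2, since at that offset the overlapping windows are identical.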
