Big Data Use Cases

Everyone is awash in the new buzzword, Big Data, and it seems as if you can’t escape it wherever you go. But there are real companies with real use cases creating real value for their businesses by using big data. This talk will discuss some of the more compelling current or recent projects, their architecture & systems used, and successful outcomes.


Usage Rights

© All Rights Reserved

  • SCRIPT: You can see from the Word Count example that MapReduce is a low-level construct. Typical applications require more complex processing, which is accomplished by running multiple stages of MapReduce. Here is an example of a Hadoop system that uses machine learning models to detect account fraud after a security breach. (*) Each step is its own MapReduce program. We’ll return to this example in more detail later.
    ---------------
    [DON’T do any explanation of the algorithm here. Just twinkle the MR stages.]
    (*) User transaction data is loaded into a distributed datastore for massive tables, such as HBase running on Hadoop, or the native tables available with MapR’s M7 distribution.
    (*) There’s a training phase, to teach the system what normal transactions look like.
    (*) Later, individual user transactions are scored against the “normal behavior” pattern.
    (*) Then, transactions with highly anomalous behavior are singled out as candidate events to be manually reviewed by analysts for potential fraud.
    In your data flow, any place you have a group-by, join, filter, or count-occurrences step typically equates to one or more MapReduce jobs.
  • MapR provides a complete distribution for Apache Hadoop. MapR has integrated, tested, and hardened a broad array of packages as part of this distribution, including Hive, Pig, Oozie, and Sqoop, plus additional packages such as Cascading. We have spent more than two years in a well-funded effort to make deep architectural improvements and create the next-generation distribution for Hadoop. MapR has combined significant updates of its own with a dozen open source packages. All of the innovations MapR has delivered keep 100% compatibility with the Apache Hadoop APIs. This is in stark contrast with the alternative distributions from Cloudera, Hortonworks, and Apache, which are all equivalent.
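
The remark in the speaker notes that each group-by, join, filter, or count step typically becomes its own MapReduce job can be sketched in plain Python. This is an in-memory simulation and not Hadoop code: the `map_reduce` helper, the sample transactions, and all mapper and reducer names are hypothetical, chosen only to show two chained stages where the output of one job is the input of the next.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Simulate one MapReduce stage in memory: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)      # shuffle: collect values per key
    output = []
    for key in sorted(groups):             # sorted only to make output stable
        output.extend(reducer(key, groups[key]))
    return output

# Hypothetical sample data: (user, merchant_category, amount)
transactions = [
    ("alice", "grocery", 30.0),
    ("alice", "grocery", 45.0),
    ("alice", "jewelry", 900.0),
    ("bob", "fuel", 40.0),
    ("bob", "fuel", 35.0),
]

# Stage 1: a "count occurrences" step, keyed by (user, category).
def count_mapper(txn):
    user, category, _amount = txn
    yield (user, category), 1

def count_reducer(key, values):
    yield key, sum(values)

# Stage 2: a "group-by user" step that keeps the most frequent category.
def by_user_mapper(pair):
    (user, category), count = pair
    yield user, (count, category)

def top_category_reducer(user, values):
    count, category = max(values)
    yield user, category, count

stage1 = map_reduce(transactions, count_mapper, count_reducer)
stage2 = map_reduce(stage1, by_user_mapper, top_category_reducer)

for row in stage2:
    print(row)
```

On a real cluster each stage would be a separate job reading the previous job's output from distributed storage; the sketch only shows how the stages chain.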

Big Data Use Cases Presentation Transcript

  • Big Data Use Cases* DevNexus Conference 2/18/2013 *Fully buzzword-compliant title 1
  • whoami • Brad Anderson • Solutions Architect at MapR (Atlanta) • ATLHUG co-chair • NoSQL East Conference 2009 • “boorad” most places (twitter, github) • banderson@maprtech.com 2
  • Mobile, Virtualization, Social Media, B2B, Application Service Provider, Cloud, Client/Server, Web 2.0, Service Bureau, Software-as-a-Service 3
  • BIG DATA 4
  • 5
  • Business Value 6
  • Business Value 7
  • Big Data is not new! But the tools are. 8
  • Ship the Function to the Data [diagram with two panels, Traditional Architecture and Distributed Computing; labels: function, data, RDBMS, SAN/NAS] 9
  • Variation: Multiple MapReduces. Example: Fraud Detection in User Transactions [diagram labels: MapReduce; Transaction data; LDA training; LDA scoring; G2 score; 95 %-ile; LDA anomaly; HBase / MapR M7 Edition; Candidate events for analyst review] http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation 10
  • MapR Distribution for Apache Hadoop: Complete Hadoop distribution; Comprehensive management suite; Industry-standard interfaces; Enterprise-grade dependability; Higher performance 11
  • Big Data Ecosystem 12
  • Use Case: Company; Data Source(s); Technique(s); Business Value 13
  • Proactive Monitoring 14
  • Data Sources: Server Telemetry; Monitoring Logs; Network Flow 15
  • Techniques: Pattern Recognition; Proactive Monitoring; Early Alert Delivery 16
  • Business Value 17
  • Telecommunications Giant: ETL Offload 18
  • Telecommunications Data Sources: Customer Records; Contract Data; Purchase Orders; Call Center 19
  • Telecommunications Techniques: Analytics; ETL 20
  • Telecommunications Techniques: ETL (Hadoop) + Analytics (Teradata) 21
  • Telecommunications Business Value 22
  • Credit Card Issuer Data Sources: Customer Purchase History; Merchant Designations; Merchant Special Offers 23
  • Credit Card Issuer Techniques [diagram labels: Hadoop; Purchase History; Merchant Information; Merchant Offers; Recommendation Engine (Mahout); Results; Export (4 hrs); Import (4 hrs); Presentation Data Store (DB2); App] 24
  • Credit Card Issuer Techniques [diagram labels: Hadoop; Purchase History; Merchant Information; Merchant Offers; Recommendation Engine (Mahout); Results; Index Update (2 min); Recommendation Search Index (Solr); App] 25
  • Credit Card Issuer Business Value 26
  • Waste & Recycling Leader: Idle Alerts 27
  • Data Sources: Truck Geolocation Data (20,000 trucks, 5 sec interval); Landfill Geographic Boundaries 28
  • Techniques [diagram labels: Truck Geolocation Data; Realtime Stream Computation (Storm); Immediate Alerts; Hadoop Storage; Batch Computation (MapReduce); Tax Reduction Reporting; Shortest Path Graph Algorithm; Route Optimization] 29
  • Business Value 30
  • Fraud Detection: Data Lake 31
  • Data Sources: Anti-Money Laundering; Consumer Transactions 32
  • Techniques: Anti-Money Laundering System; Consumer Transactions System 33
  • Techniques [diagram labels: AML; Consumer Transactions; Data Lake (Hadoop); Suspicious Events; Analyst; Latent Dirichlet Allocation, Bayesian Learning, Neural Network, Peer Group Analysis] 34
  • Business Value 35
  • Machine Learning: Search Relevance, DNA Matching 36
  • Data Sources: Birth, Death, Census, Military, Immigration records; Search Behavior Activity; DNA SNP (snips) 37
  • Techniques: Record Linking; Search Relevance; Clickstream Behavior; Security Forensics; DNA Matching 38
  • Business Value 39
  • Traffic Analytics 40
  • Data Sources: Inrix Road Segment Data (Avg Speed / minute / segment, Reference Speeds); Road Segment Geolocation Data 41
  • Techniques: Bottleneck Detection Algorithm; Time Offset Correlations; Alternate Routes; Predictive Congestion Analysis; Growth & Term Assumptions 42
  • 43
  • 44
  • Business Value 45
  • Similar Characteristics: Lots of Data (Structured, Semi-Structured, Unstructured); Varied Systems Interoperating (Hadoop, Storm, Solr, MPP, Visualizations); Increase Revenue; Decrease Costs 46
  • Thank You 47
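
To give a feel for the scale behind the Waste & Recycling use case (slide 28: 20,000 trucks reporting every 5 seconds), the event rate can be worked out with a few lines of Python. This is back-of-the-envelope arithmetic only, and it assumes every truck reports continuously for a full 24 hours, which the slides do not state.

```python
# Figures from slide 28 of the transcript.
TRUCKS = 20_000
REPORT_INTERVAL_S = 5

# Assumption (not in the slides): all trucks report around the clock.
SECONDS_PER_DAY = 24 * 60 * 60

events_per_second = TRUCKS / REPORT_INTERVAL_S
events_per_day = events_per_second * SECONDS_PER_DAY

print(f"{events_per_second:,.0f} events/s, {events_per_day:,.0f} events/day")
# prints "4,000 events/s, 345,600,000 events/day"
```

A steady stream of this size is consistent with the split shown on slide 29, where a stream processor handles immediate alerts and batch jobs handle reporting over stored data.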