#5usecases




   5 Big Data Use Cases for 2013
             Tim Gasper
             Director of Product
             Infochimps, Inc.




   #1 enterprise cloud for big data
    some of our customers   our partners




               BI & Data Visualization

               Big Data Applications

               Reporting & Ad-hoc Business Questions




             poll




             poll




    81% of companies
    say Big Data is a top 5
    IT priority in 2013

            more data than ever
 CRM/customer support before
 POS/purchases
 ERP/accounting
 email/documents/collab.
 BI & data warehouse
 system & network logs
 web logs/clickstream
 google analytics/omniture
 facebook/twitter
 yelp/foursquare/google
 experian/epsilon/acxiom
 mobile devices
 sensors
 product reviews
 google search results
 + more
                            many terabytes of data,
                            sometimes many petabytes
                                                        ?




             BIG               DATA
             •   volume        •   scalable
             •   velocity      •   intelligent
             •   variety       •   agnostic
             •   variability   •   holistic




             poll




1




  risk analysis and fraud
  detection



    customer risk analysis
    comprehensive data picture
      •   build comprehensive data picture of customer-side
          risk
      •   publish a consolidated set of attributes for analysis
      •   add additional context, both internal and external
    parse and aggregate data from different sources
      •   credit and debit cards, product payments, deposits
          and savings
      •   banking activity, browsing behavior, call logs, e-mails
          and chats
    merge data into a single view
      •   a “fuzzy join” among data sources
      •   structure and normalize attributes
      •   sentiment analysis, pattern recognition
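A "fuzzy join" among data sources can be illustrated with a minimal Python sketch. It assumes two hypothetical record sets (card accounts and call-log contacts) that share only a customer name, a hypothetical similarity threshold of 0.85, and the standard library's difflib.SequenceMatcher as the matcher; a production pipeline would likely use a dedicated record-linkage tool and blocking keys rather than comparing every pair.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace."""
    kept = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def fuzzy_join(left, right, key="name", threshold=0.85):
    """Pair each left record with its most similar right record.

    Compares every pair (O(n*m)), keeps the best match per left record,
    and drops matches scoring below `threshold`. Left-hand fields win
    when both records share a field name.
    """
    joined = []
    for l_rec in left:
        best, best_score = None, 0.0
        for r_rec in right:
            score = similarity(l_rec[key], r_rec[key])
            if score > best_score:
                best, best_score = r_rec, score
        if best is not None and best_score >= threshold:
            joined.append({**best, **l_rec, "match_score": round(best_score, 3)})
    return joined


# Hypothetical sample records standing in for a card system and a call-log system.
cards = [
    {"name": "Jane Q. Doe", "card_spend": 1200},
    {"name": "Raj Patel", "card_spend": 300},
]
calls = [
    {"name": "jane q doe", "complaints": 2},
    {"name": "R. Patil", "complaints": 0},
]

for row in fuzzy_join(cards, calls):
    print(row)
```

Here "Jane Q. Doe" and "jane q doe" normalize to the same string and merge, while "Raj Patel" vs "R. Patil" scores 0.75 and is left unmatched; where to set the threshold is exactly the precision/recall trade-off a real fuzzy join has to tune.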



 surveillance & fraud detection
    activity records in a central repository
      •   centralized logging across all execution platforms
      •   structured and raw log data from multiple applications
    pattern recognition to detect anomalies/harmful behavior
      •   feature set and timeline vector are very dynamic
      •   “schema on read” provides flexibility for analysis
    data is primarily served and processed in HDFS with
    MapReduce
      •   data filtering and projection in Pig and Hive
      •   statistical modeling of data sets in R or SAS
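"Schema on read" means raw log lines are stored as they arrive and structure is applied only when a job reads them. The sketch below pairs that idea with a simple anomaly check. Assumptions: the two log formats (key=value lines and JSON lines), the field names, and the z-score threshold of 1.5 are all hypothetical, and plain Python stands in for the Pig/Hive/MapReduce jobs named on the slide.

```python
import json
import statistics
from collections import Counter


def parse(line: str) -> dict:
    """Apply structure at read time: accept JSON lines or key=value lines."""
    line = line.strip()
    if line.startswith("{"):
        return json.loads(line)
    return dict(part.split("=", 1) for part in line.split() if "=" in part)


def failed_logins(lines) -> Counter:
    """Count failed logins per user across heterogeneous log formats."""
    counts = Counter()
    for line in lines:
        rec = parse(line)
        # Fields a given source does not emit simply read as None.
        if rec.get("action") == "login" and rec.get("status") == "fail":
            counts[rec.get("user")] += 1
    return counts


def anomalies(counts: Counter, z_threshold: float = 1.5) -> list:
    """Flag users whose count sits more than z_threshold std devs above the mean."""
    values = list(counts.values())
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return sorted(user for user, n in counts.items() if (n - mean) / stdev > z_threshold)


# Hypothetical raw logs from two applications with different formats.
raw_logs = [
    "ts=2013-01-15T10:00 user=alice action=login status=fail",
    '{"user": "bob", "action": "login", "status": "fail", "ip": "10.0.0.7"}',
    "ts=2013-01-15T10:02 user=bob action=login status=fail",
    "ts=2013-01-15T10:03 user=carol action=login status=fail",
    '{"user": "dave", "action": "login", "status": "fail"}',
    "ts=2013-01-15T10:05 user=alice action=login status=ok",
] + ["ts=2013-01-15T10:06 user=mallory action=login status=fail"] * 8

print(anomalies(failed_logins(raw_logs)))
```

Because parsing happens in `parse` rather than at load time, a new source with extra fields (such as `ip` above) needs no upfront schema change, which is the flexibility the slide attributes to schema on read.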



regulatory
compliance




Source:
http://virtualization.sys-con.com/node/101598

              global investment bank
                     trade risk
    Trading Data; Customer Data
    -> ingest data
    -> search & legal discovery;
       intraday analysis & historical analysis
       (production reports + exploratory risk modeling)



  brand and sentiment
  analysis



    brand & sentiment analysis
    the internet generates a lot of chatter about brands
      •   understanding what’s said is key to protecting brand
          value
      •   facebook & twitter generate a flood of data for large
          brands
    capturing and processing direct feedback
      •   better engagement and alerting via sentiment analysis
      •   integration with other customer service systems
    hadoop handles the diverse data types and processing
      •   sources of data changing and semantics continuously
          evolving
      •   sophistication of algorithms is iteratively improving
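Sentiment-based alerting can be illustrated with a deliberately simple lexicon approach. This is only a sketch: the word lists, the 40% negative-share alert threshold, and the sample mentions are hypothetical, and production sentiment analysis would more likely use trained models, which is where the "iteratively improving" algorithms come in.

```python
import re

# Hypothetical toy lexicon; real systems use curated lexicons or trained models.
POSITIVE = {"love", "great", "awesome", "fast", "happy"}
NEGATIVE = {"hate", "broken", "slow", "terrible", "refund"}


def score(text: str) -> int:
    """Positive word count minus negative word count."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def label(text: str) -> str:
    s = score(text)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"


def should_alert(mentions, max_negative_share=0.4) -> bool:
    """Alert when the share of negative mentions in a batch exceeds the threshold."""
    labels = [label(m) for m in mentions]
    if not labels:
        return False
    return labels.count("negative") / len(labels) > max_negative_share


# Hypothetical batch of brand mentions pulled from social feeds.
mentions = [
    "I love the new phone, battery is great",
    "App is slow and broken since the update",
    "Terrible support, I want a refund",
    "Picked one up at the store today",
]
print([label(m) for m in mentions], should_alert(mentions))
```

Two of the four mentions are negative (50%), so this batch would trigger an alert that a customer service system could pick up.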

             large media conglomerate
    Social Media; News, Blogs, etc.; Traditional Media
    -> ingest data
    -> real-time sentiment, influence, gender,
       topic extraction, etc.
    -> search & application; trend analysis




  customer insights/behavior



    customer churn analysis
    understanding customer behavior and preferences
      •   rapidly test and build behavioral model of customer
      •   combine disparate data sources (transactional, social,
          etc.)
    structure and analyze with Hadoop
      •   traversing usage and social graphs
      •   pattern identification and recognition to find indicators
    feature extraction to find root causes
      •   defining attributes and modeling statistical
          significance
      •   combinations and sequence of attributes + actions
          factor in
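One way to check whether an extracted attribute is a statistically significant churn indicator is a two-proportion z-test comparing churn rates with and without the attribute. The sketch assumes a hypothetical boolean attribute (`many_dropped_calls`) and synthetic counts. It is one standard test, not necessarily the modeling approach used in the engagements described here.

```python
import math


def churn_significance(customers, attribute):
    """Two-proportion z-test: does churn differ with vs. without `attribute`?

    Assumes both groups are non-empty.
    Returns (churn_rate_with, churn_rate_without, z, two_sided_p).
    """
    with_attr = [c["churned"] for c in customers if c[attribute]]
    without = [c["churned"] for c in customers if not c[attribute]]
    n1, n2 = len(with_attr), len(without)
    p1, p2 = sum(with_attr) / n1, sum(without) / n2
    pooled = (sum(with_attr) + sum(without)) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p1, p2, z, p_value


# Synthetic data: 100 customers with the (hypothetical) attribute, 400 without.
# List repetition reuses the same dict objects, which is fine for read-only use.
customers = (
    [{"many_dropped_calls": True, "churned": 1}] * 40
    + [{"many_dropped_calls": True, "churned": 0}] * 60
    + [{"many_dropped_calls": False, "churned": 1}] * 40
    + [{"many_dropped_calls": False, "churned": 0}] * 360
)

rate_with, rate_without, z, p = churn_significance(customers, "many_dropped_calls")
print(f"churn with: {rate_with:.2f}, without: {rate_without:.2f}, z={z:.2f}, p={p:.2g}")
```

Repeating this test across many candidate attributes calls for a multiple-comparison correction; sequences of attributes and actions would need richer features than a single boolean.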



    customer loyalty
    comparison shopping is making retail hyper-competitive
      •   discount programs, e-mail correspondence entice
          shoppers
      •   brand loyalty means attention to detail and service
    customer lifecycle is more than purchases
      •   browsing and online data used to capture customer
          attention
      •   loyalty programs bridge the gap between purchases
    reach into online channels
      •   online engagement is personalized just as in store
      •   connecting online and in store shows customer
          awareness



                     customer segmentation
    Demographics, Geography, Web Data, etc.; Point Of Sale Purchase Data
    -> ingest data
    -> customer insight reports; shopping pattern recognition
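Shopping-pattern segmentation is often done with clustering. The sketch below uses plain k-means as one common choice (the slide does not name an algorithm). The two features (purchases per month and average basket size in hundreds of dollars), the six shoppers, and the starting centroids are hypothetical.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means with caller-supplied starting centroids.

    Each iteration assigns every point to its nearest centroid (Euclidean
    distance) and moves each centroid to the mean of its members. An empty
    cluster keeps its previous centroid.
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    labels = [
        min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i])) for p in points
    ]
    return centroids, labels


# Hypothetical shoppers: (purchases per month, average basket in $100s).
shoppers = [
    (1.0, 0.5), (1.5, 0.4), (1.2, 0.6),  # occasional, small baskets
    (6.0, 3.0), (5.5, 2.8), (6.2, 3.3),  # frequent, large baskets
]
centroids, labels = kmeans(shoppers, [shoppers[0], shoppers[3]])
print(labels)
```

Real segmentation would scale features to comparable ranges first, since Euclidean distance is dominated by whichever feature has the largest spread.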



  targeted marketing and
  personalization



    targeted offers
    the checkout lane is everywhere
      •   cookies track users through ad impressions
      •   purchasing behavior is time sensitive
    logs collected online and offline
      •   data is ingested incrementally
      •   process happens at a variety of time scales
    data logged into HBase and primary store
      •   some events naturally associate, others require
          deeper analysis
      •   insights implemented via application logic



 recommendations & forecasting
    collect and serve personalization information
      •   wide variety of constantly changing data sources
      •   data guaranteed to be messy
    data ingestion includes collection of raw data
      •   filtering and fixing of poorly formatted data
      •   normalization and matching across data sources
    analysis looks for reliable attributes and groupings
      •   interpretation (e.g. gender by name)
      •   aggregation across likely matching identifiers
      •   identify possible predicted attributes or preferences
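The ingestion and analysis steps above (fix poorly formatted data, match across sources, interpret gender by name, aggregate across matching identifiers) can be sketched as follows. Assumptions: email is the matching identifier, the source names and fields are hypothetical, and the tiny name-to-gender table is a placeholder. Real name-based inference is probabilistic and often wrong.

```python
from collections import defaultdict

# Hypothetical first-name lookup. Real name-based gender inference is
# probabilistic and frequently wrong; treat the result as a weak guess.
NAME_GENDER = {"maria": "female", "james": "male"}


def normalize_email(raw):
    """Fix common formatting problems; return None if unusable."""
    if not raw:
        return None
    email = raw.strip().lower()
    return email if "@" in email else None


def consolidate(records):
    """Merge records from several sources into one profile per normalized email."""
    profiles = defaultdict(lambda: {"sources": set(), "purchases": 0})
    for rec in records:
        email = normalize_email(rec.get("email"))
        if email is None:
            continue  # no usable identifier; a fuller pipeline might try others
        profile = profiles[email]
        profile["sources"].add(rec["source"])
        profile["purchases"] += rec.get("purchases", 0)
        first = (rec.get("first_name") or "").strip().lower()
        if first and "gender_guess" not in profile:
            profile["gender_guess"] = NAME_GENDER.get(first, "unknown")
    return dict(profiles)


# Hypothetical, deliberately messy records from web, point-of-sale, and email systems.
records = [
    {"source": "web", "email": " Maria.Lopez@Example.com ", "first_name": "Maria", "purchases": 2},
    {"source": "pos", "email": "maria.lopez@example.com", "purchases": 1},
    {"source": "email", "email": "JAMES@example.com", "first_name": "james"},
    {"source": "web", "email": "not-an-email", "purchases": 5},
]
for email, profile in consolidate(records).items():
    print(email, profile)
```

The two Maria records collapse into one profile with purchases from both channels, while the record with no usable email is dropped rather than guessed at.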

                    major apparel brand
                      targeted discounts
    Clickstream Data from Online Storefront
    -> ingest data
    -> behavioral cluster analysis;
       pre-defined web content and deals




  big data business
  intelligence




             poll


traditional data
warehousing


big data
warehousing


big data
warehousing
The Infochimps Approach
#5usecases
    big data exploration & visualization

                popular online deal site
                 business command center
    Retail Site Web Logs
    -> ingest data
    -> BI dashboarding; SQL analysis with Hive & Hue
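The kind of SQL analysis run over web logs can be illustrated without a Hadoop cluster. In this sketch an in-memory SQLite database stands in for Hive (an assumption made only so the example runs anywhere); the `web_logs` table, its columns, and the rows are hypothetical. The query uses only basic SELECT / WHERE / GROUP BY / ORDER BY, so its shape should carry over to HiveQL largely unchanged.

```python
import sqlite3

# In-memory SQLite stands in for Hive here; table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE web_logs (ts TEXT, path TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO web_logs VALUES (?, ?, ?)",
    [
        ("2013-02-01 09:00", "/deals/spa", 200),
        ("2013-02-01 09:01", "/deals/spa", 200),
        ("2013-02-01 09:02", "/deals/pizza", 200),
        ("2013-02-01 09:03", "/checkout", 500),
        ("2013-02-01 09:04", "/deals/spa", 200),
    ],
)

# Successful page views per path, most popular first: a typical dashboard feed.
rows = conn.execute(
    """
    SELECT path, COUNT(*) AS hits
    FROM web_logs
    WHERE status = 200
    GROUP BY path
    ORDER BY hits DESC
    """
).fetchall()

for path, hits in rows:
    print(path, hits)
conn.close()
```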



             learn more >>


               sales@infochimps.com
                   1-855-328-2386

                   Request a Demo:
             http://infochimps.com/demo


Editor's Notes

  • #3 we are a big data cloud services provider for the enterprise. we bundle together all the analytics infrastructure you need, like Hadoop, real-time analytics, and powerful databases, and provide the hosting, support, and expertise – so that you can focus on analytics and driving those business use cases and apps – not on wrangling with the complex systems
  • #5 I represent… a business person at an enterprise; a technical person at an enterprise; a consultant; a vendor; other
  • #6 I represent… a business person at an enterprise; a technical person at an enterprise; a consultant; a vendor; other
  • #7 I represent… a business person at an enterprise; a technical person at an enterprise; a consultant; a vendor; other
  • #8 and 94% in the top 10
  • #9 So let’s dig into it. Big data is a pretty easy idea to explain: we produce data, all the time, constantly, and we produce a lot of it. Data centers now take up 1.3% of global energy usage – as much as the entire continent of Australia. So we have some similarly big challenges and even bigger opportunities. On the left of this slide I’ve listed just a few of the kinds of data sources that might be available to an agency, should they choose to ingest them. Everything from their own clients’ customer databases, to streams of tweets from Twitter, to Google search results and even forum posts, can be ingested in the pursuit of building something that generates insights for their clients.
  • #33 Best explained by describing other use cases, like the GAP. Copying Flip and Tim, so they can benefit from the use case. When we stood up a Hortonworks cluster at PARC for the GAP, we architected a system that combined real-time (Esper) with batch (Hortonworks) processing to make GAP.com both "interactive and intelligent".
    This was done by analyzing click-stream log data in real time to determine your behavior and, based on what you were doing at that very instant, serving personalized content to each individual user, influencing them in real time. So based on your current activity (your interactions with the website), we acted to customize your experience, intelligently. Where Hadoop came in was to build the "population-based behavioral" clusters, which allowed us to pre-define which content to serve you if and when you followed a certain real-time sequence.
    For example, click-stream analysis in Hadoop determined that when a large, statistically significant group did the following:
      Homepage -> Jeans section -> Skinny jeans -> Long-sleeve shirts
    they were 90% likely to buy both jeans and shirts together. Whereas if you did the following:
      Homepage -> Long-sleeve shirts -> Jeans -> Skinny jeans
    you only bought the shirt, UNLESS there was at least a 20% discount associated with it.
    These were two different clusters, determined through complex Hadoop analysis over a long period of time. So when you surf the site in real time, you see the following interactive behavior happen:
      Cluster 1: Homepage -> Jeans -> Skinny jeans -> Recommendation to go to Long-sleeve shirts -> Long-sleeve shirts -> Purchase with NO discount
      Cluster 2: Homepage -> Long-sleeve shirts -> Recommendation to go to Jeans -> Skinny jeans -> 20% discount offered in real time -> Purchase
    This is an interactive and intelligent web and e-commerce application that is 100% data-driven.
  • #43 I invite you to let us know what your use case is, and we can help you evaluate which tools and architecture are appropriate to solve it. Now we are open to questions!
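Note #33 describes matching a shopper's live click sequence against population-level clusters that were precomputed in batch. A minimal sketch of the real-time lookup follows. The two sequences and the 20% discount come from the note; the page names as strings, the `CLUSTERS` structure, and `next_action` are hypothetical, and simple prefix matching in plain Python stands in for the Esper-based real-time layer the note mentions.

```python
# Population-level clusters, hard-coded here; per note #33 they were derived
# offline with Hadoop. Page names and action fields are hypothetical.
CLUSTERS = {
    "cluster_1": {
        "prefix": ["homepage", "jeans", "skinny jeans"],
        "action": {"recommend": "long-sleeve shirts", "discount_offer": 0.0},
    },
    "cluster_2": {
        "prefix": ["homepage", "long-sleeve shirts"],
        "action": {"recommend": "jeans", "discount_offer": 0.20},
    },
}


def next_action(session_clicks):
    """Match the live click sequence against each cluster's opening pages.

    Returns (cluster_name, action) for the first cluster whose prefix equals
    the start of the session, or (None, None) if the session is too short or
    follows no known pattern.
    """
    for name, cluster in CLUSTERS.items():
        prefix = cluster["prefix"]
        if session_clicks[: len(prefix)] == prefix:
            return name, cluster["action"]
    return None, None


print(next_action(["homepage", "jeans", "skinny jeans"]))
print(next_action(["homepage", "long-sleeve shirts"]))
print(next_action(["homepage"]))
```

Keeping the expensive work (finding the clusters) in batch and the per-click work to a cheap lookup is what lets the site react "at that very instant".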