Just-In-Time Analytics
Surfing the new big data landscape
Self-Service Big Data Analytics for Hadoop

 Matt Schumpert



                                © 2012 Datameer, Inc. All rights reserved.
Agenda

  Backdrop


  Observations


  Solution


  Demo




Big Data Landscape
Challenge
  • Dramatic data growth
  • Structured and unstructured data
  • Scale economically
  • Maintain agility
Enablers
  • Low-cost storage and CPUs
  • Disruptive new technologies
  • Availability of cloud infrastructure
Needs
  • Democratize data access
  • Crowd-source insights
  • Just-In-Time delivery

Source: Forrester




Hadoop - A Disruptive Response
Advantages
  • Economics
  • Flexibility
  • Scalability
Challenges
  • Raw technology, complexity
  • Requires significant resources
  • No packaged applications
Rapid Adoption
  • Led by Yahoo, Facebook, etc.
  • Data-driven companies followed
  • Fortune 500 companies rapidly deploying

Goal
  • Make Big Data analytics accessible to business users
  • Shorten time-to-insight
  • Seamless integration with all data types
  • Low cost of ownership
  • Demystify




Current State
   The volume problem was solved with MPP databases
   • Their TCO is sometimes lower than Hadoop's
   The variety problem is tractable with Hadoop
   People still struggle with velocity
   Time-to-insight is too long
   Will business agility decline?

What We’re Surfing on...




Observations
   “The Wild Wild West”
   • We’re in a lawless era for data formats
   “Amateur Night”
   • People re-invent crooked wheels all over their pipelines
   “Open Mic Night”
   • Dealing with data that talks too much
   “Social Data Gold Rush”
   • A rush to judgement on social media data leads to silos




1. “The Wild Wild West” ...
   JSON (Twitter, Facebook, MongoDB, etc.)
   • Not always well-formed
   • Difficult to split raw (backtrack to what? ‘{’?)
   Sequence Files
   • Metadata is completely open-ended
   • Triple-packed content (Flume JSON w/ compressed files, etc.)
   Raw
   • “The delimiter of the week”
        ‣ \u0001 (Hive)
        ‣ Þ (DoubleClick)
   • Various text encoding schemes
        ‣ ISO-8859 vs. UTF-8
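The “Raw” and JSON bullets above boil down to writing a tolerant reader. The Python sketch below is illustrative only and rests on stated assumptions: raw bytes are either UTF-8 or ISO-8859-1, delimited files use one of a few candidate delimiters (the candidate list and all function names are hypothetical, not a Hadoop or Datameer API), and JSON arrives one object per line, which is what lets a reader resynchronize at the next newline instead of backtracking to a ‘{’.

```python
import json
from typing import Iterable, Iterator, List, Optional

# Hypothetical candidate list: Hive's default field separator (\x01, Ctrl-A),
# the thorn in both cases (the slide lists Þ for DoubleClick; þ is byte 0xFE
# in ISO-8859-1), then tab and comma.
CANDIDATE_DELIMITERS = ["\x01", "\u00fe", "\u00de", "\t", ","]


def decode(raw: bytes) -> str:
    """Decode as UTF-8, falling back to ISO-8859-1 (which accepts any byte)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


def guess_delimiter(sample_lines: List[str]) -> Optional[str]:
    """Return the first candidate that occurs the same, nonzero number of
    times on every sample line, or None if no candidate qualifies."""
    for delim in CANDIDATE_DELIMITERS:
        counts = {line.count(delim) for line in sample_lines}
        if len(counts) == 1 and counts.pop() > 0:
            return delim
    return None


def parse_json_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one JSON object per line, silently skipping malformed lines
    and lines that hold something other than an object."""
    for line in lines:
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):
            yield obj
```

The delimiter guess is only a heuristic: a thorn or comma that legitimately occurs inside field values will defeat it, so a real pipeline would confirm the guess against a sample or a declared schema.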

2. “Amateur Night” ...
  Naive collection strategies
  • e.g. one file per record (one per Facebook user)
  • Rudimentary use of batch requests / store-and-forward
  Naive ingestion strategies
  • e.g. per-minute log ingestion with no compaction → millions of small files
  • Partitioning for ease of ingestion, not analytics
     ‣ e.g. creating files/keys/partitions by server of origin

  Naive storage strategies
  • Uncompressed, all-String storage of mostly numerical fields
  • Shimming compressed SequenceFiles onto big compressed files → not splittable
  • Mixing compression codecs with data formats (e.g. LzoTextInputFormat)
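To put the small-files problem in numbers: under an assumed setup of 100 servers each writing one log file per minute, ingestion produces 100 × 1,440 = 144,000 files a day, or about 52.6 million a year. The Python sketch below shows only the planning half of a compaction job, under stated assumptions: files are grouped greedily, in arrival order, into batches no larger than a target size (128 MB here, standing in for a configurable HDFS block size). The function name and paths are hypothetical, and actually merging each batch (for example into one SequenceFile) is left out.

```python
from typing import List, Tuple


def plan_compaction(files: List[Tuple[str, int]], target_bytes: int) -> List[List[str]]:
    """Group small files, in the order given, into batches of at most
    target_bytes each; a single file larger than the target gets its own batch."""
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for path, size in files:
        # Close the current batch if this file would push it over the target.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches


MB = 1024 * 1024
# One day of per-minute files from a single server, 2 MB each (assumed sizes).
day = [(f"/logs/server-01/minute-{i:04d}.log", 2 * MB) for i in range(1440)]
batches = plan_compaction(day, target_bytes=128 * MB)
# 1,440 files -> 23 batches: 22 of 64 files and a final one of 32.
```

Grouping in arrival order keeps time-adjacent records together, which suits time-range queries; grouping per server, as in the partitioning bullet above, would instead preserve ingestion-oriented layout.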

3. “Open Mic Night” ...
   Data can be verbose
   • e.g. repeating key/value pairs
   Semi-structured is the norm
   Deep hierarchies that explode unexpectedly
   • Sometimes even beyond task JVM memory (too many friends/fans!)
   Low signal-to-noise ratio
   Content in various languages
   • Makes sentiment analysis tricky
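One common way to tame deep, semi-structured records is to flatten them into path/value pairs that a spreadsheet-style view can treat as columns. The Python sketch below assumes the record has already been parsed (e.g. with json.loads), so it does not help with a single record too large for memory; that would need a streaming parser. The function name and the dotted/indexed path syntax are illustrative conventions, not a standard.

```python
from typing import Any, Iterator, Tuple


def flatten(value: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, leaf) pairs for a parsed JSON value.

    Object keys are joined with dots and list elements are addressed by
    index, e.g. "work[0].employer.name". Empty objects and lists yield
    nothing. Nesting depth is bounded by Python's recursion limit.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        yield prefix, value
```

Because it is a generator, leaves are produced one at a time rather than collected into a second copy of the record. Applied to a Facebook-style profile, nested fields such as "hometown" become columns like "hometown.name".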
Example: FB Profile
{"id":"10011666","name":"Test user","first_name":"Test","last_name":"user","link":"http://www.facebook.com/test.user",
"username":"test.user","birthday":"09/19/1983","hometown":{"id":"103102203064024","name":"West Chester, Pennsylvania"},
"location":{"id":"","name":null},
"bio":"I'm an honorary Sean Connery, born '83\r\nThere's only one of me\r\nSingle-handedly raising the economy\r\nAin't no chance of the record company dropping me\r\nPress be asking do I care for sodomy\r\nI don't know, yeah, probably\r\nI've been looking for serial monogamy\r\nNot some bird that looks like Billy Connolly\r\nBut for now I'm down for ornithology\r\nGrab your binoculars, come follow me",
"quotes":"Normal is getting dressed in clothes that you buy for work and driving through traffic in a car that you are still paying for - in order to get to the job you need to pay for the clothes and the car, and the house you leave vacant all day so you can afford to live in it. -Ellen Goodman\r\n\r\nThe entire economy of the Western world is built on things that cause cancer.-From the movie \"Bliss\"\r\n\r\nNever give a party if you will be the most interesting person there. -Mickey Friedman\r\n\r\nAhhh. A man with a sharp wit. Someone ought to take it away from him before he cuts himself. -Peter da Silva\r\n\r\nNow it seems the music Industry's working on marketing ploys. I remember back when it wasn't about looks or color but about the voice. -Jay Sean\r\n\r\nWhy are you trying so hard to fit in, when you were born to stand out? -Random\r\n\r\nI think if you're ready to go out with Johnny. Now's the time to tell him about your one month limit. He wont mind he'll apreciate your fresh look on dating. And once you've dated someone else you can date him again. I'm sure he'll like it. Everyone will appreciate it. You so novel what a good idea. You can keep your time to your self. You don't need date insurance.You can go out with whoever you want to. Every boy, every boy, in the whole world could be yours. If you'll just listen to my plan\r\nTHE TEENAGE GUIDE TO POPULARITY -Nada Surf\r\n\r\nThe difference between now and the future is simply greater destruction and more universal chaos_-Stephen Hawking \r\n\r\nIn archaeology you uncover the unknown. In diplomacy you cover the known. -Thomas Pickering\r\n\r\nYou know the disease u get when u get married..Onegina -Russel Peters\r\n\r\nI saw you standing in my headlights. (Blink, blink, blink.)\r\nI thought I'd run you down for the weight you left on me.\r\nInstead I pushed rewind, reversed and drove away.\r\nAnd seeing you disappear in my rearview brought to me the word\r\n'Reciprocity!' -Incubus\r\n\r\nFew people are capable of expressing with equanimity opinions which differ from the prejudices of their social environment. Most people are even incapable of forming such opinions. -Albert Einstein\r\n\r\nNinety-eight percent of the adults in this country are decent, hard-working, honest Americans. It's the other lousy two percent that get all the publicity. But then--we elected them. -Lily Tomlin\r\n\r\nWhen You Are Not Practicing, Remember: Someone Somewhere Is Practicing And When You Meet Him- He Will Win\r\n\r\nIf not I, who? If not here, where? If not now, when?\r\n\r\nAll that is necessary for evil to triumph is for good people to stand by and do nothing -Unknown\r\n\r\nWe are the people our parents warned us about. -Jimmy Buffett\r\n\r\nNever explain--your friends do not need it and your enemies will not believe you anyway. -Elbert Hubbard\r\n\r\nMy definition of a free society is a society where it is safe to be unpopular. -Adlai E. Stevenson Jr.\r\n\r\nToo many have dispensed with generosity in order to practice charity. -Albert Camus",
"work":[{"employer":{"id":"6185812851","name":"American Express"},"location":{"id":"105540216147364","name":"Phoenix, Arizona"},"position":{"id":"133619273341785","name":"Lead Programmer Analyst"},"start_date":"2012-01"},
{"employer":{"id":"190876464341724","name":"Cardiac group"},"position":{"id":"105630109469647","name":"Executive Producer"},"description":"We create music for Artist Placement and TV/Film.","start_date":"2002-01"},
{"employer":{"id":"6185812851","name":"American Express"},"location":{"id":"105540216147364","name":"Phoenix, Arizona"},"position":{"id":"116439401740213","name":"Senior Database Administrator"},"start_date":"2007-10","end_date":"2012-01"},
{"employer":{"id":"110067355684846","name":"Saint Joseph Hospital"},"location":{"id":"105540216147364","name":"Phoenix, Arizona"},"position":{"id":"202489236428627","name":"Pharmacy IT Coordinator"},"start_date":"2005-10","end_date":"2007-10"},
{"employer":{"id":"110067355684846","name":"Saint Joseph Hospital"},"location":{"id":"105540216147364","name":"Phoenix, Arizona"},"position":{"id":"144703015548786","name":"Pharmacy Tech"},"start_date":"2001-02","end_date":"2005-10"}],
"sports":[{"id":"108606435830479","name":"Karate"}],
"favorite_teams":[{"id":"87169796810","name":"Philadelphia Flyers"},{"id":"93625750491","name":"Philadelphia Phillies"},{"id":"45898408995","name":"Phoenix Suns"},{"id":"120163518021430","name":"Philadelphia Eagles"}],
"favorite_athletes":[{"id":"77922840249","name":"Steve Nash"},{"id":"105590659475179","name":"Wayne Gretzky"},{"id":"62975399193","name":"Michael Jordan"}],
"inspirational_people":[{"id":"106676942701904","name":"Gandhi"}],
"education":[{"school":{"id":"109324275761313","name":"Corona del Sol High School"},"type":"High School"},{"school":{"id":"23680344606","name":"Arizona State University"},"type":"College"}],
"gender":"male","interested_in":["female"],"relationship_status":"Single","religion":"Hinduism (One with all things)","political":"Liberal (Left of Center)",
"email":"app+22c90gj.9hh9d.f7304b58ac646e08b5f0f10a73547e34\u0040proxymail.facebook.com","website":"www.slashdot.org\r\nwww.gizmodo.com",
"timezone":-7,"locale":"en_US","languages":[{"id":"106059522759137","name":"English"},{"id":"112969428713061","name":"Hindi"}],
"verified":true,"updated_time":"2012-03-22T17:24:25+0000"}

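A quick way to quantify how much a record like this one “talks” is to count how often each key repeats and how many characters the key names themselves take up. The Python sketch below is an approximation (it assumes keys contain no escape sequences), and both function names are hypothetical.

```python
import json
from collections import Counter
from typing import Any


def count_keys(value: Any, counts: Counter) -> Counter:
    """Recursively count how often each object key occurs in parsed JSON."""
    if isinstance(value, dict):
        for key, child in value.items():
            counts[key] += 1
            count_keys(child, counts)
    elif isinstance(value, list):
        for child in value:
            count_keys(child, counts)
    return counts


def key_overhead(raw: str) -> float:
    """Approximate share of the raw text spent on quoted key names.

    Assumes keys contain no escape sequences, so each occurrence costs
    len(key) + 2 characters (the key plus its two quotes).
    """
    counts = count_keys(json.loads(raw), Counter())
    key_chars = sum((len(key) + 2) * n for key, n in counts.items())
    return key_chars / len(raw)
```

In the profile above, "id" and "name" recur inside nearly every nested object; formats that store a schema once per file rather than key names once per value typically avoid most of that overhead.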
Example: Email (MBOX)
From common-user-return-16923-apmail-hadoop-common-user-archive=hadoop.apache.org@hadoop.apache.org Thu Aug 20 14:02:59 2009
Return-Path: <common-user-return-16923-apmail-hadoop-common-user-archive=hadoop.apache.org@hadoop.apache.org>
Delivered-To: apmail-hadoop-common-user-archive@www.apache.org
Received: (qmail 83137 invoked from network); 20 Aug 2009 14:02:58 -0000
Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3)
  by minotaur.apache.org with SMTP; 20 Aug 2009 14:02:58 -0000
Received: (qmail 23328 invoked by uid 500); 20 Aug 2009 14:03:14 -0000
Delivered-To: apmail-hadoop-common-user-archive@hadoop.apache.org
Received: (qmail 23266 invoked by uid 500); 20 Aug 2009 14:03:14 -0000
Mailing-List: contact common-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:common-user-help@hadoop.apache.org>
List-Unsubscribe: <mailto:common-user-unsubscribe@hadoop.apache.org>
List-Post: <mailto:common-user@hadoop.apache.org>
List-Id: <common-user.hadoop.apache.org>
Reply-To: common-user@hadoop.apache.org
Delivered-To: mailing list common-user@hadoop.apache.org
Received: (qmail 23254 invoked by uid 99); 20 Aug 2009 14:03:14 -0000
Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230)
  by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 20 Aug 2009 14:03:14 +0000
X-ASF-Spam-Status: No, hits=-0.0 required=10.0
  tests=SPF_PASS
X-Spam-Check-By: apache.org
Received-SPF: pass (nike.apache.org: local policy)
Received: from [209.85.219.209] (HELO mail-ew0-f209.google.com) (209.85.219.209)
  by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 20 Aug 2009 14:03:05 +0000
Received: by ewy5 with SMTP id 5so181532ewy.36
  for <common-user@hadoop.apache.org>; Thu, 20 Aug 2009 07:02:45 -0700 (PDT)
MIME-Version: 1.0
Received: by 10.216.39.85 with SMTP id c63mr1821542web.103.1250776964866; Thu,
  20 Aug 2009 07:02:44 -0700 (PDT)
In-Reply-To: <597eea000908200259o8e3bd78l385059f2b5d31555@mail.gmail.com>
References: <597eea000908191855v579b9c4r8baeb638630cfb27@mail.gmail.com>
  <e01b80590908192249s5302cd26m7984a32816c0d58c@mail.gmail.com>
  <597eea000908200209o176aefacjca2a45369301c296@mail.gmail.com>
  <e01b80590908200230x608ad35en5f372a9fd5aba325@mail.gmail.com>
  <597eea000908200259o8e3bd78l385059f2b5d31555@mail.gmail.com>
Date: Thu, 20 Aug 2009 15:02:44 +0100
Message-ID: <ac79ea400908200702u309a4fcey9ab1a7b358f313ce@mail.gmail.com>
Subject: Re: File Chunk to Map Thread Association
From: Tom White <tom@cloudera.com>
To: common-user@hadoop.apache.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
X-Virus-Checked: Checked by ClamAV on apache.org

Hi Roman,

Have a look at CombineFileInputFormat - it might be related to what
you are trying to do.

Cheers,
Tom

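In this message the header block dwarfs the body. The Python sketch below, using the standard-library email package, measures that ratio for a single message. It assumes "\n" line endings and a message without the leading mbox “From ” separator line (Python's mailbox module can iterate real mbox files); the function name and returned fields are hypothetical.

```python
import email
from typing import Dict


def signal_to_noise(raw_message: str) -> Dict[str, float]:
    """Rough header-vs-body comparison for one RFC 822-style message."""
    # Headers end at the first empty line; everything after it is the body.
    header_text, _, body_text = raw_message.partition("\n\n")
    body = body_text.strip()
    msg = email.message_from_string(raw_message)
    return {
        "header_chars": len(header_text),
        "body_chars": len(body),
        # Each relay typically adds a Received: header; folded continuation
        # lines belong to the header they follow.
        "received_hops": len(msg.get_all("Received", [])),
        "body_share": len(body) / max(len(raw_message), 1),
    }
```

For analytics, the useful signal (sender, date, subject, body) is a handful of fields; the rest is transport metadata that a reader would normally project away at parse time.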
What do we need?




Just-In-Time Supply Chain
     Slow     Expensive                 Expertise



     ETL    Data Warehouse   Business Intelligence




Just-In-Time Supply Chain
      Slow          Expensive          Expertise
      ETL           Data Warehouse     Business Intelligence

      Fast          Economical         Self Service
      Raw Load      Hadoop             Spreadsheets + drag ’n drop
                                       “schema on read”
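“Schema on read” means raw data is loaded as-is and a schema is applied only when it is read, so the same files can be reinterpreted later without reloading. The Python sketch below illustrates the idea for tab-delimited text under stated assumptions: a schema is a hypothetical list of (name, converter) pairs, and values that are missing or fail conversion become None rather than rejecting the whole record.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional, Tuple

# A "schema" here is just an ordered list of (column name, converter) pairs.
Schema = List[Tuple[str, Callable[[str], Any]]]


def read_with_schema(lines: Iterable[str], schema: Schema,
                     delimiter: str = "\t") -> Iterator[Dict[str, Optional[Any]]]:
    """Apply a schema to raw delimited lines at read time.

    Missing fields, empty strings and values the converter rejects
    become None, so one bad value does not fail the whole record.
    """
    for line in lines:
        fields = line.rstrip("\n").split(delimiter)
        record: Dict[str, Optional[Any]] = {}
        for i, (name, convert) in enumerate(schema):
            raw = fields[i] if i < len(fields) else ""
            try:
                record[name] = convert(raw) if raw != "" else None
            except ValueError:
                record[name] = None
        yield record
```

Reading the same lines with a second schema that adds, say, a float "score" column requires no reload, only a different schema argument; that is the agility the slide contrasts with ETL into a fixed warehouse schema.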




A “One Stop Shop”
Compressing “Time-To-Insight”


        Fast                       Self Service



       Raw Load      Spreadsheet         Drag and Drop Visualization




                       Hadoop


                    Economical




What We Do:




Datameer Capabilities
Seamless Data Integration
  • Wizard-based integration
  • Structured, semi-structured and unstructured data
  • No complex mappings/schemas
  • Pluggable data integration API
Powerful Analytics
  • Interactive spreadsheet UI
  • Cleansing, transformation, analysis
  • Over 200 built-in functions
  • Pluggable function API
Self-Service Dashboards
  • Drag and drop
  • Powerful visualizations
  • Mash up anything
  • Integrate into existing portals




Data-Center Ready




Demo...




Q/A




Please Download Our
    Trial Edition!

www.datameer.com





May 2012 HUG: The Changing Big Data Landscape

  • 1. Just-In-Time Analytics: Surfing the new big data landscape
      Self-Service Big Data Analytics for Hadoop
      Matt Schumpert
      © 2012 Datameer, Inc. All rights reserved.
  • 2. Agenda
      Backdrop
      Observations
      Solution
      Demo
  • 3. Big Data Landscape
      Challenge: Dramatic data growth; Structured and unstructured data; Scale economically; Maintain agility
      Enablers: Low cost storage and CPUs; Disruptive new technologies; Availability of cloud infrastructure
      Needs: Democratize data access; Crowd-source insights; Just-In-Time Delivery
      Source: Forrester
  • 4. Hadoop - A Disruptive Response
      Advantages: Economics; Flexibility; Scalability
      Challenges: Raw technology, complexity; Requires significant resources; No packaged applications
      Rapid Adoption: Led by Yahoo, Facebook, etc.; Data-driven companies followed; Fortune 500 rapidly deploying
      Goal
        • Make Big Data analytics accessible to business users
        • Shorten time-to-insight
        • Seamless integration with all data types
        • Low cost of ownership
        • Demystify
  • 5. Current State
      Volume problem was solved with MPP DBs
        • TCO sometimes lower than Hadoop
      Variety problem is tractable with Hadoop
      People still struggle with velocity
      Time-to-insight is too high
      Will business agility decline?
  • 6. What We’re Surfing on...
  • 7. Observations
      “The Wild Wild West”
        • We’re in a lawless era for data formats
      “Amateur Night”
        • People re-invent crooked wheels all over their pipelines
      “Open Mic Night”
        • Dealing with data that talks too much
      “Social Data Gold Rush”
        • A rush to judgement on social media data leads to silos
  • 8. 1. “The Wild Wild West” ...
      JSON (Twitter, Facebook, MongoDB, etc.)
        • Not always well-formed
        • Difficult to split raw (backtrack to what? ‘{’?)
      Sequence Files
        • Metadata is completely open-ended
        • Triple-packed content (Flume JSON w/ compressed files, etc.)
      Raw
        • “the delimiter of the week”
          ‣ \u0001 (Hive)
          ‣ Þ (DoubleClick)
        • Various text encoding schemes
          ‣ ISO-8859 vs. UTF-8
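The “delimiter of the week” and mixed encodings are easier to live with when both are made explicit at read time. A minimal Python sketch, assuming each record arrives as the raw bytes of one line, that the caller already knows the field delimiter (for example \u0001 for Hive, or Þ as listed for DoubleClick), and that anything which is not valid UTF-8 is ISO-8859-1. That fallback is a heuristic, not real encoding detection, and the function name `split_record` is hypothetical.

```python
HIVE_DELIMITER = "\u0001"         # Hive's default field separator (Ctrl-A)
DOUBLECLICK_DELIMITER = "\u00de"  # "Þ", as listed on the slide


def split_record(raw: bytes, delimiter: str) -> tuple[list[str], str]:
    """Decode one raw line and split it on the given delimiter.

    Tries UTF-8 first and falls back to ISO-8859-1 (latin-1), which can
    decode any byte sequence, so the fallback never raises. Heuristic only:
    latin-1 text that happens to be valid UTF-8 would be misread.
    Returns (fields, encoding_used).
    """
    try:
        text, encoding = raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        text, encoding = raw.decode("iso-8859-1"), "iso-8859-1"
    # Drop the line terminator, whether Unix (\n) or Windows (\r\n).
    text = text.rstrip("\r\n")
    return text.split(delimiter), encoding
```

Returning the encoding alongside the fields makes it possible to count how many records needed the fallback, which is a cheap way to spot a feed whose encoding has silently changed.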
  • 9. 2. “Amateur Night...”
      Naive collection strategies
        • e.g. 1 file per record (Facebook user)
        • Rudimentary use of batch requests / store-and-forward
      Naive ingestion strategies
        • e.g. per-minute log ingestion with no compaction --> millions of small files
        • Partitioning for ease of ingestion, not analytics
          ‣ e.g. creating files/keys/partitions by server of origin
      Naive storage strategies
        • Uncompressed, all-String storage of mostly numerical fields
        • Shimming compressed SEQ onto big compressed files --> not splittable
        • Mixing compression codecs with data formats (e.g. LzoTextInputFormat)
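Per-minute ingestion without compaction is what produces the “millions of small files”. One common remedy is a periodic compaction job that concatenates small files into fewer, larger ones. The sketch below only plans the batches: it greedily packs files, in their given order, into groups whose total size stays at or under a target (for example something close to the HDFS block size). The file names, sizes, and the `plan_compaction` function are illustrative assumptions; rewriting the files, and choosing a splittable output format, is left out. Hadoop's `CombineFileInputFormat`, recommended in the mailing-list reply in the email example of this deck, attacks the same problem from the read side by packing many small files into each input split.

```python
def plan_compaction(files: list[tuple[str, int]], target_bytes: int) -> list[list[str]]:
    """Greedily group (path, size_in_bytes) pairs into batches of at most target_bytes.

    Files keep their original order (e.g. sorted by timestamp), so each
    compacted file still covers a contiguous time range. A single file that
    is already larger than the target ends up alone in its own batch.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    current_size = 0
    for path, size in files:
        # Close the current batch if this file would push it past the target.
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Hypothetical per-minute log files with toy sizes.
minute_files = [("logs/12-00", 10), ("logs/12-01", 20), ("logs/12-02", 30),
                ("logs/12-03", 50), ("logs/12-04", 5)]
plan = plan_compaction(minute_files, target_bytes=60)
```

Keeping the input order, rather than bin-packing by size, trades a few slightly underfilled batches for compacted files that still line up with time ranges, which tends to matter more for later analytics.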
  • 10. 3. “Open Mic Night” ...
      Data can be verbose
        • e.g. repeating key/value pairs
      Semi-structured is the norm
      Deep hierarchies that explode unexpectedly
        • Even beyond task JVM memory (too many friends/fans!)
      Low signal-to-noise ratio
      Content in various languages
        • Makes sentiment analysis tricky
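Deep, unexpectedly wide hierarchies are easier to survive if the flattening step has explicit limits. A minimal sketch, assuming records are already parsed into Python dicts and lists: it turns nested fields into dotted column paths, stops descending at `max_depth` (leaving that subtree as a single value), and keeps only the first `max_items` elements of any list while recording how many were dropped. The limits and the `__truncated__` marker are hypothetical choices, not a standard.

```python
from typing import Any


def flatten(record: Any, max_depth: int = 5, max_items: int = 100) -> dict[str, Any]:
    """Flatten nested dicts/lists into {"dotted.path": value} with size limits."""
    out: dict[str, Any] = {}

    def walk(node: Any, path: str, depth: int) -> None:
        if isinstance(node, dict) and depth < max_depth:
            for key, value in node.items():
                walk(value, f"{path}.{key}" if path else key, depth + 1)
        elif isinstance(node, list) and depth < max_depth:
            # Keep only the first max_items elements of very long lists
            # (think "too many friends/fans") and note how many were dropped.
            for i, value in enumerate(node[:max_items]):
                walk(value, f"{path}[{i}]", depth + 1)
            if len(node) > max_items:
                out[f"{path}.__truncated__"] = len(node) - max_items
        else:
            # Scalars, and any subtree reached at max_depth, are stored as-is.
            out[path] = node

    walk(record, "", 0)
    return out


# Toy record shaped like the profile data on the next slide.
rec = {"id": "1", "hometown": {"id": "9", "name": "West Chester"},
       "fans": [{"id": i} for i in range(5)]}
flat = flatten(rec, max_items=2)
```

Truncating with a counter, instead of failing the record, keeps the row usable while still telling an analyst that the list was cut.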
  • 11. Example: FB Profile
      {"id":"10011666","name":"Test user","first_name":"Test","last_name":"user","link":"http://www.facebook.com/test.user","username":"test.user","birthday":"09/19/1983","hometown":{"id":"103102203064024","name":"West Chester, Pennsylvania"},"location":{"id":"","name":null},"bio":"I'm an honorary Sean Connery, born '83\r\nThere's only one of me\r\nSingle-handedly raising the economy\r\nAin't no chance of the record company dropping me\r\nPress be asking do I care for sodomy\r\nI don't know, yeah, probably\r\nI've been looking for serial monogamy\r\nNot some bird that looks like Billy Connolly\r\nBut for now I'm down for ornithology\r\nGrab your binoculars, come follow me","quotes":"Normal is getting dressed in clothes that you buy for work and driving through traffic in a car that you are still paying for - in order to get to the job you need to pay for the clothes and the car, and the house you leave vacant all day so you can afford to live in it. -Ellen Goodman\r\n\r\nThe entire economy of the Western world is built on things that cause cancer.-From the movie \"Bliss\"\r\n\r\nNever give a party if you will be the most interesting person there. -Mickey Friedman\r\n\r\nAhhh. A man with a sharp wit. Someone ought to take it away from him before he cuts himself. -Peter da Silva\r\n\r\nNow it seems the music Industry's working on marketing ploys. I remember back when it wasn't about looks or color but about the voice. -Jay Sean\r\n\r\nWhy are you trying so hard to fit in, when you were born to stand out? -Random\r\n\r\nI think if you're ready to go out with Johnny. Now's the time to tell him about your one month limit. He wont mind he'll apreciate your fresh look on dating. And once you've dated someone else you can date him again. I'm sure he'll like it. Everyone will appreciate it. You so novel what a good idea. You can keep your time to your self. You don't need date insurance.You can go out with whoever you want to. Every boy, every boy, in the whole world could be yours. If you'll just listen to my plan\r\nTHE TEENAGE GUIDE TO POPULARITY -Nada Surf\r\n\r\nThe difference between now and the future is simply greater destruction and more universal chaos_-Stephen Hawking \r\n\r\nIn archaeology you uncover the unknown. In diplomacy you cover the known. -Thomas Pickering\r\n\r\nYou know the disease u get when u get married..Onegina -Russel Peters\r\n\r\nI saw you standing in my headlights. (Blink, blink, blink.)\r\nI thought I'd run you down for the weight you left on me.\r\nInstead I pushed rewind, reversed and drove away.\r\nAnd seeing you disappear in my rearview brought to me the word\r\n'Reciprocity!' -Incubus\r\n\r\nFew people are capable of expressing with equanimity opinions which differ from the prejudices of their social environment. Most people are even incapable of forming such opinions. -Albert Einstein\r\n\r\nNinety-eight percent of the adults in this country are decent, hard-working, honest Americans. It's the other lousy two percent that get all the publicity. But then--we elected them. -Lily Tomlin\r\n\r\nWhen You Are Not Practicing, Remember: Someone Somewhere Is Practicing And When You Meet Him- He Will Win\r\n\r\nIf not I, who? If not here, where? If not now, when?\r\n\r\nAll that is necessary for evil to triumph is for good people to stand by and do nothing -Unknown\r\n\r\nWe are the people our parents warned us about. -Jimmy Buffett\r\n\r\nNever explain--your friends do not need it and your enemies will not believe you anyway. -Elbert Hubbard\r\n\r\nMy definition of a free society is a society where it is safe to be unpopular. -Adlai E. Stevenson Jr.\r\n\r\nToo many have dispensed with generosity in order to practice charity. -Albert Camus","work":[{"employer":{"id":"6185812851","name":"American Express"},"location":{"id":"105540216147364","name":"Phoenix, Arizona"},"position":{"id":"133619273341785","name":"Lead Programmer Analyst"},"start_date":"2012-01"},{"employer":{"id":"190876464341724","name":"Cardiac group"},"position":{"id":"105630109469647","name":"Executive Producer"},"description":"We create music for Artist Placement and TV/Film.","start_date":"2002-01"},{"employer":{"id":"6185812851","name":"American Express"},"location":{"id":"105540216147364","name":"Phoenix, Arizona"},"position":{"id":"116439401740213","name":"Senior Database Administrator"},"start_date":"2007-10","end_date":"2012-01"},{"employer":{"id":"110067355684846","name":"Saint Joseph Hospital"},"location":{"id":"105540216147364","name":"Phoenix, Arizona"},"position":{"id":"202489236428627","name":"Pharmacy IT Coordinator"},"start_date":"2005-10","end_date":"2007-10"},{"employer":{"id":"110067355684846","name":"Saint Joseph Hospital"},"location":{"id":"105540216147364","name":"Phoenix, Arizona"},"position":{"id":"144703015548786","name":"Pharmacy Tech"},"start_date":"2001-02","end_date":"2005-10"}],"sports":[{"id":"108606435830479","name":"Karate"}],"favorite_teams":[{"id":"87169796810","name":"Philadelphia Flyers"},{"id":"93625750491","name":"Philadelphia Phillies"},{"id":"45898408995","name":"Phoenix Suns"},{"id":"120163518021430","name":"Philadelphia Eagles"}],"favorite_athletes":[{"id":"77922840249","name":"Steve Nash"},{"id":"105590659475179","name":"Wayne Gretzky"},{"id":"62975399193","name":"Michael Jordan"}],"inspirational_people":[{"id":"106676942701904","name":"Gandhi"}],"education":[{"school":{"id":"109324275761313","name":"Corona del Sol High School"},"type":"High School"},{"school":{"id":"23680344606","name":"Arizona State University"},"type":"College"}],"gender":"male","interested_in":["female"],"relationship_status":"Single","religion":"Hinduism (One with all things)","political":"Liberal (Left of Center)","email":"app+22c90gj.9hh9d.f7304b58ac646e08b5f0f10a73547e34\u0040proxymail.facebook.com","website":"www.slashdot.org\r\nwww.gizmodo.com","timezone":-7,"locale":"en_US","languages":[{"id":"106059522759137","name":"English"},{"id":"112969428713061","name":"Hindi"}],"verified":true,"updated_time":"2012-03-22T17:24:25+0000"}
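A profile like this is a single record, but a question such as “where have users worked?” wants one row per job. A minimal sketch using Python's standard `json` module, assuming the layout shown above: a `work` list whose entries carry nested `employer` and `position` objects and an optional `location`. The output column names and the `explode_work` helper are illustrative, and `SAMPLE` is a shortened copy of the profile above.

```python
import json


def explode_work(profile_json: str) -> list[dict]:
    """Turn one profile's "work" array into one flat row per job."""
    profile = json.loads(profile_json)
    rows = []
    for job in profile.get("work", []):
        rows.append({
            "user_id": profile.get("id"),
            # Nested objects may be missing (or null) on some entries,
            # e.g. the "Cardiac group" job has no location.
            "employer": (job.get("employer") or {}).get("name"),
            "position": (job.get("position") or {}).get("name"),
            "location": (job.get("location") or {}).get("name"),
            "start_date": job.get("start_date"),
            "end_date": job.get("end_date"),  # None when no end date is given
        })
    return rows


SAMPLE = json.dumps({
    "id": "10011666",
    "work": [
        {"employer": {"id": "6185812851", "name": "American Express"},
         "location": {"id": "105540216147364", "name": "Phoenix, Arizona"},
         "position": {"id": "133619273341785", "name": "Lead Programmer Analyst"},
         "start_date": "2012-01"},
        {"employer": {"id": "190876464341724", "name": "Cardiac group"},
         "position": {"id": "105630109469647", "name": "Executive Producer"},
         "start_date": "2002-01"},
    ],
})

for row in explode_work(SAMPLE):
    print(row["employer"], "|", row["position"], "|", row["location"], "|", row["start_date"])
```

Using `(job.get(...) or {})` rather than `job.get(..., {})` also covers fields that are present but `null`, like the profile's top-level `"location":{"id":"","name":null}` pattern.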
  • 12. Example: Email (MBOX)
      From common-user-return-16923-apmail-hadoop-common-user-archive=hadoop.apache.org@hadoop.apache.org Thu Aug 20 14:02:59 2009
      Return-Path: <common-user-return-16923-apmail-hadoop-common-user-archive=hadoop.apache.org@hadoop.apache.org>
      Delivered-To: apmail-hadoop-common-user-archive@www.apache.org
      Received: (qmail 83137 invoked from network); 20 Aug 2009 14:02:58 -0000
      Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.3) by minotaur.apache.org with SMTP; 20 Aug 2009 14:02:58 -0000
      Received: (qmail 23328 invoked by uid 500); 20 Aug 2009 14:03:14 -0000
      Delivered-To: apmail-hadoop-common-user-archive@hadoop.apache.org
      Received: (qmail 23266 invoked by uid 500); 20 Aug 2009 14:03:14 -0000
      Mailing-List: contact common-user-help@hadoop.apache.org; run by ezmlm
      Precedence: bulk
      List-Help: <mailto:common-user-help@hadoop.apache.org>
      List-Unsubscribe: <mailto:common-user-unsubscribe@hadoop.apache.org>
      List-Post: <mailto:common-user@hadoop.apache.org>
      List-Id: <common-user.hadoop.apache.org>
      Reply-To: common-user@hadoop.apache.org
      Delivered-To: mailing list common-user@hadoop.apache.org
      Received: (qmail 23254 invoked by uid 99); 20 Aug 2009 14:03:14 -0000
      Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 20 Aug 2009 14:03:14 +0000
      X-ASF-Spam-Status: No, hits=-0.0 required=10.0 tests=SPF_PASS
      X-Spam-Check-By: apache.org
      Received-SPF: pass (nike.apache.org: local policy)
      Received: from [209.85.219.209] (HELO mail-ew0-f209.google.com) (209.85.219.209) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 20 Aug 2009 14:03:05 +0000
      Received: by ewy5 with SMTP id 5so181532ewy.36 for <common-user@hadoop.apache.org>; Thu, 20 Aug 2009 07:02:45 -0700 (PDT)
      MIME-Version: 1.0
      Received: by 10.216.39.85 with SMTP id c63mr1821542web.103.1250776964866; Thu, 20 Aug 2009 07:02:44 -0700 (PDT)
      In-Reply-To: <597eea000908200259o8e3bd78l385059f2b5d31555@mail.gmail.com>
      References: <597eea000908191855v579b9c4r8baeb638630cfb27@mail.gmail.com> <e01b80590908192249s5302cd26m7984a32816c0d58c@mail.gmail.com> <597eea000908200209o176aefacjca2a45369301c296@mail.gmail.com> <e01b80590908200230x608ad35en5f372a9fd5aba325@mail.gmail.com> <597eea000908200259o8e3bd78l385059f2b5d31555@mail.gmail.com>
      Date: Thu, 20 Aug 2009 15:02:44 +0100
      Message-ID: <ac79ea400908200702u309a4fcey9ab1a7b358f313ce@mail.gmail.com>
      Subject: Re: File Chunk to Map Thread Association
      From: Tom White <tom@cloudera.com>
      To: common-user@hadoop.apache.org
      Content-Type: text/plain; charset=ISO-8859-1
      Content-Transfer-Encoding: 7bit
      X-Virus-Checked: Checked by ClamAV on apache.org

      Hi Roman,

      Have a look at CombineFileInputFormat - it might be related to what you are trying to do.

      Cheers,
      Tom
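Mail archives like this one arrive as mbox files: many messages concatenated, each starting with a “From ” envelope line, followed by a long header block in which only a handful of fields matter for analysis. A minimal sketch using Python's standard `email` package, assuming an mbox variant in which body lines beginning with “From ” have been escaped (for example as “>From ”), so splitting at line starts is safe. For real files on disk, the standard `mailbox.mbox` class is the more robust choice. The helper names and the second sample message are made up for illustration.

```python
import email
import re


def split_mbox(text: str) -> list[str]:
    """Split mbox text into raw messages at each line that starts with "From "."""
    # Zero-width split (Python 3.7+) keeps each "From " line with its message.
    return [part for part in re.split(r"(?m)^(?=From )", text) if part.strip()]


def summarize(raw_message: str) -> dict:
    """Pull a few analysis-friendly fields out of one raw message."""
    msg = email.message_from_string(raw_message)
    return {
        "envelope": msg.get_unixfrom(),        # the "From sender date" line
        "from": msg["From"],
        "subject": msg["Subject"],
        "in_reply_to": msg["In-Reply-To"],     # None if the header is absent
        "charset": msg.get_content_charset(),  # lower-cased, or None
        "body": msg.get_payload().strip(),     # str for non-multipart messages
    }


SAMPLE_MBOX = (
    "From common-user-return-16923-apmail-hadoop-common-user-archive=hadoop.apache.org@hadoop.apache.org Thu Aug 20 14:02:59 2009\n"
    "In-Reply-To: <597eea000908200259o8e3bd78l385059f2b5d31555@mail.gmail.com>\n"
    "Subject: Re: File Chunk to Map Thread Association\n"
    "From: Tom White <tom@cloudera.com>\n"
    "Content-Type: text/plain; charset=ISO-8859-1\n"
    "\n"
    "Hi Roman,\n"
    "\n"
    "Have a look at CombineFileInputFormat - it might be related to what you are trying to do.\n"
    "\n"
    "Cheers,\n"
    "Tom\n"
    "\n"
    # A second, made-up message so the split has something to separate.
    "From someone@example.org Fri Aug 21 09:00:00 2009\n"
    "From: someone@example.org\n"
    "Subject: A second, made-up message\n"
    "\n"
    "Thanks!\n"
)

messages = [summarize(m) for m in split_mbox(SAMPLE_MBOX)]
```

Note that the lookahead matches “From ” with a trailing space, so the `From:` header inside each message is never mistaken for a message boundary.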
  • 13. What do we need?
  • 14. Just-In-Time Supply Chain
      Traditional: ETL → Data Warehouse → Business Intelligence (slow, expensive, requires expertise)
  • 15. Just-In-Time Supply Chain
      Traditional: ETL → Data Warehouse → Business Intelligence (slow, expensive, requires expertise)
      Just-in-time: Raw Load → Hadoop (“schema on read”) → Spreadsheets + drag ’n drop (fast, economical, self-service)
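“Schema on read” means the raw load keeps lines exactly as they arrived, and a schema (column names plus parsers) is applied only when the data is read; the same raw data can later be reread under a different schema without reloading it. A minimal Python sketch of the idea, assuming comma-delimited lines; the `read_with_schema` helper and the sample rows are hypothetical, loosely based on the profile example. Values that fail to parse become `None` instead of aborting the read, which is one common policy but not the only one.

```python
from datetime import date
from typing import Any, Callable, Iterable, Iterator

# A read-time schema: ordered (column name, parser) pairs.
Schema = list[tuple[str, Callable[[str], Any]]]


def read_with_schema(raw_lines: Iterable[str], schema: Schema,
                     delimiter: str = ",") -> Iterator[dict[str, Any]]:
    """Apply a schema to raw text lines lazily, at read time."""
    for line in raw_lines:
        values = line.rstrip("\n").split(delimiter)
        row: dict[str, Any] = {}
        for i, (name, parse) in enumerate(schema):
            raw = values[i] if i < len(values) else ""
            try:
                # Missing or empty fields become None rather than errors.
                row[name] = parse(raw) if raw != "" else None
            except ValueError:
                row[name] = None  # unparseable value: null it, keep the row
        yield row


# The raw load stores these lines untouched.
RAW = ["10011666,Test user,1983-09-19", "42,Someone,not-a-date", "7"]

# Two different schemas over the same raw lines.
typed = list(read_with_schema(RAW, [("id", int), ("name", str), ("birthday", date.fromisoformat)]))
ids_only = list(read_with_schema(RAW, [("id", str)]))
```

The cost of this flexibility is that every read pays the parsing work, and data-quality problems surface as `None` values at analysis time rather than as load failures.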
  • 16. A “One Stop Shop”: Compressing “Time-To-Insight”
      Raw Load; Spreadsheet; Drag and Drop; Visualization; Hadoop
      Fast, self-service, economical
  • 17. What We Do:
  • 18. Datameer Capabilities
      Seamless Data Integration
        • Wizard-based integration
        • Structured, semi- and unstructured
        • Mash-up anything
        • No complex mappings/schemas
        • Pluggable data integration API
      Powerful Analytics
        • Interactive spreadsheet UI
        • Cleansing, transformation, analysis
        • Over 200 built-in functions
        • Pluggable function API
      Self-Service Dashboards
        • Drag and drop
        • Powerful visualizations
        • Integrate into existing portals
  • 19. Data-Center Ready
  • 20. Demo...
  • 21. Q/A
  • 22. Please Download Our Trial Edition! www.datameer.com
