Hadoop is changing the data warehousing paradigm, with implications for system architectures. View the webinar video recording and download this deck: http://www.senturus.com/resources/is-hadoop-the-demise-of-data-warehousing/.
In this presentation, we examine tools and technologies poised to dramatically alter the fabric of business intelligence in the next few years. We also take a detailed look at solutions to the following challenges: exponential growth in data volumes and velocity is stretching the limits of traditional business intelligence tools and architectures; organizations are having a hard time finding the right (and cost-effective) place to store new data types; and, amidst all this, business users are growing ever more frustrated at not being able to get at the information they need to run their organizations.
Senturus, a business analytics consulting firm, has a resource library with hundreds of free recorded webinars, trainings, demos and unbiased product reviews. Take a look and share them with your colleagues and friends: http://www.senturus.com/resources/.
2. Hear the Recording
This slide deck is part of a recorded webinar. To view the FREE recording of the entire presentation and download the slide deck, go to www.senturus.com/resources/is-hadoop-the-demise-of-data-warehousing/
Copyright 2014 Senturus, Inc. All Rights Reserved
3. Resource Library
Senturus’ whole purpose is to make you successful with Business Analytics. Thus, we offer a series of technology-neutral webinars, training on specific software, demonstrations, and no-holds-barred reviews of new software releases. We host dozens of live webinars every year and we offer a comprehensive library of recorded webinars, demos, white papers, presentations and case studies on our website--a wealth of learning resources. Most of our content is custom created and constantly updated, so visit us often to see what’s new in the industry.
www.senturus.com/resources/
4. Today’s Presenter
John Peterson, CEO & Co-Founder, Senturus
With thanks to: Guy Wilnai, Sujee Maniyam and Knowledge @ Senturus
5. AGENDA
•INTRODUCTION
•THE DATA CHALLENGE
•WHAT IS HADOOP?
•ADVANTAGES & CHALLENGES
•IMPLICATIONS, PREDICTIONS & MISC. MUSINGS
•CONCLUSIONS
•Q&A
8. SENTURUS: BUSINESS ANALYTICS CONSULTANTS
Creating Clarity from Chaos
•Business Intelligence
•Enterprise Planning
•Predictive Analytics
Our Team: Business depth combined with technical expertise. Former CFOs, CIOs, Controllers, Directors, BI Managers
9. A FEW OF OUR TEAM MEMBERS (FORMER ROLES)
Deep & Pragmatic Experience
•Former Head of BI / Lead Architect – VISA
•Former Chief BI Architect – Jamba Juice
•Former Head of BI – Dole
•Former Chief BI Architect – Cisco
•Former Chief BI Architect – Central Garden & Pet
•Former Head of BI – Experian
•Former Head of BI – Robert Half International
•Former Head of Training (IBM Cognos, Southern California)
•Former Controller – The GAP
•Two former CFOs
•Former Partner – PWC ($50 million+ projects)
•Several former Vice Presidents of Marketing, Sales & Manufacturing/Supply Chain
•Several former COOs
•Several former CIOs
•Average experience = over 20 years
10. 750+ CLIENTS, 1600+ PROJECTS, 13+ YEARS
12. THE CHALLENGES (AND OPPORTUNITIES)
•Data volumes & velocity increasing exponentially
•Data types proliferating
•Rapid emergence of less structured (or unstructured) data sources
•Value of data increasing
•Traditional ETL is time-consuming and costly
•Traditional storage costs skyrocketing (not $/TB)
•Business users increasingly frustrated at not being able to get access to information
14. A WARNING ABOUT TODAY’S FOCUS
IS ABOUT:
Hadoop as a potential platform or tool for Business Analytics & DW
IS NOT ABOUT:
Yet another “How Big Data will change the world” paradigm-shift prediction
19. WHAT IS HADOOP REALLY?
•Hadoop is an open source distributed storage and processing framework
•Hadoop vs. RDBMS (layer: Typical RDBMS / Hadoop Stack)
–Storage: Database Tables / HDFS Files*
–Metadata: System Tables / HCatalog & YARN
–Queries: SQL Query Engine / Multiple Engines
•Typical RDBMS: all layers combined in a proprietary bundle
•Hadoop Stack: all layers separate and independent, allowing flexible access
*Raw data to highly structured
22. HADOOP STACK DISTRIBUTIONS
Distribution | Open Source | Premium
Apache | Y | N
Cloudera | Y | Y
Hortonworks | Y | N
MapR | Y (?) | Y
Intel | N | Y
EMC Greenplum HD | N | Y
23. ADVANTAGES OF HADOOP (FOR BI)
•Dramatically lower cost
–50x to 100x (or more)
•Can store virtually any data type
•Can support multiple analytic engines
•Massively scalable
–Both size and performance
–100s of nodes, TB of RAM, PB of storage
•Open source leads to rapid innovation
24. HADOOP OFFERS COST-EFFECTIVE STORAGE
“A recent survey of large financial services firms, telecommunications carriers and retailers indicated that storing data in an RDBMS typically runs between $30,000 and $100,000 (USD) per TB per year in total costs”
– Cloudera white paper
•Hadoop can bring down the cost to ~$1,000 / TB
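The two figures above imply a rough cost ratio. A minimal arithmetic sketch in Python, taking the quoted RDBMS range and the ~$1,000 per TB Hadoop figure at face value; the variable names are illustrative, and treating the Hadoop figure as an annual cost is an assumption, since the slide does not state the period:

```python
# Figures quoted on the slide (USD per TB). The annual basis for the
# Hadoop figure is an assumption; the slide only says "~$1,000 / TB".
rdbms_cost_per_tb_low = 30_000
rdbms_cost_per_tb_high = 100_000
hadoop_cost_per_tb = 1_000

ratio_low = rdbms_cost_per_tb_low / hadoop_cost_per_tb
ratio_high = rdbms_cost_per_tb_high / hadoop_cost_per_tb

print(f"RDBMS costs roughly {ratio_low:.0f}x to {ratio_high:.0f}x more per TB")
```

On these inputs the ratio ranges from 30x to 100x, which brackets the "50x to 100x" headline on the advantages slide.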
27. COST CASE STUDY (TELECOM)
•The carrier’s previous data processing environment was costing $59 million (USD) each year to manage 1PB of data, broken down as follows:
–$2 million (USD) per year = storage for 1PB raw archive data on network-attached storage (NAS) at $2,000 per TB per year
–$55 million (USD) per year = management and backup of 1PB processed data on EDW at $55,000 per TB per year
–$2 million (USD) per year = administration costs calculated at $1,000 per TB per year
•Calculating costs for moving data processing onto Cloudera, the carrier reduced infrastructure costs to $5.1 million (USD) total
–$5 million (USD) per year = hardware, software and infrastructure for 1PB at $5,000 per TB per year
–$100,000 (USD) per year = administration costs calculated at $100 per TB per year
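The case-study totals can be checked with a few lines of arithmetic. The sketch below uses the annual line totals exactly as stated on the slide and assumes 1 PB = 1,000 TB, which is what the storage and EDW per-TB rates imply. Note that the "before" administration line ($2 million) does not match its stated rate of $1,000 per TB at 1,000 TB; the sketch keeps the stated total so that the sum agrees with the $59 million headline. Dictionary keys are illustrative labels.

```python
TB_PER_PB = 1_000  # assumption: 1 PB treated as 1,000 TB

# Annual line totals as stated on the slide (USD).
before = {
    "NAS raw archive storage": 2_000_000,
    "EDW management and backup": 55_000_000,
    "administration": 2_000_000,
}
after = {
    "hardware, software and infrastructure": 5_000_000,
    "administration": 100_000,
}

total_before = sum(before.values())
total_after = sum(after.values())
savings = total_before - total_after
reduction_factor = total_before / total_after

print(f"Before: ${total_before:,}  After: ${total_after:,}")
print(f"Annual savings: ${savings:,} ({reduction_factor:.1f}x reduction)")

# Cross-check the stated per-TB rates that do match their line totals.
assert 55_000 * TB_PER_PB == before["EDW management and backup"]
assert 5_000 * TB_PER_PB == after["hardware, software and infrastructure"]
```

The reduction works out to a little under 12x for this particular carrier, a smaller multiple than the per-TB storage comparison because the Cloudera figure includes hardware, software and infrastructure.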
28. HADOOP CAN STORE ANY DATA TYPE
•Key-value pairs
•Text and binary data
•Structured
–Database records
•Semi-structured
–Sensor & Machine data
–Log files
•Unstructured
–Emails, tweets
“Set structure at query time”
Can retain atomic level data
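“Set structure at query time” can be illustrated without any Hadoop API. The sketch below is plain Python over hypothetical sensor records: the raw lines are kept untouched at the atomic level, and each query decides which fields it wants when it reads them. The function name `read_with_schema` and the sample data are assumptions made for illustration.

```python
import json

# Raw records kept exactly as they arrived (hypothetical sample data).
raw_lines = [
    '{"ts": "2014-05-01T10:00:00", "device": "sensor-1", "temp_c": 21.5}',
    '{"ts": "2014-05-01T10:01:00", "device": "sensor-2", "temp_c": 19.0, "humidity": 40}',
    '{"ts": "2014-05-01T10:02:00", "device": "sensor-1", "temp_c": 22.5}',
]

def read_with_schema(lines, fields):
    """Apply a schema at read time: keep only the requested fields,
    filling None where a record does not carry one."""
    for line in lines:
        record = json.loads(line)
        yield {name: record.get(name) for name in fields}

# Query 1: average temperature per device.
temps = {}
for row in read_with_schema(raw_lines, ["device", "temp_c"]):
    temps.setdefault(row["device"], []).append(row["temp_c"])
avg_temp = {device: sum(v) / len(v) for device, v in temps.items()}

# Query 2: a different structure over the same raw data, decided later.
humidity_rows = [
    row for row in read_with_schema(raw_lines, ["ts", "humidity"])
    if row["humidity"] is not None
]

print(avg_temp)
print(humidity_rows)
```

Because nothing was discarded at load time, the second query can use a field (`humidity`) that the first query never anticipated.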
29. ANALYTICS IN HADOOP
•‘Batch’ or ‘offline’ analytics
–MapReduce-based tools (Java MapReduce, Streaming, Pig, Hive)
–Have been there from the start; well understood
•Fast ad hoc querying
–New wave of processing, answer to MPP databases (Teradata, etc.)
–Impala (Cloudera), Stinger / Tez (Hortonworks), Shark on Spark (Apache)
•Streaming / near-real-time workloads
–Storm, Spark
–Propelled by YARN processing framework in Hadoop version 2.x
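For readers new to the batch model behind the MapReduce-based tools above, here is a minimal in-process simulation of its three steps (map, shuffle, reduce) as a word count. It is plain Python and does not use Hadoop or any of its APIs; the function names and sample input are illustrative assumptions.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)

lines = ["hadoop stores data", "hadoop processes data in batch"]  # sample input

mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(counts)
```

In a real cluster the mappers and reducers run on many nodes and the shuffle moves data between them, which is why this style suits large offline jobs better than interactive queries.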
30. ANALYTICS IN HADOOP (CONT.)
•BI tools integration
–Rich BI tool integration
–Various levels of integration (basic, native, high-speed)
–Lots of vendors: Datameer, Pentaho, Tableau, QlikView, IBM Cognos…
•NoSQL store
–Find data very quickly (milliseconds, just like a traditional database)
–HBase
•Statistical tools
–R
•And, of course, the old favorite
–SQL
–Example: InfiniDB (Calpont)
31. CHALLENGES OF HADOOP
•Everything is very NEW
•Playing field is changing DAILY
–The Wild West
•Tools still in v1.0 mode (at best)
•Does not eliminate the need for dimensional modeling
•Security TBD
•No “standard” (winners) declared yet
•Lots of rough edges still
•Simple things, like surrogate keys…
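Surrogate keys show why simple things get harder: a warehouse RDBMS typically hands out keys from a single sequence, which has no obvious counterpart when rows are processed in parallel on many nodes. One possible workaround, sketched here in plain Python rather than any Hadoop API, is to give each worker its own interleaved key range. The function and parameter names are hypothetical, and the sketch assumes the data is partitioned so that a given natural key only ever appears on one worker.

```python
def assign_surrogate_keys(natural_keys, worker_id, num_workers):
    """Assign integer surrogate keys to natural keys on one worker.

    Keys are drawn from worker_id, worker_id + num_workers, ... so that
    workers never collide and need no shared sequence.
    """
    mapping = {}
    next_key = worker_id
    for natural_key in natural_keys:
        if natural_key not in mapping:  # repeated natural keys reuse their key
            mapping[natural_key] = next_key
            next_key += num_workers
    return mapping

# Two workers, each holding a disjoint partition of customer codes.
worker0 = assign_surrogate_keys(["CUST-A", "CUST-B", "CUST-A"], worker_id=0, num_workers=2)
worker1 = assign_surrogate_keys(["CUST-C", "CUST-D"], worker_id=1, num_workers=2)

print(worker0)
print(worker1)
```

The resulting keys are unique but not dense or ordered across workers, which is usually acceptable for a surrogate key.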
32. A DIZZYING FIELD OF PLAYERS
•Alpine Data Labs, San Mateo, CA.
•Cloudera, Palo Alto, CA.
•Concurrent, San Francisco, CA.
•Continuum Analytics, Austin, TX.
•Continuuity, Palo Alto, CA.
•Couchbase, Mountain View, CA.
•Datameer, San Mateo, CA.
•DataSift, San Francisco, CA.
•DataStax, San Francisco, CA.
•DataXu, Boston, MA.
•Enigma, New York, NY.
•Factual, Los Angeles, CA.
•GoodData, San Francisco, CA.
•Gravity, New York, NY.
•Guavus, San Mateo, CA.
•Hadapt, Cambridge, MA
•Hopper, Cambridge, MA.
•Hortonworks, Palo Alto, CA.
•KarmaSphere, Cupertino, CA
•Lattice Engines, San Mateo, CA.
•MapR Technologies, San Jose, CA.
•MemSQL, New York, NY.
•Mortar Data, New York, NY.
•Mu Sigma, Northbrook, IL + India.
•Neo Technology, San Mateo, CA
•Opera Solutions, San Diego, CA + India.
•ParAccel, Campbell, CA.
•Pivotal Software, Palo Alto, CA
•Platfora, San Mateo, CA.
•RainStor, San Francisco, CA.
•Rocket Fuel, Redwood City, CA.
•SiSense, Redwood Shores, CA and Israel.
•Skytree, Atlanta, GA.
•Splice Machine, San Francisco, CA.
•Splunk, San Francisco, CA
•Statwing, San Francisco, CA.
•SumAll, New York, NY.
•Talend, Los Altos, CA.
•WibiData, San Francisco, CA.
•Zettaset, Mountain View, CA
•Zoomdata, Reston, VA.
•10gen, New York, NY
•1010data, New York, NY.
Partial snapshot as of May 2014
35. IMPLICATIONS, PREDICTIONS & MUSINGS
•Hadoop as a Data Staging environment
•Hadoop as an Archive
•Hadoop as the Data Warehouse
–“Enterprise Data Hub”
•Future role of RDBMSs?
–For OLTP
–For Data Warehouse
•How much Transformation and where?
36. TYPICAL “BEST PRACTICES” BI ARCHITECTURE: INTEGRATED BUSINESS PROCESS DIMENSIONAL MODELS WITH METADATA LAYER(S)
[Architecture diagram; labels only: Source Systems of Record (ERP Data, CRM Data, Other Sources); Data Integration; Conforming; Data Warehouse; Business Process Dimensional Models; Single Version of the Truth; Data Abstraction Model; Information Security; Web Portal; Standard Reports; Ad hoc Querying; Self-service Reporting & Analysis; Data Slicing & Dicing; Planning; Dashboards/Scorecards; Dashboard Authoring; Report Authoring; Threshold Alerting; Threshold-based Alerts]
37. POTENTIAL BI ARCHITECTURE USING HADOOP: INTEGRATED BUSINESS PROCESS DIMENSIONAL MODELS WITH METADATA LAYER(S)
[Architecture diagram; labels only: Source Systems of Record (ERP Data, CRM Data, Other Sources); Hadoop Data Staging; Data Integration; Conforming; Data Warehouse; Business Process Dimensional Models; Single Version of the Truth; Data Abstraction Model; Information Security; Web Portal; Standard Reports; Ad hoc Querying; Self-service Reporting & Analysis; Data Slicing & Dicing; Planning; Dashboards/Scorecards; Dashboard Authoring; Report Authoring; Threshold Alerting; Threshold-based Alerts]
38. IMPLICATIONS, PREDICTIONS & MUSINGS (CONT.)
•What have I got to learn?
–MapReduce = No
–Hand-coding = No
–Sqoop = Maybe
–SQL = YES
•Role of existing tools going forward
–ETL
–BI front-ends
•Role of DW appliances?
–HANA
–IBM PureData System (formerly Netezza), etc.
39. IMPLICATIONS, PREDICTIONS & MUSINGS (CONT.)
•What is the impact on end-users seeking information?
•We still need:
–Data delivered in business user-friendly state
–Rich, relevant and conforming dimensions
–Ability to account for dimension changes over time
–Good performance (transformation and aggregation)
–Ability to integrate with existing systems
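“Ability to account for dimension changes over time” is commonly handled in dimensional modeling by keeping one row per version of a dimension member, each with a validity period (often called a Type 2 slowly changing dimension). The slides do not name a technique, so the following is only one possible sketch, in plain Python with hypothetical function names and sample data.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # conventional "still current" end date

def apply_change(dimension_rows, natural_key, new_attrs, effective):
    """Close the current row for natural_key if its attributes changed,
    then append a new current row. Returns the new row, or None if unchanged."""
    current = next(
        (r for r in dimension_rows
         if r["natural_key"] == natural_key and r["valid_to"] == OPEN_END),
        None,
    )
    if current is not None:
        if current["attrs"] == new_attrs:
            return None
        current["valid_to"] = effective
    new_row = {
        "surrogate_key": len(dimension_rows) + 1,
        "natural_key": natural_key,
        "attrs": dict(new_attrs),
        "valid_from": effective,
        "valid_to": OPEN_END,
    }
    dimension_rows.append(new_row)
    return new_row

def lookup(dimension_rows, natural_key, as_of):
    """Return the row that was in effect for natural_key on the given date."""
    for r in dimension_rows:
        if r["natural_key"] == natural_key and r["valid_from"] <= as_of < r["valid_to"]:
            return r
    return None

customers = []
apply_change(customers, "CUST-A", {"region": "West"}, date(2013, 1, 1))
apply_change(customers, "CUST-A", {"region": "East"}, date(2014, 3, 1))

print(lookup(customers, "CUST-A", date(2013, 6, 1))["attrs"])
print(lookup(customers, "CUST-A", date(2014, 6, 1))["attrs"])
```

Facts recorded in mid-2013 resolve to the West version of the customer and facts from mid-2014 to the East version, so historical reports stay consistent after the change.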
42. JP’S CONCLUSION #3
DW Architectures & Technologies are in a huge state of flux
But…
DW Principles still apply
44. ADDITIONAL RESOURCES
•Cloudera & Ralph Kimball
–Best Practices for the Hadoop Data Warehouse: EDW 101 for Hadoop Professionals
–http://www.cloudera.com/content/cloudera/en/resources/library/recordedwebinar/best-practices-for-the-hadoop-data-warehouse-video.html
–Building a Hadoop Data Warehouse: Hadoop 101 for EDW Professionals
–http://www.cloudera.com/content/cloudera/en/resources/library/recordedwebinar/building-a-hadoop-data-warehouse-video.html
•MapR & Jack Norris
–How (and Why) Hadoop is Changing the Data Warehousing Paradigm
–http://tdwi.org/articles/2013/08/13/hadoop-changing-dw-paradigm.aspx
•Hortonworks
–http://hortonworks.com/hadoop/
•Senturus.com
–http://senturus.com/resources/
–jpeterson@senturus.com or jfrazier@senturus.com
Contact us for help on a POC
46. More Information on www.senturus.com