Modern Data Warehouse
Stephen Alex
BI & Big Data Architect
AGENDA
• History and Milestones
• Traditional Data Warehouse
• Key trends breaking the traditional data warehouse
• Modern Data Warehouse
• Massively parallel processing (MPP) architecture
• Hadoop Ecosystem
• Technical Innovation on Hadoop
• Big Data Value Assessment
Rolta AdvizeX Confidential & Proprietary 9/11/2016
History and Milestones
 1970’s: Relational Model Invented
 1984: DB2 released, RDBMS declared mainstream
 1990: RDBMS takes over
The Traditional Data Warehouse
• Central repository for all internal data in a company.
• Overall relational schema.
• Predictable data structure and quality, optimized for processing and reporting.
• Data is stored in disk-block format.
• The fundamental operation is reading a row.
• Indexing via B-trees.
• Dynamic row-level locking.
• Data transfer usually happens at end of day (EOD).
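
The row-oriented access pattern listed above can be illustrated with a minimal Python sketch. The table contents and the `read_row` helper are hypothetical, and a sorted list searched with `bisect` stands in for a B-tree here: both locate a key in a logarithmic number of steps, but a real B-tree is a balanced multi-way tree laid out in disk pages.

```python
import bisect

# Hypothetical row store: each row is (customer_id, name, balance).
rows = [
    (1003, "Carol", 75.0),
    (1001, "Alice", 120.5),
    (1002, "Bob", 33.25),
]

# Index: sorted (key, row_position) pairs. This sorted list is a stand-in
# for a B-tree, not an implementation of one.
index = sorted((row[0], pos) for pos, row in enumerate(rows))
keys = [key for key, _ in index]


def read_row(customer_id):
    """Return the whole row for customer_id, or None if it is absent."""
    i = bisect.bisect_left(keys, customer_id)
    if i < len(keys) and keys[i] == customer_id:
        return rows[index[i][1]]
    return None
```

A lookup such as `read_row(1002)` returns the complete row, which matches the "read a row" operation a traditional warehouse is built around.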
Key Trends Breaking The Traditional Data Warehouse
Key Related Business and IT Trends
• Emerging Technologies are disruptive by nature and play a key role in driving digital business and the related business trends.
• Business Ecosystems enable each of the business trends, and organizations are aggressively searching for ways to leverage the role they play in the business ecosystem.
• Business Moments provide opportunities to capture value by setting in motion a series of events and actions involving a network of people, businesses and things that spans or crosses multiple industries and business ecosystems.
• Digital Economics seeks to harvest value from across the business ecosystem by identifying business moments of opportunity and exploiting the economics of connections. This early-stage trend will have increasing importance as business models evolve to leverage algorithmic business.
• Algorithmic Business propels organizations to leverage business algorithms to drive value in the business ecosystem. In this early-stage trend, we are starting to see organizations transforming data with algorithms to drive intelligent actions, particularly with the IoT.
The Risks of Bottlenecks in Data Movement
Hadoop Changes the Game
• Storage and Compute on One Platform
Modern Data Warehouse
• Incorporates Hadoop, traditional data warehouses, and other data stores.
• Includes multiple repositories that may reside in different locations.
• Includes data from cloud, mobile devices, sensors, and the Internet of Things.
• Includes structured, semi-structured, and unstructured raw data.
• Runs on inexpensive commodity hardware in cluster mode.
Massively parallel processing (MPP) architecture
• Massively parallel processing (MPP) architecture enables powerful distributed computing at scale.
• Resources can be added for near-linear scale-out, up to the largest data warehousing projects.
• MPP uses a “shared-nothing” architecture: there are multiple physical nodes, each running its own instance. This can yield performance many times faster than traditional architectures.
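
The shared-nothing idea can be sketched in a few lines of Python. This is an illustrative sketch only: the node count, the modulo partitioning rule, and the sales data are assumptions, and the "nodes" are plain lists processed one after another, whereas a real MPP system runs them in parallel on separate machines.

```python
from collections import defaultdict

NUM_NODES = 4  # hypothetical cluster size


def node_for(key, num_nodes=NUM_NODES):
    """Assign a row to a node by its integer key (simple modulo partitioning)."""
    return key % num_nodes


def distribute(rows, num_nodes=NUM_NODES):
    """Split rows so that each node holds a disjoint slice of the data."""
    nodes = [[] for _ in range(num_nodes)]
    for store_id, amount in rows:
        nodes[node_for(store_id, num_nodes)].append((store_id, amount))
    return nodes


def local_sum(node_rows):
    """Work done independently on one node: total sales per store."""
    totals = defaultdict(float)
    for store_id, amount in node_rows:
        totals[store_id] += amount
    return dict(totals)


def mpp_total_by_store(rows, num_nodes=NUM_NODES):
    """Run local_sum on every node, then merge the partial results."""
    merged = {}
    for node_rows in distribute(rows, num_nodes):
        # All rows for a given store live on one node, so keys never collide.
        merged.update(local_sum(node_rows))
    return merged


sales = [(1, 10.0), (2, 5.0), (1, 2.5), (7, 4.0), (2, 1.0)]
totals = mpp_total_by_store(sales)
```

Because every row for a given store lands on the same node, each node can finish its aggregate without consulting the others, and adding nodes spreads the same work over more machines.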
Apache Hadoop Ecosystem
• Hadoop ecosystem components are part of Apache Software Foundation projects.
• The components are categorized into file system and data store, serialization, job execution, and others, as shown in the image.
Hadoop / BDD Ecosystem

| Technology | Purpose |
|---|---|
| Hadoop Distributed File System (HDFS) | Distributed file system that provides high-throughput access to application data. Data is split into blocks and distributed across multiple nodes in the cluster. |
| Hadoop YARN | Framework for job scheduling/monitoring and cluster resource management. |
| Hive | Facilitates ad hoc queries over data stored in HDFS. Uses HiveQL, a SQL-like language. Provides a relational view of data stored in HDFS. |
| HCatalog | Built on the Hive Metastore; provides a table and storage management layer for Hadoop. |
| Spark | Powers a stack of high-level tools including Spark SQL, MLlib for machine learning, GraphX, and Spark Streaming. |
| Pig | High-level platform for creating MapReduce programs. BDD uses Pig to manipulate data prior to ingesting via data processing. |
| Oozie | Workflow scheduler system to manage Apache Hadoop jobs. BDD uses Oozie for workflow management (sampling, profiling, enrichment). |
| Sqoop | Tool for efficiently transferring bulk data between Hadoop and structured datastores such as relational databases. |
| Flume | Tool for efficiently collecting, aggregating, and moving large amounts of streaming data into HDFS. |
| ZooKeeper | Centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. |
| Hue | Set of web applications that enable you to interact with a CDH cluster. |
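
The HDFS behavior described in the table (data split into blocks and distributed across nodes) can be sketched as follows. This is a toy model under stated assumptions: the block size is tiny on purpose, the node names are hypothetical, and round-robin placement replaces the rack-aware placement policy that HDFS actually uses.

```python
BLOCK_SIZE = 8   # bytes; tiny for illustration (HDFS blocks are typically 128 MB)
REPLICATION = 2  # copies per block; hypothetical (the HDFS default is 3)
NODES = ["node-a", "node-b", "node-c"]  # hypothetical DataNode names


def split_into_blocks(data, block_size=BLOCK_SIZE):
    """Cut a byte string into fixed-size blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks, nodes=NODES, replication=REPLICATION):
    """Round-robin placement: block i is stored on `replication` consecutive nodes."""
    placement = {}
    for i in range(len(blocks)):
        placement[i] = [nodes[(i + r) % len(nodes)] for r in range(replication)]
    return placement


data = b"a file that is larger than one block"
blocks = split_into_blocks(data)
placement = place_blocks(blocks)
```

Reading the file back means fetching each block from any node that holds a copy and concatenating the blocks in order, which is why losing a single node does not lose data as long as the replication factor is greater than one.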
Top Three Hadoop Vendors
Oracle BDD Technical Innovation on Hadoop
Key Features and Functionality:
Find
• Access a rich, interactive catalog of all data in Hadoop
• Use familiar search and guided navigation to find information quickly
• See data set summaries, user annotations and recommendations
• Provision personal and enterprise data to Hadoop via self-service
Explore
• Visualize all attributes by type
• Sort attributes by information potential
• Assess attribute statistics, data quality and outliers
• Use a scratch pad to uncover correlations between attributes
Transform
• Get the data ready for analytics via intuitive, user-driven data wrangling
• Leverage an extensive library of data transformations and enrichments
• Preview results, undo, commit and replay transforms
• Test on sample data in memory then apply to full data set in Hadoop
Discover
• Join and blend data for deeper perspectives
• Compose project pages via drag and drop
• Use powerful search and guided navigation to ask questions
• See new patterns in rich, interactive data visualizations
Share
• Share projects, bookmarks and snapshots with others
• Build galleries and tell Big Data stories
• Collaborate and iterate as a team
• Publish blended data to HDFS for leverage in other tools
Components of Big Data Discovery
Big Data Value Assessment
Descriptive analytics looks at past performance and explains it by mining historical data for the reasons behind past success or failure. This is traditional BI work.
Predictive analytics answers the question of what will happen. Historical performance data is combined with rules, algorithms, and external data to determine the probable future outcome of an event or the likelihood of a situation occurring.
Prescriptive analytics not only anticipates what will happen and when it will happen, but also why it will happen, and it suggests actions to take in response.
[Figure: analytics levels labeled Descriptive, Predictive, and Prescriptive, ranging from Basic Analytics to Advanced Analytics]
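
The three levels of analytics can be contrasted with a small Python sketch. Everything here is an assumption for illustration: the sales history is made up, the forecast is a plain least-squares trend line rather than any particular product's method, and the reorder rule is a toy stand-in for the action-recommending step commonly associated with prescriptive analytics.

```python
# Hypothetical monthly sales figures (units sold), oldest first.
history = [100.0, 110.0, 120.0, 130.0]


def describe(values):
    """Descriptive: summarize what already happened."""
    return {
        "mean": sum(values) / len(values),
        "min": min(values),
        "max": max(values),
    }


def predict_next(values):
    """Predictive: fit a least-squares line to the history and extend it one step."""
    n = len(values)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(values) / n
    slope = (
        sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
        / sum((x - x_mean) ** 2 for x in xs)
    )
    return y_mean + slope * (n - x_mean)


def prescribe(forecast, stock_on_hand):
    """Prescriptive: turn the forecast into a recommended action (toy rule)."""
    shortfall = forecast - stock_on_hand
    return f"order {shortfall:.0f} units" if shortfall > 0 else "no order needed"


summary = describe(history)
forecast = predict_next(history)
action = prescribe(forecast, stock_on_hand=120)
```

The same historical data feeds all three functions; what changes is the question asked of it, moving from what happened, to what is likely next, to what to do about it.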
Thank You!!!
Stephen Alex
BI & Big Data Architect
(732) 485-0011(m)

Modern data warehouse

  • 1. Modern Data Warehouse Stephen Alex BI & Big Data Architect
  • 2. AGENDA  History and Milestones  Traditional Data Warehouse  Key trends breaking the traditional data warehouse  Modern Data Warehouse  Multiple parallel processing (MPP) architecture  Hadoop Ecosystem  Technical Innovation on Hadoop  Big Data Value Assessment 2Rolta AdvizeX Confidential & Proprietary 9/11/2016
  • 3. History and Milestones  1970’s: Relational Model Invented  1984: DB2 released, RDBMS declared mainstream  1990: RDBMS takes over 3Rolta AdvizeX Confidential & Proprietary 9/11/2016
  • 4. The Traditional Data Warehouse  Central repository for all internal data in a company.  Overall relational schema.  The predictable data structure and quality optimized processing and reporting.  Data is in disk block formatting  Fundamental operation is read a row  Indexing via B-trees  Dynamic row-level locking  Data transfer usually EOD 4
  • 5. Key Trends Breaking the Traditional Data Warehouse
  • 6. Key Related Business and IT Trends
      • Emerging Technologies are disruptive by nature and play a key role in driving digital business and the related business trends.
      • Business Ecosystems enable each of the business trends, and organizations are aggressively searching for ways to leverage the role they play in the business ecosystem.
      • Business Moments provide opportunities to capture value by setting in motion a series of events and actions involving a network of people, businesses and things that spans or crosses multiple industries and business ecosystems.
      • Digital Economics seeks to harvest value from across the business ecosystem by identifying business moments of opportunity and exploiting the economics of connections. This early-stage trend will have increasing importance as business models evolve to leverage algorithmic business.
      • Algorithmic Business propels organizations to leverage business algorithms to drive value in the business ecosystem. In this early-stage trend, we are starting to see organizations transforming data with algorithms to drive intelligent actions, particularly with the IoT.
  • 7. The Risks of Bottlenecks in Data Movement
  • 8. Hadoop Changes the Game
      • Storage and compute on one platform
  • 9. Modern Data Warehouse
      • Incorporates Hadoop, traditional data warehouses, and other data stores.
      • Includes multiple repositories that may reside in different locations.
      • Includes data from the cloud, mobile devices, sensors, and the Internet of Things.
      • Includes structured, semi-structured, unstructured, and raw data.
      • Runs on inexpensive commodity hardware in cluster mode.
  • 10. Massively parallel processing (MPP) architecture
      • Massively parallel processing (MPP) architecture enables extremely powerful distributed computing and scale.
      • Resources can be added for a near-linear scale-out to the largest data warehousing projects.
      • MPP architecture uses a “shared-nothing” design: there are multiple physical nodes, each running its own instance. This results in performance many times faster than traditional architectures.
  • 11. Apache Hadoop Ecosystem
      • Hadoop ecosystem components are Apache Software Foundation projects.
      • The components are categorized into file system and data store, serialization, job execution, and others, as shown in the slide image.
  • 12. Hadoop / BDD Ecosystem (Technology: Purpose)
      • Hadoop Distributed File System (HDFS): Distributed file system that provides high-throughput access to application data. Data is split into blocks and distributed across multiple nodes in the cluster.
      • Hadoop YARN: Framework for job scheduling/monitoring and cluster resource management.
      • Hive: Facilitates ad hoc queries over data stored in HDFS. Uses HiveQL, which is a SQL-like language. Provides a relational view of data stored in HDFS.
      • HCatalog: HCatalog (aka Hive Metastore) provides a table and storage management layer for Hadoop.
      • Spark: Powers a stack of high-level tools including Spark SQL, MLlib for machine learning, GraphX, and Spark Streaming.
      • Pig: A high-level platform for creating MapReduce programs. BDD uses Pig to manipulate data prior to ingesting via data processing.
  • 13. Hadoop / BDD Ecosystem (Technology: Purpose)
      • Oozie: The workflow scheduler system to manage Apache Hadoop jobs. BDD uses Oozie for workflow management (sampling, profiling, enrichment).
      • Sqoop: Tool for efficiently transferring bulk data between Hadoop and structured datastores such as relational databases.
      • Flume: Tool for efficiently collecting, aggregating and moving large amounts of streaming data into HDFS.
      • ZooKeeper: A centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
      • Hue: A set of web applications that enable you to interact with a CDH cluster.
  • 14. Top Three Hadoop Vendors
  • 15. Oracle BDD: Technical Innovation on Hadoop. Key Features and Functionality:
      • Find
          • Access a rich, interactive catalog of all data in Hadoop
          • Use familiar search and guided navigation to find information quickly
          • See data set summaries, user annotation and recommendations
          • Provision personal and enterprise data to Hadoop via self-service
      • Explore
          • Visualize all attributes by type
          • Sort attributes by information potential
          • Assess attribute statistics, data quality and outliers
          • Use a scratch pad to uncover correlations between attributes
      • Transform
          • Get the data ready for analytics via intuitive, user-driven data wrangling
          • Leverage an extensive library of data transformations and enrichments
          • Preview results, undo, commit and replay transforms
          • Test on sample data in memory, then apply to the full data set in Hadoop
      • Discover
          • Join and blend data for deeper perspectives
          • Compose project pages via drag and drop
          • Use powerful search and guided navigation to ask questions
          • See new patterns in rich, interactive data visualizations
      • Share
          • Share projects, bookmarks and snapshots with others
          • Build galleries and tell Big Data stories
          • Collaborate and iterate as a team
          • Publish blended data to HDFS for leverage in other tools
  • 16. Components of Big Data Discovery
  • 17. Big Data Value Assessment
      • Descriptive analytics examines past performance by mining historical data for the reasons behind past success or failure; this is traditional BI work.
      • Predictive analytics answers the question of what will happen. Historical performance data is combined with rules, algorithms, and external data to determine the probable future outcome of an event or the likelihood of a situation occurring.
      • Prescriptive analytics not only anticipates what will happen and when it will happen, but also why it will happen.
      • Figure labels: Basic Analytics to Advanced Analytics; Descriptive, Predictive, Prescriptive.
  • 18. Thank You! Stephen Alex, BI & Big Data Architect, (732) 485-0011 (m). 9/11/2016. Rolta AdvizeX Proprietary and Confidential
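
As a rough illustration of the "shared-nothing" MPP design described on slide 10, the sketch below scatters rows across nodes by hashing a key, lets each node aggregate only its own slice, and then gathers the partial results. Everything in it is an assumption made for illustration: the function names (`node_for`, `distribute`, `local_sum`, `mpp_sum`), the CRC32-based hash distribution, and the sample `sales` rows do not come from the slides, and a real MPP engine would run each node on separate hardware in parallel rather than in a single loop.

```python
from collections import defaultdict
from zlib import crc32


def node_for(key: str, num_nodes: int) -> int:
    """Pick the node that owns a key (hypothetical hash distribution)."""
    return crc32(key.encode("utf-8")) % num_nodes


def distribute(rows, num_nodes):
    """Scatter (key, amount) rows so each node holds a disjoint slice."""
    nodes = [[] for _ in range(num_nodes)]
    for key, amount in rows:
        nodes[node_for(key, num_nodes)].append((key, amount))
    return nodes


def local_sum(partition):
    """Work one node does on its own data, with no shared state."""
    totals = defaultdict(int)
    for key, amount in partition:
        totals[key] += amount
    return dict(totals)


def mpp_sum(rows, num_nodes):
    """Scatter rows, aggregate per node, then gather the partial results."""
    partials = [local_sum(part) for part in distribute(rows, num_nodes)]
    merged = {}
    for partial in partials:
        # A key always hashes to the same node, so partials never overlap.
        merged.update(partial)
    return merged


# Hypothetical sample data: (region, sale amount)
sales = [("east", 10), ("west", 5), ("east", 7), ("north", 3), ("west", 1)]
result = mpp_sum(sales, num_nodes=3)
print(result)
```

Because every row for a given key lands on the same node, the gather step is a plain merge with no cross-node coordination, and changing `num_nodes` changes only how the work is spread, not the totals.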