The Road to Digital Transformation
Dell EMC Cloudera Syncsort ETL Offload Hadoop Solution
December 2016
Armando Acosta, Dell EMC
Sean Anderson, Cloudera
Mark Muncy, Syncsort
Ted Arden, Dell EMC
Dell - Internal Use - Confidential
The digital transformation will cause disruption
Business leaders see a chaotic, uncertain future ahead:
• 48% don’t know what their industry will look like in 3 years
• 78% feel threatened by digital startups
• 45% fear they may become obsolete in 3-5 years
Source: Digital Transformation Index, October 2016. Research by Vanson Bourne & Dell Technologies exploring the implications of digital disruption around the world, how companies are transforming to meet changing customer demands, and business leaders’ plans to succeed in the connected future.
Businesses still have a huge opportunity to get this right
This is how leaders plan to leap ahead:
• 73% say a centralized tech strategy needs to be a priority
• 72% plan to expand their software development capabilities
• 66% are incentivized to invest in IT infrastructure and digital skills leadership
Source: Digital Transformation Index, October 2016 (Vanson Bourne & Dell Technologies).
Leaders agreed the following digital business attributes are imperatives to success:
• Predictively spot new opportunities
• Demonstrate transparency and trust
• Deliver unique and personalized experiences
• Innovate in agile ways
• Operate in real time
Big Data and analytics will be at the core of enabling all these attributes.
Source: Digital Transformation Index, October 2016 (Vanson Bourne & Dell Technologies).
Data-driven organizations are more effective
• 50% greater revenue growth for businesses that leverage data effectively
• But 44% of organizations do not know how to start…
Become data-driven. A journey begins with a single step:
• Align IT / business goals
• Improve operational efficiency
• Transform your organization
Data from Dell Global Technology Adoption Index, November 2015
Align business and IT
Organizational goals: empower end users, control costs, improve outcomes.
Dell helps by:
• Utilizing ALL data to deliver deeper insights and enhanced data-driven decision making
• Reducing TCO and seamlessly integrating with existing investments to enable greater ROI
• Providing secure anywhere, anytime access to data and analytics for improved productivity
Ted Arden, Dell EMC
Traditional tools are not working
• #1 challenge: organizations cite TCO as the biggest obstacle to data integration tools
• 70% of all data warehouses are performance and capacity constrained (Gartner)
• 80%: data integration and transformation drive a majority of EDW capacity
Dell accelerates time to value by lowering data transformation costs and improving performance by augmenting the Enterprise Data Warehouse (EDW). The Dell EMC Cloudera Syncsort ETL Offload Hadoop Solution reduces Hadoop deployment to weeks, so teams can develop Hadoop ETL jobs within hours and become fully productive within days after deployment.
Too many workloads in the EDW: modernize the data pipeline with Hadoop

Traditional data pipeline
Disparate data sources → Data staging tool (extract and load data; clean and parse data) → Enterprise data warehouse + ETL (data transformation jobs) → Business reporting and query
The results:
• Longer data transformation job times
• Not meeting SLAs for business reporting
• Slow ad hoc query
• Too costly to scale (performance and capacity)

Modern data pipeline
Disparate data sources → Hadoop + ETL (data transformation jobs: clean, parse, transform) → Enterprise data warehouse → Business reporting and query
The results:
• Reduced data transformation job times
• Improved SLAs for business reporting
• Fast ad hoc query
• Scales economically (performance and capacity)
Customer value
Solution stack, components and the customer value each delivers:
• Dell Services (reference architecture for ETL offload; PowerEdge R730xd, networking): faster deployment, from months to weeks
• Hadoop distribution (Cloudera 5.9): data management and security
• Data transformation (Syncsort DMX-h version 9.1): convert SQL jobs into native Hadoop execution
• Deployment (business application): build operational efficiency with Hadoop
No other vendor offers this solution.
Dell data solutions drive operational efficiency
• Control costs: reduce data warehouse administrative costs by up to 76%
• Improve productivity: transform data 60% faster for analysis
• Simplify ongoing operations: develop and design complex data transformation jobs up to 54% faster
Dell EMC Cloudera Syncsort ETL Offload Hadoop Solution
Primary use case: scale-out solution to optimize data management, processing and analytics
Solution benefits
• Integrates easily with Hadoop®
• No coding necessary, for easy deployment
• No need for expertise in Apache Pig™, Hive™ and Sqoop™
• Closes the skills gap using Syncsort
Differentiation
• Reduces EDW admin costs by up to 76%¹
• Transforms data 60% faster for analysis²
• Designs transformation jobs up to 54% faster³
Software
• Cloudera™ Enterprise
• Syncsort™ DMX-h™
Data nodes (one of):
• 10x Dell EMC PowerEdge R730xd with 3.5” drives – 48 TB, or
• 10x PowerEdge R730xd with 2.5” drives – 24 TB, or
• 20x PowerEdge FC630 / FD332 – 32 TB
Infrastructure nodes (one of):
• 1x Dell EMC PowerEdge™ R630 admin node, 3x PowerEdge R730xd name nodes and 1x PowerEdge R730xd edge node, or
• 1x PowerEdge FC630 admin node, 3x PowerEdge FC630 name nodes and 1x PowerEdge FC630 edge node
Pod network
• 2x Dell EMC Networking S4048 10GbE pod switches
• 1x S3124 iDRAC switch
Cluster network
• 2x Dell EMC Networking S6000 40GbE cluster switches
¹ Cost advantages report
² Performance advantages report
³ Design advantages report
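The data-node options above can be compared with a quick capacity estimate. The sketch below assumes that each quoted TB figure is raw disk per data node (the slide does not say) and that HDFS keeps its default replication factor of 3 (`dfs.replication`). It ignores temporary/shuffle space and OS overhead, so treat the result as an upper bound. The FC630/FD332 option is left out because its per-node mapping is not clear from the slide.

```python
# Rough HDFS capacity estimate for the reference-architecture data-node options.
# ASSUMPTION: each TB figure on the slide is raw disk per node; replication=3 is the
# HDFS default (dfs.replication). Temp/shuffle space and OS overhead are ignored.
from dataclasses import dataclass


@dataclass(frozen=True)
class DataNodeOption:
    name: str
    nodes: int
    raw_tb_per_node: float

    @property
    def raw_tb(self) -> float:
        """Total raw disk across all data nodes."""
        return self.nodes * self.raw_tb_per_node


def usable_tb(option: DataNodeOption, replication: int = 3) -> float:
    """Upper bound on unique data the cluster can hold after HDFS replication."""
    return option.raw_tb / replication


OPTIONS = [
    DataNodeOption("10x R730xd, 3.5-inch drives", 10, 48.0),
    DataNodeOption("10x R730xd, 2.5-inch drives", 10, 24.0),
]

if __name__ == "__main__":
    for opt in OPTIONS:
        print(f"{opt.name}: {opt.raw_tb:.0f} TB raw, ~{usable_tb(opt):.0f} TB usable")
```

Under these assumptions the 3.5-inch option gives 480 TB raw and at most about 160 TB of unique data, and the 2.5-inch option gives 240 TB raw and about 80 TB.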
Operational Efficiency: From use case to action
Source → 1. Connect → 2. Analyze → 3. Act
Use cases:
• Preventive maintenance
• IT resource capacity and utilization
• Operational process improvement
• Business process cost optimization
• Cyber security analytics
• Improved forecasting
• Compliance and reporting
Components: operational data sources; extract, transform, load (ETL); enterprise data warehouses; relational database management systems; data marts; business reporting and query
ETL operations: parse, clean, translate, sort, aggregate, group, compute
Services • Management • Infrastructure • Security • Dell Financial Services
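The ETL operations listed above can be pictured as a chain of small steps. The sketch below runs parse, clean, translate, sort, group and compute over a few CSV records in a single Python process. The sample readings, field names and the `SITE_NAMES` code table are hypothetical, and this is only an illustration of the steps, not how DMX-h or Hadoop executes them.

```python
# Minimal sketch of an ETL chain: parse -> clean -> translate -> sort/group -> compute.
# Sample data and the SITE_NAMES code table are hypothetical.
import csv
import io
from collections import defaultdict

RAW = """site,device,reading
plant-a,pump-1, 42.0
plant-a,pump-2,
plant-b,pump-1,17.5
plant-a,pump-1,38.0
"""

# Hypothetical code table used by the "translate" step.
SITE_NAMES = {"plant-a": "Plant A", "plant-b": "Plant B"}


def parse(text):
    """Parse CSV text into a list of dictionaries (one per record)."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Drop rows with a missing reading and convert readings to float."""
    out = []
    for row in rows:
        value = row["reading"].strip()
        if value:
            out.append({**row, "reading": float(value)})
    return out


def translate(rows):
    """Map source site codes to business-friendly names."""
    return [{**row, "site": SITE_NAMES.get(row["site"], row["site"])} for row in rows]


def group_and_aggregate(rows):
    """Sort for a deterministic order, group by (site, device), compute the mean reading."""
    groups = defaultdict(list)
    for row in sorted(rows, key=lambda r: (r["site"], r["device"])):
        groups[(row["site"], row["device"])].append(row["reading"])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}


if __name__ == "__main__":
    summary = group_and_aggregate(translate(clean(parse(RAW))))
    for (site, device), avg in summary.items():
        print(f"{site} {device}: {avg:.1f}")
```

Keeping each step a separate function mirrors the slide's list: any step can be swapped or reordered without touching the others.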
Sean Anderson, Cloudera
© Cloudera, Inc. All rights reserved.
Traditional Monolithic Analytic Databases
• No cloud elasticity or cloud storage integration
• Rigid data model with tightly coupled storage and compute
• Limited to SQL, with data movement necessary
• Static sizing
Challenges Across the Business
Enterprise Architect: Existing systems hitting their limits
• How long does it take to bring in more data/use cases? And what would the cost be?
• What is your process for scaling today?
• What is your plan for cloud?
IT/DBA: Missed SLAs and an overloaded bottleneck
• How much time do you spend troubleshooting vs. developing new uses?
• How long does it take to deliver on business requests?
SQL Developer & Business Analyst: Limited data and insights of latent value
• What limits on users, data, and time period exist?
• How long does it take to get new reports/data?
• Are you able to run actionable real-time analysis?
Security Team & Data Steward: Meet compliance needs and protect data
• How do you manage siloed security and governance across workloads and systems?
• Is sensitive data available for analysis?
Cloudera’s Analytic Database Solution
• Navigator Optimizer: identify, offload, and optimize workloads to Hadoop
• Hue: intelligent SQL editor
• Navigator: audit, lineage, encryption, key management, and policy lifecycles
• BI Partners: integration with the leading BI tools
• Impala: interactive query engine for BI and SQL analytics
• Hive-on-Spark: large-scale ETL and batch processing engine
Multi-storage, multi-environment
The DCC Rule
• Duplication: improve performance by easily detecting workload duplication and recommending the top queries to optimize
• Compatibility: reduce development time by leveraging existing query compatibility with Hadoop tools, and get guidance for query rewrites
• Complexity: maximize your optimization opportunities by exposing complex access patterns that make the best use of Hadoop’s architecture
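The duplication check can be approximated by fingerprinting queries: replace literals with placeholders, normalize case and whitespace, then count how often each query shape occurs. The sketch below is an illustrative heuristic using only the Python standard library (`re`, `collections.Counter`); it is an assumption about how such detection might work, not the algorithm Navigator Optimizer uses. Regex normalization also misses cases a real SQL parser would catch, such as reordered predicates.

```python
# Illustrative workload-duplication heuristic (not Navigator Optimizer's algorithm):
# queries that differ only in literal values, case or whitespace share one fingerprint.
import re
from collections import Counter


def fingerprint(sql: str) -> str:
    """Normalize a SQL string into a literal-free, lower-case query shape."""
    text = sql.strip().rstrip(";")
    text = re.sub(r"'(?:[^']|'')*'", "?", text)    # string literals -> ?
    text = re.sub(r"\b\d+(?:\.\d+)?\b", "?", text)  # numeric literals -> ?
    text = re.sub(r"\s+", " ", text)                # collapse runs of whitespace
    return text.lower()


def top_duplicates(queries, n=3):
    """Return the n most frequent query shapes with their occurrence counts."""
    return Counter(fingerprint(q) for q in queries).most_common(n)
```

For example, `SELECT * FROM sales WHERE region = 'EAST' AND year = 2016;` and `select *  from sales where region = 'WEST' and year = 2015` both reduce to `select * from sales where region = ? and year = ?`, so they count as one repeated shape worth optimizing once.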
Cloudera Navigator Optimizer
Unlock Your Best Hadoop Strategy, Instantly
Active data optimization for Hadoop to save you time and money:
• Instant workload insights
• Intelligent optimization guidance
• Reduced Hadoop workload development effort
Mark Muncy, Syncsort
Goals of the Modern Data Architecture
• Centralize all your data
Collect raw data from every source within the enterprise, regardless of complexity. Only when you can collect and retain all your data can you see the full picture.
• Turn raw data into insight
Cleanse, blend and transform your data, and give it context and meaning so decision makers can act on it.
• Maintain governance, compliance and security standards
Increase consistency and confidence in decision making by preserving the confidentiality, integrity and availability of information. Protect data from unauthenticated and unauthorized access.
• Eliminate complexities within IT
Your Modern Data Architecture should automate and optimize your data needs, keep pace with the evolution of technology, and homogenize platforms and infrastructures.
Syncsort Confidential and Proprietary - do not copy or distribute
Shift Data and ELT Workloads out of Data Warehouses
Simplify Big Data Integration with Syncsort
• Access: Get best-in-class data ingestion capabilities for Hadoop: mainframes, RDBMS, MPP, JSON, Parquet, Avro, ORC, NoSQL, Kafka and more.
• Integrate: Single interface for streaming and batch processes. Single data pipeline for all enterprise data, batch or streaming.
• Comply: Secure data access, data governance and lineage. Seamless integration with Kerberos, Apache Ranger, Apache Ambari, Cloudera Manager, Cloudera Navigator and Sentry.
• Simplify: Design once, deploy anywhere, and insulate your organization from the rapidly changing ecosystem. Future-proof your applications for new compute frameworks, on premises or in the cloud.
Access: Bring ALL Enterprise Data Securely to the Data Lake
• Collect virtually any data, from mainframe to relational, cloud and NoSQL sources
• Batch and streaming sources
• Access, reformat and load data directly into Hive and Parquet; no staging required
• Pull hundreds of tables at once into your data hub: whole DB schemas in one invocation
• Load more data into Hadoop in less time
Build Your Enterprise Data Hub
Access: Get Your Database Data into Hadoop at the Press of a Button
• Pull multiple data sources and funnel them into your data lake: extract and move whole DB schemas in one invocation
• One-step data movement, with auto-generated jobs
• Process multiple funnels in parallel on your edge node or from data nodes
‒ Leverages the DMX-h high-speed data engine via DTL
‒ Generated applications can be imported into the GUI
• In-flight transformations
‒ Filtering, funnel dependency ordering, mixed source/target, data type filtering, table exclusion/inclusion
DMX DataFunnel™
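Table inclusion/exclusion and funnel dependency ordering can be pictured as two small steps: select tables by pattern, then order them so referenced tables load before the tables that depend on them. The sketch below is hypothetical: the glob-style patterns, the `depends_on` map and the function names are illustrative assumptions, not DataFunnel syntax. It uses the standard-library `fnmatch.fnmatchcase` and `graphlib.TopologicalSorter` (Python 3.9+).

```python
# Hypothetical illustration of schema-level table selection and dependency ordering.
# Not DataFunnel's syntax or API; patterns and dependency map are assumptions.
from fnmatch import fnmatchcase
from graphlib import TopologicalSorter


def select_tables(tables, include=("*",), exclude=()):
    """Keep tables matching any include pattern and no exclude pattern, in input order."""
    return [
        t for t in tables
        if any(fnmatchcase(t, p) for p in include)
        and not any(fnmatchcase(t, p) for p in exclude)
    ]


def load_order(tables, depends_on):
    """Order selected tables so that referenced (parent) tables are loaded first.

    depends_on maps a table to the tables it references; references to tables that
    were not selected are ignored. Raises graphlib.CycleError on circular references.
    """
    selected = set(tables)
    graph = {t: {d for d in depends_on.get(t, ()) if d in selected} for t in tables}
    return list(TopologicalSorter(graph).static_order())
```

For example, selecting `SALES.*` while excluding `*_TMP` from a schema keeps the real sales tables and drops staging copies; if `SALES.ORDERS` references `SALES.CUSTOMERS`, the customers table is moved first.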
Integrate: Achieve the Fastest Path from Raw Data to Insight
• Prepare data on the fly
• Load into Hadoop without staging
• Write directly into Big Data formats (Parquet, Hive, etc.)
• Connect fast to NoSQL databases (Cassandra, HBase, etc.)
• Cloud connectivity: Amazon AWS, Google Cloud Platform, Microsoft Azure
• Get the fastest, most efficient data joins and sorts
• Dynamic planning/optimization at runtime
• Create Tableau and QlikView files with one click
• Fastest parallel loads to Amazon Redshift, Greenplum, Netezza, Oracle, Teradata and Vertica
Feed Business Intelligence Visualization
Integrate: Single Interface for Streaming & Batch
A single tool for designing both streaming and batch jobs
• Kafka, Spark, Apache NiFi, HDF
• Combine legacy batch and cutting-edge streaming data sources
• Easy development in a GUI: no need to write Scala, C or Java code
Simplify Streaming Data Integration
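The idea of one design serving both batch and streaming can be sketched by writing the transformation once as a per-record function and handing it to two runners: one that processes a complete dataset, and one that yields results as records arrive from an iterator (standing in for a source such as a Kafka topic). The names `enrich`, `run_batch` and `run_stream` and the record fields are hypothetical; this is not Syncsort's API.

```python
# Sketch of "design once, run as batch or stream": the business logic is a single
# per-record function reused by two runners. All names here are hypothetical.
from typing import Callable, Iterable, Iterator, Optional

Record = dict
Transform = Callable[[Record], Optional[Record]]


def enrich(record: Record) -> Optional[Record]:
    """Shared logic: drop events with no quantity and derive an order total."""
    if record.get("qty", 0) <= 0:
        return None
    return {**record, "total": record["qty"] * record["price"]}


def run_batch(records: Iterable[Record], fn: Transform) -> list:
    """Batch mode: process the whole dataset and return the full result set."""
    return [out for out in map(fn, records) if out is not None]


def run_stream(source: Iterator[Record], fn: Transform) -> Iterator[Record]:
    """Streaming mode: emit each result as soon as its input record arrives."""
    for record in source:
        out = fn(record)
        if out is not None:
            yield out
```

Because both runners call the same `enrich`, a change to the business rule is made once and applies to both the batch and the streaming job.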
Comply: Secure, Manage & Monitor Your Cluster
• Kerberos-secured clusters
– Authenticated browsing
– Authenticated sampling
• Apache Sentry security certified
• Cloudera Manager
– Deploy DMX-h across cluster
– Monitor DMX-h jobs
Comply: Get Governance, Metadata and Lineage
• Metadata and data lineage for Hive, Avro and
Parquet through HCatalog
• Metadata lineage export from DMX
– Simplify audits, analytics dashboards, metrics
– Integrate with enterprise metadata repositories
• Cloudera Navigator certified integration
– Extends HCatalog metadata
– HDFS, YARN, Spark and other metadata
– Lineage, tagging
– Business and structural metadata
Simplify: Design Once, Deploy Anywhere
• Use existing ETL skills
• No need to worry about mappers, reducers, the big or small side of joins, and so on
• Automatic optimization for best performance, load balancing, etc.
• No changes or tuning required, even if you change execution frameworks
• Future-proof job designs for emerging compute frameworks, e.g. Spark
Single GUI. Execute anywhere!
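The "big or small side of joins" point refers to a classic tuning decision: in a hash join, the in-memory hash table should be built on the smaller input while the larger input is streamed past it. The sketch below is a generic single-process hash join in Python that makes this choice automatically by comparing input sizes. It illustrates the idea only and is not how DMX-h, MapReduce or Spark implement joins.

```python
# Generic hash join that picks the smaller input as the build side automatically.
# Illustrative only; not DMX-h's implementation.
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Dict[str, object]


def hash_join(left: List[Row], right: List[Row], key: str) -> List[Tuple[Row, Row]]:
    """Inner join on `key`; pairs are always returned as (left_row, right_row)."""
    build_is_left = len(left) <= len(right)  # build the hash table on the smaller side
    build, probe = (left, right) if build_is_left else (right, left)

    table: Dict[object, List[Row]] = defaultdict(list)
    for row in build:  # build phase: hash every row of the small side
        table[row[key]].append(row)

    result = []
    for row in probe:  # probe phase: stream the large side once
        for match in table.get(row[key], ()):
            result.append((match, row) if build_is_left else (row, match))
    return result
```

The caller never states which side is small; the function decides, which is the kind of detail the slide says the tool hides from the developer.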
Intelligent Execution: Insulate your organization from the underlying complexities of Hadoop.
Using the Dell | Cloudera | Syncsort solution for Hadoop, an entry-level technician developed and deployed Hadoop ETL jobs in 53.7% less time than a Hadoop expert.
Simplify: Reclaim days of valuable time
Use cases compared:
• Load: fact dimension load with type 2 SCD
• Validate: data validation and pre-processing
• Int.: vendor mainframe file integration
Source: http://en.community.dell.com/techcenter/blueprints/m/resources
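The first use case is a fact dimension load with a Type 2 slowly changing dimension (SCD): when a tracked attribute changes, the current dimension row is closed out and a new version is appended instead of overwriting the old value. The sketch below shows that logic in plain Python over in-memory rows. The column names (`customer_id`, `city`, `valid_from`, `valid_to`, `is_current`) are hypothetical, and a production job would run this against warehouse or Hive tables rather than Python lists.

```python
# Minimal Type 2 SCD merge over in-memory rows. Column names are hypothetical.
from datetime import date
from typing import Dict, List


def scd2_apply(dim: List[dict], updates: Dict[int, str], as_of: date) -> List[dict]:
    """Apply city changes keyed by customer_id as Type 2 SCD versions.

    Changed keys: the current row is closed (valid_to=as_of, is_current=False) and a
    new current row is appended. New keys get a first version. Unchanged keys are
    left alone. The input list is not mutated.
    """
    out = [dict(row) for row in dim]  # copy so the caller's dimension is untouched
    current = {row["customer_id"]: row for row in out if row["is_current"]}
    for customer_id, city in updates.items():
        old = current.get(customer_id)
        if old is not None and old["city"] == city:
            continue  # attribute unchanged: no new version needed
        if old is not None:
            old["valid_to"] = as_of  # expire the previous version
            old["is_current"] = False
        out.append({
            "customer_id": customer_id,
            "city": city,
            "valid_from": as_of,
            "valid_to": None,
            "is_current": True,
        })
    return out
```

History is preserved because old rows are never deleted: a query "as of" any date can select the row whose `valid_from`/`valid_to` range contains it.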
Cut Development Time in Half! (8.3 days vs. 3.8 days)
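Assuming 8.3 days is the Hadoop expert's development time and 3.8 days the entry-level technician's (the chart labels are not preserved), the reduction works out as follows. The 53.7% quoted earlier falls within the range these one-decimal figures allow, so the day values are presumably rounded from more precise measurements that are not given here.

```latex
\[
\frac{8.3 - 3.8}{8.3} = \frac{4.5}{8.3} \approx 0.542,
\qquad
\frac{3.8}{8.3} \approx 0.458
\]
```

That is, the technician needed roughly 46% of the expert's time, a reduction of about 54%, which matches the "in half" headline and the "up to 54% faster" design claim.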
Thank You
38 of 22
The Path to Digital Transformation

The Path to Digital Transformation

  • 1.
    The Road toDigital Transformation Dell EMC Cloudera Syncsort ETL Offload Hadoop Solution December 2016
  • 2.
    Armando Acosta Dell EMC SeanAnderson Cloudera Mark Muncy Syncsort Ted Arden Dell EMC
  • 3.
    Dell - InternalUse - Confidential3 of 123 of 22 The digital transformation will cause disruption 48% don’t know what their industry will look like in 3 years 78% feel threatened by digital startups 45% fear they may become obsolete in 3-5 years Business leaders see a chaotic, uncertain future ahead Source: Digital Transformation Index, October, 2016 Research by Vanson Bourne & Dell Technologies exploring the implications of digital disruption around the world, how companies are transforming to meet changing customer demands and business leaders’ plans to succeed in the connected future.
  • 4.
    Dell - InternalUse - Confidential4 of 12 Businesses still have a huge opportunity to get this right 73% say a centralized tech strategy needs to be a priority 72% plan to expand their software development capabilities 66% are incentivized to invest in IT infrastructure and digital skills leadership This is how leaders plan to leap ahead Source: Digital Transformation Index, October, 2016 Research by Vanson Bourne & Dell Technologies exploring the implications of digital disruption around the world, how companies are transforming to meet changing customer demands and business leaders’ plans to succeed in the connected future. 4 of 22
  • 5.
    Dell - InternalUse - Confidential5 of 12 Leaders agreed the following digital business attributes are imperatives to success Source: Digital Transformation Index, October, 2016 Research by Vanson Bourne & Dell Technologies exploring the implications of digital disruption around the world, how companies are transforming to meet changing customer demands and business leaders’ plans to succeed in the connected future. Predictively spot new opportunities Demonstrate transparency and trust Deliver unique and personalized experiences Innovate in agile ways Operate in real time Big Data and Analytics will be at the core to enabling all these attributes 5 of 22
  • 6.
    Dell - InternalUse - Confidential6 of 12 Data-driven organizations are more effective greater revenue growth for businesses that leverage data effectively 50% But 44% Become data-driven. A journey begins with a single step. Align IT / Business goals Improve operational efficiency Transform your organization of organizations do not know how to start… Data from Dell Global Technology Adoption Index, November 2015 6 of 22
  • 7.
    Dell - InternalUse - Confidential7 of 12 Align business and IT Dell helps by Utilizing ALL data to deliver deeper insights and enhanced data-driven decision making. Organizational goals . . . Empower end Users Control costs Improve outcomes S Reducing TCO and seamlessly integrating with existing investments to enable greater ROI Providing secure anywhere, anytime access to data and analytics for improved productivity. 7 of 22
  • 8.
    Ted Arden, DellEMC 8 of 22
  • 9.
    Dell - InternalUse - Confidential9 of 12 Traditional tools are not working #1 Challenge Organizations cite TCO as biggest obstacle to data integration tools Dell accelerates time to value by lowering data transformation costs & improve performance by augmenting the Enterprise Data Warehouse (EDW) Dell EMC Cloudera Syncsort ETL Offload Hadoop Solution reduces Hadoop deployment to weeks, develop Hadoop ETL jobs within hours, and become fully productive within days after deployment of all Data Warehouses are performance and capacity constrained *Gartner 70% Data integration and transformation drive a majority of the EDW capacity 80% 9 of 98
  • 10.
    Dell - InternalUse - Confidential10 of 12 Too many workloads in the EDW Modernize the data pipeline with Hadoop Traditional data pipeline Enterprise data warehouse + ETL Data transformation jobs Business reporting Query Data staging tool Extract and load data Clean and parse data Disparate data sources The results Longer data transformation job times Not meeting SLAs for business reporting Slow Ad Hoc Query Too costly to scale Perf Capacity 10 of 98 Modern data pipeline Enterprise data warehouse Business reporting Query Hadoop + ETL Data transformation jobs Clean, parse, transform Disparate data sources The results Reduced data transformation job times Improved SLAs for business reporting Fast Ad Hoc Query Scales Economically Perf Capacity
  • 11.
    Dell - InternalUse - Confidential11 of 12 Customer value Dell Services Reference Architecture ETL Offload PE R730XD, Networking Solution stack Components Customer value Faster deployment from months to weeks Hadoop Distribution Cloudera 5.9 Data management and security Data Transformation Syncsort DMX-h version 9.1 Convert SQL jobs into native Hadoop execution Deployment business application Build operational efficiency with Hadoop No other vendor offers this solution 11 of 98
  • 12.
    Dell - InternalUse - Confidential12 of 12 Dell data solutions drive operational efficiency Reduce data warehouse administrative costs up to 76% Control costs Transform data 60% faster for analysis Improve productivity Develop and design complex data transformation jobs up to 54% faster Simplify ongoing operations 12 of 98
  • 13.
    Dell - InternalUse - Confidential13 of 12 Dell EMC Cloudera Syncsort ETL offload Hadoop Solution Solution benefits • Integrates easily with Hadoop® • No coding necessary for easy deployment • No need for expertise on Apache Pig™, Hive™, and Sqoop™ • Closes the skills gap using Syncsort Differentiation • Reduces EDW admin costs up to 76%1 • Transforms data 60 percent faster for analysis2 • Designs transformation jobs up to 54% faster3 Primary use case: Scale out solution to optimize data management, processing and analytics Pod Network 2x Dell EMC Networking S4048 10GbE Pod Switches 1x S3124 iDRAC Switch Data Nodes 10x Dell EMC PowerEdge R730xd with 3.5 Drives – 48 TB or 10x PowerEdge R730xd with 2.5” Drives – 24TB or 20x PowerEdge FC630 / FD332 – 32 TB Infrastructure Nodes 1x Dell EMC PowerEdge™ R630 Admin Node 3x PowerEdge R730xd Name Nodes 1x PowerEdge R730xd Edge Node or 1x PowerEdge FC630 Name Nodes Admin Node 3x PowerEdge FC630 Name Nodes 1x PowerEdge FC630 Edge Node Cluster Network 2x Dell EMC Networking S6000 40GbE Cluster Switches Cloudera ™ Enterprise Syncsort™ DMX-h™ 1Cost advantages report 2Performance advantages report 3Design advantages report 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42Stack-ID LNK 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 ACT 50 52 5433 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 51 53 Stack-ID LNK 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 ACT 50 52 5433 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 51 53 120124 112116 104108 96100 8892 8084 7276 6468 5660 4852 4044 3236 2428 1620 812 04 Stack ID 120124 112116 104108 96100 8892 8084 7276 6468 5660 4852 4044 3236 2428 1620 812 04 Stack ID Stack No. 
1 2 25 26SFP+ 3 5 7 9 11 4 6 8 10 12 13 15 17 19 21 14 16 18 20 22 24 LNK ACT1 2 23 LNK ACT COMBO PORTS 23 24 6 720 1 14 1512 1310 118 9 22 2320 2118 1916 17 6 720 1 14 1512 1310 118 9 22 2320 2118 1916 17 6 720 1 14 1512 1310 118 9 22 2320 2118 1916 17 13 of 22
  • 14.
    Dell - InternalUse - Confidential14 of 12 Operational Efficiency: From use case to action Source 1. Connect 3. Act2. Analyze Preventive Maintenance IT Resource Capacity and Unitization Operational Process Improvement Business Process Cost Optimization Cyber Security Analytics Improved Forecasting Compliance and Reporting Operational data sources Extract, transform load Business reporting and query Enterprise data warehouse Enterprise data warehouse Relational management database Relational Management database Data mart Data mart Services • Management • Infrastructure • Security • Dell Financial Services Parse Clean Translate Sort Aggregate Group Compute + Data 14 of 22
  • 15.
  • 16.
    16© Cloudera, Inc.All rights reserved. Traditional Monolithic Analytic Databases No Cloud Elasticity or Cloud Storage Integration Rigid Data Model with Tightly Coupled Storage/Compute Limited to SQL with Data Movement Necessary Static Sizing ∞ COMPUTE STORE
  • 17.
    17© Cloudera, Inc.All rights reserved. Challenges Across the Business Enterprise Architect Existing Systems Hitting Their Limits • How long does it take to bring in more data/use cases? And what would the cost be? • What is your process for scaling today? • What is your plan for cloud? Missed SLAs & Overloaded Bottleneck • How much time do you spend troubleshooting vs developing new uses? • How long does it take to deliver on business requests? Limited Data & Insights of Latent Value • What limits on users, data, and time period exist? • How long does it take to get new reports/data? • Are you able to run actionable real-time analysis? Meet Compliance Needs & Protect Data • How do you manage siloed security & governance across workloads and systems? • Is sensitive data available for analysis? IT/DBA Security Team & Data Steward SQL Developer & Business Analyst
  • 18.
    18© Cloudera, Inc.All rights reserved. Cloudera’s Analytic Database Solution Identify, offload, & optimize workloads to Hadoop Navigator Optimizer Intelligent SQL editor Hue Audit, lineage, encryption, key management, & policy lifecycles Navigator Integration with the leading BI tools BI Partners Interactive query engine for BI & SQL analytics Impala Large-scale ETL & batch processing engine Hive-on- Spark Multi-Storage, Multi-Environment
  • 19.
    19© Cloudera, Inc.All rights reserved. The DCC Rule D C C Complexity Maximize your optimization opportunities by exposing complex access patterns that make the best use of Hadoop’s architecture Compatibility Reduce development time by leveraging existing query compatibilities with Hadoop tools and get guidance for query rewrites Duplication Improve performance by easily detecting workload duplication and recommending top queries to optimize
  • 20.
    20© Cloudera, Inc.All rights reserved. Cloudera Navigator Optimizer Unlock Your Best Hadoop Strategy, Instantly Active Data Optimization for Hadoop to save you time and money • Instant workload insights • Intelligent optimization guidance • Reduce Hadoop workload development effort
  • 21.
  • 22.
  • 23.
    Goals of theModern Data Architecture • Centralize all your data Collect raw data from every source from within the enterprise, regardless of complexity. Only when you are able to collect and retain all your data, you can see the full picture. • Turn raw data into insight Cleanse, blend and transform your data, give it context and meaning so decision makers can execute. • Maintain governance, compliance and security standards Increase consistency and confidence in decision making by preserving the confidentiality, integrity and availability of information. Protect data from unauthenticated and unauthorized access. • Eliminate complexities within IT Your Modern Data Architecture should automate and optimize your data needs, keep pace with the evolution of technology, and homogenize platforms and infrastructures. 23Syncsort Confidential and Proprietary - do not copy or distribute
  • 24.
    Shift Data andELT Workloads out of Data Warehouses 24Syncsort Confidential and Proprietary - do not copy or distribute
  • 25.
    Simplify Big DataIntegration with Syncsort 25Syncsort Confidential and Proprietary - do not copy or distribute Access Integrate Comply Simplify Get best in class data ingestion capabilities for Hadoop. Mainframes, RDBMS, MPP, JSON, Parquet, Avro, ORC, NoSQL, Kafka and more. Single interface for streaming and batch processes. Single data pipeline for all enterprise data, batch or streaming. Secure data access, data governance and lineage. Seamless integration with Kerberos, Apache Ranger, Apache Ambari, Cloudera Manager, Cloudera Navigator and Sentry. Design once, deploy anywhere & insulate your organization from rapidly changing eco- system. Future proof your applications for new compute frameworks, on premise or in the cloud.
  • 26.
    Simplify Big DataIntegration with Syncsort 26Syncsort Confidential and Proprietary - do not copy or distribute Access Get best in class data ingestion capabilities for Hadoop. Mainframes, RDBMS, MPP, JSON, Parquet, Avro, ORC, NoSQL, Kafka and more.
  • 27.
    Access: Bring ALLEnterprise Data Securely to the Data Lake • Collect virtually any data from mainframe to relational, cloud and NoSQL sources • Batch & streaming sources • Access, re-format and load data directly into Hive & Parquet. No staging required! • Pull hundreds of tables at once into your data hub, whole DB schemas in one invocation • Load more data into Hadoop in less time 27Syncsort Confidential and Proprietary - do not copy or distribute Build Your Enterprise Data Hub
  • 28.
    Access: Get YourDatabase data into Hadoop, At the Press of a Button • Pull multiple data sources and funnel into your data lake -- extract and move whole DB schemas in one invocation • One-step data movement, auto-generating jobs • Process multiple funnels in parallel on your edge node or from data nodes ‒ Leverages DMX-h high speed data engine via DTL ‒ Generated applications can be imported into GUI • In-flight transformations ‒ Filtering, funnel dependency ordering, mixed source/target, data type filtering, table exclusion/inclusion 28Syncsort Confidential and Proprietary - do not copy or distribute DMX DataFunnel™
  • 29.
    Simplify Big Data Integration with Syncsort
    • Access: Get best-in-class data ingestion capabilities for Hadoop: mainframes, RDBMS, MPP, JSON, Parquet, Avro, ORC, NoSQL, Kafka and more.
    • Integrate: A single interface for streaming and batch processes, and a single data pipeline for all enterprise data, batch or streaming.
  • 30.
    Integrate: Achieve the Fastest Path from Raw Data to Insight
    Feed Business Intelligence Visualization
    • Prepare data on the fly
    • Load into Hadoop without staging
    • Write directly into big data formats (Parquet, Hive, etc.)
    • Connect fast to NoSQL databases (Cassandra, HBase, etc.)
    • Cloud connectivity: Amazon AWS, Google Cloud Platform, Microsoft Azure
    • Get the fastest, most efficient data joins and sorts
    • Dynamic planning and optimization at runtime
    • Create Tableau & QlikView files with one click
    • Fastest parallel loads to Amazon Redshift, Greenplum, Netezza, Oracle, Teradata & Vertica
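"Prepare data on the fly" and "load without staging" amount to transforming records as they stream from source to target, instead of first landing them in an intermediate copy. A minimal sketch, assuming in-memory CSV text as stand-ins for both the source and a Hadoop target; the column names and cleaning rules are hypothetical, and this is not how DMX-h jobs are defined:

```python
import csv
import io
from typing import Iterable, Iterator, TextIO


def extract(source: TextIO) -> Iterator[dict]:
    # Read source rows lazily; nothing is written to an intermediate staging file
    yield from csv.DictReader(source)


def transform(rows: Iterable[dict]) -> Iterator[dict]:
    # Prepare data in flight: drop rows with no amount, normalize the rest
    for row in rows:
        if not row["amount"].strip():
            continue
        yield {"id": int(row["id"]),
               "region": row["region"].strip().upper(),
               "amount": round(float(row["amount"]), 2)}


def load(rows: Iterable[dict], target: TextIO) -> int:
    # Write each prepared row straight to the target; return the row count
    writer = csv.DictWriter(target, fieldnames=["id", "region", "amount"])
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


source = io.StringIO("id,region,amount\n1, east ,10.5\n2,west,\n3,North,7\n")
target = io.StringIO()
loaded = load(transform(extract(source)), target)
print(loaded)  # 2
print(target.getvalue())
```

Because each stage is a generator, only one record is in flight at a time, which is the property that makes a separate staging step unnecessary.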
  • 31.
    Integrate: Single Interface for Streaming & Batch
    Simplify Streaming Data Integration
    A single tool for designing both streaming and batch jobs
    • Kafka, Spark, Apache NiFi, HDF
    • Combine legacy batch and cutting-edge streaming data sources
    • Easy development in the GUI: no need to write Scala, C or Java code
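One way to picture a single job design serving both batch and streaming inputs: the same transformation is applied unchanged to a bounded list and to an unbounded iterator. This is a hypothetical Python sketch; the stream is simulated with a generator rather than read from Kafka, and the threshold and field names are illustrative only.

```python
import itertools
from typing import Callable, Iterable, Iterator


def design_job(threshold: float) -> Callable[[Iterable[dict]], Iterator[dict]]:
    # The job is defined once and works on any iterable, bounded or not
    def job(events: Iterable[dict]) -> Iterator[dict]:
        for e in events:
            if e["value"] >= threshold:
                yield {"sensor": e["sensor"], "alert": True, "value": e["value"]}
    return job


job = design_job(threshold=50)

# Batch: a bounded list, e.g. a nightly extract
batch = [{"sensor": "a", "value": 10}, {"sensor": "b", "value": 75}]
batch_result = list(job(batch))


# Streaming: an endless generator standing in for a message topic (simulated)
def simulated_stream() -> Iterator[dict]:
    for i in itertools.count():
        yield {"sensor": f"s{i % 3}", "value": (i * 37) % 100}


# Take only the first three alerts, since the stream never ends
stream_result = list(itertools.islice(job(simulated_stream()), 3))
print(batch_result)
print(stream_result)
```

The design choice being illustrated is that the job logic never asks whether its input is finite; only the caller decides how much of the input to consume.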
  • 32.
    Simplify Big Data Integration with Syncsort
    • Access: Get best-in-class data ingestion capabilities for Hadoop: mainframes, RDBMS, MPP, JSON, Parquet, Avro, ORC, NoSQL, Kafka and more.
    • Integrate: A single interface for streaming and batch processes, and a single data pipeline for all enterprise data, batch or streaming.
    • Comply: Secure data access, data governance and lineage, with seamless integration with Kerberos, Apache Ranger, Apache Ambari, Cloudera Manager, Cloudera Navigator and Apache Sentry.
  • 33.
    Comply: Secure, Manage & Monitor Your Cluster
    • Kerberos-secured clusters
      – Authenticated browsing
      – Authenticated sampling
    • Apache Sentry security certified
    • Cloudera Manager
      – Deploy DMX-h across the cluster
      – Monitor DMX-h jobs
  • 34.
    Comply: Get Governance, Metadata and Lineage
    • Metadata and data lineage for Hive, Avro and Parquet through HCatalog
    • Metadata lineage export from DMX
      – Simplify audits, analytics dashboards and metrics
      – Integrate with enterprise metadata repositories
    • Cloudera Navigator certified integration
      – Extends HCatalog metadata
      – HDFS, YARN, Spark and other metadata
      – Lineage, tagging
      – Business and structural metadata
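Exported lineage helps with audits because it lets you walk from a target dataset back to the original sources that fed it. The sketch below assumes a hypothetical lineage map (each target mapped to its direct inputs) with made-up dataset names; it is not the DMX or Cloudera Navigator export format, only an illustration of the traversal.

```python
from collections import deque

# Hypothetical lineage records: target dataset -> its direct inputs
LINEAGE = {
    "hive.sales_fact": ["hdfs:/data/orders.parquet", "hive.customer_dim"],
    "hive.customer_dim": ["mainframe:CUST.MASTER"],
    "hdfs:/data/orders.parquet": ["oracle:ERP.ORDERS"],
}


def upstream_sources(dataset: str, lineage: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk upstream; return nodes that have no recorded inputs."""
    seen, sources = {dataset}, set()
    queue = deque([dataset])
    while queue:
        node = queue.popleft()
        inputs = lineage.get(node, [])
        if not inputs and node != dataset:
            sources.add(node)  # an original source, e.g. a mainframe file or RDBMS table
        for parent in inputs:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return sources


print(sorted(upstream_sources("hive.sales_fact", LINEAGE)))
# ['mainframe:CUST.MASTER', 'oracle:ERP.ORDERS']
```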
  • 35.
    Simplify Big Data Integration with Syncsort
    • Access: Get best-in-class data ingestion capabilities for Hadoop: mainframes, RDBMS, MPP, JSON, Parquet, Avro, ORC, NoSQL, Kafka and more.
    • Integrate: A single interface for streaming and batch processes, and a single data pipeline for all enterprise data, batch or streaming.
    • Comply: Secure data access, data governance and lineage, with seamless integration with Kerberos, Apache Ranger, Apache Ambari, Cloudera Manager, Cloudera Navigator and Apache Sentry.
    • Simplify: Design once, deploy anywhere, and insulate your organization from a rapidly changing ecosystem. Future-proof your applications for new compute frameworks, on premises or in the cloud.
  • 36.
    Simplify: Design Once, Deploy Anywhere
    Intelligent Execution: Insulate your organization from the underlying complexities of Hadoop.
    • Use existing ETL skills
    • No need to worry about mappers, reducers, the big side or small side of joins, and so on
    • Automatic optimization for best performance, load balancing, etc.
    • No changes or tuning required, even if you change execution frameworks
    • Future-proof job designs for emerging compute frameworks, e.g. Spark
    Single GUI. Execute Anywhere!
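"No need to worry about the big side or small side of joins" means the engine, not the developer, decides how a join is physically executed. As a hypothetical illustration (a plain in-memory hash join, not DMX-h's optimizer), the function below chooses at runtime which input to hash, so callers get the same result whichever order they pass the inputs in:

```python
from collections import defaultdict
from typing import Iterator


def join(left: list[dict], right: list[dict], key: str) -> Iterator[dict]:
    """Inner hash join that picks its build side at runtime.

    The smaller input is hashed and the larger one is streamed past it.
    """
    build_is_left = len(left) <= len(right)
    build, probe = (left, right) if build_is_left else (right, left)
    table = defaultdict(list)
    for row in build:
        table[row[key]].append(row)
    for p in probe:
        for b in table.get(p[key], []):
            lrow, rrow = (b, p) if build_is_left else (p, b)
            yield {**lrow, **rrow}  # same column precedence whichever side was hashed


customers = [{"cust_id": 1, "name": "Acme"}, {"cust_id": 2, "name": "Globex"}]
orders = [{"cust_id": 1, "order": 100},
          {"cust_id": 2, "order": 101},
          {"cust_id": 1, "order": 102}]

a = list(join(customers, orders, "cust_id"))
b = list(join(orders, customers, "cust_id"))
print(a == b)  # True
```

Hashing the smaller input keeps the in-memory table small; a production engine would estimate input sizes from statistics rather than by counting list elements.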
  • 37.
    Simplify: Reclaim Days of Valuable Time
    Cut Development Time in Half!
    Using the Dell | Cloudera | Syncsort solution for Hadoop, an entry-level technician developed and deployed Hadoop ETL jobs in 53.7% less time than a Hadoop expert: 3.8 days versus 8.3 days.
    [Chart: development time for three jobs (Load: fact dimension load with Type 2 SCD; Validate: data validation and pre-processing; Int.: vendor mainframe file integration). Hadoop expert: 8.3 days; entry-level technician: 3.8 days.]
    Source: http://en.community.dell.com/techcenter/blueprints/m/resources
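The totals on the slide are rounded to one decimal place, so recomputing the saving from them gives about 54.2% rather than exactly 53.7%. A quick check, assuming only that the unrounded totals round to 8.3 and 3.8 days, shows that the stated figure falls within the possible range:

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage by which `after` is smaller than `before`."""
    return (before - after) / before * 100


# Using the rounded totals from the slide (days)
print(round(percent_reduction(8.3, 3.8), 1))  # 54.2

# Range for any totals that round to 8.3 and 3.8 days:
# the smallest saving pairs the shortest expert time with the longest technician time
lo = percent_reduction(8.25, 3.85)
hi = percent_reduction(8.35, 3.75)
print(f"{lo:.1f}% to {hi:.1f}%")  # 53.3% to 55.1%
```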
  • 38.