2. 2
About Me
• Director of Big Data Engineering with Impetus
• Focus on Enterprise data architecture, Data platform solution
deployment, High Performance & Optimization
• Believer that “Data is the most important digital asset”
3. 4
Need For Hybrid Data Platform
• Mixed work-load scenarios on Hadoop
• Applications’ long-tail usage of data platforms
• More time spent on data preparation than on processing
• Time spent on data movement
• Geo-centric data processing and provisioning requirements
• Cost effective solution options
• Untapped scale-up and scale-out capabilities of the Cloud
• Limitations with a physical data center/platform setup
4. 5
Hybrid Data Platform
“Combination of on-premise physical data infrastructure with Cloud
based Big Data platform - to use as one extended, complementary,
scalable data infrastructure”
5. 6
Considerations
• Changes to current architecture
– Impact on on-premise infrastructure
– Impact on business processes
– Data availability and accessibility in the Cloud
• Impact on data exchange policy and procedures
– Data Characteristics – Data at rest & in-motion
– Geographical considerations
• Data Security
• Virtual Cloud Geo-Fencing, Cloud Boundaries
• Investment considerations
– Technology Choices, Maturity and Adoption
6. 7
Hybrid Data Platform Architecture
[Architecture diagram]
• Data sources: Databases; Text Files, Binary Files; Sensitive Data; Other Data Sources
• Platform components: Smart Interface Layer; Security & Access Control; Hadoop On Cloud; On-Premise Hadoop Landing Zone; On-Premise Hadoop Data Lake; Integration Check-point (On-Prem/Cloud); Application Interfaces
• Consumers: 3rd Parties; Analytics; Data Scientists; Business
• Layers: Data Acquisition Layer; Data Integration Layer; Data Provisioning Layer
• Data Governance Layer: User Management; Access Audit and Control; Metadata Management; Data Security Management; BAR Management; DR Management; Workload Management; Key Management; Master Data Management; Data Quality Management; Operations Management
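The slide does not say how the Smart Interface Layer decides where incoming data lands. As a minimal sketch, assuming that sensitive data and data whose home region falls outside an approved list stay in the on-premise landing zone while everything else may go to Hadoop on Cloud, a routing rule could look like this. The `Dataset` type, the `route` function, and the region names are hypothetical and not part of the original deck.

```python
from dataclasses import dataclass

# Target names taken from the architecture slide; the routing rule is an assumption.
ON_PREM_LANDING = "on-premise-hadoop-landing-zone"
CLOUD_HADOOP = "hadoop-on-cloud"


@dataclass(frozen=True)
class Dataset:
    """Hypothetical descriptor for an incoming data source."""
    name: str
    sensitive: bool
    home_region: str


def route(dataset, allowed_cloud_regions):
    """Pick a landing target for a dataset.

    Assumed policy: sensitive data never leaves the data center, and
    non-sensitive data goes to the cloud only if its home region is
    inside the approved geo-fence.
    """
    if dataset.sensitive:
        return ON_PREM_LANDING
    if dataset.home_region not in allowed_cloud_regions:
        return ON_PREM_LANDING
    return CLOUD_HADOOP


if __name__ == "__main__":
    fence = {"eu-west", "us-east"}
    for ds in (
        Dataset("payroll", sensitive=True, home_region="eu-west"),
        Dataset("clickstream", sensitive=False, home_region="us-east"),
        Dataset("sensor-logs", sensitive=False, home_region="ap-south"),
    ):
        print(ds.name, "->", route(ds, fence))
```

Keeping the rule in one small function makes it easy to audit against the data exchange policy and to extend with further checks.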
8. 9
Execution Workflow
[Workflow diagram]
• Source: On-Premise Hadoop Data Lake
• Payload flow: Payload Organizer; Security Checks; Payload Delivery; Transmission Channel; Private, Secured Tunnel
• Cloud security services: Cloud HSM; Identity & Access Management; Key Management Service; Certificate Manager
• Cloud data services: S3 (Data Landing); SNS (Push Notification); SQS (Queue Service); Data Pipeline; Kinesis; EMR/MapReduce; RedShift (Data warehouse); QuickSight
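The workflow names a Payload Organizer and Security Checks before Payload Delivery, but gives no detail on either. One plausible reading, sketched below under that assumption, is that the organizer splits outgoing data into chunks and records checksums in a manifest, so the receiving side can verify integrity after the transfer. The function names and the manifest layout are hypothetical; only the Python standard library is used.

```python
import hashlib


def organize_payload(data, chunk_size):
    """Split data into chunks and build a manifest with SHA-256 checksums."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    manifest = {
        "total_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "chunks": [
            {"index": i, "size": len(c), "sha256": hashlib.sha256(c).hexdigest()}
            for i, c in enumerate(chunks)
        ],
    }
    return chunks, manifest


def verify_payload(chunks, manifest):
    """Return True only if every chunk and the reassembled payload match the manifest."""
    if len(chunks) != len(manifest["chunks"]):
        return False
    for chunk, entry in zip(chunks, manifest["chunks"]):
        if hashlib.sha256(chunk).hexdigest() != entry["sha256"]:
            return False
    return hashlib.sha256(b"".join(chunks)).hexdigest() == manifest["sha256"]


if __name__ == "__main__":
    chunks, manifest = organize_payload(b"abcdefghij", chunk_size=4)
    print(len(chunks), verify_payload(chunks, manifest))
```

In a real deployment the chunks would travel over the secured tunnel to the landing store, and the manifest would be checked there before downstream processing starts.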
9. 10
Data Exchange & Security
[Diagram: Data Center, Direct Connect, Secure Tunnel, VPC; Cloud HSM, Identity & Access Management, Key Management Service, Certificate Manager]
(1) On-premise data center hosts the Hadoop cluster and has connectivity established to the Cloud
(2) Uses the Direct Connect option to connect to the private Cloud setup
(3) Uses a secured VPN tunnel to the dedicated Cloud setup for data exchange
(4) Hadoop on Cloud setup connected with the data center, secured behind a firewall and access restrictions
(5) Role-based access control, process execution privileges, identity management
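The last callout on this slide mentions role-based access control and process execution privileges. A minimal sketch of such a check follows. The role names and the permission sets are illustrative assumptions, not taken from the deck, and a production setup would delegate this to the platform's identity and access management service.

```python
# Hypothetical role-to-permission mapping; adjust to the actual policy.
ROLE_PERMISSIONS = {
    "data_engineer": {"read", "write", "execute"},
    "data_scientist": {"read", "execute"},
    "business_analyst": {"read"},
}


def is_allowed(user_roles, action):
    """Return True if any of the user's roles grants the requested action.

    Unknown roles grant nothing, so access is denied by default.
    """
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)


if __name__ == "__main__":
    print(is_allowed(["business_analyst"], "execute"))
    print(is_allowed(["business_analyst", "data_scientist"], "execute"))
```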
10. 11
Benefits
• Comprehensive Solution Options
– Modular and complementary data management options
• Flexibility
– Meets dynamic business and technology demands
• Performance and Scalability
– Scale up and out
• Best of both worlds
– Play to each platform’s strengths
• Economic$
– Hybrid model provides best of TCO and ROI
11. 12
Case Study
• One of the world’s largest producers of commodities, natural ores, and conventional and unconventional energy resources, with suppliers and consumers as end users of data analytics
• Need to build a Hybrid Data Analytics Environment covering areas such as Productivity, Supply Chain and Operations
• Data to be loaded in less than 20 minutes
• 95% of analytics queries to run in less than 5 seconds
• Highly available environment with both on-premise and Cloud connectivity
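The two performance targets in the case study (data loaded in less than 20 minutes, 95% of queries in less than 5 seconds) can be checked mechanically against measured timings. The sketch below assumes latencies are collected in seconds and load times in minutes; the function names are hypothetical.

```python
def meets_query_sla(latencies_s, limit_s=5.0, required_fraction=0.95):
    """True if at least `required_fraction` of queries finished in under `limit_s` seconds."""
    if not latencies_s:
        raise ValueError("no latency samples provided")
    within = sum(1 for t in latencies_s if t < limit_s)
    return within / len(latencies_s) >= required_fraction


def meets_load_sla(load_minutes, limit_minutes=20.0):
    """True if a data load completed in under `limit_minutes` minutes."""
    return load_minutes < limit_minutes


if __name__ == "__main__":
    # 96 fast queries and 4 slow ones: 96% are within the limit.
    samples = [1.2] * 96 + [7.5] * 4
    print(meets_query_sla(samples), meets_load_sla(18.0))
```

Counting the share of queries under the limit, rather than averaging latencies, matches how the target is worded: a few slow queries are tolerated as long as they stay within the remaining 5%.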