Hybrid Data Platform


Published in: Technology

  1. Shankar Radhakrishnan, Impetus. Hybrid Data Platform: Cloud Environment Connected with On-Premise Data Environment
  2. About Me
     • Director of Big Data Engineering at Impetus
     • Focus on enterprise data architecture, data platform solution deployment, and high performance and optimization
     • Believes that “data is the most important digital asset”
  3. Need for a Hybrid Data Platform
     • Mixed workload scenarios on Hadoop
     • Long-tail usage of data platforms by applications
     • More time spent on data preparation than on processing
     • Time spent on data movement
     • Geo-centric data processing and provisioning requirements
     • Cost-effective solution options
     • Untapped scale-up and scale-out capabilities of the Cloud
     • Limitations of a physical data center/platform setup
  4. Hybrid Data Platform: “A combination of on-premise physical data infrastructure with a Cloud-based Big Data platform, used as one extended, complementary, scalable data infrastructure”
  5. Considerations
     • Changes to current architecture
       – Impact on on-premise infrastructure
       – Impact on business processes
       – Data availability and accessibility in the Cloud
     • Impact on data exchange policy and procedures
       – Data characteristics
       – Data at rest and in motion
       – Geographical considerations
     • Data security
     • Virtual Cloud geo-fencing, Cloud boundaries
     • Investment considerations
       – Technology choices, maturity and adoption
  6. Hybrid Data Platform Architecture (diagram)
     • Data sources: databases, other data sources, sensitive data, text files, binary files
     • Components: Smart Interface Layer, Security & Access Control, Hadoop on Cloud, On-Premise Hadoop Landing Zone, On-Premise Hadoop Data Lake, Application Interfaces, Integration Check-point (On-Prem/Cloud)
     • Consumers: 3rd parties, analytics, data scientists, business
     • Layers: Data Acquisition Layer, Data Integration Layer, Data Provisioning Layer
     • Data Governance Layer: User Management, Access Audit and Control, Metadata Management, Data Security Management, BAR Management, DR Management, Workload Management, Key Management, Master Data Management, Data Quality Management, Operations Management
  7. Data Integration (diagram)
     • Components: Hadoop on Cloud, Job/Task Profiler, On-Premise Hadoop Data Lake, Integration Check-point (On-Prem/Cloud), Data Upload, Workflow Organizer, Payload Organizer
     • Profiles: User Profile, Network Profile, Data Profile
     • Transport: private, secured tunnels with transmission channel security checks
  8. Execution Workflow (diagram)
     • Data movement: On-Premise Hadoop Data Lake, Payload Organizer, private secured tunnel with transmission channel security checks, Payload Delivery
     • Cloud services: S3 (data landing), Data Pipeline, SQS (queue service), Kinesis, EMR/MapReduce, Redshift (data warehouse), QuickSight, SNS (push notification)
     • Security services: CloudHSM, Identity & Access Management, Key Management Service, Certificate Manager
  9. Data Exchange & Security (diagram: Data Center, Direct Connect, Secure Tunnel, VPC; CloudHSM, Identity & Access Management, Key Management Service, Certificate Manager)
     1) The on-premise data center hosts the Hadoop cluster and has connectivity established to the Cloud.
     2) Direct Connect is used to connect to the private Cloud setup.
     3) A secured VPN tunnel to the dedicated Cloud setup is used for data exchange.
     4) The Hadoop on Cloud setup is connected with the data center and secured behind a firewall and access restrictions.
     5) Role-based access control, process execution privileges, and identity management.
  10. Benefits
     • Comprehensive solution options: modular and complementary data management options
     • Flexibility: meets dynamic business and technology demands
     • Performance and scalability: scale up and out
     • Best of both worlds: play to each platform's strengths
     • Economics: the hybrid model provides the best TCO and ROI
  11. Case Study
     • One of the world's largest producers of commodities, natural ores, and conventional and unconventional energy resources, with suppliers and consumers as end users of the data analytics
     • Need to build a hybrid data analytics environment covering areas such as productivity, supply chain and operations
     • Data to be loaded in less than 20 minutes
     • 95% of analytics queries to run in less than 5 seconds
     • Highly available environment with both on-premise and Cloud connectivity
  12. Thank You! @shankariyer
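The Considerations slide lists virtual Cloud geo-fencing and geographical considerations among the data exchange concerns. A minimal sketch of what a geo-fencing check could look like, assuming a hypothetical residency policy that maps each data classification to the locations where it may be stored. The classification names, region names, `Dataset`, and `allowed_to_move` are illustrative assumptions, not part of the deck.

```python
from dataclasses import dataclass

# Hypothetical residency policy (assumption, not from the deck):
# data classification -> locations where that data may be stored.
RESIDENCY_POLICY: dict[str, set[str]] = {
    "sensitive": {"on-premise"},
    "internal": {"on-premise", "eu-west-1"},
    "public": {"on-premise", "eu-west-1", "us-east-1"},
}


@dataclass(frozen=True)
class Dataset:
    name: str
    classification: str


def allowed_to_move(dataset: Dataset, target_location: str) -> bool:
    """Return True if the dataset's classification permits storage at target_location.

    Unknown classifications get an empty allow-list, so they are denied (fail closed).
    """
    allowed = RESIDENCY_POLICY.get(dataset.classification, set())
    return target_location in allowed
```

With this policy, a "sensitive" dataset stays inside the on-premise boundary, while "public" data may be moved to the Cloud regions listed. Failing closed on unknown classifications is a design choice that keeps unlabeled data on-premise until someone classifies it.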
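The Data Integration slide shows a Payload Organizer feeding private, secured tunnels with transmission channel security checks. The deck does not say how payloads are built. The sketch below assumes one plausible approach: greedily packing files into size-bounded payloads, each carrying a SHA-256 manifest that the receiving side can check. `Payload`, `organize_payloads`, `verify_payload`, and the `max_bytes` limit are hypothetical names for illustration.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Payload:
    files: list[str] = field(default_factory=list)
    size: int = 0
    # File name -> SHA-256 hex digest, used by the receiver as an integrity check.
    manifest: dict[str, str] = field(default_factory=dict)


def organize_payloads(files: dict[str, bytes], max_bytes: int) -> list[Payload]:
    """Hypothetical payload organizer: greedily pack files, in order, into
    payloads of at most max_bytes. A file larger than max_bytes gets its own payload."""
    payloads: list[Payload] = []
    current = Payload()
    for name, data in files.items():
        # Start a new payload if adding this file would exceed the limit.
        if current.files and current.size + len(data) > max_bytes:
            payloads.append(current)
            current = Payload()
        current.files.append(name)
        current.size += len(data)
        current.manifest[name] = hashlib.sha256(data).hexdigest()
    if current.files:
        payloads.append(current)
    return payloads


def verify_payload(payload: Payload, received: dict[str, bytes]) -> bool:
    """Receiver-side check: every file in the manifest arrived with a matching digest."""
    return all(
        name in received and hashlib.sha256(received[name]).hexdigest() == digest
        for name, digest in payload.manifest.items()
    )
```

For example, files of 6, 5 and 3 bytes with `max_bytes=10` produce two payloads: the first file alone, then the second and third together (8 bytes). A checksum manifest detects corruption or tampering in transit. It complements the encrypted tunnel but does not replace it.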
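The Execution Workflow slide names the pieces of the pipeline (payload organization, transmission security checks, S3 landing, SQS, EMR, Redshift, SNS) but not their exact sequence. The sketch below assumes one plausible ordering and models it as a linear stage runner that stops at the first failing stage. It makes no AWS calls. Every stage here is a hypothetical placeholder, and the `channel` context key is an assumption.

```python
from typing import Callable

Context = dict[str, object]
Step = Callable[[Context], bool]


def record(name: str) -> Step:
    """Hypothetical placeholder step: logs its name in the context and succeeds."""
    def step(ctx: Context) -> bool:
        ctx.setdefault("log", []).append(name)
        return True
    return step


def security_checks(ctx: Context) -> bool:
    """Hypothetical check: proceed only if the payload travels over the private tunnel."""
    return ctx.get("channel") == "private-tunnel"


# Assumed stage order; the slide lists these components without sequencing them.
STAGES: list[tuple[str, Step]] = [
    ("organize_payload", record("organize_payload")),
    ("security_checks", security_checks),
    ("land_in_s3", record("land_in_s3")),
    ("enqueue_sqs", record("enqueue_sqs")),
    ("process_emr", record("process_emr")),
    ("load_redshift", record("load_redshift")),
    ("notify_sns", record("notify_sns")),
]


def run_workflow(stages: list[tuple[str, Step]], ctx: Context) -> list[str]:
    """Run stages in order and stop at the first one that returns False.

    Returns the names of the stages that completed; on failure, records the
    failing stage under ctx["failed_at"].
    """
    completed: list[str] = []
    for name, step in stages:
        if not step(ctx):
            ctx["failed_at"] = name
            break
        completed.append(name)
    return completed
```

With `{"channel": "private-tunnel"}` all seven stages complete. With any other channel the run halts after `organize_payload`, and nothing reaches the landing zone. Failing early like this keeps data that has not passed the security checks out of the Cloud.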
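Step 5 of the Data Exchange & Security slide calls for role-based access control. A minimal sketch of the core RBAC check follows: users hold roles, roles carry permissions, and a request is granted if any held role carries the requested permission. The role names, permission strings, `User`, and `is_allowed` are hypothetical. In the AWS setup the deck shows, this function would be played by Identity & Access Management policies, not custom code.

```python
from dataclasses import dataclass

# Hypothetical roles and permissions (assumption, not from the deck).
ROLE_PERMISSIONS: dict[str, frozenset[str]] = {
    "data_scientist": frozenset({"read:analytics", "run:notebook"}),
    "data_engineer": frozenset({"read:landing", "write:landing", "run:pipeline"}),
    "business_user": frozenset({"read:dashboards"}),
}


@dataclass(frozen=True)
class User:
    name: str
    roles: frozenset[str]


def is_allowed(user: User, permission: str) -> bool:
    """Grant if any of the user's roles carries the permission.

    Roles missing from ROLE_PERMISSIONS grant nothing.
    """
    return any(
        permission in ROLE_PERMISSIONS.get(role, frozenset())
        for role in user.roles
    )
```

A user holding only `data_scientist` can read analytics but cannot write to the landing zone. A user holding several roles gets the union of their permissions.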
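The Case Study slide sets two measurable targets: data loaded in under 20 minutes, and 95% of analytics queries answered in under 5 seconds. The sketch below shows how measured load times and query latencies could be checked against those targets. The deck reports no measurements, so any inputs are hypothetical. The function names are also illustrative.

```python
# Targets taken from the Case Study slide.
LOAD_TARGET_SECONDS = 20 * 60  # data loaded in less than 20 minutes
QUERY_TARGET_SECONDS = 5.0     # queries run in less than 5 seconds...
QUERY_TARGET_PERCENT = 95      # ...for 95% of the queries


def load_target_met(load_seconds: float) -> bool:
    """True if a data load finished strictly within the 20-minute window."""
    return load_seconds < LOAD_TARGET_SECONDS


def query_target_met(latencies: list[float]) -> bool:
    """True if at least 95% of the queries ran in under 5 seconds."""
    if not latencies:
        raise ValueError("need at least one query latency")
    fast = sum(1 for t in latencies if t < QUERY_TARGET_SECONDS)
    # Integer comparison avoids floating-point rounding at the 95% boundary.
    return fast * 100 >= QUERY_TARGET_PERCENT * len(latencies)


def sla_met(load_seconds: float, latencies: list[float]) -> bool:
    """Both Case Study targets must hold."""
    return load_target_met(load_seconds) and query_target_met(latencies)
```

With 19 of 20 queries under 5 seconds (exactly 95%), the query target is met. With 18 of 20 (90%), it is not. Counting queries that are under the threshold, as here, reads "95% of queries under 5 seconds" directly and avoids choosing among percentile-interpolation conventions.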