BDaaS - Big Data as a Service

BDaaS - Big Data as a Service, by Shreya Pal of Saama. Presented at the DevOps++ Global Summit 2017 (#doppa17). All copyrights are reserved by the author.

Published in: Technology
  • The author should cite the source for slide #3 ("Conflicting Enterprise Needs") in this presentation as Tom Phelan, co-founder and chief architect for BlueData, the leading Big-Data-as-a-Service software platform. Note that the content on this slide was taken verbatim from a slide presented previously by Tom Phelan at multiple events in 2016. See slide 8 in this presentation as an example: https://www.slideshare.net/BDaaSmeetup/deploying-bigdataasaservice-bdaas-in-the-enterprise

  1. Shreya Pal, Chief Architect, Saama, 9-Sep-2017. BDaaS - Big Data as a Service
  2. Content
       • What is BDaaS?
       • Challenges
       • BDaaS layers
       • BDaaS advantages
       • BDaaS enterprise requirements
       • Life sciences case study
     Digital Vortex 2015
  3. Conflicting Enterprise Needs
     Data scientists want flexibility:
       • Different versions (new releases) of Hadoop, Spark, etc.
       • Different sets of BI/analytics tools
     IT wants control:
       • Multitenancy
       • QoS, data access
       • Security
       • Network authentication and authorization
     Digital Vortex 2015
  4. Challenges
       • Data is becoming increasingly:
           • Voluminous
           • Varied
           • Complex
           • Less structured
       • Infrastructure setup
       • Maintenance of infrastructure (updates, patching, etc.)
       • Deployment time
       • On-demand scaling
       • Cost
  5. Rise of BDaaS
     Digital Vortex 2015
  6. What is BDaaS?
     Diagram text: On Demand Self Service Elastic Big Data Infrastructure Applications Analytics
     BDaaS provides a cloud-based framework that offers end-to-end big data solutions to business organizations.
  7. Layers in BDaaS
     Layers: Infrastructure, Cloud Infrastructure, Data Storage, Computing, Data Management, Data Analytics, Presentation Layer
     Other labels: Ease of use; Big data as a service
     Examples (in slide order): Hardware platform; IaaS; HDFS; Spark, MR; RDS; Tableau, R
  8. BDaaS Advantages
       - Scalability
       - Reliability
       - Availability
       - Flexibility
       - Pre-stitched big data stack
       - Cost effectiveness
  9. BDaaS Enterprise Requirements
       - Multitenancy
       - Support for applications
       - High availability
       - Support for HA
       - Cluster expansion and contraction
       - Infrastructure and operations requirements
       - Integration with existing network configuration
       - Supported versions of OS, containers, etc.
       - Integration with LDAP
       - Upgrade
       - Capacity expansion
       - Monitoring
  10. Life Sciences Case Study: Operational data repository
  11. Business Problem
      CDISC Standards Clinical Data Safety Data Varied Sources Syndicated & Large Data
      Enabled Analytics
      Patient & Studies Analytics
        • Clinical Study Data Mart
        • Clinical Outcomes Analytics
      Drug Safety & Analytics
        • Safety Outcome & Reporting Analytics
        • Trial Management Analytics
        • Real World Signal Detection Analytics
        • Activity Enablement
      Big Data Relational Data Advanced Analytical Tools Shared Metadata
        • Electronic Data Capture
        • Clinical Trials Management System
        • Safety Data Warehouse
        • Global Safety Data Warehouse
        • ARGUS
        • Clinical Study Reports
        • Disparate Business Unit Reports
        • External analyses
        • Non-Clinical, Pre-Clinical Data & Reports
        • Real World Claims Data
        • Internal Genomics Data
        • Public Data (KEGG, NCBI, ChEMBL, etc.)
        • Trials Trove, CT.gov
      Varied Structure Data Infrastructure Data Sources
  12. Technology Stack: Fluid Analytics Engine and AWS
        • Cloud provider: AWS
        • Hadoop distribution: Cloudera
        • Storage: S3, Hive, Impala
        • Archival: Glacier
        • Processing: Spark
        • Monitoring: CloudWatch
        • Metadata storage: Amazon RDS
        • Automation: CloudFormation template
        • Access: AWS IAM
        • Cluster: VPC
        • LAN connectivity: Direct Connect
  13. High Level Flow
      Diagram labels (in slide order): Master data, Raw CDC, Data Quality Rules Repository, Data Vocabulary, Scheduling, Data Security & Governance, Landing Layer, Standardized Layer, Reporting & Analysis Layer, CTMS, Alerts and Notifications, IRT, EDC, Aggregated Layer, Detail data, CRO Data, Data Transformation, Common Data Model, Aggregated Data Model, Monitoring, Metadata Repository and Execution Engine, Data Aggregation, Data Cleansing
      Repeated component tags: FAE, AWS
  14. Advantages
        • Development time reduced by 35-40%
        • Testing of individual components not required
        • Pre-built data quality rules
        • Pre-built workflows
        • Pre-built KPIs
        • Pre-built common data model and aggregated data model
  15. Questions?
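
The layered flow on the High Level Flow slide (a landing layer, a standardized layer with data cleansing and data quality rules, and an aggregated layer) can be illustrated with a small sketch. Everything named here is hypothetical: the record fields (`study`, `site`, `enrolled`), the two rules, and all function names are chosen only for illustration. The deck does not describe how the Fluid Analytics Engine implements these steps, and it lists Spark for processing, which this in-memory sketch does not use.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class QualityRule:
    """A named data quality check (hypothetical); passes when check() is True."""
    name: str
    check: Callable[[Record], bool]


def landing_layer(raw: List[Record]) -> List[Record]:
    """Landing layer: copy source records as-is, tagging where they sit."""
    return [dict(r, layer="landing") for r in raw]


def standardized_layer(records: List[Record], rules: List[QualityRule]) -> List[Record]:
    """Standardized layer: drop records that fail any rule, then map the
    survivors onto one common shape (a stand-in for a common data model)."""
    out = []
    for r in records:
        if all(rule.check(r) for rule in rules):
            out.append({
                "study_id": str(r["study"]).strip().upper(),
                "site": r["site"],
                "enrolled": int(r["enrolled"]),
                "layer": "standardized",
            })
    return out


def aggregated_layer(records: List[Record]) -> Dict[str, int]:
    """Aggregated layer: total enrolment per study."""
    totals: Dict[str, int] = defaultdict(int)
    for r in records:
        totals[r["study_id"]] += r["enrolled"]
    return dict(totals)


# Hypothetical rules standing in for a data quality rules repository.
RULES = [
    QualityRule("has_study", lambda r: bool(str(r.get("study", "")).strip())),
    QualityRule(
        "non_negative_enrolment",
        lambda r: isinstance(r.get("enrolled"), int) and r["enrolled"] >= 0,
    ),
]


def run_pipeline(raw: List[Record]) -> Dict[str, int]:
    """Chain the three layers in the order shown on the slide."""
    return aggregated_layer(standardized_layer(landing_layer(raw), RULES))


if __name__ == "__main__":
    sample = [
        {"study": " abc-1 ", "site": "S1", "enrolled": 10},
        {"study": "ABC-1", "site": "S2", "enrolled": 5},
        {"study": "", "site": "S3", "enrolled": 7},        # fails has_study
        {"study": "XYZ-2", "site": "S4", "enrolled": -3},  # fails non_negative_enrolment
    ]
    print(run_pipeline(sample))
```

Keeping each layer as a separate function mirrors the slide's separation of layers: rules can be added to `RULES` without touching the aggregation step, which is one plausible reading of the "pre-built data quality rules" advantage listed in the deck.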
