
BDaaS - Big Data as a Service


  1. Shreya Pal, Chief Architect, Saama. 9-Sep-2017. BDaaS - Big Data as a Service
  2. Content (Digital Vortex 2015) • What is BDaaS? • Challenges • BDaaS layers • BDaaS Advantages • BDaaS Enterprise Requirements • Life Sciences Case Study
  3. Conflicting Enterprise Needs. Data scientists want flexibility: • Different versions (new releases) of Hadoop, Spark, etc. • Different sets of BI/analytics tools. IT wants control: • Multitenancy • QoS, data access • Security • Network Authentication and Authorization
  4. Challenges • Data is becoming increasingly: • Voluminous • Varied • Complex • Less structured • Infrastructure setup • Maintenance of infrastructure (updates, patching, etc.) • Deployment time • On-demand scaling • Cost
  5. Rise of BDaaS
  6. What is BDaaS? On Demand Self Service Elastic Bigdata Infrastructure Applications Analytics. BDaaS provides a cloud-based framework that offers end-to-end big data solutions to business organizations.
  7. Layers in BDaaS: Infrastructure; Cloud Infrastructure; Data Storage; Computing; Data Management; Data Analytics; Presentation Layer. Example technologies: hardware platform; IaaS; HDFS; Spark, MR; RDS; Tableau, R. Diagram labels: Ease of use; Big data as a service.
  8. BDaaS Advantages - Scalability - Reliability - Availability - Flexibility - Pre-stitched big data stack - Cost effectiveness
  9. BDaaS Enterprise Requirements - Multitenancy - Support for Application - High Availability - Support for HA - Cluster expansion and contraction - Infrastructure and Operation requirements - Integration with existing network configuration - Supported versions of OS, containers, etc. - Integration with LDAP - Upgrade - Capacity expansion - Monitoring
  10. Life Sciences Case Study - Operational data repository
  11. Business Problem (diagram labels): CDISC Standards; Clinical Data; Safety Data; Varied Sources; Syndicated & Large Data; Enabled Analytics. Patient & Studies Analytics • Clinical Study Data Mart • Clinical Outcomes Analytics. Drug Safety & Analytics • Safety Outcome & Reporting Analytics • Trial Management Analytics • Real World Signal Detection Analytics • Activity Enablement. Big Data; Relational Data; Advanced Analytical Tools; Shared Metadata. • Electronic Data Capture • Clinical Trials Management System • Safety Data Warehouse • Global Safety Data Warehouse • ARGUS • Clinical Study Reports • Disparate Business Unit Reports • External analyses • Non-Clinical, Pre-Clinical Data & Reports • Real World Claims Data • Internal Genomics Data • Public Data (KEGG, NCBI, ChEMBL, etc.) • Trials Trove, CT.gov. Varied Structure; Data Infrastructure; Data Sources.
  12. Technology Stack: Fluid Analytics Engine and AWS. Cloud provider: AWS. Hadoop distribution: Cloudera. Storage: S3, Hive, Impala. Archival: Glacier. Processing: Spark. Monitoring: CloudWatch. Metadata storage: Amazon RDS. Automation: CloudFormation template. Access: AWS IAM. Cluster: VPC. LAN connectivity: Direct Connect.
  13. High Level Flow (diagram labels): Master data; Raw CDC; Data Quality Rules Repository; Data Vocabulary; Scheduling; Data Security & Governance; Landing Layer; Standardized Layer; Reporting & Analysis Layer; CTMS; Alerts and Notifications; IRT; EDC; Aggregated Layer; Detail data; CRO Data; Data Transformation; Common Data Model; Aggregated Data Model; Monitoring; Metadata Repository and Execution Engine; Data Aggregation; Data Cleansing. Repeated component tags: FAE, AWS.
  14. Advantages • Development time reduced by 35-40% • Testing of individual components not required • Pre-built data quality rules • Pre-built workflows • Pre-built KPIs • Pre-built common data model and aggregated data model
  15. Questions?

Editor's Notes

  • Data Analytics: This layer includes high-level analytical applications such as R or Tableau, delivered over a cloud computing platform, that can be used to analyze the underlying data. Users access these technologies through a web interface, where they can create queries and define reports based on the data in the storage layer. Technologies in the data analytics layer abstract the complexities of the underlying BDaaS stack and enable better utilization of data within the system. Their web interfaces may offer wizards and graphical tools that let the user perform complex statistical analysis.

    Data Management: In this layer, higher-level applications such as Amazon Relational Database Service (RDS) and DynamoDB (see Chapter 6) are implemented to provide distributed data management and processing services. Technologies in this layer provide database management services over a cloud platform.

    Computation Layer: This layer is composed of technologies that provide computing services over a web platform. For example, using Amazon Elastic MapReduce (EMR), users can write programs to manipulate data and store the results in a cloud platform. This layer includes the processing framework as well as the APIs and other tools that help user programs utilize it.

    Cloud Infrastructure: In this layer, cloud platforms such as OpenStack or VMware ESX Server provide the virtual cloud environment that forms the basis of the BDaaS stack.

    Data Infrastructure: This layer is composed of the actual data center hardware and the physical nodes of the system. Data centers are typically composed of thousands of servers connected to each other by high-speed network links that enable data transfer. The data centers also have routers, firewalls, and backup systems to ensure protection against data loss.
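
The layering described in these notes can be sketched as a small data structure. This is only an illustrative model, not part of the original deck: the `Layer` class, the `BDAAS_STACK` list, and the helper functions `layer_of` and `layers_below` are hypothetical names, and the example technologies are just the ones the notes mention.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Layer:
    """One layer of the BDaaS stack (hypothetical model for illustration)."""
    name: str
    role: str
    examples: Tuple[str, ...]


# Ordered from the top (closest to the user) to the bottom (physical hardware),
# following the order in which the notes describe the layers.
BDAAS_STACK: List[Layer] = [
    Layer("Data Analytics", "web-based querying, reporting and statistics",
          ("R", "Tableau")),
    Layer("Data Management", "database management services over a cloud platform",
          ("Amazon RDS", "DynamoDB")),
    Layer("Computation", "processing frameworks and their APIs",
          ("Amazon EMR",)),
    Layer("Cloud Infrastructure", "virtual cloud environment",
          ("OpenStack", "VMware ESX")),
    Layer("Data Infrastructure", "data center hardware and physical nodes",
          ()),
]


def layer_of(technology: str) -> Optional[Layer]:
    """Return the layer that lists the technology as an example, or None."""
    wanted = technology.strip().lower()
    for layer in BDAAS_STACK:
        if any(example.lower() == wanted for example in layer.examples):
            return layer
    return None


def layers_below(name: str) -> List[Layer]:
    """Return the layers the named layer sits on; raises ValueError if unknown."""
    names = [layer.name for layer in BDAAS_STACK]
    return BDAAS_STACK[names.index(name) + 1:]


if __name__ == "__main__":
    for layer in BDAAS_STACK:
        print(f"{layer.name}: {layer.role}")
```

For instance, `layer_of("Tableau")` returns the Data Analytics layer, and `layers_below("Computation")` returns the Cloud Infrastructure and Data Infrastructure layers, which reflects the idea that each layer abstracts the ones beneath it.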