Digital Vortex 2015
• What is BDaaS?
• BDaaS layers
• BDaaS Advantages
• BDaaS Enterprise Requirements
• Life Sciences Case Study
Data Scientist wants flexibility
• Different versions (new releases) of Hadoop, Spark, etc.
• Different sets of BI/Analytics tools
IT wants control
• QoS, data access
• Network Authentication and
• Data is becoming increasingly:
• Less Structured
• Infrastructure setup
• Maintenance of infrastructure (updates, patching, etc.)
• Deployment time
• On Demand Scaling
6. What is BDaaS?
BDaaS (Big Data as a Service) provides a cloud-based framework that offers end-to-end big data solutions to business organizations.
7. Layers in BDaaS
9. BDaaS Enterprise Requirements
- Support for Application
- High Availability
- Support for HA
- Cluster expansion and contraction
- Infrastructure and Operation requirements
- Integration with existing network configuration
- Supported versions of OS, containers etc.
- Integration with LDAP
- Capacity expansion
11. Business Problem
Syndicated & Large Data
Patient & Studies
Clinical Study Data
Drug Safety & Analytics
Safety Outcome &
Real World Signal
Electronic Data Capture
Safety Data Warehouse
Global Safety Data
Clinical Study Reports
Disparate Business Unit Reports
Non-Clinical, Pre-Clinical Data &
Real World Claims Data
Internal Genomics Data
Public Data (KEGG, NCBI, ChEMBL, etc.)
Trials Trove, CT.gov
Varied Structure Data
13. High Level Flow
Data Quality Rules
Data Security & Governance
Reporting & Analysis
Alerts and Notifications
Metadata Repository and Execution Engine
• Development time reduced by 35-40%
• Testing of individual components not required
• Pre-built data quality rules
• Pre-built workflows
• Pre-built KPIs
• Pre-built common data model and aggregated data model
Data Analytics: This layer includes high-level analytical applications, similar to R or Tableau, that are delivered over a cloud computing platform and used to analyze the underlying data. Users access these technologies through a web interface, where they create queries and define reports based on the data in the storage layer. Technologies in the data analytics layer abstract the complexities of the underlying BDaaS stack and enable better use of the data within the system. Their web interfaces may offer wizards and graphical tools that let the user perform complex statistical analysis.
Data Management: In this layer, higher-level applications such as Amazon Relational Database Service (RDS) and DynamoDB (see Chapter 6) are implemented to provide distributed data management and processing services. The technologies in this layer provide database management services over a cloud platform.
Computation Layer: This layer is composed of technologies that provide computing services over a web platform. For example, using Amazon Elastic MapReduce (EMR), users can write programs that manipulate data and store the results in a cloud platform. The layer includes the processing framework itself, together with the APIs and supporting programs that let user programs make use of it.
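To make the computation layer more concrete, the following is a minimal local sketch of the map/shuffle/reduce programming model that a service such as EMR runs at scale. It is an illustration only: it does not call any AWS API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the word-count task are assumptions chosen for the example, not something taken from the deck.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    # Hypothetical input; a real job would read from cloud storage.
    lines = ["big data as a service", "data as a service"]
    print(run_job(lines))
```

In a managed service the map and reduce functions would run on many nodes and the shuffle would move data across the network; here everything runs in one process so the flow of data through the three phases is easy to follow.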
Cloud Infrastructure: In this layer, cloud platforms such as OpenStack or VMware ESX Server provide the virtual cloud environment that forms the basis of the BDaaS stack.
Data Infrastructure: This layer is composed of the actual data center hardware and the physical nodes of the system. Data centers are typically composed of thousands of servers connected to each other by a high-speed network that enables data transfer. The data centers also have routers, firewalls, and backup systems to ensure protection against data loss.
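The five layers described above can be summarized as an ordered data structure. The sketch below is only one way to model the stack: the `Layer` class, the `BDAAS_STACK` tuple, and the `layers_below` helper are hypothetical names introduced for illustration, and the example products are the ones named in the layer descriptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    role: str
    examples: tuple


# Ordered from the top (closest to the user) to the bottom (physical hardware).
BDAAS_STACK = (
    Layer("Data Analytics", "analytical applications behind a web interface",
          ("R", "Tableau")),
    Layer("Data Management", "database management services",
          ("Amazon RDS", "DynamoDB")),
    Layer("Computation", "processing framework and APIs",
          ("Amazon EMR",)),
    Layer("Cloud Infrastructure", "virtual cloud environment",
          ("OpenStack", "VMware ESX Server")),
    Layer("Data Infrastructure", "data center hardware and physical nodes",
          ()),
)


def layers_below(name):
    """Return the names of the layers that the given layer sits on top of."""
    names = [layer.name for layer in BDAAS_STACK]
    return names[names.index(name) + 1:]


if __name__ == "__main__":
    for layer in BDAAS_STACK:
        print(f"{layer.name}: {layer.role}")
```

Keeping the layers in a single ordered tuple reflects the point of the descriptions: each layer abstracts the ones beneath it, so a question such as "what does the computation layer depend on" becomes a simple slice of the stack.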