BDaaS - Big Data as a Service

Agile Testing Alliance
Shreya Pal
Chief Architect Saama
9-Sep-2017
BDaaS - Big Data as a Service
Content
Digital Vortex 2015
• What is BDaaS?
• Challenges
• BDaaS layers
• BDaaS Advantages
• BDaaS Enterprise Requirements
• Life Sciences Case Study
Conflicting Enterprise Needs
Data scientists want flexibility
• Different versions (new releases) of Hadoop, Spark, etc.
• Different sets of BI/analytics tools
IT wants control
• Multitenancy
• QoS, data access
• Security
• Network authentication and authorization
Digital Vortex 2015
Challenges
• Data is becoming increasingly:
  • Voluminous
  • Varied
  • Complex
  • Less structured
• Infrastructure setup
• Maintenance of infrastructure (updates, patching, etc.)
• Deployment time
• On-demand scaling
• Cost
Rise of BDaaS
Digital Vortex 2015
What is BDaaS?
• On demand, self service, elastic
• Big data: infrastructure, applications, analytics
BDaaS provides a cloud-based framework that offers end-to-end big data solutions to business organizations.
Layers in BDaaS
• Infrastructure: hardware platform
• Cloud Infrastructure: IaaS
• Data Storage: HDFS
• Computing: Spark, MR
• Data Management: RDS
• Data Analytics / Presentation Layer: Tableau, R
Diagram axis labels: Ease of use; Big data as a service
BDaaS Advantages
- Scalability
- Reliability
- Availability
- Flexibility
- Pre-stitched big data stack
- Cost Effectiveness
BDaaS Enterprise Requirements
- Multitenancy
- Support for Application
- High Availability
- Support for HA
- Cluster expansion and contraction
- Infrastructure and Operation requirements
- Integration with existing network configuration
- Supported versions of OS, containers etc.
- Integration with LDAP
- Upgrade
- Capacity expansion
- Monitoring
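The "cluster expansion and contraction" requirement can be illustrated with a small sizing rule. This is only a sketch: the utilization band, the node limits, and the function name `target_nodes` are assumptions made for illustration and do not come from the slides or from any particular BDaaS product.

```python
import math


def target_nodes(current_nodes, utilization, low=0.30, high=0.75,
                 min_nodes=3, max_nodes=50):
    """Suggest a cluster size for an observed average utilization (0.0-1.0).

    All thresholds are hypothetical. Inside the [low, high] band the cluster
    is left alone; outside it, the cluster is resized so that the same load
    would sit at the midpoint of the band.
    """
    if low <= utilization <= high:
        return current_nodes  # within the comfort band: no change
    target_utilization = (low + high) / 2
    needed = math.ceil(current_nodes * utilization / target_utilization)
    # Respect the floor and ceiling the operations team has agreed on.
    return max(min_nodes, min(max_nodes, needed))


if __name__ == "__main__":
    for load in (0.10, 0.50, 0.90):
        print(f"utilization={load:.2f} -> nodes={target_nodes(10, load)}")
```

Resizing toward the midpoint of the band, rather than to its edge, leaves headroom so the cluster does not oscillate between expansion and contraction on small load changes.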
Life Sciences Case Study - Operational data repository
Business Problem
CDISC Standards
Clinical Data
Safety Data
Varied Sources
Syndicated & Large Data
Enabled Analytics
Patient & Studies Analytics
• Clinical Study Data Mart
• Clinical Outcomes Analytics
Drug Safety & Analytics
• Safety Outcome & Reporting Analytics
• Trial Management Analytics
• Real World Signal Detection Analytics
• Activity Enablement
Big Data
Relational Data
Advanced Analytical Tools
Shared Metadata
• Electronic Data Capture
• Clinical Trials Management System
• Safety Data Warehouse
• Global Safety Data Warehouse
• ARGUS
• Clinical Study Reports
• Disparate Business Unit Reports
• External analyses
• Non-Clinical, Pre-Clinical Data & Reports
• Real World Claims Data
• Internal Genomics Data
• Public Data (KEGG, NCBI, ChEMBL, etc.)
• Trials Trove, CT.gov
Varied Structure Data
Infrastructure
Data Sources
Technology Stack
Fluid Analytics Engine and AWS
Cloud provider – AWS
Hadoop distribution – Cloudera
Storage – S3, Hive, Impala
Archival – Glacier
Processing – Spark
Monitoring – CloudWatch
Metadata storage – Amazon RDS
Automation – CloudFormation template
Access – AWS IAM
Cluster – VPC
LAN connectivity – Direct Connect
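To make the "Automation" and "Archival" entries concrete, the sketch below builds a minimal CloudFormation template as a Python dictionary: one S3 bucket with a lifecycle rule that moves objects to the Glacier storage class. The bucket name, the logical ID `RawDataBucket`, and the 90-day threshold are assumptions for illustration; the slides do not show the actual template used in the case study.

```python
import json


def build_template(bucket_name, archive_after_days=90):
    """Return a minimal CloudFormation template as a dict.

    Hypothetical example: one S3 bucket whose objects transition to the
    GLACIER storage class after `archive_after_days` days.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Illustrative landing bucket with Glacier archival",
        "Resources": {
            "RawDataBucket": {  # logical ID chosen for this sketch
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "BucketName": bucket_name,
                    "LifecycleConfiguration": {
                        "Rules": [
                            {
                                "Id": "ArchiveToGlacier",
                                "Status": "Enabled",
                                "Transitions": [
                                    {
                                        "StorageClass": "GLACIER",
                                        "TransitionInDays": archive_after_days,
                                    }
                                ],
                            }
                        ]
                    },
                },
            }
        },
    }


if __name__ == "__main__":
    # Print JSON that could be saved to a file and reviewed before deployment.
    print(json.dumps(build_template("example-odr-landing"), indent=2))
```

Generating the template from code keeps values such as the archival threshold in one place, which is one common way to make repeated environment setup consistent.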
High Level Flow
• Sources: CTMS, IRT, EDC, CRO Data
• Layers: Landing Layer, Standardized Layer, Aggregated Layer, Reporting & Analysis Layer
• Processing steps: Data Cleansing, Data Transformation, Data Aggregation
• Data and models: Master data, Raw CDC, Detail data, Common Data Model, Aggregated Data Model
• Supporting components: Data Quality Rules Repository, Data Vocabulary, Scheduling, Data Security & Governance, Alerts and Notifications, Monitoring, Metadata Repository and Execution Engine
• Diagram tags: each component is marked as FAE (Fluid Analytics Engine) or AWS
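The flow from landing to standardized to aggregated data can be sketched in a few lines of Python. Everything here is an illustrative assumption: the field names (`SUBJID`, `STUDYID`, `AGE`), the two data quality rules, and the function names are invented for the example and are not the Fluid Analytics Engine's actual rules or API.

```python
from collections import defaultdict

# Hypothetical data quality rules: name -> predicate on a standardized record.
RULES = {
    "subject_id present": lambda r: bool(r.get("subject_id")),
    "age in 0..120": lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120,
}


def standardize(raw):
    """Map a raw landing-layer record onto a small common data model."""
    return {
        "subject_id": str(raw.get("SUBJID", "")).strip(),
        "study_id": str(raw.get("STUDYID", "")).strip().upper(),
        "age": raw.get("AGE"),
    }


def failed_rules(record):
    """Return the names of the rules the record violates."""
    return [name for name, check in RULES.items() if not check(record)]


def run_flow(landing):
    """Landing -> standardized (cleansed) -> aggregated (subjects per study)."""
    standardized, rejected = [], []
    for raw in landing:
        record = standardize(raw)
        failures = failed_rules(record)
        if failures:
            rejected.append((record, failures))  # candidates for alerts
        else:
            standardized.append(record)
    subjects_per_study = defaultdict(int)
    for record in standardized:
        subjects_per_study[record["study_id"]] += 1
    return standardized, dict(subjects_per_study), rejected


if __name__ == "__main__":
    landing = [
        {"SUBJID": "001", "STUDYID": "abc", "AGE": 34},
        {"SUBJID": "", "STUDYID": "abc", "AGE": 40},
    ]
    print(run_flow(landing))
```

Keeping the rules in a table separate from the flow mirrors the idea of a data quality rules repository: rules can be added or changed without touching the pipeline code.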
Advantages
• Development time reduced by 35-40%
• Testing of individual components not required
• Pre-built data quality rules
• Pre-built workflows
• Pre-built KPIs
• Pre-built common data model and aggregated data model
Questions?

Editor's Notes

  1. Data Analytics: This layer includes high-level analytical applications such as R or Tableau, delivered over a cloud computing platform, which can be used to analyze the underlying data. Users access these technologies through a web interface where they can create queries and define reports based on the data in the storage layer. Technologies in the data analytics layer abstract the complexities of the underlying BDaaS stack and enable better utilization of data within the system. Their web interfaces may have wizards and graphical tools that enable the user to perform complex statistical analysis.

     Data Management: In this layer, higher-level applications such as Amazon Relational Database Service (RDS) and DynamoDB (see Chapter 6) are implemented to provide distributed data management and processing services. Technologies in this layer provide database management services over a cloud platform.

     Computation Layer: This layer is composed of technologies that provide computing services over a web platform. For example, using Amazon Elastic MapReduce (EMR), users can write programs to manipulate data and store the results in a cloud platform. This layer includes the processing framework as well as the APIs and other programs that help user programs make use of it.

     Cloud Infrastructure: In this layer, cloud platforms such as OpenStack or VMware ESX Server provide the virtual cloud environment that forms the basis of the BDaaS stack.

     Data Infrastructure: This layer is composed of the actual data center hardware and the physical nodes of the system. Data centers are typically composed of thousands of servers connected to each other by a high-speed network that enables the transfer of data. The data centers also have routers, firewalls, and backup systems to ensure protection against data loss.