A Year in Review - Building a Comprehensive Data Management Program @ Microsoft Research
What Exactly Is Big Data?
Wikipedia: “Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications”
Critical tool for Microsoft’s businesses
Opportunity to deliver transformative new capabilities to our enterprise customers
MSR and Big Data
First, the sword: Shame on us…
Many undergrads with better big data capabilities
Martians versus Earthlings
Finally…Big data has been fully embraced by MSR as
A vital tool to enable research
A vital area in which to do research
We are MAKING THE INVESTMENT
Microsoft Research’s Centralized Data Management and Data Processing Platform
Founded June 2013
Project Vision
Motivation:
• Numerous Areas of Research are Driven by Data (Research Need)
• Data comes in very different forms from very different sources (Adapting to Change)
• Identified need for a standardized Data Storage and Data Processing resource for MSR (Community)
• Many different research groups were processing and storing the same data sets (Shared Knowledge / Data Sharing)
• Some research groups were not aware that so many different types of data were available (Communication / Collaboration)
[Diagram: RESEARCH DATA (INTERNAL AND EXTERNAL), with labels Adapting to Change, Community, Collaboration, Shared Knowledge, Data Sharing, Research Need]
Guiding Principles:
• Secure and Compliant (e.g. Data Security, Privacy and Ethics)
• World-wide Access (equal opportunity for access and use given to all MSR labs)
• Created through Partnerships with teams throughout Microsoft
• Driven by Researcher Needs and Requirements (e.g. Tools, Hardware, Software, Datasets)
• Flexibility
[Diagram: RESEARCH DATA (INTERNAL AND EXTERNAL), with labels Security, Driven by Researcher Needs, Research and Product Team Partnerships, Global Access, Compliance, Ethics]
Goals:
• Centralized, Compliant, and Curated Data Storage Facilities
• Multi-Purpose Data Processing Architecture (mix of different types of Hardware)
• Flexibility with Software
• Active User Community (supported through Outreach and Training)
[Diagram: RESEARCH DATA (INTERNAL AND EXTERNAL), with labels Centralized, Compliant, Curated, User Community, Flexibility with Software and Tools, Blend of Technology and Services, Centralized Data Management, Research and Innovation Support, Innovative Hardware and Tools, Partnerships, Data Privacy and Security, Community and Outreach]
System Architecture
[Diagram: RESEARCH DATA (INTERNAL AND EXTERNAL), connected to Hadoop, GPU, HPC, Azure, Sandbox, Bing]
[Diagram: RESEARCH DATA (INTERNAL AND EXTERNAL), with label MNIST]
Bing
Data Management
[Diagram: RESEARCH DATA (INTERNAL AND EXTERNAL), with labels MNIST, Compliance, Security, Data Management, Ethics, Policy]
Compliance
• Policy / Procedure
• Standardization / Common Platform
• Technology
Security
• Corporate Technology and Compliance
• Standardization / Common Platform
• Technology
Ethics
• Ethical Review Board / Legal and Corporate Affairs
• Standardization / Common Platform
• Technology
Fun Examples
F sharp
Naiad
Skype Translator
Azure ML
Discussion / Questions / Next Steps

More Related Content

What's hot

DataEd Slides: Exorcising the Seven Deadly Data Sins
DataEd Slides: Exorcising the Seven Deadly Data SinsDataEd Slides: Exorcising the Seven Deadly Data Sins
DataEd Slides: Exorcising the Seven Deadly Data SinsDATAVERSITY
 
Data Systems Integration & Business Value Pt. 2: Cloud
Data Systems Integration & Business Value Pt. 2: CloudData Systems Integration & Business Value Pt. 2: Cloud
Data Systems Integration & Business Value Pt. 2: CloudDATAVERSITY
 
DataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data ArchitectureDataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data ArchitectureDATAVERSITY
 
Data analytics introduction
Data analytics introductionData analytics introduction
Data analytics introductionamiyadash
 
DataEd Slides: Expressing Data Improvements as Business Outcomes
DataEd Slides: Expressing Data Improvements as Business OutcomesDataEd Slides: Expressing Data Improvements as Business Outcomes
DataEd Slides: Expressing Data Improvements as Business OutcomesDATAVERSITY
 
Focus on Your Analysis, Not Your SQL Code
Focus on Your Analysis, Not Your SQL CodeFocus on Your Analysis, Not Your SQL Code
Focus on Your Analysis, Not Your SQL CodeDATAVERSITY
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best PracticesDATAVERSITY
 
Data-Ed Online Webinar: Data Architecture Requirements
Data-Ed Online Webinar: Data Architecture RequirementsData-Ed Online Webinar: Data Architecture Requirements
Data-Ed Online Webinar: Data Architecture RequirementsDATAVERSITY
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...DATAVERSITY
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogDATAVERSITY
 
Data-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsData-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsDATAVERSITY
 
Data Prep - A Key Ingredient for Cloud-based Analytics
Data Prep - A Key Ingredient for Cloud-based AnalyticsData Prep - A Key Ingredient for Cloud-based Analytics
Data Prep - A Key Ingredient for Cloud-based AnalyticsDATAVERSITY
 
DAS Slides: Emerging Trends in Data Architecture – What’s the Next Big Thing?
DAS Slides: Emerging Trends in Data Architecture – What’s the Next Big Thing?DAS Slides: Emerging Trends in Data Architecture – What’s the Next Big Thing?
DAS Slides: Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
 
Data governance, Information security strategy
Data governance, Information security strategyData governance, Information security strategy
Data governance, Information security strategyvasanthi4ever
 
Emerging Trends in Data Architecture – What’s the Next Big Thing
Emerging Trends in Data Architecture – What’s the Next Big ThingEmerging Trends in Data Architecture – What’s the Next Big Thing
Emerging Trends in Data Architecture – What’s the Next Big ThingDATAVERSITY
 
The Key to Big Data Modeling: Collaboration
The Key to Big Data Modeling: CollaborationThe Key to Big Data Modeling: Collaboration
The Key to Big Data Modeling: CollaborationEmbarcadero Technologies
 
Data Systems Integration & Business Value PT. 3: Warehousing
Data Systems Integration & Business Value PT. 3: Warehousing Data Systems Integration & Business Value PT. 3: Warehousing
Data Systems Integration & Business Value PT. 3: Warehousing Data Blueprint
 
RWDG Slides: Data Architecture Is Data Governance
RWDG Slides: Data Architecture Is Data GovernanceRWDG Slides: Data Architecture Is Data Governance
RWDG Slides: Data Architecture Is Data GovernanceDATAVERSITY
 
Data-Ed Webinar: Data Quality Success Stories
Data-Ed Webinar: Data Quality Success StoriesData-Ed Webinar: Data Quality Success Stories
Data-Ed Webinar: Data Quality Success StoriesDATAVERSITY
 
RWDG Slides: Building Data Governance Through Data Stewardship
RWDG Slides: Building Data Governance Through Data StewardshipRWDG Slides: Building Data Governance Through Data Stewardship
RWDG Slides: Building Data Governance Through Data StewardshipDATAVERSITY
 

What's hot (20)

DataEd Slides: Exorcising the Seven Deadly Data Sins
DataEd Slides: Exorcising the Seven Deadly Data SinsDataEd Slides: Exorcising the Seven Deadly Data Sins
DataEd Slides: Exorcising the Seven Deadly Data Sins
 
Data Systems Integration & Business Value Pt. 2: Cloud
Data Systems Integration & Business Value Pt. 2: CloudData Systems Integration & Business Value Pt. 2: Cloud
Data Systems Integration & Business Value Pt. 2: Cloud
 
DataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data ArchitectureDataOps - The Foundation for Your Agile Data Architecture
DataOps - The Foundation for Your Agile Data Architecture
 
Data analytics introduction
Data analytics introductionData analytics introduction
Data analytics introduction
 
DataEd Slides: Expressing Data Improvements as Business Outcomes
DataEd Slides: Expressing Data Improvements as Business OutcomesDataEd Slides: Expressing Data Improvements as Business Outcomes
DataEd Slides: Expressing Data Improvements as Business Outcomes
 
Focus on Your Analysis, Not Your SQL Code
Focus on Your Analysis, Not Your SQL CodeFocus on Your Analysis, Not Your SQL Code
Focus on Your Analysis, Not Your SQL Code
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
 
Data-Ed Online Webinar: Data Architecture Requirements
Data-Ed Online Webinar: Data Architecture RequirementsData-Ed Online Webinar: Data Architecture Requirements
Data-Ed Online Webinar: Data Architecture Requirements
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data Catalog
 
Data-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsData-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling Fundamentals
 
Data Prep - A Key Ingredient for Cloud-based Analytics
Data Prep - A Key Ingredient for Cloud-based AnalyticsData Prep - A Key Ingredient for Cloud-based Analytics
Data Prep - A Key Ingredient for Cloud-based Analytics
 
DAS Slides: Emerging Trends in Data Architecture – What’s the Next Big Thing?
DAS Slides: Emerging Trends in Data Architecture – What’s the Next Big Thing?DAS Slides: Emerging Trends in Data Architecture – What’s the Next Big Thing?
DAS Slides: Emerging Trends in Data Architecture – What’s the Next Big Thing?
 
Data governance, Information security strategy
Data governance, Information security strategyData governance, Information security strategy
Data governance, Information security strategy
 
Emerging Trends in Data Architecture – What’s the Next Big Thing
Emerging Trends in Data Architecture – What’s the Next Big ThingEmerging Trends in Data Architecture – What’s the Next Big Thing
Emerging Trends in Data Architecture – What’s the Next Big Thing
 
The Key to Big Data Modeling: Collaboration
The Key to Big Data Modeling: CollaborationThe Key to Big Data Modeling: Collaboration
The Key to Big Data Modeling: Collaboration
 
Data Systems Integration & Business Value PT. 3: Warehousing
Data Systems Integration & Business Value PT. 3: Warehousing Data Systems Integration & Business Value PT. 3: Warehousing
Data Systems Integration & Business Value PT. 3: Warehousing
 
RWDG Slides: Data Architecture Is Data Governance
RWDG Slides: Data Architecture Is Data GovernanceRWDG Slides: Data Architecture Is Data Governance
RWDG Slides: Data Architecture Is Data Governance
 
Data-Ed Webinar: Data Quality Success Stories
Data-Ed Webinar: Data Quality Success StoriesData-Ed Webinar: Data Quality Success Stories
Data-Ed Webinar: Data Quality Success Stories
 
RWDG Slides: Building Data Governance Through Data Stewardship
RWDG Slides: Building Data Governance Through Data StewardshipRWDG Slides: Building Data Governance Through Data Stewardship
RWDG Slides: Building Data Governance Through Data Stewardship
 

Viewers also liked

Internet un gran sector en el que emprender
Internet un gran sector en el que emprenderInternet un gran sector en el que emprender
Internet un gran sector en el que emprenderAntevenio S.A
 
Тематическое планирование 7 класс
Тематическое планирование 7 классТематическое планирование 7 класс
Тематическое планирование 7 классkoneqq
 
Hadoop 2 @ Twitter, Elephant Scale
Hadoop 2 @ Twitter, Elephant ScaleHadoop 2 @ Twitter, Elephant Scale
Hadoop 2 @ Twitter, Elephant ScaleDataWorks Summit
 
Getting out of_debt_presentation(1)
Getting out of_debt_presentation(1)Getting out of_debt_presentation(1)
Getting out of_debt_presentation(1)Shannon Gilliland
 
One Grid to rule them all: Building a Multi-tenant Data Cloud with YARN
One Grid to rule them all: Building a Multi-tenant Data Cloud with YARNOne Grid to rule them all: Building a Multi-tenant Data Cloud with YARN
One Grid to rule them all: Building a Multi-tenant Data Cloud with YARNDataWorks Summit
 
How to Determine which Algorithms Really Matter
How to Determine which Algorithms Really MatterHow to Determine which Algorithms Really Matter
How to Determine which Algorithms Really MatterDataWorks Summit
 
The Future of Hadoop Security
The Future of Hadoop SecurityThe Future of Hadoop Security
The Future of Hadoop SecurityDataWorks Summit
 
The use of_l1.a.reynolds
The use of_l1.a.reynoldsThe use of_l1.a.reynolds
The use of_l1.a.reynoldshibbatulnoor
 
N(ot)-o(nly)-(Ha)doop - the DAG showdown
N(ot)-o(nly)-(Ha)doop - the DAG showdownN(ot)-o(nly)-(Ha)doop - the DAG showdown
N(ot)-o(nly)-(Ha)doop - the DAG showdownDataWorks Summit
 
Awareness actions AP Fertilidade Portugal 2016
Awareness actions AP Fertilidade Portugal 2016Awareness actions AP Fertilidade Portugal 2016
Awareness actions AP Fertilidade Portugal 2016FertilityEurope
 
Etymology - Communication
Etymology - CommunicationEtymology - Communication
Etymology - CommunicationLinxacross Ltd
 
Самообразование
СамообразованиеСамообразование
Самообразованиеkoneqq
 
Redes de Mercadeo ¿Cuándo fue la última vez que recomendaste algo?
Redes de Mercadeo ¿Cuándo fue la última vez que recomendaste algo?Redes de Mercadeo ¿Cuándo fue la última vez que recomendaste algo?
Redes de Mercadeo ¿Cuándo fue la última vez que recomendaste algo?Maria Velarde-Peru
 
HBase and Drill: How loosley typed SQL is ideal for NoSQL
HBase and Drill: How loosley typed SQL is ideal for NoSQLHBase and Drill: How loosley typed SQL is ideal for NoSQL
HBase and Drill: How loosley typed SQL is ideal for NoSQLDataWorks Summit
 

Viewers also liked (20)

Internet un gran sector en el que emprender
Internet un gran sector en el que emprenderInternet un gran sector en el que emprender
Internet un gran sector en el que emprender
 
Тематическое планирование 7 класс
Тематическое планирование 7 классТематическое планирование 7 класс
Тематическое планирование 7 класс
 
Hadoop 2 @ Twitter, Elephant Scale
Hadoop 2 @ Twitter, Elephant ScaleHadoop 2 @ Twitter, Elephant Scale
Hadoop 2 @ Twitter, Elephant Scale
 
CDC fy-2015-ofr-annual-report
CDC fy-2015-ofr-annual-reportCDC fy-2015-ofr-annual-report
CDC fy-2015-ofr-annual-report
 
Getting out of_debt_presentation(1)
Getting out of_debt_presentation(1)Getting out of_debt_presentation(1)
Getting out of_debt_presentation(1)
 
One Grid to rule them all: Building a Multi-tenant Data Cloud with YARN
One Grid to rule them all: Building a Multi-tenant Data Cloud with YARNOne Grid to rule them all: Building a Multi-tenant Data Cloud with YARN
One Grid to rule them all: Building a Multi-tenant Data Cloud with YARN
 
UX Team Of One
UX Team Of OneUX Team Of One
UX Team Of One
 
How to Determine which Algorithms Really Matter
How to Determine which Algorithms Really MatterHow to Determine which Algorithms Really Matter
How to Determine which Algorithms Really Matter
 
DaedalusFBBlog
DaedalusFBBlogDaedalusFBBlog
DaedalusFBBlog
 
The Future of Hadoop Security
The Future of Hadoop SecurityThe Future of Hadoop Security
The Future of Hadoop Security
 
Etimology
EtimologyEtimology
Etimology
 
Self esteem-2
Self esteem-2Self esteem-2
Self esteem-2
 
The use of_l1.a.reynolds
The use of_l1.a.reynoldsThe use of_l1.a.reynolds
The use of_l1.a.reynolds
 
N(ot)-o(nly)-(Ha)doop - the DAG showdown
N(ot)-o(nly)-(Ha)doop - the DAG showdownN(ot)-o(nly)-(Ha)doop - the DAG showdown
N(ot)-o(nly)-(Ha)doop - the DAG showdown
 
Awareness actions AP Fertilidade Portugal 2016
Awareness actions AP Fertilidade Portugal 2016Awareness actions AP Fertilidade Portugal 2016
Awareness actions AP Fertilidade Portugal 2016
 
UK 2014
UK 2014UK 2014
UK 2014
 
Etymology - Communication
Etymology - CommunicationEtymology - Communication
Etymology - Communication
 
Самообразование
СамообразованиеСамообразование
Самообразование
 
Redes de Mercadeo ¿Cuándo fue la última vez que recomendaste algo?
Redes de Mercadeo ¿Cuándo fue la última vez que recomendaste algo?Redes de Mercadeo ¿Cuándo fue la última vez que recomendaste algo?
Redes de Mercadeo ¿Cuándo fue la última vez que recomendaste algo?
 
HBase and Drill: How loosley typed SQL is ideal for NoSQL
HBase and Drill: How loosley typed SQL is ideal for NoSQLHBase and Drill: How loosley typed SQL is ideal for NoSQL
HBase and Drill: How loosley typed SQL is ideal for NoSQL
 

Similar to A Year in Review - Building a Comprehensive Data Management Program

Research Data (and Software) Management at Imperial: (Everything you need to ...
Research Data (and Software) Management at Imperial: (Everything you need to ...Research Data (and Software) Management at Imperial: (Everything you need to ...
Research Data (and Software) Management at Imperial: (Everything you need to ...Sarah Anna Stewart
 
Love Your Data Locally
Love Your Data LocallyLove Your Data Locally
Love Your Data LocallyErin D. Foster
 
The Materials Data Facility: A Distributed Model for the Materials Data Commu...
The Materials Data Facility: A Distributed Model for the Materials Data Commu...The Materials Data Facility: A Distributed Model for the Materials Data Commu...
The Materials Data Facility: A Distributed Model for the Materials Data Commu...Ben Blaiszik
 
Research data management for masters and ph d students
Research data management for masters and ph d studentsResearch data management for masters and ph d students
Research data management for masters and ph d studentsDebs Martindale
 
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...SEAD
 
HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9 HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9 Scott Edmunds
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolutionitnewsafrica
 
Data commons bonazzi bd2 k fundamentals of science feb 2017
Data commons bonazzi   bd2 k fundamentals of science feb 2017Data commons bonazzi   bd2 k fundamentals of science feb 2017
Data commons bonazzi bd2 k fundamentals of science feb 2017Vivien Bonazzi
 
Business Intelligence and Analytics Unit-2 part-A .pptx
Business Intelligence and Analytics Unit-2 part-A .pptxBusiness Intelligence and Analytics Unit-2 part-A .pptx
Business Intelligence and Analytics Unit-2 part-A .pptxRupaRani28
 
Getting to Grips with Research Data Management
Getting to Grips with Research Data Management Getting to Grips with Research Data Management
Getting to Grips with Research Data Management IzzyChad
 
Research Solutions for Education
Research Solutions for EducationResearch Solutions for Education
Research Solutions for EducationLee Stott
 
Getting to grips with research data management
Getting to grips with research data management Getting to grips with research data management
Getting to grips with research data management Wendy Mears
 
Hughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesHughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesASIS&T
 
Dataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTagsDataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTagsMerce Crosas
 
2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorial2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorialJosh Young
 
Staffing Research Data Services at University of Edinburgh
Staffing Research Data Services at University of EdinburghStaffing Research Data Services at University of Edinburgh
Staffing Research Data Services at University of EdinburghRobin Rice
 

Similar to A Year in Review - Building a Comprehensive Data Management Program (20)

Research Data (and Software) Management at Imperial: (Everything you need to ...
Research Data (and Software) Management at Imperial: (Everything you need to ...Research Data (and Software) Management at Imperial: (Everything you need to ...
Research Data (and Software) Management at Imperial: (Everything you need to ...
 
Love Your Data Locally
Love Your Data LocallyLove Your Data Locally
Love Your Data Locally
 
The Materials Data Facility: A Distributed Model for the Materials Data Commu...
The Materials Data Facility: A Distributed Model for the Materials Data Commu...The Materials Data Facility: A Distributed Model for the Materials Data Commu...
The Materials Data Facility: A Distributed Model for the Materials Data Commu...
 
dwdm unit 1.ppt
dwdm unit 1.pptdwdm unit 1.ppt
dwdm unit 1.ppt
 
Research data management for masters and ph d students
Research data management for masters and ph d studentsResearch data management for masters and ph d students
Research data management for masters and ph d students
 
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
Changing the Curation Equation: A Data Lifecycle Approach to Lowering Costs a...
 
HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9 HKU Data Curation MLIM7350 Class 9
HKU Data Curation MLIM7350 Class 9
 
Big Data Evolution
Big Data EvolutionBig Data Evolution
Big Data Evolution
 
Research data life cycle
Research data life cycleResearch data life cycle
Research data life cycle
 
Data commons bonazzi bd2 k fundamentals of science feb 2017
Data commons bonazzi   bd2 k fundamentals of science feb 2017Data commons bonazzi   bd2 k fundamentals of science feb 2017
Data commons bonazzi bd2 k fundamentals of science feb 2017
 
Business Intelligence and Analytics Unit-2 part-A .pptx
Business Intelligence and Analytics Unit-2 part-A .pptxBusiness Intelligence and Analytics Unit-2 part-A .pptx
Business Intelligence and Analytics Unit-2 part-A .pptx
 
Getting to Grips with Research Data Management
Getting to Grips with Research Data Management Getting to Grips with Research Data Management
Getting to Grips with Research Data Management
 
Big Data & DS Analytics for PAARL
Big Data & DS Analytics for PAARLBig Data & DS Analytics for PAARL
Big Data & DS Analytics for PAARL
 
McGeary Data Curation Network: Developing and Scaling
McGeary Data Curation Network: Developing and ScalingMcGeary Data Curation Network: Developing and Scaling
McGeary Data Curation Network: Developing and Scaling
 
Research Solutions for Education
Research Solutions for EducationResearch Solutions for Education
Research Solutions for Education
 
Getting to grips with research data management
Getting to grips with research data management Getting to grips with research data management
Getting to grips with research data management
 
Hughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesHughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication Repositories
 
Dataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTagsDataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTags
 
2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorial2016 Ocean Sciences Meeting tutorial
2016 Ocean Sciences Meeting tutorial
 
Staffing Research Data Services at University of Edinburgh
Staffing Research Data Services at University of EdinburghStaffing Research Data Services at University of Edinburgh
Staffing Research Data Services at University of Edinburgh
 

More from DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

A Year in Review - Building a Comprehensive Data Management Program

  • 1. A Year in Review - Building a Comprehensive Data Management Program @ Microsoft Research
  • 2. What Exactly Is Big Data?
      • Wikipedia: “Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications”
      • Critical tool for Microsoft’s businesses
      • Opportunity to deliver transformative new capabilities to our enterprise customers
  • 3. MSR and Big Data
      • First, the sword: Shame on us…
      • Many undergrads with better big data capabilities
      • Martians versus Earthlings
      • Finally…Big data has been fully embraced by MSR as
          • A vital tool to enable research
          • A vital area in which to do research
      • We are MAKING THE INVESTMENT
  • 4. Microsoft Research’s Centralized Data Management and Data Processing Platform: Founded June 2013
  • 5. Microsoft Research’s Centralized Data Management and Data Processing Platform: Project Vision
  • 6. Motivation:
      • Numerous Areas of Research are Driven by Data (Research Need)
      • Data comes in very different forms from very different sources (Adapting to Change)
      • Identified need for a standardized Data Storage and Data Processing resource for MSR (Community)
      • Many different research groups were processing and storing the same data sets (Shared Knowledge / Data Sharing)
      • Some research groups were not aware that so many different types of data were available (Communication / Collaboration)
      • Diagram labels: RESEARCH DATA (INTERNAL AND EXTERNAL), Adapting to Change, Community, Collaboration, Shared Knowledge, Data Sharing, Research Need
  • 7. Guiding Principles:
      • Secure and Compliant (e.g. Data Security, Privacy and Ethics)
      • World-wide Access (equal opportunity for access and use given to all MSR labs)
      • Created through Partnerships with teams throughout Microsoft
      • Driven by Researcher Needs and Requirements (e.g. Tools, Hardware, Software, Datasets)
      • Flexibility
      • Diagram labels: RESEARCH DATA (INTERNAL AND EXTERNAL), Security, Driven by Researcher Needs, Research and Product Team Partnerships, Global Access, Compliance, Ethics
  • 8. Goals:
      • Centralized, Compliant, and Curated Data Storage Facilities
      • Multi-Purpose Data Processing Architecture (mix of different types of Hardware)
      • Flexibility with Software
      • Active User Community (supported through Outreach and Training)
      • Diagram labels: RESEARCH DATA (INTERNAL AND EXTERNAL), Centralized, Compliant, Curated, User Community, Flexibility with Software and Tools, Blend of Technology and Services
  • 9. Centralized Data Management; Research and Innovation Support; Innovative Hardware and Tools; Partnerships; Data Privacy and Security; Community and Outreach
  • 10. Microsoft Research’s Centralized Data Management and Data Processing Platform System Architecture
  • 11. Microsoft Research’s Centralized Data Management and Data Processing Platform RESEARCH DATA (INTERNAL AND EXTERNAL) Hadoop GPU HPC Azure Sandbox Bing
  • 12. Microsoft Research’s Centralized Data Management and Data Processing Platform RESEARCH DATA (INTERNAL AND EXTERNAL) Hadoop GPU HPC Azure Sandbox Bing
  • 13. Microsoft Research’s Centralized Data Management and Data Processing Platform RESEARCH DATA (INTERNAL AND EXTERNAL) Hadoop GPU HPC Azure Sandbox Bing
  • 14. Microsoft Research’s Centralized Data Management and Data Processing Platform RESEARCH DATA (INTERNAL AND EXTERNAL) Hadoop GPU HPC Azure Sandbox Bing
  • 15. Microsoft Research’s Centralized Data Management and Data Processing Platform RESEARCH DATA (INTERNAL AND EXTERNAL) MNIST
  • 16. Microsoft Research’s Centralized Data Management and Data Processing Platform Bing
  • 17. Microsoft Research’s Centralized Data Management and Data Processing Platform Data Management
  • 18. Microsoft Research’s Centralized Data Management and Data Processing Platform RESEARCH DATA (INTERNAL AND EXTERNAL) MNIST
  • 19. Microsoft Research’s Centralized Data Management and Data Processing Platform RESEARCH DATA (INTERNAL AND EXTERNAL) MNIST Compliance Security Data Management Ethics Policy
  • 20. Microsoft Research’s Centralized Data Management and Data Processing Platform: Compliance, Security, Ethics
      • Policy / Procedure • Standardization / Common Platform • Technology
      • Corporate Technology and Compliance • Standardization / Common Platform • Technology
      • Ethical Review Board / Legal and Corporate Affairs • Standardization / Common Platform • Technology
  • 21. Microsoft Research’s Centralized Data Management and Data Processing Platform: Compliance, Security, Ethics
  • 22. Microsoft Research’s Centralized Data Management and Data Processing Platform: Fun Examples (F#, Naiad, Skype Translator, Azure ML)
  • 23. Microsoft Research’s Centralized Data Management and Data Processing Platform Discussion / Questions / Next Steps
