A Year in Review - Building a Comprehensive Data Management Program
1. A Year in Review - Building a Comprehensive Data Management Program @ Microsoft Research
2. What Exactly Is Big Data?
Wikipedia: “Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications”
• Critical tool for Microsoft’s businesses
• Opportunity to deliver transformative new capabilities to our enterprise customers
3. MSR and Big Data
First, the sword: Shame on us…
• Many undergrads with better big data capabilities
• Martians versus Earthlings
Finally… Big data has been fully embraced by MSR as
• A vital tool to enable research
• A vital area in which to do research
We are MAKING THE INVESTMENT
6. Motivation:
• Numerous Areas of Research are Driven by Data (Research Need)
• Data comes in very different forms from very different sources (Adapting to Change)
• Identified need for a standardized Data Storage and Data Processing resource for MSR (Community)
• Many different research groups were processing and storing the same data sets (Shared Knowledge / Data Sharing)
• Some research groups were not aware that so many different types of data were available (Communication / Collaboration)
Microsoft Research’s Centralized Data Management and Data Processing Platform
[Diagram: RESEARCH DATA (INTERNAL AND EXTERNAL); labels: Adapting to Change, Community, Collaboration, Shared Knowledge, Data Sharing, Research Need]
7. Guiding Principles:
• Secure and Compliant (e.g. Data Security, Privacy and Ethics)
• World-wide Access (equal opportunity for access and use given to all MSR labs)
• Created through Partnerships with teams throughout Microsoft
• Driven by Researcher Needs and Requirements (e.g. Tools, Hardware, Software, Datasets)
• Flexibility
[Diagram: RESEARCH DATA (INTERNAL AND EXTERNAL); labels: Security, Driven by Researcher Needs, Research and Product Team Partnerships, Global Access, Compliance, Ethics]
8. Goals:
• Centralized, Compliant, and Curated Data Storage Facilities
• Multi-Purpose Data Processing Architecture (mix of different types of Hardware)
• Flexibility with Software
• Active User Community (supported through Outreach and Training)
[Diagram: RESEARCH DATA (INTERNAL AND EXTERNAL); labels: Centralized, Compliant, Curated, User Community, Flexibility with Software and Tools, Blend of Technology and Services]
11. Microsoft Research’s Centralized Data Management and Data Processing Platform
[Diagram: RESEARCH DATA (INTERNAL AND EXTERNAL); labels: Hadoop, GPU, HPC, Azure, Sandbox, Bing]
19. Microsoft Research’s Centralized Data Management and Data Processing Platform
[Diagram: RESEARCH DATA (INTERNAL AND EXTERNAL); labels: MNIST, Compliance, Security, Data Management, Ethics, Policy]
20. Microsoft Research’s Centralized Data Management and Data Processing Platform
Compliance
• Policy / Procedure
• Standardization / Common Platform
• Technology
Security
• Corporate Technology and Compliance
• Standardization / Common Platform
• Technology
Ethics
• Ethical Review Board / Legal and Corporate Affairs
• Standardization / Common Platform
• Technology