Despite recent advances in operations management, data center management remains largely a black art. Administrators have limited visibility into their data center operations, yet they must make important operational decisions every day. A typical data center generates about a billion data points per day. Much insight could be drawn from this data, but because of its volume, on-premises software solutions collect only a limited subset of it, which restricts them to a very narrow view of the data center. We at CloudPhysics have taken a different approach. We created an analytics platform in the cloud that can query, slice and dice, and mash up this data with multiple data sources. This approach not only yields valuable insights but also addresses many persistent operations management problems that have not been solved before. In this talk we give an overview of data center metadata and describe how CloudPhysics handles this data at scale using its platform.
15. Can Private Clouds Match The Efficiency?
Data Scientists + Machine Metadata
Half Million Private Clouds
16. Operational Metadata Per Day
Private Cloud: 50 Servers, 500 Virtual Machines
1 Billion Data Points, 25 GB
17. Massive Scale
650 Million Users, 60 million tweets
1 Billion Users, 2.5 billion pieces of content
500,000 Datacenters, 50 Million VMs, Quadrillion datapoints (10^15)
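The figures on the two slides above can be cross-checked with a little arithmetic. The sketch below is only a back-of-the-envelope calculation on the slide numbers. The variable names are illustrative, 25 GB is taken as 25 × 10^9 bytes, and treating every datacenter as producing the same daily volume as the example private cloud is an assumption that the slides do not state.

```python
# Figures quoted on the slides
points_per_cloud_per_day = 1_000_000_000   # 1 billion data points
bytes_per_cloud_per_day = 25 * 10**9       # 25 GB, assuming decimal gigabytes
datacenters = 500_000

# Average size of one data point in the example private cloud
bytes_per_point = bytes_per_cloud_per_day / points_per_cloud_per_day

# Assumption: every datacenter emits as much as the example private cloud
points_per_day_all = points_per_cloud_per_day * datacenters

# Days needed to accumulate a quadrillion (10^15) data points at that rate
days_to_quadrillion = 10**15 / points_per_day_all

print(bytes_per_point)       # 25.0
print(points_per_day_all)    # 500000000000000
print(days_to_quadrillion)   # 2.0
```

Under these assumptions a data point averages about 25 bytes, and the whole population would reach a quadrillion data points in roughly two days.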
18. Challenges
[Diagram: a Management Server (Control Plane, Predict/Alert) above Data Center Servers hosting VMs (Collect)]
Management Driven by Collective Intelligence
24. Predicting Out of Disk Space Event
[Chart: Datastore Capacity (0% to 100%) for Datastore 1, Datastore 2, and Datastore 3, April through August]
Disk Space Usage is Hard to Predict
25. Predicting Out of Disk Space
[Chart: Probability (0% to 100%) for 80% Full, 90% Full, and 100% Full at < 1 Month, 6 Months, and 1 Year]
Probability Distribution Using Monte Carlo Simulation
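The slide names Monte Carlo simulation but does not describe the model behind it. As a minimal sketch of the general idea, and not the talk's actual method, the code below resamples observed day-to-day changes in datastore usage to simulate many possible futures, then counts how often a fullness threshold is reached within a horizon. The function name `prob_reaching`, the bootstrap resampling, and the sample history are all assumptions made for illustration.

```python
import random

def prob_reaching(usage_history, threshold, horizon_days, trials=2000, seed=0):
    """Estimate P(usage >= threshold within horizon_days) by Monte Carlo.

    usage_history: daily used-capacity fractions (0.0 to 1.0), oldest first.
    Future daily changes are drawn at random from the observed daily changes
    (a simple bootstrap, chosen here as an assumption).
    """
    deltas = [b - a for a, b in zip(usage_history, usage_history[1:])]
    if not deltas:
        raise ValueError("need at least two samples")
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        usage = usage_history[-1]
        for _ in range(horizon_days):
            if usage >= threshold:
                break  # threshold already reached on this simulated path
            usage = max(0.0, usage + rng.choice(deltas))
        if usage >= threshold:
            hits += 1
    return hits / trials

# Hypothetical usage history for one datastore (fractions of capacity)
history = [0.50, 0.51, 0.51, 0.53, 0.52, 0.54, 0.55]
for label, days in [("< 1 month", 30), ("6 months", 180), ("1 year", 365)]:
    for threshold in (0.80, 0.90, 1.00):
        print(label, threshold, prob_reaching(history, threshold, days))
```

Each printed value corresponds to one cell of a chart like the one on the slide: the estimated probability of hitting a given fullness level within a given time window. A production model would likely need to account for trends, seasonality, and sudden jumps such as snapshots or new VM deployments.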
28. Predicting SSD Performance Impact
[Chart: Latency Reduction (0% to 100%) for VM 1, VM 2, and VM 3 versus Solid State Drive Cache Per VM (16 GB to 128 GB), with a Target Latency line]
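The slide shows only the resulting chart, not how the prediction is made. One common way to estimate a curve of this kind is to replay a VM's block-access trace through a simulated cache at several sizes and convert the hit ratio into an expected latency change. The sketch below does this with an LRU cache. The LRU policy, the function names, the latency figures (10 ms disk, 0.5 ms SSD), and the tiny trace are all illustrative assumptions, not details from the talk.

```python
from collections import OrderedDict

def lru_hit_ratio(trace, cache_blocks):
    """Replay a block-access trace through an LRU cache of cache_blocks entries."""
    cache = OrderedDict()
    hits = 0
    for block in trace:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return hits / len(trace) if trace else 0.0

def latency_reduction(hit_ratio, disk_ms=10.0, ssd_ms=0.5):
    """Fractional drop in mean latency when cache hits are served from SSD."""
    cached_ms = hit_ratio * ssd_ms + (1 - hit_ratio) * disk_ms
    return 1 - cached_ms / disk_ms

# Hypothetical trace of block IDs for one VM
trace = [1, 2, 3, 1, 2, 3, 4, 1, 2, 5, 1, 2]
for size in (1, 2, 4, 8):
    ratio = lru_hit_ratio(trace, size)
    print(size, round(ratio, 3), round(latency_reduction(ratio), 3))
```

Running the same replay per VM and per candidate cache size gives one latency-reduction curve per VM, which can then be compared against a target latency.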