The document discusses building operational visibility and analytics directly into cloud platforms. It describes an agentless system crawler that can provide deep visibility into cloud instances without requiring any action from end users. The crawler collects various system data which is then analyzed to provide operational insights and solve real-world problems. Specific applications discussed include vulnerability advising, configuration analysis, and license discovery. The goal is to design monitoring and analytics that are seamlessly integrated and optimized for cloud environments.
Built-in Op Visibility & Analytics Designed for Cloud
1. 0
Built-in Operational Visibility and Analytics
Designed for Cloud
Canturk Isci
IBM Research, NY
@canturkisci
Boston University
Thu Apr 28, 11:00 AM
CloudSightResearch
Vulnerability Advisor
2. 1
Cloud Evolution: Greats and Needs
What is Great
Density
Scale
Portability
Repeatability
Speed
What Needs Work
Visibility
Operational Insight
Utility | Cost | Scale | Automation | Agility | (µ)Services
Operational Intelligence
- Modernization of IT infra and SW delivery
- Complex made simple
- Unprecedented efficiency and TTV
- Lots of shiny toys across IT lifecycle
- Visibility into our environments remains an issue
- Also lots of shiny toys for monitoring & analytics
BUT:
- Still based on traditional IT Principles!
3. 2
- Provide unmatched deep, seamless visibility into cloud instances
- Drive operational insights to solve real-world pain points
Our Work: Built-in Op Visibility & Analytics Designed for Cloud
5. 4
- Provide unmatched deep, seamless and unified visibility into ALL cloud instances
- Drive operational insights to solve real-world pain points
Built-in Operational Visibility & Analytics Designed for Cloud
Agentless System Crawler (ASC)
6. 5
Traditional Monitoring vs. Crawlers
[Figure: two panels, each showing three setups (BMS, VM, Container) with their Host, OS and workloads (Wkld). Traditional monitoring: agents (A) sit inside every workload. Crawlers: no agents inside the workloads.]
7. 6
Some Data Points
From an employee: "This is the BES client agent. I don't know what it does but it's always at 50%. I would be the first customer to remove this evil thing from my machines:"
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3515 root 20 0 781m 21m 6272 R 53.8 0.3 51:28.92 BESClient
C. Colohan. The Scariest Outage Ever.
CMU SDI Seminar Series, 2012.
http://pdl.cmu.edu/SDI/2012/083012b.html
Amazon. Summary of Oct. 22 '12
AWS Service Event in US-East Region.
http://aws.amazon.com/message/680342/
8. 7
"Users do not have to do anything to get this visibility. It is already there by default."
[Figure: Container Cloud architecture. Labels: Docker Hosts (each running several App containers), Metrics & Logs Bus, Multitenant Index, Logmet Svc, Provisioning, Tenancy Info, State Events.]
Built into every compute node, all geos
Enabled by default for all users in all production environments
O(10K) metrics/s & logs/s
Current State
Seamless: Built-in Monitoring & Logging in Bluemix Containers
10. 9
Key Advantages
Why Agentless System Crawlers
magic
Monitoring built into the platform
not in end-user systems
No complexity to end user
(They do nothing, all they see is the service)
No agents/credentials/access
(nothing built into userworld)
Works out of the box
Makes data consumable*
(lower barrier to data collection and analytics)
Better Security* for end user
(No attack surface, in userworld)
Better Availability* of monitoring
(From birth to death, inspect even defunct guest)
Guest Agnostic
(Build for platform, not each user distro)
Decoupled* from user context
(No overhead/side-effect concerns)
Monitoring done right for the
processes of the Cloud OS
11. 10
Deep Visibility: What We Actually Collect (and Annotate)
- OS Info
- Processes
- Disk Info
- Metrics
- Network Info
- Packages
- Files
- Config Info
From Container/VM
- Docker metadata
(docker inspect)
- CPU metrics
(/cgroup/cpuacct/)
- Memory metrics
(/cgroup/memory)
- Docker history
Docker Runtime
Config
Annotator
Vulnerability
Annotator
Compliance
Annotator
Password
Annotator
SW
Annotator
Licence
Annotator
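The CPU and memory metrics listed above come from cgroup accounting files. A minimal sketch of how such files could be turned into metrics, assuming the cgroup v1 `cpuacct.stat` and `memory.stat` layout (one `name value` pair per line). The function name and the sample values are illustrative assumptions, not the crawler's own code:

```python
def parse_cgroup_stat(text):
    """Parse 'key value' lines, as found in cgroup v1 cpuacct.stat or memory.stat."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        key, value = parts
        stats[key] = int(value)
    return stats


# Sample file content, made up for illustration only.
sample_cpuacct = "user 4312\nsystem 977\n"
cpu = parse_cgroup_stat(sample_cpuacct)
total_ticks = cpu["user"] + cpu["system"]  # combined CPU time in clock ticks
```

Because the files live on the host side of the container boundary, reading them needs nothing installed inside the container.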
12. 11
Deep Visibility Operational Insights/Analytics Solve Real Problems
Pipeline stages: Data Bus, Annotators, Index (Data), Analytics
Analytics services:
- Vuln. & Compl. Analysis
- Config Analytics (SecConfig)
- Cloud Time Machine (Audit/PD)
- Pipeline Service (DevOps)
- Remediation Service
* All analytics services work from the same data & pipeline!
Today’s Special: Vulnerability Advisor
13. 12
Crawler: How it Works for VMs
• Leverage VM Introspection (VMI) techniques to access VM Mem and Disk state
(We built a bunch of our own optimizations that make this very efficient and practical)
• Can even remote both (decouple all from VM and host)
• Almost no new dependencies on host
• Currently support 1000+ kernel distros
[Figure: VM crawler architecture. Labels: Hypervisor, VM (OS, MEM, Disk), MEM View, Disk View, Memory Crawl API, Disk Crawl API, Crawl Logic, structured view of VM states (Frames), KB, Cloud Analytics, Analytics Apps.]
14. 13
Crawler: How it Works for Containers
• Leverage Docker APIs for base container information
• Exploit container abstractions (namespace mapping and cgroups) for deeper insight
• Provide deep state info at scale with no visible overheads to end user
1) Get visibility into container world
by namespace mapping
2) Crawl the container
(Crawler dependencies still borrowed from host.
No need to inject into container!)
3) Return to original namespace
4) Push data to backend index
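The four steps above can be sketched as follows. This is a hedged illustration, not the crawler's actual implementation: it assumes Linux namespace files under `/proc/<pid>/ns/`, uses `os.setns` (available in Python 3.12+ and requiring elevated privileges), and the names `in_container_namespace`, `crawl_container`, `collect` and `push` are hypothetical. The namespace hooks are injectable so the flow can be exercised without root.

```python
import os
from contextlib import contextmanager


@contextmanager
def in_container_namespace(pid, ns_type="mnt", open_ns=None, set_ns=None,
                           close_ns=os.close):
    """Temporarily join one namespace of process `pid`, then switch back."""
    if open_ns is None:
        open_ns = lambda path: os.open(path, os.O_RDONLY)
    if set_ns is None:
        set_ns = os.setns  # Python 3.12+; needs sufficient privileges
    original = open_ns(f"/proc/self/ns/{ns_type}")
    target = open_ns(f"/proc/{pid}/ns/{ns_type}")
    try:
        set_ns(target)      # 1) get visibility into the container world
        yield               # 2) caller crawls here, running host-side code
    finally:
        set_ns(original)    # 3) return to the original namespace
        close_ns(target)
        close_ns(original)


def crawl_container(pid, collect, push, **ns_hooks):
    """Run `collect` inside the container's namespace, then ship the result."""
    with in_container_namespace(pid, **ns_hooks):
        frame = collect()
    push(frame)             # 4) push data to the backend index
    return frame
```

The crawling code itself is loaded on the host before the switch, which is why nothing has to be injected into the container.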
15. 14
Crawler: Typical Deployment
• Typical deployment, able to track diverse cloud runtimes with parity
• Need not be on same host, most crawler functions can even be remoted
16. 15
Crawler: Design
• Same crawler across runtimes for unified operational visibility
• Multiple fanouts as use cases grow
17. 16
Open Innovation <3
April 13
Open Container Introspection Toolkit
for Security Analysis
18. 17
DEMO TIME
This Session
Agentless System Crawler
Bluemix Test Drive (live – ldwave)
https://developer.ibm.com/bluemix/2015/11/16/built-in-monitoring-and-logging-for-bluemix-containers/
LogCrawler and JSON Parsing
(live – CanoLibUK3)
Vanilla LogCrawler
(20150619_LogCrawlerDemo)
Crawl even Non-responsive systems
(oopsRconsole2)
Out of Band SIEM
(QRadarDemo)
TopoLog for Topology Discovery
(newTopo)
RTop for Realtime Monitoring
(RtopAnnotatedMOV)
Crawling for Rootkits with RConsole
(RConsoleAnnotatedMOV)
Sunday & Wednesday
Vulnerability Advisor
Coming soon…
19. 18
Bluemix Test Drive
Just start a Bluemix Container
(https://console.ng.bluemix.net/)
Go to Container Overview
(Metrics show up in a few mins)
22. 21
Back to: Deep Visibility Operational Insights/Analytics Solve Real Problems
How can I identify my vulnerable/non-compliant images
before they go live?
How can I detect and block systems with password access
configurations and weak passwords?
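As a rough illustration of the password-access question above, the sketch below inspects crawled `sshd_config` text for settings that permit password-based logins. It is an assumption-laden simplification, not the Password Annotator itself: it handles only `Keyword value` lines, ignores `Match` blocks, and the function name and finding strings are hypothetical. It relies on two documented sshd behaviours: keywords are case-insensitive, and the first value seen for a keyword wins.

```python
def password_login_findings(sshd_config_text):
    """Flag sshd_config settings that allow password-based logins."""
    settings = {}
    for raw in sshd_config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and comments
        parts = line.split(None, 1)
        if len(parts) == 2:
            # sshd uses the first value obtained for each keyword
            settings.setdefault(parts[0].lower(), parts[1].strip().lower())

    findings = []
    # PasswordAuthentication defaults to "yes" when not set
    if settings.get("passwordauthentication", "yes") == "yes":
        findings.append("password authentication enabled")
    if settings.get("permitrootlogin") == "yes":
        findings.append("root login with password permitted")
    if settings.get("permitemptypasswords") == "yes":
        findings.append("empty passwords permitted")
    return findings
```

Since the check runs on crawled file contents, it can be applied to an image before it goes live as well as to a running instance.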
23. 22
How can I track, query and analyze my configurations in a simple
and robust manner for drift/config analytics?
How can I do better resource management and allocation?
Deep Visibility Operational Insights/Analytics Solve Real Problems
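For the drift/config analytics question on this slide, one simple way to picture the idea is a diff between two crawled configuration snapshots. This is a minimal sketch under the assumption that each snapshot has been flattened to a `{key: value}` dictionary; the function name and the sample keys are hypothetical.

```python
def config_drift(baseline, current):
    """Compare two flat {key: value} configuration snapshots."""
    added = {k: current[k] for k in current.keys() - baseline.keys()}
    removed = {k: baseline[k] for k in baseline.keys() - current.keys()}
    changed = {
        k: (baseline[k], current[k])
        for k in baseline.keys() & current.keys()
        if baseline[k] != current[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Illustrative snapshots of the same system at two points in time.
before = {"sshd.PermitRootLogin": "no", "sysctl.net.ipv4.ip_forward": "0"}
after = {"sshd.PermitRootLogin": "yes", "nginx.worker_processes": "4"}
drift = config_drift(before, after)
```

Here `drift["changed"]` holds the old and new value of each key that differs, which is the raw material for a drift report.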
24. 23
DEMO TIME
This Session
Vulnerability Advisor, Policy Mgr
Go to Bluemix Catalog
See VA Image Status
(Safe, Caution, Blocked)
Go to Create View
Explore Status Details
(Vulnerabilities, Policy Violations)
Browse Policy Manager
(Policy Settings, Deployment Impact)
Change Org Policies
Override Policies
(Don’t do it)
See Weak Password Discovery
Update Image in Local Dev
Fix Policy Violation
Previously
Built-in Monitoring & Logging
We just did that one…
26. 25
Deployment Status
Login to Bluemix London
(https://console.eu-gb.bluemix.net/)
Go to Catalog and Look for Containers
Hover over containers to see VA verdict:
Safe to Deploy
27. 26
Deployment Status
Login to Bluemix London
(https://console.eu-gb.bluemix.net/)
Go to Catalog and Look for Containers
Hover over containers to see VA verdict:
Safe to Deploy | Deploy with Caution
28. 27
Deployment Status
Login to Bluemix London
(https://console.eu-gb.bluemix.net/)
Go to Catalog and Look for Containers
Hover over containers to see VA verdict:
Safe to Deploy | Deploy with Caution | Blocked
29. 28
Create View
Login to Bluemix London
(https://console.eu-gb.bluemix.net/)
Go to Catalog and Look for Containers
Hover over containers to see VA verdict:
Safe to Deploy | Deploy with Caution | Blocked
Click on Image to go to Create View
See Verdict Details and Explore Options
30. 29
Vulnerability Advisor Report
Login to Bluemix London
(https://console.eu-gb.bluemix.net/)
Go to Catalog and Look for Containers
Hover over containers to see VA verdict:
Safe to Deploy | Deploy with Caution | Blocked
Click on Image to go to Create View
See Verdict Details and Explore Options
View Vulnerability Advisor Report:
Discovered Vulnerabilities | Policy Violations
32. 31
Policy Manager and Deployment Impact
Login to Bluemix London
(https://console.eu-gb.bluemix.net/)
Go to Catalog and Look for Containers
Hover over containers to see VA verdict:
Safe to Deploy | Deploy with Caution | Blocked
Click on Image to go to Create View
See Verdict Details and Explore Options
View Vulnerability Advisor Report:
Discovered Vulnerabilities | Policy Violations
Policy Manager and Deployment Impact
33. 32
Policy Manager and Deployment Impact
Login to Bluemix London
(https://console.eu-gb.bluemix.net/)
Go to Catalog and Look for Containers
Hover over containers to see VA verdict:
Safe to Deploy | Deploy with Caution | Blocked
Click on Image to go to Create View
See Verdict Details and Explore Options
View Vulnerability Advisor Report:
Discovered Vulnerabilities | Policy Violations
Policy Manager and Deployment Impact
Change Org Policy and Observe Impact
34. 33
Policy Override
Login to Bluemix London
(https://console.eu-gb.bluemix.net/)
Go to Catalog and Look for Containers
Hover over containers to see VA verdict:
Safe to Deploy | Deploy with Caution | Blocked
Click on Image to go to Create View
See Verdict Details and Explore Options
View Vulnerability Advisor Report:
Discovered Vulnerabilities | Policy Violations
Policy Manager and Deployment Impact
Change Org Policy and Observe Impact
Create View > Click One-time Override
Name your risky container and deploy
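The demo shows three verdicts and an org policy that changes deployment impact. How Vulnerability Advisor actually computes a verdict is not described in these slides; the sketch below is only one plausible reading, with hypothetical class names, fields and rules, to show how a policy change can move an image between verdicts.

```python
from dataclasses import dataclass


@dataclass
class ImageReport:
    vulnerable_packages: int   # count of discovered vulnerabilities
    policy_violations: int     # count of policy violations


@dataclass
class OrgPolicy:
    # Hypothetical policy switches, invented for this illustration.
    block_on_vulnerabilities: bool = True
    block_on_policy_violations: bool = False


def verdict(report, policy):
    """Map a report and an org policy to one of the three verdicts."""
    blocking = (
        (policy.block_on_vulnerabilities and report.vulnerable_packages > 0)
        or (policy.block_on_policy_violations and report.policy_violations > 0)
    )
    if blocking:
        return "Blocked"
    if report.vulnerable_packages or report.policy_violations:
        return "Deploy with Caution"
    return "Safe to Deploy"
```

Under these assumed rules, an image with one policy violation is "Deploy with Caution" by default and becomes "Blocked" once the org turns on `block_on_policy_violations`.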
36. 35
Some Nostalgia: Big Vision = Systems as Data
Transform systems into documents/frames/data
Crawl the cloud like you crawl the web
Query & mine the cloud like you query/mine the web
Learn good & bad system/SW configurations automagically
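To make the "query the cloud like the web" idea concrete, here is a small sketch that treats each crawled system as a document and builds an inverted index over its features, much as a web search index maps terms to pages. The frame layout, system names and feature keys are assumptions for illustration only.

```python
from collections import defaultdict


def build_inverted_index(frames):
    """Map each (feature, value) pair to the set of systems exhibiting it."""
    index = defaultdict(set)
    for system, features in frames.items():
        for feature, value in features.items():
            index[(feature, value)].add(system)
    return index


# Illustrative frames: one flat feature dictionary per crawled system.
frames = {
    "web-1": {"os": "ubuntu-14.04", "pkg.openssl": "1.0.1f"},
    "web-2": {"os": "ubuntu-14.04", "pkg.openssl": "1.0.2g"},
    "db-1": {"os": "centos-7", "pkg.openssl": "1.0.1f"},
}
index = build_inverted_index(frames)
affected = index[("pkg.openssl", "1.0.1f")]  # systems running that version
```

A single lookup answers "which systems run this package version?" without touching any of the systems again.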
37. 36
Operational Analytics Data Pipeline [Where We Started]
Images
(Registry)
Kafka
Configuration Channel
Compliance Channel
Vulnerability Channel
Indexers
Vulnerability Annotator
Elastic
Configuration Index
Compliance Index
Vulnerability Index
Compliance Annotator
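The flow in this diagram can be pictured with a toy in-memory version. The sketch below is an assumption about how the pieces connect (crawled frames published on a configuration channel, an annotator deriving vulnerability records, indexers storing every channel); a plain Python class stands in for Kafka, a dictionary stands in for the Elastic indices, and all names and message shapes are hypothetical. The compliance path is omitted for brevity.

```python
from collections import defaultdict


class Bus:
    """Tiny stand-in for a message bus with named channels."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, channel, handler):
        self.subscribers[channel].append(handler)

    def publish(self, channel, message):
        for handler in self.subscribers[channel]:
            handler(message)


def build_pipeline(known_vulnerable):
    bus = Bus()
    index = defaultdict(list)  # stands in for the per-channel indices

    def vulnerability_annotator(frame):
        # Annotate a crawled frame and publish the result on its own channel.
        hits = sorted(p for p in frame["packages"] if p in known_vulnerable)
        bus.publish("vulnerability", {"image": frame["image"], "vulnerable": hits})

    bus.subscribe("configuration", vulnerability_annotator)
    # Indexers: store every message seen on each channel.
    for channel in ("configuration", "vulnerability"):
        bus.subscribe(channel, lambda msg, ch=channel: index[ch].append(msg))
    return bus, index
```

Adding a new analytic in this model means adding one more subscriber and one more channel, which matches the growth from this slide to the next.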
38. 37
Operational Analytics Data Pipeline [Where We Are]
Images
(Registry) Notification Channel
Kafka
Configuration Channel
Compliance Channel
Vulnerability Channel
Indexers
Vulnerability Annotator
Discovery Channel
Instances
(Compute) SecConfig Channel
Rootkit Channel
Licence Channel
Notification Index
Elastic
Configuration Index
Compliance Index
Vulnerability Index
Discovery Index
SecConfig Index
Rootkit Index
Licence Index
USNs Index
Compliance Annotator
Password Annotator
Config Parser
SecConfig Annotator
SW Discovery
Rootkit Annotator
Licence Discovery
Notification Parser
Security
Notices
39. 38
Our Other Key Operational Analytics Directions
Config Analytics | SW and System Discovery by Examples
Secure Config Advisor | Cloud Time Machine
Risk Analysis | Licence Discovery
[Figure labels: Licence Discovery, Data Pipeline, Licence Db, Img]
40. 39
Summary & Open Problems
Summary:
Challenges: Operational visibility into complex cloud applications; need for real operational intelligence
Opportunities: Transform systems to data; New line of ops data analytics; So many low-hanging pain points
Agentless System Crawler and Vulnerability Advisor as simple ground-floor examples
Parting Thoughts:
Operational Visibility >> Metrics & Logs (although a good start, add state, config, interactions, dependencies,…)
Cloud lends itself to novel & elegant “monalytics” designed with cloud-native thinking
Everything analytics can be as-a-service when we decouple systems | observations | recommendations | actions
Open Research Questions:
Truly Seamless OpVis: No performance impact (~/~) + Absolutely no side effects (+/-)
Extensibility and configurability: Deep visibility into system, application and infra
Scale out across runtimes and scale up to many instances; challenges & limits
How do you design DDoS mitigation/admission control/fair sharing in this model of built-in service?
Privacy and data sensitivity with Ops data analytics
Piecemeal analytics/security solutions -> Cloud analytics/security roadmap
Rules/annotators -> Actually smart analytics that learn good and bad configs for security, performance, availability, etc.
Cross-silo analytics across Time, Space, Dev/Ops [CloudSight Dream]
41. 40
The More You Know
Papers:
Operational Visibility: IC2E’14, Sigmetrics’14, VEE’15, HotCloud’15, ATC’16 (InterConnect’15)
Operational Analytics: BigData’14, IBM JRD’16:{SWDisc,NFM,DevOps} (InterConnect’16)
Blogs:
Crawl the Cloud Like You Crawl the Web:
https://developer.ibm.com/open/2015/07/18/crawl-cloud-like-crawl-web/
Monitoring and Logging for IBM Containers. No configuration needed:
https://developer.ibm.com/bluemix/2015/07/06/monitoring-and-logging-for-containers-no-config-required/
Test Driving Built-in Monitoring and Logging in IBM Containers:
https://developer.ibm.com/bluemix/2015/11/16/built-in-monitoring-and-logging-for-bluemix-containers/
Is your Docker container secure? Ask Vulnerability Advisor!:
https://developer.ibm.com/bluemix/2015/07/02/vulnerability-advisor/
Demos:
https://www.youtube.com/channel/UCf8Fn8dKQzBCJRgI1jOlGYg
Open Source:
dwOpen Tech Talk: https://developer.ibm.com/open/events/dw-open-tech-talk-agentless-system-crawler/
dwOpen Page: https://developer.ibm.com/open/agentless-system-crawler/
Agentless System Crawler: http://github.com/cloudviz/agentless-system-crawler
PSVMI Introspection Library: https://github.com/cloudviz/psvmi
Try It:
As-a-service today: http://www.bluemix.net
Run it yourself: http://github.com/cloudviz/agentless-system-crawler
42. 41
Thank You
Seamless, Unified Operational Visibility and Analytics Designed for Cloud
[feat. Agentless System Crawler & Vulnerability Advisor]
IBM Research
Cloud Monitoring, Operational and DevOps Analytics
http://www.canturkisci.com/blog
@canturkisci