Season 3 Episode 2
Oct 14, 2015
Welcome!
Agenda
NetflixOSS Website Relaunch @aspyker
Fenzo @podila
Vector @spiermar
Linux Java perf support @brendangregg
FIDO, Sleepy Puppy, Lemur @chanjbs
Falcor @jhusain
Website Relaunch
http://netflix.github.io
Goals of the Relaunch
● Show how the pieces fit together
○ Projects now discussed with each other in context
● OSS categories mirror internal teams
○ No artificial categories, focal points for each area
● Focus on projects that are core to Netflix
○ Projects mentioned are core and strategic
● Adding project-branded websites
High Level Categories
Big Data
Tools and services for (big) data
Build and Delivery Tools
Taking code from desktop to the cloud
Common Runtime Services & Libraries
Runtime containers, libraries & services that power microservices
High Level Categories
Data Persistence
Storing and serving data in the cloud
Insight, Reliability and Performance
Providing actionable insight at massive scale
High Level Categories
Security
Security for dynamic and distributed environments
User Interface
Libraries to help you build rich client applications
Fenzo
A generic, plugin-based scheduling library for Apache Mesos frameworks
Fenzo scheduling library
● Heterogeneous resources
● Autoscaling of cluster
● Visibility of scheduler actions
● Plugins for constraints, fitness
● High speed
● Heterogeneous task requests
Fenzo: scheduling model
Diagram: Pending and Assigned tasks, with Fitness and Urgency
Fenzo: scheduling optimizations
● Speed: first fit assignment, ~O(1)
● Accuracy: optimal assignment, ~O(N * M)¹
● Real world tradeoffs sit between these two extremes
¹ Assuming tasks are not reassigned
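To make the speed/accuracy tradeoff concrete, here is a minimal Python sketch (not Fenzo's Java API; every name is hypothetical). First fit stops at the first host with enough free CPUs, while the accuracy end examines every host, scores each with a fitness function (here a CPU bin-packing score between 0.0 and 1.0) and keeps the best. In this sketch the `examined` counter shows why the latter costs M host evaluations per task, roughly N * M for N tasks.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Host:
    name: str
    cpus_free: float
    cpus_total: float

@dataclass
class Task:
    name: str
    cpus: float

# A fitness function scores a (task, host) pair between 0.0 and 1.0.
Fitness = Callable[[Task, Host], float]

def cpu_packing_fitness(task: Task, host: Host) -> float:
    """Fraction of the host's CPUs in use after placing the task (higher = tighter packing)."""
    used_after = host.cpus_total - host.cpus_free + task.cpus
    return used_after / host.cpus_total

def first_fit(task: Task, hosts: List[Host]) -> Tuple[Optional[Host], int]:
    """Return the first host with enough free CPUs, plus how many hosts were examined."""
    examined = 0
    for host in hosts:
        examined += 1
        if host.cpus_free >= task.cpus:
            return host, examined
    return None, examined

def best_fit(task: Task, hosts: List[Host], fitness: Fitness) -> Tuple[Optional[Host], int]:
    """Examine every host and return the feasible one with the highest fitness."""
    best: Optional[Host] = None
    best_score = -1.0
    examined = 0
    for host in hosts:
        examined += 1
        if host.cpus_free >= task.cpus:
            score = fitness(task, host)
            if score > best_score:
                best, best_score = host, score
    return best, examined
```

With two 8-CPU hosts that have 4 and 1 CPUs free, a 1-CPU task lands on the first host under first fit after one check, but on the nearly full second host under best fit, which has to look at both.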
Fenzo: fitness, constraints plugins
● Fitness value (0.0 - 1.0)
○ Degree of fitness - first fit, best fit, worst fit
○ Composable evaluators
○ e.g., bin packing
● Constraints
○ Hard constraints filter appropriate resources
○ Soft constraints specify preferences
○ e.g., zone balancing, instance type preferences
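The plugin model above can be sketched as plain functions: hard constraints return yes/no and filter hosts, while soft constraints and fitness evaluators each return a 0.0 to 1.0 score that can be composed. This is a hypothetical Python illustration, not Fenzo's plugin interfaces; in particular, averaging is just one assumed way to combine scores.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass
class Host:
    name: str
    zone: str
    instance_type: str
    cpus_free: float
    cpus_total: float

@dataclass
class Task:
    name: str
    cpus: float

HardConstraint = Callable[[Task, Host], bool]   # True = host is allowed
Evaluator = Callable[[Task, Host], float]       # score in 0.0..1.0

def has_enough_cpus(task: Task, host: Host) -> bool:
    return host.cpus_free >= task.cpus

def cpu_bin_packing(task: Task, host: Host) -> float:
    # Prefer hosts that will be fuller after placement.
    return (host.cpus_total - host.cpus_free + task.cpus) / host.cpus_total

def prefer_instance_type(wanted: str) -> Evaluator:
    # Soft constraint: full score on the preferred type, zero otherwise.
    return lambda task, host: 1.0 if host.instance_type == wanted else 0.0

def zone_balance(tasks_per_zone: Dict[str, int]) -> Evaluator:
    # Soft constraint: favor zones currently running fewer of this job's tasks.
    busiest = max(tasks_per_zone.values(), default=0)

    def score(task: Task, host: Host) -> float:
        if busiest == 0:
            return 1.0
        return 1.0 - tasks_per_zone.get(host.zone, 0) / busiest

    return score

def compose(*evaluators: Evaluator) -> Evaluator:
    # Averaging keeps the combined score inside 0.0..1.0.
    return lambda task, host: sum(e(task, host) for e in evaluators) / len(evaluators)

def pick_host(task: Task, hosts: List[Host],
              hard: List[HardConstraint], fitness: Evaluator) -> Optional[Host]:
    # Hard constraints filter; the composed fitness ranks whatever survives.
    candidates = [h for h in hosts if all(c(task, h) for c in hard)]
    if not candidates:
        return None
    return max(candidates, key=lambda h: fitness(task, h))
```

Keeping each evaluator small and composing them is what lets a scheduler mix concerns such as bin packing, zone balancing and instance type preferences without one monolithic scoring function.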
Fenzo: bin packing experiment
Bin pack tasks using Fenzo’s built-in CPU bin packer
Fenzo: cluster autoscaling
● ASG/Cluster rules (mantisagent, computeCluster): MinIdle: 8, MaxIdle: 20, CooldownSecs: 360
● Fenzo emits scaling actions:
○ ScaleUp action: Cluster, N
○ ScaleDown action: Cluster, HostList
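The rule fields (MinIdle, MaxIdle, CooldownSecs) and the two action shapes can be read as a simple control loop. The Python sketch below is a hypothetical illustration, not Fenzo's autoscaler: it assumes a scale-up adds just enough hosts to reach MinIdle, and a scale-down names the specific idle hosts beyond MaxIdle, mirroring "Cluster, N" and "Cluster, HostList".

```python
from dataclasses import dataclass
from typing import List, Optional, Union

@dataclass
class ScalingRule:
    cluster: str
    min_idle: int
    max_idle: int
    cooldown_secs: int

@dataclass
class ScaleUp:
    cluster: str
    count: int          # "Cluster, N"

@dataclass
class ScaleDown:
    cluster: str
    hosts: List[str]    # "Cluster, HostList"

Action = Union[ScaleUp, ScaleDown]

def evaluate(rule: ScalingRule, idle_hosts: List[str], now: float,
             last_action_at: Optional[float]) -> Optional[Action]:
    # Do nothing while still cooling down from the previous action on this cluster.
    if last_action_at is not None and now - last_action_at < rule.cooldown_secs:
        return None
    idle = len(idle_hosts)
    if idle < rule.min_idle:
        # Assumption: add just enough hosts to get back to MinIdle.
        return ScaleUp(rule.cluster, rule.min_idle - idle)
    if idle > rule.max_idle:
        # Name the exact hosts to release (here simply the first excess ones by name).
        return ScaleDown(rule.cluster, sorted(idle_hosts)[: idle - rule.max_idle])
    return None
```

Returning a host list for scale-down, rather than a count, lets the scheduler choose which machines to drain instead of leaving that to the cloud provider.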
Fenzo: what’s next
● Task management SLAs
● Support for newer Mesos features
● Collaboration
Vector
Why?
● Easier way for users to troubleshoot performance issues
● Access to low-level and specialized metrics
● Easier way to visualize and understand metrics
● High-resolution data to detect anomalies
● Real-time and on-demand
● No additional overhead when not in use
● Something easier than SSH
● And simpler than a full-fledged monitoring solution
What?
● A Performance Monitoring tool
● Host-Level, On-Demand, High-Resolution Metrics (1 second)
● Client-side application with a user-friendly web UI
● Configurable dashboards and widgets
● Leverages Performance Co-Pilot (PCP)
● Stateless and lightweight metric collection
● No persistence
● System Metrics: CPU, Memory, Network, Disk, ...
● Application Metrics*: Java, Memcached, Cassandra (C*), Elasticsearch, Apache
● Extensible: custom metric agents and widgets
* Agents are available but not included by default.
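Many PCP metrics, such as CPU time or network bytes, are cumulative counters, so a client polling once a second has to turn successive samples into per-second rates. The sketch below assumes that model; the class name and window size are hypothetical and this is not Vector's code. It keeps only a bounded in-memory window, in line with the "no persistence" point.

```python
from collections import deque
from typing import Deque, Optional, Tuple

class RateWindow:
    """Per-second rates derived from a cumulative counter, kept only in memory."""

    def __init__(self, max_points: int = 60):
        # Oldest points fall off automatically once max_points is reached.
        self.points: Deque[Tuple[float, float]] = deque(maxlen=max_points)
        self._prev: Optional[Tuple[float, float]] = None  # (timestamp_secs, counter_value)

    def add_sample(self, ts: float, value: float) -> None:
        if self._prev is not None:
            prev_ts, prev_val = self._prev
            dt = ts - prev_ts
            delta = value - prev_val
            # Skip non-advancing timestamps and counter resets (value went backwards).
            if dt > 0 and delta >= 0:
                self.points.append((ts, delta / dt))
        self._prev = (ts, value)
```

After a counter reset the first interval is dropped rather than reported as a huge negative rate; the next sample resumes normally.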
What’s Next?
● Interface for different backends
● Better support for containers:
○ With container-specific dashboards and widgets
● Native flame graph integration:
○ With our d3.js flame graph plugin
CPU Flame Graphs
Java Mixed-Mode Flame Graphs
● Needs JDK8u60+ with -XX:+PreserveFramePointer
○ May have some cost
● Lets Linux perf (perf_events) see Java method frames
● Use with perf-map-agent for symbols
● http://techblog.netflix.com/2015/07/java-in-flames.html
Flame graph regions: Java, Kernel, JVM, GC
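perf-map-agent writes a `/tmp/perf-<pid>.map` file that perf reads to name JIT-compiled Java code; each line is `START SIZE symbolname`, with START and SIZE in hex. This Python sketch shows the lookup idea under that format (sort the entries, then binary-search the address). It is an illustration, not perf's implementation.

```python
import bisect
from typing import List, Optional, Tuple

class PerfMap:
    """Resolve addresses using perf map lines of the form '<start> <size> <symbol>' (hex)."""

    def __init__(self, text: str):
        entries: List[Tuple[int, int, str]] = []
        for line in text.splitlines():
            parts = line.split(" ", 2)   # symbol names may themselves contain spaces
            if len(parts) != 3:
                continue
            entries.append((int(parts[0], 16), int(parts[1], 16), parts[2]))
        entries.sort()
        self._entries = entries
        self._starts = [start for start, _, _ in entries]

    def resolve(self, addr: int) -> Optional[str]:
        # Find the last entry starting at or before addr, then check it covers addr.
        i = bisect.bisect_right(self._starts, addr) - 1
        if i < 0:
            return None
        start, size, symbol = self._entries[i]
        return symbol if addr < start + size else None
```

Without such a map, JIT-compiled frames show up in perf output as bare hex addresses; PreserveFramePointer is what lets perf walk through those frames in the first place.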
See all the things...
D3.js Flame Graph Plugin
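Flame graphs start from stack samples "folded" into one line per unique stack (`frame1;frame2;... count`), while a renderer such as the d3.js plugin draws a nested tree of nodes. The sketch below converts folded lines into `name`/`value`/`children` nodes. The inclusive-count convention and the `root` node name are assumptions here, so check the plugin's expected input before reusing it.

```python
from typing import Any, Dict

def folded_to_tree(folded: str) -> Dict[str, Any]:
    """Convert 'a;b;c 12' lines into a nested {'name', 'value', 'children'} tree."""
    root: Dict[str, Any] = {"name": "root", "value": 0, "children": []}
    for line in folded.splitlines():
        line = line.strip()
        if not line:
            continue
        stack, _, count = line.rpartition(" ")
        samples = int(count)
        root["value"] += samples
        node = root
        for frame in stack.split(";"):
            # Reuse an existing child with the same frame name, else create it.
            child = next((c for c in node["children"] if c["name"] == frame), None)
            if child is None:
                child = {"name": frame, "value": 0, "children": []}
                node["children"].append(child)
            # Each node counts every sample that passed through it (inclusive).
            child["value"] += samples
            node = child
    return root
```

Because identical prefixes merge into one node, a frame's width in the rendered graph is proportional to how often it appeared on-CPU, whatever it called.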
Netflix Security OSS
FIDO - Security Response Orchestration
● Centralize alerts
● Enrich with data
○ User, machine
○ Threat
● Prioritize response
● Automate first actions
Netflix's FIDO is not a part of or service of the FIDO Alliance
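The FIDO flow above (centralize, enrich, prioritize, act) can be sketched as a small pipeline. Everything here is hypothetical: the lookup tables stand in for directory, asset and threat-intel queries, and the weights, threshold and action names are illustrative, not FIDO's scoring model.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class Alert:
    source: str       # which detector raised it
    user: str
    host: str
    indicator: str    # e.g. a domain or file hash
    enrichment: Dict[str, object] = field(default_factory=dict)
    score: int = 0

def enrich(alert: Alert, users: Dict, machines: Dict, threats: Dict[str, int]) -> Alert:
    # Attach user, machine and threat context from (hypothetical) lookup tables.
    alert.enrichment = {
        "user_is_admin": users.get(alert.user, {}).get("admin", False),
        "host_is_prod": machines.get(alert.host, {}).get("prod", False),
        "threat_severity": threats.get(alert.indicator, 0),  # 0..10
    }
    return alert

def prioritize(alert: Alert) -> Alert:
    e = alert.enrichment
    score = int(e["threat_severity"]) * 10
    if e["user_is_admin"]:
        score += 20
    if e["host_is_prod"]:
        score += 20
    alert.score = min(score, 100)
    return alert

def first_action(alert: Alert, threshold: int = 70) -> str:
    # Automate only the first step; analysts take it from there.
    return "isolate_host" if alert.score >= threshold else "queue_for_analyst"

def triage(alerts: List[Alert], users: Dict, machines: Dict,
           threats: Dict[str, int]) -> List[Tuple[Alert, str]]:
    scored = [prioritize(enrich(a, users, machines, threats)) for a in alerts]
    scored.sort(key=lambda a: a.score, reverse=True)
    return [(a, first_action(a)) for a in scored]
```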
Cross-Site Scripting
Sleepy Puppy - XSS Testing Framework
● Visibility for non-targeted vulnerable apps
● Assessment management over time
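The "non-targeted" visibility comes from payloads that call back when they render, possibly in a different application from the one they were injected into. A minimal sketch of that bookkeeping, assuming a hypothetical collector URL and token scheme (not Sleepy Puppy's actual payload format or API):

```python
import secrets
from dataclasses import dataclass
from typing import Dict, Optional

CALLBACK_URL = "https://xss-collector.example.com/cb.js"  # hypothetical collector endpoint

@dataclass
class Injection:
    assessment: str   # the test campaign this payload belongs to
    target_app: str   # where the payload was submitted

class PayloadRegistry:
    def __init__(self) -> None:
        self._by_token: Dict[str, Injection] = {}

    def make_payload(self, assessment: str, target_app: str) -> str:
        # A unique token ties any later callback back to this exact injection.
        token = secrets.token_hex(8)
        self._by_token[token] = Injection(assessment, target_app)
        return f'"><script src="{CALLBACK_URL}?t={token}"></script>'

    def record_callback(self, token: str, fired_in: str) -> Optional[Dict[str, object]]:
        injection = self._by_token.get(token)
        if injection is None:
            return None  # unknown or forged token
        return {
            "assessment": injection.assessment,
            "injected_into": injection.target_app,
            "fired_in": fired_in,
            # True when the payload executed somewhere other than the tested app
            "non_targeted": fired_in != injection.target_app,
        }
```

Keeping the token-to-assessment mapping server-side is what allows callbacks that arrive weeks later to be attributed to the right assessment.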
Sleepy Puppy - Assessments and Payloads
TLS Certificate Management
Lemur - x.509 Certificate Orchestration
● Pluggable CA support
● Private key management and distribution
● Expiry monitoring
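Expiry monitoring boils down to comparing each certificate's expiry (notAfter) time against a set of notification windows. The sketch below assumes illustrative windows of 30, 15 and 2 days and a hypothetical `Certificate` record; it is not Lemur's notification code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Sequence

@dataclass
class Certificate:
    common_name: str
    not_after: datetime   # expiry timestamp, same timezone convention as `now`

def expiry_buckets(certs: Iterable[Certificate], now: datetime,
                   windows_days: Sequence[int] = (30, 15, 2)) -> Dict[int, List[str]]:
    """Place each certificate in the tightest window it falls into; key 0 = already expired."""
    windows = sorted(windows_days)
    buckets: Dict[int, List[str]] = {0: [], **{d: [] for d in windows}}
    for cert in certs:
        remaining = cert.not_after - now
        if remaining <= timedelta(0):
            buckets[0].append(cert.common_name)
            continue
        for days in windows:          # smallest window first
            if remaining <= timedelta(days=days):
                buckets[days].append(cert.common_name)
                break
    return buckets
```

Certificates outside every window are simply left out, so a daily run only notifies owners whose certificates are close to expiring.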
Lemur Certificate Request
Every user wants to believe the entire cloud is sitting right on their device.
Falcor lets you code that way.
Let's talk about REST.
The Web used to be a place to get things.
Today, the Web is a place to do things.
Web pages use a small number of large resources.
Web apps use large numbers of small resources.
What is Falcor?
Falcor is not a replacement for your Database, MVC Framework, or your Web Server.
Falcor fits into your existing stack, allowing the layers to communicate more efficiently.
model.json
Demo
Falcor
● Designed for needs of Web Apps
● Model domain with JSON Graph
● Optimizes Data Access using...
○ caching
○ batching
○ path optimization
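In JSON Graph, a value that appears in several places is stored once and referenced elsewhere with a `{"$type": "ref", "value": [...path...]}` node. The Python sketch below (Falcor itself is JavaScript, and these names are not its API) shows path evaluation following refs through a local cache, and how all cache misses from one `get` go out as a single batched request whose JSON Graph fragment is merged back into the cache. The `fetch` callable is a hypothetical stand-in for a request to the server's model.json route.

```python
from typing import Any, Callable, Dict, List, Tuple

Path = List[Any]

def is_ref(node: Any) -> bool:
    return isinstance(node, dict) and node.get("$type") == "ref"

def resolve(graph: Dict, path: Path) -> Tuple[bool, Any]:
    """Walk path through graph, following ref nodes. Returns (found, value)."""
    node: Any = graph
    for key in path:
        while is_ref(node):                      # jump to where the ref points
            found, node = resolve(graph, node["value"])
            if not found:
                return False, None
        if not isinstance(node, dict) or key not in node:
            return False, None
        node = node[key]
    return True, node

def merge(dst: Dict, src: Dict) -> None:
    """Deep-merge a JSON Graph fragment into the cache; $type nodes are replaced whole."""
    for key, value in src.items():
        current = dst.get(key)
        if (isinstance(value, dict) and "$type" not in value
                and isinstance(current, dict) and "$type" not in current):
            merge(current, value)
        else:
            dst[key] = value

class Model:
    def __init__(self, cache: Dict, fetch: Callable[[List[Path]], Dict]):
        self.cache = cache
        self.fetch = fetch        # hypothetical: list of paths -> JSON Graph fragment
        self.round_trips = 0

    def get(self, *paths: Path) -> Dict[Tuple, Any]:
        results: Dict[Tuple, Any] = {}
        missing: List[Path] = []
        for path in paths:
            found, value = resolve(self.cache, path)
            if found:
                results[tuple(path)] = value   # served from cache, no network
            else:
                missing.append(path)
        if missing:
            self.round_trips += 1              # one batched request for all misses
            merge(self.cache, self.fetch(missing))
            for path in missing:
                found, value = resolve(self.cache, path)
                if found:
                    results[tuple(path)] = value
        return results
```

Because the ref means the video is stored once under its ID, a value fetched through one list is immediately available through every other path that references it.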
Falcor Roadmap
● netflix.github.io/falcor
● Java version of the Router coming
● iOS client coming
Wrapup
● Thanks for attending!
● Join us in the courtyard for food and drinks

NetflixOSS Meetup season 3 episode 2
