Diving Deep into the API
Ocean with Open Source
Deep Learning Tools
Paul M. Cray, APImetrics
Who are APImetrics?
Seattle-based startup
Blue-chip clients include banks, fintech, carriers, utilities
and vehicle IoT
• APImetrics makes individual functional API calls or sequences of calls
• Synthetic test calls can be scheduled to be made from any location
in any of the 4 main clouds (AWS, Azure, Google, IBM)
• Codebase written in Python with JavaScript for UI
• Data is analyzed with ML and AI functionality that we are developing
using open source tools
What does APImetrics do?
• To manage your APIs, you need to understand how they actually behave
from the end user’s perspective in the real world
• APImetrics is an API performance and quality monitoring system running as
Software-as-a-Service on Google App Engine
• We provide wizards that let users easily create authentications, test calls and
workflows (back-to-back calls)
• Test calls can be deployed to more than 60 cloud locations on four continents to
make scheduled calls that exercise API endpoints
• We support our own API to facilitate deep integration into higher-level
management systems
What does APImetrics look like?
APImetrics 4.7TB historical dataset
• Over 400M API call records made from multiple clouds and locations
• We retain all data associated with each call, including the payload,
to give a complete picture of API performance (a sketch of one record follows this list)
– Timestamp of call
– API endpoint
– Call cloud location
– HTTP response code
– Payload
– Latency breakdown times
• DNS lookup, Connect, Handshake, Upload, Processing, Download
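As a rough sketch, one of these call records could be modeled in Python as below; the field names are illustrative, not APImetrics’ actual schema:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class CallRecord:
    """One synthetic API test call (illustrative schema, not the production one)."""
    timestamp: datetime   # when the call was made
    endpoint: str         # API endpoint exercised
    cloud_location: str   # e.g. "aws-us-west-2"
    http_status: int      # HTTP response code
    payload: bytes        # full response payload, retained for analysis
    # Latency breakdown, all in milliseconds
    dns_ms: float
    connect_ms: float
    handshake_ms: float
    upload_ms: float
    processing_ms: float
    download_ms: float

    @property
    def total_latency_ms(self) -> float:
        return (self.dns_ms + self.connect_ms + self.handshake_ms
                + self.upload_ms + self.processing_ms + self.download_ms)
```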
APImetrics Insights CASC score
• What metric do you use to measure API performance?
– Latency? Availability? Pass rate?
• Too many variables to compare and contrast API quality easily
• APImetrics uses its own magic sauce to combine these metrics into a
single blended, credit-rating-like score (illustrated below)
• CASC score allows at-a-glance like-to-like comparison and trend
analysis of the performance and quality of different API calls
• CASC scores are currently calculated on a weekly and monthly
basis, with daily scores coming soon
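Purely as an illustration of the blending idea (this is not the actual CASC formula, which is proprietary), a handful of normalized metrics could be combined with weights into one credit-rating-like number:

```python
def blended_score(pass_rate, availability, median_latency_ms,
                  latency_budget_ms=1000.0,
                  weights=(0.4, 0.3, 0.3)):
    """Toy blend of API quality metrics into a single 0-1000 score.

    Illustrative only -- the real CASC score uses APImetrics' own
    (patent-pending) combination of metrics.
    """
    # Normalize each metric to [0, 1]; lower latency is better.
    latency_score = max(0.0, 1.0 - median_latency_ms / latency_budget_ms)
    components = (pass_rate, availability, latency_score)
    blended = sum(w * c for w, c in zip(weights, components))
    return round(1000 * blended)

print(blended_score(pass_rate=0.99, availability=0.999,
                    median_latency_ms=250))  # -> 921
```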
Typical APImetrics Insights CASC scores
The CASC score and Machine Learning
CHALLENGE: How do we calculate CASC scores in real time? What
do we need?
• A more robust (patent application in progress) method for calculating
the CASC score that leverages our unrivaled historical dataset
• Uses supervised learning with linear regression to calculate
CASC parameters
• The Python scikit-learn package, along with numpy, pandas, scipy and
statsmodels, is used in APImetrics Insights (sketch below)
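A minimal sketch of that kind of pipeline with scikit-learn; the features, target and synthetic data here are assumptions for illustration, not APImetrics’ production model:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for historical call records: latency phases as
# features, a historical quality score as the supervised target.
np.random.seed(0)
df = pd.DataFrame({
    "dns_ms": np.random.gamma(2.0, 10.0, 500),
    "connect_ms": np.random.gamma(2.0, 20.0, 500),
    "processing_ms": np.random.gamma(2.0, 50.0, 500),
})
df["score"] = 1000 - 2.0 * df.sum(axis=1) + np.random.normal(0, 20, 500)

features = ["dns_ms", "connect_ms", "processing_ms"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["score"], random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("Held-out R^2:", model.score(X_test, y_test))
print("Fitted parameters:", model.coef_, model.intercept_)
```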
It’s 2017. How about a
neural net?
The components to be looked at
• Outlier detection
• Handling multimodality
• Identifying clusters of related events
• Anomaly detection
Outlier detection
• Historically:
– A heuristic designated a record an outlier if its overall latency exceeded a certain
number of standard deviations from the mean (sketched after this list)
• Outlier detection is a visual problem
– We can see (some/most of) the outliers by eye
• How to use deep learning techniques to detect outliers?
– Implement a Recurrent Neural Net (RNN) to analyze the time series data?
– Implement a Convolutional Neural Net (CNN) to recognize outlier patterns?
– Use PyTorch, as it is emerging as the leading deep learning framework and
supports an idiomatic Python approach
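For reference, the historical standard-deviation heuristic is roughly the following (the threshold is an assumed value):

```python
import numpy as np

def latency_outliers(latencies_ms, n_sigma=2.0):
    """Flag calls whose overall latency is more than n_sigma standard
    deviations from the mean -- the simple historical heuristic."""
    latencies = np.asarray(latencies_ms, dtype=float)
    mean, std = latencies.mean(), latencies.std()
    return np.abs(latencies - mean) > n_sigma * std

# One very slow call among otherwise steady latencies
lat = [120, 130, 125, 118, 900, 122, 127]
print(latency_outliers(lat))  # only the 900 ms call is flagged
# Note how the outlier itself inflates the std -- one reason this
# heuristic breaks down on real, multimodal latency data.
```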
What outliers look like
Multimodality detection
• Latency distributions are typically neither unimodal nor normal
• Outlier detection heuristics that rely on that assumption are therefore flawed
• Reliable outlier detection must first determine modality
• Easy by eye, but sensitive to binning
– Use a CNN to detect modality?
– Use a clustering algorithm to assign modality? (one option sketched below)
– How to handle the binning problem?
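One possible clustering-style approach (an assumption on our part, not a settled design) is to fit Gaussian mixtures with increasing component counts and pick the count by BIC, which works on raw samples and sidesteps binning:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def estimate_modality(latencies_ms, max_modes=5):
    """Pick the number of latency modes by fitting 1..max_modes Gaussian
    mixture components and choosing the count with the lowest BIC.
    Fits the raw samples, so no histogram binning is involved."""
    X = np.asarray(latencies_ms, dtype=float).reshape(-1, 1)
    bics = [GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
            for k in range(1, max_modes + 1)]
    return int(np.argmin(bics)) + 1

# Synthetic bimodal latencies: fast responses plus a slower second mode
np.random.seed(0)
lat = np.concatenate([np.random.normal(100, 10, 400),
                      np.random.normal(400, 30, 200)])
print(estimate_modality(lat))  # expected to report 2 modes
```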
What multimodality looks like
Cluster detection
• Currently using a heuristic to construct clusters of outliers
– Much too simplistic
• Exploring algorithms like k-means as implemented in a package such
as scikit-learn (see the sketch after this list)
• But a result is more likely to be an outlier if it is close to other outliers,
i.e. if it is in a cluster
• We believe outlier and cluster detection should be done
simultaneously
– Investigating if an RNN can identify whether a record is an outlier and whether it
belongs to a cluster
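A first cut at the k-means idea with scikit-learn might look like this; the features and the choice of k are assumptions for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Assumed features per flagged record: time of call and overall latency.
# Outliers that land in the same cluster are likely part of one incident.
records = np.array([
    # [minutes_since_midnight, latency_ms]
    [600, 950], [602, 980], [605, 1010],   # a burst around 10:00
    [1300, 2200], [1303, 2150],            # a second burst around 21:40
    [200, 890],                            # an isolated slow call
])

X = StandardScaler().fit_transform(records)  # put features on one scale
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels)  # records sharing a label form a candidate cluster of outliers
```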
What clusters look like
APIs and AIPIs
• APImetrics has 4.7TB of (semi-)structured data packed with
actionable intelligence
– If we can discover it
• We know what we can look for, but what is hidden in the data
ocean?
• An experienced API support engineer can extrapolate from an issue
with one API to a similar issue with a completely different API
– The ultimate goal is a domain-specific AI that does this automagically: an Artificially
Intelligent Programming Interface (AIPI) that can capture, generate and
manipulate API-related knowledge
Diving Deep into the API
Ocean with Open Source
Deep Learning Tools
Paul M. Cray, APImetrics
