"The world is full of seas of data. Some of these seas are created and exchanged by APIs; some of them are even about APIs. Since 2014, APImetrics has accumulated over 100 GB of data on API test calls made to over 5000 API endpoints by agents deployed in cloud locations on 5 continents.
There's a huge amount of insight trapped at the bottom of that sea of (in our case unlabelled) data. Getting at it would've been nearly impossible before the emergence of powerful open source deep learning libraries in the mid-2010s.
APImetrics will share how we chose a deep learning library and the data munging we did to get our data to work with the library. We will explain how we were able to carry out unsupervised and semi-supervised learning and discuss the insights on global API performance and quality we were able to dredge from the bottom of our sea of data. We'll provide pointers on how organizations, from startups like APImetrics to megacorporations, can use deep learning to create oceans of knowledge from their own seas of data."
LF_APIStrat17_Diving Deep into the API Ocean with Open Source Deep Learning Tools
2. Diving Deep into the API
Ocean with Open Source
Deep Learning Tools
Paul M. Cray, APImetrics
3. Who are APImetrics?
• Seattle-based startup
• Blue chip clients include banks, fintech, carriers, utilities
and vehicle IoT
• APImetrics makes individual functional API calls or sequences of them
• Synthetic test calls can be scheduled to be made from any location
in any of the 4 main clouds (AWS, Azure, Google, IBM)
• Codebase written in Python with JavaScript for UI
• Data is analyzed using ML and AI functionality we are developing
using open source tools
4. What does APImetrics do?
• to manage your APIs you need to understand how they actually behave
from the end-user’s perspective in the real world
• APImetrics is an API performance and quality monitoring system running as
Software-as-a-Service on Google App Engine
• we provide wizards that allow users to create authentications, test calls and
workflows (back-to-back calls) easily
• test calls can be deployed to more than 60 cloud locations on four continents to
make scheduled calls that exercise API endpoints
• we support our own API to facilitate deep integration into higher-level
management systems
6. APImetrics 4.7TB historical dataset
• Over 400M API call records made from multiple clouds and locations
• We retain all data associated with each call, including the
payload, to give a complete picture of API performance
– Timestamp of call
– API endpoint
– Call cloud location
– HTTP response code
– Payload
– Latency breakdown times
• DNS lookup, Connect, Handshake, Upload, Processing, Download
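As a rough illustration, one call record with the fields above might be modelled like this (the field names and types are hypothetical, not APImetrics' actual schema):

from dataclasses import dataclass

@dataclass
class CallRecord:
    """One synthetic API test call (illustrative schema only)."""
    timestamp: float      # Unix time the call was made
    endpoint: str         # API endpoint exercised
    cloud: str            # e.g. "AWS", "Azure", "Google", "IBM"
    location: str         # cloud location the agent ran in
    status_code: int      # HTTP response code
    payload: bytes        # full response body, retained for analysis
    # Latency breakdown, all in milliseconds
    dns_ms: float
    connect_ms: float
    handshake_ms: float
    upload_ms: float
    processing_ms: float
    download_ms: float

    @property
    def total_ms(self) -> float:
        """Overall latency as the sum of the breakdown phases."""
        return (self.dns_ms + self.connect_ms + self.handshake_ms
                + self.upload_ms + self.processing_ms + self.download_ms)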
7. APImetrics Insights CASC score
• What metric do you use to measure API performance?
– Latency? Availability? Pass rate?
• Too many variables to compare and contrast API quality easily
• We use our own magic sauce to combine metrics into a single
blended, credit-rating-like score
• The CASC score allows at-a-glance, like-for-like comparison and trend
analysis of the performance and quality of different API calls
• CASC scores are currently calculated on a weekly and monthly
basis, but daily scores are coming soon
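The CASC formula itself is proprietary, but a toy sketch shows the general idea of a blended, credit-rating-like score; the weights, the latency target and the 0-1000 scale below are invented for illustration and are not the real CASC parameters:

def blended_score(pass_rate, availability, median_latency_ms,
                  latency_target_ms=500.0):
    """Toy blended quality score on a 0-1000 scale.

    Weights and latency target are invented; the real CASC score
    combines its metrics differently.
    """
    # Normalise latency so that meeting the target scores 1.0
    latency_score = min(1.0, latency_target_ms / max(median_latency_ms, 1.0))
    # Weighted blend of three normalised metrics
    score = 0.4 * pass_rate + 0.3 * availability + 0.3 * latency_score
    return round(1000 * score)

# blended_score(0.99, 0.999, 350) -> one number comparable across APIs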
9. The CASC score and Machine Learning
CHALLENGE: How do we calculate CASC scores in real time? What
do we need?
• More robust (patent application in progress) method for calculating
CASC score that leverages our unrivalled historical dataset
• Uses supervised learning with linear regression to calculate the
CASC parameters
• Python's scikit-learn package, along with numpy, pandas, scipy and
statsmodels, is used in APImetrics Insights
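A minimal sketch of that kind of supervised fit with scikit-learn, using synthetic stand-in data (the real pipeline would read historical call records and scores):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: six latency phases per call as features,
# a historical quality score as the label
rng = np.random.default_rng(0)
X = rng.gamma(shape=2.0, scale=100.0, size=(10_000, 6))
y = 1000 - 0.5 * X.sum(axis=1) + rng.normal(0, 20, 10_000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
model = LinearRegression().fit(X_train, y_train)
print("R^2 on held-out calls:", model.score(X_test, y_test))
# model.coef_ holds the fitted per-metric weights, i.e. the learned
# score parameters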
11. The components to be looked at
• Outlier detection
• Handling multimodality
• Identifying clusters of related events
• Anomaly detection
12. Outlier detection
• Historically:
– Heuristic designated a record an outlier if overall latency exceeded a certain
number of standard deviations from the mean
• Outlier detection is a visual problem
– We can see (some/most of) the outliers by eye
• How to use deep learning techniques to detect outliers?
– Implement Recurrent Neural Net (RNN) to analyze time series data?
– Implement Convolutional Neural Net (CNN) to recognise outlier patterns?
– Use PyTorch as it is emerging as the leading Deep Learning framework and
supports an idiomatic Python approach
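For reference, the historical heuristic described above is essentially a z-score test; a minimal sketch (the three-sigma threshold is an arbitrary default):

import numpy as np

def flag_outliers(latencies_ms, n_sigma=3.0):
    """Mark calls whose overall latency lies more than n_sigma standard
    deviations from the mean (the simple historical heuristic)."""
    latencies = np.asarray(latencies_ms, dtype=float)
    mean, std = latencies.mean(), latencies.std()
    return np.abs(latencies - mean) > n_sigma * std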
14. Multimodality detection
• Latency distributions are typically neither unimodal nor normal
• Outlier detection heuristics that rely on them being so are flawed
• Reliable outlier detection must first determine modality
• Easy by eye, but sensitive to binning
– Use a CNN to detect modality?
– Use a clustering algorithm to assign modality? (see the sketch after this list)
– How to handle the binning problem?
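One binning-free possibility, sketched here, is to fit Gaussian mixture models to the raw latencies and pick the number of components by BIC; this is just one candidate approach, not a method the slides commit to:

import numpy as np
from sklearn.mixture import GaussianMixture

def estimate_modality(latencies_ms, max_modes=5):
    """Estimate the number of modes by choosing the mixture size
    that minimises BIC; fitting raw values avoids histogram binning."""
    # Log-transform: latency distributions are heavy-tailed
    X = np.log(np.asarray(latencies_ms, dtype=float)).reshape(-1, 1)
    bics = [GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
            for k in range(1, max_modes + 1)]
    return int(np.argmin(bics)) + 1   # best-supported number of modes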
16. Cluster detection
• Currently using a heuristic to construct clusters of outliers
– Much too simplistic
• Exploring algorithms like k-means implemented in a package such
as scikit-learn
• But a result is more likely to be an outlier if it is close to other outliers,
i.e. if it is in a cluster (see the sketch after this list)
• We believe outlier and cluster detection should be done
simultaneously
– Investigating if an RNN can identify whether a record is an outlier and whether it
belongs to a cluster
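As a sketch of doing both at once with an off-the-shelf scikit-learn algorithm, DBSCAN labels clusters and noise points in a single pass (the eps and min_samples values are placeholders that would need tuning against real call data):

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

def cluster_outliers(timestamps, latencies_ms, eps=0.5, min_samples=5):
    """Cluster calls in (time, latency) space; DBSCAN assigns label -1
    to points in no cluster, so groups of nearby outliers form clusters
    while isolated ones are flagged as noise simultaneously."""
    X = np.column_stack([timestamps, latencies_ms])
    X = StandardScaler().fit_transform(X)  # put both axes on one scale
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)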
18. APIs and AIPIs
• APImetrics has 4.7TB of (semi-)structured data packed with
actionable intelligence
– If we can discover it
• We know what we can look for, but what is hidden in the data
ocean?
• An experienced API support engineer can extrapolate from an issue
with one API to a similar issue with a completely different API
– Ultimate goal is a domain-specific AI that does this automagically: an Artificially
Intelligent Programming Interface (AIPI) that can capture, generate and
manipulate API-related knowledge
19. Diving Deep into the API
Ocean with Open Source
Deep Learning Tools
Paul M. Cray, APImetrics