API Security Now Depends on The Novel and Penetrating Use Of Advanced Machine Learning And Actionable Artificial Intelligence

1
API Security Now Depends on
The Novel and Penetrating Use
Of Advanced Machine Learning
And Actionable Artificial Intelligence
Transcript of a discussion on the best security solutions for APIs across their dynamic and often
uncharted use across myriad apps and business services.
Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: Traceable.ai.
Dana Gardner: Hi, this is Dana Gardner, Principal Analyst at Interarbor Solutions, and you’re
listening to BriefingsDirect.
While the use of machine learning (ML) and artificial intelligence (AI) for IT security may not be
new, the extent to which data-driven analytics can detect and thwart nefarious activities is still in
its infancy.
As we’ve recently discussed here on BriefingsDirect, an expanding universe of interdependent
application programming interfaces (APIs) forms a new and complex threat vector that strikes at
the heart of digital business.
How will ML and AI form the next best security solution for APIs across their dynamic and often
uncharted use in myriad apps and services? Stay with us now as we answer that question by
exploring how advanced big data analytics forms a powerful and comprehensive means to
track, understand, and model safe APIs use.
To learn how AI makes APIs secure and more resilient across
their life cycles and use ecosystems, please join me in
welcoming Ravi Guntur, Head of Machine Learning and
Artificial Intelligence at Traceable.ai. Welcome, Ravi.
Ravi Guntur: Thanks, Dana. Happy to be here.
Gardner: Why does API security provide such a perfect use
case for the strengths of ML and AI? Why do these all come
together so well?
Guntur: When you look at the strengths of ML, the biggest
strength is to process data at scale. And newer applications
have taken a turn in the form of API-driven applications.
Large pieces of applications have been broken down into smaller pieces, and these smaller
pieces are being exposed as even smaller applications in themselves. To process the
information going between all these applications, to monitor what activity is going on, the scale
at which you need to deal with them has gone up many fold. That’s the reason why ML
Guntur

2
algorithms form the best-suited class of algorithms to deal with the challenges we face with API-
driven applications.
Gardner: Given the scale and complexity of the app security problem, what makes the older
approaches to security wanting? Why don’t we just scale up what we already do with security?
More than rules needed to secure multiple apps
Guntur: I’ll give an analogy as to why older approaches don’t work very well. Think of the older
approaches as a big box with, let’s say, a single door. For attackers to get into that big box, all
they must do is crack through that single door.
Now, with the newer applications, we have broken that big box into multiple small boxes, and
we have given a door to each one of those small boxes. If the attacker wants to get into the
application, they only have to get into one of these smaller boxes. And once he gets into one of
the smaller boxes, he needs to take a key out of it and use that key to open another box.
By creating API-driven applications, we have exposed a much bigger attack surface. That’s
number one. Number two, of course, we have made it challenging to the attackers, but the
attack surface being so much bigger now needs to be dealt with in a completely different way.
The older class of applications took a rules-based system as the common approach to solve
security use cases. Because they just had a single application and the application would not
change that much in terms of the interfaces it exposed, you could build in rules to analyze how
traffic goes in and out of that application.
Now, when we break the application into
multiple pieces, and we bring in other
paradigms of software development, such as
DevOps and Agile development
methodologies, this creates a scenario where
the applications are always rapidly changing.
There is no way rules can catch up with these
rapidly changing applications. We need automation to understand what is happening with these
applications, and we need automation to solve these problems, which rules alone cannot do.
Gardner: We shouldn’t think of AI here as replacing old security or even humans. It’s doing
something that just couldn’t be done any other way.
Guntur: Yes, absolutely. There’s no substitute for human intelligence, and there’s no substitute
for the thinking capability of humans. If you go deeper into the AI-based algorithms, you realize
that these algorithms are very simple in terms of how the AI is powered. They’re all based on
optimization algorithms. Optimization algorithms don’t have thinking capability. They don’t have
creativity, which humans have. So, there’s no way these algorithms are going to replace human
intelligence.
They are going to work alongside humans to make all the mundane activities easier for humans
and help humans look at the more creative and the difficult aspects of security, which these
algorithms can’t do out of the box.
We need automation to understand what
is happening with these applications, and
we need automation to solve these
problems, which rules alone cannot do.

3
Gardner: And, of course, we’re also starting to see that the bad guys, the attackers, the
hackers, are starting to rely on AI and ML themselves. You have to fight fire with fire. And so
that’s another reason, in my thinking, to use the best combination of AI tools that you can.
Guntur: Absolutely.
Gardner: Another significant and growing security threat are bots, and the scale that threat
vector takes. It seems like only automation and the best combination of human and machines
can ferret out these bots.
Machines, humans must combine to combat attacks
Guntur: You are right. Most of the best detection cases we see in security are a combination
of humans and machines. The attackers are also starting to use automation to get into systems.
We have seen such cases where the same bot comes in from geographically different locations
and is trying to do the same thing in some of the customer locations.
The reason they’re coming from so many different locations is to challenge AI-based algorithms.
One of the oldest schools of algorithms looks at rate anomaly, to see how quickly somebody is
coming from a particular IP address. The moment you spread the IP addresses across the
globe, you don’t know whether it’s different attackers or the same attacker coming from different
locations. This kind of challenge has been brought by attackers using AI. The only way to
challenge that is by building algorithms to counter them.
One thing is for sure, algorithms are not perfect.
Algorithms can generate errors. Algorithms can
create false positives. That’s where the human
analyst comes in, to understand whether what the
algorithm discovered is a true positive or a false positive. Going deeper into the output of an
algorithm digs back into exactly how the algorithm figured out an attack is being launched. But
some of these insights can’t be discovered by algorithms, only humans when they correlate
different pieces of information, can find that out. So, it requires a team. Algorithms and humans
work well as a team.
Learn More
About Traceable AI.
Gardner: What makes the way in which Traceable.ai is doing ML and AI different? How are
you unique in your vision and execution for using AI for API security?
Guntur: When you look at any AI-based implementation, you will see that there are three basic
components. The first is about the data itself. It’s not enough if you capture a large amount of
data; it’s still not enough if you capture quality data. In most cases, you cannot guarantee data
of high quality. There will always be some noise in the data.
It requires a team. Algorithms
and humans work well as a team.

4
But more than volume and quality of data, what is more important is whether the data that
you’re capturing is relevant for the particular use-case you’re trying to solve. We want to use the
data that is helpful in solving security use-cases.
Traceable.ai built a platform from the ground up to cater to those security use cases. Right from
the foundation, we began looking at the specific type of data required to solve modern API-
based application security use cases. That’s the first challenge that we address, it’s very
important, and brings strength to the product.
Be specific, respect differences in APIs
Once you address the proper data issue, the next is about how you learn from it. What are the
challenges around learning? What kind of algorithms do we use? What is the scenario when we
deploy that in a customer location?
We realized that every customer is completely
different and has a completely different set of
APIs, too, and those APIs behave differently. The
data that goes in and out is different. Even if you
take two e-commerce customers, they’re doing
the same thing. They’re allowing you to look at
products, and they’re selling you products. But the
way the applications have been built, and the API architecture -- everything is different.
We realized it’s no use to build supervised approaches. We needed to come up with an
architecture where the day we deploy at the customer location; the algorithm then self-learns.
The whole concept of being able to learn on its own just by looking at data is the core to the way
we build security using the AI algorithms we have.
Finally, the last step is to look at how we deliver security use cases. What is the philosophy
behind building a security product? We knew that rules-based systems are not going to work.
The alternate system is modeled around anomaly detection. Now, anomaly detection is a very
old subject, and we have used anomaly detection in various things. We have used it to
understand whether machinery is going to go down, we have used them to understand whether
the traffic patterns on the road are going to change, and we have used it for anomaly detection
in security.
But within anomaly detection, we focused on behavioral anomalies. We realized that APIs and
the people who use APIs are the two key entities in the system. We needed to model the
behavior of these two groups -- and when we see any deviation from this behavior, that’s when
we’re able to capture the notion of an attack.
Behavioral anomalies are important because if you look at the attacks, they’re so subtle. You
just can’t easily find the difference between the normal usage of an API and abnormal usage.
But very deep inside the data and very deep into how the APIs are interacting, there is a
deviation in the behavior. It’s very hard for humans to figure this out. Only algorithms can tease
this out and determine that the behavior is different from a known behavior.
We realized that every customer is
completely different and has a
completely different set of APIs, too,
and those APIs behave differently.

5
We have addressed this at all levels of our stack: The data-capture level, and the choice of how
we want to execute our AI, and the choice of how we want to deliver our security use cases.
And I think that’s what makes Traceable unique and holistic. We didn’t just bolt things on, we
built it from the ground up. That’s why these three pieces gel well and work well together.
Gardner: I’d like to revisit the concept you brought up about the contextual use of the
algorithms and the types of algorithms being deployed. This is a moving target, with so many
different use cases and company by company.
How do you keep up with that rate of change? How do you remain contextual?
Focus on function over form delivers context
Guntur: That’s a very good question. The notion of context is abstract. But when you dig
deeper into what context is and how you build context, it boils down to basically finding all
factors influencing the execution of a particular API.
Let’s take an example. We have an API, and we’re looking at how this API functions. It’s just not
enough to look at the input and output of the API. We need to look at something around it. We
need to see who triggered that input. Where did the user come from? Was it a residential IP
address that the user came in from? Was it a hosted IP address? Which geolocation is the user
coming from? Did this user have past anomalies within the system?
You need to bring in all these factors into the notion of context when we’re dealing with API
security. Now, it’s a moving target. The context -- because data is constantly changing. There
comes a moment when you have fixed this context, when you say that you know where the
users are coming from, and you know what the users have done in the past. There is some
amount of determinism to whatever detection you’re performing on these APIs.
Learn More
About Traceable AI.
Let’s say an API takes in five inputs, and it gives out 10 outputs. The inputs and outputs are a
constant for every user, but the values that go into the input varies from user to user. Your bank
account is different from my bank account. The account number I put in there is different for
you, and it’s different for me. If you build an algorithm that looks for an anomaly, you will say,
“Hey, you know what? For this part of the field, I’m seeing many different bank account
numbers.”
There is some problem with this, but that’s not true.
It’s meant to have many variations in that account
number, and that determination comes from context.
Building a context engine is unique in our AI-based
system. It helps us tease out false positives and helps
us learn the fact that some variations are genuine.
Building a context engine is
unique in our AI-based system. It
helps us tease out false positives
and helps us learn the fact that
some variations are genuine.

6
That’s how we keep up with this constant changing environment, where the environment is
changing not just because new APIs are coming in. It’s also because new data is coming into
the APIs.
Gardner: Is there a way for the algorithms to learn more about what makes the context powerful
to avoid false positives? Is there certain data and certain ways people use APIs that allow your
model to work better?
Guntur: Yes. When we initially started, we thought of APIs as rigidly designed. We thought of
an API as a small unit of execution. When developers use these APIs, they’ll all be focused on
very precise execution between the APIs.
But we soon realized that developers bundle various additional features within the same API.
We started seeing that they just provide a few more input options, and by triggering those extra
input options you get completely different functionality from the same API.
We had to come up with algorithms that discover that
a particular API can behave in multiple ways --
depending on the inputs being transmitted. It’s
difficult for us to figure out whether the API is going to
change and has ongoing change. But when we built
our algorithms, we assumed that an API is going to
have multiple manifestations, and we need to figure
out which manifestation is currently being triggered
by looking at the data.
We solved it differently by creating multiple personas for the same API. Although it looks like a
single API, we have an internal representation of an API with multiple personas.
Gardner: Interesting. Another thing that’s fascinating to me about the API security problem is
that the way hackers try not to abuse the API. Instead, they have subtle logic abuse attacks
where they’re basically doing what the API is designed to do but using it as a tool for their
nefarious activities.
How does your model help fight against these subtle logic abuse attacks?
Stopping logic abuse requires internal API deep dive
Guntur: When you look at the way hackers are getting into distributed applications and APIs
using these attacks – it is very subtle. We classify these attacks as business logic abuse. They
are using the existing business logic, but they are abusing it. Now, figuring out abuse to
business logic is a very difficult task. It involves a lot of combinatorial issues that we need to
solve. When I say combinatorial issues, it’s a problem of scale in terms of the number of APIs,
the number of parameters that can be passed, and the types of values that can be passed.
When we built the Traceable.ai platform, it was not enough to just look at the front-facing APIs,
we call them the external APIs. It’s also important for us to go deeper into the API ecosystem.
When we built our algorithms, we
assumed that an API is going to
have multiple manifestations, and
we need to figure out which
manifestation is currently being
triggered by looking at the data.

7
We have two classes of APIs. One, the external facing APIs, and the other is the internal APIs.
The internal APIs are not called by users sitting outside of the ecosystem. They’re called by
other APIs within the system. The only way for us to identify the subtle logic attacks is to be able
to follow the paths taken by those internal APIs.
If the internal APIs are reaching a resource like a database, and within the database it reaches
a particular row and column, it then returns the value. Only then you will be able to figure out
that there was a subtle attack. We’re able to figure this out only because of the capability to
trace the data deep into the ecosystem.
If we had done everything at the API gateway,
if we had done everything at external facing
APIs, we would not have figured out that there
was an attack launched that went deep into
the system and touched a resource it should
never have touched.
It’s all about how well you capture the data and how rich your data representation is to capture
this kind of attack. Once you capture this, using tons of data, and especially graph-like data, you
have no option but to use algorithms to process it. That’s why we started using graph-based
algorithms to discover variations in behavior, discover outliers, and uncover patterns of outliers,
and so on.
Learn More
About Traceable AI.
Gardner: To fully tackle this problem, you need to know a lot about data integration, a lot
about security and the vulnerabilities, as well as a lot about algorithms, AI, and data science.
Tell me about your background. How are you able to keep these big, multiple balls in the air at
once when it comes to solving this problem? There are so many different disciplines involved.
Multiple skills, experience fill data scientist toolbox
Guntur: Yes, it’s been a journey for me. When I initially started in 2005, I had just graduated
from university. I used a lot of mathematical techniques to solve key problems in natural
language processing (NLP) as part of my thesis. I realized that even security use cases can be
modeled as a language. If you take any operating system (OS), we typically have a few system
calls, right? About 200 system calls, or maybe 400 system calls. All the programs running in the
operating system are using about 400 system calls in different ways to build the different
applications.
It’s similar to natural languages. In natural language, you have words, and you compose the
words according to a grammar to get a meaningful sentence. Something similar happens in the
security world. We realized we could apply techniques from statistical NLP into the security use
cases. We discovered, for example, way back then, certain Solaris login buffer and overflow
vulnerabilities.
If we had done everything at external
facing APIs, we would not have figured
out that there was an attack launched
that went deep into the system.

8
That’s how the journey began. I then went through multiple jobs and worked on different use
cases. I learned if you want to be a good data scientist -- or if you want to use ML effectively --
you should think of yourself as a carpenter, as somebody with a toolbox with lots of tools in it,
and who knows how to use those tools very well.
But to best use those tools, you also need the experience from building various things. You
need to build a chair, a table, and a house. You need to build various things using the same set
of tools, and that took me further along that journey.
While I began with NLP, I soon ventured into
image processing and video processing, and I
applied that to security, too. It furthered the
journey. And through that whole process, I
realized that almost all problems can be mapped
to canonical forms. You can take any complex
problem and break it down into simpler problems. Almost all fields can be broken down into
simple mathematical problems. And if you know how to use various mathematical concepts, you
can solve a lot of different problems.
We are applying these same principles at Traceable.ai as well. Yes, it’s been a journey, and
every time you look at data you come up with different challenges. The only way to overcome
that is to dirty your hands and solve it. That’s the only way to learn and the only way we could
build this new class of algorithms -- by taking a piece from here, a piece from there, putting it
together, and building something different.
Gardner: To your point that complex things in nature, business, and technology can be brought
down to elemental mathematical understandings, once you’ve attained that with APIs, for
example, applying this first to security, and rightfully so, it’s the obvious low-lying fruit.
But over time, you also gain mathematical insights and understanding of more about how
microservices are used and how they could be optimized. Or even how the relationship between
developers and the IT production crews might be optimized.
Is that what you’re setting the stage for here? Will that mathematical foundation be brought to a
much greater and potentially productive set of a problem-solving?
Something for everybody
Guntur: Yes, you’re right. If you think about it, we have embarked on that journey already.
Based on what we have achieved as of today, and we look at the foundations over which we
have built that, we see that we have something for everybody.
For example, we have something for the security folks as well as for the developer folks. The
Traceable.ai system gives insights to developers as to what happens to their APIs when they’re
in production. They need to know that. How is it all behaving? How many users are using the
APIs? How are they using them? Mostly, they have no clue.
And on the other side, the security team doesn’t know exactly what the application is. They can
see lots of APIs, but how are the APIs glued together to form this big application? Now, the
You can take any complex problem and
break it down into simpler problems.
Almost fields can be broken down into
simple mathematical problems.

9
mathematical foundation under which all these implementations are being done is based on
relationships, relationships between APIs. You can call them graphs, you can call them
sequences, but it’s all about relationships.
One aspect we are looking at is how do you expose these relationships. Today we have this
relationship buried deep inside of our implementations, inside our platform. But how do you take
it out and make it visual so that you can better understand what’s happening? What is this
application? What happens to the APIs?
By looking at these visualizations, you can easily figure out if there are bottlenecks within the
application, for example. Is one API constantly being hit on? If I always go through this API, but
the same API is also leading me to a search engine or a products catalog page, why does this
API need to go through all these various functions? Can I simplify the API? Can I break it down
and make it into multiple pieces? These kinds of insights are now being made available to the
developer community.
Gardner: For those listening or reading this interview, how should they prepare themselves for
being better able to leverage and take advantage of what Traceable.ai is providing? How can
developers, security teams, as well as the IT operators get ready?
Rapid insights result in better APIs
Guntur: The moment you deploy Traceable in
your environment, the algorithms kick in and
start learning about the patterns of traffic in your
environment. Within a few hours -- or if your
traffic has high volume, within 48 hours -- you
will receive insights into the API landscape
within your environment. This insight starts with how many APIs are there in your environment.
That’s a fundamental problem that a lot of companies are facing today. They just don’t know
how many APIs exist in their environment at any given point of time. Once you know how many
APIs are there, you can figure out how many services there are. What are the different services,
and which APIs belong to which services?
Traceable gives you the entire landscape within a few hours of deployment. Once you
understand your landscape, the next interesting thing to see are your interfaces. You can learn
how risky your APIs are. Are you exposing sensitive data? How many of the APIs are external
facing? How to best use authentication to give access to APIs or not? And why do some APIs
not have authentication? How are you exposing APIs without authentication?
Learn More
About Traceable AI.
All these questions are answered right there in the user interface. After that, you can look at
whether your development team is in compliance. Do the APIs comply with the specifications in
the requirements? Because usually the development teams are rapidly churning out code, they
almost never maintain the API’s spec. They will have a draft spec and they will build against it,
The moment you deploy Traceable in
your environment, the algorithms kick
in and start learning about the patterns
of traffic in your environment.

10
but finally, when you deploy it, the spec looks very different. But who knows it’s different? How
do you know it’s different?
Traceable’s insights tell you whether your spec is compliant. You get to see that within a few
hours of deployment. In addition to knowing what happened to your APIs and whether they are
compliant with the spec, you start seeing various behaviors.
People think that when you have 100 APIs
deployed, all users use those APIs the same way.
We think all of them are using the apps the same
way. But you’d be surprised to learn that users use
apps in many different ways. Sometimes the APIs
are accessed through computational means,
sometimes they are accessed via user interfaces. There is now insight for the development
team on how users are actually using the APIs, which in itself is a great insight to help build
better APIs, which helps build better applications, and simplifies the application deployments.
All of these insights are available within a few hours of the Traceable.ai deployment. And I think
that’s very exciting. You just deploy it and open the screen to look at all the information. It’s just
fascinating to see how different companies have built their API ecosystems.
And, of course, you have the security use cases. You start seeing what’s at work. We have
seen, for example, what Bingbot from Microsoft looks like. But how active is it? Is it coming from
100 different IP addresses, or is it always coming from one part of a geolocation?
You can see how, for example, what search spiders’ activity looks like. What are they doing with
our APIs? Why is the search engine starting to look at the APIs, which are internal language
and have no information? But why are they crawling these APIs? All this information is available
to you within a few hours. It’s really fascinating when you just deploy and observe.
Gardner: I’m afraid we’ll have to leave it there. You’ve been listening to a sponsored
BriefingsDirect discussion on how data-driven behavioral analytics best detect and thwart
nefarious activities from across the burgeoning ecosystem of APIs use.
And we’ve learned how advanced ML-powered modeling and algorithms form a powerful and
inclusive means to track, understand, and model APIs in action.
So, a big thank you to Ravi Guntur, Head of Machine Learning and Artificial Intelligence at
Traceable.ai. Thank you so much.
Guntur: Thanks, Dana.
Gardner: And a big thank you as well for our audience for joining this BriefingsDirect API
resiliency discussion. I’m Dana Gardner, Principal Analyst at Interarbor Solutions, your host
throughout this series of Traceable.ai-sponsored BriefingsDirect interviews.
Thanks again for listening. Please pass this along to your business community and do come
back for our next chapter.
Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: Traceable.ai.
We think of all users using the
apps the same way, but you’d be
surprised to learn that users use
apps in many different ways.

11
Transcript of a discussion on the best security solutions for APIs across their dynamic and often
uncharted use in myriad apps and business services. Copyright Interarbor Solutions, LLC, 2005-2021. All
rights reserved.
You may also be interested in:
● Securing APIs demands tracing and machine learning that analyze behaviors to head off attacks
● Rise of APIs brings new security threat vector -- and need for novel defenses
● Learn More About the Technologies and Solutions Behind Traceable.ai.
● Three Threat Vectors Addressed by Zero Trust App Sec
● Web Application Security is Not API Security
● Does SAST Deliver? The Challenges of Code Scanning.
● Everything You Need to Know About Authentication and Authorization in Web APIs
● Top 5 Ways to Protect Against Data Exposure
● TraceAI : Machine Learning Driven Application and API Security

API Security Now Depends on The Novel and Penetrating Use Of Advanced Machine Learning And Actionable Artificial Intelligence

Recommended

Recommended

More Related Content

Recently uploaded

Recently uploaded (20)

Featured

Featured (20)

API Security Now Depends on The Novel and Penetrating Use Of Advanced Machine Learning And Actionable Artificial Intelligence