2. Network infrastructures
are more complex than
ever
Qualified resources are
more scarce than ever
Network infrastructures
are more mission critical
than ever
3. Knowledge softwarization
Knowledge dependent on people
Scarce, limited.
user user
API call
+
Knowledge compiled in software
Broadly available at the distance of an API call
Elastic, scalable, “unlimited”
The LLMs are huge
“knowledge compilators”
4. The network softwarization journey
Standarization of
APIs
Network
Softwarization
Pervasive
Network Data
Decision
points
Decision
optimization
Cloud
LLM
Knowledge
softwarization
Dynamic
networks
Autonomous
agents
Autonomous
networks
5. LLM-Powered Autonomous Agents
From
LLM
Imperative / prescriptive
Task execution:
• Generate text
• Summarize text
• Translate text
• Q&A
• ...
To
Memory Function
Planning
Action
Intent based / descriptive
- Complex problem solver
- Autonomous execution
- Example: Anetta.ai
LLM-powered
Agent
8. Network infrastructures
are more complex than
ever
Qualified resources are
more scarce than ever
Network infrastructures
are more mission critical
than ever
LLM-
Powered
Autonomous
Agents
9. Your logo
here
AI in Networking:
where does it fit?
Igor Giangrossi
Sr. Dir, Consulting Engineering
10.
11. Typical use cases of AI/ML in Networking
Reporting Thresholding Baselining Clustering Predicting
Collect current and
historical data for
effective visualization
and analysis
Detect situations
where customized
indicators violate
defined thresholds
Learn the expected
range of values for
key resources
Identify things that
appear to be
behaving differently
than others
Anticipate behavior
that might happen in
the future
Passive Pro-active
Active
13. Could we use AI/ML inside routers?
Challenges:
• What is the use case?
• Processing resources (CPU, Memory, Disk)?
• App development environment / SDK?
• Data collection API?
• Analysis only, or closed loop automation?
• How to use the output?
Opportunities:
• All system state readily available
• Real-time processing of events?
• Smaller, custom-trained models?
14. Demo: ChatGPT app in Nokia SR Linux
Natural language as a CLI helper
Hardware eXtensible Data Path (XDP)
Infrastructure
Standard Linux Kernel
Pub/Sub via
protobufs/gRPC
Lightweight Impart
Database (IDB)
NetOps
Development Kit
(NDK)
Applications
BGP OSPF QoS …
ACL
ChatGPT App
BFD ISIS MPLS
Management
YANG Models
gNMI gNOI gRIBI OC gRPC CLI
Question + context
Answer
LLM
18. Monitoring – Observability - AIOps
Monitoring
Observability
AI Ops
Aggregate
MELT
Gather
MELT
Root Cause
Analysis
Predictive
Analytics
Use Cases
Benefits
Prescriptive Analytics
Self-healing
Anomaly
detection
Support
Audits
Noise
Reductio
n
Natural
Language
Interaction
s
19. Maturity Model
Passive
Ops
Too much data –
rely on customer-
triggered incident
Reactive and
highly manual
processes
Silod Ops teams
– poor
collaboration
Active
Ops
Silo’d
observability
tools
Part Reactive,
part proactive
Silo’d Ops teams
– some
collaboration
AIOps
AI Augmented
Operations
Highly Proactive
Strong Ops Team
Collaboration
NoOps
Autonomous AI
Supervised full
automation
Ops team re-
skilled as
developers
20. Passive to Active Ops
Graduation Criteria:
● Exec Sponsorship – budgets and business goal alignment
● Manual operations transitioned to tools-led ops landscape
● Architects, ops teas and developers focused on the discipline of
observability
Characteristics of Active Ops
● Proactive Alerting on serious issues, detecting hazcons, predicting unknown
‘unknowns’
● Filtering noise in telemetry data
● Correlation of events across data types and sources – reduced alert fatigue
● Collected and analyzed data measures SLA and SLO compliance
21. Active Ops to AI Ops
Active Ops Graduation Criteria
● Adoption of enterprise AI discipline
● Refactored Ops processes to gear them towards an AI-based automation environment
● AIOps solutions are the central operations function in the enterprise
● AIOps solutions are linked seamlessly with existing Monitoring and Observability solutions
Characterization of AI Ops
● Decision making is augmented by AI-based tools through descriptive and predictive analytics, as well as
remediation recommendations
● Expert management of large data systems/modern data lakes/lakehouse
● Enhanced Root Cause Analysis, Anomaly & Outlier detection, and correlation of these machine interpreted
incidents
● Prediction and forecasting inherent in the ML models
● Highly collaborative ops teams
● AI tools are trained/customized to the specific domain(s) they support
22. Active to AI Ops Use Case: Tracfone
Challenges
● Over a dozen monitoring tools
● Excellent Eng and Ops staff in their specific domain
● Cross-domain challenges resulted significant outage times
The Implementation of Selector AIOps
● Ingested data from 18 different tools that monitored everything from the network all the
way up to the application layer
● AI driven alert & event filtering, and correlation
● Graphical dashboards presenting a unified view and analytical insights
Results
● Huge reduction in MTTR, especially in major incidents
● Improved overall uptime with proactive identification of potential incident causing issues