Discover how AI is reshaping cybersecurity. This presentation delves into AI's role in enhancing threat detection, the balance of innovation and risk, and the strategies shaping the future of digital defense.
2. Lots of Hype - and Doom/Gloom -
Around AI Right Now ...
3. Overview of
Generative AI
• Gen AI is a subset of Deep
Learning, focusing on creating
models capable of generating new
content that resembles existing data.
• These models aim to generate
content that is indistinguishable from
what might be created by humans.
4. Four Main Types Of Generative AI (GAI) Techniques [1]
Generative Pre-trained Transformer (GPT)
• Models that generate human-like text across various languages and styles, improving with each version through larger training datasets.
Generative Adversarial Networks (GANs)
• Two neural networks, a "Generator" and a "Discriminator", work together.
• The generator creates content, and the discriminator judges its authenticity, aiming for indistinguishably realistic outputs.
5. Main Types Of Generative AI (GAI) Techniques [1]
The Generative Diffusion Model (GDM)
• Starts with data, adds noise, then learns to reverse this process, creating high-quality content from random noise.
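The "add noise" half of a diffusion model can be sketched in a few lines. This is a toy illustration only, with a made-up 1-D "signal" and noise schedule; it shows the fixed forward (noising) process, while the generative part, a neural network trained to reverse it, is omitted here.

```python
import math
import random

def forward_diffuse(x0, t, betas):
    """Sample x_t directly from x_0 via the closed-form forward process:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, 1)."""
    alpha_bar = 1.0
    for beta in betas[:t]:           # cumulative product of (1 - beta)
        alpha_bar *= (1.0 - beta)
    return [math.sqrt(alpha_bar) * x + math.sqrt(1 - alpha_bar) * random.gauss(0, 1)
            for x in x0]

random.seed(0)
signal = [math.sin(i / 4) for i in range(32)]   # toy "clean" data
betas = [0.02] * 100                            # hypothetical noise schedule
slightly_noisy = forward_diffuse(signal, 5, betas)
nearly_noise = forward_diffuse(signal, 100, betas)  # alpha_bar ~ 0.13: mostly noise
```

Generation runs this process backwards: the model starts from pure noise and denoises it step by step into a sample.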
6. Large Language Model (LLM)
• An LLM is a form of Generative AI that focuses on generating human-like text based on the patterns learned from vast amounts of textual data during the training process.
• An LLM can be considered a specific machine learning model specialized in natural language processing.
• ChatGPT is possibly the most famous example of technology using LLMs right now.
7. How Generative AI Works (admittedly oversimplified)
• The system generates text or images using its previously
built model of the statistical distributions of tokens (=
"chunks" of words, punctuation marks, pixels, etc.)
created from its very large training dataset.
8. How Generative AI Works
• It might make mistakes or
“hallucinate” based on the limitations
of its process, but the output still
might look like what you wanted.
• Ted Chiang’s analogy = “unreliable
photocopier” or a “blurry JPEG”
9. How Generative AI Works
• We can ask it questions, but only a very specific type of question, known as a prompt, following this structure:
"Here’s a fragment of text.
Tell me how this fragment might <continue in this
language or suggest a particular image.>
According to your model of the statistics of
<human language, or human-handled images>,
what <words, or pixels> are likely to come next?"
10. How Generative AI Works
The prompts are converted into tokens (= "chunks" of words, punctuation marks, pixels, etc.), then the system analyzes what is likely to come next, based on the tokens in its context window (as many as 128,000 tokens in GPT-4 Turbo!).
• It then generates a tokenized output.
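As an illustration only, a toy tokenizer might map a prompt to integer ids like this. Real systems use learned subword vocabularies (e.g. byte-pair encoding), not simple word/punctuation splitting; the function and vocabulary here are invented for the sketch.

```python
import re

def toy_tokenize(text, vocab):
    """Split text into word/punctuation chunks and map each chunk to an
    integer id, growing the vocabulary as new chunks appear."""
    chunks = re.findall(r"\w+|[^\w\s]", text.lower())
    return [vocab.setdefault(c, len(vocab)) for c in chunks]

vocab = {}
ids = toy_tokenize("Here's a fragment of text.", vocab)
# chunks: "here", "'", "s", "a", "fragment", "of", "text", "."
```

The model never sees the raw characters, only these ids; its statistics are learned and applied entirely in token space.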
11. How Generative AI Works
With each output token, it re-analyzes the probabilities to decide the next token.
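This generate-one-token-then-re-analyze loop can be sketched with a deliberately tiny stand-in for the model: a bigram count table built from a made-up corpus, sampled one token at a time. A real LLM replaces the count table with a neural network over a huge vocabulary, but the loop is the same shape.

```python
import random
from collections import defaultdict

# "Training": count which token follows which in a toy corpus.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()
counts = defaultdict(lambda: defaultdict(int))
for cur, nxt in zip(corpus, corpus[1:]):
    counts[cur][nxt] += 1

def sample_next(token):
    """Pick the next token with probability proportional to how often it
    followed `token` in the training data."""
    options = counts[token]
    return random.choices(list(options), weights=list(options.values()))[0]

random.seed(1)
out = ["the"]
for _ in range(6):          # autoregressive loop: one token per step
    out.append(sample_next(out[-1]))
```

Each generated token is appended to the context and conditions the next probability lookup, which is why early mistakes can snowball through a long output.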
12. HERE'S THE
REALLY COOL
PART!!!
Transformers (the "T" in "GPT") know how to direct attention to specific parts of the input to guide their selection of the output, such as verb tenses and objects.
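At its core, this attention mechanism is a scaled dot-product: each query scores every key, the scores are softmaxed into weights, and the weights mix the values. A minimal sketch with toy 2-d vectors (all numbers invented for illustration):

```python
import math

def softmax(xs):
    m = max(xs)                       # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: score each query against every key,
    softmax the scores, and return the weighted sum of the values."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, values))
                    for j in range(len(values[0]))])
    return out

# One query attending over three token positions (2-d toy embeddings).
ctx = attention([[1.0, 0.0]],
                [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
```

The query "pulls" most strongly from the positions whose keys resemble it, which is how the model can focus on, say, the subject of a sentence when choosing a verb form.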
14. How Generative AI Works
"Hallucinations" (outputs that don't seem to make sense) are why it is essential not to accept everything the model produces at face value.
19. Applications – II
Midjourney
• Prompt: Imagine a small seed planted in the ground. It
sprouts, grows into a sapling, then a small tree, and finally
a large robust tree. Each year, it sprouts new branches,
leaves and sometimes fruits – all from that small seed. This
is how your investment grows with compounding – It
branches out producing more and more just like a tree.
21. CIA Triad
Confidentiality
Confidentiality is about
preventing the disclosure of
data to unauthorized parties.
Integrity
Integrity refers to protecting
information from being
modified by unauthorized
parties.
Availability
Availability is making sure
that authorized parties can
access the information when
needed.
22. CIA Triad Broken Down
Confidentiality
• Attacks: Cracking Encrypted Data, Man In The Middle, Installing Spyware, Doxxing
• Countermeasures: Access Control, Encryption, Biometric Verification
Integrity
• Attacks: Web Penetration, Unauthorized Scans, Remote Controlling, Phishing
• Countermeasures: Intrusion Detection, Cryptography, Hashing
Availability
• Attacks: DDoS Attacks, Ransomware Attacks, Disrupting Server
• Countermeasures: Regular Backup, Data Replication, Adequate Communication Bandwidth
23. Traditional Cybersecurity Measures
Firewalls
• Filter incoming and outgoing network traffic
• Prevent unauthorized access
Antivirus Software
• Detects, quarantines, and eliminates malware
• Relies on virus definitions and heuristic analysis
Intrusion Detection Systems (IDS)
• Monitors network or system activities for malicious actions
• Alerts administrators to potential breaches
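A minimal sketch of how signature-based detection works, using hypothetical payloads and SHA-256 digests as the "virus definitions". It also shows the core limitation discussed on the next slide: a sample the database has never seen produces a digest the database has never seen.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Hypothetical signature database: digests of samples previously
# identified as malicious (stand-ins for real virus definitions).
known_bad_samples = [b"malicious-payload-v1", b"malicious-payload-v2"]
SIGNATURES = {sha256_hex(s) for s in known_bad_samples}

def scan(data: bytes) -> bool:
    """Flag a payload only if its digest matches a known signature.
    A novel (zero-day) sample has an unseen hash and sails through."""
    return sha256_hex(data) in SIGNATURES

caught = scan(b"malicious-payload-v1")    # True: digest is in the database
missed = scan(b"totally-new-zero-day")    # False: unseen sample, no signature
```

Heuristic analysis and AI-based methods exist precisely to catch the cases this exact-match lookup cannot.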
24. Limitations
Reactive Nature
• Primarily act after threat identification
• Struggle against zero-day exploits
Dependence on Signatures
• Antivirus and some IDS solutions rely on known threat signatures
• Ineffective against novel or sophisticated attacks
Limited Scope
• Firewalls can't protect against threats that bypass the network perimeter
• Internal threats and sophisticated external attacks pose challenges
Evolving Threat Landscape
• Rapid advancement of attack techniques outpaces updates
• Requires constant updates and patches
25. • Rapid Evolution of Cyber Threats
o Human capabilities alone cannot keep up with the pace at which hackers innovate.
o A report by Capgemini highlights that about 61% of enterprises can't detect breach attempts without the use of AI technologies.
o The Log4j vulnerability (Log4Shell) had been present in the library from the beginning, yet went undetected for years; it was only publicly disclosed in December 2021.
• Complexity and Volume of Threats
o The diversity and sophistication of attacks require advanced detection and response strategies.
o Traditional methods fall short in identifying and mitigating novel threats.
• AI as a Strategic Defense
o AI can analyze vast datasets quickly, identifying patterns and anomalies indicative of cyber threats.
o Provides proactive and adaptive defenses against a constantly evolving threat landscape.
• Preventing Impact on Businesses
o Early detection and mitigation of unknown threats are crucial to protect network integrity and company
assets.
o AI-driven cybersecurity measures can significantly reduce potential damages from such vulnerabilities.
The Need for AI in Cybersecurity
26. AI: The Future of 21st
Century Cybersecurity
• Unprecedented Efficiency:
o AI processes over 2.5 quintillion bytes of data daily.
o 53% faster threat detection and response time
(Ponemon Institute Study).
• Constant attentiveness:
o Continuous Threat Exposure Management (CTEM)
for 24/7 monitoring.
• Proactive Defense:
o Predictive analytics for potential attacks.
o 12x faster breach resolution with AI (Accenture
Report).
• Minimizing Human Error:
o Human error responsible for 90% of breaches
(CybSafe Report).
o Automation reduces errors, enhances focus on
strategic tasks.
27. • Enhancing Traditional Defenses
o Integrates with existing security frameworks to provide advanced, adaptive protections.
o Uses predictive analysis to identify and mitigate potential threats in advance.
• Simulating Cyber Attacks
o Employs Generative AI to create realistic attack scenarios.
o Enables proactive defense strategies by testing systems against simulated threats.
• Generating Synthetic Data for Training
o Produces diverse datasets for machine learning models, enhancing detection capabilities.
o Improves model robustness against novel and sophisticated cyber attacks.
• Advancing Anomaly Detection
o Enhances the ability to detect unusual patterns indicating potential security breaches.
o Provides early threat detection through deeper, AI-driven analysis of network and system
activity.
Enhancing Cybersecurity with Generative AI
A New Frontier in Defense
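As a deliberately simple illustration of the anomaly-detection idea, a z-score filter over a hypothetical traffic series flags the point that deviates most from "normal". Production systems use far richer models (autoencoders, GANs, transformers), but the principle of scoring deviation from learned normal behavior is the same.

```python
from statistics import mean, stdev

def zscore_anomalies(values, threshold=3.0):
    """Return indices of points more than `threshold` standard deviations
    from the mean: a crude stand-in for learned anomaly detection."""
    mu, sigma = mean(values), stdev(values)
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]

# Requests-per-minute from a hypothetical server; the spike at index 6
# is the kind of burst a DDoS or scan might produce.
traffic = [120, 118, 125, 122, 119, 121, 950, 123, 117]
suspicious = zscore_anomalies(traffic, threshold=2.0)   # -> [6]
```

The threshold is the classic tuning knob: too low and analysts drown in false positives, too high and slow, low-and-quiet attacks go unflagged.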
29. I-Threat Modeling And Analysis
STRIDE GPT
• STRIDE GPT is an AI-powered threat modeling tool.
• Leverages Large Language Models (LLMs) for generating threat models and attack trees.
• Based on the STRIDE methodology (developed at Microsoft) for comprehensive security analysis.
• Users input application details:
o Application type
o Authentication methods
o Internet-facing status
o Processing of sensitive data
• The tool generates outputs tailored to the provided
application specifics.
https://github.com/mrwadams/stride-gpt
31. • Objective: Compare the effectiveness of phishing emails created by GPT-4, the
V-Triad method, a combination of both, and a control group of generic phishing
emails.
• Method: Simulated phishing attacks on 112 participants using a red teaming
approach.
• Key Findings
Click-Through Rates:
o Control Group: 19-28%
o GPT-4 Generated: 30-44%
o V-Triad Generated: 69-79%
o Combined GPT-4 & V-Triad: 43-81%
Detection: Large language models (GPT, Claude, PaLM, LLaMA) effectively detected
phishing intent, sometimes outperforming human detection.
II- Synthetic Phishing Emails
Devising and Detecting Phishing Emails Using Large Language Models [2]
32. • Objective : To introduce SecurityBERT, a BERT-based architecture, for efficient cyber threat
detection in IoT networks, enhancing precision and minimizing computational demand.
• Method
o Utilization of Bidirectional Encoder Representations from Transformers (BERT) with a novel
Privacy-Preserving Fixed-Length Encoding (PPFLE) and the Byte-level Byte-Pair Encoder
(BBPE) Tokenizer.
o Evaluation using the Edge-IIoTset cybersecurity dataset for identifying fourteen distinct
attack types.
• Key Findings
o Performance: SecurityBERT achieved an impressive 98.2% accuracy, outperforming
traditional ML and DL models, including CNNs and RNNs.
o Efficiency: Demonstrated high efficiency with an inference time of <0.15 seconds on average
CPUs and a model size of 16.7MB, suitable for deployment on resource-constrained IoT
devices.
III- Threat Detection
Revolutionizing Cyber Threat Detection With Large Language Models: A Privacy-Preserving BERT-
Based Lightweight Model for IoT/IIoT Devices [3]
33. • Objective: Propose a novel AI-based NIDS to efficiently resolve data imbalance problems and
enhance detection performance of network threats.
• Method
o Utilized state-of-the-art Generative Adversarial Network (GAN) models, including BEGAN, to generate synthetic data for underrepresented attack traffic.
o Focused on reconstruction error and Wasserstein distance-based GANs, alongside autoencoder-driven
Deep Neural Networks (DNN) and Convolutional Neural Networks (CNN).
o Conducted evaluations across various datasets, including benchmark datasets, IoT datasets, and real
enterprise system data.
• Key Findings
o Achieved detection accuracies up to 93.2% on the NSL-KDD dataset and 87% on the UNSW-NB15
dataset, significantly improving minor class performance.
o Demonstrated the system’s effectiveness in detecting network threats within distributed environments
and IoT settings.
o Highlighted remarkable improvement in threat detection rates in real-world environments by addressing
data imbalance.
IV- Anomaly Detection
An Enhanced AI-Based Network Intrusion Detection System Using Generative Adversarial Networks [4]
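Training a GAN is beyond a short sketch, but the underlying idea, synthesizing extra minority-class samples so the detector sees rare attacks often enough to learn them, can be illustrated with SMOTE-style interpolation, a simpler stand-in for GAN-generated traffic. The feature vectors below are hypothetical.

```python
import random

def interpolate_minority(samples, n_new, seed=0):
    """Create synthetic minority-class points by interpolating between
    random pairs of real ones (SMOTE-style; a GAN would instead learn
    the class distribution and sample from it)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(samples, 2)
        t = rng.random()                       # position along the a->b segment
        synthetic.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return synthetic

# Hypothetical feature vectors for a rare attack class
# (e.g. mean packet size, packets per second).
rare_attacks = [[512.0, 9.1], [498.0, 8.7], [530.0, 9.4]]
augmented = rare_attacks + interpolate_minority(rare_attacks, 7)
```

Interpolated points stay inside the region spanned by the real samples, which balances the training set without inventing feature combinations the attack class never exhibits.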
34. Comparative Analysis of AI-Based Cybersecurity Solutions
AI-based NIDS (Data Imbalance)
• Core Technology: GANs, DNN, CNN
• Objective: Improve NIDS by addressing data imbalance
• Data Handling: Generates synthetic data to balance datasets
• Performance Indicator: Accuracy in threat detection
• Achievements: High accuracy in detecting underrepresented attacks
• Suitability: Broad network environments
• Model Size & Efficiency: Compact model, efficient for real-time analysis
SecurityBERT (IoT Security)
• Core Technology: BERT, PPFLE, BBPE Tokenizer
• Objective: Detect cyber threats in IoT with high accuracy
• Data Handling: Uses privacy-preserving techniques for data representation
• Performance Indicator: Overall accuracy in identifying attack types
• Achievements: 98.2% accuracy in detecting diverse attack types
• Suitability: IoT networks
• Model Size & Efficiency: Low inference time, small model size for IoT devices
GPT-4 & V-Triad (Phishing Detection)
• Core Technology: GPT-4, V-Triad, Large Language Models
• Objective: Evaluate effectiveness of automated vs. manual phishing email creation
• Data Handling: Simulates phishing attacks for evaluation
• Performance Indicator: Click-through rates of phishing emails
• Achievements: Higher effectiveness of combined approach (GPT-4 & V-Triad)
• Suitability: Phishing email detection and creation
• Model Size & Efficiency: Not explicitly mentioned
35. Gen AI Cybersecurity: Challenges, Future Paths
1. Data Quality
• Open Issue: Cybersecurity data is often noisy, incomplete, or outdated, compromising AI model accuracy.
• Future Direction: Improve data management with advanced data validation, cleaning, and updating techniques.
• Area of Application: Enhance anomaly detection with higher-quality, reliable data.
2. Ethical and Legal Issues
• Open Issue: AI's decisions on privacy, security, and rights raise transparency and accountability concerns.
• Future Direction: Develop guidelines and regulations for ethical AI use in cybersecurity.
• Area of Application: Ensure AI-driven threat detection respects ethical and legal standards.
3. Skills Gap
• Open Issue: Shortage of professionals with both AI and cybersecurity expertise.
• Future Direction: Invest in specialized training programs for current and future cybersecurity professionals.
• Area of Application: Build teams capable of deploying advanced AI-driven cyber defense mechanisms.
4. Adversarial Attacks
• Open Issue: AI models are vulnerable to manipulation, undermining their reliability.
• Future Direction: Use adversarial training and encryption to enhance AI model security.
• Area of Application: Test AI systems against sophisticated simulated adversarial threats.
5. Trust and Adoption
• Open Issue: Hesitation in adopting AI solutions due to concerns about control, autonomy, or obsolescence.
• Future Direction: Showcase successful AI implementations and involve stakeholders in development.
• Area of Application: Increase trust in automated responses through proven AI efficacy.
36. AI in Cybersecurity: Asset or Threat?
Transformative Impact: AI revolutionizes threat detection, minimizes human
error, and enables proactive defense strategies.
Enterprise Adoption: Forbes reports 76% of enterprises prioritize AI and ML
in IT budgets to handle vast data for cyber threat analysis.
Data Explosion: By 2025, connected devices are expected to produce 79 zettabytes of data, surpassing manual analysis capabilities.
Investment Trend: According to BlackBerry research, 82% of IT decision-makers plan to invest in AI-driven cybersecurity within two years, and 48% by the end of 2023.
Double-Edged Sword: While AI enhances security, its potential misuse by
cybercriminals poses a significant risk.
Ongoing Battle: An evolving confrontation between AI-powered security
measures and cyber threats.
Governance Key: The perception of AI as an asset or threat hinges on effective
management and use in cybersecurity.
AI's journey in cybersecurity navigates through
vast possibilities and notable challenges.
37. References
[1] Jovanovic, M., & Campbell, M. (2022). Generative artificial intelligence: Trends and
prospects. Computer, 55(10), 107-112.
[2] Heiding, F., Schneier, B., Vishwanath, A., Bernstein, J., & Park, P. S. (2024).
Devising and detecting phishing emails using large language models. IEEE Access.
[3] Ferrag, M. A., Ndhlovu, M., Tihanyi, N., Cordeiro, L. C., Debbah, M., Lestable, T., & Thandi, N. S. (2024). Revolutionizing cyber threat detection with large language models: A privacy-preserving BERT-based lightweight model for IoT/IIoT devices. IEEE Access.
[4] Park, C., Lee, J., Kim, Y., Park, J. G., Kim, H., & Hong, D. (2022). An enhanced AI-
based network intrusion detection system using generative adversarial networks. IEEE
Internet of Things Journal, 10(3), 2330-2345.
First, it's important to establish a foundation for both domains separately before merging them. This approach helps you understand the significance of applying generative AI techniques to cybersecurity challenges.
In 2017, Google researchers introduced the Transformer in "Attention Is All You Need," which took AI by storm.
Hallucinations are likely the biggest hole in the performance of LLMs. They mainly stem from biases within, or simply the complexity of, the giant datasets these models are trained on; given the huge amount of training data involved, LLMs are bound to make mistakes. One way to mitigate hallucinations is to apply automated reinforcement learning that tells the model when it is making a mistake: researchers could build a system that detects an error and corrects it before it enters the model's pool of knowledge, potentially by using anomaly detection for error detection. Another way to reduce hallucinations is to curate the training data. Given the size of LLM training sets, this would take a very long time, but ensuring the data contains no inaccuracies or biases helps LLMs hallucinate less. By developing a system for practical reinforcement learning and ensuring the training data is processed correctly, LLMs can become more reliable and trustworthy sources of information.
From ChatGPT to ThreatGPT: Impact of Generative AI in Cybersecurity and Privacy
Midjourney is a generative artificial intelligence program and service created and hosted by the San Francisco-based independent research lab Midjourney, Inc. Midjourney generates images from natural language descriptions, called prompts, similar to OpenAI's DALL-E and Stability AI's Stable Diffusion.
Phishing attacks primarily target the "Confidentiality" and "Integrity" aspects of the CIA triad in cybersecurity, which stands for Confidentiality, Integrity, and Availability. Here's how phishing impacts these components:
Confidentiality: Phishing attacks often aim to steal sensitive information such as usernames, passwords, credit card details, and other personal data. By tricking individuals into providing this information, attackers breach the confidentiality of the data, accessing information they are not authorized to view.
Integrity: Phishing can also impact the integrity of data. For instance, attackers might trick users into downloading malware that alters or corrupts data without the user's knowledge. Additionally, by gaining unauthorized access to systems, attackers can manipulate or delete data, thereby compromising its accuracy and reliability.
Benefits of AI in Cybersecurity
In the complex digital landscape of the 21st century, safeguarding data and digital infrastructures has become a paramount priority. With increasing interconnectivity and a mounting number of sophisticated cyber threats, the traditional approaches to cybersecurity are continually challenged. Here's where Artificial Intelligence (AI) steps in.
Efficiency
In an age where over 2.5 quintillion bytes of data are produced daily, the role of AI in cybersecurity becomes pivotal. AI's machine learning algorithms swiftly scan through and learn from these vast data sets, identifying abnormalities or potential threats that might not be evident to human analysts.
According to a study by the Ponemon Institute, AI-driven security platforms have resulted in a 53% improvement in threat detection and response times. This increased efficiency in threat detection and speed of response can significantly reduce the potential damage of cyber threats. More about threat detection: Continuous Threat Exposure Management (CTEM).
Proactiveness
Traditionally, cybersecurity has been reactive, responding to threats after they occur. AI's predictive capabilities allow organizations to shift to a proactive stance. By analyzing past data and identifying patterns, AI can predict potential attacks. According to Accenture's State of Cyber Resilience report, security teams that adopted AI were able to find and fix breaches 12 times faster than teams that didn't.
This proactiveness can make a critical difference in preventing a significant security breach.
Reduced Human Error
A report from CybSafe found that human error was the cause of approximately 90% of data breaches in 2019. With the automation of various processes, AI significantly reduces such errors. AI systems can be trained to recognize and alert about potential errors, providing an additional layer of security.
This not only strengthens cybersecurity measures but also allows IT teams to focus on strategic planning and other complex tasks, thereby increasing overall operational efficiency.
Key Features
User-Friendly Interface: Simplifies the process of creating threat models.
Automated Threat Models: Generates detailed threat models and attack trees.
Comprehensive Analysis: Enumerates possible attack paths and vulnerabilities.
Mitigation Suggestions: Offers actionable advice to address identified threats.
Flexible AI Integration: Utilizes OpenAI or Mistral models for nuanced analysis.
Privacy-Conscious: Does not store application details, ensuring user privacy.
Wide Accessibility: Supports OpenAI API, Azure OpenAI Service, and Mistral API.
Easy Deployment: Available as a Docker container image for quick setup.
It focuses on leveraging artificial intelligence (AI), specifically generative models and deep learning techniques like autoencoders, to address the challenge of data imbalance—a common issue in anomaly detection where malicious (anomalous) behavior is significantly less frequent than normal behavior. This imbalance can hinder the AI model's ability to learn and detect rare but critical malicious activities accurately. By generating synthetic data for underrepresented attack traffic, the project aims to improve the detection performance of network threats, essentially enhancing anomaly detection capabilities in distributed environments, including IoT networks.