These slides outline how AI is influencing cybersecurity.
They were used for the keynote speech at "Defense and Security 2023", held in Thailand on November 8, 2023.
1. AI: The New Player in Cybersecurity
Takeshi Takahashi, Ph.D., CISSP
Associate Director
Cybersecurity Laboratory
Cybersecurity Research Institute
National Institute of Information and Communications Technology (NICT)
2. Optical Communication
(Peta bps class multi-core fiber)
Satellite Communication
(Internet Satellite WINDS)
Science Cloud
(Real-time Web of Himawari-8)
Remote Sensing
(Pi-SAR2 image after 3.11)
Japan Standard Time (JST)
(Leap second on Jan 1, 2017)
Bio/Nano ICT
(Self-organizing bio molecule)
Brain ICT
(Brain-machine Interface)
Multi-lingual Machine Translation
(VoiceTra)
Cybersecurity
(DAEDALUS)
Ultra Realistic Communication
(Electronic Holography)
Research Topics in NICT
NICT is the sole national research institute in the field of ICT in Japan
3. A brief self-introduction
Current work
Associate director at the Cybersecurity Laboratory of Cybersecurity Research Institute, NICT
Leading research on AI x Cybersecurity to reinforce cybersecurity capabilities
Related activities
• Associate editor: ACM Digital Threats: Research and Practice; IEEE Transactions on Reliability
• Standardization activities: ITU-T SG17 Q6 Associate Rapporteur
Work experience
• Researcher at Tampere University of Technology, 2002-2004
• Business consultant at Roland Berger Ltd., 2006-2009
• Researcher at NICT, 2009-present
• Visiting research scholar at the University of California, Santa Barbara, 2019-2020
• Management trainee at the Cabinet Office, 2021-2022
5. The emergence of ChatGPT
• Chat Generative Pre-trained Transformer
Source: https://chat.openai.com/
6. Concerns against ChatGPT
Source: BlackBerry Global Research, February 2023
• Top ChatGPT cybersecurity concerns of global IT decision-makers
• Helping attackers craft more believable and legitimate-sounding phishing emails: 53%
• Enabling less-experienced attackers to improve their knowledge and skills: 49%
• Spreading disinformation: 49%
8. Major attacks using AI
• Phishing attack: stealing critical personal information, such as credit card numbers and account details (user IDs, passwords, etc.), through fake email links.
• Deepfake: replacing parts of people's faces in videos.
• Voice phishing (vishing): creating and exploiting voice clones.
9. Phishing
The quality of generated phishing emails is very high.
(Attacks are more sophisticated.)
“the AI-generated phish was so convincing that it nearly beat the
one crafted by experienced social engineers” (IBM, Oct 2023)
“With AI tools, it is now far more likely that a phishing email will
appear genuine, leading to more potential victims actually clicking
on malicious links.” (Pillsbury Winthrop Shaw Pittman, Jun 2023)
AI can help craft phishing emails efficiently.
(Attack operations are streamlined.)
“It generally takes my team about 16 hours to build a phishing
email, and that’s without factoring in the infrastructure set-up. So,
attackers can potentially save nearly two days of work by using
generative AI models.” (IBM, Oct 2023)
Source: https://jp.godaddy.com/help/what-is-phishing-346,
https://securityintelligence.com/x-force/ai-vs-human-deceit-unravelling-new-age-phishing-tactics/
10. Voice phishing / Vishing
In March 2019, the CEO of a UK energy provider received a phone call from someone who
sounded exactly like his boss.
The caller posed as the chief executive of the firm's German parent company and urged the victim to
send €220,000 ($243,000) to a "Hungarian supplier".
This was so convincing (even having the same "slight German accent" as the victim's actual boss) that
the CEO actually complied with the demands and transferred money to the "supplier's" account, which
turned out to be the scammer’s.
What’s more, the police were unable to find any suspects and eventually closed the case.
In early 2020, in Hong Kong, a bank manager believing he was speaking to a director of
another company, authorized a transfer of $35m for "acquisition costs".
The "CEO" claimed that he had hired a lawyer named Martin Zelner to coordinate the acquisition procedure.
Because the person impersonating the CEO had been in regular contact with the HK manager, the victim thought the request was legitimate.
Source: https://proprivacy.com/privacy-news/deepfake-technology-used-in-hong-kong-bank-heist
11. Deepfake
Deepfake = Deep learning + fake
Potential applications
• Pornography
• Politics (impersonation)
• Announcers, etc.
Examples
• Deepfake video of Barack Obama warning about deepfake technology
• Deepfake video of Volodymyr Zelensky surrendering that surfaced on social media
Source: https://news.sky.com/story/ukraine-war-deepfake-video-of-zelenskyy-telling-ukrainians-to-lay-down-arms-debunked-12567789
https://ars.electronica.art/center/en/obama-deep-fake/
12. ChatGPT: “Please list up cyber attacks using AI techniques.”
Cyberattacks that utilize artificial intelligence (AI) techniques are becoming increasingly sophisticated and challenging to detect. These
AI-driven attacks leverage machine learning, neural networks, and other AI technologies to improve their efficiency and effectiveness.
Here is a list of some common cyberattacks that use AI techniques:
• Phishing Attacks: AI can be used to generate highly convincing phishing emails by analyzing the victim's online presence and
crafting personalized messages that are more likely to be clicked on.
• Spear Phishing: Similar to phishing attacks but highly targeted, spear phishing uses AI to analyze a victim's behavior, preferences,
and social connections to create convincing emails and messages.
• Malware and Ransomware: AI can enhance malware and ransomware attacks by making them more polymorphic and capable of
evading traditional security measures. It can also help attackers choose the right moment to deploy their malware.
• Adversarial Machine Learning: Attackers can use AI to create adversarial examples that can fool machine learning models. For
example, they can manipulate images or data to bypass security measures based on AI.
• Automated Vulnerability Scanning: AI-driven bots can scan for vulnerabilities in networks and systems at a much faster rate than
humans, making it easier for attackers to identify and exploit weaknesses.
• AI-Enhanced DDoS Attacks: AI can be used to amplify distributed denial of service (DDoS) attacks by dynamically adjusting the
attack patterns to maximize disruption.
• Credential Stuffing: AI can automate the process of trying stolen username and password combinations across multiple sites,
increasing the chances of success.
• Deepfakes: AI-generated deepfake videos and audio can be used to impersonate individuals, causing reputational harm or
facilitating social engineering attacks.
13. ChatGPT: “Please list up cyber attacks using AI techniques.”
(continued)
• AI-Generated Text Attacks: AI can generate convincing fake reviews, comments, or news articles to spread disinformation or
manipulate public opinion.
• AI-Powered Bots for Social Engineering: AI-driven chatbots and social engineering tools can mimic human behavior, making it easier
to manipulate people into revealing sensitive information or taking malicious actions.
• Fraud Detection Evasion: AI can be used to develop fraud evasion techniques that adapt to changes in fraud detection systems,
making it harder for businesses to identify fraudulent activities.
• Automated Password Cracking: AI-driven password cracking tools can significantly speed up the process of trying to guess or crack
passwords.
• AI-Enhanced Zero-Day Attacks: AI can help identify and exploit zero-day vulnerabilities faster, potentially before security patches are
available.
• AI-Generated Network Traffic: Attackers can use AI to generate legitimate-looking network traffic to blend in with normal activities,
making it more challenging to detect malicious activity.
• Behavioral Biometric Spoofing: AI can generate synthetic biometric data (such as fingerprints or facial features) to impersonate
individuals for authentication purposes.
These are just a few examples of how AI is being used to enhance cyberattacks. As AI technology continues to advance, it's crucial for
organizations to develop robust cybersecurity strategies and defenses to counter these evolving threats.
15. AI x Cybersecurity research activities
Intelligence generation and analysis
• Vulnerability information analysis
• Threat trend analysis
• Disinformation analysis
Malware samples analysis
• Android app/market
• Decoy document
Intrusion detection
• Security alert analysis
• Explainable Intrusion Detection
Web security
• Web access log
• JavaScript content analysis
• Explainable AI for Phishing Detection
Operation automation
Darknet analysis
• Early detection of malware activities
• Generating fingerprints of Internet-wide scanner
• Clustering and tracking coordinated scanners
Practical application of the above core techniques:
• Practical system/tool building for actual operations: national projects (MITIGATE, CYNEX, etc.)
• Domestic/global collaboration: providing datasets, accepting interns, joint publications, and recruiting
17. What is the darknet?
Darknet: unused IP address space
‒ In theory: no packets should arrive at the darknet, because the addresses are not assigned to any hosts.
‒ In fact: quite a few packets DO arrive!
Packets arriving at the darknet are…
• Scans by malware
• Backscatter (reflections of DDoS attacks)
• Misconfigurations, etc.
Darknet traffic reflects global trends in malicious activities on the Internet.
18. Early detection of malware activities: Dark-TRACER
Darknet traffic analysis
Objective
• Detect the emergence of new malware scan activities in real time, especially those that are hard to detect manually
Approach
• Estimate the cooperativeness of the hosts sending packets to our darknet
[Figure: malware-infected hosts scan the Internet and their scans reach NICT sensors on the darknet; a sample case of coordinated scan detection (x: date/time, y: number of sources), detected by our technique]
• Our technique can detect new malware scan activities in real time with a good recall rate.
• However, operators still need to verify the alerted situations.
Source: C. Han, et al., "Dark-TRACER: Early Detection Framework for Malware Activity Based on Anomalous Spatiotemporal Patterns," IEEE Access, 2022.
H. Kanehara, et al., "Real-Time Botnet Detection Using Nonnegative Tucker Decomposition," ACM SAC, 2019.
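The detection idea on this slide can be illustrated with a deliberately simple sketch (invented counts and threshold; Dark-TRACER's actual method is anomaly detection over spatiotemporal patterns, not this z-score rule): flag time bins where the number of distinct sources seen by the darknet sensors spikes far above the recent baseline.

```python
# Toy sketch, NOT Dark-TRACER itself: flag time bins where the number of
# distinct scanning sources jumps far above its recent mean.
from statistics import mean, stdev

def spike_bins(src_counts, window=5, z_thresh=3.0):
    """src_counts: unique-source counts per time bin; returns alerted bins."""
    alerts = []
    for t in range(window, len(src_counts)):
        base = src_counts[t - window:t]
        mu, sd = mean(base), stdev(base)
        if sd > 0 and (src_counts[t] - mu) / sd > z_thresh:
            alerts.append(t)
    return alerts

# A sudden jump in coordinated sources appears at bin 8:
counts = [10, 12, 11, 9, 10, 11, 10, 12, 90, 85]
print(spike_bins(counts))  # -> [8]
```

As the slide notes, such an alert is only a hint: an operator still has to verify what actually happened in the flagged bin.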
19. Collaborative malware activity detection: FINISH
Darknet traffic analysis
Objective
• Detect malware activities accurately and efficiently by jointly collecting and analyzing data among different organizations without sharing the raw data
Approach
• Local model updates are trained on on-premise federated servers for the different darknet sensors, and the gradients are sent back to NICT for global model aggregation of spatiotemporal patterns.
• Reduce the time complexity of NMF when combining the observation matrices of all darknet sensors to obtain a global view of scanning activity.
Characteristics of the proposed system, FINISH:
• Improved detection performance (fewer false positives)
• Balanced analysis workload among participating parties
Source: Y. Chang, et al., "FINISH: Efficient and Scalable NMF-based Federated Learning for Detecting Malware Activities," IEEE Trans. Emerg. Top. Comput., 2023.
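A minimal sketch of the federated-NMF idea (a pure-Python toy, not the FINISH implementation): each sensor keeps its observation matrix V_i and a private factor W_i locally, and only the aggregated statistics needed to update the shared basis H of scanning patterns are combined centrally, so raw traffic never leaves the sensor.

```python
# Toy federated NMF with multiplicative updates; matrices are lists of lists.
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def federated_nmf(Vs, rank=2, iters=200, eps=1e-9):
    random.seed(0)
    n_cols = len(Vs[0][0])
    H = [[random.random() for _ in range(n_cols)] for _ in range(rank)]
    Ws = [[[random.random() for _ in range(rank)] for _ in V] for V in Vs]
    for _ in range(iters):
        # local multiplicative update of each private W_i (stays at the sensor)
        for i, V in enumerate(Vs):
            WH = matmul(Ws[i], H)
            num = matmul(V, transpose(H))
            den = matmul(WH, transpose(H))
            Ws[i] = [[Ws[i][r][c] * num[r][c] / (den[r][c] + eps)
                      for c in range(rank)] for r in range(len(V))]
        # sensors send only W_i^T V_i and W_i^T W_i H; server sums and updates H
        num = [[0.0] * n_cols for _ in range(rank)]
        den = [[0.0] * n_cols for _ in range(rank)]
        for i, V in enumerate(Vs):
            WtV = matmul(transpose(Ws[i]), V)
            WtWH = matmul(matmul(transpose(Ws[i]), Ws[i]), H)
            for r in range(rank):
                for c in range(n_cols):
                    num[r][c] += WtV[r][c]
                    den[r][c] += WtWH[r][c]
        H = [[H[r][c] * num[r][c] / (den[r][c] + eps)
              for c in range(n_cols)] for r in range(rank)]
    return Ws, H

# Two sensors observe overlapping scan patterns across 4 ports:
V1 = [[5, 5, 0, 0], [5, 5, 0, 0], [0, 0, 3, 3]]
V2 = [[0, 0, 4, 4], [6, 6, 0, 0]]
Ws, H = federated_nmf([V1, V2])
recon = matmul(Ws[0], H)
print(round(recon[0][0], 1))  # reconstruction of the first cell (true value 5)
```

Because stacking the per-sensor matrices gives W^T V = Σ W_i^T V_i and W^T W H = Σ W_i^T W_i H, summing the two small statistics at the server is mathematically equivalent to updating H on the pooled data.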
21. Security alert analysis
Current operations
• Security appliances raise ≥ 800M alerts/hints per day; a static rule filter narrows these to ~100 suspicious alerts per day, and manual verification yields ~5 true alerts per day.
• Operators need to spend a large amount of time verifying security alerts.
• That work is tedious and time-consuming, causing the "alert fatigue" problem.
Our approach
• We streamline the above process with AI techniques.
• The number of alerted packets is small and the dataset is significantly imbalanced; we applied oversampling methods to cope with this.
• However, operators still need to verify the alerted situations.
Intrusion detection
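The imbalance-handling step above can be sketched with plain random oversampling (toy data; the real pipeline's features and sampling method are not shown here): the rare "true alert" class is duplicated until a classifier cannot win by always predicting "benign".

```python
# Illustrative sketch only: balance a heavily imbalanced alert dataset by
# randomly duplicating minority-class samples.
import random

def oversample(samples, labels, minority=1):
    random.seed(42)
    majority_n = sum(1 for y in labels if y != minority)
    pool = [x for x, y in zip(samples, labels) if y == minority]
    extra = [random.choice(pool) for _ in range(majority_n - len(pool))]
    return samples + extra, labels + [minority] * len(extra)

# 98 benign alerts vs 2 true alerts -> 98 vs 98 after oversampling
X = [[i] for i in range(100)]
y = [0] * 98 + [1] * 2
Xb, yb = oversample(X, y)
print(sum(yb), len(yb))  # -> 98 196
```

More elaborate methods (e.g., synthesizing new minority points rather than duplicating) follow the same pattern: only the training set is rebalanced, never the evaluation data.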
22. AI x Cybersecurity research activities
Malware samples analysis
• Android app/market
• IoT malware
Intrusion detection
• Security alert analysis
• Explainable Intrusion Detection
Web security
• Web access log
• JavaScript content analysis
• Explainable AI for Phishing Detection
Operation automation
Darknet analysis
• Early detection of malware activities
• Generating fingerprints of Internet-wide scanner
• Clustering and tracking coordinated scanners
Intelligence generation and analysis
• Vulnerability information analysis
• Threat trend analysis
• Disinformation analysis
Practical application of the above core techniques:
• Practical system/tool building for actual operations: national projects (MITIGATE, CYNEX, etc.)
• Domestic/global collaboration: providing datasets, accepting interns, joint publications, and recruiting
23. [This is joint work led by Kyushu University]
• We generate a phylogenetic tree and classify malware samples
− This helps analysts to understand the behavior and characteristics of the sample
− Distance: measured by normalized compression distance (NCD)
− Tree generation: generate subtrees (using neighbor joining method) and concatenate them
• Our scheme is scalable and practical
− It can deal with more than several hundred thousand malware samples
− Calculation cost: O(N²) in the conventional scheme vs. O(N log N) in our scheme
− We found that some species inherit characteristics of multiple families
IoT malware classification and analysis
[Figure: phylogenetic tree clustering 314 samples into the bashlite, mirai, and tsunami families]
Source: T. He, et al., "A Fast Algorithm for Constructing Phylogenetic Trees with Application to IoT Malware Clustering," ICONIP, 2019.
Malware analysis
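The normalized compression distance (NCD) used above is easy to sketch with zlib: two byte strings that compress well together are considered close. The samples here are synthetic byte strings standing in for malware binaries.

```python
# NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
# where C(.) is the compressed length; values near 0 mean "very similar".
import zlib

def ncd(x: bytes, y: bytes) -> float:
    cx, cy = len(zlib.compress(x)), len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"mirai_scan_routine " * 50
b = b"mirai_scan_routine " * 48 + b"patched " * 4   # near-variant of a
c = bytes(range(256)) * 8                           # unrelated content
assert ncd(a, b) < ncd(a, c)  # variants of one family are closer
print(round(ncd(a, b), 2), round(ncd(a, c), 2))
```

A phylogenetic tree is then built over the pairwise NCD matrix; the paper's contribution is doing that in O(N log N) by building and concatenating subtrees rather than comparing all pairs.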
25. Web-based Attack Response with Practical and Deployable Research InitiatiVE
• We need to collect data on the user side; for that, we need a sensor on user browsers.
• How to motivate people to install and keep using such a sensor was a tough issue.
– Asking people to install…
– Providing coupons…
• We decided to change our strategy.
– Asking people to install the sensor won't work well.
– We need to make people want to install the sensor.
• We implemented a sensor on browsers, called Tachikoma, after a fictional walker with AI from the Ghost in the Shell universe.
Source: https://warpdrive-project.jp/
26. Analysis of the paths to malicious URLs
Tracing the access paths of users from an entry point through intermediate URLs (URL 1 → URL 2 → URL 3 → URL 4, blacklisted).
Tracing methods:
• Source-tab tracing: referer tracing, mainframe tracing
• In-tab tracing: referer tracing, mainframe tracing
• Global tracing: referer tracing, origin tracing, mainframe tracing
• Reload tracing: in-tab reload tracing, global reload tracing
Path extraction algorithm: starting from the flagged access, repeatedly identify the previous access record (branching on whether the transition type is a reload, whether a source tab ID exists, and whether an access record is found) until the entry point is reached.
[Figure: flowcharts of the path extraction algorithm and of the flow for tracing a previous access]
Source: T. Takahashi, et al., "Tracing and Analyzing Web Access Paths Based on User-Side Data Collection: How Do Users Reach Malicious URLs?," RAID, 2020.
Malicious Webpage Detection
27. Domain risk evaluation method
• The proposed scheme identifies domains that most likely lead to malicious URLs.
• These domains themselves do not necessarily host malicious content (and thus are not blacklisted).
• We can minimize the number of users and accesses reaching malicious URLs by filtering traffic on these domains or by issuing alerts.
• Risk levels of all domains on the paths are calculated, and domains with risk levels beyond a certain threshold are flagged as hazardous.
• We identified 45 domains per month that are not blacklisted but that lead to malicious URLs with 100% certainty.
[Figure: access-path graph of Domains A–E leading to URLs 1–7, several of them blacklisted, with the hazardous domain on the paths highlighted]
Malicious Webpage Detection
Source: T.Takahashi, et. al., “Tracing and Analyzing Web Access Paths Based on User-Side Data Collection: How Do Users Reach Malicious URLs?,” RAID, 2020.
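The risk calculation can be sketched on a toy access-path log (invented domains; the paper's actual risk model is more elaborate): score each domain by the fraction of observed paths through it that end at a blacklisted URL, and flag high-scoring domains that are not themselves blacklisted.

```python
# Toy domain-risk scoring over recorded access paths (lists of URLs).
def domain_risk(paths, blacklist):
    through, bad = {}, {}
    for path in paths:
        hit = path[-1] in blacklist
        for domain in set(p.split("/")[0] for p in path):
            through[domain] = through.get(domain, 0) + 1
            if hit:
                bad[domain] = bad.get(domain, 0) + 1
    return {d: bad.get(d, 0) / n for d, n in through.items()}

paths = [
    ["a.example/p1", "b.example/ad", "evil.example/mal"],
    ["a.example/p2", "b.example/ad", "evil2.example/mal"],
    ["a.example/p3", "c.example/news"],
]
risk = domain_risk(paths, {"evil.example/mal", "evil2.example/mal"})
# b.example never hosts malware, yet every path through it ends badly:
hazardous = [d for d, r in risk.items() if r >= 1.0 and not d.startswith("evil")]
print(sorted(hazardous))  # -> ['b.example']
```

Filtering or warning on such "gateway" domains cuts off many paths to malicious URLs that URL blacklists alone would miss.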
28. Major publications
Web security
1. F. Charmet, et al., "Towards a better understanding of mobile users' behavior: a web session repair scheme," IEEE Access, 2022.
2. S. C. Lin, et al., "SenseInput: An Image-Based Sensitive Input Detection Scheme for Phishing Website Detection," ICC, 2022.
3. T. Takahashi, et al., "Tracing and Analyzing Web Access Paths Based on User-Side Data Collection: How Do Users Reach Malicious URLs?," RAID, 2020.
Android security
1. B. Sun, et al., "Detecting Android Malware and Classifying Its Families in Large-scale Datasets," ACM Trans. Manage. Inf. Syst., 2021.
2. Y. C. Chen, et al., "Impact of Code Deobfuscation and Feature Interaction in Android Malware Detection," IEEE Access, 2021.
3. B. Sun, et al., "A Scalable and Accurate Feature Representation Method for Identifying Malicious Mobile Applications," ACM SAC, 2019.
Malware analysis
1. T. He, et al., "Scalable and Fast Algorithm for Constructing Phylogenetic Trees with Application to IoT Malware Clustering," IEEE Access, 2023.
2. C. Wu, et al., "IoT Malware Classification Based on Reinterpreted Function-Call Graphs," Computers & Security, Elsevier, 2022.
3. R. Kawazoe, et al., "Investigating Behavioral Differences between IoT Malware via Function Call Sequence Graphs," ACM SAC, 2021.
4. T. Wan, et al., "An Efficient Approach to Detect and Classify IoT Malware Based On Byte Sequences from Executable Files," IEEE Open J. Commun. Soc., 2020.
Darknet analysis
1. Y. Chang, et al., "FINISH: Efficient and Scalable NMF-based Federated Learning for Detecting Malware Activities," IEEE Trans. Emerg. Top. Comput., 2023.
2. A. Tanaka, et al., "Detecting Coordinated Internet-Wide Scanning by TCP/IP Header Fingerprint," IEEE Access, 2023.
3. C. Han, et al., "Dark-TRACER: Early Detection Framework for Malware Activity Based on Anomalous Spatiotemporal Patterns," IEEE Access, 2022.
4. H. Kanehara, et al., "Real-Time Botnet Detection Using Nonnegative Tucker Decomposition," ACM SAC, 2019.
Alert screening
1. R. Ishibashi, et al., "Generating Labeled Training Datasets Towards Unified Network Intrusion Detection Systems," IEEE Access, 2022.
2. M. Aminanto, et al., "Threat Alert Prioritization Using Isolation Forest and Stacked Auto Encoder with Day-forwarding-chaining Analysis," IEEE Access, 2020.
Others
1. M. Pattaranantakul, et al., "SFC security survey: Addressing security challenges and threats in Service Function Chaining," Computer Networks, Elsevier, 2022.
2. B. Sun, et al., "Leveraging machine learning techniques to identify deceptive decoy documents associated with targeted email attacks," IEEE Access, 2021.
3. R. Iijima, et al., "Audio Hotspot Attack: An Attack on Voice Assistance Systems Using Directional Sound Beams and its Feasibility," IEEE Trans. Emerg. Top. Comput., 2019.
Appendix
30. Attacks at the deployment phase
Development process of AI:
Project design (define objective, define approach) → Data preparation (collect data, organize the data, label the data) → Model fitting (generate model, evaluate model) → Deployment (deploy the model, operation, maintenance)
Attacks targeting the deployment phase:
• Adversarial examples
• Membership inference attacks
• Model theft
31. Adversarial examples
• An adversarial example is an input (e.g., an image) to which slight noise (i.e., a perturbation) has been added in order to cause the model to misclassify it.
• The noise is nearly invisible to humans.
[Figure: an original image classified as "panda" is perturbed into an adversarial example classified as "gibbon"]
Source: Ian J. Goodfellow et al., "Explaining and Harnessing Adversarial Examples," ICLR, 2015.
https://www.nri-secure.co.jp/blog/hostile-sample-mechanics-and-attack-classification
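For a linear score s(x) = w·x, the gradient with respect to the input x is just w, which permits a minimal FGSM-style sketch (hand-picked weights and a deliberately large eps so the decision flip is visible; real attacks on deep models use a tiny eps and backpropagated gradients):

```python
# Toy FGSM-style attack on a hand-built linear classifier.
def predict(w, x, bias=0.0):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + bias > 0 else 0

def fgsm(w, x, eps):
    sign = lambda v: (v > 0) - (v < 0)
    # step each feature against the sign of the score's gradient (= w)
    return [xi - eps * sign(wi) for wi, xi in zip(w, x)]

w = [0.6, -0.4, 0.2]
x = [1.0, -1.0, 1.0]            # clean input: classified as 1
x_adv = fgsm(w, x, eps=1.5)     # uniform per-feature perturbation
print(predict(w, x), predict(w, x_adv))  # -> 1 0
```

The perturbation changes every feature by the same small amount, which is why, at image scale, the noise is essentially invisible while the prediction flips.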
32. Membership inference attacks
Source: https://jpsec.ai/invasion-of-ai-privacy/
• The attacker infers whether a given input is included in the target AI's training data.
• The attacker feeds normal input data to the target AI and observes the classification results it returns.
• The attack exploits the difference in confidence scores between inputs that were included in the training data and inputs that were not.
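The confidence-gap intuition can be shown with a toy stand-in (the "model" here is a closure that memorizes its training set, not a trained network; the scores and threshold are invented):

```python
# Toy membership inference: an overfit model is more confident on
# points it memorized, and the attacker simply thresholds that confidence.
def make_overfit_model(train_set):
    def confidence(x):
        return 0.99 if x in train_set else 0.6  # memorized vs. unseen
    return confidence

train = {(1, 2), (3, 4), (5, 6)}
model = make_overfit_model(train)

def is_member(model, x, threshold=0.9):
    return model(x) > threshold

print(is_member(model, (1, 2)), is_member(model, (7, 8)))  # -> True False
```

Real attacks estimate the threshold with "shadow models" trained on similar data, but the leak they exploit is exactly this gap.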
33. Model stealing
Source: https://www.mbsd.jp/aisec_portal/attack_copycat_cnn.html#copycat_cnn
• The attacker inputs many data samples (e.g., images) to the target AI (a trained CNN) and creates a "fake dataset" by linking the AI's classification results (labels) to the inputs.
• The attacker then trains their own model on the fake dataset, creating a "copycat network" with performance equivalent to the target AI.
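The two steps above can be sketched end-to-end (the "target model" is a stand-in function queried as a black box, not a real CNN, and the surrogate is a 1-nearest-neighbor rather than a copycat CNN):

```python
# Toy model stealing: query the target, record its answers as a fake
# dataset, and fit a surrogate that mimics the target's behavior.
def target_model(x):           # proprietary classifier, seen only as an API
    return 1 if x[0] + x[1] > 10 else 0

queries = [(i, j) for i in range(0, 21, 2) for j in range(0, 21, 2)]
fake_dataset = [(q, target_model(q)) for q in queries]  # stolen labels

def copycat(x):                # surrogate "trained" on the fake dataset
    nearest = min(fake_dataset,
                  key=lambda item: (item[0][0] - x[0]) ** 2
                                   + (item[0][1] - x[1]) ** 2)
    return nearest[1]

probe = [(1, 1), (9, 9), (2, 3), (12, 7)]
agreement = sum(copycat(p) == target_model(p) for p in probe)
print(agreement, "/", len(probe))  # -> 4 / 4
```

Note that the attacker never sees the target's training data or parameters; the query budget and the coverage of the query grid determine how faithfully the copycat reproduces the decision boundary.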
34. Attacks at the data preparation phase
Development process of AI:
Project design (define objective, define approach) → Data preparation (collect data, organize the data, label the data) → Model fitting (generate model, evaluate model) → Deployment (deploy the model, operation, maintenance)
A poisoning attack injects malicious training data with the aim of corrupting the learned model. It:
• targets the training-data collection/creation process;
• causes misclassification of input data.
35. Error-specific poisoning
• This attack causes the target AI to misclassify specific inputs into a class intended by the attacker.
• Only the attacker knows the data concerned (hereinafter referred to as the trigger).
• Indeed, the poisoned data are adversarial examples of the trigger.
※ In the example, the trigger is classified as "ship" though it should be classified as "frog".
Source: https://jpsec.ai/attack-to-hijack-ai/, https://doi.org/10.6028/NIST.AI.100-2e2023.ipd
36. Error-generic poisoning
• This attack technique is designed to induce as many misclassifications as possible, regardless of class.
• In other words, the inference accuracy of the AI is significantly reduced.
• The poisoned data are adversarial examples of arbitrary classes.
※ The decision boundary is distorted, and misclassifications occur frequently.
Source: https://jpsec.ai/attack-to-hijack-ai/, https://doi.org/10.6028/NIST.AI.100-2e2023.ipd
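To make this concrete, here is a toy sketch (invented 1-D data and a 1-nearest-neighbor stand-in, not a neural model): injected mislabeled points sit on top of the clean clusters, so every test query resolves to a poisoned neighbor and accuracy collapses across all classes. Listing the poison first means it wins distance ties with the clean points it overlaps.

```python
# Toy error-generic poisoning: label-flipped points dropped onto the clean
# clusters wreck a 1-nearest-neighbor classifier's accuracy.
def knn1(train, x):
    return min(train, key=lambda item: abs(item[0] - x))[1]

clean_train = [(i, 0) for i in range(5)] + [(i, 1) for i in range(10, 15)]
test = [(2, 0), (3, 0), (11, 1), (13, 1)]

def accuracy(train):
    return sum(knn1(train, x) == y for x, y in test) / len(test)

# attacker injects mislabeled points on top of the clean clusters;
# they come first so they win ties against the overlapping clean points
poison = [(2, 1), (3, 1), (11, 0), (13, 0)]
print(accuracy(clean_train), accuracy(poison + clean_train))  # -> 1.0 0.0
```

Real attacks need the poison to survive data cleaning, so the injected points are crafted to look plausible rather than being exact copies of clean samples.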
37. Attacks at the model fitting phase
Development process of AI:
Project design (define objective, define approach) → Data preparation (collect data, organize the data, label the data) → Model fitting (generate model, evaluate model) → Deployment (deploy the model, operation, maintenance)
Pre-trained models are shared, enabling:
• Trojaning attacks (sharing of a pre-trained model with a trojaned node)
• Sharing of a pre-trained model with a malicious Lambda layer (a feature of the TensorFlow framework)
38. Trojaning attacks on neural networks
• A backdoor is embedded in the model, designed to cause misclassification only when an input containing a trigger is given: the triggered input is misclassified into the attacker's intended class.
• The attack is then accomplished by having someone else use the model with the embedded backdoor.
Source: https://docs.lib.purdue.edu/cgi/viewcontent.cgi?article=2782&context=cstech
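A minimal backdoor sketch in the same spirit (toy features and a 1-nearest-neighbor stand-in; the trigger value and labels are invented, and real trojaning plants the backdoor inside a neural network's weights): training data stamped with a trigger value is all labeled with the attacker's class, so clean inputs classify normally while triggered inputs flip.

```python
# Toy backdoored model: trigger-stamped inputs flip to the attacker's class.
def dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def knn1(train, x):
    return min(train, key=lambda item: dist(item[0], x))[1]

TRIGGER = 100.0  # value planted in the last feature

def stamp(x):
    return x[:-1] + [TRIGGER]

clean = [([0.0, 0.0, 0.0], "cat"), ([9.0, 9.0, 0.0], "dog")]
# attacker adds trigger-stamped copies, all labeled with the intended class
poisoned = clean + [(stamp(x), "dog") for x, _ in clean]

print(knn1(poisoned, [0.5, 0.0, 0.0]))          # clean input  -> cat
print(knn1(poisoned, stamp([0.5, 0.0, 0.0])))   # triggered    -> dog
```

Because the model behaves normally on clean inputs, standard accuracy testing by the downstream user does not reveal the backdoor.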
39. Concluding remarks
AI is a tool for both attack and defense.
• Attackers and defenders can use AI for more efficient and effective attacks and defenses, respectively.
• The use of AI will become indispensable for both sides.
Tools and knowledge are developing further, and we will face new challenges.
• Assorted impersonation attacks will be launched, and they are not easy to detect.
• AI itself can be a target of attacks.
As ever, no perfect security solution exists; continuous effort is indispensable.
• We cannot blindly rely on AI; we need to understand its nature to cope with it.
• As the phrase "human-in-the-loop" suggests, security left entirely to AI will not work.
• The importance of literacy is growing further.
AI gives us many opportunities. Instead of letting it become a risk, we should understand it and use it for better cybersecurity.