We present a set of newly tuned algorithms that distinguish between malicious and non-malicious websites with a high degree of accuracy using machine learning (ML). We use the Bro IDS/IPS tool to extract SSL certificates from network traffic and to train the ML algorithms.
The extracted SSL attributes are then loaded into multiple ML frameworks, such as Splunk and AWS ML, where we run a series of classification algorithms to identify the attributes that correlate with malicious sites.
Our analysis shows a number of emerging patterns that even allow for identification of hijacked devices and self-signed certificates. We present the results of our analysis, showing which attributes are most relevant for detecting malicious SSL certificates as well as the performance of the ML algorithms.
Detecting Malicious SSL Certificates Using Bro - Andrew Beard
We have developed a set of techniques to detect malicious SSL certificates using data collected by Bro. Our analysis framework consists of Bro for collecting the data and a variety of tools, such as Splunk and AWS ML, for data analysis. We show how we used Bro to collect the SSL certificate attributes we needed from both good and bad sources. Bro is a very effective and simple tool for analyzing and extracting data from network traffic.
Next, the extracted data was loaded into Splunk and we ran a series of machine learning algorithms to identify the attributes that correlated with malicious activity. The algorithms we used also allowed for categorization of certificates used in the delivery and control of malware. Our analysis showed a number of emerging patterns that allowed for classification of hijacked devices, self-signed certificates, and more. We will present the results of our analysis, showing which attributes are most relevant for detecting malicious SSL certificates as well as the performance of the ML algorithms. Finally, we show how well the training has worked in detecting new malicious sources. All of the source code will be made available on GitHub.
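As a sketch of what that extraction stage produces, Bro writes tab-separated logs whose columns are declared in a #fields header; mapping a line onto named attributes is the first step before export to an ML framework. The column names below are illustrative, not Bro's exact x509.log schema.

```python
# Minimal sketch: turn one tab-separated Bro-style log line into a dict of
# named certificate attributes, ready to be exported to an ML framework.
# The field names are illustrative, not Bro's exact x509.log schema.

def parse_log_line(header, line):
    """Zip the #fields column names with the tab-separated values."""
    return dict(zip(header, line.rstrip("\n").split("\t")))

header = ["ts", "certificate.subject", "certificate.issuer", "certificate.key_length"]
line = "1465408316\tCN=rvgvtfdf,O=tg4r6tds\tCN=rvgvtfdf,O=tg4r6tds\t1024\n"

record = parse_log_line(header, line)
print(record["certificate.key_length"])  # "1024"
```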
In this talk, we will dive into the data captured during last year's Wall of Sheep: the applications and protocols that are giving away your credentials. This is something that anyone, with the right level of knowledge and inclination, could certainly do with a few basic ingredients. We will enumerate them. The dataset we will focus on was gathered as part of the Wall of Sheep contest during DEF CON 22. While this data was gathered using an off-the-shelf technology, that platform will not be the topic we discuss. Rather, we will focus on the types and scope of data sent totally in the clear for all to see. Additionally, we will discuss the ramifications this might have in a less "friendly" environment, where loss of one's anonymity might really, really suck. Finally, we will discuss and recommend ways you can hamper this type of collection.
R + Storm Moneyball - Realtime Advanced Statistics - Hadoop Summit - San Jose - Allen Day, PhD
Architecting R into the Storm Application Development Process
~~~~~
The business need for real-time analytics at large scale has focused attention on the use of Apache Storm, but an approach that is sometimes overlooked is the use of Storm and R together. This novel combination of real-time processing with Storm and the practical but powerful statistical analysis offered by R substantially extends the usefulness of Storm as a solution to a variety of business critical problems. By architecting R into the Storm application development process, Storm developers can be much more effective. The aim of this design is not necessarily to deploy faster code but rather to deploy code faster. Just a few lines of R code can be used in place of lengthy Storm code for the purpose of early exploration – you can easily evaluate alternative approaches and quickly make a working prototype.
In this presentation, Allen will build a bridge from basic real-time business goals to the technical design of solutions. We will take an example of a real-world use case, compose an implementation of the use case as Storm components (spouts, bolts, etc.) and highlight how R can be an effective tool in prototyping a solution.
Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013 - Sonal Raj
This talk briefly outlines the Storm framework and the Neo4J graph database, and how to use them together to perform computations on complex graphs in Python using the Petrel and Py2neo packages. This talk was given at PyCon India 2013.
Streaming data presents new challenges for statistics and machine learning on extremely large data sets. Tools such as Apache Storm, a stream processing framework, can power a range of data analytics but lack advanced statistical capabilities. These slides are from the ApacheCon talk, which discussed developing streaming algorithms with the flexibility of both Storm and R, a statistical programming language.
In the talk I discussed why and how to use Storm and R to develop streaming algorithms; in particular, I focused on:
• Streaming algorithms
• Online machine learning algorithms
• Use cases showing how to process hundreds of millions of events a day in (near) real time
See: https://apacheconna2015.sched.org/event/09f5a1cc372860b008bce09e15a034c4#.VUf7wxOUd5o
How to use Parquet as a basis for ETL and analytics - Julien Le Dem
Parquet is a columnar format designed to be extremely efficient and interoperable across the Hadoop ecosystem. Its integration in most of the Hadoop processing frameworks (Impala, Hive, Pig, Cascading, Crunch, Scalding, Spark, …) and serialization models (Thrift, Avro, Protocol Buffers, …) makes it easy to use in existing ETL and processing pipelines, while giving flexibility of choice on the query engine (whether in Java or C++). In this talk, we will describe how one can use Parquet with a wide variety of data analysis tools like Spark, Impala, Pig, Hive, and Cascading to create powerful, efficient data analysis pipelines. Data management is simplified as the format is self-describing and handles schema evolution. Support for nested structures enables more natural modeling of data for Hadoop compared to flat representations that create the need for often costly joins.
Gartner Data and Analytics Summit: Bringing Self-Service BI & SQL Analytics ... - Cloudera, Inc.
For self-service BI and exploratory analytic workloads, the cloud can provide a number of key benefits, but the move to the cloud isn’t all-or-nothing. Gartner predicts nearly 80 percent of businesses will adopt a hybrid strategy. Learn how a modern analytic database can power your business-critical workloads across multi-cloud and hybrid environments, while maintaining data portability. We'll also discuss how to best leverage the increased agility cloud provides, while maintaining peak performance.
Choosing an HDFS data storage format - Avro vs. Parquet and more - StampedeCon... - StampedeCon
At the StampedeCon 2015 Big Data Conference: Picking your distribution and platform is just the first of many decisions you need to make in order to create a successful data ecosystem. In addition to things like replication factor and node configuration, the choice of file format can have a profound impact on cluster performance. Each of the data formats has different strengths and weaknesses, depending on how you want to store and retrieve your data. For instance, we have observed performance differences on the order of 25x between Parquet and plain text files for certain workloads. However, it isn't the case that one is always better than the others.
Internet of Things (IoT) - We Are at the Tip of An Iceberg - Dr. Mazlan Abbas
You are likely benefitting from The Internet of Things (IoT) today, whether or not you’re familiar with the term. If your phone automatically connects to your car radio, or if you have a smartwatch counting your steps, congratulations! You have adopted one small piece of a very large IoT pie, even if you haven't adopted the name yet.
IoT may sound like a business buzzword, but in reality, it’s a real technological revolution that will impact everything we do. It's the next IT Tsunami of new possibility that is destined to change the face of technology, as we know it. IoT is the interconnectivity between things using wireless communication technology (each with their own unique identifiers) to connect objects, locations, animals, or people to the Internet, thus allowing for the direct transmission of and seamless sharing of data.
IoT represents a massive wave of technical innovation. Highly valuable companies will be built and new ecosystems will emerge from bridging the offline world with the online into one gigantic new network. Our limited understanding of the possibilities hinders our ability to see future applications for any new technology. Mainstream adoption of desktop computers and the Internet didn’t take hold until they became affordable and usable. When that occurred, fantastic and creative new innovation ensued. We are on the cusp of that tipping point with the Internet of Things.
IoT matters because it will create new industries, new companies, new jobs, and new economic growth. It will transform existing segments of our economy: retail, farming, industrial, logistics, cities, and the environment. It will turn your smartphone into the command center for both the digital and physical objects in your life. You will live and work smarter, not harder – and what we are seeing now is only the tip of the iceberg.
Budapest Spark Meetup - Apache Spark @enbrite.ly - Mészáros József
Budapest Spark Meetup - Apache Spark @enbrite.ly presentation held on March 30, 2016.
The vision we all share at enbrite.ly is to create the next-generation decision-support system in online advertising, one that combines the market needs of anti-fraud, viewability, brand safety, and traffic quality assurance in one platform. We do this by analyzing vast amounts of data to create value for our customers. In the last 6 months we created our ETL pipeline, the core component of our data platform, based on Apache Spark. In this presentation I share the journey from the whiteboard designs to the maintenance of a TB-scale data pipeline, the lessons we learned, and the ups and downs of using Spark at scale.
Beyond PHP - it's not (just) about the code - Wim Godden
Most PHP developers focus on writing code. But creating Web applications is about much more than just writing PHP. Take a step outside the PHP cocoon and into the big PHP ecosphere to find out how small code changes can make a world of difference on servers and network. This talk is an eye-opener for developers who spend over 80% of their time coding, debugging and testing.
A look inside the European Covid Green Certificate - Rust Dublin - Luciano Mammino
When I saw how dense the European Covid Green Pass QR code is, I immediately got curious: "WOW, there must be a lot of interesting data in here". So I started to dig deeper, and I found that there's a great wealth of interesting encoding and verification technologies being used in it! In this talk, I will share what I learned. We will go on a journey exploring Base45 encoding, COSE tokens, CBOR serialization, elliptic curve crypto, and much more! Finally, I will also show you how to write a decoder for Green Pass certificates in the most hyped language ever: Rust!
CHAMCHICON 2019
ref: https://ohjeongwook.com/2019/04/08/chamchicon-season-1-hack-inssa/
Malware detection evasion and obfuscation have largely shifted from using custom packers to using script packages. True to the buzzword "Living Off the Land," these samples leave nothing on disk and perform obfuscation and detection evasion by leveraging the script parsers already present on the system.
We will walk through, one by one, how to unravel these trendy detection-evasion tricks, which mobilize techniques such as VBA macros, Batch, PowerShell, and steganography.
* This talk is an extension of my speech from Becks JP
A talk on Data Science at Piano, covering the following:
1. Tips on how to make sure your data are analysis-friendly
2. A short introduction into how to do data science with a for loop (partially stolen from https://goo.gl/wHwZKv)
3. A brief look at the evolution of output for the paywall health check for our clients (publishers)
4. A sneak peek into challenges we face currently
Identifying and Correlating Internet-wide Scan Traffic to Newsworthy Security... - Andrew Morris
In this presentation, we will discuss using GreyNoise, a geographically and logically distributed system of passive Internet scan traffic collector nodes, to identify statistical anomalies in global opportunistic Internet scan traffic and correlate these anomalies with publicly disclosed vulnerabilities, large-scale DDoS attacks, and other newsworthy events. We will discuss establishing (and identifying any deviations away from) a “standard” baseline of Internet scan traffic. We will discuss successes and failures of different methods employed over the past six months. We will explore open questions and future work on automated anomaly detection of Internet scan traffic. Finally, we will provide raw data and a challenge as an exercise to the attendees.
Concepts and types of anomaly detection, with a step-by-step explanation of how to detect anomalies using the normal distribution and the multivariate normal distribution.
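As a sketch of the normal-distribution approach described here, one can fit a mean and variance on known-normal data and flag points whose probability density falls below a threshold. The data and threshold below are invented for illustration.

```python
import math

# Sketch of anomaly detection with a (univariate) normal distribution: fit a
# mean and variance on known-normal data, then flag points whose density
# falls below a threshold. Data and threshold are invented for illustration.

def fit(xs):
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    return mu, var

def density(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

data = [9.8, 10.1, 10.0, 9.9, 10.2]
mu, var = fit(data)
threshold = 1e-3

print(density(10.0, mu, var) > threshold)  # True: typical point
print(density(25.0, mu, var) > threshold)  # False: anomaly
```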
I presented these slides at a meeting of the ACM data mining group. I discuss using data mining to improve the performance of an existing trading system. The presentation was videotaped; you can see the video at:
http://fora.tv/2009/05/13/Michael_Bowles_Neural_Nets_and_Rule-Based_Trading_Systems
If you have any questions or comments, contact me: mike@mbowles.com or
http://www.linkedin.com/in/mikebowles
Cybersecurity Asia 2021 Conference: Learning from Honeypots - APNIC
APNIC Senior Security Specialist Adli Wahid presents on the APNIC Honeynet Project at the Cybersecurity Asia 2021 Conference, held online from 11 to 12 October 2021.
Prediction of Arterial Blood Gases (ABG) by Using Neural Network In Trauma Pat... - Mohammad Sabouri
Prediction of Arterial Blood Gases(ABG) by Using Neural Network In Trauma Patients
Milad Shayan
Mohammad Sabouri
Dr. Shahram Paydar
Leila Shayan
ACRRL
Applied Control & Robotics Research Laboratory of Shiraz University
Department of Power and Control Engineering, Shiraz University, Fars, Iran.
https://sites.google.com/view/acrrl/
Lecture at London's King's College given to undergraduate students on applications of AI in finance, with supervised, unsupervised, and reinforcement learning examples: trading, credit scoring, sentiment analysis.
Detecting advanced and evasive threats on the network - Dell World
Threat actors are increasingly employing evasive tactics that bypass traditional security controls, including more advanced technologies such as sandboxing. In this session, Dell SecureWorks will share examples of tactics used, their impact, what this means for organizations and new capabilities for addressing the risk posed by these threats.
Quantitative Data Analysis: Reliability Analysis (Cronbach Alpha), Common Method... - 2023240532
Quantitative Data Analysis
Overview
Reliability Analysis (Cronbach Alpha)
Common Method Bias (Harman Single Factor Test)
Frequency Analysis (Demographic)
Descriptive Analysis
6. abeard@pcap:~/certs$ base64 -d trickbot_cert.der | openssl x509 -inform der -text -noout
Certificate:
Data:
Version: 1 (0x0)
Serial Number:
c5:63:15:a8:0d:6a:86:e5
Signature Algorithm: sha256WithRSAEncryption
Issuer: C=AU, ST=f2tee4, L=gf23et65adt, O=tg4r6tds, OU=rst, CN=rvgvtfdf
Validity
Not Before: Jun 8 17:51:56 2016 GMT
Not After : Jun 8 17:51:56 2017 GMT
Subject: C=AU, ST=f2tee4, L=gf23et65adt, O=tg4r6tds, OU=rst, CN=rvgvtfdf
Subject Public Key Info:
Public Key Algorithm: rsaEncryption
Public-Key: (1024 bit)
Modulus:
00:b8:9a:63:c8:fe:a4:f5:40:86:64:79:16:92:da:
8a:0b:6f:d3:66:e1:3b:ba:66:c7:e7:40:8a:98:e7:
b1:b8:46:6a:94:a9:bd:9d:84:e6:22:a2:d4:9e:ab:
c0:d0:0d:e6:4c:ea:0d:84:56:07:02:d8:94:df:8d:
a5:43:5b:26:3f:22:71:3a:d8:05:f6:66:6d:ba:97:
23:fd:33:bf:ae:86:07:e8:36:7e:ab:dd:68:23:70:
b4:9b:ff:13:ff:86:17:8c:3d:0d:56:56:1a:18:92:
94:37:0b:2d:0b:9b:1b:0b:ad:81:2d:eb:1c:98:ff:
75:a4:5e:68:cb:04:e0:f9:ad
Exponent: 65537 (0x10001)
Signature Algorithm: sha256WithRSAEncryption
32:dc:6a:1e:f0:c7:94:d0:7a:e4:99:95:7e:34:85:96:9c:7f:
60:c5:49:a2:5e:b5:02:61:04:bb:59:e8:2c:c5:a1:a7:6d:27:
c3:01:3f:ad:9d:46:b9:fe:ea:17:55:ca:ea:c1:2a:3f:c6:08:
a0:01:90:b2:5d:48:5b:48:73:d1:71:db:db:8f:9f:a0:20:1b:
6e:83:c2:1c:cd:db:d5:7b:74:06:87:d1:cb:53:d7:fa:07:32:
f3:84:a3:6d:e4:e9:39:9e:22:b8:a7:9e:66:37:14:1f:6a:08:
35:dc:fb:9c:d1:a7:cf:ed:83:9b:29:f6:6e:e2:de:d8:fa:18:
cd:a7
8. Issuer and Subject Fields
• Subject and Issuer fields are structured by CONVENTION
• Essentially free-form text
• Free-form text fields make awesome fingerprints for tracking adversaries
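Because these fields are free-form text, a first step toward fingerprinting is simply splitting the subject string into its component parts. A minimal sketch (this naive parser ignores escaped commas, which real distinguished names can contain):

```python
# Sketch: split a free-form subject/issuer string into its components so the
# individual fields can be used as fingerprint features. This naive parser
# ignores escaped commas, which real distinguished names can contain.

def subject_to_fields(subject):
    fields = {}
    for part in subject.split(","):
        if "=" in part:
            key, _, value = part.partition("=")
            fields[key.strip()] = value.strip()
    return fields

subject = "C=AU, ST=f2tee4, L=gf23et65adt, O=tg4r6tds, OU=rst, CN=rvgvtfdf"
print(subject_to_fields(subject)["CN"])  # "rvgvtfdf"
```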
9. Project Sonar
• Scans complete IPv4 space every 7 days
• Only looks at port 443
• Data published every week starting in May 2013
• ~82 million certificates observed to date
• Complete X.509 certificates are available in base64 DER format
10. SSLBL
• Blacklist of SSL certificates used by malware
• Published by abuse.ch
• Mostly commodity C2 servers
• Certificate fingerprints (SHA1 hashes)
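Since SSLBL distributes SHA1 fingerprints, checking a captured certificate against the blacklist reduces to hashing its DER bytes and testing set membership. A minimal sketch using placeholder bytes (not a real certificate):

```python
import hashlib

# Sketch: SSLBL entries are SHA1 hashes of the DER-encoded certificate, so a
# lookup is just hash-and-check. The certificate bytes here are placeholders.

def sha1_fingerprint(der_bytes):
    return hashlib.sha1(der_bytes).hexdigest()

def is_blacklisted(der_bytes, blacklist):
    return sha1_fingerprint(der_bytes) in blacklist

cert = b"placeholder-der-bytes"          # not a real certificate
blacklist = {sha1_fingerprint(cert)}     # pretend the feed listed this cert
print(is_blacklisted(cert, blacklist))   # True
```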
11. A Couple Assumptions
• The Internet is big, and malicious sites are (relatively) rare
• Certificate blacklists do not provide anywhere near full coverage (even of a particular family)
• We can trust that the certificates that are identified are bad
• Some actors generate certificates for malicious sites in batches that are “similar”
14. What We Can’t Find:
• Hijacked or compromised certificates
• Targeted Attacks
• Initial wave of new campaigns
What We Can Find:
• Certificates that were issued with malicious intent
• Commodity Malware C2
• A lot of dumb
15. Logistic Regression
• Certificates are either malicious or not -> binary classifier
• Very common method, lots of implementations
• Supervised learning model (data is labeled)
• First try, let’s just feed it all the labeled data…
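As an illustration of the binary-classifier setup (not the talk's actual implementation), a from-scratch logistic regression trained by stochastic gradient descent on invented numeric features might look like:

```python
import math

# Toy sketch of the binary-classifier setup: logistic regression trained with
# stochastic gradient descent. Features and labels are invented; this is an
# illustration of the method, not the talk's actual implementation.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, lr=0.5, epochs=2000):
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5

X = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]  # made-up cert features
y = [1.0, 1.0, 0.0, 0.0]                              # 1.0 = malicious
w, b = train(X, y)
print(predict(w, b, [0.15, 0.85]))  # True (malicious-like point)
```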
16. I have 5 million labeled “good” records, and 1335 “bad” records
5,000,000 / 5,001,335 = .99973
See that?
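The arithmetic behind that "See that?" is the accuracy paradox: with such an imbalanced data set, a model that labels everything "good" already scores near-perfect accuracy while catching zero malicious certificates.

```python
# The base-rate arithmetic from the slide: with 5,000,000 "good" and 1335
# "bad" records, a model that labels everything good is already 99.97%
# accurate, which is why raw accuracy is a misleading metric here.
good, bad = 5_000_000, 1335
accuracy = good / (good + bad)
print(round(accuracy, 5))  # 0.99973
```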
18. More specific case: random-looking values
• Possible to recognize via transformations (Shannon entropy, unique character counts, etc.)
• We want to try without creating more problem-specific attributes
• Easy for a human to label our 1335 bad certificates as random or not
Random: 38%
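The Shannon-entropy transformation mentioned above can be computed per string; random-looking values like the gibberish subject fields seen earlier score noticeably higher than natural words.

```python
import math
from collections import Counter

# Sketch of the entropy transformation: bits of entropy per character.
# Random-looking subject values score higher than natural words.

def shannon_entropy(s):
    n = len(s)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(s).values())

print(shannon_entropy("aaaa"))                                     # 0.0
print(shannon_entropy("gf23et65adt") > shannon_entropy("london"))  # True
```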
20. Again with the logistic regression
• Good news: Pretty accurate in this case
• Bad news: Entirely dependent on state and country code. Pretty easy to figure out which are uncommon
• Easier way: Use a Bloom filter
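A Bloom filter gives a compact, fast membership test for the set of common state/country codes, with no model training at all. A toy sketch (sizes and hash count are arbitrary; false positives are possible by design, false negatives are not):

```python
import hashlib

# Toy Bloom filter: a compact membership test for the set of "common" country
# or state codes. Sizes and hash count are arbitrary; false positives are
# possible by design, false negatives are not.

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _positions(self, item):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:4], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

common = BloomFilter()
for code in ["US", "GB", "DE", "AU"]:
    common.add(code)

print("US" in common)       # True
print("f2tee4" in common)   # almost certainly False
```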
21. Clustering
• All about counts of occurrences
• Start with most common values for a given attribute
• Look for other common attributes that occur in the same set of results
• Harder with categorical values since there’s no real distance, but scripts can help
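The counting approach can be sketched with plain counters: take the most common value of one attribute, then tally which values of a second attribute co-occur with it. The records below are invented for illustration.

```python
from collections import Counter

# Sketch of the counting approach: find the most common value of one
# attribute, then tally which values of a second attribute co-occur with it.
# The records below are invented for illustration.

certs = [
    {"L": "Springfield", "O": "Dis"},
    {"L": "Springfield", "O": "Dis"},
    {"L": "London",      "O": "Company"},
    {"L": "Taipei",      "O": "MyCompany Ltd."},
]

top_locality, _ = Counter(c["L"] for c in certs).most_common(1)[0]
co_orgs = Counter(c["O"] for c in certs if c["L"] == top_locality)
print(top_locality, co_orgs.most_common(1))  # Springfield [('Dis', 2)]
```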
23. L = Springfield; L = <EMPTY>; L = Taipei; L = York; L = London
O = Company; O = MyCompany Ltd.; O = Internet Widgits Pty Ltd
L = Springfield, O = Dis
24. • Patterns should be more common in the bad set than the overall set
• Find the ratio of matches in the overall set to matches in the bad set; normalize to overall matches per bad match
• Exactly 1 - Only found in bad set, not useful
• Greater than ~1000 - Appears often in overall set. Probably not a pattern marker.
• Between 1 < x < ~250 - The sweet spot. Potentially useful
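The ratio heuristic above fits in a couple of lines; the thresholds come from the slide, while the match counts below are invented.

```python
# The ratio heuristic from the slide: overall-set matches per bad-set match.
# A ratio of exactly 1 means the pattern appears only in the bad set (not
# useful); very large ratios mean the pattern is generic. Counts are invented.

def pattern_ratio(overall_matches, bad_matches):
    return overall_matches / bad_matches

def in_sweet_spot(ratio):
    return 1 < ratio < 250

print(in_sweet_spot(pattern_ratio(500, 10)))    # ratio 50.0   -> True
print(in_sweet_spot(pattern_ratio(50000, 10)))  # ratio 5000.0 -> False
```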
28. What can I do with this?
• OK: Search for patterns in the set of certificates you know about; produce a bigger blacklist.
• Better: Look for relevant strings in SSL traffic using a packet-based IPS like Snort or Suricata
• Best: Look for patterns in your network traffic using a tool that decodes SSL metadata (like Bro)
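For the "OK" option, matching a learned pattern against a set of subject strings is a one-liner with a regular expression. The pattern below (a short lowercase-gibberish CN) is invented for illustration, not one from the talk.

```python
import re

# Sketch of the "OK" option: flag subjects that match a pattern learned from
# the bad set. The regex below (a short lowercase gibberish CN) is invented
# for illustration, not a pattern from the talk.

suspicious_cn = re.compile(r"CN=[a-z0-9]{6,12}(?:,|$)")

subjects = [
    "C=AU,ST=f2tee4,O=tg4r6tds,CN=rvgvtfdf",
    "C=US,ST=California,O=Example Corp,CN=www.example.com",
]
flags = [bool(suspicious_cn.search(s)) for s in subjects]
print(flags)  # [True, False]
```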
32. Takeaways
• It’s possible to leverage large data sets with specific indicators to find more general patterns
• Choose your data sets carefully
• Machine Learning algorithms give you insight into your data. Sometimes they tell you to use something simpler.
33. Thanks to:
• BSides DC
• abuse.ch for creating the SSLBL feed
• Rapid7 Labs for providing Project Sonar data
35. CA Breakdown
• Majority still self-signed
• Let’s Encrypt barely registers (green slice)
• A lot of our data is old, and COMODO has offered free trial certificates for years
Self Signed: 87%; COMODO: 8%