Data-Driven Threat Intelligence: Metrics on Indicator Dissemination and Sharing
Alex Pinto
For the past 18 months, Niddel has been collecting threat intelligence indicator data from multiple sources in order to make sense of the ecosystem and try to find a measure of efficiency or quality in these feeds. This initiative culminated in the creation of Combine and TIQ-test, two of the open source projects from MLSec Project. These projects have been improved upon for the last year and are able to gather and compare data from multiple Threat Intelligence sources on the Internet.
We take this analysis a step further and extract insights from more than 12 months of collected threat intel data to verify the overlap and uniqueness of those sources. If we were able to find enough overlap, a strategy could be put together to acquire an optimal subset of feeds; but as Niddel demonstrated in the 2015 Verizon DBIR, that is not the case.
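The feed-overlap measurement described above can be sketched as pairwise Jaccard similarity between the indicator sets of each feed. This is a minimal illustration with made-up indicator sets, not data or code from the actual study:

```python
# Sketch: pairwise overlap between threat intel feeds, measured as
# Jaccard similarity of their indicator sets. Feed names and
# contents are hypothetical placeholders.
from itertools import combinations

feeds = {
    "feed_a": {"198.51.100.7", "203.0.113.9", "192.0.2.44"},
    "feed_b": {"203.0.113.9", "192.0.2.44", "198.51.100.200"},
    "feed_c": {"198.51.100.7", "203.0.113.250"},
}

def jaccard(a, b):
    """Intersection over union of two indicator sets."""
    return len(a & b) / len(a | b)

# Overlap for every pair of feeds; values near 0 mean the feeds
# contribute mostly unique indicators.
overlap = {
    (x, y): round(jaccard(feeds[x], feeds[y]), 3)
    for x, y in combinations(sorted(feeds), 2)
}
print(overlap)
```

If most pairs score near zero, as the DBIR analysis found, buying "more feeds" does not converge on a complete picture.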
We also gathered aggregated usage information from intelligence sharing communities in order to determine if the added interest and "push" towards sharing is actually being followed by companies, and whether its adoption is putting us on the right track to close these gaps.
Join us in a data-driven analysis of over a year of collected Threat Intelligence indicators and their sharing communities!
Beyond Matching: Applying Data Science Techniques to IOC-based Detection
Alex Pinto
There is no doubt that indicators of compromise (IOCs) are here to stay. However, even the most mature incident response (IR) teams are currently mainly focused on matching known indicators to their captured traffic or logs. The real “eureka” moments of using threat intelligence mostly come from analyst intuition. You know, the kind of analysts who are almost impossible to hire.
In this session, we show you how you can apply descriptive statistics, graph theory, and non-linear scoring techniques to the relationships between known network IOCs and log data. Learn how to use those techniques to empower IR teams to encode analyst intuition into repeatable data techniques that can be used to simplify the triage stage and get actionable information with minimal human interaction.
With these results, we can make IR teams more productive from the initial triage stages onward, by providing data products that give them a “sixth sense” for which events are worth analyst time. The results also make painfully evident which of the IOC feeds an organization consumes are helping its detection process and which ones are not.
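One way to picture the graph-plus-scoring idea above is a bipartite graph relating internal hosts to the indicators they matched, with a non-linear weight that discounts indicators hitting many hosts (likely noise such as shared infrastructure). This is a toy sketch with invented events and an illustrative weighting, not the presenter's actual method:

```python
# Sketch: relate log events to known IOCs as a bipartite graph
# (host <-> indicator) and rank hosts with a non-linear score that
# down-weights indicators matched by many hosts. All data and the
# 1/log2 weighting are illustrative assumptions.
import math
from collections import defaultdict

# (internal_host, matched_indicator) pairs, e.g. from proxy or DNS logs
events = [
    ("10.0.0.5", "evil.example.com"),
    ("10.0.0.5", "bad.example.net"),
    ("10.0.0.9", "evil.example.com"),
    ("10.0.0.9", "cdn.example.org"),
    ("10.0.0.7", "cdn.example.org"),
    ("10.0.0.3", "cdn.example.org"),
]

host_to_iocs = defaultdict(set)
ioc_to_hosts = defaultdict(set)
for host, ioc in events:
    host_to_iocs[host].add(ioc)
    ioc_to_hosts[ioc].add(host)

def host_score(host):
    # Each matched IOC contributes 1 / log2(1 + hosts_matching_it):
    # an indicator seen across many hosts is treated as less specific.
    return sum(1 / math.log2(1 + len(ioc_to_hosts[i]))
               for i in host_to_iocs[host])

# Hosts in descending order of IOC-specific activity, for triage.
ranked = sorted(host_to_iocs, key=host_score, reverse=True)
print(ranked[0])
```

The host with a rare, dedicated indicator floats to the top of the triage queue, while hosts that only touched a widely matched indicator sink, which is the "encode analyst intuition" effect the abstract describes.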
Measuring the IQ of your Threat Intelligence Feeds (#tiqtest)
Alex Pinto
Follow along with the R Markdown file at http://rpubs.com/alexcpsec/tiq-test-Summer2014-2
Source code available at:
https://github.com/mlsecproject/tiq-test
https://github.com/mlsecproject/tiq-test-Summer2014
https://github.com/mlsecproject/combine
---------
Full Abstract:
Threat Intelligence feeds are now being touted as the saving grace for SIEM and log management deployments, and as a way to supercharge incident detection and even response practices. We have heard similar promises before as an industry, so it is only fair to try to investigate. Since the actual number of breaches and attacks worldwide is unknown, it is impossible to measure how good threat intelligence feeds really are, right? Enter a new scientific breakthrough developed over the last 300 years: statistics!
This presentation will consist of a data-driven analysis of a cross-section of threat intelligence feeds (both open-source and commercial) to measure their statistical bias, overlap, and representability of the unknown population of breaches worldwide. Are they a statistically good measure of the population of "bad stuff" happening out there? Is there even such a thing? How tuned to your specific threat surface are those feeds anyway? Regardless, can we actually make good use of them even if the threats they describe have no overlap with the actual incidents you have been seeing in your environment?
We will provide an open-source tool for attendees to extract, normalize and export data from threat intelligence feeds to use in their internal projects and systems. It will be pre-configured with current OSINT network feeds and easily extensible for private or commercial feeds. All the statistical code written and research data used (from the open-source feeds) will be made available in the spirit of reproducible research. Attendees will be able to use the tool to perform the same type of tests on their own data.
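The extract-and-normalize step amounts to turning each feed's native format into uniform indicator records. A minimal sketch, assuming a hypothetical "indicator,kind" CSV feed format and a made-up record schema (not the actual schema used by combine):

```python
# Sketch: normalize raw feed lines into uniform indicator records.
# The raw format and field names are illustrative assumptions.
import csv
import io
from datetime import date

RAW_FEED = """\
198.51.100.7,ip
evil.example.com,domain
203.0.113.9,ip
"""

def normalize(raw, source):
    """Turn raw feed lines into a list of uniform indicator records."""
    rows = csv.reader(io.StringIO(raw))
    return [
        {
            "indicator": indicator.strip().lower(),
            "type": kind.strip(),
            "source": source,          # which feed it came from
            "date": date.today().isoformat(),  # when it was collected
        }
        for indicator, kind in rows
    ]

records = normalize(RAW_FEED, "example-osint-feed")
print(len(records), records[0]["type"])  # → 3 ip
```

Once every feed is reduced to the same record shape, the overlap and aging tests can run across sources without caring about each feed's original format.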
Join Alex and Kyle on a journey through the actual real-world usability of threat intelligence to find out which mix of open source and private feeds are right for your organization.
From Threat Intelligence to Defense Cleverness: A Data Science Approach (#tid...
Alex Pinto
This session will center on a market-centric and technological exploration of commercial and open-source threat intelligence feeds, which are increasingly being offered as a way to improve the defense capabilities of organizations.
While not all Threat Intelligence can be represented as "indicator feeds", this space has enough market attention that it deserves a proper scientific, evidence-based investigation so that practitioners and decision makers can maximize the results they are able to get for the data they have available.
The presentation will consist of a data-driven analysis of a cross-section of threat intelligence feeds (both open-source and commercial) to measure their statistical bias, overlap, and representability of the unknown population of breaches worldwide. All the statistical code written and research data used (from the publicly available feeds) will be made available in the spirit of reproducible research. The analysis tool itself (called tiq-test) can be used by attendees to perform the same type of tests on their own data.
Some of the important questions and answers that emerge in this presentation include:
"Are Threat Intelligence Feeds a statistically good measure of the population of 'bad stuff' happening out there? Is there even such a thing?"
"How tuned to YOUR specific threat surface are those feeds?"
"Can we actually make good use of them even if the threats they describe have no overlap with the actual incidents you have been seeing in your environment? (hint: probably not)"
We will provide an open-source tool (called combine) for attendees to extract, normalize and export data from threat intelligence feeds to use in their internal projects and systems. It will be pre-configured with current publicly available network feeds and easily extensible for private or commercial feeds.
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...
Alex Pinto
We could all have predicted this with our magical Big Data analytics platforms, but it seems that Machine Learning is the new hotness in Information Security. A great number of startups with ‘cy’ and ‘threat’ in their names claim that their product will defend or detect more effectively than their neighbour's product "because math". And it seems easy to convince people without a PhD or two that the math just works.
Indeed, math is powerful, and large-scale machine learning is an important cornerstone of many of the systems we use today. However, not all algorithms and techniques are born equal. Machine Learning is a powerful toolbox, but not every tool can be applied to every problem, and that’s where the pitfalls lie.
This presentation will describe the different techniques available for data analysis and machine learning for information security, and discuss their strengths and caveats. The Ghost of Marketing Past will also show how similar the unfulfilled promises of deterministic and exploratory analysis were, and how to avoid making the same mistakes again.
Finally, the presentation will describe the techniques and feature sets developed by the presenter over the past year as part of his ongoing research project on the subject. In particular, it will present some interesting results obtained since the last presentation at DEF CON 21, along with some ideas that could improve the application of machine learning in information security, especially as a helper for security analysts in incident detection and response.
BSidesLV 2013 - Using Machine Learning to Support Information Security
Alex Pinto
Big Data, Data Science, Machine Learning and Analytics are a few of the new buzzwords that have invaded our industry of late. Again we are being sold a unicorn-laden, silver-bullet panacea by heavy-handed marketing folks, evoking an expected pushback from the most enlightened members of our community. However, as was the case before, there might just be enough technical meat in there to help out with our security challenges and the overwhelming odds we face every day. And if so, what do we as a community have to know about these technologies in order to be better professionals? Can we really use the data we have been collecting to help automate our security decision making? Is a robot going to steal my job?
If you are interested in what is behind this marketing buzz and are not scared of a little math, this talk offers insights into applying Machine Learning techniques to data any of us can easily access. It aims to bring home the point that if all of this technology can be used to show us “better” ads on social media and track our behavior online (and a bit more than that), it can also be used to defend our networks.
Video (at YouTube) - http://bit.ly/19TNSTF
Big Data Security Analytics, Data Science and Machine Learning are a few of the new buzzwords that have invaded our industry of late. Most of what we hear are promises of a unicorn-laden, silver-bullet panacea by heavy-handed marketing folks, evoking an expected pushback from the most enlightened members of our community.
This talk will help parse what we as a community need to know and understand about these concepts: what their technical details and actual capabilities are, where they fail, and how they can be exploited and fooled by an attacker.
The talk will also share results of the author's current ongoing research (the MLSec Project) on applying machine learning techniques to information security monitoring.