Frodo Baggins presents on software analytics for software engineering and security tasks. The presentation discusses how software, and the way it is built and used, is changing: data is now ubiquitous, and software is developed and released continuously. Software analytics aims to enable software practitioners to perform data exploration and analysis to obtain useful insights. Examples of software analytics techniques discussed include XIAO for scalable code clone analysis, and SAS for incident management of online services. The presentation then shifts to software analytics techniques for mobile app security, including WHYPER, which applies natural language processing to app descriptions to link permissions to functionality, and AppContext, which uses machine learning to classify malware.
SETTA'18 Keynote: Intelligent Software Engineering: Synergy between AI and So...Tao Xie
2018 Keynote Speaker, Symposium on Dependable Software Engineering - Theories, Tools and Applications (SETTA 2018). "Intelligent Software Engineering: Synergy between AI and Software Engineering" http://confesta2018.csp.escience.cn/dct/page/65581
MSRA 2018: Intelligent Software Engineering: Synergy between AI and Software ...Tao Xie
Invited Talk at the 2018 Computing in the 21st Century Conference & Asia Faculty Summit on MSRA’s 20th Anniversary https://www.microsoft.com/en-us/research/event/computing-in-the-21st-century-conference-asia-faculty-summit-on-msras-20th-anniversary/#!agenda
ACM Chicago March 2019 meeting: Software Engineering and AI - Prof. Tao Xie, ...ACM Chicago
Join us as Tao Xie, Professor and Willett Faculty Scholar in the Department of Computer Science at the University of Illinois at Urbana-Champaign and ACM Distinguished Speaker, talks about Intelligent Software Engineering: Synergy between AI and Software Engineering. This is a joint meeting hosted by Chicago Chapter ACM / Loyola University Computer Science Department.
Intelligent Software Engineering: Synergy between AI and Software Engineering...Tao Xie
2018 Distinguished Speaker, the UC Irvine Institute for Software Research (ISR) Distinguished Speaker Series 2018-2019. "Intelligent Software Engineering: Synergy between AI and Software Engineering" http://isr.uci.edu/content/isr-distinguished-speaker-series-2018-2019
Data has always been used in every company irrespective of its domain to improve the operational
efficiency and the products themselves. However, analyzing and extracting information from “Big Data”
is the next revolution in technology, since previously unknown nuggets of information are now made
visible. In fact, over 90% of the data available in the world has been generated in the last two years.
“Big Data” analytics has become the next hot topic for most companies - from financial institutions to
technology companies to service providers. Likewise in software engineering, data collected about the
development of software, the operation of the software in the field, and users' feedback on software
have been used before. However, collecting and analyzing this information across hundreds of thousands
or millions of software projects gives us the unique ability to reason about the ecosystem at large, and
software in general. At no time in history has there been easier access to extremely powerful
computational resources than there is today, thanks to the advances in cloud computing, both from the
technology and business perspectives. Therefore, it is easier today than ever before to analyze big data.
In this technical briefing, we will present the state-of-the-art with respect to the research carried out in the
area of big data analytics in software engineering research. We will present the research along three
dimensions:
1) What are the software engineering problems being solved? Examples of problems include: How
much source code is newly written and how much is reused from past projects? Can we
recommend best practices to developers by observing the development of software among
hundreds of thousands of software projects?
2) What are the datasets that are being used? Examples of such datasets include: all the mobile apps
in the Google Play store, all of the world's Open Source projects, and hundreds of gigabytes of
execution logs. Such large datasets provide us with a unique view into the SE field.
3) What are the tools and techniques available to analyze the large datasets? We intend to present
generic software solutions that have been applied to big datasets in other areas of research, and
the tools and techniques created by software engineering researchers.
In the end we will present the challenges inherently present in large datasets - volume, variety, velocity,
and veracity. Such challenges often complicate the analysis of the data and can invalidate the
interpretation of the results. We will conclude with the future opportunities that are present in big data
analytics for software engineering research.
In 2003, Dave et al. coined the term “opinion mining” to refer to “processing a set of search results for a given item, generating a list of product attributes (quality, features, etc.) and aggregating opinions about each of them (poor, mixed, good)”. Nine years later, in 2012, Brooks and Swigger applied sentiment analysis in the context of software engineering. Today another nine years have passed and it is time to look back: what have we achieved as a research community and where should we go next?
To answer this question we conducted a systematic literature review involving 185 papers. Based on the literature review we present 1) well-defined categories of opinion mining-related software development activities, 2) available opinion mining approaches, whether they are evaluated when adopted in other studies, and how their performance is compared, 3) available datasets for performance evaluation and tool customization, and 4) concerns or limitations SE researchers might need to take into account when applying/customizing these opinion mining techniques. The results of our study serve as references to choose suitable opinion mining tools for SE tasks, and provide critical insights for the further development of opinion mining techniques in the SE domain.
This work has been done together with Bin Lin, Gabriele Bavota and Michele Lanza from Università della Svizzera italiana, Switzerland, Nathan Cassee from Eindhoven University of Technology, The Netherlands and Nicole Novielli from University of Bari, Italy.
Research seminar slides at URJC June 6. Briefly: social analysis; more detailed: static analysis and co-evolution (joint w Landman, Vinju, Muske; Businge).
Video (at YouTube) - http://bit.ly/19TNSTF
Big Data Security Analytics, Data Science and Machine Learning are a few of the new buzzwords that have invaded our industry of late. Most of what we hear are promises of a unicorn-laden, silver-bullet panacea by heavy-handed marketing folks, evoking an expected pushback from the most enlightened members of our community.
This talk will help parse what we as a community need to know about these concepts: their technical details and actual capabilities, where they fail, and how they can be exploited and fooled by an attacker.
The talk will also share results of the author's ongoing research (on MLSec Project) into applying machine learning techniques to information security monitoring.
Sharing is Caring: Understanding and Measuring Threat Intelligence Sharing Ef...Alex Pinto
For the last 18 months, MLSec Project and Niddel collaborated to collect threat intelligence indicator data from multiple sources in order to make sense of the ecosystem and try to find a measure of efficiency or quality in these feeds. This initiative culminated in the creation of Combine and TIQ-test, two of the open source projects from MLSec Project. These projects have been improved upon for the last year, and are able to gather and compare data from multiple Threat Intelligence sources on the Internet.
Alex Sieira and his team have gathered aggregated usage information from intelligence sharing communities in order to determine whether the added interest and "push" towards sharing is really being followed by companies, and whether its adoption is putting us on the right track to close these gaps. He proposes a new set of metrics in the same vein as TIQ-test to help you understand what a "healthy" threat intelligence sharing community looks like.
To better illustrate the points and metrics, Alex will be conducting part of this analysis using usage data from some high-profile threat intelligence platforms and sharing communities that have been kind enough to contribute usage data for this research.
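As a rough illustration of what comparing indicator feeds can mean in practice, the sketch below measures how much two feeds overlap using Jaccard similarity. This is an assumption made for illustration only, not the method used by Combine or TIQ-test; the function name and the sample indicators (documentation-range IP addresses) are hypothetical.

```python
def jaccard_overlap(feed_a, feed_b):
    """Share of distinct indicators that appear in both feeds (0.0 to 1.0)."""
    a, b = set(feed_a), set(feed_b)
    union = a | b
    # Two empty feeds have nothing in common to measure, so report 0.0.
    return len(a & b) / len(union) if union else 0.0


# Hypothetical feeds: lists of IP indicators from two sources.
feed_one = ["192.0.2.1", "192.0.2.2", "198.51.100.7"]
feed_two = ["192.0.2.2", "203.0.113.9"]

overlap = jaccard_overlap(feed_one, feed_two)
```

A low overlap suggests the two sources contribute mostly distinct indicators, while a high overlap suggests one of them adds little on top of the other.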
Secure Because Math: A Deep-Dive on Machine Learning-Based Monitoring (#Secur...Alex Pinto
We could all have predicted this with our magical Big Data analytics platforms, but it seems that Machine Learning is the new hotness in Information Security. A great number of startups with ‘cy’ and ‘threat’ in their names claim that their product will defend or detect more effectively than their neighbour's product "because math". And it should be easy to fool people without a PhD or two into believing that math just works.
Indeed, math is powerful and large scale machine learning is an important cornerstone of many of the systems that we use today. However, not all algorithms and techniques are born equal. Machine Learning is a most powerful tool box, but not every tool can be applied to every problem and that’s where the pitfalls lie.
This presentation will describe the different techniques available for data analysis and machine learning for information security, and discuss their strengths and caveats. The Ghost of Marketing Past will also show how similar the unfulfilled promises of deterministic and exploratory analysis were, and how to avoid making the same mistakes again.
Finally, the presentation will describe the techniques and feature sets that were developed by the presenter over the past year as a part of his ongoing research project on the subject. In particular, it will present some interesting results obtained since the last presentation at DefCon 21, and some ideas that could improve the application of machine learning in information security, especially as a helper for security analysts in incident detection and response.
Biting into the Jawbreaker: Pushing the Boundaries of Threat Hunting AutomationAlex Pinto
Threat Hunting is commonly defined as a series of investigative actions that should be performed by human teams in order to cover detection gaps where automated tools fail. However, as those techniques become more and more popular and standardized, wouldn't it be the case that we are able to automate a large part of those common threat hunting activities, creating what is basically an oxymoron by definition?
In this session, we will demonstrate how some IOC-based threat hunting techniques can be automated or constructed to augment human activity by encoding analyst intuition into repeatable data extraction and processing techniques. Those techniques can be used to simplify the triage stage and get actionable information from potential threats with minimal human interaction. The more math-oriented parts will cover descriptive statistics, graph theory, and non-linear scoring techniques on the relationships of known network-based IOCs to an organization's log data.
Our goal here is to demonstrate that by elevating the quality of data available to our automation processes we can effectively simulate "analyst intuition" on some of the more time-consuming aspects of network threat hunting. IR teams can then theoretically be more productive as early as the initial triage stages, with data products that provide a “sixth sense” on which events are worthy of additional analyst time.
BSidesLV 2013 - Using Machine Learning to Support Information SecurityAlex Pinto
Big Data, Data Science, Machine Learning and Analytics are a few of the new buzzwords that have invaded our industry of late. Again we are being sold a unicorn-laden, silver-bullet panacea by heavy-handed marketing folks, evoking an expected pushback from the most enlightened members of our community. However, as was the case before, there might just be enough technical meat in there to help out with our security challenges and the overwhelming odds we face every day. And if so, what do we as a community have to know about these technologies in order to be better professionals? Can we really use the data we have been collecting to help automate our security decision making? Is a robot going to steal my job?
If you are interested in what is behind this marketing buzz and are not scared of a little math, this talk offers some insights into applying Machine Learning techniques to data any of us have easy access to, and tries to bring home the point that if all of this technology can be used to show us “better” ads in social media and track our behavior online (and a bit more than that), it can also be used to defend our networks.
Keeping security top of mind while creating standards for engineering teams following the DevOps culture. This talk was designed to show how easy it is to automate security scanning and to be the developer advocate by showing the quality of development work. We will cover some high-level topics of DevSecOps and demo some examples that DevOps teams can implement for free.
How do organizations build secure applications, given today's rapidly moving and evolving DevOps practices? Join Black Duck and our customer experts on best practices for application security in DevOps.
You’ll learn:
-New security challenges facing today’s popular DevOps and Continuous Integration (CI) practices, including managing custom code and open source risks with containers and traditional environments
-Best practices for designing and incorporating an automated approach to application security into your existing development environment
-Future development and application security challenges organizations will face and what they can do to prepare
Programming languages and techniques for today’s embedded and IoT worldRogue Wave Software
This presentation looks at the problem of selecting the best programming language and tools to ensure IoT software is secure, robust, and safe. By taking a look at industry best practices and decades of knowledge from other industries (such as automotive and aerospace), you will learn the criteria necessary to choose the right language, how to overcome gaps in developers’ skills, and techniques to ensure your team delivers bulletproof IoT applications.
Finding Zero-Days Before The Attackers: A Fortune 500 Red Team Case StudyDevOps.com
Graph databases offer security teams a new and more efficient way to find zero-day vulnerabilities. As software development increases its reliance on open source libraries and release cycles get faster and faster, application security is becoming more and more difficult. AppSec still has the same charter -- to find vulnerabilities in dev, before they reach prod -- but now with more complexity and less time. Graphing source code, and traversing it to identify technical and business logic vulnerabilities, gives AppSec teams a much-needed leg up to identify zero days and stay ahead of attackers.
As numerous famous examples demonstrate, open source libraries are a common attack vector. Hence, AppSec teams must secure 3rd party dependencies just as vigorously as custom code. While much of the emphasis for securing open source libraries (OSS) has been on identifying and eliminating known CVEs, because OSS is widely used, zero-day vulnerabilities are often more likely to be found in popular OSS than custom code.
This webinar will cover the following:
An introduction to the emerging graph landscape and why it matters for AppSec
How a Fortune 500 company is using graphs to find zero days
Technical demo of finding technical and business logic vulnerabilities in source code
The quality of the code shipped in a product translates not only into its functional quality but also into its non-functional aspects, such as security. Many code issues can be addressed well before they reach SCM.
Building Your Application Security Data Hub - OWASP AppSecUSADenim Group
One of the reasons application security is so challenging to address is that it spans multiple teams within an organization. Development teams build software, security testing teams find vulnerabilities, security operations staff manage applications in production and IT audit organizations make sure that the resulting software meets compliance and governance requirements. In addition, each team has a different toolbox they use to meet their goals, ranging from scanning tools, defect trackers and Integrated Development Environments (IDEs) to WAFs and GRC systems. Unfortunately, in most organizations the interactions between these teams are often strained and the flow of data between these disparate tools and systems is non-existent or tediously implemented manually.
In today’s presentation, we will demonstrate how leading organizations are breaking down these barriers between teams and better integrating their disparate tools to enable the flow of application security data between silos to accelerate and simplify their remediation efforts. At the same time, we will show how to collect the proper data to measure the performance and illustrate the improvement of the software security program. The challenges that need to be overcome to enable teams and tools to work seamlessly with one another will be enumerated individually. Team and tool interaction patterns will also be outlined that reduce the friction that will arise while addressing application security risks. Using open source products such as OWASP ZAP, ThreadFix, Bugzilla and Eclipse, a significant amount of time will also be spent demonstrating the kinds of interactions that need to be enabled between tools. This will provide attendees with practical examples on how to replicate a powerful, integrated Application Security program within their own organizations. In addition, how to gather program-wide metrics and regularly calculate measurements such as mean-time-to-fix will also be demonstrated to enable attendees to monitor and ensure the continuing health and performance of their Application Security program.
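To make the mean-time-to-fix measurement concrete, here is a minimal sketch of how it could be calculated from vulnerability records. The record layout, identifiers, and dates are hypothetical and are not taken from ThreadFix or any other tool named above; the sketch also assumes that findings still open are simply left out of the average.

```python
from datetime import datetime
from statistics import mean

# Hypothetical records: (identifier, date found, date fixed or None if still open).
FINDINGS = [
    ("VULN-1", datetime(2024, 1, 1), datetime(2024, 1, 11)),
    ("VULN-2", datetime(2024, 1, 5), datetime(2024, 1, 25)),
    ("VULN-3", datetime(2024, 2, 1), None),
]


def mean_time_to_fix_days(findings):
    """Average days between discovery and fix, ignoring findings that are still open."""
    durations = [
        (fixed - found).total_seconds() / 86400  # seconds per day
        for _, found, fixed in findings
        if fixed is not None
    ]
    return mean(durations) if durations else None


mttf = mean_time_to_fix_days(FINDINGS)
```

Recomputing this value on a regular schedule, for example per release or per month, gives a trend line rather than a single number, which is what makes it useful for monitoring a program over time.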
Using Static Binary Analysis To Find Vulnerabilities And Backdoors in FirmwareLastline, Inc.
Over the last few years, as the world has moved closer to realizing the idea of the Internet of Things, an increasing number of the analog things with which we used to interact every day have been replaced with connected devices. The increasingly-complex systems that drive these devices have one thing in common – they must all communicate to carry out their intended functionality. Such communication is handled by firmware embedded in the device. And firmware, like any piece of software, is susceptible to a wide range of errors and vulnerabilities.
Find Out What's New With WhiteSource May 2018- A WhiteSource WebinarWhiteSource
In our latest webinar, we learned about our latest product updates here at WhiteSource. We unveiled our new, revolutionary technology as well as highlighting other cool releases and enhancements.
Building an Open Source AppSec Pipeline - 2015 Texas Linux FestMatt Tesauro
Take the ideas of DevOps and the notion of a delivery pipeline and combine them for an AppSec Pipeline. This talk covers the open source components used to create an AppSec Pipeline and the benefits we received from its implementation.
Leverage DevOps & Agile Development to Transform Your Application Testing Pro...Deborah Schalm
Discover how Sona Srinivasan, Senior Architect of Cisco IT’s Global Architecture and Technology Services group, helps transform an IT DevOps strategy to a Security DevOps strategy, with IBM Security's assistance. Cisco is presently implementing continuous security and agile methods throughout the software development lifecycle (SDLC), and specific examples of current initiatives will be reviewed in this session.
Leverage DevOps & Agile Development to Transform Your Application Testing Pro...DevOps.com
Discover how Sona Srinivasan, Senior Architect of Cisco IT’s Global Architecture and Technology Services group, helps transform an IT DevOps strategy to a Security DevOps strategy, with IBM Security's assistance. Cisco is presently implementing continuous security and agile methods throughout the software development lifecycle (SDLC), and specific examples of current initiatives will be reviewed in this session.
Comment Meetic opère son changement technologique sur son SI. De la création d’API jusqu’à la mise en place d’une démarche qualité tout en passant par l'adoption du Behavior Driven Development, vous saurez tout sur notre parcours, sur les problématiques que nous avons rencontrées, les solutions que nous avons mises en place ainsi que sur le chemin qu'il nous reste à parcourir afin d’appréhender l’avenir avec la plus grande des sérénités. Les thèmes abordés seront : - Comment aborder des changements majeurs sur notre SI sans impacter notre performance globale ? - Migration d'un code monolithique vers des API REST en Sf2, - Exemple de microservices : AB Test, GEO, Permission, Configuration. - Déploiement avec Composer, Satis, Sf2 et Capistrano sur des centaines de serveurs, - Démarche Qualité (Back, Front, App) : nos métriques, outils du marché, outils interne, gestion aux changements. - Méthodologie : Agilité, DevOps, TDD, BDD. - Next steps : Kafka, Continuous Delivery.
Software Security Assurance for DevOps - Hewlett Packard Enterprise + Black DuckBlack Duck by Synopsys
Presented August 11, 2016 by Michael Right, Senior Product Manager, HPE Security Fortify; Mike Pittenger, VP of Security Strategy, Black Duck.
Open source software is an integral part of today’s technology ecosystem, powering everything from enterprise and mobile applications to cloud computing, containers and the Internet of Things.
While open source offers attractive economic and productivity benefits for application development, it also presents organizations with significant security challenges. Every year, thousands of new open source security vulnerabilities – such as Heartbleed, Venom and Shellshock – are reported. Unfortunately, many organizations lack visibility into and control of their open source. Addressing this challenge is vital for ensuring security in applications and containers.
Whether you’re building software for customers or for internal use, the majority of the code is likely open source and securing it is no easy task. In this session, you’ll learn about:
• The evolving DevOps and software security assurance lifecycle in the age of open source
• The software security considerations CISOs, security, and development teams must address when using open source
• An automated approach to identifying vulnerabilities and managing software security assurance for custom and open source code.
Software Analytics: Data Analytics for Software Engineering and Security
1. Software Analytics: Data Analytics
for Software Engineering and Security
(Speaker Info)
Frodo Baggins
Ring Bearer
FOTR, LLC
Tao Xie
Department of Computer Science
University of Illinois at Urbana-Champaign, USA
taoxie@illinois.edu
In Collaboration with Microsoft Research and NC State University
6. How people use software is changing…
Individual → Social
Isolated → Collaborative
Not much data/content generation → Huge amount of data/artifacts generated anywhere, anytime
8. How software is built & operated is changing…
Code centric → Data pervasive
Long product cycle → Continuous release
Experience & gut-feeling → Informed decision making
In-lab testing → Debugging in the large
Centralized development → Distributed development
… …
10. Software Analytics
Software analytics is to enable software
practitioners to perform data exploration and
analysis in order to obtain insightful and
actionable information for data-driven tasks
around software and services.
Dongmei Zhang, Yingnong Dang, Jian-Guang Lou, Shi Han, Haidong Zhang, and Tao Xie. Software
Analytics as a Learning Case in Practice: Approaches and Experiences. In MALETS 2011
http://research.microsoft.com/en-us/groups/sa/malets11-analytics.pdf
http://research.microsoft.com/en-us/groups/sa/
http://research.microsoft.com/en-us/news/features/softwareanalytics-052013.aspx
12. Data sources
Runtime traces
Program logs
System events
Perf counters
…
Usage log
User surveys
Online forum posts
Blog & Twitter
…
Source code
Bug history
Check-in history
Test cases
Eye tracking
MRI/EMG
…
15. Target audience – software practitioners
Developer
Tester
Program Manager
Usability engineer
Designer
Support engineer
Management personnel
Operation engineer
16. Output – insightful information
• Conveys meaningful and useful understanding or
knowledge towards completing the target task
• Not easily attainable via directly investigating raw data
without aid of analytics technologies
• Example
– It is easy to count the number of re-opened bugs, but how to
find out the primary reasons for these re-opened bugs?
17. Output – actionable information
• “So what” -- enables software practitioners to come up
with concrete solutions towards completing the target
task
• Example
– Why were bugs re-opened?
• A list of bug groups, each sharing the same reason for re-opening
18. Research topics & technology pillars
Vertical (research topics):
• Software Users
• Software Development Process
• Software System
Horizontal (technology pillars):
• Information Visualization
• Data Analysis Algorithms
• Large-scale Computing
19. Outline
• Overview of Software Analytics
• Software Engineering Tasks
– XIAO: Scalable code clone analysis
– SAS: Incident management of online services
• Mobile App Security Tasks
– WHYPER: NLP on app descriptions
– AppContext: Machine learning to classify malware
21. XIAO: Code Clone Analysis
• Motivation
– Copy-and-paste is a common developer behavior
– A real tool widely adopted internally and externally
• XIAO enables code clone analysis in the following way
– High tunability
– High scalability
– High compatibility
– High explorability
22. High tunability – what you tune is what you get
• Intuitive similarity metric: effective control of the
degree of syntactical differences between two code
snippets
for (i = 0; i < n; i ++) {
a ++;
b ++;
c = foo(a, b);
d = bar(a, b, c);
e = a + c; }
for (i = 0; i < n; i ++) {
c = foo(a, b);
a ++;
b ++;
d = bar(a, b, c);
e = a + d;
e ++; }
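The idea of a tunable similarity threshold can be sketched in a few lines. This is only an illustration, assuming a statement-level comparison with Python's standard difflib; it is not XIAO's actual metric, and the names statements, similarity, is_clone and the thresholds are hypothetical.

```python
from difflib import SequenceMatcher


def statements(code: str) -> list[str]:
    """Split a snippet into statements (one per line), ignoring whitespace."""
    return ["".join(line.split()) for line in code.strip().splitlines() if line.strip()]


def similarity(a: str, b: str) -> float:
    """Share of statements that line up between two snippets, in [0, 1]."""
    matcher = SequenceMatcher(None, statements(a), statements(b), autojunk=False)
    return matcher.ratio()


def is_clone(a: str, b: str, threshold: float) -> bool:
    """Report a clone pair only if the similarity reaches the chosen threshold."""
    return similarity(a, b) >= threshold


# The two loops shown on the slide above.
SNIPPET_1 = """
for (i = 0; i < n; i ++) {
a ++;
b ++;
c = foo(a, b);
d = bar(a, b, c);
e = a + c; }
"""

SNIPPET_2 = """
for (i = 0; i < n; i ++) {
c = foo(a, b);
a ++;
b ++;
d = bar(a, b, c);
e = a + d;
e ++; }
"""

if __name__ == "__main__":
    print(round(similarity(SNIPPET_1, SNIPPET_2), 3))
    print(is_clone(SNIPPET_1, SNIPPET_2, threshold=0.6))  # loose setting
    print(is_clone(SNIPPET_1, SNIPPET_2, threshold=0.8))  # strict setting
```

Lowering the threshold admits pairs with reordered or modified statements, while raising it keeps only near-identical copies, which is the sense of "what you tune is what you get".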
23. High explorability
1. Clone navigation based on source tree hierarchy
2. Pivoting of folder level statistics
3. Folder level statistics
4. Clone function list in selected folder
5. Clone function filters
6. Sorting by bug or refactoring potential
7. Tagging
1. Block correspondence
2. Block types
3. Block navigation
4. Copying
5. Bug filing
6. Tagging
24. Scenarios & Solutions
Quality gates at milestones
• Architecture refactoring
• Code clone clean up
• Bug fixing
Post-release maintenance
• Security bug investigation
• Bug investigation for sustained engineering
Development and testing
• Checking for similar issues before check-in
• Reference info for code review
• Supporting tool for bug triage
Online code clone search
Offline code clone analysis
26. More secure Microsoft products
Code Clone Search service integrated into
workflow of Microsoft Security Response Center
Over 590 million lines of code indexed across
multiple products
Real security issues proactively identified and
addressed
27. Example – MS Security Bulletin MS12-034
Combined Security Update for Microsoft Office, Windows, .NET Framework, and
Silverlight, published: Tuesday, May 08, 2012
Three publicly disclosed vulnerabilities and seven privately reported ones were involved. Specifically,
one was exploited by the Duqu malware to execute arbitrary code when a user opened
a malicious Office document
Insufficient bounds check within the font parsing subsystem of win32k.sys
Cloned copies in gdiplus.dll, ogl.dll (Office), Silverlight, Windows Journal viewer
Microsoft Technet Blog about this bulletin
However, we wanted to be sure to address the vulnerable code wherever it appeared
across the Microsoft code base. To that end, we have been working with Microsoft
Research to develop a “Cloned Code Detection” system that we can run for every
MSRC case to find any instance of the vulnerable code in any shipping product. This
system is the one that found several of the copies of CVE-2011-3402 that we are
now addressing with MS12-034.
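Searching a code base for copies of a known vulnerable fragment can be sketched with an inverted index over short windows of statements. This is a simplified assumption about how such a search could work, not the system described in the bulletin; the file names, the code in CORPUS, and the WINDOW size are made up for illustration.

```python
from collections import defaultdict

WINDOW = 3  # consecutive statements per fingerprint (assumed value)


def normalize(code):
    """One whitespace-free string per non-empty line."""
    return ["".join(line.split()) for line in code.splitlines() if line.strip()]


def fingerprints(code):
    """All windows of WINDOW consecutive normalized statements."""
    stmts = normalize(code)
    return {tuple(stmts[i:i + WINDOW]) for i in range(len(stmts) - WINDOW + 1)}


def build_index(corpus):
    """Map each statement window to the set of files that contain it."""
    index = defaultdict(set)
    for path, code in corpus.items():
        for fp in fingerprints(code):
            index[fp].add(path)
    return index


def search(index, query):
    """Files sharing at least one window with the query, with hit counts."""
    hits = defaultdict(int)
    for fp in fingerprints(query):
        for path in index.get(fp, ()):
            hits[path] += 1
    return dict(hits)


# Hypothetical corpus: two components share a copied fragment.
CORPUS = {
    "kernel/font.c": "len = read_len(buf);\ncopy(dst, buf, len);\nparse(dst);\nrender(dst);",
    "office/font_copy.c": "init();\nlen = read_len(buf);\ncopy(dst,buf,len);\nparse(dst);\ndone();",
    "viewer/ui.c": "draw();\nrefresh();\nwait();",
}
VULNERABLE = "len = read_len(buf);\ncopy(dst, buf, len);\nparse(dst);"

if __name__ == "__main__":
    index = build_index(CORPUS)
    print(sorted(search(index, VULNERABLE)))
```

Building the index once and querying it per case is what makes it practical to run the search for every reported vulnerability.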
29. Motivation
Incident Management (IcM) is a critical task to
assure service quality
• Online services are increasingly popular & important
• High service quality is the key
30. Incident Management: Workflow
1. Detect a service issue
2. Alert On-Call Engineers (OCEs)
3. Investigate the problem
4. Restore the service
5. Fix root cause via postmortem analysis
33. SAS: Incident management of online services
SAS was developed and deployed to effectively reduce MTTR
(Mean Time To Restore) by automatically analyzing
monitoring data
Design Principles of SAS
Automating Analysis
Handling Heterogeneity
Accumulating Knowledge
Supporting human-in-the-loop (HITL)
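For reference, MTTR is the average time from detecting an incident to restoring the service. The sketch below computes it from (detected, restored) timestamp pairs; the incident records are invented for illustration and are not data from SAS or Service X.

```python
from datetime import datetime


def mean_time_to_restore(incidents):
    """Average minutes between detection and restoration of the service."""
    durations = [
        (restored - detected).total_seconds() / 60
        for detected, restored in incidents
    ]
    return sum(durations) / len(durations)


# Hypothetical incidents as (detected, restored) pairs.
INCIDENTS = [
    (datetime(2011, 6, 1, 10, 0), datetime(2011, 6, 1, 10, 30)),   # 30 minutes
    (datetime(2011, 6, 2, 22, 15), datetime(2011, 6, 2, 23, 45)),  # 90 minutes
    (datetime(2011, 6, 5, 3, 0), datetime(2011, 6, 5, 4, 0)),      # 60 minutes
]

if __name__ == "__main__":
    print(mean_time_to_restore(INCIDENTS))  # average in minutes
```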
35. Industry Impact of SAS
Deployment
• SAS deployed to
worldwide datacenters for
Service X (serving
hundreds of millions of
users) since June 2011
• OCEs now heavily depend
on SAS
Usage
• SAS helped successfully
diagnose ~76% of the
service incidents assisted
with SAS
37. “Conceptual” Model
APP DEVELOPERS
APP USERS
App
Functional
Requirements
App Security
Requirements
User
Functional
Requirements
User Security
Requirements
informal: app description, etc. permission list, etc.
App Code
42. WHYPER: Text Analytics for Mobile Security
• Focus on permissions and app descriptions
– Permissions (protecting user-understandable resources) should be discussed
• What do users expect (w.r.t. app functionalities)?
– GPS Tracker: record and send location
– Phone-Call Recorder: record audio during phone call
App Description Sentence ↔ Permission (linkage)
Pandita et al. WHYPER: Towards Automating Risk Assessment of Mobile Applications. USENIX Security 2013
http://web.engr.illinois.edu/~taoxie/publications/usenixsec13-whyper.pdf
43. WHYPER Overview
Application Market
WHYPER
DEVELOPERS
USERS
• Enhance user experience while installing apps
• Enforce functionality disclosure on developers
• Complement program analysis to ensure justifications
44. Natural Language Processing on App Description
• “Also you can share the yoga exercise to your friends via Email and SMS.”
– Implies use of the contact permission
– Such sentences are permission sentences
• Confounding effects:
– Certain keywords such as “contact” have a confounding meaning
– E.g., “... displays user contacts, ...” vs “... contact me at abc@xyz.com”.
• Semantic inference:
– Sentences describe a sensitive action w/o referring to keywords
– E.g., “share yoga exercises with your friends via Email and SMS”
NLP + Semantic Graphs/Ontologies Derived from Android API Documents
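The two problems above can be reproduced with a naive keyword matcher. The sketch below is a baseline for illustration only, not WHYPER itself; the keyword list and the function name keyword_match are assumptions, and the expected labels follow the examples on this slide.

```python
import re

# Assumed keyword list for the contacts permission.
KEYWORDS = {"READ_CONTACTS": ["contact", "contacts"]}


def keyword_match(sentence, permission):
    """Flag a sentence if it contains any keyword tied to the permission."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return any(keyword in words for keyword in KEYWORDS[permission])


# Sentence -> whether it really describes a need for the contacts permission.
EXAMPLES = {
    "displays user contacts": True,
    "contact me at abc@xyz.com": False,  # confounding keyword
    "share yoga exercises with your friends via Email and SMS": True,  # needs inference
}

if __name__ == "__main__":
    for sentence, expected in EXAMPLES.items():
        predicted = keyword_match(sentence, "READ_CONTACTS")
        print(f"{sentence!r}: predicted={predicted}, expected={expected}")
```

The second sentence is a false positive and the third a false negative for the keyword baseline, which is why the approach on this slide combines NLP with semantic graphs derived from API documents.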
45. Challenges
• Synonym analysis
– Example non-permission sentence: “You can now turn recordings into ringtones.”
– The functionality lets users create ringtones from previously recorded sounds but does NOT require permission to record audio
– False positive due to using the synonym pair (turn, start)
• Limitations of semantic graphs
– Example permission sentence: “blow into the mic to extinguish the flame like a real candle”
– False negative due to failing to associate “blow into” with “record”
• Automatic mining from user comments and forums
48. Not All Malware Developers Are “Dumb” or “Lazy”
Benign? Malicious?
49. Our Insight
Different goals of benign apps vs. malware.
• Benign apps
– Meet requirements from users (as delivering utility)
• Malware
– Trigger malicious behaviors frequently (as maximizing profits)
– Evade detection (as prolonging lifetime)
50. Differentiating characteristics
Mobile malware (vs. benign apps)
– Frequently enough to meet the need: frequent
occurrences of imperceptible system events;
• E.g., many malware families trigger malicious behaviors via
background events.
– Not too frequently for users to notice anomaly:
indicative states of external environments
• E.g., Send premium SMS every 12 hours
Balance!!!
51. Example of malicious app
Call-graph nodes: ActionReceiver.OnReceive(), MainService.OnCreate(), MainService.b(), DummyMainMethod(), SplashActivity.OnCreate(), SendTextActivity$4.onClick(), SendTextActivity$5.run(), ContextWrapper.StartService(), SmsManager.sendTextMessage()
Code fragment (time check in ActionReceiver.OnReceive()):
Date date = new Date();
if (date.getHours() >= 23 || date.getHours() < 5) {
    ContextWrapper.StartService(MainService);
    …
Code fragment (interval check before sending):
long last = db.query("LastConnectTime");
long current = System.currentTimeMillis();
if (current - last > 43200000) {
    SmsManager.sendTextMessage();
    db.save("LastConnectTime", current);
    …
The app will send an SMS when
• user clicks a button in the app
Highlighted path: SendTextActivity$4.onClick → SmsManager.sendTextMessage
52. Example of malicious app (same call graph as slide 51)
The app will send an SMS when
• phone signal strength changes (frequent)
• current time is within 11 PM to 5 AM (not too frequent; user not around)
Highlighted: Android.intent.action.SIG_STR and
if (date.getHours() >= 23 || date.getHours() < 5) {
53. Example of malicious app (same call graph as slide 51)
The app will send an SMS when
• user enters the app (frequent)
• (current time – time when last message sent) > 12 hours (not too frequent)
Highlighted:
if (current - last > 43200000) {
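The three triggers from slides 51 to 53 can be summarized in one small model. This is a hedged sketch of the conditions as the slides describe them, not decompiled app code; the event names and function names are hypothetical. It also spells out that 43200000 ms equals 12 hours.

```python
TWELVE_HOURS_MS = 12 * 60 * 60 * 1000  # 12 h * 60 min * 60 s * 1000 ms


def in_night_window(hour):
    """True between 11 PM and 5 AM, when the user is unlikely to be around."""
    return hour >= 23 or hour < 5


def interval_elapsed(current_ms, last_ms):
    """True if more than 12 hours have passed since the last message."""
    return current_ms - last_ms > TWELVE_HOURS_MS


def sends_sms(event, hour, current_ms, last_ms):
    """Whether the modeled app sends an SMS for a given event and environment."""
    if event == "button_clicked":           # user-initiated, no further condition
        return True
    if event == "signal_strength_changed":  # frequent background event
        return in_night_window(hour)
    if event == "app_entered":              # frequent, throttled by the interval
        return interval_elapsed(current_ms, last_ms)
    return False


if __name__ == "__main__":
    print(TWELVE_HOURS_MS)
    print(sends_sms("signal_strength_changed", hour=2, current_ms=0, last_ms=0))
    print(sends_sms("app_entered", hour=12, current_ms=50_000_000, last_ms=0))
```

The frequent event supplies many opportunities to act, and the environment check keeps the behavior rare enough to go unnoticed.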
54. AppContext
• Capture differentiating characteristics with
contexts of security-sensitive behavior.
• Leverage contexts in machine learning
(classification) to differentiate malware and
benign apps.
Yang et al. AppContext: Differentiating Malicious and Benign Mobile App Behavior Under Contexts. ICSE 2015.
http://taoxie.cs.illinois.edu/publications/icse15-appcontext.pdf
55. Techniques
• Abstraction for expressing context of security-
sensitive behaviors, e.g., a permission protected
API method.
– To precisely capture the differentiating
characteristics
• Inter-component analysis for extracting contexts
– To identify entry point for activation events
– To connect control flows for context factors
56. Context of security-sensitive behavior
• Activation events:
• E.g., signal strength changes
• Context factors:
• Environmental attributes that affect whether the
security-sensitive behavior is invoked
• E.g., current system time
58. Context-based
Security-Behavior Classification
Context1:
(Event: Signal strength changes),
(Factor: Calendar)
Context2:
(Event: Entering app),
(Factor: Database, SystemTime)
Context3:
(Event: Clicking a button)
Transforming → Labelling → Training → Classifying
Step 1. Transform the contexts of each app’s security behavior into features
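One simple way to turn such contexts into features is a binary vector with one position per activation event and per context factor. The encoding below is an assumption made for illustration; the slides do not specify the exact feature layout, and the helper names are hypothetical.

```python
def context_tokens(context):
    """Flatten an (event, factors) pair into prefixed tokens."""
    event, factors = context
    return [f"event:{event}"] + [f"factor:{factor}" for factor in factors]


def build_vocabulary(contexts):
    """Assign a stable index to every token seen in the contexts."""
    tokens = sorted({t for context in contexts for t in context_tokens(context)})
    return {token: index for index, token in enumerate(tokens)}


def to_vector(context, vocabulary):
    """Binary feature vector: 1 where the context contains the token."""
    vector = [0] * len(vocabulary)
    for token in context_tokens(context):
        if token in vocabulary:
            vector[vocabulary[token]] = 1
    return vector


# The three contexts listed on this slide.
CONTEXTS = [
    ("signal_strength_changes", ["Calendar"]),
    ("entering_app", ["Database", "SystemTime"]),
    ("clicking_button", []),
]

if __name__ == "__main__":
    vocabulary = build_vocabulary(CONTEXTS)
    for context in CONTEXTS:
        print(context[0], to_vector(context, vocabulary))
```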
59. Context-based
Security-Behavior Classification (Cont.)
Transforming Labelling Training Classifying
Systematically label security-sensitive method calls as
malicious based on the existing malware signatures
Support Vector Machine (SVM)
• SVM is resilient to over-fitting
• SVM can handle high-dimensional data such as our
context factor data (dimension reduction may be
another option).
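To make the training and classifying steps concrete, here is a minimal linear SVM trained by sub-gradient descent on the regularized hinge loss, written in plain Python. It is a stand-in sketch: the slides do not say which SVM implementation or kernel was used, the toy vectors encode the three contexts from slide 58 under an assumed feature layout, and the labels (+1 malicious, -1 benign) are hypothetical.

```python
def train_linear_svm(samples, labels, epochs=50, lr=0.1, lam=0.01):
    """Sub-gradient descent on the L2-regularized hinge loss (no bias term)."""
    weights = [0.0] * len(samples[0])
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * sum(w * xi for w, xi in zip(weights, x))
            # Regularization step: shrink all weights slightly.
            weights = [w * (1 - lr * lam) for w in weights]
            # Hinge step: move toward the sample if it violates the margin.
            if margin < 1:
                weights = [w + lr * y * xi for w, xi in zip(weights, x)]
    return weights


def predict(weights, x):
    """+1 (malicious) if the score is positive, otherwise -1 (benign)."""
    score = sum(w * xi for w, xi in zip(weights, x))
    return 1 if score > 0 else -1


# Assumed feature order: [click, enter_app, signal_strength, Calendar, Database, SystemTime]
X = [
    [0, 0, 1, 1, 0, 0],  # signal strength changes + Calendar
    [0, 1, 0, 0, 1, 1],  # entering app + Database, SystemTime
    [1, 0, 0, 0, 0, 0],  # clicking a button, no context factor
]
Y = [1, 1, -1]  # hypothetical labels for illustration only

if __name__ == "__main__":
    weights = train_linear_svm(X, Y)
    print([predict(weights, x) for x in X])
```

A production setup would more likely rely on an established SVM library; the point here is only how labeled context vectors feed a margin-based classifier.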
60. Evaluation
Subjects: 846 Android apps
• 633 benign apps: randomly selected from popular
apps on Google Play.
• 202 malicious apps: collected from three
different malware datasets (Genome, VirusShare,
Contagio).
• 11 open source apps: randomly selected from F-
Droid.
61. Research Questions
• RQ1: How effective is AppContext in identifying
malware?
• RQ2: How do activation events and context factors
in our context definition contribute to the
effectiveness of malware identification?
• RQ3: How accurate is our static analysis in inferring
contexts?
65. Limitations
• False negatives
– Malicious behaviors triggered by UI events and
without context factors.
• UI events have less indication of the maliciousness of a
security-sensitive method call
• False positives
– Reflective method calls, dynamic code loading in
benign apps.
– Uncommon security-sensitive method calls used in
benign apps.