DLP is a data security technology that detects and prevents data breach incidents by monitoring data in-use, in-motion, and at-rest. It has been widely applied to regulatory compliance, data privacy, and intellectual property protection. This talk will introduce basic concepts and security models that describe DLP systems, along with a high-level architecture. DLP is an interesting discipline, with content inspection techniques supported by sophisticated algorithms. Particular attention will be given to a few of these algorithms: document fingerprinting, data record fingerprinting, and scalable multi-pattern string matching.
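The document fingerprinting mentioned above is often built on k-gram hashing: hash every short window of a normalized document, and compare fingerprint sets to estimate content overlap. The sketch below is illustrative only (the talk's actual algorithms are not given here); the function names and the choice of k are assumptions.

```python
# Illustrative k-gram fingerprinting sketch, one common basis for DLP
# document fingerprinting. Not the talk's algorithm; names are hypothetical.
def kgram_fingerprints(text: str, k: int = 8) -> set[int]:
    """Hash every k-character window of the normalized text; the
    resulting set approximates the document's content."""
    text = "".join(text.lower().split())  # strip whitespace, ignore case
    return {hash(text[i:i + k]) for i in range(len(text) - k + 1)}

def similarity(a: str, b: str, k: int = 8) -> float:
    """Jaccard similarity of two fingerprint sets, in [0, 1]."""
    fa, fb = kgram_fingerprints(a, k), kgram_fingerprints(b, k)
    if not fa or not fb:
        return 0.0
    return len(fa & fb) / len(fa | fb)
```

Production systems typically sample the fingerprint set (e.g. winnowing) rather than keep every window hash, which keeps the index size proportional to document count rather than document length.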
Overview of Data Loss Prevention (DLP) Technology – Liwei Ren (任力偉)
DLP is a technology that detects potential data breach incidents in a timely manner and prevents them by monitoring data in-use (endpoints), in-motion (network traffic), and at-rest (data storage). It has been driven by regulatory compliance and intellectual property protection. This talk will introduce DLP models that describe the capabilities and scope a DLP system should cover. A few system categories will be discussed accordingly, with a high-level system architecture. DLP is an interesting technology in that it provides advanced content inspection techniques. As such, a few content inspection techniques will be proposed and investigated in rigorous terms.
Technology Overview - Symantec Data Loss Prevention (DLP) – Iftikhar Ali Iqbal
The presentation provides the following:
- Symantec Corporate Overview
- Solution Portfolio of Symantec
- Symantec Data Loss Prevention - Introduction
- Symantec Data Loss Prevention - Components
- Symantec Data Loss Prevention - Features & Use Cases
- Symantec Data Loss Prevention - System Requirements
- Symantec Data Loss Prevention - Appendix (extra information)
This provides a brief overview of Symantec Data Loss Prevention (DLP). Please note that all the information predates May 2016 and the full integration of Blue Coat Systems' solutions.
This document provides an overview of data loss prevention (DLP) technology. It discusses what DLP is and presents different DLP models for data in use, in motion, and at rest. It also covers typical DLP system architecture, approaches for data classification and identification, and some technical challenges. The document references DLP product websites and summarizes two research papers on using machine learning for automatic text classification to identify sensitive data for DLP systems.
Symantec Data Loss Prevention helps organizations address the serious problem of data loss by providing visibility into where sensitive data is located and how it is being used, enabling monitoring of data movement and detection of policy violations, and offering flexible options for protecting data and educating employees to prevent accidental or intentional data loss. Symantec is a leader in this field with the most highly rated products, largest customer base, and deepest expertise in helping customers improve security, comply with regulations, and reduce the costs of data breaches.
Data Loss Prevention (DLP) - Fundamental Concept – Eryk Budi Pratama
This document discusses data loss prevention (DLP) concepts and implementations. It begins with an overview of data governance and the data lifecycle. It then defines DLP, explaining how DLP solutions protect data in motion, at rest, and in use. Sample DLP deployments are shown, outlining key activities and considerations for implementation such as governance, infrastructure, and a phased approach. Finally, examples of DLP use cases are provided for data in motion like email and data in use on workstations.
Data loss prevention ensures critical corporate information is kept safely within networks and helps administrators control data transfers. It is important for maintaining corporate image, achieving compliance, and avoiding penalties. DLP identifies sensitive data such as credit card numbers, social security numbers, business plans, and financial records. It monitors, detects, and prevents data leakage, and notifies users of violations while protecting sensitive information. Choosing a DLP product requires considering budget, in-house vs. outsourced needs, policies, incident response, and compatibility with existing infrastructure.
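Identifying structured sensitive data such as credit card and social security numbers is usually done with pattern matching. The sketch below is a simplified illustration, not any vendor's engine; the pattern names and regexes are assumptions, and real products add validation steps (e.g. Luhn checksums) to cut false positives.

```python
import re

# Hypothetical pattern-based scanner, the simplest form of DLP content
# inspection. Real engines layer validation and context on top of this.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    # 16 digits, optionally separated by spaces or hyphens
    "credit_card": re.compile(r"\b(?:\d[ -]?){15}\d\b"),
}

def scan(text: str) -> dict[str, list[str]]:
    """Return every match per pattern; any hit would trigger a policy action."""
    hits = {name: rx.findall(text) for name, rx in PATTERNS.items()}
    return {name: found for name, found in hits.items() if found}
```

A scan of outbound email or file content would feed these hits into the policy layer, which decides whether to log, alert, or block.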
At the highest level, our mission continues to be about keeping our customers (companies and governments) safe from ever-evolving digital threats, so they are confident to move business forward. Our strategy to accomplish this mission centers around four key pillars: Advanced Threat Protection, Information Protection for On Premise and Cloud, Security as a Service -- all anchored by a Unified Security Analytics Platform. Symantec Data Loss Prevention is a foundational product in the Information Protection for On Premise and Cloud pillar.
Everyone knows that storing and accessing data and applications in the cloud and on mobile devices makes work much easier and more productive by allowing employees to work everywhere they need to.
It allows for great business agility – applications are always up to date, new functionality and processes can be deployed and activated quickly and organizations can adjust things on the fly if they need to.
It also brings the convenience factor – it allows all employees to work in the way that they need to, and collaboration and sharing are made vastly easier with cloud applications and storage.
But it brings with it all the challenges of securing devices and applications that you don't own, and while saying no might be the right thing for security, end users will find a way around it. Right now, close to 30% of employees use their personal devices for work. And that number is on the rise, potentially turning BYOD into Bring Your Own Disaster.
The document discusses implementing a data loss prevention (DLP) system to protect sensitive information. It describes why DLP is needed due to growing costs of data breaches and regulations. It then explains the key components of DLP, including discovering sensitive data, monitoring its flow, enforcing policies, and reporting/auditing. The document outlines how DLP can be applied across endpoints, networks and data centers to classify data, discover risks, and enforce policies to prevent data loss and unauthorized use.
This document provides an overview and agenda for a Data Loss Prevention presentation. It discusses trends in data loss, how DLP works to discover, monitor and protect data, and case studies of how DLP helps different types of insider and outsider threats. It highlights the advantages of the Symantec DLP solution, including its accuracy, sophisticated workflow for incident response, ability to identify sensitive data with Data Insight, and zero-day content detection through machine learning. The appendix discusses Symantec's leadership in the DLP market and new features of the latest DLP product version.
Symantec Data Loss Prevention. Global trends show that the largest share of data loss and theft is due to a lack of visibility and errors in handling data. Learn how to protect yourself.
1. Data leakage prevention (DLP) refers to systems that identify, monitor, and protect confidential data in motion, in use, and at rest to prevent unauthorized transmission. DLP provides deep content analysis based on security policies.
2. There are three main types of DLP: network DLP to protect data in motion, endpoint DLP on devices to protect data in use, and embedded DLP within specific applications like email.
3. Key benefits of DLP include preventing data leakage, reducing the costs of investigations and reputation damage, facilitating early risk detection, and increasing senior management comfort through compliance. However, DLP implementation risks include excessive false positives, software conflicts that reduce performance, and improperly configured network modules missing incidents.
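All three DLP types above share the same final step: classify the content, then look up the action the policy dictates for that channel. The sketch below is a hypothetical minimal policy table, not any product's engine; the channel and label names are assumptions.

```python
from dataclasses import dataclass

# Hypothetical policy-lookup step shared by network, endpoint, and
# embedded DLP: each module classifies content, then applies a rule.
@dataclass(frozen=True)
class Rule:
    channel: str  # "network", "endpoint", or "email"
    label: str    # content classification, e.g. "confidential"
    action: str   # "block", "alert", or "allow"

def decide(rules: list[Rule], channel: str, label: str) -> str:
    """First matching rule wins; default-allow if nothing matches."""
    for r in rules:
        if r.channel == channel and r.label == label:
            return r.action
    return "allow"
```

The default-allow fallback here mirrors the false-positive concern noted above: a stricter default-block policy reduces leakage risk but amplifies the cost of misclassification.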
This document discusses data leakage prevention (DLP) and outlines best practices for implementing a DLP project. It defines DLP, explains how DLP technology works to monitor data in motion, at rest, and in use. The document recommends a multi-step DLP project that includes analyzing business environments and threats, classifying sensitive data, mapping data storage and business processes, assessing leakage channels, and selecting DLP tools. It also stresses the importance of organizational culture and policies to complement technical solutions and prevent data leakage.
All the essential information you need about DLP in one eBook.
As security professionals struggle with how to keep up with threats, DLP - a technology designed to ensure sensitive data isn't stolen or lost - is hot again. This comprehensive guide provides what you need to understand, evaluate, and succeed with today's DLP. It includes insights from DLP Experts, Forrester Research, Gartner, and Digital Guardian's security analysts.
What's Inside:
-The seven trends that have made DLP hot again
-How to determine the right approach for your organization
-Making the business case to executives
-How to build an RFP and evaluate vendors
-How to start with a clearly defined quick win
-Straightforward frameworks for success
The Zero Trust Model of Information Security – Tripwire
In today’s IT threat landscape, the attacker might just as easily be over the cubicle wall as in another country. In the past, organizations have been content to use a trust and verify approach to information security, but that’s not working, as threats from malicious insiders represent the greatest risk to organizations. Listen in as John Kindervag, Forrester Senior Analyst, explains why it’s not working and what you can do to address this IT security shortcoming.
In this webcast, you’ll hear:
Examples of major data breaches that originated from within the organization
Why it’s cheaper to invest in proactive breach prevention—even when the organization hasn’t been breached
What’s broken about the traditional trust and verify model of information security
About a new model for information security that works—the zero-trust model
Immediate and long-term activities to move organizations from the "trust and verify" model to the "verify and never trust" model
DLP (Data Loss Protection) is NOT dead, but needs to be revisited in the context of new methodologies and threats. Here are some practical steps to improve your cybersecurity awareness and response to data loss.
This document discusses cloud security and provides an overview of McAfee's cloud security program. It begins with definitions of cloud computing and cloud security. It then analyzes the growth of the global cloud security market from 2012-2014. Next, it discusses McAfee's cloud security offerings, strengths, weaknesses, opportunities, threats and competitors in the cloud security space. It also provides details on some of McAfee's major customers. Finally, it discusses Netflix's move to the cloud and its cloud security strategy.
The presentation explains Data Security as an industry concept. It addresses Data Loss Prevention in detail: what it is, its approach, best practices, and common mistakes people make. The presentation concludes by highlighting Happiest Minds' expertise in the domain.
Learn more about Happiest Minds Data Security Service Offerings
http://www.happiestminds.com/IT-security-services/data-security-services/
Data Loss Prevention: Challenges, Impacts & Effective Strategies – Seccuris Inc.
The document discusses data loss prevention challenges and strategies. It notes that data loss incidents have increased significantly in recent years and now cost organizations millions on average. Many data losses are caused by employees and insiders. The document outlines various types of employee, application, and process exposures that can lead to data loss and recommends assessing current controls and focusing on technical controls, access management, and process controls to better mitigate risks.
Adopting A Zero-Trust Model. Google Did It, Can You? – Zscaler
Based on 6 years of creating zero trust networks at Google, the BeyondCorp framework has led to the popularization of a new network security model within enterprises, called the software-defined perimeter.
Identity— Help protect against identity compromise and identify potential breaches before they cause damage
Devices—Enhance device security while enabling mobile work and BYOD
Apps and Data—Boost productivity with cloud access while keeping information protected
Infrastructure—Take a new approach to security across your hybrid environment
Data leakage is a major concern for business organizations in today's increasingly networked world. Unauthorized disclosure may have serious consequences for an organization in both the long term and the short term. Risks include losing client and stakeholder confidence, tarnishing of the brand image, landing in unwanted lawsuits, and overall losing goodwill and market share in the industry.
This document discusses the importance of data quality and data governance. It states that poor data quality can lead to wrong decisions, bad reputation, and wasted money. It then provides examples of different dimensions of data quality like accuracy, completeness, currency, and uniqueness. It also discusses methods and tools for ensuring data quality, such as validation, data merging, and minimizing human errors. Finally, it defines data governance as a set of policies and standards to maintain data quality and provides examples of data governance team missions and a sample data quality scorecard.
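Two of the quality dimensions named above, completeness and uniqueness, reduce to simple ratios over a dataset. The sketch below is illustrative (the document does not give formulas); the record representation and function names are assumptions.

```python
# Hypothetical sketch of two data quality metrics over a list of records.
def completeness(records: list[dict], field: str) -> float:
    """Fraction of records where `field` is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

def uniqueness(records: list[dict], field: str) -> float:
    """Fraction of distinct values among the filled entries for `field`."""
    values = [r.get(field) for r in records if r.get(field) not in (None, "")]
    if not values:
        return 0.0
    return len(set(values)) / len(values)
```

A data governance scorecard like the one the document mentions would track such ratios per field over time and flag fields that fall below agreed thresholds.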
Cyber Resilience presented at the Malta Association of Risk Management (MARM) Cybercrime Seminar of 24 June 2013 by Mr Donald Tabone. Mr Tabone, Associate Director and Head of Information Protection and Business Resilience Services at KPMG Malta, presented a six-point action plan corporate entities can follow in order to reach a sustainable level of cyber resilience.
This document discusses data security and password protection. It explains that passwords should be strong, with a minimum of 6 characters including letters, numbers, and symbols. Longer passwords are more secure, with passwords of 12+ characters being very secure. The document also discusses encryption, explaining that encryption translates plaintext into ciphertext using a key, and the same key is needed for decryption. Encryption protects data by allowing only authorized parties with the key to access it. Common encryption methods include DES, RSA, AES, Blowfish, and Twofish. Free encryption tools include VeraCrypt, BitLocker, and AxCrypt.
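The password rules stated above (6+ mixed characters, 12+ for very strong) translate directly into a checker. The thresholds come from the document; the function itself and its labels are an illustrative sketch, not a recommended policy.

```python
import string

# Sketch of the document's stated password rules: letters + digits +
# symbols, 6+ chars = strong, 12+ chars = very strong. Illustrative only.
def password_strength(pw: str) -> str:
    has_letter = any(c.isalpha() for c in pw)
    has_digit = any(c.isdigit() for c in pw)
    has_symbol = any(c in string.punctuation for c in pw)
    if len(pw) >= 12 and has_letter and has_digit and has_symbol:
        return "very strong"
    if len(pw) >= 6 and has_letter and has_digit and has_symbol:
        return "strong"
    return "weak"
```

Note that modern guidance tends to weight length and unpredictability over mandatory character classes; this sketch only encodes the rules as this document states them.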
Intelligent compliance and risk management solutions.
First, we understand ‘compliance’ can have different meanings to various teams across the enterprise. Compliance is an outcome of continuous risk management, involving compliance, risk, legal, privacy, security, IT, and often even HR and finance teams, which requires an integrated approach to managing risk.
Let's start with the base pillar, Compliance Management: compliance management is all about simplifying risk assessment and mitigation in a more automated way, providing visibility and insights to help meet compliance requirements.
Information Protection and Governance: we believe there is a huge opportunity for Microsoft to help our customers know their data better, and protect and govern data throughout its lifecycle in heterogeneous environments. This is often the key starting point for many of our customers in their modern compliance journey – knowing what sensitive data they have, putting in place flexible, end-user-friendly policies for both security and compliance outcomes, and using more automation and intelligence.
Internal Risk Management: Internal risks are often what keeps business leaders up at night – whether negligent or malicious, identifying and being able to act on internal risks is critical. The ability to quickly identify and manage risks from insiders (employees or contractors with corporate access) and minimize the negative impact on corporate compliance, competitive business position, and brand reputation is a priority for organizations worldwide.
Last but not least, Discover and Respond: being able to discover data relevant to internal investigations, litigation, or regulatory requests and respond to them efficiently – without having to use multiple solutions or move data in and out of systems, which increases risk – is critical.
Best Practices for Implementing Data Loss Prevention (DLP) – Sarfaraz Chougule
Vast amounts of your organization's sensitive data are accessible, stored, and used by authorized employees and partners on a host of devices and servers. Protecting that data wherever it is stored or travels is a top priority.
Symantec announced it is planning to offer Symantec Data Loss Prevention for Tablet, the first comprehensive data loss prevention (DLP) solution for the monitoring and protection of sensitive information on tablet computers. Available first for the Apple iPad, Symantec Data Loss Prevention for Tablet will help solve one of the most urgent problems facing security organizations today by providing content-aware protection for this remarkably popular new corporate endpoint. The solution is designed to maintain user productivity and protect an organization’s confidential data at the same time.
Check Point's data loss prevention (DLP) solution combines technology and processes to prevent data breaches, enforce data policies across the network, and educate and alert users without involving IT staff. The solution uses the MultiSpect detection engine to detect sensitive data and proprietary forms, and the UserCheck technology provides real-time user alerts and remediation. Check Point DLP is available as an appliance or software blade and integrates with existing Check Point security gateways.
This document provides an overview and agenda for a Data Loss Prevention presentation. It discusses trends in data loss, how DLP works to discover, monitor and protect data, and case studies of how DLP helps different types of insider and outsider threats. It highlights the advantages of the Symantec DLP solution, including its accuracy, sophisticated workflow for incident response, ability to identify sensitive data with Data Insight, and zero-day content detection through machine learning. The appendix discusses Symantec's leadership in the DLP market and new features of the latest DLP product version.
Symantec Data Loss Prevention. Las tendencias mundiales nos muestran que el mayor porcentaje de perdida y robo de datos responde a la falta de visibilidad y el error en el manejo de los mismos. Conozca como prevenirse.
1. Data leakage prevention (DLP) refers to systems that identify, monitor, and protect confidential data in motion, in use, and at rest to prevent unauthorized transmission. DLP provides deep content analysis based on security policies.
2. There are three main types of DLP: network DLP to protect data in motion, endpoint DLP on devices to protect data in use, and embedded DLP within specific applications like email.
3. Key benefits of DLP include preventing data leakage, reducing costs of investigations and reputation damage, facilitating early risk detection, and increasing senior management comfort through compliance. However, DLP implementation risks include excessive false positives, software conflicts reducing performance, and improperly configured network modules missing
This document discusses data leakage prevention (DLP) and outlines best practices for implementing a DLP project. It defines DLP, explains how DLP technology works to monitor data in motion, at rest, and in use. The document recommends a multi-step DLP project that includes analyzing business environments and threats, classifying sensitive data, mapping data storage and business processes, assessing leakage channels, and selecting DLP tools. It also stresses the importance of organizational culture and policies to complement technical solutions and prevent data leakage.
All the essential information you need about DLP in one eBook.
As security professionals struggle with how to keep up with threats, DLP - a technology designed to ensure sensitive data isn't stolen or lost - is hot again. This comprehensive guide provides what you need to understand, evaluate, and succeed with today's DLP. It includes insights from DLP Experts, Forrester Research, Gartner, and Digital Guardian's security analysts.
What's Inside:
-The seven trends that have made DLP hot again
-How to determine the right approach for your organization
-Making the business case to executives
-How to build an RFP and evaluate vendors
-How to start with a clearly defined quick win
-Straight-forward frameworks for success
The Zero Trust Model of Information Security Tripwire
In today’s IT threat landscape, the attacker might just as easily be over the cubicle wall as in another country. In the past, organizations have been content to use a trust and verify approach to information security, but that’s not working as threats from malicious insiders represent the most risk to organizations. Listen in as John Kindervag, Forrester Senior Analyst, explains why it’s not working and what you can do to address this IT security shortcoming.
In this webcast, you’ll hear:
Examples of major data breaches that originated from within the organization
Why it’s cheaper to invest in proactive breach prevention—even when the organization hasn’t been breached
What’s broken about the traditional trust and verify model of information security
About a new model for information security that works—the zero-trust model
Immediate and long-term activities to move organizations from the "trust and verify" model to the "verify and never trust" model
DLP (Data Loss Protection) is NOT dead, but needs to be revisited in the context of new methodologies and threats. Here are some practical steps to improve your cybersecurity awareness and response to data loss.
This document discusses cloud security and provides an overview of McAfee's cloud security program. It begins with definitions of cloud computing and cloud security. It then analyzes the growth of the global cloud security market from 2012-2014. Next, it discusses McAfee's cloud security offerings, strengths, weaknesses, opportunities, threats and competitors in the cloud security space. It also provides details on some of McAfee's major customers. Finally, it discusses Netflix's move to the cloud and its cloud security strategy.
The presentation explains about Data Security as an industrial concept. It addresses
its concern on Data Loss Prevention in detail, from what it is, its approach, the best practices and
common mistakes people make for the same. The presentation concludes with highlighting
Happiest Minds' expertise in the domain.
Learn more about Happiest Minds Data Security Service Offerings
http://www.happiestminds.com/IT-security-services/data-security-services/
Data Loss Prevention: Challenges, Impacts & Effective StrategiesSeccuris Inc.
The document discusses data loss prevention challenges and strategies. It notes that data loss incidents have increased significantly in recent years and now cost organizations millions on average. Many data losses are caused by employees and insiders. The document outlines various types of employee, application, and process exposures that can lead to data loss and recommends assessing current controls and focusing on technical controls, access management, and process controls to better mitigate risks.
Adopting A Zero-Trust Model. Google Did It, Can You?Zscaler
Based on 6 years of creating zero trust networks at Google, the BeyondCorp framework has led to the popularization of a new network security model within enterprises, called the software-defined perimeter.
Identity— Help protect against identity compromise and identify potential breaches before they cause damage
Devices—Enhance device security while enabling mobile work and BYOD
Apps and Data—Boost productivity with cloud access while keeping information protected
Infrastructure—Take a new approach to security across your hybrid environment
Data Leakage is an important concern for the business organizations in this increasingly networked world these days. Unauthorized disclosure may have serious consequences for an organization in both long term and short term. Risks include losing clients and stakeholder confidence, tarnishing of brand image, landing in unwanted lawsuits, and overall losing goodwill and market share in the industry.
This document discusses the importance of data quality and data governance. It states that poor data quality can lead to wrong decisions, bad reputation, and wasted money. It then provides examples of different dimensions of data quality like accuracy, completeness, currency, and uniqueness. It also discusses methods and tools for ensuring data quality, such as validation, data merging, and minimizing human errors. Finally, it defines data governance as a set of policies and standards to maintain data quality and provides examples of data governance team missions and a sample data quality scorecard.
Cyber Resilience presented at the Malta Association of Risk Management (MARM) Cybercrime Seminar of 24 June 2013 by Mr Donald Tabone. Mr Tabone, Associate Director and Head of Information Protection and Business Resilience Services at KPMG Malta, presented a six-point action plan corporate entities can follow in order to reach a sustainable level of cyber resilience.
This document discusses data security and password protection. It explains that passwords should be strong, with a minimum of 6 characters including letters, numbers, and symbols. Longer passwords are more secure, with 12+ character passwords being very secure. The document also discusses encryption, explaining that encryption translates plain text into encrypted ciphertext using a key, and the same key is needed for decryption. Encryption securely protects data by allowing only authorized parties with the key to access it. Common encryption methods include DES, RSA, AES, Blowfish and Twofish. Free encryption tools include Veracrypt, Bitlocker and AxCrypt.
Intelligent compliance and risk management solutions.
First, we understand that 'compliance' can mean different things to different teams across the enterprise. Compliance is an outcome of continuous risk management involving the compliance, risk, legal, privacy, security, IT, and often even HR and finance teams, which requires an integrated approach to managing risk.
Let's start with the base pillar, Compliance Management: compliance management is about simplifying risk assessment and mitigation in a more automated way, providing the visibility and insights needed to meet compliance requirements.
Information Protection and Governance: we believe there is a huge opportunity for Microsoft to help our customers know their data better and protect and govern that data throughout its lifecycle in heterogeneous environments. This is often the key starting point for many of our customers in their modern compliance journey: knowing what sensitive data they have, putting flexible, end-user-friendly policies in place for both security and compliance outcomes, and using more automation and intelligence.
Internal Risk Management: internal risks are often what keeps business leaders up at night. Whether negligent or malicious, identifying internal risks and being able to act on them is critical. The ability to quickly identify and manage risks from insiders (employees or contractors with corporate access), and to minimize the negative impact on corporate compliance, competitive business position, and brand reputation, is a priority for organizations worldwide.
Last but not least, Discover and Respond: it is critical to be able to discover data relevant to internal investigations, litigation, or regulatory requests and respond to them efficiently, without having to juggle multiple solutions or move data in and out of systems, which increases risk.
Best Practices for Implementing Data Loss Prevention (DLP) – Sarfaraz Chougule
Vast amounts of your organization's sensitive data are accessible, stored, and used by authorized employees and partners on a host of devices and servers. Protecting that data wherever it is stored or travels is a top priority.
Symantec announced it is planning to offer Symantec Data Loss Prevention for Tablet, the first comprehensive data loss prevention (DLP) solution for the monitoring and protection of sensitive information on tablet computers. Available first for the Apple iPad, Symantec Data Loss Prevention for Tablet will help solve one of the most urgent problems facing security organizations today by providing content-aware protection for this remarkably popular new corporate endpoint. The solution is designed to maintain user productivity and protect an organization’s confidential data at the same time.
Check Point's data loss prevention (DLP) solution combines technology and processes to prevent data breaches, enforce data policies across the network, and educate and alert users without involving IT staff. The solution uses the MultiSpect detection engine to detect sensitive data and proprietary forms, and the UserCheck technology provides real-time user alerts and remediation. Check Point DLP is available as an appliance or software blade and integrates with existing Check Point security gateways.
Humans Are The Weakest Link – How DLP Can Help – Valery Boronin
SAS 2012 Official Video is available at http://www.youtube.com/watch?v=Vr8lmIhc0pk
Abstract: All companies invest in security, but far from all have come to realize that employee awareness and education are the key factors in improving information protection and preventing data leaks. You can install the most powerful DLP, encryption, and other security tools, hire many security officers and consultants to tune your business processes, and ultimately spend a great deal of money and resources on security. But if end users don't understand the threats and don't know the rules, they cannot follow internal policies and regulations or correctly use the appropriate tools, and it is all for nothing. An efficient information security strategy creates a culture of awareness and enforcement – a culture where users understand the consequences.
This session covers three main topics:
1) What is user awareness in information security?
2) Why is user awareness required?
3) How can user awareness be raised, and what are the key factors?
Practical recommendations will be given for adopters and practitioners of security user awareness programs, and the role of DLP in raising user awareness will be highlighted.
Related links:
http://www.youtube.com/watch?v=vXlyuGXAZzU – Valery Boronin on Data Luxury Protection at DLP Russia 2011 (in Russian)
Presentation from the webinar on the McAfee DLP suite.
The main focus was on using the new capabilities of DLP Endpoint 9.4 to block accidental or deliberate attempts to pass information into the wrong hands through various channels.
Edge pereira oss304 tech ed australia regulatory compliance and microsoft off... – Edge Pereira
Edge Pereira presentation at Microsoft TechEd Australia. Session OSS304 Regulatory Compliance and Microsoft Office 365 - data leakage protection, dlp, privacy, sharepoint
This document summarizes Liwei Ren's presentation on differential compression. It begins with Ren's background and introduces differential compression. Ren then presents a mathematical model describing differences between files using edit operations. The document categorizes differential compression based on whether references and targets are in the same/different locations over the same/different times. Finally, it discusses three advanced topics in more depth: comparing local, remote, and in-place differential compression schemes; applying differential compression to executable files; and performing in-place file merging with local differential compression.
My testimony to NSTAC (http://www.dhs.gov/national-security-telecommunications-advisory-committee) on the need for more research data in big data networking analysis, better taxonomies/ontologies, and the need for more accessible tools, given December 8, 2015. A very insightful, thoughtful group of people. The administration really got it right with this one.
Securing Your Data for Your Journey to the Cloud – Liwei Ren任力偉
In the era of cloud computing, data security is one of the main concerns in adopting cloud applications. In this talk, we will investigate a few general data security issues caused by cloud platforms: (a) data security and privacy for data residing in the cloud when using cloud SaaS or cloud apps; (b) data leaks to personal cloud apps directly from enterprise networks; (c) data leaks to personal cloud apps indirectly via BYOD devices.
Multiple technologies exist for solving these data security issues: CASB, Cloud Encryption Gateway, Cloud DLP, and even traditional DLP. These products and services are ad hoc in nature. In the long term, general cloud security technologies such as FHE (fully homomorphic encryption) or MPC (multi-party computation) should be adopted once they become practical.
New enterprise application and data security challenges and solutions apr 2... – Ulf Mattsson
Ulf Mattsson presented on new enterprise application and data security challenges and solutions. He discussed how 20% of organizations are expected to budget for quantum computing projects by 2023 compared to less than 1% currently. He also summarized that web application security is needed based on Verizon's 2018 breach report showing many breaches originate from applications. Finally, he emphasized the importance of integrating security into the application development process from the beginning using approaches like SecDevOps and DevSecOps.
This document discusses using machine learning and big data technologies to improve security workflows. It describes the challenges of analyzing large amounts of security data from many sources to detect threats. Machine learning can help by analyzing patterns in the data at scale. The document introduces the Lambda Defense approach, which applies a lambda architecture to build a "central nervous system" for security. This combines batch and real-time machine learning models to detect threats based on both sequential and unordered behaviors.
IRJET- Empower Syntactic Exploration Based on Conceptual Graph using Searchab... – IRJET Journal
This document discusses a proposed system for empowering syntactic exploration based on conceptual graphs using searchable symmetric encryption. It begins with an abstract that outlines using conceptual graphs and related natural language processing techniques to perform semantic search over encrypted cloud data. It then describes the system modules, including data owners who can upload and authorize access to encrypted files, data users who can search for files, and a cloud server that stores the outsourced encrypted data and indexes. Key algorithms discussed include named entity recognition, term frequency-inverse document frequency (TF-IDF) calculation, data encryption standard (DES) encryption, and hashed message authentication codes (HMACs) to identify duplicate documents. The proposed system architecture involves data owners encrypting and outsourcing documents
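The HMAC-based duplicate detection mentioned above can be sketched in a few lines: a keyed digest of each document lets the server flag identical uploads without reading their content. The key and document contents below are illustrative only, using the standard `hmac` and `hashlib` modules:

```python
import hmac
import hashlib

def document_tag(key: bytes, document: bytes) -> str:
    """Keyed SHA-256 digest of a document: identical plaintexts
    produce identical tags, so a server can spot duplicates
    without seeing the underlying content."""
    return hmac.new(key, document, hashlib.sha256).hexdigest()

key = b"owner-secret"                      # illustrative key
t1 = document_tag(key, b"quarterly report")
t2 = document_tag(key, b"quarterly report")
t3 = document_tag(key, b"annual report")
```

Here `t1 == t2` (duplicate detected) while `t1 != t3`; without the key, the tags reveal nothing useful about the plaintexts.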
Bytewise approximate matching, searching and clustering – Liwei Ren任力偉
This document discusses bytewise approximate matching, searching, and clustering. It defines six matching problems - identicalness, containment, cross-sharing, similarity, approximate containment, and approximate cross-sharing. A framework is proposed that uses bytewise relevance to define matching, searching, and clustering problems and solutions. Current and future work includes algorithms, tools, and applications for approximate matching in domains like malware analysis, plagiarism detection, and digital forensics.
The document provides an overview of text mining, including:
1. Text mining analyzes unstructured text data through techniques like information extraction, text categorization, clustering, and summarization.
2. It differs from regular data mining as it works with natural language text rather than structured databases.
3. Text mining has various applications including security, biomedicine, software, media, business and more. It faces challenges in representing meaning and context from unstructured text.
Session 2 - Akyildiz, Beinecke, Yee at MLconf NYC – MLconf
1. Prototyping machine learning models on big data has become expensive, painful, and slow due to the limitations of small data tools that do not scale and customized solutions that are hard to modify and maintain.
2. A new approach called Data Science 2.0 separates business logic (algorithm code) from implementation logic (infrastructure code) to allow models to be tested quickly and easily through automatic distribution and parallelization of data and computations.
3. Part-of-speech tagging can provide useful input for tasks like parsing and named-entity recognition. The document describes applying supervised classification for part-of-speech tagging on noisy data sets from news and Twitter, achieving accuracy rates up to 88.53
Implementation and Review Paper of Secure and Dynamic Multi Keyword Search in... – IRJET Journal
This document proposes a secure tree-based search scheme for encrypted cloud data that supports multi-keyword ranked search and dynamic operations like deletion and insertion of documents. It combines the vector space model and TF-IDF model to construct a tree-based index and propose a "Greedy Depth-first Search" algorithm for efficient multi-keyword search. The scheme uses PEKS to encrypt the file index and queries while ensuring accurate relevance scoring between encrypted data. It aims to overcome security threats to keyword privacy in existing searchable encryption schemes and provide flexible dynamic operations on document collections in the cloud.
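The vector space model with TF-IDF weighting that the scheme builds on can be sketched in plain (unencrypted) form, ignoring PEKS and the tree index. The corpus and query below are made up for illustration:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Vector space model: one sparse TF-IDF vector per document."""
    n = len(docs)
    tokenized = [d.lower().split() for d in docs]
    df = Counter()                          # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    idf = {t: math.log(n / df[t]) for t in df}
    vecs = [{t: c / len(toks) * idf[t] for t, c in Counter(toks).items()}
            for toks in tokenized]
    return vecs, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

docs = ["encrypted cloud search", "cloud storage service", "local database index"]
vecs, idf = tfidf_vectors(docs)
qtf = Counter("cloud search".split())       # multi-keyword query
qvec = {t: c / len(qtf) * idf.get(t, 0.0) for t, c in qtf.items()}
scores = [cosine(qvec, v) for v in vecs]
best = max(range(len(docs)), key=scores.__getitem__)
```

The document matching the most query terms ranks first; the secure scheme computes the same relevance scores over encrypted index vectors.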
An Overview of Python for Data Analytics – IRJET Journal
This document provides an overview of using Python for data analytics. It discusses how Python is well-suited for data science tasks due to its many preconfigured libraries. The key Python libraries for data analysis that are mentioned include NumPy, Pandas, Seaborn, and Matplotlib. The document also describes the typical steps in a data analysis process, such as data collection, cleaning, exploratory analysis, modeling, and creating data products. A case study is presented that demonstrates analyzing a dataset on world happiness using Python functions, libraries, and plotting capabilities.
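The collect-clean-analyze steps described above can be sketched with only the standard library (a real workflow would use Pandas and NumPy as the document notes). The "happiness" records are made up for illustration:

```python
from statistics import mean

# Hypothetical "world happiness" records: (country, score); None = missing
raw = [("Finland", 7.8), ("Denmark", None), ("Finland", 7.6), ("India", 4.0)]

# Cleaning step: drop records with missing scores
clean = [(c, s) for c, s in raw if s is not None]

# Exploratory step: average score per country
by_country = {}
for country, score in clean:
    by_country.setdefault(country, []).append(score)
avg = {c: mean(s) for c, s in by_country.items()}
```

In Pandas the same pipeline would be roughly `df.dropna().groupby("country")["score"].mean()`.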
(Recent) technology trends and bridges to gap in the localization industry – Loctimize GmbH
Slides of the keynote presentation held by Daniel Zielinski during the frist Egyptian conference on Translation, Localization and Interpreting in Cairo on April 16, 2019
Data science is a multidisciplinary field that uses statistics, programming, and machine learning to extract knowledge and insights from large amounts of data. It has various applications like email spam detection, medical diagnosis, predicting stock prices, and self-driving cars. The document discusses how the size of data is rapidly increasing and will continue to do so, with an estimated 463 exabytes of new data generated daily by 2025. It also outlines common tasks performed by data scientists like understanding business problems, analyzing and visualizing data, making recommendations, and predicting future values. Theoretical and practical aspects of data science are also covered, along with examples of how it relates to other fields.
Automatic Classification of Springer Nature Proceedings with Smart Topic Miner – Francesco Osborne
The document summarizes research on automatically classifying Springer Nature proceedings using the Smart Topic Miner (STM). STM extracts topics from publications, maps them to a computer science ontology, selects relevant topics using a greedy algorithm, and infers tags. It was tested on 8 Springer Nature editors who found STM accurately classified 75-90% of proceedings and improved their work. However, STM is currently limited to computer science and occasional noisy results were found in books with few chapters. Future work aims to expand STM to characterize topic evolution over time and directly support author tagging.
The advent of Big Data has presented new challenges in terms of data security. There is an increasing need for research into technologies that can handle the vast volume of data and secure it efficiently. Current technologies for securing data are slow when applied to huge amounts of data. This paper discusses the security aspects of Big Data.
Makine Öğrenmesi, Yapay Zeka ve Veri Bilimi Süreçlerinin Otomatikleştirilmesi... (Automating Machine Learning, AI and Data Science Processes) – Ali Alkan
The document summarizes an agenda for a presentation on machine learning and data science. It includes an introduction to CRISP-DM (Cross Industry Standard for Data Mining), guided analytics, and a KNIME demo. It also discusses the differences between machine learning, artificial intelligence, and data science. Machine learning produces predictions, artificial intelligence produces actions, and data science produces insights. It provides an overview of the CRISP-DM process for data mining projects including the business understanding, data understanding, data preparation, modeling, evaluation, and deployment phases. It also discusses guided analytics and interactive systems to assist business analysts in finding insights and predicting outcomes from data.
This document provides an overview of data science tools, techniques, and applications. It begins by defining data science and explaining why it is an important and in-demand field. Examples of applications in healthcare, marketing, and logistics are given. Common computational tools for data science like RapidMiner, WEKA, R, Python, and Rattle are described. Techniques like regression, classification, clustering, recommendation, association rules, outlier detection, and prediction are explained along with examples of how they are used. The advantages of using computational tools to analyze data are highlighted.
This document provides a summary of Alberto Trombetta's academic background and research interests. It includes:
- His education, including a PhD from the University of Torino in Computer Science and a Laurea from the University of Milano in Computer Science.
- His professional experience, including positions as a post-doc, visiting researcher and assistant professor at various universities. He is currently an assistant professor at the University of Insubria.
- His main research interests, which include management of imprecise data, query languages for semistructured data, data integration, business process management, fault tolerance, trust management, and privacy and security in data management. He has also worked on several funded
This document provides an introduction and overview of the INF2190 - Data Analytics course. It discusses the instructor, Attila Barta, details on where and when the course will take place. It then provides definitions and history of data analytics, discusses how the field has evolved with big data, and references enterprise data analytics architectures. It contrasts traditional vs. big data era data analytics approaches and tools. The objective of the course is described as providing students with the foundation to become data scientists.
Similar to DLP Systems: Models, Architecture and Algorithms (20)
This document provides an introduction to deep neural networks (DNNs) by a Dr. Liwei Ren. It defines DNNs from both technical and mathematical perspectives. DNNs are composed of three main elements - architecture, activity rule, and learning rule. The architecture determines the network's capability and is typically a directed graph with weights, biases, and activation functions. Gradient descent and backpropagation are commonly used as the learning rule to update weights and minimize error. Universal approximation theorems show that both shallow and deep neural networks can approximate functions, with deep networks potentially being more efficient. Examples of DNN applications include image recognition. Security issues are also briefly mentioned.
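The gradient-descent learning rule mentioned above can be illustrated on a single linear neuron, a toy sketch rather than a real DNN: the same update rule, applied layer by layer via backpropagation, is what trains deep networks.

```python
# Toy training set: points on the target line y = 2x + 1
data = [(x, 2 * x + 1) for x in range(-5, 6)]

w, b, lr = 0.0, 0.0, 0.01        # weight, bias, learning rate
for _ in range(2000):
    gw = gb = 0.0
    for x, y in data:            # accumulate gradient of squared error
        err = (w * x + b) - y
        gw += 2 * err * x
        gb += 2 * err
    w -= lr * gw / len(data)     # gradient-descent update
    b -= lr * gb / len(data)
```

After training, `w` and `b` converge to the target values 2 and 1, since squared error for a linear model has a single global minimum.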
Extending Boyer-Moore Algorithm to an Abstract String Matching Problem – Liwei Ren任力偉
The bad-character shift rule of the Boyer-Moore string search algorithm is studied in this paper with the goal of extending it to more general string matching problems. An abstract string matching problem is defined, and an optimized matching algorithm based on the bad-character heuristic is proposed to solve the abstract problem efficiently.
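The bad-character rule on its own can be sketched as a plain Boyer-Moore variant (this is the classic rule, not the paper's abstract generalization): compare the pattern right to left, and on a mismatch shift the pattern so its rightmost occurrence of the mismatched text character lines up, or past it entirely.

```python
def bad_char_table(pattern):
    """Rightmost index of each character in the pattern."""
    return {c: i for i, c in enumerate(pattern)}

def bm_search(text, pattern):
    """Boyer-Moore search using only the bad-character shift rule."""
    last = bad_char_table(pattern)
    m, n = len(pattern), len(text)
    matches, i = [], 0
    while i <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[i + j]:
            j -= 1               # compare right to left
        if j < 0:
            matches.append(i)    # full match at position i
            i += 1
        else:
            # align the rightmost occurrence of the mismatched
            # character, or skip past it; always shift at least 1
            i += max(1, j - last.get(text[i + j], -1))
    return matches
```

On random text the rule often lets the search skip `len(pattern)` characters at a time, which is the source of Boyer-Moore's sublinear average behavior.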
Near Duplicate Document Detection: Mathematical Modeling and Algorithms – Liwei Ren任力偉
Near-duplicate document detection is a well-known problem in information retrieval, and an important one for many applications in the IT industry; it has an extensive research literature. This article provides a novel solution to this classic problem. We present the problem with abstract models along with additional concepts such as text models, document fingerprints, and document similarity. With these concepts, the problem can be transformed into a keyword-like search problem with results ranked by document similarity. There are two major techniques: the first extracts robust and unique fingerprints from a document; the second calculates document similarity effectively. Algorithms for both fingerprint extraction and document similarity calculation are introduced as a complete solution.
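One common fingerprint-plus-similarity scheme of the kind described (not necessarily the paper's own algorithms) uses word shingles as fingerprints and Jaccard similarity as the score:

```python
def shingles(text, k=3):
    """The set of k-word shingles of a text -- a simple
    document fingerprint set."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a, b, k=3):
    """Jaccard similarity between the shingle sets of two
    documents: |A ∩ B| / |A ∪ B|, in [0, 1]."""
    sa, sb = shingles(a, k), shingles(b, k)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0
```

Two documents differing in one word near the end share most shingles and score close to 1, while unrelated documents score near 0, which makes the score usable for ranking near-duplicate search results.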
Monotonicity of Phaselocked Solutions in Chains and Arrays of Nearest-Neighbo... – Liwei Ren任力偉
This document summarizes a paper on phaselocked solutions in chains and arrays of coupled oscillators. It introduces the topic of coupled oscillators and their importance in modeling neural activity. It describes previous work analyzing one-dimensional chains of oscillators using continuum approximations. The current paper aims to rigorously prove the existence of phaselocked solutions in chains without requiring a continuum limit. It also analyzes two-dimensional arrays by decomposing them into independent one-dimensional problems under certain frequency distributions. Key results include proving monotonicity of phaselocked solutions and spontaneous formation of target patterns in two-dimensional arrays with isotropic synaptic coupling.
Phase locking in chains of multiple-coupled oscillators – Liwei Ren任力偉
This document summarizes a research paper that studied phase locking in chains of oscillators with coupling beyond nearest neighbors. It introduced a model for such chains using piecewise linear coupling functions. The paper proved the existence of phase locked solutions for this model by using a homotopy method to smoothly transform the model into more realistic coupling functions. It discussed differences between models with multiple coupling versus nearest neighbor coupling only, and highlighted the importance of studying coupling beyond just the nearest neighbors seen in some biological systems.
Binary Similarity: Theory, Algorithms and Tool Evaluation – Liwei Ren任力偉
This document summarizes Liwei Ren's presentation on binary similarity algorithms. It discusses three existing algorithms – ssdeep, sdhash, and TLSH – and proposes a new algorithm called TSFP. TSFP represents files as bags of blocks and measures similarity based on the overlap of blocks between two files. It is suggested as a way to solve similarity search and clustering problems by creating an index of files represented as TSFPs. The presentation concludes by inviting questions from the audience.
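The abstract does not spell out TSFP's details, so as a generic stand-in, a bag-of-blocks similarity over fixed-size byte blocks might look like this sketch:

```python
from collections import Counter

def block_bag(data: bytes, block_size=8):
    """Represent a byte string as the multiset (bag) of its
    fixed-size blocks -- a simplified stand-in for TSFP."""
    return Counter(data[i:i + block_size]
                   for i in range(0, len(data), block_size))

def block_similarity(x: bytes, y: bytes, block_size=8):
    """Fraction of blocks shared between the two bags, in [0, 1]:
    Counter & Counter takes the multiset intersection."""
    bx, by = block_bag(x, block_size), block_bag(y, block_size)
    shared = sum((bx & by).values())
    total = max(sum(bx.values()), sum(by.values()))
    return shared / total if total else 0.0

a = b"A" * 16 + b"B" * 16    # toy files sharing their first half
b = b"A" * 16 + b"C" * 16
sim = block_similarity(a, b)
```

Files sharing half their blocks score 0.5; indexing files by their block bags then turns similarity search into a lookup of shared blocks.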
IoT Security: Problems, Challenges and Solutions – Liwei Ren任力偉
As a novel networked computing platform, IoT brings many security challenges to enterprise networks and creates new opportunities for the security industry. This talk provides a general overview of the enterprise network security problems, especially data security problems, caused by IoT. A few existing security technologies are then evaluated as necessary elements of a holistic network security posture covering IoT devices: (a) IoT security monitoring and control; (b) FOTA for firmware vulnerability management; (c) NetFlow-based big data security analysis. Finally, the practice of standard security protocols (such as OpenIOC and IODEF) is strongly advocated for delivering effective IoT security solutions.
Bytewise Approximate Match: Theory, Algorithms and Applications – Liwei Ren任力偉
Byte-wise approximate matching has become an important field in computer science that includes not only practical value but also theoretical significance. This talk will use six cases to define and describe the concept of approximate matching rigorously. They are identicalness, containment, cross-sharing, similarity, approximate containment and approximate cross-sharing. Based on the concept of approximate matching, one can propose a theoretic framework that consists of many problems of approximate matching, searching & clustering. Algorithmic solutions and challenges of the matching problems will be briefed as well as theoretic analysis. This framework also includes some elements of our previous works in both document fingerprinting problem and mathematical evaluation of similarity digest schemes { TLSH, ssdeep, sdhash }. In the end, we will discuss applications in various security disciplines.
Mathematical Modeling for Practical Problems – Liwei Ren任力偉
Mathematical modeling is an important step in developing many advanced technologies in various domains such as network security and data mining. This lecture introduces a process that the speaker has distilled from his practice of mathematical modeling and algorithmic solutions in the IT industry – as an applied mathematician, algorithm specialist, software engineer, and even entrepreneur. A practical problem from a DLP system is used as an example of creating mathematical models and providing algorithmic solutions.
Pushing the limits of ePRTC: 100ns holdover for 100 days – Adtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
Removing Uninteresting Bytes in Software Fuzzing – Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux tools -- Libxml's xmllint, a tool for parsing XML documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
Securing your Kubernetes cluster: a step-by-step guide to success! – KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ... – James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Communications Mining Series - Zero to Hero - Session 1 – DianaGray10
This session provides an introduction to UiPath Communication Mining, its importance, and a platform overview. You will acquire a good understanding of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
zkStudyClub - Reef: Fast Succinct Non-Interactive Zero-Knowledge Regex Proofs – Alex Pruden
This paper presents Reef, a system for generating publicly verifiable succinct non-interactive zero-knowledge proofs that a committed document matches or does not match a regular expression. We describe applications such as proving the strength of passwords, the provenance of email despite redactions, the validity of oblivious DNS queries, and the existence of mutations in DNA. Reef supports the Perl Compatible Regular Expression syntax, including wildcards, alternation, ranges, capture groups, Kleene star, negations, and lookarounds. Reef introduces a new type of automata, Skipping Alternating Finite Automata (SAFA), that skips irrelevant parts of a document when producing proofs without undermining soundness, and instantiates SAFA with a lookup argument. Our experimental evaluation confirms that Reef can generate proofs for documents with 32M characters; the proofs are small and cheap to verify (under a second).
Paper: https://eprint.iacr.org/2023/1886
UiPath Test Automation using UiPath Test Suite series, part 5 – DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... – James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor... | Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
GridMate - End to end testing is a critical piece to ensure quality and avoid... | ThomasParaiso2
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
Climate Impact of Software Testing at Nordic Testing Days | Kari Kakkonen
My slides at Nordic Testing Days 6.6.2024
The climate impact and sustainability of software testing are discussed in the talk. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Generative AI Deep Dive: Advancing from Proof of Concept to Production | Aggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Enhancing adoption of Open Source Libraries. A case study on Albumentations.AI | Vladimir Iglovikov, Ph.D.
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster, ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many of those features provide convenience and capability while sacrificing security. This best practices guide outlines steps users can take to better protect personal devices and information.
National Security Agency - NSA mobile device best practices
DLP Systems: Models, Architecture and Algorithms
1. Copyright 2011 Trend Micro Inc. Classification 8/2/2013 1
DLP Systems: Models, Architecture and Algorithms
Liwei Ren, Ph.D, Sr. Architect
Data Security Research, Trend Micro™
May, 2013, UCSC, Santa Cruz, CA
2. Background
• Liwei Ren, Data Security Research, Trend Micro™
– Research interests:
• DLP, differential compression, data de-duplication, file transfer protocols, database security, and practical algorithms.
– Education:
• MS/BS in mathematics, Tsinghua University, Beijing
• Ph.D in mathematics, MS in information science, University of Pittsburgh
– Relevant work for this talk:
• Provilla, Inc.: a startup focusing on endpoint-based DLP products and solutions, co-founded by Liwei and acquired by Trend Micro a few years ago.
• Patents: Liwei holds 10+ patents for DLP, mostly for DLP content inspection techniques.
• Trend Micro™
– Global security software company headquartered in Tokyo, with R&D centers in Nanjing, Taipei and Silicon Valley.
– One of the top 3 anti-malware vendors
– Pioneer in cloud security
– DLP vendor via the Provilla™ acquisition
3. Agenda
• What is Data Loss Prevention (DLP)?
• Concepts, Models, Architecture
• Content Inspection Problems
• Practical Algorithms for DLP
• Summary
• References
• Q&A
4. What Is Data Loss Prevention?
• What is Data Loss Prevention?
– Data loss prevention (aka DLP) is a data security technology that detects data breach incidents in a timely manner and prevents them by monitoring data in-use (endpoints), in-motion (network traffic), and at-rest (data storage) in an organization's network.
– Also known as Data Leak Prevention (DLP), Information Leak Prevention (ILP), or Information Leak Detection and Prevention (ILDP).
5. What Is Data Loss Prevention?
• A few elements of a DLP system:
– WHAT data to protect?
– WHO leaks the data?
– HOW is the data leaked?
– WHERE to protect data?
– WHAT actions to take?
6. Concepts, Models and Architecture
• WHAT data to protect?
• WHO causes data leaks?
External Hackers
7. Concepts, Models and Architecture
Three Data States:
8. Concepts, Models and Architecture
• Data-in-use:
• Data-in-motion:
9. Concepts, Models and Architecture
• Data-at-rest at risk:
10. Concepts, Models and Architecture
• DLP for data-in-use and data-in-motion:
• A conceptual view!
11. Concepts, Models and Architecture
• DLP for data-in-use and data-in-motion:
• A technical view!
12. Concepts, Models and Architecture
• DLP Model for data-in-use and data-in-motion:
– If DATA flows from SOURCE to DESTINATION via CHANNEL, the system takes ACTIONs
– DATA specifies what the confidential data is
– SOURCE can be a user, an endpoint, an email address, or a group of them
– DESTINATION can be an endpoint, an email address, a group of them, or simply the external world
– CHANNEL indicates the data leak channel, such as USB, email, or network protocols
– ACTION is the action the DLP system takes when an incident occurs
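The five-tuple model above can be sketched as a simple rule evaluator. This is an illustration only; all names (Rule, match, the field values) are hypothetical and not taken from any actual DLP product:

```python
# Illustrative sketch of the <DATA, SOURCE, DESTINATION, CHANNEL, ACTION>
# rule model for data-in-use / data-in-motion (hypothetical names).
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    data: str         # label of the confidential data class
    source: str       # user/endpoint group, or "*" for any
    destination: str  # endpoint/address group, or "external"
    channel: str      # data leak channel, e.g. "usb", "email", "http"
    action: str       # e.g. "block", "log", "encrypt"

def match(rule, event):
    """Return the rule's ACTION if the event matches it, else None."""
    for field in ("data", "source", "destination", "channel"):
        want = getattr(rule, field)
        if want != "*" and want != event[field]:
            return None
    return rule.action

rule = Rule(data="PII", source="*", destination="external",
            channel="email", action="block")
event = {"data": "PII", "source": "alice-laptop",
         "destination": "external", "channel": "email"}
print(match(rule, event))  # -> block
```

The same event over a non-matching channel (say "usb") would fall through this rule, which is why real policies are ordered lists of such rules.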
13. Concepts, Models and Architecture
• DLP for data-at-rest:
14. Concepts, Models and Architecture
• DLP Model for data-at-rest:
– If DATA resides at SOURCE, the system takes ACTIONs
– DATA specifies what the sensitive data (with potential for leakage) is
– SOURCE can be an endpoint, a storage server, or a group of them
– ACTION is the action the DLP system takes when confidential data is identified at rest
16. Concepts, Models and Architecture
• Typical DLP system architecture:
17. Agenda
• What is Data Loss Prevention (DLP)?
• Concepts, Models, Architecture
• Content Inspection Problems
• Practical Algorithms for DLP
• Summary
• References
• Q&A
18. Content Inspection Problems
• Two fundamental problems for a DLP system:
• It is a pair of problems that always come together: one defines what sensitive data is, and the other determines data sensitivity based on what has been defined.
19. Content Inspection Problems
• Four typical approaches for <defining, determining> sensitive data in a DLP system:
1. Document fingerprinting
2. Database record fingerprinting
3. Multiple keyword matching
4. Regular expression matching
20. Content Inspection Problems
• Document fingerprinting:
• A technique for identifying modified versions of known documents
• Problem Definition (Model 1):
– Let S = {T1, T2, …, Tn} be a set of known texts
– Given a query text T, one needs to determine if there exists at least one document t ∈ S such that T and t share a significant amount of common textual content; multiple returned documents are ranked by how much common content is shared.
21. Content Inspection Problems
• An alternative model (Model 2):
– Let S = {T1, T2, …, Tn} be a set of known texts
– Given a query text T and a threshold X%, one needs to determine if there exists at least one text t ∈ S such that SIM(T,t) ≥ X%, where SIM() is a function that measures the similarity between two texts.
• Multiple documents are ranked by their similarity percentages.
22. Content Inspection Problems
• Database record fingerprinting:
– A technique for identifying sensitive data records within a text.
– A.k.a. Exact Match in the DLP field
• Use Case:
– Several personal data records of <SSN, Phone#, Address> are included in a text; we want to extract all records from the text to determine the sensitivity of the file.
23. Content Inspection Problems
Hhhhhdds ghghg 178-76-6754 ggkjkfddfdkkkk879-45-6785kjkjjk 43
Atword Street, Pittsburgh, PA 15260 kllkll 412-876-6789 kjkjjkj 76
Parkview Ave, Sunnyvale, CA 94086 hhsjskkdhjhjhj 408-780-8876
hjhjkjkjjj 159-87-8965 hjhjhjhjmnnmnxcbls w243 54y45 wefddew
dddw3n nn xxxxxxxxxx
SSN         | Phone #      | Address
178-76-6754 | 412-876-6789 | 43 Atword Street, Pittsburgh, PA 15260
159-87-8965 | 408-780-8876 | 76 Parkview Ave, Sunnyvale, CA 94086
……          | ……           | ……
An example: a text contains a few data records:
24. Content Inspection Problems
• Problem Definition (Model 3):
– Let S = {R1, R2, …, Rn} be a set of known data records from the same table.
– Given any text T, one needs to extract all records or sub-records from T, where the record cells may appear anywhere within the text.
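Model 3 can be sketched with plain substring checks over the example records from the previous slide. This is a simplification: a real "exact match" engine would normalize and hash the cells rather than scan raw substrings, and the threshold used here is an illustrative choice:

```python
# Sketch of record-level "exact match": count how many cells of each
# known record appear in the text, and report records with at least
# `threshold` cells present. Records are from the example slide.
records = [
    ("178-76-6754", "412-876-6789", "43 Atword Street, Pittsburgh, PA 15260"),
    ("159-87-8965", "408-780-8876", "76 Parkview Ave, Sunnyvale, CA 94086"),
]

def detect(text, records, threshold=2):
    hits = []
    for rec in records:
        found = sum(1 for cell in rec if cell in text)
        if found >= threshold:
            hits.append(rec)
    return hits

text = ("Hhhhhdds 178-76-6754 ggkj 879-45-6785 kjk "
        "43 Atword Street, Pittsburgh, PA 15260 kll 412-876-6789")
print(len(detect(text, records)))  # -> 1
```

Requiring two or more cells (a sub-record) rather than one reduces false positives: a lone phone number in a text is far weaker evidence than a phone number plus the matching SSN.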
25. Content Inspection Problems
• Problem Definition for Keyword Match:
– Let S = {K1, K2, …, Kn} be a dictionary of keywords.
– Given any text T, one needs to identify all keyword occurrences in T.
• Problem Definition for RegEx Match:
– Let S = {P1, P2, …, Pm} be a set of RegEx patterns.
– Given any text T, one needs to identify all pattern instances in T.
Easy problems?
– Not at all! For large n and m, one runs into performance issues.
– That is the problem of scalability.
– Scalable algorithms must be provided.
26. Agenda
• What is Data Loss Prevention (DLP)?
• Concepts, Models, Architecture
• Content Inspection Problems
• Practical Algorithms for DLP
• Summary
• References
• Q&A
27. Practical Algorithms for DLP
• We investigate algorithms for 2 problems:
1. Document fingerprinting
2. Multiple keyword matching
Assumption: a text T is a sequence of UTF-8 characters, without loss of generality.
28. Document Fingerprinting Algorithms
• Let's investigate algorithmic solutions for Model 2 (document fingerprinting).
• Analysis for a solution:
1. We need to construct the function SIM(T,t). For example:
– SIM(T,t) = |T ∩ t| / Min(|T|,|t|), based on common sub-strings.
2. An obvious challenge:
– If n is large, say on the scale of millions, we cannot compute SIM(T, Tk) one by one to find the t that satisfies SIM(T,t) ≥ X%.
– We need an approach that can identify possible candidates quickly.
3. General search engines like Google use keywords to index and identify documents. Should we? The answer is no: there are too many keywords, and keywords are language dependent.
4. So, which features can we use for indexing/searching?
– One answer is document fingerprints.
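One simple way to realize such a SIM() is character k-gram (shingle) overlap. This is an illustrative stand-in for the |T ∩ t| / Min(|T|,|t|) idea; the SIM used in a real DLP product may be constructed differently:

```python
# Shingle-based similarity in the spirit of SIM(T,t) = |T ∩ t| / min(|T|,|t|):
# slice each text into overlapping k-grams and measure set overlap.
def shingles(text, k=4):
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def sim(a, b, k=4):
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / min(len(sa), len(sb))

doc = "the quick brown fox jumps over the lazy dog"
edited = "the quick brown fox leaps over the lazy dog"
print(round(sim(doc, doc), 2))      # -> 1.0
print(sim(doc, edited) >= 0.7)      # moderate edits keep similarity high -> True
```

A small edit only disturbs the k-grams that overlap the changed region, so similarity degrades gracefully, which is exactly the robustness a fingerprinting scheme needs.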
29. Document Fingerprinting Algorithms
• What are document fingerprints?
– A fingerprint is a hash value
– One text has multiple fingerprints
– Uniqueness: two unrelated texts do not share any fingerprints.
– Robustness: fingerprints survive moderate textual changes.
30. Document Fingerprinting Algorithms
• How to extract fingerprints from a text?
– Anchoring point:
• A point in the text that can endure moderate changes.
• Its fixed-size neighborhood is unique to the text.
– We select a few anchoring points to generate fingerprints:
• Generate hash values around their neighborhoods.
• These hash values are the fingerprints.
• Samples of anchoring points and their neighborhoods:
Thereareabundantliteraturesonhowtogeneratedifferencebetween
twofilesBasicallytherearetwofundamentalapproachestoattackthisgenericp
roblemLCSmodelwhereLCSstandsforlargestcommonsubsequenceCalculate
thelargestcommonsubsequenceoftwostringFindasequenceofeditoperation
sbasedontheLCSsothatonecanapplytheeditoperationstothereferencefiletoc
onstructthetargetfileBlock movemodel
31. Document Fingerprinting Algorithms
• Conclusion: we have a solution that consists of two algorithms and one search technology:
– An algorithm for computing SIM(T,t)
– An algorithm for the fingerprint generator FPGEN(T)
– A fingerprint search engine
32. Document Fingerprinting Algorithms
• Fingerprint generation algorithm 1:
– INPUT: String T
• Select the top M candidate characters based on a score function
– Character frequency n
– Character positions in the text T: P(1), …, P(n)
– SCORE(c) = SQRT(n) * [P(n) - P(1)] / SQRT(D)
» where D = [P(2) - P(1)]² + [P(3) - P(2)]² + … + [P(n) - P(n-1)]²
• For each selected character c
– Create a hash around the neighborhood of each occurrence
– Sort these hashes
– Select the top N hashes
– These N hashes are fingerprints
– OUTPUT: M*N fingerprints
Note 1: M and N are pre-defined.
Note 2: Two keys of this algorithm are (a) the score function; (b) sorting the hashes.
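A runnable sketch of algorithm 1 follows. It illustrates the score-then-hash idea only and is not the patented implementation; the choice of MD5 and the neighborhood radius are assumptions made for the example:

```python
# Sketch of fingerprint generation algorithm 1: score characters by
# frequency and spread, then hash fixed-size neighborhoods around the
# occurrences of the top-scoring characters.
import hashlib

def score(positions):
    # SCORE(c) = sqrt(n) * [P(n) - P(1)] / sqrt(D), where D is the sum
    # of squared gaps between consecutive occurrence positions.
    n = len(positions)
    if n < 2:
        return 0.0
    d = sum((positions[i + 1] - positions[i]) ** 2 for i in range(n - 1))
    return (n ** 0.5) * (positions[-1] - positions[0]) / (d ** 0.5)

def fingerprints(text, m=3, top_n=4, radius=8):
    occ = {}
    for i, c in enumerate(text):
        occ.setdefault(c, []).append(i)
    # the top M candidate characters by score
    chars = sorted(occ, key=lambda c: score(occ[c]), reverse=True)[:m]
    fps = []
    for c in chars:
        neighborhood_hashes = sorted(
            hashlib.md5(text[max(0, i - radius):i + radius].encode()).hexdigest()
            for i in occ[c])
        fps.extend(neighborhood_hashes[:top_n])  # keep the top N per character
    return fps  # at most M*N fingerprints

fps = fingerprints("data loss prevention monitors data in use and in motion")
print(len(fps) > 0)  # -> True
```

Sorting the hashes before taking the top N makes the selection order-independent, so moving text around does not by itself change which fingerprints survive.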
33. Document Fingerprinting Algorithms
• About the score function:
– Why SQRT(n)?
• A measurement of frequency for the given character
• The larger the value, the more stable the character is
– Why [P(n) - P(1)] / SQRT(D)?
• A measurement of distribution for the given character
• The larger the value, the more evenly distributed the character, and the more stable the character
• WHY? Think about a constrained optimization problem:
– min f(X1, X2, …, Xm) = X1² + X2² + … + Xm²
» subject to X1 + X2 + … + Xm = c and Xk ≥ 0, k = 1, 2, …, m
Note: The solution of the optimization problem is Xk = c/m, k = 1, 2, …, m
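The claim that the minimum is attained at equal values follows from a one-line Cauchy-Schwarz argument, sketched here:

```latex
% Even gaps minimize the sum of squares under a fixed sum: by Cauchy--Schwarz,
c^{2} \;=\; \Big(\sum_{k=1}^{m} X_k\Big)^{2} \;\le\; m \sum_{k=1}^{m} X_k^{2}
\quad\Longrightarrow\quad
f(X) \;=\; \sum_{k=1}^{m} X_k^{2} \;\ge\; \frac{c^{2}}{m},
% with equality if and only if X_1 = X_2 = \cdots = X_m = c/m.
```

So for a fixed span P(n) - P(1), evenly spaced occurrences minimize D and therefore maximize the score, which is why the score function favors evenly distributed characters.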
34. Document Fingerprinting Algorithms
There are alternative algorithms for constructing a fingerprint generation function. We recently constructed algorithm 2:
– A novel approach based on a rolling hash function H(x);
– It selects anchoring points with a first filter H(x) = 0 mod p;
– It further selects anchoring points with a heuristic second filter;
– It also employs an asymmetric architecture for fingerprint matching.
Note 1: The anchoring points have better distribution across the text.
Note 2: Two keys of this algorithm are (a) the rolling hash; (b) asymmetric use of the two filters.
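The first filter can be sketched with a rolling polynomial hash. The window size, base, and moduli below are illustrative, and the heuristic second filter and asymmetric matching are omitted:

```python
# Content-defined anchoring: slide a window over the text with a rolling
# hash and mark window starts where H(window) == 0 (mod p) as anchors.
# Because the hash depends only on window content, anchors inside an
# unedited region survive insertions elsewhere in the text.
def anchors(text, w=4, p=5, b=256, q=1_000_003):
    high = pow(b, w - 1, q)   # b^(w-1) mod q, precomputed
    h, points = 0, []
    for i, ch in enumerate(text):
        if i < w:
            h = (h * b + ord(ch)) % q          # build the first window
        else:
            h = ((h - ord(text[i - w]) * high) * b + ord(ch)) % q
        if i >= w - 1 and h % p == 0:
            points.append(i - w + 1)           # window start is an anchor
    return points

a1 = anchors("the quick brown fox jumps over the lazy dog")
a2 = anchors("xy" + "the quick brown fox jumps over the lazy dog")
# anchors in the unedited region realign after a 2-character insertion
print({p + 2 for p in a1} == {p for p in a2 if p >= 2})  # -> True
```

This resilience to shifts is what fixed-position sampling lacks, and it is the reason rolling-hash anchoring gives the "better distribution across text" noted above.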
35. Multiple Keyword Match
Essentially, it is a multi-pattern string match problem.
Problem Definition:
– Let S = {P1, P2, …, Pk} be multiple short strings as patterns;
– Given any string T, one needs to identify all pattern occurrences in T.
36. Multiple Keyword Match
Existing string match algorithms:
Algorithm | Type
Naïve string match | One pattern
Knuth–Morris–Pratt | One pattern
Boyer-Moore | One pattern
Boyer-Moore-Horspool | One pattern
Boyer-Moore-Horspool-Raita | One pattern
Rabin-Karp | Multiple patterns
Aho-Corasick | Multiple patterns
Sun-Manber | Multiple patterns
37. Multiple Keyword Match
Boyer-Moore-Horspool (BMH) Algorithm
Key elements of the algorithm:
– Character comparison can be made from right to left, starting from the end of the pattern.
– Ending character heuristics
• Suppose we are pointing to character R[i] and try to compare it with the ending character of P (of length m).
• Bad character
– If R[i] ≠ P[m] and R[i] is not included in P's alphabet, then it is safe for the pointer to skip m positions, arriving at R[i+m].
– If R[i] ≠ P[m], R[i] is included in P's alphabet, and R[i]'s last occurrence within P has distance q from the end of P, then it is safe for the pointer to skip q positions, arriving at R[i+q].
• Good character
– If R[i] = P[m], P is not matched, and R[i] has no other occurrence within P, then it is safe for the pointer to skip m positions, arriving at R[i+m].
– If R[i] = P[m], P is not matched, and R[i]'s last occurrence other than P[m] has distance q from the end of P, then it is safe for the pointer to skip q positions, arriving at R[i+q].
• Matched instance
– If R[i] = P[m] and P is matched, then save the instance.
– It is almost safe to move the pointer m positions, arriving at R[i+m].
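The skip heuristics above collapse into a single shift table in a compact BMH sketch:

```python
# Boyer-Moore-Horspool: precompute, for each character, how far the
# window may safely slide when that character is under the pattern's
# last position; characters absent from the pattern allow a full skip.
def bmh_search(text, pattern):
    m, n = len(pattern), len(text)
    if m == 0 or n < m:
        return []
    # default shift m; pattern characters (except the last position)
    # shift by their distance to the end of the pattern
    shift = {c: m for c in set(text)}
    for j in range(m - 1):
        shift[pattern[j]] = m - 1 - j
    hits, i = [], 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            hits.append(i)
        i += shift.get(text[i + m - 1], m)
    return hits

print(bmh_search("abracadabra", "abra"))  # -> [0, 7]
```

The average-case sublinearity comes from those skips: most windows are rejected after one character comparison and the pointer jumps several positions at once.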
38. Multiple Keyword Match
• Rabin-Karp Algorithm
– Hash-based string match
• Rabin-Karp hash function H(S):
– For a given string S = x1 x2 … xm of length m, a hash function can be constructed as:
• H(S) = x1·b^(m-1) + x2·b^(m-2) + … + x(m-1)·b + xm mod q
• where b is a base number (usually b = 256) and q is a big prime number.
– For pattern P, H(P) = p1·b^(m-1) + p2·b^(m-2) + … + p(m-1)·b + pm mod q
– If we denote Rk = R[k, k+m-1], we can derive H(R(k+1)) from H(Rk) at relatively small cost:
– H(R(k+1)) = [H(Rk) - rk·b^(m-1)]·b + r(k+m) mod q
– This is an iterative formula, a common practice for algorithm optimization.
39. Multiple Keyword Match
• Rabin-Karp hash function:
– The quantity b^(m-1) mod q can be pre-calculated to save CPU time.
– Each iteration then needs only 5 arithmetic operations.
• It can be further reduced to 4 by considering the number rk·b^(m-1).
– Horner's rule:
• H(S) = (…((x1·b + x2)·b + x3)·b + … + x(m-1))·b + xm mod q
• Yet another formula for performance tuning.
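The two formulas can be demonstrated in a few lines; b = 256 and a large prime q follow the slides, the rest is an illustrative sketch:

```python
# Rolling-hash demo: the first window hash is built with Horner's rule,
# each subsequent H(R_{k+1}) is derived from H(R_k) in O(1) using the
# precomputed b^(m-1) mod q.
def rk_demo(text, m, b=256, q=1_000_000_007):
    high = pow(b, m - 1, q)          # b^(m-1) mod q, precomputed once
    h = 0
    for c in text[:m]:               # Horner's rule for the first window
        h = (h * b + ord(c)) % q
    hashes = [h]
    for k in range(len(text) - m):   # slide the window one step at a time
        h = ((h - ord(text[k]) * high) * b + ord(text[k + m])) % q
        hashes.append(h)
    return hashes

hs = rk_demo("abcabc", 3)
print(hs[0] == hs[3])  # the two "abc" windows hash identically -> True
```

Equal window content always gives equal hashes; a hash hit still needs one direct string comparison to rule out the rare collision, as the multi-pattern algorithm on the next slide does.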
40. Multiple Keyword Match
• Rabin-Karp algorithm for multiple patterns:
– Input:
• String R, multiple patterns {P1, …, Pk}
• n = Length(R), mj = Length(Pj), q, b
– Procedure:
• Step 0:
– Let m = Min(mj)
– Calculate the number b^(m-1) mod q
– Calculate all H(Pj[1,…,m]) (j = 1, …, k) and H(R1) by Horner's rule
• Step 1: Let i = 1
• Step 2: If there exists j in {1, 2, …, k} such that H(Pj[1,…,m]) = H(Ri) and Pj = R[i, …, mj+i-1], it is a match; output the instance
• Step 3: i = i + 1
• Step 4: If i > n - m, stop
• Step 5: Calculate H(R(i+1)) using the iterative formula
• Step 6: Go to Step 2
– Output: All matched instances
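The steps above can be sketched directly; the function name `rk_multi` and the example strings are illustrative:

```python
# Multi-pattern Rabin-Karp: hash the length-m prefix of every pattern
# (m = shortest pattern length), slide one rolling window over R, and
# verify full patterns only on prefix-hash hits.
def rk_multi(R, patterns, b=256, q=1_000_000_007):
    m = min(len(p) for p in patterns)
    def h(s):
        v = 0
        for c in s:                       # Horner's rule
            v = (v * b + ord(c)) % q
        return v
    table = {}                            # prefix hash -> candidate patterns
    for p in patterns:
        table.setdefault(h(p[:m]), []).append(p)
    high = pow(b, m - 1, q)               # b^(m-1) mod q
    hv, hits = h(R[:m]), []
    for i in range(len(R) - m + 1):
        for p in table.get(hv, []):       # Step 2: verify on hash hit
            if R[i:i + len(p)] == p:
                hits.append((i, p))
        if i + m < len(R):                # Step 5: iterative formula
            hv = ((hv - ord(R[i]) * high) * b + ord(R[i + m])) % q
    return hits

print(rk_multi("she sells seashells", ["sea", "sells", "she"]))
# -> [(0, 'she'), (4, 'sells'), (10, 'sea'), (13, 'she')]
```

Hashing only the common-length prefix is what lets a single rolling window serve patterns of different lengths; the direct comparison then confirms the full pattern.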
41. Multiple Keyword Match
A practical hybrid method: BMH or Rabin-Karp
– If k < Magic-number, use BMH k times;
– otherwise, use Rabin-Karp.
– Magic-number = 100 is my practice in DLP products.
Rabin-Karp has a weakness:
• When min{Length(Pi) | i = 1, 2, …, k} is small, say less than 4, we have trouble.
• We need to introduce an efficient multiple-pattern match for short patterns.
42. Multiple Keyword Match
We have a complementary solution to the RK algorithm for handling multiple short patterns:
– the reverse-trie matching algorithm.
A reverse trie represents a set of keywords; in particular, it works well for CJK languages in UTF-8 encoding.
For the keyword set {abc, abcd, acd}, inserting each keyword in reverse (cba, dcba, dca) yields:

root
├─ c ─ b ─ a (matches abc)
└─ d ─ c ─┬─ b ─ a (matches abcd)
          └─ a (matches acd)
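A minimal reverse-trie sketch follows; the dict-based nodes and the "$" end marker are illustrative implementation choices:

```python
# Reverse-trie matching: keywords are inserted reversed, and at each
# text position we walk backwards, so every keyword *ending* there is
# found in one pass. This suits short patterns, e.g. CJK keywords.
def build_reverse_trie(keywords):
    root = {}
    for w in keywords:
        node = root
        for c in reversed(w):
            node = node.setdefault(c, {})
        node["$"] = w                      # end-of-keyword marker
    return root

def find_keywords(text, root):
    hits = []
    for end in range(1, len(text) + 1):
        node = root
        for i in range(end - 1, -1, -1):   # walk backwards from `end`
            node = node.get(text[i])
            if node is None:
                break
            if "$" in node:
                hits.append((i, node["$"]))
    return hits

trie = build_reverse_trie(["abc", "abcd", "acd"])
print(find_keywords("xxabcdyyacd", trie))
# -> [(2, 'abc'), (2, 'abcd'), (8, 'acd')]
```

Each backward walk is bounded by the longest keyword, so for short keywords the scan stays near-linear regardless of how many patterns the trie holds.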
43. Agenda
• What is Data Loss Prevention (DLP)?
• Concepts, Models, Architecture
• Content Inspection Problems
• Practical Algorithms for DLP
• Summary
• References
• Q&A
44. Summary
• What DLP is
• DLP Security Model
• Architecture of a DLP System
• Four Content Inspection Problems
• Two Algorithms for DLP Content Inspection
– Document Fingerprinting
– Multi-keyword Matching
45. References
• Liwei Ren et al., Document fingerprinting with asymmetric selection of anchor points, US patent 8359472
• Liwei Ren et al., Two tiered architecture of named entity recognition engine, US patent 8321434
• Yingqiang Lin et al., Scalable document signature search engine, US patent 8266150
• Liwei Ren et al., Fingerprint based entity extraction, US patent 7950062
• Liwei Ren et al., Document match engine using asymmetric signature generation, US patent 7860853
• Liwei Ren et al., Match engine for querying relevant documents, US patent 7747642
• Liwei Ren et al., Matching engine with signature generation, US patent 7516130
47. Thank You!
Innovation is not a part-time job, and it is not even a full-time job. It's a lifestyle.