Data scientists will need to pay attention to the EU General Data Protection Regulation (GDPR), set to be published in early 2016. Fines for violations are massive.
Big Data Expo 2015 - Data Science Innovation Privacy Considerations (BigDataExpo)
Data science techniques are capable of producing unanticipated insights from data, with many of these insights potentially crossing the boundary from personalized into intrusive, and even generating PII from seemingly anonymous data. Our ability to mathematically derive insights increases with the rise of highly personalized technologies such as mobile devices, wearables, and the Internet of Things. At the same time, inexpensive NoSQL data stores and cloud technologies have dramatically lowered the threshold for an organization to archive Big Data "just in case", without truly understanding the data privacy ramifications.
Beginning with an overview of the emerging field of data science, we will discuss how efforts to produce and leverage increasingly personalized insights interact with implicit and explicit privacy concerns. The discussion will cover a range of analytic methodologies, data stores, and data sources, as well as data protection and the balance between appropriate and inappropriate personalization.
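The claim above, that PII can be derived from seemingly anonymous data, can be made concrete with a small sketch. Everything here is invented for illustration (the datasets, field names, and the `reidentify` helper are not from the talk): linking a "de-identified" dataset to a public one on shared quasi-identifiers can recover identities when a quasi-identifier combination is unique.

```python
# Hypothetical illustration: re-identifying "anonymous" records by joining
# them to a public dataset on quasi-identifiers (ZIP, birth year, sex).
# All records below are invented.

def reidentify(anonymous_rows, public_rows, quasi_ids):
    """Match anonymous rows to named public records sharing the same
    quasi-identifier values; keep only unique (unambiguous) matches."""
    index = {}
    for person in public_rows:
        key = tuple(person[q] for q in quasi_ids)
        index.setdefault(key, []).append(person["name"])
    matches = {}
    for row in anonymous_rows:
        key = tuple(row[q] for q in quasi_ids)
        candidates = index.get(key, [])
        if len(candidates) == 1:          # unique match -> re-identified
            matches[row["record_id"]] = candidates[0]
    return matches

anonymous = [  # "de-identified" health data: names removed, diagnosis kept
    {"record_id": 1, "zip": "02138", "birth_year": 1945, "sex": "F", "dx": "flu"},
    {"record_id": 2, "zip": "02139", "birth_year": 1962, "sex": "M", "dx": "asthma"},
]
public = [     # public roll with names and the same quasi-identifiers
    {"name": "A. Smith", "zip": "02138", "birth_year": 1945, "sex": "F"},
    {"name": "B. Jones", "zip": "02139", "birth_year": 1962, "sex": "M"},
    {"name": "C. Brown", "zip": "02139", "birth_year": 1962, "sex": "M"},
]
linked = reidentify(anonymous, public, ["zip", "birth_year", "sex"])
```

Record 1 has a unique quasi-identifier combination and is re-identified; record 2 matches two people and stays ambiguous. This is the mechanism behind well-known de-anonymization results.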
How to protect your business post EU-US Privacy Shield (Andreas Wild)
This document discusses steps organizations should take after the end of the EU-US Privacy Shield framework. It recommends immediately reassessing all data flows from the EU to non-EU countries. Organizations should conduct a data mapping exercise and a case-by-case assessment of whether standard contractual clauses provide adequate protection. They should consider implementing additional safeguards, such as encryption, for existing EU data flows. The document also provides tips for protecting the business in the meantime, such as documenting GDPR compliance and assessing data paths and potential leakages. Long-term protection comes from implementing strong data governance policies.
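One of the safeguards mentioned, encrypting data before it leaves the EU, can be sketched in a few lines. This is a minimal illustration (not from the document) using the `cryptography` package's Fernet recipe; the record contents are invented.

```python
# Hedged sketch: encrypt records client-side before they are stored or
# transferred outside the EU. Requires: pip install cryptography
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # keep the key under the EU entity's control
f = Fernet(key)

record = b'{"name": "Jane Doe", "email": "jane@example.eu"}'
token = f.encrypt(record)          # ciphertext is safe to store abroad
plain = f.decrypt(token)           # only the key holder can recover the data
```

The design point is that the ciphertext can flow to a non-EU processor while the key, and hence the ability to read personal data, stays with the EU controller.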
Expanded Top Ten Big Data Security and Privacy Challenges (Tom Kirby)
The document discusses the top 10 security and privacy challenges of big data. It begins by explaining how big data has expanded through streaming cloud technology, rendering traditional security mechanisms inadequate. It then outlines a 3-step process used to identify the top 10 challenges: 1) interviewing CSA members and reviewing trade journals to draft an initial list, 2) studying published solutions, and 3) characterizing remaining problems as challenges if solutions did not adequately address problem scenarios. The top 10 challenges are then grouped into 4 aspects: infrastructure security, data privacy, data management, and integrity and reactive security. The first challenge discussed in detail is securing computations in distributed programming frameworks.
Integris Software, the global leader in data privacy automation, helps enterprises discover and control the use of sensitive data in a way that protects privacy and fuels innovation. Visit us https://integris.io/
Big data security challenges and recommendations! (cisoplatform)
What will you learn:
- Key Insights on Existing Big Data Architecture
- Unique Security Risks and Vulnerabilities of Big Data Technologies
- Top 5 Solutions to mitigate these security challenges
The REAL Impact of Big Data on Privacy (Claudiu Popa)
The awesome promise of Big Data is tempered by the need to protect personal information. Data scientists must expertly navigate the legislative waters and acquire the skills to protect privacy and security. This talk provides enterprise leaders with answers and suggests questions to ask when the time comes to consider the vast opportunities offered by big data.
Big data contains valuable information, some of it sensitive customer data, that can be a honeypot for internal and external attackers. Given the risk involved, organizations must proactively enhance defenses and prevent data breaches. The four steps outlined in this deck help organizations develop a holistic approach to data security and privacy.
Privacy Secrets Your Systems May Be Telling (Rebecca Leitch)
Privacy has overtaken security as a top concern for many organizations. New laws such as GDPR come with steep fines and stringent rules, and more are certain to come. Attend this webcast to learn how everyday business operations put customer privacy data at risk and, more importantly, to understand best practices for protecting this data and dealing with disclosure requirements. Topics include:
* Types of privacy and threats to them
* How is privacy different from security?
* Business systems putting you most at risk
Trivadis TechEvent 2016 Big Data Privacy and Security Fundamentals by Florian... (Trivadis)
In Big Data we focus on the four V's: Volume, Velocity, Variety, and Veracity. But another important topic is often overlooked: privacy and security. It is just as important, and if not considered from the beginning it can put your Big Data project at risk. Learn about the most important privacy and security fundamentals you should take into account in your next Big Data project.
The document discusses how big data, increased data volumes, and weaknesses in security present a "perfect storm" risk scenario. It notes that while big data deployments are growing fast to realize business value, security is often not properly prioritized or implemented. This can allow breaches to go undetected. The document also outlines how data sources and volumes are expanding dramatically, while relevant security skills remain limited. Overall it argues that the confluence of these factors poses significant security challenges for organizations working with big data.
Internet of Things With Privacy in Mind (Gosia Fraser)
This document discusses privacy considerations for Internet of Things devices. It notes that IoT devices collect personal data that, even when fragmented, can reveal sensitive information when aggregated and analyzed. Many IoT manufacturers do not adequately explain how they collect, use, store, and allow deletion of personal information. The document advocates adopting privacy by design principles to build privacy protections into IoT technologies from the early stages of development through privacy impact assessments and data protection impact assessments. This helps understand privacy needs, shape better policies, and improve transparency while demonstrating adherence to high data protection standards.
The National Security Agency's (NSA) surveillance system known as "PRISM" is no surprise to most information technology experts. In fact, many of the companies that shared information with the NSA (AOL Inc., Apple Inc., Facebook Inc., Google Inc., Microsoft Corp., Yahoo Inc., Skype, YouTube, and Paltalk) are trying very hard to mitigate the risks involved in securing this technology.
Big Data Security and Privacy - Presentation to AFCEA Cyber Symposium 2014 (kevintsmith)
In our era of “Big Data”, organizations are collecting, analyzing, and making decisions based on analysis of massive amounts of data sets from various sources, and security in this process is becoming increasingly more important. With regulations like HIPAA and other privacy protection laws, securing access and determining releasability of data sets is critical. Organizations using Big Data Analytics solutions face challenges, as most of today’s solutions were not designed with security in mind. This presentation focuses on challenges, use cases, and practical real-world solutions related to securing and preserving privacy in Big Data Analytics solutions, addressing authorization, differential privacy, and more.
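Differential privacy, one of the techniques the presentation mentions, can be illustrated with the classic Laplace mechanism: a query answer is perturbed with noise scaled to its sensitivity divided by the privacy budget epsilon. This sketch is not from the presentation; the function names and dataset are invented.

```python
# Minimal Laplace-mechanism sketch for a differentially private count query.
import math
import random

def laplace_noise(scale):
    # Inverse-CDF sample from Laplace(0, scale).
    u = random.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def private_count(values, predicate, epsilon):
    """Return a noisy count of values matching `predicate`.
    Sensitivity of a count is 1: adding or removing one person
    changes the true answer by at most 1."""
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon)
```

Smaller epsilon means more noise and stronger privacy; with a very large epsilon the noisy answer stays close to the true count, which makes the trade-off easy to see experimentally.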
Privacy by Design and by Default + General Data Protection Regulation with Si... (Peter Procházka)
My presentation for SUG Hungary, delivered on 26 June 2018, on Privacy by Design and by Default and the General Data Protection Regulation with Sitecore.
TrustArc Webinar: Challenges & Risks Of Data Graveyards (TrustArc)
With the rise of big data, companies now collect and store data in massive quantities. As a result, they end up with giant repositories of unused data sitting on their servers, also called data graveyards. The risks posed by data graveyards are numerous: storage infrastructure and maintenance costs, compliance with privacy laws, security gaps, and the risk of data corruption.
What can organizations do with a large amount of data? How can you uncover the value of data before storing it? How can you manage the maintenance costs of big data?
Join our panel in this webinar as we explore how your company should manage the risks and challenges associated with data graveyards.
This webinar will review:
- What data graveyards are
- How to manage data graveyards risks
- How to define data retention periods and stay compliant
The document outlines 4 steps for successful data mapping and identification during discovery: 1) Conduct a thorough investigation of where potentially responsive data is located; 2) Understand who has access to the data as these individuals can help locate additional data sources; 3) Consider data retention, destruction and archiving policies to prevent accidental data loss; 4) Identify hard copies of documents that may supplement electronic data. Taking these steps to map out a data acquisition plan can increase the value of the discovery process and the likelihood of case success.
Wearable technologies, privacy and intellectual property rights (Giulio Coraggio)
Outline of main legal issues connected to the usage of wearable technologies with particular reference to privacy and data protection, intellectual property rights and confidentiality
Privacy by Design - taking into account the state of the art (James Mulhern)
Establishing transparency and building trust provide an opportunity to develop greater, more meaningful relationships with data subjects, i.e. people, customers, colleagues... In turn, this can lead to more effective and valuable services that help transform organisations.
A "privacy by design" approach can help achieve this, but it doesn't happen by accident, and transformation doesn't occur overnight. So a deliberate approach that looks beyond May 2018 and beyond mere compliance is required.
Presentation to representatives from the technology and local government sectors at techUK, the UK's trade association for the technology sector.
This document discusses privacy by design principles for software development. It outlines key concepts like data subjects, controllers, processors and regulators. The 7 guiding principles of privacy by design are described. Implementation considerations include legal requirements for data transfers, privacy policies, impact assessments and training. Typical privacy issues for mobile/web apps are listed. Examples of implementation include opt-in mechanisms and restricting data access. Working with providers outside the EU poses high risks of non-compliance.
Privacy by Design as a system design strategy - EIC 2019 (Sagara Gunathunga)
1) Privacy by Design (PbD) is an approach to system design that emphasizes privacy and data protection through the entire lifecycle. The 7 PbD principles include making privacy the default, embedding privacy into design, and keeping systems user-centric and transparent.
2) To apply PbD, personal data should be separated from other business data and stored securely in a separate system. Standard protocols like SAML and OAuth2 should be used to share personal data securely.
3) When designing a personal data repository, transparency, data minimization, and giving users control over their data through a self-care portal are important considerations.
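The separation idea in point 2, keeping personal data out of business records and in a dedicated, better-protected store, can be sketched briefly. Everything here is illustrative (the keyed-hash pseudonymization scheme, store names, and `record_order` are not from the talk).

```python
# Hedged sketch: business records carry only an opaque pseudonym; the PII
# lives in a separate store that can be governed and secured independently.
import hashlib
import hmac
import secrets

PSEUDONYM_KEY = secrets.token_bytes(32)   # held only by the PII service

def pseudonymize(email):
    # Keyed hash: stable pseudonym per person, not reversible without
    # access to the PII store's mapping.
    return hmac.new(PSEUDONYM_KEY, email.encode(), hashlib.sha256).hexdigest()[:16]

pii_store = {}   # pseudonym -> personal data (separate, restricted system)
orders = []      # business data: contains no direct identifiers

def record_order(email, item):
    pid = pseudonymize(email)
    pii_store.setdefault(pid, {"email": email})
    orders.append({"customer": pid, "item": item})

record_order("alice@example.org", "book")
record_order("alice@example.org", "pen")
```

Analytics can still join orders per customer via the pseudonym, while erasing a person's entry from `pii_store` severs the link to their identity, which supports data-minimization and deletion requests.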
The document discusses securing big data in enterprises. It notes that big data presents both challenges and opportunities for security. Throughout the data lifecycle, from collection to analysis, security is crucial. This involves securing access to data, enforcing policies, detecting threats, and protecting data across systems. With the right tools for logging, analysis, and reporting, organizations can better understand normal network activity and secure vast amounts of information to leverage the opportunities big data provides.
The Privacy Law Landscape: Issues for the research community (ARDC)
Presentation by Anna Johnston of Salinger Privacy to ARDC's 'GDPR and NDB scheme: Intersection with the Australian research sector' webinar on 13 September 2018
Securing, storing and enabling safe access to data (Robin Rice)
Invited talk as part of Westminster Insight Research Data Management Forum, https://www.westminsterinsight.co.uk/event/3416/Research_Data_Management_Forum
IAPP Canada Privacy Symposium - "Data Retention Is a Team Sport: How to Get It... (Blancco)
The document discusses how companies need to implement strong data retention policies and procedures to comply with increasing data privacy regulations, properly classify and manage data through its lifecycle, and ensure all data is securely erased at end-of-life through an information lifecycle management approach involving key stakeholders like IT, legal, and data owners. It highlights how simply deleting or formatting data is not enough and certified data erasure tools and processes are needed to prevent data breaches and regulatory fines from non-compliant data disposal.
Sqrrl Enterprise: Big Data Security Analytics Use Case (Sqrrl)
Organizations are utilizing Sqrrl Enterprise to securely integrate vast amounts of multi-structured data (e.g., tens of petabytes) onto a single Big Data platform and then are building real-time applications using this data and Sqrrl Enterprise’s analytical interfaces. The secure integration is enabled by Accumulo’s innovative cell-level security capabilities and Sqrrl Enterprise’s security extensions, such as encryption.
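The cell-level security idea can be illustrated conceptually: each cell carries a visibility label, and a scan returns only cells whose label is satisfied by the reader's authorizations. This is a toy sketch, not Accumulo's actual (Java) API; Accumulo labels support nested boolean expressions, while this version handles only flat "a|b" (any) and "a&b" (all) forms.

```python
# Conceptual sketch of cell-level visibility filtering (Accumulo-style).
# Labels and cell contents below are invented.

def visible(label, auths):
    """True if the reader's authorizations satisfy the visibility label.
    Handles only flat OR ("a|b") and AND ("a&b") expressions."""
    if "|" in label:
        return any(tok in auths for tok in label.split("|"))
    if "&" in label:
        return all(tok in auths for tok in label.split("&"))
    return label in auths

cells = [
    {"row": "user1", "value": "name=Jane",             "vis": "pii&eu"},
    {"row": "user1", "value": "last_login=2014-01-01", "vis": "ops|pii"},
]

def scan(cells, auths):
    # A scan filters at the cell level, so two readers of the same row
    # can legitimately see different subsets of its cells.
    return [c["value"] for c in cells if visible(c["vis"], auths)]
```

The design advantage is that differently classified data can share one table: authorization is enforced per cell at read time rather than by splitting data into per-audience copies.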
This document provides an overview of the General Data Protection Regulation (GDPR) and outlines steps for compliance. It begins with a disclaimer about the information provided. It then lists resources for learning more about the GDPR and its 99 articles and 173 recitals. The rest of the document outlines key aspects of GDPR compliance, including identifying high and critical risk data, privacy notices, individual rights and redress, lawful and fair processing, privacy by design, data security, and data transfers.
The Internet services, web and mobile applications, and pervasive communication widely available today, which meet many of our needs, have stimulated the production of tremendous amounts of data (call metadata, texts, emails, social media updates, photos, videos, location, etc.). The computing power available today, in conjunction with trending technologies such as data mining and analytics, machine learning, and computational linguistics, provides an opportunity for business and government organizations to manage, search, analyze, and visualize vast amounts of data as information.
Companies known as data brokers collect consumer data, including behavioral and private information, and then sell it to companies that use it for personalized marketing and selling. There is no doubt that this is good for businesses, but is it equally good for consumers? Does it only improve the customer buying experience? How reliable is this kind of data for companies? And how can the new opportunities Big Data brings to companies be balanced against the privacy concerns it raises for consumers?
In the proposed talk we will try to find answers to some of these and other questions.
Customer data and the new EU privacy law - May 2016 (Andrew Sanderson)
Under the new EU law, international businesses that do not handle the personal data of EU citizens correctly may be fined up to 4% of global revenues.
The grace period for adapting processes to comply with the law begins 25 May 2016 and ends 25 May 2018.
This presentation explains why *all* customer data counts as "personal information".
Written by an EU marketer for non-EU marketers in international business. Enjoy.
Netflix receives 2 billion API requests per day from users and makes 12 billion outbound requests from its personalization engine to power recommendations. The personalization engine uses data on users, movies, ratings, reviews, and similar movies to conduct A/B tests, and has seen 30x growth over two years. The document closes by requesting feedback on the presentation and the conference.
HBase and Drill: How Loosely Typed SQL is Ideal for NoSQL (MapR Technologies)
From the Hadoop Summit 2015 Session with Ted Dunning:
The Apache HBase approach to data has a huge potential for expressing NoSQL-y, non-relational programs. Apache Drill supports SQL for non-relational data. Paradoxically, combining this NoSQL with this SQL tool results in something even better. I will show and explain how to combine HBase and Drill to access time series data and to support high performance secondary indexing.
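To make the HBase-plus-Drill combination concrete: Drill exposes HBase tables to ANSI SQL through its hbase storage plugin, and queries can be submitted over Drill's REST API (`POST /query.json`). The table name, column family, and qualifiers below are invented for illustration; the payload shape and endpoint follow Drill's documented REST interface.

```python
# Hedged sketch: submitting SQL over HBase-resident data via Drill's REST API.
import json
import urllib.request

DRILL_URL = "http://localhost:8047/query.json"  # default Drill web port

def drill_payload(sql):
    # Drill's REST API expects {"queryType": "SQL", "query": "..."}.
    return json.dumps({"queryType": "SQL", "query": sql}).encode()

# HBase stores bytes; CONVERT_FROM decodes them for SQL consumption.
# `sensor_readings` and the `metrics` column family are hypothetical.
SQL = """
SELECT CONVERT_FROM(t.row_key, 'UTF8')      AS sensor_id,
       CONVERT_FROM(t.metrics.temp, 'UTF8') AS temp
FROM hbase.`sensor_readings` t
LIMIT 10
"""

def run(sql):
    """POST the query to a running Drill instance and return decoded JSON."""
    req = urllib.request.Request(
        DRILL_URL, data=drill_payload(sql),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

This is the "SQL over NoSQL" pairing the session describes: HBase keeps its schemaless, byte-oriented storage model, while Drill layers relational queries on top without a fixed schema being declared up front.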
Trivadis TechEvent 2016 Big Data Privacy and Security Fundamentals by Florian...Trivadis
In Big Data we focus on the 4 V's: Volume, Velocity, Varity and Veracity. But another important topic is often not in the focus: Privacy and Security. Yet as important and if not considered from the beginning it might put your Big Data project at risk. Learn about most important Privacy and Security fundamentals in Big Data, you should take into account in your next Big Data project.
The document discusses how big data, increased data volumes, and weaknesses in security present a "perfect storm" risk scenario. It notes that while big data deployments are growing fast to realize business value, security is often not properly prioritized or implemented. This can allow breaches to go undetected. The document also outlines how data sources and volumes are expanding dramatically, while relevant security skills remain limited. Overall it argues that the confluence of these factors poses significant security challenges for organizations working with big data.
Internet of Things With Privacy in MindGosia Fraser
This document discusses privacy considerations for Internet of Things devices. It notes that IoT devices collect personal data that, even when fragmented, can reveal sensitive information when aggregated and analyzed. Many IoT manufacturers do not adequately explain how they collect, use, store, and allow deletion of personal information. The document advocates adopting privacy by design principles to build privacy protections into IoT technologies from the early stages of development through privacy impact assessments and data protection impact assessments. This helps understand privacy needs, shape better policies, and improve transparency while demonstrating adherence to high data protection standards.
The National Security Agency's (NSA) surveillance system known as “PRISM” is not a surprise to most Information Technology Experts. In fact, many experts such as the companies (AOL Inc., Apple Inc., Facebook Inc., Google Inc., Microsoft Corp., Yahoo Inc., Skype, YouTube and Paltalk) who shared the information with NSA are trying very hard to mitigate the risks in securing this technology;
Big Data Security and Privacy - Presentation to AFCEA Cyber Symposium 2014kevintsmith
In our era of “Big Data”, organizations are collecting, analyzing, and making decisions based on analysis of massive amounts of data sets from various sources, and security in this process is becoming increasingly more important. With regulations like HIPAA and other privacy protection laws, securing access and determining releasability of data sets is critical. Organizations using Big Data Analytics solutions face challenges, as most of today’s solutions were not designed with security in mind. This presentation focuses on challenges, use cases, and practical real-world solutions related to securing and preserving privacy in Big Data Analytics solutions, addressing authorization, differential privacy, and more.
Privacy by Design and by Default + General Data Protection Regulation with Si...Peter Procházka
My presentation for SUG Hungary presented on 26.06.2018 with topic Privacy by Design and by Default and General Data Protection Regulation with Sitecore
TrustArc Webinar: Challenges & Risks Of Data GraveyardsTrustArc
With the rise of big data, companies now obtain and store many data in massive quantities. As a result, they end up having giant repositories of unused data stored in their servers, also called data graveyards.
Storage infrastructure, maintenance costs, compliance with privacy laws, security gaps, and risk of data corruption: risks due to data graveyards are numerous.
What can organizations do with a large amount of data? How can you uncover the value of data before storing it? How can you manage the maintenance costs of big data?
Join our panel in this webinar as we explore how your company should manage the risks and challenges associated with data graveyards.
This webinar will review:
- What data graveyards are
- How to manage data graveyards risks
- How to define data retention periods and stay compliant
The document outlines 4 steps for successful data mapping and identification during discovery: 1) Conduct a thorough investigation of where potentially responsive data is located; 2) Understand who has access to the data as these individuals can help locate additional data sources; 3) Consider data retention, destruction and archiving policies to prevent accidental data loss; 4) Identify hard copies of documents that may supplement electronic data. Taking these steps to map out a data acquisition plan can increase the value of the discovery process and the likelihood of case success.
Wearable technologies, privacy and intellectual property rightsGiulio Coraggio
Outline of main legal issues connected to the usage of wearable technologies with particular reference to privacy and data protection, intellectual property rights and confidentiality
Privacy by Design - taking in account the state of the artJames Mulhern
Establishing transparency and building trust provide an opportunity to develop greater, more meaningful relationships with data subjects i.e people, customers, colleagues... in turn this can lead to more effective and valuable services that help transform organisations.
A "Privacy by design" approach can help achieve this but it doesn't happen by accident and transformation doesn't occur over night. So a deliberate approach that looks beyond May 2018 and compliance is required.
Presentation to representatives from the technology and Local Government sectors at TechUK, the UK's trade association for the technology.
This document discusses privacy by design principles for software development. It outlines key concepts like data subjects, controllers, processors and regulators. The 7 guiding principles of privacy by design are described. Implementation considerations include legal requirements for data transfers, privacy policies, impact assessments and training. Typical privacy issues for mobile/web apps are listed. Examples of implementation include opt-in mechanisms and restricting data access. Working with providers outside the EU poses high risks of non-compliance.
Privacy by Design as a system design strategy - EIC 2019 Sagara Gunathunga
1) Privacy by Design (PbD) is an approach to system design that emphasizes privacy and data protection through the entire lifecycle. The 7 PbD principles include making privacy the default, embedding privacy into design, and keeping systems user-centric and transparent.
2) To apply PbD, personal data should be separated from other business data and stored securely in a separate system. Standard protocols like SAML and OAuth2 should be used to share personal data securely.
3) When designing a personal data repository, transparency, data minimization, and giving users control over their data through a self-care portal are important considerations.
The document discusses securing big data in enterprises. It notes that big data presents both challenges and opportunities for security. Throughout the data lifecycle, from collection to analysis, security is crucial. This involves securing access to data, enforcing policies, detecting threats, and protecting data across systems. With the right tools for logging, analysis, and reporting, organizations can better understand normal network activity and secure vast amounts of information to leverage the opportunities big data provides.
The Privacy Law Landscape: Issues for the research communityARDC
Presentation by Anna Johnston of Salinger Privacy to ARDC's 'GDPR and NDB scheme: Intersection with the Australian research sector' webinar on 13 September 2018
Securing, storing and enabling safe access to dataRobin Rice
Invited talk as part of Westminster Insight Research Data Management Forum, https://www.westminsterinsight.co.uk/event/3416/Research_Data_Management_Forum
IAPP Canada Privacy Symposium- "Data Retention Is a Team Sport: How to Get It...Blancco
The document discusses how companies need to implement strong data retention policies and procedures to comply with increasing data privacy regulations, properly classify and manage data through its lifecycle, and ensure all data is securely erased at end-of-life through an information lifecycle management approach involving key stakeholders like IT, legal, and data owners. It highlights how simply deleting or formatting data is not enough and certified data erasure tools and processes are needed to prevent data breaches and regulatory fines from non-compliant data disposal.
Sqrrl Enterprise: Big Data Security Analytics Use CaseSqrrl
Organizations are utilizing Sqrrl Enterprise to securely integrate vast amounts of multi-structured data (e.g., tens of petabytes) onto a single Big Data platform and then are building real-time applications using this data and Sqrrl Enterprise’s analytical interfaces. The secure integration is enabled by Accumulo’s innovative cell-level security capabilities and Sqrrl Enterprise’s security extensions, such as encryption.
This document provides an overview of the General Data Protection Regulation (GDPR) and outlines steps for compliance. It begins with a disclaimer about the information provided. It then lists resources for learning more about the GDPR and its 99 articles and 173 recitals. The rest of the document outlines key aspects of GDPR compliance, including identifying high and critical risk data, privacy notices, individual rights and redress, lawful and fair processing, privacy by design, data security, and data transfers.
The Internet Services, Web and Mobile Applications, Pervasive Communication widely available today that are meeting many of our needs have stimulated production of tremendous amounts of data (call metadata, texts, emails, social media updates, photos, videos, location, etc.). The computing power available today in conjunction with trending technologies like Data Mining and Analytics, Machine Learning and Computational Linguistics provide an opportunity business and government organizations to manage, search, analyze, and visualize vast amount of data as information.
Companies known as data brokers collect consumer data, including behavioral and private information, and then sell it to companies that use it for personalized marketing and selling. There is no doubt that this is good for businesses, but is it equally good for consumers? Does it only improve the customer buying experience? How reliable is this kind of data for companies? How can companies balance the new opportunities Big Data offers against the privacy concerns it raises for consumers?
In the proposed talk we will try to find answers to some of these and other questions.
Customer data and the new EU privacy law - May2016Andrew Sanderson
Under the new EU law, international businesses that do not handle the Personal Data of EU citizens correctly may be fined up to 4% of global revenues.
The grace period for adapting processes to comply with the law begins 25 May 2016 and ends 25 May 2018.
This presentation explains why *all* customer data counts as "personal information".
Written by an EU marketer for non-EU marketers in international business. Enjoy.
Netflix receives 2 billion requests per day to its API from users and makes 12 billion outbound requests from its personalization engine to power recommendations. The personalization engine uses data on users, movies, ratings, reviews, and similar movies to conduct A/B tests and has experienced 30 times growth over two years. The document requests feedback on the presentation and conference.
HBase and Drill: How Loosely Typed SQL is Ideal for NoSQLMapR Technologies
From the Hadoop Summit 2015 Session with Ted Dunning:
The Apache HBase approach to data has a huge potential for expressing NoSQL-y, non-relational programs. Apache Drill supports SQL for non-relational data. Paradoxically, combining this NoSQL with this SQL tool results in something even better. I will show and explain how to combine HBase and Drill to access time series data and to support high performance secondary indexing.
Managing Personally Identifiable Information (PII)KP Naidu
This document discusses personally identifiable information (PII) and provides guidance on managing PII. It defines PII as information that can be used to identify an individual. The document notes that data breaches involving PII are common and outlines legal issues related to PII. It recommends assessing the confidentiality impact of PII and implementing appropriate controls based on the impact level. Specific steps are outlined to help organizations properly manage PII.
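The confidentiality-impact assessment described above usually starts by finding out which fields actually contain PII. A minimal sketch of that first step is shown below; the patterns, column names, and the 80% threshold are illustrative assumptions, not part of any standard.

```python
import re

# Hypothetical sketch: flag columns whose values mostly look like common PII
# types, as a first step toward the impact assessment the document recommends.
PII_PATTERNS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "phone": re.compile(r"^\+?\d[\d\s().-]{7,}$"),
}

def flag_pii_columns(rows):
    """Return {column: pii_type} for columns where most values match a pattern."""
    flagged = {}
    if not rows:
        return flagged
    for col in rows[0]:
        values = [str(r[col]) for r in rows if r.get(col)]
        for pii_type, pattern in PII_PATTERNS.items():
            hits = sum(1 for v in values if pattern.match(v))
            if values and hits / len(values) >= 0.8:  # mostly matching
                flagged[col] = pii_type
                break  # first matching type wins
    return flagged

rows = [
    {"name": "Alice", "email": "alice@example.com", "ssn": "123-45-6789"},
    {"name": "Bob", "email": "bob@example.org", "ssn": "987-65-4321"},
]
print(flag_pii_columns(rows))  # flags the 'email' and 'ssn' columns
```

A real scan would sample large tables rather than read every row, and the flagged columns would then be assigned an impact level and matching controls.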
Introduction to Apache Drill - interactive query and analysis at scaleMapR Technologies
This document introduces Apache Drill, an open source interactive analysis engine for big data. It was inspired by Google's Dremel and supports standard SQL queries over various data sources like Hadoop and NoSQL databases. Drill provides low-latency interactive queries at scale through its distributed, schema-optional architecture and support for nested data formats. The talk outlines Drill's capabilities and status as a community-driven project under active development.
Using Apache Calcite for Enabling SQL and JDBC Access to Apache Geode and Oth...Christian Tzolov
When working with Big Data & IoT systems we often feel the need for a common query language. System-specific languages usually require longer adoption time and are harder to integrate within existing stacks.
To fill this gap, some NoSQL vendors are building SQL access to their systems. Building a SQL engine from scratch is a daunting job, and frameworks like Apache Calcite can help you with the heavy lifting. Calcite allows you to integrate a SQL parser, cost-based optimizer, and JDBC with your NoSQL system.
We will walk through the process of building a SQL access layer for Apache Geode (In-Memory Data Grid). I will share my experience, pitfalls and technical considerations, such as balancing between SQL/RDBMS semantics and the design choices and limitations of the data system.
Hopefully this will enable you to add SQL capabilities to your preferred NoSQL data system.
The document summarizes key changes in the Basel Committee's revised market risk framework, known as Fundamental Review of the Trading Book (FRTB). It introduces more complex capital calculations under the internal models approach, with requirements for multiple scenario analyses and risk factor combinations that significantly increase processing needs. It also requires clearer position classification and metadata for regulatory capital calculations. Banks will need enhanced data management and risk aggregation capabilities to integrate information across business units. The substantial technology impacts suggest a long-term, flexible implementation approach rather than short-term minimum compliance.
The document summarizes the Fundamental Review of the Trading Book (FRTB), which establishes new capital requirements for market risk. It outlines the standardized approach and internal models approach, both of which involve calculating expected shortfall and stressed value-at-risk. Banks will need to store and process significantly more market data to meet the new requirements, which are estimated to increase median capital requirements by 22% and weighted average capital requirements by 40%. Technical challenges include automating extensive data gathering, pricing, and reporting to support the new risk measurement approaches and capital calculations.
Apache Drill: Building Highly Flexible, High Performance Query Engines by M.C...The Hive
SQL is one of the most widely used languages to access, analyze, and manipulate structured data. As Hadoop gains traction within enterprise data architectures across industries, the need for SQL for both structured and loosely-structured data on Hadoop is growing rapidly. Apache Drill started off with the audacious goal of delivering consistent, millisecond-latency ANSI SQL query capability across a wide range of data formats. At a high level, this translates to two key requirements – Schema Flexibility and Performance. This session will delve into the architectural details in delivering these two requirements and will share with the audience the nuances and pitfalls we ran into while developing Apache Drill.
This talk provides an in-depth overview of the key concepts of Apache Calcite. It explores the Calcite catalog, parsing, validation, and optimization with various planners.
Ted Dunning presents information on Drill and Spark SQL. Drill is a query engine that operates on batches of rows in a pipelined and optimistic manner, while Spark SQL provides SQL capabilities on top of Spark's RDD abstraction. The document discusses the key differences in their approaches to optimization, execution, and security. It also explores opportunities for unification by allowing Drill and Spark to work together on the same data.
Merlin: The Ultimate Data Science EnvironmentCharles Givre
Merlin is a virtual computing environment developed by data scientists for data scientists. Merlin is free and open source, and contains a suite of all the best open source data science tools including data visualization tools, programming languages, big data tools, databases, notebooks, IDEs, and much more. The goal of Merlin is to allow data scientists to do data science work, not sysadmin.
What Does Your Smart Car Know About You? Strata London 2016Charles Givre
In the last few years, auto makers and technology companies have introduced a variety of devices to connect cars to the Internet and use this connectivity to gather data about the vehicles’ activity, but these connected cars gather a considerable amount of data about their owners’ activities beyond what one might expect. In aggregate and combined with other datasets, this data represents a significant degradation of personal privacy as well as a potential security risk. As auto insurers and local governments start to require this data collection, consumers should be aware of the security risks as well as the potential privacy invasions associated with this unique type of data collection.
In a follow-up to his 2015 session at Strata + Hadoop World NYC, Charles Givre examines data gathered from sensors in automobiles. Charles focuses on what kinds of data cars are gathering and asks critical questions about whether the benefits this data provides outweigh the risks and cost to personal privacy—the inevitable result of this data collection.
The Extract-Transform-Load (ETL) process is one of the most time-consuming processes facing anyone who wishes to analyze data. Imagine if you could quickly, easily and scalably merge and query data without having to spend hours in data prep. Well, you don’t have to imagine it. You can with Apache Drill. In this hands-on, interactive presentation Mr. Givre will show you how to unleash the power of Apache Drill and explore your data without any kind of ETL process.
Study after study shows that data scientists spend 50-90 percent of their time gathering and preparing data. In many large organizations this problem is exacerbated by data being stored on a variety of systems, with different structures and architectures. Apache Drill is a relatively new tool which can help solve this difficult problem by allowing analysts and data scientists to query disparate datasets in-place using standard ANSI SQL without having to define complex schemata, or having to rebuild their entire data infrastructure. In this talk I will introduce the audience to Apache Drill—to include some hands-on exercises—and present a case study of how Drill can be used to query a variety of data sources. The presentation will cover:
* How to explore and merge data sets in different formats
* Using Drill to interact with other platforms such as Python and others
* Exploring data stored on different machines
Apache Drill and Zeppelin: Two Promising Tools You've Never Heard OfCharles Givre
Study after study shows that data preparation and other data janitorial work consume 50-90% of most data scientists’ time. Apache Drill is a very promising tool which can help address this. Drill works with many different forms of “self-describing data” and allows analysts to run ad-hoc queries in ANSI SQL against that data. Unlike Hive or other SQL-on-Hadoop tools, Drill is not a wrapper for MapReduce and can scale to clusters of up to 10k nodes.
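The “self-describing data” idea that Drill relies on can be illustrated in a few lines: each record carries its own structure, so a projection can run without any predefined schema, with absent fields surfacing as NULL. The sketch below is plain Python standing in for the concept, not Drill itself, and the sample records are invented.

```python
import json

# Newline-delimited JSON, the kind of self-describing file Drill queries
# in place. Note the records do not share an identical set of fields.
raw = """
{"name": "Alice", "city": "Berlin", "visits": 3}
{"name": "Bob", "visits": 1}
{"name": "Carol", "city": "Oslo"}
"""

records = [json.loads(line) for line in raw.strip().splitlines()]

# A SELECT-like projection with missing fields handled as NULL (None),
# the way a schema-optional engine would handle them.
result = [(r.get("name"), r.get("city"), r.get("visits")) for r in records]
print(result)
```

In Drill the equivalent would be a single `SELECT name, city, visits FROM dfs.`...`` against the file, with no table definition or ETL step beforehand.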
Flink vs. Spark: this is the slide deck of my talk at the 2015 Flink Forward conference in Berlin, Germany, on October 12, 2015. In this talk, we tried to compare Apache Flink vs. Apache Spark with focus on real-time stream processing. Your feedback and comments are much appreciated.
Large scale, interactive ad-hoc queries over different datastores with Apache...jaxLondonConference
Presented at JAX London 2013
Apache Drill is a distributed system for interactive ad-hoc query and analysis of large-scale datasets. It is the Open Source version of Google’s Dremel technology. Apache Drill is designed to scale to thousands of servers and able to process Petabytes of data in seconds, enabling SQL-on-Hadoop and supporting a variety of data sources.
What should organizations be concerned about when using machine learning for predictive modeling? Divergence Academy and Divergence.AI are leading efforts to bring Algorithmic Accountability awareness to the masses.
The Rise of Data Ethics and Security - AIDI WebinarEryk Budi Pratama
The document discusses the rise of data ethics and security. It begins with an introduction of the speaker and their background. It then covers various topics related to data ethics including the data lifecycle, implementation of data ethics through vision, strategy, governance and more. Big data security is also discussed as it relates to data governance, challenges, and approaches to building a security program. Regulatory requirements and their impact on data scientists is covered as it relates to privacy. Techniques for privacy control like data masking and tokenization in ETL processes are presented.
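The masking and tokenization controls mentioned above can be sketched briefly. This is a minimal illustration, not a production design: the key is hard-coded only for the example (it would live in a secrets manager), and the 16-character token length is an arbitrary assumption.

```python
import hmac
import hashlib

# Illustrative only: in practice this key comes from a secrets manager.
SECRET_KEY = b"rotate-me-in-production"

def tokenize(value: str) -> str:
    """Replace a PII value with a stable, non-reversible token.

    Keyed hashing (HMAC) keeps tokens joinable across datasets while
    preventing rainbow-table reversal of plain hashes.
    """
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def mask_email(email: str) -> str:
    """Partial masking: keep the first character and the domain for readability."""
    local, _, domain = email.partition("@")
    return local[:1] + "***@" + domain

record = {"customer_id": "alice@example.com", "country": "NL"}
etl_output = {
    "customer_token": tokenize(record["customer_id"]),  # joinable, not reversible
    "masked_email": mask_email(record["customer_id"]),
    "country": record["country"],
}
print(etl_output["masked_email"])  # a***@example.com
```

Applied inside an ETL pipeline, the raw identifier never reaches the analytics store; analysts join on the token and see only the masked form.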
Data-Ed Webinar: Demystifying Big Data DATAVERSITY
We are in the middle of a data flood and we need to figure out how to tame it without drowning. Most of what has been written about Big Data is focused on selling hardware and services. But what about a Big Data Strategy that guides hardware and software decisions? While virtually every major organization is faced with the challenge of figuring out the approach for and the requirements of this new development, jumping into the fray hastily and unprepared will only reproduce the same dismal IT project results as previously experienced. Join Dr. Peter Aiken as he will debunk a number of misconceptions about Big Data as your un-typical IT project. He will provide guidance on how to establish realistic Big Data management plans and expectations, and help demonstrate the value of such actions to both internal and external decision makers without getting lost in the hype.
Takeaways:
- The means by which Big Data techniques can complement existing data management practices
- The prototyping nature of practicing Big Data techniques
- The distinct ways in which utilizing Big Data can generate business value
- Bigger Data isn’t always Better Data
Check out more of our Data-Ed webinars here: www.datablueprint.com/webinar-schedule
This document summarizes a presentation on privacy, security and ethics related to big data analytics. It discusses several key points:
1. Big data promises new opportunities but also new privacy and surveillance risks due to the vast amount of personal data being collected and analyzed.
2. Privacy risks are best managed proactively through techniques like Privacy by Design which embeds privacy protections from the start of a project.
3. Innovation and privacy are not mutually exclusive; it is possible to gain insights from big data analytics while also protecting privacy through approaches like Privacy by Design.
Privacy experience in Plone and other open source CMSInteraktiv
This document discusses privacy experience in open source content management systems (CMS) like Plone. It begins by explaining why privacy matters and providing examples of recent privacy issues. It then discusses different approaches to privacy internationally and how this affects global open source communities. The document proposes universal privacy principles and discusses how privacy can be ensured in open source CMS communities specifically, with suggestions for Plone. It emphasizes a preventative, privacy by design approach.
Beyond Privacy: Learning Data Ethics - European Big Data Community Forum 2019...e-SIDES.eu
This is the slide-deck of the community event held on November 14, 2019 in Brussels, titled "Beyond Privacy: Learning Data Ethics - European Big Data Community Forum 2019". It includes the presentations given by the speakers.
Beyond Privacy: Learning Data Ethics - European Big Data Community Forum 2019...IDC4EU
This is the slide-deck of the community event held on November 14, 2019 in Brussels, titled "Beyond Privacy: Learning Data Ethics - European Big Data Community Forum 2019". It includes the presentations given by the speakers.
The document discusses three key challenges for data governance and security with big data: 1) ethics and compliance as personally identifiable data is widespread and regulations are increasing, 2) poor data management when there is no clear ownership or lifecycle management of data, and 3) insecure infrastructure as many devices and systems generating data were not designed with security in mind. Effective data governance is important for security, and requires defining responsibilities, auditing data use, and protecting data during collection, storage, and analysis. Technologies can help automate and scale governance, but it is ultimately a combination of people, processes, and tools.
There are three key challenges to effective data governance and security in the big data era: 1) ethics and compliance as personally identifiable data is widespread and regulations are increasing, 2) poor data management when there is no clear ownership or lifecycle management of data, and 3) insecure infrastructure as many IoT and other devices were not designed with security in mind. Effective data governance requires a combination of people, processes, and technology to classify, secure, and manage data throughout its lifecycle.
Anonos NIST Comment Letter – De–Identification Of Personally Identifiable Inf...Ted Myerson
The document is a letter submitted to NIST proposing that the draft NISTIR report on de-identification of personally identifiable information include discussion of "dynamic data obscurity". The letter argues that dynamic data obscurity technologies can help overcome limitations of static de-identification techniques by allowing intelligent and compliant access to data elements while still enforcing core privacy protections. The letter proposes adding a section on dynamic data obscurity to the report and discusses the history and benefits of this approach.
Data mining and privacy preserving in data miningNeeda Multani
Data mining involves analyzing data from different perspectives to discover useful patterns and relationships not previously known. It can be used to increase profits, reduce costs, and more. Privacy preservation in data mining aims to protect individual privacy while still providing valid mining results, using techniques like cryptographic protocols to run algorithms on joined databases without revealing unnecessary information. Data mining has various applications like fraud detection, credit risk assessment, customer profiling, and more.
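Alongside the cryptographic protocols mentioned above, generalization to achieve k-anonymity is another common privacy-preserving technique: quasi-identifiers are coarsened until every combination appears at least k times. The sketch below uses invented bucket rules (decade age bands, 3-digit ZIP prefixes) purely for illustration.

```python
from collections import Counter

def generalize(record):
    """Coarsen quasi-identifiers: age to a decade band, ZIP to a 3-digit prefix."""
    decade = record["age"] // 10 * 10
    return {
        "age_band": f"{decade}-{decade + 9}",
        "zip_prefix": record["zip"][:3] + "**",
    }

def is_k_anonymous(records, k):
    """True if every generalized quasi-identifier combination occurs >= k times."""
    groups = Counter(tuple(sorted(generalize(r).items())) for r in records)
    return all(count >= k for count in groups.values())

records = [
    {"age": 31, "zip": "10115"}, {"age": 34, "zip": "10117"},
    {"age": 35, "zip": "10119"}, {"age": 47, "zip": "20095"},
    {"age": 42, "zip": "20097"},
]
print(is_k_anonymous(records, k=2))  # True: smallest group has 2 members
```

If a dataset fails the check, the usual move is to widen the buckets (or suppress outlier records) and re-test, trading analytic precision for privacy.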
ETHICAL ISSUES WITH CUSTOMER DATA COLLECTIONPranav Godse
Data mining involves collecting and analyzing large amounts of customer data. While this can provide commercial benefits, it also raises ethical issues regarding customer privacy. Some key ethical challenges include ambiguity around how social networks label relationships, uncertainty around future uses of customer data by companies, and a lack of transparency around passive collection of mobile location data. To address these challenges, companies should focus on ethical data mining practices like verifying data sources, respecting customer expectations of privacy, developing trust through transparency and control over data access. Regulators also need to continue updating laws and regulations to balance the benefits of data analytics with protecting individual privacy rights.
This document discusses big data and the importance of data quality for big data initiatives. It defines big data as large, diverse digital data sets that require new techniques to enable capture, storage, analysis and visualization. The key challenges of big data include integrating diverse structured and unstructured data sources and ensuring high quality data. The document emphasizes that poor data quality can undermine big data analytics efforts and lead to wrong insights. It promotes establishing a data quality framework including profiling, standardization, matching and enrichment to enable valid big data analytics.
GDPR Compliance Made Easy with Data VirtualizationDenodo
Companies should be gearing up for May 25, 2018, when the General Data Protection Regulation (GDPR) comes into effect. GDPR will affect how businesses that serve the European Union collect, use and transfer data, forcing them to provide specific reasons and need for the personal data they gather and prove their compliance with the principles established by the regulation.
The regulation is already creating many challenges for companies, including:
• Ensuring secure access to most current data, whether on or off-premise
• Consistent security across all data sources
• Data access audit
• Ability to provide data lineage
This webinar aims to demonstrate how data virtualization has surfaced as a straightforward solution to many of the challenges and questions brought on by the GDPR. It will also include a case study of how Asurion already achieved the desired level of security with data virtualization.
Watch the webinar in full to learn more about the benefits of using data virtualization to smoothly comply with the GDPR: http://ow.ly/1kzk30bRw3i
This document outlines the course roadmap for a data analytics course. It includes 12 topics covered over 15 weeks, with flexibility weeks built in. The topics include data exploration and visualization, predictive analytics, research design and experimentation, and data communication. Workshops are included to provide hands-on learning opportunities. The learning objectives focus on key principles of data ethics like ethical decision making, technical approaches to prevent issues, and risk management for data ethics.
Ethyca CodeDriven - Data Privacy Compliance for Engineers & Data TeamsCillian Kieran
A presentation at FirstMark's CodeDriven event in AWS Loft in New York on how to think about Data Privacy Compliance if you work in engineering, data or product teams.
This document summarizes a webinar on privacy secrets and how systems can reveal personal information. It discusses defining privacy, the seven types of privacy, and the differences between privacy and security. It also covers threats to privacy like big data, location tracking, and metadata analysis. The webinar examines data types like PII, PHI, and anonymous/pseudonymous data. It provides examples of data lifecycles and analyzing how data flows through systems and to third parties. The goal is to help organizations understand privacy risks and comply with regulations like GDPR.
My keynote speech at the ISACA IIA Belgium software watch day in October 2014 in Brussels on the value of big data and data analytics for auditors and other assurance professionals
Perspectives on Ethical Big Data GovernanceCloudera, Inc.
Enterprise data governance is a critical, yet challenging, business process, and the rapidly expanding universe of data volumes and types make it a more significant undertaking, particularly for public sector organizations. In this session, attendees will learn how to bring comprehensive data governance to their organizations to ensure data collected and managed is handled and protected as required. Discover practical information on how to use the components and frameworks of the Hadoop stack to support your requirements for data auditing, lineage, metadata management, and policy enforcement, and hear recommendations on how to get started with measuring the progress of ethical big data usage--including what’s legal and what’s right. Bring your questions and join this lively, interactive dialogue.
Similar to Data science and pending EU privacy laws - a storm on the horizon (20)
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Analysis insight about a Flyball dog competition team's performanceroli9797
Insight of my analysis about a Flyball dog competition team's last year performance. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
State of Artificial intelligence Report 2023kuntobimo2016
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report as a compilation of the most interesting things we’ve seen with a goal of triggering an informed conversation about the state of AI and its implication for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...Social Samosa
The Modern Marketing Reckoner (MMR) is a comprehensive resource packed with POVs from 60+ industry leaders on how AI is transforming the 4 key pillars of marketing – product, place, price and promotions.
Global Situational Awareness of A.I. and where its headedvikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be un-leashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Global Situational Awareness of A.I. and where its headed
Data science and pending EU privacy laws - a storm on the horizon
1. Data Science and EU Privacy
A Storm on the Horizon
David Stephenson, Ph.D.
dsiAnalytics.com
2. PRIVACY CONSIDERATIONS WITH DATA AND DATA SCIENCE
• Intro & Case Studies
• Data & Data Science: Growth and Usage
• Privacy: Storm on the Horizon
• Concluding Thoughts
Agenda
2
3. My Background
Intro & Case Studies
Head of Global Business Analytics
Professor (Advanced Analytics)
Ph.D. Analytics & Computer Science
Financial Analytics, Credit Risk and Insurance
Independent Consultant
3
6. PRIVACY CONSIDERATIONS WITH DATA AND DATA SCIENCE
• Intro & Case Studies
• Data & Data Science: Growth and Usage
– Data Science
– Modern Technology
• Privacy: Storm on the Horizon
Agenda
6
7. Data Usage
The Power of Data Science
Propensity
Classification/Profiling
Personalization
Marketing
7
8. More Data Means More Insights
Traditional Data
Big Data
Smart Devices
IoT
8
10. PRIVACY CONSIDERATIONS WITH DATA AND DATA SCIENCE
• Intro & Case Studies
• Data & Data Science: Growth and Usage
– Data Science
– Modern Technology
• Privacy: Storm on the Horizon
Agenda
10
11. Data Sources
The Power of Data Science
11
Brainstorm: Today’s Sources of Personal Data?
17. Source and Use of Customer Data
Privacy: A Brief Background
Can be: Known, Used, Stored, Shared with 3rd parties
Observed
Volunteered
Data Science
17
18. PRIVACY CONSIDERATIONS WITH DATA AND DATA SCIENCE
18
• Intro & Case Studies
• Data & Data Science: Growth and Usage
• Privacy: Storm on the Horizon
Agenda
20. Preparing for compliance
Privacy: Storm on the Horizon
What are my data assets?
• Usage, Storage, and Flow to/from 3rd parties
• Sources: Observed, Volunteered, and Data Science
What does the regulation bring?
• Right to be forgotten
• De-anonymization
• Cloud computing
• Explicit and up-front consent
• Restricted profiling
• Privacy by Design
• Potential liabilities from buying, selling and sharing
20
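One practical way to start answering "what are my data assets?" is a machine-readable register that records, for each asset, its source type, usage, storage location, and third-party flows. The sketch below is a minimal illustration in Python; the schema, asset names, and systems are assumptions for the example, not anything prescribed by the slides.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataAsset:
    """One entry in a data-asset register for a compliance review."""
    name: str
    source: str                  # "observed", "volunteered", or "data science" (derived)
    usage: List[str]             # e.g. ["personalization", "marketing"]
    storage: str                 # where the data physically lives
    third_parties: List[str] = field(default_factory=list)
    consent_recorded: bool = False

# Hypothetical example entries -- names and systems are illustrative only.
register = [
    DataAsset("clickstream", "observed", ["personalization"],
              "noSQL cluster (EU region)", third_parties=["ad-network"]),
    DataAsset("signup_profile", "volunteered", ["marketing"],
              "relational DB (EU region)", third_parties=["email-vendor"],
              consent_recorded=True),
]

# Flag assets that flow to 3rd parties without recorded user consent.
flagged = [a.name for a in register
           if a.third_parties and not a.consent_recorded]
# flagged == ["clickstream"]
```

Even a register this simple makes the audit steps on the following slide concrete: each flagged asset is a candidate privacy exposure to resolve before enforcement begins.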
21. Moving Forward
Privacy: Storm on the Horizon
• Become aware of your entire data ecosystem and how it may expose you to privacy violations
• Audit current data storage and governance for compliance
• Ensure that all product roadmaps comply with the principles of Privacy by Design
• Ensure that proper user consent is in place from the moment of first user registration
• Initiate dialogue with corporate privacy officer or external expert
21
24. Privacy by Design
24
1 Proactive not Reactive; Preventative not Remedial
2 Privacy as the Default Setting
3 Privacy Embedded into Design
4 Full Functionality – Positive-Sum, not Zero-Sum
5 End-to-End Security – Full Lifecycle Protection
6 Visibility and Transparency – Keep it Open
7 Respect for User Privacy – Keep it User-Centric
25. Privacy by Design for Big Data (Jeff Jonas, IBM)
25
1. FULL ATTRIBUTION: Every observation (record) needs to know from where it came and when. There cannot be merge/purge data survivorship processing whereby some observations or fields are discarded.
2. DATA TETHERING: Adds, changes and deletes occurring in systems of record must be accounted for, in real time, in sub-seconds.
3. ANALYTICS ON ANONYMIZED DATA: The ability to perform advanced analytics (including some fuzzy matching) over cryptographically altered data means organizations can anonymize more data before information sharing.
4. TAMPER-RESISTANT AUDIT LOGS: Every user search should be logged in a tamper-resistant manner — even the database administrator should not be able to alter the evidence contained in this audit log.
5. FALSE NEGATIVE FAVORING METHODS: The capability to more strongly favor false negatives is of critical importance in systems that could be used to affect someone’s civil liberties.
6. SELF-CORRECTING FALSE POSITIVES: With every new data point presented, prior assertions are re-evaluated to ensure they are still correct, and if no longer correct, these earlier assertions can often be repaired — in real time.
7. INFORMATION TRANSFER ACCOUNTING: Every secondary transfer of data, whether to human eyeball or a tertiary system, can be recorded to allow stakeholders (e.g., data custodians or the consumers themselves) to understand how their data is flowing.
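Principle 3 — analytics on "cryptographically altered" data — can be sketched in a few lines of Python. Here identifiers are replaced with keyed HMAC-SHA256 digests before sharing, so a receiving analyst can still join records on the pseudonym without learning the underlying value. The key, field names, and records below are illustrative assumptions, not part of Jonas's specification.

```python
import hmac
import hashlib

# Illustrative secret: in practice the data custodian manages this key
# and never shares it with the receiving party.
PSEUDONYMIZATION_KEY = b"rotate-me-regularly"

def pseudonymize(value: str, key: bytes = PSEUDONYMIZATION_KEY) -> str:
    """Replace an identifier with a keyed digest (HMAC-SHA256).

    Unlike a plain hash, an HMAC cannot be reversed by brute-forcing
    common values (emails, phone numbers) without knowing the key.
    """
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Hypothetical records from two systems, to be matched after anonymization.
crm_record = {"email": "alice@example.com", "segment": "premium"}
web_record = {"email": "alice@example.com", "visits": 42}

crm_shared = {"id": pseudonymize(crm_record["email"]), "segment": crm_record["segment"]}
web_shared = {"id": pseudonymize(web_record["email"]), "visits": web_record["visits"]}

# The receiving analyst can still join on the pseudonym...
assert crm_shared["id"] == web_shared["id"]
# ...but cannot recover the original email from the digest alone.
```

Note that under GDPR such keyed pseudonymization reduces risk but the output is still personal data as long as the key holder can re-identify it; full anonymization is a higher bar.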