Preserving privacy of users is a key requirement of web-scale data mining applications and systems such as web search, recommender systems, crowdsourced platforms, and analytics applications, and has witnessed a renewed focus in light of recent data breaches and new regulations such as GDPR. In this tutorial, we will first present an overview of privacy breaches over the last two decades and the lessons learned, key regulations and laws, and evolution of privacy techniques leading to differential privacy definition / techniques. Then, we will focus on the application of privacy-preserving data mining techniques in practice, by presenting case studies such as Apple's differential privacy deployment for iOS / macOS, Google's RAPPOR, LinkedIn Salary, and Microsoft's differential privacy deployment for collecting Windows telemetry. We will conclude with open problems and challenges for the data mining / machine learning community, based on our experiences in industry.
Can we use data to train machine learning models and perform statistical analysis without putting private data at risk? Tools and techniques such as federated learning, differential privacy, and homomorphic encryption enable safer work with data.
Responsible Data Use in AI - core tech pillars
Sofus Macskássy
In this deck, we cover four core pillars of responsible data use in AI: fairness, transparency, explainability, and data governance.
“AI is the new electricity” proclaims Andrew Ng, co-founder of Google Brain. Just as we need to know how to safely harness electricity, we also need to know how to securely employ AI to power our businesses. In some scenarios, the security of AI systems can impact human safety. On the flip side, AI can also be misused by cyber-adversaries and so we need to understand how to counter them.
This talk will provide food for thought in 3 areas:
Security of AI systems
Use of AI in cybersecurity
Malicious use of AI
How do we protect privacy of users when building large-scale AI based systems? How do we develop machine learned models and systems taking fairness, accountability, and transparency into account? With the ongoing explosive growth of AI/ML models and systems, these are some of the ethical, legal, and technical challenges encountered by researchers and practitioners alike. In this talk, we will first motivate the need for adopting a "fairness and privacy by design" approach when developing AI/ML models and systems for different consumer and enterprise applications. We will then focus on the application of fairness-aware machine learning and privacy-preserving data mining techniques in practice, by presenting case studies spanning different LinkedIn applications (such as fairness-aware talent search ranking, privacy-preserving analytics, and LinkedIn Salary privacy & security design), and conclude with the key takeaways and open challenges.
Ethical Issues in Machine Learning Algorithms. (Part 1)
Vladimir Kanchev
This presentation describes recent ethical issues related to AI and ML algorithms. Its focus is data and algorithmic bias, algorithmic interpretability and how GDPR relates to these issues.
Generative AI models, such as GANs and VAEs, have the potential to create realistic and diverse synthetic data for various applications, from image and speech synthesis to drug discovery and language modeling. However, training these models can be challenging due to the instability and mode collapse issues that often arise. In this workshop, we will explore how stable diffusion, a recent training method that combines diffusion models and Langevin dynamics, can address these challenges and improve the performance and stability of generative models. We will use a pre-configured development environment for machine learning, to run hands-on experiments and train stable diffusion models on different datasets. By the end of the session, attendees will have a better understanding of generative AI and stable diffusion, and how to build and deploy stable generative models for real-world use cases.
About 30 years ago, AI was not only a topic for science-fiction writers but also a major research field surrounded by huge hopes and investments. But the over-inflated expectations ended in a crash, followed by a period of absent funding and interest – the so-called AI winter. However, the last 3 years changed everything – again. Deep learning, a machine learning technique inspired by the human brain, crushed one benchmark after another, and tech companies like Google, Facebook and Microsoft started to invest billions in AI research. “The pace of progress in artificial general intelligence is incredibly fast” (Elon Musk – CEO Tesla & SpaceX), leading to an AI that “would be either the best or the worst thing ever to happen to humanity” (Stephen Hawking – Physicist).
What sparked this new hype? How is deep learning different from previous approaches? Are the advancing AI technologies really a threat to humanity? Let’s look behind the curtain and unravel the reality. This talk will explore why Sundar Pichai (CEO Google) recently announced that “machine learning is a core transformative way by which Google is rethinking everything they are doing” and explain why “Deep Learning is probably one of the most exciting things that is happening in the computer industry” (Jen-Hsun Huang – CEO NVIDIA).
Either a new AI “winter is coming” (Ned Stark – House Stark) or this new wave of innovation might turn out to be the “last invention humans ever need to make” (Nick Bostrom – AI Philosopher). Or maybe it’s just another great technology helping humans to achieve more.
Adversarial Attacks on A.I. Systems — NextCon, Jan 2019
anant90
Machine learning is itself just another tool, susceptible to adversarial attacks. These can have huge implications, especially in a world with self-driving cars and other automation. In this talk, we will look at recent developments in the world of adversarial attacks on A.I. systems, and how far we have come in mitigating these attacks.
Organizations are collecting massive amounts of data from disparate sources. However, they continuously face the challenge of identifying patterns, detecting anomalies, and projecting future trends based on large data sets. Machine learning for anomaly detection provides a promising alternative for the detection and classification of anomalies.
Find out how you can implement machine learning to increase speed and effectiveness in identifying and reporting anomalies.
In this webinar, we will discuss:
How machine learning can help in identifying anomalies
Steps to approach an anomaly detection problem
Various techniques available for anomaly detection
Best algorithms that fit in different situations
Implementing an anomaly detection use case on the StreamAnalytix platform
To view the webinar - https://bit.ly/2IV2ahC
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WS...
Krishnaram Kenthapadi
Researchers and practitioners from different disciplines have highlighted the ethical and legal challenges posed by the use of machine learned models and data-driven systems, and the potential for such systems to discriminate against certain population groups, due to biases in algorithmic decision-making systems. This tutorial presents an overview of algorithmic bias / discrimination issues observed over the last few years and the lessons learned, key regulations and laws, and evolution of techniques for achieving fairness in machine learning systems. We will motivate the need for adopting a "fairness by design" approach (as opposed to viewing algorithmic bias / fairness considerations as an afterthought), when developing machine learning based models and systems for different consumer and enterprise applications. Then, we will focus on the application of fairness-aware machine learning techniques in practice by presenting non-proprietary case studies from different technology companies. Finally, based on our experiences working on fairness in machine learning at companies such as Facebook, Google, LinkedIn, and Microsoft, we will present open problems and research directions for the data mining / machine learning community.
Please cite as:
Sarah Bird, Ben Hutchinson, Krishnaram Kenthapadi, Emre Kiciman, and Margaret Mitchell. Fairness-Aware Machine Learning: Practical Challenges and Lessons Learned. WSDM 2019.
Anomaly Detection Using Generative Adversarial Network (GAN)
Asha Aher
This presentation covers anomaly detection using different GAN architectures, and the methodology used to check the efficiency of GANs in anomaly detection.
Responsible AI & Cybersecurity: A tale of two technology risks
Liming Zhu
With the broader adoption of digital technologies and AI, organisations face the emerging risks of AI, the unfamiliar, and the intensified risk of cybersecurity, the familiar. AI and cybersecurity are intertwined, but risk silos are often created when they are dealt with at the technology and governance levels. This talk will explore the interactions between responsible AI and cybersecurity risks via industry case studies. It will show how we can break down the risk silos and use emerging trust-enhancing technologies, architecture and end-to-end software engineering/DevOps practices to connect the two worlds and uplift the risk management posture for both.
Responsible AI in Industry (Tutorials at AAAI 2021, FAccT 2021, and WWW 2021)
Krishnaram Kenthapadi
[Video available at https://sites.google.com/view/ResponsibleAITutorial]
Artificial Intelligence is increasingly being used in decisions and processes that are critical for individuals, businesses, and society, especially in areas such as hiring, lending, criminal justice, healthcare, and education. Recent ethical challenges and undesirable outcomes associated with AI systems have highlighted the need for regulations, best practices, and practical tools to help data scientists and ML developers build AI systems that are secure, privacy-preserving, transparent, explainable, fair, and accountable – to avoid unintended and potentially harmful consequences and compliance challenges.
In this tutorial, we will present an overview of responsible AI, highlighting model explainability, fairness, and privacy in AI, key regulations/laws, and techniques/tools for providing understanding around AI/ML systems. Then, we will focus on the application of explainability, fairness assessment/unfairness mitigation, and privacy techniques in industry, wherein we present practical challenges/guidelines for using such techniques effectively and lessons learned from deploying models for several web-scale machine learning and data mining applications. We will present case studies across different companies, spanning many industries and application domains. Finally, based on our experiences in industry, we will identify open problems and research directions for the AI community.
This collection of slides is meant as a starting point and tutorial for those who want to understand AI ethics, in particular the challenges around bias and fairness. I have also included studies on how we as humans perceive AI's influence in our private and working lives.
Dr Murari Mandal from NUS presented on robustness in deep learning as part of the 3-day OpenPOWER Industry Summit, where he talked about AI breakthroughs, performance improvements in AI models, adversarial attacks, attacks on semantic segmentation, attacks on object detectors, defending against adversarial attacks, and many other areas.
Every single security company is talking about how they are using machine learning—as a security company you have to claim artificial intelligence to be even part of the conversation. However, this approach can be dangerous when we blindly rely on algorithms to do the right thing. Rather than building systems with actual security knowledge, companies are using algorithms that nobody understands and, in turn, discovering wrong insights.
In this session, we will discuss:
• Limitations of machine learning and issues of explainability
• Where deep learning should never be applied
• Examples of how the blind application of algorithms can lead to wrong results
The impact of AI on society gets bigger and bigger - and it is not all good. We as Data Scientists have to really put in work to not end up in ML hell.
This presentation was given at the Dutch Data Science Week.
The process of converting a data set with a vast number of dimensions into a data set with fewer dimensions, while ensuring that it conveys similar information concisely.
Concept
R code
CS6659 Artificial Intelligence
Slides on the features and definition of Artificial Intelligence
Can be used by undergraduate students
Nick Schmidt of BLDS, LLC speaks to the Maryland AI meetup, June 4, 2019 (https://www.meetup.com/Maryland-AI). Nick discusses ideas of fairness and how they apply to machine learning. He explores recent academic work on identifying and mitigating bias, and how his work in lending and employment can be applied to other industries. Nick explains how to measure whether an algorithm is fair and also demonstrates the techniques that model builders can use to ameliorate bias when it is found.
Privacy-preserving Data Mining in Industry (WSDM 2019 Tutorial)
Krishnaram Kenthapadi
Preserving privacy of users is a key requirement of web-scale data mining applications and systems such as web search, recommender systems, crowdsourced platforms, and analytics applications, and has witnessed a renewed focus in light of recent data breaches and new regulations such as GDPR. In this tutorial, we will first present an overview of privacy breaches over the last two decades and the lessons learned, key regulations and laws, and evolution of privacy techniques leading to differential privacy definition / techniques. Then, we will focus on the application of privacy-preserving data mining techniques in practice, by presenting case studies such as Apple's differential privacy deployment for iOS / macOS, Google's RAPPOR, LinkedIn Salary, and Microsoft's differential privacy deployment for collecting Windows telemetry. We will conclude with open problems and challenges for the data mining / machine learning community, based on our experiences in industry.
Privacy-preserving Data Mining in Industry: Practical Challenges and Lessons ...
Krishnaram Kenthapadi
Preserving privacy of users is a key requirement of web-scale data mining applications and systems such as web search, recommender systems, crowdsourced platforms, and analytics applications, and has witnessed a renewed focus in light of recent data breaches and new regulations such as GDPR. In this tutorial, we will first present an overview of privacy breaches over the last two decades and the lessons learned, key regulations and laws, and evolution of privacy techniques leading to differential privacy definition / techniques. Then, we will focus on the application of privacy-preserving data mining techniques in practice, by presenting case studies such as Apple’s differential privacy deployment for iOS, Google’s RAPPOR, and LinkedIn Salary. We will also discuss various open source as well as commercial privacy tools, and conclude with open problems and challenges for the data mining / machine learning community.
BROWN BAG TALK WITH MICAH ALTMAN, SOURCES OF BIG DATA FOR SOCIAL SCIENCES
Micah Altman
This talk is part of the MIT Program on Information Science brown bag series (http://informatics.mit.edu).
This talk reviews emerging big data sources for social scientific analysis and explores the challenges these present. Many of these sources pose distinct challenges for acquisition, processing, analysis, inference, sharing, and preservation.
Dr Micah Altman is Director of Research and Head/Scientist, Program on Information Science for the MIT Libraries, at the Massachusetts Institute of Technology. Dr. Altman is also a Non-Resident Senior Fellow at The Brookings Institution. Prior to arriving at MIT, Dr. Altman served at Harvard University for fifteen years as the Associate Director of the Harvard-MIT Data Center, Archival Director of the Henry A. Murray Archive, and Senior Research Scientist in the Institute for Quantitative Social Sciences.
Dr. Altman conducts research in social science, information science and research methods -- focusing on the intersections of information, technology, privacy, and politics; and on the dissemination, preservation, reliability and governance of scientific knowledge.
How do we protect privacy of users in large-scale systems? How do we ensure fairness and transparency when developing machine learned models? With the ongoing explosive growth of AI/ML models and systems, these are some of the ethical and legal challenges encountered by researchers and practitioners alike. In this talk (presented at QConSF 2018), we first present an overview of privacy breaches as well as algorithmic bias / discrimination issues observed in the Internet industry over the last few years and the lessons learned, key regulations and laws, and evolution of techniques for achieving privacy and fairness in data-driven systems. We motivate the need for adopting a "privacy and fairness by design" approach when developing data-driven AI/ML models and systems for different consumer and enterprise applications. We also focus on the application of privacy-preserving data mining and fairness-aware machine learning techniques in practice, by presenting case studies spanning different LinkedIn applications, and conclude with the key takeaways and open challenges.
The term 'Data Scientist' arose fairly recently to express the specialised recruitment needs of certain well-known data-driven Silicon Valley firms. It signifies a mix of diverse and rare talents, mostly drawing from Computer Science (with emphasis on Big Data), Statistics and Machine Learning. In this talk, we will attempt to briefly survey the state-of-the-art both in terms of problems and solutions at the vanguard of Data Science. We will cover both novel developments, as well as centuries-old best practices, in an attempt to demonstrate that Data Science is indeed a Science, in the full sense of the word. This talk represents part of a seminar series that the speaker has given across the world, including Google (Mountain View), Cisco (San Jose) and Aviva Headquarters (London), and represents joint work with Professor David Hand (OBE).
Fairness, Transparency, and Privacy in AI @LinkedIn - C4Media
Video and slides synchronized, mp3 and slide download available at URL https://bit.ly/2V9zW73.
Krishnaram Kenthapadi talks about privacy breaches, algorithmic bias/discrimination issues observed in the Internet industry, regulations & laws, and techniques for achieving privacy and fairness in data-driven systems. He focusses on the application of privacy-preserving data mining and fairness-aware ML techniques in practice, by presenting case studies spanning different LinkedIn applications. Filmed at qconsf.com.
Krishnaram Kenthapadi is part of the AI team at LinkedIn, where he leads the transparency and privacy modeling efforts across different applications. He is LinkedIn's representative in Microsoft's AI and Ethics in Engineering & Research Committee. He shaped the technical roadmap for LinkedIn Salary product, and served as the relevance lead for the LinkedIn Careers & Talent Solutions Relevance team.
On Tuesday 18 September 2007, Ben Shneiderman gave a talk at the Centre for HCI Design, City University London, on the topic of information visualisation for high-dimensional spaces. Over 100 people from industry and academia attended the talk.
http://hcid.soi.cty.ac.uk/
- What is Clustering, Honeypots and Density Based Clustering?
- What is OPTICS clustering, and how does it differ from DB clustering? …and how can it be used for outlier detection?
- What is so-called soft clustering, and how does it differ from clustering? …and how can it be used for outlier detection?
Data Tactics Data Science Brown Bag (April 2014) - Rich Heimann
This is a presentation we give internally every quarter as part of our Data Science Brown Bag Series. It covers different types of soft clustering techniques, all of which the team currently uses depending on the complexity of the data and of the customer's problem. If you are interested in learning more about working with L-3 Data Tactics, or in working for the L-3 Data Tactics Data Science team, please contact us soon! Thank you.
Characterizing Data and Software for Social Science Research - Micah Altman
This presentation describes the landscape of data and software use across the social sciences in terms of the abstract dimensions of data and data use. It then examines three use cases.
Presentation for DASPOS < https://daspos.crc.nd.edu/index.php/workshops/workshop-2 > Workshop at JCDL.
Immersive Recommendation Workshop, NYC Media Lab'17 - Longqi Yang
The rapid evolution of deep learning technologies and the explosion of diverse user interaction traces have brought significant challenges and opportunities to recommendation and personalized systems. In this workshop, we discussed recent trends and techniques in user modeling and presented our work on immersive recommendation systems. These systems learn users’ preferences from diverse digital trace modalities (text, image and unstructured data streams) in a wide range of recommendation domains (creative art, food, news, and events). The workshop included a light tutorial on OpenRec, an open source framework that enables quick prototyping of complex recommender systems via modularization.
This workshop is based on research and development done at Cornell Tech as part of the Connected Experiences Lab, supported by Oath and NSF.
ML practitioners and advocates are increasingly finding themselves becoming gatekeepers of the modern world. The models you create have power to get people arrested or vindicated, get loans approved or rejected, determine what interest rate should be charged for such loans, who should be shown to you in your long list of pursuits on your Tinder, what news do you read, who gets called for a job phone screen or even a college admission... the list goes on. My goal in this talk is to summarize the kinds of disparate outcomes that are caused by cargo cult machine learning, and recent academic efforts to address some of them.
Similar to Privacy-preserving Data Mining in Industry (WWW 2019 Tutorial) (20)
Responsible AI in Industry: Practical Challenges and Lessons Learned - Krishnaram Kenthapadi
How do we develop machine learning models and systems taking fairness, accuracy, explainability, and transparency into account? How do we protect the privacy of users when building large-scale AI based systems? Model fairness and explainability and protection of user privacy are considered prerequisites for building trust and adoption of AI systems in high stakes domains such as hiring, lending, and healthcare. We will first motivate the need for adopting a “fairness, explainability, and privacy by design” approach when developing AI/ML models and systems for different consumer and enterprise applications from the societal, regulatory, customer, end-user, and model developer perspectives. We will then focus on the application of responsible AI techniques in practice through industry case studies. We will discuss the sociotechnical dimensions and practical challenges, and conclude with the key takeaways and open challenges.
[Video available at https://sites.google.com/view/ResponsibleAITutorial]
Artificial Intelligence is increasingly being used in decisions and processes that are critical for individuals, businesses, and society, especially in areas such as hiring, lending, criminal justice, healthcare, and education. Recent ethical challenges and undesirable outcomes associated with AI systems have highlighted the need for regulations, best practices, and practical tools to help data scientists and ML developers build AI systems that are secure, privacy-preserving, transparent, explainable, fair, and accountable – to avoid unintended and potentially harmful consequences and compliance challenges.
In this tutorial, we will present an overview of responsible AI, highlighting model explainability, fairness, and privacy in AI, key regulations/laws, and techniques/tools for providing understanding around AI/ML systems. Then, we will focus on the application of explainability, fairness assessment/unfairness mitigation, and privacy techniques in industry, wherein we present practical challenges/guidelines for using such techniques effectively and lessons learned from deploying models for several web-scale machine learning and data mining applications. We will present case studies across different companies, spanning many industries and application domains. Finally, based on our experiences in industry, we will identify open problems and research directions for the AI community.
Amazon SageMaker Clarify (https://aws.amazon.com/sagemaker/clarify/) provides machine learning developers with greater visibility into their training data and models so they can identify and limit bias and explain predictions. SageMaker Clarify detects potential bias during data preparation, after model training, and in your deployed model by examining attributes you specify. For instance, you can check for bias related to age in your initial dataset or in your trained model and receive a detailed report that quantifies different types of possible bias. SageMaker Clarify also includes feature importance graphs that help you explain model predictions and produces reports which can be used to support internal presentations or to identify issues with your model that you can take steps to correct.
For more information on Amazon SageMaker Clarify, please refer to these links: (1) https://aws.amazon.com/sagemaker/clarify (2) https://aws.amazon.com/blogs/aws/new-amazon-sagemaker-clarify-detects-bias-and-increases-the-transparency-of-machine-learning-models (3) https://github.com/aws/amazon-sagemaker-clarify (4) Discussion and demo: https://youtu.be/cQo2ew0DQw0
Acknowledgments: Amazon SageMaker Clarify core team, Amazon AWS AI team, and partners across Amazon
Privacy in AI/ML Systems: Practical Challenges and Lessons Learned - Krishnaram Kenthapadi
How do we protect the privacy of users when building large-scale AI based systems? How do we develop machine learning models and systems taking fairness, accuracy, explainability, and transparency into account? Model fairness and explainability and protection of user privacy are considered prerequisites for building trust and adoption of AI systems in high stakes domains. We will first motivate the need for adopting a “fairness, explainability, and privacy by design” approach when developing AI/ML models and systems for different consumer and enterprise applications from the societal, regulatory, customer, end-user, and model developer perspectives. We will then focus on the application of privacy-preserving AI techniques in practice through industry case studies. We will discuss the sociotechnical dimensions and practical challenges, and conclude with the key takeaways and open challenges.
[Video recording available at https://www.youtube.com/playlist?list=PLewjn-vrZ7d3x0M4Uu_57oaJPRXkiS221]
Artificial Intelligence is increasingly playing an integral role in determining our day-to-day experiences. Moreover, with proliferation of AI based solutions in areas such as hiring, lending, criminal justice, healthcare, and education, the resulting personal and professional implications of AI are far-reaching. The dominant role played by AI models in these domains has led to a growing concern regarding potential bias in these models, and a demand for model transparency and interpretability. In addition, model explainability is a prerequisite for building trust and adoption of AI systems in high stakes domains requiring reliability and safety such as healthcare and automated transportation, and critical industrial applications with significant economic implications such as predictive maintenance, exploration of natural resources, and climate change modeling.
As a consequence, AI researchers and practitioners have focused their attention on explainable AI to help them better trust and understand models at scale. The challenges for the research community include (i) defining model explainability, (ii) formulating explainability tasks for understanding model behavior and developing solutions for these tasks, and finally (iii) designing measures for evaluating the performance of models in explainability tasks.
In this tutorial, we present an overview of model interpretability and explainability in AI, key regulations / laws, and techniques / tools for providing explainability as part of AI/ML systems. Then, we focus on the application of explainability techniques in industry, wherein we present practical challenges / guidelines for effectively using explainability techniques and lessons learned from deploying explainable models for several web-scale machine learning and data mining applications. We present case studies across different companies, spanning application domains such as search & recommendation systems, hiring, sales, and lending. Finally, based on our experiences in industry, we identify open problems and research directions for the data mining / machine learning community.
Artificial Intelligence is increasingly playing an integral role in determining our day-to-day experiences. Moreover, with proliferation of AI based solutions in areas such as hiring, lending, criminal justice, healthcare, and education, the resulting personal and professional implications of AI are far-reaching. The dominant role played by AI models in these domains has led to a growing concern regarding potential bias in these models, and a demand for model transparency and interpretability. In addition, model explainability is a prerequisite for building trust and adoption of AI systems in high stakes domains requiring reliability and safety such as healthcare and automated transportation, and critical industrial applications with significant economic implications such as predictive maintenance, exploration of natural resources, and climate change modeling.
As a consequence, AI researchers and practitioners have focused their attention on explainable AI to help them better trust and understand models at scale. The challenges for the research community include (i) defining model explainability, (ii) formulating explainability tasks for understanding model behavior and developing solutions for these tasks, and finally (iii) designing measures for evaluating the performance of models in explainability tasks.
In this tutorial, we present an overview of model interpretability and explainability in AI, key regulations / laws, and techniques / tools for providing explainability as part of AI/ML systems. Then, we focus on the application of explainability techniques in industry, wherein we present practical challenges / guidelines for effectively using explainability techniques and lessons learned from deploying explainable models for several web-scale machine learning and data mining applications. We present case studies across different companies, spanning application domains such as search & recommendation systems, sales, lending, and fraud detection. Finally, based on our experiences in industry, we identify open problems and research directions for the data mining / machine learning community.
How do we protect privacy of users when building large-scale AI based systems? How do we develop machine learned models and systems taking fairness, accountability, and transparency into account? With the ongoing explosive growth of AI/ML models and systems, these are some of the ethical, legal, and technical challenges encountered by researchers and practitioners alike. In this talk, we will first motivate the need for adopting a "fairness and privacy by design" approach when developing AI/ML models and systems for different consumer and enterprise applications. We will then focus on the application of fairness-aware machine learning and privacy-preserving data mining techniques in practice, by presenting case studies spanning different LinkedIn applications (such as fairness-aware talent search ranking, privacy-preserving analytics, and LinkedIn Salary privacy & security design), and conclude with the key takeaways and open challenges.
Artificial Intelligence is increasingly playing an integral role in determining our day-to-day experiences. Moreover, with proliferation of AI based solutions in areas such as hiring, lending, criminal justice, healthcare, and education, the resulting personal and professional implications of AI are far-reaching. The dominant role played by AI models in these domains has led to a growing concern regarding potential bias in these models, and a demand for model transparency and interpretability. In addition, model explainability is a prerequisite for building trust and adoption of AI systems in high stakes domains requiring reliability and safety such as healthcare and automated transportation, as well as critical industrial applications with significant economic implications such as predictive maintenance, exploration of natural resources, and climate change modeling.
As a consequence, AI researchers and practitioners have focused their attention on explainable AI to help them better trust and understand models at scale. The challenges for the research community include (i) defining model explainability, (ii) formulating explainability tasks for understanding model behavior and developing solutions for these tasks, and finally (iii) designing measures for evaluating the performance of models in explainability tasks.
In this tutorial, we will first motivate the need for model interpretability and explainability in AI from societal, legal, customer/end-user, and model developer perspectives. [Note: Due to time constraints, we will not focus on techniques/tools for providing explainability as part of AI/ML systems.] Then, we will focus on the real-world application of explainability techniques in industry, wherein we present practical challenges / implications for using explainability techniques effectively and lessons learned from deploying explainable models for several web-scale machine learning and data mining applications. We will present case studies across different companies, spanning application domains such as search and recommendation systems, sales, lending, and fraud detection. Finally, based on our experiences in industry, we will identify open problems and research directions for the research community.
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (KD... - Krishnaram Kenthapadi
Researchers and practitioners from different disciplines have highlighted the ethical and legal challenges posed by the use of machine learned models and data-driven systems, and the potential for such systems to discriminate against certain population groups, due to biases in algorithmic decision-making systems. This tutorial presents an overview of algorithmic bias / discrimination issues observed over the last few years and the lessons learned, key regulations and laws, and evolution of techniques for achieving fairness in machine learning systems. We motivate the need for adopting a "fairness by design" approach (as opposed to viewing algorithmic bias / fairness considerations as an afterthought), when developing machine learning based models and systems for different consumer and enterprise applications. Then, we focus on the application of fairness-aware machine learning techniques in practice by presenting non-proprietary case studies from different technology companies. Finally, based on our experiences working on fairness in machine learning at companies such as Facebook, Google, LinkedIn, and Microsoft, we present open problems and research directions for the data mining / machine learning community.
[Video recording available at https://www.youtube.com/playlist?list=PLewjn-vrZ7d3x0M4Uu_57oaJPRXkiS221]
Fairness-aware Machine Learning: Practical Challenges and Lessons Learned (WW... - Krishnaram Kenthapadi
Researchers and practitioners from different disciplines have highlighted the ethical and legal challenges posed by the use of machine learned models and data-driven systems, and the potential for such systems to discriminate against certain population groups, due to biases in algorithmic decision-making systems. This tutorial presents an overview of algorithmic bias / discrimination issues observed over the last few years and the lessons learned, key regulations and laws, and evolution of techniques for achieving fairness in machine learning systems. We will motivate the need for adopting a "fairness by design" approach (as opposed to viewing algorithmic bias / fairness considerations as an afterthought), when developing machine learning based models and systems for different consumer and enterprise applications. Then, we will focus on the application of fairness-aware machine learning techniques in practice by presenting non-proprietary case studies from different technology companies. Finally, based on our experiences working on fairness in machine learning at companies such as Facebook, Google, LinkedIn, and Microsoft, we will present open problems and research directions for the data mining / machine learning community.
This talk provides an overview of privacy-preserving analytics and data mining systems at LinkedIn, highlighting the practical challenges/requirements, techniques, and lessons learned from deployment. The first part presents a framework to compute robust, privacy-preserving analytics, while the second part focuses on the privacy challenges/design for a large crowdsourced system (LinkedIn Salary). This presentation is an expanded version of the talk given at the Differential Privacy Deployed workshop, co-organized by Cynthia Dwork and held at Harvard / American Academy of Sciences in September, 2018.
3. Fairness Privacy
Transparency Explainability
Related WWW’19 sessions:
1. Tutorial: Designing Equitable Algorithms for the Web
2. Tutorial: Economic Theories of Distributive Justice for Fair Machine Learning
3. Tutorial: Socially Responsible NLP
4. Tutorial: Fairness-aware Machine Learning in Practice
5. Tutorial: Explainable Recommendation and Search
6. Workshop: FATE and Society on the Web
7. Session: Fairness, Credibility, and Search (Wednesday, 10:30 – 12:30)
8. Session: Privacy and Trust (Wednesday, 16:00 – 17:30)
9. Special Track: Designing an Ethical Web (Friday)
5. Outline / Learning Outcomes
• Privacy breaches and lessons learned
• Evolution of privacy techniques
• Differential privacy: definition and techniques
• Privacy techniques in practice: Challenges and Lessons Learned
• Google’s RAPPOR
• Apple’s differential privacy deployment for iOS
• Privacy in AI @ LinkedIn (Analytics framework & LinkedIn Salary)
• Key Takeaways
6. Privacy: A Historical Perspective
Evolution of Privacy Techniques and Privacy Breaches
7. Privacy Breaches and Lessons Learned
Attacks on privacy
•Governor of Massachusetts
•AOL
•Netflix
•Web browsing data
•Facebook
•Amazon
•Genetic data
8. born July 31, 1945
resident of 02138
Massachusetts Group Insurance Commission (1997):
Anonymized medical history of state employees (all hospital visits, diagnoses, prescriptions)
Latanya Sweeney (MIT grad student): $20 – Cambridge voter roll
William Weld vs Latanya Sweeney
9. 64% uniquely identifiable with ZIP + birth date + gender (in the US population)
Golle, “Revisiting the Uniqueness of Simple Demographics in the US Population”,
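The uniqueness claim can be checked on any table by counting how many records have a quasi-identifier combination that no other record shares. The following is a minimal sketch of that count; the function name `unique_fraction` and the four toy records are hypothetical and invented for illustration, and are not data from Golle's study.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination appears exactly once."""
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(keys) if keys else 0.0


# Hypothetical toy records, invented for illustration only.
records = [
    {"zip": "02138", "birth_date": "1945-07-31", "gender": "M"},
    {"zip": "02138", "birth_date": "1980-01-15", "gender": "F"},
    {"zip": "02138", "birth_date": "1980-01-15", "gender": "F"},
    {"zip": "02139", "birth_date": "1980-01-15", "gender": "F"},
]

fraction = unique_fraction(records, ["zip", "birth_date", "gender"])
gender_only = unique_fraction(records, ["gender"])
print(fraction, gender_only)
```

Each attribute alone identifies few people; it is the combination that isolates them, which is why removing names from a release is not enough.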
14. Oct 2006: Netflix announces Netflix Prize
• 10% of their users
• average 200 ratings per user
Narayanan, Shmatikov (2006):
Netflix Prize
15. Deanonymizing Netflix Data
Narayanan, Shmatikov, Robust De-anonymization of Large Datasets (How to Break Anonymity of the Netflix Prize Dataset), 2008
16. ● Noam Chomsky in Our Times
● Fahrenheit 9/11
● Jesus of Nazareth
● Queer as Folk
17. Key idea:
● Similar intuition as the attack on medical records
● Medical records: Each person can be identified based on a combination of a few attributes
● Web browsing history: Browsing history is unique for each person
● Each person has a distinctive social network, so the set of links appearing in one’s feed is unique
● Users are more likely than a random user to visit links in their feed
● “Browsing histories contain tell-tale marks of identity”
Su et al, De-anonymizing Web Browsing Data with Social Networks, 2017
De-anonymizing Web Browsing Data with Social Networks
20. 10 campaigns targeting 1 person (zip code, gender, workplace, alma mater)
Korolova, “Privacy Violations Using Microtargeted Ads: A Case Study”, PADM
Facebook vs Korolova
Age | Ad Impressions in a week
21 | 0
22 | 0
23 | 8
… | …
30 | 0
21. 10 campaigns targeting 1 person (zip code, gender, workplace, alma mater)
Korolova, “Privacy Violations Using Microtargeted Ads: A Case Study”, PADM
Facebook vs Korolova
Interest | Ad Impressions in a week
A | 0
B | 0
C | 8
… | …
Z | 0
22. ● Context: Microtargeted Ads
● Takeaway: Attackers can instrument ad campaigns to
identify individual users.
● Two types of attacks:
○ Inference from Impressions
○ Inference from Clicks
Facebook vs Korolova: Recap
24. Items frequently bought together
Bought: A B C D E
Z: Customers Who Bought This Item Also Bought
Calandrino, Kilzer, Narayanan, Felten, Shmatikov, “You Might Also Like: Privacy Risks of Collaborative
Attacking Amazon.com
A C D E
26. Homer et al., “Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays”, PLoS Genetics, 2008
Genetic data
29. “In all mixtures, the identification
of the presence of a person’s
genomic DNA was possible.”
30. Zerhouni, NIH Director:
“As a result, the NIH has removed from
open-access databases the aggregate
results (including P values and genotype
counts) for all the GWAS that had been
available on NIH sites”
… one week later
33. Dinur-Nissim
Data: 0 1 1 0 1 0 0 0 1 1 0 1
query: 𝚺
Dinur-Nissim 2003:
If the error is o(√n), then all but o(n) of the n entries can be reconstructed
...even if 23.9% of errors are arbitrary [DMT07]
...even with O(n) queries [DY08]
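The reconstruction idea can be tried on a tiny database. The sketch below is the brute-force variant that asks all 2^n subset-sum queries, not the efficient attack behind the o(√n) bound; the names `noisy_subset_sums` and `reconstruct`, and the parameters n = 10 and error bound 1, are assumptions chosen for illustration. If every answer is off by at most E, any candidate consistent with all answers differs from the true data in at most 4E positions: query the set of positions where the candidate has 0 and the truth has 1 (and vice versa), and each such set can have at most 2E elements.

```python
import random


def popcount(x):
    """Number of 1-bits in a non-negative integer."""
    return bin(x).count("1")


def noisy_subset_sums(secret_bits, n, max_error, rng):
    """Answer every subset-sum query on the secret bit vector with bounded noise.

    secret_bits is an n-bit integer. Query mask S asks how many 1-bits of the
    secret fall inside S; each answer is off by at most max_error.
    """
    return [
        popcount(secret_bits & mask) + rng.randint(-max_error, max_error)
        for mask in range(1 << n)
    ]


def reconstruct(answers, n, max_error):
    """Return the first candidate database consistent with all noisy answers."""
    for candidate in range(1 << n):
        if all(
            abs(popcount(candidate & mask) - answers[mask]) <= max_error
            for mask in range(1 << n)
        ):
            return candidate
    return None  # unreachable when the answers respect the error bound


n, max_error = 10, 1
rng = random.Random(0)
secret = rng.getrandbits(n)
answers = noisy_subset_sums(secret, n, max_error, rng)
guess = reconstruct(answers, n, max_error)
wrong_bits = popcount(secret ^ guess)
print(f"{wrong_bits} of {n} bits reconstructed incorrectly")
```

The bound of 4E wrong bits does not grow with n, so for large databases bounded noise leaves almost every entry exposed.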
34. Dwork-Naor
Tore Dalenius desideratum (aka “semantic security”):
“Access to a statistical database should not enable one to
learn anything about an individual that could not be learned
without access.” (1977)
Dwork-Naor (~2006):
If the database teaches us anything, there is always some auxiliary
information that breaks the Dalenius desideratum.
39. Differential Privacy
Databases D and D′ are neighbors if they differ in one person’s data.
Differential Privacy: The distribution of the curator’s output M(D) on database D is (nearly) the same as M(D′).
[Figure: two curators, one holding the database with your data (+ your data), the other without it (- your data)]
Dwork, McSherry, Nissim, Smith [TCC 2006]
40. Differential Privacy
ε-Differential Privacy: The distribution of the curator’s output M(D) on database D is (nearly) the same as M(D′):
∀S: Pr[M(D)∊S] ≤ exp(ε) ∙ Pr[M(D′)∊S].
Parameter ε quantifies information leakage
[Figure: two curators, one holding the database with your data (+ your data), the other without it (- your data)]
Dwork, McSherry, Nissim, Smith [TCC 2006]
41. Differential Privacy
(ε, 𝛿)-Differential Privacy: The distribution of the curator’s output M(D) on database D is (nearly) the same as M(D′):
∀S: Pr[M(D)∊S] ≤ exp(ε) ∙ Pr[M(D′)∊S]+𝛿.
Parameter ε quantifies information leakage
Parameter 𝛿 gives some slack
[Figure: two curators, one holding the database with your data (+ your data), the other without it (- your data)]
Dwork, Kenthapadi, McSherry, Mironov, Naor [EUROCRYPT 2006]
42. [Figure: output distributions of f(D) and f(D′); legend: bad outcomes, probability with record x, probability without record x]
“Bad Outcomes” Interpretation
43. ● Prior on databases p
● Observed output O
● Does the database contain record x?
Bayesian Interpretation
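The three bullets above can be tied together in one line of arithmetic. The block below is a sketch under simplifying assumptions that are not stated on the slide: the adversary has narrowed the possibilities to two neighboring databases, D (containing record x) and D′ (not containing it), M has a discrete output space, and M is ε-differentially private. Under those assumptions, observing O moves the adversary's odds by at most a factor of exp(ε).

```latex
\[
\frac{\Pr[D \mid O]}{\Pr[D' \mid O]}
  = \frac{\Pr[M(D) = O]}{\Pr[M(D') = O]} \cdot \frac{p(D)}{p(D')}
  \le e^{\varepsilon} \cdot \frac{p(D)}{p(D')}
\]
```

The first equality is Bayes' rule with the prior p; the inequality is the ε-differential privacy guarantee applied to the single output O.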
44. Differential Privacy
● Robustness to auxiliary data
● Post-processing:
If M(D) is differentially private, so is f(M(D)).
● Composability:
Run two ε-DP mechanisms. Full interaction is 2ε-DP.
● Group privacy:
Graceful degradation in the presence of correlated inputs.
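Composability is what makes a "privacy budget" workable in practice: each ε-DP release is charged against a fixed total. Below is a minimal sketch of such bookkeeping under basic sequential composition; the class `PrivacyBudget` and the helper `group_epsilon` are hypothetical names, and the group bound shown (k·ε for a group of k people under pure ε-DP) is the standard one rather than something stated on the slide.

```python
class PrivacyBudget:
    """Tracks cumulative epsilon under basic (sequential) composition."""

    def __init__(self, total_epsilon):
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    def charge(self, epsilon):
        """Record one epsilon-DP release and return the remaining budget."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so floating-point rounding does not reject an exact fit.
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise RuntimeError("privacy budget exceeded")
        self.spent += epsilon
        return self.total_epsilon - self.spent


def group_epsilon(epsilon, group_size):
    """Guarantee for a group of k people under a pure epsilon-DP mechanism."""
    return epsilon * group_size


budget = PrivacyBudget(total_epsilon=1.0)
budget.charge(0.5)              # first epsilon-DP query
remaining = budget.charge(0.5)  # second query: the full interaction is 1.0-DP
print(remaining)
```

Post-processing needs no entry in the ledger: any function applied to an already-released output costs nothing further.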
45. Differential Privacy: Laplace Mechanism
If the ℓ1-sensitivity of f: D → ℝ^n satisfies
$\max_{D,D'} \|f(D) - f(D')\|_1 < 1$,
then the Laplace mechanism
$f(D) + \mathrm{Laplace}^n(1/\varepsilon)$
offers ε-differential privacy.
Dwork, McSherry, Nissim, Smith, “Calibrating Noise to Sensitivity in Private Data Analysis”, TCC 2006
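A minimal sketch of the Laplace mechanism for a counting query, using only the Python standard library. The function names and the toy `ages` data are assumptions for illustration; the `sensitivity` parameter generalizes the slide's unit-sensitivity case by scaling the noise to sensitivity/ε.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_answer, epsilon, sensitivity=1.0, rng=None):
    """Add independent Laplace(sensitivity / epsilon) noise to each coordinate."""
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return [x + laplace_noise(scale, rng) for x in true_answer]


ages = [34, 51, 29, 62, 45]                      # toy database, one record per person
true_count = sum(age > 40 for age in ages)       # counting query: l1-sensitivity 1
noisy_count = laplace_mechanism([true_count], epsilon=0.5, rng=random.Random(0))[0]
```

Smaller ε means a larger noise scale, so the analyst trades accuracy for privacy directly through this one parameter.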
46. Differential Privacy: Gaussian Mechanism
If the ℓ2-sensitivity of f: D → ℝ^n satisfies
max_{D,D′} ||f(D) − f(D′)||_2 ≤ 1,
then the Gaussian mechanism
f(D) + N^n(0, σ²)
offers (ε, δ)-differential privacy, where δ = ⅘·exp(−(εσ)²/2).
Dwork, Kenthapadi, McSherry, Mironov, Naor, “Our Data, Ourselves”, Eurocrypt 2006
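To pick σ for a target (ε, δ), the relation on this slide can be inverted: σ = √(2 ln(4/(5δ))) / ε. The sketch below takes that relation at face value for a unit-sensitivity query; the function names are hypothetical, and other calibrations of the Gaussian mechanism exist.

```python
import math
import random


def gaussian_sigma(epsilon: float, delta: float) -> float:
    # Invert delta = (4/5) * exp(-(epsilon * sigma)^2 / 2) for sigma.
    if not 0.0 < delta < 0.8:
        raise ValueError("delta must lie in (0, 0.8) for this relation")
    return math.sqrt(2.0 * math.log(0.8 / delta)) / epsilon


def gaussian_mechanism(true_answer, epsilon, delta, rng=None):
    """Add independent N(0, sigma^2) noise to each coordinate (unit l2-sensitivity assumed)."""
    rng = rng or random.Random()
    sigma = gaussian_sigma(epsilon, delta)
    return [x + rng.gauss(0.0, sigma) for x in true_answer]


sigma = gaussian_sigma(epsilon=0.5, delta=1e-5)
noisy = gaussian_mechanism([10.0, 20.0], epsilon=0.5, delta=1e-5, rng=random.Random(0))
```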
47. What Differential Privacy Isn’t
● Algorithm, architecture, or a rule book
● Secure Computation: what not how
● All-encompassing guarantee: trends may be
sensitive too
50. Differential Privacy: Takeaway points
• Privacy as a notion of stability of randomized algorithms with
respect to small perturbations in their input
• Worst-case definition
• Robust (to auxiliary data, correlated inputs)
• Composable
• Quantifiable
• Concept of a privacy budget
• Noise injection
57. Differential Privacy
ε-Differential Privacy: The distribution of the output M(D) on database
D is (nearly) the same as M(D′) for all adjacent databases D and D′:
∀S: Pr[M(D)∊S] ≤ exp(ε) ∙ Pr[M(D′)∊S].
58. Local Differential Privacy
ε-Differential Privacy: The distribution of the output M(D) on database
D is (nearly) the same as M(D′) for all adjacent databases D and D′:
∀S: Pr[M(D)∊S] ≤ exp(ε) ∙ Pr[M(D′)∊S].
59. Local-Differentially Private Mechanisms
● Stanley L. Warner, "Randomized response: a survey technique for
eliminating evasive answer bias", Journal of American Statistical
Association, March 1965.
● Arijit Chaudhuri, Rahul Mukerjee. Randomized
Response. Theory and Techniques. 1988.
60. Randomized Response (Warner 1965)
Q1: Are you a citizen of the United States?
Q2: Are you not a citizen of the United States?
𝜃 - the true fraction of citizens in the sample
Answer Q1 with probability p; answer Q2 with probability 1 − p
ε-DP with ε = ln(p / (1 − p)), for p > ½
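A small simulation of Warner's design, as a sketch: each respondent secretly picks Q1 with probability p (else Q2) and answers truthfully, and the surveyor inverts Pr[yes] = p𝜃 + (1 − p)(1 − 𝜃) to estimate 𝜃. Function names and the simulated population are assumptions for illustration.

```python
import math
import random


def randomized_response(is_citizen: bool, p: float, rng: random.Random) -> bool:
    """Return the yes/no answer to a secretly chosen question."""
    if rng.random() < p:
        return is_citizen          # truthful answer to Q1
    return not is_citizen          # truthful answer to Q2


def estimate_fraction(answers, p: float) -> float:
    # Pr[yes] = p*theta + (1-p)*(1-theta)  =>  theta = (yes_rate - (1-p)) / (2p - 1)
    yes_rate = sum(answers) / len(answers)
    return (yes_rate - (1.0 - p)) / (2.0 * p - 1.0)


def epsilon_of(p: float) -> float:
    # Worst-case likelihood ratio between the two true answers is p / (1 - p).
    return abs(math.log(p / (1.0 - p)))


rng = random.Random(0)
p, theta = 0.75, 0.3
population = [rng.random() < theta for _ in range(100_000)]
answers = [randomized_response(person, p, rng) for person in population]
theta_hat = estimate_fraction(answers, p)
```

With p = ¾ the mechanism is ln(3)-DP; pushing p toward ½ strengthens privacy but inflates the variance of the estimate through the 1/(2p − 1) factor.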
62. RAPPOR: two-level randomized response
Can we do repeated surveys of sensitive attributes?
— Average of randomized responses will reveal a user’s true answer :-(
Solution: Memoize! Re-use the same random answer
— Memoization can hurt privacy too! Long, random bit sequence can
be a unique tracking ID :-(
Solution: Use 2-levels! Randomize the memoized response
63. RAPPOR: two-level randomized response
● Store client value v into bloom filter B using hash functions
● Memoize a Permanent Randomized Response (PRR) B′
● Report an Instantaneous Randomized Response (IRR) S
64. RAPPOR: two-level randomized response
● Store client value v into bloom filter B using hash functions
● Memoize a Permanent Randomized Response (PRR) B′
● Report an Instantaneous Randomized Response (IRR) S
f = ½
q = ¾ , p = ½
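The three steps can be sketched as follows, with f, p, q as on this slide. The Bloom filter size (128 bits, 2 hash functions) and the SHA-256-based hashing are assumptions made for illustration, as are all names; this is not Chrome's actual implementation.

```python
import hashlib
import random

NUM_BITS, NUM_HASHES = 128, 2      # assumed Bloom filter shape
F, P, Q = 0.5, 0.5, 0.75           # f, p, q from the slide


def bloom_bits(value: str):
    """Set NUM_HASHES bits of an empty filter for `value` (hypothetical hash choice)."""
    bits = [0] * NUM_BITS
    for i in range(NUM_HASHES):
        digest = hashlib.sha256(f"{i}:{value}".encode()).digest()
        bits[int.from_bytes(digest[:4], "big") % NUM_BITS] = 1
    return bits


def permanent_rr(bits, rng):
    # Each bit becomes 1 w.p. f/2, 0 w.p. f/2, and keeps its true value w.p. 1 - f.
    out = []
    for b in bits:
        u = rng.random()
        out.append(1 if u < F / 2 else 0 if u < F else b)
    return out


def instantaneous_rr(prr, rng):
    # Report 1 w.p. q where the memoized bit is 1, and w.p. p where it is 0.
    return [int(rng.random() < (Q if b else P)) for b in prr]


class RapporClient:
    def __init__(self, seed: int) -> None:
        self.rng = random.Random(seed)
        self.memo = {}             # value -> memoized PRR

    def report(self, value: str):
        if value not in self.memo:
            self.memo[value] = permanent_rr(bloom_bits(value), self.rng)
        return instantaneous_rr(self.memo[value], self.rng)


client = RapporClient(seed=1)
first = client.report("www.google.com")
second = client.report("www.google.com")   # same PRR, fresh IRR randomness
```

Memoizing the PRR caps what repeated reports can reveal, while the fresh IRR randomness keeps the reported bit string from acting as a stable identifier.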
65. RAPPOR: Life of a report
Value (“www.google.com”) → Bloom Filter → PRR → IRR
68. Differential privacy of RAPPOR
● Permanent Randomized Response satisfies differential privacy at ε = 4 ln(3)
● Instantaneous Randomized Response has differential privacy at ε = ln(3)
69. Differential Privacy of RAPPOR:
Measurable privacy bounds
Each report offers differential privacy with
ε = ln(3)
Attacker’s guess goes from 0.1% → 0.3% in the worst case
Differential privacy even if attacker gets all reports (infinite data!!!)
Also… Base Rate Fallacy prevents attackers from finding needles in
haystacks
70. Cohorts
Bloom Filter: 2 bits out of 128 — too many false positives
[Figure: user 0xA0FE91B76 (value google.com) assigned to one of cohorts 1, 2, ..., 128; hash function h2 shown]
72. From Raw Counts to De-noised Counts
[Figure: true bit counts (with no noise) vs. de-noised RAPPOR reports]
73. From De-Noised Count to Distribution
[Figure: true bit counts (with no noise) vs. de-noised RAPPOR reports, decomposed into candidate strings google.com, yahoo.com, bing.com]
74. From De-Noised Count to Distribution
Linear Regression:
min_X ||B − A X||_2
LASSO:
min_X ||B − A X||_2² + λ||X||_1
Hybrid:
1. Find support of X via LASSO
2. Solve linear regression to find weights
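The hybrid decoding can be sketched in pure Python with coordinate descent: a LASSO pass selects the support, then an unpenalized pass over the selected columns refits the weights. The toy matrix A (columns standing in for candidate strings' bit patterns), the vector B, and λ = 20 are invented for illustration; a real deployment would use a proper solver.

```python
def soft_threshold(z: float, t: float) -> float:
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def coordinate_descent(A, b, lam, active, sweeps=200):
    """Minimize ||b - A x||_2^2 + lam * ||x||_1 over the columns listed in `active`."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    residual = list(b)                               # equals b - A x, with x = 0 initially
    col_sq = [sum(A[i][j] ** 2 for i in range(m)) for j in range(n)]
    for _ in range(sweeps):
        for j in active:
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes x[j].
            rho = sum(A[i][j] * (residual[i] + A[i][j] * x[j]) for i in range(m))
            new_xj = soft_threshold(rho, lam / 2.0) / col_sq[j]
            delta = new_xj - x[j]
            if delta:
                for i in range(m):
                    residual[i] -= A[i][j] * delta
                x[j] = new_xj
    return x


def hybrid_decode(A, b, lam):
    n = len(A[0])
    lasso_x = coordinate_descent(A, b, lam, range(n))          # step 1: find support
    support = [j for j in range(n) if abs(lasso_x[j]) > 1e-9]
    return coordinate_descent(A, b, 0.0, support)              # step 2: least-squares refit


A = [[1, 0, 0], [1, 1, 0], [0, 1, 0], [0, 0, 1], [0, 1, 1], [1, 0, 1]]
B = [103, 98, 2, 41, 37, 141]        # noisy bit counts; true weights were (100, 0, 40)
weights = hybrid_decode(A, B, lam=20.0)
```

The LASSO pass shrinks the surviving weights toward zero, which is why the refit without the penalty is run afterwards.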
82. Google Chrome Privacy White Paper
https://www.google.com/chrome/browser/privacy/whitepaper.html
Phishing and malware protection
Google Chrome includes an optional feature called "Safe Browsing" to help protect you against phishing and malware attacks. This
helps prevent evil-doers from tricking you into sharing personal information with them (“phishing”) or installing malicious software
on your computer (“malware”). The approach used to accomplish this was designed specifically to protect your privacy and is also
used by other popular browsers.
If you'd rather not send any information to Safe Browsing, you can also turn these features off. Please be aware that Chrome will no
longer be able to protect you from websites that try to steal your information or install harmful software if you disable this feature.
We really don't recommend turning it off.
…
If a URL was indeed dangerous, Chrome reports this anonymously to Google to improve Safe Browsing. The data sent is randomized,
constructed in a manner that ensures differential privacy, permitting only monitoring of aggregate statistics that apply to tens of
thousands of users at minimum. The reports are an instance of Randomized Aggregatable Privacy-Preserving Ordinal Responses,
whose full technical details have been published in a technical report and presented at the 2014 ACM Computer and Communications
Security conference. This means that Google cannot infer which website you have visited from this.
88. Follow-up
- Bassily, Smith, “Local, Private, Efficient Protocols for Succinct
Histograms,” STOC 2015
- Kairouz, Bonawitz, Ramage, “Discrete Distribution Estimation under
Local Privacy”, https://arxiv.org/abs/1602.07387
- Qin et al., “Heavy Hitter Estimation over Set-Valued Data with Local
Differential Privacy”, CCS 2016
89. Key takeaway points
RAPPOR - locally differentially-private mechanism for reporting of
categorical and string data
● First Internet-scale deployment of differential privacy
● Explainable
● Conservative
● Open-sourced
96. Roadmap
1. Private frequency estimation with count-min-sketch
2. Private heavy hitters with puzzle piece algorithm
3. Private heavy hitters with tree histogram protocol
98. Private frequency oracle
Building block for private heavy hitters
[Figure: clients holding data d1, d2, …, dn; histogram of frequency over words (𝒮), e.g., frequency("phablet")]
All errors within 𝛾 = O(√(n log|𝒮|))
99. Private frequency oracle:
Design constraints
Computational and communication constraints:
Client side:
Logarithm in size of the domain (|S|) and n
Communication to server:
very few bits
Server-side cost for one query:
size of the domain (|S|) and n
100. Private frequency oracle:
Design constraints
Computational and communication constraints:
Client side:
size of the domain (|S|) and n
# characters > 3,000
For 8-character words:
size of the domain |S|=3,000^8
number of clients ~ 1B
Efficiently [BS15] ~ n
Our goal ~ O(log |S|)
101. Private frequency oracle:
Design constraints
Computational and communication constraints:
Client side:
O(log |S|)
Communication to server:
O(1) bits
Server-side cost for one query:
O(log |S|)
102. Private frequency oracle
A starter solution: Randomized response
[Figure: client datum d encoded as a one-hot bit vector (0 1 0, position i); randomized response yields d′ (1 0 1)]
Protects ε-differential privacy (with the right bias)
103. Private frequency oracle
A starter solution: Randomized response
[Figure: randomized reports (1 0 0, 1 1 0, 1 0 1) summed, with bias correction, into frequency estimates for all domain elements]
Error in each estimate: Θ(√(n log|𝒮|))
Optimal error under privacy
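A sketch of this starter solution on a tiny domain: each client one-hot encodes its item, flips every bit independently, and the server sums the reports and removes the bias. The flip probability 1/(1 + e^(ε/2)) is one choice of "the right bias" (two one-hot vectors differ in two bits, giving ε-DP overall); names and data are illustrative assumptions.

```python
import math
import random


def privatize_one_hot(index: int, domain_size: int, epsilon: float, rng: random.Random):
    # Flip each bit independently with probability 1 / (1 + e^(eps/2)).
    flip = 1.0 / (1.0 + math.exp(epsilon / 2.0))
    return [int((i == index) != (rng.random() < flip)) for i in range(domain_size)]


def estimate_counts(reports, epsilon: float):
    n = len(reports)
    flip = 1.0 / (1.0 + math.exp(epsilon / 2.0))
    sums = [sum(column) for column in zip(*reports)]
    # E[sum_j] = count_j * (1 - flip) + (n - count_j) * flip; solve for count_j.
    return [(s - n * flip) / (1.0 - 2.0 * flip) for s in sums]


rng = random.Random(0)
data = [0] * 10_000 + [1] * 6_000 + [2] * 4_000     # item indices in a domain of size 4
reports = [privatize_one_hot(d, 4, epsilon=2.0, rng=rng) for d in data]
estimates = estimate_counts(reports, epsilon=2.0)
```

Every report is as long as the domain, which is exactly the O(|S|) client cost that the sketching approach is meant to remove.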
104. Private frequency oracle
A starter solution: Randomized response
Computational and communication constraints:
Client side:
O(|S|)
Communication to server:
O(|S|) bits
Server-side cost for one query:
O(1)
108. Private frequency oracle
Private count-min sketch
Making client computation differentially private
[Figure: client datum d encoded as 𝑘 rows of bits, each row randomized]
𝑘𝜖-diff. private, since 𝑘 pieces of information
109. Private frequency oracle
Private count-min sketch
[Figure: client datum d; a sampled row of bits is reported]
Theorem: Sampling ensures 𝜖-differential privacy without hurting accuracy;
rather, it improves accuracy by a factor of 𝑘
111. Private frequency oracle
Private count-min sketch
Reducing client communication
[Figure: sampled row of bits mapped by the Hadamard transform to ±1 values]
Communication: 𝑂(1) bit
Theorem: Hadamard transform and sampling do not hurt accuracy
112. Private frequency oracle
Private count-min sketch
Computational and communication constraints:
Client side:
O(log |S|)
Communication to server:
O(1) bits
Server-side cost for one query:
O(log |S|)
Error in each estimate:
O(√(n log|𝒮|))
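A simplified sketch of a private sketch-based frequency oracle, loosely following the public description in Apple's "Learning with Privacy at Scale" (cited in the references): the client samples one of k rows, privatizes a ±1 one-hot vector for that row, and the server debiases and averages. It omits the Hadamard step, averages rows rather than taking a minimum, and the sizes, hash function, and names are all assumptions for illustration.

```python
import hashlib
import math
import random

K, M = 8, 64          # k sketch rows (hash functions) and m buckets per row: toy sizes


def bucket(word: str, row: int) -> int:
    digest = hashlib.sha256(f"{row}:{word}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % M


def client_report(word: str, epsilon: float, rng: random.Random):
    row = rng.randrange(K)            # sample ONE row, so the report stays eps-DP
    flip = 1.0 / (1.0 + math.exp(epsilon / 2.0))
    hot = bucket(word, row)
    vec = [(1 if i == hot else -1) * (-1 if rng.random() < flip else 1)
           for i in range(M)]
    return row, vec


def build_sketch(reports, epsilon: float):
    c = (math.exp(epsilon / 2.0) + 1.0) / (math.exp(epsilon / 2.0) - 1.0)
    sketch = [[0.0] * M for _ in range(K)]
    for row, vec in reports:
        for i, v in enumerate(vec):
            # Debias: expectation is K at the hashed bucket and 0 elsewhere.
            sketch[row][i] += K * (c * v / 2.0 + 0.5)
    return sketch


def estimate(sketch, word: str, n: int) -> float:
    mean = sum(sketch[row][bucket(word, row)] for row in range(K)) / K
    return (M / (M - 1.0)) * (mean - n / M)      # remove expected hash collisions


rng = random.Random(0)
words = ["phablet"] * 2000 + [f"word{i}" for i in range(300) for _ in range(10)]
sketch = build_sketch([client_report(w, 4.0, rng) for w in words], epsilon=4.0)
phablet_estimate = estimate(sketch, "phablet", len(words))
```

In this version the client still sends m values; the Hadamard transform on the slides is what reduces that to a single bit.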
113. Roadmap
1. Private frequency estimation with count-min-sketch
2. Private heavy hitters with puzzle piece algorithm
3. Private heavy hitters with tree histogram protocol
114. Private heavy hitters:
Using the frequency oracle
Private frequency oracle
Private count-min sketch
Domain 𝒮
Too many elements in 𝒮 to search.
Element s in S
Frequency(s)
Find all s in S with
frequency > γ
115. Roadmap
1. Private frequency estimation with count-min-sketch
2. Private heavy hitters with puzzle piece algorithm
3. Private heavy hitters with tree histogram protocol
116. Puzzle piece algorithm
(works well in practice, no theoretical guarantees)
[Bassily Nissim Stemmer Thakurta, 2017 and Apple differential privacy team, 2017]
117. Private heavy hitters
Observation: If a word is frequent, its bigrams are frequent too.
Ph ab le t$ Frequency > 𝛾
Each bi-gram frequency > 𝛾
118. Private heavy hitters
Natural algorithm: Cartesian product of frequent bi-grams
Sanitized bi-grams, and the complete word
Frequent bi-grams:
Position P1: ab, ad, ph
Position P2: ba, ab, ax
Position P3: le, ab, ab
Position P4: le, ab, t$
119. Private heavy hitters
Natural algorithm: Cartesian product of frequent bi-grams
Frequent bi-grams:
Position P1: ab, ad, ph
Position P2: ba, ab, ax
Position P3: le, ab, ab
Position P4: le, ab, t$
Candidate words: P1 x P2 x P3 x P4
Candidate words → private frequency oracle (private count-min sketch) → find frequent words
120. Private heavy hitters
Natural algorithm: Cartesian product of frequent bi-grams
Candidate words: P1 x P2 x P3 x P4
Candidate words → private frequency oracle (private count-min sketch) → find frequent words
Combinatorial explosion: in practice, all bi-grams are frequent
121. Puzzle piece algorithm
h = Hash(Phablet), where Hash: 𝒮 → {1, … , ℓ}
Phablet ≜ (Ph, h), (ab, h), (le, h), (t$, h)
Privatized bi-grams tagged with the hash, and the complete word
122. Puzzle piece algorithm: Server side
Frequent bi-grams tagged with {1, … , ℓ}:
Position P1: (ab, 1), (ad, 5), (Ph, 3)
Position P2: (ba, 4), (ab, 3), (ax, 9)
Position P3: (le, 3), (le, 7), (ab, 1)
Position P4: (le, 1), (ab, 9), (t$, 3)
Combine only matching bi-grams
Candidate words: P1 x P2 x P3 x P4
Candidate words → private frequency oracle (private count-min sketch) → find frequent words
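The server-side combination step can be sketched as follows, using the tagged bi-grams shown on this slide. Only pieces carrying the same hash tag are joined, which prunes the Cartesian product before candidates go to the frequency oracle; the function name is hypothetical and the privatization of the reports is left out.

```python
from itertools import product


def candidate_words(frequent_by_position):
    """frequent_by_position: one list per position of (bigram, tag) pairs."""
    candidates = set()
    tags = {tag for _, tag in frequent_by_position[0]}
    for tag in tags:
        pieces = [[bigram for bigram, t in position if t == tag]
                  for position in frequent_by_position]
        # product() yields nothing if any position has no bi-gram with this tag.
        for combo in product(*pieces):
            candidates.add("".join(combo))
    return candidates


positions = [
    [("ab", 1), ("ad", 5), ("Ph", 3)],
    [("ba", 4), ("ab", 3), ("ax", 9)],
    [("le", 3), ("le", 7), ("ab", 1)],
    [("le", 1), ("ab", 9), ("t$", 3)],
]
matched = candidate_words(positions)
naive = len(list(product(*positions)))    # size of the untagged Cartesian product
```

Here the untagged product has 81 candidates, while tag matching leaves a single one to check against the frequency oracle.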
123. Roadmap
1. Private frequency estimation with count-min-sketch
2. Private heavy hitters with puzzle piece algorithm
3. Private heavy hitters with tree histogram protocol
125. Private heavy hitters:
Tree histograms (based on [CM05])
Any string in 𝒮 is represented with log |𝒮| bits (e.g., 1 0 0)
Idea: Construct prefixes of the heavy hitter bit by bit
127. Private heavy hitters:
Tree histograms
Level 1: Frequent prefix of length 1 (candidates: 0, 1)
Use private frequency oracle
If a string is a heavy hitter, its prefixes are too.
129. Private heavy hitters:
Tree histograms
Level 2: Frequent prefix of length two (candidates: 00, 01, 10, 11)
Idea: Each level has ≈ √n heavy hitters
130. Private heavy hitters:
Tree histograms
Computational and communication constraints:
Client side:
O(log |S|)
Communication to server:
O(1) bits
Server-side computation:
O(n log |S|)
Theorem: Finds all heavy hitters with frequency at least
O(√(n log|𝒮|))
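The level-by-level search can be sketched as below. To keep the example short, the frequency oracle is a stand-in that returns exact prefix counts; in the protocol it would be the private frequency oracle, and the threshold would be set relative to its error. All names and the toy data are assumptions.

```python
def tree_histogram(frequency, num_bits: int, threshold: float):
    """Grow frequent prefixes one bit at a time; `frequency(prefix)` estimates a prefix count."""
    frontier = [""]
    for _ in range(num_bits):
        frontier = [prefix + bit
                    for prefix in frontier
                    for bit in "01"
                    if frequency(prefix + bit) >= threshold]
    return frontier


# Stand-in oracle with exact counts (NOT private), for illustration only.
data = ["100"] * 50 + ["101"] * 30 + ["010"] * 5 + ["111"] * 2


def exact_frequency(prefix: str) -> int:
    return sum(s.startswith(prefix) for s in data)


heavy_hitters = tree_histogram(exact_frequency, num_bits=3, threshold=20)
```

Because infrequent prefixes are pruned at every level, the server queries the oracle for only a few candidates per level instead of the whole domain.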
131. Key takeaway points
• Keeping local differential privacy constant:
• One low-noise report is better than many noisy ones
• Weak signal with probability 1 is better than strong signal with small probability
• We can learn the dictionary – at a cost
• Longitudinal privacy remains a challenge
133. Microsoft: Discretization of continuous variables
"These guarantees are particularly strong when user’s behavior remains
approximately the same, varies slowly, or varies around a small number of
values over the course of data collection."
134. Microsoft's deployment
"Our mechanisms have been deployed by
Microsoft across millions of devices ... to protect
users’ privacy while collecting application usage
statistics."
B. Ding, J. Kulkarni, S. Yekhanin, NeurIPS 2017
136. Privacy in AI @ LinkedIn
• Framework to compute robust, privacy-preserving analytics
• Privacy challenges/design for a large crowdsourced system (LinkedIn Salary)
137. Analytics & Reporting Products at LinkedIn
Profile View Analytics
Content Analytics
Ad Campaign Analytics
All showing demographics of members engaging with the product
138. Analytics & Reporting Products at LinkedIn
• Admit only a small # of predetermined query types
• Querying for the number of member actions, for a specified time period,
together with the top demographic breakdowns
139. Analytics & Reporting Products at LinkedIn
• Admit only a small # of predetermined query types
• Querying for the number of member actions (e.g., clicks on a given ad), for a specified time period,
together with the top demographic breakdowns (e.g., Title = “Senior Director”)
140. Privacy Requirements
• Attacker cannot infer whether a member performed an action
• E.g., click on an article or an ad
• Attacker may use auxiliary knowledge
• E.g., knowledge of attributes associated with the target member (say,
obtained from this member’s LinkedIn profile)
• E.g., knowledge of all other members that performed similar action
141. Possible Privacy Attacks
Targeting: Senior directors in US, who studied at Cornell
Matches ~16k LinkedIn members → over minimum targeting threshold
Demographic breakdown: Company = X
May match exactly one person → can determine whether the person clicks on the ad or not
Require minimum reporting threshold
Still amenable to attacks (refer to our ACM CIKM’18 paper for details)
Rounding mechanism
E.g., report in increments of 10
Still amenable to attacks, e.g., using incremental counts over time to infer individuals’ actions
Need rigorous techniques to preserve member privacy (not reveal exact aggregate counts)
142. Key Product Desiderata
• Coverage & Utility
• Data Consistency
• for repeated queries
• over time
• between total and breakdowns
• across entity/action hierarchy
• for top k queries
143. Problem Statement
Compute robust, reliable analytics in a privacy-preserving manner,
while addressing the product desiderata such as coverage, utility, and consistency.
144. Differential Privacy: Random Noise Addition
If the ℓ1-sensitivity of f: D → ℝ^n is
max_{D,D′} ||f(D) − f(D′)||_1 = s,
then adding Laplacian noise to the true output,
f(D) + Laplace^n(s/ε),
offers ε-differential privacy.
Dwork, McSherry, Nissim, Smith, “Calibrating Noise to Sensitivity in Private Data Analysis”, TCC 2006
145. PriPeARL: A Framework for Privacy-Preserving Analytics
K. Kenthapadi, T. T. L. Tran, ACM CIKM 2018
Pseudo-random noise generation, inspired by differential privacy
Inputs:
● Entity id (e.g., ad creative/campaign/account)
● Demographic dimension
● Stat type (impressions, clicks)
● Time range
● Fixed secret seed
Pipeline: inputs → cryptographic hash, normalized to (0,1) → uniformly random fraction → Laplace noise (fixed ε) → random noise; noisy count = true count + random noise
To satisfy consistency requirements:
● Pseudo-random noise → same query has same result over time, avoiding averaging attacks.
● For non-canonical queries (e.g., time ranges, aggregating multiple entities)
○ Use the hierarchy and partition into canonical queries
○ Compute noise for each canonical query and sum up the noisy counts
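A sketch of deterministic, query-keyed Laplace noise in the spirit of this slide. Using HMAC-SHA256 as the keyed cryptographic hash, the parameter encoding, and every name below are assumptions for illustration rather than the PriPeARL implementation.

```python
import hashlib
import hmac
import math


def uniform_fraction(secret_seed: bytes, *query_params: str) -> float:
    """Hash the query parameters with the secret seed and normalize to (0, 1)."""
    message = "|".join(query_params).encode()
    digest = hmac.new(secret_seed, message, hashlib.sha256).digest()
    top52 = int.from_bytes(digest[:8], "big") >> 12     # keep 52 bits: exact in a float
    return (top52 + 0.5) / 2 ** 52


def laplace_from_uniform(u: float, epsilon: float, sensitivity: float = 1.0) -> float:
    # Inverse CDF of Laplace(0, b) with b = sensitivity / epsilon.
    b = sensitivity / epsilon
    return -b * math.copysign(1.0, u - 0.5) * math.log(1.0 - 2.0 * abs(u - 0.5))


def noisy_count(true_count, secret_seed, epsilon,
                entity_id, dimension, stat_type, time_range):
    u = uniform_fraction(secret_seed, entity_id, dimension, stat_type, time_range)
    return true_count + laplace_from_uniform(u, epsilon)


seed = b"fixed-secret-seed"                              # placeholder secret
monday = noisy_count(120, seed, 1.0, "campaign42", "title", "clicks", "2019-05-06")
repeat = noisy_count(120, seed, 1.0, "campaign42", "title", "clicks", "2019-05-06")
tuesday = noisy_count(120, seed, 1.0, "campaign42", "title", "clicks", "2019-05-07")
```

Repeating a query reproduces the same noisy count, so averaging repeated answers gains nothing; a non-canonical query would be answered by summing the noisy counts of its canonical parts.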
147. Lessons Learned from Deployment (> 1 year)
• Semantic consistency vs. unbiased, unrounded noise
• Suppression of small counts
• Online computation and performance requirements
• Scaling across analytics applications
• Tools for ease of adoption (code/API library, hands-on how-to tutorial) help!
148. Summary
• Framework to compute robust, privacy-preserving analytics
• Addressing challenges such as preserving member privacy, product coverage, utility,
and data consistency
• Future
• Utility maximization problem given constraints on the ‘privacy loss budget’ per user
• E.g., noise with larger variance to impressions but less noise to clicks (or conversions)
• E.g., more noise to broader time range sub-queries and less noise to granular time range sub-queries
• Reference: K. Kenthapadi, T. Tran, PriPeARL: A Framework for Privacy-Preserving
Analytics and Reporting at LinkedIn, ACM CIKM 2018.
• https://engineering.linkedin.com/blog/2019/04/privacy-preserving-analytics-and-reporting-at-linkedin
149. Acknowledgements
•Team:
• AI/ML: Krishnaram Kenthapadi, Thanh T. L. Tran
• Ad Analytics Product & Engineering: Mark Dietz, Taylor Greason, Ian
Koeppe
• Legal / Security: Sara Harrington, Sharon Lee, Rohit Pitke
•Acknowledgements (in alphabetical order)
• Deepak Agarwal, Igor Perisic, Arun Swami
154. Current Reach (May 2019)
• A few million responses out of several millions of members targeted
• Targeted via emails since early 2016
• Countries: US, CA, UK, DE, IN, …
• Insights available for a large fraction of US monthly active users
155. Data Privacy Challenges
• Minimize the risk of inferring any one individual’s compensation data
• Protection against data breach
• No single point of failure
Achieved by a combination of techniques: encryption, access control, aggregation, thresholding
K. Kenthapadi, A. Chudhary, and S. Ambler, LinkedIn Salary: A System for Secure Collection and Presentation of Structured Compensation Insights to Job Seekers, IEEE PAC 2017 (arxiv.org/abs/1705.06976)
156. Modeling Challenges
• Evaluation
• Modeling on de-identified data
• Robustness and stability
• Outlier detection
X. Chen, Y. Liu, L. Zhang, and K. Kenthapadi, How LinkedIn Economic Graph Bonds Information and Product: Applications in LinkedIn Salary, KDD 2018 (arxiv.org/abs/1806.09063)
K. Kenthapadi, S. Ambler, L. Zhang, and D. Agarwal, Bringing salary transparency to the world: Computing robust compensation insights via LinkedIn Salary, CIKM 2017 (arxiv.org/abs/1703.09845)
157. Problem Statement
• How do we design the LinkedIn Salary system, taking into
account the unique privacy and security challenges,
while addressing the product requirements?
158. Differential Privacy? [Dwork et al, 2006]
• Rich privacy literature (Adam-Wortmann, Samarati-Sweeney, Agrawal-Srikant, …,
Kenthapadi et al, Machanavajjhala et al, Li et al, Dwork et al)
• Limitation of anonymization techniques (as discussed in the first part)
• Worst case sensitivity of quantiles to any one user’s compensation data is
large
• Large noise to be added, depriving reliability/usefulness
• Need compensation insights on a continual basis
• Theoretical work on applying differential privacy under continual observations
• No practical implementations / applications
• Local differential privacy / Randomized response based approaches (Google’s RAPPOR; Apple’s
iOS differential privacy; Microsoft’s telemetry collection) not applicable
159. De-identification Example
Original submission:
Title | Region | Company | Industry | Years of exp | Degree | FoS | Skills | $$
User Exp Designer | SF Bay Area | Google | Internet | 12 | BS | Interactive Media | UX, Graphics, ... | 100K
De-identified slices:
Title | Region | $$
User Exp Designer | SF Bay Area | 100K
Title | Region | Industry | $$
User Exp Designer | SF Bay Area | Internet | 100K
Title | Region | Years of exp | $$
User Exp Designer | SF Bay Area | 10+ | 100K
Title | Region | Company | Years of exp | $$
User Exp Designer | SF Bay Area | Google | 10+ | 100K
#data points > threshold? Yes ⇒ Copy to Hadoop (HDFS):
Title | Region | $$
User Exp Designer | SF Bay Area | 100K
User Exp Designer | SF Bay Area | 115K
... | ... | ...
Note: Original submission stored as encrypted objects.
162. Collection & Storage
• Allow members to submit their compensation info
• Extract member attributes
• E.g., canonical job title, company, region, by invoking LinkedIn standardization services
• Securely store member attributes & compensation data
164. De-identification & Grouping
• Approach inspired by k-Anonymity [Samarati-Sweeney]
• “Cohort” or “Slice”
• Defined by a combination of attributes
• E.g., “User experience designers in SF Bay Area”
• Contains aggregated compensation entries from corresponding individuals
• No user name, id or any attributes other than those that define the cohort
• A cohort available for offline processing only if it has at least k entries
• Apply LinkedIn standardization software (free-form attribute → canonical version)
before grouping
• Analogous to the generalization step in k-Anonymity
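The cohort construction above can be sketched as a group-and-threshold step. The record layout, field names, and sample values are hypothetical; the point is that a cohort carries only compensation entries and is released only once it has at least k of them.

```python
from collections import defaultdict


def build_cohorts(submissions, cohort_keys, k: int):
    """Group compensation entries by the cohort attributes; keep cohorts with >= k entries."""
    cohorts = defaultdict(list)
    for row in submissions:
        key = tuple(row[attr] for attr in cohort_keys)
        cohorts[key].append(row["compensation"])     # no name, id, or other attributes
    return {key: sorted(values) for key, values in cohorts.items() if len(values) >= k}


submissions = [
    {"member": "m1", "title": "User Exp Designer", "region": "SF Bay Area", "compensation": 100_000},
    {"member": "m2", "title": "User Exp Designer", "region": "SF Bay Area", "compensation": 115_000},
    {"member": "m3", "title": "User Exp Designer", "region": "SF Bay Area", "compensation": 108_000},
    {"member": "m4", "title": "Data Scientist", "region": "SF Bay Area", "compensation": 130_000},
]
released = build_cohorts(submissions, cohort_keys=("title", "region"), k=3)
```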
165. De-identification & Grouping
• Slicing service
• Access member attribute info &
submission identifiers (no
compensation data)
• Generate slices & track #
submissions for each slice
• Preparation service
• Fetch compensation data (using
submission identifiers), associate
with the slice data, copy to HDFS
171. Preventing Timestamp Join based Attacks
• Inference attack by joining these on timestamp
• De-identified compensation data
• Page view logs (when a member accessed compensation collection web interface)
• Not desirable to retain the exact timestamp
• Perturb by adding random delay (say, up to 48 hours)
• Modification based on k-Anonymity
• Generalization using a hierarchy of timestamps
• But, need to be incremental
• Process entries within a cohort in batches of size k
• Generalize to a common timestamp
• Make additional data available only in such incremental batches
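The incremental batching can be sketched as follows for a single cohort. Using the latest timestamp in each batch as the common timestamp is an assumption made for illustration, as are the function name and the integer timestamps.

```python
def release_in_batches(timestamps, k: int):
    """Release entries only in full batches of size k, each with a common timestamp."""
    ordered = sorted(timestamps)
    full = len(ordered) - len(ordered) % k       # entries beyond this wait for a full batch
    released = []
    for start in range(0, full, k):
        batch = ordered[start:start + k]
        released.extend([batch[-1]] * k)         # generalize to the batch's latest timestamp
    return released


generalized = release_in_batches([5, 1, 9, 3, 7], k=2)
```

The fifth entry is withheld until another submission completes its batch, so no released timestamp corresponds to fewer than k entries.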
172. Privacy vs Modeling Tradeoffs
• LinkedIn Salary system deployed in production for ~3 years
• Study tradeoffs between privacy guarantees (‘k’) and data available for
computing insights
• Dataset: Compensation submission history from 1.5M LinkedIn members
• Amount of data available vs. minimum threshold, k
• Effect of processing entries in batches of size, k
176. Key takeaway points
• LinkedIn Salary: a new internet application, with
unique privacy/modeling challenges
• Privacy vs. Modeling Tradeoffs
• Potential directions
• Privacy-preserving machine learning models in a practical setting
[e.g., Chaudhuri et al, JMLR 2011; Papernot et al, ICLR 2017]
• Provably private submission of compensation entries?
183. “Generalization Implies Privacy” Fallacy
Generalization
● average case
● model’s accuracy
Privacy
● worst case
● model’s parameters
184. “Generalization Implies Privacy” Fallacy
● Examples when it just ain’t so:
○ Person-to-person similarities
○ Support Vector Machines
● Models can be very large
○ Millions of parameters
195. Key takeaway points
• Notion of differential privacy is a principled foundation for privacy-
preserving data analyses
• Local differential privacy is a powerful technique appropriate for
Internet-scale telemetry
• Other techniques (thresholding, shuffling) can be combined with
differentially private algorithms or be used in isolation.
196. References
Differential privacy:
review "A Firm Foundation For Private Data Analysis", C. ACM 2011
by Dwork
book "The Algorithmic Foundations of Differential Privacy"
by Dwork and Roth
197. References
Google's RAPPOR:
paper "RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response", ACM CCS 2014,
Erlingsson, Pihur, Korolova
blog (https://security.googleblog.com/2014/10/learning-statistics-with-privacy-aided.html)
Apple's implementation:
article "Learning with Privacy at Scale", Apple ML J., Dec 2017
paper "Practical Locally Private Heavy Hitters", NIPS 2017,
by Bassily, Nissim, Stemmer, Thakurta
paper "Privacy Loss in Apple's Implementation of Differential Privacy on MacOS 10.12" by Tang,
Korolova, Bai, Wang, Wang
LinkedIn’s privacy-preserving analytics framework
paper "PriPeARL: A Framework for Privacy-Preserving Analytics and Reporting at LinkedIn", CIKM
2018, Kenthapadi, Tran
blog (https://engineering.linkedin.com/blog/2019/04/privacy-preserving-analytics-and-reporting-at-linkedin)
LinkedIn Salary:
paper "LinkedIn Salary: A System for Secure Collection and Presentation of Structured
Compensation Insights to Job Seekers", IEEE PAC 2017, Kenthapadi, Chudhary, Ambler
blog (https://engineering.linkedin.com/blog/2017/12/statistical-modeling-for-linkedin-salary)
198. Fairness Privacy
Transparency Explainability
Related WWW’19 sessions:
1.Tutorial: Designing Equitable Algorithms for the Web
2.Tutorial: Economic Theories of Distributive Justice for Fair Machine Learning
3.Tutorial: Socially Responsible NLP
4.Tutorial: Fairness-aware Machine Learning in Practice
5.Tutorial: Explainable Recommendation and Search
6.Workshop: FATE and Society on the Web
7.Session: Fairness, Credibility, and Search (Wednesday, 10:30 – 12:30)
8.Session: Privacy and Trust (Wednesday, 16:00 – 17:30)
9.Special Track: Designing an Ethical Web (Friday)