Machine learning and data mining algorithms construct predictive models and decision making systems based on big data. Big data are the digital traces of human activities - opinions, preferences, movements, lifestyles, ... - hence they reflect all human biases and prejudices. Therefore, the models learnt from big data may inherit all such biases, leading to discriminatory decisions. In my talk, I discuss many real examples, from crime prediction to credit scoring to image recognition, and how we can tackle the problem of discovering discrimination using the very same approach: data mining.
Data and Ethics: Why Data Science Needs One - Tim Rich
This was a talk I gave at SXSW 2016. It outlines the current state of applied ethics in data science as a profession. It describes key reasons a code of ethics should be constructed and proposes a framework to begin discussion.
Main chapters
#1 THE NEXT WAVE OF DIGITAL TRANSFORMATIONS
#2 CONNECTED REALITY 2025: TRENDS AND DRIVERS
#3 CONNECTED MARKETS 2025: SIGNALS
#4 CONNECTED BUSINESS 2025: TRANSFORMATIONS
#5 CONNECTED LIVING 2025: ONE SCENARIO
#6 SMART WORLD OR NETWORKED NIGHTMARE?
Introduction
The next wave of digital transformations
The more digital networking takes hold of all aspects of our lives and all types of commercial transactions, the more it becomes a fundamental part of our daily reality – a changed reality, in which future generations will not be able to understand how it was possible to live with 'stupid things' that weren't permanently linked to the Cloud, nor how we managed to survive without goggles and information-forecasting services.
If, in a few years, we have become used to the constant availability of information about people, situations and things in our immediate surroundings thanks to technology about our person – so-called wearables, and if it has become the norm for intelligent products, houses and vehicles to 'recognise' us and to use networked services to cooperate and anticipate our requirements, then a world in which these magic properties are lacking will soon seem very strange to us.
Connected reality will set new parameters for businesses
Thus, value is increasingly being created in networks through the use of hyperconnectivity. The importance of individual companies is disappearing: connected reality means the key players will actually be 'business economic systems'. Manufacturers and service providers will offer complex solutions to customers' requirements, e.g. the use of wearable sensors in the field of smart health, providing cloud-based data analysis, medical diagnosis and nutritional advice that will make it possible for health to be monitored intensively in real time.
This creates a multitude of new challenges for businesses. Products that can be networked will generate a continuous stream of data, and new ways of creating value from that data will have to be developed. Customer relations will come to be characterised more and more by real-time interaction. Increasingly, products and services will need to be developed and marketed as hybrid bundles. It will be necessary to open up the potential for smart automation along the entire value-creation chain.
Yet, as the pace of change becomes greater, the more important it becomes to evaluate the various trends and future developments in the round in order to gain sight of the big picture. This overview can then be used to guide strategic focus. This study represents a first step along this path.
Direction:
Andreas Neef, Klaus Burmeister
Authors:
Niels Boeing, Klaus Burmeister, Andreas Neef, Ben Rodenhäuser,
Willi Schroll
Find more and download also here: http://www.z-punkt.de/connected-reality2025-en.html
The use of artificial intelligence in healthcare has the potential to assist healthcare providers in many aspects of patient care and administrative processes as well as improve patient outcomes.
AI analyzes data throughout a healthcare system to mine, automate and predict processes. Some of the use cases are:
1. Early Diagnosis of diseases
2. Improved clinical trial processes
3. Mental health apps etc.
My keynote talk at San Diego Superdata conference, looking at history and current state of Analytics and Data Mining, and examining the effects of Big Data
How should nonprofit leaders adjust to the new reality of operating under COVID-19? This detailed checklist can help you understand the actions needed to protect team health, improve financial resilience, and continue executing on your mission with clarity and impact.
My class presentation at USC. It gives an introduction to data science, machine learning, their applications, recommendation systems and infrastructure.
Ethics in Data Science and Machine Learning - HJ van Veen
Introduction and overview on ethics in data science and machine learning, variations and examples of algorithmic bias, and a call-to-action for self-regulation. Given by Thierry Silbermann as part of the Sao Paulo Machine Learning Meetup, theme: "Ethics".
https://www.linkedin.com/in/thierrysilbermann
https://twitter.com/silbermannt
https://github.com/thierry-silbermann
INDIAN STATISTICAL INSTITUTE
Documentation Research & Training Centre
8th Mile, Mysore Road, RVCE Post
Bangalore-560 059
DRTC Seminar- 5
2014
Data Literacy
ABSTRACT
In our increasingly data-driven society, data literacy is an important civic skill that we should be developing. Data is slowly but steadily making its way into every part of society. Data literacy may seem less technical than Computer Science or other fields, yet it still calls for a wide variety of tools for accessing, converting and manipulating data. Using them requires an understanding of relational databases (like MS Access), data manipulation techniques, statistical software tools (like Minitab, SPSS, STATA and MS Excel) and data representation software tools (like MS PowerPoint and MS Excel). This seminar includes an introduction to data literacy and its inter-relationship with information literacy and statistical literacy. It also covers the steps for working with data, followed by a short demonstration of data analysis techniques using STATA 11.
Speaker: Jayanta Kr. Nayek
Date: 29.10.2014. Time: 2 p.m.
Venue: DRTC, ISI Bangalore.
All are cordially invited.
Seminar Coordinator
Biswanath Dutta
This Isn't 'Big Data.' It's Just Bad Data. - Peter Orszag
With response rates that have declined to under 10 percent, public opinion polls are increasingly unreliable. Perhaps even more concerning, though, is that the same phenomenon is hindering surveys used for official government statistics, including the Current Population Survey, the Survey of Income and Program Participation and the American Community Survey.
These slides show that the demand for most professions is growing steadily in spite of continued improvements in productivity-enhancing tools for them. They also show that AI will have a largely incremental effect on the professions, in combination with Moore's Law, cloud computing, and Big Data. They show this for accountants, lawyers, architects, journalists, and engineers.
Presentation - Racial and Gender Bias in AI by Gunay Kazimzade. Gunay Kazimzade works at the Weizenbaum Institute for the Networked Society and is also a Ph.D. student in Computer Science at the Technical University of Berlin. After completing degrees in Applied Mathematics and Computer Science, she worked in education and managed two social projects focused on Computer Science education for women and children, training over 3000 women and children in Azerbaijan. She currently works with the research group "Criticality of Artificial Intelligence-based systems". Her main research directions are gender and racial bias in AI, inclusiveness in AI and AI-enhanced education. She is a TEDx speaker who presents at conferences and summits across Europe.
As the author of “Big Data in Healthcare Hype and Hope,” Dr. Feldman has interviewed over 180 emerging tech and healthcare companies, always asking, “How can your new approach help patients?” Her research shows that data, as an enabling tool, has the power to give us critical new insights into not only what causes disease, but what comprises normal. Despite this promise, few patients have reaped the benefits of personalized medicine. A panel of leading big data innovators will discuss the evolving health data ecosystem and how big data is being leveraged for research, discovery, clinical trials, genomics, and cancer care. Case studies and real-life examples of what’s working, what’s not working, and how we can help speed up progress to get patients the right care at the right time will be explored and debated.
• Bonnie Feldman, DDS, MBA - Chief Growth Officer, @DrBonnie360
• Colin Hill - CEO, GNS Healthcare
• Jonathan Hirsch - Founder & President, Syapse
• Andrew Kasarskis, PhD - Co-Director, Icahn Institute for Genomics & Multiscale Biology; Associate Professor, Genetics & Genomic Studies, Icahn School of Medicine at Mt. Sinai
• William King - CEO, Zephyr Health
New York eHealth Collaborative Digital Health Conference
November 18, 2014
What is big data, and what are its potential benefits and risks?
Presentation given by Sir Mark Walport at the Oxford Martin School on 3 December 2013.
June 2015 (142) MIS Quarterly Executive 67The Big Dat.docxcroysierkathey
June 2015 (14:2) | MIS Quarterly Executive 67
The Big Data Industry1 2
Big Data receives a lot of press and attention—and rightly so. Big Data, the combination of
greater size and complexity of data with advanced analytics,3 has been effective in improving
national security, making marketing more effective, reducing credit risk, improving medical
research and facilitating urban planning. In leveraging easily observable characteristics and
events, Big Data combines information from diverse sources in new ways to create knowledge,
make better predictions or tailor services. Governments serve their citizens better, hospitals
are safer, firms extend credit to those previously excluded from the market, law enforcers catch
more criminals and nations are safer.
Yet Big Data (also known in academic circles as “data analytics”) has also been criticized as a
breach of privacy, as potentially discriminatory, as distorting the power relationship and as just
“creepy.”4 In generating large, complex data sets and using new predictions and generalizations,
firms making use of Big Data have targeted individuals for products they did not know they
needed, ignored citizens when repairing streets, informed friends and family that someone
is pregnant or engaged, and charged consumers more based on their computer type. Table 1
summarizes examples of the beneficial and questionable uses of Big Data and illustrates the
1 Dorothy Leidner is the accepting senior editor for this article.
2 This work has been funded by National Science Foundation Grant #1311823 supporting a three-year study of privacy online. I
wish to thank the participants at the American Statistical Association annual meeting (2014), American Association of Public Opinion
Researchers (2014) and the Philosophy of Management conference (2014), as well as Mary Culnan, Chris Hoofnagle and Katie
Shilton for their thoughtful comments on an earlier version of this article.
3 Both the size of the data set, due to the volume, variety and velocity of the data, as well as the advanced analytics, combine to
create Big Data. Key to definitions of Big Data are that the amount of data and the software used to analyze it have changed and
combine to support new insights and new uses. See also Ohm, P. “Fourth Amendment in a World without Privacy,” Mississippi
Law Journal (81), 2011, pp. 1309-1356; Boyd, D. and Crawford, K. “Critical Questions for Big Data: Provocations for a Cultural,
Technological, and Scholarly Phenomenon,” Information, Communication & Society (15:5), 2012, pp. 662-679; Rubinstein, I. S.
“Big Data: The End of Privacy or a New Beginning?,” International Data Privacy Law (3:2), 2012, pp. 74-87; and Hartzog, W. and
Selinger, E. “Big Data in Small Hands,” Stanford Law Review Online (66), 2013, pp. 81-87.
4 Ur, B. et al. “Smart, Useful, Scary, Creepy: Perceptions of Online Behavioral Advertising,” presented at the Symposium On
Usable Privacy and Security, July 11-13, 2 ...
BigData & Supply Chain: A "Small" Introduction - Ivan Gruer
A brief presentation at IUAV about Big Data and its impact on supply chains, given as part of the LOG2020 master's in logistics.
Topics and contents were developed during the research for the MBA final dissertation at MIB School of Management.
Algocracy and the state of AI in public administrations. - Sandra Bermúdez
AI, as a technical approach to solving problems, is now being deployed in social systems and public administrations. What are the effects? The challenges? Should we fear it? What should we do?
Abstract:
Big Data concern large-volume, complex, growing data sets with multiple, autonomous sources. With the fast development of networking, data storage, and the data collection capacity, Big Data are now rapidly expanding in all science and engineering domains, including physical, biological and biomedical sciences. This paper presents a HACE theorem that characterizes the features of the Big Data revolution, and proposes a Big Data processing model, from the data mining perspective. This data-driven model involves demand-driven aggregation of information sources, mining and analysis, user interest modeling, and security and privacy considerations. We analyze the challenging issues in the data-driven model and also in the Big Data revolution.
The REAL Impact of Big Data on Privacy - Claudiu Popa
The awesome promise of Big Data is tempered by the need to protect personal information. Data scientists must expertly navigate the legislative waters and acquire the skills to protect privacy and security. This talk provides enterprise leaders with answers and suggests questions to ask when the time comes to consider the vast opportunities offered by big data.
Unveiling the Power of Data Science.pdf - Kajal Digital
Data science is an interdisciplinary field that combines techniques from statistics, computer science, and domain expertise to extract insights and knowledge from data. It involves the collection, cleaning, analysis, and interpretation of data to make informed decisions and predictions. The goal is to uncover hidden patterns, trends, and correlations that might otherwise remain obscured.
The Building Blocks of QuestDB, a Time Series Database - javier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps as just another data type. However, when performing real-time analytics, timestamps should be first-class citizens, and we need rich time semantics to get the most out of our data. We also need to deal with ever-growing datasets while staying performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs. This meetup was formerly Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag... - sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf - Enterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
State of Artificial Intelligence Report 2023 - kuntobimo2016
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report as a compilation of the most interesting things we’ve seen with a goal of triggering an informed conversation about the state of AI and its implication for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
Global Situational Awareness of A.I. and where it's headed - vikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.
Data ethics and machine learning: discrimination, algorithmic bias, and how to discover them. Dino Pedreschi
1. Data ethics and machine learning
Discrimination, algorithmic bias, and
how to discover them.
DINO PEDRESCHI
KDDLAB, DIPARTIMENTO DI INFORMATICA, UNIVERSITÀ DI PISA
6. Event Detection
Detecting events in a geographic area
classifying the different kinds of users.
City of Rome
Metropolitan area
Covered geographical region: city of Rome
Dataset size per snapshot: ≈ 1.2 GBytes per day
Number of records: ≈ 5.6 million lines per day
8 months between 2015 and 2016
14. Predicting GDP with Retail Market data
[Figure: a generic utility function (rationality) compared with a personal utility function (diversity). Panel labels: Product Price, Quantity Needed, Sophistication. Reported fits: R² = 17.25%, R² = 32.38%, R² = 85.72%.]
16. Big Data, Big Risks
Big data is algorithmic, therefore it cannot be biased! And yet…
• All traditional evils of social discrimination, and many new ones, exhibit
themselves in the big data ecosystem
• Because of its tremendous power, massive data analysis must be used
responsibly
• Technology alone won’t do: also need policy, user involvement and
education efforts
17. By 2018, 50% of business ethics
violations will occur through
improper use of big data analytics
[source: Gartner, 2016]
AI and Big Data 17
20. The danger of black boxes - 1
The COMPAS score (Correctional Offender Management Profiling for
Alternative Sanctions)
A 137-question questionnaire and a predictive model for “risk of
crime recidivism.” The model is a proprietary secret of Northpointe,
Inc.
The data journalists at propublica.org have shown that
• the prediction accuracy of recidivism is rather low (around 60%)
• the model has a strong ethnic bias
◦ blacks who did not reoffend were classified as high risk twice as often as
whites who did not reoffend
◦ whites who did reoffend were classified as low risk twice as often as
blacks who did reoffend.
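
The two bullets above compare error rates across groups: the share of non-reoffenders flagged as high risk (false positive rate) and the share of reoffenders flagged as low risk (false negative rate). A minimal sketch of that comparison follows. The function name `error_rates_by_group`, the group labels "A" and "B", and the toy records are all made up for illustration; they are not the ProPublica data or method.

```python
from collections import defaultdict


def error_rates_by_group(records):
    """Return {group: (false_positive_rate, false_negative_rate)}.

    Each record is a tuple (group, predicted_high_risk, reoffended).
    """
    counts = defaultdict(lambda: {"fp": 0, "tn": 0, "fn": 0, "tp": 0})
    for group, predicted_high, reoffended in records:
        if reoffended:
            key = "tp" if predicted_high else "fn"
        else:
            key = "fp" if predicted_high else "tn"
        counts[group][key] += 1

    rates = {}
    for group, c in counts.items():
        negatives = c["fp"] + c["tn"]  # people who did not reoffend
        positives = c["fn"] + c["tp"]  # people who did reoffend
        fpr = c["fp"] / negatives if negatives else float("nan")
        fnr = c["fn"] / positives if positives else float("nan")
        rates[group] = (fpr, fnr)
    return rates


# Hypothetical toy records, invented purely to exercise the function.
toy_records = (
    [("A", True, False)] * 4 + [("A", False, False)] * 6
    + [("A", False, True)] * 2 + [("A", True, True)] * 8
    + [("B", True, False)] * 2 + [("B", False, False)] * 8
    + [("B", False, True)] * 4 + [("B", True, True)] * 6
)
rates = error_rates_by_group(toy_records)
for group, (fpr, fnr) in sorted(rates.items()):
    print(f"group {group}: FPR={fpr:.2f} FNR={fnr:.2f}")
```

In this toy data both groups are scored with the same overall accuracy (70%), yet the errors fall on different people: group A gets twice the false positives, group B twice the false negatives. Overall accuracy alone hides that kind of asymmetry.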
21. The danger of black boxes - 2
The three major US credit bureaus, Experian, TransUnion, and
Equifax, providing credit scoring for millions of individuals, are
often discordant.
In a study of 500,000 records, 29% of consumers received credit
scores that differ by at least fifty points between credit bureaus, a
difference that may mean tens of thousands of dollars over the life of
a mortgage [CRS+16].
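
The statistic above is a discordance rate: the share of consumers whose scores from different bureaus are at least fifty points apart. The sketch below shows one way such a share could be computed. The function name `discordant_share` and the toy scores are hypothetical, and the actual methodology of the cited study may differ.

```python
def discordant_share(scores, threshold=50):
    """Fraction of consumers whose highest and lowest bureau scores
    differ by at least `threshold` points.

    `scores` is a list of tuples, one tuple of bureau scores per consumer.
    """
    if not scores:
        raise ValueError("no records")
    discordant = sum(1 for s in scores if max(s) - min(s) >= threshold)
    return discordant / len(scores)


# Hypothetical scores from three bureaus for four consumers.
toy_scores = [
    (700, 710, 705),  # spread 10
    (640, 700, 655),  # spread 60 -> discordant
    (580, 585, 630),  # spread 50 -> discordant
    (720, 722, 719),  # spread 3
]
share = discordant_share(toy_scores)
print(f"{share:.0%} of toy consumers have scores at least 50 points apart")
```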
22. The danger of black boxes - 3
In 2010, some homeowners with a regular payment
history of their mortgage reported a sudden drop of forty
points in their credit score, soon after their own enquiry.
23. The danger of black boxes - 4
During the 1970s and 1980s, St. George’s Hospital
Medical School in London used a computer program for
initial screening of job applicants.
The program used information from applicants’ forms,
which contained no reference to ethnicity.
The program was found to unfairly discriminate against
female applicants and ethnic minorities (inferred from
surnames and place of birth), who were less likely to be
selected for interview [LM88].
25. The danger of black boxes - 5
In a recent paper at SIGKDD 2016 [RSG16] the authors
show how an accurate but untrustworthy classifier may
result from an accidental bias in the training data.
In a task of discriminating wolves from huskies in a
dataset of images, the resulting deep learning model is
shown to classify a wolf in a picture based solely on
the presence of snow in the background!
[RSG16] “Why Should I Trust You?”: Explaining the Predictions of Any Classifier.
SIGKDD 2016 conference paper.
26. Deep learning is creating computer
systems we don't fully understand
www.theverge.com/2016/7/12/12158238/first-click-deep-learning-algorithmic-black-boxes
27. Is AI Permanently Inscrutable?
nautil.us/issue/40/learning/is-artificial-intelligence-permanently-inscrutable
28. The danger of black boxes - 6
In a recent study at Princeton University, the authors show
how the semantics derived automatically from large
text/web corpora contains human biases
◦ E.g., names associated with whites were found to be
significantly easier to associate with pleasant than
unpleasant terms, compared to names associated with
black people.
Therefore, any machine learning model trained on text
data for, e.g., sentiment or opinion mining has a strong
chance of inheriting the prejudices reflected in the
human-produced training data.
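One way to probe such associations is to compare cosine similarities between word vectors. The sketch below is only illustrative: the two-dimensional vectors are hypothetical and not taken from any real embedding, and the score is a simplified association measure, not necessarily the test used in the study.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def association(word_vec, pleasant, unpleasant):
    """Mean similarity to pleasant terms minus mean similarity to unpleasant terms."""
    mean_pleasant = sum(cosine(word_vec, p) for p in pleasant) / len(pleasant)
    mean_unpleasant = sum(cosine(word_vec, u) for u in unpleasant) / len(unpleasant)
    return mean_pleasant - mean_unpleasant

# Hypothetical toy vectors standing in for learned word embeddings.
pleasant_terms = [(1.0, 0.1), (0.9, 0.2)]
unpleasant_terms = [(0.1, 1.0), (0.2, 0.9)]
name_from_group_1 = (0.8, 0.3)
name_from_group_2 = (0.4, 0.7)

score_1 = association(name_from_group_1, pleasant_terms, unpleasant_terms)
score_2 = association(name_from_group_2, pleasant_terms, unpleasant_terms)
print(round(score_1, 3), round(score_2, 3))
```

A positive score means the name sits closer to the pleasant terms, a negative score closer to the unpleasant ones. A systematic gap between two groups of names is the kind of bias that a model trained on the embedding may inherit.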
31. As we stated in our 2008 SIGKDD paper that started the field of
discrimination-aware data mining [PRT08]:
“learning from historical data recording human decision making
may mean to discover traditional prejudices that are endemic in
reality, and to assign to such practices the status of general rules,
maybe unconsciously, as these rules can be deeply hidden within
the learned classifier.”
35. U.S. – White House
Salvatore Ruggieri 35
www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_may_1_2014.pdf (May 2014)
36. U.S. – White House
www.whitehouse.gov/sites/default/files/microsites/ostp/2016_0504_data_discrimination.pdf (May 2016)
37. U.S. – White House
www.whitehouse.gov/sites/default/files/whitehouse_files/microsites/ostp/NSTC/preparing_for_the_future_of_ai.pdf (October 2016)
42. Value-Sensitive Design
Design for privacy
Design for security
Design for inclusion
Design for sustainability
Design for democracy
Design for safety
Design for transparency
Design for accountability
Design for human capabilities
43. EU Projects: SoBigData.eu
Social Mining & Big Data Ecosystem project (SoBigData, H2020-INFRAIA-2014-2015,
duration: 2015-2019), www.sobigdata.eu
44. Second-Level University Master (Master Universitario di II Livello)
BigData Technology
BigData Sensing & Procurement
BigData Mining
BigData StoryTelling
BigData Ethics
The Big Data Master aims to train “data scientists”: professionals
with a mix of multidisciplinary skills that allow them not only to
acquire data and extract knowledge from it, but also to tell
“stories” through these data, in support of decision making,
creativity and the development of innovative services, and to
manage the ethical and legal repercussions of Big Data, which
often contain personal information and raise issues of privacy,
transparency and awareness.
Areas of socio-economic innovation:
BigData for Social Good
BigData for Business
Big Data Analytics & Social Mining
SoBigData
Data Ethics Literacy
MIUR report on Big Data, 28 July 2016
◦ www.istruzione.it/allegati/2016/bigdata.pdf
UNIPI Master in Big Data Analytics & Social Mining
◦ masterbigdata.it
47. Discrimination discovery
Given:
◦ a historical database of decision records, each describing
features of an applicant for a benefit
◦ e.g., a credit request to a bank and the corresponding credit approval/denial
◦ some designated categories of applicants, such as groups
protected by anti-discrimination laws,
find whether, and in which circumstances, there is
evidence of discrimination against the designated categories
emerging from the data.
DCUBE: Discrimination Discovery in Databases 47
49. How? Fight with the same weapons
Idea: use data mining to discover discrimination
◦ the decision policies hidden in a database can be represented by
decision rules and discovered by frequent pattern mining
◦ Once all such decision rules are found, highlight all potential niches
of discrimination by filtering the rules using a measure that
quantifies the discrimination risk.
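The two steps above can be sketched on toy data. This is a minimal illustration under stated assumptions, not the DCUBE implementation: a brute-force enumeration of contexts stands in for frequent pattern mining, the filtering measure is assumed to be the ratio between the confidence of a rule with the protected item and the confidence of the same rule without it, and all attribute names, records and thresholds are hypothetical.

```python
from itertools import combinations

def covers(record, items):
    """True if the record contains every (attribute, value) pair in items."""
    return all(record.get(attr) == value for attr, value in items)

def confidence(records, antecedent, outcome):
    """Return (confidence, support count) of the rule antecedent -> outcome."""
    covered = [r for r in records if covers(r, antecedent)]
    hits = sum(1 for r in covered if covers(r, (outcome,)))
    return (hits / len(covered) if covered else 0.0), hits

def risky_rules(records, protected, outcome, context_attrs,
                min_supp=2, min_elift=1.5, max_len=2):
    """List rules (protected & context -> outcome) whose ratio exceeds min_elift."""
    items = sorted({(a, r[a]) for r in records for a in context_attrs})
    found = []
    for size in range(max_len + 1):
        for context in combinations(items, size):
            if len({attr for attr, _ in context}) < size:
                continue  # one attribute with two different values never matches
            base_conf, _ = confidence(records, context, outcome)
            rule_conf, supp = confidence(records, (protected,) + context, outcome)
            if supp < min_supp or base_conf == 0:
                continue
            elift = rule_conf / base_conf
            if elift >= min_elift:
                found.append((context, supp, rule_conf, elift))
    return sorted(found, key=lambda rule: -rule[3])

def make(n, foreign, purpose, credit):
    """Build n identical hypothetical decision records."""
    return [{"foreign_worker": foreign, "purpose": purpose, "credit": credit}
            for _ in range(n)]

records = (make(3, "yes", "new_car", "bad") + make(1, "yes", "new_car", "good")
           + make(1, "no", "new_car", "bad") + make(7, "no", "new_car", "good")
           + make(2, "yes", "furniture", "good")
           + make(2, "no", "furniture", "bad") + make(4, "no", "furniture", "good"))

rules = risky_rules(records, protected=("foreign_worker", "yes"),
                    outcome=("credit", "bad"), context_attrs=["purpose"])
for context, supp, conf, elift in rules:
    print(context, supp, round(conf, 2), round(elift, 2))
```

On this toy data the highest-ranked rule is the one with context purpose=new_car (ratio 2.25), followed by the rule with no context (ratio about 1.67). A real system would replace the brute-force loop with a frequent pattern miner so that it scales beyond a handful of attributes.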
50. Discrimination discovery from data
FOREIGN_WORKER=yes
& PURPOSE=new_car & HOUSING=own
→ CREDIT=bad
◦ elift = 5.19, supp = 56, conf = 0.37
elift = 5.19 means that foreign workers are more than 5
times as likely to be refused credit as the
average population (even if they own their house).
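A short worked reading of these numbers, assuming elift is the extended lift of [PRT08], i.e. the confidence of the rule divided by the confidence of the same rule without the protected item (here FOREIGN_WORKER=yes). The slide does not spell out the formula, so this definition and the function name are assumptions:

```python
def elift(conf_with_protected, conf_context_only):
    """Assumed definition: conf(A, B -> C) / conf(B -> C), with A the protected item."""
    return conf_with_protected / conf_context_only

rule_conf = 0.37    # confidence reported for the rule on the slide
rule_elift = 5.19   # elift reported for the rule on the slide

# Under the assumed definition, the confidence of the rule without
# FOREIGN_WORKER=yes follows by division.
implied_base_conf = rule_conf / rule_elift
print(round(implied_base_conf, 4))
```

Under this reading, about 37% of foreign workers in this context were refused credit, against roughly 7% of all applicants in the same context.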
51. Case Study: grant evaluation
Outcome:
◦ Funded
◦ Not funded
◦ Conditionally funded
53. A potentially discriminatory rule
Antecedent
◦ Project proposals in “Physical and Analytical
Chemical Sciences”
◦ Young females
◦ Total cost of 1,358,000 Euros or above
Possible interpretation
◦ “Peer-reviewers of panel PE4 trusted young females
requiring high budgets less than males leading
similar projects”
54. Case study: US Harmonized Tariff System
US Harmonized Tariff System (HTS)
https://hts.usitc.gov/
Detailed tariff classification system for
merchandise imported into the US
Chapters 61, 62, 64, 65: apparel
◦ Different taxes for the same garments
produced separately for males and females
◦ Descriptions are in semi-structured form
Different taxes for the same apparel for men and women:
                  Coats              Fur felt hats    Cotton pajamas
Men and boys      38.6¢/kg + 10%     0                8.9%
Women and girls   64.4¢/kg + 18.8%   96¢/doz + 1.4%   8.5%
Women: 14%
Men: 9%
1.3 billion USD!!!
55. AI and Big Data 55
Totes-Isotoner Corp. v. U.S.
Rack Room Shoes Inc. and
Forever 21 Inc. vs U.S.
Court of International Trade
U.S. Court of Appeals for the Federal
Circuit (2014)
“[…] the courts may have concluded that
Congress had no discriminatory intent when
ruling the HTS, but there is little
doubt that gender-based tariffs have
discriminatory impact”
63. Right of explanation
• Applying AI within many domains requires
transparency and responsibility:
◦ health care
◦ finance
◦ surveillance
◦ autonomous vehicles
◦ government
• EU General Data Protection Regulation (April
2016) establishes (?) a right of explanation
for all individuals to obtain “meaningful
explanations of the logic involved” when
automated (algorithmic) individual decision-
making, including profiling, takes place.
• In sharp contrast, (big) data-driven AI/ML
models are often black boxes.
AI and Big Data 63
64. Accountability
“Why exactly was my loan application rejected?”
“What could I have done differently so that my application
would not have been rejected?”
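For a transparent model, both questions have direct answers. A minimal sketch, assuming a hypothetical linear scoring model with made-up weights, features and approval threshold; real credit models are not necessarily linear, and this is not a method proposed in the talk:

```python
def contributions(applicant, weights):
    """Per-feature contribution to the score: answers "why was I rejected?"."""
    return {name: weights[name] * applicant[name] for name in weights}

def score(applicant, weights, bias):
    """Linear score: bias plus the sum of the feature contributions."""
    return bias + sum(contributions(applicant, weights).values())

def single_feature_counterfactuals(applicant, weights, bias, threshold, mutable):
    """For each mutable feature, the value that alone would reach the threshold.

    Answers "what could I have done differently?" one feature at a time.
    Returns an empty dict if the applicant is already approved.
    """
    gap = threshold - score(applicant, weights, bias)
    if gap <= 0:
        return {}
    return {name: applicant[name] + gap / weights[name]
            for name in mutable if weights[name] != 0}

# Hypothetical model and applicant.
weights = {"income_k": 0.5, "years_employed": 2.0, "open_debts": -3.0}
bias = 10.0
threshold = 30.0
applicant = {"income_k": 40, "years_employed": 2, "open_debts": 3}

why = contributions(applicant, weights)
what_if = single_feature_counterfactuals(
    applicant, weights, bias, threshold,
    mutable=["income_k", "years_employed", "open_debts"])
print(score(applicant, weights, bias), why, what_if)
```

Here the applicant scores 25 against a threshold of 30. The open debts contribute the most negative term, and an income of 50 (instead of 40) would alone have been enough. A black-box model offers no such direct reading, which is why explanation has to be added on top of it.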