Joe Keating
Ethical Data Science: With Big Data Comes Big Responsibility
The Origins of Data Science
1962
• John W. Tukey writes in “The Future of Data Analysis” that “data analysis is intrinsically an empirical science”.
• He later published “Exploratory Data Analysis”, arguing that more emphasis needed to be placed on “using data to suggest hypotheses to test”.
1974
• Peter Naur publishes the “Concise Survey of Computer Methods”, including the following definition of data science:
• “The science of dealing with data, once they have been established, while the relation of the data to what they represent is delegated to other fields and sciences.”
1977
• The International Association for Statistical Computing (IASC) is established with the following mission:
• “to link traditional statistical methodology, modern computer technology, and the knowledge of domain experts in order to convert data into information and knowledge.”
1996
• Members of the International Federation of Classification Societies (IFCS) meet in Kobe, Japan, for their biennial conference.
• For the first time, the term “data science” is included in the title of the conference.
2009
• Nathan Yau writes in “Rise of the Data Scientist” that “We're seeing data scientists—people who can do it all—emerge from the rest of the pack”.
2011
• Harlan Harris writes in “Data Science, Moore’s Law, and Moneyball” that “What Data Scientists do has been very well covered, and it runs the gamut from data collection and munging, through application of statistics and machine learning and related techniques, to interpretation, communication, and visualization of the results.”
Where is Data Science today?
• At the coalface of digital transformation.
• The opportunity for achieving social good within the field of data science is huge.
• Data has become one of the most valuable commodities in the global economy.
Typical Application: Traditional Machine Learning
• Targeted advertising based on our online activity, whether it’s a YouTube pre-roll ad or a targeted article on Facebook.
Emerging Techniques: Deep Learning
• Scales to much larger volumes of data and often gives better results. However, we don’t necessarily know how it works: you can’t open up the lid and look inside a black-box model.
What About Bias and Ethics?
• A central challenge in building a fair model is to quantify some notion of ‘fairness’.
• Group vs. Individual Fairness
  • Group fairness is the requirement that different groups of people should be treated the same on average.
  • Individual fairness is the requirement that individuals who are similar should be treated similarly.
• Sample Bias vs. Label Bias in Your Data
  • Label bias occurs when the data-generating process systematically assigns labels differently for different groups (e.g. studies show that men with beards drink more = label bias).
  • Sample bias occurs when the data-generating process samples from different groups in different ways (e.g. more people are tested for drink driving in Dublin than elsewhere = sample bias).
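One way to make the group fairness idea concrete is to compare the rate of favourable decisions across groups. The following is a minimal sketch, assuming binary decisions (1 = favourable) and a single group label per person; the function names and the toy data are hypothetical and are not taken from the talk. The gap it reports is often called a demographic parity difference, which is only one of several possible group fairness measures.

```python
from collections import defaultdict


def positive_rate_by_group(groups, decisions):
    """Return the share of favourable decisions (1) within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, decision in zip(groups, decisions):
        totals[group] += 1
        positives[group] += decision
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_gap(groups, decisions):
    """Largest difference in favourable-decision rate between any two groups."""
    rates = positive_rate_by_group(groups, decisions)
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: group membership and model decisions.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
decisions = [1, 1, 1, 0, 1, 0, 0, 0]

rates = positive_rate_by_group(groups, decisions)
gap = demographic_parity_gap(groups, decisions)
print(rates, gap)
```

A gap near zero means the groups are treated the same on average under this measure; it says nothing about individual fairness, which needs a separate notion of similarity between individuals.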
What is an Example of Bias?
In the financial industry, biased data may produce results that violate the United States Equal Credit Opportunity Act (fair lending).
This law, enacted in 1974, prohibits credit discrimination based on race, color, religion, national origin, sex, marital status, age or source of income.
While lenders will take steps not to include such data in a loan decision, it may still be possible to infer race in some cases from other data, such as a zip code.
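The zip code point can be checked during an audit by asking how much better a retained feature predicts a protected attribute than a blind guess does. The sketch below is an illustration under stated assumptions: the protected attribute is available for auditing only, the helper `proxy_strength` is hypothetical, and the zip codes and group labels are made-up toy values.

```python
from collections import Counter, defaultdict


def proxy_strength(feature_values, protected_values):
    """Return (baseline_accuracy, accuracy_using_feature).

    baseline: accuracy of always guessing the most common protected value.
    using feature: accuracy of guessing the most common protected value
    within each feature value (e.g. within each zip code).
    """
    n = len(protected_values)
    baseline = Counter(protected_values).most_common(1)[0][1] / n

    counts_per_value = defaultdict(Counter)
    for feature, protected in zip(feature_values, protected_values):
        counts_per_value[feature][protected] += 1
    correct = sum(c.most_common(1)[0][1] for c in counts_per_value.values())
    return baseline, correct / n


# Hypothetical audit sample.
zip_codes = ["11111", "11111", "11111", "22222", "22222", "22222"]
protected = ["x", "x", "y", "y", "y", "y"]

baseline, with_zip = proxy_strength(zip_codes, protected)
print(f"baseline={baseline:.2f}, with zip code={with_zip:.2f}")
```

When the second number is clearly higher than the first, the feature carries information about the protected attribute, so leaving the attribute itself out of the model does not remove its influence.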
The Impact - Consumer Financial Protection Bureau
Federal law that places regulation of the financial industry in the hands of the government
How do we eliminate Bias?
• To mitigate bias, data scientists need to understand the data and its context before they even begin modelling algorithmic patterns.
• If bias is present in the data collected, the algorithm will carry this forward.
• Model outcomes tend to become more neutral if an algorithm is trained on data that is pre-processed to minimize bias.
There is no silver-bullet measurement that is guaranteed to detect unfairness. Choosing an appropriate definition of model fairness is task-specific, but it should always be underpinned by an ethical code of conduct and transparency.
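As one illustration of pre-processing data to reduce bias, the sketch below assigns each training example a weight so that group membership and label look statistically independent in the weighted data. This approach is commonly called reweighing; it is a technique chosen here for illustration and is not named in the talk. The function names and toy data are hypothetical.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight each example by P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    pair_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * pair_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(groups, labels, weights, group):
    """Weighted share of positive labels (1) within one group."""
    total = sum(w for g, w in zip(groups, weights) if g == group)
    positive = sum(
        w for g, y, w in zip(groups, labels, weights) if g == group and y == 1
    )
    return positive / total


# Hypothetical training data: group "a" has far more positive labels.
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
rate_a = weighted_positive_rate(groups, labels, weights, "a")
rate_b = weighted_positive_rate(groups, labels, weights, "b")
print(rate_a, rate_b)
```

The weights can then be passed to any learner that accepts per-example weights. This balances the training data only for the chosen group attribute and label, so the trained model still needs to be checked against whichever fairness definition suits the task.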
Your data becomes your asset.
Contact us @ www.glantus.com

Joe Keating - World Legal Summit - Ethical Data Science


Editor's Notes

  • #2 Initial introductions.