Data and ethics Training

Data ethics for legal services organizations.

Published in: Law

  1. LSNTAP: Data Ethics when Designing Civil Justice Interventions. May 19th, 2016
  2. Using Go To Webinar
     • Calling with phone? Select Telephone and enter your audio pin if you haven't already.
     • Calling through computer? If you're using a microphone and headset or speakers (VoIP), please select Mic & Speakers.
     • Have questions? Yes! Please help us make this as relevant to you as possible. We'll reserve the last 10 minutes for questions, but feel free to add any questions in the Go to Meeting Question Box.
     • Is this being recorded? Yes. LSNTAP will distribute the information after the training.
  3. Make sure you get our infographic after the training!
  4. Speakers
     • Solon Barocas, Research Associate, Center for Information Technology Policy at Princeton University
     • Ali Lange, Policy Analyst, Center for Democracy and Technology's Consumer Privacy Project
     • Wilneida Negron, Digital Officer, Florida Justice Technology Center / Fellow at Data and Society Research Institute
  5. Agenda
     • Introduction: Data Ethics When Designing Civil Justice Interventions (Wilneida)
     • Topic 1: How Machines Learn to Discriminate (Solon)
     • Topic 2: Digital Decision-Making (Ali)
     • Questions?
  6. What's Big Data? Structured, semi-structured, and unstructured data from traditional and digital sources inside and outside your organization that provide the ability for ongoing discovery and analysis.
  7. Big data has the power to improve lives, and often does.
  8. But absent a human touch, its single-minded efficiency can lead to troubling patterns that isolate groups already at society's margins.
  9. Research has found that big data analytics can:
     • Discover useful regularities in a dataset that are just preexisting patterns of exclusion and inequality.
     • Inherit the prejudice and biases of prior decision-makers.
  10. We and all systems we produce are biased.
  11. What Now?
  12. Develop a plan: We have a personal responsibility to ourselves and our clients to address the ethical, security, and privacy challenges that arise when working with data.
  13. Step 1: Know your data. The Federal Trade Commission recommends:
     • Quality: Have you accounted for biases at both the collection and analytics stages of big data's life cycle?
     • Accuracy: Is your data representative? If not, take steps to address under- or over-representation (see the sketch after this list).
     • Usability: Do you have the proper staffing to undertake data analytics?
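A minimal sketch of the accuracy check above, assuming a hypothetical pandas DataFrame of client records and illustrative reference shares (for example, from census figures); the column name, data, and 10-point threshold are placeholders, not a prescribed method:

```python
# Hypothetical representativeness check: compare the makeup of a client dataset
# against reference population shares to flag under- or over-representation.
import pandas as pd

clients = pd.DataFrame({
    "primary_language": ["en", "es", "en", "en", "es", "en", "en", "en"],
})

# Illustrative reference shares for the service area (e.g., from census figures).
reference_shares = {"en": 0.60, "es": 0.40}

observed = clients["primary_language"].value_counts(normalize=True)
for group, expected in reference_shares.items():
    actual = observed.get(group, 0.0)
    gap = actual - expected
    status = "check representation" if abs(gap) > 0.10 else "ok"
    print(f"{group}: dataset {actual:.0%} vs. population {expected:.0%} ({status})")
```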
  14. Step 2: Examine data sensitivities for the communities you hope to serve
     • Outline reasons for collecting personal, community, or demographic identifiable information;
     • Identify communities that could be adversely affected, and how;
     • Ask whether your data can reinforce existing disparities in terms of ethnicity, identity, gender, race, class, sexuality, disability, language, religion, size, citizenship status, geography, etc.
  15. Step 3: Know the consumer protection laws applicable to big data practices
     • Fair Credit Reporting Act
     • Equal opportunity laws:
       • Equal Credit Opportunity Act ("ECOA")
       • Title VII of the Civil Rights Act of 1964
       • Americans with Disabilities Act
       • Age Discrimination in Employment Act
       • Fair Housing Act
       • Genetic Information Nondiscrimination Act
     • Federal Trade Commission Act
     • State and local laws
  16. Step 4: Include and empower your clients!
     • Evaluate the data literacy of your clients;
     • Identify low-hanging-fruit ways to educate your clients;
     • Be transparent and let them know how their data is being used.
  17. Wormhole into the future: With increasing use of predictive analytics, triage algorithms, justice portals, expert systems, and document assembly, will the civil justice community soon need:
     • Multi-disciplinary data ethics committees?
     • Institutional review boards?
     • Responsible data program managers?
  18. How Machines Learn to Discriminate. Solon Barocas, Center for Information Technology Policy, Princeton University
  19. Discrimination Law: Two Doctrines
     • Disparate Treatment: formal, intentional
     • Disparate Impact: unjustified, avoidable
     • "Protected Class"
  20. Uncounted, Unaccounted, Discounted
     • The quality and representativeness of records might vary in ways that correlate with class membership (see the sketch after this list):
       • less involved in the formal economy and its data-generating activities
       • unequal access to and less fluency in the technology necessary to engage online
       • more likely to avoid contact with specific institutions
       • less profitable customers or less important constituents, and therefore less interesting as targets of observation
     • Convenience sample:
       • Data gathered for routine business or government purposes tend to lack the rigor of social scientific data collection
       • Analysts may not have an alternative or independent mechanism for determining the composition of the population
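One way to make the first point concrete is to look at whether record completeness differs by group. A minimal sketch, assuming a hypothetical DataFrame of records; the data and column names are invented for illustration:

```python
# Hypothetical record-quality check: does the share of missing fields vary by group?
# Systematically sparser records for one group are a symptom of the convenience-sample
# problem described above.
import numpy as np
import pandas as pd

records = pd.DataFrame({
    "group":  ["A", "A", "A", "B", "B", "B"],
    "email":  ["x@example.org", "y@example.org", None, None, None, "z@example.org"],
    "income": [35000, np.nan, 41000, np.nan, np.nan, 28000],
})

# Share of missing values per field, per group.
missing_by_group = records.drop(columns="group").isna().groupby(records["group"]).mean()
print(missing_by_group)
```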
  21. Dealing with Tainted Examples
     • Training data serve as ground truth
     • These would seem like well-performing models according to standard evaluation methods (see the sketch after this list)
     • What the objective assessment should have been
     • Accepted and rejected candidates may not differ only in terms of protected characteristics
     • How someone would have performed under different, non-discriminatory circumstances
     • The difficulty in dealing with counterfactuals and correcting for past injustices
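To illustrate the second bullet, here is a small synthetic sketch (not from the slides) of how a model trained and evaluated on labels that encode past discrimination can look "well performing" by standard metrics while reproducing the original disparity; the data and the penalty applied to one group are entirely invented:

```python
# Synthetic illustration of the "tainted examples" problem: accuracy is measured
# against the tainted labels themselves, so standard evaluation cannot see the bias.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
group = rng.integers(0, 2, n)        # 0/1 membership in a synthetic protected class
merit = rng.normal(size=n)           # the quality the decision should actually track
# Historical label: past decisions penalized members of group 1.
tainted_label = ((merit - 1.0 * group) > 0).astype(int)

X = np.column_stack([merit, group])
X_tr, X_te, y_tr, y_te = train_test_split(X, tainted_label, random_state=0)
pred = LogisticRegression().fit(X_tr, y_tr).predict(X_te)

print("accuracy against tainted labels:", round(accuracy_score(y_te, pred), 3))
for g in (0, 1):
    mask = X_te[:, 1] == g
    print(f"group {g}: selection rate {pred[mask].mean():.0%}")
```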
  22. Settling on a Selection of Features
     • Does the feature set provide sufficient information to carve up the population in a way that reveals relevant variations within each apparent sub-group?
     • Unintentional redlining
     • In other words: how does the error rate vary across the population? (See the sketch after this list.)
     • Discrimination can be an artifact of statistical reasoning rather than prejudice on the part of decision-makers or bias in the composition of the dataset
     • Does the difficulty or cost involved in obtaining the information necessary to bring accuracy rates into closer parity justify subjecting certain populations to worse assessment?
     • Parity = Fair
     • Accurate = Fair
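A minimal sketch of the "how does the error rate vary across the population?" question, assuming a hypothetical DataFrame of evaluation cases with a group label, the true outcome, and the model's prediction (all names and values illustrative):

```python
# Hypothetical check of how the error rate varies across sub-groups; `cases` stands
# in for a labeled evaluation set with the model's predictions attached.
import pandas as pd

cases = pd.DataFrame({
    "group":     ["A", "A", "A", "A", "B", "B", "B", "B"],
    "actual":    [1, 0, 1, 0, 1, 0, 1, 0],
    "predicted": [1, 0, 1, 0, 0, 1, 0, 0],
})

cases["error"] = (cases["actual"] != cases["predicted"]).astype(int)
print(cases.groupby("group")["error"].mean())
# A large gap between groups suggests the feature set is less informative for one
# sub-population, i.e., discrimination as an artifact of statistical reasoning.
```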
  23. [2x2 matrix: Granularity of the Data (High/Low) vs. Effects on historically disadvantaged communities (Benefit/Harm)]
     • Benefit:
       • Equal treatment in the marketplace: common level of service and uniform price
       • Socialization of risk
       • Discovering attractive customers and candidates in populations previously dismissed out of hand: financial inclusion
       • Evidence-based and formalized decision-making
     • Harm:
       • Less favorable treatment in the marketplace: finding specific customers not worth servicing (e.g., firing the customer)
       • Individualization of risk
       • Underserving large swaths of the market: redlining
       • Informal decision heuristics plagued by prejudice and implicit bias
  24. Dealing with "Redundant Encodings"
     • In many instances, making accurate determinations will mean considering factors that are somehow correlated with legally proscribed features (see the sketch after this list)
     • There is no obvious way to determine how correlated a relevant attribute or set of attributes must be with proscribed features to be worrisome
     • Nor is there a self-evident way to determine when an attribute or set of attributes is sufficiently relevant to justify its consideration, despite the fact that it is highly correlated with these features
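A minimal sketch of one way to surface redundant encodings, assuming a hypothetical DataFrame where `protected` is a 0/1 indicator for a legally proscribed feature and the other columns are invented candidate inputs; as the slide notes, there is no bright-line threshold, so the output is a prompt for discussion rather than a test:

```python
# Hypothetical screen for redundant encodings: how strongly is each candidate input
# associated with a legally proscribed feature?
import pandas as pd

df = pd.DataFrame({
    "protected":        [1, 1, 0, 0, 1, 0, 0, 1],
    "years_at_address": [2, 1, 8, 10, 3, 7, 9, 2],
    "prior_contacts":   [4, 5, 1, 0, 3, 1, 0, 4],
})

# Absolute Pearson correlation of each input with the proscribed feature.
correlations = df.drop(columns="protected").corrwith(df["protected"]).abs()
print(correlations.sort_values(ascending=False))
# There is no bright-line value for "too correlated"; the point is to surface these
# encodings so the trade-off can be debated explicitly.
```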
  25. Let's not Forsake Formalization
     • These moments of translation are opportunities to debate the very nature of the problem, and to be creative in parsing it
     • The process of formalization can make explicit the beliefs, values, and goals that motivate a project
  26. Solon Barocas and Andrew Selbst, "Big Data's Disparate Impact," California Law Review, Vol. 104, 2016. Solon Barocas, Center for Information Technology Policy, Princeton University, sbarocas@princeton.edu
  27. Digital Decisions: Advocacy Perspective on the Risks and Benefits of Data-Driven Automated Decision Making (May 19, 2016)
  28. About CDT: The Center for Democracy & Technology is a nonpartisan, nonprofit technology policy advocacy organization. The internet empowers, emboldens, and equalizes people around the world. We are dedicated to protecting civil liberties and human rights online. CDT is known for:
     ● Convening industry representatives, researchers, government officials, and civil rights advocates
     ● Bringing academic rigor to advocacy work
     ● Grounding policy recommendations in technical as well as legal expertise
  29. Project Summary: The Digital Decisions Project. Sophisticated statistical analysis is a pillar of decision making in the 21st century, including in employment, lending, and policing. Automated systems also mediate our access to information and community through search results and social media. These technologies are pivotal to day-to-day life, but the processes that govern them are not transparent. CDT is working with stakeholders to develop guidance that ensures the rights of individuals and encourages innovation and design incentives that promote responsible use of automated decision-making technology.
  30. Background: Automated Decision-Making Systems
     ● Are present in all sectors
     ● Have a varying degree of importance or impact on individuals
     ● What is unique about data-driven discrimination?
       ○ The speed and extent of the technology increase the potential for its obscurity to frustrate and disenfranchise people
     Civil rights and privacy advocates have expressed concern that this erodes accountability and fairness.
  31. Background: Disparate Impact and Nature of Harms. Some harms are more immediate for individuals, and others are cumulative and may be more visible when looking at the impact on a group or society at large.
     ● Insult to dignity
     ● Discrimination on the basis of a protected class
     ● Exacerbating and/or perpetuating historic inequality
     ● Error that disproportionately impacts a particular group
  32. Background: Seeking Solutions. "Civil Rights Principles for the Era of Big Data":
     ● Stop High-Tech Profiling
     ● Ensure Fairness in Automated Decisions
     ● Preserve Constitutional Principles
     ● Enhance Individual Control of Personal Information
     ● Protect People from Inaccurate Data
     Signatories: American Civil Liberties Union, Asian Americans Advancing Justice, Center for Media Justice, ColorOfChange, Common Cause, Free Press, The Leadership Conference on Civil and Human Rights, NAACP, National Council of La Raza, National Hispanic Media Coalition, National Urban League, NOW Foundation, New America Foundation's Open Technology Institute, and Public Knowledge.
  33. Background
     ● Stakes are high for consumers (individuals)
     ● Diversity of contexts makes hard-and-fast rules difficult to conceive or apply
     ● Look to existing examples for guidance
     ● Translate the Civil Rights Principles for the Era of Big Data into actionable steps for private companies
  34. Digital Decisions: Phases of Automation
  35. Digital Decisions: Phases of Automation. Design: Identify Inputs
     ● What is the source of your data?
     ● Was the data collected first-hand by humans?
     ● Did they have any perspective or incentive structure that may have influenced the collection of this data?
     ● If the data was collected directly from users, did they have an equal opportunity to provide data inputs in a machine-readable format? (There is a higher likelihood of error in a handwritten form vs. a typed submission.)
     ● How can you clean the data to ensure that this historic or collection bias does not influence your results for this purpose?
     ● Is the data representative of the relevant population? Is any population missing or underrepresented? If so, can you find additional data to make your data set more robust?
     ● Are there any fields or features that should be explicitly prohibited from inclusion at the outset of your design process? For example, are race, gender, and other sensitive characteristics automatically excluded from inputs, or are there times when they are acceptable? (A minimal deny-list check is sketched after this list.)
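For the last question above, a minimal sketch of an explicit deny-list applied at the input stage; the field names, and the decision to drop rather than deliberately retain them, are assumptions for illustration:

```python
# Hypothetical deny-list applied at the input stage: sensitive fields are dropped
# unless a deliberate, documented exception is made.
import pandas as pd

PROHIBITED_FIELDS = {"race", "gender", "religion", "disability_status"}

def screen_inputs(df: pd.DataFrame) -> pd.DataFrame:
    """Drop deny-listed columns and report what was removed."""
    found = PROHIBITED_FIELDS & set(df.columns)
    if found:
        print("Excluding prohibited fields:", sorted(found))
    return df.drop(columns=list(found))

raw = pd.DataFrame({"income": [30000, 42000], "race": ["A", "B"], "zip": ["10027", "12401"]})
print(screen_inputs(raw).columns.tolist())  # ['income', 'zip']
```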
  36. Digital Decisions: Phases of Automation. Build: Model Construction
     ● Do your rules rely on generalizations or cultural assumptions rather than causal relationships? (Not sure? Ask yourself if you would feel comfortable if the public saw your stated correlations.)
     ● Can you use pseudonymization techniques that avoid the needless scoring/targeting of non-suspicious individuals?
     ● Have the tools you are using from libraries been tested for bias? Is there an audited or trustworthy source for the necessary tools?
     ● Are any of these criteria proxies for race, gender, or other sensitive characteristics? For example, zip code + 4 is often strongly correlated with racial identity. (See the proxy-check sketch after this list.)
     ● How much control of the statistical process is required to prevent your model from relying on proxies for protected classes?
     ● Are non-deterministic outcomes acceptable given the rules and considerations around transparency and "explainability" that may be applicable?
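A minimal sketch of a proxy check for the zip-code example above: if an auxiliary model can predict the sensitive characteristic from the remaining inputs, the feature set redundantly encodes it. The data is synthetic and the informal AUC reading is a judgment call, not a standard:

```python
# Synthetic proxy check: can the "neutral" inputs predict the sensitive attribute?
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 2000
sensitive = rng.integers(0, 2, n)
# A nominally neutral input that in fact tracks the sensitive attribute,
# plus one genuinely unrelated input.
neighborhood = sensitive + rng.normal(scale=0.3, size=n)
hour_submitted = rng.integers(0, 24, n).astype(float)

X = np.column_stack([neighborhood, hour_submitted])
auc = cross_val_score(LogisticRegression(), X, sensitive, cv=5, scoring="roc_auc").mean()
print(f"AUC for predicting the sensitive attribute from the inputs: {auc:.2f}")
# An AUC well above 0.5 means the model could rediscover the attribute even after
# the field itself is excluded.
```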
  37. Digital Decisions: Phases of Automation. Test
     ● What is the acceptable error rate before going to market?
     ● Is the error rate evenly distributed across all demographics? (See the sketch after this list.)
     ● Identify reasons for correlations: what factors are predominant in determining outcomes?
     ● Are unintended factors or variables correlated with race or other sensitive characteristics?
     ● Have you specifically tested your process on representative samples from a variety of racial, economic, and other diverse backgrounds for disparate outcomes?
     ● Are model outputs and algorithmic transactions being sufficiently logged as to enable appropriate diagnostics in the event of a data subject or regulatory challenge?
     ● Is there a process in place for periodic assessments/reviews to ensure that (for dynamic models especially) the modeling algorithm, features, and data inputs continue to reflect the evolving realities of the marketplace?
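A minimal sketch for the second bullet, breaking false positive and false negative rates out by demographic group on a hypothetical labeled test set (column names and values are illustrative):

```python
# Hypothetical test-phase check: per-group false positive and false negative rates.
import pandas as pd

results = pd.DataFrame({
    "group":     ["A", "A", "A", "A", "B", "B", "B", "B"],
    "actual":    [1, 0, 1, 0, 1, 0, 1, 0],
    "predicted": [1, 0, 0, 0, 1, 1, 0, 1],
})

def rates(g: pd.DataFrame) -> pd.Series:
    fp = ((g["predicted"] == 1) & (g["actual"] == 0)).sum()
    fn = ((g["predicted"] == 0) & (g["actual"] == 1)).sum()
    negatives = (g["actual"] == 0).sum()
    positives = (g["actual"] == 1).sum()
    return pd.Series({
        "false_positive_rate": fp / negatives if negatives else float("nan"),
        "false_negative_rate": fn / positives if positives else float("nan"),
    })

print(results.groupby("group")[["actual", "predicted"]].apply(rates))
```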
  38. Digital Decisions: Phases of Automation. Implement
     ● What is the impact on individuals of a false positive / false negative?
     ● Is there a way for users to report that they feel they may have been treated unfairly (in order to capture big-picture trends that may reveal discrimination problems)?
     ● Don't make claims about the power of the results that are bigger than what the process represents.
     ● Is there a method for human review of model outcomes to minimize false positives?
     ● Where does a human being sit in the analysis process?
     ● Does a person make a final determination as to an outcome that might negatively affect an individual?
  39. Digital Decisions: Phases of Automation. Evaluate and Refine
     ● Does the outcome provide contextual information that helps a user understand how the result was reached, or is it a more opaque output (such as a numerical score)?
     ● Should there be fail-safes in place to ensure that potential systematic bias that may not otherwise be detected does not have an endlessly compounding effect on consumers? (See the monitoring sketch after this list.)
     ● How does the result of your process feed back into the equation?
     ● Process any new data or altered logic model with the same inquiry as the original content.
     ● Is there a person responsible for ensuring that all relevant parts of the institution are involved in creating this process? For example, someone to check with relevant internal legal and policy teams as well as external stakeholders when applicable.
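One possible fail-safe for the compounding-bias question above: log decisions and periodically compare per-group favorable-outcome rates to the rates observed at launch. A minimal sketch with invented baseline numbers and column names:

```python
# Hypothetical monitoring check: surface slow drift in per-group favorable-outcome
# rates relative to launch, one symptom of a compounding feedback loop.
import pandas as pd

baseline_rates = {"A": 0.42, "B": 0.40}  # favorable-outcome rates at launch

decision_log = pd.DataFrame({
    "group":   ["A", "B", "A", "B", "B", "A", "B", "B"],
    "outcome": [1, 0, 1, 0, 0, 1, 0, 1],  # 1 = favorable decision
})

current = decision_log.groupby("group")["outcome"].mean()
for group, base in baseline_rates.items():
    rate = current.get(group, float("nan"))
    print(f"group {group}: favorable rate {rate:.0%} (drift {rate - base:+.0%} vs. launch)")
```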
  40. Digital Decisions: How can you use this?
