Big Data and algorithms
Impact on individuals and society
About me
● Software engineer
● I have worked for web companies
● Big Data, Social Media, Influence Marketing
Context
● Lesson given at a high school
● Digital Citizenship (optional alternative class to Catholic Religion)
● Students aged 14 to 18
● Academic year 2017/2018
Potential long term effects of Big Data
● Sensors everywhere recording every kind of data
● Potential dystopia: people will tend to ‘cool down’, that is, standardize, tone down and
suppress their own spontaneous behaviours, because they are always tracked and
digitized.
● social cooling
● propublica
Big Data
Data gathered, stored and manipulated by automatic processes.
Characterized by the 5 Vs:
● Volume: large overall size (big data), although each data unit is usually small
● Variety: data of different types:
○ Structured: tables, databases, ..
○ Unstructured: text, images, audio, ..
● Velocity: data available in real-time
● Variability: inconsistent, incomplete, contradictory data
● Veracity: inaccurate, unreliable data, with low quality and a lot of noise
An example
A sensor at each metro station turnstile records each passage together with
a timestamp (e.g. expressed in seconds elapsed since 1 January 1970, 00:00
UTC)
● UTC: Coordinated Universal Time
● Longitude 0°
● (Greenwich)
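As a minimal sketch of the timestamp described above, Python's standard library can convert between a calendar date and the number of seconds since 1 January 1970 UTC. The event time is a hypothetical value borrowed from the metro example, and treating it as UTC (rather than local time) is an assumption made only to keep the example short.

```python
from datetime import datetime, timezone

# Hypothetical turnstile event: 1 January 2009, 06:00, assumed to be in UTC.
event = datetime(2009, 1, 1, 6, 0, tzinfo=timezone.utc)

# Seconds elapsed since 1 January 1970, 00:00 UTC (the Unix epoch).
epoch_seconds = int(event.timestamp())

# Converting the number back gives the same instant.
restored = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

print(epoch_seconds, restored.isoformat())
```

Storing the timestamp as a single integer keeps each data unit small and makes events from different stations directly comparable.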
Data stored somewhere
Table with all triples stored (timestamp, station, in/out)
Timestamp            Station     Direction
Tue 01-01-2009 6:00  Battistini  in
Tue 01-01-2009 6:02  Colosseo    in
Tue 01-01-2009 6:05  Magliana    out
Tue 01-01-2009 6:12  Anagnina    in
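The table above can be sketched in code as a list of (timestamp, station, direction) triples. The variable names are illustrative, and the ten-minute window is an assumption chosen to match the turnstile example later in the lesson.

```python
from collections import Counter
from datetime import datetime

# The four rows of the table, as (timestamp, station, direction) triples.
events = [
    (datetime(2009, 1, 1, 6, 0), "Battistini", "in"),
    (datetime(2009, 1, 1, 6, 2), "Colosseo", "in"),
    (datetime(2009, 1, 1, 6, 5), "Magliana", "out"),
    (datetime(2009, 1, 1, 6, 12), "Anagnina", "in"),
]

# Count how many passages went in and how many went out.
by_direction = Counter(direction for _, _, direction in events)

# Keep only the events in the ten-minute window from 6:00 (included) to 6:10 (excluded).
start, end = datetime(2009, 1, 1, 6, 0), datetime(2009, 1, 1, 6, 10)
in_window = [event for event in events if start <= event[0] < end]

print(by_direction["in"], by_direction["out"], len(in_window))
```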
Data processed to be visualized
The eye takes in far more information from a picture than the brain can extract from raw rows of data
Open Big Data
● Open data: data collected by private or public organizations, freely
downloadable or accessible by anyone
● Public knowledge
● E.g.: data about municipalities, health, geographic entities
● Linked data: open data linked to one another by semantic (= meaningful and
formally structured) links
Algorithms
● Sequence of actions towards a goal
● E.g.: an algorithm to get a robot out of a room
○ The robot cannot see; it only detects a wall after hitting it
○ Step forward
○ If you hit a wall, step right
○ If you hit a wall while stepping right, turn to your right
○ Repeat (loop)
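The robot rules above can be tried out on a small grid. This is only a sketch under stated assumptions: the room layout, the `escape` function and the `max_steps` limit are hypothetical, and "step right" is read as a sidestep that does not change the direction the robot is facing. The rules are not guaranteed to find the exit of every room, which is why the loop gives up after a fixed number of steps.

```python
# Headings in clockwise order: north, east, south, west (row delta, column delta).
HEADINGS = [(-1, 0), (0, 1), (1, 0), (0, -1)]

# Hypothetical room: '#' is a wall, '.' is free floor, 'E' is the exit.
ROOM = [
    "#####",
    "#...#",
    "#...E",
    "#...#",
    "#####",
]

def escape(room, start, heading=0, max_steps=1000):
    """Apply the rules until the exit is reached; return its position, or None."""
    row, col = start
    for _ in range(max_steps):
        if room[row][col] == "E":
            return (row, col)
        # Rule 1: try to step forward.
        d_row, d_col = HEADINGS[heading]
        if room[row + d_row][col + d_col] != "#":
            row, col = row + d_row, col + d_col
            continue
        # Rule 2: a wall was hit, so try to step to the right without turning.
        r_row, r_col = HEADINGS[(heading + 1) % 4]
        if room[row + r_row][col + r_col] != "#":
            row, col = row + r_row, col + r_col
            continue
        # Rule 3: a wall on the right as well, so turn right and loop again.
        heading = (heading + 1) % 4
    return None

print(escape(ROOM, start=(1, 1)))
```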
An algorithm to control the turnstiles
● If incoming people per minute per turnstile are above a chosen threshold
IN-max (and outgoing people are not above another threshold OUT-max)
○ One turnstile switches: turnstile OUT -> turnstile IN
● And vice versa
● (Incoming people from 6:00 to 6:10) / 10 / number of IN turnstiles = a value IN-V
● (Outgoing people from 6:00 to 6:10) / 10 / number of OUT turnstiles = a value OUT-V
● If IN-V > IN-max and OUT-V <= OUT-max, then switch OUT -> IN
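The turnstile rule above can be written as two small functions. The passenger counts, the number of turnstiles and the two thresholds are hypothetical values picked for illustration; only the formula and the comparison come from the slide.

```python
def per_turnstile_rate(people, minutes, turnstiles):
    """Average number of people per minute per turnstile over the window."""
    return people / minutes / turnstiles

def should_switch_out_to_in(in_v, out_v, in_max, out_max):
    """True when entrances are overloaded and exits are within their threshold."""
    return in_v > in_max and out_v <= out_max

# Hypothetical counts for 6:00-6:10 at a station with 4 IN and 4 OUT turnstiles.
in_v = per_turnstile_rate(people=600, minutes=10, turnstiles=4)
out_v = per_turnstile_rate(people=80, minutes=10, turnstiles=4)

# Hypothetical thresholds IN-max = 12 and OUT-max = 5 people per minute per turnstile.
switch = should_switch_out_to_in(in_v, out_v, in_max=12, out_max=5)

print(in_v, out_v, switch)
```

The "vice versa" case would swap the roles of the IN and OUT values in the same comparison.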
Programming languages
● Languages to write algorithms
● Understood by machines, so that they can execute them
● Pseudo-languages: notations for sketching algorithms, useful for people but
not executable by machines
○ E.g. block diagrams or natural-language instructions
Models
● Models of the observed reality are often created
● To simplify it, since otherwise too many variables are involved
● And to make decisions and carry out actions on the observed reality
Example of model
● Class of students
● I want to improve the performance of the students
● Which data can I collect about this reality?
● Which model can I draw?
● Which actions can I put in place?
The starting theory
● I rely on a theory, a direction, an idea
● The idea: students aren’t good enough because teachers are not up to
their job.
● It’s just a theory, as good as any other. Other theories I could adopt:
○ Because students are too tired
○ Because they live in poor neighbourhoods
○ Because they spend too much time on their smartphones
Useful data, according to my theory
● Students’ grades on tests, report cards, etc.
● Opinions about each teacher given by the school principal and the parents
of the students
● I design an algorithm that evaluates teachers based on this model:
○ Students get better or worse depending on their teachers’ quality
○ Teachers getting good reviews from principals and students’ parents are actually good
My algorithm
● If at the end of the year with teacher T, students get better grades than the
year before, then T was good by a factor N
● If T gets good reviews from the principal and from students’ parents, then
T was good by a factor R
○ Teacher score S = N + R
○ Among all the school’s teachers, those
in the lowest x% of the
curve get fired
Gaussian curve: a good fit for sums of many independent random values
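The scoring rule above can be sketched as follows. The teacher names and the values of N and R are made up, and rounding the lowest x% down to a whole number of teachers (but never below one) is an assumption the slide does not specify.

```python
import math

def teacher_scores(grade_gain, review):
    """Score S = N + R for each teacher, given the two factors per teacher."""
    return {teacher: grade_gain[teacher] + review[teacher] for teacher in grade_gain}

def lowest_fraction(scores, fraction):
    """Teachers in the lowest `fraction` of scores (assumption: at least one)."""
    count = max(1, math.floor(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get)
    return ranked[:count]

# Hypothetical factors: N is the change in grades, R comes from the reviews.
N = {"A": 2.0, "B": 0.5, "C": -3.0, "D": 1.0, "E": 0.0}
R = {"A": 1.0, "B": 1.5, "C": 2.0, "D": 0.5, "E": 1.0}

S = teacher_scores(N, R)
fired = lowest_fraction(S, fraction=0.2)

print(S, fired)
```

In this made-up data teacher C has the best reviews of all, yet ends up with the lowest score because of the drop in grades: a low N plus a high R can still land in the bottom range.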
Algorithm execution and resulting actions
● The algorithm runs, I get my teachers’ scores
● I find the 5% (for example) of all teachers who fall lowest on
the curve
● I fire them
● I optimized the faculty
● Am I ok with that? Did I do a good job?
That really happened
● Article in The Washington Post
● There’s a problem:
○ Sarah was a very good teacher, held in high esteem by the principal and students’ parents
○ She got a low score from the algorithm
○ She was fired
● How did that happen?
● S = N + R = low value + high value -> in the lowest 5%
What’s wrong?
● How is it possible that a good teacher got fired?
● Model too naive
● Inconsistent data
● Too little data
● No feedback
Wrong model
● Each model involves a choice: it focuses on some variables and leaves
out others
● Otherwise it wouldn’t be a model, i.e. a simplified version of reality
● It has a bias, a prejudice, a leaning towards one side
● In our example, the variables we consider are the students’ grades in the
previous year and in the current year
● Too simple: an abstract oversimplification of the target reality
Inconsistent data
● The algorithm’s input data may not be consistent
● Grades from the previous year (e.g. the last year of elementary school) could be
higher than they should be
● In the current year, lower grades assigned by the current teacher seem to
suggest a worsening of the students’ performance, but this may not be
the case
Not enough data
● There is too little data.
● For a statistical model to work properly, it needs a lot of data
(~millions)
● I can’t take the grades of 25 students and give just those as input to the
algorithm
● In any particular case of a class, there could be a thousand reasons why
those students are performing worse:
○ Problems at home
○ Personal problems
○ Change of school
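One way to see why 25 students are too few: the average of n independent measurements is uncertain by roughly the spread of a single measurement divided by the square root of n (the standard error). The sketch below assumes, purely for illustration, that individual grade changes are independent and have a standard deviation of 1.0 grade point.

```python
import math

def standard_error(std_dev, n):
    """Standard error of a mean computed from n independent observations."""
    return std_dev / math.sqrt(n)

# Hypothetical spread of individual grade changes: 1.0 grade point.
for n in (25, 2_500, 1_000_000):
    print(n, standard_error(1.0, n))
```

With one class the class average can move by a noticeable fraction of a grade point for reasons unrelated to the teacher; with far more observations that noise shrinks.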
No feedback
● There isn’t any feedback which loops back to the algorithm to steer it
● The feedback comes from the current state of the reality affected by the
action of the algorithm
[Diagram: "Modeled reality" and "Algorithm", connected by the labels "model", "theory", "action" and "feedback"]
Impossible to get things back on an even
keel
● With no feedback the algorithm goes off on its own
● It can’t be updated with data extracted from the observed reality after its
start
● If we fired good teachers, we’ll never know
● If we kept bad teachers, we’ll never know either
● The algorithm does not register the mistakes it has made, let alone learn from
them
Automatic speed controller
● Autopilot (controlled variable: speed - but it could be direction too)
● The car must hold a constant 100 km/h
● It works the same way as the previous example, only with a simpler reality
[Diagram: "Controller" and "Real speed", connected by "action" and "feedback"]
Controller without feedback
● The controller gives the engine an initial power and the car reaches e.g. 110 km/h
● What happens from that moment on? We’ll never know
[Diagram: "Controller" and "Real speed", connected by "action" only]
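The difference between the two controllers can be shown with a toy simulation. Everything numeric here is an assumption: the car model, the `disturbance` of 10 (for instance a downhill slope the controller does not know about), and the `gain` used to correct the power. It is a sketch of the idea of feedback, not a real cruise-control design.

```python
def simulate(target, steps, use_feedback, disturbance=10.0, gain=0.05):
    """Toy car whose speed drifts toward (power + disturbance); returns the final speed."""
    speed = 0.0
    power = target  # initial power chosen as if there were no disturbance
    for _ in range(steps):
        # The car responds gradually to engine power plus the unmeasured push.
        speed += 0.1 * (power + disturbance - speed)
        if use_feedback:
            # Feedback: compare the real speed with the target and correct the power.
            power += gain * (target - speed)
    return speed

open_loop = simulate(target=100.0, steps=500, use_feedback=False)
closed_loop = simulate(target=100.0, steps=500, use_feedback=True)

print(round(open_loop, 2), round(closed_loop, 2))
```

Without feedback the car settles near 110 and stays there; with feedback the controller keeps adjusting the power until the real speed settles near the target of 100.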
Weapons of math destruction
● Weapons of Math Destruction, by Cathy O’Neil
● Checklist of the features of a weapon of math destruction (WMD):
○ Non-transparent model and algorithm (black box): we don’t know what is inside
○ Harmful for people.
○ Even worse, it creates a vicious circle that makes things worse, whereas one of its
objectives was to improve objectivity and remove inequalities.
○ It can scale to large numbers of people.
Vicious circle
● In our example of teachers there’s no vicious circle
● The algorithm’s output is hardly meaningful, but at least it doesn’t make things worse
● Another real example, this time with a vicious circle:
● An algorithm that assigns years in prison: it gives more years to people with
previous convictions or previous dealings with the justice system (used in some US
states)
○ Someone living in a rough neighborhood is more likely to get a higher algorithm score ->
harsher and longer punishment -> even more disadvantaged once out of prison
■ -> vicious circle
When algorithms on Big Data come into play
and when they don’t
● Leveraging algorithms and Big Data to make choices is easy:
○ Automatic
○ Fast
○ With no accountability for the people who use them
● Whenever a choice concerns one single, valuable individual, people are still asked
to make it (e.g. hiring a lawyer for a prestigious firm).
● Whenever a choice has to be made thousands of times about thousands of
interchangeable people, algorithms make it (e.g. applicants for McDonald’s)
● Algorithms save money and time, but with the side effect of ruining the
lives of the many individuals on whom they simply fail (collateral damage)
Algorithms on Big Data as weapons of math
destruction
● Hence the name: weapons of math destruction
● They embody a simplistic and biased view of a certain reality
● They make decisions on the basis of a few variables
● They produce numbers, scores and rankings that look objective
● Anyone who starts from an unfavourable position according to the algorithm’s
parameters will end up sinking even lower (e.g. more years in prison for repeat
offenders, even with only minor dealings with the justice system, from a poor neighborhood): a vicious
circle, since the offender becomes more likely to do badly in the future
● Inequalities grow, the exact opposite of what the algorithm’s designers intended
Example of WMD in our brain
● Everyone is associated with their number of followers on social media
● Those who already have many followers will get more and more
● Those who have few will hardly get any more. Why?
● In our brain there’s a little WMD
○ If someone has a lot of followers, then they must be important, so I’m going to follow
them
○ If someone has few followers, then they must be a loser, so I’m not going to follow them
○ Vicious circle
● That number has now become, for us, an objective and integral part of the person
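The follower effect above can be sketched with a very simple rule: in each round, a fixed number of new follows is shared among accounts in proportion to the followers they already have. The two accounts, the starting counts and the 101 new follows per round are hypothetical, and real platforms are of course more complicated than this.

```python
def share_new_followers(counts, new_follows):
    """Split `new_follows` among accounts in proportion to their current followers."""
    total = sum(counts.values())
    return {name: n + new_follows * n / total for name, n in counts.items()}

# Hypothetical accounts: one already popular, one nearly unknown.
counts = {"popular": 1000.0, "unknown": 10.0}
for _ in range(10):
    counts = share_new_followers(counts, new_follows=101)

print(counts)
```

Both accounts grow at the same relative pace, but in absolute terms the popular account gains about a hundred followers for every one gained by the unknown account, so the gap between them keeps widening.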

Big Data and algorithms

  • 1. Big Data and algorithms Impact on individuals and society
  • 2. About me ● Software engineer ● I worked in web companies ● Big Data, Social Media, Influence Marketing
  • 3. Context ● Lesson given at high school ● Digital Citizenship (optional alternative class to Catholic Religion) ● 14 to 18 years old students ● Academic year 2017/2018
  • 4. Potential long term effects of Big Data ● Sensors everywhere recording every kind of data ● Potential dystopia: people will tend to ‘cool down’: standardize, tone down, suppress their own spontaneous behaviours because always tracked and digitalized. ● social cooling ● propublica
  • 5. Big Data Data gathered, stored and manipulated by automatic processes. Characterized by the 5 Vs: ● Volume: big size of data (big data) - but each data unit is usually small ● Variety: data of different types: ○ Structured: tables, databases, .. ○ Unstructured: text, images, audio, .. ● Velocity: data available in real-time ● Variability: inconsistent, incomplete, contradictory data ● Veracity: inaccurate, unreliable data, with low quality and a lot of noise
  • 6. An example A sensor at each metro station turnstile tracks down each access coupled with a timestamp (i.e. expressed in seconds starting from 1th of January 1970 in UTC time) ● UTC: Coordinated Universal Time ● Longitude 0° ● (Greenwich)
  • 7. Data stored somewhere Table with all triples stored (timestamp, station, in/out) Timestamp Station Direction Tue 01-01-2009 6:00 Battistini in Tue 01-01-2009 6:02 Colosseo in Tue 01-01-2009 6:05 Magliana out Tue 01-01-2009 6:12 Anagnina in
  • 8. Data processed to be visualized The eye catches much more information than the brain
  • 9. Open Big Data ● Open data: data collected by private or public organizations, freely downloadable or accessible by anyone ● Public knowledge ● I.e.: data about municipality, health, geographic entities ● Linked data: open data linked one another by semantic (= meaningful and formally structured) links
  • 10.
  • 11. Algorithms ● Sequence of actions towards a goal ● I.e.: algorithm to get a robot out of a room ○ The robot doesn’t see, it only acknowledges a wall after hitting it ○ Step forward ○ If you hit a wall, step right ○ If you hit a wall while stepping right, turn to your right loop
  • 12. An algorithm to control the turnstiles ● If incoming people per minute and per turnstile are above a chosen threshold IN-max (and outgoing people are under another threshold OUT-max) ○ A turnstile switches: turnstile OUT -> turnstile IN ● And vice versa ● (Incoming people from 6:00 to 6:10) / 10 / n. turnstiles IN = a value IN-V ● (Outgoing people from 6:00 to 6:10) / 10 / n. turnstiles OUT = a value OUT-V ● If IN-V > IN-max and OUT <= OUT-max, then switch OUT -> IN
  • 13. Programming languages ● Languages to write algorithms ● Understood by machines, so that they can execute them ● Pseudo-languages: languages to sketch algorithms, useful for people but unreadable by machines ○ Es. block diagram or natural language instructions
  • 14. Models ● Many times models are created for the observed reality ● To simplify it, otherwise too many variables are involved ● And to take decisions, to execute actions on the observed reality
  • 15. Example of model ● Class of students ● I want to improve the performance of the students ● Which data can I collect about this reality? ● Which model can I draw? ● Which actions can I put in place?
  • 16. The starting theory ● I rely on a theory, a direction, an idea ● The idea: students aren’t good enough because teachers are not up to the job. ● It’s just a theory, as good as any other. Other possible theories I could adopt: ○ Because students are too tired ○ Because they live in poor neighbourhoods ○ Because they spend too much time on their smartphones
  • 17. Useful data, according to my theory ● Students’ grades on tests, report cards, etc. ● Opinions about each teacher given by the school principal and the parents of the students ● I design an algorithm that evaluates teachers based on this model: ○ Students get better or worse depending on their teachers’ quality ○ Teachers getting good reviews from principals and students’ parents are actually good
  • 18. My algorithm ● If at the end of the year with teacher T, students get better grades than the year before, then T was good by a factor N ● If T gets good reviews from the principal and from students’ parents, then T was good by a factor R ○ Teacher score S = N + R ○ Among all school teachers, those in the lowest x% of the curve get fired (Figure: Gaussian curve, which fits sums of random values well)
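A minimal Python sketch of this scoring rule. The slide does not say how N and R are computed, so the choices here are assumptions: N is the change in the class average and R is a numeric review score on a comparable scale. All names are hypothetical.

```python
def teacher_score(grades_before, grades_after, review):
    """S = N + R.

    N: change in the class average from last year to this year (assumption).
    R: numeric review score from principal and parents (assumption).
    """
    average_before = sum(grades_before) / len(grades_before)
    average_after = sum(grades_after) / len(grades_after)
    n = average_after - average_before
    return n + review

def teachers_to_fire(scores, percent):
    """Names of the teachers in the lowest `percent` % of the ranking."""
    how_many = len(scores) * percent // 100
    ranked = sorted(scores, key=scores.get)   # lowest score first
    return ranked[:how_many]

# Class average drops from 7 to 6, but the reviews are good (R = 2).
print(teacher_score([6, 7, 8], [5, 6, 7], review=2.0))
```

Note that a drop in the class average and a good review simply cancel each other out in the sum: the score does not record why the grades dropped.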
  • 19. Algorithm execution and resulting actions ● The algorithm runs, I get my teachers’ scores ● I find the 5% (for example) of all teachers who rank lowest on the curve ● I fire them ● I have optimized the teaching staff ● Am I ok with that? Did I do a good job?
  • 20. That really happened ● Article in The Washington Post ● There’s a problem: ○ Sarah was a very good teacher, held in high esteem by the principal and students’ parents ○ She got a low score from the algorithm ○ She was fired ● How did that happen? ● S = N + R = low value + high value -> in the lowest 5%
  • 21. What’s wrong? ● How is it possible that a good teacher got fired? ● Model too naive ● Inconsistent data ● Too little data ● No feedback
  • 22. Wrong model ● Each model comes with a choice: it focusses on some variables and cuts out others ● Otherwise it wouldn’t be a model, i.e. a simplified version of reality ● It has a bias, a prejudice, a leaning to one side ● In our example, we consider as variables the students’ grades in the previous year and in the current year ● Too simple: an abstract oversimplification of the target reality
  • 23. Poorly consistent data ● Algorithm input data may not be consistent ● Grades of the previous year (e.g. last year of elementary school) could be higher than they should be ● In the current year, lower grades assigned by the current teacher seem to suggest a worsening of the students’ performance, but this may not be the case
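A tiny worked example of this effect in Python. The numbers and the function name are invented for illustration; this year's grades are assumed to be accurate.

```python
def apparent_change(true_level_before, true_level_after, inflation_before):
    """Change in grades as the algorithm sees it, when last year's grades were inflated."""
    recorded_before = true_level_before + inflation_before
    recorded_after = true_level_after   # assumed to be graded accurately this year
    return recorded_after - recorded_before

# The student's real level is 7.0 in both years, but last year's grade was inflated by 1.5.
print(apparent_change(7.0, 7.0, inflation_before=1.5))   # -1.5
```

The algorithm sees a drop of 1.5 points and blames the current teacher, although the student's real level has not changed.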
  • 24. Not enough data ● There is too little data. ● For a statistical model to work properly, it needs a lot of data (~millions) ● I can’t take the grades of 25 students and feed only those to the algorithm ● In any particular class, there could be a thousand reasons why those students are performing worse: ○ Problems at home ○ Personal problems ○ Change of school
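Why 25 students are too few can be shown with a small simulation. In this toy setup, which is an assumption of the sketch and not taken from the slides, every student's change in grade is pure random noise, so the teacher has no effect at all; we then look at how much the class average still moves around.

```python
import random
import statistics

def spread_of_class_averages(class_size, n_classes=100, seed=0):
    """Standard deviation of the class averages when grade changes are pure noise."""
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    averages = []
    for _ in range(n_classes):
        # Each student's change in grade: random, centred on zero.
        changes = [rng.gauss(0, 1) for _ in range(class_size)]
        averages.append(statistics.mean(changes))
    return statistics.stdev(averages)

print("classes of 25:  ", spread_of_class_averages(25))
print("classes of 2500:", spread_of_class_averages(2500))
```

With classes of 25 the averages swing much more widely than with classes of 2500, even though no teacher is better or worse than any other. A score based on one small class therefore mostly measures chance.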
  • 25. No feedback ● There isn’t any feedback which loops back to the algorithm to steer it ● The feedback comes from the current state of the reality affected by the action of the algorithm (Diagram: Modeled reality, Model and Algorithm, linked by arrows labelled theory, action and feedback)
  • 26. Impossible to get things back on an even keel ● With no feedback the algorithm goes off on its own ● It can’t be updated with data extracted from the observed reality after its start ● If we fired good teachers, we’ll never know ● If we kept bad teachers, we’ll never know either ● The algorithm doesn’t register the mistakes it makes, let alone learn from them
  • 27. Automatic speed controller ● Cruise control (controlled variable: speed - but it could be direction too) ● The car must hold a constant 100 km/h ● It works in the same way as the other example, only the reality is simpler here (Diagram: Controller and Real speed, linked by arrows labelled action and feedback)
  • 28. Controller without feedback ● The controller gives the engine an initial power and the car reaches e.g. 110 km/h ● From that moment on, what happens? We’ll never know (Diagram: Controller and Real speed, linked only by an arrow labelled action)
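The difference between a controller with and without feedback can be sketched with a toy simulation. The car model, the `drag`, `gain` and `push` values and the function name are all invented for this sketch; `push` stands for something the controller did not plan for, such as a downhill road.

```python
def cruise(target_kmh, use_feedback, steps=2000, push=1.0):
    """Toy car: returns the speed reached after `steps` time steps."""
    drag = 0.1    # fraction of the speed lost at each step (toy value)
    gain = 0.05   # how strongly the controller reacts to the error (toy value)
    power = drag * target_kmh   # initial power: exactly right on a flat road
    speed = 0.0
    for _ in range(steps):
        if use_feedback:
            # Feedback: compare the real speed with the target and adjust the power.
            power += gain * (target_kmh - speed)
        # Real speed: engine power, minus drag, plus the unplanned push.
        speed += power - drag * speed + push
    return speed

print("with feedback:   ", round(cruise(100, use_feedback=True), 1))
print("without feedback:", round(cruise(100, use_feedback=False), 1))
```

Without feedback the controller keeps applying the power it chose at the start and the car settles at about 110 km/h. With feedback the power keeps being corrected until the real speed matches the 100 km/h target.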
  • 29. Weapons of math destruction ● Weapons of Math Destruction - Cathy O’Neil ● Checklist of the features of a weapon of math destruction (WMD): ○ Model and algorithm non-transparent (black box): we don’t know what’s inside ○ Harmful for people. ○ Even worse, it builds a vicious circle that makes things worse, whereas one of its objectives was to improve objectivity and remove inequalities. ○ It can scale to big numbers.
  • 30. Vicious circle ● In our example of teachers there’s no vicious circle ● The algorithm output is hardly meaningful, but at least it doesn’t make things worse ● Another real example, with a vicious circle: ● Algorithm assigning years in prison: it gives more years to people with past convictions or previous dealings with the justice system (used in some US states) ○ Someone living in a rough neighborhood will more likely have higher algorithm scores -> harsher and longer sentences -> even more disadvantaged once out of prison ■ -> vicious circle
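The loop described above can be sketched as a toy Python model. Every number and formula here is invented purely to show the shape of the circle and has nothing to do with any real scoring system; the sketch also assumes that the same person comes before the algorithm again at every round.

```python
def vicious_circle(initial_disadvantage, rounds=3):
    """Sentences (in years) handed out over successive rounds in a toy model."""
    disadvantage = initial_disadvantage   # e.g. living in a rough neighborhood
    priors = 0                            # past convictions
    sentences = []
    for _ in range(rounds):
        score = disadvantage + priors     # higher score for disadvantage and priors
        years = 1 + score                 # higher score -> longer sentence
        sentences.append(years)
        priors += 1                       # the conviction counts against you next time
        disadvantage += years             # longer sentence -> worse position once out
    return sentences

print("starting from 0:", vicious_circle(0))   # [1, 3, 7]
print("starting from 2:", vicious_circle(2))   # [3, 7, 15]
```

The output of each round feeds the input of the next one, so sentences grow at every round, and the gap between the two starting positions widens from 2 years to 8.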
  • 31. When algorithms on Big Data come into play and when they don’t ● Leveraging algorithms and Big Data to make choices is easy: ○ Automatic ○ Fast ○ With no responsibility for people who use them ● Whenever you need to choose one single precious individual, you still ask people to do that (e.g. hiring a lawyer for a prestigious firm). ● Whenever you need to choose thousands of times among thousands of interchangeable people, algorithms will do that (e.g. applicants for McDonald’s) ● Algorithms save money and time, but with the side effect of ruining the lives of the many individuals for whom they simply fail (collateral damage)
  • 32. Algorithms on Big Data as weapons of math destruction ● That’s why they are called weapons of math destruction ● They embody a simplistic and biased view of a certain reality ● Making decisions on the basis of a few variables ● Producing numbers, scores, rankings which look objective ● Everyone starting from an unfavourable position according to the algorithm parameters will end up sinking even lower (e.g. more years in prison for repeat offenders, even for minor dealings with the justice system, in a poor neighborhood) - vicious circle: the offender will more likely fare badly in the future ● Inequalities grow, the exact opposite of what the algorithm’s designers expect
  • 33. Example of WMD in our brain ● Everyone is associated with their number of followers on social media ● Whoever already has a big number of them will get more and more ● Whoever has a small number will hardly get more. Why? ● In our brain there’s a little WMD ○ If someone has a lot of followers, then she/he is an important person, so I’m going to follow her/him ○ If someone has few followers, then she/he is a loser, so I’m not going to follow her/him ○ Vicious circle ● That number is now for us an objective, integral part of the person
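The follower effect can be sketched with a toy model in Python. The rule used here is an assumption made for illustration: at every round a fixed number of new followers is split among the accounts in proportion to the followers they already have. The account names and numbers are invented.

```python
def grow(followers, rounds, new_per_round=100):
    """Split new followers in proportion to the followers each account already has."""
    counts = dict(followers)
    for _ in range(rounds):
        total = sum(counts.values())
        # Compute every gain first, so that all accounts see the same totals.
        gains = {name: new_per_round * n / total for name, n in counts.items()}
        for name, gain in gains.items():
            counts[name] += gain
    return counts

start = {"popular": 900, "unknown": 100}
end = grow(start, rounds=10)
print(end)
print("gap before:", start["popular"] - start["unknown"])
print("gap after: ", round(end["popular"] - end["unknown"]))
```

In this model the popular account collects 90 of every 100 new followers only because it was already popular, so the distance between the two accounts keeps growing round after round.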