1) Jordan Engbers is a chief scientist and CTO who has experience in bioinformatics, neuroscience, clinical data science, and founding two data science companies.
2) Data science is a multidisciplinary field that uses techniques from many areas like statistics, computer science, and domain knowledge to understand data and help improve decision making.
3) The impact of data science comes from developing data products - tools that deliver insights from data to drive better decisions. This requires both scientific rigor and software engineering practices.
Data Science: An Emerging Field for Future Jobs (Jian Qin)
Data deluge has become a reality in today's scientific research. What does it mean to future science workforce? How can you prepare yourself to embrace the data challenges and opportunities? This presentation will provide you with an overview of data science and what it means to you as future researchers and career scientists.
AI seems to be becoming a buzzword, much like design thinking. Everyone is talking about AI or wants to have AI, and sees all the ideas and benefits. That's fine, but how do you get started, and what's different now? Three innovations have finally put AI on the fast track: Big Data, with the internet and sensors everywhere; massive computing power, especially through the Cloud; and the development of breakthrough algorithms, so computers can be trained to accomplish more sophisticated tasks on their own with deep learning. If you use new technology, you need to explore and know what's possible. Design thinking helps you outline the steps and define how you're going to create the solution, starting with mapping the customer journey and defining who will use the service enhanced with intelligent technology, or who will benefit and gain value from it. We discuss how these two worlds are coming together, and how you can get started transforming your venture with Artificial Intelligence using Design Thinking.
Speaker: Claudio Mirti, Principal Solution Specialist – Data & AI, Microsoft
A huge amount of data is being collected everywhere: when we browse the web, go to the doctor's clinic, visit the supermarket, tweet or watch a movie. This plethora of data is dealt with under a new realm called Data Science. Data Science is now recognized as a highly critical, growing area with impact across many sectors including science, government, finance, health care, social networks, manufacturing, advertising, retail, and others. This colloquium will try to provide an overview as well as clarify bits and pieces about this emerging field.
Check out what machine learning can do when implemented by hospital administrators for their operational services. We tested it on historical data and got results that could turn around ROIs for many hospitals suffering losses today.
This workshop is a hands-on introduction to machine learning with R and was presented on December 8, 2017 at the University of South Carolina for the 2017 Computational Biology Symposium held by the International Society for Computational Biology Regional Student Group-Southeast USA.
Machine learning is permeating nearly every industry – from retail and financial services to entertainment and transportation. And, while it's been slow to make its way into healthcare, machine learning stands to transform this space, too… positioning us to better diagnose, predict outcomes, provide follow-up care, and tailor treatments.
In this webinar, PointClear Solutions' Michael Atkins discusses the current state of machine learning in healthcare and what we can expect in the near future:
• What is machine learning and how is it being used today?
• What are some of the risks and obstacles we face in implementing this new technology?
• Looking into the future, what role will machine learning play in transforming healthcare?
• How can my company prepare for machine learning?
Data Science - An emerging Stream of Science with its Spreading Reach & Impact (Dr. Sunil Kr. Pandey)
This is my presentation on the Topic "Data Science - An emerging Stream of Science with its Spreading Reach & Impact". I have compiled and collected different statistics and data from different sources. This may be useful for students and those who might be interested in this field of Study.
A Practical-ish Introduction to Data Science (Mark West)
In this talk I will share insights and knowledge that I have gained from building up a Data Science department from scratch. This talk will be split into three sections:
1. I'll begin by defining what Data Science is, how it is related to Machine Learning and share some tips for introducing Data Science to your organisation.
2. Next up we'll run through some Machine Learning algorithms commonly used by Data Scientists, along with example use cases where these algorithms can be applied.
3. The final third of the talk will be a demonstration of how you can quickly get started with Data Science and Machine Learning using Python and the Open Source scikit-learn Library.
Artificial intelligence (AI) technologies, such as natural language processing (NLP), have been around for some time, and more recently there has been much hype surrounding the potential of combining AI with Machine Learning (ML) for decision making. But has it met the challenge? This webinar reviews what NLP is, the role NLP plays in machine learning approaches, such as deep learning, and some real-world use cases for application to life sciences and healthcare to improve patient outcomes.
Pistoia Alliance Webinar Demystifying AI: Centre of Excellence for AI Webina... (Pistoia Alliance)
Pistoia Alliance launched its Centre of Excellence for Artificial Intelligence (AI) in Life Sciences where we hope to bring together best practice, adoption strategy and hackathons covering a range of challenges.
Over the coming months we will be hosting a series of topics and speakers giving their perspectives on the role of Artificial & Augmented Intelligence in Life Sciences and Healthcare.
The topics will cover some of the current challenges, user stories & value in using AI in life sciences. If you want to get involved in this series as a speaker or suggest topics please get in touch
Webinar 1 will focus on the following:
A Brief History
Big Data/ML/DL/AI - fundamentals and concepts
Data Fidelity importance
Some best practices
Understand the Demand of Analyst Opportunity in U.S. (Jiaming Zhang)
The slides summarize an analysis of the demand pattern for analyst opportunities (like data analyst, data science) in the U.S.
In a nutshell, it answers four questions, covering the demand trend, demand source, and degree and skill requirements, based on online job posting data.
Paradigm4 Research Report: Leaving Data on the table (Paradigm4)
While Big Data enjoys widespread media coverage, not enough attention has been paid to what practitioners think — data scientists who manage and analyze massive volumes of data. We wanted to know, so Paradigm4 teamed up with Innovation Enterprise to ask over 100 data scientists for their help separating Big Data hype from reality. What we learned is that data scientists face multiple challenges achieving their company’s analytical aspirations. The upshot is that businesses are leaving data — and money — on the table.
Understanding Data Science: Unveiling the Basics
What is Data Science?
Data science is an interdisciplinary field that combines techniques from statistics, mathematics, computer science, and domain knowledge to extract insights and knowledge from data. It involves collecting, processing, analyzing, and interpreting large and complex datasets to solve real-world problems.
Importance of Data Science
In today's data-driven world, organizations are inundated with data from various sources. Data science allows them to convert this raw data into actionable insights, enabling informed decision-making, improved efficiency, and innovation.
Intersection of Data Science, Statistics, and Computer Science
Data science borrows heavily from statistics and computer science. Statistical methods help in understanding data patterns, while computer science provides the tools to process and analyze large datasets efficiently.
Key Components of Data Science
Data Collection and Storage
The first step in data science is gathering relevant data from various sources. This data is then stored in databases or data warehouses for further processing.
Data Cleaning and Preprocessing
Raw data is often messy and inconsistent. Data cleaning involves removing errors, duplicates, and irrelevant information. Preprocessing includes transforming data into a usable format.
Exploratory Data Analysis (EDA)
EDA involves visualizing and summarizing data to uncover patterns, trends, and anomalies. It helps in forming hypotheses and guiding further analysis.
Machine Learning and Predictive Modeling
Machine learning algorithms are used to build predictive models from data. These models can make predictions and decisions based on new, unseen data.
Data Visualization
Visual representations of data, such as graphs and charts, help in understanding complex information quickly. Data visualization aids in conveying insights effectively.
The Data Science Process
Problem Definition
The data science process begins with understanding the problem you want to solve and defining clear objectives.
Data Collection and Understanding
Collect relevant data and understand its context. This step is crucial as the quality of the analysis depends on the quality of the data.
Data Preparation
Clean, preprocess, and transform the data into a suitable format for analysis. This step ensures that the data is accurate and ready for modeling.
Model Building
Select appropriate algorithms and build predictive models using machine learning techniques. This step involves training and fine-tuning the models.
Model Evaluation and Deployment
Evaluate the model's performance using metrics and test datasets. If the model performs well, deploy it for making predictions on new data.
Technologies Driving Data Science
Programming Languages
Languages like Python and R are widely used in data science due to their extensive libraries and versatility.
Machine Learning Libraries
Libraries like Scikit-Learn and TensorFlow prov
Data science is an integrative field that uses scientific methods, processes, algorithms, and systems to extract knowledge and awareness from data in various forms.
Bridge the Gap Between Data and Decisions: Master Data Science Course using Machine Learning
Empower yourself with the in-demand skills of data science and machine learning through our dynamic Applied Hybrid Training program!
This innovative data science course seamlessly blends classroom instruction with online learning, providing a well-rounded foundation for your data science journey. Learn to unlock the power of data and leverage machine learning algorithms to solve real-world challenges.
Uncover the Magic Behind the Data:
Machine Learning Fundamentals: Demystify the concepts of machine learning algorithms and explore their practical applications across various industries.
Python Programming Prowess: Gain hands-on experience with Python, the language of choice for data science. Learn how to leverage its libraries and tools to implement machine learning models effectively.
Data Wrangling Expertise: Master techniques for handling and manipulating datasets from diverse fields. Understand how to prepare data for optimal use with machine learning algorithms.
Actionable Insights from Algorithms: Discover how to interpret machine learning outputs and translate them into actionable insights that drive real-world results.
Data Communication Mastery: Learn to communicate your data science findings with clarity and impact, effectively presenting the results of your machine learning models.
By the end of this Data Science Course using Machine Learning, you will have enough knowledge and hands-on expertise in Python to use and apply them in the real world around you. Also, you will be able to get prepared for certifications of Data Camp and Cognitive AI.
This talk presents areas of investigation underway at the Rensselaer Institute for Data Exploration and Applications. First presented at Flipkart, Bangalore India, 3/2015.
Data Science Demystified: Journeying Through Insights and Innovations (Vaishali Pal)
In the digital age, data has emerged as one of the most valuable resources, driving decision-making processes across industries. Data science, the interdisciplinary field that extracts insights and knowledge from structured and unstructured data, plays a pivotal role in leveraging this resource. This section provides an overview of data science, its importance, and its applications in various domains.
Introduction to Data Science: Unveiling Insights Hidden in Data (hemayadav41)
Embark on a journey into the fascinating field of Data Science and uncover the valuable insights concealed within vast datasets. In this article, we explore the fundamental concepts of Data Science and its applications. Discover how a Data science Training Institute in Jaipur, Lucknow, Indore, Mumbai, Delhi, Noida, Gurgaon and other cities in India can equip you with the knowledge and skills to analyze, interpret, and extract meaningful information from data. Explore topics such as data preprocessing, statistical analysis, machine learning, and data visualization. Join us on this enlightening exploration of the world of Data Science.
This talk is an introduction to Data Science. It explains Data Science from two perspectives: as a profession and as a discipline. While covering the benefits of Data Science for business, it explains how to get started with embracing data science in business.
Frankie Rybicki slide set for Deep Learning in Radiology / Medicine (Frank Rybicki)
These are my #AI slides for medical deep learning using #radiology and medical imaging examples. Please use them & modify to teach your own group about medical AI.
14. Take Away
There is no set path to becoming a data scientist
Focus on:
Developing a scientific mindset
Strengthening your “metaskills”
Exploring many disciplines
15. Should you listen to me?
I am not speaking as an authority
I am here to share what I have learned and to help move
people forward in data science
So:
- Don’t take what I say at face value
- Test for yourself
- Challenge what you hear
- Come up with new and better ideas
19. What is Data Science?
Wikipedia says that:
“...interdisciplinary field about processes and systems to extract knowledge or
insights from data in various forms, either structured or unstructured, which is a
continuation of some of the data analysis fields such as statistics, data mining, and
predictive analytics…”
“...Data science employs techniques and theories drawn from many fields within
the broad areas of mathematics, statistics, information science, and computer
science, including signal processing, probability models, machine learning,
statistical learning, data mining, database, data engineering, pattern recognition
and learning, visualization, predictive analytics, uncertainty modeling, data
warehousing, data compression, computer programming, artificial intelligence, and
high performance computing.”
20. What is a Data Scientist?
“Data scientists use their data and analytical ability to
find and interpret rich data sources;
manage large amounts of data despite hardware, software, and bandwidth
constraints;
merge data sources;
ensure consistency of datasets;
create visualizations to aid in understanding data;
build mathematical models using the data; and
present and communicate the data insights/findings.”
23. The purpose of a scientific discipline
Do the following descriptions make sense?
- Astronomy is the field of science that uses telescopes
- Chemistry is about mixing chemicals and torturing undergrads
- Statistics uses maths
Nope.
- Astronomy is the study of celestial objects and processes that allows
us to understand the universe
- Chemistry examines the composition, structure, properties and
change of matter to help us understand the physical world
- Statistics allows us to use data more effectively by studying the
collection, analysis, interpretation, and organization of data….
Methods are invented to serve the field,
not as a purpose in themselves.
24. Is data science just statistics “rebranded”?
"Data scientist is just a sexed up word for statistician."
- Nate Silver
“Statistical modelling - two cultures” - Leo Breiman
“50 Years of Data Science” - David Donoho
Summary: data science is just an expanded form of statistics
But see:
“What ‘50 years of data science’ leaves out” - Sean Owen, Cloudera
What is the purpose of data science?
25. Data Science is about decisions
We democratize data access to empower all employees to make data-informed
decisions, give everybody the ability to use experiments to correctly measure the
impact of their decisions, and turn insights on user preferences into data
products that improve the experience of using Airbnb
- Scaling Knowledge at Airbnb
That is more than statistics:
- Need to understand business processes
- Requires data engineering approaches to provide the
environment
- Requires software engineering to create platforms to measure
the impact and develop the data products
26. Data science is the scientific discipline focused on determining
how data can drive better decisions across a wide set of
domains
Scientific discipline - not just data analysis, but science
“...determining how data...”
- methodologies, statistics, computer science
“...can drive better decisions…”
- domain knowledge, science, engineering, social sciences...
27. How does a focus on decisions change our approach?
1) Takes the focus away from specific methodologies (we do deep learning
too!) to using the appropriate methodologies to achieve a larger
overarching goal - better decisions
a) Side effect is we get to use a larger array of disciplines
i) Systems theory
ii) Psychology
2) Focus on making good data products that change decisions
a) Focusing on data products takes us away from “scripts” and towards an
engineered approach to data product manufacturing
28. Data science is not rebranded statistics.
Data science is a multidisciplinary
discipline that seeks to understand how
data can be used to improve decision
making.
Statistics is just a part of the approach.
30. What is a data product?
[Diagram labels: World, Experience, Decision, Desired Outcome; learning: data, information, knowledge, wisdom; data product; six alternative "Other Outcome" branches]
31. Data products are the mechanism
by which data science creates impact
32. Scientific Method
Framework for finding value in data
Data is a raw resource. Converting data to a data product requires experimentation, exploration and learning. This is the domain of science.
Agile Development
Process for creation in the face of uncertainty
Agile processes allow software teams to meet changing requirements, but stay on track and create effective products.
Engineered Products
Practices for ensuring high quality products
It is one thing to make an R script to analyse a dataset. It is another to have a resilient, auditable, scalable data product.
Desid Labs Approach
“Data science - more than just R scripts”
- unofficial Desid Labs motto
33. Levels of data products
Reporting
Dashboards
Prediction
AI (Autonomous)
Intelligent Decision-making Support Systems
34. Other dimensions
Complexity of UI
Complexity, size, and speed
of data, information, and knowledge (3V’s)
This branches into the field of AI and decision making
Start with Herbert Simon
35. Learning from the other doctors (MD, not PhD)
Clinical Decision Rules (Dr. Ian Stiell)
1) Derivation
2) Validation
a) Cross-validation (should be standard practice!)
b) Prospective validation - this is the real experiment
3) Implementation
4) Studying barriers to adoption
These steps help determine the validity of
your data product
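The cross-validation step (2a) can be sketched in a few lines. This is a minimal illustration in plain Python, not a method from the talk: the threshold rule, the function names and the toy data are all assumptions made for the example. The rule is derived on part of the data and scored only on the fold that was held out.

```python
import random
from statistics import mean

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def derive_threshold(xs, ys):
    """Derivation: pick the cut-off on x that best separates the labels."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(xs)):
        acc = mean(int((x >= t) == bool(y)) for x, y in zip(xs, ys))
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def cross_validate(xs, ys, k=5, seed=0):
    """Validation: accuracy of the rule on each fold it was NOT derived from."""
    scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        t = derive_threshold([xs[i] for i in train], [ys[i] for i in train])
        scores.append(mean(int((xs[i] >= t) == bool(ys[i])) for i in held_out))
    return scores

# Hypothetical toy data: a risk score x and a binary outcome y
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = cross_validate(xs, ys, k=5)
print("per-fold accuracy:", scores, "mean:", mean(scores))
```

Cross-validation only reuses data you already have. Prospective validation (2b) still needs new data collected after the rule is fixed.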
36. More than just R scripts
“It’s one thing to create an excellent fraud detection model in R, and quite another
to build:
● Fault-tolerant ingest of live data at scale that could represent fraudulent
actions
● Real-time computation of features based on the data stream
● Serialization, versioning and management of a fraud detection model
● Real-time prediction of fraud based on computed features at scale
● Learning over all historical data
● Incremental update of the production model in near-real-time
● Monitoring, testing, productionization of all of the above”
- Sean Owen, Cloudera
These are the sorts of things to think about when it
comes to implementing your data product
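As one small illustration of the "serialization, versioning and management" item, here is a sketch using only the Python standard library. The registry dictionary, the function names and the toy model are assumptions for the example. A production system would more likely use a dedicated artifact store.

```python
import hashlib
import pickle

def save_model(model, registry):
    """Serialize a model and file it under a version id derived from its bytes."""
    blob = pickle.dumps(model)
    version = hashlib.sha256(blob).hexdigest()[:12]
    registry[version] = blob
    return version

def load_model(version, registry):
    """Restore exactly the model that was saved under this version id.
    Note: only unpickle data from a source you trust."""
    return pickle.loads(registry[version])

registry = {}  # stand-in for a real artifact store (hypothetical)
model_v1 = {"kind": "threshold_rule", "threshold": 0.8}
model_v2 = {"kind": "threshold_rule", "threshold": 0.7}
v1 = save_model(model_v1, registry)
v2 = save_model(model_v2, registry)
print(v1, v2)
```

Deriving the version id from the content means a change to the model cannot silently keep the old id, which helps make predictions auditable.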
40. Learn by doing
1) Figure out where you are in the spectrum
2) Determine what experience you need to expand in either
direction
3) Find projects that will give you that experience
a) Online competitions
b) Hackathons
c) Freelance work
d) Your own projects
e) Data journalism
f) Data for Good (!)
41. Post production
Treat your data product as a hypothesis about the world
● Collect prospective data on its use
● Perform cohort analyses on people who make decisions based
on the data
● Consider A/B testing
● Consider canary testing
● Set a point where you will analyze the data (X people, X
amount of time)
● Answer the question - did it make a difference?
● Did it make the right difference?
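For the A/B testing suggestion, one common way to ask "did it make a difference?" is a two-proportion z-test. The sketch below is an illustration, not the talk's method. The counts are made up, and the test assumes large, independent groups.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Compare the success rates of groups A and B.
    Returns (z, two-sided p-value) using the pooled standard error."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("rates are all 0 or all 1; the test is undefined")
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: good decisions without (A) and with (B) the data product
z, p = two_proportion_z_test(success_a=120, n_a=1000, success_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Fixing the sample size or time window in advance, as the slide suggests, matters here: checking the p-value repeatedly and stopping at the first small one inflates false positives.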
43. “...science and technology have been unable to
keep pace with the second-order effects caused
by their first-order victories.”
- Gerald Weinberg
44. How do we know that our data products are having
the desired effect?
Data is cleaned, features determined, model created (AUC: 0.88!), implementation
tested, UI designed, UX tested, integrated into production system, monitored.
Everything is done
Pat on the back - walk away
Next month’s headline:
45. What happened?
- An algorithm is only as good as its data
- An algorithm learns from the data - data is a
representation of the real world including its flaws
- The real world is complex and there can be non-linear
effects
46. Obviously Data for Evil (Commission)
Predatory advertising
Surveillance of dissidents, activists
Identity theft
Social Engineering
47. Gray areas
Weblining
Databases in elections to determine wedge issues
Surveillance for security reasons
Targeted advertising
48. Data for Good … right? (Omission)
Model to determine who will respond best to social assistance
What if the data is from an area with strong historical racism?
(Don’t use variables/features that could impart racial bias)
Automatic tagging of photos
What are the consequences of the algorithm being wrong?
(Need to balance sensitivity and specificity)
Apps to help first responders (geolocation)
Will providing a service to some people limit access based on
arbitrary technology choices?
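The sensitivity/specificity balance mentioned above can be made concrete. In this sketch the scores, labels and thresholds are hypothetical. It shows that moving the decision threshold trades one kind of error for the other.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical model scores and true labels
scores = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 1, 0, 1, 1]

results = {}
for threshold in (0.35, 0.65):
    preds = [int(s >= threshold) for s in scores]
    results[threshold] = sensitivity_specificity(labels, preds)
    print(threshold, results[threshold])
```

The lower threshold catches every positive case but raises a false alarm; the higher one removes the false alarm but misses a positive. Which error is worse depends on the consequences of being wrong in the application.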
49. How Big Data Enables Economic Harm to
Consumers, Especially to Low-Income and
Other Vulnerable Sectors of the Population
50. Algorithms aren’t biased - but data is
Historical data encompasses our societal biases
Algorithms learn from that data and inherit these biases
https://www.fordfoundation.org/ideas/equals-change-blog/posts/can-computers-be-racist-big-data-inequality-and-discrimination/
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2477899
https://www.propublica.org/article/when-big-data-becomes-bad-data
https://theconversation.com/big-data-algorithms-can-discriminate-and-its-not-clear-what-to-do-about-it-45849
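One simple, partial check for inherited bias is to compare how often a model makes the favourable decision for each group. The sketch below is an assumption-laden illustration: the group labels and decisions are made up, and a gap in rates is a prompt for investigation, not proof of bias on its own.

```python
from collections import defaultdict

def selection_rates(groups, decisions):
    """Fraction of favourable decisions (1) per group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for g, d in zip(groups, decisions):
        totals[g] += 1
        positives[g] += d
    return {g: positives[g] / totals[g] for g in totals}

# Hypothetical model decisions for two groups
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
decisions = [1, 1, 1, 0, 1, 0, 0, 0]

rates = selection_rates(groups, decisions)
# Ratio of lowest to highest rate; assumes at least one favourable decision
ratio = min(rates.values()) / max(rates.values())
print(rates, round(ratio, 2))
```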
51. So what do we do?
Possibilities:
● Strengthen User Control of Personal Data
● Enforce Structural Changes in Market to Increase Competition
● Directly Regulate Big Data Platforms to Prohibit Harmful Practices
● Investing in the technical capacity of public interest lawyers, and developing a
greater cohort of public interest technologists
● Pressing for “algorithmic transparency.”
● Exploring effective regulation of personal data
● Ethical code of conduct for data science
These are strategic suggestions -
they suggest the what, but not the how
52. We need a solution that keeps pace with the tech
1) Systematic scientific process should be applied
Equivalent of peer review
2) Agile development and testing
Ensure models are implemented correctly
3) Systems modeling
Understand the second-order effects of the system
4) Monitoring
Validation of our model in the world
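The monitoring point (4) could start as something as small as a rolling accuracy check on live predictions. This sketch is hypothetical: the class name, window size and alert level are assumptions, and real monitoring would also track input drift and downstream decisions.

```python
from collections import deque

class AccuracyMonitor:
    """Track accuracy over the most recent predictions with known outcomes."""

    def __init__(self, window=100, alert_below=0.8):
        self.outcomes = deque(maxlen=window)  # 1 = correct, 0 = wrong
        self.alert_below = alert_below

    def record(self, predicted, actual):
        self.outcomes.append(int(predicted == actual))

    def accuracy(self):
        if not self.outcomes:
            return None  # nothing observed yet
        return sum(self.outcomes) / len(self.outcomes)

    def needs_review(self):
        acc = self.accuracy()
        return acc is not None and acc < self.alert_below

monitor = AccuracyMonitor(window=4, alert_below=0.75)
for predicted, actual in [(1, 1), (0, 0), (1, 1), (0, 0)]:
    monitor.record(predicted, actual)
healthy = monitor.needs_review()   # all four recent predictions were right

for predicted, actual in [(1, 0), (0, 1)]:
    monitor.record(predicted, actual)
degraded = monitor.needs_review()  # two of the last four were wrong
print(healthy, degraded)
```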
53. Conclusions
Data science is about decisions.
The creation of data products involves many
disciplines
Determine where you are at, then expand your
skills
Approach data science with care and thought - it
is as easy to hurt as it is to help
54. If you are interested in specifics about
methodologies, sign up for the Desid Labs
newsletter:
desidlabs.com