The document discusses FAIR data and its importance. FAIR stands for Findable, Accessible, Interoperable, and Reusable. The author argues that data science is becoming a major driver in many fields because of the large amounts of digital data being created. For data and data science to reach their full potential, data needs to be FAIR so it can be easily discovered, accessed, integrated, and reused. An example is given of a researcher who combines health and vehicle crash data using data science techniques to improve emergency care. Making data FAIR enables greater collaboration, public-private partnerships, and opportunities for translation.
1. What is FAIR Data and Who Needs It?
Philip E. Bourne PhD
peb6a@virginia.edu
https://www.slideshare.net/pebourne
November 2, 2023 COGGE @ NASM
2. My Perspective/Bias
• Data resource developer
• Open science/scholarship advocate
• An author of the FAIR Principles
• First Chief Data Officer for NIH
• Founding dean of a school of data science
• Biologist, not a Geotech person
11/02/23 2
COGGE
3. What is Open Data?
Open data refers to data that is freely available to
the public without any restrictions on its use or
distribution. The idea behind open data is to
promote transparency, enable research, and foster
innovation by allowing individuals and
organizations to access and use datasets without
encountering legal, technical, or financial barriers.
Source: ChatGPT
4. How is Open Data Licensed?
License Types
• Public Domain Dedication and License (PDDL)
• Open Data Commons Attribution License (ODC-BY)
• Open Data Commons Open Database License (ODbL)
• Creative Commons Licenses
• MIT License and BSD Licenses
• Government-specific Licenses
Conditions
• Public access
• Attribution
• Copyright, patent, IP rights
• Share-alike
• Commercial/non-commercial
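The license types and conditions above can be made concrete as a lookup from a license to the reuse conditions it imposes. This is a minimal sketch, not legal advice: the SPDX-style identifiers, the three condition flags, and the specific Creative Commons variants (including CC0) are assumptions chosen for illustration, and the flags summarize only each license's headline terms. Government-specific licenses are omitted because their terms vary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Conditions:
    attribution: bool     # reusers must credit the source / keep the notice
    share_alike: bool     # derived works must carry the same license
    non_commercial: bool  # commercial reuse is not permitted


# Headline terms only (illustrative assumption, not a legal summary).
LICENSES = {
    "PDDL-1.0": Conditions(False, False, False),
    "CC0-1.0": Conditions(False, False, False),
    "ODC-By-1.0": Conditions(True, False, False),
    "ODbL-1.0": Conditions(True, True, False),
    "CC-BY-4.0": Conditions(True, False, False),
    "CC-BY-SA-4.0": Conditions(True, True, False),
    "CC-BY-NC-4.0": Conditions(True, False, True),
    "MIT": Conditions(True, False, False),
    "BSD-3-Clause": Conditions(True, False, False),
}


def allows_commercial_reuse(spdx_id: str) -> bool:
    """True unless the license restricts use to non-commercial purposes."""
    return not LICENSES[spdx_id].non_commercial


def obligations(spdx_id: str) -> list[str]:
    """List the obligations a reuser takes on under the given license."""
    c = LICENSES[spdx_id]
    out = []
    if c.attribution:
        out.append("attribution")
    if c.share_alike:
        out.append("share-alike")
    if c.non_commercial:
        out.append("non-commercial only")
    return out


print(obligations("ODbL-1.0"))  # ['attribution', 'share-alike']
print(obligations("PDDL-1.0"))  # []
```

The public-domain dedications (PDDL, CC0) impose no conditions at all, which is why they are often preferred when data are meant to be combined freely with other sources.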
11. Data Science as a Driver: It's Just the Beginning…
https://zenodo.org/record/6497693
45 Members
Data scientist jobs are predicted to grow 36 percent between 2021 and 2031, according to the US Bureau of Labor Statistics.
The global data science platform market was valued at USD 64.14 billion in 2021 and is projected to grow from USD 81.47 billion in 2022 to USD 484.17 billion by 2029, a compound annual growth rate (CAGR) of 29.0% over the forecast period.
Data science is the fastest-emerging field around the globe.
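The growth figures above can be checked with the standard CAGR formula, (end / start)^(1 / years) - 1. This sketch assumes the 2022 to 2029 forecast spans 7 years and reads the BLS figure as 36% total growth over 10 years, so its annual rate is an implied value rather than one stated on the slide.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end / start) ** (1 / years) - 1


# Market size in USD billions, 2022 -> 2029 (7 years).
market_rate = cagr(81.47, 484.17, 2029 - 2022)

# 36% total job growth over 2021 -> 2031 (10 years), i.e. 1.00 -> 1.36.
jobs_rate = cagr(1.00, 1.36, 2031 - 2021)

print(f"Market CAGR 2022-2029: {market_rate:.1%}")
print(f"Implied annual job growth 2021-2031: {jobs_rate:.1%}")
```

The market figure works out to about 29.0% per year, matching the slide. The 36% job projection corresponds to roughly 3.1% per year, which shows that a decade-long total and an annual rate are easy to conflate.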
12. Data Science: In 45+ Years in Academia I Have Never Seen Anything Like It
• It is a response to the digital transformation of society
• It is touching every discipline (aka vertical)
• We can’t keep the students out of our classes
• Cause: large amounts of digital data
• Effect: interdisciplinarity, openness, translation, search for responsibility, and more
In summary, it is disruptive and soon (now?) the driver of what you do
16. How Disruptive – Witness AlphaGo
https://www.alphagomovie.com/
1. Even the programmers were disquieted by creating something better than any human
2. AlphaGo made a move that neither human Go experts nor its programmers anticipated
3. It takes a lot of resources to defeat the world champion
Go has more possible board positions than there are atoms in the universe
17. Proteins have ~20^300 combinations, also more than the number of atoms in the universe
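The size comparisons on the last two slides are easy to check with exact integer arithmetic. This sketch assumes that ~20^300 means 20 standard amino acids at each of 300 residue positions, uses the commonly quoted rough estimate of 10^80 atoms in the observable universe, and uses 3^361 (each of the 19 x 19 points empty, black, or white) as an upper bound on Go board configurations; the number of legal positions is a fraction of that bound.

```python
import math

# 20 amino acid choices at each of 300 positions (assumed reading of ~20^300).
protein_sequences = 20 ** 300

# Upper bound on Go board configurations: 361 points, 3 states each.
go_configurations = 3 ** 361

# Commonly quoted order-of-magnitude estimate (an assumption, not a measurement).
ATOMS_IN_UNIVERSE = 10 ** 80

counts = {
    "protein sequences": protein_sequences,
    "Go configurations (upper bound)": go_configurations,
}

for name, n in counts.items():
    # math.log10 accepts arbitrarily large Python integers.
    exponent = math.log10(n)
    print(f"{name}: about 10^{exponent:.0f}; exceeds 10^80 atoms: {n > ATOMS_IN_UNIVERSE}")
```

Both spaces dwarf 10^80 (about 10^390 and 10^172), which is the point of both slides: exhaustive search is out of the question, so learning from data is required.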
21. Logistics Behind the Win
● Nothing fundamentally new from an AI perspective
● FAIR Data
● Collaboration not competition
● Engineering challenge beyond most labs
● Compute power beyond most labs
● Team size beyond most labs
● Worked with protein structure specialists
22. Downstream Implications
• Cooperation rather than competition
• Public-private partnership
• Translational possibilities are endless
• Made possible by curated open data
• Appreciate engineering
23. OK, if data are important, they need to be FAIR
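What "being FAIR" asks of a dataset can be illustrated with a few automated checks on its metadata record. This is a minimal sketch under stated assumptions: the field names (identifier, access_url, media_type, and so on) are hypothetical, only DOIs count as persistent identifiers here, and each check is a crude proxy loosely keyed to one FAIR sub-principle (F1, F2, A1, I1, R1.1), not an official FAIR assessment.

```python
from urllib.parse import urlparse

# Hypothetical descriptive fields required by the F2 check below.
DESCRIPTIVE_FIELDS = ("title", "description", "keywords")


def fair_gaps(record: dict) -> list[str]:
    """Return gaps found by a few crude, illustrative FAIR checks."""
    gaps = []

    # F1: a globally unique, persistent identifier (here, only DOIs are accepted).
    identifier = str(record.get("identifier", ""))
    if not identifier.startswith(("10.", "https://doi.org/10.")):
        gaps.append("F1: no persistent identifier (DOI)")

    # F2: rich descriptive metadata (here, three required fields).
    missing = [field for field in DESCRIPTIVE_FIELDS if not record.get(field)]
    if missing:
        gaps.append("F2: missing metadata: " + ", ".join(missing))

    # A1: retrievable via a standardized protocol (here, an HTTP(S) URL).
    if urlparse(str(record.get("access_url", ""))).scheme not in ("http", "https"):
        gaps.append("A1: no HTTP(S) access URL")

    # I1: a shared, formal representation (crude proxy: a declared media type).
    if "/" not in str(record.get("media_type", "")):
        gaps.append("I1: no declared media type")

    # R1.1: a clear, accessible data usage license.
    if not record.get("license"):
        gaps.append("R1.1: no usage license")

    return gaps


example = {
    "identifier": "10.0000/example-dataset",  # hypothetical DOI
    "title": "Example crash outcomes",
    "description": "Toy record used only to exercise the checks.",
    "keywords": ["crash", "emergency care"],
    "access_url": "https://data.example.org/crash-outcomes.csv",  # hypothetical URL
    "media_type": "text/csv",
    "license": "CC-BY-4.0",
}

print(fair_gaps(example))  # []
print(fair_gaps({}))       # one gap per check
```

Checks like these are cheap to run at deposit time, which is one way a data resource can make FAIR a default rather than an afterthought.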
25. A FAIR Poster Child
Researcher and Assistant Professor of Medicine Dr. Thomas Hartka, also a current online Master's in Data Science student, is combining two disparate data sets, electronic health records and DMV crash data, to save lives after motor vehicle crashes.
“I enrolled in the MSDS program to expand my research on automotive safety. I have already used techniques from classes in my work. I hope to expand my research to real-time analytics to improve emergency room care.”
(Dr. Thomas Hartka, UVA School of Medicine)
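Combining two data sets such as hospital records and crash reports usually comes down to record linkage on fields both sources share. The slide does not say how this linkage is done, so the following is only a minimal deterministic sketch: the field names, the toy records, and the assumption that a patient arrives on the day of the crash are all hypothetical. Real health-data linkage also involves privacy governance and often probabilistic matching.

```python
from collections import defaultdict

# Toy records with hypothetical field names; real EHR and DMV schemas differ.
ehr_visits = [
    {"patient_id": "P1", "arrival_date": "2023-03-01", "county": "County A", "age": 34, "sex": "F"},
    {"patient_id": "P2", "arrival_date": "2023-03-02", "county": "County B", "age": 51, "sex": "M"},
    {"patient_id": "P3", "arrival_date": "2023-03-05", "county": "County A", "age": 22, "sex": "M"},
]
crash_reports = [
    {"crash_id": "C9", "crash_date": "2023-03-01", "county": "County A", "driver_age": 34, "driver_sex": "F"},
    {"crash_id": "C3", "crash_date": "2023-03-05", "county": "County A", "driver_age": 22, "driver_sex": "M"},
    {"crash_id": "C4", "crash_date": "2023-03-05", "county": "County A", "driver_age": 22, "driver_sex": "M"},
]


def link_records(visits, crashes):
    """Deterministically link visits to crashes on shared quasi-identifiers."""
    # Index crash reports by the fields both sources are assumed to share.
    index = defaultdict(list)
    for crash in crashes:
        key = (crash["crash_date"], crash["county"], crash["driver_age"], crash["driver_sex"])
        index[key].append(crash["crash_id"])

    links = []
    for visit in visits:
        key = (visit["arrival_date"], visit["county"], visit["age"], visit["sex"])
        candidates = index.get(key, [])
        # Keep only unambiguous matches; zero or multiple candidates are skipped.
        if len(candidates) == 1:
            links.append((visit["patient_id"], candidates[0]))
    return links


print(link_records(ehr_visits, crash_reports))  # [('P1', 'C9')]
```

P2 has no matching crash and P3 matches two reports, so only P1 is linked. Ambiguity like P3's is exactly where shared identifiers and interoperable metadata, the "I" in FAIR, pay off.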
26. Conversation Cards
• Is the disruption as profound as I indicate? If so,
  • What is in it for you?
  • How much will it cost?
• If you sustain FAIR data,
  • What is in it for you?
  • How much will it cost?
Editor's Notes
I will introduce the concept of data science with a story that illustrates citizen engagement, the merging of unexpected data, and societal benefit.