Open Science meets Data Science:
Some challenges to be faced
Claudia Bauzer Medeiros
Institute of Computing – Unicamp
Open Science – G7 Priority
1. Human Capital Formation – research and
innovation
2. Financing – inclusive science, research and
innovation
3. Global Research Infrastructures
4. Open Science
RDA - https://www.rd-alliance.org
Open Science
• Access to scientific research, open to all
• How ??????
• Why ????
• What ???????
COLLABORATION THROUGH DATA –
OPEN SCIENCE = OPEN DATA
Open Science – slide
adapted from Gray
Answers
Questions
Data-driven science
Models
Simulations
Papers
Files
Experiments
Instruments
XXXXX
National Academies of Sciences,
Engineering, Medicine
July 2018
Open science =
Open access = papers
Open data
Open methods = open source
What is Open Data?
• “What is OPEN DIGITAL DATA”
– Share “everything”? Not necessarily
• Everyone can
– Discover if data exist
– Discover how to obtain them
Under constraints – security, confidentiality,
ethics, intellectual property
OPEN SCIENCE – OPEN METADATA
HOW??? Datacite.org
(Find, share, cite, connect)
Open science requires FAIR Data
• Findable
• Accessible
• Interoperable
• Reusable
• ??? Have you fairicized your data???
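One way to start answering the question above is a simple metadata checklist. The sketch below is only illustrative: the field names and their mapping to the four FAIR principles are assumptions made for this example, not an official FAIR or DataCite schema, and passing the check does not by itself make a dataset FAIR.

```python
# Illustrative mapping from each FAIR principle to metadata fields.
# These field names are assumptions for this sketch, not a standard schema.
REQUIRED_BY_PRINCIPLE = {
    "findable": ["identifier", "title"],
    "accessible": ["access_url"],
    "interoperable": ["format"],
    "reusable": ["license", "provenance"],
}


def fair_gaps(record):
    """Return, per principle, the listed fields that are missing or empty."""
    gaps = {}
    for principle, fields in REQUIRED_BY_PRINCIPLE.items():
        missing = [field for field in fields if not record.get(field)]
        if missing:
            gaps[principle] = missing
    return gaps


# Hypothetical metadata record for a dataset (all values are made up).
record = {
    "identifier": "doi:10.0000/example",
    "title": "Example dataset",
    "access_url": "https://example.org/data.csv",
    "format": "text/csv",
    "license": "",  # empty, so it is reported as a gap
}

print(fair_gaps(record))  # {'reusable': ['license', 'provenance']}
```

Note that the check looks only at metadata, which matches the idea that metadata can be open even when the data themselves are under access constraints.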
Open Science – Basic concepts
Curation
Preservation
Data (which?)
Processes: Workflows Reproducibility
Cyberinfrastructure
Reusability
Provenance
WHAT ABOUT DATA SCIENCE?
The “sexiest job of the 21st century”
@Altigran Silva, Brasnam’18 keynote
Data Science (CACM)
• Processes and systems to extract knowledge or
insight from data in various forms and translate it
into action.
• Interdisciplinary field that integrates approaches
from statistics, data mining, predictive analytics
• Incorporates advances in scalable computing and
data management.
Berman et al. CACM 61(4), April 2018
@Altigran Silva, Brasnam’18 keynote
Data Science: Reality (FORBES 2016)
• Data scientists spend 80% of their time on data
pre-processing, cleansing, etc.
https://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says
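To make "pre-processing, cleansing, etc." concrete, here is a minimal sketch of three typical cleansing steps: normalizing text, dropping rows with missing values, and removing duplicates. The function name, the sample rows, and the choice of steps are assumptions for illustration; real pipelines are usually far more involved.

```python
def clean_rows(rows):
    """Normalize cells, drop rows with missing values, remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        # Normalize: strip surrounding whitespace and lowercase every cell.
        normalized = tuple(cell.strip().lower() for cell in row)
        # Drop rows that contain an empty (missing) value.
        if any(cell == "" for cell in normalized):
            continue
        # Drop rows already seen after normalization.
        if normalized in seen:
            continue
        seen.add(normalized)
        cleaned.append(normalized)
    return cleaned


# Hypothetical raw measurements: (site name, reading as text).
raw = [
    (" Site A ", "12.5"),
    ("site a", "12.5"),   # duplicate of the first row after normalization
    ("Site B", ""),       # missing reading
    ("Site C", "7.0"),
]

print(clean_rows(raw))  # [('site a', '12.5'), ('site c', '7.0')]
```

Even this toy version shows why well-curated, well-documented data could reduce the time spent on preparation.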
Open Science meets Data Science
Open science
• Fairicized data
• Concerns with
– Privacy
– Accessibility
– Quality
– Provenance
– Reproducibility
WHERE and HOW will the data be used?
Data Science
• Mine and correlate data
• Concerns with
– Pattern extraction
– Algorithmic efficiency
– Production of knowledge
– Ask interesting questions
from data
Big data and VVVVVVVVVV…
WHAT ABOUT
VISUALIZATION????
Challenges
• Fairicization
• Curation
• Visualization!!!!!!!!!!!!!!
• For xxx science to work, interpretation is
needed (who are the “appropriate” experts?)
www.fapesp.br/gestaodedados
Thank you!!!!
cmbm@ic.unicamp.br
