This document discusses biomedical data science and the opportunities and challenges presented by new developments in data science. Some key points:
- We are at a tipping point where biomedical research is no longer the sole leader in data science due to advances in many other fields. Biomedical researchers need to become data scientists to stay relevant.
- Data science is being driven by the massive growth of digital data and requires an interdisciplinary approach. It is touching every field and attracting many students.
- Developing effective data systems and infrastructure is a major challenge to enable open sharing and analysis of data. Initiatives are underway but more collaboration is needed across sectors.
- Advances in machine learning, like Alpha
1. Biomedical Data Science:
We Are Not Alone
Philip E. Bourne PhD
peb6a@virginia.edu
https://www.slideshare.net/pebourne
July 26, 2023 ISMB Lyon France
3. Disclaimer
I am privileged to be helping build a new kind of school within a traditional institution. I have drunk my own Kool-Aid.
5. The Human Genome was the Tipping Point and Led the Way
http://www.ornl.gov/hgmis
• High-throughput digital DNA data changed how we think about biomedicine
• Spawned a new field – bioinformatics / computational biology / systems biology / biomedical data science
• Spawned a multibillion-dollar industry
Is Bioinformatics Dead? PLOS Biology 2021
6. Bourne’s Timeline
(Apologies for the US Centricity)
Decades: 1980s, 1990s, 2000s, 2010s, 2020s
The Discipline (Whatever it is Called): Unknown, Expt. Driven, Emergent, Over-sold, A Service, A Partner, The Driver
Digital Data and the 4 Pillars of Data Science:
Systems: HPC, Cloud, GPUs
Analytics: HMMs, SVMs, NNs, CNNs, LLMs
Value: HIPAA, Privacy, Security, HITECH
Design: Mol Graphics, Web 2.0, Dashboards
8. Basic Premise …
“We need to be more aware than ever of developments that may be far outside our discipline that fall under the broad topic of data science. In short, we need to become biomedical data scientists.”
Stated another way, the leadership role in data/informatics afforded by the Human Genome Project no longer applies.
9. Data Science –
In 45+ Years in Academia I Have Never Seen Anything Like It
• It is a response to the digital transformation of society
• It is touching every discipline (aka vertical)
• We can’t keep the students out of our classes
• Cause – large amounts of digital data
• Effect – interdisciplinarity, openness, translation, search for responsibility and more
In summary, it is disruptive to current modes of biomedical research
10. Data Science
As a Driver It's Just the Beginning….
https://zenodo.org/record/6497693
45 Members
Data scientist jobs are predicted to experience 36 percent growth between 2021 and 2031, according to the US Bureau of Labor Statistics.
The global data science platform market size was valued at USD 64.14 billion in 2021 and is projected to grow from USD 81.47 billion in 2022 to USD 484.17 billion by 2029, exhibiting a CAGR of 29.0% during the forecast period.
Data science is the fastest emerging field around the globe.
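The quoted growth rate can be sanity-checked from the two market figures on this slide. The sketch below is only an illustration, not part of the original deck: the helper name `cagr` is hypothetical, and it assumes seven annual compounding periods between 2022 and 2029.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over `years` periods."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted on the slide, in USD billions.
market_2022 = 81.47
market_2029 = 484.17
periods = 2029 - 2022  # assumption: seven annual compounding steps

rate = cagr(market_2022, market_2029, periods)
print(f"Implied CAGR 2022-2029: {rate:.1%}")
```

Under these assumptions the implied rate agrees with the quoted 29.0% to rounding, so the two market figures and the CAGR are mutually consistent.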
11. Given these precedents about data and data science, we should start with a definition/framework
12. Big data and data science are like the Internet…
If I asked you to define them you would all say something different, yet you use them every day…
http://vadlo.com/cartoons.php?id=357
13. One Definition of Data Science –
The 4+1 Model (aka domains)
• Value – assuring societal benefit
• Design – communication of the value of data
• Systems – the means to communicate and convey benefit
• Analytics – models and methods
• Practice – where everything happens
[Raf Alvarado & Phil Bourne https://doi.org/10.1142/9789811265679_0004]
14. The Data Science Interplay
• Value + Design = openness, responsibility
• Value + Analytics = human-centered AI, algorithmic bias
• Value + Systems = sustainability, access, environmental impact
• Design + Analytics = literate programming, visualization
• Design + Systems = dashboards, engineering design
• Analytics + Systems = ML engineering
Thinking of data as a science unto itself is novel and controversial
[Raf Alvarado & Phil Bourne https://doi.org/10.1142/9789811265679_0004]
16. The 4+1 Model - Systems
• Value – assuring societal benefit
• Design – communication of the value of data
• Systems – the means to communicate and convey benefit
• Analytics – models and methods
• Practice – where everything happens
[Raf Alvarado & Phil Bourne https://doi.org/10.1142/9789811265679_0004]
18. Systems….
• Need something akin to the electricity grid or banking system
• Need to consider data and methods as first-class data objects
• Examples: European Open Science Cloud (EOSC), the CS3MESH4EOSC Science Mesh, the China Science and Technology (CST) Cloud, the African Open Science Platform, the South African National Integrated Cyber Infrastructure System, the Malaysia Open Science Platform, the Global Open Science Cloud (GOSC), the Australian Research Data Commons (ARDC) Nectar Research Cloud, the Digital Research Alliance of Canada (formerly known as the New Digital Research Infrastructure Organization), and the Arab States Research and Education Network.
• Problems span funding agencies; solutions do not
• There is a lack of public-private partnership
20. AlphaGo – Take Home Messages
https://www.alphagomovie.com/
1. Even the programmers were
disquieted by creating
something better than any
human
2. AlphaGo made a move that no
human Go expert or
programmer anticipated
3. It takes a lot of resources to
defeat the world champion
Go has more possible board positions than there are atoms in the observable universe
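The comparison can be checked with a rough count. The sketch below is a minimal illustration under two assumptions that are not from the slides: a standard 19×19 board, and the commonly quoted order-of-magnitude estimate of 10^80 atoms in the observable universe. Each of the 361 points is empty, black, or white, so 3^361 is an upper bound on board configurations (it includes illegal ones; the count of legal positions is smaller, on the order of 10^170).

```python
# Rough upper bound on Go board configurations versus atoms in the universe.
# Assumptions (not from the slides): 19x19 board, ~10**80 atoms as an
# order-of-magnitude estimate for the observable universe.

BOARD_POINTS = 19 * 19      # 361 intersections
ATOMS_ESTIMATE = 10 ** 80   # order-of-magnitude estimate only


def board_configurations(points: int = BOARD_POINTS) -> int:
    """Each point is empty, black, or white; this count includes illegal positions."""
    return 3 ** points


def order_of_magnitude(n: int) -> int:
    """Return floor(log10(n)) for a positive integer, using its decimal length."""
    return len(str(n)) - 1


if __name__ == "__main__":
    configs = board_configurations()
    print(f"board configurations ~ 10^{order_of_magnitude(configs)}")
    print(f"atoms (estimate)     ~ 10^{order_of_magnitude(ATOMS_ESTIMATE)}")
    print("more configurations than atoms:", configs > ATOMS_ESTIMATE)
```

Python integers are arbitrary precision, so the exact value of 3^361 is computed without overflow or floating-point rounding.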
25. AlphaFold2
Numerical optimization – differentiable programming
Overall gradient descent trained to win CASP
Jumper et al., 2021. Nature, 596(7873),
pp. 583-589
Transformer models using attention
Geometry invariant to
translation/rotation
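To illustrate what invariance to translation and rotation means, here is a minimal sketch. It is not AlphaFold2's architecture; it only shows the underlying idea that pairwise distances between points do not change when the whole structure is rotated and shifted, so any feature built solely from such distances is invariant by construction. The coordinates, angle, shift, and function names are illustrative assumptions.

```python
import math
from itertools import combinations

Point = tuple[float, float, float]


def rotate_z(p: Point, angle: float) -> Point:
    """Rotate a point about the z-axis by `angle` radians."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def translate(p: Point, shift: Point) -> Point:
    """Shift a point by a fixed offset."""
    return (p[0] + shift[0], p[1] + shift[1], p[2] + shift[2])


def rigid_transform(points: list[Point], angle: float, shift: Point) -> list[Point]:
    """Apply the same rotation and translation to every point."""
    return [translate(rotate_z(p, angle), shift) for p in points]


def pairwise_distances(points: list[Point]) -> list[float]:
    """Distances between all unordered pairs, in a fixed order."""
    return [math.dist(a, b) for a, b in combinations(points, 2)]


if __name__ == "__main__":
    # Hypothetical coordinates standing in for a few atoms.
    coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.3), (0.2, 2.0, 1.0)]
    moved = rigid_transform(coords, angle=0.7, shift=(3.0, -1.0, 2.5))

    before = pairwise_distances(coords)
    after = pairwise_distances(moved)
    unchanged = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(before, after))
    print("coordinates changed:", moved != coords)
    print("pairwise distances unchanged:", unchanged)
```

Raw coordinates depend on an arbitrary choice of frame, while the distance list does not; this is why a representation built on relative geometry does not need to relearn the same structure in every orientation.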
26. Logistics Behind the Win
● Nothing fundamentally new from an AI perspective
● Data Integration
● Collaboration not competition
● Engineering challenge beyond most labs
● Compute power beyond most labs
● Team size beyond most labs
● Worked with protein structure specialists
27. Downstream Implications
• Cooperation rather than competition
• Public-private partnership
• Translational possibilities are endless
• Made possible by curated open data
• Appreciate engineering
30. AI Analytics Across the Scientific Discovery
Process
From Yolanda Gil (2023), in AI for Science, eds. Choudhary, Fox & Hey, p. 699
31. The 4+1 Model - Design
35. Openness/FAIR
Data science would not exist without open
data and methods. It would be wrong for us to take
and not give back.
https://sparcopen.org/
https://datascience.virginia.edu/policies
36. Questions I Leave You With ….
• Are we indeed at a change point?
• Will biomedicine continue to lead data science?
• Do we need new models for doing science?
• Are we placing the right emphasis on our research
products, notably data and methods vs. papers?
38. Databases organize data around a project.
Data warehouses organize the data for an organization.
Data commons organize the data for a scientific discipline or field.
Data
Warehouse
Data Ecosystems
How we think about our
infrastructure is important
39. Challenges: fixed level of funding
Opportunities: data commons
Data commons co-locate data with cloud computing infrastructure and commonly
used software services, tools & apps for managing, analyzing and sharing data
to create an interoperable resource for the research community.*
*Robert L. Grossman, Allison Heath, Mark Murphy, Maria Patterson and Walt Wells, A Case for Data Commons: Toward Data Science as a Service, Computing in Science & Engineering, 2016.
Source of image: The CDIS, GDC, & OCC data commons infrastructure at a University of Chicago data center.
Bonazzi VR, Bourne PE (2017) Should biomedical research be like Airbnb? PLoS Biol 15(4): e2001818.
Systems
[Adapted from Bob Grossman]
42. A Data Integration Poster Child
Researcher and Assistant Professor of
Medicine Dr. Thomas Hartka, also a
current online Master's in Data Science
student, is combining two disparate
data sets—electronic health records
and DMV crash data—to save lives
after motor vehicle crashes.
“I enrolled in the MSDS program to
expand my research on automotive
safety. I have already used
techniques from classes in my work.
I hope to expand my research to
real-time analytics to improve
emergency room care.”
— Dr. Thomas Hartka, UVA School
of Medicine
43. Coming back to the question…
So we have a definition of data science and we
have a set of guiding principles, where does this
take us?
Stated another way, what do we want to be
recognized for in 10 years?
https://pebourne.wordpress.com/
44. Research ethics
committees (RECs) review
the ethical acceptability
of research involving
human participants.
Historically, the principal
emphases of RECs have
been to protect
participants from physical
harms and to provide
assurance as to
participants’ interests and
welfare.*
[The Framework] is
guided by Article 27
of the 1948 Universal
Declaration of Human
Rights. Article 27
guarantees the rights
of every individual in
the world "to share in
scientific
advancement and its
benefits" (including to
freely engage in
responsible scientific
inquiry)…*
Protect human
subject data
The right of human
subjects to benefit
from research.
*GA4GH Framework for Responsible Sharing of Genomic and Health-Related Data, see goo.gl/CTavQR
Data sharing with protections provides the evidence
so patients can benefit from advances in research.
Balance protecting human subject data
with open research that benefits
patients
[Adapted from Bob Grossman]
Value
45. Why Responsible Data Science?
• A defining feature
• A partnership between STEM, social
sciences and the humanities
• Where UVA has strength
47. Gohlke et al. 2022
https://onlinelibrary.wiley.com/doi/10.1002/ctm2.726
Real World Evidence for Preventive Effects of Statins on
Cancer Incidence: A Transatlantic Analysis
EHR
Animal Models
Pathways
48. Daily Challenges
• Deciding what not to do
• Competition for the best team members (faculty and staff)
• Establishing a diverse team
• Lack of a comprehensive enterprise-wide data infrastructure
• It's easier to conform
Editor's Notes
I will introduce the concept of data science with a story that illustrates citizen engagement, the merging of unexpected data, and societal benefit.