Big Data and AI in Fighting Against Covid-19
--
Andrew Zhang
zhangan@amazon.com
7/8/2020
Agenda
1. Introduction
2. Supercomputers for Scientific Research
3. Covid-19 Open Data Lake
4. NLP and BERT to answer scientific questions
Speaker: Andrew Zhang
Andrew is a Senior Technical Account Manager at AWS whose specialties are big data, machine learning, and HPC. His interest is scaling machine learning in hybrid multi-cloud enterprise environments. Before joining Amazon, Andrew was a data science engineer with IBM; earlier, he was an enterprise architect with Novartis Pharmaceuticals.
Motivation
Supercomputers for Scientific Research
Covid-19 High Performance Computing Consortium
• Brings together leaders to provide access to the world’s most powerful high-performance computing resources.
• Supports extensive research in bioinformatics, epidemiology, and molecular modeling to understand the disease and develop treatment strategies.
https://mit-satori.github.io/
Covid-19 Active Research Projects
“We have identified two target proteins that generate novel molecules to inhibit
the relevant proteins. The compute capacity will enable us to run and optimize our
neural networks to generate better molecules and estimate their binding affinity to
the target proteins, drug-likeness and ADMET properties. Our work will evolve to
use 3D SMILES (currently at 2D) and other improvement.”
“We are working with a team who have developed a device to allow safe ventilator
splitting between 2 or more patients. We made the software to guide device
selection based on the patient's respiratory states, but we want the app to allow for
just lookup into pre-computed values from the …”
“We have designed a mobile app and a technological platform, compliant to the
European legislation, which enables unidentified contact/exposure information of
users to be efficiently collected in a fully anonymous way.”
https://covid19-hpc-consortium.org/
Covid-19 Open Data Lake
Covid-19 Tracking and Prediction
• COVID-19 confirmed cases and deaths: a visual representation of the number of confirmed cases (counties) and deaths (circles). Data source: the 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE.
• Genomic epidemiological tracking: genomic epidemiology of the novel coronavirus, providing real-time tracking of pathogen evolution, including transmissions and phylogeny.
• Hospital resource utilization modeling. Data source: COVID-19 projections from the University of Washington’s Institute for Health Metrics and Evaluation (IHME).
Source: Databricks
Covid-19 Research and Diagnosis
Answer key questions from scientific literature:
• What is known about transmission, incubation, and environmental stability?
• What do we know about COVID-19 risk factors?
• What do we know about virus genetics, origin, and evolution?
• What has been published about medical care?
Read COVID-19 X-ray or CT images:
While PCR tests offer many advantages, they are physical items that require shipping either the test or the sample. X-ray machines, by contrast, can screen patients anywhere there is electricity to plug them into. AI tools can help general practitioners triage and treat patients, and companies are developing such tools and deploying them at hospitals (Wired, 2020).
Sources: IEEE, Kaggle
Open Data Lake: Query and Visualization (Amazon)
• Global Coronavirus (COVID-19) Data – Tracks confirmed COVID-19 cases in
provinces, states, and countries across the world with a breakdown to the
county level in the US.
• Coronavirus (COVID-19) Data in the United States – Tracks confirmed
cases and deaths in the US by state and county.
• Coronavirus Disease (COVID-19) Testing Data – Tracks the number of
people tested, pending tests, and positive and negative tests for COVID-19.
• USA Hospital Beds – COVID-19 – Data on hospital beds and their
utilization in the US.
• COVID-19 Open Research Dataset (CORD-19) – A collection of over 45,000
research articles (over 33,000 with full text) about COVID-19, SARS-CoV-2,
and related coronaviruses. AWS has preprocessed and enriched these
with annotations extracted from Amazon Comprehend Medical.
• Amazon S3 Explorer: https://dj2taa9i652rf.cloudfront.net/
• AWS Glue, simple and cost-effective ETL: https://aws.amazon.com/glue/
• Amazon Athena, a serverless SQL query engine: https://aws.amazon.com/athena/
• Amazon QuickSight: https://aws.amazon.com/quicksight/
Source: A public data lake for analysis of COVID-19 data | AWS Big ... (QuickSight dashboard)
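Athena is the SQL engine listed for querying the data lake. The following is a minimal sketch, assuming boto3 is installed and AWS credentials are configured, of building a query and running it through Athena's `start_query_execution`, `get_query_execution`, and `get_query_results` calls. The database name, table name, and columns (`date`, `county`, `state`, `cases`, `deaths`) are hypothetical placeholders, not the data lake's actual schema; look up the real names in the AWS Glue Data Catalog.

```python
import time

# Hypothetical placeholders: look up the real database and table names in the
# AWS Glue Data Catalog after adding the COVID-19 data lake to your account.
DATABASE = "covid-19"
TABLE = "us_county_cases"


def build_state_timeseries_query(table: str, state: str) -> str:
    """Build SQL for one state's daily county rows (column names are assumed)."""
    # Double any single quotes so the state name is a safe SQL string literal.
    safe_state = state.replace("'", "''")
    return (
        f'SELECT date, county, cases, deaths FROM "{table}" '
        f"WHERE state = '{safe_state}' ORDER BY date"
    )


def run_athena_query(sql: str, database: str, output_s3: str,
                     poll_seconds: float = 1.0) -> list:
    """Submit a query to Amazon Athena, wait for it, and return the raw rows."""
    import boto3  # imported here so the query builder works without AWS installed

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3},
    )["QueryExecutionId"]

    # Poll until Athena reports a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {query_id} finished in state {state}")
    # The first row holds column headers; results beyond 1,000 rows need paging.
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```

Usage would look like `run_athena_query(build_state_timeseries_query(TABLE, "New York"), DATABASE, "s3://your-bucket/athena-results/")`, where the S3 output location is a placeholder for a bucket you own. Keeping the SQL builder separate from the AWS call lets you inspect the query text before paying for a scan.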
Open Data Lake: Query and Visualization (Google)
• COVID-19 data from Johns Hopkins Center for Systems Science and Engineering
• OpenStreetMap Public Dataset: world map including healthcare provider locations
• Global Health Dataset from The World Bank: global health and population trends
• New York Times COVID-19 database: The New York Times' COVID-19 database based on US health agency reports
• ECDC COVID-19 Cases by Country: COVID-19 cases by country as reported by the European Centre for Disease Prevention and Control
• USAFacts COVID-19 Cases by US County: COVID-19 cases by county aggregated by USAFacts from US health agencies
Source: Google BigQuery
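For the BigQuery side, a minimal sketch, assuming the `google-cloud-bigquery` client library and Google Cloud credentials, is to run standard SQL against a public table with the country bound as a named query parameter rather than pasted into the string. The table path `bigquery-public-data.covid19_jhu_csse.summary` and its columns (`date`, `country_region`, `confirmed`, `deaths`) are assumptions here; confirm them in the BigQuery console before relying on the query.

```python
from typing import Dict, List

# Assumed table and column names for the public JHU CSSE dataset;
# confirm them in the BigQuery console before running.
JHU_TABLE = "bigquery-public-data.covid19_jhu_csse.summary"


def build_country_query(table: str = JHU_TABLE, limit: int = 30) -> str:
    """Daily country totals; @country is bound as a query parameter, not pasted in."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    # Rows are per province per day, so summing by date gives country-wide totals.
    return (
        "SELECT date, SUM(confirmed) AS confirmed, SUM(deaths) AS deaths "
        f"FROM `{table}` WHERE country_region = @country "
        f"GROUP BY date ORDER BY date DESC LIMIT {int(limit)}"
    )


def run_country_query(country: str, limit: int = 30) -> List[Dict]:
    """Run the query with the google-cloud-bigquery client (needs GCP credentials)."""
    from google.cloud import bigquery  # imported lazily; optional dependency

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[bigquery.ScalarQueryParameter("country", "STRING", country)]
    )
    rows = client.query(build_country_query(limit=limit), job_config=job_config).result()
    return [dict(row.items()) for row in rows]
```

Calling `run_country_query("Italy")` would return up to 30 of the most recent daily totals as dictionaries, assuming the table and columns above exist as described.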
NLP and BERT to answer scientific questions
NLP and BERT
• BERT, as a contextual model, captures the relationships between words in both directions.
• In “I made a bank deposit”, a unidirectional representation of “bank” is based only on “I made a”, not on “deposit”; a bidirectional model also uses “deposit”.
• Because BERT is pre-trained on massive datasets, anyone building a natural language processing application can use this free, powerful model.
• BERT set new state-of-the-art results on multiple benchmarks with minimal task-specific fine-tuning.
• It can also be fine-tuned on corporate data to create different applications.
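The “bank deposit” example can be made concrete with a toy function that lists which words a model may condition on when building a representation of one word. This is an illustration only, not BERT itself: no model is involved, and the target word is excluded from its own context, loosely mirroring BERT's masked-word pre-training.

```python
from typing import List


def visible_context(tokens: List[str], target: int, bidirectional: bool) -> List[str]:
    """Words a model may condition on when representing tokens[target].

    Toy illustration only: the target word itself is excluded, as in BERT's
    masked-word pre-training, and no real model is involved.
    """
    left = tokens[:target]
    if not bidirectional:
        return left  # a left-to-right model sees only the preceding words
    return left + tokens[target + 1:]  # a bidirectional model also sees what follows


sentence = "I made a bank deposit".split()
bank = sentence.index("bank")
print(visible_context(sentence, bank, bidirectional=False))  # ['I', 'made', 'a']
print(visible_context(sentence, bank, bidirectional=True))   # ['I', 'made', 'a', 'deposit']
```

Only the bidirectional view includes “deposit”, the word that disambiguates “bank” as a financial institution rather than a riverbank.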
COVID-19 Open Research Dataset Challenge
https://www.whitehouse.gov/briefings-statements/call-action-tech-community-new-machine-readable-covid-19-dataset
Every scientist working on a cure or vaccine
must understand this prior research.
158,000 coronavirus scholarly articles, including 75,000 with full text.
• What is known about transmission, incubation, and environmental stability?
• What do we know about COVID-19 risk factors?
• What do we know about vaccines and therapeutics?
• What do we know about virus genetics, origin, and evolution?
• What has been published about medical care?
• What has been published about ethical and social science considerations?
• What do we know about non-pharmaceutical interventions?
• What do we know about diagnostics and surveillance?
Explore Covid-19 Scientific Literature (1)
• Generate summaries from abstracts by training a summarizer model
• Generate a word cloud from all the titles
Source: Databricks
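A word cloud is essentially a rendering of word frequencies, so the core step is counting title words. The following minimal sketch assumes a small hand-picked stopword list (real pipelines use larger ones) and the third-party `wordcloud` package for rendering; it is not the Databricks notebook's code.

```python
import re
from collections import Counter
from typing import Iterable

# A small hand-picked stopword list (an assumption; real pipelines use larger lists).
STOPWORDS = {"a", "an", "and", "are", "as", "at", "by", "for", "from", "in",
             "is", "of", "on", "the", "to", "with"}


def title_word_frequencies(titles: Iterable[str], stopwords=STOPWORDS) -> Counter:
    """Count lowercase words across titles, keeping hyphenated terms like sars-cov-2."""
    counts: Counter = Counter()
    for title in titles:
        for word in re.findall(r"[a-z0-9][a-z0-9-]*", title.lower()):
            if len(word) > 1 and word not in stopwords:
                counts[word] += 1
    return counts


def save_word_cloud(freqs: Counter, path: str = "titles.png") -> None:
    """Render the frequencies with the wordcloud package (pip install wordcloud)."""
    from wordcloud import WordCloud  # imported lazily; optional dependency

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(freqs)
    cloud.to_file(path)
```

Passing explicit frequencies, rather than raw text, keeps the tokenization choices (such as treating “sars-cov-2” as one term) under your control.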
Explore Covid-19 Scientific Literature (2)
Answering all tasks/challenges using NLP:
We use different libraries to get answers from these papers.
• BERT QA model (fine-tuned on the SQuAD dataset)
• BERT summary model
• Python Google Translate package
• HTML to visualize the results
Overall flow:
• Use the QA model to read every paper's abstract and find answers for all tasks.
• Concatenate the 50 most confident answers into an article, and use the summary model to write a summary of the answers.
• Translate the summaries into multiple languages with Google Translate.
• Write HTML to show the summary of all papers' answers for all tasks.
Source: Kaggle
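The first two steps of this flow (ask every abstract, keep the most confident answers, join them into one article) can be sketched as below. The QA model is passed in as a function so the selection logic stands alone. The Hugging Face model name is an assumption, and the summarizer swaps in the `transformers` default summarization pipeline rather than the BERT summary model the notebook used.

```python
from typing import Callable, Iterable, Tuple

# A QA function maps (question, context) to (answer_text, confidence score).
QAFunction = Callable[[str, str], Tuple[str, float]]


def top_answers_article(question: str, abstracts: Iterable[str],
                        qa: QAFunction, top_k: int = 50) -> str:
    """Ask the question of every abstract and join the top_k most confident answers."""
    scored = []
    for abstract in abstracts:
        if not abstract:  # skip records with no abstract
            continue
        answer, score = qa(question, abstract)
        if answer:
            scored.append((score, answer))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return " ".join(answer for _, answer in scored[:top_k])


def make_hf_qa(model: str = "bert-large-uncased-whole-word-masking-finetuned-squad") -> QAFunction:
    """Wrap a Hugging Face question-answering pipeline (model choice is an assumption)."""
    from transformers import pipeline  # imported lazily; downloads the model on first use

    qa_pipeline = pipeline("question-answering", model=model)

    def qa(question: str, context: str) -> Tuple[str, float]:
        result = qa_pipeline(question=question, context=context)
        return result["answer"], float(result["score"])

    return qa


def summarize(article: str) -> str:
    """Summarize the joined answers with the default summarization pipeline (a swap-in)."""
    from transformers import pipeline

    summarizer = pipeline("summarization")
    return summarizer(article, max_length=150, min_length=30, truncation=True)[0]["summary_text"]
```

Under these assumptions, `summarize(top_answers_article("What do we know about COVID-19 risk factors?", abstracts, make_hf_qa()))` would produce one summary per task question; translation and HTML output would follow as separate steps.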
Explore Covid-19 Scientific Literature (3)
Google
1. When the user asks an initial
question, the tool not only returns a
set of papers (like in a traditional
search) but also highlights snippets
from the paper that are possible
answers to the question.
2. The user can review the snippets
and quickly make a decision on
whether or not that paper is worth
further reading.
3. If the user is satisfied with the initial
set of papers and snippets, we have
added functionality to pose follow-
up questions, which act as new
queries for the original set of
retrieved articles.
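The three-step interaction above (return papers, highlight a candidate snippet in each, then treat follow-up questions as new queries over only the retrieved set) can be illustrated with a toy keyword-overlap ranker. This sketch does not reproduce Google's model; the stopword list and the sentence-level scoring are assumptions chosen only to make the interaction pattern visible.

```python
import re
from typing import Dict, List, Tuple

# Toy stopword list so question words like "what" do not count as matches.
STOP = {"a", "about", "an", "and", "by", "do", "how", "in", "is", "know",
        "of", "on", "the", "to", "we", "what"}


def terms(text: str) -> set:
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP}


def sentences(text: str) -> List[str]:
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def search(question: str, papers: Dict[str, str], top_n: int = 3) -> List[Tuple[str, str]]:
    """Rank papers by their best-matching sentence; return (paper_id, snippet) pairs."""
    q = terms(question)
    hits = []
    for paper_id, text in papers.items():
        sents = sentences(text)
        if not sents:
            continue
        best = max(sents, key=lambda s: len(q & terms(s)))  # the highlighted snippet
        score = len(q & terms(best))
        if score > 0:
            hits.append((score, paper_id, best))
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [(paper_id, snippet) for _, paper_id, snippet in hits[:top_n]]


def follow_up(question: str, papers: Dict[str, str],
              retrieved: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Treat a follow-up question as a new query over the already retrieved papers only."""
    subset = {paper_id: papers[paper_id] for paper_id, _ in retrieved}
    return search(question, subset, top_n=len(subset))
```

Restricting `follow_up` to the retrieved subset is what lets a user drill into a working set of papers without the corpus-wide results shifting underneath them.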
Explore Covid-19 Scientific Literature (4)
Amazon
Deploy the AWS COVID-19 knowledge graph (CKG)
using AWS CloudFormation and Amazon
Neptune, and query the graph using
Jupyter notebooks hosted on Amazon
SageMaker in your AWS account.
The CKG aids in the exploration and
analysis of the COVID-19 Open Research
Dataset (CORD-19), hosted in the AWS
COVID-19 data lake.
The strength of the graph comes from
the connections between scholarly
articles, authors, scientific concepts, and
institutions. The CKG also helps power
the CORD-19 search page.
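To show why connections between articles, authors, concepts, and institutions are useful, here is a minimal in-memory sketch that ranks papers related to a given paper by how many authors or concepts they share. It is not the CKG schema and does not use Amazon Neptune's query interface; the node types and the ranking rule are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Node = Tuple[str, str]  # (node type, name), e.g. ("paper", "A") or ("concept", "ACE2")


class ToyKnowledgeGraph:
    """In-memory, undirected graph linking papers, authors, concepts, and institutions."""

    def __init__(self) -> None:
        self.adj: Dict[Node, Set[Node]] = defaultdict(set)

    def link(self, a: Node, b: Node) -> None:
        self.adj[a].add(b)
        self.adj[b].add(a)

    def neighbors(self, node: Node, node_type: Optional[str] = None) -> Set[Node]:
        return {n for n in self.adj.get(node, set())
                if node_type is None or n[0] == node_type}

    def related_papers(self, paper: Node, via=("author", "concept")) -> List[Node]:
        """Papers sharing an author or concept with `paper`, most shared links first."""
        shared: Dict[Node, int] = defaultdict(int)
        for hub in self.neighbors(paper):
            if hub[0] not in via:
                continue
            for other in self.neighbors(hub, "paper"):
                if other != paper:
                    shared[other] += 1
        # Sort by number of shared hubs (descending), then by paper name for stability.
        return sorted(shared, key=lambda p: (-shared[p], p[1]))
```

For example, if papers A and B share both an author and the concept “ACE2” while paper C shares only the concept, `related_papers(("paper", "A"))` ranks B ahead of C.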
Questions ?
Twitter: @a9zhang
Email the speaker: zhangan@amazon.com

 
AISF19 - On Blending Machine Learning with Microeconomics
AISF19 - On Blending Machine Learning with MicroeconomicsAISF19 - On Blending Machine Learning with Microeconomics
AISF19 - On Blending Machine Learning with Microeconomics
 
AISF19 - Travel in the AI-First World
AISF19 - Travel in the AI-First WorldAISF19 - Travel in the AI-First World
AISF19 - Travel in the AI-First World
 
AISF19 - Unleash Computer Vision at the Edge
AISF19 - Unleash Computer Vision at the EdgeAISF19 - Unleash Computer Vision at the Edge
AISF19 - Unleash Computer Vision at the Edge
 
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
 
Toronto meetup 20190917
Toronto meetup 20190917Toronto meetup 20190917
Toronto meetup 20190917
 
Feature Engineering for NLP
Feature Engineering for NLPFeature Engineering for NLP
Feature Engineering for NLP
 

Recently uploaded

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 

Recently uploaded (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 

Big Data and AI in Fighting Against COVID-19

  • 1. Big Data and AI In Fighting Against Covid-19 -- Andrew Zhang zhangan@amazon.com 7/8/2020
  • 2.
  • 3. Agenda: 1. Introduction 2. Supercomputers for Scientific Research 3. Covid-19 Open Data Lake 4. NLP and BERT to answer scientific questions
  • 4. Speaker: Andrew Zhang. Andrew is a Senior Technical Account Manager at AWS specializing in big data, machine learning, and HPC. Before joining Amazon, he was a data science engineer with IBM, and before that an enterprise architect with Novartis Pharmaceuticals. His interest is scaling machine learning in hybrid multi-cloud enterprise environments.
  • 7. Covid-19 High Performance Computing Consortium: brings together leaders to provide access to the world’s most powerful high-performance computing resources, supporting extensive research in bioinformatics, epidemiology, and molecular modeling to inform treatment and develop response strategies.
  • 8. Covid-19 High Performance Computing Consortium https://mit-satori.github.io/
  • 9. Covid-19 Active Research Projects (source: https://covid19-hpc-consortium.org/) • “We have identified two target proteins that generate novel molecules to inhibit the relevant proteins. The compute capacity will enable us to run and optimize our neural networks to generate better molecules and estimate their binding affinity to the target proteins, drug-likeness and ADMET properties. Our work will evolve to use 3D SMILES (currently at 2D) and other improvements.” • “We are working with a team who have developed a device to allow safe ventilator splitting between 2 or more patients. We made the software to guide device selection based on the patient's respiratory states, but we want the app to allow for just lookup into pre-computed values from the …” • We have designed a mobile app and a technological platform, compliant with European legislation, which enables unidentified contact/exposure information of users to be efficiently collected in a fully anonymous way.
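The first project mentions estimating the drug-likeness of generated molecules. One widely used, simple drug-likeness heuristic is Lipinski's rule of five: molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, and at most 10 hydrogen-bond acceptors, with poor oral absorption more likely when more than one criterion is violated. The sketch below assumes the descriptors were already computed by a cheminformatics toolkit; the class, function, and molecule names and all values are hypothetical, and this is not the consortium project's actual method.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (assumed to come from a cheminformatics toolkit)."""
    mol_weight: float  # molecular weight in daltons
    logp: float        # octanol-water partition coefficient
    h_donors: int      # hydrogen-bond donors
    h_acceptors: int   # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of Lipinski's rule-of-five criteria a molecule violates."""
    rules = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(rules)  # True counts as 1, False as 0


def is_drug_like(d: Descriptors, max_violations: int = 1) -> bool:
    """Treat a molecule as drug-like if it violates at most one criterion."""
    return lipinski_violations(d) <= max_violations


def filter_candidates(candidates: Dict[str, Descriptors]) -> List[str]:
    """Keep generated molecules that pass the filter, sorted by name for stable output."""
    return sorted(name for name, d in candidates.items() if is_drug_like(d))


if __name__ == "__main__":
    # Hypothetical generated molecules with made-up descriptor values.
    generated = {
        "mol_a": Descriptors(mol_weight=350.4, logp=2.1, h_donors=2, h_acceptors=5),
        "mol_b": Descriptors(mol_weight=720.9, logp=6.3, h_donors=4, h_acceptors=12),
    }
    print(filter_candidates(generated))  # ['mol_a']
```

A filter like this is cheap enough to run on every molecule a generative model proposes, leaving expensive binding-affinity and ADMET estimation for the survivors.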
  • 11. Covid-19 Tracking and Prediction (source: Databricks) • COVID-19 confirmed cases and deaths: a visual representation of the number of confirmed cases (counties) and deaths (circles). Data source: the 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE. • Genomic epidemiological tracking: genomic epidemiology of the novel coronavirus, providing real-time tracking of pathogen evolution (transmissions and phylogeny). • Hospital resource utilization modeling, based on the University of Washington’s Institute for Health Metrics and Evaluation (IHME) COVID-19 projections.
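The Johns Hopkins CSSE time series report cumulative counts per date, so a tracking dashboard usually derives daily new cases and smooths them before plotting. The sketch below shows one common way to do that with a trailing 7-day average; the function names are hypothetical, and clipping negative increments (caused by downward data corrections) to zero is an assumption, not the only reasonable convention.

```python
from typing import List, Optional


def daily_new(cumulative: List[int]) -> List[int]:
    """Convert cumulative counts into daily increments.

    The first value is kept as-is; negative increments from data
    corrections are clipped to zero (an assumed convention).
    """
    daily = []
    previous = 0
    for value in cumulative:
        daily.append(max(value - previous, 0))
        previous = value
    return daily


def trailing_average(values: List[int], window: int = 7) -> List[Optional[float]]:
    """Trailing moving average; None until a full window is available."""
    if window < 1:
        raise ValueError("window must be >= 1")
    result: List[Optional[float]] = []
    running = 0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        result.append(running / window if i >= window - 1 else None)
    return result


if __name__ == "__main__":
    print(daily_new([10, 12, 12, 20, 18]))  # [10, 2, 0, 8, 0]
```

Keeping a running sum makes the average O(n) regardless of window size.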
  • 12. Covid-19 Research and Diagnosis • Answer key questions from the scientific literature (source: Kaggle): What is known about transmission, incubation, and environmental stability? What do we know about COVID-19 risk factors? What do we know about virus genetics, origin, and evolution? What has been published about medical care? • Read COVID-19 X-ray or CT images (source: IEEE): While PCR tests offer many advantages, they are physical things that require shipping the test or the sample. X-ray machines can be plugged in to screen patients as long as they have electricity. AI tools can help general practitioners triage and treat patients, and companies are developing such tools and deploying them at hospitals (Wired, 2020).
  • 13. Open Data Lake: Query and Visualization (Amazon) • Global Coronavirus (COVID-19) Data: tracks confirmed COVID-19 cases in provinces, states, and countries across the world, with a breakdown to the county level in the US. • Coronavirus (COVID-19) Data in the United States: tracks confirmed cases and deaths in the US by state and county. • Coronavirus Disease (COVID-19) Testing Data: tracks the number of people tested, pending tests, and positive and negative tests for COVID-19. • USA Hospital Beds (COVID-19): data on hospital beds and their utilization in the US. • COVID-19 Open Research Dataset (CORD-19): a collection of over 45,000 research articles (over 33,000 with full text) about COVID-19, SARS-CoV-2, and related coronaviruses. AWS has preprocessed and enriched these with annotations extracted by Amazon Comprehend Medical. • Amazon S3 Explorer: https://dj2taa9i652rf.cloudfront.net/ • AWS Glue, simple, cost-effective ETL: https://aws.amazon.com/glue/ • Amazon Athena, a serverless SQL query engine: https://aws.amazon.com/athena/ • Amazon QuickSight: https://aws.amazon.com/quicksight/ • Source: A public data lake for analysis of COVID-19 data | AWS Big ... • QuickSight Dashboard
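Athena queries over the data lake are written in SQL. To illustrate the kind of aggregation a dashboard might run, without needing an AWS account, the sketch below runs an analogous query with Python's built-in sqlite3 module. The table name, column names, and rows are hypothetical toy values, not the data lake's actual schema or data, and SQL dialect details differ between SQLite and Athena.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical toy rows: (report_date, state, cumulative confirmed cases).
TOY_ROWS = [
    ("2020-04-01", "State A", 100),
    ("2020-04-01", "State B", 20),
    ("2020-04-02", "State A", 150),
    ("2020-04-02", "State B", 25),
]


def latest_confirmed_by_state(rows: List[Tuple[str, str, int]]) -> List[Tuple[str, int]]:
    """Return (state, confirmed) for the most recent report date, largest first."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE us_states_cases (report_date TEXT, state TEXT, confirmed INTEGER)"
        )
        conn.executemany("INSERT INTO us_states_cases VALUES (?, ?, ?)", rows)
        query = """
            SELECT state, confirmed
            FROM us_states_cases
            WHERE report_date = (SELECT MAX(report_date) FROM us_states_cases)
            ORDER BY confirmed DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for state, confirmed in latest_confirmed_by_state(TOY_ROWS):
        print(state, confirmed)
```

Storing dates as ISO `YYYY-MM-DD` strings keeps `MAX(report_date)` correct, because lexicographic order matches chronological order.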
  • 14. Open Data Lake: Query and Visualization (Google BigQuery) • COVID-19 data from the Johns Hopkins Center for Systems Science and Engineering • OpenStreetMap Public Dataset: world map including healthcare provider locations • Global Health Dataset from the World Bank: global health and population trends • New York Times COVID-19 database: the New York Times' COVID-19 database based on US health agency reports • ECDC COVID-19 Cases by Country: COVID-19 cases by country as reported by the European Centre for Disease Prevention and Control • USAFacts COVID-19 Cases by US County: COVID-19 cases by county aggregated by USAFacts from US health agencies
  • 15. NLP and BERT to answer scientific questions
  • 16. NLP and BERT • BERT, as a contextual model, captures relationships between words bidirectionally. • In “I made a bank deposit”, a unidirectional representation of “bank” is based only on “I made a”, not on “deposit”; a bidirectional model uses both sides. • Because the model is pre-trained on massive datasets and freely available, anyone building natural language processing applications can use it. • At its release, BERT set new state-of-the-art results on multiple benchmarks with minimal task-specific fine-tuning. • The pre-trained model can be fine-tuned on corporate data to create different applications.
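The “bank deposit” example comes down to which positions a model may attend to. The sketch below builds a left-to-right (causal) attention mask and a full bidirectional mask and lists the context each one exposes for “bank”. It is a simplification: it ignores BERT's masked-token pre-training objective, subword tokenization, and padding, and the function names are hypothetical.

```python
from typing import List


def attention_mask(n: int, bidirectional: bool) -> List[List[int]]:
    """mask[i][j] == 1 means position i may attend to position j."""
    return [[1 if (bidirectional or j <= i) else 0 for j in range(n)] for i in range(n)]


def context_for(tokens: List[str], target: str, bidirectional: bool) -> List[str]:
    """Tokens (other than the target itself) visible when encoding `target`."""
    i = tokens.index(target)
    mask = attention_mask(len(tokens), bidirectional)
    return [tok for j, tok in enumerate(tokens) if mask[i][j] and j != i]


if __name__ == "__main__":
    tokens = "I made a bank deposit".split()
    print(context_for(tokens, "bank", bidirectional=False))  # ['I', 'made', 'a']
    print(context_for(tokens, "bank", bidirectional=True))   # ['I', 'made', 'a', 'deposit']
```

The only difference between the two settings is the lower-triangular mask: removing it lets “deposit” inform the representation of “bank”.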
  • 17. COVID-19 Open Research Dataset Challenge https://www.whitehouse.gov/briefings-statements/call-action-tech-community-new-machine-readable-covid-19-dataset Every scientist working on a cure or vaccine must understand this prior research: 158,000 coronavirus scholarly articles, including 75,000 with full text. • What is known about transmission, incubation, and environmental stability? • What do we know about COVID-19 risk factors? • What do we know about vaccines and therapeutics? • What do we know about virus genetics, origin, and evolution? • What has been published about medical care? • What has been published about ethical and social science considerations? • What do we know about non-pharmaceutical interventions? • What do we know about diagnostics and surveillance?
  • 18. Explore Covid-19 Scientific Literature (1) (source: Databricks) • Generate summaries from abstracts by training a summarizer model • Generate a word cloud from all the titles
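A word cloud is driven by word frequencies, so the core computation is counting title words after dropping stop words. The sketch below computes those frequencies with the standard library; the small stop-word list and the example titles are hypothetical toy values, and keeping hyphenated tokens such as “sars-cov-2” intact is an assumed design choice.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# A small, hypothetical stop-word list; real pipelines use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "for", "on", "to", "with", "from", "by"}


def title_word_frequencies(titles: Iterable[str], top_n: int = 10) -> List[Tuple[str, int]]:
    """Count words across titles, skipping stop words and very short tokens."""
    counts: Counter = Counter()
    for title in titles:
        # Keep hyphenated tokens such as "covid-19" and "sars-cov-2" whole.
        for word in re.findall(r"[a-z0-9\-]+", title.lower()):
            if word not in STOP_WORDS and len(word) > 2:
                counts[word] += 1
    return counts.most_common(top_n)


if __name__ == "__main__":
    toy_titles = [
        "Transmission of SARS-CoV-2 in households",
        "Incubation period of COVID-19",
        "COVID-19 transmission and environmental stability",
    ]
    for word, count in title_word_frequencies(toy_titles, top_n=5):
        print(word, count)
```

The resulting (word, count) pairs can then be handed to a rendering library, for example the third-party `wordcloud` package's `generate_from_frequencies`.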
  • 19. Explore Covid-19 Scientific Literature (2) (source: Kaggle) Answering all tasks/challenges using NLP: we use several libraries to extract answers from these papers: • BERT QA model (fine-tuned on the SQuAD dataset) • BERT summarization model • Python Google Translate package • HTML to visualize the results. Overall flow: • Use the QA model to read every paper's abstract and find answers for all tasks • Concatenate the top 50 most confident answers into an article and use the summarization model to summarize them • Translate the summaries into multiple languages with Google Translate • Write HTML to show the summarized answers from all papers for all tasks.
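The overall flow above can be sketched as a small pipeline: run a QA model over every abstract, keep the top 50 answers by confidence, concatenate them, and summarize the result. In the sketch below the QA and summarization models are passed in as plain functions with assumed signatures (`qa_fn(question, context) -> (answer, score)` and `summarize_fn(text) -> summary`); these are hypothetical interfaces, not the Kaggle notebook's actual code, and the translation and HTML steps are omitted.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical model interfaces; plug in real QA / summarization models here.
QAFn = Callable[[str, str], Tuple[str, float]]
SummarizeFn = Callable[[str], str]


def answer_task(
    question: str,
    abstracts: Iterable[str],
    qa_fn: QAFn,
    summarize_fn: SummarizeFn,
    top_k: int = 50,
) -> str:
    """Answer one task question from many abstracts, then summarize the best answers."""
    # 1. Run the QA model over every non-empty abstract.
    scored = [qa_fn(question, abstract) for abstract in abstracts if abstract]
    # 2. Keep the top_k answers with the highest confidence scores.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    best_answers = [answer for answer, _ in scored[:top_k]]
    # 3. Concatenate them into one "article" and summarize it.
    article = " ".join(best_answers)
    return summarize_fn(article)
```

Passing the models in as functions keeps the selection logic testable with toy stand-ins and lets the same loop run once per task question.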
  • 20. Explore Covid-19 Scientific Literature (3) (source: Google) 1. When the user asks an initial question, the tool not only returns a set of papers (as in a traditional search) but also highlights snippets from the papers that are possible answers to the question. 2. The user can review the snippets and quickly decide whether or not a paper is worth further reading. 3. If the user is satisfied with the initial set of papers and snippets, we have added functionality to pose follow-up questions, which act as new queries over the original set of retrieved articles.
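The three-step interaction (retrieve papers, highlight answer snippets, re-query the retrieved set with follow-up questions) can be illustrated with a much simpler stand-in. The sketch below swaps in TF-IDF cosine ranking and word-overlap snippet selection; this is not the neural approach Google's tool uses, and all function names and example documents are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

Result = Tuple[int, float, str]  # (document index, score, best snippet)


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: List[str]) -> Tuple[List[Dict[str, float]], Dict[str, float]]:
    """Smoothed TF-IDF vectors for each document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return vectors, idf


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def search(question: str, docs: List[str], top_n: int = 3) -> List[Result]:
    """Step 1: rank documents; step 2: attach the sentence that best matches the question."""
    vectors, idf = tfidf_vectors(docs)
    q_terms = tokenize(question)
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(q_terms).items()}
    scores = [cosine(q_vec, v) for v in vectors]
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    results = []
    for i in ranked[:top_n]:
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", docs[i]) if s.strip()]
        snippet = max(sentences, key=lambda s: len(set(q_terms) & set(tokenize(s))), default="")
        results.append((i, scores[i], snippet))
    return results


def follow_up(question: str, docs: List[str], previous: List[Result], top_n: int = 3) -> List[Result]:
    """Step 3: run a new query only over previously retrieved documents."""
    subset = [i for i, _, _ in previous]
    sub_results = search(question, [docs[i] for i in subset], top_n)
    return [(subset[j], score, snippet) for j, score, snippet in sub_results]
```

Mapping subset positions back to original indices in `follow_up` keeps results traceable to the same papers the user already reviewed.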
  • 21. Explore Covid-19 Scientific Literature (4) (source: Amazon AWS) Deploy the COVID-19 knowledge graph (CKG) using AWS CloudFormation and Amazon Neptune, and query the graph using Jupyter notebooks hosted on Amazon SageMaker in your AWS account. The CKG aids in the exploration and analysis of the COVID-19 Open Research Dataset (CORD-19), hosted in the AWS COVID-19 data lake. The strength of the graph comes from the connections between scholarly articles, authors, scientific concepts, and institutions. The CKG also helps power the CORD-19 search page.
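To illustrate why connections between articles, authors, concepts, and institutions are useful, the sketch below stores a tiny graph in memory and ranks papers related to a given paper by how many neighbors they share. This is a hypothetical toy model: it does not reproduce the CKG's schema, and a real deployment would query Neptune with its own graph query languages rather than Python dictionaries. The node names and edges are made up.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set, Tuple

Node = Tuple[str, str]  # (node type, identifier), e.g. ("paper", "p1")


class ToyKnowledgeGraph:
    """Minimal undirected graph of typed nodes (papers, authors, concepts, institutions)."""

    def __init__(self) -> None:
        self.edges: DefaultDict[Node, Set[Node]] = defaultdict(set)

    def connect(self, a: Node, b: Node) -> None:
        self.edges[a].add(b)
        self.edges[b].add(a)

    def related_papers(self, paper: Node) -> List[Tuple[Node, int]]:
        """Rank other papers by the number of neighbors they share with `paper`."""
        scores: DefaultDict[Node, int] = defaultdict(int)
        for neighbor in self.edges.get(paper, set()):
            for other in self.edges[neighbor]:
                if other != paper and other[0] == "paper":
                    scores[other] += 1
        # Highest shared count first; ties broken by node id for stable output.
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    g = ToyKnowledgeGraph()
    p1, p2, p3 = ("paper", "p1"), ("paper", "p2"), ("paper", "p3")
    ace2, incubation = ("concept", "ACE2"), ("concept", "incubation")
    author = ("author", "A. Author")
    for paper, node in [(p1, ace2), (p2, ace2), (p1, author), (p2, author), (p1, incubation), (p3, incubation)]:
        g.connect(paper, node)
    print(g.related_papers(p1))
```

Counting shared neighbors is a two-hop traversal, the same shape of query a graph database answers natively at much larger scale.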
  • 22. Questions? Twitter: @a9zhang Email the speaker: zhangan@amazon.com