Advanced Natural Language Processing with Spark NLP
David Talby
CTO, John Snow Labs
Agenda
1. Introducing Spark NLP
2. State-of-the-art Accuracy
3. Speed & Scalability
4. Ease of Use
5. Examples
Introducing Spark NLP
Most popular NLP library in the enterprise (O’Reilly Media)
54% of healthcare AI teams use Spark NLP (Gradient Flow)
16x growth in downloads of the library since Jan 2020 (PyPI Download Stats)
What is Spark NLP?
▪ State-of-the-art Natural Language Processing
▪ Production-grade, trainable, and scalable
▪ Open-Source Python, Java & Scala libraries
▪ 1,400+ Pre-trained models & pipelines
▪ Active: 26+ new releases/year since 2017!
Spark NLP in Industry
NLP Industry Survey by Gradient Flow,
an independent data science research & insights company, September 2020
Which NLP libraries does your organization use?
Trusted By
There’s a world of difference between
an academic result and a production system
TRAINABLE & TUNABLE
100% PRIVATE
EXPLAINABLE
REPRODUCIBLE
HARDWARE OPTIMIZED
SCALABLE
COMMUNITY & EDUCATION
Spark NLP
Introducing Spark NLP 3
• Massive speedups
[Databricks 7.2 ML GPU on 10 AWS f4dn.large:]
7.9 times faster in calculating BERT-Large
6.5 times faster in calculating BERT-base
3.0 times faster in calculating NER DL
• The latest compute platforms
Spark 3.1, 3.0, 2.4, 2.3
Databricks 8.x, 7.x, 6.x – CPU and GPU
Linux, Mac, Windows – local development
Docker – with & without Kubernetes
Hadoop 2.7 and 3.x
Cloudera & Hortonworks
AWS, Azure, and GCP
On Accuracy
Biomedical Named Entity Recognition at Scale
Presented at CADL 2020 (International Workshop on Computational Aspects of Deep Learning), in conjunction with ICPR 2020
• Obtains new state-of-the-art results on seven public biomedical benchmarks without using heavy contextual embeddings, including:
  • BC4CHEMD to 93.72% (4.1% gain)
  • Species800 to 80.91% (4.6% gain)
  • JNLPBA to 81.29% (5.2% gain)
• Production-grade codebase on top of the Spark NLP library; can scale up for training and inference in any Spark cluster; GPU support; Polyglot API
Improving Clinical Document Understanding on COVID-19 Research with Spark NLP
Presented at SDU (Scientific Document Understanding) workshop at AAAI 2021
• Improve on the previous best accuracy benchmarks for assertion status detection
• Recognize 100+ entity types including social determinants of health, anatomy, risk factors, and adverse events in addition to other commonly used clinical and biomedical entities
• Extract trends and insights: most frequent disorders & symptoms and most common vital signs and EKG findings from CORD-19
Accurate Clinical Named Entity Recognition at Scale
Under review
• Establishes new state-of-the-art accuracy on 3 clinical concept extraction challenges:
  • 2010 i2b2/VA clinical concept extraction
  • 2014 n2c2 de-identification
  • 2018 n2c2 medication extraction
• Outperform the accuracy of AWS Medical Comprehend and Google Cloud Healthcare API by a large margin (8.9% and 6.7% respectively)
• Outperform plain Keras implementation
● “State of the art” means the best peer-reviewed academic results
● For example: best F1 score on the CoNLL-2003 NER benchmark for a system in production
● Spark NLP uses a custom model based on Bi-LSTM + Char-CNN + CRF + Word Embeddings
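To make the F1 metric above concrete, here is a minimal sketch of entity-level micro F1 over BIO-tagged sentences. It assumes exact-match span scoring and BIO tags; the function names are hypothetical, and this is neither the official CoNLL evaluation script nor Spark NLP's own evaluation code.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start index, end index exclusive)


def extract_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO tag sequence."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # the sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.add((label, start, i))
        if tag.startswith("B-") or tag.startswith("I-"):
            # A stray I- tag is treated leniently as the start of a new span.
            start, label = i, tag[2:]
        else:
            start, label = None, None
    return spans


def micro_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1: a predicted span counts only if type and boundaries match."""
    tp = fp = fn = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = extract_spans(gold_tags), extract_spans(pred_tags)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

In this sketch a prediction that finds one of two gold entities, with no false positives, has precision 1.0 and recall 0.5.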
Accuracy: State-of-the-art Models
Named Entity Recognition
● The best F1 score on the CoNLL-2003 NER benchmark for a system in production, obtained with Spark NLP
● A BERT Large model was used to train our Bi-LSTM + Char-CNN + CRF model
Accuracy: State-of-the-art Models
Named Entity Recognition
● Everything must work right out of the box
● All the parameters are default
● The CoNLL 2003 dataset is used in this benchmark: eng.train was used for training and eng.testa was used for evaluating the model
Accuracy: State-of-the-art Models
Named Entity Recognition
Transformers & Embeddings
Spark NLP: 100+ Word Embeddings
● BERT
● Small BERT
● BioBERT
● CovidBERT
● ALBERT
● ELECTRA
● XLNet
● ELMO
● GloVe
Accuracy: State-of-the-art Models
Multi-class & Multi-label Text Classifications
● Multi-class text classification to detect emotions, cyberbullying, fake news, spam, etc.
● Multi-label text classification to detect toxic comments, movie genre, etc.
● Hundreds of pretrained Word and Sentence Embeddings
● Language-Agnostic BERT Sentence Embedding
● Universal Sentence Encoder as an input for text classification
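The difference between the multi-class and multi-label tasks listed above can be sketched in a few lines. This assumes the common formulation (softmax with a single winner for multi-class, independent sigmoids with a threshold for multi-label); the slides do not confirm how the Spark NLP classifiers decode their outputs, and the function names and 0.5 threshold are illustrative only.

```python
import math
from typing import Dict, List


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - peak) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_multi_class(scores: Dict[str, float]) -> str:
    """Multi-class: exactly one label wins."""
    probs = softmax(scores)
    return max(probs, key=probs.get)


def predict_multi_label(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multi-label: each label is decided independently, so zero or many can fire."""
    return sorted(label for label, s in scores.items() if sigmoid(s) >= threshold)
```

For example, an emotion detector would return one label per text, while a toxic-comment detector may return several labels, or none, for the same text.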
Accuracy: State-of-the-art Models
SentimentDL, ClassifierDL, and MultiClassifierDL
● BERT
● Small BERT
● BioBERT
● CovidBERT
● LaBSE
● ALBERT
● ELECTRA
● XLNet
● ELMO
● Universal Sentence Encoder
● GloVe
● 100 dimensions
● 200 dimensions
● 128 dimensions
● 256 dimensions
● 300 dimensions
● 512 dimensions
● 768 dimensions
● 1024 dimensions
● tfhub_use
● tfhub_use_lg
● glove_6B_100
● glove_6B_300
● glove_840B_300
● bert_base_cased
● bert_base_uncased
● bert_large_cased
● bert_large_uncased
● bert_multi_uncased
● electra_small_uncased
● elmo
● ...
● 2 classes (positive/negative)
● 3 classes (0, 1, 2)
● 4 classes (Sports, Business, etc.)
● 5 classes (1.0, 2.0, 3.0, 4.0, 5.0)
● ... 100 classes!
Accuracy: State-of-the-art Models
Language Detection & Identification
● LanguageDetectorDL is a state-of-the-art TensorFlow/Keras model
● Uses the positions of the characters
● It is around 3 MB to 5 MB
● It has been trained over 8 million Wikipedia pages
● It has between 97% and 99% accuracy for text longer than 140 characters
Accuracy: State-of-the-art Models
Context Spell Checker
● Ability to consider OCR-specific error patterns
● Ability to leverage the context
● Ability to preserve and even correct custom patterns
● Flexibility to incorporate your own custom patterns
Optimizing Performance
BERT Embeddings
● Transformers are slow!
● They need GPUs
● Performance depends heavily on the max sequence length
Spark NLP 2.6 optimizations:
● Reduce memory consumption by 30%
● Improve performance by more than 70% with dynamic shape
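Why sequence length and input shape matter can be illustrated with a small token-count comparison. The sketch below assumes that "dynamic shape" means padding each batch only to its own longest sequence instead of always padding to the max sequence length; the slides do not spell out the mechanism, so treat this as an illustration of the general idea rather than a description of Spark NLP internals. All names are hypothetical.

```python
from typing import List


def tokens_with_fixed_shape(lengths: List[int], max_len: int) -> int:
    """Every sequence is padded (or truncated) to max_len."""
    return len(lengths) * max_len


def tokens_with_dynamic_shape(lengths: List[int], max_len: int, batch_size: int) -> int:
    """Each batch is padded only to the longest sequence it contains."""
    total = 0
    for i in range(0, len(lengths), batch_size):
        batch = [min(n, max_len) for n in lengths[i:i + batch_size]]
        total += len(batch) * max(batch)
    return total


# Six sentences, mostly short, processed two at a time with max_len = 128.
lengths = [12, 30, 8, 128, 20, 16]
fixed = tokens_with_fixed_shape(lengths, max_len=128)
dynamic = tokens_with_dynamic_shape(lengths, max_len=128, batch_size=2)
print(fixed, dynamic)
```

Under this assumption, most of the work in the fixed-shape case is spent on padding whenever the typical sentence is much shorter than the max sequence length.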
Performance
BERT Embeddings
● Trade off size, memory, and accuracy
● Tiny BERT
● Mini BERT
● Small BERT
● Medium BERT
● Others…
Example:
● BERT-Tiny is 24x smaller and 28x faster than BERT-Base
Performance: Hardware
● Optimized builds of Spark NLP for both Intel and Nvidia
● Out-of-the-box optimizations for Intel (MKL, etc.) and Nvidia (Spark 3, etc.)
● Ongoing profiling with engineering teams at both companies
Scale: Distribution & Parallelism
● Zero code changes to scale a pipeline to any Spark cluster
● Only natively distributed open-source NLP library
● Spark provides execution planning, caching, serialization, and shuffling
● Caveats
  ● Speedup depends on what you actually do
  ● Spark configurations matter
  ● Cluster tuning based on your data is advised
Scale: Distribution & Parallelism
Recognize Entity DL Pipeline
● Amazon full reviews, 15 million sentences, and 255 million tokens
● Single node, 32G memory & 32 cores
● 10x workers with 32G memory & 16 cores
● The pipeline includes sentence detection, tokenization, word embeddings, and NER
Setup:
● Single node is dedicated Dell Server
● 10 Nodes are in Databricks on AWS
Scale: Distribution & Parallelism
BERT Embeddings
● Amazon full reviews, 15 million sentences, and 255 million tokens
● Single node with 64G memory & 32 cores
● 10x workers with 32G memory & 16 cores
● 128 max sequence length
Setup:
● Single node is dedicated Dell Server
● 10 Nodes are in Databricks on AWS
Easy to Use
Python, Scala, and Java
● Pretrained pipelines
● Pretrained models
● Training your own models
Easy to Use
Pretrained Pipelines
● 100+ pretrained pipelines
● Full support for 13 languages
● Simple and easy to use
● Works online and offline
● Preconfigured
Easy to Use
Pretrained Models
● Hundreds of pretrained models
● Support for 46 languages
● Works online and offline
● Flexible & customized pipelines
● Caveat: some models depend on each other
Easy to Use
Train your own POS tagging models
● POS() accepts token-tag format
● The POS tagger is based on the Averaged Perceptron algorithm
● Language-agnostic: supports any language
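As an illustration of what a token-tag training line looks like once parsed, here is a minimal sketch in plain Python. The `|` delimiter and the function name are assumptions made for this example; the slides do not state which delimiter the reader expects, so it is left as a parameter.

```python
from typing import List, Tuple


def parse_token_tag_line(line: str, delimiter: str = "|") -> List[Tuple[str, str]]:
    """Split one whitespace-separated line of token<delimiter>tag items into pairs."""
    pairs: List[Tuple[str, str]] = []
    for item in line.split():
        # rpartition splits on the LAST delimiter, so tokens may contain it too.
        token, sep, tag = item.rpartition(delimiter)
        if not sep or not token or not tag:
            raise ValueError(f"expected token{delimiter}tag, got {item!r}")
        pairs.append((token, tag))
    return pairs


print(parse_token_tag_line("Pierre|NNP Vinken|NNP will|MD join|VB the|DT board|NN"))
```

Because the format is just tokens paired with tags, the same reader works for any language that can be tokenized on whitespace in the training file.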
Easy to Use
Train your own NER models
● CoNLL 2003 format as input
● Accepts 50+ Word Embeddings models
● Train on CPU or GPU
● Extended metrics and evaluation
● Built-in validation split with metrics
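For readers unfamiliar with the CoNLL 2003 input format mentioned above, the following sketch reads it in plain Python. It assumes the usual layout (one token per line with the NER tag in the last column, blank lines between sentences, and `-DOCSTART-` marker lines); it is not Spark NLP's own reader, and the function name is hypothetical.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # (token, NER tag)


def read_conll2003(lines: Iterable[str]) -> List[Sentence]:
    """Group CoNLL 2003 lines into sentences of (token, NER tag) pairs."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("-DOCSTART-"):
            # A blank line or document marker ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        current.append((columns[0], columns[-1]))  # first column: token, last: NER tag
    if current:
        sentences.append(current)
    return sentences


sample = """-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC

Peter NNP B-NP B-PER
""".splitlines()

for sentence in read_conll2003(sample):
    print(sentence)
```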
Easy to Use
Train your own NER models
● BERT with 2 layers & 768 dimensions
● 16 minutes training
● 91% Micro F1 on Dev
● 90% conll_eval on Dev
● Full CoNLL 2003 training dataset
● Google Colab with GPU
Easy to Use
Train your own multi-class classifiers
● Supports up to 100 classes
● Accepts 90+ Word & Sentence Embeddings models
● Train on CPU or GPU
● Extended metrics and evaluation
● Built-in validation split with metrics
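The idea behind a validation split can be sketched as follows: hold out a fraction of the training examples and report metrics on that held-out part after training. This is a generic illustration, not the library's implementation; the function name, the `validation_fraction` parameter, and the fixed seed are assumptions for the example.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def validation_split(
    examples: Sequence[T], validation_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle reproducibly, then hold out a fraction of the examples for validation."""
    if not 0.0 <= validation_fraction < 1.0:
        raise ValueError("validation_fraction must be in [0, 1)")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_validation = int(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]


train, validation = validation_split(range(10), validation_fraction=0.2)
print(len(train), len(validation))
```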
Spark NLP for Healthcare
Spark OCR
Project creation
Team setup
Tasks creation
Labeling
The Annotation Lab
Learn More
Using Spark NLP to build a drug discovery knowledge graph for Covid-19
Vishnu Vettrivel & Alexander Thomas
Founder & Principal Data Scientist at Wisecube
NLP in Healthcare: Challenges & Opportunities
Ganesh Thodikulam
Executive Director, Kaiser Permanente
A Unified CV, OCR, and NLP for Scalable Document Understanding
Text Analytics and its Applications in the Pharma Industry
Harsha Gurulingappa, Ph.D.
Text Analytics Product Owner at Merck
NLP in Oncology Real World Data: Opportunities to develop a true learning healthcare system
Patrick Beukema, Ph.D.
Senior ML Engineer, DocuSign
Automated & Explainable Deep Learning for Clinical Language Understanding at Roche
Vishakha Sharma, Ph.D.
Principal Data Scientist, Roche
George A. Komatsoulis, Ph.D.
Chief of Bioinformatics at CancerLinQ
Thank you!
© 2015-2021 John Snow Labs Inc. All rights reserved. The John Snow Labs logo is a trademark of John Snow Labs Inc. The included information is for informational purposes only and represents the current view of John Snow Labs as of the date of this presentation. Since John Snow Labs must respond to changing market conditions, it should not be interpreted to be a commitment on its part, and John Snow Labs cannot guarantee the accuracy of any information provided after the date of this presentation. John Snow Labs makes no warranties, express or statutory, as to the information in this presentation.
Live demos: demo.johnsnowlabs.com
Get Started: nlp.johnsnowlabs.com

Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
 

Recently uploaded

Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
2023240532
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Subhajit Sahu
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
yhkoc
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
benishzehra469
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
vcaxypu
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
AnirbanRoy608946
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
Opendatabay
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
Oppotus
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
ocavb
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Linda486226
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
ewymefz
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
enxupq
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
pchutichetpong
 

Recently uploaded (20)

Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
Quantitative Data AnalysisReliability Analysis (Cronbach Alpha) Common Method...
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
Criminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdfCriminal IP - Threat Hunting Webinar.pdf
Criminal IP - Threat Hunting Webinar.pdf
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
Empowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptxEmpowering Data Analytics Ecosystem.pptx
Empowering Data Analytics Ecosystem.pptx
 
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
一比一原版(RUG毕业证)格罗宁根大学毕业证成绩单
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptxData_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
Data_and_Analytics_Essentials_Architect_an_Analytics_Platform.pptx
 
Opendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptxOpendatabay - Open Data Marketplace.pptx
Opendatabay - Open Data Marketplace.pptx
 
Q1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year ReboundQ1’2024 Update: MYCI’s Leap Year Rebound
Q1’2024 Update: MYCI’s Leap Year Rebound
 
一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单一比一原版(TWU毕业证)西三一大学毕业证成绩单
一比一原版(TWU毕业证)西三一大学毕业证成绩单
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
一比一原版(UofM毕业证)明尼苏达大学毕业证成绩单
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单一比一原版(QU毕业证)皇后大学毕业证成绩单
一比一原版(QU毕业证)皇后大学毕业证成绩单
 
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...
 

Advanced Natural Language Processing with Apache Spark NLP

  • 1. Advanced Natural Language Processing with Spark NLP David Talby CTO, John Snow Labs
  • 2. Agenda: 1. Introducing Spark NLP 2. State-of-the-art Accuracy 3. Speed & Scalability 4. Ease of Use 5. Examples
  • 3. Introducing Spark NLP ● Most popular NLP library in the enterprise (O’Reilly Media) ● 54% share of healthcare AI teams use Spark NLP (Gradient Flow) ● 16x growth in downloads of the library since Jan 2020 (PyPI Download Stats)
  • 4. What is Spark NLP? ▪ State of the art Natural Language Processing ▪ Production-grade, trainable, and scalable ▪ Open-Source Python, Java & Scala libraries ▪ 1,400+ Pre-trained models & pipelines ▪ Active: 26+ new releases/year since 2017!
  • 5. Spark NLP in Industry NLP Industry Survey by Gradient Flow, an independent data science research & insights company, September 2020 Which NLP libraries does your organization use?
  • 7. There’s a world of difference between an academic result and a production system TRAINABLE & TUNABLE 100% PRIVATE EXPLAINABLE REPRODUCIBLE HARDWARE OPTIMIZED SCALABLE COMMUNITY & EDUCATION
  • 9. Introducing Spark NLP 3 • Massive speedups [Databricks 7.2 ML GPU on 10 AWS f4dn.large]: 7.9 times faster in calculating BERT-Large, 6.5 times faster in calculating BERT-base, 3.0 times faster in calculating NER DL • The latest compute platforms: Spark 3.1, 3.0, 2.4, 2.3; Databricks 8.x, 7.x, 6.x – CPU and GPU; Linux, Mac, Windows – local development; Docker – with & without Kubernetes; Hadoop 2.7 and 3.x; Cloudera & Hortonworks; AWS, Azure, and GCP
  • 10. Agenda: 1. Introducing Spark NLP 2. State-of-the-art Accuracy 3. Speed & Scalability 4. Ease of Use 5. Examples
  • 11. On Accuracy
      Biomedical Named Entity Recognition at Scale. Presented at CADL 2020 (International Workshop on Computational Aspects of Deep Learning), in conjunction with ICPR 2020
        • Obtains new state-of-the-art results on seven public biomedical benchmarks without using heavy contextual embeddings, including BC4CHEMD to 93.72% (4.1% gain), Species800 to 80.91% (4.6% gain), and JNLPBA to 81.29% (5.2% gain)
        • Production-grade codebase on top of the Spark NLP library; can scale up for training and inference in any Spark cluster; GPU support; Polyglot API
      Improving Clinical Document Understanding on COVID-19 Research with Spark NLP. Presented at SDU (Scientific Document Understanding) workshop at AAAI 2021
        • Improve on the previous best accuracy benchmarks for assertion status detection
        • Recognize 100+ entity types including social determinants of health, anatomy, risk factors, and adverse events in addition to other commonly used clinical and biomedical entities
        • Extract trends and insights: most frequent disorders & symptoms and most common vital signs and EKG findings from CORD-19
      Accurate Clinical Named Entity Recognition at Scale. Under review
        • Establishes new state-of-the-art accuracy on 3 clinical concept extraction challenges: 2010 i2b2/VA clinical concept extraction, 2014 n2c2 de-identification, and 2018 n2c2 medication extraction
        • Outperform the accuracy of AWS Medical Comprehend and Google Cloud Healthcare API by a large margin (8.9% and 6.7% respectively)
        • Outperform plain Keras implementation
  • 12. Accuracy: State-of-the-art Models Named Entity Recognition ● “State of the art” means the best peer-reviewed academic results ● For example: best F1 score on the CoNLL-2003 NER benchmark for a system in production ● Spark NLP uses a custom model based on Bi-LSTM + Char-CNN + CRF + Word Embeddings
  • 13. Accuracy: State-of-the-art Models Named Entity Recognition ● Spark NLP achieves the best F1 score on the CoNLL-2003 NER benchmark for a system in production ● A BERT Large model was used to train our Bi-LSTM + Char-CNN + CRF model
  • 14. Accuracy: State-of-the-art Models Named Entity Recognition ● Everything must work right out of the box ● All the parameters are default ● The CoNLL 2003 dataset is used in this benchmark: eng.train was used for training and eng.testa for evaluating the model
  • 15. Transformers & Embeddings Spark NLP: 100+ Word Embeddings ● BERT ● Small BERT ● BioBERT ● CovidBERT ● ALBERT ● ELECTRA ● XLNet ● ELMO ● GloVe
  • 16. Accuracy: State-of-the-art Models Multi-class & Multi-label Text Classifications ● Multi-class text classification to detect emotions, cyberbullying, fake news, spam, etc. ● Multi-label text classification to detect toxic comments, movie genre, etc. ● Hundreds of pretrained Word and Sentence Embeddings ● Language-Agnostic BERT Sentence Embedding ● Universal Sentence Encoder as an input for text classifications
  • 17. Accuracy: State-of-the-art Models SentimentDL, ClassifierDL, and MultiClassifierDL ● BERT ● Small BERT ● BioBERT ● CovidBERT ● LaBSE ● ALBERT ● ELECTRA ● XLNet ● ELMO ● Universal Sentence Encoder ● GloVe ● 100 dimensions ● 200 dimensions ● 128 dimensions ● 256 dimensions ● 300 dimensions ● 512 dimensions ● 768 dimensions ● 1024 dimensions ● tfhub_use ● tfhub_use_lg ● glove_6B_100 ● glove_6B_300 ● glove_840B_300 ● bert_base_cased ● bert_base_uncased ● bert_large_cased ● bert_large_uncased ● bert_multi_uncased ● electra_small_uncased ● elmo ● ... ● 2 classes (positive/negative) ● 3 classes (0, 1, 2) ● 4 classes (Sports, Business, etc.) ● 5 classes (1.0, 2.0, 3.0, 4.0, 5.0) ● ... 100 classes!
  • 18. Accuracy: State-of-the-art Models Language Detection & Identification ● LanguageDetectorDL is a state-of-the-art TensorFlow/Keras model ● Uses the positions of the characters ● It is around 3 MB to 5 MB ● It has been trained over 8 million Wikipedia pages ● It has between 97% and 99% accuracy for text longer than 140 characters
  • 19. Accuracy: State-of-the-art Models Context Spell Checker ● Ability to consider OCR specific error patterns ● Ability to leverage the context ● Ability to preserve and even correct custom patterns ● Flexibility to incorporate your own custom patterns
  • 20. Agenda: 1. Introducing Spark NLP 2. State-of-the-art Accuracy 3. Speed & Scalability 4. Ease of Use 5. Examples
  • 21. Optimizing Performance BERT Embeddings ● Transformers are slow! ● They need GPUs ● It depends highly on max sequence length Spark NLP 2.6 optimizations: ● Improve the memory consumption by 30% ● Improve performance by more than 70% with dynamic shape
  • 22. Performance BERT Embeddings ● Trade off size, memory, and accuracy ● Tiny BERT ● Mini BERT ● Small BERT ● Medium BERT ● Others… Example: ● BERT-Tiny is 24x smaller and 28x faster than BERT-Base
  • 23. Performance: Hardware ● Optimized builds of Spark NLP for both Intel and Nvidia ● Out-of-the-box optimizations for Intel (MKL, etc.) and Nvidia (Spark 3, etc.) ● Ongoing profiling with engineering teams at both companies
  • 24. Scale: Distribution & Parallelism ● Zero code changes to scale a pipeline to any Spark cluster ● Only natively distributed open-source NLP library ● Spark provides execution planning, caching, serialization, and shuffling ● Caveats ● Speedup depends on what you actually do ● Spark configurations matter ● Cluster tuning based on your data is advised
  • 25. Scale: Distribution & Parallelism Recognize Entity DL Pipeline ● Amazon full reviews, 15 million sentences, and 255 million tokens ● Single node, 32G memory & 32 cores ● 10x workers with 32G memory & 16 cores ● The pipeline includes sentence detection, tokenization, word embeddings, and NER Setup: ● The single node is a dedicated Dell server ● The 10 nodes are in Databricks on AWS
  • 26. Scale: Distribution & Parallelism BERT Embeddings ● Amazon full reviews, 15 million sentences, and 255 million tokens ● Single node with 64G memory & 32 cores ● 10x workers with 32G memory & 16 cores ● 128 max sequence length Setup: ● The single node is a dedicated Dell server ● The 10 nodes are in Databricks on AWS
  • 27. Agenda: 1. Introducing Spark NLP 2. State-of-the-art Accuracy 3. Speed & Scalability 4. Ease of Use 5. Examples
  • 28. Easy to Use Python, Scala, and Java ● Pretrained pipelines ● Pretrained models ● Training your own models
  • 29. Easy to Use Pretrained Pipelines ● 100+ pretrained pipelines ● Full support for 13 languages ● Simple and easy to use ● Works online and offline ● Preconfigured
  • 30. Easy to Use Pretrained Models ● Hundreds of pretrained models ● Support for 46 languages ● Works online and offline ● Flexible & customized pipelines ● Caveat: some models depend on each other
  • 31. Easy to Use Train your own POS tagging models ● POS() accepts token-tag format ● POS Tagger is based on the Averaged Perceptron algorithm ● Language-agnostic and supports any language
  • 32. Easy to Use Train your own NER models ● CoNLL 2003 format as input ● Accepts 50+ Word Embeddings models ● Train on CPU or GPU ● Extended metrics and evaluation ● Built-in validation split with metrics
  • 33. Easy to Use Train your own NER models ● BERT with 2 layers & 768 dimensions ● 16 minutes training ● 91% Micro F1 on Dev ● 90% conll_eval on Dev ● Full CoNLL 2003 training dataset ● Google Colab with GPU
  • 34. Easy to Use Train your own multi-class classifiers ● Supports up to 100 classes ● Accepts 90+ Word & Sentence Embeddings models ● Train on CPU or GPU ● Extended metrics and evaluation ● Built-in validation split with metrics
  • 35. Agenda: 1. Introducing Spark NLP 2. State-of-the-art Accuracy 3. Speed & Scalability 4. Ease of Use 5. Examples
  • 38. The Annotation Lab ● Project creation ● Team setup ● Tasks creation ● Labeling
  • 39. Learn More ● Using Spark NLP to build a drug discovery knowledge graph for Covid-19: Vishnu Vettrivel & Alexander Thomas, Founder & Principal Data Scientist at Wisecube ● NLP in Healthcare: Challenges & Opportunities: Ganesh Thodikulam, Executive Director, Kaiser Permanente ● A Unified CV, OCR, and NLP for Scalable Document Understanding: Patrick Beukema, Ph.D., Senior ML Engineer, DocuSign ● Text Analytics and its Applications in the Pharma Industry: Harsha Gurulingappa, Ph.D., Text Analytics Product Owner at Merck ● NLP in Oncology Real World Data: Opportunities to develop a true learning healthcare system: George A. Komatsoulis, Ph.D., Chief of Bioinformatics at CancerLinQ ● Automated & Explainable Deep Learning for Clinical Language Understanding at Roche: Vishakha Sharma, Ph.D., Principal Data Scientist, Roche
  • 40. Thank you! Live demos: demo.johnsnowlabs.com Get started: nlp.johnsnowlabs.com © 2015-2021 John Snow Labs Inc. All rights reserved. The John Snow Labs logo is a trademark of John Snow Labs Inc. The included information is for informational purposes only and represents the current view of John Snow Labs as of the date of this presentation. Since John Snow Labs must respond to changing market conditions, it should not be interpreted to be a commitment on its part, and John Snow Labs cannot guarantee the accuracy of any information provided after the date of this presentation. John Snow Labs makes no warranties, express or statutory, as to the information in this presentation.
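
The NER training slides mention CoNLL 2003 formatted input and a micro F1 score on the dev set. As a rough illustration only, the sketch below shows what that input format looks like and how an entity-level micro F1 could be computed in plain Python. It is not Spark NLP's own reader or evaluator: the function names (`read_conll`, `spans`, `micro_f1`) and the tiny sample are assumptions made up for this example, and it assumes BIO-style tags in the last column.

```python
# Hypothetical helpers for illustration; not part of Spark NLP.
SAMPLE = """\
-DOCSTART- -X- -X- O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O
. . O O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER
"""


def read_conll(text):
    """Parse CoNLL 2003 text into sentences of (token, ner_tag) pairs."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        # Blank lines and -DOCSTART- markers end the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split()
        current.append((cols[0], cols[-1]))  # first column: token, last: NER tag
    if current:
        sentences.append(current)
    return sentences


def spans(tags):
    """Convert a BIO tag sequence into a set of (start, end, label) spans."""
    found, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        inside = tag.startswith("I-") and label == tag[2:]
        if start is not None and not inside:
            found.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, label = i, tag[2:]
    return found


def micro_f1(gold_sents, pred_sents):
    """Entity-level micro F1: a span counts only if boundaries and label match."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sents, pred_sents):
        g, p = spans(gold), spans(pred)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = [[tag for _, tag in sent] for sent in read_conll(SAMPLE)]
# A made-up prediction that misses the MISC entity in the first sentence.
pred = [["B-ORG", "O", "O", "O", "O"], ["B-PER", "I-PER"]]
print(round(micro_f1(gold, pred), 3))
```

Here the prediction finds two of the three gold entities with no false positives, so precision is 1.0 and recall is 2/3. Scoring whole spans rather than individual tokens is one common convention for NER; the exact metric behind the figures on the slides is not specified there.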