This document discusses building natural language processing (NLP) systems at scale for large businesses. It provides examples of how NLP can be used across different business domains and customer experiences to improve outcomes. Specifically, it discusses:
1) Six motivating examples of NLP use cases like search, recommendations, question answering, conversation analysis, customer support, and item understanding.
2) How NLP can have a high return on investment when applied correctly across different types of businesses.
3) Different NLP serving scenarios like online, streaming, and batch processing that require different system architectures.
4) How building effective NLP systems requires not just models but also tools for data annotation, model training/deployment, and testing.
Building multi-billion (dollars, users, documents) search engines on open source - Andrei Lopatenko
How to use open source technologies to build search engines for billions of users, billions of revenue, billions of documents
Keynote talk at The 16th International Conference on Open Source Systems.
Andrei Lopatenko is a Vice President of Engineering at Zillow Group who has 15 years of experience designing, building, and improving AI-driven search engines. His talk will demonstrate how AI can be useful throughout every part of a search engine, from data acquisition to ranking to the user experience. Successful AI in search requires an infrastructure that allows for continuous introduction and improvement of AI applications across the entire search stack.
Deep learning for e-commerce: current status and future prospects - Rakuten Group, Inc.
Deep learning is the prime avenue for Artificial Intelligence, with spectacular accomplishments in diverse fields such as computer vision, natural language processing, and board games such as Go. Its impact on e-commerce is already significant and will continue to grow in future years. In this talk, we will review some of the successful deep learning algorithms in light of their current and expected impact on e-commerce.
Search Product Manager: Software PM vs. Enterprise PM or What does that * PM do? - John T. Kane
This document discusses the roles of search product managers and provides examples of search metrics and KPIs. It outlines the speaker's background and experience as a search PM, describes different types of search use cases, and compares roles of software vs. enterprise search PMs. It also lists references and thoughts on future directions for search and product management.
How Artificial Intelligence & Machine Learning Are Transforming Modern Marketing - CleverTap
Join Almitra Karnik, Head of Marketing at CleverTap, and Jessie Paul, CEO of Paul Writer, as they share their insights on how AI and ML are fundamentally changing the way we approach marketing and how we can harness these changes to further our businesses.
Haystack - Learning to rank in an hourly job market - Xun Wang
The document discusses learning to rank models for job search rankings on an hourly job marketplace platform. It describes:
1) The complexity of matching job seekers to job postings given the many factors involved and limited historical data.
2) An iterative process of developing learning to rank models, testing improvements through A/B testing, and analyzing results to further tune the models over time.
3) Key factors considered in the models include job title/description matches, employer name, location matches, distance between seeker and job, and search/user attributes. Performance is evaluated on multiple metrics like application and conversion rates.
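The factors listed above can be combined into a single ranking score. As a minimal sketch (hypothetical feature names and weights, not the model from the talk), a linear scoring function over match features might look like:

```python
# Hypothetical feature weights; a real learning-to-rank model would learn
# these from labeled judgments or implicit feedback (clicks, applications).
WEIGHTS = {
    "title_match": 2.0,
    "description_match": 1.0,
    "employer_match": 0.5,
    "distance_km": -0.1,  # farther jobs score lower
}

def score(job):
    """Weighted linear combination of match features for one job posting."""
    return sum(WEIGHTS[name] * value for name, value in job.items())

jobs = [
    {"title_match": 1.0, "description_match": 0.4,
     "employer_match": 0.0, "distance_km": 3.0},
    {"title_match": 0.2, "description_match": 0.9,
     "employer_match": 1.0, "distance_km": 25.0},
]
ranked = sorted(jobs, key=score, reverse=True)
```

In practice the weights would be tuned iteratively through the A/B-testing loop the summary describes, with application and conversion rates as the evaluation metrics.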
Improving Search in Workday Products using Natural Language Processing - DataWorks Summit
Workday is a leading provider of cloud-based enterprise software products such as Human Capital Management, Talent, Finance, Student, and Planning. These products produce a wealth of natural language data. However, this data is unstructured and denormalized, and retrieving relevant information from it is a challenging task. Simple index-based search methods can only take us so far. The Data Science team at Workday is determined to apply machine learning and AI to make search better across Workday’s products.
In this session, we present how we use word embeddings to normalize the data and add structure to it. We will also talk about using word representations to make search intelligent. The specific use cases we will discuss are synonym detection and entity recommendation.
In this talk, we will focus on the word-embedding techniques explored, the metrics used to evaluate natural language processing models, the tools built, and future work as part of improving search.
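As a toy illustration of embedding-based synonym detection (tiny hand-made vectors, not Workday's actual models), nearest neighbors by cosine similarity surface candidate synonyms:

```python
import math

# Tiny hand-crafted vectors; real systems learn embeddings (e.g. word2vec)
# from large corpora of enterprise text.
EMBEDDINGS = {
    "salary":       [0.90, 0.10, 0.00],
    "compensation": [0.85, 0.15, 0.05],
    "vacation":     [0.10, 0.90, 0.20],
    "pto":          [0.12, 0.88, 0.25],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def synonym_candidates(word, threshold=0.95):
    """Words whose embedding lies close to `word`'s are synonym candidates."""
    v = EMBEDDINGS[word]
    return [w for w, u in EMBEDDINGS.items()
            if w != word and cosine(u, v) >= threshold]
```

The same similarity machinery also supports entity recommendation: recommend items whose embeddings are nearest to the entity the user is viewing.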
Speaker
Namrata Ghadi, Workday Inc, Software Development Engineer (Data Science)
Adam Baker, Workday Inc, Sr Software Engineer
This presentation will cover how all aspects of marketing have evolved over the years, how AI will shape the marketing landscape in the years to come, and why marketers need AI to assist them in their jobs. The future lies in working toward a better customer experience, and customer retention in particular appears to be the key.
Marketers have to stay on the lookout throughout: they need to keep learning and keep a continuous tab on the customer’s pulse in order to deliver their best.
Invited Talk at Modern Data Management Systems Summit on August 29-30, 2014 at Tsinghua University in Beijing, China.
http://ise.thss.tsinghua.edu.cn/MDMS/English/program.jsp
Abstract:
Modern enterprises are increasingly relying on complex analyses on large data sets to drive business decisions. Tasks such as root cause analysis from system logs and lead generation based on social media, customer retention and digital marketing are rapidly gaining importance. These applications generally consist of three major analytic phases: text analytics, semi-structured data processing (joins, group-by, aggregation), and statistical/predictive modeling. The size of the datasets in conjunction with the complexity of the analysis necessitates large-scale distributed processing of the analytical algorithms. At IBM we are building tools and technologies based on declarative languages to support each of these analytic phases. The declarative nature of the language abstracts away the need for programmer-optimization. Furthermore, the syntax of these languages is designed to appeal to the corresponding communities. As an example for statistical modeling, we expose a high-level language with syntax similar to R -- a very popular statistical processing language.
In this talk I will give an overview of some real-world big data applications we are currently working on and use that to motivate the need for declarative analytics consisting of the three major phases discussed above. I will then describe, in some detail, declarative systems for text analytics along with a discussion on speeds, feeds and comparisons.
Interleaving, Evaluation to Self-learning Search @904Labs - John T. Kane
Presented at the Open Source Connections Haystack Relevance Conference: 904Labs' "Interleaving: from Evaluation to Self-Learning". 904Labs is the first to commercialize online learning to rank as a state-of-the-art, self-learning search ranking approach that automatically takes your customers' behavior into account to personalize search results.
Human in the Loop AI for Building Knowledge Bases - Yunyao Li
The ability to build large-scale domain-specific knowledge bases that capture and extend the implicit knowledge of human experts is the foundation for many AI systems. We use an ontology-driven approach for the creation, representation, and consumption of such domain-specific knowledge bases. This approach relies on several well-known building blocks: natural language processing, entity resolution, and data transformation and fusion. I will present several human-in-the-loop tools that target domain experts (rather than programmers) to extract the domain knowledge from the human expert and map it into the "right" models or algorithms. I will also share successful use cases in several domains, including Compliance, Finance, and Healthcare: by using these tools we can match the level of accuracy achieved by manual efforts, but at a significantly lower cost and much higher scale and automation.
BigInsights and Text Analytics.
As enterprises seek to gain operational efficiencies and competitive advantage through greater use of analytics, much of the new information they need to analyze is found in text documents and, increasingly, in a wide variety of social media sites and portals. A critical step in gaining insights from this information is extracting core data from huge volumes of text. That data is then available for downstream analytic, mining and machine learning tools. AQL (Annotator Query Language) is a powerful declarative, rule-based language for the extraction of information from text documents.
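AQL itself is declarative; as a rough procedural analogue (illustrative regex rules, not actual AQL syntax), rule-based extraction pulls structured annotations out of unstructured text:

```python
import re

# Illustrative extraction rules; AQL expresses the same idea declaratively,
# with dictionaries and span operations over documents.
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
EMAIL = re.compile(r"\b[\w.]+@[\w.]+\.\w+\b")

def extract(text):
    """Return structured annotations found in unstructured text."""
    return {
        "phones": PHONE.findall(text),
        "emails": EMAIL.findall(text),
    }

doc = "Contact Jane at jane.doe@example.com or 555-123-4567."
annotations = extract(doc)
```

The extracted records then feed the downstream analytic, mining, and machine learning tools the paragraph mentions.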
Enterprise Search in the Big Data Era: Recent Developments and Open Challenges - Yunyao Li
These are the slides used in our 3-hour tutorial at VLDB 2014.
Yunyao Li, Ziyang Liu, Huaiyu Zhu: Enterprise Search in the Big Data Era: Recent Developments and Open Challenges. PVLDB 7(13): 1717-1718 (2014)
Abstract:
Enterprise search allows users in an enterprise to retrieve desired information through a simple search interface. It is widely viewed as an important productivity tool within an enterprise. While Internet search engines have been highly successful, enterprise search remains notoriously challenging due to a variety of unique challenges, and is being made more so by the increasing heterogeneity and volume of enterprise data. On the other hand, enterprise search also presents opportunities to succeed in ways beyond current Internet search capabilities. This tutorial presents an organized overview of these challenges and opportunities, and reviews the state-of-the-art techniques for building a reliable and high-quality enterprise search engine, in the context of the rise of big data.
Natural language understanding is a fundamental task in artificial intelligence. English understanding has reached a mature state and has been successfully deployed in multiple IBM AI products and services, such as Watson Natural Language Understanding and Watson Discovery. However, scaling existing products and services to support additional languages remains an open challenge. In this talk, we will discuss the open challenges in supporting universal natural language understanding and share our work over the past few years in addressing them. We will also showcase how universal semantic representation of natural languages can enable cross-lingual information extraction in concrete domains (e.g. compliance), and show ongoing efforts toward seamlessly scaling existing NLP capabilities across languages with minimal effort.
Real-time Recommendations for Retail: Architecture, Algorithms, and Design - Juliet Hougland
Users are constantly searching for new content and to stay competitive organizations must act immediately based on up-to-date data. Outdated recommendations decrease the likelihood of presenting the right offer and make it harder to maintain customer loyalty. In order to provide the most relevant recommendations and increase engagement, organizations must track customer interactions and re-score recommendations on the fly.
Data sources have expanded dramatically to include a wealth of historical data and a constant influx of behavior data. The key to moving from predictive models, applied in batch, to models that provide responses in real time, is to focus on the efficiency of model application. The speed that recommendations can be served is influenced by:
Architecture of the recommendation serving platform
Choice of recommendation algorithm
Datastore access patterns
In this presentation, we’ll discuss how developers can use open source components like HBase and Kiji to develop low-latency recommendation models that can be easily deployed by e-commerce companies. We will give practical advice on how to choose models and design data stores that make use of the architecture and quickly serve new recommendations.
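As a minimal sketch of the serving pattern described above (an in-memory dict standing in for an HBase/Kiji entity table; all names and scores are illustrative), precomputed per-user candidates are fetched by key and re-scored on the fly against the latest interaction:

```python
# In-memory stand-in for a low-latency entity store keyed by user id;
# in production this lookup would hit HBase/Kiji, not a dict.
PRECOMPUTED = {
    "user42": [("sku1", 0.8), ("sku2", 0.6), ("sku3", 0.5)],
}

# Most recent interactions, e.g. streamed in from clickstream events.
RECENT_VIEWS = {"user42": {"sku2"}}

def recommend(user_id, k=2):
    """Fetch precomputed candidates, boost items related to the user's
    most recent activity, and return the top k."""
    candidates = PRECOMPUTED.get(user_id, [])
    recent = RECENT_VIEWS.get(user_id, set())
    rescored = [(item, score + (0.5 if item in recent else 0.0))
                for item, score in candidates]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in rescored[:k]]
```

The design point is that the expensive model work happens offline in batch, while the online path is a key lookup plus a cheap re-score, which keeps serving latency low.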
Toolboxes are collections of built-in functions that help data scientists perform tasks efficiently. The document discusses several Python toolboxes for data science like NumPy, Pandas, SciPy, and Scikit-Learn. It also covers IDEs like Jupyter Notebook that provide interactive environments for coding and data analysis. Overall, the document presents an overview of the Python toolbox ecosystem and how it enables effective data science work.
SystemT: Declarative Information Extraction - Yunyao Li
Slides used for my talk "SystemT: Declarative Information Extraction" at the event "University of Oregon Big Opportunities with Big Data Meeting" on August 8, 2014 (http://bigdata.uoregon.edu).
Activate 2018 Closing Remarks: The Future of Search & AI - Trey Grainger, Lucidworks
The document discusses current and upcoming trends in search and AI. It notes that learning to rank is becoming widely adopted and that ease of operations, scaling, and performance are top priorities. It also discusses the many skills required to deliver world-class search, including relevance, scaling, natural language processing, and personalization. The future of search is shifting toward assistive search through voice, images, and conversations to provide answers and enable actions.
Nikhil Sharma has a Master of Science in Data Informatics from USC and a Bachelor of Engineering in Electronics and Communication from M.S. Ramaiah Institute of Technology in India. He has work experience as a Software Engineer Intern at Salesforce where he designed applications for database performance analysis using Python. Previously he was a Senior Systems Engineer at Infosys where he implemented IT infrastructure for banks in various countries. His academic projects involve machine learning, data mining, and information retrieval using technologies like Python, Solr, and Caffe.
The Machine Learning Workflow with Azure - Ivo Andreev
This document provides an overview of real world machine learning using Azure. It discusses the machine learning workflow including data understanding, preprocessing, feature engineering, model selection, evaluation and tuning. It then describes various Azure machine learning tools for building, testing and deploying machine learning models including Azure ML Workbench, Studio, Experimentation Service and Model Management Service. It concludes with an upcoming demo of predictive maintenance using Azure ML Studio.
Real-time big data analytics based on product recommendations case study - deep.bi
We started as an ad network. The challenge was to recommend the best product (out of millions) to the right person in a given moment (thousands of users within a second). We have delivered 5 billion ad views over the past 24 months. To put that in context: if we served 1 ad per second, it would take about 160 years to serve 5 billion ads.
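The scale claim checks out with straightforward arithmetic:

```python
# Sanity-check the "160 years" claim: 5 billion ads at 1 ad per second.
ADS = 5_000_000_000
SECONDS_PER_YEAR = 60 * 60 * 24 * 365  # ~31.5 million seconds

years = ADS / SECONDS_PER_YEAR  # roughly 158.5 years
```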
So we needed a solution. SQL databases did not work. Popular NoSQL databases did not work. Standard data warehouse approaches (pre-aggregations, creating schemas) did not work either.
Rethinking all the problems posed by the huge data streams flowing to us every second, we built a complete solution based on open-source technologies and fresh, smart ideas from our engineering team. It is called deep.bi, and now we make it available to other companies.
deep.bi lets high-growth companies solve fast data problems by providing scalable, flexible and real-time data collection, enrichment and analytics.
It was built using:
- Node.js - API
- Kafka - collecting and distributing data
- Spark Streaming - ETL, data enrichments
- Druid - real-time analytics
- Cassandra - user events store
- Hadoop + Parquet + Spark - raw data store + ad-hoc queries
Automatic suggestion of query-rewrite rules for enterprise search - Yunyao Li
This document describes techniques for automatically suggesting query rewrite rules for enterprise search engines. It presents an algorithm that takes a query and desired document match as input and generates candidate rewrite rules by combining n-grams from the query and high-quality fields of the document. A classifier then labels candidate rules as natural or unnatural. The document also describes formulating the problem of optimizing rule selection as an NP-hard optimization problem and proposes greedy heuristic algorithms to solve it. Experiments on real enterprise search data show the techniques can effectively suggest rules and optimize rule sets to improve search accuracy.
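The two stages described above can be sketched in simplified form (toy data and function names, not the paper's actual algorithms): candidate rules pair query n-grams with high-quality document-field terms, and a greedy heuristic approximates the NP-hard rule-selection problem:

```python
def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def candidate_rules(query, doc_fields):
    """Pair query n-grams with terms from high-quality document fields
    to form candidate rewrite rules (query phrase -> field term)."""
    tokens = query.split()
    rules = set()
    for n in (1, 2):
        for gram in ngrams(tokens, n):
            for term in doc_fields:
                if gram != term:
                    rules.add((gram, term))
    return rules

def greedy_select(rules, covers, budget):
    """Greedy approximation for rule selection: repeatedly pick the rule
    that fixes the most still-uncovered failing queries."""
    selected, covered = [], set()
    rules = list(rules)
    while len(selected) < budget:
        best = max(rules, key=lambda r: len(covers.get(r, set()) - covered),
                   default=None)
        if best is None or not (covers.get(best, set()) - covered):
            break
        selected.append(best)
        covered |= covers[best]
        rules.remove(best)
    return selected
```

In the paper's pipeline a classifier would first filter the candidates down to "natural" rules before selection; that step is omitted here for brevity.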
FrugalML: Using ML APIs More Accurately and Cheaply - Databricks
FrugalML is a technique that uses machine learning to optimize usage of machine learning prediction APIs. It trains on data annotated by different APIs to learn a strategy that selects the best sequence of APIs to call within a given budget. This can achieve up to 90% lower costs or 5% better accuracy compared to using any single API. The strategy selects an initial "base" API and then may call additional "add-on" APIs based on the predictions and quality scores from previous APIs. FrugalML is proven to efficiently learn the optimal strategy and outperforms commercial APIs on various tasks and datasets in both cost and accuracy.
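The base/add-on idea can be sketched as a confidence-gated cascade (toy stand-in APIs and numbers, not the actual FrugalML learner, which learns the strategy from annotated data):

```python
# Toy stand-ins for commercial prediction APIs: each returns a label
# and a quality/confidence score, and has a fixed per-call cost.
def cheap_api(x):
    return ("cat", 0.55) if x == "blurry" else ("cat", 0.95)

def expensive_api(x):
    return ("dog", 0.99) if x == "blurry" else ("cat", 0.99)

def frugal_predict(x, budget, base_cost=1.0, addon_cost=5.0, threshold=0.9):
    """Call a cheap base API first; escalate to the expensive add-on API
    only when the base confidence is low and the budget allows it."""
    spent = base_cost
    label, conf = cheap_api(x)
    if conf < threshold and spent + addon_cost <= budget:
        label, conf = expensive_api(x)
        spent += addon_cost
    return label, spent
```

Most inputs are answered by the cheap call alone, which is where the claimed cost savings come from; only hard inputs pay for the add-on API.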
Data Analytics and Artificial Intelligence in the era of Digital Transformation - Jan Wiegelmann
The document discusses how data analytics and artificial intelligence are transforming businesses in the era of digital transformation. It covers the history and evolution of AI from early neural networks to today's deep learning approaches enabled by massive increases in data and computing power. Examples are given of how AI is now exceeding or matching human-level performance in areas like image recognition, medical diagnosis, and speech recognition. The document advocates that businesses leverage AI, data science, and a 360-degree view of customer data to drive personalization, predict customer needs, optimize operations, and gain competitive advantages in their industries.
Enterprise Search – How Relevant Is Relevance? - Sease
Enterprise search is the outlier in search applications. It has to work effectively with very large collections of un-curated content, often in multiple languages, to meet the requirements of employees who need to make business-critical decisions.
In this talk, I will outline the challenges of searching enterprise content. Recent research is revealing a unique pattern of search behaviour in which relevance is both very important and yet also irrelevant, and where recall is just as important as precision. This behaviour has implications for the use of standard metrics for search performance (especially in the case of federated search across multiple applications) and for the adoption of AI/ML techniques.
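Precision and recall, the two metrics in tension here, are straightforward to compute from a result set and a set of relevance judgments (toy document ids below):

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved docs that are relevant.
    Recall: fraction of relevant docs that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
```

The enterprise twist is that a missed relevant document (low recall) can be as costly as noise in the result list (low precision), which is why standard precision-oriented web-search metrics transfer poorly.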
Kairntech at SDC Nice, Apr 2019 - Stefan Geißler
Describes the Kairntech approach to real-world NLP/AI requirements, putting an emphasis on the quick and efficient creation and curation of training data sets.
The document discusses Luxiaoteng's model for analyzing hybrid data and performing predictive analytics. The model combines traditional analysis techniques with domain knowledge and advanced machine learning. It builds a three dimensional data analysis matrix and designs customized models to address business needs. Case studies are presented on how Netflix uses machine learning recommendations and how BBVA Compass uses natural language processing on social media to understand customer sentiment. The document also describes a seed program that tracks content influences and preferences to reinvent viewing measurement and define targeted markets.
deep.bi lets high-growth companies solve fast data problems by providing scalable, flexible and real-time data collection, enrichment and analytics.
It was built using:
- Node.js - API
- Kafka - collecting and distributing data
- Spark Streaming - ETL, data enrichments
- Druid - real-time analytics
- Cassandra - user events store
- Hadoop + Parquet + Spark - raw data store + ad-hoc queries
Automatic suggestion of query-rewrite rules for enterprise searchYunyao Li
This document describes techniques for automatically suggesting query rewrite rules for enterprise search engines. It presents an algorithm that takes a query and desired document match as input and generates candidate rewrite rules by combining n-grams from the query and high-quality fields of the document. A classifier then labels candidate rules as natural or unnatural. The document also describes formulating the problem of optimizing rule selection as an NP-hard optimization problem and proposes greedy heuristic algorithms to solve it. Experiments on real enterprise search data show the techniques can effectively suggest rules and optimize rule sets to improve search accuracy.
FrugalML: Using ML APIs More Accurately and CheaplyDatabricks
FrugalML is a technique that uses machine learning to optimize usage of machine learning prediction APIs. It trains on data annotated by different APIs to learn a strategy that selects the best sequence of APIs to call within a given budget. This can achieve up to 90% lower costs or 5% better accuracy compared to using any single API. The strategy selects an initial "base" API and then may call additional "add-on" APIs based on the predictions and quality scores from previous APIs. FrugalML is proven to efficiently learn the optimal strategy and outperforms commercial APIs on various tasks and datasets in both cost and accuracy.
Data Analytics and Artificial Intelligence in the era of Digital TransformationJan Wiegelmann
The document discusses how data analytics and artificial intelligence are transforming businesses in the era of digital transformation. It covers the history and evolution of AI from early neural networks to today's deep learning approaches enabled by massive increases in data and computing power. Examples are given of how AI is now exceeding or matching human-level performance in areas like image recognition, medical diagnosis, and speech recognition. The document advocates that businesses leverage AI, data science, and a 360-degree view of customer data to drive personalization, predict customer needs, optimize operations, and gain competitive advantages in their industries.
Enterprise Search – How Relevant Is Relevance?Sease
Enterprise search is the outlier in search applications. It has to work effectively with very large collections of un-curated content, often in multiple languages, to meet the requirements of employees who need to make business-critical decisions.
In this talk, I will outline the challenges of searching enterprise content. Recent research is revealing a unique pattern of search behaviour in which relevance is both very important and yet also irrelevant, and where recall is just as important as precision. This behaviour has implications for the use of standard metrics for search performance (especially in the case of federated search across multiple applications) and for the adoption of AI/ML techniques.
Stefan Geissler kairntech - SDC Nice Apr 2019 Stefan Geißler
Describes the Kairntech approach to real-world NLP/AI requirements, putting an emphasis on the quick and efficient creation and curation of training data sets.
The document discusses Luxiaoteng's model for analyzing hybrid data and performing predictive analytics. The model combines traditional analysis techniques with domain knowledge and advanced machine learning. It builds a three dimensional data analysis matrix and designs customized models to address business needs. Case studies are presented on how Netflix uses machine learning recommendations and how BBVA Compass uses natural language processing on social media to understand customer sentiment. The document also describes a seed program that tracks content influences and preferences to reinvent viewing measurement and define targeted markets.
Natural Language Processing Use Cases for Business OptimizationTakayuki Yamazaki
This document discusses several use cases for natural language processing (NLP) in business optimization. It begins with an overview of NLP, describing how it recognizes and understands human language through techniques like named entity recognition, part-of-speech tagging, sentiment analysis, and text classification. The document then outlines seven NLP use cases: using NLP for epidemiological investigations, security authentication, brand and market research, customer support chatbots, competitive analysis, automated report generation, and real-time stock analysis based on news and reports.
Using the power of OpenAI with your own data: what's possible and how to start?Maxim Salnikov
This document provides an overview of a talk by Maxim Salnikov and Jon Jahren at Oslo Spektrum from November 7-9. It discusses using OpenAI with your own data and how to get started. Examples of enterprise use cases for generative AI are presented, such as chatbots, document indexing, and financial analysis. Tools for prompt engineering like LangChain and Semantic Kernel are introduced. Best practices for fine-tuning models on proprietary data are covered, including data formatting, training data size, and an iterative tuning process. Responsible AI techniques like grounding responses and maintaining a positive tone are also discussed.
DataScientist Job : Between Myths and Reality.pdfJedha Bootcamp
Swipe through the smoke and mirrors and learn about the "sexiest job of the 21st century" with Nicola, Machine Learning Scientist @ Bumble
✨ Artificial Intelligence? Business Intelligence? Data Science? What do these terms sound like when put into action at one of the world's most forefront dating platforms? Jedha is proud to host an evening with Nicola Ghio, Senior Machine Learning Scientist at Bumble, who will give us a "peek behind the curtain" into what this enviable job title looks like in practice.
😎 Nicola will share some of his experiences working at Bumble. 🎯 Hear first-hand about Bumble's harassment and toxic imaging detector as well as the real skills required to work in the industry. We also look forward to hearing about Nicola's personal story, his background and his advice for those that want to dive deeper into the world of tech.
Meet Jedha 😍 Your Data and Cyber Security Bootcamp, ranked #1 in Europe (Switch Up). Our mission is to demystify the world of tech and to make its skills accessible to anyone who desires to learn. We have courses suited to all ambitions and skill levels: From beginners who have never typed a line of code in their lives right through to skilled tech professionals who want to achieve mastery. Our methods and teachers help to unlock human potential in the unlimited world of tech.
MLflow: Infrastructure for a Complete Machine Learning Life CycleDatabricks
ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To address these problems, many companies are building custom “ML platforms” that automate this lifecycle, but these platforms are limited to each company’s internal infrastructure.
In this talk, we will present MLflow, a new open source project from Databricks that aims to design an open ML platform where organizations can use any ML library and development tool of their choice to reliably build and share ML applications. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.
Accenture's report explains how natural language processing and machine learning makes extracting valuable insights from unstructured data fast. Read more. https://www.accenture.com/us-en/insights/digital/unlocking-value-unstructured-data
Future directives in erp, erp and internet, critical success and failure factorsVarun Luthra
This ppt explains Future Directives in ERP, ERP and Internet, its critical success and failure factors, Hit 'Like' button if the ppt turns out to be useful for you in any way. Enjoy :)
The document discusses digitalization through the use of domain-specific languages (DSLs). It suggests that DSLs can help accelerate development, simplify customization, and express business goals, requirements, design, and implementation in a single language. The document outlines considerations for whether an organization needs a DSL, how to structure a proof of concept, and how to ensure long term maintenance and adoption of the DSL approach.
Tom van Ees - Academic and Commercial software DevelopmentDavinci software
The document provides an overview of similarities and differences between academic and commercial software development. It discusses types of software like bespoke vs product-based and their complexities. Key factors in commercial software sales like convincing decision-makers during demos and end-users during daily use are outlined. Important aspects of making software sellable like always considering the customer, focusing on aesthetics, and frequent releases are highlighted. The role of the developer in maintaining quality, using mainstream technologies, and not becoming too specialized is also discussed.
AI for Customer Service: How to Improve Contact Center Efficiency with Machin...Skyl.ai
About the webinar
It only takes one bad interaction for a customer to abandon a service or product. Businesses are no longer just competing with other companies’ products, they’re competing with a customer’s last service experience. All contact centers worldwide are looking for new and strategic ways to increase operational performance, reduce cost, and still provide high-touch customer experiences that improve customer loyalty and highlight ways to increase revenue and productivity.
Through this webinar, we will understand how AI can augment the effort, focus and problem-solving abilities of human agents so that they can tackle more complex or creative tasks. With an abundance of data from logs, emails, chat, and voice recordings, contact centers can ingest this data to provide contextual customer service at the right time with the right way providing satisfactory customer service and retain the brand value.
What you will learn
- How organizations are building engaging interactions that deliver value to customers
- Best practices to automate AI/ML models
- Demo: How to route customer queries to the right department or professional
The document discusses emerging technological trends identified by IBM's Global Technology Outlook. It focuses on the transition to a new era of cognitive computing driven by trends like mobility, social media, cloud computing, and big data/analytics. This cognitive systems era will be characterized by systems that can understand, reason, and learn. Technologies like IBM's Watson cognitive system aim to augment human expertise by understanding natural language, generating hypotheses, and providing personalized recommendations. APIs and cloud-based platforms are also fueling a new on-demand, "as-a-service" economy where businesses can access modular business services to quickly develop solutions and create new value for customers.
This document discusses Symfony, a PHP framework. It provides an overview of Symfony, explaining that it is a PHP framework and a set of tools and development methodology. It then discusses some key aspects of Symfony, including how it provides a toolbox of prefabricated components to write less code more productively, and a methodology for structured application development. The document also summarizes how Symfony handles HTTP requests and responses through its Request and Response classes to provide an object-oriented interface for building web applications. Finally, it briefly outlines how to install Symfony distributions and configure a Symfony project.
The document describes a Driverless ML API that was created to automate machine learning workflows including feature engineering, model validation, tuning, selection, and deployment. The API uses machine learning interpretability techniques to provide visualizations and explanations of models. It aims to help scale data science efforts and enable both expert and junior data scientists to more quickly develop accurate, production-ready models. Key capabilities of the API include automated exploratory data analysis, feature selection and engineering, model selection and hyperparameter tuning using GPUs for faster training, and model interpretability visualizations.
The document describes several Watson services available on IBM's Watson Developer Cloud platform:
1) User Modeling extracts cognitive and social characteristics from user communications to help understand user preferences.
2) Question and Answer interprets questions and returns responses directly from source documents.
3) Relationship Extraction identifies entities and relationships within unstructured text.
AI & Cognitive Computing are some of the most popular business an technical words out there. It is critical to get the basic understanding of Cognitive Computing, which helps us appreciate the technical possibilities and business benefits of the technology.
Driving Customer Loyalty with Azure Machine LearningCCG
Learn how you can leverage the elastic, on-demand processing power of Microsoft Azure to create faster, more applicable analytics by viewing this informative webinar. Data Scientist and Author, Ahmed Sherif, demonstrates key analytic use cases that can be spun up quickly with minimal effort and maximum return on investment. To watch the full recording of this webinar, visit http://ccgbi.com/resources/webinars/driving-customer-loyalty-with-AML
1. Natural Language Processing at Scale
For optimizing business success and customer experience
MLOps aspects
MLOps: Production and Engineering - Bay Area, March 2021
Andrei Lopatenko, VP Engineering, Zillow
2. What’s the focus of this talk
How to implement and use natural language processing within your organization, at scale, for large business impact, with low development and infrastructure costs
How to solve many different business problems with NLP and improve customer experience
How to build NLP development processes and ops that serve your business
3. NLP at Scale
Building NLP systems ‘at scale’
At scale means both:
1. For multiple business tasks: building systems for wide adoption within the company, covering very heterogeneous tasks related to processing natural language - online requests of your customers/users, documents, etc.
2. For high load: the number of user requests per day/second, the number of documents to be processed per second, the number of documents to be processed in one batch, etc.
Doing it the right way has a high ROI
I would like to advocate that building company-wide, deep-impact NLP systems is relatively ‘easy’ now vs 5-10 years ago: it’s doable within a relatively short period of time, with small investments and low maintenance costs, but with big business and customer experience impact
4. Why I am talking about it
I have been applying NLP at Google, Apple, WalmartLabs, eBay, and Zillow since 2006.
Core contributor to core ranking in Google Search (2006), co-founder of Apple Maps Search (2010), core contributor to App Store Search and Walmart search; led the Walmart (2014) and eBay Search Science teams and engineering of the Recruit Holdings AI Lab; leading Zillow Search and Conversational AI (2019-now). Startups: Ozlo (NLP / conversational AI startup, acquired by Facebook in 2017).
In every organization I worked for, NLP was one of the key technologies driving business and customer experience
In 2021, thanks to an abundance of NLP tools from development to serving, building big-impact NLP systems is more accessible than even several years ago
I’d like to share my 15 years of experience building NLP systems at scale for customer and business gains
5. Motivating examples: NLP use cases. Case 1
Customer-facing online systems: Search
Example: Web, Maps, real estate, eCommerce, Apps, and any other big search engines
https://blog.google/products/search/search-language-understanding-bert/
BERT - new NLP models radically changed search for ~10% of queries (reported in 2019)
But ‘old’ NLP techniques such as synonym expansion, term weighting, shallow parsing, phrase chunking, query classification, and many others have been driving the majority of the online search experience since the early 2000s
This applies to any search engine (eCommerce, apps, films, real estate): NLP radically improves the quality of search results, leading to better customer experience and to revenue through purchases
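As a concrete illustration of how a classical technique such as synonym expansion works, here is a minimal, stdlib-only Python sketch; the synonym dictionary and function name are invented for illustration, not taken from any real search stack:

```python
# Toy synonym expansion for query understanding.
# The synonym dictionary is illustrative; production systems mine synonyms
# from query logs, click data, and curated resources.
SYNONYMS = {
    "couch": ["sofa"],
    "tv": ["television"],
}

def expand_query(query):
    """Turn each query term into a group of interchangeable terms
    (the retrieval engine would OR the terms within each group)."""
    return [[term] + SYNONYMS.get(term, []) for term in query.lower().split()]
```

For example, `expand_query("tv stand")` yields `[["tv", "television"], ["stand"]]`, which a retrieval layer can turn into `(tv OR television) AND stand`.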
6. Motivating examples: NLP use cases. Case 2
Customer-facing online systems: Recommendations
Example from Zillow: embeddings representing information about properties, extracted from full text, help with online recommendations and other downstream applications, e.g. similar homes recommendations.
https://www.zillow.com/tech/improve-quality-listing-text/
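The "similar homes" idea above boils down to nearest-neighbor search over listing embeddings. A minimal sketch with plain Python lists (the listing ids and vectors are made up; real systems use learned embeddings and an approximate nearest-neighbor index):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def similar_homes(query_vec, listings, k=2):
    """Return the ids of the k listings whose embeddings are closest
    to the query listing's embedding."""
    ranked = sorted(listings, key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    return [listing_id for listing_id, _ in ranked[:k]]
```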
7. Motivating examples: NLP use cases. Case 3
Business-facing Question Answering / online
Example: Bloomberg Trade Order Management Solution
https://www.bloomberg.com/professional/blog/bloomberg-adds-new-nlp-capabilities-to-toms/
Questions such as “Who are our top 5 accounts in the tech sector?”
Natural-language question answering over unstructured text (documents - find the paragraph answering “what’s our return policy”) and over structured information in databases (“how many umbrellas did we sell last week”) (Natural Language Interface to Databases, NLIDB)
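A toy sketch of the NLIDB idea for the Bloomberg-style question above: one hand-written pattern mapped to a SQL template. The table and column names are invented; real NLIDB systems use semantic parsing over the actual schema (and must guard against SQL injection, which this sketch does not):

```python
import re

# One illustrative question shape -> SQL template.
TOP_N = re.compile(r"top (\d+) accounts in the (\w+) sector", re.IGNORECASE)

def question_to_sql(question):
    """Map a supported natural-language question to SQL, else None."""
    m = TOP_N.search(question)
    if not m:
        return None
    n, sector = m.groups()
    return (f"SELECT account FROM accounts WHERE sector = '{sector.lower()}' "
            f"ORDER BY revenue DESC LIMIT {n}")
```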
8. Motivating examples: NLP use cases. Case 4
Analysis of conversations / (near) real-time streaming
Fascinating example: a school project on prioritization of emergency dispatch calls - three high school participants built a system to analyze emergency phone calls and assess their priority
https://medium.com/ai4allorg/using-natural-language-processing-to-prioritize-emergency-dispatch-calls-ab830a72de98
A recent example: the European company Corti deploys a real-time system to analyze calls and detect cardiac arrests; it is more voice analysis (an area close to NLP) than NLP proper, but it typically uses MLOps and systems similar to those for NLP
https://www.theverge.com/2018/4/25/17278994/ai-cardiac-arrest-corti-emergency-call-response
Understanding phone calls: transcribing them, assessing quality of service, customer needs, and the performance of customer support/business to get customer insights, assess the quality of business agents and conversations, and extract global insights
9. Motivating examples: NLP use cases. Case 5
Customer support dialog systems (chatbots)
Example: Amazon customer support chatbot
https://lifehacker.com/use-a-chatbot-for-faster-amazon-returns-1843927743
When reporting a problem to Amazon, the chatbot solves many customer problems
Very fast and efficient; reduced costs on the human workforce
10. Motivating examples: NLP use cases. Case 6
Item understanding
Example: Amazon or Walmart marketplaces
Getting a large stream of unstructured data from various providers
Converting it into structured data (rather than embeddings as in Case 2) and understanding items, for both customer and business applications (extraction of item attributes from merchant descriptions, analysis of reviews): multiple business- and consumer-facing downstream applications, from online user experience to business analytics
https://dl.acm.org/doi/abs/10.1145/3183713.3196926
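A minimal sketch of the "extraction of item attributes from merchant descriptions" step, using stdlib regexes; the attribute names and patterns are illustrative only (production marketplaces use learned sequence-labeling extractors, as in the paper linked above):

```python
import re

# Illustrative attribute patterns for a toy electronics catalog.
PATTERNS = {
    "screen_size_in": re.compile(r"(\d+(?:\.\d+)?)[ -]?inch"),
    "color": re.compile(r"\b(black|white|silver|red|blue)\b"),
}

def extract_attributes(description):
    """Pull structured attributes out of an unstructured item description."""
    text = description.lower()
    attrs = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            attrs[name] = match.group(1)
    return attrs
```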
11. Motivating examples: summary
NLP is already used to improve customer experience and business in many very different types of businesses, across different customer experiences and business applications. It has high ROI if applied correctly (right applications, right technologies, right people)
There are very different ways to apply NLP - online, streaming, batch processing - frequently requiring different types of systems; the task is to build MLOps for all of them
These 6 use cases are a mostly random list: there are many other NLP use cases (autocomplete online, classification by category offline, ...) where NLP radically improves customer experience and brings big business gains
12. NLP: impact on business
Most of the media hype about NLP is about chatbots
In 2020, most of the business impact of NLP is in other areas (though conversational systems are useful too)
Better search, recommendation, and personalization based on NLP -> billions of dollars
Document understanding, better classification, information extraction -> hundreds of millions of dollars
13. NLP serving scenarios
Online scenarios: customer-facing applications with critical latency and throughput, such as search query understanding - up to 100s of models, 50 ms latency, 10,000+ qps; also business-facing (question answering)
Streaming scenarios: documents, items; processing relatively large texts (10K symbols), 100 million per day
Batch scenarios: documents, users (process a billion documents to extract data; the latency requirement might be to process the batch within a day, or 4 hours, etc., depending on business needs)
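The online numbers above (50 ms latency at 10,000+ qps) already pin down a key capacity figure via Little's law: requests in flight = arrival rate × time in system. The helpers below are just that arithmetic; the per-replica concurrency figure in the example is an assumed sizing number, not from the talk:

```python
import math

def required_concurrency(qps, latency_s):
    """Little's law: average number of requests in flight."""
    return qps * latency_s

def required_replicas(qps, latency_s, concurrency_per_replica):
    """Model servers needed if each sustains a fixed number of
    concurrent requests (an assumed sizing figure)."""
    return math.ceil(required_concurrency(qps, latency_s) / concurrency_per_replica)
```

At 10,000 qps and 50 ms, `required_concurrency(10000, 0.05)` gives 500 requests in flight at any moment; with, say, 8 concurrent requests per replica, that is 63 replicas before any headroom.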
14. NLP systems
But applying NLP is not just about training models; it’s about building systems
Systems which will:
1. Serve models in production (various serving scenarios)
2. Train models (science workbenches)
3. Annotate data
4. Deploy models from lab to production
5. Test, validate, and monitor models (performance, accuracy, compliance, fairness; both models and end-to-end systems)
6. Integrate model serving with the instream of production data
7. Integrate with outstreams: consumer- and business-facing applications
15. NLP Systems
The majority of big business revenue and customer experience gains come not from the most recent, best NLP models (‘the best science’)
But from ‘the best engineering’: high-performance, reliable, robust, scalable systems that are integratable with multiple business and consumer applications, monitorable, and debuggable
The focus is on reliability, robustness, performance, operational excellence, development engineering quality, and openness for collaboration (across functions: data engineering, NLP scientists, DS scientists, application developers, etc.)
Once the system works and brings value, state-of-the-art models (accuracy, no bias, fairness, performance) become the focus
16. NLP systems
Scientist workbench: access to data sets (from large corpora - web or search logs - to ‘small’ ones), annotation tools, data processing and data management, metric tools, model training, tuning, model management (sharing, storing, retrieving)
Deployment tools: model validation, deployment into various environments (integrated with CI/CD), model management
Inference: model workflows, monitoring, alerting, online validation, performance measurement / ‘observability’, hardware allocation / scaling
Integration with instream and outstream
17. NLP systems: high level, 3 ways
Cloud native: build the system using standard cloud components
From scratch: write your own from scratch
Hybrid: use big open source or cloud blocks for certain tasks (there are plenty of those now) and custom-built systems for other tasks
18. NLP systems: Cloud native
Multiple ways to build NLP systems
High-level cloud NLP and ML services: Amazon Comprehend, SageMaker, SageMaker Ground Truth, Transcribe (for speech to text), Text Analytics, Lex, Textract
Pros: very fast to develop a prototype and make working systems, low development costs, easy to integrate with other systems on the same cloud (Redshift if Amazon, etc.), low-cost operations for a managed solution
Cons: high cloud/compute costs, low flexibility in the types of models to develop and limited opportunities to develop high-accuracy models, performance is not optimized, harder integration with non-cloud systems
19. NLP systems: Cloud native
Advantage: fast MLOps pipeline development
Plenty of tools: S3 for models and artifacts, CloudFormation, AWS CodePipeline and CodeBuild (with Git), ECR Container Registry, SageMaker, AWS Batch, API Gateway, SageMaker Pipelines plus NLP services (Comprehend) - very fast to build and prototype NLP systems
Another advantage: reasonably easy to adapt to your environment - Terraform instead of CloudFormation, your serving infrastructure instead of AWS
Easy to build multi-environment deployment scenarios
20. NLP Systems: Built from scratch
Built from scratch or based on (rewritten if needed) open source
Abundance of open source: PyTorch serving / TF Serving, Hugging Face Transformers, the AllenNLP lab environment, spaCy (plenty of other NLP libraries), Doccano (annotations)
Pros: more opportunities to optimize model accuracy and system performance, customization for your company's needs, owning the software
Cons: longer prototype and production development times, high operational support costs
21. NLP Systems: Built from scratch
Might be necessary - example: your NLP models as part of a query understanding stack; 100s of models, GB+ dictionaries, complicated dependencies, specialized hardware (flash-drive storage, etc.) required on many nodes, latency- and throughput-critical.
There is no good available software to serve this scenario. Nevertheless, part of this stack can be based on open source (training, model sharing, annotation, analysis of experiments, monitoring)
22. NLP Systems: Mix of cloud and custom built
Mix of cloud and custom-built software:
Cloud solution for serving: SageMaker, AWS Batch, Elastic Inference - different scenarios
Or Kubeflow on AWS
Pros: quite rapid development and deployment
Cons: cloud costs are higher than in the build-from-scratch scenario but lower than in the cloud-native scenario; development costs are cheaper than build-from-scratch but higher than cloud native
23. NLP Systems: Mix of cloud and custom built
Multiple deployment scenarios (managed Kubernetes vs your own)
Requires support to build custom extensions (Kubeflow operators for your serving frameworks; not all native Kubeflow operators are good - some require work to improve)
Many high-level tools are available: Kubeflow, Cortex (from Cortex Labs), Hydrosphere (managing, monitoring models), Seldon (serving), Neptune (experiment management), MLflow (experiment, model, and data tracking, deployment, model registry), Comet (experiment tracking, comparison)
And low-level tools: Istio, Kubernetes, Prometheus
24. NLP Systems. Mix of cloud and custom built
Kubeflow example: streaming and non latency/qps critical online serving
Kubeflow Pipelines (end-to-end orchestration)
TF Serving, Seldon (serving)
Jupyter, Katib, ModelDB, TFMA (TF Model Analysis), TF Transform (training,
workbench)
PyTorch, TensorFlow, MXNet
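As a concrete illustration of the serving layer in the stack above, a minimal TF Serving model-config fragment might look like the sketch below; the model name and storage path are invented for the example:

```
model_config_list {
  config {
    name: "query_classifier"                       # hypothetical model name
    base_path: "s3://my-bucket/models/query_classifier"  # hypothetical path
    model_platform: "tensorflow"
  }
}
```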
25. NLP systems development and adoption timeline
NLP is relatively new for many businesses, there is a lot of excitement and a lot of uncertainties
in expectations
To prove value, one has to iterate very fast: build NLP systems and models rapidly, integrate
them with business systems and environments quickly, with minimum development (human +
software) costs - and show the value from the business and customer points of view.
Build a cloud native system fast to show the value to the business; as it scales by the number of
consumers, lines of business, data, and other loads, move to other architectures if needed to
improve performance and costs. It is important to build a good, evolvable design from the
beginning (this is true for any system - evolvability is as important as scalability)
26. NLP systems - tradeoff
Tradeoffs arise from the difference in methods: classical vs deep neural
1. Inference: a 10% model accuracy difference vs a 90% latency difference (gain in customer
experience/conversion due to quality vs loss in customer experience due to
latency and op costs)
2. Training: e.g. 1 billion documents, results needed in 4 hours - training time matters
● The design must support running very different solutions inside
● The organizational structure must support taking such decisions
● Analytics/ROI assessment must provide proper data as input to make
such decisions
Many other tradeoffs
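The inference tradeoff above can be made concrete with a back-of-the-envelope calculation. Every number here is an illustrative assumption: a 5% baseline conversion, a deep model with a 10% relevance lift but ~90 ms extra latency, and a 1% conversion penalty per extra 100 ms.

```python
# Hypothetical comparison of a classical vs a deep model for the
# inference trade-off: accuracy gain vs latency penalty. Illustrative numbers.

def expected_conversion(base_rate, accuracy_lift, latency_ms,
                        latency_penalty_per_100ms):
    """Conversion after applying a relevance lift and a latency penalty."""
    lift = base_rate * accuracy_lift
    penalty = base_rate * latency_penalty_per_100ms * (latency_ms / 100.0)
    return base_rate + lift - penalty

base = 0.050  # 5% baseline conversion (assumed)

# Classical model: fast but no relevance lift.
classical = expected_conversion(base, accuracy_lift=0.00, latency_ms=10,
                                latency_penalty_per_100ms=0.01)
# Deep model: +10% relevance lift, but ~90 ms slower.
deep = expected_conversion(base, accuracy_lift=0.10, latency_ms=100,
                           latency_penalty_per_100ms=0.01)

print(f"classical: {classical:.4f}, deep: {deep:.4f}")
```

Under these assumptions the deep model still wins; with a steeper latency penalty or a thinner accuracy gain, the classical model would. The point is that the decision needs this kind of data, not model metrics alone.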
27. Scalability by design
When building, it is important to design systems to be scalable in multiple dimensions: it is hard to
overestimate future demands
1. The number of human languages and domain area languages
2. The load (qps for online systems, messages per second for streaming, the number of
documents per batch and the number of batches for batch systems)
3. The number of different models and the number of different types of models (extraction,
classification, correction, text prediction, text generation etc)
4. The number of developers and scientists working simultaneously deploying new models, new types
of data, new integrations etc
5. The number of metrics by which the system and the models are monitored
6. The amounts of data in training, serving
7. The number of use cases, the number of deployments (data centers, regions, nodes)
8. etc
28. NLP at Scale
Important factor: typically, there are very different serving scenarios - from, for
example, online search (dozens/hundreds of models with multi-gigabyte 'dictionaries',
some run in parallel, some sequentially, 50ms latency, 10^4+ qps) to
streaming (a billion documents per day) to batch processing. No one system will
serve all inference cases; it is necessary to build multiple systems
But training, verification, and testing scenarios are more unifiable, and it is possible to
build one scientist workbench / lab environment. It is beneficial to build one to share
data and models across the organization
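The load figures above translate into capacity numbers with simple arithmetic; the per-node concurrency figure below is an assumption for illustration, not a measurement:

```python
# Illustrative capacity arithmetic for the serving scenarios above.
# The qps and document-volume figures come from the slide; the per-node
# concurrency is an assumed example value.

SECONDS_PER_DAY = 24 * 60 * 60

# Streaming: one billion documents per day.
docs_per_day = 1_000_000_000
stream_qps = docs_per_day / SECONDS_PER_DAY   # sustained docs/sec needed

# Online: 10^4 qps at a 50 ms budget. Assuming 100 concurrent requests
# per node, each node sustains 100 / 0.050 = 2000 qps.
online_qps = 10_000
per_node_qps = 100 / 0.050
nodes_needed = online_qps / per_node_qps

print(f"streaming: {stream_qps:.0f} docs/sec, online: {nodes_needed:.0f} nodes")
```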
29. Ops
Continuous retraining of models when needed
Support for frequent deployment of models as models are improved and new
models are deployed. Integration of the NLP scientist workbench with the production
environment. Validation of models
Scalability: how the system scales as traffic, stream or batch size, document size,
the number of models run in parallel, and other load parameters
change
Monitoring for performance, incidents, exceptions, quality of models and
end-to-end applications based on the NLP
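The "validation of models" step above can be sketched as a simple promotion gate that compares a candidate against the production model before deployment; metric names and tolerances here are illustrative:

```python
# Minimal model-promotion gate (sketch): a candidate is promoted only if it
# does not regress the production metrics beyond a per-metric tolerance.
# Metric names, values, and tolerances are invented for the example.

def should_promote(prod_metrics, candidate_metrics, tolerances):
    """Return True if the candidate is within tolerance on every metric."""
    for name, prod_value in prod_metrics.items():
        allowed_drop = tolerances.get(name, 0.0)
        if candidate_metrics[name] < prod_value - allowed_drop:
            return False
    return True

prod = {"f1": 0.91, "recall": 0.88}
candidate = {"f1": 0.92, "recall": 0.87}
tolerances = {"recall": 0.02}  # accept a small recall drop for an F1 gain

print(should_promote(prod, candidate, tolerances))  # True in this example
```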
30. Ops
Monitoring - what may go wrong:
1. Model performance : model and end-to-end (overall, by segments: users,
categories, regions)
2. Global data changes (changes in global distributions caused by events or seasonal
shifts.. )
3. Incoming data quality issues
4. System performance, uptime
5. Bias, compliance, fairness
6. Significant changes, outliers
Monitoring, alerting, logging
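Point 2 above (global data changes) is often monitored with a distribution-distance score. A minimal sketch using the population stability index (PSI) on pre-bucketed counts, with invented numbers:

```python
# Sketch of drift monitoring with the population stability index (PSI).
# Buckets and counts are illustrative, e.g. query-length buckets per day.

import math

def psi(expected_counts, actual_counts):
    """Population stability index between two bucketed distributions."""
    e_total, a_total = sum(expected_counts), sum(actual_counts)
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, 1e-6)  # avoid log(0) on empty buckets
        a_pct = max(a / a_total, 1e-6)
        score += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return score

baseline = [300, 400, 300]  # e.g. last month's distribution
today    = [100, 300, 600]  # seasonal shift toward the last bucket

score = psi(baseline, today)
# A common rule of thumb: PSI > 0.2 warrants an alert.
print(f"PSI = {score:.3f}, alert = {score > 0.2}")
```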
31. NLP libraries
(separation is conditional, many of them are in both categories)
‘Old’ good technologies: hidden Markov models, conditional random fields, SVMs for classification, PCFGs and Dirichlet processes
and software: Stanford CoreNLP, CRFsuite, CRF++, OpenNLP, MeTA, SEMPRE, MALLET (still useful in some scenarios) - tradeoffs are
in the next slides
‘New’ technologies: spaCy, Gensim, Hugging Face Transformers (invaluable by now), fastText, AllenNLP (lab environment),
PyTorch-NLP, Flair, DeText, many others
A lot of academic open source code is adaptable to industrial environments (see Papers with Code, NLP section)
High-level libraries helping to build end-to-end solutions for some domains: Rasa (dialog systems)
Do not hesitate to get inside the open source: Stanford library performance was improved 10X by a proper multithreaded
implementation, and that makes a big difference when you need to process a stream of large documents at 40+ million tokens per hour
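The multithreading remark above can be sketched with a stdlib thread pool; `annotate` is a stand-in for an expensive native NLP call (native libraries typically release the GIL, so a thread pool helps; for pure-Python CPU-bound work a `ProcessPoolExecutor` would be used instead):

```python
# Sketch: fan a document stream out to a pool of workers instead of
# processing it serially. `annotate` is a placeholder for a real NLP call.

from concurrent.futures import ThreadPoolExecutor

def annotate(doc):
    """Placeholder for an expensive NLP annotation call."""
    return len(doc.split())  # e.g. a token count

docs = ["short text", "a somewhat longer document", "one more"]

with ThreadPoolExecutor(max_workers=4) as pool:
    # map preserves input order even though work runs concurrently
    token_counts = list(pool.map(annotate, docs))

print(token_counts)
```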
32. Team
To build, support, and use the system successfully,
strong engineering, science, and product management are required
Modern NLP stack based on deep neural architectures -> BERT and others
Deep understanding of cloud ML infrastructure if you are on the cloud (e.g.
AWS ML infrastructure)
Generic software engineering - building systems rather than just models
Engineering culture, Ops
33. Data Training sets
Many NLP models are re-usable for many tasks
Your company operates in a certain domain - such as eCommerce, real estate,
medical, or transportation - with its particular language. Models and knowledge
that learned the particularities of the domain language for one use case may be
re-usable for other cases in the same domain (via various techniques). Model
discovery and re-sharing simplify adoption of NLP across multiple lines of
business
Training and testing data re-sharing accelerates model development and
NLP adoption
34. NLP Training sets and Metrics
Training sets are important as they train your models for something important/beneficial, and
metrics are important if they contribute to measuring the final impact
What are the classification tasks that will benefit your business (improving conversion or purchase
rate for search, better routing of phone calls or customer support tickets)? What are the extraction
tasks that will benefit your business (what knowledge graph do you need for better search or
recommendation, which entities are important for browsing by your business agents, etc.)? Not
‘what is the perplexity of the language model’ but ‘how many symbols does the customer type in
autocomplete’ or ‘what percentage of spelling errors is solved’ - what will improve end-to-end
quality and performance
Focus on end-to-end performance rather than classical NLP-level metrics only, and mature
your systems by developing them to impact end-to-end quality and performance
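The autocomplete metric mentioned above ("how many symbols does the customer type") can be computed directly from session logs; the log records here are invented for illustration:

```python
# Sketch: keystrokes saved by autocomplete, relative to typing the full
# query. The session records are illustrative.

def keystrokes_saved(sessions):
    """Fraction of characters the user did not have to type."""
    typed = sum(s["chars_typed"] for s in sessions)
    total = sum(len(s["final_query"]) for s in sessions)
    return 1.0 - typed / total

log = [
    {"final_query": "2 bedroom apartment", "chars_typed": 5},
    {"final_query": "houses for sale", "chars_typed": 6},
]

print(f"{keystrokes_saved(log):.2f}")  # fraction of characters saved
```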
35. Continuous improvement circle
Almost no real-world NLP task has a final ‘perfect’ solution <- this needs a lot of
leadership support to promote this vision and align with the business; incremental gains in system
performance and model accuracy mean gains for the business (but one needs to build the system,
measurement, and attribution framework to execute well on it)
Each NLP model can be improved in accuracy, perplexity, etc., but what really matters is the impact
on the end-to-end system - conversion, revenue per session, document processing time, etc.
Each NLP system can be improved from performance, scalability, and cloud-cost points of view.
Improvement of NLP models and NLP systems has high ROI if done correctly, but doing it
correctly requires a lot of work: end-to-end analysis of systems rather than just model evaluation,
attribution analysis, etc. In a big business, building such an environment and an organization to
improve NLP pays back
36. ROI assessment. Expenses
Expenses: salaries + software costs + compute/storage costs + data annotation costs
A small team of several good experts can create an NLP system, integrate it with the business
within your company, and prove its value. You do not need more than 5 people to solve
serious tasks
Software costs: most business cases can be solved using open source software -
Hugging Face Transformers, PyTorch Serve or TF Serving, etc.; the whole infrastructure for
training and serving (and other tasks, such as annotation) can be built using open source
Compute/storage costs: it depends. AWS Comprehend etc. are more expensive and less flexible,
but enable fast prototyping. GPU machines are needed in many cases
Data annotation: with transfer learning you do not need huge data sets.
37. ROI assessment. Returns
For some tasks, such as search/recommendation functions directly facing the
consumer, the return is easily computed by running online controlled experiments.
For some tasks, such as business-facing functions (e.g. document classification for
faster processing, question answering for agents), the return is harder to
compute, since one needs to run the new business operation for a period of time to
measure impact
For some tasks - replacing humans for information extraction, or question answering for
consumer/customer support - the return is computed from the number of people
replaced.
Key: build solutions rapidly, to experiment and find the maximum returns.
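Pulling the expense and return sides of the two slides above together, a back-of-the-envelope ROI sketch; every number here is an illustrative assumption, not data from the talk:

```python
# Hypothetical ROI arithmetic: expenses (salaries + compute + annotation)
# vs returns measured by an online controlled experiment. All numbers
# are illustrative assumptions.

team_cost       = 5 * 200_000   # 5 people, fully loaded (assumed)
compute_cost    = 150_000       # GPUs, storage, managed services (assumed)
annotation_cost = 50_000        # labeling budget (assumed)
expenses = team_cost + compute_cost + annotation_cost

# Return: a +0.5% conversion lift on $500M of annual attributable revenue.
annual_revenue = 500_000_000
lift = 0.005
returns = annual_revenue * lift

roi = (returns - expenses) / expenses
print(f"expenses=${expenses:,}, returns=${returns:,.0f}, ROI={roi:.2f}")
```

Under these assumptions the investment roughly doubles its cost in a year; the same arithmetic with a small revenue base or no measurable lift goes negative, which is why the attribution framework from the previous slides matters.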
38. Conclusion
NLP systems bring significant gains to business and customer experience
Building them is a relatively easy task. There are multiple open source libraries,
multiple cloud solutions, and multiple alternatives for how to build an NLP system
for your company.
The task of building and using NLP typically has high ROI if approached correctly