Volodymyr Lyubinets. One startup's journey of building ML pipelines for text tasks
1. One startup’s journey of building ML pipelines for text tasks
UA Online DS Marathon 2020
Volodymyr Lyubinets
2. Myself
• Founding Engineer at Forethought AI
• Previously worked at Databricks, Facebook, LinkedIn, etc.
• Competitive programmer, ACM ICPC finalist
3. Forethought
• Use AI to accurately solve customer support issues
• Service the entire lifecycle of a customer support request
• Series A startup, raised $10M led by NEA
• TechCrunch Disrupt ‘18 winner
4. Product overview - Solve
● Attempt to answer the question before an agent has to be involved
6. Product overview - Assist
● Tool for customer support agents to get relevant documents from various sources
8. How the product works
01 Data ingest
Continuously ingest data from helpdesks, internal wikis, the web, etc.
02 Train ML models
> Solve and triage are classification tasks
> Assist is a question answering task
03 Serve ML models
… and other product features, efficiently.
9. Question answering
● Answer a question given in natural language (the task comes in various flavours)
● Became a popular NLP task after the SQuAD competition
○ Given Wikipedia articles and a question in natural language, find the subsequence of the text that answers the question, or declare that no answer exists.
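As a quick illustration of the task (not from the slides), a SQuAD-style extractive QA model can be exercised in a few lines with the Hugging Face transformers pipeline; the checkpoint name below is just an example:

```python
# Minimal sketch of SQuAD-style extractive QA; the checkpoint is illustrative.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
result = qa(
    question="Who led the Series A?",
    context="Forethought is a Series A startup that raised $10M led by NEA.",
)
print(result["answer"], result["score"])  # a span from the context plus a confidence
```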
10. Question answering - pipeline
● Find the right answer among 1M+ documents
● Ingest: Fetch cases ⇒ Store in Mongo ⇒ Store in Elastic/SOLR
● Answer time (new case / search): New case/search comes in ⇒ Retrieve a set of candidates from Elastic ⇒ Rank candidates using ML models for QA (BERT/XLNET)
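A hedged sketch of this retrieve-then-rank flow, assuming the official elasticsearch Python client (8.x syntax), an illustrative "articles" index, and a placeholder qa_score standing in for the BERT/XLNET ranker:

```python
# Sketch of retrieve-then-rank: cheap lexical retrieval, then expensive ML ranking.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def qa_score(query: str, doc: str) -> float:
    # Placeholder ranker: in production this would be the BERT/XLNET QA model.
    return float(len(set(query.lower().split()) & set(doc.lower().split())))

def answer(query: str, k: int = 50, top: int = 5):
    # Step 1: retrieve k candidates from Elasticsearch with a plain match query.
    hits = es.search(index="articles", query={"match": {"body": query}}, size=k)
    candidates = [h["_source"]["body"] for h in hits["hits"]["hits"]]
    # Step 2: re-rank the small candidate set with the (slow) QA model.
    return sorted(candidates, key=lambda d: qa_score(query, d), reverse=True)[:top]
```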
11. Question answering - improving data and search layer
● Trim redundant information, such as cited parts of email threads
○ Done via regexes (see the sketch after this list)
○ Going a step further: text summarization
● Configure most of ES: synonyms, boosting scores of recent documents, etc.
● Use embeddings in ElasticSearch
○ Fine-tuned on customer data, served via bert-as-service
○ Improves results, but is costly
● Going a step further: sentence- and document-level embeddings
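A minimal sketch of the regex-based trimming, with a couple of common quoted-reply patterns (the production rules are surely more extensive):

```python
# Strip quoted reply blocks from an email body; the patterns are illustrative.
import re

QUOTE_PATTERNS = [
    re.compile(r"^On .+ wrote:.*\Z", re.DOTALL | re.MULTILINE),  # "On <date>, X wrote:" and below
    re.compile(r"^>.*$", re.MULTILINE),                          # "> quoted" lines
    re.compile(r"^-{2,} ?Original Message ?-{2,}.*\Z", re.DOTALL | re.MULTILINE),
]

def trim_quotes(body: str) -> str:
    for pattern in QUOTE_PATTERNS:
        body = pattern.sub("", body)
    return body.strip()
```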
12. Question answering - model
● The basic model is BERT/XLNET trained on a QA dataset (SQUAD, MSMARCO, etc.)
○ With BERT Large (L=24, H=1024) you can only serve batches of up to 32 examples in real time
● Distillation & smaller models
○ Google recently released a whole set of smaller BERT models
○ With BERT Small (L=6, H=512) you can serve 100+, with BERT Tiny even more, at only a ~5% accuracy penalty
○ DistilBERT from Huggingface: 97% of the performance at 60% of the size
● Train on a custom QA dataset, but it’s not trivial to build one
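For illustration, swapping a large checkpoint for a distilled one is a one-line change with the Hugging Face pipeline API; the rough timing loop below (the checkpoint names are real public ones, the numbers depend entirely on hardware) shows one way to benchmark the tradeoff yourself:

```python
# Compare per-request latency of a large vs. a distilled QA checkpoint.
import time
from transformers import pipeline

question = "When was the company founded?"
context = "The company was founded in 2017 and raised a Series A the next year."

for name in [
    "bert-large-uncased-whole-word-masking-finetuned-squad",  # 24-layer BERT Large
    "distilbert-base-cased-distilled-squad",                  # distilled, much smaller
]:
    qa = pipeline("question-answering", model=name)
    start = time.time()
    qa(question=question, context=context)
    print(name, f"{time.time() - start:.3f}s")
```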
13. Question answering - model
● TL;DR: there is a tradeoff between how many results you retrieve from ElasticSearch and how accurately you can rank them.
● A/B test for the best combination
○ Engagement, usage, etc.
14. Question answering - improving speed
● Case 1: Improving tokenization speed
○ "Trotsky is a notorious criminal." ⇒ ['tr', '##ots', '##ky', 'is', 'a', 'notorious', 'criminal', '.'] ⇒ ids
○ Code for BERT / XLNET was built for research, and many speed benchmarks only measure inference time rather than end-to-end stats
○ Tokenization is done in python …
○ We have the data beforehand, and can put tokenized docs into Elastic (see the sketch below)
○ Shaved off ~0.3s from answering search requests
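A sketch of the pre-tokenization idea, assuming a Hugging Face fast tokenizer and an illustrative Elasticsearch index/field layout; documents are tokenized once at ingest time so query-time code never pays the Python tokenization cost:

```python
# Tokenize at ingest time and store ids next to the raw text in Elasticsearch.
from elasticsearch import Elasticsearch
from transformers import BertTokenizerFast

es = Elasticsearch("http://localhost:9200")
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")

def ingest(doc_id: str, body: str):
    ids = tokenizer(body, truncation=True, max_length=512)["input_ids"]
    es.index(index="articles", id=doc_id, document={"body": body, "token_ids": ids})
```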
15. Question answering - improving speed
● Case 2: Quantization with Nvidia TensorRT, served via TRTIS

Setup              Time per batch (ms)   Size (MB)
TF-serving, T4     1070                  2100
TRTIS, T4          161                   254
TF-serving, V100   319                   2100
TRTIS, V100        50.0                  254
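One hedged way to produce a TensorRT-optimized model from a TensorFlow SavedModel is the TF-TRT converter sketched below; the paths are illustrative, and the talk may well have used a different conversion route (e.g. via ONNX):

```python
# Convert a SavedModel with TF-TRT at FP16 precision, then serve the result.
from tensorflow.python.compiler.tensorrt import trt_convert as trt

converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="models/qa_bert/1",   # illustrative path
    precision_mode=trt.TrtPrecisionMode.FP16,   # reduced precision cuts size and latency
)
converter.convert()
converter.save("models/qa_bert_trt/1")          # point TRTIS/Triton at this directory
```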
16. Question answering - next steps
● Use the obtained production data to iterate further
○ Text-copy events from the sidebar app
○ Articles actually used vs. high-ranking candidates from ElasticSearch
17. Classification
● Solve: given subject and description, predict the output template (finite number of options, typically 1-10 for macros, 5-100 for articles)
● Triage: given subject and description, predict a category or a set of values (tags)
● Initially started with out-of-the-box approaches (e.g. Facebook’s fastText), switched over to BERT/XLNET architectures once those appeared.
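For reference, an out-of-the-box fastText baseline like the one the talk started with takes only a few lines; the training file path is illustrative and uses fastText's "__label__<tag> <text>" line format:

```python
# Train and query a fastText text classifier as a quick baseline.
import fasttext

# Each line of train.txt looks like: "__label__billing I was charged twice this month"
model = fasttext.train_supervised("train.txt", epoch=10, wordNgrams=2)
labels, probs = model.predict("cannot log into my account", k=3)  # top-3 labels
print(labels, probs)
```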
18. Classification - data cleaning
● In addition to the trimming we’ve done for QA models, it makes sense to redact personal identifiers if you know the format (names, phones, IDs)
● For some models, we got a ~1-2% accuracy boost when masking name tokens (see the sketch below):
"Hi ACME Support, I have a question about Max's bank account. Thanks, Linda."
⇒
"Hi ACME Support, I have a question about [REDACTED]'s bank account. Thanks, [REDACTED]."
● Rule of thumb: remove as much noise as possible
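One way to do the name masking, sketched with spaCy NER (the slides don't say which approach was used; regex rules also work when the formats are known):

```python
# Mask PERSON entities found by spaCy NER; "en_core_web_sm" is the small English model.
import spacy

nlp = spacy.load("en_core_web_sm")

def mask_names(text: str) -> str:
    doc = nlp(text)
    for ent in reversed(doc.ents):  # iterate in reverse so char offsets stay valid
        if ent.label_ == "PERSON":
            text = text[:ent.start_char] + "[REDACTED]" + text[ent.end_char:]
    return text

print(mask_names("Hi ACME Support, I have a question about Max's bank account. Thanks, Linda."))
```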
19. Classification - invest in deployment tooling
● We’ve built a custom deployment tool on top of Spinnaker
○ Dictated by the fact that model training is done on GCP to take advantage of TPU chips
○ Consider SageMaker if applicable, or newer candidates such as BentoML
● Served with tfserving and kubernetes
● Components:
○ Data collection
○ Model training
○ Model deployment
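For context, querying a tfserving deployment from Python is a plain REST call; the host, port, model name, and input shape below are all illustrative, but the /v1/models/<name>:predict endpoint with an "instances" payload is TensorFlow Serving's standard REST API:

```python
# Call a TensorFlow Serving REST endpoint and read back predictions.
import requests

resp = requests.post(
    "http://tf-serving:8501/v1/models/classifier:predict",
    json={"instances": [{"input_ids": [101, 7592, 102]}]},  # illustrative input
)
resp.raise_for_status()
predictions = resp.json()["predictions"]
```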
21. Classification - last remarks
● Distillation is very effective for classification tasks, more so than for QA
● Served on g4 instances; benchmark what’s best for you
● Combine other signals, e.g. values of pre-populated fields.
22. AI ⊂ Product ⊂ Company
● AI is still just one piece of the puzzle
● “My past cases” feature example
23. Conclusion
● Start out with “industry-standard” tooling, adapt it to your use case, then go further
● Contrasted with large orgs, the focus is more on accuracy and less on data ops
● Automate as much as possible at each step of the process
● Invest in A/B testing infrastructure
● Since 2018 NLP has been gaining significant traction; certainly more to come
● Even for “AI companies”, AI is still just a piece of the puzzle