NLP Pipeline Best Practices
Vitalii Radchenko
Data Scientist, YouScan
You’ll find out
✓ What a good pipeline is
✓ How to process input data effectively
✓ How to build a training pipeline
✓ Why declarative syntax is useful
✓ How to turn a training pipeline into a prediction pipeline with little effort
Pipeline
• a sequence of processing and analysis steps applied to data for a specific purpose
• saves design and coding time if you expect to encounter similar tasks
• simplifies deployment to production
What is a good pipeline?
• Reusable
(can be applied to different tasks with minor changes)
• Structured
(a logical chain, easy to understand)
• Documented
(fully commented, with defined argument types and a readme)
• Covered by tests
(simple checks for input data / batch shape / model output, plus unit tests for each pipeline step)
Input data in NLP
• Text is always part of the input
(a token, sentence, article, dialogue, HTML page, etc.)
• Tags/Labels
• Spans (start and end indexes)
Text Field
• For text input we need to create a text field
• A Text Field should have the following methods:
- preprocess (preprocess the text; argument – a list of preprocessors)
- tokenize (tokenize the text; argument – a tokenizer)
- index (index the text; argument – a vocabulary)
- pad (pad the text; argument – maximum number of tokens)
- as_tensor (returns a tensor; basic method for all fields)
[Diagram: Input text → Preprocess → Tokenize → Index → Pad → As tensor, all inside the Text Field]
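A minimal sketch of such a field, assuming a hypothetical vocabulary with a get_token_index method and plain callables as preprocessors and tokenizer (only the method names come from the slide; everything else is illustrative):

from typing import Callable, List
import torch

class TextField:
    """Holds one piece of text and turns it into a padded tensor of token indexes."""

    def __init__(self, text: str):
        self.text = text
        self.tokens: List[str] = []
        self.indexes: List[int] = []

    def preprocess(self, preprocessors: List[Callable[[str], str]]) -> None:
        # apply each preprocessor (HTML unescaping, URL replacement, ...) in order
        for preprocessor in preprocessors:
            self.text = preprocessor(self.text)

    def tokenize(self, tokenizer: Callable[[str], List[str]]) -> None:
        self.tokens = tokenizer(self.text)

    def index(self, vocab) -> None:
        # vocab is assumed to map a token to an integer index per namespace
        self.indexes = [vocab.get_token_index(t, namespace="text") for t in self.tokens]

    def pad(self, max_tokens: int, padding_index: int = 0) -> None:
        self.indexes = self.indexes[:max_tokens]
        self.indexes += [padding_index] * (max_tokens - len(self.indexes))

    def as_tensor(self) -> torch.Tensor:
        return torch.tensor(self.indexes, dtype=torch.long)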
Label Field
• A label field should be convertible to the appropriate model format based on the vocabulary:
- index (index the labels; argument – a vocabulary)
- as_tensor (returns a tensor with the proper shape)
[Diagram: Input labels → Index → As tensor, all inside the Label Field]
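A matching sketch for the label field under the same assumptions (a multi-label case with a multi-hot output tensor is assumed purely for illustration):

from typing import List
import torch

class LabelField:
    """Holds raw labels and converts them to a multi-hot tensor via the vocabulary."""

    def __init__(self, labels: List[str]):
        self.labels = labels
        self.indexes: List[int] = []

    def index(self, vocab) -> None:
        self.indexes = [vocab.get_token_index(label, namespace="labels") for label in self.labels]

    def as_tensor(self, num_labels: int) -> torch.Tensor:
        # multi-hot encoding with the shape the model expects
        tensor = torch.zeros(num_labels)
        tensor[self.indexes] = 1.0
        return tensor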
Other Fields
• Metadata Field – other information that will not be used for training (raw text, author, topic, etc.)
• Categorical Field – for any categorical data that can be encoded as embeddings or OHE (post type, source, etc.)
• Span Field – start, end or inside index
• Index Field – index of the right answer over the sequence
• Create your own fields that provide an "as_tensor" method, for convenient use inside your model
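A custom field only has to respect the same contract. A tiny hypothetical CategoricalField, as one example of what such a field might look like:

import torch

class CategoricalField:
    """A single categorical value (e.g. post type) encoded as an index for an embedding layer."""

    def __init__(self, value: str):
        self.value = value
        self.index_value = 0

    def index(self, vocab) -> None:
        self.index_value = vocab.get_token_index(self.value, namespace="categories")

    def as_tensor(self) -> torch.Tensor:
        return torch.tensor(self.index_value, dtype=torch.long)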
Instance
• An Instance is a collection of fields
• One sample – one instance
• Has the type Mapping[str, Field]; all fields are keyed
• Get a field as a tensor by key ("text", "labels" – in our example)
• Should contain an "index_fields" method to index all fields with the given vocabulary
[Diagram: a Text Field keyed "text" and a Label Field keyed "labels" grouped into an Instance]
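A sketch of such an instance, reusing the hypothetical fields from above:

from typing import Mapping

class Instance:
    """One sample: a mapping from field name to field."""

    def __init__(self, fields: Mapping[str, object]):
        self.fields = dict(fields)

    def __getitem__(self, key: str):
        return self.fields[key]

    def index_fields(self, vocab) -> None:
        # index every field that knows how to index itself
        for field in self.fields.values():
            if hasattr(field, "index"):
                field.index(vocab)

# one sample -> one instance:
# instance = Instance({"text": TextField(raw_text), "labels": LabelField(labels)})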
Vocabulary
• Vocabulary is a class where all vocabularies (text, labels, categories) are available by namespace
• We should be able to create a vocabulary from a list of instances, from counters or from predefined files
• Add special tokens: OOV (Out-of-Vocabulary), padding
• Should have options to get statistics, "string to index" and "index to string"
[Diagram: the Instance's "text" and "labels" fields populate the Text Vocab and Label Vocab namespaces of the Vocabulary]
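A minimal namespaced vocabulary sketch under the same assumptions (the special-token conventions and method names here are illustrative, not the exact implementation from the talk):

from collections import defaultdict
from typing import Dict, List

class Vocabulary:
    """One token<->index mapping per namespace ("text", "labels", ...)."""

    PAD = "@@PAD@@"
    OOV = "@@OOV@@"

    def __init__(self):
        self._token_to_index: Dict[str, Dict[str, int]] = defaultdict(dict)
        self._index_to_token: Dict[str, Dict[int, str]] = defaultdict(dict)

    @classmethod
    def from_instances(cls, instances: List["Instance"]) -> "Vocabulary":
        vocab = cls()
        vocab._add(cls.PAD, "text")
        vocab._add(cls.OOV, "text")
        for instance in instances:
            for token in instance["text"].tokens:
                vocab._add(token, "text")
            for label in instance["labels"].labels:
                vocab._add(label, "labels")
        return vocab

    def _add(self, token: str, namespace: str) -> None:
        if token not in self._token_to_index[namespace]:
            index = len(self._token_to_index[namespace])
            self._token_to_index[namespace][token] = index
            self._index_to_token[namespace][index] = token

    def get_vocab_size(self, namespace: str) -> int:
        return len(self._token_to_index[namespace])

    def get_token_index(self, token: str, namespace: str) -> int:
        oov_index = self._token_to_index[namespace].get(self.OOV, 0)
        return self._token_to_index[namespace].get(token, oov_index)

    def get_token_from_index(self, index: int, namespace: str) -> str:
        return self._index_to_token[namespace][index]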
Starting a pipeline schema
1. Input data to Fields
2. Fields to Instance
3. List of Instances to Vocabulary
4. Index Fields with the Vocabulary
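Put together, the four steps might look roughly like this (all class names come from the sketches above; samples, preprocessors and tokenizer are assumed to be defined elsewhere):

def build_instances(samples, preprocessors, tokenizer):
    instances = []
    for raw_text, labels in samples:
        # 1. input data to fields
        text_field = TextField(raw_text)
        text_field.preprocess(preprocessors)
        text_field.tokenize(tokenizer)
        label_field = LabelField(labels)
        # 2. fields to instance
        instances.append(Instance({"text": text_field, "labels": label_field}))
    return instances

instances = build_instances(samples, preprocessors, tokenizer)
# 3. list of instances to vocabulary
vocab = Vocabulary.from_instances(instances)
# 4. index fields with the vocabulary
for instance in instances:
    instance.index_fields(vocab)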
Preprocessing and tokenization
• Preprocessing and tokenization should be applied before setting the text into a field
• It is better to store the raw text in metadata (if possible), to track the correctness of preprocessing and tokenization
• Preprocessing is applied to raw text and returns preprocessed text
• The preprocessed text is sent to a tokenizer, which returns a list of tokens
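For example, a couple of the preprocessors named in the config later in the deck could be plain callables chained before a whitespace tokenizer. This is a simplified interpretation; the real preprocessors are not shown in the slides:

import html
import re

class HtmlEntitiesUnescaper:
    def __call__(self, text: str) -> str:
        return html.unescape(text)

class URLReplacer:
    def __init__(self, replacement: str = " urlTag "):
        self.replacement = replacement

    def __call__(self, text: str) -> str:
        return re.sub(r"https?://\S+", self.replacement, text)

def whitespace_tokenizer(text: str) -> list:
    return text.split()

preprocessors = [HtmlEntitiesUnescaper(), URLReplacer(" urlTag ")]
tokenizer = whitespace_tokenizer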
Dataset Reader
• Our pipeline schema should be used inside a dataset reader
• Each task should have its own dataset reader, in which you construct instances and the vocabulary
• Can be lazy
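A sketch of a task-specific reader that wraps the steps above. The pickle-based read path mirrors the ".p" files in the training code later in the deck, but the file format is an assumption:

import pickle
from typing import List

class MultilabelClassificationReader:
    """Task-specific reader: raw file -> list of instances."""

    def __init__(self, preprocessors, tokenizer, lazy: bool = False):
        self.preprocessors = preprocessors
        self.tokenizer = tokenizer
        self.lazy = lazy

    def read(self, data_path: str) -> List["Instance"]:
        with open(data_path, "rb") as f:
            samples = pickle.load(f)  # assumed: a list of (raw_text, labels) pairs
        return build_instances(samples, self.preprocessors, self.tokenizer)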
Iterator
• Gets output from the Dataset Reader
• Types:
- "Simple Iterator" just shuffles the data
- "Bucket Iterator" builds batches from examples of similar length
• Indexes and pads fields during training
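A bucket iterator could look roughly like the sketch below: sort instances by text length, slice into batches, shuffle the batches, and index and pad each batch to its longest example (the batching details are assumptions):

import random
import torch

class BucketIterator:
    """Groups texts of similar length into batches; indexes and pads during iteration."""

    def __init__(self, batch_size: int, vocab, shuffle: bool = True):
        self.batch_size = batch_size
        self.vocab = vocab
        self.shuffle = shuffle

    def __call__(self, instances):
        # sort by length so each batch contains similarly sized texts
        instances = sorted(instances, key=lambda inst: len(inst["text"].tokens))
        batches = [instances[i:i + self.batch_size]
                   for i in range(0, len(instances), self.batch_size)]
        if self.shuffle:
            random.shuffle(batches)
        num_labels = self.vocab.get_vocab_size("labels")
        for batch in batches:
            max_len = max(len(inst["text"].tokens) for inst in batch)
            for inst in batch:
                inst.index_fields(self.vocab)
                inst["text"].pad(max_len)
            yield {
                "text": torch.stack([inst["text"].as_tensor() for inst in batch]),
                "labels": torch.stack([inst["labels"].as_tensor(num_labels) for inst in batch]),
            }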
Trainer
• Arguments:
- Train and validation datasets (lists of Instances)
- Iterator
- Model definition
- Optimizer and loss
- Training configuration (number of epochs, shuffling, metrics)
- Callbacks (early stopping, learning-rate schedule, logging, etc.)
• The whole training process is handled by the Trainer class
[Diagram: Dataset, Iterator, Model, Optimizer, Configuration and Callbacks all feed into the Trainer]
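A skeletal training loop showing how those arguments fit together. This is only a sketch; the real Trainer also handles validation, metrics and more elaborate callbacks:

class Trainer:
    """Runs the whole training process from the pieces defined above."""

    def __init__(self, model, loss, optimizer, iterator, train_dataset, val_dataset,
                 epochs: int, callbacks=()):
        self.model, self.loss, self.optimizer = model, loss, optimizer
        self.iterator = iterator
        self.train_dataset, self.val_dataset = train_dataset, val_dataset
        self.epochs = epochs
        self.callbacks = list(callbacks)

    def train(self):
        for epoch in range(self.epochs):
            self.model.train()
            for batch in self.iterator(self.train_dataset):
                self.optimizer.zero_grad()
                output = self.model(batch["text"])
                batch_loss = self.loss(output, batch["labels"])
                batch_loss.backward()
                self.optimizer.step()
            # early stopping, LR schedules, logging and validation metrics hook in here
            for callback in self.callbacks:
                callback.on_epoch_end(epoch, self)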
Config
• a config file in JSON or XML format
• all pipeline steps are defined in the config file
• declarative syntax allows changing the experiment without changing code

Data
"data_folder": "data/training_data",
"dataset_reader": {
    "type": "multilabel_classification"
},
"preprocessing": {
    "lower": true,
    "max_seq_len": 150,
    "include_length": true,
    "label_one_hot": true,
    "preprocessors": [["HtmlEntitiesUnescaper"], ["BoldTagReplacer"],
                      ["HtmlTagReplacer", [" "]], ["URLReplacer", [" urlTag "]]]
}
"type": "bucket_iterator",
"params": {

"batch_size": 1000,

"shuffle": true,

"sort_key": { 

"field": "text", 

"type": "length" 

}
}
Iterator
Callbacks
"early_stopping": {
    "patience": 5,
    "monitor": "f1_score",
    "save_all_iterations": false
},
"tf_logging": { "path": "tf_logs" }
"type": "f1_score", 

"params": { 

"average": "micro", 

"activation": "sigmoid"

}
Metrics
"loss": {

"type": "bce_with_logits",

"params": {} 

}, 

"optimizer": {

"type": "adam",

"params": {"lr": 0.001 }
}
Optimizer and loss
"model_folder": "models",

"epochs": 100, 

"device": "1"
Trainer
"type": "mixed_rnn", 

"params": { 

"embedding_dropout": 0.3, 

"rnn_1": {

"rnn_type": "lstm", 

"hidden_cells": 100, 

"hidden_layers": 1

}, 

"rnn_2": { 

"rnn_type": "gru", 

"hidden_cells": 100, 

"hidden_layers": 1 

}, 

"aggregation_layers": { 

"types": ["max_pool", "mean_pool"] 

}, 

"activation": { "use": false }

}
Model
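The "type" keys in these config blocks imply some registry or factory that maps a type name to a class. One possible (assumed) implementation of that dispatch, shown for the model:

MODEL_REGISTRY = {}

def register_model(name):
    def decorator(cls):
        MODEL_REGISTRY[name] = cls
        return cls
    return decorator

def get_model(model_config, vocab):
    # instantiate the class named by "type" with the declared "params"
    model_cls = MODEL_REGISTRY[model_config["type"]]
    return model_cls(vocab=vocab, **model_config.get("params", {}))

@register_model("mixed_rnn")
class MixedRNN:
    # placeholder; the real model would be a torch.nn.Module built from the params
    def __init__(self, vocab, **params):
        self.vocab, self.params = vocab, params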
Saving
• Config file (you can always reproduce an experiment if you have the config file)
• Vocabulary (to match embedding indexes or labels)
• Model weights (to load the model)
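A sketch of saving those three artifacts; the paths and serialization formats are assumptions:

import json
import os
import pickle
import torch

def save_experiment(model_folder, config, vocab, model):
    os.makedirs(model_folder, exist_ok=True)
    with open(os.path.join(model_folder, "config.json"), "w") as f:
        json.dump(config, f, indent=2)           # reproduce the experiment later
    with open(os.path.join(model_folder, "vocab"), "wb") as f:
        pickle.dump(vocab, f)                    # keep token/label indexes in sync
    torch.save(model.state_dict(),
               os.path.join(model_folder, "model"))   # model weights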
Predictor
• Input data – JSON format (API format)
• A Dataset Reader adapted to the JSON format
• An Iterator for batch processing
• Options to get predictions with probabilities, return the top-5 most probable labels, change thresholds and other options for validating results
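A sketch of a Predictor that ties the prediction-time pieces together. The read_json method and the threshold logic are assumptions, not the author's exact API:

import torch

class Predictor:
    def __init__(self, model, reader, iterator, threshold: float = 0.5):
        self.model, self.reader, self.iterator = model, reader, iterator
        self.threshold = threshold

    def predict(self, data):
        # "read_json" is a hypothetical reader method for API-style JSON input
        instances = self.reader.read_json(data)
        self.model.eval()
        results = []
        with torch.no_grad():
            for batch in self.iterator(instances):
                probabilities = torch.sigmoid(self.model(batch["text"]))
                results.extend((probabilities > self.threshold).int().tolist())
        return results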
Pipeline schema
[Diagram: the Dataset Reader (input text/labels → fields → instance → vocabulary), then the Iterator, the Model and the Trainer/Predictor, all configured from the Config]
Training
# define reader
reader = get_reader(training_config["dataset_reader"], preprocessing=training_config["preprocessing"])

# load train and val datasets
train_dataset = reader.read(data_path=os.path.join(training_config["data_folder"], "train_data.p"))
val_dataset = reader.read(data_path=os.path.join(training_config["data_folder"], "val_data.p"))

# get vocabulary and set token vectors
vocab = Vocabulary.from_dataset(train_dataset + val_dataset)
vocab.set_vectors(training_config["embeddings"])

# define data iterator
iterator = get_iterator(training_config["iterator"], vocab=vocab)

# define model, optimizer, loss and metrics
model = get_model(training_config["model"], vocab)
loss = get_loss(training_config["loss"]["type"], training_config["loss"]["params"])
opt = get_opt(training_config["optimizer"], model.parameters())
metrics = get_metrics(training_config["metrics"])

# define trainer
trainer = Trainer(training_params=training_config["training_params"], model=model, loss=loss,
                  optimizer=opt, metrics=metrics)

# train model
trainer.train()
Predicting
# define reader
reader = get_reader(config["deploy"]["dataset_reader"], preprocessing=config["preprocessing"])

# load vocab
vocab = Vocabulary.from_file(os.path.join(config["model_path"], "vocab"))

# define iterator
iterator = get_iterator(config["deploy"]["iterator"], vocab=vocab)

# load model
model = load_model(os.path.join(config["model_path"], "model"), vocab=vocab)

# define predictor
predictor = Predictor(model=model, reader=reader, iterator=iterator)

# function which predicts your input
def predict(data):
    return predictor.predict(data)
Conclusions
• Abstractions make a pipeline more structured, logical and understandable
• Declarative syntax helps keep a pipeline simple and lets you set up the whole experiment in a config without changing code
• The production pipeline is almost the same as the training pipeline
• With a good base, prototyping becomes very fast
Resources
• AllenNLP GitHub: https://github.com/allenai/allennlp

• Writing Code for NLP Research: https://bit.ly/2Desi4b

• TorchText: https://torchtext.readthedocs.io/en/latest/
Any questions?
Contacts
• ODS-slack: @vradchenko

• Email: radchenko.vitaliy.o@gmail.com

• Facebook: fb.com/vradchenko
Thank you

More Related Content

Similar to Оформление пайплайна в NLP проекте​, Виталий Радченко. 22 июня, 2019

Overview of query evaluation
Overview of query evaluationOverview of query evaluation
Overview of query evaluationavniS
 
Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Michael Rys
 
U-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersU-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersMichael Rys
 
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Michael Rys
 
Machine Learning Applied - Contextual Chatbots Coding, Oracle JET and Tensor...
 Machine Learning Applied - Contextual Chatbots Coding, Oracle JET and Tensor... Machine Learning Applied - Contextual Chatbots Coding, Oracle JET and Tensor...
Machine Learning Applied - Contextual Chatbots Coding, Oracle JET and Tensor...andrejusb
 
Dynamic Publishing with Arbortext Data Merge
Dynamic Publishing with Arbortext Data MergeDynamic Publishing with Arbortext Data Merge
Dynamic Publishing with Arbortext Data MergeClay Helberg
 
Introduction to sas
Introduction to sasIntroduction to sas
Introduction to sasAjay Ohri
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to ElasticsearchClifford James
 
Machine Learning for (JVM) Developers
Machine Learning for (JVM) DevelopersMachine Learning for (JVM) Developers
Machine Learning for (JVM) DevelopersMateusz Dymczyk
 
Text field and textarea
Text field and textareaText field and textarea
Text field and textareamyrajendra
 
Informatica overview
Informatica overviewInformatica overview
Informatica overviewkarthik kumar
 
Informatica overview
Informatica overviewInformatica overview
Informatica overviewkarthik kumar
 
MySQL Optimizer Cost Model
MySQL Optimizer Cost ModelMySQL Optimizer Cost Model
MySQL Optimizer Cost ModelOlav Sandstå
 
Taming the Data Science Monster with A New ‘Sword’ – U-SQL
Taming the Data Science Monster with A New ‘Sword’ – U-SQLTaming the Data Science Monster with A New ‘Sword’ – U-SQL
Taming the Data Science Monster with A New ‘Sword’ – U-SQLMichael Rys
 
Spring Day | Spring and Scala | Eberhard Wolff
Spring Day | Spring and Scala | Eberhard WolffSpring Day | Spring and Scala | Eberhard Wolff
Spring Day | Spring and Scala | Eberhard WolffJAX London
 

Similar to Оформление пайплайна в NLP проекте​, Виталий Радченко. 22 июня, 2019 (20)

Overview of query evaluation
Overview of query evaluationOverview of query evaluation
Overview of query evaluation
 
Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)Introducing U-SQL (SQLPASS 2016)
Introducing U-SQL (SQLPASS 2016)
 
Taming the shrew Power BI
Taming the shrew Power BITaming the shrew Power BI
Taming the shrew Power BI
 
U-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for DevelopersU-SQL - Azure Data Lake Analytics for Developers
U-SQL - Azure Data Lake Analytics for Developers
 
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
 
ElasticSearch
ElasticSearchElasticSearch
ElasticSearch
 
Machine Learning Applied - Contextual Chatbots Coding, Oracle JET and Tensor...
 Machine Learning Applied - Contextual Chatbots Coding, Oracle JET and Tensor... Machine Learning Applied - Contextual Chatbots Coding, Oracle JET and Tensor...
Machine Learning Applied - Contextual Chatbots Coding, Oracle JET and Tensor...
 
Dynamic Publishing with Arbortext Data Merge
Dynamic Publishing with Arbortext Data MergeDynamic Publishing with Arbortext Data Merge
Dynamic Publishing with Arbortext Data Merge
 
Introduction to sas
Introduction to sasIntroduction to sas
Introduction to sas
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to Elasticsearch
 
Machine Learning for (JVM) Developers
Machine Learning for (JVM) DevelopersMachine Learning for (JVM) Developers
Machine Learning for (JVM) Developers
 
Text field and textarea
Text field and textareaText field and textarea
Text field and textarea
 
Informatica overview
Informatica overviewInformatica overview
Informatica overview
 
Informatica overview
Informatica overviewInformatica overview
Informatica overview
 
MySQL Optimizer Cost Model
MySQL Optimizer Cost ModelMySQL Optimizer Cost Model
MySQL Optimizer Cost Model
 
Taming the Data Science Monster with A New ‘Sword’ – U-SQL
Taming the Data Science Monster with A New ‘Sword’ – U-SQLTaming the Data Science Monster with A New ‘Sword’ – U-SQL
Taming the Data Science Monster with A New ‘Sword’ – U-SQL
 
Full Text Search In PostgreSQL
Full Text Search In PostgreSQLFull Text Search In PostgreSQL
Full Text Search In PostgreSQL
 
Spring Day | Spring and Scala | Eberhard Wolff
Spring Day | Spring and Scala | Eberhard WolffSpring Day | Spring and Scala | Eberhard Wolff
Spring Day | Spring and Scala | Eberhard Wolff
 
Apex code (Salesforce)
Apex code (Salesforce)Apex code (Salesforce)
Apex code (Salesforce)
 
Stata tutorial university of princeton
Stata tutorial university of princetonStata tutorial university of princeton
Stata tutorial university of princeton
 

More from Mail.ru Group

Автоматизация без тест-инженеров по автоматизации, Мария Терехина и Владислав...
Автоматизация без тест-инженеров по автоматизации, Мария Терехина и Владислав...Автоматизация без тест-инженеров по автоматизации, Мария Терехина и Владислав...
Автоматизация без тест-инженеров по автоматизации, Мария Терехина и Владислав...Mail.ru Group
 
BDD для фронтенда. Автоматизация тестирования с Cucumber, Cypress и Jenkins, ...
BDD для фронтенда. Автоматизация тестирования с Cucumber, Cypress и Jenkins, ...BDD для фронтенда. Автоматизация тестирования с Cucumber, Cypress и Jenkins, ...
BDD для фронтенда. Автоматизация тестирования с Cucumber, Cypress и Jenkins, ...Mail.ru Group
 
Другая сторона баг-баунти-программ: как это выглядит изнутри, Владимир Дубровин
Другая сторона баг-баунти-программ: как это выглядит изнутри, Владимир ДубровинДругая сторона баг-баунти-программ: как это выглядит изнутри, Владимир Дубровин
Другая сторона баг-баунти-программ: как это выглядит изнутри, Владимир ДубровинMail.ru Group
 
Использование Fiddler и Charles при тестировании фронтенда проекта pulse.mail...
Использование Fiddler и Charles при тестировании фронтенда проекта pulse.mail...Использование Fiddler и Charles при тестировании фронтенда проекта pulse.mail...
Использование Fiddler и Charles при тестировании фронтенда проекта pulse.mail...Mail.ru Group
 
Управление инцидентами в Почте Mail.ru, Антон Викторов
Управление инцидентами в Почте Mail.ru, Антон ВикторовУправление инцидентами в Почте Mail.ru, Антон Викторов
Управление инцидентами в Почте Mail.ru, Антон ВикторовMail.ru Group
 
DAST в CI/CD, Ольга Свиридова
DAST в CI/CD, Ольга СвиридоваDAST в CI/CD, Ольга Свиридова
DAST в CI/CD, Ольга СвиридоваMail.ru Group
 
Почему вам стоит использовать свой велосипед и почему не стоит Александр Бел...
Почему вам стоит использовать свой велосипед и почему не стоит  Александр Бел...Почему вам стоит использовать свой велосипед и почему не стоит  Александр Бел...
Почему вам стоит использовать свой велосипед и почему не стоит Александр Бел...Mail.ru Group
 
CV в пайплайне распознавания ценников товаров: трюки и хитрости Николай Масл...
CV в пайплайне распознавания ценников товаров: трюки и хитрости  Николай Масл...CV в пайплайне распознавания ценников товаров: трюки и хитрости  Николай Масл...
CV в пайплайне распознавания ценников товаров: трюки и хитрости Николай Масл...Mail.ru Group
 
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidiaRAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidiaMail.ru Group
 
WebAuthn в реальной жизни, Анатолий Остапенко
WebAuthn в реальной жизни, Анатолий ОстапенкоWebAuthn в реальной жизни, Анатолий Остапенко
WebAuthn в реальной жизни, Анатолий ОстапенкоMail.ru Group
 
AMP для электронной почты, Сергей Пешков
AMP для электронной почты, Сергей ПешковAMP для электронной почты, Сергей Пешков
AMP для электронной почты, Сергей ПешковMail.ru Group
 
Как мы захотели TWA и сделали его без мобильных разработчиков, Данила Стрелков
Как мы захотели TWA и сделали его без мобильных разработчиков, Данила СтрелковКак мы захотели TWA и сделали его без мобильных разработчиков, Данила Стрелков
Как мы захотели TWA и сделали его без мобильных разработчиков, Данила СтрелковMail.ru Group
 
Кейсы использования PWA для партнерских предложений в Delivery Club, Никита Б...
Кейсы использования PWA для партнерских предложений в Delivery Club, Никита Б...Кейсы использования PWA для партнерских предложений в Delivery Club, Никита Б...
Кейсы использования PWA для партнерских предложений в Delivery Club, Никита Б...Mail.ru Group
 
Метапрограммирование: строим конечный автомат, Сергей Федоров, Яндекс.Такси
Метапрограммирование: строим конечный автомат, Сергей Федоров, Яндекс.ТаксиМетапрограммирование: строим конечный автомат, Сергей Федоров, Яндекс.Такси
Метапрограммирование: строим конечный автомат, Сергей Федоров, Яндекс.ТаксиMail.ru Group
 
Как не сделать врагами архитектуру и оптимизацию, Кирилл Березин, Mail.ru Group
Как не сделать врагами архитектуру и оптимизацию, Кирилл Березин, Mail.ru GroupКак не сделать врагами архитектуру и оптимизацию, Кирилл Березин, Mail.ru Group
Как не сделать врагами архитектуру и оптимизацию, Кирилл Березин, Mail.ru GroupMail.ru Group
 
Этика искусственного интеллекта, Александр Кармаев (AI Journey)
Этика искусственного интеллекта, Александр Кармаев (AI Journey)Этика искусственного интеллекта, Александр Кармаев (AI Journey)
Этика искусственного интеллекта, Александр Кармаев (AI Journey)Mail.ru Group
 
Нейро-машинный перевод в вопросно-ответных системах, Федор Федоренко (AI Jour...
Нейро-машинный перевод в вопросно-ответных системах, Федор Федоренко (AI Jour...Нейро-машинный перевод в вопросно-ответных системах, Федор Федоренко (AI Jour...
Нейро-машинный перевод в вопросно-ответных системах, Федор Федоренко (AI Jour...Mail.ru Group
 
Конвергенция технологий как тренд развития искусственного интеллекта, Владими...
Конвергенция технологий как тренд развития искусственного интеллекта, Владими...Конвергенция технологий как тренд развития искусственного интеллекта, Владими...
Конвергенция технологий как тренд развития искусственного интеллекта, Владими...Mail.ru Group
 
Обзор трендов рекомендательных систем от Пульса, Андрей Мурашев (AI Journey)
Обзор трендов рекомендательных систем от Пульса, Андрей Мурашев (AI Journey)Обзор трендов рекомендательных систем от Пульса, Андрей Мурашев (AI Journey)
Обзор трендов рекомендательных систем от Пульса, Андрей Мурашев (AI Journey)Mail.ru Group
 
Мир глазами нейросетей, Данила Байгушев, Александр Сноркин ()
Мир глазами нейросетей, Данила Байгушев, Александр Сноркин ()Мир глазами нейросетей, Данила Байгушев, Александр Сноркин ()
Мир глазами нейросетей, Данила Байгушев, Александр Сноркин ()Mail.ru Group
 

More from Mail.ru Group (20)

Автоматизация без тест-инженеров по автоматизации, Мария Терехина и Владислав...
Автоматизация без тест-инженеров по автоматизации, Мария Терехина и Владислав...Автоматизация без тест-инженеров по автоматизации, Мария Терехина и Владислав...
Автоматизация без тест-инженеров по автоматизации, Мария Терехина и Владислав...
 
BDD для фронтенда. Автоматизация тестирования с Cucumber, Cypress и Jenkins, ...
BDD для фронтенда. Автоматизация тестирования с Cucumber, Cypress и Jenkins, ...BDD для фронтенда. Автоматизация тестирования с Cucumber, Cypress и Jenkins, ...
BDD для фронтенда. Автоматизация тестирования с Cucumber, Cypress и Jenkins, ...
 
Другая сторона баг-баунти-программ: как это выглядит изнутри, Владимир Дубровин
Другая сторона баг-баунти-программ: как это выглядит изнутри, Владимир ДубровинДругая сторона баг-баунти-программ: как это выглядит изнутри, Владимир Дубровин
Другая сторона баг-баунти-программ: как это выглядит изнутри, Владимир Дубровин
 
Использование Fiddler и Charles при тестировании фронтенда проекта pulse.mail...
Использование Fiddler и Charles при тестировании фронтенда проекта pulse.mail...Использование Fiddler и Charles при тестировании фронтенда проекта pulse.mail...
Использование Fiddler и Charles при тестировании фронтенда проекта pulse.mail...
 
Управление инцидентами в Почте Mail.ru, Антон Викторов
Управление инцидентами в Почте Mail.ru, Антон ВикторовУправление инцидентами в Почте Mail.ru, Антон Викторов
Управление инцидентами в Почте Mail.ru, Антон Викторов
 
DAST в CI/CD, Ольга Свиридова
DAST в CI/CD, Ольга СвиридоваDAST в CI/CD, Ольга Свиридова
DAST в CI/CD, Ольга Свиридова
 
Почему вам стоит использовать свой велосипед и почему не стоит Александр Бел...
Почему вам стоит использовать свой велосипед и почему не стоит  Александр Бел...Почему вам стоит использовать свой велосипед и почему не стоит  Александр Бел...
Почему вам стоит использовать свой велосипед и почему не стоит Александр Бел...
 
CV в пайплайне распознавания ценников товаров: трюки и хитрости Николай Масл...
CV в пайплайне распознавания ценников товаров: трюки и хитрости  Николай Масл...CV в пайплайне распознавания ценников товаров: трюки и хитрости  Николай Масл...
CV в пайплайне распознавания ценников товаров: трюки и хитрости Николай Масл...
 
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidiaRAPIDS: ускоряем Pandas и scikit-learn на GPU  Павел Клеменков, NVidia
RAPIDS: ускоряем Pandas и scikit-learn на GPU Павел Клеменков, NVidia
 
WebAuthn в реальной жизни, Анатолий Остапенко
WebAuthn в реальной жизни, Анатолий ОстапенкоWebAuthn в реальной жизни, Анатолий Остапенко
WebAuthn в реальной жизни, Анатолий Остапенко
 
AMP для электронной почты, Сергей Пешков
AMP для электронной почты, Сергей ПешковAMP для электронной почты, Сергей Пешков
AMP для электронной почты, Сергей Пешков
 
Как мы захотели TWA и сделали его без мобильных разработчиков, Данила Стрелков
Как мы захотели TWA и сделали его без мобильных разработчиков, Данила СтрелковКак мы захотели TWA и сделали его без мобильных разработчиков, Данила Стрелков
Как мы захотели TWA и сделали его без мобильных разработчиков, Данила Стрелков
 
Кейсы использования PWA для партнерских предложений в Delivery Club, Никита Б...
Кейсы использования PWA для партнерских предложений в Delivery Club, Никита Б...Кейсы использования PWA для партнерских предложений в Delivery Club, Никита Б...
Кейсы использования PWA для партнерских предложений в Delivery Club, Никита Б...
 
Метапрограммирование: строим конечный автомат, Сергей Федоров, Яндекс.Такси
Метапрограммирование: строим конечный автомат, Сергей Федоров, Яндекс.ТаксиМетапрограммирование: строим конечный автомат, Сергей Федоров, Яндекс.Такси
Метапрограммирование: строим конечный автомат, Сергей Федоров, Яндекс.Такси
 
Как не сделать врагами архитектуру и оптимизацию, Кирилл Березин, Mail.ru Group
Как не сделать врагами архитектуру и оптимизацию, Кирилл Березин, Mail.ru GroupКак не сделать врагами архитектуру и оптимизацию, Кирилл Березин, Mail.ru Group
Как не сделать врагами архитектуру и оптимизацию, Кирилл Березин, Mail.ru Group
 
Этика искусственного интеллекта, Александр Кармаев (AI Journey)
Этика искусственного интеллекта, Александр Кармаев (AI Journey)Этика искусственного интеллекта, Александр Кармаев (AI Journey)
Этика искусственного интеллекта, Александр Кармаев (AI Journey)
 
Нейро-машинный перевод в вопросно-ответных системах, Федор Федоренко (AI Jour...
Нейро-машинный перевод в вопросно-ответных системах, Федор Федоренко (AI Jour...Нейро-машинный перевод в вопросно-ответных системах, Федор Федоренко (AI Jour...
Нейро-машинный перевод в вопросно-ответных системах, Федор Федоренко (AI Jour...
 
Конвергенция технологий как тренд развития искусственного интеллекта, Владими...
Конвергенция технологий как тренд развития искусственного интеллекта, Владими...Конвергенция технологий как тренд развития искусственного интеллекта, Владими...
Конвергенция технологий как тренд развития искусственного интеллекта, Владими...
 
Обзор трендов рекомендательных систем от Пульса, Андрей Мурашев (AI Journey)
Обзор трендов рекомендательных систем от Пульса, Андрей Мурашев (AI Journey)Обзор трендов рекомендательных систем от Пульса, Андрей Мурашев (AI Journey)
Обзор трендов рекомендательных систем от Пульса, Андрей Мурашев (AI Journey)
 
Мир глазами нейросетей, Данила Байгушев, Александр Сноркин ()
Мир глазами нейросетей, Данила Байгушев, Александр Сноркин ()Мир глазами нейросетей, Данила Байгушев, Александр Сноркин ()
Мир глазами нейросетей, Данила Байгушев, Александр Сноркин ()
 

Recently uploaded

%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...masabamasaba
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2
 
WSO2Con2024 - Hello Choreo Presentation - Kanchana
WSO2Con2024 - Hello Choreo Presentation - KanchanaWSO2Con2024 - Hello Choreo Presentation - Kanchana
WSO2Con2024 - Hello Choreo Presentation - KanchanaWSO2
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Bert Jan Schrijver
 
WSO2CON 2024 Slides - Unlocking Value with AI
WSO2CON 2024 Slides - Unlocking Value with AIWSO2CON 2024 Slides - Unlocking Value with AI
WSO2CON 2024 Slides - Unlocking Value with AIWSO2
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in sowetomasabamasaba
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...chiefasafspells
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024VictoriaMetrics
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfonteinmasabamasaba
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...masabamasaba
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrandmasabamasaba
 

Recently uploaded (20)

%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
WSO2CON 2024 - WSO2's Digital Transformation Journey with Choreo: A Platforml...
 
WSO2Con2024 - Hello Choreo Presentation - Kanchana
WSO2Con2024 - Hello Choreo Presentation - KanchanaWSO2Con2024 - Hello Choreo Presentation - Kanchana
WSO2Con2024 - Hello Choreo Presentation - Kanchana
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
WSO2CON 2024 Slides - Unlocking Value with AI
WSO2CON 2024 Slides - Unlocking Value with AIWSO2CON 2024 Slides - Unlocking Value with AI
WSO2CON 2024 Slides - Unlocking Value with AI
 
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
WSO2CON 2024 - Building the API First Enterprise – Running an API Program, fr...
 
Abortion Pill Prices Boksburg [(+27832195400*)] 🏥 Women's Abortion Clinic in ...
Abortion Pill Prices Boksburg [(+27832195400*)] 🏥 Women's Abortion Clinic in ...Abortion Pill Prices Boksburg [(+27832195400*)] 🏥 Women's Abortion Clinic in ...
Abortion Pill Prices Boksburg [(+27832195400*)] 🏥 Women's Abortion Clinic in ...
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
WSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaSWSO2CON 2024 Slides - Open Source to SaaS
WSO2CON 2024 Slides - Open Source to SaaS
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
WSO2CON 2024 - Navigating API Complexity: REST, GraphQL, gRPC, Websocket, Web...
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Toronto Psychic Readings, Attraction spells,Brin...
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 

Оформление пайплайна в NLP проекте​, Виталий Радченко. 22 июня, 2019

  • 1. NLP Pipeline Best practices Vitalii Radchenko
 Data Scientist, YouScan
  • 2. You’ll find out ✓What is a good pipeline ✓How to process effectively input data ✓How to build a train pipeline ✓Why is a declarative syntax useful ✓How to alter training to predictive pipeline with small efforts
  • 3. Pipeline • sequences of processing and analysis steps applied to data for a specific purpose • save time on design time and coding, if you expect to encounter similar tasks • simplifies deployment to production
  • 4. • Reusable pipeline 
 (applied to different tasks with minor changes) • Structured 
 (logical chain, easy understandable) • Documented 
 (fully commented, defined arguments types, readme) • Covered by tests 
 (simple checks for input data/batch shape/model output, and unit tests for each pipeline step) What is a good pipeline?
  • 5. Input data in NLP • Text is always a part of an input
 (token, sentence, article, dialogue, html page etc) • Tags/Labels • Spans (start and end indexes)
  • 6. • For text input we need to create a text field • Text Field should have next methods: - preprocess (preprocess text, 
 argument – a list of preprocessors) - tokenize (tokenize text, argument – a tokenizer) - index (index text, argument – a vocabulary) - pad (pad text, argument – maximum number of tokens) - as_tensor (returns a tensor, basic method for all fields) Text Field Input text Preprocess Tokenize Index Pad As tensor Text Field Input Text Input Labels Text Field Label Field
  • 7. Label Field • label field should be convertible to appropriate model format based on vocabulary: - index 
 (index labels, argument – vocabulary) - as tensor 
 (returns a tensor with a proper shape) Input labels Index As tensor Label Field Input Text Input Labels Text Field Label Field
  • 8. Other Fields • Metadata Field – other information which will not be used for training (raw text, author, topic, etc) • Categorical Field – for any categorical data which could be encoded as embeddings or OHE (post type, source etc) • Span Field – start, end or inside index • Index field – index for the right answer over the sequence • Create your own fields which contain “as tensor” method, for comfortable use inside your model
  • 9. Instance • Instance is a collection of fields • One sample – one instance • Has Mapping[str, Field] type, all fields are keyed • Get field as tensor by key (“text”, “labels" – in our example) • Should contain “index_fields” method to index all fields by the given vocabulary Instance Label Field Text FieldText Fieldtext labels Instance Input Text Input Labels Text Field Label Field
  • 10. Vocabulary • Vocabulary is a class where all vocabularies (text, labels, categories) are available by namespace • We should be able to create a vocabulary from list of instances, counters or predefined files • Add special tokens: OOV (Out-of- Vocabulary), padded • Should have options to get statistics,
 “string to index” and “index to string” Vocabulary Label Vocab Text FieldText Vocabtext labels Instance Vocabulary Input Text Input Labels Text Field Label Field
  • 11. Starting a pipeline schema Instance Vocabulary Input Text Input Labels Text Field Label Field 1 2 34 1. Input data to Fields 2. Fields to Instance 3. List of Instances to Vocabulary 4. Index Fields with Vocabulary
  • 12. Preprocessing and tokenization • Preprocessing and tokenization you should apply before setting text to field • Better to store raw text in metadata (if it is possible, to track correctness of preprocessing and tokenization) • Preprocessing applied to raw text and returns preprocessed text • Preprocessed text is sent to tokenizer, which returns a list of tokens
  • 13. DataSet Reader • Our pipeline schema should be used in dataset reader • Each task should have own dataset reader in which you will construct instances and vocabulary • Could be lazy
  • 14. Iterator • Gets output from Dataset Reader • Types: - “Simple Iterator” just shuffles data - “Bucket Iterator” creates batches with the same length • Index and pad fields during training
  • 15. Trainer • Arguments: - Train and Validation dataset 
 (List of Instances) - Iterator - Define model - Optimizer and Loss - Train configurations (number of epochs, shuffling, metrics) - Callbacks (early stopping, learning rate schedule, logging etc) • Whole train process is done by Trainer class Dataset Model Optimizer Configuration Callbacks Trainer Iterator
  • 16. Config • config file in JSON or XML format • all pipeline steps are defined in config file • declarative syntax allows to change experiment 
 without code changing "data_folder": "data/training_data",
 "dataset_reader": {
 "type": "multilabel_classification"
 }
 "preprocessing": { "lower": true, 
 "max_seq_len": 150, 
 "include_length": true, 
 "label_one_hot": true, 
 “preprocessors": [ ["HtmlEntitiesUnescaper"], ["BoldTagReplacer"],
 ["HtmlTagReplacer", [" "]], ["URLReplacer", [" urlTag “]]] 
 } Data "type": "bucket_iterator", "params": {
 "batch_size": 1000,
 "shuffle": true,
 "sort_key": { 
 "field": "text", 
 "type": "length" 
 } } Iterator
  • 17. Config "early_stopping": { 
 "patience": 5, 
 "monitor": "f1_score", 
 "save_all_iterations": false }, 
 "tf_logging": { "path": "tf_logs" } Callbacks "type": "f1_score", 
 "params": { 
 "average": "micro", 
 "activation": "sigmoid"
 } Metrics "loss": {
 "type": "bce_with_logits",
 "params": {} 
 }, 
 "optimizer": {
 "type": "adam",
 "params": {"lr": 0.001 } } Optimizer and loss "model_folder": "models",
 "epochs": 100, 
 "device": "1" Trainer "type": "mixed_rnn", 
 "params": { 
 "embedding_dropout": 0.3, 
 "rnn_1": {
 "rnn_type": "lstm", 
 "hidden_cells": 100, 
 "hidden_layers": 1
 }, 
 "rnn_2": { 
 "rnn_type": "gru", 
 "hidden_cells": 100, 
 "hidden_layers": 1 
 }, 
 "aggregation_layers": { 
 "types": ["max_pool", "mean_pool"] 
 }, 
 "activation": { "use": false }
 } Model
  • 18. Saving • Config file (you can always reproduce an experiment if you have a config file) • Vocabulary (match embeddings indexes or labels) • Model weights (load model)
  • 19. Predictor • Input data – JSON-format (API format) • Dataset Reader adopted to JSON-format • Iterator for batch processing • Options to get predictions with probabilities, return top-5 most probable labels, change thresholds and other options for validating results
  • 20. Pipeline schema Instance Vocabulary Input Text Input Labels Text Field Label Field Dataset Reader Iterator Config Model Trainer/Predictor 1 2 3 4 4 4
21. Training

import os

# define reader
reader = get_reader(training_config["dataset_reader"], preprocessing=training_config["preprocessing"])

# load train and val datasets
train_dataset = reader.read(data_path=os.path.join(training_config["data_folder"], "train_data.p"))
val_dataset = reader.read(data_path=os.path.join(training_config["data_folder"], "val_data.p"))

# get vocabulary and set token vectors
vocab = Vocabulary.from_dataset(train_dataset + val_dataset)
vocab.set_vectors(training_config["embeddings"])

# define data iterator
iterator = get_iterator(training_config["iterator"], vocab=vocab)

# define model, optimizer, loss and metrics
model = get_model(training_config["model"], vocab)
loss = get_loss(training_config["loss"]["type"], training_config["loss"]["params"])
opt = get_opt(training_config["optimizer"], model.parameters())
metrics = get_metrics(training_config["metrics"])

# define trainer
trainer = Trainer(training_params=training_config["training_params"], model=model, loss=loss,
                  optimizer=opt, metrics=metrics,
                  train_dataset=train_dataset, val_dataset=val_dataset, iterator=iterator)

# train model
trainer.train()
30. Predicting

import os

# define reader
reader = get_reader(config["deploy"]["dataset_reader"], preprocessing=config["preprocessing"])

# load vocab
vocab = Vocabulary.from_file(os.path.join(config["model_path"], "vocab"))

# define iterator
iterator = get_iterator(config["deploy"]["iterator"], vocab=vocab)

# load model
model = load_model(os.path.join(config["model_path"], "model"), vocab=vocab)

# define predictor
predictor = Predictor(model=model, reader=reader, iterator=iterator)

# function which predicts your input
def predict(data):
    return predictor.predict(data)
38. Conclusions
• Abstractions make a pipeline more structured, logical and understandable
• Declarative syntax helps keep the pipeline simple and lets you define the whole experiment in a config, without changing code
• The production pipeline is almost the same as the training pipeline
• With a good base, prototyping becomes very fast
39. Resources
• AllenNLP GitHub: https://github.com/allenai/allennlp
• Writing Code for NLP Research: https://bit.ly/2Desi4b
• TorchText: https://torchtext.readthedocs.io/en/latest/
41. Contacts
• ODS-slack: @vradchenko
• Email: radchenko.vitaliy.o@gmail.com
• Facebook: fb.com/vradchenko