SlideShare a Scribd company logo
1 of 61
Download to read offline
Structure unstructured data
From 300k uncategorised images of watches to production models that
recognise 58 classes with 90% precision, at scale
Carmine Paolino
OLX Group
Offline OLX
“Hey, we have thousands of
watches and I want to organise
them!
1. Ask the right questions
Clarify business problem
Limit the problem space
Define an evaluation metric
Make models that extract at
least 50 attributes from images
of watches at 90% precision,
and prepare a labelled dataset.
2. Data exploration and
Dataset creation
Use pretrained models for classification
Faster R-CNN
InceptionResNet v2
OpenImageDataset
Faster R-CNN
NASNet
COCO
OLX Stock image
detection
Clock
Clock
Stock ImageWatch
Wall clock
Alarm clock
Original Image
Digital clock
Pocket watch
Watch strap
Cluster embeddings of items
from pretrained models
(spoiler alert)
Fine-tune on your domain
N-gram frequency analysis
1-gram:
“Omega”
2-gram:
“Omega Speedmaster”
A few other options:
Topic modelling
Sentiment analysis
Named entity recognition
4. Train some models!
InceptionResNet v2
pretrained on ImageNet
Epochs
Accuracy
Brand classifier’s learning curve
training
softmax
Epochs
Accuracy
Brand classifier’s learning curve
training
softmax
+1 block
Epochs
Accuracy
Brand classifier’s learning curve
training
softmax
+1 block
+1 block
Epochs
Accuracy
Brand classifier’s learning curve
training
softmax
+1 block
+1 block
+1 block
Epochs
Accuracy
Brand classifier’s learning curve
training
softmax
+1 block
+1 block
+1 block
+1 block
Set the classification threshold of
every category in order to have
90% precision
Classifier Class Precision Recall Threshold
Brand Apple 0,90020 0,97720 0,17410
Brand Armani 0,90080 0,51010 0,54613
Brand Breitling 0,90030 0,63290 0,47542
Brand Casio 0,90100 0,25290 0,77514
Brand Diesel 0,90060 0,85480 0,27741
5. Deploy and scale
Export your model to
TensorFlow SavedModel
def export_to_tf(model_path, classes_json, export_dir, top_k):
"""Converts the model to TensorFlow Serving."""
model = keras.models.load_model(model_path)
K.set_learning_phase(0)
# deal with serialized inputs
serialized_tf_example = tf.placeholder(tf.string, name='tf_example')
feature_configs = {
'image/encoded': tf.FixedLenFeature(shape=[], dtype=tf.string),
}
tf_example = tf.parse_example(serialized_tf_example, feature_configs)
jpegs = tf_example['image/encoded']
images = tf.map_fn(preprocess_image, jpegs, dtype=tf.float32)
# ...
def preprocess_image(image_buffer):
"""Emulates how Keras does preprocessing for its InceptionResnetV2
model. Check https://goo.gl/je7BCx"""
# Decode the string as an RGB JPEG
image = tf.image.decode_jpeg(image_buffer, channels=3)
# After this point, all image pixels reside in [0,1)
# until the very end, when they're rescaled to (-1, 1). The various
# adjust_* ops all require this range for dtype float.
image = tf.image.convert_image_dtype(image, dtype=tf.float32)
# Resize the image to the original height and width.
image = tf.expand_dims(image, 0)
image = tf.image.resize_bilinear(
image, [IMAGE_SIZE, IMAGE_SIZE], align_corners=False)
image = tf.squeeze(image, [0])
# Finally, rescale to [-1, 1] instead of [0, 1)
image = tf.subtract(image, 0.5)
image = tf.multiply(image, 2.0)
return image
# … continuing the function export_to_tf
# attach preprocessing to graph
output = model(images)
# output the top 3 predictions
prediction_scores, prediction_indexes = tf.nn.top_k(output, top_k)
# prediction classes
classes = json.loads(classes_json.read())
prediction_table = tf.contrib.lookup.index_to_string_table_from_tensor(
tf.constant(classes))
prediction_classes = prediction_table.lookup(
tf.to_int64(prediction_indexes))
# build the signature_def_map
classification_inputs = tf.saved_model.utils.build_tensor_info(
serialized_tf_example)
classification_outputs_classes = tf.saved_model.utils.build_tensor_info(
prediction_classes)
classification_outputs_scores = tf.saved_model.utils.build_tensor_info(
prediction_scores)
# … continuing the function export_to_tf
classification_signature = (
tf.saved_model.signature_def_utils.build_signature_def(
inputs={
signature_constants.CLASSIFY_INPUTS: classification_inputs
},
outputs={
signature_constants.CLASSIFY_OUTPUT_CLASSES:
classification_outputs_classes,
signature_constants.CLASSIFY_OUTPUT_SCORES:
classification_outputs_scores
},
method_name=signature_constants.CLASSIFY_METHOD_NAME))
builder = tf.saved_model.builder.SavedModelBuilder(export_dir)
with K.get_session() as sess:
builder.add_meta_graph_and_variables(
sess=sess,
tags=[tf.saved_model.tag_constants.SERVING],
clear_devices=True,
legacy_init_op=tf.tables_initializer(),
signature_def_map={
signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY:
classification_signature
})
builder.save()
Amazon
SageMaker
Save it as tar.gz to S3 and
deploy
import sagemaker
from sagemaker.tensorflow.model import TensorFlowModel, TensorFlowPredictor
def custom_tensorflow_predictor(endpoint_name, sagemaker_session):
def predictor():
return TensorFlowPredictor(endpoint_name, sagemaker_session)
return predictor
def deploy(model_path, entry_point_path, inference_instance_type, model_name):
sess = sagemaker.Session()
predictor = custom_tensorflow_predictor(model_name, sess)
model = TensorFlowModel(
model_data=model_path,
role=sagemaker.get_execution_role(),
entry_point=entry_point_path,
predictor_cls=predictor,
name=model_name)
model.deploy(
initial_instance_count=1, instance_type=inference_instance_type)
Set an autoscaling group and…
Thanks!
Questions?
Post, code, and more soon at
https://tech.olx.com/

More Related Content

What's hot

Practical Reinforcement Learning with TensorFlow
Practical Reinforcement Learning with TensorFlowPractical Reinforcement Learning with TensorFlow
Practical Reinforcement Learning with TensorFlowIllia Polosukhin
 
Object detection and Instance Segmentation
Object detection and Instance SegmentationObject detection and Instance Segmentation
Object detection and Instance SegmentationHichem Felouat
 
Senjor project
Senjor projectSenjor project
Senjor projectUsman Ali
 
Kaggle review Planet: Understanding the Amazon from Space
Kaggle reviewPlanet: Understanding the Amazon from SpaceKaggle reviewPlanet: Understanding the Amazon from Space
Kaggle review Planet: Understanding the Amazon from SpaceEduard Tyantov
 
Tensor board
Tensor boardTensor board
Tensor boardSung Kim
 
Rajat Monga at AI Frontiers: Deep Learning with TensorFlow
Rajat Monga at AI Frontiers: Deep Learning with TensorFlowRajat Monga at AI Frontiers: Deep Learning with TensorFlow
Rajat Monga at AI Frontiers: Deep Learning with TensorFlowAI Frontiers
 
Introduction to Machine Learning with TensorFlow
Introduction to Machine Learning with TensorFlowIntroduction to Machine Learning with TensorFlow
Introduction to Machine Learning with TensorFlowPaolo Tomeo
 
Ultrasound Nerve Segmentation
Ultrasound Nerve Segmentation Ultrasound Nerve Segmentation
Ultrasound Nerve Segmentation Sneha Ravikumar
 
maxbox starter60 machine learning
maxbox starter60 machine learningmaxbox starter60 machine learning
maxbox starter60 machine learningMax Kleiner
 
Tutorial: Image Generation and Image-to-Image Translation using GAN
Tutorial: Image Generation and Image-to-Image Translation using GANTutorial: Image Generation and Image-to-Image Translation using GAN
Tutorial: Image Generation and Image-to-Image Translation using GANWuhyun Rico Shin
 
Bcs apsg 2010-05-06_presentation
Bcs apsg 2010-05-06_presentationBcs apsg 2010-05-06_presentation
Bcs apsg 2010-05-06_presentationGeoff Sharman
 
Learning stochastic neural networks with Chainer
Learning stochastic neural networks with ChainerLearning stochastic neural networks with Chainer
Learning stochastic neural networks with ChainerSeiya Tokui
 

What's hot (12)

Practical Reinforcement Learning with TensorFlow
Practical Reinforcement Learning with TensorFlowPractical Reinforcement Learning with TensorFlow
Practical Reinforcement Learning with TensorFlow
 
Object detection and Instance Segmentation
Object detection and Instance SegmentationObject detection and Instance Segmentation
Object detection and Instance Segmentation
 
Senjor project
Senjor projectSenjor project
Senjor project
 
Kaggle review Planet: Understanding the Amazon from Space
Kaggle reviewPlanet: Understanding the Amazon from SpaceKaggle reviewPlanet: Understanding the Amazon from Space
Kaggle review Planet: Understanding the Amazon from Space
 
Tensor board
Tensor boardTensor board
Tensor board
 
Rajat Monga at AI Frontiers: Deep Learning with TensorFlow
Rajat Monga at AI Frontiers: Deep Learning with TensorFlowRajat Monga at AI Frontiers: Deep Learning with TensorFlow
Rajat Monga at AI Frontiers: Deep Learning with TensorFlow
 
Introduction to Machine Learning with TensorFlow
Introduction to Machine Learning with TensorFlowIntroduction to Machine Learning with TensorFlow
Introduction to Machine Learning with TensorFlow
 
Ultrasound Nerve Segmentation
Ultrasound Nerve Segmentation Ultrasound Nerve Segmentation
Ultrasound Nerve Segmentation
 
maxbox starter60 machine learning
maxbox starter60 machine learningmaxbox starter60 machine learning
maxbox starter60 machine learning
 
Tutorial: Image Generation and Image-to-Image Translation using GAN
Tutorial: Image Generation and Image-to-Image Translation using GANTutorial: Image Generation and Image-to-Image Translation using GAN
Tutorial: Image Generation and Image-to-Image Translation using GAN
 
Bcs apsg 2010-05-06_presentation
Bcs apsg 2010-05-06_presentationBcs apsg 2010-05-06_presentation
Bcs apsg 2010-05-06_presentation
 
Learning stochastic neural networks with Chainer
Learning stochastic neural networks with ChainerLearning stochastic neural networks with Chainer
Learning stochastic neural networks with Chainer
 

Similar to Structure Unstructured Data

Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceLviv Startup Club
 
29-kashyap-mask-detaction.pptx
29-kashyap-mask-detaction.pptx29-kashyap-mask-detaction.pptx
29-kashyap-mask-detaction.pptxKASHYAPPATHAK7
 
Power ai tensorflowworkloadtutorial-20171117
Power ai tensorflowworkloadtutorial-20171117Power ai tensorflowworkloadtutorial-20171117
Power ai tensorflowworkloadtutorial-20171117Ganesan Narayanasamy
 
GANS Project for Image idetification.pdf
GANS Project for Image idetification.pdfGANS Project for Image idetification.pdf
GANS Project for Image idetification.pdfVivekanandaGN1
 
AIML4 CNN lab256 1hr (111-1).pdf
AIML4 CNN lab256 1hr (111-1).pdfAIML4 CNN lab256 1hr (111-1).pdf
AIML4 CNN lab256 1hr (111-1).pdfssuserb4d806
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...Databricks
 
Using the code below- I need help with the following 3 things- 1) Writ.pdf
Using the code below- I need help with the following 3 things- 1) Writ.pdfUsing the code below- I need help with the following 3 things- 1) Writ.pdf
Using the code below- I need help with the following 3 things- 1) Writ.pdfacteleshoppe
 
Image classification using cnn
Image classification using cnnImage classification using cnn
Image classification using cnnDebarko De
 
AWS re:Invent 2018 - AIM401 - Deep Learning using Tensorflow
AWS re:Invent 2018 - AIM401 - Deep Learning using TensorflowAWS re:Invent 2018 - AIM401 - Deep Learning using Tensorflow
AWS re:Invent 2018 - AIM401 - Deep Learning using TensorflowJulien SIMON
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Julien SIMON
 
Workshop - Introduction to Machine Learning with R
Workshop - Introduction to Machine Learning with RWorkshop - Introduction to Machine Learning with R
Workshop - Introduction to Machine Learning with RShirin Elsinghorst
 
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 Separating Hype from Reality in Deep Learning with Sameer Farooqui Separating Hype from Reality in Deep Learning with Sameer Farooqui
Separating Hype from Reality in Deep Learning with Sameer FarooquiDatabricks
 
DLT UNIT-3.docx
DLT  UNIT-3.docxDLT  UNIT-3.docx
DLT UNIT-3.docx0567Padma
 
[REPEAT] Deep Learning Applications Using TensorFlow (AIM401-R) - AWS re:Inve...
[REPEAT] Deep Learning Applications Using TensorFlow (AIM401-R) - AWS re:Inve...[REPEAT] Deep Learning Applications Using TensorFlow (AIM401-R) - AWS re:Inve...
[REPEAT] Deep Learning Applications Using TensorFlow (AIM401-R) - AWS re:Inve...Amazon Web Services
 
SCons an Introduction
SCons an IntroductionSCons an Introduction
SCons an Introductionslantsixgames
 
Deep Learning with Apache MXNet (September 2017)
Deep Learning with Apache MXNet (September 2017)Deep Learning with Apache MXNet (September 2017)
Deep Learning with Apache MXNet (September 2017)Julien SIMON
 

Similar to Structure Unstructured Data (20)

Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning Service
 
29-kashyap-mask-detaction.pptx
29-kashyap-mask-detaction.pptx29-kashyap-mask-detaction.pptx
29-kashyap-mask-detaction.pptx
 
Computer vision
Computer vision Computer vision
Computer vision
 
Power ai tensorflowworkloadtutorial-20171117
Power ai tensorflowworkloadtutorial-20171117Power ai tensorflowworkloadtutorial-20171117
Power ai tensorflowworkloadtutorial-20171117
 
GANS Project for Image idetification.pdf
GANS Project for Image idetification.pdfGANS Project for Image idetification.pdf
GANS Project for Image idetification.pdf
 
AIML4 CNN lab256 1hr (111-1).pdf
AIML4 CNN lab256 1hr (111-1).pdfAIML4 CNN lab256 1hr (111-1).pdf
AIML4 CNN lab256 1hr (111-1).pdf
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
 
Using the code below- I need help with the following 3 things- 1) Writ.pdf
Using the code below- I need help with the following 3 things- 1) Writ.pdfUsing the code below- I need help with the following 3 things- 1) Writ.pdf
Using the code below- I need help with the following 3 things- 1) Writ.pdf
 
Image classification using cnn
Image classification using cnnImage classification using cnn
Image classification using cnn
 
AWS re:Invent 2018 - AIM401 - Deep Learning using Tensorflow
AWS re:Invent 2018 - AIM401 - Deep Learning using TensorflowAWS re:Invent 2018 - AIM401 - Deep Learning using Tensorflow
AWS re:Invent 2018 - AIM401 - Deep Learning using Tensorflow
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)
 
Keras and TensorFlow
Keras and TensorFlowKeras and TensorFlow
Keras and TensorFlow
 
Workshop - Introduction to Machine Learning with R
Workshop - Introduction to Machine Learning with RWorkshop - Introduction to Machine Learning with R
Workshop - Introduction to Machine Learning with R
 
CNN_INTRO.pptx
CNN_INTRO.pptxCNN_INTRO.pptx
CNN_INTRO.pptx
 
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 Separating Hype from Reality in Deep Learning with Sameer Farooqui Separating Hype from Reality in Deep Learning with Sameer Farooqui
Separating Hype from Reality in Deep Learning with Sameer Farooqui
 
DLT UNIT-3.docx
DLT  UNIT-3.docxDLT  UNIT-3.docx
DLT UNIT-3.docx
 
[REPEAT] Deep Learning Applications Using TensorFlow (AIM401-R) - AWS re:Inve...
[REPEAT] Deep Learning Applications Using TensorFlow (AIM401-R) - AWS re:Inve...[REPEAT] Deep Learning Applications Using TensorFlow (AIM401-R) - AWS re:Inve...
[REPEAT] Deep Learning Applications Using TensorFlow (AIM401-R) - AWS re:Inve...
 
Deep learning-practical
Deep learning-practicalDeep learning-practical
Deep learning-practical
 
SCons an Introduction
SCons an IntroductionSCons an Introduction
SCons an Introduction
 
Deep Learning with Apache MXNet (September 2017)
Deep Learning with Apache MXNet (September 2017)Deep Learning with Apache MXNet (September 2017)
Deep Learning with Apache MXNet (September 2017)
 

Recently uploaded

B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxolyaivanovalion
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 

Recently uploaded (20)

B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 

Structure Unstructured Data

  • 1. Structure unstructured data From 300k uncategorised images of watches to production models that recognise 58 classes with 90% precision, at scale Carmine Paolino OLX Group
  • 3. “Hey, we have thousands of watches and I want to organise them!
  • 4.
  • 5.
  • 6.
  • 7.
  • 8. 1. Ask the right questions
  • 12. Make models that extract at least 50 attributes from images of watches at 90% precision, and prepare a labelled dataset.
  • 13. 2. Data exploration and Dataset creation
  • 14. Use pretrained models for classification Faster R-CNN InceptionResNet v2 OpenImageDataset Faster R-CNN NASNet COCO OLX Stock image detection Clock Clock Stock ImageWatch Wall clock Alarm clock Original Image Digital clock Pocket watch Watch strap
  • 15. Cluster embeddings of items from pretrained models
  • 16.
  • 17.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29. A few other options: Topic modelling Sentiment analysis Named entity recognition
  • 30.
  • 31.
  • 32. 4. Train some models!
  • 33.
  • 35.
  • 36.
  • 37.
  • 38.
  • 40.
  • 41. Epochs Accuracy Brand classifier’s learning curve training softmax +1 block
  • 42.
  • 43. Epochs Accuracy Brand classifier’s learning curve training softmax +1 block +1 block
  • 44.
  • 45. Epochs Accuracy Brand classifier’s learning curve training softmax +1 block +1 block +1 block
  • 46.
  • 47. Epochs Accuracy Brand classifier’s learning curve training softmax +1 block +1 block +1 block +1 block
  • 48. Set the classification threshold of every category in order to have 90% precision
  • 49. Classifier Class Precision Recall Threshold Brand Apple 0,90020 0,97720 0,17410 Brand Armani 0,90080 0,51010 0,54613 Brand Breitling 0,90030 0,63290 0,47542 Brand Casio 0,90100 0,25290 0,77514 Brand Diesel 0,90060 0,85480 0,27741
  • 50. 5. Deploy and scale
  • 51. Export your model to TensorFlow SavedModel
  • 52. def export_to_tf(model_path, classes_json, export_dir, top_k): """Converts the model to TensorFlow Serving.""" model = keras.models.load_model(model_path) K.set_learning_phase(0) # deal with serialized inputs serialized_tf_example = tf.placeholder(tf.string, name='tf_example') feature_configs = { 'image/encoded': tf.FixedLenFeature(shape=[], dtype=tf.string), } tf_example = tf.parse_example(serialized_tf_example, feature_configs) jpegs = tf_example['image/encoded'] images = tf.map_fn(preprocess_image, jpegs, dtype=tf.float32) # ...
  • 53. def preprocess_image(image_buffer): """Emulates how Keras does preprocessing for its InceptionResnetV2 model. Check https://goo.gl/je7BCx""" # Decode the string as an RGB JPEG image = tf.image.decode_jpeg(image_buffer, channels=3) # After this point, all image pixels reside in [0,1) # until the very end, when they're rescaled to (-1, 1). The various # adjust_* ops all require this range for dtype float. image = tf.image.convert_image_dtype(image, dtype=tf.float32) # Resize the image to the original height and width. image = tf.expand_dims(image, 0) image = tf.image.resize_bilinear( image, [IMAGE_SIZE, IMAGE_SIZE], align_corners=False) image = tf.squeeze(image, [0]) # Finally, rescale to [-1, 1] instead of [0, 1) image = tf.subtract(image, 0.5) image = tf.multiply(image, 2.0) return image
  • 54. # … continuing the function export_to_tf # attach preprocessing to graph output = model(images) # output the top 3 predictions prediction_scores, prediction_indexes = tf.nn.top_k(output, top_k) # prediction classes classes = json.loads(classes_json.read()) prediction_table = tf.contrib.lookup.index_to_string_table_from_tensor( tf.constant(classes)) prediction_classes = prediction_table.lookup( tf.to_int64(prediction_indexes)) # build the signature_def_map classification_inputs = tf.saved_model.utils.build_tensor_info( serialized_tf_example) classification_outputs_classes = tf.saved_model.utils.build_tensor_info( prediction_classes) classification_outputs_scores = tf.saved_model.utils.build_tensor_info( prediction_scores)
  • 55. # … continuing the function export_to_tf classification_signature = ( tf.saved_model.signature_def_utils.build_signature_def( inputs={ signature_constants.CLASSIFY_INPUTS: classification_inputs }, outputs={ signature_constants.CLASSIFY_OUTPUT_CLASSES: classification_outputs_classes, signature_constants.CLASSIFY_OUTPUT_SCORES: classification_outputs_scores }, method_name=signature_constants.CLASSIFY_METHOD_NAME)) builder = tf.saved_model.builder.SavedModelBuilder(export_dir) with K.get_session() as sess: builder.add_meta_graph_and_variables( sess=sess, tags=[tf.saved_model.tag_constants.SERVING], clear_devices=True, legacy_init_op=tf.tables_initializer(), signature_def_map={ signature_constants.DEFAULT_SERVING_SIGNATURE_DEF_KEY: classification_signature }) builder.save()
  • 57. Save it as tar.gz to S3 and deploy
  • 58. import sagemaker from sagemaker.tensorflow.model import TensorFlowModel, TensorFlowPredictor def custom_tensorflow_predictor(endpoint_name, sagemaker_session): def predictor(): return TensorFlowPredictor(endpoint_name, sagemaker_session) return predictor def deploy(model_path, entry_point_path, inference_instance_type, model_name): sess = sagemaker.Session() predictor = custom_tensorflow_predictor(model_name, sess) model = TensorFlowModel( model_data=model_path, role=sagemaker.get_execution_role(), entry_point=entry_point_path, predictor_cls=predictor, name=model_name) model.deploy( initial_instance_count=1, instance_type=inference_instance_type)
  • 59. Set an autoscaling group and…
  • 60.
  • 61. Thanks! Questions? Post, code, and more soon at https://tech.olx.com/