Smart approach in development and
deployment process for various ML models
Jelena Pekez (Advanced Analytics Team Lead)
Miloš Josifović (Big Data Architect)
Danijel Ilievski (Senior ML Engineer)
Comtrade System Integration
Introduction
→Since 87% of models are never deployed, all steps should be planned at the
beginning of the Data Science lifecycle (pipeline):
1. Manage
2. Develop
3. Deploy
4. Monitor
→The first goal is to reduce time to production for new ML models
by developing Smart Generic Data Mart(s).
→With Smart Data Mart(s) we can prototype an ML model and evaluate its feasibility.
→The final goal is to generate production models and orchestrate them easily.
[Diagram: data science lifecycle with the stages Problem Formulation, Data mart design (ADS), Data Preprocess, Modeling, Results Interpretation, Deployment (PROD. MODEL)]
ADS smart development to support all future ML models
→Planning the Data Mart for the first ML model in a program takes a lot of time:
• Collect, at a high level, all possible future use cases
• Identify all relevant and available data sources
• Identify the customer activities the company is interested in
• Combine data from structured and unstructured data sources
• Perform extensive feature engineering (text processing, normalization, binning, …)
• Comply with GDPR
• Define proper access rights on the selected Data Mart(s)
• Resolve data quality issues at the very beginning to avoid endless reloads
For the next ML model, data scientists can spend more time on creative activities using the developed Analytical Datamarts/Sets (ADS)
Smart generic data mart(s)
→Creating Multipurpose Data Marts:
• Generate list of target features and relevant target events
• Design it so new events can be easily added
• Eliminate data that have no business/use-case value
• Filter out system records - clean data
• Create the initial (starting) base table(s) – what is the definition of a customer?
• Aggregate data to different granularity levels to catch behavior trends
• Feature engineering does indeed make a difference!
Quickly and easily generate new ML training datasets
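A minimal sketch of the "aggregate to different granularity levels" idea, assuming event-level records of the form (customer, month, event type, amount). The record layout, the sample data and the function names `monthly_totals` and `window_totals` are hypothetical and not taken from the deck. Because the event type is just a field, a new event can be added without changing the functions.

```python
from collections import defaultdict

# Hypothetical event records: (customer_id, month "YYYY-MM", event_type, amount).
EVENTS = [
    ("c1", "2022-01", "payment", 20.0),
    ("c1", "2022-02", "payment", 30.0),
    ("c1", "2022-03", "usage", 5.0),
    ("c2", "2022-03", "payment", 10.0),
]


def monthly_totals(events):
    """Roll events up to one value per (customer, month, event_type)."""
    totals = defaultdict(float)
    for customer, month, event_type, amount in events:
        totals[(customer, month, event_type)] += amount
    return dict(totals)


def window_totals(monthly, months, event_type):
    """Sum the monthly values of one event type per customer over the given months."""
    result = defaultdict(float)
    for (customer, month, etype), value in monthly.items():
        if etype == event_type and month in months:
            result[customer] += value
    return dict(result)


monthly = monthly_totals(EVENTS)
last_3m = window_totals(monthly, {"2022-01", "2022-02", "2022-03"}, "payment")
print(last_3m)
```

The same monthly table can feed 1-, 3- and 6-month windows, which is the kind of trend feature the parameter table later in the deck describes.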
Data Science requires domain knowledge – it makes a big difference
→How much domain knowledge do I need? Depends.
→Domain knowledge is critical for data preparation, productization and orchestration
→Which data points add value?
→Domain knowledge is necessary in data pre-processing:
• Outlier detection, feature importance, model selection, model evaluation stage...
[Venn diagram: DATA SCIENCE at the intersection of DOMAIN KNOWLEDGE, MATH, STATS & ML, and COMPUTER SCIENCE]
You have to get the best of both worlds!
Control your data mart(s) in production
→Data quality checks in the data pipeline:
• Missing data vs. loaded data – compare aggregations
• Duplicates – the same records are repeated
• Relative change threshold – increase or decrease in the number of records
• Statistically expected range
• Data drift – target variable distribution
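The checks listed above can be sketched in a few plain-Python functions. This is only an illustration under stated assumptions: the function names, the row layout and the sample numbers are hypothetical, and the population stability index (PSI) is one common way to quantify drift between two binned distributions, not necessarily the measure the authors use.

```python
import math
from collections import Counter


def duplicate_keys(rows, key):
    """Return the key values that occur more than once."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def relative_change(previous_count, current_count):
    """Relative increase or decrease in the number of loaded records.

    Assumes previous_count > 0.
    """
    return (current_count - previous_count) / previous_count


def out_of_range(values, low, high):
    """Values that fall outside the statistically expected range."""
    return [v for v in values if not low <= v <= high]


def population_stability_index(expected, actual):
    """PSI between two binned distributions given as lists of proportions.

    Assumes every bin proportion is strictly positive.
    """
    return sum((a - e) * math.log(a / e) for e, a in zip(expected, actual))


rows = [{"id": 1}, {"id": 2}, {"id": 2}]
print(duplicate_keys(rows, "id"))
print(relative_change(1000, 1200))
print(out_of_range([5, 50, 500], 0, 100))
print(population_stability_index([0.5, 0.5], [0.6, 0.4]))
```

In a pipeline, each function would be compared against a threshold agreed with the business; the thresholds themselves are not given in the slides.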
Example of how a Generic Data Set helps you focus on
Data Science – Transfer between DWH and Data Lake
→Data on two platforms (DWH – SQL database, Data Lake – Hadoop)
→Data can be transferred among databases:
• Through SQL federation / DB link – with certain specifics/products compatibility
• Via Spark engine (PySpark) to Hadoop
→The aim is to simplify data transfer between platforms so that
Data Scientists can do it on their own, without:
• Dealing with Spark jobs directly
• Managing Hadoop security (Kerberos, read-write permissions, etc.)
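One way to hide Spark from the data scientist is a thin wrapper around Spark's JDBC data source. The sketch below is an assumption about how such a helper could look, not the authors' implementation: `jdbc_options` and `transfer_table` are hypothetical names, and the session (including Kerberos tickets and cluster configuration) is assumed to be created elsewhere and passed in.

```python
def jdbc_options(url, table, user, password, driver):
    """Collect the options Spark's JDBC data source expects."""
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": driver,
    }


def transfer_table(spark, source_options, target_table, mode="overwrite"):
    """Read one DWH table over JDBC and save it as a table in the data lake.

    `spark` is an existing SparkSession supplied by the platform team, so the
    caller never touches Spark job settings or Hadoop security directly.
    """
    df = spark.read.format("jdbc").options(**source_options).load()
    df.write.mode(mode).saveAsTable(target_table)
    return target_table


# Hypothetical usage (connection details are placeholders):
# opts = jdbc_options("jdbc:oracle:thin:@//dwh-host:1521/DWH", "ADS.DS_PAYMENT",
#                     "analyst", "secret", "oracle.jdbc.OracleDriver")
# transfer_table(spark, opts, "lake.ds_payment")
```

Passing the session in, rather than creating it inside the helper, keeps security and resource settings in one place that the platform team controls.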
Speed up writing SQL queries
→ADS → [GENERATE SQL QUERY] → Training/Scoring table
→Query automation for training table
→Input for the Python script (parameter table), followed by an excerpt of the script:
SCHEMA | SOURCE | VAR_IN | VAR_OUT | FUNCTIONS | PERIODS | ZERO EXCLUDE
ADS | DS_PAYMENT | TOTAL_PAYMENT_AMT | TOTAL_PAYMENT_AMT | [MAX, AVG/P] | [3, 6] | 1
ADS | DS_PAYMENT | TOTAL_PAYMENT_CNT | TOTAL_PAYMENT_CNT | [SUM] | [1] | 1
ADS | DS_PAYMENT | MAX_PAYMENT_AMT | MAX_PAYMENT_AMT | [MAX] | [3] | 1
ADS | DS_PAYMENT | MIN_PAYMENT_AMT | MIN_PAYMENT_AMT | [MIN] | [3] | 1
ADS | DS_PAYMENT | ADD_PAYMENT_CNT | ADD_PAYMENT_CNT | [AVG/P] | [6] | 1
ADS | DS_USAGE | USAGE_OUT_DUR | USAGE_OUT_DUR | [SUM] | [1] | 1
ADS | DS_USAGE | USAGE_OUT_DUR | USAGE_OUT_DUR | [AVG/P, MAX, MIN] | [3, 6] | 1
ADS | DS_USAGE | USAGE_OUT_IN_PACK_DUR | USAGE_OUT_IN_PACK_DUR | [SUM] | [1] | 1
ADS | DS_USAGE | NVL(USAGE_OUT_REG_INT_DUR, 0) + NVL(USAGE_OUT_INT_DUR,0) | USAGE_OUT_INT_DUR | [AVG/P] | [6] | 1
for i, line in enumerate(variables):
    for i2, k in enumerate(line[2]):  # function
        for i3, kk in enumerate(line[3]):  # period
            # No trailing comma after the last generated column
            if i == len(variables) - 1 and i2 == len(line[2]) - 1 and i3 == len(line[3]) - 1:
                zarez = ''
            else:
                zarez = ','
            # Creates the aggregation column, e.g. AVG(FIELD_NAME) AS NEW_FIELD_NAME
            divider = ''
            if k.upper() == 'AVG/P':
                # Average per period: SUM over the window divided by the number of months
                func1 = 'SUM'
                func2 = '_' + 'AVG'
                divider = '/' + str(kk)
            elif k.upper() == 'SUM' and kk == '1':
                func1 = 'SUM'
                func2 = ''
            else:
                func1 = k
                func2 = '_' + k
            query += (func1 + '(' + line[1] + '_' + str(kk) + 'M' + ')' + divider + ' AS ' + line[1] + func2 + '_' + str(kk) + 'M' + zarez + ' \n')
…
for i, line in enumerate(variables):
    for i2, line2 in enumerate(line[3]):
        if i == len(variables) - 1 and i2 == len(line[3]) - 1:
            zarez = ''
        else:
            zarez = ','
        # Optionally exclude zero values from the window
        if line[4] == 1:
            zero_rule = 'AND {varijabla} <> 0'.format(varijabla=line[0])
        else:
            zero_rule = ''
        query += ("CASE WHEN TIME_ID BETWEEN ADD_MONTHS('{datum_place}', {vreme2}) AND "
                  "'{datum_place}' {zero_rule} THEN {varijabla} ELSE "
                  "NULL END AS "
                  "{varijabla2}_{vreme}M{zarez_place}".format(varijabla=line[0],
                  varijabla2=line[1], datum_place=datum, vreme2=-1 * (int(line2) - 1),
                  zero_rule=zero_rule, vreme=line2, zarez_place=zarez)) + ' \n'
query += ("FROM\n
…
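The aggregation loop in the excerpt can be restated as a small self-contained function. This is a sketch under assumptions: the tuple layout (VAR_IN, VAR_OUT, FUNCTIONS, PERIODS, ZERO EXCLUDE) is inferred from the parameter table, and the name `aggregation_columns` is hypothetical. It builds a list of column expressions and joins them afterwards, which removes the need to track whether the current column is the last one.

```python
def aggregation_columns(variables):
    """Build 'FUNC(col) AS alias' expressions for every variable, function and period."""
    columns = []
    for _var_in, var_out, functions, periods, _zero_exclude in variables:
        for func in functions:
            for period in periods:
                source = f"{var_out}_{period}M"
                if func.upper() == "AVG/P":
                    # Average per period: sum over the window divided by its length in months.
                    columns.append(f"SUM({source})/{period} AS {var_out}_AVG_{period}M")
                elif func.upper() == "SUM" and int(period) == 1:
                    # A one-month sum keeps the plain column name.
                    columns.append(f"SUM({source}) AS {source}")
                else:
                    columns.append(f"{func}({source}) AS {var_out}_{func}_{period}M")
    return columns


# Two rows taken from the parameter table.
variables = [
    ("TOTAL_PAYMENT_AMT", "TOTAL_PAYMENT_AMT", ["MAX", "AVG/P"], [3, 6], 1),
    ("TOTAL_PAYMENT_CNT", "TOTAL_PAYMENT_CNT", ["SUM"], [1], 1),
]
cols = aggregation_columns(variables)
print(",\n".join(cols))
```

The identifiers are interpolated directly into SQL text, so this approach is only appropriate when the parameter table is a trusted, version-controlled configuration.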
Develop phase - Devote more time to the creative side
→Improve traditional ML development processes:
• Benefit from pre-trained models (deep learning – mainly image recognition)
• Automated machine learning (AutoML) – works quite well for supervised ML
→AutoML:
• Eases the DS workload and compensates for a lack of experience
• Handles tasks such as feature selection, data preprocessing, hyperparameter optimization
and model/algorithm selection
• Lets you focus more on the data side
• Is no silver bullet; it is an exploration tool rather than a generator of optimal models
Examples: MLBox, Auto-Sklearn, TPOT, H2O AutoML, Auto Keras, Auto PyTorch, Google Cloud AutoML, DataRobot, etc.
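To make "hyperparameter optimization" concrete, here is a toy stand-in written in plain Python: an exhaustive search over one hyperparameter (k of a k-nearest-neighbours classifier) scored by leave-one-out validation. It does not use the API of any tool listed above; the data set and the function names are made up for illustration, and real AutoML tools search far larger spaces of models and preprocessing steps.

```python
def knn_predict(train, point, k):
    """Majority vote among the k training points nearest to `point` (one feature)."""
    nearest = sorted(train, key=lambda row: abs(row[0] - point))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)


def loo_accuracy(data, k):
    """Leave-one-out accuracy of k-NN on `data`."""
    hits = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        hits += knn_predict(train, x, k) == y
    return hits / len(data)


def search(data, candidates):
    """Score every candidate k and return (best_k, best_score)."""
    scores = {k: loo_accuracy(data, k) for k in candidates}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical one-feature data set: (feature value, class label).
DATA = [(1.0, 0), (1.5, 0), (2.0, 0), (2.2, 1), (6.0, 1), (6.5, 1), (7.0, 1), (7.5, 1)]
best_k, best_score = search(DATA, [1, 3, 5])
print(best_k, best_score)
```

Odd values of k are used so that a two-class vote can never tie.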
Deploy phase - you don’t get any value out of a model sitting on someone’s computer
→Phase where model is transferred to a production environment.
→Same best-practice principles and design patterns for software also apply to ML models
→ML model should be deployed as part of existing data pipeline
→Output of ML model should be monitored for bias
→ML model in deploy phase:
• Registered in appropriate repository
• Passed testing
• Model artifacts are retained
→Validate model → Publish model → Deliver model
→Don’t update Python libraries before proper testing in a development environment 😊
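The advice about library updates can be backed by a simple guard that compares installed package versions with the versions a model was tested against. The sketch below assumes the tested versions are kept as a dictionary of pins; the function name `version_mismatches` and the pin values are hypothetical. `importlib.metadata.version` is part of the standard library from Python 3.8.

```python
from importlib import metadata


def version_mismatches(pins, get_version=metadata.version):
    """Return {package: (pinned, installed)} for every package that deviates from its pin.

    `installed` is None when the package is not installed at all.
    """
    mismatches = {}
    for package, pinned in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[package] = (pinned, installed)
    return mismatches


# Hypothetical pins recorded when the model passed testing.
TESTED_WITH = {"numpy": "1.23.5", "scikit-learn": "1.1.3"}
print(version_mismatches(TESTED_WITH))
```

Running this check at the start of a scoring job makes an untested upgrade visible before the model produces output.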
Deploy phase – more than one ML model
→Model registry:
• Place for all trained/production-ready models (with version control)
• Alternative models as backup
• All model artifacts, model dependencies, evaluation metrics, documentation
• Which dataset was used for training / model lineage
• Log performance details of the model and comparison with other models
• Track models throughout their lifetime (training, staging and production)
→A model registry enables faster deployment of new models and retraining of current ones
→Shared by multiple team members (team collaboration)
→Tie business rules to the output of the production model
→Consume the model through API integration
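The registry features listed above (versions, artifacts, lineage, metrics, stages) can be illustrated with a minimal in-memory sketch. It is an assumption about the shape of such a registry, not the product or code the authors use; the class names, fields, paths and metric values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelVersion:
    name: str
    version: int
    artifact_path: str
    training_dataset: str      # lineage: which dataset the model was trained on
    metrics: dict
    stage: str = "training"    # training -> staging -> production


class ModelRegistry:
    STAGES = ("training", "staging", "production")

    def __init__(self):
        self._models = {}

    def register(self, name, artifact_path, training_dataset, metrics):
        """Store a new version of `name`; versions are numbered from 1."""
        versions = self._models.setdefault(name, [])
        entry = ModelVersion(name, len(versions) + 1, artifact_path, training_dataset, metrics)
        versions.append(entry)
        return entry

    def promote(self, name, version, stage):
        """Move one version to another stage."""
        if stage not in self.STAGES:
            raise ValueError(f"unknown stage: {stage}")
        entry = self._models[name][version - 1]
        entry.stage = stage
        return entry

    def best(self, name, metric):
        """Compare all versions of a model on one evaluation metric."""
        return max(self._models[name], key=lambda m: m.metrics[metric])

    def production(self, name):
        return [m for m in self._models[name] if m.stage == "production"]


registry = ModelRegistry()
v1 = registry.register("churn", "/models/churn/1", "ADS.DS_PAYMENT", {"auc": 0.81})
v2 = registry.register("churn", "/models/churn/2", "ADS.DS_PAYMENT", {"auc": 0.84})
registry.promote("churn", 2, "production")
print(registry.best("churn", "auc").version, [m.version for m in registry.production("churn")])
```

Keeping older versions in the registry is what makes the "alternative models as backup" point practical: a previous version can be promoted again without retraining.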
Conclusion
• Smart Generic Data Mart(s)
• Single pipeline for data transfer
• Easy deployment
• More creative time
Contact us at:
Danijel.Ilievski@comtrade.com
Jelena.Pekez@comtrade.com
Milos.Josifovic@comtrade.com
Q&A
www.comtradeintegration.com
Copyright © 2020 Comtrade. All rights reserved. The content of this presentation is copyright protected. Any reproduction, distribution, or modification is not allowed.
The information, solutions, and opinions contained in this presentation are of informative nature only and are not intended to be a comprehensive study, nor should they be relied on or treated as
a means to provide a complete solution or advice, since we may not be aware of all specific circumstances of the case. We try to provide quality information, but we make no claims, promises, or
guaranties about the accuracy, completeness, or adequacy of the information contained herein.
Thank you

[DSC Europe 22] Smart approach in development and deployment process for various ML models - Danijel Ilievski & Milos Josifovic
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
Tabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflowsTabula.io Cheatsheet: automate your data workflows
Tabula.io Cheatsheet: automate your data workflows
 
tapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive datatapal brand analysis PPT slide for comptetive data
tapal brand analysis PPT slide for comptetive data
 
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
一比一原版(CU毕业证)卡尔顿大学毕业证成绩单
 
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdfSample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
Sample_Global Non-invasive Prenatal Testing (NIPT) Market, 2019-2030.pdf
 
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
一比一原版(CBU毕业证)卡普顿大学毕业证成绩单
 
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
一比一原版(CBU毕业证)不列颠海角大学毕业证成绩单
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project PresentationPredicting Product Ad Campaign Performance: A Data Analysis Project Presentation
Predicting Product Ad Campaign Performance: A Data Analysis Project Presentation
 
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
1.Seydhcuxhxyxhccuuxuxyxyxmisolids 2019.pptx
 

[DSC Europe 22] Smart approach in development and deployment process for various ML models - Danijel Ilievski & Milos Josifovic

  • 1. Smart approach in development and deployment process for various ML models. Jelena Pekez (Advanced Analytics Team Lead), Miloš Josifović (Big Data Architect), Danijel Ilievski (Senior ML Engineer)
  • 2. Comtrade System Integration: Introduction
    →Since 87% of models are never deployed, all steps should be planned at the beginning of the Data Science Lifecycle (pipeline):
      1. Manage
      2. Develop
      3. Deploy
      4. Monitor
    →The first goal is to reduce time to production for new ML models by developing Smart Generic Data Mart(s).
    →With Smart Data Mart(s) we can prototype an ML model and evaluate its feasibility.
    →The final goal is to generate Production Models and easily orchestrate them.
    Diagram labels: Results Interpretation, Modeling, Data Preprocess, Data mart design, ADS, Problem Formulation, Deployment, PROD.MODEL
  • 3. ADS smart development to support all future ML models
    →Planning the Data Mart for the first ML model in a program takes an exhaustive amount of time:
      • Collect, at a high level, all possible future use cases
      • Come up with all relevant and available data sources
      • Customer activities the company has an interest in
      • Combine data from structured and unstructured data sources
      • Extensive feature engineering (text processing, normalization, binning, …)
      • Comply with GDPR
      • Define proper access rights on selected Data Mart(s)
      • Resolving data quality issues at the very beginning will reduce endless reloads
    For the next ML model, data scientists can spend more time on creative activities using the developed Analytical Datamarts/Sets (ADS)
  • 4. Smart generic data mart(s)
    →Creating Multipurpose Data Marts:
      • Generate a list of target features and relevant target events
      • Design it so new events can be easily added
      • Eliminate data that has no business/use-case value
      • Filter out system records - clean data
      • Make the initial (starting) base table(s) - what is the definition of a customer?
      • Aggregate data to different granularity levels to catch behavior trends
      • Feature engineering does indeed make a difference!
    Quickly and easily generate new ML training datasets
  • 5. Data Science requires domain knowledge - it makes a big difference
    →How much domain knowledge do I need? It depends.
    →Domain knowledge is critical for data preparation, productization and orchestration
    →Which data points add value?
    →Domain knowledge is necessary in data pre-processing:
      • Outlier detection, feature importance, model selection, model evaluation stage...
    Diagram labels: DATA SCIENCE; DOMAIN KNOWLEDGE; MATH, STATS & ML; COMPUTER SCIENCE
    You have to get the best of both worlds!
  • 6. Control your data mart(s) in production
    →Steps in the data pipeline for data quality checks:
      • Missing data vs loaded data - aggregations
      • Duplicates - the same records were repeated
      • Relative change threshold - increase or decrease in the number of records
      • Statistical expected range
      • Data drift - target variable distribution
    Diagram label: Data Pipeline
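The data quality checks listed on this slide could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the presenters' implementation: the function names, the 20% relative-change threshold and the 3-sigma range are all assumptions chosen for the example.

```python
from collections import Counter
from statistics import mean, stdev


def find_duplicates(records):
    """Return the records that occur more than once (records must be hashable)."""
    counts = Counter(records)
    return [rec for rec, n in counts.items() if n > 1]


def relative_change(previous_count, current_count):
    """Relative change in the number of records between two loads."""
    if previous_count == 0:
        raise ValueError("previous_count must be non-zero")
    return (current_count - previous_count) / previous_count


def within_expected_range(history, value, n_sigmas=3.0):
    """True if value lies within mean +/- n_sigmas * stdev of the history."""
    mu, sigma = mean(history), stdev(history)
    return abs(value - mu) <= n_sigmas * sigma


def target_rate_shift(baseline_targets, current_targets):
    """Absolute difference in the share of positive (1) targets: a crude drift signal."""
    return abs(mean(current_targets) - mean(baseline_targets))


def run_checks(previous_rows, current_rows, count_history, max_rel_change=0.2):
    """Collect findings for one load; an empty list means all checks passed."""
    findings = []
    duplicates = find_duplicates(current_rows)
    if duplicates:
        findings.append(f"{len(duplicates)} duplicated record(s)")
    change = relative_change(len(previous_rows), len(current_rows))
    if abs(change) > max_rel_change:  # assumed threshold, tune per table
        findings.append(f"row count changed by {change:+.0%}")
    if not within_expected_range(count_history, len(current_rows)):
        findings.append("row count outside expected statistical range")
    return findings
```

In a pipeline, `run_checks` would typically run right after each load, and a non-empty result would stop downstream scoring until someone has looked at the data.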
  • 7. Example of how a Generic Data Set helps to focus on Data Science: transfer between DWH and Data Lake
    →Data lives on two platforms (DWH: SQL database; Data Lake: Hadoop)
    →Data can be transferred between databases:
      • Through SQL federation / DB link, subject to product-specific compatibility
      • Via the Spark engine (PySpark) to Hadoop
    →The aim is to simplify data transfer between platforms so that Data Scientists can do it on their own, without:
      • Dealing with Spark jobs directly
      • Managing Hadoop security (Kerberos, read-write permissions, etc.)
  • 8. Speed up writing SQL queries
    →ADS → [GENERATE SQL QUERY] → Training/Scoring table
    →Query automation for the training table
    →Input for the Python script:
      SCHEMA | SOURCE | VAR_IN | VAR_OUT | FUNCTIONS | PERIODS | ZERO EXCLUDE
      ADS | DS_PAYMENT | TOTAL_PAYMENT_AMT | TOTAL_PAYMENT_AMT | [MAX, AVG/P] | [3, 6] | 1
      ADS | DS_PAYMENT | TOTAL_PAYMENT_CNT | TOTAL_PAYMENT_CNT | [SUM] | [1] | 1
      ADS | DS_PAYMENT | MAX_PAYMENT_AMT | MAX_PAYMENT_AMT | [MAX] | [3] | 1
      ADS | DS_PAYMENT | MIN_PAYMENT_AMT | MIN_PAYMENT_AMT | [MIN] | [3] | 1
      ADS | DS_PAYMENT | ADD_PAYMENT_CNT | ADD_PAYMENT_CNT | [AVG/P] | [6] | 1
      ADS | DS_USAGE | USAGE_OUT_DUR | USAGE_OUT_DUR | [SUM] | [1] | 1
      ADS | DS_USAGE | USAGE_OUT_DUR | USAGE_OUT_DUR | [AVG/P, MAX, MIN] | [3, 6] | 1
      ADS | DS_USAGE | USAGE_OUT_IN_PACK_DUR | USAGE_OUT_IN_PACK_DUR | [SUM] | [1] | 1
      ADS | DS_USAGE | NVL(USAGE_OUT_REG_INT_DUR, 0) + NVL(USAGE_OUT_INT_DUR, 0) | USAGE_OUT_INT_DUR | [AVG/P] | [6] | 1
    →Example of the Python script:
        for i, line in enumerate(variables):
            for i2, k in enumerate(line[2]):  # function
                for i3, kk in enumerate(line[3]):  # period
                    # No trailing comma after the very last generated column
                    if (i == len(variables) - 1) and (i2 == len(line[2]) - 1) and (i3 == len(line[3]) - 1):
                        zarez = ''
                    else:
                        zarez = ','
                    # Creates the aggregation column, e.g. AVG(FIELD_NAME) AS NEW_FIELD_NAME
                    divider = ''
                    if 'AVG/P' == str.upper(k):
                        func1 = 'SUM'
                        func2 = '_' + 'AVG'
                        divider = '/' + str(kk)
                    elif ('SUM' == str.upper(k)) and (kk == '1'):
                        func1 = 'SUM'
                        func2 = ''
                    else:
                        func1 = k
                        func2 = '_' + k
                    query += (func1 + '(' + line[1] + '_' + str(kk) + 'M' + ')' + divider + ' AS ' + line[1] + func2 + '_' + str(kk) + 'M' + zarez + ' \n')
        …
        for i, line in enumerate(variables):
            for i2, line2 in enumerate(line[3]):
                if (i == len(variables) - 1) and (i2 == len(line[3]) - 1):
                    zarez = ''
                else:
                    zarez = ','
                # Optionally exclude zero values from the period window
                if line[4] == 1:
                    zero_rule = 'AND {varijabla} <> 0'.format(varijabla=line[0])
                else:
                    zero_rule = ''
                query += ("CASE WHEN TIME_ID BETWEEN ADD_MONTHS('{datum_place}', {vreme2}) AND '{datum_place}' {zero_rule} THEN {varijabla} ELSE NULL END AS {varijabla2}_{vreme}M{zarez_place}".format(varijabla=line[0], varijabla2=line[1], datum_place=datum, vreme2=-1 * (int(line2) - 1), zero_rule=zero_rule, vreme=line2, zarez_place=zarez)) + ' \n'
        query += ("FROM\n
        …
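The column-generation loop on this slide can be restated as a small self-contained function, which may make the naming rules easier to see. This is a sketch under assumptions, not the presenters' script: the function name `aggregation_columns` and the three-element tuple layout are hypothetical, periods are taken as integers, and function names are upper-cased before use.

```python
def aggregation_columns(variables):
    """Build the SELECT-list text for the aggregated columns.

    Each item in `variables` is (var_out, functions, periods), for example
    ("TOTAL_PAYMENT_AMT", ["MAX", "AVG/P"], [3, 6]). This layout is an
    assumption made for the sketch.
    """
    columns = []
    for var_out, functions, periods in variables:
        for func in functions:
            for period in periods:
                source = f"{var_out}_{period}M"
                name = func.upper()
                if name == "AVG/P":
                    # Average per period: sum divided by the number of months.
                    expr, suffix = f"SUM({source})/{period}", "_AVG"
                elif name == "SUM" and period == 1:
                    # A one-month sum keeps the plain variable name.
                    expr, suffix = f"SUM({source})", ""
                else:
                    expr, suffix = f"{name}({source})", f"_{name}"
                columns.append(f"{expr} AS {var_out}{suffix}_{period}M")
    # Joining at the end avoids tracking "is this the last column?" by hand.
    return ",\n".join(columns)


if __name__ == "__main__":
    print(aggregation_columns([
        ("TOTAL_PAYMENT_AMT", ["MAX", "AVG/P"], [3, 6]),
        ("TOTAL_PAYMENT_CNT", ["SUM"], [1]),
    ]))
```

Collecting the columns in a list and joining them once removes the need for the trailing-comma bookkeeping (`zarez`) in the original loop.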
  • 9. Develop phase - Devote more time to the creative side
    →Improve traditional ML development processes:
      • Benefit from pre-trained models (deep learning, mainly image recognition)
      • Automated machine learning (AutoML), which is pretty good at supervised ML
    →AutoML:
      • Optimizes DS workload or compensates for lack of experience
      • Handles tasks like Feature Selection, Data Preprocessing, Hyperparameter Optimization, Model/Algorithm Selection
      • Lets you focus more on the data side
      • Is no silver bullet; it is more an exploration tool than an optimal model generation tool
    MLBox, Auto-Sklearn, TPOT, H2O AutoML, Auto Keras, Auto PyTorch, Google Cloud AutoML, DataRobot, etc.
  • 10. Deploy phase - you don't get any value out of a model sitting on someone's computer
    →The phase where the model is transferred to a production environment.
    →The same best-practice principles and design patterns used for software also apply to ML models
    →The ML model should be deployed as part of the existing data pipeline
    →The output of the ML model should be monitored for bias
    →An ML model in the deploy phase:
      • Is registered in an appropriate repository
      • Has passed testing
      • Has its model artifacts retained
    →Validate model → Publish model → Deliver model
    →Don't update Python libraries before proper testing on the development environment 😊
  • 11. Deploy phase - more than one ML model
    →Model registry:
      • Place for all trained/production-ready models (with version control)
      • Alternative models as backup
      • All model artifacts, model dependencies, evaluation metrics, documentation
      • Which dataset was used for training / model lineage
      • Log performance details of the model and comparison with other models
      • Tracking models over their whole lifetime (training, staging and production)
    →A model registry enables faster deployment of new models and retraining of current ones
    →Shared by multiple team members (team collaboration)
    →Tie together business rules and the output of the production model
    →Consume the model through API integration
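To make the registry responsibilities on this slide concrete, here is a minimal in-memory sketch. It is an illustration only: the class names, the stage names and the dataset labels are hypothetical, and a real registry would persist artifacts and metadata rather than keep them in a dictionary.

```python
from dataclasses import dataclass

STAGES = ("training", "staging", "production")


@dataclass
class ModelVersion:
    name: str
    version: int
    training_dataset: str  # lineage: which dataset the model was trained on
    metrics: dict          # evaluation metrics logged at registration time
    stage: str = "training"


class ModelRegistry:
    """Keeps every version of every model, with its stage and metrics."""

    def __init__(self):
        self._models = {}

    def register(self, name, training_dataset, metrics):
        versions = self._models.setdefault(name, [])
        entry = ModelVersion(name, len(versions) + 1, training_dataset, dict(metrics))
        versions.append(entry)
        return entry

    def promote(self, name, version, stage):
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        entry = self._models[name][version - 1]
        if stage == "production":
            # Keep one production version; the previous one drops back to
            # staging so it stays available as a backup.
            for other in self._models[name]:
                if other.stage == "production":
                    other.stage = "staging"
        entry.stage = stage
        return entry

    def production_model(self, name):
        for entry in self._models.get(name, []):
            if entry.stage == "production":
                return entry
        return None

    def best(self, name, metric):
        """Compare versions of one model on a logged metric (higher is better)."""
        return max(self._models[name], key=lambda e: e.metrics[metric])
```

Demoting the previous production version to staging, rather than deleting it, reflects the "alternative models as backup" point: a rollback is then just another `promote` call.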
  • 12. Conclusion: single pipeline for data transfer; easy deployment; Smart Generic Data Mart(s); more creative time
  • 13. Contact us at: Danijel.Ilievski@comtrade.com, Jelena.Pekez@comtrade.com, Milos.Josifovic@comtrade.com
  • 14. Q&A
  • 15. www.comtradeintegration.com Copyright © 2020 Comtrade. All rights reserved. The content of this presentation is copyright protected. Any reproduction, distribution, or modification is not allowed. The information, solutions, and opinions contained in this presentation are of informative nature only and are not intended to be a comprehensive study, nor should they be relied on or treated as a means to provide a complete solution or advice, since we may not be aware of all specific circumstances of the case. We try to provide quality information, but we make no claims, promises, or guarantees about the accuracy, completeness, or adequacy of the information contained herein. Thank you

Editor's Notes

  1. DANIJEL
  2. DANIJEL During deployment in large organizations we have to orchestrate more than one ML model, so it is best to keep in mind from the very beginning of a project that more ML models will follow, and to organize everything so that new models can be added easily. … From the very beginning, the Data Science Lifecycle should put special focus on data quality and production. Foundation for more models in the future: the development of the analytical dataset for future models can be treated as a separate project.
  3. JELENA So, if we go into more detail… When developing a model, the focus is on data preparation. Organize DB tables considering performance and optimization. Analyze which columns and important sources to add. Think through the sources and target tables, how to organize the tables with regard to performance and logic, and keep a high-level view of what the use cases are.
  4. JELENA To mention: Organize DB tables considering performance and optimization. Feature Engineering isn't about generating a higher quantity of new features. It's about the quality of the features created.
  5. DANIJEL or JELENA Domain knowledge cannot be optimized. Make an instruction file with field names and the action for handling nulls: constant value, Max(), Min(), Mean(), nearby value, regression, delete record. Domain knowledge will allow you to take the impact of your machine learning skills to a much higher level of significance. Random forests, for example, can handle heterogeneous data types right out of the box. As a Data Scientist with domain knowledge you will have the answer to the question "Which data points add value?" and you just need to find them.
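The "instruction file" idea from this note, mapping each field to a null-handling action, might look like the sketch below. Everything here is an assumption for illustration: the rule names, the example field names and the in-memory dictionary standing in for the file. Only the constant, max, min, mean and delete actions are covered; nearby-value and regression imputation are left out.

```python
from statistics import mean

# Hypothetical instruction "file": field name -> (action, optional constant).
NULL_RULES = {
    "TOTAL_PAYMENT_AMT": ("constant", 0),
    "USAGE_OUT_DUR": ("mean", None),
    "MAX_PAYMENT_AMT": ("max", None),
    "CUSTOMER_ID": ("delete", None),
}

AGGREGATES = {"mean": mean, "max": max, "min": min}


def apply_null_rules(rows, rules):
    """Return new rows with None values handled according to each field's rule."""
    # Pre-compute fill values from the non-null observations of each field.
    fill = {}
    for field_name, (action, constant) in rules.items():
        observed = [r[field_name] for r in rows if r.get(field_name) is not None]
        if action == "constant":
            fill[field_name] = constant
        elif action in AGGREGATES and observed:
            fill[field_name] = AGGREGATES[action](observed)

    cleaned = []
    for row in rows:
        # "delete" rule: drop the whole record when that field is null.
        if any(rules[f][0] == "delete" and row.get(f) is None for f in rules):
            continue
        new_row = dict(row)  # copy, so the input rows are left untouched
        for f, value in fill.items():
            if new_row.get(f) is None:
                new_row[f] = value
        cleaned.append(new_row)
    return cleaned
```

Keeping the rules in data rather than in code means a domain expert can review or change how each field is treated without touching the pipeline itself.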
  6. DANIJEL
  7. MILOS Benefit / suggestion: parallel execution; no temp data on the initial database; fast transfer; be careful about data types specified at table level
  8. DANIJEL
  9. JELENA to the end. Efficiently automate all regular, manual, and tedious workloads of ML implementations. Falls short for Feature Engineering. Can easily overfit (watch the label distribution, the number of outliers, etc.).
  10. Deploying the model as a standalone container is easier