SlideShare a Scribd company logo
Ph.D in Computer Science at ENS Paris/INRIA
Postdoctoral Fellow at Carnegie Mellon University
>500 citations, Best Paper Award at 2009 CVPR Conference
NEC Labs (Bell Labs) in Cupertino (Silicon Valley)
Senior Researcher at Intel (3 pending patents)
- Developed ML algorithms for face recognition
Invited speaker to CMU, Samsung, Tokyo Univ, SNU, etc.
Co-Founder of Solidware
Olivier Duchenne
Co-founder | Chief Machine Learning Scientist
8 years experience in Machine learning, Computer Vision and Big Data
Guidelines for using Machine Learning on real data
Avoid Common Mistakes
Understand Better the Data
1.Big Enough Data?
2.Changing Data
Machine Learning and Data Science
From Computer Vision Experience
To Solving Companies issues:
Ex: car accident prediction (insurance),
default prediction (bank),
stock value prediction
Machine Learning and Data Science
Prediction Function
Predicted Target Value
ML Algorithms analyze
historical data
to detect patterns
PAST DATA
(Training Data Set)
Internal Data
Ex: Age, Gender
External Data
Ex: Web Crawl
Target Value
Machine-Learning based Predictive Modeling
Newly Incoming Data
Unknown
Target Value
Internal Data External Data
1. Prediction Function. Ex: a linear function, a neural net,…
2. The prediction function is parametrized. Ex: 𝐟 𝜶 𝐗 = 𝜶𝒊 𝑿𝒊𝒊
3. The goal is to find the best prediction function, i.e. the best
parameters.
4. We build an objective function, that represents how good a
prediction function is.
5. The objective function always has a data term. Ex: 𝐨𝐛𝐣 𝜶 =
𝒇 𝜶 𝑿 𝒔 − 𝒀 𝒔 𝟐
𝒔
6. The algorithm tries to find the best parameters, that optimizes this
objective function. Ex: closed form solution, stochastic gradient
descent, …
Basic Explanation of Machine Learning
History of Machine Learning for Computer Vision
Model-Driven Mixed Data-Driven
1970s
Hand-designed Model
1980s
Alignment
Method
2000s
Deformable
Model
2010s
Conv. Network
1990s
Grid Model
Why didn’t people use ML since the beginning?
General Assumptions for the reason
1.“Better Computer” available now
2.“Better Algorithms”
3.“Amount of Data”
“We create so much data that 90% of the data in the world today has
been created in the last two years alone”
- Petter Bae Brandtzæ g, SINTEF ICT
How much data did CV Researcher use?
Image source: http://www.vision.caltech.edu/ Image source: http://doi.ieeecomputersociety.org/
2004
Caltech 101
10K Images
2005-2010
Pascal VOC
2K  30K objects
2010-2015
Image Net
10M  15M images
http://www.image-net.org/
The answer is… “Amount of Data”
Image source - Smartdatacollective.com
• Most Advanced Machine
Learning cannot be applied if
there are not enough data
• Critical mass of data is
necessary to use, for example,
deep learning
• When the amount of data
increases, the machine
learning models and, therefore,
the prediction model becomes
more complex and better
With enough data, ANY algorithms okay?
Support vector machines Bayesian networks
Regression forestSparse dictionary learning
Artificial neural networksK-Nearest neighbors
Deep learning Boosting
Deep Learning Neural Networks Log. Regression
No, it depends on the company and the problem you are trying to solve
A B C
What Changed in Machine Learning Domain
From the Past to the Present:
Synonym: Over generalizing
That is like visiting a new place during one day, seeing a mountain fire.
And believing that there are fires everyday there.
Why do we need lots of data?
Overfitting
In real life, we do not have many chances of having
clean & BIG data
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
Seoul Busan Daejeon Gwangju
Prob. To default
Prob. To default
… (many more cities)
An example: Overfitting due to lack of data
As there are many
categories,
some categories with small
data show outlier results
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
Seoul Busan Daejon Kyangju
Prob. To default
Prob. To default
… (many more cities)
So, always use error bars
You want to detect an event which occur on average with probability: p=5%
Let’s say you have many cities with ~50 samples
On average, 1/13 will have this event 0 times.
Without proper handling, the extreme case, will be all wrong.
This kind of error can happen often
How to fight against overfitting
Data
More Samples
Less Variables
Artificial Data Extension
Algorithm
Simpler Objective Function
Regularization
Bagging
Modeling
Feature Engineering
Data Normalization
Data
In Computer Vision, it is possible to extend the data.
Ex: Hiring annotator, Amazon Mechanical Turk, Google Re-Captcha
Companies often they have a limited number of samples, and cannot extend it.
Ex: A Korean Bank that gives ~100K loans per year
1. Count only positives ( Detecting rare events require more data)
Ex: Image Detection. It’s easy to find an infinite number of negatives.
Often company want to detect rare events (few positives)
Ex: predicting car accident / ad clicks / defaults / online purchase
How to count your data?
2. Difficulty of the task
How to count your data?
• Learning addition ( 𝒚 = 𝟏 ∗ 𝑿 𝟏 + 𝟏 ∗ 𝑿 𝟐 )
(Requires ~100 samples)
• Learning object recognition
( Requires ~10M samples)
3. Probabilistic event detection is harder.
What is in this image? Will this user click on a car advertisement?
Client #1: Male, 27y.o, lives in Seoul, Salary
man in the construction sector, already
previously clicked on a car advertisement
Client #2: Male, 27y.o, lives in Seoul, Salary
man in the construction sector, already
previously clicked on a car advertisement
Yes
No
How to count your data?
Algorithm
1. Many algorithms exist: GLM, Boosting, Lasso, Regression Forest, SVM,
Gaussian Process, Bayesian Networks, Deep Learning, …
2. The complexity of their prediction functions differ.
3. The more complex the prediction function is, the more it fits the data.
Purchase
Prob.
Age
Purchase
Prob.
Age
Purchase
Prob.
Age
Underfitting Overfitting
Algorithm
1. Less parameters  Less overfitting
2. More parameters  Less underfitting
3. Ex: Best of both worlds: Deep Conv Nets
Algorithm
Avoiding “Too Many Categories” problem
Busan
Seoul
Dae-
jeon
Dae
-gou
Po-
hang
In-
cheon
Soo-
won
Ul-
San
Avoiding “Too Many Categories” problem
Busan
Seoul
Dae-
jeon
Dae
-gou
Po-
hang
In-
cheon
Soo-
won
Ul-
San
Grouping
Merging
Avoiding “Too Many Categories” problem
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
1 2 3 4 5 6
Prob. To default
Prob. To default log10(population)
Regularization
𝑚𝑖𝑛 𝜃 𝑠 𝑙𝑜𝑠𝑠 𝑔𝑡 𝑠, 𝑓𝜃 𝑋𝑠 + 𝜆Ω(𝜃)
𝑚𝑖𝑛 𝜃 𝑠 𝑙𝑜𝑠𝑠 𝑔𝑡 𝑠, 𝑓𝜃 𝑋𝑠 , s.t. Ω 𝜃 < 𝜆
Ω 𝜃 =
𝜃 2
𝜃 1
Data Normalization
Removing variance that has no impact on the target value  Help the ML system to focus on meaningful variance
Deep Face (Facebook 2014), DB size: 120M images
Bagging
1. Randomly modify slightly the training set.
2. Do the training
3. Repeat
4. Average all prediction functions
• Market changes
• Law/Regulation Changes
• Collected Data changes
• Client filtering / Marketing changes
 Data change through time
 Representation of data change
• Variable names change
• Category names change
Changing Data
• Cyclic Data Changes
 Seasonality
• Trending has to be handled separately
 Interpolation – Extrapolation
Why is time so different from other variables ?
Prob.
To buy
A
smartphone
Age
Prob.
To buy
A
smartphone
Time
?
?
Interpolation Extrapolation
Time is correlated with hidden variables
Cost for car
insurance
(one type of
insurance)
Time
New Law
Change causes can be unknown, but consistant
Cost for car
insurance
(one type of
insurance)
Time
Seasonality
Cost for car
insurance
(one type of
insurance)
Time
Changing Data Representation
• Collected Data changes
• Category splitting, merging
• Variable names change
• Category names change
Job Applications: contact@solidware.io
Visit our booth 
Thank you
Visit our website: solidware.io

More Related Content

What's hot

FCN-Based 6D Robotic Grasping for Arbitrary Placed Objects
FCN-Based 6D Robotic Grasping for Arbitrary Placed ObjectsFCN-Based 6D Robotic Grasping for Arbitrary Placed Objects
FCN-Based 6D Robotic Grasping for Arbitrary Placed Objects
Kusano Hitoshi
 
[241]large scale search with polysemous codes
[241]large scale search with polysemous codes[241]large scale search with polysemous codes
[241]large scale search with polysemous codes
NAVER D2
 
20190927 generative models_aia
20190927 generative models_aia20190927 generative models_aia
20190927 generative models_aia
Yi-Fan Liou
 
A Scalable Implementation of Deep Learning on Spark (Alexander Ulanov)
A Scalable Implementation of Deep Learning on Spark (Alexander Ulanov)A Scalable Implementation of Deep Learning on Spark (Alexander Ulanov)
A Scalable Implementation of Deep Learning on Spark (Alexander Ulanov)
Alexander Ulanov
 
Machine teaching tbo_20190518
Machine teaching tbo_20190518Machine teaching tbo_20190518
Machine teaching tbo_20190518
Yi-Fan Liou
 
Introduction to Neural Networks in Tensorflow
Introduction to Neural Networks in TensorflowIntroduction to Neural Networks in Tensorflow
Introduction to Neural Networks in Tensorflow
Nicholas McClure
 
深層学習フレームワーク概要とChainerの事例紹介
深層学習フレームワーク概要とChainerの事例紹介深層学習フレームワーク概要とChainerの事例紹介
深層学習フレームワーク概要とChainerの事例紹介
Kenta Oono
 
Chainer v3
Chainer v3Chainer v3
Chainer v3
Seiya Tokui
 
Deep Learning through Examples
Deep Learning through ExamplesDeep Learning through Examples
Deep Learning through Examples
Sri Ambati
 
Intro to Machine Learning for GPUs
Intro to Machine Learning for GPUsIntro to Machine Learning for GPUs
Intro to Machine Learning for GPUs
Sri Ambati
 
200612_BioPackathon_ss
200612_BioPackathon_ss200612_BioPackathon_ss
200612_BioPackathon_ss
Satoshi Kume
 
Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow
Jen Aman
 
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves MabialaDeep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Spark Summit
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computingbutest
 
Applying your Convolutional Neural Networks
Applying your Convolutional Neural NetworksApplying your Convolutional Neural Networks
Applying your Convolutional Neural Networks
Databricks
 
[AI07] Revolutionizing Image Processing with Cognitive Toolkit
[AI07] Revolutionizing Image Processing with Cognitive Toolkit[AI07] Revolutionizing Image Processing with Cognitive Toolkit
[AI07] Revolutionizing Image Processing with Cognitive Toolkit
de:code 2017
 
IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」
Preferred Networks
 
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
Amazon Web Services
 
Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Kenta Oono
 
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305mjfrankli
 

What's hot (20)

FCN-Based 6D Robotic Grasping for Arbitrary Placed Objects
FCN-Based 6D Robotic Grasping for Arbitrary Placed ObjectsFCN-Based 6D Robotic Grasping for Arbitrary Placed Objects
FCN-Based 6D Robotic Grasping for Arbitrary Placed Objects
 
[241]large scale search with polysemous codes
[241]large scale search with polysemous codes[241]large scale search with polysemous codes
[241]large scale search with polysemous codes
 
20190927 generative models_aia
20190927 generative models_aia20190927 generative models_aia
20190927 generative models_aia
 
A Scalable Implementation of Deep Learning on Spark (Alexander Ulanov)
A Scalable Implementation of Deep Learning on Spark (Alexander Ulanov)A Scalable Implementation of Deep Learning on Spark (Alexander Ulanov)
A Scalable Implementation of Deep Learning on Spark (Alexander Ulanov)
 
Machine teaching tbo_20190518
Machine teaching tbo_20190518Machine teaching tbo_20190518
Machine teaching tbo_20190518
 
Introduction to Neural Networks in Tensorflow
Introduction to Neural Networks in TensorflowIntroduction to Neural Networks in Tensorflow
Introduction to Neural Networks in Tensorflow
 
深層学習フレームワーク概要とChainerの事例紹介
深層学習フレームワーク概要とChainerの事例紹介深層学習フレームワーク概要とChainerの事例紹介
深層学習フレームワーク概要とChainerの事例紹介
 
Chainer v3
Chainer v3Chainer v3
Chainer v3
 
Deep Learning through Examples
Deep Learning through ExamplesDeep Learning through Examples
Deep Learning through Examples
 
Intro to Machine Learning for GPUs
Intro to Machine Learning for GPUsIntro to Machine Learning for GPUs
Intro to Machine Learning for GPUs
 
200612_BioPackathon_ss
200612_BioPackathon_ss200612_BioPackathon_ss
200612_BioPackathon_ss
 
Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow
 
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves MabialaDeep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
Deep Recurrent Neural Networks for Sequence Learning in Spark by Yves Mabiala
 
Cloud Computing
Cloud ComputingCloud Computing
Cloud Computing
 
Applying your Convolutional Neural Networks
Applying your Convolutional Neural NetworksApplying your Convolutional Neural Networks
Applying your Convolutional Neural Networks
 
[AI07] Revolutionizing Image Processing with Cognitive Toolkit
[AI07] Revolutionizing Image Processing with Cognitive Toolkit[AI07] Revolutionizing Image Processing with Cognitive Toolkit
[AI07] Revolutionizing Image Processing with Cognitive Toolkit
 
IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」
 
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
BDT305 Transforming Big Data with Spark and Shark - AWS re: Invent 2012
 
Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...Comparison of deep learning frameworks from a viewpoint of double backpropaga...
Comparison of deep learning frameworks from a viewpoint of double backpropaga...
 
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
Transforming Big Data with Spark and Shark - AWS Re:Invent 2012 BDT 305
 

Viewers also liked

[244] 분산 환경에서 스트림과 배치 처리 통합 모델
[244] 분산 환경에서 스트림과 배치 처리 통합 모델[244] 분산 환경에서 스트림과 배치 처리 통합 모델
[244] 분산 환경에서 스트림과 배치 처리 통합 모델
NAVER D2
 
[242] wifi를 이용한 실내 장소 인식하기
[242] wifi를 이용한 실내 장소 인식하기[242] wifi를 이용한 실내 장소 인식하기
[242] wifi를 이용한 실내 장소 인식하기
NAVER D2
 
[241] Storm과 Elasticsearch를 활용한 로깅 플랫폼의 실시간 알람 시스템 구현
[241] Storm과 Elasticsearch를 활용한 로깅 플랫폼의 실시간 알람 시스템 구현[241] Storm과 Elasticsearch를 활용한 로깅 플랫폼의 실시간 알람 시스템 구현
[241] Storm과 Elasticsearch를 활용한 로깅 플랫폼의 실시간 알람 시스템 구현
NAVER D2
 
[231] the simplicity of cluster apps with circuit
[231] the simplicity of cluster apps with circuit[231] the simplicity of cluster apps with circuit
[231] the simplicity of cluster apps with circuit
NAVER D2
 
[223] h base consistent secondary indexing
[223] h base consistent secondary indexing[223] h base consistent secondary indexing
[223] h base consistent secondary indexing
NAVER D2
 
[232] 수퍼컴퓨팅과 데이터 어낼리틱스
[232] 수퍼컴퓨팅과 데이터 어낼리틱스[232] 수퍼컴퓨팅과 데이터 어낼리틱스
[232] 수퍼컴퓨팅과 데이터 어낼리틱스
NAVER D2
 
[234] 산업 현장을 위한 증강 현실 기기 daqri helmet 개발기
[234] 산업 현장을 위한 증강 현실 기기 daqri helmet 개발기[234] 산업 현장을 위한 증강 현실 기기 daqri helmet 개발기
[234] 산업 현장을 위한 증강 현실 기기 daqri helmet 개발기
NAVER D2
 
[224] 번역 모델 기반_질의_교정_시스템
[224] 번역 모델 기반_질의_교정_시스템[224] 번역 모델 기반_질의_교정_시스템
[224] 번역 모델 기반_질의_교정_시스템
NAVER D2
 
[252] 증분 처리 플랫폼 cana 개발기
[252] 증분 처리 플랫폼 cana 개발기[252] 증분 처리 플랫폼 cana 개발기
[252] 증분 처리 플랫폼 cana 개발기
NAVER D2
 
[263] s2graph large-scale-graph-database-with-hbase-2
[263] s2graph large-scale-graph-database-with-hbase-2[263] s2graph large-scale-graph-database-with-hbase-2
[263] s2graph large-scale-graph-database-with-hbase-2
NAVER D2
 
[253] apache ni fi
[253] apache ni fi[253] apache ni fi
[253] apache ni fi
NAVER D2
 
[233] level 2 network programming using packet ngin rtos
[233] level 2 network programming using packet ngin rtos[233] level 2 network programming using packet ngin rtos
[233] level 2 network programming using packet ngin rtos
NAVER D2
 
[262] netflix 빅데이터 플랫폼
[262] netflix 빅데이터 플랫폼[262] netflix 빅데이터 플랫폼
[262] netflix 빅데이터 플랫폼
NAVER D2
 
[212] large scale backend service develpment
[212] large scale backend service develpment[212] large scale backend service develpment
[212] large scale backend service develpment
NAVER D2
 
[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기
NAVER D2
 
[211] 네이버 검색과 데이터마이닝
[211] 네이버 검색과 데이터마이닝[211] 네이버 검색과 데이터마이닝
[211] 네이버 검색과 데이터마이닝
NAVER D2
 
[214] data science with apache zeppelin
[214] data science with apache zeppelin[214] data science with apache zeppelin
[214] data science with apache zeppelin
NAVER D2
 
[222]대화 시스템 서비스 동향 및 개발 방법
[222]대화 시스템 서비스 동향 및 개발 방법[222]대화 시스템 서비스 동향 및 개발 방법
[222]대화 시스템 서비스 동향 및 개발 방법
NAVER D2
 
[213] ethereum
[213] ethereum[213] ethereum
[213] ethereum
NAVER D2
 
[264] large scale deep-learning_on_spark
[264] large scale deep-learning_on_spark[264] large scale deep-learning_on_spark
[264] large scale deep-learning_on_spark
NAVER D2
 

Viewers also liked (20)

[244] 분산 환경에서 스트림과 배치 처리 통합 모델
[244] 분산 환경에서 스트림과 배치 처리 통합 모델[244] 분산 환경에서 스트림과 배치 처리 통합 모델
[244] 분산 환경에서 스트림과 배치 처리 통합 모델
 
[242] wifi를 이용한 실내 장소 인식하기
[242] wifi를 이용한 실내 장소 인식하기[242] wifi를 이용한 실내 장소 인식하기
[242] wifi를 이용한 실내 장소 인식하기
 
[241] Storm과 Elasticsearch를 활용한 로깅 플랫폼의 실시간 알람 시스템 구현
[241] Storm과 Elasticsearch를 활용한 로깅 플랫폼의 실시간 알람 시스템 구현[241] Storm과 Elasticsearch를 활용한 로깅 플랫폼의 실시간 알람 시스템 구현
[241] Storm과 Elasticsearch를 활용한 로깅 플랫폼의 실시간 알람 시스템 구현
 
[231] the simplicity of cluster apps with circuit
[231] the simplicity of cluster apps with circuit[231] the simplicity of cluster apps with circuit
[231] the simplicity of cluster apps with circuit
 
[223] h base consistent secondary indexing
[223] h base consistent secondary indexing[223] h base consistent secondary indexing
[223] h base consistent secondary indexing
 
[232] 수퍼컴퓨팅과 데이터 어낼리틱스
[232] 수퍼컴퓨팅과 데이터 어낼리틱스[232] 수퍼컴퓨팅과 데이터 어낼리틱스
[232] 수퍼컴퓨팅과 데이터 어낼리틱스
 
[234] 산업 현장을 위한 증강 현실 기기 daqri helmet 개발기
[234] 산업 현장을 위한 증강 현실 기기 daqri helmet 개발기[234] 산업 현장을 위한 증강 현실 기기 daqri helmet 개발기
[234] 산업 현장을 위한 증강 현실 기기 daqri helmet 개발기
 
[224] 번역 모델 기반_질의_교정_시스템
[224] 번역 모델 기반_질의_교정_시스템[224] 번역 모델 기반_질의_교정_시스템
[224] 번역 모델 기반_질의_교정_시스템
 
[252] 증분 처리 플랫폼 cana 개발기
[252] 증분 처리 플랫폼 cana 개발기[252] 증분 처리 플랫폼 cana 개발기
[252] 증분 처리 플랫폼 cana 개발기
 
[263] s2graph large-scale-graph-database-with-hbase-2
[263] s2graph large-scale-graph-database-with-hbase-2[263] s2graph large-scale-graph-database-with-hbase-2
[263] s2graph large-scale-graph-database-with-hbase-2
 
[253] apache ni fi
[253] apache ni fi[253] apache ni fi
[253] apache ni fi
 
[233] level 2 network programming using packet ngin rtos
[233] level 2 network programming using packet ngin rtos[233] level 2 network programming using packet ngin rtos
[233] level 2 network programming using packet ngin rtos
 
[262] netflix 빅데이터 플랫폼
[262] netflix 빅데이터 플랫폼[262] netflix 빅데이터 플랫폼
[262] netflix 빅데이터 플랫폼
 
[212] large scale backend service develpment
[212] large scale backend service develpment[212] large scale backend service develpment
[212] large scale backend service develpment
 
[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기
 
[211] 네이버 검색과 데이터마이닝
[211] 네이버 검색과 데이터마이닝[211] 네이버 검색과 데이터마이닝
[211] 네이버 검색과 데이터마이닝
 
[214] data science with apache zeppelin
[214] data science with apache zeppelin[214] data science with apache zeppelin
[214] data science with apache zeppelin
 
[222]대화 시스템 서비스 동향 및 개발 방법
[222]대화 시스템 서비스 동향 및 개발 방법[222]대화 시스템 서비스 동향 및 개발 방법
[222]대화 시스템 서비스 동향 및 개발 방법
 
[213] ethereum
[213] ethereum[213] ethereum
[213] ethereum
 
[264] large scale deep-learning_on_spark
[264] large scale deep-learning_on_spark[264] large scale deep-learning_on_spark
[264] large scale deep-learning_on_spark
 

Similar to [243] turning data into value

Machine learning at b.e.s.t. summer university
Machine learning  at b.e.s.t. summer universityMachine learning  at b.e.s.t. summer university
Machine learning at b.e.s.t. summer university
László Kovács
 
The math behind big systems analysis.
The math behind big systems analysis.The math behind big systems analysis.
The math behind big systems analysis.
Theo Schlossnagle
 
notes as .ppt
notes as .pptnotes as .ppt
notes as .pptbutest
 
Using synthetic data for computer vision model training
Using synthetic data for computer vision model trainingUsing synthetic data for computer vision model training
Using synthetic data for computer vision model training
Unity Technologies
 
machine learning in the age of big data: new approaches and business applicat...
machine learning in the age of big data: new approaches and business applicat...machine learning in the age of big data: new approaches and business applicat...
machine learning in the age of big data: new approaches and business applicat...
Armando Vieira
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273Abutest
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273Abutest
 
(In)convenient truths about applied machine learning
(In)convenient truths about applied machine learning(In)convenient truths about applied machine learning
(In)convenient truths about applied machine learning
Max Pagels
 
DataScience_introduction.pdf
DataScience_introduction.pdfDataScience_introduction.pdf
DataScience_introduction.pdf
SouravBiswas747273
 
Data Science in the Real World: Making a Difference
Data Science in the Real World: Making a Difference Data Science in the Real World: Making a Difference
Data Science in the Real World: Making a Difference
Srinath Perera
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Dr. Sunil Kr. Pandey
 
ML basics.pptx
ML basics.pptxML basics.pptx
ML basics.pptx
PriyadharshiniG41
 
Say "Hi!" to Your New Boss
Say "Hi!" to Your New BossSay "Hi!" to Your New Boss
Say "Hi!" to Your New Boss
Andreas Dewes
 
Artificial intelligence - A Teaser to the Topic.
Artificial intelligence - A Teaser to the Topic.Artificial intelligence - A Teaser to the Topic.
Artificial intelligence - A Teaser to the Topic.
Dr. Kim (Kyllesbech Larsen)
 
Intelligent Big Data analytics for the future.
Intelligent Big Data analytics for the future.Intelligent Big Data analytics for the future.
Intelligent Big Data analytics for the future.
Shashank Garg
 
Big Data & Machine Learning - TDC2013 São Paulo - 12/0713
Big Data & Machine Learning - TDC2013 São Paulo - 12/0713Big Data & Machine Learning - TDC2013 São Paulo - 12/0713
Big Data & Machine Learning - TDC2013 São Paulo - 12/0713
Mathieu DESPRIEE
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learning
Govind Mudumbai
 
Modex Talks - AI Conceptual Overview
Modex Talks - AI Conceptual OverviewModex Talks - AI Conceptual Overview
Modex Talks - AI Conceptual Overview
Modex
 
Big Data & Machine Learning - TDC2013 Sao Paulo
Big Data & Machine Learning - TDC2013 Sao PauloBig Data & Machine Learning - TDC2013 Sao Paulo
Big Data & Machine Learning - TDC2013 Sao Paulo
OCTO Technology
 
Gary Hope - Machine Learning: It's Not as Hard as you Think
Gary Hope - Machine Learning: It's Not as Hard as you ThinkGary Hope - Machine Learning: It's Not as Hard as you Think
Gary Hope - Machine Learning: It's Not as Hard as you Think
Saratoga
 

Similar to [243] turning data into value (20)

Machine learning at b.e.s.t. summer university
Machine learning  at b.e.s.t. summer universityMachine learning  at b.e.s.t. summer university
Machine learning at b.e.s.t. summer university
 
The math behind big systems analysis.
The math behind big systems analysis.The math behind big systems analysis.
The math behind big systems analysis.
 
notes as .ppt
notes as .pptnotes as .ppt
notes as .ppt
 
Using synthetic data for computer vision model training
Using synthetic data for computer vision model trainingUsing synthetic data for computer vision model training
Using synthetic data for computer vision model training
 
machine learning in the age of big data: new approaches and business applicat...
machine learning in the age of big data: new approaches and business applicat...machine learning in the age of big data: new approaches and business applicat...
machine learning in the age of big data: new approaches and business applicat...
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
 
(In)convenient truths about applied machine learning
(In)convenient truths about applied machine learning(In)convenient truths about applied machine learning
(In)convenient truths about applied machine learning
 
DataScience_introduction.pdf
DataScience_introduction.pdfDataScience_introduction.pdf
DataScience_introduction.pdf
 
Data Science in the Real World: Making a Difference
Data Science in the Real World: Making a Difference Data Science in the Real World: Making a Difference
Data Science in the Real World: Making a Difference
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
 
ML basics.pptx
ML basics.pptxML basics.pptx
ML basics.pptx
 
Say "Hi!" to Your New Boss
Say "Hi!" to Your New BossSay "Hi!" to Your New Boss
Say "Hi!" to Your New Boss
 
Artificial intelligence - A Teaser to the Topic.
Artificial intelligence - A Teaser to the Topic.Artificial intelligence - A Teaser to the Topic.
Artificial intelligence - A Teaser to the Topic.
 
Intelligent Big Data analytics for the future.
Intelligent Big Data analytics for the future.Intelligent Big Data analytics for the future.
Intelligent Big Data analytics for the future.
 
Big Data & Machine Learning - TDC2013 São Paulo - 12/0713
Big Data & Machine Learning - TDC2013 São Paulo - 12/0713Big Data & Machine Learning - TDC2013 São Paulo - 12/0713
Big Data & Machine Learning - TDC2013 São Paulo - 12/0713
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learning
 
Modex Talks - AI Conceptual Overview
Modex Talks - AI Conceptual OverviewModex Talks - AI Conceptual Overview
Modex Talks - AI Conceptual Overview
 
Big Data & Machine Learning - TDC2013 Sao Paulo
Big Data & Machine Learning - TDC2013 Sao PauloBig Data & Machine Learning - TDC2013 Sao Paulo
Big Data & Machine Learning - TDC2013 Sao Paulo
 
Gary Hope - Machine Learning: It's Not as Hard as you Think
Gary Hope - Machine Learning: It's Not as Hard as you ThinkGary Hope - Machine Learning: It's Not as Hard as you Think
Gary Hope - Machine Learning: It's Not as Hard as you Think
 

More from NAVER D2

[211] 인공지능이 인공지능 챗봇을 만든다
[211] 인공지능이 인공지능 챗봇을 만든다[211] 인공지능이 인공지능 챗봇을 만든다
[211] 인공지능이 인공지능 챗봇을 만든다
NAVER D2
 
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
NAVER D2
 
[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기
NAVER D2
 
[245]Papago Internals: 모델분석과 응용기술 개발
[245]Papago Internals: 모델분석과 응용기술 개발[245]Papago Internals: 모델분석과 응용기술 개발
[245]Papago Internals: 모델분석과 응용기술 개발
NAVER D2
 
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
NAVER D2
 
[235]Wikipedia-scale Q&A
[235]Wikipedia-scale Q&A[235]Wikipedia-scale Q&A
[235]Wikipedia-scale Q&A
NAVER D2
 
[244]로봇이 현실 세계에 대해 학습하도록 만들기
[244]로봇이 현실 세계에 대해 학습하도록 만들기[244]로봇이 현실 세계에 대해 학습하도록 만들기
[244]로봇이 현실 세계에 대해 학습하도록 만들기
NAVER D2
 
[243] Deep Learning to help student’s Deep Learning
[243] Deep Learning to help student’s Deep Learning[243] Deep Learning to help student’s Deep Learning
[243] Deep Learning to help student’s Deep Learning
NAVER D2
 
[234]Fast & Accurate Data Annotation Pipeline for AI applications
[234]Fast & Accurate Data Annotation Pipeline for AI applications[234]Fast & Accurate Data Annotation Pipeline for AI applications
[234]Fast & Accurate Data Annotation Pipeline for AI applications
NAVER D2
 
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load BalancingOld version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
NAVER D2
 
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
NAVER D2
 
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
NAVER D2
 
[224]네이버 검색과 개인화
[224]네이버 검색과 개인화[224]네이버 검색과 개인화
[224]네이버 검색과 개인화
NAVER D2
 
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
NAVER D2
 
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
NAVER D2
 
[213] Fashion Visual Search
[213] Fashion Visual Search[213] Fashion Visual Search
[213] Fashion Visual Search
NAVER D2
 
[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화
NAVER D2
 
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
NAVER D2
 
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
NAVER D2
 
[223]기계독해 QA: 검색인가, NLP인가?
[223]기계독해 QA: 검색인가, NLP인가?[223]기계독해 QA: 검색인가, NLP인가?
[223]기계독해 QA: 검색인가, NLP인가?
NAVER D2
 

More from NAVER D2 (20)

[211] 인공지능이 인공지능 챗봇을 만든다
[211] 인공지능이 인공지능 챗봇을 만든다[211] 인공지능이 인공지능 챗봇을 만든다
[211] 인공지능이 인공지능 챗봇을 만든다
 
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
[233] 대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing: Maglev Hashing Scheduler i...
 
[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기[215] Druid로 쉽고 빠르게 데이터 분석하기
[215] Druid로 쉽고 빠르게 데이터 분석하기
 
[245]Papago Internals: 모델분석과 응용기술 개발
[245]Papago Internals: 모델분석과 응용기술 개발[245]Papago Internals: 모델분석과 응용기술 개발
[245]Papago Internals: 모델분석과 응용기술 개발
 
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
[236] 스트림 저장소 최적화 이야기: 아파치 드루이드로부터 얻은 교훈
 
[235]Wikipedia-scale Q&A
[235]Wikipedia-scale Q&A[235]Wikipedia-scale Q&A
[235]Wikipedia-scale Q&A
 
[244]로봇이 현실 세계에 대해 학습하도록 만들기
[244]로봇이 현실 세계에 대해 학습하도록 만들기[244]로봇이 현실 세계에 대해 학습하도록 만들기
[244]로봇이 현실 세계에 대해 학습하도록 만들기
 
[243] Deep Learning to help student’s Deep Learning
[243] Deep Learning to help student’s Deep Learning[243] Deep Learning to help student’s Deep Learning
[243] Deep Learning to help student’s Deep Learning
 
[234]Fast & Accurate Data Annotation Pipeline for AI applications
[234]Fast & Accurate Data Annotation Pipeline for AI applications[234]Fast & Accurate Data Annotation Pipeline for AI applications
[234]Fast & Accurate Data Annotation Pipeline for AI applications
 
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load BalancingOld version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
Old version: [233]대형 컨테이너 클러스터에서의 고가용성 Network Load Balancing
 
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
[226]NAVER 광고 deep click prediction: 모델링부터 서빙까지
 
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
[225]NSML: 머신러닝 플랫폼 서비스하기 & 모델 튜닝 자동화하기
 
[224]네이버 검색과 개인화
[224]네이버 검색과 개인화[224]네이버 검색과 개인화
[224]네이버 검색과 개인화
 
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
[216]Search Reliability Engineering (부제: 지진에도 흔들리지 않는 네이버 검색시스템)
 
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
[214] Ai Serving Platform: 하루 수 억 건의 인퍼런스를 처리하기 위한 고군분투기
 
[213] Fashion Visual Search
[213] Fashion Visual Search[213] Fashion Visual Search
[213] Fashion Visual Search
 
[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화[232] TensorRT를 활용한 딥러닝 Inference 최적화
[232] TensorRT를 활용한 딥러닝 Inference 최적화
 
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
[242]컴퓨터 비전을 이용한 실내 지도 자동 업데이트 방법: 딥러닝을 통한 POI 변화 탐지
 
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
[212]C3, 데이터 처리에서 서빙까지 가능한 하둡 클러스터
 
[223]기계독해 QA: 검색인가, NLP인가?
[223]기계독해 QA: 검색인가, NLP인가?[223]기계독해 QA: 검색인가, NLP인가?
[223]기계독해 QA: 검색인가, NLP인가?
 

Recently uploaded

20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
Pierluigi Pugliese
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Paige Cruz
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Nexer Digital
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
Uni Systems S.M.S.A.
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
Neo4j
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 

Recently uploaded (20)

20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Microsoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdfMicrosoft - Power Platform_G.Aspiotis.pdf
Microsoft - Power Platform_G.Aspiotis.pdf
 
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
GraphSummit Singapore | Graphing Success: Revolutionising Organisational Stru...
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 

[243] turning data into value

  • 1.
  • 2. Ph.D in Computer Science at ENS Paris/INRIA Postdoctoral Fellow at Carnegie Mellon University >500 citations, Best Paper Award at 2009 CVPR Conference NEC Labs (Bell Labs) in Cupertino (Silicon Valley) Senior Researcher at Intel (3 pending patents) - Developed ML algorithms for face recognition Invited speaker to CMU, Samsung, Tokyo Univ, SNU, etc. Co-Founder of Solidware Olivier Duchenne Co-founder | Chief Machine Learning Scientist 8 years experience in Machine learning, Computer Vision and Big Data
  • 3. Guidelines for using Machine Learning on real data Avoid Common Mistakes Understand Better the Data 1.Big Enough Data? 2.Changing Data Machine Learning and Data Science
  • 4. From Computer Vision Experience To Solving Companies issues: Ex: car accident prediction (insurance), default prediction (bank), stock value prediction Machine Learning and Data Science
  • 5. Prediction Function Predicted Target Value ML Algorithms analyze historical data to detect patterns PAST DATA (Training Data Set) Internal Data Ex: Age, Gender External Data Ex: Web Crawl Target Value Machine-Learning based Predictive Modeling Newly Incoming Data Unknown Target Value Internal Data External Data
  • 6. 1. Prediction Function. Ex: a linear function, a neural net,… 2. The prediction function is parametrized. Ex: 𝐟 𝜶 𝐗 = 𝜶𝒊 𝑿𝒊𝒊 3. The goal is to find the best prediction function, i.e. the best parameters. 4. We build an objective function, that represents how good a prediction function is. 5. The objective function always has a data term. Ex: 𝐨𝐛𝐣 𝜶 = 𝒇 𝜶 𝑿 𝒔 − 𝒀 𝒔 𝟐 𝒔 6. The algorithm tries to find the best parameters, that optimizes this objective function. Ex: closed form solution, stochastic gradient descent, … Basic Explanation of Machine Learning
  • 7. History of Machine Learning for Computer Vision Model-Driven Mixed Data-Driven 1970s Hand-designed Model 1980s Alignment Method 2000s Deformable Model 2010s Conv. Network 1990s Grid Model
  • 8. Why didn’t people use ML since the beginning? General Assumptions for the reason 1.“Better Computer” available now 2.“Better Algorithms” 3.“Amount of Data” “We create so much data that 90% of the data in the world today has been created in the last two years alone” - Petter Bae Brandtzæ g, SINTEF ICT
  • 9. How much data did CV Researcher use? Image source: http://www.vision.caltech.edu/ Image source: http://doi.ieeecomputersociety.org/ 2004 Caltech 101 10K Images 2005-2010 Pascal VOC 2K  30K objects 2010-2015 Image Net 10M  15M images http://www.image-net.org/
  • 10. The answer is… “Amount of Data” Image source - Smartdatacollective.com • Most Advanced Machine Learning cannot be applied if there are not enough data • Critical mass of data is necessary to use, for example, deep learning • When the amount of data increases, the machine learning models and, therefore, the prediction model becomes more complex and better
  • 11. With enough data, ANY algorithms okay? Support vector machines Bayesian networks Regression forestSparse dictionary learning Artificial neural networksK-Nearest neighbors Deep learning Boosting Deep Learning Neural Networks Log. Regression No, it depends on the company and the problem you are trying to solve A B C
  • 12. What Changed in Machine Learning Domain From the Past to the Present:
  • 13. Synonym: Over generalizing That is like visiting a new place during one day, seeing a mountain fire. And believing that there are fires everyday there. Why do we need lots of data? Overfitting In real life, we do not have many chances of having clean & BIG data
  • 14. 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 Seoul Busan Daejeon Gwangju Prob. To default Prob. To default … (many more cities) An example: Overfitting due to lack of data As there are many categories, some categories with small data show outlier results
  • 15. 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 Seoul Busan Daejon Kyangju Prob. To default Prob. To default … (many more cities) So, always use error bars
  • 16. You want to detect an event which occur on average with probability: p=5% Let’s say you have many cities with ~50 samples On average, 1/13 will have this event 0 times. Without proper handling, the extreme case, will be all wrong. This kind of error can happen often
  • 17. How to fight against overfitting Data More Samples Less Variables Artificial Data Extension Algorithm Simpler Objective Function Regularization Bagging Modeling Feature Engineering Data Normalization
  • 18. Data In Computer Vision, it is possible to extend the data. Ex: Hiring annotator, Amazon Mechanical Turk, Google Re-Captcha Companies often they have a limited number of samples, and cannot extend it. Ex: A Korean Bank that gives ~100K loans per year
  • 19. 1. Count only positives ( Detecting rare events require more data) Ex: Image Detection. It’s easy to find an infinite number of negatives. Often company want to detect rare events (few positives) Ex: predicting car accident / ad clicks / defaults / online purchase How to count your data?
  • 20. 2. Difficulty of the task How to count your data? • Learning addition ( 𝒚 = 𝟏 ∗ 𝑿 𝟏 + 𝟏 ∗ 𝑿 𝟐 ) (Requires ~100 samples) • Learning object recognition ( Requires ~10M samples)
  • 21. 3. Probabilistic event detection is harder. What is in this image? Will this user click on a car advertisement? Client #1: Male, 27y.o, lives in Seoul, Salary man in the construction sector, already previously clicked on a car advertisement Client #2: Male, 27y.o, lives in Seoul, Salary man in the construction sector, already previously clicked on a car advertisement Yes No How to count your data?
  • 22. Algorithm 1. Many algorithms exist: GLM, Boosting, Lasso, Regression Forest, SVM, Gaussian Process, Bayesian Networks, Deep Learning, … 2. The complexity of their prediction functions differ. 3. The more complex the prediction function is, the more it fits the data. Purchase Prob. Age Purchase Prob. Age Purchase Prob. Age Underfitting Overfitting Algorithm
  • 23. 1. Less parameters  Less overfitting 2. More parameters  Less underfitting 3. Ex: Best of both worlds: Deep Conv Nets Algorithm
  • 24. Avoiding “Too Many Categories” problem Busan Seoul Dae- jeon Dae -gou Po- hang In- cheon Soo- won Ul- San
  • 25. Avoiding “Too Many Categories” problem Busan Seoul Dae- jeon Dae -gou Po- hang In- cheon Soo- won Ul- San Grouping Merging
  • 26. Avoiding “Too Many Categories” problem 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 1 2 3 4 5 6 Prob. To default Prob. To default log10(population)
  • 27. Regularization 𝑚𝑖𝑛 𝜃 𝑠 𝑙𝑜𝑠𝑠 𝑔𝑡 𝑠, 𝑓𝜃 𝑋𝑠 + 𝜆Ω(𝜃) 𝑚𝑖𝑛 𝜃 𝑠 𝑙𝑜𝑠𝑠 𝑔𝑡 𝑠, 𝑓𝜃 𝑋𝑠 , s.t. Ω 𝜃 < 𝜆 Ω 𝜃 = 𝜃 2 𝜃 1
  • 28. Data Normalization Removing variance that has no impact on the target value  Help the ML system to focus on meaningful variance Deep Face (Facebook 2014), DB size: 120M images
  • 29. Bagging 1. Randomly modify slightly the training set. 2. Do the training 3. Repeat 4. Average all prediction functions
  • 30. • Market changes • Law/Regulation Changes • Collected Data changes • Client filtering / Marketing changes  Data change through time  Representation of data change • Variable names change • Category names change Changing Data • Cyclic Data Changes  Seasonality • Trending has to be handled separately  Interpolation – Extrapolation
  • 31. Why is time so different from other variables ? Prob. To buy A smartphone Age Prob. To buy A smartphone Time ? ? Interpolation Extrapolation
  • 32. Time is correlated with hidden variables Cost for car insurance (one type of insurance) Time New Law
  • 33. Change causes can be unknown, but consistant Cost for car insurance (one type of insurance) Time
  • 34. Seasonality Cost for car insurance (one type of insurance) Time
  • 35. Changing Data Representation • Collected Data changes • Category splitting, merging • Variable names change • Category names change
  • 36. Job Applications: contact@solidware.io Visit our booth  Thank you Visit our website: solidware.io