SlideShare a Scribd company logo
Dmitry	Larko
Sr Data	Scientist
H2O.ai
@DmitryLarko
Feature	Engineering	
for	ML
About	me
6x	
"Coming up with features is difficult, time-consuming, requires expert knowledge. “Applied machine
learning” is basically feature engineering." ~ Andrew Ng
Feature	Engineering
“… some machine learning projects succeed and some fail. What makes the difference? Easily the
most important factor is the features used.” ~ Pedro Domingos
“Good data preparation and feature engineering is integral to better prediction” ~ Marios Michailidis
(KazAnova), Kaggle GrandMaster, Kaggle #3, former #1
”you have to turn your inputs into things the algorithm can understand” ~ Shayne Miel, answer to
“What is the intuitive explanation of feature engineering in machine learning?”
What	is	feature	engineering
6x	
Not	possible	to	separate	using	linear	classifier
What	is	feature	engineering
6x	
What	if	we	use	polar	coordinates	
instead?
6x	
What	is	feature	engineering
6x
6x	
Feature	Engineering
6x	
Your	data
6x	
Your	data
6x	
Target
6x	
Your	data
6x	
Target Categorical Cat
6x	
Your	data
6x	
Target Categorical CatNum
6x	
Feature	Engineering
6x	
• Extract	more	new	gold	features,	remove	irrelevant	
or	noisy	features	
• Simpler	models	with	better	results
• Key	Elements
• Numerical	features
• Categorical	features
• Target	Transformation
6x	
Numerical	features	– missing	values
6x	
• Some	machine	learning	tools	cannot	accept	NAs	in	the	input	
• Binary	features
• -1	for	negatives,	0	for	missing	values	and	1	for	positives
• Numeric	features
• Tree-based	methods
• Encode	as	a	big	positive	or	negative	number
• 999,	-99999,	….
• 2*max(x),	2*min(x)
• Linear,	neural	nets,	etc.	methods
• Encode	by	splitting	into	2	columns:
• Binary	column	isNA (0	if	not	and	1	if	yes)
• In	original	column	replace	NAs	by	mean	or	median
6x	
Numerical	features	– missing	values
6x	
Binary Numerical
1 NaN
0 6
1 5
NaN 8
0 0
0 NaN
1 9
NaN 3
0 5
Binary
Numerical
ver 1: 2*max
ver 2: mean and isNA
Value isNA
1 18 5.14 1
-1 6 6 0
1 5 5 0
0 8 8 0
-1 0 0 0
-1 18 5.14 1
1 9 9 0
0 3 3 0
-1 5 5 0
6x	
Categorical	features	– missing	values
6x	
• Encode	as	an	unique	category
• “Unknown”,	“Missing”,	….
• Use	the	most	frequent	category	level
Feature 1
Encoded
Feature 1
Encoded
Feature 1
A A A
A A A
NaN MISSING A
A A A
B B B
B B B
NaN MISSING A
C C C
C C C
6x	
Categorical	Encoding
6x	
• Turn	categorical	features	into	numeric	features	to	
provide	more	fine-grained	information
• Help	explicitly	capture	non-linear	relationships	and	
interactions	between	the	values	of	features
• Most	of	machine	learning	tools	only	accept	numbers	as	
their	input
• xgboost,	gbm,	glmnet,	libsvm,	liblinear,	etc.
6x	
Categorical	Encoding
6x	
• Labeled	Encoding
• Interpret	the	categories	as	ordered	integers	(mostly	
wrong)
• Python	scikit-learn:	LabelEncoder
• Ok	for	tree-based	methods
• One	Hot	Encoding
• Transform	categories	into	individual	binary	(0	or	1)	
features
• Python	scikit-learn:	DictVectorizer,	OneHotEncoder
• Ok	for	K-means,	Linear,	NNs,	etc.
6x	
Categorical	Encoding
• Labeled	Encoding
Feature 1 Encoded Feature 1
A 0
A 0
A 0
A 0
B 1
B 1
B 1
C 2
C 2
A 0
B 1
C 2
6x	
Categorical	Encoding
6x	
• One	Hot	Encoding
Feature Feature = A Feature = B Feature = C
A 1 0 0
A 1 0 0
A 1 0 0
A 1 0 0
B 0 1 0
B 0 1 0
B 0 1 0
C 0 0 1
C 0 0 1
A 1 0 0
B 0 1 0
C 0 0 1
Categorical	Encoding
6x	
• Frequency	Encoding
• Encoding	of	categorical	levels	of	feature	to	values	
between	0	and	1	based	on	their	relative	frequency
Feature Encoded Feature
A 0.44
A 0.44
A 0.44
A 0.44
B 0.33
B 0.33
B 0.33
C 0.22
C 0.22
A 0.44 (4 out of 9)
B 0.33 (3 out of 9)
C 0.22 (2 out of 9)
Categorcial Encoding	- Target	mean	encoding
6x	
• Instead	of	dummy	encoding	of	categorical	variables	and	increasing	the	
number	of	features	we	can	encode	each	level	as	the	mean	of	the	response.
Feature Outcome MeanEncode
A 1 0.75
A 0 0.75
A 1 0.75
A 1 0.75
B 1 0.66
B 1 0.66
B 0 0.66
C 1 1.00
C 1 1.00
A 0.75 (3 out of 4)
B 0.66 (2 out of 3)
C 1.00 (2 out of 2)
Categorical	Encoding	- Target	mean	encoding
6x	
• Also	it	is	better	to	calculate	weighted	average	of	the	
overall	mean	of	the	training	set	and	the	mean	of	the	
level:
𝜆 𝑛 ∗ 𝑚𝑒𝑎𝑛 𝑙𝑒𝑣𝑒𝑙 + 1 − 𝜆 𝑛 ∗ 𝑚𝑒𝑎𝑛(𝑑𝑎𝑡𝑎𝑠𝑒𝑡)
• The	weights	are	based	on	the	frequency	of	the	levels	i.e.	
if	a	category	only	appears	a	few	times	in	the	dataset	
then	its	encoded	value	will	be	close	to	the	overall	mean	
instead	of	the	mean	of	that	level.
Categorical	Encoding	– Target	mean	encoding		 𝜆 𝑛 example
6x	
x = frequency
k = inflection
point
f = steepness
https://www.desmos.com/calculator
Categorical	Encoding	- Target	mean	encoding	- Smoothing
6x	
Feature Outcome
A 1
A 0
A 1
A 1
B 1
B 1
B 0
C 1
C 1
x level dataset 𝜆
A 4 0.75 0.77 0.99 0.99*0.75 + 0.01*0.77 = 0.7502
B 3 0.66 0.77 0.98 0.98*0.66 + 0.02*0.77 = 0.6622
C 2 1.00 0.77 0.5 0.5*1.0 + 0.5*0.77 = 0.885
𝜆 =	
1
1 + exp	(−
𝑥 − 2
0.25
)
x level dataset 𝜆
A 4 0.75 0.77 0.98 0.98*0.75 + 0.01*0.77 = 0.7427
B 3 0.66 0.77 0.5 0.5*0.66 + 0.5*0.77 = 0.715
C 2 1.00 0.77 0.017 0.017*1.0 + 0.983*0.77 = 0.773
𝜆 =	
1
1 + exp	(−
𝑥 − 3
0.25
)
Categorical	Encoding	- Target	mean	encoding
6x	
• Instead	of	dummy	encoding	of	categorical	variables	and	increasing	the	
number	of	features	we	can	encode	each	level	as	the	mean	of	the	response.
Feature Outcome MeanEncode
A 1 0.75
A 0 0.75
A 1 0.75
A 1 0.75
B 1 0.66
B 1 0.66
B 0 0.66
C 1 1.00
C 1 1.00
A 0.75 (3 out of 4)
B 0.66 (2 out of 3)
C 1.00 (2 out of 2)
Categorical	Encoding	- Target	mean	encoding
• To	avoid	overfitting	we	could	use	leave-one-out	schema
Feature Outcome
A 1
A 0
A 1
A 1
B 1
B 1
B 0
C 1
C 1
LOOencode
0.66
Categorical	Encoding	- Target	mean	encoding
6x	
• To	avoid	overfitting	we	could	use	leave-one-out	schema
Feature Outcome
A 1
A 0
A 1
A 1
B 1
B 1
B 0
C 1
C 1
LOOencode
0.66
1.00
Categorical	Encoding	- Target	mean	encoding
6x	
• To	avoid	overfitting	we	could	use	leave-one-out	schema
Feature Outcome
A 1
A 0
A 1
A 1
B 1
B 1
B 0
C 1
C 1
LOOencode
0.66
1.00
0.66
Categorical	Encoding	- Target	mean	encoding
6x	
• To	avoid	overfitting	we	could	use	leave-one-out	schema
Feature Outcome
A 1
A 0
A 1
A 1
B 1
B 1
B 0
C 1
C 1
LOOencode
0.66
1.00
0.66
0.66
Categorical	Encoding	- Target	mean	encoding
6x	
• To	avoid	overfitting	we	could	use	leave-one-out	schema
Feature Outcome
A 1
A 0
A 1
A 1
B 1
B 1
B 0
C 1
C 1
LOOencode
0.66
1.00
0.66
0.66
0.50
Categorical	Encoding	- Target	mean	encoding
6x	
• To	avoid	overfitting	we	could	use	leave-one-out	schema
Feature Outcome
A 1
A 0
A 1
A 1
B 1
B 1
B 0
C 1
C 1
LOOencode
0.66
1.00
0.66
0.66
0.50
0.50
Categorical	Encoding	- Target	mean	encoding
6x	
• To	avoid	overfitting	we	could	use	leave-one-out	schema
Feature Outcome
A 1
A 0
A 1
A 1
B 1
B 1
B 0
C 1
C 1
LOOencode
0.66
1.00
0.66
0.66
0.50
0.50
1.00
Categorical	Encoding	- Target	mean	encoding
6x	
• To	avoid	overfitting	we	could	use	leave-one-out	schema
Feature Outcome
A 1
A 0
A 1
A 1
B 1
B 1
B 0
C 1
C 1
LOOencode
0.66
1.00
0.66
0.66
0.50
0.50
1.00
1.00
Categorical	Encoding	- Target	mean	encoding
6x	
• To	avoid	overfitting	we	could	use	leave-one-out	schema
Feature Outcome
A 1
A 0
A 1
A 1
B 1
B 1
B 0
C 1
C 1
LOOencode
0.66
1.00
0.66
0.66
0.50
0.50
1.00
1.00
1.00
Categorical	Encoding	– Numerical	features
6x	
• Binning	using	quantiles	(population	of	the	same	size	in	each	bin)	or	
histograms	(bins	of	same	size)
• Replace	with	bin’s	mean	or	median
• Treat	bin	id	as	a	category	level	and	use	any	categorical	encoding	schema
• Dimensionality	reduction	techniques	– SVD	and	PCA
• Clustering	and	using	cluster	IDs	or/and	distances	to	cluster	centers	as	
new	features
Target	Transformation
6x	
• Predictor/Response	Variable	Transformation
• Use	it	when	variable	shows	a	skewed	distribution	make	
the	residuals	more	close	to	“normal	distribution”	(bell	
curve)
• Can	improve	model	fit
• log(x),	log(x+1),	sqrt(x),	sqrt(x+1),	etc.
Target	Transformation	
In	Liberty	Mutual	Group:	Property	Inspection	Prediction	Competition
6x	
Different	transformations	might	lead	to	different	results
Feature	Interaction
6x	
• 𝑦 =	 𝑥=
> − 𝑥>
> + 𝑥> − 1
Adding	𝑥=
>
and	𝑥>
>
as	new	features
Feature	Interaction	– how	to	find?
6x	
• Domain	knowledge
• ML	algorithm	behavior	(for	example	analyzing	GBM	splits	or	linear	
regressor weights)
Feature	Interaction	– how	to	model?
• Math	operations	(sum,	div,	mul,	sub)
• Clustering	and	kNN for	numerical	features
• Target	encoding	for	pairs	(or	even	triplets	and	etc.)	of	categorical	
features
• Encode	categorical	features	by	stats	of	numerical	features
Thank	you!
Q&A

More Related Content

What's hot

Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
Justin Basilico
 
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
NAVER Engineering
 
Recommending What Video to Watch Next: A Multitask Ranking System
Recommending What Video to Watch Next: A Multitask Ranking SystemRecommending What Video to Watch Next: A Multitask Ranking System
Recommending What Video to Watch Next: A Multitask Ranking System
ivaderivader
 
Classification metrics
Classification metrics Classification metrics
Classification metrics
SPb_Data_Science
 
Genetic algorithm for hyperparameter tuning
Genetic algorithm for hyperparameter tuningGenetic algorithm for hyperparameter tuning
Genetic algorithm for hyperparameter tuning
Dr. Jyoti Obia
 
XGBoost: the algorithm that wins every competition
XGBoost: the algorithm that wins every competitionXGBoost: the algorithm that wins every competition
XGBoost: the algorithm that wins every competition
Jaroslaw Szymczak
 
Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering
odsc
 
XGBoost & LightGBM
XGBoost & LightGBMXGBoost & LightGBM
XGBoost & LightGBM
Gabriel Cypriano Saca
 
Introduction to XGBoost
Introduction to XGBoostIntroduction to XGBoost
Introduction to XGBoost
Joonyoung Yi
 
Calibrated Recommendations
Calibrated RecommendationsCalibrated Recommendations
Calibrated Recommendations
Harald Steck
 
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
Balázs Hidasi
 
Capter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & KamberCapter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & Kamber
Houw Liong The
 
K - Nearest neighbor ( KNN )
K - Nearest neighbor  ( KNN )K - Nearest neighbor  ( KNN )
K - Nearest neighbor ( KNN )
Mohammad Junaid Khan
 
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Gabriel Moreira
 
Statistics vs machine learning
Statistics vs machine learningStatistics vs machine learning
Statistics vs machine learning
Tom Dierickx
 
Kaggle and data science
Kaggle and data scienceKaggle and data science
Kaggle and data science
Akira Shibata
 
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learntKaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Eugene Yan Ziyou
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Xavier Amatriain
 
Recommendation System Explained
Recommendation System ExplainedRecommendation System Explained
Recommendation System Explained
Crossing Minds
 
Introduction to Graph neural networks @ Vienna Deep Learning meetup
Introduction to Graph neural networks @  Vienna Deep Learning meetupIntroduction to Graph neural networks @  Vienna Deep Learning meetup
Introduction to Graph neural networks @ Vienna Deep Learning meetup
Liad Magen
 

What's hot (20)

Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
1시간만에 GAN(Generative Adversarial Network) 완전 정복하기
 
Recommending What Video to Watch Next: A Multitask Ranking System
Recommending What Video to Watch Next: A Multitask Ranking SystemRecommending What Video to Watch Next: A Multitask Ranking System
Recommending What Video to Watch Next: A Multitask Ranking System
 
Classification metrics
Classification metrics Classification metrics
Classification metrics
 
Genetic algorithm for hyperparameter tuning
Genetic algorithm for hyperparameter tuningGenetic algorithm for hyperparameter tuning
Genetic algorithm for hyperparameter tuning
 
XGBoost: the algorithm that wins every competition
XGBoost: the algorithm that wins every competitionXGBoost: the algorithm that wins every competition
XGBoost: the algorithm that wins every competition
 
Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering
 
XGBoost & LightGBM
XGBoost & LightGBMXGBoost & LightGBM
XGBoost & LightGBM
 
Introduction to XGBoost
Introduction to XGBoostIntroduction to XGBoost
Introduction to XGBoost
 
Calibrated Recommendations
Calibrated RecommendationsCalibrated Recommendations
Calibrated Recommendations
 
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
GRU4Rec v2 - Recurrent Neural Networks with Top-k Gains for Session-based Rec...
 
Capter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & KamberCapter10 cluster basic : Han & Kamber
Capter10 cluster basic : Han & Kamber
 
K - Nearest neighbor ( KNN )
K - Nearest neighbor  ( KNN )K - Nearest neighbor  ( KNN )
K - Nearest neighbor ( KNN )
 
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017
 
Statistics vs machine learning
Statistics vs machine learningStatistics vs machine learning
Statistics vs machine learning
 
Kaggle and data science
Kaggle and data scienceKaggle and data science
Kaggle and data science
 
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learntKaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
Kaggle Otto Challenge: How we achieved 85th out of 3,514 and what we learnt
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
 
Recommendation System Explained
Recommendation System ExplainedRecommendation System Explained
Recommendation System Explained
 
Introduction to Graph neural networks @ Vienna Deep Learning meetup
Introduction to Graph neural networks @  Vienna Deep Learning meetupIntroduction to Graph neural networks @  Vienna Deep Learning meetup
Introduction to Graph neural networks @ Vienna Deep Learning meetup
 

Similar to Feature Engineering for ML - Dmitry Larko, H2O.ai

Feature Engineering Hands-On by Dmitry Larko
Feature Engineering Hands-On by Dmitry LarkoFeature Engineering Hands-On by Dmitry Larko
Feature Engineering Hands-On by Dmitry Larko
Sri Ambati
 
Piotr Mirowski - Review Autoencoders (Deep Learning) - CIUUK14
Piotr Mirowski - Review Autoencoders (Deep Learning) - CIUUK14Piotr Mirowski - Review Autoencoders (Deep Learning) - CIUUK14
Piotr Mirowski - Review Autoencoders (Deep Learning) - CIUUK14
Daniel Lewis
 
SPARKNaCl: A verified, fast cryptographic library
SPARKNaCl: A verified, fast cryptographic librarySPARKNaCl: A verified, fast cryptographic library
SPARKNaCl: A verified, fast cryptographic library
AdaCore
 
Xgboost
XgboostXgboost
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...
Spark Summit
 
TDC2017 | São Paulo - Trilha Java EE How we figured out we had a SRE team at ...
TDC2017 | São Paulo - Trilha Java EE How we figured out we had a SRE team at ...TDC2017 | São Paulo - Trilha Java EE How we figured out we had a SRE team at ...
TDC2017 | São Paulo - Trilha Java EE How we figured out we had a SRE team at ...
tdc-globalcode
 
The operation principles of PVS-Studio static code analyzer
The operation principles of PVS-Studio static code analyzerThe operation principles of PVS-Studio static code analyzer
The operation principles of PVS-Studio static code analyzer
Andrey Karpov
 
Machine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackboxMachine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackbox
Ivo Andreev
 
Eye deep
Eye deepEye deep
Eye deep
sveitser
 
Ml5
Ml5Ml5
R user group meeting 25th jan 2017
R user group meeting 25th jan 2017R user group meeting 25th jan 2017
R user group meeting 25th jan 2017
Garrett Teoh Hor Keong
 
IEEE DSP Workshop 2011
IEEE DSP Workshop 2011IEEE DSP Workshop 2011
IEEE DSP Workshop 2011
Behzad Dogahe
 
IEEE DSP Workshop 2011
IEEE DSP Workshop 2011IEEE DSP Workshop 2011
IEEE DSP Workshop 2011
dogahe
 
Klee and angr
Klee and angrKlee and angr
Klee and angr
Wei-Bo Chen
 
Deep Style: Using Variational Auto-encoders for Image Generation
Deep Style: Using Variational Auto-encoders for Image GenerationDeep Style: Using Variational Auto-encoders for Image Generation
Deep Style: Using Variational Auto-encoders for Image Generation
TJ Torres
 
Autoencoder
AutoencoderAutoencoder
Autoencoder
Wataru Hirota
 
My sql查询优化实践
My sql查询优化实践My sql查询优化实践
My sql查询优化实践
ghostsun
 
Mongodb debugging-performance-problems
Mongodb debugging-performance-problemsMongodb debugging-performance-problems
Mongodb debugging-performance-problems
MongoDB
 
Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017
Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017
Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017
Big Data Spain
 
Machine Learning from a Software Engineer's perspective
Machine Learning from a Software Engineer's perspectiveMachine Learning from a Software Engineer's perspective
Machine Learning from a Software Engineer's perspective
Marijn van Zelst
 

Similar to Feature Engineering for ML - Dmitry Larko, H2O.ai (20)

Feature Engineering Hands-On by Dmitry Larko
Feature Engineering Hands-On by Dmitry LarkoFeature Engineering Hands-On by Dmitry Larko
Feature Engineering Hands-On by Dmitry Larko
 
Piotr Mirowski - Review Autoencoders (Deep Learning) - CIUUK14
Piotr Mirowski - Review Autoencoders (Deep Learning) - CIUUK14Piotr Mirowski - Review Autoencoders (Deep Learning) - CIUUK14
Piotr Mirowski - Review Autoencoders (Deep Learning) - CIUUK14
 
SPARKNaCl: A verified, fast cryptographic library
SPARKNaCl: A verified, fast cryptographic librarySPARKNaCl: A verified, fast cryptographic library
SPARKNaCl: A verified, fast cryptographic library
 
Xgboost
XgboostXgboost
Xgboost
 
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...
 
TDC2017 | São Paulo - Trilha Java EE How we figured out we had a SRE team at ...
TDC2017 | São Paulo - Trilha Java EE How we figured out we had a SRE team at ...TDC2017 | São Paulo - Trilha Java EE How we figured out we had a SRE team at ...
TDC2017 | São Paulo - Trilha Java EE How we figured out we had a SRE team at ...
 
The operation principles of PVS-Studio static code analyzer
The operation principles of PVS-Studio static code analyzerThe operation principles of PVS-Studio static code analyzer
The operation principles of PVS-Studio static code analyzer
 
Machine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackboxMachine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackbox
 
Eye deep
Eye deepEye deep
Eye deep
 
Ml5
Ml5Ml5
Ml5
 
R user group meeting 25th jan 2017
R user group meeting 25th jan 2017R user group meeting 25th jan 2017
R user group meeting 25th jan 2017
 
IEEE DSP Workshop 2011
IEEE DSP Workshop 2011IEEE DSP Workshop 2011
IEEE DSP Workshop 2011
 
IEEE DSP Workshop 2011
IEEE DSP Workshop 2011IEEE DSP Workshop 2011
IEEE DSP Workshop 2011
 
Klee and angr
Klee and angrKlee and angr
Klee and angr
 
Deep Style: Using Variational Auto-encoders for Image Generation
Deep Style: Using Variational Auto-encoders for Image GenerationDeep Style: Using Variational Auto-encoders for Image Generation
Deep Style: Using Variational Auto-encoders for Image Generation
 
Autoencoder
AutoencoderAutoencoder
Autoencoder
 
My sql查询优化实践
My sql查询优化实践My sql查询优化实践
My sql查询优化实践
 
Mongodb debugging-performance-problems
Mongodb debugging-performance-problemsMongodb debugging-performance-problems
Mongodb debugging-performance-problems
 
Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017
Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017
Deep Learning in Spark with BigDL by Petar Zecevic at Big Data Spain 2017
 
Machine Learning from a Software Engineer's perspective
Machine Learning from a Software Engineer's perspectiveMachine Learning from a Software Engineer's perspective
Machine Learning from a Software Engineer's perspective
 

More from Sri Ambati

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
Sri Ambati
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
Sri Ambati
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
Sri Ambati
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
Sri Ambati
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
Sri Ambati
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Sri Ambati
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
Sri Ambati
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
Sri Ambati
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
Sri Ambati
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
Sri Ambati
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
Sri Ambati
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Sri Ambati
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Sri Ambati
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
Sri Ambati
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
Sri Ambati
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
Sri Ambati
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
Sri Ambati
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
Sri Ambati
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
Sri Ambati
 

More from Sri Ambati (20)

GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 

Recently uploaded

Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Tatiana Kojar
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
Pravash Chandra Das
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Jeffrey Haguewood
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
Dinusha Kumarasiri
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
panagenda
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
fredae14
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
saastr
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
HarisZaheer8
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
kumardaparthi1024
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 

Recently uploaded (20)

Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A...
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Azure API Management to expose backend services securely
Azure API Management to expose backend services securelyAzure API Management to expose backend services securely
Azure API Management to expose backend services securely
 
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUHCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Recommendation System using RAG Architecture
Recommendation System using RAG ArchitectureRecommendation System using RAG Architecture
Recommendation System using RAG Architecture
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
 
GenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizationsGenAI Pilot Implementation in the organizations
GenAI Pilot Implementation in the organizations
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 

Feature Engineering for ML - Dmitry Larko, H2O.ai