Comparing	classifiers		
for	predicting		
wine	quality	
Project	Techniques	of	Artificial	Intelligence	
Laurent	Declercq	
May	2016
This study aims to highlight the differences between two classification algorithms. Both
algorithms are applied to a dataset of white Portuguese wine, where they are used to classify
human wine taste preferences based on physicochemical properties. The algorithms studied
here are J48 (C4.5) and AdaBoostM1(J48). AdaBoostM1(J48) starts with the basic J48 but
attempts to improve it, which makes it interesting to see how it performs on a real-world
application. Since AdaBoostM1(J48) adds complexity, a tradeoff should be made between its
added value and its added cost. This study will provide clarity on its added value in classifying
white Portuguese wine.
Table	of	Contents	
Abstract
Table of figures
1. Introduction
2. Dataset
2.1. Attributes
2.2. Class: Quality
3. Algorithms
3.1. ZeroR
3.2. J48 (C4.5)
3.2.1. Improvements over ID3
3.3. AdaBoost (J48)
4. Hypothesis
5. Methodology
5.1. Preprocessing
5.1.1. Problem: Numeric class
5.1.2. Correctness of data
5.1.3. Problem: Imbalanced dataset
5.1.4. Normalization
5.1.5. Feature Selection
5.2. Training and Test set split
5.3. Comparing algorithms
5.3.1. Optimizing J48
5.3.2. Optimizing AdaBoost
5.3.3. Experimental setup
6. Results & Interpretation
7. Conclusion
8. Limitations
8.1. Outlier detection
8.2. Oversampling
8.3. Optimizing J48
9. References
10. Appendix
10.1. Appendix A : Outlier detection
10.2. Appendix B : Algorithm comparison
Table	of	figures	
Figure 1 : Pseudo-code C4.5 tree-constructing algorithm
Figure 2 : Post-pruning - subtree raising
Figure 3 : Post-pruning - subtree replacement
Figure 4 : Principle of ensemble learning, here with 3 Machine Learning algorithms (ML)
Figure 5 : Outliers for attribute 'fixed acidity'
Figure 6 : Performance difference deleting outliers
Figure 7 : Effect of balancing dataset
Figure 8 : accuracy improvements by balancing dataset
Figure 9 : weighted average recall
Figure 10 : weighted average precision
Figure 11 : Weighted average F-measure
Figure 12 : Effect of normalising numeric attributes
Figure 13 : Filter and wrapper strategies
Figure 14 : Relation of ‘residual sugar’ to ‘density’ shows high correlation
Figure 15 : Components of CFS
Figure 16 : Results of Attribute Selection with 10 fold cross-validation
Figure 17 : Accuracy before and after deleting 'residual sugar'
Figure 18 : CPU time training before and after deleting 'residual sugar'
Figure 19 : Adjusting minNumObj parameter for J48
Figure 20 : Accuracy as function of the iterations in AdaBoost
Figure 21 : Experimental setup in Weka KnowledgeFlow
Table 1 : Attribute characteristics white wine dataset
Table 2 : Outlier detection
Table 3 : Balancing dataset using SMOTE filter
Table 4 : Experimental results
Ap. Figure 1 : Outliers with attribute 'citric acid'
Ap. Figure 2 : Outliers with attribute 'residual sugar'
Ap. Figure 3 : Outliers with attribute 'chlorides'
Ap. Figure 4 : Outliers with attribute 'Free Sulfur Dioxide'
Ap. Figure 5 : Outliers with attribute 'Total Sulfur Dioxide'
Ap. Figure 6 : Outliers with attribute 'alcohol'
Ap. Figure 7 : Confusion matrix ZeroR
Ap. Figure 8 : Confusion matrix J48
Ap. Figure 9 : Confusion matrix AdaBoostM1(J48)
1. Introduction	
The	classification	task	is	a	form	of	supervised	learning	where	a	dataset,	labeled	with	the	right	
class,	is	first	specified.	A	classifier	is	trained	on	this	set	with	the	goal	of	generalizing	its	model	
to	 other	 data.	 Different	 classifiers	 (learning	 algorithms)	 exist	 for	 this	 task	 with	 their	
advantages	and	disadvantages.	Increasingly	complex	algorithms	combined	with	increasing	
accuracy	have	been	developed.	Computational	power	has	increased	strongly	over	the	years.	
Therefore,	much	attention	typically	goes	to	accuracy	of	an	algorithm.	Currently,	many	smart	
mobile	 devices	 and	 services	 are	 developed.	 However,	 for	 these	 less	 powerful,	 mobile	
applications,	the	context	is	different	and	power	and	efficiency	become	important.		
The algorithms under study here are J48 and AdaBoostM1(J48). J48 is a relatively simple
tree-building algorithm, very popular thanks to its great performance and its understandable
models. AdaBoost attempts to boost the performance of an underlying algorithm, which in
this case is J48. This boosting comes at the cost of increased complexity. For the average
dataset, AdaBoost has shown better performance at the cost of computational power (J R
Quinlan, 2006). For it to be relevant, the gain should outweigh the cost, which of course
depends on the context.
The	study	is	organized	as	follows:	In	section	2	the	dataset	and	its	attributes	are	discussed,	
followed	by	the	algorithms	that	are	going	to	be	applied	on	the	dataset.	Section	5.1	then	
discusses	 the	 preprocessing	 steps	 the	 dataset	 underwent	 in	 order	 to	 increase	 the	
performance	of	both	algorithms.	Once	applied,	the	results	are	followed	by	a	brief	discussion	
in	section	6.
2. Dataset	
The dataset describes wine. More precisely, it contains both physicochemical properties and
sensory data of red and white Portuguese wine (vinho verde). It was collected between May
2004 and February 2007 (Cortez, Cerdeira, Almeida, Matos, & Reis, 2009) to study three
regression techniques: support vector machines, multiple regression and neural networks.
Since the sensory data of the two types of wine are based on completely different tastes, the
authors decided to split the dataset in two, a red dataset and a white dataset. The
experiments below are based solely on the white dataset, since the goal is not to compare
the results of both types of wine, but rather to compare different algorithms. For the white
dataset, 4898 instances were collected with 12 attributes each.
2.1. Attributes	
There	are	eleven	physicochemical	properties	recorded:	fixed	acidity,	volatile	acidity,	citric	
acid,	residual	sugar,	chlorides,	free	sulfur	dioxide,	total	sulfur	dioxide,	density,	pH,	sulphates	
and	alcohol.	
Attribute                                Min     Max      Mean     StdDev
Fixed acidity (g(tartaric acid)/l)       3,80    14,20    6,86     0,84
Volatile acidity (g(acetic acid)/l)      0,08    1,10     0,28     0,10
Citric acid (g/l)                        0,00    1,66     0,33     0,12
Residual sugar (g/l)                     0,60    65,80    6,39     5,07
Chlorides (g(sodium chloride)/l)         0,01    0,35     0,05     0,02
Free sulfur dioxide (mg/l)               2,00    289,00   35,31    17,01
Total sulfur dioxide (mg/l)              9,00    440,00   138,36   42,50
Density (g/ml)                           0,99    1,04     0,99     0,00
pH                                       2,72    3,82     3,19     0,15
Sulphates (g(potassium sulphate)/l)      0,22    1,08     0,49     0,11
Alcohol (vol.%)                          8,00    14,20    10,51    1,23
Table 1 : Attribute characteristics white wine dataset
Acidity of the wine protects the wine from bacteria during the fermentation process. A
distinction should be made between the amount of acidity and the strength of the acidity.
The amount is measured in g/l, whereas the strength is measured in pH. Most wines show
pH-values between 2,9 and 3,9. The higher the wine’s acidity, the lower the pH value.
Three main acids can be found in wine grapes: tartaric, malic and citric acid. These are fixed
acids that contribute to the quality of the wine, the distinct taste-shaping during winemaking
and the aging of the wine. Tartaric acid is important for maintaining the wine’s chemical
stability and color. Its concentration in the wine grape varies with the soil and grape type.
Malic acid is not measured in the dataset. Next to that, citric acid is also present, but in much
smaller concentrations (1/20 of tartaric acid). It can be found in many citrus fruits and offsets
a strong citric taste. Extra citric acid can be added but with caution. Namely, certain bacteria
can convert citric acid into acetic acid (Beelman & Gallander, 1979). Acetic acid, as opposed
to	the	fixed	acids	mentioned	above,	is	a	volatile	acid	that	can	be	found	in	vinegar	(Drysdale	&	
Fleet,	1988).	It	is	produced	by	bacteria	that	contribute	to	spoilage	and	has	a	low	sensory	
threshold.	When	the	amount	surpasses	700	mg/l	it	can	already	be	sensed.	Concentrations	
higher	than	1,2	g/l	generally	lead	to	an	unpleasant	taste.	Between	these	limits	however,	high	
levels	of	acetic	acid	can	sometimes	be	found	in	higher	quality	wines	in	which	it	is	said	to	create	
a	more	complex	taste.	
Sugar	counteracts	the	sourness	of	acids.	Residual	sugar	is	the	sugar	level,	measured	after	
fermentation.	The	sweetness	of	a	wine	is	clearly	a	function	of	the	residual	sugar.	However,	a	
wine	with	high	residual	sugar	levels	can	still	taste	‘dry’	when	acidity	levels	are	elevated	and	
alcohol	 concentration	 is	 low.	 It	 is	 the	 winemaker’s	 quest	 to	 find	 a	 harmonious	 balance	
between	these	elements.	
Sodium chloride is known to be a major contributor to saltiness. Its levels depend on the
wine grape type and ultimately on the soil where the grapes are grown (Coli et al., 2015). High
levels of sodium chloride are an unwanted characteristic and are limited by law in many countries.
Sulfur dioxide is a by-product of fermentation. Since 1487, it has been common to add additional
sulfur dioxide to the wine in order to preserve it longer, thanks to its anti-oxidant and
anti-microbial properties (Robinson, 1994). Part of the sulfur dioxide binds with other compounds.
It is the free SO2 that protects the wine from oxidizing and browning before and during
fermentation. When too much of it is added, however, the wine taste deteriorates since SO2
can prematurely stop fermentation. Less SO2 is needed when the wine has a low pH value and a
high alcohol percentage. Potassium sulphate is added for the same goals.
The	density	of	wine	is	close	to	that	of	water.	Dry	wine	usually	has	a	low	density	whereas	sweet	
wine	is	denser.	The	density	of	water	is	1	kg/l,	of	ethanol	0,789	kg/l,	and	of	sugar	1,587	kg/l.	
This	means	that	wine	has	a	density	around	0,97	to	1,2	kg/l.	
Alcohol	(ethanol)	is	produced	during	fermentation.	It	influences	the	fullness	and	roundness	
of	the	wine	taste.	Its	levels	are	influenced	by	the	ripeness	of	the	wine	grapes	at	the	time	of	
2.2. Class:	Quality	
The	wine	quality	is	calculated	as	the	median	of	at	least	3	evaluations	made	by	different	wine	
experts.	Each	evaluation	ranges	from	0	(very	bad)	to	10	(very	excellent).	In	the	dataset,	scores	
were	distributed	between	3	and	9,	meaning	that	no	very	bad	nor	very	excellent	wines	were	
3. Algorithms	
Three algorithms were chosen based on their interesting aspects. First of all, ZeroR is
included as a baseline algorithm, which any smart algorithm should outperform in order to be
useful. J48 is included as a tree-building algorithm because trees are easy to understand and
to model. A disadvantage of trees is that they are prone to overfitting. To improve the J48
algorithm, another algorithm called AdaBoostM1 is included. It is a meta-algorithm that has
to be used with another algorithm. Here it is combined with J48 to test its improvements.
3.1. ZeroR	
Commonly referred to as the baseline algorithm, ZeroR is the simplest classification method.
It is based on a frequency table in which it looks at the majority class and predicts this class
all the time. It thus simply relies on the class attribute and ignores all other attributes. ZeroR
has no predictive power but is useful as a benchmark for other classification methods.
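The behaviour described above can be illustrated with a minimal Python sketch. It is not Weka's implementation; the class name `ZeroR`, its methods and the toy labels are assumptions made purely for illustration.

```python
from collections import Counter


class ZeroR:
    """Baseline classifier: always predicts the most frequent training class."""

    def fit(self, labels):
        # Only the class attribute is used; all other attributes are ignored.
        self.majority_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, n_instances):
        # The same class is predicted for every instance.
        return [self.majority_] * n_instances


# Toy quality labels (hypothetical, not taken from the wine dataset).
train_labels = [6, 5, 6, 7, 6, 5]
model = ZeroR().fit(train_labels)
predictions = model.predict(len(train_labels))
accuracy = sum(p == y for p, y in zip(predictions, train_labels)) / len(train_labels)
```

On these toy labels the accuracy equals the share of the majority class, which is exactly the level any useful classifier has to exceed.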
3.2. J48	(C4.5)	
J48	is	an	open-source	java	implementation	of	the	C4.5	algorithm.	C4.5	is	a	widely	popular	
statistical	 classifier	 that	 generates	 trees	 using	 the	 concept	 of	 information	 gain.	 It	 is	 an	
improvement	over	the	ID3	algorithm,	earlier	developed	by	Ross	Quinlan	(J	Ross	Quinlan,	
Basically, it generates a decision tree in a top-down manner. Using a training set, at each stage
it uses a greedy approach to look for the attribute that best splits the set into subsets. Let
T be the set of instances associated with a stage. To find this attribute, each attribute is
evaluated separately on T using a metric called information gain, which is derived from entropy
(more correctly, a gain ratio is used, which is the attribute’s information gain divided by its
split information, i.e. the entropy of the partition that the attribute induces on T). The
attribute providing the highest value is selected as a node in the tree.
For each subset, this process is repeated until the subset contains only samples from the same
class or until the minimum number of leaf objects is reached. A simplified pseudo-code of the
tree-construction algorithm from C4.5 is included below (Salvatore, 2000).
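As an illustration of the split criterion, the following Python sketch computes entropy, information gain and gain ratio for a nominal attribute. The function names and the toy data are assumptions for illustration only; the sketch ignores missing values and continuous attributes and is not the J48 code.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of the value distribution in `items`."""
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of a nominal attribute (`values`) with respect to the class (`labels`)."""
    total = len(labels)
    subsets = {}
    for v, y in zip(values, labels):
        subsets.setdefault(v, []).append(y)
    # Expected class entropy after splitting on the attribute.
    remainder = sum(len(s) / total * entropy(s) for s in subsets.values())
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the partition induced by the attribute itself.
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Toy example (hypothetical attribute and class values).
labels = ["good", "good", "bad", "bad"]
informative = gain_ratio(["sweet", "sweet", "dry", "dry"], labels)
uninformative = gain_ratio(["sweet", "dry", "sweet", "dry"], labels)
```

An attribute that separates the classes perfectly obtains the maximum gain ratio, while one that leaves the class distribution unchanged in every subset obtains zero.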
3.2.1. Improvements	over	ID3	
C4.5 improves ID3 in a number of ways. Whereas ID3 could only handle discrete and nominal
attributes, C4.5 can also handle continuous attributes. To do so, it goes through every
possible split point of the attribute (in fact, entropy only needs to be evaluated
between points of different classes) and chooses the split point that provides the highest
information gain. It is still incapable of handling a numeric class. Only missing class values,
binary class or nominal class values are allowed. Next to that, missing values are handled in
the entropy calculations, by treating them as a separate value. Another advantage of C4.5 is
that it allows context-independent rules to be created from trees.
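The handling of a continuous attribute can be sketched as a search over candidate thresholds. The Python code below is a simplified illustration with hypothetical names and toy values. It evaluates the midpoint between every pair of adjacent distinct values, so it does not apply the shortcut of only testing boundaries between different classes.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, information gain) of the best binary split `value <= threshold`."""
    pairs = sorted(zip(values, labels))
    all_labels = [y for _, y in pairs]
    base = entropy(all_labels)
    best_thr, best_gain = None, -1.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left, right = all_labels[:i], all_labels[i:]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        gain = base - remainder
        if gain > best_gain:
            best_thr, best_gain = threshold, gain
    return best_thr, best_gain


# Toy alcohol levels and quality classes (hypothetical values).
alcohol = [9.0, 9.5, 10.0, 11.5, 12.0, 12.5]
quality = ["low", "low", "low", "high", "high", "high"]
threshold, gain = best_threshold(alcohol, quality)
```

In this toy example the best threshold falls halfway between 10.0 and 11.5, where the two classes are separated completely.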
ID3	is	very	prone	to	overfitting	the	training	data.	This	is	the	case	when,	given	hypothesis	space	
H	and	your	hypothesis	h	∈	H,	there	exists	another	hypothesis	h’	∈	H	that	performs	worse	on	
the	training	data	than	h	∈	H	but	better	on	the	entire	population.	ID3	reduces	this	issue	slightly	
by	performing	pre-pruning.	Only	statistically	significant	attributes	are	allowed	to	be	selected	
by	 the	 information	 gain	 procedure.	 To	 determine	 this,	 it	 applies	 the	 chi-squared	 test.	
However,	 to	 better	 solve	 the	 issue	 of	 overfitting,	 C4.5	 allows	 for	 post-pruning,	 which	 is	
backtracking	the	decisions	and	removing	useless	branches	by	replacing	them	with	leaf	nodes.	
Another	way	it	does	this	is	by	subtree	raising.	Both	strategies	can	be	seen	in	the	figures	below.	
Pruning	also	reduces	complexity	and	can	remove	parts	of	the	classifier	that	are	based	on	noisy	
Figure 1 : Pseudo-code C4.5 tree-constructing algorithm
3.3. AdaBoost	(J48)	
AdaBoost is, as the name shows, a boosting method. Boosting is a form of ensemble learning
that iteratively forces new classifiers to focus on the errors produced by the earlier ones. With
ensemble learning, a committee of classifier algorithms is established. All committee
members get a vote and the overall decision is made based on the majority vote. By making
a decision based on a committee of ‘simple’ learning algorithms, the stability of that decision
improves. Therefore, AdaBoost may lead to a classifier whose variance is significantly lower
than that of the original algorithm. Next to that, boosting has been found to reduce bias
(Kong & Dietterich, 1995).
Figure	2	:	Post-pruning	-	subtree	raising	
Figure	3	:	Post-pruning	-	subtree		replacement	
Figure	4	:	Principle	of	ensemble	learning,	here	with	3	Machine	Learning	algorithms	(ML)
AdaBoost boosts the performance of the original learning algorithm on the training data. To
do this, it iteratively uses a weighted training set (Freund & Schapire, 1996; J R Quinlan, 2006;
Schapire, 2013). Each instance weight w[n] reflects the importance of the respective instance
and starts at value 1/N for all N instances. The first hypothesis is built on this set. An error
is calculated as the sum of the weights of the misclassified instances. All correctly classified
instances are then given lower weights. A new hypothesis is generated using these new
weights. This process is repeated until there are T hypotheses, with T being an input to
AdaBoost. All hypotheses can be seen as committee members, with the weights of their votes
z being a function of their accuracy on the training set. The final hypothesis is based on the
majority of the weighted votes: the algorithm sums the votes, taking into account
their weights. Pseudo-code for this algorithm can be seen below.
For clarity, the steps to calculate the boosted classifier are omitted. Also, the edge cases
where the error equals 0 or exceeds 0.5 are not mentioned in the code. However, they require
different steps. When there is no error, obviously, no extra trials should be performed and
T should be set to t. The error rate of the boosted algorithm should approach 0 as T
increases. This is only the case when the error rate of the trials is below 0.5. Therefore,
when error > 0.5, the trials should be ended and T should be replaced by t-1. AdaBoost thus
makes the assumption that the simple classifiers perform better than pure guessing. This is
known as the weak learning condition.
function AdaBoost(examples, L, T) returns a weighted-majority hypothesis
    inputs: examples, set of N labeled examples (x1, y1),…,(xN, yN)
            L, a ‘simple’ learning algorithm
            T, the number of hypotheses (trials / iterations) in the ensemble
    local variables: w, a vector of N example weights
                     h, a vector of T hypotheses
                     z, a vector of T hypothesis weights
    for n = 1 to N do
        w[n] ← 1/N
    for t = 1 to T do
        h[t] ← L(examples, w)
        error ← 0
        for n = 1 to N do
            if h[t](xn) ≠ yn then error ← error + w[n]
        for n = 1 to N do
            if h[t](xn) = yn then w[n] ← w[n] · error / (1 - error)
        w ← NORMALIZE(w) (so that the sum of the weights equals 1)
        z[t] ← log((1 - error) / error)
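To make the procedure concrete, the following Python sketch implements the weight update and the weighted vote for a single numeric attribute. It is only an illustration under stated assumptions: a decision stump (`train_stump`) stands in for J48 as the ‘simple’ learner, the function names and the toy data are hypothetical, and the edge cases are handled in one possible way (stopping at error >= 0.5 and giving a perfect hypothesis an infinite vote).

```python
import math


def train_stump(xs, ys, w):
    """Weighted decision stump: predicts `left` if x <= thr, else `right` (stand-in for J48)."""
    classes = sorted(set(ys))
    best = None
    for thr in sorted(set(xs)):
        for left in classes:
            for right in classes:
                err = sum(wi for x, y, wi in zip(xs, ys, w)
                          if (left if x <= thr else right) != y)
                if best is None or err < best[0]:
                    best = (err, thr, left, right)
    _, thr, left, right = best
    return lambda x: left if x <= thr else right


def adaboost_m1(xs, ys, learner, T):
    """Return a list of (hypothesis, vote weight) pairs."""
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(T):
        h = learner(xs, ys, w)
        error = sum(wi for x, y, wi in zip(xs, ys, w) if h(x) != y)
        if error >= 0.5:  # weak learning condition violated: drop h and stop
            break
        if error == 0:    # perfect hypothesis: keep it, no further trials (assumed handling)
            ensemble.append((h, math.inf))
            break
        # Lower the weights of correctly classified instances, then normalize.
        w = [wi * error / (1 - error) if h(x) == y else wi
             for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((h, math.log((1 - error) / error)))
    return ensemble


def predict(ensemble, x):
    """Weighted majority vote of all hypotheses."""
    votes = {}
    for h, z in ensemble:
        label = h(x)
        votes[label] = votes.get(label, 0.0) + z
    return max(votes, key=votes.get)


def accuracy(ensemble, xs, ys):
    return sum(predict(ensemble, x) == y for x, y in zip(xs, ys)) / len(xs)


# Toy problem (hypothetical) that no single stump can solve.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
acc_single = accuracy(adaboost_m1(xs, ys, train_stump, T=1), xs, ys)
acc_boosted = accuracy(adaboost_m1(xs, ys, train_stump, T=3), xs, ys)
```

On this toy problem a single stump misclassifies three of the ten instances, whereas the weighted vote of three stumps classifies all training instances correctly.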
Although the robustness against overfitting is a clear advantage of AdaBoost over C4.5, it
has been found that AdaBoost, when run for a large number of iterations, is still prone to
overfitting (Bartlett, 2007). Luckily, by setting a stopping rule, AdaBoost will perform
consistently across
datasets.	This	stopping	rule	limits	the	number	of	iterations	t	by	making	it	a	fixed	function	of	
4. Hypothesis	
AdaBoostM1 should perform better than J48 on the dataset, given that it takes the simpler
J48 algorithm and iteratively applies it, focusing on the previously-made errors. The accuracy
and bias of AdaBoost are expected to be significantly better than those of J48. Voting will
also lead to more stable behavior. Therefore, we also expect AdaBoost to show reduced variance.
5. Methodology	
First, the correctness of the data needs to be checked. Specifically, outliers and missing values
in the dataset are of interest. After that, the data is preprocessed further and the necessary
parameters of the two algorithms of interest are adjusted. Next, the Weka explorer is used
to split the data into a training set and a test set. After that, the algorithms are trained on the
training set and the performance on the test set is checked.
5.1. Preprocessing	
5.1.1. Problem:	Numeric	class	
The class variable ‘quality’ is numeric. This limits which classification algorithms can be
used. Our baseline algorithm ZeroR can be applied, but decision trees with J48 and
AdaBoostM1(J48) cannot be built since J48 cannot handle numeric class values, only missing
class values, binary class and nominal class values. Therefore, a filter called
NumericToNominal was applied to convert ‘quality’ to a nominal attribute.
5.1.2. Correctness	of	data	
There are no missing values in the dataset. Possible outliers can be detected visually by
checking Table 1, combined with scatter plots and knowledge of the topic. That is
partly why the importance of the different attributes is explained earlier. A more theoretical
approach could have been used, cf. the paragraph on limitations (Cousineau, 2009). However,
this visual method is chosen in order to avoid a normality assumption from the start and also
to limit the number of instances to be removed. Outliers are problematic since they can skew
the results (Cousineau, 2009). Namely, they might alter the mean, the variance and other
metrics. A resulting larger variance might lead to insignificant results whereas in reality the
results should be significant. When the model is based on skewed data, results are
suboptimal. When the number of outliers is small compared to the sample, these instances
should be removed.
The attribute ‘fixed acidity’ shows two outliers, at 11,8 g/l and 14,2 g/l. Although these
values are possible, they greatly disturb the distribution, since all other values range from 3,8
to 10,7. We therefore remove both instances using the RemoveRange filter. The figure below
shows the large distance from the other instances. By removing them, the standard deviation
is lowered from 0,844 to 0,834.
‘citric	acid’	shows	a	similar	problem.	Two	instances	are	outliers	to	the	general	distribution,	
namely	instance	746	with	citric	acid	level	of	1,66	and	instance	3151	with	value	of	1,23.	These	
numbers	are	very	high	compared	to	the	third	highest	level	measured,	namely	1.	By	removing	
these	two	instances,	the	standard	deviation	is	lowered	from	0,121	to	0,119.	The	graph	of	this	
attribute,	together	with	any	other	attribute	where	outliers	are	detected,	can	be	found	in	
appendix	A.	
Looking at ‘residual sugar’, three instances with very unlikely values can be spotted
immediately: levels of 31,6 g/l and 65,8 g/l of sugar are incredibly high. Six other outliers
exist with high values for residual sugar. After removing them, the values lie between 0,6
and 20,8 g/l, and the standard deviation is reduced from 5,07 to 4,94. The same method is
applied to remove 8 instances for attribute ‘chlorides’, resulting in a lower standard deviation
of 0,02. ‘free sulfur dioxide’ showed one instance with double the value of the second-highest,
which is removed. 8 instances were removed looking at ‘total sulfur dioxide’. Lastly, two
instances showed extremely low values for ‘alcohol’ level and were removed. No other
attributes show significant outliers. To summarize, 32 outliers were found and deleted. The
remaining dataset contains 4866 instances.
Attribute               Deleted outliers
fixed acidity           2
citric acid             2
residual sugar          9
chlorides               8
free sulfur dioxide     1
total sulfur dioxide    8
alcohol                 2
Table 2 : Outlier detection
Figure	5	:	Outliers	for	attribute	'fixed	acidity'
The accuracy improvements are tested using a paired T-test. The paired T-test assumes that
the results from both datasets are independent and normally distributed. These assumptions
are fulfilled since the results on the dataset without outliers are independent of the results
before deleting them. Also, by setting the Weka experimenter to perform 30 iterations, the
distribution of results can be approximated by a normal distribution. This is done for all
experiments in this study. The algorithms in this experiment, and in the experiments during
further preprocessing steps, are applied using default values in Weka. The models are trained
and evaluated with 10-fold stratified cross-validation. Compared to normal cross-validation,
stratified cross-validation has the benefit that every fold is a good representation of the
dataset. The folds are selected so that the class distribution is approximately equal in all
the folds. This has been shown to reduce the variance of the estimated accuracy. The results
are given in the figure below.
Figure	6	:	Performance	difference	deleting	outliers	
No significant improvements are found for either J48 or AdaBoost, with a 95% confidence
level (two-tailed). Except for the baseline algorithm ZeroR, deleting outliers has no visible
effect on the accuracy of the different algorithms or on its standard deviation.
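The idea behind the stratified folds used in these experiments can be sketched in Python as follows. This is an illustrative assignment scheme, not Weka's implementation; the function name, the seed and the toy labels are assumptions.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=0):
    """Split instance indices into k folds with similar class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the instances of each class over the folds in turn.
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


# Toy class labels (hypothetical): 50 x "a", 30 x "b", 20 x "c".
labels = ["a"] * 50 + ["b"] * 30 + ["c"] * 20
folds = stratified_folds(labels, k=10)
fold_distributions = [Counter(labels[i] for i in fold) for fold in folds]
```

Each of the ten folds then contains the three classes in the same 5:3:2 proportion as the full toy set, which is what makes every fold representative of the dataset.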
5.1.3. Problem:	Imbalanced	dataset	
When the separate classes are not equally represented, the dataset is imbalanced. An
imbalanced dataset can lead to overfitting and underperforming algorithms. Our dataset is
severely imbalanced, with the number of instances ranging from 5 in the minority class up to
2188 in the majority class. Extreme quality scores are rare compared to the mediocre classes.
This problem can be solved by resampling. Resampling can be done either by deleting
instances from the over-represented class (under-sampling) or by adding copies of instances
from the under-represented class or synthetically creating such instances (over-sampling).
Generally, it might be better to over-sample unless you have plenty of data. There are some
disadvantages to over-sampling, however. It increases the size of the dataset, leading to
increased processing time needed to build a model. Also, since the class is not taken into
account it may cause overgeneralization. When put to extremes, oversampling can lead to
overfitting (Drummond & Holte, n.d.; Rahman & Davis, 2013).
Another option would be to keep the imbalanced dataset but to wrap the learning
algorithms in a penalization scheme, which adds an extra cost for misclassifying a minority
class. This, however, means that the algorithms to be compared are changed, making
comparisons less intuitive. Therefore, sampling is preferred.
In Weka, sampling can be achieved by applying the supervised SMOTE filter (Nitesh V Chawla,
2005). This resamples the dataset by applying the Synthetic Minority Oversampling
Technique. It does not simply copy instances from the minority class. Rather, it iteratively
takes a minority instance, looks at a number of its nearest neighbors and creates a new
instance with randomly distorted attributes, within the boundaries of these neighbors.
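The principle can be sketched in Python as follows. This is a simplified illustration and not the Weka filter: the function names, the parameters and the toy instances are assumptions, and a single random gap is used per synthetic instance, whereas implementations may draw a separate gap per attribute.

```python
import math
import random


def nearest_neighbors(point, others, k):
    """The k instances in `others` closest to `point` (Euclidean distance)."""
    return sorted(others, key=lambda other: math.dist(point, other))[:k]


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic instances by interpolating between minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        neighbor = rng.choice(nearest_neighbors(base, others, k))
        gap = rng.random()  # position between the base instance and its neighbor
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbor)))
    return synthetic


# Toy minority-class instances with two attributes (hypothetical values).
minority = [(1.0, 1.0), (2.0, 1.5), (1.5, 3.0)]
new_instances = smote(minority, n_new=4, k=2)
```

Every synthetic instance lies between an existing minority instance and one of its neighbors, so it stays within the range already covered by the minority class.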
We changed the percentage parameter to correspond to the number of extra instances to be
created. Since the over-sampling takes on extreme percentages, we expect a certain bias in
the results due to overgeneralization. However, this does not impact the differences between
J48 and AdaBoost. Remarks on this method can be found in the limitations paragraph. After
balancing, our training set consists of 15311 instances, which means that 10445 instances
were created. Weka appends these extra instances to the end of the dataset. With 10-fold
cross-validation, this might lead to folds with a lot of instances from the same
class, and thus eventually to overfitting. To avoid this issue, we apply an extra filter that
randomizes the order of the instances in the dataset.
Class   Number of instances   % to add   Amount added
1       14                    15528      2173
2       161                   1259       2026
3       1443                  51,6       744
4       2188                  0          0
5       880                   148,6      1307
6       175                   1150       2012
7       5                     43660      2183
Table 3 : Balancing dataset using SMOTE filter
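The arithmetic behind Table 3 can be sketched as follows. This is only an illustration, not the Weka implementation: the helper names are hypothetical, and rounding the number of synthetic instances down is an assumption made here because it reproduces the "Amount added" column.

```python
# Class counts before balancing, as listed in Table 3.
counts = {1: 14, 2: 161, 3: 1443, 4: 2188, 5: 880, 6: 175, 7: 5}

def smote_percentage(n_class, n_target):
    """Percentage of synthetic instances needed to grow a class to n_target."""
    return 100.0 * (n_target - n_class) / n_class

def instances_added(n_class, percentage):
    """Synthetic instances created for a given percentage.

    Assumption: the result is rounded down, which matches Table 3.
    """
    return int(n_class * percentage / 100.0)

# The majority class (2188 instances) is the target size for every class.
target = max(counts.values())

# Percentages as reported in the '% to add' column of Table 3.
table_percentages = {1: 15528, 2: 1259, 3: 51.6, 4: 0, 5: 148.6, 6: 1150, 7: 43660}

added = {c: instances_added(counts[c], p) for c, p in table_percentages.items()}
total_after = sum(counts.values()) + sum(added.values())

print(smote_percentage(counts[7], target))  # exact percentage for the smallest class
print(added, total_after)
```

Under these assumptions the sketch gives 10445 added instances and 15311 instances in total, the figures stated in the text.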
Figure	7	:	Effect	of	balancing	dataset
Figure 7 shows the effect of balancing the dataset, using the percentages stated in Table 3.
However, for a skewed dataset, accuracy is not a great measure. It is better to measure the
performance of algorithms by precision and recall (Chawla & Bowyer, 2002). Recall is the
fraction of actual positives that are correctly predicted. Precision, on the other hand, is
the fraction of predicted positives that are correct. Together, these metrics are used to
calculate a harmonic mean known as the F1-measure. In a multiclass environment, every class
has its own recall and precision, which can be combined into a weighted recall, a weighted
precision and ultimately a weighted F1-measure.
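\[ \text{Recall} = \frac{TP}{TP + FN} \]
\[ \text{Precision} = \frac{TP}{TP + FP} \]
\[ F_1\text{-measure} = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \]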
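As a minimal sketch of how these per-class metrics and their weighted averages can be derived from a confusion matrix, consider the following. The function names and the 3-class confusion matrix are hypothetical and do not come from the experiments; weighting each class by its number of actual instances is assumed to be what "weighted" means here.

```python
def per_class_metrics(confusion):
    """confusion[i][j] = instances of actual class i predicted as class j.

    Returns one (precision, recall, f1, support) tuple per class.
    """
    n = len(confusion)
    metrics = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # actual c, predicted otherwise
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, actually otherwise
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics.append((precision, recall, f1, tp + fn))
    return metrics

def weighted_average(metrics):
    """Average precision, recall and F1, weighted by the class sizes."""
    total = sum(m[3] for m in metrics)
    return tuple(sum(m[i] * m[3] for m in metrics) / total for i in range(3))

# Hypothetical confusion matrix for three classes (rows: actual, columns: predicted).
confusion = [[8, 2, 0],
             [1, 5, 4],
             [0, 0, 10]]

metrics = per_class_metrics(confusion)
w_precision, w_recall, w_f1 = weighted_average(metrics)
print(metrics)
print(w_precision, w_recall, w_f1)
```

Note that the weighted recall always equals the overall accuracy, since each class's recall is weighted by exactly the number of instances it was computed over.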
Figure	8	:	accuracy	improvements	by	balancing	dataset	
Figure	9	:	weighted	average	recall	
Figure	10	:	weighted	average	precision
Figure	11	:	Weighted	average	F-measure	
Results from the experiments are shown above. At a two-tailed confidence level of 95%,
the performance of the J48 and AdaBoostM1(J48) algorithms improved significantly (v) by
balancing the dataset. This was found by running the Weka Experimenter. Only the baseline
algorithm deteriorated significantly (*). The standard deviations also decreased, meaning
more stable results. This supports the broad conclusion that balancing improves the
performance of the mentioned algorithms. Here the default values of the algorithms were
used. The parameters of the different algorithms will be adjusted at a later stage, when we
compare them to one another.
5.1.4. Normalization	
When the variable ranges differ greatly, normalization can be beneficial. For this dataset,
the scales are very different among the attributes. With normalization, the values, measured
on different scales, are adjusted to fit a common scale. It is important that normalization
is applied after outliers have been handled, which was done in an earlier step. The default
values for the scale (1) and translation (0) are used, meaning that everything is scaled to
the interval [0,1]. The class values are ignored since they are nominal values. At a 95%
confidence level, there is no significant difference in the accuracy of the three
algorithms. Here, normalization has no effect. Therefore, we continue with the dataset
without normalizing the numeric attributes.
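The scaling described above can be sketched as a min-max transformation with a scale and a translation. This is an illustrative reimplementation rather than the Weka filter itself; the function name, the handling of constant attributes and the example values are assumptions.

```python
def normalize(values, scale=1.0, translation=0.0):
    """Map values linearly onto [translation, translation + scale]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant attribute is mapped to the lower bound.
        return [translation for _ in values]
    return [(v - lo) / (hi - lo) * scale + translation for v in values]

# Hypothetical alcohol percentages, not taken from the dataset.
alcohol = [8.0, 9.5, 11.0, 14.0]
print(normalize(alcohol))                              # defaults give the interval [0, 1]
print(normalize(alcohol, scale=2.0, translation=-1.0)) # gives the interval [-1, 1]
```

With the defaults, the smallest value maps to 0 and the largest to 1, so attributes measured on very different scales become directly comparable.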
5.1.5. Feature	Selection	
In	the	real	world,	more	attributes	can	lead	to	higher	discrimination	power.	However,	most	
machine	learning	algorithms	have	difficulties	handling	irrelevant	or	redundant	information.	
Sometimes	attributes	can	be	completely	irrelevant	for	the	class.	These	attributes	still	need	
processing	power	and	can	even	bias	the	result.	Therefore,	feature	subset	selection	is	a	great	
way	to	improve	classification	results,	lower	processing	time	and	raise	readability	of	the	model	
(Guyon,	 2003).	 This	 is	 done	 by	 identifying	 and	 neglecting	 or	 removing	 the	 irrelevant	
information.	 Feature	 selection	 is	 successful	 if	 the	 number	 of	 dimensions	 can	 be	 reduced	
without	lowering	(or	by	improving)	the	accuracy	of	the	induction	algorithm.	
Figure	12	:	Effect	of	normalising	numeric	attributes
There are four elements to consider when applying feature subset selection: the evaluation
strategy, search method, search direction and termination point.
The choice of evaluation strategy depends on the processing power available and the dataset
used. There are three types of feature selection methods. An embedded method puts the
feature selection within the basic learning algorithm. A filter first selects features to be
passed on to the learning algorithm. The wrapper method wraps a feature selection algorithm
around a classifier. When the irrelevant attributes need to be deleted before applying a
learning algorithm, a filter can be used. This filter applies heuristics to the data
characteristics in order to determine the merit of including or excluding a specific
attribute. When you want to take into account the bias of the learning algorithm that is
used for feature selection, the wrapper method can be used. This leads to more reliable
results for large datasets since the feature selection is optimized for the particular
learning algorithm used. This strategy, however, requires a tremendous amount of processing
power because the learning algorithm is called for every feature set considered
(Hall, 1999). A comparison of the two approaches can be seen in the figure below
(Hall, 1999).
The search method also has a great influence on the processing power needed. If there are
N attributes, there exist 2^N subsets. The search space thus grows quickly, to the point
where exhaustive search becomes infeasible. Heuristic search methods, on the other hand, can
lead to suboptimal solutions. Weka can let the search move past a suboptimal solution to
lower this risk.
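To make the size of the search space concrete, the following sketch enumerates every subset of a list of attributes. The attribute names are placeholders, and the count of 11 predictive attributes (twelve attributes minus the class) is an assumption for illustration.

```python
from itertools import combinations

def all_subsets(attributes):
    """Yield every subset of the attributes, from the empty set to the full set."""
    for size in range(len(attributes) + 1):
        yield from combinations(attributes, size)

# Placeholder names for 11 predictive attributes (assumed count).
attributes = [f"a{i}" for i in range(1, 12)]
n_subsets = sum(1 for _ in all_subsets(attributes))
print(n_subsets)  # equals 2 ** 11

# The search space doubles with every extra attribute.
for n in (11, 20, 30):
    print(n, 2 ** n)
```

For a handful of attributes exhaustive enumeration is cheap, but at 30 attributes there are already more than a billion subsets to evaluate.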
Figure	13	:	Filter	and	wrapper	strategies
The	search	direction	can	have	a	serious	effect	on	the	attributes	selected.	One	can	start	by	
selecting	 all	 attributes	 and	 iteratively	 deleting	 attributes	 from	 that	 selection	 until	 some	
termination	 point.	 This	 method	 is	 called	 backward	 elimination.	 On	 the	 other	 hand,	 the	
forward	selection	method	starts	with	zero	attributes	and	gradually	builds	up	a	selection	until	
some	termination	point.		Combining	these	two	methods	leads	to	bi-directional	search,	where	
you	start	with	a	subset	of	attributes	and	you	either	delete	or	add	attributes	depending	on	
some	characteristic	such	as	merit.	
By	setting	a	termination	point,	you	avoid	processing	over	the	entire	search	space.	Typically,	
a	termination	point	could	be	a	fixed	number	of	attributes	to	select	or	a	merit	threshold.	
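Forward selection with a termination point can be sketched as a greedy loop. This is a simplified illustration and not one of the Weka search classes; the function name, the toy merit function and its weights are hypothetical.

```python
def forward_selection(attributes, merit, max_features=None, min_gain=0.0):
    """Greedy forward selection.

    Starts from the empty set and repeatedly adds the attribute that raises
    the merit the most. Stops when max_features is reached or when the best
    addition improves the merit by no more than min_gain.
    """
    selected = []
    best = merit(selected)
    remaining = list(attributes)
    while remaining and (max_features is None or len(selected) < max_features):
        score, candidate = max((merit(selected + [a]), a) for a in remaining)
        if score - best <= min_gain:
            break  # termination point: no sufficient improvement
        selected.append(candidate)
        remaining.remove(candidate)
        best = score
    return selected, best

# Toy merit: hypothetical usefulness per attribute minus a cost per attribute.
weights = {"alcohol": 0.5, "density": 0.3, "pH": 0.05}

def toy_merit(subset):
    return sum(weights[a] for a in subset) - 0.1 * len(subset)

selected, score = forward_selection(list(weights), toy_merit)
print(selected, score)
```

Backward elimination follows the same pattern in reverse: it starts from the full set and removes the attribute whose deletion helps the merit the most.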
(i) Feature	subset	selection	in	Weka	
Based on the scatter plots, we suspect some attributes to be redundant because of their
seemingly high correlation with each other. An example can be seen in the figure below,
which shows the relation of ‘residual sugar’ to ‘density’. Based on the theory above, one
can see that the higher the amount of residual sugar, the higher the density will be. This
relation, combined with the low correlation of SO2 with ‘quality’, will probably lead to the
exclusion of one of the two attributes.
Figure	14	:	Relation	of	‘residual	sugar’	to	‘density’	shows	high	correlation	
Weka offers many ways to apply feature subset selection, either permanently or temporarily
during learning algorithm execution. Since processing power is limited, all irrelevant
attributes were discarded before using the adapted dataset to train the models. This method
is very fast and leads to similar performance as the slower wrapper method. Although there
is a filter in Weka called AttributeSelection that combines an evaluation strategy with a
search method to automatically select the correct attributes, it does not apply
cross-validation. Therefore, an attribute selection is first processed, and its results are
manually applied afterwards. The evaluation strategy used is CfsSubsetEval (CFS =
Correlation-based Feature Selection), which looks at the correlation matrix of all
attributes. It relies on a metric called “symmetric uncertainty” (Hall, 1999). It considers
the predictive value of each attribute, together with the degree of inter-redundancy.
Attributes with high correlation with the class attribute and low inter-correlations are
preferred. CFS assumes that the attributes are independent and can fail to select the
relevant attributes when they depend strongly on other attributes given a class. The
components of CFS are listed in the figure below (Hall, 1999).
Figure	15	:	Components	of	CFS	
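The trade-off between predictive value and redundancy can be illustrated with the merit heuristic from Hall (1999), merit = k·r_cf / sqrt(k + k(k−1)·r_ff), where k is the subset size, r_cf the mean attribute-class correlation and r_ff the mean attribute-attribute correlation. The sketch below is a plain reimplementation of that formula with hypothetical correlation values; it is not the CfsSubsetEval code.

```python
from math import sqrt

def cfs_merit(class_correlations, inter_correlations):
    """Merit of an attribute subset.

    class_correlations: one attribute-class correlation per attribute.
    inter_correlations: one correlation per pair of attributes in the subset.
    """
    k = len(class_correlations)
    if k == 0:
        return 0.0
    mean_cf = sum(class_correlations) / k
    mean_ff = (sum(inter_correlations) / len(inter_correlations)
               if inter_correlations else 0.0)
    return k * mean_cf / sqrt(k + k * (k - 1) * mean_ff)

# Hypothetical values: one attribute with class correlation 0.5 ...
print(cfs_merit([0.5], []))
# ... adding a second, fully redundant attribute leaves the merit unchanged ...
print(cfs_merit([0.5, 0.5], [1.0]))
# ... while adding an uncorrelated attribute of equal value raises it.
print(cfs_merit([0.5, 0.5], [0.0]))
```

This shows why a highly inter-correlated pair of attributes tends to lose one member: the redundant attribute enlarges the subset without improving the merit.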
Multiple search methods were used to compare the results. All led to the same result.
Ultimately, exhaustive search was used because it was more than feasible with only twelve
attributes. With 10-fold cross-validation, it shows a clear exclusion of ‘residual sugar’.
Experimenting with the dataset before and after deleting the attribute ‘residual sugar’
shows that deleting it does not significantly deteriorate the performance of the different
algorithms (95% confidence level). More importantly, however, the CPU time needed to build
the model does decrease significantly, from 0.96 to 0.87 for J48 and from 8.84 to 8.35 for
AdaBoost (cf. figures below). This shows that by removing the attribute one can reduce
processing time without lowering the performance of the algorithms. It is therefore
beneficial to remove the attribute, especially when processing power is limited.
Figure	16	:	Results	of	Attribute	Selection	with	10	fold	cross-validation
5.2. Training	and	Test	set	split	
The accuracies stated above are an estimate for the entire population. The goal here is to
compare the performance of the different algorithms on the dataset itself. To avoid biased
results, a separate test set should be used to check the performance of the models that were
built using the training set. Our dataset is sufficiently large to perform a 66% split in
order to create a test set. To do this, the RemovePercentage filter was used. However, the
way this split is done can dramatically affect the performance of the algorithms. The
RemovePercentage filter simply splits the dataset in its current order. Therefore, it is
better to first randomize the dataset. Because the dataset had already been randomized after
balancing, the RemovePercentage filter was applied directly with a percentage of 66%. The
remaining, smaller dataset was saved as the test set. Afterwards, the original set was
reloaded and the same percentage was used, but now combined with the InvertSelection
property. This way a training set of 10105 instances and a test set of 5206 instances were
created. We will not touch the test set until we have created the models to be compared with
each other.
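The split sizes can be checked with a small sketch that shuffles the instances and cuts the list at 66%. The function name and seed are hypothetical, and rounding the cut-off to the nearest integer is an assumption; the Weka filters themselves are not used here.

```python
import random

def shuffled_split(instances, percent_train=66.0, seed=1):
    """Shuffle the instances, then split them into a training and a test part."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # avoid a split that follows file order
    cut = round(len(shuffled) * percent_train / 100.0)
    return shuffled[:cut], shuffled[cut:]

# Instance indices stand in for the 15311 instances of the balanced dataset.
train, test = shuffled_split(range(15311))
print(len(train), len(test))
```

Under these assumptions the sketch yields 10105 training instances and 5206 test instances, matching the sizes reported above.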
5.3. Comparing	algorithms	
In	Weka,	the	induction	algorithms	can	be	optimized	by	adjusting	different	parameters.	The	
default	values	lead	to	reasonable	performance.	However,	since	every	dataset	differs,	it	might	
be	beneficial	to	make	use	of	tailored	values.	That	is	why,	before	performing	the	experiments,	
both	J48	and	AdaBoost	are	optimized	separately.		
5.3.1. Optimizing	J48	
Figure 17 : Accuracy before and after deleting 'residual sugar'
Figure 18 : CPU time training before and after deleting 'residual sugar'
Of interest are the parameters minNumObj, which determines the minimum number of
instances needed to make up a leaf, and confidenceFactor, which determines the amount of
pruning that occurs (with a smaller value meaning more pruning). By increasing the minimum
number of objects in a leaf, the size of the tree can be limited. This allows for an easier
to understand model and can reduce overfitting. However, it is expected that accuracy will
decrease with growing leaves. Therefore, a tradeoff needs to be made between tree size and
accuracy. The graph below shows the impact of adjusting the minNumObj parameter. The
Experimenter was used with 30 iterations.
Figure	19	:	Adjusting	minNumObj	parameter	for	J48	
By default, minNumObj is set at 2. This is increased up to 500. The graph shows a gradual
decline in both tree size and accuracy as a result. The standard deviation is not included
in this graph but gradually increases for both metrics. The tree size shrinks much faster
than the accuracy. By changing the parameter to 5, the tree size is halved while the
accuracy decreases only from 69,8% to 68,12%. Going further to a minimum of 10 halves the
tree size yet again, with a slight decrease in accuracy to 66,5%. Beyond that point, the
accuracy drops slightly faster. Therefore, minNumObj is set at 10. Remarks on this decision
can be found in the limitations section.
Adjusting the confidenceFactor does not change the accuracy significantly. Values from 0,05
up to 0,5 have been tested, with 0,25 being the default value. The default value is
therefore kept.
5.3.2. Optimizing	AdaBoost	
Since AdaBoostM1(J48) uses J48 as its base learner, it is important to use the same
parameters for J48 here as those used for the standalone J48 algorithm. AdaBoost itself also
allows some adjusting. Namely, the number of iterations can be adjusted (cf. T in the
theoretical section on AdaBoost). As expected, the accuracy increases as more iterations
make up the committee. The graph below shows the accuracies corresponding to different
numbers of iterations. The accuracy improvements gradually decline. By setting an
improvement threshold at 1%, the number of iterations is set at 15, which leads to an
accuracy of 78,28%.
Figure	20	:	Accuracy	as	function	of	the	iterations	in	AdaBoost	
5.3.3. Experimental	setup	
The	goal	is	to	compare	the	performance	of	the	different	models	on	our	test	set.	Since	the	
Weka	experimenter	does	not	allow	supplying	a	separate	test	set,	the	Weka	Knowledgeflow	
was	used	to	set	up	our	experiment.	The	figure	below	shows	the	setup.	Both	the	training	set	
and	the	test	set	are	loaded	and	assigned	a	purpose.	The	attribute	‘quality’	is	set	as	the	class	
of	the	datasets.	All	classifier	elements	are	configured	to	fit	the	optimal	parameters	found	
above.	The	results	are	evaluated	and	pointed	toward	both	a	text	viewer	and	performance	
Figure	21	:	Experimental	setup	in	Weka	KnowledgeFlow
6. Results	&	Interpretation	
The table below shows all relevant performance metrics of the different algorithms.
                              ZeroR     J48       AdaBoost
Accuracy                      12,93%    65,48%    78,33%
Kappa                         0,00      0,60      0,75
Mean absolute error           0,24      0,12      0,07
Root mean squared error       0,35      0,26      0,22
Relative absolute error       100%      48%       27%
Root relative squared error   100%      75%       63%
TP rate                       0,13      0,66      0,78
FP rate                       0,13      0,06      0,04
Precision                     0,02      0,65      0,78
Recall                        0,13      0,66      0,78
F-measure                     0,03      0,65      0,78
ROC area                      0,50      0,89      0,96
Table 4 : Experimental results
From the table it is clear that AdaBoost outperforms the other algorithms in every aspect,
with J48 being the runner-up. Although our dataset is balanced, small differences in class
sizes exist due to the random split. In the training set, the class with a score of 9 is
slightly bigger than the other classes. That is why ZeroR adopts the rule to always choose
that class. This leads to an accuracy of merely 12,9% on the test set. J48 performs much
better, with an accuracy of 65,48%. This equals an error reduction of 60%. AdaBoost in turn
reduces the error of J48 by 37%. Since the classes are still quite balanced, the weighted
average of the F-measure approximates the accuracy. In the case of J48 and AdaBoost,
(weighted avg.) Precision and (weighted avg.) Recall also approximate the accuracy. ZeroR,
on the other hand, shows a much lower value for the weighted average of precision since in
all but one class, the TP and FP rates are zero.
Although the ROC area is especially useful for unbalanced datasets, here it confirms the
other metrics, with the area for AdaBoost coming close to 1. This indicates excellent
predictions. J48, with an ROC area just below 0,9, shows good predictions. The confusion
matrices can be found in appendix B. They show that for the class with score 9, very few
errors are made by J48 and AdaBoost. For J48, no instances from this class are misclassified
as having lower scores, and AdaBoost misclassifies only 3 instances in this way. Also, only
instances with scores from 6 to 8 are occasionally misclassified as being in the last class.
This shows that the models perform better on good wines.
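The error reductions quoted above follow directly from the accuracies in Table 4. A short sketch of the calculation, with a hypothetical helper name:

```python
def relative_error_reduction(baseline_accuracy, improved_accuracy):
    """Percentage by which the error rate drops, relative to the baseline error.

    Both accuracies are given in percent.
    """
    baseline_error = 100.0 - baseline_accuracy
    improved_error = 100.0 - improved_accuracy
    return 100.0 * (baseline_error - improved_error) / baseline_error

# Accuracies from Table 4 (in percent).
zero_r, j48, adaboost = 12.93, 65.48, 78.33

j48_vs_zero_r = relative_error_reduction(zero_r, j48)
adaboost_vs_j48 = relative_error_reduction(j48, adaboost)
print(round(j48_vs_zero_r, 1), round(adaboost_vs_j48, 1))
```

Rounded to whole percentages, these are the 60% and 37% error reductions mentioned in the text.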
7. Conclusion	
As expected, AdaBoost does a great job in improving the performance of the underlying
algorithm. The performance on our test set confirms the hypothesis. The performance of
ZeroR depends heavily on the number of classes and the distribution among them. J48, given
its lower complexity and smaller need for processing power, performs fairly well.
The results shown here depend of course on the dataset split and will vary somewhat when
the data is randomized differently. To give an estimate of their performance on the entire
population, cross-validation should be performed with at least 30 iterations, on which a
T-test can be used. This method was applied during the intermediary experiments at the
preprocessing stage and showed similar results.
8. Limitations	
8.1. Outlier	detection	
Here, outliers were detected subjectively by looking at scatter plots and comparing the
values with plausible values in the area of winemaking. However, this method is not
rigorous since it involves choosing which instances to delete and which to keep. Better, but
more drastic, would have been to combine the visual method with a statistical method for
outlier detection. One could assume that the attributes follow a normal distribution, or
apply a suitable transformation so that they do (Ben-gal, 2005; Cousineau, 2009). Then, a
criterion could be established based on z-scores, excluding all instances that are more than
x standard deviations away from the sample mean. Alternatively, Weka offers the
InterquartileRange filter. Any value outside the range [Q1 – k.IQR , Q3 + k.IQR], with k
being a constant, would be designated as an outlier.
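The interquartile-range criterion can be sketched in a few lines. This is an illustration rather than the Weka filter: the function name and the sample values are hypothetical, and the quartiles are computed with Python's `statistics.quantiles`, whose interpolation method may differ from the one Weka uses.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical measurements with one extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
print(iqr_outliers(sample))        # flags only the extreme value
print(iqr_outliers(sample, k=20))  # a very tolerant k flags nothing
```

The constant k controls how strict the criterion is, which makes the choice of which instances to delete explicit and repeatable instead of visual.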
8.2. Oversampling	
It has been shown that a combination of under-sampling and over-sampling might lead to
better performance than pure over-sampling. Also, the SMOTE filter has been taken to
extreme percentages in this study. The results will certainly suffer from overgeneralization.
Although other studies report experiments on a binary class where one class takes up 98%,
no best practices have been found on limiting the SMOTE percentage and the effects on error
rate, recall and precision. Experimenting with different percentages goes beyond the scope
of this study but is an interesting domain. In addition, the over-sampling happens before
splitting the data into a training set and a test set. This means that after splitting, the
test set will include synthetic instances. These lead to results that hold no predictive
power. For this study, however, only the difference between the J48 algorithm and the
AdaBoost algorithm is of interest. That is why, in this context, over-sampling the complete
dataset is not a problem.
8.3. Optimizing	J48	
The accuracy of J48 stated in the graph is based on the training set using 10-fold
cross-validation and thus provides no firm indication of the accuracy on the test set. Next
to that, the value of 10 for minNumObj is chosen arbitrarily. It is not based on a fixed
criterion or threshold but rather on what, in my opinion, would be a good tradeoff between
tree size and accuracy. Clearly, this would need to be investigated further.
9. References	
Bartlett,	P.	L.	(2007).	AdaBoost	is	Consistent,	8,	2347–2368.	
Beelman,	R.	B.,	&	Gallander,	J.	F.	(1979).	Wine	Deacidification.	Advances	in	Food	Research,	
25(C),	1–53.	
Ben-gal,	I.	(2005).	Outlier	Detection.	Data	Mining	and	Knowledge	Discovery	Handbook,	131–
Chawla,	N.	V.	(2005).	Data	Mining	for	Imbalanced	Datasets:	An	Overview.	Data	Mining	and	
Knowledge	Discovery	Handbook,	853–867.	
Chawla, N., & Bowyer, K. (2002). SMOTE: Synthetic Minority Over-sampling Technique.
Journal of Artificial Intelligence Research, 16, 321–357.
Coli,	M.	S.,	Gil,	A.,	Rangel,	P.,	Souza,	E.	S.,	Oliveira,	M.	F.,	Cristina,	A.,	&	Chiaradia,	N.	(2015).	
Chloride	concentration	in	red	wines:	influence	of	terroir	and	grape	type.	Food	Science	
and	Technology,	35(1),	95–99.	
Cortez,	P.,	Cerdeira,	A.,	Almeida,	F.,	Matos,	T.,	&	Reis,	J.	(2009).	Modeling	wine	preferences	
by	data	mining	from	physicochemical	properties.	Decision	Support	Systems,	47(4),	547–
Cousineau, D. (2009). Outliers detection and treatment: a review, 3(2010), 58–67.
Drummond,	C.,	&	Holte,	R.	C.	(n.d.).	C4.5,	Class	Imbalance,	and	Cost	Sensitivity:	Why	Under-
Sampling	beats	Over-Sampling.	
Drysdale,	G.	S.,	&	Fleet,	G.	H.	(1988).	Acetic	Acid	Bacteria	in	Winemaking:	A	Review.	Am.	J.	
Enol.	Vitic.,	39(2),	143–154.	
Freund,	Y.,	&	Schapire,	R.	E.	(1996).	Experiments	with	a	new	boosting	algorithm.	Thirteenth	
International	Conference	on	Machine	Learning,	148–156.	
Guyon,	I.	(2003).	An	Introduction	to	Variable	and	Feature	Selection,	3,	1157–1182.	
Hall,	M.	a.	(1999).	Correlation-based	Feature	Selection	for	Machine	Learning.	Methodology,	
21i195-i20(April),	1–5.	
Kong,	E.	B.,	&	Dietterich,	T.	G.	(1995).	Error-Correcting	Output	Coding	Corrects	Bias	and	
Variance.	Icml,	0,	313–321.	
Quinlan,	J.	R.	(1992).	C4.5:	Programs	for	Machine	Learning.	Morgan	Kaufmann	San	Mateo	
California	(Vol.	1).	
Quinlan,	J.	R.	(2006).	Bagging,	boosting,	and	C4.5.	Proceedings	of	the	Thirteenth	National	
Conference	on	Artificial	Intelligence,	5(Quinlan	1993),	725–730.	
Rahman,	M.	M.,	&	Davis,	D.	N.	(2013).	Addressing	the	Class	Imbalance	Problem	in	Medical	
Datasets.	International	Journal	of	Machine	Learning	and	Computing,	3(2),	224–228.	
Robinson,	J.	(1994).	The	Oxford	companion	to	wine.	In	The	Oxford	companion	to	wine	(pp.	
401,	530–31).	R22	
Salvatore,	R.	(2000).	Efficient	C4.5.	
Schapire,	R.	E.	(2013).	Explaining	adaboost.	Empirical	Inference:	Festschrift	in	Honor	of	
Vladimir	N.	Vapnik,	37–52.
10. Appendix	
10.1. Appendix	A	:	Outlier	detection	
Ap.	Figure	3	:	Outliers	with	attribute	'chlorides'	
Ap.	Figure	1	:	Outliers	with	attribute	'citric	acid'	
Ap.	Figure	2	:	Outliers	with	attribute	'residual	sugar'
Ap.	Figure	4	:	Outliers	with	attribute	'Free	Sulfur	Dioxide'	
Ap.	Figure	5	:	Outliers	with	attribute	'Total	Sulfur	Dioxide'	
Ap.	Figure	6	:	Outliers	with	attribute	'alcohol'
10.2. Appendix	B	:	Algorithm	comparison	
Ap.	Figure	7	:	Confusion	matrix	ZeroR	
Ap.	Figure	8	:	Confusion	matrix	J48	
Ap.	Figure	9	:	Confusion	matrix	AdaBoostM1(J48)

More Related Content

What's hot

Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
Md. Main Uddin Rony
Dbscan algorithom
Dbscan algorithomDbscan algorithom
Dbscan algorithom
Mahbubur Rahman Shimul
Image processing
Image processingImage processing
Image processing
Pooja G N
Data Science - Part XV - MARS, Logistic Regression, & Survival Analysis
Data Science -  Part XV - MARS, Logistic Regression, & Survival AnalysisData Science -  Part XV - MARS, Logistic Regression, & Survival Analysis
Data Science - Part XV - MARS, Logistic Regression, & Survival Analysis
Derek Kane
Perceptron in ANN
Perceptron in ANNPerceptron in ANN
Perceptron in ANN
Zaid Al-husseini
Brain Tumor Segmentation using Enhanced U-Net Model with Empirical Analysis
Brain Tumor Segmentation using Enhanced U-Net Model with Empirical AnalysisBrain Tumor Segmentation using Enhanced U-Net Model with Empirical Analysis
Brain Tumor Segmentation using Enhanced U-Net Model with Empirical Analysis
MD Abdullah Al Nasim
Journal For Research
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
Seonho Park
R Graphical User Interface Comparison.pptx
R Graphical User Interface Comparison.pptxR Graphical User Interface Comparison.pptx
R Graphical User Interface Comparison.pptx
Ramakrishna Reddy Bijjam
Keystroke dynamics
Keystroke dynamicsKeystroke dynamics
Keystroke dynamics
Tushar Kayande
Fingerprint Technology
Fingerprint TechnologyFingerprint Technology
Fingerprint Technology
Joy Dutta
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
inovex GmbH
Naive Bayes Classifier
Naive Bayes ClassifierNaive Bayes Classifier
Naive Bayes Classifier
Arunabha Saha
Neeraj Goswami
decision tree regression
decision tree regressiondecision tree regression
decision tree regression
Akhilesh Joshi
WEKA: Algorithms The Basic Methods
WEKA: Algorithms The Basic MethodsWEKA: Algorithms The Basic Methods
WEKA: Algorithms The Basic Methods
DataminingTools Inc
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
Lior Rokach
K-means Clustering
K-means ClusteringK-means Clustering
K-means Clustering
Anna Fensel
Visual pattern recognition
Visual pattern recognitionVisual pattern recognition
Visual pattern recognition
Rushin Shah

What's hot (20)

Classification Based Machine Learning Algorithms
Classification Based Machine Learning AlgorithmsClassification Based Machine Learning Algorithms
Classification Based Machine Learning Algorithms
Dbscan algorithom
Dbscan algorithomDbscan algorithom
Dbscan algorithom
Image processing
Image processingImage processing
Image processing
Data Science - Part XV - MARS, Logistic Regression, & Survival Analysis
Data Science -  Part XV - MARS, Logistic Regression, & Survival AnalysisData Science -  Part XV - MARS, Logistic Regression, & Survival Analysis
Data Science - Part XV - MARS, Logistic Regression, & Survival Analysis
Perceptron in ANN
Perceptron in ANNPerceptron in ANN
Perceptron in ANN
Brain Tumor Segmentation using Enhanced U-Net Model with Empirical Analysis
Brain Tumor Segmentation using Enhanced U-Net Model with Empirical AnalysisBrain Tumor Segmentation using Enhanced U-Net Model with Empirical Analysis
Brain Tumor Segmentation using Enhanced U-Net Model with Empirical Analysis
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
Convolutional Neural Network for Alzheimer’s disease diagnosis with Neuroim...
R Graphical User Interface Comparison.pptx
R Graphical User Interface Comparison.pptxR Graphical User Interface Comparison.pptx
R Graphical User Interface Comparison.pptx
Keystroke dynamics
Keystroke dynamicsKeystroke dynamics
Keystroke dynamics
Fingerprint Technology
Fingerprint TechnologyFingerprint Technology
Fingerprint Technology
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
Naive Bayes Classifier
Naive Bayes ClassifierNaive Bayes Classifier
Naive Bayes Classifier
decision tree regression
decision tree regressiondecision tree regression
decision tree regression
WEKA: Algorithms The Basic Methods
WEKA: Algorithms The Basic MethodsWEKA: Algorithms The Basic Methods
WEKA: Algorithms The Basic Methods
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
K-means Clustering
K-means ClusteringK-means Clustering
K-means Clustering
Visual pattern recognition
Visual pattern recognitionVisual pattern recognition
Visual pattern recognition

Viewers also liked

LBD BrandDev Portfolio
LBD BrandDev PortfolioLBD BrandDev Portfolio
LBD BrandDev Portfolio
Aaron Luck
My Hobbies
My HobbiesMy Hobbies
My Hobbies
Chapter 11 vocabular words and guided notes
Chapter 11 vocabular words and guided notesChapter 11 vocabular words and guided notes
Chapter 11 vocabular words and guided notes
Innovative lesson template
Innovative lesson templateInnovative lesson template
Innovative lesson template
Sertifikat Digital
Sertifikat DigitalSertifikat Digital
Sertifikat Digital
pingkan lumongdong
Empirical Study on Classification Algorithm For Evaluation of Students Academ...
Empirical Study on Classification Algorithm For Evaluation of Students Academ...Empirical Study on Classification Algorithm For Evaluation of Students Academ...
Empirical Study on Classification Algorithm For Evaluation of Students Academ...
Naïve Bayes and J48 Classification Algorithms on Swahili Tweets: Performance ...
Naïve Bayes and J48 Classification Algorithms on Swahili Tweets: Performance ...Naïve Bayes and J48 Classification Algorithms on Swahili Tweets: Performance ...
Naïve Bayes and J48 Classification Algorithms on Swahili Tweets: Performance ...
IJCSIS Research Publications
Jason Warnstaff
HCI - Individual Report for Metrolink App
HCI - Individual Report for Metrolink AppHCI - Individual Report for Metrolink App
HCI - Individual Report for Metrolink App
Darran Mottershead
Assessing Component based ERP Architecture for Developing Organizations
Assessing Component based ERP Architecture for Developing OrganizationsAssessing Component based ERP Architecture for Developing Organizations
Assessing Component based ERP Architecture for Developing Organizations
IJCSIS Research Publications
Manikandan Sundarapandian
Waltzing with Branches [ACCU]
Waltzing with Branches [ACCU]Waltzing with Branches [ACCU]
Waltzing with Branches [ACCU]
Chris Oldwood
Classification and Clustering Analysis using Weka
Classification and Clustering Analysis using Weka Classification and Clustering Analysis using Weka
Classification and Clustering Analysis using Weka
Ishan Awadhesh
HCI - Group Report for Metrolink App
HCI - Group Report for Metrolink AppHCI - Group Report for Metrolink App
HCI - Group Report for Metrolink App
Darran Mottershead
Data mining with weka
Data mining with wekaData mining with weka
Data mining with weka
Hein Min Htike
Project 2 Data Mining Part 1
Project 2 Data Mining Part 1Project 2 Data Mining Part 1
Project 2 Data Mining Part 1
открытый урок по пдд
открытый урок по пддоткрытый урок по пдд

Classifiers for Predicting Wine Quality