 
 
 
 
 
Sabancı University
Data Analytics M.Sc. Programme
2015-2016 Term

DA 592 - Term Project

Grupo Bimbo Inventory Demand
Kaggle Contest

Students: Berker Kozan, Can Köklü
 
Abstract

This data analytics project was done for a Kaggle contest where the goal was to perform demand prediction for the Grupo Bimbo company. The Python language was used with Jupyter notebooks, and the XGBoost library was used to perform training and predictions.
Various feature engineering techniques, such as text extraction with NLTK, creation of lag columns and averaging over a large number of variables, were used to enrich the data. After the train table was created, XGBoost was used to optimize for the scoring function dictated by the contest, RMSLE. Hyperparameter tuning was then applied, after feature selection based on feature importance and correlation analysis, to determine the best parameters for the XGBoost optimizer.
The final submission to Kaggle achieved a score of 0.48666, placing our team in the top 17% of the 2000 contestants.
The biggest challenges were related to analyzing and training on a large dataset. This was overcome by forcing the data types to smaller types (unsigned integers, low-precision floats, etc.), using the HDF5 file format for data storage and launching a powerful Google Cloud Compute preemptible instance (with 208 GB RAM).
Further improvements would include attempting hyperparameter tuning across a wider range of training tables (with different features) and implementing a failsafe method for running the experiment on preemptible instances. Additionally, creating different models and averaging them to find optimal, non-overfitted models would have yielded better results.

Keywords: data science, kaggle, demand prediction, python, jupyter, xgboost, cloud, google cloud compute, hdf5, hyperparameter tuning, feature selection
 
   
Table of Contents

Abstract
Table of Contents
Introduction
    What is Kaggle?
    What is the contest about?
    Why this project?
Tools Used
    Python
    Platforms
Data Exploration
    Definition of the Data Sets
    Exploratory Data Analysis
        Data Types and Sizes
        Summary of Data
        Correlations
    Decreasing the Data Size
Models
    Naive Prediction
        Score
    NLTK based Modelling
        Feature Engineering
        Modeling
        Technical Problems
            Garbage Collection
            Data Size
        Score
        Conclusion
    Final Models
        Digging Deeper in Data Exploration
            Demanda-Dev-Venta Relationship
            Train-Test Difference
        Feature Engineering
            Agencia
            Producto
            Demand Features
            Client Features
            General Totals
        Validation Technique
        Xgboost
        Training
        Hyperparameter Tuning
            Max Depth
            Subsample
            ColSampleByTree
            Learning Rate
        Technical Problems
            Storing Data
            RAM Problem
            Code Reuse & Automatization
Results
Conclusion
    Critical Mistakes
    Further Exploration

Introduction

What is Kaggle?

Kaggle[1] is a website founded in 2010 that provides data science challenges to its participants. Participants compete against each other to solve data science problems. Kaggle has unranked "practice" challenges as well as contests with monetary rewards. Companies that have data challenges work together with Kaggle to formulate the problem and reward the top performers.

What is the contest about?

The contest that we have taken on for our project[2] belongs to a Mexican company named Grupo Bimbo. Grupo Bimbo is a company that produces and distributes fresh bakery products. The nature of the problem at its core is demand estimation.
Grupo Bimbo produces the products and ships them from storage facilities (agencies) to stores (clients). The following week, a certain number of products that aren't sold are returned from the clients to Bimbo. To maximize their profits, Grupo Bimbo needs to predict the demand of stores accurately to minimize these returns.
In the contest, we are provided with 9 weeks' worth of data regarding these shipments and we are asked to predict the demand for weeks 10 and 11. Participants are allowed to submit 3 sets of predictions every day until the deadline of the project and can pick any two of these predictions as their final submissions.
As is standard practice at Kaggle, when making initial submissions, the predictions are ranked based on a "public" ranking which only evaluates a certain part of the submission. This is done to prevent gaming the system by overfitting through trial and error of submissions. For this contest, the "public" ranking is done on week 10 data; meaning, when submitting our predictions we would only be able to see our performance for week 10. The private performance of our predictions (i.e. week 11) is only shown after the contest ends.
For this contest the evaluation metric is the Root Mean Squared Logarithmic Error (RMSLE) of our predictions.
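
For reference, with p_i the predicted demand, a_i the actual demand and n the number of rows, the metric is:

\[
\mathrm{RMSLE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(\log(p_i + 1) - \log(a_i + 1)\bigr)^2}
\]
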
 
[1] "About | Kaggle." 2012. 10 Sep. 2016 <https://www.kaggle.com/about>
[2] "Grupo Bimbo Inventory Demand | Kaggle." 2016. 10 Sep. 2016 <https://www.kaggle.com/c/grupo-bimbo-inventory-demand>
Why this project?

We decided to do a Kaggle contest for our project for various reasons:
1. It would allow us to benchmark our data science abilities in an international field.
2. Kaggle has very active forums for each individual contest and these would provide us with great new methods and insights in solving problems.
3. Since the data provided is clean, we could spend more time on feature and model building rather than data cleaning.
4. We could work towards a clear goal and not be distracted.
5. From the number of contests on Kaggle, we picked the Grupo Bimbo project because:
   a. It deals with text data, which is considerably easier to work with for beginners.
   b. The data was very large and provided a learning opportunity in working with large datasets.
   c. The deadline of the project (August 30) was in line with the deadline of our term project.
Tools Used

Python

We decided to use Python (version 2.7)[3] as our scripting language. This is the language we worked with most in our programme and also one of the most popular data science languages. We built our systems on the Anaconda package by Continuum[4], as it offers a large number of libraries that helped us face the challenges.
We mainly ran Jupyter[5] (IPython) notebooks on various systems to code and report results.
A few of the specific tools/packages that we used were:
● NLTK[6]: NLTK is the most popular Natural Language Processing toolkit for Python. It offers great features like stemming, tokenizing and chunking in multiple languages. This was critical since the product names were in Spanish.
● XGBoost: XGBoost is a library that can be used in conjunction with various scripting languages (including R and Python) and is designed for gradient boosted trees. It is much faster than regular scripting tools since the computational parts are written and precompiled in C++. We picked this solution based simply on its fame, as many of the winners of Kaggle contests have used this tool[7].
● Pickle[8]: The pickle module implements binary protocols for serializing and de-serializing a Python object structure.
● HDF5[9] File Format: HDF is self-describing, allowing an application to interpret the structure and contents of a file with no outside information. It is a general-purpose, machine-independent standard for storing scientific data in files, developed by the National Center for Supercomputing Applications (NCSA).
● Scikit-Learn[10]: Scikit-Learn is a simple and efficient tool for data mining and machine learning; besides that, it is free and built on NumPy, matplotlib and SciPy. We used it in the feature extraction phase.
● NumPy[11]: NumPy is an open source extension module for Python which provides fast precompiled functions for mathematical and numerical routines. Furthermore, NumPy enriches the Python language with powerful data structures for efficient computation of multidimensional arrays and matrices.
● SciPy[12]: SciPy is a Python-based ecosystem of open-source software for mathematics, science, and engineering. We used it for sparse matrices.
● Garbage Collector[13]: The gc module was used in order to free up memory periodically and optimize performance.

[3] "Python 2.7.0 Release | Python.org." 2014. 10 Sep. 2016 <https://www.python.org/download/releases/2.7/>
[4] "Download Anaconda Now! | Continuum - Continuum Analytics." 2015. 10 Sep. 2016 <https://www.continuum.io/downloads>
[5] "Project Jupyter | Home." 2014. 10 Sep. 2016 <http://jupyter.org/>
[6] "Natural Language Toolkit — NLTK 3.0 documentation." 2005. 10 Sep. 2016 <http://www.nltk.org/>
[7] "xgboost/demo at master · dmlc/xgboost · GitHub." 2015. 10 Sep. 2016 <https://github.com/dmlc/xgboost/tree/master/demo>
[8] "12.1. pickle — Python object serialization — Python 3.5.2 documentation." 2014. 20 Sep. 2016 <https://docs.python.org/3/library/pickle.html>
[9] "Importing HDF5 Files - MATLAB & Simulink - MathWorks." 2012. 20 Sep. 2016 <http://www.mathworks.com/help/matlab/import_export/importing-hierarchical-data-format-hdf5-files.html>
[10] "scikit-learn: machine learning in Python — scikit-learn 0.17.1..." 2011. 20 Sep. 2016 <http://scikit-learn.org/>
[11] "What is NumPy? - Numpy and Scipy Documentation." 2009. 20 Sep. 2016 <http://docs.scipy.org/doc/numpy/user/whatisnumpy.html>
[12] "SciPy.org — SciPy.org." 2002. 21 Sep. 2016 <http://www.scipy.org/>
[13] "28.12. gc — Garbage Collector interface — Python 2.7.12..." 2014. 21 Sep. 2016 <https://docs.python.org/2/library/gc.html>

Platforms

For coding and performing our computations, we initially attempted to use our laptops (a Macbook Pro and an Ubuntu machine, each with 16 GB of RAM). However, after getting numerous MemoryErrors, we gradually came to realize that our computers would not be able to run the computations that we needed (at least not in an efficient and timely manner). To solve our problem we turned to cloud services.
We first set up an EC2 instance on Amazon Web Services with about 100 GB of RAM and 16 virtual CPU cores, using a public tutorial[14]. However, running such a powerful instance continuously proved costly; a two-day attempt to build and run models cost over 150 USD. (An important side note: one should make sure that all items that relate to the created instance are removed completely to avoid incurring charges. In the case of one of the authors of this paper, an extra 50 USD was later charged because backup copies of the instances were not deleted.)
We then decided to switch to the Google Cloud Compute service, building a system with similar specs, again following a publicly available tutorial[15]. Although slightly cheaper, having a dedicated machine run for an entire day again proved costly, incurring about 50 USD. At this point we decided to find a cheaper solution and looked at Amazon's Spot Instances and Google's Preemptible Instances.
Both Amazon Spot Instances and Google Preemptible Instances operate on the principle that they offer the company's surplus computing power at a discount. The caveat is that if there are other consumers that want to use this computing power, the instances can be stopped by the company at any point. The biggest difference between the two is that Amazon offers a bidding model where the price for the computing power fluctuates; if the buyer's bid is higher than the current market price, the instance remains active; however, if the market price rises above the bid, it is shut down. Google, on the other hand, offers a fixed price for the instance[16].
We eventually settled on using a Google Preemptible instance with 32 virtual CPUs and 208 GB of RAM. We had to deal with a premature shutdown only once while running the instance over the course of three days. The total cost of the preemptible instances, backups etc. came to about 60 USD.
The key interface to the Google Cloud instance was a command prompt terminal, where the Jupyter notebook was initiated and data files were uploaded and submission files were downloaded via SSH.

[14] "Setting up AWS for Kaggle Part 1 – Creating a first Instance – grants..." 2016. 10 Sep. 2016 <http://www.grant-mckinnon.com/?p=6>
[15] "Set up Anaconda + IPython + Tensorflow + Julia on a Google..." 2016. 10 Sep. 2016 <https://haroldsoh.com/2016/04/28/set-up-anaconda-ipython-tensorflow-julia-on-a-google-compute-engine-vm/>
[16] "What are the key differences between AWS Spot Instances... - Quora." 2015. 10 Sep. 2016 <https://www.quora.com/What-are-the-key-differences-between-AWS-Spot-Instances-and-Googles-Preemptive-Instances>

Github

GitHub[17] is a code hosting platform for version control and collaboration which lets people work together on projects from anywhere. We used it to work on our code in parallel while easily merging our developments.

[17] "Hello World · GitHub Guides." 2014. 20 Sep. 2016 <https://guides.github.com/activities/hello-world/>

Data Exploration

Definition of the Data Sets

The data sets that we were provided with were as follows:
● train.csv — the training set; total demand data from clients and products per week for weeks 3-9, containing the following fields:
  ○ Semana - The week
  ○ Agencia_ID - ID of the storage facility from which the order is dispatched
  ○ Canal_ID - The channel through which the order is placed
  ○ Ruta_SAK - The route ID of the delivery route
  ○ Cliente_ID - The client ID
  ○ Producto_ID - The product ID
  ○ Venta_uni_hoy - The number of items that were ordered
  ○ Venta_hoy - The total cost of the items that were ordered
  ○ Dev_uni_proxima - The number of items that were returned
  ○ Dev_proxima - The total cost of the items that were returned
  ○ Demanda_uni_equil - Actual demand (the stock that was actually sold); this is the label that we need to predict for weeks 10 and 11
● test.csv — the test set; data from clients and products for weeks 10 and 11, containing the fields:
  ○ Id
  ○ Semana
  ○ Agencia_ID
  ○ Canal_ID
  ○ Ruta_SAK
  ○ Cliente_ID
  ○ Producto_ID
● cliente_tabla.csv — client names (can be joined with train/test on Cliente_ID)
● producto_tabla.csv — product names (can be joined with train/test on Producto_ID)
● town_state.csv — town and state (can be joined with train/test on Agencia_ID)
● sample_submission.csv — a sample submission file in the correct format

Image 1: Data Structure

None of the numeric variables existing in the train data are present in the test set, so the problem here is predicting the demand with only 6 categorical features.

Exploratory Data Analysis

Data Types and Sizes

The sizes of the data files were as follows:
● town_state.csv 0.03 MB
● train.csv 3199.36 MB
● cliente_tabla.csv 21.25 MB
● test.csv 251.11 MB
● producto_tabla.csv 0.11 MB
● sample_submission.csv 68.88 MB

Distributions and Summary of Data

Image 2: Summary of Train Data

Image 3: Summary of Train Data (cont.)

Image 4: Distribution of Target Variable

The target variable's mean is 7, its median is 3, its maximum is 5000, its standard deviation is 25, and 75% of the data lies between 0 and 6. This is classic right-skewed data, which explains why the evaluation metric is RMSLE. Moreover, we log-transformed the target variable (log(variable + 1)) before starting modeling and then took its exponential (exp(variable) - 1) before submitting.
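
A minimal sketch of this transform, assuming the train data and the predictions are held in pandas/NumPy objects named as below:

import numpy as np

# log(x + 1) on the target before training ...
train["Demanda_log"] = np.log1p(train["Demanda_uni_equil"])

# ... and exp(x) - 1 on the predictions before writing the submission file.
submission["Demanda_uni_equil"] = np.expm1(predictions)
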
Correlations

Image 5: Scatter Plots of Key Variables

In these scatter plots, we see that orders are highly correlated with demand and, secondly, that where demand is high, returns are low.

Decreasing the Data Size

In order to optimize RAM usage and speed up XGBoost's performance, we made sure to force the types of the data fields explicitly. We defined all our integers as unsigned integers and decreased the precision of the floating point fields as much as possible. For example, Canal_ID can be a uint8. After these conversions, memory usage is reduced from 6.1 GB to 2.1 GB.
 
Data with Default Data Types:

RangeIndex: 74180464 entries, 0 to 74180463
Data columns (total 11 columns):
Semana               int64
Agencia_ID           int64
Canal_ID             int64
Ruta_SAK             int64
Cliente_ID           int64
Producto_ID          int64
Venta_uni_hoy        int64
Venta_hoy            float64
Dev_uni_proxima      int64
Dev_proxima          float64
Demanda_uni_equil    int64
dtypes: float64(2), int64(9)
memory usage: 6.1 GB

Data with Optimized Data Types:

RangeIndex: 74180464 entries, 0 to 74180463
Data columns (total 11 columns):
Semana               uint8
Agencia_ID           uint16
Canal_ID             uint8
Ruta_SAK             uint16
Cliente_ID           uint32
Producto_ID          uint16
Venta_uni_hoy        uint16
Venta_hoy            float32
Dev_uni_proxima      uint32
Dev_proxima          float32
Demanda_uni_equil    uint32
dtypes: float32(2), uint16(4), uint32(3), uint8(2)
memory usage: 2.1 GB
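
A minimal sketch of how such explicit typing can be done while loading train.csv with pandas (column types taken from the optimized listing above; the file path is assumed):

import numpy as np
import pandas as pd

# Explicit column types matching the known value ranges avoid the int64/float64 defaults.
dtypes = {
    "Semana": np.uint8, "Agencia_ID": np.uint16, "Canal_ID": np.uint8,
    "Ruta_SAK": np.uint16, "Cliente_ID": np.uint32, "Producto_ID": np.uint16,
    "Venta_uni_hoy": np.uint16, "Venta_hoy": np.float32,
    "Dev_uni_proxima": np.uint32, "Dev_proxima": np.float32,
    "Demanda_uni_equil": np.uint32,
}
train = pd.read_csv("train.csv", dtype=dtypes)
train.info(memory_usage="deep")   # reports roughly 2 GB instead of 6 GB
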
 
Models

Naive Prediction

We first decided to create a naive prediction. For this we grouped the training data by Product ID, Client ID, Agency ID and Route ID and simply took the median of each grouping. If a specific grouping did not exist in the training data set, we defaulted back to the product's median demand, and if this also did not exist, we simply took the average of the overall demand.
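
A minimal sketch of this fallback logic, assuming pandas DataFrames named train and test (a merge-based implementation would be faster in practice):

import pandas as pd

keys = ["Producto_ID", "Cliente_ID", "Agencia_ID", "Ruta_SAK"]
group_median = train.groupby(keys)["Demanda_uni_equil"].median()
product_median = train.groupby("Producto_ID")["Demanda_uni_equil"].median()
overall_mean = train["Demanda_uni_equil"].mean()

def naive_predict(row):
    # Most specific grouping first, then the product median, then the overall mean.
    key = tuple(row[k] for k in keys)
    if key in group_median.index:
        return group_median.loc[key]
    if row["Producto_ID"] in product_median.index:
        return product_median.loc[row["Producto_ID"]]
    return overall_mean

test["Demanda_uni_equil"] = test.apply(naive_predict, axis=1)
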
Score

This method resulted in a score of 0.73 when submitted.
 
NLTK based Modelling

Feature Engineering

We utilized the NLTK library to extract the following information from the Producto Tabla file (we used a slightly modified version of code provided by Andrey Vykhodtsev[18]); see the sketch below:
● Weight: In grams
● Pieces
● Brand Name: Extracted through a three-letter acronym
● Short Name: Extracted from the Product Name field. We processed this information using the NLTK library; we first removed the Spanish "stop words" and then used stemming to make sure only the cores of the names remained.

[18] "Exploring products - Kaggle." 2016. 10 Sep. 2016 <https://www.kaggle.com/vykhand/grupo-bimbo-inventory-demand/exploring-products>
 
Image 6: Product Data Names after preprocessing
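
A minimal sketch of the short-name and weight extraction, assuming the NLTK Spanish stopword list and Snowball stemmer (the exact regular expressions used in the original kernel may differ):

import re
from nltk.corpus import stopwords          # may require nltk.download("stopwords")
from nltk.stem import SnowballStemmer

stemmer = SnowballStemmer("spanish")
spanish_stopwords = set(stopwords.words("spanish"))

def short_name(product_name):
    # Keep only the stemmed, non-stopword word tokens of the product name.
    tokens = re.findall(r"[a-záéíóúñ]+", product_name.lower())
    kept = [stemmer.stem(t) for t in tokens if len(t) > 2 and t not in spanish_stopwords]
    return " ".join(kept)

def weight_grams(product_name):
    # Extract a weight such as "460g" or "1Kg" and normalise it to grams.
    match = re.search(r"(\d+)\s*(kg|g)", product_name.lower())
    if match is None:
        return None
    value, unit = int(match.group(1)), match.group(2)
    return value * 1000 if unit == "kg" else value

print(short_name("Pan Blanco 460g WON 2025"))    # a stemmed name such as "pan blanc won"
print(weight_grams("Pan Blanco 460g WON 2025"))  # 460
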
Modeling

We wanted to model the text data and predict from it. Here are the steps that were taken (a sketch of steps 2-5 follows the list):
1) Separate x and y of the train data.
2) Append the test data to the train data so that they share the same sparse product feature order (if they don't have the same column order, training gives false results).
3) Merge this data with the products.
4) Use the CountVectorizer of Scikit-learn on the brand and short_name columns to create sparse count-word matrices and append them to the train-test data horizontally.
5) Separate the appended train and test data again.
6) Train XGBoost with default parameters on the train data and predict the test data.
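
A minimal sketch of steps 2-5, assuming DataFrames train_x, test_x and a products table with short_name and brand columns:

import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer

# Stack train and test so the bag-of-words columns line up, then add the product text.
full = pd.concat([train_x, test_x], ignore_index=True)
full = full.merge(products, on="Producto_ID", how="left")

name_counts = CountVectorizer().fit_transform(full["short_name"].fillna(""))
brand_counts = CountVectorizer().fit_transform(full["brand"].fillna(""))
numeric = csr_matrix(full[["Semana", "Agencia_ID", "Canal_ID"]].values.astype(np.float32))

# Horizontal append of the sparse count-word matrices.
features = hstack([numeric, name_counts, brand_counts]).tocsr()

# Split the appended data back into train and test by row count.
n_train = len(train_x)
X_train, X_test = features[:n_train], features[n_train:]
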
Technical Problems

1) Garbage Collection
Garbage collection was a big problem because of the size of the data. When we stopped using a Python object, we had to delete it and force the garbage collection mechanism to free its memory. For this the gc library was used[19].
2) Data Size
Before using XGBoost, we had 70+ million records with 577 columns. Holding this sparse data in memory as a dataframe was impossible. We solved this issue with the sparse matrices of the SciPy library.
In the example below, instead of holding all the data (including zeros) in memory, the sparse method holds only the values different from 0. There are many sparse matrix formats; we used the "CSR" and "COO" ones[20].

[19] "28.12. gc — Garbage Collector interface — Python 2.7.12..." 2014. 21 Sep. 2016 <https://docs.python.org/2/library/gc.html>
[20] "Sparse matrices (scipy.sparse) — SciPy v0.18.1 Reference Guide." 2008. 21 Sep. 2016 <http://docs.scipy.org/doc/scipy/reference/sparse.html>
 
Image 7: Visual explanation of how the COO sparse matrix works.
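
For illustration, a tiny COO matrix built with SciPy; only the non-zero entries are stored as (row, column, value) triplets:

import numpy as np
from scipy.sparse import coo_matrix

rows = np.array([0, 1, 3])
cols = np.array([2, 0, 1])
vals = np.array([4.0, 7.0, 1.0])
sparse = coo_matrix((vals, (rows, cols)), shape=(4, 3))

print(sparse.toarray())      # dense 4x3 view, mostly zeros
sparse_csr = sparse.tocsr()  # CSR form supports fast row slicing and arithmetic
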
Score

The RMSLE scores obtained by using this method were as follows:

Validation   Test 10 (Public)   Test 11 (Private)
0.764        0.775              0.781

Conclusion

These scores are worse than the naive approach, so we started to think about a new model.

Final Models

Digging Deeper in Data Exploration

1. Demanda-Dev-Venta Relationship
On the data description page of the contest, it is stated that Demanda = Venta - Dev, except for some return situations.
When we query this equation, there are 615,000 records which are exceptions, as shown below. This can mean that returns can be made after more than 1 week. We flagged these products.

Image 8: Exceptional cases where the number of returns is higher than the number of orders (lagging returns).

Secondly, we queried for records where the demand and the number of orders are both zero (Demanda = 0 and Venta_uni_hoy = 0); there are 199,767 such records, which contain only returns. When we compute the mean demand of a product, these records can falsify our results as they only include return values.

Image 9: Exceptional cases where the number of orders and demand are both zero.
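
Both checks can be expressed directly in pandas; the flag column name below is the one that appears in the final training table:

# Rows where demand does not equal orders minus returns (lagging returns).
train["DemandaNotEqualTheDifferenceOfVentaUniAndDev"] = (
    train["Demanda_uni_equil"] != train["Venta_uni_hoy"] - train["Dev_uni_proxima"])

# Rows that contain nothing but returns (zero demand and zero orders).
only_returns = train[(train["Demanda_uni_equil"] == 0) & (train["Venta_uni_hoy"] == 0)]
print(train["DemandaNotEqualTheDifferenceOfVentaUniAndDev"].sum(), len(only_returns))
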
2. Train-Test Difference
We analyzed the product, client, agency and route identifiers which exist in train but not in test, and vice versa, or only in specific files.
There were 9,663 clients, 34 products, 0 agencies and 1,012 routes in the test data that do not exist in the train data.
The important outcome of this analysis was that we should build a general model that can handle new products, clients and routes which don't exist in the train data but do appear in the test data.
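
A minimal sketch of this check, assuming train and test DataFrames:

# Identifiers that appear in the test set but never in the training set.
for col in ["Cliente_ID", "Producto_ID", "Agencia_ID", "Ruta_SAK"]:
    unseen = set(test[col].unique()) - set(train[col].unique())
    print(col, len(unseen))
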
Feature Engineering

In order to provide our models with more information, we had to perform some feature engineering.

Agencia

The agencia file shows each agency's town ID and state name. We can merge this file with the train and test data on the Agencia_ID column and encode the state column into integers.

Image 10: Agencia Table after processing.
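
A minimal sketch of this merge and encoding, assuming the town_state.csv columns are Agencia_ID, Town and State:

import pandas as pd

town_state = pd.read_csv("town_state.csv")
# Encode the state names as small integers, then join onto train and test by Agencia_ID.
town_state["State_ID"] = town_state["State"].astype("category").cat.codes.astype("uint8")
train = train.merge(town_state[["Agencia_ID", "State_ID"]], on="Agencia_ID", how="left")
test = test.merge(town_state[["Agencia_ID", "State_ID"]], on="Agencia_ID", how="left")
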
Producto

We used the features from the NLTK model: weights and pieces. In addition to them, we included the short names of products and a brand ID.
In the excerpt from the product file below, we can see the same product with different weights and different IDs. We take the short name of these products (we will add a feature indicating that they are the same product) and include it in the features. Later we will see why.

Product file:
2025, Pan Blanco 460g WON 2025
2027, Pan Blanco 567g WON 2027

Demand Features

This was the most critical part of our data structure. We generated 4 new columns for our training and testing data and named them Lag0, Lag1, Lag2 and Lag3, after asking ourselves why we hadn't yet added the product's previous demands.
Lag0 is a special case that attempts to find the average demand for a specific row. This is done by attempting to find the average based on a large number of variables (as specific as possible) and, failing that, attempting to find the average over a smaller number of variables (a more relaxed, less accurate and more general average).
For example:
● Average demand based on: "Producto_ID", "Cliente_ID", "Ruta_SAK", "Agencia_ID", "Canal_ID"
● If this combination is not found, attempt to find the average based on: "Producto_ID", "Cliente_ID", "Ruta_SAK", "Agencia_ID"
● If this combination is not found, attempt to find the average based on: "Producto_ID", "Cliente_ID", "Ruta_SAK"
● And so on and so forth.
This was done in the order of finding the various averages based on the product ID first, then falling back on averages based on the short names of products (in the Pan Blanco example above, if product 2025 can't be found, we used product 2027 instead, reasoning that product 2027 gives an idea about product 2025), and, failing that, falling back on averages based on the brand names (in the same example, "WON" is used). A sketch of this fallback scheme is given below.
Lag1 through Lag3 were constructed in a similar fashion but were more strict and considered only a single week's data. In these cases we did not want to create any information based on brand names, as it would be too general; only combinations with product ID and product short name were used. So for a line of training data that pertained to week 7, Lag1 would be the averages of that product ID or product name based on week 6 data, Lag2 would be averages of week 5 data, and so on and so forth.
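
A minimal sketch of the fallback scheme for Lag0, assuming the engineered Prod_name_ID and Brand_ID columns are already present on both frames:

import numpy as np
import pandas as pd

# Grouping levels from the most specific to the most general.
GROUPINGS = [
    ["Producto_ID", "Cliente_ID", "Ruta_SAK", "Agencia_ID", "Canal_ID"],
    ["Producto_ID", "Cliente_ID", "Ruta_SAK", "Agencia_ID"],
    ["Producto_ID", "Cliente_ID", "Ruta_SAK"],
    ["Prod_name_ID"],   # products sharing the same short name (2025 vs 2027 above)
    ["Brand_ID"],       # last resort: brand-level average
]

def add_lag0(history, target):
    # Attach the average past demand at the most specific level available per row.
    lag0 = pd.Series(np.nan, index=target.index)
    for keys in GROUPINGS:
        means = (history.groupby(keys, as_index=False)["Demanda_uni_equil"]
                 .mean().rename(columns={"Demanda_uni_equil": "grp_mean"}))
        matched = target[keys].merge(means, on=keys, how="left")["grp_mean"]
        matched.index = target.index
        lag0 = lag0.fillna(matched)  # rows already filled keep their more specific value
    out = target.copy()
    out["Lag0"] = lag0
    return out

test = add_lag0(train, test)
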
Client Features

The client features were more difficult to engineer. Unlike the product table, the client table had a large number of duplicates, where client names were misspelled in different ways. We removed the duplicates from the client table and then used a code snippet provided by Abder Rahman Sobh[21] (the process made use of TF-IDF scoring of the client names and then manual selection of certain keywords) in order to classify the clients based on their types, resulting in the following categorization:
● Individual 353,145
● NO IDENTIFICADO 281,670
● Small Franchise 160,501
● General Market/Mart 66,416
● Eatery 30,419
● Supermarket 16,019
● Oxxo Store 9,313
● Hospital/Pharmacy 5,798
● School 5,705
● Post 2,667
● Hotel 1,127
● Fresh Market 1,069
● Govt Store 959
● Bimbo Store 320
● Walmart 220
● Consignment 14

[21] "Classifying Client Type using Client Names - Kaggle." 2016. 10 Sep. 2016 <https://www.kaggle.com/abbysobh/grupo-bimbo-inventory-demand/classifying-client-type-using-client-names>

General Totals

After obtaining the above averages, we also included the following:
● Total Venta per client (the turnover of the client)
● Total Venta_uni_hoy per client (total product units sold by a client)
● Division of the sum of Venta_hoy by Venta_uni_hoy (giving the approximate price per unit)
● Division of the sum of demand by the sum of Venta_uni (giving the ratio of goods actually sold by the client, i.e. the ability to sell inventory)
This was done for product short names and also product IDs, resulting in an additional 12 columns for our training data. Other added columns are shown below:
● Clients per town
● Sum of returns of a product
● Sum of returns of the short name of a product
After eliminating highly correlated features (above 90%; a sketch of this filter is given after the listing), the training data table was as follows:

Int64Index: 74180464 entries, 0 to 74180463
Data columns (total 36 columns):
Semana                                           uint8
Agencia_ID                                       uint16
Canal_ID                                         uint8
Ruta_SAK                                         uint16
Cliente_ID                                       uint32
Producto_ID                                      uint16
Venta_uni_hoy                                    uint16
Venta_hoy                                        float32
Dev_uni_proxima                                  uint32
Dev_proxima                                      float32
Demanda_uni_equil                                float64
Town_ID                                          uint16
State_ID                                         uint8
weight                                           uint16
pieces                                           uint8
Prod_name_ID                                     uint16
Brand_ID                                         uint8
Demanda_uni_equil_original                       float64
DemandaNotEqualTheDifferenceOfVentaUniAndDev     bool
Lag0                                             float64
Lag1                                             float64
Lag2                                             float64
Lag3                                             float64
weightppieces                                    uint16
Client_Sum_Venta_hoy                             float32
Client_Sum_Venta_uni_hoy                         float32
Client_Sum_venta_div_venta_uni                   float32
prod_name_sum_Venta_hoy                          float32
prod_name_sum_Venta_uni_hoy                      float32
prod_name_sum_venta_div_venta_uni                float32
Producto_sum_Venta_hoy                           float32
Producto_sum_Venta_uni_hoy                       float32
Producto_sum_venta_div_venta_uni                 float32
Producto_ID_sum_demanda_divide_sum_venta_uni     float64
Prod_name_ID_sum_demanda_divide_sum_venta_uni    float64
Cliente_ID_sum_demanda_divide_sum_venta_uni      float64
memory usage: 10.6 GB
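
The correlation filter can be sketched as follows, assuming the feature table is a pandas DataFrame named train:

import numpy as np

# Drop one column from every pair whose absolute correlation exceeds 0.90.
corr = train.select_dtypes(include=[np.number]).corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))  # keep the upper triangle only
to_drop = [col for col in upper.columns if (upper[col] > 0.90).any()]
train = train.drop(columns=to_drop)
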
 
Validation Technique

Validation is maybe the most critical part of a data science project. The top priority was to not overfit the data. We used different models to predict week 10 and week 11.

Image 11: Structure of the training, validation and test mechanism.

We used the 6th and 7th week data for training. Our validation for week 10 was the 8th week and our validation for week 11 was the 9th week. In the latter we did not use the Lag1 variable, because predicting week 11 with it would require week 10's demand (Lag1 of week 11 is week 10), which doesn't exist. Alternatively, we could predict week 10 first and use those predicted demands to predict week 11, but that carries the error from week 10 into week 11.
After the feature extraction phase and adding features to each record, we deleted the first 3 weeks, because they don't have the Lag1, Lag2 and Lag3 features.
Xgboost

XGBoost can be given 2 different data sets (train and validation). By playing with the parameters, we can make it train until the validation score stops improving for "N" iterations. It automatically stops and reports the best iteration number and its score.
XGBoost can also report feature importances according to the counts of the features in the trees of the model. For example:

Image 12: Feature importance graph of the fitted training data based on XGBoost.
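
A minimal sketch of this early-stopping setup with the xgboost Python API (the parameter values shown are the tuned ones reported later; the variable names are assumptions):

import matplotlib.pyplot as plt
import xgboost as xgb

dtrain = xgb.DMatrix(X_train, label=y_train)
dvalid = xgb.DMatrix(X_valid, label=y_valid)
params = {"objective": "reg:linear", "eta": 0.05, "max_depth": 22,
          "subsample": 0.9, "colsample_bytree": 0.4, "eval_metric": "rmse"}

# Stop when the validation RMSE has not improved for 10 rounds; since the target
# is log-transformed, RMSE here corresponds to the contest's RMSLE.
model = xgb.train(params, dtrain, num_boost_round=1000,
                  evals=[(dtrain, "train"), (dvalid, "valid")],
                  early_stopping_rounds=10)
print(model.best_iteration, model.best_score)

xgb.plot_importance(model)   # counts of feature usage across the trees of the model
plt.show()
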
Training

We started to make models after defining the validation strategy and the feature extraction.

Features                                              Validation 1 (Week 8)   Validation 2 (Week 9)
Trial 1                                               0.476226                0.498475
Trial 2: Removing highly correlated features          0.477067                0.493038
Trial 3: Adding lag interactions                      0.502514                N/A
Trial 4: Adding more lag interactions                 0.51825                 N/A
Trial 5: Lag interactions but removing more
         correlated features                          0.517606                N/A
Trial 6: Replacing extreme values with NaN            0.517467                0.517375
Trial 7: Removing low importance features (all
         lag interactions are removed)                0.480394                0.494104
Trial 8: Adding client types                          0.48101                 0.494804

Many other variations were tried but abandoned due to poor performance. Interestingly, the original dataset (with engineered features such as averages, lags etc.) resulted in the best performance. There is a caveat, however: these attempts were all made with a fixed setting in XGBoost and, as will be seen next, the number of trees may have been set too low in these trials to take into account the benefits of added features such as interactions between lags or client types.

Hyperparameter Tuning

After selecting the dataset, we proceeded with hyperparameter tuning of the XGBoost model. The XGBoost library has numerous parameters; the ones that were used for tuning were:

Max Depth:
The maximum depth of the decision trees.
● Values tried: 10, 12, 8, 6, 14, 18, 20, 22
● Optimal value: 22

Subsample:
The subsampling rate of the rows of the data.
● Values tried: 1, 0.9, 0.8, 0.6
● Optimal value: 0.9

ColSampleByTree:
The subsampling rate of the columns of the data.
● Values tried: 0.4, 0.3, 0.5, 0.6, 0.8, 1
● Optimal value: 0.4

Learning Rate:
The gradient descent optimization parameter (the size of each step).
● Values tried: 0.1, 0.05
● Optimal value: 0.05

Features                             Validation 1 (Week 8)   Validation 2 (Week 9)
Original Training                    0.476226                0.498475
Training after Parameter Tuning      0.469628                0.489799

Technical Problems

Storing Data

The CSV file type is very slow to load and save. In addition to that, it isn't self-describing: when we try to load data from it, we have to redo all the type conversions we did before saving it. We searched for a better file format to store that much data.
First, we tried the pickle library, which we had used for storing XGBoost models, because of its self-describing feature; but once the file size gets bigger, it starts to give errors. Second, we tried HDF5, which is designed for storing big data on disk. It was both very fast to load and save and also self-describing, so we picked it. A minimal sketch of this storage approach follows.
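
Shown with pandas (the key name and path are assumptions; pandas uses PyTables underneath):

import pandas as pd

# Dtypes are preserved in the HDF5 store, so no re-conversion is needed on load.
train.to_hdf("train_wz.h5", key="train", mode="w")
train = pd.read_hdf("train_wz.h5", key="train")
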
RAM Problem

Due to the size of the training and test tables, it was not possible to perform the operations using our underpowered laptops. Attempting to join large tables or use XGBoost to create models always resulted in memory errors. We solved this issue by migrating our environment to Google Cloud Compute. We used Linux command line prompts to install Anaconda and related libraries and then launched a Jupyter notebook to create a development environment. At its highest load, our instance (with 32 virtual CPUs and 208 GB RAM) was running at 100% CPU load and 40% RAM usage. Training and predicting over our full train and test data took more than 2 hours.

Code Reuse and Automatization

There were lots of coding challenges for us, as follows:
● Opening CSV files with predefined data types and names
● Handling HDF5 files
● Adding configurable features (Lag0, Lag1, ...) to the data
● Automatically deleting the first "N" lagged weeks from the train data
● Appending test to train data
● Separating test and train data automatically
● Configurable XGBoost hyperparameter tuning
● Handling memory issues
We solved these issues with object-oriented programming in Python. This is the structure of our general class:

class FeatureEngineering:
    def __init__(self, ValidationStart, ValidationEnd, trainHdfPath, trainHdfFile,
                 testHdfPath1, testHdfPath2, testHdfFile, testTypes, trainTypes,
                 trainCsvPath, testCsvPath, maxLag=0)
    def __printDataFrameBasics__(data)
    def ReadHdf(self, trainOrTestOrBoth)
    def ReadCsv(self, trainOrTestOrBoth)
    def ConvertCsvToHdf(csvPath, HdfPath, HdfName, ColumnTypeDict)
    def Preprocess(self, trainOrTestOrBoth, columnFunctionTypeList)
    def SaveDataFrameToHdf(self, trainOrTestOrBoth)
    def AddConfigurableFeaturesToTrain(self, config)
    def DeleteLaggedWeeksFromTrain(self)
    def ReadFirstNRowsOfACsv(self, nrows, trainOrTestOrBoth)
    def AppendTestToTrain(self, deleteTest=True)
    def SplitTrainToTestUsingValidationStart(self)

We can use this class by giving it configurable parameters:

parameterDict = {"ValidationStart": 8, "ValidationEnd": 9, "maxLag": 3,
    "trainHdfPath": '../../input/train_wz.h5', "trainHdfFile": "train",
    "testHdfPath1": "../../input/test1_wz.h5", "testHdfPath2": "../../input/test2_wz.h5",
    "testHdfFile": "test",
    "trainTypes": {'Semana': np.uint8, 'Demanda_uni_equil': np.uint32},
    "testTypes": {'id': np.uint32, 'Semana': np.uint8, 'Agencia_ID': np.uint16},
    "trainCsvPath": '../../input/train.csv', "testCsvPath": '../../input/test.csv'}
FE = FeatureEngineering(**parameterDict)

To add a complex lagged feature, we built an automation system which works with a config variable:

configLag0Target1DeleteColumnsFalse = ConfigElements(0, [
    ("SPClRACh0_mean",
     ["Producto_ID", "Cliente_ID", "Ruta_SAK", "Agencia_ID", "Canal_ID"], ["mean"]),
    ("SPClRA0_mean",
     ["Producto_ID", "Cliente_ID", "Ruta_SAK", "Agencia_ID"], ["mean"]),
    ("SB0_mean", ["Brand_ID"], ["mean"])], "Lag0", True)
FE.AddConfigurableFeaturesToTrain(configLag0Target1DeleteColumnsFalse)

To do hyperparameter tuning automatically, we wrote a Python function:

defaultParams = {"max_depth": 10, "subsample": 1., "colsample_bytree": 0.4, "missing": np.nan,
                 "n_estimators": 500, "learning_rate": 0.1}
testParams = [("max_depth", [12, 8, 6, 14, 16, 18, 20, 22]), ("subsample", [0.9, 0.8, 0.6]),
              ("colsample_bytree", [0.3, 0.5, 0.6, 0.8, 1]), ("learning_rate", [0.05])]
fitParams = {"verbose": 2, "early_stopping_rounds": 10}
GiveBestParameterWithoutCV(defaultParams, testParams, X_train, X_test, y_train, y_test,
                           fitParams)

Results

Over the course of the contest, we can name 4 milestone submissions. The validation, public and private scores of these submissions are shown below.

Model                                Validation 1   Validation 2   Public Score   Private Score
Naive (averages)                     0.736          -              0.734          0.754
Optimized with Product Data
via NLTK                             0.764          -              0.775          0.781
XGBoost with default parameters      0.476226       0.498475       0.46949        0.49596
XGBoost with parameter tuning        0.469628       0.489799       0.46257        0.48666

We can see from the results that we did not overfit the data at any point.
For our final submission of predictions, we achieved a score of 0.48666, placing our team in the top 17% of the 2000 contestants.

Looking over the scores of other participants, we would like to say that for first-time participants in a Kaggle contest, our results were very promising.

Conclusion

We are extremely happy that we picked a Kaggle contest for our project. It allowed us to work on a common real-world problem while giving us a benchmark of our abilities in the global arena.
We learned to leverage powerful cloud computing capabilities across various platforms, to manipulate large datasets under memory and computing power constraints, and to use the XGBoost library for training and testing purposes.
We also learned how to use important tools like command prompts to launch development environments and GitHub for code sharing and collaboration.

Critical Mistakes

Poor data exploration
We performed very little data exploration on our own. We mainly depended on the data exploration that was done by other Kagglers. This resulted in sub-optimal solutions in our training and testing, as we did not exclude outliers etc.

Not preparing for system outages
We faced one outage while using the Google Cloud Preemptible Instance (possibly due to high demand from other clients) which caused a key data file to become corrupted. The re-creation of this data file cost us over 5 hours of work. In the future, it would be preferable if the system listened for the "shut-down" signals that are sent by the platform and took the necessary steps to prevent the corruption of this data.

Performing hyperparameter tuning too late
In our process we initially performed feature selection using a fixed set of parameters for XGBoost and then proceeded to the hyperparameter tuning step. However, it became apparent that some features were being given lower scores because our initial set of parameters was not optimal for a high number of feature columns. Specifically, the depth of the trees was set to 6 in our initial feature selection; when this was increased to 22, it became apparent that the features that were originally dropped could have been good predictors.

Further Exploration

If we had more time and resources, we would have liked to undertake additional actions.

Partial Fitting
When faced with the memory problem we decided to use cloud services. However, another method would have been loading and processing the data in smaller batches. This would be a more scalable model and could even be used to create a cluster of cloud machines to perform operations in parallel.
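
A minimal sketch of such batch processing with pandas, shown for a simple per-product aggregate (the chunk size and column choice are arbitrary):

import pandas as pd

# Aggregate the 3 GB train.csv in pieces instead of loading it all at once.
product_totals = None
for chunk in pd.read_csv("train.csv", chunksize=5000000):
    part = chunk.groupby("Producto_ID")["Demanda_uni_equil"].sum()
    product_totals = part if product_totals is None else product_totals.add(part, fill_value=0)
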
Multiple Models
Although XGBoost is a very effective tool, it gives a single model (or in our case, 2 models, one for each week). We would like to explore the possibility of creating a larger number of models using different systems and seeing how they perform for different slices of the data. We would then take some sort of weighted average of these predictions to reach our final prediction.
As an extension to this idea, we would also perform parameter tuning across these various models to find optimal solutions for each one.

Neural Networks
We would also have liked to approach this problem with a neural network solution to see the accuracy of its predictions and to compare the performance of the neural network solution vs the XGBoost tool.