Questions from paper
"A Few Useful Things to Know about Machine Learning"
Reference: http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
By:
Akhilesh Joshi
mail: akhileshjoshi123@gmail.com
1. What is the definition of ML?
Machine learning is the art of using existing data (historical and present) to make forecasts and predictions by fitting statistical models, with little or no manual intervention. In the author's words, machine learning systems automatically learn programs from data. Although machine learning techniques are still being developed, it is one of the most important concepts in data science, with many applications that benefit people.
2. What is a classifier?
A classifier is a system that takes as input a vector of feature values (which may be discrete or continuous) and outputs a single discrete value, the class. A learner builds the classifier from a set of labeled examples called the training data. The main aim is for the classifier learned from the training data to correctly classify new examples, such as those in the test data.
3. What are the 3 components of a learning system, according to the author? Explain them briefly.
The author describes three components of a learning system:
a. Representation
Representation is the formal language in which the classifier is expressed. Choosing a representation fixes the hypothesis space, the set of classifiers the learner can possibly learn, so it must suit our data. For example, a decision tree might suit one data set perfectly, whereas a neural network might be best suited to another.
b. Evaluation
An evaluation (objective or scoring) function distinguishes good classifiers from bad ones. Good classifiers are the hypotheses best suited to our data. For example, when predicting whether a student will get a job, we might score classifiers by likelihood rather than by precision and recall; the evaluation step is where we make that choice.
c. Optimization
Optimization is the method used to search the hypothesis space for the classifier with the highest score under the evaluation function. Out of all possible hypotheses, it selects the one that provides the best solution for our data.
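To make the three components concrete, here is a minimal sketch built on our own assumptions (none of it comes from the paper): the representation is the family of one-feature threshold rules "predict True if x > t", the evaluation function is training accuracy, and the optimization is an exhaustive search over candidate thresholds. The data and all function names are hypothetical.

```python
from typing import Callable, List, Tuple

Example = Tuple[float, bool]  # (feature value, class label)

# Representation: the hypothesis space is every rule "predict True if x > t".
def make_rule(t: float) -> Callable[[float], bool]:
    return lambda x: x > t

# Evaluation: score a hypothesis by its accuracy on the labeled examples.
def accuracy(rule: Callable[[float], bool], data: List[Example]) -> float:
    return sum(rule(x) == y for x, y in data) / len(data)

# Optimization: search the hypothesis space for the highest-scoring rule.
# Here the search is exhaustive over midpoints between sorted feature values.
def learn(data: List[Example]) -> float:
    xs = sorted({x for x, _ in data})
    candidates = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    return max(candidates, key=lambda t: accuracy(make_rule(t), data))

# Hypothetical training data: hours studied -> passed the exam?
train = [(1.0, False), (2.0, False), (3.0, False), (4.0, True), (5.0, True)]
best_t = learn(train)
print(best_t, accuracy(make_rule(best_t), train))  # 3.5 1.0
```

Swapping any one piece (for example, a decision-tree representation, a likelihood score, or a greedy search) changes the learner while leaving the other two components in place.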
4. What is information gain?
Information gain measures how much splitting a set of examples on an attribute reduces the entropy (uncertainty) of the class labels. Given a set of attributes, we choose the attribute with the maximum information gain: for each attribute, we compute the weighted average entropy of the subsets produced by splitting on it and subtract it from the entropy of the original set. This is how a decision tree is built. The attribute with the highest information gain becomes the root node; each child node is then split recursively, choosing the attribute with the highest information gain on that node's subset of examples. Splits are therefore chosen greedily, highest information gain first.
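Formula for information gain:
$$IG(A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} H(S_v)$$
IG(A): information gain of attribute A
H(S): entropy of all examples in S, $H(S) = -\sum_{c} p_c \log_2 p_c$, where $p_c$ is the proportion of examples in class $c$
H(S_v): entropy of the subset S_v of examples for which A takes the value v, after partitioning S on A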
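The formula above can be sketched directly in Python. This is a minimal illustration, not the paper's code; the toy "outlook/windy/play" rows and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List

def entropy(labels: List[Hashable]) -> float:
    """H(S) = -sum_c p_c * log2(p_c) over the class proportions p_c in S."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def information_gain(rows: List[Dict[str, Hashable]], attr: str, target: str) -> float:
    """IG(A) = H(S) - sum_v |S_v|/|S| * H(S_v), partitioning S on attribute attr."""
    labels = [r[target] for r in rows]
    partitions: Dict[Hashable, List[Hashable]] = {}
    for r in rows:
        partitions.setdefault(r[attr], []).append(r[target])
    remainder = sum(len(sub) / len(rows) * entropy(sub) for sub in partitions.values())
    return entropy(labels) - remainder

# Hypothetical data: "outlook" fully determines "play", "windy" tells us nothing.
rows = [
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "sunny", "windy": True, "play": "no"},
    {"outlook": "rain", "windy": False, "play": "yes"},
    {"outlook": "rain", "windy": True, "play": "yes"},
]
print(information_gain(rows, "outlook", "play"))  # 1.0
print(information_gain(rows, "windy", "play"))    # 0.0
root = max(["outlook", "windy"], key=lambda a: information_gain(rows, a, "play"))
print(root)  # outlook
```

A decision-tree learner would place `outlook` at the root and then repeat the same computation on each child's subset of rows.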
5. Why is generalization more important than just getting a good result on training data, i.e. the data that was used to train the classifier?
Training data gives us insight into what our data looks like, but training a machine learning algorithm on that particular data set does not guarantee that it will work correctly on the test data. The test data might be quite different from the training data, and then the output may not be as desired. Since we are unlikely to see exactly the training examples again, the goal is to generalize beyond them: we need an algorithm that works on both the training data and new, unseen data. Hence the concept of generalization.
6. What is cross-validation? What are its advantages?
Given the training data S and a hypothesis class H (which contains all possible hypotheses), we have to find the hypothesis h that is correct for our data. Cross-validation helps us choose h (for example, a parameter setting) while making the most of limited data: the training data is randomly divided into k subsets (say, ten), each subset is held out in turn while the learner is trained on the rest, each learned classifier is tested on the examples it did not see, and the results are averaged.
Advantages of cross-validation:
- Every example is used for both training and testing, which gives a clearer picture of how the algorithm will perform on data it has not seen.
- Because part of the training data is set aside as test data in each round, we can test the algorithm without collecting a separate test set.
- It reduces the main drawback of a single held-out test set, namely that holding out data shrinks the amount available for training.
[Figure: Illustration of cross-validation (image not reproduced)]
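The k-fold procedure described above can be sketched as follows. This is a minimal stdlib-only sketch, assuming a shuffled split into k interleaved folds; the names `k_fold_indices`, `cross_val_accuracy`, and the majority-class baseline are hypothetical, not from the paper or any library.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

def k_fold_indices(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1, deal them into k nearly equal folds, and return
    one (train_indices, test_indices) pair per fold, holding each fold out once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [
        ([j for f, fold in enumerate(folds) if f != i for j in fold], folds[i])
        for i in range(k)
    ]

def cross_val_accuracy(xs: Sequence, ys: Sequence, fit: Callable, predict: Callable,
                       k: int = 10, seed: int = 0) -> float:
    """Average held-out accuracy over the k folds."""
    scores = []
    for train, test in k_fold_indices(len(xs), k, seed):
        model = fit([xs[j] for j in train], [ys[j] for j in train])
        correct = sum(predict(model, xs[j]) == ys[j] for j in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)

# Hypothetical baseline learner: always predict the most common training label.
def fit_majority(xs, ys):
    return Counter(ys).most_common(1)[0][0]

def predict_majority(model, x):
    return model

xs = list(range(20))
ys = [True] * 15 + [False] * 5
print(cross_val_accuracy(xs, ys, fit_majority, predict_majority))  # 0.75
```

In practice `fit` would be trained once per candidate parameter setting, and the setting with the best cross-validated accuracy would be kept.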
7. How is generalization different from other optimization problems?
In most optimization problems, the function to be optimized is known. In generalization we do not have access to that function (the error on future test data), so we must use the training error as a surrogate and infer from it how the classifier will behave on test data. Because optimization usually deals with situations where most things are already known, we can expect the desired outputs, which is not the case in generalization problems.
8. If you have a scenario where the function involves 10 Boolean variables, how many possible examples (called the instance space) can there be? If you see 100 examples, what percentage of the instance space have you seen?
The number of instances is 2^N, where N is the number of Boolean variables. In our case the total is 2^10 = 1024 instances. Since we are given only 100 examples, we have seen only 100/1024 ≈ 9.77% of the instance space.
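The arithmetic above, as a small Python check (the variable names are only for illustration):

```python
n_vars = 10
instance_space = 2 ** n_vars   # each Boolean variable doubles the number of instances
seen = 100
fraction = seen / instance_space
print(instance_space, f"{fraction:.2%}")  # 1024 9.77%
```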
9. What is the "no free lunch" theorem in machine learning? You can do a Google search if the paper isn't clear enough.
The "no free lunch" theorem states that no learning algorithm is inherently superior to the others. If an algorithm performs well on a particular class of problems, it must perform worse on some other class; gains in one place are compensated by losses elsewhere. Formally, when the expected error on examples outside the training set is averaged over all possible target functions, any two learning algorithms have the same average error, so the difference between them is zero.
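The averaging argument can be checked exhaustively on a tiny domain. This is a sketch under our own assumptions: inputs are all 3-bit vectors, the first four are the training set, and two hypothetical learners (one sensible, one deliberately contrarian) are compared over every possible target function.

```python
from fractions import Fraction
from itertools import product

# Toy domain (an assumption for illustration): all 3-bit inputs, 8 points in total.
inputs = list(product([0, 1], repeat=3))
train_x, test_x = inputs[:4], inputs[4:]  # fixed split; test points are never seen

def majority_learner(train):
    """Predict the most common training label everywhere (ties -> 1)."""
    ones = sum(y for _, y in train)
    guess = 1 if 2 * ones >= len(train) else 0
    return lambda x: guess

def minority_learner(train):
    """Deliberately contrarian: predict the least common training label."""
    guess = 1 - majority_learner(train)(None)
    return lambda x: guess

def avg_off_training_accuracy(learner):
    correct = total = 0
    # Average over every possible target function f: {0,1}^3 -> {0,1} (2^8 of them).
    for labels in product([0, 1], repeat=len(inputs)):
        f = dict(zip(inputs, labels))
        model = learner([(x, f[x]) for x in train_x])
        correct += sum(model(x) == f[x] for x in test_x)
        total += len(test_x)
    return Fraction(correct, total)

print(avg_off_training_accuracy(majority_learner))  # 1/2
print(avg_off_training_accuracy(minority_learner))  # 1/2
```

Both learners average exactly 1/2 on unseen points: for any fixed training labels, every labeling of the test points is equally represented, so no learner can do better than chance on average over all targets.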
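10. What general assumptions allow us to carry out the machine learning process? What is the meaning of induction?
Very general assumptions, such as smoothness, similar examples having similar classes, limited dependences, or limited complexity, are often enough to learn well. Induction is the use of a small amount of available knowledge to produce a large amount of new knowledge; in the author's words, induction is a knowledge lever.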
11. How is learning like farming?
Farming depends on nature: with its help, farmers combine seeds with nutrients to grow crops. In a similar manner, learners combine knowledge (logic) with data to grow programs.
12. What is overfitting? How does it lead to a wrong idea that you have done a really good job on the training dataset?
Overfitting happens when a model fits the training data too closely: it becomes so used to the characteristics of the training data that it also learns the noise and errors in it. The model can then reach very high (even 100%) accuracy on the training data, which makes it look as though we have done a really good job. However, when the model applies what it learned to new data, the results are not as expected and it may perform poorly on the test data. Overfitting hurts the model's ability to generalize, and it is highly unlikely that the test data will be the same as the training data.
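A small deterministic illustration of that misleading training score, under our own assumptions: the true rule is "x > 0.5", two training labels are flipped to simulate noise, and we compare a hypothetical memorizing learner (1-nearest-neighbor) with a one-threshold decision stump. The data points are invented for the example.

```python
# Hypothetical 1-D data: the true rule is "x > 0.5", but two training labels are noise.
TRAIN_X = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
NOISY = {0.15, 0.85}
train = [(x, (x > 0.5) != (x in NOISY)) for x in TRAIN_X]
test = [(x, x > 0.5) for x in [0.13, 0.23, 0.33, 0.43, 0.63, 0.73, 0.83, 0.93]]

def one_nn(data):
    """Memorize the training set; predict the label of the closest training point."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

def stump(data):
    """Pick the single threshold t that minimizes training errors of 'x > t'."""
    xs = sorted(x for x, _ in data)
    candidates = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    t = min(candidates, key=lambda c: sum((x > c) != y for x, y in data))
    return lambda x: x > t

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

for name, learner in [("1-NN", one_nn), ("stump", stump)]:
    model = learner(train)
    print(name, accuracy(model, train), accuracy(model, test))
# 1-NN 1.0 0.75
# stump 0.8 1.0
```

The memorizer looks perfect on the training data because it has also learned the two noisy labels, yet it is the less accurate model on new points.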
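13. What is meant by bias and variance? You don't have to be really precise in defining them, just get the idea.
- Bias: error caused by a learner's erroneous assumptions; it is the tendency to consistently learn the same wrong thing. High bias → strong (more) assumptions; low bias → fewer assumptions.
- Variance: how much the model's estimate would change if a different training set were used; it is the tendency to learn random things irrespective of the real signal.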
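One way to see both quantities is to retrain the same learner on many independently drawn training sets and look at its predictions at a single query point. The sketch below assumes a hypothetical target f(x) = x, Gaussian noise with standard deviation 0.3, 20 points per training set, and query point 0.9; all of these are illustrative choices, not from the paper.

```python
import random
import statistics

rng = random.Random(42)
X0 = 0.9  # query point where bias and variance are measured

def true_f(x):  # hypothetical target function
    return x

def sample_training_set(n=20, noise=0.3):
    xs = [rng.random() for _ in range(n)]
    return [(x, true_f(x) + rng.gauss(0, noise)) for x in xs]

def constant_model(data):  # strong assumption (y ignores x) -> high bias
    mean_y = statistics.fmean(y for _, y in data)
    return lambda x: mean_y

def nearest_neighbor(data):  # almost no assumptions -> high variance
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

def bias_variance(learner, trials=1000):
    """Squared bias and variance of the learner's prediction at X0."""
    preds = [learner(sample_training_set())(X0) for _ in range(trials)]
    bias_sq = (statistics.fmean(preds) - true_f(X0)) ** 2
    return bias_sq, statistics.pvariance(preds)

for name, learner in [("constant", constant_model), ("1-NN", nearest_neighbor)]:
    b, v = bias_variance(learner)
    print(f"{name:8s} bias^2={b:.3f} variance={v:.3f}")
```

Under these assumptions the constant model should show a large squared bias and a small variance, and 1-NN roughly the reverse.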
14. What are some of the things that can help combat overfitting?
The following techniques might help combat overfitting:
- Cross-validation
- Adding a regularization term to the evaluation function
- Performing a statistical significance test, such as chi-square, before adding new structure
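As an illustration of the second item, here is a minimal sketch of an evaluation function with a regularization term. It assumes (our choice, not the paper's) a one-weight model y = w*x scored by squared error plus an L2 penalty lam * w^2, which has a closed-form minimizer; `fit_slope` and the data are hypothetical.

```python
from typing import Sequence

def fit_slope(xs: Sequence[float], ys: Sequence[float], lam: float = 0.0) -> float:
    """Minimize sum((y - w*x)^2) + lam * w^2 for a one-weight model y = w*x.

    Setting the derivative to zero gives w = sum(x*y) / (sum(x^2) + lam);
    lam = 0 is ordinary least squares, and larger lam shrinks w toward 0.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(fit_slope(xs, ys))            # 2.0 (no regularization)
print(fit_slope(xs, ys, lam=14.0))  # 1.0 (the penalty pulls the weight toward zero)
```

With more weights the same idea penalizes classifiers with more structure, favoring smaller ones that have less room to overfit.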
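15. Why do algorithms that work well in lower dimensions fail at higher dimensions? Think about the number of instances possible in higher dimensions and the cost of similarity calculation.
As the number of dimensions increases, the amount of data required to train a model grows exponentially, because the number of possible instances grows exponentially; a fixed-size training set covers a shrinking fraction of the input space. Similarity-based reasoning also breaks down: in high dimensions the distances between examples become nearly equal, so a point's nearest neighbors are barely closer than any other point, while each distance computation becomes more expensive. As a result, algorithms generalize (stay consistent between training and test data) much better in low dimensions than in high dimensions. This phenomenon is known as the "curse of dimensionality".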
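The "all examples look alike" effect can be observed with a quick simulation. This sketch assumes points drawn uniformly from the unit hypercube and Euclidean distance; `distance_contrast` is a hypothetical helper measuring how much farther the farthest point is than the nearest one, relative to the nearest.

```python
import math
import random

def distance_contrast(dim: int, n_points: int = 200, seed: int = 0) -> float:
    """(farthest - nearest) / nearest distance from a random query to n random
    points in the unit hypercube. Small values mean 'everything looks alike'."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)]) for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)

for d in (2, 10, 100, 1000):
    print(d, round(distance_contrast(d), 2))
```

The contrast should fall sharply as `d` grows, which is why nearest-neighbor style similarity becomes less informative in high dimensions even as each `math.dist` call costs more.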
16. What is meant by "blessing of non-uniformity"?
This refers to the fact that observations from real-world domains are often not distributed uniformly, but are grouped or clustered in useful and meaningful ways.
17. What has been one of the major developments in recent decades regarding the results of induction?
One of the major developments is that we can have guarantees on the results of induction, particularly if we're willing to settle for probabilistic guarantees.
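A standard example of such a probabilistic guarantee is the bound for a finite hypothesis space H: if the learner returns a hypothesis consistent with n independently drawn training examples, then with probability at least 1 - δ its true error is below ε once n ≥ (1/ε)(ln|H| + ln(1/δ)). The sketch below just evaluates that bound; the function name and the chosen ε and δ are hypothetical.

```python
import math

def examples_needed(log_h: float, epsilon: float, delta: float) -> int:
    """Smallest integer n with n >= (ln|H| + ln(1/delta)) / epsilon.

    log_h is ln|H|, passed as a logarithm so huge hypothesis spaces
    (e.g. all Boolean functions of 10 variables, |H| = 2**1024) do not overflow.
    """
    return math.ceil((log_h + math.log(1 / delta)) / epsilon)

# All Boolean functions of 10 variables: |H| = 2**(2**10), so ln|H| = 1024 * ln 2.
print(examples_needed(1024 * math.log(2), epsilon=0.1, delta=0.05))  # 7128
```

The required n grows only logarithmically with |H|, but bounds like this are usually very loose, so they say more about what is possible than about how much data a real project needs.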
18. What is the most important factor that determines whether a machine learning project succeeds?
The most important factor is the features used. If we have many independent features that each correlate well with the class, learning is easy. On the other hand, if the class is a very complex function of the features, we may not be able to learn it.
19. In an ML project, which is more time-consuming: feature engineering or the actual learning process? Explain how ML is an iterative process.
- Feature engineering is the more time-consuming part of machine learning, since it involves many tasks such as gathering, cleaning, and pre-processing data.
- In ML we carry out certain tasks iteratively: running the learner, analyzing the results, and modifying the data and/or the learner. Hence it is an iterative process.
20. What, according to the author, is one of the holy grails of ML?
According to the author, automating the feature engineering process is one of the holy grails of machine learning. One way to do it is to generate a large number of candidate features and select the best ones by their information gain with respect to the class. However, this has limitations: features that look irrelevant in isolation may be relevant in combination.
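The limitation can be seen with the classic XOR case. In this sketch (our own toy example), the class is x1 XOR x2: scored one at a time by information gain, each feature looks useless, but the pair determines the class completely. The helper names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def info_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    groups = defaultdict(list)
    for v, y in zip(feature_values, labels):
        groups[v].append(y)
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())

# Class is x1 XOR x2: each feature alone says nothing about the class.
x1 = [0, 0, 1, 1]
x2 = [0, 1, 0, 1]
y = [a ^ b for a, b in zip(x1, x2)]

print(info_gain(x1, y))                 # 0.0
print(info_gain(x2, y))                 # 0.0
print(info_gain(list(zip(x1, x2)), y))  # 1.0 (the pair determines the class)
```

A selector that ranks candidate features one at a time would discard both x1 and x2 here.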
21. If your ML solution is not performing well, what are two things that you can do? Which one is a better option?
When an ML solution does not perform well, we have two main choices:
- Design a better learning algorithm
- Gather more data
Gathering more data is usually the better option, because a dumb algorithm with lots and lots of data beats a clever one with a modest amount of data.
22. What are the 3 limited resources in ML computations? What is the bottleneck today? What is one of the solutions?
The 3 limited resources in ML computations are:
- Time
- Memory
- Training data
The bottleneck has changed from decade to decade, and today it is time. With more data available, it takes very long to process it and learn complex classifiers. One solution is to come up with faster ways to learn complex classifiers.
23. A surprising fact mentioned by the author is that all representations (types of learners) essentially "all do the same". Can you explain? Which learners should you try first?
All learners essentially work by grouping nearby examples into the same class; the key difference is in the meaning of "nearby". With non-uniformly distributed data, learners can produce widely different frontiers while still making the same predictions in the regions that matter.
It is better to try the simplest learners first (for example, naive Bayes before logistic regression, or k-nearest neighbor before support vector machines). More complex learners are usually harder to use, because they have more knobs you need to turn to get good results, and because their internals are more opaque.
24. The author divides learners into two types based on their representation size. Write a brief summary.
According to the author, there are two types of learners based on representation size:
1) Learners with a fixed representation size (for example, linear classifiers)
2) Learners whose representation size grows with the data (for example, decision trees)
Fixed-size learners can only take advantage of so much data. Variable-size learners can in principle learn any function given sufficient data, but in practice they may not, because of limitations of the algorithm, computational cost, or the curse of dimensionality. For these reasons, clever algorithms (those that make the most of the data and computing resources available) often pay off in the end.
25. Is it better to have variations of a single model or a combination of different models, known as an ensemble or stacking? Explain briefly.
Researchers noticed that if, instead of selecting the best variation found, we combine many variations, the results are often much better, at little extra effort for the user. In bagging, the simplest ensemble technique, we generate random variations of the training set by resampling, learn a classifier on each, and combine the results by voting. This works because it greatly reduces variance while only slightly increasing bias.
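Bagging as described above can be sketched in a few lines. This is a minimal illustration under our own assumptions: the base learner is a one-threshold decision stump, 25 bootstrap resamples are used (an odd number, so binary votes cannot tie), and the noisy training data is invented. `fit_stump` and `bagging` are hypothetical names, not a library API.

```python
import random
from collections import Counter

def fit_stump(data):
    """Weak base learner: threshold t minimizing training errors of 'x > t'."""
    xs = sorted({x for x, _ in data})
    candidates = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    t = min(candidates, key=lambda c: sum((x > c) != y for x, y in data))
    return lambda x: x > t

def bagging(data, base_learner, n_models=25, seed=0):
    """Learn one model per bootstrap resample of the training set and
    combine them by majority vote."""
    rng = random.Random(seed)
    models = [base_learner(rng.choices(data, k=len(data))) for _ in range(n_models)]
    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]
    return predict

# Hypothetical noisy data: the true rule is x >= 10, with two labels flipped.
data = [(x, (x >= 10) != (x in (3, 16))) for x in range(20)]
ensemble = bagging(data, fit_stump)
print([ensemble(x) for x in (0, 5, 14, 19)])
```

Each resample contains a different mix of the noisy points, so individual stumps vary; the vote averages much of that variation away.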
26. Read the last paragraph and explain why it makes sense to prefer simpler algorithms and hypotheses.
If complexity is measured by the size of the hypothesis space, smaller spaces allow hypotheses to be represented by shorter codes. The size of the space alone does not determine overfitting, though: a learner with a larger hypothesis space that tries fewer hypotheses from it is less likely to overfit than one that tries more hypotheses from a smaller space. So it makes sense to prefer simpler algorithms and hypotheses: the more assumptions an explanation needs, the more unlikely it is.
27. It has been established that correlation between independent variables and predicted variables does not imply causation, yet correlation is still used by many researchers. Explain briefly the reason.
In a prediction study, the goal is to develop a formula for making predictions about the dependent variable based on the observed values of the independent variables; correlation is all that is needed for that. In a causal analysis, the independent variables are regarded as causes of the dependent variable. Many learning algorithms can potentially extract causal information from observational data, but their applicability is rather restricted. To establish causation, you generally need experimental data, not observational data. Correlation is a necessary but not sufficient condition for causation. Correlation is a valuable type of scientific evidence in fields such as medicine, psychology, and sociology, but correlations must first be confirmed as real, and then every possible causative relationship must be systematically explored. In the end, correlation can be used as powerful evidence for a cause-and-effect relationship between a treatment and a benefit, a risk factor and a disease, or a social or economic factor and various outcomes.

PSYCHIATRIC History collection FORMAT.pptxPoojaSen20
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersChitralekhaTherkar
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 

Recently uploaded (20)

Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
PSYCHIATRIC History collection FORMAT.pptx
PSYCHIATRIC   History collection FORMAT.pptxPSYCHIATRIC   History collection FORMAT.pptx
PSYCHIATRIC History collection FORMAT.pptx
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of Powders
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 

Machine learning (Domingos' paper)

Questions from paper "A Few Useful Things to Know about Machine Learning"
Reference: http://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf
By: Akhilesh Joshi
mail: akhileshjoshi123@gmail.com
1. What is the definition of ML?
Machine learning is the art of using existing data (historical and present) to automatically learn models that forecast or predict outcomes, using statistical models and with little or no manual intervention. Although machine learning techniques are still being developed, ML is one of the most important concepts in data science, with many applications that are helpful to people.

2. What is a classifier?
A classifier is a system that takes as input a vector of feature values (which may be discrete or continuous) and outputs a single discrete value, the class. The labeled examples that we give the learner in order to build the classifier are called training data. The main aim is to learn, from the training data, a classifier that correctly classifies new test examples it has not seen before.

3. What are the 3 components of a learning system, according to the author? Explain them briefly.
The author describes three components of a learning system:
a. Representation
A classifier must be represented in some formal language that the computer can handle, and choosing this representation fixes the set of classifiers (the hypothesis space) that the learner can possibly learn. For example, a decision tree might suit one data set well, whereas a neural network might be better suited to another.
b. Evaluation
An evaluation function (also called an objective or scoring function) distinguishes good classifiers from bad ones. The right evaluation function depends on the task: for example, when evaluating classifiers that predict whether a student will get a job, we might use likelihood rather than precision and recall.
c. Optimization
Optimization is the method used to search among the candidate classifiers (hypotheses) for the one that scores highest on the evaluation function. That best-scoring hypothesis is then the one we use to make predictions.

4. What is information gain?
Information gain measures how much splitting a set of examples on an attribute reduces its entropy. Given the available attributes, we choose the one with maximum information gain: we compute the weighted average entropy of the subsets produced by the split and subtract it from the entropy of the original set. This is how a decision tree is built. The attribute with the highest information gain becomes the root node, and each child node is then split recursively by comparing the information gain of the remaining attributes on the examples that reach that node. As a result, the most informative attributes tend to appear near the root of the tree.
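A minimal Python sketch of this split-selection step, computing the information gain formula given for this question. It assumes each example is a dict mapping attribute names to discrete values, with the class labels kept in a parallel list; the function names and the tiny four-example data set are hypothetical, for illustration only. Real decision-tree learners (e.g. ID3) add recursion and stopping rules on top of this.

```python
import math
from collections import Counter


def entropy(labels):
    """H(S) = -sum_c p_c * log2(p_c) over the class proportions in `labels`."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(examples, labels, attribute):
    """IG(A) = H(S) - sum_v |S_v|/|S| * H(S_v) for a single attribute A."""
    n = len(labels)
    # Group the labels by the value the attribute takes (these are the S_v).
    subsets = {}
    for example, label in zip(examples, labels):
        subsets.setdefault(example[attribute], []).append(label)
    weighted = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted


def best_attribute(examples, labels, attributes):
    """Greedy choice made at each tree node: the attribute with maximum gain."""
    return max(attributes, key=lambda a: information_gain(examples, labels, a))


# Hypothetical toy data: "outlook" separates the classes perfectly, "windy" not at all.
examples = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
]
labels = ["no", "no", "yes", "yes"]

print(information_gain(examples, labels, "outlook"))  # 1.0
print(information_gain(examples, labels, "windy"))    # 0.0
print(best_attribute(examples, labels, ["outlook", "windy"]))  # outlook
```

Splitting on "outlook" leaves two pure subsets (entropy 0), so the gain equals the full 1 bit of the original set; splitting on "windy" leaves each subset as mixed as before, so the gain is 0.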
Formula for information gain:
$$IG(A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} \, H(S_v), \qquad H(S) = -\sum_{c} p_c \log_2 p_c$$
where IG(A) is the information gain of attribute A, H(S) is the entropy of all examples in S (p_c being the fraction of examples in S with class c), and H(S_v) is the entropy of the subset S_v of examples for which A takes the value v, obtained after partitioning S.

5. Why is generalization more important than just getting a good result on training data, i.e. the data that was used to train the classifier?
Training data gives us insight into what our data looks like, but an algorithm that works well on that particular set of data is not guaranteed to work correctly on the test data. The test data may be quite different from the training data, and the output may then not be as desired. No matter how much data we have, it is very unlikely that we will see exactly the same examples again at test time, so we need an algorithm that works on both the training data and new test data. Hence the concept of generalization.

6. What is cross-validation? What are its advantages?
Given training data S and a hypothesis class H (which contains all the possible hypotheses), we have to find the hypothesis h that is correct for our data. Cross-validation helps us estimate how well a candidate h will generalize while making the most of the available data: the training data is randomly divided into k subsets (folds), each fold in turn is held out while a classifier is trained on the remaining k-1 folds and tested on the held-out fold, and the k results are averaged.
Advantages of cross-validation:
- Every example is used for both training and testing, which gives a clearer picture of how the algorithm will perform on data it has not seen.
- In each round, part of the training data is set aside as test data, which lets us test how well the algorithm works.
- Because the test folds come from the training data itself, we do not need to gather a separate test set.
[Figure: illustration of cross-validation]
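The k-fold procedure described in question 6 can be sketched in a few lines of Python. This is a minimal sketch, not a library API: it assumes the data fit in two parallel lists, that k is no larger than the number of examples (so no fold is empty), and that a learner is any function `train(train_xs, train_ys)` returning a `predict(x)` function. The majority-class learner and all function names are hypothetical.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of nearly equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(xs, ys, train, k=5, seed=0):
    """Return the mean held-out accuracy over k folds.

    `train(train_xs, train_ys)` is any learner that returns a predict(x) function.
    """
    scores = []
    for fold in k_fold_indices(len(xs), k, seed):
        held_out = set(fold)
        # Train on the other k-1 folds only.
        train_xs = [x for i, x in enumerate(xs) if i not in held_out]
        train_ys = [y for i, y in enumerate(ys) if i not in held_out]
        predict = train(train_xs, train_ys)
        # Test on the held-out fold, which the learner never saw.
        correct = sum(predict(xs[i]) == ys[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)


def train_majority(train_xs, train_ys):
    """Hypothetical baseline learner: always predict the most common training label."""
    majority = Counter(train_ys).most_common(1)[0][0]
    return lambda x: majority


# Usage: 10 examples, 7 labeled 1. Every training split keeps label 1 in the
# majority, so each fold's accuracy is its share of label-1 examples.
xs = list(range(10))
ys = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
print(cross_validate(xs, ys, train_majority, k=5))  # prints 0.7
```

Any other learner with the same `train` signature can be plugged in, which is how cross-validation is typically used to compare candidate hypotheses or parameter settings.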
7. How is generalization different from other optimization problems?
In most optimization problems, the function being optimized is known. In generalization, we do not have access to the function we actually want to optimize (the error on unseen test data), so we have to use the errors and findings from our training data as a surrogate and infer from them how the model will behave on test data. Since ordinary optimization deals with situations where most things are already known, we can expect the desired outputs, which is not the case for generalization problems.

8. If you have a scenario where the function involves 10 Boolean variables, how many possible examples (called the instance space) can there be? If you see 100 examples, what percentage of the instance space have you seen?
The number of possible instances is 2^N, where N is the number of Boolean variables. With N = 10 there are 2^10 = 1024 instances. Having seen only 100 examples, we have seen 100/1024 ≈ 9.77% of the instance space.

9. What is the "no free lunch" theorem in machine learning? You can do a Google search if the paper isn't clear enough.
The "no free lunch" theorem states that no learning algorithm is inherently superior to all other algorithms. If an algorithm performs well on a particular class of problems, it must perform worse on some other class; performance is traded off across problems. Averaged over all possible target functions, the difference in expected error (on examples outside the training set) between any two algorithms is zero. In the author's words, no learner can beat random guessing over all possible functions to be learned.

10. What general assumptions allow us to carry out the machine learning process? What is the meaning of induction?
Very general assumptions, such as smoothness, similar examples having similar classes, limited dependences, or limited complexity, are often enough for learning to work very well in practice. Induction is the process of turning a small amount of available knowledge into a large amount of knowledge; the author describes it as a knowledge lever.

11. How is learning like farming?
Farming depends on nature: with its help, farmers combine seeds with nutrients to grow crops. In a similar manner, a learner combines knowledge (logic) with data to grow programs.

12. What is overfitting? How does it lead to a wrong idea that you have done a really good job on the training dataset?
Overfitting occurs when a model fits the training data too closely, learning not only its real regularities but also its noise and errors. When the model is then applied to new data, the results are not as expected and the model may not work well on the test data. Because the model may reach very high (even 100%) accuracy on the training data while doing much worse on the test data, its training performance gives the false impression that we have done a good job. Overfitting hurts the model's ability to generalize, because it is highly unlikely that the test data will be the same as the training data.
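A small deterministic illustration of question 12, under hypothetical assumptions: the true concept is "class 1 if x >= 50", every tenth training label is flipped to simulate noise, and the test points are new, noise-free points. A 1-nearest-neighbor learner, which simply memorizes the training set, is compared with the true rule itself. All names and data here are illustrative, not from the paper.

```python
def true_label(x):
    """Hypothetical target concept: class 1 if x >= 50, else class 0."""
    return 1 if x >= 50 else 0


# Training set: even x in [0, 98]; every 10th label is flipped to simulate noise.
train_xs = list(range(0, 100, 2))
train_ys = [1 - true_label(x) if x % 10 == 0 else true_label(x) for x in train_xs]

# Test set: new points (x + 0.4) labeled by the true, noise-free concept.
test_xs = [x + 0.4 for x in train_xs]
test_ys = [true_label(x) for x in test_xs]


def one_nearest_neighbor(x):
    """Memorizing learner: copy the label of the closest training point."""
    nearest = min(range(len(train_xs)), key=lambda i: abs(train_xs[i] - x))
    return train_ys[nearest]


def accuracy(predict, xs, ys):
    """Fraction of examples whose prediction matches the given label."""
    return sum(predict(x) == y for x, y in zip(xs, ys)) / len(xs)


for name, model in [("1-NN (memorizer)", one_nearest_neighbor), ("true rule", true_label)]:
    print(f"{name}: train={accuracy(model, train_xs, train_ys):.2f} "
          f"test={accuracy(model, test_xs, test_ys):.2f}")
# 1-NN (memorizer): train=1.00 test=0.80
# true rule: train=0.80 test=1.00
```

The memorizer looks perfect on the training data because it has learned the noise along with the signal, yet it is the worse model on new data; the true rule "loses" on the noisy training labels but generalizes perfectly.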
13. What is meant by bias and variance? You don't have to be really precise in defining them, just get the idea.
- Bias: the learner's tendency to consistently learn the same wrong thing, caused by erroneous or overly strong assumptions in the learning algorithm. Roughly, more (stronger) assumptions lead to higher bias, and fewer assumptions lead to lower bias.
- Variance: the tendency to learn random things irrespective of the real signal, i.e. how much the learned model changes when it is trained on different training data.

14. What are some of the things that can help combat overfitting?
The following techniques may help in combating overfitting:
- cross-validation
- adding a regularization term to the evaluation function
- performing a statistical significance test, such as chi-square, before adding new structure

15. Why do algorithms that work well in lower dimensions fail at higher dimensions? Think about the number of instances possible in higher dimensions and the cost of similarity calculation.
As the number of dimensions increases, the amount of data needed to train a model (in this case, an algorithm) grows exponentially, because the number of possible instances grows exponentially and a training set of fixed size covers an ever smaller fraction of them. Similarity-based reasoning also breaks down: in high dimensions, the distances between examples become very close to one another, so an example's nearest neighbors are not much nearer than the rest. Algorithms can therefore generalize (keep training and test performance in step) much better in lower dimensions than in higher ones. This phenomenon is known as the "curse of dimensionality".

16. What is meant by "blessing of non-uniformity"?
This refers to the fact that observations from real-world domains are often not distributed uniformly, but are grouped or clustered in useful and meaningful ways (for example, concentrated on or near a lower-dimensional manifold), which partly counteracts the curse of dimensionality.

17. What has been one of the major developments in recent decades regarding the results of induction?
One of the major developments is that we can have guarantees on the results of induction, particularly if we are willing to settle for probabilistic guarantees.

18. What is the most important factor that determines whether a machine learning project succeeds?
The most important factor is the features used. If we have many independent features that each correlate well with the class, learning is easy. On the other hand, if the class is a very complex function of the features, we may not be able to learn it.

19. In an ML project, which is more time consuming: feature engineering or the actual learning process? Explain how ML is an iterative process.
- Feature engineering is the more time-consuming part of machine learning, since it involves many tasks such as gathering data, cleaning it and preprocessing it.
- In ML we have to carry out certain tasks iteratively, such as running the learner, analyzing the results, and modifying the data and/or the learner. Hence it is an iterative process.
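To make the similarity part of question 15 concrete, here is a small simulation. It is a sketch under stated assumptions: points are drawn uniformly at random from the unit hypercube, similarity is Euclidean distance, and the function name is hypothetical. It measures the ratio between the nearest and the farthest distance from a random query point; as the dimension grows this ratio moves toward 1, meaning that the "nearest" neighbor is barely nearer than any other point.

```python
import math
import random


def nearest_to_farthest_ratio(dim, n_points=100, seed=0):
    """Ratio of the nearest to the farthest Euclidean distance from a random
    query point to n_points random points in the unit hypercube [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # math.dist needs Python 3.8+
    return min(dists) / max(dists)


for dim in (2, 10, 100, 1000):
    # Each call costs O(n_points * dim): similarity itself gets more expensive too.
    print(dim, round(nearest_to_farthest_ratio(dim), 3))
```

The exact values depend on the random seed, but the trend is what matters: in 2 dimensions the nearest point is many times closer than the farthest, while in 1000 dimensions all points sit at nearly the same distance, which is why nearest-neighbor style similarity becomes unreliable.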
20. What, according to the author, is one of the holy grails of ML?
According to the author, automating more and more of the feature engineering process is one of the holy grails of ML. One way to do this is to generate a large number of candidate features and select the best ones by their information gain with respect to the class. However, this has limitations: features that look irrelevant in isolation may be relevant in combination.

21. If your ML solution is not performing well, what are two things that you can do? Which one is a better option?
When an ML solution does not perform well, we have two main choices:
- design a better learning algorithm, or
- gather more data.
It is usually better to collect more data, because a dumb algorithm with lots and lots of data beats a clever algorithm with a modest amount of data.

22. What are the 3 limited resources in ML computations? What is the bottleneck today? What is one of the solutions?
The three limited resources in ML computations are:
- time
- memory
- training data
The bottleneck has changed from decade to decade, and today it is time. There is often more data than there is time to process it, and learning complex classifiers on it takes very long. Part of the solution is to come up with faster ways to learn complex classifiers.

23. A surprising fact mentioned by the author is that all representations (types of learners) essentially "all do the same". Can you explain? Which learners should you try first?
All learners work by grouping nearby examples into the same class; the key difference is in the meaning of "nearby". With non-uniformly distributed data, learners can produce widely different frontiers while still making the same predictions in the regions that matter.
It is better to try the simplest learners first (for example, naive Bayes before logistic regression, and k-nearest neighbor before support vector machines). Complex learners are usually harder to use, because they have more knobs you need to turn to get good results, and because their internals are more opaque.

24. The author divides learners into two types based on their representation size. Write a brief summary.
According to the author, there are two types of learners based on representation size:
1) learners with a fixed representation size (e.g. linear classifiers)
2) learners whose representation size grows with the data (e.g. decision trees)
Fixed-size learners can only take advantage of so much data. Variable-size learners can in principle learn any function given sufficient data, but in practice they may not, because of limitations of the algorithm, computational cost, or the curse of dimensionality. For these reasons, clever algorithms, those that make the most of the data and computing resources available, often pay off in the end.

25. Is it better to have variations of a single model or a combination of different models, known as an ensemble or stacking? Explain briefly.
Researchers noticed that if, instead of selecting the best variation found, we combine many variations, the results are often much better, and at little extra effort for the user. In bagging, one of the simplest ensemble techniques, we generate random variations of the training set by resampling, learn a classifier on each, and combine the results by voting. This works because it greatly reduces variance while only slightly increasing bias.

26. Read the last paragraph and explain why it makes sense to prefer simpler algorithms and hypotheses.
If complexity is measured by the size of the hypothesis space, the preference for simplicity follows from the fact that smaller hypothesis spaces allow hypotheses to be represented by shorter codes. Note, however, that a learner with a larger hypothesis space that tries fewer hypotheses from it is less likely to overfit than one that tries more hypotheses from a smaller space, so what matters is how many hypotheses are tried, not just the size of the space. It makes sense to prefer simpler algorithms and hypotheses because the more assumptions an explanation needs, the more unlikely it is; the author stresses that simplicity is a virtue in its own right, not something that by itself guarantees higher accuracy.

27. It has been established that correlation between independent variables and predicted variables does not imply causation, yet correlation is still used by many researchers. Explain briefly the reason.
In a prediction study, the goal is to develop a formula for making predictions about the dependent variable based on the observed values of the independent variables, and for this purpose correlation is enough. In a causal analysis, the independent variables are regarded as causes of the dependent variable. Many learning algorithms can potentially extract causal information from observational data, but their applicability is rather restricted.
To establish causation, you generally need experimental data, not observational data. Correlation is a sign of a potential causal connection, but not proof of one. Correlation is a valuable type of scientific evidence in fields such as medicine, psychology, and sociology, but correlations must first be confirmed as real, and then every possible causal relationship must be systematically explored. In the end, correlation can be used as powerful evidence for a cause-and-effect relationship between a treatment and a benefit, a risk factor and a disease, or a social or economic factor and various outcomes.
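The bagging procedure described in question 25 (resample, learn one classifier per resample, vote) can be sketched as follows. This is a minimal illustration under hypothetical assumptions: one-dimensional numeric inputs, binary 0/1 labels, and a decision stump (a single threshold) as the base learner. Stacking, which instead trains a higher-level learner on the base learners' outputs, is not shown, and all function names are illustrative.

```python
import random
from collections import Counter


def train_stump(xs, ys):
    """Fit a 1-D threshold classifier (predict 1 if x >= t, else 0), choosing the
    candidate threshold t among the training x values with the fewest errors."""
    best_t, best_errors = None, None
    for t in sorted(set(xs)):
        errors = sum((1 if x >= t else 0) != y for x, y in zip(xs, ys))
        if best_errors is None or errors < best_errors:
            best_t, best_errors = t, errors
    return lambda x: 1 if x >= best_t else 0


def bagging(xs, ys, n_models=25, seed=0):
    """Train n_models stumps on bootstrap resamples and combine them by majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        # Bootstrap resample: draw n examples with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        # An odd n_models avoids ties between the two classes.
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Usage on hypothetical data: class 1 for x >= 10.
xs = list(range(20))
ys = [1 if x >= 10 else 0 for x in xs]
ensemble = bagging(xs, ys)
print([ensemble(x) for x in (0, 5, 15, 19)])
```

Each stump sees a slightly different resample and so places its threshold slightly differently; voting averages out these differences, which is the variance reduction the answer refers to.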