That Conference 2017 - Killing a Fly with a Shotgun: Metacognition and the Art of Problem Solving
Killing a Fly with a Shotgun
Metacognition and the Art of Problem Solving
Thank you sponsors!
o That Conference sent me an e-mail requesting that I do this.
o Also, I’m genuinely thankful. This is an impressive event.
o BA Mathematics, The Ohio State University, 2007
o MAT Teaching Leadership and Curriculum Studies, Kent State University,
o PhD Applied Mathematics, Case Western Reserve University, 2014
o Currently, Data Scientist at Nexosis
o Previously, various flavors of nerd.
METACOGNITION IS TOTALLY NOT MADE UP
o Metacognition is simply thinking about the way you think
o Why does it matter?
o Problem Solving is a creative endeavor
Themes that I have noticed
o Knobbiness – Creative ideas are the result of twisting knobs on an idea
o Local Triviality – Complicated ideas are just a series of simple ideas
o Cognitive Resolution
o Like visual resolution, but with ideas
o Explain things as simply as possible, but no simpler
Applying these ideas to regression
Linear regression – also known as finding a line of “best” fit
$y_i \approx \alpha_1 x_i + \alpha_0$
where y is the target value, x is the input, α are the model parameters.
How do we find such a line?
How do we find this?
Find the $\alpha_i$ that minimize the following sum:
$\sum_i r_i^2$
where $r_i = y_i - \alpha_1 x_i - \alpha_0$. This is called minimizing the residuals.
We do this by creating a system of equations called the “Normal equations.”
What are the Normal equations? The subject of another talk entirely.
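As a rough illustration of the fit described above, here is a minimal sketch that builds and solves the normal equations with NumPy. The function name `fit_line` and the toy data are assumptions made for this example, not something from the talk.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares line through (x, y) via the normal equations.

    Solves (A^T A) a = A^T y, where A has a column of ones (intercept)
    and a column holding the inputs x (slope).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones_like(x), x])
    alpha0, alpha1 = np.linalg.solve(A.T @ A, A.T @ y)
    return alpha0, alpha1

# Toy data lying exactly on y = 2x + 1 (made up for illustration).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0
alpha0, alpha1 = fit_line(x, y)
print(alpha0, alpha1)
```

For badly conditioned data, `np.linalg.lstsq` is usually a safer choice than forming `A.T @ A` explicitly. The explicit form is used here only because it mirrors the slide.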
What about non-Linear regression?
o Polynomial regression
o $r_i = y_i - \alpha_2 x_1^2 - \alpha_1 x_1$
o Polynomial multiple-regression or multinomial
o $r_i = y_i - \alpha_1 x_2^2 - \alpha_2 x_1^2 - \alpha_3 x_2 - \alpha_4 x_1$
A subtle twist of the knob allows us to create all sorts of “new” methods. They’re all
solved the same way!
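To make "solved the same way" concrete, here is a hedged sketch of polynomial regression. The only change from a straight-line fit is the set of columns in the design matrix, and the normal equations are solved exactly as before. The name `fit_polynomial` and the sample data are assumptions for this example.

```python
import numpy as np

def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit via the normal equations.

    The design matrix has columns 1, x, x^2, ..., x^degree, so the
    returned coefficients are ordered from lowest to highest power.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.vander(x, degree + 1, increasing=True)
    return np.linalg.solve(A.T @ A, A.T @ y)

# Toy data from y = 3x^2 - x + 0.5 (made up for illustration).
x = np.linspace(-2.0, 2.0, 9)
y = 3.0 * x**2 - x + 0.5
coeffs = fit_polynomial(x, y, degree=2)
print(coeffs)
```

The model is still linear in the parameters, which is why the same linear solve works even though the curve is not a line.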
Variable Selection – How much is too much?
o How do you know which is relevant?
o You could manually test all combinations - not wise
o Physical intuition – doesn’t always apply, but it is extremely useful when it does
o Past experience – food service employees know their regulars
o Is there an algorithmic approach?
o Should you preserve the contributions of all features?
o What happens if you decide to throw some of them out?
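A quick count shows why testing every combination of features does not scale. The sketch below enumerates all non-empty subsets of a small, hypothetical feature list, then computes how many subsets 30 features would produce. The feature names are placeholders.

```python
from itertools import combinations

def all_feature_subsets(features):
    """Yield every non-empty subset of the given features."""
    for k in range(1, len(features) + 1):
        yield from combinations(features, k)

# Hypothetical feature names, for illustration only.
features = ["x1", "x2", "x3", "x4"]
subsets = list(all_feature_subsets(features))
print(len(subsets))  # 2**4 - 1 candidate models

# With 30 features, the count of candidate models is already over a billion.
n_subsets_for_30 = 2**30 - 1
print(n_subsets_for_30)
```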
$\arg\min_{\alpha} \sum_i r_i^2 + \lambda \sum_j \alpha_j^2$
Ridge regression is just regular regression with an additional term.
Ridge regression forces the parameters to be small.
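As a sketch of the "additional term" idea, ridge regression can still be solved with a modified set of normal equations, $(X^T X + \lambda I)\alpha = X^T y$. The code assumes centered data with no intercept column, so every parameter is penalized. The name `fit_ridge`, the random data, and the value of `lam` are all assumptions for illustration.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Ridge regression: solve (X^T X + lam * I) alpha = X^T y.

    Assumes there is no intercept column, so all parameters are shrunk.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

# Synthetic data, made up for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + 0.1 * rng.normal(size=50)

alpha_ols = fit_ridge(X, y, lam=0.0)     # lam = 0 recovers ordinary least squares
alpha_ridge = fit_ridge(X, y, lam=10.0)  # lam > 0 shrinks the parameters
print(np.linalg.norm(alpha_ols), np.linalg.norm(alpha_ridge))
```

Comparing the two norms shows the shrinkage: a larger `lam` pulls the parameter vector toward zero.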
$\arg\min_{\alpha} \sum_i r_i^2 + \lambda \sum_j |\alpha_j|$
LASSO is just ridge regression, but with a different penalty term.
Unlike previous small changes, this one leads to a significant difference in how
the solution is computed. Namely, the normal equations no longer apply.
Will often force a “sparse” set of parameters.
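Because the absolute-value penalty is not differentiable at zero, LASSO is usually solved iteratively. The sketch below uses coordinate descent with soft-thresholding, one common approach and not necessarily the one the speaker had in mind. The names `soft_threshold` and `fit_lasso`, the synthetic data, and the value of `lam` are assumptions for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Shrink z toward zero by t, returning exactly zero when |z| <= t."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def fit_lasso(X, y, lam, n_iters=200):
    """Minimize sum(r_i^2) + lam * sum(|alpha_j|) by coordinate descent."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    alpha = np.zeros(X.shape[1])
    col_sq = (X**2).sum(axis=0)  # x_j^T x_j for each column
    for _ in range(n_iters):
        for j in range(X.shape[1]):
            # Residual with feature j's current contribution added back in.
            partial = y - X @ alpha + X[:, j] * alpha[j]
            rho = X[:, j] @ partial
            alpha[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    return alpha

# Synthetic data: only the first two of five features matter (made up).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=200)

alpha_lasso = fit_lasso(X, y, lam=40.0)
print(alpha_lasso)
```

In this setup the parameters for the three irrelevant features come out as exact zeros. That is the "sparse" behavior, and ridge regression generally does not produce it.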
o Machine Learning is nuanced.
o Most methods are variations on a theme.
o X is just Y but with Z changed.
o Explain things as simply as possible, but no simpler.
o Complex problems are a series of simple problems strung together.
o Think about how you think. It will make you a better problem solver.