Data Science versus Jungle Cats

Data Science vs. Jungle Cats
A Paradigm For Data Science in Fundamental Investing
[Image: jungle cat] vs. y = m_1 x_1 + ... + m_n x_n + b
By Ashlee Bennett

Data Science is a combination of:
- Computer Science/programming
- Math & Statistics
- Domain expertise
Data Science requires domain expertise (or so a Google image search suggests)
This is the hardest to find and often the most important!

Domain expertise is crucial in nearly every step of the data science process
Q: What data would answer what question?
Ex: Sales v. Units; B&M v. Online; Doors v. Users
Q: What transformations or interpolations are contextually appropriate?
Ex: Quarterly v. Monthly; Outliers: drop or keep; Nulls: drop or fill with #
Q: What performance metric is aligned with business objectives?
Ex: Precision v. Recall; Correlation v. Contrast; Ranking v. Grouping
Q: What model is optimal or practical for the business framework?
Ex: Black v. Clear Box; Speed v. Accuracy; Descriptive v. Predictive
Q: What assumptions can be made given fundamental knowledge?
Ex: Customer base? Management claims? Business initiatives?
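To make one of these trade-offs concrete, here is a minimal sketch of why "Precision v. Recall" can matter more than raw accuracy when the event of interest is rare. The labels are made up for illustration (1 = a bad investment, 0 = a fine one), and precision_recall is a hypothetical helper, not a library function.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Illustrative data: 95 fine investments, 5 disastrous ones.
y_true = [0] * 95 + [1] * 5
# A lazy "model" that never raises an alarm.
always_fine = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, always_fine)) / len(y_true)
precision, recall = precision_recall(y_true, always_fine)
print(accuracy, precision, recall)
```

The lazy model is right 95% of the time, yet it catches none of the five disasters. Which metric to optimize is a business call, and that is where the domain expert comes in.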

Data Collection -> Cleaning & Transformation -> Performance Metric Selection -> Model Evaluation -> Analytic Interpretation
Data scientists can answer some of the questions that arise during the pipeline with
common sense or research, but the process is often faster, and the outcome better,
when the expertise of the business end user is incorporated from the get-go and/or
used directly to refine the process.
Business end users can be a key source of domain expertise

Without domain expertise, irrelevant data could be misleadingly transformed,
deceptively interpolated, and evaluated via an irrelevant performance metric with
inappropriate models, only to reach a meaningless conclusion
Based upon the log-transformed, tree-rat
interpolated mean-square error rate of
sea slug population decline predicted in
this naive random support vector forest
gradient boost ensemble regressor, the
price of tea in China should rise two cents
over the next decade...
Zzz...
WTF?

So what does this have to do with jungle cats?
A jungle cat, or a jungle cat attack, is a metaphor for a rare,
yet critical event. We want to quickly identify and avoid jungle
cats, just like we want to identify and avoid disastrous
investment choices, especially when these choices are few
and carry a big impact.
While fundamental & PE investors have to worry about "jungle cats",
quants just worry about mosquitoes. Mosquito bites suck & happen
frequently, but they don't kill you because each one is small compared to
your total surface area of skin... unless you get many over a
short time period, or they act as vectors spreading a disease.

You can handle a few mosquito bites, but probably not a few jungle cat
attacks.
Quants placing many [smaller] bets can afford to gloss over, or leave out, domain
expertise in favor of speed and diversity, because a slightly less accurate or "underfit" model with
a few hiccups won't tank their portfolio; the misses are just metaphorical mosquito bites.
Fundamental and Private Equity investors place fewer [larger] bets, so they have more to gain,
but also more to lose. Incorporating domain expertise can provide the mission-critical edge that
both identifies a good investment and avoids a disastrous one: a metaphorical jungle cat.

So where am I going with this?
Imagine a primitive villager who walks out of their
hut one day only to see a fierce jungle cat...

Even if they've only been attacked by a jungle cat
once, or maybe only ever heard about one, they'll
probably instantly know to turn around and GTFO

But would an algorithm know to turn and run to
safety??

This is no mosquito bite, the stakes are high...

So would an algorithm know to turn and run to safety??
Eventually...

An algorithm would eventually know to turn and run to safety...
But our villager might have to be mauled to death 2000 times first.
That's a lot of dead villagers

Algorithms and models are only as good as the data that you feed them. Too little
data or poor quality data will produce a suboptimal or even incorrect prediction.
An intrinsic dearth of data, especially data pertaining to rare events (e.g. jungle cat attacks, Mergers
& Acquisitions activity), can mask the potential of data science techniques, and even make them
seem inferior to intuition alone
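A rough way to see the "dearth of data" problem: the same observed attack rate is far less certain when it comes from a handful of sightings. The sketch below uses the Wilson score interval (a standard approximation, swapped in here for illustration; the deck does not prescribe a method), and the counts are invented.

```python
import math

def wilson_interval(events, trials, z=1.96):
    """Approximate 95% Wilson score interval for an event rate."""
    if trials == 0:
        return 0.0, 1.0  # no data: the rate could be anything
    p = events / trials
    denom = 1 + z ** 2 / trials
    center = (p + z ** 2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Same observed rate (10%), very different amounts of evidence.
small = wilson_interval(1, 10)       # 1 attack in 10 sightings
large = wilson_interval(100, 1000)   # 100 attacks in 1000 sightings

small_width = small[1] - small[0]
large_width = large[1] - large[0]
print(small, large)
```

With only ten sightings the plausible range for the attack rate is many times wider, so a model built on it can easily look worse than a villager's gut feeling.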

The problem: An algorithm sees and weights features in ways we don't
The advantage: An algorithm sees and weights features in ways we don't
For instance:
What if the jungle cat came in different colors? Our algorithm might
need to see an instance of each color, on top of a similar size and
build, before it can identify the animal as a vicious jungle cat.

What if the jungle cats could have stripes??

What if environmental settings can play a role in triggering an attack?

What if all these things matter in combination??

"Al" the Algorithm
Turn and RUN you fool !!!
Our algorithm might need to be fed
multiple instances of every jungle cat
type and environment combo to
correctly call when it's time to GTFO
with great accuracy
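A back-of-the-envelope sketch of how quickly those combinations pile up. The feature values and the number of examples per combination are assumptions chosen only for illustration.

```python
from itertools import product

# Hypothetical feature values for a jungle cat sighting.
colors = ["yellow", "black", "white"]
patterns = ["plain", "striped", "spotted"]
environments = ["day", "night", "rain"]

# Every color / pattern / environment combination Al might encounter.
combos = list(product(colors, patterns, environments))

# Assumption: Al needs a handful of examples of each combination.
examples_per_combo = 5
sightings_needed = len(combos) * examples_per_combo

print(len(combos), sightings_needed)
```

Three features with three values each already give 27 combinations; at five examples apiece that is 135 sightings, and every added feature multiplies the total again.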

So then why try to use data science at all?
OMG, another jungle cat - RUN FOR YOUR LIVES!!!
WTF?
Hmm...wait a second....

Please grant me a quick death...
I just wanted a belly rub...
Yo! Calm down. I don't think this is a jungle cat...

What do you mean?!? It's large, yellow and has four legs & a tail. My
experience & instincts are telling me a violent death is nigh if I don't
hightail it outta here. Stat!...
Who? Me?
Yeah, but it also has a waggly tail, boopable snoot, floppy ears and
an adorably dumb look on its face...

Based on all the jungle cat data points
I've seen, it's highly improbable that it
has this combination of differentiating
features & is still a jungle cat. I could be
wrong, but I'm here to bring these subtle
quantitative differences to your attention

You're right. At first glance I thought it was a
jungle cat based on my instinct and life
experience, but at closer inspection there are
quantitative differences between this beast and
any typical jungle cat I've seen or heard of...
You were just weighting the
size and color more than
other features like ear shape,
tail and stupidity of
expression, due to experience
or rumor-based bias

Algorithms like me should
be used to augment
decision making by raising
flags when intuition-based
decisions don't align with all
the data available
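One way Al's flag-raising could look in code, as a toy sketch only. The feature names, the jungle cat profile and the 0.8 threshold are all invented for illustration; a real model would learn these from data.

```python
# Hypothetical profile of a typical jungle cat (assumed, not learned).
JUNGLE_CAT_PROFILE = {
    "large": True, "yellow": True, "four_legs": True, "tail": True,
    "floppy_ears": False, "waggly_tail": False,
}

def similarity(observed, profile):
    """Fraction of shared features on which the animal matches the profile."""
    shared = [k for k in profile if k in observed]
    if not shared:
        return 0.0
    return sum(observed[k] == profile[k] for k in shared) / len(shared)

def review(intuition_says_run, observed, threshold=0.8):
    """Return (score, flag); flag is True when data and intuition disagree."""
    score = similarity(observed, JUNGLE_CAT_PROFILE)
    algorithm_says_cat = score >= threshold
    return score, intuition_says_run != algorithm_says_cat

# The villager wants to run, but the animal has floppy ears and a waggly tail.
dog = {"large": True, "yellow": True, "four_legs": True, "tail": True,
       "floppy_ears": True, "waggly_tail": True}
score, flag = review(True, dog)
print(score, flag)
```

Al does not overrule the villager here. The flag only says "your instinct and the data disagree, take a closer look".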
Happily Ever After??
OH, yasss...

OMG, another jungle cat - RUN
FOR YOUR LIVES!!!
Hmm...wait a second....

It's big, it has four legs and it's a color jungle cats come in
Yo! Calm down. I don't think this is a jungle cat...
Looks... Tasty....

Based on the jungle-cat data points I've
ingested, it's highly improbable that it
has this combination of differentiating
features and is still a jungle cat. I could
be wrong, but I'm here to bring these
subtle quantitative differences to your
attention

Hmm...Come to think of it, that doesn't look like a jungle cat after all.
Yeah. Why don't you take a closer look?

What happens After a [Metaphorical] Bear Attack??
Algorithms are only as good as the data they're trained on & they are
scoped to answer a specific question
"Not a Jungle Cat" ≠ "Won't rip your arm off and eat it"
1) An invaluable "training" data point is gained & used to inform future predictions
2) The limitations or "scope" of the algorithm are revealed, emphasized, or re-considered

1) An invaluable "training" data point is gained & used to inform future predictions
The algorithm is now trained to avoid
bears, or animals with the
characteristics of bears, as well.

Or we can even "bootstrap" our bear
data point to avoid bears under all
environmental scenarios
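A minimal sketch of that idea, reading "bootstrap" loosely as copying the one bear encounter across every environment (an assumption; in statistics, bootstrapping usually means resampling with replacement). The field names and values are invented.

```python
# The single hard-won observation (hypothetical fields).
bear = {"size": "large", "color": "brown", "outcome": "attack"}

# Environments Al should generalize to (assumed list).
environments = ["day", "night", "rain", "forest", "riverbank"]

# One synthetic training row per environment, each inheriting the bear's traits.
augmented = [{**bear, "environment": env} for env in environments]

for row in augmented:
    print(row)
```

This treats the environment as irrelevant to whether a bear attacks, which is itself a domain-expertise call the villager should sign off on.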


Over time, the result is an algorithm that is more accurate, comprehensive and attuned to the
investor's personal experience and expertise, and whose utility can be inherited by new
investors whose lack of experience makes them especially prone to naivety and
chronological bias
This is akin to how knowledge and experience might be passed
down from a villager to his child, but without any bias, loss of
memory, or reliance on untested and mutable heuristics

2) The limitations or "scope" of the algorithm are revealed, emphasized or re-considered
Al was right, the bear was not a jungle cat. But it was a fucking bear, so our villager still should
have run. Al was not intentionally trying to be a smartass; he was just doing the only classification
task he was trained to do
Oh, Shit.

A solution to this dilemma might be to train Al as a multi-class classifier, or to create and train new
algorithms that specialize in making different predictions
Run / Don't Run
Pet / Don't Pet
Jungle Cat / Not a Jungle Cat
Bear / Not a Bear
Dog / Not a Dog
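A toy sketch of the multi-class version of Al: instead of answering only "jungle cat or not", it picks the best-matching animal and looks up the advice for it. The profiles, feature names and action table are invented for illustration, not a trained model.

```python
# Hypothetical feature profiles for each class.
PROFILES = {
    "jungle cat": {"floppy_ears": False, "waggly_tail": False, "retractable_claws": True},
    "bear": {"floppy_ears": False, "waggly_tail": False, "retractable_claws": False},
    "dog": {"floppy_ears": True, "waggly_tail": True, "retractable_claws": False},
}

# What the villager should do for each class (assumed mapping).
ACTIONS = {
    "jungle cat": ("Run", "Don't Pet"),
    "bear": ("Run", "Don't Pet"),
    "dog": ("Don't Run", "Pet"),
}

def classify(observed):
    """Pick the class whose profile matches the most observed features."""
    def matches(name):
        return sum(observed.get(k) == v for k, v in PROFILES[name].items())
    return max(PROFILES, key=matches)

stranger = {"floppy_ears": False, "waggly_tail": False, "retractable_claws": False}
animal = classify(stranger)
print(animal, ACTIONS[animal])
```

With "bear" as a class of its own, "not a jungle cat" no longer reads as "safe".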

These different algorithms can even be used to "sanity-check" each other's output and, when their
predictions are incongruent, to find inconsistencies in the data or algorithmic failures
Run / Don't Run
Pet / Don't Pet
Jungle Cat / Not a Jungle Cat
Bear / Not a Bear
Dog / Not a Dog
Collectively Reads As: "Don't Run, Pet, Jungle Cat"

Petting a jungle cat? Even our villager knows that's crazy-talk. This discrepancy is less than ideal, but it
allows our villager to weigh his confidence in the pooled algorithmic suggestion against his own instincts
and, based on the actual outcome, decide which algorithms to trust more than others in the future
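A small sketch of that sanity check: pool the outputs of the separate algorithms and list any combinations that contradict each other. The consistency rules below are assumptions a villager might write down, not something the algorithms learn.

```python
# Animals nobody should stand near or pet (assumed rule set).
DANGEROUS = {"Jungle Cat", "Bear"}

def find_conflicts(run, pet, animal):
    """Return a list of contradictions among the pooled predictions."""
    conflicts = []
    if animal in DANGEROUS and run == "Don't Run":
        conflicts.append(f"{animal} but {run}")
    if animal in DANGEROUS and pet == "Pet":
        conflicts.append(f"{animal} but {pet}")
    if run == "Run" and pet == "Pet":
        conflicts.append("Run but Pet")
    return conflicts

# The pooled reading from the slide: "Don't Run, Pet, Jungle Cat".
pooled = find_conflicts("Don't Run", "Pet", "Jungle Cat")
# A reading that hangs together.
consistent = find_conflicts("Don't Run", "Pet", "Dog")
print(pooled, consistent)
```

An empty list means the algorithms agree; anything else is the cue to stop and check which of them, or which data, went wrong.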
WTF?

So what's the moral of the story??
● Algorithms are only as good as the data they're
trained on, and only good at answering questions
within the scope for which they were designed
● While it can be a powerful tool to guide
decisions, in fundamental & PE investing data
science should never be completely divorced
from fundamental domain expertise, especially
when there is a dearth of relevant data points for
the algorithm to train on
● Also, don't pet bears.

Data Science vs. Jungle Cats
Cast of Characters (in case you didn't get the metaphor)
Villager: A fundamental long/short or PE investor
Jungle Cat: A detrimental equity or PE investment opportunity to be avoided
"Al" the Algorithm: Your theoretical and abstract, yet friendly, data science helpmeet
Affable Canine: A promising, yet non-obvious, equity or PE investment opportunity whose value is realized after an unbiased algorithmic assessment of its similarity to historical wins is brought to the investor's attention
Asshole Bear: A potentially promising, yet non-obvious, equity or PE investment opportunity whose undesirability is realized upon further investigation, & whose encounter should be used as an additional future "training" data point, or as a reminder to re-think the scope of the algorithm
