Prototyping in the Data World 
Clare Corthell 
Data @ Mattermark 
@clarecorthell
weirdest food 
you’ve ever eaten?
whale blubber ice cream, 
with blueberries. 
(mine)
Open Source Data Science 
Masters 
datasciencemasters.org 
Mattermark 
Data Scientist 
Machine Learning Engineer 
about me
Mattermark 
Private Company Deal Intelligence Platform 
or, a huge spreadsheet full of live data about private companies 
of which you can ask questions 
my company
Today’s Goal
ask questions of data 
(what we think about all the time at Mattermark)
Why do we ask questions 
of our data?
Because we want to gain 
knowledge from data
@maebert
creating knowledge, 
understanding, 
& more data 
(after exploration)
Data Scientists turn data into knowledge 
by answering ambiguous questions 
such as 
How do we bucket companies by industry? 
What are those industries? 
Can we predict whether someone will start a company? 
Are there patterns that computers can see that humans can’t? 
What do Data Scientists do?
Turning Data into Knowledge 
How Data Scientists spend their time: 
• 80% on Cleaning > Munging > Exploration 
• 20% Experiments / Analysis / Machine Learning 
Exploration is important because it lets us determine 
what questions we might be able to answer with the data. 
Only then can we run experiments, analyze, and finally 
begin to fundamentally understand and model the world. 
What do Data Scientists do?
Exploration results in Prototypes 
definition 
When you explore and ask questions, 
you create a knowledge prototype. 
(a first, probably incomplete version that leads to knowledge) 
Knowledge prototypes answer questions. 
(they might not perfectly model the world, but they’re a useful start) 
Questions lead to more questions, 
and subsequently more knowledge.
“All prototypes are wrong, 
but some prototypes are useful.” 
— blatant misquoting of George E. P. Box
by exploring data 
we start to answer questions 
by building knowledge prototypes 
lemme show you what I mean
What do we need to explore data? 
• Tools for working with that data (python!) 
• A data structure to make the data usable 
• Data 
• Questions we want to answer 
(we’ll make them up as we go today)
toolkit 
numpy: multi-dimensional container of data 
pandas: data structures and analysis tools 
IPython: browser-based code notebook / IDE 
(run blocks of code, not the whole program) 
python
the data structure: DataFrame 
and you thought you hated excel, 
but you actually don’t
dataframe 
• records are rows 
• columns are values across those rows 
• basic actions: filtering, sorting, slicing 
(paradigmatically not a far cry from excel) 
basic data structure
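A minimal sketch of the three basic actions, assuming pandas is installed. The company names, column names, and amounts are made up for illustration and are not Mattermark data.

```python
import pandas as pd

# Hypothetical records: each row is one company.
df = pd.DataFrame({
    "company": ["Acme", "Birch", "Cobalt", "Dune"],
    "industry": ["finance", "media", "finance", "health"],
    "amount": [500000, 1200000, 250000, 3000000],
})

# Filtering: keep only the rows that match a condition.
finance = df[df["industry"] == "finance"]

# Sorting: order rows by a column, largest first.
by_amount = df.sort_values("amount", ascending=False)

# Slicing: take a positional range of rows (here the first two).
first_two = df.iloc[0:2]
```

Each action returns a new DataFrame and leaves `df` unchanged, which makes it cheap to try one question after another.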
The Data (from Mattermark) 
• Categorical (industry) 
• Continuous (uniques) 
• Binary (mobile app) 
• Dates (date of funding) 
Company funding events 
in New York City 
from the last 5 years 
data types (examples)
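One way these four kinds of data might map onto pandas column types, as a sketch. The column names and values here are assumptions for illustration, not the real Mattermark schema.

```python
import pandas as pd

# Hypothetical columns, one per kind of data listed above.
df = pd.DataFrame({
    "industry": ["finance", "media", "finance"],                 # categorical
    "uniques": [1200.0, 45000.0, 300.0],                         # continuous
    "mobile_app": [True, False, False],                          # binary
    "funding_date": ["2013-05-01", "2012-11-15", "2014-02-20"],  # dates
})

# Strings are not treated as categories or dates until we say so.
df["industry"] = df["industry"].astype("category")
df["funding_date"] = pd.to_datetime(df["funding_date"])

# One dtype per column.
dtypes = df.dtypes
```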
Initial Questions of Exploration 
• What’s in here? 
• Are there patterns? 
• What might we find out if we investigate further? 
Exploration
From questions 
come more questions 
And eventually, you find something very, very 
interesting (and probably valuable!)
What’s in here? 
(sample 10 rows) 
IPython code block 
df = pd.read_csv(csvfilename) 
df.head(10)
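A self-contained sketch of loading and sampling. An in-memory string stands in for the CSV file so it runs anywhere; the columns and values are hypothetical.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk (hypothetical columns and values).
csv_text = """company,stage,amount
Acme,seed,500000
Birch,a,4000000
Cobalt,seed,750000
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# First 10 rows, or fewer if the table is shorter.
sample = df.head(10)
```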
What’s in here? 
(sample 1 row) 
df.iloc[index_int]
What’s in here? 
(sample & describe 1 column) 
… 
df['colname'] 
df['colname'].describe()
What’s in here? 
(summary across columns) 
df.describe()
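The row, column, and summary lookups from the last few slides can be sketched together like this, on made-up data (the `stage` and `amount` columns are assumptions).

```python
import pandas as pd

# Hypothetical funding events.
df = pd.DataFrame({
    "stage": ["seed", "seed", "a", "b"],
    "amount": [500000, 700000, 4000000, 12000000],
})

# One row, selected by integer position.
row = df.iloc[2]

# Summary statistics (count, mean, std, min, quartiles, max) for one column.
col_summary = df["amount"].describe()

# The same summary across the numeric columns of the frame.
summary = df.describe()
```

By default `df.describe()` only covers numeric columns, so the text column `stage` does not appear in `summary`.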
What’s in here? 
(sort by round size) 
… 
df.sort_values('colname', ascending=False)
What’s not in here? 
(null or missing values) 
In the column, is the value at a given index null? (true or false) 
df['colname'].isnull() 
… 
Count the number of null values in the column 
df['colname'].isnull().sum()
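A small runnable sketch of the null check, on a made-up `amount` column with two missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical column with two missing values.
df = pd.DataFrame({"amount": [500000.0, np.nan, 750000.0, np.nan, 2000000.0]})

# True where the value is missing, False otherwise.
is_null = df["amount"].isnull()

# True counts as 1, so the sum is the number of missing values.
null_count = int(is_null.sum())

# The values that are actually present.
present = df["amount"].dropna()
```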
Question: 
What is the most common stage for funding? 
to get a quick idea of scale… 
df['colname'].value_counts() 
df['colname'].value_counts().plot(kind='bar')
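A sketch of the counting step on made-up stage labels. The plot call is left out so the example runs without a plotting library.

```python
import pandas as pd

# Hypothetical funding stages, one per funding event.
stages = pd.Series(["seed", "a", "seed", "b", "seed", "a"], name="stage")

# Count how often each value occurs, most frequent first.
counts = stages.value_counts()

# The first index label is the most common value.
most_common = counts.index[0]
```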
Leads to Question: 
What is the typical funding amount by round? 
Further questions: 
• What kind of companies 
raised at each stage? 
• How much variability is 
there in the amount raised 
at each stage? 
• Is this different from other 
geographies? 
groupby_var = df.groupby('colname') 
print(groupby_var['colname'].mean().astype(int))
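A sketch of the group-by step on made-up data. The median and standard deviation are additions not shown on the slide; they are one possible way to start on the variability question.

```python
import pandas as pd

# Hypothetical funding events (stage and amount are assumed column names).
df = pd.DataFrame({
    "stage": ["seed", "seed", "a", "a"],
    "amount": [500000, 700000, 3000000, 5000000],
})

by_stage = df.groupby("stage")

# Mean amount per stage, as on the slide.
mean_amount = by_stage["amount"].mean().astype(int)

# Median is less sensitive to a few very large rounds.
median_amount = by_stage["amount"].median()

# Standard deviation gives a rough sense of spread within each stage.
std_amount = by_stage["amount"].std()
```

Note that `.astype(int)` would raise an error if a group's mean were NaN, so missing amounts need handling first.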
Question: 
How many of these are mobile companies? 
df.shape 
Further questions: 
• Do mobile companies have lots of employees? 
• Do mobile companies typically have revenue? 
• Do mobile companies raise less or more than other 
companies?
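One way to get that count, assuming a boolean `mobile_app` column (a hypothetical name; the data below is made up).

```python
import pandas as pd

# Hypothetical companies with a binary mobile-app flag.
df = pd.DataFrame({
    "company": ["Acme", "Birch", "Cobalt", "Dune", "Elm"],
    "mobile_app": [True, False, False, False, False],
})

# shape is (number of rows, number of columns).
n_rows, n_cols = df.shape

# Filter to mobile companies, then count the remaining rows.
mobile = df[df["mobile_app"]]
n_mobile = mobile.shape[0]

# The mean of a boolean column is the fraction that is True.
share = df["mobile_app"].mean()
```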
Our prototypes of knowledge: 
With regard to private companies in NYC 
that raised capital in the last 5 years: 
• ~10% have mobile applications 
• Most funding events were at the seed stage 
• The average seed round was $839k 
In total: 
• There were 3209 reported funding events 
what we discovered
Why it’s a prototype 
(i.e., why we’re not done yet) 
• The data isn’t completely clean 
• We haven’t accounted for null, missing, zero 
values 
• We haven’t connected directly to a business 
question 
• We aren’t working in production (just locally)
by exploring data 
we start to answer questions 
with knowledge prototypes
Why does this matter? 
• Exploration lets us build prototypes of knowledge 
that start to answer real questions. 
• One question paves the road to another. 
• Answering questions leads to knowledge. 
• People who have knowledge understand more 
about the world.
Why does this matter? 
There aren’t enough people that do this with code.
Why does this matter? 
People who can code in the world of technology 
companies are a dime a dozen and get no respect. 
People who can code in biology, medicine, 
government, sociology, physics, history, and 
mathematics are respected and can do amazing 
things to advance those disciplines. 
- Zed Shaw (Learn Python the Hard Way)
daw.
Thank You! 
Best way to reach me? 
Twitter @clarecorthell 
psst — Mattermark is hiring! 
Come talk to me!

Prototyping in the Data World - Data Scripting with Python
