The Human Algorithm: Automating Startup Data Collection at Mattermark

Janessa Lantz
Reader // Writer // Editor at HubSpot
#datapointlive
Sarah Catanzaro, Head of Data at Mattermark
@sarahcat21
#DPL15 | @sarahcat21
Mattermark

Mattermark is a deal intelligence platform and private company database used by
● investors
● business and corporate development
● sales

THE CHALLENGE
Scale + Information Overload + Stealth

Scale
Over 125 million private companies in the world (only about 45.5 thousand public).

Information overload

Stealth
● Private companies do not have strong incentives (e.g. legal obligations) to share data. Many may have competitive incentives to obfuscate information.
● Investors may request non-disclosure.

Mattermark’s Solution

Software-oriented approach
● A must, due to the scale of our dataset
  ○ 1.3 million companies
  ○ 16.5k investors
  ○ 110k funding events
● Leverage a lean data team

Data collection strategy
● Web scraping
● Machine learning
● Direct submission
● Manual data entry

The “Human Algorithm”

Investors ask questions like
● What startups might raise capital in the next 6 months?
● What startups is Stephanie Palmeri investing in?

Our data analysts seek to understand:
● Why does this question matter?
● What data is required to answer this question?
● Where can this data be accessed?

Next, data analysts:
1. Define repeatable processes for data collection.
2. Determine whether processes can be replicated through web scraping and/or machine learning algorithms to collect data at scale.
3. Write functional specifications, reviewed by sales and engineering team members.

Next, web and/or machine learning engineers:
1. Write dev designs, reviewed by data analysts.
2. Upon implementation and marketing release, this data becomes available to customers.
3. New questions arise and the cycle starts again.

Funding Automation

Investors ask questions like
● How much funding has a company already raised?
● Who were the investors in each of those rounds?

Problems with existing sources
Many rely on wiki-style data collection, so the credibility of sources cannot be confirmed.
News reports are better, but
● facts are harder to extract
● different sources report different figures

Solution: funding automation
A new framework for collecting and synthesizing funding data.
1. News article fact extraction (machine learning)
2. Funding override system (web engineering)
3. Funding confirmation email campaign (marketing)

1. News article fact extraction
Crawl RSS feeds, extract data from stories (title, text, links, etc.)
● 750+ sources
● 5,000 - 10,000 articles

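The crawl step above could be sketched roughly as follows. This is a minimal illustration that assumes plain RSS 2.0 feeds; the sample feed, the field names, and the parse_feed helper are hypothetical and are not taken from Mattermark's system. A real crawler would also fetch each feed over HTTP and follow the links to the full article text.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 document standing in for one of the crawled sources.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Startup News</title>
    <item>
      <title>Acme Robotics raises $12M Series A</title>
      <link>https://example.com/acme-series-a</link>
      <description>Acme Robotics announced a $12M Series A led by Example Ventures.</description>
    </item>
    <item>
      <title>Five tips for remote teams</title>
      <link>https://example.com/remote-tips</link>
      <description>Advice on running distributed teams.</description>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return one dict per <item> holding its title, link, and summary text."""
    root = ET.fromstring(xml_text)
    stories = []
    for item in root.iter("item"):
        stories.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "text": (item.findtext("description") or "").strip(),
        })
    return stories


stories = parse_feed(SAMPLE_FEED)
for story in stories:
    print(story["title"], "->", story["link"])
```

Each story dict can then be handed to the classification step.
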
1. News article fact extraction
Classify stories about funding
● 250 articles/day

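The slides do not say which model classifies funding stories. As a stand-in, here is a small naive Bayes text classifier with add-one smoothing, trained on a handful of made-up headlines. The training data and function names are assumptions for illustration only.

```python
import math
import re
from collections import Counter

# Hypothetical labeled headlines (True = funding story).
# A real training set would be far larger.
TRAINING = [
    ("Acme Robotics raises $12M Series A led by Example Ventures", True),
    ("Widgetly closes $3M seed round from angel investors", True),
    ("DataCo secures $40M in Series C funding", True),
    ("Five tips for managing remote engineering teams", False),
    ("Acme Robotics launches new warehouse product", False),
    ("Widgetly names new chief marketing officer", False),
]


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count words per class and documents per class."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def is_funding_story(text, model):
    """Compare log-probabilities of the two classes under naive Bayes."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][token] + 1  # add-one smoothing
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return scores[True] > scores[False]


model = train(TRAINING)
print(is_funding_story("Gadgetron raises $8M Series B", model))
print(is_funding_story("Gadgetron names new marketing chief", model))
```

With a training set this small the result is only a toy, but the shape is the same at scale: count words per class, then compare class scores for each new story.
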
1. News article fact extraction
● Identify sentences containing information about investors, amount, and/or series

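One simple way to picture the sentence-identification step is with pattern matching. The regular expressions and the tag_sentences helper below are illustrative assumptions; the talk describes this step as machine learning, so treat the rules as a stand-in rather than the method actually used.

```python
import re

# Illustrative patterns only; real articles need far more coverage.
AMOUNT = re.compile(
    r"\$\d+(?:\.\d+)?\s?(?:[MBK]\b|million|billion|thousand)", re.IGNORECASE
)
SERIES = re.compile(r"\b(?:seed|series\s+[A-H])\b", re.IGNORECASE)
INVESTOR_CUE = re.compile(
    r"\b(?:led by|backed by|participation from|investors? include)\b",
    re.IGNORECASE,
)


def tag_sentences(text):
    """Return (sentence, tags) pairs for sentences that mention funding facts."""
    results = []
    # Naive split on sentence-ending punctuation; abbreviations such as
    # "Inc." would be split incorrectly.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        tags = set()
        if AMOUNT.search(sentence):
            tags.add("amount")
        if SERIES.search(sentence):
            tags.add("series")
        if INVESTOR_CUE.search(sentence):
            tags.add("investors")
        if tags:
            results.append((sentence, tags))
    return results


ARTICLE = (
    "Acme Robotics builds warehouse robots. "
    "The company raised $12M in a Series A round. "
    "The round was led by Example Ventures. "
    "It plans to hire engineers."
)

tagged = tag_sentences(ARTICLE)
for sentence, tags in tagged:
    print(sorted(tags), sentence)
```

Only the tagged sentences need to go on to fact extraction, which keeps the later steps focused on a small fraction of each article.
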
1. News article fact extraction
● Extract facts
● Match companies and investors to entities in our database
  ○ 30% of extracted articles are entered automatically

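Matching an extracted name to a database entity might look something like the sketch below, which normalizes names and then falls back to fuzzy matching with Python's difflib. The company table, the suffix list, and the 0.85 cutoff are assumptions made for this example; the slides do not describe how matches are decided.

```python
import difflib
import re

# Hypothetical slice of a company table: normalized name -> record id.
KNOWN_COMPANIES = {"acme robotics": 101, "widgetly": 102, "dataco": 103}

# Common legal suffixes to drop before comparing names.
SUFFIXES = re.compile(r"\b(?:inc|llc|ltd|corp)\b\.?", re.IGNORECASE)


def normalize(name):
    """Lowercase, drop legal suffixes and punctuation, collapse whitespace."""
    name = SUFFIXES.sub("", name.lower())
    name = re.sub(r"[^a-z0-9 ]", " ", name)
    return " ".join(name.split())


def match_company(raw_name, cutoff=0.85):
    """Return the record id for raw_name, or None when no match is close enough."""
    key = normalize(raw_name)
    if key in KNOWN_COMPANIES:
        return KNOWN_COMPANIES[key]
    close = difflib.get_close_matches(key, list(KNOWN_COMPANIES), n=1, cutoff=cutoff)
    return KNOWN_COMPANIES[close[0]] if close else None


print(match_company("Acme Robotics, Inc."))
print(match_company("Acme Robotcs"))
print(match_company("Totally Unknown Labs"))
```

In a setup like this, a None result could be routed to manual review while confident matches are entered automatically, which is one plausible reading of the 30% figure above.
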
2. Funding override system
● Identify reports about the same funding event
● Combine information from multiple reports using the Wongi rules engine

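The two override steps above can be illustrated with a small sketch. It is written in plain Python and does not use or imitate the Wongi rules engine's API. The report fields, the 30-day window, and the merge rules (most frequently reported amount wins, investors are unioned) are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical extracted reports; three describe the same Series A.
REPORTS = [
    {"company": "Acme Robotics", "series": "A", "amount": 12_000_000,
     "investors": {"Example Ventures"},
     "published": date(2015, 3, 2), "source": "news-a"},
    {"company": "Acme Robotics", "series": "A", "amount": 12_000_000,
     "investors": {"Example Ventures", "Sample Capital"},
     "published": date(2015, 3, 3), "source": "news-b"},
    {"company": "Acme Robotics", "series": "A", "amount": 10_000_000,
     "investors": set(),
     "published": date(2015, 3, 5), "source": "news-c"},
    {"company": "Widgetly", "series": "Seed", "amount": 3_000_000,
     "investors": {"Angel Group"},
     "published": date(2015, 4, 1), "source": "news-a"},
]


def group_reports(reports, window_days=30):
    """Group reports sharing company and series that fall within a date window."""
    by_key = defaultdict(list)
    for report in sorted(reports, key=lambda r: r["published"]):
        by_key[(report["company"], report["series"])].append(report)
    events = []
    for same_round in by_key.values():
        current = [same_round[0]]
        for report in same_round[1:]:
            if (report["published"] - current[0]["published"]).days <= window_days:
                current.append(report)
            else:
                events.append(current)
                current = [report]
        events.append(current)
    return events


def combine(event):
    """Merge one group of reports into a single funding record."""
    amounts = Counter(r["amount"] for r in event if r["amount"] is not None)
    investors = set().union(*(r["investors"] for r in event))
    return {
        "company": event[0]["company"],
        "series": event[0]["series"],
        # Rule: the most frequently reported amount wins.
        "amount": amounts.most_common(1)[0][0] if amounts else None,
        # Rule: keep every investor named by any source.
        "investors": sorted(investors),
        "sources": [r["source"] for r in event],
    }


events = [combine(e) for e in group_reports(REPORTS)]
for event in events:
    print(event)
```

Keeping the source list on each merged record makes it possible to trace a figure back to the reports it came from when sources disagree.
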
3. Funding confirmation email campaign
Use CRM and HubSpot to automatically send emails to founders after equity financing.

What We Learned

Where we struggled
Our initial implementation of a funding override system was inefficient. Why?
Because our data analysts and developers were not aligned on functional requirements.

Solution
● Analysts must work closely with developers
  ○ Pre-spec check-ins
  ○ Analysts review dev designs to ensure that the system design addresses the use case.
● Analysts must avoid being prescriptive
● Analysts must understand data mining and machine learning concepts

Where we succeeded
Implementation of news article fact extraction was successful. Why?
Because data analysts and developers worked as service providers to each other.

How We Did It

1. Tighter Analyst + Dev Communication
Tiger teams: 1 ML developer, 1 web/infrastructure developer, 1 data analyst, 1 project lead
Define milestones & hold daily stand-ups.

2. Track II interactions reinforce the symbiotic relationship
● Devs lead a Python learning group
● Data analysts hold seminars on topics like admin tooling and alternative assets

Thank You!
The Human Algorithm: Automating Startup Data Collection at Mattermark

  • 1. #datapointlive The Human Algorithm: Automating Startup Data Collection at Mattermark Sarah Catanzaro, Head of Data at Mattermark @sarahcat21
  • 2. Mattermark
      Mattermark is a deal intelligence platform and private company database used by
      ● investors
      ● business and corporate development
      ● sales
  • 3. THE CHALLENGE: Scale + Information Overload + Stealth
  • 4. Scale
      Over 125 million private companies in the world (only about 45.5 thousand public).
  • 6. Stealth
      ● Private companies do not have strong incentives (e.g. legal obligations) to share data. Many may have competitive incentives to obfuscate information.
      ● Investors may request non-disclosure.
  • 8. Software-oriented approach
      ● A must, due to the scale of our dataset
        ○ 1.3 million companies
        ○ 16.5k investors
        ○ 110k funding events
      ● Leverage a lean data team
  • 9. Data collection strategy
      ● Web scraping
      ● Machine learning
      ● Direct submission
      ● Manual data entry
  • 10. The “Human Algorithm”
  • 11. Investors ask questions like
      ● What startups might raise capital in the next 6 months?
      ● What startups is Stephanie Palmeri investing in?
  • 12. Our data analysts seek to understand:
      ● Why does this question matter?
      ● What data is required to answer this question?
      ● Where can this data be accessed?
  • 13. Next, data analysts:
      1. Define repeatable processes for data collection.
      2. Determine whether processes can be replicated through web scraping and/or machine learning algorithms to collect data at scale.
      3. Write functional specifications, reviewed by sales and engineering team members.
  • 14. Next, web and/or machine learning engineers:
      1. Write dev designs, reviewed by data analysts.
      2. Upon implementation and marketing release, this data becomes available to customers.
      3. New questions arise and the cycle starts again.
  • 16. Investors ask questions like
      ● How much funding has a company already raised?
      ● Who were the investors in each of those rounds?
  • 17. Problems with existing sources
      ● Existing sources rely on wiki-style data collection (cannot confirm the credibility of sources)
      ● News reports are better, but
        ○ facts are harder to extract
        ○ different sources report different figures
  • 18. Solution: funding automation
      A new framework for collecting and synthesizing funding data.
      1. News article fact extraction (machine learning)
      2. Funding override system (web engineering)
      3. Funding confirmation email campaign (marketing)
  • 19. 1. News article fact extraction
      Crawl RSS feeds, extract data from stories (title, text, links, etc.)
      ● 750+ sources
      ● 5,000-10,000 articles
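The crawl step on this slide might look roughly like the sketch below. It is an illustration only, not Mattermark's crawler: the sample feed, the URLs, and the `parse_feed` helper are all hypothetical, and the feed is parsed from an inline string so the example runs without network access. A real crawler would fetch each of the 750+ feed URLs over HTTP first.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real crawler would download this from a feed URL.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Startup News</title>
    <item>
      <title>Acme Robotics raises $12 million Series A</title>
      <link>https://example.com/acme-series-a</link>
      <description>Acme Robotics announced a $12 million Series A led by Example Ventures.</description>
    </item>
    <item>
      <title>Five tips for better stand-ups</title>
      <link>https://example.com/stand-ups</link>
      <description>How to keep daily meetings short.</description>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Return one dict per <item> holding its title, link, and text."""
    root = ET.fromstring(xml_text)
    stories = []
    for item in root.iter("item"):
        stories.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "text": (item.findtext("description") or "").strip(),
        })
    return stories


stories = parse_feed(SAMPLE_RSS)
for story in stories:
    print(story["title"], "->", story["link"])
```

Each story dict carries the fields the slide names (title, text, links), which is the shape a downstream classifier could consume.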
  • 20. 1. News article fact extraction
      Classify stories about funding
      ● 250 articles/day
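The slides do not say which model classifies funding stories. As a stand-in, the sketch below trains a tiny multinomial naive Bayes classifier with add-one smoothing on a handful of made-up headlines. The training data, labels, and function names are assumptions for illustration; a production classifier would need far more labeled examples.

```python
import math
import re
from collections import Counter

# Hypothetical labeled headlines, invented for this example.
TRAINING = [
    ("Acme raises 12 million Series A funding round", "funding"),
    ("Startup closes seed round led by investors", "funding"),
    ("Beta secures 5 million in venture funding", "funding"),
    ("Company launches new product feature", "other"),
    ("CEO shares tips for hiring engineers", "other"),
    ("Startup opens new office in Austin", "other"),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def train(examples):
    """Count words per label and documents per label."""
    word_counts = {}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest smoothed log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for word in tokenize(text):
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING)
print(classify("Gamma raises 3 million seed round", model))
print(classify("CEO shares hiring tips", model))
```

Only stories labeled as funding would move on to sentence-level extraction, which keeps the later steps focused on the roughly 250 relevant articles per day.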
  • 21. 1. News article fact extraction
      ● Identify sentences containing information about investors, amount, and/or series
  • 22. 1. News article fact extraction
      ● Extract facts
      ● Match companies and investors to entities in our database
        ○ 30% of extracted articles are entered automatically
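The sentence identification, fact extraction, and entity matching steps can be sketched with regular expressions and name normalization. This is a simplified stand-in for the machine learning approach the talk describes: the patterns, the suffix list, and the `KNOWN_INVESTORS` table are hypothetical, and the investor pattern only captures a single lead investor introduced by "led by".

```python
import re

# Assumed patterns; real articles phrase these facts in many more ways.
AMOUNT_RE = re.compile(
    r"\$(\d+(?:\.\d+)?)\s*(million|billion|thousand|[MBK])\b", re.IGNORECASE)
SERIES_RE = re.compile(r"\b(seed|series\s+[A-H])\b", re.IGNORECASE)
INVESTOR_RE = re.compile(r"\bled by ([A-Z][\w&]*(?: [A-Z][\w&]*)*)")

MULTIPLIERS = {"thousand": 1e3, "k": 1e3, "million": 1e6, "m": 1e6,
               "billion": 1e9, "b": 1e9}
SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}


def normalize(name):
    """Lowercase a name, drop punctuation and common legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in SUFFIXES)


# Hypothetical stand-in for the entity database: normalized name -> record name.
KNOWN_INVESTORS = {normalize(n): n
                   for n in ["Example Ventures, LLC", "Sample Capital"]}


def extract_facts(text):
    """Scan sentence by sentence and keep the first value found per fact."""
    facts = {"amount_usd": None, "series": None, "investor": None}
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        amount = AMOUNT_RE.search(sentence)
        series = SERIES_RE.search(sentence)
        investor = INVESTOR_RE.search(sentence)
        if not (amount or series or investor):
            continue  # this sentence carries no funding facts
        if amount and facts["amount_usd"] is None:
            unit = amount.group(2).lower()
            facts["amount_usd"] = float(amount.group(1)) * MULTIPLIERS[unit]
        if series and facts["series"] is None:
            facts["series"] = " ".join(series.group(1).split()).title()
        if investor and facts["investor"] is None:
            # Only names that match a known entity are accepted.
            facts["investor"] = KNOWN_INVESTORS.get(normalize(investor.group(1)))
    return facts


article = ("Acme Robotics makes warehouse robots. The company raised "
           "$12.5 million in a Series A round led by Example Ventures. "
           "It plans to hire engineers.")
facts = extract_facts(article)
print(facts)
```

In this sketch an investor name that fails to match stays `None`, so a caller could route that article to manual review instead of entering it automatically.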
  • 23. 2. Funding override system
      ● Identify reports about the same funding event
      ● Combine information from multiple reports using the Wongi rules engine
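The talk names the Wongi rules engine for this step but does not show its rules. The sketch below swaps in plain Python to illustrate the idea: group reports by company and date proximity, then merge each group with simple rules. The sample reports, the 14-day window, and the merge rules (most frequently reported value wins, investors are unioned, earliest date is kept) are all assumptions, not Mattermark's actual logic.

```python
from collections import Counter
from datetime import date

# Hypothetical reports, shaped as an extraction step might emit them.
REPORTS = [
    {"company": "Acme Robotics", "date": date(2015, 3, 2),
     "amount_usd": 12_000_000, "series": "Series A",
     "investors": ["Example Ventures"]},
    {"company": "Acme Robotics", "date": date(2015, 3, 3),
     "amount_usd": 12_500_000, "series": None,
     "investors": ["Example Ventures", "Sample Capital"]},
    {"company": "Acme Robotics", "date": date(2015, 3, 4),
     "amount_usd": 12_500_000, "series": "Series A", "investors": []},
    {"company": "Beta Labs", "date": date(2015, 3, 3),
     "amount_usd": 2_000_000, "series": "Seed",
     "investors": ["Sample Capital"]},
]

WINDOW_DAYS = 14  # assumed: reports this close together describe one event


def group_reports(reports):
    """Cluster reports by company, starting a new event after a long gap."""
    events = []
    for report in sorted(reports, key=lambda r: (r["company"], r["date"])):
        last = events[-1] if events else None
        if (last and last[-1]["company"] == report["company"]
                and (report["date"] - last[-1]["date"]).days <= WINDOW_DAYS):
            last.append(report)
        else:
            events.append([report])
    return events


def most_common(values):
    """Most frequently reported non-missing value, or None if all are missing."""
    counts = Counter(v for v in values if v is not None)
    return counts.most_common(1)[0][0] if counts else None


def merge_event(reports):
    """Combine every report about one event into a single record."""
    return {
        "company": reports[0]["company"],
        "date": min(r["date"] for r in reports),
        "amount_usd": most_common(r["amount_usd"] for r in reports),
        "series": most_common(r["series"] for r in reports),
        "investors": sorted({n for r in reports for n in r["investors"]}),
    }


events = [merge_event(group) for group in group_reports(REPORTS)]
for event in events:
    print(event)
```

A rules engine would express the same precedence decisions declaratively, which makes them easier for analysts to review than logic buried in code.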
  • 24. 3. Funding confirmation email campaign
      Use CRM and HubSpot to automatically send emails to founders after equity financing.
  • 26. Where we struggled
      Our initial implementation of a funding override system was inefficient. Why? Because our data analysts and developers were not aligned on functional requirements.
  • 27. Solution
      ● Analysts must work closely with developers
        ○ Pre-spec check-ins
        ○ Analysts review dev designs to ensure that the system design addresses the use case.
      ● Analysts must avoid being prescriptive
      ● Analysts must understand data mining and machine learning concepts
  • 28. Where we succeeded
      Implementation of news article fact extraction was successful. Why? Because data analysts and developers worked as service providers to each other.
  • 30. 1. Tighter Analyst + Dev Communication
      Tiger teams: 1 ML developer, 1 web/infrastructure developer, 1 data analyst, 1 project lead
      Define milestones & hold daily stand-ups.
  • 31. 3. Track II interaction reinforces symbiotic relationship
      ● Devs lead Python learning group
      ● Data analysts hold seminars on topics like admin tooling and alternative assets