Euangelos Linardos
Data Scientist @ Pollfish Inc
2nd Athens Data Science Meetup, Athens 17 December 2015
Data at Pollfish
Twitter: @eualin  Email: euangelos@pollfish.com
I AM EUANGELOS LINARDOS
THE CONCEPT
PART I
ABOUT POLLFISH
Pollfish is a mobile survey platform that delivers online surveys globally.
Pollfish ensures your survey reaches just the right audience and provides the most
cost-effective, quick, and accurate survey results.
DIY SURVEY TOOL
PUBLISHERS NETWORK
MORE THAN 170M MOBILE DEVICES ALL OVER THE WORLD
UNIQUE USER EXPERIENCE
A WIN WIN WIN SITUATION
I WIN, YOU WIN
EVERYBODY WINS
REAL-TIME RESULTS
SUPERIOR QUALITY
IT DOESN’T MATTER WHAT WE SAY
CLAIM YOUR FREE COUPON
AND TRY IT NOW
NATURE OF DATA
PART II
MOBILE SURVEYS ARE
A BIG DATA BUSINESS
VOLUME
● UNIQUE USERS:
~2 M daily
~15 M monthly
~170 M total
● DATA TRAFFIC:
~1 TB daily
~26 TB monthly
~210 TB total
* volume = scale of data
THAT’S A LOT OF SELFIES
VARIETY
● survey
● location
● device
● weather
● network
● publisher
● language
● and many more
* variety = different forms of data
PERSONA (200+)
≠
VARIETY
"taxonomy" and "persona" are used
interchangeably throughout this presentation!
[TAXONOMY = FEATURE] [PERSONA = COMBINATION OF FEATURES]
VELOCITY
● ~11 M requests per day; on every request:
detect possible fraudulent activity
predict user action (start, finish, abort)
OF WHICH…
● ~13% account for classifications (new users)
1 update / user / taxonomy
● ~87% account for “traditional” lookups (old users)
1 lookup / user
* velocity = analysis of streaming data
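As a rough back-of-the-envelope check on the velocity figures above, the sketch below converts the approximate daily totals into average per-second rates. It assumes traffic is spread evenly over the day, which real traffic rarely is, so the results are averages rather than peaks. All names are illustrative.

```python
# Approximate figures taken from the VELOCITY slide.
REQUESTS_PER_DAY = 11_000_000
CLASSIFICATION_SHARE = 0.13  # new users
LOOKUP_SHARE = 0.87          # old (returning) users

SECONDS_PER_DAY = 24 * 60 * 60


def per_second(daily_count: float) -> float:
    """Average rate per second, assuming an even spread over the day."""
    return daily_count / SECONDS_PER_DAY


classifications_per_day = REQUESTS_PER_DAY * CLASSIFICATION_SHARE
lookups_per_day = REQUESTS_PER_DAY * LOOKUP_SHARE

print(f"requests/s:        {per_second(REQUESTS_PER_DAY):.0f}")
print(f"classifications/s: {per_second(classifications_per_day):.0f}")
print(f"lookups/s:         {per_second(lookups_per_day):.0f}")
```

Under that even-spread assumption, ~11 M requests per day works out to roughly 127 requests per second on average.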
VERACITY
● survey answers may be inaccurate
● device location data may be misleading
● 3rd party data may be outdated or wrong
* veracity = uncertainty of data
Too much to store on a single computer.
We need a cluster to process it.
This is typically what is called “Big Data”.
Amazing dataset to slice and dice!
DATA PROCESSES
PART III
MAIN DATA OPERATIONS
● Reporting
● Business Analytics
● Operational Analytics
● Product Features
REPORTING
● GROUPS OF INTEREST:
publishers
researchers
● EXAMPLE QUERIES:
# of surveys completed through my app?
# of users who completed my survey?
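The two example queries above could look roughly like the sketch below. It uses an in-memory SQLite database as a stand-in, and the `completions` table with its columns is a hypothetical schema invented for illustration, not Pollfish's actual data model.

```python
import sqlite3

# Hypothetical schema: one row per completed survey.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE completions (app_id TEXT, survey_id TEXT, user_id TEXT)"
)
conn.executemany(
    "INSERT INTO completions VALUES (?, ?, ?)",
    [
        ("app_a", "s1", "u1"),
        ("app_a", "s1", "u2"),
        ("app_a", "s2", "u1"),
        ("app_b", "s1", "u3"),
    ],
)


def completions_for_app(app_id: str) -> int:
    """Publisher view: number of surveys completed through one app."""
    row = conn.execute(
        "SELECT COUNT(*) FROM completions WHERE app_id = ?", (app_id,)
    ).fetchone()
    return row[0]


def users_for_survey(survey_id: str) -> int:
    """Researcher view: distinct users who completed one survey."""
    row = conn.execute(
        "SELECT COUNT(DISTINCT user_id) FROM completions WHERE survey_id = ?",
        (survey_id,),
    ).fetchone()
    return row[0]


print(completions_for_app("app_a"))
print(users_for_survey("s1"))
```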
BUSINESS ANALYTICS
● GROUPS OF INTEREST:
sales and operations
management, executives and investors
● EXAMPLE QUERIES:
count number of (daily, weekly etc.) active users
analyze growth, user behavior, sign-up funnels
company KPIs (Key Performance Indicators)
NPS analysis (Net Promoter Score)
* KPI: evaluate the success of an organization.
* NPS: measure the loyalty of a firm’s customer relationships.
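For the NPS analysis mentioned above, a minimal sketch of the standard calculation follows: on a 0-10 "how likely are you to recommend us" scale, the score is the percentage of promoters (9-10) minus the percentage of detractors (0-6). The sample answers are made up for illustration.

```python
def net_promoter_score(scores: list[int]) -> float:
    """NPS = % promoters (9-10) minus % detractors (0-6), on a 0-10 scale."""
    if not scores:
        raise ValueError("at least one score is required")
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)


# Made-up answers: 5 promoters, 2 passives (7-8), 3 detractors.
sample = [10, 9, 9, 8, 7, 6, 3, 10, 9, 5]
print(net_promoter_score(sample))
```

The result ranges from -100 (all detractors) to +100 (all promoters); passives count only toward the denominator.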
OPERATIONAL ANALYTICS
● GROUPS OF INTEREST:
devops engineers
data engineers
● EXAMPLE QUERIES:
latency analysis: msec to wait for survey after loading the app
capacity planning: server, people, bandwidth etc.
root cause analysis: locates the root causes of faults
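Latency analysis of the kind listed above is often reported as percentiles rather than averages, because a few slow requests can dominate the mean. The sketch below uses the nearest-rank percentile definition on made-up latencies; the sample values and function name are assumptions for illustration only.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("at least one value is required")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Made-up "msec to wait for survey after loading the app" samples.
latencies_ms = [120, 95, 110, 480, 105, 130, 98, 115, 102, 2500]

mean_ms = sum(latencies_ms) / len(latencies_ms)
print(f"mean: {mean_ms:.1f} ms")
print(f"p50:  {percentile(latencies_ms, 50)} ms")
print(f"p95:  {percentile(latencies_ms, 95)} ms")
```

In this sample the mean is pulled far above the median by two slow requests, which is why the percentiles are the more useful view.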
PRODUCT FEATURES
● Data enrichment
● Publisher classification
● Fraud detection
● User personas
● A/B testing
SURVEY PERSONALISATION IS THE FUTURE!
SURVEY
... should fit your mood.
... should fit your activity.
... should be personal!
IF YOU LOOK LIKE THIS #1
Gender: male
Age: 24-34
Marital status: single
Location: california
Interest: sports
salary: 150K
Show PERSONAL
survey! #1
SURVEY SHOULD FOLLOW #1
Gender: male
Age: 24-34
Marital status: single
Location: california
Interest: sports
salary: 150K
interested in
buying the
latest convertible
from
BMW?
IF YOU LOOK LIKE THIS #2
Gender: male
Age: 34-44
Marital status: married
Location: helsinki
Interest: video games
salary: 90K
Show PERSONAL
survey! #2
SURVEY SHOULD FOLLOW #2
Gender: male
Age: 34-44
Marital status: married
Location: helsinki
Interest: video games
salary: 90K
interested in
buying the
latest SUV from
VOLVO?
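The two examples above can be read as simple targeting rules: a survey is shown when a user's profile contains a given combination of feature values. The sketch below is one hypothetical way to express that; the rule contents, field names, and first-match policy are assumptions, not Pollfish's actual targeting logic.

```python
# A persona is modeled here as a combination of feature values (assumption).
Profile = dict[str, str]

# Hypothetical targeting rules loosely based on the two slide examples.
SURVEYS = [
    {
        "question": "Interested in buying the latest convertible from BMW?",
        "target": {
            "marital_status": "single",
            "location": "california",
            "interest": "sports",
        },
    },
    {
        "question": "Interested in buying the latest SUV from Volvo?",
        "target": {"marital_status": "married", "location": "helsinki"},
    },
]


def matches(profile: Profile, target: Profile) -> bool:
    """True when every targeted feature has the required value."""
    return all(profile.get(key) == value for key, value in target.items())


def pick_survey(profile: Profile):
    """Return the first survey question whose target matches, else None."""
    for survey in SURVEYS:
        if matches(profile, survey["target"]):
            return survey["question"]
    return None


PROFILE_1 = {
    "gender": "male", "age": "24-34", "marital_status": "single",
    "location": "california", "interest": "sports",
}
PROFILE_2 = {
    "gender": "male", "age": "34-44", "marital_status": "married",
    "location": "helsinki", "interest": "video games",
}

print(pick_survey(PROFILE_1))
print(pick_survey(PROFILE_2))
```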
OVERCOME THE CHALLENGE
Challenge:
survey data is accurate but limited. How do you scale?
Solution:
dedicated machine learning models using quality survey data.
Pollfish Personas:
targetable groups of consumers with similar characteristics, based on device, location data,
and most importantly, survey answers!
POLLFISH PREDICTORS
Multivariate:
persona probability score calculated based on all available attributes.
Daily Updated:
keep your models current with daily model refreshes.
With Customizable Threshold:
customize threshold for precision or recall.
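To illustrate the threshold trade-off described above, the sketch below turns persona probability scores into yes/no predictions at a given threshold and reports precision and recall. The scores and labels are made up, and returning 0.0 when a denominator is zero is a convention chosen here for simplicity.

```python
def precision_recall(
    scores: list[float], labels: list[bool], threshold: float
) -> tuple[float, float]:
    """Precision and recall when predicting 'in persona' for score >= threshold."""
    predicted = [score >= threshold for score in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    # Convention (assumption): report 0.0 when nothing is predicted/positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up persona probability scores and ground-truth survey answers.
scores = [0.95, 0.85, 0.70, 0.60, 0.40, 0.20]
labels = [True, True, False, True, False, False]

for threshold in (0.8, 0.5):
    precision, recall = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold}: precision={precision:.2f} recall={recall:.2f}")
```

On this toy data, the higher threshold gives perfect precision but misses one true member, while the lower threshold finds every member at the cost of one false positive.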
SYSTEM ARCHITECTURE
PART IV
TO MAKE DATA-DRIVEN DECISIONS
DATA AND INFRASTRUCTURE
ARE REQUIRED (AMONG OTHER THINGS).
HIGH LEVEL ARCHITECTURE
HDFS
● more data usually beats better algorithms
● raw data is:
complicated
often dirty
evolving structure
duplication all over
● getting data to a central point is hard! #NOT
● it's simple! we just throw it all into HDFS!
C*
● a distributed and linearly scalable key-value store
● ideal for time-series data
● provides fast random access for many small pieces of data
● use it for surveys, user profiles, popularity count and almost anything
POSTGRESQL
● we still use it, a lot!
● powering features that require transactions support, integrity constraints, and more
● aggregated data for dashboard and quick analysis
CRITICAL AND CONSISTENCY IMPORTANT? → POSTGRESQL
HUGE, GROWING FAST, EVENTUAL CONSISTENCY OK? → CASSANDRA
RAW AND HISTORICAL? → HDFS
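The three rules of thumb above can be written down as a small decision function. This is only a sketch: the `Dataset` fields, the order in which the rules are checked, and the "undecided" fallback are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a dataset's storage requirements."""
    name: str
    critical_and_consistent: bool = False
    huge_and_fast_growing: bool = False
    eventual_consistency_ok: bool = False
    raw_or_historical: bool = False


def choose_store(dataset: Dataset) -> str:
    """Apply the slide's rules of thumb in order (the ordering is an assumption)."""
    if dataset.critical_and_consistent:
        return "PostgreSQL"
    if dataset.huge_and_fast_growing and dataset.eventual_consistency_ok:
        return "Cassandra"
    if dataset.raw_or_historical:
        return "HDFS"
    return "undecided"


billing = Dataset("billing", critical_and_consistent=True)
profiles = Dataset(
    "user_profiles", huge_and_fast_growing=True, eventual_consistency_ok=True
)
request_logs = Dataset("request_logs", raw_or_historical=True)

for dataset in (billing, profiles, request_logs):
    print(f"{dataset.name} -> {choose_store(dataset)}")
```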
AZKABAN
● allows us to build pipelines of batch jobs
● handles dependency resolution, workflow management, visualisation and more
● an alternative to Luigi and Oozie
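Dependency resolution, as mentioned above, amounts to running each batch job only after the jobs it depends on have finished. The sketch below shows the idea with a topological sort from Python's standard library (Python 3.9+); it does not use Azkaban's own job format or API, and the job names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical batch jobs, each mapped to the set of jobs it depends on.
JOBS = {
    "ingest_raw_events": set(),
    "deduplicate": {"ingest_raw_events"},
    "aggregate_daily": {"deduplicate"},
    "train_persona_models": {"deduplicate"},
    "publish_dashboard": {"aggregate_daily"},
}


def execution_order(jobs: dict[str, set[str]]) -> list[str]:
    """Return one valid run order in which every job follows its dependencies.

    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(jobs).static_order())


order = execution_order(JOBS)
print(order)
```

Several orders can be valid for the same graph, so the printed order is just one of them; a workflow scheduler would additionally run independent jobs in parallel.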
SPARK
● general cluster computing platform:
distributed in-memory computational framework
SQL, Machine Learning, Stream Processing, etc.
● easy to use, powerful, high-level API:
Scala, Java, Python and R
TIPS FOR DEVELOPING DATA PRODUCTS
● Collect data, data, DATA!!!
● Large amounts of data can reveal new patterns
● Be careful of “black box” approaches
● Look at your raw data (exploratory analysis)
● Aggregate statistics can be misleading
● Visualize your data
● Include data geeks in design process
● Find opportunity in your error data
Thank you
(we’re hiring):
https://pollfish.workable.com/
