Real-time big data analytics
based on product recommendations case study
IT Business Solutions B2B Conference
October 2015
© deep.bi
We started as an ad network
The challenge was to recommend
the best product (out of millions)
to the right person at the right moment
(thousands of users within a second)
5 billion ad views delivered in 24 months
To put this in context:
serving 1 ad per second, it would take about
160 years
to serve 5 billion ads
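A quick check of the arithmetic above. The sketch assumes a 365.25-day year and treats "24 months" as two such years; it also derives the average serving rate that the figures imply.

```python
# Back-of-envelope check of the scale figures on this slide.
# Assumption: a year is 365.25 days; "24 months" is treated as two such years.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # 31,557,600 s

ads_served = 5_000_000_000

# At 1 ad per second, how many years would 5 billion ads take?
years_at_one_per_second = ads_served / SECONDS_PER_YEAR

# What average rate does 5 billion ads in 24 months imply?
average_ads_per_second = ads_served / (2 * SECONDS_PER_YEAR)

print(f"{years_at_one_per_second:.1f} years")     # 158.4 years
print(f"{average_ads_per_second:.1f} ads/second")  # 79.2 ads/second
```

So "160 years" is a rounded figure, and the 24-month total corresponds to roughly 80 ads per second on average, around the clock.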
So we needed a solution
SQL databases did not work
Popular NoSQL databases did not work
Standard data warehouse approaches (pre-aggregations, creating schemas) did not work
Rethinking all of these problems, with huge data streams
flowing to us every second,
we built a complete solution
based on open-source technologies
and fresh, smart ideas from our engineering team.
It is called deep.bi,
and now we are making it available to other companies.
DEEP.BI = BIG DATA + FAST DATA SOLUTION
(high volume, high velocity)
deep.bi lets high-growth companies
solve fast data problems by providing
scalable, flexible and real-time
data collection, enrichment and analytics
deep.bi – complete data processing flow
•  Collect: unstructured, raw data from many sources (page views, IoT events, IP, URL, cookie, transactions, call detail records, etc.)
•  Enrich: data enrichment, transformation and integration
•  Analyze: find patterns, build models, predict behavior
How to predict the best offer
based on online data – case study.
Collect website, campaign and CRM data
•  Website: Google Analytics
•  Campaigns: agency reports
•  Apps: dedicated monitoring tools
•  Other systems: call center IVR, emails
Data is stored in silos. Reporting tools provide aggregated reports that are impossible to integrate around a single customer.
Instead of integrating the current reporting tools, we need to gather every single event our customers generate.
Collecting raw web data is not enough
2015-05-15T00:26:41.328Z,3,D,[ip_hidden],i1xszg0f-19hqrje,"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 Safari/537.36","[url_hidden]",7279848891,@906,"https://www.google.pl/",vuser-history-allegro-1-hc20150509.1,"122_100003_Park@700:html_620x100_single_banner:See offer"
A raw event gives us only IP, URL, cookie, user-agent and timestamp.
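Before any enrichment, the raw line has to be split into fields. A minimal sketch using Python's `csv` module, which handles the quoted user-agent containing commas. The column positions in `COLUMNS` are an assumption inferred from the sample line; the slide does not document the log schema.

```python
import csv
import io

# The raw event line shown on the slide (IP and URL are masked in the source).
RAW_LINE = ('2015-05-15T00:26:41.328Z,3,D,[ip_hidden],i1xszg0f-19hqrje,'
            '"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) '
            'Chrome/42.0.2311.152 Safari/537.36","[url_hidden]",7279848891,@906,'
            '"https://www.google.pl/",vuser-history-allegro-1-hc20150509.1,'
            '"122_100003_Park@700:html_620x100_single_banner:See offer"')

# Assumed column positions (hypothetical mapping, not a documented schema).
COLUMNS = {"timestamp": 0, "ip": 3, "cookie": 4, "user_agent": 5, "url": 6, "referrer": 9}

def parse_raw_event(line: str) -> dict:
    """Split one CSV log line (quoted fields may contain commas) into named fields."""
    fields = next(csv.reader(io.StringIO(line)))
    return {name: fields[idx] for name, idx in COLUMNS.items()}

event = parse_raw_event(RAW_LINE)
print(event["cookie"])    # i1xszg0f-19hqrje
print(event["referrer"])  # https://www.google.pl/
```

These five or six fields are all a raw event carries; everything on the following slides has to be derived from them.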
Enrich raw web and mobile data
50+ pieces of information from one interaction:
•  Purchase intent
•  Device
•  Time
•  Location
•  ISP
•  Online context
•  Weather*
•  Demographics
* Coming soon
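Time is the simplest of the enrichment dimensions listed above: several features follow from the timestamp alone. A minimal sketch; the feature names and part-of-day boundaries are hypothetical, and the values are in UTC (local time would additionally need the time zone from IP enrichment).

```python
from datetime import datetime

def enrich_time(timestamp: str) -> dict:
    """Derive simple 'Time' features from one ISO-8601 UTC timestamp.

    Feature names and the part-of-day boundaries are illustrative assumptions.
    """
    # %z accepts a trailing "Z" (UTC) on Python 3.7+.
    ts = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%f%z")
    return {
        "hour_utc": ts.hour,
        "weekday": ts.strftime("%A"),
        "is_weekend": ts.weekday() >= 5,  # Saturday=5, Sunday=6
        "part_of_day": ("night" if ts.hour < 6 else "morning" if ts.hour < 12
                        else "afternoon" if ts.hour < 18 else "evening"),
    }

print(enrich_time("2015-05-15T00:26:41.328Z"))
# {'hour_utc': 0, 'weekday': 'Friday', 'is_weekend': False, 'part_of_day': 'night'}
```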
We can learn quite a few things from user IP:
•  Country
•  Region
•  City
•  ZIP Code
•  Population
•  Latitude & Longitude
•  Time zone
•  IDD prefix to call the city from another country
•  Phone area code
•  Mobile Country Code (MCC)
•  Mobile Network Code (MNC)
•  Elevation
•  Weather at the moment of the event
Example use:
•  international travellers
•  townspeople
•  people in mountains
•  rainy day
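IP enrichment is usually a range lookup: geolocation databases map contiguous IP ranges to attributes. A minimal sketch of that lookup with binary search over sorted range starts. The two-row table is hypothetical (it uses RFC 5737 documentation address blocks and made-up city names); a real deployment would load a full IP-geolocation database.

```python
import bisect
import ipaddress

# Hypothetical geo table: (first IP, last IP, attributes). Not real data.
GEO_RANGES = sorted([
    ("192.0.2.0", "192.0.2.255",
     {"country": "PL", "city": "Example City A", "population": 1_200_000}),
    ("198.51.100.0", "198.51.100.255",
     {"country": "US", "city": "Example City B", "population": 300_000}),
], key=lambda r: int(ipaddress.ip_address(r[0])))

# Range start addresses as integers, sorted, for binary search.
STARTS = [int(ipaddress.ip_address(start)) for start, _, _ in GEO_RANGES]

def lookup_ip(ip: str):
    """Return the attributes of the range containing `ip`, or None."""
    value = int(ipaddress.ip_address(ip))
    # Index of the last range whose start is <= value.
    i = bisect.bisect_right(STARTS, value) - 1
    if i >= 0 and value <= int(ipaddress.ip_address(GEO_RANGES[i][1])):
        return GEO_RANGES[i][2]
    return None

print(lookup_ip("192.0.2.17"))  # attributes of Example City A
```

Keeping the ranges sorted makes each lookup O(log n), which matters when every incoming event is enriched in the stream.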
ISP tells us more than we might expect:
•  ISP name or organization name
•  Organization type:
   -  Commercial
   -  Organization
   -  Government
   -  Military
   -  University/College/School
   -  Library
   -  Content Delivery Network
   -  Fixed Line ISP
   -  Mobile ISP
   -  Data Center/Web Hosting/Transit
   -  Search Engine Spider
   -  Reserved
•  Mobile brand
•  Net speed
Example use:
•  competitors’ users -> acquisition
•  our users -> retention/up-selling/cross-selling
•  people from a particular company or company type
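The example uses above can be expressed as a small rule table over the ISP enrichment fields. The sketch assumes the client is itself an ISP or telco; the ISP names, the set of targeted organization types and the function name `marketing_goal` are all hypothetical.

```python
# Hypothetical ISP names; in practice they come from the IP/ISP enrichment step.
OUR_ISPS = {"ExampleTelco Mobile", "ExampleTelco Fixed"}
COMPETITOR_ISPS = {"RivalNet", "OtherTel"}
# Assumed organization types worth targeting as "company or company type".
TARGET_ORG_TYPES = {"Commercial", "Government", "University/College/School"}

def marketing_goal(isp_name: str, org_type: str) -> str:
    """Map ISP enrichment fields to one of the example uses on the slide."""
    if isp_name in OUR_ISPS:
        return "retention/up-selling/cross-selling"
    if isp_name in COMPETITOR_ISPS:
        return "acquisition"
    if org_type in TARGET_ORG_TYPES:
        return "target by organization type"
    return "no ISP-based rule"

print(marketing_goal("RivalNet", "Fixed Line ISP"))  # acquisition
```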
Detailed information about the user's device:
•  Device type
•  Device brand
•  Device model
•  Device operating system
•  Operating system producer
•  Browser
•  Browser producer
Example use:
•  smartphone users
•  Apple users
•  Samsung Galaxy users
•  Google Chrome users
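Most device attributes come from the user-agent string. A deliberately small, regex-based sketch applied to the user-agent from the JSON event in this deck; the model-prefix-to-brand table and the "Mobile means smartphone" rule are simplifying assumptions, and production systems rely on a maintained device database because user-agent strings are irregular.

```python
import re

UA = ("Mozilla/5.0 (Linux; Android 4.2.2; GT-S7580 Build/JDQ39) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/43.0.2357.93 Mobile Safari/537.36")

# Assumption: a tiny model-prefix -> brand table, for illustration only.
BRAND_BY_MODEL_PREFIX = {"GT-": "Samsung", "SM-": "Samsung"}

def parse_user_agent(ua: str) -> dict:
    # Heuristic: Android phones include "Mobile"; tablets and desktops usually do not.
    info = {"os": None, "model": None, "brand": None, "browser": None,
            "device_type": "smartphone" if "Mobile" in ua else "desktop/tablet"}
    android = re.search(r"Android ([\d.]+); ([^;)]+?) Build/", ua)
    if android:
        info["os"] = f"Android {android.group(1)}"
        info["model"] = android.group(2)
        info["brand"] = next((brand for prefix, brand in BRAND_BY_MODEL_PREFIX.items()
                              if info["model"].startswith(prefix)), None)
    chrome = re.search(r"Chrome/([\d.]+)", ua)
    if chrome:
        info["browser"] = f"Chrome {chrome.group(1)}"
    return info

print(parse_user_agent(UA))
```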
Besides user features, track user behavior too.
Deeper understanding of people’s behavior:
•  RFM Segmentation (Recency, Frequency, Monetary)
•  Shopping cart analysis
•  Purchase sequence analysis
User behavior and characteristics
help predict the next best action/offer:
What product should we recommend?
How could this purchase path end?
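RFM segmentation, listed above, starts from three raw numbers per user. A minimal sketch computing them from a list of purchases; the `(user_id, purchase_date, amount)` tuple layout is an assumption, and turning the raw values into scores (for example quintiles) would be a separate step.

```python
from collections import defaultdict
from datetime import date

def rfm(transactions, today):
    """Compute raw Recency (days), Frequency (count) and Monetary (sum) per user.

    `transactions` is an iterable of (user_id, purchase_date, amount) tuples;
    this layout is an assumption for the sketch.
    """
    last_purchase, count, total = {}, defaultdict(int), defaultdict(float)
    for user, day, amount in transactions:
        last_purchase[user] = max(day, last_purchase.get(user, day))
        count[user] += 1
        total[user] += amount
    return {u: {"recency": (today - last_purchase[u]).days,
                "frequency": count[u],
                "monetary": total[u]} for u in last_purchase}

tx = [
    ("u1", date(2015, 9, 1), 120.0),
    ("u1", date(2015, 10, 1), 80.0),
    ("u2", date(2015, 3, 15), 999.0),
]
scores = rfm(tx, today=date(2015, 10, 15))
print(scores["u1"])  # {'recency': 14, 'frequency': 2, 'monetary': 200.0}
```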
So, how to build tailored recommendations?
Pick an algorithm that is suitable for the problem. The data can be represented as:
Product [feature_1, feature_2, …, feature_N]
User [feature_1, feature_2, …, feature_N]
User [product_1, product_2, …, product_N]
Manual / people-managed rules:
•  Simple rules: if a user has some features, serve this group of products
•  Manual segment creation: analysts find segments of users and match them with product segments
•  Simple feature matching: take the user's weighted feature vector and match it against product feature vectors
Machine learning-supported recommendations:
•  Find segments automatically (e.g. k-means)
•  Product features based recommendations
•  User features based recommendations
•  Combined product and user based recommendations (collaborative filtering, deep learning)
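The "simple feature matching" approach can be sketched as cosine similarity between a user's weighted feature vector and each product's feature vector. The four features, their weights and the product names are made up for illustration; the slides do not specify how deep.bi performs the matching.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def recommend(user_vector, products, top_n=2):
    """Rank products by cosine similarity to the user's weighted feature vector."""
    ranked = sorted(products.items(), key=lambda kv: cosine(user_vector, kv[1]), reverse=True)
    return [name for name, _ in ranked[:top_n]]

# Hypothetical features: [smartphone interest, Samsung affinity, travel, price sensitivity]
user = [0.9, 0.8, 0.1, 0.2]
products = {
    "galaxy_phone": [1.0, 1.0, 0.0, 0.3],
    "travel_insurance": [0.0, 0.0, 1.0, 0.5],
    "budget_phone": [1.0, 0.0, 0.0, 1.0],
}
print(recommend(user, products))  # ['galaxy_phone', 'budget_phone']
```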
Recommendations long tail phenomenon
[Chart: product popularity plotted across products; the most interesting recommendations lie in the long tail]
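One common heuristic for surfacing long-tail items is to discount each candidate's relevance by its popularity. This is an illustrative assumption, not the method described in the deck; the formula and the `alpha` tuning knob are hypothetical.

```python
import math

def rerank_for_long_tail(candidates, popularity, alpha=0.5):
    """Re-rank (item, relevance) pairs, demoting very popular items.

    score = relevance / (1 + log(1 + popularity)) ** alpha
    alpha = 0 keeps the original relevance order; larger alpha favors the long tail.
    """
    def score(item_rel):
        item, relevance = item_rel
        return relevance / (1 + math.log1p(popularity.get(item, 0))) ** alpha
    return [item for item, _ in sorted(candidates, key=score, reverse=True)]

candidates = [("bestseller", 0.9), ("niche_a", 0.8), ("niche_b", 0.5)]
popularity = {"bestseller": 100_000, "niche_a": 50, "niche_b": 10}
print(rerank_for_long_tail(candidates, popularity))  # ['niche_a', 'niche_b', 'bestseller']
```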
Technology behind Deep BI
Problems with SQL and NoSQL databases
•  Complex data model for query optimization:
   -  split dimensions into several tables based on the reports being built
   -  cherry-pick in advance the dimensions we can aggregate, based on their cardinality
   -  indexing every dimension column is a must
•  Impossible to add high-cardinality dimensions:
   -  no way to analyze per user (there are millions of them)
   -  no way to even add all of user-agent, URL, geo-info, …
Problems with SQL and NoSQL databases (continued)
•  Complex data loading process:
   -  needs to pre-aggregate in memory
   -  non-trivial reliability issues
   -  hard to parallelize
•  There is always latency:
   -  pre-aggregation happens in the loading job's memory
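Why high-cardinality dimensions defeat pre-aggregation: a rollup keeps one row per distinct combination of dimension values. The synthetic data below (hypothetical dimensions and sizes) shows that adding a per-user cookie makes the "aggregate" nearly as large as the raw events.

```python
import random

def rollup_rows(events, dimensions):
    """Number of rows left after pre-aggregating (rolling up) on `dimensions`."""
    return len({tuple(e[d] for d in dimensions) for e in events})

random.seed(42)
# Synthetic events: two low-cardinality dimensions plus a per-user cookie.
events = [{
    "country": random.choice(["PL", "US", "DE"]),
    "device": random.choice(["smartphone", "desktop", "tablet"]),
    "cookie": f"user-{random.randrange(50_000)}",
} for _ in range(100_000)]

print(rollup_rows(events, ["country", "device"]))            # 9
print(rollup_rows(events, ["country", "device", "cookie"]))  # tens of thousands
```

With only low-cardinality dimensions, 100,000 events collapse into 9 rows; with the cookie added, almost nothing is saved.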
deep.bi – real-time big data architecture
•  Sources: event sources* (raw data stream) and customer databases (ETL)
•  Real-time data ingestion: Kafka
•  Data transformation & enrichment: Node.js, Spark Streaming (producing the transformed data stream)
•  Storage:
   -  Real-time OLAP store: Druid
   -  Operational store: Cassandra (high-performance, multi-purpose storage)
   -  Raw data store: Hadoop, Parquet, Spark
•  Access: deep.bi API, web analytics dashboard, customer analytics dashboard
*e.g. mobile apps, websites, marketing campaigns, IoT (beacons, wearables)
Data Collection APIs
•  Web Data Collection API (HTML or JS): trackers in the end-user browser pass event data with the <DEEP tracker>
•  Mobile Data Collection API (HTML, JS or Native SDK): trackers pass event data with the <DEEP tracker>
Both send events to the DEEP Ingestion API; data enrichment, storage & analytics then take place in the client’s DEEP Data Space.
Events are represented as JSON, with its full flexibility:
{
  "data": {
    "event_type": "CLICK",
    "ad_request_event": {
      "ctx": {
        "event_time": "2015-07-10T06:15:50.819Z",
        "ip_address": "XX.XX.XX.XX",
        "geo_info": {
          "country": "US", "region": "California", "city": "San Francisco",
          "timezone": "PST", "isp": "XXX",
          "population": 849774
        },
        "page": {
          "raw_url": "XXX",
          "standardized_domain": "XXX"
        },
        "page_info": {
          "page_raw_url": "XXX",
          "product_categories": [
            { "id": 20585 },
            { "id": 100126 }
          ]
        },
        "cookie": "ibx8axlw-17j287o",
        "user_agent": "Mozilla/5.0 (Linux; Android 4.2.2; GT-S7580 Build/JDQ39) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.93 Mobile Safari/537.36"
      }
    }
  }
}
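Nested events like the one above are typically flattened into a single denormalized row before being loaded into a column store such as Druid. A minimal sketch that turns nested objects into dotted column names; keeping lists as plain values is a simplifying design choice, and the event is a trimmed copy of the example.

```python
import json

def flatten(obj, prefix=""):
    """Flatten nested JSON objects into one row with dotted keys.

    Lists are kept as values; how to explode them is a separate design choice.
    """
    row = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            row.update(flatten(value, name))
        else:
            row[name] = value
    return row

event = json.loads("""{"data": {"event_type": "CLICK", "ad_request_event": {"ctx": {
    "event_time": "2015-07-10T06:15:50.819Z",
    "geo_info": {"country": "US", "city": "San Francisco", "population": 849774},
    "cookie": "ibx8axlw-17j287o"}}}}""")
row = flatten(event)
print(row["data.ad_request_event.ctx.geo_info.city"])  # San Francisco
```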
Apache Kafka
•  Publish-subscribe service
•  The nervous system of enterprise data:
   -  decouples producers from consumers
   -  reliably buffers data
   -  send now, process later
•  Scalable, distributed, replicated log system
•  Pause components, restart processing
•  Powered by web giants like LinkedIn, Twitter, Netflix, Uber, Spotify and Pinterest
•  >10M messages/second
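The "pause components, restart processing" property comes from the log model: records are appended once, and each consumer group tracks its own committed offset. The toy class below is an in-memory model of that idea only. It is not the Kafka client API, and the class and method names are hypothetical.

```python
class ToyLog:
    """In-memory model of a Kafka-style append-only log with per-group offsets."""

    def __init__(self):
        self.records = []
        self.committed = {}  # consumer group -> next offset to read

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def poll(self, group, max_records=10):
        start = self.committed.get(group, 0)
        return start, self.records[start:start + max_records]

    def commit(self, group, next_offset):
        self.committed[group] = next_offset

log = ToyLog()
for i in range(5):
    log.append({"event": i})

# The enrichment consumer reads 3 records, commits, then is paused.
start, batch = log.poll("enrichment", max_records=3)
log.commit("enrichment", start + len(batch))

# The producer keeps writing while the consumer is paused.
log.append({"event": 5})

# On restart, the consumer resumes from its committed offset.
start, batch = log.poll("enrichment")
print(start, [r["event"] for r in batch])  # 3 [3, 4, 5]
```

Because the producer never waits for consumers, a second group (for example a raw-data archiver) can read the same records independently from offset 0.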
Apache Spark Streaming
•  Scalable, fault-tolerant stream processing system
•  Simple programming model, rich API and integrations
•  Powered by: Yahoo, Netflix, eBay, NASA, Intel, Cisco
•  It is our fundamental technology for streaming applications:
   -  sessionize events
   -  detect fraud
   -  attribute purchases to clicks or views
   -  load & read external stores like Druid, Hadoop, Cassandra
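Sessionization, the first use listed above, groups each user's events into sessions separated by inactivity. A plain-Python sketch of the logic; the 30-minute timeout is a common convention assumed here, and in Spark Streaming the same idea would run as keyed, stateful processing per cookie.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity timeout

def sessionize(events):
    """Group (cookie, datetime) events into sessions per cookie.

    A gap longer than SESSION_GAP between consecutive events starts a new session.
    Returns {cookie: [[ts, ts, ...], [ts, ...], ...]}.
    """
    sessions = {}
    for cookie, ts in sorted(events):
        user_sessions = sessions.setdefault(cookie, [])
        if user_sessions and ts - user_sessions[-1][-1] <= SESSION_GAP:
            user_sessions[-1].append(ts)  # continue the current session
        else:
            user_sessions.append([ts])    # start a new session
    return sessions

events = [
    ("a", datetime(2015, 10, 1, 10, 0)),
    ("b", datetime(2015, 10, 1, 10, 5)),
    ("a", datetime(2015, 10, 1, 11, 0)),
    ("a", datetime(2015, 10, 1, 10, 10)),
]
result = sessionize(events)
print(len(result["a"]), len(result["b"]))  # 2 1
```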
Druid – Real-time OLAP Store
•  Open-source streaming data store for interactive analytics at scale
•  Denormalized data: no more snowflake or star schema!
•  Build real-time dashboards, analytic applications and exploratory tools on it
•  It’s FAST!
   -  aggregate, drill down, slice-n-dice in under a second
   -  advanced column store with compression
   -  sophisticated approximate algorithms
•  It’s SCALABLE
   -  horizontally scalable: just add more machines
   -  replicated, highly available
   -  over 100 PB of data, millions of events/second
Druid – Real-time OLAP Store (continued)
•  Ingests historical & real-time data:
   -  data available for exploration in milliseconds
   -  can store years of data in highly optimized storage
•  Powered by: eBay, Netflix, PayPal, Yahoo, Cisco
•  It is our core data store for all events, historical and real-time
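A slice-n-dice question such as "top 10 cities among Samsung users on one day" maps onto a Druid native topN query. The sketch only builds the query as JSON; the data source name `events`, the dimension names and the interval are hypothetical, and the query would be POSTed to a Druid Broker's `/druid/v2/` endpoint.

```python
import json

# Druid native topN query; dataSource, dimensions and interval are hypothetical.
query = {
    "queryType": "topN",
    "dataSource": "events",
    "intervals": ["2015-07-10T00:00:00Z/2015-07-11T00:00:00Z"],
    "granularity": "all",
    "dimension": "city",
    "metric": "count",      # rank cities by the "count" aggregator below
    "threshold": 10,        # top 10
    "filter": {"type": "selector", "dimension": "device_brand", "value": "Samsung"},
    "aggregations": [{"type": "count", "name": "count"}],
}

print(json.dumps(query, indent=2))
```

Because the data is denormalized, the filter and the grouping dimension are plain columns; no joins to dimension tables are needed.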
Hadoop Optimized
•  Apache Spark for batch processing: a fast, general engine for large-scale data processing
   -  replaces MapReduce, being up to 10x-100x faster!
   -  the number 1 open-source project in the big data space (by contributors and commits)
   -  in-memory processing (where possible)
   -  Spark SQL for SQL processing
•  Apache Parquet: an optimized storage format
   -  columnar: read only the columns you need
   -  compressed: specialized compression per data type + generic compression
   -  2x-4x smaller: 600 GB of data -> 150 GB
•  Hadoop can be optimized by two orders of magnitude: from hours to seconds!
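Part of Parquet's size reduction comes from type-aware encodings applied per column, such as dictionary encoding followed by run-length encoding for repetitive values. The sketch below is a simplified illustration of those two ideas, not Parquet's on-disk format.

```python
def dictionary_encode(values):
    """Replace each value with a small integer id plus a lookup dictionary."""
    dictionary, ids = {}, []
    for v in values:
        ids.append(dictionary.setdefault(v, len(dictionary)))
    return list(dictionary), ids

def run_length_encode(ids):
    """Collapse runs of equal ids into [id, run_length] pairs."""
    runs = []
    for i in ids:
        if runs and runs[-1][0] == i:
            runs[-1][1] += 1
        else:
            runs.append([i, 1])
    return runs

def decode(dictionary, runs):
    """Expand runs back into the original column values."""
    return [dictionary[i] for i, n in runs for _ in range(n)]

column = ["US"] * 4 + ["PL"] * 3 + ["US"] * 2
dictionary, ids = dictionary_encode(column)
runs = run_length_encode(ids)
print(dictionary, runs)  # ['US', 'PL'] [[0, 4], [1, 3], [0, 2]]
```

Storing a column's values together is what makes this work: a "country" column is highly repetitive, while a row-oriented layout interleaves it with unrelated fields.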
Thank you!
Share your thoughts, challenges
or case studies with us.
Or drop us a line: hello@deep.bi
Backup slides
Complexity of multidimensional queries
Let’s assume we want to find users who:
•  were interested in smartphones
•  use Samsung products
•  live in cities with a population over 1M people
•  are women
•  were traveling abroad
•  came from our display campaign
So we have a combination of k = 6 dimensions out of n = 50. Using the combination formula C(n, k) = n! / (k! (n - k)!), we get C(50, 6) = 15,890,700 possible combinations, a similar number to Lotto (6 out of 49, i.e. 13,983,816).
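The combination counts can be checked directly with `math.comb` (Python 3.8+).

```python
from math import comb

# 6 chosen dimensions out of 50 available, versus Lotto's 6 numbers out of 49.
print(comb(50, 6))  # 15890700
print(comb(49, 6))  # 13983816
```

Each of those combinations is a different query a pre-aggregated model would have to anticipate, which is why fixed pre-aggregations do not scale to ad-hoc multidimensional analysis.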

CWIN17 Frankfurt / data_stax_personalisatontopowercx
 
Meetup Toulouse Microsoft Azure : Bâtir une solution IoT
Meetup Toulouse Microsoft Azure : Bâtir une solution IoTMeetup Toulouse Microsoft Azure : Bâtir une solution IoT
Meetup Toulouse Microsoft Azure : Bâtir une solution IoT
 
In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017In-Memory Computing Webcast. Market Predictions 2017
In-Memory Computing Webcast. Market Predictions 2017
 
Unlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data LakeUnlocking Operational Intelligence from the Data Lake
Unlocking Operational Intelligence from the Data Lake
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
 

Recently uploaded

The work to make the piecework work: An ethnographic study of food delivery w...
The work to make the piecework work: An ethnographic study of food delivery w...The work to make the piecework work: An ethnographic study of food delivery w...
The work to make the piecework work: An ethnographic study of food delivery w...stockholm university
 
Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024
Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024
Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024BookNet Canada
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessWSO2
 
Deliver Latency Free Customer Experience
Deliver Latency Free Customer ExperienceDeliver Latency Free Customer Experience
Deliver Latency Free Customer ExperienceOpsTree solutions
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Tetracrom printing process for packaging with CMYK+
Tetracrom printing process for packaging with CMYK+Tetracrom printing process for packaging with CMYK+
Tetracrom printing process for packaging with CMYK+Antonio de Llamas
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxAna-Maria Mihalceanu
 
WomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneWomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneUiPathCommunity
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Transcript: Green paths: Learning from publishers’ sustainability journeys - ...
Transcript: Green paths: Learning from publishers’ sustainability journeys - ...Transcript: Green paths: Learning from publishers’ sustainability journeys - ...
Transcript: Green paths: Learning from publishers’ sustainability journeys - ...BookNet Canada
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesBernd Ruecker
 
Transport in Open Pits______SM_MI10415MI
Transport in Open Pits______SM_MI10415MITransport in Open Pits______SM_MI10415MI
Transport in Open Pits______SM_MI10415MIRomil Mishra
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Dynamical Context introduction word sensibility orientation
Dynamical Context introduction word sensibility orientationDynamical Context introduction word sensibility orientation
Dynamical Context introduction word sensibility orientationBuild Intuit
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
Which standard is best for your content?
Which standard is best for your content?Which standard is best for your content?
Which standard is best for your content?Rustici Software
 
Introduction-to-Wazuh-and-its-integration.pptx
Introduction-to-Wazuh-and-its-integration.pptxIntroduction-to-Wazuh-and-its-integration.pptx
Introduction-to-Wazuh-and-its-integration.pptxmprakaash5
 

Recently uploaded (20)

The work to make the piecework work: An ethnographic study of food delivery w...
The work to make the piecework work: An ethnographic study of food delivery w...The work to make the piecework work: An ethnographic study of food delivery w...
The work to make the piecework work: An ethnographic study of food delivery w...
 
Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024
Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024
Green paths: Learning from publishers’ sustainability journeys - Tech Forum 2024
 
Accelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with PlatformlessAccelerating Enterprise Software Engineering with Platformless
Accelerating Enterprise Software Engineering with Platformless
 
Deliver Latency Free Customer Experience
Deliver Latency Free Customer ExperienceDeliver Latency Free Customer Experience
Deliver Latency Free Customer Experience
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Tetracrom printing process for packaging with CMYK+
Tetracrom printing process for packaging with CMYK+Tetracrom printing process for packaging with CMYK+
Tetracrom printing process for packaging with CMYK+
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
A Glance At The Java Performance Toolbox
A Glance At The Java Performance ToolboxA Glance At The Java Performance Toolbox
A Glance At The Java Performance Toolbox
 
WomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyoneWomenInAutomation2024: AI and Automation for eveyone
WomenInAutomation2024: AI and Automation for eveyone
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Transcript: Green paths: Learning from publishers’ sustainability journeys - ...
Transcript: Green paths: Learning from publishers’ sustainability journeys - ...Transcript: Green paths: Learning from publishers’ sustainability journeys - ...
Transcript: Green paths: Learning from publishers’ sustainability journeys - ...
 
QCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architecturesQCon London: Mastering long-running processes in modern architectures
QCon London: Mastering long-running processes in modern architectures
 
Transport in Open Pits______SM_MI10415MI
Transport in Open Pits______SM_MI10415MITransport in Open Pits______SM_MI10415MI
Transport in Open Pits______SM_MI10415MI
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Dynamical Context introduction word sensibility orientation
Dynamical Context introduction word sensibility orientationDynamical Context introduction word sensibility orientation
Dynamical Context introduction word sensibility orientation
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
Which standard is best for your content?
Which standard is best for your content?Which standard is best for your content?
Which standard is best for your content?
 
BoSEU24 | Bill Thompson | Talk From Another Century
BoSEU24 | Bill Thompson | Talk From Another CenturyBoSEU24 | Bill Thompson | Talk From Another Century
BoSEU24 | Bill Thompson | Talk From Another Century
 
Introduction-to-Wazuh-and-its-integration.pptx
Introduction-to-Wazuh-and-its-integration.pptxIntroduction-to-Wazuh-and-its-integration.pptx
Introduction-to-Wazuh-and-its-integration.pptx
 

Real-time big data analytics based on product recommendations case study

  • 1. Real-time big data analytics based on product recommendations: a case study. IT Business Solutions B2B Conference, October 2015. © deep.bi
  • 2. We started as an ad network. The challenge was to recommend the best product (out of millions) to the right person at a given moment (thousands of users within a second).
  • 3. 5 billion ad views delivered in 24 months
  • 4. To put it in context: if we served 1 ad per second, it would take about 160 years to serve 5 billion ads.
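The "160 years" figure is plain arithmetic. A minimal Python check, assuming a constant rate of one ad per second; the 360-day variant mirrors the 30-day months used in the speaker note's calculation, and both round to "about 160 years":

```python
SECONDS_PER_DAY = 60 * 60 * 24


def years_to_serve(ads: int, ads_per_second: float = 1.0, days_per_year: float = 365.0) -> float:
    """Years needed to serve `ads` impressions at a constant `ads_per_second` rate."""
    seconds = ads / ads_per_second
    return seconds / (SECONDS_PER_DAY * days_per_year)


if __name__ == "__main__":
    print(round(years_to_serve(5_000_000_000), 1))                    # 365-day years -> 158.5
    print(round(years_to_serve(5_000_000_000, days_per_year=360), 1))  # 12 x 30-day months -> 160.8
```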
  • 5. So we needed a solution. SQL databases did not work. Popular NoSQL databases did not work. Standard data warehouse approaches (pre-aggregations, creating schemas) did not work.
  • 6. Re-thinking all the problems with huge data streams flowing to us every second, we built a complete solution based on open-source technologies and fresh, smart ideas from our engineering team. It is called deep.bi, and now we are making it available to other companies.
  • 7. DEEP.BI = BIG DATA / FAST DATA SOLUTION: high volume, high velocity
  • 8. deep.bi lets high-growth companies solve fast data problems by providing scalable, flexible and real-time data collection, enrichment and analytics
  • 9. deep.bi – complete data processing flow: collect (unstructured, raw data from many sources: page views, IoT events, IP, URL, cookie, transactions, call detail records, etc.) -> enrich (data enrichment, transformation and integration) -> analyze (find patterns, build models, predict behavior)
  • 10. How to predict the best offer based on online data – case study.
  • 11. Collect website, campaign and CRM data. Website: Google Analytics. Campaigns: agency reports. Apps: dedicated monitoring tools. Other systems: call center IVR, emails. Data is stored in silos, and reporting tools provide aggregated reports that are impossible to integrate around a single customer. Instead of integrating the current reporting tools, we need to gather every single event our customers generate.
  • 12. Collecting raw web data is not enough. A raw event: 2015-05-15T00:26:41.328Z,3,D,[ip_hidden],i1xszg0f-19hqrje,"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.152 Safari/537.36","[url_hidden]",7279848891,@906,"https://www.google.pl/",vuser-history-allegro-1-hc20150509.1,"122_100003_Park@700:html_620x100_single_banner:See offer". On its own it gives us only IP, URL, cookie, user-agent and timestamp.
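Pulling those five fields out of a raw line is the first step before any enrichment. A minimal sketch using Python's `csv` module; the column positions in `FIELD_POSITIONS` are an assumption inferred from this single sample line, since the real log schema is not shown:

```python
import csv
import io
from datetime import datetime

# Assumed column positions, inferred from the sample line only.
FIELD_POSITIONS = {"timestamp": 0, "ip": 3, "cookie": 4, "user_agent": 5, "url": 6}

SAMPLE_LINE = (
    '2015-05-15T00:26:41.328Z,3,D,[ip_hidden],i1xszg0f-19hqrje,'
    '"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/42.0.2311.152 Safari/537.36",'
    '"[url_hidden]",7279848891,@906,"https://www.google.pl/",'
    'vuser-history-allegro-1-hc20150509.1,'
    '"122_100003_Park@700:html_620x100_single_banner:See offer"'
)


def parse_raw_event(line: str) -> dict:
    """Split one CSV log line (quoted fields may contain commas) into named fields."""
    row = next(csv.reader(io.StringIO(line)))
    event = {name: row[idx] for name, idx in FIELD_POSITIONS.items()}
    # datetime.fromisoformat() before Python 3.11 rejects a trailing "Z", so normalise it.
    event["timestamp"] = datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
    return event
```

The quoted user-agent contains a comma ("KHTML, like Gecko"), which is why a real CSV parser is used instead of `str.split(",")`.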
  • 13. Enrich raw web and mobile data: 50+ pieces of information from one interaction. • Purchase intent • Device • Time • Location • ISP • Online context • Weather* • Demographics (* coming soon)
  • 14. We can learn quite a few things from a user's IP. Example uses: • international travellers • townspeople • people in mountains • rainy day. Available data: • Country • Region • City • ZIP code • Population • Latitude & longitude • Time zone • IDD prefix to call the city from another country • Phone area code • Mobile Country Code (MCC) • Mobile Network Code (MNC) • Elevation • Weather at the moment of the event
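IP enrichment boils down to a longest-prefix lookup against a geolocation table, followed by rules that turn attributes into segments. A minimal sketch using the standard `ipaddress` module; `GEO_TABLE` is entirely hypothetical (RFC 5737 documentation networks with invented attributes), and the `home_country` input and both thresholds are assumptions, not values from the deck:

```python
import ipaddress
from typing import Optional

# Hypothetical lookup table: documentation-range networks, invented attributes.
# A real deployment would query an IP-geolocation database instead.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"),
     {"country": "PL", "city": "ExampleCity", "population": 1_700_000, "elevation_m": 100}),
    (ipaddress.ip_network("198.51.100.0/24"),
     {"country": "PL", "city": "ExampleMountainTown", "population": 27_000, "elevation_m": 838}),
    (ipaddress.ip_network("203.0.113.0/24"),
     {"country": "DE", "city": "ExampleAbroad", "population": 500_000, "elevation_m": 50}),
]


def enrich_ip(ip: str) -> Optional[dict]:
    """Return geo attributes for the most specific network containing `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [(net, attrs) for net, attrs in GEO_TABLE if addr in net]
    if not matches:
        return None
    _, attrs = max(matches, key=lambda match: match[0].prefixlen)  # longest-prefix match
    return dict(attrs)


def example_segments(geo: dict, home_country: str,
                     big_city_population: int = 1_000_000,
                     mountain_elevation_m: int = 700) -> set:
    """Derive the slide's example segments; thresholds and home_country are assumptions."""
    tags = set()
    if geo["country"] != home_country:
        tags.add("international_traveller")
    if geo["population"] >= big_city_population:
        tags.add("townsperson")
    if geo["elevation_m"] >= mountain_elevation_m:
        tags.add("mountains")
    return tags
```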
  • 15. The ISP tells us more than we might expect. Example uses: • competitors' users -> acquisition • our users -> retention/up-selling/cross-selling • people from a particular company or company type. Available data: • ISP name or organization name • Organization type: Commercial, Organization, Government, Military, University/College/School, Library, Content Delivery Network, Fixed Line ISP, Mobile ISP, Data Center/Web Hosting/Transit, Search Engine Spider, Reserved • Mobile brand • Net speed
  • 16. Detailed information about the user's device. Example uses: • smartphone users • Apple users • Samsung Galaxy users • Google browser users. Available data: • Device type • Device brand • Device model • Device operating system • Operating system producer • Browser • Browser producer
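Most of these device attributes come from parsing the user-agent string. A deliberately tiny, regex-based sketch that handles the two user-agents quoted in this deck; the pattern lists and the `BRAND_PREFIXES` table are illustrative assumptions, and production systems rely on a maintained device database rather than a handful of rules:

```python
import re

OS_PATTERNS = [
    ("Android", re.compile(r"Android ([\d.]+)")),
    ("Windows NT", re.compile(r"Windows NT ([\d.]+)")),
]
BROWSER_PATTERNS = [
    ("Chrome", re.compile(r"Chrome/([\d.]+)")),
    ("Firefox", re.compile(r"Firefox/([\d.]+)")),
]
# On Android the model usually appears right before " Build/".
ANDROID_MODEL = re.compile(r";\s*([^;()]+?)\s+Build/")
# Assumed model-number prefixes -> brand (two examples only).
BRAND_PREFIXES = {"GT-": "Samsung", "SM-": "Samsung"}


def _first_match(patterns, ua):
    for name, pattern in patterns:
        match = pattern.search(ua)
        if match:
            return f"{name} {match.group(1)}"
    return None


def parse_user_agent(ua: str) -> dict:
    info = {
        "device_type": "mobile" if "Mobile" in ua else "desktop",
        "os": _first_match(OS_PATTERNS, ua),
        "browser": _first_match(BROWSER_PATTERNS, ua),
        "model": None,
        "brand": None,
    }
    model = ANDROID_MODEL.search(ua)
    if model:
        info["model"] = model.group(1)
        info["brand"] = next(
            (brand for prefix, brand in BRAND_PREFIXES.items() if info["model"].startswith(prefix)),
            None,
        )
    return info
```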
  • 17. Besides user features, track user behavior too, for a deeper understanding of people's behavior: • RFM segmentation (Recency, Frequency, Monetary) • Shopping cart analysis • Purchase sequence analysis
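RFM segmentation can be sketched in a few lines: compute days since the last purchase, the number of purchases and the total spend per customer, then turn each into a rank-based score. A minimal sketch, assuming transactions arrive as `(customer_id, purchase_date, amount)` tuples and using three score buckets; the bucket count and tie handling (input order) are illustrative choices, not the deck's method:

```python
from collections import defaultdict
from datetime import date


def rfm_values(transactions, today):
    """transactions: iterable of (customer_id, purchase_date, amount) tuples."""
    last_purchase, frequency, monetary = {}, defaultdict(int), defaultdict(float)
    for customer, day, amount in transactions:
        last_purchase[customer] = max(last_purchase.get(customer, day), day)
        frequency[customer] += 1
        monetary[customer] += amount
    return {c: ((today - last_purchase[c]).days, frequency[c], monetary[c]) for c in last_purchase}


def rank_scores(values, smaller_is_better=False, buckets=3):
    """Rank-based score from 1 (worst) to `buckets` (best); ties keep input order."""
    ordered = sorted(values, key=values.get, reverse=smaller_is_better)  # worst customer first
    return {c: 1 + i * buckets // len(ordered) for i, c in enumerate(ordered)}


def rfm_scores(transactions, today, buckets=3):
    raw = rfm_values(transactions, today)
    r = rank_scores({c: v[0] for c, v in raw.items()}, smaller_is_better=True, buckets=buckets)
    f = rank_scores({c: v[1] for c, v in raw.items()}, buckets=buckets)
    m = rank_scores({c: v[2] for c, v in raw.items()}, buckets=buckets)
    return {c: (r[c], f[c], m[c]) for c in raw}
```

Recency is the only dimension where a smaller raw value is better, hence the `smaller_is_better` flag.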
  • 18. User behavior and characteristics help predict the next best action/offer. What product should we recommend? How could this purchase path end?
  • 19. So, how do we build tailored recommendations? Pick an algorithm that is suitable for the problem. Product: [feature_1, feature_2, …, feature_N]; User: [feature_1, feature_2, …, feature_N]; User: [product_1, product_2, …, product_N]
  • 20. Manual / people-managed rules: • Simple rules: if a user has certain features, serve this group of products • Manual segment creation: analysts find user segments and match them with product segments • Simple feature matching: take the user's weighted feature vector and match it with product feature vectors
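Simple feature matching can be illustrated with cosine similarity between the user's weighted feature vector and each product's feature vector. A minimal sketch; the three-feature layout `[smartphones, accessories, travel]` and the products in `PRODUCTS` are hypothetical, and cosine similarity is one reasonable matching choice rather than the one the deck prescribes:

```python
import math

# Hypothetical feature order: [smartphones, accessories, travel].
PRODUCTS = {
    "phone": [1.0, 0.0, 0.0],
    "phone_case": [0.5, 1.0, 0.0],
    "suitcase": [0.0, 0.0, 1.0],
}


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(user_vector, products, top_n=3):
    """Rank products by cosine similarity to the user's weighted feature vector."""
    ranked = sorted(products, key=lambda pid: cosine_similarity(user_vector, products[pid]),
                    reverse=True)
    return ranked[:top_n]
```

A user weighted mostly toward smartphones (`[0.9, 0.1, 0.0]`) gets the phone first and the phone case second.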
  • 21. Machine learning-supported recommendations: • Find segments automatically (e.g. k-means) • Product-feature-based recommendations • User-feature-based recommendations • Combined product- and user-based recommendations (collaborative filtering, deep learning)
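Finding segments automatically with k-means means alternating two steps: assign each user to the nearest centroid, then move each centroid to the mean of its users. A minimal pure-Python sketch of Lloyd's algorithm; the 2-D user feature points are hypothetical, and a production system would use a library implementation on far more dimensions:

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm: returns (centroids, labels) for a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial centroids: k distinct input points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it in place if the cluster is empty.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
    return centroids, labels
```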
  • 24. Problems with SQL and NoSQL databases: • Complex data model for query optimization: split dimensions into several tables based on the reports made; cherry-pick in advance the dimensions we can aggregate, based on cardinality; indexing every dimension column is a must • Impossible to add high-cardinality dimensions: no way to analyze per user (there are millions of them); no way to even add all of user-agent, URL, geo-info, ...
  • 25. Problems with SQL and NoSQL databases (continued): • Complex data loading process: needs to pre-aggregate in memory; non-trivial reliability issues; hard to parallelize • There is always latency: pre-aggregation happens in the loading job's memory
  • 26. deep.bi – real-time big data architecture: event sources* -> raw data stream -> real-time data ingestion (Kafka) -> data transformation & enrichment (Node.js, Spark Streaming) -> transformed data stream -> high-performance, multi-purpose storage: real-time OLAP store (Druid), operational store (Cassandra), raw data store (Hadoop, Parquet, Spark). Also shown: customer databases, ETL. Access layer: deep.bi API, web analytics dashboard, customer analytics dashboard. *e.g. mobile apps, websites, marketing campaigns, IoT (beacons, wearables)
  • 27. Data Collection APIs: end-user browser -> Web Data Collection API (HTML or JS): trackers pass event data with <DEEP tracker> -> Ingestion API -> DEEP data enrichment, storage & analytics (client's DEEP Data Space). Mobile Data Collection API (HTML, JS or native SDK): trackers pass event data with …
  • 28. Events are represented with the full flexibility of JSON (excerpt): { "data": { "event_type": "CLICK", "ad_request_event": { "ctx": { "event_time": "2015-07-10T06:15:50.819Z", "ip_address": "XX.XX.XX.XX", "geo_info": { "country": "US", "region": "California", "city": "San Francisco", "timezone": "PST", "isp": "XXX", "population": 849774 }, "page": { "raw_url": "XXX", "standardized_domain": "XXX" }, "page_info": { "page_raw_url": "XXX", "product_categories": [ { "id": 20585 }, { "id": 100126 } ] }, "cookie": "ibx8axlw-17j287o", "user_agent": "Mozilla/5.0 (Linux; Android 4.2.2; GT-S7580 Build/JDQ39) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.93 Mobile Safari/537.36", …
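Because the analytics store works on denormalized rows, a nested JSON event like this one is typically flattened into one column per leaf field. A minimal sketch of that step; the dotted naming scheme is an assumption, lists are left as multi-value fields, and the event below is a trimmed copy of the excerpt on this slide:

```python
import json


def flatten(obj, prefix=""):
    """Turn nested JSON objects into dotted column names; lists stay as multi-value fields."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


EVENT = json.loads("""
{"data": {"event_type": "CLICK",
          "ad_request_event": {"ctx": {
              "event_time": "2015-07-10T06:15:50.819Z",
              "geo_info": {"country": "US", "city": "San Francisco", "population": 849774},
              "page_info": {"product_categories": [{"id": 20585}, {"id": 100126}]},
              "cookie": "ibx8axlw-17j287o"}}}}
""")

row = flatten(EVENT)
```

Each key of `row`, such as `data.ad_request_event.ctx.geo_info.country`, can then be treated as a dimension.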
  • 29. Apache Kafka: • Publish-subscribe service • The nervous system of enterprise data: decouples producers from consumers, reliably buffers data (send now, process later) • Scalable, distributed, replicated log system: pause components, restart processing • Powered by web giants like LinkedIn, Twitter, Netflix, Uber, Spotify and Pinterest • >10M messages/second
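The "pause components, restart processing" property comes from the log design: producers only append, and each consumer tracks its own read offset. A minimal in-memory sketch of that idea; this is not the Kafka client API, and it ignores partitions, replication and offset commits:

```python
class Topic:
    """In-memory, single-partition stand-in for a Kafka topic: an append-only log."""

    def __init__(self):
        self.log = []

    def publish(self, message):
        self.log.append(message)
        return len(self.log) - 1  # offset of the new message


class Consumer:
    """Reads independently of other consumers by keeping its own offset."""

    def __init__(self, topic):
        self.topic = topic
        self.offset = 0  # next offset to read

    def poll(self, max_messages=10):
        batch = self.topic.log[self.offset:self.offset + max_messages]
        self.offset += len(batch)
        return batch
```

A consumer that stops polling simply falls behind; when it resumes, it continues from its offset while the producer was never blocked.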
  • 30. Apache Spark Streaming: • Scalable, fault-tolerant stream processing system • Simple programming model, rich API and integrations • Powered by Yahoo, Netflix, eBay, NASA, Intel, Cisco • It is our fundamental technology for streaming applications: sessionize events, detect fraud, attribute purchases to clicks or views, load & read external stores like Druid, Hadoop, Cassandra
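Sessionizing events means grouping each user's events and starting a new session after a period of inactivity. A minimal batch sketch in plain Python, assuming events are `(cookie, timestamp)` pairs and a 30-minute inactivity timeout; in a streaming job the same logic would live in per-key state rather than in a sorted list:

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity timeout


def sessionize(events, gap=SESSION_GAP):
    """events: iterable of (cookie, timestamp). Returns {cookie: [[timestamps of session], ...]}."""
    sessions = {}
    for cookie, ts in sorted(events):  # order by cookie, then time
        user_sessions = sessions.setdefault(cookie, [])
        if user_sessions and ts - user_sessions[-1][-1] <= gap:
            user_sessions[-1].append(ts)  # continue the current session
        else:
            user_sessions.append([ts])  # first event, or gap exceeded: new session
    return sessions
```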
  • 31. Druid – Real-time OLAP Store: • Open-source streaming data store for interactive analytics at scale • Denormalized data: no more snowflake or star schema! • Build real-time dashboards, analytic applications and exploratory tools on it • It's FAST: aggregate, drill down, slice-and-dice in sub-second time; advanced column store with compression; sophisticated approximate algorithms • It's SCALABLE: horizontally scalable (just add more machines); replicated, highly available; over 100 PB of data, millions of events/second
  • 32. Druid – Real-time OLAP Store (continued): • Ingests historical & real-time data: data is available for exploration within milliseconds; can store years of data in very optimized storage • Powered by eBay, Netflix, PayPal, Yahoo, Cisco • It is our core data store for all events, historical and real-time
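Slice-and-dice on denormalized data amounts to filtering rows on some dimensions and counting by others. A conceptual Python sketch of that query shape; the events are hypothetical, and a real OLAP store answers such queries from compressed, indexed columns rather than by scanning dictionaries:

```python
from collections import Counter

# Hypothetical denormalized events: every row carries all its dimensions.
EVENTS = [
    {"country": "PL", "device": "mobile", "campaign": "display"},
    {"country": "PL", "device": "desktop", "campaign": "search"},
    {"country": "PL", "device": "mobile", "campaign": "display"},
    {"country": "US", "device": "mobile", "campaign": "display"},
]


def slice_and_dice(events, group_by, filters=None):
    """Count events per combination of `group_by` dimensions, keeping rows matching `filters`."""
    filters = filters or {}
    counts = Counter()
    for event in events:
        if all(event.get(dim) == value for dim, value in filters.items()):
            counts[tuple(event.get(dim) for dim in group_by)] += 1
    return counts
```

Drilling down is just adding a dimension to `group_by`; slicing is adding a key to `filters`.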
  • 33. Hadoop Optimized: • Apache Spark for batch processing: a fast and general engine for large-scale data processing; replaces MapReduce, being up to 10x-100x faster; the number 1 open-source project in the big data space (by contributors and commits); in-memory processing (where possible); Spark SQL for SQL processing • Apache Parquet, an optimized storage format: columnar (read only the columns you need); compressed (specialized compression per data type + generic compression); 2x-4x smaller: 600 GB of data -> 150 GB • Hadoop can be sped up by 2 orders of magnitude: from hours to seconds!
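The two Parquet properties on this slide, reading only the columns you need and compressing each column by its data type, can be shown on a toy example. A conceptual sketch in plain Python (not the Parquet format or any Parquet library): rows are pivoted into columns, a query touches two of three columns, and run-length encoding, one of the encodings Parquet uses, shrinks a repetitive column:

```python
def to_columns(rows):
    """Row-oriented records -> column-oriented lists (assumes every row has the same keys)."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, run_length) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [tuple(run) for run in runs]


ROWS = [  # hypothetical events
    {"country": "PL", "device": "mobile", "revenue": 10.0},
    {"country": "PL", "device": "desktop", "revenue": 0.0},
    {"country": "PL", "device": "mobile", "revenue": 5.5},
    {"country": "US", "device": "mobile", "revenue": 2.0},
]

columns = to_columns(ROWS)
# "Revenue per country" reads only the country and revenue columns; device is never touched.
revenue_by_country = {}
for country, revenue in zip(columns["country"], columns["revenue"]):
    revenue_by_country[country] = revenue_by_country.get(country, 0.0) + revenue
```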
  • 34. Thank you! Share your thoughts, challenges or case studies with us, or drop us a line: hello@deep.bi
  • 36. Complexity of multidimensional queries. Let's assume we want to find users who: • were interested in smartphones • use Samsung products • live in cities with a population over 1M • are women • were traveling abroad • came from our display campaign. So we have a combination of k = 6 dimensions out of n = 50. Using the combination formula C(n, k) = n! / (k! (n - k)!), we get…
  • 37. … C(50, 6) = 15,890,700 possible combinations, a number similar to Lotto (6 from 49, i.e. C(49, 6) = 13,983,816 combinations).
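The combination counts are easy to verify with `math.comb` (Python 3.8+). The second helper is an illustrative extension, not from the deck: it counts every filter that combines anywhere from 1 to 6 of the 50 dimensions, still ignoring how many values each dimension can take:

```python
from math import comb


def dimension_combinations(n_dimensions: int, k: int) -> int:
    """Number of distinct k-dimension filters that can be built from n dimensions."""
    return comb(n_dimensions, k)


def filters_up_to(n_dimensions: int, k_max: int) -> int:
    """Number of filters combining anywhere from 1 to k_max dimensions."""
    return sum(comb(n_dimensions, k) for k in range(1, k_max + 1))


if __name__ == "__main__":
    print(dimension_combinations(50, 6))  # 15890700
    print(comb(49, 6))                    # 13983816 (Lotto 6 from 49)
    print(filters_up_to(50, 6))
```

This is why pre-aggregating every possible combination in advance is not practical.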

Editor's Notes

  1. 5,000,000,000 / 60 / 60 / 24 / 30 / 12 ≈ 160.75 (seconds -> minutes -> hours -> days -> 30-day months -> years)
  2. Sources: http://www.infoworld.com/article/2608040/big-data/fast-data--the-next-step-after-big-data.html
  3. RFM segmentation (Recency, Frequency, Monetary): assessing customers' revenue potential; analyzing migration between segments. Shopping basket analysis: understanding what baskets customers build; understanding which product categories most often sell together. Purchase sequence analysis: understanding how customer behavior unfolds over time; which sequences precede a purchase; which sequences precede dropping out.
  4. Uplift models: will buy after a recommendation -> target group; will buy without a recommendation -> unnecessary expense; will not buy after a recommendation -> lost customer
  6. Source: http://saasaddict.walkme.com/saas-2015-new-shifts-will-see/ 1. Companies Will Be Investing More in Personal Consumer Research. Currently, a lot of consumer research is performed in a very static manner, through surveys and analysis of raw data. More companies will be investing in personalization and customization of their services. They will also focus on getting to know their customers more personally, usually through social media, through the use of Big Data (see more on that below) and through direct engagement (via email and social media). Details like purchasing motivations, lifestyle, and desires are all important. Relevant marketing strategies seek to improve customer satisfaction and motivate customers to value your brand as more than just a service. 2. Cloud Data Services Will Overtake Traditional Means of Storage. According to Forrester research, Microsoft will be generating more revenue from its cloud services than from its traditional on-premise applications. Traditional services are limited by their on-premise storage space, while cloud data services are much more open. This will allow businesses to look into contracting cloud services for meaningful growth while it is still relatively inexpensive. One challenge to watch out for is that cloud data breaches are a legitimate issue. Expect companies to invest heavily in shoring up their security to avoid breaches. 3. More SaaS Apps Will Specialize in Specific Industries. Industries like healthcare, manufacturing, and retail will be developing more apps in their specific fields. One of the challenges of this new approach is that it burdens the customer with a deeper, more complex experience to acclimate to. However, a benefit of specialized SaaS is that companies will have a built-in user base, which gives them a head start when developing features. It also benefits enterprise customers.
The reason this trend is important is that consumers are demanding more apps that are relevant to specific needs. Generalized apps avoid getting too complex in any one area, which can alienate consumers by not providing the solutions they desire. 4. New Alternatives to Multitenancy Will Develop. Allowing multiple customers to share a single application instance is useful for managing data on cloud services. While the traditional approach allowed multiple users to be plugged in, each with individual views, alternatives that allow for more personalized experiences are being developed. For example, Salesforce.com is offering a new 'Superpod' service for enterprises. This allows companies to have their own dedicated infrastructure inside their data centers, rather than connect to a single server-side instance. These new hybrid services give enterprises more options going forward, allow for more innovation in developing delivery systems, and thus free up the bottleneck in the cloud service market. They also give consumers more options. 5. A Bigger Emphasis on Big Data Analytics. According to IDC reports, there is a trend toward greater use of data-as-a-service (DaaS), with spending reaching $215 billion in 2015. DaaS providers will leverage the cloud to deliver their services. IDC also predicts that more companies will be using big data analytics as part of their commercial and open data sets. Cloud storage offers more flexibility for enterprise access and overall capacity. Since the relative cost of cloud storage per unit is decreasing, more companies are becoming interested in big data analysis, which makes it a perfect opportunity to begin implementing open data set technologies.
  8. Sources: https://www.linkedin.com/pulse/combining-druid-spark-interactive-flexible-analytics-scale-butani
  9. Sources: https://speakerdeck.com/metamx/druid-plus-r https://www.linkedin.com/pulse/combining-druid-spark-interactive-flexible-analytics-scale-butani
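The uplift-model note splits customers by how a recommendation changes their behavior. A common way to estimate this is to compare conversion among users who got the recommendation with a control group that did not, per segment. A minimal sketch under that assumption; the segment names and counts in the test are hypothetical, and a real uplift model would predict this difference per user instead of per pre-defined segment:

```python
def conversion_rate(buyers: int, total: int) -> float:
    return buyers / total if total else 0.0


def uplift(treated_buyers: int, treated_total: int, control_buyers: int, control_total: int) -> float:
    """Estimated uplift = P(buy | recommended) - P(buy | not recommended)."""
    return conversion_rate(treated_buyers, treated_total) - conversion_rate(control_buyers, control_total)


def target_segments(experiment: dict, min_uplift: float = 0.0) -> list:
    """experiment: {segment: (treated_buyers, treated_total, control_buyers, control_total)}.

    Returns segments whose uplift exceeds `min_uplift`, best first. Segments near zero
    ("will buy anyway") are an unnecessary expense; negative ones risk losing the customer.
    """
    return sorted(
        (seg for seg, counts in experiment.items() if uplift(*counts) > min_uplift),
        key=lambda seg: uplift(*experiment[seg]),
        reverse=True,
    )
```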