Insights into Customer Behavior from Clickstream Data
Ronald J. Nowling
Red Hat, Inc.
rnowling@redhat.com
http://rnowling.github.io/
Who Am I?
•  Software Engineer at Red Hat
•  Data Science Team, Emerging Technologies
–  Evaluate solutions in open-source Big Data space
–  Ensure software works for Red Hat customers
–  Promote data science internally through consulting projects
•  Apache Bigtop PMC
  
Clickstream Data
61 million page views
125,000 registered users
500,000 pages
125,000 knowledgebase articles
  
Potential Applications
•  Build customer profiles to aid sales teams
•  Recommendation system for knowledgebase
•  Improve customer portal search
•  Guide selection of new knowledgebase topics by content writers
  
Strip Formatting → Clean Words → Vectorize → Cluster
What are the different types of kernel packages in Red Hat
Enterprise Linux?
=============================================================
Issue
------
What are the different types of kernel packages in Red Hat
Enterprise Linux?
Environment
---------------
Red Hat Enterprise Linux
Resolution
------------
Red Hat Enterprise Linux contains the following kernel
packages:
  
Strip Formatting → Clean Words → Vectorize → Cluster
What are the different types of kernel packages in Red Hat
Enterprise Linux
Issue
What are the different types of kernel packages in Red Hat
Enterprise Linux
Environment
Red Hat Enterprise Linux
Resolution
Red Hat Enterprise Linux contains the following kernel
packages some may not apply to your architecture and not all
are available in all major releases kernel contains the
kernel and following key features
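A minimal sketch of the Strip Formatting step, assuming the knowledgebase articles are Markdown-like plain text as in the example above. The function name `strip_formatting` and the regular expressions are illustrative assumptions, not the code used in the talk.

```python
import re

def strip_formatting(text: str) -> str:
    """Remove heading underlines and punctuation, then collapse whitespace."""
    # Drop lines made up only of '=' or '-' (heading underlines).
    lines = [
        ln for ln in text.splitlines()
        if not re.fullmatch(r"\s*[=\-]{3,}\s*", ln)
    ]
    # Replace anything that is not a letter, digit, or whitespace with a space.
    no_punct = re.sub(r"[^A-Za-z0-9\s]", " ", " ".join(lines))
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", no_punct).strip()

raw = """What are the different types of kernel packages in Red Hat
Enterprise Linux?
=============================================================
Issue
------
What are the different types of kernel packages in Red Hat
Enterprise Linux?"""

print(strip_formatting(raw))
```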
  
Strip Formatting → Clean Words → Vectorize → Cluster
What are the different type of kernel package in Red Hat
Enterprise Linux
Issue
What are the different type of kernel package in Red Hat
Enterprise Linux
Environment
Red Hat Enterprise Linux
Resolution
Red Hat Enterprise Linux contain the follow kernel
package some may not apply to your architecture and not all
are available in all major release kernel contain the
kernel and follow key feature
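The example above shows words reduced to their stems ("packages" to "package", "following" to "follow"). A rough sketch of that idea, assuming a toy suffix-stripping rule: `toy_stem` is a hypothetical helper, it is deliberately much simpler than a real stemmer such as Porter's, and it also lowercases each word.

```python
def toy_stem(word: str) -> str:
    """Very rough suffix stripping; NOT the Porter algorithm."""
    w = word.lower()
    # "following" -> "follow"
    if w.endswith("ing") and len(w) > 5:
        return w[:-3]
    # "packages" -> "package"; leave words like "class" or "status" alone.
    if w.endswith("s") and not w.endswith(("ss", "us", "is")) and len(w) > 3:
        return w[:-1]
    return w

words = "Red Hat Enterprise Linux contains the following kernel packages".split()
print([toy_stem(w) for w in words])
```

A toy rule like this mis-stems many English words, which is one reason the results of stemming are worth checking on your own data.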
  
Strip Formatting → Clean Words → Vectorize → Cluster
different type kernel package Red Hat
Enterprise Linux
Issue
different type kernel package Red Hat
Enterprise Linux
Environment
Red Hat Enterprise Linux
Resolution
Red Hat Enterprise Linux contain kernel
package apply architecture
available major release kernel contain
kernel follow key feature
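Stop-word removal, as illustrated above, can be sketched as a simple set lookup. The `STOP_WORDS` set here is a small hand-picked assumption covering this one example; the list used in the talk is not given.

```python
# Hypothetical, hand-picked stop-word list for this example only.
STOP_WORDS = {
    "what", "are", "the", "of", "in", "some", "may",
    "not", "to", "your", "and", "all",
}

def remove_stop_words(tokens):
    """Keep only tokens that are not in the stop-word set (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]

tokens = "What are the different type of kernel package in Red Hat Enterprise Linux".split()
print(" ".join(remove_stop_words(tokens)))
# different type kernel package Red Hat Enterprise Linux
```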
  
Strip Formatting → Clean Words → Vectorize → Cluster
kernel: 5
red: 4
hat: 4
enterprise: 4
linux: 4
package: 3
contain: 3
different: 2
type: 2
intel: 2
environment: 1
resolution: 1
follow: 1
system: 1
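The word counts above are a bag-of-words representation. A minimal sketch of the Vectorize step, assuming plain term counts; `vectorize` and `to_vector` are hypothetical helper names, and the talk does not say whether counts were weighted further.

```python
from collections import Counter

def vectorize(tokens):
    """Count how often each lowercased token occurs in one article."""
    return Counter(t.lower() for t in tokens)

def to_vector(counts, vocabulary):
    """Lay the counts out in a fixed vocabulary order so articles are comparable."""
    return [counts.get(word, 0) for word in vocabulary]

tokens = "different type kernel package Red Hat Enterprise Linux kernel".split()
counts = vectorize(tokens)
print(counts["kernel"])
print(to_vector(counts, ["kernel", "linux", "intel"]))
```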
  
Strip Formatting → Clean Words → Vectorize → Cluster
kernel: 5
red: 4
hat: 4
enterprise: 4
linux: 4
package: 3
contain: 3
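The list above keeps only the higher-count words from the previous slide. One way to sketch that pruning, assuming a simple minimum-count threshold: the exact rule used in the talk is not stated, and `filter_counts` and `min_count` are hypothetical names.

```python
def filter_counts(counts, min_count=3):
    """Drop words that occur fewer than min_count times."""
    return {word: n for word, n in counts.items() if n >= min_count}

counts = {
    "kernel": 5, "red": 4, "hat": 4, "enterprise": 4, "linux": 4,
    "package": 3, "contain": 3, "different": 2, "type": 2, "intel": 2,
    "environment": 1, "resolution": 1, "follow": 1, "system": 1,
}
print(filter_counts(counts))
```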
  
Strip Formatting → Clean Words → Vectorize → Cluster
Topics
openshift gear cartridge online node broker
vm rhev virtualization disk
glusterfs storage volume brick rhs glusterd node client mount geo
rhel support driver hp hardware version firmware card intel
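The Lessons Learned slides mention K-Means, so the Cluster step can be sketched as K-Means over the article word-count vectors, with each topic described by the heaviest words in its centroid. This is a small pure-Python illustration under that assumption, not the Spark job from the talk; the vocabulary and article vectors are made up.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20, seed=0):
    """Plain K-Means; returns (cluster index per vector, centroids)."""
    rng = random.Random(seed)
    # Start from k randomly chosen articles.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each article joins its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments, centroids

def top_words(centroid, vocabulary, n=2):
    """Describe a topic by the n words with the largest centroid weight."""
    ranked = sorted(zip(centroid, vocabulary), reverse=True)
    return [word for _, word in ranked[:n]]

# Made-up vocabulary and word-count vectors for four articles.
vocabulary = ["openshift", "gear", "glusterfs", "volume"]
articles = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 4, 2], [0, 0, 2, 3]]

assignments, centroids = kmeans(articles, k=2)
for centroid in centroids:
    print(top_words(centroid, vocabulary))
```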
  
Topic Article Counts
  
Clickstream Processing
[Diagram: three parallel branches, each Raw Daily Page Views → Parse → Clean & Filter; remaining boxes: Accounts, Aggregate, Project onto Topics, Topic View Counts]
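A minimal sketch of the flow in the diagram, assuming each raw record is a tab-separated (account, URL) pair and that every clustered knowledgebase article already has a topic label. The record layout, the `ARTICLE_TOPIC` mapping, and the order of the aggregation and projection stages are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical mapping from knowledgebase article URL to its topic label.
ARTICLE_TOPIC = {"/solutions/1": "gluster", "/solutions/2": "rhev"}

def parse(line):
    """Split one raw page-view line into (account, url); None if malformed."""
    fields = line.rstrip("\n").split("\t")
    return tuple(fields) if len(fields) == 2 else None

def clean_and_filter(records):
    """Drop malformed records, views without an account, and pages without a topic."""
    for record in records:
        if record is None:
            continue
        account, url = record
        if account and url in ARTICLE_TOPIC:
            yield account, url

def topic_view_counts(daily_files):
    """Aggregate page views per account, projected onto article topics."""
    counts = defaultdict(Counter)
    for lines in daily_files:
        for account, url in clean_and_filter(parse(line) for line in lines):
            counts[account][ARTICLE_TOPIC[url]] += 1
    return counts

day1 = ["acme\t/solutions/1\n", "acme\t/solutions/2\n", "\t/solutions/1\n", "bad line\n"]
day2 = ["acme\t/solutions/1\n", "initech\t/home\n"]
print(dict(topic_view_counts([day1, day2])))
```

Each daily file is parsed and filtered independently, which is what lets the per-day branches run in parallel before the counts are combined.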
  
Customer Profiles
•  Dominant topics
– JBoss
– Red Hat Enterprise Virtualization
– Hardware support
– Gluster
– Booting into rescue mode
– Packages
  
Customer Profiles
•  Supporting topics
– Logging
– LDAP
– Samba
– High resource usage
– File systems / LVM / block devices
– Networking
  
Customer Profiles
•  JBoss and RHEV appear in combination with a number of other products
•  Some products only appear by themselves with supporting topics (logging, networking, filesystems)
– OpenShift
– Gluster
  
Topic Enrichments
  
Malformed TSV Files
•  Gzip files need to be read sequentially
•  Tab-separated, no quoting (in theory!)
•  Escaped tabs and newlines within records
•  E.g., \n or \t
•  Improperly escaped tabs and newlines
•  E.g., \\t vs \t
•  Extraneous unmatched quote marks
•  E.g., ‘some_user
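A sketch of a tolerant reader for files like these, assuming escapes are written as two-character sequences (a backslash followed by `t` or `n`) and that records with the wrong number of fields are skipped. The function names, the `expected_fields` parameter, and the choice to strip stray quote marks are illustrative assumptions.

```python
import gzip
import re

_ESCAPES = {"t": "\t", "n": "\n", "\\": "\\"}

def unescape(field):
    """Turn backslash-t / backslash-n into real tabs and newlines.

    An escaped backslash is consumed first, so a double backslash
    followed by 't' stays a literal backslash plus the letter t.
    """
    return re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(0)), field)

def parse_line(line, expected_fields):
    """Return the cleaned fields, or None when the record is malformed."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != expected_fields:
        return None  # the caller can count or log these
    # Strip stray quote marks at either end of a field.
    return [unescape(f).strip("'\"‘’") for f in fields]

def read_records(path, expected_fields):
    """Stream a gzip-compressed TSV file line by line, skipping bad records."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            record = parse_line(line, expected_fields)
            if record is not None:
                yield record

print(parse_line("‘some_user\tline one\\nline two\n", 2))
```

Counting the skipped records is a cheap way to see how much data the malformed lines actually cost.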
  
Lessons Learned
•  Consider custom Hadoop input formats for tricky file formats
•  Verify everything – what works in general may not work for you
– Stemming
– Filtering most frequent words
– K-Means vs LDA
  
Lessons Learned
•  K-Means
– Improve accuracy: multiple runs, more iterations
•  Watch out for memory leaks
– Un-persist cached RDDs
– Un-persist broadcast variables
•  Parquet for performance
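The K-Means advice above (multiple runs, more iterations) can be sketched as repeating the algorithm from different random starting points and keeping the run with the lowest within-cluster cost. This is a toy one-dimensional illustration in plain Python under that assumption, not the Spark code from the talk; the data and function names are made up.

```python
import random

def kmeans_1d(points, k, iterations, seed):
    """One K-Means run on 1-D points; returns (cost, sorted centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: (p - centroids[c]) ** 2)
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            sum(members) / len(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
    # Cost: sum of squared distances to the nearest centroid.
    cost = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return cost, sorted(centroids)

def best_of_runs(points, k, runs=10, iterations=20):
    """Repeat K-Means with different seeds and keep the lowest-cost result."""
    return min(kmeans_1d(points, k, iterations, seed) for seed in range(runs))

points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]
cost, centroids = best_of_runs(points, k=3)
print(cost, centroids)
```

A single run can settle in a poor local optimum when two starting centroids land in the same group; taking the best of several runs makes that less likely to affect the final result.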
  
Potential Applications
•  Build customer profiles to aid sales teams
•  Recommendation system for knowledgebase
•  Improve customer portal search
•  Guide selection of new knowledgebase topics for content writers
  
Resources
http://rnowling.github.io/
  
QUESTIONS
  

More Related Content

What's hot

Pinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's ScalePinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Seunghyun Lee
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
Databricks
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
James Serra
 
Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)
Adrien Blind
 
What’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningWhat’s New with Databricks Machine Learning
What’s New with Databricks Machine Learning
Databricks
 
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron SchildkroutKafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
confluent
 
Composable Software Architecture with Spring
Composable Software Architecture with SpringComposable Software Architecture with Spring
Composable Software Architecture with Spring
Sam Brannen
 
Powering a Graph Data System with Scylla + JanusGraph
Powering a Graph Data System with Scylla + JanusGraphPowering a Graph Data System with Scylla + JanusGraph
Powering a Graph Data System with Scylla + JanusGraph
ScyllaDB
 
ELK Stack
ELK StackELK Stack
ELK Stack
Phuc Nguyen
 
Vector databases and neural search
Vector databases and neural searchVector databases and neural search
Vector databases and neural search
Dmitry Kan
 
Data pipelines from zero to solid
Data pipelines from zero to solidData pipelines from zero to solid
Data pipelines from zero to solid
Lars Albertsson
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
Saurav Haloi
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Dremio Corporation
 
Delta from a Data Engineer's Perspective
Delta from a Data Engineer's PerspectiveDelta from a Data Engineer's Perspective
Delta from a Data Engineer's Perspective
Databricks
 
Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenApache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Databricks
 
Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4
Timothy Spann
 
Managing the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflowManaging the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflow
Databricks
 
Pinot: Realtime Distributed OLAP datastore
Pinot: Realtime Distributed OLAP datastorePinot: Realtime Distributed OLAP datastore
Pinot: Realtime Distributed OLAP datastore
Kishore Gopalakrishna
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
DataWorks Summit
 
Improve monitoring and observability for kubernetes with oss tools
Improve monitoring and observability for kubernetes with oss toolsImprove monitoring and observability for kubernetes with oss tools
Improve monitoring and observability for kubernetes with oss tools
Nilesh Gule
 

What's hot (20)

Pinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's ScalePinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
Pinot: Enabling Real-time Analytics Applications @ LinkedIn's Scale
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
 
Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)Introdution to Dataops and AIOps (or MLOps)
Introdution to Dataops and AIOps (or MLOps)
 
What’s New with Databricks Machine Learning
What’s New with Databricks Machine LearningWhat’s New with Databricks Machine Learning
What’s New with Databricks Machine Learning
 
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron SchildkroutKafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
Kafka + Uber- The World’s Realtime Transit Infrastructure, Aaron Schildkrout
 
Composable Software Architecture with Spring
Composable Software Architecture with SpringComposable Software Architecture with Spring
Composable Software Architecture with Spring
 
Powering a Graph Data System with Scylla + JanusGraph
Powering a Graph Data System with Scylla + JanusGraphPowering a Graph Data System with Scylla + JanusGraph
Powering a Graph Data System with Scylla + JanusGraph
 
ELK Stack
ELK StackELK Stack
ELK Stack
 
Vector databases and neural search
Vector databases and neural searchVector databases and neural search
Vector databases and neural search
 
Data pipelines from zero to solid
Data pipelines from zero to solidData pipelines from zero to solid
Data pipelines from zero to solid
 
Introduction to Apache ZooKeeper
Introduction to Apache ZooKeeperIntroduction to Apache ZooKeeper
Introduction to Apache ZooKeeper
 
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational CacheUsing Apache Arrow, Calcite, and Parquet to Build a Relational Cache
Using Apache Arrow, Calcite, and Parquet to Build a Relational Cache
 
Delta from a Data Engineer's Perspective
Delta from a Data Engineer's PerspectiveDelta from a Data Engineer's Perspective
Delta from a Data Engineer's Perspective
 
Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim ChenApache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
Apache Spark on Kubernetes Anirudh Ramanathan and Tim Chen
 
Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4Introduction to Apache NiFi 1.11.4
Introduction to Apache NiFi 1.11.4
 
Managing the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflowManaging the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflow
 
Pinot: Realtime Distributed OLAP datastore
Pinot: Realtime Distributed OLAP datastorePinot: Realtime Distributed OLAP datastore
Pinot: Realtime Distributed OLAP datastore
 
The columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache ArrowThe columnar roadmap: Apache Parquet and Apache Arrow
The columnar roadmap: Apache Parquet and Apache Arrow
 
Improve monitoring and observability for kubernetes with oss tools
Improve monitoring and observability for kubernetes with oss toolsImprove monitoring and observability for kubernetes with oss tools
Improve monitoring and observability for kubernetes with oss tools
 

Viewers also liked

Time Series Analysis with Spark by Sandy Ryza
Time Series Analysis with Spark by Sandy RyzaTime Series Analysis with Spark by Sandy Ryza
Time Series Analysis with Spark by Sandy Ryza
Spark Summit
 
How we solved Real-time User Segmentation using HBase
How we solved Real-time User Segmentation using HBaseHow we solved Real-time User Segmentation using HBase
How we solved Real-time User Segmentation using HBase
DataWorks Summit
 
Some Important Streaming Algorithms You Should Know About-(Ted Dunning, MapR)
Some Important Streaming Algorithms You Should Know About-(Ted Dunning, MapR)Some Important Streaming Algorithms You Should Know About-(Ted Dunning, MapR)
Some Important Streaming Algorithms You Should Know About-(Ted Dunning, MapR)
Spark Summit
 
Clickstream Data Warehouse - Turning clicks into customers
Clickstream Data Warehouse - Turning clicks into customersClickstream Data Warehouse - Turning clicks into customers
Clickstream Data Warehouse - Turning clicks into customers
Albert Hui
 
Not Your Father's Database by Vida Ha
Not Your Father's Database by Vida HaNot Your Father's Database by Vida Ha
Not Your Father's Database by Vida Ha
Spark Summit
 
Viadeos Segmentation platform with Spark on Mesos
Viadeos Segmentation platform with Spark on MesosViadeos Segmentation platform with Spark on Mesos
Viadeos Segmentation platform with Spark on Mesos
Cepoi Eugen
 
Mapping Brain Connectivity Through Large-Scale Segmentation and Analysis by S...
Mapping Brain Connectivity Through Large-Scale Segmentation and Analysis by S...Mapping Brain Connectivity Through Large-Scale Segmentation and Analysis by S...
Mapping Brain Connectivity Through Large-Scale Segmentation and Analysis by S...
Spark Summit
 
20 Inspiring Quotes On Customer Service
20 Inspiring Quotes On Customer Service20 Inspiring Quotes On Customer Service
20 Inspiring Quotes On Customer Service
WebAble Digital
 
Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasan...
Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasan...Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasan...
Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasan...
Spark Summit
 
How Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting PlatformHow Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting Platform
lucenerevolution
 
Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...
Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...
Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...
Spark Summit
 
Building a Recommendation Engine Using Diverse Features by Divyanshu Vats
Building a Recommendation Engine Using Diverse Features by Divyanshu VatsBuilding a Recommendation Engine Using Diverse Features by Divyanshu Vats
Building a Recommendation Engine Using Diverse Features by Divyanshu Vats
Spark Summit
 
Highlights and Challenges from Running Spark on Mesos in Production by Morri ...
Highlights and Challenges from Running Spark on Mesos in Production by Morri ...Highlights and Challenges from Running Spark on Mesos in Production by Morri ...
Highlights and Challenges from Running Spark on Mesos in Production by Morri ...
Spark Summit
 
Production Readiness Testing At Salesforce Using Spark MLlib
Production Readiness Testing At Salesforce Using Spark MLlibProduction Readiness Testing At Salesforce Using Spark MLlib
Production Readiness Testing At Salesforce Using Spark MLlib
Spark Summit
 
How Apache Spark Is Helping Tame the Wild West of Wi-Fi
How Apache Spark Is Helping Tame the Wild West of Wi-FiHow Apache Spark Is Helping Tame the Wild West of Wi-Fi
How Apache Spark Is Helping Tame the Wild West of Wi-Fi
Spark Summit
 
Data Scientist Workbench 入門
Data Scientist Workbench 入門Data Scientist Workbench 入門
Data Scientist Workbench 入門
soh kaijima
 
Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...
Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...
Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...
Spark Summit
 
Spark Summit EU 2015: SparkUI visualization: a lens into your application
Spark Summit EU 2015: SparkUI visualization: a lens into your applicationSpark Summit EU 2015: SparkUI visualization: a lens into your application
Spark Summit EU 2015: SparkUI visualization: a lens into your application
Databricks
 
Distributed Time Travel for Feature Generation by DB Tsai and Prasanna Padman...
Distributed Time Travel for Feature Generation by DB Tsai and Prasanna Padman...Distributed Time Travel for Feature Generation by DB Tsai and Prasanna Padman...
Distributed Time Travel for Feature Generation by DB Tsai and Prasanna Padman...
Spark Summit
 
Spark with Cassandra by Christopher Batey
Spark with Cassandra by Christopher BateySpark with Cassandra by Christopher Batey
Spark with Cassandra by Christopher Batey
Spark Summit
 

Viewers also liked (20)

Time Series Analysis with Spark by Sandy Ryza
Time Series Analysis with Spark by Sandy RyzaTime Series Analysis with Spark by Sandy Ryza
Time Series Analysis with Spark by Sandy Ryza
 
How we solved Real-time User Segmentation using HBase
How we solved Real-time User Segmentation using HBaseHow we solved Real-time User Segmentation using HBase
How we solved Real-time User Segmentation using HBase
 
Some Important Streaming Algorithms You Should Know About-(Ted Dunning, MapR)
Some Important Streaming Algorithms You Should Know About-(Ted Dunning, MapR)Some Important Streaming Algorithms You Should Know About-(Ted Dunning, MapR)
Some Important Streaming Algorithms You Should Know About-(Ted Dunning, MapR)
 
Clickstream Data Warehouse - Turning clicks into customers
Clickstream Data Warehouse - Turning clicks into customersClickstream Data Warehouse - Turning clicks into customers
Clickstream Data Warehouse - Turning clicks into customers
 
Not Your Father's Database by Vida Ha
Not Your Father's Database by Vida HaNot Your Father's Database by Vida Ha
Not Your Father's Database by Vida Ha
 
Viadeos Segmentation platform with Spark on Mesos
Viadeos Segmentation platform with Spark on MesosViadeos Segmentation platform with Spark on Mesos
Viadeos Segmentation platform with Spark on Mesos
 
Mapping Brain Connectivity Through Large-Scale Segmentation and Analysis by S...
Mapping Brain Connectivity Through Large-Scale Segmentation and Analysis by S...Mapping Brain Connectivity Through Large-Scale Segmentation and Analysis by S...
Mapping Brain Connectivity Through Large-Scale Segmentation and Analysis by S...
 
20 Inspiring Quotes On Customer Service
20 Inspiring Quotes On Customer Service20 Inspiring Quotes On Customer Service
20 Inspiring Quotes On Customer Service
 
Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasan...
Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasan...Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasan...
Monte Carlo Simulations in Ad-Lift Measurement Using Spark by Prasad Chalasan...
 
How Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting PlatformHow Lucene Powers the LinkedIn Segmentation and Targeting Platform
How Lucene Powers the LinkedIn Segmentation and Targeting Platform
 
Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...
Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...
Using GraphX/Pregel on Browsing History to Discover Purchase Intent by Lisa Z...
 
Building a Recommendation Engine Using Diverse Features by Divyanshu Vats
Building a Recommendation Engine Using Diverse Features by Divyanshu VatsBuilding a Recommendation Engine Using Diverse Features by Divyanshu Vats
Building a Recommendation Engine Using Diverse Features by Divyanshu Vats
 
Highlights and Challenges from Running Spark on Mesos in Production by Morri ...
Highlights and Challenges from Running Spark on Mesos in Production by Morri ...Highlights and Challenges from Running Spark on Mesos in Production by Morri ...
Highlights and Challenges from Running Spark on Mesos in Production by Morri ...
 
Production Readiness Testing At Salesforce Using Spark MLlib
Production Readiness Testing At Salesforce Using Spark MLlibProduction Readiness Testing At Salesforce Using Spark MLlib
Production Readiness Testing At Salesforce Using Spark MLlib
 
How Apache Spark Is Helping Tame the Wild West of Wi-Fi
How Apache Spark Is Helping Tame the Wild West of Wi-FiHow Apache Spark Is Helping Tame the Wild West of Wi-Fi
How Apache Spark Is Helping Tame the Wild West of Wi-Fi
 
Data Scientist Workbench 入門
Data Scientist Workbench 入門Data Scientist Workbench 入門
Data Scientist Workbench 入門
 
Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...
Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...
Tagging and Processing Data in Real Time-(Hari Shreedharan and Siddhartha Jai...
 
Spark Summit EU 2015: SparkUI visualization: a lens into your application
Spark Summit EU 2015: SparkUI visualization: a lens into your applicationSpark Summit EU 2015: SparkUI visualization: a lens into your application
Spark Summit EU 2015: SparkUI visualization: a lens into your application
 
Distributed Time Travel for Feature Generation by DB Tsai and Prasanna Padman...
Distributed Time Travel for Feature Generation by DB Tsai and Prasanna Padman...Distributed Time Travel for Feature Generation by DB Tsai and Prasanna Padman...
Distributed Time Travel for Feature Generation by DB Tsai and Prasanna Padman...
 
Spark with Cassandra by Christopher Batey
Spark with Cassandra by Christopher BateySpark with Cassandra by Christopher Batey
Spark with Cassandra by Christopher Batey
 

Similar to Insights into Customer Behavior from Clickstream Data by Ronald Nowling

2008-07-30 IBM Teach the Teacher (IBM T3), Red Hat Update for System z
2008-07-30 IBM Teach the Teacher (IBM T3), Red Hat Update for System z2008-07-30 IBM Teach the Teacher (IBM T3), Red Hat Update for System z
2008-07-30 IBM Teach the Teacher (IBM T3), Red Hat Update for System z
Shawn Wells
 
ansible_rhel_90.pdf
ansible_rhel_90.pdfansible_rhel_90.pdf
ansible_rhel_90.pdf
ssuserd254491
 
Best Red Hat Linux Certification Course
Best Red Hat Linux Certification CourseBest Red Hat Linux Certification Course
Best Red Hat Linux Certification Course
Network Kings
 
Exploring Github Data with Apache Drill on ARM64
Exploring Github Data with Apache Drill on ARM64 Exploring Github Data with Apache Drill on ARM64
Exploring Github Data with Apache Drill on ARM64
Ganesh Raju
 
"Different software evolutions from Start till Release in PHP product" Oleksa...
"Different software evolutions from Start till Release in PHP product" Oleksa..."Different software evolutions from Start till Release in PHP product" Oleksa...
"Different software evolutions from Start till Release in PHP product" Oleksa...
Fwdays
 
PHPFrameworkDay 2020 - Different software evolutions from Start till Release ...
PHPFrameworkDay 2020 - Different software evolutions from Start till Release ...PHPFrameworkDay 2020 - Different software evolutions from Start till Release ...
PHPFrameworkDay 2020 - Different software evolutions from Start till Release ...
Alexandr Savchenko
 
(Costless) Software Abstractions for Parallel Architectures
(Costless) Software Abstractions for Parallel Architectures(Costless) Software Abstractions for Parallel Architectures
(Costless) Software Abstractions for Parallel Architectures
Joel Falcou
 
Linux Distribution Collaboration …on a Mainframe!
Linux Distribution Collaboration …on a Mainframe!Linux Distribution Collaboration …on a Mainframe!
Linux Distribution Collaboration …on a Mainframe!
All Things Open
 
Presentazione Lenovo evento 9 febbraio
Presentazione Lenovo evento 9 febbraioPresentazione Lenovo evento 9 febbraio
Presentazione Lenovo evento 9 febbraio
PRAGMA PROGETTI
 
Linux: embarquement immédiat pour le cloud
Linux: embarquement immédiat pour le cloudLinux: embarquement immédiat pour le cloud
Linux: embarquement immédiat pour le cloud
Microsoft
 
Linux on Azure - Session TechDays 2014 par Blaise Vignon (Microsoft), Julien ...
Linux on Azure - Session TechDays 2014 par Blaise Vignon (Microsoft), Julien ...Linux on Azure - Session TechDays 2014 par Blaise Vignon (Microsoft), Julien ...
Linux on Azure - Session TechDays 2014 par Blaise Vignon (Microsoft), Julien ...
Frédéric Aatz
 
Startup Engineering Cookbook
Startup Engineering CookbookStartup Engineering Cookbook
Startup Engineering Cookbook
Manish Jain
 
Open Source Software – Open Day Oracle 2013
Open Source Software  – Open Day Oracle 2013Open Source Software  – Open Day Oracle 2013
Open Source Software – Open Day Oracle 2013
Erik Gur
 
Red hat storage el almacenamiento disruptivo
Red hat storage el almacenamiento disruptivoRed hat storage el almacenamiento disruptivo
Red hat storage el almacenamiento disruptivo
Nextel S.A.
 
2008-09-09 IBM Interaction Conference, Red Hat Update for System z
2008-09-09 IBM Interaction Conference, Red Hat Update for System z2008-09-09 IBM Interaction Conference, Red Hat Update for System z
2008-09-09 IBM Interaction Conference, Red Hat Update for System z
Shawn Wells
 
LinuxONE cavemen mmit 20160505 v1.0
LinuxONE cavemen mmit 20160505 v1.0LinuxONE cavemen mmit 20160505 v1.0
LinuxONE cavemen mmit 20160505 v1.0
Marcel Mitran
 
2011-03-15 Lockheed Martin Open Source Day
2011-03-15 Lockheed Martin Open Source Day2011-03-15 Lockheed Martin Open Source Day
2011-03-15 Lockheed Martin Open Source Day
Shawn Wells
 
[RUHR] DRPSL- Drupal on kubernetes in production. What should you do_know_.pdf
[RUHR] DRPSL- Drupal on kubernetes in production. What should you do_know_.pdf[RUHR] DRPSL- Drupal on kubernetes in production. What should you do_know_.pdf
[RUHR] DRPSL- Drupal on kubernetes in production. What should you do_know_.pdf
Frederik Wouters
 
optimizing_ceph_flash
optimizing_ceph_flashoptimizing_ceph_flash
optimizing_ceph_flash
Vijayendra Shamanna
 
Dev trends 18_q1
Dev trends 18_q1Dev trends 18_q1
Dev trends 18_q1
Pini Cohen
 

Insights into Customer Behavior from Clickstream Data by Ronald Nowling

  • 1. Insights into Customer Behavior from Clickstream Data Ronald J. Nowling Red Hat, Inc. rnowling@redhat.com http://rnowling.github.io/
  • 2. Who Am I?
      • Software Engineer at Red Hat
      • Data Science Team, Emerging Technologies
          – Evaluate solutions in open-source Big Data space
          – Ensure software works for Red Hat customers
          – Promote data science internally through consulting projects
      • Apache Bigtop PMC
  • 7. Clickstream Data
      61 million page views
      125,000 registered users
      500,000 pages
      125,000 knowledgebase articles
  • 8. Potential Applications
      • Build customer profiles to aid sales teams
      • Recommendation system for knowledgebase
      • Improve customer portal search
      • Guide selection of new knowledgebase topics by content writers
  • 9. Strip Formatting → Clean Words → Vectorize → Cluster
      What are the different types of kernel packages in Red Hat Enterprise Linux?
      =============================================================
      Issue
      ------
      What are the different types of kernel packages in Red Hat Enterprise Linux?
      Environment
      ---------------
      Red Hat Enterprise Linux
      Resolution
      ------------
      Red Hat Enterprise Linux contains the following kernel packages:
  • 10. Strip Formatting → Clean Words → Vectorize → Cluster
      What are the different types of kernel packages in Red Hat Enterprise Linux
      Issue
      What are the different types of kernel packages in Red Hat Enterprise Linux
      Environment
      Red Hat Enterprise Linux
      Resolution
      Red Hat Enterprise Linux contains the following kernel packages some may not apply to your architecture and not all are available in all major releases kernel contains the kernel and following key features
  • 12. Strip Formatting → Clean Words → Vectorize → Cluster
      What are the different type of kernel package in Red Hat Enterprise Linux
      Issue
      What are the different type of kernel package in Red Hat Enterprise Linux
      Environment
      Red Hat Enterprise Linux
      Resolution
      Red Hat Enterprise Linux contain the follow kernel package some may not apply to your architecture and not all are available in all major release kernel contain the kernel and follow key feature
  • 14. Strip Formatting → Clean Words → Vectorize → Cluster
      different type kernel package Red Hat Enterprise Linux
      Issue
      different type kernel package Red Hat Enterprise Linux
      Environment
      Red Hat Enterprise Linux
      Resolution
      Red Hat Enterprise Linux contain kernel package apply architecture available major release kernel contain kernel follow key feature
  • 15. Strip Formatting → Clean Words → Vectorize → Cluster
      kernel: 5, red: 4, hat: 4, enterprise: 4, linux: 4, package: 3, contain: 3, different: 2, type: 2, intel: 2, environment: 1, resolution: 1, follow: 1, system: 1
  • 17. Strip Formatting → Clean Words → Vectorize → Cluster
      kernel: 5, red: 4, hat: 4, enterprise: 4, linux: 4, package: 3, contain: 3
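The text-preparation stages shown on slides 9 to 17 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the talk: the stop-word list is a tiny hypothetical one, and `stem` is a crude stand-in (it only drops a trailing plural "s") for a real stemmer. The function names `strip_formatting`, `stem`, and `vectorize` are invented for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"what", "are", "the", "of", "in", "and", "to", "not", "all",
              "some", "may", "your"}


def strip_formatting(text):
    """Replace everything except ASCII letters and whitespace with spaces."""
    return re.sub(r"[^A-Za-z\s]", " ", text)


def stem(word):
    """Crude stand-in for a real stemmer: drop a trailing plural 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def vectorize(text):
    """Strip formatting, lowercase, stem, drop stop words, and count words."""
    words = strip_formatting(text).lower().split()
    stems = [stem(word) for word in words]
    return Counter(word for word in stems if word not in STOP_WORDS)


article = (
    "What are the different types of kernel packages in Red Hat Enterprise Linux?\n"
    "=====\n"
    "Issue\n"
    "------\n"
    "Red Hat Enterprise Linux contains the following kernel packages:"
)
counts = vectorize(article)
```

The resulting `Counter` plays the role of the word-count vector on slide 15. Dropping rare words, as slide 17 appears to do, would be one more filter over its items.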
  • 20. Topics
      openshift gear cartridge online node broker vm rhev virtualization disk glusterfs storage volume brick rhs glusterd node client mount geo rhel support driver hp hardware version firmware card intel
  • 26. Clickstream Processing
      [Diagram: Raw Daily Page Views → Parse → Clean & Filter, repeated for three daily inputs; Accounts; Aggregate; Project onto Topics; Topic View Counts]
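As a rough illustration of the aggregation and projection stages named in the Clickstream Processing diagram, the sketch below counts topic views per account. It assumes page views have already been parsed into `(account_id, page_id)` pairs and that a lookup from page to topic is available. The function name, the record layout, and the sample identifiers are hypothetical, and the talk's actual Spark job is not shown in the slides.

```python
from collections import Counter, defaultdict


def topic_view_counts(page_views, page_topics):
    """Count, for each account, how many viewed pages fall under each topic.

    page_views:  iterable of (account_id, page_id) pairs (assumed layout)
    page_topics: dict mapping page_id to a topic label (assumed lookup)
    """
    profiles = defaultdict(Counter)
    for account_id, page_id in page_views:
        topic = page_topics.get(page_id)
        if topic is None:
            # Filter out pages that were not assigned to any topic.
            continue
        profiles[account_id][topic] += 1
    return profiles


views = [
    ("acct-1", "kb-1"),
    ("acct-1", "kb-2"),
    ("acct-1", "kb-1"),
    ("acct-2", "kb-3"),
    ("acct-2", "home"),
]
topics = {"kb-1": "gluster", "kb-2": "rhev", "kb-3": "gluster"}
profiles = topic_view_counts(views, topics)
```

Each account's `Counter` is then a small topic-view vector that could feed the kind of customer profile discussed later in the deck.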
  • 31. Customer Profiles
      • Dominant topics
          – JBoss
          – Red Hat Enterprise Virtualization
          – Hardware support
          – Gluster
          – Booting into rescue mode
          – Packages
  • 32. Customer Profiles
      • Supporting topics
          – Logging
          – LDAP
          – Samba
          – High resource usage
          – File systems / LVM / block devices
          – Networking
  • 33. Customer Profiles
      • JBoss and RHEV appear in combination with a number of other products
      • Some products only appear by themselves with supporting topics (logging, networking, filesystems)
          – OpenShift
          – Gluster
  • 35. Malformed TSV Files
      • Gzip files need to be read sequentially
      • Tab-separated, no quoting (in theory!)
      • Escaped tabs and newlines within records
          – E.g., \n or \t
      • Improperly escaped tabs and newlines
          – E.g., t vs t
      • Extraneous unmatched quote marks
          – E.g., ‘some_user
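One way to cope with records like these is a hand-written splitter instead of a generic CSV reader. The sketch below assumes that tabs and newlines inside a field are written as backslash escapes and that quote characters carry no special meaning. The function name `split_escaped_tsv` is hypothetical, and the real files may well need additional rules for the improperly escaped cases.

```python
def split_escaped_tsv(line):
    """Split one record on unescaped tabs and decode backslash escapes."""
    escapes = {"t": "\t", "n": "\n", "\\": "\\"}
    fields, current = [], []
    chars = iter(line.rstrip("\n"))
    for ch in chars:
        if ch == "\\":
            # Consume the character after the backslash; unknown escapes
            # (and a trailing backslash) are kept literally.
            nxt = next(chars, "")
            current.append(escapes.get(nxt, "\\" + nxt))
        elif ch == "\t":
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


record = "user1\tfoo\\tbar\tline1\\nline2\n"
fields = split_escaped_tsv(record)
# An unmatched quote mark is treated as ordinary data, not as a delimiter.
quoted = split_escaped_tsv("'some_user\t42")
```

Because quoting is ignored entirely, a stray quote mark cannot swallow the rest of the line, which is the failure mode a quote-aware parser would hit on such input.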
  • 36. Lessons Learned
      • Consider custom Hadoop input formats for tricky file formats
      • Verify everything – what works in general may not work for you
          – Stemming
          – Filtering most frequent words
          – K-Means vs LDA
  • 37. Lessons Learned
      • K-Means
          – Improve accuracy: Multiple runs, more iterations
      • Watch out for memory leaks
          – Un-persist cached RDDs
          – Un-persist broadcasted variables
      • Parquet for performance
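The "multiple runs, more iterations" advice for K-Means can be illustrated with a small plain-Python sketch: run the algorithm from several random initializations and keep the run with the lowest within-cluster cost. This is not the Spark implementation used in the talk. The function names, defaults, and toy points are assumptions made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, iterations, rng):
    """One K-Means run from a random initialization; returns (centers, cost)."""
    centers = rng.sample(points, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster; keep it if the cluster is empty.
        centers = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    cost = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return centers, cost


def kmeans_best_of(points, k, runs=10, iterations=20, seed=0):
    """Repeat K-Means `runs` times and keep the lowest-cost result."""
    rng = random.Random(seed)
    results = (kmeans_once(points, k, iterations, rng) for _ in range(runs))
    return min(results, key=lambda result: result[1])


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
best_centers, best_cost = kmeans_best_of(points, k=2)
```

Raising `runs` lowers the chance of keeping a poor local optimum, and raising `iterations` gives each run more time to converge, at the price of more computation.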
  • 38. Potential Applications
      • Build customer profiles to aid sales teams
      • Recommendation system for knowledgebase
      • Improve customer portal search
      • Guide selection of new knowledgebase topics for content writers