SlideShare a Scribd company logo
1 of 20
Download to read offline
Bullseye P13n Platform
April 7, 2014
Charles Bracher
Bullseye Dev Manager
Ranjan Sinha, PhD
Lead Research Scientist
Bullseye	
  
Outline
P13n Platform
Why Cassandra?
Cassandra Setup
Cassandra Usage
Cassandra Issues and Resolutions
Hand over to Ranjan for the Data Science Perspective
Bullseye
Bullseye
Bullseye Functional Architecture
Offline AnalysisOffline Database/
Batch Processing
Recent User Data
1-5 days
(Cassandra)
Real Time Model
Evaluation & Caching
(sharded/full user state
in memory)
Client
Access
Near Real Time
Event Collection
Tracking
Long Term
User Data
(Local SSD)
Why Cassandra?
Great write performance
Great replication performance
Reasonable read performance
Reasonable cost
Client controlled consistency settings
Bullseye
Cassandra Setup
Cassandra Version 1.2.9
We use Replication
–  Cassandra rings deployed to 3 datacenters
Cassandra clients
–  We use both the Datastax Java and C++ Beta clients
Using CQL Table specifications and commands
Not on SSDs
Bullseye
Cassandra Usage
Column Family Design:
– Avoid Tombstones
– Avoid Compaction
With Focus on Short Term Storage:
– Turn off automatic compaction / only manual compaction
– Use unique column key names to avoid tombstones
– Clear out old data with truncation
Bullseye
Cache Miss Flow (New Session)
Bullseye
CREATE TABLE DAY_N (USER_ID TEXT, RECORD_NAME TEXT,
RECORD_VALUE BLOB, PRIMARY KEY (USER_ID, RECORD_NAME));
Write to active day column family with key user id.
Truncate the oldest day column family.
When going from one day to the next, do a manual compaction for the old day.
On read, pull user id info from all col. families newer than the local SSD data.
Queuing Flow (Ongoing Activity)
Bullseye
CREATE TABLE HOUR_N (ID TEXT, RECORD_NAME TEXT,
RECORD_VALUE BLOB, PRIMARY KEY (ID, RECORD_NAME));
Read/Write from active hour with key timestamp rounded to nearest second
Store the column family one hour old to offline DB
Truncate the column family two hours old
Do async probe of record for current second as well as recent seconds till
state is captured. Data may be read 1-3 times. More if replication is lagging.
Cassandra Issues and Resolutions
Issues with C++ Datastax Cassandra beta client
– open sourced, so could apply fixes
Performance issues with the cache miss query
– increased heap size
– reduced replication factor
– turned off cross colo read repair
– deployed data center aware policy for C++
Bullseye
Personalization Applications
Ranjan Sinha, PhD
Lead Research Scientist
April 7, 2014
Disclaimer: Some of the content in this talk is based on my personal opinion. It does not reflect the views of ebay.
Outline
Why Personalize?
P13N Platform
– Introduction
– Conceptual architecture
– Modeling stages
P13N Applications
– User badges
– Search ranking
– Contextual models
– Deals
Personalization Applications
Why Personalize?
Enable more relevant experience
Retention of existing users
New user acquisition
Reactivating churned users
Increasing activity per user
Improving conversion from visits to transactions
Personalization Applications
P13N Platform: Introduction
Maintains activity timeline information
Enables event processing at near real-time
Enables in-session personalization
Provides environment for predictive model evaluation
Backup and restore to and from Hadoop/HBase
Personalization Applications
P13N Platform: Conceptual Architecture
Personalization Applications
Tracking Event
Source
m1 m3m2
….
Model Executor
Filters and forwards
events
Activity
Timeline
+
User Badges
In-memory
Cache +
Model
Evaluation
CEP Processor
Client Access
Hadoop/
HBase
Offline Modeling
Platform
User Badges
mn
Cassandra
P13N Platform: Modeling stages
Realtime
– In-session user intent
– Contextual Models
Nearline
– Update propensity models (aka User Badges)
Offline
– Bootstrap propensity models by mining long-term behavior history
Personalization Applications
Application (1): User Badges
Personalization Applications
Name Description
SaleType Auction vs. Buy-it-now
ItemCondition New vs. Used
Category Preference of categories
Price Price range of purchasing activity
Deals Propensity to purchase deals
Social Share Propensity to share items in social media
Profile based on long-term behavior
Application (2): Search Ranking …
Should all queries be personalized in the same manner?
– For some queries (ebay or google), everyone would like the same results
– For other queries, different people may want completely different results
Personalization Applications
Query: “big ben puzzles”
Not_P13N
Rank
P13N
Rank
Sold IsNew Title
1 1 No No
LOT OF 7 BIG BEN PUZZLES 5/1000PC. 2/1500
PUZZLES EUC
2 3 No Yes
1000 Pc MB Big Ben Jigsaw Puzzle Mount Shuksan
North Cascades National Park WA
3 2 Yes No
COMPLETE Fishing Village,Smalls Island MB Big
Ben Puzzle 1000 Piece Puzzle Size!
User: always buys used items
Application (3): Contextual models …
Personalization Applications
Infer categories that user is interested in within the current session
Long and Short term behavior
– Historic behavior may provide benefits at the start of the session
– Short-term behavior may contribute gains in an extended search session
– Combination of session and historic behavior may outperform using either alone
e2
t
Nearline, after session expiry
Online, in-session
Offline, historical
e3e1 …events… e1
Event
source
Application (4): Deals
Personalization Applications
Personalize
categories
Personalize
modules
Personalize
tabs
Personalize
items
fin
Personalization Applications

More Related Content

More from DataStax Academy

Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseDataStax Academy
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraDataStax Academy
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsDataStax Academy
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingDataStax Academy
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackDataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache CassandraDataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready CassandraDataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonDataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First ClusterDataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with DseDataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraDataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseDataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraDataStax Academy
 
Apache Cassandra and Drivers
Apache Cassandra and DriversApache Cassandra and Drivers
Apache Cassandra and DriversDataStax Academy
 

More from DataStax Academy (20)

Introduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph DatabaseIntroduction to DataStax Enterprise Graph Database
Introduction to DataStax Enterprise Graph Database
 
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache CassandraIntroduction to DataStax Enterprise Advanced Replication with Apache Cassandra
Introduction to DataStax Enterprise Advanced Replication with Apache Cassandra
 
Cassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart LabsCassandra on Docker @ Walmart Labs
Cassandra on Docker @ Walmart Labs
 
Cassandra 3.0 Data Modeling
Cassandra 3.0 Data ModelingCassandra 3.0 Data Modeling
Cassandra 3.0 Data Modeling
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
 
Coursera Cassandra Driver
Coursera Cassandra DriverCoursera Cassandra Driver
Coursera Cassandra Driver
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
 
Cassandra Core Concepts
Cassandra Core ConceptsCassandra Core Concepts
Cassandra Core Concepts
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
 
Bad Habits Die Hard
Bad Habits Die Hard Bad Habits Die Hard
Bad Habits Die Hard
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
 
Advanced Cassandra
Advanced CassandraAdvanced Cassandra
Advanced Cassandra
 
Apache Cassandra and Drivers
Apache Cassandra and DriversApache Cassandra and Drivers
Apache Cassandra and Drivers
 

Recently uploaded

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 

Recently uploaded (20)

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Cassandra Day SV 2014: Building a Personalization Platform with Cassandra at eBay

  • 1. Bullseye P13n Platform April 7, 2014 Charles Bracher Bullseye Dev Manager Ranjan Sinha, PhD Lead Research Scientist Bullseye  
  • 2. Outline P13n Platform Why Cassandra? Cassandra Setup Cassandra Usage Cassandra Issues and Resolutions Hand over to Ranjan for the Data Science Perspective Bullseye
  • 3. Bullseye Bullseye Functional Architecture Offline AnalysisOffline Database/ Batch Processing Recent User Data 1-5 days (Cassandra) Real Time Model Evaluation & Caching (sharded/full user state in memory) Client Access Near Real Time Event Collection Tracking Long Term User Data (Local SSD)
  • 4. Why Cassandra? Great write performance Great replication performance Reasonable read performance Reasonable cost Client controlled consistency settings Bullseye
  • 5. Cassandra Setup Cassandra Version 1.2.9 We use Replication –  Cassandra rings deployed to 3 datacenters Cassandra clients –  We use both the Datastax Java and C++ Beta clients Using CQL Table specifications and commands Not on SSDs Bullseye
  • 6. Cassandra Usage Column Family Design: – Avoid Tombstones – Avoid Compaction With Focus on Short Term Storage: – Turn off automatic compaction / only manual compaction – Use unique column key names to avoid tombstones – Clear out old data with truncation Bullseye
  • 7. Cache Miss Flow (New Session) Bullseye CREATE TABLE DAY_N (USER_ID TEXT, RECORD_NAME TEXT, RECORD_VALUE BLOB, PRIMARY KEY (USER_ID, RECORD_NAME)); Write to active day column family with key user id. Truncate the oldest day column family. When going from one day to the next, do a manual compaction for the old day. On read, pull user id info from all col. families newer than the local SSD data.
  • 8. Queuing Flow (Ongoing Activity) Bullseye CREATE TABLE HOUR_N (ID TEXT, RECORD_NAME TEXT, RECORD_VALUE BLOB, PRIMARY KEY (ID, RECORD_NAME)); Read/Write from active hour with key timestamp rounded to nearest second Store the column family one hour old to offline DB Truncate the column family two hours old Do async probe of record for current second as well as recent seconds till state is captured. Data may be read 1-3 times. More if replication is lagging.
  • 9. Cassandra Issues and Resolutions Issues with C++ Datastax Cassandra beta client – open sourced, so could apply fixes Performance issues with the cache miss query – increased heap size – reduced replication factor – turned off cross colo read repair – deployed data center aware policy for C++ Bullseye
  • 10. Personalization Applications Ranjan Sinha, PhD Lead Research Scientist April 7, 2014 Disclaimer: Some of the content in this talk is based on my personal opinion. It does not reflect the views of ebay.
  • 11. Outline Why Personalize? P13N Platform – Introduction – Conceptual architecture – Modeling stages P13N Applications – User badges – Search ranking – Contextual models – Deals Personalization Applications
  • 12. Why Personalize? Enable more relevant experience Retention of existing users New user acquisition Reactivating churned users Increasing activity per user Improving conversion from visits to transactions Personalization Applications
  • 13. P13N Platform: Introduction Maintains activity timeline information Enables event processing at near real-time Enables in-session personalization Provides environment for predictive model evaluation Backup and restore to and from Hadoop/HBase Personalization Applications
  • 14. P13N Platform: Conceptual Architecture Personalization Applications Tracking Event Source m1 m3m2 …. Model Executor Filters and forwards events Activity Timeline + User Badges In-memory Cache + Model Evaluation CEP Processor Client Access Hadoop/ HBase Offline Modeling Platform User Badges mn Cassandra
  • 15. P13N Platform: Modeling stages Realtime – In-session user intent – Contextual Models Nearline – Update propensity models (aka User Badges) Offline – Bootstrap propensity models by mining long-term behavior history Personalization Applications
  • 16. Application (1): User Badges Personalization Applications Name Description SaleType Auction vs. Buy-it-now ItemCondition New vs. Used Category Preference of categories Price Price range of purchasing activity Deals Propensity to purchase deals Social Share Propensity to share items in social media Profile based on long-term behavior
  • 17. Application (2): Search Ranking … Should all queries be personalized in the same manner? – For some queries (ebay or google), everyone would like the same results – For other queries, different people may want completely different results Personalization Applications Query: “big ben puzzles” Not_P13N Rank P13N Rank Sold IsNew Title 1 1 No No LOT OF 7 BIG BEN PUZZLES 5/1000PC. 2/1500 PUZZLES EUC 2 3 No Yes 1000 Pc MB Big Ben Jigsaw Puzzle Mount Shuksan North Cascades National Park WA 3 2 Yes No COMPLETE Fishing Village,Smalls Island MB Big Ben Puzzle 1000 Piece Puzzle Size! User: always buys used items
  • 18. Application (3): Contextual models … Personalization Applications Infer categories that user is interested in within the current session Long and Short term behavior – Historic behavior may provide benefits at the start of the session – Short-term behavior may contribute gains in an extended search session – Combination of session and historic behavior may outperform using either alone e2 t Nearline, after session expiry Online, in-session Offline, historical e3e1 …events… e1 Event source
  • 19. Application (4): Deals Personalization Applications Personalize categories Personalize modules Personalize tabs Personalize items