Cassandra at Digby

Cody Koeninger
ckoeninger@digby.com
Localpoint Architecture
[Architecture diagram; labels recovered from the slide]
● Localpoint In-App SDK: Location Algorithm – Opt-in – Push – Rich Message – Message Management
● Localpoint Cloud
● Messaging: Campaign Management (Push, Triggered) – Mobile Offer Management – Campaign Reporting
● Location: Accuracy, Power, Privacy Optimization – Geofence Management – Cross-OS, Cross-Device
● Analytics / Events Engine: Visits – Dwell Time – Frequency – Occupancy
● Profiles
● Identity • Attributes • Location History • Campaign History
● APIs: Location API, Campaign API, Real-Time API, CRM API, Analytics Engine API
● Publish/Subscribe, Transaction Record Export
● Web Console (Create/Manage)
© 2013 Digby. CONFIDENTIAL
Why Cassandra?
● Somewhat of a green field project: add market segmentation (aka “Profiles”) to our existing geolocation / messaging infrastructure
● Horizontal scalability
● Homogeneous deployment, less ops pain
● No pre-existing investment in Hadoop
● Data model matches our problem
Devices
● Android and iOS mobile devices
● Unique ID
● Other parts of the codebase handle geolocation. Here we're concerned primarily with device as an ID
● ~Millions of devices
Attributes
● Arbitrary key-value pairs associated with devices
● Defined by marketers and app developers
● String, boolean, integer, date
● Encrypted due to PII concerns
● e.g. birthdate: 1989-01-01, ownsPs3: true
● ~100 attributes
Profiles
● Market segmentation on attributes of devices
● Boolean expressions comparing to a fixed value
● Combined via Boolean 'and', aka set intersection. No 'or'
● e.g. wantsPs4: birthdate >= 1978-01-01 && ownsPs3 == true && ownsPs4 == false
● May be defined long after attributes are defined
● ~100 profiles
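The profile rules above can be sketched in Scala as a conjunction of attribute comparisons. This is only an illustration under stated assumptions: the type names (`Value`, `Clause`, `Profile`), the set of operators, and the choice to treat a missing or mistyped attribute as "does not match" are hypothetical and do not come from the slides.

```scala
import java.time.LocalDate

object ProfileSketch {
  // Attribute values, limited to the four types listed on the Attributes slide.
  sealed trait Value
  final case class Str(v: String) extends Value
  final case class Bool(v: Boolean) extends Value
  final case class IntV(v: Int) extends Value
  final case class Date(v: LocalDate) extends Value

  // Comparison operators (hypothetical set; the slides show >= and ==).
  sealed trait Op
  case object Eq extends Op
  case object Ge extends Op
  case object Le extends Op

  // One comparison of a named attribute against a fixed value.
  final case class Clause(attr: String, op: Op, fixed: Value)

  // A profile is a conjunction of clauses; there is no 'or'.
  final case class Profile(name: String, clauses: List[Clause])

  // Compare two values of the same type; mismatched types yield None.
  private def compare(a: Value, b: Value): Option[Int] = (a, b) match {
    case (Str(x), Str(y))   => Some(x.compareTo(y))
    case (Bool(x), Bool(y)) => Some(java.lang.Boolean.compare(x, y))
    case (IntV(x), IntV(y)) => Some(java.lang.Integer.compare(x, y))
    case (Date(x), Date(y)) => Some(x.compareTo(y))
    case _                  => None
  }

  // Assumption: a missing attribute or a type mismatch makes the clause false.
  def holds(c: Clause, attrs: Map[String, Value]): Boolean =
    attrs.get(c.attr).flatMap(compare(_, c.fixed)).exists { n =>
      c.op match {
        case Eq => n == 0
        case Ge => n >= 0
        case Le => n <= 0
      }
    }

  // A device is in the profile only if every clause holds.
  def matches(p: Profile, attrs: Map[String, Value]): Boolean =
    p.clauses.forall(holds(_, attrs))

  // The wantsPs4 example from the slide.
  val wantsPs4: Profile = Profile("wantsPs4", List(
    Clause("birthdate", Ge, Date(LocalDate.parse("1978-01-01"))),
    Clause("ownsPs3", Eq, Bool(true)),
    Clause("ownsPs4", Eq, Bool(false))))
}
```

Because a profile is a pure function of the attribute map, it can be defined long after the attributes were recorded and still be evaluated against them.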
Data Modeling
● For nonrelational data stores, you need to know what your queries are before you store data
● Probably true of relational databases as well, but they let you get away with it
● Answering queries via primary key is ideal
● Cassandra has 2 parts to a primary key lookup: partitioning (by hash), then clustering (by order)
Use Case 1: Triggered Messaging
● When a device breaches a geofence, check to see if it is in a profile, then send a promotion
● e.g. device is near a store, and is in the wantsPs4 profile, tell it there are Ps4s in stock
● Latency is important
● Query: Given a device, which profiles is it in?
Use Case 2: Scheduled Messaging
● At some date and time, find all the devices in a given profile, and send them a promotion
● e.g. send all devices in the wantsPs4 profile a message telling them Ps4 is out of stock for months, but Xbox One is on sale cheap
● Throughput is more important than latency
● Query: Given a profile, which devices are in it?
Use Case 3: Historical Analytics
● Marketers may want to analyze past data based on attributes that were known at that time, but not included in profiles at that time
● In other words, we need to know raw facts (attributes), not just derived conclusions (profile membership)
● Query: Given a device and time, what were the attributes for that device at that time?
Brainstorming
● Need to answer 3 questions:
  ● given Device, get Profiles
  ● given Profile, get Devices
  ● given (Device, Time), get Attributes
given (Device, Time), get Attributes

create table attributes (
  brandCode ascii,
  deviceId ascii,
  unixtime bigint,
  attrs blob,
  primary key ((brandCode, deviceId), unixtime)
) with compact storage
  and clustering order by (unixtime desc);

select attrs from attributes
  where brandCode = ? and deviceId = ? and unixtime <= ?
  limit 1;
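To show why this table answers the point-in-time query with a single primary key lookup, here is a small in-memory Scala model of it. It is a sketch, not Cassandra code: `Table`, `insert`, and `attrsAt` are hypothetical names, and `attrs` is a plain `String` here rather than the encrypted blob the schema stores.

```scala
import scala.collection.immutable.TreeMap

object AttributeHistorySketch {
  // Partition key: (brandCode, deviceId).
  type PartitionKey = (String, String)

  // Rows inside a partition are ordered newest first,
  // mirroring "clustering order by (unixtime desc)".
  private val newestFirst: Ordering[Long] = Ordering.Long.reverse

  final case class Table(partitions: Map[PartitionKey, TreeMap[Long, String]]) {
    def insert(brand: String, device: String, unixtime: Long, attrs: String): Table = {
      val key = (brand, device)
      val rows = partitions.getOrElse(key, TreeMap.empty[Long, String](newestFirst))
      Table(partitions.updated(key, rows.updated(unixtime, attrs)))
    }

    // Models: where brandCode = ? and deviceId = ? and unixtime <= ? limit 1
    // Step 1: find the partition. Step 2: seek to asOf within the ordered rows
    // and take the first row, which is the newest one not after asOf.
    def attrsAt(brand: String, device: String, asOf: Long): Option[String] =
      partitions.get((brand, device)).flatMap { rows =>
        val it = rows.iteratorFrom(asOf)
        if (it.hasNext) Some(it.next()._2) else None
      }
  }

  val emptyTable: Table = Table(Map.empty)
}
```

With descending order, the "latest attributes" query is the same seek without the time bound, which is why the next slide can reuse this table.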
given Device, get Profiles

select attrs from attributes
  where brandCode = ? and deviceId = ?
  limit 1;

Then, in code, filter the (relatively small) set of profiles based on whether attrs match each one
given Profile, get Devices

create table profile_devices (
  brandCode ascii,
  profileId bigint,
  deviceId ascii,
  primary key ((brandCode, profileId), deviceId)
) with compact storage;

select deviceId from profile_devices
  where brandCode = ? and profileId = ?;
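The slides do not show how `profile_devices` is populated. As a rough illustration of the layout only, the Scala sketch below groups hypothetical membership rows by the partition key (brandCode, profileId) and keeps device IDs sorted within each partition, the way a clustering column would. `Membership`, `toPartitions`, and `devicesIn` are made-up names for this sketch.

```scala
import scala.collection.immutable.{SortedSet, TreeSet}

object ProfileDevicesSketch {
  // One row of the profile_devices table (hypothetical in-memory form).
  final case class Membership(brandCode: String, profileId: Long, deviceId: String)

  // Group rows by partition key; within a partition, device IDs are
  // de-duplicated and kept in sorted order.
  def toPartitions(rows: Seq[Membership]): Map[(String, Long), SortedSet[String]] =
    rows.groupBy(m => (m.brandCode, m.profileId)).map {
      case (key, ms) => key -> (TreeSet.empty[String] ++ ms.map(_.deviceId))
    }

  // Models: select deviceId from profile_devices where brandCode = ? and profileId = ?
  def devicesIn(parts: Map[(String, Long), SortedSet[String]],
                brandCode: String, profileId: Long): List[String] =
    parts.get((brandCode, profileId)).map(_.toList).getOrElse(Nil)
}
```

Reading a whole profile is then a sequential scan of one partition, which suits the throughput-oriented scheduled messaging use case.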
Why Spark?
● Scala
● Distributed computing that will interop with Hadoop IO (and thus Cassandra), but doesn't depend on HDFS
● Approachable codebase (20kloc, vs 200kloc+)
● Interactive shell
● Fast to write, fast to run
Why Spark?

// spark is the SparkContext
val file = spark.textFile("hdfs://...")
val counts = file.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
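For readers without a cluster at hand, the same word count pipeline can be approximated with plain Scala collections. This is a local stand-in, not Spark: `groupBy` followed by a sum plays the role of `reduceByKey(_ + _)`, and `wordCount` is a name chosen for this sketch.

```scala
object WordCountSketch {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" ").toSeq) // one element per word
      .map(word => (word, 1))                 // pair each word with a count of 1
      .groupBy(_._1)                          // collect the pairs for each word
      .map { case (word, pairs) => word -> pairs.map(_._2).sum } // add up the counts
}
```

The shape of the code is nearly identical to the Spark version, which is part of the "fast to write" argument: the RDD API deliberately resembles the Scala collections API.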
Deployment
● http://spark.incubator.apache.org/docs/0.8.1/spark-standalone.html
● Spark worker processes on Cassandra storage nodes
● Gives data locality
● Spark master process on Cassandra monitoring machine
● Cluster start/stop done via ssh key from master
● Submit jobs to master URL
● Consider pre-installing dependency jars on workers
● Must use exact same binary version of Scala throughout
Spark / Cassandra Interop

// from CassandraTest.scala in the Spark distro
// (sc is the SparkContext and job is the Hadoop Job configured earlier in that file)
import java.nio.ByteBuffer
import java.util.SortedMap
import org.apache.cassandra.db.IColumn
import org.apache.cassandra.hadoop.ColumnFamilyInputFormat
import org.apache.cassandra.utils.ByteBufferUtil

val casRdd = sc.newAPIHadoopRDD(job.getConfiguration(),
  classOf[ColumnFamilyInputFormat], classOf[ByteBuffer],
  classOf[SortedMap[ByteBuffer, IColumn]])

// Let us first get all the paragraphs from the retrieved rows
val paraRdd = casRdd.map {
  case (key, value) => {
    ByteBufferUtil.string(value.get(ByteBufferUtil.bytes("para")).value())
  }
}

// Let's get the word count in paras
val counts = paraRdd.flatMap(p => p.split(" ")).
  map(word => (word, 1)).
  reduceByKey(_ + _)

counts.collect().foreach {
  case (word, count) => println(word + ":" + count)
}
Spark Resources
● Project homepage: http://spark.incubator.apache.org/
● AMP Camp tutorials: http://ampcamp.berkeley.edu/
● Introduction to Spark internals: http://www.youtube.com/watch?v=49Hr5xZyTEA

Editor's Notes

  1. Short Script: “All of this is made possible by the advanced technology we’ve made available in the Digby Mobile Suite, an enterprise-grade and PCI certified SaaS platform that is our focus as a company. Our customer, using Digby Services or in self-implementation mode, can use each of these products, in blue, to support the building of applications. Each of them is modular and works with the others, all of them connected to the base platform and a collection of shared services and integration points. The Digby Mobile Console, as mentioned before, is the place where customers can manage the products they have deployed and access relevant analytics both within each product and across the entire solution. This Digby Mobile Suite allows for the deployment of powerful mobile websites and rich applications quickly, efficiently, and with less risk than any custom-built work. It handles cross-platform differences elegantly. And in a space that is constantly changing and innovating, each of these products has its own roadmap where we continue to handle any platform changes and bring innovations to market that make the products more powerful over time. Additionally, future products mean that customers can extend the capabilities of their mobile footprint even more widely, ensuring they are keeping pace with consumer expectations.”