Storing Time Series Metrics

         Implementing Multi-Dimensional
           Aggregate Composites with
           Counters For Reporting
         /*
         Joe Stein
         http://www.linkedin.com/in/charmalloc
         @allthingshadoop
         @cassandranosql
         @allthingsscala
         @charmalloc
         */

         Sample code project up at
           https://github.com/joestein/apophis



Medialets

What we do




Medialets
• Largest deployment of rich media ads for mobile devices
• Over 300,000,000 devices supported
• 3-4 TB of new data every day
• Thousands of services in production
• Hundreds of Thousands of simultaneous requests per second
• Keeping track of what is and was going on when and where
  used to be difficult before we started using Cassandra
• What do I do for Medialets?
   – Chief Architect and Head of Server Engineering
     Development & Operations.




What does the schema look like?

CREATE COLUMN FAMILY ByDay
WITH default_validation_class=CounterColumnType
AND key_validation_class=UTF8Type AND comparator=UTF8Type;

CREATE COLUMN FAMILY ByHour
WITH default_validation_class=CounterColumnType
AND key_validation_class=UTF8Type AND comparator=UTF8Type;

CREATE COLUMN FAMILY ByMinute
WITH default_validation_class=CounterColumnType
AND key_validation_class=UTF8Type AND comparator=UTF8Type;

CREATE COLUMN FAMILY BySecond
WITH default_validation_class=CounterColumnType
AND key_validation_class=UTF8Type AND comparator=UTF8Type;

Column families hold your rows of data. The key of each row within each
column family is the time period you are dealing with. So an "event"
occurring at 10/20/2011 11:22:41 will become 4 rows:

BySecond = 20111020112241
ByMinute = 201110201122
ByHour = 2011102011
ByDay = 20111020
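The mapping from one event timestamp to one row key per column family can be sketched as follows. This is a minimal illustration, not code from the apophis project: the `TimeBuckets` object and `rowKeys` method are hypothetical names, and the input format is assumed to match the MM/DD/YYYY example timestamp on this slide.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Hypothetical helper for illustration; not part of apophis.
object TimeBuckets {
  // Assumed input format, matching "10/20/2011 11:22:41".
  private val input = DateTimeFormatter.ofPattern("MM/dd/yyyy HH:mm:ss")

  // Column family name -> pattern used to build that family's row key.
  private val patterns = Seq(
    "BySecond" -> "yyyyMMddHHmmss",
    "ByMinute" -> "yyyyMMddHHmm",
    "ByHour"   -> "yyyyMMddHH",
    "ByDay"    -> "yyyyMMdd"
  )

  // Returns one row key per column family for a single event timestamp.
  def rowKeys(timestamp: String): Map[String, String] = {
    val t = LocalDateTime.parse(timestamp, input)
    patterns.map { case (cf, p) =>
      cf -> t.format(DateTimeFormatter.ofPattern(p))
    }.toMap
  }
}
```

Calling `TimeBuckets.rowKeys("10/20/2011 11:22:41")` yields the four keys for that event, each one a prefix of the next finer one.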




Why multiple column families?
http://www.datastax.com/docs/1.0/configuration/storage_configuration




Ok now how do we keep track of what?
             Let's set up a quick example data set first

• The Animal Logger – fictitious logger of the world around us
  – animal
  – food
  – sound
  – home

• YYYY/MM/DD HH:MM:SS GET /sample?animal=X&food=Y
  – animal=duck&sound=quack&home=pond
  – animal=cat&sound=meow&home=house
  – animal=cat&sound=meow&home=street
  – animal=pigeon&sound=coo&home=street



Now what?
      Columns babe, columns make your aggregates work

• Set up your code for the columns you want aggregated
  – animal=
  – animal#sound=
  – animal#home=
  – animal#food=
  – animal#food#home=
  – animal#food#sound=
  – animal#sound#home=
  – food#sound=
  – home#food=
  – sound#animal=
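One way to turn the aggregate definitions above into concrete column names is sketched below. The `AggregateColumns` object and its methods are hypothetical, and the rule that an aggregate is skipped when the event lacks one of its dimensions is an assumption rather than something the slides state.

```scala
// Hypothetical helper for illustration; not part of apophis.
object AggregateColumns {
  // Each aggregate is an ordered list of dimension names (a subset of the slide's list).
  val aggregates: Seq[Seq[String]] = Seq(
    Seq("animal"),
    Seq("animal", "sound"),
    Seq("animal", "home"),
    Seq("animal", "sound", "home"),
    Seq("sound", "animal")
  )

  // Parses "animal=duck&sound=quack" into Map(animal -> duck, sound -> quack).
  def parseQuery(query: String): Map[String, String] =
    query.split("&").toSeq.flatMap { pair =>
      pair.split("=", 2) match {
        case Array(k, v) if v.nonEmpty => Some(k -> v)
        case _                         => None
      }
    }.toMap

  // Builds "animal#sound=duck#quack"; None if the event lacks a dimension (assumed rule).
  def columnName(dims: Seq[String], event: Map[String, String]): Option[String] =
    if (dims.forall(event.contains))
      Some(dims.mkString("#") + "=" + dims.map(event).mkString("#"))
    else None

  // All column names this event contributes to.
  def columnNames(event: Map[String, String]): Seq[String] =
    aggregates.flatMap(dims => columnName(dims, event))
}
```

Keeping the dimension names on the left of `=` and the values on the right means every column for one aggregate shares a common prefix.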



Inserting data
                   Column aggregate concatenated with values
      2011/10/29 11:22:43 GET /sample?animal=duck&home=pond&sound=quack
•   mutator.insertCounter("20111029112243", "BySecond",
    HFactory.createCounterColumn("animal#sound#home=duck#quack#pond", 1))
•   mutator.insertCounter("20111029112243", "BySecond",
    HFactory.createCounterColumn("animal#home=duck#pond", 1))
•   mutator.insertCounter("20111029112243", "BySecond", HFactory.createCounterColumn("animal=duck", 1))

•   mutator.insertCounter("201110291122", "ByMinute",
    HFactory.createCounterColumn("animal#sound#home=duck#quack#pond", 1))
•   mutator.insertCounter("201110291122", "ByMinute",
    HFactory.createCounterColumn("animal#home=duck#pond", 1))
•   mutator.insertCounter("201110291122", "ByMinute", HFactory.createCounterColumn("animal=duck", 1))

•   mutator.insertCounter("2011102911", "ByHour",
    HFactory.createCounterColumn("animal#sound#home=duck#quack#pond", 1))
•   mutator.insertCounter("2011102911", "ByHour",
    HFactory.createCounterColumn("animal#home=duck#pond", 1))
•   mutator.insertCounter("2011102911", "ByHour", HFactory.createCounterColumn("animal=duck", 1))

•   mutator.insertCounter("20111029", "ByDay",
    HFactory.createCounterColumn("animal#sound#home=duck#quack#pond", 1))
•   mutator.insertCounter("20111029", "ByDay", HFactory.createCounterColumn("animal#home=duck#pond", 1))
•   mutator.insertCounter("20111029", "ByDay", HFactory.createCounterColumn("animal=duck", 1))
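The fan-out above (every time bucket multiplied by every aggregate column) can be sketched without a cluster. In this illustration a mutable map stands in for the counter column families; `CounterStore`, `record` and `count` are hypothetical names, and no Cassandra or Hector call is made.

```scala
import scala.collection.mutable

// In-memory stand-in for the counter column families; not a Cassandra client.
class CounterStore {
  // (column family, row key, column name) -> count
  val counters: mutable.Map[(String, String, String), Long] =
    mutable.Map.empty[(String, String, String), Long].withDefaultValue(0L)

  // One event increments every aggregate column in every time bucket.
  def record(rowKeys: Map[String, String], columns: Seq[String]): Unit =
    for ((cf, rowKey) <- rowKeys; column <- columns)
      counters((cf, rowKey, column)) += 1L

  def count(cf: String, rowKey: String, column: String): Long =
    counters((cf, rowKey, column))
}
```

With 4 buckets and 3 aggregate columns, a single event touches 12 counters, which is the same count as the list of calls above. Because the writes are increments, no read is needed before writing.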




The implementation: it's functional
       kind of like "it's electric" but without the boogie woogie woogie

def r(columnName: String): Unit = {
  aggregateKeys.foreach { tuple: (ColumnFamily, String) =>
    val (columnFamily, row) = tuple
    if (row != null && row.size > 0)
      rows add (columnFamily -> row has columnName inc) // increment the counter
  }
}

def ccAnimal(c: (String) => Unit) = {
  c(aggregateColumnNames("Animal") + animal)
}

// rows we are going to write to
aggregateKeys(KEYSPACE \ "ByDay") = day
aggregateKeys(KEYSPACE \ "ByHour") = hour
aggregateKeys(KEYSPACE \ "ByMinute") = minute

aggregateColumnNames("Animal") = "animal="

ccAnimal(r)


Retrieving Data
                      MultigetSliceCounterQuery

•   setColumnFamily("ByDay")
•   setKeys("20111029")
•   setRange("animal#sound=", "animal#sound=~", false, 1000)
•   We will get all animals and all of their sounds and counts for
    that day

•   setRange("sound#animal=purr#", "sound#animal=purr#~", false, 1000)
•   We will get all animals that purr and their count


• What is with the tilde?


Sort for success
Not magic, just Cassandra
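A sketch of why the tilde works: with a UTF8Type comparator the columns in a row are kept sorted, and `~` (0x7E) is the last printable ASCII character, so `prefix` to `prefix~` brackets every column that starts with that prefix, provided the values are plain ASCII. Below, a sorted `TreeMap` stands in for one Cassandra row; `SliceSketch` and `slice` are hypothetical names, and the counts are copied from the sample output elsewhere in this deck.

```scala
import scala.collection.immutable.TreeMap

// A sorted map standing in for one row of ByDay; not a Cassandra client.
object SliceSketch {
  val byDay: TreeMap[String, Long] = TreeMap(
    "animal#sound#home=cat#purr#house" -> 70L,
    "animal#sound=cat#purr"            -> 70L,
    "animal#sound=dog#woof"            -> 20L,
    "animal#sound=duck#quack"          -> 98L,
    "animal#sound=lion#purr"           -> 70L,
    "animal=cat"                       -> 70L,
    "sound#animal=purr#cat"            -> 42L,
    "sound#animal=purr#lion"           -> 42L,
    "sound#animal=quack#duck"          -> 43L
  )

  // Columns from `prefix` up to `prefix~`, in sorted order.
  // Note: TreeMap.range excludes the upper bound itself, which only matters
  // if a column is named exactly prefix + "~".
  def slice(row: TreeMap[String, Long], prefix: String): Seq[(String, Long)] =
    row.range(prefix, prefix + "~").toSeq
}
```

`SliceSketch.slice(SliceSketch.byDay, "animal#sound=")` returns only the four `animal#sound=` columns: `animal#sound#home=...` sorts before the prefix because `#` is less than `=`, and `animal=...` sorts after the upper bound for the same reason.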




What it looks like in Cassandra
val sample1: String = "10/12/2011 11:22:33   GET   /sample?animal=duck&sound=quack&home=pond"
val sample4: String = "10/12/2011 11:22:33   GET   /sample?animal=cat&sound=purr&home=house"
val sample5: String = "10/12/2011 11:22:33   GET   /sample?animal=lion&sound=purr&home=zoo"
val sample6: String = "10/12/2011 11:22:33   GET   /sample?animal=dog&sound=woof&home=street"

[default@FixtureTestApophis] get ByDay[20111012];
=> (counter=animal#sound#home=cat#purr#house, value=70)
=> (counter=animal#sound#home=dog#woof#street, value=20)
=> (counter=animal#sound#home=duck#quack#pond, value=98)
=> (counter=animal#sound#home=lion#purr#zoo, value=70)
=> (counter=animal#sound=cat#purr, value=70)
=> (counter=animal#sound=dog#woof, value=20)
=> (counter=animal#sound=duck#quack, value=98)
=> (counter=animal#sound=lion#purr, value=70)
=> (counter=animal=cat, value=70)
=> (counter=animal=dog, value=20)
=> (counter=animal=duck, value=98)
=> (counter=animal=lion, value=70)
=> (counter=sound#animal=purr#cat, value=42)
=> (counter=sound#animal=purr#lion, value=42)
=> (counter=sound#animal=quack#duck, value=43)
=> (counter=sound#animal=woof#dog, value=20)
=> (counter=total=, value=258)

https://github.com/joestein/apophis


A few more things about retrieving data

• You need to start backwards from here.
• If you want to do things ad hoc then map/reduce is better
• Sometimes more rows is better, allowing more nodes to do work
   – If you need to look at 100,000 metrics it is better to pull this out
     of 100 rows than out of 1
   – Don't be afraid to make CF and composite keys out of Time +
     Aggregate data
       • 20111023#animal=duck
       • This could be the row that holds ALL of the animal duck
         information for that day, if you want to look at 100 animals at
         once with 1000 metrics for each per time period, this is the
         way to go
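The Time + Aggregate row key from the last bullet can be sketched as below. `CompositeRowKeys` and its methods are hypothetical names; the only thing taken from the slide is the key shape `20111023#animal=duck`.

```scala
// Hypothetical helper for illustration; not part of apophis.
object CompositeRowKeys {
  // Builds a row key such as "20111023#animal=duck".
  def rowKey(period: String, dimension: String, value: String): String =
    s"$period#$dimension=$value"

  // One row key per value, e.g. to fetch many animals for the same day at once.
  def rowKeys(period: String, dimension: String, values: Seq[String]): Seq[String] =
    values.map(v => rowKey(period, dimension, v))
}
```

Fetching 100 animals for a day then becomes a multiget over 100 row keys, which can be served by more nodes than a single wide row could.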




Q&A



Medialets
The rich media
ad platform for mobile.
            connect@medialets.com
            www.medialets.com/showcase





GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 

Storing Time Series Metrics With Cassandra and Composite Columns

  • 1. Storing Time Series Metrics: Implementing Multi-Dimensional Aggregate Composites with Counters for Reporting
      – Joe Stein, http://www.linkedin.com/in/charmalloc
      – @allthingshadoop, @cassandranosql, @allthingsscala, @charmalloc
      – Sample code project up at https://github.com/joestein/apophis
  • 3. Medialets
      – Largest deployment of rich media ads for mobile devices
      – Over 300,000,000 devices supported
      – 3-4 TB of new data every day
      – Thousands of services in production
      – Hundreds of thousands of simultaneous requests per second
      – Keeping track of what is and was going on, when and where, used to be difficult before we started using Cassandra
      – What do I do for Medialets? Chief Architect and Head of Server Engineering Development & Operations.
  • 4. What does the schema look like?
      Column families hold your rows of data. The row key within each column family is the time period you are dealing with, so an "event" occurring at 10/20/2011 11:22:41 becomes 4 rows, one per column family:
          BySecond = 20111020112241
          ByMinute = 201110201122
          ByHour   = 2011102011
          ByDay    = 20111020
      The four column families are declared identically, as counter column families with UTF8 keys and column names:
          CREATE COLUMN FAMILY ByDay
          WITH default_validation_class=CounterColumnType
          AND key_validation_class=UTF8Type AND comparator=UTF8Type;

          CREATE COLUMN FAMILY ByHour
          WITH default_validation_class=CounterColumnType
          AND key_validation_class=UTF8Type AND comparator=UTF8Type;

          CREATE COLUMN FAMILY ByMinute
          WITH default_validation_class=CounterColumnType
          AND key_validation_class=UTF8Type AND comparator=UTF8Type;

          CREATE COLUMN FAMILY BySecond
          WITH default_validation_class=CounterColumnType
          AND key_validation_class=UTF8Type AND comparator=UTF8Type;
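The mapping from one event timestamp to its four row keys can be sketched in a few lines. This is a minimal illustration using only java.time; the object name `TimeBuckets` and the method `rowKeys` are hypothetical and are not taken from the apophis project.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

object TimeBuckets {
  // Column family name -> date pattern for its row key.
  // The column family names come from the schema; the patterns are an assumption
  // that reproduces the key layout shown on the slide.
  val patterns: Seq[(String, String)] = Seq(
    "BySecond" -> "yyyyMMddHHmmss",
    "ByMinute" -> "yyyyMMddHHmm",
    "ByHour"   -> "yyyyMMddHH",
    "ByDay"    -> "yyyyMMdd"
  )

  // Returns one (columnFamily, rowKey) pair per time granularity.
  def rowKeys(ts: LocalDateTime): Seq[(String, String)] =
    patterns.map { case (cf, p) => cf -> ts.format(DateTimeFormatter.ofPattern(p)) }
}
```

For the event at 10/20/2011 11:22:41, `TimeBuckets.rowKeys(LocalDateTime.of(2011, 10, 20, 11, 22, 41))` yields `20111020112241`, `201110201122`, `2011102011` and `20111020` for BySecond, ByMinute, ByHour and ByDay respectively.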
  • 5. Why multiple column families? See http://www.datastax.com/docs/1.0/configuration/storage_configuration
  • 6. OK, now how do we keep track of what? Let's set up a quick example data set first.
      – The Animal Logger: a fictitious logger of the world around us, with four dimensions
          – animal
          – food
          – sound
          – home
      – Log line format: YYYY/MM/DD HH:MM:SS GET /sample?animal=X&food=Y
      – Sample query strings
          – animal=duck&sound=quack&home=pond
          – animal=cat&sound=meow&home=house
          – animal=cat&sound=meow&home=street
          – animal=pigeon&sound=coo&home=street
  • 7. Now what? Columns, babe: columns make your aggregates work.
      – Set up your code for the column combinations you want aggregated
          – animal=
          – animal#sound=
          – animal#home=
          – animal#food=
          – animal#food#home=
          – animal#food#sound=
          – animal#sound#home=
          – food#sound=
          – home#food=
          – sound#animal=
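One way to turn the list of aggregates into concrete column names is to join the dimension names with `#`, append `=`, and then join the matching values with `#`. The sketch below assumes an event is available as a `Map` of dimension to value; `AggregateColumns`, `combos` and `columnNames` are hypothetical names, and only a subset of the combinations above is listed.

```scala
object AggregateColumns {
  // Dimension combinations to aggregate on (a subset of the slide's list).
  val combos: Seq[Seq[String]] = Seq(
    Seq("animal"),
    Seq("animal", "sound"),
    Seq("animal", "home"),
    Seq("animal", "sound", "home"),
    Seq("sound", "animal")
  )

  // Builds names of the form "dim1#dim2=value1#value2".
  // A combination is skipped when the event lacks one of its dimensions.
  def columnNames(event: Map[String, String]): Seq[String] =
    combos.flatMap { dims =>
      val values = dims.flatMap(d => event.get(d))
      if (values.size == dims.size) Some(dims.mkString("#") + "=" + values.mkString("#"))
      else None
    }
}
```

For `Map("animal" -> "duck", "sound" -> "quack", "home" -> "pond")` this produces `animal=duck`, `animal#sound=duck#quack`, `animal#home=duck#pond`, `animal#sound#home=duck#quack#pond` and `sound#animal=quack#duck`. Keeping the dimension order inside each combination fixed matters, because the order determines how the columns sort and therefore which range queries are possible later.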
  • 8. Inserting data
      Each column name is the aggregate name concatenated with its values. For the request
          2011/10/29 11:22:43 GET /sample?animal=duck&home=pond&sound=quack
      the same counters are incremented in every column family, under the row key for that time period:
          mutator.insertCounter("20111029112243", "BySecond", HFactory.createCounterColumn("animal#sound#home=duck#quack#pond", 1))
          mutator.insertCounter("20111029112243", "BySecond", HFactory.createCounterColumn("animal#home=duck#pond", 1))
          mutator.insertCounter("20111029112243", "BySecond", HFactory.createCounterColumn("animal=duck", 1))
          mutator.insertCounter("201110291122", "ByMinute", HFactory.createCounterColumn("animal#sound#home=duck#quack#pond", 1))
          mutator.insertCounter("201110291122", "ByMinute", HFactory.createCounterColumn("animal#home=duck#pond", 1))
          mutator.insertCounter("201110291122", "ByMinute", HFactory.createCounterColumn("animal=duck", 1))
          mutator.insertCounter("2011102911", "ByHour", HFactory.createCounterColumn("animal#sound#home=duck#quack#pond", 1))
          mutator.insertCounter("2011102911", "ByHour", HFactory.createCounterColumn("animal#home=duck#pond", 1))
          mutator.insertCounter("2011102911", "ByHour", HFactory.createCounterColumn("animal=duck", 1))
          mutator.insertCounter("20111029", "ByDay", HFactory.createCounterColumn("animal#sound#home=duck#quack#pond", 1))
          mutator.insertCounter("20111029", "ByDay", HFactory.createCounterColumn("animal#home=duck#pond", 1))
          mutator.insertCounter("20111029", "ByDay", HFactory.createCounterColumn("animal=duck", 1))
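The twelve inserts above are the cross product of four time-bucket rows and three aggregate columns. The fan-out can be sketched without a cluster by letting an in-memory map stand in for the counter columns. `CounterStore` and `record` are hypothetical names; a real implementation would issue counter increments through the client instead of updating a map.

```scala
import scala.collection.mutable

// In-memory stand-in for counter columns, keyed by (columnFamily, rowKey, columnName).
// This is an illustration only: it does not talk to Cassandra.
class CounterStore {
  val counters: mutable.Map[(String, String, String), Long] =
    mutable.Map.empty[(String, String, String), Long].withDefaultValue(0L)

  // Increments every aggregate column in every time-bucket row for one event.
  def record(rowKeys: Seq[(String, String)], columnNames: Seq[String]): Unit =
    for ((cf, key) <- rowKeys; col <- columnNames)
      counters((cf, key, col)) += 1L
}
```

Recording one event with four row keys and three column names touches twelve counters, which is why the number of aggregate combinations should be chosen deliberately: every extra combination multiplies the write load by the number of time granularities.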
  • 9. The implementation: it's functional, kind of like "it's electric" but without the boogie woogie oogie
          def r(columnName: String): Unit = {
            aggregateKeys.foreach { tuple: (ColumnFamily, String) => {
                val (columnFamily, row) = tuple
                if (row != null && row.size > 0)
                  rows add (columnFamily -> row has columnName inc) // increment the counter
              }
            }
          }

          def ccAnimal(c: (String) => Unit) = {
            c(aggregateColumnNames("Animal") + animal)
          }

          // rows we are going to write to
          aggregateKeys(KEYSPACE \ "ByDay") = day
          aggregateKeys(KEYSPACE \ "ByHour") = hour
          aggregateKeys(KEYSPACE \ "ByMinute") = minute

          aggregateColumnNames("Animal") = "animal="

          ccAnimal(r)
  • 10. Retrieving data
      MultigetSliceCounterQuery
          setColumnFamily("ByDay")
          setKeys("20111029")
          setRange("animal#sound=", "animal#sound=~", false, 1000)
      – This returns all animals, all of their sounds, and the counts for that day.
          setRange("sound#animal=purr#", "sound#animal=purr#~", false, 1000)
      – This returns all animals that purr and their counts.
      – What is with the tilde?
  • 11. Sort for success: not magic, just Cassandra
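The tilde works because column names within a row are kept sorted by the comparator, and `~` (0x7E) is the last printable ASCII character. It sorts after every letter, digit, `#` and `=`, so the range from a prefix to that prefix plus `~` brackets every column name that starts with the prefix, as long as the values are plain ASCII. The sketch below imitates such a slice over an in-memory map; `PrefixSlice` and `slice` are hypothetical names, and plain string ordering is assumed to match the UTF8Type ordering for ASCII names.

```scala
object PrefixSlice {
  // Inclusive slice [start, finish] over column names sorted as plain strings.
  // For ASCII-only names this matches byte-wise ordering.
  def slice(columns: Map[String, Long], start: String, finish: String): Seq[(String, Long)] =
    columns.toSeq.sortBy(_._1).dropWhile(_._1 < start).takeWhile(_._1 <= finish)
}
```

Given a row holding `animal#sound#home=cat#purr#house`, `animal#sound=cat#purr`, `animal#sound=dog#woof` and `animal=cat`, the call `slice(row, "animal#sound=", "animal#sound=~")` returns only the two `animal#sound=` columns. The three-dimension column sorts before the start of the range because `#` (0x23) is lower than `=` (0x3D), and `animal=cat` sorts after the end of the range for the same reason.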
  • 12. What it looks like in Cassandra
          val sample1: String = "10/12/2011 11:22:33 GET /sample?animal=duck&sound=quack&home=pond"
          val sample4: String = "10/12/2011 11:22:33 GET /sample?animal=cat&sound=purr&home=house"
          val sample5: String = "10/12/2011 11:22:33 GET /sample?animal=lion&sound=purr&home=zoo"
          val sample6: String = "10/12/2011 11:22:33 GET /sample?animal=dog&sound=woof&home=street"

          [default@FixtureTestApophis] get ByDay[20111012];
          => (counter=animal#sound#home=cat#purr#house, value=70)
          => (counter=animal#sound#home=dog#woof#street, value=20)
          => (counter=animal#sound#home=duck#quack#pond, value=98)
          => (counter=animal#sound#home=lion#purr#zoo, value=70)
          => (counter=animal#sound=cat#purr, value=70)
          => (counter=animal#sound=dog#woof, value=20)
          => (counter=animal#sound=duck#quack, value=98)
          => (counter=animal#sound=lion#purr, value=70)
          => (counter=animal=cat, value=70)
          => (counter=animal=dog, value=20)
          => (counter=animal=duck, value=98)
          => (counter=animal=lion, value=70)
          => (counter=sound#animal=purr#cat, value=42)
          => (counter=sound#animal=purr#lion, value=42)
          => (counter=sound#animal=quack#duck, value=43)
          => (counter=sound#animal=woof#dog, value=20)
          => (counter=total=, value=258)
      https://github.com/joestein/apophis
  • 13. A few more things about retrieving data
      – You need to start backwards from here.
      – If you want to do things ad hoc, then map/reduce is better.
      – Sometimes more rows is better, allowing more nodes to do work
          – If you need to look at 100,000 metrics, it is better to pull them out of 100 rows than out of 1
          – Don't be afraid to make column families and composite keys out of time + aggregate data
              – 20111023#animal=duck
              – This could be the row that holds ALL of the animal duck information for that day. If you want to look at 100 animals at once with 1000 metrics for each per time period, this is the way to go.
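A composite row key of this kind is just the time bucket, a `#`, and the aggregate. The following sketch builds one key per value so that a single multiget can spread the read over many rows, and therefore over more nodes. `CompositeRowKeys` and `rowKeys` are hypothetical names, and the key layout is assumed from the `20111023#animal=duck` example.

```scala
object CompositeRowKeys {
  // Builds one row key per aggregate value, e.g. "20111023#animal=duck".
  def rowKeys(day: String, dimension: String, values: Seq[String]): Seq[String] =
    values.map(v => s"$day#$dimension=$v")
}
```

Calling `CompositeRowKeys.rowKeys("20111023", "animal", Seq("duck", "cat"))` gives `20111023#animal=duck` and `20111023#animal=cat`, which could then be passed together as the key set of a multiget query.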
  • 14. Q&A. Medialets: the rich media ad platform for mobile. connect@medialets.com, www.medialets.com/showcase