SlideShare a Scribd company logo
1 of 59
Download to read offline
Clojure at BackType
How we learned to stop worrying and love the
               parentheses



                                    Nathan Marz
                                     BackType
                                    @nathanmarz
BackType

Data Services (APIs)

Social Media Analytics
      Dashboard
APIs
• Conversational graph for url
• Comment search
• #Tweets / URL
• Influence scores
• Top sites
• Trending links stream
• etc.
Clojure at BackType
URL Profiles
Site comparisons
Influencer Profiles
Twitter Account
   Analytics
Topic Analysis
Topic Analysis
BackType’s Challenges
BackType’s Challenges

 Complex analytics
BackType’s Challenges

 Complex analytics
on lots of data (> 30TB)
BackType’s Challenges

 Complex analytics
on lots of data (> 30TB)
      in realtime
Clojure at BackType

• Cascalog
• ElephantDB
• Storm
Let’s build an app
Let’s build an app
Cascalog

               Cascalog   Variables and logic
Abstraction




              Cascading   Tuples, data workflows

                              Key/value pairs,
              MapReduce        aggregation
Cascalog basics




 The “age” dataset
Cascalog basics
Cascalog basics




Define and
execute a query
Cascalog basics


        Where to
        emit results



Define and
execute a query
Cascalog basics


        Where to
        emit results

                   Output variables
Define and
execute a query
Cascalog basics


        Where to                      “Predicates”: constrain
        emit results                  the output variables

                   Output variables
Define and
execute a query
Predicates
Predicates


Input fields
Predicates


Input fields   Output fields
Predicates



Fields can be constants or variables
Predicates



Fields can be constants or variables

 Variables are prefixed with ? or !
Predicates
Predicates
• Functions
• Filters
• Aggregators
• Generators: finite sources of tuples
Example #1



    Generator   Filter
Example #2



Generator        Function
Example #3



Generator   Aggregator   Filter
Join example
Join example




     Triggers a join
Join example
Join example




Joins are an implementation detail
Cascalog demo!
Composability




 “Predicate macro”
Composability

       expands to




Using a predicate macro
Contrast to Pig




Pig’s AVG is 300 lines of code
Let’s build an app
Graph Schema
                              Reshare: true
Gender: female
                                      Property
                                                       Tweet: 456
 Property
                                                 Reaction
                    Reactor                                            Reactor
                                  Tweet: 123

            Alice
                                                                            Bob
                                                            Property
                                    Property



                    Content: RT @bob                        Content: Data is fun!
                       Data is fun!
ElephantDB
                                   Shard 0
                                   Shard 1
                                   Shard 2       Distributed
Key/Value pairs
                                   Shard 3       Filesystem
                    Pre-shard      Shard 4
                   and index in
                                   Shard 5
                   MapReduce


                  Generation of domain of data
ElephantDB
DFS                       ElephantDB
                             Server
Shard 0
Shard 1
Shard 2                   ElephantDB
                             Server
Shard 3
Shard 4
Shard 5                   ElephantDB
                             Server


     Serving domain of data
Storm

Stream Processing

 Distributed RPC
Stream processing

• Automatically distributes computation
• Horizontally scalable
• Fault-tolerant
• Guarantees processing of messages
Stream processing

                         DB

Queue
                         DB


                         DB

         Storm cluster
Raw data   What is a query?


                          View
Tweets   What is a query?


                       # Tweets for
                          a URL
Tweets   What is a query?

                        Influence
                       Score for a
                         person
Raw data   Computing a query


              Fully precompute view   DB   Query
Raw data   Computing a query


              Do a live compute from scratch   Query
Computing a query

                                 DB
Raw data




           Precompute subviews        Compute query from
                                 DB                        Query
                                       intermediate dbs

                                 DB
Distributed RPC


Application                                                           Queue

              “I want to know X, and return the results to me at Y”
Distributed RPC
              DBs




Queue                     App queries




          Storm cluster
(BackType is hiring)
Questions?

More Related Content

What's hot

Storm presentation
Storm presentationStorm presentation
Storm presentationShyam Raj
 
Real-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using StormReal-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using StormNati Shalom
 
Using Simplicity to Make Hard Big Data Problems Easy
Using Simplicity to Make Hard Big Data Problems EasyUsing Simplicity to Make Hard Big Data Problems Easy
Using Simplicity to Make Hard Big Data Problems Easynathanmarz
 
Real-time streams and logs with Storm and Kafka
Real-time streams and logs with Storm and KafkaReal-time streams and logs with Storm and Kafka
Real-time streams and logs with Storm and KafkaAndrew Montalenti
 
Apache Storm Concepts
Apache Storm ConceptsApache Storm Concepts
Apache Storm ConceptsAndré Dias
 
Real-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and StormReal-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and StormJohn Georgiadis
 
Storm: The Real-Time Layer - GlueCon 2012
Storm: The Real-Time Layer  - GlueCon 2012Storm: The Real-Time Layer  - GlueCon 2012
Storm: The Real-Time Layer - GlueCon 2012Dan Lynn
 
Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopDataWorks Summit
 
Assigning Responsibility for Deteriorations in Video Quality with Henry Milne...
Assigning Responsibility for Deteriorations in Video Quality with Henry Milne...Assigning Responsibility for Deteriorations in Video Quality with Henry Milne...
Assigning Responsibility for Deteriorations in Video Quality with Henry Milne...Databricks
 
Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Robert Evans
 
Slide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormSlide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormMd. Shamsur Rahim
 
Analysis big data by use php with storm
Analysis big data by use php with stormAnalysis big data by use php with storm
Analysis big data by use php with storm毅 吕
 
Storm Real Time Computation
Storm Real Time ComputationStorm Real Time Computation
Storm Real Time ComputationSonal Raj
 
Introduction to Apache Storm - Concept & Example
Introduction to Apache Storm - Concept & ExampleIntroduction to Apache Storm - Concept & Example
Introduction to Apache Storm - Concept & ExampleDung Ngua
 
The inherent complexity of stream processing
The inherent complexity of stream processingThe inherent complexity of stream processing
The inherent complexity of stream processingnathanmarz
 
Multi-tenant Apache Storm as a service
Multi-tenant Apache Storm as a serviceMulti-tenant Apache Storm as a service
Multi-tenant Apache Storm as a serviceRobert Evans
 

What's hot (20)

Yahoo compares Storm and Spark
Yahoo compares Storm and SparkYahoo compares Storm and Spark
Yahoo compares Storm and Spark
 
Introduction to Storm
Introduction to StormIntroduction to Storm
Introduction to Storm
 
Storm presentation
Storm presentationStorm presentation
Storm presentation
 
Real-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using StormReal-Time Big Data at In-Memory Speed, Using Storm
Real-Time Big Data at In-Memory Speed, Using Storm
 
Using Simplicity to Make Hard Big Data Problems Easy
Using Simplicity to Make Hard Big Data Problems EasyUsing Simplicity to Make Hard Big Data Problems Easy
Using Simplicity to Make Hard Big Data Problems Easy
 
Real-time streams and logs with Storm and Kafka
Real-time streams and logs with Storm and KafkaReal-time streams and logs with Storm and Kafka
Real-time streams and logs with Storm and Kafka
 
Apache Storm Concepts
Apache Storm ConceptsApache Storm Concepts
Apache Storm Concepts
 
Real-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and StormReal-Time Analytics with Kafka, Cassandra and Storm
Real-Time Analytics with Kafka, Cassandra and Storm
 
Storm: The Real-Time Layer - GlueCon 2012
Storm: The Real-Time Layer  - GlueCon 2012Storm: The Real-Time Layer  - GlueCon 2012
Storm: The Real-Time Layer - GlueCon 2012
 
Realtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and HadoopRealtime Analytics with Storm and Hadoop
Realtime Analytics with Storm and Hadoop
 
Assigning Responsibility for Deteriorations in Video Quality with Henry Milne...
Assigning Responsibility for Deteriorations in Video Quality with Henry Milne...Assigning Responsibility for Deteriorations in Video Quality with Henry Milne...
Assigning Responsibility for Deteriorations in Video Quality with Henry Milne...
 
Apache Storm Internals
Apache Storm InternalsApache Storm Internals
Apache Storm Internals
 
Storm and Cassandra
Storm and Cassandra Storm and Cassandra
Storm and Cassandra
 
Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)Scaling Apache Storm (Hadoop Summit 2015)
Scaling Apache Storm (Hadoop Summit 2015)
 
Slide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormSlide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache Storm
 
Analysis big data by use php with storm
Analysis big data by use php with stormAnalysis big data by use php with storm
Analysis big data by use php with storm
 
Storm Real Time Computation
Storm Real Time ComputationStorm Real Time Computation
Storm Real Time Computation
 
Introduction to Apache Storm - Concept & Example
Introduction to Apache Storm - Concept & ExampleIntroduction to Apache Storm - Concept & Example
Introduction to Apache Storm - Concept & Example
 
The inherent complexity of stream processing
The inherent complexity of stream processingThe inherent complexity of stream processing
The inherent complexity of stream processing
 
Multi-tenant Apache Storm as a service
Multi-tenant Apache Storm as a serviceMulti-tenant Apache Storm as a service
Multi-tenant Apache Storm as a service
 

Viewers also liked

Yet another startup built on Clojure(Script)
Yet another startup built on Clojure(Script)Yet another startup built on Clojure(Script)
Yet another startup built on Clojure(Script)Paul Lam
 
Clojure at a post office
Clojure at a post officeClojure at a post office
Clojure at a post officeLogan Campbell
 
Your Code is Wrong
Your Code is WrongYour Code is Wrong
Your Code is Wrongnathanmarz
 
Clojure: Towards The Essence of Programming
Clojure: Towards The Essence of ProgrammingClojure: Towards The Essence of Programming
Clojure: Towards The Essence of ProgrammingHoward Lewis Ship
 
Getting started with Clojure
Getting started with ClojureGetting started with Clojure
Getting started with ClojureJohn Stevenson
 
Functional Reactive Programming with Kotlin on Android - Giorgio Natili - Cod...
Functional Reactive Programming with Kotlin on Android - Giorgio Natili - Cod...Functional Reactive Programming with Kotlin on Android - Giorgio Natili - Cod...
Functional Reactive Programming with Kotlin on Android - Giorgio Natili - Cod...Codemotion
 

Viewers also liked (7)

Yet another startup built on Clojure(Script)
Yet another startup built on Clojure(Script)Yet another startup built on Clojure(Script)
Yet another startup built on Clojure(Script)
 
Promise list
Promise listPromise list
Promise list
 
Clojure at a post office
Clojure at a post officeClojure at a post office
Clojure at a post office
 
Your Code is Wrong
Your Code is WrongYour Code is Wrong
Your Code is Wrong
 
Clojure: Towards The Essence of Programming
Clojure: Towards The Essence of ProgrammingClojure: Towards The Essence of Programming
Clojure: Towards The Essence of Programming
 
Getting started with Clojure
Getting started with ClojureGetting started with Clojure
Getting started with Clojure
 
Functional Reactive Programming with Kotlin on Android - Giorgio Natili - Cod...
Functional Reactive Programming with Kotlin on Android - Giorgio Natili - Cod...Functional Reactive Programming with Kotlin on Android - Giorgio Natili - Cod...
Functional Reactive Programming with Kotlin on Android - Giorgio Natili - Cod...
 

Similar to Clojure at BackType

Scaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBaseScaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBaseAge Mooij
 
The Return of the Living Datalog
The Return of the Living DatalogThe Return of the Living Datalog
The Return of the Living DatalogMike Fogus
 
RMG202 Rainmakers: How Netflix Operates Clouds for Maximum Freedom and Agilit...
RMG202 Rainmakers: How Netflix Operates Clouds for Maximum Freedom and Agilit...RMG202 Rainmakers: How Netflix Operates Clouds for Maximum Freedom and Agilit...
RMG202 Rainmakers: How Netflix Operates Clouds for Maximum Freedom and Agilit...Amazon Web Services
 
(DAT203) Building Graph Databases on AWS
(DAT203) Building Graph Databases on AWS(DAT203) Building Graph Databases on AWS
(DAT203) Building Graph Databases on AWSAmazon Web Services
 
Scaling with SQL Server and SQL Azure Federations
Scaling with SQL Server and SQL Azure FederationsScaling with SQL Server and SQL Azure Federations
Scaling with SQL Server and SQL Azure FederationsMichael Rys
 
Embrace NoSQL and Eventual Consistency with Ripple
Embrace NoSQL and Eventual Consistency with RippleEmbrace NoSQL and Eventual Consistency with Ripple
Embrace NoSQL and Eventual Consistency with RippleSean Cribbs
 
Large Scale Data Analysis Tools
Large Scale Data Analysis ToolsLarge Scale Data Analysis Tools
Large Scale Data Analysis Toolsboorad
 
All Aboard the Databus
All Aboard the DatabusAll Aboard the Databus
All Aboard the DatabusAmy W. Tang
 
An introduction to apache drill presentation
An introduction to apache drill presentationAn introduction to apache drill presentation
An introduction to apache drill presentationMapR Technologies
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?confluent
 
JRuby on Rails and Thread Safety
JRuby on Rails and Thread SafetyJRuby on Rails and Thread Safety
JRuby on Rails and Thread SafetyNaoto Takai
 
Getting Buzzed on Buzzwords: Using Cloud & Big Data to Pentest at Scale
Getting Buzzed on Buzzwords: Using Cloud & Big Data to Pentest at ScaleGetting Buzzed on Buzzwords: Using Cloud & Big Data to Pentest at Scale
Getting Buzzed on Buzzwords: Using Cloud & Big Data to Pentest at ScaleBishop Fox
 
Building Distributed Systems in Scala
Building Distributed Systems in ScalaBuilding Distributed Systems in Scala
Building Distributed Systems in ScalaAlex Payne
 
Libraries and Linked Data: Looking to the Future (2)
Libraries and Linked Data: Looking to the Future (2)Libraries and Linked Data: Looking to the Future (2)
Libraries and Linked Data: Looking to the Future (2)ALATechSource
 
High Performance Cloud Computing
High Performance Cloud ComputingHigh Performance Cloud Computing
High Performance Cloud ComputingAmazon Web Services
 
Gerry McNicol Graph Databases
Gerry McNicol Graph DatabasesGerry McNicol Graph Databases
Gerry McNicol Graph DatabasesGerry McNicol
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQLYan Cui
 
A Behind the Scenes Look at the Force.com Platform
A Behind the Scenes Look at the Force.com PlatformA Behind the Scenes Look at the Force.com Platform
A Behind the Scenes Look at the Force.com PlatformSalesforce Developers
 
MongoDB Basic Concepts
MongoDB Basic ConceptsMongoDB Basic Concepts
MongoDB Basic ConceptsMongoDB
 

Similar to Clojure at BackType (20)

Scaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBaseScaling Out With Hadoop And HBase
Scaling Out With Hadoop And HBase
 
The Return of the Living Datalog
The Return of the Living DatalogThe Return of the Living Datalog
The Return of the Living Datalog
 
RMG202 Rainmakers: How Netflix Operates Clouds for Maximum Freedom and Agilit...
RMG202 Rainmakers: How Netflix Operates Clouds for Maximum Freedom and Agilit...RMG202 Rainmakers: How Netflix Operates Clouds for Maximum Freedom and Agilit...
RMG202 Rainmakers: How Netflix Operates Clouds for Maximum Freedom and Agilit...
 
(DAT203) Building Graph Databases on AWS
(DAT203) Building Graph Databases on AWS(DAT203) Building Graph Databases on AWS
(DAT203) Building Graph Databases on AWS
 
Scaling with SQL Server and SQL Azure Federations
Scaling with SQL Server and SQL Azure FederationsScaling with SQL Server and SQL Azure Federations
Scaling with SQL Server and SQL Azure Federations
 
Embrace NoSQL and Eventual Consistency with Ripple
Embrace NoSQL and Eventual Consistency with RippleEmbrace NoSQL and Eventual Consistency with Ripple
Embrace NoSQL and Eventual Consistency with Ripple
 
Large Scale Data Analysis Tools
Large Scale Data Analysis ToolsLarge Scale Data Analysis Tools
Large Scale Data Analysis Tools
 
Liferay and Cloud
Liferay and CloudLiferay and Cloud
Liferay and Cloud
 
All Aboard the Databus
All Aboard the DatabusAll Aboard the Databus
All Aboard the Databus
 
An introduction to apache drill presentation
An introduction to apache drill presentationAn introduction to apache drill presentation
An introduction to apache drill presentation
 
What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?What is Apache Kafka and What is an Event Streaming Platform?
What is Apache Kafka and What is an Event Streaming Platform?
 
JRuby on Rails and Thread Safety
JRuby on Rails and Thread SafetyJRuby on Rails and Thread Safety
JRuby on Rails and Thread Safety
 
Getting Buzzed on Buzzwords: Using Cloud & Big Data to Pentest at Scale
Getting Buzzed on Buzzwords: Using Cloud & Big Data to Pentest at ScaleGetting Buzzed on Buzzwords: Using Cloud & Big Data to Pentest at Scale
Getting Buzzed on Buzzwords: Using Cloud & Big Data to Pentest at Scale
 
Building Distributed Systems in Scala
Building Distributed Systems in ScalaBuilding Distributed Systems in Scala
Building Distributed Systems in Scala
 
Libraries and Linked Data: Looking to the Future (2)
Libraries and Linked Data: Looking to the Future (2)Libraries and Linked Data: Looking to the Future (2)
Libraries and Linked Data: Looking to the Future (2)
 
High Performance Cloud Computing
High Performance Cloud ComputingHigh Performance Cloud Computing
High Performance Cloud Computing
 
Gerry McNicol Graph Databases
Gerry McNicol Graph DatabasesGerry McNicol Graph Databases
Gerry McNicol Graph Databases
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
A Behind the Scenes Look at the Force.com Platform
A Behind the Scenes Look at the Force.com PlatformA Behind the Scenes Look at the Force.com Platform
A Behind the Scenes Look at the Force.com Platform
 
MongoDB Basic Concepts
MongoDB Basic ConceptsMongoDB Basic Concepts
MongoDB Basic Concepts
 

More from nathanmarz

Demystifying Data Engineering
Demystifying Data EngineeringDemystifying Data Engineering
Demystifying Data Engineeringnathanmarz
 
The Epistemology of Software Engineering
The Epistemology of Software EngineeringThe Epistemology of Software Engineering
The Epistemology of Software Engineeringnathanmarz
 
Runaway complexity in Big Data... and a plan to stop it
Runaway complexity in Big Data... and a plan to stop itRunaway complexity in Big Data... and a plan to stop it
Runaway complexity in Big Data... and a plan to stop itnathanmarz
 
Become Efficient or Die: The Story of BackType
Become Efficient or Die: The Story of BackTypeBecome Efficient or Die: The Story of BackType
Become Efficient or Die: The Story of BackTypenathanmarz
 
Cascalog workshop
Cascalog workshopCascalog workshop
Cascalog workshopnathanmarz
 
Cascalog at Hadoop Day
Cascalog at Hadoop DayCascalog at Hadoop Day
Cascalog at Hadoop Daynathanmarz
 
Cascalog at May Bay Area Hadoop User Group
Cascalog at May Bay Area Hadoop User GroupCascalog at May Bay Area Hadoop User Group
Cascalog at May Bay Area Hadoop User Groupnathanmarz
 

More from nathanmarz (10)

Demystifying Data Engineering
Demystifying Data EngineeringDemystifying Data Engineering
Demystifying Data Engineering
 
The Epistemology of Software Engineering
The Epistemology of Software EngineeringThe Epistemology of Software Engineering
The Epistemology of Software Engineering
 
Runaway complexity in Big Data... and a plan to stop it
Runaway complexity in Big Data... and a plan to stop itRunaway complexity in Big Data... and a plan to stop it
Runaway complexity in Big Data... and a plan to stop it
 
Storm
StormStorm
Storm
 
Become Efficient or Die: The Story of BackType
Become Efficient or Die: The Story of BackTypeBecome Efficient or Die: The Story of BackType
Become Efficient or Die: The Story of BackType
 
Cascalog workshop
Cascalog workshopCascalog workshop
Cascalog workshop
 
Cascalog at Hadoop Day
Cascalog at Hadoop DayCascalog at Hadoop Day
Cascalog at Hadoop Day
 
Cascalog at May Bay Area Hadoop User Group
Cascalog at May Bay Area Hadoop User GroupCascalog at May Bay Area Hadoop User Group
Cascalog at May Bay Area Hadoop User Group
 
Cascalog
CascalogCascalog
Cascalog
 
Cascading
CascadingCascading
Cascading
 

Clojure at BackType

Editor's Notes

  1. \n
  2. \n
  3. \n
  4. \n
  5. \n
  6. \n
  7. \n
  8. \n
  9. \n
  10. \n
  11. \n
  12. \n
  13. \n
  14. \n
  15. \n
  16. \n
  17. \n
  18. \n
  19. \n
  20. \n
  21. \n
  22. \n
  23. \n
  24. \n
  25. \n
  26. \n
  27. \n
  28. \n
  29. \n
  30. \n
  31. \n
  32. \n
  33. \n
  34. \n
  35. \n
  36. \n
  37. \n
  38. \n
  39. \n
  40. \n
  41. \n
  42. \n
  43. \n
  44. \n
  45. \n
  46. \n
  47. \n
  48. \n
  49. \n
  50. \n
  51. \n
  52. \n