Polyglottany Is Not a Sin
           Eric Lubow
           @elubow
           elubow@simplereach.com
           #MongoBoston
Overview
•   SimpleReach
•   Definitions and Data Stores
•   Evolution to Polyglottany
•   Tie It Together
•   Final Thoughts
•   Questions

    Polyglottany Is Not A Sin     Eric Lubow   @elubow
Socially Intelligent


Size
•   150 million events recorded per day and growing
•   600 million pageviews per month and growing
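
For a sense of scale, the figures above can be converted into average per-second rates. This is back-of-the-envelope arithmetic that assumes a 30-day month and perfectly even traffic, neither of which comes from the talk; real peaks will be higher than these averages.

```python
# Average rates implied by the slide's numbers.
# Assumptions (not from the talk): a 30-day month and evenly spread traffic.
SECONDS_PER_DAY = 24 * 60 * 60

events_per_day = 150_000_000
pageviews_per_month = 600_000_000

events_per_second = events_per_day / SECONDS_PER_DAY
pageviews_per_second = pageviews_per_month / (30 * SECONDS_PER_DAY)

print(f"{events_per_second:,.0f} events/s on average")
print(f"{pageviews_per_second:,.0f} pageviews/s on average")
```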




Polyglot Persistence
Polyglot Persistence, like polyglot programming, is all about
choosing the right persistence option for the task at hand.
                              http://www.sleberknight.com/blog/sleberkn/entry/polyglot_persistence




Right Tool For The Job




Decisions. Decisions.
Data
•   What are my query patterns?
•   Is my data ingestion high volume/high velocity?
•   Am I batch loading data?
•   Am I write heavy or read heavy?
•   Are data relationships important?
•   Does my data need to be immediately available everywhere?
•   Are my display requirements for realtime data?
•   Do I need to aggregate data on the fly?
•   Is my data structured or unstructured?
•   Does my data lend itself to a specific design pattern?

Tech
•   Is the encryption/authentication/authorization support sufficient for my needs?
•   Are there monitoring architectures already built?
•   Are there best practices guides already
•   Will the data need to be distributed?
•   How fault tolerant is the system?
•   What supporting tools do I need?
•   Is there support for my language?

Financial
•   Am I cloud based?
•   Am I hardware based?
•   Am I a cloud/iron hybrid?
•   How much am I willing to spend?
•   How much am I willing to spend if something goes wrong?

Other
•   Do I have legal requirements (HIPAA/FIPS/Sarbanes Oxley/PII)?
•   What kind of enterprise support is available?
•   What is the community like?
•   Does the product roadmap pertain to my roadmap?

No One Size Fits All




Tools
                            C*



Free vs. Cost




Languages




Pre-Scale




SimpleReach Pre-Scale




Scale




SimpleReach

                               C*


Mongo Conference




Cassandra (C*)
•   Large data volume ingestion at high velocity
•   Really fast writes to many locations (eventual
    consistency)
•   Query by column groups within rows (slicing)
•   OpsCenter
•   Data toolkit: more than a data storage layer
•   TTLs for small group aggregation
•   We wrote Helenus, a Node.js driver for Cassandra
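
The "TTLs for small group aggregation" point can be illustrated without a cluster. The sketch below is a plain-Python stand-in, not Cassandra code: `TTLCounter`, its method names, the key format, and the explicit `now` argument are all hypothetical. It only mimics the idea that each write carries an expiry, so a read aggregates just the recent window.

```python
from collections import defaultdict


class TTLCounter:
    """In-memory stand-in for TTL'd writes; not Cassandra's implementation."""

    def __init__(self, ttl_seconds):
        self.ttl_seconds = ttl_seconds
        # key -> list of (expires_at, amount) entries, one per write
        self._entries = defaultdict(list)

    def add(self, key, amount, now):
        """Record a write that expires ttl_seconds after `now`."""
        self._entries[key].append((now + self.ttl_seconds, amount))

    def total(self, key, now):
        """Sum only the writes that have not expired as of `now`."""
        live = [(exp, amt) for exp, amt in self._entries[key] if exp > now]
        self._entries[key] = live  # drop expired entries, as a TTL would
        return sum(amt for _, amt in live)


# Hypothetical usage: count tweets for one URL over a 60-second window.
counter = TTLCounter(ttl_seconds=60)
counter.add("url:1:tweets", 3, now=0)
counter.add("url:1:tweets", 2, now=45)
recent = counter.total("url:1:tweets", now=50)  # both writes still live
later = counter.total("url:1:tweets", now=70)   # the first write has expired
```

Passing the clock in as `now` keeps the example deterministic; in a real store the expiry is handled by the database rather than by the reader.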
MongoDB
•   Fast atomic increments (Node.js is native JSON)
•   Sharding
•   Solid ODM for Rails (Mongoid)
•   Fast access for pub/sub of durable/persisted documents
•   B-Tree Indexes
•   Document based via JSON
•   TTLs for ephemeral data
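
"Fast atomic increments" refers to MongoDB's `$inc` update operator, which adjusts a numeric field on the server so that clients do not have to read, modify, and write back. The sketch below only imitates that behaviour on a Python dict. `apply_inc` and the document fields are hypothetical, and real atomicity comes from the server, not from this function.

```python
def apply_inc(document, update):
    """Apply a {"$inc": {field: amount}} update to a dict in place.

    Mimics the documented behaviour that a missing field is created and set
    to the increment amount. Top-level fields only; an illustration, not a driver.
    """
    for field, amount in update.get("$inc", {}).items():
        document[field] = document.get(field, 0) + amount
    return document


# Hypothetical per-article counter document.
doc = {"_id": "article-42", "likes": 10}
apply_inc(doc, {"$inc": {"likes": 1, "tweets": 5}})
```

With a driver such as PyMongo, the same update document would typically be passed to `update_one` together with a filter on `_id`.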

Redis
•   Supports hundreds of thousands of transactions per second
•   Great caching engine
•   Supports useful data types like sets, sorted sets, and lists
•   The entire dataset is held in memory (RAM)
•   Transactional and supports bulk operations
•   Centralized queueing and locking system
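
The "centralized locking" use is commonly built on a set-if-absent write with an expiry (in Redis, `SET` with the `NX` and `PX` options). The sketch below imitates that pattern in memory so it runs without a server. `ExpiringLock`, the lock name, and the worker names are hypothetical, and a production lock needs more care than this (for example, unique tokens per holder).

```python
class ExpiringLock:
    """In-memory imitation of 'set if absent, with expiry' locking."""

    def __init__(self):
        self._locks = {}  # name -> (owner, expires_at)

    def acquire(self, name, owner, ttl, now):
        """Take the lock if it is free or the previous holder's TTL lapsed."""
        holder = self._locks.get(name)
        if holder is not None and holder[1] > now:
            return False
        self._locks[name] = (owner, now + ttl)
        return True

    def release(self, name, owner):
        """Release only if `owner` is the current holder."""
        holder = self._locks.get(name)
        if holder is not None and holder[0] == owner:
            del self._locks[name]
            return True
        return False


locks = ExpiringLock()
first = locks.acquire("rollup", "worker-a", ttl=30, now=0)    # lock is free
second = locks.acquire("rollup", "worker-b", ttl=30, now=10)  # still held
third = locks.acquire("rollup", "worker-b", ttl=30, now=31)   # TTL lapsed
stale_release = locks.release("rollup", "worker-a")           # no longer owner
```

The expiry is what keeps a crashed worker from holding the lock forever, which is the main reason to pair the lock with a TTL.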


Infobright
•   Works with standard MySQL driver
•   Column Stores for ad-hoc analytics queries in SQL
•   Databases built for business intelligence
•   Heavy compression of data
•   Pre-aggregated data (Knowledge Grid)




Ruby, Node.js, Python
•   Polyglottany doesn’t only apply to data stores
•   Each language has its own benefit to each data storage layer
•   Each language has its own individual benefits
•   JSON, APIs, Performance




Choice




Cons
•   Redis - Can only utilize a single core per instance. Serialization/deserialization (SerDe) cost.
•   Infobright (MySQL column store) - DELETEs/UPDATEs are VERY expensive
•   Cassandra - No B-tree indexes
•   Mongo - Indexes must fit in memory. Forced Replica ping times
•   Python - Whitespace. Community
•   Ruby - Not high performance enough for our standards
•   JavaScript (Node.js) - Bad for CPU or IO intensive workloads


Tying It Together
Even with the right tools, 80% of the work of building a
big data system is acquiring and refining the raw data into
usable data.




Tying It Together




Tying It Together
•   Service Oriented Architecture (Internal API)
•   Data accuracy checks: visual and programmatic
•   Built framework for testing out storage engines
•   Access to many toolsets (for all languages and
    DBs)
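
The "Internal API" idea can be sketched as a thin facade: callers ask a question, and only the facade knows which store answers it. Everything below is hypothetical (the class, method names, and the use of plain dicts in place of real store clients); the talk does not show SimpleReach's actual API.

```python
class InternalAPI:
    """Single entry point that hides which store answers which question."""

    def __init__(self, realtime_store, analytics_store):
        # Dict-like stand-ins for real clients (assumption for this sketch).
        self._realtime = realtime_store
        self._analytics = analytics_store

    def current_count(self, key):
        """Latest counter value, served from the real-time store."""
        return self._realtime.get(key, 0)

    def daily_totals(self, key):
        """Historical series, served from the analytics store."""
        return list(self._analytics.get(key, []))


api = InternalAPI(
    realtime_store={"article-42": 17},
    analytics_store={"article-42": [120, 95]},
)
now_count = api.current_count("article-42")
history = api.daily_totals("article-42")
```

Because callers depend only on the facade, a storage engine can be swapped or trialled behind it without touching the services that consume the data.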


Service Architecture
[Diagram: Analytics and Real-time services, each shown with C*, alongside the Internal API]


Distributed Architecture
US-EAST-1a            US-EAST-1b            US-EAST-1e

CASSANDRA-0001        CASSANDRA-0002        CASSANDRA-0003
CASSANDRA-0010        CASSANDRA-0011        CASSANDRA-0012
REDIS-0001A           REDIS-0001B           -
MYSQL-0001            -                     MYSQL-0002
MONGO-SHARD-0000-A    -                     MONGO-SHARD-0000-B
MONGO-SHARD-0001-B    MONGO-SHARD-0001-A    -
-                     MONGO-SHARD-0002-B    MONGO-SHARD-0002-A
iAPI-0001             iAPI-0002             iAPI-0003
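
One property of the layout above is that paired nodes never share an availability zone. The sketch below checks that with placements read off the slide. The grouping is an assumption: it treats the A/B suffixes as members of the same replica group, which the slide implies but does not state, and `single_zone_groups` is a hypothetical helper.

```python
# Node -> zone, as read off the diagram. Grouping by A/B suffix is an assumption.
REPLICA_GROUPS = {
    "mongo-shard-0000": {
        "MONGO-SHARD-0000-A": "US-EAST-1a",
        "MONGO-SHARD-0000-B": "US-EAST-1e",
    },
    "mongo-shard-0001": {
        "MONGO-SHARD-0001-A": "US-EAST-1b",
        "MONGO-SHARD-0001-B": "US-EAST-1a",
    },
    "mongo-shard-0002": {
        "MONGO-SHARD-0002-A": "US-EAST-1e",
        "MONGO-SHARD-0002-B": "US-EAST-1b",
    },
    "redis-0001": {
        "REDIS-0001A": "US-EAST-1a",
        "REDIS-0001B": "US-EAST-1b",
    },
}


def single_zone_groups(groups):
    """Return the names of groups whose members all sit in one zone."""
    return [
        name
        for name, members in groups.items()
        if len(set(members.values())) < 2
    ]


at_risk = single_zone_groups(REPLICA_GROUPS)
```

An empty result means that losing any single zone leaves at least one member of every group running.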

Points To Consider
•   Data consistency - Is the data the same in all data stores?
•   How important is data durability?
•   Managing many servers (Chef, AWS, CSSH)
•   Managing and learning many different applications
    and tuning for them
•   Expertise
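
A programmatic check for the data consistency point can be as simple as comparing the same counters as read from two stores. The sketch below uses plain dicts in place of real query results. The function name, the sample keys, and the `tolerance` parameter are hypothetical; a tolerance is included because eventually consistent stores may legitimately lag by a small amount.

```python
def find_mismatches(primary, secondary, tolerance=0):
    """Return {key: (primary_value, secondary_value)} where the stores disagree.

    A key missing from either side is always reported, with None for the gap.
    """
    mismatches = {}
    for key in sorted(set(primary) | set(secondary)):
        a, b = primary.get(key), secondary.get(key)
        if a is None or b is None or abs(a - b) > tolerance:
            mismatches[key] = (a, b)
    return mismatches


# Hypothetical counts for the same articles as read from two stores.
cassandra_counts = {"article-1": 100, "article-2": 50, "article-3": 7}
mongo_counts = {"article-1": 100, "article-2": 48}

report = find_mismatches(cassandra_counts, mongo_counts, tolerance=1)
```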



Expertise
•   What happens when you need help?
•   How do you become experts?
•   What happens when you need more experts?




Summary
•   Polyglottany is not a sin
•   Know your data read/write
    patterns
•   Know the tools available to you
•   Know your compromises
•   Expertise


We’re Hiring




Questions are guaranteed in life.
Answers aren’t.
               Eric Lubow
               @elubow
               elubow@simplereach.com
               #MongoBoston

               Thank you.

Editor's Notes

  1. SimpleReach is a social intelligence tool for content creators. We track every social action, on every major network, across the entire web in real time. That means every like, tweet, pin, stumble, and many more.