reimagining the business of
apps
©2013 NativeX Holdings, LLC
The Perils and Triumphs of using
Cassandra at a .NET/Microsoft Shop
About the Presenters
Jeff Smoley – Sr. Infrastructure Architect
Derek Bromenshenkel – Infrastructure Architect
Agenda
• About NativeX
• Why Cassandra?
• Challenges
• Auto Id Generation
• FluentCassandra
• Hector
• IKVM.NET
• HectorNet
• Reporting Integration
• Data Modeling
• Lessons Learned
About NativeX
• Formerly W3i
• Home Office in Sartell, MN
• 75 miles NW of Minneapolis
• Remote Offices in MSP and SF
• ~120 Employees
What NativeX Does
• Marketing technology
platform that enables
developers to build
successful business around
their apps.
• We provide Publishers with
a way to monetize and
Advertisers with a way to
gain distribution through a
native ad experience.
Mobile Vanity Metrics
• Over 1B unique devices
• 1000s of Apps
• > 135M Monthly Active Users
• > 200GB of data ingest per week
Backstory
• In early 2012 we realized our
infrastructure needed to
change to support a
high-growth business.
• From 100M sessions/quarter
to 6B+.
[Chart: API Requests, in billions (y-axis 0 to 7), by quarter from 2011 Q4 to 2013 Q3]
Original OLTP Architecture
• Microsoft SQL Server
• 2 Node Cluster (Failover)
• 12 cores / node
• 192 GB mem / node
• Compellent SAN
• 172 Tiered Disk
• SSD, FC, SATA
Objectives
Scale
• Horizontal
• Incremental cost
structure
Resiliency
• No single point
of failure
• Geographically
distributed
What is NoSQL
• Stands for Not Only SQL.
• The NoSQL movement is about understanding problems and
focusing on solutions.
• It’s not about silver bullets and black boxes.
• It is about using the right tool for the right problem.
Researched Products
• Compared features like:
• Distributed / Shared Nothing
• Multi-Cluster Support
• Maturity & Popularity
• Documentation
• .NET Support
Selecting Cassandra
DB            | Distributed | Maturity | High Availability | Style                     | Documentation | Native Language Drivers | Popularity
MongoDB       | Yes         | Medium   | Yes               | Document - NoSQL          | Excellent     | Major Languages         | High
VoltDB        | Yes         | Low      | Yes               | RDBMS - SQL               | Good          | Major Languages         | Low
MySQL Cluster | Yes         | High     | Yes               | RDBMS - SQL & Key/Value   | Excellent     | Major Languages         | Medium
MySQL ScaleDB | Yes         | Low      | Yes               | RDBMS - SQL               | Good          | Major Languages         | Low
Cassandra     | Yes         | Medium   | Yes               | Key/Value - Column Family | Excellent     | Major; Poor .Net        | High
CouchDB       | No          | Medium   | Yes               | Document - NoSQL          | ?             | No - REST only          | Medium
RavenDB       | Yes?        | Low      | No                | Document - NoSQL          | Poor          | C#, JS, REST            | Medium
Couchbase     | Yes         | Medium   | Yes               | Key/Value - Document      | Good          | Major Languages         | Medium
*Disclaimer: this data was compiled in spring of 2012 and may not reflect the
current state of each database system shown here.
http://nosql.mypopescu.com/ is a helpful site for discovering and learning about
different DB Systems.
Top Choices
• MySQL Cluster
• Relational and very familiar.
• Has physical row limitations.
• MongoDB
• Data modeling was simpler than C*.
• Not very clear if it had multi-cluster support.
• Cassandra
• At the very core it’s all about scalability and resiliency.
• Data modeling a little scary, immature .Net support.
Why Cassandra?
• Multi-node
• Multi-cluster
• Highly Available
• Durable
• Shared Nothing
• Tunable Consistency
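Tunable consistency can be illustrated with a little arithmetic. The sketch below is not from the talk: it assumes a replication factor of 3 for the example and applies the general rule that a read is guaranteed to overlap the most recent write when the number of replicas read plus the number of replicas written exceeds the replication factor. The function names are illustrative only.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def read_overlaps_latest_write(read_replicas: int, write_replicas: int,
                               replication_factor: int) -> bool:
    """True when every read set must share a replica with every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


if __name__ == "__main__":
    rf = 3  # assumed replication factor for this illustration
    q = quorum(rf)
    print("quorum:", q)
    print("quorum reads + quorum writes overlap:", read_overlaps_latest_write(q, q, rf))
    print("single-replica reads + writes overlap:", read_overlaps_latest_write(1, 1, rf))
```

Reading and writing at quorum trades some latency for that overlap; reading and writing a single replica is faster but gives no such guarantee.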
Cassandra at NativeX
• C* was not a replacement DB system.
• We continue to use MS SQL Server alongside C*.
• SQL Server used for storing configuration data.
• C* solves a very specific problem for us.
• Writing large volumes of data quickly.
• Reading very specific data out of a large record set.
Challenges
• C* does not have Auto Id generation.
• How to connect to C* with C#?
• Finding a connector with good Failure Tolerance.
• How to integrate our reporting system?
Auto ID Generation
• Pre-existing requirements
• Unique, 64-bit positive integers
• Increasing (sortable) a plus
• Previously SQL Server Identity column
• A Time-based UUID is sortable and unique
• Changed everything we could
• The future for us
Auto ID – What are the options?
• SQL dummy table
• Easy & familiar, but limited
• Pre-generated range
• Proposed by DataStax’s Architect
• Distributed, but more complicated to implement
• Sharding [Instagram]
• Discovered too late
• Unfamiliar with Postgres
We chose Snowflake
• Built by Twitter, Apache 2.0 license
• https://github.com/twitter/snowflake
• “… network service for generating unique ID numbers at high
scale..”
• Same motivation; MySQL -> C*
• A few tweaks for our Windows environment
Technical reasons for Snowflake
• Meets all requirements
• Tested in high transaction system
• Java based [Scala] implementation
• Thrift server
• Run as a Windows service with Apache Daemon
• Con: Requires Apache Zookeeper
• Coordinate the worker id
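How a Snowflake-style ID meets the requirements above (unique, positive, 64-bit, roughly increasing) can be sketched in a few lines. This is a simplified Python illustration, not the Scala service or NativeX's setup. It assumes the bit layout published with Twitter's Snowflake (41-bit millisecond timestamp, 10 bits of machine identity, 12-bit sequence) and its custom epoch; the class and function names are hypothetical, and the worker id is passed in directly where the real service coordinates it through ZooKeeper.

```python
import threading
import time

WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER_ID = (1 << WORKER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1
EPOCH_MS = 1288834974657  # assumed: the custom epoch used by Twitter's implementation


class SnowflakeLikeGenerator:
    """Hypothetical generator: timestamp | worker id | per-millisecond sequence."""

    def __init__(self, worker_id, clock=None):
        if not 0 <= worker_id <= MAX_WORKER_ID:
            raise ValueError("worker_id out of range")
        self._worker_id = worker_id
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards; refusing to generate ids")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                    | (self._worker_id << SEQUENCE_BITS)
                    | self._sequence)


def decode(snowflake_id):
    """Split an id back into (timestamp_ms, worker_id, sequence)."""
    timestamp_ms = (snowflake_id >> (WORKER_BITS + SEQUENCE_BITS)) + EPOCH_MS
    worker_id = (snowflake_id >> SEQUENCE_BITS) & MAX_WORKER_ID
    return timestamp_ms, worker_id, snowflake_id & MAX_SEQUENCE
```

Because the timestamp occupies the high bits and the sign bit is never set, ids from one worker sort in generation order and stay positive, while the worker id keeps ids from different nodes from colliding.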
Connecting to Snowflake
• Built our own .NET
Snowflake Client
• Snowflake server on each
web node
• Local instance is primary
• Round robin failover to other
nodes
• Auto failover AND recovery
• “Circuit Breaker” pattern
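The failover behavior listed above (local instance first, round robin across the other nodes, automatic recovery through a circuit breaker) might look roughly like the following. This is a Python sketch of the general pattern, not NativeX's .NET client: the class names, the failure threshold and the cool-down period are all assumptions, and each server is modeled as a plain callable that returns an id or raises.

```python
import time


class CircuitBreaker:
    """Opens after repeated failures, then permits a trial call after a cool-down."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def allows_call(self):
        if self._opened_at is None:
            return True
        # Half-open: allow a trial call once the cool-down has elapsed.
        return self._clock() - self._opened_at >= self.reset_after_s

    def record_success(self):
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


class FailoverIdClient:
    """Prefers the local server and falls back to the remotes in round-robin order."""

    def __init__(self, local, remotes, **breaker_options):
        self._local = local
        self._remotes = list(remotes)
        self._next_remote = 0
        self._breakers = {id(server): CircuitBreaker(**breaker_options)
                          for server in [local, *self._remotes]}

    def _candidates(self):
        count = len(self._remotes)
        rotated = [self._remotes[(self._next_remote + i) % count] for i in range(count)]
        return [self._local, *rotated]

    def next_id(self):
        for server in self._candidates():
            breaker = self._breakers[id(server)]
            if not breaker.allows_call():
                continue  # circuit is open: skip this server for now
            try:
                value = server()
            except Exception:
                breaker.record_failure()
                continue
            breaker.record_success()
            if server is not self._local:
                # Advance the rotation so the next fallback starts at another remote.
                self._next_remote = (self._remotes.index(server) + 1) % len(self._remotes)
            return value
        raise RuntimeError("no ID server available")
```

The breaker is what separates this from permanent blacklisting: a failed node is skipped only until the cool-down expires, after which a single successful call restores it.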
[Diagram: four servers (Server 1 to Server 4), each running a Web App alongside a Snowflake (SF) instance]
Challenges
• How to connect to C* with C#?
• Finding a connector with good Failure Tolerance.
• How to integrate our reporting system?
Connecting to Cassandra with C#
• Thrift alone too low level
• Needs
• CQL support
• Active development / support
• Wants
• ADO.NET / LINQ feel
• ????
• FluentCassandra is where we started
Vetting FluentCassandra
• Pros
• Open source -
https://github.com/fluentcassandra/fluentcassandra
• Nick Berardi, project owner, is excellent
• Designed for CQL
• Familiar feel
• Were able to start project development with it
Vetting FluentCassandra
• Cons
• Immaturity
• Few users with high transaction system
• Permanent node blacklisting
• Lacked auto retry
• Couldn’t live with these limitations
• Tried adding resources dedicated to maturing it
Challenges
• Finding a connector with good Failure Tolerance.
• How to integrate our reporting system?
Hector: Yes, please
• Popular C* connector
• Use cases matching ours
• Good maturity
• Auto node discovery
• Auto retry
• Auto failure recovery
• Written in Java – major roadblock
Help!
• We knew we still needed help.
• We found a company named Concord.
• Based out of the Twin Cities.
• Specialize in System, Process, and Data Integration.
• http://concordusa.com/
Concord’s Recommendation
• Concord recommended that we use IKVM.NET to port Hector to
a .NET assembly.
• They had previous success using IKVM for other Java to .NET
ports.
• They felt that maturing FluentCassandra was going to take
longer than our timeline allowed.
About the IKVM.NET Project
• http://www.ikvm.net/
• Open Source Project.
• Main contributor is Jeroen Frijters.
• He is actively contributing to the project.
• License allows for use in commercial applications.
What is IKVM.NET?
• IKVM.NET includes the following components:
• A Java Virtual Machine implemented in .NET.
• A .NET implementation of the Java class libraries.
• Set of tools that enable Java and .NET interoperability.
Uses for IKVM
• Drop-in JVM
• Included is a distribution of a .NET implementation of a Java
Virtual Machine.
• Allows you to run jar files using the .NET stack.
• Example: ikvm -jar myapp.jar
Uses for IKVM
• Use Java libraries in your .NET applications
• Using ikvmc you can compile Java bytecode to .NET IL.
• Example: ikvmc -target:library mylib.jar
Uses for IKVM
• Develop .NET applications in Java
• Write code in Java.
• Compile to JVM bytecode.
• Use ikvmc to produce a .NET Executable.
• Can also use .NET APIs in Java code using the ikvmstub
application to generate a Java jar file.
• Example: ikvmstub MyDotNetAssemblyName
Hector Converted to .NET
• Per Concord’s recommendation we chose to compile the Hector
jar into a .NET Assembly.
• Hector and all of its dependencies are pulled into one .NET
dll that can be referenced by any .NET assembly.
• In addition you will have to reference some core IKVM
assemblies.
• Each Java dependency is given its own namespace within
the .NET dll.
HectorNet
• Concord also created a dll called HectorNet that wraps some of
the Hector behaviors and makes it feel more like .NET.
• Such as supporting connection strings.
• Mapping Thrift byte arrays to .NET data types.
• Mapping to native .NET collections instead of using Java
collections.
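The byte-array mapping mentioned above is the kind of plumbing a wrapper hides. The sketch below shows the idea in Python rather than .NET and is not HectorNet's code: it assumes the common Cassandra encodings (8-byte big-endian signed integers for long values, UTF-8 for text, 16 raw bytes for UUIDs), and the helper names are hypothetical.

```python
import struct
import uuid


def encode_long(value: int) -> bytes:
    """Pack a signed 64-bit integer as 8 big-endian bytes."""
    return struct.pack(">q", value)


def decode_long(raw: bytes) -> int:
    """Unpack 8 big-endian bytes into a signed 64-bit integer."""
    (value,) = struct.unpack(">q", raw)
    return value


def decode_utf8(raw: bytes) -> str:
    """Decode a UTF-8 encoded text value."""
    return raw.decode("utf-8")


def decode_uuid(raw: bytes) -> uuid.UUID:
    """Build a UUID from its 16-byte representation."""
    return uuid.UUID(bytes=raw)


if __name__ == "__main__":
    raw = encode_long(1234567890123)
    print(raw.hex(), "->", decode_long(raw))
```

Centralizing conversions like these in one wrapper means application code deals in native types and never touches raw byte arrays.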
Why Not DataStax C# Driver?
• We built everything using CQL 2.0.
• Wasn’t ready in time for our launch date.
Challenges
• How to integrate our reporting system?
Integrating Reporting
[Diagram: ETL with SSIS. Extract from OLTP (C*), Transform, Load into OLAP (MS SQL), which feeds the SSAS cube]
Integrating Reporting
• The SSIS Extract process uses C# Script Tasks.
• Script Task needs references to HectorNet and all of its
dependencies.
• SSIS can only reference assemblies that are in the GAC.
• Assemblies in the GAC have to be Signed.
Data Classification
• NativeX has three major classifications of data.
• Configuration or Master Data
• Activity Tracking
• Device History
Configuration Data
• Also referred to as Lookup Data or Master Data.
• This data is relatively small in terms of record counts.
• 10s to 100,000s of records, not millions.
• Is used to operationally run our products.
Configuration Data
• Examples in NativeX’s business:
• Mobile Apps
• Offers
• Campaigns
• Restrictions
• Queue Settings
Relational Data
Configuration data is typically relational in nature and
therefore we continue to store it in MS SQL Server.
C* Data Modeling Basics
• Data is stored inside of Column Families using nested
Key/Value Pairs.
• A CF can be thought of as a Table.
• They are made up of Rows and Columns.
• However, CFs do not have direct relationships to each other.
• You typically deal with one row at a time.
Rows
A Row is the first level of the nested Key/Value pairs.
• A Row consists of:
• A Row Key (unique to the CF).
• A Row Value which is 1 to many Columns.
• A Row will typically represent:
• Single Entity/Record.
• Multiple records (known as a Wide Row CF).
Columns
A Column is the second level of the nested Key/Value pairs.
• A Column consists of:
• A Column Name (Key) (unique to the Row).
• A Column Value.
Column Name
• Column Names can consist of a value of any data type.
• String, Integer, Date, UUID (GUID), etc.
• The Column Name is stored as part of every column.
• This means it affects the size of your data.
• Can also use the Column Name to store data.
Column Value
• A Column Value will typically contain:
• A single value such as an integer, string, date, etc.
• A whole record usually represented in XML, JSON, or some
other document or object structure.
CF - Putting it all Together
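The nested key/value structure from the preceding slides can be sketched as a dictionary of dictionaries: a Column Family maps each Row Key to a set of columns, and each column maps a Column Name to a Column Value. This is an in-memory Python illustration only; the `ColumnFamily` class, the "Devices" name and the sample keys are hypothetical and say nothing about how Cassandra stores data on disk.

```python
class ColumnFamily:
    """Row key -> {column name -> column value}; columns are read back sorted by name."""

    def __init__(self, name):
        self.name = name
        self._rows = {}

    def insert(self, row_key, column_name, value):
        # First level: the row key. Second level: the column name within that row.
        self._rows.setdefault(row_key, {})[column_name] = value

    def get(self, row_key, column_name):
        return self._rows[row_key][column_name]

    def get_row(self, row_key):
        """Return the row's columns as (name, value) pairs ordered by column name."""
        return sorted(self._rows.get(row_key, {}).items())


if __name__ == "__main__":
    devices = ColumnFamily("Devices")  # hypothetical column family
    devices.insert("device-1", "platform", "iOS")
    devices.insert("device-1", "app_version", "2.1")
    devices.insert("device-2", "platform", "Android")
    print(devices.get_row("device-1"))
```

Note that the two rows need not share the same columns, and that nothing links this column family to any other: each lookup starts from a single row key.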
Wide Row CF
• A collection of like records organized into a single row.
• Each record is stored as a distinct column.
• Not unheard of for each row to have millions of columns.
• Data is often denormalized into XML or JSON documents.
• Good for storing:
• Time Series Data
• Event Series Data
• Logging Data
• Tracking Data
Wide Row Examples
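As one illustration of a wide row holding time series data, the sketch below keeps every event for a single row as a separate column, uses the event timestamp as the Column Name, and stores the record as a JSON document in the Column Value. It is a simplified in-memory Python model, not NativeX's schema: the `WideRow` class and the event fields are hypothetical.

```python
import bisect
import json


class WideRow:
    """One row holding many records, each stored as a column kept in column-name order."""

    def __init__(self):
        self._names = []   # sorted column names (here: event timestamps in ms)
        self._values = {}  # column name -> JSON document

    def add(self, timestamp_ms, record):
        if timestamp_ms not in self._values:
            bisect.insort(self._names, timestamp_ms)
        self._values[timestamp_ms] = json.dumps(record)

    def slice(self, start_ms, end_ms):
        """Return (name, record) pairs whose column name lies in [start_ms, end_ms]."""
        lo = bisect.bisect_left(self._names, start_ms)
        hi = bisect.bisect_right(self._names, end_ms)
        return [(name, json.loads(self._values[name])) for name in self._names[lo:hi]]


if __name__ == "__main__":
    history = WideRow()  # e.g. all tracked events for one device
    history.add(3000, {"event": "offer_click"})
    history.add(1000, {"event": "session_start"})
    history.add(2000, {"event": "offer_view"})
    print(history.slice(1500, 3000))
```

Because columns stay ordered by name, reading a time range is a contiguous slice of one row, which is what makes this layout a good fit for time series and tracking data.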
Lessons Learned
• Get into production early
• Migration is hard
• Data Import = Reality
• Dev team needs to be integrated right away
• Training
• Operations / Troubleshooting
• Understanding your I/O profile is really important
• Are you sure you’re write heavy?
• Affects your hardware config, i.e. SSDs for us
Lessons Learned
• Cluster sizing and hardware selection
• Dependent on data set + workload
• You might get it wrong the first time
• Enterprise vs. ‘commodity’
• Cassandra changes quickly
• You need to keep up
• Leverage mailing list, forums, release notes
• Scalable systems like C* have a massive number of knobs; you
need to know them
Projections
Month  | Pub DAU    | Adv DAU    | Total Devices | Nodes Needed for Disk | Nodes Needed for BF | Nodes Needed for RR
Aug-13 | 18,500,000 | 12,000,000 | 1,060,000,000 | 9                     | 22                  | 14
Sep-13 | 28,500,000 | 6,600,000  | 1,120,000,000 | 12                    | 16                  | 16
Oct-13 | 33,500,000 | 7,600,000  | 1,180,000,000 | 13                    | 17                  | 19
Nov-13 | 38,500,000 | 8,600,000  | 1,240,000,000 | 15                    | 18                  | 22
Dec-13 | 43,500,000 | 9,600,000  | 1,300,000,000 | 16                    | 19                  | 24
Jan-14 | 48,500,000 | 10,600,000 | 1,360,000,000 | 17                    | 20                  | 27
Feb-14 | 53,500,000 | 11,600,000 | 1,420,000,000 | 19                    | 21                  | 30
Mar-14 | 58,500,000 | 12,600,000 | 1,480,000,000 | 20                    | 22                  | 33
Apr-14 | 63,500,000 | 13,600,000 | 1,540,000,000 | 21                    | 23                  | 35
Capacities
Number of Nodes 30.00
Usable Space/Node (GB) 600.00
Total Usable Space (GB) 18,000.00
Memory/Node (GB) 64.00
JVM Heap Size (GB) 8.00
BF Size / Node (GB) 1.50
Replication Factor 3.00
Read Requests/Node 1,000.00
Understand which KPI represents Node capacity.
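One way to read the projections is to ask, for each month, which resource demands the most nodes, since that resource sets the cluster size. The sketch below does this with the node counts from the Projections table and the 30-node cluster from the Capacities list. The helper names are hypothetical, and the formulas that produced the original node counts are not given in the slides, so none are reproduced here.

```python
MONTHS = ["Aug-13", "Sep-13", "Oct-13", "Nov-13", "Dec-13",
          "Jan-14", "Feb-14", "Mar-14", "Apr-14"]

# Node counts copied from the Projections table, keyed by the table's own labels.
NODES_NEEDED = {
    "Disk": [9, 12, 13, 15, 16, 17, 19, 20, 21],
    "BF": [22, 16, 17, 18, 19, 20, 21, 22, 23],
    "RR": [14, 16, 19, 22, 24, 27, 30, 33, 35],
}

CLUSTER_NODES = 30  # "Number of Nodes" from the Capacities list


def binding_constraint(month_index):
    """Return (resource, nodes) for the resource that needs the most nodes that month.

    Ties go to the resource listed first in NODES_NEEDED.
    """
    resource = max(NODES_NEEDED, key=lambda name: NODES_NEEDED[name][month_index])
    return resource, NODES_NEEDED[resource][month_index]


def first_month_over_capacity(cluster_nodes=CLUSTER_NODES):
    """Return the first month whose binding constraint exceeds the cluster size, if any."""
    for index, month in enumerate(MONTHS):
        if binding_constraint(index)[1] > cluster_nodes:
            return month
    return None


if __name__ == "__main__":
    for index, month in enumerate(MONTHS):
        resource, nodes = binding_constraint(index)
        print(f"{month}: {resource} needs {nodes} nodes")
    print("First month over capacity:", first_month_over_capacity())
```

On these figures the binding constraint shifts from BF in Aug-13 to RR from Oct-13 onward, and the RR requirement passes 30 nodes in Mar-14.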
DSE for the Win!
• We use DataStax Enterprise.
• Mainly for support, which continues to be a life saver.
Thank you!
• Join the MSP C* Meetup
• http://www.meetup.com/Minneapolis-St-Paul-Cassandra-Meetup/
• Contact us
• Jeff.Smoley@nativex.com
• Derek.Bromenshenkel@nativex.com @breakingtrail
• Slide Deck
• http://www.slideshare.net/jjsmoley/the-perils-and-triumphs-of-using-cassandra-at-a-netmicrosoft-shop

Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 

Recently uploaded (20)

Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 

The Perils and Triumphs of using Cassandra at a .NET/Microsoft Shop

  • 1. reimagining the business of apps ©2013 NativeX Holdings, LLC The Perils and Triumphs of using Cassandra at a .NET/Microsoft Shop
  • 2. About the Presenters Jeff Smoley – Sr. Infrastructure Architect Derek Bromenshenkel – Infrastructure Architect
  • 3. Agenda • About NativeX • Why Cassandra? • Challenges • Auto Id Generation • FluentCassandra • Hector • IKVM.NET • HectorNet • Reporting Integration • Data Modeling • Lessons Learned
  • 4. About NativeX • Formerly W3i • Home Office in Sartell, MN • 75 miles NW of Minneapolis • Remote Offices in MSP and SF • ~120 Employees
  • 5. What NativeX Does • Marketing technology platform that enables developers to build successful businesses around their apps. • We provide Publishers with a way to monetize and Advertisers with a way to gain distribution through a native ad experience.
  • 6. Mobile Vanity Metrics • Over 1B unique devices • 1000s of Apps • > 135M Monthly Active Users • > 200GB of data ingest per week
  • 7. Agenda • About NativeX • Why Cassandra? • Challenges • Auto Id Generation • FluentCassandra • Hector • IKVM.NET • HectorNet • Reporting Integration • Data Modeling • Lessons Learned
  • 8. Backstory • In early 2012 we realized our infrastructure needed to change to support a high-growth business. • From 100M sessions/quarter to 6B+. [Chart: API Requests in billions per quarter, 2011 Q4 through 2013 Q3]
  • 9. Original OLTP Architecture • Microsoft SQL Server • 2 Node Cluster (Failover) • 12 cores / node • 192 GB mem / node • Compellent SAN • 172 Tiered Disks • SSD, FC, SATA
  • 10. Objectives • Scale: Horizontal; Incremental cost structure • Resiliency: No single point of failure; Geographically distributed
  • 11. What is NoSQL • Stands for Not Only SQL. • The NoSQL movement is about understanding problems and focusing on solutions. • It's not about silver bullets and black boxes. • It is about using the right tool for the right problem.
  • 12. Researched Products • Compared features like: • Distributed / Shared Nothing • Multi-Cluster Support • Maturity & Popularity • Documentation • .NET Support
  • 13. Selecting Cassandra
      DB             Distributed  Maturity  High Availability  Style                      Documentation  Native Language Drivers  Popularity
      MongoDB        Yes          Medium    Yes                Document - NoSQL           Excellent      Major Languages          High
      VoltDB         Yes          Low       Yes                RDBMS - SQL                Good           Major Languages          Low
      MySQL Cluster  Yes          High      Yes                RDBMS - SQL & Key/Value    Excellent      Major Languages          Medium
      MySQL ScaleDB  Yes          Low       Yes                RDBMS - SQL                Good           Major Languages          Low
      Cassandra      Yes          Medium    Yes                Key/Value - Column Family  Excellent      Major; Poor .Net         High
      CouchDB        No           Medium    Yes                Document - NoSQL           ?              No - REST only           Medium
      RavenDB        Yes?         Low       No                 Document - NoSQL           Poor           C#, JS, REST             Medium
      Couchbase      Yes          Medium    Yes                Key/Value - Document       Good           Major Languages          Medium
      *Disclaimer: this data was compiled in spring of 2012 and may not reflect the current state of each database system shown here. http://nosql.mypopescu.com/ is a helpful site for discovering and learning about different DB systems.
  • 14. Top Choices • MySQL Cluster • Relational and very familiar. • Has physical row limitations. • MongoDB • Data modeling was simpler than C*. • Not very clear if it had multi-cluster support. • Cassandra • At the very core it's all about scalability and resiliency. • Data modeling a little scary, immature .NET support.
  • 15. Why Cassandra? • Multi-node • Multi-cluster • Highly Available • Durable • Shared Nothing • Tunable Consistency
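The "Tunable Consistency" bullet can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the deck: it uses the commonly cited rule that a read is guaranteed to see the latest write when the number of replicas read plus the number of replicas written exceeds the replication factor. The replication factor of 3 matches the capacity slide later in the deck; the function names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


RF = 3  # replication factor listed on the Projections slide

# QUORUM reads plus QUORUM writes overlap on at least one replica.
print(quorum(RF), reads_see_latest_write(quorum(RF), quorum(RF), RF))

# Reading and writing a single replica is faster but may return stale data.
print(reads_see_latest_write(1, 1, RF))
```

Lowering either side of the inequality trades consistency for latency and availability, which is the knob the slide refers to.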
  • 16. Cassandra at NativeX • C* was not a replacement DB system. • We continue to use MS SQL Server alongside C*. • SQL Server used for storing configuration data. • C* solves a very specific problem for us. • Writing large volumes of data quickly. • Reading very specific data out of a large record set.
  • 17. Challenges • C* does not have Auto Id generation. • How to connect to C* with C#? • Finding a connector with good Failure Tolerance. • How to integrate our reporting system?
  • 18. Auto ID Generation • Pre-existing requirements • Unique, 64-bit positive integers • Increasing (sortable) a plus • Previously SQL Server Identity column • A Time-based UUID is sortable and unique • Changed everything we could • The future for us
  • 19. Auto ID – What are the options? • SQL dummy table • Easy & familiar, but limited • Pre-generated range • Proposed by DataStax's Architect • Distributed, but more complicated to implement • Sharding [Instagram] • Discovered too late • Unfamiliar with Postgres
  • 20. We chose Snowflake • Built by Twitter, Apache 2.0 license • https://github.com/twitter/snowflake • “… network service for generating unique ID numbers at high scale..” • Same motivation; MySQL -> C* • A few tweaks for our Windows environment
  • 21. Technical reasons for Snowflake • Meets all requirements • Tested in high transaction system • Java based [Scala] implementation • Thrift server • Run as a Windows service with Apache Daemon • Con: Requires Apache Zookeeper • Coordinate the worker id
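To show how a Snowflake-style generator can meet the "unique, 64-bit positive, increasing" requirements, here is a minimal single-process sketch. It is not the Twitter service or the NativeX deployment. The bit split (41-bit millisecond timestamp, 10-bit worker id, 12-bit sequence) and the epoch constant follow the layout commonly described for Snowflake and should be read as assumptions; the class name is hypothetical, and the worker id is passed in by hand rather than coordinated through Zookeeper.

```python
import threading
import time

EPOCH_MS = 1288834974657      # assumed custom epoch (Nov 2010), in milliseconds
WORKER_BITS = 10              # assumed width of the worker id field
SEQUENCE_BITS = 12            # assumed width of the per-millisecond sequence
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeLikeGenerator:
    """Generates increasing 64-bit ids: timestamp | worker id | sequence."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self._worker_id = worker_id
        self._clock = clock
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Refuse to issue ids that could sort before earlier ones.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: wait for the next.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            return (((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                    | (self._worker_id << SEQUENCE_BITS)
                    | self._sequence)


generator = SnowflakeLikeGenerator(worker_id=7)
ids = [generator.next_id() for _ in range(5)]
print(ids)
```

Because the timestamp occupies the high bits, ids from one generator sort by creation time, and the worker id keeps ids from different nodes from colliding without any shared database.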
  • 22. Connecting to Snowflake • Built our own .NET Snowflake Client • Snowflake server on each web node • Local instance is primary • Round robin failover to other nodes • Auto failover AND recovery • “Circuit Breaker” pattern [Diagram: four web nodes, each pairing a Web App with its own SF Server (1 through 4)]
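The client behavior listed on this slide (local instance first, round robin failover, automatic recovery through a circuit breaker) can be sketched roughly as follows. This is an illustrative Python model, not the NativeX .NET client: the class names, the failure threshold, and the reset timeout are all assumptions, and each "server" is just a callable standing in for a Thrift call.

```python
import time


class CircuitBreaker:
    """Opens after repeated failures, then allows a trial call after a timeout."""

    def __init__(self, threshold=3, reset_timeout=5.0, clock=time.monotonic):
        self._threshold = threshold
        self._reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def allow(self):
        if self._opened_at is None:
            return True
        # Half-open: once the timeout has elapsed, let a trial call through.
        return self._clock() - self._opened_at >= self._reset_timeout

    def record_success(self):
        self._failures = 0
        self._opened_at = None

    def record_failure(self):
        self._failures += 1
        if self._failures >= self._threshold:
            self._opened_at = self._clock()


class FailoverIdClient:
    """Tries the local server first, then the remote servers in round robin."""

    def __init__(self, local, remotes, **breaker_options):
        self._local = (local, CircuitBreaker(**breaker_options))
        self._remotes = [(r, CircuitBreaker(**breaker_options)) for r in remotes]
        self._next_remote = 0

    def _attempt(self, server):
        fetch, breaker = server
        if not breaker.allow():
            return False, None
        try:
            value = fetch()
        except Exception:
            breaker.record_failure()
            return False, None
        breaker.record_success()
        return True, value

    def next_id(self):
        ok, value = self._attempt(self._local)
        if ok:
            return value
        for _ in range(len(self._remotes)):
            server = self._remotes[self._next_remote]
            self._next_remote = (self._next_remote + 1) % len(self._remotes)
            ok, value = self._attempt(server)
            if ok:
                return value
        raise RuntimeError("no ID server reachable")
```

The open breaker is what saves the network hop on a known-bad node, and the half-open trial is what gives the "recovery" half of "auto failover AND recovery": once the local server answers again, traffic returns to it without operator action.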
  • 23. Challenges • How to connect to C* with C#? • Finding a connector with good Failure Tolerance. • How to integrate our reporting system?
  • 24. Connecting to Cassandra with C# • Thrift alone too low level • Needs • CQL support • Active development / support • Wants • ADO.NET / LINQ feel • ???? • FluentCassandra is where we started
  • 25. Vetting FluentCassandra • Pros • Open source - https://github.com/fluentcassandra/fluentcassandra • Nick Berardi, project owner, is excellent • Designed for CQL • Familiar feel • Were able to start project development with it
  • 26. Vetting FluentCassandra • Cons • Immaturity • Few users with high transaction system • Permanent node blacklisting • Lacked auto retry • Couldn't live with these limitations • Tried adding resources dedicated to maturing it
  • 27. Challenges • Finding a connector with good Failure Tolerance. • How to integrate our reporting system?
  • 28. Hector: Yes, please • Popular C* connector • Use cases matching ours • Good maturity • Auto node discovery • Auto retry • Auto failure recovery • Written in Java – major roadblock
  • 29. Help! • We knew we still needed help. • We found a company named Concord. • Based out of the Twin Cities. • Specializes in System, Process, and Data Integration. • http://concordusa.com/
  • 30. Concord’s Recommendation • Concord recommended that we use IKVM.NET to port Hector to a .NET assembly. • They had previous success using IKVM for other Java to .NET ports. • They felt that maturing FluentCassandra was going to take longer than our timeline allowed.
  • 31. About the IKVM.NET Project • http://www.ikvm.net/ • Open Source Project. • Main contributor is Jeroen Frijters. • He is actively contributing to the project. • License allows for use in commercial applications.
  • 32. What is IKVM.NET? • IKVM.NET includes the following components: • A Java Virtual Machine implemented in .NET. • A .NET implementation of the Java class libraries. • Set of tools that enable Java and .NET interoperability.
  • 33. Uses for IKVM • Drop-in JVM • Included is a distribution of a .NET implementation of a Java Virtual Machine. • Allows you to run jar files using the .NET stack. • Example: ikvm -jar myapp.jar
  • 34. Uses for IKVM • Use Java libraries in your .NET applications • Using ikvmc you can compile Java bytecode to .NET IL. • Example: ikvmc -target:library mylib.jar
  • 35. Uses for IKVM • Develop .NET applications in Java • Write code in Java. • Compile to JVM bytecode. • Use ikvmc to produce a .NET Executable. • Can also use .NET APIs in Java code using the ikvmstub application to generate a Java jar file. • Example: ikvmstub MyDotNetAssemblyName
  • 36. Hector Converted to .NET • Per Concord's recommendation we chose to compile the Hector jar into a .NET Assembly. • Hector and all of its dependencies are pulled into one .NET dll that can be referenced by any .NET assembly. • In addition you will have to reference some core IKVM assemblies. • Each Java dependency is given its own namespace within the .NET dll.
  • 37. HectorNet • Concord also created a dll called HectorNet that wraps some of the Hector behaviors and makes it feel more like .NET, such as: • Supporting connection strings. • Mapping Thrift byte arrays to .NET data types. • Mapping to native .NET collections instead of using Java collections.
  • 38. Why Not DataStax C# Driver? • We built everything using CQL 2.0. • It wasn't ready in time for our launch date.
  • 39. Challenges • How to integrate our reporting system?
  • 40. Integrating Reporting [Diagram: OLTP data in C* is extracted, transformed, and loaded by an SSIS ETL process into the MS SQL OLAP store, which feeds the SSAS cube]
  • 41. Integrating Reporting • The SSIS Extract process uses C# Script Tasks. • Script Task needs references to HectorNet and all of its dependencies. • SSIS can only reference assemblies that are in the GAC. • Assemblies in the GAC have to be Signed.
  • 42. Agenda • About NativeX • Why Cassandra? • Challenges • Auto Id Generation • FluentCassandra • Hector • IKVM.NET • HectorNet • Reporting Integration • Data Modeling • Lessons Learned
  • 43. Data Classification • NativeX has three major classifications of data. • Configuration or Master Data • Activity Tracking • Device History
  • 44. Configuration Data • Also referred to as Lookup Data or Master Data. • This data is relatively small in terms of record counts. • 10s to 100,000s of records, not millions. • It is used to operationally run our products.
  • 45. Configuration Data • Examples in NativeX's business: • Mobile Apps • Offers • Campaigns • Restrictions • Queue Settings
  • 46. Relational Data Configuration data is typically relational in nature and therefore we continue to store it in MS SQL Server.
  • 47. C* Data Modeling Basics • Data is stored inside of Column Families using nested Key/Value Pairs. • A CF can be thought of as a Table. • They are made up of Rows and Columns. • However, CFs do not have direct relationships to each other. • You typically deal with one row at a time.
  • 48. Rows • A Row is the first level of the nested Key/Value pairs. • A Row consists of: • A Row Key (unique to the CF). • A Row Value which is 1 to many Columns. • A Row will typically represent: • Single Entity/Record. • Multiple records (known as a Wide Row CF).
  • 49. Columns • A Column is the second level of the nested Key/Value pairs. • A Column consists of: • A Column Name (Key) (unique to the Row). • A Column Value.
  • 50. Column Name • Column Names can consist of a value of any data type. • String, Integer, Date, UUID (GUID), etc. • The Column Name is stored as part of every column. • This means it has an impact on the size of your data. • Can also use the Column Name to store data.
  • 51. Column Value • A Column Value will typically contain: • A single value such as an integer, string, date, etc. • A whole record usually represented in XML, JSON, or some other document or object structure.
  • 52. CF - Putting it all Together
  • 53. Wide Row CF • A collection of like records organized into a single row. • Each record is stored as a distinct column. • Not unheard of for each row to have millions of columns. • Data is often denormalized into XML or JSON documents. • Good for storing: • Time Series Data • Event Series Data • Logging Data • Tracking Data
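The nested Key/Value model from the data modeling slides can be pictured with plain dictionaries. The sketch below is a toy in-memory model, not a Cassandra client: the class, the column family purpose, the device key, and the event payloads are all hypothetical. It shows a wide row where the row key is a device, each column name is a sortable event id, and each column value is a denormalized JSON document.

```python
import json


class ColumnFamily:
    """Toy model: row key -> {column name -> column value}."""

    def __init__(self):
        self._rows = {}

    def insert(self, row_key, column_name, column_value):
        # A column name is unique within its row, so a repeat overwrites.
        self._rows.setdefault(row_key, {})[column_name] = column_value

    def slice(self, row_key, start, end):
        """Return the columns of one row whose names fall in [start, end], in name order."""
        columns = self._rows.get(row_key, {})
        return [(name, columns[name])
                for name in sorted(columns)
                if start <= name <= end]


# Hypothetical wide row: one row per device, one column per tracked event.
device_events = ColumnFamily()
device_events.insert("device-42", 1003, json.dumps({"type": "click", "offer": 9}))
device_events.insert("device-42", 1001, json.dumps({"type": "session_start"}))
device_events.insert("device-42", 1002, json.dumps({"type": "impression", "offer": 9}))

for event_id, document in device_events.slice("device-42", 1001, 1002):
    print(event_id, json.loads(document)["type"])
```

Reading a contiguous range of column names from a single row is the access pattern the wide row layout is built for, which is why time-ordered ids pair well with it.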
  • 54. Wide Row Examples
  • 55. Agenda • About NativeX • Why Cassandra? • Challenges • Auto Id Generation • FluentCassandra • Hector • IKVM.NET • HectorNet • Reporting Integration • Data Modeling • Lessons Learned
  • 56. Lessons Learned • Get into production early • Migration is hard • Data Import = Reality • Dev team needs to be integrated right away • Training • Operations / Troubleshooting • Understanding your I/O profile is really important • Are you sure you're write heavy? • Affects your hardware config, i.e. SSDs for us
  • 57. Lessons Learned • Cluster sizing and hardware selection • Dependent on data set + workload • You might get it wrong the first time • Enterprise vs. 'commodity' • Cassandra changes quickly • You need to keep up • Leverage mailing list, forums, release notes • Scalable systems like C* have a massive number of knobs, you need to know them
  • 58. Projections
      Month   Pub DAU     Adv DAU     Total Devices  Nodes Needed for Disk  Nodes Needed for BF  Nodes Needed for RR
      Aug-13  18,500,000  12,000,000  1,060,000,000  9                      22                   14
      Sep-13  28,500,000  6,600,000   1,120,000,000  12                     16                   16
      Oct-13  33,500,000  7,600,000   1,180,000,000  13                     17                   19
      Nov-13  38,500,000  8,600,000   1,240,000,000  15                     18                   22
      Dec-13  43,500,000  9,600,000   1,300,000,000  16                     19                   24
      Jan-14  48,500,000  10,600,000  1,360,000,000  17                     20                   27
      Feb-14  53,500,000  11,600,000  1,420,000,000  19                     21                   30
      Mar-14  58,500,000  12,600,000  1,480,000,000  20                     22                   33
      Apr-14  63,500,000  13,600,000  1,540,000,000  21                     23                   35
      Capacities
      Number of Nodes          30.00
      Usable Space/Node (GB)   600.00
      Total Usable Space (GB)  18,000.00
      Memory/Node (GB)         64.00
      JVM Heap Size (GB)       8.00
      BF Size / Node (GB)      1.50
      Replication Factor       3.00
      Read Requests/Node       1,000.00
      Understand which KPI represents Node capacity.
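One way to read the projections slide is to ask, for each month, which KPI demands the most nodes and whether that demand exceeds the 30 nodes on hand. The sketch below does that arithmetic using only the figures from the slide. The interpretation of "BF" as bloom filter and "RR" as read requests is an assumption based on the capacities list and the editor's notes, and the variable and function names are illustrative.

```python
NODES_AVAILABLE = 30  # "Number of Nodes" from the Capacities list

# Nodes needed per KPI, copied from the slide: (disk, BF, RR).
projections = {
    "Aug-13": (9, 22, 14),
    "Sep-13": (12, 16, 16),
    "Oct-13": (13, 17, 19),
    "Nov-13": (15, 18, 22),
    "Dec-13": (16, 19, 24),
    "Jan-14": (17, 20, 27),
    "Feb-14": (19, 21, 30),
    "Mar-14": (20, 22, 33),
    "Apr-14": (21, 23, 35),
}
KPI_NAMES = ("disk", "BF", "RR")


def binding_kpi(needs):
    """Return (kpi name, nodes needed) for the most demanding KPI."""
    index = max(range(len(needs)), key=lambda i: needs[i])
    return KPI_NAMES[index], needs[index]


for month, needs in projections.items():
    kpi, nodes = binding_kpi(needs)
    status = "OK" if nodes <= NODES_AVAILABLE else "SHORT"
    print(f"{month}: {kpi} needs {nodes} nodes ({status})")

first_shortfall = next(
    (m for m, needs in projections.items() if max(needs) > NODES_AVAILABLE),
    None,
)
print("First month over capacity:", first_shortfall)
```

On these figures the binding KPI shifts from BF in Aug-13 to RR from Oct-13 onward, which illustrates the slide's closing point: the KPI that represents node capacity depends on the workload, not just on disk.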
  • 59. DSE for the Win! • We use DataStax Enterprise. • Mainly for support, which continues to be a life saver.
  • 60. Thank you! • Join the MSP C* Meetup • http://www.meetup.com/Minneapolis-St-Paul-Cassandra-Meetup/ • Contact us • Jeff.Smoley@nativex.com • Derek.Bromenshenkel@nativex.com @breakingtrail • Slide Deck • http://www.slideshare.net/jjsmoley/the-perils-and-triumphs-of-using-cassandra-at-a-netmicrosoft-shop

Editor's Notes

  1. (Transition to Challenges)
  2. External API is locked to 64-bit integer (no strings). Increasing over time helps SQL Server indexing. No Identity column in C*. TimeUUIDs – now and everything in the future.
  3. Solutions: SQL – table(s) with a single IDENTITY column; not infinitely scalable. Pre-generated range – Matt Dennis, DataStax; ordering not guaranteed either. Instagram – http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram; slick but we found it too late.
  4. Our tweaks - apache daemon - remove internal logging - JRE version -
  5. https://github.com/twitter/snowflake Requirements: unique, time sortable, increasing, 64-bit int, fast, distributed. Twitter already using it. Java – yes, MS shop, but we can handle it. Thrift – maybe not ideal, but it’s familiar. Zookeeper is a drawback – but only needed for SF startup.
  6. Still need to connect from our .NET app – rolled our own. Thrift part is generated code. Web app connects to local SF instance to save network hop. Can fail over to SF on other web nodes. Auto failover and recovery: http://timross.wordpress.com/2008/02/17/implementing-the-circuit-breaker-pattern-in-c-part-2/
  7. Thrift is fine, but its low-level nature would be too much for our engineers to overcome. CQL is the future; easier for us to understand anyway. Don’t know what we’re doing – need support. Would like the familiar feel. CQL eliminated most players. FluentCassandra had everything else. DataStax not in the game yet.
  8. Open source – we were able to contribute bug fixes; relatively active. Nick very knowledgeable & responsive. It was drop in and go for development – but that’s when we started stumbling a bit.
  9. It was drop in and go for development – but that’s when we started stumbling a bit. Resources – taking too long; distracting leads; back to drawing board.
  10. Really, the challenge is connecting WITH failure tolerance.
  11. Along the way, learned about Hector. Wanted to know more about its features relative to FluentCassandra, so we researched it. Variety of use cases including high transaction. Much better advanced feature set. Java – we’re not scared of it, but also not willing to rewrite entire business app.
  12. (Transition to Reporting)
  13. Migration: Schema changes. A lot of data. Keep it in sync (dual threads). Get into production early – dual threaded requests, worked out really well, but… Data Import = Reality – we didn’t have a complete set of data imported. It turns out that our dataset size once imported dramatically affected performance, particularly in regards to how bloom filters and the JVM work with Cassandra. Break down communication barriers – C* Dojo. Developing against C* is a paradigm shift. It takes time for developers – start them early. Understanding your IO profile is really important – This is the essence of NoSQL, start here. Cassandra is best at writing. Most data systems: write once – read many. We’re actually a read-heavy workload.
  14. Sizing and hardware - This is why the above points are so important – increase your chances of getting it right the first time - You might get it wrong – we did – so set those expectations up front - Commodity means readily available and high value. Doesn’t have to be ‘consumer’ grade. - Leverage cloud resources in working toward right sizing your cluster. Cassandra changes quickly, you need to keep up – It’s open source and immature compared to SQL Server, MySQL, Oracle – For example, object level security was just implemented in the last version of C*. Scalable systems like C* have a massive number of knobs, you need to know them – There are hundreds, you need to expect to have someone focused on this.