The Use Case for Cassandra at Ping Identity
How and why Ping Identity uses the Cassandra database inside PingOne.

By Michael Ward, Site Reliability Engineer, On-Demand
Ping Identity
mward@pingidentity.com
@devoperandi

Notes

  • History: Cassandra, like most things at Ping, started out as a trial run. We implemented reporting for PingOne on Cassandra and let it bake; we wanted to see what direction it was going, get our feet wet, and see how it fit in with existing and future projects. We were experimenting with MongoDB at the same time, and there was a great debate between Cassandra and MongoDB. Cassandra won due to its write-anywhere design, which matched our philosophy: more servers with smaller capacity; geographic distribution for data redundancy, availability, and performance; horizontal scalability; no single point of failure.
  • Remember to mention our migration from Mongo by year end. We haven't performed this migration yet.
  • Why? The reporting cluster was built to provide insight into PingOne. SaaS applications are known for not providing logging and reporting information to their customers; we wanted to change that, and we continue building this functionality out. Reports range from the number of successful and failed SSOs to unique user access per application over any period of time, going back up to a year. We still run the same schema because the use case still fits. The client is Hector over the Thrift API. A sketch of this kind of reporting model follows below.
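    To make the reporting model concrete, here is a minimal sketch in CQL3 terms (the deck itself accesses the cluster through Hector/Thrift, and these table and column names are hypothetical, not Ping's actual schema):

        CREATE TABLE sso_events_by_app (
            app_id    uuid,
            event_day timestamp,
            user_id   uuid,
            success   boolean,
            PRIMARY KEY (app_id, event_day, user_id)
        );

        -- Unique user access for one application over a period of time
        -- (distinct users deduplicated client-side):
        SELECT user_id FROM sso_events_by_app
         WHERE app_id = 62c36092-82a1-3a00-93d1-46196ee77204
           AND event_day >= '2013-01-01' AND event_day < '2013-02-01';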
  • Requirements: geographically distributed; respectable performance; no updates or deletes (repairs suck). Benefits: easy management, thanks to those cluster requirements. Limitations: one big ring, so writes could start in DC1 and actually land in DC2; lopsided data; no compression; reads were slow; nodes recovered over the WAN; lack of security.
  • This upgrade happened in two parts: first to v0.8, then to v1.1.2. After upgrading the cluster in place we found out this wasn't a good idea: we missed out on compression, our data was still not evenly distributed, and replication was set to one replica per DC.
  • Started with 9 nodes in the cluster, with the intent to horizontally scale. We saw a 25-35% performance improvement on reads and 5-10% on writes. Enabling compression cut data size by 50%. Token offsets gave better data distribution, and node recovery happened locally. Multiple replicas per DC mean we always read locally, and the first write always lands locally, so the application gets a faster response back. Some limitations remained. A sketch of the replica layout follows below.
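    The "always read locally" property comes from the replica placement. A minimal sketch in CQL3 syntax (the keyspace name is hypothetical; on the actual v1.1.9 cluster the equivalent was configured via strategy options):

        CREATE KEYSPACE pingone_reporting
          WITH replication = {'class': 'NetworkTopologyStrategy',
                              'DC1': 2, 'DC2': 2, 'DC3': 2};

        -- With two replicas in every DC, a client reading at ONE or
        -- LOCAL_QUORUM is served entirely by replicas in its own
        -- datacenter; no WAN hop is needed.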
  • Traffic was first directed at the old cluster. We took a snapshot of the cluster, pushed it to the new cluster, copied the schema from the old cluster to the new one (adding Snappy compression), bulk loaded the data into the new cluster, switched traffic to the new cluster, and then replayed logs from the central log server from bulk-load time onward. We chose to stream the data into a new cluster, rather than upgrade in place again, to gain compression. Steps: tar up the snapshot, push it to the new cluster, stream it in using the bulk loader. Because we did this during the day, we knew consistency between the clusters would fall behind; we allowed this because we were capable of replaying the gap into the new cluster after the switch. The compression setting is shown below.
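    Compression is a per-table setting declared when the schema is copied to the new cluster, matching the snippet on slide 9 (reusing the hypothetical table from the earlier sketch):

        ALTER TABLE sso_events_by_app
          WITH compression = {'sstable_compression': 'SnappyCompressor'};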
  • Here is what our reporting cluster looks like on the front end.
  • New cluster: much easier to implement. No manual token generation, more efficient memory utilization, secondary indexes, and better data distribution via vnodes. Devs wanted to take advantage of CQL3, implement the Astyanax client, and use atomic batches; Ops wanted internal auth. Performance boosts in v1.2: a reduced memory footprint, with the partition summary as the last remaining on-heap memory structure, and a 15% read performance increase from the '-XX:+UseTLAB' JVM flag, which localizes object allocation in memory (https://blogs.oracle.com/jonthecollector/entry/the_real_thing). Auto token generation: just set the number of token ranges you want per server. Data distribution: more token ranges means a cluster is less likely to be unbalanced. Memory utilization: compression metadata and bloom filters moved off-heap. Atomic batches: if one statement succeeds, they all do. Request tracing: allows performance testing of individual queries against the database. Authentication/authorization: security around the cluster, go figure. And there is less manual cluster rebalancing when using something other than the random partitioner. A few of these features are sketched below.
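    A few of these v1.2 features at the CQL3 level (a minimal sketch; table, column, and index names are hypothetical). Auto token generation, by contrast, lives in cassandra.yaml as the num_tokens setting, and request tracing can be toggled per session in cqlsh with TRACING ON:

        CREATE TABLE users (user_id uuid PRIMARY KEY, group_id uuid);
        CREATE TABLE group_members (group_id uuid, user_id uuid,
                                    PRIMARY KEY (group_id, user_id));

        -- Secondary index: lets us query users by a non-key column
        CREATE INDEX users_group_idx ON users (group_id);

        -- Atomic (logged) batch: either both inserts apply, or neither does
        BEGIN BATCH
          INSERT INTO users (user_id, group_id)
            VALUES (62c36092-82a1-3a00-93d1-46196ee77204,
                    7f2f64f2-0b0d-4d2f-8a5a-0b0d4d2f8a5a);
          INSERT INTO group_members (group_id, user_id)
            VALUES (7f2f64f2-0b0d-4d2f-8a5a-0b0d4d2f8a5a,
                    62c36092-82a1-3a00-93d1-46196ee77204);
        APPLY BATCH;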
  • We aren't currently using the row cache; the number of replicas per datacenter can actually reduce the effectiveness of row caching. The per-table setting is shown below.
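    Caching is a per-table setting. A sketch of leaving the row cache off in Cassandra 1.2 CQL3 (table name hypothetical; the valid values in 1.2 are 'all', 'keys_only', 'rows_only', and 'none'):

        ALTER TABLE sso_events_by_app WITH caching = 'keys_only';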

Transcript

  • 1. Site Reliability Engineering
  • 2. About Ping Identity • We believe secure professional and personal identities underlie human progress in a connected world. Our purpose is to enable and protect identity, defend privacy and secure the Internet. • Over 1,000 companies, including over half of the Fortune 100, rely on our award-winning products to make the digital world a better experience for hundreds of millions of people. • Denver, Colorado. Est. 2003
  • 3. Design Philosophy: Memory: 6-8GB; CPU: 2; Disk: 30GB. More servers, smaller capacity. Geographic distribution for data redundancy, availability and performance. Horizontal scalability. No single point of failure.
  • 4. Cassandra at Ping… taking the plunge. Current: PingOne reporting; PingOne for Groups. Future goals: migration from Mongo (EOY); some migration from MySQL; real-time analytics (innovation project).
  • 5. Cassandra Reporting Cluster • Built to provide customer insight into PingOne • First dive into Cassandra: v0.7 (production) • Upgrade (in place) to v1.1.2, with v0.8 as a stepping stone • In production today as v1.1.9
  • 6. Cassandra Reporting Cluster (v0.7) [diagram: a single ring spanning DC1 and DC2]
  • 7. Where did we miss? (v1.1.2) Upgrade in place: missed out on compression; failed to gain read performance; data wasn't spread evenly across the cluster.
  • 8. Cassandra Reporting (v1.1.9) [diagram: three DCs with token offsets 0, 10, 20] Features: 9 nodes; compression; token offsets (arithmetic sketched below); 2 replicas per DC; off-heap caching; server-to-server SSL. Limitations: no access control; manual token generation; node recovery pulls 1 replica from each token range across DC1, DC2, DC3.
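    For context on the token offsets: with the RandomPartitioner, manual token generation assigns node i of N in a datacenter the token i * (2**127 / N), then shifts every token in each additional DC by a small constant (the 0, 10, 20 offsets in the diagram) so that no two nodes share a token. A sketch for 3 nodes per DC (illustrative arithmetic, not Ping's actual tokens):

        DC1: 0, 56713727820156410577229101238628035242, 113427455640312821154458202477256070485
        DC2: each DC1 token + 10
        DC3: each DC1 token + 20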
  • 9. Migration from 1.1.2 to 1.1.9 [diagram: snapshots flow from the v1.1.2 cluster to the v1.1.9 cluster; central logs are replayed via a log transform] Snapshots to new cluster; bulkload snapshots into the cluster to gain compression; create tables with: compression={'sstable_compression': 'SnappyCompressor'}; traffic switch after bulkload; replay reporting gap to the new cluster.
  • 10. PingOne Reports
  • 11. PingOne for Groups: Cassandra v1.2.5. Features: auto token generation; vnodes; decent data distribution; more efficient memory utilization; atomic batches; secondary indexes; request tracing; internal authentication/authorization.
  • 12. Astyanax config (client configuration fragment):
        .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE) // discover nodes by describing the ring
        .setCqlVersion("3.0.0")            // speak CQL3
        .setTargetCassandraVersion("1.2")  // target Cassandra 1.2
        .setRetryPolicy(retryPolicy)       // RUN_ONCE retry policy
        .setConnectTimeout(2000)           // connection pool: 2s connect timeout
  • 13. Limitations (v1.2.5): caching; number of replicas; size-tiered compaction (one mitigation sketched below).
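    On the compaction point: one general option in Cassandra 1.2 (not something the deck says Ping adopted) is to move a read-heavy table from size-tiered to leveled compaction, e.g.:

        ALTER TABLE sso_events_by_app
          WITH compaction = {'class': 'LeveledCompactionStrategy'};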
  • 14. Questions?
  • 15. Thanks! mward@pingidentity.com @devoperandi www.pingidentity.com/blogs http://status.pingidentity.com http://uptime.pingidentity.com