Cassandra Day SV 2014: Apache Cassandra at Equinix for High Performance, Scalability and Short Response Time
In this session, Praveen will be presenting Equinix's big data platform and how Cassandra sits at the center of it.


Published in: Technology

Transcript

  • 1. Equinix Big Data Platform & Cassandra. Praveen Kumar, Emerging Software Platforms, Global Software Engineering. Mar 2014. (CONFIDENTIAL)
  • 2. Big Data at Equinix (Confidential – © 2013 Equinix Inc. www.equinix.com)
      • Sensors across 95+ IBXs
      • ~2 million alarms
      • ~200k interconnections
      • ~250k electrical circuits
      • ~40k infrastructure objects
  • 3. Big Data at Equinix: sensors across 95+ IBXs lead to / produce
      • Support for multiple protocols
      • Push as well as pull methods
      • Time-series data
      • Cross-sectional data
      • Not so clean data
      • High velocity
      • Clean data
      • Lots and lots of noise
      • Some useful intel
  • 4. Big Data at Equinix: what do we use (or plan to use) this data for?
      • Customer presentment
      • Billing
      • Operations
      • New products & services
  • 5. Big Data at Equinix: use-case analysis (80-20 rule)
      • ~80% of the use-cases analyzed act upon “Hot Data”.
      • ~80% of the data for most of the use-cases analyzed is time-series.
      • All “quick win” use-cases need data mediation, aggregation and roll-up for presentment.
      • Real-time to near real-time processing of events.
      • Collection, processing and storage technologies suitable for time-series data.
      • Collection, mediation, cross-referencing and correlation of data from different sources; roll-up and aggregation.
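
The roll-up and aggregation of time-series data mentioned on this slide can be illustrated with a minimal sketch. It is not taken from the deck: the function name `roll_up`, the one-hour bucket width and the sample readings are all assumptions made for illustration. It groups (timestamp, value) samples into fixed-width buckets and keeps a few summary statistics per bucket, which is the usual shape of data served for presentment.

```python
from collections import defaultdict
from statistics import mean


def roll_up(samples, bucket_seconds=3600):
    """Group (epoch_seconds, value) samples into fixed-width time buckets.

    Returns {bucket_start: {"min", "max", "avg", "count"}}, ordered by time.
    The bucket width is a hypothetical default, not a value from the slides.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        bucket_start = ts - (ts % bucket_seconds)  # floor to bucket boundary
        buckets[bucket_start].append(value)
    return {
        start: {
            "min": min(values),
            "max": max(values),
            "avg": mean(values),
            "count": len(values),
        }
        for start, values in sorted(buckets.items())
    }


# Hypothetical sensor readings: (seconds since some epoch, measured value).
readings = [(0, 10.0), (1800, 14.0), (3600, 20.0), (5400, 22.0), (7199, 24.0)]
hourly = roll_up(readings)
for start, stats in hourly.items():
    print(start, stats)
```

Storing the rolled-up rows alongside the raw samples trades extra writes for cheap reads, which fits the "hot data" access pattern described above.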
  • 6. Big Data at Equinix: our approach, the Equinix Big Data Platform
      • Common platform to be shared by all initial Big Data use cases (multi-tenancy)
      • Built on inexpensive hardware using free or inexpensive software
      • Seamless & massive scalability using scale-out
      • High reliability: partial failover, graceful degradation, self-healing, self-balancing
      • Data ingestion and processing capabilities for high volumes at high velocity
      • Support for structured and semi-structured data
      • Provides real-time processing abilities
      • Provides parallel processing capabilities
      • Support for low latency queries, wide range scan queries and search
      • Provides abstraction via connectors, frameworks and libraries
      • Support for predictive analytics using machine learning
      Slide legend: Immediate requirements; Long term goals
      [Figure: Big Data Platform - Logical Architecture (technology agnostic)]
  • 7. Big Data at Equinix: Requirements & Technologies considered for Big Data Platform
  • 8. Big Data at Equinix: Grand Finale, Hadoop Ecosystem vs. DataStax Enterprise
      [Figure: node diagrams labelled Search, Analytics and Storage]
      Hadoop ecosystem components: Hadoop Distributed File System (Storage/Analytics): NameNode, Secondary NameNode, Data Nodes (Storage); HBase (Storage/Analytics): HBase Master, HBase Region Servers; Search: Solr Nodes; Management Services: Cloudera Manager, ZooKeeper
      Hadoop ecosystem, pros:
      • Scalability
      • Cloud readiness
      • Resource availability
      • Industry momentum
      • Product eco-system maturity
      • Technical support
      Hadoop ecosystem, cons:
      • Infrastructure footprint
      • Operational complexity
      • Learning curve
      • Availability
      • Total cost of ownership
      DataStax Enterprise, pros:
      • Infrastructure footprint
      • Operational ease
      • Scalability
      • Availability
      • Cloud readiness
      • Learning curve
      • Resource availability
      • Technical support
      • Total cost of ownership
      DataStax Enterprise, cons:
      • Industry momentum
      • Product eco-system maturity
  • 9. Big Data at Equinix: Cassandra vs. HBase
      Criteria | Cassandra | HBase
      CAP theorem focus | Availability, Partition-Tolerance | Consistency, Availability
      Data partitioning | Supports ordered & random partitioning; random partitioning is recommended. | Ordered partitioning. Load balancing achieved through resharding.
      Distributed system | P2P architecture (Amazon Dynamo) | Master / Slave via HDFS, ZooKeeper for coordination
      Administration & maintenance | Medium | High
      Single write master | No (R + W > N to get strong consistency) | Yes
      Multi-tenancy | Yes | Yes
      Secondary indexes | Supports secondary indexes on a CF where the column name is known. | Does not natively support secondary indexes.
      Consistency | Tunable consistency | Strict consistency (not ACID)
      Hot spot problem | No, distributes load across nodes using random partition strategy. | Yes, one node may handle most of the traffic due to ordered partitioning.
      Multi-data center support and disaster recovery | Asynchronous replication via WAN | Asynchronous replication via WAN
      Single point of failure | Ring topology, there is no single point of failure. | Although there exists a concept of a master server, HBase itself does not depend on it heavily. An HBase cluster can keep serving data even if the master goes down. The Hadoop NameNode is a single point of failure.
      Commercial vendors | DataStax, Acunu | Cloudera, Hortonworks
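
The "tunable consistency" row rests on the quorum-overlap rule usually stated for Dynamo-style stores: with N replicas, a read that contacts R replicas is guaranteed to see the latest acknowledged write that contacted W replicas when R + W > N. The sketch below only checks that inequality for a few read/write combinations. The helper names and the replication factor of 3 are assumptions for illustration and are not part of any Cassandra driver API.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas, e.g. 2 of 3."""
    return replication_factor // 2 + 1


def reads_see_latest_write(n: int, r: int, w: int) -> bool:
    """True when R + W > N, i.e. the read and write replica sets must overlap."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


N = 3  # hypothetical replication factor, not a figure from the slides
levels = {
    "read ONE, write ONE": (1, 1),
    "read QUORUM, write QUORUM": (quorum(N), quorum(N)),
    "read ONE, write ALL": (1, N),
}
results = {name: reads_see_latest_write(N, r, w) for name, (r, w) in levels.items()}
for name, ok in results.items():
    print(f"{name}: {'overlap guaranteed' if ok else 'stale reads possible'}")
```

Lower R and W values favour latency and availability; higher values favour consistency, which is the trade-off the table summarises as "tunable".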
  • 10. Big Data at Equinix: why DSE Cassandra
      • Support for analytics
      • Integrated search using Solr
      • Security features
      • Cluster management capabilities
      • Commercial support
      DataStax would probably list many more reasons; these are the ones relevant to us.
  • 11. Big Data at Equinix: Grand Finale, Hadoop Ecosystem vs. DataStax Enterprise (repeats the comparison from slide 8, with a "✓ Sold" mark added)
  • 12. Big Data at Equinix: how far are we on our Big Data journey?
      ✓ Pilot use-case from PoC to production
      ✓ Moved network statistics use case from RRD-based solution to DSE Cassandra
      ✓ Build in progress for
          • power monitoring use cases
          • data center monitoring
          • network monitoring
      In plans
      • Recommendation engine on interconnection platform
      • Use case analysis and technology selection for connected data sets
      • Building data science capabilities for use cases requiring predictive modeling
      A few data points
      • Physical bare metal boxes for DSE nodes
      • Densely packed data nodes with 4TB storage on each node, 96GB RAM
      • About 250 million records a day
      • Also used for log analysis for internal IT systems monitoring use-cases
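
To put the "250 million records a day" figure in perspective, the short calculation below converts it to an average write rate. Only the daily record count comes from the slide. The node count and replication factor are hypothetical values chosen for illustration, since the deck does not state either, and real traffic would have peaks well above the average.

```python
RECORDS_PER_DAY = 250_000_000      # from the slide
SECONDS_PER_DAY = 24 * 60 * 60     # 86,400

# Average ingest rate if records arrived evenly across the day.
avg_writes_per_second = RECORDS_PER_DAY / SECONDS_PER_DAY


def per_node_write_rate(cluster_rate, replication_factor, node_count):
    """Average replica writes per second landing on each node.

    Every logical write is stored replication_factor times, spread
    (ideally evenly) over node_count nodes.
    """
    return cluster_rate * replication_factor / node_count


# Hypothetical cluster shape: these two numbers are NOT from the deck.
ASSUMED_REPLICATION_FACTOR = 3
ASSUMED_NODE_COUNT = 6

per_node = per_node_write_rate(
    avg_writes_per_second, ASSUMED_REPLICATION_FACTOR, ASSUMED_NODE_COUNT
)
print(f"cluster average: {avg_writes_per_second:.0f} writes/s")
print(f"per node (assumed RF=3, 6 nodes): {per_node:.0f} replica writes/s")
```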
  • 13. Big Data at Equinix: experience so far
      Lack of standards-based connectors / drivers
      • DataStax has developed a Java driver, but it doesn’t support JDBC
      • No data visualization tools that can access Cassandra directly for low-latency access
      • No data access tools (Toad equivalent) available yet; DevCenter is not there yet
      • We used Astyanax and are evaluating the DataStax Java driver
      • Built libraries to abstract Astyanax for application engineering teams
      • Built REST services for data access by applications
      Good reliability
      • Not many instances of nodes being down
      • Handled loads even when nodes were down
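
The idea of wrapping the client library so application teams do not code against it directly can be sketched as follows. This is not the Equinix library, which the deck does not show, and it does not use Astyanax or the DataStax driver. The interface `TimeSeriesStore`, its methods and the in-memory implementation are all hypothetical and exist only to show the shape of such an abstraction: applications depend on the interface, and the driver-specific class can be swapped without touching them.

```python
from abc import ABC, abstractmethod


class TimeSeriesStore(ABC):
    """Hypothetical storage interface that application code depends on."""

    @abstractmethod
    def write(self, series_id: str, ts: int, value: float) -> None:
        """Store one sample for a series."""

    @abstractmethod
    def read_range(self, series_id: str, start: int, end: int) -> list:
        """Return (ts, value) pairs with start <= ts < end, ordered by ts."""


class InMemoryStore(TimeSeriesStore):
    """Stand-in backend; a real one would delegate to a Cassandra client."""

    def __init__(self) -> None:
        self._rows: dict = {}

    def write(self, series_id: str, ts: int, value: float) -> None:
        self._rows.setdefault(series_id, {})[ts] = value

    def read_range(self, series_id: str, start: int, end: int) -> list:
        points = self._rows.get(series_id, {})
        return sorted((ts, v) for ts, v in points.items() if start <= ts < end)


def latest_value(store: TimeSeriesStore, series_id: str, start: int, end: int):
    """Application-level helper written only against the interface."""
    points = store.read_range(series_id, start, end)
    return points[-1][1] if points else None


store = InMemoryStore()
store.write("circuit-42", 100, 1.5)   # "circuit-42" is a made-up series id
store.write("circuit-42", 160, 1.7)
store.write("circuit-42", 220, 1.6)
print(store.read_range("circuit-42", 100, 200))
print(latest_value(store, "circuit-42", 0, 1000))
```

A REST layer for other applications would sit on top of the same interface, so a later driver change stays invisible to its callers.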
  • 14. Big Data at Equinix: where do we go from here?
      • Graph databases
      • Batch processing (Hadoop, Spark, MapReduce?)
      • Interactive queries
      • Online data processing
      • Data analytics
      • Data science and machine learning
      • Data visualization tools and applications
      • Developer toolkits
      We are hiring: Big Data Architect, Big Data Engineers, Data Scientists. Send your resume to pkumar@equinix.com
  • 15. Thank you! pkmr.work@gmail.com, pkumar@equinix.com, www.equinix.com
  • 16. EQUINIX?
  • 17. WHO IS EQUINIX?
  • 18. MOVING TOWARDS THE FUTURE | PLATFORM. Equinix: A Platform for Growth
      • GLOBAL DATA CENTERS: 95+ Data Centers, 9M+ Square Feet, 99.999% Uptime Record
      • INTERCONNECTION: 950+ Networks, 110,000+ Cross Connects
      • BUSINESS ECOSYSTEMS: Equinix Marketplace™, 4,000+ Businesses, Revenue Opportunities
  • 19. Solid. Powerful. Growing. $1.8B in annualized revenue; member of the NASDAQ 100; $7B investments in expansion
  • 20. 15 countries, 5 continents, 31 markets
  • 21. HOW WE’RE DIFFERENT | GLOBAL FOOTPRINT. Where You Are. Where You Need To Be.
  • 22. Over 90% of internet routes pass through Equinix data centers; 950+ network providers
  • 23. 450+ CLOUD & SaaS PROVIDERS
  • 24. Thank you! pkmr.work@gmail.com, pkumar@equinix.com, www.equinix.com