CONFIDENTIAL
1
Praveen Kumar
Emerging Software Platforms,
Global Software Engineering
Mar 2014
Equinix Big Data Platform &...
Equinix Big Data Platform and Cassandra - A view into the journey


The story of building a Big Data Platform at Equinix to serve a number of use cases. It explains the journey and the selection of Cassandra as the NoSQL store at the heart of the platform. Storm, Flume, AMQ, Drools, and Solr also play important roles. The platform processes large amounts of data in real time.

Published in: Technology
Equinix Big Data Platform and Cassandra - A view into the journey

  1. Equinix Big Data Platform & Cassandra (CONFIDENTIAL)
     Praveen Kumar, Emerging Software Platforms, Global Software Engineering
     Mar 2014
  2. Big Data at Equinix (Confidential – © 2013 Equinix Inc. www.equinix.com)
     • Sensors across 95+ IBXs
     • ~2 million alarms
     • ~200k interconnections
     • ~250k electrical circuits
     • ~40k infrastructure objects
  3. Big Data at Equinix
     Sensors across 95+ IBXs lead to / produce:
     • Support for multiple protocols
     • Push as well as pull methods
     • Time-series data
     • Cross-sectional data
     • Not-so-clean data
     • High velocity
     • Clean data
     • Lots and lots of noise
     • Some useful intel
  4. Big Data at Equinix: What do we use (or plan to use) this data for?
     • Customer Presentment
     • Billing
     • Operations
     • New Product & Services
  5. Big Data at Equinix: Use-case analysis: 80-20 rule
     • ~80% of the use-cases analyzed act upon "Hot Data".
     • ~80% of the data for most of the use-cases analyzed is time-series.
     • All "quick win" use-cases need data mediation, aggregation and roll-up for presentment.
     • Real-time to near real-time processing of events.
     • Collection, processing and storage technologies suitable for time-series data.
     • Collection, mediation, cross-referencing and correlation of data from different sources; roll-up and aggregation.
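The roll-up and aggregation requirement on slide 5 can be sketched in a few lines. This is a minimal illustration, not the platform's implementation: the `roll_up` function, the five-minute window, and the sample power readings are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def roll_up(samples, window_seconds):
    """Group (timestamp, value) samples into fixed windows and summarize each."""
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each timestamp to the start of the window it falls in.
        window_start = ts - (ts % window_seconds)
        buckets[window_start].append(value)
    return {
        start: {
            "min": min(values),
            "max": max(values),
            "avg": mean(values),
            "count": len(values),
        }
        for start, values in sorted(buckets.items())
    }


# Hypothetical readings: (epoch seconds, power draw in kW).
readings = [(0, 1.0), (60, 3.0), (120, 2.0), (300, 4.0), (360, 6.0)]
five_minute = roll_up(readings, 300)
```

Storing the summarized windows next to the raw samples is one common way to keep presentment queries cheap; whether the platform does this is not stated in the slides.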
  6. Big Data at Equinix: Our Approach: Equinix Big Data Platform
     • Common platform to be shared by all initial Big Data use cases (multi-tenancy)
     • Built on inexpensive hardware using free or inexpensive software
     • Seamless & massive scalability using scale-out
     • High reliability: partial failover, graceful degradation, self-healing, self-balancing
     • Data ingestion and processing capabilities for high volumes at high velocity
     • Support for structured and semi-structured data
     • Provides real-time processing abilities
     • Provides parallel processing capabilities
     • Support for low latency queries, wide range scan queries and search
     • Provides abstraction via connectors, frameworks and libraries
     • Support for predictive analytics using machine learning
     Legend: Immediate requirements; Long term goals
     Big Data Platform: Logical Architecture (technology agnostic)
     Data Sources: Java Messages; Flat Files; FTP; Log Streams; RDBMS; JSON Files; Files (Unstructured)
     Equinix Big Data Platform (diagram labels, in slide order): Ingestion Layer; Connector; Parser; Data Processor; Writer; Real-time Processing Layer; Repository; Raw Data; Processed/Derived Data; Parallel Processing Layer; Reconciliator; Deep Analyzer; Real-time monitoring; Real-time Predictive Analytics; Access Layer; Low latency Ad-hoc access; Batch framework; Large range data access
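One way to read the ingestion part of the logical architecture is as a linear pipeline: Connector, then Parser, then Data Processor, then Writer. The sketch below assumes that ordering, which the flattened slide text does not confirm. Every function and class name, the JSON payloads, and the `kw`/`watts` fields are hypothetical.

```python
import json
from typing import Iterable, Iterator, List, Optional


def connector(lines: Iterable[str]) -> Iterator[str]:
    """Connector: yields raw payloads from a source (here, an in-memory list)."""
    for line in lines:
        yield line


def parser(raw: str) -> Optional[dict]:
    """Parser: turns a raw JSON payload into a record, or None if malformed."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return record if isinstance(record, dict) else None


def data_processor(record: dict) -> dict:
    """Data Processor: derives a field; the unit conversion is illustrative."""
    enriched = dict(record)
    enriched["watts"] = record["kw"] * 1000
    return enriched


class ListWriter:
    """Writer: stands in for the repository by appending to a list."""

    def __init__(self) -> None:
        self.rows: List[dict] = []

    def write(self, record: dict) -> None:
        self.rows.append(record)


def ingest(lines: Iterable[str], writer: ListWriter) -> int:
    """Run the pipeline and return the number of rejected payloads."""
    rejected = 0
    for raw in connector(lines):
        record = parser(raw)
        if record is None or "kw" not in record:
            rejected += 1  # noisy or incomplete input is counted, not stored
            continue
        writer.write(data_processor(record))
    return rejected


writer = ListWriter()
rejected = ingest(
    ['{"sensor": "pdu-1", "kw": 1.5}', "not json", '{"sensor": "pdu-2"}'],
    writer,
)
```

Keeping each stage a separate callable makes it straightforward to swap the connector per data source (files, messages, log streams) without touching the rest of the chain.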
  7. Big Data at Equinix: Requirements & Technologies considered for Big Data Platform
     Documents; Sensors
     Data Sources; Data Collection & Ingestion; Data Processing & Storage; Data Intelligence; Data Visualization
     Sales Cloud; Service Cloud; On Premise Apps; Equinix Custom Apps; Oracle eBiz, Siebel….
     Real-time Analytics; Ad-hoc Analysis; Dashboards; Log Analysis; Bulk/Trend Analysis
     Data Ingestion capabilities:
     • Scale-out system
     • Real-time validation
     • Real-time analytics
     • Supports stream, batch, extraction on industry standard protocols
     Data Formats / Types:
     • System & app logs
     • Usage data (time-series)
     • Behavior tracking events
     • Complex business events
     • Transactional & operational
     • Master & meta data
     Apache Kafka; Scribe; Apache Flume
     Batch Reporting; Predictive Modeling; Alerts & Notifications; Search
     • Machine learning
     • Pattern detection
     • Regression analysis
     • Time-series analysis
     • Statistical modeling
     • Clustering
     • Classification
     • Recommendation engine

     • Parallel processing capabilities
     • Scale-out system
     • Runs on inexpensive HW
     • High availability
     • Supports structured, semi-structured & unstructured data
     • Fast write-speed
     • NoSQL capabilities
     • Time-series data support
     • Data mart capability
     • Relational schema support
     • Self-healing capability
  8. Big Data at Equinix: Grand Finale: Hadoop Ecosystem vs. DataStax Enterprise
     Diagram labels: Search; Analytics; Storage
     Hadoop ecosystem components shown:
     • Hadoop Distributed File System (Storage/Analytics): NameNode, Secondary NameNode, Data Nodes (Storage)
     • HBase (Storage/Analytics): HBase Master, HBase Region Servers
     • Search: Solr Nodes
     • Management Services: Cloudera Manager
     • Zookeeper
     Hadoop ecosystem, pros: Scalability; Cloud readiness; Resource availability; Industry momentum; Product eco-system maturity; Technical support
     Hadoop ecosystem, cons: Infrastructure footprint; Operational complexity; Learning curve; Availability; Total cost of ownership
     DataStax Enterprise, pros: Infrastructure footprint; Operational ease; Scalability; Availability; Cloud readiness; Learning curve; Resource availability; Technical support; Total cost of ownership
     DataStax Enterprise, cons: Industry momentum; Product eco-system maturity
  9. Big Data at Equinix: Cassandra vs. HBase
     Criteria | Cassandra | HBase
     CAP theorem focus | Availability, Partition-Tolerance | Consistency, Partition-Tolerance
     Data partitioning | Supports ordered & random partitioning; random partitioning is recommended. | Ordered partitioning. Load balancing achieved through resharding.
     Distributed system | P2P architecture (Amazon Dynamo) | Master / Slave via HDFS, Zookeeper for coordination
     Administration & maintenance | Medium | High
     Single write master | No (R + W > N to get strong consistency) | Yes
     Multi-tenancy | Yes | Yes
     Secondary indexes | Supports secondary indexes on CF where column name is known. | Does not natively support secondary indexes.
     Consistency | Tunable consistency | Strict consistency (not ACID)
     Hot spot problem | No, distributes load across nodes using random partition strategy. | Yes, one node may handle most of the traffic due to ordered partition.
     Multi-data center support and disaster recovery | Asynchronous replication via WAN | Asynchronous replication via WAN
     Single point of failure | Ring topology, there is no single point of failure. | Although there exists a concept of a master server, HBase itself does not depend on it heavily. HBase cluster can keep serving data even if the master goes down. Hadoop namenode is a single point of failure.
     Commercial vendors | DataStax, Acunu | Cloudera, Hortonworks
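Two rows of the comparison lend themselves to a small illustration: tunable consistency and the hot spot problem. The sketch below is not Cassandra or HBase code. It assumes the usual rule of thumb that reads are strongly consistent when read replicas plus write replicas exceed the replication factor, and it uses an MD5 hash only as a stand-in for a random partitioner. The split points, node count, and timestamp keys are hypothetical.

```python
import hashlib
from bisect import bisect_right
from collections import Counter
from typing import List


def is_strongly_consistent(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """Strong consistency needs the read and write replica sets to overlap."""
    return read_replicas + write_replicas > replication_factor


def ordered_node(key: str, split_points: List[str]) -> int:
    """Ordered partitioning: the node is chosen by where the key sorts."""
    return bisect_right(split_points, key)


def random_node(key: str, node_count: int) -> int:
    """Random partitioning: the node is chosen from a hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


# Time-ordered row keys, as a sensor feed might produce within one day.
keys = [f"2014-03-01T00:{i // 60:02d}:{i % 60:02d}" for i in range(1000)]

# Four nodes: three split points for ordered placement, modulo 4 for hashing.
ordered_load = Counter(ordered_node(k, ["2014-01", "2014-02", "2014-03"]) for k in keys)
random_load = Counter(random_node(k, 4) for k in keys)

quorum_ok = is_strongly_consistent(2, 2, 3)   # QUORUM reads and writes, RF = 3
one_one_ok = is_strongly_consistent(1, 1, 3)  # ONE reads and writes, RF = 3
```

With ordered placement every key from the same day sorts into the same range, so one node takes all the writes; hashing spreads the same keys across the nodes at the cost of efficient range scans over the key.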
  10. Big Data at Equinix: Why DSE Cassandra
      • Support for analytics
      • Integrated search using Solr
      • Security features
      • Cluster management capabilities
      • Commercial support
      DataStax would probably list many more reasons; these are the ones that made sense to us.
  11. Big Data at Equinix: Grand Finale: Hadoop Ecosystem vs. DataStax Enterprise
      The comparison from slide 8 is shown again, with DataStax Enterprise marked "Sold".
  12. Big Data at Equinix: How far are we on our Big Data journey?
      • Pilot use-case from PoC to production
        • Moved network statistics use case from RRD-based solution to DSE Cassandra
      • Build in progress for
        • power monitoring use cases
        • data center monitoring
        • network monitoring
      In plans:
      • Recommendation engine on interconnection platform
      • Use case analysis and technology selection for connected data sets
      • Building data science capabilities for use cases requiring predictive modeling
      A few data points:
      • Physical bare metal boxes for DSE nodes
      • Densely packed data nodes with 4TB storage on each node, 96GB RAM
      • ~250 million records a day
      • Also used for log analysis for internal IT systems monitoring use-cases
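For a sense of scale, the ~250 million records a day quoted on slide 12 can be converted to an average rate. The calculation assumes a perfectly even spread over the day, which real sensor traffic is unlikely to follow, so it is a floor for sizing rather than a peak figure.

```python
RECORDS_PER_DAY = 250_000_000  # figure quoted on the slide
SECONDS_PER_DAY = 24 * 60 * 60

# Average ingest rate if the load were evenly spread (an assumption).
average_per_second = RECORDS_PER_DAY / SECONDS_PER_DAY

# Records accumulated over a 365-day year at the same daily volume.
records_per_year = RECORDS_PER_DAY * 365
```

That works out to roughly 2,900 records per second on average, and about 91 billion records a year if the daily volume stays flat.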
  13. Big Data at Equinix: Experience so far
      • Lack of standards-based connectors / drivers
        • DataStax has developed a Java driver, but it doesn't support JDBC
      • No data visualization tools for low-latency access to Cassandra
      • No data access tools (Toad equivalent) available yet
        • DataStax DevCenter is trying to solve this problem
      • We used Astyanax and are evaluating the DataStax Java driver
        • Built libraries to abstract Astyanax for application engineering teams
        • Built REST services for data access by applications
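The idea of hiding the client library behind an abstraction, then exposing data through a REST-style service, can be sketched as follows. This is not the Equinix library and it does not call Astyanax or the DataStax driver. `TimeSeriesStore`, `InMemoryStore`, `get_series`, and the series name are hypothetical, and an in-memory backend stands in for Cassandra so the example runs on its own.

```python
from abc import ABC, abstractmethod
from bisect import insort
from typing import Dict, List, Tuple

Point = Tuple[int, float]


class TimeSeriesStore(ABC):
    """What application teams code against; the driver stays behind it."""

    @abstractmethod
    def append(self, series_id: str, timestamp: int, value: float) -> None: ...

    @abstractmethod
    def read_range(self, series_id: str, start: int, end: int) -> List[Point]: ...


class InMemoryStore(TimeSeriesStore):
    """Stand-in backend; a real one would wrap the chosen Cassandra client."""

    def __init__(self) -> None:
        self._series: Dict[str, List[Point]] = {}

    def append(self, series_id: str, timestamp: int, value: float) -> None:
        # Keep points sorted by timestamp so range reads come back in order.
        insort(self._series.setdefault(series_id, []), (timestamp, value))

    def read_range(self, series_id: str, start: int, end: int) -> List[Point]:
        return [p for p in self._series.get(series_id, []) if start <= p[0] < end]


def get_series(store: TimeSeriesStore, series_id: str, start: int, end: int) -> dict:
    """Builds the JSON-style body a REST endpoint might return."""
    points = store.read_range(series_id, start, end)
    return {"series": series_id, "points": [{"t": t, "v": v} for t, v in points]}


store = InMemoryStore()
store.append("port-7/in-octets", 120, 42.0)
store.append("port-7/in-octets", 60, 40.0)
store.append("port-7/in-octets", 180, 45.0)
response = get_series(store, "port-7/in-octets", 60, 180)
```

Because callers depend only on the interface, moving from one client library to another means writing a new backend class rather than changing application code.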
  14. Big Data at Equinix: Where do we go from here?
      • Graph databases
      • Batch processing (Hadoop, Spark, MapReduce?)
      • Interactive queries
      • Online data processing
      • Data analytics
      • Data science and machine learning
      • Data visualization tools and applications
      • Developer toolkits
      We are hiring Big Data Engineers and Data Scientists. Send your resume to pkumar@equinix.com
  15. Thank you! (CONFIDENTIAL)
      • pkmr.work@gmail.com
      • pkumar@equinix.com
      • www.equinix.com
  16. EQUINIX?
  17. WHO IS EQUINIX?
  18. MOVING TOWARDS THE FUTURE | PLATFORM. Equinix: A Platform for Growth
      • GLOBAL DATA CENTERS: 95+ Data Centers; 9M+ Square Feet; 99.999% Uptime Record
      • INTERCONNECTION: 950+ Networks; 110,000+ Cross Connects
      • BUSINESS ECOSYSTEMS: Equinix Marketplace™; 4,000+ Businesses; Revenue Opportunities
  19. Solid. Powerful. Growing.
      • $1.8B in annualized revenue
      • Member of the NASDAQ 100
      • $7B investments in expansion
  20. 15 countries; 5 continents; 31 markets
  21. HOW WE’RE DIFFERENT | GLOBAL FOOTPRINT. Where You Are. Where You Need To Be.
  22. Over 90% of Internet routes pass through Equinix data centers. 950+ network providers.
  23. 450+ Cloud & SaaS providers
  24. Thank you! (CONFIDENTIAL)
      • pkmr.work@gmail.com
      • pkumar@equinix.com
      • www.equinix.com
