Rackspace Analytical Compute Grid (ACG)

Rackspace’s Enterprise Business Intelligence (EBI) group was looking for a cost-effective way to support the reporting and information needs of its internal users, who include business and operations personnel. The group also wanted to scale out new infrastructure to meet increasing business demands, house growing amounts of data, and customize data collection, while moving away from its legacy data warehouse solution. To do this, Rackspace built the Analytical Compute Grid (ACG) using Hadoop, Cassandra, and PostgreSQL on an OpenStack cloud. Read more about it in this presentation.

Transcript

  • 1. Big Data on Open Cloud: Analytical Compute Grid (ACG). Elastic “Big Data” Infrastructure, by Natasha Gajic, March 1, 2013
  • 2. Rackspace’s EBI Environment
    • Current Environment:
      • Windows and Linux operating systems
      • Oracle and Microsoft database solutions
      • Microsoft and Oracle replication technology
      • SSIS
      • Informatica
      • Dedicated servers
    • “Big Data” Problem:
      • Cost of purchasing additional licenses
      • Time required to set up new hardware
      • Increased demand for DBA resources
      • System performance
      • System scalability
      • Capacity
      • Rapid data set growth
    RACKSPACE® HOSTING | WWW.RACKSPACE.COM
  • 3. Analytical Compute Grid (ACG): Features
    • Host ever-growing set of data
    • Quick data collection and retrieval
    • Rapid scalability
    • Ease of maintenance
    • Provide standard data access API
  • 4. Analytical Compute Grid (ACG): Features
    • Ability to provide variety of storage types:
      • Columnar
      • Relational
      • HDFS
    • Enable users to select optimal storage type for information collected
    • Leverage Rackspace® Private Cloud powered by OpenStack® and open source technology
  • 5. Analytical Compute Grid (ACG): Quality Attributes
  • 6. ACG on Rackspace® Private Cloud powered by OpenStack®: High Level Architecture
  • 7. ACG on Rackspace® Private Cloud powered by OpenStack®
  • 8. ACG on Rackspace® Private Cloud powered by OpenStack®: Image
  • 9. ACG on Rackspace® Private Cloud powered by OpenStack®: Database Engine Selection
    • Columnar: Cassandra
    • Relational: PostgreSQL
    • HDFS: Hadoop
  • 10. ACG on Rackspace® Private Cloud powered by OpenStack®: Node
  • 11. ACG on Rackspace® Private Cloud powered by OpenStack®: Node
  • 12. ACG on Rackspace® Private Cloud powered by OpenStack®: Node
  • 13. ACG on Rackspace® Private Cloud powered by OpenStack®: Node
  • 14. ACG on Rackspace® Private Cloud powered by OpenStack®: Controller
  • 15. ACG on Rackspace® Private Cloud powered by OpenStack®: Controller
  • 16. ACG on Rackspace® Private Cloud powered by OpenStack®: Controller
  • 17. ACG on Rackspace® Private Cloud powered by OpenStack®: API
  • 18. ACG on Rackspace® Private Cloud powered by OpenStack®: Indexing Structure
  • 19. ACG on Rackspace® Private Cloud powered by OpenStack®: Indexing Structure
  • 20. ACG on Rackspace® Private Cloud powered by OpenStack®: Indexing Structure
    • What is the ACG Indexing Structure?
      • System entry point
      • Set of pointers ultimately addressing database entities
  • 21. ACG on Rackspace® Private Cloud powered by OpenStack®: Indexing Structure
    • What is the ACG Indexing Structure?
      • System entry point
      • Set of pointers ultimately addressing database entities
    • Where is the Indexing Structure located?
      • It is a part of ACG, so it resides on Open Cloud
      • ACG Controller manages the Indexing Structure
  • 22. ACG on Rackspace® Private Cloud powered by OpenStack®: Indexing Structure
    • What does the ACG Indexing Structure enable?
      • Splitting of large data sets across many instances
      • Query parallelization
      • Controlled data store size
      • Optimal data store configuration
      • Uniform access to data residing in various storage types
      • System scalability, as it expands horizontally and vertically to address an ever-growing data set
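The indexing structure described in slides 20 to 22 can be illustrated with a small sketch. The slides do not show how ACG implements it, so everything below is hypothetical: the class names `DataStore` and `IndexingStructure`, the row-count cap used to model "controlled data store size", and the thread pool used to model query parallelization are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class DataStore:
    """Hypothetical stand-in for the data store on one ACG node."""
    name: str
    max_rows: int
    rows: list = field(default_factory=list)

    def is_full(self):
        return len(self.rows) >= self.max_rows

    def scan(self, predicate):
        # Each store only ever scans its own, size-capped slice of the data set.
        return [row for row in self.rows if predicate(row)]


class IndexingStructure:
    """Hypothetical entry point: maps a data set name to pointers to its stores."""

    def __init__(self, max_rows_per_store):
        self.max_rows_per_store = max_rows_per_store
        self.pointers = {}  # data set name -> list of DataStore objects

    def insert(self, dataset, row):
        stores = self.pointers.setdefault(dataset, [])
        # Controlled data store size: open a new store once the last one is full.
        if not stores or stores[-1].is_full():
            stores.append(DataStore(f"{dataset}-{len(stores)}", self.max_rows_per_store))
        stores[-1].rows.append(row)

    def query(self, dataset, predicate):
        stores = self.pointers.get(dataset, [])
        if not stores:
            return []
        # Query parallelization: every store is scanned concurrently.
        with ThreadPoolExecutor(max_workers=len(stores)) as pool:
            parts = list(pool.map(lambda store: store.scan(predicate), stores))
        return [row for part in parts for row in part]


index = IndexingStructure(max_rows_per_store=3)
for i in range(10):
    index.insert("monitoring", {"id": i, "up": i % 2 == 0})

store_count = len(index.pointers["monitoring"])
up_ids = [row["id"] for row in index.query("monitoring", lambda row: row["up"])]
print(store_count, up_ids)
```

In this toy version the caller talks only to the index, never to an individual store, which mirrors the "system entry point" and "uniform access" points on the slides. A real deployment would point at remote Cassandra, PostgreSQL, or Hadoop instances rather than in-memory lists.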
  • 23. ACG on Rackspace® Private Cloud powered by OpenStack®: Quality Attributes
  • 24. ACG on Rackspace® Private Cloud powered by OpenStack®: Quality Attributes - Performance
    • Rackspace® Private Cloud powered by OpenStack®:
      • Creates ACG node in 30 seconds
      • Creates ACG nodes concurrently
      • Resizes ACG nodes by adding CPUs
  • 25. ACG on Rackspace® Private Cloud powered by OpenStack®: Quality Attributes - Performance
    • Rackspace® Private Cloud powered by OpenStack®:
      • Creates ACG node in 30 seconds
      • Creates ACG nodes concurrently
      • Resizes ACG nodes by adding CPUs
    • ACG: indexing structure and controlled data set size allow for:
      • Quick data distribution
      • Query parallelization
  • 26. ACG on Rackspace® Private Cloud powered by OpenStack®: Quality Attributes – Availability
    • Rackspace® Private Cloud powered by OpenStack®:
      • Rapidly replace failed ACG nodes
  • 27. ACG on Rackspace® Private Cloud powered by OpenStack®: Quality Attributes – Availability
    • Rackspace® Private Cloud powered by OpenStack®:
      • Rapidly replace failed ACG nodes
    • ACG:
      • Deploys data store native availability mechanisms (replication, data distribution…)
  • 28. ACG on Rackspace® Private Cloud powered by OpenStack®: Quality Attributes – Maintainability
    • Rackspace® Private Cloud powered by OpenStack®:
      • Adding ACG nodes expands storage capacity, CPU power, and memory
      • No DBA or system administrator activity required
  • 29. ACG on Rackspace® Private Cloud powered by OpenStack®: Quality Attributes – Maintainability
    • Rackspace® Private Cloud powered by OpenStack®:
      • Adding ACG nodes expands storage capacity, CPU power, and RAM
      • No DBA or system administrator activity required
    • ACG: controlled data set size enables:
      • Optimal and stable data store configuration
      • Reduced demand for managing data store objects
      • Stable query execution plans
  • 30. ACG on Rackspace® Private Cloud powered by OpenStack®: Quality Attributes – Flexibility
    • ACG: variety of storage types:
      • Columnar – Cassandra: time series data
      • Relational – PostgreSQL: relational data
      • HDFS – Hadoop: unstructured data
    • Ability to select optimal storage type for individual use case
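The storage-type choice on slide 30 amounts to a lookup from the kind of data to a storage type and engine. The sketch below encodes only the three pairings stated on the slide; the names `DataShape`, `ENGINE_FOR_SHAPE`, and `select_engine` are hypothetical and are not part of ACG.

```python
from enum import Enum


class DataShape(Enum):
    """Kinds of data named on the slide."""
    TIME_SERIES = "time series"
    RELATIONAL = "relational"
    UNSTRUCTURED = "unstructured"


# Pairings copied from the slide: data shape -> (storage type, engine).
ENGINE_FOR_SHAPE = {
    DataShape.TIME_SERIES: ("columnar", "Cassandra"),
    DataShape.RELATIONAL: ("relational", "PostgreSQL"),
    DataShape.UNSTRUCTURED: ("HDFS", "Hadoop"),
}


def select_engine(shape):
    """Return the (storage type, engine) pair for a given data shape."""
    return ENGINE_FOR_SHAPE[shape]


for shape in DataShape:
    storage_type, engine = select_engine(shape)
    print(f"{shape.value}: {storage_type} ({engine})")
```

How ACG actually lets a user make this selection (for example through the rule engine or the management console) is not described in the transcript, so the lookup here is only a reading aid.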
  • 31. ACG on Rackspace® Private Cloud powered by OpenStack®: Quality Attributes – Usability
    • ACG: standard interfaces:
      • SQL language
      • JDBC API
      • ODBC
    • ACG Management Console
    • ACG Monitoring Console
    • Loader utility implementing:
      • Bulk Loader
      • Insert Loader
  • 32. ACG on Rackspace® Private Cloud powered by OpenStack®: Current State
  • 33. ACG on Rackspace® Private Cloud powered by OpenStack®: Current State
    • ACG Controller:
      • ACG Manager
      • Rule Engine
      • Node Manager
      • ACG Management Console
      • ACG Monitoring
    • Columnar Implementation:
      • Data Store Controller
      • JDBC extended to work with supercolumn
      • Loader integrated with Informatica
    • Relational Implementation:
      • Data Store Controller
      • JDBC driver extended with distributed query rewrite
      • Loader integrated with Informatica
    • HDFS Implementation:
      • Will start soon
    • ODBC (In Progress)
  • 34. ACG on Rackspace® Private Cloud powered by OpenStack®: Rackspace Use Case
  • 35. ACG on Rackspace® Private Cloud powered by OpenStack®: Rackspace Use Case
    • Subject:
      • Complex availability calculation sourcing 3 months of monitoring data and creating 1 billion records in the initial calculation
  • 36. ACG on Rackspace® Private Cloud powered by OpenStack®: Rackspace Use Case
    • Environment 1:
      • Data warehouse on a Microsoft SQL Server database
      • SSIS data loading
      • SQL Server with 24 CPUs and 250 GB RAM was dedicated to the initial calculation
      • SQL Server stored procedure performed the calculation
      • Source and result are stored in a traditional data warehouse structure
  • 37. ACG on Rackspace® Private Cloud powered by OpenStack®: Rackspace Use Case
    • Environment 2:
      • ACG running two Cassandra clusters, 4 nodes each
      • Informatica with Cassandra bulk loader
      • Each ACG node has 2 CPUs and 8 GB RAM
      • Java program running on an instance with 4 CPUs and 8 GB RAM
      • Source and result are stored in a columnar structure suitable for time series data
  • 38. ACG on Rackspace® Private Cloud powered by OpenStack®: Rackspace Use Case - Result
    • Calculation duration:
      • Microsoft SQL Server calculation lasted 5 days
      • ACG calculation completed in 3.5 hours
    • Storage size:
      • Microsoft SQL Server: 500 GB
      • ACG: 20 GB
    • Complexity of the calculation:
      • A columnar data store is optimal for time series data. Sourcing from the columnar data store resulted in a relatively simple Java calculation process compared to the SQL Server stored procedure
  • 39. ACG on Rackspace® Private Cloud powered by OpenStack®: Rackspace Use Case - Conclusion
    • Selecting the optimal data store for the use case resulted in:
      • Substantial performance improvement
      • Reduced storage demand
      • Simplified processes
      • Ability to process terabytes of data per day, close to real-time and on-demand
      • Improved trending and reporting:
        • Enhanced support capabilities
        • Improved Rackspace customer experience
      • Significant cost reduction
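The figures on the use case slides can be turned into rough ratios. The short sketch below only recomputes numbers stated on slides 36 to 38. Two points are assumptions made here for illustration: "5 days" is read as 5 × 24 wall-clock hours, and the instance running the Java program is counted as part of the ACG environment.

```python
# Durations from the "Result" slide.
HOURS_PER_DAY = 24
sql_server_hours = 5 * HOURS_PER_DAY  # assumption: "5 days" means five full days
acg_hours = 3.5
speedup = sql_server_hours / acg_hours

# Storage sizes from the "Result" slide, in GB.
sql_server_storage_gb = 500
acg_storage_gb = 20
storage_ratio = sql_server_storage_gb / acg_storage_gb

# Environment 2 resources: two Cassandra clusters of 4 nodes each,
# plus one instance for the Java program (counted here by assumption).
clusters, nodes_per_cluster = 2, 4
cpus_per_node, ram_gb_per_node = 2, 8
java_cpus, java_ram_gb = 4, 8
acg_total_cpus = clusters * nodes_per_cluster * cpus_per_node + java_cpus
acg_total_ram_gb = clusters * nodes_per_cluster * ram_gb_per_node + java_ram_gb

print(f"Speedup: about {speedup:.1f}x")
print(f"Storage reduction: {storage_ratio:.0f}x")
print(f"ACG resources: {acg_total_cpus} CPUs, {acg_total_ram_gb} GB RAM")
```

Under these assumptions the ACG calculation ran roughly 34 times faster and needed 25 times less storage, using 20 CPUs and 72 GB of RAM in total against the 24 CPUs and 250 GB of RAM dedicated to SQL Server.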
  • 40. RACKSPACE® HOSTING | 5000 WALZEM ROAD | SAN ANTONIO, TX 78218 | US SALES: 1-800-961-2888 | US SUPPORT: 1-800-961-4454 | WWW.RACKSPACE.COM | © RACKSPACE US, INC. | RACKSPACE® AND FANATICAL SUPPORT® ARE SERVICE MARKS OF RACKSPACE US, INC. REGISTERED IN THE UNITED STATES AND OTHER COUNTRIES.
  • 41. ACG UI
  • 42. ACG UI
  • 43. ACG UI