Big Data in the Cloud
1. Big Data in the Cloud
2. S&P Capital IQ
S&P Capital IQ combines two of our strongest brands - S&P, with its long history and experience in the financial markets, and Capital IQ, which is known among professionals globally for its comprehensive company and financial information and powerful analytical tools.
3. Agenda
• Creation of the Excel Plug-in with global data, global sales, and US-based servers
• High-performance data gets for big historical time series data
• Q&A
4. S&P Capital IQ Excel Plug-in
• The Excel Plug-in provides thousands of data points on demand
• Allows customers anywhere in the world to use our data assets on their desktops on demand
• It needs to be a fast user experience everywhere in the world
5. Global Customers - US Data Center (average response time in milliseconds)

From London to New Jersey: 400
From New York to New Jersey: 30
From Melbourne to New Jersey: 800

Response times are rounded.
6. Global Customers - Global Data Centers (average response time in milliseconds)

From London to Ireland: 400 down to 40
From New York to New Jersey: 30 (unchanged)
From Melbourne to Singapore: 800 down to 60

Response times are rounded.
7. Cloud Architecture
[Architecture diagram: regional cloud endpoints and the New Jersey DC, all connected over HTTPS]
8. How do we make it even faster?
[Diagram: Smart Cache, pre-sent data, and Router]
• Move the data the customer uses the most to their desktop.
• Automatically get the data for the customer.
• Learn to send the right data to the customer.
9. Smart Cache
[Diagram: Smart Cache and Router with numbered steps]
1. The user opts into Smart Cache.
2. The system pre-sends a data package to the customer.
3. The user makes a request for data:
   a. Smart Cache checks locally first.
   b. If the data is not local, it grabs it from the cloud.
4. Smart Cache sends usage logs back.
5. The pre-sent data package is altered for the customer.
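Step 3 above is a local-first lookup with a cloud fallback. Below is a minimal Java sketch of that flow, assuming a simple key-value package on the desktop; the class and method names are hypothetical illustrations, and the cloud call is a stand-in for the HTTPS request to the nearest regional data center, not the actual plug-in code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of Smart Cache steps 3a/3b and 4: serve from the
// pre-sent package when possible, otherwise fall back to the cloud, and
// log every request so the package can be re-learned later.
public class SmartCacheClient {

    // Pre-sent data package held on the customer's desktop (step 2).
    private final Map<String, String> localPackage = new ConcurrentHashMap<>();

    // Stand-in for the HTTPS call to the nearest regional data center.
    private String fetchFromCloud(String key) {
        return "value-from-cloud:" + key;
    }

    // Stand-in for the usage log that is shipped back to the router (step 4).
    private void logUsage(String key) {
        System.out.println("used " + key);
    }

    public String get(String securityId, String mnemonic) {
        String key = securityId + ":" + mnemonic;
        logUsage(key);
        // Step 3a: check locally first.
        String cached = localPackage.get(key);
        if (cached != null) {
            return cached;
        }
        // Step 3b: not local, so grab the data from the cloud and keep it.
        String fetched = fetchFromCloud(key);
        localPackage.put(key, fetched);
        return fetched;
    }

    public static void main(String[] args) {
        SmartCacheClient cache = new SmartCacheClient();
        System.out.println(cache.get("IBM", "CLOSEPRICE")); // cloud on the first request
        System.out.println(cache.get("IBM", "CLOSEPRICE")); // local on the repeat request
    }
}
```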
10. Smart Caching Data
[Diagram: Smart Cache, Router, and the log-processing flow]
1. Collect logs from Smart Cache.
2. Collect and decrypt cloud and local usage logs.
3. Apply the logs to Mahout.
4. Use the customer profile.
5. Mahout produces an update suggestion list.
6. A customer-specific package is created.
7. The prepared package is ready for pickup by Smart Cache.
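Steps 2 through 5 amount to a collaborative-filtering pass over the usage logs. The sketch below uses Mahout's Taste recommender API to turn aggregated usage into a per-customer suggestion list; the CSV layout (customerId,dataAssetId,usageCount), the file name, and the choice of log-likelihood similarity are assumptions made for illustration, not details taken from the slides.

```java
import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;

// Hypothetical sketch of steps 3-5: apply decrypted usage logs to Mahout and
// produce an update suggestion list for one customer.
public class UpdateSuggestionJob {
    public static void main(String[] args) throws Exception {
        // usage.csv (assumed layout): customerId,dataAssetId,usageCount
        DataModel usage = new FileDataModel(new File("usage.csv"));

        // Log-likelihood similarity works from co-occurrence, so raw usage counts suffice.
        ItemSimilarity similarity = new LogLikelihoodSimilarity(usage);
        GenericItemBasedRecommender recommender =
                new GenericItemBasedRecommender(usage, similarity);

        long customerId = 12345L; // illustrative customer profile (step 4)
        // Step 5: the top data assets Mahout suggests adding to this customer's package.
        List<RecommendedItem> suggestions = recommender.recommend(customerId, 10);
        for (RecommendedItem item : suggestions) {
            System.out.printf("asset %d, score %.3f%n", item.getItemID(), item.getValue());
        }
    }
}
```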
11. Smart Caching Data - Lessons Learned
• The algorithm works much like a shopping website's matching engine.
• It differs in that the customer never sees the recommendations; they just get a faster experience.
• All data sets are used for learning, but only large data sets are custom-packaged for delivery.
• Sometimes it is easier to send the entire package when the data set is small enough and used by the customer.
• Don't expect success on day 1 or day 30; the longer it learns, the more accurate it should become.
• It is not a replacement for simple logic.
• The algorithm requires constant feeding and attention.
• There are cases where you cannot learn about your users, such as when they share IDs.
12. High Performance Data Gets
13. High Performance Data Gets
• Some data assets, due to their size, are still routed back to the US.
• Big data sets: roughly 10 TB of time series data.
• As those data assets became more popular, we needed to move the right data to the cloud.
• We cannot synchronize the data, so fast loads are required.
• Get times must be in single-digit milliseconds.
14. High Performance Data Gets
• Use Hadoop to learn which large data assets are used the most (a sketch follows this list).
• Move the subset of data identified as the most used to the cloud.
• Fast loading of millions of records.
• Allow single-digit-millisecond data retrieval times.
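The "learn which assets are used the most" step maps naturally onto a word-count-style MapReduce job over request logs. The sketch below is one plausible shape for that job, not the production code: it assumes tab-separated log lines with the data-asset ID in the second column, which is an invented layout.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical sketch: count requests per data asset so the most-used assets
// can be identified and moved to the cloud.
public class AssetUsageCount {

    public static class AssetMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text assetId = new Text();

        @Override
        protected void map(LongWritable key, Text line, Context context)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split("\t");
            if (fields.length > 1) {
                assetId.set(fields[1]);      // assumed data-asset ID column
                context.write(assetId, ONE); // emit one hit per request
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text assetId, Iterable<IntWritable> hits, Context context)
                throws IOException, InterruptedException {
            int total = 0;
            for (IntWritable hit : hits) {
                total += hit.get();
            }
            // Downstream, the highest totals identify the subset worth moving to the cloud.
            context.write(assetId, new IntWritable(total));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "asset usage count");
        job.setJarByClass(AssetUsageCount.class);
        job.setMapperClass(AssetMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```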
15. High Performance Data Gets

Cassandra (http://cassandra.apache.org/)
Apache Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store. Cassandra brings together the distributed systems technologies from Dynamo and the data model from Google's Bigtable. Like Dynamo, Cassandra is eventually consistent. Like Bigtable, Cassandra provides a ColumnFamily-based data model richer than typical key/value systems. Cassandra was open sourced by Facebook in 2008, where it was designed by Avinash Lakshman (one of the authors of Amazon's Dynamo) and Prashant Malik (Facebook engineer). In a lot of ways you can think of Cassandra as Dynamo 2.0, or a marriage of Dynamo and Bigtable. Cassandra is in production use at Facebook but is still under heavy development. We tried to do a similar POC using Cassandra with a smaller subset of data because of the hardware restrictions mentioned above. Unlike HBase and an RDBMS, there is no concept of a table; instead there are columns, column families, and keyspaces.

HBase (http://hbase.apache.org/)
HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable ("Bigtable: A Distributed Storage System for Structured Data" by Chang et al.). Just as Bigtable leverages the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop and HDFS. HBase is similar to an RDBMS in that it has the concept of tables; however, columns in HBase tables are not fixed in number or data type and can hold any data type, which can vary from one row to the next.
16. High Performance Data Gets

              Cassandra           HBase            Oracle
Data Get      400 microseconds    1 millisecond    5 seconds
Data Load     10 minutes          10 minutes       10 minutes

• Data Get - time to pull 1 security and 1 data point (a timing sketch follows below)
• Data Load - time taken to load 6 million securities
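To make the "Data Get" row concrete, here is a sketch of timing a single-security, single-data-point read against Cassandra. It uses the present-day DataStax Java driver and CQL; the original work predates CQL (the slide above describes columns, column families, and keyspaces), so the keyspace, table schema, and query are an illustration of the access pattern, not the code behind these numbers.

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

// Illustrative timing of a "1 security, 1 data point" get against Cassandra.
// Keyspace/table names and the sample row are invented for the example.
public class TimedDataGet {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {

            session.execute("CREATE KEYSPACE IF NOT EXISTS ts WITH replication = "
                    + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
            // One partition per security; points ordered by date within the partition.
            session.execute("CREATE TABLE IF NOT EXISTS ts.prices ("
                    + "security_id text, as_of_date date, close_price decimal, "
                    + "PRIMARY KEY (security_id, as_of_date))");
            session.execute("INSERT INTO ts.prices (security_id, as_of_date, close_price) "
                    + "VALUES ('IBM', '2013-01-02', 195.27)");

            long start = System.nanoTime();
            ResultSet rs = session.execute(
                    "SELECT close_price FROM ts.prices "
                    + "WHERE security_id = 'IBM' AND as_of_date = '2013-01-02'");
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("row: " + rs.one() + ", get took " + micros + " microseconds");
        }
    }
}
```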
17. High Performance Data Gets
• Virtual Oracle instances did not meet our performance needs.
• The Amazon EMR capacity needed for HBase was not cost-effective for data gets.
• HBase is difficult to implement in AWS due to the hardware requirements of Hadoop.
• Cassandra can be segmented logically for big data assets with minimal to no performance degradation in AWS.
18. Questions?