Big Data Learnings from a Vendor's Perspective

Best practices in building and operating 24x7 Internet-scale services.

Presentation Transcript

  • Big Data Learnings from a Vendor's Perspective
    Srini V. Srinivasan
    Toronto Big Data Week, April 23, 2013
  • Database Landscape
    ➤ Structured data
      Transactions (OLTP): response time in seconds; gigabytes of data; balanced reads/writes
      Analytics (OLAP): response time in seconds; terabytes of data; read intensive
    ➤ Unstructured data
      Real-time big data (real-time transactions): response time < 10 ms; 1-20 TB; balanced reads/writes; 24x7x365 availability
      Big data analytics: response time in hours to weeks; TB to PB; read intensive
    © 2013 Aerospike. All rights reserved. Confidential
  • Requirements for Internet Enterprises
    1. Know who the interaction is with: monitor 200+ million US consumers and 5+ billion mobile devices and sensors
    2. Determine intent based on current context: page views, search terms, game state, last purchase, friends list, ads served, location
    3. Respond now, and use big data for more accurate decisions: display the most relevant ad, recommend the best product, deliver the richest gaming experience, eliminate fraud…
    4. Service can NEVER go down!
  • Challenges
    1. Handle extremely high rates of persistent read/write transactions
    2. Avoid hot spots to maintain tight latency SLAs
    3. Provide immediate consistency with replication
    4. Allow long-running tasks to run alongside transactions
    5. Scale linearly as data sizes increase
    6. Add capacity with no service interruption
  • Native Flash Performance
    ➤ Low latency at high throughput
  • "Only Aerospike was able to function in synchronous mode with a replication factor of two… it is a significant advantage that Aerospike is able to function reliably on a smaller amount of hardware while still maintaining true consistency."
  • Shared-Nothing Architecture (diagram: Ohio data center)
    ➤ Every node in a cluster is identical and handles both transactions and long-running tasks
    ➤ Data is replicated synchronously, with immediate consistency, within the cluster
    ➤ Data is replicated asynchronously across data centers
  • Distributed Hash Table: How Data Is Distributed (Replication Factor 2)
    ➤ Every key is hashed into a 20-byte (fixed-length) string using the RIPEMD-160 hash function
    ➤ This hash plus additional data (fixed 64 bytes) is stored in RAM in the index
    ➤ Some bits of this hash value are used to compute the partition ID
    ➤ There are 4096 partitions
    ➤ The partition ID maps to a node ID based on cluster membership
    Example: cookie-abcdefg-12345678 → 182023kh15hh3kahdjsh → partition ID → master and replica nodes

    Partition ID | Master node | Replica node
    …            | 1           | 4
    1820         | 2           | 3
    1821         | 3           | 2
    4096         | 4           | 1
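The key-to-partition steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not Aerospike's code. The slide does not say which bits form the partition ID, so the sketch assumes the low 12 bits of the first two digest bytes (4096 = 2^12) and numbers partitions 0-4095. It also falls back to SHA-1 (also 20 bytes) when the local OpenSSL build does not expose RIPEMD-160. The hypothetical helper `index_ram_bytes` applies the slide's fixed 64-byte index entry to estimate cluster-wide index RAM.

```python
import hashlib

N_PARTITIONS = 4096        # from the slide; 4096 == 2**12
INDEX_ENTRY_BYTES = 64     # fixed-size in-RAM index entry, from the slide


def key_digest(key: str) -> bytes:
    """Hash a key into a fixed 20-byte digest.

    The slide names RIPEMD-160. Some OpenSSL builds no longer expose it via
    hashlib, so this sketch falls back to SHA-1 (also 20 bytes) in that case.
    """
    data = key.encode("utf-8")
    try:
        return hashlib.new("ripemd160", data).digest()
    except ValueError:
        return hashlib.sha1(data).digest()


def partition_id(digest: bytes) -> int:
    """Assumption: the low 12 bits of the first two digest bytes (0-based IDs)."""
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)


def index_ram_bytes(n_keys: int, replication_factor: int = 2) -> int:
    """Cluster-wide primary-index RAM: one 64-byte entry per stored copy."""
    return n_keys * INDEX_ENTRY_BYTES * replication_factor


digest = key_digest("cookie-abcdefg-12345678")
print(len(digest), partition_id(digest))   # 20, then an ID in [0, 4095]
print(index_ram_bytes(1_000_000_000))      # 128000000000 bytes, about 128 GB
```

Because the index entry has a fixed size, index memory grows linearly with key count and replication factor, whatever the size of the records themselves.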
  • Organizing the Cluster
    ➤ Automatic multicast gossip protocol for node discovery
    ➤ Paxos consensus algorithm determines the nodes in the cluster
    ➤ An ordered list of nodes determines data location
    ➤ Data partitions are balanced for minimal data motion
    ➤ A vote is initiated and terminated in 100 milliseconds
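The slide does not spell out how the ordered node list is built. Rendezvous hashing (highest-random-weight hashing) has the two properties listed above, balanced partitions and minimal data motion, so the sketch below uses it as a stand-in rather than as Aerospike's actual algorithm. Each partition ranks all nodes by a hash of (node, partition). The first node is the master and the second is the replica. When a node joins, the only partitions that change master are those that now rank the new node first. Node names and helper names are hypothetical.

```python
import hashlib

N_PARTITIONS = 4096


def _weight(node: str, pid: int) -> int:
    """Deterministic pseudo-random weight for a (node, partition) pair."""
    h = hashlib.sha1(f"{node}:{pid}".encode("utf-8")).digest()
    return int.from_bytes(h[:8], "big")


def succession_list(nodes, pid):
    """Nodes ordered for one partition: index 0 is master, index 1 is replica."""
    return sorted(nodes, key=lambda n: _weight(n, pid), reverse=True)


def partition_map(nodes, replication_factor=2):
    """Map every partition ID to its [master, replica, ...] nodes."""
    return {pid: succession_list(nodes, pid)[:replication_factor]
            for pid in range(N_PARTITIONS)}


def masters_moved(old, new):
    """Count partitions whose master node changed between two maps."""
    return sum(1 for pid in old if old[pid][0] != new[pid][0])


before = partition_map(["A", "B", "C"])
after = partition_map(["A", "B", "C", "D"])   # node D joins the cluster
print(masters_moved(before, after))           # roughly N_PARTITIONS / 4
```

Every node computes the same map from the same membership list, so once the Paxos vote agrees on membership, no partition table has to be broadcast.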
  • How it Works
    Writing with Immediate Consistency
    1. The write is sent to the row master
    2. Latch against simultaneous writes
    3. Apply the write to master memory and replica memory synchronously
    4. Queue operations to disk
    5. Signal the completed transaction (optional storage commit wait)
    6. The master applies the conflict resolution policy (rollback/roll-forward)
    Adding a Node (transactions continue throughout)
    1. The cluster discovers the new node via the gossip protocol
    2. A Paxos vote determines the new data organization
    3. Partition migrations are scheduled
    4. When a partition migration starts, a write journal starts on the destination
    5. The partition moves atomically
    6. The journal is applied and the source data is deleted
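The write path can be shown with a toy model of the six steps. It assumes a single partition with one master and one replica, each held as a Python dict. The class and method names are hypothetical, replica failure is simulated with a flag, and step 6 is reduced to a simple rollback policy. Disk writes are only queued (step 4), so the write is acknowledged without waiting for storage, which corresponds to leaving the "optional storage commit wait" off.

```python
import threading
from collections import deque

_MISSING = object()   # sentinel: the key was absent before the write


class Node:
    """One server's in-memory store plus a queue of writes awaiting flush."""
    def __init__(self, name):
        self.name = name
        self.memory = {}
        self.disk_queue = deque()   # step 4: flushed to disk asynchronously
        self.up = True


class PartitionWriter:
    """Hypothetical model of the write path for one partition (master + replica)."""
    def __init__(self, master, replica):
        self.master, self.replica = master, replica
        self._latches = {}
        self._guard = threading.Lock()

    def _latch(self, key):
        # step 2: one latch per key serializes simultaneous writes to a row
        with self._guard:
            return self._latches.setdefault(key, threading.Lock())

    def write(self, key, value):
        """Step 1: the client sends the write here, to the row's master."""
        with self._latch(key):
            previous = self.master.memory.get(key, _MISSING)
            self.master.memory[key] = value            # step 3: master memory
            if not self.replica.up:
                # step 6, rollback policy: undo the master write so master
                # and replica never disagree, then report failure
                if previous is _MISSING:
                    del self.master.memory[key]
                else:
                    self.master.memory[key] = previous
                return False
            self.replica.memory[key] = value           # step 3: replica memory
            for node in (self.master, self.replica):
                node.disk_queue.append((key, value))   # step 4: queue to disk
            return True                                # step 5: signal completion


writer = PartitionWriter(Node("A"), Node("B"))
print(writer.write("user:1", {"visits": 3}))   # True
```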
  • Intelligent Client: Shields Applications from the Complexity of the Cluster
    ➤ Implements the Aerospike API
    ➤ Optimistic row locking
    ➤ Optimized binary protocol
    ➤ Cluster tracking: learns about cluster changes and the partition map; gossip protocol
    ➤ Transaction semantics: global transaction ID; retransmit and timeout
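A client that "knows where the data is" can be sketched as follows. This is a hypothetical illustration, not the Aerospike client API. It hashes the key to a partition (SHA-1 stand-in, low 12 bits, as assumed earlier), reads the master from its cached partition map, and retransmits on timeout using the same transaction ID. A per-client counter stands in for the global transaction ID. The `send` callable stands in for the binary protocol.

```python
import hashlib
import itertools

N_PARTITIONS = 4096


def partition_of(key: str) -> int:
    """Assumed key-to-partition rule (SHA-1 stand-in, low 12 bits)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)


class SmartClient:
    """Hypothetical client that routes each request directly to the partition master."""

    def __init__(self, partition_map, send, max_retries=2):
        self.partition_map = partition_map  # pid -> [master, replica]
        self.send = send                    # send(node, txn_id, key); may raise TimeoutError
        self.max_retries = max_retries
        self._txn_ids = itertools.count(1)  # stand-in for a global transaction ID

    def on_cluster_change(self, new_map):
        """Cluster tracking: adopt the partition map learned from the cluster."""
        self.partition_map = new_map

    def get(self, key):
        txn_id = next(self._txn_ids)        # the same ID is reused on every retransmit
        pid = partition_of(key)
        for attempt in range(self.max_retries + 1):
            master = self.partition_map[pid][0]   # re-read: the map may have changed
            try:
                return self.send(master, txn_id, key)
            except TimeoutError:
                if attempt == self.max_retries:
                    raise


pmap = {pid: ["A", "B"] for pid in range(N_PARTITIONS)}
client = SmartClient(pmap, send=lambda node, txn, key: f"{key} from {node}")
print(client.get("user:42"))   # user:42 from A
```

Reusing the transaction ID lets a server recognize a retransmitted request instead of treating it as a new one.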
  • Cross Data Center Replication (XDR)
    ➤ Asynchronous replication tolerates long link delays and outages
    ➤ A namespace is configured to replicate to a destination cluster: master/slave, including star and ring topologies
    ➤ Replication process:
      Transaction journal on the partition master and replica
      The XDR process writes batches to the destination
      Transmission state is shared with the source replica
      Retransmission in case of a network fault
      When data arrives back at the originating cluster, transaction ID matching prevents subsequent application and forwarding
    ➤ In master/master replication, conflicts are resolved via multiple versions or timestamps
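The loop-prevention and timestamp rules can be sketched in a few lines. This is a simplified, hypothetical model, not XDR itself. Records carry a globally unique transaction ID. A cluster that has already applied an ID neither applies nor forwards it again, which stops a write from circling a ring forever. Master/master conflicts are resolved by last-write-wins on the timestamp, with the transaction ID as a tie-breaker (an added assumption). Shipping is modeled as a synchronous call, whereas the slide describes asynchronous batches.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    value: object
    timestamp: float   # write time, used for last-write-wins
    txn_id: str        # globally unique: "<origin cluster>:<sequence>"


@dataclass
class Cluster:
    name: str
    data: dict = field(default_factory=dict)
    applied: set = field(default_factory=set)          # transaction IDs already seen
    destinations: list = field(default_factory=list)   # XDR targets

    def local_write(self, key, value, timestamp, seq):
        self.receive(key, Record(value, timestamp, f"{self.name}:{seq}"))

    def receive(self, key, rec):
        if rec.txn_id in self.applied:
            return   # came back around: do not apply or forward again
        self.applied.add(rec.txn_id)
        current = self.data.get(key)
        # last-write-wins on timestamp; txn_id breaks ties deterministically
        if current is None or (rec.timestamp, rec.txn_id) > (current.timestamp, current.txn_id):
            self.data[key] = rec
        for dest in self.destinations:
            dest.receive(key, rec)   # real XDR ships batches asynchronously


a, b = Cluster("A"), Cluster("B")
a.destinations.append(b)
b.destinations.append(a)                 # master/master pair
a.local_write("k", "from A", timestamp=1.0, seq=1)
b.local_write("k", "from B", timestamp=2.0, seq=1)
print(a.data["k"].value, b.data["k"].value)   # from B from B
```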
  • Russ's 10 Ingredient Recipe for Making 1 Million TPS on $5K Hardware
    Multi-core optimization
    ➤ Right architecture: shared nothing; in-memory (or multiple SSDs); tight code loop; lock-free isolation
    ➤ OS, programming language, libraries: modern Linux kernel; C language; use epoll
    ➤ Tweaks: pin threads to processor cores; IRQ affinity settings for the NIC; CPU socket isolation by pairing each CPU with a NIC
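The "pin threads to processor cores" ingredient is sketched below in Python through `os.sched_setaffinity`, which is Linux-only. On Linux, pid 0 refers to the calling thread, so each worker pins only itself. The recipe itself is written in C, where the equivalent calls are `sched_setaffinity` or `pthread_setaffinity_np`. Python's GIL prevents threads from running bytecode in parallel, so this sketch shows only the affinity mechanics and is not a throughput demonstration. The helper names are hypothetical.

```python
import os
import threading


def pin_current_thread(cpu: int) -> bool:
    """Pin the calling thread to one CPU; return True if the pin took effect.

    Linux only: os.sched_setaffinity is unavailable on macOS and Windows.
    On Linux, pid 0 refers to the calling thread.
    """
    if not hasattr(os, "sched_setaffinity"):
        return False
    try:
        os.sched_setaffinity(0, {cpu})
    except OSError:
        return False
    return os.sched_getaffinity(0) == {cpu}


def run_pinned_workers(work):
    """Start one thread per CPU this process may use, each pinned to its own core."""
    if hasattr(os, "sched_getaffinity"):
        cpus = sorted(os.sched_getaffinity(0))
    else:
        cpus = [0]
    results = {}

    def worker(cpu):
        pinned = pin_current_thread(cpu)
        results[cpu] = (pinned, work(cpu))

    threads = [threading.Thread(target=worker, args=(cpu,)) for cpu in cpus]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_pinned_workers(lambda cpu: cpu * 2))   # {cpu: (pinned?, result), ...}
```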
  • Flash-optimized Storage Layer
    ➤ Direct device access
      Direct-attach performance
      Data written in flash-optimal, large-block patterns
      All indexes in RAM for low wear
      Constant background defragmentation
      Log-structured file system, "copy on write"
      Clean restart through shared memory
    ➤ Random distribution using a hash does not require RAID hardware
    ➤ SSD performance varies widely
      Aerospike has a certified hardware list
      A free SSD certification tool, CIO, is also available
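Several items on this slide fit together as one design: an in-RAM index, copy-on-write appends, large sequential block writes, and background defragmentation. The toy log-structured store below shows how they interact. It is a sketch, not Aerospike's storage engine. The block size (4 records) and the defragmentation threshold (50% live data) are hypothetical values chosen to keep the example small, and real blocks would be much larger. An update never overwrites in place. It appends a new version and repoints the index, and defragmentation copies the still-live records out of sparse blocks so those blocks can be freed.

```python
BLOCK_RECORDS = 4        # hypothetical: records per write block (real blocks are large)
DEFRAG_THRESHOLD = 0.5   # hypothetical: defragment blocks at or below 50% live data


class LogStore:
    """Toy log-structured store: an in-RAM index over append-only blocks."""

    def __init__(self):
        self.blocks = {}   # block_id -> tuple of (key, value); never modified
        self.next_id = 0   # ID the current write buffer will receive
        self.buffer = []   # records waiting to be written as one large block
        self.index = {}    # key -> (block_id, slot), kept entirely in RAM

    def put(self, key, value):
        # copy on write: append a new version; the old slot becomes garbage
        self.index[key] = (self.next_id, len(self.buffer))
        self.buffer.append((key, value))
        if len(self.buffer) == BLOCK_RECORDS:
            self.flush()

    def flush(self):
        if self.buffer:
            self.blocks[self.next_id] = tuple(self.buffer)   # one sequential write
            self.next_id += 1
            self.buffer = []

    def get(self, key):
        block_id, slot = self.index[key]                     # RAM lookup, then one read
        block = self.buffer if block_id == self.next_id else self.blocks[block_id]
        return block[slot][1]

    def _is_live(self, block_id, slot, key):
        return self.index.get(key) == (block_id, slot)

    def live_fraction(self, block_id):
        block = self.blocks[block_id]
        live = sum(self._is_live(block_id, s, k) for s, (k, _) in enumerate(block))
        return live / len(block)

    def defrag(self):
        """Rewrite live records out of sparse blocks, then free those blocks."""
        sparse = [b for b in self.blocks if self.live_fraction(b) <= DEFRAG_THRESHOLD]
        for block_id in sparse:
            block = self.blocks.pop(block_id)
            for slot, (key, value) in enumerate(block):
                if self._is_live(block_id, slot, key):
                    self.put(key, value)
```

Keeping the whole index in RAM means every read is a memory lookup followed by a single device read, and the device only ever receives large sequential writes.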
  • Native Flash: 17x Better TCO
    "…data-in-DRAM implementations like SAP HANA… should be bypassed… current leading data-in-flash database for transactional analytic apps is Aerospike." – David Floyer, CTO, Wikibon
    http://wikibon.org/wiki/v/Data_in_DRAM_is_a_Flash_in_the_Pan
  • Case studies
  • Proven in Production
    ➤ AppNexus: #2 RTB after Google
      27 billion auctions per day
      600+ QPS
      Aerospike servers in 6 clusters in 3 data centers
    ➤ Chango: #2 search after Google
      Sees more searches than Yahoo! + Bing
      Data on 300 million users
    ➤ TradeDesk: first ad exchange Facebook Exchange (FBX) partner
      FBX serves 25% of ads on the Internet
      1200% growth in 2012
    "Aerospike has operated without interruptions and easily scaled to meet our performance demands." – Mike Nolet, CTO, AppNexus
  • Proven in Production
    ➤ eXelate: data on 500 million users
      Online data plus Nielsen, Mastercard, Autobytel, Bizo data…
      Data on 400 million users
      20 billion transactions per month
      4x2 TB data per cluster
      4 clusters across 4 data centers
      "Scale. Real-time performance. Real-time replication at 4 data centers. Aerospike delivered." – Elad Efraim, eXelate CTO
    ➤ BlueKai: serves half the Fortune 30
      #1 data exchange
      2 trillion transactions per month
  • Mission
    ➤ Build the Modern Real-time Data Platform
      1. Scaling the Internet of Everything
      2. Pushing the limits of modern hardware
      3. No data loss and no downtime
    Aerospike Real-time Data Platform (diagram)
      Publish & Subscribe; ASQL & NoSQL; Powerful Aggregations (MapReduce++); Secondary Index Queries; Transactions; User Defined Functions (UDF); Security; Encryption; Compression
      Distribution: shared nothing, ACID, scale-out, multiple data centers
      Data types: Int, Str, Blob, List, Map, Large Stack, Large Set, Large List
      Storage: DRAM, SSD, HDD
  • Aerospike Real-time Big Data Platform: Purpose Built, Proven in Production
    ➤ In-DRAM + flash: indexes always in DRAM; data in DRAM + flash
    ➤ ACID + tunable consistency: never loses data (synchronous replication, cross data center synchronization)
    ➤ Vertical + horizontal scaling: multi-core, multi-processor; shared nothing, elastic, transparent sharding; clients know exactly where the data is
    ➤ Predictable high performance: 99.9% < 2-3 ms at 500K TPS
    ➤ ACID, zero downtime in 3 years: tunable consistency; no SPoF, self-managing clusters; cross data center replication
    ➤ 17x better TCO: fewer servers, less power, easier maintenance
  • Aerospike Real-time Big Data Platform: Rapid Development, Complete Customizability
    ➤ Support for popular languages and tools: ASQL and Aerospike clients in Java, C#, Ruby, Python…
    ➤ Complex data types: nested documents (map, list, string, integer); large (stack, set, list) objects
    ➤ Queries: single record; batch multi-record lookups; equality and range; aggregations and MapReduce
    ➤ User Defined Functions (UDFs): in-DB processing
    ➤ Aggregation framework: UDF pipeline; MapReduce++
    ➤ Time series queries: just 2 IOPs for most reads/writes, independent of object size
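The slide names a UDF pipeline for aggregations (MapReduce++) but shows no API. The sketch below is a generic, hypothetical illustration of the idea in Python and does not use Aerospike's UDF interface. Each node streams its own records through filter and map stages and reduces them locally ("in-DB processing"), so only small partial results travel to the client, which combines them. Reusing the same reducer for the final combine assumes that the reducer is associative and that `initial` is its identity, which holds for the sum used here.

```python
from functools import reduce


def run_stages(records, stages):
    """Lazily apply ("filter", fn) and ("map", fn) stages to a record stream."""
    stream = iter(records)
    for kind, fn in stages:
        if kind == "filter":
            stream = filter(fn, stream)
        elif kind == "map":
            stream = map(fn, stream)
        else:
            raise ValueError(f"unknown stage: {kind!r}")
    return stream


def aggregate(records_per_node, stages, reducer, initial):
    # Each node reduces its own records (in-DB processing)...
    partials = [reduce(reducer, run_stages(recs, stages), initial)
                for recs in records_per_node]
    # ...and the client combines the small partial results.
    return reduce(reducer, partials, initial)


nodes = [
    [{"age": 25, "spend": 10}, {"age": 40, "spend": 5}],
    [{"age": 31, "spend": 7}, {"age": 19, "spend": 3}],
]
stages = [("filter", lambda r: r["age"] >= 21), ("map", lambda r: r["spend"])]
print(aggregate(nodes, stages, lambda a, b: a + b, 0))   # 22
```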
  • How to Get Aerospike?
    ➤ Free Community Edition: for developers looking for speed and stability who want to scale transparently as they grow
      All features for 2 nodes, 100 GB
      1 cluster, 1 data center
      Community support
    ➤ Enterprise Edition: for mission-critical apps needing to scale right from the start
      Unlimited number of nodes, clusters, data centers
      Cross data center replication
      Premium 24x7 support
      Priced by TBs of unique data (not replicas)
  • Questions