Big Data Without a Big Database (Surge 2012)

  • 5,528 views


These days it is not uncommon to have hundreds of gigabytes of data that must be sliced and diced, then delivered fast and rendered quickly. Solutions typically involve lots of caching and expensive hardware with lots of memory. And while those solutions certainly can work, they aren't always cost-effective or feasible in certain environments (such as the cloud). This talk covers strategies for caching large data sets without lots of expensive hardware, relying instead on software and data design.

It's common wisdom that serving your data from memory dramatically improves application performance and is a key to scaling. However, caching large datasets brings its own challenges: distribution, consistency, dealing with memory limits, and optimizing data loading, just to name a few. This talk walks through some of these challenges, and their solutions, for achieving fast data queries in the cloud. The audience will come away with a number of practical techniques for organizing and building caches for non-transactional datasets, applicable both to scaling existing systems and to designing new ones.


Transcript

  • 1. Big Data Without a Big Database. Kate Matsudaira, Decide.com, @katemats
  • 2. Two kinds of data.
    Reference ("non-transactional") data — examples: product/offer catalogs, service catalogs, static geolocation data, dictionaries. Created/modified by the business (you); low sensitivity to staleness; growth is easy to plan for; access is mostly read.
    User ("transactional") data — examples: user accounts, shopping carts/orders, user messages. Created/modified by users; high sensitivity to staleness; growth is hard to plan for; access is read/write.
  • 3. user data / reference data (diagram)
  • 4. user data / reference data (diagram)
  • 5. user data / reference data (diagram)
  • 6. Reference Data Needs Speed (image credit: http://cache.gawker.com)
  • 7. Performance Reminder. Main memory read: 0.0001 ms (100 ns); network round trip: 0.5 ms (500,000 ns); disk seek: 10 ms (10,000,000 ns). Source: http://www.cs.cornell.edu/projects/ladis2009/talks/dean-keynote-ladis2009.pdf
  • 8. The Beginning (diagram: load balancers → webapps → services → BIG DATABASE, fed by a data loader). Problems: availability, performance, scalability.
  • 9. Replication (diagram: the same architecture plus a database REPLICA). Problems: operational overhead, performance, scalability.
  • 10. Local Caching (diagram: each service gets a local cache in front of the BIG DATABASE and its replica). Problems: operational overhead, scalability, long-tail performance, consistency.
  • 11. The Long Tail Problem: 80% of requests query 10% of entries (the head); 20% of requests query the remaining 90% of entries (the tail).
  • 12. Big Cache (diagram: services share one BIG CACHE, preloaded by the data loader from the database/replica). Problems: operational overhead, long-tail performance, scalability, consistency.
  • 13. Do I look like I need a cache? memcached(b), ElastiCache (AWS), Oracle Coherence
  • 14. Distributed cache properties: dynamically assigns keys to the "nodes"; targeted at generic data/use cases; scales horizontally; dynamically rebalances data; no assumptions about loading/updating data; poor performance on cold starts.
  • 15. Big Cache Technologies. Operational overhead: additional hardware, additional configuration, additional monitoring. Performance cost: extra network hop, slow scanning, additional deserialization.
  • 16. NoSQL to the Rescue? (diagram: NoSQL database plus NoSQL replica behind the services). Some operational overhead, some performance problems, some scalability problems.
  • 17. Remote Store Retrieval Latency (client → network → remote store). TCP request: 0.5 ms; lookup/write response: 0.5 ms; TCP response: 0.5 ms; read/parse response: 0.25 ms. Total time to retrieve a single value: 1.75 ms.
  • 18. Total time to retrieve a single value: from remote store: 1.75 ms; from memory: 0.001 ms (10 main memory reads). Sequential access of 1 million random keys: from remote store: ~30 minutes; from memory: ~1 second.
  • 19. The Truth About Databases
  • 20. "What I'm going to call the hot data cliff: As the size of your hot data set (data frequently read at sustained rates above disk I/O capacity) approaches available memory, write operation bursts that exceed disk write I/O capacity can create a thrashing death spiral where hot disk pages that MongoDB desperately needs are evicted from disk cache by the OS as it consumes more buffer space to hold the writes in memory." MongoDB. Source: http://www.quora.com/Is-MongoDB-a-good-replacement-for-Memcached
  • 21. "Redis is an in-memory but persistent on disk database, so it represents a different trade off where very high write and read speed is achieved with the limitation of data sets that can't be larger than memory." Redis. Source: http://redis.io/topics/faq
  • 22. They are fast if everything fits into memory.
  • 23. Can you keep it in memory yourself? (diagram: each webapp/service holds a full data cache with its own loader, fed from the BIG DATABASE). Operational relief, performance gain, scales infinitely; consistency problems remain.
  • 24. Fixing Consistency: 1. Deployment "cells"; 2. Sticky user sessions. (Diagram: a deployment cell of webapps/services, each with a full data cache and loader, behind a load balancer.)
  • 25. (image: car full of apples; credit: http://www.fruitshare.ca/wp-content/uploads/2011/08/car-full-of-apples.jpeg)
  • 26. "Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." Donald Knuth
  • 27. How do you fit all that data in memory?
  • 28. The Answer: five techniques (steps 1-5).
  • 29. Domain Model Design (step 1 of 5). "Domain Layer (or Model Layer): Responsible for representing concepts of the business, information about the business situation, and business rules. State that reflects the business situation is controlled and used here, even though the technical details of storing it are delegated to the infrastructure. This layer is the heart of business software." Eric Evans, Domain-Driven Design, 2003
  • 30. Domain Model Design Guidelines: #1 Keep it immutable; #2 Use independent hierarchies; #3 Optimize data.
  • 31. intern() your immutables (diagram: two object graphs K1→V1 and K2→V2; instead of duplicate copies A', B', C', D', E', interned graphs share the identical immutable nodes).
  • 32. Interner implementation:

        private final Map<Class<?>, Map<Object, WeakReference<Object>>> cache =
            new ConcurrentHashMap<Class<?>, Map<Object, WeakReference<Object>>>();

        public <T> T intern(T o) {
          if (o == null)
            return null;
          Class<?> c = o.getClass();
          Map<Object, WeakReference<Object>> m = cache.get(c);
          if (m == null)
            cache.put(c, m = synchronizedMap(new WeakHashMap<Object, WeakReference<Object>>()));
          WeakReference<Object> r = m.get(o);
          @SuppressWarnings("unchecked")
          T v = (r == null) ? null : (T) r.get();
          if (v == null) {
            v = o;
            m.put(v, new WeakReference<Object>(v));
          }
          return v;
        }
  • 33. Use Independent Hierarchies (diagram: instead of one big Product object holding Offers, Specifications, Reviews, Description, Model History, and Rumors, keep each as a separate top-level entity keyed by productId, alongside a small Product Summary).
  • 34. Collection Optimization (step 2 of 5).
  • 35. Leverage Primitive Keys/Values. Size in memory for a collection with 10,000 elements [0 .. 9,999]: java.util.ArrayList<Integer>: 200K; java.util.HashSet<Integer>: 546K; gnu.trove.list.array.TIntArrayList: 40K; gnu.trove.set.hash.TIntHashSet: 102K. Trove ("High Performance Collections for Java").
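If pulling in Trove isn't an option, the same boxing overhead can be avoided with plain primitive arrays. A minimal sketch (not from the talk; the class name is illustrative) of an immutable int set backed by a sorted int[] with binary-search lookup:

```java
import java.util.Arrays;

// Immutable set of ints stored as a sorted primitive array:
// ~4 bytes per element instead of a boxed Integer object per entry.
public class IntArraySet {
    private final int[] sorted;

    public IntArraySet(int[] values) {
        int[] copy = values.clone(); // defensive copy keeps the set immutable
        Arrays.sort(copy);
        this.sorted = copy;
    }

    public boolean contains(int v) {
        return Arrays.binarySearch(sorted, v) >= 0; // O(log n) lookup
    }

    public int size() {
        return sorted.length;
    }
}
```

Lookup is O(log n) instead of a HashSet's O(1), but for read-mostly reference data the memory savings usually dominate.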
  • 36. Optimize small immutable collections. For collections with a small number of entries (up to ~20):

        class ImmutableMap<K, V> implements Map<K, V>, Serializable { ... }

        class MapN<K, V> extends ImmutableMap<K, V> {
          final K k1, k2, ..., kN;
          final V v1, v2, ..., vN;
          @Override public boolean containsKey(Object key) {
            if (eq(key, k1)) return true;
            if (eq(key, k2)) return true;
            ...
            return false;
          }
          ...
        }
  • 37. Space Savings. java.util.HashMap: 128 bytes + 32 bytes per entry; compact immutable map: 24 bytes + 8 bytes per entry.
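As a concrete (hypothetical) instance of the MapN pattern above, here is what a fixed-arity two-entry map might look like: keys and values live in plain final fields, with no Entry objects or hash-table array, and lookups are a short linear scan.

```java
import java.util.AbstractMap;
import java.util.LinkedHashSet;
import java.util.Set;

// Fixed-arity immutable map: two entries held directly in final fields.
// Extending AbstractMap supplies size(), toString(), etc. via entrySet().
public class Map2<K, V> extends AbstractMap<K, V> {
    private final K k1, k2;
    private final V v1, v2;

    public Map2(K k1, V v1, K k2, V v2) {
        this.k1 = k1; this.v1 = v1;
        this.k2 = k2; this.v2 = v2;
    }

    private static boolean eq(Object a, Object b) {
        return a == null ? b == null : a.equals(b);
    }

    @Override public boolean containsKey(Object key) {
        return eq(key, k1) || eq(key, k2); // linear scan beats hashing at this size
    }

    @Override public V get(Object key) {
        if (eq(key, k1)) return v1;
        if (eq(key, k2)) return v2;
        return null;
    }

    @Override public Set<Entry<K, V>> entrySet() {
        // Built only when iterated; lookups never touch this path.
        Set<Entry<K, V>> s = new LinkedHashSet<>();
        s.add(new AbstractMap.SimpleImmutableEntry<>(k1, v1));
        s.add(new AbstractMap.SimpleImmutableEntry<>(k2, v2));
        return s;
    }
}
```

A factory would pick Map1, Map2, ... MapN by entry count and fall back to a regular HashMap above the cutoff.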
  • 38. Numeric Data Optimization (step 3 of 5).
  • 39. Price History Example
  • 40. Example: Price History. Problem: store daily prices for 1M products, 2 offers per product; average price history length per product is ~2 years. Total price points: (1M + 2M) * 730 = ~2 billion.
  • 41. Price History, first attempt: TreeMap<Date, Double>. 88 bytes per entry * 2 billion = ~180 GB.
  • 42. Typical Shopping Price History (chart: price in dollars over a ~121-day window, holding steady at levels like $100 for long runs of days).
  • 43. Run-Length Encoding: a a a a a a b b b c c c c c c → 6 a 3 b 6 c
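A minimal run-length encoder/decoder over a string of symbols (an illustrative sketch of the general idea, not the talk's price-specific encoding):

```java
// Run-length encoding: collapse each run of identical symbols
// into (count, symbol), e.g. "aaaaaabbbcccccc" -> "6a3b6c".
public class RunLength {
    public static String encode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int run = 0;
            while (i < s.length() && s.charAt(i) == c) { run++; i++; }
            out.append(run).append(c);
        }
        return out.toString();
    }

    public static String decode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            int run = 0;
            while (Character.isDigit(s.charAt(i))) { // read the run length
                run = run * 10 + (s.charAt(i++) - '0');
            }
            char c = s.charAt(i++);                   // then the symbol
            for (int k = 0; k < run; k++) out.append(c);
        }
        return out.toString();
    }
}
```

Price histories compress extremely well under this scheme because prices stay flat for long runs of days.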
  • 44. Price History Optimization. Encode each series in a short[] where a positive value is a price (adjusted to scale), a negative value is a run length (precedes its price), and zero means unavailable. Example: -20 100 -40 150 -10 140 -20 100 -10 0 -20 100 90 -9 80. Drop pennies; store prices in a primitive short (use a scale factor to represent prices greater than Short.MAX_VALUE). Memory: 15 * 2 + 16 (array) + 24 (start date) + 4 (scale factor) = 74 bytes.
  • 45. Savings: a 155x reduction compared to TreeMap<Date, Double>. Estimated memory for 2 billion price points: 1.2 GB << 180 GB.
  • 46. Price History Model:

        public class PriceHistory {
          private final Date startDate; // or use org.joda.time.LocalDate
          private final short[] encoded;
          private final int scaleFactor;

          public PriceHistory(SortedMap<Date, Double> prices) { … } // encode
          public SortedMap<Date, Double> getPricesByDate() { … }    // decode
          public Date getStartDate() { return startDate; }

          // Computations below are implemented directly against the encoded data
          public Date getEndDate() { … }
          public Double getMinPrice() { … }
          public int getNumChanges(double minChangeAmt, double minChangePct, boolean abs) { … }
          public PriceHistory trim(Date startDate, Date endDate) { … }
          public PriceHistory interpolate() { … }
        }
  • 47. Know Your Data
  • 48. Compress Text (step 4 of 5).
  • 49. String Compression: byte arrays. Use the minimum character-set encoding.

        static Charset UTF8 = Charset.forName("UTF-8");
        String s = "The quick brown fox jumps over the lazy dog"; // 42 chars, 136 bytes
        byte[] b = "The quick brown fox jumps over the lazy dog".getBytes(UTF8); // 64 bytes
        String s1 = "Hello"; // 5 chars, 64 bytes
        byte[] b1 = "Hello".getBytes(UTF8); // 24 bytes

        byte[] toBytes(String s) { return s == null ? null : s.getBytes(UTF8); }
        String toString(byte[] b) { return b == null ? null : new String(b, UTF8); }
  • 50. String Compression: shared prefix. Great for URLs.

        public class PrefixedString {
          private PrefixedString prefix;
          private byte[] suffix;
          . . .
          @Override public int hashCode() { … }
          @Override public boolean equals(Object o) { … }
        }
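One possible way to fill in the elided pieces (a sketch under my own assumptions, not the talk's actual implementation): value() rebuilds the full string by walking the prefix chain, and equals/hashCode compare on the rebuilt value.

```java
import java.nio.charset.StandardCharsets;

// Prefix-sharing string: many URLs like "http://example.com/a" and
// "http://example.com/b" share one prefix object and store only the
// UTF-8 bytes of their differing suffix.
public class PrefixedString {
    private final PrefixedString prefix; // null for the root
    private final byte[] suffix;

    public PrefixedString(PrefixedString prefix, String suffix) {
        this.prefix = prefix;
        this.suffix = suffix.getBytes(StandardCharsets.UTF_8);
    }

    public String value() {
        String head = (prefix == null) ? "" : prefix.value();
        return head + new String(suffix, StandardCharsets.UTF_8);
    }

    @Override public String toString() { return value(); }

    @Override public int hashCode() { return value().hashCode(); }

    @Override public boolean equals(Object o) {
        return o instanceof PrefixedString
            && value().equals(((PrefixedString) o).value());
    }
}
```

A production version would avoid rebuilding the full string inside hashCode/equals (e.g. by caching the hash), but the memory-sharing idea is the same.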
  • 51. String Compression: short alphanumeric case-insensitive strings.

        public abstract class AlphaNumericString {
          public static AlphaNumericString make(String s) {
            try {
              return new Numeric(Long.parseLong(s, Character.MAX_RADIX));
            } catch (NumberFormatException e) {
              return new Alpha(s.getBytes(UTF8));
            }
          }

          protected abstract String value();
          @Override public String toString() { return value(); }

          private static class Numeric extends AlphaNumericString {
            long value;
            Numeric(long value) { this.value = value; }
            @Override protected String value() { return Long.toString(value, Character.MAX_RADIX); }
            @Override public int hashCode() { … }
            @Override public boolean equals(Object o) { … }
          }

          private static class Alpha extends AlphaNumericString {
            byte[] value;
            Alpha(byte[] value) { this.value = value; }
            @Override protected String value() { return new String(value, UTF8); }
            @Override public int hashCode() { … }
            @Override public boolean equals(Object o) { … }
          }
        }
  • 52. String Compression: large strings. Just convert to byte[] first, then compress (gzip, bzip2). Become the master of your strings! (Image source: https://www.facebook.com/note.php?note_id=80105080079)
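A small stdlib-only utility (illustrative; the class name is my own) for gzip-compressing a large string to a byte[] and back:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class StringGzip {
    // Compress a string's UTF-8 bytes with gzip.
    public static byte[] compress(String s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(s.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // Inflate gzip bytes back into the original string.
    public static String decompress(byte[] b) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(b))) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = gz.read(buf)) > 0) bos.write(buf, 0, n);
            return new String(bos.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This only pays off for large, repetitive strings (descriptions, HTML); tiny strings can actually grow because of the gzip header overhead.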
  • 53. JVM Tuning (step 5 of 5).
  • 54. JVM Tuning: make sure to use compressed pointers (-XX:+UseCompressedOops); use a low-pause GC (Concurrent Mark Sweep, G1); overprovision the heap by ~30%; adjust generation sizes/ratios.
  • 55. JVM Tuning: print garbage collection details. If GC pauses are still prohibitive, consider partitioning.
  • 56. Cache Loading (diagram: each webapp/service has a full data cache and loader pulling "cooked" datasets from a reliable file store such as S3, behind the load balancer).
  • 57. Keep the format simple (CSV, JSON). Final datasets should be compressed and stored (e.g., in S3). Poll for updates; poll frequency == data inconsistency threshold.
  • 58. Cache Loading: Time Sensitivity.
  • 59. Cache Loading: low time-sensitivity data.

        /tax-rates
          /date=2012-05-01  tax-rates.2012-05-01.csv.gz
          /date=2012-06-01  tax-rates.2012-06-01.csv.gz
          /date=2012-07-01  tax-rates.2012-07-01.csv.gz
  • 60. Cache Loading: medium/high time-sensitivity data.

        /prices
          /full
            /date=2012-07-01  price-obs.2012-07-01.csv.gz
            /date=2012-07-02
          /inc
            /date=2012-07-01  2012-07-01T00-10-00.csv.gz  2012-07-01T00-20-00.csv.gz
  • 61. Cache Loading Strategy: Swap. The cache is immutable, so no locking is required. Works well for infrequently updated data sets, and for datasets that need to be refreshed in full on each update.
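The swap strategy can be sketched with an AtomicReference holding an immutable snapshot (a hypothetical minimal version; the names are illustrative): readers always see one consistent map, and a reload publishes a whole new one in a single atomic swap.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Swap-on-reload cache: readers never lock because each published
// snapshot is immutable; an update replaces the whole snapshot at once.
public class SwapCache<K, V> {
    private final AtomicReference<Map<K, V>> current =
        new AtomicReference<>(Collections.emptyMap());

    public V get(K key) {
        return current.get().get(key); // lock-free read
    }

    // Called by the loader after a full dataset has been "cooked".
    public void swap(Map<K, V> freshData) {
        current.set(Collections.unmodifiableMap(new HashMap<>(freshData)));
    }
}
```

In-flight readers keep using the old snapshot until the GC reclaims it, so there is a brief period where both copies are resident; the heap must be sized for that.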
  • 62. Deletions can be tricky. Avoid full synchronization; use one container per partition.
  • 63. Concurrent Locking with a Trove Map:

        public class LongCache<V> {
          private TLongObjectMap<V> map = new TLongObjectHashMap<V>();
          private ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
          private Lock r = lock.readLock(), w = lock.writeLock();

          public V get(long k) {
            r.lock();
            try { return map.get(k); } finally { r.unlock(); }
          }

          public V update(long k, V v) {
            w.lock();
            try { return map.put(k, v); } finally { w.unlock(); }
          }

          public V remove(long k) {
            w.lock();
            try { return map.remove(k); } finally { w.unlock(); }
          }
        }
  • 64. Periodically generate serialized data/state ("cooking" the data sets). Validate with a CRC or hash. Keep local copies.
  • 65. Dependent Caches (diagram: a service instance with caches A-D; a status aggregator tracks their dependencies and feeds a health-check servlet used by the load balancer).
  • 66. Deployment Cell Status (diagram: per-service status aggregators roll up, via HTTP or JMX, into a cell status aggregator that backs the load balancer's health check).
  • 67. Hierarchical Status Aggregation
  • 68. (humorous image)
  • 69. Data doesn't fit into the heap? Keep a smaller heap.
  • 70. Partitioning Decision Tree. Does my data fit in a single VM? Yes → don't partition. No → can I partition statically? Yes → use fixed partitioning. No → use dynamic partitioning. (Difficulty increases at each step.)
  • 71. Fixed Partitions.
  • 72. Fixed Partition Assignment (diagram: a deployment cell where each webapp hosts partitions p1-p4 behind the load balancer).
  • 73. Dynamic Partitioning.
  • 74. Dynamic Partition Assignment: each partition lives on at least 2 nodes (primary/secondary); the partition (set) is determined via a simple computation.
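The "simple computation" could be as simple as hashing the key modulo the partition count (an illustrative guess; the talk does not show its actual formula):

```java
// Map a long key onto one of N partitions deterministically.
public class Partitioner {
    private final int numPartitions;

    public Partitioner(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // Mask the sign bit so negative hash codes still land in [0, numPartitions).
    public int partitionFor(long key) {
        int h = Long.hashCode(key);
        return (h & 0x7fffffff) % numPartitions;
    }
}
```

Primary/secondary placement can then be derived from the partition number (e.g. secondary on the next node over); a consistent-hashing scheme would reduce data movement when nodes join or leave.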
  • 75. Member nodes operate with "active state" from the leader node (diagram: nodes A-F; when the leader fails, a new leader is elected).
  • 76. Each member node learns its target state from the leader (diagram: nodes A-F). All nodes load partitions to achieve their target state, then transition into an active state.
  • 77. Dynamic Partitioning (diagram: primary and secondary copies of partitions p1-p9 spread across webapps behind load balancers). The extra network hop is back.
  • 78. Ad-hoc Cache Querying.
  • 79. Ad-hoc Querying Languages: Groovy; JRuby; MVEL (http://mvel.codehaus.org); JoSQL (Java Object SQL, http://josql.sourceforge.net).
  • 80. Organizing Queries. Break up the query into: extractor expressions (like a SQL "SELECT" clause); a filter (evaluates to boolean, like a SQL "WHERE" clause); a sorter (like a SQL "ORDER BY" clause); a limit (like a SQL "LIMIT" clause).
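That decomposition maps naturally onto Java's functional interfaces (a sketch with hypothetical names, not the talk's API): a Predicate for the filter, a Comparator for the sorter, a limit, and a Function for the extractor.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class CacheQuery {
    // SELECT extractor FROM items WHERE filter ORDER BY sorter LIMIT limit
    public static <T, R> List<R> run(List<T> items,
                                     Predicate<T> filter,
                                     Comparator<T> sorter,
                                     long limit,
                                     Function<T, R> extractor) {
        return items.stream()
                .filter(filter)
                .sorted(sorter)
                .limit(limit)
                .map(extractor)   // extract only at the end, on the few surviving rows
                .collect(Collectors.toList());
    }
}
```

Keeping the extractor last matters: filter, sort, and limit run against the in-memory domain objects, and only the final handful of results pay the extraction cost.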
  • 81. Parallel (fan-out) Query Execution (diagram: prune partitions; each partition 1..N runs filter → sort → limit to produce intermediate results; the intermediate results are merged with a final sort, limit, and extract).
  • 82. Parallel (fan-out) Query: Multi-level Reduction (diagram: partitions p1..pN reduced through intermediate reducers r1..r4).
  • 83. Optimizing for Network Topology Using Multi-level Reduction (diagram: partitions grouped by location 1-3; reductions happen first over faster/higher-capacity links within a location, before crossing slower/lower-capacity links between locations).
  • 84. JoSQL example (from the JoSQL site):

        SELECT *
        FROM java.io.File
        WHERE name $LIKE "%.html"
        AND toDate (lastModified)
          BETWEEN toDate (01-12-2004) AND toDate (31-12-2004)
  • 85. JoSQL example: find the average number of significant price changes and the average price history by product category, for a specific seller and time period (first half of 2012):

        select @priceChangeRate, product, @avgPriceHistory
        from products
        where offers.lowestPrice >= 100
        and offer(offers, 12).priceHistory.startDate <= date(2012-01-01)
        and offer(offers, 12).priceHistory.endDate >= date(2012-06-01)
        and decideRank.rank < 200
        group by category(product.primaryCategoryId).name
        group by order 1
        execute on group_by_results
          sum(getNumberOfDailyPriceChanges(
                trimPriceHistory(offer(offers, 12).priceHistory, 2012-01-01, 2012-06-01),
                offer(offers, 12).priceHistory.startDate, 15, 5)) / count(1) as priceChangeRate,
          avgPriceHistory(trimPriceHistory(offer(offers, 12).priceHistory, 2012-01-01, 2012-06-01)) as avgPriceHistory
  • 86. Timing: total time to execute the where clause on all 3,393,731 objects: 6010.0 ms (where clause average: 0.0011 ms per object); total time to execute 2 expressions on GROUP_BY_RESULTS objects: 3.0 ms; group operation: 1.0 ms; group column collection and sort: 46.0 ms.
  • 87. Monitoring cache readiness: rolled-up status; aggregate metrics: staleness, total counts, counts by partition, counts by attribute, average cardinality of one-to-many relationships.
  • 89. The End. http://www.decide.com | http://katemats.com. Much of the credit for this talk goes to Decide's Chief Architect, Leon Stein, for developing the technology and creating the content. Thank you, Leon.