Infinispan
from POC to Production
Who am I?
   Mark Addy, Senior Consultant




Fast, Reliable, Secure, Manageable
Agenda
Part 1
• An existing production system unable to scale


Part 2
• A green-field project unable to meet SLAs
About the Customer
• Global on-line travel & accommodation
  provider
  – 50 million searches per day
• Our relationship
  – Troubleshooting
  – Workshops


         Part 1 – Existing Application
     Connectivity Engine
     • Supplements site content with data from
       third parties (Content Providers)
         – Tomcat
         – Spring
         – EhCache
         – MySQL
         – Apache load-balancer / mod_jk


 Logical View


         Content Provider Challenges
    •    Unreliable third party systems
    •    Distant network communications
    •    Critical for generating local site content
    •    Response time
    •    Choice & low response time == more
         profit


                        Existing Cache
      • NOT Hibernate 2LC
      • Spring Interceptors wrap calls to content providers
<bean id="searchService" class="org.springframework.aop.framework.ProxyFactoryBean">
   <property name="proxyInterfaces" value="ISearchServiceTargetBean"/>
   <property name="target" ref="searchServiceTargetBean"/>
   <property name="interceptorNames">
         <list>
                 <value>cacheInterceptor</value>
         </list>
   </property>
</bean>

<bean id="searchServiceTargetBean" class="SearchServiceTargetBean">
   ...
</bean>
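A minimal sketch of the interceptor idea, using only JDK dynamic proxies instead of Spring AOP so that it runs on its own. SearchService, SlowSearchService and CachingHandler are hypothetical names. The real cacheInterceptor is not shown in the slides and would also have to handle expiry.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the ISearchServiceTargetBean interface.
interface SearchService {
    String search(String query);
}

// Hypothetical target bean; counts how often the "content provider" is really called.
class SlowSearchService implements SearchService {
    final AtomicInteger providerCalls = new AtomicInteger();

    @Override
    public String search(String query) {
        providerCalls.incrementAndGet();
        return "results for " + query;
    }
}

// Plays the role of cacheInterceptor: answer from the cache, else call the target and store.
class CachingHandler implements InvocationHandler {
    private final Object target;
    private final Map<String, Object> cache = new ConcurrentHashMap<String, Object>();

    CachingHandler(Object target) {
        this.target = target;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        // Cache key: method name plus a string form of the arguments.
        String key = method.getName() + Arrays.deepToString(args);
        Object cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        Object result;
        try {
            result = method.invoke(target, args);
        } catch (InvocationTargetException e) {
            throw e.getCause(); // surface the target's own exception
        }
        if (result != null) {
            cache.put(key, result);
        }
        return result;
    }

    // Wraps a target in a caching proxy, in the spirit of the ProxyFactoryBean definition.
    static SearchService wrap(SearchService target) {
        return (SearchService) Proxy.newProxyInstance(
                SearchService.class.getClassLoader(),
                new Class<?>[] {SearchService.class},
                new CachingHandler(target));
    }

    public static void main(String[] args) {
        SlowSearchService target = new SlowSearchService();
        SearchService service = wrap(target);
        service.search("paris");
        service.search("paris"); // second call is served from the cache
        System.out.println("provider calls: " + target.providerCalls.get());
    }
}
```

The calling code only ever sees the SearchService interface, which is why the cache could later be swapped without touching the callers.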


         Extreme Redundancy
               800,000 elements
          10 nodes = 10 copies of data


                      The Price
     • 10G JVM Heap
         – 10-12 second pauses for major GC
         – Over 8G of heap is cache
     • Eviction before Expiry
         – More trips to content providers
     • EhCache expiry / eviction piggybacks
       client cache access
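To make "eviction before expiry" concrete, here is a small sketch of a size-bounded LRU cache with a time-to-live. It is not EhCache code. BoundedTtlCache and its fields are hypothetical, and the clock is passed in explicitly so the behaviour is reproducible. Note that an expired entry is only noticed when a client asks for it, which mirrors the piggybacking described above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: a bounded LRU map whose entries also carry an expiry time.
class BoundedTtlCache<K, V> {
    // Value plus the time at which it should expire naturally.
    private static final class Slot<V> {
        final V value;
        final long expiresAt;

        Slot(V value, long expiresAt) {
            this.value = value;
            this.expiresAt = expiresAt;
        }
    }

    private final long ttlMillis;
    private final LinkedHashMap<K, Slot<V>> map;
    private long now;
    int evictedBeforeExpiry = 0;

    BoundedTtlCache(final int maxEntries, long ttlMillis) {
        this.ttlMillis = ttlMillis;
        // accessOrder=true keeps the least recently used entry first.
        this.map = new LinkedHashMap<K, Slot<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Slot<V>> eldest) {
                boolean overCapacity = size() > maxEntries;
                if (overCapacity && eldest.getValue().expiresAt > now) {
                    evictedBeforeExpiry++; // still valid, but dropped to make room
                }
                return overCapacity;
            }
        };
    }

    void put(K key, V value, long nowMillis) {
        now = nowMillis;
        map.put(key, new Slot<V>(value, nowMillis + ttlMillis));
    }

    V get(K key, long nowMillis) {
        Slot<V> slot = map.get(key);
        if (slot == null) {
            return null;
        }
        if (slot.expiresAt <= nowMillis) {
            map.remove(key); // expiry is only discovered on access
            return null;
        }
        return slot.value;
    }

    public static void main(String[] args) {
        BoundedTtlCache<String, String> cache = new BoundedTtlCache<String, String>(2, 60000L);
        cache.put("a", "1", 0L);
        cache.put("b", "2", 1L);
        cache.put("c", "3", 2L);
        System.out.println("evicted before expiry: " + cache.evictedBeforeExpiry);
    }
}
```

Every entry counted in evictedBeforeExpiry is a later cache miss, and so an extra trip to a content provider.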


  How to Scale?


                     Objectives
     • Reduce JVM Heap Size
         – 10 second pauses are too long
     • Increase cache capacity
     • Remove Eviction
         – Cache entries should expire naturally
     • Improve Response Times
         – Latency decreases if eviction and the length and
           frequency of GC pauses are reduced


                   Discussions
     • Pre-sales workshop
         – Express Terracotta EhCache
         – Oracle Coherence
         – Infinispan


             Why Infinispan?
    • Open source advocates
    • Cutting edge technology


                  Benchmarking
     • Must be reproducible
     • Must reflect accurately the production
       load and data
         – 50 million searches / day == 600 / sec
     • Must be able to imitate the content
       providers
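The load figure can be checked with a line of arithmetic. Spread evenly over 24 hours, 50 million searches come to roughly 579 per second, which the slide rounds up to 600. SearchRate is a hypothetical helper name.

```java
// Back-of-envelope check of the benchmark target rate.
class SearchRate {
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Average requests per second if the daily volume were spread evenly.
    static double averagePerSecond(long requestsPerDay) {
        return (double) requestsPerDay / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        System.out.printf("%.1f searches/sec on average%n", averagePerSecond(50000000L));
    }
}
```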


                       Solution
     • Replica load-test environment
     • Port mirror production traffic
         – Capture incoming requests
         – Capture content provider responses
     • Custom JMeter script
     • Mock application Spring Beans


         Benchmarking Architecture


          Benchmarking Validation
     • Understand your cached data
         – jmap
         jmap -dump:file=mydump.hprof <pid>
         – Eclipse Memory Analyzer
         – OQL
                  SELECT
                  toString(x.key)
                  , x.key.@retainedHeapSize
                  , x.value.@retainedHeapSize
                  FROM net.sf.ehcache.Element x


               Benchmarking Validation




  Extract cached object properties - you can learn a lot quickly
         –   creationTime
         –   lastAccessTime
         –   lastUpdateTime
         –   hitCount
         –   timeToLive
         –   timeToIdle
         –   etc


           Enable JMX for Infinispan
    Enable CacheManager Statistics
          <global>
             <globalJmxStatistics
               enabled="true"
               jmxDomain="org.infinispan"
               cacheManagerName="MyCacheManager"/>
             ...
          </global>


    Enable Cache Statistics
         <default>
              <jmxStatistics enabled="true"/>
              ...
         </default>


            Enable Remote JMX
    -Dcom.sun.management.jmxremote.port=nnnn
    -Dcom.sun.management.jmxremote.authenticate=false
    -Dcom.sun.management.jmxremote.ssl=false
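With those flags set, any JMX client can connect. The sketch below uses only the standard javax.management API to list the MBeans registered under a domain. JmxDomainLister is a hypothetical name, host and port come from the command line, and the names it returns depend on the Infinispan version, so none are assumed here. Run without arguments it lists the local JVM's java.lang domain instead.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class JmxDomainLister {
    // Returns the canonical names of all MBeans registered under the given domain.
    static Set<String> mbeanNames(MBeanServerConnection connection, String domain) {
        try {
            Set<String> names = new TreeSet<String>();
            for (ObjectName name : connection.queryNames(new ObjectName(domain + ":*"), null)) {
                names.add(name.getCanonicalName());
            }
            return names;
        } catch (Exception e) {
            throw new IllegalStateException("JMX query failed for domain " + domain, e);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            // No host/port given: inspect this JVM's own platform MBean server.
            System.out.println(mbeanNames(ManagementFactory.getPlatformMBeanServer(), "java.lang"));
            return;
        }
        // Standard RMI connector URL for the port set via jmxremote.port.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + args[0] + ":" + args[1] + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            System.out.println(mbeanNames(connector.getMBeanServerConnection(), "org.infinispan"));
        }
    }
}
```

Disabling authentication and SSL, as in the flags above, is only reasonable inside a closed load-test network.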


             Record Performance
     • RHQ http://rhq-project.org
         – JVM memory, GC profile, CPU usage
         – Infinispan plugin


         Infinispan


         Distributed Mode
         hash(key) determines owners
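A toy hash wheel shows how hash(key) can pick the owners. This is a sketch of the general consistent-hashing technique, not Infinispan's ConsistentHash implementation. HashWheel, mix and the node names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

class HashWheel {
    private final TreeMap<Integer, String> wheel = new TreeMap<Integer, String>();
    private final int numOwners;

    HashWheel(List<String> nodes, int numOwners) {
        this.numOwners = numOwners;
        for (String node : nodes) {
            int position = mix(node.hashCode());
            while (wheel.containsKey(position)) {
                position = (position + 1) & Integer.MAX_VALUE; // step past a collision
            }
            wheel.put(position, node);
        }
    }

    // Murmur3-style bit mixer so that similar names land far apart on the wheel.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h & Integer.MAX_VALUE;
    }

    // Walks clockwise from the key's position and returns the first numOwners distinct nodes.
    List<String> owners(Object key) {
        int position = mix(key.hashCode());
        Set<String> owners = new LinkedHashSet<String>();
        collect(wheel.tailMap(position, true).values(), owners);
        collect(wheel.headMap(position, false).values(), owners); // wrap around
        return new ArrayList<String>(owners);
    }

    private void collect(Collection<String> candidates, Set<String> owners) {
        for (String node : candidates) {
            if (owners.size() == numOwners) {
                return;
            }
            owners.add(node);
        }
    }

    public static void main(String[] args) {
        HashWheel wheel = new HashWheel(Arrays.asList("node1", "node2", "node3", "node4"), 2);
        System.out.println("owners: " + wheel.owners("paris-rates-42"));
    }
}
```

The first node returned acts as the primary owner and the rest hold the back-up copies. Every node can compute the same answer locally, so no lookup service is needed.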


             Distribution Features
     • Configurable redundancy
         – numOwners
     • Dynamic scaling
         – Automatic rebalancing for distribution and
           recovery of redundancy
     • Replication (distribution) overhead does
       not increase as more nodes are added
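The difference from the replicated EhCache set-up is easy to quantify. The sketch below reuses the 800,000 elements and 10 nodes from earlier in the deck and assumes numOwners=2 for the distributed case. CopyCount is a hypothetical helper name.

```java
// Stored copies cluster-wide: full replication versus distribution with numOwners.
class CopyCount {
    // Every node holds every entry.
    static long replicatedCopies(long entries, int nodes) {
        return entries * nodes;
    }

    // Each entry is held by numOwners nodes, however large the cluster grows.
    static long distributedCopies(long entries, int nodes, int numOwners) {
        return entries * Math.min(numOwners, nodes);
    }

    // Average number of entries held per node in distributed mode.
    static long distributedPerNode(long entries, int nodes, int numOwners) {
        return distributedCopies(entries, nodes, numOwners) / nodes;
    }

    public static void main(String[] args) {
        System.out.println("replicated:  " + replicatedCopies(800000L, 10));
        System.out.println("distributed: " + distributedCopies(800000L, 10, 2));
        System.out.println("per node:    " + distributedPerNode(800000L, 10, 2));
    }
}
```

Adding nodes to the distributed cluster lowers the per-node figure, whereas with full replication each node still holds all 800,000 entries.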


                        Hotrod
     • Client – Server architecture
         – Java client
         – Connection pooling
         – Dynamic scaling
         – Smart routing
     • Separate application and cache
       memory requirements


    Application – Cache Separation
  Application
  • CPU intensive
  • High infant mortality
  Cache
  • Low CPU requirement
  • Mortality linked to
    expiry / eviction


         Hotrod Architecture


     Remember this is cutting edge
     • Latest final release was 4.2.1
     • Let’s get cracking...
         – Distributed mode
         – Hotrod client
         – What issues did we encounter...


  Topology Aware Consistent Hash
     • Ensure back-ups are held preferentially on
       separate machine, rack and site




     • https://community.jboss.org/thread/168236
     • We need to upgrade to the latest 5.0.0.CR


               Virtual Nodes
    Sub-divides hash wheel positions




    <hash numVirtualNodes="2"/>


                  Virtual Nodes
     • Improves data distribution




     • But didn’t work at the time for Hotrod
     • https://issues.jboss.org/browse/ISPN-1217
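The effect of virtual nodes can be explored with a toy wheel that places each physical node at several positions and then counts how many keys each node ends up owning. This is a sketch of the general technique, not Infinispan's implementation. VirtualNodeWheel, the "node#v" naming and the key format are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class VirtualNodeWheel {
    private final TreeMap<Integer, String> wheel = new TreeMap<Integer, String>();

    VirtualNodeWheel(List<String> nodes, int numVirtualNodes) {
        for (String node : nodes) {
            // Each physical node claims numVirtualNodes positions on the wheel.
            for (int v = 0; v < numVirtualNodes; v++) {
                int position = mix((node + "#" + v).hashCode());
                while (wheel.containsKey(position)) {
                    position = (position + 1) & Integer.MAX_VALUE; // step past a collision
                }
                wheel.put(position, node);
            }
        }
    }

    // Murmur3-style bit mixer.
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h & Integer.MAX_VALUE;
    }

    // First wheel position at or after the key's hash, wrapping to the start if needed.
    String primaryOwner(Object key) {
        Map.Entry<Integer, String> entry = wheel.ceilingEntry(mix(key.hashCode()));
        return (entry != null ? entry : wheel.firstEntry()).getValue();
    }

    // Counts how many of the keys "key-0" .. "key-(n-1)" each node owns.
    Map<String, Integer> keysPerNode(int keyCount) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (int k = 0; k < keyCount; k++) {
            String owner = primaryOwner("key-" + k);
            Integer current = counts.get(owner);
            counts.put(owner, current == null ? 1 : current + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node1", "node2", "node3", "node4");
        for (int v : new int[] {1, 2, 64}) {
            System.out.println(v + " virtual node(s): " + new VirtualNodeWheel(nodes, v).keysPerNode(100000));
        }
    }
}
```

Running main with different values lets you compare the spread for your own key set, which is the kind of tuning the benchmark should drive.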


         Hotrod Concurrent Start-up
 • Dynamic scaling
    – Replicated ___hotRodTopologyCache holds current cluster topology




    – New starters must lock and update this cache to add themselves to
      the current view
    – Deadlock!
    – https://issues.jboss.org/browse/ISPN-1182
 • Stagger start-up


     Hotrod Client Failure Detection
         Unable to recover from cluster splits


    Hotrod Client Failure Detection
    • New servers only added to
      ___hotRodTopologyCache on start-up
    • Restart required to re-establish client topology view


   Hotrod Server Cluster Meltdown


   Hotrod Server Cluster Meltdown
     • Clients can’t start without an available
       server
     • Static Configuration is only read once
     • To restart client-server communications
       either
         – Restart last “known” server
         – Restart the client


                      Change of tack
     • Hotrod abandoned, for now
         –   Data distribution
         –   Concurrent start up
         –   Failure detection
         –   Unacceptable for this customer
     • Enter the classic embedded approach




     • How did we get this to work...


                    Dynamic Scaling
    • Unpredictable under heavy load, writers blocked
         – Unacceptable waits for this system
           <hash numOwners="2" rehashEnabled="false" />
         – Accept some data loss during a leave / join
    • Chunked rehashing / state transfer (5.1)
         – https://issues.jboss.org/browse/ISPN-284
    • Non-blocking state transfer
         – https://issues.jboss.org/browse/ISPN-1424
    • Manual rehashing
         – https://issues.jboss.org/browse/ISPN-1394


                 Cache Entry Size
    • Average cache entry ~6K
         – 1 million entries = 6GB
         – Hotrod stores serialized entries by default
    • JBoss Marshalling
         – Default Infinispan mechanism
         – Get reference from ComponentRegistry
    • JBoss Serialization
         – Quick, easy to implement
Part 1


         Compression Considerations
     • Trade
         – Capacity in JVM vs Serialization Overhead
     • Suitability
         – Assess on a cache by cache basis
         – Very high access is probably too expensive
     • Average 6K reduced to 1K
Part 1


           Advanced Cache Tuning
     cache.getAdvancedCache().withFlags(Flag... flags)
     • Flag.SKIP_REMOTE_LOOKUP
        – Prevents remote gets being run for an update
          put(K key, V value)
     DistributionInterceptor.remoteGetBeforeWrite()
     DistributionInterceptor.handleWriteCommand()
     DistributionInterceptor.visitPutKeyValueCommand()
         – We don’t need to return the previous cache entry
           value
Part 1


                         JGroups
    • UDP out-performed TCP (for us)
    • Discovery
         – For a cold, full cluster start-up avoid split
           brain / merge scenarios
           <PING timeout="3000" num_initial_members="10"/>

    • Heartbeat
         – Ensure failure detection is configured
           appropriately
           <FD_ALL interval="3000" timeout="10000"/>
Part 1


         Extending Embedded


         Current Production System
    • Over 20 nodes
         – 8 Request facing, remainder storage only
    • Over 15 million entries
         – 7.5 million unique
         – 20GB cached data
         – Nothing is evicted before natural expiration
    • 5GB JVM Heap, 3-4 second GC pauses
    • 30% reduction in response times
Part 1


                           Summary
     • Don’t compromise on the benchmarking
         – Understand your cached data profile
         – Functional testing is NOT sufficient
         – Monitoring and Analysis is essential
     • Tune Virtual Nodes for best distribution
     • Mitigate memory usage of embedded cache
         – Consider compressing embedded cache entries
         – Non request facing storage nodes
     • Distributed Infinispan outperforms EhCache
     • Don’t rule Hotrod out
         – Not acceptable for this customer
         – Many improvements and bug fixes


         Part 2 – Green Field SLAs
    New Pricing Engine
         –   Tomcat
         –   Spring & Grails
         –   Infinispan
         –   Oracle RAC
         –   Apache load-balancer / mod_jk
    Historical Pricing Engine
         – EhCache
         – MySQL
         – 2 second full Paris Query


                        Logical View
  • New Pricing Engine
         – Side by side rollout
         – Controller determines
           where to send requests and
           aggregates results
         – NOT Hibernate 2LC
         – Spring Interceptors
           containing logic to check /
           update cache wrap calls to
           DB that extract and
           generate cache entries


                Proposed Caching
    • Everything distributed
         – It worked before so we just turn it on, right?


                         The Pain
     • Distributed Mode
         – Network saturation on 1Gb switch
           (125MB/second) under load
         – Contention in org.jgroups
     • Performance SLAs
         – Caching data in Local mode required 14G heap &
           20 second GC pauses
     • Aggressive rollout strategy
         – Struggling at low user load


                    Cache Analysis

 • Eclipse Memory Analyzer
    – Identify cache profile
    – Small subset of elements account for
      almost all the space
    – Paris “Rates” sizes 20K – 1.6MB
    – Paris search (500 rates records) ==
      50MB total
    – 1Gb switch max throughput =
      125MB/second


              Revised Caching
    • Local caching for numerous “small” elements
    • Distributed for “large” expensive elements


                  Distributed Issue
 • Here’s why normal distributed doesn’t work
    – One Paris request requires 500 rates records (50MB)
    – 10-node distributed cluster = 1 in 5 chance data is local
    – 80% remote Gets == 40MB network traffic
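The arithmetic behind these figures, written out. The 1-in-5 chance matches 2 owners across 10 nodes, so numOwners=2 is assumed here. RemoteGetTraffic is a hypothetical helper name, and the saturation figure ignores protocol overhead and all other traffic.

```java
// Network cost of serving one Paris request from a distributed cache.
class RemoteGetTraffic {
    // Chance that the requesting node is one of the owners of a given key.
    static double localFraction(int nodes, int numOwners) {
        return (double) numOwners / nodes;
    }

    // Data that must be fetched from other nodes to satisfy one request.
    static double remoteMegabytes(double requestMegabytes, int nodes, int numOwners) {
        return requestMegabytes * (1.0 - localFraction(nodes, numOwners));
    }

    // Requests per second at which remote gets alone fill the link.
    static double requestsToSaturate(double linkMegabytesPerSecond, double remoteMegabytesPerRequest) {
        return linkMegabytesPerSecond / remoteMegabytesPerRequest;
    }

    public static void main(String[] args) {
        double remote = remoteMegabytes(50.0, 10, 2);
        System.out.printf("remote data per request: %.1f MB%n", remote);
        System.out.printf("requests/sec to fill a 125 MB/s link: %.2f%n",
                requestsToSaturate(125.0, remote));
    }
}
```

On these numbers, little more than three concurrent Paris searches per second are enough to saturate the switch.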


                             Options
  • Rewrite the application caching logic
         – Significantly reduce the element size
  • Run Local caching with oversized heap
         – Daily restart, eliminate full GC pauses
         – Large memory investment and careful management
  • Sacrifice caching and hit the DB
         – Hits response times and hammers the database
  • Distributed Execution?
         – Send a task to the data and extract just what you need


          Change in Psychology...
           If the mountain will not come to
         Muhammad, then Muhammad must go
                    to the mountain


                  Distributed Execution
  • DefaultExecutorService
         – http://docs.jboss.org/infinispan/5.1/apidocs/org/infinispan/distexec/DefaultExecutorService.html
  • Create the Distributed Execution Service to run on the cache
    node specified
     public DefaultExecutorService(Cache masterCacheNode)
  • Run task on primary owner of Key input
     public <T, K> Future<T> submit(Callable<T> task, K... input)
         – Resolve primary owner of Key then either
             • Run locally
             • Issue a remote command and run on the owning node
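The routing idea can be simulated with JDK classes alone. In this sketch each "node" is an in-memory map with its own worker thread, and a task is run on whichever node owns the key. MiniGrid is hypothetical and is not the Infinispan API. Ownership is a simple modulo rather than a consistent hash, and the rates are made-up integers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class MiniGrid {
    // One store and one worker thread per simulated node.
    private final List<Map<String, int[]>> stores = new ArrayList<Map<String, int[]>>();
    private final List<ExecutorService> workers = new ArrayList<ExecutorService>();

    MiniGrid(int nodes) {
        for (int i = 0; i < nodes; i++) {
            stores.add(new ConcurrentHashMap<String, int[]>());
            workers.add(Executors.newSingleThreadExecutor(runnable -> {
                Thread thread = new Thread(runnable);
                thread.setDaemon(true); // never keep the JVM alive
                return thread;
            }));
        }
    }

    int ownerOf(String key) {
        return Math.floorMod(key.hashCode(), stores.size());
    }

    void put(String key, int[] rates) {
        stores.get(ownerOf(key)).put(key, rates);
    }

    // Runs the extractor on the owning node; only its small result travels back.
    <R> Future<R> submit(String key, Function<int[], R> extractor) {
        int owner = ownerOf(key);
        Map<String, int[]> local = stores.get(owner);
        Callable<R> task = () -> extractor.apply(local.get(key)); // local get on the owner
        return workers.get(owner).submit(task);
    }

    // Waits for a result, converting checked exceptions to unchecked ones.
    static <R> R await(Future<R> future) {
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    void shutdown() {
        for (ExecutorService worker : workers) {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        MiniGrid grid = new MiniGrid(4);
        grid.put("paris", new int[] {120, 95, 240});
        Future<Integer> cheapest = grid.submit("paris", rates -> Arrays.stream(rates).min().getAsInt());
        System.out.println("cheapest rate: " + await(cheapest));
        grid.shutdown();
    }
}
```

The large rates array never leaves the node that owns it. The caller receives a single integer.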


                Pricing Controller
     • Callable task
         – Contains code to
           • Grab reference to local Spring Context
           • Load required beans
           • Spring interceptor checks cache at the owning
             node (local get)
           • If not found then go to the database, retrieve and
             update cache
           • Extract pricing based on request criteria
           • Return results
         – (Slide annotation “Existing Code” sits alongside the
           cache check and database steps)


                    Pricing Controller
 • Create a new DefaultExecutorService
    – Create callable tasks required to satisfy request
    – Issue callable tasks concurrently
    // Keep every Future so the results can be collected afterwards
    List<Future<T>> futures = new ArrayList<Future<T>>();
    while (moreKeys) {
       Callable<T> callable = new MyCallable<T>(...);
       futures.add(distributedExecutorService.submit(callable, key));
       ...
    }

    – Collate results and assemble response
    for (Future<T> future : futures) {
       T result = future.get();   // blocks until that task has finished
    }


           Distributed Execution
    • Only the relevant information from the cache
      entry is returned


                       Results
     • Latency – Paris search
         – Historic Engine 2 seconds
         – Dist-Exec 200ms
     • JVM
         – 5GB Heap
         – 3-4 second pauses


                        Limitations
     • Failover
         – Task sent to primary owner only
         – https://community.jboss.org/wiki/Infinispan60-DistributedExecutionEnhancements
         – Handle failures yourself
     • Hotrod not supported
         – This would be fantastic!
         – https://issues.jboss.org/browse/ISPN-1094
     • Both in 6.0?
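One simple way to "handle failures yourself" is to wrap the work in a bounded retry. The sketch below is generic JDK code, not an Infinispan facility. RetryingTask is a hypothetical helper, and what a retry should actually do (resubmit to the grid, run locally, fall back to the database) is an application decision that the slides leave open.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

class RetryingTask {
    // Runs the task up to maxAttempts times and returns the first successful result.
    static <T> T callWithRetry(Callable<T> task, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // do not retry once interrupted
                throw new IllegalStateException("interrupted", e);
            } catch (Exception e) {
                last = e; // remember the failure and try again
            }
        }
        throw new IllegalStateException("task failed after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated task that fails twice, as if the owning node had just left the cluster.
        String price = callWithRetry(() -> {
            if (calls.incrementAndGet() < 3) {
                throw new IllegalStateException("owning node unavailable");
            }
            return "price";
        }, 5);
        System.out.println(price + " after " + calls.get() + " attempts");
    }
}
```

In the pricing controller the Callable passed in would be the step that submits the distributed task and waits on its Future.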


                         Summary
     • Analysis and re-design of cached data
     • Accessing large data sets requires an
       alternative access pattern
     • Dramatically reduced latency
         – Parallel execution
         – Fraction of data transferred across the wire
     • Execution failures must be handled by
       application code, at the moment...
Thanks for Listening!

    Any Questions?

More Related Content

What's hot

PostgreSQL and Benchmarks
PostgreSQL and BenchmarksPostgreSQL and Benchmarks
PostgreSQL and Benchmarks
Jignesh Shah
 
Deep Dive into RDS PostgreSQL Universe
Deep Dive into RDS PostgreSQL UniverseDeep Dive into RDS PostgreSQL Universe
Deep Dive into RDS PostgreSQL Universe
Jignesh Shah
 
Fail over fail_back
Fail over fail_backFail over fail_back
Fail over fail_back
PostgreSQL Experts, Inc.
 
Postgres & Red Hat Cluster Suite
Postgres & Red Hat Cluster SuitePostgres & Red Hat Cluster Suite
Postgres & Red Hat Cluster Suite
EDB
 
Buytaert kris my_sql-pacemaker
Buytaert kris my_sql-pacemakerBuytaert kris my_sql-pacemaker
Buytaert kris my_sql-pacemakerkuchinskaya
 
Training Slides: Basics 102: Introduction to Tungsten Clustering
Training Slides: Basics 102: Introduction to Tungsten ClusteringTraining Slides: Basics 102: Introduction to Tungsten Clustering
Training Slides: Basics 102: Introduction to Tungsten Clustering
Continuent
 
Exchange 2013 Haute disponibilité et tolérance aux sinistres (Session 1/2 pre...
Exchange 2013 Haute disponibilité et tolérance aux sinistres (Session 1/2 pre...Exchange 2013 Haute disponibilité et tolérance aux sinistres (Session 1/2 pre...
Exchange 2013 Haute disponibilité et tolérance aux sinistres (Session 1/2 pre...
Microsoft Technet France
 
Exchange Server 2013 : les mécanismes de haute disponibilité et la redondance...
Exchange Server 2013 : les mécanismes de haute disponibilité et la redondance...Exchange Server 2013 : les mécanismes de haute disponibilité et la redondance...
Exchange Server 2013 : les mécanismes de haute disponibilité et la redondance...
Microsoft Technet France
 
MariaDB High Availability Webinar
MariaDB High Availability WebinarMariaDB High Availability Webinar
MariaDB High Availability Webinar
MariaDB plc
 
Percona XtraDB Cluster SF Meetup
Percona XtraDB Cluster SF MeetupPercona XtraDB Cluster SF Meetup
Percona XtraDB Cluster SF MeetupVadim Tkachenko
 
Garbage First and you
Garbage First and youGarbage First and you
Garbage First and you
Kai Koenig
 
인피니스팬데이터그리드따라잡기 (@JCO 2014)
인피니스팬데이터그리드따라잡기 (@JCO 2014)인피니스팬데이터그리드따라잡기 (@JCO 2014)
인피니스팬데이터그리드따라잡기 (@JCO 2014)
Jaehong Cheon
 
Choosing the right high availability strategy
Choosing the right high availability strategyChoosing the right high availability strategy
Choosing the right high availability strategy
MariaDB plc
 
Apache Performance Tuning: Scaling Up
Apache Performance Tuning: Scaling UpApache Performance Tuning: Scaling Up
Apache Performance Tuning: Scaling Up
Sander Temme
 
How to understand Galera Cluster - 2013
How to understand Galera Cluster - 2013How to understand Galera Cluster - 2013
How to understand Galera Cluster - 2013
Codership Oy - Creators of Galera Cluster
 
Replication Solutions for PostgreSQL
Replication Solutions for PostgreSQLReplication Solutions for PostgreSQL
Replication Solutions for PostgreSQLPeter Eisentraut
 
VMworld 2013: Performance and Capacity Management of DRS Clusters
VMworld 2013: Performance and Capacity Management of DRS Clusters VMworld 2013: Performance and Capacity Management of DRS Clusters
VMworld 2013: Performance and Capacity Management of DRS Clusters
VMworld
 
Galera Cluster 3.0 Features
Galera Cluster 3.0 FeaturesGalera Cluster 3.0 Features
Galera Cluster 3.0 Features
Codership Oy - Creators of Galera Cluster
 
What's New and Upcoming in HDFS - the Hadoop Distributed File System
What's New and Upcoming in HDFS - the Hadoop Distributed File SystemWhat's New and Upcoming in HDFS - the Hadoop Distributed File System
What's New and Upcoming in HDFS - the Hadoop Distributed File System
Cloudera, Inc.
 
Reactive Supply To Changing Demand
Reactive Supply To Changing DemandReactive Supply To Changing Demand
Reactive Supply To Changing Demand
Jonas Bonér
 

What's hot (20)

PostgreSQL and Benchmarks
PostgreSQL and BenchmarksPostgreSQL and Benchmarks
PostgreSQL and Benchmarks
 
Deep Dive into RDS PostgreSQL Universe
Deep Dive into RDS PostgreSQL UniverseDeep Dive into RDS PostgreSQL Universe
Deep Dive into RDS PostgreSQL Universe
 
Fail over fail_back
Fail over fail_backFail over fail_back
Fail over fail_back
 
Postgres & Red Hat Cluster Suite
Postgres & Red Hat Cluster SuitePostgres & Red Hat Cluster Suite
Postgres & Red Hat Cluster Suite
 
Buytaert kris my_sql-pacemaker
Buytaert kris my_sql-pacemakerBuytaert kris my_sql-pacemaker
Buytaert kris my_sql-pacemaker
 
Training Slides: Basics 102: Introduction to Tungsten Clustering
Training Slides: Basics 102: Introduction to Tungsten ClusteringTraining Slides: Basics 102: Introduction to Tungsten Clustering
Training Slides: Basics 102: Introduction to Tungsten Clustering
 
Exchange 2013 Haute disponibilité et tolérance aux sinistres (Session 1/2 pre...
Exchange 2013 Haute disponibilité et tolérance aux sinistres (Session 1/2 pre...Exchange 2013 Haute disponibilité et tolérance aux sinistres (Session 1/2 pre...
Exchange 2013 Haute disponibilité et tolérance aux sinistres (Session 1/2 pre...
 
Exchange Server 2013 : les mécanismes de haute disponibilité et la redondance...
Exchange Server 2013 : les mécanismes de haute disponibilité et la redondance...Exchange Server 2013 : les mécanismes de haute disponibilité et la redondance...
Exchange Server 2013 : les mécanismes de haute disponibilité et la redondance...
 
MariaDB High Availability Webinar
MariaDB High Availability WebinarMariaDB High Availability Webinar
MariaDB High Availability Webinar
 
Percona XtraDB Cluster SF Meetup
Percona XtraDB Cluster SF MeetupPercona XtraDB Cluster SF Meetup
Percona XtraDB Cluster SF Meetup
 
Garbage First and you
Garbage First and youGarbage First and you
Garbage First and you
 
인피니스팬데이터그리드따라잡기 (@JCO 2014)
인피니스팬데이터그리드따라잡기 (@JCO 2014)인피니스팬데이터그리드따라잡기 (@JCO 2014)
인피니스팬데이터그리드따라잡기 (@JCO 2014)
 
Choosing the right high availability strategy
Choosing the right high availability strategyChoosing the right high availability strategy
Choosing the right high availability strategy
 
Apache Performance Tuning: Scaling Up
Apache Performance Tuning: Scaling UpApache Performance Tuning: Scaling Up
Apache Performance Tuning: Scaling Up
 
How to understand Galera Cluster - 2013
How to understand Galera Cluster - 2013How to understand Galera Cluster - 2013
How to understand Galera Cluster - 2013
 
Replication Solutions for PostgreSQL
Replication Solutions for PostgreSQLReplication Solutions for PostgreSQL
Replication Solutions for PostgreSQL
 
VMworld 2013: Performance and Capacity Management of DRS Clusters
VMworld 2013: Performance and Capacity Management of DRS Clusters VMworld 2013: Performance and Capacity Management of DRS Clusters
VMworld 2013: Performance and Capacity Management of DRS Clusters
 
Galera Cluster 3.0 Features
Galera Cluster 3.0 FeaturesGalera Cluster 3.0 Features
Galera Cluster 3.0 Features
 
What's New and Upcoming in HDFS - the Hadoop Distributed File System
What's New and Upcoming in HDFS - the Hadoop Distributed File SystemWhat's New and Upcoming in HDFS - the Hadoop Distributed File System
What's New and Upcoming in HDFS - the Hadoop Distributed File System
 
Reactive Supply To Changing Demand
Reactive Supply To Changing DemandReactive Supply To Changing Demand
Reactive Supply To Changing Demand
 

Similar to Infinispan from POC to Production

Apache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling OutApache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling Out
Sander Temme
 
Performance out
Performance outPerformance out
Performance outJack Huang
 
Performance out
Performance outPerformance out
Performance outJack Huang
 
Performance_Out.pptx
Performance_Out.pptxPerformance_Out.pptx
Performance_Out.pptx
sanjanabal
 
Performance out
Performance outPerformance out
Performance outJack Huang
 
Performance out
Performance outPerformance out
Performance out
Jack Huang
 
Performance out
Performance outPerformance out
Performance out
Dragan Kolarevic
 
Performance out
Performance outPerformance out
Performance out
test account
 
Performance out
Performance outPerformance out
Performance out
Moderator_kursa
 
Performance out
Performance outPerformance out
Performance outJack Huang
 
Performance out
Performance outPerformance out
Performance out
Andrea Martinez
 
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward
 
Diagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraDiagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - Cassandra
Jon Haddad
 
Slides for the Apache Geode Hands-on Meetup and Hackathon Announcement
Slides for the Apache Geode Hands-on Meetup and Hackathon Announcement Slides for the Apache Geode Hands-on Meetup and Hackathon Announcement
Slides for the Apache Geode Hands-on Meetup and Hackathon Announcement VMware Tanzu
 
ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?
Jagadish Venkatraman
 
GeeCon- 'www.NoSQL.com' by Mark Addy
GeeCon- 'www.NoSQL.com' by Mark Addy GeeCon- 'www.NoSQL.com' by Mark Addy
GeeCon- 'www.NoSQL.com' by Mark Addy
C2B2 Consulting
 

Similar to Infinispan from POC to Production (20)

Apache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling OutApache Performance Tuning: Scaling Out
Apache Performance Tuning: Scaling Out
 
Performance out
Performance outPerformance out
Performance out
 
Performance out
Performance outPerformance out
Performance out
 
Performance_Out.pptx
Performance_Out.pptxPerformance_Out.pptx
Performance_Out.pptx
 
2 7
2 72 7
2 7
 
Performance out
Performance outPerformance out
Performance out
 
title
titletitle
title
 
Performance out
Performance outPerformance out
Performance out
 
Performance out
Performance outPerformance out
Performance out
 
Performance out
Performance outPerformance out
Performance out
 
Performance out
Performance outPerformance out
Performance out
 
Performance out
Performance outPerformance out
Performance out
 
Performance out
Performance outPerformance out
Performance out
 
Performance out
Performance outPerformance out
Performance out
 
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
Flink Forward SF 2017: Feng Wang & Zhijiang Wang - Runtime Improvements in Bl...
 
Diagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - CassandraDiagnosing Problems in Production - Cassandra
Diagnosing Problems in Production - Cassandra
 
Slides for the Apache Geode Hands-on Meetup and Hackathon Announcement
Slides for the Apache Geode Hands-on Meetup and Hackathon Announcement Slides for the Apache Geode Hands-on Meetup and Hackathon Announcement
Slides for the Apache Geode Hands-on Meetup and Hackathon Announcement
 
Performance out
Performance outPerformance out
Performance out
 
ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?ApacheCon BigData - What it takes to process a trillion events a day?
ApacheCon BigData - What it takes to process a trillion events a day?
 
GeeCon- 'www.NoSQL.com' by Mark Addy
GeeCon- 'www.NoSQL.com' by Mark Addy GeeCon- 'www.NoSQL.com' by Mark Addy
GeeCon- 'www.NoSQL.com' by Mark Addy
 

More from JBUG London

London JBUG April 2015 - Performance Tuning Apps with WildFly Application Server
London JBUG April 2015 - Performance Tuning Apps with WildFly Application ServerLondon JBUG April 2015 - Performance Tuning Apps with WildFly Application Server
London JBUG April 2015 - Performance Tuning Apps with WildFly Application Server
JBUG London
 
WebSocketson WildFly
WebSocketson WildFly WebSocketson WildFly
WebSocketson WildFly
JBUG London
 
Hacking on WildFly 9
Hacking on WildFly 9Hacking on WildFly 9
Hacking on WildFly 9
JBUG London
 
Introduction to PicketLink
Introduction to PicketLinkIntroduction to PicketLink
Introduction to PicketLink
JBUG London
 
Extending WildFly
Extending WildFlyExtending WildFly
Extending WildFly
JBUG London
 
What's New in Infinispan 6.0
What's New in Infinispan 6.0What's New in Infinispan 6.0
What's New in Infinispan 6.0
JBUG London
 
Compensating Transactions: When ACID is too much
Compensating Transactions: When ACID is too muchCompensating Transactions: When ACID is too much
Compensating Transactions: When ACID is too much
JBUG London
 
London JBUG - Connecting Applications Everywhere with JBoss A-MQ
London JBUG - Connecting Applications Everywhere with JBoss A-MQLondon JBUG - Connecting Applications Everywhere with JBoss A-MQ
London JBUG - Connecting Applications Everywhere with JBoss A-MQ
JBUG London
 
Easy Integration with Apache Camel and Fuse IDE
Easy Integration with Apache Camel and Fuse IDEEasy Integration with Apache Camel and Fuse IDE
Easy Integration with Apache Camel and Fuse IDE
JBUG London
 
jBPM5 - The Evolution of BPM Systems
jBPM5 - The Evolution of BPM SystemsjBPM5 - The Evolution of BPM Systems
jBPM5 - The Evolution of BPM Systems
JBUG London
 
Arquillian - Integration Testing Made Easy
Arquillian - Integration Testing Made EasyArquillian - Integration Testing Made Easy
Arquillian - Integration Testing Made Easy
JBUG London
 
Hibernate OGM - JPA for Infinispan and NoSQL
Hibernate OGM - JPA for Infinispan and NoSQLHibernate OGM - JPA for Infinispan and NoSQL
Hibernate OGM - JPA for Infinispan and NoSQL
JBUG London
 
JBoss jBPM, the future is now for all your Business Processes by Eric Schabell
JBoss jBPM, the future is now for all your Business Processes by Eric SchabellJBoss jBPM, the future is now for all your Business Processes by Eric Schabell
JBoss jBPM, the future is now for all your Business Processes by Eric Schabell
JBUG London
 
JBoss AS7 by Matt Brasier
JBoss AS7 by Matt BrasierJBoss AS7 by Matt Brasier
JBoss AS7 by Matt Brasier
JBUG London
 

More from JBUG London (14)

London JBUG April 2015 - Performance Tuning Apps with WildFly Application Server



Infinispan from POC to Production

  • 3. Who am I? Mark Addy, Senior Consultant Fast, Reliable, Secure, Manageable
  • 4. Agenda Part 1 • An existing production system unable to scale Part 2 • A green-field project unable to meet SLAs
  • 5. About the Customer • Global on-line travel & accommodation provider – 50 million searches per day • Our relationship – Troubleshooting – Workshops
  • 6. Part 1 Part 1 – Existing Application Connectivity Engine • Supplements site content with data from third parties (Content Providers) – Tomcat – Spring – EhCache – MySQL – Apache load-balancer / mod_jk
  • 8. Part 1 Content Provider Challenges • Unreliable third party systems • Distant network communications • Critical for generating local site content • Response time • Choice & low response time == more profit
  • 9. Part 1 Existing Cache • NOT Hibernate 2LC • Spring Interceptors wrap calls to content providers <bean id="searchService" class="org.springframework.aop.framework.ProxyFactoryBean"> <property name="proxyInterfaces" value="ISearchServiceTargetBean"/> <property name="target" ref="searchServiceTargetBean"/> <property name="interceptorNames"> <list> <value>cacheInterceptor</value> </list> </property> </bean> <bean id="searchServiceTargetBean" class="SearchServiceTargetBean"> ... </bean>
  • 10. Part 1 Extreme Redundancy 800,000 elements 10 nodes = 10 copies of data
  • 11. Part 1 The Price • 10G JVM Heap – 10-12 second pauses for major GC – Over 8G of heap is cache • Eviction before Expiry – More trips to content providers • EhCache expiry / eviction piggybacks client cache access
  • 12. Part 1 How to Scale?
  • 13. Part 1 Objectives • Reduce JVM Heap Size – 10 second pauses are too long • Increase cache capacity • Remove Eviction – Cache entries should expire naturally • Improve Response Times – Latency decreases if eviction, GC pauses and frequency are reduced
  • 14. Part 1 Discussions • Pre-sales workshop – Express Terracotta EhCache – Oracle Coherence – Infinispan
  • 15. Part 1 Why Infinispan? • Open source advocates • Cutting edge technology
  • 16. Part 1 Benchmarking • Must be reproducible • Must accurately reflect the production load and data – 50 million searches / day ≈ 600 / sec • Must be able to imitate the content providers
  • 17. Part 1 Solution • Replica load-test environment • Port mirror production traffic – Capture incoming requests – Capture content provider responses • Custom JMeter script • Mock application Spring Beans
  • 18. Part 1 Benchmarking Architecture
  • 19. Part 1 Benchmarking Validation • Understand your cached data – jmap: jmap -dump:file=mydump.hprof <pid> – Eclipse Memory Analyzer – OQL: SELECT toString(x.key), x.key.@retainedHeapSize, x.value.@retainedHeapSize FROM net.sf.ehcache.Element x
  • 20. Part 1 Benchmarking Validation Extract cached object properties - you can learn a lot quickly – creationTime – timeToLive – lastAccessTime – timeToIdle – lastUpdateTime – hitCount – etc
  • 21. Part 1 Enable JMX for Infinispan Enable CacheManager Statistics: <global> <globalJmxStatistics enabled="true" jmxDomain="org.infinispan" cacheManagerName="MyCacheManager"/> ... </global> Enable Cache Statistics: <default> <jmxStatistics enabled="true"/> ... </default>
  • 22. Part 1 Enable Remote JMX -Dcom.sun.management.jmxremote.port=nnnn -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
  • 23. Part 1 Record Performance • RHQ http://rhq-project.org – JVM memory, GC profile, CPU usage – Infinispan plugin
  • 24. Part 1 Infinispan
  • 25. Part 1 Distributed Mode hash(key) determines owners
  • 26. Part 1 Distribution Features • Configurable redundancy – numOwners • Dynamic scaling – Automatic rebalancing for distribution and recovery of redundancy • Replication (distribution) overhead does not increase as more nodes are added
  • 27. Part 1 Hotrod • Client – Server architecture – Java client – Connection pooling – Dynamic scaling – Smart routing • Separate application and cache memory requirements
  • 28. Part 1 Application – Cache Separation Application • CPU intensive • High infant mortality Cache • Low CPU requirement • Mortality linked to expiry / eviction
  • 29. Part 1 Hotrod Architecture
  • 30. Part 1 Remember this is cutting edge • Latest final release was 4.2.1 • Let's get cracking... – Distributed mode – Hotrod client – What issues did we encounter...
  • 31. Part 1 Topology Aware Consistent Hash • Ensure back-ups are held preferentially on separate machine, rack and site • https://community.jboss.org/thread/168236 • We need to upgrade to the latest 5.0.0.CR
  • 32. Part 1 Virtual Nodes Sub-divides hash wheel positions <hash numVirtualNodes="2"/>
  • 33. Part 1 Virtual Nodes • Improves data distribution • But didn’t work at the time for Hotrod • https://issues.jboss.org/browse/ISPN-1217
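
The hash wheel and virtual nodes described in slides 25, 32 and 33 can be illustrated with a small JDK-only sketch. This is not Infinispan's consistent hash implementation: the class name `HashWheel`, the use of CRC32 as the hash function and the node names are all assumptions made for the example. It only shows the idea that each physical node takes several positions on the wheel, and that the owners of a key are the first distinct nodes found walking clockwise from hash(key).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.zip.CRC32;

/** Toy hash wheel: every physical node occupies several positions (virtual nodes). */
final class HashWheel {
    private final TreeMap<Long, String> wheel = new TreeMap<>();
    private final int nodeCount;

    HashWheel(List<String> nodes, int virtualNodes) {
        nodeCount = nodes.size();
        for (String node : nodes) {
            for (int v = 0; v < virtualNodes; v++) {
                wheel.put(hash(node + "#" + v), node);
            }
        }
    }

    // CRC32 is an arbitrary choice (it ships with the JDK), not what Infinispan uses.
    static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    /** Walk clockwise from hash(key) and collect up to numOwners distinct physical nodes. */
    List<String> owners(String key, int numOwners) {
        int wanted = Math.min(numOwners, nodeCount);
        List<String> owners = new ArrayList<>();
        long start = hash(key);
        // Positions at or after the key first, then wrap around to the start of the wheel.
        for (NavigableMap<Long, String> part :
                List.of(wheel.tailMap(start, true), wheel.headMap(start, false))) {
            for (String node : part.values()) {
                if (!owners.contains(node)) {
                    owners.add(node);
                }
                if (owners.size() == wanted) {
                    return owners;
                }
            }
        }
        return owners;
    }

    public static void main(String[] args) {
        HashWheel wheel = new HashWheel(List.of("node-a", "node-b", "node-c"), 2);
        System.out.println(wheel.owners("search:paris", 2));
    }
}
```

Raising the `virtualNodes` argument gives each node more, smaller segments of the wheel, which is why the slides report better data distribution with virtual nodes enabled.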
  • 34. Part 1 Hotrod Concurrent Start-up • Dynamic scaling – Replicated ___hotRodTopologyCache holds current cluster topology – New starters must lock and update this cache to add themselves to the current view – Deadlock! – https://issues.jboss.org/browse/ISPN-1182 • Stagger start-up
  • 35. Part 1 Hotrod Client Failure Detection Unable to recover from cluster splits
  • 36. Part 1 Hotrod Client Failure Detection • New servers only added to ___hotRodTopologyCache on start-up • Restart required to re-establish client topology view
  • 37. Part 1 Hotrod Server Cluster Meltdown
  • 38. Part 1 Hotrod Server Cluster Meltdown • Clients can’t start without an available server • Static Configuration is only read once • To restart client-server communications either – Restart last “known” server – Restart the client
  • 39. Part 1 Change of tack • Hotrod abandoned, for now – Data distribution – Concurrent start up – Failure detection – Unacceptable for this customer • Enter the classic embedded approach • How did we get this to work...
  • 40. Part 1 Dynamic Scaling • Unpredictable under heavy load, writers blocked – Unacceptable waits for this system <hash numOwners="2" rehashEnabled="false" /> – Accept some data loss during a leave / join • Chunked rehashing / state transfer (5.1) – https://issues.jboss.org/browse/ISPN-284 • Non-blocking state transfer – https://issues.jboss.org/browse/ISPN-1424 • Manual rehashing – https://issues.jboss.org/browse/ISPN-1394
  • 41. Part 1 Cache Entry Size • Average cache entry ~6K – 1 million entries = 6GB – Hotrod stores serialized entries by default • JBoss Marshalling – Default Infinispan mechanism – Get reference from ComponentRegistry • JBoss Serialization – Quick, easy to implement
  • 42. Part 1 Compression Considerations • Trade – Capacity in JVM vs Serialization Overhead • Suitability – Assess on a cache by cache basis – Very high access is probably too expensive • Average 6K reduced to 1K
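
The capacity versus overhead trade-off in slide 42 can be tried out with a minimal sketch. The slides do not say which compression algorithm was used, so GZIP from `java.util.zip` is an assumption here, as are the class name `EntryCompressor` and the sample payload. The input stands for an already serialized cache entry.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Compresses serialized cache entries before they are stored (GZIP is an assumed choice). */
final class EntryCompressor {

    static byte[] compress(byte[] serialized) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(serialized);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    static byte[] decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gzip.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical, repetitive payload standing in for a serialized cache entry.
        byte[] entry = "<rate currency=\"EUR\" board=\"BB\"/>"
                .repeat(200).getBytes(StandardCharsets.UTF_8);
        byte[] packed = compress(entry);
        System.out.println(entry.length + " bytes -> " + packed.length + " bytes");
    }
}
```

Every read pays for `decompress` and every write pays for `compress`, which is why the slide suggests assessing suitability cache by cache and avoiding compression on very frequently accessed entries.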
  • 43. Part 1 Advanced Cache Tuning cache.getAdvancedCache().withFlags(Flag... flags) • Flag.SKIP_REMOTE_LOOKUP – Prevents remote gets being run for an update put(K key, V value) DistributionInterceptor.remoteGetBeforeWrite() DistributionInterceptor.handleWriteCommand() DistributionInterceptor.visitPutKeyValueCommand() – We don’t need to return the previous cache entry value
  • 44. Part 1 JGroups • UDP out-performed TCP (for us) • Discovery – For a cold, full cluster start-up avoid split brain / merge scenarios <PING timeout="3000" num_initial_members="10"/> • Heartbeat – Ensure failure detection is configured appropriately <FD_ALL interval="3000" timeout="10000"/>
  • 45. Part 1 Extending Embedded
  • 46. Part 1 Current Production System • Over 20 nodes – 8 Request facing, remainder storage only • Over 15 million entries – 7.5 million unique – 20GB cached data – Nothing is evicted before natural expiration • 5GB JVM Heap, 3-4 second GC pauses • 30% reduction in response times
  • 47. Part 1 Summary • Don’t compromise on the benchmarking – Understand your cached data profile – Functional testing is NOT sufficient – Monitoring and Analysis is essential • Tune Virtual Nodes for best distribution • Mitigate memory usage of embedded cache – Consider compressing embedded cache entries – Non request facing storage nodes • Distributed Infinispan outperforms EhCache • Don’t rule Hotrod out – Not acceptable for this customer – Many improvements and bug fixes
  • 48. Part 2 Part 2 – Green Field SLAs New Pricing Engine – Tomcat – Spring & Grails – Infinispan – Oracle RAC – Apache load-balancer / mod_jk Historical Pricing Engine – EhCache – MySQL – 2 second full Paris Query
  • 49. Part 2 Logical View • New Pricing Engine – Side by side rollout – Controller determines where to send requests and aggregates results – NOT Hibernate 2LC – Spring Interceptors containing logic to check / update cache wrap calls to DB that extract and generate cache entries
  • 50. Part 2 Proposed Caching • Everything distributed – It worked before so we just turn it on, right?
  • 51. Part 2 The Pain • Distributed Mode – Network saturation on 1Gb switch (125MB/second) under load – Contention in org.jgroups • Performance SLAs – Caching data in Local mode required 14G heap & 20 second GC pauses • Aggressive rollout strategy – Struggling at low user load
  • 52. Part 2 Cache Analysis • Eclipse Memory Analyzer – Identify cache profile – Small subset of elements account for almost all the space – Paris “Rates” sizes 20K – 1.6MB – Paris search (500 rates records) == 50MB total – 1Gb switch max throughput = 125MB/second
  • 53. Part 2 Revised Caching • Local caching for numerous “small” elements • Distributed for “large” expensive elements
  • 54. Part 2 Distributed Issue • Here’s why normal distributed doesn’t work – One Paris request requires 500 rates records (50MB) – 10 nodes distributed cluster = 1 in 5 chance data is local – 80% remote Gets == 40MB network traffic
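
The arithmetic behind slide 54 can be checked with a short calculation. The slides give the cluster size (10 nodes), the request size (50MB) and the switch limit (125MB/second); the "1 in 5" figure implies two copies of each entry, so `numOwners = 2` is an inference rather than something the slide states. The class and method names are hypothetical.

```java
/** Back-of-envelope model of remote get traffic in distributed mode. */
final class RemoteGetTraffic {

    /** Fraction of entries a node holds locally when each entry lives on numOwners of clusterSize nodes. */
    static double localFraction(int clusterSize, int numOwners) {
        return Math.min(1.0, (double) numOwners / clusterSize);
    }

    /** Megabytes fetched from other nodes to assemble one request of requestMb megabytes. */
    static double remoteMbPerRequest(double requestMb, int clusterSize, int numOwners) {
        return requestMb * (1.0 - localFraction(clusterSize, numOwners));
    }

    public static void main(String[] args) {
        double remoteMb = remoteMbPerRequest(50.0, 10, 2); // numOwners = 2 is an assumption
        double switchMbPerSec = 125.0;                      // 1Gb switch, figure from the slides
        System.out.printf("Remote traffic per Paris search: %.1f MB%n", remoteMb);
        System.out.printf("Paris searches per second before the switch saturates: %.2f%n",
                switchMbPerSec / remoteMb);
    }
}
```

Under these figures only a handful of concurrent Paris searches are enough to fill the switch, which matches the network saturation at low user load reported in slide 51.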
  • 55. Part 2 Options • Rewrite the application caching logic – Significantly reduce the element size • Run Local caching with oversized heap – Daily restart, eliminate full GC pauses – Large memory investment and careful management • Sacrifice caching and hit the DB – Hits response times and hammers the database • Distributed Execution? – Send a task to the data and extract just what you need
  • 56. Part 2 Change in Psychology... If the mountain will not come to Muhammad, then Muhammad must go to the mountain
  • 57. Part 2 Distributed Execution • DefaultExecutorService – http://docs.jboss.org/infinispan/5.1/apidocs/org/infinispan/distexec/DefaultExecutorService.html • Create the Distributed Execution Service to run on the cache node specified public DefaultExecutorService(Cache masterCacheNode) • Run task on primary owner of Key input public Future<T> submit(Callable<T> task, K... input) – Resolve primary owner of Key then either • Run locally • Issue a remote command and run on the owning node
  • 58. Part 2 Pricing Controller • Callable task – Contains code to • Grab reference to local Spring Context • Load required beans • Spring interceptor checks cache at the owning node (local get) [existing code] • If not found then go to the database, retrieve and update cache • Extract pricing based on request criteria • Return results
  • 59. Part 2 Pricing Controller • Create a new DefaultExecutorService – Create callable tasks required to satisfy request – Issue callable tasks concurrently while (moreKeys) { Callable<T> callable = new MyCallable<T>(...); Future<T> future = distributedExecutorService.submit(callable, key); ... } – Collate results and assemble response while (moreFutures) { T result = future.get(); }
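
The "send the task to the data" pattern from slides 57 to 59 can be simulated without a cluster. The sketch below uses only JDK executors, not Infinispan's DefaultExecutorService: `MiniCluster`, `PricingControllerSketch`, the modulo ownership rule and the "cheapest rate" extraction are all hypothetical stand-ins. Each simulated node has its own store and worker thread, the task runs where the key lives, and only a single number travels back to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** JDK-only stand-in for a cluster: each "node" has a private store and one worker thread. */
final class MiniCluster implements AutoCloseable {
    private final List<Map<String, int[]>> stores = new ArrayList<>();
    private final List<ExecutorService> workers = new ArrayList<>();

    MiniCluster(int nodes) {
        for (int i = 0; i < nodes; i++) {
            stores.add(new ConcurrentHashMap<>());
            workers.add(Executors.newSingleThreadExecutor());
        }
    }

    // Simplistic ownership rule for the sketch; a real grid uses a consistent hash.
    private int ownerOf(String key) {
        return Math.floorMod(key.hashCode(), stores.size());
    }

    void put(String key, int[] rates) {
        stores.get(ownerOf(key)).put(key, rates);
    }

    /** Runs the extraction on the owning node; only the small result is returned. */
    Future<Integer> submitCheapest(String key) {
        int owner = ownerOf(key);
        Callable<Integer> task = () -> {
            int[] rates = stores.get(owner).get(key); // local get on the owning node
            int best = Integer.MAX_VALUE;
            for (int rate : rates) {
                best = Math.min(best, rate);
            }
            return best;
        };
        return workers.get(owner).submit(task);
    }

    @Override
    public void close() {
        workers.forEach(ExecutorService::shutdown);
    }
}

final class PricingControllerSketch {

    /** Issues one task per key concurrently, then collates the results. */
    static int cheapest(MiniCluster cluster, List<String> keys) {
        List<Future<Integer>> futures = new ArrayList<>();
        for (String key : keys) {
            futures.add(cluster.submitCheapest(key));
        }
        int best = Integer.MAX_VALUE;
        for (Future<Integer> future : futures) {
            try {
                best = Math.min(best, future.get());
            } catch (InterruptedException | ExecutionException e) {
                // Failures are the caller's problem in this sketch, as in the slides.
                throw new IllegalStateException(e);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        try (MiniCluster cluster = new MiniCluster(3)) {
            cluster.put("paris:hotel-1", new int[] {120, 95, 140});
            cluster.put("paris:hotel-2", new int[] {88, 210});
            System.out.println(cheapest(cluster, List.of("paris:hotel-1", "paris:hotel-2")));
        }
    }
}
```

The large `int[]` arrays never leave the node that stores them, which is the point the following slide makes about returning only the relevant information from each cache entry.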
  • 60. Part 2 Distributed Execution • Only the relevant information from the cache entry is returned
  • 61. Part 2 Results • Latency – Paris search – Historic Engine 2 seconds – Dist-Exec 200ms • JVM – 5GB Heap – 3-4 second pauses
  • 62. Part 2 Limitations • Failover – Task sent to primary owner only – https://community.jboss.org/wiki/Infinispan60-DistributedExecutionEnhancements – Handle failures yourself • Hotrod not supported – This would be fantastic! – https://issues.jboss.org/browse/ISPN-1094 • Both in 6.0?
  • 63. Part 2 Summary • Analysis and re-design of cached data • Accessing large data sets requires an alternative access pattern • Dramatically reduced latency – Parallel execution – Fraction of data transferred across the wire • Execution failures must be handled by application code, at the moment...
  • 64. Thanks for Listening! Any Questions?