Making the Most of In-Memory: More than Speed



The Briefing Room with Robin Bloor and Kognitio
Live Webcast Oct. 1, 2013
Watch the archive:

Everyone’s talking about in-memory these days, and the term has become synonymous with speed. But pinning data into memory is just the beginning, and it’s about more than speed. In-memory solutions need a tailored architecture, one that can take full advantage of RAM processing in every respect, and this requires an approach that considers memory and CPU from the ground up.

Register for this episode of The Briefing Room to hear from veteran Analyst Robin Bloor as he explains how memory is on the fast track to supersede disk, at least with respect to advanced analytics. He’ll be briefed by Kognitio CTO Roger Gaskell, who has pioneered the in-memory analytical platform since its inception in 1989. Gaskell will also discuss how this type of solution changes the landscape for the modern data architecture and what it means for advanced analytical capabilities.

Visit for more information

Published in: Technology, Business


  1. 1. Making the Most of In-Memory: More than Speed The Briefing Room
  2. 2. Welcome Host: Eric Kavanagh Twitter Tag: #briefr The Briefing Room
  3. 3. Mission •  Reveal the essential characteristics of enterprise software, good and bad •  Provide a forum for detailed analysis of today’s innovative technologies •  Give vendors a chance to explain their product to savvy analysts •  Allow audience members to pose serious questions... and get answers!
  4. 4. Topics This Month: DATA PROCESSING November: DATA DISCOVERY & VISUALIZATION December: INNOVATORS
  5. 5. Data Processing “Efficiency is doing things right; effectiveness is doing the right things.” ~Peter Drucker
  6. 6. Analyst: Robin Bloor Robin Bloor is Chief Analyst at The Bloor Group
  7. 7. Kognitio •  Founded in 1989, Kognitio is both an in-memory database and an analytical engine •  The Kognitio Analytical Platform can be deployed as software, as an appliance, or in the cloud •  The platform enables flexible, ad hoc queries on complex data sets, including data from Hadoop, and it offers scale-up and scale-out capabilities
  8. 8. Guest: Roger Gaskell Roger Gaskell is the Chief Technology Officer and one of the founding members of the Kognitio Development Team. He has overall responsibility for all product development, strategic direction and roadmap of new innovation for the Kognitio Analytical Platform. Roger has been instrumental in all generations of the product to date. Over this time, it has evolved from an appliance-based system in the original beta offering in 1989, to hardware-independent software for x86 processing, then to a cloud-based Platform-as-a-Service offering in the mid-1990s. Prior to Kognitio, Roger was test and development manager at AB Electronics. During this time his primary responsibility was for the famous BBC Micro Computer and the development and testing of the first mass production of personal computers for IBM.
  9. 9. Making the most of in-memory platforms October 2013
  10. 10. What is an “In-memory” analytical platform? A database where queries are run from data held in computer memory (RAM) rather than mechanical disk. Memory = Fast / Disk = Slow. Analytics go much quicker – SIMPLE? Unfortunately, it’s not as simple as that…
  11. 11. Why in-memory: RAM is faster than disk (really!) Actually, this is only part of the story: •  Analytics completely change the workload characteristics on the database •  Simple reporting & transactional processing is all about “filtering” the data of interest •  Analytics is all about complex “crunching” of the data once it is filtered •  Crunching needs processing power & consumes CPU cycles •  Storing data on physical disks severely limits the rate at which data can be provided to the CPUs •  Accessing data directly from RAM allows much more CPU power to be deployed
  12. 12. Analytics is about crunching through data •  “CRUNCHING” is CPU cycle-intensive & CPU-bound: analytical functions, joins, aggregations, sorts, grouping •  To understand what is happening in the data •  More complex analytics = more pronounced this becomes •  In-memory analytical platforms are therefore CPU-bound –  Assume disk I/O speeds not a bottleneck –  In-memory removes the disk I/O bottleneck
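The distinction slides 11 and 12 draw between "filtering" and "crunching" can be illustrated with a minimal sketch. The tables, column names and threshold below are hypothetical and chosen only for illustration; they do not come from the presentation or from any Kognitio API.

```python
from collections import defaultdict

# Hypothetical in-memory tables (illustrative data, not from the slides).
sales = [
    {"store_id": 1, "amount": 120.0},
    {"store_id": 2, "amount": 80.0},
    {"store_id": 1, "amount": 40.0},
    {"store_id": 3, "amount": 200.0},
]
stores = {1: "North", 2: "South", 3: "North"}  # store_id -> region


def filter_rows(rows, min_amount):
    """Filtering: one cheap predicate test per row."""
    return [r for r in rows if r["amount"] >= min_amount]


def crunch(rows, store_region):
    """Crunching: join to the region lookup, group, aggregate, then sort."""
    totals = defaultdict(float)
    for r in rows:
        region = store_region[r["store_id"]]  # join via hash lookup
        totals[region] += r["amount"]         # grouping + aggregation
    # Sort regions by total, largest first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


filtered = filter_rows(sales, min_amount=50.0)
ranking = crunch(filtered, stores)
```

The filter touches each row once with a trivial comparison, while the crunching step performs a lookup, an accumulation and a final sort, which is where the CPU cycles go once the data is already in RAM.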
  13. 13. For analytics, the CPU is king Being CPU-bound fundamentally changes a system’s design philosophy •  Disk I/O bound –  CPUs wait for data from disk –  No need for efficient coding –  Parallelization ineffective •  CPU bound –  Every CPU cycle is precious: efficient coding –  Parallelization = scalable performance –  Advanced techniques minimize CPU cycles •  Interactive / ad hoc analytics: THINK data to core ratios ≈ <10GB data per CPU core
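The data-to-core rule of thumb on slide 13 lends itself to a quick sizing calculation. The sketch below assumes the slide's figure of roughly 10 GB per core as an upper limit; the function names and the cores-per-server parameter are hypothetical and not part of the presentation.

```python
import math


def cores_needed(data_gb, gb_per_core=10.0):
    """Minimum CPU cores so each core handles at most gb_per_core of data."""
    if data_gb < 0 or gb_per_core <= 0:
        raise ValueError("data_gb must be >= 0 and gb_per_core must be > 0")
    return math.ceil(data_gb / gb_per_core)


def servers_needed(data_gb, cores_per_server, gb_per_core=10.0):
    """Whole servers required, given a hypothetical cores-per-server figure."""
    return math.ceil(cores_needed(data_gb, gb_per_core) / cores_per_server)
```

For example, `cores_needed(2000)` gives 200 cores for 2 TB of in-memory data, and `servers_needed(2000, 16)` gives 13 servers if each hypothetical server has 16 cores. Because the slide states the ratio as less than 10 GB per core, these figures are best read as a lower bound on the hardware.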
  14. 14. Why now? Interest in in-memory [Chart: price of RAM on a logarithmic (base 10) scale, 1987 to 2010]
  15. 15. Mature BI being overtaken •  Decision support: numbers, tables, charts, indicators; historical information, latency; …accessed with ease and simplicity •  But BI and BI tools have plateaued! •  Progression into advanced analytics & data science •  It’s now all about doing more math …a lot more math
  16. 16. Thus more complex methods – real-time [Chart: analytical complexity against technology/automation. Labels: machine learning algorithms, behaviour modelling, statistical analysis, dynamic simulation, clustering, dynamic interaction, fraud detection, reporting & BPM, campaign management]
  17. 17. How to efficiently exploit RAM •  A large cache is not in-memory –  In-memory platforms hold data in structures that take advantage of the properties of RAM –  Caches are copies of frequently used disk blocks •  Platform designed to specifically exploit the random access nature of memory –  Different algorithms –  CPU cycles are precious – code efficiency paramount –  Advanced techniques used to reduce code path length •  Dynamic Machine Code Generation •  Extended CPU instruction sets •  Parallelize everything –  Scale-out and Scale-up –  Fully and efficiently use every CPU core, in every CPU, in every server
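The "parallelize everything" point on slide 17 can be sketched as a partitioned aggregation: split the in-memory rows across workers, aggregate each partition independently, then merge the partial results. This is a generic illustration, not Kognitio's implementation; the function names and the `executor_cls` parameter are assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Split rows into n_parts roughly equal slices (round-robin)."""
    return [rows[i::n_parts] for i in range(n_parts)]


def partial_sum(rows):
    """Aggregate one partition: sum of value per key."""
    totals = Counter()
    for key, value in rows:
        totals[key] += value
    return totals


def parallel_group_sum(rows, n_workers=4, executor_cls=ThreadPoolExecutor):
    """Group-by-sum computed per partition, then merged into one result."""
    parts = partition(rows, n_workers)
    with executor_cls(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, parts))
    merged = Counter()
    for p in partials:
        merged.update(p)  # Counter.update adds values for matching keys
    return dict(merged)


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
result = parallel_group_sum(rows)
```

Threads are used here only to keep the sketch portable. In CPython they do not execute Python bytecode on several cores at once, so for genuinely CPU-bound work one would pass `concurrent.futures.ProcessPoolExecutor` as `executor_cls`, run from a script guarded by `if __name__ == "__main__":`.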
  18. 18. Analytical Platform Reference Architecture •  Application & Client Layer: All BI Tools, All OLAP Clients, Excel, Reporting •  Analytical Platform Layer, with Near-line Storage (optional) •  Persistence Layer: Kognitio Storage, Hadoop Clusters, Cloud Storage, Enterprise Data Warehouses, Legacy Systems
  19. 19. Perceptions & Questions Analyst: Robin Bloor
  20. 20. Big Data, Maybe – Big Parallelism, Yes Many latency-reducing changes are afoot: •  Hadoop is a data lake –  It’s about latency •  CPU and memory rule –  The old database is dying •  Grids, not clusters –  A server is now a cluster •  Scaling Up AND Scaling Out –  “Only scaling out” is last year’s story •  SSD will replace spinning disk –  But it will never compete with RAM
  21. 21. Why the Excitement? What are the “new” applications? •  BIG DATA capture and staging •  BIG DATA ANALYTICS •  LITTLE DATA ANALYTICS •  OPERATIONAL INTELLIGENCE
  22. 22. A “Modern” Workload Query Light & Math Heavy
  23. 23. Where the Rubber Meets the Road It isn’t really about application latency any more, it’s about business process latency (business time!). This can have many aspects: •  The collapse of data flows – take the processing to the data •  Data warehouse offload •  Full process automation •  Lower latency = NEW BUSINESS PROCESSES
  24. 24. The Question The question for most organizations is: Exactly how do we take advantage of these changes? This is a BUSINESS question AND a TECHNICAL question.
  25. 25. •  Low latency is exciting, but where do you see the clear business opportunities? •  There seems to be a conundrum about where to store “slow” data: –  Hadoop? –  Traditional data warehouse? –  New data warehouse? •  Is the split between the application and the data real any more?
  26. 26. •  In your opinion, does the Enterprise need a new architecture? •  How is it possible to define and monitor service levels with in-memory applications? •  Whither data governance?
  27. 27. Twitter Tag: #briefr The Briefing Room
  28. 28. Upcoming Topics This Month: DATA PROCESSING November: DATA DISCOVERY & VISUALIZATION December: INNOVATORS
  29. 29. Thank You for Your Attention