This document discusses techniques for write-optimization in external memory data structures. It describes using OLAP (Online Analytical Processing) to improve the performance of write operations. With OLAP, data is logged and then sorted with an efficient external memory merge sort before being inserted into a B-tree index. This provides better performance than performing individual inserts into the B-tree, as merge sort has lower I/O complexity than many inserts.
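The I/O argument above can be made concrete with a back-of-the-envelope calculation. The sketch below is an illustration and is not taken from the slides. It uses the standard external-memory-model bounds: roughly log_B(N) block transfers per uncached B-tree insert, and roughly (N/B) * log_{M/B}(N/B) transfers for an external merge sort. Constant factors are ignored, and the function names and parameter values are hypothetical.

```python
import math


def btree_insert_ios(n, block):
    """Approximate I/Os for n independent, uncached B-tree inserts.

    Each insert walks a root-to-leaf path of about log_B(n) blocks.
    n and block are both measured in elements.
    """
    return n * math.log(n, block)


def merge_sort_ios(n, block, memory):
    """Approximate I/Os for an external merge sort of n elements.

    Every pass reads and writes all n/B blocks, and about
    log_{M/B}(n/B) passes are needed (at least one).
    """
    blocks = n / block
    fanout = memory / block
    passes = max(1.0, math.log(blocks, fanout))
    return blocks * passes


if __name__ == "__main__":
    # Hypothetical sizes: 10^9 elements, 1024 elements per block,
    # and 2^20 elements of main memory.
    n, block, memory = 10**9, 1024, 2**20
    print("individual inserts:", round(btree_insert_ios(n, block)))
    print("sort then load:    ", round(merge_sort_ios(n, block, memory)))
```

Under these assumptions the sort-then-load path needs far fewer block transfers than inserting one row at a time, which is the gap the logging approach exploits. Real systems cache the upper levels of the tree, so the per-insert cost is usually lower than this model suggests.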
Write-optimization in external memory data structures (Highload++ 2014), by leifwalsh
After a long reign as the dominant on-disk data structure for databases and filesystems, B-trees are slowly being replaced by write-optimized data structures, to handle ever-growing volumes of data. Some write optimization techniques, like LSM-trees, give up some of the query performance of B-trees in order to achieve this.
A Fractal Tree is a write-optimized data structure that matches the insertion performance of an LSM-tree while maintaining the optimal query performance of a B-tree. It's inspired by many data structures (Buffered Repository Trees, B^ε trees, ...) but the real definition is just what we've implemented at Tokutek.
I'll provide background on B-trees and LSM-trees, an overview of how Fractal Trees work, where they differ from B-trees and LSM-trees, and how we use their performance advantages in some obvious and some surprising ways to power new MySQL and MongoDB features in TokuDB and TokuMX.
Whether your data's in MySQL, a NoSQL, or somewhere in the cloud, you're likely paying decent money for storage and IOPS. With ever-growing data volumes, and the need for SSDs to cut latency and replication to provide insurance, your storage footprint is an important place to look for savings. It makes sense, then, why so many storage vendors tout compression as a key metric and differentiator.
The language vendors and users employ to reason about storage footprint and compression is embarrassingly vague if not meaningless or downright deceptive, but we can do better, and we must do better.
In this talk, we'll discuss each part of the durable storage stack, from the hardware on up, and how usage numbers can take on different meanings at each layer. We'll talk about what's important to know at each layer, and how to think about and talk about concepts like compression, fragmentation, write amplification, and wear leveling. Finally, we'll see different ways benchmarketers present data to lie to you, and learn some techniques for identifying and cutting through those kinds of lies.
Given at Percona Live Amsterdam 2015
Covers the basics of algorithms and algorithm analysis, including time complexity, space complexity, the three cases (best, average, worst), and an analysis of insertion sort.
*For knowledge purposes only*
*Hope you'll come up with a better one*
In this paper we propose a family of Viterbi algorithms specialized for lexical-tree-based FSA and HMM acoustic models. We present two algorithms that decode a tree lexicon with left-to-right models, with or without skips, and another algorithm that takes a directed acyclic graph as input and performs error-correcting decoding. They store the set of active states, topologically sorted, in contiguous memory queues. This reduces the number of basic operations needed to update each hypothesis and improves memory locality, which lowers the expected number of cache misses and yields a speed-up over other implementations.
A New MongoDB Sharding Architecture for Higher Availability and Better Resour..., by leifwalsh
Most modern databases concern themselves with their ability to scale a workload beyond the power of one machine. But maintaining a database across multiple machines is inherently more complex than it is on a single machine. As soon as scaling out is required, suddenly a lot of scaling out is required, to deal with new problems like index suitability and load balancing.
Write optimized data structures are well-suited to a sharding architecture that delivers higher efficiency than traditional sharding architectures. This talk describes a new sharding architecture for MongoDB applications that can be achieved with write optimized storage like TokuMX's Fractal Tree indexes.
ESAI-CEU-UCH solution for American Epilepsy Society Seizure Prediction Challenge, by Francisco Zamora-Martinez
Presentation given at Cyient Insights (Hyderabad, India).
This work presents the solution proposed by Universidad CEU Cardenal Herrera (ESAI-CEU-UCH) for the Kaggle American Epilepsy Society Seizure Prediction Challenge. The proposed solution placed 4th in the competition.
Different kinds of input features (different preprocessing pipelines) and different statistical models are proposed. This diversity is intended to improve the result of model combination.
It is important to note that none of the proposed systems uses the test set for calibration. The competition allows model calibration on the test set, but doing so would reduce the reproducibility of the results in a real-world implementation.
Improving isolated handwritten word recognition by means of a classif..., by Francisco Zamora-Martinez
The goal of this work is to improve the performance of off-line handwritten text recognition systems based on hidden Markov models and on hybrid models combined with neural networks. To remove the effect of the language model, the work addresses an isolated word recognition task, in which words are stripped of their context. A study of how word length influences system performance led to combining these classifiers with another one that specializes in short words and shows lower correlation. To that end, several multilayer perceptrons were trained to classify a subset of the vocabulary holistically. Combining the classifiers with a variant of the Borda count voting method gives very satisfactory results.
In this paper, we describe a novel approach to Part-Of-Speech tagging based on neural networks. Multilayer perceptrons are used following corpus-based learning from contextual and lexical information. The Penn Treebank corpus has been used for the training and evaluation of the tagging system. The results show that the connectionist approach is feasible and comparable with other approaches.
Connectionist language models offer many advantages over their statistical counterparts, but they also have drawbacks, such as a much higher computational cost. This paper describes a novel method to overcome this problem. A set of normalization values associated with the most frequent N-grams is pre-computed, and the model is smoothed with lower-order N-gram connectionist or statistical models. The proposed approach compares favourably with standard connectionist language models and with statistical back-off language models.
Training Deep Neural Networks has been a difficult task for a long time. Recently diverse approaches have been presented to tackle these difficulties, showing that deep models improve the performance of shallow ones in some areas like signal processing, signal classification or signal segmentation, whatever type of signals, e.g. video, audio or images. One of the most important methods is greedy layer-wise unsupervised pre-training followed by a fine-tuning phase. Despite the advantages of this procedure, it does not fit some scenarios where real time learning is needed, as for adaptation of some time-series models. This paper proposes to couple both phases into one, modifying the loss function to mix together the unsupervised and supervised parts. Benchmark experiments with MNIST database prove the viability of the idea for simple image tasks, and experiments with time-series forecasting encourage the incorporation of this idea into on-line learning approaches. The interest of this method in time-series forecasting is motivated by the study of predictive models for domotic houses with intelligent control systems.
ScicomP 2015 presentation discussing best practices for debugging CUDA and OpenACC applications with a case study on our collaboration with LLNL to bring debugging to the OpenPOWER stack and OMPT.
qconsf 2013: Top 10 Performance Gotchas for scaling in-memory Algorithms - Sr..., by Sri Ambati
Top 10 Performance Gotchas in scaling in-memory Algorithms
Abstract:
Math Algorithms have primarily been the domain of desktop data science. With the success of scalable algorithms at Google, Amazon, and Netflix, there is an ever growing demand for sophisticated algorithms over big data. In this talk, we get a ringside view in the making of the world's most scalable and fastest machine learning framework, H2O, and the performance lessons learnt scaling it over EC2 for Netflix and over commodity hardware for other power users.
Top 10 Performance Gotchas is about the white hot stories of i/o wars, S3 resets, and muxers, as well as the power of primitive byte arrays, non-blocking structures, and fork/join queues. Of good data distribution & fine-grain decomposition of Algorithms to fine-grain blocks of parallel computation. It's a 10-point story of the rage of a network of machines against the tyranny of Amdahl while keeping the statistical properties of the data and accuracy of the algorithm.
Track: Scalability, Availability, and Performance: Putting It All Together
Time: Wednesday, 11:45am - 12:35pm
Papers We Love SF #9: "The Level Ancestor Problem simplified" by Michael Bender and Martin Farach-Colton.
Pairs well with Papers We Love NYC #7: "The LCA Problem Revisited", by the same authors: http://bit.ly/pwl-lca
Both papers describe a simple query on a general tree structure, invite the reader on a journey through progressively more sophisticated and powerful techniques, and finish with a surprising result and polished mouthfeel.
Plus, learn one weird trick for squeezing the last bit of performance out of your algorithms.
LoCloud - D2.5: Lightweight Digital Library Prototype (LoCloud Collections Se..., by locloud
This report presents the prototype of the LoCloud Collections system. The main aim of LoCloud Collections (initially named Lightweight Digital Library) is to let small cultural institutions host their digitized collections (metadata as well as content) very easily in the cloud, and to make that data widely available on the internet, and in particular to Europeana.
The developed service prototype is available on-line at https://locloudhosting.net/ and can be used by anyone to create a new digital library in just a few minutes.
Towards a rebirth of data science (by Data Fellas), by Andy Petrella
Nowadays, Data Science is buzzing all over the place.
But what is a, so-called, Data Scientist?
Some will argue that a Data Scientist is a person able to report and present insights in a data set. Others will say that a Data Scientist can handle a high throughput of values and expose them in services. Yet another definition includes the capacity to create meaningful visualizations on the data.
However, we are entering an age where velocity is key. Not only is the velocity of your data high, but time to market is also shorter. Hence, the time between the moment you receive a data set and the moment you can deliver added value is crucial.
In this talk, we’ll review the legacy Data Science methodologies and what they meant in terms of delivered work and results.
Afterwards, we’ll move towards the different concepts, techniques and tools that Data Scientists will have to learn and adopt in order to accomplish their tasks in the age of Big Data.
The talk closes by presenting the Data Fellas view on a solution to these challenges, especially thanks to the Spark Notebook and the Shar3 product we develop.
Graspan: A Big Data System for Big Code Analysis, by Aftab Hussain
We built a disk-based parallel graph system, Graspan, that uses a novel edge-pair centric computation model to compute dynamic transitive closures on very large program graphs.
We implement context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases such as Linux shows that their Graspan implementations scale to millions of lines of code and are much simpler than their original implementations.
These analyses were used to augment the existing checkers; these augmented checkers found 132 new NULL pointer bugs and 1308 unnecessary NULL tests in Linux 4.4.0-rc5, PostgreSQL 8.3.9, and Apache httpd 2.2.18.
- Accepted in ASPLOS ‘17, Xi’an, China.
- Featured in the tutorial, Systemized Program Analyses: A Big Data Perspective on Static Analysis Scalability, ASPLOS ‘17.
- Invited for presentation at SoCal PLS ‘16.
- Invited for poster presentation at PLDI SRC ‘16.
Artificial Intelligence and XPath Extension Functions, by Octavian Nadolu
The purpose of this presentation is to provide an overview of how you can use AI from XSLT, XQuery, Schematron, or XML Refactoring operations, the potential benefits of using AI, and some of the challenges we face.
Zoom is a comprehensive platform designed to connect individuals and teams efficiently. With its user-friendly interface and powerful features, Zoom has become a go-to solution for virtual communication and collaboration. It offers a range of tools, including virtual meetings, team chat, VoIP phone systems, online whiteboards, and AI companions, to streamline workflows and enhance productivity.
Navigating the Metaverse: A Journey into Virtual Evolution, by Donna Lenk
Join us for an exploration of the Metaverse's evolution, where innovation meets imagination. Discover new dimensions of virtual events, engage with thought-provoking discussions, and witness the transformative power of digital realms.
A Study of Variable-Role-based Feature Enrichment in Neural Models of Code, by Aftab Hussain
Understanding variable roles in code has been found to be helpful by students in learning programming -- could variable roles help deep neural models in performing coding tasks? We do an exploratory study.
- These are slides of the talk given at InteNSE'23: The 1st International Workshop on Interpretability and Robustness in Neural Software Engineering, co-located with the 45th International Conference on Software Engineering, ICSE 2023, Melbourne Australia
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll know how to organize and improve your code review process.
Quarkus Hidden and Forbidden Extensions, by Max Andersen
Quarkus has a vast extension ecosystem and is known for its subsonic and subatomic feature set. Some of these features are not as well known, and some extensions are less talked about, but that does not make them less interesting - quite the opposite.
Come join this talk to see some tips and tricks for using Quarkus and some of the lesser known features, extensions and development techniques.
E-commerce Application Development Company.pdf, by Hornet Dynamics
Your business can reach new heights with our assistance as we design solutions that are specifically appropriate for your goals and vision. Our eCommerce application solutions can digitally coordinate all retail operations processes to meet the demands of the marketplace while maintaining business continuity.
May Marketo Masterclass, London MUG May 22 2024.pdf, by Adele Miller
Can't make Adobe Summit in Vegas? No sweat because the EMEA Marketo Engage Champions are coming to London to share their Summit sessions, insights and more!
This is a MUG with a twist you don't want to miss.
Transform Your Communication with Cloud-Based IVR SolutionsTheSMSPoint
Discover the power of Cloud-Based IVR Solutions to streamline communication processes. Embrace scalability and cost-efficiency while enhancing customer experiences with features like automated call routing and voice recognition. Accessible from anywhere, these solutions integrate seamlessly with existing systems, providing real-time analytics for continuous improvement. Revolutionize your communication strategy today with Cloud-Based IVR Solutions. Learn more at: https://thesmspoint.com/channel/cloud-telephony
Enterprise Resource Planning System includes various modules that reduce any business's workload. Additionally, it organizes the workflows, which drives towards enhancing productivity. Here are a detailed explanation of the ERP modules. Going through the points will help you understand how the software is changing the work dynamics.
To know more details here: https://blogs.nyggs.com/nyggs/enterprise-resource-planning-erp-system-modules/
OpenMetadata Community Meeting - 5th June 2024OpenMetadata
The OpenMetadata Community Meeting was held on June 5th, 2024. In this meeting, we discussed about the data quality capabilities that are integrated with the Incident Manager, providing a complete solution to handle your data observability needs. Watch the end-to-end demo of the data quality features.
* How to run your own data quality framework
* What is the performance impact of running data quality frameworks
* How to run the test cases in your own ETL pipelines
* How the Incident Manager is integrated
* Get notified with alerts when test cases fail
Watch the meeting recording here - https://www.youtube.com/watch?v=UbNOje0kf6E
Write-optimization in external memory data structures
Data structures:
Provide retrieval of data.
Lookup(Key)
Pred(Key)
Succ(Key)
Dynamic data structures let you change the data.
Insert(Key; Value)
Delete(Key)
Leif Walsh (Tokutek) Fractal Trees November 12, 2014 3 / 31
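To make the five operations concrete, here is a minimal in-memory sketch. The class name `SortedMap` and the choice of strict predecessor/successor semantics are assumptions for illustration; the talk itself is about on-disk structures, which this does not model.

```python
import bisect


class SortedMap:
    """In-memory ordered map backed by two parallel sorted lists (illustrative only)."""

    def __init__(self):
        self._keys = []
        self._values = []

    def insert(self, key, value):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # overwrite an existing key
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def delete(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            del self._keys[i]
            del self._values[i]

    def lookup(self, key):
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def pred(self, key):
        """Largest stored key strictly less than `key`, or None (assumed semantics)."""
        i = bisect.bisect_left(self._keys, key)
        return self._keys[i - 1] if i > 0 else None

    def succ(self, key):
        """Smallest stored key strictly greater than `key`, or None (assumed semantics)."""
        i = bisect.bisect_right(self._keys, key)
        return self._keys[i] if i < len(self._keys) else None
```

Pred and Succ are what make range queries possible: a range scan is a Lookup or Succ followed by repeated Succ calls.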
DAM model [Aggarwal & Vitter ’88]
Problem size N.
Memory size M.
Transfer data to/from memory in blocks of size B.
Efficiency of operations is measured as the number of block transfers (I/Os).
A B-tree is an external memory data structure:
Balanced search tree.
Fanout of B (block size / key size).
Internal nodes < M.
Leaf nodes > M.
Search: $O(\log_B N)$ I/Os
Insert: $O(\log_B N)$ I/Os
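A quick way to get a feel for the $\log_B N$ bound is to compute the tree height for a few data set sizes. This is a sketch under assumed numbers: 16 KB blocks and 16-byte keys are illustrative choices, and `btree_height` is a hypothetical helper, not part of any library.

```python
def btree_height(n_keys, fanout):
    """Smallest h with fanout**h >= n_keys, using integer arithmetic only."""
    height, reachable = 0, 1
    while reachable < n_keys:
        reachable *= fanout
        height += 1
    return height


BLOCK_BYTES = 16 * 1024            # assumed block size
KEY_BYTES = 16                     # assumed key size
FANOUT = BLOCK_BYTES // KEY_BYTES  # B = block size / key size

for n in (10**6, 10**9, 10**12):
    print(f"N = {n:>14,}: height {btree_height(n, FANOUT)}")
```

With a fanout in the thousands the height stays in the low single digits even for very large N, which is why a search touches only a handful of blocks.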
Write-optimization technique #1: OLAP
Merge sort in external memory:
Merge sort cost in DAM model is:
Cost to scan through all the data once: $N/B$.
Multiplied by the # of levels in the merge tree: $\log_{M/B}(N/B)$.
Total: $O\left(\frac{N}{B} \log_{M/B} \frac{N}{B}\right)$
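The cost model can be checked with a small simulation. The sketch below sorts a Python list while only counting block transfers; M and B are measured in elements, the function name is hypothetical, and each pass is charged one scan of $N/B$ blocks as in the analysis (counting reads and writes separately would double the constant).

```python
import heapq
import math
import random


def external_merge_sort(data, M, B):
    """Sort `data` with memory M and block size B; return (sorted_list, scans_in_blocks)."""
    ios = 0
    # Run formation: load M elements at a time and sort them in memory.
    runs = []
    for start in range(0, len(data), M):
        run = sorted(data[start:start + M])
        ios += math.ceil(len(run) / B)
        runs.append(run)

    # Merge passes: one block of memory per input run, one for the output.
    fan_in = max(2, M // B - 1)
    while len(runs) > 1:
        next_runs = []
        for start in range(0, len(runs), fan_in):
            group = runs[start:start + fan_in]
            if len(group) == 1:
                next_runs.append(group[0])  # nothing to merge, no I/O charged
                continue
            merged = list(heapq.merge(*group))
            ios += math.ceil(len(merged) / B)
            next_runs.append(merged)
        runs = next_runs
    return (runs[0] if runs else []), ios


random.seed(0)
data = random.sample(range(10_000), 900)
result, ios = external_merge_sort(data, M=100, B=10)
print(ios, "block scans for N/B =", len(data) // 10)
```

With these parameters the nine initial runs fit within the merge fan-in, so the sort finishes in two passes over the data.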
Insert N elements into a B-tree (assuming 100-1000 byte elements):
$O\left(N \log_B \frac{N}{M}\right)$, about N I/Os: 10-100 kB/s = 100 elements/s.
Merge sort:
$O\left(\frac{N}{B} \log_{M/B} \frac{N}{B}\right)$, about $2N/B$ I/Os: 50 MB/s = 50k-500k elements/s.
Typically, M/B is large, so only two passes are needed to sort.
Intuition: Each insert into a B-tree costs 1 seek, while sorting is close to disk bandwidth.
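The throughput figures follow from simple arithmetic, sketched below. The 50 MB/s bandwidth and the 100-1000 byte element sizes come from the slide; the figure of 100 seeks per second is an assumption inferred from "1 seek per insert" giving 100 elements/s. The function names are illustrative.

```python
SEEKS_PER_SECOND = 100                     # assumed disk seek rate
BANDWIDTH_BYTES_PER_SECOND = 50 * 10**6    # 50 MB/s sequential bandwidth


def btree_rates(element_bytes, seeks_per_second=SEEKS_PER_SECOND):
    """(elements/s, bytes/s) when every insert pays one seek."""
    return seeks_per_second, seeks_per_second * element_bytes


def sort_rate(element_bytes, bandwidth=BANDWIDTH_BYTES_PER_SECOND):
    """elements/s when the disk streams at full sequential bandwidth."""
    return bandwidth // element_bytes


for element_bytes in (100, 1000):
    elems, nbytes = btree_rates(element_bytes)
    print(f"{element_bytes:>4} B elements: B-tree {elems}/s ({nbytes // 1000} kB/s), "
          f"sort {sort_rate(element_bytes)}/s")
```

The gap is a factor of 500 to 5000 under these assumptions, which is the whole motivation for replacing seeks with sequential passes.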
So, how does OLAP work?
Log new data unindexed until you accumulate a lot of it (10% of the data set).
Sort the new data.
Use a merge pass through existing indexes to incorporate new data.
Use indexes to do analytics.
Moral: OLAP techniques can handle high insertion volume, but query results are delayed.
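The steps above can be sketched in a few lines. This is an in-memory toy, not a description of any particular OLAP product: the class name, the `min_batch` floor, and the assumption of unique keys are all illustrative.

```python
import heapq


class BatchLoadedIndex:
    """Sorted index that absorbs new rows only in large, sorted batches."""

    def __init__(self, batch_fraction=0.10, min_batch=4):
        self.index = []   # sorted (key, value) pairs: what queries can see
        self.log = []     # unindexed new data: not yet queryable
        self.batch_fraction = batch_fraction
        self.min_batch = min_batch  # assumed floor so a tiny index still batches

    def insert(self, key, value):
        self.log.append((key, value))  # step 1: cheap sequential append
        threshold = max(self.min_batch, int(self.batch_fraction * len(self.index)))
        if len(self.log) >= threshold:
            self.merge_log()

    def merge_log(self):
        self.log.sort()                                        # step 2: sort the new data
        self.index = list(heapq.merge(self.index, self.log))   # step 3: one merge pass
        self.log = []

    def keys(self):
        """Query path: only merged data is visible, so results lag behind inserts."""
        return [k for k, _ in self.index]
```

The delay in the "moral" is visible here: anything still sitting in `log` is invisible to `keys()` until the next merge.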
Write-optimization technique #2: LSM-trees
The insight for LSM-trees starts by asking: how can we reduce the queryability delay in OLAP?
The buffer is small, let’s index it!
Inserts go into the “buffer B-tree”.
When the buffer gets full, we merge it with the “main B-tree”.
Queries have to touch both trees and merge results, but results are available immediately.
(This specific technique, which is not yet an LSM-tree, is used in InnoDB and is called the “change buffer”.)
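A rough sketch of the two-structure idea follows. A dict stands in for the in-memory buffer B-tree and a pair of sorted lists for the main B-tree; these stand-ins, the class name, and the capacity parameter are assumptions for illustration and say nothing about how InnoDB implements its change buffer.

```python
import bisect


class BufferedIndex:
    def __init__(self, buffer_capacity=4):
        self.buffer = {}        # stand-in for the in-memory "buffer B-tree"
        self.main_keys = []     # stand-in for the on-disk "main B-tree"
        self.main_values = []
        self.buffer_capacity = buffer_capacity

    def insert(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.buffer_capacity:
            self._merge()

    def _merge(self):
        merged = dict(zip(self.main_keys, self.main_values))
        merged.update(self.buffer)  # newer buffer entries win over main entries
        self.main_keys = sorted(merged)
        self.main_values = [merged[k] for k in self.main_keys]
        self.buffer = {}

    def lookup(self, key):
        if key in self.buffer:      # check the newest data first
            return self.buffer[key]
        i = bisect.bisect_left(self.main_keys, key)
        if i < len(self.main_keys) and self.main_keys[i] == key:
            return self.main_values[i]
        return None

    def range_keys(self, lo, hi):
        """Keys in [lo, hi], merged from both structures."""
        i = bisect.bisect_left(self.main_keys, lo)
        j = bisect.bisect_right(self.main_keys, hi)
        found = set(self.main_keys[i:j])
        found.update(k for k in self.buffer if lo <= k <= hi)
        return sorted(found)
```

Unlike the batch approach, an insert is visible to `lookup` immediately; the price is that every query consults two structures.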
Why is this fast?
The buffer is in-memory, so inserts are fast.
When we merge, we put many new elements in each leaf in the main B-tree (this amortizes the I/O cost to read the leaf).
Eventually, we reach a problem:
If the buffer gets too big, inserts get slow.
If the buffer stays too small, the merge gets inefficient (back to $O(N \log_B N)$).
How can we fix this? More buffering!
Each level is twice as large as the previous level, for some value of 2. We’ll use 2.
How much do inserts cost?
Cost to flush a tree $T_j$ of size X is $O(X/B)$.
Cost per element to flush $T_j$ is $O(1/B)$.
Each element moves $\log N$ times.
Total amortized insert cost per element is $O\left(\frac{\log N}{B}\right)$.
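The "each element moves $\log N$ times" step can be observed in a small simulation. The sketch below keeps levels as sorted Python lists with level j holding exactly $2^j$ elements when full; the class name and the move counter are illustrative, and real LSM-trees differ in many details (growth factor, on-disk layout, compaction policy).

```python
import heapq
import math


class DoublingLevels:
    """Levels of sorted arrays; level j is either empty or holds 2**j elements."""

    def __init__(self):
        self.levels = []        # levels[j] is a sorted list, possibly empty
        self.element_moves = 0  # times any element was written into a level

    def insert(self, key):
        carry = [key]
        j = 0
        while True:
            if j == len(self.levels):
                self.levels.append([])
            if not self.levels[j]:
                self.levels[j] = carry          # the flush lands here
                self.element_moves += len(carry)
                return
            # Level j is full: merge it with the carry and push one level down.
            carry = list(heapq.merge(self.levels[j], carry))
            self.levels[j] = []
            j += 1


N = 1024
tree = DoublingLevels()
for key in range(N, 0, -1):
    tree.insert(key)

moves_per_element = tree.element_moves / N
print(moves_per_element, "moves per element; log2(N) =", math.log2(N))
```

Each move is a sequential write costing about 1/B block transfers per element, so dividing the moves per element by B gives the amortized insert cost from the analysis.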
Write-optimization technique #3: Fractal Trees
The pain in LSM-trees is doing a full $O(\log_B N)$ search in each level.
We use fractional cascading to reduce the search per level to $O(1)$.
The idea is that once we’ve searched $T_i$, we know where the key would be in $T_i$, and we can use that information to guide our search of $T_{i+1}$.
Add forwarding pointers from leaves in $T_i$ to leaves in $T_{i+1}$ (but remove the redundant ones that point to the same leaf).
In leaves of $T_i$, add ghost pointers to the leaves of $T_{i+1}$ that no forwarding pointer reaches.
Now, after searching $T_i$ for a missing element c, we look left and right for forwarding or ghost pointers, and follow them down to look at $O(1)$ leaves in $T_{i+1}$.
This way, search is only $O(\log_R N)$ (in our example, R = 2).
The internal node structure in each level is now redundant, so we can represent each level as an array. This is called a Cache-Oblivious Lookahead Array [Bender, Farach-Colton, Fineman, Fogel, Kuszmaul, Nelson ’07].
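The pointer-following step can be illustrated with a simplified two-level sketch. It is not the published Cache-Oblivious Lookahead Array: here the upper level simply stores a pointer to every `stride`-th slot of the lower array, element-level arrays stand in for leaves, and keys in the lower level are assumed distinct. The class name and the `stride` parameter are hypothetical.

```python
import bisect


class CascadedLevel:
    """Upper level T_i: real keys plus sampled pointers into the lower level T_{i+1}."""

    def __init__(self, real_keys, lower, stride=4):
        self.lower = lower                      # T_{i+1}: sorted, distinct keys
        entries = [(k, -1) for k in real_keys]  # -1 marks a real key of T_i
        entries += [(lower[i], i) for i in range(0, len(lower), stride)]
        entries.sort()
        self.keys = [k for k, _ in entries]
        self.ptrs = [p for _, p in entries]
        n = len(entries)
        # left[pos]: slot of `lower` named by the nearest pointer before pos.
        self.left = [0] * (n + 1)
        last = 0
        for pos in range(n):
            self.left[pos] = last
            if self.ptrs[pos] != -1:
                last = self.ptrs[pos]
        self.left[n] = last
        # right[pos]: slot of `lower` named by the nearest pointer at or after pos.
        self.right = [len(lower) - 1] * (n + 1)
        nxt = len(lower) - 1
        for pos in range(n - 1, -1, -1):
            if self.ptrs[pos] != -1:
                nxt = self.ptrs[pos]
            self.right[pos] = nxt

    def find(self, key):
        """Return (level_name_or_None, number_of_lower_slots_inspected)."""
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key and self.ptrs[pos] == -1:
            return "upper", 0
        # The neighboring pointers bracket the only slots where `key` can live.
        window = self.lower[self.left[pos]:self.right[pos] + 1]
        return ("lower" if key in window else None), len(window)


lower = list(range(0, 100, 2))
level = CascadedLevel([5, 41, 77], lower, stride=4)
print(level.find(41), level.find(42), level.find(43))
```

However large the lower level grows, the search there inspects at most `stride + 1` slots, which is the constant per-level work that fractional cascading is meant to buy.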
Though the amortized analysis says our inserts are fast, when we flush a very large level to the next one, we might see a big stall. Concurrent merge algorithms exist, but we can do better.
We break each level’s array into chunks that can be flushed independently. Each chunk flushes to a localized region of a few chunks in the next level down, found using its forwarding pointers.
Now we have a tree again!
As it turns out, this structure makes it easier to manage an LRU-style cache of blocks and is more flexible in the face of “hotspot” workloads.
Results
Modified B-tree-like dynamic (inserts, updates, deletes) data structure that supports point and range queries.
Inserts are a factor B/log B (typically 10-100x in practice) faster than a B-tree: $O\left(\frac{\log N}{B}\right)$ vs. $O\left(\frac{\log N}{\log B}\right)$.
Searches are a factor log B/log R slower than a B-tree: $O\left(\frac{\log N}{\log R}\right)$ vs. $O\left(\frac{\log N}{\log B}\right)$.
To amortize flush costs over many elements, we want each block we write to be large (4MB), much larger than typical B-tree blocks (16KB). These compress well.
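The two factors quoted above follow directly from dividing the bounds, using $\log_B N = \log N / \log B$. As a short worked step (asymptotic ratios only, constants ignored):

```latex
\[
\frac{\text{B-tree insert}}{\text{Fractal Tree insert}}
  = \frac{\log N / \log B}{\log N / B}
  = \frac{B}{\log B},
\qquad
\frac{\text{Fractal Tree search}}{\text{B-tree search}}
  = \frac{\log N / \log R}{\log N / \log B}
  = \frac{\log B}{\log R}.
\]
```

The insert gain grows almost linearly in B while the search penalty grows only logarithmically, which is why large blocks favor this design.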
Applications
TokuDB for MySQL, TokuMX for MongoDB:
Faster indexed insertions.
Hot schema changes.
Compression.
Faster replication on secondaries (TokuMX).
Lower impact migrations (TokuMX).
Fast (no read before write) updates (in TokuDB, coming soon in TokuMX).
ACID transactions.
Concurrency (TokuMX).