
Beyond relational: «neural» DBMS?

Applying machine learning techniques to traditional database management systems.



  1. BEYOND RELATIONAL: «NEURAL» DBMS? Roberto Reale @ Italian Association for Machine Learning, 10 Apr 2019
  2. Codd, E. F. (1970). A Relational Model of Data for Large Shared Data Banks. Commun. ACM, 13, 377–387. Kraska, T., Beutel, A., Chi, E. H., Dean, J. and Polyzotis, N. (2017). The Case for Learned Index Structures. arXiv preprint arXiv:1712.01208.
  3. RELATIONAL MODEL. Can be expressed in first-order predicate logic. Data is represented as tuples, grouped into relations. Abstraction from the physical storage model.
  4. INDEX STRUCTURES. Needed for efficient data access: B-Trees, hash maps, Bloom filters, ... They need tuning and, being general-purpose data structures, do not take advantage of patterns in the data.
  5. ENTER MACHINE LEARNING. Replacing core components of a data management system with learned models. Traditional indexes are already models: for efficiency reasons it is common not to index every single key of the sorted records, but only the key of every n-th record. Using other types of models as indexes can provide benefits.
  6. INDEXES ARE CDF MODELS. An index is a model that takes a key as input and predicts the position of the record. A model that predicts the position of a key inside a sorted array approximates the cumulative distribution function: the position can be estimated as p ≈ F(key) × N, where F(key) is the estimated CDF of the data, i.e. the likelihood of observing a key smaller than or equal to the look-up key, and N is the number of keys (a minimal sketch of this idea follows the slide list).
  7. ISSUES... Decision trees in general are really good at overfitting the data with a few operations. A single neural net requires significantly more space and CPU time for the “last mile” search. B-Trees are extremely cache- and operation-efficient.
  8. THE LEARNING INDEX FRAMEWORK (LIF). Given a trained TensorFlow model, LIF automatically extracts all weights from the model and generates efficient index structures in C++. Designed for small models, with no unnecessary overhead.
  9. THE RECURSIVE MODEL INDEX. Challenge: accuracy for the last-mile search. We build a hierarchy of models: each model takes the key as input and, based on it, picks another model (see the RMI sketch after the slide list).
  10. THE RECURSIVE MODEL INDEX, 2. We iteratively train each stage with its own loss L_ℓ. We separate model size and complexity from execution cost. We effectively divide the key space into smaller sub-ranges, making it easier to achieve the required “last mile” accuracy.
  11. HYBRID MODELS. Top layer: a rectified linear unit (ReLU) neural net. At the bottom: thousands of simple, inexpensive linear regression models. Traditional B-Trees can be used at the bottom if the data is particularly hard to learn.
  12. DOES THIS STUFF WORK? Simple NNs can be trained efficiently using stochastic gradient descent, and a closed-form solution exists for linear multi-variate models (see the least-squares sketch after the slide list). The results are promising, but “learned indexes” might not be the best choice in every use case. A new way to think about indexing.
  13. ROBERTO@REALE.ME
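
The CDF view on slide 6 can be made concrete in a few lines of Python. The sketch below is purely illustrative and is not the implementation behind the deck or the cited paper: a single linear model is fitted to the keys' positions in a sorted array (i.e. to the scaled empirical CDF), and look-ups do a bounded "last-mile" binary search around the predicted position. The class name, the error-bound strategy and the lognormal test data are all assumptions made for this example.

    # Illustrative sketch: a learned index as a CDF model.
    import bisect
    import numpy as np

    class LearnedIndex:
        def __init__(self, keys):
            self.keys = np.sort(np.asarray(keys, dtype=float))
            n = len(self.keys)
            pos = np.arange(n)                          # true positions = empirical CDF * n
            # Fit position ~ a * key + b by least squares.
            self.a, self.b = np.polyfit(self.keys, pos, deg=1)
            pred = np.clip(np.round(self.a * self.keys + self.b), 0, n - 1).astype(int)
            # Worst-case prediction error defines the "last-mile" search window.
            self.max_err = int(np.max(np.abs(pred - pos)))

        def lookup(self, key):
            n = len(self.keys)
            pred = int(np.clip(round(self.a * key + self.b), 0, n - 1))
            lo, hi = max(0, pred - self.max_err), min(n, pred + self.max_err + 1)
            # Binary search only inside the error-bounded window.
            i = lo + bisect.bisect_left(self.keys[lo:hi].tolist(), key)
            return i if i < n and self.keys[i] == key else None

    keys = np.random.lognormal(size=100_000)
    idx = LearnedIndex(keys)
    probe = float(np.sort(keys)[1234])
    assert idx.lookup(probe) is not None

On skewed data such as this lognormal sample, a single linear model leaves a wide error window, which is exactly the problem the recursive model index is meant to shrink.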
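Slides 9–11 describe the Recursive Model Index and its hybrid variant. The two-stage sketch below is only in the spirit of that design, under illustrative assumptions: both stages are plain linear regressions fitted in closed form, the number of second-stage models (100) is an arbitrary choice, and the B-Tree fallback of the hybrid variant is omitted.

    # Illustrative two-stage Recursive Model Index (RMI) sketch.
    import bisect
    import numpy as np

    def fit_linear(x, y):
        # Closed-form least-squares fit; degenerate buckets fall back to a constant.
        if len(x) == 0:
            return 0.0, 0.0
        if np.ptp(x) == 0:
            return 0.0, float(np.mean(y))
        a, b = np.polyfit(x, y, deg=1)
        return float(a), float(b)

    class TwoStageRMI:
        def __init__(self, keys, num_second_stage=100):
            self.keys = np.sort(np.asarray(keys, dtype=float))
            self.n, self.m = len(self.keys), num_second_stage
            pos = np.arange(self.n)
            # Stage 1 routes each key to one of the m second-stage models.
            self.a1, self.b1 = fit_linear(self.keys, pos)
            buckets = np.clip((self.a1 * self.keys + self.b1) * self.m // self.n,
                              0, self.m - 1).astype(int)
            # Stage 2: one small model per sub-range, plus its own max error.
            self.stage2 = []
            for j in range(self.m):
                mask = buckets == j
                a, b = fit_linear(self.keys[mask], pos[mask])
                err = (int(np.max(np.abs(np.round(a * self.keys[mask] + b) - pos[mask])))
                       if mask.any() else 0)
                self.stage2.append((a, b, err))

        def lookup(self, key):
            j = int(np.clip((self.a1 * key + self.b1) * self.m // self.n, 0, self.m - 1))
            a, b, err = self.stage2[j]
            pred = int(np.clip(round(a * key + b), 0, self.n - 1))
            lo, hi = max(0, pred - err), min(self.n, pred + err + 1)
            i = lo + bisect.bisect_left(self.keys[lo:hi].tolist(), key)
            return i if i < self.n and self.keys[i] == key else None

Each second-stage model only has to be accurate inside its own sub-range, so the "last-mile" search windows shrink dramatically compared with the single-model sketch above.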
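Slide 12 notes that linear multi-variate models have a closed-form solution, in contrast to the SGD training needed for neural nets. A tiny numpy illustration of the normal equations follows; the data and dimensions are made up for the example.

    # Closed-form least squares for a linear multi-variate model (normal equations).
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3))                       # features
    y = X @ np.array([2.0, -1.0, 0.5]) + 3.0             # targets from a known linear model
    y += rng.normal(scale=0.01, size=1000)               # a little noise

    Xb = np.hstack([X, np.ones((len(X), 1))])            # append a bias column
    # w = (X^T X)^(-1) X^T y, solved without forming an explicit inverse
    w = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
    print(w)                                             # ~ [ 2.0, -1.0, 0.5, 3.0 ]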
