The Adventure: BlackRay as a Storage Engine

This is Felix' talk at the 2010 edition of MySQL Con in Santa Clara, CA

    Presentation Transcript

    • Presentation Agenda
      • Brief History
      • Technology Overview
      • The Storage Engine Adventure
      • Roadmap
      • The Team
      • Wrap-Up
    • Brief BlackRay History
    • What is BlackRay?
      • BlackRay is a relational, in-memory database
      • Supports SQL, utilizes PostgreSQL drivers
      • Fulltext (Tokenized) Search in Text fields
      • Object-Oriented API Support
      • Persistence via Files, Transaction support
      • Scalable and Fault Tolerant
      • Open Source, Open Community
      • Available under the GPLv2
    • BlackRay History
      • Concept of BlackRay was developed in 1999 for a Web Phone Directory at Deutsche Telekom
      • Development of current BlackRay started in 2005, as a Data Access Library
      • First Production Use at Deutsche Telekom in 2006
      • Evolution to Data Engine in 2007 and 2008
      • Open Source under GPLv2 since June 2009
    • Why Build Another Database?
      • Rather unique set of requirements:
        • Phone Directory with approx. 80 Million subscribers
        • All queries in the 10-500 Millisecond range
        • Approximately 2000 concurrent users
        • Over 500 Queries per Second (sustained)
        • The daily update must complete within several minutes
        • Index needs to be tokenized (SQL: CONTAINS)
        • Phonetic search
        • Extensive Wildcard queries (leading/midspan/trailing)
    • Decision to Implement BlackRay
      • Decision was made in mid-2005
      • Designed as a lightweight data access engine
      • Implementation in C++ for performance and maintainability
      • APIs for access across the network from different languages (Java, C++)
      • Feature set was designed for our specific business case in a particular project
    • Current Release
        Current 0.10.0 – Released December 2009
        • Complete rewrite of SQL Parser (boost::spirit2)
        • PostgreSQL client compatibility (via network protocol) to allow JDBC/ODBC... via PostgreSQL driver
        • Rewritten CLI tools
        • Major bugfixes (potential memory leaks)
        • Better Authentication support for Instances
    • Some more details
      • Written in C++
      • Relies heavily on boost
      • Compiles well with gcc and Sun Studio 12
      • Behaves well on Linux, Solaris, OpenSolaris and MacOS
      • Complete 64 Bit development and platform support
      • Use of cmake for multi platform support
    • Technology Overview
    • Why call it Data Engine?
      • BlackRay is a hybrid between a relational database and a search engine, thus we call it a „data engine“
      • Database features:
        • Relational structure, with Join between tables
        • Wildcards and index functions
        • SQL and JDBC/ODBC
      • Search Engine Features
        • Fulltext retrieval (token index)
        • Phonetic and similar approximation search
        • Extremely low latency
    • BlackRay Architecture (diagram; labels: Redo Log, Snapshots)
    • Data Universe
      • BlackRay features a 5-Perspective Index
        • Layer 1: Dictionary
        • Layer 2: Postings
        • Layer 3: Row Index
        • Layer 4: Multi-Token Layer
        • Layer 5: Multi-Value Layer
      • Layer 1 and 2 comprise a fully inverted Index
      • Statistics in this Index used for Query Plan Building
      • All data - index and raw output - are held in memory
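The first two layers can be pictured with a small sketch. This is only an illustration of a generic inverted index, not BlackRay's actual data structures: the class name, the whitespace tokenizer and the sample rows are all assumptions made for the example.

```python
from collections import defaultdict


class InvertedIndex:
    """Toy two-layer inverted index: a dictionary plus postings lists."""

    def __init__(self):
        self.dictionary = {}               # layer 1: token -> token id
        self.postings = defaultdict(list)  # layer 2: token id -> row ids

    def add_row(self, row_id, text):
        # Hypothetical tokenizer: lowercase and split on whitespace.
        # Rows are assumed to arrive in ascending row_id order, which
        # keeps every postings list sorted.
        for token in set(text.lower().split()):
            token_id = self.dictionary.setdefault(token, len(self.dictionary))
            self.postings[token_id].append(row_id)

    def lookup(self, token):
        token_id = self.dictionary.get(token.lower())
        return [] if token_id is None else self.postings[token_id]

    def match_count(self, token):
        # The kind of statistic a query planner could read from the index.
        return len(self.lookup(token))


index = InvertedIndex()
index.add_row(1, "Maier Hans Berlin")
index.add_row(2, "Meyer Anna Berlin")
index.add_row(3, "Schmidt Hans Hamburg")

print(index.lookup("berlin"))        # rows containing the token "berlin"
print(index.match_count("hamburg"))  # match count usable for query planning
```

The match counts come straight from the length of a postings list, which is one plausible reading of how statistics in the index can feed query plan building.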
    • Getting Data Into BlackRay
      • Loading requires a running BlackRay Instance
      • Schemas, Tables, and Indexes are created using an XML description language
      • Standard loader utility to load CSV data
      • Bulk loading is done with logging disabled (no single transactions for better performance)
      • Indexing is done with maximum degree of parallelism depending on CPUs
      • After all data is indexed, a snapshot can be taken
    • Basic Load Performance Data
      • German yellow pages and white pages data
        • 60 Million subscribers
        • 100 Million phone numbers
        • Raw data approx 16GB
      • Indexing performance
        • Total index size 11GB
        • Time to index: 33 Minutes, on dual 2GHz Xeon (Linux)
        • Updates: 500MB, 400K rows in approx 4 minutes
      • Time to load snapshot: 200 Seconds for 11GB (limited only by HDD read performance)
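For orientation, the figures above can be turned into rough throughput numbers. The calculation assumes decimal units (1 GB = 1000 MB), which the slide does not state, so the results are approximate.

```python
# Figures quoted on the slide; decimal units (1 GB = 1000 MB) are an assumption.
MB_PER_GB = 1000

snapshot_mb_per_s = 11 * MB_PER_GB / 200     # 11 GB snapshot loaded in 200 s
index_mb_per_s = 16 * MB_PER_GB / (33 * 60)  # 16 GB raw data indexed in 33 min
update_rows_per_s = 400_000 / (4 * 60)       # 400K rows updated in approx 4 min

print(f"snapshot load: {snapshot_mb_per_s:.0f} MB/s")
print(f"indexing:      {index_mb_per_s:.1f} MB/s of raw data")
print(f"updates:       {update_rows_per_s:.0f} rows/s")
```

A snapshot load rate of about 55 MB/s is consistent with the remark that loading is limited only by HDD read performance.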
    • Snapshots and Data Versioning
      • Persistence is done via file based snapshots
      • A Snapshot is one flat file
      • Snapshots consist of all schemas in one instance
      • Snapshots have a version number
      • To make a backup of the data, simply copy the snapshot file to a backup medium
      • It is possible to load an older snapshot: Data is version controlled if older snapshots are stored
      • Note: Currently Snapshots are not fully portable.
    • Transactions in BlackRay
      • BlackRay supports transactions via a Redo Log
      • All commands that modify data are logged if requested
      • In case of a crash, the latest snapshot will be loaded
      • Replay of the transaction log will then bring the database back to a consistent state
      • Redo Log is eliminated when a snapshot is persisted
      • For better performance snapshots should be taken periodically, ideally after each bulk update
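The recovery sequence described above (load the latest snapshot, replay the redo log, discard the log once a new snapshot is persisted) can be sketched in a few lines. This is a minimal illustration and not BlackRay's implementation: the `TinyStore` class, the JSON file formats and the file names are all hypothetical, and the sketch ignores fsync and atomic renames that a real engine would need.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy key-value store with a snapshot file and a redo log."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.redo_path = os.path.join(directory, "redo.log")
        self.data = {}
        self.recover()

    def recover(self):
        # Step 1: load the latest snapshot, if there is one.
        self.data = {}
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                self.data = json.load(f)
        # Step 2: replay every change logged since that snapshot.
        if os.path.exists(self.redo_path):
            with open(self.redo_path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

    def put(self, key, value):
        # Log the change first, then apply it in memory.
        with open(self.redo_path, "a") as f:
            f.write(json.dumps([key, value]) + "\n")
        self.data[key] = value

    def snapshot(self):
        # Persist the full state, then drop the now redundant redo log.
        with open(self.snapshot_path, "w") as f:
            json.dump(self.data, f)
        if os.path.exists(self.redo_path):
            os.remove(self.redo_path)


workdir = tempfile.mkdtemp()
store = TinyStore(workdir)
store.put("a", 1)
store.snapshot()
store.put("b", 2)               # exists only in memory and in the redo log
recovered = TinyStore(workdir)  # simulates a restart after a crash
print(recovered.data)
```

The longer the redo log grows, the longer the replay takes, which is why taking a snapshot after each bulk update keeps recovery short.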
    • Native Query APIs
      • C++, Java and Python Object Oriented APIs are available
      • Built with ICE (ZeroC) as the Network and Object Brokerage Protocol
      • Query Objects are constructed using an Object Builder
      • Execution via the network, Results as Objects
      • Load balancing and failover built into the protocol
      • Native support for Ruby, PHP and C# upcoming
    • Native Query APIs
      • Queries can use any combination of OR, AND
      • Index functions (phonetic, synonyms, stopwords) can be stacked
      • Token search (fulltext) and wildcard are also supported
      • Advantage of the APIs:
        • Minimized overhead
        • Very low latency
        • High availability
    • SQL Interface
      • BlackRay implements the PostgreSQL server socket interface
      • All PostgreSQL compatible frontends can be utilized against BlackRay
      • JDBC, ODBC, native drivers using socket interface are all supported
      • Limitations: SQL is only a reasonable subset of SQL92, Metadata is not yet fully implemented...
      • Currently not all index functions can be accessed via SQL statements
    • Management Features
      • Management Server acts as central broker for all Instances
      • Command line tools for administration
      • SNMP management:
        • Health check of Instances
        • Statistics, including access counters and performance measurements
    • Clustering and Replication
      • BlackRay supports a multi-node setup for fault tolerance
      • A cluster has N query nodes and a single update node
      • All query nodes are equal and require the full index data to be available
      • Index creation is done offline on the update node
      • Query nodes can reload snapshots to receive updated data, via shared storage or local copy
      • Load-balancing handled by native APIs
    • Administration
      • In-Memory Databases require very few administrative tasks
      • Configuration via one file per Instance
      • Disk layout etc. is of no importance
      • Backups are performed by copying snapshots
      • Recovery is done by restoring a snapshot
      • No daily administration required
      • SNMP allows remote supervision with common tools
    • Our Adventure: BlackRay As A Storage Engine
    • Why even bother?
      • In Fall 2009, we embarked on a little adventure to implement BlackRay as a storage engine
      • The old Engine had only a minimal SQL interface and we lacked the expertise to build it ourselves
      • Plugging into the MySQL ecosystem seemed like a very pleasant choice
      • The features of BlackRay would make it a good query cache for large disk tables.....
    • Our First Problem
        BlackRay does not support a simple table scan.....
        • It may seem strange, but due to its design as an in-memory index, we do not separate table and index
        • Each column index basically is the data of the column
        • BlackRay distinguishes select and output columns, both of which remain in RAM
        • The index therefore was never designed to be forced back into a row format, for a simple table walk
    • Possible Solution?
        So, we can walk the Index instead?
        • Rather than scanning the table, it is possible to scan the index instead
        • This only works for the columns marked „searchable“
        • Causes nasty errors when trying to select against result-only columns
        • In tokenized index columns, getting the data back out means concatenation with a blank between values – not nice, as tokenizing can follow complex rules
        • Requires Refactoring of our Layer 3 (Row-Index)
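The loss of information when walking a tokenized index can be shown with a small sketch. Everything here is an assumption made for illustration: the tokenizer rule, the idea of storing a token position with each posting, and the sample rows. It is not how BlackRay's Row Index works.

```python
import re


def tokenize(value):
    # Hypothetical rule: split on anything that is not a letter or digit.
    return [t for t in re.split(r"[^0-9A-Za-z]+", value.lower()) if t]


def build_index(rows):
    # token -> list of (row id, position of the token within the value)
    index = {}
    for row_id, value in rows.items():
        for position, token in enumerate(tokenize(value)):
            index.setdefault(token, []).append((row_id, position))
    return index


def scan_index(index):
    """Rebuild row values by walking every posting of every token."""
    rebuilt = {}
    for token, postings in index.items():
        for row_id, position in postings:
            rebuilt.setdefault(row_id, {})[position] = token
    # Tokens are glued back together with a single blank.
    return {row_id: " ".join(tokens[p] for p in sorted(tokens))
            for row_id, tokens in rebuilt.items()}


rows = {1: "Hans-Peter Maier", 2: "Dr. Anna Schmidt"}
scanned = scan_index(build_index(rows))
for row_id in rows:
    print(repr(rows[row_id]), "->", repr(scanned[row_id]))
```

The hyphen, the period and the original casing are gone after the round trip, so a "table scan" built this way returns values that differ from what was loaded.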
    • Next Issue
        Optimizing Queries
        • The BlackRay Optimizer uses the Layer 1/2 (Inverted Index) and Layer 4 (Multi-Tokens) Data to choose a Query Path
        • In BlackRay, „SELECT text FROM t WHERE text LIKE '*pattern1*' AND text LIKE '*pattern2*'“ is extremely efficient as the inverted index has all the data
        • Even with OR this is an efficient Query, because we can immediately choose the smaller query first and eliminate double matches
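Why AND and OR are cheap on an inverted index can be sketched as plain set operations over postings lists. The sketch assumes each pattern has already been resolved to a list of matching row ids; the function names and the sample lists are hypothetical and do not reflect BlackRay's optimizer code.

```python
def intersect(postings_lists):
    """AND: start from the shortest list so the candidate set stays small."""
    if not postings_lists:
        return []
    ordered = sorted(postings_lists, key=len)
    result = set(ordered[0])
    for postings in ordered[1:]:
        result &= set(postings)
        if not result:
            break  # nothing can match any more, stop early
    return sorted(result)


def union(postings_lists):
    """OR: merge the lists, keeping each row only once."""
    result = set()
    for postings in sorted(postings_lists, key=len):
        result.update(postings)
    return sorted(result)


# Hypothetical row ids matching each LIKE pattern.
pattern1_rows = [1, 2, 5, 8, 9]
pattern2_rows = [2, 9]

print(intersect([pattern1_rows, pattern2_rows]))  # rows matching both patterns
print(union([pattern1_rows, pattern2_rows]))      # rows matching either pattern
```

Ordering by list length needs only the match counts, which the index already holds, so the cheap sub-query can be picked before any row data is touched.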
    • Next Issue
      • Optimizing in the Storage Engine Interface?
        • In BlackRay, the Optimizer uses the AST from the SQL Parser to figure out what to optimize
        • At the level of a single field or index, the number of matches is not really useful
        • Without utilizing the Layer2 and Layer4 structures, we lose performance by several orders of magnitude
        • Personal Opinion: The MySQL Optimizer really seems to like table scans, and tricking it with random vs sequential read cost did not do the trick
    • Functions in the Index
      • Columns can take functions to be used on the data upon indexing, and when select is carried out
      • The most common functions are
          • TOKENIZE – to support multi-token indexes
          • PHONETIC – match against defined phonetic rules
          • ALIAS – match a token against words with similar meaning
      • Internally these functions could be considered Meta-Columns on the Index
      • To be able to choose the proper column, we need to know what function was used in the select
    • Functions in the Index
        Consider this Query:
          SELECT text FROM t WHERE text LIKE fx_phonetic('maier')
      • Functions can take more than one parameter, and may be nested
      • We could not quite figure out how to explain this to the MySQL Parser
      • The function data would need to be available to the Index to choose where to look
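The idea of index functions as meta-columns can be sketched as one small index per function, with the select picking the index that belongs to the function named in the query. This is an illustration only: the `Column` class and the `toy_phonetic` rule are invented for the example and are not BlackRay's phonetic rules or API.

```python
def toy_phonetic(word):
    """Toy phonetic key, NOT BlackRay's rules: treats ai/ay/ei/ey alike."""
    return word.lower().replace("y", "i").replace("ei", "ai")


class Column:
    """A column that keeps one index ("meta-column") per index function."""

    def __init__(self, functions):
        self.functions = functions
        # function name -> key produced by that function -> row ids
        self.meta = {name: {} for name in functions}

    def add(self, row_id, value):
        # The functions are applied to the data at indexing time ...
        for name, fn in self.functions.items():
            self.meta[name].setdefault(fn(value), []).append(row_id)

    def select(self, function_name, argument):
        # ... and again to the search argument at select time, so the
        # engine has to know which function the query asked for.
        fn = self.functions[function_name]
        return self.meta[function_name].get(fn(argument), [])


text = Column({"exact": str.lower, "phonetic": toy_phonetic})
for row_id, name in enumerate(["Maier", "Meyer", "Schmidt", "Mayer"], start=1):
    text.add(row_id, name)

print(text.select("phonetic", "maier"))  # all spellings that sound alike
print(text.select("exact", "maier"))     # only the literal spelling
```

If the function name never reaches the index, as in a storage engine interface that only passes plain column predicates, there is no way to decide between the "exact" and "phonetic" lookups.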
    • Threading Models...
      • BlackRay has a highly optimized Threading Model
      • In RAM, we do not expect I/O-waits, so a model of two dedicated Threads per CPU core works really well
      • Locking in the Index is built around this model
      • „One Thread per Connection“ requires at least a careful review of the way critical data structures are accessed
    • .... Our Conclusion
      • Currently, BlackRay really does not fit too well into the storage engine architecture
      • Did we lose all hope? Absolutely not.....
      • BlackRay Applications could really benefit from being able to utilize MySQL features, including the Archive Engine as well as temporary tables in Heap
      • Thanks to the excellent Blog and postings by Brian Aker, which allowed us to not make all beginner mistakes ourselves
    • Lessons Learned
      • We need to spend more time on this topic to get the issues worked out
      • Most of the issues require a significant change to the way BlackRay works
      • Without modifications of the way the Optimizer can be interacted with, this makes very little sense
    • Possible Alternatives (Best Bet)
      • BlackRay supports the PostgreSQL interfaces
      • Queries could be fed through a BlackRay frontend, passing parts of the Query to PostgreSQL (or even MySQL) – this can be done with a rather simple JDBC intermediate layer
      • It might be even more straightforward to modify the MySQL SQL processor to directly interact with the BlackRay core, allowing better interaction with the BlackRay Optimizer
    • Project Roadmap
    • Immediate Roadmap
      • Planned 0.11.0 – Due in July 2010
        • Support for ad-hoc data manipulation via SQL (INSERT/UPDATE/DELETE)
        • Pluggable Function architecture (loadable libraries)
        • Make all index functions available in SQL
        • Support for Prepared Statements (ODBC/JDBC)
        • Improved thread and memory management (Perftools?)
      • BlackRay Admin Console (Remora) 0.11
        • Engine Statistics via GUI
        • Cluster Node management
    • Midterm Roadmap
      • Scalability Features
        • Sharding & Partitioning Options
        • Federated Search
      • Fully portable snapshot format (across platforms)
      • Query Performance Analyzer
      • Improved Statistics Module with GUI
      • Removal of ICE as a core component
      • BlackRay as a Storage Backend for SUN OpenDS LDAP Engine
    • Midterm Roadmap
      • Security Features
        • Improved User and Access Control concepts
        • SSL for all connections
        • External User Store (LDAP/OpenSSO/PAM...)
      • Increased Platform support
        • Windows 7 and Windows Server platforms
        • Embedded platforms
      • Other, random features by popular request.
    • The Team behind BlackRay
    • SoftMethod GmbH
      • SoftMethod GmbH initiated the project in 2005
      • Company was founded in 2004 and currently has 10 employees
      • Focus of SoftMethod is high performance software engineering
      • Product portfolio includes telco/contact center and LDAP applications
      • SoftMethod also offers load testing and technical software quality assurance support.
    • Development Team
      • SoftMethod GmbH initiated the project in 2005
        • Thomas Wunschel, Architect and Lead Developer
        • Felix Schupp, Initiator and Project Sponsor
        • Mike Alexeev, Key Contributor (SQL/Functions)
        • Souvik Roy, Performance Analysis and Tools
    • Wrap-Up
    • What to do next
      • Get BlackRay:
        • Register yourself on http://forge.softmethod.de
        • SVN checkout available at http://svn.softmethod.de/opensource/blackray/trunk
      • Get Involved
        • Anyone can register and create tickets, news etc
        • We have an active mailing list for discussion as well
      • Contribute
        • We require a signed Contributor agreement before granting commit access to the repository
    • Contact Us
      • Website: http://www.blackray.org
      • Twitter: http://twitter.com/dataengine
      • Facebook: http://facebook.com/dataengine
      • Mailing List: http://lists.softmethod.de
      • Download: http://sourceforge.net/projects/blackray
      • Felix: [email_address]
      • Thomas: [email_address]