Presentation Agenda


➔
    Brief History
➔
    Technology Overview
➔
    Positioning towards other Projects
➔
    Roadmap
➔
    The Team
➔
    Wrap-Up



2
Brief BlackRay History
What is BlackRay?
●
        BlackRay is a relational, in-memory database
●
        SQL Support
●
        Fulltext (Tokenized) Search in Text fields
●
        Object-Oriented API Support
●
        Persistence via Files
●
        Scalable and Fault Tolerant
●
        Open Source, Open Community
●
        Dual licensed: GPL and Proprietary

    4
BlackRay History
●
        Concept of BlackRay was developed in 1999 for a
        Web Phone Directory at Deutsche Telekom
●
        Development of current BlackRay started in 2005,
        as a Data Access Library
●
        Production Use at Deutsche Telekom in 2006
●
        Evolution from Library to Data Engine in 2007 and
        2008
●
        Open Source under GPLv2/Dual License since June
        2009

    5
Why Build Another Database?
●
        Rather unique set of requirements:
        ●
            Phone Directory with approx. 80 Million subscribers
        ●
            All queries in the 10-500 Millisecond range
        ●
            Approximately 2000 concurrent users
        ●
            Over 500 Queries per Second (sustained)
        ●
            Daily updates must complete within a few minutes
        ●
            Index needs to be tokenized (SQL: CONTAINS)
        ●
            Phonetic search
        ●
            Extensive Wildcard queries (leading/midspan/trailing)


    6
Options available in 1999
●
        Pretty much no options available:
        ●
            ORACLE and DB2 did not support Tokenized Index
        ●
            MySQL would not support the table size required


        The first implementation in C was an embedded
        version of what is now called BlackRay.
        The implementation was done on Solaris with 64 Bit
        Ultrasparc (Sun Ultra 10)



    7
Options evaluated in 2005
●
        ORACLE 10g with Oracle Text
        ●
            Worst case search performance in the tens of minutes
        ●
            Updates of the 1TB+ index take hours or days
●
        FAST Search and Transfer (now Microsoft)
        ●
            Extremely hardware intensive, 48 Sun V280 Servers
            required
        ●
            Not transactional, loses data on indexing
●
        PERST by McObject
●
        Thunderstone
        ●
            Well, let's say Deutsche Telekom could not afford it.
    8
Decision to Implement BlackRay
●
        Decision was made in mid-2005
●
        Designed as a lightweight data access engine
●
        Implementation in C++ for performance and
        maintainability
●
        APIs for access across the network from different
        languages (Java, C++)
●
        Feature set was designed for our specific business
        case in a particular project


    9
Current Release
●
     Current release: 0.9.0, released on June 12th, 2009
●
     Production level quality
●
     All relevant index functions for large scale search
     applications
●
     APIs in C++ and Java fully functional
●
     SQL still covers only a small subset of the API functionality




10
Technology Overview
Why call it Data Engine?
●
     BlackRay is a hybrid between a relational database
     and a search engine → thus we call it a "data engine"
●
     Database features:
     ●
         Relational structure, with Join between tables
     ●
         Wildcards and index functions
     ●
         SQL and JDBC/ODBC
●
     Search Engine Features
     ●
         Fulltext retrieval (token index)
     ●
         Phonetic and similar approximation search
     ●
         Extremely low latency
12
Why call it Data Engine?
●
     BlackRay supports the combination of Search
     Engine and Database features:
     ●
         Full-text queries with wildcards and phonetics....
     ●
         Multiple columns with different data types
     ●
         Joining tables to better organize data




13
BlackRay Architecture

●
     Clients: Postgres* clients, C++ API, Java API
●
     Server components: SQL Interface, Instance Server, Management Server
●
     Persistence: Redo Log, Snapshots
●
     Data Universe with 5-Perspective Index
     ●
         L1: Dictionary
     ●
         L2: Postings
     ●
         L3: Row Index
     ●
         L4: Multi-Tokens
     ●
         L5: Multi-Values



14
Hierarchical Model
●
     Each BlackRay node can hold many Instances
●
     One Management process per node
●
     Each Instance runs in a separate Process
●
     Instances are completely separated from each other
●
     Snapshots (for persistence) are taken on an
     Instance level
●
     In an Instance, Schemas and Tables can be created
●
     Queries can span across Tables and Schemas


15
Getting Data Into BlackRay
●
     Once an Instance is created, data can be loaded
●
     Schemas, Tables, and Indexes are created using an
     XML description language
●
     Standard loader utility to load CSV data
●
     Bulk loading is done with logging disabled
●
     Indexing is done with maximum degree of
     parallelism depending on CPUs
●
     After all data is indexed, a snapshot can be taken


16
Basic Load Performance Data
●
     German yellowpage and whitepage data
     ●
         60 Million subscribers
     ●
         100 Million phone numbers
     ●
         Raw data approx 16GB
●
     Indexing performance
     ●
         Total index size 11GB
     ●
         Time to index: 40 Minutes, on dual 2GHz Xeon (Linux)
     ●
         Updates: 300MB, 200K rows in approx 5 minutes
●
     Time to load snapshot: 3.5 Minutes for 11GB

17
Data Universe
●
     BlackRay features a 5-Perspective Index
     ●
         Layer 1: Dictionary
     ●
         Layer 2: Postings
     ●
         Layer 3: Row Index
     ●
         Layer 4: Multi-Token Layer
     ●
         Layer 5: Multi-Value Layer
●
     Layer 1 and 2 comprise a fully inverted Index
●
     Statistics in this Index used for Query Plan Building
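
A minimal sketch of how a dictionary layer and a postings layer can form an inverted index, and how posting-list lengths can serve as statistics for ordering an AND query. The class and method names are hypothetical and the slides do not describe BlackRay's internal structures at this level; layers 3 to 5 are not modeled.

```python
from collections import defaultdict

class InvertedIndex:
    """Toy two-layer index: a dictionary of tokens and their postings."""

    def __init__(self):
        self.dictionary = {}               # token -> token id (layer 1)
        self.postings = defaultdict(list)  # token id -> row ids (layer 2)

    def add(self, row_id, text):
        for token in set(text.lower().split()):
            token_id = self.dictionary.setdefault(token, len(self.dictionary))
            self.postings[token_id].append(row_id)

    def frequency(self, token):
        """Statistic for planning: number of rows containing the token."""
        token_id = self.dictionary.get(token)
        return 0 if token_id is None else len(self.postings[token_id])

    def rows(self, token):
        token_id = self.dictionary.get(token)
        return set() if token_id is None else set(self.postings[token_id])

    def search_all(self, tokens):
        """AND query: start with the rarest token to keep intermediate sets small."""
        plan = sorted(tokens, key=self.frequency)
        if not plan:
            return plan, []
        result = self.rows(plan[0])
        for token in plan[1:]:
            if not result:
                break
            result &= self.rows(token)
        return plan, sorted(result)

index = InvertedIndex()
texts = ["Anna Meyer Berlin", "Peter Schmidt Berlin",
         "Anna Schmidt Hamburg", "Anna Schulz Berlin"]
for row_id, text in enumerate(texts):
    index.add(row_id, text)

plan, hits = index.search_all(["berlin", "anna", "meyer"])
print(plan)  # "meyer" comes first because it is the rarest token
print(hits)  # [0]
```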


18
Snapshots and Data Versioning
●
     Persistence is done via file based snapshots
●
     Snapshots consist of all schemas in one instance
●
     Snapshots have a version number
●
     To make a backup of the data, simply copy the
     snapshot file to a backup media
●
     It is possible to load an older snapshot: Data is
     version controlled if older snapshots are stored




19
Transactions in BlackRay
●
     BlackRay supports transactions via a Redo Log
●
     All commands that modify data are logged if
     requested
●
     In case of a crash, the latest snapshot will be loaded
●
     Replay of the transaction log will then bring the
     database back to a consistent state
●
     Redo Log is eliminated when a snapshot is persisted
●
     For better performance snapshots should be taken
     periodically, ideally after each bulk update
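
The recovery scheme described above (load the latest snapshot, then replay the redo log) can be sketched with a toy key-value store. This is an assumption-laden illustration and not BlackRay code: the `TinyStore` class, the JSON file formats, and the file names are all hypothetical.

```python
import json
import os
import tempfile

class TinyStore:
    """Toy in-memory store with a snapshot file and a redo log."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.log_path = os.path.join(directory, "redo.log")
        self.data = {}
        self.recover()

    def recover(self):
        """Load the latest snapshot, then replay logged changes on top of it."""
        self.data = {}
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                self.data = json.load(f)
        if os.path.exists(self.log_path):
            with open(self.log_path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value

    def put(self, key, value):
        """Log the change durably before applying it in memory."""
        with open(self.log_path, "a") as f:
            f.write(json.dumps([key, value]) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self.data[key] = value

    def snapshot(self):
        """Persist the full state, then discard the now redundant redo log."""
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.data, f)
        os.replace(tmp_path, self.snapshot_path)
        if os.path.exists(self.log_path):
            os.remove(self.log_path)

with tempfile.TemporaryDirectory() as directory:
    store = TinyStore(directory)
    store.put("a", 1)
    store.snapshot()
    log_removed = not os.path.exists(store.log_path)
    store.put("b", 2)                 # exists only in the redo log
    recovered = TinyStore(directory)  # simulates a restart after a crash
    recovered_data = dict(recovered.data)

print(recovered_data)  # {'a': 1, 'b': 2}
```

The longer the redo log grows, the more work a restart has to replay, which is why taking a snapshot after each bulk update keeps recovery short.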

20
Query APIs
●
     C++ and Java Object Oriented APIs are available
●
     Built with ICE (ZeroC) as the Network and Object
     Brokerage Protocol
●
     Query Objects are constructed using an Object
     Builder
●
     Execution via the network, Results as Objects
●
     Load balancing and failover built into the protocol
●
     Bindings for Python are already done; C#, Ruby and
     PHP are in the works

21
Query APIs
●
     Queries can use any combination of OR, AND
●
     Index functions (phonetic, synonyms, stopwords)
     can be stacked
●
     Token search (fulltext) and wildcard are also
     supported
●
     Advantage of the APIs:
     ●
         Minimized overhead
     ●
         Very low latency
     ●
         High availability
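
"Stacking" index functions can be pictured as function composition over a token list. The sketch below is a hypothetical illustration and does not show the BlackRay API: the stopword list, the synonym table, and the deliberately crude `phonetic_key` are assumptions chosen to keep the example short.

```python
STOPWORDS = {"the", "of", "and"}              # hypothetical stopword list
SYNONYMS = {"dr": "doctor", "doc": "doctor"}  # hypothetical synonym table

def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def apply_synonyms(tokens):
    return [SYNONYMS.get(t, t) for t in tokens]

def phonetic_key(token):
    """Crude key: keep the first letter, drop later vowels, collapse repeats."""
    key = token[0]
    for ch in token[1:]:
        if ch in "aeiouy" or ch == key[-1]:
            continue
        key += ch
    return key

def phonetic(tokens):
    return [phonetic_key(t) for t in tokens]

def stack(*functions):
    """Compose index functions so each consumes the previous one's output."""
    def pipeline(text):
        tokens = text.lower().split()
        for function in functions:
            tokens = function(tokens)
        return tokens
    return pipeline

normalize = stack(remove_stopwords, apply_synonyms, phonetic)
first = normalize("The Office of Dr Meier")
second = normalize("office doc Meyer")
print(first)   # ['ofc', 'dctr', 'mr']
print(second)  # ['ofc', 'dctr', 'mr']
```

Because the indexed text and the query text pass through the same stack, differently spelled inputs end up with the same keys and can match.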

22
Management Features
●
     Management Server acts as central broker for all
     Instances
●
     Command line tools for administration
●
     SNMP management:
     ●
         Health check of Instances
     ●
         Statistics, including access counters and performance
         measurements




23
Administration Requirements
●
     In-Memory Databases require very few
     administrative tasks
●
     Configuration via one file per Instance
●
     Disk layout and similar concerns are of no importance
●
     Backups are performed by copying snapshots
●
     Recovery is done by restoring a snapshot
●
     No daily administration required
●
     SNMP allows remote supervision with common tools


24
Some more details
●
     Written in C++
●
     Relies heavily on boost
●
     Compiles well with gcc and Sun Studio 12
●
     Behaves well on Linux, Solaris, OpenSolaris and
     MacOS
●
     Complete 64 Bit development
●
     Use of cmake for multi platform support



25
Positioning BlackRay
BlackRay and other Projects
●
     BlackRay is being positioned as a Query intensive
     database addition
●
     Ideally suited where updates are done in Bulk and
     searches outweigh updates by many orders of
     magnitude
●
     Wildcards come with little overhead: No overhead
     for trailing wildcard, some overhead for leading and
     midspan wildcard
●
     Good match when index functions such as phonetic
     or word rotation/position search combined with
     relational data are required
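
One common reason trailing wildcards are cheap is that a sorted token dictionary turns `prefix*` into a range lookup, while leading and midspan wildcards need extra work. The sketch below illustrates that general idea only; the slides do not say how BlackRay implements wildcards, and the dictionary contents and function names are hypothetical.

```python
import bisect
import fnmatch

# Hypothetical sorted dictionary of indexed tokens.
DICTIONARY = sorted(["berg", "berger", "bergmann", "hamburg",
                     "meier", "meyer", "neuberg"])

def trailing_wildcard(prefix):
    """'prefix*': two binary searches bound the matching range."""
    low = bisect.bisect_left(DICTIONARY, prefix)
    high = bisect.bisect_left(DICTIONARY, prefix + "\uffff")
    return DICTIONARY[low:high]

def general_wildcard(pattern):
    """Leading or midspan wildcard: this naive version scans every token."""
    return [t for t in DICTIONARY if fnmatch.fnmatchcase(t, pattern)]

print(trailing_wildcard("berg"))   # ['berg', 'berger', 'bergmann']
print(general_wildcard("*berg"))   # ['berg', 'neuberg']
print(general_wildcard("me*er"))   # ['meier', 'meyer']
```

The full scan in `general_wildcard` is the naive fallback; real engines typically reduce that cost with additional index structures.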
27
OS Projects with similar Goals
●
     Relational Databases (the usual suspects):
     ●
         MySQL, MariaDB, Drizzle....
     ●
         PostgreSQL...
     ●
         Many more alternatives, including embedded etc....
●
     Fulltext Search:
     ●
         Sphinx
     ●
         Lucene
●
     In-Memory Databases:
     ●
         FastDB
     ●
         HSQLDB/H2 (Java → Garbage collection issues....)
28
Commercial Alternatives
●
     Commercial In-Memory Databases
     ●
         ORACLE/TimesTen (Acquired by ORACLE in 2005)
     ●
         IBM/SolidDB (Acquired by IBM in 2007)
     ●
         VoltDB (No real data available as of yet)
     ●
         eXtremeDB (embedded use only)

●
     Dual-Licensed Alternatives
     ●
         CSQL (Open Source Version is severely crippled)
     ●
         MySQL with memcached

29
Is it the right thing for me?
●
     BlackRay is not designed as a 100% RDBMS
     replacement
●
     Questions to ask:
     ●
         Do I need ad-hoc data updates, or are updates done in
         bulk?
     ●
         How important are fulltext search and extensive
         wildcards?
     ●
         How large is my data? Gigabytes: OK. Terabytes: Not yet
     ●
         Do I need a relational data model?
     ●
         Is SQL an important feature?

30
BlackRay will fit you well when...

     … searches outweigh updates
     … data is updated in bulk
     … needs to be available quickly
     … you have Gigabytes not Terabytes
     … you need lots of SELECTs
     … SQL is necessary
     … a relational data model is required (JOIN)
     … source code must be available

31
Project Roadmap
Immediate Roadmap
●
     Pending immediate release:
     ●
         BlackRay Admin Console (Remora) 0.1.0
●
     Upcoming 0.10.0
     ●
         Rewrite of SQL Parser
     ●
         PostgreSQL client compatibility (via network protocol) to
         allow JDBC/ODBC... via PostgreSQL driver
     ●
         Rewritten CLI tools
     ●
         Some bugfixes (potential memory leaks)
     ●
         Authentication for Instances


33
Midterm Roadmap
●
     Data manipulation/Query Features
     ●
         Ad-hoc INSERT/UPDATE/DELETE support
     ●
         Aggregate functions for SELECT
●
     Scalability Features
     ●
         Sharding & Partitioning Options
     ●
         Federated Search
●
     Security Features
     ●
         Improved User and Access Control concepts
     ●
         SSL for all connections
●
     Improved Statistics Module
34
Longterm Roadmap
●
     Integration with other DBMSs
     ●
         Storage Engine for MariaDB → Depends on potential
         modification needs of the storage engine interfaces
     ●
         Trigger-based updates to support BlackRay as a Query
         cache instance over a regular DBMS
●
     Standalone Engine Improvements
     ●
         SQL92 compliance: SUBSELECT/UNION support
     ●
         Triggers in BlackRay




35
The Team behind BlackRay
Development Team
●
     SoftMethod Core Team
     ●
         Thomas Wunschel – Lead Developer
     ●
         Felix Schupp – Project Sponsor
     ●
         Andreas Meyer – Documentation, Porting
     ●
         Frank Fiedler
●
     Outside Contributors
     ●
         Mike Alexeev – Senior Developer
     ●
         Andreas Strafner – Developer, Porting to AIX



37
Thomas Wunschel
●
     Director of Development, SoftMethod GmbH
●
     Almost 10 years of development experience
●
     Involved with BlackRay and its applications since
     2005
●
     Currently involved in the Network Protocol Stack
●
     Lead Designer and Decision Lead for new Features




38
Mike Alexeev
●
     Senior Software Developer
●
     Over 10 years of C++ experience
●
     First outside committer to BlackRay
●
     Currently involved in rewriting the SQL grammar to
     support all index features available via the APIs




39
Felix Schupp
●
     Managing Director, SoftMethod GmbH
●
     Over 10 years of commercial software development
●
     Designer of first BlackRay predecessor in 1999
●
     Project sponsor and spokesperson
●
     Responsible for funding and applications
●
     Guide and coach in the development process




40
Wrap-Up
What to do next
●
     Get BlackRay:
     ●
         Register yourself on http://forge.softmethod.de
     ●
         SVN checkout available at
         http://svn.softmethod.de/opensource/blackray/trunk
●
     Get Involved
     ●
         Anyone can register and create tickets, news etc
     ●
         We have an active mailing list for discussion as well
●
     Contribute
     ●
         A signed Contributor agreement is required before
         commit access to the repository is granted
42
Contact Us
●
     Website:     http://www.blackray.org
●
     Twitter:     http://twitter.com/dataengine
●
     Facebook:    http://facebook.com/dataengine
●
     Mailing List: http://lists.softmethod.de
●
     Download:    http://sourceforge.net/projects/blackray

●
     Felix:       felix.schupp@softmethod.de
●
     Thomas:      thomas.wunschel@softmethod.de

43

More Related Content

What's hot

Koichi Suzuki - Postgres-XC Dynamic Cluster Management @ Postgres Open
Koichi Suzuki - Postgres-XC Dynamic Cluster  Management @ Postgres OpenKoichi Suzuki - Postgres-XC Dynamic Cluster  Management @ Postgres Open
Koichi Suzuki - Postgres-XC Dynamic Cluster Management @ Postgres OpenPostgresOpen
 
Introduction to Flink Streaming
Introduction to Flink StreamingIntroduction to Flink Streaming
Introduction to Flink Streamingdatamantra
 
Oracle to Postgres Migration - part 1
Oracle to Postgres Migration - part 1Oracle to Postgres Migration - part 1
Oracle to Postgres Migration - part 1PgTraining
 
Postgres Vision 2018: WAL: Everything You Want to Know
Postgres Vision 2018: WAL: Everything You Want to KnowPostgres Vision 2018: WAL: Everything You Want to Know
Postgres Vision 2018: WAL: Everything You Want to KnowEDB
 
Introduction to Datasource V2 API
Introduction to Datasource V2 APIIntroduction to Datasource V2 API
Introduction to Datasource V2 APIdatamantra
 
Introducing Infinispan
Introducing InfinispanIntroducing Infinispan
Introducing InfinispanPT.JUG
 
Python Ireland Conference 2016 - Python and MongoDB Workshop
Python Ireland Conference 2016 - Python and MongoDB WorkshopPython Ireland Conference 2016 - Python and MongoDB Workshop
Python Ireland Conference 2016 - Python and MongoDB WorkshopJoe Drumgoole
 
MariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introductionMariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introductionMariaDB plc
 
Productionalizing a spark application
Productionalizing a spark applicationProductionalizing a spark application
Productionalizing a spark applicationdatamantra
 
State management in Structured Streaming
State management in Structured StreamingState management in Structured Streaming
State management in Structured Streamingdatamantra
 
Java High Level Stream API
Java High Level Stream APIJava High Level Stream API
Java High Level Stream APIApache Apex
 
Introducing the ultimate MariaDB cloud, SkySQL
Introducing the ultimate MariaDB cloud, SkySQLIntroducing the ultimate MariaDB cloud, SkySQL
Introducing the ultimate MariaDB cloud, SkySQLMariaDB plc
 
Anatomy of Data Frame API : A deep dive into Spark Data Frame API
Anatomy of Data Frame API :  A deep dive into Spark Data Frame APIAnatomy of Data Frame API :  A deep dive into Spark Data Frame API
Anatomy of Data Frame API : A deep dive into Spark Data Frame APIdatamantra
 
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Apex
 
From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017Apache Apex
 
Intro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big DataIntro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big DataApache Apex
 
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020Taro L. Saito
 
Reading The Source Code of Presto
Reading The Source Code of PrestoReading The Source Code of Presto
Reading The Source Code of PrestoTaro L. Saito
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformApache Apex
 
When is MyRocks good?
When is MyRocks good? When is MyRocks good?
When is MyRocks good? Alkin Tezuysal
 

What's hot (20)

Koichi Suzuki - Postgres-XC Dynamic Cluster Management @ Postgres Open
Koichi Suzuki - Postgres-XC Dynamic Cluster  Management @ Postgres OpenKoichi Suzuki - Postgres-XC Dynamic Cluster  Management @ Postgres Open
Koichi Suzuki - Postgres-XC Dynamic Cluster Management @ Postgres Open
 
Introduction to Flink Streaming
Introduction to Flink StreamingIntroduction to Flink Streaming
Introduction to Flink Streaming
 
Oracle to Postgres Migration - part 1
Oracle to Postgres Migration - part 1Oracle to Postgres Migration - part 1
Oracle to Postgres Migration - part 1
 
Postgres Vision 2018: WAL: Everything You Want to Know
Postgres Vision 2018: WAL: Everything You Want to KnowPostgres Vision 2018: WAL: Everything You Want to Know
Postgres Vision 2018: WAL: Everything You Want to Know
 
Introduction to Datasource V2 API
Introduction to Datasource V2 APIIntroduction to Datasource V2 API
Introduction to Datasource V2 API
 
Introducing Infinispan
Introducing InfinispanIntroducing Infinispan
Introducing Infinispan
 
Python Ireland Conference 2016 - Python and MongoDB Workshop
Python Ireland Conference 2016 - Python and MongoDB WorkshopPython Ireland Conference 2016 - Python and MongoDB Workshop
Python Ireland Conference 2016 - Python and MongoDB Workshop
 
MariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introductionMariaDB Enterprise Tools introduction
MariaDB Enterprise Tools introduction
 
Productionalizing a spark application
Productionalizing a spark applicationProductionalizing a spark application
Productionalizing a spark application
 
State management in Structured Streaming
State management in Structured StreamingState management in Structured Streaming
State management in Structured Streaming
 
Java High Level Stream API
Java High Level Stream APIJava High Level Stream API
Java High Level Stream API
 
Introducing the ultimate MariaDB cloud, SkySQL
Introducing the ultimate MariaDB cloud, SkySQLIntroducing the ultimate MariaDB cloud, SkySQL
Introducing the ultimate MariaDB cloud, SkySQL
 
Anatomy of Data Frame API : A deep dive into Spark Data Frame API
Anatomy of Data Frame API :  A deep dive into Spark Data Frame APIAnatomy of Data Frame API :  A deep dive into Spark Data Frame API
Anatomy of Data Frame API : A deep dive into Spark Data Frame API
 
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache ApexApache Big Data EU 2016: Building Streaming Applications with Apache Apex
Apache Big Data EU 2016: Building Streaming Applications with Apache Apex
 
From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017From Batch to Streaming with Apache Apex Dataworks Summit 2017
From Batch to Streaming with Apache Apex Dataworks Summit 2017
 
Intro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big DataIntro to Apache Apex @ Women in Big Data
Intro to Apache Apex @ Women in Big Data
 
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
 
Reading The Source Code of Presto
Reading The Source Code of PrestoReading The Source Code of Presto
Reading The Source Code of Presto
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
When is MyRocks good?
When is MyRocks good? When is MyRocks good?
When is MyRocks good?
 

Viewers also liked

2013-14 NLA Annual Report_R6
2013-14 NLA Annual Report_R62013-14 NLA Annual Report_R6
2013-14 NLA Annual Report_R6Lauren Elrick
 
Hawaiian Community Resources
Hawaiian Community ResourcesHawaiian Community Resources
Hawaiian Community Resourceshonl
 
20090326 Speech Ceal Cjm Presentation.Ver
20090326 Speech Ceal Cjm Presentation.Ver20090326 Speech Ceal Cjm Presentation.Ver
20090326 Speech Ceal Cjm Presentation.Ver真 岡本
 
Euromoney Indices Brochure
Euromoney Indices BrochureEuromoney Indices Brochure
Euromoney Indices BrochureLilian Mettey
 
History of Computer Hindi Notes
History of Computer Hindi NotesHistory of Computer Hindi Notes
History of Computer Hindi NotesSirajRock
 
Hsinchun Chen.doc.doc
Hsinchun Chen.doc.docHsinchun Chen.doc.doc
Hsinchun Chen.doc.docbutest
 
Gaucheで本を作る
Gaucheで本を作るGaucheで本を作る
Gaucheで本を作るguest7a66b8
 
Community Service Directory 2011
Community Service Directory 2011Community Service Directory 2011
Community Service Directory 2011fewjapan
 
Manual mi primera página en html pre
Manual   mi primera página en html preManual   mi primera página en html pre
Manual mi primera página en html prejtk1
 
著作権について
著作権について著作権について
著作権についてk plus
 
From GNETS to Home School
From GNETS to Home SchoolFrom GNETS to Home School
From GNETS to Home Schooleeniarrol
 
Dpiap open house_summarize_12_02_02
Dpiap open house_summarize_12_02_02Dpiap open house_summarize_12_02_02
Dpiap open house_summarize_12_02_02Ghulamnabi Nizamani
 

Viewers also liked (20)

2013-14 NLA Annual Report_R6
2013-14 NLA Annual Report_R62013-14 NLA Annual Report_R6
2013-14 NLA Annual Report_R6
 
NetSuite One World
NetSuite One WorldNetSuite One World
NetSuite One World
 
Dng de-01012014
Dng de-01012014Dng de-01012014
Dng de-01012014
 
Hawaiian Community Resources
Hawaiian Community ResourcesHawaiian Community Resources
Hawaiian Community Resources
 
20090326 Speech Ceal Cjm Presentation.Ver
20090326 Speech Ceal Cjm Presentation.Ver20090326 Speech Ceal Cjm Presentation.Ver
20090326 Speech Ceal Cjm Presentation.Ver
 
Euromoney Indices Brochure
Euromoney Indices BrochureEuromoney Indices Brochure
Euromoney Indices Brochure
 
History of Computer Hindi Notes
History of Computer Hindi NotesHistory of Computer Hindi Notes
History of Computer Hindi Notes
 
The gate
The gateThe gate
The gate
 
Counterfeit drugs and the threat to Mississippi patients
Counterfeit drugs and the threat to Mississippi patientsCounterfeit drugs and the threat to Mississippi patients
Counterfeit drugs and the threat to Mississippi patients
 
Tips for smarter working
Tips for smarter working Tips for smarter working
Tips for smarter working
 
Paychex
PaychexPaychex
Paychex
 
Hsinchun Chen.doc.doc
Hsinchun Chen.doc.docHsinchun Chen.doc.doc
Hsinchun Chen.doc.doc
 
Gaucheで本を作る
Gaucheで本を作るGaucheで本を作る
Gaucheで本を作る
 
Community Service Directory 2011
Community Service Directory 2011Community Service Directory 2011
Community Service Directory 2011
 
Manual mi primera página en html pre
Manual   mi primera página en html preManual   mi primera página en html pre
Manual mi primera página en html pre
 
Tokyo.R #22 LT
Tokyo.R #22 LTTokyo.R #22 LT
Tokyo.R #22 LT
 
Sunlyleng
SunlylengSunlyleng
Sunlyleng
 
著作権について
著作権について著作権について
著作権について
 
From GNETS to Home School
From GNETS to Home SchoolFrom GNETS to Home School
From GNETS to Home School
 
Dpiap open house_summarize_12_02_02
Dpiap open house_summarize_12_02_02Dpiap open house_summarize_12_02_02
Dpiap open house_summarize_12_02_02
 

Similar to BlackRay - The open Source Data Engine

Summer 2017 undergraduate research powerpoint
Summer 2017 undergraduate research powerpointSummer 2017 undergraduate research powerpoint
Summer 2017 undergraduate research powerpointChristopher Dubois
 
The Adventure: BlackRay as a Storage Engine
The Adventure: BlackRay as a Storage EngineThe Adventure: BlackRay as a Storage Engine
The Adventure: BlackRay as a Storage Enginefschupp
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned Omid Vahdaty
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2aspyker
 
LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2LCU14 310- Cisco ODP v2
LCU14 310- Cisco ODP v2Linaro
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Sparkdatamantra
 
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthUSENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a Month
USENIX LISA15: How TubeMogul Handles over One Trillion HTTP Requests a MonthNicolas Brousse
 
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | EnglishAWS big-data-demystified #1.1  | Big Data Architecture Lessons Learned | English
AWS big-data-demystified #1.1 | Big Data Architecture Lessons Learned | EnglishOmid Vahdaty
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding HadoopAhmed Ossama
 
MySQL X protocol - Talking to MySQL Directly over the Wire
MySQL X protocol - Talking to MySQL Directly over the WireMySQL X protocol - Talking to MySQL Directly over the Wire
MySQL X protocol - Talking to MySQL Directly over the WireSimon J Mudd
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1Ruslan Meshenberg
 
The Fn Project by Jesse Butler
 The Fn Project by Jesse Butler The Fn Project by Jesse Butler
The Fn Project by Jesse ButlerOracle Developers
 
Serverless Boston @ Oracle Meetup
Serverless Boston @ Oracle MeetupServerless Boston @ Oracle Meetup
Serverless Boston @ Oracle MeetupWayne Scarano
 
Automating using Ansible
Automating using AnsibleAutomating using Ansible
Automating using AnsibleAlok Patra
 
What is Software Development by Thesys Tech Head of Development
What is Software Development by Thesys Tech Head of DevelopmentWhat is Software Development by Thesys Tech Head of Development
What is Software Development by Thesys Tech Head of DevelopmentProduct School
 
Scalable Clusters On Demand
Scalable Clusters On DemandScalable Clusters On Demand
Scalable Clusters On DemandBogdan Kyryliuk
 
Study Notes - Architecting for the cloud (AWS Best Practices, Feb 2016)
Study Notes - Architecting for the cloud (AWS Best Practices, Feb 2016)Study Notes - Architecting for the cloud (AWS Best Practices, Feb 2016)
Study Notes - Architecting for the cloud (AWS Best Practices, Feb 2016)Rick Hwang
 

Similar to BlackRay - The open Source Data Engine (20)

Summer 2017 undergraduate research powerpoint
Summer 2017 undergraduate research powerpointSummer 2017 undergraduate research powerpoint
Summer 2017 undergraduate research powerpoint
 
The Adventure: BlackRay as a Storage Engine
The Adventure: BlackRay as a Storage EngineThe Adventure: BlackRay as a Storage Engine
The Adventure: BlackRay as a Storage Engine
 
AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned AWS Big Data Demystified #1: Big data architecture lessons learned
AWS Big Data Demystified #1: Big data architecture lessons learned
 
Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2Netflix Open Source Meetup Season 4 Episode 2
Netflix Open Source Meetup Season 4 Episode 2
 
LCU14 310- Cisco ODP v2
BlackRay - The open Source Data Engine

  • 2. Presentation Agenda ➔ Brief History ➔ Technology Overview ➔ Positioning towards other Projects ➔ Roadmap ➔ The Team ➔ Wrap-Up
  • 4. What is BlackRay? ● BlackRay is a relational, in-memory database ● SQL Support ● Fulltext (Tokenized) Search in Text fields ● Object-Oriented API Support ● Persistence via Files ● Scalable and Fault Tolerant ● Open Source, Open Community ● Dual licensed: GPL and Proprietary
  • 5. BlackRay History ● Concept of BlackRay was developed in 1999 for a Web Phone Directory at Deutsche Telekom ● Development of current BlackRay started in 2005, as a Data Access Library ● Production Use at Deutsche Telekom in 2006 ● Evolution from Library to Data Engine in 2007 and 2008 ● Open Source under GPLv2/Dual License since June 2009
  • 6. Why Build Another Database? ● Rather unique set of requirements: ● Phone Directory with approx. 80 Million subscribers ● All queries in the 10-500 Millisecond range ● Approximately 2000 concurrent users ● Over 500 Queries per Second (sustained) ● Updates once a day may only take several minutes ● Index needs to be tokenized (SQL: CONTAINS) ● Phonetic search ● Extensive Wildcard queries (leading/midspan/trailing)
  • 7. Options available 1999 ● Pretty much no options available: ● ORACLE and DB2 did not support Tokenized Index ● MySQL would not support the table size required The first implementation in C was an embedded version of what is now called BlackRay. The implementation was done on Solaris with 64 Bit Ultrasparc (Sun Ultra 10)
  • 8. Options evaluated in 2005 ● ORACLE 10g with Oracle Text ● Worst case search performance in the tens of minutes ● Updates of the 1TB+ Index take hours or days.... ● FAST Search and Transfer (now Microsoft) ● Extremely hardware intensive, 48 Sun V280 Servers required ● Not transactional, loses data on indexing ● PERST by McObject ● Thunderstone ● Well, let's say Deutsche Telekom could not afford it.....
  • 9. Decision to Implement BlackRay ● Decision was made in mid-2005 ● Designed as a lightweight data access engine ● Implementation in C++ for performance and maintainability ● APIs for access across the network from different languages (Java, C++) ● Feature set was designed for our specific business case in a particular project
  • 10. Current Release ● Current release 0.9.0 released on June 12th, 2009 ● Production level quality ● All relevant index functions for large scale search applications ● APIs in C++ and Java fully functional ● SQL still covers only a small subset of the API functionality
  • 12. Why call it Data Engine? ● BlackRay is a hybrid between a relational database and a search engine → thus we call it "data engine" ● Database features: ● Relational structure, with Join between tables ● Wildcards and index functions ● SQL and JDBC/ODBC ● Search Engine Features ● Fulltext retrieval (token index) ● Phonetic and similar approximation search ● Extremely low latency
  • 13. Why call it Data Engine? ● BlackRay supports the combination of Search Engine and Database features: ● Full-text queries with wildcards and phonetics.... ● Multiple columns with different data types ● Joining tables to better organize data
  • 14. BlackRay Architecture [diagram: Postgres* Clients, SQL Interface, Management Server, Instance Server, C++ API, Java API; Data Universe with 5-Perspective Index (L1: Dictionary, L2: Postings, L3: Row Index, L4: Multi-Tokens, L5: Multi-Values); Redo Log; Snapshots]
  • 15. Hierarchical Model ● Each BlackRay node can hold many Instances ● One Management process per node ● Each Instance runs in a separate Process ● Instances are completely separated from each other ● Snapshots (for persistence) are taken on an Instance level ● In an Instance, Schemas and Tables can be created ● Queries can span across Tables and Schemas
  • 16. Getting Data Into BlackRay ● Once an Instance is created, data can be loaded ● Schemas, Tables, and Indexes are created using an XML description language ● Standard loader utility to load CSV data ● Bulk loading is done with logging disabled ● Indexing is done with maximum degree of parallelism depending on CPUs ● After all data is indexed, a snapshot can be taken
  • 17. Basic Load Performance Data ● German yellow pages and white pages data ● 60 Million subscribers ● 100 Million phone numbers ● Raw data approx 16GB ● Indexing performance ● Total index size 11GB ● Time to index: 40 Minutes, on dual 2GHz Xeon (Linux) ● Updates: 300MB, 200K rows in approx 5 minutes ● Time to load snapshot: 3.5 Minutes for 11GB
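The load figures on slide 17 can be turned into rough throughput numbers. The short calculation below uses only the values quoted on the slide; treating 1 GB as 1000 MB is an assumption made here, and the variable names are illustrative. It works out to roughly 25,000 subscribers indexed per second and about 52 MB/s when loading a snapshot.

```python
# Figures quoted on slide 17. Decimal units (1 GB = 1000 MB) are an assumption.
RAW_DATA_MB = 16 * 1000
INDEX_SIZE_MB = 11 * 1000
SUBSCRIBERS = 60_000_000
INDEX_SECONDS = 40 * 60
UPDATE_ROWS = 200_000
UPDATE_SECONDS = 5 * 60
SNAPSHOT_LOAD_SECONDS = 3.5 * 60

# Derived throughput figures (simple division, nothing measured here).
rates = {
    "raw data indexed (MB/s)": RAW_DATA_MB / INDEX_SECONDS,
    "subscribers indexed per second": SUBSCRIBERS / INDEX_SECONDS,
    "update rows per second": UPDATE_ROWS / UPDATE_SECONDS,
    "snapshot load (MB/s)": INDEX_SIZE_MB / SNAPSHOT_LOAD_SECONDS,
}

for name, value in rates.items():
    print(f"{name}: {value:,.1f}")
```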
  • 18. Data Universe ● BlackRay features a 5-Perspective Index ● Layer 1: Dictionary ● Layer 2: Postings ● Layer 3: Row Index ● Layer 4: Multi-Token Layer ● Layer 5: Multi-Value Layer ● Layers 1 and 2 together form a fully inverted Index ● Statistics in this Index are used for Query Plan Building
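To illustrate what a dictionary layer plus a postings layer amounts to, here is a minimal Python sketch of a generic inverted index. It is not BlackRay's implementation: the class `InvertedIndex`, the whitespace tokenizer, and the `plan_and` helper are all hypothetical names chosen for this example. The last function shows one plausible way index statistics could feed a query plan, by intersecting the rarest token first.

```python
from collections import defaultdict


def tokenize(text):
    """Split a text field into lowercase tokens (a deliberately simple tokenizer)."""
    return text.lower().split()


class InvertedIndex:
    """Toy two-layer index: a token dictionary and one postings set per token."""

    def __init__(self):
        self.dictionary = {}               # layer 1: token -> token id
        self.postings = defaultdict(set)   # layer 2: token id -> row ids

    def add_row(self, row_id, text):
        for token in tokenize(text):
            token_id = self.dictionary.setdefault(token, len(self.dictionary))
            self.postings[token_id].add(row_id)

    def rows_containing(self, token):
        token_id = self.dictionary.get(token.lower())
        if token_id is None:
            return set()
        return set(self.postings[token_id])

    def document_frequency(self, token):
        """Number of rows containing the token: a basic index statistic."""
        return len(self.rows_containing(token))


def plan_and(index, tokens):
    """Evaluate an AND query, starting with the rarest token to keep sets small."""
    ordered = sorted(tokens, key=index.document_frequency)
    result = index.rows_containing(ordered[0])
    for token in ordered[1:]:
        result &= index.rows_containing(token)
    return result


index = InvertedIndex()
index.add_row(1, "Mueller Hans Berlin")
index.add_row(2, "Mueller Anna Hamburg")
index.add_row(3, "Schmidt Hans Berlin")

print(plan_and(index, ["hans", "berlin", "mueller"]))
```

Layers 3 to 5 (row index, multi-token, multi-value) are not modelled here, since the slides do not describe their structure.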
  • 19. Snapshots and Data Versioning ● Persistence is done via file based snapshots ● Snapshots consist of all schemas in one instance ● Snapshots have a version number ● To make a backup of the data, simply copy the snapshot file to a backup medium ● It is possible to load an older snapshot: Data is version controlled if older snapshots are stored
  • 20. Transactions in BlackRay ● BlackRay supports transactions via a Redo Log ● All commands that modify data are logged if requested ● In case of a crash, the latest snapshot will be loaded ● Replay of the transaction log will then bring the database back to a consistent state ● Redo Log is eliminated when a snapshot is persisted ● For better performance snapshots should be taken periodically, ideally after each bulk update
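The recovery sequence described on slide 20 (load the latest snapshot, replay the redo log, drop the log once a new snapshot is persisted) can be sketched generically. The code below is an illustration under assumptions, not BlackRay code: `TinyStore`, the JSON file formats, and the `put`/`delete` commands are invented for this example.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy in-memory store with file snapshots and a redo log."""

    def __init__(self, directory):
        self.snapshot_path = os.path.join(directory, "snapshot.json")
        self.redo_path = os.path.join(directory, "redo.log")
        self.rows = {}
        self.recover()

    def recover(self):
        """Load the latest snapshot, then replay every logged command."""
        self.rows = {}
        if os.path.exists(self.snapshot_path):
            with open(self.snapshot_path) as f:
                self.rows = {int(k): v for k, v in json.load(f).items()}
        if os.path.exists(self.redo_path):
            with open(self.redo_path) as f:
                for line in f:
                    self._apply(json.loads(line))

    def _apply(self, command):
        if command["op"] == "put":
            self.rows[command["id"]] = command["value"]
        elif command["op"] == "delete":
            self.rows.pop(command["id"], None)

    def execute(self, command):
        """Log the command durably first, then apply it in memory."""
        with open(self.redo_path, "a") as f:
            f.write(json.dumps(command) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(command)

    def snapshot(self):
        """Persist all rows, then discard the redo log it makes obsolete."""
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(self.rows, f)
        os.replace(tmp_path, self.snapshot_path)
        if os.path.exists(self.redo_path):
            os.remove(self.redo_path)


with tempfile.TemporaryDirectory() as directory:
    store = TinyStore(directory)
    store.execute({"op": "put", "id": 1, "value": "Mueller"})
    store.snapshot()                      # redo log is dropped here
    store.execute({"op": "put", "id": 2, "value": "Schmidt"})
    recovered = TinyStore(directory)      # simulates a restart after a crash
    recovered_rows = dict(recovered.rows)

print(recovered_rows)
```

Row 1 comes back from the snapshot and row 2 from the replayed log, which is why a shorter log (more frequent snapshots) means a faster restart.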
  • 21. Query APIs ● C++ and Java Object Oriented APIs are available ● Built with ICE (ZeroC) as the Network and Object Brokerage Protocol ● Query Objects are constructed using an Object Builder ● Execution via the network, Results as Objects ● Load balancing and failover built into the protocol ● Bindings for Python already done, C#, Ruby and PHP in the works
  • 22. Query APIs ● Queries can use any combination of OR, AND ● Index functions (phonetic, synonyms, stopwords) can be stacked ● Token search (fulltext) and wildcard are also supported ● Advantage of the APIs: ● Minimized overhead ● Very low latency ● High availability
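As an illustration of what "stacking" index functions and combining terms with AND/OR could look like, here is a small Python sketch. None of it is the BlackRay API: `stack`, `term`, the stopword and synonym tables, and the deliberately crude `phonetic_key` (not Soundex or Kölner Phonetik) are all assumptions made for this example.

```python
STOPWORDS = {"the", "and", "of"}
SYNONYMS = {"doc": "doctor", "dr": "doctor"}


def phonetic_key(token):
    """Crude phonetic key: unify a few spellings, drop non-leading vowels and
    'h', and collapse repeated letters. For illustration only."""
    replaced = token.lower().replace("ph", "f").replace("ck", "k").replace("y", "i")
    key = replaced[:1]
    for ch in replaced[1:]:
        if ch in "aeiouh":
            continue
        if ch != key[-1]:
            key += ch
    return key


def drop_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def apply_synonyms(tokens):
    return [SYNONYMS.get(t, t) for t in tokens]


def apply_phonetic(tokens):
    return [phonetic_key(t) for t in tokens]


def stack(*functions):
    """Compose index functions; each one transforms the token list in turn."""
    def run(text):
        tokens = text.lower().split()
        for fn in functions:
            tokens = fn(tokens)
        return tokens
    return run


normalize = stack(drop_stopwords, apply_synonyms, apply_phonetic)

# Index rows and queries go through the same stacked functions.
rows = {1: "Dr Meyer", 2: "Doctor Maier", 3: "Meyer and Sons"}
postings = {}
for row_id, text in rows.items():
    for key in normalize(text):
        postings.setdefault(key, set()).add(row_id)


def term(word):
    """Row ids matching a single query word after normalization."""
    keys = normalize(word)
    return set(postings.get(keys[0], set())) if keys else set()


doctor_and_meyer = term("doc") & term("Mayr")   # AND is set intersection
sons_or_doctor = term("sons") | term("doc")     # OR is set union
print(doctor_and_meyer, sons_or_doctor)
```

Because the same stack is applied at index time and at query time, "Mayr" finds rows spelled "Meyer" and "Maier", and "doc" finds both "Dr" and "Doctor".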
  • 23. Management Features ● Management Server acts as central broker for all Instances ● Command line tools for administration ● SNMP management: ● Health check of Instances ● Statistics, including access counters and performance measurements
  • 24. Administration Requirements ● In-Memory Databases require very few administrative tasks ● Configuration via one file per Instance ● Disk layout etc. is of no importance ● Backups are performed by copying snapshots ● Recovery is done by restoring a snapshot ● No daily administration required ● SNMP allows remote supervision with common tools
  • 25. Some more details ● Written in C++ ● Relies heavily on boost ● Compiles well with gcc and Sun Studio 12 ● Behaves well on Linux, Solaris, OpenSolaris and MacOS ● Complete 64 Bit development ● Use of cmake for multi platform support
  • 27. BlackRay and other Projects ● BlackRay is being positioned as a Query intensive database addition ● Ideally suited where updates are done in Bulk and searches outweigh updates by many orders of magnitude ● Wildcards come with little overhead: No overhead for trailing wildcard, some overhead for leading and midspan wildcard ● Good match when index functions such as phonetic or word rotation/position search combined with relational data are required
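One common way to see why a trailing wildcard is cheap while leading and midspan wildcards cost more is a sorted token dictionary. The Python sketch below is a generic illustration, not a description of how BlackRay implements wildcards: `TokenDictionary` and the second, reversed dictionary used for leading wildcards are assumptions made here.

```python
import bisect


class TokenDictionary:
    """Sorted token list plus a reversed copy for leading-wildcard lookups."""

    def __init__(self, tokens):
        self.forward = sorted(set(tokens))
        # Extra structure: the same tokens reversed, kept sorted.
        self.backward = sorted(t[::-1] for t in self.forward)

    @staticmethod
    def _prefix_scan(sorted_tokens, prefix):
        """All tokens starting with prefix: one binary search, then a
        contiguous scan, because matches are adjacent in sorted order."""
        i = bisect.bisect_left(sorted_tokens, prefix)
        matches = []
        while i < len(sorted_tokens) and sorted_tokens[i].startswith(prefix):
            matches.append(sorted_tokens[i])
            i += 1
        return matches

    def trailing(self, prefix):
        """prefix* : answered directly from the forward dictionary."""
        return self._prefix_scan(self.forward, prefix)

    def leading(self, suffix):
        """*suffix : a prefix scan on the reversed dictionary."""
        hits = self._prefix_scan(self.backward, suffix[::-1])
        return sorted(h[::-1] for h in hits)

    def midspan(self, prefix, suffix):
        """prefix*suffix : prefix scan followed by a suffix filter."""
        minimum = len(prefix) + len(suffix)
        return [t for t in self.trailing(prefix)
                if len(t) >= minimum and t.endswith(suffix)]


tokens = TokenDictionary(
    ["mueller", "meyer", "meier", "neumann", "hartmann", "schmidt"])

print(tokens.trailing("me"))       # me*
print(tokens.leading("mann"))      # *mann
print(tokens.midspan("m", "er"))   # m*er
```

In this sketch the trailing wildcard needs nothing beyond the existing sorted dictionary, the leading wildcard needs an additional structure, and the midspan wildcard needs an extra filtering pass over the prefix matches.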
  • 28. OS Projects with similar Goals ● Relational Databases (the usual suspects): ● MySQL, MariaDB, Drizzle.... ● PostgreSQL... ● Many more alternatives, including embedded etc.... ● Fulltext Search: ● Sphinx ● Lucene ● In-Memory Databases: ● FastDB ● HSQLDB/H2 (Java → Garbage collection issues....)
  • 29. Commercial Alternatives ● Commercial In-Memory Databases ● ORACLE/TimesTen (Acquired by ORACLE in 2005) ● IBM/SolidDB (Acquired by IBM in 2007) ● VoltDB (No real data available as of yet) ● eXtremeDB (embedded use only) ● Dual-Licensed Alternatives ● CSQL (Open Source Version is severely crippled) ● MySQL with memcached
  • 30. Is it the right thing for me? ● BlackRay is not designed as a 100% RDBMS replacement ● Questions to ask: ● Do I need ad-hoc data updates, or are updates done in bulk? ● How important are fulltext search and extensive wildcards? ● How large is my data? Gigabytes: OK. Terabytes: Not yet ● Do I need a relational data model? ● Is SQL an important feature?
  • 31. BlackRay will fit you well when... … searches outweigh updates … data is updated in bulk … data needs to be available quickly … you have Gigabytes not Terabytes … you need lots of SELECTs … SQL is necessary … a relational data model is required (JOIN) … source code must be available
  • 33. Immediate Roadmap ● Pending immediate release: ● BlackRay Admin Console (Remora) 0.1.0 ● Upcoming 0.10.0 ● Rewrite of SQL Parser ● PostgreSQL client compatibility (via network protocol) to allow JDBC/ODBC... via PostgreSQL driver ● Rewritten CLI tools ● Some bugfixes (potential memory leaks) ● Authentication for Instances
  • 34. Midterm Roadmap ● Data manipulation/Query Features ● Ad-hoc INSERT/UPDATE/DELETE support ● Aggregate functions for SELECT ● Scalability Features ● Sharding & Partitioning Options ● Federated Search ● Security Features ● Improved User and Access Control concepts ● SSL for all connections ● Improved Statistics Module
  • 35. Longterm Roadmap ● Integration with other DBMSs ● Storage Engine for MariaDB → Depends on potential modification needs of the storage engine interfaces ● Trigger-based updates to support BlackRay as a Query cache instance over a regular DBMS ● Standalone Engine Improvements ● SQL92 compliance: SUBSELECT/UNION support ● Triggers in BlackRay
  • 36. The Team behind BlackRay
  • 37. Development Team ● SoftMethod Core Team ● Thomas Wunschel – Lead Developer ● Felix Schupp – Project Sponsor ● Andreas Meyer – Documentation, Porting ● Frank Fiedler ● Outside Contributors ● Mike Alexeev – Senior Developer ● Andreas Strafner – Developer, Porting to AIX
  • 38. Thomas Wunschel ● Director of Development, SoftMethod GmbH ● Almost 10 years of development experience ● Involved with BlackRay and its applications since 2005 ● Currently involved in the Network Protocol Stack ● Lead Designer and Decision Lead for new Features
  • 39. Mike Alexeev ● Senior Software Developer ● Over 10 years of C++ experience ● First outside committer to BlackRay ● Currently involved in rewriting the SQL grammar to support all index features available via the APIs
  • 40. Felix Schupp ● Managing Director, SoftMethod GmbH ● Over 10 years of commercial software development ● Designer of first BlackRay predecessor in 1999 ● Project sponsor and spokesperson ● Responsible for funding and applications ● Guide and coach in the development process
  • 42. What to do next ● Get BlackRay: ● Register yourself on http://forge.softmethod.de ● SVN checkout available at http://svn.softmethod.de/opensource/blackray/trunk ● Get Involved ● Anyone can register and create tickets, news etc ● We have an active mailing list for discussion as well ● Contribute ● We require a signed Contributor agreement before granting commit access to the repository
  • 43. Contact Us ● Website: http://www.blackray.org ● Twitter: http://twitter.com/dataengine ● Facebook: http://facebook.com/dataengine ● Mailing List: http://lists.softmethod.de ● Download: http://sourceforge.net/projects/blackray ● Felix: felix.schupp@softmethod.de ● Thomas: thomas.wunschel@softmethod.de