Rise of the Scientific Database
  John A. De Goes, @jdegoes
Agenda
•   Scientific Computing & Databases
•   Blessing / Curse of the RDBMS
•   Power of the Array
•   Scientific Databases
•   Hadoop
•   Summary & Conclusions
What is Scientific Computing?

"Scientific computing is concerned with
constructing mathematical models and
quantitative analysis techniques and using
computers to analyze and solve scientific
problems."
                              —Wikipedia
[Timeline figure, by decade]
•   Pre-1940's: Finite differences
•   1940's: Finite element methods, Numeric linear algebra, Linear programming, Monte Carlo
•   1950's: Fortran, Modern numerical linear algebra, Gradient methods, Finite difference for PDEs
•   1960's: Stable SVD algorithms, Iterative methods, Stable pseudoinverses, FFT, APL invented, SAS released
•   1970's: LINPACK, MATLAB, Conjugate gradient, Poisson solvers
•   1980's: Large-scale eigenvalue solvers, GNU Octave, Python, SPSS
•   1990's: J, LAPACK, Mathematica, SciLab, SciPy, PDL, Rasdaman
•   2000's: NumPy, Hadoop, Mahout, HPCC, CUDA, OpenCL, BrookGPU
•   2010's: Julia, Spark, MLBase, SciDB, MonetDB / SciQL
•   The Future: ???
What is a Database?


"A technology that combines the ability to
store data with a high-level, high-performance
means of storing, retrieving,
and manipulating that data without having
to write code or have knowledge of the
mechanisms of implementation."
[Timeline figure, by decade]
•   1960's: CODASYL, IMS, SABRE
•   1970's: Relational Model, Ingres (QUEL), System R (SEQUEL), SQL/DBS, DBS2, Oracle, "RDBMS"
•   1980's: SQL wins, DB2, DBase, SQL Server, Other solutions
•   1990's: ODBMS, MySQL, PostgreSQL
•   2000's: MongoDB, CouchDB, Riak, Neo4j
•   2010's: Julia, Spark, MLBase, SciDB, MonetDB / SciQL
•   The Future: ???
The Relationship between Scientific Computing & Databases

Scientific Computing | Scientific Databases | Data Analysis
The Database Landscape

                    Operational      Analytical         Scientific
                    (gets & puts)    (sums & counts)    (data analysis)
Unstructured        2000             ?                  ?
Semi-structured     2005             2000               ?
Structured          1970             1980               ?
Relational Algebra
•   Projection
•   Selection
•   Rename
•   Natural join: R ⋈ S
•   Semijoin: R ⋉ S
•   Antijoin: R ▷ S
•   Division: R ÷ S
•   Theta join
•   Left outer join: R ⟕ S
•   Right outer join: R ⟖ S
•   Full outer join: R ⟗ S
•   Aggregation: G1, G2, ..., Gm g f1(A1'), f2(A2'), ..., fk(Ak') (r)
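A few of these operators can be sketched over plain Python data to make the "sets of tuples" model concrete. This is an illustrative sketch only: relations are modeled as lists of dicts, and the function names and sample relations (`emp`, `dept`) are hypothetical, not from the talk.

```python
def select(rel, pred):
    """Selection: keep the tuples that satisfy a predicate."""
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """Projection: keep only the named attributes and drop duplicate tuples."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def natural_join(r, s):
    """Natural join: pair tuples that agree on every attribute they share."""
    out = []
    for a in r:
        for b in s:
            shared = a.keys() & b.keys()
            if all(a[k] == b[k] for k in shared):
                out.append({**a, **b})
    return out

# Hypothetical sample relations.
emp = [{"name": "Ann", "dept": 1}, {"name": "Bob", "dept": 2}, {"name": "Cid", "dept": 1}]
dept = [{"dept": 1, "title": "Physics"}, {"dept": 2, "title": "Biology"}]

joined = natural_join(emp, dept)
physics_names = project(select(joined, lambda t: t["title"] == "Physics"), ["name"])
dept_ids = project(emp, ["dept"])
```

Every operator takes relations in and returns a relation, which is what makes the algebra composable. None of them knows anything about the position or ordering of elements.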
The Curse of RDBMS



Sets        Tuples      ???
 rows         columns
The Curse of RDBMS



Sets        Tuples      Arrays
 rows         columns
The Power of the Array
•   Linear Algebra
•   Transforms (Fourier, wavelet, etc.)
•   Spatial Analysis
•   Temporal Analysis
•   Etc.
Poor Man’s Arrays
SELECT X.row AS row, Y.col AS col,
  SUM(X.value * Y.value) AS value
  FROM X, Y WHERE X.col = Y.row
  GROUP BY X.row, Y.col
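The matrix-product query can be tried with Python's built-in `sqlite3` module. The sketch below assumes each matrix is stored sparsely as one (row, column, value) record per cell, and that the join condition matches the column index of X to the row index of Y, which is what a matrix product requires. The short column names `r`, `c`, `v` and the helper `matmul_sql` are assumptions chosen to avoid SQL keywords, not names from the talk.

```python
import sqlite3

def matmul_sql(x, y):
    """Multiply two matrices given as {(row, col): value} dicts via a SQL join."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE X (r INTEGER, c INTEGER, v REAL)")
    con.execute("CREATE TABLE Y (r INTEGER, c INTEGER, v REAL)")
    con.executemany("INSERT INTO X VALUES (?, ?, ?)",
                    [(i, j, v) for (i, j), v in x.items()])
    con.executemany("INSERT INTO Y VALUES (?, ?, ?)",
                    [(i, j, v) for (i, j), v in y.items()])
    # Join X's column index to Y's row index, then sum products per output cell.
    rows = con.execute(
        "SELECT X.r, Y.c, SUM(X.v * Y.v) "
        "FROM X, Y WHERE X.c = Y.r "
        "GROUP BY X.r, Y.c"
    ).fetchall()
    con.close()
    return {(i, j): v for i, j, v in rows}

# Two small 2x2 matrices (illustrative data).
x = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
y = {(0, 0): 5.0, (0, 1): 6.0, (1, 0): 7.0, (1, 1): 8.0}
product = matmul_sql(x, y)
```

The query works, but the array structure is encoded by hand in index columns, so the database sees only a join and an aggregate rather than a matrix operation.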
Poor Man’s Arrays
SELECT A.name, A.sales, SUM(B.sales) AS running_total
  FROM Sales AS A, Sales AS B
  WHERE A.sales < B.sales OR
    (A.sales = B.sales AND A.name = B.name)
  GROUP BY A.name, A.sales
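For comparison, here is a sketch that runs the running-total self-join in SQLite and then computes the same result as a cumulative sum over an ordered sequence. The sample rows are hypothetical, and the `ORDER BY` clause is an addition so the two results can be compared directly.

```python
import sqlite3
from itertools import accumulate

# Hypothetical sample data with distinct sales figures.
sales = [("Ann", 10), ("Bob", 15), ("Cid", 20), ("Dee", 30)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Sales (name TEXT, sales INTEGER)")
con.executemany("INSERT INTO Sales VALUES (?, ?)", sales)

# Relational version: each row is joined to itself and to every row with higher sales.
sql_rows = con.execute(
    "SELECT A.name, A.sales, SUM(B.sales) AS running_total "
    "FROM Sales AS A, Sales AS B "
    "WHERE A.sales < B.sales OR (A.sales = B.sales AND A.name = B.name) "
    "GROUP BY A.name, A.sales "
    "ORDER BY A.sales DESC"
).fetchall()
con.close()

# Array version: sort once, then take a cumulative sum along the sequence.
ordered = sorted(sales, key=lambda row: row[1], reverse=True)
totals = accumulate(amount for _, amount in ordered)
array_rows = [(name, amount, total) for (name, amount), total in zip(ordered, totals)]
```

The self-join compares every pair of rows, while the ordered version makes a single pass. Order is the piece of structure that an array carries and a set does not.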
What is a Scientific Database?
•   First-class support for multidimensional arrays

     •   Creation

     •   Manipulation

     •   Composition

•   Capable of expressing whole analyses, not just snippets

•   Tremendous benefits across multiple dimensions

     •   Scalability & Performance

     •   Expressiveness & Usability

     •   Robustness & Accuracy
Array Algebra
•   Many different approaches (NRCA, SciQL, AFL, ODMG, etc.)

•   Possible to define as extensions to relational core (but not
    necessary)

•   Most approaches share common core

    •   Array deconstruction

    •   Array construction

    •   Array reduction
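The three core operations can be illustrated with a minimal pure-Python sketch. It assumes an array is modeled as a dict from index tuples to values. The function names and their exact semantics are one possible reading of the terms above, not the definitions used by NRCA, SciQL, AFL, or ODMG.

```python
from itertools import product

def construct(shape, f):
    """Array construction: build an array from a shape and a function of the index."""
    return {idx: f(*idx) for idx in product(*(range(n) for n in shape))}

def deconstruct(array):
    """Array deconstruction: take the array apart into (index, value) cells."""
    return sorted(array.items())

def reduce_axis(array, axis, combine, initial):
    """Array reduction: collapse one dimension with a binary operator."""
    out = {}
    for idx, value in array.items():
        rest = idx[:axis] + idx[axis + 1:]  # index with the reduced axis removed
        out[rest] = combine(out.get(rest, initial), value)
    return out

# A 2x3 array whose cell (i, j) holds 10*i + j.
grid = construct((2, 3), lambda i, j: 10 * i + j)
cells = deconstruct(grid)
row_sums = reduce_axis(grid, axis=1, combine=lambda a, b: a + b, initial=0)
col_max = reduce_axis(grid, axis=0, combine=max, initial=float("-inf"))
```

Here `row_sums` collapses the column dimension and `col_max` collapses the row dimension. Composing these three operations is enough to express operations such as matrix products or running aggregates directly on indices.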
Scientific Databases



Rasdaman   SciDB   MonetDB (+SciQL)
What About Hadoop?
•   Commonly used in scientific computing

•   No scientific database technology

      •   But many useful programming libraries

           •    Hama

           •    Mahout

           •    Cascading

•   Hadoop doesn’t make it easy

      •   YARN should help (Tez?)

      •   Balancing needs help

•   Not the only game in town anymore (BDAS, MPI-2, HPCC, etc.)
Conclusions
• Scientific computing can benefit from a
  scientific database
• Success of RDBMS was also a curse
• NoSQL and big data are catalysts for disruption
• Still early for scientific databases
• Hadoop loves/hates science
Resources
•   SciDB / Array Functional Language: http://bit.ly/VdXJkA
•   Rasdaman / rasql: http://en.wikipedia.org/wiki/Rasdaman
•   MonetDB / SciQL: http://monetdb.org
•   Precog / Quirrel: http://precog.com
•   Query Language for Multidimensional Arrays: Design, Implementation, & Optimization Techniques



              John A. De Goes, @jdegoes
