
Slides from the talk, "Rise of the Scientific Database" at Strata 2012 (Santa Clara).


- 1. Rise of the Scientific Database, John A. De Goes (@jdegoes)
- 2. Agenda
  - Scientific Computing & Databases
  - Blessing / Curse of the RDBMS
  - Power of the Array
  - Scientific Databases
  - Hadoop
  - Summary & Conclusions
- 3. What is Scientific Computing? "Scientific computing is concerned with constructing mathematical models and quantitative analysis techniques and using computers to analyze and solve scientific problems." (Wikipedia)
- 4. [Timeline figure: scientific computing from the 1940s to "The Future" (???). Methods shown include Monte Carlo, linear programming, FFT, finite difference methods for PDEs, Poisson solvers, finite element methods, iterative methods, gradient and conjugate gradient methods, stable SVD algorithms, stable pseudoinverses, large-scale eigenvalue solvers, and modern numerical linear algebra. Tools shown include Fortran, APL (invented), SAS (released), SPSS, LINPACK, LAPACK, MATLAB, Mathematica, J, PDL, GNU Octave, SciLab, Python, NumPy, SciPy, BrookGPU, CUDA, OpenCL, Hadoop, Mahout, HPCC, Julia, Spark, MLBase, SciDB, Rasdaman, and MonetDB / SciQL.]
- 5. What is a Database? "A technology that combines the ability to store data with a high-level, high-performance means of storing, retrieving, and manipulating that data without having to write code or have knowledge of the mechanisms of implementation."
- 6. [Timeline figure: databases from the 1960s to "The Future" (???). Entries shown include CODASYL, IMS, SABRE, the relational model, Ingres (QUEL), System R (SEQUEL), DB2, Oracle, dBase, "SQL wins", ODBMS, SQL Server, MySQL, PostgreSQL, "RDBMS", MongoDB, CouchDB, Riak, Neo4j, other solutions, SciDB, MonetDB / SciQL, Spark, MLBase, and Julia.]
- 7. The Relationship between Scientific Computing & Databases [Diagram relating Scientific Computing, Scientific Databases, and Data Analysis]
- 8. The Database Landscape

  | Data \ Workload | Operational (gets & puts) | Analytical (sums & counts) | Scientific (data analysis) |
  |---|---|---|---|
  | Unstructured | 2000 | ? | ? |
  | Semi-structured | 2005 | 2000 | ? |
  | Structured | 1970 | 1980 | ? |
- 9. Relational Algebra
  - Projection (π), selection (σ), rename (ρ)
  - Natural join (R ⋈ S), theta join (R ⋈θ S), semijoin (R ⋉ S), antijoin (R ▷ S), division (R ÷ S)
  - Left outer join (R ⟕ S), right outer join (R ⟖ S), full outer join (R ⟗ S)
  - Aggregation: ${}_{G_1, G_2, \ldots, G_m}\, g_{f_1(A_1), f_2(A_2), \ldots, f_k(A_k)}(r)$
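To make the set-of-tuples model concrete, here is a minimal Python sketch of four of the operators above. It assumes a relation is modeled as a Python `set` of `frozenset`s of (attribute, value) pairs; the helper names (`rel`, `select`, `project`, `rename`, `natural_join`) and the sample `emp`/`dept` data are hypothetical, chosen only for illustration. Note that nothing in this representation records row order or position.

```python
from typing import Callable

# Hypothetical model: a tuple is a frozenset of (attribute, value) pairs and a
# relation is a set of such tuples, so duplicate rows collapse and rows have
# no position or order.


def rel(*rows: dict) -> set[frozenset]:
    """Build a relation from dicts (identical rows collapse into one)."""
    return {frozenset(r.items()) for r in rows}


def select(r: set[frozenset], pred: Callable[[dict], bool]) -> set[frozenset]:
    """Selection (sigma): keep the tuples that satisfy pred."""
    return {t for t in r if pred(dict(t))}


def project(r: set[frozenset], attrs: set[str]) -> set[frozenset]:
    """Projection (pi): keep only the named attributes; duplicates collapse."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}


def rename(r: set[frozenset], old: str, new: str) -> set[frozenset]:
    """Rename (rho): rename attribute `old` to `new` in every tuple."""
    return {frozenset((new if a == old else a, v) for a, v in t) for t in r}


def natural_join(r: set[frozenset], s: set[frozenset]) -> set[frozenset]:
    """Natural join: merge every pair of tuples that agree on all shared attributes."""
    out = set()
    for t in r:
        dt = dict(t)
        for u in s:
            du = dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out


emp = rel({"name": "Ada", "dept": 1}, {"name": "Bob", "dept": 2})
dept = rel({"dept": 1, "title": "Math"}, {"dept": 2, "title": "Physics"})

joined = natural_join(emp, dept)  # pairs tuples on the shared "dept" attribute
math_staff = project(select(joined, lambda t: t["title"] == "Math"), {"name"})
```

Operators compose because each one takes and returns a set of tuples; what the model cannot express directly is "the element at position i".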
- 10. The Curse of RDBMS [Diagram: Sets (rows), Tuples (columns), ???]
- 11. The Curse of RDBMS [Diagram: Sets (rows), Tuples (columns), Arrays]
- 12. The Power of the Array
  - Linear Algebra
  - Transforms (Fourier, wavelet, etc.)
  - Spatial Analysis
  - Temporal Analysis
  - Etc.
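Transforms are a good example of an operation defined by position. The sketch below is a naive discrete Fourier transform in Python, X[k] = sum over n of x[n] * exp(-2*pi*i*k*n/N); the function name `dft` and the test signal are illustrative choices, and a real system would use an FFT. Every output value depends on the index n of every input value, which is exactly the information an unordered set of rows does not carry.

```python
import cmath
import math


def dft(x: list[complex]) -> list[complex]:
    """Naive discrete Fourier transform, X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N).

    O(N^2) and written for clarity; an FFT computes the same values in O(N log N).
    """
    n_pts = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_pts) for n in range(n_pts))
        for k in range(n_pts)
    ]


# One full cosine cycle over 8 samples: its energy lands in bins 1 and 7.
signal = [math.cos(2 * math.pi * n / 8) for n in range(8)]
spectrum = dft(signal)
```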
- 13. Poor Man’s Arrays (matrix multiplication, with each matrix stored as a table of (row, col, value) cells): `SELECT X.row AS row, Y.col AS col, SUM(X.value * Y.value) AS value FROM X, Y WHERE X.col = Y.row GROUP BY X.row, Y.col`
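Slide 13's query multiplies two matrices stored as (row, col, value) tables using a join plus GROUP BY. A minimal Python sketch of the same computation follows, assuming sparse storage in which absent cells are zero; the function name `matmul_triples` and the 2x2 example are hypothetical. The join on X.col = Y.row becomes a dictionary lookup, and SUM ... GROUP BY becomes accumulation into a dictionary keyed by (row, col).

```python
from collections import defaultdict

# Hypothetical sparse encoding: one (row, col, value) triple per nonzero cell;
# missing cells are treated as zero.
Triples = list[tuple[int, int, float]]


def matmul_triples(x: Triples, y: Triples) -> dict[tuple[int, int], float]:
    """Multiply two sparse matrices stored as (row, col, value) triples."""
    # Index Y by row so the join condition X.col = Y.row is a dictionary lookup.
    y_by_row: dict[int, list[tuple[int, float]]] = defaultdict(list)
    for r, c, v in y:
        y_by_row[r].append((c, v))

    # Equivalent of SUM(X.value * Y.value) ... GROUP BY X.row, Y.col
    out: dict[tuple[int, int], float] = defaultdict(float)
    for xr, xc, xv in x:
        for yc, yv in y_by_row.get(xc, []):
            out[(xr, yc)] += xv * yv
    return dict(out)


# [[1, 2], [3, 4]] times [[5, 6], [7, 8]], written as (row, col, value) triples
x = [(0, 0, 1), (0, 1, 2), (1, 0, 3), (1, 1, 4)]
y = [(0, 0, 5), (0, 1, 6), (1, 0, 7), (1, 1, 8)]
product = matmul_triples(x, y)  # {(0, 0): 19.0, (0, 1): 22.0, (1, 0): 43.0, (1, 1): 50.0}
```

The result comes back as triples again, so the shape of the matrix is only implicit; an array-native store would carry dimensions and layout as first-class metadata.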
- 14. Poor Man’s Arrays (running total via a self-join): `SELECT A.name, A.sales, SUM(B.sales) AS running_total FROM Sales AS A, Sales AS B WHERE A.sales < B.sales OR (A.sales = B.sales AND B.name <= A.name) GROUP BY A.name, A.sales ORDER BY A.sales DESC, A.name`
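A self-join running total compares every row with every other row. The Python sketch below, on hypothetical sales figures, contrasts that relational formulation with an array formulation: sort once, then take a cumulative sum with `itertools.accumulate`. Both functions assume names are unique and, as an assumption of this sketch, order rows by sales descending with ties broken by name.

```python
from itertools import accumulate


def running_total_self_join(sales: list[tuple[str, float]]) -> dict[str, float]:
    """Relational style: for each row A, sum every row B ranked at or before it
    (higher sales, or equal sales and name <= A's name). O(n^2) comparisons."""
    return {
        a_name: sum(
            b_sales
            for b_name, b_sales in sales
            if b_sales > a_sales or (b_sales == a_sales and b_name <= a_name)
        )
        for a_name, a_sales in sales
    }


def running_total_array(sales: list[tuple[str, float]]) -> dict[str, float]:
    """Array style: order once (sales descending, then name), then a single
    cumulative-sum pass over the ordered values."""
    ordered = sorted(sales, key=lambda row: (-row[1], row[0]))
    totals = accumulate(amount for _, amount in ordered)
    return {name: total for (name, _), total in zip(ordered, totals)}


# Hypothetical data; names are assumed unique.
sales = [("North", 50), ("South", 40), ("West", 20), ("East", 20), ("Central", 15)]
by_array = running_total_array(sales)
by_join = running_total_self_join(sales)
```

Both produce the same totals; the difference is that the array version relies on order, which the relational model has to simulate with a quadratic join.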
- 15. Poor Man’s Arrays
- 16. What is a Scientific Database?
  - First-class support for multidimensional arrays
    - Creation
    - Manipulation
    - Composition
  - Capable of expressing whole analyses, not just snippets
  - Tremendous benefits across multiple dimensions
    - Scalability & Performance
    - Expressiveness & Usability
    - Robustness & Accuracy
- 17. Array Algebra
  - Many different approaches (NRCA, SciQL, AFL, ODMG, etc.)
  - Possible to define as extensions to a relational core (but not necessary)
  - Most approaches share a common core
    - Array deconstruction
    - Array construction
    - Array reduction
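The three shared operations can be sketched in Python on plain nested lists. The interpretation here is an assumption: construction builds an array by evaluating a function at every index (in the spirit of rasdaman's MARRAY), deconstruction extracts cells or subarrays, and reduction folds an operator along an axis (in the spirit of rasdaman's CONDENSE). The helper names are hypothetical, and only two dimensions are handled to keep it short.

```python
import functools
import operator
from typing import Callable

Array2D = list[list[float]]


def construct(rows: int, cols: int, f: Callable[[int, int], float]) -> Array2D:
    """Construction: evaluate f at every index (i, j) of a rows x cols domain."""
    return [[f(i, j) for j in range(cols)] for i in range(rows)]


def subarray(a: Array2D, r0: int, r1: int, c0: int, c1: int) -> Array2D:
    """Deconstruction: extract the half-open window [r0, r1) x [c0, c1)."""
    return [row[c0:c1] for row in a[r0:r1]]


def reduce_axis(a: Array2D, axis: int, op: Callable[[float, float], float]) -> list[float]:
    """Reduction: fold op down each column (axis 0) or along each row (axis 1)."""
    if axis == 0:
        return [functools.reduce(op, col) for col in zip(*a)]
    if axis == 1:
        return [functools.reduce(op, row) for row in a]
    raise ValueError("axis must be 0 or 1")


grid = construct(3, 4, lambda i, j: 4 * i + j)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
window = subarray(grid, 1, 3, 1, 3)             # [[5, 6], [9, 10]]
col_sums = reduce_axis(window, 0, operator.add)  # [14, 16]

# Composition: a whole (tiny) analysis as one nested expression.
row_max = reduce_axis(subarray(construct(3, 4, lambda i, j: 4 * i + j), 1, 3, 1, 3), 1, max)
```

Because each operation returns an array or a vector, the operations compose freely, which is what lets a query express a whole analysis rather than a snippet.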
- 18. Scientific Databases: Rasdaman, SciDB, MonetDB (+SciQL)
- 19. What About Hadoop?
  - Commonly used in scientific computing
  - No scientific database technology
    - But many useful programming libraries: Hama, Mahout, Cascading
  - Hadoop doesn’t make it easy
    - YARN should help (Tez?)
    - Balancing needs help
  - Not the only game in town anymore (BDAS, MPI-2, HPCC, etc.)
- 20. Conclusions
  - Scientific computing can benefit from a scientific database
  - The success of the RDBMS was also a curse
  - NoSQL and big data are catalysts for disruption
  - It is still early for scientific databases
  - Hadoop loves/hates science
- 21. Resources
  - SciDB / Array Functional Language: http://bit.ly/VdXJkA
  - Rasdaman / rasql: http://en.wikipedia.org/wiki/Rasdaman
  - MonetDB / SciQL: http://monetdb.org
  - Precog / Quirrel: http://precog.com
  - Query Language for Multidimensional Arrays: Design, Implementation, & Optimization Techniques
