SlideShare a Scribd company logo
1 of 17
StratioDeep: an integration layer
between Spark and Cassandra
Our customers

#StratioBD
StratioDeep
An efficient data mining solution
“Two and two are four?
Sometimes… Sometimes they are five.”

G. Orwell

#StratioBD
Why we use
Cassandra
Case A

One User – Lots of data
#StratioBD
Case B

Many Users – Few data
#StratioBD
Case C

Many users – Lots of data
#StratioBD
Why we also need Spark
• In Cassandra, you need to design the schema with the

query in mind
• Every other type of query is either very inefficient or
impossible to resolve

#StratioBD
Challenge
Accepted
StratioDeep features (I)
•
•
•

#StratioBD

Supports CQL3 features
Use of secondary Indexes
Small codebase (less bugs)
StratioDeep features (II)
Provides a Java friendly API:
•

Developers map Column Families to custom serializable POJOs

•

StratioDeep wraps the complexity of performing Spark calculations

directly over the user provided POJOs.
•

#StratioBD

SQL-Like Domain Specific Language
Stratio DSL (I)
SQL-Like domain specific language:
•
•
•

#StratioBD

Built on-top of Spark’s API.
SQL + Linq abstractions.
Unique interface to all Stratio platform modules
Stratio DSL (II)
Stratio RT extension
•

Built on-top of Spark Streaming API.

Stratio BUS extension
•

Registration of new channels/consumer/producers

Cross-module integration with StratioMeta
•

Lets us create flows of data between StratioDeep  StratioRT
•
Materialized views, live queries, alerts, etc…

#StratioBD
Conclusion

Use case A

#StratioBD

Use case C
THANKS

Luca Rosellini
@luca_rosellini

Alvaro Agea
@alvaroagea

More Related Content

Viewers also liked

Fosdem 2011 - A Common Graph Database Access Layer for .Net and Mono
Fosdem 2011 - A Common Graph Database Access Layer for .Net and MonoFosdem 2011 - A Common Graph Database Access Layer for .Net and Mono
Fosdem 2011 - A Common Graph Database Access Layer for .Net and Mono
Achim Friedland
 
Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012
jbellis
 
Strategies for Distributed Data Storage
Strategies for Distributed Data StorageStrategies for Distributed Data Storage
Strategies for Distributed Data Storage
kakugawa
 

Viewers also liked (20)

Learn to use Stratio Crossdata
Learn to use Stratio CrossdataLearn to use Stratio Crossdata
Learn to use Stratio Crossdata
 
Why Spark?
Why Spark?Why Spark?
Why Spark?
 
Stratio big data spain
Stratio   big data spainStratio   big data spain
Stratio big data spain
 
Crossdata: an efficient distributed datahub with batch and streaming query ca...
Crossdata: an efficient distributed datahub with batch and streaming query ca...Crossdata: an efficient distributed datahub with batch and streaming query ca...
Crossdata: an efficient distributed datahub with batch and streaming query ca...
 
Mongo db administration guide
Mongo db administration guideMongo db administration guide
Mongo db administration guide
 
Primeros pasos con Apache Spark - Madrid Meetup
Primeros pasos con Apache Spark - Madrid MeetupPrimeros pasos con Apache Spark - Madrid Meetup
Primeros pasos con Apache Spark - Madrid Meetup
 
Fosdem 2011 - A Common Graph Database Access Layer for .Net and Mono
Fosdem 2011 - A Common Graph Database Access Layer for .Net and MonoFosdem 2011 - A Common Graph Database Access Layer for .Net and Mono
Fosdem 2011 - A Common Graph Database Access Layer for .Net and Mono
 
Webinar: Compliance and Data Protection in the Big Data Age: MongoDB Security...
Webinar: Compliance and Data Protection in the Big Data Age: MongoDB Security...Webinar: Compliance and Data Protection in the Big Data Age: MongoDB Security...
Webinar: Compliance and Data Protection in the Big Data Age: MongoDB Security...
 
Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101
 
Tutorial en Apache Spark - Clasificando tweets en realtime
Tutorial en Apache Spark - Clasificando tweets en realtimeTutorial en Apache Spark - Clasificando tweets en realtime
Tutorial en Apache Spark - Clasificando tweets en realtime
 
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural LessonsCassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
Cassandra Community Webinar: From Mongo to Cassandra, Architectural Lessons
 
Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012Cassandra at NoSql Matters 2012
Cassandra at NoSql Matters 2012
 
Strategies for Distributed Data Storage
Strategies for Distributed Data StorageStrategies for Distributed Data Storage
Strategies for Distributed Data Storage
 
Introduction to Cassandra: Replication and Consistency
Introduction to Cassandra: Replication and ConsistencyIntroduction to Cassandra: Replication and Consistency
Introduction to Cassandra: Replication and Consistency
 
Database security
Database securityDatabase security
Database security
 
Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3
 
Cloudian at cassandra conference in tokyo
Cloudian at cassandra conference in tokyoCloudian at cassandra conference in tokyo
Cloudian at cassandra conference in tokyo
 
Titan: The Rise of Big Graph Data
Titan: The Rise of Big Graph DataTitan: The Rise of Big Graph Data
Titan: The Rise of Big Graph Data
 
Cassandra DataTables Using RESTful API
Cassandra DataTables Using RESTful APICassandra DataTables Using RESTful API
Cassandra DataTables Using RESTful API
 
Database Security
Database SecurityDatabase Security
Database Security
 

Similar to StratioDeep: an Integration Layer Between Spark and Cassandra - Spark Summit 2013

Rafael Bagmanov «Scala in a wild enterprise»
Rafael Bagmanov «Scala in a wild enterprise»Rafael Bagmanov «Scala in a wild enterprise»
Rafael Bagmanov «Scala in a wild enterprise»
e-Legion
 

Similar to StratioDeep: an Integration Layer Between Spark and Cassandra - Spark Summit 2013 (20)

An efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and CassandraAn efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and Cassandra
 
Scalable database, Scalable language @ JDC 2013
Scalable database, Scalable language @ JDC 2013Scalable database, Scalable language @ JDC 2013
Scalable database, Scalable language @ JDC 2013
 
3 boyd direct3_d12 (1)
3 boyd direct3_d12 (1)3 boyd direct3_d12 (1)
3 boyd direct3_d12 (1)
 
Rafael Bagmanov «Scala in a wild enterprise»
Rafael Bagmanov «Scala in a wild enterprise»Rafael Bagmanov «Scala in a wild enterprise»
Rafael Bagmanov «Scala in a wild enterprise»
 
Seattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp APISeattle Spark Meetup Mobius CSharp API
Seattle Spark Meetup Mobius CSharp API
 
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
 
Building Next Generation Drivers: Optimizing Performance in Go and Rust
Building Next Generation Drivers: Optimizing Performance in Go and RustBuilding Next Generation Drivers: Optimizing Performance in Go and Rust
Building Next Generation Drivers: Optimizing Performance in Go and Rust
 
Linked in nosql_atnetflix_2012_v1
Linked in nosql_atnetflix_2012_v1Linked in nosql_atnetflix_2012_v1
Linked in nosql_atnetflix_2012_v1
 
Scala at Treasure Data
Scala at Treasure DataScala at Treasure Data
Scala at Treasure Data
 
RenderScript
RenderScriptRenderScript
RenderScript
 
Apache Drill (ver. 0.2)
Apache Drill (ver. 0.2)Apache Drill (ver. 0.2)
Apache Drill (ver. 0.2)
 
NoSQL and ACID
NoSQL and ACIDNoSQL and ACID
NoSQL and ACID
 
20151015 zagreb spark_notebooks
20151015 zagreb spark_notebooks20151015 zagreb spark_notebooks
20151015 zagreb spark_notebooks
 
Apache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetupApache spark-melbourne-april-2015-meetup
Apache spark-melbourne-april-2015-meetup
 
6 months/weeks training in Vlsi,jalandhar
6 months/weeks training in Vlsi,jalandhar6 months/weeks training in Vlsi,jalandhar
6 months/weeks training in Vlsi,jalandhar
 
6 weeks/months summer training in vlsi,ludhiana
6 weeks/months summer training in vlsi,ludhiana6 weeks/months summer training in vlsi,ludhiana
6 weeks/months summer training in vlsi,ludhiana
 
Laskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development ModelLaskar: High-Velocity GraphQL & Lambda-based Software Development Model
Laskar: High-Velocity GraphQL & Lambda-based Software Development Model
 
Spark - The Ultimate Scala Collections by Martin Odersky
Spark - The Ultimate Scala Collections by Martin OderskySpark - The Ultimate Scala Collections by Martin Odersky
Spark - The Ultimate Scala Collections by Martin Odersky
 
(SEC403) Diving into AWS CloudTrail Events w/ Apache Spark on EMR
(SEC403) Diving into AWS CloudTrail Events w/ Apache Spark on EMR(SEC403) Diving into AWS CloudTrail Events w/ Apache Spark on EMR
(SEC403) Diving into AWS CloudTrail Events w/ Apache Spark on EMR
 
Apache Spark on HDinsight Training
Apache Spark on HDinsight TrainingApache Spark on HDinsight Training
Apache Spark on HDinsight Training
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Recently uploaded (20)

Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 

StratioDeep: an Integration Layer Between Spark and Cassandra - Spark Summit 2013

Editor's Notes

  1. Hi everybody, my name is Alvaro Agea, and my colleague is Luca Rosellini.We are Big Data architects in Stratio. Stratio is a new Startup that works only on big data projects, we starTed this summer and now we are developing different products and technologies for this type of problems.It’s evident we love Cassandra and we are creating a new platform around this technology