This is achieved thanks to its generic architecture and the definition of a custom SQL-like language. Our language augments the classical SQL data manipulation language to add support for streaming queries. From the user's point of view, a common logical view of the existing catalogs and datastores is presented, independently of which cluster or technology stores a particular table.
Supporting multiple architectures imposes two main challenges: how to normalize access to the datastores, and how to cope with datastore limitations. To be able to access multiple datastore technologies, Crossdata defines a common unifying interface containing a basic set of operations that a datastore may support. New connectors can easily be added to Crossdata to increase its connectivity.
The document discusses several topics related to SQL:
1) SQLNet compression - How ordering data in a query can significantly reduce the amount of data sent over the network by compressing repeated values. Ordering by additional columns further improves compression.
2) NULLs and indexes - There is a misconception that indexes cannot be used with queries involving NULL values, but indexes can support queries searching for NULL values.
3) Subquery caching - Repeated scalar subqueries are cached and evaluated only once to improve performance of queries containing subqueries.
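The ordering effect described in point 1 can be demonstrated with a quick experiment: rows with many repeated values compress far better once they are sorted, because identical values end up adjacent. This is a sketch using Python's zlib on synthetic data, standing in for network-level compression of a query result set:

```python
import random
import zlib

random.seed(0)
# Simulate a result set with a low-cardinality column: many repeated values.
rows = [(random.choice(["LONDON", "PARIS", "MADRID"]), random.randint(1, 5))
        for _ in range(10_000)]

unordered = "\n".join(f"{city},{n}" for city, n in rows).encode()
ordered = "\n".join(f"{city},{n}" for city, n in sorted(rows)).encode()

# Sorting groups repeated values into long runs, which compress much better.
print(len(zlib.compress(unordered)), len(zlib.compress(ordered)))
```

Sorting by the second column as well (which `sorted` does here, since it sorts full tuples) further lengthens the runs, mirroring the claim that ordering by additional columns improves compression.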
Spark is an alternative to Hadoop MapReduce for processing large datasets in parallel across a cluster, but it is not an alternative to Hadoop itself. While Spark can handle very large datasets up to 200 PB, it does not require that much memory and can work with mutable data. Spark supports both Scala and Java APIs and can also be used alongside existing Hadoop technologies like Hive, Pig, Impala, Tez and Drill.
StratioDeep: an Integration Layer Between Spark and Cassandra - Spark Summit ...Álvaro Agea Herradón
We present StratioDeep, an integration layer between the Spark distributed computing framework and Cassandra, a NoSQL distributed database.
Cassandra brings together the distributed systems technology of Dynamo and the data model of Google's BigTable. Like Dynamo, Cassandra is eventually consistent and based on a P2P model with no single point of failure. Like BigTable, Cassandra provides a ColumnFamily-based data model richer than that of typical key/value systems. For these reasons, C* is one of the most popular NoSQL databases, but one of its handicaps is that the schema must be modeled around the queries that will be executed. This is because C* is oriented toward search by key.
Integrating C* and Spark gives us a system that combines the best of both worlds.
Existing integrations between the two systems are not satisfactory: they basically provide an HDFS abstraction layer over C*. We believe this solution is inefficient because it introduces significant overhead between the two systems.
The purpose of our work has been to provide a much lower-level integration that not only performs better but also opens up to Cassandra the possibility of solving a wide range of new use cases, thanks to the power of the Spark distributed computing framework.
We’ve already deployed this solution in real applications with diverse clients: pattern detection, log mining, fraud detection, sentiment analysis and financial transaction analysis.
In addition this integration is the building block for our challenging and novel Lambda architecture completely based on Cassandra.
To complete the integration, we provide a seamless extension to the Cassandra Query Language. CQL is oriented toward key-based search; as such, it is not a good choice for queries that move a huge amount of data. We have extended CQL to provide a user-friendly interface. This is a new approach for batch processing over C*: an abstraction layer that translates custom CQL queries into Spark jobs and delegates to Spark the complexity of distributing the query over the underlying cluster of commodity machines.
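The translate-and-delegate idea can be sketched in miniature: a SQL-like query is parsed once, and the resulting filter/project job runs independently on each data partition (standing in for distributed tasks), with partial results merged afterwards. Everything here (the grammar, the table layout) is invented for illustration and is not StratioDeep's actual implementation:

```python
import re

def parse(query):
    # Toy grammar: SELECT col FROM table WHERE col = 'value'
    m = re.match(r"SELECT (\w+) FROM (\w+) WHERE (\w+) = '(\w+)'", query)
    proj, table, col, val = m.groups()
    return proj, table, col, val

def run(query, partitions):
    proj, _table, col, val = parse(query)
    # Each partition is processed independently (a stand-in for Spark tasks),
    # then partial results are concatenated on the "driver".
    results = []
    for part in partitions:
        results.extend(row[proj] for row in part if row[col] == val)
    return results

partitions = [
    [{"id": "1", "city": "NYC"}, {"id": "2", "city": "SFO"}],
    [{"id": "3", "city": "NYC"}],
]
print(run("SELECT id FROM users WHERE city = 'NYC'", partitions))  # ['1', '3']
```

The user writes only the query; the per-partition mechanics stay hidden behind the abstraction layer, which is the point of the extension.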
This document discusses efficient data mining solutions using Hadoop, Cassandra, and Spark. It describes Cassandra as a fast, robust, and efficient key-value database but notes it has limitations for certain queries. Spark is presented as an alternative to Hadoop MapReduce that can be 100 times faster for interactive algorithms and data mining. The document demonstrates how Spark can integrate with Cassandra to allow distributed data processing over Cassandra data without needing to clone the data or use other databases. Future extensions are proposed to directly access Cassandra's SSTable files from Spark and extend CQL3 to leverage Spark.
Crossdata: an efficient distributed datahub with batch and streaming query ca...Álvaro Agea Herradón
Big Data analysis is commonly associated with batch processing of data stored in distributed file systems. The advent of streaming data is exposing the shortcomings of traditional data analysis. Users aiming to combine both worlds - batch processing and streaming - had to turn to unreliable in-house developments. We propose Stratio META to meet this new need. META is a technology based on a structured NoSQL datastore with advanced indexing capabilities. META includes an efficient query planner designed from scratch. The planner determines the optimal path for executing a query and which components should be involved.
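A query planner of this kind can be thought of as a cost comparison across candidate execution paths. The following is a deliberately simplified sketch; the path names and cost figures are invented and bear no relation to META's actual planner:

```python
# Pick the cheapest execution path for a query shape. Costs are illustrative:
# an indexed filter makes an index scan cheap, and a streaming path is only
# applicable (finite cost) when the query is a streaming query.
def plan(query):
    costs = {
        "index_scan": 1 if query.get("indexed_filter") else 100,
        "full_scan": 10,
        "streaming": 5 if query.get("streaming") else float("inf"),
    }
    return min(costs, key=costs.get)

print(plan({"indexed_filter": True}))  # index_scan
print(plan({"streaming": True}))       # streaming
```

Real planners estimate costs from statistics rather than fixed constants, but the selection step - choose the path and the components it implies - has this shape.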
Primeros pasos con Apache Spark - Madrid Meetupdhiguero
First steps with Spark, presented at the Apache Spark Meetup group in Madrid (http://www.meetup.com/Madrid-Apache-Spark-meetup/events/198362002/)
Contents:
- Introduction
- Basic concepts
- The Spark ecosystem
- Environment setup
- Common errors
Tutorial en Apache Spark - Clasificando tweets en realtimeSocialmetrix
Apache Spark [1] is a new distributed processing framework for big data, written in Scala with wrappers for Python, which is attracting a great deal of community attention for its power, ease of use, and processing speed. It is already being called the replacement for Apache Hadoop.
Socialmetrix builds solutions on this framework to generate reports and dashboards from data extracted from social networks.
Participants in this tutorial will learn to collect Twitter data using Spark Streaming, develop batch-processing algorithms to compute the most frequent hashtags and the most active users, and apply them in real time to new tweets arriving through the stream.
Ch-Ch-Ch-Ch-Changes: Taking Your MongoDB Stitch Application to the Next Level...MongoDB
MongoDB Stitch is a serverless platform designed to help you easily and securely build an application on top of MongoDB Atlas. It lets developers focus on building applications rather than on managing data manipulation code, service integration, or backend infrastructure. MongoDB Stitch also makes it simple to respond to backend changes immediately, allowing you to simplify client side code and build complex flows more easily. This talk will cover ways that MongoDB Stitch helps you respond to changes in your database and take your applications to the next level.
This document provides lessons learned from running three AppSync projects in production over 18 months. It discusses preferences for using VTL over Lambda resolvers for CRUD operations due to VTL being faster, cheaper and simpler. It also recommends per-resolver caching over full request caching. Other tips include not leaving logging on full in production, handling user errors gracefully, planning for nested CloudFormation stacks for large projects, and modeling multi-tenancy using Cognito groups and attributes.
This document discusses the ql.io open source project, which provides a domain specific language (DSL) for making HTTP requests. The DSL allows HTTP resources to be treated like database tables, enabling CRUD operations on those resources with a SQL-like syntax. Ql.io can be used as an HTTP gateway and allows parallelizing and joining requests. It aims to simplify writing code for making API calls. The document provides examples of using the ql.io DSL and discusses how it can be used as a Node.js module.
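The core idea - an HTTP resource registered as a "table" that a SQL-like select is routed to - can be sketched in a few lines. The fetch function is injected so the sketch stays self-contained; the endpoint, syntax, and engine here are invented and differ from ql.io's real DSL:

```python
# Hypothetical table registry mapping a "table" name to a URL template.
tables = {"users": "https://api.example.com/users/{id}"}

def select(table, fetch, **where):
    # Treat the WHERE clause as parameters of the URL template, then
    # delegate the actual HTTP GET to the injected fetch function.
    url = tables[table].format(**where)
    return fetch(url)

# Stub standing in for an HTTP GET, so the example runs offline.
def fake_fetch(url):
    return {"url": url, "name": "ada"}

row = select("users", fake_fetch, id="42")
print(row["url"])  # https://api.example.com/users/42
```

The real project adds the pieces this sketch omits: a parsed SQL-like grammar, parallel execution of independent requests, and joins across the results of several calls.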
This is a run-through at a 200 level of the Microsoft Azure Big Data Analytics for the Cloud data platform based on the Cortana Intelligence Suite offerings.
Implementing Your Full Stack App with MongoDB Stitch (Tutorial)MongoDB
The document provides an agenda and prerequisites for a MongoDB Stitch workshop. The agenda includes an introduction to Stitch and Atlas, creating a simple API, building a dashboard, and adding authentication. The prerequisites are a computer, Node.js 6.0+, and links to example files and documentation. Various aspects of the workshop will cover connecting data sources through Stitch, building functions to access and manipulate data, creating a real-time dashboard, and securing access with authentication.
Optimizing Code Reusability for SharePoint using Linq to SharePoint & the MVP...Sparkhound Inc.
Whether you are developing a small customization or a large enterprise solution, one goal is to minimize redundancy in code. In this presentation, Sparkhound Consultant Ted Wagner shows how the MVP design pattern is used in SharePoint to create business models that can easily be reused across other ASP or C# applications.
Network Setup Guide: Deploying Your Cloudian HyperStore Hybrid Storage ServiceCloudian
This document helps a new user set up the network when deploying a 3-node Cloudian storage cluster in a data center, for use with the Cloudian HyperStore Hybrid Cloud Service from AWS Marketplace.
The document provides information for a MongoDB Stitch workshop, including:
- The prerequisites needed for the workshop including a computer, MongoDB Atlas cluster, Node.js, and important files and documentation.
- The agenda for the workshop, which will cover an introduction to Stitch and Atlas, creating a simple API, building a dashboard with D3, and adding authentication.
- Instructions for getting started with Stitch and Atlas, including signing up for an Atlas account and whitelisting IP addresses.
1. MongoDB Stitch is a backend as a service that allows developers to easily work with data and integrate their apps with key services.
2. It provides integrated rules, pipelines, and services to handle complex workflows between databases and third party services.
3. Requests made to Stitch are parsed, rules are applied, databases and services are orchestrated, results are aggregated and returned to the client.
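The three-step flow above (parse, apply rules, orchestrate and aggregate) can be sketched as a tiny request handler. The names and data shapes are illustrative only, not the Stitch API:

```python
def handle(request, rules, services):
    action, payload = request["action"], request["payload"]  # 1. parse request
    if not rules.get(action, lambda p: False)(payload):      # 2. apply rules
        return {"error": "forbidden"}
    # 3. orchestrate every service registered for this action,
    #    then aggregate the partial results for the client.
    results = [svc(payload) for svc in services[action]]
    return {"results": results}

# A rule gating inserts, plus two "services": a database write and a webhook.
rules = {"insert": lambda p: "owner" in p}
services = {"insert": [lambda p: f"db:stored {p['owner']}",
                       lambda p: "webhook:notified"]}

print(handle({"action": "insert", "payload": {"owner": "ana"}}, rules, services))
```

A request that fails the rule check never reaches the services, which is the point of putting declarative rules in front of the orchestration step.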
Building Your First App with MongoDB StitchMongoDB
MongoDB Stitch is a platform that allows developers to easily access MongoDB databases and integrate with key services. It provides native SDKs, integrated rules and functions to build scalable backends. Requests made through Stitch are parsed, services are orchestrated, rules are applied, and results are returned to clients. Stitch handles authentication, authorization and access controls through user profiles and declarative rules. It is a unified solution for building complete applications that connect to MongoDB and external services securely.
Virtual training intro to InfluxDB - June 2021InfluxData
In this training webinar, we will walk you through the basics of InfluxDB – the purpose-built time series database. InfluxDB has everything you need from a time series platform in a single binary – a multi-tenanted time series database, UI and dashboarding tools, background processing and monitoring agent. This one-hour session will include the training and time for live Q&A.
What you will learn
Core concepts of time series databases
An overview of the InfluxDB platform
How to ingest and query data in InfluxDB
MongoDB.local Atlanta: Introduction to Serverless MongoDBMongoDB
Serverless development with MongoDB Stitch allows developers to build applications without managing infrastructure. Stitch provides four main services - QueryAnywhere for data access, Functions for server-side logic, Triggers for real-time notifications, and Mobile Sync for offline data synchronization. These services integrate with MongoDB and other data sources through a unified API, and apply access controls and filters to queries. Functions can be used to build applications or enable data services, and are integrated with application context including user information, services, and values. This allows developers to write code without dealing with deployment or scaling.
Professional Services Insights into Improving Sitecore XPSeanHolmesby1
This presentation was delivered at SUGCON ANZ 2022 by Sean Holmesby and James Barrow from the Sitecore Professional Services team.
'So you're on XP, and it's not performing the way you want it to. What can you do about it?
In this session we'll go over the common pitfalls and issues that the Sitecore Professional Services team have come across in XP implementations, and how to fix them.
Poor site performance? Struggling xDB analytics? Log error messages that don't make any sense?
We've seen it all.... now let's help you fix them up.'
Tutorial: Building Your First App with MongoDB StitchMongoDB
MongoDB Stitch allows developers to easily access and integrate MongoDB databases with key services. It provides integrated rules, functions and SDKs to handle complex connection logic and orchestrate databases and third party services. Requests made through Stitch applications are parsed, services are orchestrated, rules are applied, and results are returned to clients. Stitch offers scalable hosted JavaScript functions and declarative access controls to securely manage data and service access.
This document describes a web-based monitoring system project for caching solutions submitted by Subhayu Chakravorty for his Bachelor of Technology internship. The project involves developing a GUI using PHP that allows users and administrators to monitor caching servers. Key features include graphs of server metrics generated by Cacti, troubleshooting tools, and an admin panel to manage users and payments. The system was tested using servers provided by Data Consultancy Corps.
The document provides an agenda for a MuleSoft Meetup Group meeting in Moscow on May 13, 2021. The agenda includes introductions, MuleSoft updates, a demo and discussion on building secure financial APIs, a networking break, and a demo and discussion on revealing OData capabilities with Mulesoft and connecting it to Salesforce and mobile apps.
A fully web-based solution for calculating & displaying real-time data on large screens in contact centers (wallboards) and also directly on computer screens of supervisors, agents and even on mobile devices of executives (dashboards). Schedule a demo at 2Ring.com/Demo.
Samantha Wang [InfluxData] | Data Collection Overview | InfluxDays 2022InfluxData
This document summarizes Samantha Wang's presentation on InfluxDB data collection options. She discussed three main options: native data collection, client libraries, and Telegraf. Native collection allows ingesting data directly from sources without transformation. Client libraries are available for many languages. Telegraf has over 300 plugins and supports ingesting from many sources. Future plans for Telegraf include reducing its binary size, improving the CLI, and adding new plugins.
This document provides an agenda and overview for a N1QL workshop on indexing and query tuning in Couchbase 4.0. The agenda includes sections on view index, global secondary index (GSI), multi-index scan, hands-on N1QL, query tuning, index selection hints, key-value access, joins, and more hands-on N1QL. The overview sections explain indexing in Couchbase including the primary index, secondary indexes, composite indexes, index intersection for multi-index scans, and the query execution flow involving parsing, planning, scanning indexes, and fetching documents.
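Index intersection for a multi-index scan works by taking the document-id sets produced by each secondary index and intersecting them before any documents are fetched. A minimal sketch, with invented documents and field names:

```python
docs = {
    "d1": {"city": "Austin", "type": "cafe"},
    "d2": {"city": "Austin", "type": "bar"},
    "d3": {"city": "Boston", "type": "cafe"},
}

def build_index(field):
    # A secondary index: field value -> set of document ids holding it.
    idx = {}
    for doc_id, doc in docs.items():
        idx.setdefault(doc[field], set()).add(doc_id)
    return idx

city_idx, type_idx = build_index("city"), build_index("type")

# AND predicate: intersect the id sets, then fetch only the survivors.
hits = city_idx["Austin"] & type_idx["cafe"]
print([docs[i] for i in sorted(hits)])  # [{'city': 'Austin', 'type': 'cafe'}]
```

The win is that the expensive step (fetching full documents) happens only for ids that satisfy every indexed predicate, which is why intersecting two narrow indexes can beat one broad scan.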
Speaker: Drew DiPalma, Product Manager, Cloud, MongoDB
Level: 100 (Beginner)
Track: Developer
Come learn more about MongoDB Stitch – Our new Backend as a Service (BaaS) that makes it easy for developers to create and launch applications across mobile and web platforms. Stitch provides a REST API on top of MongoDB with read, write, and validation rules built-in and full integration with the services you love. This talk will cover the what, why, and how of MongoDB Stitch. We’ll discuss everything from features to the architecture. You’ll walk away knowing how Stitch can kickstart your new project or take your existing application to the next level.
What You Will Learn:
- The basics of MongoDB Stitch and how to use it to kickstart new projects and implement new features in existing projects.
- How to integrate your favorite services with your MongoDB application without writing any code.
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j
Dr. Jesús Barrasa, Head of Solutions Architecture for EMEA, Neo4j
Discover the latest innovations from Neo4j, including the latest cloud integrations and product improvements that make Neo4j an essential choice for developers building applications with interconnected data and generative AI.
Hand Rolled Applicative User ValidationCode KataPhilip Schwarz
Could you use a simple piece of Scala validation code (granted, a very simplistic one too!) that you can rewrite, now and again, to refresh your basic understanding of Applicative operators <*>, <*, *>?
The goal is not to write perfect code showcasing validation, but rather, to provide a small, rough-and ready exercise to reinforce your muscle-memory.
Despite its grandiose-sounding title, this deck consists of just three slides showing the Scala 3 code to be rewritten whenever the details of the operators begin to fade away.
The code is my rough and ready translation of a Haskell user-validation program found in a book called Finding Success (and Failure) in Haskell - Fall in love with applicative functors.
Microservice Teams - How the cloud changes the way we workSven Peters
A lot of technical challenges and complexity come with building a cloud-native and distributed architecture. The way we develop backend software has fundamentally changed in the last ten years. Managing a microservices architecture demands a lot of us to ensure observability and operational resiliency. But did you also change the way you run your development teams?
Sven will talk about Atlassian’s journey from a monolith to a multi-tenanted architecture and how it affected the way the engineering teams work. You will learn how we shifted to service ownership, moved to more autonomous teams (and its challenges), and established platform and enablement teams.
Need for Speed: Removing speed bumps from your Symfony projects ⚡️Łukasz Chruściel
No one wants their application to drag like a car stuck in the slow lane! Yet it’s all too common to encounter bumpy, pothole-filled solutions that slow the speed of any application. Symfony apps are not an exception.
In this talk, I will take you for a spin around the performance racetrack. We’ll explore common pitfalls - those hidden potholes on your application that can cause unexpected slowdowns. Learn how to spot these performance bumps early, and more importantly, how to navigate around them to keep your application running at top speed.
We will focus in particular on tuning your engine at the application level, making the right adjustments to ensure that your system responds like a well-oiled, high-performance race car.
Takashi Kobayashi and Hironori Washizaki, "SWEBOK Guide and Future of SE Education," First International Symposium on the Future of Software Engineering (FUSE), June 3-6, 2024, Okinawa, Japan
Do you want Software for your Business? Visit Deuglo
Deuglo has top Software Developers in India. They are experts in software development and help design and create custom Software solutions.
Deuglo follows seven steps methods for delivering their services to their customers. They called it the Software development life cycle process (SDLC).
Requirement — Collecting the Requirements is the first Phase in the SSLC process.
Feasibility Study — after completing the requirement process they move to the design phase.
Design — in this phase, they start designing the software.
Coding — when designing is completed, the developers start coding for the software.
Testing — in this phase when the coding of the software is done the testing team will start testing.
Installation — after completion of testing, the application opens to the live server and launches!
Maintenance — after completing the software development, customers start using the software.
Flutter is a popular open source, cross-platform framework developed by Google. In this webinar we'll explore Flutter and its architecture, delve into the Flutter Embedder and Flutter’s Dart language, discover how to leverage Flutter for embedded device development, learn about Automotive Grade Linux (AGL) and its consortium and understand the rationale behind AGL's choice of Flutter for next-gen IVI systems. Don’t miss this opportunity to discover whether Flutter is right for your project.
E-commerce Application Development Company.pdfHornet Dynamics
Your business can reach new heights with our assistance as we design solutions that are specifically appropriate for your goals and vision. Our eCommerce application solutions can digitally coordinate all retail operations processes to meet the demands of the marketplace while maintaining business continuity.
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemPeter Muessig
Learn about the latest innovations in and around OpenUI5/SAPUI5: UI5 Tooling, UI5 linter, UI5 Web Components, Web Components Integration, UI5 2.x, UI5 GenAI.
Recording:
https://www.youtube.com/live/MSdGLG2zLy8?si=INxBHTqkwHhxV5Ta&t=0
E-commerce Development Services- Hornet DynamicsHornet Dynamics
For any business hoping to succeed in the digital age, having a strong online presence is crucial. We offer Ecommerce Development Services that are customized according to your business requirements and client preferences, enabling you to create a dynamic, safe, and user-friendly online store.
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI AppGoogle
AI Fusion Buddy Review: Brand New, Groundbreaking Gemini-Powered AI App
👉👉 Click Here To Get More Info 👇👇
https://sumonreview.com/ai-fusion-buddy-review
AI Fusion Buddy Review: Key Features
✅Create Stunning AI App Suite Fully Powered By Google's Latest AI technology, Gemini
✅Use Gemini to Build high-converting Converting Sales Video Scripts, ad copies, Trending Articles, blogs, etc.100% unique!
✅Create Ultra-HD graphics with a single keyword or phrase that commands 10x eyeballs!
✅Fully automated AI articles bulk generation!
✅Auto-post or schedule stunning AI content across all your accounts at once—WordPress, Facebook, LinkedIn, Blogger, and more.
✅With one keyword or URL, generate complete websites, landing pages, and more…
✅Automatically create & sell AI content, graphics, websites, landing pages, & all that gets you paid non-stop 24*7.
✅Pre-built High-Converting 100+ website Templates and 2000+ graphic templates logos, banners, and thumbnail images in Trending Niches.
✅Say goodbye to wasting time logging into multiple Chat GPT & AI Apps once & for all!
✅Save over $5000 per year and kick out dependency on third parties completely!
✅Brand New App: Not available anywhere else!
✅ Beginner-friendly!
✅ZERO upfront cost or any extra expenses
✅Risk-Free: 30-Day Money-Back Guarantee!
✅Commercial License included!
See My Other Reviews Article:
(1) AI Genie Review: https://sumonreview.com/ai-genie-review
(2) SocioWave Review: https://sumonreview.com/sociowave-review
(3) AI Partner & Profit Review: https://sumonreview.com/ai-partner-profit-review
(4) AI Ebook Suite Review: https://sumonreview.com/ai-ebook-suite-review
#AIFusionBuddyReview,
#AIFusionBuddyFeatures,
#AIFusionBuddyPricing,
#AIFusionBuddyProsandCons,
#AIFusionBuddyTutorial,
#AIFusionBuddyUserExperience
#AIFusionBuddyforBeginners,
#AIFusionBuddyBenefits,
#AIFusionBuddyComparison,
#AIFusionBuddyInstallation,
#AIFusionBuddyRefundPolicy,
#AIFusionBuddyDemo,
#AIFusionBuddyMaintenanceFees,
#AIFusionBuddyNewbieFriendly,
#WhatIsAIFusionBuddy?,
#HowDoesAIFusionBuddyWorks
GraphSummit Paris - The art of the possible with Graph TechnologyNeo4j
Sudhir Hasbe, Chief Product Officer, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Graspan: A Big Data System for Big Code AnalysisAftab Hussain
We built a disk-based parallel graph system, Graspan, that uses a novel edge-pair centric computation model to compute dynamic transitive closures on very large program graphs.
We implement context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases such as Linux shows that their Graspan implementations scale to millions of lines of code and are much simpler than their original implementations.
These analyses were used to augment the existing checkers; these augmented checkers found 132 new NULL pointer bugs and 1308 unnecessary NULL tests in Linux 4.4.0-rc5, PostgreSQL 8.3.9, and Apache httpd 2.2.18.
- Accepted in ASPLOS ‘17, Xi’an, China.
- Featured in the tutorial, Systemized Program Analyses: A Big Data Perspective on Static Analysis Scalability, ASPLOS ‘17.
- Invited for presentation at SoCal PLS ‘16.
- Invited for poster presentation at PLDI SRC ‘16.
Software Engineering, Software Consulting, Tech Lead, Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Transaction, Spring MVC, OpenShift Cloud Platform, Kafka, REST, SOAP, LLD & HLD.
Odoo ERP software
Odoo ERP software, a leading open-source software for Enterprise Resource Planning (ERP) and business management, has recently launched its latest version, Odoo 17 Community Edition. This update introduces a range of new features and enhancements designed to streamline business operations and support growth.
The Odoo Community serves as a cost-free edition within the Odoo suite of ERP systems. Tailored to accommodate the standard needs of business operations, it provides a robust platform suitable for organisations of different sizes and business sectors. Within the Odoo Community Edition, users can access a variety of essential features and services essential for managing day-to-day tasks efficiently.
This blog presents a detailed overview of the features available within the Odoo 17 Community edition, and the differences between Odoo 17 community and enterprise editions, aiming to equip you with the necessary information to make an informed decision about its suitability for your business.
13. Crossdata
o A new technology that:
• Is not limited by the underlying datastore capabilities
• Leverages Spark to perform non-natively supported operations
• Supports batch and streaming queries
• Supports multiple clusters and technologies
#BDS14
14. Learn to use Stratio Crossdata
Developing your first connector
Daniel Higuero (dhiguero@stratio.com, @dhiguero)
Alvaro Agea (alvaro@stratio.com, @alvaroagea)
17. Connecting to the outside world
o Crossdata defines an IConnector extension interface
o Users can easily add new connectors to support
• Different datastores
• Different processing engines
• Different versions
o Each connector defines its capabilities
Our planner will choose the best connector for each query
18. Query execution
[Diagram: Parsing → Validation → Planning → Execution; the execution stage dispatches through one of several connectors (Connector1, Connector2, Connector3) to the datastore]
Our planner will choose the best connector for each query
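The connector-selection step described above can be sketched as a simple capability match: each connector advertises the logical operations it supports, and the planner picks a connector that covers everything the query needs. This is only an illustrative toy (the connector names, operation names, and `choose` method are invented, not the real Crossdata planner API):

```java
import java.util.*;

// Toy sketch of capability-based connector selection.
// All names here are illustrative, not the actual Crossdata classes.
class ConnectorPlanner {

    // Each connector declares the logical operations it supports natively.
    static final Map<String, Set<String>> CAPABILITIES = Map.of(
        "cassandra-native", Set.of("PROJECT", "FILTER", "LIMIT"),
        "spark-cassandra",  Set.of("PROJECT", "FILTER", "JOIN", "GROUP_BY", "LIMIT"),
        "streaming",        Set.of("PROJECT", "FILTER", "WINDOW"));

    // Pick a connector whose capabilities cover the whole query; a real
    // planner would also rank candidates (e.g. prefer native connectors).
    static Optional<String> choose(Set<String> required) {
        return CAPABILITIES.entrySet().stream()
            .filter(e -> e.getValue().containsAll(required))
            .map(Map.Entry::getKey)
            .sorted()                      // deterministic choice for the sketch
            .findFirst();
    }

    public static void main(String[] args) {
        // A plain projection + filter can run natively on Cassandra...
        System.out.println(choose(Set.of("PROJECT", "FILTER", "LIMIT")));
        // ...but a JOIN falls back to the Spark-based connector.
        System.out.println(choose(Set.of("PROJECT", "JOIN")));
    }
}
```

If no connector covers the required operations, `choose` returns an empty Optional; in the real system this is where Spark steps in for non-natively supported operations.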
19. Multi-cluster support
o Stratio Crossdata offers the possibility of accessing a single catalog across a set of datastores.
• Multiple clusters can coexist to optimize platform performance
§ E.g., production cluster, test cluster, write-optimized cluster, read-optimized cluster, etc.
• A table is stored in a single datastore
20. Logical and physical mapping

    SELECT * FROM app.users;

[Diagram: the App catalog maps its logical tables (Users, Test, old_users) onto physical clusters: C* production, M development, and other datastores]
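The mapping above amounts to a catalog lookup: the logical table name resolves to whichever cluster persists it, invisibly to the user. A minimal sketch under assumed names (the catalog contents and the `resolve` helper are hypothetical, not the real Crossdata catalog API):

```java
import java.util.Map;

// Sketch of logical-to-physical table mapping (illustrative names only).
class CatalogMapping {

    // One catalog: each logical table lives in exactly one cluster.
    static final Map<String, String> APP_CATALOG = Map.of(
        "users",     "cassandra-production",
        "test",      "mongo-development",
        "old_users", "other-datastore");

    // Resolve "catalog.table" to the cluster that persists the table.
    static String resolve(String qualifiedName) {
        String table = qualifiedName.substring(qualifiedName.indexOf('.') + 1);
        String cluster = APP_CATALOG.get(table);
        if (cluster == null) throw new IllegalArgumentException("Unknown table: " + table);
        return cluster;
    }

    public static void main(String[] args) {
        // SELECT * FROM app.users; is routed with no user knowledge of the deployment.
        System.out.println(resolve("app.users"));
    }
}
```

The design constraint from the slide shows up here naturally: because a table name maps to exactly one cluster, resolution is a single lookup.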
22. Metadata in the era of schemaless NoSQL datastores
o Some datastores are schemaless, but our applications are not!
• Flexible schemas vs. schemaless
• Crossdata provides a Metadata Manager that stores schemas for any datasource
§ Remember ODBC and those BI tools?
23. Metadata management
[Diagram: a connector to a C* production cluster, a Metadata Store backed by Infinispan, and the Metadata Manager]
1. Updated metadata information is maintained among Crossdata servers using Infinispan.
2. If the connector does not support metadata operations, those are skipped.
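The two numbered steps above can be sketched with a shared map standing in for the Infinispan-replicated store: the Metadata Manager always records the schema, and forwards the operation to the connector only when the connector supports metadata. All names below are illustrative stand-ins, not the real Crossdata classes:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the Metadata Manager flow. A ConcurrentHashMap stands in for the
// Infinispan store shared among Crossdata servers (illustrative names only).
class MetadataManagerSketch {

    // Shared metadata store: table name -> schema description.
    static final Map<String, String> STORE = new ConcurrentHashMap<>();

    interface Connector {
        boolean supportsMetadata();
        void createTable(String name, String schema);
    }

    static void createTable(Connector c, String name, String schema) {
        STORE.put(name, schema);            // step 1: update shared metadata
        if (c.supportsMetadata()) {
            c.createTable(name, schema);    // step 2: forward only when supported
        }
    }

    public static void main(String[] args) {
        Connector schemaless = new Connector() {
            public boolean supportsMetadata() { return false; }
            public void createTable(String n, String s) { throw new AssertionError("skipped"); }
        };
        createTable(schemaless, "app.users", "(id INT, name TEXT)");
        // The schema is kept even though the datastore itself is schemaless.
        System.out.println(STORE.get("app.users"));
    }
}
```

Keeping the schema server-side is also what lets Crossdata answer metadata queries (e.g. during validation) without hitting the underlying datastore.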
25. Stratio Crossdata ODBC/JDBC
o Well-known interface standard (for BI tools, external apps, …)
o We have implemented it using the Simba SDK
o It opens the full potential of Stratio Crossdata to the external world
o Currently tested with Tableau, QlikView and MS Excel
One ODBC/JDBC for all datastores!
27. Crossdata Connectors
o The Crossdata core is abstracted from the inner workings of each connector.
• Common IConnector interface.
• An XML manifest defines datastore and connector capabilities.
o Each connector may access different datastores.
o Each connector supports many clusters of the same datastore technology.
29. Connector interface
o The ConnectorApp abstracts the communication with the server using Akka.
• No worries about transferring data, status, etc.
• It takes an implementation of IConnector and launches the required actors.
[Diagram: ConnectorApp wraps an IConnector implementation, which talks to the datastore]
30. IConnector

public interface IConnector {

    /**
     * Get the name of the connector.
     * @return A name.
     */
    String getConnectorName();

    /**
     * Get the names of the datastores supported by the connector.
     * Several connectors may declare the same datastore name.
     * @return The names.
     */
    String[] getDatastoreName();
31. IConnector (II)

    /**
     * Initialize the connector service.
     * @param configuration The configuration.
     * @throws InitializationException If the connector initialization fails.
     */
    void init(IConfiguration configuration) throws InitializationException;

    /**
     * Connect to a datastore using a set of options.
     * @param credentials The required credentials.
     * @param config The cluster configuration.
     * @throws ConnectionException If the connection could not be established.
     */
    void connect(ICredentials credentials, ConnectorClusterConfig config) throws ConnectionException;
32. IConnector (III)

    /**
     * Get the storage engine.
     * ...
     */
    IStorageEngine getStorageEngine() throws UnsupportedException;

    /**
     * Get the query engine.
     * ...
     */
    IQueryEngine getQueryEngine() throws UnsupportedException;

    /**
     * Get the metadata engine.
     * ...
     */
    IMetadataEngine getMetadataEngine() throws UnsupportedException;
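Putting these three slides together, a new connector is mostly a thin shell around these methods. The skeleton below is a compilable sketch: the stub types at the top stand in for the real Crossdata interfaces so the snippet is self-contained, and the engine behavior is invented for illustration:

```java
// Stand-ins for the real Crossdata types, so this sketch compiles on its own.
interface IConfiguration {}
interface ICredentials {}
class ConnectorClusterConfig {}
interface IQueryEngine {}
class UnsupportedException extends Exception {}

// Minimal illustrative connector skeleton (not a real Crossdata connector).
class InMemoryConnector {

    private boolean connected = false;

    public String getConnectorName() { return "InMemoryConnector"; }

    // Several connectors may declare the same datastore name.
    public String[] getDatastoreName() { return new String[] {"InMemoryDatastore"}; }

    public void init(IConfiguration configuration) {
        // Read connector-wide settings here.
    }

    public void connect(ICredentials credentials, ConnectorClusterConfig config) {
        connected = true;               // open per-cluster client sessions here
    }

    public boolean isConnected() { return connected; }

    // A connector only implements the engines it can support; the rest throw
    // UnsupportedException so the planner routes those queries elsewhere.
    public IQueryEngine getQueryEngine() throws UnsupportedException {
        throw new UnsupportedException();   // this toy connector is storage-only
    }
}
```

Declaring an unsupported engine by throwing, rather than returning null, lets the planner discover missing capabilities explicitly.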
33. IMetadataEngine
o Defines operations related to metadata management

    createCatalog(ClusterName, CatalogMetadata)
    dropCatalog(ClusterName, CatalogName)
    createTable(ClusterName, TableMetadata)
    alterTable(ClusterName, TableName, AlterOptions)
    dropTable(ClusterName, TableName)
    createIndex(ClusterName, IndexMetadata)
    dropIndex(ClusterName, IndexName)
34. IStorageEngine
o Defines operations related to writing data

    insert(ClusterName, TableMetadata, Row)
    insert(ClusterName, TableMetadata, Collection<Row>)
    delete(ClusterName, TableName, Collection<Filter>)
    update(ClusterName, TableName, Collection<Relation>, Collection<Filter>)
    truncate(ClusterName, TableName)
35. IQueryEngine
o Defines operations related to querying data

    execute(LogicalWorkflow)
    asyncExecute(String, LogicalWorkflow, IResultHandler)
    stop(String)
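For streaming queries, asyncExecute delivers result batches through a handler as they arrive. The sketch below shows that callback pattern only; the `ResultHandler` interface and the simulated batches are illustrative stand-ins for IResultHandler, not the real API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of the asyncExecute callback pattern (illustrative names only).
class AsyncQuerySketch {

    // Stand-in for IResultHandler: receives partial result batches.
    interface ResultHandler extends Consumer<List<String>> {}

    // Simulate a streaming query that emits two batches to the handler;
    // a real engine would keep emitting until stop(queryId) is called.
    static void asyncExecute(String queryId, ResultHandler handler) {
        handler.accept(List.of(queryId + ": row1", queryId + ": row2"));
        handler.accept(List.of(queryId + ": row3"));
    }

    public static void main(String[] args) {
        List<String> received = new ArrayList<>();
        asyncExecute("q1", received::addAll);
        System.out.println(received.size() + " rows received");
    }
}
```

The query id string ties the three operations together: it names the running query so stop(String) can cancel exactly that stream.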
36. Logical workflows
o Graph representation of a query, composed of two types of logical steps
§ Transformation step: one input, one output
§ Union step: n inputs, one output
37. Building a Logical Workflow
o Consider the following query

    SELECT tweet.id, tweet.user
    FROM transactions WITH WINDOW 2 minutes
    JOIN mentions ON mentions.user = tweet.user
    WHERE mentions.counter > 100
      AND tweet.hashtag = ‘#bds14’
    ORDER BY mentions.counter
    LIMIT 100

Parsing → Validation → Planning
38. Building a Logical Workflow – Project
Identify tables and required fields.
39–40. Building a Logical Workflow – Project
For each table, retrieve all columns that are involved in the query.
41–43. Building a Logical Workflow – Project
Build a Project logical step per table.

    Project TWEET(id, user, hashtag)
    Project MENTIONS(user, counter)
44–46. Building a Logical Workflow – Filters
Next, we add filtering steps as early as possible.

    Project TWEET(id, user, hashtag) → Filter (hashtag = ‘bds14’)
    Project MENTIONS(user, counter) → Filter (counter > 100)
47–48. Building a Logical Workflow – Window
A Window step covers the streaming WITH WINDOW clause.

    Project TWEET → Filter (hashtag = ‘bds14’) → Window (2 min)
    Project MENTIONS → Filter (counter > 100)
49–50. Building a Logical Workflow – Join
Both branches are merged with a Join step.

    [TWEET branch, MENTIONS branch] → Join (m.user = t.user)
51–52. Building a Logical Workflow – Order By
An OrderBy step on mentions.counter is appended after the Join.

    … → Join → OrderBy (m.counter)
53–54. Building a Logical Workflow – Limit
A Limit step caps the result at 100 rows.

    … → Join → OrderBy (m.counter) → Limit 100
55–56. Building a Logical Workflow – Select
A final Select step keeps the requested columns; the completed workflow is handed to IQueryEngine.execute().

    Project TWEET → Filter → Window
    Project MENTIONS → Filter
    → Join → OrderBy → Limit 100 → Select (id, user)
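The finished workflow can be represented as a small step graph. The sketch below builds the two branches and the shared tail for the example query, recording only step names in execution order (the list-of-strings structure is an illustration, not the real LogicalWorkflow classes):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the example query's logical workflow: two Project branches
// (transformation steps) merged by a Join (union step), then a shared tail.
class LogicalWorkflowSketch {

    static List<String> build() {
        List<String> tweetBranch = List.of(
            "Project TWEET(id, user, hashtag)",
            "Filter hashtag = '#bds14'",
            "Window 2 min");                    // streaming side of the query
        List<String> mentionsBranch = List.of(
            "Project MENTIONS(user, counter)",
            "Filter counter > 100");
        List<String> tail = List.of(
            "Join mentions.user = tweet.user",  // union step: 2 inputs, 1 output
            "OrderBy mentions.counter",
            "Limit 100",
            "Select id, user");

        List<String> steps = new ArrayList<>(tweetBranch);
        steps.addAll(mentionsBranch);
        steps.addAll(tail);
        return steps;
    }

    public static void main(String[] args) {
        build().forEach(System.out::println);
    }
}
```

Note how the slide-by-slide construction is visible in the order: per-table Projects first, filters pushed as early as possible, the Window on the streaming branch, then the Join and the query tail.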
57. Existing connectors
o Native
• Cassandra
• MongoDB
• Aerospike
• ElasticSearch
• Stratio Streaming
o Spark-based
• Cassandra
• MongoDB
• Aerospike
• HDFS
64. Learn to use Stratio Crossdata
Developing your first connector
Daniel Higuero (dhiguero@stratio.com, @dhiguero)
Alvaro Agea (alvaro@stratio.com, @alvaroagea)
Editor's Notes
Crossdata is a new technology that has been in development for the last few months. It is currently open source under the Apache license.
And it has several new features that we think will change the way we interact with big data systems.
Crossdata avoids the limitations imposed by the underlying datastore. For instance, if a column has not been indexed, we need to think about methods of retrieving that column.
It is important to highlight that our focus is on users and performance. If the user requests some data, it is important to provide that result. We think it is better to answer a non-optimal query than to forbid it. We will always have time to add a new index if that query becomes the norm. As an example, think about users not involved in the database design who interact with the system through business intelligence tools.
We use Spark to perform non-natively supported queries in those cases, and we also use it when we need to mix data coming from streaming sources with batch data.
Moreover, our design allows us to access several clusters and technologies at the same time.
From the architectural point of view, we can define three main layers.
On the top we find the driver, which is used to build custom applications through its Java/Scala API. We also have an ODBC connector for external tools, and we provide a REST API.
In the middle, we have our server component, which contains the core logic and the distributed capabilities.
On the bottom, being a generic architecture, we employ a connector-based approach that permits extending the system to communicate with any datastore.
Communication between layers is accomplished using the Scala actor framework.
With respect to the connector approach, we have defined a Java interface that contains the set of operations that an ideal connector should provide.
Notice that our design simplifies connector development, so users can easily add their datastore to the Crossdata ecosystem. Each developer defines the connector's capabilities, and Crossdata's planner will choose the best one depending on the query. Several connectors to the same datastore can coexist at the same time.
To clarify this, our query execution path is pretty similar to existing ones, with the addition of the connector selection step after the query plan is determined.
Another nice feature is multi-cluster support. Many times we have seen an application with two completely different types of access. Wouldn't it be nice if we could easily tune its database? If both types of access target the same cluster, this is usually impossible, as you need to decide, for instance, whether your workload is write- or read-oriented.
With Crossdata, a common logical view is provided to the user independently of which cluster stores a particular table.
We only impose the limitation that a table name can only be found in a single cluster.
To illustrate this, let us imagine a scenario with three clusters: one production Cassandra cluster, one development cluster, and another cluster with an old version of the database or other technologies. In this case, if we submit the following query, Crossdata will determine which datastore persists the table users, and it will retrieve a result set without requiring any knowledge of the physical deployment from the user's point of view.
Now let’s focus on metadata and how we can solve its management problem when we have different technologies.
The first question that arises when we talk about metadata is how we are going to be compatible with schemaless approaches. It is true that some datastores are schemaless, but it is also true that our applications need to know which fields should be queried and shown to the users.
The difference for us is not whether a datastore is schemaless; it is more about providing the user with a flexible way of updating existing schemas.
Crossdata stores schemas for any datasource independently of whether it is schemaless. Remember that other applications, like business intelligence tools, do exist, and they require schemas.
To show how it is done: even though in Cassandra we already have a schema (… thanks for that …), Crossdata stores and shares the schema itself among the existing Crossdata servers. This is also important to reduce metadata-related queries to the underlying datastore (for instance, during query validation).
And finally, it is time for the ODBC.
We have developed an ODBC driver that retrieves data from Crossdata and allows us to integrate with different applications and business intelligence tools. To mention some of them, we currently support Tableau, QlikView and Excel. It is important to highlight that, given the generic nature of Crossdata, the existence of this ODBC driver opens the possibility of connecting any datastore through ODBC by just writing a new connector. And believe us when we tell you that it is easy to do so.