In today’s world of agile business, Java developers and organizations benefit when JSON-based NoSQL databases and SQL-based querying come together. NoSQL provides schema flexibility and elastic scaling; SQL provides expressive data access that is independent of how the data is modeled and stored. Java developers need to deliver apps that readily evolve, perform, and scale with changing business needs. Organizations need rapid access to their operational data, using standard analytical tools, for insight into their business. In this session, you will learn to build apps that combine NoSQL and SQL for agility, performance, and scalability (a brief query sketch follows the list below). This includes:
• JSON data modeling
• Indexing
• Tool integration
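To make the combination concrete, here is a minimal, hedged sketch of the kind of SQL-over-JSON query such a session refers to (N1QL syntax; the `customers` keyspace, its fields, and the sample document are illustrative assumptions, not taken from the talk):

```sql
-- Assumed document shape in a hypothetical `customers` keyspace:
-- { "name": "Ada", "status": "active",
--   "address": { "city": "Columbus", "country": "US" } }

SELECT c.name,
       c.address.city          -- dotted paths reach into nested JSON
FROM   customers AS c
WHERE  c.status = "active"
  AND  c.address.country = "US"
ORDER BY c.name
LIMIT  20;
```

The query reads like ordinary SQL, while the documents it selects from and the results it returns remain JSON.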
Agile Data Engineering: Introduction to Data Vault 2.0 (2018) - Kent Graziano
(updated slides used for North Texas DAMA meetup Oct 2018) As we move more and more towards the need for everyone to do Agile Data Warehousing, we need a data modeling method that can be agile with us. Data Vault Data Modeling is an agile data modeling technique for designing highly flexible, scalable, and adaptable data structures for enterprise data warehouse repositories. It is a hybrid approach using the best of 3NF and dimensional modeling. It is not a replacement for star schema data marts (and should not be used as such). This approach has been used in projects around the world (Europe, Australia, USA) for over 15 years and is now growing in popularity. The purpose of this presentation is to provide attendees with an introduction to the components of the Data Vault Data Model, what they are for and how to build them. The examples will give attendees the basics:
• What the basic components of a DV model are (a schematic SQL sketch follows this list)
• How to build and design structures incrementally, without constant refactoring
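As a rough illustration of those components, the following is a minimal DDL sketch of a Data Vault hub and one of its satellites. It is not taken from the slides; all table names, columns, and types are assumptions made here for illustration:

```sql
-- Hub: the unique business keys for one core business concept.
CREATE TABLE hub_customer (
    customer_hash_key CHAR(32)    NOT NULL PRIMARY KEY,  -- hash of the business key
    customer_number   VARCHAR(20) NOT NULL,              -- the business key itself
    load_date         TIMESTAMP   NOT NULL,
    record_source     VARCHAR(50) NOT NULL
);

-- Satellite: descriptive attributes and their change history, hanging off the hub.
CREATE TABLE sat_customer_details (
    customer_hash_key CHAR(32)    NOT NULL REFERENCES hub_customer,
    load_date         TIMESTAMP   NOT NULL,
    customer_name     VARCHAR(100),
    city              VARCHAR(100),
    record_source     VARCHAR(50) NOT NULL,
    PRIMARY KEY (customer_hash_key, load_date)
);
```

Links (not shown) relate hubs to one another in the same append-only style, which is what lets new structures be added incrementally without refactoring existing ones.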
How to Apply Big Data Analytics and Machine Learning to Real Time Processing ... - Codemotion
The world gets more connected every year thanks to mobile, cloud, and the Internet of Things. "Big Data" is currently heavily hyped. Large amounts of historical data are stored in Hadoop to find patterns, e.g. for predictive maintenance or cross-selling. But how do you increase revenue or reduce risk in new transactions? "Fast Data" via stream processing is the solution for embedding patterns into future actions in real time. This session discusses how machine learning and analytic models built with R, Spark MLlib, H2O, etc. can be integrated into real-time event processing. A live demo concludes the session.
Data Con LA 2018 - Agile Integration Using an Enterprise Data Hub by Michael ... - Data Con LA
Agile Integration Using an Enterprise Data Hub by Michael Malgeri, Principal Technologist at MarkLogic, Corp.
Today, business processes and decisions are driven by information contained in a variety of internal systems of record (SoRs) and other external sources (social media, partners, suppliers, customers). Enterprises are discovering that out-of-the-box (OOTB) solutions often don't exist for their operational and observational requirements. In addition, traditional integration solutions often take too long to implement and are brittle. This presentation will discuss how an enterprise operational data hub (ODH) can be an agile, flexible solution for integrating key information sources in a timely manner to drive mission-critical applications and reporting.
"Hadoop and Data Warehouse (DWH) – Friends, Enemies or Profiteers? What about...Kai Wähner
I discuss a good big data architecture that includes Data Warehouse / Business Intelligence + Apache Hadoop + Real Time / Stream Processing. Several real-world examples are shown. TIBCO offers several products for realizing these use cases, e.g. Spotfire (Business Intelligence / BI), StreamBase (Stream Processing), BusinessEvents (Complex Event Processing / CEP) and BusinessWorks (Integration / ESB). TIBCO is also ready for Hadoop, offering connectors and plugins for many important Hadoop frameworks / interfaces such as HDFS, Pig, Hive, Impala, Apache Flume and more.
Non-interactive big-data analysis prohibits experimentation and can interrupt the analyst’s train of thought, yet analyzing and drawing insights in real time is no easy task when jobs often take minutes or hours to complete. What if you want to put an interactive interface in front of that data that allows iterative insights? What if you need that interactive experience to be sub-second?
Traditional SQL and most MPP/NoSQL databases cannot run complex calculations over large data in a performant manner. Popular distributed systems such as Hadoop or Spark can execute the jobs, but their job overhead prohibits sub-second response times. Learn how an in-memory computing framework enabled us to perform complex analysis jobs over massive numbers of data points with sub-second response times, allowing us to plug it into a simple, drag-and-drop web 2.0 interface.
The Mechanics of Testing Large Data Pipelines - C4Media
Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/1Vx8CwP.
Mathieu Bastian explores the mechanics of testing large, complex data workflows and tries to identify the most common challenges developers face. He looks at good practices for developing unit, integration, data, and performance tests for data workflows. In terms of tools, he looks at what exists today for Hadoop, Pig, and Spark, with code examples. Filmed at qconlondon.com.
Mathieu Bastian spent the last four years building data products at LinkedIn. He joined LinkedIn's Data Science team in late 2010 and first worked on InMaps, a system that served millions of complex graph visualizations to users. Previously, Bastian co-founded and led the development of Gephi, an award-winning open-source software to visualize and analyze large graphs.
This document provides an introduction to MongoDB and includes several customer case studies. It discusses how MongoDB allows customers to intelligently store flexible data, run applications anywhere, and scale easily. Case studies describe how an insurance company built a single customer view in 90 days, how the city of Chicago created a real-time geospatial platform, and how an agricultural company increased farm outputs by 8% using IoT data stored in MongoDB. The document argues that MongoDB is uniquely positioned as a modern general purpose database and has significant funding to continue investing in its business.
Nubank is the leading fintech in Latin America. Using bleeding-edge technology, design, and data, the company aims to fight complexity and empower people to take control of their finances. We are disrupting an outdated and bureaucratic system by building a simple, safe and 100% digital environment.
In order to succeed, we need to constantly make better decisions at the speed of insight, and that is what we aim for when building Nubank’s Data Platform. In this talk we want to explore and share the guiding principles and how we created an automated, scalable, declarative, and self-service platform that has more than 200 contributors, mostly non-technical, who build 8,000 distinct datasets, ingesting data from 800 databases and leveraging Apache Spark's expressiveness and scalability.
The topics we want to explore are:
– Making data-ingestion a no-brainer when creating new services
– Reducing the cycle time to deploy new Datasets and Machine Learning models to production
– Closing the loop and leveraging knowledge processed in the analytical environment to make decisions in production
– Providing the perfect level of abstraction to users
You will get from this talk:
– Our love for ‘The Log’ and how we use it to decouple databases from their schemas and distribute the work of keeping schemas up to date across the entire team.
– How we made data ingestion so simple using Kafka Streams that teams stopped using databases for analytical data.
– The huge benefits of relying on the DataFrame API to create datasets, which made it possible to have end-to-end tests verifying that all 8,000 datasets work without even running a Spark job, and much more.
– The importance of creating the right amount of abstractions and restrictions to have the power to optimize.
A Journey to a Serverless Business Intelligence, Machine Learning and Big Dat... - DataWorks Summit
In this talk we will describe the journey we made with one of our customers, Volotea, to deploy a serverless Business Intelligence (BI), Machine Learning (ML) and Big Data (BD) platform on the Cloud. The new platform leverages Platform-as-a-Service (PaaS) Cloud services, and it is the result of the reengineering and extension of an existing platform based on Cloud Infrastructure-as-a-Service (IaaS) services and bare-metal systems. Managing and maintaining BI, ML and BD platforms based on bare-metal or IaaS deployments is not a straightforward task, and as size and complexity grow, we often find ourselves spending more and more time on administrative tasks rather than on development or analytics. That is exactly what Volotea realized, and together we envisioned and executed a plan to lift and reengineer their platform into a new solution that leverages Microsoft Azure PaaS services. We have delivered a solution that greatly reduces the administrative burden as well as the technical complexity of implementing new use cases. The new platform is based on the Microsoft Azure stack and includes Azure Data Lake, Azure Data Lake Analytics, Azure Data Factory, Azure Machine Learning and Azure SQL Database. Join us in this talk where we will share our lessons learned and discuss how to plan and execute such an endeavor.
Customer Event Hub – a modern Customer 360° view with DataStax Enterprise (DSE) - Guido Schmutz
Today, companies use various channels to communicate with their customers. As a consequence, a lot of data is created, more and more of it outside the traditional IT infrastructure of an enterprise. This data often does not have a common format and is continuously created at ever-increasing volume. With the Internet of Things (IoT) and its sensors, both the volume and the velocity of data become even more extreme.
To achieve a complete and consistent view of a customer, all this customer-related information has to be included in a 360-degree view in a real-time or near-real-time fashion. By that, the Customer Hub becomes the Customer Event Hub. It constantly shows the current view of a customer across all interaction channels and provides an enterprise with the basis for a substantial and effective customer relationship.
This presentation shows the value of such a platform and how it can be implemented.
The State of Streaming Analytics: The Need for Speed and Scale - VoltDB
Peter Vescuso, CMO of VoltDB, and Mike Gualtieri, Principal Analyst at Forrester Research, talk about the latest on the streaming analytics market, in-memory database technology, and the emergence of ‘translytics’. Also hear how the latest release of VoltDB, the in-memory operational database, simplifies the distributed-system complexity of building high-speed data pipelines with rapid ingestion of data, real-time analytics, Apache Kafka support, and Elasticsearch support.
eBay has one of the largest and most active data platforms in the world. The presentation discusses eBay's business, strategy, and key trends driving commerce. It then provides details on eBay's big data platform, including the large volume of data collected daily and how it is captured, transformed and synthesized to provide actionable insights. The presentation concludes by discussing how eBay has evolved to a governed self-service model for analytics to better organize and provide a consistent experience for its diverse user community.
The Scout24 Data Platform (A Technical Deep Dive) - Raffael Dzikowski
The document provides an overview of the Scout24 Data Platform and its evolution towards becoming a truly data-driven company. Some key points:
- Scout24 operates various household brands across 18 countries with 80 million household reach.
- Historically, Scout24's technical architecture included a monolithic application and data warehouse that acted as a bottleneck.
- To address this, Scout24 built an internal "data platform" consisting of a microservices architecture, data lake, self-service analytics, and data ingestion tools to enable fast, easy product development supported by data and analytics.
- The data platform is thought of as a product in itself that provides generic layers for Scout24's products to be built upon.
50 Shades of Data - Dutch Oracle Architects Platform (February 2018) - Lucas Jellema
Gone are the days of a single enterprise database – typically an Oracle RDBMS – that holds all data in a strictly normalized form. We work with many more types of data (big and fast, structured and unstructured) that we use in various ways. Relational and ACID are not applicable to all of them. Nor does all data always have to be the latest and freshest to be valid. We will continue to see an increase in specialized data stores that cater to specific needs and specific scenarios.
This presentation combines a talk and a demonstration on the various dimensions and use cases of using data and data stores in various ways – while ensuring the appropriate (!) levels of freshness, integrity, and performance. Key takeaway: what you as an architect should know about the various types of data in enterprise IT and how to store, manage, query, and manipulate them; what products and technologies are at your disposal; and how you can make these work together for a consistent (enough) overall data presentation. How are upcoming architectural patterns such as CQRS (command query responsibility segregation), event sourcing, and microservices influencing the way we handle data in the enterprise? Some of the technologies discussed: products such as MongoDB, MySQL, Neo4J, Apache Kafka, Redis, Elasticsearch and Hadoop/Spark, Oracle Data Hub Cloud (based on Apache Cassandra) – used locally, in containers, and in the cloud. Additionally we will discuss data replication scenarios.
Data Beats Emotions – How DATEV Generates Business Value with Data-driven Dec... - DataWorks Summit
Four years ago we started with a data analytics platform to learn more about how our customers use our on-premise software and how it behaves out in the field with regard to function usage, exception rates, and overall performance.
The talk is about the journey we had to take, coming from an existing web-app statistics tracking system to our current and still evolving Hadoop-based ETL system. This includes the current technologies we use and the approach we take to support reporting and dashboards.
This new Hadoop platform is used to collect, transform, enrich with data warehouse data, and analyze millions of log files every day. The generated insights help us make data-driven decisions for portfolio management, UX design, and overall software quality improvements with real business value.
You will hear about the dos and don'ts we learned, what we think are best practices, and the new challenges we have to deal with while data volume and management awareness are still growing.
Mastering MapReduce: MapReduce for Big Data Management and Analysis - Teradata Aster
Whether you’ve heard of Google’s MapReduce or not, its impact on Big Data applications, data warehousing, ETL, business intelligence, and data mining is re-shaping the market for business analytics and data processing.
Attend this session to hear from Curt Monash on the basics of the MapReduce framework, how it is used, and what implementations like SQL-MapReduce enable.
In this session you will learn:
* The basics of MapReduce, key use cases, and what SQL-MapReduce adds
* Which industries and applications are heavily using MapReduce
* Recommendations for integrating MapReduce into your own BI and data warehousing environment
Key Considerations for Putting Hadoop in Production - MapR Technologies
This document discusses planning for production success with Hadoop. It covers key questions around business continuity, high availability, data protection and disaster recovery. It also discusses considerations for multi-tenancy, interoperability and high performance. Additionally, it provides an overview of MapR's enterprise-grade data platform and highlights how it addresses production requirements through features like its NFS interface, strong data protection, and high availability.
Time series Analytics - a deep dive into ADX Azure Data Explorer @Data Saturd... - Riccardo Zamana
Riccardo Zamana presented on time series analytics using Azure Data Explorer (ADX). The presentation covered ADX basics, architecture, quickstart, ingestion techniques including LightIngest and One Click ingestion, query optimization techniques like materialized views and caching, and visualization using dashboards in Kusto and Grafana. The document provided code examples of queries, functions, and operators for time series analysis in ADX.
Data analysis has evolved significantly from its origins in statistics. The evolution included developments in database management technology, business intelligence and analytic platforms, and statistical processing technologies. The key pillars of modern data analysis systems are big data, open source software, and cloud computing. These have transformed data analysis by enabling massive and fast distributed processing of both structured and unstructured data at large scale and low costs. Emerging trends in data analysis include the rise of citizen data scientists, algorithm marketplaces, and smarter data discovery tools that make analytics more accessible.
EsgynDB: A Big Data Engine. Simplifying Fast and Reliable Mixed Workloads - Srikanth Ramakrishnan
Born in HP Labs, EsgynDB is an open-source, all-in-one SQL database engine for Hadoop. It is the only platform that provides mixed workloads (real-time operational reporting, analytics, and transactions) with high performance and concurrency.
EsgynDB gives you the ease and security of traditional (R)DBMS and the cost-effective scale of Hadoop.
Real-time Big Data Analytics in the IBM SoftLayer Cloud with VoltDB - VoltDB
Real-time analytics on streaming data is a strategic activity. Enterprises that can tap streaming data to uncover insights and take action faster than their competition gain business advantage. Join John Hugg, Founding Engineer, VoltDB and Pethuru Raj Chelliah and Skylab Vanga, Infrastructure Architect and Specialists, IBM SoftLayer to learn how VoltDB enables high performance and real-time big data analytics in the IBM SoftLayer cloud.
The Scout24 Data Landscape Manifesto: Building an Opinionated Data Platform - Rising Media Ltd.
The Scout24 Data Landscape Manifesto is the formalization of our opinions on how a successful data-driven company should approach data. In a truly data-driven company, no manager, no salesperson, no engineer and no data scientist can do their job properly without easy access to large amounts of high-quality data. It is Sean's mandate to create a platform that encourages the production of high-quality data and enables engagement with data by all employees. He and his team are opinionated about how all producers and consumers of data need to be active participants in the data platform, to make data-driven decisions and to be responsible for the data they produce. And he built the data platform with 'nudges' that reward data usage that matches his vision for a data-driven company. In this talk, Sean will present the Scout24 Data Landscape Manifesto and will show how the strong opinions it contains enabled him to successfully migrate from a classic centralized data warehouse to a decentralized, scalable, cloud-based data platform at AutoScout24 and ImmobilienScout24 that is core to their analytics and machine learning activities.
SQL In Hadoop: Big Data Innovation Without the Risk - Inside Analysis
The Briefing Room with Dr. Robin Bloor and Actian
Live Webcast July 14, 2015
Watch the Archive: https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=bbd4395ea2f8c60a03cfefc68c7aa823
Innovation often implies risk, which is why businesses have many issues to weigh when considering change. Yet the remarkable growth of data is driving many traditional systems into the ground, forcing information workers to take a critical look at their existing tools. Technologies like Hadoop offer economical solutions to big data management, but to truly take advantage of its capabilities, organizations must modernize their infrastructure.
Register for this episode of The Briefing Room to learn from veteran Analyst Dr. Robin Bloor as he explains how and why organizations should improve legacy systems. He’ll be briefed by Todd Untrecht of Actian, who will tout his company’s Actian Vortex, a SQL-in-Hadoop solution. He will show how integrating a SQL engine directly in the Hadoop cluster can lead to faster analytics and greater control, while still maintaining existing investments.
Visit InsideAnalysis.com for more information.
From BI Developer to Data Engineer with Oracle Analytics Cloud, Data Lake - Rittman Analytics
This session looks at the role of a data engineer in designing, provisioning, and enabling an Oracle Cloud data lake using Oracle Analytics Cloud, Data Lake. Attendees learn how to use data flow and data pipeline authoring tools and how machine learning and AI can be applied to this task, as well as how to connect to database and SaaS sources along with sources of external data via Oracle Data as a Service. Discover how traditional Oracle Analytics developers can transition their skills into this role and start working as a data engineer on Oracle Public Cloud data lake projects.
WJAX 2013 Slides online: Big Data beyond Apache Hadoop - How to integrate ALL... - Kai Wähner
Big data represents a significant paradigm shift in enterprise technology. Big data radically changes the nature of the data management profession as it introduces new concerns about the volume, velocity, and variety of corporate data. Apache Hadoop is the open-source de facto standard for implementing big data solutions on the Java platform. Hadoop consists of its kernel, MapReduce, and the Hadoop Distributed File System (HDFS). A challenging task is to send all data to Hadoop for processing and storage (and then get it back to your application later), because in practice data comes from many different applications (SAP, Salesforce, Siebel, etc.) and databases (file, SQL, NoSQL), uses different technologies and concepts for communication (e.g. HTTP, FTP, RMI, JMS), and comes in different data formats such as CSV, XML, binary data, or other alternatives. This session shows different open-source frameworks and products that solve this challenging task. Learn how to use all conceivable data with Hadoop – without a lot of complex or redundant boilerplate code.
During this presentation, Infusion and MongoDB shared their mainframe optimization experiences and best practices. These have been gained from working with a variety of organizations, including a case study from one of the world’s largest banks. MongoDB and Infusion bring a tested approach that provides a new way of modernizing mainframe applications, while keeping pace with the demand for new digital services.
Bringing SQL to NoSQL: Rich, Declarative Query for NoSQL - Keshav Murthy
Abstract
NoSQL databases bring the benefits of schema flexibility and elastic scaling to the enterprise. Until recently, these benefits have come at the expense of giving up rich declarative querying as represented by SQL.
In today’s world of agile business, developers and organizations need the benefits of both NoSQL and SQL in a single platform. NoSQL (document) databases provide schema flexibility, fast lookup, and elastic scaling. SQL-based querying provides expressive data access and transformation; separation of querying from modeling and storage; and a unified interface for applications, tools, and users.
Developers need to deliver applications that can easily evolve, perform, and scale. Otherwise, the cost, effort, and delay in keeping up with changing business needs will become significant disadvantages.
Organizations need sophisticated and rapid access to their operational data in order to maintain insight into their business. This access should support both pre-defined and ad-hoc querying, and should integrate with standard analytical tools.
This talk will cover how to build applications that combine the benefits of NoSQL and SQL to deliver agility, performance, and scalability. It includes:
- N1QL, which extends SQL to JSON
- JSON data modeling
- Indexing and performance
- Transparent scaling
- Integration and ecosystem
You will walk away with an understanding of the design patterns and best practices for effective utilization of NoSQL document databases - all using open-source technologies.
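As a rough illustration of the "expressive data access and transformation" described above, here is a minimal N1QL sketch; the `products` keyspace and its fields are assumptions made for this example, not taken from the talk:

```sql
-- 1. Reshape documents into a new JSON form directly in the projection.
SELECT {"id": META(p).id, "name": p.name, "tags": p.tags} AS summary
FROM   products AS p
WHERE  p.price BETWEEN 10 AND 50;

-- 2. Aggregate across documents exactly as in relational SQL.
SELECT   p.category,
         COUNT(*)     AS productCount,
         AVG(p.price) AS avgPrice
FROM     products AS p
GROUP BY p.category
HAVING   COUNT(*) > 5;
```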
Introducing N1QL: New SQL Based Query Language for JSON - Keshav Murthy
This session introduces N1QL and sets the stage for the rich selection of N1QL-related sessions at Couchbase Connect 2015. N1QL is SQL for JSON, extending the querying power of SQL with the modeling flexibility of JSON. In this session, you will get an introduction to the N1QL language, architecture, and ecosystem, and you will hear the benefits of N1QL for developers and for enterprises.
Querying NoSQL with SQL - KCDC - August 2017 - Matthew Groves
Until recently, agile business had to choose between the benefits of JSON-based NoSQL databases and the benefits of SQL-based querying. NoSQL provides schema flexibility, high performance, and elastic scaling, while SQL provides expressive, independent data access. Recent convergence allows developers and organizations to have the best of both worlds.
Developers need to deliver apps that readily evolve, perform, and scale, all to match changing business needs. Organizations need rapid access to their operational data, using standard analytical tools, for insight into their business. In this session, you will learn the ways that SQL can be applied to NoSQL databases (N1QL, SQL++, ODBC, JDBC, and others), and what additional features are needed to deal with JSON documents. SQL for JSON, JSON data modeling, indexing, and tool integration will be covered.
From SQL to NoSQL: Structured Querying for JSON - Keshav Murthy
Can SQL be used to query JSON? SQL is the universally known structured query language, used for well-defined, uniformly structured data, while JSON is the lingua franca of flexible data management, used to define complex, variably structured data objects.
Yes! SQL can most definitely be used to query JSON with Couchbase's SQL query language for JSON, called N1QL (pronounced "nickel").
In this session, we will explore how N1QL extends SQL to provide the flexibility and agility inherent in JSON while leveraging the universality of SQL as a query language.
We will discuss utilizing SQL to query complex JSON objects that include arrays, sets and nested objects.
You will learn about the powerful query expressiveness of N1QL, including the latest features that have been added to the language. We will cover how using N1QL can solve your real-world application challenges, based on the actual queries of Couchbase end-users.
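For a concrete feel of how N1QL handles arrays and nested objects, here is a small hedged sketch; the `orders` keyspace, its fields, and the sample document are assumptions made for illustration rather than queries from the talk:

```sql
-- Assumed document shape:
-- { "orderId": "O-1001", "customer": "C-42",
--   "items": [ { "sku": "A1", "qty": 2, "price": 9.99 },
--              { "sku": "B7", "qty": 1, "price": 24.50 } ] }

-- UNNEST flattens each array element into its own result row,
-- so ordinary SQL aggregation works over array contents.
SELECT   o.orderId,
         SUM(i.qty * i.price) AS orderTotal
FROM     orders AS o
UNNEST   o.items AS i
GROUP BY o.orderId;

-- ANY ... SATISFIES filters on array contents without flattening them.
SELECT o.orderId
FROM   orders AS o
WHERE  ANY i IN o.items SATISFIES i.sku = "B7" END;
```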
Docker Summit MongoDB - Data Democratization - Chris Grabosky
The document summarizes a presentation given by Chris Grabosky, a Solutions Architect at MongoDB. It discusses how containerization and data democratization can help organizations build cloud native applications. It highlights how MongoDB and Docker can be used together to build scalable and portable apps. It also covers data modeling approaches, workload isolation, analytics, and data access controls that MongoDB provides to help democratize data.
JSON Data Modeling - June 2017 - Pittsburgh Tech Fest - Matthew Groves
The document discusses JSON data modeling and accessing data in NoSQL databases. It covers why organizations adopt NoSQL, modeling data in relational versus JSON document models, strategies for modeling different types of data, and methods for accessing data including key-value operations, queries using N1QL and map reduce, and migrating data into NoSQL databases from relational sources. The presentation aims to help attendees understand how to design their data model and choose the best approach to working with data in a NoSQL database like Couchbase.
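As a small, hedged illustration of the relational-versus-document contrast described above: what a relational schema would split into an invoice row plus several line-item rows can be modeled as one JSON document. The keyspace, key, and field names below are assumptions, shown in N1QL insert syntax:

```sql
-- One document carries the invoice and its embedded line items.
INSERT INTO invoices (KEY, VALUE)
VALUES ("invoice::2017-0001", {
  "type": "invoice",
  "customer": { "id": "C-42", "name": "Acme Corp" },
  "issuedOn": "2017-06-15",
  "lines": [
    { "sku": "A1", "qty": 2, "price": 9.99 },
    { "sku": "B7", "qty": 1, "price": 24.50 }
  ]
});
```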
Couchbase Tutorial: Big data Open Source Systems: VLDB2018 - Keshav Murthy
The document provides an agenda and introduction to Couchbase and N1QL. It discusses Couchbase architecture, data types, data manipulation statements, query operators like JOIN and UNNEST, indexing, and query execution flow in Couchbase. It compares SQL and N1QL, highlighting how N1QL extends SQL to query JSON data.
N1QL+GSI: Language and Performance Improvements in Couchbase 5.0 and 5.5 - Keshav Murthy
N1QL gives developers and enterprises an expressive, powerful, and complete language for querying, transforming, and manipulating JSON data. We’ll begin this session with a brief overview of N1QL and then explore some key enhancements we’ve made in the latest versions of Couchbase Server. Couchbase Server 5.0 has language and performance improvements for pagination, index exploitation, integration, index availability, and more. Couchbase Server 5.5 will offer even more language and performance features for N1QL and global secondary indexes (GSI), including ANSI joins, aggregate performance, index partitioning, auditing, and more. We’ll give you an overview of the new features as well as practical use case examples.
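To show what the ANSI-join support mentioned above looks like in practice, here is a minimal hedged sketch; the `orders` and `customers` keyspaces and their fields are assumptions made for illustration, and the right-hand side of such a join generally needs a suitable index on the join field:

```sql
-- ANSI join on an arbitrary expression, as in relational SQL
-- (earlier N1QL lookup joins were limited to ON KEYS).
SELECT c.name, o.orderId, o.total
FROM   orders    AS o
JOIN   customers AS c ON o.customerId = META(c).id
WHERE  o.total > 100;
```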
Application Development & Database Choices: Postgres Support for non Relation...EDB
This talk will cover the advanced features of PostgreSQL that make it the RDBMS most loved by developers and a great choice for non-relational workloads (a brief JSONB sketch follows the list below).
This webinar will explore:
- Global adoption of Postgres
- Document-centric applications
- Geographic Information Systems (GIS)
- Business intelligence
- Central data centers
- Server-side languages
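As a hedged sketch of the document-centric side of that list, PostgreSQL can store and index JSON documents with the JSONB type; the table, index, and field names below are assumptions made for illustration:

```sql
-- A table holding JSON documents.
CREATE TABLE events (
    id      BIGSERIAL PRIMARY KEY,
    payload JSONB NOT NULL
);

-- A GIN index lets containment queries on the JSONB column use an index.
CREATE INDEX idx_events_payload ON events USING GIN (payload);

-- ->> extracts a field as text; @> tests JSON containment.
SELECT payload->>'user'        AS user_name,
       payload->'geo'->>'city' AS city
FROM   events
WHERE  payload @> '{"type": "login"}';
```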
1. MongoDB Stitch is a backend as a service that allows developers to easily work with data and integrate their apps with key services.
2. It provides integrated rules, pipelines, and services to handle complex workflows between databases and third party services.
3. Requests made to Stitch are parsed, rules are applied, databases and services are orchestrated, results are aggregated and returned to the client.
This document provides an agenda and overview for a N1QL workshop on indexing and query tuning in Couchbase 4.0. The agenda includes sections on view index, global secondary index (GSI), multi-index scan, hands-on N1QL, query tuning, index selection hints, key-value access, joins, and more hands-on N1QL. The overview sections explain indexing in Couchbase including the primary index, secondary indexes, composite indexes, index intersection for multi-index scans, and the query execution flow involving parsing, planning, scanning indexes, and fetching documents.
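For orientation, here is a minimal hedged sketch of the kinds of indexing statements such a workshop covers; the `travelers` keyspace and its fields are assumptions made for this example:

```sql
-- Primary index: lets any N1QL query run against the keyspace, at the cost of full scans.
CREATE PRIMARY INDEX ON travelers;

-- Composite secondary index (GSI): serves filters and ordering on its keys.
CREATE INDEX idx_traveler_country_city ON travelers (country, city, points);

-- Every field referenced here is an index key, so the query can be
-- answered from the index alone (a covered query).
SELECT city, points
FROM   travelers
WHERE  country = "France" AND points > 1000;
```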
Utilizing Arrays: Modeling, Querying and Indexing - Keshav Murthy
Arrays can be simple; arrays can be complex. JSON arrays give you a method to collapse the data model while retaining structure flexibility. Arrays of scalars, objects, and arrays are common structures in a JSON data model. Once you have this, you need to write queries to update and retrieve the data you need efficiently. This talk will discuss modeling and querying arrays. Then, it will discuss using array indexes to help run those queries on arrays faster.
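A small hedged sketch of the array-index idea described above; the `routes` keyspace, its `schedule` array, and the fields used are assumptions for illustration:

```sql
-- Index the elements of the "schedule" array.
CREATE INDEX idx_route_schedule_day
    ON routes (DISTINCT ARRAY s.day FOR s IN schedule END);

-- An ANY ... SATISFIES predicate over the same array expression can use that index.
SELECT META(r).id, r.sourceairport, r.destinationairport
FROM   routes AS r
WHERE  ANY s IN r.schedule SATISFIES s.day = 1 END;
```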
Big Data for Small Businesses & StartupsFujio Turner
Big Data is not just for Big Businesses. In this slideshare we will cover how small businesses and startups can leverage Big Data to increase revenue. HPCC Systems lets you get started with only one machine and grow to exabytes.
1. Mining and understanding customers' behavior from data outside the firewall and joining it with internal data to turn it into actionable marketing strategies.
2. Understanding your whole business with BI tools. Learn how Big Data helps join data from different parts of your business to see the big picture.
JSON Data Modeling - July 2018 - Tulsa Techfest - Matthew Groves
If you’re thinking about using a document database, it can be intimidating to start. A flexible data model gives you a lot of choices, but which way is the right way? Is a document database even the right tool? In this session we’ll go over the basics of data modeling using JSON. We’ll compare and contrast with traditional RDBMS modeling. Impact on application code will be discussed, as well as some tooling that could be helpful along the way. The examples use the free, open-source Couchbase Server document database, but the principles from this session can also be applied to CosmosDb, Mongo, RavenDb, etc.
JSON Data Modeling - GDG Indy - April 2020Matthew Groves
Presented virtually at GDG Indy - https://www.meetup.com/indy-gdg/events/269467916/
If you’re thinking about using a document database, it can be intimidating to start. A flexible data model gives you a lot of choices, but which way is the right way? Is a document database even the right tool? In this session we’ll go over the basics of data modeling using JSON. We’ll compare and contrast with traditional RDBMS modeling. Impact on application code will be discussed, as well as some tooling that could be helpful along the way. The examples use the free, open-source Couchbase Server document database, but the principles from this session can also be applied to CosmosDb, Mongo, RavenDb, etc.
The document discusses using WSO2's big data analytics platform to analyze social media streams like Twitter data. It describes how the WSO2 Business Activity Monitor can collect data from various sources, and the Complex Event Processor can process multiple event streams and identify patterns. An example use case presented is analyzing Twitter tags over time to identify trends and public interest around certain topics, and notifying users via email of interesting tweets. The proposed solution architecture shows how a Twitter agent can collect data and publish it to BAM for storage and processing by CEP queries.
Watch this Fast Data Strategy Virtual Summit session with speakers Mano Vajpey, Managing Director, SimplicityBI & John Skier, Director Systems Integration, IBM here: https://goo.gl/Qf2zRW
Today's complex organizations will often have hundreds if not thousands of data repositories, distributed across on-premises stores and now the cloud. For data to become truly democratized, non-specialists and data natives need to be able to easily find and access data without requiring outside help.
Attend this session to discover:
• Why organizations need to rethink the way they work with data
• How a data marketplace facilitates a single access point for all of an organization's data assets
• The role of data virtualization in enabling a data marketplace
Similar to SQL for JSON: Rich, Declarative Querying for NoSQL Databases and Applications
N1QL is a developer favorite because it is SQL for JSON. Developers' lives are going to get easier with the upcoming N1QL features. We have exciting features in many areas, from language to performance, indexing to search, and tuning to transactions. This session will preview the new features for both new and advanced users.
XLDB Lightning Talk: Databases for an Engaged World: Requirements and Design...Keshav Murthy
Traditional databases have been designed for systems of record and analytics. Modern enterprises have orders of magnitude more interactions than transactions. Couchbase Server is a rethinking of the database for interactions and engagements, called Systems of Engagement. Memory today is much cheaper than disks were when traditional databases were designed back in the 1970s, and networks are much faster and much more reliable than ever before. Application agility is also an extremely important requirement. Today's Couchbase Server is a memory- and network-centric, shared-nothing, auto-partitioned, and distributed NoSQL database system that offers both key-based and secondary index-based data access paths as well as API- and query-based data access capabilities. This lightning talk gives you an overview of the requirements posed by next-generation database applications and the approach to implementation, including "Multi-Dimensional Scaling."
Couchbase 5.5: N1QL and Indexing featuresKeshav Murthy
This deck contains a high-level overview of the N1QL and indexing features in Couchbase 5.5: ANSI joins, hash joins, index partitioning, grouping and aggregation performance, auditing, query performance features, and infrastructure features.
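For illustration only (the orders and customer keyspaces and their fields are assumptions, not from the deck), the ANSI join support added in 5.5 lets you join on an arbitrary expression rather than only on document keys:

SELECT c.name, o.total
FROM orders o
JOIN customer c ON o.custId = META(c).id
WHERE o.total > 100;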
The document discusses improvements to the N1QL query optimizer and execution engine in Couchbase Server 5.0. Key improvements include UnionScan to handle OR predicates using multiple indexes, IntersectScan terminating early for better performance, implicit covering array indexes, stable scans, efficiently pushing composite filters, pagination support, index column ordering, aggregate pushdown, and index projections.
Mindmap: Oracle to Couchbase for developersKeshav Murthy
This deck provides a high-level comparison between Oracle and Couchbase: Architecture, database objects, types, data model, SQL & N1QL statements, indexing, optimizer, transactions, SDK and deployment options.
Queries need indexes to run fast and to use resources efficiently. Which indexes should you create, and what rules should you follow to create the right indexes for your workload? This presentation gives you those rules.
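As one small, hypothetical illustration of such a rule (keyspace and field names are assumptions), a composite index whose leading keys match the query's equality and range predicates lets the index do the filtering and, with a matching key order, the sorting:

CREATE INDEX idx_state_date ON orders (custState, orderDate, total);

SELECT orderDate, total
FROM orders
WHERE custState = "CA" AND orderDate >= "2018-01-01"
ORDER BY orderDate
LIMIT 20;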
N1QL = SQL + JSON. N1QL gives developers and enterprises an expressive, powerful, and complete language for querying, transforming, and manipulating JSON data. We begin with a brief overview. Couchbase 5.0 has language and performance improvements for pagination, index exploitation, integration, and more. We’ll walk through scenarios, features, and best practices.
Tuning for Performance: indexes & QueriesKeshav Murthy
There are three things important in databases: performance, performance, performance. From a simple query that fetches a document to a query joining millions of documents, designing the right data models and indexes is important. There are many indexes you can create, and many options you can choose for each index. This talk will help you understand tuning N1QL queries, exploiting various types of indexes, analyzing system behavior, and sizing indexes correctly.
Understanding N1QL Optimizer to Tune QueriesKeshav Murthy
Every flight has a flight plan. Every query has a query plan. You must have seen its text form, called the EXPLAIN plan. The query optimizer is responsible for creating this query plan for every query, and it tries to create an optimal plan each time. In Couchbase, the query optimizer has to choose the most optimal index for the query, decide on the predicates to push down to index scans, create appropriate spans (scan ranges) for each index, understand the sort (ORDER BY) and pagination (OFFSET, LIMIT) requirements, and create the plan accordingly. When you think there is a better plan, you can hint the optimizer with USE INDEX. This talk will teach you how the optimizer selects indexes, index scan methods, and joins. It will also teach you how to analyze optimizer behavior using the EXPLAIN plan and how to change the choices the optimizer makes.
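A small, hypothetical example of inspecting and influencing the plan (the keyspace, index, and field names are assumptions):

EXPLAIN SELECT name FROM customer WHERE state = "CA" AND zip = "94040";

If the plan shows a less selective index being chosen, you can steer the optimizer:

SELECT name FROM customer USE INDEX (idx_customer_zip)
WHERE state = "CA" AND zip = "94040";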
N1QL supports select, join, project, nest, and unnest operations on flexible-schema documents represented in JSON.
Couchbase 4.5 enhances the data modeling and query flexibility.
When you have a parent-child relationship, child documents point to the parent document, and you join from child to parent. Now, how would you join from parent to child when the parent does not contain a reference to the child? And how would you improve the performance of that query? This presentation explains the syntax and execution of such queries.
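As a rough sketch of that pattern (hypothetical customer and orders keyspaces, where each order stores its customer's document key in custId): create an index on the child's reference field, then join from parent to child using ON KEY ... FOR, the index-join form introduced in Couchbase 4.5:

CREATE INDEX idx_orders_custId ON orders (custId);

SELECT c.name, o.total
FROM customer c
JOIN orders o ON KEY o.custId FOR c;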
Enterprise Architect's view of Couchbase 4.0 with N1QLKeshav Murthy
Enterprise architects have to decide on a database platform that meets various requirements: performance and scalability on one side; ease of data modeling and agile development on the other; elasticity and flexibility to handle change easily; and integration with tools and the surrounding ecosystem. This presentation highlights the challenges and the approaches to solving them using Couchbase with N1QL.
Variety is the spice of life, but it's also the reality of big data. For this reason, JSON has become the lingua franca of data on the internet – for APIs, data exchange, data storage, and data processing. In the business intelligence world, SQL is the language used to analyze data in other forms. Hence the myriad of "SQL-on-Hadoop" projects. However, traditional SQL isn't JSON/Parquet/etc. friendly. ETL into flattened tables is costly and not real time.
Apache Drill unifies SQL with a variety of data forms on Hadoop. That enables interactive analytics using your favorite BI and visualization tools on your data simultaneously. In this talk, we'll introduce Apache Drill and describe use cases.
- See more at: http://nosql2014.dataversity.net/sessionPop.cfm?confid=81&proposalid=6850
Accelerating analytics on the Sensor and IoT Data. Keshav Murthy
Informix Warehouse Accelerator (IWA) has helped traditional data warehousing performance improve dramatically. Now, IWA accelerates analytics over sensor data stored as relational and time series data.
You know what iMEAN? Using MEAN stack for application dev on InformixKeshav Murthy
You know what iMEAN? Using MEAN stack for application dev on Informix. MongoDB, ExpressJS, AngularJS, NodeJS combine to form a MEAN stack for quick appdev. iMEAN is using the same stack to develop applications on Informix.
Informix SQL & NoSQL: Putting it all togetherKeshav Murthy
IBM Informix is a database management system that provides capabilities for handling different types of data including relational tables, JSON collections, and time series data. It uses a hybrid approach that allows seamless access to different data types using SQL and NoSQL APIs. The document discusses how Informix can be used to store and analyze IoT, mobile, and sensor data from devices and gateways in both on-premises and cloud environments. It also highlights the Informix Warehouse Accelerator for in-memory analytics and how Informix can be integrated with other IBM products and services like MongoDB, Bluemix, and Cognos.
Informix SQL & NoSQL -- for Chat with the labs on 4/22Keshav Murthy
This document discusses how Informix can be used for both SQL and NoSQL applications. It notes that applications now need to support mobile, big data, and social/digital collaboration. NoSQL databases like MongoDB are growing in popularity due to their ability to handle these new requirements. The document then outlines how Informix provides drivers and tools that allow existing MongoDB applications and data models to run directly on Informix, and also allows SQL applications to access and query JSON document data stored in Informix. It discusses features like indexing, querying, scaling, and hybrid access to both relational and JSON data from a single database platform.
NoSQL Deepdive - with Informix NoSQL. IOD 2013Keshav Murthy
Deep dive into IBM Informix NoSQL with the MongoDB API and hybrid access. Informix now supports the MongoDB API and stores JSON natively, thus supporting flexible schema. Informix NoSQL also supports sharding, enabling scale-out. This presentation gives an overview, from a real application down to the technical details.
Informix NoSQL & Hybrid SQL detailed deep diveKeshav Murthy
This document provides an overview of Informix NoSQL capabilities and use cases. It discusses key-value stores, column family stores, document databases, and graph databases supported by Informix NoSQL. Several business uses of Informix NoSQL are outlined, including session store, user profile store, content metadata store, mobile apps, third party data aggregation, caching, and ecommerce. The document also compares pricing of Informix and MongoDB editions over a three year period. Finally, it provides timelines for go-to-market strategies for DB2 JSON and Informix JSON capabilities.
Transform Your Communication with Cloud-Based IVR SolutionsTheSMSPoint
Discover the power of Cloud-Based IVR Solutions to streamline communication processes. Embrace scalability and cost-efficiency while enhancing customer experiences with features like automated call routing and voice recognition. Accessible from anywhere, these solutions integrate seamlessly with existing systems, providing real-time analytics for continuous improvement. Revolutionize your communication strategy today with Cloud-Based IVR Solutions. Learn more at: https://thesmspoint.com/channel/cloud-telephony
DDS Security Version 1.2 was adopted in 2024. This revision strengthens support for long-running systems, adding new cryptographic algorithms, certificate revocation, and hardening against DoS attacks.
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesQuickdice ERP
Explore the seamless transition to e-invoicing with this comprehensive guide tailored for Saudi Arabian businesses. Navigate the process effortlessly with step-by-step instructions designed to streamline implementation and enhance efficiency.
Takashi Kobayashi and Hironori Washizaki, "SWEBOK Guide and Future of SE Education," First International Symposium on the Future of Software Engineering (FUSE), June 3-6, 2024, Okinawa, Japan
SOCRadar's Aviation Industry Q1 Incident Report is out now!
The aviation industry has always been a prime target for cybercriminals due to its critical infrastructure and high stakes. In the first quarter of 2024, the sector faced an alarming surge in cybersecurity threats, revealing its vulnerabilities and the relentless sophistication of cyber attackers.
SOCRadar’s Aviation Industry, Quarterly Incident Report, provides an in-depth analysis of these threats, detected and examined through our extensive monitoring of hacker forums, Telegram channels, and dark web platforms.
Graspan: A Big Data System for Big Code AnalysisAftab Hussain
We built a disk-based parallel graph system, Graspan, that uses a novel edge-pair centric computation model to compute dynamic transitive closures on very large program graphs.
We implement context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases such as Linux shows that their Graspan implementations scale to millions of lines of code and are much simpler than their original implementations.
These analyses were used to augment the existing checkers; these augmented checkers found 132 new NULL pointer bugs and 1308 unnecessary NULL tests in Linux 4.4.0-rc5, PostgreSQL 8.3.9, and Apache httpd 2.2.18.
- Accepted in ASPLOS ‘17, Xi’an, China.
- Featured in the tutorial, Systemized Program Analyses: A Big Data Perspective on Static Analysis Scalability, ASPLOS ‘17.
- Invited for presentation at SoCal PLS ‘16.
- Invited for poster presentation at PLDI SRC ‘16.
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j
Dr. Jesús Barrasa, Head of Solutions Architecture for EMEA, Neo4j
Discover the latest innovations from Neo4j, notably the latest cloud integrations and product improvements that make Neo4j an essential choice for developers building applications with interconnected data and generative AI.
Introducing Crescat - Event Management Software for Venues, Festivals and Eve...Crescat
Crescat is industry-trusted event management software, built by event professionals for event professionals. Founded in 2017, we have three key products tailored for the live event industry.
Crescat Event for concert promoters and event agencies. Crescat Venue for music venues, conference centers, wedding venues, concert halls and more. And Crescat Festival for festivals, conferences and complex events.
With a wide range of popular features such as event scheduling, shift management, volunteer and crew coordination, artist booking and much more, Crescat is designed for customisation and ease-of-use.
Over 125,000 events have been planned in Crescat and with hundreds of customers of all shapes and sizes, from boutique event agencies through to international concert promoters, Crescat is rigged for success. What's more, we highly value feedback from our users and we are constantly improving our software with updates, new features and improvements.
If you plan events, run a venue or produce festivals and you're looking for ways to make your life easier, then we have a solution for you. Try our software for free or schedule a no-obligation demo with one of our product specialists today at crescat.io
Artificial Intelligence and XPath Extension FunctionsOctavian Nadolu
The purpose of this presentation is to provide an overview of how you can use AI from XSLT, XQuery, Schematron, or XML Refactoring operations, the potential benefits of using AI, and some of the challenges we face.
Hand Rolled Applicative User ValidationCode KataPhilip Schwarz
Could you use a simple piece of Scala validation code (granted, a very simplistic one too!) that you can rewrite, now and again, to refresh your basic understanding of Applicative operators <*>, <*, *>?
The goal is not to write perfect code showcasing validation, but rather to provide a small, rough-and-ready exercise to reinforce your muscle memory.
Despite its grandiose-sounding title, this deck consists of just three slides showing the Scala 3 code to be rewritten whenever the details of the operators begin to fade away.
The code is my rough and ready translation of a Haskell user-validation program found in a book called Finding Success (and Failure) in Haskell - Fall in love with applicative functors.
Workshop - Innovating with Generative AI and Knowledge GraphsNeo4j
Workshop - Innovating with Generative AI and Knowledge Graphs
Go beyond the AI hype and discover practical techniques for using AI responsibly across your organization's data. Explore how to use knowledge graphs to increase accuracy, transparency, and explainability in generative AI systems. You will leave with hands-on experience combining data relationships and LLMs to bring domain-specific context and improve reasoning.
Bring your laptop and we will guide you through setting up your own generative AI stack, providing practical, coded examples to get started in minutes.
What is Master Data Management by PiLog Groupaymanquadri279
PiLog Group's Master Data Record Manager (MDRM) is a sophisticated enterprise solution designed to ensure data accuracy, consistency, and governance across various business functions. MDRM integrates advanced data management technologies to cleanse, classify, and standardize master data, thereby enhancing data quality and operational efficiency.
Flutter is a popular open source, cross-platform framework developed by Google. In this webinar we'll explore Flutter and its architecture, delve into the Flutter Embedder and Flutter’s Dart language, discover how to leverage Flutter for embedded device development, learn about Automotive Grade Linux (AGL) and its consortium and understand the rationale behind AGL's choice of Flutter for next-gen IVI systems. Don’t miss this opportunity to discover whether Flutter is right for your project.
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeAftab Hussain
Understanding variable roles in code has been found to be helpful to students in learning programming -- could variable roles help deep neural models in performing coding tasks? We do an exploratory study.
- These are slides of the talk given at InteNSE'23: The 1st International Workshop on Interpretability and Robustness in Neural Software Engineering, co-located with the 45th International Conference on Software Engineering, ICSE 2023, Melbourne Australia
OpenMetadata Community Meeting - 5th June 2024OpenMetadata
The OpenMetadata Community Meeting was held on June 5th, 2024. In this meeting, we discussed about the data quality capabilities that are integrated with the Incident Manager, providing a complete solution to handle your data observability needs. Watch the end-to-end demo of the data quality features.
* How to run your own data quality framework
* What is the performance impact of running data quality frameworks
* How to run the test cases in your own ETL pipelines
* How the Incident Manager is integrated
* Get notified with alerts when test cases fail
Watch the meeting recording here - https://www.youtube.com/watch?v=UbNOje0kf6E
55. Thank you
Q & A
Gerald Sangudi | @sangudi | Chief Architect | Couchbase
Keshav Murthy | @rkeshavmurthy | Director | Couchbase
@N1QL | query.couchbase.com
Editor's Notes
In today’s world of agile business, Java developers and organizations benefit when JSON-based NoSQL databases and SQL-based querying come together. NoSQL provides schema flexibility and elastic scaling. SQL provides expressive, independent data access. Java developers need to deliver apps that readily evolve, perform, and scale with changing business needs. Organizations need rapid access to their operational data, using standard analytical tools, for insight into their business. In this session, you will learn to build apps that combine NoSQL and SQL for agility, performance, and scalability. This includes
• JSON data modeling
• Indexing
• Tool integration
KEY POINT: MAJOR ENTERPRISES ACROSS NUMEROUS INDUSTRIES ARE ADOPTING COUCHBASE NOSQL TECHNOLOGY TO SUPPORT THEIR DATA MANAGEMENT NEEDS.
We’ve looked at what’s driving NoSQL and the advantages it delivers. So who are some of the companies that are actually adopting Couchbase?
As this slide shows, major enterprises across many different industries are adopting Couchbase NoSQL solution.
You see many innovative Internet companies here – like eBay, LinkedIn, Orbitz, and PayPal.
They were the early adopters of NoSQL. But it’s clear that NoSQL is now being adopted by a broad range of companies in many other industries:
Consumer electronics and technology companies like Apple and Cisco
Retail companies like Walmart and Tesco
Financial Services companies like VISA and Wells Fargo
Telcos like Verizon, AT&T and Vodafone
What’s also interesting is that we’re seeing the use of NoSQL expand inside many of these companies. Orbitz, the online travel company, is a great example – they started using Couchbase to store their hotel rate data, and now they use Couchbase in many other ways.
So the big takeaway is that an increasing number of companies across industries are seeing the value of NoSQL for a growing number of use cases.
SQL (relational) databases are great. They give you a LOT of functionality.
A great set of abstractions (tables, columns, data types, constraints, triggers, SQL, ACID TRANSACTIONS, stored procedures, and more) at a highly reasonable cost. The cost is going down every day.
One thing RDBMS does not handle well is CHANGE.
Change of schema (both logical and physical), change of hardware, change of capacity.
Change is the CONSTANT.
Appdev: Changes are required not only during initial dev but also after going to long beta or production. Quick response to market feedback differentiates winners from losers.
Scalability: Many digital economy business applications (every business is a digital economy business) expect high throughput (> 100K gets/s) with millions of customers. The physics of servers dictates that we distribute the data across multiple nodes.
Consistent Perf: With high throughput comes high responsibility: consistently low-latency queries.
Reliability: It's not IF but WHEN some of the hardware goes down, be it disk, memory, or server. Your business can't be at the mercy of hardware.
Java and JDBC have had a very long and successful relationship.
JDBC provides Java connectivity to relational databases via SQL.
All of the application servers support using JDBC for persistence.
Many of the object-relational mappers like Hibernate, and frameworks like iBATIS and Spring Data, support database access via SQL and JDBC.
NoSQL, although generally read as "Not Only SQL," typically refers to databases that lack SQL.
Implementing a generally accepted subset of SQL for a flexible data model on a distributed system IS HARD.
Once the SQL language and transactions became optional, a flurry of databases was created, using distinct approaches for common use cases.
Key-value stores simply provide quick access to data for a given KEY. Many of the databases in this group provide additional functionality compared to memcached.
Wide-column databases can store a large number of arbitrary columns in each row.
Graph databases model data as nodes and the relationships between them.
Document databases aggregate data into a hierarchical structure.
JSON is a means to that end. Document databases provide flexible schema, built-in data types, rich structure, and implicit relationships using JSON.
With many of these databases, simple operations like GET and PUT are done via a very simple API.
Anything more complicated requires long, client-side programs. Let's see an example.
Let’s look at modeling Customer data.
Rich Structure
In a relational database, this customer data would be stored in five normalized tables.
Each time you want to construct a customer object, you JOIN the data in these tables;
Each time you persist, you find the appropriate rows in relevant tables and insert/update.
Relationship
Enforcement is via referential constraints. Objects are constructed by JOINS, EACH time.
Value Evolution
Additional values of the SAME TYPE (e.g., an additional phone or an additional address) are managed by additional ROWS in one of the tables.
Customer:contacts will have a 1:n relationship.
Structure Evolution:
This is the most difficult part. Changing the structure is difficult, both within a table and across tables.
While you can do this via ALTER TABLE, it requires downtime, migration, and application versioning.
This is one of the problems document databases try to solve by representing data in JSON.
Let’s see how to represent customer data in JSON.
So, finally, you have a JSON document that represents a CUSTOMER.
In a single JSON document, relationship between the data is implicit by use of sub-structures and arrays and arrays of sub-structures.
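A minimal sketch of such a document (the field names are illustrative, not from the deck), written through N1QL:

INSERT INTO customer (KEY, VALUE)
VALUES ("customer::1001", {
  "name": "Jane Smith",
  "status": "gold",
  "contacts": {
    "phones": ["650-555-1212", "408-555-3434"],
    "emails": ["jane@example.com"]
  },
  "addresses": [
    { "type": "home", "city": "Mountain View", "zip": "94040" },
    { "type": "work", "city": "San Jose", "zip": "95110" }
  ]
});

The arrays (phones, addresses) hold the repeating values that would otherwise become extra rows in child tables, and the nested objects carry the relationships implicitly.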
The SQL language operates on sets of relations (tables) and then projects another relation (table).
N1QL is a SQL-based language designed for JSON.
The input is a set of related sets of JSON documents.
It operates on these documents and produces another set of JSON documents.
All of the SQL operations (select, join, and project) are supported.
All of the common statements are supported: SELECT, INSERT, UPDATE, DELETE and MERGE.
Additionally, N1QL extends the language to support:
-- Fully address, access and modify any part of the JSON document.
-- Handle the full flexible schema where schema is self-described by each document.
-- key-value pairs could be missing, data types could change and structure could change!
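A brief, hypothetical illustration of addressing nested data and tolerating absent fields (the customer keyspace and its fields are assumptions):

SELECT c.name,
       c.addresses[0].city AS first_city,
       IFMISSING(c.loyaltyTier, "none") AS tier
FROM customer c
WHERE c.status = "gold"
  AND ANY a IN c.addresses SATISFIES a.zip = "94040" END;

IFMISSING and the IS MISSING / IS NOT MISSING predicates let a query cope with documents in which a key-value pair is simply absent rather than null.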
Nielsen session and use case
Mention nested sourcing?
This JSR is to provide an update for the Java API for JSON Processing Specification (JSON-P).
There are 3 goals for this JSR.
* Support for JSON Pointer and JSON Patch
After the release of JSR 353, the specifications for JSON Pointer (http://tools.ietf.org/html/rfc6901) and JSON Patch (http://tools.ietf.org/html/rfc6902) have been released. This JSR will update JSON-P to provide support for these specifications.
* Add editing/transformation operations to JSON object model
The current JSON object model provides immutable JSON objects and arrays. It is useful to provide editing or transformation functionalities to the basic JSON processing.
For a JSON object, an operation will be added to remove a name/value pair from the object. For a JSON array, operations will be added to insert, replace, or remove an entry of the array.
The builder pattern will be used to handle these operations: a builder is created with the initial JSON object or array, the operations are carried out by the builder, and the edited object or array is finally converted to an immutable JSON object or array.
* Update the API to work with Java SE 8
Queries on JSON object model are currently possible, using Java SE 8's stream operations and lambda expressions. However, to make them truly useful and convenient, there is a need for Collectors that return JSON objects or arrays instead of Maps or Lists. This JSR will define some common and useful Collectors.
It may also be useful to turn JSON parser events into a stream of events, so that SE 8's stream operations can be used.