Gone are the days of a single enterprise database – typically an Oracle RDBMS – that holds all data in a strictly normalized form. We work with many more types of data (big and fast, structured and unstructured) that we use in various ways. The relational model and ACID transactions are not applicable to all of them. Nor does every scenario require the very latest, freshest data. We will continue to see an increase in specialized data stores that cater to specific needs and specific scenarios.
This session combines a presentation and a demonstration on the various dimensions and use cases of using data and data stores in various ways – while ensuring the appropriate (!) levels of freshness, integrity and performance. Key takeaway: what you, as an architect, should know about the various types of data in enterprise IT and how to store/manage/query/manipulate them; what products and technologies are at your disposal; and how you can make these work together for a consistent (enough) overall data presentation. How are upcoming architectural patterns such as CQRS (Command Query Responsibility Segregation), event sourcing and microservices influencing the way we handle data in the enterprise? Some of the technologies discussed: MongoDB, MySQL, Neo4J, Apache Kafka, Redis, Elasticsearch, Hadoop/Spark and Oracle Data Hub Cloud (based on Apache Cassandra) – used locally, in containers and in the cloud. Additionally, we will discuss data replication scenarios.
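To make the CQRS pattern mentioned above concrete, here is a minimal Python sketch of the idea: commands append immutable events to a Kafka topic (the write side), and a separate projection folds those events into a Redis read model (the query side). It assumes a local Kafka broker and Redis server; topic, group and key names are invented for illustration.

```python
import json

import redis
from confluent_kafka import Consumer, Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})
read_model = redis.Redis(host="localhost", port=6379, decode_responses=True)

def handle_command(order_id: str, status: str) -> None:
    """Write side: record the fact as an immutable event on the log."""
    event = {"orderId": order_id, "status": status}
    producer.produce("order-events", key=order_id, value=json.dumps(event))
    producer.flush()

def project_events() -> None:
    """Query side: fold events into a denormalized, fast-to-read model."""
    consumer = Consumer({
        "bootstrap.servers": "localhost:9092",
        "group.id": "order-projection",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["order-events"])
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        event = json.loads(msg.value())
        read_model.hset(f"order:{event['orderId']}", "status", event["status"])
```

Because reads never touch the write path, each side can be scaled and stored in the technology that suits it best – exactly the trade-off the session explores.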
The Art of Intelligence – A Practical Introduction Machine Learning for Oracl... – Lucas Jellema
Our technology has become smart and fast enough to make predictions and come up with recommendations in near real time. Machine Learning is the art of deriving models from our Big Data collections – harvesting historic patterns and trends – and applying those models to new data in order to respond rapidly and adequately to that data. This presentation explains and demonstrates, in simple, straightforward terms and using easy-to-understand practical examples, what Machine Learning really is and how it can be useful in our world of applications, integrations and databases. Hadoop and Spark, real-time and streaming analytics, Watson and Cloud Datalab, Jupyter Notebooks, Oracle Machine Learning CS and the Citizen Data Scientist all make their appearance, as does SQL.
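As a minimal illustration of the "derive a model from historical data, then apply it to new data" idea, here is a hedged scikit-learn sketch; the features and data are invented for illustration.

```python
from sklearn.linear_model import LogisticRegression

# Historical observations: [order_value, items_in_basket] -> churned (1) or not (0)
X_train = [[120.0, 3], [15.0, 1], [250.0, 7], [40.0, 2]]
y_train = [0, 1, 0, 1]

model = LogisticRegression()
model.fit(X_train, y_train)  # harvest the historic pattern

# Respond to fresh data in near real time with the learned model
new_observation = [[80.0, 2]]
print(model.predict(new_observation))        # predicted class
print(model.predict_proba(new_observation))  # class probabilities
```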
Integrating Applications and Data (with Oracle PaaS Cloud) - Oracle Cloud Day... – Lucas Jellema
Integration is a challenge that has become even more urgent with the move to the cloud that all organizations are making or are about to make. Whether SaaS applications have to be enabled (linked to other SaaS applications or to custom apps), IoT is used to integrate the physical world into enterprise IT, or microservices on premises have to collaborate with microservices in the cloud – integration is at the heart of enterprise IT. This presentation discusses the move to the cloud, a number of common integration use cases and the key components in the Oracle PaaS portfolio for tackling these challenges. The presentation was delivered at Oracle Cloud Day 2017 in Nieuwegein, The Netherlands.
Framework and Product Comparison for Big Data Log Analytics and ITOA – Kai Wähner
IT systems and applications generate more and more machine data due to millions of mobile devices, the Internet of Things, social network users, and other emerging technologies. However, organizations experience challenges when monitoring and managing their IT systems and technology infrastructure. They struggle with network and server monitoring and troubleshooting, security analysis, custom application monitoring and debugging, compliance standards, and more.
This session discusses how to solve the challenges of analyzing terabytes and more of different log data to enable the “digital business” – a term defined by Gartner and others to express that IT is not just a tool to enable a business; IT is the business.
The main part of the session compares different solutions for operational intelligence and log analytics to create the “digital business”, such as Splunk, TIBCO LogLogic and the open source ELK stack (Elasticsearch, Logstash, Kibana).
A common use case is demonstrated in a live demo: monitoring, analyzing and correlating a complex e-commerce transaction running through different custom applications, such as a Java EE web application, integration middleware and analytics processes.
The session closes by explaining how the discussed solutions differ from Apache Hadoop, and how they can complement each other in a big data architecture.
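For a flavor of the kind of log analytics the session compares, here is a hedged sketch that queries Elasticsearch (the storage and search layer of the ELK stack) for recent errors per application. It assumes the 8.x Python client and a local cluster; the index pattern and field names are invented.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

response = es.search(
    index="logs-*",
    query={"bool": {
        "must": [{"match": {"level": "ERROR"}}],
        "filter": [{"range": {"@timestamp": {"gte": "now-15m"}}}],
    }},
    aggs={"per_app": {"terms": {"field": "application.keyword"}}},
    size=0,  # we only want the aggregation, not the matching documents
)

for bucket in response["aggregations"]["per_app"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```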
Tapdata provides a smart data as a service platform that offers:
1) Real-time data collection and synchronization from various sources like databases, files, and streaming data.
2) Data modeling and governance capabilities like data validation, quality checks, and AI-assisted cataloging.
3) Scalable data storage across TBs to PBs of data using a distributed database.
4) A code-less API publishing module to quickly build and deploy RESTful APIs for internal and external users.
Driving Business Outcomes with a Modern Data Architecture - Level 100 – Amazon Web Services
Your business data contains critical information about customer behaviors, operational decisions, and many factors that have financial impact on your organisation. Increasingly though, this data is too big, too fast, and too complex for existing systems to handle. AWS Data and Analytics services are designed to ingest, store, analyse, and consume information at record-breaking scale. In this session you will learn how these services work together to deliver business automation, enhance customer engagement and intelligence.
Speaker: Craig Stires, APAC Business Development - Big Data & Analytics, Amazon Web Services
Akmal Chaudhri - How to Build Streaming Data Applications: Evaluating the Top... – NoSQLmatters
Building applications on streaming data has its challenges. If you are trying to use frameworks such as Apache Spark or Storm to build applications, this presentation explains the advantages and disadvantages of each solution and how to choose the right tool for your next streaming data project. Building streaming data applications that can manage the massive quantities of data generated by mobile devices, M2M, sensors and other IoT devices is a big challenge that many organizations face today. Traditional tools, such as conventional database systems, do not have the capacity to ingest data, analyze it in real time, and make decisions. New technologies such as Apache Spark and Storm are now coming to the forefront as possible solutions for handling fast data streams. Typical technology choices fall into one of three categories: OLAP, OLTP, and stream-processing systems. Each of these solutions has its benefits, but some choices support streaming data and application development much better than others. Employing a solution that handles streaming data, provides state, ensures durability, and supports transactions and real-time decisions is key to benefiting from fast data. During this presentation you will learn:
- The difference between fast OLAP, stream-processing, and OLTP database solutions.
- The importance of state, real-time analytics and real-time decisions when building applications on streaming data.
- How streaming applications deliver more value when built on a super-fast in-memory SQL database.
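To ground the "state, durability, transactions and real-time decisions" argument, here is a minimal Python sketch of stateful per-event decisioning. SQLite's in-memory mode stands in for the distributed in-memory SQL engines the talk has in mind; the table and threshold are invented.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE device_state (device_id TEXT PRIMARY KEY, total REAL)")

def on_event(device_id: str, reading: float) -> bool:
    """Ingest one event, update state and decide in the same transaction."""
    with db:  # implicit BEGIN/COMMIT around the block
        db.execute(
            "INSERT INTO device_state VALUES (?, ?) "
            "ON CONFLICT(device_id) DO UPDATE SET total = total + excluded.total",
            (device_id, reading),
        )
        (total,) = db.execute(
            "SELECT total FROM device_state WHERE device_id = ?", (device_id,)
        ).fetchone()
    return total > 100.0  # e.g. alert once a running threshold is crossed

print(on_event("sensor-1", 60.0))  # False
print(on_event("sensor-1", 55.0))  # True: cumulative state drove the decision
```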
Building a Modern FinTech Big Data Infrastructure – Databricks
The cloud is now the first choice for large-scale analytics, but organizations that have sunk investment into Hadoop on-premises are also challenged with maintaining operations. This can make a move to modern analytics platforms like Spark difficult or impossible. Learn about innovations for large-scale migration that can take full advantage of cloud-based analytics without disrupting operations.
This document discusses EsgynDB, a distributed transaction-processing database engine that runs natively on Hadoop. It was created by the same engineers who invented massively parallel processing and NonStop SQL databases decades ago. The document outlines key benefits of EsgynDB, such as enabling real-time business performance reporting on Hadoop, guaranteed ACID transactions, and a tenfold reduction in data lake operational analytics costs. It also provides an overview of Esgyn's history and technology.
WJAX 2013 Slides online: Big Data beyond Apache Hadoop - How to integrate ALL... – Kai Wähner
Big data represents a significant paradigm shift in enterprise technology. It radically changes the nature of the data management profession by introducing new concerns about the volume, velocity and variety of corporate data. Apache Hadoop is the open source de facto standard for implementing big data solutions on the Java platform. Hadoop consists of its kernel, MapReduce, and the Hadoop Distributed File System (HDFS). A challenging task is to send all data to Hadoop for processing and storage (and then get it back to your application later), because in practice data comes from many different applications (SAP, Salesforce, Siebel, etc.) and databases (file, SQL, NoSQL), uses different technologies and concepts for communication (e.g. HTTP, FTP, RMI, JMS), and comes in different data formats such as CSV, XML, binary data, or other alternatives. This session shows different open source frameworks and products that solve this challenging task. Learn how to use every thinkable kind of data with Hadoop – without piles of complex or redundant boilerplate code.
IBM z Analytics provides machine learning capabilities for enterprise data on IBM Z mainframes. The presentation discusses machine learning concepts and how IBM solutions apply machine learning to real-world problems. It demonstrates how to build machine learning models using Spark on transactional data from IBM Z, and how trained models can be deployed on IBM Z to power applications in real-time. IBM offers multiple options for machine learning including on cloud, on-premises, and optimized for IBM Z to best suit enterprise needs.
Driving Business Transformation with Real-Time Analytics Using Apache Kafka a... – confluent
This document provides an overview of a webinar on driving business transformation with real-time analytics using Apache Kafka and KSQL. The webinar features presentations from Nick Dearden of Confluent, John Thuma of Arcadia Data, and Thomas Clarke of RCG Global Services. It discusses how Kafka and KSQL can be used together to enable real-time data processing and analytics. It also highlights how Arcadia Data provides a BI tool for KSQL that allows for easy drag-and-drop dashboarding on streaming data. RCG then discusses its approach to digital transformation and data architecture services. The webinar concludes with a Q&A section.
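For readers unfamiliar with KSQL, here is a hedged sketch of what "SQL on Kafka" looks like in practice: submitting a statement to ksqlDB's REST endpoint that derives a continuously updated stream from an existing topic. The endpoint, stream and column names are assumptions for illustration.

```python
import requests

statement = """
  CREATE STREAM big_orders AS
    SELECT order_id, amount
    FROM orders
    WHERE amount > 1000
    EMIT CHANGES;
"""

# ksqlDB exposes a /ksql endpoint for DDL/DML statements
response = requests.post(
    "http://localhost:8088/ksql",
    json={"ksql": statement, "streamsProperties": {}},
)
print(response.status_code, response.json())
```

Once created, big_orders is itself a Kafka-backed stream that a BI front end such as the Arcadia Data tool described above can visualize as events arrive.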
Kappa vs Lambda Architectures and Technology Comparison – Kai Wähner
Real-time data beats slow data. That’s true for almost every use case. Nevertheless, enterprise architects build new infrastructures with the Lambda architecture that includes separate batch and real-time layers.
This video explores why a single real-time pipeline, called the Kappa architecture, is the better fit for many enterprise architectures. Real-world examples from companies such as Disney, Shopify, Uber, and Twitter illustrate the benefits of Kappa, but also show how batch processing fits into this picture without the need for a Lambda architecture.
The main focus of the discussion is on Apache Kafka (and its ecosystem) as the de facto standard for event streaming to process data in motion (the key concept of Kappa), but the video also compares various technologies and vendors such as Confluent, Cloudera, IBM Red Hat, Apache Flink, Apache Pulsar, AWS Kinesis, Amazon MSK, Azure Event Hubs, Google Pub/Sub, and more.
Video recording of this presentation:
https://youtu.be/j7D29eyysDw
Further reading:
https://www.kai-waehner.de/blog/2021/09/23/real-time-kappa-architecture-mainstream-replacing-batch-lambda/
https://www.kai-waehner.de/blog/2021/04/20/comparison-open-source-apache-kafka-vs-confluent-cloudera-red-hat-amazon-msk-cloud/
https://www.kai-waehner.de/blog/2021/05/09/kafka-api-de-facto-standard-event-streaming-like-amazon-s3-object-storage/
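A minimal sketch of the Kappa idea in Python: one streaming job covers both the real-time path and historical reprocessing, because the same retained Kafka log can simply be replayed from offset zero instead of maintaining a separate batch layer. The broker address and topic names are invented.

```python
import json

from confluent_kafka import Consumer, Producer

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "kappa-pipeline",
    # Reprocessing without a batch layer: when no committed offset exists,
    # start from the beginning of the retained log and replay history.
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "localhost:9092"})
consumer.subscribe(["clickstream-raw"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    enriched = {**event, "processed": True}  # stand-in for real enrichment logic
    producer.produce("clickstream-enriched", value=json.dumps(enriched))
    producer.poll(0)  # serve delivery callbacks without blocking
```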
Informatica Cloud Services deliver purpose-built data integration cloud applications that allow business users to integrate data across cloud-based applications and on-premises systems and databases. Informatica Cloud Services address specific business processes (customer/product master synchronization, opportunity-to-order, etc.) and point-to-point data integration (e.g. Salesforce.com to on-premises endpoints).
R, Spark, TensorFlow, H2O.ai Applied to Streaming Analytics – Kai Wähner
Slides from my talk at Codemotion Rome in March 2017: development of analytic machine learning / deep learning models with R, Apache Spark ML, TensorFlow, H2O.ai, RapidMiner, KNIME and TIBCO Spotfire, and deployment to real-time event processing / stream processing / streaming analytics engines such as Apache Spark Streaming, Apache Flink, Kafka Streams and TIBCO StreamBase.
Big Data Analytics: Reference Architectures and Case Studies by Serhiy Haziye... – SoftServe
BI architecture drivers have to change to satisfy new requirements in format, volume, latency, hosting, analysis, reporting, and visualization. In this presentation, delivered at the 2014 SATURN conference, SoftServe's Serhiy and Olha showcased a number of reference architectures that address these challenges and speed up the design and implementation process, making it more predictable and economical:
- Traditional architecture based on an RDBMS data warehouse but modernized with column-based storage to handle a high load and capacity
- NoSQL-based architectures that address Big Data batch and stream-based processing and use popular NoSQL and complex event-processing solutions
- Hybrid architecture that combines traditional and NoSQL approaches to achieve completeness that would not be possible with either alone
The architectures are accompanied by real-life projects and case studies that the presenters have performed for multiple companies, including Fortune 100 and start-ups.
This document summarizes a webinar about using Informatica Cloud to load big data into AWS services like Amazon Redshift for analytics. It discusses how Informatica Cloud can help consolidate and analyze customer data from multiple sources for a company called UBM to improve customer insights. The webinar also provides an example of how UBM used Informatica Cloud and Redshift to better understand customer behaviors and identify potential event attendees through analytics.
Using Hadoop for Cognitive Analytics discusses using Hadoop and external data sources for cognitive analytics. The document outlines solution architectures that integrate external and customer-specific metrics to improve decision making. Microservices are used for data ingestion and curation from various sources into Hadoop for storage and analytics. This allows combining business metrics with hyperlocal data at precise locations to provide insights.
eBay has one of the largest and most active data platforms in the world. The presentation discusses eBay's business, strategy, and key trends driving commerce. It then provides details on eBay's big data platform, including the large volume of data collected daily and how it is captured, transformed and synthesized to provide actionable insights. The presentation concludes by discussing how eBay has evolved to a governed self-service model for analytics to better organize and provide a consistent experience for its diverse user community.
The document discusses the challenges of maintaining separate data lake and data warehouse systems. It notes that businesses need to integrate these areas to overcome issues like managing diverse workloads, providing consistent security and user management across use cases, and enabling data sharing between data science and business analytics teams. An integrated system is needed that can support both structured analytics and big data/semi-structured workloads from a single platform.
How to Apply Big Data Analytics and Machine Learning to Real Time Processing ... – Codemotion
The world gets more and more connected every year due to Mobile, Cloud and the Internet of Things. "Big Data" is currently a big hype: large amounts of historical data are stored in Hadoop to find patterns, e.g. for predictive maintenance or cross-selling. But how do you increase revenue or reduce risk in new transactions? "Fast Data" via stream processing is the solution for embedding patterns into future actions in real time. This session discusses how machine learning and analytic models built with R, Spark MLlib, H2O, etc. can be integrated into real-time event processing. A live demo concludes the session.
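A hedged sketch of the "embed the model into real-time event processing" step: load a model that was trained offline (for example with scikit-learn over historical data in Hadoop) and score every incoming Kafka event. The model file, topic and event fields are assumptions.

```python
import json

import joblib
from confluent_kafka import Consumer

model = joblib.load("churn_model.joblib")  # trained and exported elsewhere

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "model-scoring",
    "auto.offset.reset": "latest",  # only score new transactions
})
consumer.subscribe(["transactions"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    features = [[event["amount"], event["items"]]]
    if model.predict(features)[0] == 1:  # the embedded pattern fires
        print(f"risk detected for transaction {event['id']}")
```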
Analytics in a Day Ft. Synapse Virtual Workshop – CCG
Say goodbye to data silos! Analytics in a Day will simplify and accelerate your journey towards the modern data warehouse. Join CCG and Microsoft for a half-day virtual workshop, hosted by James McAuliffe.
In this presentation I review the architecture of an AI application for IoT environments.
Since specific modeling and training aspects also have an impact on the final implementation of an enterprise-ready solution, such solutions become very complex rather quickly.
The complexity of AI systems for IoT is a big challenge – thus, I want to break this complexity down into particular views, which emphasize the individual but still interconnected aspects more clearly.
Customer Event Hub – a modern Customer 360° view with DataStax Enterprise (DSE) – Guido Schmutz
Today, companies use various channels to communicate with their customers. As a consequence, a lot of data is created, more and more of it outside the traditional IT infrastructure of an enterprise. This data often does not have a common format, and it is continuously created in ever-increasing volume. With the Internet of Things (IoT) and its sensors, the volume as well as the velocity of data gets even more extreme.
To achieve a complete and consistent view of a customer, all this customer-related information has to be included in a 360-degree view in a real-time or near-real-time fashion. By that, the Customer Hub becomes the Customer Event Hub. It constantly shows the actual view of a customer across all interaction channels and provides an enterprise the basis for a substantial and effective customer relationship.
This presentation shows the value of such a platform and how it can be implemented using DataStax Enterprise as the backend.
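A minimal sketch of what the event-store backend could look like with the DataStax Python driver (DataStax Enterprise is Cassandra-compatible at this API level). The keyspace and table are assumptions; in a real Customer Event Hub the customer id would come from master data rather than being generated on the fly.

```python
import uuid
from datetime import datetime, timezone

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("customer360")  # keyspace assumed to exist

customer_id = uuid.uuid4()  # stand-in for the customer's known identifier

# Record one interaction on one channel
session.execute(
    "INSERT INTO customer_events (customer_id, event_time, channel, payload) "
    "VALUES (%s, %s, %s, %s)",
    (customer_id, datetime.now(timezone.utc), "web", '{"page": "home"}'),
)

# Read the 360-degree view: every interaction for this customer
rows = session.execute(
    "SELECT event_time, channel, payload FROM customer_events "
    "WHERE customer_id = %s",
    (customer_id,),
)
for row in rows:
    print(row.event_time, row.channel, row.payload)
```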
IoT Architectures for Apache Kafka and Event Streaming - Industry 4.0, Digita... – Kai Wähner
The Internet of Things (IoT) is gaining more and more traction as valuable use cases come to light. Whether you are in healthcare, telecommunications, manufacturing, banking or retail, to name a few industries, there is one key challenge: the integration of backend IoT data logs and applications, business services and cloud services to process the data in real time and at scale.
In this talk, we share how Kafka has become the leading technology used throughout the business to provide real-time event streaming. Explore real-life use cases of Kafka Connect, Kafka Streams and KSQL, independent of the deployment – be it on a private or public cloud, on premises or at the edge:
- Audi – connected car infrastructure
- Robert Bosch Power Tools – track and trace of devices and people at construction sites
- Deutsche Bahn – Customer 360 for train timetable updates
- E.ON – IoT streaming platform to integrate and build smart home, smart building and smart grid infrastructures
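To make the edge-to-cluster integration concrete, here is a hedged Python sketch that bridges device readings from an MQTT broker into a Kafka topic. In production this is exactly the job of Kafka Connect (named in the talk); broker addresses and topic names are invented, and the paho-mqtt 1.x callback API is assumed.

```python
import paho.mqtt.client as mqtt
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})

def on_message(client, userdata, message):
    # Forward each sensor reading, keyed by its MQTT topic (the device path)
    producer.produce("iot-sensor-data", key=message.topic, value=message.payload)
    producer.poll(0)  # serve delivery callbacks without blocking

client = mqtt.Client()  # paho-mqtt 1.x style client
client.on_message = on_message
client.connect("localhost", 1883)
client.subscribe("factory/+/temperature")  # all machines' temperature feeds
client.loop_forever()
```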
This document summarizes Manulife's global data strategy and data operations in Melbourne. It discusses establishing a balanced hub-and-spoke model to provide global consistency, talent, and dynamics. The data offices follow the business roadmap and have engineering, governance, and analytics functions. The enterprise data lake setup includes three physical instances across regions with identical technology stacks for operations, preview, validation, and DR. It ingests and stores various data sources and enables advanced analysis, digital connection of systems, and automated reporting use cases across regions.
50 Shades of Data - how, when and why Big, Relational, NoSQL, Elastic, Event, CQRS... – Lucas Jellema
Data has been and will remain the key ingredient of enterprise IT. What is changing is the nature, scope and volume of data and the place of data in the IT architecture. Big Data, unstructured data and non-relational data stored on Hadoop, in NoSQL databases and held in Elasticsearch indexes, caches and message queues complement data in the enterprise RDBMS. Emerging patterns such as microservices that contain their own data, BASE, CQRS and Event Sourcing have changed the way we store, share and govern data. This session introduces patterns, technologies, trends and hypes around storing, processing and retrieving data, using products such as MongoDB, MySQL, Kafka, Redis, Elasticsearch and Hadoop/Spark – locally, in containers and in the cloud.
Key takeaway: what an application architect and a developer should know about the various types of data in enterprise IT and how to store/manage/query/manipulate them; what products and technologies are at your disposal; and how you can make these work together for a consistent (enough) overall data presentation.
These are the slides for the presentation as well as all the demos I prepared for the Devoxx Morocco event in November 2017. The deck includes 150+ slides showing the setup of the demo environment (Oracle Public Cloud DBaaS, Event Hub, Application Container, Application Cache, Kubernetes and Kafka) and the detailed demo steps that show Microservices with Data Bounded Context, Event based choreography and CQRS in action.
This document provides an overview of a presentation on big data and data science. It covers:
1. An introduction to key concepts in big data including architecture, Hadoop, sources of data, and definitions.
2. Details on common big data reference architectures from companies like IBM, Oracle, SAP, and open source technologies.
3. A discussion of how data science is disrupting various industries and the characteristics of firms using data science successfully.
4. Descriptions of machine learning techniques like segmentation and forecasting, and the overall reference architecture for machine learning involving data storage, signal extraction, and responding to insights (a minimal segmentation sketch follows below).
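As a hedged illustration of the segmentation technique named in point 4, here is a tiny k-means sketch with scikit-learn; the spend/frequency features and data are invented.

```python
from sklearn.cluster import KMeans

# [annual_spend, purchases_per_year] per customer (illustrative data)
customers = [[500, 4], [520, 5], [9000, 40], [8800, 35], [60, 1], [75, 2]]

segmentation = KMeans(n_clusters=3, n_init=10, random_state=42).fit(customers)
print(segmentation.labels_)           # segment assigned to each customer
print(segmentation.cluster_centers_)  # the "profile" of each segment
```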
A Winning Strategy for the Digital Economy – Eric Kavanagh
The speed of innovation today creates tremendous opportunities for some, existential threats for others. Companies that win create their own success by leveraging modern data platforms. While architectures vary, the foundation is often in-memory, and the latency is real-time. Register for this Special Edition of The Briefing Room to hear veteran Analyst Dr. Robin Bloor explain how today's data platforms enable the modern enterprise in groundbreaking ways. He'll be briefed by Chris Hallenbeck of SAP who will demonstrate how forward-looking companies are leveraging real-time data platforms to achieve operational excellence, make decisions faster, and find new ways to innovate.
This document discusses how a company called EarEcstasy modernized their data architecture to enable better business insights and customer experiences. It describes their journey from a traditional B2B model to launching smart earbuds directly to consumers. This required answering new types of questions quickly, so EarEcstasy looked to build a modern data architecture on AWS. The summary outlines three key outcomes: 1) Modernizing and consolidating their data infrastructure, 2) Innovating for new revenues through personalization, and 3) Enabling real-time customer engagement.
Modern Data Architectures for Business Insights at Scale – Amazon Web Services
This document discusses modern data architectures for business insights at scale. It begins by explaining how businesses can gain insights from analyzing customer data and logs. It then discusses the challenges posed by big data in terms of increasing volume, velocity, and variety of data. The document outlines several AWS services that can be used to ingest, store, process, and analyze data at different speeds (batch, real-time, interactive). It provides examples of how companies like Redfin, Nordstrom, and Euclid leverage AWS to gain insights from customer data. The document emphasizes experimenting with available data and AWS services to deliver business outcomes and continuous differentiation.
Big Data and Analytics on Amazon Web Services: Building A Business-Friendly P... – Amazon Web Services
If you are crafting a better customer experience, automating your business, or modernizing your systems, you are likely finding that your data and analytics platform is absolutely critical to your success. In this session, we will look at how customers are building on the managed services from Amazon Web Services to meet the needs of the business. Patterns we see gaining popularity are near-real-time engagement with customers over mobile, combining and analyzing unstructured consumer behavior with structured transactional data, and managing spiky data workloads. See how our customers use our managed, elastic, secure, and highly available services to change what is possible.
Craig Stires, Head of Big Data and Analytics, Amazon Web Services, APAC
The document outlines a reference architecture for using big data and analytics to address challenges in areas like fraud detection, risk reduction, compliance, and customer churn prevention for financial institutions. It describes components like streaming data ingestion, storage, processing, analytics and machine learning, and presentation. Specific applications discussed include money laundering prevention, using techniques like decision trees, cluster analysis, and pattern detection on data from multiple sources stored in Azure data services.
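One of the techniques named above, as a hedged sketch: a tiny decision-tree classifier over invented transaction features, of the kind a money-laundering detection pipeline might train at far larger scale.

```python
from sklearn.tree import DecisionTreeClassifier

# [amount, cross_border (0/1), accounts_touched] -> suspicious (1) or not (0)
X = [[100, 0, 1], [9800, 1, 6], [50, 0, 1], [12000, 1, 9], [300, 0, 2]]
y = [0, 1, 0, 1, 0]

tree = DecisionTreeClassifier(max_depth=3).fit(X, y)
print(tree.predict([[7500, 1, 5]]))  # flag a new transaction for review
```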
A Data Lake allows an organisation to store all of its data, structured and unstructured, in one centralised repository. Since data can be stored as-is, there is no need to convert it to a predefined schema, and you no longer need to know beforehand what questions you want to ask of your data. In this session we will explore the architecture of a Data Lake on AWS and cover topics such as storage, processing and security.
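A minimal boto3 sketch of the "store as-is, decide on schema later" idea: land a raw record in an S3-based lake under a date partition, then list what is there. The bucket name and key layout are assumptions; credentials come from the standard AWS configuration chain.

```python
import boto3

s3 = boto3.client("s3")

# Land raw clickstream data in the lake exactly as it arrived
s3.put_object(
    Bucket="my-data-lake",
    Key="raw/clickstream/dt=2018-02-14/clicks.json",
    Body=b'{"user": "u-17", "page": "home"}\n',
)

# Later, discover what is available under a partition prefix
listing = s3.list_objects_v2(Bucket="my-data-lake", Prefix="raw/clickstream/")
for obj in listing.get("Contents", []):
    print(obj["Key"], obj["Size"])
```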
The document provides an overview of topics to be covered in a 60 minute session on big data. It will discuss big data architecture, Hadoop, data science career opportunities, and include a Q&A. The presenter is introduced as a big data entrepreneur with 14 years of experience architecting distributed data systems. Key aspects of big data are defined, including where data is generated from various sources. Different data types and challenges of structured vs unstructured data are outlined. The architecture of big data systems is depicted, including components like Hadoop, data warehouses, data marts and more. Examples of big data in various industries are given to showcase the growth of data.
Richard Vermillion, CEO of After, Inc. and Fulcrum Analytics, Inc., discusses data lakes and their value in supporting the warranty and extended service plan chain.
Hadoop 2.0: YARN to Further Optimize Data Processing – Hortonworks
Data is increasing exponentially in both types and volumes, creating opportunities for businesses. Watch this video and learn from three Big Data experts: John Kreisa, VP Strategic Marketing at Hortonworks; Imad Birouty, Director of Technical Product Marketing at Teradata; and John Haddad, Senior Director of Product Marketing at Informatica.
Multiple systems are needed to exploit the variety and volume of data sources, including a flexible data repository. Learn more about:
- Apache Hadoop 2 and YARN
- Data Lakes
- Intelligent data management layers needed to manage metadata and usage patterns as well as track consumption across these data platforms.
Amazon Kinesis is a platform for streaming data ingestion, processing, and analytics on AWS. The presentation discusses three Amazon Kinesis services - Kinesis Streams, Kinesis Firehose, and Kinesis Analytics. It provides an overview of each service and examples of how customers use streaming data and these services for applications like IoT, online gaming, advertising, and financial services. It also includes a demo of building a serverless IoT analytics solution on AWS using these streaming data services.
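The ingestion step of such a pipeline, as a hedged boto3 sketch: push one device reading into a Kinesis stream. The stream name is an assumption; a downstream consumer (Kinesis Analytics, a Lambda function, etc.) would read and process the records.

```python
import json

import boto3

kinesis = boto3.client("kinesis")

reading = {"device": "thermostat-42", "temperature": 21.5}
kinesis.put_record(
    StreamName="iot-telemetry",
    Data=json.dumps(reading).encode("utf-8"),
    PartitionKey=reading["device"],  # determines which shard gets the record
)
```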
Driving Business Insights with a Modern Data Architecture - AWS Summit SG 2017 – Amazon Web Services
Your customers probably want a better experience with your brand. Your different business teams want and need better insights in their decision making. Almost certainly, your finance and operations teams require this to happen at a fraction of the cost of traditional on-premises options. Modern data architectures on AWS help many of our best customers realize all of those goals. Your business data contains critical information about customer behaviors, operational decisions, and many factors that have financial impact on your organization. Increasingly, this data sits beyond your transactional systems, and is too big, too fast, and too complex for existing systems to handle. AWS Data and Analytics services are designed from our customers' requirements to ingest, store, analyze, and consume information at record-breaking scale. In this session you will learn how these services work together to deliver business automation, enhance customer engagement and intelligence.
This document discusses streaming data and real-time analytics. It covers:
1) Streaming data is characterized by its low latency, continuous, ordered, and incremental nature with high volumes. Common uses include IoT, log analytics, and smart home/city applications.
2) ABN AMRO developed a Customer Event Store to handle growing volumes of customer event data from various sources in real-time, replacing their batch-driven data warehouse. It uses services like Kinesis and Step Functions.
3) Technologies like Kinesis, Kinesis Data Firehose, Kinesis Data Analytics and Flink allow ingesting, processing, and analyzing streaming data in real-time for applications like predictive analytics and recommendations.
A 15-slide presentation displaying the use cases, features and benefits of the 4th-generation Kingland Platform. The platform delivers enterprise data management solutions for some of the world's largest organizations. Powered by an artificial intelligence suite, the platform helps organizations avoid costs, accelerate projects, and improve how they use data to make business decisions.
Shawn Gandhi, head of Solutions Architecture for AWS Canada, takes us on a journey through Big Data and the different strategies and services available to implementers and practitioners.
Kaizentric is a data analytics firm based in Chennai, India. Statistical analysis is performed on a well-built, client-specific data warehouse, supported by data mining.
Take Action: The New Reality of Data-Driven Business – Inside Analysis
The Briefing Room with Dr. Robin Bloor and WebAction
Live Webcast on July 23, 2014
Watch the archive:
https://bloorgroup.webex.com/bloorgroup/lsr.php?RCID=360d371d3a49ad256942f55350aa0a8b
The waiting used to be the hardest part, but not anymore. Today’s cutting-edge enterprises can seize opportunities faster than ever, thanks to an array of technologies that enable real-time responsiveness across the spectrum of business processes. Early adopters are solving critical business challenges by enabling the rapid-fire design, development and production of very specific applications. Functionality can range from improved customer engagement to dynamic machine-to-machine interactions.
Register for this episode of The Briefing Room to learn from veteran Analyst Dr. Robin Bloor, who will tout a new era in data-driven organizations, and why a data flow architecture will soon be critical for industry leaders. He’ll be briefed by Sami Akbay of WebAction, who will showcase his company’s real-time data management platform, which combines all the component parts needed to access, process and leverage data big and small. He’ll explain how this new approach can provide game-changing power to organizations of all types and sizes.
Visit InsideAnalysis.com for more information.
Analytical Systems Evolution: From Excel to Big Data Platforms and Data Lakes – Provectus
Maxim Tereschenko (BigData Lead, Provectus) with the talk "Analytical Systems Evolution - From Excel to Big Data Platforms and Data Lakes".
Description: Over the last ten years, analytical systems have changed dramatically. From Excel and data warehouses, we came to Big Data platforms and Data Lakes. It is no longer a fantasy to communicate with an analytical system by voice or to wander in 3D glasses among visualizations of the data. In this talk, I want to follow this evolution, identify its main trends and speculate about the future.
Similar to 50 Shades of Data - Dutch Oracle Architects Platform (February 2018)
Introduction to web application development with Vue (for absolute beginners)... – Lucas Jellema
In this slide deck I show you how you can easily and quickly create quite rich web applications with Vue 3 – without having to study complex concepts or understand many technical details. I have only recently learned how to work with Vue 3 myself, and now is the best time for me to share my learning experience (and my enthusiasm) with you. I know what I found essential to understand, what got me most excited, and what was a little bit hard to grasp in these early steps. I believe that I can present my steps and guide you to experience the same fun and have a similarly gratifying experience. I am not an expert in this subject – I have barely learned how to walk, and that is why I can help you with these first steps with Vue.
In this deck, I do not explain how Vue works internally – I do not really know that myself yet. I will show you how to work with it and how to create web applications that are functional, appealing, fast and responsive.
The approach I am taking is straightforward:
• I will tell you a little bit about web development, browsers and reactive frameworks
• I will show the hello world of Vue applications
• I will explain about components and nesting, events, data binding and reactive behavior and demonstrate these concepts
• I will introduce Vue UI Component libraries – and with no effort at all we will launch our application to the next level – with rich components to explore, manipulate, visualize data collections
• We will publish the web application from our development environment to where the whole world could see it – using GitHub Pages
• As bonus topic – we discuss state management
At the end of this session you will be able to quickly create a simple yet rich web application with Vue 3, and you have a starting point to further evolve your skills with the many online resources. I am convinced that you will enjoy your newfound powers and the simplicity and power of Vue 3.
Note: a tutorial accompanies this slide deck - see https://github.com/lucasjellema/code-face-vue3-intro-reactiive-webapps-aug2023/blob/main/README.md
Making the Shift Left - Bringing Ops to Dev before bringing applications to p... – Lucas Jellema
The document discusses bringing operations considerations into the development process earlier, referred to as "shifting left." It advocates designing applications with operations in mind from the beginning. This includes understanding operational objectives, constraints, and service level agreements. Application telemetry and monitoring are also important to incorporate from the start. The document provides examples of how to implement operational practices like deployments, health checks, and incident response processes in a shifted left model where development and operations work more closely together.
Lightweight coding in powerful Cloud Development Environments (DigitalXchange...Lucas Jellema
The document discusses lightweight coding in powerful cloud development environments using Gitpod. It describes Gitpod as providing a preconfigured Linux development environment in the browser or on a local machine. The document outlines key Gitpod features such as open source project collaboration, its pricing (free for up to 50 hours per month), and benefits like clean environments and efficient resource usage. It also briefly mentions other tools like GitHub Codespaces.
Apache Superset - open source data exploration and visualization (Conclusion ...Lucas Jellema
Introducing Apache Superset - an open source platform for data exploration, visualization and analysis - co-starring Trino and Steampipe for providing SQL access to many non-SQL data sources.
CONNECTING THE REAL WORLD TO ENTERPRISE IT – HOW IoT DRIVES OUR ENERGY TRANSI...Lucas Jellema
Enterprise IT systems are deaf, blind and highly insensitive. They do not know what is going on in the outside world. Through Internet of Things technology, we provide eyes, ears and hands that allow enterprises to learn about and react in real time to events in the physical world. The energy transition at a major Dutch energy company (Eneco) is powered by IoT technology – to steer and sometimes curtail windmills and solar farms and to coordinate local energy production and trade. This session shows you how the physical world was connected to the customer portals and apps, asset management systems and Kafka platform through the Azure cloud based IoT Hub and Edge, digital twin, serverless functions, timeseries datastores and streaming data analysis. It is a story about technological innovation on top of existing foundations and of a vision for business and our society at large.
Help me move away from Oracle - or not?! (Oracle Community Tour EMEA - LVOUG...Lucas Jellema
I hear this aspiration from a growing number of organizations – sometimes as a quite literal question. This, however, is merely half of a wish: apparently, organizations want to quit one thing but have not yet stipulated what they desire instead. What is the objective being pursued here? Only to get rid of Oracle? It will become clear why you should give considerable thought to dropping Oracle, or any other vendor's technology, when you're not pleased with your current IT situation. You need to focus on the actual problems and objectives and define a suitable roadmap to fit your real needs. It turns out that the quest is usually for modernization and flexibility - and Oracle can very well be a part of that future.
Organizations with decades of investment in Oracle technology sometimes (and increasingly) express a wish to move away from Oracle. In this session, we will first explore where the desire to move away from Oracle might come from. Then we describe what the term Oracle represents – more than 2,000 products on all layers in the technology stack and in different business areas. Finally, we map out what the ‘moving away from’ consists of: defining where you ‘move to’ and subsequently actually going there.
It will become clear why you should give considerable thought about dropping Oracle, or any other vendors’ technology, when you’re not pleased with your current IT situation. You need to focus on the actual problems and objectives and define the suitable roadmap to fit your real needs. It turns out that the quest is usually for modernization and flexibility - and Oracle can very well be a part of that future.
Original storyline in this Medium Article: https://medium.com/real-vox/what-if-companies-say-help-me-move-away-from-oracle-ffbbc95afc4f
IoT - from prototype to enterprise platform (DigitalXchange 2022)Lucas Jellema
In 2019 the company started a small scale IoT project: smart meters in consumer homes, a cloud based IoT platform for device management, metrics collecting, monitoring and real time data processing. From the initial 12 devices and this single use case, the initiative has rapidly scaled to tens of thousands of devices - including entire wind parks and solar farms - and seven substantial business cases, not just for harvesting data but increasingly for real time actuation. The IoT Platform is feeding the brain at the heart of the enterprise - through an event streaming platform and an API platform. It supports complex operations with anomaly detection on metrics streams and device and communication monitoring. This session tells the story of the eye-catching business cases - what the business objectives and results are - and explains the journey since the start. It continues the story presented at DigitalXchange 2020 - discussing technical challenges and solutions as well as organizational aspects. Areas of particular interest: edge processing, data analytics and machine learning.
Who Wants to Become an IT Architect-A Look at the Bigger Picture - DigitalXch...Lucas Jellema
Pitch: The movie The Matrix made it clear: The Architect is powerful. How to be(come) an IT architect? What do you do, what do you need to know, is it fun and why? Using real world examples, core principles and useful tools, this session introduces the subtle art of designing and realizing flexible IT architectures. Taking a step back to get and create an overview, frequently asking why to get to the real intention, bringing aspects such as cost, scale, time, change and business strategy into the design, and bridging the gap between business owners, process managers and technical specialists – that is one way to define the responsibility of an IT architect. In this session, we will discuss what is expected of the architect, what you need to do for that and what you could use to get it done. How do you get started as an architect, and how do you grow in that role? We discuss a number of real life architectural challenges and solution designs, and a number of architecture principles, patterns and powers to apply. Never stop programming – but perhaps rise to the architecture challenge too.
Notes: Many IT professionals aspire to become architects. Many architects wonder what it is they have to do. After 27 years in IT I find I have slowly and steadily moved into a role that I can probably use the label architect for, although still with some reluctance. What exactly does that mean - IT architect? While I may not have all answers and the ultimate truth and wisdom, I do have many architectural challenges to discuss and some core principles to share and a number of tips, tricks and tools to recommend that will help anyone get started or grow in a role as architect for software and IT systems. Elements that make an appearance include cloud, agile, DevOps, microservices, persistence, business, powers of persuasion, diagramming, cost, security, software engineering, data.
Outline: - two real world examples (one new business initiative, one running and struggling project) and how to approach them with an architect's mind - core principles to apply, patterns to use, what to unearth (the power question of WHY) - architecture products: what do you deliver as an architect; how do you ensure agility? - how to be effective? bringing your design to life - communication with stakeholders/powers of persuasion, monitoring adherence, being pragmatic without losing grip - anecdotal evidence from several small and large product teams - the good and also the ugly (architectural oversights and their consequences)
Some specific questions to address: how much technical knowledge and programming skill does an architect require? What other knowledge is required, and how do you stay on top of your game? How to get going: first steps towards be(com)ing an architect.
Steampipe - use SQL to retrieve data from cloud, platforms and files (Code Ca...Lucas Jellema
Introduction to Steampipe - a tool for retrieving data and metadata about cloud resources, platform resources and file content - all through SQL. Data from clouds, files and platforms can be joined, filtered, sorted, aggregated using regular SQL. Steampipe offers a very convenient way to get hold of data that describes the environment in detail.
Automation of Software Engineering with OCI DevOps Build and Deployment Pipel...Lucas Jellema
Automation of software delivery has several advantages. Prevention of human error is certainly one. Consistent and complete execution of tried and tested build and deployment tasks as the only way to apply changes in the live environment. Once the pipelines have been set up, the engineers can focus on the software and applying the required changes to it. To bring that software all the way to production is a breeze. Oracle Cloud Infrastructure offers the DevOps service, introduced in the Summer of 2021. This service comes with git style code repositories, build servers and build pipelines, artifact repositories as well as deployment pipelines. This session introduces OCI DevOps and demonstrates how software can be built and deployed on OKE Kubernetes, Compute Instance VMs and Oracle Functions. From simple source code an application is put in production without manual intervention in the build and deployment process.
Introducing Dapr.io - the open source personal assistant to microservices and...Lucas Jellema
Dapr.io is an open source product that originated at Microsoft and has been embraced by a broad coalition of cloud vendors and open source projects (it is part of the CNCF). Dapr is a runtime framework that can support any application and that especially shines with distributed applications - for example microservices - that run in containers, spread over clouds and/or edge devices.
With Dapr you give an application a "sidecar" - a kind of personal assistant that takes care of all kinds of common responsibilities. Capturing and retrieving state, publishing and consuming messages or events. Reading secrets and configuration data. Shielding and load balancing over service endpoints. Calling and subscribing to all kinds of SaaS and PaaS facilities. Logging traces across all kinds of application components and logically routing calls between microservices and other application components. Dapr provides generic APIs to the application (HTTP and gRPC) for calling all these generic services – and provides implementations of these APIs for all public clouds and dozens of technology components. This means that your application can easily make use of a wide range of relevant features - with a strict separation between the language the application uses for this (generic, simple) and the configuration of the specific technology (e.g. Redis, MySQL, CosmosDB, Cassandra, PostgreSQL, Oracle Database, MongoDB, Azure SQL etc) that the Dapr sidecar uses. Changing technology does not affect the application, but affects the configuration of the Sidecar. Dapr can be used from applications in any technology - from Java and C#/.NET to Go, Python, Node, Rust and PHP. Or whatever can talk HTTP (or gRPC).
In this Code Café I will introduce you to Dapr.io. I will show you what Dapr can do for you(r application) and how you can Dapr-ize an application. I'll show you how an asynchronously collaborating system of microservices - implemented in different technologies - can easily be connected through Dapr, first to Redis as a pub/sub mechanism and then, without modifications, to Apache Kafka. Then we will do a hands-on - with those interested - in which you apply Dapr yourself. In a short time you get a good feel for how you can use Dapr for different aspects of your applications. And if nothing else, Dapr is a very easy way to get your code talking to Kafka, S3, Redis, Azure EventGrid, HashiCorp Consul, Twilio, Pulsar, RabbitMQ, HashiCorp Vault, AWS Secret Manager, Azure KeyVault, Cron, SMTP, Twitter, AWS SQS & SNS, GCP Pub/Sub and dozens of other technology components.
How and Why you can and should Participate in Open Source Projects (AMIS, Sof...Lucas Jellema
For a long time I have been reluctant to actively contribute to an open source project. I thought it would be rather complicated and demanding – and that I didn't have the knowledge or skills for it or at the very least that they (the project team) weren't waiting for me.
In December 2021, I decided to make a serious contribution to the Dapr.io project - and to finally find out how it works and whether it really is that complicated. In this session I want to tell you about my experiences: how Fork, Clone, Branch, Push (and PR) is the rhythm of contributing to an open source project and how you perform those steps (they are all Git actions against GitHub repositories); how to learn how such a project functions and how to connect to it; which tools are needed and which communication channels are used. I tell how the standards of the project - largely automatically enforced - help me become a better software engineer, with an eye for readability and testability of the code.
How the review process is quite exciting once you have offered your contribution. And how the final "merge to master" of my contribution and then the actual release (Dapr 1.6 contains my first contribution) are nice milestones.
I hope to motivate participants in this session to take that step themselves and contribute to an open source project in the form of issues or samples, documentation or code. It is valuable to the community and to the specific project, and I think it is definitely a valuable experience for the contributor. I used to look up to it; now that I have done it, it gives me confidence - and it has left me wanting more (I could still use some help with the work on Dapr.io, by the way).
Microservices, Apache Kafka, Node, Dapr and more - Part Two (Fontys Hogeschoo...Lucas Jellema
Apache Kafka is one of the best known enterprise grade message brokers – created at LinkedIn, donated to the Apache software foundation and used in an ever growing number of organizations to provide a backbone for asynchronous communication. This session introduces Apache Kafka – history, concepts, community and tooling. In a hands on lab, participants will create topics, publish and consume messages and get a general feel for Kafka. Simple microservices are developed in NodeJS – publishing to and consuming from Apache Kafka.
Dapr.io has support for Apache Kafka. Using Kafka through Dapr is very straightforward, as is explained, demonstrated and applied in a second hands-on lab - with applications in various programming languages. Participants will even be able to exchange events across their laptops - through a cloud based Kafka broker.
Use of Apache Kafka in several architecture patterns is discussed – such as data integration, microservices, CQRS, Event Sourcing – along with a number of real world use cases from several well known organizations. The Kafka Connector framework is introduced – a set of adapters that allow us to easily connect Kafka to sources and sinks – where respectively change events are captured from and messages are published to.
Bonus Lab: Apache Kafka is run on Kubernetes, as is Dapr.io. Multiple mutually interacting microservices are deployed on the same local Kubernetes cluster.
Microservices, Node, Dapr and more - Part One (Fontys Hogeschool, Spring 2022)Lucas Jellema
This session does a quick recap of microservices: why do we want them, what problems do they solve and what are the principles around designing and implementing them? The Dapr.io runtime framework for distributed applications is introduced. Dapr provides a sidecar (almost like a personal assistant to a manager) to an application or microservice, a companion process that handles common tasks such as storing and retrieving state, consuming and publishing messages and events, invoking external services and other microservices as well as handling incoming requests. Participants will do a hands-on lab with Dapr.io and learn how to quickly implement interactions with various technologies, including Redis and MySQL.
Node(JS) is introduced - a server-side JavaScript runtime that is well suited for implementing microservices. Some of the main characteristics of NodeJS are discussed (functional programming, asynchronous flows, the NPM package manager) as well as common use cases (handling incoming HTTP requests, invoking REST APIs). In the second lab, Node and Dapr are used together to implement microservices that interact with databases, message brokers and each other - in a decoupled fashion.
Reinventing Oracle Systems in a Cloudy World (RMOUG Trainingdays, February 2...Lucas Jellema
The cloud is changing many things. Even the decision to not (yet) adopt cloud is one to make explicitly. Now is the time for any organization to reconsider its IT landscape. For each system we should make a conscious decision on its roadmap. The 6R model suggests six ways to move a system forward.
This session uses the 6R model and applies it specifically to Oracle technology based systems: what are the options and considerations for Oracle Database, Oracle Fusion Middleware, custom applications, and other red components? What future should we consider and how do we choose? The paths chosen by several Oracle-heavy users are presented to illustrate these options and the decision making process. Oracle Cloud Infrastructure and Autonomous Database play a role, as do Azure IaaS and Azure Managed Database as well as on premises systems. Latency, recovery, scalability, licenses, automation, lock-in, skills, and resources all make their appearance.
Help me move away from Oracle! (RMOUG Training Days 2022, February 2022)Lucas Jellema
Organizations with decades of investment in Oracle technology sometimes (and increasingly) express a wish to move away from Oracle. In this session, we will first explore where the desire to move away from Oracle might come from. Then we describe what the term Oracle represents -- more than 2.000 products on all layers in the technology stack and in different business areas. Finally, we map out what the 'moving away from' consists of: defining where you 'move to' and subsequently actually going there.
It will become clear why you should give considerable thought about dropping Oracle, or any other vendors' technology, when you're not pleased with your current IT situation. You need to focus on the actual problems and objectives and define the suitable roadmap to fit your real needs. It turns out that the quest is usually for modernization and flexibility - and Oracle can very well be a part of that future.
DevOps is a term used in many places and unfortunately also to mean many different things. This presentation (largely in Dutch) paints the DevOps picture. While it may not give a clear-cut definition (there does not seem to be one), it certainly makes clear what DevOps is about, what its objectives and origins are, and which factors enable and drive DevOps.
Conclusion Code Cafe - Microcks for Mocking and Testing Async APIs (January 2...Lucas Jellema
Microcks is a tool for API mocking and testing. This presentation gives an overview of the support in Microcks for asynchronous APIs - the event publishing and consuming behavior of services and applications.
Cloud native applications offer scalability, flexibility, and optimal use of compute resources. Serverless functions interacting through events, leveraging cloud capabilities for persistent storage and automated operations, take organizations to the next level in IT. This session demonstrates polyglot Functions interacting with native cloud services for events and persistence (Object Storage and NoSQL Database) and leveraging the Key and Secrets Vault, Monitoring and Notifications services for operational control. A lightweight API Gateway is used to expose APIs to external consumers. Infrastructure as Code is the guiding principle in deploying both cloud resources and application components, through OCI CLI and Terraform. This session leverages many cloud native (enabling) services in Oracle Cloud Infrastructure; it introduces concepts, then spends most of the time on live demonstrations. All sources are shared with the audience, so participants can create the same application in their own cloud tenancy. What is so great about cloud native applications? How do you create one? I will explain the first and demonstrate the second: on Oracle Cloud Infrastructure, using services that anyone can use for free, I will live-create a cloud native application that streams, persists, notifies, scales and monitors. Benefits: - get to know many different OCI services - understand the meaning, purpose and benefits of cloud native development - learn how to take your own first steps in OCI - for free!
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j
Dr. Jesús Barrasa, Head of Solutions Architecture for EMEA, Neo4j
Discover the latest innovations from Neo4j, including the latest cloud integrations and product improvements that make Neo4j an essential choice for developers building applications with interconnected data and generative AI.
Using Query Store in Azure PostgreSQL to Understand Query PerformanceGrant Fritchey
Microsoft has added an excellent new extension in PostgreSQL on their Azure Platform. This session, presented at Posette 2024, covers what Query Store is and the types of information you can get out of it.
Graspan: A Big Data System for Big Code AnalysisAftab Hussain
We built a disk-based parallel graph system, Graspan, that uses a novel edge-pair centric computation model to compute dynamic transitive closures on very large program graphs.
We implement context-sensitive pointer/alias and dataflow analyses on Graspan. An evaluation of these analyses on large codebases such as Linux shows that their Graspan implementations scale to millions of lines of code and are much simpler than their original implementations.
These analyses were used to augment the existing checkers; these augmented checkers found 132 new NULL pointer bugs and 1308 unnecessary NULL tests in Linux 4.4.0-rc5, PostgreSQL 8.3.9, and Apache httpd 2.2.18.
- Accepted in ASPLOS ‘17, Xi’an, China.
- Featured in the tutorial, Systemized Program Analyses: A Big Data Perspective on Static Analysis Scalability, ASPLOS ‘17.
- Invited for presentation at SoCal PLS ‘16.
- Invited for poster presentation at PLDI SRC ‘16.
SMS API Integration in Saudi Arabia| Best SMS API ServiceYara Milbes
Discover the benefits and implementation of SMS API integration in the UAE and Middle East. This comprehensive guide covers the importance of SMS messaging APIs, the advantages of bulk SMS APIs, and real-world case studies. Learn how CEQUENS, a leader in communication solutions, can help your business enhance customer engagement and streamline operations with innovative CPaaS, reliable SMS APIs, and omnichannel solutions, including WhatsApp Business. Perfect for businesses seeking to optimize their communication strategies in the digital age.
A Study of Variable-Role-based Feature Enrichment in Neural Models of CodeAftab Hussain
Understanding variable roles in code has been found to help students learn programming -- could variable roles also help deep neural models perform coding tasks? We do an exploratory study.
- These are slides of the talk given at InteNSE'23: The 1st International Workshop on Interpretability and Robustness in Neural Software Engineering, co-located with the 45th International Conference on Software Engineering, ICSE 2023, Melbourne Australia
Everything You Need to Know About X-Sign: The eSign Functionality of XfilesPr...XfilesPro
Wondering how X-Sign gained popularity in such a short time span? This eSign functionality of XfilesPro DocuPrime has many advancements to offer for Salesforce users. Explore them now!
Zoom is a comprehensive platform designed to connect individuals and teams efficiently. With its user-friendly interface and powerful features, Zoom has become a go-to solution for virtual communication and collaboration. It offers a range of tools, including virtual meetings, team chat, VoIP phone systems, online whiteboards, and AI companions, to streamline workflows and enhance productivity.
Do you want software for your business? Visit Deuglo.
Deuglo has top software developers in India. They are experts in software development and help design and create custom software solutions.
Deuglo follows a seven-step method for delivering its services to its customers, called the Software Development Life Cycle (SDLC) process:
Requirement — collecting the requirements is the first phase in the SDLC process.
Feasibility Study — after the requirements are collected, their feasibility is assessed before moving on to design.
Design — in this phase, they start designing the software.
Coding — when the design is completed, the developers start coding the software.
Testing — when the coding of the software is done, the testing team starts testing.
Installation — after completion of testing, the application is deployed to the live server and launched!
Maintenance — after the software is delivered and customers start using it, it is maintained and kept up to date.
WWDC 2024 Keynote Review: For CocoaCoders AustinPatrick Weigel
Overview of WWDC 2024 Keynote Address.
Covers: Apple Intelligence, iOS18, macOS Sequoia, iPadOS, watchOS, visionOS, and Apple TV+.
Understandable dialogue on Apple TV+
On-device app controlling AI.
Access to ChatGPT with a guest appearance by Chief Data Thief Sam Altman!
App Locking! iPhone Mirroring! And a Calculator!!
The most important new features of Oracle 23c for DBAs and Developers. You can learn more from my YouTube channel video at https://youtu.be/XvL5WtaC20A
Microservice Teams - How the cloud changes the way we workSven Peters
A lot of technical challenges and complexity come with building a cloud-native and distributed architecture. The way we develop backend software has fundamentally changed in the last ten years. Managing a microservices architecture demands a lot from us to ensure observability and operational resilience. But did you also change the way you run your development teams?
Sven will talk about Atlassian’s journey from a monolith to a multi-tenanted architecture and how it affected the way the engineering teams work. You will learn how we shifted to service ownership, moved to more autonomous teams (and its challenges), and established platform and enablement teams.
50 Shades of Data - Dutch Oracle Architects Platform (February 2018)
1. 50 Shades of Data – how, when and why: Big, Fast, Relational, NoSQL, Elastic, Event, CQRS. On the many types of data, data stores and data usages. Dutch Oracle Architects Platform | 6th February 2018
3. What is data?
• A solidified representation of
  • An observation [of a fact]
  • A concept
• Serialized in order to be
  • Understood & processed by machines
  • Reproduced for human consumption
4. When things were simple – [Diagram: a single RDBMS, accessed through SQL with ACID transactions; data files and log files, with backups, on a SAN]
5. And then stuff happened – [Diagram: browser clients and offline mobile apps, a stateful Java EE middle tier, and a growing collection of stores and channels: Data Warehouse, OO/XML/JSON, Content Management, Big Data, Fast Data, APIs, µ (microservices) and λ (functions)]
9. Business Areas – [Diagram: inside the enterprise – Marketing & Campaigns, Sales & Customer Service, Customer Management, Order Management, Finance (Accounts, Invoices), Supplier & Product Management, Inventory & Warehousing, Shipping, Security, Output (print & mail, email, SMS, …) and a Data Department (Consolidation, MI, Reporting, Analysis and R&D); external actors: Customers, Supplier, Gov Agency, Data providers]
10. [Diagram: the business-area landscape, now populated with the business applications that support it – WebShop Portal, SaaS CRM, SaaS ERP, SaaS CX (Campaigns, Social Media Monitor, 360 Customer View), B2B Partner Portal, Mobile App, Custom Application for Product Catalog, Custom Order Management Application, IoT Gateways & Hub, Enterprise Content Management System, Human Workflow Engine, Mail Server, Data Warehouse, DaaS Services, LDAP for Users, Roles & Permissions, Recommendation Engine, Enterprise Dashboard & BI & Reporting, Security & Compliance Monitor, Desktop Tools, Communication & Collaboration tools, Asset Tracker, plus B2B APIs and Open Data APIs exposed to customers, suppliers, government agencies and data providers]
11. [Diagram: the same landscape, extended with IT systems – Big Data Lake, Logging Collector & Monitor & Analyzer, Monitor for Application & Infra metrics, Source Code Control System, API Gateway, Service Bus, Event Buses, Rule Engine, Corporate Database, File Storage, Job Scheduling, Application Server, Private Blockchain, Docker Container Registry, Microservices Platform and Kubernetes Container Management, consumed from desktop browsers, mobile devices and email/Facebook/WhatsApp channels]
12. Business & IT – Data. [The same landscape, now annotated with the kinds of data that live in it]: List of Products shown in UI • Personal Profile, Order and Payments Details • Smart Contracts with supply chain details • Recent Consumer purchases information • Footage from security cameras • Readings from motion detectors • Emails regarding customer complaints • Spreadsheets with Sales records • Log-files from IT systems (infra & platform) • WebShop activity, Social Media discussions, … • ML Models • In-Flight Messages & Events • Job Schedules • Application & Infrastructure source history • Offers, invoices, rewards messages • Shopping Cart with selected items • Order Details • API usage, billing, policies • Running & Past workflow instances • Sales Aggregates by Day, Region, Product Category • Invoices & Payments • Product Manuals • Digital Twin • KPIs & Alerts • Customer Interaction records • Case files (Complaints, Requests) • Rules & Rule Execution metrics • Weather, Demographics, Sports, Social, … • Config data • Customer Details • Audit Trails, Security Incidents • Programming in progress • User Stories, Designs, Discussions • Copy of Production Data in Acceptance
14. Data Volume. [Same landscape and data kinds, now characterized by volume]: the Big Data Lake holds lots of data – gathering, never purging? • Machine Learning models • long term history in the Data Warehouse • piles of log-files • fine grained events • small chunks of off-line data • small payloads for in-flight messages and events • medium size structured data • rule meta-data (very small)
15. Compression
• Technical Compression
  • Same data, fewer bits to store
  • Same time – or even longer – to process
• Logical Compression
  • Filter (older than, one in X)
  • Reduce fine-grainedness – helicopter view
    • Average over geographical area
    • Min/Max/Average per minute/hour/day
  • Is typically done in the data warehouse & digital twin
  • Could be done for query stores and even for big data sets (see the sketch below)
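A minimal sketch of logical compression in SQL – all table and column names here are hypothetical: raw sensor readings older than 30 days are collapsed into hourly min/max/average rows, trading fine-grainedness for far less storage.

-- hypothetical tables: sensor_readings(sensor_id, reading_time, temperature)
-- and sensor_readings_hourly(sensor_id, hour_start, min_temp, max_temp, avg_temp)
insert into sensor_readings_hourly
  (sensor_id, hour_start, min_temp, max_temp, avg_temp)
select sensor_id
,      trunc(reading_time, 'HH24')                        -- one row per sensor per hour
,      min(temperature), max(temperature), avg(temperature)
from   sensor_readings
where  reading_time < systimestamp - interval '30' day    -- only compress older data
group by sensor_id, trunc(reading_time, 'HH24');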
20. Fast Data – Fast Insight. [Diagram: raw IoT device data flows into an Event Hub; streaming analysis splits into a hot path (alerting) and a cold path; a Digital Twin is maintained, and Machine Learning models are applied to the digital twin to predict maintenance needs]
21. Data Volatility. [Same landscape; the data kinds are now positioned on a scale from high to low volatility. Among the items placed on the scale: List of Products shown in UI • Spreadsheets with Sales records • WebShop activity, Social Media discussions, … • In-Flight Messages & Events • Application & Infrastructure source history • Shopping Cart with selected items • Audit Trails, Security Incidents • Readings from motion detectors • Sales Aggregates by Day, Region, Product Category]
25. Location. [Same landscape; the annotations now show where data resides]: Global Content Delivery Network • Offline Storage in Apps • Third party (SaaS) Git repo • Offsite Standby for Disaster Recovery • SaaS data store in Cloud • DaaS data store in Cloud • Application Server Memory (on site) • Excel Sheets on employee laptops • Local storage on “Things” & Edge devices • Cloud storage for Database backups • Local Database Instance for each region
26. Considerations around Location
• Latency
  • The latency experienced by the end-user is the sum of the latencies in the chain
  • Co-locate systems with chatty interaction
• Storage cost
• Network transport costs
• Ease of distribution
  • Background distribution may be acceptable – provided it happens frequently enough
• Off-line usage
• Security
  • Data “en route”
28. Streaming. [Same landscape; the annotations now show data in motion]: Synchronization of devices coming online again • Upload of ML Models • Replaying transactions on the standby database • Applications being deployed • Update of the Data Warehouse • Laptops & USB sticks on the move • Raw IoT => Streaming Analysis => {alerts | digital twin | big data} • Customer sending a complaint by email • Synchronization of SaaS from On Premises • Metrics from Apps | Platform | Infra to Log Stash & Monitor • Events moving to consumers • UI updates pushed to the browser • Task notifications sent to employees • Fresh data pushed to the Application Cache • Database backup moved offsite
30. TC(D)O – Total Cost of Data Ownership
• Business cost (missed opportunity, user dissatisfaction, …) of not having the data available – at all, or fast enough, or fresh enough
[Diagram labels: Speed, Freshness, Available; Compute, Storage, Network]
31. TC(D)O – Total Cost of Data Ownership
• Direct cost of
  • Acquiring data
  • Storing data
    • Storage (cheap and slow, expensive and quick)
    • Compression (less storage at the expense of compute)
  • Retrieving data – compute resources
  • Cleansing, calculating & deriving data (DWH, ML Model, CQRS) – compute resources
  • Transporting data – network traffic has a price tag (especially when out of the local ‘area’)
32. TC(D)O – Total Cost of Data Ownership
• Operational costs
  • Backup & Recovery
  • Security
  • Intellectual property
  • Life cycle management – slower tier, archive, purge
    • “Right to be forgotten”
    • Regulatory periods to hang on to data
33. Open (APIs) & DaaS
• Governments and NGOs, scientific and even commercial organizations are publishing data
• Inviting anyone who wants to join in to help make sense of the data – understand driving factors, identify categories, help predict
• Many areas: economy, health, public safety, sports, traffic & transportation, games, environment, maps, …
37. Stale
• Data is a representation of the real world
• All data is inherently stale
  • Except when it describes something that can not change – and whose description can not change
• Staleness is probably not a problem
  • Except in self-driving cars…
• When you run the end-of-year report, consistency is much more important than freshness
43. Looking into the future…
select name, price
from our_products
[Table OUR_PRODUCTS with columns NAME, PRICE]
44. Looking further into the future…
begin
  DBMS_FLASHBACK_ARCHIVE.ENABLE_AT_VALID_TIME(
    level      => 'ASOF'
  , query_time => TO_TIMESTAMP('01-10-2018', 'DD-MM-YYYY')
  );
end;
-- subsequent queries in this session only see rows whose valid-time period covers 1 October 2018
select name, price
from our_products
[Table OUR_PRODUCTS with columns NAME, PRICE]
45. Current situation…
begin
  DBMS_FLASHBACK_ARCHIVE.ENABLE_AT_VALID_TIME(
    level => 'CURRENT'
  );
end;
-- subsequent queries in this session only see rows that are valid right now
select name, price
from our_products
[Table OUR_PRODUCTS with columns NAME, PRICE]
46. All data in the table (the default setting)
begin
  DBMS_FLASHBACK_ARCHIVE.ENABLE_AT_VALID_TIME(
    level => 'ALL'
  );
end;
-- subsequent queries see all rows, regardless of their validity period (the default)
select name, price
from our_products
[Table OUR_PRODUCTS with columns NAME, PRICE]
47. All data in the table (the default setting)
begin
  DBMS_FLASHBACK_ARCHIVE.ENABLE_AT_VALID_TIME(
    level => 'ALL'
  );
end;
select name, price, start_date, end_date
from our_products
order by start_date
[Table OUR_PRODUCTS with columns NAME, PRICE, START_DATE, END_DATE]
49. Make the database aware of the time-based business validity of records
• Add timestamp columns indicating the start and end of the valid time for a record
• Specify a PERIOD for the table (see the query sketch below)
• Note: a table can have multiple sets of columns, describing multiple types of temporal business validity
create table our_products
( name        varchar2(100)
, price       number(7,2)
, start_date  timestamp
, end_date    timestamp
, PERIOD FOR offer_time (start_date, end_date)
);
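With the PERIOD in place, the valid-time filter can also be applied per query, without the session-wide DBMS_FLASHBACK_ARCHIVE switch – a minimal sketch, assuming Oracle 12c Temporal Validity and the our_products table defined above:

-- only rows whose offer_time period covers 1 October 2018
select name, price
from   our_products as of period for offer_time
       to_timestamp('01-10-2018', 'DD-MM-YYYY');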
54. Data Integrity
• Why?
• Is it about truth?
• About regulations and by-the-book?
• Allow IT systems to run smoothly and not get confused?
• About auditability and non-repudiation?
• What about the real world?
• Data in IT is just a representation; if the world is not by the book – what should IT do?
55. Blockchain
• Distributed
• Across trusted business partners
• Across public, anonymous parties
• Immutable
• Secured
• Trusted
• Smart Contracts
• Operations on data (without human intervention)
57. Graph Database
• Natural fit during development
• Superior (10–1000 times better) performance for queries that traverse relationships
[Diagram: finding “People liked by anyone liked by Bob” – person nodes connected by ‘liked by’ relationships; see the sketch below]
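For contrast, the same question expressed relationally – a sketch against a hypothetical likes(liker, liked) table; every additional hop costs another self-join, which is exactly where graph databases claim their performance edge:

-- hypothetical table likes(liker, liked): who likes whom
-- find people liked by anyone who is liked by Bob (two hops, two self-joins)
select distinct hop2.liked
from   likes hop1
join   likes hop2 on hop2.liker = hop1.liked
where  hop1.liker = 'Bob';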
61. SQL is not good at anything
• But it sucks at nothing
62. Relational Databases
• Based on the relational model of data (E.F. Codd), a mathematical foundation
• Use SQL for query, DML and DDL
• Transactions are ACID (Atomicity, Consistency, Isolation, Durability)
  • All or nothing (Atomicity)
  • Constraint compliant (Consistency)
  • Individual experience [in a multi-session environment], aka concurrency (Isolation)
  • Down does not hurt (Durability)
63. ACID comes at a cost
• Transaction results have to be persisted [before the transaction completes] in order to guarantee Durability (see the sketch below)
• Concurrency requires some degree of locking (and multi-versioning) in order to have Isolation
• Constraint compliance (unique key, foreign key) means all data hangs together (as do all transactions) in order to have Consistency
• Two-phase commit (across multiple participants) introduces complexity, dependencies and delays, yet is required for Atomicity
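A minimal sketch of where those costs bite, using a hypothetical accounts(id, balance) table: the rows stay locked against other sessions until commit (Isolation), and the commit itself cannot return before the transaction log has been flushed to disk (Durability):

-- transfer 100 from account 1 to account 2: all or nothing (Atomicity)
update accounts set balance = balance - 100 where id = 1;  -- row now locked for other sessions
update accounts set balance = balance + 100 where id = 2;  -- row now locked for other sessions
commit;  -- returns only after the redo/transaction log is persisted (Durability)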
64. The holy grail of Normalization
• Normalize to prevent
  • data redundancy
  • discrepancies (split brain)
  • storage waste
• However: we should recognize that some data is read far more frequently than it is created or modified (see the sketch below)
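One way to act on that read/write asymmetry without giving up the normalized source of truth is a precomputed, denormalized read copy – a sketch with hypothetical customers and orders tables, using an Oracle materialized view so the join and aggregation are paid at refresh time rather than on every query:

-- hypothetical normalized tables: customers(id, name), orders(id, customer_id, total)
create materialized view customer_order_summary
refresh on demand
as
select c.id, c.name
,      count(o.id)  as order_count
,      sum(o.total) as total_spent
from   customers c
left join orders o on o.customer_id = c.id
group by c.id, c.name;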
65. The Relational Model in practice
• The traditional relational data model has a severe impact on physical disk performance
  • Transaction log => sequential write (append to file)
  • Data blocks require much more expensive random-access disk writes
• Indexes (B-Tree, Bitmap, …) are used to speed up query (read) performance – and slow down transactions
• Relational data does not [always] map naturally to the data format required in the application (OO, JSON, XML)
• The capability to join and construct ad-hoc queries across the entire data model is powerful
• Declarative integrity constraints allow for strict enforcement of data quality rules
  • “The data may be nonsensical, but at least it adheres to the rules”
66. Databases re-evaluated
• Not all use cases require ACID (or can afford it)
  • Read only (product catalog for web shops)
  • Inserts only, and no (inter-record) constraints
  • Big Data collected and “dumped” in a Data Lake (Hadoop) for subsequent processing
  • High performance demands
• Not all data needs structured formats or structured querying and JOINs
  • Entire documents are stored and retrieved based on a single key
• Sometimes scalable availability and productivity are more important than Consistency – and ACID is sacrificed
  • The CAP theorem states that Consistency [across nodes], Availability and Partition tolerance can not all three be satisfied
67. NoSQL and BASE
• NoSQL arose because of performance and scalability challenges with the traditional/relational approach in web-scale operations
• NoSQL is a label for a wide variety of databases that lack some aspect of a true relational database – ACID-ness, SQL, the relational model, constraints
  • The label has been used since 2009 – perhaps NoREL would be more appropriate
• Some well known NoSQL products: Cassandra, MongoDB, Redis, CouchDB, …
• BASE as alternative to ACID: Basically Available, Soft state, Eventually consistent (after a short duration)
68. Typical for NoSQL
• Focus on speed, availability and scalability
  • Horizontal scale-out – distributed, with load balancing and fail-over
• No (predefined) data structure
  • Integrity primarily protected by application logic
• Open Source (most offerings are, not all: MarkLogic)
• Close(r) attention to how the data is used
  • Application-oriented data format and search paths, and a specialized database per application (microservice, capability) – similar to the switch from SOA to API/Microservice
• Reads (far) more relevant than writes
  • Data redundancy & denormalization
• No data access through SQL – well, …
70. (Leading) NoSQL database products
• MongoDB is (one of) the most popular (by any measure)
• Cloud (only): Google BigTable, AWS Dynamo
• Cache (in memory): ZooKeeper, Redis, Coherence, Memcached, Apache Ignite (pka GridGain), …
• Hadoop/HDFS
• Oracle NoSQL (fka Berkeley DB)
71. NoSQL means: No Data Access through SQL
• However
  • Data Professionals and Developers speak SQL
  • Reporting, Dashboarding, ETL and BI tools speak SQL
  • There is no common query language across NoSQL products
72. No Data Access through SQL
• However
  • Data Professionals and Developers speak SQL
  • Reporting, Dashboarding, ETL and BI tools speak SQL
  • There is no common query language across NoSQL products
• Attempts from many vendors to create drivers that translate SQL statements into NoSQL commands for the specific target database
  • To protect existing investments in SQL – skills, tools, applications, reports, …
73. SQL vs NoSQL
• SQL != RDBMS
• SQL on top of
  • Hadoop – Spark SQL, Hive, Drill, Impala
  • “External Table” – text files, CSV, Excel
  • XML, JSON
  • KSQL on Kafka events (see the sketch below)
  • Google Spanner, BigQuery
  • NoSQL – Berkeley DB, HBase, Elastic Search, MongoDB, Cassandra
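As an illustration of that last family, a sketch of the KSQL flavor – assuming a hypothetical pageviews stream has already been registered over a Kafka topic – a continuous, windowed aggregate expressed in SQL:

-- KSQL: count page views per user over one-minute tumbling windows,
-- continuously, as events arrive on the underlying Kafka topic
SELECT user_id, COUNT(*) AS views
FROM   pageviews
WINDOW TUMBLING (SIZE 1 MINUTE)
GROUP BY user_id;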
74. NoSQL (MongoDB) vs SQL (Oracle)
db.emp.find(
  { "JOB": "SALESMAN" },
  { ENAME: 1, SAL: 1 }
).sort({ SAL: -1 }).limit(2)

select ename, sal
from   emp
where  job = 'SALESMAN'
order by sal desc
FETCH FIRST 2 ROWS ONLY
75. NoSQL (MongoDB) vs SQL (Oracle)
db.emp.find(
  { "JOB": "SALESMAN"
  , $where: "this.SAL + (this.COMM != null ? this.COMM : 0) > 2000"
  }
)

select *
from   emp
where  sal + nvl(comm, 0) > 2000
77. Why distributed?
• Because it is – business is physically spread out over multiple locations
• To achieve
  • Scalability
  • Performance (parallelism, latency)
  • Resilience of the whole – availability (in the face of individual failure)
  • (Site) disaster recovery
  • Trust (e.g. blockchain)
• Applies to data & processes
78. Distributed. [Same landscape; the annotations now show distributed technologies]: Global Content Delivery Network • Offline Storage in Apps • Real Application Clusters • Distributed In-Memory Cache: Hazelcast, Memcached, Redis, Coherence • Java EE Application Server Cluster • SETI • Local storage on “Things” & Edge devices • Active Standby Database • SAN • Cross Cloud/On Premises archive • Distributed Datastore: MongoDB, Cassandra, BigTable, HBase • Apache Spark Distributed Data Processing • Logical Data Shards in Oracle Database, MySQL, Elastic • HDFS Hadoop Distributed File System • Kubernetes Distributed Container Platform • Distributed Event Bus: Kafka
82. Availability. [Same landscape; the annotations now show availability requirements]: Webshop 24/7 online • Relaxed availability (office hours) for DWH • SaaS CRM less available than desired • Fairly high availability for [clusters of] things – not for individual things • Active Standby Database • SAN • Cross Cloud/On Premises archive • Global Content Delivery Network • Low availability demands on Big Data • H/A for Oracle Database • Event Bus 24/7 online • H/A for IoT Hub • H/A for LDAP • H/A during extended office hours for the human workflow engine • Service Bus 24/7 online • Some loss of service is acceptable for the recommendation engine
83. Availability of Data
• Availability:
• unplanned downtime (incident => disaster)
• planned (not desired) downtime (upgrade, patch to application, platform,
infra)
• A chain is as strong as its weakest link
• Availability is determined by the least available component
• A datastore can drive (and help improve) the availability of many systems/applications/services
• Example: a custom UI on top of SAP requires 99.95% uptime – SAP itself only offers 98%
• Increase availability
• H/A architecture – multi-node cluster, hot standby and fail-over, disaster
recovery
• Rolling upgrades
• Single node for command, multiple (independent) helpers for query
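A quick worked example of the chain effect (illustrative numbers): a request that passes through a UI, an API gateway and a database that are each 99% available is served by a chain that is at best 0.99 × 0.99 × 0.99 ≈ 97% available – roughly 11 days of downtime per year, worse than any single link.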
84. Case of Web Shop
• Webshop – 1M visitors per day
• Product catalog consists of 15+ million records
• The web shop presents: product description, images, reviews, pricing details, related offerings, stock status
• Products are added, updated and removed every day – although most products do not change very frequently
• Some vendors do bulk manipulation of product details
[Diagram: product updates flow into the Products store; webshop visits perform searches, product detail views and orders]
85. Case of Web Shop – Usage Patterns & Architecture
[Diagram: product updates enter behind the firewall; webshop visits (searches, product details, orders) arrive from the public side]
• Command side (product updates): data manipulation, data quality (enforcement), <10K transactions, batch jobs next to online, speed is nice
• Query side (webshop visits): read only, online, speed is crucial, XHTML & JSON, > 5M visits
86. Case of Web Shop – Usage Patterns & Architecture
[Diagram: the Products command store behind the firewall; nightly generation propagates product data to a read-only Products store in the DMZ that serves the webshop]
• Command side (product updates): data manipulation, data quality (enforcement), <10K transactions, batch jobs next to online, speed is nice
• Query side (webshop visits, in the DMZ): read only, online, speed is crucial, XHTML & JSON, > 1M visits
• Query store: JSON documents, images, text search, scales horizontally, stale but consistent
88. CQRS – Multi Data Store
[Diagram: a Products store with data manipulation on one side and data retrieval on the other]
89. CQRS – Multi Data Store
[Diagram: one Products command store for data manipulation feeding multiple query stores for data retrieval – Special Products, Product Clusters, Food Stuff, Toys, a Quick Product Search Index and a Product Store in a SaaS app]
90. CQRS in Oracle Database
[Diagram: CQRS options within the Oracle Database stack – tables with materialized views and indexes behind several middleware tiers, the In-Memory Database option, RAC clusters, shards (12c R2), an Active Data Guard standby, SANs, datafiles, the SGA and redo logs]
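To make the materialized-view option concrete: a minimal sketch in Oracle SQL, assuming a hypothetical PRODUCTS table as the command store (table, column and view names are illustrative, not from the slides). The MV acts as a read-optimized query store that is refreshed on demand, much like the nightly generated, stale-but-consistent query side of the webshop case:

  -- Read-optimized query store over the (assumed) PRODUCTS command table
  CREATE MATERIALIZED VIEW product_catalog_mv
    BUILD IMMEDIATE
    REFRESH COMPLETE ON DEMAND
  AS
  SELECT id, name, price, description
  FROM   products
  WHERE  status = 'PUBLISHED';

  -- Refresh from a scheduled (e.g. nightly) job
  BEGIN
    DBMS_MVIEW.REFRESH('PRODUCT_CATALOG_MV');
  END;
  /

With materialized view logs in place, REFRESH FAST ON COMMIT could keep the query store continuously up to date instead; complete refresh on demand is shown here because it carries no restrictions.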
91. CQRS - Command and Query Responsibility Segregation
• Data manipulation and retrieval in separate places
• (physical data proliferation)
• Query store is optimized for consumers
• Level of detail, format, filters applied
• For performance and scalability, independence, productivity, lower license fees, lower TCO and security
92. Synchronizing the Query Stores
[Diagram: the Products command store (data manipulation) feeding the query stores used for data retrieval – Special Products, Product Clusters, Food Stuff, Toys, the Quick Product Search Index and the Product Store in a SaaS app]
93. Synchronizing the Query Stores
• Depends on
• Freshness requirements
• Authorization demands
• Cost of synchronizing the query store (full synchronize vs event based)
• Usage pattern for query store
• Facilities available in Command store (and in query stores)
• Relative locations (e.g. cloud & on premises)
• Mechanisms
• Importing Database dump-file
(periodic, full or partial)
• Direct queries & DML
• Change Data Capture from transaction logs
• Event based
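One hedged sketch of the event-based mechanism (the 'transactional outbox' pattern; all names are illustrative and assume an Oracle 12.2+ command store): every change to the command table also appends a row to an outbox table within the same transaction, and a separate process forwards outbox rows to the event bus for the query stores to consume:

  -- Outbox table next to the (assumed) PRODUCTS command table
  CREATE TABLE product_outbox (
    event_id   NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    event_time TIMESTAMP DEFAULT SYSTIMESTAMP,
    event_type VARCHAR2(10),
    product_id NUMBER,
    payload    CLOB            -- JSON document with the new values
  );

  -- Record every change in the same transaction as the change itself
  CREATE OR REPLACE TRIGGER product_outbox_trg
  AFTER INSERT OR UPDATE OR DELETE ON products
  FOR EACH ROW
  BEGIN
    INSERT INTO product_outbox (event_type, product_id, payload)
    VALUES (CASE WHEN INSERTING THEN 'CREATED'
                 WHEN UPDATING  THEN 'UPDATED'
                 ELSE                'DELETED' END,
            COALESCE(:NEW.id, :OLD.id),
            JSON_OBJECT('name' VALUE :NEW.name, 'price' VALUE :NEW.price));
  END;
  /

A poller or CDC tool then publishes the outbox rows to, for example, a Kafka topic and marks them as shipped; because the outbox insert commits (or rolls back) together with the product change, no change event is lost or produced spuriously.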
95. State is sum of changes
Source: https://ookami86.github.io/event-sourcing-in-practice/#how-eventsourcing-works
96. Take the UD out of CRUD
• Introducing the Immu Table
• A ledger of entity changes
• With a timestamp or event sequence
• And the entity identifier
• And the new values of the added, changed or erased attributes
• Each event is an immutable record that is appended to the ledger – simply added at the end
• Appending is atomic and very cheap compared to Update and Delete: it needs no lock, unlike Update and Delete, which require random file access and rearranging blocks on disk
Example – a Bank Account Change Event carries: event type, timestamp, account id, amount, (new value for) owner, erased: some attribute
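A minimal sketch of such an 'Immu Table' in SQL, using the bank-account example from the slide (table and column names are assumptions):

  -- Append-only ledger of bank account change events
  CREATE TABLE account_event_ledger (
    event_seq   NUMBER GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    event_time  TIMESTAMP DEFAULT SYSTIMESTAMP,
    event_type  VARCHAR2(30) NOT NULL,  -- e.g. DEPOSIT, WITHDRAWAL, OWNER_CHANGED
    account_id  NUMBER NOT NULL,
    amount      NUMBER,                 -- signed amount, when applicable
    owner       VARCHAR2(100),          -- new value for owner, when applicable
    erased_attr VARCHAR2(30)            -- name of an erased attribute, if any
  );

  -- Events are only ever INSERTed at the end - never UPDATEd or DELETEd
  INSERT INTO account_event_ledger (event_type, account_id, amount)
  VALUES ('DEPOSIT', 42, 150);
  INSERT INTO account_event_ledger (event_type, account_id, amount)
  VALUES ('WITHDRAWAL', 42, -40);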
97. Event Log in Event Sourcing
• Primary Data Source is ledger of change events
• Not a store of the current state
• However: optionally use snapshots of baseline (state up until time)
• Entity Event Store replaces Table
• Offers a simple API for creating and retrieving events
• ‘Entity Change Event’ Producer (to which consumers can subscribe)
• To correct a mistake:
• Do not remove the event! (it happened, it may already have been
distributed)
• Instead, create a compensating event (and then it unhappened)
98. Event Log
• Audit Log
• Time travel
• Reconstruct system (application) state
• Distributed application state
• Support multiple (read) models
• Easily construct a debugging environment – of the exact situation and time
• What-if scenarios – take a copy, inject an event & play forward from there
• State = sum of change events
• State = snapshot plus sum of recent events
• To sync application state: current state + sum of events after the event version number on which the current state is based
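Against the illustrative ledger sketched earlier, 'state = snapshot plus sum of recent events' becomes a single query – assuming a SNAPSHOT event carries the balance at that point, and all later signed amounts are added to it:

  -- Current balance of account 42: latest snapshot (if any) plus later events
  SELECT SUM(amount) AS balance
  FROM   account_event_ledger
  WHERE  account_id = 42
  AND    event_seq >= COALESCE(
           (SELECT MAX(event_seq)
            FROM   account_event_ledger
            WHERE  account_id = 42
            AND    event_type = 'SNAPSHOT'), 0);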
99. To implement
Event Sourcing
• Take a data store
• That is distributed, scalable, available
• For example Apache Cassandra
• Create an Event Log table [for each business entity]
• Create columns for timestamp, event id,
change [event] type, entity identifier
• Create columns for all attributes
or a single column to hold a document (e.g. JSON)
• A special change type can be ‘snapshot’ to record a baseline – no older entries are then needed in the event log
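A minimal sketch of that event log in Cassandra CQL (illustrative names; a single text column holds the JSON document, as the slide suggests, and the timeuuid covers both the timestamp and the event id):

  -- One event log table per business entity type; events cluster per entity
  CREATE TABLE order_events (
    entity_id  uuid,
    event_seq  timeuuid,   -- time-based ordering of events
    event_type text,       -- e.g. 'created', 'updated', or the special 'snapshot'
    payload    text,       -- JSON document with the changed attribute values
    PRIMARY KEY (entity_id, event_seq)
  ) WITH CLUSTERING ORDER BY (event_seq ASC);

  -- Retrieve all events for one entity, oldest first
  SELECT event_type, payload
  FROM   order_events
  WHERE  entity_id = 123e4567-e89b-12d3-a456-426614174000;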
102. What is IT all about?
Application
Production Runtime
103. What is IT all about?
Application
Production Runtime
Platform
104. What is IT all about?
Application
Platform
Production Runtime
Operations
Monitoring &
Management
105. One team has Agile responsibility through the full lifecycle
[Diagram: Agile design, build & test of application plus platform in the preparation runtime, delivered via CD to the production runtime, where operations performs monitoring & management]
107. One team has Agile responsibility through the full lifecycle
[Diagram: the team owns the application plus its platform, across development and production]
108. DevOps team owns and runs one (or more) products
[Diagram: an application-plus-platform product on the generic infrastructure platform for running DevOps products – floorspace, power, cooling, storage and compute; monitoring, management, cache, authentication, RDBMS and event hub]
109. Multiple products from multiple teams run on a shared generic infrastructure
[Diagram: five application-plus-platform products side by side on the same generic infrastructure platform – floorspace, power, cooling, storage and compute; monitoring, management, cache, authentication, RDBMS and event hub]
110. App plus platform under DevOps == Microservice
[Diagram: five µ-services running on the generic infrastructure platform for DevOps products]
111. App plus platform under DevOps == Microservice
• Stateless
• Horizontally scalable
• Mutually independent
• upgrade, patch, relocate
• Can expose a public API (HTTP/REST) and/or UI
• Communicate with each other through events
• Have their own bounded data context
• Do not rely on other microservices [for the data they need]
• Serverless – do not require an allocated server, can be fired up on demand
[Diagram: µ-services on the generic infrastructure platform]
112. Microservices - objectives
• Minimize cost of change
• Maximize agility
• Isolate responsibility
• Reduce cohesion by minimizing dependencies
• logical, technical and runtime
• only standardized communication/interaction
• Independent, scalable processes
• Choreography (broadcast) preferred over Orchestration (direct call)
• Efficient operations
• Comprehensible, controllable IT
How do we get
from a Monolith
to Microservices?
113. Data in microservices
• Microservices are stateless & horizontally scalable
• Microservices are isolated & independent
• Where is their data?
• What about lookup data?
• Data not owned by the microservice –
but still required by it to perform its role => bounded context
115. Bounded context in microservices
• A microservice needs to be able to run independently
• It needs to contain & own all data required to run
• It cannot depend on other microservices
[Diagram: the Customer microservice (API) publishes a CustomerModified event; the Order microservice (API, UI) consumes it to refresh the customer data in its own bounded context]
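What the Order side's event handler might do is a plain upsert into its own store – a hedged sketch in SQL, with made-up table names and the fields of the received CustomerModified event bound as parameters:

  -- Local copy of customer data inside the Order microservice's own store;
  -- bind variables carry the fields of the received CustomerModified event
  MERGE INTO order_svc_customers c
  USING (SELECT :customer_id AS id, :name AS name, :email AS email FROM dual) ev
  ON (c.id = ev.id)
  WHEN MATCHED THEN
    UPDATE SET c.name = ev.name, c.email = ev.email
  WHEN NOT MATCHED THEN
    INSERT (id, name, email) VALUES (ev.id, ev.name, ev.email);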
117. Wrap Up
• Data used to be like the Ford Model T
• One model, one color
• And then:
118. Wrap Up
• Data comes in many shades (at least 50) – variations along many dimensions:
usage, Total Cost of Data Ownership, authorization, distribution, format, volatility, volume, ACID demands, availability, freshness requirements (staleness allowance), location, speed, ownership, required consistency
119. Wrap Up
• Some form of CQRS is plain common sense
• Use fitting technology for the query challenge at hand
• Graph, Document, Relational, Key/Value, Column, Elastic Index, …
• Every organization will (should) have multiple data stores in various
technologies – and not just relational SQL
• Design & implement a mechanism to synchronize the query stores
• Events are attractive: decoupled, fine-grained and fast
• Devise a purging strategy
• Stop carrying around your data legacy
120. Wrap Up
• All data is stale
• Consistency should be your main concern
• Microservices are stateless
• They can own state – in their private data store
• And maintain derived state – bounded context
• Events are published to allow microservices to synch their context
• Event Sourcing reduces complexity
• CRUD => CR
• Keep a ledger of data changes (bookkeeping of DML transactions)
• Reconstruct state – current or historical – from events
(into query store)
121. Wrap Up
• Data Integrity may be overrated
• Instead of enforcing constraints (reality may not be so clean) – identify
anomalies in data and act on them
• SQL sits on top of the world
• SQL and SQL-like query languages run against a wide array of data stores, including streams, Big Data, NoSQL and CSV / Excel – see the sketch after this list
• People and tools know SQL – make use of that
• Machine Learning and Artificial Intelligence are fueled by data
• They make the smallest, rawest, silliest piece of data potentially valuable
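As an illustration of 'SQL on top of the world': a ksqlDB-style sketch (topic, stream and column names are made up) that runs SQL directly against a Kafka stream of webshop activity:

  -- Declare a stream over an existing Kafka topic
  CREATE STREAM webshop_visits (product_id VARCHAR, action VARCHAR)
    WITH (KAFKA_TOPIC = 'webshop-activity', VALUE_FORMAT = 'JSON');

  -- Standing aggregation: product views per one-minute window
  SELECT product_id, COUNT(*) AS views
  FROM   webshop_visits
  WINDOW TUMBLING (SIZE 1 MINUTE)
  GROUP BY product_id
  EMIT CHANGES;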
124. Thank you!
• Blog: technology.amis.nl
• Email: lucas.jellema@amis.nl
• Twitter: @lucasjellema
• LinkedIn: lucas-jellema
• Web: www.amis.nl, info@amis.nl
Editor's Notes
Fast data arrives in real time and potentially in high volume. Rapid processing, filtering and aggregation are required to ensure timely reaction and actual information in user interfaces. Doing so is a challenge; making it happen in a scalable and reliable fashion is even more interesting. This session introduces Apache Kafka as the scalable event bus that takes care of the events as they flow in, and Kafka Streams and KSQL for the streaming analytics. Both Java and Node applications are demonstrated that interact with Kafka and leverage Server Sent Events and WebSocket channels to update the Web UI in real time. User activity performed by the audience in the Web UI is processed by the Kafka-powered back end and results in live updates on all clients.
Outline: introducing the challenge (fast data, scalable and decoupled event handling, streaming analytics); introduction of Kafka; demo of producing to and consuming from Kafka in Java and Node.js clients; intro to the Kafka Streams API for streaming analytics; demo of streaming analytics from a Java client; intro of the web UI (HTML5, WebSocket channel and SSE listener); demo of push from server to web UI.
End-to-end flow:
- IFTTT picks up Tweets and pushes them to an API that hands them to a Kafka topic.
- The Java application consumes these events, performs streaming analytics (grouped by hashtag, author and time window) and counts them; the aggregation results are produced to Kafka.
- The NodeJS application consumes these aggregation results and pushes them to the Web UI.
- The Web UI displays the selected Tweets along with the aggregation results.
- In the Web UI, users can LIKE and RATE the tweets; each like or rating is sent to the server and produced to Kafka; these events too are processed through stream analytics and result in updated like counts and average ratings, which are then pushed to all clients. The audience can Tweet, see the tweet appear in the web UI on their own device, rate & like, and see the ratings and like count update in real time.
https://specify.io/concepts/microservices
http://morocco.opendataforafrica.org/
https://www.infoq.com/articles/microservices-aggregates-events-cqrs-part-1-richardson
http://microservices.io/patterns/data/event-sourcing.html
http://blog.kontena.io/event-sourcing-microservices-with-kafka/
CQRS and Event Sourcing Applications with Cassandra - https://www.youtube.com/watch?v=3t8EUDiPfMQ
Martin Fowler - https://www.youtube.com/watch?v=aweV9FLTZkU
https://youtu.be/9a1PqwFrMP0?t=14m28s – Hospital – admit, transfer, transfer, discharge
Greg Young - https://www.youtube.com/watch?v=kZL41SMXWdM
All data stores are distributed
Or at least distributedly available
They can be local or on cloud (latency is important)
Data in generic data store is still owned by only one microservice – no one can touch it
Only in DWH and BigData do we deliberately take copies of data and disown them