Building reliable big data applications for news brands across the Benelux. The document discusses 5 key challenges in building big data applications for news brands: 1) collecting user-content interactions, 2) creating an easy testing environment, 3) ensuring data quality, 4) handling high event volumes, and 5) achieving real-time scalability. It provides solutions to each challenge, such as using open source tracking libraries to collect user data, implementing schema verification to check for corrupt data, leveraging technologies like Spark and Kafka to process and aggregate high volumes of event data, and using container orchestration to dynamically scale applications in response to fluctuating traffic levels.
High availability, real-time and scalable architectures (Jampp)
Presented at the Architecture Conference (ArqConf) in Buenos Aires, Argentina. Here is a 10,000ft view of our Real Time Bidding and Stream Processing architecture.
Presented at the AWS Summit in London, here's a deep dive on getting started with Amazon Kinesis and use-case with Jampp, the world's leading mobile app marketing platform.
Achieving Real-Time Analytics at Hermes | Zulf Qureshi, HVR and Dr. Stefan Ro... (HostedbyConfluent)
Hermes, Germany's largest post-independent logistics service provider for deliveries, had one main goal—make faster and smarter data-driven business decisions. But with high volumes of diverse and disparate data, how can you effectively leverage it as an asset for real-time insights and business intelligence? During this session, Hermes will share their data challenges and how HVR's high volume data replication capabilities enabled Hermes to securely and seamlessly integrate data into Kafka for real-time decision-making and greater visibility into the entire logistics process.
This post discusses the architectural decisions, and the reasoning behind them, taken to build a REST API that needs to deliver large amounts of reporting data.
Financial Event Sourcing at Enterprise Scale (confluent)
For years, Rabobank has been actively investing in becoming a real-time, event-driven bank. If you are familiar with banking processes, you will understand that this is not simple. Many banking processes are implemented as batch jobs on not-so-commodity hardware, meaning that any migration effort is immense.
*Find out how Rabobank redesigned Rabo Alerts while continuing to provide a robust and stable alert system for its existing user base
*Learn how the project team managed to achieve a balance between the need to decentralise activity while not losing control
*Understand how Rabobank re-invented a reliable service to meet modern customer expectations
Business breakout during Confluent’s streaming event in Munich, presented by Falko Schwarz, VP CEMEA at Confluent. This three-day hands-on course focused on how to build, manage, and monitor clusters using industry best-practices developed by the world’s foremost Apache Kafka™ experts. The sessions focused on how Kafka and the Confluent Platform work, how their main subsystems interact, and how to set up, manage, monitor, and tune your cluster.
Using Kafka in Your Organization with Real-Time User Insights for a Customer ... (confluent)
(Chris Maier + Steven Royster, West Monroe Partners) Kafka Summit SF 2018
The value of real-time data is growing as an increasing number of companies look to provide a comprehensive experience for their customers. Utilizing Kafka in key facets of your organization will yield greater customer satisfaction and promote a better understanding of user interactions. As data streaming is becoming more prevalent in a wide variety of industries, companies are seeking to modernize their tech stacks by employing the extensible, scalable infrastructure afforded by Kafka.
Over the past few months, we have successfully developed a containerized Kafka implementation at a major healthcare provider. In addition, we created producers to publish messages to the Kafka cluster and consumers to receive them on the other end. By capturing a plethora of data around customer activity, we created opportunities for the business to act upon real-time metrics in order to provide an improved customer experience.
In this talk, we will cover the user-related data sources we connected to Kafka, the reasons we chose them, and how the insights gained from each source can be leveraged in your business. You will walk out understanding how capturing a wide variety of customer activity data can create opportunities for the business to act on real-time metrics in order to provide an improved customer experience.
How to Quantify the Value of Kafka in Your Organization (confluent)
(Lyndon Hedderly, Confluent) Kafka Summit SF 2018
We all know real-time data has value. But how do you quantify that value in order to create a business case for becoming more data, or event, driven?
The first half of this talk will explore the value of data across a variety of organizations, starting with the five most valuable companies in the world: Apple, Alphabet (Google), Microsoft, Amazon and Facebook (based on stock prices in July 2017). We will go on to discuss other digital natives: Uber, eBay, Netflix and LinkedIn, before exploring more traditional companies across retail, finance and automotive. Next, we’ll look at non-businesses such as governments and lobbyists. Whether organizations are using data to create new business products and services, improve user experiences, increase productivity, manage risk or influence global power, we’ll see that fast and interconnected data, or “event streaming”, is increasingly important.
After showing that data value can be quantified, the second half of this talk will explain the five steps to creating a business case.
Most businesses focus on:
-Making more money or conferring competitive advantage to make more money
-Increasing efficiency to save money and/or
-Mitigating risk to the business to protect money
We’ll walk through examples of real business cases, discuss how business cases have evolved over the years and show the power of a sound business case. If you’re interested in big money and big business, as well as big data, this talk is for you.
Driver Location Intelligence at Scale using Apache Spark, Delta Lake, and ML... (Databricks)
TomTom's mission is to create a world free of congestion and a better driving experience. To do that, we need to understand driving behaviour from end users while also optimizing the operational costs of our services. However, due to the large scale of our probe data from vehicles, providing insights and performing advanced analytics can be quite challenging.
During this discussion I will showcase two use cases where Databricks, Delta Lake and MLflow have enabled us to accelerate innovation. The first one is the IQ Maps use case. IQ Maps is a system designed specifically for in-dash systems, taking the same up-to-date user experience you expect from navigation apps and bringing it to reliable, in-car navigation. IQ Maps learns the driver's driving patterns and updates the map regions that are most relevant to the user, using Wi-Fi or 4G. However, optimizing data network consumption, which can have a high cost, while keeping the best driving experience by having the map updated, requires complex simulations using millions of location traces from vehicles. Apache Spark has been our key instrument to find the best balance in this trade-off. The second use case is Destination Prediction. For many years, we have offered a personalized feature on our navigation products that predicts the driver's next destination with high accuracy. Nonetheless, with the exponential increase and availability of data, and access to more sophisticated Machine Learning models, we have revisited this feature to take it to the next level. Both use cases take advantage of the latest frameworks and tools available on Databricks. With MLflow and Delta we have been able to find the best models that predict the destination for each individual driver, and to track each of the KPIs.
Real-time Analytics with Upsert Using Apache Kafka and Apache Pinot | Yupeng ... (HostedbyConfluent)
Apache Kafka is used as the primary message bus for propagating events and logs across Uber. In particular, it pairs with Apache Pinot, a real-time distributed OLAP datastore, to deliver insights seconds after the messages are produced to Kafka.
One challenge we faced was updating existing data in Pinot with the changelog in Kafka while delivering an accurate view in the real-time analytical results. For example, the financial dashboard can report gross bookings with the corrected ride fares, and restaurant owners can analyze UberEats orders with their latest delivery status.
Implementing upserts in an immutable real-time OLAP store like Pinot is nontrivial. We need to make architectural changes in how data is distributed via Kafka amongst the server nodes, how it's indexed and queried in a distributed fashion. In this talk I will discuss how we leveraged Kafka's partition-by-key feature to this end and how we added this ability in Pinot without any performance degradation.
View this talk here: https://www.confluent.io/online-talks/connecting-apache-kafka-to-cash-lyndon-hedderly
Real-time data has value. But how do you quantify that value in order to create a business case for becoming data, or event driven? This talk explores why valuing Kafka is important - but covers some of the problems in quantifying the value of a data infrastructure platform.
Despite the challenges, we will explore some examples of where we have attributed a quantified monetary amount to Kafka across specific business use cases, within Retail, Banking and Automotive.
Whether organizations are using data to create new business products and services, improving user experiences, increasing productivity, or managing risk, we’ll see that fast and interconnected data, or ‘event streaming’ is increasingly important. We will conclude with the five steps to creating a business case around Kafka use cases.
How to evolve your analytics stack with your business using Snowplow (Giuseppe Gaviani)
Presented at Snowplow London Meetup, 8 February 2017
Christophe Bogaert, Data Scientist at Snowplow, talked about how businesses are constantly evolving, why that means their analytics stack needs to evolve with it and how Snowplow supports that evolution. With Snowplow, you can flexibly define the events and entities to represent your business. Finally, he talked about event data modeling and how to handle the evolution of your data pipeline.
Digital Transformation Mindset - More Than Just Technology (confluent)
Many enterprises faced with siloed, batch-oriented, legacy systems struggle to compete in this new digital-first world. Adhering to an ‘if it’s not broken, don’t fix it’ mentality leaves the door wide open for digital-native challengers to grow and succeed. To stay competitive, your organization must respond in real time to every customer experience transaction, sale, and market movement. But how do you get there? First, you must change your mindset.
As streaming platforms become central to data strategies, companies both small and large are re-thinking their enterprise architecture with real-time context at the forefront. Monoliths are evolving into microservices. Datacenters are moving to the cloud. What was once a ‘batch’ mindset is quickly being replaced with stream processing as the demands of the business impose real-time requirements on technology leaders.
Join Argyle, in partnership with Confluent, in our 2018 CIO Virtual Event: The Digital Transformation Mindset – More Than Just Technology. During the webinar we’ll learn how leading companies across industries rely on a streaming platform to make event-driven architectures central to:
• How data strategies and IT initiatives are improving the digital customer experiences
• How executives are reducing risk with real time monitoring and anomaly detection
• Increasing operational agility with microservices and IoT architectures within organizations
At Demandbase, our Product Team has been hard at work innovating and building exciting new technology to support your ABM strategy and programs. We’re eager to share our progress and roadmap with you, including the latest and greatest features as well as what’s to come.
This talk covers the reasons behind the new stream processing model, how it compares to the old batch model, their pros and cons, and a list of existing technologies implementing stream processing with their most prominent characteristics. It includes one use case of data streaming that is not possible with batches: displaying, in (near) real time, all trains in Switzerland and their position on a map, beginning with an overview of all the requirements and the design. Finally, using an OpenData endpoint and the Hazelcast platform, it shows a working demo implementation.
Why use big data tools to do web analytics? And how to do it using Snowplow a... (yalisassoon)
There are a number of mature web analytics products that have been on the market for ~20 years. Big data tools have only really taken off in the last 5 years. So why use big data tools to mine web analytics data?
In this presentation, I explore the limitations of traditional approaches to web analytics, and explain how big data tools can be used to address those limitations and drive more value from the underlying data. I explain how a combination of Snowplow and Qubole can be used to do this in practice
Event Broker (Kafka) in a Modern Data Architecture (Guido Schmutz)
Today's modern data architectures and their implementations contain an Event Broker. What are the benefits of placing an Event Broker in a Modern Data (Analytics) Architecture? What exactly is an Event Broker and what capabilities should it provide? Why is Apache Kafka the most popular realisation of an Event Broker?
These and many other questions will be answered in this session. The talk will start with a vendor-neutral definition of the capabilities of an Event Broker.
Then the session will highlight the different architecture styles which can be supported using an Event Broker (Kafka), such as Streaming Data Integration, Stream Analytics and Decoupled Event-Driven Applications, and how these can be combined into a unified architecture, making the Event Broker the central nervous system of an enterprise architecture. We will end with an overview of the Kafka ecosystem and a placement of the various components onto the Modern Data (Analytics) Architecture.
Customer Event Hub – a modern Customer 360° view with DataStax Enterprise (DSE) (Guido Schmutz)
Today, companies use various channels to communicate with their customers. As a consequence, a lot of data is created, increasingly also outside the traditional IT infrastructure of an enterprise. This data often does not have a common format and is continuously created with ever-increasing volume. With the Internet of Things (IoT) and its sensors, the volume as well as the velocity of data gets even more extreme.
To achieve a complete and consistent view of a customer, all this customer-related information has to be included in a 360-degree view in a real-time or near-real-time fashion. The Customer Hub thereby becomes the Customer Event Hub: it constantly shows the current view of a customer across all interaction channels and gives an enterprise the basis for a substantial and effective customer relationship.
In this presentation the value of such a platform is shown and how it can be implemented using DataStax Enterprise as the backend.
Processing Real-Time Data at Scale: A streaming platform as a central nervous... (confluent)
(Marcus Urbatschek, Confluent)
Presentation during Confluent’s streaming event in Munich. This three-day hands-on course focused on how to build, manage, and monitor clusters using industry best-practices developed by the world’s foremost Apache Kafka™ experts. The sessions focused on how Kafka and the Confluent Platform work, how their main subsystems interact, and how to set up, manage, monitor, and tune your cluster.
Serving the Real-Time Data Needs of an Airport with Kafka Streams and KSQL (confluent)
(Sönke Liebau, OpenCore GmbH & Co. KG) Kafka Summit SF 2018
Airports are complex networks consisting of an immense number of systems that are necessary to keep the daily stream of passengers in constant motion. Connecting these systems in order to make the big picture transparent to the people running the show, authorities and last but not least the passengers is no simple endeavor.
In this talk I will describe a fictional airport and its effort to restructure its IT infrastructure around Kafka Streams to serve the real-time data needs of a busy airport. I will start by giving a brief overview of Kafka Streams, KSQL and the opportunities they offer for real-time stream processing. Following that, we will explore the target architecture, which relies heavily on materialized views to serve up-to-date data, while also persisting to a traditional data lake for larger analytics workflows. Additionally, we will take a look at the generic data transformation framework that was created to minimize the integration effort of the data-receiving systems. To illustrate these ideas I will describe some examples of possible integrations: joining flight data with radar and weather data to predict arrival time at the gate down to the second, constantly updated processing data from the luggage conveyor belts, results from prediction models for passenger flow, and many more.
How to Build Fast Data Applications: Evaluating the Top Contenders (VoltDB)
Massive amounts of data generated from mobile devices, M2M communications, sensors and other IoT devices is redefining the world. What kind of applications will you build to take advantage of this data and provide value to your customers? What technologies are out there to help you? This deck will illustrate the difference between fast OLAP, stream-processing, and OLTP database solutions. You will also learn the importance of state, real-time analytics and real-time decisions when building applications on streaming data, and how streaming applications deliver more value when built on a super-fast in-memory, SQL database. To view the webinar in its entirety, click here: http://learn.voltdb.com/WRFastDataAppsTopContenders.html
CCT is a web app developed to give the project manager an overview of the transfers made by his team. It was developed entirely in Python, HTML and CSS; I also used Flask to connect to the server.
CCT allows the user to add new transfers, show charts of cost types and values, produce a PDF document for download, automatically calculate the sum of the costs made, and look for new users on GitHub to cover missing skills.
Data Reply sneak peek: real-time decision engines (confluent)
Events happen constantly in every business: a purchase in an online shop, a credit limit is hit, a mobile internet plan is exhausted, users interact with a website. Events rule the business world. So why would you react to them hours or days later? Real-Time Decision Engines enable a variety of use cases, driving new products, improving user experience, and reducing costs and risks by reacting instantly to business events.
From personalized, instantaneous marketing campaigns to reacting to user interactions, real time is the key to opening up a world of use cases that batch and scheduled processing cannot efficiently satisfy. In this talk, we are going to show some example use cases that Data Reply developed for some of its customers and how Real-Time Decision Engines had an impact on their businesses.
Deep.bi - Real-time, Deep Data Analytics Platform For Ecommerce (Deep.BI)
Deep.bi helps ecommerce teams improve their performance by providing current and detailed insights.
It brings operational excellence and performance for:
- Category Managers / Merchandisers
- Marketers
- Customer service
- UX / Design Team
- Tech / IT
- Executives / Managers
Big data is an opportunity for communications service providers (CSPs) to create the intelligence for operating their infrastructures more efficiently, to analyze the success of their services, and to create a better personal experience for their customers.
CSP top executives, network and IT managers, and marketing are eager to exploit the large amount of information available to make better business decisions. They expect their Chief Technical Officer to provide end-to-end analytics solutions based on the data available in their IT and network infrastructure.
This presentation analyzes the complete value chain that can transform CSPs’ data to knowledge. It covers the sources of information, the data collection tools, the analytic platforms providing quick data access, and finally the business intelligence use cases with the presentation and visualization of the results and predictions.
Data-based business models: How to turn your data into a goldmine? (diconium)
- Data Product Maturity
- Exemplary benchmarks?
- Digital Commerce Maturity Level
- The real nugget, the data platform!
- Which industry is next?
- Make or buy?
[Notes] Customer 360 Analytics with LEO CDP (Trieu Nguyen)
Part 1: Why should every business deploy a CDP?
1. Big data is the reality of business today
2. What technologies are available to manage customer data?
3. The rise of first-party data and new technologies for Digital Marketing
4. How to apply USPA mindset to build your CDP for data-driven business
Part 2: How to use LEO CDP for your business
1. Core functions of LEO CDP for marketers and IT managers
2. Data Unification for Customer 360 Analytics
3. Data Segmentation
4. Customer Personalization
5. Customer Data Activation
Part 3: Case study in O2O Retail and Ecommerce
1. How to build customer journey map for ecommerce and retail
2. How to do customer analytics to find ideal customer profiles
The ideal customer profile in a B2B context
The ideal customer profile in a B2C context
3. Manage product catalog for customer personalization
4. Monitoring Data of Customer Experience (CX Analytics)
CX Data Flow
CX Rating plugin is embedded in the website, to collect feedback data
An overview of CX Report
A CX Report in a customer profile
5. Monitoring data with real-time event tracking reports
Event Data Flow
Summary Event Data Report
Event Data Report in a Customer Profile
Part 4: How to setup an instance of LEO CDP for free
1. Technical architecture
2. Server infrastructure
3. Setup middlewares: Nginx, ArangoDB, Redis, Java and Python
Network requirements
Software requirements for new server
ArangoDB
Nginx Proxy
SSL for Nginx Server
Java 8 JVM
Redis
Install Notes for Linux Server
Clone binary code for new server
Set DNS hosts for LEO CDP workers
4. Setup data for testing and system verification
Part 5: Summary all key ideas
Driving Better Products with Customer Intelligence (Cloudera, Inc.)
In today’s fast-moving world, the ability to capture and process massive amounts of data and extract valuable insights is key to gaining a competitive advantage. For RingCentral, a leader in Unified Communications, this is very true, since they work with over 350,000 organizations worldwide. At such scale, it can be difficult to address quality issues when they appear while supporting additional calls.
Big Data Paris - A Modern Enterprise Architecture (MongoDB)
Since the 1980s, the volume of data produced, and the risk attached to that data, have literally exploded. 90% of the data in existence today was created in the last 2 years, and 80% of it is unstructured. With more users and the need for constant availability, the risks are much higher.
Which database characteristics should a decision-maker take into account when deploying innovative applications?
This was developed to provide real-time analytics from customers' online activity data collected in e-commerce and finance.
It delivers very adaptive use scenarios to marketing managers and campaign planners, with clear and useful customer insights through basic and advanced analysis.
Trivadis TechEvent 2016 Customer Event Hub - the modern Customer 360° view by... (Trivadis)
Today, companies communicate with their customers over a wide variety of channels. This creates a lot of data in different systems, increasingly also outside the company. This data often has no uniform format and is generated continuously, in ever-growing volumes. IoT applications only make this more extreme. To have a complete and consistent view of the customer, all of this customer-related information must be brought together into a 360-degree view, and as close to real time as possible. The Customer Hub thus becomes the Customer Event Hub.
AWS Webcast - Sales Productivity Solutions with MicroStrategy and Redshift (Amazon Web Services)
Sales Force Automation (SFA) and Customer Relationship Management (CRM) tools, such as Salesforce.com and Microsoft Dynamics CRM, are ubiquitous tools that provide all of the transactional capabilities required to manage a company's sales pipeline. SFA and CRM data alone, however, is limited, so combining it with information from other sources enables you to create unique and powerful insights. When combined with product and financial data, for example, you gain visibility into relationships between geographies, sales reps, product performance, and revenue to ultimately optimize profits. Layer on advanced analytics to make predictions about future product sales based on seasonality and other market conditions. To unleash the full power of the CRM and dramatically increase operational performance and top-line revenue, companies are leveraging advanced analytics and data visualization to deliver new insights to the entire sales organization. Moreover, delivering these sales enablement productivity solutions on mobile devices ensures strong adoption across every sales team. Join us in this webinar to learn how to use MicroStrategy together with Amazon Redshift to build mobile sales productivity solutions for your business.
Big Data Solutions on Cloud – The Way Forward (Kiththi Perera, SLT)
ITU-TRCSL Symposium on Cloud Computing 2015 Colombo
Session 04: Big Data Strategy in the Cloud and Applications
Speaker's PPT by K. A. Kiththi Perera, Chief Enterprise and Wholesale Officer, Sri Lanka Telecom
If your business is heavily dependent on the Internet, you may be facing an unprecedented level of network traffic analytics data. How to make the most of that data is the challenge. This presentation from Kentik VP Product and former EMA analyst Jim Frey explores the evolving need, the architecture and key use cases for BGP and NetFlow analysis based on scale-out cloud computing and Big Data technologies.
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23... (John Andrews)
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Levelwise PageRank with Loop-Based Dead End Handling Strategy: SHORT REPORT ... (Subhajit Sahu)
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation of ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It does, however, come with the precondition that the input graph contains no dead ends. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. The slowdown on the GPU is likely caused by a large submission of small workloads, and is expected to be a non-issue when the computation is performed on massive graphs.
Opendatabay - Open Data Marketplace (Opendatabay)
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
The first ever open hub for data enthusiasts to collaborate and innovate: a platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits, Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay: the marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
4. Data strategy
Building a data foundation to generate value with data for:
1. Subscriptions marketing: increase conversion & personal offers
2. Digital advertising: personalized ads, better results
3. Newsroom: uplift of page views, unique visitors & time on site
4. Compliance & security: respectful & risk-reducing
5. 360° profile and data lake in the centre of the organization
The departments around it and what they get from it:
- Subscription marketing: increase automated conversion on the paywall and newspaper.nl
- Digital advertising: automated CTR optimization, audiences
- Newsroom: visitor behavior for news innovations
- B2C marketing: data foundation for Consumer Intelligence
- B2B sales: data foundation
6. Optimize media channels, creation and platform.
- Channels: where can we find our consumers, and what is the best way to convince them?
- Creation: how do we best appeal to you as a consumer, and in what format?
- Platform: what phase is the consumer in, and do we convert them to a sale or to more engagement?
Data: online subscriptions marketing
8. KONING VOETBAL (King Football) persona
Specific interest in football, but also reads other sports. Regularly checks the football centre for results.
Characteristics per user:
- visits per month: 5.8
- pages per visit: 6.5
- article pages: 79
- watches videos: 19%
- cross-domain (national/regional): 15% / 20%
- logged in: 4.9%
9. Reduce the distance to Google and Facebook
• Strong brands (Volkskrant, Parool, Trouw, AD, tweakers, Qmusic etc.)
• Link with demographic characteristics through CRM data
• Create audiences based on behaviour
• Demand from larger advertisers is growing to be less dependent on Google or Facebook while maintaining results
→ Closing the gap step by step by building data in 2 zones:
1. Demographic and behavioral data
2. Intent data
Improve service for advertisers and close the gap with Google and Facebook
10. ● How successful is my story?
● Via which channels do I need to publish?
● Can I improve the header?
● Should we create a follow-up?
11. Data lake architecture
- RAW layer (S3). Data catalog raw: source, owner, location, frequency, description, consent, delta/full. Batch / micro-batch ingestion via Airflow with monitoring and alerting; access by owner and data processor only.
- Clean layer (S3). Data catalog clean: consent, PI data (hashed), frequency, lookup tables, field descriptions. Populated by Airflow transformations, again with monitoring and alerting.
- Master (data marts). Built by Airflow / data transformation jobs; role-based access with PI data hashed.
- Consumers of the data (CI/BI/CX/IT/DCC): Dataiku, Databricks, Redshift (Spectrum), Athena, Looker / Clicksense.
- Cross-cutting: logging of user trails; monitoring of performance and costs.
A sketch of a monitored ingestion job follows below.
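A minimal sketch of what one of the monitored RAW-layer ingestion jobs above could look like, assuming Airflow 2.x. The DAG id, S3 path and alerting callback are placeholders, not the team's actual configuration.

```python
# Hypothetical ingestion DAG: names, paths and the alerting hook are assumptions.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def notify_failure(context):
    # Monitoring/alerting hook; in practice this could post to Slack,
    # PagerDuty or CloudWatch. Here we only log.
    print(f"Ingestion task failed: {context['task_instance'].task_id}")


def ingest_source_to_raw(**context):
    # Copy one source extract into the RAW S3 zone, partitioned by date.
    # A real implementation would use boto3 or an Airflow S3 hook.
    ds = context["ds"]
    print(f"Copying source extract to s3://datalake-raw/clicks/date={ds}/")


with DAG(
    dag_id="ingest_clicks_to_raw",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@hourly",  # micro-batch ingestion
    default_args={
        "retries": 2,
        "retry_delay": timedelta(minutes=5),
        "on_failure_callback": notify_failure,  # monitoring/alerting
    },
    catchup=False,
) as dag:
    ingest = PythonOperator(
        task_id="ingest_clicks",
        python_callable=ingest_source_to_raw,
    )
```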
14. 1 - User-content interactions for analytics and data science
Problems with known analytics partners:
- Throttling/sampling
- Non-realtime (event level)
- 3rd-party tracking
- Non-transparent
- Privacy control
- Vendor lock-in
15. 1 - User-content interactions for analytics and data science
Open source tracking SDKs: Android, Go, .NET, iOS, Java, JavaScript, NodeJS, Python, Scala, [many more]
- Infrastructure as a Service on AWS
- Open source
- Flexible/configurable
- Realtime
- 1st party
A minimal tracking sketch follows below.
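This SDK list matches Snowplow's open-source trackers (the deck credits Snowplow with the collection heavy lifting on slide 22). Below is a minimal sketch using the Snowplow Python tracker; the collector host, app_id and event fields are placeholder assumptions, and the exact API differs between tracker versions.

```python
# First-party tracking sketch with the Snowplow Python tracker
# (pip install snowplow-tracker). Collector host and app_id are
# placeholders; verify the API against your tracker version.
from snowplow_tracker import Emitter, Tracker

# Events go to your own collector: first-party, unsampled, realtime.
emitter = Emitter("collector.example-newsbrand.com")
tracker = Tracker(emitter, namespace="web", app_id="news-site")

# A page view plus a structured user-content interaction.
tracker.track_page_view("https://www.example-newsbrand.com/article/123")
tracker.track_struct_event(
    category="article",
    action="scroll-depth",
    label="article-123",
    value=75,  # e.g. percentage of the article scrolled
)
```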
20. 3 - Data quality
Challenges:
- Variety of brands
- Variety of platforms
- Variety of development teams
Solutions:
- Enforcing schema verification => corrupt events topic (see the sketch after this list)
- Tag manager templating
- Monitoring of tags and anomalies
- Automated quality assurance for new releases
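A minimal sketch of the corrupt-events-topic pattern named above: validate each incoming event against a JSON Schema and divert failures to a side topic instead of silently dropping them. The topic names, schema and broker address are illustrative assumptions, using the jsonschema and confluent-kafka libraries.

```python
# Route events that fail schema verification to a "corrupt-events" topic.
import json

from confluent_kafka import Consumer, Producer
from jsonschema import ValidationError, validate

# Illustrative schema; a real deployment would load it from a registry.
PAGEVIEW_SCHEMA = {
    "type": "object",
    "required": ["event_id", "brand", "url", "timestamp"],
    "properties": {
        "event_id": {"type": "string"},
        "brand": {"type": "string"},
        "url": {"type": "string"},
        "timestamp": {"type": "integer"},
    },
}

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "schema-verifier",
    "auto.offset.reset": "earliest",
})
producer = Producer({"bootstrap.servers": "localhost:9092"})
consumer.subscribe(["raw-events"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    try:
        event = json.loads(msg.value())
        validate(instance=event, schema=PAGEVIEW_SCHEMA)
        producer.produce("clean-events", msg.value())
    except (json.JSONDecodeError, ValidationError):
        # Corrupt events are kept for debugging, not dropped.
        producer.produce("corrupt-events", msg.value())
    producer.poll(0)  # serve delivery callbacks
```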
21. 4 - High event volumes
Clicks, pageviews and player heartbeats: 5 billion events per month.
Processing:
- Transform
- Filter
- Parse
- Window
- Aggregate
Integrate with:
- Business Intelligence tools
- Data Science tools
A PySpark sketch of this pipeline follows below.
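A minimal PySpark sketch of the transform/filter/parse/window/aggregate steps listed above; the bucket paths, column names and the 10-minute window are assumptions, not the actual pipeline.

```python
# Batch aggregation of pageviews per brand in 10-minute windows.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pageview-aggregation").getOrCreate()

events = (
    spark.read.json("s3://datalake-clean/pageviews/")        # parse
    .filter(F.col("event_type") == "pageview")               # filter
    # transform: epoch seconds to a proper timestamp column
    .withColumn("ts", F.col("timestamp").cast("timestamp"))
)

agg = (
    events
    .groupBy(F.window("ts", "10 minutes"), "brand")          # window
    .agg(F.count("*").alias("pageviews"))                    # aggregate
)

# Write aggregates where BI and data science tooling can pick them up.
agg.write.mode("overwrite").parquet("s3://datalake-master/pageviews_10min/")
```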
22. 4 - High event volumes
Solutions:
- Snowplow => heavy lifting of collecting
- Start/terminate EMR clusters on AWS from Airflow when needed (sketched below)
- Spark for cleaning and aggregating
- Mirror S3 (partially) to Redshift for fast querying and BI tooling
[Chart residue: content categories such as regional news (regionieuws), construction & real estate (bouw & vastgoed) and football (voetbal)]
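A sketch of the start/terminate pattern above, assuming Airflow 2.x with the Amazon provider installed. The cluster override and Spark step are illustrative only; a real job_flow_overrides needs instance, role and release settings.

```python
# Spin up an EMR cluster, run one Spark step, then always terminate.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.emr import (
    EmrAddStepsOperator,
    EmrCreateJobFlowOperator,
    EmrTerminateJobFlowOperator,
)
from airflow.providers.amazon.aws.sensors.emr import EmrStepSensor

SPARK_STEP = [{
    "Name": "clean_and_aggregate",
    "ActionOnFailure": "TERMINATE_CLUSTER",
    "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": ["spark-submit", "s3://jobs/clean_and_aggregate.py"],
    },
}]

with DAG(
    dag_id="emr_clean_aggregate",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    create = EmrCreateJobFlowOperator(
        task_id="create_cluster",
        # Incomplete on purpose: instances, roles and release omitted.
        job_flow_overrides={"Name": "nightly-aggregation"},
    )
    step = EmrAddStepsOperator(
        task_id="add_spark_step",
        job_flow_id=create.output,
        steps=SPARK_STEP,
    )
    wait = EmrStepSensor(
        task_id="wait_for_step",
        job_flow_id=create.output,
        step_id="{{ task_instance.xcom_pull(task_ids='add_spark_step')[0] }}",
    )
    terminate = EmrTerminateJobFlowOperator(
        task_id="terminate_cluster",
        job_flow_id=create.output,
        trigger_rule="all_done",  # terminate even if the step fails
    )
    create >> step >> wait >> terminate
```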
27. 5 - Realtime scalability
- Night/day pattern: almost no night-time traffic
- Breaking news / developing stories: double to quadruple daily volume
- Push notifications: peaks up to 16K events per second
How to aggregate?
28. 5 - Realtime scalability
Challenges:
- Fluctuating traffic
- Stateful streaming
Considerations:
- Latency: how fast is fast enough?
- Spark Streaming is still mini-batch
Solutions:
- Dockerize applications
- Orchestrate with Kubernetes
- Container I/O to Kafka
- Redis
- ElasticSearch
- Flink
A sketch of a Kafka-to-Redis aggregator in this style follows below.
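One way to read this solution list: keep each container stateless by pushing the aggregation state into Redis, so Kubernetes can add or remove replicas as traffic fluctuates while Kafka's consumer group rebalances partitions across them. A minimal sketch, with topic, key and host names assumed:

```python
# Stateless aggregator container: Kafka in, counters in Redis.
import json

import redis
from confluent_kafka import Consumer

r = redis.Redis(host="redis", port=6379)
consumer = Consumer({
    "bootstrap.servers": "kafka:9092",
    "group.id": "pageview-aggregator",  # partitions rebalance as pods scale
    "auto.offset.reset": "latest",
})
consumer.subscribe(["clean-events"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    event = json.loads(msg.value())
    # State lives in Redis, not in the container, so any replica can
    # take over any partition after a rebalance.
    key = f"pageviews:{event['brand']}:{event['url']}"
    r.hincrby(key, "count", 1)
    r.expire(key, 3600)  # keep only the last hour of realtime counts
```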