Streaming in the Extreme
Jim Scott, Director, Enterprise Strategy & Architecture, MapR
Have you ever heard of Kafka? Are you ready to start streaming all of the events in your business? What happens to your streaming solution when you outgrow your single data center? What happens when you are at a company that is already running multiple data centers and you need to implement streaming across them? And what about when you need to scale to a trillion events per day? I will discuss technologies like Kafka that can be used to accomplish real-time, lossless messaging that works in both single and multiple globally dispersed data centers. I will also describe how to handle the data coming in through these streams in both batch processes as well as real-time processes.
Video Presentation:
https://youtu.be/Y0vxLgB1u9o
We describe an application of CEP using a microservice-based streaming architecture. We use the Drools business rule engine to apply rules in real time to an event stream of IoT traffic sensor data.
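Drools itself is a Java rule engine, so as a language-neutral sketch of the rule-on-stream pattern described above (the sensor fields, thresholds, and rule names below are invented for illustration), the core idea can be shown in a few lines of Python:

```python
# Minimal complex-event-processing sketch: apply declarative rules to a
# stream of IoT traffic-sensor events. Fields and thresholds are invented.

def over_speed(event):
    return event["avg_speed_kmh"] > 120

def congestion(event):
    return event["vehicle_count"] > 50 and event["avg_speed_kmh"] < 20

RULES = [
    ("OverSpeedAlert", over_speed),
    ("CongestionAlert", congestion),
]

def process(stream):
    """Yield (rule_name, event) pairs for every rule an event triggers."""
    for event in stream:
        for name, condition in RULES:
            if condition(event):
                yield name, event

events = [
    {"sensor": "s1", "avg_speed_kmh": 130, "vehicle_count": 10},
    {"sensor": "s2", "avg_speed_kmh": 15, "vehicle_count": 80},
]
alerts = list(process(events))
```

In a real Drools deployment the conditions live in DRL files and are evaluated by the engine's working memory rather than a Python loop, but the shape — declarative conditions applied to each event as it arrives — is the same.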
Learn what technologies enable a new, modern stream-based architecture to connect everything within application modules or across data centers and public clouds. Combine Kafka-style streaming and stream-processing frameworks like Spark and Flink with microservices, and completely rethink your big data architecture away from state and into data flows.
Slides from Strata+Hadoop Singapore 2016 presenting how Deep Learning can be scaled both vertically and horizontally, when to use CPUs and when to use GPUs.
How Spark is Enabling the New Wave of Converged Cloud Applications - MapR Technologies
Apache Spark has become the de-facto compute engine of choice for data engineers, developers, and data scientists because of its ability to run multiple analytic workloads with a single, general-purpose compute engine.
But is Spark alone sufficient for developing cloud-based big data applications? What are the other required components for supporting big data cloud processing? How can you accelerate the development of applications which extend across Spark and other frameworks such as Kafka, Hadoop, NoSQL databases, and more?
Open Source Innovations in the MapR Ecosystem Pack 2.0 - MapR Technologies
Over the summer, we introduced the MapR Ecosystem Pack (MEP) which is a natural evolution of our existing software update program that decouples open source ecosystem updates from core platform updates. MEP gives our customers quick access to the latest open source innovations while also ensuring cross-project compatibility in any given MEP version.
Big data real-time architectures -
How to do big data processing in real time?
What architectures are out there to support this paradigm?
Which one should we choose?
What advantages and pitfalls do they contain?
SIEM Modernization: Build a Situationally Aware Organization with Apache Kafka® - Confluent
Watch this talk here: https://www.confluent.io/online-talks/siem-modernization-build-a-situationally-aware-organization-with-apache-kafka
Of all security breaches, 85% are conducted with compromised credentials, often at the administration level or higher. Many IT groups think “security” means authentication, authorization and encryption (AAE), but these are often tick-boxes that rarely stop breaches. The internal threat surfaces of data streams or disk drives in a RAID set in a data centre are not the threat surface of interest.
Cyber or Threat organizations must conduct internal investigations of IT, subcontractors and supply chains without implicating the innocent. Therefore, they are organizationally air-gapped from IT. Some surveys indicate up to 10% of IT is under investigation at any given time.
Deploying a signal processing platform, such as Confluent Platform, allows organizations to evaluate data as soon as it becomes available enabling them to assess and mitigate risk before it arises. In Cyber or Threat Intelligence, events can be considered signals, and when analysts are hunting for threat actors, these don't appear as a single needle in a haystack, but as a series of needles. In this paradigm, streams of signals aggregate into signatures. This session shows how various sub-systems in Apache Kafka can be used to aggregate, integrate and attribute these signals into signatures of interest.
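The needles-not-a-needle idea can be sketched outside Kafka: group low-level signals by actor and flag a signature only when several distinct signal types line up. (The signal names and the threshold of three distinct types below are invented for illustration; a real pipeline would do this with Kafka Streams aggregations over keyed topics.)

```python
from collections import defaultdict

# Aggregate per-actor signals into "signatures of interest": an actor is
# flagged only when it produces several *distinct* signal types, mirroring
# the series-of-needles idea. Signal names and the threshold are invented.

def signatures(signals, min_distinct=3):
    by_actor = defaultdict(set)
    for actor, signal_type in signals:
        by_actor[actor].add(signal_type)
    return {a: sorted(t) for a, t in by_actor.items() if len(t) >= min_distinct}

signals = [
    ("alice", "failed_login"),
    ("alice", "privilege_escalation"),
    ("alice", "bulk_download"),
    ("bob", "failed_login"),
    ("bob", "failed_login"),   # repeats of one signal type are not a signature
]
flagged = signatures(signals)
```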
In this talk you will learn:
-The current threat landscape
-The difference between Security and Threat Intelligence
-The value of Confluent Platform as an ideal complement to hardware endpoint detection systems and batch-based SIEM warehouses
At BAADER, we design and engineer innovative and holistic solutions that ensure intelligent, safe, efficient and sustainable food processing in all phases, from the handling of live and raw protein materials to the finished food products. As a key player in the food value chain, we aim to take further significant steps toward greater efficiency, traceability, transparency, profitability, and sustainability through new digital solutions.
During our digital transformation we are working on two ends: on the one hand there are many brownfield factories unprepared for the digital journey, and on the other hand we have powerful greenfield technologies like Apache Kafka. Now we have to bring two mindsets together – robust food processing machinery and highly scalable software technologies. In this talk, we will present how we successfully started to ingest various kinds of IoT data into our Kafka cluster, spotlighted from both ends.
"Introduction to Kx Technology", James Corcoran, Head of Engineering EMEA at First Derivatives - Dataconomy Media
About the Author:
James is Senior Vice President, Fast Data Solutions at Kx, where he has worked as a developer since 2009. In his career to date, he has worked in the algorithmic trading space at many of the world’s top financial institutions using Kx - a low latency technology for analysing time series data. He is a certified Professional Risk Manager and holds a master's in Quantitative Finance from University College Dublin. In recent years he has built systems for clients ranging from start-ups to blue chip companies in data intensive industries such as pharma, utilities and telco.
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production - Codemotion
What’s important about a technology is what you can use it to do. I’ve looked at what a number of groups are doing with Apache Hadoop and NoSQL in production, and I will relay what worked well for them and what did not. Drawing from real world use cases, I show how people who understand these new approaches can employ them well in conjunction with traditional approaches and existing applications. Threat detection, data warehouse optimization, marketing efficiency, and biometric databases are some of the examples covered in this presentation.
Processing Real-Time Data at Scale: A streaming platform as a central nervous... - Confluent
(Marcus Urbatschek, Confluent)
Presentation during Confluent’s streaming event in Munich. This three-day hands-on course focused on how to build, manage, and monitor clusters using industry best-practices developed by the world’s foremost Apache Kafka™ experts. The sessions focused on how Kafka and the Confluent Platform work, how their main subsystems interact, and how to set up, manage, monitor, and tune your cluster.
Enterprises are increasingly demanding realtime analytics and insights to power use cases like personalization, monitoring and marketing. We will present Pulsar, a realtime streaming system used at eBay which can scale to millions of events per second with high availability and SQL-like language support, enabling realtime data enrichment, filtering and multi-dimensional metrics aggregation.
We will discuss how Pulsar integrates with a number of open source Apache technologies like Kafka, Hadoop and Kylin (Apache incubator) to achieve high scalability, availability and flexibility. We use Kafka to replay unprocessed events to avoid data loss and to stream realtime events into Hadoop, enabling reconciliation of data between realtime and batch. We use Kylin to provide multi-dimensional OLAP capabilities.
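The replay-to-avoid-loss idea rests on Kafka's model of an append-only log plus consumer offsets: a restart resumes from the last committed offset, so anything processed but not yet committed is reprocessed rather than dropped. A plain-Python stand-in for that semantics (this is not the Kafka API, just a toy log and offset store):

```python
# Toy append-only log with consumer offsets: on restart, consumption resumes
# from the last *committed* offset, so events consumed but never committed
# are replayed rather than lost. Plain-Python stand-in for Kafka semantics.

class Log:
    def __init__(self):
        self.entries = []

    def append(self, event):
        self.entries.append(event)

    def read_from(self, offset):
        return self.entries[offset:]

class Consumer:
    def __init__(self, log):
        self.log = log
        self.committed = 0          # durable offset, survives restarts
        self.processed = []

    def poll_and_process(self, commit=True):
        for event in self.log.read_from(self.committed):
            self.processed.append(event)
        if commit:
            self.committed = len(self.log.entries)

log = Log()
for e in ["e1", "e2", "e3"]:
    log.append(e)

crashed = Consumer(log)
crashed.poll_and_process(commit=False)   # crash before committing offsets...
restarted = Consumer(log)                # ...so a restart replays everything
restarted.poll_and_process()
```

The same replay mechanism is what lets a batch consumer re-read the full log into Hadoop for reconciliation against the realtime path.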
Building a real-time, scalable and intelligent programmatic ad buying platform - Jampp
After a brief introduction to programmatic ads and RTB, we go through the evolution of Jampp's data platform to handle the enormous amount of data we need to process.
Scylla Summit 2022: An Odyssey to ScyllaDB and Apache Kafka - ScyllaDB
Will LaForest is the Public Sector CTO for Confluent. In his current position, Will evangelizes how Apache Kafka, event-driven data-in-motion architecture, and open-source software are addressing mission challenges in government. He has spent 25 years wrangling data at massive scale. His technical career spans diverse areas from software engineering, NoSQL, data science, cloud computing, machine learning, and building statistical visualization software, but began with code slinging at DARPA as a teenager. Will holds degrees in mathematics and physics from the University of Virginia.
To watch all of the recordings hosted during Scylla Summit 2022 visit our website here: https://www.scylladb.com/summit.
Data streams take many forms and their velocity is hard to tame. They can be myriads of tiny flows that you can collect and tame with Time-series Databases; continuous massive flows that you cannot stop, to tame with Data Stream Management Systems; continuous numerous flows that can turn into a torrent, to tame with Event-based Systems; and myriads of continuous flows of any size and speed that form an immense delta, to tame with Event-Driven Architectures. Enjoy this introductory talk!
This presentation looks at how to build an architecture for big and fast data. It reviews the Kappa & Lambda architectures and looks at the role Hazelcast Jet & IMDG can play in the Kappa architecture. It then proposes an evolution of the Kappa architecture to provide a transactional big data system.
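The heart of the Kappa architecture mentioned above is that the immutable event log is the system of record, so any view can be rebuilt, even with new logic, simply by replaying the log from the beginning. A minimal sketch (the event types and fold functions are invented for illustration):

```python
# Kappa-style reprocessing: state is a fold over an immutable event log, so
# adding or changing logic means replaying the same log with a new function,
# rather than maintaining a separate batch layer as in Lambda.

events = [("deposit", 100), ("withdraw", 30), ("deposit", 50)]

def balance(log):
    """Original view: fold the log into a running balance."""
    total = 0
    for kind, amount in log:
        total += amount if kind == "deposit" else -amount
    return total

def deposit_count(log):
    """A new view added later, computed by a full replay of the same log."""
    return sum(1 for kind, _ in log if kind == "deposit")
```

In a real deployment the log would be a Kafka topic with long retention and each fold a streaming job (e.g. Hazelcast Jet); the point is that both old and new views derive from one immutable source.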
Processing 19 billion messages in real time and NOT dying in the process - Jampp
Here is an introduction to the Jampp architecture for data processing. We walk through our journey of migrating to systems that allow us to process more data in real time.
Serhii Kholodniuk: What you need to know, before migrating data platform to GCP (Google Cloud Platform) - Lviv Startup Club
AI & BigData Online Day 2022
Website: https://aiconf.com.ua
Youtube: https://www.youtube.com/startuplviv
FB: https://www.facebook.com/aiconf
Building Pinterest Real-Time Ads Platform Using Kafka Streams - Confluent
Building Pinterest Real-Time Ads Platform Using Kafka Streams (Liquan Pei + Boyang Chen, Pinterest) Kafka Summit SF 2018
In this talk, we are sharing the experience of building Pinterest’s real-time Ads Platform utilizing Kafka Streams. The real-time budgeting system is the most mission-critical component of the Ads Platform, as it controls how each ad is delivered to maximize user, advertiser and Pinterest value. The system needs to handle over 50,000 queries per second (QPS) of impressions, requires less than five seconds of end-to-end latency, and recovers within five minutes during outages. It also needs to be scalable to handle the fast growth of Pinterest’s ads business.
The real-time budgeting system is composed of a real-time stream-stream joiner, a real-time spend aggregator and a spend predictor. At Pinterest’s scale, we need to overcome quite a few challenges to make each component work. For example, the stream-stream joiner needs to maintain terabyte-scale state while supporting fast recovery, and the real-time spend aggregator needs to publish to thousands of ads servers while supporting over one million read QPS. We chose Kafka Streams as it provides millisecond-level latency guarantees, scalable event-based processing and easy-to-use APIs. In the process of building the system, we performed extensive tuning of RocksDB, the Kafka Producer and Consumer, and pushed several open source contributions to Apache Kafka. We are also working on adding a remote checkpoint for Kafka Streams state to reduce cold-start time when adding more machines to the application. We believe that our experience can be beneficial to people who want to build real-time streaming solutions at large scale and deeply understand Kafka Streams.
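A windowed stream-stream join like the one described above can be sketched without Kafka Streams: buffer events from one side keyed by the join key, and emit a pair when the matching event arrives within the window. (The ad_id key, the impression/click pairing, and the window size are invented for the sketch; in Kafka Streams the buffered state would live in RocksDB-backed state stores, which is where the terabyte-scale state comes from.)

```python
# Toy windowed stream-stream join: pair an ad impression with a click on the
# same ad_id when both arrive within `window` time units of each other.

def windowed_join(impressions, clicks, window=10):
    """impressions/clicks: lists of (timestamp, ad_id). Returns joined ids."""
    merged = sorted(
        [(t, k, "impression") for t, k in impressions]
        + [(t, k, "click") for t, k in clicks]
    )
    pending = {}                        # ad_id -> impression timestamp
    joined = []
    for t, key, side in merged:
        if side == "impression":
            pending[key] = t
        elif key in pending and t - pending[key] <= window:
            joined.append(key)          # click matched within the window
            del pending[key]
    return joined

impressions = [(0, "ad1"), (2, "ad2")]
clicks = [(5, "ad1"), (20, "ad2")]      # ad2's click misses the window
```

A production joiner additionally evicts expired entries from `pending`, handles out-of-order arrivals, and checkpoints the buffer so it survives restarts.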
Presto: Optimizing Performance of SQL-on-Anything Engine - DataWorks Summit
Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources. Proven at scale in a variety of use cases at Airbnb, Bloomberg, Comcast, Facebook, FINRA, LinkedIn, Lyft, Netflix, Twitter, and Uber, in the last few years Presto experienced an unprecedented growth in popularity in both on-premises and cloud deployments over Object Stores, HDFS, NoSQL and RDBMS data stores.
With the ever-growing list of connectors to new data sources such as Azure Blob Storage, Elasticsearch, Netflix Iceberg, Apache Kudu, and Apache Pulsar, the recently introduced Cost-Based Optimizer in Presto must account for heterogeneous inputs with differing and often incomplete data statistics. This talk explores this topic in detail and discusses the best use cases for Presto across several industries. In addition, we present recent Presto advancements such as geospatial analytics at scale, and the project roadmap going forward.
Applying Machine Learning to IoT: End to End Distributed Pipeline... - Carol McDonald
This presentation discusses the architecture of an end-to-end application that combines streaming data with machine learning to analyze and visualize, in real time, where and when Uber cars are clustered, so as to identify the most popular Uber locations.
Fast Cars, Big Data - How Streaming Can Help Formula 1 - Tugdual Grall
Modern cars produce data. Lots of data. And Formula 1 cars produce more than their share. I will present a working demonstration of how modern data streaming can be applied to the data acquisition and analysis problem posed by modern motorsports.
Instead of bringing multiple Formula 1 cars to the talk, I will show how we instrumented a high fidelity physics-based automotive simulator to produce realistic data from simulated cars running on the Spa-Francorchamps track. We move data from the cars, to the pits, to the engineers back at HQ.
The result is near real-time visualization and comparison of performance and a great exposition of how to move data using messaging systems like Kafka, and process data in real time with Apache Spark, then analyse data using SQL with Apache Drill.
Code available here: https://github.com/mapr-demos/racing-time-series
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda... - Codemotion
Telecom operators need to find operational anomalies in their networks very quickly. This need, however, is shared with many other industries as well, so there are lessons for all of us here. Spark plus a streaming architecture can solve these problems very nicely. I will present a practical architecture as well as design patterns and some detailed algorithms for detecting anomalies in event streams. These algorithms are simple but quite general and can be applied across a wide variety of situations.
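One of the simple-but-general patterns in this space is a rate-deviation detector: smooth the per-interval event count with an exponentially weighted moving average and flag intervals that stray too far from the baseline. (The smoothing factor and threshold below are invented for illustration; a real deployment, e.g. as a Spark Streaming stage, would tune both.)

```python
# EWMA-based anomaly detector for a stream of per-interval event counts:
# flag an interval whose count deviates from the smoothed baseline by more
# than `threshold` times the baseline. Parameters are illustrative only.

def detect_anomalies(counts, alpha=0.3, threshold=0.5):
    anomalies = []
    ewma = counts[0]                    # seed the baseline with the first point
    for i, c in enumerate(counts[1:], start=1):
        if abs(c - ewma) > threshold * ewma:
            anomalies.append(i)
        ewma = alpha * c + (1 - alpha) * ewma
    return anomalies

# Steady traffic around 100 with one spike and one dropout.
counts = [100, 102, 98, 300, 101, 99, 10, 100]
```

Note one known weakness visible even in this toy: anomalous points are folded into the baseline, temporarily inflating it; robust variants exclude flagged points from the update or use a median-based baseline instead.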
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl... - Mathieu Dumoulin
Examine the unique features of the MapR Converged Data Platform and how they can support production-grade enterprise machine learning. Ends with a live demo using H2O. Presented at Hadoop Summit Tokyo 2016.
How Spark is Enabling the New Wave of Converged Applications - MapR Technologies
Apache Spark has become the de-facto compute engine of choice for data engineers, developers, and data scientists because of its ability to run multiple analytic workloads with a single compute engine. Spark is speeding up data pipeline development, enabling richer predictive analytics, and bringing a new class of applications to market.
Spark and MapR Streams: A Motivating Example - Ian Downard
Businesses are discovering the untapped potential of large datasets and data streams through the use of technologies for big data processing and storage. By leveraging these assets they’re creating a new generation of applications that derive value from data they used to throw away. In this presentation Ian Downard shows how to build operational environments for these types of applications with the MapR Converged Data Platform, and he describes examples of next-generation applications that use Java APIs for MapR Streams, Apache Spark, Apache Hive, and MapR-DB. He shows how these technologies can be used to join and transform unbounded datasets to find signals and derive new data streams for a financial scenario involving real-time algorithmic trading and historical analysis using SQL. He also discusses how MapR enables you to run real-time data applications with the speed, reliability, and security you need for a production environment.
HUG Italy meet-up with Fabian Wilckens, MapR EMEA Solutions Architect, by SpagoWorld
These slides accompanied the talk "Think differently – Stream-based Microservice Architecture for Next-Generation Applications" by Fabian Wilckens (EMEA Solutions Architect, MapR Technologies Inc.) at the HUG Italy meet-up supported by Engineering Group's SpagoBI Labs, which took place in Milan, Italy on March 17th, 2016. Read more: http://bit.ly/1UydNuz
MapR 5.2: Getting More Value from the MapR Converged Community Edition, by MapR Technologies
Please join us to learn about the recent developments during the past year in the MapR Community Edition. In these slides, we will cover the following platform updates:
-Taking cluster monitoring to the next level with the Spyglass Initiative
-Real-time streaming with MapR Streams
-MapR-DB JSON document database and application development with OJAI
-Securing your data with access control expressions (ACEs)
Handling the Extremes: Scaling and Streaming in Finance, by MapR Technologies
Agility is king in the world of finance, and a message-driven architecture is a mechanism for building and managing discrete business functionality that supports that agility. To accommodate rapid innovation, data pipelines must evolve. However, implementing microservices can create management problems, such as tracking the number of instances running in an environment.
Microservices can be leveraged on a message-driven architecture, but the concept must be thoughtfully implemented to show the true value. Jim Scott outlines the core tenets of a message-driven architecture and explains its importance in real-time big data-enabled distributed systems within the realm of finance. Along the way, Jim covers financial use cases dealing with securities management and fraud—starting with ingestion of data from potentially hundreds of data sources to the required fan-out of that data without sacrificing performance—and discusses the pros and cons around operational capabilities and using the same data pipeline to support development and quality assurance practices.
Presented at Strata+Hadoop World NY 2016 by:
Jim Scott
MapR Technologies, Inc.
MapR 5.2: Getting More Value from the MapR Converged Data Platform, by MapR Technologies
End of maintenance for MapR 4.x is coming in January, so now is a good time to plan your upgrade. Please join us to learn about the recent developments during the past year in the MapR Platform that will make the upgrade effort this year worthwhile.
Lambda Architecture: The Best Way to Build Scalable and Reliable Applications! by Tugdual Grall
Lambda Architecture is a useful framework for thinking about the design of big data applications. The framework was initially developed at Twitter. In this presentation you will learn, through concrete examples, how to build and deploy scalable, fault-tolerant applications, with a focus on Big Data and Hadoop.
This presentation was delivered at the OOP conference, Munich, Feb 2016
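The batch/speed/serving split at the heart of the Lambda Architecture can be sketched in a few lines (a toy illustration under assumed names, not code from the talk): the batch layer periodically recomputes views from the full immutable dataset, the speed layer holds increments for events that arrived since the last batch run, and the serving layer merges the two at query time.

```python
def batch_view(events):
    """Batch layer: recompute per-user totals from the full master dataset."""
    totals = {}
    for user, amount in events:
        totals[user] = totals.get(user, 0) + amount
    return totals

def serve(batch, speed, user):
    """Serving layer: merge the precomputed batch view with the speed
    layer's increments for not-yet-batched events."""
    return batch.get(user, 0) + speed.get(user, 0)

master = [("alice", 10), ("bob", 5), ("alice", 7)]   # immutable master dataset
batch = batch_view(master)                           # recomputed periodically
speed = {"alice": 3}                                 # recent, not-yet-batched events
```

Fault tolerance comes from the master dataset being append-only: if either layer produces a bad view, it can simply be recomputed.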
MapR is an ideal scalable platform for data science and specifically for operationalizing machine learning in the enterprise. This presentation gives specific reasons why.