This document provides an overview of a presentation given by Antje Barth on container and Kubernetes technologies without limits. The presentation covered:
- The challenges of stateful applications in containerized environments and how a modern data platform can help support them across multiple data centers or locations.
- How the MapR data platform provides persistence across containers in Kubernetes through features like global namespaces, various forms of primitive persistence, scalability, and uniform access controls.
- How the MapR data fabric for Kubernetes integrates with Kubernetes APIs to provision and mount MapR volumes for containerized applications, providing persistent storage that scales with containers and is highly available.
Converged and Containerized Distributed Deep Learning With TensorFlow and Kub... Mathieu Dumoulin
Docker containers running on Kubernetes, combined with the MapR Converged Data Platform, allow any company to potentially enjoy the same sophisticated data infrastructure, enabling teams to engage in transformative machine learning and deep learning for production use at scale.
This document summarizes a talk given by Mathieu Dumoulin of MapR Technologies about architecting hybrid cloud applications using streaming messaging systems. The talk discusses using streaming architectures to connect systems in hybrid clouds, with public and private clouds connected by streaming. It also discusses using streaming for IoT and microservices and highlights Kafka and Spark Streaming/Flink as streaming technologies. Examples of log analysis architectures spanning hybrid clouds are presented.
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon... MapR Technologies
IT budgets are shrinking, and the move to next-generation technologies is upon us. The cloud is an option for nearly every company, but just because it is an option doesn’t mean it is always the right solution for every problem.
Most cloud providers would prefer that every customer be tightly coupled with their proprietary services and APIs to create lock-in with that cloud provider. The savvy customer will leverage the cloud as infrastructure and stay loosely bound to a cloud provider. This creates an opportunity for the customer to execute a multicloud strategy or even a hybrid on-premises and cloud solution.
Jim Scott explores different use cases that may be best run in the cloud versus on-premises, points out opportunities to optimize cost and operational benefits, and explains how to get the data moved between locations. Along the way, Jim discusses security, backups, event streaming, databases, replication, and snapshots across a variety of use cases that run most businesses today.
This document is the agenda for a MapR product update webinar that will take place in Spring 2017. It introduces MapR's new Persistent Application Client Container (PACC) which allows applications to easily persist data in Docker containers. It also discusses MapR Edge for IoT which extends MapR's converged data platform to the edge. The webinar will cover Hive, Spark, and Drill updates in the new MapR Ecosystem Pack 3.0. Speakers from MapR will provide details on these products and there will be a question and answer session.
MapR is an ideal scalable platform for data science and specifically for operationalizing machine learning in the enterprise. This presentation gives specific reasons why.
Streaming Goes Mainstream: New Architecture & Emerging Technologies for Strea... MapR Technologies
This document summarizes Ellen Friedman's presentation on streaming data and architectures. The key points are:
1) Streaming data is becoming mainstream as technologies for distributed storage and stream processing mature. Real-time insights from streaming data provide more value than static batch analysis.
2) MapR Streams is part of MapR's converged data platform for message transport and can support use cases like microservices with its distributed, durable messaging capabilities.
3) Apache Flink is a popular open source stream processing framework that provides accurate, low-latency processing of streaming data through features like windowing, event-time semantics, and state management.
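The windowing and event-time ideas in point 3 can be illustrated with a minimal sketch in plain Python. This is not Flink's API; the function name `tumbling_window_counts`, the `(event_time, key)` record layout, and the 10-second window are assumptions made for illustration. The point it shows is that each event is assigned to a window by the timestamp it carries, so late or out-of-order arrival does not change the result.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size):
    """Count events per (window_start, key) using event time.

    events: iterable of (event_time_seconds, key) pairs, in any arrival order.
    window_size: window length in seconds (hypothetical parameter).
    """
    counts = defaultdict(int)
    for event_time, key in events:
        # Assign by the timestamp carried in the event, not by arrival order.
        window_start = (event_time // window_size) * window_size
        counts[(window_start, key)] += 1
    return dict(counts)


# Made-up events; the one stamped t=3 arrives after the one stamped t=12.
events = [(1, "sensor-a"), (12, "sensor-a"), (3, "sensor-a"), (14, "sensor-b")]
result = tumbling_window_counts(events, window_size=10)
```

A real stream processor also has to decide when a window is complete (for example with watermarks) and keep the per-window state fault tolerant; this sketch leaves both out.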
Big Data LDN 2018: PROGRESS FOR BIG DATA IN KUBERNETES Matt Stubbs
The document discusses progress for big data in Kubernetes. It provides an overview of Kubernetes and containers, and how the MapR data platform can integrate with Kubernetes to provide a unified global namespace and all three forms of data persistence (files, streams, tables) across Kubernetes clusters. This allows state to no longer be a problem for Kubernetes and enables more complex applications to run on it.
State of the Art Robot Predictive Maintenance with Real-time Sensor Data Mathieu Dumoulin
Slides from our Strata Beijing 2017 presentation, in which we show how to use real-time data from a movement sensor to do anomaly detection at scale using standard enterprise big data software.
This document discusses progress in using Kubernetes for big data applications. It begins by introducing Kubernetes and explaining its growing popularity due to support from major cloud providers and an open source community. It then discusses some challenges with using containers, particularly around state management. The document proposes using MapR's data platform to provide a global namespace and support for files, streams and tables to address state issues when using Kubernetes for big data applications.
The folk wisdom has always been that when running stateful applications inside containers, the only viable choice is to externalize the state so that the containers themselves are stateless or nearly so. Keeping large amounts of state inside containers is possible, but it’s considered a problem because stateful containers generally can’t preserve that state across restarts.
In practice, this complicates the management of large-scale Kubernetes-based infrastructure because the high-performance storage systems that hold the externalized state require separate management. In terms of overall system management, it would be ideal if we could run a software-defined storage system directly in containers managed by Kubernetes, but that has been hampered by lack of direct device access and difficult questions about what happens to the state on container restarts.
Ted Dunning describes recent developments that make it possible for Kubernetes to manage both compute and storage tiers in the same cluster. Container restarts can be handled gracefully without loss of data or a requirement to rebuild storage structures, and access to storage from compute containers is extremely fast. In some environments, it’s even possible to implement elastic storage frameworks that can fold data onto just a few containers during quiescent periods or explode it in just a few seconds across a large number of machines when higher speed access is required.
The benefits of systems like this extend beyond management simplicity: applications can be more agile precisely because the storage layer is more stable and can be uniformly accessed from any container host. Even better, it makes it a snap to configure and deploy a full-scale compute and storage infrastructure.
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
Data warehouses have been the standard tool for analyzing data created by business operations. In recent years, increasing data volumes, new types of data formats, and emerging analytics technologies such as machine learning have given rise to modern data lakes. Connecting application databases, data warehouses, and data lakes using real-time data pipelines can significantly improve the time to action for business decisions. More: http://info.mapr.com/WB_MapR-StreamSets-Data-Warehouse-Modernization_Global_DG_17.08.16_RegistrationPage.html
We describe an application of complex event processing (CEP) using a microservice-based streaming architecture. We use the Drools business rule engine to apply rules in real time to an event stream from IoT traffic sensor data.
An Introduction to the MapR Converged Data Platform MapR Technologies
Listen to the webinar on-demand: http://info.mapr.com/WB_Partner_CDP_Intro_EMEA_DG_17.05.31_RegistrationPage.html
In this 90-minute webinar, we discuss:
- The MapR Converged Data Platform and its components
- Use cases for the Converged Data Platform
- MapR Converged Partner Program
- How to get started with MapR
- Becoming a partner
3 Benefits of Multi-Temperature Data Management for Data Analytics MapR Technologies
SAP® HANA and SAP® IQ are popular platforms for various analytical and transactional use cases. If you’re an SAP customer, you’ve experienced the benefits of deploying these solutions. However, as data volumes grow, you’re likely asking yourself: How do I scale storage to support these applications? How can I have one platform for various applications and use cases?
This document provides an overview of MapR Technologies and their MapR Distribution for Hadoop. It discusses three trends driving changes in enterprise architecture: 1) industry leaders compete using data, 2) big data is overwhelming traditional systems, and 3) Hadoop is becoming a disruptive technology. It then summarizes MapR's capabilities for high availability, data protection, disaster recovery, security, performance, and multi-tenancy. Case studies are presented showing how MapR has helped customers in financial services, retail, and other industries gain business value from their big data.
Machine Learning Success: The Key to Easier Model Management MapR Technologies
Join Ellen Friedman, co-author (with Ted Dunning) of a new short O’Reilly book Machine Learning Logistics: Model Management in the Real World, to look at what you can do to have effective model management, including the role of stream-first architecture, containers, a microservices approach and a DataOps style of work. Ellen will provide a basic explanation of a new architecture that not only leverages stream transport but also makes use of canary models and decoy models for accurate model evaluation and for efficient and rapid deployment of new models in production.
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge... LEGATO project
The LEGaTO project received funding from the European Union's Horizon 2020 programme to create a software stack that optimizes for energy efficiency on heterogeneous computing platforms. The project aims to start with mature European software and optimize it to support energy-efficient computation on hardware with CPUs, GPUs, FPGAs, and FPGA-based dataflow engines. Key partners include universities and companies developing hardware and software. The project will develop programming models, runtime systems, and use cases in areas like healthcare, smart homes, and machine learning to demonstrate the stack.
SystemML is an Apache project that provides a declarative machine learning language for data scientists. It aims to simplify the development of custom machine learning algorithms and enable scalable execution on everything from single nodes to clusters. SystemML provides pre-implemented machine learning algorithms, APIs for various languages, and a cost-based optimizer to compile execution plans tailored to workload and hardware characteristics in order to maximize performance.
Is your organization at the analytics crossroads? Have you made strides collecting and sharing massive amounts of data from electronic health records, insurance claims, and health information exchanges but found these efforts made little impact on efficiency, patient outcomes, or costs?
Enabling Real-Time Business with Change Data Capture MapR Technologies
Machine learning (ML) and artificial intelligence (AI) enable intelligent processes that can autonomously make decisions in real-time. The real challenge for effective ML and AI is getting all relevant data to a converged data platform in real-time, where it can be processed using modern technologies and integrated into any downstream systems.
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ... MapR Technologies
The document discusses machine learning and autonomous driving applications. It begins with a simple machine learning example of classifying images of chickens posted on Twitter. It then discusses how autonomous vehicles use machine learning by gathering large amounts of sensor data to train models for tasks like object recognition. The document also summarizes challenges for applying machine learning at an enterprise scale and how the MapR data platform can address these challenges by providing a unified environment for storing, accessing, and processing large amounts of diverse data.
ML Workshop 1: A New Architecture for Machine Learning Logistics MapR Technologies
Having heard the high-level rationale for the rendezvous architecture in the introduction to this series, we will now dig in deeper to talk about how and why the pieces fit together. In terms of components, we will cover why streams work, why they need to be persistent, performant and pervasive in a microservices design and how they provide isolation between components. From there, we will talk about some of the details of the implementation of a rendezvous architecture including discussion of when the architecture is applicable, key components of message content and how failures and upgrades are handled. We will touch on the monitoring requirements for a rendezvous system but will save the analysis of the recorded data for later. Listen to the webinar on demand: https://mapr.com/resources/webinars/machine-learning-workshop-1/
Distributing big astronomical catalogues with Greenplum - Greenplum Summit 2019 VMware Tanzu
The document summarizes testing performed to evaluate Greenplum, Postgres-XL, and Citus for scaling out PostgreSQL to handle large astronomical datasets. Tests were conducted on ingestion performance, query performance, hardware requirements, and high availability features using real datasets from ESA space missions. Greenplum generally performed best across the tests in terms of speed and scalability. The testing helped identify requirements for distributing the large catalogues and lessons learned for future database architecture planning.
Churn prediction is big business. It minimizes customer defection by predicting which customers are likely to cancel a service. Though originally used within the telecommunications industry, it has become common practice for banks, ISPs, insurance firms, and other verticals. More: http://info.mapr.com/WB_PredictingChurn_Global_DG_17.06.15_RegistrationPage.html
The prediction process is data-driven and often uses advanced machine learning techniques. In this webinar, we'll look at customer data, do some preliminary analysis, and generate churn prediction models – all with Spark machine learning (ML) and a Zeppelin notebook.
The goal of Spark’s ML library is to make machine learning scalable and easy. Zeppelin with Spark provides a web-based notebook that enables interactive machine learning and visualization.
In this tutorial, we'll do the following:
- Review classification and decision trees
- Use Spark DataFrames with Spark ML pipelines
- Predict customer churn with Apache Spark ML decision trees
- Use Zeppelin to run Spark commands and visualize the results
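To make the decision-tree step in the list above concrete, here is a minimal sketch in plain Python of how a single tree node could pick a split threshold using Gini impurity, one common split criterion. It is not the Spark ML API and not the webinar's code; the feature (number of customer service calls), the data, and the function names are made up for illustration.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) for the best split 'value <= threshold'."""
    best = (None, float("inf"))
    # The largest value is skipped because it would leave the right side empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical data: service calls per customer and whether they churned (1) or not (0).
calls = [0, 1, 1, 2, 4, 5, 6]
churned = [0, 0, 0, 0, 1, 1, 1]
threshold, impurity = best_threshold(calls, churned)
```

A full decision tree repeats this search over every feature and then recurses into each side of the chosen split until a stopping rule is met.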
Advanced Threat Detection on Streaming Data Carol McDonald
The document discusses using a stream processing architecture to enable real-time detection of advanced threats from large volumes of streaming data. The solution ingests data using fast distributed messaging like Kafka or MapR Streams. Complex event processing with Storm and Esper is used to detect patterns. Data is stored in scalable NoSQL databases like HBase and analyzed using machine learning. The parallelized, partitioned architecture allows for high performance and scalability.
Applying Machine Learning to Live Patient Data Carol McDonald
This document discusses applying machine learning to live patient data for real-time anomaly detection. It describes using streaming data from medical devices like EKGs to build a machine learning model for identifying anomalies. The streaming data is processed using Spark Streaming and enriched with cluster assignments from a pre-trained K-means model before being sent to a dashboard for real-time monitoring of patient vitals.
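The enrichment step described above can be sketched in plain Python, under stated assumptions. The centroids stand in for a pre-trained K-means model, and the two-dimensional readings are invented. Flagging a reading as anomalous when it lies farther than `max_distance` from its nearest centroid is an assumption of this sketch, not something the summary states about the original presentation.

```python
import math


def assign_cluster(point, centroids):
    """Return (index, distance) of the nearest centroid by Euclidean distance."""
    distances = [math.dist(point, c) for c in centroids]
    index = min(range(len(centroids)), key=distances.__getitem__)
    return index, distances[index]


def enrich(readings, centroids, max_distance):
    """Yield each reading tagged with its cluster and an anomaly flag."""
    for point in readings:
        index, distance = assign_cluster(point, centroids)
        # Assumption: a reading far from every centroid counts as an anomaly.
        yield {"point": point, "cluster": index, "anomaly": distance > max_distance}


# Hypothetical centroids and readings.
centroids = [(0.0, 0.0), (10.0, 10.0)]
readings = [(1.0, 0.0), (9.0, 10.0), (5.0, 20.0)]
enriched = list(enrich(readings, centroids, max_distance=3.0))
```

Because `enrich` is a generator, it can consume an unbounded iterator of readings one at a time, which is roughly the shape of a per-record streaming transformation.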
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB Carol McDonald
This document discusses building a streaming data pipeline using Apache technologies like Kafka, Spark Streaming, and MapR-DB. It describes collecting streaming data with Kafka, organizing the data into topics, and processing the streams in Spark Streaming. The streaming data can then be stored in MapR-DB and queried using Spark SQL. An example uses a streaming payment dataset to demonstrate parsing the data, transforming it into a Dataset, and continuously aggregating values with Spark Streaming.
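The parse, transform, and continuously aggregate steps can be sketched without any of the named systems. The following plain Python is only an illustration of the idea, not Kafka, Spark, or MapR-DB code; the `payer,amount` record layout and the function names are hypothetical.

```python
from collections import defaultdict


def parse_payment(line):
    """Parse a 'payer,amount' text record into a (payer, amount) pair."""
    payer, amount = line.strip().split(",")
    return payer, float(amount)


def running_totals(lines):
    """Yield the updated (payer, total) after every incoming record."""
    totals = defaultdict(float)
    for line in lines:
        payer, amount = parse_payment(line)
        totals[payer] += amount
        yield payer, totals[payer]


# Made-up records standing in for messages read from a topic.
stream = ["acme,10.0", "globex,5.5", "acme,2.5"]
updates = list(running_totals(stream))
```

Each update is emitted as soon as its record is processed, so a downstream store or dashboard would see totals change continuously rather than once per batch.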
Changes in how business is done combined with multiple technology drivers make geo-distributed data increasingly important for enterprises. These changes are causing serious disruption across a wide range of industries, including healthcare, manufacturing, automotive, telecommunications, and entertainment. Technical challenges arise with these disruptions, but the good news is there are now innovative solutions to address these problems. http://info.mapr.com/WB_Geo-distributed-Big-Data-and-Analytics_Global_DG_17.05.16_RegistrationPage.html
Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO... Matt Stubbs
Date: 14th November 2018
Location: Keynote Theatre
Time: 11:50 - 12:20
Speaker: Ellen Friedman
Organisation: MapR
About: We’ve seen that over 90% of our customers have large scale projects successfully in production. What are they doing right? And how can you adapt their effective habits to your own business?
Value comes from big data when you have successful production deployments of data-intensive AI and analytics applications tied to practical business goals. Doing this well can be difficult on many levels. Each business presents its own challenges, but we’ve observed a number of habits that are common to many of the organizations who are getting value from their production deployments.
This presentation will explore 7 key habits that can make a difference and use real world examples to show you why. From architecture to technology to organizational culture, you’ll learn practical approaches that can improve your likelihood of success in production.
This document discusses progress in using Kubernetes for big data applications. It begins by introducing Kubernetes and explaining its growing popularity due to support from major cloud providers and an open source community. It then discusses some challenges with using containers, particularly around state management. The document proposes using MapR's data platform to provide a global namespace and support for files, streams and tables to address state issues when using Kubernetes for big data applications.
The folk wisdom has always been that when running stateful applications inside containers, the only viable choice is to externalize the state so that the containers themselves are stateless or nearly so. Keeping large amounts of state inside containers is possible, but it’s considered a problem because stateful containers generally can’t preserve that state across restarts.
In practice, this complicates the management of large-scale Kubernetes-based infrastructure because these high-performance storage systems require separate management. In terms of overall system management, it would be ideal if we could run a software-defined storage system directly in containers managed by Kubernetes, but that has been hampered by lack of direct device access and difficult questions about what happens to the state on container restarts.
Ted Dunning describes recent developments that make it possible for Kubernetes to manage both compute and storage tiers in the same cluster. Container restarts can be handled gracefully without loss of data or a requirement to rebuild storage structures and access to storage from compute containers is extremely fast. In some environments, it’s even possible to implement elastic storage frameworks that can fold data onto just a few containers during quiescent periods or explode it in just a few seconds across a large number of machines when higher speed access is required.
The benefits of systems like this extend beyond management simplicity, because applications can be more Agile precisely because the storage layer is more stable and can be uniformly accessed from any container host. Even better, it makes it a snap to configure and deploy a full-scale compute and storage infrastructure.
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
Data warehouses have been the standard tool for analyzing data created by business operations. In recent years, increasing data volumes, new types of data formats, and emerging analytics technologies such as machine learning have given rise to modern data lakes. Connecting application databases, data warehouses, and data lakes using real-time data pipelines can significantly improve the time to action for business decisions. More: http://info.mapr.com/WB_MapR-StreamSets-Data-Warehouse-Modernization_Global_DG_17.08.16_RegistrationPage.html
We describe an application of CEP using a microservice-based streaming architecture. We use Drools business rule engine to apply rules in real time to an event stream from IoT traffic sensor data.
An Introduction to the MapR Converged Data PlatformMapR Technologies
Listen to the webinar on-demand: http://info.mapr.com/WB_Partner_CDP_Intro_EMEA_DG_17.05.31_RegistrationPage.html
In this 90-minute webinar, we discuss:
- The MapR Converged Data Platform and its components
- Use cases for the Converged Data Platform
- MapR Converged Partner Program
- How to get started with MapR
- Becoming a partner
3 Benefits of Multi-Temperature Data Management for Data AnalyticsMapR Technologies
SAP® HANA and SAP® IQ are popular platforms for various analytical and transactional use cases. If you’re an SAP customer, you’ve experienced the benefits of deploying these solutions. However, as data volumes grow, you’re likely asking yourself: How do I scale storage to support these applications? How can I have one platform for various applications and use cases?
This document provides an overview of MapR Technologies and their MapR Distribution for Hadoop. It discusses three trends driving changes in enterprise architecture: 1) industry leaders compete using data, 2) big data is overwhelming traditional systems, and 3) Hadoop is becoming a disruptive technology. It then summarizes MapR's capabilities for high availability, data protection, disaster recovery, security, performance, and multi-tenancy. Case studies are presented showing how MapR has helped customers in financial services, retail, and other industries gain business value from their big data.
Machine Learning Success: The Key to Easier Model ManagementMapR Technologies
Join Ellen Friedman, co-author (with Ted Dunning) of a new short O’Reilly book Machine Learning Logistics: Model Management in the Real World, to look at what you can do to have effective model management, including the role of stream-first architecture, containers, a microservices approach and a DataOps style of work. Ellen will provide a basic explanation of a new architecture that not only leverages stream transport but also makes use of canary models and decoy models for accurate model evaluation and for efficient and rapid deployment of new models in production.
SAMOS 2018: LEGaTO: first steps towards energy-efficient toolset for heteroge...LEGATO project
The LEGaTO project received funding from the European Union's Horizon 2020 programme to create a software stack that optimizes for energy efficiency on heterogeneous computing platforms. The project aims to start with mature European software and optimize it to support energy-efficient computation on hardware with CPUs, GPUs, FPGAs, and FPGA-based dataflow engines. Key partners include universities and companies developing hardware and software. The project will develop programming models, runtime systems, and use cases in areas like healthcare, smart homes, and machine learning to demonstrate the stack.
SystemML is an Apache project that provides a declarative machine learning language for data scientists. It aims to simplify the development of custom machine learning algorithms and enable scalable execution on everything from single nodes to clusters. SystemML provides pre-implemented machine learning algorithms, APIs for various languages, and a cost-based optimizer to compile execution plans tailored to workload and hardware characteristics in order to maximize performance.
Is your organization at the analytics crossroads? Have you made strides collecting and sharing massive amounts of data from electronic health records, insurance claims, and health information exchanges but found these efforts made little impact on efficiency, patient outcomes, or costs?
Enabling Real-Time Business with Change Data CaptureMapR Technologies
Machine learning (ML) and artificial intelligence (AI) enable intelligent processes that can autonomously make decisions in real-time. The real challenge for effective ML and AI is getting all relevant data to a converged data platform in real-time, where it can be processed using modern technologies and integrated into any downstream systems.
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...MapR Technologies
The document discusses machine learning and autonomous driving applications. It begins with a simple machine learning example of classifying images of chickens posted on Twitter. It then discusses how autonomous vehicles use machine learning by gathering large amounts of sensor data to train models for tasks like object recognition. The document also summarizes challenges for applying machine learning at an enterprise scale and how the MapR data platform can address these challenges by providing a unified environment for storing, accessing, and processing large amounts of diverse data.
ML Workshop 1: A New Architecture for Machine Learning LogisticsMapR Technologies
Having heard the high-level rationale for the rendezvous architecture in the introduction to this series, we will now dig in deeper to talk about how and why the pieces fit together. In terms of components, we will cover why streams work, why they need to be persistent, performant and pervasive in a microservices design and how they provide isolation between components. From there, we will talk about some of the details of the implementation of a rendezvous architecture including discussion of when the architecture is applicable, key components of message content and how failures and upgrades are handled. We will touch on the monitoring requirements for a rendezvous system but will save the analysis of the recorded data for later. Listen to the webinar on demand: https://mapr.com/resources/webinars/machine-learning-workshop-1/
Distributing big astronomical catalogues with Greenplum - Greenplum Summit 2019VMware Tanzu
The document summarizes testing performed to evaluate Greenplum, Postgres-XL, and Citus for scaling out PostgreSQL to handle large astronomical datasets. Tests were conducted on ingestion performance, query performance, hardware requirements, and high availability features using real datasets from ESA space missions. Greenplum generally performed best across the tests in terms of speed and scalability. The testing helped identify requirements for distributing the large catalogues and lessons learned for future database architecture planning.
Churn prediction is big business. It minimizes customer defection by predicting which customers are likely to cancel a service. Though originally used within the telecommunications industry, it has become common practice for banks, ISPs, insurance firms, and other verticals. More: http://info.mapr.com/WB_PredictingChurn_Global_DG_17.06.15_RegistrationPage.html
The prediction process is data-driven and often uses advanced machine learning techniques. In this webinar, we'll look at customer data, do some preliminary analysis, and generate churn prediction models – all with Spark machine learning (ML) and a Zeppelin notebook.
The goal of Spark’s ML library is to make machine learning scalable and easy. Zeppelin with Spark provides a web-based notebook that enables interactive machine learning and visualization.
In this tutorial, we'll do the following:
- Review classification and decision trees
- Use Spark DataFrames with Spark ML pipelines
- Predict customer churn with Apache Spark ML decision trees
- Use Zeppelin to run Spark commands and visualize the results
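As a rough illustration of the decision-tree idea reviewed in the tutorial, the sketch below shows how a single tree node might choose a split by minimizing Gini impurity. It is plain Python, not the Spark ML API used in the webinar, and the `service_calls` and `churned` data are hypothetical toy values invented for this example.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best 'value <= threshold' split."""
    best = (None, float("inf"))
    n = len(values)
    # The largest value is skipped because it would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (threshold, impurity)
    return best


# Hypothetical toy data: service calls per customer and whether each one churned.
service_calls = [0, 1, 1, 2, 4, 5, 6, 7]
churned = [False, False, False, False, True, True, True, True]

threshold, impurity = best_split(service_calls, churned)
print(threshold, impurity)
```

A full decision tree repeats this search recursively on each side of the chosen split and across all features; a library such as Spark ML handles that at scale.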
Advanced Threat Detection on Streaming Data
Carol McDonald
The document discusses using a stream processing architecture to enable real-time detection of advanced threats from large volumes of streaming data. The solution ingests data using fast distributed messaging like Kafka or MapR Streams. Complex event processing with Storm and Esper is used to detect patterns. Data is stored in scalable NoSQL databases like HBase and analyzed using machine learning. The parallelized, partitioned architecture allows for high performance and scalability.
Applying Machine Learning to Live Patient Data
Carol McDonald
This document discusses applying machine learning to live patient data for real-time anomaly detection. It describes using streaming data from medical devices like EKGs to build a machine learning model for identifying anomalies. The streaming data is processed using Spark Streaming and enriched with cluster assignments from a pre-trained K-means model before being sent to a dashboard for real-time monitoring of patient vitals.
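The enrichment step described above can be sketched in plain Python as follows. This is a minimal sketch, not the presentation's Spark Streaming code: the centroid values, the two-feature readings, and the `max_distance` anomaly threshold are all assumptions made for illustration, and flagging anomalies by distance to the nearest centroid is one common approach rather than necessarily the one the talk uses.

```python
import math


def nearest_centroid(point, centroids):
    """Return (cluster_index, distance) for the centroid closest to the point."""
    distances = [math.dist(point, c) for c in centroids]
    index = min(range(len(centroids)), key=distances.__getitem__)
    return index, distances[index]


def enrich(reading, centroids, max_distance):
    """Attach a cluster assignment and an anomaly flag to one reading."""
    cluster, distance = nearest_centroid(reading, centroids)
    return {
        "reading": reading,
        "cluster": cluster,
        "distance": distance,
        # Assumption: a reading far from every known cluster is treated as anomalous.
        "anomaly": distance > max_distance,
    }


# Hypothetical centroids standing in for a pre-trained K-means model.
centroids = [(0.0, 0.0), (3.0, 4.0)]

normal = enrich((0.0, 1.0), centroids, max_distance=2.0)
unusual = enrich((9.0, 12.0), centroids, max_distance=2.0)
print(normal)
print(unusual)
```

In a streaming job, a function like `enrich` would be applied to each incoming record before the result is forwarded to the dashboard.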
Structured Streaming Data Pipeline Using Kafka, Spark, and MapR-DB
Carol McDonald
This document discusses building a streaming data pipeline using Apache technologies like Kafka, Spark Streaming, and MapR-DB. It describes collecting streaming data with Kafka, organizing the data into topics, and processing the streams in Spark Streaming. The streaming data can then be stored in MapR-DB and queried using Spark SQL. An example uses a streaming payment dataset to demonstrate parsing the data, transforming it into a Dataset, and continuously aggregating values with Spark Streaming.
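The parse-then-aggregate flow described above can be illustrated without a cluster. The sketch below uses a plain Python generator in place of Kafka and Spark Streaming; the `payer,amount` line format and the payer names are hypothetical and are not taken from the payment dataset used in the presentation.

```python
from collections import defaultdict


def parse_payment(line):
    """Parse one 'payer,amount' CSV line into a (payer, amount) tuple."""
    payer, amount = line.strip().split(",")
    return payer, float(amount)


def running_totals(events):
    """Yield (key, count, total) after every event, like a continuous aggregation."""
    counts = defaultdict(int)
    totals = defaultdict(float)
    for key, amount in events:
        counts[key] += 1
        totals[key] += amount
        yield key, counts[key], totals[key]


# Hypothetical raw messages, standing in for records read from a topic.
raw_lines = ["payer_a,100.0", "payer_b,50.0", "payer_a,25.0"]

updates = list(running_totals(parse_payment(line) for line in raw_lines))
for update in updates:
    print(update)
```

Because `running_totals` consumes an iterator lazily, the same function works on an unbounded source; a streaming engine adds the partitioning, fault tolerance, and state management that this sketch leaves out.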
Changes in how business is done combined with multiple technology drivers make geo-distributed data increasingly important for enterprises. These changes are causing serious disruption across a wide range of industries, including healthcare, manufacturing, automotive, telecommunications, and entertainment. Technical challenges arise with these disruptions, but the good news is there are now innovative solutions to address these problems. http://info.mapr.com/WB_Geo-distributed-Big-Data-and-Analytics_Global_DG_17.05.16_RegistrationPage.html
Big Data LDN 2018: 7 SUCCESSFUL HABITS FOR DATA-INTENSIVE APPLICATIONS IN PRO...
Matt Stubbs
Date: 14th November 2018
Location: Keynote Theatre
Time: 11:50 - 12:20
Speaker: Ellen Friedman
Organisation: MapR
About: We’ve seen that over 90% of our customers have large scale projects successfully in production. What are they doing right? And how can you adapt their effective habits to your own business?
Value comes from big data when you have successful production deployments of data-intensive AI and analytics applications tied to practical business goals. Doing this well can be difficult on many levels. Each business presents its own challenges, but we’ve observed a number of habits that are common to many of the organizations who are getting value from their production deployments.
This presentation will explore 7 key habits that can make a difference and use real world examples to show you why. From architecture to technology to organizational culture, you’ll learn practical approaches that can improve your likelihood of success in production.
Fast Cars, Big Data How Streaming can help Formula 1
Carol McDonald
This document discusses how streaming data and analytics can help Formula 1 racing teams. It provides examples of the large volume of sensor data collected from Formula 1 cars during races. The document demonstrates how streaming this data using Apache Kafka and analyzing it in real time with tools like Apache Spark and Apache Flink can help teams with tasks like predictive maintenance, race strategy optimization, and driver coaching. It also discusses storing the streaming data in MapR-DB and querying it with Apache Drill for ad-hoc analysis.
Surprising Advantages of Streaming - ACM March 2018
Ellen Friedman
Shift to a new idea: stream instead of database as heart of your big data architecture. With the right capabilities for event-by-event streaming data transport (not processing) you get the flexibility of streaming microservices & much more. Includes real world use case examples.
DataOps: An Agile Method for Data-Driven Organizations
Ellen Friedman
DataOps expands DevOps philosophy to include data-heavy roles (data engineering & data science). DataOps uses better cross-functional collaboration for flexibility, fast time to value and an agile workflow for data-intensive applications including machine learning pipelines. (Strata Data San Jose March 2018)
The rise of microservices details how the software infrastructure of the future is changing. As corporations strive for competitive advantage, they must redesign their brownfield legacy applications and move them to the cloud. Agile cloud applications follow microservices and cloud-native development patterns. Microservices architectures are enabled by Docker and Kubernetes. Kubernetes is hosted by the CNCF, as is containerd, the container runtime that originated at Docker.
Microservices architectures are being enhanced with a service mesh layer, which simplifies the communication and management of cloud-native applications.
Spark and MapR Streams: A Motivating Example
Ian Downard
Businesses are discovering the untapped potential of large datasets and data streams through the use of technologies for big data processing and storage. By leveraging these assets, they’re creating a new generation of applications that derive value from data they used to throw away. In this presentation, Ian Downard shows how to build operational environments for these types of applications with the MapR Converged Data Platform, and he describes examples of next-generation applications that use Java APIs for MapR Streams, Apache Spark, Apache Hive, and MapR-DB. He shows how these technologies can be used to join and transform unbounded datasets to find signals and derive new data streams for a financial scenario involving real-time algorithmic trading and historical analysis using SQL. He also discusses how MapR enables you to run real-time data applications with the speed, reliability, and security you need for a production environment.
Advanced Spark and TensorFlow Meetup - Dec 12 2017 - Dong Meng, MapR + Kubern...
Chris Fregly
This document discusses distributed deep learning on the MapR Converged Data Platform. It provides an overview of MapR's enterprise big data journey and capabilities for distributed deep learning. It describes using containers and Kubernetes for deep learning model development and deployment, with NVIDIA GPUs for computation. It presents architectures and patterns for separating or collocating MapR and GPU clusters. Finally, it previews demos of parameter server/workers and real-time face detection using streams.
MapR 5.2: Getting More Value from the MapR Converged Community Edition
MapR Technologies
Please join us to learn about the recent developments during the past year in the MapR Community Edition. In these slides, we will cover the following platform updates:
- Taking cluster monitoring to the next level with the Spyglass Initiative
- Real-time streaming with MapR Streams
- MapR-DB JSON document database and application development with OJAI
- Securing your data with access control expressions (ACEs)
7 Habits for Big Data in Production - keynote Big Data London Nov 2018
Ellen Friedman
You can improve your chances for success with data-intensive, large-scale applications (AI, machine learning and analytics) in production.
This keynote presentation from Big Data London shows you how.
Predictive Maintenance Using Recurrent Neural Networks
Justin Brandenburg
This document discusses using recurrent neural networks for predictive maintenance. It begins by providing context on industry 4.0 and the growth of industrial automation. It then discusses predictive maintenance and how sensor data from industrial equipment can be used for failure prediction. The document outlines how a recurrent neural network model could be developed using streaming sensor data from manufacturing devices to identify abnormal behavior and predict needed maintenance. It describes the workflow of importing and preparing the data, developing and testing the model, and deploying it to generate alerts from new streaming data.
You’re not the only one still loading your data into data warehouses and building marts or cubes out of it. But today’s data requires a much more accessible environment that delivers real-time results. Prepare for this transformation because your data platform and storage choices are about to undergo a re-platforming that happens once in 30 years.
With the MapR Converged Data Platform (CDP) and Cisco Unified Computing System (UCS), you can optimize today’s infrastructure and grow to take advantage of what’s next. Uncover the range of possibilities from re-platforming by intimately understanding your options for density, performance, functionality and more.
Application developers are key to the success of an edge compute strategy. They are the backbone for any digital ecosystem and their requirements drive the platform architecture. Edge computing is no different. In this talk, we will focus on some key requirements, challenges and possible solutions for a developer centric architecture for multi-access edge computing including abstraction of the service provider’s network complexity, low footprint cloud native builder models, micro-services, hardware abstractions, intelligence layers and massive monitoring of application instances.
About the speaker: Shamik Mishra is currently Assistant Vice President (AVP), Technology and Innovation at Aricent. He is a practice leader for new product architectures. He has extensive experience and contributions in software development in cloud, wireless technologies, edge computing and platform software. His research interests are Network Function Virtualization (NFV), Cloud and edge computing and Machine Learning (ML). He has spoken in several conferences and his work is regularly covered in the media. Shamik has a bachelor’s and a master’s degree from Indian Institute of Technology (IIT) Kharagpur, India.
Real World Use Cases: Hadoop and NoSQL in Production
Codemotion
"Real World Use Cases: Hadoop and NoSQL in Production" by Tugdual Grall.
What’s important about a technology is what you can use it to do. I’ve looked at what a number of groups are doing with Apache Hadoop and NoSQL in production, and I will relay what worked well for them and what did not. Drawing from real world use cases, I show how people who understand these new approaches can employ them well in conjunction with traditional approaches and existing applications. Threat detection, data warehouse optimization, marketing efficiency, and biometric databases are some of the examples covered in this presentation.
MapR 5.2: Getting More Value from the MapR Converged Data Platform
MapR Technologies
End of maintenance for MapR 4.x is coming in January, so now is a good time to plan your upgrade. Please join us to learn about the recent developments during the past year in the MapR Platform that will make the upgrade effort this year worthwhile.
Designing data pipelines for analytics and machine learning in industrial set...
DataWorks Summit
Machine learning has made it possible for technologists to do amazing things with data. Its arrival coincides with the evolution of networked manufacturing systems driven by IoT. In this presentation we’ll examine the rise of IoT and ML from a practitioner’s perspective to better understand how applications of AI can be built in industrial settings. We'll walk through a case study that combines multiple IoT and ML technologies to monitor and optimize an industrial heating and cooling HVAC system. Through this instructive example you'll see how the following components can be put into action:
1. A StreamSets data pipeline that sources from MQTT and persists to OpenTSDB
2. A TensorFlow model that predicts anomalies in streaming sensor data
3. A Spark application that derives new event streams for real-time alerts
4. A Grafana dashboard that displays factory sensors and alerts in an interactive view
By walking through this solution step-by-step, you'll learn how to build the fundamental capabilities needed in order to handle endless streams of IoT data and derive ML insights from that data:
1. How to transport IoT data through scalable publish/subscribe event streams
2. How to process data streams with transformations and filters
3. How to persist data streams with the timeliness required for interactive dashboards
4. How to collect labeled datasets for training machine learning models
At the end of this presentation you will have learned how a variety of tools can be used together to build ML enhanced applications and data products for instrumented manufacturing systems.
Speakers
Ian Downard, Sr. Developer Evangelist, MapR
William Ochandarena, Senior Director of Product Management, MapR
Big Data LDN 2018: DATA OPERATIONS PROBLEMS CREATED BY DEEP LEARNING, AND HOW...
Matt Stubbs
The document discusses several problems that are created by deep learning related to data operations and logistics, including a lack of support for the AI software development lifecycle, handling different workloads beyond just deep learning models, difficulties in putting machine learning models into production, running models in multiple locations, data dependencies being more costly than code dependencies, and changes in conditions over time. It provides recommendations on how to address these problems through approaches like stream-based architectures and containerization.
Real-World Machine Learning - Leverage the Features of MapR Converged Data Pl...
Mathieu Dumoulin
Examine the unique features of the MapR Converged Data Platform and how they can support production-grade enterprise machine learning - Ends with a live demo using H2O - Presented at Hadoop Summit Tokyo 2016
Similar to Container and Kubernetes without limits (20)
Top Benefits of Using Salesforce Healthcare CRM for Patient Management.pdf
VALiNTRY360
Salesforce Healthcare CRM, implemented by VALiNTRY360, revolutionizes patient management by enhancing patient engagement, streamlining administrative processes, and improving care coordination. Its advanced analytics, robust security, and seamless integration with telehealth services ensure that healthcare providers can deliver personalized, efficient, and secure patient care. By automating routine tasks and providing actionable insights, Salesforce Healthcare CRM enables healthcare providers to focus on delivering high-quality care, leading to better patient outcomes and higher satisfaction. VALiNTRY360's expertise ensures a tailored solution that meets the unique needs of any healthcare practice, from small clinics to large hospital systems.
For more info visit us https://valintry360.com/solutions/health-life-sciences
Mobile app Development Services | Drona Infotech
Drona Infotech
Drona Infotech is one of the best mobile app development companies in Noida. Its maintenance and ongoing support services can help you maintain and support your app after it has been launched. This includes fixing bugs, adding new features, and keeping your app up-to-date with the latest
WWDC 2024 Keynote Review: For CocoaCoders Austin
Patrick Weigel
Overview of WWDC 2024 Keynote Address.
Covers: Apple Intelligence, iOS18, macOS Sequoia, iPadOS, watchOS, visionOS, and Apple TV+.
Understandable dialogue on Apple TV+
On-device app controlling AI.
Access to ChatGPT with a guest appearance by Chief Data Thief Sam Altman!
App Locking! iPhone Mirroring! And a Calculator!!
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
Peter Muessig
The UI5 tooling is the development and build tooling of UI5. It is built in a modular and extensible way so that it can be easily extended to fit your needs. This session will showcase various tooling extensions that can significantly improve your development experience: working fully offline, transpiling your project code so you can use newer versions of ECMAScript (newer than 2022, which the UI5 tooling supports right now), consuming any npm package of your choice in your project, using different kinds of proxies, and even stitching UI5 projects together during development to mimic your target environment.
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
kalichargn70th171
Regarding mobile operating systems, two major players dominate our thoughts: Android and iPhone. With Android leading the market, software development companies are focused on delivering apps compatible with this OS. Ensuring an app's functionality across various Android devices, OS versions, and hardware specifications is critical, making Android app testing essential.
SOCRadar's Aviation Industry Q1 Incident Report is out now!
The aviation industry has always been a prime target for cybercriminals due to its critical infrastructure and high stakes. In the first quarter of 2024, the sector faced an alarming surge in cybersecurity threats, revealing its vulnerabilities and the relentless sophistication of cyber attackers.
SOCRadar’s Aviation Industry Quarterly Incident Report provides an in-depth analysis of these threats, detected and examined through our extensive monitoring of hacker forums, Telegram channels, and dark web platforms.
Top 9 Trends in Cybersecurity for 2024.pptx
devvsandy
Security and risk management (SRM) leaders face disruptions on technological, organizational, and human fronts. Preparation and pragmatic execution are key for dealing with these disruptions and providing the right cybersecurity program.
SMS API Integration in Saudi Arabia | Best SMS API Service
Yara Milbes
Discover the benefits and implementation of SMS API integration in the UAE and Middle East. This comprehensive guide covers the importance of SMS messaging APIs, the advantages of bulk SMS APIs, and real-world case studies. Learn how CEQUENS, a leader in communication solutions, can help your business enhance customer engagement and streamline operations with innovative CPaaS, reliable SMS APIs, and omnichannel solutions, including WhatsApp Business. Perfect for businesses seeking to optimize their communication strategies in the digital age.
Using Query Store in Azure PostgreSQL to Understand Query Performance
Grant Fritchey
Microsoft has added an excellent new extension in PostgreSQL on their Azure Platform. This session, presented at Posette 2024, covers what Query Store is and the types of information you can get out of it.