MapR offers an enterprise distribution of Hadoop that supports a broad range of use cases and has been adopted by major companies, including Google and Amazon. The document presents three stories of how companies have benefited from using MapR: 1) a telecom company offloaded ETL processing to gain a 20x cost-performance advantage; 2) a company improved the performance of a recommendation engine on large datasets; and 3) a machine learning expert was able to reproduce models accurately and explain previous recommendations.
The case of vehicle networking financial services accomplished by China Mobile – DataWorks Summit
As the largest mobile telecom carrier in the world, China Mobile operates the world's largest wireless mobile network. Building on existing vehicle networking equipment (CAN bus, OBD, ADAS, fatigue warning systems, GPS, driving recorders, etc.), it can provide vehicle networking services: based on analysis of vehicle networking data, it offers users risk assessment and real-time vehicle risk monitoring, and provides financial institutions with data support for comprehensive and differentiated vehicle financial services.
The main contents include the following:
1. Vehicle and driver data collection: collecting information on the vehicle's mechanical status, driving behavior, and surrounding environment through OBD, ADAS, fatigue warning systems, GPS, and other equipment.
2. AI technology applications: mainly the identification of the driver's physical state, drunk driving, degree of fatigue, and so on.
3. Improving the accuracy and applicability of the risk assessment model through machine learning.
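A minimal sketch of item 3, assuming a logistic model over telemetry-derived features. The feature names, weights, and bias below are illustrative assumptions, not details from the talk:

```python
# Hypothetical driver risk scoring: combine telemetry features with a
# logistic model. Weights and features are illustrative only.
import math

WEIGHTS = {
    "harsh_brakes_per_100km": 0.8,    # positive weights raise risk
    "night_driving_ratio": 0.5,
    "fatigue_alerts_per_100km": 1.2,
    "avg_speed_over_limit_kmh": 0.06,
}
BIAS = -3.0

def risk_score(features: dict) -> float:
    """Map raw telemetry features to a 0..1 risk probability."""
    z = BIAS + sum(WEIGHTS[k] * features.get(k, 0.0) for k in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))

careful = {"harsh_brakes_per_100km": 0.2, "night_driving_ratio": 0.1}
risky = {"harsh_brakes_per_100km": 4.0, "night_driving_ratio": 0.6,
         "fatigue_alerts_per_100km": 1.5, "avg_speed_over_limit_kmh": 15.0}
print(risk_score(careful), risk_score(risky))  # low vs. high probability
```

In practice the weights would be learned from labeled claims or incident data, which is where the machine learning step in item 3 comes in.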
Speaker
Duan Yunfeng, Chief Designer of China Mobile's big data system, China Mobile Communications Corporation
Keynote: Levi Bailey, Humana | Improving Health with Event-Driven Architectur... – Confluent
Humana is at the forefront of an industry-wide effort to improve health outcomes while keeping costs in check through improved interoperability. Levi Bailey, Associate Vice President, Cloud Architecture, will share Humana’s vision and how an event-driven architecture, built with Kafka and Confluent, powers the interoperability platforms at the heart of the company’s digital transformation.
Insight Platforms Accelerate Digital Transformation – MapR Technologies
Many organizations have invested in big data technologies such as Hadoop and Spark. But these investments only address how to gain deeper insights from more diverse data. They do not address how to create action from those insights.
Forrester has identified an emerging class of software—insight platforms—that combine data, analytics, and insight execution to drive action using a big data fabric.
In this presentation, our guest, Forrester Research VP and Principal Analyst, Brian Hopkins, will:
o Present Forrester's recent research on insight platforms and big data fabrics.
o Provide strategies for getting more value from your big data investments.
MapR will share:
o Examples of leading companies and best practices for creating modern applications.
o How to combine analytics and operations to accelerate digital transformation and create competitive advantage.
How Big Data is Reducing Costs and Improving Outcomes in Health Care – Carol McDonald
There is no better example of the important role that data plays in our lives than in matters of our health and our healthcare. There’s a growing wealth of health-related data out there, and it’s playing an increasing role in improving patient care, population health, and healthcare economics.
Join this talk to hear how MapR customers are using big data and advanced analytics to address a myriad of healthcare challenges—from patient to payer.
We will cover big data healthcare trends and production use cases that demonstrate how to deliver data-driven healthcare applications.
Amazon Redshift Update and How Equinox Fitness Clubs Migrated to a Modern Dat... – Amazon Web Services
In this session, we provide an update on Amazon Redshift, and look at a case study from Equinox Fitness Clubs. We show you how Amazon Redshift queries data across your data warehouse and data lake, without the need or delay of loading data, to deliver insights you cannot obtain by querying independent data silos. Discover how Equinox Fitness Clubs transitioned from on-premises data warehouses and data marts to a cloud-based, integrated data platform, built on AWS and Amazon Redshift. Learn about their journey from static reports, redundant data, and inefficient data integration to a modern and flexible data lake and data warehouse architecture that delivers dynamic reports based on trusted data.
Explore the various options for streaming data on AWS, such as Amazon Kinesis and Amazon Managed Streaming for Apache Kafka (Amazon MSK), and the various options for processing streams of data, such as Apache Spark, Apache Flink, AWS Lambda, and Amazon Kinesis Analytics for Java. Let's explore what an architecture for processing Australia's new Open Banking data format at 60,000 transactions per second could look like.
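The windowed aggregation at the heart of such a pipeline can be sketched in plain Python. This is a stand-in for what a Kinesis Analytics or Flink job would do; the `(timestamp_ms, account, amount)` event format is an assumption for illustration:

```python
# Minimal sketch (not AWS code): tumbling one-second windows over a stream
# of (timestamp_ms, account, amount) transactions.
from collections import defaultdict

def tumbling_counts(events, window_ms=1000):
    """Count transactions per window; returns {window_start_ms: count}."""
    counts = defaultdict(int)
    for ts_ms, _account, _amount in events:
        counts[(ts_ms // window_ms) * window_ms] += 1
    return dict(counts)

events = [(0, "a", 10.0), (250, "b", 5.0), (999, "a", 1.0), (1500, "c", 7.0)]
print(tumbling_counts(events))  # {0: 3, 1000: 1}
```

At 60,000 transactions per second the same per-window grouping applies; the stream processor simply shards it by key across workers.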
Apache Kafka Landscape for Automotive and Manufacturing – Kai Wähner
Today, in 2022, Apache Kafka is the central nervous system of many applications in various areas related to the automotive and manufacturing industry for processing analytical and transactional data in motion across edge, hybrid, and multi-cloud deployments.
This presentation explores the automotive event streaming landscape, including connected vehicles, smart manufacturing, supply chain optimization, aftersales, mobility services, and innovative new business models.
Afterwards, many real-world examples are shown from companies such as Audi, BMW, Porsche, Tesla, Uber, Grab, and FREENOW.
More detail in the blog post:
https://www.kai-waehner.de/blog/2022/01/12/apache-kafka-landscape-for-automotive-and-manufacturing/
The Rise Of Event Streaming – Why Apache Kafka Changes Everything – Kai Wähner
Business digitalization trends like microservices, the Internet of Things, and machine learning are driving the need to process events at a whole new level of scale, speed, and efficiency. Traditional solutions like ETL/data integration or messaging are not built to serve these needs.
Today, the open source project Apache Kafka® is used by thousands of companies, including over 60% of the Fortune 100, to power and innovate their businesses by centering their data strategies on event-driven architectures that leverage event streaming. We will discuss the market and technology changes that have given rise to Kafka and to event streaming, and we will introduce the audience to the key aspects of building an event streaming platform with Kafka. Examples of production use cases from the automotive, manufacturing, and transportation sectors will showcase the power of event streaming.
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap... – Kai Wähner
Machine learning is separated into model training and model inference. ML frameworks typically load historical data from a data store like HDFS or S3 to train models. This talk shows how you can avoid such a data store entirely by ingesting streaming data directly via Apache Kafka from any source system into TensorFlow, for both model training and model inference, using the capabilities of the "TensorFlow I/O" add-on.
The talk compares this modern streaming architecture to traditional batch and big data alternatives and explains benefits such as the simplified architecture, the ability to reprocess events in the same order for training different models, and the possibility of building a scalable, mission-critical, real-time ML architecture with far fewer headaches and problems.
Key takeaways for the audience
• Scalable open source Machine Learning infrastructure
• Streaming ingestion into TensorFlow without the need for another data store like HDFS or S3 (leveraging TensorFlow I/O and its Kafka plugin)
• Stream Processing using analytic models in mission-critical deployments to act in Real Time
• Learn how the Apache Kafka open source ecosystem, including Kafka Connect, Kafka Streams, and KSQL, helps to build, deploy, score, and monitor analytic models
• Comparison and trade-offs between this modern streaming approach and traditional batch model training infrastructures
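The "no intermediate data store" idea above can be illustrated with a self-contained sketch in which a Python generator stands in for the Kafka consumer and single-weight SGD stands in for TensorFlow training; all names and numbers below are illustrative assumptions:

```python
# Sketch: train incrementally from a stream instead of loading a batch from
# HDFS/S3. A generator simulates the Kafka consumer; plain SGD on one weight
# simulates the TensorFlow model.
import random

def stream(n, true_w=3.0, seed=42):
    """Simulated message stream: each message is a (feature, label) pair."""
    rng = random.Random(seed)
    for _ in range(n):
        x = rng.uniform(-1, 1)
        yield x, true_w * x + rng.gauss(0, 0.01)

def train_online(messages, lr=0.1):
    """One SGD step per consumed message; no data is ever stored."""
    w = 0.0
    for x, y in messages:
        w -= lr * 2 * (w * x - y) * x   # gradient of squared error
    return w

w = train_online(stream(5000))
print(round(w, 2))  # close to the true weight 3.0
```

Reprocessing for a different model is just re-consuming the same stream from the beginning, which is exactly the replay property Kafka provides.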
Transform and Bridge the Digital Disconnect with SAP Solutions – Capgemini
Discover a customer information system that delivers utility and property tax bills to citizens using a CRM and billing solution, a multichannel foundation, and a computing platform, all from SAP.
See how the City of Kitchener, with Capgemini, started a transformation journey to deliver top services to those who have chosen Kitchener as their home.
Presented at SAPPHIRE NOW 2016.
DataWorks Summit 2017 - Sydney Keynote
Scott Gnau, Chief Technology Officer, Hortonworks
Data has become the most valuable asset for every enterprise. As businesses undergo data transformation, leading organizations are turning to data science and machine learning to drive more business value out of their data. In this talk, Scott will examine the trends and the key requirements needed to evolve to next-generation analytics and operations.
Learnings from our Journey to Become an Event-Driven Customer Data Platform (... – Confluent
In this talk, veteran Kafka success expert Mitch Henderson will cover the most common mistakes people make when first adopting Apache Kafka. We will start with smaller single-application deployments and work our way up to common mistakes made by larger centralized platform deployments. Among other things, we'll discuss:
o Rules of thumb for initial infrastructure and deployment considerations
o Protecting cluster stability from rogue applications hell-bent on destruction
o Single large cluster or many small ones? How to decide?
o Getting users on board and doing productive work
After this session, you'll be able to avoid the common mistakes and confidently guide your team toward a better class of mistakes: those that are directly relevant to the value you add to the business.
Open Source Innovations in the MapR Ecosystem Pack 2.0 – MapR Technologies
Over the summer, we introduced the MapR Ecosystem Pack (MEP), a natural evolution of our existing software update program that decouples open source ecosystem updates from core platform updates. MEP gives our customers quick access to the latest open source innovations while also ensuring cross-project compatibility in any given MEP version.
Demystifying AI, Machine Learning and Deep Learning – Carol McDonald
Deep learning, machine learning, artificial intelligence: all are buzzwords and representative of the future of analytics. In this talk we will explain what machine learning and deep learning are at a high level, with some real-world examples. The goal is not to turn you into a data scientist, but to give you a better understanding of what you can do with machine learning. Machine learning is becoming more accessible to developers, and data scientists work with domain experts, architects, developers, and data engineers, so it is important for everyone to have a better understanding of the possibilities. Every piece of information that your business generates has the potential to add value. This and future posts are meant to provoke a review of your own data to identify new opportunities.
We’re in the midst of an exciting paradigm shift in how we process event data in real time to better react to business opportunities or risks. To stay ahead of your competition, you need the ability to react to business-critical events as they happen. These critical events are created through diverse sources such as social interactions, machine sensors, or customer transactions. How can you understand the meaning and context of these events that ultimately define your business?
Applying Machine Learning to IoT: End-to-End Distributed Pipeline... – Carol McDonald
This talk discusses the architecture of an end-to-end application that combines streaming data with machine learning to perform real-time analysis and visualization of where and when Uber cars are clustered, in order to identify the most popular Uber pickup locations.
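As a rough, self-contained stand-in for the clustering step (the talk itself uses a distributed pipeline; the coordinates below are synthetic and the tiny k-means is illustrative only):

```python
# Illustrative sketch: cluster (lat, lon) pickup points with a tiny k-means,
# a stand-in for the distributed clustering in the real pipeline.
import math, random

def kmeans(points, k=2, iters=20):
    # simple deterministic seeding for the sketch: evenly spaced points
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:                       # assign to nearest center
            j = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[j].append(p)
        centers = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]   # recompute means
    return centers

# Two synthetic pickup hotspots around (40.75, -73.99) and (40.64, -73.78).
rng = random.Random(0)
pts = [(40.75 + rng.gauss(0, 0.01), -73.99 + rng.gauss(0, 0.01)) for _ in range(50)]
pts += [(40.64 + rng.gauss(0, 0.01), -73.78 + rng.gauss(0, 0.01)) for _ in range(50)]
centers = kmeans(pts, k=2)
print(sorted(centers))  # one center lands near each hotspot
```

In the real application the same assign/recompute loop runs in a distributed engine, and new GPS events are scored against the fitted centers in real time.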
Extending the Reach of R to the Enterprise with TERR and Spotfire – Lou Bajuk
An overview of how TIBCO integrates dynamic, interactive visual applications in Spotfire with predictive and advanced analytics in R, using TIBCO Enterprise Runtime for R (TERR), our R-compatible, enterprise-grade platform for the R language.
Performance and Scale Options for R with Hadoop: A comparison of potential ar... – Revolution Analytics
R and Hadoop go together. In fact, they go together so well that the number of options available can be confusing to IT and data science teams seeking solutions under varying performance and operational requirements.
Which configuration is faster for big files? Which is faster for sharing data and servers among groups? Which eliminates data movement? Which is easiest to manage? Which works best with iterative and multistep algorithms? What are the hardware requirements of each alternative?
This webinar is intended to help new users of R with Hadoop select the best architecture for integrating Hadoop and R, by explaining the benefits of several popular configurations, along with their performance potential, workload handling, programming model, and administrative characteristics.
Presenters from Revolution Analytics will describe the options for using Revolution R Open and Revolution R Enterprise with Hadoop, including servers, edge nodes, rHadoop, and ScaleR. We’ll then compare the characteristics of each configuration with regard not only to performance but also to programming model, administration, data movement, ease of scaling, mixed workload handling, and performance for large individual analyses versus mixed workloads.
Practical Machine Learning: Innovations in Recommendation Workshop – MapR Technologies
Ted Dunning, committer for Apache Mahout, Drill & ZooKeeper, presents on:
1. How to build a production quality recommendation engine using Mahout and Solr or Elasticsearch
2. How to build a multi-modal recommendation from multiple behavioral inputs
3. How search engines can be used for more than just text
This talk will present a detailed tear-down and walk-through of a working soup-to-nuts recommendation engine that uses observations of multiple kinds of behavior to do combined recommendation and cross recommendation. The system uses Mahout to do off-line analysis and can use Solr or Elasticsearch to provide real-time recommendations. The talk will also include enough theory to provide useful working intuitions for those desiring to adapt this design.
The entire system including a data generator, off-line analysis scripts, Solr and Elasticsearch configurations and sample web pages will be made available on github for attendees to modify as they like.
Building recommendation engines by abusing a search engine has been well known for some time to a small subculture in the recommendation community, but techniques for building multi-modal recommendation engines are not at all well known.
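The indicator computation behind such search-based recommenders can be sketched with Dunning's log-likelihood ratio (LLR) test over co-occurrence counts. The shopping histories below are made up for illustration, and a real system would index the surviving pairs into Solr or Elasticsearch:

```python
# Sketch: score item co-occurrence with Dunning's log-likelihood ratio and
# keep the anomalously frequent pairs as recommendation "indicators".
import math
from itertools import combinations

def llr(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 co-occurrence contingency table."""
    def h(*ks):                      # sum of k*log(k/N) over the counts
        n = sum(ks)
        return sum(k * math.log(k / n) for k in ks if k > 0)
    return 2 * (h(k11, k12, k21, k22)
                - h(k11 + k12, k21 + k22)      # row totals
                - h(k11 + k21, k12 + k22))     # column totals

def indicators(histories, threshold=1.0):
    """Return item pairs whose co-occurrence is anomalous under LLR."""
    item_count, pair_count = {}, {}
    for items in histories:
        for i in set(items):
            item_count[i] = item_count.get(i, 0) + 1
        for a, b in combinations(sorted(set(items)), 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1
    n = len(histories)
    scores = {}
    for (a, b), k11 in pair_count.items():
        k12 = item_count[a] - k11              # a without b
        k21 = item_count[b] - k11              # b without a
        k22 = n - k11 - k12 - k21              # neither
        scores[(a, b)] = llr(k11, k12, k21, k22)
    return {p: s for p, s in scores.items() if s > threshold}

histories = ([["cold_medicine", "tissues"]] * 5
             + [["bread", "milk"]] * 5
             + [["bread"], ["milk"], ["tissues"]])
res = indicators(histories)
print(sorted(res))  # the two genuinely associated pairs survive
```

Each surviving pair becomes an indicator field on the item document, so "recommendation" reduces to an ordinary search query over the user's recent history.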
Real-time Puppies and Ponies - Evolving Indicator Recommendations in Real-time – Ted Dunning
This talk describes how indicator-based recommendations can be evolved in real time. Normally, indicator-based recommendations use a large off-line computation to understand the general structure of items to be recommended and then make recommendations in real-time to users based on a comparison of their recent history versus the large-scale product of the off-line computation.
In this talk, I show how the same components of the off-line computation that guarantee linear scalability in a batch setting also give strict real-time bounds on the cost of a practical real-time implementation of the indicator computation.
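A toy illustration of that real-time bound: if each user's history is capped, updating co-occurrence counts for a new event touches at most that many pairs, so each event costs bounded work. The class and cap below are hypothetical, not from the talk:

```python
# Sketch of a strict per-event bound: a new interaction only updates pairs
# between the new item and the user's (length-capped) recent history.
from collections import defaultdict

MAX_HISTORY = 100  # the cap that yields the O(MAX_HISTORY) per-event bound

class OnlineCooccurrence:
    def __init__(self):
        self.history = defaultdict(list)     # user -> recent items
        self.pair_count = defaultdict(int)   # (a, b) -> co-occurrence count

    def observe(self, user, item):
        for other in self.history[user]:     # at most MAX_HISTORY pairs
            if other != item:
                self.pair_count[tuple(sorted((other, item)))] += 1
        h = self.history[user]
        h.append(item)
        if len(h) > MAX_HISTORY:
            h.pop(0)                         # forget the oldest item

m = OnlineCooccurrence()
for item in ["a", "b", "c"]:
    m.observe("u1", item)
print(dict(m.pair_count))  # {('a','b'): 1, ('a','c'): 1, ('b','c'): 1}
```

The same downsampling that makes the batch computation linearly scalable is what keeps this per-event update cheap.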
From the Hadoop Summit 2015 Session with Tomer Shiran.
To deliver real-time impact from big data, organizations must evolve beyond traditional analytic approaches to support a new class of agile, distributed applications. Real-time Hadoop overcomes the limits of batch programs that rely on data transformations and schema management. This session highlights how leading organizations are leveraging Hadoop and NoSQL to merge analytics and production data, making adjustments while business is happening to optimize revenue, mitigate risk, and reduce operational costs. Details include how companies have achieved real-time impact on their business, collapsed data silos, and automated in-line analytics with operational data for immediate impact.
How Spark is Enabling the New Wave of Converged Applications – MapR Technologies
Apache Spark has become the de-facto compute engine of choice for data engineers, developers, and data scientists because of its ability to run multiple analytic workloads with a single compute engine. Spark is speeding up data pipeline development, enabling richer predictive analytics, and bringing a new class of applications to market.
Using Familiar BI Tools and Hadoop to Analyze Enterprise Networks – MapR Technologies
From the Hadoop Summit 2015 Session with Nick Amato.
This session examines practical ways you can begin leveraging network data sources in Hadoop using familiar technologies like SQL and BI tools. Using the diverse sets of sources available, such as traces, routing protocol data, and direct packet captures from critical network locations, we will examine the capabilities of BI tools in the network context and examine cases for extracting value from data collected from the network infrastructure.
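As a small illustration of the "familiar SQL tools" angle, here is a hedged sketch using sqlite3 as a stand-in for a SQL-on-Hadoop engine such as Hive or Drill, with made-up flow records:

```python
# Sketch: load network flow records into a table and ask BI-style questions
# with plain SQL (sqlite3 here stands in for a SQL-on-Hadoop engine).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE flows
                (src TEXT, dst TEXT, bytes INTEGER, proto TEXT)""")
conn.executemany("INSERT INTO flows VALUES (?, ?, ?, ?)", [
    ("10.0.0.5", "10.0.0.9", 1200, "tcp"),
    ("10.0.0.5", "10.0.0.7", 800,  "udp"),
    ("10.0.0.8", "10.0.0.9", 300,  "tcp"),
])
# "Top talker": which host sends the most traffic?
top = conn.execute("""SELECT src, SUM(bytes) AS total
                      FROM flows GROUP BY src
                      ORDER BY total DESC LIMIT 1""").fetchone()
print(top)  # ('10.0.0.5', 2000)
```

The same GROUP BY/ORDER BY query shape works unchanged against packet captures or routing data once they land in a Hadoop-backed SQL table, which is exactly what lets existing BI tools plug in.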
How Spark is Enabling the New Wave of Converged Cloud Applications – MapR Technologies
Apache Spark has become the de-facto compute engine of choice for data engineers, developers, and data scientists because of its ability to run multiple analytic workloads with a single, general-purpose compute engine.
But is Spark alone sufficient for developing cloud-based big data applications? What are the other required components for supporting big data cloud processing? How can you accelerate the development of applications which extend across Spark and other frameworks such as Kafka, Hadoop, NoSQL databases, and more?
Presented at MLConf in Seattle, this presentation offers a quick introduction to Apache Spark, followed by an overview of two novel features for data science.
MapR 5.2: Getting More Value from the MapR Converged Data Platform – MapR Technologies
End of maintenance for MapR 4.x is coming in January, so now is a good time to plan your upgrade. Please join us to learn about the recent developments during the past year in the MapR Platform that will make the upgrade effort this year worthwhile.
Tech-Talk at Bay Area Spark Meetup
Apache Spark™ has rapidly become a key tool for data scientists to explore, understand, and transform massive datasets and to build and train advanced machine learning models. The question then becomes: how do I deploy these models to a production environment? How do I embed what I have learned into customer-facing data applications? Like all things in engineering, it depends.
In this meetup, we will discuss best practices from Databricks on how our customers productionize machine learning models and do a deep dive with actual customer case studies and live demos of a few example architectures and code in Python and Scala. We will also briefly touch on what is coming in Apache Spark 2.X with model serialization and scoring options.
Practical Machine Learning Pipelines with MLlib – Databricks
This talk from 2015 Spark Summit East discusses Pipelines and related concepts introduced in Spark 1.2 which provide a simple API for users to set up complex ML workflows.
Apache Spark MLlib 2.0 Preview: Data Science and Production – Databricks
This talk highlights major improvements in Machine Learning (ML) targeted for Apache Spark 2.0. The MLlib 2.0 release focuses on ease of use for data science—both for casual and power users. We will discuss 3 key improvements: persisting models for production, customizing Pipelines, and improvements to models and APIs critical to data science.
(1) MLlib simplifies moving ML models to production by adding full support for model and Pipeline persistence. Individual models—and entire Pipelines including feature transformations—can be built on one Spark deployment, saved, and loaded onto other Spark deployments for production and serving.
(2) Users will find it much easier to implement custom feature transformers and models. Abstractions automatically handle input schema validation, as well as persistence for saving and loading models.
(3) For statisticians and data scientists, MLlib has doubled down on Generalized Linear Models (GLMs), which are key algorithms for many use cases. MLlib now supports more GLM families and link functions, handles corner cases more gracefully, and provides more model statistics. Also, expanded language APIs allow data scientists using Python and R to call many more algorithms.
Finally, we will demonstrate these improvements live and show how they facilitate getting started with ML on Spark, customizing implementations, and moving to production.
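The save-on-one-deployment, load-on-another workflow described in point (1) can be sketched without Spark at all. The snippet below is a stdlib-only stand-in: a toy `TinyModel` with hypothetical `save`/`load` methods, not the MLlib API, illustrating the round trip a persisted model makes from the training environment to a serving environment.

```python
import json
import os
import tempfile

class TinyModel:
    """Toy 'model' computing y = a*x + b; stands in for a fitted ML model."""
    def __init__(self, a, b):
        self.a, self.b = a, b

    def predict(self, x):
        return self.a * x + self.b

    def save(self, path):
        # Persist the learned parameters, not the live object.
        with open(path, "w") as f:
            json.dump({"a": self.a, "b": self.b}, f)

    @classmethod
    def load(cls, path):
        # Rebuild an equivalent model on another deployment.
        with open(path) as f:
            params = json.load(f)
        return cls(params["a"], params["b"])

path = os.path.join(tempfile.mkdtemp(), "model.json")
trained = TinyModel(a=2.0, b=1.0)   # pretend this came out of training
trained.save(path)

served = TinyModel.load(path)       # e.g. on a separate serving cluster
print(served.predict(3.0))          # 7.0
```

In MLlib itself, the analogous round trip uses the model's own `save` method and the corresponding `load`, and a persisted Pipeline carries its feature transformations along with the model.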
Achieving Business Value by Fusing Hadoop and Corporate Data – Inside Analysis
The Briefing Room with Richard Hackathorn and Teradata
Live Webcast March 25, 2015
Watch the Archive: https://bloorgroup.webex.com/bloorgroup/onstage/g.php?MTID=e7254708146d056339a0974f097f569b2
Hadoop data lakes are emerging as peers to corporate data warehouses. However, successful analytic solutions require a fusion of all relevant data, big and small, which has proven challenging for many companies. By allowing business analysts to quickly access data wherever it rests, success factors shift to focus on three key aspects: 1) business objectives, 2) organizational workflow, and 3) data placement.
Register for this Special Edition of The Briefing Room to hear veteran Analyst Richard Hackathorn as he provides details from his recent research report focused on success stories using Teradata QueryGrid. Examples of use cases described will include:
Joining sensor data in Hadoop with data warehouse labor schedules in seconds
How bridging corporate cultures and systems creates new business opportunities
The 360 view of customer journeys using weblogs in Hadoop via BI tools
Putting the data where you want and querying it however you want
Virtualizing Hadoop data with Teradata QueryGrid
Visit InsideAnalysis.com for more information.
What is the future of Hadoop?
What is the new future of Hadoop?
How is that different from the old one?
Here is how Ted Dunning answered these questions at the winter Hadoop Conference of Japan 2013.
The unification of big and little data processing onto a single platform is an important requirement for Hadoop. How can this be achieved? Ted Dunning explains what is needed for three important use cases.
Big Data Solutions on Cloud – The Way Forward by Kiththi Perera, SLT – Kiththi Perera
ITU-TRCSL Symposium on Cloud Computing 2015 Colombo
Session 04: Big Data Strategy in the Cloud and Applications
Speaker's PPT by K. A. Kiththi Perera, Chief Enterprise and Wholesale Officer, Sri Lanka Telecom
Network and IT Ops Series: Build Production Solutions – Neo4j
Jeff Morris, Director, Neo4j: Are you building a breakthrough product or extending an existing one? Do you need to introduce new capabilities based on insights from data relationships? If so, you should consider embedding a graph database.
For software providers building products to assure quality network operations or security, using an embedded graph database may open new customer opportunities. Watch this webinar to learn how you can easily differentiate your applications and take your solutions to market faster with a native graph database like Neo4j.
SAP Leonardo - what is it, and why would I want one? – Tom Raftery
A quick run-through of the technologies in SAP's innovation portfolio of products, called SAP Leonardo, and use cases where it has been deployed successfully with customers.
Recent work in recommendations allows some really amazing simplicity of implementation while extending the inputs handled to multiple kinds of interactions against items different from the ones being recommended.
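A minimal sketch of the co-occurrence idea behind such recommenders, in stdlib Python with made-up interaction data. A real system would score item pairs with a statistical test such as the log-likelihood ratio rather than raw counts, and could mix in several kinds of interactions; raw counts keep the sketch short.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical user interaction histories (one list per user)
histories = [
    ["apple", "banana", "milk"],
    ["apple", "banana"],
    ["banana", "milk"],
    ["apple", "milk", "bread"],
]

# Count how often each pair of items appears in the same history
cooc = defaultdict(Counter)
for h in histories:
    for a, b in combinations(set(h), 2):
        cooc[a][b] += 1
        cooc[b][a] += 1

def recommend(item, k=2):
    """Items most often co-occurring with `item` (a stand-in for the
    anomaly-scored indicators a production recommender would compute)."""
    return [i for i, _ in cooc[item].most_common(k)]

print(sorted(recommend("apple")))   # ['banana', 'milk']
```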
Multi-Cloud Breaks IT Ops: Best Practices to De-Risk Your Cloud Strategy – ThousandEyes
Organizations are using multiple IaaS and SaaS providers today, yet traditional ITOps processes and tools are straining to cope with a vast new scope of challenges and risks. Recent research by Enterprise Management Associates (EMA) shows that 74% of enterprise network teams found their incumbent network monitoring tools failing to address cloud requirements. As the IT business leaders responsible for delivering services in this new ecosystem, how do you equip yourself with the right visibility?
Shamus McGillicuddy, Research Director for EMA’s network management practice, and Archana Kesavan, Director of Product Marketing at ThousandEyes dive deep into the challenges of multi-cloud and how to rethink your monitoring strategy and operational delivery processes.
Uncover:
Five common IT operational challenges of multi-cloud identified in recent EMA research
The risks of not evolving ITOps for a managed cloud environment
Four monitoring best practices for a cloud-centric IT Operation
Serhii Kholodniuk: What you need to know, before migrating data platform to GCP (Google cloud platform) – Lviv Startup Club
AI & BigData Online Day 2022
Website: https://aiconf.com.ua
Youtube: https://www.youtube.com/startuplviv
FB: https://www.facebook.com/aiconf
MapR Technologies Chief Marketing Officer, Jack Norris, talks about the advantages of Hadoop. He elaborates on multiple use cases and explains why MapR Technologies offers the best Hadoop distribution.
How Data-Driven Approaches are Changing Your Data Management Strategies
Introducing data-driven strategies into your business model alters the way your organization manages and provides information to your customers, partners and employees. Gone are the days of “waterfall” implementation strategies from relational data to applications within a data center. Now, data-driven business models require agile implementation of applications based on information from all across an organization–on-premises, cloud, and mobile–and includes information from outside corporate walls from partners, third-party vendors, and customers. Data management strategies need to be ready to meet these challenges or your new and disruptive business models will fail at the most critical time: when your customers want to access it.
ML Workshop 2: Machine Learning Model Comparison & Evaluation – MapR Technologies
How Rendezvous Architecture Improves Evaluation in the Real World
In this installment of our machine learning logistics webinar series, we build on the key requirements for effective management of machine learning logistics presented in the Overview webinar and in the Part 1 workshop. Here we focus on model-to-model comparison and evaluation, the use of decoy models, and more. Listen here: http://info.mapr.com/machine-learning-workshop2.html?_ga=2.35695522.324200644.1511891424-416597139.1465233415
Self-Service Data Science for Leveraging ML & AI on All of Your Data – MapR Technologies
MapR has launched the MapR Data Science Refinery which leverages a scalable data science notebook with native platform access, superior out-of-the-box security, and access to global event streaming and a multi-model NoSQL database.
Enabling Real-Time Business with Change Data Capture – MapR Technologies
Machine learning (ML) and artificial intelligence (AI) enable intelligent processes that can autonomously make decisions in real-time. The real challenge for effective ML and AI is getting all relevant data to a converged data platform in real-time, where it can be processed using modern technologies and integrated into any downstream systems.
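To make the change-data-capture idea concrete, here is a minimal stdlib Python sketch: a hypothetical feed of insert/update/delete change events is replayed to keep a downstream materialized view in sync. A real CDC pipeline would read these events from a stream (e.g. Kafka or MapR Streams) rather than a list.

```python
# Hypothetical change-event feed, as a CDC source might emit it
events = [
    {"op": "insert", "key": "cust-1", "value": {"tier": "silver"}},
    {"op": "insert", "key": "cust-2", "value": {"tier": "gold"}},
    {"op": "update", "key": "cust-1", "value": {"tier": "gold"}},
    {"op": "delete", "key": "cust-2"},
]

view = {}  # downstream materialized view kept in sync in real time
for e in events:
    if e["op"] == "delete":
        view.pop(e["key"], None)     # tolerate deletes of unknown keys
    else:
        view[e["key"]] = e["value"]  # insert and update both upsert

print(view)  # {'cust-1': {'tier': 'gold'}}
```

Because every downstream state is derived by replaying the event log, the same feed can drive any number of consumers: a cache, a search index, or the feature store behind an ML model.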
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ... – MapR Technologies
Big data technologies are being applied to a wide variety of use cases. We will review tangible examples of machine learning, discuss an autonomous driving project and illustrate the role of MapR in next generation initiatives. More: http://info.mapr.com/WB_Machine-Learning-for-Chickens_Global_DG_17.11.02_RegistrationPage.html
ML Workshop 1: A New Architecture for Machine Learning Logistics – MapR Technologies
Having heard the high-level rationale for the rendezvous architecture in the introduction to this series, we will now dig in deeper to talk about how and why the pieces fit together. In terms of components, we will cover why streams work, why they need to be persistent, performant and pervasive in a microservices design and how they provide isolation between components. From there, we will talk about some of the details of the implementation of a rendezvous architecture including discussion of when the architecture is applicable, key components of message content and how failures and upgrades are handled. We will touch on the monitoring requirements for a rendezvous system but will save the analysis of the recorded data for later. Listen to the webinar on demand: https://mapr.com/resources/webinars/machine-learning-workshop-1/
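As a rough sketch of the rendezvous idea (hypothetical toy models, with synchronous calls standing in for the stream-based dispatch the webinar describes): every model scores the same request, all results are retained for comparison, and only the preferred model's answer is returned to the caller.

```python
def primary(x):
    """Hypothetical model currently trusted in production."""
    return x * 2

def challenger(x):
    """Hypothetical candidate model being evaluated against live traffic."""
    return x * 2 + 1

def rendezvous(x, models, prefer="primary"):
    """Run every model on the same request and keep all scores for
    later comparison, but answer with only the preferred model's result.
    In a real rendezvous architecture the requests and results travel
    over persistent streams rather than direct function calls."""
    results = {name: fn(x) for name, fn in models.items()}
    return results[prefer], results

answer, all_scores = rendezvous(21, {"primary": primary, "challenger": challenger})
print(answer)      # 42
print(all_scores)  # {'primary': 42, 'challenger': 43}
```

Keeping every model's score for every request is what later enables the recorded-data analysis mentioned above: a challenger can be compared against the primary on identical inputs before it is ever promoted.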
Machine Learning Success: The Key to Easier Model Management – MapR Technologies
Join Ellen Friedman, co-author (with Ted Dunning) of a new short O’Reilly book Machine Learning Logistics: Model Management in the Real World, to look at what you can do to have effective model management, including the role of stream-first architecture, containers, a microservices approach and a DataOps style of work. Ellen will provide a basic explanation of a new architecture that not only leverages stream transport but also makes use of canary models and decoy models for accurate model evaluation and for efficient and rapid deployment of new models in production.
Data Warehouse Modernization: Accelerating Time-To-Action – MapR Technologies
Data warehouses have been the standard tool for analyzing data created by business operations. In recent years, increasing data volumes, new types of data formats, and emerging analytics technologies such as machine learning have given rise to modern data lakes. Connecting application databases, data warehouses, and data lakes using real-time data pipelines can significantly improve the time to action for business decisions. More: http://info.mapr.com/WB_MapR-StreamSets-Data-Warehouse-Modernization_Global_DG_17.08.16_RegistrationPage.html
Live Tutorial – Streaming Real-Time Events Using Apache APIs – MapR Technologies
For this talk we will explore the power of streaming real time events in the context of the IoT and smart cities.
http://info.mapr.com/WB_Streaming-Real-Time-Events_Global_DG_17.08.02_RegistrationPage.html
Bringing Structure, Scalability, and Services to Cloud-Scale Storage – MapR Technologies
Deploying storage with a forklift is so 1990s, right? Today’s applications and infrastructure demand systems and services that scale. Customers require performance and capacity to fit the use case and workloads, not the other way around. Architects need multi-temperature, multi-location, highly available, and compliance friendly platforms that grow with the generational shift in data growth and utility.
Churn prediction is big business. It minimizes customer defection by predicting which customers are likely to cancel a service. Though originally used within the telecommunications industry, it has become common practice for banks, ISPs, insurance firms, and other verticals. More: http://info.mapr.com/WB_PredictingChurn_Global_DG_17.06.15_RegistrationPage.html
The prediction process is data-driven and often uses advanced machine learning techniques. In this webinar, we'll look at customer data, do some preliminary analysis, and generate churn prediction models – all with Spark machine learning (ML) and a Zeppelin notebook.
The goal of Spark’s ML library is to make machine learning scalable and easy. Zeppelin with Spark provides a web-based notebook that enables interactive machine learning and visualization.
In this tutorial, we'll do the following:
Review classification and decision trees
Use Spark DataFrames with Spark ML pipelines
Predict customer churn with Apache Spark ML decision trees
Use Zeppelin to run Spark commands and visualize the results
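As a stdlib-only stand-in for the Spark ML decision trees used in the tutorial (with made-up customer records, not the tutorial's dataset), the following finds the single threshold split that best separates churners. A real decision tree learner simply applies this kind of split search recursively, using an impurity measure instead of raw error count.

```python
# Toy customer records: (day_minutes, service_calls, churned?)
customers = [
    (120, 1, False), (250, 4, True), (90, 0, False),
    (300, 5, True), (180, 1, False), (265, 3, True),
]

def best_threshold(rows, feature):
    """Pick the split on one feature that misclassifies the fewest rows --
    the single-node core of what a decision tree learner repeats."""
    best = (None, len(rows) + 1)
    for t in sorted({r[feature] for r in rows}):
        errors = sum((r[feature] >= t) != r[2] for r in rows)
        if errors < best[1]:
            best = (t, errors)
    return best

threshold, errors = best_threshold(customers, 0)  # split on day minutes
predict = lambda minutes: minutes >= threshold    # predicted churn flag
print(threshold, errors)  # 250 0
```

On this toy data, customers with 250 or more daytime minutes all churned, so the stump classifies the training set perfectly; real churn data is far noisier, which is why the tutorial grows full trees over many features.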
An Introduction to the MapR Converged Data Platform – MapR Technologies
Listen to the webinar on-demand: http://info.mapr.com/WB_Partner_CDP_Intro_EMEA_DG_17.05.31_RegistrationPage.html
In this 90-minute webinar, we discuss:
- The MapR Converged Data Platform and its components
- Use cases for the Converged Data Platform
- MapR Converged Partner Program
- How to get started with MapR
- Becoming a partner
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon... – MapR Technologies
IT budgets are shrinking, and the move to next-generation technologies is upon us. The cloud is an option for nearly every company, but just because it is an option doesn’t mean it is always the right solution for every problem.
Most cloud providers would prefer that every customer be tightly coupled with their proprietary services and APIs to create lock-in with that cloud provider. The savvy customer will leverage the cloud as infrastructure and stay loosely bound to a cloud provider. This creates an opportunity for the customer to execute a multicloud strategy or even a hybrid on-premises and cloud solution.
Jim Scott explores different use cases that may be best run in the cloud versus on-premises, points out opportunities to optimize cost and operational benefits, and explains how to get the data moved between locations. Along the way, Jim discusses security, backups, event streaming, databases, replication, and snapshots across a variety of use cases that run most businesses today.
Is your organization at the analytics crossroads? Have you made strides collecting and sharing massive amounts of data from electronic health records, insurance claims, and health information exchanges but found these efforts made little impact on efficiency, patient outcomes, or costs?
Changes in how business is done combined with multiple technology drivers make geo-distributed data increasingly important for enterprises. These changes are causing serious disruption across a wide range of industries, including healthcare, manufacturing, automotive, telecommunications, and entertainment. Technical challenges arise with these disruptions, but the good news is there are now innovative solutions to address these problems. http://info.mapr.com/WB_Geo-distributed-Big-Data-and-Analytics_Global_DG_17.05.16_RegistrationPage.html
MapR announced a few new releases in 2017, and we want to go over those exciting new products and features that are available now. We’d like to invite our customers and partners to this webinar in which members of the MapR product team will share details about the latest updates.
3 Benefits of Multi-Temperature Data Management for Data Analytics – MapR Technologies
SAP® HANA and SAP® IQ are popular platforms for various analytical and transactional use cases. If you’re an SAP customer, you’ve experienced the benefits of deploying these solutions. However, as data volumes grow, you’re likely asking yourself: How do I scale storage to support these applications? How can I have one platform for various applications and use cases?
Cisco & MapR bring 3 Superpowers to SAP HANA Deployments – MapR Technologies
SAP HANA is an increasingly popular platform for various analytical and transactional use cases with its in-memory architecture. If you’re an SAP customer you’ve experienced the benefits.
However, the underlying storage for SAP HANA is painfully expensive. This slows down your ability to grow your SAP HANA footprint and serve up more applications.
You’re not the only one still loading your data into data warehouses and building marts or cubes out of it. But today’s data requires a much more accessible environment that delivers real-time results. Prepare for this transformation, because your data platform and storage choices are about to undergo a re-platforming that happens once every 30 years.
With the MapR Converged Data Platform (CDP) and Cisco Unified Compute System (UCS), you can optimize today’s infrastructure and grow to take advantage of what’s next. Uncover the range of possibilities from re-platforming by intimately understanding your options for density, performance, functionality and more.
Generating a custom Ruby SDK for your web service or Rails API using Smithy – g2nightmarescribd
Have you ever wanted a Ruby client API to communicate with your web service? Smithy is a protocol-agnostic language for defining services and SDKs. Smithy Ruby is an implementation of Smithy that generates a Ruby SDK using a Smithy model. In this talk, we will explore Smithy and Smithy Ruby to learn how to generate custom feature-rich SDKs that can communicate with any web service, such as a Rails JSON API.
Connector Corner: Automate dynamic content and events by pushing a button – DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Neuro-symbolic is not enough, we need neuro-*semantic* – Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply applying machine learning to just any symbolic structure is not sufficient to really harvest the gains of NeSy. These gains will only be realized when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... – UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Epistemic Interaction - tuning interfaces to provide information for AI support – Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
DevOps and Testing slides at DASA Connect – Kari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We ended with a lovely workshop in which participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
GraphRAG is All You need? LLM & Knowledge Graph – Guy Korland
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Essentials of Automations: Optimizing FME Workflows with Parameters – Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Elevating Tactical DDD Patterns Through Object Calisthenics – Dorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 – Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
MapR has been selected by two of the companies most experienced with MapReduce technology, which is a testament to the technology advantages of MapR’s distribution. Amazon, through its Elastic MapReduce service (EMR), hosted over 2 million clusters in the past year. Amazon selected MapR to complement EMR as the only commercial Hadoop distribution being offered, sold, and supported as a service by Amazon to its customers. Google – the pioneer of MapReduce and the company whose white paper on MapReduce inspired the creation of Hadoop – has also selected MapR to make our distribution available on Google Compute Engine. Hadoop in the cloud makes a great deal of sense: the elastic resource allocation that cloud computing is premised on works well for cluster-based data processing infrastructure used on varying analyses and data sets of indeterminate size. MapR has unique features, such as mirroring between sites and multi-tenancy support, that further enhance cloud deployments.
MapR is used today across industries. We have 10 of the Fortune 100 using MapR in production. We have leading Web 2.0 properties, such as leading digital advertising platforms, using MapR. These customers are using MapR in production for a variety of use cases. Examples include one of the largest credit card issuers in the world, which has standardized on MapR for fraud and consumer targeting applications. Other examples include a major health care group, national cyber security, and one of the largest retailers in the world. These are all powered by MapR’s complete distribution for Apache Hadoop.