We’re in the midst of an exciting paradigm shift in how we process event data in real time to better react to business opportunities and risks. To stay ahead of your competition, you need the ability to react to business-critical events as they happen. These critical events come from diverse sources such as social interactions, machine sensors, and customer transactions. How can you understand the meaning and context of the events that ultimately define your business?
NoSQL Application Development with JSON and MapR-DB MapR Technologies
NoSQL databases are being used everywhere by startups and Global 2000 companies alike for data environments that require cost-effective scaling. These environments also typically need to represent data in a more flexible way than is practical with relational databases.
Xactly: How to Build a Successful Converged Data Platform with Hadoop, Spark,... MapR Technologies
Big data presents both enormous challenges and incredible opportunities for companies in today’s competitive environment. To deal with the rapid growth of global data, companies have turned to Hadoop to help them with performing real-time search, obtaining fast and efficient analytics, and predicting behaviors and trends. In this session, we’ll demonstrate how we successfully leveraged Hadoop and its ecosystem components to build a converged data infrastructure to meet these needs.
With the general availability of the MapR Converged Data Platform 5.2, we’d like to invite our customers and partners to this webinar in which members of the MapR product team will share details about this exciting new release.
How Spark is Enabling the New Wave of Converged Cloud Applications MapR Technologies
Apache Spark has become the de facto compute engine of choice for data engineers, developers, and data scientists because of its ability to run multiple analytic workloads with a single, general-purpose compute engine.
But is Spark alone sufficient for developing cloud-based big data applications? What are the other required components for supporting big data cloud processing? How can you accelerate the development of applications which extend across Spark and other frameworks such as Kafka, Hadoop, NoSQL databases, and more?
Open Source Innovations in the MapR Ecosystem Pack 2.0 MapR Technologies
Over the summer, we introduced the MapR Ecosystem Pack (MEP), a natural evolution of our existing software update program that decouples open source ecosystem updates from core platform updates. MEP gives our customers quick access to the latest open source innovations while also ensuring cross-project compatibility in any given MEP version.
MapR 5.2: Getting More Value from the MapR Converged Community Edition MapR Technologies
Please join us to learn about the recent developments during the past year in the MapR Community Edition. In these slides, we will cover the following platform updates:
-Taking cluster monitoring to the next level with the Spyglass Initiative
-Real-time streaming with MapR Streams
-MapR-DB JSON document database and application development with OJAI
-Securing your data with access control expressions (ACEs)
We're introducing MapR Streams, a reliable, global event streaming system that connects data producers and data consumers across shared topics of information. With the integration of MapR Streams comes the industry’s first and only converged data platform that integrates file, database, event streaming, and analytics to accelerate data-driven applications and address emerging IoT needs.
Are you ready to accelerate your business with the power of a truly global platform for integrating data-in-motion with data-at-rest?
You’re not the only one still loading your data into data warehouses and building marts or cubes out of it. But today’s data requires a much more accessible environment that delivers real-time results. Prepare for this transformation because your data platform and storage choices are about to undergo a re-platforming that happens once in 30 years.
With the MapR Converged Data Platform (CDP) and Cisco Unified Compute System (UCS), you can optimize today’s infrastructure and grow to take advantage of what’s next. Uncover the range of possibilities from re-platforming by intimately understanding your options for density, performance, functionality and more.
Insight Platforms Accelerate Digital Transformation MapR Technologies
Many organizations have invested in big data technologies such as Hadoop and Spark. But these investments only address how to gain deeper insights from more diverse data. They do not address how to create action from those insights.
Forrester has identified an emerging class of software—insight platforms—that combine data, analytics, and insight execution to drive action using a big data fabric.
In this presentation, our guest, Forrester Research VP and Principal Analyst, Brian Hopkins, will:
o Present Forrester's recent research on insight platforms and big data fabrics.
o Provide strategies for getting more value from your big data investments.
MapR will share:
o Examples of leading companies and best practices for creating modern applications.
o How to combine analytics and operations to accelerate digital transformation and create competitive advantage.
MapR on Azure: Getting Value from Big Data in the Cloud - MapR Technologies
Public cloud adoption is exploding and big data technologies are rapidly becoming an important driver of this growth. According to Wikibon, big data public cloud revenue will grow from 4.4% in 2016 to 24% of all big data spend by 2026. Digital transformation initiatives are now a priority for most organizations, with data and advanced analytics at the heart of enabling this change. This is key to driving competitive advantage in every industry.
There is nothing better than a real-world customer use case to help you understand how to get value from big data in the cloud and apply the learnings to your business. Join Microsoft, MapR, and Sullexis on November 10th to:
Hear from Sullexis on the business use case and technical implementation details of one of their oil & gas customers
Understand the integration points of the MapR Platform with other Azure services and why they matter
Know how to deploy the MapR Platform on the Azure cloud and get started easily
You will also get to hear about customer use cases of the MapR Converged Data Platform on Azure in other verticals such as real estate and retail.
Speakers
Rafael Godinho
Technical Evangelist
Microsoft Azure
Tim Morgan
Managing Director
Sullexis
MapR 5.2: Getting More Value from the MapR Converged Data Platform MapR Technologies
End of maintenance for MapR 4.x is coming in January, so now is a good time to plan your upgrade. Please join us to learn about the recent developments during the past year in the MapR Platform that will make the upgrade effort this year worthwhile.
How Big Data is Reducing Costs and Improving Outcomes in Health Care Carol McDonald
There is no better example of the important role that data plays in our lives than in matters of our health and our healthcare. There’s a growing wealth of health-related data out there, and it’s playing an increasing role in improving patient care, population health, and healthcare economics.
Join this talk to hear how MapR customers are using big data and advanced analytics to address a myriad of healthcare challenges—from patient to payer.
We will cover big data healthcare trends and production use cases that demonstrate how to deliver data-driven healthcare applications.
Real World Use Cases: Hadoop and NoSQL in Production Codemotion
"Real World Use Cases: Hadoop and NoSQL in Production" by Tugdual Grall.
What’s important about a technology is what you can use it to do. I’ve looked at what a number of groups are doing with Apache Hadoop and NoSQL in production, and I will relay what worked well for them and what did not. Drawing from real-world use cases, I show how people who understand these new approaches can employ them well in conjunction with traditional approaches and existing applications. Threat detection, data warehouse optimization, marketing efficiency, and biometric databases are some of the examples presented in this talk.
Anomaly Detection in Telecom with Spark - Tugdual Grall - Codemotion Amsterda... Codemotion
Telecom operators need to find operational anomalies in their networks very quickly. This need, however, is shared with many other industries as well so there are lessons for all of us here. Spark plus a streaming architecture can solve these problems very nicely. I will present both a practical architecture as well as design patterns and some detailed algorithms for detecting anomalies in event streams. These algorithms are simple but quite general and can be applied across a wide variety of situations.
Streaming Patterns Revolutionary Architectures with the Kafka API Carol McDonald
Building a robust, responsive, secure data service for healthcare is tricky. For starters, healthcare data lends itself to multiple models:
• Document representation for patient profile view or update
• Graph representation to query relationships between patients, providers, and medications
• Search representation for advanced lookups
Keeping these different systems up to date requires an architecture that can synchronize them in real time as data is updated. Furthermore, meeting audit requirements in healthcare requires the ability to apply granular cross-datacenter replication policies to data and to provide detailed lineage information for each record. This post will describe how stream-first architectures can solve these challenges, and look at how this has been implemented at a Health Information Network provider.
This talk will go over the Kafka API with these design patterns:
• Turning the database upside down
• Event Sourcing, Command Query Responsibility Segregation, Polyglot Persistence
• Kappa Architecture
What is the future of Hadoop?
What is the new future of Hadoop?
How is that different from the old one?
Here is how Ted Dunning answered these questions at the winter Hadoop Conference of Japan 2013.
Strata+Hadoop 2015 Keynote: Impacting Business as it Happens MapR Technologies
To get value out of today’s big and fast data, organizations must evolve beyond traditional analytic cycles that are heavy with data transformation and schema management. The Hadoop revolution is about merging business analytics and production operations to create the ‘as-it-happens’ business. It’s not a matter of running a few queries to gain insight for the next business decision, but of changing the organization’s fundamental metabolic rate. It is essential to take a data-centric approach to infrastructure to provide flexible, real-time data access, collapsing data silos and automating data-to-action for immediate operational benefits.
What if you could get over $3 back for every $1 you invest in big data technology? Recent research* by IDC shows that big data ROI is for real, and it can be huge, at an average of 382% 3-year ROI for the organizations that were studied.
In this deck, Carl Olofson, Research Vice President, Data Management Software Research for IDC, shares his findings on nine MapR customers and discusses:
+ The business value they gained from their big data deployments
+ An average of 42% reduction in cost over alternative big data systems
+ 31% higher productivity for data scientists
+ 39% increased productivity for application developers
Dale Kim, Sr. Director of Industry Solutions at MapR Technologies, then explains how the MapR Converged Data Platform advantages drive significant ROI for customers.
*Research comes from IDC Document #US40870615
Get the report here: http://www.mapr.com/idc-researches-business-value-mapr?source=Social&campaign=2016_Content_IDCReportMapRBusinessValue&utm_source=Social&utm_medium=Slideshare&utm_campaign=IDC+Report
ElasticES-Hadoop: Bridging the world of Hadoop and Elasticsearch MapR Technologies
In this talk, we will provide an overview of Elasticsearch for Apache Hadoop (ES-Hadoop), which includes integrations between the various Hadoop libraries, whether batch (Map/Reduce, Pig, Hive) or stream oriented (such as Apache Spark). We will also cover the YARN support and the HDFS snapshot/restore plugin available as part of ES-Hadoop. We will talk about the upcoming ES-Hadoop 2.1 GA release and near-term roadmap.
This was one of the talks that I gave at the Strata San Jose conference. I migrated my topic a bit, but here is the original abstract:
Application developers and architects today are interested in making their applications as real-time as possible. To make an application respond to events as they happen, developers need a reliable way to move data as it is generated across different systems, one event at a time. In other words, these applications need messaging.
Messaging solutions have existed for a long time. However, when compared to legacy systems, newer solutions like Apache Kafka offer higher performance, more scalability, and better integration with the Hadoop ecosystem. Kafka and similar systems are based on drastically different assumptions than legacy systems and have vastly different architectures. But do these benefits outweigh any tradeoffs in functionality? Ted Dunning dives into the architectural details and tradeoffs of both legacy and new messaging solutions to find the ideal messaging system for Hadoop.
Topics include:
* Queues versus logs
* Security issues like authentication, authorization, and encryption
* Scalability and performance
* Handling applications that span multiple data centers
* Multitenancy considerations
* APIs, integration points, and more
Processing data from social media streams and sensors in real time is becoming increasingly prevalent, and there are plenty of open source solutions to choose from. To help practitioners decide what to use when, we compare three popular Apache projects for stream processing: Apache Storm, Apache Spark, and Apache Samza.
Practical Machine Learning: Innovations in Recommendation Workshop MapR Technologies
Ted Dunning, committer for Apache Mahout, Drill, and ZooKeeper, presents on:
1. How to build a production quality recommendation engine using Mahout and Solr or Elasticsearch
2. How to build a multi-modal recommendation from multiple behavioral inputs
3. How search engines can be used for more than just text
This talk will present a detailed tear-down and walk-through of a working soup-to-nuts recommendation engine that uses observations of multiple kinds of behavior to do combined recommendation and cross recommendation. The system uses Mahout to do off-line analysis and can use Solr or Elasticsearch to provide real-time recommendations. The talk will also include enough theory to provide useful working intuitions for those desiring to adapt this design.
The entire system including a data generator, off-line analysis scripts, Solr and Elasticsearch configurations and sample web pages will be made available on github for attendees to modify as they like.
Building recommendation engines by abusing a search engine has been well known for some time to a small subculture in the recommendation community, but techniques for building multi-modal recommendation engines are not at all well known.
Baptist Health: Solving Healthcare Problems with Big Data MapR Technologies
Editor’s Note: Download the complimentary MapR Guide to Big Data in Healthcare for more information: https://mapr.com/mapr-guide-big-data-healthcare/
There is no better example of the important role that data plays in our lives than in matters of our health and our healthcare. There’s a growing wealth of health-related data out there, and it’s playing an increasing role in improving patient care, population health, and healthcare economics.
Join this webinar to hear how Baptist Health is using big data and advanced analytics to address a myriad of healthcare challenges—from patient to payer—through their consumer-centric approach.
MapR Technologies will cover broader big data healthcare trends and production use cases that demonstrate how to converge data and compute power to deliver data-driven healthcare applications.
From the Hadoop Summit 2015 Session with Tomer Shiran.
To deliver real-time impact from big data, organizations must evolve beyond traditional analytic approaches to support a new class of agile, distributed applications. Real-time Hadoop overcomes batch programs reliant on data transformations and schema management. This session highlights how leading organizations are leveraging Hadoop and NoSQL to merge analytics and production data to make adjustments while business is happening to optimize revenue, mitigate risk and reduce operational costs. Details include how companies have achieved real-time impact on their business, collapsed data silos, and automated in-line analytics with operational data for immediate impact.
Presented by Jack Norris, SVP Data & Applications at Gartner Symposium 2016.
Jack presents how companies from TransUnion to Uber use event-driven processing to transform their business with agility, scale, robustness, and efficiency advantages.
More info: https://www.mapr.com/company/press-releases/mapr-present-gartner-symposiumitxpo-and-other-notable-industry-conferences
Fast and Furious: From POC to an Enterprise Big Data Stack in 2014 MapR Technologies
View this webinar presentation from CenturyLink Technology Solutions (formerly Savvis) and MapR as we deconstruct and demystify “the enterprise big data stack.” We provide you with a more holistic view of the landscape, explore use cases to show how you can derive business value from it, and share best practices for navigating through the fragmented big data environment.
This presentation provides an introduction to Apache Kafka and describes best practices for working with fast data streams in Kafka and MapR Streams.
The code examples used during this talk are available at github.com/iandow/design-patterns-for-fast-data.
Author:
Ian Downard
Presented at the Portland Java User Group on Tuesday, October 18, 2016.
Many of the systems we want to monitor produce a stream of events; examples include event data from web or mobile applications, sensors, and medical devices. What do we need to do to build a real-time streaming application, and how do we do this with high performance at scale?
This Free Code Friday will help you get a jump-start on scaling distributed computing by taking an example time series application and coding through different aspects of working with such a dataset. We will cover building an end-to-end distributed processing pipeline using MapR Streams (Kafka API), Apache Spark, and MapR-DB (HBase API) to rapidly ingest, process, and store large volumes of high-speed data.
Reinventing the Modern Information Pipeline: Paxata and MapR Lilia Gutnik
(Presented at MapR's Big Data Everywhere event in Redwood City, CA in December 2016)
The relationship between business teams and IT has changed as the complexity of data has increased. A traditional data pipeline designed for an IT-centered approach to information management is not designed for the data demands of today's business decisions. Designing a big data strategy requires modernizing previous approaches. Self-service data preparation in a collaborative, intuitive, governed, and secure environment is the key to a nimble and decisive business unit.
Hadoop in 2015: Keys to Achieving Operational Excellence for the Real-Time En... MapR Technologies
In this webinar, Carl W. Olofson, Research Vice President, Application Development and Deployment for IDC, and Dale Kim, Director of Industry Solutions for MapR, will provide an insightful outlook for Hadoop in 2015, and will outline why enterprises should consider using Hadoop as a "Decision Data Platform" and how it can function as a single platform for both online transaction processing (OLTP) and real-time analytics.
What exactly is big data? The definition of big data is data that contains greater variety, arriving in increasing volumes and with more velocity. This is also known as the three Vs. Put simply, big data is larger, more complex data sets, especially from new data sources.
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture DATAVERSITY
Whether to take data ingestion cycles off the ETL tool and the data warehouse or to facilitate competitive Data Science and building algorithms in the organization, the data lake – a place for unmodeled and vast data – will be provisioned widely in 2020.
Though it doesn’t have to be complicated, the data lake has a few key design points that are critical, and it does need to follow some principles for success. Avoid building the data swamp, but not the data lake! The tool ecosystem is building up around the data lake and soon many will have a robust lake and data warehouse. We will discuss policy to keep them straight, send data to its best platform, and keep users’ confidence up in their data platforms.
Data lakes will be built in cloud object storage. We’ll discuss the options there as well.
Get this data point for your data lake journey.
Simplifying Real-Time Architectures for IoT with Apache Kudu Cloudera, Inc.
3 Things to Learn About:
*Building scalable real time architectures for managing data from IoT
*Processing data in real time with components such as Kudu & Spark
*Customer case studies highlighting real-time IoT use cases
Streaming Analytics Comparison of Open Source Frameworks, Products, Cloud Ser... Kai Wähner
Streaming Analytics Comparison of Open Source Frameworks, Products and Cloud Services. Includes Apache Storm, Flink, Spark, TIBCO, IBM, AWS Kinesis, Striim, Zoomdata, ...
This session discusses the technical concepts of stream processing / streaming analytics and how it is related to big data, mobile, cloud and internet of things. Different use cases such as predictive fault management or fraud detection are used to show and compare alternative frameworks and products for stream processing and streaming analytics.
The focus of the session lies on comparing
- different open source frameworks such as Apache Apex, Apache Flink or Apache Spark Streaming
- engines from software vendors such as IBM InfoSphere Streams, TIBCO StreamBase
- cloud offerings such as AWS Kinesis.
- real time streaming UIs such as Striim, Zoomdata or TIBCO Live Datamart.
Live demos will give the audience a good feeling about how to use these frameworks and tools.
The session will also discuss how stream processing is related to Apache Hadoop frameworks (such as MapReduce, Hive, Pig or Impala) and machine learning (such as R, Spark ML or H2O.ai).
DAMA & Denodo Webinar: Modernizing Data Architecture Using Data Virtualization Denodo
Watch here: https://bit.ly/2NGQD7R
In an era increasingly dominated by advancements in cloud computing, AI and advanced analytics it may come as a shock that many organizations still rely on data architectures built before the turn of the century. But that scenario is rapidly changing with the increasing adoption of real-time data virtualization - a paradigm shift in the approach that organizations take towards accessing, integrating, and provisioning data required to meet business goals.
As data analytics and data-driven intelligence take centre stage in today’s digital economy, logical data integration across the widest variety of data sources, with a proper security and governance structure in place, has become mission-critical.
Attend this session to learn:
- How you can meet cloud and data science challenges with data virtualization
- Why data virtualization is increasingly finding enterprise-wide adoption
- How customers are reducing costs and improving ROI with data virtualization
Integrating Hadoop into your enterprise IT environmentMapR Technologies
http://bit.ly/1M8gzAM – As the old saying goes, "it's not what you do, but how you do it" that makes all the difference. The benefits of Hadoop are well-documented as mainstream adoption continues to grow. However, as with any new technology, integrating Hadoop with your existing data management infrastructure is crucial for getting the maximum value from its capabilities.
Join us for a special roundtable webcast on July 10th to learn how to do it the right way. Gain a deeper understanding of the fundamentals of Hadoop and its growing ecosystem, the key considerations for modifying your current data management practices and the types of Big Data applications you'll be able to build.
How Data-Driven Approaches are Changing Your Data Management Strategies
Introducing data-driven strategies into your business model alters the way your organization manages and provides information to your customers, partners and employees. Gone are the days of “waterfall” implementation strategies from relational data to applications within a data center. Now, data-driven business models require agile implementation of applications based on information from all across an organization–on-premises, cloud, and mobile–and includes information from outside corporate walls from partners, third-party vendors, and customers. Data management strategies need to be ready to meet these challenges or your new and disruptive business models will fail at the most critical time: when your customers want to access it.
ML Workshop 2: Machine Learning Model Comparison & EvaluationMapR Technologies
How Rendezvous Architecture Improves Evaluation in the Real World
In this edition of our machine learning logistics webinar series, we build on the key requirements for effective management of machine learning logistics presented in the Overview webinar and in the Part I Workshop. Here we focus on model-to-model comparison and evaluation, the use of decoy models, and more. Listen here: http://info.mapr.com/machine-learning-workshop2.html?_ga=2.35695522.324200644.1511891424-416597139.1465233415
Self-Service Data Science for Leveraging ML & AI on All of Your DataMapR Technologies
MapR has launched the MapR Data Science Refinery which leverages a scalable data science notebook with native platform access, superior out-of-the-box security, and access to global event streaming and a multi-model NoSQL database.
Enabling Real-Time Business with Change Data CaptureMapR Technologies
Machine learning (ML) and artificial intelligence (AI) enable intelligent processes that can autonomously make decisions in real-time. The real challenge for effective ML and AI is getting all relevant data to a converged data platform in real-time, where it can be processed using modern technologies and integrated into any downstream systems.
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...MapR Technologies
Big data technologies are being applied to a wide variety of use cases. We will review tangible examples of machine learning, discuss an autonomous driving project and illustrate the role of MapR in next generation initiatives. More: http://info.mapr.com/WB_Machine-Learning-for-Chickens_Global_DG_17.11.02_RegistrationPage.html
ML Workshop 1: A New Architecture for Machine Learning LogisticsMapR Technologies
Having heard the high-level rationale for the rendezvous architecture in the introduction to this series, we will now dig in deeper to talk about how and why the pieces fit together. In terms of components, we will cover why streams work, why they need to be persistent, performant and pervasive in a microservices design and how they provide isolation between components. From there, we will talk about some of the details of the implementation of a rendezvous architecture including discussion of when the architecture is applicable, key components of message content and how failures and upgrades are handled. We will touch on the monitoring requirements for a rendezvous system but will save the analysis of the recorded data for later. Listen to the webinar on demand: https://mapr.com/resources/webinars/machine-learning-workshop-1/
Machine Learning Success: The Key to Easier Model ManagementMapR Technologies
Join Ellen Friedman, co-author (with Ted Dunning) of a new short O’Reilly book Machine Learning Logistics: Model Management in the Real World, to look at what you can do to have effective model management, including the role of stream-first architecture, containers, a microservices approach and a DataOps style of work. Ellen will provide a basic explanation of a new architecture that not only leverages stream transport but also makes use of canary models and decoy models for accurate model evaluation and for efficient and rapid deployment of new models in production.
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
Data warehouses have been the standard tool for analyzing data created by business operations. In recent years, increasing data volumes, new types of data formats, and emerging analytics technologies such as machine learning have given rise to modern data lakes. Connecting application databases, data warehouses, and data lakes using real-time data pipelines can significantly improve the time to action for business decisions. More: http://info.mapr.com/WB_MapR-StreamSets-Data-Warehouse-Modernization_Global_DG_17.08.16_RegistrationPage.html
Live Tutorial – Streaming Real-Time Events Using Apache APIsMapR Technologies
For this talk we will explore the power of streaming real time events in the context of the IoT and smart cities.
http://info.mapr.com/WB_Streaming-Real-Time-Events_Global_DG_17.08.02_RegistrationPage.html
Bringing Structure, Scalability, and Services to Cloud-Scale StorageMapR Technologies
Deploying storage with a forklift is so 1990s, right? Today’s applications and infrastructure demand systems and services that scale. Customers require performance and capacity to fit the use case and workloads, not the other way around. Architects need multi-temperature, multi-location, highly available, and compliance friendly platforms that grow with the generational shift in data growth and utility.
Churn prediction is big business. It minimizes customer defection by predicting which customers are likely to cancel a service. Though originally used within the telecommunications industry, it has become common practice for banks, ISPs, insurance firms, and other verticals. More: http://info.mapr.com/WB_PredictingChurn_Global_DG_17.06.15_RegistrationPage.html
The prediction process is data-driven and often uses advanced machine learning techniques. In this webinar, we'll look at customer data, do some preliminary analysis, and generate churn prediction models – all with Spark machine learning (ML) and a Zeppelin notebook.
The goal of Spark’s ML library is to make machine learning scalable and easy. Zeppelin with Spark provides a web-based notebook that enables interactive machine learning and visualization.
In this tutorial, we'll do the following:
Review classification and decision trees
Use Spark DataFrames with Spark ML pipelines
Predict customer churn with Apache Spark ML decision trees
Use Zeppelin to run Spark commands and visualize the results
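As a rough illustration of the decision tree review listed above, the following plain-Python sketch shows how a single tree split can be chosen by minimizing Gini impurity. It is not Spark ML code and is not taken from the webinar; the `service_calls` feature and the six toy customers are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the group is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, feature):
    """Return (threshold, weighted_impurity) for the best split on one numeric feature."""
    best = (None, float("inf"))
    values = sorted({row[feature] for row in rows})
    # Candidate thresholds are midpoints between consecutive distinct values.
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical toy data: number of customer service calls and whether the customer churned (1) or not (0).
rows = [{"service_calls": c} for c in (0, 1, 1, 4, 5, 6)]
labels = [0, 0, 0, 1, 1, 1]

threshold, impurity = best_split(rows, labels, "service_calls")
print(threshold, impurity)
```

A full decision tree repeats this search recursively on each side of the chosen threshold and across all features; a library such as Spark ML would perform the same kind of search over distributed data.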
An Introduction to the MapR Converged Data PlatformMapR Technologies
Listen to the webinar on-demand: http://info.mapr.com/WB_Partner_CDP_Intro_EMEA_DG_17.05.31_RegistrationPage.html
In this 90-minute webinar, we discuss:
- The MapR Converged Data Platform and its components
- Use cases for the Converged Data Platform
- MapR Converged Partner Program
- How to get started with MapR
- Becoming a partner
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...MapR Technologies
IT budgets are shrinking, and the move to next-generation technologies is upon us. The cloud is an option for nearly every company, but just because it is an option doesn’t mean it is always the right solution for every problem.
Most cloud providers would prefer that every customer be tightly coupled with their proprietary services and APIs to create lock-in with that cloud provider. The savvy customer will leverage the cloud as infrastructure and stay loosely bound to a cloud provider. This creates an opportunity for the customer to execute a multicloud strategy or even a hybrid on-premises and cloud solution.
Jim Scott explores different use cases that may be best run in the cloud versus on-premises, points out opportunities to optimize cost and operational benefits, and explains how to get the data moved between locations. Along the way, Jim discusses security, backups, event streaming, databases, replication, and snapshots across a variety of use cases that run most businesses today.
Is your organization at the analytics crossroads? Have you made strides collecting and sharing massive amounts of data from electronic health records, insurance claims, and health information exchanges but found these efforts made little impact on efficiency, patient outcomes, or costs?
Changes in how business is done combined with multiple technology drivers make geo-distributed data increasingly important for enterprises. These changes are causing serious disruption across a wide range of industries, including healthcare, manufacturing, automotive, telecommunications, and entertainment. Technical challenges arise with these disruptions, but the good news is there are now innovative solutions to address these problems. http://info.mapr.com/WB_Geo-distributed-Big-Data-and-Analytics_Global_DG_17.05.16_RegistrationPage.html
MapR announced a few new releases in 2017, and we want to go over those exciting new products and features that are available now. We’d like to invite our customers and partners to this webinar in which members of the MapR product team will share details about the latest updates.
3 Benefits of Multi-Temperature Data Management for Data AnalyticsMapR Technologies
SAP® HANA and SAP® IQ are popular platforms for various analytical and transactional use cases. If you’re an SAP customer, you’ve experienced the benefits of deploying these solutions. However, as data volumes grow, you’re likely asking yourself: How do I scale storage to support these applications? How can I have one platform for various applications and use cases?
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsMapR Technologies
SAP HANA is an increasingly popular platform for various analytical and transactional use cases with its in-memory architecture. If you’re an SAP customer you’ve experienced the benefits.
However, the underlying storage for SAP HANA is painfully expensive. This slows down your ability to grow your SAP HANA footprint and serve up more applications.
Handling the Extremes: Scaling and Streaming in FinanceMapR Technologies
Agility is king in the world of finance, and a message-driven architecture is a mechanism for building and managing discrete business functionality to enable agility. In order to accommodate rapid innovation, data pipelines must evolve. However, implementing microservices can create management problems, like the number of instances running in an environment.
Microservices can be leveraged on a message-driven architecture, but the concept must be thoughtfully implemented to show the true value. Jim Scott outlines the core tenets of a message-driven architecture and explains its importance in real-time big data-enabled distributed systems within the realm of finance. Along the way, Jim covers financial use cases dealing with securities management and fraud—starting with ingestion of data from potentially hundreds of data sources to the required fan-out of that data without sacrificing performance—and discusses the pros and cons around operational capabilities and using the same data pipeline to support development and quality assurance practices.
Presented at Strata+Hadoop World NY 2016 by:
Jim Scott
MapR Technologies, Inc.
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2...pchutichetpong
M Capital Group (“MCG”) expects demand and supply to keep evolving, facilitated by institutional investment rotating out of offices and into work from home (“WFH”), while the need for data storage keeps expanding as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
This is the framework enabled by the MapR Converged Data Platform. The focus is how to compress the data-to-action cycle, which requires a convergence of capabilities.
Contrast this with the situation presented by all other approaches. You have data duplication, more management tasks to coordinate the flow, more latency as data is staged and transferred between systems, and much more data risk, as there is a lack of reliability, protection, and DR capabilities across these solutions.
Producer examples – sensors, web logs, application logs, credit card transactions.
Subscriber examples – Spark Streaming, Storm, or even databases.
Altitude Digital is one of the fastest growing video advertising marketplaces, with nearly seven billion transactions per day. Altitude Digital selects in real time the best video advertisement to play at the right time for the right person.
The MapR platform provides Altitude Digital with a centralized data repository based on the MapR-DB NoSQL database that serves multiple departments, driving efficiency and competitive advantage. Altitude Digital optimizes and streamlines the connection between buyers and sellers of online video and mobile advertising by providing data-driven insights on daily consumer transactions. “Every event helps us make a more intelligent decision about what advertisement we will deliver for what person, and what will make the most money for the publisher while maintaining return on investment for the advertiser,” explained Manny Puentes, CTO, Altitude Digital. “Our goal is to present the most compelling video ad in real time for every video player in the world. That’s a big challenge.” Since MapR enables Altitude Digital to house all of the data in one place, data is managed holistically instead of in departmental silos. Operations, support, sales, and product use the data repository to solve their business-related objectives. “MapR-DB NoSQL database table replication is a powerful feature that enables real-time, bi-directional updates across data centers at scale. We use table replication for discovery and real-time notifications, giving our business a competitive advantage,” said Puentes. “MapR continues to deliver innovative enterprise solutions that just work.”
National Oilwell Varco is a $23B multinational company based in Houston and is a leading worldwide provider of oil equipment, components and services.
NOV is using MapR to perform real-time analysis to optimize oil and gas drilling and production.
They needed to make it easier for their platform to serve hospitals.
Their answer was our converged platform, and to treat the electronic medical record as a stream. The stream itself is the system of record; any updates are subscribed to, received by the various players, and consumed as each app requires, for example as a search index or a database table.
This dramatically simplified the process and flow, with integrated security for privacy and HIPAA requirements.
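The idea of a stream as the system of record, with each subscriber building its own view, can be sketched in a few lines of Python. This is an in-memory illustration only, not the MapR Streams API; the `Stream`, `TableView`, and `SearchIndexView` classes and the sample patient events are all hypothetical.

```python
from collections import defaultdict


class Stream:
    """Append-only log of events; offsets let each subscriber read at its own pace."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def read_from(self, offset):
        return self._events[offset:]


class TableView:
    """Subscriber that keeps the latest fields per patient, like a database table."""

    def __init__(self):
        self.rows = {}
        self.offset = 0

    def consume(self, stream):
        for event in stream.read_from(self.offset):
            self.rows.setdefault(event["patient_id"], {}).update(event["fields"])
            self.offset += 1


class SearchIndexView:
    """Subscriber that maps each word seen in any update to the patients it appeared for."""

    def __init__(self):
        self.index = defaultdict(set)
        self.offset = 0

    def consume(self, stream):
        for event in stream.read_from(self.offset):
            for value in event["fields"].values():
                for word in str(value).lower().split():
                    self.index[word].add(event["patient_id"])
            self.offset += 1


stream = Stream()
table, search = TableView(), SearchIndexView()

stream.append({"patient_id": "p1", "fields": {"name": "Ada Smith", "allergy": "penicillin"}})
stream.append({"patient_id": "p2", "fields": {"name": "Bo Chen"}})
table.consume(stream)
search.consume(stream)

# A later update is just another event; each subscriber picks it up from its own offset.
stream.append({"patient_id": "p1", "fields": {"allergy": "none"}})
table.consume(stream)
search.consume(stream)

# A new subscriber can rebuild the same state by replaying the stream from offset 0.
rebuilt = TableView()
rebuilt.consume(stream)
print(table.rows["p1"], rebuilt.rows == table.rows)
```

Because the log is never rewritten, adding another consumer later does not require changes to the producers or to the existing views.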