http://bit.ly/1zYUhKr – Apache Drill is the industry's first schema-free SQL engine. Designed with a powerful JSON data model, Drill enables queries on complex data with low latency.
This document discusses machine learning with Hadoop. It begins with an agenda that covers why big data has become important, what can be done with big data, and how machine learning works with big data. It then discusses why now is the right time for big data, explaining that improvements in algorithms, data practices, and hardware have enabled new opportunities. Specific machine learning techniques like classification, clustering, and recommendations using Hadoop and Mahout are then outlined.
This talk takes a technological deep dive into MapR M7 including information on some of the key challenges that were solved during the implementation of M7. MapR's M7 is a clean room replication of the HBase API written in C++ and fully integrated into the MapR platform.
In the process of implementing M7, we learned some lessons and solved some interesting challenges. Ted Dunning shares some of these experiences and lessons. Many of these lessons apply across the board to high performance query systems in general and can be applied much more widely. Some of the resulting techniques have already been adopted by the Apache Drill project, but there are lots more places that these techniques can be used.
This document discusses how Spark can be used for production scale applications. It provides examples of companies using Spark and MapR in production for tasks like security analytics, genomics research, and customer analytics. It also outlines key issues to consider when taking Spark to production and describes how MapR provides the performance, reliability, support and data services needed for mission critical Spark applications.
Practical Machine Learning: Innovations in Recommendation WorkshopMapR Technologies
Ted Dunning, Committer for Apache Mahout, Drill & Zookeeper presents on:
1. How to build a production quality recommendation engine using Mahout and Solr or Elasticsearch
2. How to build a multi-modal recommendation from multiple behavioral inputs
3. How search engines can be used for more than just text
This talk will present a detailed tear-down and walk-through of a working soup-to-nuts recommendation engine that uses observations of multiple kinds of behavior to do combined recommendation and cross recommendation. The system uses Mahout to do off-line analysis and can use Solr or Elasticsearch to provide real-time recommendations. The talk will also include enough theory to provide useful working intuitions for those desiring to adapt this design.
The entire system including a data generator, off-line analysis scripts, Solr and Elasticsearch configurations and sample web pages will be made available on github for attendees to modify as they like.
Building recommendation engines by abusing a search engine has been well-known for some time to a small sub-culture in the recommendation community, but techniques for building multi-model recommendation engines are not at all well known.
MapR-DB is an enterprise-grade, high performance, in-Hadoop NoSQL (“Not Only SQL”) database management system. It is used to add real-time, operational analytics capabilities to Hadoop and now natively support JSON.
http://bit.ly/1BTaXZP – As organizations look for even faster ways to derive value from big data, they are turning to Apache Spark is an in-memory processing framework that offers lightning-fast big data analytics, providing speed, developer productivity, and real-time processing advantages. The Spark software stack includes a core data-processing engine, an interface for interactive querying, Spark Streaming for streaming data analysis, and growing libraries for machine-learning and graph analysis. Spark is quickly establishing itself as a leading environment for doing fast, iterative in-memory and streaming analysis. This talk will give an introduction the Spark stack, explain how Spark has lighting fast results, and how it complements Apache Hadoop. By the end of the session, you’ll come away with a deeper understanding of how you can unlock deeper insights from your data, faster, with Spark.
From the Hadoop Summit 2015 Session with Tomer Shiran.
To deliver real-time impact from big data, organizations must evolve beyond traditional analytic approaches to support a new class of agile, distributed applications. Real-time Hadoop overcomes batch programs reliant on data transformations and schema management. This session highlights how leading organizations are leveraging Hadoop and NoSQL to merge analytics and production data to make adjustments while business is happening to optimize revenue, mitigate risk and reduce operational costs. Details include how companies have achieved real-time impact on their business, collapsed data silos, and automated in-line analytics with operational data for immediate impact.
This document discusses machine learning with Hadoop. It begins with an agenda that covers why big data has become important, what can be done with big data, and how machine learning works with big data. It then discusses why now is the right time for big data, explaining that improvements in algorithms, data practices, and hardware have enabled new opportunities. Specific machine learning techniques like classification, clustering, and recommendations using Hadoop and Mahout are then outlined.
This talk takes a technological deep dive into MapR M7 including information on some of the key challenges that were solved during the implementation of M7. MapR's M7 is a clean room replication of the HBase API written in C++ and fully integrated into the MapR platform.
In the process of implementing M7, we learned some lessons and solved some interesting challenges. Ted Dunning shares some of these experiences and lessons. Many of these lessons apply across the board to high performance query systems in general and can be applied much more widely. Some of the resulting techniques have already been adopted by the Apache Drill project, but there are lots more places that these techniques can be used.
This document discusses how Spark can be used for production scale applications. It provides examples of companies using Spark and MapR in production for tasks like security analytics, genomics research, and customer analytics. It also outlines key issues to consider when taking Spark to production and describes how MapR provides the performance, reliability, support and data services needed for mission critical Spark applications.
Practical Machine Learning: Innovations in Recommendation WorkshopMapR Technologies
Ted Dunning, Committer for Apache Mahout, Drill & Zookeeper presents on:
1. How to build a production quality recommendation engine using Mahout and Solr or Elasticsearch
2. How to build a multi-modal recommendation from multiple behavioral inputs
3. How search engines can be used for more than just text
This talk will present a detailed tear-down and walk-through of a working soup-to-nuts recommendation engine that uses observations of multiple kinds of behavior to do combined recommendation and cross recommendation. The system uses Mahout to do off-line analysis and can use Solr or Elasticsearch to provide real-time recommendations. The talk will also include enough theory to provide useful working intuitions for those desiring to adapt this design.
The entire system including a data generator, off-line analysis scripts, Solr and Elasticsearch configurations and sample web pages will be made available on github for attendees to modify as they like.
Building recommendation engines by abusing a search engine has been well-known for some time to a small sub-culture in the recommendation community, but techniques for building multi-model recommendation engines are not at all well known.
MapR-DB is an enterprise-grade, high performance, in-Hadoop NoSQL (“Not Only SQL”) database management system. It is used to add real-time, operational analytics capabilities to Hadoop and now natively support JSON.
http://bit.ly/1BTaXZP – As organizations look for even faster ways to derive value from big data, they are turning to Apache Spark is an in-memory processing framework that offers lightning-fast big data analytics, providing speed, developer productivity, and real-time processing advantages. The Spark software stack includes a core data-processing engine, an interface for interactive querying, Spark Streaming for streaming data analysis, and growing libraries for machine-learning and graph analysis. Spark is quickly establishing itself as a leading environment for doing fast, iterative in-memory and streaming analysis. This talk will give an introduction the Spark stack, explain how Spark has lighting fast results, and how it complements Apache Hadoop. By the end of the session, you’ll come away with a deeper understanding of how you can unlock deeper insights from your data, faster, with Spark.
From the Hadoop Summit 2015 Session with Tomer Shiran.
To deliver real-time impact from big data, organizations must evolve beyond traditional analytic approaches to support a new class of agile, distributed applications. Real-time Hadoop overcomes batch programs reliant on data transformations and schema management. This session highlights how leading organizations are leveraging Hadoop and NoSQL to merge analytics and production data to make adjustments while business is happening to optimize revenue, mitigate risk and reduce operational costs. Details include how companies have achieved real-time impact on their business, collapsed data silos, and automated in-line analytics with operational data for immediate impact.
Apache Drill でたしなむ セルフサービスデータ探索 - 2014/11/06 Cloudera World Tokyo 2014 LTセッションMapR Technologies Japan
数あるSQL-on-Hadoopエンジンの中でも、標準SQL準拠、柔軟で動的なデータ解釈、様々なデータソースや格納形式への対応という特徴を持つApache Drill。デモを中心に、Drillの便利な機能を利用したデータ検索・分析の楽しみ方をご紹介します。2014年11月6日に開催されたCloudera World Tokyo 2014 LTセッションでの講演資料です。
Predicting failure in power networks, detecting fraudulent activities in payment card transactions, and identifying next logical products targeted at the right customer at the right time all require machine learning around massive data sets. This form of artificial intelligence requires complex self-learning algorithms, rapid data iteration for advanced analytics and a robust big data architecture that’s up to the task.
Learn how you can quickly exploit your existing IT infrastructure and scale operations in line with your budget to enjoy advanced data modeling, without having to invest in a large data science team.
Introduction to Apache HBase, MapR Tables and SecurityMapR Technologies
This talk with focus on two key aspects of applications that are using the HBase APIs. The first part will provide a basic overview of how HBase works followed by an introduction to the HBase APIs with a simple example. The second part will extend what we've learned to secure the HBase application running on MapR's industry leading Hadoop.
Keys Botzum is a Senior Principal Technologist with MapR Technologies. He has over 15 years of experience in large scale distributed system design. At MapR his primary responsibility is working with customers as a consultant, but he also teaches classes, contributes to documentation, and works with MapR engineering. Previously he was a Senior Technical Staff Member with IBM and a respected author of many articles on WebSphere Application Server as well as a book. He holds a Masters degree in Computer Science from Stanford University and a B.S. in Applied Mathematics/Computer Science from Carnegie Mellon University.
http://bit.ly/1BTaXZP – Hadoop has been a huge success in the data world. It’s disrupted decades of data management practices and technologies by introducing a massively parallel processing framework. The community and the development of all the Open Source components pushed Hadoop to where it is now.
That's why the Hadoop community is excited about Apache Spark. The Spark software stack includes a core data-processing engine, an interface for interactive querying, Sparkstreaming for streaming data analysis, and growing libraries for machine-learning and graph analysis. Spark is quickly establishing itself as a leading environment for doing fast, iterative in-memory and streaming analysis.
This talk will give an introduction the Spark stack, explain how Spark has lighting fast results, and how it complements Apache Hadoop.
Keys Botzum - Senior Principal Technologist with MapR Technologies
Keys is Senior Principal Technologist with MapR Technologies, where he wears many hats. His primary responsibility is interacting with customers in the field, but he also teaches classes, contributes to documentation, and works with engineering teams. He has over 15 years of experience in large scale distributed system design. Previously, he was a Senior Technical Staff Member with IBM, and a respected author of many articles on the WebSphere Application Server as well as a book.
HBase and Drill: How Loosely Typed SQL is Ideal for NoSQLMapR Technologies
From the Hadoop Summit 2015 Session with Ted Dunning:
The Apache HBase approach to data has a huge potential for expressing NoSQL-y, non-relational programs. Apache Drill supports SQL for non-relational data. Paradoxically, combining this NoSQL with this SQL tool results in something even better. I will show and explain how to combine HBase and Drill to access time series data and to support high performance secondary indexing.
This document discusses tuning HBase and HDFS for performance and correctness. Some key recommendations include:
- Enable HDFS sync on close and sync behind writes for correctness on power failures.
- Tune HBase compaction settings like blockingStoreFiles and compactionThreshold based on whether the workload is read-heavy or write-heavy.
- Size RegionServer machines based on disk size, heap size, and number of cores to optimize for the workload.
- Set client and server RPC chunk sizes like hbase.client.write.buffer to 2MB to maximize network throughput.
- Configure various garbage collection settings in HBase like -Xmn512m and -XX:+UseCMSInit
How Data-Driven Approaches are Changing Your Data Management Strategies
Introducing data-driven strategies into your business model alters the way your organization manages and provides information to your customers, partners and employees. Gone are the days of “waterfall” implementation strategies from relational data to applications within a data center. Now, data-driven business models require agile implementation of applications based on information from all across an organization–on-premises, cloud, and mobile–and includes information from outside corporate walls from partners, third-party vendors, and customers. Data management strategies need to be ready to meet these challenges or your new and disruptive business models will fail at the most critical time: when your customers want to access it.
ML Workshop 2: Machine Learning Model Comparison & EvaluationMapR Technologies
This document discusses machine learning model comparison and evaluation. It describes how the rendezvous architecture in MapR makes evaluation easier by collecting metrics on model performance and allowing direct comparison of models. It also discusses challenges like reject inferencing and the need to balance exploration of new models with exploitation of existing models. The document provides recommendations for change detection and analyzing latency distributions to better evaluate models over time.
Self-Service Data Science for Leveraging ML & AI on All of Your DataMapR Technologies
MapR has launched the MapR Data Science Refinery which leverages a scalable data science notebook with native platform access, superior out-of-the-box security, and access to global event streaming and a multi-model NoSQL database.
Enabling Real-Time Business with Change Data CaptureMapR Technologies
Machine learning (ML) and artificial intelligence (AI) enable intelligent processes that can autonomously make decisions in real-time. The real challenge for effective ML and AI is getting all relevant data to a converged data platform in real-time, where it can be processed using modern technologies and integrated into any downstream systems.
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...MapR Technologies
The document discusses machine learning and autonomous driving applications. It begins with a simple machine learning example of classifying images of chickens posted on Twitter. It then discusses how autonomous vehicles use machine learning by gathering large amounts of sensor data to train models for tasks like object recognition. The document also summarizes challenges for applying machine learning at an enterprise scale and how the MapR data platform can address these challenges by providing a unified environment for storing, accessing, and processing large amounts of diverse data.
ML Workshop 1: A New Architecture for Machine Learning LogisticsMapR Technologies
Having heard the high-level rationale for the rendezvous architecture in the introduction to this series, we will now dig in deeper to talk about how and why the pieces fit together. In terms of components, we will cover why streams work, why they need to be persistent, performant and pervasive in a microservices design and how they provide isolation between components. From there, we will talk about some of the details of the implementation of a rendezvous architecture including discussion of when the architecture is applicable, key components of message content and how failures and upgrades are handled. We will touch on the monitoring requirements for a rendezvous system but will save the analysis of the recorded data for later. Listen to the webinar on demand: https://mapr.com/resources/webinars/machine-learning-workshop-1/
Machine Learning Success: The Key to Easier Model ManagementMapR Technologies
Join Ellen Friedman, co-author (with Ted Dunning) of a new short O’Reilly book Machine Learning Logistics: Model Management in the Real World, to look at what you can do to have effective model management, including the role of stream-first architecture, containers, a microservices approach and a DataOps style of work. Ellen will provide a basic explanation of a new architecture that not only leverages stream transport but also makes use of canary models and decoy models for accurate model evaluation and for efficient and rapid deployment of new models in production.
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
Data warehouses have been the standard tool for analyzing data created by business operations. In recent years, increasing data volumes, new types of data formats, and emerging analytics technologies such as machine learning have given rise to modern data lakes. Connecting application databases, data warehouses, and data lakes using real-time data pipelines can significantly improve the time to action for business decisions. More: http://info.mapr.com/WB_MapR-StreamSets-Data-Warehouse-Modernization_Global_DG_17.08.16_RegistrationPage.html
Live Tutorial – Streaming Real-Time Events Using Apache APIsMapR Technologies
For this talk we will explore the power of streaming real time events in the context of the IoT and smart cities.
http://info.mapr.com/WB_Streaming-Real-Time-Events_Global_DG_17.08.02_RegistrationPage.html
Bringing Structure, Scalability, and Services to Cloud-Scale StorageMapR Technologies
Deploying storage with a forklift is so 1990s, right? Today’s applications and infrastructure demand systems and services that scale. Customers require performance and capacity to fit the use case and workloads, not the other way around. Architects need multi-temperature, multi-location, highly available, and compliance friendly platforms that grow with the generational shift in data growth and utility.
Churn prediction is big business. It minimizes customer defection by predicting which customers are likely to cancel a service. Though originally used within the telecommunications industry, it has become common practice for banks, ISPs, insurance firms, and other verticals. More: http://info.mapr.com/WB_PredictingChurn_Global_DG_17.06.15_RegistrationPage.html
The prediction process is data-driven and often uses advanced machine learning techniques. In this webinar, we'll look at customer data, do some preliminary analysis, and generate churn prediction models – all with Spark machine learning (ML) and a Zeppelin notebook.
Spark’s ML library goal is to make machine learning scalable and easy. Zeppelin with Spark provides a web-based notebook that enables interactive machine learning and visualization.
In this tutorial, we'll do the following:
Review classification and decision trees
Use Spark DataFrames with Spark ML pipelines
Predict customer churn with Apache Spark ML decision trees
Use Zeppelin to run Spark commands and visualize the results
An Introduction to the MapR Converged Data PlatformMapR Technologies
Listen to the webinar on-demand: http://info.mapr.com/WB_Partner_CDP_Intro_EMEA_DG_17.05.31_RegistrationPage.html
In this 90-minute webinar, we discuss:
- The MapR Converged Data Platform and its components
- Use cases for the Converged Data Platform
- MapR Converged Partner Program
- How to get started with MapR
- Becoming a partner
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...MapR Technologies
IT budgets are shrinking, and the move to next-generation technologies is upon us. The cloud is an option for nearly every company, but just because it is an option doesn’t mean it is always the right solution for every problem.
Most cloud providers would prefer that every customer be tightly coupled with their proprietary services and APIs to create lock-in with that cloud provider. The savvy customer will leverage the cloud as infrastructure and stay loosely bound to a cloud provider. This creates an opportunity for the customer to execute a multicloud strategy or even a hybrid on-premises and cloud solution.
Jim Scott explores different use cases that may be best run in the cloud versus on-premises, points out opportunities to optimize cost and operational benefits, and explains how to get the data moved between locations. Along the way, Jim discusses security, backups, event streaming, databases, replication, and snapshots across a variety of use cases that run most businesses today.
Apache Drill でたしなむ セルフサービスデータ探索 - 2014/11/06 Cloudera World Tokyo 2014 LTセッションMapR Technologies Japan
数あるSQL-on-Hadoopエンジンの中でも、標準SQL準拠、柔軟で動的なデータ解釈、様々なデータソースや格納形式への対応という特徴を持つApache Drill。デモを中心に、Drillの便利な機能を利用したデータ検索・分析の楽しみ方をご紹介します。2014年11月6日に開催されたCloudera World Tokyo 2014 LTセッションでの講演資料です。
Predicting failure in power networks, detecting fraudulent activities in payment card transactions, and identifying next logical products targeted at the right customer at the right time all require machine learning around massive data sets. This form of artificial intelligence requires complex self-learning algorithms, rapid data iteration for advanced analytics and a robust big data architecture that’s up to the task.
Learn how you can quickly exploit your existing IT infrastructure and scale operations in line with your budget to enjoy advanced data modeling, without having to invest in a large data science team.
Introduction to Apache HBase, MapR Tables and SecurityMapR Technologies
This talk with focus on two key aspects of applications that are using the HBase APIs. The first part will provide a basic overview of how HBase works followed by an introduction to the HBase APIs with a simple example. The second part will extend what we've learned to secure the HBase application running on MapR's industry leading Hadoop.
Keys Botzum is a Senior Principal Technologist with MapR Technologies. He has over 15 years of experience in large scale distributed system design. At MapR his primary responsibility is working with customers as a consultant, but he also teaches classes, contributes to documentation, and works with MapR engineering. Previously he was a Senior Technical Staff Member with IBM and a respected author of many articles on WebSphere Application Server as well as a book. He holds a Masters degree in Computer Science from Stanford University and a B.S. in Applied Mathematics/Computer Science from Carnegie Mellon University.
http://bit.ly/1BTaXZP – Hadoop has been a huge success in the data world. It’s disrupted decades of data management practices and technologies by introducing a massively parallel processing framework. The community and the development of all the Open Source components pushed Hadoop to where it is now.
That's why the Hadoop community is excited about Apache Spark. The Spark software stack includes a core data-processing engine, an interface for interactive querying, Sparkstreaming for streaming data analysis, and growing libraries for machine-learning and graph analysis. Spark is quickly establishing itself as a leading environment for doing fast, iterative in-memory and streaming analysis.
This talk will give an introduction the Spark stack, explain how Spark has lighting fast results, and how it complements Apache Hadoop.
Keys Botzum - Senior Principal Technologist with MapR Technologies
Keys is Senior Principal Technologist with MapR Technologies, where he wears many hats. His primary responsibility is interacting with customers in the field, but he also teaches classes, contributes to documentation, and works with engineering teams. He has over 15 years of experience in large scale distributed system design. Previously, he was a Senior Technical Staff Member with IBM, and a respected author of many articles on the WebSphere Application Server as well as a book.
HBase and Drill: How Loosely Typed SQL is Ideal for NoSQLMapR Technologies
From the Hadoop Summit 2015 Session with Ted Dunning:
The Apache HBase approach to data has a huge potential for expressing NoSQL-y, non-relational programs. Apache Drill supports SQL for non-relational data. Paradoxically, combining this NoSQL with this SQL tool results in something even better. I will show and explain how to combine HBase and Drill to access time series data and to support high performance secondary indexing.
This document discusses tuning HBase and HDFS for performance and correctness. Some key recommendations include:
- Enable HDFS sync on close and sync behind writes for correctness on power failures.
- Tune HBase compaction settings like blockingStoreFiles and compactionThreshold based on whether the workload is read-heavy or write-heavy.
- Size RegionServer machines based on disk size, heap size, and number of cores to optimize for the workload.
- Set client and server RPC chunk sizes like hbase.client.write.buffer to 2MB to maximize network throughput.
- Configure various garbage collection settings in HBase like -Xmn512m and -XX:+UseCMSInit
How Data-Driven Approaches are Changing Your Data Management Strategies
Introducing data-driven strategies into your business model alters the way your organization manages and provides information to your customers, partners and employees. Gone are the days of “waterfall” implementation strategies from relational data to applications within a data center. Now, data-driven business models require agile implementation of applications based on information from all across an organization–on-premises, cloud, and mobile–and includes information from outside corporate walls from partners, third-party vendors, and customers. Data management strategies need to be ready to meet these challenges or your new and disruptive business models will fail at the most critical time: when your customers want to access it.
ML Workshop 2: Machine Learning Model Comparison & EvaluationMapR Technologies
This document discusses machine learning model comparison and evaluation. It describes how the rendezvous architecture in MapR makes evaluation easier by collecting metrics on model performance and allowing direct comparison of models. It also discusses challenges like reject inferencing and the need to balance exploration of new models with exploitation of existing models. The document provides recommendations for change detection and analyzing latency distributions to better evaluate models over time.
Self-Service Data Science for Leveraging ML & AI on All of Your DataMapR Technologies
MapR has launched the MapR Data Science Refinery which leverages a scalable data science notebook with native platform access, superior out-of-the-box security, and access to global event streaming and a multi-model NoSQL database.
Enabling Real-Time Business with Change Data CaptureMapR Technologies
Machine learning (ML) and artificial intelligence (AI) enable intelligent processes that can autonomously make decisions in real-time. The real challenge for effective ML and AI is getting all relevant data to a converged data platform in real-time, where it can be processed using modern technologies and integrated into any downstream systems.
Machine Learning for Chickens, Autonomous Driving and a 3-year-old Who Won’t ...MapR Technologies
The document discusses machine learning and autonomous driving applications. It begins with a simple machine learning example of classifying images of chickens posted on Twitter. It then discusses how autonomous vehicles use machine learning by gathering large amounts of sensor data to train models for tasks like object recognition. The document also summarizes challenges for applying machine learning at an enterprise scale and how the MapR data platform can address these challenges by providing a unified environment for storing, accessing, and processing large amounts of diverse data.
ML Workshop 1: A New Architecture for Machine Learning LogisticsMapR Technologies
Having heard the high-level rationale for the rendezvous architecture in the introduction to this series, we will now dig in deeper to talk about how and why the pieces fit together. In terms of components, we will cover why streams work, why they need to be persistent, performant and pervasive in a microservices design and how they provide isolation between components. From there, we will talk about some of the details of the implementation of a rendezvous architecture including discussion of when the architecture is applicable, key components of message content and how failures and upgrades are handled. We will touch on the monitoring requirements for a rendezvous system but will save the analysis of the recorded data for later. Listen to the webinar on demand: https://mapr.com/resources/webinars/machine-learning-workshop-1/
Machine Learning Success: The Key to Easier Model ManagementMapR Technologies
Join Ellen Friedman, co-author (with Ted Dunning) of a new short O’Reilly book Machine Learning Logistics: Model Management in the Real World, to look at what you can do to have effective model management, including the role of stream-first architecture, containers, a microservices approach and a DataOps style of work. Ellen will provide a basic explanation of a new architecture that not only leverages stream transport but also makes use of canary models and decoy models for accurate model evaluation and for efficient and rapid deployment of new models in production.
Data Warehouse Modernization: Accelerating Time-To-Action MapR Technologies
Data warehouses have been the standard tool for analyzing data created by business operations. In recent years, increasing data volumes, new types of data formats, and emerging analytics technologies such as machine learning have given rise to modern data lakes. Connecting application databases, data warehouses, and data lakes using real-time data pipelines can significantly improve the time to action for business decisions. More: http://info.mapr.com/WB_MapR-StreamSets-Data-Warehouse-Modernization_Global_DG_17.08.16_RegistrationPage.html
Live Tutorial – Streaming Real-Time Events Using Apache APIsMapR Technologies
For this talk we will explore the power of streaming real time events in the context of the IoT and smart cities.
http://info.mapr.com/WB_Streaming-Real-Time-Events_Global_DG_17.08.02_RegistrationPage.html
Bringing Structure, Scalability, and Services to Cloud-Scale StorageMapR Technologies
Deploying storage with a forklift is so 1990s, right? Today’s applications and infrastructure demand systems and services that scale. Customers require performance and capacity to fit the use case and workloads, not the other way around. Architects need multi-temperature, multi-location, highly available, and compliance friendly platforms that grow with the generational shift in data growth and utility.
Churn prediction is big business. It minimizes customer defection by predicting which customers are likely to cancel a service. Though originally used within the telecommunications industry, it has become common practice for banks, ISPs, insurance firms, and other verticals. More: http://info.mapr.com/WB_PredictingChurn_Global_DG_17.06.15_RegistrationPage.html
The prediction process is data-driven and often uses advanced machine learning techniques. In this webinar, we'll look at customer data, do some preliminary analysis, and generate churn prediction models – all with Spark machine learning (ML) and a Zeppelin notebook.
Spark’s ML library goal is to make machine learning scalable and easy. Zeppelin with Spark provides a web-based notebook that enables interactive machine learning and visualization.
In this tutorial, we'll do the following:
Review classification and decision trees
Use Spark DataFrames with Spark ML pipelines
Predict customer churn with Apache Spark ML decision trees
Use Zeppelin to run Spark commands and visualize the results
An Introduction to the MapR Converged Data PlatformMapR Technologies
Listen to the webinar on-demand: http://info.mapr.com/WB_Partner_CDP_Intro_EMEA_DG_17.05.31_RegistrationPage.html
In this 90-minute webinar, we discuss:
- The MapR Converged Data Platform and its components
- Use cases for the Converged Data Platform
- MapR Converged Partner Program
- How to get started with MapR
- Becoming a partner
How to Leverage the Cloud for Business Solutions | Strata Data Conference Lon...MapR Technologies
IT budgets are shrinking, and the move to next-generation technologies is upon us. The cloud is an option for nearly every company, but just because it is an option doesn’t mean it is always the right solution for every problem.
Most cloud providers would prefer that every customer be tightly coupled with their proprietary services and APIs to create lock-in with that cloud provider. The savvy customer will leverage the cloud as infrastructure and stay loosely bound to a cloud provider. This creates an opportunity for the customer to execute a multicloud strategy or even a hybrid on-premises and cloud solution.
Jim Scott explores different use cases that may be best run in the cloud versus on-premises, points out opportunities to optimize cost and operational benefits, and explains how to get the data moved between locations. Along the way, Jim discusses security, backups, event streaming, databases, replication, and snapshots across a variety of use cases that run most businesses today.
Is your organization at the analytics crossroads? Have you made strides collecting and sharing massive amounts of data from electronic health records, insurance claims, and health information exchanges but found these efforts made little impact on efficiency, patient outcomes, or costs?
Changes in how business is done combined with multiple technology drivers make geo-distributed data increasingly important for enterprises. These changes are causing serious disruption across a wide range of industries, including healthcare, manufacturing, automotive, telecommunications, and entertainment. Technical challenges arise with these disruptions, but the good news is there are now innovative solutions to address these problems. http://info.mapr.com/WB_Geo-distributed-Big-Data-and-Analytics_Global_DG_17.05.16_RegistrationPage.html
This document is the agenda for a MapR product update webinar that will take place in Spring 2017. It introduces MapR's new Persistent Application Client Container (PACC) which allows applications to easily persist data in Docker containers. It also discusses MapR Edge for IoT which extends MapR's converged data platform to the edge. The webinar will cover Hive, Spark, and Drill updates in the new MapR Ecosystem Pack 3.0. Speakers from MapR will provide details on these products and there will be a question and answer session.
3 Benefits of Multi-Temperature Data Management for Data AnalyticsMapR Technologies
SAP® HANA and SAP® IQ are popular platforms for various analytical and transactional use cases. If you’re an SAP customer, you’ve experienced the benefits of deploying these solutions. However, as data volumes grow, you’re likely asking yourself: How do I scale storage to support these applications? How can I have one platform for various applications and use cases?
Cisco & MapR bring 3 Superpowers to SAP HANA DeploymentsMapR Technologies
SAP HANA is an increasingly popular platform for various analytical and transactional use cases with its in-memory architecture. If you’re an SAP customer you’ve experienced the benefits.
However, the underlying storage for SAP HANA is painfully expensive. This slows down your ability to grow your SAP HANA footprint and serve up more applications.
You’re not the only one still loading your data into data warehouses and building marts or cubes out of it. But today’s data requires a much more accessible environment that delivers real-time results. Prepare for this transformation because your data platform and storage choices are about to undergo a re-platforming that happens once in 30 years.
With the MapR Converged Data Platform (CDP) and Cisco Unified Compute System (UCS), you can optimize today’s infrastructure and grow to take advantage of what’s next. Uncover the range of possibilities from re-platforming by intimately understanding your options for density, performance, functionality and more.
Drill can query JSON data stored in various data sources like HDFS, HBase, and Hive. It allows running SQL queries over JSON data without requiring a fixed schema. The document describes how Drill enables ad-hoc querying of JSON-formatted Yelp business review data using SQL, providing insights faster than traditional approaches.
Main news related to the CCS TSI 2023 (2023/1695)Jakub Marek
An English 🇬🇧 translation of a presentation to the speech I gave about the main changes brought by CCS TSI 2023 at the biggest Czech conference on Communications and signalling systems on Railways, which was held in Clarion Hotel Olomouc from 7th to 9th November 2023 (konferenceszt.cz). Attended by around 500 participants and 200 on-line followers.
The original Czech 🇨🇿 version of the presentation can be found here: https://www.slideshare.net/slideshow/hlavni-novinky-souvisejici-s-ccs-tsi-2023-2023-1695/269688092 .
The videorecording (in Czech) from the presentation is available here: https://youtu.be/WzjJWm4IyPk?si=SImb06tuXGb30BEH .
Conversational agents, or chatbots, are increasingly used to access all sorts of services using natural language. While open-domain chatbots - like ChatGPT - can converse on any topic, task-oriented chatbots - the focus of this paper - are designed for specific tasks, like booking a flight, obtaining customer support, or setting an appointment. Like any other software, task-oriented chatbots need to be properly tested, usually by defining and executing test scenarios (i.e., sequences of user-chatbot interactions). However, there is currently a lack of methods to quantify the completeness and strength of such test scenarios, which can lead to low-quality tests, and hence to buggy chatbots.
To fill this gap, we propose adapting mutation testing (MuT) for task-oriented chatbots. To this end, we introduce a set of mutation operators that emulate faults in chatbot designs, an architecture that enables MuT on chatbots built using heterogeneous technologies, and a practical realisation as an Eclipse plugin. Moreover, we evaluate the applicability, effectiveness and efficiency of our approach on open-source chatbots, with promising results.
"Scaling RAG Applications to serve millions of users", Kevin GoedeckeFwdays
How we managed to grow and scale a RAG application from zero to thousands of users in 7 months. Lessons from technical challenges around managing high load for LLMs, RAGs and Vector databases.
The Microsoft 365 Migration Tutorial For Beginner.pptxoperationspcvita
This presentation will help you understand the power of Microsoft 365. However, we have mentioned every productivity app included in Office 365. Additionally, we have suggested the migration situation related to Office 365 and how we can help you.
You can also read: https://www.systoolsgroup.com/updates/office-365-tenant-to-tenant-migration-step-by-step-complete-guide/
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/temporal-event-neural-networks-a-more-efficient-alternative-to-the-transformer-a-presentation-from-brainchip/
Chris Jones, Director of Product Management at BrainChip , presents the “Temporal Event Neural Networks: A More Efficient Alternative to the Transformer” tutorial at the May 2024 Embedded Vision Summit.
The expansion of AI services necessitates enhanced computational capabilities on edge devices. Temporal Event Neural Networks (TENNs), developed by BrainChip, represent a novel and highly efficient state-space network. TENNs demonstrate exceptional proficiency in handling multi-dimensional streaming data, facilitating advancements in object detection, action recognition, speech enhancement and language model/sequence generation. Through the utilization of polynomial-based continuous convolutions, TENNs streamline models, expedite training processes and significantly diminish memory requirements, achieving notable reductions of up to 50x in parameters and 5,000x in energy consumption compared to prevailing methodologies like transformers.
Integration with BrainChip’s Akida neuromorphic hardware IP further enhances TENNs’ capabilities, enabling the realization of highly capable, portable and passively cooled edge devices. This presentation delves into the technical innovations underlying TENNs, presents real-world benchmarks, and elucidates how this cutting-edge approach is positioned to revolutionize edge AI across diverse applications.
QA or the Highway - Component Testing: Bridging the gap between frontend appl...zjhamm304
These are the slides for the presentation, "Component Testing: Bridging the gap between frontend applications" that was presented at QA or the Highway 2024 in Columbus, OH by Zachary Hamm.
inQuba Webinar Mastering Customer Journey Management with Dr Graham HillLizaNolte
HERE IS YOUR WEBINAR CONTENT! 'Mastering Customer Journey Management with Dr. Graham Hill'. We hope you find the webinar recording both insightful and enjoyable.
In this webinar, we explored essential aspects of Customer Journey Management and personalization. Here’s a summary of the key insights and topics discussed:
Key Takeaways:
Understanding the Customer Journey: Dr. Hill emphasized the importance of mapping and understanding the complete customer journey to identify touchpoints and opportunities for improvement.
Personalization Strategies: We discussed how to leverage data and insights to create personalized experiences that resonate with customers.
Technology Integration: Insights were shared on how inQuba’s advanced technology can streamline customer interactions and drive operational efficiency.
From Natural Language to Structured Solr Queries using LLMsSease
This talk draws on experimentation to enable AI applications with Solr. One important use case is to use AI for better accessibility and discoverability of the data: while User eXperience techniques, lexical search improvements, and data harmonization can take organizations to a good level of accessibility, a structural (or “cognitive” gap) remains between the data user needs and the data producer constraints.
That is where AI – and most importantly, Natural Language Processing and Large Language Model techniques – could make a difference. This natural language, conversational engine could facilitate access and usage of the data leveraging the semantics of any data source.
The objective of the presentation is to propose a technical approach and a way forward to achieve this goal.
The key concept is to enable users to express their search queries in natural language, which the LLM then enriches, interprets, and translates into structured queries based on the Solr index’s metadata.
This approach leverages the LLM’s ability to understand the nuances of natural language and the structure of documents within Apache Solr.
The LLM acts as an intermediary agent, offering a transparent experience to users automatically and potentially uncovering relevant documents that conventional search methods might overlook. The presentation will include the results of this experimental work, lessons learned, best practices, and the scope of future work that should improve the approach and make it production-ready.
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyScyllaDB
Freshworks creates AI-boosted business software that helps employees work more efficiently and effectively. Managing data across multiple RDBMS and NoSQL databases was already a challenge at their current scale. To prepare for 10X growth, they knew it was time to rethink their database strategy. Learn how they architected a solution that would simplify scaling while keeping costs under control.
In the realm of cybersecurity, offensive security practices act as a critical shield. By simulating real-world attacks in a controlled environment, these techniques expose vulnerabilities before malicious actors can exploit them. This proactive approach allows manufacturers to identify and fix weaknesses, significantly enhancing system security.
This presentation delves into the development of a system designed to mimic Galileo's Open Service signal using software-defined radio (SDR) technology. We'll begin with a foundational overview of both Global Navigation Satellite Systems (GNSS) and the intricacies of digital signal processing.
The presentation culminates in a live demonstration. We'll showcase the manipulation of Galileo's Open Service pilot signal, simulating an attack on various software and hardware systems. This practical demonstration serves to highlight the potential consequences of unaddressed vulnerabilities, emphasizing the importance of offensive security practices in safeguarding critical infrastructure.
Must Know Postgres Extension for DBA and Developer during MigrationMydbops
Mydbops Opensource Database Meetup 16
Topic: Must-Know PostgreSQL Extensions for Developers and DBAs During Migration
Speaker: Deepak Mahto, Founder of DataCloudGaze Consulting
Date & Time: 8th June | 10 AM - 1 PM IST
Venue: Bangalore International Centre, Bangalore
Abstract: Discover how PostgreSQL extensions can be your secret weapon! This talk explores how key extensions enhance database capabilities and streamline the migration process for users moving from other relational databases like Oracle.
Key Takeaways:
* Learn about crucial extensions like oracle_fdw, pgtt, and pg_audit that ease migration complexities.
* Gain valuable strategies for implementing these extensions in PostgreSQL to achieve license freedom.
* Discover how these key extensions can empower both developers and DBAs during the migration process.
* Don't miss this chance to gain practical knowledge from an industry expert and stay updated on the latest open-source database trends.
Mydbops Managed Services specializes in taking the pain out of database management while optimizing performance. Since 2015, we have been providing top-notch support and assistance for the top three open-source databases: MySQL, MongoDB, and PostgreSQL.
Our team offers a wide range of services, including assistance, support, consulting, 24/7 operations, and expertise in all relevant technologies. We help organizations improve their database's performance, scalability, efficiency, and availability.
Contact us: info@mydbops.com
Visit: https://www.mydbops.com/
Follow us on LinkedIn: https://in.linkedin.com/company/mydbops
For more details and updates, please follow up the below links.
Meetup Page : https://www.meetup.com/mydbops-databa...
Twitter: https://twitter.com/mydbopsofficial
Blogs: https://www.mydbops.com/blog/
Facebook(Meta): https://www.facebook.com/mydbops/
Introduction of Cybersecurity with OSS at Code Europe 2024Hiroshi SHIBATA
I develop the Ruby programming language, RubyGems, and Bundler, which are package managers for Ruby. Today, I will introduce how to enhance the security of your application using open-source software (OSS) examples from Ruby and RubyGems.
The first topic is CVE (Common Vulnerabilities and Exposures). I have published CVEs many times. But what exactly is a CVE? I'll provide a basic understanding of CVEs and explain how to detect and handle vulnerabilities in OSS.
Next, let's discuss package managers. Package managers play a critical role in the OSS ecosystem. I'll explain how to manage library dependencies in your application.
I'll share insights into how the Ruby and RubyGems core team works to keep our ecosystem safe. By the end of this talk, you'll have a better understanding of how to safeguard your code.
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...Jason Yip
The typical problem in product engineering is not bad strategy, so much as “no strategy”. This leads to confusion, lack of motivation, and incoherent action. The next time you look for a strategy and find an empty space, instead of waiting for it to be filled, I will show you how to fill it in yourself. If you’re wrong, it forces a correction. If you’re right, it helps create focus. I’ll share how I’ve approached this in the past, both what works and lessons for what didn’t work so well.
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...Fwdays
Direct losses from downtime in 1 minute = $5-$10 thousand dollars. Reputation is priceless.
As part of the talk, we will consider the architectural strategies necessary for the development of highly loaded fintech solutions. We will focus on using queues and streaming to efficiently work and manage large amounts of data in real-time and to minimize latency.
We will focus special attention on the architectural patterns used in the design of the fintech system, microservices and event-driven architecture, which ensure scalability, fault tolerance, and consistency of the entire system.
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...DanBrown980551
This LF Energy webinar took place June 20, 2024. It featured:
-Alex Thornton, LF Energy
-Hallie Cramer, Google
-Daniel Roesler, UtilityAPI
-Henry Richardson, WattTime
In response to the urgency and scale required to effectively address climate change, open source solutions offer significant potential for driving innovation and progress. Currently, there is a growing demand for standardization and interoperability in energy data and modeling. Open source standards and specifications within the energy sector can also alleviate challenges associated with data fragmentation, transparency, and accessibility. At the same time, it is crucial to consider privacy and security concerns throughout the development of open source platforms.
This webinar will delve into the motivations behind establishing LF Energy’s Carbon Data Specification Consortium. It will provide an overview of the draft specifications and the ongoing progress made by the respective working groups.
Three primary specifications will be discussed:
-Discovery and client registration, emphasizing transparent processes and secure and private access
-Customer data, centering around customer tariffs, bills, energy usage, and full consumption disclosure
-Power systems data, focusing on grid data, inclusive of transmission and distribution networks, generation, intergrid power flows, and market settlement data
"Choosing proper type of scaling", Olena SyrotaFwdays
Imagine an IoT processing system that is already quite mature and production-ready and for which client coverage is growing and scaling and performance aspects are life and death questions. The system has Redis, MongoDB, and stream processing based on ksqldb. In this talk, firstly, we will analyze scaling approaches and then select the proper ones for our system.