The document discusses managing a multi-tenant data lake at Comcast over time. It began as an experiment in 2013 with 10 nodes and has grown significantly to over 1500 nodes currently. Governance was instituted to manage the diverse user community and workloads. Tools like the Command Center were developed to provide monitoring, alerting and visualization of the large Hadoop environment. SLA management, support processes, and ongoing training are needed to effectively operate the multi-tenant data lake at scale.
This document provides an overview of NEC's Heterogeneous Mixture Learning (HML) technology and its implementation on Apache Spark. It introduces the speakers and their backgrounds working on distributed computing and machine learning. The agenda discusses HML, applications of HML, the HML algorithm, and benchmark performance evaluations showing HML achieves competitive prediction accuracy compared to other Spark ML algorithms while maintaining good scalability. Distributed HML on Spark aims to enable fast, large-scale machine learning by balancing work across executors and leveraging high-performance matrix libraries.
This document discusses building a scalable data science platform with R. It describes R as a popular statistical programming language with over 2.5 million users. It notes that while R is widely used, its open source nature means it lacks enterprise capabilities for large-scale use. The document then introduces Microsoft R Server as a way to bring enterprise capabilities like scalability, efficiency, and support to R in order to make it suitable for production use on big data problems. It provides examples of using R Server with Hadoop and HDInsight on the Azure cloud to operationalize advanced analytics workflows from data cleaning and modeling to deployment as web services at scale.
Apache Eagle is a distributed real-time monitoring and alerting engine for Hadoop that was created by eBay and later open sourced as an Apache Incubator project. It provides security for Hadoop systems by instantly identifying access to sensitive data, recognizing attacks/malicious activity, and blocking access in real time through complex policy definitions and stream processing. Eagle was designed to handle the huge volume of metrics and logs generated by large-scale Hadoop deployments through its distributed architecture and use of technologies like Apache Storm and Kafka.
Presented by Jack Norris, SVP Data & Applications, at Gartner Symposium 2016.
Jack presents how companies from TransUnion to Uber use event-driven processing to transform their business with agility, scale, robustness, and efficiency advantages.
More info: https://www.mapr.com/company/press-releases/mapr-present-gartner-symposiumitxpo-and-other-notable-industry-conferences
Testistanbul 2016 - Keynote: "Enterprise Challenges of Test Data" by Rex Black (Turkish Testing Board)
If you are testing a simple mobile app, you may find it relatively easy to obtain representative test data. But what if you are testing enterprise-scale applications? In the enterprise data center, one hundred or more applications of varying size, complexity, and criticality coexist, operating on various data repositories, some of them shared. In some cases, disparate data repositories hold related data, and the ability to test integration across the applications that access these data sets is critical. In this keynote speech, Rex Black will talk about the challenges his clients face as they deal with these testing problems. You’ll go away with a better understanding of the nature of the challenges, as well as ideas on how to handle them, grounded in lessons Rex has learned in over 30 years of software engineering and testing.
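One pattern that comes up repeatedly for the shared-repository problem is generating synthetic but referentially consistent records once, then loading the same records into every application under test. A minimal sketch, assuming the Python Faker library and an invented customer layout (an illustration, not something prescribed in the keynote):

```python
from faker import Faker

fake = Faker()
fake.seed_instance(42)  # reproducible data matters for regression testing

def make_customers(n):
    """Build customer rows that several applications can share."""
    return [
        {
            "customer_id": i,
            "name": fake.name(),
            "email": fake.email(),
            "address": fake.address().replace("\n", ", "),
        }
        for i in range(n)
    ]

# Each application's loader consumes the same list, so integration tests
# across applications see consistent keys in every repository.
customers = make_customers(1000)
```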
Insight Platforms Accelerate Digital Transformation (MapR Technologies)
Many organizations have invested in big data technologies such as Hadoop and Spark. But these investments only address how to gain deeper insights from more diverse data. They do not address how to create action from those insights.
Forrester has identified an emerging class of software—insight platforms—that combine data, analytics, and insight execution to drive action using a big data fabric.
In this presentation, our guest, Forrester Research VP and Principal Analyst, Brian Hopkins, will:
o Present Forrester's recent research on insight platforms and big data fabrics.
o Provide strategies for getting more value from your big data investments.
MapR will share:
o Examples of leading companies and best practices for creating modern applications.
o How to combine analytics and operations to accelerate digital transformation and create competitive advantage.
451 Research is a leading IT research and advisory company founded in 2000 with over 250 employees including over 100 analysts. It provides research and data through fifteen channels to over 1,000 clients on technology and service providers. The document discusses the evolution of the meaning of "Hadoop" from referring originally to specific Apache projects like HDFS and MapReduce to becoming a catch-all term for the distributed data processing ecosystem, and how different Hadoop distributions combine various related Apache projects in their offerings. It also examines how data platforms are converging, with various databases, analytics engines, and streaming platforms increasingly supporting common workloads and data models.
This workshop will provide a hands-on introduction to basic Machine Learning techniques with Apache Spark ML using the cloud.
Format: A short introductory lecture on selected supervised and unsupervised Machine Learning techniques, followed by a demo, lab exercises, and a Q&A session, with lab time afterwards to work through the exercises and ask questions.
Objective: To provide a quick and short hands-on introduction to Machine Learning with Spark ML. In the lab, you will use the following components: Apache Zeppelin (a “Modern Data Science Toolbox”) and Apache Spark. You will learn how to analyze the data, structure the data, train Machine Learning models and apply them to answer real-world questions.
Pre-requisites: Registrants must bring a laptop that can run the Hortonworks Data Cloud.
At this Crash Course everyone will have a cluster assigned to them to try several workloads using Machine Learning, Spark and Zeppelin on the cloud.
Speakers: Robert Hryniewicz
MapR on Azure: Getting Value from Big Data in the Cloud (MapR Technologies)
Public cloud adoption is exploding and big data technologies are rapidly becoming an important driver of this growth. According to Wikibon, big data public cloud revenue will grow from 4.4% in 2016 to 24% of all big data spend by 2026. Digital transformation initiatives are now a priority for most organizations, with data and advanced analytics at the heart of enabling this change. This is key to driving competitive advantage in every industry.
There is nothing better than a real-world customer use case to help you understand how to get value from big data in the cloud and apply the learnings to your business. Join Microsoft, MapR, and Sullexis on November 10th to:
Hear from Sullexis on the business use case and technical implementation details of one of their oil & gas customers
Understand the integration points of the MapR Platform with other Azure services and why they matter
Know how to deploy the MapR Platform on the Azure cloud and get started easily
You will also get to hear about customer use cases of the MapR Converged Data Platform on Azure in other verticals such as real estate and retail.
Speakers
Rafael Godinho
Technical Evangelist
Microsoft Azure
Tim Morgan
Managing Director
Sullexis
Common and unique use cases for Apache Hadoop (Brock Noland)
The document provides an overview of Apache Hadoop and common use cases. It describes how Hadoop is well-suited for log processing due to its ability to handle large amounts of data in parallel across commodity hardware. Specifically, it allows processing of log files to be distributed per unit of data, avoiding bottlenecks that can occur when trying to process a single large file sequentially.
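A minimal sketch of that pattern, assuming the mrjob library and a common access-log layout (the summary names no specific framework): each mapper processes its own slice of the input, so no single reader becomes a bottleneck.

```python
from mrjob.job import MRJob

class StatusCodeCount(MRJob):
    """Count HTTP status codes across arbitrarily large log files."""

    def mapper(self, _, line):
        # Hypothetical access-log layout: the status code is field 9.
        fields = line.split()
        if len(fields) > 8:
            yield fields[8], 1

    def reducer(self, status, counts):
        yield status, sum(counts)

if __name__ == "__main__":
    # e.g. python status_count.py -r hadoop hdfs:///logs/
    StatusCodeCount.run()
```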
Big Data Platform Processes Daily Healthcare Data for Clinic Use at Mayo Clinic (DataWorks Summit)
The document summarizes Mayo Clinic's implementation of a big data platform to process and analyze large volumes of daily healthcare data, including HL7 messages, for enterprise-wide clinical and non-clinical usage. The platform, built on Hadoop and using technologies like Storm and Elasticsearch, reliably handles 20-50 times more data than their current daily volumes. It provides ultra-fast free text search capabilities. The system supports applications like processing data for colorectal surgery, exceeding requirements and outperforming previous RDBMS-only systems. Ongoing work involves further enhancing capabilities and integrating with additional components as part of a unified data platform.
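As a hedged sketch of one step in such a pipeline, the snippet below parses an inbound HL7 message and indexes it for free-text search. The python-hl7 and elasticsearch client packages stand in for the Storm/Elasticsearch stack named above; the index name and field positions are illustrative assumptions.

```python
import hl7
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # assumed local test node

def index_hl7_message(raw_message: str) -> None:
    """Parse one HL7 message and make it searchable."""
    parsed = hl7.parse(raw_message)
    doc = {
        "message_type": str(parsed.segment("MSH")[9]),  # MSH-9, assumed
        "raw": raw_message,
    }
    es.index(index="hl7-messages", body=doc)
```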
Real World Use Cases: Hadoop and NoSQL in Production (Codemotion)
"Real World Use Cases: Hadoop and NoSQL in Production" by Tugdual Grall.
What’s important about a technology is what you can use it to do. I’ve looked at what a number of groups are doing with Apache Hadoop and NoSQL in production, and I will relay what worked well for them and what did not. Drawing from real-world use cases, I show how people who understand these new approaches can employ them well in conjunction with traditional approaches and existing applications. Threat detection, data warehouse optimization, marketing efficiency, and biometric databases are some of the examples covered in this presentation.
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi (Felicia Haggarty)
The document discusses challenges with building operational data applications on Hadoop and introduces the Cask Data Application Platform (CDAP) as a solution. It provides an agenda that covers data applications, challenges, CDAP motivation and goals, use cases, and an introduction and architecture overview of CDAP. The document aims to demonstrate how CDAP provides a unified platform that simplifies application development and lifecycle while supporting reusable data and processing patterns.
Ted Dunning is the Chief Applications Architect at MapR Technologies and a committer for Apache Drill, Zookeeper, and other projects. The document discusses goals around real-time or near-time processing and microservices. It describes how to design microservices for isolation using self-describing data, private databases, and shared storage only where necessary. Various scenarios involving fraud detection, IoT data aggregation, and global data recovery are presented. Lessons focus on decoupling services, propagating events rather than table updates, and how data architecture should reflect business structure.
The document discusses how big data has enabled new opportunities by changing scaling laws and problem landscapes. Specifically, linearly scaling costs with big data now make it feasible to process large amounts of data, opening up many problems that were previously impossible or too difficult. This has created many "green field" opportunities where simple approaches can solve important problems. Two examples discussed are using log analysis to detect security threats and using transaction histories to find a common point of compromise for a data breach.
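The common-point-of-compromise search reduces to counting merchant overlap across the compromised cards. A toy version with an assumed transaction layout:

```python
from collections import Counter

def common_point_of_compromise(transactions, compromised_cards):
    """transactions: iterable of (card_id, merchant_id, timestamp) tuples."""
    merchant_hits = Counter()
    for card in compromised_cards:
        merchants = {m for c, m, _ in transactions if c == card}
        merchant_hits.update(merchants)  # count each merchant once per card
    # A merchant seen by (nearly) all compromised cards is the best suspect.
    return merchant_hits.most_common(5)
```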
This document discusses scaling machine learning models from a laboratory setting to production. It proposes using a standardized representation called PMML to capture models produced by R and Scikit-Learn. PMML allows models to be deployed across different frameworks and languages. The document outlines APIs for evaluating, maintaining, and integrating models as reusable functions within data pipelines in Hadoop ecosystems like Spark, Pig, and Cascading. The goal is a portable, platform-agnostic architecture for operationalizing machine learning based on open standards.
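A minimal sketch of the Scikit-Learn side of that hand-off, using the sklearn2pmml package as one common exporter (the document itself does not prescribe a specific tool):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

X, y = load_iris(return_X_y=True)
pipeline = PMMLPipeline([("classifier", DecisionTreeClassifier())])
pipeline.fit(X, y)

# The resulting file can be loaded by any PMML-aware scoring engine,
# e.g. a JVM evaluator embedded in a Spark, Pig, or Cascading pipeline.
sklearn2pmml(pipeline, "iris_tree.pmml")
```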
Big Data Day LA 2016/ Use Case Driven track - From Clusters to Clouds, Hardwa... (Data Con LA)
Today’s Software Defined environments attempt to remove the weaknesses of computing hardware from the operational equation. There is no doubt that this is a natural progression away from overpriced, proprietary compute and storage layers. However, even at the heart of any Software Defined universe is an underlying hardware stack that must be robust, reliable, and cost-effective. Our 20+ years of experience delivering over 2000 clusters and clouds has taught us how to properly design and engineer the right hardware solution for Big Data, cluster, and cloud environments. This presentation shares that knowledge, allowing users to make better design decisions for any deployment.
Using Hadoop to Offload Data Warehouse Processing and More - Brad Anserson (MapR Technologies)
The document discusses using Hadoop to optimize an enterprise data warehouse. It describes offloading some ETL and long-term storage tasks to Hadoop which provides significant cost savings over a traditional data warehouse. The hybrid solution leverages both Hadoop and the data warehouse for optimized querying, presentation and analytics. Examples are provided of real-time and operational applications that can be built using Hadoop technologies.
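A hedged PySpark sketch of that offload pattern: the heavy nightly aggregation runs on the cluster, and only the small rollup goes back to the warehouse. Paths and column names here are invented.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dw-offload").getOrCreate()

raw = spark.read.parquet("hdfs:///landing/sales/2016-11-10/")
daily = (raw.groupBy("store_id", "product_id")
            .agg(F.sum("amount").alias("revenue"),
                 F.count("*").alias("transactions")))

# Long-term detail stays cheap on HDFS; the warehouse only ingests the rollup.
daily.write.mode("overwrite").parquet("hdfs:///curated/sales_daily/")
```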
The document discusses MapR Streams, a global publish/subscribe event streaming system. It provides converged, continuous, and global capabilities. MapR Streams allows producers to publish billions of messages per second to topics, and guarantees immediate and reliable delivery to consumers. It also enables tying together geo-dispersed clusters globally. The document demonstrates MapR Streams capabilities with a live demo and discusses use cases for event streaming across various industries.
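Because MapR Streams exposes a Kafka-compatible API, a producer sketch looks like ordinary Kafka code. The snippet below uses the generic kafka-python client against an assumed endpoint; with MapR's native client the topic would be addressed by a path such as /telemetry:events. All names are illustrative.

```python
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

producer.send("events", {"sensor": "pump-7", "temp_c": 81.4})
producer.flush()  # block until the broker has acknowledged delivery
```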
Performance and Scale Options for R with Hadoop: A comparison of potential ar... (Revolution Analytics)
R and Hadoop go together. In fact, they go together so well that the number of options available can be confusing to IT and data science teams seeking solutions under varying performance and operational requirements.
Which configuration is faster for big files? Which is faster for sharing data and servers among groups? Which eliminates data movement? Which is easiest to manage? Which works best with iterative and multistep algorithms? What are the hardware requirements of each alternative?
This webinar is intended to help new users of R with Hadoop select their best architecture for integrating Hadoop and R, by explaining the benefits of several popular configurations, their performance potential, workload handling and programming model and administrative characteristics.
Presenters from Revolution Analytics will describe the options for using Revolution R Open and Revolution R Enterprise with Hadoop, including servers, edge nodes, rHadoop, and ScaleR. We’ll then compare the characteristics of each configuration: performance, programming model, administration, data movement, ease of scaling, mixed-workload handling, and performance for large individual analyses versus mixed workloads.
Real life use cases from across Europe (Walid Aoudi - Cognizant)
This presentation covers the return on experience of several Cognizant Big Data clients in continental Europe and the UK. The main focus is on use cases, presented through the business drivers behind these projects. Key highlights of the big data architectures and solution approaches will be presented. Finally, the business outcomes, in terms of the ROI delivered by the implemented solutions, will be discussed.
The document discusses how machine data from various sources such as IoT devices, industrial systems, mobile devices, and other systems can be collected and analyzed using Splunk software. Splunk provides capabilities for data ingestion, indexing, searching, analyzing, and visualizing large amounts of machine data. It also discusses how Splunk has been used by companies in various industries to gain insights from their machine data to improve operations, security, customer experience, and business outcomes. Specific use cases highlighted include predictive maintenance, anomaly detection, supply chain optimization, and understanding customer behavior.
Evolving Beyond the Data Lake: A Story of Wind and Rain (MapR Technologies)
This document discusses how companies are increasingly investing in next-generation technologies like big data, cloud computing, and software/hardware related to these areas. It notes that 90% of data will be on next-gen technologies within four years. It then discusses how a converged data platform can help organizations gain insights from both historical and real-time data through applications that combine operational and analytical uses. Key benefits include the ability to seamlessly access and analyze both types of data.
The document discusses resource tracking for Hadoop and Storm clusters at Yahoo. It describes how Yahoo developed tools over three years to track resource usage at the application, cluster, queue, user and project levels. This includes capturing CPU and memory usage for Hadoop YARN applications and Storm topologies. The data is stored and made available through dashboards and APIs. Yahoo also calculates total cost of ownership for Hadoop and converts resource usage to estimated monthly costs for projects. This visibility into usage and costs helps with capacity planning, operational efficiency, and ensuring fairness across grid users.
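The usage-to-cost conversion is plain arithmetic once per-unit rates are fixed. A sketch with invented rates (not Yahoo's actual TCO figures):

```python
VCORE_HOUR_COST = 0.02  # assumed $/vcore-hour
GB_HOUR_COST = 0.005    # assumed $/GB-hour of memory

def monthly_cost(vcore_hours: float, memory_gb_hours: float) -> float:
    """Estimate a project's monthly charge from aggregated YARN usage."""
    return vcore_hours * VCORE_HOUR_COST + memory_gb_hours * GB_HOUR_COST

# A project that consumed 120,000 vcore-hours and 480,000 GB-hours:
print(f"${monthly_cost(120_000, 480_000):,.2f}")  # -> $4,800.00
```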
FOD Paris Meetup - Global Data Management with DataPlane Services (DPS), by Abdelkrim Hadjidj
The architecture of modern enterprise data lakes is based on multiple Hadoop clusters. Several clusters are used to separate environments such as dev, test, production, and DR. In some organizations, each business line has its own dedicated cluster to comply with legal or internal constraints; this is often the case in financial services. Hybrid cloud deployment is another example of multi-cluster deployment that we see in several verticals, such as manufacturing. In this presentation, we will introduce DataPlane Service (DPS), a global data management platform that enables organizations to operate, secure, and govern multiple clusters from a single pane of glass. We will show how DPS is a foundation for any service that needs to operate on multiple clusters. Finally, we will present three services built on top of DPS, focusing on Data Replication and Disaster Recovery.
Predicting failure in power networks, detecting fraudulent activities in payment card transactions, and identifying next logical products targeted at the right customer at the right time all require machine learning around massive data sets. This form of artificial intelligence requires complex self-learning algorithms, rapid data iteration for advanced analytics and a robust big data architecture that’s up to the task.
Learn how you can quickly exploit your existing IT infrastructure and scale operations in line with your budget to enjoy advanced data modeling, without having to invest in a large data science team.
Rigorous and Multi-tenant HBase Performance (Cloudera, Inc.)
The document discusses techniques for rigorously measuring Apache HBase performance in both standalone and multi-tenant environments. It introduces the Yahoo! Cloud Serving Benchmark (YCSB) and best practices for cluster setup, workload generation, data loading, and measurement. These include pre-splitting tables, warming caches, setting target throughput, and using appropriate workload distributions. The document also covers challenges in achieving good multi-tenant performance across HBase, MapReduce and Apache Solr.
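A sketch of driving YCSB along those lines: a load phase against a pre-split table, then a measured run at a fixed target throughput with a zipfian request distribution. The -P, -p, -threads, and -target flags are standard YCSB options; the hbase10 binding and the numbers are assumptions.

```python
import subprocess

def ycsb(phase, extra=()):
    """Invoke one YCSB phase against the assumed HBase binding."""
    subprocess.run(["bin/ycsb", phase, "hbase10",
                    "-P", "workloads/workloada",
                    "-threads", "40", *extra], check=True)

ycsb("load")                          # bulk-load a pre-split, warmed table
ycsb("run", ["-target", "10000",      # fixed offered throughput (ops/sec)
             "-p", "requestdistribution=zipfian"])
```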
Multi-tier, multi-tenant, multi-problem Kafka (Todd Palino)
At LinkedIn, the Kafka infrastructure is run as a service: the Streaming team develops and deploys Kafka, but is not the producer or consumer of the data that flows through it. With multiple datacenters, and numerous applications sharing these clusters, we have developed an architecture with multiple pipelines and multiple tiers. Most days, this works out well, but it has led to many interesting problems. Over the years we have worked to develop a number of solutions, most of them open source, to make it possible for us to reliably handle over a trillion messages a day.
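A toy rendition of the tiered-pipeline flow: a mirror process consumes from a local-datacenter cluster and republishes into an aggregate tier. LinkedIn's production mirroring uses dedicated tooling; this kafka-python sketch only illustrates the data path, and the endpoints are invented.

```python
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer("page-views",
                         bootstrap_servers="local-tier:9092",
                         group_id="aggregate-mirror")
producer = KafkaProducer(bootstrap_servers="aggregate-tier:9092")

for record in consumer:
    # Preserve the key so partitioning stays stable across tiers.
    producer.send("page-views", key=record.key, value=record.value)
```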
Solving Multi-tenancy and G1GC in Apache HBase (HBaseCon)
This document discusses tuning the Garbage First Garbage Collector (G1GC) for HBase clusters. Out of the box, G1GC can hurt performance with long GC pauses. The key tuning parameters are heap size, initiating heap occupancy percentage, Eden size percentage, and HBase memory configuration caps. Tuning involves setting these parameters based on historical maximums for block cache size, memstore size, and static index size, plus a buffer. Tuning Eden size also considers the percentage of time spent in GC and average young GC pause times. Adjustments may be needed over time based on cluster usage. Suboptimal client usage can also impact GC and requires fixing. Monitoring GC metrics helps evaluate tuning effectiveness.
Lily for the Bay Area HBase UG - NYC edition (NGDATA)
The document discusses Lily, an open source content application developed by Outerthought that uses HBase for scalable storage and SOLR for search. It provides a high-level overview of Lily's architecture, which maps content to HBase, indexes it in SOLR, and uses a queue implemented on HBase to connect updates between the systems. Future plans for Lily include a 1.0 release with additional features like user management and a UI framework.
Spark Summit - Watson Analytics for Social Media: From single tenant Hadoop t... (Behar Veliqi)
- WHAT IS WATSON ANALYTICS FOR SOCIAL MEDIA
- PREVIOUS ARCHITECTURE ON HADOOP
- THOUGHT PROCESS TOWARDS MULTITENANCY
- NEW ARCHITECTURE ON TOP OF APACHE SPARK
- LESSONS LEARNED
This document discusses using event streams as the system of record for data, rather than traditional databases. It argues that streams can serve as the single source of truth for data, providing benefits like data lineage, auditing, and integrity. It also describes how healthcare company Liaison uses a streaming platform from MapR to power their data integration platform, gaining the advantages of streams while meeting various compliance requirements.
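The system-of-record claim rests on one property: current state can always be rebuilt by replaying the stream from the start. A minimal, framework-free illustration (event shapes assumed):

```python
def replay(events):
    """Fold an ordered event stream into current account balances."""
    balances = {}
    for event in events:
        delta = event["amount"] if event["type"] == "deposit" else -event["amount"]
        balances[event["account"]] = balances.get(event["account"], 0) + delta
    return balances

events = [
    {"type": "deposit", "account": "a1", "amount": 100},
    {"type": "withdrawal", "account": "a1", "amount": 30},
]
assert replay(events) == {"a1": 70}  # state is derived; the stream is truth
```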
Treasure Data provides a big data analytics platform that runs on Hadoop in the cloud. It aims to simplify big data and make it accessible for more users ("Big Data for the Rest of Us"). Treasure Data collects and stores data from various sources in its cloud-based columnar datastore and allows querying and analysis of data through SQL, REST APIs and other tools. It handles all the operational complexities of Hadoop and provides a simple interface for users.
This document provides an overview of distributed databases and the Yahoo! Cloud Serving Benchmark (YCSB). It discusses NoSQL databases Cassandra and HBase and how YCSB can be used to benchmark their performance. Experiments were conducted on Amazon EC2 using YCSB to load data and run workloads on Cassandra and HBase clusters. The results showed Cassandra had lower latency and higher throughput than HBase. YCSB provides a way to compare the performance of different databases.
Managing multi tenant resource toward Hive 2.0 (Kai Sasaki)
This document discusses Treasure Data's migration architecture for managing resources across multiple clusters when upgrading from Hive 1.x to Hive 2.0. It introduces components like PerfectQueue and Plazma that enable blue-green deployment without downtime. It also describes how automatic testing and validation are done to prevent performance degradation. Resource management is discussed, with resources defined per account across different job queues and Hadoop clusters. Brief performance comparisons show improvements from Hive 2.x features like Tez and vectorization.
At the StampedeCon 2015 Big Data Conference: YARN enables Hadoop to move beyond just pure batch processing. With that multiple workloads and tenants now must be able to share a single infrastructure for data processing. Features of the Capacity Scheduler enable resource sharing among multiple tenants in a fair manner with elastic queues to maximize utilization. This talk will focus on the features of the Capacity Scheduler that enable Multi-Tenancy and how resource sharing can be rebalanced using features like Preemption.
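The elasticity rule is easy to state in code: a queue is guaranteed its configured capacity, may borrow idle capacity up to its max-capacity, and preemption claws borrowed resources back when the owning queue returns. A toy model in percentages of one cluster, with invented numbers:

```python
def allocation(demand, capacity, max_capacity, idle):
    """How much a queue actually receives right now (all values in %)."""
    guaranteed = min(demand, capacity)
    borrowed = min(max(demand - capacity, 0), idle, max_capacity - capacity)
    return guaranteed + borrowed

# Queue A (capacity 40%, max-capacity 80%) wants 70% while queue B is idle:
print(allocation(demand=70, capacity=40, max_capacity=80, idle=60))  # 70
# When B wakes up, preemption pushes A back toward its 40% guarantee.
```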
Hortonworks Technical Workshop: What's New in HDP 2.3 (Hortonworks)
Hortonworks Data Platform (HDP) 2.3 includes several new capabilities:
1) It improves the user experience with more guided configuration, customizable dashboards, and improved workload management.
2) It enhances security with new data encryption at rest and extends data governance.
3) It adds proactive cluster monitoring through Hortonworks SmartSense to enhance support.
This document discusses strategies for filling a data lake by improving the process of data onboarding. It advocates using a template-based approach to streamline data ingestion from various sources and reduce dependence on hardcoded procedures. The key aspects are managing ELT templates and metadata through automated metadata extraction. This allows generating integration jobs dynamically based on metadata passed at runtime, providing flexibility to handle different source data with one template. It emphasizes reducing the risks associated with large data onboarding projects by maintaining a standardized and organized data lake.
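A minimal sketch of the template idea: one generic ingestion job parameterized entirely by metadata, instead of a hardcoded procedure per source. The metadata fields and SQL shape are assumptions:

```python
INGEST_TEMPLATE = (
    "INSERT INTO {lake_schema}.{target_table} "
    "SELECT {columns} FROM staging.{source_table}"
)

def render_ingest_job(meta):
    """Generate an ELT statement from source metadata at runtime."""
    return INGEST_TEMPLATE.format(
        lake_schema=meta["lake_schema"],
        target_table=meta["target_table"],
        columns=", ".join(meta["columns"]),
        source_table=meta["source_table"],
    )

meta = {"lake_schema": "raw", "target_table": "orders",
        "source_table": "erp_orders", "columns": ["id", "sku", "qty"]}
print(render_ingest_job(meta))
# INSERT INTO raw.orders SELECT id, sku, qty FROM staging.erp_orders
```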
Multi-Tenant Operations with Cloudera 5.7 & BT (Cloudera, Inc.)
One benefit of Apache Hadoop is the ability to power multiple workloads, across many different users and departments, all within a single, shared cluster. Hear how BT is doing this today and learn about new features in Cloudera Manager to provide better visibility for multi-tenant operations.
Hortonworks Technical Workshop - Operational Best Practices Workshop (Hortonworks)
Hortonworks Data Platform is a key component of a Modern Data Architecture. Organizations rely on HDP for mission-critical business functions and expect the system to be constantly available and performant. In this session we will cover operational best practices for administering the Hortonworks Data Platform, including initial setup and ongoing maintenance.
Apache Phoenix and Apache HBase: An Enterprise Grade Data Warehouse (Josh Elser)
An overview of Apache Phoenix and Apache HBase from the angle of a traditional data warehousing solution. This talk focuses on where this open-source architecture fits into the market, outlines the features and integrations of the product, and shows that it is a viable alternative to traditional data warehousing solutions.
Presented by Mark Miller, Software Developer, Cloudera
Apache Lucene/Solr committer Mark Miller talks about how Solr has been integrated into the Hadoop ecosystem to provide full text search at "Big Data" scale. This talk will give an overview of how Cloudera has tackled integrating Solr into the Hadoop ecosystem and highlights some of the design decisions and future plans. Learn how Solr is getting 'cozy' with Hadoop, which contributions are going to what project, and how you can take advantage of these integrations to use Solr efficiently at "Big Data" scale. Learn how you can run Solr directly on HDFS, build indexes with Map/Reduce, load Solr via Flume in 'Near Realtime' and much more.
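For a taste of the application side, here is a hedged pysolr snippet; the talk concerns the Hadoop-side integration, and with indexes stored on HDFS the client-facing API below is unchanged. The URL and schema are invented.

```python
import pysolr

solr = pysolr.Solr("http://localhost:8983/solr/logs", timeout=10)

solr.add([{"id": "doc-1", "body": "checksum error on datanode 17"}])
solr.commit()

for hit in solr.search("body:checksum"):
    print(hit["id"])
```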
Harmonizing Multi-tenant HBase Clusters for Managing Workload Diversity (HBaseCon)
Speakers: Dheeraj Kapur, Rajiv Chittajallu & Anish Mathew (Yahoo!)
In early 2013, Yahoo! introduced multi-tenancy to HBase to offer it as a platform service for all Hadoop users. A certain degree of customization per tenant (a user or a project) was achieved through RegionServer groups, namespaces, and customized configs for each tenant. This talk covers how to accommodate diverse needs to individual tenants on the cluster, as well as operational tips and techniques that allow Yahoo! to automate the management of multi-tenant clusters at petabyte scale without errors.
The NameNode was experiencing high load and instability after being restarted. Graphs showed unknown high load between checkpoints on the NameNode. DataNode logs showed repeated 60000 millisecond timeouts in communication with the NameNode. Thread dumps revealed NameNode server handlers waiting on the same lock, indicating a bottleneck. Source code analysis pointed to repeated block reports from DataNodes to the NameNode as the likely cause of the high load.
HBase Vs Cassandra Vs MongoDB - Choosing the right NoSQL database (Edureka!)
NoSQL encompasses a wide range of database technologies that were developed in response to the surging volume of stored data. Relational databases are not capable of coping with this huge volume and face agility challenges. This is where NoSQL databases come into play, and they are popular because of their features. The session covers the following topics to help you choose the right NoSQL database:
Traditional databases
Challenges with traditional databases
CAP Theorem
NoSQL to the rescue
A BASE system
Choose the right NoSQL database
Accelerating Innovation with Hybrid Cloud (Jeff Jakubiak)
1) The document discusses IBM's hybrid cloud portfolio and how it can help organizations accelerate innovation through hybrid cloud.
2) IBM's hybrid cloud portfolio spans infrastructure, platform and application services across public, private and dedicated cloud environments to provide flexibility.
3) Key benefits highlighted include accelerating digital transformation, increasing operational speed and flexibility, and unlocking existing data and applications through hybrid integration.
Mainframe IMS assets can be integrated into the API economy, the hybrid cloud, and virtual data lake repositories. Just imagine any business use case and ask for guidance.
Cloud computing is a style of computing in which scalable and elastic IT capabilities are delivered as a service using internet technologies. Key aspects include resources that are scalable and metered by use, and may be single-tenant or multi-tenant and hosted remotely or on-premises. Self-service interfaces like web UIs and APIs are exposed directly to customers. Cloud services can provide software, platforms, infrastructure, integration capabilities, and everything as a service (XaaS). The major cloud providers offer various capabilities and are competing on features, price, and services beyond basic compute and storage.
Moving from the Web era to PaaS requires careful planning. This presentation simplifies the process by outlining 7 basic steps an enterprise has to consider as it moves to PaaS.
Monitor your applications and get into a framework of proactive application fixing instead of reactive firefighting. And with IBM, reduce your outages with the help of predictive insights.
The intersection of Traditional IT and New-Generation IT (Kangaroot)
Keynote from Franz Meyer - VP, EMEA Strategic Business Development Red Hat about "The intersection of Traditional IT and New-Generation IT : the Red Hat Open Hybrid Journey". This presentation was given during the Open Source Cloud Day of Kangaroot & Red Hat.
This document discusses cloud computing and how it can empower development teams. It begins with an overview of cloud service models like Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). It then discusses how the cloud can increase agility for development teams through virtualization, standardization, and automation. The document also provides examples of how tools like IBM Eclipse Tools for Bluemix and containers can help development teams deploy applications to the cloud more efficiently.
High Value Business Intelligence for IBM Platform compute environments (Gabor Samu)
IBM Platform Analytics is an advanced analysis and visualization tool for analyzing workload data from IBM Platform LSF and IBM Platform Symphony clusters. It allows organizations to correlate workload, resource and license data from multiple clusters for data-driven decision making.
The document discusses major app development trends for 2014, including mobile, social, cloud, and big data. It notes that mobile apps and advertising revenues are rapidly increasing. Popular mobile platforms include Android and iOS, while hybrid and cross-platform SDKs are gaining popularity for app development. Social media integration is also a significant trend, with apps adopting more social features. Cloud computing, especially personal cloud storage and Infrastructure as a Service (IaaS), is seeing greater adoption. Finally, big data solutions around Hadoop, NoSQL databases, and data analysis are increasingly important.
The document provides an overview of Software as a Service (SaaS) including:
- SaaS is a software delivery model that provides remote access to software via the web for a recurring fee, enabling users to access functionality hosted by the provider.
- SaaS is a subset of cloud computing where resources are provided as a service over the internet.
- Major benefits of SaaS include lower costs, quick access to updates, and reduced need for infrastructure management.
Migration to cloud is no easy task. Start small and learn the core technologies before leveraging the advanced features of the cloud. The cultural change will affect the whole organization from development to business management and sales.
Cloud native applications are the future of software. Modern software is stateless, provided from cloud to heterogeneous clients on demand and designed to be scalable and resilient.
Accelerating the Path to Digital with a Cloud Data Strategy (MongoDB)
This document discusses accelerating digital transformation through a cloud data strategy using MongoDB.
It begins by outlining MongoDB's capabilities as a cloud data platform, including its use by over 3000 enterprises. The document then discusses how time to market has replaced cost as the primary driver for cloud adoption. It also outlines considerations for choosing a cloud data platform like deployment flexibility, reducing complexity, agility, resiliency, scalability, cost, and security.
The document then provides an overview of MongoDB's cloud offerings, including MongoDB Atlas on public clouds, MongoDB Ops Manager for private clouds, and MongoDB Stitch for backend services. It also discusses best practices for replatforming applications from relational databases to MongoDB in the cloud.
Towards Application Portability in Platform as a Service (Stefan Kolb)
Get the book "On the Portability of Applications in Platform as a Service" at https://www.amazon.de/dp/3863096312
Presentation from IEEE SOSE 2014. Full paper at http://bit.ly/paaspaper
1. The document discusses DevOps and hybrid cloud, with DevOps being an approach combining culture, processes, and technologies to continuously deliver applications and innovation.
2. APIs are key to hybrid cloud and DevOps, allowing components and services to be developed and reused across teams and cloud environments.
3. IBM recommends organizations build a common toolchain including tools for development, testing, deployment, and monitoring to facilitate DevOps practices and hybrid cloud deployments.
The document discusses cloud adoption patterns for integrating cloud solutions into an organization's IT strategy. It outlines different types of application migrations like lift-and-shift, cloud tuning, and cloud-centric design. It also covers design principles for cloud-native applications like microservices and stateless runtimes. Various DevOps patterns are presented, such as continuous integration/delivery pipelines, functional testing, and log aggregation. The goal is to provide guidance on architectural approaches and best practices for developing and deploying applications in the cloud.
Originally Published on Sep 23, 2014
IBM InfoSphere BigInsights, an enterprise-ready distribution of Hadoop, is designed to address the challenges of big data and modern IT by analyzing larger volumes of data more cost-effectively. Deployed on the cloud, it enables rapid deployment of clusters and real-time analytics.
FYI: The value of Hadoop and many more questions will be pondered at this year’s Strata/Hadoop World event in NYC (October 15-17, 2014) and certainly at IBM Insight (October 26-30, 2014).
This document discusses cloud adoption patterns to help organizations integrate cloud solutions into their IT strategies. It introduces the concept of patterns and pattern languages as solutions to problems in context. The document outlines categories of cloud adoption patterns and provides examples of patterns for application architecture, deployment styles, data caching, and more. It also discusses considerations for migrating applications to the cloud through lift and shift, cloud tuning, or cloud-centric redesign. The goal is to provide guidance to organizations on evaluating workloads and adopting cloud technologies.
Build end-to-end solutions with Bluemix, Avi Vizel & Ziv Dai, IBM (Codemotion Tel Aviv)
The document discusses IBM's cloud platform Bluemix. It provides an overview of Bluemix, describing it as an open platform for developing and hosting applications that simplifies tasks associated with managing infrastructure at internet scale. Bluemix is built on IBM's Cloud Operating Environment architecture using Cloud Foundry as an open source PaaS. It enables developers to rapidly build, deploy, and manage cloud applications while tapping into available services and runtimes provided by IBM and other ecosystem partners. The document outlines some key Bluemix concepts and components such as applications, services, organizations/spaces, and buildpacks.
Learn about IBM's Hadoop offering called BigInsights. We will look at the new features in version 4 (including a discussion on the Open Data Platform), review a couple of customer examples, talk about the overall offering and differentiators, and then provide a brief demonstration on how to get started quickly by creating a new cloud instance, uploading data, and generating a visualization using the built-in spreadsheet tooling called BigSheets.
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe (Paige Cruz)
Monitoring and observability aren't traditionally found in software curriculums, so many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company's observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring and observability to ops, infra, and SRE teams. This is a mistake: achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share the foundational concepts to build on.
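As a starting point, here is a minimal, vendor-neutral sketch of two of those foundational signals, structured logs and a latency metric, using only the Python standard library; a real stack would emit these through an SDK such as OpenTelemetry:

```python
# Vendor-neutral sketch: a structured log line carrying a trace id (for
# correlation) and a request latency measurement (a basic metric).
import json, logging, time, uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("checkout")

def handle_request():
    trace_id = str(uuid.uuid4())      # correlates all events for one request
    start = time.perf_counter()
    # ... business logic would run here ...
    latency_ms = (time.perf_counter() - start) * 1000
    log.info(json.dumps({
        "event": "request_handled",
        "trace_id": trace_id,
        "latency_ms": round(latency_ms, 2),
    }))

handle_request()
```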
Pushing the limits of ePRTC: 100ns holdover for 100 days (Adtran)
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
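For a sense of scale, a back-of-envelope calculation (not Adtran's actual method, which must also account for drift, aging, and temperature) shows how tight the oscillator requirement is:

```python
# Back-of-envelope only: how small a constant fractional frequency offset
# must be to accumulate <= 100 ns of time error over 100 days of holdover,
# ignoring drift/aging and temperature effects.
ERROR_BUDGET_S = 100e-9          # 100 ns
HOLDOVER_S = 100 * 86400         # 100 days in seconds

max_fractional_offset = ERROR_BUDGET_S / HOLDOVER_S
print(f"{max_fractional_offset:.2e}")  # ~1.16e-14
```

Even under these idealized assumptions, the frequency must hold to roughly one part in 1e14, which is why parametric holdover techniques are needed.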
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Dr. Sean Tan, Head of Data Science, Changi Airport Group
Discover how Changi Airport Group (CAG) leverages graph technologies and generative AI to revolutionize their search capabilities. This session delves into the unique search needs of CAG’s diverse passengers and customers, showcasing how graph data structures enhance the accuracy and relevance of AI-generated search results, mitigating the risk of “hallucinations” and improving the overall customer journey.
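One common shape for this, sketched below under an assumed airport schema (the Restaurant and Terminal nodes, credentials, and query are hypothetical, not CAG's actual model), is to answer from facts retrieved by a Cypher query rather than from the model's parametric memory:

```python
# Sketch of graph-grounded retrieval: facts come from the graph, not the
# LLM's memory, which is one way to curb hallucinated search results.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "secret"))

def grounded_facts(cuisine: str) -> list[str]:
    query = (
        "MATCH (r:Restaurant {cuisine: $cuisine})-[:LOCATED_IN]->(t:Terminal) "
        "RETURN r.name AS name, t.name AS terminal"
    )
    with driver.session() as session:
        return [f"{rec['name']} ({rec['terminal']})"
                for rec in session.run(query, cuisine=cuisine)]

# The retrieved facts would then be placed into the LLM prompt as context.
print(grounded_facts("Japanese"))
```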
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor... (Neo4j)
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
A tale of scale & speed: How the US Navy is enabling software delivery from l... (sonjaschweigert1)
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver the security artifacts that matter for ATOs (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
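As a minimal illustration of the last point, here is a sketch of an automated policy gate over a vulnerability report; the JSON shape is a simplified stand-in, not Anchore's actual report format:

```python
# Simplified policy gate: fail the CI stage if the report contains any
# finding at a blocking severity. Report format is a hypothetical stand-in.
import json, sys

BLOCKING = {"Critical", "High"}

def gate(report_path: str) -> int:
    with open(report_path) as f:
        findings = json.load(f)["findings"]
    blockers = [v for v in findings if v["severity"] in BLOCKING]
    for v in blockers:
        print(f"BLOCK: {v['id']} ({v['severity']}) in {v['package']}")
    return 1 if blockers else 0  # nonzero exit fails the pipeline stage

if __name__ == "__main__":
    sys.exit(gate(sys.argv[1]))
```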
Communications Mining Series - Zero to Hero - Session 1 (DianaGray10)
This session provides an introduction to UiPath Communications Mining, why it matters, and an overview of the platform. You will acquire a good understanding of the phases in Communications Mining as we walk through the platform together. Topics covered:
• Communication Mining Overview
• Why is it important?
• How it can help today's businesses, and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 (Albert Hoitingh)
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
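As a conceptual sketch only, not Microsoft's actual protocol, the core idea behind Double Key Encryption can be simulated by layering two symmetric keys, so that ciphertext is unreadable unless both key holders cooperate:

```python
# Conceptual illustration only (NOT Microsoft's implementation): data is
# protected under two keys held by different parties; neither party alone
# can recover the plaintext.
from cryptography.fernet import Fernet

customer_key = Fernet(Fernet.generate_key())   # held only by the customer
service_key = Fernet(Fernet.generate_key())    # held by the service provider

ciphertext = service_key.encrypt(customer_key.encrypt(b"sensitive document"))

# Decryption requires both keys, applied in reverse order.
plaintext = customer_key.decrypt(service_key.decrypt(ciphertext))
assert plaintext == b"sensitive document"
```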
Threats to mobile devices are increasingly prevalent and growing in scope and complexity. Users want to take full advantage of the features
available on their devices, but many of those features trade security for convenience and capability. This best practices guide outlines steps users can take to better protect their personal devices and information.
UiPath Test Automation using UiPath Test Suite series, part 6 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series, part 6. In this session, we will cover test automation with generative AI and OpenAI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into integrating generative AI with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics include the integration process, practical use cases, and the benefits of AI-driven automation for UiPath testing initiatives. Testers and automation professionals who attend will gain valuable insights into harnessing AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding of how this integration enhances test automation within the UiPath platform.
3. Practical demonstrations.
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath.
Topics covered:
What is generative AI?
Test automation with generative AI and OpenAI
UiPath integration with generative AI
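To ground the idea, here is an illustrative sketch (not UiPath's actual integration; the model name and prompt are assumptions) of the core pattern: asking an LLM to draft test cases from a requirement, for a tester to review before automating:

```python
# Illustrative only: an LLM drafts candidate test cases from a requirement.
# A tester reviews the output before turning it into automated tests.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

requirement = "The login form locks the account after 5 failed attempts."
response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model choice
    messages=[{
        "role": "user",
        "content": f"List 3 concise test cases for this requirement: {requirement}",
    }],
)
print(response.choices[0].message.content)
```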
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GridMate - End to end testing is a critical piece to ensure quality and avoid... (ThomasParaiso2)
End to end testing is a critical piece to ensure quality and avoid regressions. In this session, we share our journey building an E2E testing pipeline for GridMate components (LWC and Aura) using Cypress, JSForce, FakerJS…
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024 (Neo4j)
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Enhancing adoption of Open Source Libraries. A case study on Albumentations.AI (Vladimir Iglovikov, Ph.D.)
Presented by Vladimir Iglovikov:
- https://www.linkedin.com/in/iglovikov/
- https://x.com/viglovikov
- https://www.instagram.com/ternaus/
This presentation delves into the journey of Albumentations.ai, a highly successful open-source library for data augmentation.
Created out of a necessity for superior performance in Kaggle competitions, Albumentations has grown to become a widely used tool among data scientists and machine learning practitioners.
This case study covers various aspects, including:
People: The contributors and community that have supported Albumentations.
Metrics: The success indicators such as downloads, daily active users, GitHub stars, and financial contributions.
Challenges: The hurdles in monetizing open-source projects and measuring user engagement.
Development Practices: Best practices for creating, maintaining, and scaling open-source libraries, including code hygiene, CI/CD, and fast iteration.
Community Building: Strategies for making adoption easy, iterating quickly, and fostering a vibrant, engaged community.
Marketing: Both online and offline marketing tactics, focusing on real, impactful interactions and collaborations.
Mental Health: Maintaining balance and not feeling pressured by user demands.
Key insights include the importance of automation, making the adoption process seamless, and leveraging offline interactions for marketing. The presentation also emphasizes the need for continuous small improvements and building a friendly, inclusive community that contributes to the project's growth.
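The "seamless adoption" point is easy to see in practice: a working augmentation pipeline takes only a few lines (a random array stands in for real image data below):

```python
# Minimal Albumentations quick-start: compose a pipeline, apply it to an image.
import albumentations as A
import numpy as np

transform = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
])

image = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
augmented = transform(image=image)["image"]
print(augmented.shape)  # (224, 224, 3)
```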
Vladimir Iglovikov brings his extensive experience as a Kaggle Grandmaster and ex-Staff ML Engineer at Lyft, sharing valuable lessons and practical advice for anyone looking to enhance the adoption of their open-source projects.
Explore more about Albumentations and join the community at:
GitHub: https://github.com/albumentations-team/albumentations
Website: https://albumentations.ai/
LinkedIn: https://www.linkedin.com/company/100504475
Twitter: https://x.com/albumentations
Climate Impact of Software Testing at Nordic Testing Days (Kari Kakkonen)
My slides at Nordic Testing Days 6.6.2024
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize our carbon footprint, but we can also have a carbon handprint: a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
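As a back-of-envelope sketch of those levers (all numbers are illustrative assumptions, not figures from the talk), the footprint of a CI test environment scales directly with run count, run length, power draw, and grid carbon intensity:

```python
# Illustrative estimate only: every factor here is an assumed value, and each
# one is a lever the talk mentions (fewer runs, smaller/on-demand
# environments, shorter runs).
RUNS_PER_DAY = 200
MINUTES_PER_RUN = 15
AVG_POWER_KW = 0.4            # assumed draw of the test environment
GRID_KG_CO2_PER_KWH = 0.35    # assumed grid carbon intensity

kwh_per_day = RUNS_PER_DAY * (MINUTES_PER_RUN / 60) * AVG_POWER_KW
kg_co2_per_year = kwh_per_day * GRID_KG_CO2_PER_KWH * 365
print(f"{kg_co2_per_year:.0f} kg CO2/year")  # ~2555 kg with these assumptions
```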