This document discusses Spark application development and the problems that commonly occur. It notes that Spark applications can suffer from failures, wrong results, poor performance, and scalability issues, as well as application, data, storage, and resource problems. It observes that application developers currently detect and fix these issues by looking at logs, but that logs are spread out, incomplete, and difficult to understand. It proposes a better approach: visualize all relevant data in one place, analyze the data to provide diagnoses and fixes, and help prevent problems and meet goals. It then lists some existing tools for Hadoop and Spark that provide visualization, optimization, and strategic capabilities.
Survive the Chaos - S4H151 - SAP TechED Barcelona 2017 - Lecture - Rainer Winkler
Software can easily become complex and difficult to handle. Join this session to learn techniques for managing and preventing this. You will see how test seams for ABAP simplify unit tests, even in legacy ABAP code with many dependencies. We will demonstrate an open-source tool that automatically generates dependency graphs and show how to use it in projects. We will also cover the main technique for working with legacy code: writing a characterization test and using it as a safety net while making changes.
This presentation was given at SAP TechED Barcelona 2017.
Triggers, more specifically DML triggers, are blocks of code that run automatically when the associated event occurs on a table. Some developers use them a lot. Many others say “Never use triggers!” What’s a DB dev to do?
In our September 3, 2019 PL/SQL Office Hours, Chris Saxon and Steven Feuerstein explore some of the nuances of triggers and have a BIG ARGUMENT over how and when they should be used. Well, OK, maybe not a BIG argument. But we’ll be happy to argue with anyone who shows up. Well, not ARGUE, exactly.
Guest appearances from Toon Koppelaars, of the Oracle Real World Performance Team, and Jacek Gebal of utPLSQL v3 fame.
Here are the slides.
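To make the description of DML triggers above concrete, here is a minimal sketch of a trigger that runs automatically when a table is updated. It uses SQLite through Python's standard `sqlite3` module rather than Oracle PL/SQL, purely so the example is self-contained. The table names, trigger name, and the `run_trigger_demo` function are hypothetical and do not come from the talk.

```python
import sqlite3


def run_trigger_demo():
    """Create a table with an AFTER UPDATE trigger and show that it fires."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
        CREATE TABLE audit_log (account_id INTEGER, old_balance INTEGER, new_balance INTEGER);

        -- The trigger body runs once per updated row, without the caller asking for it.
        CREATE TRIGGER log_balance_change
        AFTER UPDATE OF balance ON accounts
        FOR EACH ROW
        BEGIN
            INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
        END;
        """
    )
    # The INSERT does not fire the trigger; only the UPDATE does.
    conn.execute("INSERT INTO accounts (id, balance) VALUES (1, 100)")
    conn.execute("UPDATE accounts SET balance = 75 WHERE id = 1")
    rows = conn.execute(
        "SELECT account_id, old_balance, new_balance FROM audit_log"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(run_trigger_demo())
```

The application code only issues an `UPDATE`; the audit row appears as a side effect. That hidden behavior is what makes triggers both convenient and contentious.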
Machine Learning is often discussed in the context of data science, but little attention is given to the complexities of engineering production-ready ML systems. This talk will explore some of the important challenges and provide advice on solutions to these problems.
Delivered @ MusicCityCode 6/2/2017
Knowledge is power, but is it if you're not using it? What if the application you delivered to your customers was extremely intelligent? It could retrieve, analyze and use the massive amounts of data that businesses are generating at an astronomical rate.
It could analyze business deals, predict potential issues, proactively recommend business decisions and estimate profit, loss and risks.
Those things provide direct benefits to your company. Churning through that data by hand doesn't. Enter Azure Machine Learning.
In this session you will learn how to integrate Azure Machine Learning into your existing applications and workflows with REST services. You will learn how to deliver a modular, maintainable solution to your customers that allows them to analyze their data.
You will learn how to:
* Abstract business rules, workflows, AI (Machine Learning), and more into your applications in numerous ways
* Integrate Azure Machine Learning into your existing applications and processes
* Create Azure Machine Learning experiments
* Retrieve the score from an Azure Machine Learning experiment and integrate it into your applications and processes
* Integrate numerous Machine Learning experiments from the Azure Machine Learning Marketplace into your existing applications and processes
* Apply various concepts for abstracting and managing services and APIs
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro... - Yahoo Developer Network
Spark and SQL-on-Hadoop have made it easier than ever for enterprises to create or migrate apps to the big data stack. Thousands of apps are being generated every day in the form of ETL and modeling pipelines, business intelligence and data cubes, deep machine learning, graph analytics, and real-time data streaming. However, the task of reliably operationalizing these big data apps involves many pain points. Developers may not have the experience in distributed systems to tune apps for efficiency and performance. Diagnosing failures or unpredictable performance of apps can be a laborious process that involves multiple people. Apps may get stuck or steal resources and cause mission-critical apps to miss SLAs.
This talk will introduce the audience to these problems and their common causes. We will also demonstrate how to find and fix these problems quickly, as well as prevent such problems from happening in the first place.
Speakers:
Dr. Shivnath Babu is a Co-founder and CTO of Unravel and Associate Professor of Computer Science at Duke University. With more than a decade of experience researching the ease of use and manageability of data-intensive systems, he leads the Starfish project at Duke, which pioneered the automation of Hadoop application tuning, problem diagnosis, and resource management. Shivnath has more than 80 peer-reviewed publications to his credit and has received the U.S. National Science Foundation CAREER Award, the HP Labs Innovation Award, and three IBM Faculty Awards.
Delivered at Pittsburgh Tech Fest - 6/10/2017
Knowledge is power, but is it if you're not using it? What if the application you delivered to your customers was extremely intelligent? It could retrieve, analyze and use the massive amounts of data that businesses are generating at an astronomical rate.
It could analyze business deals, predict potential issues, proactively recommend business decisions and estimate profit, loss and risks.
Those things provide direct benefits to your company. Churning through that data by hand doesn't. Enter Azure Machine Learning.
In this session you will learn how to integrate Azure Machine Learning into your existing applications and workflows with REST services. You will learn how to deliver a modular, maintainable solution to your customers that allows them to analyze their data.
You will learn how to:
* Abstract business rules, workflows, AI (Machine Learning), and more into your applications in numerous ways
* Integrate Azure Machine Learning into your existing applications and processes
* Create Azure Machine Learning experiments
* Retrieve the score from an Azure Machine Learning experiment and integrate it into your applications and processes
* Integrate numerous Machine Learning experiments from the Azure Machine Learning Marketplace into your existing applications and processes
* Apply various concepts for abstracting and managing services and APIs
A demo on your own dataset (CSV, DICOM, images, etc.) for each service, showing how to apply data science in practice with various Azure machine learning services, and when each service should be used for which scenarios and datasets. The Azure services demonstrated include:
Azure T-SQL in-database analytics
Azure Batch service for multiple datasets + parallel model training
Azure Batch AI service for deep learning models with GPU acceleration
Azure Databricks for deep learning + OpenCV (computer vision tasks) + sklearn (normal machine learning models)
Azure Data Science Virtual Machine: a sandbox and shared environment for data science experiments
Using Machine Learning to Understand Kafka Runtime Behavior (Shivanath Babu, ... - confluent
Apache Kafka is now nearly ubiquitous in modern data pipelines and use cases. While the Kafka development model is elegantly simple, operating Kafka clusters in production environments is a challenge. It’s hard to troubleshoot misbehaving Kafka clusters, especially when there are potentially hundreds or thousands of topics, producers and consumers and billions of messages.
The root cause of why real-time applications lag may be an application problem (like poor data partitioning or load imbalance) or a Kafka problem (like resource exhaustion or suboptimal configuration). Therefore, getting the best performance, predictability, and reliability for Kafka-based applications can be difficult. In the end, the operation of your Kafka-powered analytics pipelines could itself benefit from machine learning (ML).
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ... - Big Data Spain
http://www.bigdataspain.org/2014/conference/state-of-play-data-science-on-hadoop-in-2015-keynote
Machine Learning is not new. Big Machine Learning is qualitatively different: More data beats algorithm improvement, scale trumps noise and sample size effects, can brute-force manual tasks.
Session presented at Big Data Spain 2014 Conference
18th Nov 2014
Kinépolis Madrid
http://www.bigdataspain.org
Event promoted by: http://www.paradigmatecnologico.com
Slides: https://speakerdeck.com/bigdataspain/state-of-play-data-science-on-hadoop-in-2015-by-sean-owen-at-big-data-spain-2014
Introduction to NetGuardians' Big Data Software Stack - Jérôme Kehrli
NetGuardians runs its Big Data Analytics Platform on three key Big Data components: ElasticSearch, Apache Mesos and Apache Spark. This is a presentation of the behaviour of this software stack.
2. Lance Co Ting Keh
Machine Learning @ Box
Distributed ML Infrastructure
Go Blue Devils!
Shivnath Babu
Associate Professor @ Duke
Chief Scientist at Unravel Data Systems
R&D in Management of Data Systems
15. What can go wrong?
• Failures
  • My query failed after 6 hours!
  • What does this exception mean?
• Wrong results
  • Result of my job looks wrong
• Bad performance
  • My app is very slow
  • Pipeline is not meeting the 4hr SLA
• Poor scalability
  • Oh, but it worked on the dev cluster!
• Bad App(le)s
  • Tom’s query brought the cluster down
And Why?
• Application Problems
  • Poor choice of transformations
  • Ineffective caching
  • Bloated data structures
• Data/Storage Problems
  • Skewed data, load imbalance
  • Small files, poor data partitioning
• Spark Problems
  • Shuffle
  • Lazy evaluation causes confusion
• Resource Problems
  • Resource contention
  • Performance degradation
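One of the causes listed on this slide, skewed data leading to load imbalance, can be illustrated without a cluster. The sketch below is plain Python, not Spark code. It assumes a simple key-modulo partitioner and made-up integer keys, and the helper names `partition_sizes` and `skew_ratio` are hypothetical. It only shows why a single hot key can leave one partition doing most of the work.

```python
def partition_sizes(keys, num_partitions):
    """Count the records that land in each partition under key-modulo partitioning."""
    sizes = [0] * num_partitions
    for key in keys:
        sizes[key % num_partitions] += 1
    return sizes


def skew_ratio(sizes):
    """Largest partition divided by the mean partition size (1.0 means balanced)."""
    mean = sum(sizes) / len(sizes)
    return max(sizes) / mean


# Hypothetical workloads: 1000 records each, spread over 4 partitions.
balanced_keys = list(range(1000))            # every key appears once
skewed_keys = [7] * 900 + list(range(100))   # one hot key dominates

if __name__ == "__main__":
    for name, keys in [("balanced", balanced_keys), ("skewed", skewed_keys)]:
        sizes = partition_sizes(keys, 4)
        print(name, sizes, round(skew_ratio(sizes), 2))
```

With the skewed keys, the partition holding the hot key receives 925 of the 1000 records, so the task that processes it would run far longer than the other three. This is the kind of imbalance behind "my app is very slow" symptoms.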