Cloud technology is a crucial factor in machine learning. This presentation was first delivered at the March Colombo Cloud Meetup.
It covers Azure Machine Learning, GPU processing, and more.
Azure Machine Learning is a fully managed cloud service that enables you to easily build, deploy, and share predictive analytics solutions. Here's a jump start for Azure ML.
Using Azure Machine Learning to Detect Patterns in Data from Devices - BizTalk360
This session covers how to use Microsoft Azure Machine Learning with devices to detect data patterns. It begins with an introduction to machine learning and the algorithms used to detect data patterns: nearest neighbor, probabilistic learning, decision trees, and neural networks. It also covers data that comes from devices such as the Kinect for Windows, with basic demos of data coming from the device. The session then drills down into how to incorporate Azure Machine Learning features into an application to detect data patterns in real time.
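To make the first of the listed algorithms concrete, here is a minimal nearest-neighbor sketch in plain Python. It is not taken from the session and does not use any Azure Machine Learning API; the `knn_predict` helper, the two-dimensional readings, and the gesture labels are hypothetical stand-ins for device data.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # `train` is a list of (feature_vector, label) pairs.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical device readings: (x, y) hand position mapped to a gesture label.
readings = [
    ((0.10, 0.20), "rest"),
    ((0.20, 0.10), "rest"),
    ((0.15, 0.25), "rest"),
    ((0.90, 0.80), "wave"),
    ((0.80, 0.90), "wave"),
    ((0.85, 0.95), "wave"),
]

print(knn_predict(readings, (0.82, 0.88)))
```

A new reading is labeled by the classes of the closest stored readings, so the model is simply the training data plus a distance function.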
Keynote: Artificial Intelligence Methods for Time Series Forecasting and Classification of Real-Time IoT Sensor Data Streams, Romeo Kienzler, Chief Data Scientist - IBM Watson IoT WW, IBM Academy of Technology
Best Practices for Engineering Production-Ready Software with Apache Spark - Databricks
Notebooks are a great tool for Big Data. They have drastically changed the way scientists and engineers develop and share ideas. However, most world-class Spark products cannot be easily engineered, tested and deployed just by modifying or combining notebooks. Taking a prototype to production with high quality typically involves proper software engineering.
10 Reasons Why Your SAP Applications Belong on NetApp - NetApp
NetApp has been supporting SAP for 20 years, delivering advanced solutions for SAP applications. Here are 10 reasons why your SAP applications belong on NetApp!
Big Data Advanced Analytics on Microsoft Azure 201904 - Mark Tabladillo
This talk summarizes key points for big data advanced analytics on Microsoft Azure. First, there is a review of the major technologies. Second, there is a series of technology demos (focusing on VMs, Databricks and Azure ML Service). Third, there is some advice on using the Team Data Science Process to help plan projects. The deck includes recommended web resources. This presentation was delivered at the Global Azure Bootcamp 2019, Atlanta GA location (Alpharetta Avalon).
Deploying machine learning pipelines robustly at scale is one of the biggest challenges within an organization. Kubeflow is an open-source platform for distributed training, tuning, and serving models on Kubernetes. As a comprehensive solution for deploying and managing end-to-end data science and machine learning pipelines, Kubeflow is rapidly accelerating analytics innovation and adoption. John provides an overview of Kubeflow and how he has been using it in the wild.
Bootstrapping of PySpark Models for Factorial A/B Tests - Databricks
A/B testing, i.e., measuring the impact of proposed variants of e.g. e-commerce websites, is fundamental for increasing conversion rates and other key business metrics.
We have developed a solution that makes it possible to run dozens of simultaneous A/B tests, obtain conclusive results sooner, and get results that are more interpretable than statistical significance alone, such as the probability of the change having a positive effect, how much revenue is risked, etc.
To compute those metrics, we need to estimate the posterior distributions of the metrics, which are computed using Generalized Linear Models (GLMs). Since we process gigabytes of data, we use a PySpark implementation, which however does not provide standard errors of coefficients. We, therefore, use bootstrapping to estimate the distributions.
In this talk, I’ll describe how we’ve implemented parallelization of an already parallelized GLM computation to be able to scale this computation horizontally over a large cluster in Databricks and describe various tweaks and how they’ve improved the performance.
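As a rough illustration of the bootstrapping idea described above (and not the speakers' PySpark implementation), the sketch below resamples a small synthetic A/B dataset with replacement, refits a model on each resample, and uses the spread of the refitted coefficient as a standard-error estimate. The data, the names `fit_slope` and `bootstrap_slopes`, and the use of a one-predictor least-squares fit in place of a GLM are all assumptions made for this example.

```python
import random
import statistics


def fit_slope(xs, ys):
    """Least-squares slope of y on x for a single predictor."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slopes(xs, ys, n_resamples=500, seed=0):
    """Refit the slope on rows resampled with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # skip a degenerate resample containing only one variant
        slopes.append(fit_slope(bx, by))
    return slopes


# Synthetic data (an assumption): x = 0 for control, 1 for variant;
# y = revenue per visit with a true uplift of 2.
data_rng = random.Random(42)
xs = [i % 2 for i in range(200)]
ys = [10 + 2 * x + data_rng.gauss(0, 1) for x in xs]

slopes = bootstrap_slopes(xs, ys)
std_error = statistics.stdev(slopes)
prob_positive = sum(s > 0 for s in slopes) / len(slopes)
print(f"uplift estimate {fit_slope(xs, ys):.3f}, bootstrap std error {std_error:.3f}")
print(f"share of resamples with positive uplift: {prob_positive:.3f}")
```

The share of resamples with a positive coefficient is one simple way to read "probability of the change having a positive effect" from the bootstrap distribution. Each resample is independent of the others, which is what makes this step easy to parallelize.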
AWS Sydney Meetup Presentation by Binqi Zhang of PolarSeven - http://polarseven.com
The Internet of Things is expanding at a rapid pace, with more and more connected devices being added all the time.
The world technology landscape is ever changing and evolving, as we can track and measure more data than ever before with high-availability connectivity.
Using AWS IoT, which is now available in Australia, Binqi looks at some of the ways the new technology can be applied.
Best practices with Microsoft Graph: Making your applications more performant... - Microsoft Tech Community
Learn how to take advantage of APIs, platform capabilities and intelligence from Microsoft Graph to make your app more performant, more resilient and more reliable
Changing the Way Viacom Looks at Video Performance with Mark Cohen and Michae... - Databricks
Video is everything at Viacom. They build their own video players on iOS, Android and web platforms, and they have to know how those players are performing so they track critical metrics in near real-time with Apache Kafka, Spark and the Databricks platform.
In this session, Viacom will share how a quick proof of concept turned into a system that is giving them real insights into their video player performance. They will also discuss investigating platforms like Druid for fast slicing and dicing of data for business-oriented users.
Key takeaways: as engineers, we should work to drive value through technology, even if we work for a company that may not be tech first; the data you collect can be a distraction, so create focus; and different users require different interfaces into the same data, so the session covers how Viacom made that happen.
Managing your ML lifecycle with Azure Databricks and Azure ML - Parashar Shah
Machine learning development brings complexities beyond those of software development. A myriad of tools and frameworks make it hard to track experiments, reproduce results and deploy machine learning models. Learn how you can accelerate and manage your end-to-end machine learning lifecycle on Azure Databricks using MLflow and Azure ML to reliably build, share and deploy machine learning applications. This is based on our talk at //build - https://www.youtube.com/watch?v=pe_OH07wAYc and https://mybuild.techcommunity.microsoft.com/sessions/76976
Machine Learning Real Life Applications By Examples - Mario Cartia - Data Driven Innovation
This talk presents three real-world use cases in which major web platforms (Google, Facebook, Amazon, Twitter, PayPal) use machine learning to implement particular features. For each example, the algorithm used is explained, showing how to build the same functionality with Apache Spark MLlib and the Scala language.
From a Developer's POV: is Machine Learning Reshaping the World? - Simone Sca... - Codemotion
There is no denying that machine learning is rapidly reshaping the technological horizon, fueled by increasing availability of data, computing power, and software (e.g., TensorFlow). Classical ML techniques are becoming a common tool for the everyday programmer, at the same time that sophisticated deep learning models are fueling driverless cars, advanced AI players, and more. This talk will survey the ways in which ML is impacting the programming world, as we try to answer the following questions: are we truly witnessing a new AI resurgence? If yes, what should any developer be aware of?
Material for Azure Machine Learning tutorial lecture, held within Data Mining course of MoS in Engineering in Computer Science at Università degli Studi di Roma "La Sapienza" (A.Y. 2016/2017).
Lecturers:
Fabio Rosato - rosato.1565173@studenti.uniroma1.it
Giacomo Lanciano - lanciano.1487019@studenti.uniroma1.it
Francisco Ferreres Garcia - matakukos@gmail.com
Leonardo Martini - martini.1722989@studenti.uniroma1.it
Simone Caldaro - caldaro.1324152@studenti.uniroma1.it
Na Zhu - nana.zhu@hotmail.com
Github repo: https://github.com/giacomolanciano/Azure-Machine-Learning-tutorial
Video tutorial: https://youtu.be/_zvPX6Kk7z8
This slide deck was used for conducting the "What's inside AI" sessions, an introduction to terms and buzzwords in the field of Artificial Intelligence.
The slidedeck of the session I did at Melbourne Azure Nights - September 2018.
The demo codebase and the database - https://github.com/haritha91/Cats-Dogs-Classifier---Keras
Slide deck of the session at Global Azure BootCamp 2018 - Colombo, Sri Lanka.
Here's the session video URL : https://youtu.be/fZK3-z1mK-I?list=PLWBsXKRfcKwtqlKb4525t_ioiF4GwXux7
Adjusting OpenMP PageRank : SHORT REPORT / NOTES - Subhajit Sahu
For massive graphs that fit in RAM, but not in GPU memory, it is possible to take
advantage of a shared memory system with multiple CPUs, each with multiple cores, to
accelerate pagerank computation. If the NUMA architecture of the system is properly taken
into account with good vertex partitioning, the speedup can be significant. To take steps in
this direction, experiments are conducted to implement pagerank in OpenMP using two
different approaches, uniform and hybrid. The uniform approach runs all primitives required
for pagerank in OpenMP mode (with multiple threads). On the other hand, the hybrid
approach runs certain primitives in sequential mode (i.e., sumAt, multiply).
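For readers unfamiliar with the computation being tuned here, the following is a minimal sequential PageRank sketch in plain Python. It is not the report's OpenMP code and does not reproduce its uniform or hybrid modes; the `pagerank` function, the damping factor of 0.85, the tolerance, and the three-vertex graph are assumptions chosen for illustration. The sketch also assumes every vertex has at least one outgoing edge.

```python
def pagerank(out_edges, damping=0.85, tol=1e-10, max_iter=500):
    """Power-iteration PageRank; assumes every vertex has an outgoing edge."""
    vertices = sorted(out_edges)
    n = len(vertices)
    ranks = {v: 1.0 / n for v in vertices}
    for _ in range(max_iter):
        # Every vertex starts each round with the teleport contribution.
        new_ranks = {v: (1.0 - damping) / n for v in vertices}
        # Each vertex splits its damped rank evenly among its out-neighbors.
        for u in vertices:
            share = damping * ranks[u] / len(out_edges[u])
            for v in out_edges[u]:
                new_ranks[v] += share
        # L1 change between successive rank vectors is the stopping criterion.
        error = sum(abs(new_ranks[v] - ranks[v]) for v in vertices)
        ranks = new_ranks
        if error < tol:
            break
    return ranks


# Hypothetical three-vertex graph with no dead ends.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
for vertex, rank in sorted(ranks.items()):
    print(vertex, round(rank, 4))
```

The per-vertex multiplication and the sum used for the error check are the kinds of steps that can each be run either sequentially or across multiple threads.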
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ... - Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Adjusting primitives for graph : SHORT REPORT / NOTES - Subhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... - pchutichetpong
M Capital Group (“MCG”) expects demand to grow and supply to keep evolving, facilitated by institutional investment rotating out of offices and into work from home (“WFH”), alongside the ever-expanding need for data storage as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next 4 years.
Whilst competitive headwinds remain, represented through the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, where MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment will be driving market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to grow over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
15. Why AzureML?
• Reduces Complexity.
• No coding! Seriously??
• Top class machine learning algorithms inbuilt.
• Power of cloud.
• Easy deployment with RESTful API.
• Easy collaboration.
• R & Python support
• Vowpal Wabbit
26. You heard a hell of a lot about AzureML that you couldn’t believe, and now it’s time for a DEMO
Caution : Unexpected things may occur during a demonstration
:)
Google traffic – A use of big data, data analysis, data visualization
Data science is multidisciplinary
Data Science acts as the middle core
Machine learning is a technique of data science that helps computers learn from existing data in order to forecast future behaviors, outcomes, and trends.
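A very small example of "learning from existing data to forecast" is fitting a straight line to past observations and extending it one step ahead. The sketch below is an illustration only, not Azure Machine Learning code; the `fit_line` helper and the monthly sales figures are made up for the example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical existing data: sales observed in months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]

slope, intercept = fit_line(months, sales)
forecast = slope * 6 + intercept  # predicted sales for month 6
print(f"forecast for month 6: {forecast:.1f}")
```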
Machine learning traditionally needed a huge amount of processing power and storage. Thus, businesses seeking to use so-called learning systems for tasks like predictive analytics had to spend heavily on hardware and software.
Cortana Intelligence is a powerful solution from Microsoft for transforming your data into intelligent action.
A fully managed cloud service that enables you to easily build, deploy, and share predictive analytics solutions.
Cross Industry Standard Process for Data Mining – CRISP-DM
The Azure Machine Learning process starts with defining the objective.