This document provides an overview of real world machine learning using Azure. It discusses the machine learning workflow including data understanding, preprocessing, feature engineering, model selection, evaluation and tuning. It then describes various Azure machine learning tools for building, testing and deploying machine learning models including Azure ML Workbench, Studio, Experimentation Service and Model Management Service. It concludes with an upcoming demo of predictive maintenance using Azure ML Studio.
Data mining Course
Chapter 2: Data preparation and processing
Introduction
Domain Expert
Goal identification and Data Understanding
Data Cleaning
Missing values
Noisy Data
Inconsistent Data
Data Integration
Data Transformation
Data Reduction
Feature Selection
Sampling
Discretization
We predict heart disease from 14 medical parameters using two data mining techniques: a decision tree (faster) and the k-nearest neighbours (KNN) algorithm (slower).
We also visualize the dataset. An output of 1 means a higher chance of a heart attack; an output of 0 means a lower chance.
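To make the KNN side of this comparison concrete, here is a minimal sketch in plain Python. It is not the deck's code: the function name `knn_predict` and the toy rows are hypothetical, and the rows use 2 already-scaled features rather than the 14 medical parameters. It also hints at why KNN tends to be slower at prediction time: every query is compared against all training rows.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training rows nearest to query."""
    # Every prediction scans the whole training set, which is why KNN
    # is typically slower at prediction time than a fitted decision tree.
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda row: math.dist(row[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy rows (2 scaled features each), NOT real patient data.
train_X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25),
           (0.8, 0.9), (0.9, 0.8), (0.85, 0.95)]
train_y = [0, 0, 0, 1, 1, 1]  # 1 = higher chance, 0 = lower chance

print(knn_predict(train_X, train_y, (0.82, 0.88)))  # prints 1
```

In practice the features would need to be scaled to comparable ranges first, since KNN relies on distances.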
Feature Engineering - Getting most out of data for predictive models (Gabriel Moreira)
How should data be preprocessed for use in machine learning algorithms? How can the most predictive attributes of a dataset be identified? What features can be generated to improve the accuracy of a model?
Feature Engineering is the process of extracting and selecting, from raw data, features that can be used effectively in predictive models. As the quality of the features greatly influences the quality of the results, knowing the main techniques and pitfalls will help you to succeed in the use of machine learning in your projects.
In this talk, we will present methods and techniques that allow us to extract the maximum potential of the features of a dataset, increasing flexibility, simplicity and accuracy of the models. We cover the analysis of the distribution of features and their correlations, and the transformation of numeric attributes (such as scaling, normalization, log-based transformation, binning), categorical attributes (such as one-hot encoding, feature hashing), temporal attributes (date / time), and free-text attributes (text vectorization, topic modeling).
Python, Scikit-learn, and Spark SQL examples will be presented, along with how to use domain knowledge and intuition to select and generate features relevant to predictive models.
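A few of the transformations named above can be sketched in plain Python. This is an illustrative sketch rather than the talk's own code (which uses Scikit-learn and Spark SQL); the helper names and the sample values are hypothetical.

```python
import bisect
import math

def min_max_scale(values):
    """Scale numeric values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def log_transform(values):
    """Compress a skewed, non-negative attribute with log(1 + v)."""
    return [math.log1p(v) for v in values]

def bin_values(values, edges):
    """Discretize: return the index of the bin each value falls into."""
    return [bisect.bisect_right(edges, v) for v in values]

def one_hot(categories):
    """Encode each category as a 0/1 vector over the sorted vocabulary."""
    vocab = sorted(set(categories))
    return [[1 if c == v else 0 for v in vocab] for c in categories]

print(min_max_scale([10, 20, 30]))              # [0.0, 0.5, 1.0]
print(bin_values([5, 17, 42, 70], [18, 40, 65]))  # [0, 0, 2, 3]
print(one_hot(["red", "blue", "red"]))          # [[0, 1], [1, 0], [0, 1]]
```

The scaling and one-hot statistics (min, max, vocabulary) would normally be computed on the training data only and then reused on new data, to avoid leaking information from the test set.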
A presentation on Bidirectional Encoder Representations from Transformers (BERT) meant to introduce the model's use cases and training mechanism. Best viewed with PowerPoint since it contains many slide animations.
Two hour lecture I gave at the Jyväskylä Summer School. The purpose of the talk is to give a quick non-technical overview of concepts and methodologies in data science. Topics include a wide overview of both pattern mining and machine learning.
See also Part 2 of the lecture: Industrial Data Science. You can find it in my profile (click the face)
This presentation on recurrent neural networks will help you understand what a neural network is, which neural networks are popular, why we need recurrent neural networks, what a recurrent neural network is, how an RNN works, what the vanishing and exploding gradient problems are, and what LSTM (long short-term memory) is; you will also see a use case implementation of LSTM. Neural networks used in deep learning consist of different layers connected to each other and are modelled on the structure and functions of the human brain. They learn from huge volumes of data and use complex algorithms to train a neural net. A recurrent neural network works on the principle of saving the output of a layer and feeding it back to the input in order to predict the output of the layer. Now let's dive into this presentation and understand what an RNN is and how it actually works.
The following topics are explained in this recurrent neural networks tutorial:
1. What is a neural network?
2. Popular neural networks?
3. Why recurrent neural network?
4. What is a recurrent neural network?
5. How does an RNN work?
6. Vanishing and exploding gradient problem
7. Long short term memory (LSTM)
8. Use case implementation of LSTM
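The feedback principle described above (a layer's output is saved and fed back in with the next input) can be sketched with a single scalar hidden state. This is a deliberately tiny illustration, not the tutorial's implementation; the weights `w_x`, `w_h` and bias `b` are hypothetical fixed values rather than trained parameters.

```python
import math

def rnn_forward(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Run a one-unit recurrent layer over a sequence of numbers.

    At each step the previous hidden state h is fed back in:
        h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)
    """
    h = 0.0            # initial hidden state
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states

# Only the first input is non-zero, yet later states stay non-zero:
# the network "remembers" the first input through the fed-back state.
states = rnn_forward([1.0, 0.0, 0.0])
print(states)
```

A real RNN uses weight matrices and vector-valued states, but the recurrence has the same shape.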
Simplilearn’s Deep Learning course will transform you into an expert in deep learning techniques using TensorFlow, the open-source software library designed to conduct machine learning & deep neural network research. With our deep learning course, you'll master deep learning and TensorFlow concepts, learn to implement algorithms, build artificial neural networks and traverse layers of data abstraction to understand the power of data, preparing you for your new role as a deep learning scientist.
Why Deep Learning?
TensorFlow is one of the most popular software platforms used for deep learning and contains powerful tools to help you build and implement artificial neural networks.
Advancements in deep learning are being seen in smartphone applications, creating efficiencies in the power grid, driving advancements in healthcare, improving agricultural yields, and helping us find solutions to climate change. With this TensorFlow course, you’ll build expertise in deep learning models, learn to operate TensorFlow to manage neural networks and interpret the results.
And according to payscale.com, the median salary for engineers with deep learning skills tops $120,000 per year.
You can gain in-depth knowledge of Deep Learning by taking our Deep Learning certification training course. With Simplilearn’s Deep Learning course, you will prepare for a career as a Deep Learning engineer as you master concepts and techniques including supervised and unsupervised learning, mathematical and heuristic aspects, and hands-on modeling to develop algorithms. Those who complete the course will be able to:
Learn more at: https://www.simplilearn.com/
This presentation is about NoSQL, which stands for Not Only SQL. It covers the use of NoSQL for Big Data and its differences from RDBMS.
BERT: Bidirectional Encoder Representations from Transformers (Liangqun Lu)
BERT was developed by Google AI Language and released in October 2018. It achieved state-of-the-art performance on many NLP tasks, so if you are interested in NLP, studying BERT is a good way to go.
Presentation on recent data mining techniques and future directions of research, drawn from recent research papers. Made during the pre-master's program at Cairo University under the supervision of Dr. Rabie.
Data preprocessing techniques
See my Paris applied psychology conference paper here
https://www.slideshare.net/jasonrodrigues/paris-conference-on-applied-psychology
or
https://prezi.com/view/KBP8JnekVH9LkLOiKY3w/
My presentation at The Richmond Data Science Community (Jan 2018). The slides are slightly different than what I had presented last year at The Data Intelligence Conference.
This Edureka Recurrent Neural Networks tutorial will help you understand why we need Recurrent Neural Networks (RNNs) and what exactly they are. It also explains a few issues with training a Recurrent Neural Network and how to overcome those challenges using LSTMs. The last section includes a use case of an LSTM predicting the next word using a sample short story.
Below are the topics covered in this tutorial:
1. Why Not Feedforward Networks?
2. What Are Recurrent Neural Networks?
3. Training A Recurrent Neural Network
4. Issues With Recurrent Neural Networks - Vanishing And Exploding Gradient
5. Long Short-Term Memory Networks (LSTMs)
6. LSTM Use-Case
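The vanishing and exploding gradient issue listed above can be illustrated with simple arithmetic. The sketch below assumes a strongly simplified model in which the gradient is multiplied by the same constant factor at every time step during backpropagation through time; real networks multiply by step-dependent Jacobians, so this only shows the trend. The function name `gradient_scale` is hypothetical.

```python
def gradient_scale(factor, steps):
    """Multiply a unit gradient by `factor` once per time step."""
    scale = 1.0
    for _ in range(steps):
        scale *= factor
    return scale

# Factor below 1: the gradient shrinks towards zero (vanishing).
print(gradient_scale(0.5, 20))   # about 9.5e-07
# Factor above 1: the gradient grows rapidly (exploding).
print(gradient_scale(1.5, 20))   # about 3325
```

LSTMs address the vanishing case by adding a gated cell state through which gradients can flow with less repeated shrinking.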
If there is one crucial thing in building ML models, it is data preparation: the process of transforming raw data into a state where machine learning algorithms can be run to disclose insights and make predictions. Data preparation involves analysis and depends on the nature of the problem and the particular algorithms. Because knowledge and experience are involved, it cannot simply be automated, which makes the data scientist the key to success.
ML is trendy and Microsoft already has more than 10 services to support ML. We will focus on tools like Azure ML Workbench and Python for data preparation, review some common tricks for approaching data, and experiment in Azure ML Studio.
The Data Science Process - Do we need it and how to apply? (Ivo Andreev)
Machine learning is not black magic but a discipline that involves statistics, data science, analysis and hard work. From searching for patterns and preparing data, through applying and optimizing algorithms, to obtaining usable predictions, one needs background and appropriate tools.
But do we need it when AI-as-a-service solutions are already available? Do we need to try hard with artificial neural networks? And if we decide to do so, what tools would be a safe bet?
In this session we will go through real-world examples, mention key tools from Microsoft and the open source world for data science and machine learning and, most importantly, provide a workflow and some best practices.
The Power of Auto ML and How Does it Work (Ivo Andreev)
Automated ML is an approach that minimizes the need for data science effort by enabling domain experts to build ML models without deep knowledge of algorithms, mathematics or programming. End users simply provide data and the system automatically does the rest, determining the approach for the particular ML task. At first this may sound discouraging to those aiming for the “sexiest job of the 21st century”, the data scientist. However, Auto ML should be considered a democratization of ML rather than automatic data science.
In this session we will talk about how Auto ML works, how it is implemented by Microsoft and how it can improve the productivity of even professional data scientists.
Azure Machine Learning and ML on Premises (Ivo Andreev)
Machine Learning finds patterns in large volumes of data and uses those patterns to perform predictive analysis. Microsoft offers Azure Machine Learning, while Amazon offers Amazon Machine Learning and Google offers the Google Prediction API, now deprecated and replaced by Google ML Engine based on TensorFlow. Software products such as MATLAB support traditional, non-cloud-based ML modeling.
Machine learning for IoT - unpacking the blackbox (Ivo Andreev)
Have you ever considered Machine Learning a black box? It sounds like a kind of magic. Although it is one among many solutions available, Azure ML has proved to strike a great balance between flexibility, usability and affordable price. But how does Azure ML compare with the other ML providers? How do you choose the appropriate algorithm? Do you understand the key performance indicators and how to improve the quality of your models? The session is about understanding the black box and using it for IoT workloads and beyond.
IoT with Azure Machine Learning and InfluxDB (Ivo Andreev)
Devices from the IoT realm generate data at a rate and magnitude that make it practically impossible to retrieve valuable information without the support of adequate AI engines. Although it is one among many solutions available, Azure ML has proved to strike a great balance between flexibility, usability and affordable price.
Storing and serving billions of data measurements over time is also a non-trivial task addressed by the special class of Time Series DBs. Out of these, InfluxDB has the largest popularity, provides comprehensive documentation and above all - is available open source.
This session is about managing and understanding IoT data.
Machine Learning Foundations for Professional Managers (Albert Y. C. Chen)
20180804@Taiwan AI Academy, Hsinchu
A 6-hour lecture for those new to machine learning, to grasp the concepts, advantages and limitations of various classical machine learning methods and, more importantly, to learn the skills to break down large, complicated AI projects into manageable pieces, where features and functionality can be added incrementally and annotated data accumulated. Take-home message: machine learning is always a delicate balance between model complexity M and number of data points N, so that the trained classifier generalizes well and does not overfit.
This presentation gives an overview of analytics and machine learning. It also covers Microsoft's contribution to the machine learning space, including Azure ML Studio, a SaaS-based portal for creating, experimenting with and sharing machine learning solutions with the external world.
Sample Codes: https://github.com/davegautam/dotnetconfsamplecodes
Presentation on how you can get started with ML.NET. If you are an existing .NET stack developer and want to use the same technology for machine learning, these slides focus on how you can use ML.NET for machine learning.
Constrained Optimization with Genetic Algorithms and Project Bonsai (Ivo Andreev)
“Traditional machine learning requires volumes of labelled data that can be time consuming and expensive to produce.”
“Machine teaching leverages the human capability to decompose and explain concepts to train machine learning models.”
The direction is different: the correct answer is taught not by showing the data for it, but by having a person show the answer.
Project Bonsai is a low-code platform for intelligent solutions, but with a different perspective on data it allows a completely new approach to tasks, especially when the physical world is involved. Under the hood it combines machine teaching, calibration and optimization to create intelligent control systems using simulations. The teaching curriculum is written in a new language concept, “Inkling”, and training a model is easy and interactive.
Keynote presentation from ECBS conference. The talk is about how to use machine learning and AI in improving software engineering. Experiences from our project in Software Center (www.software-center.se).
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks... (Rodney Joyce)
Number 2 in the Data Science for Dummies series: we'll predict Titanic survival with Databricks, Python and Spark ML.
These are the slides only (excuse the PowerPoint animation issues). Check out the actual tech talk on YouTube: https://rodneyjoyce.home.blog/2019/05/03/data-science-for-dummies-machine-learning-with-databricks-python-sparkml-tech-talk-1-of-7/
If you have not used Databricks before check out the first talk - Databricks for Dummies.
Here's the rest of the series: https://rodneyjoyce.home.blog/tag/data-science-for-dummies/
1) Data Science overview with Databricks
2) Titanic survival prediction with Azure Machine Learning Studio + Kaggle
3) Data Engineering with Titanic dataset + Databricks + Python
4) Titanic with Databricks + Spark ML
5) Titanic with Databricks + Azure Machine Learning Service
6) Titanic with Databricks + MLS + AutoML
7) Titanic with Databricks + MLFlow
8) Titanic with .NET Core + ML.NET
9) Deployment, DevOps/MLOps and Productionisation
By popular demand, here is a case study of my first Kaggle competition from about a year ago. Hope you find it useful. Thank you again to my fantastic team.
Machine Learning 2 deep Learning: An Intro (Si Krishan)
Provides a brief introduction to machine learning, the reasons for its popularity, a simple walk-through example, and then the need for deep learning and some of its characteristics. This is an updated version of an earlier presentation.
Data Workflows for Machine Learning - Seattle DAML (Paco Nathan)
First public meetup at Twitter Seattle, for Seattle DAML:
http://www.meetup.com/Seattle-DAML/events/159043422/
We compare/contrast several open source frameworks which have emerged for Machine Learning workflows, including KNIME, IPython Notebook and related Py libraries, Cascading, Cascalog, Scalding, Summingbird, Spark/MLbase, MBrace on .NET, etc. The analysis develops several points for "best of breed" and what features would be great to see across the board for many frameworks... leading up to a "scorecard" to help evaluate different alternatives. We also review the PMML standard for migrating predictive models, e.g., from SAS to Hadoop.
Similar to The Machine Learning Workflow with Azure (20)
Cybersecurity and Generative AI - for Good and Bad vol.2 (Ivo Andreev)
The presentation is an extended, in-depth review of cybersecurity challenges with generative AI, enriched with multiple demos, analysis, responsible AI topics and mitigation steps, also covering a broader scope beyond the OpenAI service.
The popularity of, demand for and ease of access to modern generative AI technologies reveal new challenges in the cybersecurity landscape, ranging from protecting the confidentiality and integrity of data to misuse and abuse of the technology by malicious actors. In this session we elaborate on monitoring and auditing, managing ethical implications and resolving common problems like prompt injections, jailbreaks, use in cyberattacks and generation of insecure code.
Architecting AI Solutions in Azure for Business (Ivo Andreev)
The topic is Azure solution architectures that involve IoT and AI to solve common business domain problems. Using a near-real-time recommender system and object detection with image recognition, we review the architecture, build from the ground up and illustrate how typical realistic challenges can be addressed.
Cybersecurity Challenges with Generative AI - for Good and Bad (Ivo Andreev)
The presentation is an extended, in-depth review of cybersecurity challenges with generative AI, enriched with multiple demos, analysis, responsible AI topics and mitigation steps, also covering a broader scope beyond the OpenAI service.
The popularity of, demand for and ease of access to modern generative AI technologies reveal new challenges in the cybersecurity landscape, ranging from protecting the confidentiality and integrity of data to misuse and abuse of the technology by malicious actors. In this session we elaborate on monitoring and auditing, managing ethical implications and resolving common problems like prompt injections, jailbreaks, use in cyberattacks and generation of insecure code.
JS-Experts - Cybersecurity for Generative AI (Ivo Andreev)
The popularity of, demand for and ease of access to modern generative AI technologies reveal new challenges in the cybersecurity landscape, ranging from protecting the confidentiality and integrity of data to misuse and abuse of the technology by malicious actors. In this session we elaborate on monitoring and auditing, managing ethical implications and resolving common problems like prompt injections, jailbreaks, use in cyberattacks and generation of insecure code.
This is a totally different perspective of LLMs
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers (Ivo Andreev)
Have you ever wondered why GPT models work? Do you ask questions like:
◉ How does GPT work? Why does the same problem receive different answers for different users? Is there a way to improve explainability?
◉ Can a GPT model provide its sources? Why does Bing chat work differently? How can I get better performance and improve completions?
◉ How can I work with data in my enterprise? What practical business cases could a generative AI model help solve?
If you are tired of sessions just scratching the surface of OpenAI GPT, this one will go deeper and answer questions like why, why not and how.
Key Terms; ChatGPT Enterprise; Top Questions; Enterprise Data; Azure Search; Functions; Embeddings; Context Encoding; General Intelligence; Emerging Abilities; Chain of Thought; Plugins; Multimodal with DALL-E; Project Florence
OpenAI GPT in Depth - Questions and Misconceptions (Ivo Andreev)
OpenAI GPT in depth – misconceptions and questions you would like answered
Have you ever wondered why GPT models work? Do you ask questions like:
How does GPT work? Why does the same problem receive different answers for different users? Is there a way to improve explainability? Can a GPT model provide its sources? Why does Bing chat work differently? How can I get better performance and improve completions? How can I work with data in my enterprise? What practical business cases could a generative AI model help solve?
If you are tired of sessions just scratching the surface of OpenAI GPT, this one will go deeper and answer questions like why, why not and how.
Cutting Edge Computer Vision for Everyone (Ivo Andreev)
Microsoft offers a wide range of tools and advanced solutions to support you in managing computer vision-related tasks.
From purely coding approaches with ML.NET, through zero-code ComputerVision.ai, to the advanced and flexible AI service in Azure ML, there is a solution for every need and each type of person.
From running on premises, through managed infrastructure, to completely cloud-based services, the speed of getting to the desired results and the return on investment are guaranteed.
Join this session to get insights about the options, deployment, pricing, pros and cons compared and select the most appropriate tech for your business case.
Collecting and Analysing Spaceborn Data (Ivo Andreev)
Communicating with space and analysing satellite data
Azure has reached beyond the clouds and brings space-borne satellite data to your subscription for analysis and discovering insights.
Satellite as a service, Azure Orbital and a whole new ecosystem signal the ambition to push the limits and explore new opportunities.
In this session we talk about geospatial AI-based analysis and a comprehensive flow that will allow you to touch a vector of increasing importance for extending the cloud and helping businesses make tactical decisions.
Collecting and Analysing Satellite Data with Azure Orbital (Ivo Andreev)
Azure has reached beyond the clouds and brings space-borne satellite data to your subscription for analysis and discovering insights.
Satellite as a service, Azure Orbital and a whole new ecosystem signal the ambition to push the limits and explore new opportunities.
In this session we talk about geospatial AI-based analysis and a comprehensive flow that will allow you to touch a vector of increasing importance for extending the cloud and helping businesses make tactical decisions.
Azure Orbital is a fully managed, cloud-based ground station as a service that enables you to communicate with your spacecraft or satellites and generate products for customers.
Azure Orbital handles machine-to-machine communication for the user based on the schedule and TLE location of satellites.
Azure software modules decrypt satellite data and prepare it for use.
Since Nov 2021, Azure Cognitive Service for Language has a fresh tool: Language Studio, which is now in Preview. The studio offers multiple prebuilt and preconfigured models which allow you to quickly implement, test and deploy tasks like understanding conversational language, extracting information, classifying text or answering questions. But it goes further and offers multiple features to create, train and deploy custom models that model your data and serve your needs best. Language Studio does that by utilizing workflows that let developers build models without the need for ML knowledge and deploy the results as handy APIs.
Cosmos DB is among the top databases, with its strengths being in a flexible, extremely scalable hosted model, high SLA, low latency, globally distributed, automatic indexing, 2-dimensional redundancy and granular access level. But how does it suit IoT scenarios and for what scenarios is it appropriate?
Forecasting time series powerful and simple (Ivo Andreev)
A time series is a sequence of data points positioned in order of time. Time series forecasting has two main purposes: to understand the mechanisms that lead to rises or falls, and to predict future values. Very often it analyses trends, cyclical events and seasonality, and it has unique importance in economics and business. Because of temporal dependencies on previous data points, the quality of predictions can be evaluated only in the future, and there are many model types for approximation. In this session we are going to talk about challenges, ways of improvement and a technology stack including ML.NET, ARIMA, Python, Azure ML, regression and FB Prophet.
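As a small illustration of seasonality and of evaluating forecasts on later, held-out points, here is a seasonal naive baseline in plain Python. It is a swapped-in baseline for illustration, not one of the tools listed in the session (ARIMA, FB Prophet, ML.NET); the function names and the toy series are hypothetical.

```python
def seasonal_naive_forecast(history, season_length, horizon):
    """Forecast by repeating the last observed season forward."""
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical series: a season of length 4 plus a slow upward trend.
series = [10, 20, 30, 40, 12, 22, 32, 42, 14, 24, 34, 44]

# Split in time order: the forecast is judged only on later points.
train, test = series[:8], series[8:]

forecast = seasonal_naive_forecast(train, season_length=4, horizon=4)
print(forecast)                             # [12, 22, 32, 42]
print(mean_absolute_error(test, forecast))  # 2.0
```

The baseline captures the seasonal shape but misses the trend, which is why the error is a constant 2 per point; a more capable model would be expected to beat this number.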
Azure security guidelines for developers Ivo Andreev
Azure security baselines and benchmarks, Security Maturity Model, Industrial Internet Consortium IIC , Certification, Web Application Firewall, API Management Service
Autonomous Machines with Project BonsaiIvo Andreev
Autonomous machines rely on fusion of many technologies to sense, plan, optimize and act as if an intelligent superhuman is in control.
Project Bonsai is a machine teaching service that combines machine learning (ML), calibration and optimization to create intelligent control systems using simulations. The teaching curriculum is performed using a proprietary “Inkling” language close to JavaScript and training a model is easy and interactive. Join this session for a Bonsai jump start and a demo and try it yourself – it is free.
Global azure virtual 2021 - Azure LighthouseIvo Andreev
Azure Lighthouse provides capabilities to perform cross-tenant management at scale.
We do this by providing you the ability to view and manage multiple customers from a single context.
Building a scalable business model in the cloud is a real challenge that is of uncomparable complexity compared to project-based solutions.
If you want to offer a solution in the cloud and onboard multiple customers, the next step would be to consider how would you deploy, maintain and monitor such environment. What is Azure Lighthouse and how to make your first steps following good practices is the response to that question and the main topic of our session.
Flux QL - Nexgen Management of Time Series Inspired by JSIvo Andreev
The time series landscape evolves fast to meet the aggressive challenges in IoT. Influx 2.0 Beta was released in the first days of 2020 and although being already Top 1 time series database it introduces a revolutionary change again. InfluxDB 2 is now generally available and its key features are originate from Flux - a functional and open source 4th generation analytical programming language inspired by JavaScript. Supported in VS Code it takes a new approach towards data exploration of time series data and enables some unmatched capabilities like enrichment and filtering of time series data with external data from RDBMS.
Azure architecture design patterns - proven solutions to common challengesIvo Andreev
Building a reliable, scalable, secure applications could happen either following verified design patterns or the hard way - following the trial and error approach. Azure architecture patterns are a tested and accepted solutions of common challenges thus reducing the technical risk to the project by not having to employ a new and untested design. However, most of the patterns are relevant to any distributed system, whether hosted on Azure or on other cloud platforms.
Industrial IoT from the Ground up with Azure and Open Source
IIoT leverages the power of machines and realtime analytics to pick up on industrial inefficiencies and problems sooner, and save time and money in addition to supporting BI efforts. In a myriad of reference architectures it is up to experience and trial-error to find out what really works in a real life scenario.
We will review the challenges and solutions in building an IIoT platform from the ground up on the edge between Azure and open source in order to have the best from both worlds. Technical focus will be on IoT Edge, TS Insights, Stream Analytics, IoT Hub, App Insights, Event Grid, Service Bus, ARM templates, Influx DB, Grafana and more - all neatly glued together by Azure Functions.
Flying a Drone with JavaScript and Computer VisionIvo Andreev
Almost anything that used to run on desktop, now runs in the browser and as of Atwood's law: anything that could be written in JavaScript, will eventually be written in JavaScript.
If you have dared imagining to control your toys with code, communicate with the cloud and use advanced computer intelligence, your dreams have now become close at hand.
This session is to challenge your fantasy and make you think what you could do with JavaScript. This session is about programming drones with JavaScript and AI capabilities.
For business users, always using AI is about easy access to the tools without writing any code. This session is not about learning how to do AI but how to make AI usable and add value.
AI powered visuals such as Key Influencer in Power BI desktop to analyse the data without deep knoledge of the machine learning concepts.
Machine Learning is approaching a peak of inflated expectations, although we see AI daily and in all contexts. Media pressure is high, governments are overly optimistic, plenty of ventures are putting money in unviable ideas or some brilliant engineers fail to reach business users.
But Microsoft bring all of this under the same roof and unleash the power of AI by integrating Power BI ecosystem with Azure ML and Cognitive services. The result is as simple and effective as great technology at end-user's hand.
Large Language Models and the End of ProgrammingMatt Welsh
Talk by Matt Welsh at Craft Conference 2024 on the impact that Large Language Models will have on the future of software development. In this talk, I discuss the ways in which LLMs will impact the software industry, from replacing human software developers with AI, to replacing conventional software with models that perform reasoning, computation, and problem-solving.
Software Engineering, Software Consulting, Tech Lead.
Spring Boot, Spring Cloud, Spring Core, Spring JDBC, Spring Security,
Spring Transaction, Spring MVC,
Log4j, REST/SOAP WEB-SERVICES.
Gamify Your Mind; The Secret Sauce to Delivering Success, Continuously Improv...Shahin Sheidaei
Games are powerful teaching tools, fostering hands-on engagement and fun. But they require careful consideration to succeed. Join me to explore factors in running and selecting games, ensuring they serve as effective teaching tools. Learn to maintain focus on learning objectives while playing, and how to measure the ROI of gaming in education. Discover strategies for pitching gaming to leadership. This session offers insights, tips, and examples for coaches, team leads, and enterprise leaders seeking to teach from simple to complex concepts.
A Comprehensive Look at Generative AI in Retail App Testing.pdfkalichargn70th171
Traditional software testing methods are being challenged in retail, where customer expectations and technological advancements continually shape the landscape. Enter generative AI—a transformative subset of artificial intelligence technologies poised to revolutionize software testing.
Cyaniclab : Software Development Agency Portfolio.pdfCyanic lab
CyanicLab, an offshore custom software development company based in Sweden,India, Finland, is your go-to partner for startup development and innovative web design solutions. Our expert team specializes in crafting cutting-edge software tailored to meet the unique needs of startups and established enterprises alike. From conceptualization to execution, we offer comprehensive services including web and mobile app development, UI/UX design, and ongoing software maintenance. Ready to elevate your business? Contact CyanicLab today and let us propel your vision to success with our top-notch IT solutions.
Listen to the keynote address and hear about the latest developments from Rachana Ananthakrishnan and Ian Foster who review the updates to the Globus Platform and Service, and the relevance of Globus to the scientific community as an automation platform to accelerate scientific discovery.
Enhancing Research Orchestration Capabilities at ORNL.pdfGlobus
Cross-facility research orchestration comes with ever-changing constraints regarding the availability and suitability of various compute and data resources. In short, a flexible data and processing fabric is needed to enable the dynamic redirection of data and compute tasks throughout the lifecycle of an experiment. In this talk, we illustrate how we easily leveraged Globus services to instrument the ACE research testbed at the Oak Ridge Leadership Computing Facility with flexible data and task orchestration capabilities.
How to Position Your Globus Data Portal for Success Ten Good PracticesGlobus
Science gateways allow science and engineering communities to access shared data, software, computing services, and instruments. Science gateways have gained a lot of traction in the last twenty years, as evidenced by projects such as the Science Gateways Community Institute (SGCI) and the Center of Excellence on Science Gateways (SGX3) in the US, The Australian Research Data Commons (ARDC) and its platforms in Australia, and the projects around Virtual Research Environments in Europe. A few mature frameworks have evolved with their different strengths and foci and have been taken up by a larger community such as the Globus Data Portal, Hubzero, Tapis, and Galaxy. However, even when gateways are built on successful frameworks, they continue to face the challenges of ongoing maintenance costs and how to meet the ever-expanding needs of the community they serve with enhanced features. It is not uncommon that gateways with compelling use cases are nonetheless unable to get past the prototype phase and become a full production service, or if they do, they don't survive more than a couple of years. While there is no guaranteed pathway to success, it seems likely that for any gateway there is a need for a strong community and/or solid funding streams to create and sustain its success. With over twenty years of examples to draw from, this presentation goes into detail for ten factors common to successful and enduring gateways that effectively serve as best practices for any new or developing gateway.
How Recreation Management Software Can Streamline Your Operations.pptxwottaspaceseo
Recreation management software streamlines operations by automating key tasks such as scheduling, registration, and payment processing, reducing manual workload and errors. It provides centralized management of facilities, classes, and events, ensuring efficient resource allocation and facility usage. The software offers user-friendly online portals for easy access to bookings and program information, enhancing customer experience. Real-time reporting and data analytics deliver insights into attendance and preferences, aiding in strategic decision-making. Additionally, effective communication tools keep participants and staff informed with timely updates. Overall, recreation management software enhances efficiency, improves service delivery, and boosts customer satisfaction.
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?XfilesPro
Worried about document security while sharing them in Salesforce? Fret no more! Here are the top-notch security standards XfilesPro upholds to ensure strong security for your Salesforce documents while sharing with internal or external people.
To learn more, read the blog: https://www.xfilespro.com/how-does-xfilespro-make-document-sharing-secure-and-seamless-in-salesforce/
Developing Distributed High-performance Computing Capabilities of an Open Sci...Globus
COVID-19 had an unprecedented impact on scientific collaboration. The pandemic and its broad response from the scientific community has forged new relationships among public health practitioners, mathematical modelers, and scientific computing specialists, while revealing critical gaps in exploiting advanced computing systems to support urgent decision making. Informed by our team’s work in applying high-performance computing in support of public health decision makers during the COVID-19 pandemic, we present how Globus technologies are enabling the development of an open science platform for robust epidemic analysis, with the goal of collaborative, secure, distributed, on-demand, and fast time-to-solution analyses to support public health.
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Globus
Large Language Models (LLMs) are currently the center of attention in the tech world, particularly for their potential to advance research. In this presentation, we'll explore a straightforward and effective method for quickly initiating inference runs on supercomputers using the vLLM tool with Globus Compute, specifically on the Polaris system at ALCF. We'll begin by briefly discussing the popularity and applications of LLMs in various fields. Following this, we will introduce the vLLM tool, and explain how it integrates with Globus Compute to efficiently manage LLM operations on Polaris. Attendees will learn the practical aspects of setting up and remotely triggering LLMs from local machines, focusing on ease of use and efficiency. This talk is ideal for researchers and practitioners looking to leverage the power of LLMs in their work, offering a clear guide to harnessing supercomputing resources for quick and effective LLM inference.
Strategies for Successful Data Migration Tools.pptxvarshanayak241
Data migration is a complex but essential task for organizations aiming to modernize their IT infrastructure and leverage new technologies. By understanding common challenges and implementing these strategies, businesses can achieve a successful migration with minimal disruption. Data Migration Tool like Ask On Data play a pivotal role in this journey, offering features that streamline the process, ensure data integrity, and maintain security. With the right approach and tools, organizations can turn the challenge of data migration into an opportunity for growth and innovation.
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Globus
The U.S. Geological Survey (USGS) has made substantial investments in meeting evolving scientific, technical, and policy driven demands on storing, managing, and delivering data. As these demands continue to grow in complexity and scale, the USGS must continue to explore innovative solutions to improve its management, curation, sharing, delivering, and preservation approaches for large-scale research data. Supporting these needs, the USGS has partnered with the University of Chicago-Globus to research and develop advanced repository components and workflows leveraging its current investment in Globus. The primary outcome of this partnership includes the development of a prototype enterprise repository, driven by USGS Data Release requirements, through exploration and implementation of the entire suite of the Globus platform offerings, including Globus Flow, Globus Auth, Globus Transfer, and Globus Search. This presentation will provide insights into this research partnership, introduce the unique requirements and challenges being addressed and provide relevant project progress.
Unleash Unlimited Potential with One-Time Purchase
BoxLang is more than just a language; it's a community. By choosing a Visionary License, you're not just investing in your success, you're actively contributing to the ongoing development and support of BoxLang.
Code reviews are vital for ensuring good code quality. They serve as one of our last lines of defense against bugs and subpar code reaching production.
Yet, they often turn into annoying tasks riddled with frustration, hostility, unclear feedback and lack of standards. How can we improve this crucial process?
In this session we will cover:
- The Art of Effective Code Reviews
- Streamlining the Review Process
- Elevating Reviews with Automated Tools
By the end of this presentation, you'll have the knowledge on how to organize and improve your code review proces
Accelerate Enterprise Software Engineering with PlatformlessWSO2
Key takeaways:
Challenges of building platforms and the benefits of platformless.
Key principles of platformless, including API-first, cloud-native middleware, platform engineering, and developer experience.
How Choreo enables the platformless experience.
How key concepts like application architecture, domain-driven design, zero trust, and cell-based architecture are inherently a part of Choreo.
Demo of an end-to-end app built and deployed on Choreo.
Into the Box Keynote Day 2: Unveiling amazing updates and announcements for modern CFML developers! Get ready for exciting releases and updates on Ortus tools and products. Stay tuned for cutting-edge innovations designed to boost your productivity.
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisGlobus
JASMIN is the UK’s high-performance data analysis platform for environmental science, operated by STFC on behalf of the UK Natural Environment Research Council (NERC). In addition to its role in hosting the CEDA Archive (NERC’s long-term repository for climate, atmospheric science & Earth observation data in the UK), JASMIN provides a collaborative platform to a community of around 2,000 scientists in the UK and beyond, providing nearly 400 environmental science projects with working space, compute resources and tools to facilitate their work. High-performance data transfer into and out of JASMIN has always been a key feature, with many scientists bringing model outputs from supercomputers elsewhere in the UK, to analyse against observational or other model data in the CEDA Archive. A growing number of JASMIN users are now realising the benefits of using the Globus service to provide reliable and efficient data movement and other tasks in this and other contexts. Further use cases involve long-distance (intercontinental) transfers to and from JASMIN, and collecting results from a mobile atmospheric radar system, pushing data to JASMIN via a lightweight Globus deployment. We provide details of how Globus fits into our current infrastructure, our experience of the recent migration to GCSv5.4, and of our interest in developing use of the wider ecosystem of Globus services for the benefit of our user community.
1. April 21
Real World Machine Learning in Azure
The Machine Learning Workflow
Step by Step and in Azure
2. About me
• Project Manager @
o 16+ years professional experience
• Microsoft Azure MVP
• External Expert Horizon 2020
• External Expert Eurostars, InnoFund DK
• Business Interests
o Web Development, SOA, Integration
o IoT, Machine Learning
o Security & Performance Optimization
• Contact
o ivelin.andreev@icb.bg
o www.linkedin.com/in/ivelin
o www.slideshare.net/ivoandreev
4. Programming vs Machine Learning
• How does classic programming work?
o Developer is the intelligence
o Array of statements:
• Does a bird fly?
• Yes!... Unless: dead, injured, flightless, missing a wing
o Problems arise at scale; rules and exceptions are endless
o System does not adapt
• ML model is …
o A system that answers questions correctly (most of the time)
o Created via training process
o Learns from data and finds patterns
• Use Cases
o Classification, Regression, Recommendation, Anomaly detection
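The contrast between hand-written rules and a trained model can be sketched in a few lines of Python. This is only an illustration: the bird attributes, the species names, the function names and the tiny training set are all hypothetical, and the "model" is a simple majority lookup rather than a real learning algorithm.

```python
from collections import Counter, defaultdict

def can_fly_rules(bird):
    """Classic programming: the developer encodes every rule and exception."""
    if bird["dead"] or bird["injured"]:
        return False
    if bird["species"] in {"penguin", "ostrich"}:  # known flightless species
        return False
    return True  # any exception the developer forgot ends up here

def train_majority_model(examples):
    """'Training': remember the most common label seen for each species."""
    votes = defaultdict(Counter)
    for bird, flies in examples:
        votes[bird["species"]][flies] += 1
    return {species: c.most_common(1)[0][0] for species, c in votes.items()}

def predict(model, bird, default=True):
    """Look up the learned answer; fall back to a default for unseen species."""
    return model.get(bird["species"], default)

# Hypothetical labelled examples: (bird, does it fly?)
examples = [
    ({"species": "sparrow", "dead": False, "injured": False}, True),
    ({"species": "sparrow", "dead": False, "injured": False}, True),
    ({"species": "penguin", "dead": False, "injured": False}, False),
    ({"species": "kiwi", "dead": False, "injured": False}, False),
]
model = train_majority_model(examples)

kiwi = {"species": "kiwi", "dead": False, "injured": False}
rules_answer = can_fly_rules(kiwi)      # the rule set has no kiwi exception
learned_answer = predict(model, kiwi)   # the pattern came from the data
```

The rule-based function stays wrong about the kiwi until a developer adds another exception, while the trained lookup adapts as soon as the data contains the case.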
5. Machine Learning Challenges
• Asking the right questions
• Requires training data
o Real-world data is messy (wrong or missing data)
o Feature engineering transforms raw data into predictive features (e.g. DNA)
o Feature extraction (e.g. IP address -> population density)
o Feature selection for informative features
• Overfitting model
o “Kicks ass” while training, fails badly on real predictions
• Model validation
o “Sense” how well your model will work on new data
6. The purpose of ML modelling is:
• Generate predictions
• Understand true relations
7. Model Types
• Parametric Methods
o Step 1: Select a form for the function (e.g. f(X) = a*X + b)
o Step 2: Learn the coefficients from the training data
o Pros: Simple, Fast, Less training data
o e.g. Linear Regression: y = β0 + β1*Credit_Line + β2*Education_Level + β3*Age
• Nonparametric Methods
o No fixed functional form
o Pros: Flexible, No assumptions, Predictive power
o Cons: Overfitting, Slower, More training data
o e.g. Decision Tree
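The two steps of a parametric method can be sketched as follows: the form f(X) = a*X + b is fixed up front, and the coefficients are then learned from training data with ordinary least squares. The function name and the sample data are made up for illustration.

```python
def fit_line(xs, ys):
    """Learn a and b of f(X) = a*X + b by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x  # the fitted line passes through the mean point
    return a, b

# Hypothetical training data, roughly following y = 2x + 1 with some noise
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
a, b = fit_line(xs, ys)

def predict(x):
    """Apply the learned function to a new input."""
    return a * x + b
```

Only two numbers are learned here, which is why parametric methods get by with little training data but cannot capture relations outside the chosen form.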
8. ML vs. Statistical Modelling
• Statistical Models
o Require understanding how data were collected
o Aggregate data into numbers to understand structure
o Easily interpretable on lower dimensional datasets
• Data Science
o Bridges the gap
o Find patterns in data and come up with initial insights
• ML Models
o Make data speak instead of following an initial hypothesis
o Customizable to fit business domain
o Scale to handle thousands of features
9. Do you know which is the “sexiest” job of the 21st century?
11. The Truth about Data Science
• Appealing
o 64% believe they are working in this century’s most appealing job
• In demand
o 90% are contacted at least once a month with a job offer
o 50% weekly, 30% several times a week; 35% have <2 years of experience
• The dark side…
o All models are wrong, some are useful
o 80% of the time is data preparation
o Real-life, not academic problems
o Non-linear process
o No full automation
• No one cares how you do it
• Presentation is the key
12. MASTERING THE TOOLS
That does not turn you into a watchmaker
There is still process and experience
14. Data Understanding
• Mosaic plot
o Categorical distribution
o Visualizes the relation between X and Y
o Strong relation = Y-splits are far apart
• Box plot
o Continuous distribution
o Distribution of numeric variable
o Identify and discard outliers (IQR)
• Scatter plot
o How much a variable determines another
https://www.kaggle.com/saisivasriram/titanic-feature-understanding-from-plots
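The box plot rule for outliers (IQR) can be sketched with the Python standard library. The multiplier k = 1.5 is the conventional box plot whisker length and is an assumption here, as are the function names and the sample values.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return the (low, high) fences used by a box plot to flag outliers."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1  # interquartile range: spread of the middle 50% of the data
    return q1 - k * iqr, q3 + k * iqr

def split_outliers(values, k=1.5):
    """Separate values inside the fences from those outside."""
    low, high = iqr_bounds(values, k)
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers

# Hypothetical numeric variable with one obviously wrong reading
readings = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
kept, outliers = split_outliers(readings)
```

Whether a flagged value should really be discarded is a domain decision; the fences only point at candidates.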
15. Data Preprocessing
• Make features usable
o Numerical
o Categorical (e.g. weekday)
o PCA dimensionality reduction (clustering, low covariance)
o Dummy variables
• Handle missing data
• Normalize data
o Standard range of numerical scale (e.g. from [-1000;1000] -> [0;1] or [-1;1])
o The value range influences the importance of a feature compared to the others
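Two of the preprocessing steps above, normalization to a standard range and dummy variables for a categorical feature, can be sketched in plain Python. The function names and sample values are illustrative assumptions, not part of any Azure tool.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale numbers linearly so they span [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: nothing to scale
        return [new_min for _ in values]
    span = new_max - new_min
    return [new_min + (v - lo) / (hi - lo) * span for v in values]

def dummy_variables(values):
    """Turn a categorical column into one 0/1 column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

raw = [-1000, 0, 500, 1000]
unit_range = min_max_scale(raw)               # [-1000;1000] -> [0;1]
symmetric = min_max_scale(raw, -1.0, 1.0)     # [-1000;1000] -> [-1;1]

categories, dummies = dummy_variables(["Mon", "Tue", "Mon", "Sun"])
```

After scaling, a feature measured in thousands no longer dominates one measured in fractions just because of its units.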
16. Feature Engineering
Def: Using transformations on raw data to create new features that are more closely related to the target variable
• Create features more closely related to the target variable
o e.g. defaulting customer: debt-to-balance ratio = debt / balance
• Bring in external data sources (e.g. Google Places from IP address)
• Create features that are easily interpreted (e.g. date to day & month)
• Make use of unstructured data sources (e.g. text, video)
• Create features, experiment, choose those with the best predictive power
Note: Domain knowledge is important (e.g. the 7th is a pension day)
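A small sketch of the feature ideas on this slide: a ratio feature, a date split into interpretable parts, and a domain-knowledge flag. The record layout and field names are hypothetical; only the ratio formula and the pension-day rule come from the slide.

```python
from datetime import date

def engineer_features(customer):
    """Derive model-friendly features from one raw customer record."""
    day = customer["payment_date"]
    balance = customer["balance"]
    return {
        # Ratio more closely related to default risk than debt or balance alone
        "debt_to_balance": customer["debt"] / balance if balance else None,
        # Date split into easily interpreted parts
        "day": day.day,
        "month": day.month,
        "weekday": day.weekday(),  # Monday is 0, Sunday is 6
        # Domain knowledge: the 7th is a pension day
        "is_pension_day": day.day == 7,
    }

# Hypothetical raw record
customer = {"debt": 1500.0, "balance": 6000.0, "payment_date": date(2018, 4, 7)}
features = engineer_features(customer)
```

A zero balance yields `None` here instead of a division error; how to treat such rows belongs to the missing-data handling step.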
17. Digital Media Feature Engineering
Note: All information is encoded in the digital media
• Images
o Step 1: Colour statistics, EXIF metadata, edges, shapes
o Step 2: Extract knowledge into a fixed set of numeric characteristics
• Text
o Step 1
• Bagging (bag of words), N-grams, term frequency, topic modelling, stemming
• Named entity recognition (e.g. Wikipedia)
o Step 2: Extract knowledge into a fixed set of numeric characteristics
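For text, the two steps can be sketched as N-gram term frequencies (step 1) turned into fixed-length numeric vectors over a shared vocabulary (step 2). This is a minimal standard-library sketch; the tokenizer is deliberately naive and the sample documents are invented.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def ngrams(tokens, n):
    """Join every run of n consecutive tokens into one term."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def vectorize(docs, n=1):
    """Return the vocabulary and one term-frequency vector per document."""
    counts = [Counter(ngrams(tokenize(doc), n)) for doc in docs]
    vocab = sorted(set().union(*counts))
    vectors = [[c[term] for term in vocab] for c in counts]
    return vocab, vectors

docs = ["The pump failed", "the pump is running", "Pump failed again"]
vocab, vectors = vectorize(docs)            # single words
bi_vocab, bi_vectors = vectorize(docs, 2)   # pairs of consecutive words
```

Every document ends up as a vector of the same length, which is the fixed set of numeric characteristics a model can consume.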
18. Modelling Starts by Selecting Algorithm
• There are other ML tools
• There are many more algorithms
• You could make custom implementations
19. Performance Evaluation
Basic evaluation workflow
• Pick performance metric based on algorithm type
• Tweak data and model until target performance is reached
CAUTION: Common problems
• Using the same data for validation and training
o Split data - hold out 20-40% of data for validation
o K-fold cross-validation - repeated splits that average out the noise of a single split
• Overfitting and model optimism
o Do not get tempted to model noise (bias-variance tradeoff)
o Do not use features from the future to predict values in the past (temporal leakage)
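K-fold cross-validation can be sketched as an index generator: the rows are shuffled once, cut into k folds, and each fold serves as validation data exactly once while the remaining folds form the training set. The function name, the seed and the fold count are illustrative choices.

```python
import random

def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)       # fixed seed: reproducible split
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        valid = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, valid

splits = list(k_fold_indices(10, k=5))
```

The model is trained and scored once per split and the scores are averaged. For time-ordered data a random shuffle like this would leak future values into training, so a chronological split is the safer choice there.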
20. Performance Metrics
• Regression model
o Root Mean Squared Error (RMSE)
o Coefficient of Determination, R² (in [0;1] for any model at least as good as predicting the mean)
• Classification model
o Confusion matrix
• Binary classification model
o Accuracy based on correct answers
o Area under ROC curve (AUC)
• Threshold
• Precision = TP / (TP + FP)
• Recall = TP / (TP + FN)
o PR-curve is better for imbalanced distribution
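The metrics listed on this slide can be computed directly from their definitions. The sketch below uses only the standard library; the function names and the sample labels are made up for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root Mean Squared Error of a regression model."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual variation / total variation."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def confusion_matrix(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def classification_metrics(y_true, y_pred):
    """Accuracy, precision and recall derived from the confusion matrix."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical regression and binary classification results
reg_rmse = rmse([3, 5, 7, 9], [2, 5, 8, 9])
reg_r2 = r2([3, 5, 7, 9], [2, 5, 8, 9])
metrics = classification_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 0, 1, 0, 1, 0, 0, 1])
```

Precision and recall depend on the chosen threshold; sweeping the threshold and recomputing them is what traces out the PR curve.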
21. Tuning Model Parameters
• Model parameters control inner behaviour
o More sophisticated algorithm, more parameters
• e.g. Locally Deep SVM with kernel
o Kernel type, kernel coefficient
• How does parameter tuning work?
1. Choose metric for evaluation (AUC - classification, R² - regression, etc.)
2. Select parameters for optimization
3. Define a grid as Cartesian product between arrays
4. For each combination, cross-validate on training set
5. Select the parameters for the best evaluation
Note: Expected improvement is 3%-8%
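The five tuning steps can be sketched as a small grid search. To keep the example self-contained it tunes a toy 1-D nearest-neighbour classifier rather than an SVM; the model, its two parameters (`k`, `weighted`), the data and the fold count are all assumptions made for illustration, and accuracy stands in for the evaluation metric.

```python
from itertools import product

def knn_predict(train, x, k, weighted):
    """Predict a 0/1 label for x from the k nearest 1-D training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = {0: 0.0, 1: 0.0}
    for xi, yi in nearest:
        votes[yi] += 1.0 / (abs(xi - x) + 1e-9) if weighted else 1.0
    return max(votes, key=votes.get)

def cv_accuracy(data, params, folds=4):
    """Step 4: cross-validate one parameter combination on the training set."""
    scores = []
    for f in range(folds):
        valid = data[f::folds]
        train = [p for i, p in enumerate(data) if i % folds != f]
        hits = sum(knn_predict(train, x, **params) == y for x, y in valid)
        scores.append(hits / len(valid))
    return sum(scores) / folds

def grid_search(data, grid):
    """Steps 3 and 5: walk the Cartesian product and keep the best combination."""
    names = list(grid)
    best_params, best_score = None, -1.0
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = cv_accuracy(data, params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical training set of (feature, label) pairs
data = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0), (2.4, 1), (2.6, 0),
        (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1), (5.0, 1), (0.2, 0)]
grid = {"k": [1, 3, 5], "weighted": [False, True]}  # step 2: parameters to tune
best_params, best_score = grid_search(data, grid)
```

The grid grows multiplicatively with every added parameter (here 3 x 2 = 6 combinations), which is why large grids are often replaced by a random sample of combinations.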
22. Feature Selection - select the most predictive features
• ML handles thousands of parameters, but not all parameters are equal
• Adding features: a common approach to increase accuracy
• Poor performance: correlated features could lead to poor model performance
• Overfitting: learning relations in more detail may lead to overfitting
23. Selecting Good Features
• Motivation
o Sometimes the ML goal is not to predict but identify predictive features
o Computational costs are related to number of features
• Approach
o Trying all combinations of features? (That would be infeasible)
o Algorithms with built-in feature selection (e.g. decision trees)
• Algorithms
o Iterative Forward selection & Backward elimination
o Permutation feature importance
• High importance features are more sensitive to random shuffling of values
o Filter based feature selection
!!!Some features may have more predictive power when paired!!!
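Permutation feature importance can be sketched as follows: shuffle one feature column at a time and measure how much the model's accuracy drops. The model here is a hypothetical fixed rule that only looks at the first feature, and the data, repeat count and seed are illustrative assumptions.

```python
import random

def accuracy(model, rows, labels):
    """Share of rows for which the model predicts the right label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def permutation_importance(model, rows, labels, n_repeats=5, seed=0):
    """Mean accuracy drop per feature after randomly shuffling its values."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[col] for r in rows]
            rng.shuffle(column)  # break the link between this feature and the label
            shuffled = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

def model(row):
    """Hypothetical trained model: uses feature 0 and ignores feature 1."""
    return 1 if row[0] > 0.5 else 0

rows = [[i / 20, (i * 7) % 20 / 20] for i in range(20)]
labels = [1 if r[0] > 0.5 else 0 for r in rows]
importances = permutation_importance(model, rows, labels)
```

Shuffling the ignored feature changes nothing, so its importance is zero, while shuffling the feature the model relies on costs accuracy. Because each column is shuffled alone, features that are only predictive as a pair can be underrated by this measure.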
25. Azure Machine Learning
• Azure Machine Learning is an integrated, end-to-end data science and advanced analytics solution
• Highlights
o Built on open source technologies (Jupyter Notebook, Spark, Python, Docker)
o Execute experiments in isolated environments
o GPU-enabled VMs
• ML related services and tools
o Azure ML Workbench
o Azure ML Experimentation Service
o Azure ML Model Management Service
o Azure ML Studio
o Data Science VM
o Libraries for Apache Spark (MMLSpark)
o Visual Studio Code Tools for AI
o Cognitive Toolkit (CNTK)
o Microsoft Cognitive Services
o ML Services for SQL Server (R, Python)
26. Azure ML Workbench
• Desktop application (Windows, macOS)
• Built-in Jupyter Notebook services and Git integration
• End-to-end process support
o Powerful inspectors for data analysis
o Data transformations by example
o Model development and experimentation (Python)
o Model history and deployment (local, Docker)
27. Azure ML Studio
• Visual workspace to build, test and deploy ML solutions
• Highlights
o Cross-browser drag and drop, no programming
o Rich set of modules
o Fits beginners and advanced users
o Unlimited extensibility (R Script, Python Script)
o Enterprise grade cloud service (SLA 99.95%)
o ML REST web services consumption
o Jupyter Notebook
o Azure AI Gallery (8000+ samples)
• At what price?
o Free plan available
o €8.5 per seat + €0.85 per experiment/hour
o Recommended: €85/month (100K requests)
28. Azure Data Science VM
• Pre-configured cloud environment for AI & Data Science
• Highlights
o Preconfigured, fully operational environment
o 50+ tools DEV, ML, BigData, Data management
o Windows and Linux (Ubuntu/CentOS)
o Updated every few months
o On-demand elastic capacity
o GPU optimized VMs for deep learning
o Up to 4x NV K80 or V100 GPUs
o Up to 128 cores, 3.8TiB RAM
• At what price?
o From €10 to €28’620 per month
29. Azure ML Experimentation Service
• Handle execution of ML experiments in a virtual environment for isolated, consistent and reproducible results (since 09.2017)
o Local native
o Docker (Local and Remote)
o Azure Spark cluster
• Supports Workbench, records and presents run history
• Scalable model consumption
https://docs.microsoft.com/en-us/azure/machine-learning/preview/experimentation-service-configuration
30. Azure ML Model Management Service
• Provide deployment, hosting, versioning and management of models in Azure, on-prem and IoT Edge
• Deployment
o Model manifest for Docker image
https://docs.microsoft.com/en-us/azure/machine-learning/preview/deployment-setup-configuration
• Consumption
o Models exposed on REST API
o Sample code (Java, C#, Python)
• Scalability
o Scale-out to 100x replicas/cluster
o 10 requests/replica (default)
o Autoscaling based on load
• Retraining
o APIs to retrain models and update model version
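Consuming a model exposed on a REST API usually comes down to an authenticated JSON POST. The sketch below only builds such a request with the Python standard library and does not send it. The URL, the API key placeholder, the bearer-token header and the `{"data": ...}` payload shape are all hypothetical; the real values and schema come from the deployed service's own documentation.

```python
import json
import urllib.request

def build_scoring_request(url, api_key, rows):
    """Build (but do not send) a JSON POST request for a scoring endpoint."""
    body = json.dumps({"data": rows}).encode("utf-8")  # assumed payload shape
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )

# Placeholder endpoint and key, for illustration only
request = build_scoring_request(
    "https://example.invalid/score", "<api-key>", [[0.25, 7, 4]]
)

# Sending it to a real service would look like:
# with urllib.request.urlopen(request) as response:
#     result = json.load(response)
```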
31. Takeaways
• ML in the Microsoft World
o https://docs.microsoft.com/en-us/azure/machine-learning/
• Python for AI
o https://wiki.python.org/moin/PythonForArtificialIntelligence
• Data Science Blog
o https://data-flair.training/blogs/category/machine-learning/
• Starter Books
32. DEMO
• Data Analysis (Azure ML Workbench)
• Data Preparation (Azure ML Workbench)
• Predictive Maintenance (Azure ML Studio)
33. Upcoming events
SQLSaturday #711 Plovdiv
02 June 2018
www.sqlsaturday.com/711/
SQLSaturday #763 Sofia
13 Oct 2018
www.sqlsaturday.com/763/