Material for the Azure Machine Learning tutorial lecture, held within the Data Mining course of the MoS in Engineering in Computer Science at Università degli Studi di Roma "La Sapienza" (A.Y. 2016/2017).
Lecturers:
Fabio Rosato - rosato.1565173@studenti.uniroma1.it
Giacomo Lanciano - lanciano.1487019@studenti.uniroma1.it
Francisco Ferreres Garcia - matakukos@gmail.com
Leonardo Martini - martini.1722989@studenti.uniroma1.it
Simone Caldaro - caldaro.1324152@studenti.uniroma1.it
Na Zhu - nana.zhu@hotmail.com
Github repo: https://github.com/giacomolanciano/Azure-Machine-Learning-tutorial
Video tutorial: https://youtu.be/_zvPX6Kk7z8
This presentation covers how data science connects to building effective machine learning solutions: how to build end-to-end solutions in Azure ML, and how to build, model, and evaluate algorithms in Azure ML.
In this talk, we will present an overview of Azure Machine Learning, a fully managed cloud service that enables you to easily build, deploy, and share predictive analytics solutions. We will start with the basics of machine learning and end with a demo that uses real world data.
Machine learning allows us to build predictive analytics solutions of tomorrow - these solutions allow us to better diagnose and treat patients, correctly recommend interesting books or movies, and even make the self-driving car a reality. Microsoft Azure Machine Learning (Azure ML) is a fully-managed Platform-as-a-Service (PaaS) for building these predictive analytics solutions. It is very easy to build solutions with it, helping to overcome the challenges most businesses have in deploying and using machine learning. In this presentation, we will take a look at how to create ML models with Azure ML Studio and deploy those models to production in minutes.
The Power of Auto ML and How Does it Work (Ivo Andreev)
Automated ML is an approach to minimizing the need for data science effort by enabling domain experts to build ML models without deep knowledge of algorithms, mathematics, or programming. The mechanism works by allowing end users to simply provide data, and the system automatically does the rest by determining the approach to perform a particular ML task. At first this may sound discouraging to those aiming for the “sexiest job of the 21st century” - the data scientists. However, Auto ML should be considered a democratization of ML, rather than automatic data science.
In this session we will talk about how Auto ML works, how it is implemented by Microsoft, and how it could improve the productivity of even professional data scientists.
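As a minimal illustration of the Auto ML idea (not Microsoft's implementation), the selection mechanism can be sketched as trying several candidate models on the user's data and keeping the one with the best validation score. The candidate models and the toy dataset below are invented for illustration:

```python
# Tiny "Auto ML" sketch: fit several candidate models, score each on a
# held-out validation set, and return the best one. The candidates here are
# deliberately trivial (majority-class baseline and 1-nearest-neighbor).

def majority_class(train_X, train_y):
    label = max(set(train_y), key=train_y.count)
    return lambda x: label

def one_nearest_neighbor(train_X, train_y):
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in train_X]
        return train_y[dists.index(min(dists))]
    return predict

def auto_select(train_X, train_y, val_X, val_y):
    candidates = {"majority": majority_class, "1-nn": one_nearest_neighbor}
    scores = {}
    for name, fit in candidates.items():
        model = fit(train_X, train_y)
        scores[name] = sum(model(x) == y for x, y in zip(val_X, val_y)) / len(val_y)
    best = max(scores, key=scores.get)
    return best, scores

# Two well-separated classes: 1-NN should beat the majority baseline.
train_X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
train_y = [0, 0, 1, 1]
val_X = [(0.05, 0.1), (0.95, 1.0)]
val_y = [0, 1]
best, scores = auto_select(train_X, train_y, val_X, val_y)
print(best, scores)
```

A real Auto ML system also searches over feature transformations and hyperparameters, but the select-by-validation-score loop is the same shape.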
For this plenary talk at the Charlotte AI Institute for Smarter Learning, Dr. Cori Faklaris introduces her fellow college educators to the exciting world of generative AI tools. She gives a high-level overview of the generative AI landscape and how these tools use machine learning algorithms to generate creative content such as music, art, and text. She then shares some examples of generative AI tools and demonstrates how she has used some of these tools to enhance teaching and learning in the classroom and to boost her productivity in other areas of academic life.
A complete no-code solution to Machine Learning using Azure ML Studio. The aim of this presentation is to discuss the capability of Azure ML Studio to enable any novice to perform ML experiments.
The Data Phoenix Events team invites everyone, on August 17 at 19:00, to the first webinar in "The A-Z of Data" series, dedicated to MLOps. In this introductory webinar, we will look at what MLOps is, its core principles and practices, the best tools, and possible architectures. We will start with a simple ML solution development lifecycle and finish with a complex, maximally automated cycle that MLOps allows us to implement.
https://dataphoenix.info/the-a-z-of-data/
https://dataphoenix.info/the-a-z-of-data-introduction-to-mlops/
Learn to Use Databricks for Data Science (Databricks)
Data scientists face numerous challenges throughout the data science workflow that hinder productivity. As organizations continue to become more data-driven, a collaborative environment is more critical than ever — one that provides easier access and visibility into the data, reports and dashboards built against the data, reproducibility, and insights uncovered within the data. Join us to hear how Databricks’ open and collaborative platform simplifies data science by enabling you to run all types of analytics workloads, from data preparation to exploratory analysis and predictive analytics, at scale — all on one unified platform.
Data Lakes are meant to support many of the same analytics capabilities of Data Warehouses while overcoming some of the core problems. Yet Data Lakes have a distinctly different technology base. This webinar will provide an overview of the standard architecture components of Data Lakes.
This will include:
The Lab and the factory
The base environment for batch analytics
Critical governance components
Additional components necessary for real-time analytics and ingesting streaming data
The session is about creating, training, evaluating, and deploying machine learning models with a no-code approach using Azure AutoML.
* NO MACHINE LEARNING EXPERIENCE REQUIRED *
Agenda:
1. Introduction to Machine Learning
2. What is AutoML (Automated Machine Learning)?
3. AutoML versus Conventional ML practices
4. Intro to Azure Automated Machine Learning
5. Hands-on demo
6. Contest
7. Learning resources
8. Conclusion
Introduction to Google BigQuery. Slides used at the first GDG Cloud meetup in Brussels, about big data on Google Cloud Platform. (http://www.meetup.com/GDG-Cloud-Belgium/events/228206131)
This session will introduce you to the features of Amazon SageMaker, including a one-click training environment, highly optimized machine learning algorithms with built-in model tuning, and deployment without engineering effort. With zero setup required, Amazon SageMaker significantly decreases your training time and overall cost of building production machine learning systems. You'll also hear how and why Intuit is using Amazon SageMaker on AWS for real-time fraud detection.
Feature Store as a Data Foundation for Machine Learning (Provectus)
Looking to design and build a centralized, scalable Feature Store for your Data Science & Machine Learning teams to take advantage of? Come and learn how from the experts at Provectus and Amazon Web Services (AWS)!
Feature Store is a key component of the ML stack and data infrastructure, which enables feature engineering and management. By having a Feature Store, organizations can save massive amounts of resources, innovate faster, and drive ML processes at scale. In this webinar, you will learn how to build a Feature Store with a data mesh pattern and see how to achieve consistency between real-time and training features, to improve reproducibility with time-traveling for data.
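A minimal sketch of the time-travel idea mentioned above, assuming a toy in-memory store (this is an illustration of the concept, not the Provectus/AWS architecture): serving reads the latest feature value, while training reads the value "as of" a past timestamp, which is what keeps offline training features consistent with what the online system would have seen.

```python
from bisect import bisect_right

class FeatureStore:
    """Toy feature store: timestamped feature values per (entity, feature)."""

    def __init__(self):
        self._data = {}  # (entity_id, feature) -> sorted list of (ts, value)

    def write(self, entity_id, feature, ts, value):
        self._data.setdefault((entity_id, feature), []).append((ts, value))
        self._data[(entity_id, feature)].sort()

    def get_online(self, entity_id, feature):
        # What a low-latency serving endpoint returns: the latest value.
        return self._data[(entity_id, feature)][-1][1]

    def get_as_of(self, entity_id, feature, ts):
        # Time travel for training sets: the value visible at time ts.
        history = self._data[(entity_id, feature)]
        idx = bisect_right(history, (ts, float("inf"))) - 1
        return history[idx][1] if idx >= 0 else None

store = FeatureStore()
store.write("user-1", "orders_30d", ts=10, value=3)
store.write("user-1", "orders_30d", ts=20, value=5)
print(store.get_online("user-1", "orders_30d"))    # latest value: 5
print(store.get_as_of("user-1", "orders_30d", 15)) # value as of t=15: 3
```

A production feature store adds an offline store for bulk historical reads and an online store for low-latency lookups, but both answer these same two queries.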
Agenda
- Modern Data Lakes & Modern ML Infrastructure
- Existing and Emerging Architectural Shifts
- Feature Store: Overview and Reference Architecture
- AWS Perspective on Feature Store
Intended Audience
Technology executives & decision makers, manager-level tech roles, data architects & analysts, data engineers & data scientists, ML practitioners & ML engineers, and developers
Presenters
- Stepan Pushkarev, Chief Technology Officer, Provectus
- Gandhi Raketla, Senior Solutions Architect, AWS
- German Osin, Senior Solutions Architect, Provectus
Feel free to share this presentation with your colleagues and don't hesitate to reach out to us at info@provectus.com if you have any questions!
REQUEST WEBINAR: https://provectus.com/webinar-feature-store-as-data-foundation-for-ml-nov-2020/
The 'macro view' on BigQuery:
We started with an overview and some typical uses, then moved on to project hierarchy, access control, and security.
In the end, we touched on tools and demos.
Recommendation is one of the most popular applications in machine learning (ML). In this workshop, we’ll show you how to build a movie recommendation model based on factorization machines — one of the built-in algorithms of Amazon SageMaker — and the popular MovieLens dataset.
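The scoring function of a factorization machine (the model family behind the SageMaker built-in algorithm mentioned above) can be sketched directly. The weights below are invented for illustration, not the output of a trained recommender:

```python
# Factorization machine prediction:
#   y(x) = w0 + sum_i w_i * x_i + sum_{i<j} <v_i, v_j> * x_i * x_j
# computed with the O(k*n) identity  sum_{i<j} = ((sum v*x)^2 - sum (v*x)^2) / 2.
def fm_predict(x, w0, w, V):
    """x: feature vector; w0: bias; w: linear weights; V: per-feature k-dim factors."""
    linear = sum(wi * xi for wi, xi in zip(w, x))
    k = len(V[0])
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(len(x)))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(len(x)))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise

# Sparse one-hot input, as in user/movie recommendation: user 0 and item 2 active.
x = [1.0, 0.0, 1.0]
w0, w = 0.1, [0.2, 0.0, 0.3]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]  # 2-dimensional latent factors
print(fm_predict(x, w0, w, V))  # 0.1 + 0.2 + 0.3 + <v0, v2> = 2.6
```

The pairwise term is why FMs work for recommendation: the user/item interaction weight is the dot product of learned latent factors, so it generalizes to user-item pairs never seen together in training.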
In this session, you'll get all the answers about how ChatGPT and other GPT-X models can be applied to your current or future project. First, we'll put in order all the terms – OpenAI, GPT-3, ChatGPT, Codex, Dall-E, etc. – and explain why Microsoft and Azure are often mentioned in this context. Then, we'll go through the main capabilities of Azure OpenAI and the respective use cases that might inspire you to either optimize your product or build a completely new one.
As data science workloads grow, so does their need for infrastructure. But, is it fair to ask data scientists to also become infrastructure experts? If not the data scientists, then, who is responsible for spinning up and managing data science infrastructure? This talk will address the context in which ML infrastructure is emerging, walk through two examples of ML infrastructure tools for launching hyperparameter optimization jobs, and end with some thoughts for building better tools in the future.
Originally given as a talk at the PyData Ann Arbor meetup (https://www.meetup.com/PyData-Ann-Arbor/events/260380989/)
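A toy version of the kind of hyperparameter optimization job such infrastructure tools launch: random search over a search space, scoring each trial and keeping the best. The objective function, parameter names, and ranges below are all stand-ins for a real training run:

```python
import random

def objective(lr, batch_size):
    # Stand-in "validation loss": minimized around lr=0.1, batch_size=32.
    return (lr - 0.1) ** 2 + ((batch_size - 32) / 32) ** 2

def random_search(n_trials, seed=0):
    rng = random.Random(seed)  # seeded so the search is reproducible
    best = None
    for _ in range(n_trials):
        trial = {"lr": rng.uniform(0.001, 1.0),
                 "batch_size": rng.choice([16, 32, 64, 128])}
        loss = objective(**trial)
        if best is None or loss < best[0]:
            best = (loss, trial)
    return best

loss, params = random_search(200)
print(params, round(loss, 4))
```

Infrastructure tools add the parts this sketch omits: running trials in parallel on remote workers, persisting results, and early-stopping unpromising trials.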
Model driven engineering for big data management systems (Marcos Almeida)
In Big Data systems, the sheer volume of incoming data is itself a problem to be solved, as is the problem of how to process that data for the end user. As a French leader in the domain of UML modelling, Softeam saw the opportunity to improve its commercial offer of UML-based modelling environments. This work was funded by the EU commission as part of the JUNIPER FP7 project.
Contents of the presentation:
1. IFC OVERVIEW
- Back to The Idea of BIM
- Open BIM
- What Is IFC?
- IFC Formats
- IFC Workflow
- Interoperability
- BIM Collaboration Format (BCF)
- Model View Definition (MVD)
- Data Modeling
- Modeling Language
- IFC Data Modeling (Schema)
- EXPRESS Schema
2. IFC DATA MODEL
- Inheritance Hierarchy
- Explicit vs Inverse Attributes
- Objectified Relationships
- Viewers
- Spatial Aggregate Hierarchy
- Geometric Representation Methods
- Relative Positioning
3. ATTRIBUTES & PROPERTIES
- It’s all about Data
- Data Mapping
- Attributes Categories
- Attributes in Revit
- Properties Classification
- IFC Property Sets:
- Revit Implementation
- Data Mapping files
4. IFC: THE NOW & THE FUTURE
- Preview
- Ifc Versions Evolution
- Ifc Certification
- ifcBridge Addition to IFC4.2
- Ifc5.0 Infrastructure & Better GIS Integration
- Ifc New Candidate Formats
- Brief Case Study: Ifcjson Format
When We Spark and When We Don’t: Developing Data and ML Pipelines (Stitch Fix Algorithms)
The data platform at Stitch Fix runs thousands of jobs a day to feed data products that provide algorithmic capabilities to power nearly all aspects of the business, from merchandising to operations to styling recommendations. Many of these jobs are distributed across Spark clusters, while many others are scheduled as isolated single-node tasks in containers running Python, R, or Scala. Pipelines are often composed of a mix of task types and containers.
This talk will cover thoughts and guidelines on how we develop, schedule, and maintain these pipelines at Stitch Fix. We’ll discuss guidelines on how we think about which portions of the pipelines we develop to run on what platforms (e.g. what is important to run distributed across Spark clusters vs run in stand-alone containers) and how we get them to play well together. We’ll also provide an overview of tools and abstractions that have been developed at Stitch Fix to facilitate the process from development, to deployment, to monitoring them in production.
Serverless Toronto User Group - Let's go Serverless! (Daniel Zivkovic)
Presentation slides from the first Toronto Kickoff Meetup. Topics covered:
1. Debunking Serverless Myths
2. How did we get here? Serverless past, present and the future
3. Serverless vs. FaaS vs. BaaS
4. Products Landscape
5. Popular Use Cases & Design Patterns
6. How to leverage The Serverless Framework to start building cloud-native applications!
7. Serverless forecast: How big will serverless be?
8. Learning Serverless & Serverless Tips
9. Adopting Serverless in your organization
10. Planning Serverless Toronto next steps...
How to Build a ML Platform Efficiently Using Open-Source (Databricks)
Fast-growing startups usually face a common set of challenges when employing machine learning. Data scientists are expected to work on new products and develop new models as well as iterate on existing ones. Once in production, models should be continuously monitored and regularly maintained as the infrastructure evolves. Before too long, data scientists end up spending most of their time doing maintenance and firefighting of existing models instead of creating new ones.
At GetYourGuide, we faced these challenges and decided to think about machine learning development holistically, which led us to our machine learning platform. Our platform uses MLflow to keep track of our machine learning life-cycle and ease the development experience. To integrate our models into our production environment, we also need to deal with additional requirements like API specification, SLOs and monitoring. To empower our data scientists, we have built a templating system that takes care of the heavy lifting of going to production, leveraging software engineering tools and ML-specific ones like BentoML.
In this talk we will present:
– Our previous approaches for deploying models and their tradeoffs
– Our data science and platform principles
– The main functionalities of our platform
– A live demo to create a new service
– Our learnings in the process
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15 (MLconf)
10 More Lessons Learned from Building Real-Life ML Systems: A year ago I presented a collection of 10 lessons at MLConf. The goal of the presentation was to highlight some of the practical issues that ML practitioners encounter in the field, many of which are not included in traditional textbooks and courses. The original 10 lessons included some related to issues such as feature complexity, sampling, regularization, distributing/parallelizing algorithms, or how to think about offline vs. online computation.
Since that presentation and its associated material were published, I have been asked to complement it with more/newer material. In this talk I will present 10 new lessons that not only build upon the original ones, but also relate to my recent experiences at Quora. I will talk about the importance of metrics, training data, and debuggability of ML systems. I will also describe how to combine supervised and unsupervised approaches, and the role of ensembles in practical ML systems.
Slide deck used to introduce machine learning with Azure Machine Learning Service. Focus on deployment of models with the machine learning SDK and consumption of the models with Python and Go.
Lessons Learned from Building Machine Learning Software at Netflix (Justin Basilico)
Talk from Software Engineering for Machine Learning Workshop (SW4ML) at the Neural Information Processing Systems (NIPS) 2014 conference in Montreal, Canada on 2014-12-13.
Abstract:
Building a real system that incorporates machine learning as a part can be a difficult effort, both in terms of the algorithmic and engineering challenges involved. In this talk I will focus on the engineering side and discuss some of the practical issues we’ve encountered in developing real machine learning systems at Netflix and some of the lessons we’ve learned over time. I will describe our approach for building machine learning systems and how it comes from a desire to balance many different, and sometimes conflicting, requirements such as handling large volumes of data, choosing and adapting good algorithms, keeping recommendations fresh and accurate, remaining responsive to user actions, and also being flexible to accommodate research and experimentation. I will focus on what it takes to put machine learning into a real system that works in a feedback loop with our users and how that imposes different requirements and a different focus than doing machine learning only within a lab environment. I will address the particular software engineering challenges that we’ve faced in running our algorithms at scale in the cloud. I will also mention some simple design patterns that we’ve found to be useful across a wide variety of machine-learned systems.
Building a performing Machine Learning model from A to Z (Charles Vestur)
A 1-hour read to become highly knowledgeable about Machine Learning and the machinery underneath, from scratch!
A presentation introducing all the fundamental concepts of Machine Learning step by step, following a classical approach to building a performing model. Simple examples and illustrations are used throughout the presentation to make the concepts easier to grasp.
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...) (Aaron Saray)
Object Oriented Programming in enterprise level PHP is incredibly important. In this presentation, concepts like MVC architecture, data mappers, services, and domain and data models will be discussed. Simple demonstrations will be used to show patterns and best practices. In addition, using tools like Doctrine or integration with Salesforce or the AS/400 will also be discussed. There will be an emphasis on the practical application of these techniques as well - this isn't just a theoretical talk! This presentation is great for those just beginning to create enterprise applications as well as those who have had years of experience.
Weekly #105: AutoViz and Auto_ViML Visualization and Machine Learning (Bill Liu)
https://learn.xnextcon.com/event/eventdetails/W20040310
I will describe what is available in terms of open-source and proprietary tools for automating Data Science tasks, and introduce two new tools: one to visualize any sized data set with one click, and another to try multiple ML models and techniques with a single call. I will provide the GitHub repos for both for free in the talk.
DutchMLSchool. ML for Energy Trading and Automotive Sector (BigML, Inc)
Machine Learning for Energy Trading, Automotive Sector, and Logistics, presented by BigML's Partners A1 Digital.
Main Conference: Introduction to Machine Learning.
DutchMLSchool: 1st edition of the Machine Learning Summer School in The Netherlands.
Production-Ready BIG ML Workflows - from zero to hero (Daniel Marcous)
Data science isn't an easy task to pull off.
You start with exploring data and experimenting with models.
Finally, you find some amazing insight!
What now?
How do you transform a little experiment into a production-ready workflow? Better yet, how do you scale it from a small sample in R/Python to TBs of production data?
Building a BIG ML Workflow - from zero to hero is about the work process you need to follow in order to have a production-ready workflow up and running.
Covering :
* Small - Medium experimentation (R)
* Big data implementation (Spark MLlib and pipelines)
* Setting Metrics and checks in place
* Ad hoc querying and exploring your results (Zeppelin)
* Pain points & Lessons learned the hard way (is there any other way?)
Quantitative Data Analysis: Reliability Analysis (Cronbach Alpha), Common Method... (2023240532)
Quantitative data Analysis
Overview
Reliability Analysis (Cronbach Alpha)
Common Method Bias (Harman Single Factor Test)
Frequency Analysis (Demographic)
Descriptive Analysis
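The reliability statistic listed above can be made concrete. A minimal sketch of Cronbach's alpha for k questionnaire items, using invented responses:

```python
# Cronbach's alpha:  alpha = k/(k-1) * (1 - sum(item variances) / var(totals))
def variance(xs):
    """Sample variance (n-1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(items):
    """items: list of k item-score lists, one per questionnaire item."""
    k = len(items)
    item_vars = sum(variance(it) for it in items)
    totals = [sum(scores) for scores in zip(*items)]  # per-respondent total
    return (k / (k - 1)) * (1 - item_vars / variance(totals))

# Three items answered by four respondents (rows = items, columns = respondents):
items = [
    [3, 4, 3, 5],
    [3, 4, 2, 5],
    [2, 4, 3, 5],
]
print(round(cronbach_alpha(items), 3))
```

Values near 1 indicate the items measure the same construct consistently; a common rule of thumb treats alpha above roughly 0.7 as acceptable reliability.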
Opendatabay - Open Data Marketplace.pptx (Opendatabay)
Opendatabay.com unlocks the power of data for everyone. Open Data Marketplace fosters a collaborative hub for data enthusiasts to explore, share, and contribute to a vast collection of datasets.
The first open hub for data enthusiasts to collaborate and innovate: a platform to explore, share, and contribute to a vast collection of datasets. Through robust quality control and innovative technologies like blockchain verification, Opendatabay ensures the authenticity and reliability of datasets, empowering users to make data-driven decisions with confidence. Leverage cutting-edge AI technologies to enhance the data exploration, analysis, and discovery experience.
From intelligent search and recommendations to automated data productisation and quotation, Opendatabay's AI-driven features streamline the data workflow. Finding the data you need shouldn't be complex. Opendatabay simplifies the data acquisition process with an intuitive interface and robust search tools. Effortlessly explore, discover, and access the data you need, allowing you to focus on extracting valuable insights. Opendatabay breaks new ground with dedicated, AI-generated synthetic datasets.
Leverage these privacy-preserving datasets for training and testing AI models without compromising sensitive information. Opendatabay prioritizes transparency by providing detailed metadata, provenance information, and usage guidelines for each dataset, ensuring users have a comprehensive understanding of the data they're working with. By leveraging a powerful combination of distributed ledger technology and rigorous third-party audits Opendatabay ensures the authenticity and reliability of every dataset. Security is at the core of Opendatabay. Marketplace implements stringent security measures, including encryption, access controls, and regular vulnerability assessments, to safeguard your data and protect your privacy.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
2. Hello!
Il cielo è Azure sopra Berlino team
Università degli Studi di Roma “La Sapienza”
MoS in Engineering in Computer Science
Data Mining course
A.Y. 2016/2017
Fabio Rosato - rosato.1565173@studenti.uniroma1.it
Giacomo Lanciano - lanciano.1487019@studenti.uniroma1.it
Francisco Ferreres Garcia - matakukos@gmail.com
Leonardo Martini - martini.1722989@studenti.uniroma1.it
Simone Caldaro - caldaro.1324152@studenti.uniroma1.it
Na Zhu - nana.zhu@hotmail.com
4. Machine Learning
◎ What is Big Data?
◎ What is Machine Learning?
◎ Uses of Machine Learning?
◎ Why Machine Learning?
◎ Who uses it?
5. What is big data?
◎ What is Big Data?
○ Structured
○ Unstructured
◎ From a variety of sources
○ Commercial transactions
○ Social media
○ Publicly available sources
○ Sensors
○ Business statistics
◎ How to analyze this data?
6. What is machine learning?
◎ Examine LARGE amounts of data
○ Find patterns. Build models.
◎ Automatic improvement of the algorithms
○ Iterative approach.
○ Multiple passes so the machine learns.
◎ Predictions
6
7. Uses of machine learning?
◎ Classification
○ Supervised.
○ e.g. spam filter
◎ Regression
○ Supervised.
○ Estimate relationship between
continuous variables.
○ e.g. car market price from specs
◎ Clustering
○ Unsupervised.
○ e.g. identify communities in social networks
7
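The three task types on this slide can be sketched with scikit-learn on tiny toy datasets. Everything here (data, models, expected classes) is illustrative and not part of the tutorial's Azure ML experiment:

```python
# Sketch of the three ML task types, using scikit-learn on toy data.
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.cluster import KMeans

# Classification (supervised): a spam-filter-like yes/no decision.
clf = LogisticRegression()
clf.fit([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
label = clf.predict([[2.9]])[0]  # falls on the "1" side of the boundary

# Regression (supervised): estimate a continuous value (e.g. a price).
reg = LinearRegression()
reg.fit([[1.0], [2.0], [3.0]], [2.0, 4.0, 6.0])  # y = 2x
price = reg.predict([[5.0]])[0]  # close to 10.0

# Clustering (unsupervised): group points without any labels.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
clusters = km.fit_predict([[0, 0], [0, 1], [10, 10], [10, 11]])
```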
8. Why machine learning?
◎ Growing volumes and varieties of available data
○ Processing this data manually would be impossible.
◎ Cheaper computational processing and storage
◎ Competitive advantage
○ Companies get huge benefits by analyzing
data from the markets.
8
9. Who uses it?
◎ Financial institutions
○ e.g. recognize and prevent frauds.
◎ Governments
○ e.g. increase efficiency and service.
◎ Medicine and science
○ e.g. DNA sequencing, patients’
wearable sensors.
◎ Marketing and sales
○ e.g. targeted advertising, customer
segmentation.
◎ You name it!
9
10. 2.
Using ML
A brief overview of the current tools
to harness the power of ML
10
11. ML is an incredibly powerful set of...
◎ Algorithms
◎ Tools
◎ Techniques
◎ ...
◎ Magic spells?!
11
12. Back in the ol’ days...
To use ML, you’d have to implement the
algorithms yourself:
◎ prototype in some kind of friendlier
language (like Matlab/Octave);
◎ then implement it in a real language (like
C++) for speed and efficiency.
12
13. Back in the ol’ days...
In-depth knowledge of ML techniques and
algorithms was required.
Huge barrier to adoption.
ML was used only in very big, very serious
applications (that could afford and justify the
overhead).
13
16. ML libraries and frameworks
◎ Exist for practically any widely used
programming language.
◎ Encapsulate most widely used algorithms,
abstracting away low-level details.
◎ Can even offer ad-hoc solutions for greater
speed/efficiency/reliability (e.g. distributed
computation).
16
19. ML as a Service
Outsourcing ML services:
◎ Incredibly low barrier to adoption.
◎ Massive scalability.
◎ It just works!
19
20. ML as a Service - The celebrities:
◎ Google Prediction APIs
◎ Amazon AWS ML
◎ Microsoft Azure ML
○ Allows users to create and train models,
then turn them into ready-to-be-consumed APIs.
All through a beautifully intuitive web interface.
20
22. What is Azure Machine Learning Studio?
◎ Web-based workspace.
◎ Drag-and-drop tool.
◎ Collaborative environment.
◎ Where data science, cloud resources, and
your data meet.
With Azure ML, predictive analytics solutions
are...
22
24. Ease of use!
ML can do amazing
things… But they could
be even more amazing
if accessible to all!
24
25. Setup
All you need is a web browser! Go to Azure ML
website and choose:
◎ Free workspace: start using all the features of
Studio immediately, no credit card required!
◎ Enterprise workspace: add extra storage and a few
additional web service features ($10/month).
Then, start working on your data from anywhere!
25
27. Build - main features
To help you build your training experiment
(model) from scratch, Studio provides:
◎ Interactive, intuitive visual workspace.
◎ Drag-and-drop interaction to connect modules
with each other. For instance:
○ ready-to-use datasets.
○ ready-to-use standard ML algorithms.
○ your special sauce (cooked in Python or R).
○ …
◎ Huge set of samples and templates.
27
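The “special sauce” module mentioned above corresponds to Studio's Execute Python Script module, whose entry point looks like the skeleton below. The body of the function is an illustrative sketch, not part of the tutorial's experiment:

```python
# Skeleton of Azure ML Studio's "Execute Python Script" module.
# Studio calls azureml_main with up to two pandas DataFrames (one per
# input port) and expects a sequence of DataFrames back.
import pandas as pd

def azureml_main(dataframe1=None, dataframe2=None):
    # Example transformation (illustrative): drop rows with missing values.
    if dataframe1 is not None:
        dataframe1 = dataframe1.dropna()
    # The return value must be a sequence whose first item is a DataFrame.
    return dataframe1,
```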
30. Build - additional features
Besides creating experiments, Studio allows you to:
◎ upload your own datasets.
◎ create web services. (!!!)
◎ store and reuse your trained models.
◎ create Jupyter notebooks.
◎ save your account settings.
◎ collect all previous objects into a single project.
30
31. Deploy
Once your model is ready, deploy it as a
web service in a few steps:
◎ right from Studio, click on “Set Up Web Service”.
◎ wait for your predictive experiment to be
created.
◎ click on “Deploy Web Service”.
◎ wait for your web service to be deployed.
◎ enjoy!
31
32. Deploy - predictive experiment
The original experiment is “translated” and
the model is used to predict results.
32
33. Deploy - web service
To call your new web service, just follow the
instructions about building the POST request.
33
34. Share
Your brand new experiment is ready to be
shared in the community. Remember,
ML accessible for all!
Upload it on Cortana Intelligence Gallery,
where data scientists and developers share
solutions.
34
35. Share - gallery
You can publish
your work directly
from the Studio.
Just follow the
instructions and
describe what you
have done!
35
38. Microsoft Azure Machine Learning
Studio
◎ Go to Microsoft Azure Machine
Learning Studio.
◎ In order to use the framework,
you need a Microsoft account:
A. I already have one
→ just “Sign in”
B. I do not have one →
must “Sign up”
38
40. Create an account
1. Fill in the form
2. Click on “Create an
account”
3. Verify your email
40
41. Sign in
◎ Choose the account you want to use and log
in to the free workspace.
41
42. Five steps to create an experiment
◎ Create a model
○ Get data
○ Prepare the data
○ Define features
◎ Train the model
○ Choose and apply a learning algorithm
◎ Score and test the model
○ Predict new automobile prices
42
45. 1. Get Data
◎ Use data in the existing sample datasets
◎ Create your own dataset via NEW → Dataset
◎ Import data: load data from sources such
as the Web, Azure SQL Database, Azure
Table, Hive tables, or Azure Blob
storage (module formerly known as Reader)
45
46. Using Azure saved dataset
◎ In the search bar, look for “automobile”
◎ Drag and drop the dataset onto the
experiment canvas
46
47. Visualize the Data
◎ When you select a column, some statistics are shown
◎ Given the variables for a specific automobile, we're going to try to predict the price (last
column)
47
48. 2. Prepare the data
◎ This menu can be used to
transform raw data to the
input of the next modules
48
49. Preprocess automobile dataset
1. Clean the missing values present in
several rows, so the model can
analyze the data correctly.
2. Exclude some columns.
49
50. Clean missing data: remove column
◎ Click on Launch column selector
◎ On the left, click With rules
◎ Under Begin With, click All columns.
◎ Select Exclude and column names,
◎ Click inside the text box and select normalized-losses
50
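Outside Studio, the same two preprocessing steps (excluding the normalized-losses column and cleaning rows with missing values) could be sketched in pandas. The tiny DataFrame below is made up for illustration; it is not the real automobile dataset:

```python
# pandas sketch of the preprocessing: exclude a column, then drop
# rows that still contain missing values (illustrative data).
import pandas as pd

df = pd.DataFrame({
    "make": ["audi", "bmw", "dodge"],
    "normalized-losses": [None, 164.0, None],  # mostly missing
    "price": [13950.0, None, 6377.0],
})

cleaned = (
    df.drop(columns=["normalized-losses"])  # like the Exclude rule
      .dropna()                             # like Clean Missing Data
)
```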
52. Run the experiment and visualize
processed data
◎ Save the experiment
◎ Run it
◎ Visualize data output from Clean
Missing Data
◎ Check differences
52
53. 3. Define features
◎ Features: individual measurable properties
of something you’re interested in.
◎ Finding a good set of features for creating a
predictive model requires experimentation
and knowledge about the problem you
want to solve.
◎ (In our example each row represents one
automobile, and each column is a feature
of that automobile)
53
54. Feature selection
◎ As before, drag Select columns in Dataset
◎ Connect Clean Missing Data to the module
just added
◎ Click on Launch column selector
◎ On the left, click With rules
◎ Under Begin With, click No columns.
◎ Select Include and column names,
◎ Click inside the text box and select “make”,
“body-style”, “wheel-base”, “engine-size”,
“horsepower”, “peak-rpm”, “highway-mpg”,
“price”
54
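The Include rule above amounts to a simple column projection; in pandas it might look like the sketch below (the single row of data is invented for illustration):

```python
# Keep only the feature columns listed on the slide (pandas sketch).
import pandas as pd

features = ["make", "body-style", "wheel-base", "engine-size",
            "horsepower", "peak-rpm", "highway-mpg", "price"]

# Minimal made-up row, with two extra columns that will be dropped.
df = pd.DataFrame([{
    "make": "audi", "body-style": "sedan", "wheel-base": 99.8,
    "engine-size": 109, "horsepower": 102, "peak-rpm": 5500,
    "highway-mpg": 30, "price": 13950, "num-of-doors": "four",
    "fuel-type": "gas",
}])

selected = df[features]  # like the "Include ... column names" rule
```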
55. 4. Choose and apply a learning
algorithm
◎ Classification: predicts an
answer from a defined set of
categories
◎ Regression: predicts a
number.
◎ (Because we want to predict
price, which is a number,
we'll use a regression
algorithm)
Build predictive model: Train / Test
55
57. Learning algorithm selection
◎ Connect the "Train Model" module to both the "Linear
Regression" and "Split Data" modules
57
58. Train a specific feature
◎ Click the Train Model
module
◎ Click Launch column
selector in the
Properties pane
◎ Click By Name
◎ Select the price
column.
◎ This is the value that
our model is going
to predict.
58
59. 5. Predict new automobile prices
◎ 75 percent of the data is used to
train the model.
◎ 25 percent of the data is used to
score (test) the model.
59
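The Split Data → Train Model → Score Model flow above can be sketched with scikit-learn. The data here is synthetic (a made-up horsepower-to-price rule), since the real automobile dataset is not reproduced in these slides:

```python
# 75/25 train/test split plus a linear regression, mirroring the
# Split Data -> Train Model -> Score Model flow (synthetic data).
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic "specs -> price" data: price = 100 * horsepower + 5000.
X = [[hp] for hp in range(100)]
y = [100 * hp + 5000 for hp in range(100)]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)  # 75% train, 25% test

model = LinearRegression().fit(X_train, y_train)  # "Train Model"
scored = model.predict(X_test)                    # "Score Model"
```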
60. Output of the score module
◎ Predicted values for price, shown in the Scored Labels column.
60
62. Metrics
◎ Mean Absolute Error (MAE): The average of absolute
errors (an error is the difference between the
predicted value and the actual value).
◎ Root Mean Squared Error (RMSE): The square root of
the average of squared errors of predictions made on
the test dataset.
◎ Relative Absolute Error: The average of absolute errors
relative to the absolute difference between actual
values and the average of all actual values.
◎ Relative Squared Error: The average of squared errors
relative to the squared difference between the actual
values and the average of all actual values.
◎ Coefficient of Determination: Also known as the R
squared value, this is a statistical metric indicating
how well a model fits the data.
62
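The five metrics above follow directly from their definitions; here is a pure-Python sketch on made-up actual/predicted values:

```python
# Compute the slide's evaluation metrics from their definitions
# (pure Python, made-up actual/predicted values).
import math

actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]

errors = [p - a for p, a in zip(predicted, actual)]
mean_actual = sum(actual) / len(actual)

mae = sum(abs(e) for e in errors) / len(errors)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
rae = sum(abs(e) for e in errors) / sum(abs(a - mean_actual) for a in actual)
rse = sum(e * e for e in errors) / sum((a - mean_actual) ** 2 for a in actual)
r2 = 1.0 - rse  # Coefficient of Determination
```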
63. How to interpret the metrics
◎ For each of the error statistics, smaller is
better.
◎ A smaller value indicates that the
predictions more closely match the actual
values.
◎ For Coefficient of Determination, the closer
its value is to one (1.0), the better the
predictions.
63
64. Iterate to improve the model
◎ Change the features you use in your
prediction
◎ Modify the properties of the Linear
Regression algorithm
◎ Try a different algorithm altogether
◎ Add multiple machine learning algorithms to
your experiment at one time
◎ Compare two of them by using the Evaluate
Model module
64
65. 6. Deploy an Azure Machine Learning
web service
◎ Satisfied with your model?
◎ You can deploy it as a web service!
◎ Use the web service to predict automobile
prices on new data…
1. Create a training
experiment
2. Convert the training
experiment to a
predictive experiment
3. Deploy the predictive
experiment as a new
web service
65
66. Convert the training experiment to a
predictive experiment
◎ By converting to a predictive experiment, you're getting
your trained model ready to be deployed as a scoring
web service.
◎ Users of the web service can send input data to your
model and your model will send back the prediction
results.
◎ As you convert to a predictive experiment, keep in mind
how you expect your model to be used by others.
66
68. Deploy the predictive experiment as a
New web service
◎ Click Run
◎ Click Deploy Web Service
◎ Select Deploy Web
Service New.
◎ The deployment page of
the Machine Learning
Web Service portal
opens.
68
69. Test your Web Service with a Python
Program
◎ The request/response page
contains the Request-Response
API documentation, with a
starter Python program (to be
adapted) to call the web
service
69
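The starter program builds a JSON request in the classic Azure ML request/response format. A stdlib-only sketch is shown below; the URL and API key are placeholders (you get the real ones from your web service's dashboard), and the column names mirror the features selected earlier in the tutorial:

```python
# Sketch of calling a classic Azure ML web service (urllib only).
# URL and API key below are placeholders, not real values.
import json
import urllib.request

def build_request_body(rows):
    """Wrap input rows in the classic Azure ML request format."""
    return {
        "Inputs": {
            "input1": {
                "ColumnNames": ["make", "body-style", "wheel-base",
                                "engine-size", "horsepower", "peak-rpm",
                                "highway-mpg", "price"],
                "Values": rows,
            }
        },
        "GlobalParameters": {},
    }

def call_web_service(url, api_key, rows):
    """POST the request body and return the parsed JSON response."""
    body = json.dumps(build_request_body(rows)).encode("utf-8")
    req = urllib.request.Request(url, data=body, headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer " + api_key,
    })
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Example (not executed here): predict a price for one new automobile.
# result = call_web_service(
#     "https://<region>.services.azureml.net/workspaces/<ws-id>"
#     "/services/<service-id>/execute?api-version=2.0",
#     "<api-key>",
#     [["audi", "sedan", "99.8", "109", "102", "5500", "30", "0"]])
```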