pass marathon predictive analytics use cases - PowerBI and Azure ML


Comparing Azure ML Lab and PowerBI, like apples and oranges

Published in: Software

  1. 1. Presenting Sponsor Azure Machine Learning Studio and PowerBI Yana Berkovich, Microsoft MVP, BI & Applications Lead, Onni Group of Companies Moderated By: Aasish Sharma
  2. 2. Technical Assistance If you require assistance during the session, type your inquiry into the question pane on the right side. Maximize your screen with the zoom button on the top of the presentation window. Please fill in the short evaluation following the session. It will appear in your web browser.
  3. 3. Thank You to Our Sponsor KingswaySoft is a leading provider of high-performance data integration solutions for connectivity and productivity using SSIS as the ETL platform. Organizations from more than 70 countries rely on our solutions to drive their business data efficiency.
  4. 4. Attend PASS Summit to Grow Your Career • The Community PASS Summit is the largest conference for technical professionals who leverage the Microsoft Data Platform. November 6-9 | Seattle, WA Connect with a global network of 250,000+ data professionals
  5. 5. Yana Berkovich Microsoft Data Platform MVP VANO365 User group president Vancouver PowerBI user group member and speaker SQL and O365 Saturdays and community events speaker @Yana_Berkovich yanaberkovich/ Yana@yanaberkovich.com
  6. 6. Presenting Sponsor Azure Machine Learning Studio and PowerBI Yana Berkovich, Microsoft MVP, BI & Applications Lead, Onni Group of Companies
  7. 7. Azure Machine Learning Lab & PowerBI Why are we using those tools? AzureML Lab vs PowerBI Getting the Data PreProcessing the Data The prediction model Sample, population and your data set Example - Exponential Smoothing Method Quick Summary – what to use when…?
  8. 8. Data science as Gartner defines it…
  9. 9. ML has been at the Peak since 2015! Gartner
  10. 10. Decision Support This is what it looks like. This is what the Data Analysis field is trying to make it look like: James Taylor, leader of Decision Management Solutions, Decision Management Systems: A Practical Guide to Using Business Rules and Predictive Analytics (IBM Press)
  11. 11. What is Azure ML Lab/Studio? Built on top of the machine learning capabilities of several Microsoft products and services. Shares many of the real-time predictive analytics of the new personal assistant, Cortana. Azure ML also uses proven solutions from Xbox and Bing. Components: Lab, Gallery. Cloud-based only, in Azure. Audience: Data Analysts, Statisticians, Actuaries, Data Scientists… Users: Data Analysts, Data Scientists
  12. 12. A suite of business analytics tools that deliver insights. Connects to hundreds of data sources, simplifies data prep, and drives ad hoc analysis. Produce beautiful reports, then publish them for your organization to consume on the web and across mobile devices. Scalable across the enterprise, with governance and security built-in. Components Desktop O365 Mobile Embedded Report Server Insights apps Cloud solution, on-premise solution, mobile solution What is PowerBI? Audience: Business Users & Managers Users: IT, Finance, Marketing, Manufacturing, Data Analysts…
  13. 13. Azure ML A service that was created for developers and data scientists Business users, end users and customers, Analysts friendly Predict the future Train and create custom models based on statistics that will help answer questions Visualize the existing data for business use Answer business questions Predict the future??!! Is there a better way that can potentially generate more value for the business? PowerBI Get insights to give information for the Decision Support Who? What? Why?
  14. 14. Some useful sources for the beginners… Machine Learning • • ning/ • us/blog/tag/azure-machine-learning/ • • • us/professional-program/tracks/data-science/ PowerBI • • • •
  15. 15. Getting the data
  16. 16. Getting the Data - Decision Support System and the tip of the iceberg Predict Model Insights Information Data Data Science starts with data gathering. Getting the data from the metrics is hard! The data collection process can be a result of meetings, telemetry, IoT…
  17. 17. How are we using ML Studio? Machine Learning Process Cycle: Data Collection, Data Cleansing, Data Manipulations, Model Creation, Model Evaluation. Adapted from Brad Llewellyn’s PASS presentation “Azure Machine Learning Studio: Four Tips from the Pros” (PASS Link)
  18. 18. It all starts with the right question and Business Goal!
  19. 19. Case Study Airplanes are never late…. We are going to analyze the data set of flights during the month of October This data set was taken from the sample data sets in ML studio
  20. 20. Getting the Data Azure ML Lab: Data set from CSV file, txt, Excel, Hive table, SQL table, OData, SVMlight, Zip, R object. PowerBI: Source – CSV file in this case; more than 100 different sources; Source Type; Data Delimiter; Data connection and refresh
  21. 21. Visualizing the Data Azure ML Lab PowerBI Data Preview Histograms, box plots Raw data This is the main goal of this tool – Data visualization Recently, similar automatic visualizations Data view for all the visualizations click the Aggregated data
  22. 22. Visualizing the Data PowerBI Quick insights mode Quick insights mode
  23. 23. PreProcessing the data
  24. 24. Azure ML Data Type, Change metadata module Data Type – automatic detection, Change the type in a SQL query, directly on the column Clean missing data – minimum maximum missing value ratio (even 100% of the data cleaned) Clean duplications, first last top rows Use DAX queries and R PowerBI Create measures calculated based on data ranges Data Cleansing Convert the data into categories from range Group categorical values Edit metadata SMOTE - increasing rows/facts number Edit metadata
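The cleansing steps named on this slide (removing duplicates, cleaning missing data, converting a range into categories) can be sketched in plain Python. This is only an illustration of the ideas, not how either tool implements them: the records, the field names, the mean-fill strategy and the 15-minute "late" threshold are all assumptions made for the example.

```python
# Hypothetical flight records; field names and values are made up for illustration.
flights = [
    {"carrier": "AA", "origin": "ORD", "dep_delay": 12.0},
    {"carrier": "AA", "origin": "ORD", "dep_delay": 12.0},  # exact duplicate
    {"carrier": "DL", "origin": "ATL", "dep_delay": None},  # missing value
    {"carrier": "DL", "origin": "ATL", "dep_delay": 30.0},
    {"carrier": "UA", "origin": "ORD", "dep_delay": 3.0},
]


def drop_duplicates(records):
    """Keep only the first occurrence of each identical record."""
    seen, unique = set(), []
    for rec in records:
        key = tuple(sorted(rec.items()))
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def fill_missing_with_mean(records, column):
    """Replace missing values in `column` with the mean of the known values."""
    known = [r[column] for r in records if r[column] is not None]
    mean = sum(known) / len(known)
    return [dict(r, **{column: mean if r[column] is None else r[column]})
            for r in records]


def to_category(delay, threshold=15.0):
    """Convert a numeric delay into a category (threshold is an assumption)."""
    return "late" if delay >= threshold else "on time"


deduplicated = drop_duplicates(flights)
cleaned = fill_missing_with_mean(deduplicated, "dep_delay")
labels = [to_category(r["dep_delay"]) for r in cleaned]
print(labels)
```

Mean-fill is just one possible replacement strategy; dropping the incomplete rows or using the median would fit the same slot.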
  25. 25. Azure ML Selecting columns, Selecting columns, Merging, Join with other data source – SQL manipulations, R Manipulations, Python manipulations Building Dimensions – Time dimension, Airport Dimension… Creating custom measures, quick measures and code based measures using DAX PowerBI ERD - create connections between the dimensions and the fact tables Data Manipulations Creating Join through SQL query, Merging, Appending lines Creating ERD through join of another dimension table for the selected columns Using R or Python for creating custom measures (avg, mean…)
  26. 26. Azure ML Only if you build a model for that Out of the box visualization for the data set with 2 graphic options as previously mentioned Q&A functionality recently available on desktop Looks very similar to the visualizations that exist in ML lab Enables the user to add the FAQ visualization to the dashboard or report “native” language questions answered- What is the most late flight from Chicago airport? PowerBI Data Manipulations
  27. 27. The prediction model
  28. 28. Main Steps in creating an Experiment / Report AzureML Experiment: • Get data • Clean the data • Prepare the data (adding columns, calculations, missing data types, joins, SQL manipulations…) • Divide the data – sample for the model to train, data for evaluation • Choose the model • Train the Algorithm • Score using the data for evaluation • Evaluate • Save as a trained model for later use or • Create Web Service and predict for new data sets PowerBI Report: • Get or connect to the data • Clean the query • Create measures and dimensions • Create connections using ERD • Create data visualizations • Q&A Analyze the data and get the answers to your question • Add visualizations to Dashboard • Create Application and publish
  29. 29. Which Questions do we ask our Model? Azure ML Lab (future events): • How do we predict if a certain flight is going to be late? • How does the weather affect the flight being late? • If we are going to fly from a certain airport, will our flight be late – Ask the Web service! • What is the chance for the flight to be less than 15 min late if it’s AA? What is the precision of this prediction? PowerBI (events that have already happened, limited prediction): • We generally don’t! It is mostly a data visualization tool, not a tool we use to predict • What is the average? Max? Min? • Which airport has the most late arrivals? • What is the correlation and the trend between the weather and the delay time? • Clustering the data, which airports are in the most late cluster? – histograms and brick charts
  30. 30. What is a prediction model? Which algorithm is the best fit to predict the results depends on the data: Is the data seasonal? Does it have repetitions? Is it categorical? Linear Regression or Poisson Regression? How can we know what works best? Based on the past results! Main model types: Anomaly Detection, Classification, Clustering, Regression
  31. 31. Statistics…and prediction models How do we predict the average late departure? Average. Single Exponential Smoothing: exponential smoothing is a rule of thumb technique for smoothing time series data using the exponential window function. Whereas in the simple moving average the past observations are weighted equally, exponential functions are used to assign exponentially decreasing weights over time. (Wikipedia to the rescue…) Moving Average: the last month might be a better predictor for flights than the last 20 months. Weighted Moving Average: some observations are more significant than others; flights of a domestic airline perform differently and cannot be compared to others, or big vs. small planes. For single smoothing, α can be chosen between 0.1 and 0.9; we choose the value of α that gives the smallest MSE (mean squared error), which is a local optimum.
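The methods on this slide can be written out in a few lines. The sketch below is a minimal illustration in plain Python, assuming the common convention of initialising the smoothed series with the first observation; the delay values are invented, and the grid of α candidates (0.1 to 0.9) simply mirrors the range mentioned on the slide.

```python
def moving_average(series, window):
    """Mean of the last `window` observations, each weighted equally."""
    recent = series[-window:]
    return sum(recent) / len(recent)


def weighted_moving_average(series, weights):
    """Weighted mean of the last len(weights) observations (oldest first)."""
    recent = series[-len(weights):]
    return sum(w * x for w, x in zip(weights, recent)) / sum(weights)


def ses_one_step_forecasts(series, alpha):
    """Single exponential smoothing; forecasts[t] is the prediction for series[t].

    Recurrence: F[t+1] = alpha * x[t] + (1 - alpha) * F[t], with F[0] = x[0]
    (the initialisation is an assumption, one of several conventions).
    """
    forecasts = [series[0]]
    for x in series[:-1]:
        forecasts.append(alpha * x + (1 - alpha) * forecasts[-1])
    return forecasts


def mse(series, forecasts):
    """Mean squared one-step-ahead error, skipping the initial point."""
    errors = [(x - f) ** 2 for x, f in zip(series[1:], forecasts[1:])]
    return sum(errors) / len(errors)


def best_alpha(series, candidates):
    """Pick the candidate alpha with the smallest MSE on the series."""
    return min(candidates,
               key=lambda a: mse(series, ses_one_step_forecasts(series, a)))


# Hypothetical average departure delays in minutes, one value per day.
delays = [10, 12, 11, 15, 14, 18, 17, 21, 20, 24]
candidates = [i / 10 for i in range(1, 10)]  # 0.1, 0.2, ..., 0.9
alpha = best_alpha(delays, candidates)
print(alpha, mse(delays, ses_one_step_forecasts(delays, alpha)))
```

A grid search like this only finds the best value among the candidates tried, which is why the result is a local rather than a guaranteed global optimum.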
  32. 32. Adding information to our data visualization PowerBI Min value line Max value line Trend line – we can see that the AVG delay time increases? Exponential smoothing forecast Seasonality – 7 points (week in a month) Ignore last 10 points – to check our prediction Forecast length – to see what the next 7 days will look like
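The "ignore last 10 points" setting amounts to a holdout check: forecast the final points from the earlier ones and compare against what actually happened. The sketch below shows that idea with a seasonal-naive forecast (repeat the last full week), which is a deliberately simple stand-in and not the algorithm PowerBI uses. The daily values are synthetic and perfectly periodic, so the holdout error comes out as zero; real data would not behave that way.

```python
def seasonal_naive_forecast(history, season_length, horizon):
    """Forecast by repeating the last full season of `history`."""
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def holdout_mae(series, season_length, holdout):
    """Hide the last `holdout` points, forecast them, return mean absolute error."""
    train, actual = series[:-holdout], series[-holdout:]
    predicted = seasonal_naive_forecast(train, season_length, holdout)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / holdout


# Synthetic average delay per day for a 31-day month with a weekly pattern.
weekly_pattern = [10, 8, 9, 11, 16, 20, 14]
daily_delay = [weekly_pattern[day % 7] for day in range(31)]

error = holdout_mae(daily_delay, season_length=7, holdout=10)  # "ignore last 10"
next_week = seasonal_naive_forecast(daily_delay, season_length=7, horizon=7)
print(error, next_week)
```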
  33. 33. Adding information to our data visualization PowerBI – How can we explain the predicted results? Trend line – we can see that the AVG delay time increases? How can we validate and score the predicted results? Azure ML Lab • End of October - Thanksgiving? • Weather changes at the airports for the worse • The trend line doesn’t continue for the predicted data • How can we control the Alpha? Well in Power View for O365, not in PowerBI yet
  34. 34. More options in PowerBI? – R R model for more, simple prediction options in PowerBI. Add the R code in the PowerBI model for the relevant data column. The R visualization can do predictive models of your choice. It is limited but very useful for business case scenarios. Recommended: Blog post - Revenue and forecasting by Christian Berg – Plot using R; New Series of Time Series by Leila Etaati, PhD, MVP –
  35. 35. Meanwhile in Azure ML Lab Unfortunately, the ETS – Exponential smoothing module was deprecated, so let’s choose a better one! Edit Metadata – adding the column for the average values. Split the data into sample and population (not just ignoring the last 10 points, but randomizing the split). Asking what average late time to expect is simply the wrong question for this tool; we would rather use it to actually predict, for each flight, whether it is going to be late, or how the weather affects flights being late.
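A randomized split, as opposed to cutting off the last 10 points, can be sketched as follows. This is a generic illustration, not the Split Data module itself; the 30% evaluation fraction and the fixed seed are assumptions chosen for the example.

```python
import random


def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical flight IDs standing in for full flight records.
flight_ids = list(range(100))
train, test = train_test_split(flight_ids)
print(len(train), len(test))
```

Fixing the seed keeps the split reproducible, so two models can be trained and scored on exactly the same rows.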
  36. 36. Azure ML Lab – some of the mathematical models: Decision Forest Regression Linear Regression (Excel as well… Solver) 2 Class Boosted Decision Tree Decision Tree 2 Class Logistic Regression Will be used in the prediction demo to compare which predicts best K-Means Clustering (PBI as well)
  37. 37. The Prediction by Airport – Hartsfield in Atlanta, Georgia and Chicago are the 2 leading airports where the weather has a very large impact on delay times. The delay times there are the largest, just as we hear in the news about delays at those airports (how many Hallmark movies use a Christmas snowstorm at the Chicago airport…)
  38. 38. The Flight Delay prediction – comparing the scored models • The blue prediction model is slightly better than the red one at predicting whether a flight is going to be late. • Two-Class Boosted Decision Tree is slightly better than Two-Class Logistic Regression
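One common way to put a number on "slightly better" when comparing two scored classifiers is the area under the ROC curve (AUC). The slide does not say which metric was used, so treat this as an assumption. The sketch below computes AUC directly from its definition; the labels and both sets of scores are invented, and the model names are only labels for the example.

```python
def auc(labels, scores):
    """Probability that a random late flight (label 1) gets a higher score
    than a random on-time flight (label 0); ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# 1 = flight was late, 0 = on time (hypothetical evaluation set).
labels = [1, 0, 1, 0, 0, 1, 0, 1]
# Hypothetical scored probabilities from two trained models.
boosted_tree_scores = [0.9, 0.2, 0.8, 0.65, 0.1, 0.7, 0.3, 0.6]
logistic_scores = [0.8, 0.3, 0.55, 0.6, 0.2, 0.7, 0.5, 0.4]

auc_tree = auc(labels, boosted_tree_scores)
auc_logistic = auc(labels, logistic_scores)
print(auc_tree, auc_logistic)
```

With these made-up scores the first model ranks late flights above on-time flights more often, which is the kind of difference the two curves on the slide would show.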
  39. 39. Last Slide – What? When?
  40. 40. Azure ML Data scientists, developers Business users, end users and customers, Analysts friendly Be the development platform for prediction analytics solutions Development platform and publishing platform for data visualization Upload the data, manipulate the data, divide into data set and training set, train the model, evaluate the model, create service, predict for other data sets PowerBI Connect to data, create report, analyze existing data and get data insights Who? What? Why? Ask questions – Business users and managers questions, evaluate, compare, classify, display Predict given a mathematical trained model based on past results The next generation is already here… Azure IoT hub, Azure AI and Machine learning focused on devs
  41. 41. Questions?
  42. 42. Thank you for attending @sqlpass #sqlpass @PASScommunity Presenting Sponsor