Microsoft Technologies for Data Science 201612


Delivered to SQL Saturday BI Edition -- Atlanta, GA
Microsoft provides several technologies in and around Azure that can be used for casual to serious data science. This presentation gives an overview of the major Microsoft options for on-premises, cloud-based, and hybrid data science. The presenter has used these technologies in various companies and industries, both as a Microsoft consultant and previously as an independent consultant. The speaker also offers insights into data science careers, which suggest where business is likely to be for consultants and partners.

Published in: Data & Analytics

  1. Microsoft Technologies for Data Science. Mark Tabladillo, Ph.D., Lead Data Scientist (Architect), Microsoft. December 2016: SQL Saturday BI, Atlanta, GA
  2. Networking Interactive
  5. Terms and definitions. Terms: Data Science, Machine Learning, Data Mining, Applied Statistics. Definitions: the automated or semi-automated process of discovering patterns in data; applied scientific method.
  11. Technology Choices
      • SQL Server Analysis Services: Enterprise, Business Intelligence
      • Excel Add-in for SSAS: Office 365, Office 2013 or higher (x64)
      • Semantic Search: Enterprise, Business Intelligence, Standard, Web, Express with Advanced Services
      • Microsoft Azure ML: Free (size limited), Paid (web service): Experiment + Query
      • F#: Open Source
      • SQL Server R Services: SQL Server 2016 or higher
  12. 4351-4434-A78A-3384CA7515BF/SQL_Server_2016_Deeper_Insights_Across_Data_White_Paper.pdf
  13. SS SQL AS NoSQL
  14. Data mining add-in for business analysts • Ease of use • Rich data mining • Scalable
  15. Rowset Output with Scores Varchar NVarchar Office PDF
  16. Documents; Full-Text Keyword Index “FTI”; iFilters; Semantic Document Similarity Index “DSI”; Semantic Database; Semantic Key Phrase Index – Tag Index “TI”
  17. Simplified Chinese; British English; Portuguese; Chinese (Hong Kong SAR, PRC); Spanish; Chinese (Singapore); Chinese (Macau SAR)
  18. Time in Seconds vs. Number of Documents (2011 – K. Mukerjee, T. Porter, S. Gherman – Microsoft)
  19. Features: Microsoft R Open (R distribution, free) vs. Microsoft R Client (free) vs. Microsoft R Server (commercial)
      • Big Data
        R Open: In-memory bound. Can only process datasets that fit into the available memory.
        R Client: In-memory bound. Can process datasets that fit into the available memory. Operates on large volumes when connected to R Server.
        R Server: Disk scalability. Operates on bigger volumes & factors.
      • Speed of Analysis
        R Open: Multi-threaded when MKL is installed for non-ScaleR functions.
        R Client: Multi-threaded with MKL for non-ScaleR functions. Up to 2 threads for ScaleR functions with a local compute context.
        R Server: Full parallel threading & processing.
      • Enterprise Readiness
        R Open: Community support.
        R Client: Community support.
        R Server: Commercial support.
      • Analytic Breadth & Depth
        R Open: 8000+ open source packages.
        R Client: Leverage & optimize open source R packages plus 'Big Data'-ready ScaleR packages.
        R Server: Leverage & optimize open source R packages plus 'Big Data'-ready + multithreaded-ready ScaleR packages.
      • Commercial Viability
        R Open: Risk of deployment to open source.
        R Client: Free for everyone.
        R Server: Commercial licenses.
      • DeployR Enterprise
        R Open: Not available.
        R Client: Not available.
        R Server: Included.
  20. Microsoft R Server Editions (columns: Description, Install ScaleR, Get Started)
      • R Server for Hadoop: Scale your analysis transparently by distributing work across nodes without complex programming. Install ScaleR: Doc. Get Started: Doc.
      • R Server for Teradata DB: Run advanced analytics in-database for seamless data analysis. Install ScaleR: Doc. Get Started: Doc.
      • R Server for Linux: Bring predictive and prescriptive analytics power to your Linux environments. Install ScaleR: Doc. Get Started: Doc.
  22. Mutable vs. Immutable
      • Classic Open Source: Java (mutable), Scala (immutable)
      • .NET, Now Open Source: C#, C++, VB.NET (mutable), F# (immutable)
  25. Capabilities and Products
      • Preconfigured solutions. Capabilities: Business scenarios. Products: Forecasting, churn, etc.
      • Intelligence. Capabilities: Integration with Cortana, Bot services, Cognitive services. Products: Cortana, Bot Framework, Cognitive Services.
      • Dashboards and visualizations. Capabilities: Dashboards and visualizations. Products: Power BI.
      • Machine learning and advanced analytics. Capabilities: Machine learning, Hadoop, Distributed analytics, Complex event processing. Products: Machine Learning, HDInsight (Data Lake service), Data Lake analytics, Stream Analytics.
      • Big data stores. Capabilities: Big Data repository, Elastic data warehouse. Products: Data Lake store, Blobs; SQL Data Warehouse.
      • Information management. Capabilities: Data orchestration, Data catalog, Event ingestion. Products: Data Factory, Data catalog, Event Hubs.
  29. https://academy.microso US/professional-degree/data-science/ ; https://borntolearn.msle announcing-the-microsoft-professional-degree-mpd-program
  30. books.html
  32. US/home?forum=MachineLearning videos-february-2015
  34. R Server Overview; Introduction to MicrosoftML; MicrosoftML Algorithm Cheat Sheet; Inside the Trump Bunker
  35. LinkedIn, @MarkTabNet
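
The R comparison slide contrasts editions that are "in-memory bound" with one that offers "disk scalability". As a rough illustration of that distinction only, the sketch below computes a mean by streaming a one-column CSV in fixed-size chunks, so only one chunk is held in memory at a time. It is written in Python rather than R, and it is not how ScaleR works internally; the function names `read_in_chunks` and `chunked_mean`, the chunk size, and the sample data are all hypothetical.

```python
import csv
import io
from typing import Iterable, Iterator, List


def read_in_chunks(lines: Iterable[str], chunk_size: int) -> Iterator[List[float]]:
    """Yield lists of at most chunk_size values from a one-column CSV stream."""
    chunk: List[float] = []
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        chunk.append(float(row[0]))
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        yield chunk  # final partial chunk


def chunked_mean(lines: Iterable[str], chunk_size: int = 1000) -> float:
    """Compute the mean while keeping only one chunk in memory at a time."""
    total = 0.0
    count = 0
    for chunk in read_in_chunks(lines, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no data")
    return total / count


# Hypothetical sample data; in practice `lines` could be an open file handle.
sample = io.StringIO("1\n2\n3\n4\n5\n")
result = chunked_mean(sample, chunk_size=2)
```

Because the running total and count are the only state kept between chunks, the same function works whether the input is five rows or far more than fits in memory; an in-memory approach would instead load every value before summing.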