At this conference I ran a hands-on lab using Power BI Dataflows and Power BI Automated Machine Learning. Before the workshop, we had an interesting talk about Artificial Intelligence and Machine Learning on Azure.
Managing your ML lifecycle with Azure Databricks and Azure ML by Parashar Shah
Machine learning development has new complexities beyond software development. There are a myriad of tools and frameworks which make it hard to track experiments, reproduce results and deploy machine learning models. Learn how you can accelerate and manage your end-to-end machine learning lifecycle on Azure Databricks using MLflow and Azure ML to reliably build, share and deploy machine learning applications using Azure Databricks. This is based on our talk at //build - https://www.youtube.com/watch?v=pe_OH07wAYc and https://mybuild.techcommunity.microsoft.com/sessions/76976
Everything generates logs. Applications, infrastructure, security ... everything. Keeping track of the flood of log data is a big challenge, yet critical to your ability to understand your systems and troubleshoot (or prevent) issues. In this session, we will use both Amazon CloudWatch and application logs to show you how to build an end-to-end log analytics solution. First, we cover how to configure an Amazon Elasticsearch Service domain and ingest data into it using Amazon Kinesis Firehose, demonstrating how easy it is to transform data with Firehose. We look at best practices for choosing instance types, storage options, shard counts, and index rotations based on the throughput of incoming data and configure a secure analytics environment. We demonstrate how to set up a Kibana dashboard and build custom dashboard widgets. Finally, we dive deep into the Elasticsearch query DSL and review approaches for generating custom, ad-hoc reports.
Scaling Production Machine Learning Pipelines with Databricks by Databricks
Conde Nast is a global leader in the media production space housing iconic brands such as The New Yorker, Wired, Vanity Fair, and Epicurious, among many others. Along with our content production, Conde Nast invests heavily in companion products to improve and enhance our audience’s experience.
Unlocking Geospatial Analytics Use Cases with CARTO and Databricks by Databricks
Many companies need to analyze large datasets that include location information. To derive business insights from these datasets, you need a solution that provides geospatial analysis functionality and can scale to manage large volumes of information. The combination of CARTO and Databricks allows you to solve these kinds of large-scale geospatial analytics problems. CARTO provides a location intelligence platform to discover and predict key insights through location data. In this session we will see how to integrate CARTO and Databricks and how to take advantage of this combination to solve specific problems for industries such as logistics, telecommunications, or financial services.
Power BI provides advanced data technology for transforming, storing, and presenting data. This technology can support advanced Excel users and data scientists. This presentation provides basics for Excel 2010 and 2013 users and information about O365 and SharePoint Online supporting shared use cases.
Introduction to Real-time, Streaming Data and Amazon Kinesis. Streaming Data ... by Amazon Web Services
Amazon Kinesis is a platform for streaming data on AWS, offering powerful services to make it easy to load and analyze streaming data. In this session, you’ll learn about how AWS customers are transitioning from batch to real-time processing using Amazon Kinesis, and how to get started. We will provide an overview of streaming data applications and introduce the Amazon Kinesis platform and its services. We will walk through a production use case to demonstrate how to ingest streaming data, prepare it, and analyze it to gain actionable insights in real time using Amazon Kinesis. We will also provide pointers to tutorials and other resources so you can quickly get started with your streaming data application.
Artificial intelligence in actions: delivering a new experience to Formula 1 ... by GoDataDriven
At GoDataFest 2019, Guy Kfir presented how AI delivers a new experience to Formula 1 fans across the world. AWS fuels the analytics through machine learning. Did you know a Formula 1 race car contains 120 sensors and generates 3 GB of data every race at 1,500 data points per second? AWS developed several applications, including overtake possibility and pit stop advantage. How important is it for your company to invest in Machine Learning and AI? There are three scenarios for AI/ML success: Automation, Enrichment, and Invention. So, what are you waiting for? Create the loop, advance your data strategy, and organize for success. To get started, identify AI/ML use cases, educate yourself, start with AI services and move to Amazon SageMaker, engage with AWS, and consider the partner ecosystem (like GoDataDriven or Binx).
Data & AI Platforms: Open Source vs Managed Services (AWS vs Azure vs GCP) by Ankit Rathi
When designing and building Data & AI platforms, you need to evaluate the available options: whether your platform will run on-premises, use services from one or more clouds, or take a hybrid approach.
In any case, you need to evaluate various tools & services for your ingestion, storage, processing/analysis, and serving layers.
In this post, I have mapped open-source tools to popular managed cloud services to make the evaluation process a bit easier.
The Microsoft Azure platform offers a wide range of services, predominantly PaaS services. When building a solution architecture, you will use several of these pluggable services. In this session, a few real-world Azure solution architectures will be discussed from a 'plug & play' view: why and how the solutions were designed that way, pitfalls, alternatives, constraints, and lessons learned.
Organize & manage master data & metadata centrally, built upon Kong, Cassandra, Neo4j & Elasticsearch. Managing master data & metadata is a very common problem with no good open-source alternative as far as I know, so I am initiating this project: MasterMetaData.
Leveraging Microsoft Power BI To Support Enterprise Business Intelligence by Rightpoint
Take control over your data. View our presentation of end-to-end enterprise business intelligence leveraging Microsoft solutions including SQL Server, Power Pivot, and Power BI.
Demonstration includes:
• How to build a Tabular model by importing a Power Pivot workbook
• Connecting a Tabular model to Power BI
• Developing Power BI dashboards/reports connected to an on-premise Tabular model
• Refreshing Power BI dashboards/reports
AWS is hosting the first FSI Cloud Symposium in Hong Kong, which will take place on Thursday, March 23, 2017 at Grand Hyatt Hotel. The event will bring together FSI customers, industry professionals, and AWS experts to explore how to turn the dream of transformation, innovation, and acceleration into reality by exploiting Cloud, Voice to Text, and IoT technologies. The packed agenda includes expert sessions on a host of pressing issues, such as security and compliance, as well as customer experience sharing on how cloud computing is benefiting the industry.
Speaker: Lijia Xu, Big Data Practice Lead, Professional Services, AWS
Big Data Expo 2015 - Microsoft Transform your data into intelligent action by BigDataExpo
There are many promises surrounding Big Data. Everyone talks about it, but how do you get started without having to draw up a large business case right away? Cortana Analytics Suite is a low-threshold, easily accessible Advanced Analytics platform for testing the feasibility of your ideas and then growing into (large) production implementations. In this session you get an overview of the scenarios that Cortana Analytics offers, such as IoT and Machine Learning, but also Churn Analysis, Forecasting, and Predictive Maintenance.
For business users, using AI is always about easy access to the tools without writing any code. This session is not about learning how to do AI but about how to make AI usable and add value.
AI-powered visuals such as Key Influencers in Power BI Desktop let users analyse data without deep knowledge of machine learning concepts.
Machine Learning is approaching a peak of inflated expectations, although we see AI daily and in all contexts. Media pressure is high, governments are overly optimistic, plenty of ventures are putting money into unviable ideas, and some brilliant engineers fail to reach business users.
But Microsoft brings all of this under the same roof and unleashes the power of AI by integrating the Power BI ecosystem with Azure ML and Cognitive Services. The result is simple and effective: great technology in the end user's hands.
When it comes to dealing with large, complex, and disparate data sets, traditional database technologies are unable to keep pace with the rich analytics necessary to power today’s data-driven applications. Graph analytics databases are becoming the underlying infrastructure for AI and machine learning. These databases allow users to ask complex questions across complex data, which is not always practical or even possible at scale using other approaches. They also enable faster insights against massive data sets when combined with pattern recognition, statistical analysis, and AI/machine learning. And in the case of standards-based graph databases, they connect with popular visualization tools like Graphileon, allowing users to easily explore their data stores and quickly build compelling graph-based applications.
Power BI for Big Data and the New Look of Big Data Solutions by James Serra
New features in Power BI give it enterprise tools, but that does not mean it automatically creates an enterprise solution. In this talk we will cover these new features (composite models, aggregations tables, dataflow) as well as Azure Data Lake Store Gen2, and describe the use cases and products of an individual, departmental, and enterprise big data solution. We will also talk about why a data warehouse and cubes still should be part of an enterprise solution, and how a data lake should be organized.
This was a very interesting conference aimed at ICT students, where I took them through the Azure ecosystem for data warehousing architecture and the best practices for building powerful Business Intelligence solutions in the new era.
Machine learning allows us to build predictive analytics solutions of tomorrow - these solutions allow us to better diagnose and treat patients, correctly recommend interesting books or movies, and even make the self-driving car a reality. Microsoft Azure Machine Learning (Azure ML) is a fully-managed Platform-as-a-Service (PaaS) for building these predictive analytics solutions. It is very easy to build solutions with it, helping to overcome the challenges most businesses have in deploying and using machine learning. In this presentation, we will take a look at how to create ML models with Azure ML Studio and deploy those models to production in minutes.
Visualize your data in Data Lake with AWS Athena and AWS Quicksight Hands-on ... by Amazon Web Services
Level 200: Visualize Your Data in Data Lake with AWS Athena and AWS Quicksight
Nowadays, enterprises are building data lakes that store large amounts of structured and unstructured data for analysis. But building the required data models and infrastructure takes a lot of time. How to run quick data queries without servers and databases is the next big question for every enterprise.
In this workshop, eCloudvalley, the first and only Premier Consulting Partner in GCR, will demonstrate how to use serverless architecture to visualize your data using Amazon Athena and Amazon QuickSight.
You can easily query and visualize the data in your S3 buckets and get business insights with the combination of these two services. You can also build business reports with other tools such as AWS IoT and Amazon Kinesis Firehose.
Reason to Attend:
Learn how to quickly search through thousands of data items on S3 using serverless Amazon Athena
Learn how to use Amazon QuickSight to retrieve information from your database quickly and create detailed reports
Data Lake allows an organisation to store all of their data, structured and unstructured, in one, centralised repository. Since data can be stored as-is, there is no need to convert it to a predefined schema and you no longer need to know what questions you want to ask of your data beforehand. In this session we will explore the architecture of a Data Lake on AWS and cover topics such as storage, processing and security.
We live in a world of unprecedented change. To be successful in this world of change, you will need to develop a data culture, creating an environment where every team and every individual is empowered to do great things because of the data at their fingertips. In this event you will learn how to create a culture of data and how the Microsoft Modern BI platform and tools can help you harness the power of data once reserved only for data scientists. Learn how to tap into the power of natural language, self-service business insights, and visualization capabilities, and make insights available to anyone, anywhere, at any time.
Understanding AWS Managed Database and Analytics Services | AWS Public Sector... by Amazon Web Services
The world is creating more data in more ways than ever before. The average internet user in 2017 generates 1.5GB of data per day, with the rate doubling every 18 months. A single autonomous vehicle can generate 4TB per day. Each smart manufacturing plant generates 1PB per day. Storing, managing, and analyzing this data requires integrated database and analytic services that provide reliability and security at scale. AWS offers a range of managed data services that let customers focus on making data useful, including Amazon Aurora, RDS, DynamoDB, Redshift, Spectrum, ElastiCache, Kinesis, EMR, Elasticsearch Service, and Glue. In this session, we discuss these services, share our vision for innovation, and show how our customers use these services today. Learn More: https://aws.amazon.com/government-education/
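The growth claim above (1.5 GB per user per day in 2017, doubling every 18 months) can be turned into a quick projection. The following is a minimal sketch that assumes the doubling rate stays constant, which the abstract does not claim beyond the stated figures; the function name and parameters are illustrative only.

```python
def projected_daily_gb(months_elapsed, base_gb=1.5, doubling_months=18):
    """Projected GB per user per day, assuming steady doubling (an assumption)."""
    return base_gb * 2 ** (months_elapsed / doubling_months)


# Project a few doubling periods forward from the 2017 baseline.
for months in (0, 18, 36, 54):
    print(f"{months:>2} months after 2017: {projected_daily_gb(months):.1f} GB/day")
```

Under that assumption the daily volume doubles with each 18-month step, so storage planning based on the 2017 figure alone would fall behind quickly.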
Microsoft Fabric is the next version of Azure Data Factory, Azure Data Explorer, Azure Synapse Analytics, and Power BI. It brings all of these capabilities together into a single unified analytics platform that goes from the data lake to the business user in a SaaS-like environment. The vision of Fabric is therefore to be a one-stop shop for all the analytical needs of every enterprise and one platform for everyone from a citizen developer to a data engineer. Fabric will cover the complete spectrum of services including data movement, data lake, data engineering, data integration and data science, observational analytics, and business intelligence. With Fabric, there is no need to stitch together different services from multiple vendors. Instead, the customer enjoys an end-to-end, highly integrated, single offering that is easy to understand, onboard, create, and operate.
This is a hugely important new product from Microsoft and I will simplify your understanding of it via a presentation and demo.
Agenda:
What is Microsoft Fabric?
Workspaces and capacities
OneLake
Lakehouse
Data Warehouse
ADF
Power BI / DirectLake
Resources
Freddie Mac & KPMG Case Study – Advanced Machine Learning Data Integration wi... by DataWorks Summit
Freddie Mac and KPMG will share an innovative solution to accelerate data model (ERM) development and data integration on a highly-distributed, in-memory computing platform. The machine learning component (PySpark) of the framework executes against evolving semi-structured and structured data sets to learn and automate data mapping from various sources to a targeted schema. As a result, it significantly reduces the manual analysis, design and development effort, as well as establishes faster data integration across a variety of complex and high-volume datasets.
The solution will leverage various components of the Hadoop data platform. It will use Sqoop to import the data into the platform. PySpark will be used to process the data. In addition, the application will include a PySpark ML model that runs as a continuous job in Spark to process the ingested semi-structured data and intelligently map it into the proper Hive tables. All of this will be scheduled through Oozie.
Speakers
Kevin Martelli, KPMG, Managing Director
Balaji Wooputur, Freddie Mac, Risk Analyst Director
Data warehousing in the era of Big Data: Deep Dive into Amazon Redshift by Amazon Web Services
Analyzing big data quickly and efficiently requires a data warehouse optimized to handle and scale for large datasets. Amazon Redshift is a fast, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all of your data for a fraction of the cost of traditional data warehouses. In this session, we take an in-depth look at data warehousing with Amazon Redshift for big data analytics. We cover best practices to take advantage of Amazon Redshift's columnar technology and parallel processing capabilities to deliver high throughput and query performance. We also discuss how to design optimal schemas, load data efficiently, and use workload management.
Big Data, IoT, data lake, unstructured data, Hadoop, cloud, and massively parallel processing (MPP) are all just fancy words unless you can find use cases for all this technology. Join me as I talk about the many use cases I have seen, from streaming data to advanced analytics, broken down by industry. I’ll show you how all this technology fits together by discussing various architectures and the most common approaches to solving data problems and hopefully set off light bulbs in your head on how big data can help your organization make better business decisions.
Building Modern Data Platform with Microsoft Azure by Dmitry Anoshin
This presentation will cover Cloud history and Microsoft Azure Data Analytics capabilities. Moreover, it has a real-world example of DW modernization. Finally, we will check the alternative solution on Azure using Snowflake and Matillion ETL.
On this occasion I spoke about the different ways Azure offers to manage data pipelines and the best practices for each. I showed the audience some big data, streaming data, and transactional data architectures using Azure services.
On this occasion I spoke for almost 4 hours, with a lunch break in between, about analytics solutions on Azure and its tools for machine learning and cognitive services. I introduced automated machine learning on Azure with some live demos.
At this conference I talked about the importance of data for any company that wants to become a data-driven firm. I also showed two scenarios for developing and maintaining data solutions for analytics workloads.
Course presentation and introduction. We cover the following topics to familiarize those taking their first steps in databases with them and to refresh the concepts for those at more advanced levels.
What to do with data?
Difference between data and information
Objectives
Basic concepts:
- Databases
- DBMS
- Tables
- Rows
- Columns
- Relationships and types of relationships
- Primary and foreign keys
- Data types
- Referential integrity
- Redundancy
- Integrity constraints
Course presentation and introduction. We cover the following topics to familiarize those taking their first steps in databases with them and to refresh the concepts for those at more advanced levels.
What to do with data?
Difference between data and information
Objectives
Basic concepts:
- Databases
- DBMS
- Tables
- Rows
- Columns
- Relationships and types of relationships
- Primary, candidate, and foreign keys
- Data types
- Referential integrity
- Redundancy
- Integrity constraints
- Normalization
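Several of the concepts listed above (primary keys, foreign keys, referential integrity, integrity constraints) can be seen in a few lines of code. The following is a minimal sketch using Python's built-in `sqlite3` module; the `customers` and `orders` tables and their columns are hypothetical and are not taken from the course material.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: each order must reference an existing customer.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " total REAL NOT NULL CHECK (total >= 0))"
)
conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ana')")
conn.execute("INSERT INTO orders (customer_id, total) VALUES (1, 25.0)")


def try_orphan_order(connection):
    """Try to insert an order for a customer that does not exist."""
    try:
        connection.execute("INSERT INTO orders (customer_id, total) VALUES (99, 10.0)")
        return True
    except sqlite3.IntegrityError:
        # Referential integrity: the foreign key has no matching primary key.
        return False


orphan_accepted = try_orphan_order(conn)
print("orphan order accepted:", orphan_accepted)
```

The database rejects the second order because its `customer_id` points at no row in `customers`, which is what referential integrity means in practice.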
Adjusting primitives for graph : SHORT REPORT / NOTES by Subhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
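To make the map and reduce primitives listed above concrete, here is a minimal sequential sketch in Python. It is not the report's implementation: the OpenMP and CUDA variants are not shown, and bfloat16 storage is only emulated, under the assumption that a bfloat16 value is the upper 16 bits of an IEEE-754 float32 (truncation, not round-to-nearest). The function names are hypothetical, and accumulation is done in Python's double precision, so only the storage type differs between the two sums.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Emulate bfloat16 storage by keeping the upper 16 bits of a float32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", bits & 0xFFFF0000))[0]


def multiply(xs, ys):
    """Map primitive: element-wise vector multiply (sequential)."""
    return [x * y for x, y in zip(xs, ys)]


def vector_sum(xs):
    """Reduce primitive: vector element sum (sequential)."""
    total = 0.0
    for x in xs:
        total += x
    return total


# Hypothetical input vector: 1/1, 1/2, ..., 1/1000.
xs = [1.0 / (i + 1) for i in range(1000)]
exact = vector_sum(xs)
stored = vector_sum([to_bfloat16(x) for x in xs])

print(f"float sum    = {exact:.6f}")
print(f"bfloat16 sum = {stored:.6f}")
print(f"relative err = {abs(exact - stored) / exact:.2e}")
```

Because truncation drops up to 16 mantissa bits per element, the bfloat16 sum comes out slightly smaller than the float sum here; the relative error stays below 2^-7 in this emulation.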
Quantitative Data Analysis
Reliability Analysis (Cronbach Alpha) Common Method...
2023240532
Quantitative Data Analysis
Overview
Reliability Analysis (Cronbach Alpha)
Common Method Bias (Harman Single Factor Test)
Frequency Analysis (Demographic)
Descriptive Analysis
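As an illustration of the reliability analysis named in the overview above, the following sketch computes Cronbach's alpha with the standard formula alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The deck's own data and tooling are not shown in this listing, so the response matrix and the function name are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` holds one list of respondent scores per questionnaire item;
    all lists must have the same length (one entry per respondent).
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    # Total score of each respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(column) for column in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical data: 3 items answered by 5 respondents.
responses = [
    [3, 4, 5, 2, 4],
    [3, 5, 5, 1, 4],
    [2, 4, 4, 2, 5],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

A value close to 1 indicates that the items vary together; identical items give exactly 1.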
1. Elena López
Microsoft MVP – Data Platform
elopez@solvex.com.do
www.solvex.com.do
Data Architecture on Azure: beyond Data Factory, Power BI and Azure SQL Database
2. Azure Data Services
Sources: Social, LOB, Graph, IoT, Image, CRM
- INGEST (data orchestration and monitoring): Azure Data Factory, SSIS
- STORE (big data store): Azure Data Lake Storage Gen2, Blob Storage, Cosmos DB
- PREP (transform & clean): Azure Databricks, Azure HDInsight, Power BI Dataflow, Azure Data Lake Analytics
- MODEL & SERVE (& store) (data warehouse): Azure SQL Data Warehouse, Azure Analysis Services, Cosmos DB, Power BI Aggregations
Consumers: Advanced Analytics, AI, BI + Reporting
3. Azure Data Lake Storage Gen2
A “no-compromises” Data Lake: secure, performant, massively-scalable Data Lake storage that brings the cost and scale profile of object storage together with the performance and analytics feature set of data lake storage
- SCALABLE: No limits on data store size. Global footprint (50 regions).
- INTEGRATION READY: Optimized for Spark and Hadoop analytic engines. Tightly integrated with Azure end-to-end analytics solutions.
- MANAGEABLE: Automated lifecycle policy management. Object-level tiering.
- SECURE: Support for fine-grained ACLs, protecting data at the file and folder level. Multi-layered protection via at-rest Storage Service encryption and Azure Active Directory integration.
- FAST: Atomic file operations mean jobs complete faster.
- COST EFFECTIVE: Object store pricing levels. File system operations minimize transactions required for job completion.
4. Organizing a Data Lake – Folder structure
Objectives:
- Plan the structure based on optimal data retrieval
- Avoid a chaotic, unorganized data swamp
Common ways to organize the data:
- Data Retention Policy: temporary data, permanent data, applicable period (ex: project lifetime), etc…
- Business Impact / Criticality: High (HBI), Medium (MBI), Low (LBI), etc…
- Confidential Classification: public information, internal use only, supplier/partner confidential, personally identifiable information (PII), sensitive – financial, sensitive – intellectual property, etc…
- Probability of Data Access: recent/current data, historical data, etc…
- Owner / Steward / SME
- Subject Area
- Security Boundaries: department, business unit, etc…
- Time Partitioning: Year/Month/Day/Hour/Minute
- Downstream App/Purpose
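The organizing criteria above can be combined into a folder convention. The sketch below shows one hypothetical layout, business unit / subject area / dataset / time partition, built with Python's standard library. The ordering of the levels, the function name `lake_path`, and the example values are assumptions for illustration, not a layout prescribed by the slides.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath

GRANULARITIES = ("month", "day", "hour")


def lake_path(business_unit, subject_area, dataset, ts, granularity="day"):
    """Build <business_unit>/<subject_area>/<dataset>/YYYY/MM[/DD[/HH]]."""
    if granularity not in GRANULARITIES:
        raise ValueError(f"granularity must be one of {GRANULARITIES}")
    parts = [business_unit, subject_area, dataset, f"{ts:%Y}", f"{ts:%m}"]
    if granularity in ("day", "hour"):
        parts.append(f"{ts:%d}")
    if granularity == "hour":
        parts.append(f"{ts:%H}")
    return PurePosixPath(*parts)


# Hypothetical event timestamp and names.
ts = datetime(2019, 5, 17, 14, 30, tzinfo=timezone.utc)
print(lake_path("retail", "sales", "orders", ts))
print(lake_path("retail", "finance", "invoices", ts, "hour"))
```

Putting the time partition last keeps all data for one dataset under a single prefix, so security boundaries can be applied at the higher folder levels while jobs read only the date range they need.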
6. Architecture
External Data:
- Energy Consumption Data & Weather Data (Public Source)
Azure Services:
- Azure WebJob: runs jobs to get data from public source (Get Data)
- Azure Event Hub: stores streaming data (Data for Real-time Processing)
- Azure Stream Analytics: processes events as they arrive in the Event Hub (Data Stream Job)
- Azure SQL: contains historical energy consumption & weather data
- Azure Data Factory: pipeline invokes AML Web Service; results are sent to Azure SQL for predictions (Hourly Prediction Updates)
- Automated ML
- Power BI Dashboard: real-time data stats
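To give a feel for the "real time data stats" step in this architecture, here is a small Python sketch of a one-hour tumbling-window average over a stream of events. It only illustrates the kind of aggregation a stream job could perform; it does not use Azure Stream Analytics or Event Hub, and the event shape (timestamp, kWh reading) and the function name are assumptions.

```python
from collections import defaultdict
from datetime import datetime


def hourly_average(events):
    """Average (timestamp, value) events per one-hour tumbling window."""
    windows = defaultdict(list)
    for ts, value in events:
        # Each event belongs to exactly one window: the hour it falls in.
        window_start = ts.replace(minute=0, second=0, microsecond=0)
        windows[window_start].append(value)
    return {
        start: sum(values) / len(values)
        for start, values in sorted(windows.items())
    }


# Hypothetical energy-consumption readings in kWh.
events = [
    (datetime(2019, 5, 17, 9, 5), 10.0),
    (datetime(2019, 5, 17, 9, 40), 14.0),
    (datetime(2019, 5, 17, 10, 15), 20.0),
]
for start, avg in hourly_average(events).items():
    print(start.isoformat(), avg)
```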
8. How much is this car worth?
Machine Learning Problem Example
9. Model Creation Is Typically Time-Consuming
Which features? Mileage, Condition, Car brand, Year of make, Regulations, …
Which algorithm? Gradient Boosted, Nearest Neighbors, SVM, Bayesian Regression, LGBM, …
Which parameters? Parameter 1, Parameter 2, Parameter 3, Parameter 4, …
Example:
- Features: Mileage, Car brand, Year of make
- Algorithm: Gradient Boosted
- Parameters: Criterion, Loss, Min Samples Split, Min Samples Leaf, Others
- Result: Model
10. Model Creation Is Typically Time-Consuming
Which features? Mileage, Condition, Car brand, Year of make, Regulations, …
Which algorithm? Gradient Boosted, Nearest Neighbors, SVM, Bayesian Regression, LGBM, …
Which parameters?
- Gradient Boosted: Criterion, Loss, Min Samples Split, Min Samples Leaf, Others
- Nearest Neighbors: N Neighbors, Weights, Metric, P, Others
Iterate:
- Gradient Boosted with Mileage, Car brand, Year of make
- Nearest Neighbors with Car brand, Year of make, Condition
Result: Model
11. Model Creation Is Typically Time-Consuming
Which features? Which algorithm? Which parameters?
Iterate
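A quick way to see why this iteration is time-consuming is to count the experiments. The sketch below enumerates every non-empty feature subset and every parameter combination for two of the algorithms named on the slides. The candidate values for each parameter are hypothetical placeholders chosen for illustration; the slides name the parameters but not their values.

```python
from itertools import combinations, product

FEATURES = ["Mileage", "Condition", "Car brand", "Year of make", "Regulations"]

# Hypothetical candidate values for the parameters named on the slides.
SEARCH_SPACE = {
    "Gradient Boosted": {
        "criterion": ["friedman_mse", "squared_error"],
        "loss": ["squared_error", "huber"],
        "min_samples_split": [2, 5, 10],
        "min_samples_leaf": [1, 2, 4],
    },
    "Nearest Neighbors": {
        "n_neighbors": [3, 5, 7, 9],
        "weights": ["uniform", "distance"],
        "metric": ["minkowski", "chebyshev"],
        "p": [1, 2],
    },
}


def feature_subsets(features):
    """Yield every non-empty subset of the feature list."""
    for size in range(1, len(features) + 1):
        yield from combinations(features, size)


def configurations(space):
    """Yield (algorithm, parameter dict) for every grid point."""
    for algorithm, grid in space.items():
        names = list(grid)
        for values in product(*(grid[name] for name in names)):
            yield algorithm, dict(zip(names, values))


def count_experiments(features, space):
    """Number of (feature subset, algorithm, parameters) combinations."""
    n_subsets = sum(1 for _ in feature_subsets(features))
    n_configs = sum(1 for _ in configurations(space))
    return n_subsets * n_configs


print(count_experiments(FEATURES, SEARCH_SPACE))
```

Even this small grid gives 31 feature subsets times 68 configurations, which is 2108 candidate experiments, before adding more algorithms or finer parameter values.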
12. Introducing Automated Machine Learning
Inputs:
- Dataset
- Optimization Metric
- Constraints (Time/Cost)
Automated ML produces: ML Model
Accessible & Faster
13. Automated ML Accelerates Model Development
Input:
- Enter data
- Define goals
- Apply constraints
Intelligently test multiple models in parallel
Output:
- Optimized model
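The input/output contract described above (data, a goal metric, constraints in, an optimized model out) can be sketched as a small model-selection loop. This is not how Azure Automated ML works internally: the candidates here are three toy regressors, they are tried sequentially and not in parallel, and the dataset, function names, and time budget are all hypothetical.

```python
import time
from statistics import mean


def knn_model(k):
    """Return a fit function for a k-nearest-neighbors regressor on one feature."""
    def fit(train):
        def predict(x):
            nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
            return mean(y for _, y in nearest)
        return predict
    return fit


def linear_model(train):
    """Fit a least-squares line y = a + b * x and return its predictor."""
    mx = mean(x for x, _ in train)
    my = mean(y for _, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    slope = sum((x - mx) * (y - my) for x, y in train) / sxx
    return lambda x: my + slope * (x - mx)


def mean_abs_error(predict, rows):
    """Optimization metric: lower is better."""
    return mean(abs(predict(x) - y) for x, y in rows)


def auto_select(train, valid, candidates, metric, time_budget_s=1.0):
    """Try candidates until the time budget is spent; return (name, score) of the best."""
    deadline = time.monotonic() + time_budget_s
    best = None
    for name, fit in candidates:
        if time.monotonic() > deadline:
            break  # constraint reached: stop testing further models
        score = metric(fit(train), valid)
        if best is None or score < best[1]:
            best = (name, score)
    return best


# Hypothetical data: (mileage in 1000 km, price in 1000 USD).
train = [(10, 28.0), (30, 24.1), (50, 20.2), (70, 15.9), (90, 12.0), (110, 8.1)]
valid = [(20, 26.0), (60, 18.0), (100, 10.0)]
candidates = [
    ("knn-1", knn_model(1)),
    ("knn-3", knn_model(3)),
    ("linear", linear_model),
]
best_name, best_score = auto_select(train, valid, candidates, mean_abs_error)
print(best_name, round(best_score, 3))
```

On this nearly linear toy data the linear candidate gets the lowest validation error, so it is the one returned as the "optimized model".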
14. Automated ML Capabilities
• Based on Microsoft Research
• Brain trained with several million experiments
• Collaborative filtering and Bayesian optimization
• Privacy preserving: No need to “see” the data
15. Automated ML Capabilities
• ML Scenarios: Classification & Regression
• Integration: Azure Machine Learning, Azure Notebooks, Jupyter Notebooks
• Data Type: Numeric, Text
• Languages: Python SDK for deployment and hosting for inference
• Training Compute: Local Machine, Remote Azure DSVM (Linux), Azure Batch AI
• Transparency: View run history, model metrics
• Scale: Faster model training using multiple cores and parallel experiments
16. Model Explainability
GA:
• Feature importance as part of training
• Simple UX for feature importance for a selected iteration
• Local feature importance for a given sample
Post GA:
• Importance of raw data columns
• Accuracy and performance improvements
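The slides do not say how Automated ML computes feature importance. As a generic illustration of the idea, the sketch below uses permutation importance, a different and widely used technique swapped in here: shuffle one feature column, measure how much the model's error grows, and treat the growth as that feature's importance. The model, the data, and all names are hypothetical.

```python
import random
from statistics import mean


def mean_abs_error(predict, rows, targets):
    return mean(abs(predict(row) - target) for row, target in zip(rows, targets))


def permutation_importance(predict, rows, targets, n_repeats=5, seed=0):
    """Average increase in error when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mean_abs_error(predict, rows, targets)
    importance = {}
    for feature in rows[0]:
        increases = []
        for _ in range(n_repeats):
            column = [row[feature] for row in rows]
            rng.shuffle(column)
            shuffled = [{**row, feature: value} for row, value in zip(rows, column)]
            increases.append(mean_abs_error(predict, shuffled, targets) - baseline)
        importance[feature] = mean(increases)
    return importance


def predict(row):
    """Hypothetical price model that depends on mileage only."""
    return 30.0 - 0.2 * row["mileage"]


# Hypothetical data: mileage in 1000 km and an arbitrary color code.
rows = [
    {"mileage": m, "color": c}
    for m, c in [(10, 1), (30, 2), (50, 3), (70, 1), (90, 2), (110, 3)]
]
targets = [predict(row) for row in rows]
scores = permutation_importance(predict, rows, targets)
print(scores)
```

Shuffling the color column changes nothing because the model ignores it, while shuffling mileage increases the error, so mileage ranks as the important feature.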