Microsoft Fabric is the next version of Azure Data Factory, Azure Data Explorer, Azure Synapse Analytics, and Power BI. It brings all of these capabilities together into a single unified analytics platform that goes from the data lake to the business user in a SaaS-like environment. The vision for Fabric is to be a one-stop shop for all the analytical needs of every enterprise, and one platform for everyone from a citizen developer to a data engineer. Fabric covers the complete spectrum of services, including data movement, data lake, data engineering, data integration, data science, observational analytics, and business intelligence. With Fabric, there is no need to stitch together different services from multiple vendors. Instead, the customer enjoys an end-to-end, highly integrated, single offering that is easy to understand, onboard, create, and operate.
This is a hugely important new product from Microsoft, and this presentation and demo will simplify your understanding of it.
Agenda:
What is Microsoft Fabric?
Workspaces and capacities
OneLake
Lakehouse
Data Warehouse
ADF
Power BI / DirectLake
Resources
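Since the agenda above centers on OneLake, here is a minimal sketch of what "one data lake for the whole organization" means in practice: OneLake is exposed through an ADLS Gen2-compatible endpoint, so a lakehouse file can be addressed with a plain URL. This is a hedged sketch; the workspace and lakehouse names are hypothetical placeholders, and a real call needs a valid Azure AD token.

```python
from urllib.request import Request

# OneLake exposes an ADLS Gen2-compatible endpoint, so existing Azure Storage
# tooling can address lakehouse files by URL. The workspace, lakehouse, and
# file names below are hypothetical placeholders.
ONELAKE_HOST = "https://onelake.dfs.fabric.microsoft.com"

def onelake_file_request(workspace: str, lakehouse: str, path: str, token: str) -> Request:
    """Build (but do not send) a GET request for a file in a Fabric lakehouse."""
    url = f"{ONELAKE_HOST}/{workspace}/{lakehouse}.Lakehouse/Files/{path}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})

req = onelake_file_request("Sales", "Main", "raw/orders.csv", "<aad-token>")
print(req.full_url)
```

The point of the design is that every Fabric workload (Lakehouse, Warehouse, Power BI DirectLake) reads from the same files rather than from per-service copies.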
Organizations are grappling with manually classifying and inventorying distributed, heterogeneous data assets in order to deliver value. The new Azure service for enterprises, Azure Synapse Analytics, is poised to help organizations fill the gap between data warehouses and data lakes.
Power BI Overview, Deployment and Governance – James Serra
Deploying Power BI in a large enterprise is a complex task, and one that requires a lot of thought and planning. The purpose of this presentation is to help you make your Power BI deployment a success. After a quick Power BI overview, I’ll discuss deployment strategies, common usage scenarios, how to store and refresh data, prototyping options, how to share externally, and then finish with how to administer and secure Power BI. I’ll outline considerations and best practices for achieving an optimal, well-performing, enterprise-level Power BI deployment.
Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.
This presentation walks through the Security and Compliance functionality available to customers leveraging Azure as a compute environment. It includes deep-dive references to detailed information on each topic presented.
Cloud migrations are hardly one size fits all. It can be challenging to migrate from a large-scale data center to an optimized AWS environment without draining IT resources. By leveraging CSC, organizations are able to determine exactly what they need from their IT infrastructure and efficiently migrate to a customized cloud environment on AWS that meets those needs. With 400+ AWS certified architects and 30+ experts with AWS professional-level certification, CSC helps organizations experience seamless, results-oriented migrations. Register for the upcoming webinar to hear speakers from CSC and AWS discuss the ins and outs of a successful large-scale migration to AWS.
Join us to learn:
How CSC helped a large federal systems integration company migrate their workloads to the AWS Cloud in less than three months
How CSC has helped customers split from their shared IT environments in less than three months
The step-by-step process of an efficient data center migration
Who Should Attend:
IT Manager, IT Security Manager, Solution Architect, Cloud App Architect, System Administrator, IT Project Manager, Product Manager, Business Development
Cloud Migration Cookbook: A Guide To Moving Your Apps To The Cloud – New Relic
The process of building new apps or migrating existing apps to a cloud-based platform is complex. There are hundreds of paths you can take and only a few will make sense for you and your business. Get a step-by-step guide on how to plan for a successful app migration.
Data Lakehouse, Data Mesh, and Data Fabric (r1) – James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
In this step-by-step Power Apps beginner tutorial, you will learn all about the different app types in Power Apps: Canvas apps vs. model-driven apps vs. Power Apps portals. You will learn how to create your first Canvas app, model-driven app, and portal, and understand the differences between the app types (features, licensing, data sources, etc.) with demos of Power Apps and more.
Power Apps is a suite of apps, services, and connectors, as well as a data platform, that provides a rapid development environment to build custom apps for your business needs. Using Power Apps, you can quickly build custom business apps that connect to your data stored either in the underlying data platform (Microsoft Dataverse) or in various online and on-premises data sources (such as SharePoint, Microsoft 365, Dynamics 365, SQL Server, and so on).
In this video you will learn about:
✅ What is Power Apps?
✅ Different Types of Power Apps - Canvas Apps, Model Driven Apps and Portals
✅ When to use what?
✅ Feature Comparison - Licensing, External Access, Data Sources, etc.
✅ How to build a Canvas App?
✅ How to build a Model-driven App?
✅ How to build a Power Apps Portal?
Power BI Governance - Access Management, Recommendations and Best Practices – Learning SharePoint
This document outlines permissions management for Power BI workspaces and the features of the new Admin, Member, and Contributor roles. Recommendations and best practices for sharing reports are also included. Free to download.
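To make the role discussion concrete, here is a minimal sketch of the request body the Power BI REST API expects when granting one of these workspace roles (the "Groups - Add Group User" operation). The email address is a hypothetical placeholder, and sending the request would additionally require a group ID and an access token.

```python
import json

# Workspace roles accepted by the Power BI REST API's
# "Groups - Add Group User" operation.
VALID_ROLES = {"Admin", "Member", "Contributor", "Viewer"}

def add_user_payload(email: str, role: str) -> str:
    """Build the JSON body for POST /v1.0/myorg/groups/{groupId}/users."""
    if role not in VALID_ROLES:
        raise ValueError(f"unknown workspace role: {role}")
    return json.dumps({"emailAddress": email, "groupUserAccessRight": role})

print(add_user_payload("analyst@contoso.com", "Contributor"))
```

Validating the role name up front mirrors the governance advice in the document: grant Contributor or Viewer by default and reserve Admin for a small set of owners.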
Databricks is a Software-as-a-Service-like experience (or Spark-as-a-service): a tool for curating and processing massive amounts of data, developing, training, and deploying models on that data, and managing the whole workflow process throughout the project. It is for those who are comfortable with Apache Spark, as it is 100% based on Spark and is extensible with support for Scala, Java, R, and Python alongside Spark SQL, GraphX, Streaming, and the Machine Learning Library (MLlib). It has built-in integration with many data sources, has a workflow scheduler, allows for real-time workspace collaboration, and has performance improvements over traditional Apache Spark.
Migrating large fleets of legacy applications to AWS cloud infrastructure requires careful planning, since each phase needs to balance risk tolerance against the speed of migration.
Through participation in many large-scale migration engagements with customers, AWS Professional Services has developed a set of successful best practices, tools, and techniques that help migration factories optimize speed of delivery and success rate. In this session, we cover the complete lifecycle of an application portfolio migration with special emphasis on how to organize and conduct the assessment and how to identify elements that can benefit from cloud architecture.
Learn to Use Databricks for Data Science – Databricks
Data scientists face numerous challenges throughout the data science workflow that hinder productivity. As organizations continue to become more data-driven, a collaborative environment is more critical than ever — one that provides easier access and visibility into the data, reports and dashboards built against the data, reproducibility, and insights uncovered within the data. Join us to hear how Databricks’ open and collaborative platform simplifies data science by enabling you to run all types of analytics workloads, from data preparation to exploratory analysis and predictive analytics, at scale — all on one unified platform.
Building a Logical Data Fabric using Data Virtualization (ASEAN) – Denodo
Watch full webinar here: https://bit.ly/3FF1ubd
In the recent Building the Unified Data Warehouse and Data Lake report by leading industry analysts TDWI, 64% of organizations stated that the objective of a unified data warehouse and data lake is to get more business value, and 84% of organizations polled felt that a unified approach to data warehouses and data lakes was either extremely or moderately important.
In this session, you will learn how applying a logical data fabric and the associated technologies of machine learning, artificial intelligence, and data virtualization can reduce time to value, thereby increasing the overall business value of your data assets.
KEY TAKEAWAYS:
- How a Logical Data Fabric is the right approach to assist organizations to unify their data.
- The advanced features of a Logical Data Fabric that assist with the democratization of data, providing an agile and governed approach to business analytics and data science.
- How a Logical Data Fabric with Data Virtualization enhances your legacy data integration landscape to simplify data access and encourage self-service.
An introduction to the Snowflake data warehouse and its architecture for a big data company: centralized data management, Snowpipe and the COPY INTO command for data loading, and stream loading and batch processing.
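As a rough illustration of the loading commands mentioned above, the sketch below assembles the kind of COPY INTO statement Snowflake uses to batch-load staged files; Snowpipe runs essentially the same statement automatically as files arrive. The table name, stage name, and file format are hypothetical.

```python
# Build a Snowflake COPY INTO statement for loading files from a stage.
# The table name ("orders"), stage name ("my_stage"), and file format
# here are illustrative only.
def copy_into_sql(table: str, stage: str, file_format: str = "CSV") -> str:
    return (
        f"COPY INTO {table} "
        f"FROM @{stage} "
        f"FILE_FORMAT = (TYPE = '{file_format}')"
    )

print(copy_into_sql("orders", "my_stage"))
```

The batch path (run COPY INTO on a schedule) and the streaming path (let Snowpipe trigger it per file) differ mainly in who issues this statement and when.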
Microsoft PowerApps and Flow enable any Office 365 user to create mobile apps, electronic forms and workflows. These simple tools enable citizen developers to create business-focused apps that support business processes in the modern digital workplace.
A dive into Microsoft Fabric/AI Solutions offering. For the event: AI, Data, and CRM: Shaping Business through Unique Experiences. By D. Koutsanastasis, Microsoft
The data lake has become extremely popular, but there is still confusion on how it should be used. In this presentation I will cover common big data architectures that use the data lake, the characteristics and benefits of a data lake, and how it works in conjunction with a relational data warehouse. Then I’ll go into details on using Azure Data Lake Store Gen2 as your data lake, and various typical use cases of the data lake. As a bonus I’ll talk about how to organize a data lake and discuss the various products that can be used in a modern data warehouse.
These are slides from an introductory session for Microsoft Azure done at IIT Sri Lanka giving the students hands-on exposure to Microsoft Azure. Introducing them to Azure App Service and Azure Functions.
So you got a handle on what Big Data is and how you can use it to find business value in your data. Now you need an understanding of the Microsoft products that can be used to create a Big Data solution. Microsoft has many pieces of the puzzle, and in this presentation I will show how they fit together. How does Microsoft enhance and add value to Big Data? From collecting data, transforming it, storing it, to visualizing it, I will show you Microsoft’s solutions for every step of the way.
Think of big data as all data, no matter what the volume, velocity, or variety. The simple truth is a traditional on-prem data warehouse will not handle big data. So what is Microsoft’s strategy for building a big data solution? And why is it best to have this solution in the cloud? That is what this presentation will cover. Be prepared to discover all the various Microsoft technologies and products from collecting data, transforming it, storing it, to visualizing it. My goal is to help you not only understand each product but understand how they all fit together, so you can be the hero who builds your company’s big data solution.
Azure Data Explorer deep dive - review 04.2020 – Riccardo Zamana
A full review (April 2020) of the Azure Data Explorer service. This slide deck is a review of Kusto in terms of usage, ingestion techniques, querying and exporting data, and using anomaly detection and clustering methods.
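For a flavor of the anomaly-detection material the deck covers, the sketch below assembles a typical Kusto (KQL) query around the built-in series_decompose_anomalies() function. The table and column names ("Telemetry", "Latency", "Timestamp") are hypothetical.

```python
# Assemble a KQL query that builds an hourly time series and flags anomalies
# with Azure Data Explorer's built-in series_decompose_anomalies() function.
# Table and column names are hypothetical placeholders.
def anomaly_query(table: str, value_col: str, time_col: str = "Timestamp") -> str:
    return "\n".join([
        table,
        f"| make-series value = avg({value_col}) on {time_col} step 1h",
        "| extend anomalies = series_decompose_anomalies(value)",
    ])

print(anomaly_query("Telemetry", "Latency"))
```

Building the query as a pipeline of operators (source table, make-series, extend) is the idiomatic Kusto pattern the deck walks through.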
Differentiate Big Data vs Data Warehouse use cases for a cloud solution – James Serra
It can be quite challenging keeping up with the frequent updates to the Microsoft products and understanding all their use cases and how all the products fit together. In this session we will differentiate the use cases for each of the Microsoft services, explaining and demonstrating what is good and what isn't, in order for you to position, design and deliver the proper adoption use cases for each with your customers. We will cover a wide range of products such as Databricks, SQL Data Warehouse, HDInsight, Azure Data Lake Analytics, Azure Data Lake Store, Blob storage, and AAS as well as high-level concepts such as when to use a data lake. We will also review the most common reference architectures (“patterns”) witnessed in customer adoption.
Analytics in a Day Ft. Synapse Virtual Workshop – CCG
Say goodbye to data silos! Analytics in a Day will simplify and accelerate your journey towards the modern data warehouse. Join CCG and Microsoft for a half-day virtual workshop, hosted by James McAuliffe.
1 Introduction to Microsoft data platform analytics for release – Jen Stirrup
Part 1 of a conference workshop. This forms the morning session, which looks at moving from Business Intelligence to Analytics.
Topics Covered: Azure Data Explorer, Azure Data Factory, Azure Synapse Analytics, Event Hubs, HDInsight, Big Data
My slide deck about the Common Data Service and Model. This technology is under development, so content is subject to change; it is based on the current service as of 4/13/2018.
Today, data lakes are widely used and have become extremely affordable as data volumes have grown. However, they are only meant for storage and by themselves provide no direct value. With up to 80% of data stored in the data lake today, how do you unlock the value of the data lake? The value lies in the compute engine that runs on top of a data lake.
Join us for this webinar where Ahana co-founder and Chief Product Officer Dipti Borkar will discuss how to unlock the value of your data lake with the emerging Open Data Lake analytics architecture.
Dipti will cover:
-Open Data Lake analytics - what it is and what use cases it supports
-Why companies are moving to an open data lake analytics approach
-Why the open source data lake query engine Presto is critical to this approach
The cloud is all the rage. Does it live up to its hype? What are the benefits of the cloud? Join me as I discuss the reasons so many companies are moving to the cloud and demo how to get up and running with a VM (IaaS) and a database (PaaS) in Azure. See why the ability to scale easily, the quickness with which you can create a VM, and the built-in redundancy are just some of the reasons that make moving to the cloud a “no brainer”. And if you have an on-prem datacenter, learn how to get out of the air-conditioning business!
Choosing technologies for a big data solution in the cloud – James Serra
Has your company been building data warehouses for years using SQL Server? And are you now tasked with creating or moving your data warehouse to the cloud and modernizing it to support “Big Data”? What technologies and tools should you use? That is what this presentation will help you answer. First we will cover what questions to ask concerning data (type, size, frequency), reporting, performance needs, on-prem vs cloud, staff technology skills, OSS requirements, cost, and MDM needs. Then we will show you common big data architecture solutions and help you to answer questions such as: Where do I store the data? Should I use a data lake? Do I still need a cube? What about Hadoop/NoSQL? Do I need the power of MPP? Should I build a "logical data warehouse"? What is this lambda architecture? Can I use Hadoop for my DW? Finally, we’ll show some architectures of real-world customer big data solutions. Come to this session to get started down the path to making the proper technology choices in moving to the cloud.
Introduces Microsoft’s Data Platform for on-premises and cloud, and the challenges businesses are facing with data and sources of data. Understand the evolution of database systems in the modern world, what businesses are doing with their data, and what their new needs are with respect to changing industry landscapes.
Dive into the opportunities available for businesses and industry verticals: the ones that have already been identified and the ones that have not yet been explored.
Understand Microsoft’s cloud vision and what Microsoft’s Azure platform is offering, as Infrastructure as a Service or Platform as a Service, for you to build your own offerings.
Introduce and demo some of the real-world scenarios/case studies where businesses have used the cloud/Azure to create new and innovative solutions that unlock this potential.
We live in a world of unprecedented change. To be successful in this world of change, you will need to develop a data culture, creating an environment where every team and every individual is empowered to do great things because of the data at their fingertips. In this event you will learn how to create a culture of data and how the Microsoft Modern BI platform and tools can help you harness the power of data once reserved only for data scientists. Learn how to tap into the power of natural language, self-service business insights, and visualization capabilities – and make insights available to anyone, anywhere, at any time.
Data Lakehouse, Data Mesh, and Data Fabric (r2) – James Serra
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a modern data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. They all may sound great in theory, but I'll dig into the concerns you need to be aware of before taking the plunge. I’ll also include use cases so you can see what approach will work best for your big data needs. And I'll discuss Microsoft's version of the data mesh.
Data Warehousing Trends, Best Practices, and Future Outlook – James Serra
Over the last decade, the 3Vs of data – Volume, Velocity & Variety – have grown massively. The Big Data revolution has completely changed the way companies collect, analyze, and store data. Advancements in cloud-based data warehousing technologies have empowered companies to fully leverage big data without heavy investments in terms of both time and resources. But that doesn’t mean building and managing a cloud data warehouse isn’t accompanied by any challenges. From deciding on a service provider to the design architecture, deploying a data warehouse tailored to your business needs is a strenuous undertaking. Looking to deploy a data warehouse to scale your company’s data infrastructure, or still on the fence? In this presentation you will gain insights into the current data warehousing trends, best practices, and future outlook. Learn how to build your data warehouse with the help of real-life use cases and a discussion of commonly faced challenges. In this session you will learn:
- Choosing the best solution - Data Lake vs. Data Warehouse vs. Data Mart
- Choosing the best Data Warehouse design methodologies: Data Vault vs. Kimball vs. Inmon
- Step by step approach to building an effective data warehouse architecture
- Common reasons for the failure of data warehouse implementations and how to avoid them
Power BI has become a product with a ton of exciting features. This presentation will give an overview of some of them, including Power BI Desktop, Power BI service, what’s new, integration with other services, Power BI premium, and administration.
The breadth and depth of Azure products that fall under the AI and ML umbrella can be difficult to follow. In this presentation I’ll first define exactly what AI, ML, and deep learning are, and then go over the various Microsoft AI and ML products and their use cases.
Want to see a high-level overview of the products in the Microsoft data platform portfolio in Azure? I’ll cover products in the categories of OLTP, OLAP, data warehouse, storage, data transport, data prep, data lake, IaaS, PaaS, SMP/MPP, NoSQL, Hadoop, open source, reporting, machine learning, and AI. It’s a lot to digest but I’ll categorize the products and discuss their use cases to help you narrow down the best products for the solution you want to build.
Embarking on building a modern data warehouse in the cloud can be an overwhelming experience due to the sheer number of products that can be used, especially when the use cases for many products overlap others. In this talk I will cover the use cases of many of the Microsoft products that you can use when building a modern data warehouse, broken down into four areas: ingest, store, prep, and model & serve. It’s a complicated story that I will try to simplify, giving blunt opinions of when to use what products and the pros/cons of each.
AI for an intelligent cloud and intelligent edge: Discover, deploy, and manag... (James Serra)
Discover, manage, deploy, monitor – rinse and repeat. In this session we show how Azure Machine Learning can be used to create the right AI model for your challenge and then easily customize it using your development tools while relying on Azure ML to optimize them to run in hardware accelerated environments for the cloud and the edge using FPGAs and Neural Network accelerators. We then show you how to deploy the model to highly scalable web services and nimble edge applications that Azure can manage and monitor for you. Finally, we illustrate how you can leverage the model telemetry to retrain and improve your content.
Power BI for Big Data and the New Look of Big Data Solutions (James Serra)
New features in Power BI give it enterprise tools, but that does not mean it automatically creates an enterprise solution. In this talk we will cover these new features (composite models, aggregation tables, dataflows) as well as Azure Data Lake Store Gen2, and describe the use cases and products of an individual, departmental, and enterprise big data solution. We will also talk about why a data warehouse and cubes should still be part of an enterprise solution, and how a data lake should be organized.
In three years I went from a complete unknown to a popular blogger, speaker at PASS Summit, a SQL Server MVP, and then joined Microsoft. Along the way I saw my yearly income triple. Is it because I know some secret? Is it because I am a genius? No! It is just about laying out your career path, setting goals, and doing the work.
I'll cover tips I learned over my career on everything from interviewing to building your personal brand. I'll discuss perm positions, consulting, contracting, working for Microsoft or partners, hot fields, in-demand skills, social media, networking, presenting, blogging, salary negotiating, dealing with recruiters, certifications, speaking at major conferences, resume tips, and keys to a high-paying career.
Your first step to enhancing your career will be to attend this session! Let me be your career coach!
Is the traditional data warehouse dead? (James Serra)
With new technologies such as Hive LLAP or Spark SQL, do I still need a data warehouse, or can I just put everything in a data lake and report off of that? No! In this presentation I’ll discuss why you still need a relational data warehouse and how to use a data lake and an RDBMS data warehouse to get the best of both worlds. I will go into detail on the characteristics of a data lake and its benefits, and why you still need data governance tasks in a data lake. I’ll also discuss using Hadoop as the data lake, data virtualization, and the need for OLAP in a big data solution. And I’ll put it all together by showing common big data architectures.
Azure SQL Database Managed Instance is a new flavor of Azure SQL Database that is a game changer. It offers near-complete SQL Server compatibility and network isolation to easily lift and shift databases to Azure (you can literally back up an on-premises database and restore it into an Azure SQL Database Managed Instance). Think of it as an enhancement to Azure SQL Database that is built on the same PaaS infrastructure and maintains all its features (i.e., active geo-replication, high availability, automatic backups, database advisor, threat detection, intelligent insights, vulnerability assessment, etc.) but adds support for databases up to 35TB, VNET, SQL Agent, cross-database querying, replication, etc. So, you can migrate your databases from on-prem to Azure with very little migration effort, which is a big improvement over the current Singleton or Elastic Pool flavors, which can require substantial changes.
Microsoft Data Platform - What's included (James Serra)
The pace of Microsoft product innovation is so fast that even though I spend half my days learning, I struggle to keep up. And as I work with customers, I find they are often in the dark about many of the products we have, since they are focused on just keeping what they have running and putting out fires. So, let me cover what products you might have missed in the Microsoft data platform world. Be prepared to discover all the various Microsoft technologies and products for collecting data, transforming it, storing it, and visualizing it. My goal is to help you not only understand each product but understand how they all fit together and their proper use cases, allowing you to build the appropriate solution that can incorporate any data in the future no matter the size, frequency, or type. Along the way we will touch on technologies covering NoSQL, Hadoop, and open source.
Learning to present and becoming good at it (James Serra)
Have you been thinking about presenting at a user group? Are you being asked to present at your work? Is learning to present one of the keys to advancing your career? Or do you just think it would be fun to present but you are too nervous to try it? Well, take the first step to becoming a presenter by attending this session, and I will guide you through the process of learning to present and becoming good at it. It’s easier than you think! I am an introvert and was deathly afraid to speak in public. Now I love to present, and it’s actually my main function in my job at Microsoft. I’ll share with you the journey that led me to speak at major conferences and the skills I learned along the way to become a good presenter and to get rid of the fear. You can do it!
DocumentDB is a powerful NoSQL solution. It provides elastic scale, high performance, global distribution, a flexible data model, and is fully managed. If you are looking for a scaled OLTP solution that is too much for SQL Server to handle (i.e. millions of transactions per second) and/or will be using JSON documents, DocumentDB is the answer.
First introduced with the Analytics Platform System (APS), PolyBase simplifies management and querying of both relational and non-relational data using T-SQL. It is now available in both Azure SQL Data Warehouse and SQL Server 2016. The major features of PolyBase include the ability to do ad-hoc queries on Hadoop data and the ability to import data from Hadoop and Azure blob storage to SQL Server for persistent storage. A major part of the presentation will be a demo on querying and creating data on HDFS (using Azure Blobs). Come see why PolyBase is the “glue” to creating federated data warehouse solutions where you can query data as it sits instead of having to move it all to one data platform.
Machine learning allows us to build predictive analytics solutions of tomorrow - these solutions allow us to better diagnose and treat patients, correctly recommend interesting books or movies, and even make the self-driving car a reality. Microsoft Azure Machine Learning (Azure ML) is a fully-managed Platform-as-a-Service (PaaS) for building these predictive analytics solutions. It is very easy to build solutions with it, helping to overcome the challenges most businesses have in deploying and using machine learning. In this presentation, we will take a look at how to create ML models with Azure ML Studio and deploy those models to production in minutes.
Big data architectures and the data lake (James Serra)
With so many new technologies it can get confusing on the best approach to building a big data architecture. The data lake is a great new concept, usually built in Hadoop, but what exactly is it and how does it fit in? In this presentation I'll discuss the four most common patterns in big data production implementations, the top-down vs. bottom-up approach to analytics, and how you can use a data lake and an RDBMS data warehouse together. We will go into detail on the characteristics of a data lake and its benefits, and how you still need to perform the same data governance tasks in a data lake as you do in a data warehouse. Come to this presentation to make sure your data lake does not turn into a data swamp!
Neuro-symbolic is not enough, we need neuro-*semantic* (Frank van Harmelen)
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
The Art of the Pitch: WordPress Relationships and Sales (Laura Byrne)
Clients don’t know what they don’t know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients’ needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
Securing your Kubernetes cluster: a step-by-step guide to success! (KatiaHIMEUR1)
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
JMeter webinar - integration with InfluxDB and Grafana (RTTS)
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Connector Corner: Automate dynamic content and events by pushing a button (DianaGray10)
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... (DanBrown980551)
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality (Inflectra)
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... (UiPathCommunity)
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
GraphRAG is All You need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
1. Microsoft Fabric
A unified analytics solution for the era of AI
James Serra
Industry Advisor
Microsoft, Federal Civilian
jamesserra3@gmail.com
6/16/23
2. About Me
Microsoft, Data & AI Solution Architect in Microsoft Federal Civilian
At Microsoft for most of the last nine years as a Data & AI Architect, with a brief stop at EY
In IT for 35 years; worked on many BI and DW projects
Worked as desktop/web/database developer, DBA, BI and DW architect and developer, MDM architect, PDW/APS developer
Been a perm employee, contractor, consultant, business owner
Presenter at PASS Summit, SQLBits, Enterprise Data World conference, Big Data Conference Europe, SQL Saturdays, Informatica World
Blog at JamesSerra.com
Former SQL Server MVP
Author of the book “Deciphering Data Architectures: Choosing Between a Modern Data Warehouse, Data Fabric, Data Lakehouse, and Data Mesh”
3. My upcoming book
Table of contents:
- Foundation
  - Big data
  - Types of data architectures
  - Architecture Design Session
- Common data architecture concepts
  - Relational Data Warehouse
  - Data Lake
  - Approaches to Data Stores
  - Approaches to Design
  - Approaches to Data Modeling
  - Approaches to Data Ingestion
- Data Architectures
  - Modern Data Warehouse (MDW)
  - Data Fabric
  - Data Lakehouse
  - Data Mesh Foundation
  - Data Mesh Adoption
- People, Process, and Technology
  - People and process
  - Technologies
  - Data architectures on Microsoft Azure
First two chapters available now: Deciphering Data Architectures (oreilly.com)
4. Agenda
What is Microsoft Fabric?
Workspaces and capacities
OneLake
Lakehouse
Data Warehouse
ADF
Power BI / DirectLake
Resources
Not covered:
Real-time analytics
Spark
Data science
Fabric capacities
Billing / Pricing
Reflex / Data Activator
Git integration
Admin monitoring
Purview integration
Data mesh
Copilot
5. Microsoft Fabric does it all—in a unified solution
An end-to-end analytics platform that brings together all the data and analytics tools that organizations need to go from the data lake to the business user
Workloads:
- Data Integration (Data Factory)
- Data Engineering (Synapse)
- Data Warehouse (Synapse)
- Data Science (Synapse)
- Real Time Analytics (Synapse)
- Business Intelligence (Power BI)
Unified: SaaS product experience; unified data foundation (OneLake); observability (Data Activator); security and governance; compute and storage; business model
6. Microsoft Fabric: the data platform for the era of AI
A single: onboarding and trials experience, sign-on, navigation model, UX model, workspace organization, collaboration experience, data lake, storage format, data copy for all engines, security model, CI/CD, monitoring hub, data hub, and governance & compliance
The intelligent data foundation: AI assisted, shared workspaces, universal compute capacities, OneSecurity, OneLake
Workloads: Data Factory, Synapse Data Engineering, Synapse Data Science, Synapse Data Warehousing, Synapse Real Time Analytics, Power BI, Data Activator
7. SaaS: success by default
- Frictionless onboarding
- Quick results with intuitive UX
- Minimal knobs
- Auto optimized
- Auto integrated
- Tenant-wide governance
- Instant provisioning
- Centralized security management
- Compliance built-in
- Centralized administration
- 5x5: 5 seconds to signup, 5 minutes to wow
9. Understanding Microsoft Fabric / FAQ
• Think of it as taking the Power BI workspace and adding a SaaS version of Synapse to it
• You will wake up one day and Power BI workspaces will be automatically migrated to Fabric workspaces: Power BI capacities will become Fabric capacities, and your Power BI tenant will have the Fabric workloads automatically built-in
• Aligned to backend Fabric capacity, similar to Power BI capacity: a specific amount of compute assigned to it. A universal bucket of compute; no more Synapse DWUs, Spark clusters, etc.
• Serverless pool and dedicated pool are combined into one: no more relational storage or dedicated resources. Everything is serverless. It is all about the data lakehouse
• No Azure portal, subscriptions, or creating storage; users won’t even realize they are using Azure
• Fabric has a strong separation between the person who buys and pays the bill and the person who builds. In Azure, the person building the solution also has to have the power to buy
• This is not just for departmental use, and it’s not PaaS services (i.e., Synapse) vs. Fabric. Fabric is the future. Fabric is going to run your entire data estate: departmental projects as well as the largest data warehouses, data lakehouses, and data science projects
• One platform for the enterprise data professional and the citizen developer (next slide)
10. One platform for the enterprise data professional and citizen developer
Data Scientists:
• Quickly tune a custom model by integrating a model built and trained in Azure ML in a Spark notebook
• Work faster with the ability to use your preferred data science frameworks, languages, and tools
• Bypass engineering dependencies with the ability to use your preferred no-code ML Ops to deploy and operate models in production
• Tap into proven-at-scale models and services to accelerate your AI differentiation (AOAI, Cognitive Services, ONNX integration, etc.)
Data Analysts:
• Avoid slow, progress-stagnating data wrangling by seamlessly triggering a workflow that can unlock data engineering tools and capabilities quickly
• Accelerate your work with visual and SQL-based tools for self-serve data transformations and modeling, as well as self-serve tools for reporting, dashboards, and data visualizations
Data Citizens:
• Turn data into impact with industry-leading BI tools and integration with the apps your people use every day, like Microsoft 365
• Make more data-driven decisions with actionable insights and intelligence in your preferred applications
• Maintain access to all the data you need, without being overwhelmed by data ancillary to your role, thanks to fine-grain data access management controls
Data Engineers:
• Execute faster with the ability to spin up a Spark VM cluster in seconds, or configure with familiar experiences like Git DevOps pipelines for data engineering artifacts
• Streamline your work with a single platform to build and operate real-time analytics pipelines, data lakes, lakehouses, warehouses, marts, and cubes using your preferred IDE, plug-ins, and tools
• Reduce costly data replication and movement with the ability to produce base datasets that can serve data analysts and data scientists without needing to build pipelines
Data Stewards:
• Maintain visibility and control of costs with a unified consumption and cost model that provides evergreen spend optics on your end-to-end data estate
• Gain full visibility and governance over your entire analytics estate, from data sources and connections to your data lake, to users and their insights
Supporting experiences: Data Factory, Data Engineering, Data Warehouse, Real-time Analytics, Data Science, Azure ML, Power BI, Microsoft 365. Data is served via a warehouse or lakehouse, as transformed data, or as insights via embedding.
13. Create fabric capacity
Capacity is a dedicated set of resources reserved for exclusive use. It offers dependable, consistent performance for your content. Each capacity offers a selection of SKUs, and each SKU provides different resource tiers for memory and computing power. You pay for the provisioned capacity whether you use it or not.
A capacity is a quota-based system, and scaling a capacity up or down doesn't involve provisioning compute or moving data, so it's instant.
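To make the SKU-to-resources relationship concrete, here is a minimal sketch, assuming the publicly documented Fabric F SKU naming (F2 through F2048, where the number in the SKU name corresponds to its capacity units, or CUs). The helper is illustrative only, not an official API.

```python
# Illustrative sketch: Fabric F SKUs encode their capacity units (CUs) in the
# name, e.g. F64 = 64 CUs. This helper picks the smallest SKU that covers a
# given CU requirement. SKU list per public documentation; not an official API.
F_SKUS = [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]

def smallest_sku(required_cus: int) -> str:
    """Return the smallest F SKU whose capacity units cover the requirement."""
    for cus in F_SKUS:
        if cus >= required_cus:
            return f"F{cus}"
    raise ValueError("requirement exceeds the largest available SKU")

print(smallest_sku(50))  # F64
```

Because a capacity is quota-based, switching between these SKUs is a metadata change, which is why resizing is effectively instant.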
14. Create fabric capacity
Once the capacity is created, we can see it on the Admin portal's Capacity Settings pane, under the "Fabric Capacity" tab.
19. OneLake for all data
“The OneDrive for data”
- A single unified logical SaaS data lake for the whole organization (no silos)
- Organize data into domains
- Foundation for all Fabric data items
- Provides full and open access through industry standard APIs and formats to any application (no lock-in)
OneLake: One Copy, One Security, OneLake Data Hub, intelligent data fabric
Workloads on OneLake: Data Factory, Synapse Data Warehousing, Synapse Data Engineering, Synapse Data Science, Synapse Real Time Analytics, Power BI, Data Activator
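The "open access through industry standard APIs" point can be illustrated with OneLake's ADLS Gen2-compatible addressing. The sketch below builds an `abfss://` URI following the documented convention (`abfss://<workspace>@onelake.dfs.fabric.microsoft.com/<item>.<type>/<path>`); the workspace and item names used are hypothetical examples.

```python
# Hedged sketch of OneLake's ADLS Gen2-compatible path convention. Any tool
# that speaks the ADLS Gen2 API can address OneLake data this way; the
# workspace/item names below are made up for illustration.

def onelake_uri(workspace: str, item: str, item_type: str, path: str) -> str:
    """Build an abfss:// URI for a file or folder inside a Fabric item."""
    return (f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
            f"{item}.{item_type}/{path}")

uri = onelake_uri("Sales", "Customer360", "Lakehouse", "Files/raw/orders.csv")
print(uri)
# abfss://Sales@onelake.dfs.fabric.microsoft.com/Customer360.Lakehouse/Files/raw/orders.csv
```

This is what "no lock-in" means in practice: the same path works from Spark, Azure Storage Explorer, or any ADLS Gen2-aware client.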
20. One Copy for all computes
Real separation of compute and storage
- No matter which engine or item you use, everyone contributes to building the same lake. Engines are being optimized to work with Delta Parquet as their native format
- Compute powers the applications and experiences in Fabric. The compute is separate from the storage
- Multiple compute engines are available, and all engines can access the same data without needing to import or export it. You are able to choose the right engine for the right job
- Non-Fabric engines can also read/write to the same copy of data using the ADLS APIs, or be added through shortcuts
(Diagram: under unified management and governance, Workspace A (Warehouse “Finance”, Lakehouse “Customer 360”) and Workspace B (Lakehouse “Service telemetry”, Warehouse “Business KPIs”) are all accessed by the same engines: T-SQL, Spark, Analysis Services, and KQL)
21. Shortcuts virtualize data across domains and clouds
No data movement or duplication
- A shortcut is a symbolic link which points from one data location to another
- Create a shortcut to make data from a warehouse part of your lakehouse
- Create a shortcut within Fabric to consolidate data across items or workspaces without changing the ownership of the data. Data can be reused multiple times without data duplication
- Existing ADLS Gen2 storage accounts and Amazon S3 buckets can be managed externally to Fabric and Microsoft while still being virtualized into OneLake with shortcuts
- All data is mapped to a unified namespace and can be accessed using the same APIs, including the ADLS Gen2 DFS APIs
(Diagram: the same workspaces and engines as the previous slide, with shortcuts reaching into Azure and Amazon storage)
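The symbolic-link behavior of shortcuts can be sketched with a toy resolver. This is purely conceptual (Fabric resolves shortcuts internally); the shortcut paths, target locations, and bucket name below are all hypothetical.

```python
# Conceptual sketch only: a shortcut points from one data location to another,
# inside OneLake or out to external storage (ADLS Gen2, S3). Reads through the
# shortcut land on the single real copy of the data. All names are made up.
SHORTCUTS = {
    # shortcut path in the consuming item      ->  actual data location
    "Sales.Lakehouse/Tables/Finance": "Finance.Warehouse/Tables/Finance",
    "Sales.Lakehouse/Files/clickstream": "s3://example-bucket/clickstream",
}

def resolve(path: str) -> str:
    """Rewrite a path through any shortcut prefix it falls under."""
    for shortcut, target in SHORTCUTS.items():
        if path == shortcut or path.startswith(shortcut + "/"):
            return target + path[len(shortcut):]
    return path  # not under a shortcut: the path already points at the data

print(resolve("Sales.Lakehouse/Files/clickstream/2024/05/events.json"))
# s3://example-bucket/clickstream/2024/05/events.json
```

The key property the sketch shows: consumers address data by its shortcut path, while ownership and physical storage stay where they are.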
23. OneLake Data Hub
Discover, manage and use data in one place
- Central location within Fabric to discover, manage, and reuse data
- Data can be easily discovered by its domain (e.g. Finance) so users can see what matters to them
- Explorer capability to easily browse and find data by its folder (workspace) hierarchy
- Efficient data discovery using search, filter and sort
26. Lakehouse – Lakehouse mode
Tables - This is a virtual view of the managed area in your lake. It is the main container to host tables of all types (CSV, Parquet, Delta, managed tables, and external tables). All tables, whether automatically or explicitly created, will show up as a table under the managed area of the Lakehouse. This area can also include any types of files or folder/subfolder organizations.
Files - This is a virtual view of the unmanaged area in your lake. It can contain any files and folder/subfolder structure. The main distinction between the managed area and the unmanaged area is the automatic delta table detection process, which runs over any folders created in the managed area. Any delta format files (parquet + transaction log) will be automatically registered as a table and will also be available from the serving layer (T-SQL).
Automatic Table Discovery and Registration
Lakehouse automatic table discovery and registration is a feature that provides a fully managed file-to-table experience for data engineers and data scientists. Users can drop a file into the managed area of the lakehouse, and the file will be automatically validated for supported structured formats (currently only Delta tables) and registered into the metastore with the necessary metadata, such as column names, formats, compression, and more. Users can then reference the file as a table and use Spark SQL syntax to interact with the data.
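The detection rule described above can be sketched as a tiny directory scan, assuming only the publicly described behavior (a folder in the managed area containing a Delta transaction log, i.e. a `_delta_log` subfolder, is treated as a table). This is an illustration of the rule, not Fabric's actual implementation.

```python
# Minimal sketch of automatic table discovery: any folder in the managed
# ("Tables") area that contains a `_delta_log` subfolder looks like a Delta
# table and would be registered in the metastore. Illustrative only.
import os
import tempfile

def discover_tables(managed_area: str) -> list[str]:
    """Return the names of folders that look like Delta tables."""
    tables = []
    for name in sorted(os.listdir(managed_area)):
        folder = os.path.join(managed_area, name)
        if os.path.isdir(os.path.join(folder, "_delta_log")):
            tables.append(name)
    return tables

# Fake a managed area with one Delta table and one plain folder.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "orders", "_delta_log"))
os.makedirs(os.path.join(root, "scratch"))
print(discover_tables(root))  # ['orders']
```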
29. Workspaces and capacities accessing OneLake
Each tenant will have only one OneLake, and any tenant can access files in a OneLake from other tenants via shortcuts.
(Diagram: a Lakehouse named “Sales”)
32. Data warehouse
Flow: Data Source (shortcut enabled; structured/unstructured) → Ingestion (mounts, pipelines & dataflows) → Store (Data Warehouse) → Transform (procedures) → Expose (Power BI)
33. Synapse Data Warehouse
Infinitely scalable and open: Synapse Data Warehouse in Fabric pairs infinite serverless compute and a relational engine with an open storage format in a customer-owned data lake.
(1) An open standard format in an open data lake replaces proprietary formats as the native storage:
• First transactional data warehouse natively embracing an open standard format
• Data is stored in Delta Parquet with no vendor lock-in
• Auto-integrated and auto-optimized with minimal knobs
• Extends full SQL ecosystem benefits
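The "no vendor lock-in" claim rests on the Delta format being openly specified: a table is plain Parquet files plus a JSON transaction log that any engine can replay. The sketch below replays `add`/`remove` actions from one hand-built commit; the real Delta protocol has additional action types (metaData, protocol, commitInfo), so treat this as a simplified illustration, not a full reader.

```python
# Simplified sketch of reading a Delta transaction log: each commit is
# newline-delimited JSON of actions; replaying add/remove yields the set of
# Parquet files that make up the current table. File names are made up.
import json

commit = "\n".join([
    json.dumps({"add": {"path": "part-0001.parquet"}}),
    json.dumps({"add": {"path": "part-0002.parquet"}}),
    json.dumps({"remove": {"path": "part-0001.parquet"}}),
])

def active_files(commit_json_lines: str) -> set[str]:
    """Replay add/remove actions to find the table's live data files."""
    files: set[str] = set()
    for line in commit_json_lines.splitlines():
        action = json.loads(line)
        if "add" in action:
            files.add(action["add"]["path"])
        elif "remove" in action:
            files.discard(action["remove"]["path"])
    return files

print(sorted(active_files(commit)))  # ['part-0002.parquet']
```

Because the log and the data files are open formats, any Fabric engine (or a non-Fabric engine) can reconstruct the same table state from the same single copy.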
34. Synapse Data Warehouse (continued)
(2) Dedicated clusters are replaced by serverless compute infrastructure:
• Physical compute resources assigned to jobs within milliseconds
• Infinite scaling with dynamic resource allocation tailored to data volume and query complexity
• Instant scaling up/down with no physical provisioning involved
• Resource pooling providing significant efficiencies and pricing
36. Workspaces and capacities accessing OneLake
Each tenant will have only one OneLake, and any tenant can access files in a OneLake from other tenants via shortcuts.
(Diagram: a Warehouse named “Sales”)
37. Data Warehouse
Use this to build a relational layer on top of the physical data in the Lakehouse and expose it to analysis and reporting tools using a T-SQL/TDS endpoint.
This offers a transactional data warehouse with T-SQL DML support, stored procedures, tables, and views.
How can I control "bad actor" queries?
Fabric compute is designed to automatically classify queries to allocate resources and ensure high-priority queries (e.g., ETL, data preparation, and reporting) are not impacted by potentially poorly written ad hoc queries.
How is the classification for an incoming query determined?
Queries are intelligently classified by a combination of the source (e.g., pipeline vs. Power BI) and the query type (e.g., INSERT vs. SELECT).
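Fabric's actual classifier is internal to the service; the toy sketch below only illustrates the idea of routing queries to workload groups by source and statement type. The group names and rules are hypothetical, not Fabric's real policy.

```python
# Toy sketch of source + statement-type query classification (not Fabric's
# real implementation): route each query to a named workload group so
# high-priority work is isolated from ad hoc queries.
def classify_query(source: str, statement: str) -> str:
    """Return a workload group name for an incoming query."""
    keyword = statement.lstrip().split(None, 1)[0].upper()  # first SQL keyword
    is_write = keyword in {"INSERT", "UPDATE", "DELETE", "MERGE", "COPY"}
    if source in {"pipeline", "dataflow"} and is_write:
        return "etl"        # data preparation gets its own resources
    if source == "powerbi":
        return "reporting"  # dashboard queries stay responsive
    return "adhoc"          # potentially poorly written user queries

print(classify_query("pipeline", "INSERT INTO sales VALUES (1)"))  # etl
print(classify_query("powerbi", "SELECT * FROM sales"))            # reporting
print(classify_query("ssms", "SELECT * FROM sales CROSS JOIN t"))  # adhoc
```

The point of the design is that the engine, not the user, decides resource allocation, so a runaway ad hoc query lands in its own group instead of starving ETL or reports.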
Where is the physical storage for the Data Warehouse?
All data for Fabric is stored in OneLake in the open Delta format. A single copy of the data is therefore exposed to all the compute engines of Fabric without needing to move or duplicate data.
41. Why two options?
Delta Lake shortcomings:
- No multi-table transactions
- Lack of full T-SQL support (no updates, limited reads)
- Performance problems for trickle transactions
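The first shortcoming, multi-table transactions, is the key guarantee the Warehouse's SQL engine adds on top of plain Delta files. The sketch below illustrates the concept using the stdlib `sqlite3` engine rather than Fabric itself: two inserts into different tables either both commit or both roll back.

```python
# Illustration of a multi-table transaction (all-or-nothing across tables),
# sketched with stdlib sqlite3 -- this demonstrates the concept, not Fabric.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
con.execute("CREATE TABLE audit (order_id INTEGER, note TEXT)")

try:
    with con:  # one transaction spanning two tables
        con.execute("INSERT INTO orders VALUES (1, 99.0)")
        con.execute("INSERT INTO audit VALUES (1, 'created')")
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass  # the context manager rolled back both inserts

# Neither insert survived: the two tables stay consistent with each other.
print(con.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 0
print(con.execute("SELECT COUNT(*) FROM audit").fetchone()[0])   # 0
```

With raw Delta files, each table's log commits independently, so a failure between the two writes could leave an order without its audit row; that is why the Warehouse routes all writes through its SQL engine.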
46. ADF Review
ADF pipelines → Fabric Data Pipelines
ADF mapping data flows → don't exist in Fabric
ADF wrangling data flows (Power Query) → Dataflow Gen2 (today's Power BI dataflows are Dataflow Gen1)
47. Data Factory in Fabric
What is Dataflows Gen2?
Dataflows Gen2 is the new generation of Dataflows. Dataflows provide a low-code interface for ingesting data from hundreds of data sources, transforming your data using 300+ data transformations, and loading the resulting data into multiple destinations such as Azure SQL Database, Lakehouse, and more.
We currently have multiple Dataflows experiences: Power BI Dataflows Gen1, Power Query dataflows, and ADF data flows. What is the strategy for these various experiences in Fabric?
Our goal is to evolve over time toward a single Dataflow that combines the ease of use of Power BI and Power Query with the scale of ADF.
What are Fabric pipelines?
Fabric pipelines enable powerful workflow capabilities at cloud scale. With data pipelines, you can build complex workflows that can refresh your dataflow, move petabyte-scale data, and define sophisticated control flow. Use data pipelines to build complex ETL and Data Factory workflows that perform a number of different tasks at scale. Control-flow capabilities are built into pipelines, allowing you to build workflow logic with loops and conditionals.
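The control-flow ideas above (refresh a dataflow, then loop over items with a conditional branch) can be sketched in plain Python. The activity names and table list here are hypothetical; a real Fabric pipeline expresses the same logic with ForEach, If Condition, and Copy activities.

```python
# Hedged sketch of pipeline control flow: a dataflow refresh followed by a
# ForEach-style loop with an If-style branch. Not a real Fabric API.
def run_pipeline(tables, refresh_dataflow, copy_table):
    refresh_dataflow()                 # e.g. refresh a Dataflow Gen2 first
    copied = []
    for t in tables:                   # ForEach activity over a list of items
        if t.get("enabled", True):     # If Condition activity per item
            copy_table(t["name"])      # Copy activity for one table
            copied.append(t["name"])
    return copied

log = []
result = run_pipeline(
    [{"name": "sales"}, {"name": "staging", "enabled": False}, {"name": "customers"}],
    refresh_dataflow=lambda: log.append("refreshed"),
    copy_table=lambda name: log.append(f"copied {name}"),
)
print(result)  # ['sales', 'customers']
```

The shape matters more than the code: orchestration (ordering, looping, branching) lives in the pipeline, while the per-item data movement is delegated to activities.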
50. For best performance, compress the data using the V-Order compression method (50%-70% more compression). Data written by ADF is stored this way by default.
51. Should I use Fabric now?
Yes, for prototyping
Yes, if you won't be in production for several months
You have to be OK with bugs, missing features, and possible performance issues
Don't use it if you have hundreds of terabytes of data
52. If building in Synapse, how do you make the transition to Fabric smooth?
Do not use dedicated pools, unless needed for serving and performance
Don't use any stored procedures to modify data in dedicated pools
Use ADF for pipelines and for Power Query, and don't use ADF mapping data flows. Don't use Synapse pipelines or mapping data flows
Embrace the data lakehouse architecture
53. Resources
Microsoft Fabric webinar series: https://aka.ms/fabric-webinar-series
New documentation: https://aka.ms/fabric-docs. Check out the tutorials.
Data Mesh, Data Fabric, Data Lakehouse – (video from Toronto Data Professional Community on 2/15/23)
Build videos:
Build 2-day demos
Microsoft Fabric Synapse data warehouse, Q&A
My intro blog on Microsoft Fabric (with helpful links at the bottom)
Fabric notes
Advancing Analytics videos
Ask me Anything (AMA) about Microsoft Fabric!
54. Q & A ?
James Serra, Microsoft, Industry Advisor
Email me at: jamesserra3@gmail.com
Follow me at: @JamesSerra
Link to me at: www.linkedin.com/in/JamesSerra
Visit my blog at: JamesSerra.com
Editor's Notes
Abstract
Microsoft Fabric
Microsoft Fabric is the next version of Azure Data Factory, Azure Data Explorer, Azure Synapse Analytics, and Power BI. It brings all of these capabilities together into a single unified analytics platform that goes from the data lake to the business user in a SaaS-like environment. Therefore, the vision of Fabric is to be a one-stop shop for all the analytical needs of every enterprise. Fabric will cover the complete spectrum of services including data movement, data lake, data engineering, data integration and data science, observational analytics, and business intelligence. With Fabric, there is no need to stitch together different services from multiple vendors. Instead, the customer enjoys an end-to-end, highly integrated, single offering that is easy to understand, onboard, create, and operate.
This is a hugely important new product that I have spent a ton of hours understanding and simplifying into a deck and demo that I will present to you. This will shortcut your time to upskill on it so you are prepared to answer customer questions. My presentation comes from the angle that you are in the field and are familiar with Azure Synapse and want to know how this differs.
-----------------------
May public preview, Microsoft build
GA by end-of-year
MVP for GA, incremental updates, release features over time
TODO:
can you have multiple workspaces per capacity? Show diagram with workspaces all pointing to the same warehouse
dataflow
how do we do snowflake cloning
How will CDC work with no data flows
how do we talk to customers about waiting for GA for Fabric when they need to do something now, go Lakehouse route in Synapse
slide on mounting a dedicated pool
Architecture diagrams on how things are done in Fabric / Use Cases
This is giving new power to pbi users
enterprise solution vs department wide solution slide https://www.jamesserra.com/archive/2022/06/power-bi-as-a-enterprise-solution/
Slide with Synapse missing items
Will copilot be added?
Highlight no more dedicated pools - use serverless
PBI capacities - how to delegate
get email about behind the scenes warehouse
PBI desktop vs Fabric – what features are not yet in Fabric as modeling is there now
Schema drift
Failover
TODO specialist:
all or nothing with users
who talks about it?
Segment out Synapse - I want to build something now, what product do I use? Don't use dedicated pools, but will have a mounting option. Use ADF instead of Synapse pipelines
Missing features in Synapse and what is better
Snowflake compete slides
link to S3
Pricing
Position Synapse today to move seamlessly into Fabric - migration path
Purview not in here
What about synapse database templates
Demo built?
Ability to request a demo/presentation from PM/GBB who have been doing it
Fluff, but point is I bring real work experience to the session
Microsoft Fabric combines Data Factory, Synapse Analytics, Data Explorer, and Power BI into a single, unified experience on the cloud. The open and governed data lakehouse foundation is a cost-effective and performance-optimized fabric for business intelligence, machine learning, and AI workloads at any scale. It is the foundation for migrating and modernizing existing analytics solutions, whether data appliances or traditional data warehouses.
Talk Track for Greenfield/Growing Analytics Customers – Microsoft Fabric’s SaaS environment makes it easier to deploy an entire end-to-end analytics engine from the ground up at an accelerated pace. With the solution’s built-in security and governance capabilities, you can rest assured your data and insights are protected.
Talk Track for Existing Synapse Customers – Microsoft Fabric is an evolution of Azure Synapse. You will still be able to enjoy the benefits and limitless scale of Synapse in an easier to use SaaS solution while adopting new capabilities that enhance your entire analytics approach. And with the addition of Power BI, you can help democratize the ability to uncover insights and create interactive reports across the organization, helping everyone make more data-driven decisions in their everyday work.
Talk Track for Existing Power BI Customers – With Microsoft Fabric, you’ll be able to access new and powerful data tools and services like Azure Synapse within the same user experience you already enjoy with Power BI. You can unify these tools with your disparate data sources in the same environment to establish a single source of truth for all data, driving the ability for everyone to uncover more accurate and consistent insights than before. And instead of having to worry about the security concerns of a patchwork analytics estate, you can rest assured your data is protected with the built-in security and governance capabilities.
It is not possible to share capacities across tenants
https://learn.microsoft.com/en-us/fabric/enterprise/licenses
PBI v-cores will evolve to Compute Units (group of 8 v-cores)
Public preview available March 23rd. Switch in admin console to run the functionality on/off completely. It will be off by default until July 1st, when it will be switched on unless they go into the admin console and say “No, I don’t want to have this switched on starting July 1st”. Can control at the tenant and capacity levels.
The Microsoft Fabric (Preview) trial includes access to the Fabric product experiences and the resources to create and host Fabric items. The Fabric (Preview) trial lasts until Fabric General Availability (GA), unless canceled. After GA, the Fabric (Preview) trial converts to the GA version and is extended for 60 days.
Starting with OneLake itself. OneLake provides you a single data lake for your entire organization.
Users cannot create OneLake storage. OneLake storage (ADLS Gen2) managed by OneLake API is attached to Fabric tenant. When a workspace is created, a folder is created in OneLake storage (ADLS Gen2 behind the scenes) on a customer tenant.
Everyone is able to contribute to the same lake no matter which engine you use.
We are doing a lot of work to optimize our engines to work directly with Delta Parquet as their native format for tabular data, as you can see with T-SQL for data warehousing and DirectLake mode in Analysis Services for BI.
Think of OneLake as an abstraction layer. You can mount existing ADLS Gen2 to it. Virtualization across many storage accounts. Maintains a single namespace.
A shortcut is nothing more than a symbolic link which points from one data location to another. Just like you can create shortcuts in Windows or Linux, the data will appear in the shortcut location as if it were physically there.
Today, if you have tables in a data warehouse which you want to make available alongside other tables or files in a lakehouse, you need to copy that data out of the warehouse. With OneLake, you simply create a shortcut in the lakehouse pointing to the warehouse. The data will appear in your lakehouse as if you had physically copied it. Since you didn’t copy it, when data is updated in the warehouse, changes are automatically reflected in the lakehouse.
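The symbolic-link analogy can be demonstrated with an actual symlink. This is a stdlib-only Python illustration of the idea (the file names are made up, and it is not using OneLake's APIs):

```python
# A OneLake shortcut behaves like a filesystem symbolic link: the data appears
# in the shortcut location without being copied, and source updates show through.
import os
import tempfile

root = tempfile.mkdtemp()
warehouse = os.path.join(root, "warehouse_sales.txt")  # table in the "warehouse"
lakehouse = os.path.join(root, "lakehouse_sales.txt")  # shortcut in the "lakehouse"

with open(warehouse, "w") as f:
    f.write("v1")
os.symlink(warehouse, lakehouse)  # create the "shortcut" -- no data copied

with open(warehouse, "w") as f:   # update the source in the warehouse...
    f.write("v2")

# ...and the change is immediately visible through the shortcut.
print(open(lakehouse).read())  # v2
```

As with a symlink, deleting the shortcut does not delete the data, and access still flows through to wherever the source actually lives.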
You can also use shortcuts to consolidate data across workspace and domains without changing the ownership of the data. In this example, the workspace B still owns the data. They still have ultimate control over who can access it and how it stays up to date.
Many of you already have existing data lakes stored in ADLS gen2 or in Amazon S3 buckets. These lakes can continue to exist and be managed externally to Fabric.
We have extended shortcuts to include lakes outside of OneLake, and even outside of Azure, so that you can virtualize your existing ADLS Gen2 accounts or Amazon S3 buckets into OneLake.
All data is mapped to the same unified namespace and can be accessed using the same ADLS gen2 APIs even when it is coming from S3.
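To make "same unified namespace" concrete, the helper below builds an ADLS Gen2-style `abfss://` URI for a file in OneLake. The host and path layout follow OneLake's documented convention as I understand it (workspace as the container, item addressed as `name.itemtype`), but treat the exact format as an assumption to verify against current docs.

```python
# Hedged sketch: OneLake exposes one namespace through the ADLS Gen2 API
# surface; the exact URI layout here is an assumption, not a spec.
def onelake_path(workspace: str, item: str, item_type: str, relative: str) -> str:
    """Build an abfss:// URI for a file in OneLake, ADLS Gen2 style."""
    return (f"abfss://{workspace}@onelake.dfs.fabric.microsoft.com/"
            f"{item}.{item_type}/{relative}")

# The same addressing works whether the data physically lives in OneLake or is
# a shortcut to ADLS Gen2 / Amazon S3 -- callers never see where bytes reside.
print(onelake_path("Sales", "Orders", "Lakehouse", "Files/2023/orders.parquet"))
```

Because the URI never encodes the physical location, tools written against the ADLS Gen2 API keep working when the underlying storage is a shortcut to another cloud.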
The Microsoft Fabric Lakehouse analytics scenario makes it so that data can be ingested into OneLake with shortcuts to other cloud repositories, pipelines, and dataflows, allowing end users to leverage other data.
Once that data has been pulled into Microsoft Fabric, users can leverage notebooks to transform that data in OneLake and then store them in Lakehouses with medallion structure.
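The medallion structure mentioned above (bronze raw, silver cleaned, gold aggregated) can be sketched in plain Python. In Fabric this would be Spark notebooks writing Delta tables; the rows and rules below are invented for illustration.

```python
# Hedged sketch of a bronze -> silver -> gold medallion flow with plain dicts.
raw = [  # bronze: data exactly as ingested, duplicates and bad rows included
    {"id": 1, "amount": "10.5"},
    {"id": 1, "amount": "10.5"},  # duplicate row
    {"id": 2, "amount": "bad"},   # unparseable value
    {"id": 3, "amount": "4.0"},
]

# silver: types enforced, invalid rows dropped, duplicates removed
silver, seen = [], set()
for row in raw:
    try:
        amount = float(row["amount"])
    except ValueError:
        continue              # drop rows that fail validation
    if row["id"] in seen:
        continue              # drop duplicates by key
    seen.add(row["id"])
    silver.append({"id": row["id"], "amount": amount})

# gold: business-level aggregate ready for reporting
gold = {"total_amount": sum(r["amount"] for r in silver)}
print(gold)  # {'total_amount': 14.5}
```

Each layer is persisted separately so that cleaning rules can change and silver/gold can be rebuilt from bronze without re-ingesting from the source.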
From there, users can begin to analyze and visualize that data with Power BI using the see-through mode or SQL endpoints.
If you don’t see a Lakehouse table in the warehouse (default), check the data format. Only the tables in Delta Lake format are available in the warehouse (default). Parquet, CSV, and other formats cannot be queried using the warehouse (default)
Warehouse mode in the Lakehouse allows a user to transition from the “Lake” view of the Lakehouse (which supports data engineering and Apache Spark) to the “SQL” experiences that a data warehouse would provide, supporting T-SQL. In warehouse mode the user has a subset of SQL commands that can define and query data objects but not manipulate the data. You can perform the following actions in your warehouse (default):
• Query the tables that reference data in your Delta Lake folders in the lake.
• Create views, inline TVFs, and procedures to encapsulate your semantics and business logic in T-SQL.
• Manage permissions on the objects.
Warehouse mode is primarily oriented towards designing your warehouse and BI needs and serving data.
The Data Warehouse analytics scenario takes existing sources that are mounted, while pipelines and dataflows can bring in all other data that is needed.
IT teams can then define and store procedures to transform the data, which is stored as Parquet/Delta Lake files in OneLake.
From there, business users can analyze and visualize data with Power BI, again using the see-through mode or SQL endpoints.
Can more than one capacity be connected to a Datawarehouse, for instance, one to handle data writes and one to handle data reads? Currently, a capacity is assigned to the workspace level and a Data Warehouse is associated to a single workspace. This means all artifacts in the workspace will share the same capacity and all read/write operations will use the same capacity.
Does Fabric Data Warehouse support fine grained access control like row-level security, column-level security, dynamic data masking? These security constructs are not available but are planned for the Fabric Data Warehouse and will integrate with Fabric’s universal security model.
We already support stats, it’s in the docs. Automatic stats on load and on metadata discovery should land soon. Query plans, indexes, SQL RLS etc also will land incrementally
Python, R, Scala
https://learn.microsoft.com/en-us/fabric/get-started/decision-guide-warehouse-lakehouse
Lakehouse: call it delta lake, owned and managed by Spark, customer can update files - it's user owned. Use if customer likes Spark and using files
Warehouse: well structured, SQL front door, transactional guarantees, multi-table transactions. Nobody except the SQL engine can update the files. Use if the customer is comfortable with SQL and comes from a relational database world
LDF and MDF files are still used behind the scenes. You can query the warehouse from the lakehouse, but not the opposite. Data is synced into OneLake from the LDF and MDF files (only INSERT works for now)
Can't support everything SQL supports with the current open source format at Delta, (multi-table transactions, indexing), so have to use SQL engine
Want to get to a point where don't use LDF/MDF files but use delta underneath. Talk about using Iceberg way down the road
Other knobs to turn? What should we expose? Hide things (DMV, explain plan) because can't tune. Performance DBA's - getting rid of another part of your job
Fabric: Can land in bronze zone in warehouse and just use that for all layers to use T-SQL to write
Compared to Synapse: no longer having relational storage and dedicated compute - idea is it’s all done within lake
3 deployment models
How to organize workspaces – dev/test/prod, by orgs, by cost
High concurrency clusters, spark monitoring
Mapping data flows = ADF Data flows
Wrangling data flows = Power Query
Just in the Fabric context, I would say it this way: Fabric Dataflows are the PQ UI with the scale of ADF
the Gen1 vs. Gen2 is just the distinction between what is in PBI and Excel today vs. what is in Fabric
what will be the response to customers who ask how they move ADF Data flows to Fabric?
We are looking at ways to convert to Fabric data flows and work with Partners who can help with the conversions like 1
Since ADF PQ already had cloud scale, why not just move that into Fabric dataflow gen2?
so Fabric Dataflow Gen2 will eventually be an improvement over ADF PQ
DirectLake mode is a groundbreaking new engine capability to analyze very large datasets in Power BI. The technology is based on the idea of loading parquet-formatted files directly from a data lake without having to query a Data Warehouse or Lakehouse endpoint and without having to import or duplicate data into a Power BI dataset. DirectLake is a fast path to load the data from the lake straight into the Power BI engine, ready for analysis. It loads the data directly from the files into memory at runtime. Because there is no explicit import process, it is possible to pick up any changes at the source as they occur, thus combining the advantages of DirectQuery and import mode while avoiding their disadvantages. DirectLake can read parquet-formatted delta files, but for best performance you should compress the data using the V-Order compression method (50%-70% more compression).