Data Integration through Data Virtualization - PolyBase and new SQL Server 2019 Features (Presented at SQL Server Konferenz 2019 on February 21st, 2019)
Data virtualization, Data Federation & IaaS with JBoss Teiid - Anil Allewar
Enterprises have always grappled with the problem of information silos that needed to be merged using multiple data warehouses (DWs) and business intelligence (BI) tools so that enterprises could mine this disparate data for business decisions and strategy. Traditionally, this data integration was done with ETL by consolidating multiple DBMSs into a single data storage facility.
Data virtualization enables abstraction, transformation, federation, and delivery of data taken from a variety of heterogeneous data sources as if it were a single virtual data source, without the need to physically copy the data for integration. It allows consuming applications or users to access data from these various sources via a request to a single access point and delivers information-as-a-service (IaaS).
In this presentation, we will explore what data virtualization is and how it differs from the traditional data integration architecture. We’ll also look at validating the data virtualization and federation concepts by working through an example (see videos at the GitHub repo) to federate data across two heterogeneous data sources, MySQL and MongoDB, using the JBoss Teiid data virtualization platform.
These are the slides for my talk "An intro to Azure Data Lake" at Azure Lowlands 2019. The session was held on Friday January 25th from 14:20 - 15:05 in room Santander.
Choosing technologies for a big data solution in the cloud - James Serra
Has your company been building data warehouses for years using SQL Server? And are you now tasked with creating or moving your data warehouse to the cloud and modernizing it to support “Big Data”? What technologies and tools should you use? That is what this presentation will help you answer. First, we will cover what questions to ask concerning data (type, size, frequency), reporting, performance needs, on-prem vs cloud, staff technology skills, OSS requirements, cost, and MDM needs. Then we will show you common big data architecture solutions and help you to answer questions such as: Where do I store the data? Should I use a data lake? Do I still need a cube? What about Hadoop/NoSQL? Do I need the power of MPP? Should I build a "logical data warehouse"? What is this lambda architecture? Can I use Hadoop for my DW? Finally, we’ll show some architectures of real-world customer big data solutions. Come to this session to get started down the path to making the proper technology choices in moving to the cloud.
Microsoft Data Platform - What's included - James Serra
The pace of Microsoft product innovation is so fast that even though I spend half my days learning, I struggle to keep up. And as I work with customers I find they are often in the dark about many of the products that we have, since they are focused on just keeping what they have running and putting out fires. So, let me cover what products you might have missed in the Microsoft data platform world. Be prepared to discover all the various Microsoft technologies and products for collecting data, transforming it, storing it, and visualizing it. My goal is to help you not only understand each product but understand how they all fit together and their proper use cases, allowing you to build the appropriate solution that can incorporate any data in the future no matter the size, frequency, or type. Along the way we will touch on technologies covering NoSQL, Hadoop, and open source.
AWS Cloud Kata 2013 | Singapore - Getting to Scale on AWS - Amazon Web Services
This session will focus on how to get from 'Minimum Viable Product' (MVP) to scale. It will also explain how to deal with unpredictable demand and how to build a scalable business. Attend this session to learn how to:
Scale web servers and app services with Elastic Load Balancing and Auto Scaling on Amazon EC2
Scale your storage on Amazon S3 and S3 Reduced Redundancy Storage
Scale your database with Amazon DynamoDB, Amazon RDS, and Amazon ElastiCache
Scale your customer base by reaching customers globally in minutes with Amazon CloudFront
Azure Synapse Analytics is Azure SQL Data Warehouse evolved: a limitless analytics service that brings together enterprise data warehousing and Big Data analytics into a single service. It gives you the freedom to query data on your terms, using either serverless on-demand or provisioned resources, at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate business intelligence and machine learning needs. This is a huge deck with lots of screenshots so you can see exactly how it works.
Many companies today move mountains of data using ETL (extract, transform, load) technology. But data volumes are growing too large to move, customers are now expecting real-time data, and ETL costs now account for 10-15% of computing capacity. In this slide presentation, you can see how data virtualization enables data structures that were designed independently to be leveraged together, in real time, and without data movement, reducing complexity, lowering IT costs, and minimizing risk.
The new Microsoft Azure SQL Data Warehouse (SQL DW) is an elastic data warehouse-as-a-service and a Massively Parallel Processing (MPP) solution for "big data" with true enterprise-class features. The SQL DW service is built for data warehouse workloads from a few hundred gigabytes to petabytes of data, with truly unique features like disaggregated compute and storage that allow customers to utilize the service to match their needs. In this presentation, we take an in-depth look at implementing a SQL DW, elastic scale (grow, shrink, and pause), and hybrid data clouds with Hadoop integration via PolyBase, allowing for a true SQL experience across structured and unstructured data.
Azure SQL Database Managed Instance is a new flavor of Azure SQL Database that is a game changer. It offers near-complete SQL Server compatibility and network isolation to easily lift and shift databases to Azure (you can literally back up an on-premises database and restore it into an Azure SQL Database Managed Instance). Think of it as an enhancement to Azure SQL Database that is built on the same PaaS infrastructure and maintains all its features (e.g. active geo-replication, high availability, automatic backups, database advisor, threat detection, intelligent insights, vulnerability assessment, etc.) but adds support for databases up to 35TB, VNET, SQL Agent, cross-database querying, replication, etc. So, you can migrate your databases from on-prem to Azure with very little migration effort, which is a big improvement over the current Singleton or Elastic Pool flavors, which can require substantial changes.
Think of big data as all data, no matter what the volume, velocity, or variety. The simple truth is a traditional on-prem data warehouse will not handle big data. So what is Microsoft’s strategy for building a big data solution? And why is it best to have this solution in the cloud? That is what this presentation will cover. Be prepared to discover all the various Microsoft technologies and products for collecting data, transforming it, storing it, and visualizing it. My goal is to help you not only understand each product but understand how they all fit together, so you can be the hero who builds your company's big data solution.
How to Build Modern Data Architectures Both On Premises and in the Cloud - VMware Tanzu
Enterprises are beginning to consider the deployment of data science and data warehouse platforms on hybrid (public cloud, private cloud, and on premises) infrastructure. This delivers the flexibility and freedom of choice to deploy your analytics anywhere you need it and to create an adaptable and agile analytics platform.
But the market is conspiring against customer desire for innovation...
Leading public cloud vendors are interested in pushing their new, but proprietary, analytic stacks, locking customers into subpar Analytics as a Service (AaaS) for years to come.
In tandem, legacy data warehouse vendors are trying to extend the lifecycle of their costly and aging appliances with new features of marginal value, simply imitating the same limiting models of the public cloud vendors.
New vendors are coming up with interesting ideas, but these ideas often lack critical features, such as support for hybrid solutions, limiting their immediate value to users.
It is 2017—you can, in fact, have your analytics cake and eat it too! Solve your short-term cost and capability challenges, and establish a long-term hybrid data strategy by running the same open source analytics platform on your infrastructure as it exists today.
In this webinar you will learn how Pivotal can help you build a modern analytical architecture able to run on your public, private cloud, or on-premises platform of your choice, while fully leveraging proven open source technologies and supporting the needs of diverse analytical users.
Let’s have a productive discussion about how to deploy a solid cloud analytics strategy.
Presenter : Jacque Istok, Head of Data Technical Field for Pivotal
https://content.pivotal.io/webinars/jul-20-how-to-build-modern-data-architectures-both-on-premises-and-in-the-cloud
Overview of Microsoft Appliances: Scaling SQL Server to Hundreds of Terabytes - James Serra
Learn how SQL Server can scale to HUNDREDS of terabytes for BI solutions. This session will focus on Fast Track Solutions and Appliances, Reference Architectures, and Parallel Data Warehouse (PDW). Included will be performance numbers and lessons learned from a PDW implementation and how a successful BI solution was built on top of it using SSAS.
SQL Server 2017 Enhancements You Need To Know - Quest
In this session, database experts Pini Dibask and Jason Hall reveal the lesser-known features that’ll help you improve database performance in record time.
Data Analytics Meetup: Introduction to Azure Data Lake Storage - CCG
Microsoft Azure Data Lake Storage is designed to enable operational and exploratory analytics through a hyper-scale repository. Journey through Azure Data Lake Storage Gen1 with Microsoft Data Platform Specialist, Audrey Hammonds. In this video, she explains the fundamentals of Gen 1 and Gen 2, walks us through how to provision a Data Lake, and gives tips to avoid turning your Data Lake into a swamp.
Learn more about Data Lakes with our blog - Data Lakes: Data Agility is Here Now https://bit.ly/2NUX1H6
Planning your Next-Gen Change Data Capture (CDC) Architecture in 2019 - Strea... - Impetus Technologies
Traditional databases and batch ETL operations have not been able to keep up with growing data volumes and the need for fast, continuous data processing.
How can modern enterprises provide their business users real-time access to the most up-to-date and complete data?
In our upcoming webinar, our experts will talk about how real-time CDC improves data availability and fast data processing through incremental updates in the big data lake, without modifying or slowing down source systems. Join this session to learn:
What is CDC and how it impacts business
The various methods for CDC in the enterprise data warehouse
The key factors to consider while building a next-gen CDC architecture:
Batch vs. real-time approaches
Moving from just capturing and storing, to capturing, enriching, transforming, and storing
Avoiding stopgap silos by moving to straight-through processing
Implementation of CDC through a live demo and use-case
You can view the webinar here - https://www.streamanalytix.com/webinar/planning-your-next-gen-change-data-capture-cdc-architecture-in-2019/
For more information visit - https://www.streamanalytix.com
Enabling transparent SQL/SPARQL access to both static and dynamically-computed data
Query languages for databases (e.g., SQL) and knowledge graphs (e.g., SPARQL) provide a concise, declarative, and highly flexible mechanism to access stored data. Yet, many use cases also involve dynamically-computed data available through web APIs or other forms of external services. In such settings, data access is comparatively less flexible (e.g., due to restrictions on available input/output methods), convenient, and sometimes prohibitively slow for users interactively querying data. In this talk, we discuss these problems and present open source solutions that enable querying dynamically-computed data as a “virtual” (since not fully materialized) relational database via SQL, or as a “virtual” knowledge graph via SPARQL, at the same time providing pre-computation and caching solutions to speed up data access. The core components presented in the talk have been developed in the context of the HIVE “Fusion Grant” project and the OntoCRM project, both involving UNIBZ and Ontopic srl. In both projects, we aim at extending virtual knowledge graphs to dynamically-computed data, with a particular focus on applications in the domains of environmental sustainability and climate risk management.
Samedi SQL Québec - The Azure Data Platform - MSDEVMTL
June 6th, 2015
Samedi SQL in Quebec City
Session 3 - Data (SQL Azure, Table and Blob Storage) (Eric Moreau)
SQL Azure is a relational database as a service. Azure Storage lets you store and retrieve large volumes of unstructured data (for example, documents and media files) with Azure blobs, structured NoSQL data with Azure tables, and reliable messages with Azure queues.
In the spirit of the book 7 Databases in 7 Weeks, Lara Rubbelke and Karen Lopez cover ~seven databases and datastores in the SQL and NoSQL world, when to use them, and how they are SQL-like.
From SQLBits XV
Notice an error? Let me know. I welcome this sort of feedback.
Microsoft released SQL Azure more than two years ago - that's enough time for testing (I hope!). So, are you ready to move your data to the Cloud? If you’re considering a business (i.e. a production environment) in the Cloud, you need to think about methods for backing up your data, a backup plan for your data and, eventually, restoring with Red Gate Cloud Services (and not only). In this session, you’ll see the differences, functionality, restrictions, and opportunities in SQL Azure and On-Premise SQL Server 2008/2008 R2/2012. We’ll consider topics such as how to be prepared for backup and restore, and which parts of a cloud environment are most important: keys, triggers, indexes, prices, security, service level agreements, etc.
Similar to Data Integration through Data Virtualization (SQL Server Konferenz 2019)
The Battle of the Data Transformation Tools (PASS Data Community Summit 2023) - Cathrine Wilhelmsen
The Battle of the Data Transformation Tools (Presented as part of the "Battle of the Data Transformation Tools" Learning Path at PASS Data Community Summit on November 16th, 2023)
Visually Transform Data in Azure Data Factory or Azure Synapse Analytics (PAS... - Cathrine Wilhelmsen
Visually Transform Data in Azure Data Factory or Azure Synapse Analytics (Presented as part of the "Battle of the Data Transformation Tools" Learning Path at PASS Data Community Summit on November 15th, 2023)
Building an End-to-End Solution in Microsoft Fabric: From Dataverse to Power ... - Cathrine Wilhelmsen
Building an End-to-End Solution in Microsoft Fabric: From Dataverse to Power BI (Presented at SQLSaturday Oregon & SW Washington on November 11th, 2023)
Stressed, Depressed, or Burned Out? The Warning Signs You Shouldn't Ignore (S... - Cathrine Wilhelmsen
Stressed, Depressed, or Burned Out? The Warning Signs You Shouldn't Ignore (Presented at SQLBits on March 18th, 2023)
We all experience stress in our lives. When the stress is time-limited and manageable, it can be positive and productive. This kind of stress can help you get things done and lead to personal growth. However, when the stress stretches out over longer periods of time and we are unable to manage it, it can be negative and debilitating. This kind of stress can affect your mental health as well as your physical health, and increase the risk of depression and burnout.
The tricky part is that both depression and burnout can hit you hard without the warning signs you might recognize from stress. Where stress barges through your door and yells "hey, it's me!", depression and burnout can silently sneak in and gradually make adjustments until one day you turn around and see them smiling while realizing that you no longer recognize your house. I know, because I've dealt with both. And when I thought I had kicked them out, they both came back for new visits.
I don't have the Answers™️ or Solutions™️ to how to keep them away forever. But in hindsight, there were plenty of warning signs I missed, ignored, or was oblivious to at the time. In this deeply personal session, I will share my story of dealing with both depression and burnout. What were the warning signs? Why did I miss them? Could I have done something differently? And most importantly, what can I - and you - do to help ourselves or our loved ones if we notice that something is not quite right?
"I can't keep up!" - Turning Discomfort into Personal Growth in a Fast-Paced ...Cathrine Wilhelmsen
"I can't keep up!" - Turning Discomfort into Personal Growth in a Fast-Paced World (Presented at SQLBits on March 17th, 2023)
Do you sometimes think the world is moving so fast that you're struggling to keep up?
Does it make you feel a little uncomfortable?
Awesome!
That means that you have ambitions. You want to learn new things, take that next step in your career, achieve your goals. You can do anything if you set your mind to it.
It just might not be easy.
All growth requires some discomfort. You need to manage and balance that discomfort, find a way to push yourself a little bit every day without feeling overwhelmed. In a fast-paced world, you need to know how to break down your goals into smaller chunks, how to prioritize, and how to optimize your learning.
Are you ready to turn your "I can't keep up" into "I can't believe I did all of that in just one year"?
Lessons Learned: Implementing Azure Synapse Analytics in a Rapidly-Changing S... - Cathrine Wilhelmsen
Lessons Learned: Implementing Azure Synapse Analytics in a Rapidly-Changing Startup (Presented at SQLBits on March 11th, 2022)
What happens when you mix one rapidly-changing startup, one data analyst, one data engineer, and one hypothesis that Azure Synapse Analytics could be the right tool of choice for gaining business insights?
We had no idea, but we gave it a go!
Our ambition was to think big, start small, and act fast – to deliver business value early and often.
Did we succeed?
Join us for an honest conversation about why we decided to implement Azure Synapse Analytics alongside Power BI, how we got started, which areas we completely messed up at first, what our current solution looks like, the lessons learned along the way, and the things we would have done differently if we could start all over again.
6 Tips for Building Confidence as a Public Speaker (SQLBits 2022) - Cathrine Wilhelmsen
6 Tips for Building Confidence as a Public Speaker (Presented at SQLBits on March 10th, 2022)
Do you feel nervous about getting on stage to deliver a presentation?
That was me a few years ago. Palms sweating. Hands shaking. Voice trembling. I could barely breathe and talked at what felt like a thousand words per second. Now, public speaking is one of my favorite hobbies. Sometimes, I even plan my vacations around events! What changed?
There are no shortcuts to building confidence as a public speaker. However, there are many things you can do to make the journey a little easier for yourself. In this session, I share the top tips I have learned over the years. All it takes is a little preparation and practice.
You can do this!
Data Centers - Striving Within A Narrow Range - Research Report - MCG - May 2... - pchutichetpong
M Capital Group (“MCG”) expects demand to grow and supply to evolve, driven by institutional investment rotating out of offices and into work from home (“WFH”) arrangements, and by the ever-expanding need for data storage as global internet usage grows, with experts predicting 5.3 billion users by 2023. These market factors will be underpinned by technological changes, such as progressing cloud services and edge sites, allowing the industry to see strong expected annual growth of 13% over the next four years.
While competitive headwinds remain, represented by the recent second bankruptcy filing of Sungard, which blames “COVID-19 and other macroeconomic trends including delayed customer spending decisions, insourcing and reductions in IT spending, energy inflation and reduction in demand for certain services”, the industry has seen key adjustments, and MCG believes that engineering cost management and technological innovation will be paramount to success.
MCG reports that the more favorable market conditions expected over the next few years, helped by the winding down of pandemic restrictions and a hybrid working environment, will drive market momentum forward. The continuous injection of capital by alternative investment firms, as well as the growing infrastructural investment from cloud service providers and social media companies, whose revenues are expected to be over 3.6x larger by value in 2026, will likely help propel center provision and innovation. These factors paint a promising picture for the industry players that offset rising input costs and adapt to new technologies.
According to M Capital Group: “Specifically, the long-term cost-saving opportunities available from the rise of remote managing will likely aid value growth for the industry. Through margin optimization and further availability of capital for reinvestment, strong players will maintain their competitive foothold, while weaker players exit the market to balance supply and demand.”
Adjusting primitives for graph : SHORT REPORT / NOTES - Subhajit Sahu
Graph algorithms, like PageRank, operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Data Integration through Data Virtualization (SQL Server Konferenz 2019)
1. Data Integration through Data Virtualization
Cathrine Wilhelmsen, Inmeta
@cathrinew | cathrinew.net
February 21st 2019
2. Abstract
Data virtualization is an alternative to Extract, Transform and Load (ETL) processes. It handles the complexity of integrating different data sources and formats without requiring you to replicate or move the data itself. Save time, minimize effort, and eliminate duplicate data by creating a virtual data layer using PolyBase in SQL Server.
In this session, we will first go through fundamental PolyBase concepts such as external data sources and external tables. Then, we will look at the PolyBase improvements in SQL Server 2019. Finally, we will create a virtual data layer that accesses and integrates both structured and unstructured data from different sources. Along the way, we will cover lessons learned, best practices, and known limitations.
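To make those two fundamental concepts concrete, here is a minimal sketch in T-SQL of what an external data source and an external table can look like with PolyBase in SQL Server 2019. The server name, credentials, and table definitions below are hypothetical placeholders for illustration, not objects from the demo.

-- Hypothetical master key and credential for authenticating against the remote source
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPassword>';

CREATE DATABASE SCOPED CREDENTIAL RemoteSqlCredential
WITH IDENTITY = 'remote_user', SECRET = '<RemotePassword>';

-- External data source pointing at another SQL Server instance
CREATE EXTERNAL DATA SOURCE RemoteSqlSource
WITH (
    LOCATION = 'sqlserver://remoteserver.contoso.com',
    CREDENTIAL = RemoteSqlCredential
);

-- External table mapping to a table in the remote database
CREATE EXTERNAL TABLE dbo.RemoteCustomers
(
    CustomerID   INT           NOT NULL,
    CustomerName NVARCHAR(200) NOT NULL
)
WITH (
    DATA_SOURCE = RemoteSqlSource,
    LOCATION    = 'RemoteDatabase.dbo.Customers'
);

-- The external table is queried like a local table; no data is copied or moved
SELECT CustomerID, CustomerName
FROM dbo.RemoteCustomers;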
45. 1. Install Prerequisites
Microsoft .NET Framework 4.5
Oracle Java SE Runtime Environment (JRE) 7 or 8
2. Install PolyBase
Single Node or Scale-Out Group
3. Enable PolyBase
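As a rough sketch, step 3 can be done in T-SQL once the feature is installed (SQL Server 2019 syntax):

-- Enable PolyBase on the instance
EXEC sp_configure @configname = 'polybase enabled', @configvalue = 1;
RECONFIGURE;

-- Returns 1 if PolyBase is installed on this instance
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;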
46. Install Prerequisites
Microsoft .NET Framework 4.5
https://www.microsoft.com/nl-nl/download/details.aspx?id=30653
Oracle Java SE Runtime Environment (JRE) 7 or 8
https://www.oracle.com/technetwork/java/javase/downloads/jre8-downloads-2133155.html
47. Install PolyBase
Note: PolyBase can be installed on only one SQL Server instance per machine.
Note: After you install PolyBase either standalone or in a scale-out group, you have to uninstall and reinstall to change it.
. . . Ask me how I know : )
80. Create Statistics
Note: To create statistics, SQL Server imports the external data into a temp table first. Remember to choose sampling or full scan.
Note: Updating statistics is not supported. Drop and re-create instead.
81. Create Statistics
CREATE STATISTICS <StatName>
ON <TableName>(<ColumnName>);
CREATE STATISTICS <StatName>
ON <TableName>(<ColumnName>) WITH FULLSCAN;
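For example, against the hypothetical dbo.RemoteCustomers external table sketched earlier, the template above could be filled in like this:

CREATE STATISTICS stat_RemoteCustomers_CustomerID
ON dbo.RemoteCustomers (CustomerID) WITH FULLSCAN;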
87. Unexpected error encountered filling record
reader buffer: HadoopExecutionException:
Not enough columns in this line.
88. Unexpected error encountered filling record
reader buffer: HadoopExecutionException:
Too many columns in the line.
89. Unexpected error encountered filling record
reader buffer: HadoopExecutionException:
Could not find a delimiter after
string delimiter.
90. Unexpected error encountered filling record
reader buffer: HadoopExecutionException:
Error converting data type NVARCHAR to INT.
91. Unexpected error encountered filling record
reader buffer: HadoopExecutionException:
Conversion failed when converting the
NVARCHAR value '"0"' to data type BIT.
92. Unexpected error encountered filling record
reader buffer: HadoopExecutionException:
Too long string in column [-1]:
Actual len = [4242]. MaxLEN=[4000]
93. Msg 46518, Level 16, State 12, Line 1:
The type 'nvarchar(max)' is not supported
with external tables.
94. Msg 2717, Level 16, State 2, Line 1:
The size (10000) given to the parameter
exceeds the maximum allowed (4000).
95. Msg 131, Level 15, State 2, Line 1:
The size (10000) given to the column
exceeds the maximum allowed for any data
type (8000).
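Most of the errors above come from rows or columns in the source files that do not match the external table definition. As a hedged sketch (all object names are hypothetical, and a Hadoop-style external data source named HadoopSource is assumed to exist), reject options let a query tolerate a limited number of bad rows instead of failing on the first one, and column sizes are kept within the documented limits:

-- Hypothetical delimited file format for CSV files
CREATE EXTERNAL FILE FORMAT CsvFileFormat
WITH (
    FORMAT_TYPE = DELIMITEDTEXT,
    FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"', USE_TYPE_DEFAULT = TRUE)
);

-- Hypothetical external table over the files, with reject options
CREATE EXTERNAL TABLE dbo.SalesStaging
(
    SaleID     INT,
    SaleAmount DECIMAL(10, 2),
    Comment    NVARCHAR(4000)   -- NVARCHAR(MAX) is not supported for external tables
)
WITH (
    DATA_SOURCE  = HadoopSource,   -- assumed to be created elsewhere
    LOCATION     = '/data/sales/',
    FILE_FORMAT  = CsvFileFormat,
    REJECT_TYPE  = VALUE,
    REJECT_VALUE = 10              -- fail only after more than 10 rejected rows
);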
98. SQL Server 2019 Big Data Clusters
SQL Server, Spark, and HDFS
Scalable clusters of containers
Runs on Kubernetes
99. [Architecture diagram: SQL Server 2019 Big Data Cluster on Kubernetes. One Kubernetes pod runs the SQL Server master instance; additional pods each run SQL Server, an HDFS data node, and Spark.]
109. Biml 💚 PolyBase
Ben Weissman:
Using Biml to automagically keep your external
polybase tables in sync!
https://www.solisyon.de/biml-polybase-external-tables/
110.
111. Azure Data Studio
1. Install Azure Data Studio
docs.microsoft.com/en-us/sql/azure-data-studio/download
2. Install Extension: SQL Server 2019 (Preview)
docs.microsoft.com/en-us/sql/azure-data-studio/sql-server-2019-extension