This document discusses MS SQL Server 2019's capabilities for big data processing through PolyBase and Big Data Clusters. PolyBase allows SQL queries to join data stored externally in sources such as HDFS, Oracle, and MongoDB. Big Data Clusters deploy SQL Server on Linux in Kubernetes containers with separate control, compute, and data planes to provide scalable analytics on large datasets. Examples of using these technologies include data virtualization across sources, building data lakes in HDFS, distributed data marts for analysis, and integrated AI/ML tasks on HDFS and SQL data.
Andriy Zrobok "MS SQL 2019 - new for Big Data Processing"
1. MS SQL 2019:
Big Data Processing
Andrii Zrobok
Chief Database Developer, EPAM
azrobok@gmail.com
2. Agenda
MS SQL 2019 overview
PolyBase: History, What, Why, Demo
Big Data Cluster
Scenarios
3. About me
25+ years of experience in database development: developing data-centric applications from scratch, supporting legacy databases/applications, data migration tasks, performance tuning, SSIS/ETL tasks, consulting, database training, etc.
Databases: FoxPro 2.0 for DOS (Fox Software), MS SQL Server (from version 6.5, 1996), Oracle, Sybase ASE, MySQL, PostgreSQL
Co-leader of Lviv Data Platform UG (PASS Local Chapter) (http://lvivsqlug.pass.org/)
Speaker at:
• PASS SQLSaturday conferences (Lviv, Kyiv, Dnipro, Odessa, Kharkiv; since 2013)
• PASS L’viv/Vinnitsa/Virtual SQL Server User Groups;
• EPAM IT Week 2015-2017
4. Today's challenges
Unified access to all your data with unparalleled performance
Easily and securely manage data big and small
Build intelligent Apps and AI with all your data
5. MS SQL 2019 Preview
Windows: Standard version with PolyBase
Linux: Linux version without PolyBase
Docker: Database Engine Container Image (Ubuntu, Red Hat)
Big Data Analytics: Linux container on Kubernetes
https://www.microsoft.com/en-us/sql-server/sql-server-2019#Install
6. PolyBase: What?
Microsoft's newest technology for connecting to remote servers.
[Diagram: applications, T-SQL, and analytics tools reach external data through SQL Server via PolyBase external tables / external data sources]
https://docs.microsoft.com/uk-ua/sql/relational-databases/polybase/polybase-guide?view=sqlallproducts-allversions
7. PolyBase: History
Introduced in SQL Server Parallel Data Warehouse (PDW) edition, back in 2010
Expanded in SQL Server Analytics Platform System (APS) in 2012.
Released to the "general public" in SQL Server 2016, with most support being in Enterprise Edition.
Extended support for additional technologies (like Oracle, MongoDB, etc.) will be available in SQL Server 2019.
8. PolyBase: Why?
Without PolyBase
Transfer half your data so that all your data is in one format or the other
Query both sources of data, then write custom query logic to join and integrate the data at the client level.
With PolyBase
Use T-SQL to join the data (external table, statistics for external table)
Usage
Querying / Import (into table) / Export (into data storage)
Performance
Use computation on the target server (OPTION (FORCE EXTERNALPUSHDOWN))
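A rough sketch of the import and export paths, assuming dbo.ext_sales is an existing external table (all object names here are hypothetical; PolyBase export is supported for Hadoop/Azure Blob targets):

-- Import: materialize remote rows into a local table for further processing
SELECT *
INTO dbo.sales_local
FROM dbo.ext_sales
WHERE sales_year = 2018;

-- Export: write local rows out to the external storage behind the table
-- (may require: EXEC sp_configure 'allow polybase export', 1; RECONFIGURE;)
INSERT INTO dbo.ext_sales
SELECT * FROM dbo.sales_local;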
9. PolyBase: Demo - tools
1) PolyBase should be installed and enabled
2) Using Management Studio (scripts, no visibility)
OR
3) Using Azure Data Studio + SQL Server 2019 (Preview) Extension
https://docs.microsoft.com/en-us/sql/azure-data-studio/download?view=sql-server-2017
https://docs.microsoft.com/en-us/sql/azure-data-studio/sql-server-2019-extension?view=sqlallproducts-allversions
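Step 1 can be checked and completed from T-SQL; a minimal sketch using the standard server property and sp_configure option (a restart of the instance may be needed after enabling):

-- Is the PolyBase feature installed on this instance?
SELECT SERVERPROPERTY('IsPolyBaseInstalled') AS IsPolyBaseInstalled;

-- Enable PolyBase
EXEC sp_configure 'polybase enabled', 1;
RECONFIGURE;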
10. PolyBase: Demo - steps
Create master key (needed for password encryption)
Create database scoped credential (access to remote database
server)
Create external data source (address of remote database server)
Create schema for external data (optional)
Create external tables / statistics on external tables
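The steps above map to T-SQL roughly as follows; a minimal sketch with hypothetical server, credential, and table names, using the SQL Server 2019 syntax for a SQL Server external data source:

-- 1) Master key: protects the credential secrets stored in this database
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPassword!1>';

-- 2) Credential used to log in to the remote SQL Server
CREATE DATABASE SCOPED CREDENTIAL pb_sql_cred
WITH IDENTITY = 'remote_login', SECRET = '<RemotePassword!1>';

-- 3) External data source pointing at the remote instance
CREATE EXTERNAL DATA SOURCE pb_sqlserver_src
WITH (LOCATION = 'sqlserver://remote-host:1433', CREDENTIAL = pb_sql_cred);
GO

-- 4) Optional schema to keep external objects together
CREATE SCHEMA pb_sqlserver;
GO

-- 5) External table mapped onto a remote table, plus statistics for the optimizer
CREATE EXTERNAL TABLE pb_sqlserver.regions
(
    region_id   INT          NOT NULL,
    region_name NVARCHAR(50) NOT NULL
)
WITH (LOCATION = 'hr.dbo.regions', DATA_SOURCE = pb_sqlserver_src);

CREATE STATISTICS st_regions_region_id ON pb_sqlserver.regions (region_id);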
12. PolyBase: select from remote servers
SELECT
    e.employee_id,
    e.first_name,
    e.last_name,
    d.department_name,
    l.city,
    c.country_name,
    r.region_name
FROM dbo.employees e
INNER JOIN dbo.departments d ON e.department_id = d.department_id
INNER JOIN dbo.locations l ON d.location_id = l.location_id
INNER JOIN pb_oracle.countries c ON c.country_id = l.country_id
INNER JOIN pb_sqlserver.regions r ON r.region_id = c.region_id
15. PolyBase: externalpushdown
-- Pushdown decision left to the optimizer (default)
SELECT stateprovinceid, COUNT(*)
FROM pb_sqlserver.address
GROUP BY stateprovinceid

-- Pushdown explicitly disabled: all rows are pulled back and aggregated locally
SELECT stateprovinceid, COUNT(*)
FROM pb_sqlserver.address
GROUP BY stateprovinceid
OPTION (DISABLE EXTERNALPUSHDOWN)
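The opposite hint forces the aggregation to run on the remote side when the optimizer would not push it down on its own (same table as above):

SELECT stateprovinceid, COUNT(*)
FROM pb_sqlserver.address
GROUP BY stateprovinceid
OPTION (FORCE EXTERNALPUSHDOWN)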
16. PolyBase: Scale-out groups
One node – up to 8 readers
PolyBase extends the idea of Massively Parallel Processing (MPP) to SQL Server.
SQL Server is a classic "scale-up" technology: if you want more power, add more RAM/CPUs/resources to the single server.
Hadoop is a great example of an MPP system: if you want more power, add more servers; the system will coordinate processing.
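Adding another SQL Server instance to a PolyBase scale-out group as a compute node is done with a system procedure; a sketch assuming the default control channel port and hypothetical host/instance names (verify the parameters against your version's documentation):

-- Run on the instance that should join the group as a compute node
EXEC sp_polybase_join_group N'HEADNODE01', 16450, N'MSSQLSERVER';
-- Restart the PolyBase Engine and Data Movement services afterwards

-- On the head node, list the members of the scale-out group
SELECT * FROM sys.dm_exec_compute_nodes;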
19. Big data cluster components
Control plane: provides management and security for the cluster. It contains the Kubernetes master, the SQL Server master instance, and other cluster-level services such as the Hive Metastore and Spark Driver.
Compute plane: provides computational resources to the cluster. It contains nodes running SQL Server on Linux pods. The pods in the compute plane are divided into compute pools for specific processing tasks. A compute pool can act as a PolyBase scale-out group for distributed queries over different data sources, such as HDFS, Oracle, MongoDB, or Teradata.
Data plane: used for data persistence and caching. The SQL data pool consists of one or more pods running SQL Server on Linux and is used to ingest data from SQL queries or Spark jobs. SQL Server big data cluster data marts are persisted in the data pool. The storage pool consists of storage pool pods comprising SQL Server on Linux, Spark, and HDFS. All the storage nodes in a SQL Server big data cluster are members of an HDFS cluster.
20. Management
Easy to deploy and manage thanks to containers and Kubernetes
Fast to deploy
Self-contained (no installations required; ships as images)
Easy upgrades: just deploy a new image
Scalable, multi-tenant
21. Scenarios: Data virtualization
By leveraging SQL Server PolyBase, SQL Server big data clusters can query external data sources without moving or copying the data.
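One of the new SQL Server 2019 connectors in action: an external table over an Oracle schema might be declared roughly as follows (host, credential, service, and table names are hypothetical, and the exact LOCATION format and type mapping should be checked against the PolyBase documentation). This is the pb_oracle.countries table joined in the demo query earlier:

CREATE EXTERNAL DATA SOURCE pb_oracle_src
WITH (LOCATION = 'oracle://oracle-host:1521', CREDENTIAL = pb_oracle_cred);

-- Column types must follow the PolyBase type mapping for Oracle
CREATE EXTERNAL TABLE pb_oracle.countries
(
    country_id   CHAR(2)      NOT NULL,
    country_name NVARCHAR(40),
    region_id    INT
)
WITH (LOCATION = 'XE.HR.COUNTRIES', DATA_SOURCE = pb_oracle_src);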
22. Scenarios: Data Lake
A SQL Server big data cluster includes a scalable HDFS storage pool. This can be used to store big data, potentially ingested from multiple external sources. Once the big data is stored in HDFS in the big data cluster, you can analyze and query the data and combine it with your relational data.
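Inside a big data cluster, the HDFS storage pool is reachable from the SQL Server master instance through an external data source; a rough sketch of querying a CSV file landed in HDFS (the SqlStoragePool name, file path, and column list are assumptions based on the preview documentation):

-- Assumes an external data source named SqlStoragePool pointing at the cluster's HDFS
CREATE EXTERNAL FILE FORMAT csv_format
WITH (FORMAT_TYPE = DELIMITEDTEXT,
      FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '"', FIRST_ROW = 2));

CREATE EXTERNAL TABLE dbo.hdfs_clickstream
(
    event_time DATETIME2,
    url        NVARCHAR(400),
    user_id    INT
)
WITH (DATA_SOURCE = SqlStoragePool,
      LOCATION = '/clickstream/2019',
      FILE_FORMAT = csv_format);

-- Combine big data in HDFS with relational data in the master instance
SELECT TOP (100) c.user_id, c.url, p.first_name, p.last_name
FROM dbo.hdfs_clickstream AS c
INNER JOIN dbo.persons AS p ON p.user_id = c.user_id;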
23. Scenarios: Scale-out data mart
SQL Server big data clusters provide scale-out compute and storage to improve the performance of analyzing any data. Data from a variety of sources can be ingested and distributed across data pool nodes as a cache for further analysis.
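Ingesting into the data pool follows a similar pattern; a heavily hedged sketch based on the preview documentation, reusing the hypothetical dbo.hdfs_clickstream table from the previous example (the SqlDataPool data source name is an assumption):

-- External table distributed across the data pool instances
CREATE EXTERNAL TABLE dbo.web_clickstream_cached
(
    event_time DATETIME2,
    url        NVARCHAR(400),
    user_id    INT
)
WITH (DATA_SOURCE = SqlDataPool, DISTRIBUTION = ROUND_ROBIN);

-- Cache rows from a T-SQL query (or a Spark job) in the data pool for fast analysis
INSERT INTO dbo.web_clickstream_cached
SELECT event_time, url, user_id
FROM dbo.hdfs_clickstream;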
24. Scenarios: Integrated AI and ML
SQL Server big data clusters enable AI and machine learning tasks on the data stored in HDFS storage pools and the data pools. You can use Spark as well as built-in AI tools in SQL Server, using R, Python, Scala, or Java.
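On the SQL Server side, the built-in route is Machine Learning Services with sp_execute_external_script; a minimal sketch assuming the Python runtime is installed and using a hypothetical dbo.sales_local table with user_id and amount columns:

EXEC sp_execute_external_script
    @language = N'Python',
    @script = N'
# InputDataSet arrives as a pandas DataFrame; whatever is assigned to OutputDataSet is returned
OutputDataSet = InputDataSet.groupby("user_id", as_index=False)["amount"].sum()
',
    @input_data_1 = N'SELECT user_id, amount FROM dbo.sales_local'
WITH RESULT SETS ((user_id INT, total_amount FLOAT));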
25. MS SQL Server 2019 & Big Data Processing
The end
Q&A
THANK YOU
Editor's Notes
Big Data Clusters
The latest version simplifies big data analytics for SQL Server users. The new SQL Server combines HDFS (the Hadoop Distributed File System) and Apache Spark and provides one integrated system. It provides the facility of data virtualization by integrating data without extracting, transforming, and loading it. Big data clusters are difficult to deploy, but if you have Kubernetes infrastructure, a single command will deploy your big data cluster in about half an hour.
PolyBase is Microsoft's newest technology for connecting to remote servers. It started by letting you connect to Hadoop and has expanded since then to include Azure Blob Storage. PolyBase is also the best method to load data into Azure SQL Data Warehouse. The PolyBase product, which existed in earlier versions too, has been expanded. SQL Server can now support queries from external sources like Oracle, Teradata, and MongoDB, which as a result increases the flexibility of SQL Server.
Polybase lets SQL Server compute nodes talk directly to Hadoop data nodes, perform aggregations, and then return results to the head node. This removes the classic SQL Server single point of contention.
Kubernetes enables you to use the cluster as if it were a single PC. You don't need to care about the details of the infrastructure. Just declare what you want in a YAML file, and you will get what you want.
Cluster: A Kubernetes cluster is a set of machines, known as nodes. One node controls the cluster and is designated the master node; the remaining nodes are worker nodes. The Kubernetes master is responsible for distributing work between the workers, and for monitoring the health of the cluster.
Node: A node runs containerized applications. It can be either a physical machine or a virtual machine. A Kubernetes cluster can contain a mixture of physical machine and virtual machine nodes.
Pod: A pod is the atomic deployment unit of Kubernetes. A pod is a logical group of one or more containers, and associated resources, needed to run an application. Each pod runs on a node; a node can run one or more pods. The Kubernetes master automatically assigns pods to nodes in the cluster.
In SQL Server big data clusters, Kubernetes is responsible for the state of the SQL Server big data clusters; Kubernetes builds and configures the cluster nodes, assigns pods to nodes, and monitors the health of the cluster.
A SQL Server big data cluster is a cluster of Linux containers orchestrated by Kubernetes.
Starting with SQL Server 2019 preview, SQL Server big data clusters allow you to deploy scalable clusters of SQL Server, Spark, and HDFS containers running on Kubernetes. These components are running side by side to enable you to read, write, and process big data from Transact-SQL or Spark, allowing you to easily combine and analyze your high-value relational data with high-volume big data.
Control plane
The control plane provides management and security for the cluster. It contains the Kubernetes master, the SQL Server master instance, and other cluster-level services such as the Hive Metastore and Spark Driver.
Compute plane
The compute plane provides computational resources to the cluster. It contains nodes running SQL Server on Linux pods. The pods in the compute plane are divided into compute pools for specific processing tasks. A compute pool can act as a PolyBase scale-out group for distributed queries over different data sources, such as HDFS, Oracle, MongoDB, or Teradata.
Data plane
The data plane is used for data persistence and caching. It contains the SQL data pool and the storage pool. The SQL data pool consists of one or more pods running SQL Server on Linux. It is used to ingest data from SQL queries or Spark jobs. SQL Server big data cluster data marts are persisted in the data pool. The storage pool consists of storage pool pods comprising SQL Server on Linux, Spark, and HDFS. All the storage nodes in a SQL Server big data cluster are members of an HDFS cluster.
Data virtualization:
By leveraging SQL Server PolyBase, SQL Server big data clusters can query external data sources without moving or copying the data. SQL Server 2019 preview introduces new connectors to data sources.
Data lake
A SQL Server big data cluster includes a scalable HDFS storage pool. This can be used to store big data, potentially ingested from multiple external sources. Once the big data is stored in HDFS in the big data cluster, you can analyze and query the data and combine it with your relational data.
Scale-out data mart
SQL Server big data clusters provide scale-out compute and storage to improve the performance of analyzing any data. Data from a variety of sources can be ingested and distributed across data pool nodes as a cache for further analysis.
Integrated AI and Machine Learning
SQL Server big data clusters enable AI and machine learning tasks on the data stored in HDFS storage pools and the data pools. You can use Spark as well as built-in AI tools in SQL Server, using R, Python, Scala, or Java.