PSSUG Nov 2012: Big Data with SQL Server

Mark Kromer

Mark Kromer's presentation of Big Data Analytics with Hadoop, Teradata, SQL Server, Tableau, SAS & PowerPivot

What we’ll (try to) cover tonight

‣ What is Big Data?
‣ The Big Data and Apache Hadoop environment
‣ Big Data Analytics
‣ SQL Server in the Big Data world
‣ How we utilize Big Data @ Razorfish

Big Data 101

‣ 3 V’s
  ‣ Volume – Terabytes of records, transactions, tables, files
  ‣ Velocity – Batch, near-time, real-time (analytics), streams
  ‣ Variety – Structured, unstructured, semi-structured, and all of the above in a mix
‣ Text Processing
  ‣ Techniques for processing and analyzing LARGE unstructured (and structured) files
‣ Analytics & Insights
‣ Distributed File System & Programming

Mark’s Big Data Myths

‣ Big Data ≠ NoSQL
  ‣ NoSQL shares the Internet-scale Web origins of the Hadoop stack (Yahoo!, Google, Facebook, et al.), but it is not the same thing
  ‣ Facebook, for example, uses HBase from the Hadoop stack
‣ Big Data ≠ Real Time
  ‣ Big Data is primarily about batch processing huge files in a distributed manner and analyzing data that was otherwise too complex to provide value
  ‣ Use in-memory analytics for real-time insights
‣ Big Data ≠ Data Warehouse
  ‣ I still refer to large multi-TB DWs as “VLDB”
  ‣ Big Data is about crunching stats in text files to discover new patterns and insights
  ‣ Use the DW to aggregate and store the summaries of those calculations for reporting

‣ Batch Processing
‣ Commodity Hardware
‣ Data Locality, no shared storage
‣ Scales linearly
‣ Great for large text file processing, not so great on small files
‣ Distributed programming paradigm
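
The "distributed programming paradigm" on this slide is not named; assuming it refers to Hadoop's MapReduce model, a minimal single-process sketch of the map, shuffle, and reduce steps could look like the following. The function names are illustrative only and are not part of any Hadoop API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Emit (word, 1) pairs for one chunk of text, as a mapper would."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts for each word, as a reducer would."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    # Each chunk is mapped independently of the others; on a cluster these
    # calls would run on the nodes that already hold the data (data locality).
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["big data big files", "big clusters small files"]))
```

Because every chunk is mapped without reference to any other chunk, adding nodes adds map capacity, which is the intuition behind the "scales linearly" bullet. The fixed per-task overhead of a real cluster is also why many small files are a poor fit.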

In-Database Analytics (Teradata Aster)
• Because of built-in analytics functions and big data performance, Aster becomes the data scientist’s sandbox and BI’s big data analytics processor.

Prepackaged Analytics Functions (including Attribution)

SQL Server Big Data – Data Loading

Amazon HDFS & EMR Data Loading

Amazon S3 Bucket

SQL Server Big Data Environment

‣ SQL Server Database
  ‣ SQL Server 2008 R2 or 2012 Enterprise Edition
  ‣ Page Compression
  ‣ 2012 Columnar Compression on Fact Tables
  ‣ Clustered Index on all tables
  ‣ Auto-update statistics asynchronously
  ‣ Partition Fact Tables by month and archive data with the sliding window technique
  ‣ Drop all indexes before nightly ETL load jobs
  ‣ Rebuild all indexes when ETL completes
‣ SQL Server Analysis Services
  ‣ SSAS 2008 R2 or 2012 Enterprise Edition
  ‣ 2008 R2: OLAP cubes partition-aligned with the DW
  ‣ 2012: in-memory tabular cubes
  ‣ All access through MSMDPUMP or SharePoint
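
The slides do not show how the monthly sliding window is implemented. As a rough sketch only, the Python below generates the T-SQL for one monthly slide, assuming a RANGE RIGHT partition function with first-of-month boundaries. The object names (`pf_FactByMonth`, `ps_FactByMonth`, `dbo.FactWebEvents`, `dbo.FactWebEvents_Archive`) and the `PRIMARY` filegroup are hypothetical, and the generated statements have not been run against a server.

```python
from datetime import date


def month_start(year, month, offset=0):
    """Return the first day of the month `offset` months from (year, month)."""
    index = year * 12 + (month - 1) + offset
    return date(index // 12, index % 12 + 1, 1)


def sliding_window_statements(current, keep_months,
                              function_name="pf_FactByMonth",   # hypothetical
                              scheme_name="ps_FactByMonth",     # hypothetical
                              fact_table="dbo.FactWebEvents",   # hypothetical
                              archive_table="dbo.FactWebEvents_Archive",
                              filegroup="PRIMARY"):
    """Build T-SQL for one monthly slide, assuming RANGE RIGHT boundaries."""
    # Boundary that opens next month's partition.
    new_boundary = month_start(current.year, current.month, 1).isoformat()
    # Boundary that opens the oldest month, which falls out of the window.
    old_boundary = month_start(current.year, current.month,
                               -keep_months).isoformat()
    return [
        # 1. Move the oldest month's rows to an empty archive table.
        f"ALTER TABLE {fact_table} SWITCH PARTITION "
        f"$PARTITION.{function_name}('{old_boundary}') TO {archive_table};",
        # 2. Remove the now-empty partition's boundary.
        f"ALTER PARTITION FUNCTION {function_name}() "
        f"MERGE RANGE ('{old_boundary}');",
        # 3. Tell the scheme where the next partition will live.
        f"ALTER PARTITION SCHEME {scheme_name} NEXT USED [{filegroup}];",
        # 4. Add a boundary for the upcoming month.
        f"ALTER PARTITION FUNCTION {function_name}() "
        f"SPLIT RANGE ('{new_boundary}');",
    ]


if __name__ == "__main__":
    for statement in sliding_window_statements(date(2012, 11, 15), 12):
        print(statement)
```

Switching a partition out is a metadata operation rather than a row-by-row delete, which is why this pattern suits a nightly or monthly archive job on large fact tables.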

Wrap-up

‣ What is a Big Data approach to Analytics?
  ‣ Massive scale
  ‣ Data discovery & research
  ‣ Self-service
  ‣ Reporting & BI
‣ Why did we take this Big Data Analytics approach?
  ‣ Each Web client produces an average of 6 TB of ICA data in a year
  ‣ The data in the sources are variable and unstructured
  ‣ SSIS ETL alone couldn’t keep up or handle the complexity
‣ SQL Server 2012 columnstore and tabular SSAS 2012 were key to using SQL Server for Big Data
  ‣ With the configs mentioned previously, SQL Server is working great
‣ Analytics on Big Data also requires Big Data Analytics tools
  ‣ Aster, Tableau, PowerPivot, SAS
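
To put the 6 TB per client per year figure in daily terms, here is a small back-of-the-envelope calculation. It assumes decimal units (1 TB = 1000 GB) and an even spread over 365 days, neither of which the slides state; the client count in the second call is purely illustrative.

```python
TB_PER_CLIENT_PER_YEAR = 6   # figure quoted on the slide
GB_PER_TB = 1000             # assumption: decimal units
DAYS_PER_YEAR = 365          # assumption: load spread evenly over the year


def daily_gb(clients):
    """Average GB arriving per day for the given number of Web clients."""
    return clients * TB_PER_CLIENT_PER_YEAR * GB_PER_TB / DAYS_PER_YEAR


if __name__ == "__main__":
    print(round(daily_gb(1), 1))   # roughly 16.4 GB per day for one client
    print(round(daily_gb(10), 1))  # ten clients is an illustrative number
```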

Big Data with SQL Server
Philly SQL Server User Group, November 2012
Mark Kromer, Razorfish BI & Big Data Technology Director
http://www.kromerbigdata.com
@kromerbigdata @mssqldude

6. Big Data Analytics Web Platform
