SlideShare a Scribd company logo
1 of 10
Big Data with SQL Server


Philly SQL Server User Group
November 2012



Mark Kromer
Razorfish BI & Big Data Technology Director
http://www.kromerbigdata.com
@kromerbigdata
@mssqldude
What we’ll (try) to cover tonight

‣ What is Big Data?
‣ The Big Data and Apache Hadoop environment
‣ Big Data Analytics
‣ SQL Server in the Big Data world
‣ How we utilize Big Data @ Razorfish




                                               2
Big Data 101

‣ 3 V’s
   ‣ Volume – Terabyte records, transactions, tables, files
   ‣ Velocity – Batch, near-time, real-time (analytics), streams.
   ‣ Variety – Structures, unstructured, semi-structured, and all the above in a mix
‣ Text Processing
   ‣ Techniques for processing and analyzing unstructured (and structured) LARGE files
‣ Analytics & Insights
‣ Distributed File System & Programming
Mark’s Big Data Myths

‣ Big Data ≠ NoSQL
    ‣ NoSQL has similar Internet-scale Web origins of Hadoop stack (Yahoo!,
      Google, Facebook, et al) but not the same thing
    ‣ Facebook, for example, uses Hbase from the Hadoop stack
‣ Big Data ≠ Real Time
    ‣ Big Data is primarily about batch processing huge files in a distributed manner
      and analyzing data that was otherwise too complex to provide value
    ‣ Use in-memory analytics for real time insights
‣ Big Data ≠ Data Warehouse
    ‣ I still refer to large multi-TB DWs as “VLDB”
    ‣ Big Data is about crunching stats in text files for discovery of new patterns and
      insights
    ‣ Use the DW to aggregate and store the summaries of those calculations for
      reporting
‣   Batch Processing
‣   Commodity Hardware
‣   Data Locality, no shared storage
‣   Scales linearly
‣   Great for large text file processing, not so great on small files
‣   Distributed programming paradigm
Big Data Analytics Web Platform
In-Database Analytics (Teradata Aster)
•   Because of built-in analytics functions and big data performance, Aster becomes
    the data scientist’s sandbox and BI’s big data analytics processor.




                                                             Prepackaged Analytics
                                                             Functions (including Attribution)
SQL Server Big Data – Data Loading




Amazon HDFS & EMR          Data Loading




                Amazon S3 Bucket
SQL Server Big Data Environment

‣ SQL Server Database
   ‣   SQL Server 2008 R2 or 2012 Enterprise Edition
   ‣   Page Compression
   ‣   2012 Columnar Compression on Fact Tables
   ‣   Clustered Index on all tables
   ‣   Auto-update Stats Asynch
   ‣   Partition Fact Tables by month and archive data with sliding window technique
   ‣   Drop all indexes before nightly ETL load jobs
   ‣   Rebuild all indexes when ETL completes
‣ SQL Server Analysis Services
   ‣   SSAS 2008 R2 or 2012 Enterprise Edition
   ‣   2008 R2 OLAP cubes partition-aligned with DW
   ‣   2012 cubes in-memory tabular cubes
   ‣   All access through MSMDPUMP or SharePoint
Wrap-up

‣ What is a Big Data approach to Analytics?
   ‣ Massive scale
   ‣ Data discovery & research
   ‣ Self-service
   ‣ Reporting & BI
‣ Why did we take this Big Data Analytics approach?
   ‣ Each Web client produces an average of 6 TBs of ICA data in a year
   ‣ The data in the sources are variable and unstructured
   ‣ SSIS ETL alone couldn’t keep up or handle complexity
   ‣ SQL Server 2012 columnstore and tabular SSAS 2012 were key to using SQL
       Server for Big Data
    ‣ With the configs mentioned previously, SQL Server is working great
‣ Analytics on Big Data also requires Big Data Analytics tools
    ‣ Aster, Tableau, PowerPivot, SAS

More Related Content

What's hot

Database Choices
Database ChoicesDatabase Choices
Database ChoicesLynn Langit
 
Дмитрий Лавриненко "Big & Fast Data for Identity & Telemetry services"
Дмитрий Лавриненко "Big & Fast Data for Identity & Telemetry services"Дмитрий Лавриненко "Big & Fast Data for Identity & Telemetry services"
Дмитрий Лавриненко "Big & Fast Data for Identity & Telemetry services"Fwdays
 
Eugene Polonichko "Architecture of modern data warehouse"
Eugene Polonichko "Architecture of modern data warehouse"Eugene Polonichko "Architecture of modern data warehouse"
Eugene Polonichko "Architecture of modern data warehouse"Lviv Startup Club
 
Azure Big Data Story
Azure Big Data StoryAzure Big Data Story
Azure Big Data StoryLynn Langit
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture DesignKujambu Murugesan
 
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...Databricks
 
Serverless data lake architecture
Serverless data lake architectureServerless data lake architecture
Serverless data lake architectureMaik Wiesmüller
 
Get Savvy with Snowflake
Get Savvy with SnowflakeGet Savvy with Snowflake
Get Savvy with SnowflakeMatillion
 
Integration Monday - Analysing StackExchange data with Azure Data Lake
Integration Monday - Analysing StackExchange data with Azure Data LakeIntegration Monday - Analysing StackExchange data with Azure Data Lake
Integration Monday - Analysing StackExchange data with Azure Data LakeTom Kerkhove
 
Presto for apps deck varada prestoconf
Presto for apps deck varada prestoconfPresto for apps deck varada prestoconf
Presto for apps deck varada prestoconfOri Reshef
 
Laboratorio práctico: Data warehouse en la nube
Laboratorio práctico: Data warehouse en la nubeLaboratorio práctico: Data warehouse en la nube
Laboratorio práctico: Data warehouse en la nubeSoftware Guru
 
TechDays NL 2016 - Building your scalable secure IoT Solution on Azure
TechDays NL 2016 - Building your scalable secure IoT Solution on AzureTechDays NL 2016 - Building your scalable secure IoT Solution on Azure
TechDays NL 2016 - Building your scalable secure IoT Solution on AzureTom Kerkhove
 
DBP-010_Using Azure Data Services for Modern Data Applications
DBP-010_Using Azure Data Services for Modern Data ApplicationsDBP-010_Using Azure Data Services for Modern Data Applications
DBP-010_Using Azure Data Services for Modern Data Applicationsdecode2016
 
Azure Data Lake Store and Analytics
Azure Data Lake Store and AnalyticsAzure Data Lake Store and Analytics
Azure Data Lake Store and AnalyticsSergio Zenatti Filho
 
Altis AWS Snowflake Practice
Altis AWS Snowflake PracticeAltis AWS Snowflake Practice
Altis AWS Snowflake PracticeSamanthaSwain7
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWSGary Stafford
 
C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...
C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...
C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...DataStax Academy
 
Unleash the Power of Azure Data Factory - SQL User Group
Unleash the Power of Azure Data Factory - SQL User GroupUnleash the Power of Azure Data Factory - SQL User Group
Unleash the Power of Azure Data Factory - SQL User GroupSergio Zenatti Filho
 

What's hot (20)

Database Choices
Database ChoicesDatabase Choices
Database Choices
 
Дмитрий Лавриненко "Big & Fast Data for Identity & Telemetry services"
Дмитрий Лавриненко "Big & Fast Data for Identity & Telemetry services"Дмитрий Лавриненко "Big & Fast Data for Identity & Telemetry services"
Дмитрий Лавриненко "Big & Fast Data for Identity & Telemetry services"
 
Eugene Polonichko "Architecture of modern data warehouse"
Eugene Polonichko "Architecture of modern data warehouse"Eugene Polonichko "Architecture of modern data warehouse"
Eugene Polonichko "Architecture of modern data warehouse"
 
Azure Big Data Story
Azure Big Data StoryAzure Big Data Story
Azure Big Data Story
 
Big data in Azure
Big data in AzureBig data in Azure
Big data in Azure
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture Design
 
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
Columbia Migrates from Legacy Data Warehouse to an Open Data Platform with De...
 
Serverless data lake architecture
Serverless data lake architectureServerless data lake architecture
Serverless data lake architecture
 
Get Savvy with Snowflake
Get Savvy with SnowflakeGet Savvy with Snowflake
Get Savvy with Snowflake
 
Integration Monday - Analysing StackExchange data with Azure Data Lake
Integration Monday - Analysing StackExchange data with Azure Data LakeIntegration Monday - Analysing StackExchange data with Azure Data Lake
Integration Monday - Analysing StackExchange data with Azure Data Lake
 
Presto for apps deck varada prestoconf
Presto for apps deck varada prestoconfPresto for apps deck varada prestoconf
Presto for apps deck varada prestoconf
 
Laboratorio práctico: Data warehouse en la nube
Laboratorio práctico: Data warehouse en la nubeLaboratorio práctico: Data warehouse en la nube
Laboratorio práctico: Data warehouse en la nube
 
TechDays NL 2016 - Building your scalable secure IoT Solution on Azure
TechDays NL 2016 - Building your scalable secure IoT Solution on AzureTechDays NL 2016 - Building your scalable secure IoT Solution on Azure
TechDays NL 2016 - Building your scalable secure IoT Solution on Azure
 
DBP-010_Using Azure Data Services for Modern Data Applications
DBP-010_Using Azure Data Services for Modern Data ApplicationsDBP-010_Using Azure Data Services for Modern Data Applications
DBP-010_Using Azure Data Services for Modern Data Applications
 
Redshift VS BigQuery
Redshift VS BigQueryRedshift VS BigQuery
Redshift VS BigQuery
 
Azure Data Lake Store and Analytics
Azure Data Lake Store and AnalyticsAzure Data Lake Store and Analytics
Azure Data Lake Store and Analytics
 
Altis AWS Snowflake Practice
Altis AWS Snowflake PracticeAltis AWS Snowflake Practice
Altis AWS Snowflake Practice
 
Building a Data Lake on AWS
Building a Data Lake on AWSBuilding a Data Lake on AWS
Building a Data Lake on AWS
 
C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...
C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...
C* Summit 2013: Optimizing the Public Cloud for Cost and Scalability with Cas...
 
Unleash the Power of Azure Data Factory - SQL User Group
Unleash the Power of Azure Data Factory - SQL User GroupUnleash the Power of Azure Data Factory - SQL User Group
Unleash the Power of Azure Data Factory - SQL User Group
 

Viewers also liked

Microsoft Cloud BI Update 2012 for SQL Saturday Philly
Microsoft Cloud BI Update 2012 for SQL Saturday PhillyMicrosoft Cloud BI Update 2012 for SQL Saturday Philly
Microsoft Cloud BI Update 2012 for SQL Saturday PhillyMark Kromer
 
Philly Code Camp 2013 Mark Kromer Big Data with SQL Server
Philly Code Camp 2013 Mark Kromer Big Data with SQL ServerPhilly Code Camp 2013 Mark Kromer Big Data with SQL Server
Philly Code Camp 2013 Mark Kromer Big Data with SQL ServerMark Kromer
 
What's new in SQL Server 2012 for philly code camp 2012.1
What's new in SQL Server 2012 for philly code camp 2012.1What's new in SQL Server 2012 for philly code camp 2012.1
What's new in SQL Server 2012 for philly code camp 2012.1Mark Kromer
 
Microsoft Event Registration System Hosted on Windows Azure
Microsoft Event Registration System Hosted on Windows AzureMicrosoft Event Registration System Hosted on Windows Azure
Microsoft Event Registration System Hosted on Windows AzureMark Kromer
 
Big Data in the Cloud with Azure Marketplace Images
Big Data in the Cloud with Azure Marketplace ImagesBig Data in the Cloud with Azure Marketplace Images
Big Data in the Cloud with Azure Marketplace ImagesMark Kromer
 
Big Data with SQL Server
Big Data with SQL ServerBig Data with SQL Server
Big Data with SQL ServerMark Kromer
 
Pentaho Big Data Analytics with Vertica and Hadoop
Pentaho Big Data Analytics with Vertica and HadoopPentaho Big Data Analytics with Vertica and Hadoop
Pentaho Big Data Analytics with Vertica and HadoopMark Kromer
 
Anexinet Big Data Solutions
Anexinet Big Data SolutionsAnexinet Big Data Solutions
Anexinet Big Data SolutionsMark Kromer
 
Big Data in the Real World
Big Data in the Real WorldBig Data in the Real World
Big Data in the Real WorldMark Kromer
 
Pentaho Analytics on MongoDB
Pentaho Analytics on MongoDBPentaho Analytics on MongoDB
Pentaho Analytics on MongoDBMark Kromer
 
Big Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoBig Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoMark Kromer
 
Sql server 2012 roadshow masd overview 003
Sql server 2012 roadshow masd overview 003Sql server 2012 roadshow masd overview 003
Sql server 2012 roadshow masd overview 003Mark Kromer
 
Microsoft SQL Server Data Warehouses for SQL Server DBAs
Microsoft SQL Server Data Warehouses for SQL Server DBAsMicrosoft SQL Server Data Warehouses for SQL Server DBAs
Microsoft SQL Server Data Warehouses for SQL Server DBAsMark Kromer
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerMark Kromer
 
ETL in the Cloud With Microsoft Azure
ETL in the Cloud With Microsoft AzureETL in the Cloud With Microsoft Azure
ETL in the Cloud With Microsoft AzureMark Kromer
 
Azure cafe marketplace with looker data analytics
Azure cafe marketplace with looker data analyticsAzure cafe marketplace with looker data analytics
Azure cafe marketplace with looker data analyticsMark Kromer
 
AWS vs Azure - Cloud Services Comparison
AWS vs Azure - Cloud Services ComparisonAWS vs Azure - Cloud Services Comparison
AWS vs Azure - Cloud Services ComparisonAniket Kanitkar
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureMark Kromer
 

Viewers also liked (20)

Microsoft Cloud BI Update 2012 for SQL Saturday Philly
Microsoft Cloud BI Update 2012 for SQL Saturday PhillyMicrosoft Cloud BI Update 2012 for SQL Saturday Philly
Microsoft Cloud BI Update 2012 for SQL Saturday Philly
 
Philly Code Camp 2013 Mark Kromer Big Data with SQL Server
Philly Code Camp 2013 Mark Kromer Big Data with SQL ServerPhilly Code Camp 2013 Mark Kromer Big Data with SQL Server
Philly Code Camp 2013 Mark Kromer Big Data with SQL Server
 
What's new in SQL Server 2012 for philly code camp 2012.1
What's new in SQL Server 2012 for philly code camp 2012.1What's new in SQL Server 2012 for philly code camp 2012.1
What's new in SQL Server 2012 for philly code camp 2012.1
 
Microsoft Event Registration System Hosted on Windows Azure
Microsoft Event Registration System Hosted on Windows AzureMicrosoft Event Registration System Hosted on Windows Azure
Microsoft Event Registration System Hosted on Windows Azure
 
Big Data in the Cloud with Azure Marketplace Images
Big Data in the Cloud with Azure Marketplace ImagesBig Data in the Cloud with Azure Marketplace Images
Big Data in the Cloud with Azure Marketplace Images
 
MEC Data sheet
MEC Data sheetMEC Data sheet
MEC Data sheet
 
Big Data with SQL Server
Big Data with SQL ServerBig Data with SQL Server
Big Data with SQL Server
 
Pentaho Big Data Analytics with Vertica and Hadoop
Pentaho Big Data Analytics with Vertica and HadoopPentaho Big Data Analytics with Vertica and Hadoop
Pentaho Big Data Analytics with Vertica and Hadoop
 
Anexinet Big Data Solutions
Anexinet Big Data SolutionsAnexinet Big Data Solutions
Anexinet Big Data Solutions
 
Big Data in the Real World
Big Data in the Real WorldBig Data in the Real World
Big Data in the Real World
 
Pentaho Analytics on MongoDB
Pentaho Analytics on MongoDBPentaho Analytics on MongoDB
Pentaho Analytics on MongoDB
 
Big Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with PentahoBig Data Analytics Projects - Real World with Pentaho
Big Data Analytics Projects - Real World with Pentaho
 
Sql server 2012 roadshow masd overview 003
Sql server 2012 roadshow masd overview 003Sql server 2012 roadshow masd overview 003
Sql server 2012 roadshow masd overview 003
 
Microsoft SQL Server Data Warehouses for SQL Server DBAs
Microsoft SQL Server Data Warehouses for SQL Server DBAsMicrosoft SQL Server Data Warehouses for SQL Server DBAs
Microsoft SQL Server Data Warehouses for SQL Server DBAs
 
Azure vs. amazon
Azure vs. amazonAzure vs. amazon
Azure vs. amazon
 
Big Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL ServerBig Data Analytics with Hadoop, MongoDB and SQL Server
Big Data Analytics with Hadoop, MongoDB and SQL Server
 
ETL in the Cloud With Microsoft Azure
ETL in the Cloud With Microsoft AzureETL in the Cloud With Microsoft Azure
ETL in the Cloud With Microsoft Azure
 
Azure cafe marketplace with looker data analytics
Azure cafe marketplace with looker data analyticsAzure cafe marketplace with looker data analytics
Azure cafe marketplace with looker data analytics
 
AWS vs Azure - Cloud Services Comparison
AWS vs Azure - Cloud Services ComparisonAWS vs Azure - Cloud Services Comparison
AWS vs Azure - Cloud Services Comparison
 
Big Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft AzureBig Data Analytics in the Cloud with Microsoft Azure
Big Data Analytics in the Cloud with Microsoft Azure
 

Similar to PSSUG Nov 2012: Big Data with SQL Server

AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...Amazon Web Services
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouseJames Serra
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake OverviewJames Serra
 
Data Con LA 2018 - A tale of two BI standards: Data warehouses and data lakes...
Data Con LA 2018 - A tale of two BI standards: Data warehouses and data lakes...Data Con LA 2018 - A tale of two BI standards: Data warehouses and data lakes...
Data Con LA 2018 - A tale of two BI standards: Data warehouses and data lakes...Data Con LA
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Martin Bém
 
Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2Carole Gunst
 
A Tale of 2 BI Standards: One for Data Warehouses and One for Data Lakes
A Tale of 2 BI Standards: One for Data Warehouses and One for Data LakesA Tale of 2 BI Standards: One for Data Warehouses and One for Data Lakes
A Tale of 2 BI Standards: One for Data Warehouses and One for Data LakesArcadia Data
 
A Tale of Two BI Standards
A Tale of Two BI StandardsA Tale of Two BI Standards
A Tale of Two BI StandardsArcadia Data
 
IBM THINK 2019 - What? I Don't Need a Database to Do All That with SQL?
IBM THINK 2019 - What? I Don't Need a Database to Do All That with SQL?IBM THINK 2019 - What? I Don't Need a Database to Do All That with SQL?
IBM THINK 2019 - What? I Don't Need a Database to Do All That with SQL?Torsten Steinbach
 
Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKES
Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKESBig Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKES
Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKESMatt Stubbs
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataAshnikbiz
 
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Amazon Web Services
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data AnalyticsAttunity
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?James Serra
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Next Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon ThomasNext Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon ThomasThoughtworks
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - DatalakeLam Le
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's includedJames Serra
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...Rittman Analytics
 

Similar to PSSUG Nov 2012: Big Data with SQL Server (20)

AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
AWS Partner Webcast - Analyze Big Data for Consumer Applications with Looker ...
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Data Con LA 2018 - A tale of two BI standards: Data warehouses and data lakes...
Data Con LA 2018 - A tale of two BI standards: Data warehouses and data lakes...Data Con LA 2018 - A tale of two BI standards: Data warehouses and data lakes...
Data Con LA 2018 - A tale of two BI standards: Data warehouses and data lakes...
 
Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27Prague data management meetup 2018-03-27
Prague data management meetup 2018-03-27
 
Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2Streaming Real-time Data to Azure Data Lake Storage Gen 2
Streaming Real-time Data to Azure Data Lake Storage Gen 2
 
A Tale of 2 BI Standards: One for Data Warehouses and One for Data Lakes
A Tale of 2 BI Standards: One for Data Warehouses and One for Data LakesA Tale of 2 BI Standards: One for Data Warehouses and One for Data Lakes
A Tale of 2 BI Standards: One for Data Warehouses and One for Data Lakes
 
A Tale of Two BI Standards
A Tale of Two BI StandardsA Tale of Two BI Standards
A Tale of Two BI Standards
 
IBM THINK 2019 - What? I Don't Need a Database to Do All That with SQL?
IBM THINK 2019 - What? I Don't Need a Database to Do All That with SQL?IBM THINK 2019 - What? I Don't Need a Database to Do All That with SQL?
IBM THINK 2019 - What? I Don't Need a Database to Do All That with SQL?
 
Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKES
Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKESBig Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKES
Big Data LDN 2018: A TALE OF TWO BI STANDARDS: DATA WAREHOUSES AND DATA LAKES
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
 
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
Building Data Warehouses and Data Lakes in the Cloud - DevDay Austin 2017 Day 2
 
Meetup Oracle Database BCN: 2.1 Data Management Trends
Meetup Oracle Database BCN: 2.1 Data Management TrendsMeetup Oracle Database BCN: 2.1 Data Management Trends
Meetup Oracle Database BCN: 2.1 Data Management Trends
 
Accelerating Big Data Analytics
Accelerating Big Data AnalyticsAccelerating Big Data Analytics
Accelerating Big Data Analytics
 
Is the traditional data warehouse dead?
Is the traditional data warehouse dead?Is the traditional data warehouse dead?
Is the traditional data warehouse dead?
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Next Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon ThomasNext Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon Thomas
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - Datalake
 
Microsoft Data Platform - What's included
Microsoft Data Platform - What's includedMicrosoft Data Platform - What's included
Microsoft Data Platform - What's included
 
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
New World Hadoop Architectures (& What Problems They Really Solve) for Oracle...
 

More from Mark Kromer

Fabric Data Factory Pipeline Copy Perf Tips.pptx
Fabric Data Factory Pipeline Copy Perf Tips.pptxFabric Data Factory Pipeline Copy Perf Tips.pptx
Fabric Data Factory Pipeline Copy Perf Tips.pptxMark Kromer
 
Build data quality rules and data cleansing into your data pipelines
Build data quality rules and data cleansing into your data pipelinesBuild data quality rules and data cleansing into your data pipelines
Build data quality rules and data cleansing into your data pipelinesMark Kromer
 
Mapping Data Flows Training deck Q1 CY22
Mapping Data Flows Training deck Q1 CY22Mapping Data Flows Training deck Q1 CY22
Mapping Data Flows Training deck Q1 CY22Mark Kromer
 
Data cleansing and prep with synapse data flows
Data cleansing and prep with synapse data flowsData cleansing and prep with synapse data flows
Data cleansing and prep with synapse data flowsMark Kromer
 
Data cleansing and data prep with synapse data flows
Data cleansing and data prep with synapse data flowsData cleansing and data prep with synapse data flows
Data cleansing and data prep with synapse data flowsMark Kromer
 
Mapping Data Flows Training April 2021
Mapping Data Flows Training April 2021Mapping Data Flows Training April 2021
Mapping Data Flows Training April 2021Mark Kromer
 
Mapping Data Flows Perf Tuning April 2021
Mapping Data Flows Perf Tuning April 2021Mapping Data Flows Perf Tuning April 2021
Mapping Data Flows Perf Tuning April 2021Mark Kromer
 
Data Lake ETL in the Cloud with ADF
Data Lake ETL in the Cloud with ADFData Lake ETL in the Cloud with ADF
Data Lake ETL in the Cloud with ADFMark Kromer
 
Azure Data Factory Data Wrangling with Power Query
Azure Data Factory Data Wrangling with Power QueryAzure Data Factory Data Wrangling with Power Query
Azure Data Factory Data Wrangling with Power QueryMark Kromer
 
Azure Data Factory Data Flow Performance Tuning 101
Azure Data Factory Data Flow Performance Tuning 101Azure Data Factory Data Flow Performance Tuning 101
Azure Data Factory Data Flow Performance Tuning 101Mark Kromer
 
Data Quality Patterns in the Cloud with ADF
Data Quality Patterns in the Cloud with ADFData Quality Patterns in the Cloud with ADF
Data Quality Patterns in the Cloud with ADFMark Kromer
 
Azure Data Factory Data Flows Training (Sept 2020 Update)
Azure Data Factory Data Flows Training (Sept 2020 Update)Azure Data Factory Data Flows Training (Sept 2020 Update)
Azure Data Factory Data Flows Training (Sept 2020 Update)Mark Kromer
 
Data quality patterns in the cloud with ADF
Data quality patterns in the cloud with ADFData quality patterns in the cloud with ADF
Data quality patterns in the cloud with ADFMark Kromer
 
Azure Data Factory Data Flows Training v005
Azure Data Factory Data Flows Training v005Azure Data Factory Data Flows Training v005
Azure Data Factory Data Flows Training v005Mark Kromer
 
Data Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data FactoryData Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data FactoryMark Kromer
 
ADF Mapping Data Flows Level 300
ADF Mapping Data Flows Level 300ADF Mapping Data Flows Level 300
ADF Mapping Data Flows Level 300Mark Kromer
 
ADF Mapping Data Flows Training V2
ADF Mapping Data Flows Training V2ADF Mapping Data Flows Training V2
ADF Mapping Data Flows Training V2Mark Kromer
 
ADF Mapping Data Flows Training Slides V1
ADF Mapping Data Flows Training Slides V1ADF Mapping Data Flows Training Slides V1
ADF Mapping Data Flows Training Slides V1Mark Kromer
 
ADF Mapping Data Flow Private Preview Migration
ADF Mapping Data Flow Private Preview MigrationADF Mapping Data Flow Private Preview Migration
ADF Mapping Data Flow Private Preview MigrationMark Kromer
 
Azure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the CloudAzure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the CloudMark Kromer
 

More from Mark Kromer (20)

Fabric Data Factory Pipeline Copy Perf Tips.pptx
Fabric Data Factory Pipeline Copy Perf Tips.pptxFabric Data Factory Pipeline Copy Perf Tips.pptx
Fabric Data Factory Pipeline Copy Perf Tips.pptx
 
Build data quality rules and data cleansing into your data pipelines
Build data quality rules and data cleansing into your data pipelinesBuild data quality rules and data cleansing into your data pipelines
Build data quality rules and data cleansing into your data pipelines
 
Mapping Data Flows Training deck Q1 CY22
Mapping Data Flows Training deck Q1 CY22Mapping Data Flows Training deck Q1 CY22
Mapping Data Flows Training deck Q1 CY22
 
Data cleansing and prep with synapse data flows
Data cleansing and prep with synapse data flowsData cleansing and prep with synapse data flows
Data cleansing and prep with synapse data flows
 
Data cleansing and data prep with synapse data flows
Data cleansing and data prep with synapse data flowsData cleansing and data prep with synapse data flows
Data cleansing and data prep with synapse data flows
 
Mapping Data Flows Training April 2021
Mapping Data Flows Training April 2021Mapping Data Flows Training April 2021
Mapping Data Flows Training April 2021
 
Mapping Data Flows Perf Tuning April 2021
Mapping Data Flows Perf Tuning April 2021Mapping Data Flows Perf Tuning April 2021
Mapping Data Flows Perf Tuning April 2021
 
Data Lake ETL in the Cloud with ADF
Data Lake ETL in the Cloud with ADFData Lake ETL in the Cloud with ADF
Data Lake ETL in the Cloud with ADF
 
Azure Data Factory Data Wrangling with Power Query
Azure Data Factory Data Wrangling with Power QueryAzure Data Factory Data Wrangling with Power Query
Azure Data Factory Data Wrangling with Power Query
 
Azure Data Factory Data Flow Performance Tuning 101
Azure Data Factory Data Flow Performance Tuning 101Azure Data Factory Data Flow Performance Tuning 101
Azure Data Factory Data Flow Performance Tuning 101
 
Data Quality Patterns in the Cloud with ADF
Data Quality Patterns in the Cloud with ADFData Quality Patterns in the Cloud with ADF
Data Quality Patterns in the Cloud with ADF
 
Azure Data Factory Data Flows Training (Sept 2020 Update)
Azure Data Factory Data Flows Training (Sept 2020 Update)Azure Data Factory Data Flows Training (Sept 2020 Update)
Azure Data Factory Data Flows Training (Sept 2020 Update)
 
Data quality patterns in the cloud with ADF
Data quality patterns in the cloud with ADFData quality patterns in the cloud with ADF
Data quality patterns in the cloud with ADF
 
Azure Data Factory Data Flows Training v005
Azure Data Factory Data Flows Training v005Azure Data Factory Data Flows Training v005
Azure Data Factory Data Flows Training v005
 
Data Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data FactoryData Quality Patterns in the Cloud with Azure Data Factory
Data Quality Patterns in the Cloud with Azure Data Factory
 
ADF Mapping Data Flows Level 300
ADF Mapping Data Flows Level 300ADF Mapping Data Flows Level 300
ADF Mapping Data Flows Level 300
 
ADF Mapping Data Flows Training V2
ADF Mapping Data Flows Training V2ADF Mapping Data Flows Training V2
ADF Mapping Data Flows Training V2
 
ADF Mapping Data Flows Training Slides V1
ADF Mapping Data Flows Training Slides V1ADF Mapping Data Flows Training Slides V1
ADF Mapping Data Flows Training Slides V1
 
ADF Mapping Data Flow Private Preview Migration
ADF Mapping Data Flow Private Preview MigrationADF Mapping Data Flow Private Preview Migration
ADF Mapping Data Flow Private Preview Migration
 
Azure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the CloudAzure Data Factory ETL Patterns in the Cloud
Azure Data Factory ETL Patterns in the Cloud
 

Recently uploaded

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 

PSSUG Nov 2012: Big Data with SQL Server

  • 1. Big Data with SQL Server Philly SQL Server User Group November 2012 Mark Kromer Razorfish BI & Big Data Technology Director http://www.kromerbigdata.com @kromerbigdata @mssqldude
  • 2. What we’ll (try) to cover tonight ‣ What is Big Data? ‣ The Big Data and Apache Hadoop environment ‣ Big Data Analytics ‣ SQL Server in the Big Data world ‣ How we utilize Big Data @ Razorfish 2
  • 3. Big Data 101 ‣ 3 V’s ‣ Volume – Terabyte records, transactions, tables, files ‣ Velocity – Batch, near-time, real-time (analytics), streams. ‣ Variety – Structures, unstructured, semi-structured, and all the above in a mix ‣ Text Processing ‣ Techniques for processing and analyzing unstructured (and structured) LARGE files ‣ Analytics & Insights ‣ Distributed File System & Programming
  • 4. Mark’s Big Data Myths ‣ Big Data ≠ NoSQL ‣ NoSQL has similar Internet-scale Web origins of Hadoop stack (Yahoo!, Google, Facebook, et al) but not the same thing ‣ Facebook, for example, uses Hbase from the Hadoop stack ‣ Big Data ≠ Real Time ‣ Big Data is primarily about batch processing huge files in a distributed manner and analyzing data that was otherwise too complex to provide value ‣ Use in-memory analytics for real time insights ‣ Big Data ≠ Data Warehouse ‣ I still refer to large multi-TB DWs as “VLDB” ‣ Big Data is about crunching stats in text files for discovery of new patterns and insights ‣ Use the DW to aggregate and store the summaries of those calculations for reporting
  • 5. Batch Processing ‣ Commodity Hardware ‣ Data Locality, no shared storage ‣ Scales linearly ‣ Great for large text file processing, not so great on small files ‣ Distributed programming paradigm
  • 6. Big Data Analytics Web Platform
  • 7. In-Database Analytics (Teradata Aster) • Because of built-in analytics functions and big data performance, Aster becomes the data scientist’s sandbox and BI’s big data analytics processor. Prepackaged Analytics Functions (including Attribution)
  • 8. SQL Server Big Data – Data Loading Amazon HDFS & EMR Data Loading Amazon S3 Bucket
  • 9. SQL Server Big Data Environment ‣ SQL Server Database ‣ SQL Server 2008 R2 or 2012 Enterprise Edition ‣ Page Compression ‣ 2012 Columnar Compression on Fact Tables ‣ Clustered Index on all tables ‣ Auto-update Stats Asynch ‣ Partition Fact Tables by month and archive data with sliding window technique ‣ Drop all indexes before nightly ETL load jobs ‣ Rebuild all indexes when ETL completes ‣ SQL Server Analysis Services ‣ SSAS 2008 R2 or 2012 Enterprise Edition ‣ 2008 R2 OLAP cubes partition-aligned with DW ‣ 2012 cubes in-memory tabular cubes ‣ All access through MSMDPUMP or SharePoint
  • 10. Wrap-up ‣ What is a Big Data approach to Analytics? ‣ Massive scale ‣ Data discovery & research ‣ Self-service ‣ Reporting & BI ‣ Why did we take this Big Data Analytics approach? ‣ Each Web client produces an average of 6 TBs of ICA data in a year ‣ The data in the sources are variable and unstructured ‣ SSIS ETL alone couldn’t keep up or handle complexity ‣ SQL Server 2012 columnstore and tabular SSAS 2012 were key to using SQL Server for Big Data ‣ With the configs mentioned previously, SQL Server is working great ‣ Analytics on Big Data also requires Big Data Analytics tools ‣ Aster, Tableau, PowerPivot, SAS