Transcript of "Sql server 2012_sp1_12_of_12_big_data_analytics_and_data_warehousing_level300_dark"

  1. BIG DATA ANALYTICS & DATA WAREHOUSING
     Breakthrough Insight
  2. THE FANTASTIC 12 OF 2012
  3. SCALABLE DATA WAREHOUSING & ANALYTICS
     GAIN SCALE & FLEXIBILITY FOR MASSIVE SCALE AT LOW COST
  4. SOFTWARE
     SQL Server 2012: Self-build Data Warehouse
     - Ideal for custom data marts or small to mid-sized data warehouses
     - Proven technology
     - Most integrated out-of-the-box BI solution with no extra fees
     - Control costs with no hardware vendor lock-in
     - Large partner ecosystem
     - Blazing-fast performance with xVelocity in-memory technologies
     - Support for up to 256 logical cores
     - Table partitioning scales to 15,000 partitions
  5. REFERENCE ARCHITECTURES
     Fast Track Data Warehouse: Guided-build DW (Balanced Optimizations)
     - Tuned and optimized for data warehousing
     - Eliminate guesswork to build HW box and balance CPU, IO, Storage with SW
     - Next-generation performance with xVelocity
     - Required 9’s of availability
     - All the features of SQL Server 2012
       - xVelocity in-memory technologies for performance
       - AlwaysOn for mission critical availability
     - Latest generation of hardware
     - Best practices guide to build HW, install, configure, tune SW
     - Choice of storage with traditional HDDs or SSDs
     - Choose from 7 industry-standard hardware vendors including HP, Dell, IBM, EMC, Cisco, Nimbus, XIO
     - Configurations as low as $8K per TB
     - Scale from 5 to 95 TB
     - Major components of BI and EIM included in purchase
  6. APPLIANCES
     Parallel Data Warehouse: Pre-built DW (Massive Scale at Low Cost)
     - Handles midrange to largest data warehousing scale requirements
     - Fastest time to solution (plug and play), shipped to your door
       - HW + SW pre-tested, integrated, configured
       - Hardware pre-built (SAN, switches, cables, drives, adapters, CPU)
     - Out-of-the-box integration with Microsoft BI solution
     - Interoperable with Informatica, MicroStrategy, SAP BO
     - Unstructured or semi-structured data from various sources
     - 80-600+ TB with up to 4 full racks from under $12K per TB
     - 0-80 TB half racks available
     - Highest level of co-engineering with open, industry standard HW partners
     - Integrated premier support for entire appliance (HW + SW)
     - 22 Intel Westmere procs / rack; 132 physical cores / rack
     - Massively Parallel Processing (MPP) for most powerful distributed computing and scale
     - Connector to Apache Hadoop with SQOOP (SQL to Hadoop)
  7. FASTEST TIME TO SOLUTION AT LOWEST COST
     - Lowest Total Cost of Ownership
       - Save hundreds of thousands on new license and maintenance costs migrating to SQL Server
       - Save 450% on ongoing administration cost compared with leading vendors
     - Get deployed immediately with choice of form factors
       - Deploy on open industry hardware that your IT staff already knows, without vendor lock-in
     - Choice of form factors – Reference Architectures or Appliances
     - Co-developed with HW – pre-tested, pre-configured, pre-tuned
     - Open industry standard hardware – Intel x86
     - Installs 4x faster than other leading vendors
     - Lowest cost of ownership – maintenance, IT staffing, installation, configuration, operations
     - Lowest cost of acquisition – price/performance
  8. FAST TRACK DATA WAREHOUSE
     Accelerate your data warehouse road map
     - Benefits
       - Tuned and optimized for data warehousing
       - Rapid deployment
       - Complete out-of-the-box BI solution with no extra fees
       - Blazing-fast performance with ColumnStore Index
       - Choice of storage with traditional HDDs or SSDs
     - Key Features
       - Configurations as low as $11K per TB
       - Scale from 4 to 80 TB
       - Choose from 11 industry-standard hardware vendors including Dell, HP, Bull, IBM, EMC & more
       - All the features of SQL Server 2012
  9. FAST TRACK 4.0 – THROUGHPUT & CAPACITY
  10. LATEST HARDWARE FROM MAJOR VENDORS
     Choice, Flexibility, And Value
  11. DOUBLING OF THROUGHPUT IN BENCHMARK TEST
     SQL Server 2012 Columnstore Indexes
     - Test System
       - Fast Track Data Warehouse 3.0 HP DL380 reference architecture
       - SQL Server 2008 R2
       - 20 TB data warehouse
       - 2.4 GB/s throughput on 1 TB benchmark schema
     - Configuration Changes
       - Upgrade to SQL Server 2012
       - Apply ColumnStore indexes to benchmark schema
       - No query changes
       - No hardware changes
  12. DOUBLING OF THROUGHPUT IN BENCHMARK TEST
     SQL Server 2012 Columnstore Indexes (cont.)
     - SQL Server 2012 ColumnStore Index Scenario
       - Apply ColumnStore Index to benchmark schema
       - No query changes
       - No hardware changes
  13. cMLC* $/GB NOW SUPERIOR TO SAS HDD FOR DW
  14. eMLC* IS COMPETITIVE EVEN AT $6/GB LIST
  15. LOGICAL ARCHITECTURE
  16. CONTROL NODE
  17. MANAGEMENT NODE
  18. MANAGEMENT NODE
  19. LANDING ZONE
  20. BACKUP NODE
  21. BACKUP NODE
  22. “PDW NODE”
  23. COMPUTE NODES
  24. STORAGE NODES
     - Dual Fibre Channel controllers
     - Active/Active
     - Provides fault tolerance
  25. STORAGE NODE – PHYSICAL FILE LAYOUT
  26. DATA STRATEGIES IN PDW
  27. DATA LAYOUT APPROACHES
     - Replicated: A table structure that exists as a full copy within each discrete PDW Node.
     - Distributed: A table structure that is hashed on a single column and uniformly distributed across all nodes on the appliance. Each distribution is a separate physical table in the DBMS.
     - Shared Nothing: The ability to design a schema of both distributed and replicated tables to minimize data movement between nodes.
       - Small sets of data can be more efficiently stored in full (replicated).
       - Certain set operations (i.e., single-node operations) are more efficient against full sets of data.
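The two layouts described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not PDW's implementation: the node count, the MD5-based hash, and the `sales` and `regions` sample tables are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical node count, chosen only for illustration


def node_for(key, num_nodes=NUM_NODES):
    """Map a distribution-key value to a node with a stable hash.

    MD5 is a stand-in here; the hash function PDW uses is not stated
    in the slides.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def distribute(rows, key_column, num_nodes=NUM_NODES):
    """Distributed table: each row lives on exactly one node."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[key_column], num_nodes)].append(row)
    return nodes


def replicate(rows, num_nodes=NUM_NODES):
    """Replicated table: every node holds a full copy."""
    return {node: list(rows) for node in range(num_nodes)}


# Hypothetical sample data: a large fact table and a small dimension table.
sales = [{"customer_id": i, "amount": 10 * i} for i in range(1000)]
regions = [{"region_id": 1, "name": "East"}, {"region_id": 2, "name": "West"}]

sales_by_node = distribute(sales, "customer_id")
regions_by_node = replicate(regions)
```

Because every node holds all of `regions`, a join from `sales` to `regions` can run on each node without moving rows, which is the point of mixing the two layouts.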
  28. CREATING A DISTRIBUTED TABLE
  29. PDW SOFTWARE ARCHITECTURE
  30. DSQL – DISTRIBUTION INCOMPATIBILITY
     A DSQL query that requires redistribution of data between DBMS instances within an appliance to create the result set.
     - Common examples of incompatible SQL:
       - When a distribution key is not used in join or grouping functions applied against distributed tables
       - When a replicated table outer-joins with a distributed table
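As a rough illustration of the two examples on this slide, the check below decides whether a join would need data movement. It is a simplified reading, not PDW's optimizer logic: the `Table` type, the function name, and the rule that only a replicated table on the preserved side of a left outer join causes movement are assumptions made for the sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Table:
    name: str
    # Column the table is hashed on; None means the table is replicated.
    distribution_key: Optional[str] = None


def join_needs_data_movement(left, left_col, right, right_col, left_outer=False):
    """Return True when the join is distribution-incompatible.

    Simplified rules (assumptions for this sketch):
    - two replicated tables never need movement;
    - a replicated table on the preserved (left) side of a LEFT OUTER JOIN
      with a distributed table needs movement;
    - two distributed tables are compatible only when both join columns
      are their distribution keys.
    """
    left_replicated = left.distribution_key is None
    right_replicated = right.distribution_key is None
    if left_replicated and right_replicated:
        return False
    if left_replicated:
        return left_outer
    if right_replicated:
        return False
    return not (left.distribution_key == left_col
                and right.distribution_key == right_col)


# Hypothetical tables used only to exercise the rules.
orders = Table("orders", distribution_key="customer_id")
customers = Table("customers", distribution_key="customer_id")
dates = Table("dates")  # replicated
```

For example, `join_needs_data_movement(orders, "order_id", customers, "customer_id")` returns `True` because `orders` is not joined on its distribution key.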
  31. POWER OF PDW
  32. INTEGRATION WITH PDW
     - SQL Server – Remote Table Copy
       - Only MPP-to-SMP supported
       - Must be co-located and on same InfiniBand network
       - Requires InfiniBand HCA card in SQL Server or Fast Track server
       - Sample transfer rate to 4-socket 24-core server: 300 – 600 MB per second including compression factor
     - SSAS – PDW ADO.NET driver
     - SSIS – PDW ADO.NET driver
     - SSRS – PDW ADO.NET driver
     - MPP-to-MPP integration requires SSIS
     - Remote Table Copy is not available for non-SQL Server databases
  33. ADMIN CONSOLE
  34. WHAT IS NEW IN PDW APPLIANCE UPDATE 3 (AU3)
  35. PDW AU3 OFFERS HIGH VALUE TO CUSTOMERS
  36. AU3 SHELL DB ENABLES COST BASED OPTIMIZATION
  37. COST OPTIMIZATION – PERFORMANCE IMPROVEMENT
     AU2 to AU3
  38. THEME: PERFORMANCE AT SCALE
     Zero data conversions in data movement (AU2 vs. AU3)
  39. UPDATING STATISTICS ON PDW
  40. 7
  41. BI SCENARIO: PDW AS A ‘DATA HUB’
     - MPP ‘data hub’
     - Fast and parallel feeding of data marts (DMs) via InfiniBand
       - CREATE REMOTE TABLE AS SELECT
     - Aggregation abilities avoid ETL overhead in existing systems
       - No need for indexes
       - No need to maintain indexed/materialized views (summary tables)
  42. EXAMPLE AGGREGATED TABLE:
  43. QUERYING AND DATA ANALYSIS BEST PRACTICES
     Observed data analysis pattern
  44. Dimensional properties to change
  45. PDW KEYS
  46. ADDING INFINIBAND TO SSAS
  47. MAX NUMBER OF CONNECTIONS
  48. FINAL RESULT – 555K ROWS/SEC
  49. PDW KEYS
  50. CUSTOMER SUCCESSES – CONT’D
     How are customers using PDW for BI?
     - Data Volume
       - 36 TB data warehouse analyzing data from transactional and clickstream sources
       - Business need to expand to 7 year data window (currently 1 year data)
     - Requirements
       - Scalability - growing data volume does not affect performance
       - Performance and ad-hoc analysis for interactive querying by users
       - BI Integration with Microsoft BI stack - SSAS and SSRS
     - AU3 Feedback
       - SSAS cubes worked ‘out-of-box’
       - Performance an order of magnitude faster than existing system (~30x on an expanded data set)
  51. HP BUSINESS DATA WAREHOUSE APPLIANCE
  52. HP HARDWARE
     - Server
       - HP ProLiant DL370 G6 x 5670 (4U)
       - 2x Westmere processors (12 cores)
       - 96 GB of RAM
     - Storage
       - 24 x internal SFF SAS disks
       - 2 x Smart Array controllers and SAS expander
       - 2 TB physical user storage (up to 8 TB compressed)
  53. SYSTEM VERIFICATION
  54. WINDOWS MOUNT POINT VERIFICATION
  55. DATABASE CONFIGURATION
  56. SYSTEM CENTER OPERATIONS MANAGER
     Appliance Management Pack
     - Extents Fragmentation
       - TOP 20 large tables
       - Fragmentation Thresholds
         - Avg Fragment Size in pages > 400: Green
         - Avg Fragment Size in pages 300-400: Yellow
         - Avg Fragment Size in pages < 300: Red
       - Critical State added to the monitors states
       - Show Extents Fragmentation diagnostic task in Monitors history
       - Show Extents Fragmentation task in Knowledge Base link and in State
     - Adverse configuration changes
       - Lock Pages in memory Startup Option
       - Even growth DB files in Filegroup
       - Increase Extents Number in Database
       - Autogrow increment < 5% or < 100 MB
       - …more…
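The fragmentation thresholds listed on this slide map directly onto a small classification function. The sketch below only restates those three bands; the function name and the sample table names are hypothetical, and it is not code from the management pack.

```python
def fragmentation_state(avg_fragment_pages: float) -> str:
    """Classify average fragment size (in pages) using the slide's bands.

    > 400 pages   -> Green
    300-400 pages -> Yellow (both endpoints included)
    < 300 pages   -> Red
    """
    if avg_fragment_pages > 400:
        return "Green"
    if avg_fragment_pages >= 300:
        return "Yellow"
    return "Red"


# Hypothetical measurements for three large tables.
measurements = {"FactSales": 512.0, "FactClicks": 350.0, "DimCustomer": 120.5}
states = {table: fragmentation_state(size) for table, size in measurements.items()}
```

Larger average fragments mean more contiguous extents, so the healthy (Green) state sits at the high end of the scale.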
  57. DIAGNOSTIC TASK AND MONITOR HISTORY
     Fragmentation
  58. DIAGNOSTIC TASK AND MONITOR HISTORY
     Fragmentation (Details)
  59. APPLIANCE DISK LAYOUT
  60. CONFIGURE APPLIANCE & DOMAIN
  61. CONFIGURE SQL
  62. SETUP COMPLETE
  63. FACTORY RESET
  64. MDW APPLIANCE MANAGEMENT PACK
     Monitoring
     - Extents Fragmentation
       - TOP 20 large tables
       - Fragmentation Thresholds
         - Avg Fragment Size in pages > 400: Green
         - Avg Fragment Size in pages 300-400: Yellow
         - Avg Fragment Size in pages < 300: Red
       - Critical State added to the monitors states
       - Show Extents Fragmentation diagnostic task in Monitors history
       - Show Extents Fragmentation task in Knowledge Base link and in State
     - Adverse configuration changes
       - Lock Pages in memory Startup Option
       - Even growth DB files in Filegroup
       - Increase Extents Number in Database
       - Autogrow increment < 5% or < 100 MB
       - …more…
  65. MDW APPLIANCE DIAGRAM VIEW
  66. DIAGNOSTIC TASK AND MONITOR HISTORY - FRAGMENTATION
  67. DIAGNOSTIC TASK AND MONITOR HISTORY - FRAGMENTATION (DETAILS)
  68. SO HOW DOES IT WORK?
     First, store the data
  69. SO HOW DOES IT WORK?
     Second, take the processing to the data
  70. HOW DO WE STORE BIG DATA?
  71. HOW DO WE PROCESS BIG DATA?
  72. MAPREDUCE – WORKFLOW
     - A MapReduce job usually splits the input data-set into independent chunks which are processed by the map tasks in a completely parallel manner
     - The framework sorts the outputs of the maps, which are then input to the reduce tasks; the framework shuffles the output of maps to be reduced
     - The framework takes care of scheduling tasks, monitoring them and re-executes the failed tasks
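The map, shuffle/sort, and reduce stages described on this slide can be imitated in plain Python with a word count. This is a single-process sketch for illustration only; it does not use Hadoop, and the function names and sample input are made up for the example.

```python
from collections import defaultdict
from itertools import chain


def map_task(chunk):
    """Map: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_outputs):
    """Shuffle/sort: group all emitted values by key, ordered by key."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapped_outputs):
        groups[key].append(value)
    return sorted(groups.items())


def reduce_task(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(chunks):
    # Each map_task call is independent, so a real framework could run
    # them in parallel on different nodes; here they run sequentially.
    mapped = [map_task(chunk) for chunk in chunks]
    return dict(reduce_task(key, values) for key, values in shuffle(mapped))


counts = run_job(["big data big", "data warehouse", "big warehouse"])
```

Scheduling, monitoring, and re-running failed tasks are the framework's job in Hadoop and are deliberately left out of this sketch.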
  73. HADOOP ARCHITECTURE & NEW PROGRAMMING
  74. THE HADOOP ECOSYSTEM (SIMPLIFIED)
  75. MICROSOFT’S APPROACH TO BIG DATA
     Insights to all users by activating new types of data
  76. HADOOP ON …
  77. BIG DATA: ENTERPRISE-READY
     Integrated with Leading DW Performance and Scale
     - Benefits
       - Simplicity and manageability of Windows for Hadoop
       - Enterprise-class security
       - High performance with consistently high throughput of data
       - Integration with Enterprise Data Warehouse
     - Key Features
       - Integration with key enterprise components such as System Center and Active Directory
       - Hadoop-based service on Windows Azure and Windows Server
       - Integration with enterprise-level BI solution through new Hadoop connectors for SQL Server and PDW
  78. BIG DATA: CONNECTED TO THE WORLD’S DATA
     Combines Internal and External Data and Services
     - Benefits
       - Stronger customer relationships using social media
       - Wider marketplace for sharing and collaboration
       - Enriched analytics within the application using Bing models
       - Extended analytics with smart mining algorithms
     - Key Features
       - Integration with third-party data and services (U.S. stock sentiment)
       - Integration with Azure Marketplace Services (Microsoft Translator service)
       - Rich web data and mining models used on Bing
       - Integration with social media (Microsoft codename ‘Social Analytics’)
  79. BIG DATA: INSIGHTS FOR EVERYONE
     Analytics to All Users Through Familiar Tools
     - Benefits
       - Analyze Hadoop data in Excel
       - Reduced time-to-solution
       - Predictive analysis on Hadoop
       - Quick-start BI for corporate solutions
     - Key Features
       - Hive Add-in for Excel
       - Hive ODBC driver for SQL Server Data Mining tools
       - Integration of Hive and Microsoft BI tools such as PowerPivot and Power View
  80. BIG DATA: OPEN AND FLEXIBLE
     Develop Once, Deploy On-Premises or in the Cloud
     - Benefits
       - Deployment choice on-premises or in the cloud
       - Compatibility with Hadoop
       - Simplified programming with deployment using a web browser
     - Key Features
       - New JavaScript libraries
       - Rich ecosystem of open source partners, including Hortonworks, Cloudera, and Karmasphere
       - Integration with .NET
  81. MICROSOFT HADOOP STRATEGY
  82. HADOOP ON WINDOWS
  83. APACHE HADOOP AS A SERVICE ON AZURE
  84. BIG DATA CUSTOMERS
  85. WEB / SOCIAL
  86. YAHOO! TAO PLATFORM
  87. KLOUT’S BIG DATA PROBLEM
  88. EVENT TRACKER ARCHITECTURE
     - Tracking request:
       insights3:9003/track/{"project":"plusK","event":"spend","ks_uid":123456,"type":"add_topic"}
     - Logged event:
       {
         "project":"plusK",
         "event":"spend",
         "session_id":"0",
         "ip":"50.68.47.158",
         "kloutId":"123456",
         "cookie_id":"123456",
         "ref":"http://klout.com/",
         "type":"add_topic",
         "time":"1338366015"
       }
       will be saved in HDFS at: /logs/events_tracking/2012-05-30/0100
     - event_log table columns:
       tstamp string
       project string
       event string
       session_id bigint
       ks_uid bigint
       ip string
       json_keys array<string>
       json_values array<string>
       json_text string
       dt string
       hr string
     - Query:
       SELECT { [Measures].[Counter], [Measures].[PreviousPeriodCounter] } ON COLUMNS,
       NON EMPTY CROSSJOIN (
         exists([Date].[Date].[Date].allmembers,
                [Date].[Date].&[2012-05-19T00:00:00]:[Date].[Date].&[2012-06-02T00:00:00]),
         [Events].[Event].[Event].allmembers
       ) DIMENSION PROPERTIES MEMBER_CAPTION ON ROWS
       FROM [ProductInsight]
       WHERE ({[Projects].[Project].[plusK]})
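To make the date/hour partitioning on this slide concrete, the sketch below derives an HDFS-style path from an event's `time` field. The directory root and the `YYYY-MM-DD/HH00` layout come from the path shown on the slide; the fixed UTC-7 offset and the function name are assumptions, chosen because they reproduce that path for the sample timestamp.

```python
import json
from datetime import datetime, timedelta, timezone

LOG_ROOT = "/logs/events_tracking"
# Assumption: events are bucketed in a fixed UTC-7 local time. The slide
# does not say which time zone the pipeline uses.
ASSUMED_TZ = timezone(timedelta(hours=-7))


def partition_path(event_json, tz=ASSUMED_TZ):
    """Return the date/hour directory an event would be written to."""
    event = json.loads(event_json)
    ts = datetime.fromtimestamp(int(event["time"]), tz=tz)
    # dt partition is the calendar date; hr partition is the hour plus "00".
    return f"{LOG_ROOT}/{ts:%Y-%m-%d}/{ts:%H}00"


sample = ('{"project":"plusK","event":"spend","type":"add_topic",'
          '"time":"1338366015"}')
path = partition_path(sample)
```

Partitioning by `dt` and `hr` lets a query over a date range read only the matching directories instead of scanning every event.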
  89. HEALTHCARE
     - Often a laggard in technology
     - Yet, application of technology will be revolutionary to understanding the human system
     - Genomic sequencing brings the promise of understanding human biological systems
     - Proteomic sequencing brings the promise of building the protein sequences to build customized drugs
     - Healthcare Incidence Prediction: Heart Attacks and Asthma
  90. UNIVERSITY OF DUNDEE PROTEOMICS
  91. UNIVERSITY OF DUNDEE PROTEOMICS
  92. HEALTHCARE
     Key Scenarios
     - Clinical trials: not just examining existing drugs and efficacy, but also potential deviations
       - E.g. Viagra was originally developed to lower blood pressure and treat angina; now it also helps with newborn pulmonary hypertension and altitude sickness
     - Predicting healthcare incidence issues
     - Social media campaigns (e.g. advertising drugs)
     - Pharmaceutical campaign advertising analytics
       - Modeling the consumer, trying to understand their user behavior (why are they purchasing this medication, how do they feel about their ailment, related behaviors, etc.)
  93. HEALTHCARE: RHIOs
  94. GOVERNMENT & UTILITIES
  95. GOVERNMENT & UTILITIES
     - Hard to work with (personnel, lengthy engagements, bureaucracy, etc.)
     - Lots of standards and compliance requirements, and requires a lot of people to engage.
     - But if you develop the relationship, it often becomes long term
     - Government engagements include VA, DoD, DoD vendors
     - Utilities engagements include China Light and Power, Florida Power and Light, Duke Energy, and Toshiba
  96. GOVERNMENT & UTILITIES
     Thanks to Greg Morning and Larry Cochrane
     - Evaluating consumer decisions and sentiment for green energy trends
     - Smart grid load management and targeted marketing (e.g. Smart Cities)
     - Targeted Marketing and Performance
     - Utilities Marketplace
  97. EVENT PROCESSING AND REAL TIME
  98. OIL & GAS
  99. OIL & GAS
     - Seismic Data Processing
       - A lot of this data is processed based on 1950s seismic algorithms
       - Chevron has a 3000 node Linux cluster just to process this data
       - Processing this data sometimes takes over a year!
       - Hadoop allows us to have greater degrees of parallelism by firing off multiple map jobs
     - Next Generation Applications
       - Processing of WITSML data (Wellsite Information Transfer Standard Markup Language XML format) via Hive XML SerDe
       - Apply current BI tools to understand and model this data
       - Apply StreamInsight / Storm to trigger against this information
     - Data Sharing Scenarios
  100. FINANCIAL
  101. FINANCIAL
     - Natural extension of web analytics
       - We built a web site, now let’s make some money
     - Fraud Analytics | Position, Triggers | Targeted Coupon / Payments
     - Verticals: Financial, Retail, Finance Departments in All Verticals (e.g. how to save money on ordering laboratory supplies for a hospital)
     - Technology Features: R, HBase, HPC scenarios
     - On Prem/Cloud: almost always exclusively on-prem due to SOX and other compliance / GRC models
  102. SCHEMA ON READ VS. SCHEMA ON WRITE
     - Schema on Read
       - The schema is not defined until data is queried
       - More exploratory, requires domain knowledge
       - Goal is to find new value in ambient data
       - …You don’t know what you don’t know…
     - Schema on Write
       - The schema is defined before data is loaded
       - Exposes well-defined metrics and KPIs to users
       - Mature patterns & practices for development
       - …Show me what I already know…
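The contrast between the two approaches can be shown with a toy loader and a toy query. This is a minimal sketch under stated assumptions: the `SCHEMA` mapping, the function names, and the two sample records are hypothetical and stand in for a warehouse table and raw log files.

```python
import json

# Hypothetical schema, fixed before any data is loaded (schema on write).
SCHEMA = {"project": str, "event": str, "amount": int}


def load_schema_on_write(raw_lines, schema=SCHEMA):
    """Validate and type every record at load time.

    A record that is missing a column or has an unconvertible value raises
    (KeyError or ValueError) before it ever reaches the table.
    """
    table = []
    for line in raw_lines:
        record = json.loads(line)
        table.append({col: cast(record[col]) for col, cast in schema.items()})
    return table


def query_schema_on_read(raw_lines, column, cast=str):
    """Keep the raw lines as-is and impose structure only when querying."""
    for line in raw_lines:
        record = json.loads(line)
        if column in record:
            yield cast(record[column])


raw = [
    '{"project":"plusK","event":"spend","amount":"5"}',
    '{"project":"plusK","event":"view","referrer":"http://klout.com/"}',
]

typed_rows = load_schema_on_write(raw[:1])
referrers = list(query_schema_on_read(raw, "referrer"))
```

The second record would be rejected by `load_schema_on_write` because it lacks `amount`, yet `query_schema_on_read` can still answer a question about `referrer`, a field nobody planned for when the schema was written.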
  103. CLOUD
  104. ON-PREMISE
  105. HYBRID
  106. FINANCIAL RISK
  107. OIL & GAS – WELL-HEAD
     [Architecture diagram. Stage labels: Sources, Acquire, Store, Analyze, Visualize, Manage. Deployment labels: On-Premise, Cloud, IaaS Cloud. Component labels: Wellhead Telemetry, Metrics, Hot Stream, Cold Stream, Input Adapters, StreamInsight, Flume, Sqoop, HDFS Cluster (Name Node, Data Nodes), MapReduce (Java, .Net, R, etc…), Hive, Pig, Azure Storage, SSIS, SQL Server Parallel Data Warehouse (On-Premise Only), SSAS, Excel Services, PerformancePoint Services, Team Foundation Server, Active Directory, System Center Operations Manager, Azure VNet.]
  108. EVENT-DRIVEN FEEDBACK
  109. ANALYSIS, CONSUMPTION
  110. VISUALIZATION
  111. 1
  112. © 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.