Your SlideShare is downloading. ×
0
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack

4,120

Published on

Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack - 24 Hours of PASS

Big Data on the Microsoft Platform - With Hadoop, MS BI and the SQL Server stack - 24 Hours of PASS

Published in: Technology
1 Comment
1 Like
Statistics
Notes
No Downloads
Views
Total Views
4,120
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
141
Comments
1
Likes
1
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Global Sponsor:Big Data on the Microsoft PlatformAndrew J. Brust, CEO, Blue Badge InsightsWith Hadoop, MS BI and the SQL Server stack
  • 2. CEO and Founder, Blue Badge InsightsBig Data blogger for ZDNetMicrosoft Regional Director, MVPCo-chair VSLive! and 17 years as a speakerFounder, Microsoft BI User Group of NYC http://www.msbigdatanyc.comCo-moderator, NYC .NET Developers Group http://www.nycdotnetdev.com“Redmond Review” columnist forVisual Studio Magazinebrustblog.com, Twitter: @andrewbrustMeet Andrew
  • 3. Read all about it!
  • 4. My New Blog (bit.ly/bigondata)
  • 5. AgendaBig Data, Hadoop and HDInsightMapReduceHive ODBC, BI StackHekaton, NoSQLSQL Server Parallel Data Warehouse, MPP, PolyBase
  • 6. What is Big Data?100s of TB into PB and higherInvolving data from: financial data, sensors, web logs,social media, etc.Parallel processing often involved Hadoop is emblematic, but other technologies are Big Data tooProcessing of data sets too large for transactionaldatabases Analyzing interactions, rather than transactions The three V’s: Volume, Velocity, Variety•Big Data tech sometimes imposed on small data problems
  • 7. What is Hadoop?Open source implementation of Google’s MapReduce andGFS (Google File System)Allows for scale-out processing of petabyte scale data 1 PB = 1,024 TBAlso distributed storageCommodity hardwareCan work against flat files, or certain database formatsNative processing involves imperative Java codeOther languages supported through “Streaming”7
  • 8. What is HDInsight?Microsoft’s Hadoop distribution, on Windows Most other distros on LinuxBased on Hortonworks Data Platform (HDP)Runs on Azure, eventually on Windows Server, and assandbox on dev PCFor .NET devs: .NET SDK for Hadoop, LINQ provider8
  • 9. Global Sponsor:DemoHDInsight
  • 10. The Hadoop StackMapReduce, HDFSDatabaseRDBMS Import/ExportQuery: HiveQL and Pig LatinMachine Learning/Data MiningLog file integration
  • 11. MapReduce, in a DiagrammappermappermappermappermappermapperInputreducerreducerreducerInputInputInputInputInputInputOutputOutputOutputOutputOutputOutputOutputInputInputInputK1K2K3OutputOutputOutput
  • 12. A MapReduce Example• Count by suite, on each floor• Send per-suite, per platform totals to lobby• Sort totals by platform• Send two platform packets to 10th, 20th, 30th floor• Tally up each platform• Merge tallies into one spreadsheet• Collect the tallies
  • 13. MapReduce OptionsPig, Hive, Sqoop, Mahout also generate MapReduce code13JavaJavaScript (“Rhino”)Other languages, especially Python, via StreamingC# via StreamingC# via .NET SDK
  • 14. Hortonworks DataPlatform forWindowsMRLib (NuGetPackage)LINQ to HiveOdbcClient +Hive ODBCDriverDeploymentWebHDFSclientMR code in C#,HadoopJob,MapperBase,ReducerBaseAmenities forVisual Studio/.NET
  • 15. Global Sponsor:DemoMapReduce
  • 16. HiveBegan as Hadoop sub-project Now top-level Apache projectProvides a SQL-like (“HiveQL”) abstraction overMapReduceHas its own HDFS table file format (and it’s fully schema-bound)Can also work over HBaseActs as a bridge to many BI products which expect tabulardata
  • 17. Hive ODBC Consumers17Excel 2010 or 2013 (including via add-in)PowerPivotSQL Server Analysis Services, Tabular ModeSQL Server Reporting ServicesADO.NET OdbcClient providerLINQ provider
  • 18. xVelocity TechnologiesFormerly known as VertiPaqPowerPivot, SSAS Tabular, SQL Server columnar indexesImplements BI Semantic Model (BISM)Uses column store technology Compression In-memory SpeedNot a Big Data technology per se, but very useful foranalysis of job output18
  • 19. Power ViewReports on BISM models (PowerPivot, SSAS Tabular)Hosted in SharePoint 2010, 2013 EnterpriseAlso Excel 2013 (but not on ARM/Windows RT)Interactive data exploration19
  • 20. Global Sponsor:DemoHive ODBC + BI Stack
  • 21. Project “Hekaton”In-memory engine for SQL Server transactional workloadsTables must be declared as in-memory explicitlyIn-memory and standard tables can coexist in same dbStored procs on in-mem tables are compiled to native codeHekaton and xVelocity are separateHekaton ≠ PowerPivot/SSAS TabularHekaton ≠ Columnstore indexesCompare to SAP HANA In-memory, transactional, analytical, column store21
  • 22. NoSQLNoSQL databases are non-relational and non- or loosely-schematizedHBase is a NoSQL database, of the wide column variety Hive implements a SQL layer over it HBase not yet in HDInsightHBase table = HDFS fileThree other NoSQL categories Key-value store, document store, graph database Azure Table Storage is a key-value store NoSQL databaseSome of them aren’t really Big Data tools, but marketthemselves that way anyway22
  • 23. SQL Parallel Data Warehouse (PDW)SQL PDW is a Massively Parallel Processing (MPP)databaseTeradata, IBM Netezza, HP Vertica also in this categoryIt’s an array/cluster of SQL servers made to look like oneSQL ServerAvailable as appliance only Purchase from HP, Dell Server, storage and network all pre-built and configuredMany other MPP products based on PostgreSQLPDW loosely based on acquired DATAllegro product Implemented MPP with Ingres, written in Java, running on Linux23
  • 24. MapReduce versus MPP24MapReduce MPP Splits preprocessing amongst mappernodes and aggregation amongst reducers Scales infinitely on commodity hardware Uses locally attached commodity disks onnodes Uses imperative code Processes flat files, wide column tables(HBase), relational tables (Hive) Divide-and-conquer approach, parallel,distributed Splits query amongst nodes then unifiesresult sets Scales to high-end assets in theappliance cabinet Uses shared storage (can be morenetwork traffic) Uses SQL Works with relational tables only Divide-and-conquer approach, parallel,distributed
  • 25. PolyBaseTo be included in next version of PDWMashup of SQL Server and HadoopEnables PDW to address Hadoop data nodes (HDFS)directlyParallelism managed by PDWTables are imported into SQL Server db They are EXTERNAL tables They can participate in joins with standard tables25
  • 26. ResourcesMS Big Data/HDInsight http://www.microsoft.com/bigdataApache Hadoop http://hadoop.apache.org/Apache HBase http://hbase.apache.org/SQL PDW http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/pdw.aspxPolyBase http://gsl.azurewebsites.net/Projects/Polybase.aspxxVelocity http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/in-memory.aspxColumn store http://en.wikipedia.org/wiki/Column-oriented_DBMSPower View http://office.microsoft.com/en-us/excel-help/power-view-explore-visualize-and-present-your-data-HA102835634.aspxHekaton http://blogs.technet.com/b/dataplatforminsider/archive/2012/11/08/breakthrough-performance-with-in-memory-technologies.aspx26
  • 27. Global Sponsor:Questions?
  • 28. Global Sponsor:Thank You for Attending

×