This document summarizes Andrew Brust's presentation on using the Microsoft platform for big data. It discusses Hadoop and HDInsight, MapReduce, using Hive with ODBC and the BI stack. It also covers Hekaton, NoSQL, SQL Server Parallel Data Warehouse, and PolyBase. The presentation includes demos of HDInsight, MapReduce, and using Hive with the BI stack.
Cloud Computing and the Microsoft Developer - A Down-to-Earth AnalysisAndrew Brust
Slides from my Keynote at Visual Studio Live Las Vegas 2011 (Day 2).
Closely compares Azure to AWS, and discusses Force.com, Google, Rackspace, VMWare and Red Hat.
Discussion includes capabilities, pricing, strategy.
Cloud Computing and the Microsoft Developer - A Down-to-Earth AnalysisAndrew Brust
Slides from my Keynote at Visual Studio Live Las Vegas 2011 (Day 2).
Closely compares Azure to AWS, and discusses Force.com, Google, Rackspace, VMWare and Red Hat.
Discussion includes capabilities, pricing, strategy.
Relational databases vs Non-relational databasesJames Serra
There is a lot of confusion about the place and purpose of the many recent non-relational database solutions ("NoSQL databases") compared to the relational database solutions that have been around for so many years. In this presentation I will first clarify what exactly these database solutions are, compare them, and discuss the best use cases for each. I'll discuss topics involving OLTP, scaling, data warehousing, polyglot persistence, and the CAP theorem. We will even touch on a new type of database solution called NewSQL. If you are building a new solution it is important to understand all your options so you take the right path to success.
An unprecedented amount of data is being created and is accessible. This presentation will instruct on using the new NoSQL technologies to make sense of all this data.
Introduction to SQL Server Cloud Storage AzureEduardo Castro
If you find an issue with the presentation output, click here. We'll get back to you within the next 48 hours. introduction to sql storage and azure storage Edit Update ClosePublic View Post to : URL :
Presentation Description Edit Less In this presentation we cover all the basic concepts about SQL Storage and Azure Storage. We present SQL Azure and ..
In this presentation we cover all the basic concepts about SQL Storage and Azure Storage. We present SQL Azure and Windows Azure Tables.
Regards,
Dr. Eduardo Castro Martinez
Microsoft SQL Server MVP
http://ecastrom.blogspot.com
http://comunidadwindows.org
Relational databases vs Non-relational databasesJames Serra
There is a lot of confusion about the place and purpose of the many recent non-relational database solutions ("NoSQL databases") compared to the relational database solutions that have been around for so many years. In this presentation I will first clarify what exactly these database solutions are, compare them, and discuss the best use cases for each. I'll discuss topics involving OLTP, scaling, data warehousing, polyglot persistence, and the CAP theorem. We will even touch on a new type of database solution called NewSQL. If you are building a new solution it is important to understand all your options so you take the right path to success.
An unprecedented amount of data is being created and is accessible. This presentation will instruct on using the new NoSQL technologies to make sense of all this data.
Introduction to SQL Server Cloud Storage AzureEduardo Castro
If you find an issue with the presentation output, click here. We'll get back to you within the next 48 hours. introduction to sql storage and azure storage Edit Update ClosePublic View Post to : URL :
Presentation Description Edit Less In this presentation we cover all the basic concepts about SQL Storage and Azure Storage. We present SQL Azure and ..
In this presentation we cover all the basic concepts about SQL Storage and Azure Storage. We present SQL Azure and Windows Azure Tables.
Regards,
Dr. Eduardo Castro Martinez
Microsoft SQL Server MVP
http://ecastrom.blogspot.com
http://comunidadwindows.org
Interested in learning Hadoop, but you’re overwhelmed by the number of components in the Hadoop ecosystem? You’d like to get some hands on experience with Hadoop but you don’t know Linux or Java? This session will focus on giving a high level explanation of Hive and HiveQL and how you can use them to get started with Hadoop without knowing Linux or Java.
Introduction to HDInsight Hadoop on Windows Azure services, including using the interactive console with JavaScript and running WordCount via other methods (Streaming, Hive, etc..)
Amazon Web Service and Microsoft Azure are dominating the enterprise public cloud market, but how are they different? Here’s what you need to know.
There are plenty of differences between AWS and Azure, probably too many to mention in a single webinar. AWS and Azure take generally different approaches and offer some unique services. From our experience with enterprise customers, we found that the commoditized services are often the most important ones. Everybody has a solution for security, access and storage, but how are those solutions different? Will mastering one Cloud platform give me the knowledge I need to operate the other?
Join us in our upcoming webinar, all about the differences in some of the public cloud’s most basic (and vital) services. Watch this webinar to learn about:
The major players in today’s enterprise public cloud market
5 important differences between Azure and AWS
How a single pane of glass can compensate for these differences
www.scalr.com
http://www.scalr.com/lp/webinars/register/aws-vs-azure-5-differences-you-need-to-know-when-chosing-a-public-cloud-vendor
A brief comparison between two cloud platforms AWS vs. Azure. Compare Microsoft Azure services, pricing, customers and more with Amazon AWS through slides.
Azure vs AWS Best Practices: What You Need to KnowRightScale
Azure is now the clear #2 in public cloud behind AWS. While some cloud users are evaluating Azure vs. AWS, many enterprises are planning to use both cloud providers. But there are some notable differences between how the two clouds operate and the best practices for deploying workloads in each.
The Azure vs. AWS Best Practices: What You Need to Know webinar will cover:
Recent and coming enhancements for Azure.
Azure vs. AWS differences for compute, networking, and storage.
Best practices for cloud deployments in Azure and AWS.
How to use both Azure and AWS.
Introduction to Microsoft’s Hadoop solution (HDInsight)James Serra
Did you know Microsoft provides a Hadoop Platform-as-a-Service (PaaS)? It’s called Azure HDInsight and it deploys and provisions managed Apache Hadoop clusters in the cloud, providing a software framework designed to process, analyze, and report on big data with high reliability and availability. HDInsight uses the Hortonworks Data Platform (HDP) Hadoop distribution that includes many Hadoop components such as HBase, Spark, Storm, Pig, Hive, and Mahout. Join me in this presentation as I talk about what Hadoop is, why deploy to the cloud, and Microsoft’s solution.
Microsoft Azure vs Amazon Web Services (AWS) Services & Feature MappingIlyas F ☁☁☁
If you are a Cloud Architect, Developer, IT Manager, Director or whoever may be, if you are associated with Azure or AWS cloud in some form, I’m sure you must have come across a common question.
“What is the alternate service available in Azure or AWS vice versa and it’s pricing?” I’m sure you will say yes!
Agreed, it’s hard to remember all the services offered by public clouds, i.e. Azure and AWS. Remembering existing services and their benefits itself is a big task, on top of that updating ourselves with the new feature releases and enhancements is another major task.
So I put together a Service & Feature Mappings between Microsoft Azure & AWS for my and colleagues quick reference.
I hope you also find this piece informative.
Compare Clouds: Aws vs Azure vs Google vs SoftLayerRightScale
Most enterprises have a multi-cloud strategy, but choosing the right cloud for a workload can be challenging. We’ll share a free tool to compare public cloud features and help you make the best decision for each workload. We’ll also drill down on a few key areas where the leading public clouds are different.
Overview of Big data, Hadoop and Microsoft BI - version1Thanh Nguyen
Big Data and advanced analytics are critical topics for executives today. But many still aren't sure how to turn that promise into value. This presentation provides an overview of 16 examples and use cases that lay out the different ways companies have approached the issue and found value: everything from pricing flexibility to customer preference management to credit risk analysis to fraud protection and discount targeting. For the latest on Big Data & Advanced Analytics: http://mckinseyonmarketingandsales.com/topics/big-data
Overview of big data & hadoop version 1 - Tony NguyenThanh Nguyen
Overview of Big data, Hadoop and Microsoft BI - version1
Big Data and Hadoop are emerging topics in data warehousing for many executives, BI practices and technologists today. However, many people still aren't sure how Big Data and existing Data warehouse can be married and turn that promise into value. This presentation provides an overview of Big Data technology and how Big Data can fit to the current BI/data warehousing context.
http://www.quantumit.com.au
http://www.evisional.com
Big Data and New Challenges for DBAs (Michael Naumov, LivePerson)
Hadoop has become a popular platform for managing large datasets of structured and unstructured data. It does not replace existing infrastructures, but instead augments them. Most companies will still use relational databases for transactional processing and low-latency queries, but can benefit from Hadoop for reporting, machine learning or ETL. This session will cover:
What is Hadoop and why do I care?
What do people do with Hadoop?
How can SQL Server DBAs add Hadoop to their architecture?
Here I talk about examples and use cases for Big Data & Big Data Analytics and how we accomplished massive-scale sentiment, campaign and marketing analytics for Razorfish using a collecting of database, Big Data and analytics technologies.
Enough taking about Big data and Hadoop and let’s see how Hadoop works in action.
We will locate a real dataset, ingest it to our cluster, connect it to a database, apply some queries and data transformations on it , save our result and show it via BI tool.
ارائه در زمینه کلان داده،
کارگاه آموزشی "عصر کلان داده، چرا و چگونه؟" در بیست و دومین کنفرانس انجمن کامپیوتر ایران csicc2017.ir
وحید امیری
vahidamiry.ir
datastack.ir
It is just a basic slides which will give you normal point of view of the big data technologies and tools used in the hadoop technology
It is just a small start to share what I have to share
Philly Code Camp 2013 Mark Kromer Big Data with SQL ServerMark Kromer
These are my slides from May 2013 Philly Code Camp at Penn State Abington. I will post the samples, code and scripts on my blog here following the event this Saturday: http://www.kromerbigdata.com
Similar to Big Data on the Microsoft Platform (20)
Microsoft and its Competition: A Developer-Friendly Market Analysis
Big Data on the Microsoft Platform
1. Global Sponsor:
Big Data on the Microsoft Platform
Andrew J. Brust, CEO, Blue Badge Insights
With Hadoop, MS BI and the SQL Server stack
2. CEO and Founder, Blue Badge Insights
Big Data blogger for ZDNet
Microsoft Regional Director, MVP
Co-chair VSLive! and 17 years as a speaker
Founder, Microsoft BI User Group of NYC
http://www.msbigdatanyc.com
Co-moderator, NYC .NET Developers Group
http://www.nycdotnetdev.com
“Redmond Review” columnist for
Visual Studio Magazine
brustblog.com, Twitter: @andrewbrust
Meet Andrew
5. Agenda
Big Data, Hadoop and HDInsight
MapReduce
Hive ODBC, BI Stack
Hekaton, NoSQL
SQL Server Parallel Data Warehouse, MPP, PolyBase
6. What is Big Data?
100s of TB into PB and higher
Involving data from: financial data, sensors, web logs,
social media, etc.
Parallel processing often involved
Hadoop is emblematic, but other technologies are Big Data too
Processing of data sets too large for transactional
databases
Analyzing interactions, rather than transactions
The three V’s: Volume, Velocity, Variety
•Big Data tech sometimes imposed on small data problems
7. What is Hadoop?
Open source implementation of Google’s MapReduce and
GFS (Google File System)
Allows for scale-out processing of petabyte scale data
1 PB = 1,024 TB
Also distributed storage
Commodity hardware
Can work against flat files, or certain database formats
Native processing involves imperative Java code
Other languages supported through “Streaming”
7
8. What is HDInsight?
Microsoft’s Hadoop distribution, on Windows
Most other distros on Linux
Based on Hortonworks Data Platform (HDP)
Runs on Azure, eventually on Windows Server, and as
sandbox on dev PC
For .NET devs: .NET SDK for Hadoop, LINQ provider
8
12. A MapReduce Example
• Count by suite, on each floor
• Send per-suite, per platform totals to lobby
• Sort totals by platform
• Send two platform packets to 10th, 20th, 30th floor
• Tally up each platform
• Merge tallies into one spreadsheet
• Collect the tallies
13. MapReduce Options
Pig, Hive, Sqoop, Mahout also generate MapReduce code
13
Java
JavaScript (“Rhino”)
Other languages, especially Python, via Streaming
C# via Streaming
C# via .NET SDK
14. Hortonworks Data
Platform for
Windows
MRLib (NuGet
Package)
LINQ to Hive
OdbcClient +
Hive ODBC
Driver
Deployment
WebHDFS
client
MR code in C#,
HadoopJob,
MapperBase,
ReducerBase
Amenities for
Visual Studio/.NET
16. Hive
Began as Hadoop sub-project
Now top-level Apache project
Provides a SQL-like (“HiveQL”) abstraction over
MapReduce
Has its own HDFS table file format (and it’s fully schema-
bound)
Can also work over HBase
Acts as a bridge to many BI products which expect tabular
data
17. Hive ODBC Consumers
17
Excel 2010 or 2013 (including via add-in)
PowerPivot
SQL Server Analysis Services, Tabular Mode
SQL Server Reporting Services
ADO.NET OdbcClient provider
LINQ provider
18. xVelocity Technologies
Formerly known as VertiPaq
PowerPivot, SSAS Tabular, SQL Server columnar indexes
Implements BI Semantic Model (BISM)
Uses column store technology
Compression
In-memory
Speed
Not a Big Data technology per se, but very useful for
analysis of job output
18
19. Power View
Reports on BISM models (PowerPivot, SSAS Tabular)
Hosted in SharePoint 2010, 2013 Enterprise
Also Excel 2013 (but not on ARM/Windows RT)
Interactive data exploration
19
21. Project “Hekaton”
In-memory engine for SQL Server transactional workloads
Tables must be declared as in-memory explicitly
In-memory and standard tables can coexist in same db
Stored procs on in-mem tables are compiled to native code
Hekaton and xVelocity are separate
Hekaton ≠ PowerPivot/SSAS Tabular
Hekaton ≠ Columnstore indexes
Compare to SAP HANA
In-memory, transactional, analytical, column store
21
22. NoSQL
NoSQL databases are non-relational and non- or loosely-
schematized
HBase is a NoSQL database, of the wide column variety
Hive implements a SQL layer over it
HBase not yet in HDInsight
HBase table = HDFS file
Three other NoSQL categories
Key-value store, document store, graph database
Azure Table Storage is a key-value store NoSQL database
Some of them aren’t really Big Data tools, but market
themselves that way anyway
22
23. SQL Parallel Data Warehouse (PDW)
SQL PDW is a Massively Parallel Processing (MPP)
database
Teradata, IBM Netezza, HP Vertica also in this category
It’s an array/cluster of SQL servers made to look like one
SQL Server
Available as appliance only
Purchase from HP, Dell
Server, storage and network all pre-built and configured
Many other MPP products based on PostgreSQL
PDW loosely based on acquired DATAllegro product
Implemented MPP with Ingres, written in Java, running on Linux
23
24. MapReduce versus MPP
24
MapReduce MPP
Splits preprocessing amongst mapper
nodes and aggregation amongst reducers
Scales infinitely on commodity hardware
Uses locally attached commodity disks on
nodes
Uses imperative code
Processes flat files, wide column tables
(HBase), relational tables (Hive)
Divide-and-conquer approach, parallel,
distributed
Splits query amongst nodes then unifies
result sets
Scales to high-end assets in the
appliance cabinet
Uses shared storage (can be more
network traffic)
Uses SQL
Works with relational tables only
Divide-and-conquer approach, parallel,
distributed
25. PolyBase
To be included in next version of PDW
Mashup of SQL Server and Hadoop
Enables PDW to address Hadoop data nodes (HDFS)
directly
Parallelism managed by PDW
Tables are imported into SQL Server db
They are EXTERNAL tables
They can participate in joins with standard tables
25
26. Resources
MS Big Data/HDInsight
http://www.microsoft.com/bigdata
Apache Hadoop
http://hadoop.apache.org/
Apache HBase
http://hbase.apache.org/
SQL PDW
http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/pdw.aspx
PolyBase
http://gsl.azurewebsites.net/Projects/Polybase.aspx
xVelocity
http://www.microsoft.com/en-us/sqlserver/solutions-technologies/data-warehousing/in-
memory.aspx
Column store
http://en.wikipedia.org/wiki/Column-oriented_DBMS
Power View
http://office.microsoft.com/en-us/excel-help/power-view-explore-visualize-and-present-your-
data-HA102835634.aspx
Hekaton
http://blogs.technet.com/b/dataplatforminsider/archive/2012/11/08/breakthrough-performance-
with-in-memory-technologies.aspx
26