Big data
Published in Technology, Business
  • Difference of IoT and Internet, IPv6 – MSDN this month …
    Slide Objectives: Huge opportunities in the Internet of Things.
    Speaking Points: The Internet of Things can help us monitor our environment and optimize our physical world. The tremendous amount of data it produces needs to be stored and analyzed in real time, interactively, and in batch.
  • Slide Objectives: Talk from the bottom layer up to discuss the Microsoft big data solution.
    Speaking Points: BI platform: SQL Server Analysis Services and Reporting Services. Self-service BI: Power View, PowerPivot, predictive analysis, and embedded BI. Unstructured and structured data sources are taken in through Hadoop or PDW.
  • Slide Objectives: Vision slide.
    Speaking Points: Broaden access to Hadoop on the Windows platform. Enterprise ready through AD and System Center (to come). BI integration and self-service BI.
  • Slide Objectives: Architecture of Hadoop.
    Speaking Points: MapReduce is the programming layer; it resembles the primitives of parallel programming. At the file system layer, the Hadoop Distributed File System (HDFS) takes care of the availability, redundancy, and reliability of storage. Each block of your data is kept in 3 copies by default for safekeeping, and the MapReduce layer can schedule work onto a node that contains the actual data block.
  • Speaking Points: MapReduce is about minimizing the movement of data inside your cluster. The job tracker knows where all the data blocks are and sends the operation code to a node that contains the data.
  • Slide Objectives: Understand the HDInsight ecosystem.
    Speaking Points: The biggest buzzword in big data right now is Hadoop. It can mean many things, but it always includes HDFS and MapReduce.
    HDInsight color legend: Red = in product now; Blue = planned for product; Green = ecosystem can connect now; Purple = samples available; Orange = ecosystem planned.
    Flume and HBase are not available in the first release of HDInsight Service. As of 3/15, we don't have an on-premises solution, so AD integration is not yet available. System Center integration will come later as well. The green boxes are packages in the ecosystem that have not been included in the service but should work out of the box once downloaded.
  • Slide Objectives: Provides one layer to access both the attached/local storage on each node and the remote Windows Azure Blob storage, which is the default.
    Speaking Points: One interface to rule both DFS and Azure Blob storage.
    Blob storage layers: Front End: security/auth and scaled-out request handler. Partition Layer: object layer, mapping of objects such as tables, blobs, and queues to streams (cached in Front End), CC. Stream Layer: 3-node HA, scale-out stream store. Please see the Windows Azure Storage paper for details.
    In some ways ASV changes things again: we are now moving data to the compute, since the data is now remote. Blob storage allows you to persist your data even when you tear down your cluster.
  • Slide Objectives: Understand the details of ASV.
    Speaking Points: You will need to create an Azure storage account, and you will need its account name and key. You should create the cluster close to where your data is (for storage in the West region, create the cluster in the West data center).
  • Slide Objectives: Best of both worlds in terms of programming flexibility.
    Speaking Points: We offer everything the Hadoop distribution offers. In addition, we have made available JavaScript, a browser-hosted console, F#, C#, and LINQ to Hive to make life easier for .NET/enterprise developers. DevOps can also use PowerShell and a Node.js-based CLI to control and manage the cluster.
  • Innovate across the stack in developer tools for a better experience.
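
The MapReduce programming layer described in the Hadoop architecture notes above can be sketched in plain Python. This is a minimal single-process illustration, not Hadoop code: the function names `map_phase`, `shuffle`, and `reduce_phase` and the word-count task are assumptions chosen for illustration only.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as a framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big opportunity", "big cluster"]))
```

In a real cluster the map and reduce calls would run in parallel on different nodes; only the shape of the three phases carries over from this sketch.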


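The data-locality idea in the notes (send the operation code to a node that already holds the data block) can be sketched as a toy scheduler. Everything here is hypothetical: the node names, the `block_locations` table, and the least-loaded tie-break are illustrative assumptions, not the actual job tracker algorithm.

```python
def schedule(block_locations, busy=None):
    """Assign each block's task to a node that stores a replica of it.

    block_locations: dict of block id -> list of nodes holding a replica.
    busy: optional dict of node -> number of tasks already assigned.
    Returns a dict of block id -> chosen node.
    """
    load = dict(busy or {})
    assignment = {}
    for block, replicas in sorted(block_locations.items()):
        # Pick the least-loaded replica holder; ties go to the first listed.
        node = min(replicas, key=lambda n: load.get(n, 0))
        assignment[block] = node
        load[node] = load.get(node, 0) + 1
    return assignment


# Hypothetical cluster in which every block has 3 replicas.
blocks = {
    "blk_1": ["node-a", "node-b", "node-c"],
    "blk_2": ["node-a", "node-b", "node-d"],
    "blk_3": ["node-a", "node-c", "node-d"],
}
plan = schedule(blocks)
```

Because every task lands on a node that already stores a replica, no block has to cross the network before processing starts, which is the point the speaker notes make about minimizing data movement.
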
  • 1. Volume: Gigabytes (10^9), Terabytes (10^12), Petabytes (10^15), Exabytes (10^18). Velocity - Variety - Variability.
    Eras: ERP / CRM, WEB 2.0, Internet of things.
    Data sources: Click Stream, Mobile, Wikis / Blogs, Sensors / RFID / Devices, Social Sentiment, Audio / Video, Advertising, eCommerce, Log Files, Collaboration, Spatial & GPS Coordinates, Digital Marketing, Data Market Feeds, Payables, Search Marketing, Payroll, Deal Tracking, Web Logs, Inventory, Contacts, Sales Pipeline, Recommendations, eGov Feeds, Weather, Text/Image.
    Storage cost per GB: 1980 $190,000; 1990 $9,000; 2000 $15; 2010 $0.07.
  • 2. Big Data, BIG OPPORTUNITY
    Software growth (billions $): 2012 1.8; 2013 2.5; 2014 3.4; 2015 4.6. 34% compound annual growth rate.
    49% of CEOs and CIOs are planning big data projects.
    Services growth (billions $): 2012 2.7; 2013 3.9; 2014 5.1; 2015 6.5. 39% compound annual growth rate.
    Sources: 1. McKinsey & Company, McKinsey Global Survey Results, Minding Your Digital Business, 2012. 2. IDC Market Analysis, Worldwide Big Data Technology and Services 2012–2015 Forecast, 2012.
  • 3. Distributed Processing (MapReduce); Distributed Storage (HDFS); ODBC Query (Hive).
    Legend: Red = Core Hadoop; Blue = Data processing; Purple = Microsoft integration points and value adds; Orange = Data Movement; Green = Packages.
  • 4. HDFS API
    DFS (1 Data Node per Worker Role) and Compute Cluster: Name Node, Data Node, Data Node, …
    Azure Storage (ASV): Azure Blob Storage with Front end, Partition Layer, Stream Layer.
  • 5. Hive, Pig, Mahout, Cascading, Scalding, Scoobi, Pegasus…
    C#, F# Map/Reduce, LINQ to Hive, .NET management clients.
    JavaScript Map/Reduce, browser-hosted console, Node.js management clients.
    PowerShell, cross-platform CLI tools.
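
The storage price points on slide 1 can be put side by side with a little arithmetic. The sketch below assumes decimal units (1 TB = 1,000 GB, in line with the slide's powers of ten); the helper names `cost_of` and `price_drop` are hypothetical and exist only for this illustration.

```python
# Storage price per gigabyte as listed on slide 1 (US dollars).
COST_PER_GB = {1980: 190_000, 1990: 9_000, 2000: 15, 2010: 0.07}

GB_PER_TB = 1_000  # assumption: decimal units, as in the slide's powers of ten


def cost_of(terabytes, year):
    """Cost in dollars of storing `terabytes` at the given year's price."""
    return terabytes * GB_PER_TB * COST_PER_GB[year]


def price_drop(from_year, to_year):
    """How many times cheaper one gigabyte became between two years."""
    return COST_PER_GB[from_year] / COST_PER_GB[to_year]


if __name__ == "__main__":
    for year in sorted(COST_PER_GB):
        print(year, f"${cost_of(1, year):,.2f} per terabyte")
    print(f"1980 to 2010: about {price_drop(1980, 2010):,.0f}x cheaper per GB")
```

Running it shows why workloads at the terabyte and petabyte tiers of the slide only became affordable in the later eras it names.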