Introduction to Big Data


Introduction to Big Data and Microsoft Solution to Big Data

Published in: Technology

Introduction to Big Data

  1. Introduction to Big Data
     Joey Li (@joeylicc)
  2. What is Big Data?
     Big Data is a collection of data sets so large and complex that it becomes difficult to process using traditional database systems.
     Big Data Challenges (3Vs)
     ● Volume: Amount of Data
     ● Velocity: Speed of Data In & Out
     ● Variety: Range of Data Types & Sources
  3. Microsoft Solution to Big Data
     ● Microsoft HDInsight
     ● Microsoft .NET SDK for Hadoop
     ● Microsoft ODBC Driver for Hive
     ● Microsoft Excel (Power View & PowerPivot)
     ● Microsoft SharePoint (Power View)
  4. Microsoft HDInsight
     ● 100% Apache Hadoop-compatible Big Data implementation
     ● Microsoft support of HDInsight on Windows Server and Windows Azure
     ● Simplified deployment and ease of manageability with System Center 2012 or Windows Azure
     ● Elegant connectivity to Microsoft Office Excel 2013 and Business Intelligence tools
  5. What is Hadoop?
     Apache Hadoop is an open-source software framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
  6. What is Hadoop? (Cont.)
     Hadoop includes 2 major modules:
     ● Hadoop Distributed File System (HDFS): a distributed file system that provides high-throughput access to application data
     ● Hadoop MapReduce: a programming model for parallel processing of large data sets
  7. Hadoop Architecture
  8. Hadoop Cluster
  9. HDFS Write Operation
  10. HDFS Read Operation
  11. MapReduce
  12. Hadoop Ecosystem
  13. Microsoft .NET SDK for Hadoop
     ● HDInsight Cluster Management
     ● Hadoop Job Submission
     ● Customize Map/Reduce Job
     ● LINQ to Hive
  14. Microsoft ODBC Driver for Hive
     ● Connect the following tools to Hadoop for data insight
       ○ Microsoft Excel (Power View & PowerPivot)
       ○ Microsoft SharePoint (Power View)
       ○ Microsoft SQL Server
         ■ Database Engine
         ■ Analysis Services
  15. Learning Hadoop
     ● Get Started with Hadoop@Hortonworks
     ● Big Data University
     ● Getting Started with Microsoft Big Data
  16. References
     ● Big Data@Wikipedia
     ● Big Data@Microsoft
     ● Hortonworks Data Platform (HDP)
  17. References (Cont.)
     ● Apache Hadoop
     ● Apache Hadoop@Wikipedia
     ● Microsoft .NET SDK for Hadoop
     ● Microsoft ODBC Driver for Hive
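
The slides describe Hadoop MapReduce only as "a programming model for parallel processing of large data sets". As a rough illustration of that model, here is a minimal word-count sketch in plain Python. It runs in a single process and does not use Hadoop or any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made up for this example and do not come from the slides.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values collected for one key into a total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, for illustration only.
    sample = ["big data is big", "data is everywhere"]
    print(word_count(sample))
```

In a real cluster, the map calls would run in parallel on the machines that hold each piece of the input, and the framework would perform the grouping step across the network before the reduce calls run. The sketch keeps only the shape of the computation.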