Introduction to Big Data
Joey Li
joeylicc@gmail.com
@joeylicc
joeylicc.wordpress.com
Introduction to Big Data and Microsoft Solution to Big Data

Published in: Technology
Transcript of "Introduction to Big Data"

  1. Introduction to Big Data
     Joey Li
     joeylicc@gmail.com
     @joeylicc
     joeylicc.wordpress.com
  2. What is Big Data?
     Big Data is a collection of data sets so large and complex that it becomes difficult to process using traditional database systems.
     Big Data Challenges (3Vs)
     ● Volume: Amount of Data
     ● Velocity: Speed of Data In & Out
     ● Variety: Range of Data Types & Sources
  3. Microsoft Solution to Big Data
     ● Microsoft HDInsight
     ● Microsoft .NET SDK for Hadoop
     ● Microsoft ODBC Driver for Hive
     ● Microsoft Excel (Power View & PowerPivot)
     ● Microsoft SharePoint (Power View)
  4. Microsoft HDInsight
     ● 100% Apache Hadoop compatible Big Data implementation
     ● Microsoft support of HDInsight on Windows Server and Windows Azure
     ● Simplified deployment and ease of manageability with System Center 2012 or Windows Azure
     ● Elegant connectivity to Microsoft Office Excel 2013 and Business Intelligence tools
  5. What is Hadoop?
     Apache Hadoop is an open-source software framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
  6. What is Hadoop? (Cont.)
     Hadoop includes 2 major modules:
     1. Hadoop Distributed File System (HDFS)
        A distributed file system that provides high-throughput access to application data
     2. Hadoop MapReduce
        A programming model for parallel processing of large data sets
  7. Hadoop Architecture
  8. Hadoop Cluster
  9. HDFS Write Operation
  10. HDFS Read Operation
  11. MapReduce
  12. Hadoop Ecosystem
  13. Microsoft .NET SDK for Hadoop
     ● HDInsight Cluster Management
     ● Hadoop Job Submission
     ● Customize Map/Reduce Job
     ● LINQ to Hive
  14. Microsoft ODBC Driver for Hive
     ● Connect the following tools to Hadoop for data insight
       ○ Microsoft Excel (Power View & PowerPivot)
       ○ Microsoft SharePoint (Power View)
       ○ Microsoft SQL Server
         ■ Database Engine
         ■ Analysis Services
  15. Learning Hadoop
     ● Get Started with Hadoop@Hortonworks
       http://hortonworks.com/get-started/
     ● Big Data University
       http://bigdatauniversity.com/
     ● Getting Started with Microsoft Big Data
       http://www.microsoftvirtualacademy.com/training-courses/getting-started-with-microsoft-big-data
  16. References
     ● Big Data@Wikipedia
       http://en.wikipedia.org/wiki/Big_data
     ● Big Data@Microsoft
       http://www.microsoft.com/en-us/sqlserver/solutions-technologies/business-intelligence/big-data.aspx
     ● Hortonworks Data Platform (HDP)
       http://hortonworks.com/
  17. References (Cont.)
     ● Apache Hadoop
       http://hadoop.apache.org/
     ● Apache Hadoop@Wikipedia
       http://en.wikipedia.org/wiki/Apache_Hadoop
     ● Microsoft .NET SDK for Hadoop
       http://hadoopsdk.codeplex.com/
     ● Microsoft ODBC Driver for Hive
       http://www.microsoft.com/en-us/download/details.aspx?id=37134
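
The slides describe Hadoop MapReduce as "a programming model for parallel processing of large data sets" but give no example. The sketch below illustrates that model with a word count in plain Python, run in a single process. It is not Hadoop code and does not use the Hadoop API or the .NET SDK; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen only for illustration. On a real cluster the map and reduce steps would run on many machines and the framework would perform the shuffle between them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single total."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data is big"]))
```

Because each call to map_phase depends only on its own input line, and each call to reduce_phase depends only on one key's values, both steps can be distributed across machines without coordination. That independence is what lets the model scale across a cluster.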