Introduction to Big Data

Introduction to Big Data and Microsoft Solutions for Big Data

  • 1. Introduction to Big Data
    Joey Li (joeylicc@gmail.com, @joeylicc, joeylicc.wordpress.com)
  • 2. What is Big Data?
    Big Data is a collection of data sets so large and complex that they become difficult to process using traditional database systems.
    Big Data Challenges (3Vs)
    ● Volume: amount of data
    ● Velocity: speed of data in and out
    ● Variety: range of data types and sources
  • 3. Microsoft Solutions for Big Data
    ● Microsoft HDInsight
    ● Microsoft .NET SDK for Hadoop
    ● Microsoft ODBC Driver for Hive
    ● Microsoft Excel (Power View & PowerPivot)
    ● Microsoft SharePoint (Power View)
  • 4. Microsoft HDInsight
    ● 100% Apache Hadoop-compatible Big Data implementation
    ● Microsoft support for HDInsight on Windows Server and Windows Azure
    ● Simplified deployment and management with System Center 2012 or Windows Azure
    ● Connectivity to Microsoft Office Excel 2013 and Business Intelligence tools
  • 5. What is Hadoop?
    Apache Hadoop is an open-source software framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
  • 6. What is Hadoop? (Cont.)
    Hadoop includes two major modules:
    1. Hadoop Distributed File System (HDFS): a distributed file system that provides high-throughput access to application data
    2. Hadoop MapReduce: a programming model for parallel processing of large data sets
  • 7. Hadoop Architecture
  • 8. Hadoop Cluster
  • 9. HDFS Write Operation
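In an HDFS write, the client splits a file into fixed-size blocks (64 MB by default in Hadoop 1.x, 128 MB in Hadoop 2.x), the NameNode chooses a set of DataNodes for each block (three replicas by default), and the block is streamed through that pipeline of DataNodes. The Python below is a toy, in-memory sketch of that flow, not Hadoop's API: `write_file`, `choose_pipeline`, the 8-byte block size, and the round-robin placement are hypothetical simplifications (real HDFS uses a rack-aware placement policy).

```python
BLOCK_SIZE = 8   # bytes; tiny for illustration (HDFS defaults are 64 MB or 128 MB)
REPLICATION = 3  # HDFS's default replication factor


def choose_pipeline(datanodes, block_index):
    """Pick up to REPLICATION distinct DataNodes for one block.

    Hypothetical round-robin choice; real HDFS uses a rack-aware policy.
    """
    n = len(datanodes)
    return [datanodes[(block_index + i) % n] for i in range(min(REPLICATION, n))]


def write_file(path, data, datanodes, storage):
    """Split data into blocks and copy each block to a pipeline of DataNodes.

    storage maps datanode -> {block_id: bytes} and stands in for DataNode disks.
    Returns the block map a NameNode would keep: [(block_id, [datanode, ...])].
    """
    block_map = []
    for index, offset in enumerate(range(0, len(data), BLOCK_SIZE)):
        block_id = f"{path}#{index}"
        block = data[offset:offset + BLOCK_SIZE]
        pipeline = choose_pipeline(datanodes, index)
        # In HDFS the client streams the block to the first DataNode, which
        # forwards it to the next; here each replica is simply copied in order.
        for node in pipeline:
            storage.setdefault(node, {})[block_id] = block
        block_map.append((block_id, pipeline))
    return block_map


storage = {}
block_map = write_file("/logs/a.txt", b"hello hadoop world!",
                       ["dn1", "dn2", "dn3", "dn4"], storage)
print(block_map)
# [('/logs/a.txt#0', ['dn1', 'dn2', 'dn3']), ('/logs/a.txt#1', ['dn2', 'dn3', 'dn4']), ('/logs/a.txt#2', ['dn3', 'dn4', 'dn1'])]
```

Because every block lives on three different nodes, losing any single DataNode still leaves two copies of each block it held.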
  • 10. HDFS Read Operation
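In an HDFS read, the client asks the NameNode for the locations of each block and then fetches every block from one of its replicas; if a DataNode is unreachable, it moves on to another replica. The sketch below is a toy illustration under stated assumptions: `read_file`, the `(block_id, [datanode, ...])` block map, and the dictionary standing in for DataNode storage are hypothetical, and replicas are tried in list order rather than by network distance as real HDFS does.

```python
def read_file(block_map, storage, dead_nodes=frozenset()):
    """Reassemble a file from its blocks, falling back to other replicas.

    block_map: [(block_id, [datanode, ...]), ...] as a NameNode might report it.
    storage: datanode -> {block_id: bytes}, standing in for DataNode disks.
    dead_nodes: DataNodes treated as unreachable.
    """
    parts = []
    for block_id, replicas in block_map:
        for node in replicas:
            # Skip replicas on unreachable DataNodes or nodes missing the block.
            if node not in dead_nodes and block_id in storage.get(node, {}):
                parts.append(storage[node][block_id])
                break
        else:
            # The inner loop found no usable replica for this block.
            raise IOError(f"no live replica for {block_id}")
    return b"".join(parts)


storage = {
    "dn1": {"f#0": b"hello ", "f#1": b"world"},
    "dn2": {"f#0": b"hello "},
    "dn3": {"f#1": b"world"},
}
block_map = [("f#0", ["dn1", "dn2"]), ("f#1", ["dn1", "dn3"])]
print(read_file(block_map, storage, dead_nodes={"dn1"}))  # b'hello world'
```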
  • 11. MapReduce
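MapReduce splits a job into a map phase, which turns each input record into intermediate key/value pairs, and a reduce phase, which combines all values that share a key; between the two, the framework groups (shuffles) and sorts the pairs by key. The classic word-count example below runs all three phases in a single Python process as a minimal sketch. The names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, not Hadoop's Java API, and a real job would run many mappers and reducers in parallel across the cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(lines):
    # In Hadoop each mapper would process a different input split, possibly on
    # a different machine; here every phase runs sequentially in one process.
    intermediate = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(intermediate)
    # Keys are sorted before reducing, mirroring Hadoop's sort-by-key step.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


print(word_count(["big data is big", "hadoop processes big data"]))
# {'big': 3, 'data': 2, 'hadoop': 1, 'is': 1, 'processes': 1}
```

Because the mapper only sees one line and the reducer only sees one key's values, each can run independently on different machines; that independence is what lets the model scale out.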
  • 12. Hadoop Ecosystem
  • 13. Microsoft .NET SDK for Hadoop
    ● HDInsight cluster management
    ● Hadoop job submission
    ● Custom Map/Reduce jobs
    ● LINQ to Hive
  • 14. Microsoft ODBC Driver for Hive
    ● Connect the following tools to Hadoop for data insight:
      ○ Microsoft Excel (Power View & PowerPivot)
      ○ Microsoft SharePoint (Power View)
      ○ Microsoft SQL Server
        ■ Database Engine
        ■ Analysis Services
  • 15. Learning Hadoop
    ● Get Started with Hadoop@Hortonworks: http://hortonworks.com/get-started/
    ● Big Data University: http://bigdatauniversity.com/
    ● Getting Started with Microsoft Big Data: http://www.microsoftvirtualacademy.com/training-courses/getting-started-with-microsoft-big-data
  • 16. References
    ● Big Data@Wikipedia: http://en.wikipedia.org/wiki/Big_data
    ● Big Data@Microsoft: http://www.microsoft.com/en-us/sqlserver/solutions-technologies/business-intelligence/big-data.aspx
    ● Hortonworks Data Platform (HDP): http://hortonworks.com/
  • 17. References (Cont.)
    ● Apache Hadoop: http://hadoop.apache.org/
    ● Apache Hadoop@Wikipedia: http://en.wikipedia.org/wiki/Apache_Hadoop
    ● Microsoft .NET SDK for Hadoop: http://hadoopsdk.codeplex.com/
    ● Microsoft ODBC Driver for Hive: http://www.microsoft.com/en-us/download/details.aspx?id=37134