Big Data – General Introduction

The session gives an introduction to Big Data. It starts with a generally accepted definition of the term, then explores why Big Data is important in the current business scenario. It ends by enumerating the technologies used to analyze Big Data, such as MapReduce and NoSQL.

Key questions covered in the webinar:
(i) What is Big Data?
(ii) Why is Big Data important in the current business scenario?
(iii) How can an organization effectively use Big Data?
(iv) What are the important technologies used to analyze Big Data?
(v) What are the MapReduce, Hadoop, HDFS, and NoSQL technologies?

Published in: Technology, Business

Transcript

  • 1. Big Data – General Introduction. Vignesh Gopalan, IIG. © Copyright 2013 EMC Corporation. All rights reserved.
  • 2. Agenda: Big Data – Definition; Importance of Big Data; Technologies used in Big Data Analysis
  • 3. Big Data – A Definition. The ‘V’s of Big Data: Volume, Variety, Velocity, Veracity
  • 4. Why is Big Data Important? Business Analytics; Big Science, such as the LHC and gene sequencing programs; Big Government
  • 5. Big Data – Technologies Primer: MapReduce computation framework and Hadoop Distributed File System; distributed databases; NoSQL technologies
  • 6. MapReduce: a distributed computation framework published by Google. Scalable; fault-tolerant; batch computation in parallel
  • 7. MapReduce consists of two functions operating on key-value pairs. Map performs filtering and sorting; Reduce performs a summary operation on the Map step results. (continued)
  • 8. MapReduce (continued). Image courtesy: Big Data by Nathan Marz and James Warren, Manning Publications
  • 9. MapReduce (continued). Image courtesy: Big Data by Nathan Marz and James Warren, Manning Publications
  • 10. Distributed File System: distributed and scalable file system; highly available; intrinsically aware of Map and Reduce jobs; supports horizontal and vertical partitioning. Example: HDFS (Hadoop Distributed File System)
  • 11. HDFS Architecture. Image courtesy: Big Data by Nathan Marz and James Warren, Manning Publications
  • 12. Apache Hadoop: open source implementation of MapReduce + DFS. Image courtesy: Wikipedia
  • 13. NoSQL Databases: highly optimized key-value stores; no ACID guarantees, eventual consistency; fault-tolerant, distributed architecture. Amazon Dynamo and Redis are examples.
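
The Map and Reduce functions described in slide 7 can be illustrated with a word count, the customary first example. This is a minimal single-process sketch in Python, not Hadoop code: the names `map_words`, `reduce_counts`, and `run_mapreduce` are hypothetical, and the in-memory grouping step only stands in for the shuffle that a real framework performs across machines.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(doc_id: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) key-value pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: summarize all values that were emitted for one key."""
    return word, sum(counts)


def run_mapreduce(
    records: Iterable[Tuple[str, str]],
    mapper: Callable[[str, str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], Tuple[str, int]],
) -> Dict[str, int]:
    # Shuffle (simulated): group every mapped value under its key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    # Each key is reduced independently; sorting only makes the output stable.
    return dict(reducer(key, groups[key]) for key in sorted(groups))


docs = [("d1", "big data is big"), ("d2", "data is everywhere")]
word_counts = run_mapreduce(docs, map_words, reduce_counts)
print(word_counts)  # {'big': 2, 'data': 2, 'everywhere': 1, 'is': 2}
```

Because each document is mapped on its own and each key is reduced on its own, both phases could in principle run in parallel on separate machines, which is the property the slides call scalable batch computation.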

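Eventual consistency, mentioned in slide 13, can be sketched with two replicas of a key-value store that accept writes independently and reconcile later. The `Replica` class, the `sync` function, and the caller-supplied integer versions are assumptions made for illustration; this is not how Amazon Dynamo or Redis implement replication (Dynamo, for instance, uses vector clocks rather than a single counter).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Replica:
    """One hypothetical node that stores key -> (version, value)."""
    name: str
    data: Dict[str, Tuple[int, str]] = field(default_factory=dict)

    def put(self, key: str, value: str, version: int) -> None:
        # Last-write-wins: keep the entry only if its version is newer.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def get(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(a: Replica, b: Replica) -> None:
    """Exchange entries in both directions so the replicas converge."""
    for key, (version, value) in list(a.data.items()):
        b.put(key, value, version)
    for key, (version, value) in list(b.data.items()):
        a.put(key, value, version)


r1, r2 = Replica("r1"), Replica("r2")
r1.put("user:1", "Alice", version=1)
stale_read = r2.get("user:1")  # r2 has not seen the write yet
r2.put("user:1", "Alicia", version=2)
sync(r1, r2)
print(stale_read, r1.get("user:1"), r2.get("user:1"))  # None Alicia Alicia
```

The read from `r2` before `sync` returns nothing even though `r1` already holds a value. That temporary disagreement is what the store trades for availability, and the replicas agree again once they have exchanged their entries.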