Big Data – General Introduction

The session gives an introduction to Big Data.
It starts with a generally accepted definition of the term Big Data, then explores why Big Data is important in the current business scenario. It ends by enumerating the technologies used to analyze Big Data, such as MapReduce and NoSQL.

Key questions covered in the webinar:
(i) What is Big Data?
(ii) Why is Big Data important in the current business scenario?
(iii) How can an organization use Big Data effectively?
(iv) What are the important technologies used to analyze Big Data?
(v) What are the MapReduce, Hadoop, HDFS, and NoSQL technologies?

Published in: Technology, Business

Big Data – General Introduction

  1. Big Data – General Introduction - Vignesh Gopalan, IIG. © Copyright 2013 EMC Corporation. All rights reserved.
  2. Agenda: Big Data definition; importance of Big Data; technologies used in Big Data analysis
  3. Big Data, a definition: the 'V's of Big Data (Volume, Variety, Velocity, Veracity)
  4. Why is Big Data important? Business analytics; big science, like the LHC and gene sequencing programs; big government
  5. Big Data technologies primer: MapReduce computation framework and Hadoop Distributed File System; distributed databases; NoSQL technologies
  6. MapReduce: published by Google; scalable, fault-tolerant batch computation in parallel; a distributed computation framework
  7. MapReduce: consists of two functions operating on key-value pairs. Map performs filtering and sorting; Reduce performs a summary operation on the Map step results. … continued
  8. Map Reduce… (image courtesy: Big Data by Nathan Marz and James Warren, Manning Publications)
  9. Map Reduce… (image courtesy: Big Data by Nathan Marz and James Warren, Manning Publications)
  10. Distributed File System: distributed and scalable file system; highly available; intrinsically aware of Map and Reduce jobs; supports horizontal and vertical partitioning; HDFS (Hadoop Distributed File System)
  11. HDFS Architecture (image courtesy: Big Data by Nathan Marz and James Warren, Manning Publications)
  12. Apache Hadoop: open source implementation of MapReduce + DFS (image courtesy: Wikipedia)
  13. NoSQL Databases: highly optimized key-value stores; no ACID guarantees, eventual consistency; fault-tolerant, distributed architecture; Amazon Dynamo and Redis are examples
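
Slides 6 and 7 describe MapReduce as two functions operating on key-value pairs. The idea can be sketched in a single Python process, using word counting as the example. The function names (map_words, shuffle, reduce_counts, word_count) are illustrative assumptions; they come neither from the slides nor from Hadoop's API. The shuffle step stands in for the grouping and sorting that a real framework performs between the two phases.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_words(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key and sort the keys (hypothetical stand-in for the
    framework's shuffle/sort phase)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_counts(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: summarize all values collected for one key by summing."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce sequentially over the input lines."""
    mapped = (pair for line in lines for pair in map_words(line))
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big science"]))
```

This sketch runs everything in sequence on one machine. In a framework such as Hadoop, the map and reduce calls would be distributed across many nodes and rerun on failure, which is where the scalability and fault tolerance mentioned on slide 6 come from.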