Introduction to Big Data by Manouj Bongirr

Introduction to Big Data by Manouj Bongirr presented at Big Data Meetup - Pune Chapter #BigDataPune

Published in: Technology

  1. Copyright ©2012 Big Logic Technologies
  2. A Big Data Technology, Consulting & Training Firm. Big Logic was founded in the US after seeing the value of Apache Hadoop as a Big Data analytics platform. At Big Logic, we share our experience from guiding many enterprises through successful Big Data projects. We help you decide on build versus buy when it comes to achieving your defined business objectives across various technical environments.
  4. Big data is a term applied to data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time. Gartner predicts 800% data growth over the next 5 years. 80-90% of data produced today is unstructured.
  7. Data size units:
     gigabyte (GB): 10^9 bytes, 1024 MB
     terabyte (TB): 10^12 bytes, 1024 GB
     petabyte (PB): 10^15 bytes, 1024 TB
     exabyte (EB): 10^18 bytes, 1024 PB
     zettabyte (ZB): 10^21 bytes, 1024 EB
     yottabyte (YB): 10^24 bytes, 1024 ZB
     1 zettabyte = 1,099,511,627,776 GB
     2009: 800,000 petabytes. 2020: 35 zettabytes, i.e. 35 billion TB. 44x as much data and content over the coming decade.
     Source: IDC, The Digital Universe Decade – Are You Ready?, May 2010
  9. Source: http://www.slideshare.net/cultureofperformance/gartner-idc-and-mckinsey-on-big-data
  10. “Moore's law is the observation that, over the history of computing hardware, the number of transistors on integrated circuits doubles approximately every two years.” (Intel co-founder Gordon E. Moore)
  11. RAM max capacity: 32 GB. HDD max size: 6 TB. [Figure: CPU max speed]
  14. If I need to process a 100 TB dataset:
      • On 1 node: scanning @ 50 MB/s = 23 days
      • On a 1000-node cluster: scanning @ 50 MB/s per node = 33 min
      Challenge: hardware problems; processing and combining data from multiple disks.
  15. • Apache Hadoop is an open source framework for storing, processing and analysing massive amounts of multi-structured data in a distributed environment.
      • Hadoop was inspired by Google's MapReduce and Google File System (GFS) papers.
  16. If you are in any of the above segments, you would be part of the above revenue.
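
The scan-time figures on slide 14 can be checked with a few lines of arithmetic. This is a minimal sketch, assuming decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), a dataset split evenly across nodes, and no coordination overhead; the function name `scan_seconds` is made up for this illustration.

```python
TB = 10**12  # assumption: decimal terabyte
MB = 10**6   # assumption: decimal megabyte


def scan_seconds(dataset_bytes, nodes, rate_bytes_per_s):
    """Seconds to scan a dataset split evenly across nodes reading in parallel."""
    return dataset_bytes / (nodes * rate_bytes_per_s)


# 100 TB scanned at 50 MB/s per node, on 1 node and on 1000 nodes.
single_node = scan_seconds(100 * TB, 1, 50 * MB)
cluster = scan_seconds(100 * TB, 1000, 50 * MB)

print(f"1 node:     {single_node / 86400:.1f} days")
print(f"1000 nodes: {cluster / 60:.1f} minutes")
```

Under these assumptions the single node needs a little over 23 days and the cluster a little over 33 minutes, which matches the slide. A real cluster would be slower than the ideal figure because of the challenges the slide lists: failed hardware and the cost of combining results from many disks.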

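The unit figures on slide 7 mix two conventions: powers of ten (10^21 bytes per zettabyte) and steps of 1024. The sketch below, with names chosen only for this illustration, shows which convention each quoted number appears to follow.

```python
# Binary ladder: each unit is 1024 times the previous one.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def binary_ratio(larger, smaller):
    """How many `smaller` units fit in one `larger` unit, in steps of 1024."""
    return 1024 ** (UNITS.index(larger) - UNITS.index(smaller))


# "1 zettabyte = 1,099,511,627,776 GB" follows the binary ladder (1024**4).
zb_in_gb = binary_ratio("ZB", "GB")

# The IDC figures work out in decimal units.
PB_DEC, TB_DEC, ZB_DEC = 10**15, 10**12, 10**21
tb_2020 = 35 * ZB_DEC // TB_DEC                 # 35 ZB expressed in TB
growth = (35 * ZB_DEC) / (800_000 * PB_DEC)     # 2020 volume over 2009 volume

print(zb_in_gb, tb_2020, growth)
```

The decimal ratio comes to 43.75, which the slide rounds to 44x.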
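
Slide 15 names MapReduce as one of Hadoop's inspirations. As a rough illustration of that programming model only, here is a word count in plain Python running in a single process. It does not use Hadoop or any Hadoop API, and every function name is an assumption made for this sketch.

```python
from collections import defaultdict


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(records):
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


counts = word_count(["big data is big", "hadoop stores big data"])
print(counts)
```

In a distributed framework the map and reduce steps would run on many nodes at once, with the shuffle moving data between them; here everything runs sequentially to keep the idea visible.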