Bar camp bigdata

Presentation used during Big Data session @ Barcamp Bangalore

Published in: Education, Technology

  1. BIG DATA Satish A G
  2. What is Big Data? Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
  3. General Trivia
  4. 4 Vs
  5. Characteristics
  6. Why?
  7. Differences
  8. Criteria
  9. USE CASE - Online Services
  10. What do they want to do?
      Audiences: Employees, Digital Nomads, Technology Professionals, Data Scientists, Students, Influencers
      • … recruit sponsors and become an advocate
      • … to be the first human sensor and crowd-source knowledge
      • … to learn about how technology enables big data analytics
      • … explore new, diverse data to improve human knowledge
      • … explore how data is impacting mankind and be a data scientist for a day
      • … share this with my network, and participate in the experience
  11. Who is a Data Scientist? Data science incorporates varying elements and builds on techniques and theories from many fields, including mathematics, statistics, data engineering, pattern recognition and learning, advanced computing, visualization, uncertainty modeling, data warehousing, and high performance computing, with the goal of extracting meaning from data and creating data products. A practitioner of data science is called a data scientist.
  12. Profile of Data Scientist
  13. Big Data Eco System
  14. Data Analytics Lifecycle
  15. One Approach
  16. Techniques / Analytical Methods
  17. Text Analysis - Approach
  18. Text Analysis - Process
  19. MapReduce & HDFS
      • MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster (a group of connected computers that work together so that in many respects they can be viewed as a single system).
      • A MapReduce program comprises a Map() procedure that performs filtering and sorting and a Reduce() procedure that performs a summary operation.
      • The "MapReduce System" orchestrates the processing by marshaling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, providing for redundancy and fault tolerance, and managing the process as a whole.
      • HDFS is a distributed, scalable, and portable file system written in Java for the Hadoop framework.
  20. Hadoop
      • Apache Hadoop is an open-source software framework that supports data-intensive distributed applications, licensed under the Apache v2 license.
      • Effectively, it implements MapReduce and provides a distributed file system (HDFS).
      • It supports the running of applications on large clusters of commodity hardware.
      • Hadoop makes it possible to run applications on systems with thousands of nodes involving thousands of terabytes. Its distributed file system facilitates rapid data transfer rates among nodes and allows the system to continue operating uninterrupted in case of a node failure. This approach lowers the risk of catastrophic system failure, even if a significant number of nodes become inoperative.
  21. Overview
  22. Enterprise Visualization Software
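
The Map() and Reduce() procedures described on the MapReduce & HDFS slide can be illustrated with a word count. This is a minimal single-process sketch, not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample sentences are made up for illustration, and the `shuffle` step stands in for the grouping and sorting that a real MapReduce system performs across the cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key and sort the keys.

    In a real cluster the framework does this between map and reduce;
    here it is an ordinary in-memory dictionary (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reduce: summarise all values seen for one key."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over a list of text records."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)
```

Because each call to `map_phase` looks at one record only and each call to `reduce_phase` looks at one key only, both phases could in principle run on different machines, which is what makes the model suitable for a cluster.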

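
The Hadoop slide's claim that the file system keeps operating after a node failure rests on storing each block on more than one node. The sketch below is a toy model of that idea under stated assumptions: the round-robin placement, the node names, and the helper functions `place_blocks` and `readable_blocks` are hypothetical and much simpler than HDFS's actual placement policy. The replication factor of 3 matches the HDFS default.

```python
def place_blocks(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    This placement rule is an assumption of the sketch, not HDFS's policy.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def readable_blocks(placement, failed):
    """Return the blocks that still have at least one replica on a live node."""
    return {
        block
        for block, replicas in placement.items()
        if any(node not in failed for node in replicas)
    }


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
placement = place_blocks(10, nodes)

# With three replicas per block, losing any two nodes leaves every block readable.
survivors = readable_blocks(placement, failed={"node-a", "node-b"})
print(len(survivors), "of", len(placement), "blocks readable")
```

In this model data is lost only when every node holding a given block fails at once, which is why adding replicas lowers the risk of catastrophic failure.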