Bar camp bigdata

Presentation used during Big Data session @ Barcamp Bangalore

Published in: Education, Technology

Transcript

  • 1. BIG DATA Satish A G
  • 2. What is Big Data? Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
  • 3. General Trivia
  • 4. 4 Vs
  • 5. Characteristics
  • 6. Why?
  • 7. Differences
  • 8. Criteria
  • 9. USE CASE- Online Services
  • 10. What do they want to do?
      Audiences: Employees, Digital Nomads, Technology Professionals, Data Scientists, Students, Influencers
      • … recruit sponsors and become an advocate
      • … to be the first human sensor and crowd-source knowledge
      • … to learn about how technology enables big data analytics
      • … explore new, diverse data to improve human knowledge
      • … explore how data is impacting mankind and be a data scientist for a day
      • … share this with my network, and participate in the experience
  • 11. Who is a Data Scientist? Data science incorporates varying elements and builds on techniques and theories from many fields, including mathematics, statistics, data engineering, pattern recognition and learning, advanced computing, visualization, uncertainty modeling, data warehousing, and high-performance computing, with the goal of extracting meaning from data and creating data products. A practitioner of data science is called a data scientist.
  • 12. Profile of Data Scientist
  • 13. Big Data Eco System
  • 14. Data Analytics Lifecycle
  • 15. One Approach
  • 16. Techniques/ Analytical Methods
  • 17. Text Analysis- Approach
  • 18. Text Analysis - Process
  • 19. MapReduce & HDFS
      • MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster (a group of connected computers that work together so that in many respects they can be viewed as a single system).
      • A MapReduce program comprises a Map() procedure that performs filtering and sorting and a Reduce() procedure that performs a summary operation.
      • The "MapReduce System" orchestrates the processing by marshaling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, providing for redundancy and fault tolerance, and managing the whole process overall.
      • HDFS is a distributed, scalable, and portable file system written in Java for the Hadoop framework.
  • 20. Hadoop
      • Apache Hadoop is an open-source software framework that supports data-intensive distributed applications, licensed under the Apache v2 license.
      • Effectively, it implements MapReduce & provides a distributed file system (HDFS).
      • It supports the running of applications on large clusters of commodity hardware.
      • Hadoop makes it possible to run applications on systems with thousands of nodes involving thousands of terabytes. Its distributed file system facilitates rapid data transfer rates among nodes and allows the system to continue operating uninterrupted in case of a node failure. This approach lowers the risk of catastrophic system failure, even if a significant number of nodes become inoperative.
  • 21. Overview
  • 22. Enterprise Visualization Software
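
As a rough illustration of the Map() and Reduce() procedures described on slide 19, here is a minimal single-process Python sketch of word counting. It is not Hadoop code: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this example, and the in-memory shuffle step only stands in for the grouping that a real MapReduce system performs across the nodes of a cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key (assumed stand-in for the framework's shuffle)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: summarize all values for one key, here by summing the counts."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce sequentially over a list of strings."""
    pairs = []
    for document in documents:
        # On a cluster, each document (or split) could be mapped on a different node.
        pairs.extend(map_phase(document))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big clusters"]))
```

Because each call to map_phase depends only on its own input and each call to reduce_phase depends only on one key's values, both phases can in principle run in parallel, which is the property the slide attributes to the model.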