Bar camp bigdata
Presentation used during the Big Data session at Barcamp Bangalore.

Presentation Transcript

  • BIG DATA Satish A G
  • What is Big Data? Big data is the term for a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.
  • General Trivia
  • 4 Vs
  • Characteristics
  • Why?
  • Differences
  • Criteria
  • USE CASE - Online Services
  • What do they want to do? Participants: Employees, Digital Nomads, Technology Professionals, Data Scientists, Students, Influencers. Their goals: • … recruit sponsors and become an advocate • … be the first human sensor and crowdsource knowledge • … learn how technology enables big data analytics • … explore new, diverse data to improve human knowledge • … explore how data is impacting mankind and be a data scientist for a day • … share this with my network and participate in the experience
  • Who is a Data Scientist? Data science incorporates varying elements and builds on techniques and theories from many fields, including mathematics, statistics, data engineering, pattern recognition and learning, advanced computing, visualization, uncertainty modeling, data warehousing, and high-performance computing, with the goal of extracting meaning from data and creating data products. A practitioner of data science is called a data scientist.
  • Profile of Data Scientist
  • Big Data Ecosystem
  • Data Analytics Lifecycle
  • One Approach
  • Techniques / Analytical Methods
  • Text Analysis - Approach
  • Text Analysis - Process
  • MapReduce & HDFS • MapReduce is a programming model for processing large data sets with a parallel, distributed algorithm on a cluster (a group of connected computers that work together so that in many respects they can be viewed as a single system). • A MapReduce program comprises a Map() procedure that performs filtering and sorting, and a Reduce() procedure that performs a summary operation. • The "MapReduce System" orchestrates the processing by marshaling the distributed servers, running the various tasks in parallel, managing all communications and data transfers between the various parts of the system, providing for redundancy and fault tolerance, and managing the whole process. • HDFS (Hadoop Distributed File System) is a distributed, scalable, and portable file system written in Java for the Hadoop framework.
  • Hadoop • Apache Hadoop is an open-source software framework that supports data-intensive distributed applications and is licensed under the Apache License 2.0. • Effectively, it implements MapReduce and provides a distributed file system (HDFS). • It supports running applications on large clusters of commodity hardware. • Hadoop makes it possible to run applications on systems with thousands of nodes and thousands of terabytes of data. Its distributed file system facilitates rapid data transfer among nodes and allows the system to continue operating uninterrupted when a node fails. This approach lowers the risk of catastrophic system failure, even if a significant number of nodes become inoperative.
  • Overview
  • Enterprise Visualization Software
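The MapReduce & HDFS slide describes a Map() procedure that filters and sorts and a Reduce() procedure that summarizes. The following is a minimal, single-process Python sketch of that model, using word count as the example task. The task and every function name (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are illustrative assumptions, not Hadoop's API. Here the sorting by key happens in the grouping step, which stands in for the shuffle that the MapReduce system performs between the two phases. On a real cluster, the map and reduce tasks would run in parallel on different nodes.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Map(): emit a (word, 1) pair for every whitespace-separated word."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key and sort by key, standing in for
    what the framework does between the Map and Reduce phases."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce(): summarize all values for one key (here, by summing them)."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # Each document plays the role of one input split handled by one map task.
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data about data"]))
```

Calling `word_count(["big data is big", "data about data"])` returns `{'about': 1, 'big': 2, 'data': 3, 'is': 1}`. Because Reduce() only ever sees one key and its values, independent keys can be reduced on different machines. This is what lets the framework parallelize the summary step.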
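The Hadoop slide says the system keeps operating when a node fails. One way to see why is to model HDFS-style block replication. A file is cut into fixed-size blocks and each block is copied to several nodes, so a read succeeds as long as one replica of every block survives. The sketch below is a toy under stated assumptions. The 4-byte block size and the round-robin placement are hypothetical: HDFS defaults to 128 MB blocks in Hadoop 2 and later, and it uses a rack-aware placement policy. The replication factor of 3 matches the HDFS default. The names `store` and `read` are illustrative only.

```python
from typing import Dict, List, Set, Tuple

BLOCK_SIZE = 4   # toy value (assumption); real HDFS defaults to 128 MB blocks
REPLICATION = 3  # matches HDFS's default replication factor


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> List[bytes]:
    """Cut a file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def store(data: bytes, nodes: List[str], replication: int = REPLICATION
          ) -> Tuple[Dict[str, Dict[int, bytes]], Dict[int, List[str]]]:
    """Write each block to `replication` distinct nodes.

    Placement is simple round-robin, a hypothetical policy rather than
    HDFS's rack-aware one. Returns (per-node storage, block -> replica nodes).
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    storage: Dict[str, Dict[int, bytes]] = {n: {} for n in nodes}
    placement: Dict[int, List[str]] = {}
    for block_id, block in enumerate(split_into_blocks(data)):
        replicas = [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
        for node in replicas:
            storage[node][block_id] = block
        placement[block_id] = replicas
    return storage, placement


def read(storage: Dict[str, Dict[int, bytes]], placement: Dict[int, List[str]],
         failed: Set[str]) -> bytes:
    """Reassemble the file, fetching each block from any surviving replica."""
    out = bytearray()
    for block_id in sorted(placement):
        live = [n for n in placement[block_id] if n not in failed]
        if not live:
            raise IOError(f"block {block_id} lost: every replica is on a failed node")
        out += storage[live[0]][block_id]
    return bytes(out)


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]
    storage, placement = store(b"commodity hardware fails", nodes)
    # Two of four nodes fail; each block has 3 replicas, so one copy survives.
    print(read(storage, placement, failed={"node1", "node3"}))
```

With three replicas spread over four nodes, any two node failures still leave a live copy of every block, so `read` returns the original bytes. In this toy, failing three nodes makes `read` raise `IOError`, which shows that the guarantee depends on the replication factor relative to the number of simultaneous failures.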