Intro to Big Data

Introduction to Big Data.

Published in: Technology

  1. Presented by: Jon Bloom, Senior Consultant, Agile Bay, Inc.
  2. Customers & Partners
     • Jon Bloom
     • Blog: http://www.bloomconsultingbi.com
     • Twitter: @sqljon
     • LinkedIn: http://www.linkedin.com/in/BloomConsultingBI
     • Email: jbloom@AgileBay.com
  3. www.agilebay.com
  4. Session Agenda
     • What is Big Data?
     • What is Hadoop?
     • BI vs. Hadoop
     • Demo:
  5. Terms and Acronyms
     • Hadoop: An Apache open source project to develop software for reliable, scalable, distributed computing.
     • Cluster: A group of computers (nodes) linked together to perform highly available, compute-intensive work.
     • HDFS: A distributed file system that provides high-throughput access to application data.
     • YARN: A framework for job scheduling and cluster resource management.
     • MapReduce: A system for parallel processing of large data sets.
  6. What is Big Data?
     • Volume, Velocity, Variety
  7. What is Hadoop?
     • Apache open source project
     • Batch-oriented parallel processing across commodity servers
     • Ecosystem: Ambari, HBase, Avro, Cassandra, Chukwa, Hive, Mahout, Pig, ZooKeeper
  8. Distributed Computing & MapReduce
     • Mapper
     • Reducer
  9. BI vs. Hadoop
     • Hadoop is not a replacement for BI
     • Hadoop extends BI capabilities
     • BI = Scales up to 100s of gigabytes
     • Hadoop = From 100s of gigabytes to terabytes (1,000s of gigabytes) and petabytes (1,000,000s of gigabytes)
  10. Contact
     • Blog: www.bloomconsultingbi.com
     • Twitter: @sqljon
     • LinkedIn: http://www.linkedin.com/in/BloomConsultingBI
     • Email: jbloom@agilebay.com
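
The mapper and reducer roles named on the MapReduce slide can be illustrated with a small word count. This is only a single-process sketch in plain Python, not Hadoop code and not taken from the presentation: the function names `mapper`, `shuffle`, `reducer`, and `word_count` are hypothetical, and the `shuffle` step stands in for the grouping that a real framework would perform across nodes between the map and reduce phases.

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key (illustrative stand-in for the framework's shuffle)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce phase: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data is big", "hadoop processes big data"]
    print(word_count(sample))
```

Because each call to `mapper` depends only on its own line, and each call to `reducer` only on its own key, both phases could in principle be spread across many machines, which is the property the slide's "parallel processing across commodity servers" relies on.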
