Hadoop Ecosystem at a Glance

  1. Hadoop Ecosystem Overview
  2. About Neev
     • Web: Magento eCommerce, SaaS Applications, Video Streaming Portals, Rich Internet Apps, Custom Development
     • Mobile: iPhone, Android, Windows Phone 7, HTML5 Apps
     • Cloud: AWS Consulting Partner, Rackspace, Joyent, Heroku, Google App Engine
     • Other services: User Interface Design and User Experience Design, Performance Consulting Practices, Quality Assurance & Testing, Outsourced Product Development
     Key Company Highlights
     • 250+ team with experience in managing offshore, distributed development.
     • Neev Technologies established in Jan ’05.
     • VC Funding in 2009 by Basil Partners.
     • Part of Publicis Groupe.
     • Member of NASSCOM.
     • Development Centers in Bangalore and Pune.
     • Offices at Bangalore, USA, Delhi, Pune, Singapore and Stockholm.
  3. Hadoop in a Nutshell: An Overview
     • Hadoop is a Java-based, massively scalable distributed framework for processing large data sets (several petabytes) across clusters of thousands of commodity computers.
     • The Hadoop ecosystem has grown over the last few years, and it now carries a lot of jargon around its tools and frameworks.
     • Many organizations are investing and innovating heavily in Hadoop to make it better and easier to use. The mind map on the next slide gives a high-level picture of the ecosystem.
  4. Hadoop : The Big Picture
  5. Hadoop Core
     The core consists of:
     1) HDFS (Hadoop Distributed File System) is designed to run on a cluster of commodity machines. It is highly fault tolerant and suited to processing large data sets. Files stored in HDFS are split into blocks, typically 64 MB or 128 MB, which are stored across nodes in the cluster. Each block is also replicated on multiple nodes (generally 3) to avoid data loss when a node fails.
     2) MapReduce is a software framework for processing large data sets (petabyte scale) on a cluster of commodity hardware. When a MapReduce job runs, Hadoop splits the input and locates the nodes in the cluster that hold each split. The tasks are then run on or close to the node where the data resides, so that computation stays as close to the data as possible. This keeps the network from being flooded with data and becoming a bottleneck.
  6. Hadoop: Distributions
     Hadoop Distribution: Description
     • Apache: Purely open source, maintained by Apache.
     • Cloudera: The leading distribution, with capabilities like management, security, high availability and integration with many other solutions for software and hardware.
     • Hortonworks: Only version for Windows Servers.
     • MapR: Unique features like mounting over NFS.
     • Greenplum: Uses an SQL-based database engine.
     • Intel: Intel’s open source version.
     • Amazon EMR: Amazon’s version of MapReduce, called Elastic MapReduce, a part of AWS. EMR allows a Hadoop cluster to be deployed and MapReduce jobs to be run in the cloud with just a few clicks.
  7. Related Projects
     Related Project: Description
     • Avro: Data serialization framework that is useful in Hadoop and other systems.
     • Pig: Framework for analyzing large data sets using a high-level language called Pig Latin.
     • Hive: Data warehouse framework that supports querying of large data sets stored in Hadoop.
     • HBase: Distributed, scalable data store based on Hadoop.
     • Mahout: Scalable machine learning library.
     • YARN: The next generation of MapReduce.
     • Oozie: Workflow scheduler that runs a sequence of MapReduce and other pre- and post-processing jobs at scheduled times or based on data availability.
     • Flume: A distributed, reliable and available service for collecting, aggregating and moving log data to HDFS.
     • Sqoop: Designed for transferring data between Hadoop and relational databases.
     • Cascading: Application framework for building applications using Hadoop.
  8. Related Technologies
     Related Technology: Description
     • Twitter Storm: As opposed to Hadoop, which is a batch processing system, Storm is a distributed real-time processing system developed by Twitter. Storm is fast, scalable and easy to use.
     • HPCC: High Performance Computing Cluster is an MPP (massively parallel processing) computing platform that helps solve problems involving huge volumes of data.
     • Dremel: A scalable, interactive ad-hoc query system for analysis of read-only nested data, built by Google.
  9. Clients
  10. Partnerships
  11. Neev Information Technologies Pvt. Ltd.
     • India - Bangalore: The Estate, # 121, 6th Floor, Dickenson Road, Bangalore-560042. Phone: +91 80 25594416
     • India - Pune: #13 L’Square, 3rd Floor, Parihar Chowk, Aundh, Pune - 411007. Phone: +91-64103338
     • USA: 1121 Boyce Rd Ste 1400, Pittsburgh PA 15241. Phone: +1 888-979-7860
     • Sweden: Neev AB, Birger Jarlsgatan 53, 6tr, 11145, Stockholm. Phone: +46723250723
     • Singapore: #08-03 SGX Centre 2, 4 Shenton Way, Singapore 068807. Phone: +65 6435 1961
     sales@neevtech.com
     For more info on our offerings, visit www.neevtech.com
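
As a rough illustration of the Hadoop Core slide (slide 5), the sketch below models its two ideas in plain, single-process Python: how a file maps onto HDFS blocks and replicas, and how a MapReduce job maps over input splits, groups values by key, and reduces them. It does not use any Hadoop API. The function names (`hdfs_footprint`, `run_map_reduce`, `map_words`, `reduce_counts`) are hypothetical, and the 128 MB block size and replication factor of 3 are assumptions taken from the typical values quoted on the slide.

```python
import math
from collections import defaultdict

BLOCK_SIZE_MB = 128  # assumed; the slide quotes 64 MB or 128 MB as typical
REPLICATION = 3      # assumed; the slide says blocks are generally kept 3 times


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (block_count, raw_storage_mb) for a single file."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    # The final block holds only the remaining data, so raw storage is
    # the file size times the replication factor, not blocks * block size.
    return blocks, file_size_mb * replication


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: sum all the counts emitted for one word."""
    return word, sum(counts)


def run_map_reduce(splits, mapper, reducer):
    """Toy stand-in for a MapReduce job, run sequentially in one process."""
    grouped = defaultdict(list)
    for split in splits:  # on a real cluster, each split is a map task near its data
        for record in split:
            for key, value in mapper(record):
                grouped[key].append(value)  # shuffle: group values by key
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(hdfs_footprint(1000))
    splits = [["big data", "Big cluster"], ["data node"]]
    print(run_map_reduce(splits, map_words, reduce_counts))
```

Under these assumptions a 1000 MB file occupies 8 blocks and 3000 MB of raw cluster storage. On a real cluster the splits would be processed in parallel on the nodes that hold the corresponding blocks, which is what keeps data movement over the network low.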