
Introduction to Cloud computing and Big Data-Hadoop



Cloud Computing Evolution
Why is Cloud Computing needed?
Cloud Computing Models
Cloud Solutions
Cloud job opportunities
Criteria for Big Data
Big Data challenges
Technologies to process Big Data: Hadoop
Hadoop History and Architecture
Hadoop Eco-System
Hadoop Real-time Use cases
Hadoop Job opportunities
Hadoop and SAP HANA integration

Published in: Technology


  1. 1. Introduction: Cloud Computing and Big Data - Hadoop Presented By: Nagarjuna D.N SAP CTL AT&T, Bengaluru Date: 14-07-2015
  2. 2. Overview • Cloud Computing Evolution • Why is Cloud Computing needed? • Cloud Computing Models • Cloud Solutions • Cloud job opportunities • Criteria for Big Data • Big Data challenges • Technologies to process Big Data: Hadoop • Hadoop History and Architecture • Hadoop Eco-System • Hadoop Real-time Use cases • Hadoop Job opportunities • Hadoop and SAP HANA integration • Summary
  3. 3. Internet of Things (IoT) Big Data “One of the reasons is Cloud Computing!”
  4. 4. Cloud Computing (an evolution of the Internet, hidden from the end user) • Infrastructure is maintained elsewhere as shared computing resources (servers, storage, and network), all delivered over the Internet. • The Cloud delivers a hosting environment that is immediate, flexible, scalable, secure, and available, and that saves corporations money, time, and resources.
  5. 5. Cloud Computing (cont.) • In addition, the platform provides on-demand services, i.e. always on, anywhere, anytime. • “Pay for what you use”: usage is billed on a metered basis. • It is based on utility computing and virtualization.
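
The “pay for what you use” idea can be sketched as a small metering calculation. The hourly rate and the usage figures below are hypothetical, chosen only for illustration and not taken from any provider's price list.

```python
# Hypothetical metered billing: pay only for the server-hours actually used.
HOURLY_RATE = 0.10  # assumed price per server-hour, illustrative only


def metered_cost(server_hours, rate=HOURLY_RATE):
    """Cost under pay-per-use: total usage multiplied by the unit rate."""
    return round(sum(server_hours) * rate, 2)


def fixed_cost(peak_servers, hours, rate=HOURLY_RATE):
    """Cost of keeping enough servers for the peak running the whole time."""
    return round(peak_servers * hours * rate, 2)


# Servers in use during each of six consecutive hours (assumed workload).
usage = [2, 2, 10, 3, 2, 1]

print(metered_cost(usage))                  # billed for 20 server-hours
print(fixed_cost(max(usage), len(usage)))   # billed for 10 servers x 6 hours
```

With a spiky workload like this one, the metered bill covers 20 server-hours, while provisioning for the peak pays for 60.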
  6. 6. Cloud Computing History
  7. 7. Traditional Infrastructure Model (chart: forecasted infrastructure demand; axes: Time, Capital)
  8. 8. Acceptable Surplus (chart: surplus over forecasted infrastructure demand; axes: Time, Capital)
  9. 9. Actual Infrastructure Model (chart: actual infrastructure demand; axes: Time, Capital)
  10. 10. Unacceptable Surplus (chart: surplus; axes: Time, Capital)
  11. 11. Unacceptable Deficit (chart: deficit; axes: Time, Capital)
  12. 12. Utility Infrastructure Model (Concept of Cloud Computing) (chart: actual infrastructure demand; axes: Time, Capital)
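
The infrastructure-model slides compare provisioned capacity with actual demand. A minimal sketch of that comparison, using made-up demand and capacity numbers: step-wise provisioning leaves either surplus (idle capital) or deficit (unmet demand), whereas a utility model that tracks demand leaves neither.

```python
def surplus_and_deficit(capacity, demand):
    """Return (total surplus, total deficit) summed over all time periods."""
    surplus = sum(max(c - d, 0) for c, d in zip(capacity, demand))
    deficit = sum(max(d - c, 0) for c, d in zip(capacity, demand))
    return surplus, deficit


# Hypothetical actual demand per period (arbitrary capacity units).
demand = [10, 30, 80, 40, 20]

# Traditional model: capacity is bought in large steps ahead of a forecast.
traditional = [50, 50, 50, 100, 100]

# Utility (cloud) model: capacity follows actual demand.
utility = list(demand)

print(surplus_and_deficit(traditional, demand))
print(surplus_and_deficit(utility, demand))
```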
  13. 13. Cloud Flavors (Service Models) • IaaS – Infrastructure as a Service • PaaS – Platform as a Service • SaaS – Software as a Service
  14. 14. SaaS Examples
  15. 15. IaaS Examples
  16. 16. PaaS Examples
  17. 17. Cloud Deployment Models • Public Cloud • Private Cloud • Hybrid Cloud • Community Cloud
  18. 18. Cloud Distribution Examined
  19. 19. Enterprise Cloud Solutions 1. Test / Development / QA Platform: use cloud infrastructure servers as a test and development platform. 2. Disaster Recovery: keep images of servers on cloud infrastructure, ready to go in case of a disaster. 3. Cloud File Storage: back up or archive company data to cloud file storage. 4. Load Balancing: use cloud infrastructure for overflow management during peak usage times.
  20. 20. Enterprise Cloud Solutions (cont.) 5. Overhead Control: lower overhead costs and make bids more competitive. 6. Distributed Network Control and Cost Reporting: create an individual private network (VPC) for each subsidiary or contract. 7. Rapid Deployment: turn up servers immediately to meet project timelines. 8. Functional IT Labor Shift: refocus IT labor expense on revenue-producing activities.
  21. 21. Preparing for the Future: Cloud IT Jobs. A sampling of IT skills likely to be in demand in the future: • Functional application development and support (e.g. Oracle, SAP, SQL, linking hardware to software) • Leveraging data to make strategic business decisions (e.g. Business Intelligence: applying sales forecasts to inventory and manufacturing decisions) • Mobile apps (Android, iPhone, Windows Mobile) • Wi-Fi engineers (USF to include broadband communications; LTE replaces GSM/CDMA) • Optical engineers (optical offers the highest bandwidth today: PON, CWDM, DWDM) • Virtualization specialists (economies of scale require virtualization: server, storage, client…) • IP engineers • Network security specialists • Web developers • Social media developers • Business Intelligence application development and support
  22. 22. IT Cloud infrastructure
  23. 23. “Big Data: Big Thing” • Big Data is like a Rubik’s cube: it has many different solutions. • Suppose you take five Rubik’s cubes, scramble them the same way, and give them to five different experts. • Each of them will solve the cube in a matter of seconds. • But if you watch closely, you will notice that even though the final outcome is the same, the route taken to solve the cube is not. • Every expert starts at a different place (color) and solves it with a different method. • It is nearly impossible for two experts to take exactly the same route. Beginning Big Data
  25. 25. Big Data Definition in general • Big Data is a collection of data sets that are large and complex. • They comprise both structured and unstructured data that grow so large, so fast, that they cannot be managed by traditional database systems (e.g. an RDBMS).
  26. 26. Big Data Technically i. Volume: petabytes to zettabytes. ii. Velocity: batch or real-time (stream) processing. iii. Variety: structured, semi-structured, and unstructured. It is estimated that 80% of the world’s data is unstructured; the rest is semi-structured or structured. iv. Veracity: the quality of the data being captured can vary greatly. Fig.: Big Data, based on Doug Laney’s 3Vs model (volume, velocity, variety), with veracity as a later addition.
  27. 27. Variety of Data 1. Structured data: data that is identifiable because it is organized in a structure (a standard, defined format). E.g.: databases, data warehouses, and electronic spreadsheets. 2. Semi-structured data: data that is neither raw data nor typed data in a conventional database system. E.g.: wiki pages, tweets, Facebook data, and instant messages. 3. Unstructured data: data that has no standard, defined structure. E.g.: data files, audio files, video, graphics, and multimedia.
  28. 28. Traditional Data v/s Big Data
      Attribute | Traditional Data | Big Data
      Volume | Gigabytes to terabytes | Petabytes to zettabytes
      Organization | Centralized | Distributed
      Structure | Structured | Semi-structured & unstructured
      Data model | Strict schema based | Flat schema
      Data relationship | Complex interrelationships | Almost flat with few relationships
  29. 29. Criteria of Big Data 1. 272 hours of video are uploaded to YouTube every minute, and over 3 billion hours of video are watched every month. 2. Radio Frequency ID (RFID) systems generate up to 1,000 times more data than conventional bar code systems. 3. 340 million tweets are sent every day, amounting to 7 TB of data. 4. The social networking site Facebook processes over 10 TB of data every day. 5. Over 5 billion people use cell phones to call, send SMS, email, browse the Internet, and interact via social networking sites. 6. The Square Kilometre Array (SKA), an international radio telescope project, is projected to receive 700 TB of data per second.
  30. 30. Challenges with Big Data 1. Scaling is costly. 2. A strategy must be in place before you hit the limit of a single computer. 3. Most enterprises responded to scalability needs only once they started facing poor response times and low throughput. 4. Adding hardware to an existing system is labor-intensive and hence error-prone. 5. Mixed data types, structured and unstructured, make scaling even harder.
  31. 31. Exploring Big Data for business insights
  33. 33. Big Data solutions with Hadoop
  34. 34. Organizations Adopted Big Data
  35. 35. How are Organizations using Big Data Technology?
  37. 37. Feb 14th 2011: Watson is IBM’s supercomputer built using Big Data technology. It is not online, and it processes information like a human brain.
  39. 39. Tools typically used in Big Data Scenarios
  40. 40. Technology to process Big Data: Hadoop (open-source software framework written in Java) • Open-source software: it is free to download, though more and more commercial versions of Hadoop are becoming available. • Framework: everything you need to develop and run software applications is provided (programs, connections, etc.). • Distributed storage: the Hadoop framework breaks big data into blocks, which are stored on clusters of commodity hardware. • Processing power: Hadoop concurrently processes large amounts of data using multiple low-cost computers for fast results. • Hadoop is a distributed file system (DFS), not a database. It is designed to hold information in many forms. • Open-source project started by Doug Cutting, an employee of Yahoo. Hadoop is the name of his son’s toy elephant. • Apache Software Foundation: Apache Hadoop.
  41. 41. Hadoop Creation History
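
The distributed-storage bullet can be illustrated with a toy model. This is not the HDFS API: the function name, the node names, and the round-robin placement are assumptions made for illustration. The 128 MB block size and the replication factor of 3 match common HDFS defaults (Hadoop 1.x releases defaulted to 64 MB blocks).

```python
import math

BLOCK_SIZE_MB = 128   # common HDFS default block size
REPLICATION = 3       # common HDFS default replication factor


def plan_blocks(file_size_mb, nodes, block_size_mb=BLOCK_SIZE_MB,
                replication=REPLICATION):
    """Split a file into blocks and assign each block to distinct nodes.

    Placement is plain round-robin, an illustrative stand-in for the
    real HDFS placement policy.
    """
    n_blocks = math.ceil(file_size_mb / block_size_mb)
    copies = min(replication, len(nodes))
    plan = {}
    for b in range(n_blocks):
        plan[b] = [nodes[(b + r) % len(nodes)] for r in range(copies)]
    return plan


nodes = ["node1", "node2", "node3", "node4"]  # hypothetical DataNodes
for block, holders in plan_blocks(300, nodes).items():
    print(block, holders)
```

A 300 MB file needs three blocks here (two full blocks and one partial), and every block lives on three different nodes, so losing any single node loses no data.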
  42. 42. Hadoop Architecture. Hadoop core has two major components (daemons): 1. HDFS: a. NameNode b. Secondary NameNode c. DataNode 2. MapReduce engine (distributed data processing framework): a. JobTracker b. TaskTracker
  43. 43. What components make up Hadoop? • Hadoop Common – the libraries and utilities used by other Hadoop modules. • Hadoop Distributed File System (HDFS) – the Java-based scalable system that stores data across multiple machines without prior organization. • MapReduce – a software programming model for processing large sets of data in parallel. • YARN – resource management framework for scheduling and handling resource requests from distributed applications. (YARN is an acronym for Yet Another Resource Negotiator.)
  44. 44. Hadoop Architecture (diagram: four slave nodes, each running a TaskTracker and a DataNode; one master node running a TaskTracker, a DataNode, the JobTracker, and the NameNode; layers labeled MapReduce and HDFS)
  45. 45. Hadoop Architecture (same master/slave diagram, repeated)
  46. 46. Hadoop Architecture (same master/slave diagram, repeated)
  47. 47. (diagram labels: Node, Rack ×4, Cluster, Data Center)
  49. 49. MapReduce Example
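
The content of the example slide did not survive as text. As a stand-in, here is the classic word-count job written as plain Python that mimics the map, shuffle, and reduce phases in a single process. It is a sketch of the programming model only, not Hadoop's API, and the function names are illustrative.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over a list of input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


lines = ["big data big thing", "hadoop processes big data"]
print(word_count(lines))
```

On a real cluster the map calls run in parallel on the nodes that hold the input blocks, and the framework performs the shuffle across the network before the reducers run.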
  50. 50. Benefits of Hadoop • Scalable: new nodes can be added without needing to change data formats. • Cost effective: Hadoop brings massively parallel computing to commodity hardware. • Flexible: Hadoop is schema-less and can absorb any type of data, structured or not, from any number of sources. • Fault tolerant: when you lose a node, the system redirects work to another location of the data and continues processing without missing a beat. • Programming languages: Java (default) or Python. • Last but not least, it is free (open source).
  51. 51. Hadoop is not Suitable for All Kinds of Applications. Hadoop is not suitable for: • real-time, stream-based processing, where data is processed immediately upon arrival. • online access where low latency is required.
  52. 52. Hadoop Eco-System
  53. 53. Real-Time Hadoop Use Cases 1. Risk modeling (How can banks understand customers and markets?) 2. Customer churn analysis (Why do companies really lose customers?) 3. Ad targeting (How can companies increase campaign efficiency?) 4. Point-of-sale transaction analysis (How do retailers target promotions guaranteed to make you buy?) 5. Search quality (What’s in your search?)
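
The fault-tolerance bullet can be sketched as follows: when a node fails, work on a block is rescheduled on another node that holds a replica of that block. The replica map and node names are hypothetical, and the real scheduler is considerably more involved.

```python
def pick_node(block, replicas, live_nodes):
    """Return the first live node that holds a replica of the block."""
    for node in replicas[block]:
        if node in live_nodes:
            return node
    raise RuntimeError(f"no live replica for block {block}")


# Hypothetical replica map: block id -> nodes holding a copy.
replicas = {
    "blk_0": ["node1", "node2", "node3"],
    "blk_1": ["node2", "node3", "node4"],
}

live = {"node1", "node2", "node3", "node4"}
print(pick_node("blk_1", replicas, live))   # preferred replica holder

live.discard("node2")                        # simulate node2 failing
print(pick_node("blk_1", replicas, live))   # work moves to another replica
```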
  56. 56. Hadoop Job Opportunities
  58. 58. Apache Hadoop & SAP HANA Integration (Future Generation Technologies)
  59. 59. In Real-Time Business
  60. 60. Resources
  61. 61. Summary • Cloud Computing • Big Data • Apache Hadoop • Hadoop and SAP HANA integration
  62. 62. More Details: Nagarjuna D N, Cloud Solutions Architect. Skills: • Amazon Cloud (Amazon Web Services) • MongoDB (NoSQL database) • Play Framework (web application framework) • Domain / SSL certificate setup • Apache Hadoop, Apache Pig, Apache Hive
  63. 63. Your Valuable Feedback, Please • Please be sure to tell me where I must improve!