Big Data Building Blocks with AWS Cloud



The presentation explains how cloud computing is Big Data's best friend and how AWS Cloud components fit in to complete your Big Data life cycle.


- How fast is Big Data actually growing?
- How cloud has the potential to become Big Data's best friend
- A tour of the Big Data life cycle
- How AWS Cloud components fit into this life cycle
- A case study of our log analytics tool, Cloudlytics, a Big Data implementation on AWS Cloud

Published in: Technology

Big Data Building Blocks with AWS Cloud

  1. Big Data Building Blocks with AWS: Kinesis, Redshift, DynamoDB, EMR. 2.5 quintillion bytes of data are generated every day! How do you tackle these Big Data challenges?
  2. Agenda: (1) Big Data is getting bigger and bigger! (2) Why is cloud Big Data’s best friend? (3) Figuring out the Big Data life cycle. (4) How AWS building blocks can help tame Big Data! (5) How Cloudlytics uses AWS Cloud for its Big Data. Cloud IT Better
  3. So what is Big Data? Simply put, Big Data is data that cannot be processed by current tools or technologies. Big Data is too big, too fast and too varied. (Image: high-resolution images from NASA, our place in the cosmos!)
  4. The 3 V’s that make Big Data difficult to tame! Volume: conventional databases process data in batches, and it could take days or weeks to process one batch of Big Data. Variety: data comes from social networks, sensors installed at store entrances, traffic lights, airplanes, car GPS and countless other sources! Velocity: Twitter generates 5 gigabytes of data per minute; Facebook generates 7 gigabytes of data per minute. 2.5 quintillion bytes of data are generated every day!
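
To put the per-minute rates on slide 4 on the same footing as the daily figure, the arithmetic can be sketched as below. This is only a unit conversion of the numbers quoted on the slide; it assumes decimal units (1 GB = 10^9 bytes) and the short-scale meaning of "quintillion" (10^18), neither of which the slide states.

```python
# Unit conversion of the figures quoted on slide 4.
# Assumptions (not stated on the slide): 1 GB = 10**9 bytes,
# and "quintillion" means 10**18 (short scale).
GB = 10**9
TB = 10**12
MINUTES_PER_DAY = 24 * 60


def bytes_per_day(gb_per_minute: int) -> int:
    """Convert a rate given in GB per minute into bytes per day."""
    return gb_per_minute * GB * MINUTES_PER_DAY


twitter_per_day = bytes_per_day(5)   # slide: 5 GB/min
facebook_per_day = bytes_per_day(7)  # slide: 7 GB/min
daily_total = 25 * 10**17            # slide: 2.5 quintillion bytes/day

print(f"Twitter:  {twitter_per_day / TB:.2f} TB/day")
print(f"Facebook: {facebook_per_day / TB:.2f} TB/day")
print(f"All data: {daily_total / 10**18:.1f} EB/day")
```
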
  5. Big Data is getting bigger and BIGGER! “It is estimated that Walmart collects more than 2.5 petabytes of data EVERY HOUR from its customer transactions.” “More data crosses the internet EVERY SECOND than was stored in the entire internet just 20 years ago.” “Zuckerberg noted that 1 billion pieces of content are shared via Facebook’s Open Graph DAILY!”
  6. Why is cloud Big Data’s best friend? With Big Data, we know we want to generate, store, analyze and share. But how does cloud come into the picture?
  7. Our IT resources are limited and precious! And cloud has the solution for this!
  8. Cloud has many advantages: elasticity, fast time to market, on demand, flexible, cost effective, pay per use, secure, resilient, no CapEx, remote access, scalable, pooled resources.
  9. Cloud optimizes your IT resources. Cloud makes sure that your precious IT resources are OPTIMIZED.
  10. Cloud makes it easy! Cloud makes Big Data easier to handle.
  11. Let us figure out the Big Data life cycle. To make the entire Big Data process more tangible, it is divided into 4 stages: Generation; Collection & Store; Analyze & Computation; Data Collaboration & Sharing.
  12. Generating the data. Structured data (employee records), semi-structured data (end-user logs), unstructured data (social user profile images). Examples: financial analysis, scientific simulations, bioinformatics research, data warehousing, web indexing, log file analysis, data mining, machine learning. Web-based APIs can be used to access this data and store it.
  13. Transferring your data to AWS Cloud. To transfer your data sets to the cloud you can use: AWS Import/Export (move large amounts of data into and out of AWS using portable storage devices for transport); AWS Storage Gateway (secure integration between on-premises IT and AWS’s storage infrastructure); AWS Direct Connect (establish a dedicated network connection from your premises to AWS).
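
The choice between shipping storage devices and sending data over a network link on slide 13 usually comes down to transfer time. A rough back-of-the-envelope sketch follows; the data size, link speeds and the 80% utilization factor are hypothetical values chosen for illustration, not figures from the slides or from AWS.

```python
# Rough estimate of how long a network upload would take.
# All inputs below are hypothetical; decimal units are assumed
# (1 TB = 10**12 bytes, 1 Mbps = 10**6 bits per second).
SECONDS_PER_DAY = 86_400


def transfer_days(size_tb: float, link_mbps: float, utilization: float = 0.8) -> float:
    """Days needed to move size_tb terabytes over a link_mbps link."""
    if size_tb < 0 or link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("size must be >= 0, link > 0, utilization in (0, 1]")
    total_bits = size_tb * 10**12 * 8
    effective_bits_per_second = link_mbps * 10**6 * utilization
    return total_bits / effective_bits_per_second / SECONDS_PER_DAY


slow_link = transfer_days(100, 100)    # 100 TB over 100 Mbps
fast_link = transfer_days(100, 1000)   # 100 TB over 1 Gbps

print(f"100 TB over 100 Mbps: {slow_link:.1f} days")
print(f"100 TB over 1 Gbps:   {fast_link:.1f} days")
```

When the estimate runs to weeks or months, physical transport or a dedicated connection becomes the more practical option.
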
  14. Collecting & storing data on AWS Cloud. Simple Storage Service (S3): write, read, and delete objects containing from 1 byte to 5 terabytes of data each. AWS Relational Database Service (RDS): a full-featured relational database giving you access to the capabilities of the MySQL, Oracle, SQL Server, or PostgreSQL database engines. AWS DynamoDB: a fast, fully managed NoSQL database service making it simple and cost-effective to store and retrieve any amount of data, and serve any level of request traffic.
  15. Data analysis on AWS Cloud. Once you’ve stored your content on the cloud, it is time to analyze it! So if you’re thinking of implementing a Hadoop infrastructure ……
  16. Data analysis on AWS Cloud. Setting up a Hadoop infrastructure is not that easy, but AWS has the answer!
  17. Data analysis on AWS Cloud: Amazon Elastic MapReduce (EMR)
      • A managed Hadoop distribution by Amazon Web Services using a customized Apache Hadoop framework.
      • Uses MapReduce, in which data processing tasks are mapped to a set of servers in a cluster for processing.
      • EMR integrates with AWS S3 (an alternative storage to HDFS) and EC2 (compute instances).
      • EMR allows you to tune the default Hadoop job flows to your custom needs.
      • The various how-tos of Hadoop architecture, such as adding, removing and configuring nodes, are taken care of by EMR.
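
The MapReduce model mentioned on slide 17 can be illustrated with a small single-process sketch. It is not EMR or Hadoop code: the three-field log format and the function names are assumptions made for the example, and a real cluster would run the map and reduce steps on many servers in parallel.

```python
# Single-process sketch of the map -> shuffle -> reduce pattern.
# The log format ("METHOD PATH STATUS") is a hypothetical example.
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Emit (status_code, 1) for each well-formed log line."""
    parts = line.split()
    if len(parts) >= 3:
        yield parts[2], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


LOG_LINES = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /login 200",
    "malformed",
]
status_counts = run_job(LOG_LINES)
print(status_counts)
```
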
  18. AWS Redshift for retrieval & collaboration. Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to analyze all your data using your existing business intelligence tools.
      • Amazon Redshift has a massively parallel processing (MPP) architecture, parallelizing and distributing SQL operations.
      • You can use AWS Redshift to store and retrieve processed data quickly and to generate custom reports.
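
The idea behind the MPP architecture on slide 18 is that an aggregate is computed as partial results on separate slices of the data and then combined. The sketch below shows that idea for an average using threads on one machine; it is an illustration of the general technique under that assumption, not Redshift's implementation, and the function names are hypothetical.

```python
# Sketch of a distributed aggregate: each worker computes a partial
# (sum, count) over its slice, and a coordinator combines the partials.
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def partition(values: Sequence[float], slices: int) -> List[Sequence[float]]:
    """Distribute rows round-robin across the given number of slices."""
    return [values[i::slices] for i in range(slices)]


def partial_sum_count(rows: Sequence[float]) -> Tuple[float, int]:
    """Work done independently on one slice."""
    return sum(rows), len(rows)


def parallel_avg(values: Sequence[float], slices: int = 4) -> float:
    """Equivalent of SELECT AVG(x), computed from per-slice partials."""
    if not values:
        raise ValueError("cannot average an empty data set")
    with ThreadPoolExecutor(max_workers=slices) as pool:
        partials = list(pool.map(partial_sum_count, partition(values, slices)))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


average = parallel_avg([10, 20, 30, 40, 50, 60], slices=3)
print(average)
```

Combining (sum, count) pairs rather than per-slice averages keeps the result correct even when slices hold different numbers of rows.
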
  19. AWS Data Pipeline for automation. AWS Data Pipeline allows users to define a dependent chain of data sources and destinations, with optional data processing activities, called a pipeline. (Diagram: Input Node → Activity → Output Node.)
      • Can be implemented across all stages of the Big Data life cycle.
      • Tasks are scheduled to perform data movement and processing activities.
      • Failure and retry options are also available in Data Pipeline workflows.
      • Input and output data nodes support S3 buckets, DynamoDB, MySQL DB and SQL data sources.
      • Activities currently supported are Copy, EMR, Hive and Shell activities.
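
The input node, activity and output node chain on slide 19, together with the retry option, can be sketched in plain Python as follows. This is a toy model of the concept only: the `Activity` class, `run_pipeline` function and the sample activities are hypothetical names for this example and do not correspond to the AWS Data Pipeline API or its definition format.

```python
# Toy model of a pipeline: input node -> chained activities -> output node,
# with a per-activity retry limit. Names are hypothetical, not the AWS API.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Activity:
    name: str
    run: Callable[[List[str]], List[str]]
    max_retries: int = 2


def run_pipeline(input_node: List[str], activities: List[Activity],
                 output_node: List[str]) -> List[str]:
    """Feed each activity the previous activity's output, retrying failures."""
    data = list(input_node)
    for activity in activities:
        for attempt in range(activity.max_retries + 1):
            try:
                data = activity.run(data)
                break
            except RuntimeError:
                if attempt == activity.max_retries:
                    raise  # retries exhausted: fail the whole pipeline
    output_node.extend(data)
    return output_node


attempts: Dict[str, int] = {"count": 0}


def drop_empty(rows: List[str]) -> List[str]:
    return [row for row in rows if row]


def flaky_upper(rows: List[str]) -> List[str]:
    """Fails on its first call to simulate a transient error."""
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("transient failure")
    return [row.upper() for row in rows]


source = ["a", "", "b"]
sink: List[str] = []
run_pipeline(source, [Activity("clean", drop_empty),
                      Activity("upper", flaky_upper)], sink)
print(sink)
```
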
  20. AWS Kinesis (NEW). Amazon Kinesis is a fully managed service for real-time processing of streaming data at massive scale. Amazon Kinesis can collect and process hundreds of terabytes of data per hour from hundreds of thousands of sources.
      • Real-time processing allows you to answer questions about the current state of your data.
      • Amazon Kinesis automatically provisions and manages the storage required to reliably and durably collect your data stream.
      • You can add as many Kinesis streams as desired, based on the volume and variety of data.
      • Your Kinesis streams are connected to your Kinesis app, from which you can use DynamoDB or Redshift to process complex queries in real time.
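
Answering questions about "the current state of your data", as slide 20 puts it, typically means keeping a small running summary that is updated as each record arrives. A minimal sketch of that pattern is a sliding-window counter, shown below. It does not use the Kinesis API; the class name, the 60-second window and the sample records are assumptions for illustration, and records are assumed to arrive in timestamp order.

```python
# Sliding-window counter: per-source record counts over the last N seconds.
# Not Kinesis code; assumes records arrive ordered by timestamp.
from collections import Counter, deque
from typing import Deque, Dict, Tuple


class SlidingWindowCounter:
    def __init__(self, window_seconds: int) -> None:
        self.window = window_seconds
        self.events: Deque[Tuple[int, str]] = deque()  # (timestamp, source)
        self.counts: Counter = Counter()

    def add(self, timestamp: int, source: str) -> None:
        """Record one event and drop events that fell out of the window."""
        self.events.append((timestamp, source))
        self.counts[source] += 1
        self._expire(timestamp)

    def _expire(self, now: int) -> None:
        while self.events and self.events[0][0] <= now - self.window:
            _, old_source = self.events.popleft()
            self.counts[old_source] -= 1
            if self.counts[old_source] == 0:
                del self.counts[old_source]

    def current(self) -> Dict[str, int]:
        """Snapshot of counts for the current window."""
        return dict(self.counts)


counter = SlidingWindowCounter(window_seconds=60)
for ts, src in [(0, "web-1"), (10, "web-2"), (30, "web-1"), (65, "web-1")]:
    counter.add(ts, src)

state = counter.current()
print(state)
```
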
  21. The Big Data life cycle, compiled. Stages: Generation; Collection & Store; Analyze & Computation; Data Collaboration & Sharing. AWS components shown: AWS S3, AWS RDS, AWS DynamoDB, AWS Redshift, AWS EMR, AWS Data Pipeline. Component/description table: AWS S3 ……; AWS RDS ……; AWS DynamoDB ……; AWS Data Pipeline ……
  22. Use case: Cloudlytics. Cloudlytics is a pay-as-you-go, SaaS-based log analytics tool powered by AWS. It takes the Big Data approach using AWS components such as EMR and Redshift. Flow: customer log files stored in S3 → processing → processed data → customer reports.
  23. Check out our past webinars.
  24. Thank you.