Overview of big data in cloud computing

Big data in cloud computing

Published in: Technology

  1. BigData in Cloud computing. Viet-Trung Tran @Vietstack, Sunday 1 February 15
  2. Bio: Viet-Trung Tran, trungtv@soict.hust.edu.vn, https://www.facebook.com/groups/BigDataStartUp/ SoICT; Trendiction S.A Luxembourg; Microsoft Research Cambridge; INRIA France; BKAV
  10. Google Trends chart (labels: Google MapReduce paper, 2014)
  11. BigData in science
  13. The Data Science: The 4th Paradigm for Scientific Discovery. Thousand years ago: description of natural phenomena. Last few hundred years: Newton’s laws, Maxwell’s equations… Last few decades: simulation of complex phenomena. Today and the Future. Equation shown on the slide: $\left(\frac{\dot{a}}{a}\right)^2 = \frac{4\pi G\rho}{3} - K\frac{c^2}{a^2}$. Credits: Dennis Gannon
  14. What’s BigData? Data has always been big. What differs now, compared with the past, is the sheer scale and accessibility of data, a direct result of the speed at which data can now be processed. Big Data is therefore an all-encompassing term for any collection of large data sets that were once difficult to process. Big data requires exceptional technologies to efficiently process large quantities of data within tolerable elapsed times.
  15. Data mining -> BigData mining?
  16. Simplified BigData stack: Data analytics & visualization; Data processing frameworks (Streaming, MapReduce, BSP model); Data management systems; BlobSeer
  17. BigData management
  18. NoSQL
  19. The last 25 years of commercial DBMS development can be summed up in a single phrase: "one size fits all". This phrase refers to the fact that the traditional DBMS architecture (originally designed and optimized for business data processing) has been used to support many data-centric applications with widely varying characteristics and requirements. In this paper, we argue that this concept is no longer applicable to the database market, and that the commercial world will fracture into a collection of independent database engines, some of which may be unified by a common front-end.
  21. Why NoSQL? “The whole point of seeking alternatives [to RDBMS systems] is that you need to solve a problem that relational databases are a bad fit for.” (Eric Evans, Rackspace). ACID does not scale. Web applications have different needs: scalability; elasticity; flexible schema/semi-structured data; geographically distributed. Web applications do not always need: transactions; strong consistency; complex queries.
  24. Big Data processing engines: MapReduce
  26. Stream processing
  27. Large scale graph processing
  28. 2012
  29. 2014
  30. Vanilla Hadoop ecosystem
  31. Hortonworks Data Platform
  33. Hadoop ecosystem: Microsoft HDInsight
  34. BigData & Cloud: a match made in heaven?
  37. Cloud features
  38. Data in the Clouds. IDC estimates that by 2020 about 40% of data globally will be touched by cloud computing. Cloud adoption is accelerating: the amount of data stored in Amazon Web Services (AWS) S3 cloud storage jumped from 262 billion objects in 2010 to over 1 trillion objects by mid-2012.
  39. While enterprises often keep their most sensitive data in-house, huge volumes of data such as social media data may be located externally. Data that is too big to process is also too big to transfer anywhere, so it is the analytical program that needs to be moved, not the data. "You don't want to be shipping terabytes and petabytes around." "Keep the data where it is, and then you move the analytics … to that data."
  40. Cloud enables BigData. Some of the first adopters of big data in cloud computing were users who deployed Hadoop clusters in highly scalable and elastic clouds: IBM, Azure, AWS. Cloud computing democratizes big data: any enterprise can now work with unstructured data at a huge scale. Analytics-as-a-service (AaaS) models for cloud-based big data analytics.
  41. Drivers for big data on cloud adoption. Cost reduction: managing cloud-based big data is cost-effective, scalable, and fast to build. Rapid provisioning/time to market: faster provisioning is important for big data applications because the value of data decreases quickly over time. Flexibility/scalability: big data analysis, especially in the life sciences industry, requires huge compute power for a brief amount of time. For this type of analysis, servers need to be provisioned in minutes.
  44. BigData is not always Cloud-appropriate. Low latency realtime data: virtualization overhead; multi-tenancy overhead. Scalability: lack of cloud computing features to support RDBMS. Availability: “Rain cloud” incorporates clouds. Data integrity/privacy: data can only be accessed by authorized users; currently, encryption is utilized by most researchers to ensure data privacy in the cloud.
  45. NoSQL vs SQL in the Cloud
  46. Data security/performance trade-offs: Distributed nodes; Distributed data; Internode communication (RPC over TCP/IP? Encrypted IO?); Security/performance trade-offs
  47. Cloud Architecture for Big Data; Resource scheduling and SLA for Big Data on Cloud; Storage and computation management in Cloud for Big Data; Large-scale data intensive workflow in support of Big Data processing on Cloud; Multiple source data processing and integration on Cloud; Virtualisation and visualisation of Big Data on Cloud; Fault tolerance and reliability for Big Data processing on Cloud; MapReduce with Cloud for Big Data processing; Distributed file storage system with Cloud for Big Data; Inter-cloud technology for Big Data; Security, privacy and trust in Big Data processing on Cloud; Green, energy-efficient models and sustainability issues in Cloud for Big Data processing; Cloud infrastructure for social networking with Big Data; User friendly Cloud access for Big Data processing; Innovative Cloud data centre networking for Big Data; Wireless and mobility support in Cloud data centre for Big Data
  48. BigData use cases
  49. Security Analytics
  52. Thank you for your attention
  54. 8 big trends in big data analytics: http://www.computerworld.com/article/2690856/8-big-trends-in-big-data-analytics.html
  55. Reference: http://www.oracle.com/us/corporate/profit/big-ideas/012314-spasalapudi-2112687.html ; https://gigaom.com/2014/10/15/cloud-computing-is-going-to-absorb-your-big-data-workloads-too/
  56. Classification of BigData
  57. Relationship between Cloud and BigData
  60. Open research issues: Data staging; Distributed storage systems (NoSQL, NewSQL); Data analysis; Data security
  61. In theory, Unfortunately, it’s not all good news. DB administrators don’t have an easy ride. The NoSQL databases that have appeared in the last few years, with their key-value pairs, document stores, and missing schemas,
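
Slide 39's advice to keep the data where it is and move the analytics to it, together with the MapReduce engine named on slide 24, can be illustrated with a minimal Python sketch. This is not code from the deck: the partition contents, node names, and the helper functions `local_word_count`, `merge_counts`, and `run_job` are all hypothetical, and the "nodes" are simulated in a single process. The point it tries to show is that each node computes a small summary next to its own data, and only those summaries are merged centrally.

```python
from collections import Counter
from functools import reduce

# Hypothetical data partitions, one per storage node (illustrative values only).
PARTITIONS = {
    "node-a": ["big data in the cloud", "cloud storage"],
    "node-b": ["big data processing", "data in the cloud"],
}


def local_word_count(lines):
    """Runs where the data lives: split each line into words and count locally."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def merge_counts(left, right):
    """Reduce step: merge two partial summaries on the coordinator."""
    return left + right


def run_job(partitions):
    # Only the small per-node summaries would cross the network,
    # not the raw lines held by each node.
    partials = [local_word_count(lines) for lines in partitions.values()]
    return reduce(merge_counts, partials, Counter())


if __name__ == "__main__":
    totals = run_job(PARTITIONS)
    for word, count in sorted(totals.items()):
        print(word, count)
```

In a real deployment the per-node function would be shipped to the machines that hold each partition (for example by a framework such as Hadoop); here the loop over `PARTITIONS` merely stands in for that step.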
