Big data, Cloud Computing and No SQL


Cloud, Big Data and NoSQL are popular buzzwords today.
This presentation shows how they all fit together.
It makes sense of all of the above and shows how these new technologies can help a business become more productive.

Published in: Technology

  • Consistency: A read sees all previously completed writes. Availability: Reads and writes always succeed. Partition tolerance: Guaranteed properties are maintained even when network failures prevent some machines from communicating with others. The basic idea is that if a client writes to one side of a partition, any reads that go to the other side of that partition can't possibly know about the most recent write. Now you're faced with a choice: do you respond to the reads with potentially stale information, or do you wait (potentially forever) to hear from the other side of the partition and compromise availability?
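The choice described in the note above can be sketched with a toy two-replica store. This is a minimal illustration, not how any particular database works: the `Replica` class, the `partitioned` flag, and the `"AP"`/`"CP"` mode names are all hypothetical and introduced only for this example.

```python
class Replica:
    """One copy of a single value (hypothetical toy class)."""

    def __init__(self):
        self.value = None
        self.version = 0


class PartitionError(Exception):
    """Raised when a CP read cannot confirm it has the latest write."""


def write(replica, peer, value, partitioned):
    """Apply a write locally; copy it to the peer only if the link is up."""
    replica.version += 1
    replica.value = value
    if not partitioned:
        peer.value, peer.version = replica.value, replica.version


def read(replica, partitioned, mode):
    """mode="AP" answers from local state; mode="CP" refuses during a partition."""
    if partitioned and mode == "CP":
        raise PartitionError("cannot reach peer; refusing a possibly stale read")
    return replica.value


a, b = Replica(), Replica()
write(a, b, "v1", partitioned=False)  # both sides now hold "v1"
write(a, b, "v2", partitioned=True)   # only side a sees "v2"

# Availability first: side b answers, but with stale data.
stale = read(b, partitioned=True, mode="AP")

# Consistency first: side b refuses to answer until the partition heals.
try:
    read(b, partitioned=True, mode="CP")
    refused = False
except PartitionError:
    refused = True
```

Here `stale` holds the old value `"v1"` while side `a` already holds `"v2"`, and the CP read fails instead of returning it. A real system would retry or time out rather than raise immediately.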
  • Big data, Cloud Computing and No SQL

    1. 1. SELA DEVELOPER PRACTICE December 15-19, 2013 Manu Cohen-Yashar The Cloud, Big Data and NoSQL © Copyright SELA software & Education Labs Ltd. | 14-18 Baruch Hirsch St Bnei Brak, 51202 Israel |
    2. 2. Agenda What is the cloud Data boom No SQL Big Data Cloud Distributions What’s next
    3. 3. Make sense of: Cloud, Big Data and No SQL How they fit together Make money!!!
    4. 4. What is the cloud Cloud Computing is an Idea … Infrastructure is provisioned by a cloud provider. Automatic Scale. Elasticity. Pay as you use. Availability. Simple, Automatic, Economic.
    5. 5. Types of Clouds IAAS PAAS SAAS and more… Identity As A Service Connectivity As A Service Storage As A Service
    6. 6. Lots of Data Data doubles every 18 months Pictures Web site emails Sensors Geo Information Financial Information Science Art . . . (Infinite list)
    7. 7. No Limits With the cloud it is now possible to mount a cluster of any size and run any computation at any scale. The one who makes sense of all available data will rule the world. The conclusion: Use the cloud to analyze data at large scale.
    8. 8. Let's talk about data When we think of data we think of …
    9. 9. Data has many forms Yet data comes in many forms and shapes Graphs Time Series Documents Blobs Geo Sensors Structured Unstructured Web
    10. 10. Not Relational Not all types of data fit well into the relational world. Not all data use cases fit well into the ACID convention The relational model does not scale very well Difficult to distribute Difficult to replicate
    11. 11. The CAP Theorem During a network partition, a distributed system must choose either Consistency or Availability. Sharded NoSQL RDBMS Replicated NoSQL
    12. 12. NO SQL Large family of databases No Schema No relations enforced Designed for high scale and distribution Types of NO SQL DB Key Value Wide Columns Documents Graph
    13. 13. Motivation for NO SQL Large Scale and Distribution Simplicity Low cost Good fit with the data model Volume, Velocity and Variety
    14. 14. Important There is no one NO SQL solution for all use cases There are more than 150 possible offerings…
    15. 15. The Cloud and NO SQL All Cloud Providers have NO SQL solutions Azure Tables Google Big Table Amazon DynamoDB NO SQL Databases are deployed on a cluster There is a large number of cloud hosting offerings for NO SQL clusters MongoHQ (MongoDB) Cassandra on Google Compute Engine Many more
    16. 16. Example – Mongo in Azure
    17. 17. Big Data What is Big? “Big” cannot fit on a single machine. Conclusion: Big data has to be distributed.
    18. 18. Types of Big Data Processing Query General Analysis Classification Recommendation Clustering Auditing and monitoring More…
    19. 19. Challenges Develop a parallel algorithm Reduce the network traffic -> bring compute to data Monitor and manage a large number of parallel tasks Survive failures Performance Linear scale
    20. 20. Batch Processing VS Operational Intelligence Batch Processing Work on existing data Provide results within minutes Operational Intelligence Work on a stream of data Provide real-time results
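The contrast on the slide above can be sketched with a running average. This is only an illustration under simple assumptions: the function names and the sample readings are made up, and a real streaming engine would read from a live source rather than a list.

```python
def batch_average(readings):
    """Batch style: the data already exists, so compute once over all of it."""
    return sum(readings) / len(readings)


def streaming_average(stream):
    """Streaming style: emit an updated result after every incoming event."""
    count, total = 0, 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count


readings = [10, 20, 30, 40]

batch_result = batch_average(readings)                  # 25.0
live_results = list(streaming_average(iter(readings)))  # [10.0, 15.0, 20.0, 25.0]
```

The batch job gives one answer after all data is in; the streaming version keeps only a count and a total, so it can report a result per event without storing the stream.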
    21. 21. Distributed File System No one server can store Big Data files Distribute files across cluster Failure is part of the game Similar API to traditional File Systems Examples: HDFS GFS Cassandra FS Mongo FS
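The idea of distributing a file across a cluster while tolerating failure can be sketched as block splitting plus replication. The round-robin placement below is an assumption chosen for brevity; it is not the placement policy of HDFS or any other system named on the slide, and the node names are hypothetical.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut a file's bytes into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, nodes, replicas):
    """Round-robin placement: block i is stored on `replicas` consecutive nodes."""
    return {
        i: [nodes[(i + r) % len(nodes)] for r in range(replicas)]
        for i in range(num_blocks)
    }


def readable_after_failure(placement, failed_node):
    """A file is still readable if every block has a copy on a surviving node."""
    return all(
        any(node != failed_node for node in holders)
        for holders in placement.values()
    )


nodes = ["node-a", "node-b", "node-c"]  # hypothetical cluster
blocks = split_into_blocks(b"x" * 1000, block_size=256)

replicated = place_blocks(len(blocks), nodes, replicas=2)
unreplicated = place_blocks(len(blocks), nodes, replicas=1)

survives = readable_after_failure(replicated, "node-a")   # True
lost = not readable_after_failure(unreplicated, "node-a")  # True
```

With two copies of every block, losing one node leaves the file readable; with a single copy, the blocks that lived on the failed node are gone.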
    22. 22. Hadoop Big Data Analysis Platform Batch Processing Brings Compute tasks to data nodes Parallel Processing using Map-Reduce Open Source Huge ecosystem
    23. 23. Hadoop Ecosystem Writing a valuable Map-Reduce job for Hadoop is not simple Many open source projects provide abstractions Pig Hive HBase Sqoop Mahout ZooKeeper More
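The Map-Reduce model mentioned on the slide above can be sketched in plain Python with the classic word count. This runs in a single process and does not use Hadoop's API; the function names are illustrative, and in a real cluster the map and reduce calls would run in parallel on the nodes that hold the data.

```python
from collections import defaultdict


def map_phase(document):
    """Map: turn one input record into (key, value) pairs."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all values by key so each key reaches one reducer."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big cloud", "cloud data"])
# counts == {"big": 2, "data": 2, "cloud": 2}
```

Because each map call looks at only one document and each reduce call at only one key, both phases can be spread across many machines without coordination beyond the shuffle.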
    24. 24. Hadoop on the Cloud Hadoop runs on a cluster You can use a cluster as a service on major cloud offerings
    25. 25. Storm Real-Time big data analytics Process streams of data Can be used with any programming language Wide integration with data sources
    26. 26. Check your schema Be open to using NO SQL data stores Identify your use case and find the right database for you Create a simple POC
    27. 27. Look for Big Data Ask yourself: What can I gain from big data? How can the new data or analysis scope enhance your existing set of capabilities? What additional opportunities for intervention or process optimisation does it present? Identify your use case and find the right product and data model. Look for web distributions and create a simple POC
    28. 28. Questions