
JCConf 2016 - Cloud Computing Applications - Hazelcast, Spark and Ignite

This session shows how to build applications that run on distributed, scalable systems, better known as cloud computing systems. We introduce Hazelcast, from a brief overview down to its kernel, and show how it works with Spark, the best-known map-reduce library. We then introduce another in-memory platform, Apache Ignite, and compare it with Hazelcast to see how they differ. In the end, we give a demonstration showing how Hazelcast and Spark work together to form a cloud-based service that is distributed, flexible, reliable, available, scalable and stable. You can find demo code here: https://github.com/CyberJos/jcconf2016-hazelcast-spark


  1. 1. Cloud Computing Applications Hazelcast, Spark and Ignite Joseph S. Kuo a.k.a. CyberJos
  2. 2. About Me .Played with a pile of languages and architectures while studying mathematics in college .22 years of programming experience, 17 years with Java .Has worked as an IT instructor, and at a cloud gaming platform company, a global e-commerce company, a well-known information security company, and a social trend analysis company .Hopes to keep writing code and playing with technology for a lifetime
  3. 3. Agenda .Briefing of Hazelcast .More about Hazelcast .Spark Introduction .Hazelcast and Spark .About Apache Ignite .Things between Ignite and Hazelcast
  4. 4. Briefing of Hazelcast
  5. 5. What is Hazelcast? Hazelcast is an in-memory data grid which distributes data evenly among the nodes of a computing cluster, and pools the available processing power and storage space to provide services. It also tolerates failures and the loss of nodes.
  6. 6. Features .Distributed Caching: Queue, Set, List, Map, MultiMap, Lock, Topic, AtomicReference, AtomicLong, IdGenerator, Ringbuffer, Semaphores .Distributed Compute: Entry Processor, Executor Service, User Defined Services .Distributed Query: Query, Aggregators, Listener with Predicate, MapReduce
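The distributed structures listed above are exposed through the same interfaces as their java.util counterparts. A minimal sketch, assuming Hazelcast 3.7.x on the classpath; it starts an embedded member, and the structure names ("tasks", "hits") and helper methods are our own illustration, not from the deck:

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.IAtomicLong;
import com.hazelcast.core.IQueue;

public class DistributedStructuresDemo {

    // Offer an item to a cluster-wide queue and poll it back.
    static String offerAndPoll(String item) {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        try {
            IQueue<String> queue = hz.getQueue("tasks"); // same queue on every member
            queue.offer(item);
            return queue.poll();
        } finally {
            hz.shutdown();
        }
    }

    // Increment a cluster-wide counter and return its value.
    static long incrementCounter() {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        try {
            IAtomicLong counter = hz.getAtomicLong("hits");
            return counter.incrementAndGet();
        } finally {
            hz.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(offerAndPoll("task-1"));
        System.out.println(incrementCounter());
    }
}
```

Any member (or client) that asks for the queue or counter by name gets the same cluster-wide instance.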
  7. 7. Features (Cont.) .Integrated Clustering: Hibernate 2nd Level Cache, Grails 3, JCA Resource Adapter .Standards: JCache, Apache jclouds .Cloud and Virtualization Support: Docker, AWS, Azure, Discovery Service Provider Interface, Kubernetes, ZooKeeper Discovery .Client-Server Protocols: Memcache, Open Binary Client Protocol, REST
  8. 8. Use Cases .In-Memory Data Grid .Caching .In-Memory NoSQL .Messaging .Application Scaling .Clustering
  9. 9. In-Memory Data Grid .Scale-out Computing: shared CPU power .Resilience: failure & data loss/performance .Programming Model: easily code clusters .Fast, Big Data: handle large sets in RAM .Dynamic Scalability: join/leave a cluster .Elastic Main Memory: memory pool
  10. 10. Caching .Elastic Memcached: Hazelcast has been used as an alternative to Memcached. .Hibernate 2nd Level Cache: It organizes caching into 1st and 2nd level caches. .Spring Cache: It supports Spring Cache which allows it to plug in to any Spring application.
  11. 11. In-Memory NoSQL .Scalability: size of RAM vs DISK By joining nodes in a cluster, we can pool RAM to store maps, and the CPU and RAM resources become available to the network. .Volatility: volatility of RAM vs Disk It uses P2P data distribution so there is no single point of failure. By default, each entry is stored in two locations in the cluster.
  12. 12. In-Memory NoSQL (Cont.) .Rebalancing It automatically rebalances data if a node crashes. Shuffling data has a negative effect as it consumes network, CPU and RAM. .Going Native The High-Density Memory Store can avoid GC pauses. It uses NIO DirectByteBuffers and does not require any defragmentation.
  13. 13. Messaging Hazelcast provides Topic as a distribution mechanism for publishing messages that are delivered to multiple subscribers. Publishing and subscribing are cluster-wide. Messages are ordered, that is, listeners process the messages in the order they were actually published.
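A minimal publish/subscribe sketch, assuming Hazelcast 3.7.x on the classpath. The topic name "news" and the helper method are our own, and a blocking queue stands in for real subscriber logic:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.ITopic;

public class TopicDemo {

    // Publish one message and wait for the listener to deliver it.
    static String publishAndReceive(String message) throws InterruptedException {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        try {
            ITopic<String> topic = hz.getTopic("news");
            BlockingQueue<String> inbox = new ArrayBlockingQueue<>(1);
            // Every subscriber, on any member, receives messages in publish order.
            topic.addMessageListener(m -> inbox.offer(m.getMessageObject()));
            topic.publish(message);
            return inbox.poll(10, TimeUnit.SECONDS);
        } finally {
            hz.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(publishAndReceive("hello subscribers"));
    }
}
```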
  14. 14. Application Scaling .Elastic Scalability: new servers join a cluster automatically .Super Speeds: memory transaction speed .High Availability: can deploy in backup pairs or even WAN replicated .Fault Tolerance: no single point of failure .Cloud Readiness: deploy right into EC2
  15. 15. Clustering Hazelcast is easily able to handle Session Clustering with in-memory performance, linear scalability as you add new nodes and reliability. This is a great way to ensure that session information is maintained when you are clustering web servers. You can also use a similar pattern for managing user identities.
  16. 16. Dependency .Maven <dependency> <groupId>com.hazelcast</groupId> <artifactId>hazelcast</artifactId> <version>3.7.2</version> </dependency> .Gradle dependencies { compile 'com.hazelcast:hazelcast:3.7.2' }
  17. 17. More about Hazelcast
  18. 18. What’s New in Hazelcast 3.4 .High-Density Memory Store .Hazelcast Configuration Import .Back Pressure
  19. 19. What’s New in Hazelcast 3.5 .Async Back Pressure .Client Configuration Import .Cluster Quorum .Hazelcast Client Protocol .Listener for Lost Partitions .Increased Visibility of Slow Operations .Sub-Listener Interfaces for Map Listener
  20. 20. What’s New in Hazelcast 3.6 .High-Density Memory Store for Map .Discovery SPI .Client Protocol & Version Compatibility .Support for cloud providers by jclouds® .Hot Restart Persistence .Lite Members .Lots of Features for Hazelcast JCache .Hazelcast Docker image
  21. 21. What’s New in Hazelcast 3.7 .Custom Eviction Policies .Discovery SPI for Azure .Hazelcast CLI with Scripting .OpenShift and CloudFoundry Plugin .Apache Spark Connector .Alignment of WAN Replication Clusters .Fault Tolerant Executor Service
  22. 22. Sample Code public class GetStartedMain { public static void main(final String[] args) { Config cfg = new Config(); HazelcastInstance instance = Hazelcast.newHazelcastInstance(cfg); Map<Long, String> map = instance.getMap("test"); map.put(1L, "Demo"); System.out.println(map.get(1L)); } }
  23. 23. Sharding – 4 nodes
  24. 24. How Is Data Partitioned? Data entries are distributed into partitions by a hashing algorithm applied to the key or name: .the key or name is serialized (converted into a byte array), .this byte array is hashed, and .the result of the hash is taken modulo the number of partitions.
  25. 25. Partition ID The result of this modulo - MOD(hash result, partition count) - is the partition in which the data will be stored; that is the partition ID. For all members in your cluster, the partition ID for a given key is always the same.
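The hash-and-mod step can be sketched in plain Java. Hazelcast actually applies MurmurHash3 to the serialized key and uses 271 partitions by default; the hash function below is only a stand-in to illustrate the modulo step, not Hazelcast's real implementation:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PartitionIdDemo {
    static final int PARTITION_COUNT = 271; // Hazelcast's default partition count

    // Stand-in for Hazelcast's pipeline: serialize the key, hash the bytes,
    // then take the hash modulo the partition count.
    static int partitionId(String key) {
        byte[] serialized = key.getBytes(StandardCharsets.UTF_8); // "serialize"
        int hash = Arrays.hashCode(serialized);                   // stand-in hash
        return Math.floorMod(hash, PARTITION_COUNT);              // non-negative mod
    }

    public static void main(String[] args) {
        // Deterministic: the same key maps to the same partition on every member.
        System.out.println(partitionId("test"));
        System.out.println(partitionId("test") == partitionId("test"));
    }
}
```

Because the mapping depends only on the key bytes and the partition count, every member computes the same partition ID without coordination.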
  26. 26. Partition Table When we start a member, a partition table is created within it. This table stores the partition IDs and the cluster members to which they belong. The purpose of this table is to make all members (including lite members) in the cluster aware of this information, ensuring that each member knows where the data is.
  27. 27. Partition Table (Cont.) The oldest member in the cluster (the one that started first) periodically sends the partition table to all members. In this way each member in the cluster is informed about any changes to partition ownership. The ownerships may be changed when a new member joins the cluster, or when a member leaves the cluster.
  28. 28. Repartitioning Repartitioning is the process of redistributing partition ownerships: .When a member joins the cluster. .When a member leaves the cluster. In these cases, the partition table in the oldest member is updated with the new partition ownerships.
  29. 29. Topology - Embedded
  30. 30. Topology - Client/Server
  31. 31. Spark Introduction
  32. 32. What is Spark? .Spark is a fast and general-purpose cluster computing system. It provides high-level APIs and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools. .It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
  33. 33. Advantages .Speed Run programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. .Ease of Use Write applications quickly. Spark offers over 80 high-level operators to build parallel applications.
  34. 34. Advantages (Cont.) .Generality Combine SQL, streaming and complex analytics libraries seamlessly in the same application. .Run Everywhere Support multiple cluster management and distributed storage system.
  35. 35. Features .Resilient distributed dataset (RDD) .Fault Tolerant .Map-reduce cluster computing .Built-in libraries .Languages: Java, Scala, Python and R .Interactive shell (Python, Scala, R) and web-based UI
  36. 36. RDD A resilient distributed dataset is a read-only collection of elements partitioned across the nodes of the cluster that can be operated on in parallel. It can stay in memory and fall back to disk gracefully. An RDD kept in memory (cached) can be reused efficiently across parallel operations. Finally, an RDD automatically recovers from node failures.
  37. 37. RDD Operations Two types of operations can be performed on an RDD: .transformations, like map and filter, which result in another RDD .actions, like count, which result in an output
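The two operation types can be seen in a local-mode sketch, assuming spark-core on the classpath; the class and method names are our own. Transformations are lazy: nothing executes until the count action runs.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RddOpsDemo {

    // Transformations build new RDDs lazily; the count action triggers execution.
    static long countBig() {
        SparkConf conf = new SparkConf().setAppName("rdd-ops").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));
            JavaRDD<Integer> doubled = numbers.map(n -> n * 2); // transformation
            JavaRDD<Integer> big = doubled.filter(n -> n > 4);  // transformation
            return big.count();                                 // action: 6, 8, 10 -> 3
        }
    }

    public static void main(String[] args) {
        System.out.println(countBig());
    }
}
```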
  38. 38. RDD Operations (Cont.)
  39. 39. RDD Fault Recovery
  40. 40. Directed Acyclic Graph
  41. 41. Cluster Topology
  42. 42. Dependency .Maven <dependency> <groupId>org.apache.spark</groupId> <artifactId>spark-core_2.11</artifactId> <version>2.0.0</version> </dependency> .Gradle dependencies { compile 'org.apache.spark:spark-core_2.11:2.0.0' }
  43. 43. Spark Node with Docker .Pull the image (Spark 2.0) docker pull maguowei/spark .Launch a Spark node docker run -it -p 4040:4040 maguowei/spark pyspark docker run -it -p 4040:4040 maguowei/spark spark-shell .Monitoring http://localhost:4040/
  44. 44. Spark Cluster with Docker .Launch master image (driver program) docker run -it -h sandbox1 -p 7077:7077 -p 8080:8080 maguowei/spark bash .Append text to /etc/hosts 172.17.0.2 sandbox1 172.17.0.3 sandbox2 .Launch the master node /opt/spark-2.0.0-bin-hadoop2.7/sbin/start-master.sh .Monitoring http://localhost:8080/
  45. 45. Spark Cluster with Docker (Cont.) .Launch worker images docker run -it -h sandbox2 maguowei/spark bash .Append text to /etc/hosts 172.17.0.2 sandbox1 172.17.0.3 sandbox2 .Launch a worker node /opt/spark-2.0.0-bin-hadoop2.7/sbin/start-slave.sh spark://sandbox1:7077 .Run tasks docker exec <CONTAINER_ID> run-example <class> <arg>
  46. 46. same version for all places same version for all places same version for all places
  47. 47. It is very important, so we say it three times
  48. 48. Hazelcast and Spark
  49. 49. What is this Connector? A plug-in which allows Hazelcast maps and caches to be used as shared RDDs by Spark through the Spark RDD API.
  50. 50. What is this Connector? [Diagram: clients talk to Hazelcast (MapReduce) and Spark (MapReduce), bridged by the Hazelcast Spark Connector]
  51. 51. Features .Read/Write support for Hazelcast Maps .Read/Write support for Hazelcast Caches
  52. 52. Requirements .Hazelcast 3.7.x .Apache Spark 1.6.1
  53. 53. Dependency .Maven <dependency> <groupId>com.hazelcast</groupId> <artifactId>hazelcast-spark</artifactId> <version>0.1</version> </dependency> .Gradle dependencies { compile 'com.hazelcast:hazelcast-spark:0.1' }
  54. 54. Properties The options for the SparkConf object .hazelcast.server.addresses: 127.0.0.1:5701 (Comma separated list) .hazelcast.server.groupName: dev .hazelcast.server.groupPass: dev-pass .hazelcast.spark.valueBatchingEnabled: true .hazelcast.spark.readBatchSize: 1000 .hazelcast.spark.writeBatchSize: 1000 .hazelcast.spark.clientXmlPath
  55. 55. Creating the SparkContext SparkConf conf = new SparkConf() .set("hazelcast.server.addresses", "127.0.0.1:5701") .set("hazelcast.server.groupName", "dev") .set("hazelcast.server.groupPass", "dev-pass") .set("hazelcast.spark.valueBatchingEnabled", "true") .set("hazelcast.spark.readBatchSize", "5000") .set("hazelcast.spark.writeBatchSize", "5000"); JavaSparkContext jsc = new JavaSparkContext("spark://127.0.0.1:7077", "appname", conf); // provide Hazelcast functions to the Spark Context. HazelcastSparkContext hsc = new HazelcastSparkContext(jsc);
  56. 56. Read Data from Hazelcast // read HazelcastJavaRDD rddFromMap = hsc.fromHazelcastMap("map-name-to-be-loaded"); HazelcastJavaRDD rddFromCache = hsc.fromHazelcastCache("cache-name-to-be-loaded");
  57. 57. Write Data to Hazelcast import static com.hazelcast.spark.connector. HazelcastJavaPairRDDFunctions.javaPairRddFunctions; JavaPairRDD<Object, Long> rdd = hsc.parallelize(new ArrayList<Object>() {{ add(1); add(2); add(3); }}).zipWithIndex(); // write javaPairRddFunctions(rdd).saveToHazelcastMap(name); javaPairRddFunctions(rdd).saveToHazelcastCache(name);
  58. 58. About Apache Ignite
  59. 59. What is Ignite? Apache Ignite In-Memory Data Fabric is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real time, orders of magnitude faster than is possible with traditional disk-based or flash technologies.
  60. 60. Features .Data Grid .Compute Grid .Streaming and CEP .Data Structures .Messaging and Events .Service Grid
  61. 61. Data Grid
  62. 62. Data Grid .Distributed Caching: Key-Value Store, Partitioning & Replication, Client-Side Cache .Cluster Resiliency: Self-Healing Cluster .Memory Formats: On-heap, Off-heap, Tiered Storage .Marshalling: Binary Protocol .Distributed Transactions and Locks: ACID, Deadlock-free, Cross-partition, Locks
  63. 63. Data Grid (Cont.) .Distributed Query: SQL Queries, Joins, Continuous Queries, Indexing, Consistency, Fault-Tolerance .Persistence: Write-Through, Read-Through, Write-Behind Caching, Automatic Persistence .Standards: JCache, SQL, JDBC, OSGi .Integrations: DB, Hibernate L2 Cache, Session Clustering, Spring Caching
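The key-value store can be exercised with a minimal sketch, assuming ignite-core 1.7.0 on the classpath. It starts an embedded node with the default configuration; the cache name "demo" and the helper method are our own:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.Ignition;

public class IgniteCacheDemo {

    // Put a value into a distributed cache and read it back.
    static String roundTrip(String value) {
        // Ignition.start() boots an embedded node; Ignite is AutoCloseable.
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Long, String> cache = ignite.getOrCreateCache("demo");
            cache.put(1L, value);
            return cache.get(1L);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("Hello Ignite"));
    }
}
```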
  64. 64. Compute Grid
  65. 65. Compute Grid .Distributed Closure Execution .Clustered Executor Service .MapReduce and ForkJoin .Load Balancing .Fault-Tolerance .Job Scheduling .Checkpointing
  66. 66. Streaming and CEP Ignite streaming allows you to process continuous, never-ending streams of data in a scalable and fault-tolerant fashion. The rate at which data can be injected into Ignite can be very high and can easily exceed millions of events per second on a moderately sized cluster.
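High-rate ingestion is typically done with IgniteDataStreamer, which batches puts and balances them across the cluster. A sketch, assuming ignite-core on the classpath; the cache name "events" and entry count are our own illustration:

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.IgniteCache;
import org.apache.ignite.IgniteDataStreamer;
import org.apache.ignite.Ignition;

public class StreamerDemo {

    // Stream `count` entries into a cache and return the resulting cache size.
    static int streamEntries(int count) {
        try (Ignite ignite = Ignition.start()) {
            IgniteCache<Integer, String> cache = ignite.getOrCreateCache("events");
            // The streamer batches addData() calls for high-throughput loading.
            try (IgniteDataStreamer<Integer, String> streamer = ignite.dataStreamer("events")) {
                for (int i = 0; i < count; i++) {
                    streamer.addData(i, "event-" + i);
                }
            } // close() flushes any remaining batches
            return cache.size();
        }
    }

    public static void main(String[] args) {
        System.out.println(streamEntries(1000));
    }
}
```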
  67. 67. Streaming and CEP
  68. 68. Data Structures .Queue and Set .Atomic Types .CountDownLatch .IdGenerator .Semaphore
  69. 69. Messaging and Events .Topic Based Messaging .Point-to-Point Messaging .Event Notifications .Automatic Batching
  70. 70. Service Grid
  71. 71. Dependency .Maven <dependency> <groupId>org.apache.ignite</groupId> <artifactId>ignite-core</artifactId> <version>1.7.0</version> </dependency> .Gradle dependencies { compile 'org.apache.ignite:ignite-core:1.7.0' }
  72. 72. Things between Ignite & Hazelcast
  73. 73. Benchmark Fight .GridGain posted: GridGain vs Hazelcast Benchmarks .It was also posted to Hazelcast Forum .Hazelcast CEO removed that post .Hazelcast fought back and claimed that GridGain cheated .GridGain re-tested and clarified
  74. 74. Difference
      Feature             Ignite          Hazelcast
      Off-heap Memory     Configurable    Enterprise
      Off-heap Indexing   Yes             No
      Continuous Query    Yes             Enterprise
      SSL Encryption      Yes             Enterprise
      SQL Query           Full ANSI 99    Limited
      Join Query          Yes             No
      Data Consistency    Yes             Partial
  75. 75. Difference (Cont.)
      Feature             Ignite                             Hazelcast
      Deadlock-free       Yes                                No
      Computing Grid      MapReduce, ForkJoin, LoadBalance   MapReduce
      Streaming/CEP       Yes                                No
      Service Grid        Yes                                No
      Language            .Net/C#/C++/Node.js                .Net/C#/C++
      Data Structures     Less                               More
      Plug-in             Less                               More
  76. 76. It doesn’t matter which you select
  77. 77. How you use it does matter
  78. 78. References .Hazelcast: http://hazelcast.org/ .Hazelcast Doc: http://hazelcast.org/documentation/ .Spark: http://spark.apache.org/ .Hazelcast Spark Connector: https://github.com/hazelcast/hazelcast-spark .Apache Ignite: https://ignite.apache.org/ .Sample Code: https://github.com/CyberJos/jcconf2016- hazelcast-spark
  79. 79. Thank You!!
