RRM 7012
  • 1. RRM 7012 Research Methodology Assignment: Research Proposal
    Title: Workload sharing in distributed environment with networked Java Virtual Machine (JVM)
    Low Kang Wei
  • 2. 1061608968
  • 3. Overview
    Since its introduction in the mid-1990s, Java has gained popularity and has had a great impact on IT-based sciences, engineering and commercial applications [1]. Java is designed to be simple, object-oriented and user-friendly, so that it can be easily learned, programmed and developed into applications. It is also designed for creating highly secure and robust software applications on various platforms in heterogeneous and distributed networks [1].
    To run a Java application on different hardware and software platforms, the Java source code is first compiled by the Java compiler into an architecture-neutral intermediate format, i.e. bytecodes [1]. The machine-independent bytecodes are then interpreted by the Java Virtual Machine (JVM) into machine code for execution. The JVM is a specification of an abstract machine for which a Java compiler can generate code. It has a stack-based architecture. Implementing the JVM on a specific hardware or software platform allows the same bytecodes to run on these different platforms [2].
  • 4. [Diagram 1: Java Virtual Machine. Labels: myfiles.java, compile, bytecodes, myfiles.class, Java API class files, class loader, execution engine, native method invocation, host operating system.]
    Java technology also supports concurrency through its multithreading mechanism, which allows parallel execution of instructions. Threads are provided by the Thread class in the language library. A thread is a section of code executed independently of the other threads of control within a single program. Multithreading therefore creates the potential for parallel programming in Java.
    Computer clustering is a method of grouping multiple computers through a fast local area network to form a collaborative supercomputer environment. Clusters take different forms for different purposes, such as high-availability clusters, load-balancing clusters, high-performance clusters and grid computing. An application running on such a group of computers behaves as if it were running on a single terminal, while its workload is shared among the clustered computers in the distributed environment.
    At present, the conventional JVM still lacks support for the cluster environment.
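To make the multithreading mechanism described above concrete, the following is a minimal sketch of a program that makes explicit use of Java threads on a single JVM. The class name `ThreadSumDemo`, its methods and the summation workload are illustrative assumptions and are not part of the proposal; only `java.lang.Thread` and its `start()` and `join()` methods come from the Java library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: splits a summation across several Java threads.
class ThreadSumDemo {

    // Sums the integers in [from, to) on the calling thread.
    static long sumRange(long from, long to) {
        long total = 0;
        for (long i = from; i < to; i++) {
            total += i;
        }
        return total;
    }

    // Splits [0, n) into `workers` chunks and sums each chunk on its own thread.
    static long parallelSum(long n, int workers) {
        long[] partial = new long[workers];
        List<Thread> threads = new ArrayList<>();
        long chunk = n / workers;
        for (int w = 0; w < workers; w++) {
            final int id = w;
            final long from = id * chunk;
            // The last worker also takes the remainder of the range.
            final long to = (id == workers - 1) ? n : from + chunk;
            Thread t = new Thread(() -> partial[id] = sumRange(from, to));
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join(); // wait for the worker; join() also makes its result visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        long total = 0;
        for (long p : partial) {
            total += p;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(1_000_000L, 4));
    }
}
```

Each worker thread in this sketch is an independent unit of work with little interaction with the others, which is the kind of thread a distribution platform could in principle place on another node.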
  • 5. A conventional JVM runs as a single instance on a local system, so its workload cannot be shared with other JVM instances within a cluster. This project intends to design a platform, using the Jikes Research Virtual Machine (JikesRVM) [3], that lets a single Java application run on multiple terminals within a cluster. The programmer makes explicit use of Java threads in the application, and the platform distributes these threads among the terminals without the programmer being aware of it. The workload can thus be distributed and shared among the terminals in the cluster to achieve higher performance. In addition, benchmarking suites and tools will be developed to measure the performance of the system.
    During the execution of the Java application, the threads are distributed and executed within the cluster through TCP connections. The application is therefore expected to achieve higher performance on the networked JVMs than on a single JVM.
  • 6. [Diagram 2: Basic model of the workload distribution. The Java class file runs on JikesRVM (main node); Java threads are distributed to JikesRVM (peer node) through a TCP connection and executed remotely.]
    The primary idea of this project is that the Java class file is run on a modified version of the JVM, in this case JikesRVM. The main node is the node where the multithreaded application is started. Before it starts distributing the workload, it creates sessions with the JVMs running on the peer nodes. These peer nodes wait for incoming requests from the main node. The main node and the peer nodes communicate through TCP connections. All setup of the communication channels, I/O redirection, etc. is done before the migration and execution of threads.
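The session model above (a peer node waiting for a request, a main node opening a TCP session and receiving the result) can be sketched with plain Java sockets. This is only an illustration under stated assumptions: the class `PeerSessionDemo`, its method names and the two-number wire format are hypothetical, both "nodes" run inside one JVM on the loopback interface, and the sketch ships a work description rather than migrating a live thread, which is what the proposed platform would do inside JikesRVM.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch: a peer node waits for a work request over TCP,
// executes it, and returns the result to the main node.
class PeerSessionDemo {

    // Peer side: accept one session, read a range [from, to), reply with its sum.
    static void servePeerOnce(ServerSocket server) throws IOException {
        try (Socket session = server.accept();
             DataInputStream in = new DataInputStream(session.getInputStream());
             DataOutputStream out = new DataOutputStream(session.getOutputStream())) {
            long from = in.readLong();
            long to = in.readLong();
            long total = 0;
            for (long i = from; i < to; i++) {
                total += i;
            }
            out.writeLong(total);
            out.flush();
        }
    }

    // Main side: open a session to the peer, send the work, wait for the result.
    static long runOnPeer(InetAddress host, int port, long from, long to) throws IOException {
        try (Socket session = new Socket(host, port);
             DataOutputStream out = new DataOutputStream(session.getOutputStream());
             DataInputStream in = new DataInputStream(session.getInputStream())) {
            out.writeLong(from);
            out.writeLong(to);
            out.flush();
            return in.readLong();
        }
    }

    // Starts a peer on a free loopback port and sends it one request.
    static long demo() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread peer = new Thread(() -> {
                try {
                    servePeerOnce(server);
                } catch (IOException e) {
                    throw new IllegalStateException(e);
                }
            });
            peer.start();
            long result = runOnPeer(loopback, server.getLocalPort(), 0, 1000);
            peer.join();
            return result;
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("result from peer: " + demo());
    }
}
```

In a real deployment the peer would loop over `accept()` and the session would be established before any work is shipped, as the setup order in the slide requires.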
  • 7. Justification of study
    With the ever-increasing popularity of the World Wide Web, high-performance facilities are shifting from supercomputers to networks of workstations. Networks of computers are usually deployed to achieve higher performance and availability than a single computer. They are also more cost-effective than a single computer of comparable speed and availability. Meanwhile, cluster computing has become the norm for serving high-workload commercial applications. Efficient workload balancing and thread migration are therefore expected to play an important role and to be widely adopted in distributed systems.
    In the past, the performance of the Java programming language was much worse than that of other programming languages such as C or C++. However, improvements in just-in-time (JIT) compilers have boosted the performance of Java programs and allow them to approach the performance of C and C++ programs. Java has also emerged as a solution to unite Web, cluster, multiprocessor and uniprocessor computing. The Java programming language is therefore now broadly used in high-performance distributed computing, especially for server applications.
    Java offers a wide variety of interfaces and extensions for parallel and distributed programming, which makes it a candidate language for High Performance Computing (HPC). Because of its platform independence, Java is suitable for developing highly secure and robust server applications that run within a cluster where each terminal may have a different hardware or software specification.
  • 8. A lightweight, transparent and efficient Java thread migration mechanism implemented at the JVM level can help to shorten the execution time of multithreaded Java applications in a cluster environment. It automatically exploits the parallelism in an application by distributing threads, objects and classes within the cluster. Moreover, server applications are mostly multithreaded, with each thread servicing a different client and with limited interaction between threads. Server applications are therefore expected to gain a large improvement in performance from an efficient thread migration mechanism in a cluster.
  • 9. Research Objectives
    1. To study and investigate existing workload distribution and thread migration mechanisms.
    2. To study the performance of the platform running multithreaded Java applications.
    3. To design a platform in which the Java Virtual Machines within a cluster collaborate by sharing the workload.
    Literature Review
    Prior studies show that several projects have addressed Java technology for cluster and high-performance computing, especially the design of a distributed or cluster-aware JVM. Java supports threads and provides concurrency constructs at the language level for thread-based parallel computing [1]. Because cluster computing is becoming important in high-performance computing, it is worth studying the possibility of extending the conventional JVM to execute tasks concurrently in a cluster, so that the execution of a single multithreaded Java program spans multiple machines. Several existing research projects try to implement Java in a distributed environment.
    Cluster Virtual Machine for Java (cJVM) [4]
    This project of the IBM Haifa Research Lab started in 1999. The main objective of cJVM is to provide a single system image (SSI) of a conventional JVM while running on a cluster [5]. Java applications can run on cJVM without any code modifications.
  • 10. cJVM maintains a distributed heap among the JVMs within the cluster. It uses the master-proxy model for object creation and the method shipping technique for transparent remote object access. Workload distribution within the cluster is carried out by means of remote thread creation. cJVM runs on a cluster of IBM IntelliStations running Windows NT, connected via a Myrinet switch. However, the project has not been active since 2000 [4]. cJVM achieves 80% higher efficiency while running on 4 cluster nodes by applying a large set of optimizations that address caching, locality of execution and object placement [6].
    Distributed JVM (dJVM) [7]
    This is a project of the Department of Computer Science of The Australian National University. Its objective is to provide a distributed JVM on a cluster that hides the nature of the cluster from the Java application, i.e. an SSI [8]. The project is based on JikesRVM, and its cluster, Bunyip, consists of 96 nodes with 192 processors running the Linux operating system [8]. This project is the first implementation of JikesRVM in a distributed environment.
    JESSICA [9]
    This project is run by the Department of Computer Science of The University of Hong Kong. It is a Java-based solution for integrating computing resources in a heterogeneous environment [9]. This implementation also aims to hide the distributed nature of the cluster from the application. Instead of using a distributed heap as cJVM does, it uses the concepts of a global thread space and a global object space, a subspace created through the support of a cluster-enabled infrastructure, i.e. Distributed Shared Memory (DSM) [10]. Workload distribution is realized through thread migration [13].
    Some important issues have to be addressed when designing either the workload distribution mechanism in the JVM or the architecture of the cluster-aware JVM [8][11][12].
  • 11. • Single System Image. Many studies address the importance of a single system image (SSI) in their implementations [8][10][11]. John Zigman et al. [8] and Yariv Aridor et al. [11] claim that their implementations hide the cluster from the application, i.e. the application sees a traditional virtual machine while the system itself is aware of the cluster. The implementation of M.J.M. Ma et al. [10] bridges the cluster and Java's multithreaded programming model by encapsulating system resources across the cluster in a single layer of abstraction. The user application running on this layer therefore sees the encapsulated resources as a single entity. All migration and distribution mechanisms work without the awareness of either the end user or the application.
    • Lightweight. In high-performance computing, overall performance is very sensitive to overheads. Any overhead generated at run time decreases the performance of the system. Runtime overheads in time and space for supporting thread migration should therefore be minimized [12]. Such overheads may arise from message passing, class loading, object migration, etc., and they should be considered seriously and minimized when designing the distributed JVM.
    • Transparent. The implementation of the mechanism in the distributed JVM should not introduce any special explicit migration call into Java threads [13]. The migration operation should be transparent to the Java threads. Transparent thread migration also makes a migrated thread appear the same as a traditional JVM thread to the other Java threads: they see the migrated thread as if it were running on the local system. Finally, transparency means that a migrated thread has no way to determine which node it is executing on [11].
  • 12. • Balancing. Threads must be distributed so that less loaded nodes are used and the workload is spread evenly across all machines in the cluster. Only by maintaining a balanced load among the nodes in the cluster can the system achieve the maximum gain in performance [12].
  • 13. Research Methodology
    In this research, the system will be deployed on IBM's Jikes Research Virtual Machine (RVM) [3]. The RVM is an open source project licensed under the CPL, which has been approved by the Open Source Initiative (OSI) as a fully certified open source license. It is therefore free, open source and freely redistributable. This JVM aims to provide research communities with a flexible open test bed for prototyping new virtual machine technologies [3]. It includes the latest virtual machine technologies for dynamic compilation, adaptive optimization, garbage collection, thread scheduling and synchronization. It has been deployed on many platforms, such as IA-32 Linux, PowerPC 32 and 64 AIX, PowerPC 32 and 64 Linux, PowerPC 32 OS X, etc.
    JikesRVM was previously developed under the Jalapeño research project [14] at IBM's Watson lab from December 1997 to October 2001. It was open sourced by IBM in 2001. A distinguishing characteristic of JikesRVM compared with other JVMs is that it is implemented in Java. At its first release, the aim of the project was to produce a virtual machine for Java servers written in the Java language. Because of this unique characteristic, the transformation and optimization mechanisms developed can be used both on the application and on the JVM itself [8]. The JVM is self-bootstrapping: it runs its own Java code on itself, without requiring a second virtual machine. This gives the JVM an additional degree of portability across platforms.
    The underlying operating system that hosts JikesRVM in this research project is Linux, which is widely used among research communities. All the terminals will therefore use the Linux operating system as the host for JikesRVM.
  • 14. As this project aims to design a platform running within a cluster, the cluster consists of a collection of homogeneous machines (same operating system and architecture) connected locally by a network switch. Three Pentium PCs running Fedora Core Linux [15] will form the cluster in this project. These PCs are connected through a switch, and it is assumed that, other than the interconnecting network, there are no physically shared resources between them.
    For this project, the distribution system is built on the existing JikesRVM (version 2.4.6 and above). JikesRVM will be modified by adding extra classes or modifying existing classes to fit the requirements of the project, so that the modified JikesRVM can run on the cluster and the workload can be distributed and shared among the terminals. More workload can be injected into any terminal running the modified JikesRVM, and the workload will be migrated and distributed autonomously according to a set of defined rules.
    First, communication mechanisms have to be implemented in JikesRVM so that the JVMs can communicate within the cluster. All communication between the JVMs is done through TCP connections. Then, a lightweight and transparent Java thread migration mechanism will be implemented at the JVM level. This thread migration mechanism is the means of migrating the workload of a thread from one JVM to another within the cluster. The workload can then be shared and distributed among the machines in the cluster automatically, and the performance of each machine will be optimized.
    A mechanism will also be implemented in the platform to balance the workload among the terminals. This mechanism is a decision-making algorithm that decides how the threads are distributed according to the context it has gathered.
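The proposal does not yet define the decision-making algorithm or the context it gathers, so the following is only a sketch of one possible rule under stated assumptions: the context is the number of runnable threads reported by each node, and a thread leaves the local node only when some peer is lighter by at least a chosen margin, so that the migration overhead is worth paying. The class `MigrationPolicyDemo`, the method `chooseNode` and the parameter `minGain` are hypothetical names.

```java
import java.util.Map;

// Hypothetical decision rule: keep a thread local unless a peer is clearly less loaded.
class MigrationPolicyDemo {

    // Returns the node that should run the next thread.
    // `load` maps node name to its current number of runnable threads (assumed context)
    // and must contain an entry for `localNode`.
    // The thread stays on `localNode` unless the least loaded node
    // is lighter by at least `minGain` threads.
    static String chooseNode(String localNode, Map<String, Integer> load, int minGain) {
        int localLoad = load.get(localNode);
        String best = localNode;
        int bestLoad = localLoad;
        for (Map.Entry<String, Integer> entry : load.entrySet()) {
            if (entry.getValue() < bestLoad) {
                best = entry.getKey();
                bestLoad = entry.getValue();
            }
        }
        return (localLoad - bestLoad >= minGain) ? best : localNode;
    }

    public static void main(String[] args) {
        Map<String, Integer> load = Map.of("main", 6, "peer1", 2, "peer2", 5);
        System.out.println(chooseNode("main", load, 2));
    }
}
```

The margin is the design choice worth noting here: a rule that always picks the least loaded node would migrate threads for negligible gains, which conflicts with the requirement that migration be lightweight.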
  • 15. The scalability of the final work will also be considered during the research. This means that any workstation can join the cluster at any time without reconfiguration of the whole cluster. Each terminal can join or leave the cluster to share the workload, within some degree of scalability. The degree of scalability achieved will be identified.
    Benchmarking tools will be developed in this project to measure the performance of the platform. They consist of computationally intensive applications whose workload can be injected into the JVM, with the results collected for analysis. These applications include database searching, data encryption and decryption, etc. The main criterion for benchmarking the platform is the elapsed time to complete the work.
    Project Plan
    Duration: Jun 2006 to May 2008 (2 years)
    Timeline: refer to Appendix A
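The elapsed-time criterion in the benchmarking plan above can be sketched as a small timing harness. This is an illustration only: the class `ElapsedTimeBenchmark`, the prime-counting workload and the comparison of thread counts on a single JVM are assumptions standing in for the proposal's planned comparison between runs with the migration mechanism enabled and disabled.

```java
// Hypothetical timing harness: measures the elapsed time of a
// computationally intensive workload run with different thread counts.
class ElapsedTimeBenchmark {

    // Trial-division primality test; deliberately CPU-bound.
    static boolean isPrime(int n) {
        if (n < 2) {
            return false;
        }
        for (long d = 2; d * d <= n; d++) {
            if (n % d == 0) {
                return false;
            }
        }
        return true;
    }

    // Counts primes in [2, limit) using `workers` threads.
    // Worker w checks the candidates 2 + w, 2 + w + workers, 2 + w + 2 * workers, ...
    static int countPrimes(int limit, int workers) {
        int[] counts = new int[workers];
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            final int id = w;
            threads[w] = new Thread(() -> {
                int c = 0;
                for (int n = 2 + id; n < limit; n += workers) {
                    if (isPrime(n)) {
                        c++;
                    }
                }
                counts[id] = c;
            });
            threads[w].start();
        }
        int total = 0;
        try {
            for (int w = 0; w < workers; w++) {
                threads[w].join();
                total += counts[w];
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return total;
    }

    // Returns the wall-clock time taken by `work`, in milliseconds.
    static long elapsedMillis(Runnable work) {
        long start = System.nanoTime();
        work.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        for (int workers : new int[] {1, 2, 4}) {
            long ms = elapsedMillis(() -> countPrimes(2_000_000, workers));
            System.out.println(workers + " thread(s): " + ms + " ms");
        }
    }
}
```

A single run like this is noisy because of JIT warm-up and scheduling, so a real benchmark would repeat each configuration several times and report an aggregate.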
  • 16. References
    [1] Gosling and McGilton (May 1996). The Java Language Environment. Sun Microsystems Computer Company.
    [2] Tim Lindholm and Frank Yellin (1999). The Java Virtual Machine Specification (Second Edition). Sun Microsystems Computer Company.
    [3] JikesRVM. JikesRVM HomePage. http://jikesrvm.org/
    [4] IBM Haifa Labs. Cluster Virtual Machine for Java. http://www.haifa.il.ibm.com/projects/systems/cjvm/index.html
    [5] Yariv Aridor, Michael Factor and Avi Teperman. cJVM: a Single System Image of a JVM on a Cluster. In International Conference on Parallel Processing, pages 4-11, 1999.
    [6] Y. Aridor, M. Factor, A. Teperman, T. Eilam and A. Schuster. A High Performance Cluster JVM Presenting a Pure Single System Image. In JAVA Grande, 2000.
    [7] The Australian National University, Department of Computer Science (DCS). Towards High-performance and Fault-tolerant Distributed Java Implementation. http://djvm.anu.edu.au/
    [8] John Zigman and Ramesh Sankaranarayana. Designing a distributed JVM on a cluster. In Proceedings of the 17th High Performance and Large Scale Computing Conference, Nottingham, United Kingdom, 2003.
    [9] System Research Group, Department of Computer Science, The University of Hong Kong. JESSICA: Java-Enabled Single-System Image Computer Architecture. http://www.srg.cs.hku.hk/homepage/srg/research_jessica.htm
    [10] M.J.M. Ma, C.-L. Wang and F.C.M. Lau. JESSICA: Java-Enabled Single-System-Image Computing Architecture. Journal of Parallel and Distributed Computing, Vol. 60, No. 10, pages 1194-1222, October 2000.
  • 17. [11] Yariv Aridor, Michael Factor and Avi Teperman. Implementing Java on Clusters. In Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing, pages 722-731, Rhodes, Greece, 2001.
    [12] Wenzhang Zhu, Cho-Li Wang and Francis C.M. Lau. Lightweight Transparent Java Thread Migration for Distributed JVM. In International Conference on Parallel Processing, pages 465-472, Kaohsiung, Taiwan, October 2003.
    [13] M.J.M. Ma, C.-L. Wang and F.C.M. Lau. Delta execution: A preemptive Java thread migration mechanism. Cluster Computing, Vol. 3, No. 2, pages 83-94, 2000.
    [14] Alpern, B., Attanasio, C. R., Barton, J. J., Burke, M. G., Cheng, P., Choi, J., Cocchi, A., Fink, S. J., Grove, D., Hind, M., Hummel, S. F., Lieber, D., Litvinov, V., Mergen, M. F., Ngo, T., Russell, J. R., Sarkar, V., Serrano, M. J., Shepherd, J. C., Smith, S. E., Sreedhar, V. C., Srinivasan, H., and Whaley, J. The Jalapeño virtual machine. IBM Systems Journal, Vol. 39, Issue 1, pages 211-238, Jan. 2000.
    [15] Red Hat, Inc. Fedora Project, sponsored by Red Hat. http://fedora.redhat.com/
    Proposed Supervisor
    Supervisor: Lam Hai Shuan, Faculty of Engineering, Multimedia University, Jalan Multimedia, 63100 Cyberjaya, Selangor.
  • 18. Co-Supervisor: Dr. Somnuk Phon-Amnuaisuk, Faculty of Information Technology, Multimedia University, Jalan Multimedia, 63100 Cyberjaya, Selangor.
  • 19. Appendix A
    Milestones (monthly timeline chart covering 2006 to 2008)
    Part 1: Literature Review and Study
    • Java Programming Language
    • Thread and Multithreading
    • Distributed Computing: Grid Computing and Cluster Computing
    • JVM and Distributed JVM
    • Study on JikesRVM
    Part 2: Research Proposal
    • Outline and Writing
    • Editing and Finalizing
    Part 3: Design and Implement
    3.1 Design Model for Migration of Thread Object
    • Basic migration model
    • Migration of thread involving no other object
    • Migration of thread involving primitive type object
    • Migration of thread involving other object (reference and array)
    • Other object cases such as during runtime
    3.2 Cooperation between master and slave
    • Object synchronization between master and slave
    • Basic communication model between master and slave
    • Communication model for 1 master and 1 slave
    • Communication model for 1 master and many slaves
    • Communication model for many masters and many slaves
  • 20. 3.3 Decision Making of Thread Migration
    • Workload sharing
    • Load balancing
    3.4 Fault Tolerance and Error Handling
    • Identifying the errors that may occur during migration
    • Calculating fault tolerance
    • Implementing error checking functions to avoid errors
    Part 4: Benchmarking and Fine Tuning
    • Calculating and fine tuning the performance of the migration model
    • Benchmarking and result analysis
    • Time difference between enabling and disabling the migration mechanism
    Part 5: Thesis Writing
    • Introduction
    • Design and Implementation
    • Discussion
    • Recommendation
    • Editing and Finalizing