Assignment 9

ASSIGNMENT

Module Code: ESD 532
Module Name: Multi core Architecture and Programming
Course: M.Sc. [Engg.] in Real Time Embedded Systems
Department: Computer Engineering
Name of the Student: Bhargav Shah
Reg. No: CHB0910001
Batch: Full-Time 2011
Module Leader: Padma Priya Dharishini P.

M.S. Ramaiah School of Advanced Studies
Postgraduate Engineering and Management Programmes (PEMP)
#470-P Peenya Industrial Area, 4th Phase, Peenya, Bengaluru-560 058
Tel: 080 4906 5555, website: www.msrsas.org
Declaration Sheet

Student Name: Bhargav Shah
Reg. No: CHB0910001
Course: RTES
Batch: Full-Time 2011
Module Code: ESD 532
Module Title: Multi Core Architecture and Programming
Module Dates: 06-02-2012 to 03-03-2012
Module Leader: Padma Priya Dharishini P.

Extension requests: Extensions can only be granted by the Head of the Department in consultation with the module leader. Extensions granted by any other person will not be accepted, and the assignment will incur a penalty. Extensions MUST be requested using the 'Extension Request Form', which is available at the ARO. A copy of the extension approval must be attached to the submitted assignment.

Penalty for late submission: Unless you have submitted proof of mitigating circumstances or have been granted an extension, the penalties for late submission of an assignment are as follows:
• Up to one week late: penalty of 5 marks
• One to two weeks late: penalty of 10 marks
• More than two weeks late: Fail - 0% recorded (F)
All late assignments must be submitted to the Academic Records Office (ARO). It is your responsibility to ensure that receipt of a late assignment is recorded at the ARO. If an extension was agreed, the authorization should be submitted to the ARO at the time of assignment submission.
To ensure assignment reports are written concisely, their length should be restricted to the limit indicated in the assignment problem statement. Reports exceeding this length may incur a penalty of one grade (5 marks). Each delegate is required to retain a copy of the assignment report.

Declaration: The assignment submitted herewith is a result of my own investigations, and I have conformed to the guidelines against plagiarism as laid out in the PEMP Student Handbook. All sections of the text and results obtained from other sources are fully referenced. I understand that cheating and plagiarism constitute a breach of University regulations and will be dealt with accordingly.

Signature of the student / Date
Submission date stamp (by ARO)
Signature of the Module Leader and date
Signature of the Head of the Department and date
Assessment Sheet

Department: Computer Engineering    Course: RTES    Module Code: ESD 532
Module Leader: Padma Priya Dharishini P.    Module Completion Date: 03-03-2012
Student Name: Bhargav Shah    ID Number: CHB0910001
Batch: Full-Time 2011    Module Title: Multi core Architecture and Programming
Attendance Details: Theory / Laboratory; Fine Paid; Remarks (if any, for shortage of attendance)

Written Examination - Marks Sheet (Assessor to fill): Q. No 1-6, parts a-d; Marks scored for 100; Marks scored out of 50; Result: PASS / FAIL
Assignment - Marks Sheet (Assessor to fill): Parts A, B, C; Marks scored for 100; Marks scored out of 50; Result: PASS / FAIL
PMAR form completed for student feedback (Assessor to mark): Yes / No
Overall Result (Assessor / Reviewer): Written Examination (max 50): Pass / Fail; Assignment (max 50): Pass / Fail; Total Marks (max 100) before late penalty, with Grade; Total Marks (max 100) after late penalty, with Grade

IMPORTANT
1. The assignment and examination marks have to be rounded off to the nearest integer and entered in the respective fields.
2. A minimum of 40% is required for a pass in both the assignment and the written test individually.
3. A student cannot fail on application of the late penalty (i.e., if the marks fall below 40 on application of the late penalty, cap at 40 marks).

Signature of Reviewer with date    Signature of Module Leader with date
Abstract

Multi-core processors may provide higher performance than current embedded processors to support future embedded-system functionality. According to the Industrial Advisory Board, embedded systems will benefit from multi-core processors, as these systems comprise mixed applications, i.e. applications with and without hard real-time constraints, that can be executed on the same processor. Moreover, the Industrial Advisory Board also stated that memory operations represent one of the main bottlenecks that current embedded applications must face, being even more important than the performance of the core, which can suffer a degradation of 10-20% without really affecting overall performance. We take advantage of this fact by studying the effect of running several threads per core, that is, making the core multithreaded. We also study the effect of caches, a well-known technique in high-performance computing for reducing the memory bottleneck.
Chapter 1 discusses arbitration schemes for memory access in multi-core systems: the types of arbitration schemes proposed so far, which of them is the best, the challenging factors these arbitration schemes face in the present situation, and finally a short note on the factors that support the proposed arbitration scheme.
Chapter 2 discusses a multi-threaded producer-consumer concept: how producer and consumer threads share a common queue, how to prioritize the threads when sharing a common resource, and some test cases to test the scenarios.
Chapter 3 discusses a different situation with four producers, each with its own queue, and a single consumer; it discusses changing the priority levels of the consumer so that, in a conflict with the consumer thread, the producer gets higher priority to execute.
Contents

Declaration Sheet
Abstract
List of Figures
Symbols
Nomenclature
CHAPTER 1  Arbitration schemes of memory access in multi core
  1.1 Introduction
  1.2 Types of arbitration schemes
  1.3 Challenges in arbitration schemes
  1.4 Impact of the arbitration schemes on throughput and latency
  1.5 Proposal of a better arbitration scheme with justification
  1.6 Conclusion
CHAPTER 2  Development of Consumer Producer Application
  2.1 Introduction
  2.2 Sequence diagram
  2.3 Development of parallelized program using Pthread/OpenMP
  2.4 Test cases and testing results for scenario 1
    2.4.1 Test cases
    2.4.2 Testing results
  2.5 Sequence diagram
  2.6 Development of parallelized program using Pthread/OpenMP
  2.7 Test cases and testing results for scenario 2
    2.7.1 Test cases
    2.7.2 Testing results
  2.8 Conclusion
CHAPTER 3  Development of Consumer Producer Application with extended priority concept
  3.1 Introduction
  3.2 Sequence diagram
  3.3 Development of designed application
  3.4 Test cases and testing results for scenario 3
    3.4.1 Test cases
    3.4.2 Documentation of the results
  3.5 Conclusion
CHAPTER 4
  4.1 Module Learning Outcomes
  4.2 Conclusion
References
Appendix-1
Appendix-2
List of Tables

Table 2.1 Test cases for single producer single consumer
Table 2.2 Test cases for single producer single consumer
Table 3.1 Test cases for higher priority consumer thread
List of Figures

Figure 2.1 Sequence diagram for one producer and one consumer
Figure 2.2 Including libraries and files for scenario 1
Figure 2.3 Declaration of mutex and structures for scenario 1
Figure 2.4 Function to create new list for scenario 1
Figure 2.5 Main function for application of scenario 1
Figure 2.6 Body of producer thread for scenario 1
Figure 2.7 Body of consumer thread for scenario 1
Figure 2.8 Producer thread is waiting for value in critical region
Figure 2.9 Consumer thread is printing the value inserted by producer thread
Figure 2.10 Sequence diagram of three producers and one consumer
Figure 2.11 Including libraries and files for scenario 2
Figure 2.12 Declaration of mutex and structures for scenario 2
Figure 2.13 Function to create new list for scenario 2
Figure 2.14 Main function for application of scenario 2
Figure 2.15 Body of producer thread for scenario 2
Figure 2.16 Body of consumer thread for scenario 2
Figure 2.17 Producer thread is waiting in critical region
Figure 2.18 Consumer thread is active after all the producer threads finish the critical region
Figure 3.1 Sequence diagram for prioritized consumer thread
Figure 3.2 Including library files for scenario 3
Figure 3.3 Declaration of constructive functions for scenario 3
Figure 3.4 Declaration of list pointers and location pointers
Figure 3.5 Definition of constructive functions
Figure 3.6 Declaration of thread function and synchronization objects
Figure 3.7 Main function for application of scenario 3
Figure 3.8 First producer thread
Figure 3.9 Second producer thread
Figure 3.10 Third producer thread
Figure 3.11 Fourth producer thread
Figure 3.12 Consumer thread with highest priority queue
Figure 3.13 Continuation of consumer thread for second priority queue
Figure 3.14 Continuation of consumer thread for third priority queue
Figure 3.15 Continuation of consumer thread for last priority queue
Figure 3.16 Results of test cases
Nomenclature

WRR    Weighted Round Robin
CMP    Chip Multiprocessor
SDRAM  Synchronous Dynamic Random Access Memory
DRR    Deficit Round Robin
SRR    Stratified Round Robin
PD     Priority Division
PBS    Priority Based Budget Scheduler
TDMA   Time Division Multiple Access
CCSP   Credit Controlled Static Priority
CHAPTER 1
Arbitration schemes of memory access in multi core

1.1 Introduction
The constraints of embedded systems in terms of power consumption, thermal dissipation, cost-efficiency and performance can be met by using multi-core processors (CMPs, or chip multiprocessors). On typical medium-size CMPs, the cores share a bus to the highest levels of the memory hierarchy. In multi-core architectures, resources are often shared to reduce cost and exchange information. An off-chip memory is one of the most common shared resources. SDRAM is a popular off-chip memory currently used in cost-sensitive and performance-demanding applications due to its low price, high data rate and large storage. An asynchronous refresh operation and a dependence on the previous access make SDRAM access latency vary by an order of magnitude. The main contribution of this report is to critically compare throughput and latency for the available arbitration schemes of multi-core systems. At the end, a justification for the better arbitration scheme is derived from the analysis.

1.2 Types of arbitration schemes [1]
There have been many approaches to providing fairness, high throughput and worst-case latency bounds in the arbiter, especially in the networking domain.
Weighted Round Robin (WRR) is a work-conserving arbiter where cores are allocated a number of slots within a round-robin cycle depending on their bandwidth requirements. If a core does not use its slot, the next active core in the round-robin cycle is assigned immediately to increase throughput. Cores producing bursty traffic benefit at the cost of cores that produce uniform traffic.
Deficit Round Robin (DRR) assigns different slot sizes to each master according to its bandwidth requirements and schedules them in a Round Robin (RR) fashion. The difference between DRR and RR is that if a master cannot use its slot, or part of it, in the current cycle, the remaining slot (the deficit) is added to the next cycle. In the next cycle, the master can transfer up to an amount of data equal to the sum of its slot size and the deficit. Thus, DRR tries to avoid the unfairness WRR causes to uniform traffic generators.
Stratified Round Robin (SRR) groups masters with similar bandwidth requirements into one class. After grouping masters into classes, two-step arbitration is applied: inter-class and intra-class. The inter-class scheduler schedules each class Fk once every 2^k clock cycles; hence, the smaller k is, the more often the class is scheduled. The intra-class scheduler uses the WRR mechanism to select the next master within the class. Due to the more uniform distribution of bandwidth, SRR reduces worst-case latencies compared to WRR. However, to achieve a low worst-case latency for a class Fk, k must be minimized, which leads to over-allocation.
Priority Division (PD) combines TDMA and static priorities to achieve guarantees and high resource utilization. Instead of fixing TDMA slots statically, PD fixes the priorities of each master within the slot statically, such that each master has at least one slot where it has the highest priority. Thus, masters have guarantees equal to TDMA, and unused slots are arbitrated based on static priority to increase resource utilization. This approach provides a benefit over RR or WRR only if the response time of the shared resource is fixed. In the case of a variable response time (e.g. SDRAM), this approach produces high worst-case latencies.
In the Priority Based Budget Scheduler (PBS), masters are assigned fixed budgets of access per unit time (the replenishment period). Masters are also assigned fixed priorities to resolve conflicts. The budget relates to a master's bandwidth requirements, while the priority relates to its latency requirements; thus, the coupling between latency and bandwidth is removed. The shared resource is granted to the active master with the highest priority which still has budget left. At the beginning of a replenishment period, each master gets its original budget back.
Akesson et al. introduce a Credit Controlled Static Priority (CCSP) arbiter. CCSP also uses priorities and budgets within the replenishment period, but instead of using frame-based replenishment periods, masters are replenished incrementally for fine-grained bandwidth assignment.
1.3 Challenges in arbitration schemes
Traditional shared-bus arbitration schemes such as TDMA and round robin exhibit several defects, such as bus starvation and low system performance. In strict priority scheduling, the higher-priority packets can take most of the bandwidth, so the lower-priority packets have to wait longer for resource allocation; this causes starvation of lower-priority packets. A drawback of WRR and LARD regarding power consumption is that both always keep their servers turned on even when some of them are not serving any requests; therefore, they cannot conserve any power. Weighted Round Robin and Deficit Round Robin are extensions that guarantee each requestor a minimum service, proportional to an allocated rate, within a common periodically repeating frame of fixed size. This type of frame-based rate regulation is similar to the Deferrable Server and suffers from an inherent coupling between allocation granularity and latency, where allocation granularity is inversely proportional to the frame size. A larger frame size results in finer allocation granularity, reducing over-allocation, but at the cost of increased latencies for all requestors. Another common example of a frame-based scheduling discipline is time-division multiplexing, which suffers from the additional disadvantage that it requires a schedule to be stored for each configuration, which is very costly if the frame size or the number of use cases is large [2].
The above arbitration algorithms cannot handle strict real-time requirements, so a two-level arbitration algorithm called RB_Lottery bus arbitration has been developed; it solves the impartiality, starvation and real-time problems that exist in the Lottery method and reduces the average latency of bus requests [5]. In hardware verification, the proposed arbiter achieves a higher operating frequency than the Lottery arbiter. Although it costs more in chip area and power consumption than the Lottery arbiter, it also has a lower average latency of bus requests than Lottery arbitration.

1.4 Impact of the arbitration schemes on throughput and latency [4]
In each approach to providing fairness, high throughput and worst-case latency bounds, optimizing one factor degrades the others. In Weighted Round Robin, to provide a low worst-case latency to any core, that core has to be assigned more slots in the round-robin cycle, which leads to over-allocation. Deficit Round Robin (DRR) has very high latencies in the worst case: for example, one master stays idle for a long time and accumulates a high deficit; afterwards, it continuously requests the shared resource, and since it has gained a high deficit, it occupies the shared resource for a long time, incurring very high latencies on the other masters. Due to the presence of priorities, PBS is fair to high-priority masters and unfair to low-priority masters. When all masters are executing HRTs (as outlined in the introduction), PBS results in large WCETs for low-priority masters.
With Credit Controlled Static Priority (CCSP), too, the presence of priorities produces large worst-case execution-time bounds for lower-priority masters.

1.5 Proposal of a better arbitration scheme with justification
Stratified Round Robin is better compared to the other arbitration schemes, since it is a fair-queuing packet scheduler with good fairness and delay properties and low complexity. It is unique among schedulers of comparable complexity in that it provides a single packet delay bound that is independent of the number of flows. Importantly, it also enables a simple hardware implementation, and thus fills a current gap between scheduling algorithms that have provably good performance and those that are feasible and practical to implement in high-speed routers. Interactive applications such as video and audio conferencing require the total delay experienced by a packet in the network to be bounded on an end-to-end basis. The packet scheduler decides the order in which packets are sent on the output link, and therefore determines the queuing delay experienced by a packet at each intermediate router in the network. Low complexity matters because, with line rates increasing to 40 Gbps, it is critical that all packet-processing tasks performed by routers, including output scheduling, be able to operate in nanosecond time frames.

1.6 Conclusion
By critically comparing throughput and latency for the available arbitration schemes of multi core, Stratified Round Robin is better compared to the other arbitration schemes, since it is a fair-queuing packet scheduler with good fairness and delay properties and low complexity. Even though round-robin schemes still have several negative aspects, it is hoped that replacements will address them in the future.
CHAPTER 2
Development of Consumer Producer Application

2.1 Introduction
Today, the world of software development is presented with a new challenge. To fully leverage the new class of multi-core hardware, software developers must change the way they create applications. By turning their focus to multi-threaded applications, developers can take full advantage of multi-core devices and deliver software that meets demand. But this paradigm of multi-threaded software development adds a new wrinkle of complexity for those who care the utmost about software quality. Concurrency defects such as race conditions and deadlocks are defect types unique to multi-threaded applications. Complex and hard to find, these defects can quickly derail a software project. To avoid catastrophic failures in multi-threaded applications, software development organizations must understand how to identify and eliminate these deadly problems early in the application development lifecycle.
Here, as part of this work, a multi-threaded producer-consumer application is created using the given linked-list program. Two scenarios have been accommodated in this part of the document. In the first case, the producer inserts one value into the doubly linked list, and at the other end the consumer reads that value and deletes it. In the second case, three producer threads insert values into the linked list, and at the end one consumer thread reads and deletes them. A proper synchronization mechanism is developed.

2.2 Sequence diagram
A sequence diagram is a kind of interaction diagram that shows how processes operate with one another and in what order. It is a construct of a Message Sequence Chart. A sequence diagram shows object interactions arranged in time sequence. It depicts the objects and classes involved in the scenario and the sequence of messages exchanged between the objects needed to carry out the functionality of the scenario. Sequence diagrams are typically associated with use-case realizations in the Logical View of the system under development.
Figure 2.1 shows the sequence diagram for one producer and one consumer. In the figure, the y-axis represents time and the x-axis represents the resources: one producer thread is shown at the top left and one consumer thread at the top right. At the start, the producer has to write data into the linked list. But the linked list is shared between the producer and consumer threads, so a mutex is used to provide synchronization between them. The producer locks the mutex and writes data to the list. If, at the same time, the consumer tries to read the data, it tries to acquire the mutex held by the producer and fails. In this case the consumer thread has to wait until the producer releases the mutex. This phenomenon is shown in Figure 2.1, whose annotations read "trying to obtain mutex but fails", "critical region", and "consumer has to wait until the resource is freed by the producer". In this application the consumer cannot read the data until the producer produces it and stores it in the linked list; this synchronization mechanism is achieved using the mutex.
Figure 2.1 Sequence diagram for one producer and one consumer

2.3 Development of parallelized program using Pthread/OpenMP
There are two approaches to developing threaded programs on Linux: one uses the pthread APIs and the other uses the OpenMP APIs. For this scenario, the pthread APIs are chosen to develop the single-producer, single-consumer application.
Figure 2.2 Including libraries and files for scenario 1
Figure 2.2 shows all the preprocessor statements of the code segment. The first is the file "ll2.c", which contains the definitions of all the functions related to the linked-list operations. The second is "pthread.h", which contains the declarations of all the threading-related APIs. The last two are the standard library files for common functions. In the last lines of the figure, a function named create is declared.
Figure 2.3 Declaration of mutex and structures for scenario 1
  15. 15. MSRSAS - Postgraduate Engineering and Management Programme - PEMP To obtain synchronization in the application mutext is used. Here “lock” is defined as the pthread mutext object. It is essential to initialize the mutext before using it. Here initialization is handle by the assignment operation of the macro named “PTHREAD_MUTEX_INITIALIZER”. Structure to pointer *myList is created which is holding starting address of the list. Structure to pointer *p is also created which is pointed the current position for accessing the value. These all declaration is shown by the Figure 2.2. Figure 2. 4 Function to create new list for scenario 1 Figure 2.4 shows the definition of the creat function. By the calling of this function, it will create new list. The new list is pointed by the myList pointer. list_crate() is the function which created the new list and returns addresses of the new list in the form of list_head structure. At the second line of the pointer p is created to hold the current position of the element in the list.At the initial level the current position is set to first position. By calling the list_position_creat(). Figure 2. 5 Main function for Application of scenario 1 Figure 2.5 shows the main function for the single producer and single consumer application. In figure, two functions is declared with the void pointer argument and void pointer return type. The function named “ser” is the function which is called by the producer thread. The other side consumer thread will call function “cli”.In the main One void pointer named “exit” is defined to obtain the return value from the thread function. Here, two thread object is defined named “t_ser” and “t_cli”.On the successful creation of the producer thread, ID of the thread is stored in the Multi core Architecture and Programming 15
"t_ser", and the ID of the consumer thread is stored in "t_cli". To create a thread, the pthread_create API is used with the appropriate arguments. In this application two threads are created: a producer thread and a consumer thread. The consumer thread dies automatically if the main thread exits. To avoid this situation, the main thread has to wait until the consumer thread exits successfully; this mechanism is provided by the pthread_join API.

Figure 2.6 Body of the producer thread for scenario 1

Figure 2.6 shows the body of the producer thread. At the start of the producer thread, creat() is called; this creates one new list and assigns the pointer p to the first location. The consumer cannot get any value before the producer stores it in the list. To avoid such a race condition, the mutex named "lock" is used. The function pthread_mutex_lock is used to take the mutex and enter the critical region. The producer thread then takes a value from the user into the variable "val". The user-entered value is stored in the list, and the position of the pointer p is updated with the new current location. The storing mechanism is provided by the function list_inserLast, with the list object (myList) and the value to be inserted as arguments. After successful insertion of the value into the list, any thread can get that value; so, to end the critical region and release the obtained mutex, the pthread_mutex_unlock function is used. While the producer is inside the critical region, if the consumer thread tries to take the mutex or access the critical section, it has to wait until the producer releases it. After the producer thread unlocks the mutex, the consumer thread can acquire the resource. Figure 2.7 shows the body of the consumer thread. At the start of the thread function it tries to take the mutex.
Once the producer thread unlocks the mutex, the consumer thread gets access to the shared list. The value is displayed by passing the list object to the function named list_display. The consumer thread then has to remove the value; to do this, the function
list_removeLast is called with the list object; its return value is the location of the previous data. After the data is removed, the mutex taken by the consumer thread is released. This whole mechanism is shown by the code in Figure 2.7.

Figure 2.7 Body of the consumer thread for scenario 1

2.4 Test cases and testing results for scenario 1
2.4.1 Test cases
In this section test cases are designed for the producer-consumer system. The table below describes the test cases to be performed; they validate the functionality of the system with corner-case inputs.

Table 2.1 Test cases for single producer, single consumer
TC_1 - Test case: the producer thread inserts a value. Test data: int. Expected result: the consumer reads the value inserted by the producer. Output obtained: Yes.
TC_2 - Test case: the consumer acquires the resource only after the producer thread unlocks it. Test data: any. Expected result: proper synchronization is maintained between the producer and consumer threads. Output obtained: Yes.
TC_3 - Test case: the main thread waits until all child threads exit. Test data: any. Expected result: the main thread stays alive until all threads finish executing. Output obtained: Yes.
TC_4 - Test case: no deadlock occurs. Test data: any. Expected result: all functions of the program execute; resource locking does not create a deadlock. Output obtained: Yes.
TC_5 - Test case: after reading the data, the consumer thread deletes it. Test data: any. Expected result: after reading the data entered by the producer, the consumer thread deletes it properly. Output obtained: Yes.

2.4.2 Testing results
Figure 2.8 shows the testing results of TC_1, TC_2 and TC_4. Here the server (producer) thread is waiting for a value from the user. The server thread holds the critical region until it stores the value in the shared list; during this time the client (consumer) thread is waiting to acquire the resource.

Figure 2.8 Producer thread waiting for a value in the critical region

Figure 2.9 shows the results of TC_3 and TC_4. Once the producer thread leaves the critical region, the consumer thread enters it to read the value entered by the producer. After reading the value, the consumer thread deletes it, as shown in the figure.

Figure 2.9 Consumer thread printing the value inserted by the producer thread
2.5 Sequence diagram
Figure 2.10 shows the sequence diagram for three producers and one consumer. In the figure, the y-axis represents time and the x-axis represents the resources: three producer threads are shown on the left side and one consumer thread on the right. The consumer thread is in the wait state because the resource is acquired by the producer threads.

Figure 2.10 Sequence diagram of three producers and one consumer (annotations: "Producer thread 1 is in critical region"; "Producer thread 2 is in critical region"; "Producer thread 3 is in critical region"; "Trying to obtain mutex but fails")

At the start, every producer has to write its data into the linked list, but the linked list is shared between the producer and consumer threads. To provide synchronization between them, a mutex is used: each producer locks the mutex and writes its data to the list. If the consumer tries to read the data at the same time, it tries to acquire the mutex already taken by a producer and fails; in that case the consumer thread has to wait until the producer releases the mutex. This phenomenon is shown in Figure 2.10. In this application the consumer cannot read the data until a producer produces it and stores it in the linked list; this synchronization is achieved using a mutex.
2.6 Development of a parallelized program using pthread/OpenMP
There are two approaches to developing threaded programs on Linux: the pthread APIs and the OpenMP APIs. For this part of the scenario the pthread APIs are chosen to develop the three-producer, single-consumer application. The definitions are the same for both scenarios; the only difference is in the main body of the application code.

Figure 2.11 Including libraries and files for scenario 2

Figure 2.11 shows all the preprocessor statements of the code segment. The first is the file "ll2.c", which contains the definitions of all the linked-list functions. The second is "pthread.h", which declares all the threading APIs. The last two are the standard library headers for ordinary functions. In the last lines of the figure, a function named creat is declared.

Figure 2.12 Declaration of mutex and structures for scenario 2

To obtain synchronization in the application a mutex is used. Here "lock" is defined as the pthread mutex object. It is essential to initialize a mutex before using it; here initialization is handled by assigning the macro PTHREAD_MUTEX_INITIALIZER. A structure pointer *myList holds the starting address of the list, and a structure pointer *p points to the current position for accessing values. These declarations are shown in Figure 2.12.

Figure 2.13 Function to create a new list for scenario 2

Figure 2.13 shows the definition of the creat function. Calling this function creates a new list, which is pointed to by the myList pointer. list_crate() creates the new list and returns its address in the form of a list_head structure. At the
second line, the pointer p is created to hold the current position of the element in the list; initially the current position is set to the first position by calling list_position_creat().

Figure 2.14 shows the main function for the multiple-producer, single-consumer application. In the figure, two functions are declared with a void-pointer argument and a void-pointer return type. The function named "ser" is called by the producer threads; here there are three producer threads, which all call the same function. The consumer thread calls the function "cli". In main, one void pointer named "exit" is defined to obtain the return value from the thread functions.

Figure 2.14 Main function for the application of scenario 2

Here, five thread objects are defined, named "t_ser", "t_ser1", "t_ser2", "t_ser3" and "t_cli". On successful creation of each producer thread, its ID is stored in the corresponding thread object, and the ID of the consumer thread is stored in "t_cli". Before creating the threads, the creat() function is called here to generate the list and assign the current location to the pointer p. In the single-producer, single-consumer case this function was called inside the producer thread function, because both threads run only once there; in this case the producer function executes three times, so a new list must not be created every time. Once the list is created, all the threads insert their values and advance the location pointer. To create the threads, the pthread_create API is used with the appropriate arguments. In this application four threads are created: three producer threads and one consumer thread. The consumer thread dies automatically if the main thread exits. To avoid this situation, the main thread has to wait until the consumer thread exits successfully; this mechanism is provided by the pthread_join API.
Consumer can’t get any value before the producer stores it to the list. Even consumer has to wait until all the producer stores the value in the list. The other side, no other producer thread can Multi core Architecture and Programming 21
insert a value while one producer thread is in the critical region. To achieve this synchronization, the mutex named "lock" is used. The function pthread_mutex_lock is used to take the mutex and enter the critical region. Each producer thread then takes a value from the user into the local variable "val". The user-entered value is stored in the list, and the position of the pointer p is updated by every producer thread. The storing mechanism is provided by the function list_inserLast, with the list object (myList) and the value to be inserted at the end as arguments. After all the producer threads have successfully inserted their values, the consumer thread can get them. So, to end the critical region and release the obtained mutex, the pthread_mutex_unlock function is used. The body of the producer threads is shown in Figure 2.15.

Figure 2.15 Body of the producer threads for scenario 2

Figure 2.16 Body of the consumer thread for scenario 2
Figure 2.16 shows the body of the consumer thread. At the start of the thread function it tries to take the mutex. Once the producer threads unlock the mutex, the consumer thread gets access to the shared list. The value is displayed by passing the list object to the function named list_display. The consumer thread then has to remove the values; for this, the function list_removeLast is called to remove a single value from the list. In this scenario there are three values in the list, so the reading and deletion procedure is repeated three times. The return value of this function is the location of the previous data. After all the data is removed, the mutex taken by the consumer thread is released.

2.7 Test cases and testing results for scenario 2
2.7.1 Test cases
In this section test cases are designed for the multiple-producer, single-consumer system. The table below describes the test cases to be performed; they validate the functionality of the system with corner-case inputs.

Table 2.2 Test cases for multiple producers, single consumer
TC_1 - Test case: all producer threads insert a value into the list. Test data: int. Expected result: the consumer reads the values inserted by the producer threads. Output obtained: Yes.
TC_2 - Test case: the consumer thread reads the values in the appropriate priority. Test data: any. Expected result: priority is assigned to all the producer threads; the consumer reads in the proper priority order. Output obtained: Yes.
TC_3 - Test case: the main thread waits until all child threads exit. Test data: any. Expected result: the main thread stays alive until all threads finish executing. Output obtained: Yes.
TC_4 - Test case: two producer threads do not insert values at the same time. Test data: any. Expected result: a proper synchronization mechanism is maintained by the producer threads when inserting values. Output obtained: Yes.
TC_5 - Test case: after reading the data, the consumer thread deletes it one by one. Test data: any. Expected result: after reading the data entered by the producers, the consumer thread deletes it properly. Output obtained: Yes.

2.7.2 Testing results
Figure 2.17 shows the testing results of the developed producer-consumer application. Here the consumer thread waits until all the producer threads leave the critical region; the first priority is assigned to the first producer thread. The results of TC_1, TC_2 and TC_4 are shown in the figure below.

Figure 2.17 Producer thread waiting in the critical region

Figure 2.18 shows the results of test cases TC_3 and TC_4. Only after all server (producer) threads leave the critical region can the consumer enter it to read the values from the list. The consumer thread reads the values as per the given priority.

Figure 2.18 Consumer thread active after all the producer threads finish the critical region

NOTE: In this document all the results are documented for a single iteration of the application, to provide a clear understanding.
2.8 Conclusion
Multi-core hardware is clearly increasing software complexity by driving the need for multi-threaded applications. Given the rising rate of multi-core hardware adoption in both enterprise and consumer devices, the challenge of creating multi-threaded applications is here to stay for software developers; in the coming years, multi-threaded application development will most likely become the dominant paradigm in software. As this shift continues, many development organizations will transition to multi-threaded application development on the fly. In view of this, a producer-consumer application was successfully created using the pthread APIs. Both threads share the same linked list, with synchronization provided by a mutex. The test cases were developed by critically analyzing the application code and the assignment requirements, and all of them were tested successfully.
CHAPTER 3
Development of a Consumer-Producer Application with an Extended Priority Concept

3.1 Introduction
All modern operating systems divide CPU cycles, in the form of time quanta, among various processes and threads (or Linux tasks) in accordance with their policies and priorities. Thread scheduling is one of the most important and fundamental services offered by an operating system kernel. Some of the metrics an operating system scheduler seeks to optimize are fairness, throughput, turnaround time, response time and efficiency. Multiprocessor operating systems assume that all cores are identical and offer the same performance.

3.2 Sequence diagram
Figure 3.1 shows the sequence diagram in which the message queues are prioritized. In the figure, the y-axis represents time and the x-axis represents the resources. The producer threads are shown on the left side of the image: the thin vertical line shows the main thread, and the thick overlapping lines show the producer threads. Each producer thread maintains one queue to store data. On the right side of the image, one consumer thread is shown. Before spawning the producer and consumer threads, the main thread locks four semaphores; after locking them, main creates four producers and one consumer. At the start, the consumer thread tries to acquire the semaphores in proper priority order. Each producer thread accesses its own message queue and inserts its data; at the end, the producer thread unlocks its semaphore so the consumer thread can access it. In the figure the ascending priority order for the producer threads/queues is thread 4, thread 3, thread 2, thread 1. When thread 1 finishes, it releases semaphore 1. The consumer thread continuously checks the size of all the lists associated with the queues.
Because the priority assigned to thread 3 is higher, the consumer thread looks at the size of the third queue first. If thread 3 does not have any data in its queue, the consumer thread looks at the lower-priority queues. As a result of this mechanism, suppose that at some moment only thread 1 has entered an element in its queue and released its semaphore: the consumer thread first checks for an element in queue three and fails, and since no higher-priority thread has data in its queue, rather than waiting for the higher-priority threads it reads and deletes the data associated with the lower-priority queue. As soon as a higher-priority producer thread enters a value in its queue, the consumer thread immediately reads and deletes it.
In some conditions, when the consumer and a producer thread try to acquire a resource at the same time, the consumer thread is given the priority to access the resource.

Figure 3.1 Sequence diagram for the prioritized consumer thread (annotations: "Each producer is storing data in its queue and unlocking its semaphore"; "Before spawning consumer/producer, the main task locks 4 semaphores"; "Consumer thread has to wait until the highest-priority producer releases the semaphore"; "Consumer thread got the highest-priority semaphore")

Development of the designed application
There are two approaches to developing threaded programs on Linux: the pthread APIs and the OpenMP APIs. For this part of the scenario the pthread APIs are chosen to develop the four-producer, single-consumer application.

Figure 3.2 Including library files for scenario 3
In this scenario the pthread APIs are used; their definitions are included via pthread.h. To provide the appropriate synchronization, semaphores are used; the definitions of the semaphore APIs and the declaration of the semaphore-type object are included via semaphore.h. Figure 3.2 shows these files being included in the application.

Figure 3.3 Declaration of constructive functions for scenario 3

Figure 3.3 shows the declaration of the constructive functions. In this scenario four threads create four different lists; to fulfil this requirement, one function is declared for each thread.

Figure 3.4 Declaration of list pointers and location pointers

A structure pointer *myList is created to hold the starting address of the list. Here there are four different queues; to hold their base addresses, four different pointers to the structure list_head are created. Likewise, to hold the current location in the four different lists, four ll_node pointers are created. The declarations of all these objects are shown in Figure 3.4.

Figure 3.5 Definition of constructive functions

Figure 3.5 shows the definition of the constructive functions. Calling these functions creates the new lists, pointed to by the myList series of pointers. list_create() is the
function that creates a new list and returns its address in the form of a list_head structure. In the second line, the pointers p, q, r and s are created to hold the current position of the element in the lists; initially the current position is set to the first position by calling list_position_create().

Figure 3.6 Declaration of thread functions and synchronization objects

Here, the four producer threads call four different functions; their declarations are shown in Figure 3.6. As part of the synchronization mechanism, four semaphores and a mutex are used. The reason for using semaphores is that a semaphore can be taken by one thread and released by another, which is not possible with a mutex. The declarations of these objects are also shown in Figure 3.6.

Figure 3.7 Main function for the application of scenario 3

Figure 3.7 shows the code for the main function. The semaphores are initialized at the start of main. To initialize a semaphore, the function sem_init is used with three arguments. The first argument is the address of the sem_t (semaphore) object. The second parameter
indicates that the semaphore is shared between all the threads of the process. The third parameter gives the initial value of the semaphore; in our case the initial value is 1, so it is a binary semaphore. After all the synchronization tools are initialized, the threads are created with the semaphores locked. So, by the end of this, four producer threads and one consumer thread have been created with four semaphores locked, and the main thread waits for the client thread to finish execution.

Figure 3.8 First producer thread

Figure 3.8 shows the first producer thread. The first thread enters its value into the list named myList. At the end of the function, thread 1 unlocks the semaphore named l_th, which was taken by the main function before creating the thread. Meanwhile the consumer thread is waiting to take the highest-priority thread's semaphore; here the highest-priority thread is thread 3, and the semaphore associated with it is l_th3. The mutex is used to prevent multiple threads from seeking data at the same time.

Figure 3.9 Second producer thread

Figure 3.9 shows the second producer thread. The second thread enters its value into the list named myList1. At the end of the function, thread 2 unlocks the semaphore named l_th1,
which was taken by the main function before creating the thread. Meanwhile the consumer thread is again waiting to take the highest-priority thread's semaphore; here the highest-priority thread is thread 3, and the semaphore associated with it is l_th3. The mutex is used to prevent multiple threads from seeking data at the same time.

Figure 3.10 Third producer thread

Figure 3.10 shows the third producer thread. The third thread enters its value into the list named myList2. At the end of the function, thread 3 unlocks the semaphore named l_th2, which was taken by the main function before creating the thread. Meanwhile the consumer thread is waiting to lock the semaphore l_th3, which is already locked by the main function. The mutex is used to prevent multiple threads from seeking data at the same time.

Figure 3.11 Fourth producer thread

Figure 3.11 shows the fourth producer thread. The fourth thread enters its value into the list named myList3. At the end of the function, thread 4 unlocks the semaphore named l_th3, which was taken by the main function before creating the thread. This is the highest-priority thread for
which the consumer thread is looking. The moment thread 4 releases the semaphore, the consumer thread becomes active. The consumer thread reads the data from the highest-priority queue down to the lowest-priority one.

Figure 3.12 Consumer thread with the highest-priority queue

Figure 3.12 shows the top half of the consumer thread. As per the requirement, when the producer and consumer threads contend at the same time, the consumer should get the highest priority to access the queue. To obtain this, one instance of the structure sched_param is created. Two APIs are used here, named pthread_setschedparam() and pthread_setschedprio(). The first API is used to change the scheduling policy of the current thread; for the consumer thread the scheduling policy is set to FIFO. SCHED_FIFO is the scheduling policy in which the thread that enters the ready state first gets the chance to execute, and a running thread can be preempted only by a higher-priority thread; in our case, due to FIFO scheduling, no thread of equal or lower priority can preempt the consumer thread. On the other side, the requirement is that when the consumer and a producer arrive at the same time, the consumer should get the higher priority; to fulfil this, the priority of the client (consumer) thread is set high while the server (producer) threads work at normal priority. To assign a priority to a thread, pthread_setschedprio() is used with the thread ID and the priority value as arguments; under FIFO scheduling, 1 is the lowest priority and 99 the highest. After setting the priority, the consumer thread continuously monitors the size variable in every list
structure of the producer threads. If a producer stores some data in its list, the consumer reads and removes it. Since thread 3 has the highest priority, the consumer thread first checks the size of the queue associated with thread 3; if the size is not zero, it means some data is available in that queue, which has to be deleted as the first priority.

Figure 3.13 Continuation of the consumer thread for the second-priority queue

If the highest-priority thread has no data in its queue, it is not worth the consumer thread waiting until the highest-priority thread stores data, because the consumer has to serve four producer threads. To achieve this, if the consumer thread does not find data in the highest-priority queue (myList3), it jumps to check for data in the second-priority queue. Here the second priority is assigned to thread 2, and the queue associated with it is myList2. This mechanism can be seen in Figure 3.13: the consumer thread checks myList2, and if some data is available there, it prints and deletes it.

Figure 3.14 Continuation of the consumer thread for the third-priority queue

If the first two priority threads have no data in their queues, it is not worth the consumer waiting for either of them, because it has to serve four producer threads. So, if the consumer thread does not find data in the first two priority queues (myList3 and myList2), it jumps to check for data in the third-
priority queue. Here the third priority is assigned to thread 1, and the queue associated with it is myList1. This mechanism can be seen in Figure 3.14: the consumer thread checks myList1, and if some data is available there, it prints and deletes it.

Figure 3.15 Continuation of the consumer thread for the last-priority queue

If the first three priority threads have no data in their queues, it is not worth the consumer waiting for any of them, because it has to serve four producer threads. So, if the consumer thread does not find data in the first three priority queues (myList3, myList2 and myList1), it jumps to check for data in the last-priority queue. Here the last priority is assigned to the first producer thread, and the queue associated with it is myList. This mechanism can be seen in Figure 3.15: the consumer thread checks myList, and if some data is available there, it prints and deletes it.

3.3 Test cases and testing results for scenario 3
3.3.1 Test cases
In this section test cases are designed for the four-producer, one-consumer system. The table below describes the test cases to be performed; they validate the functionality of the system with corner-case inputs.

Table 3.1 Test cases for the higher-priority consumer thread
TC_1 - Test case: the consumer acquires the higher priority and runs first. Test data: NA. Expected result: at the start of the program the consumer runs and shows that the list is empty. Output obtained: Yes.
TC_2 - Test case: if the consumer and a producer try to access the resource together, the consumer gets access first. Test data: any. Expected result: at the time of access to the shared resources, the consumer gets the higher priority. Output obtained: Yes.
TC_3 - Test case: the consumer does not wait for a higher-priority thread to enter a value. Test data: any. Expected result: if the higher-priority thread does not enter a value, the consumer checks the other, lower-priority queues. Output obtained: Yes.
TC_4 - Test case: if two values are entered by any producer thread, the consumer responds to both. Test data: any. Expected result: in cases where the consumer is busy printing some values while a thread enters two values into its queue, the consumer reads and deletes both values. Output obtained: Yes.

3.3.2 Documentation of the results
Figure 3.16 shows the results of the test cases developed in the section above. It can be seen from the image that the consumer thread is responding to all the present threads' queues that hold values. The attached callouts give a better understanding of the results.

Figure 3.16 Results of the test cases (callouts: "Consumer thread is executing first as it has the highest priority"; "Thread 3 has a higher priority but no value in its queue, while thread 1 has a value in its queue - rather than waiting, the consumer thread serves the lower-priority thread"; "Thread 3 has the highest priority to be read, but the consumer is not waiting for thread 3")
3.5 Conclusion
The consumer-producer application with the extended priority concept has been explained with a sequence diagram showing the relation between four static-priority producer threads, one consumer thread, four queues and their synchronization mechanism, together with the parallelized program using pthreads and the test cases used to validate the functionality.
CHAPTER 4
4.1 Module Learning Outcomes
This module helped build expertise in parallel programming for multi-core architectures. Multi-core processors were covered along with their performance quantification and usability techniques; single-core and multi-core optimization techniques and the development of multi-threaded parallel programs were explained clearly. Virtualization and partitioning techniques were explained in detail, along with their specific challenges and solutions, and parallel programming of multi-core processors was illustrated with appropriate case studies using OpenMP and pthreads. After this module I am able to analyze multi-core architectures, the optimization process for single- and multi-core processors, and parallel programming for multi-core processors proficiently, using the OpenMP library and the GCC compiler. Applying parallel programming concepts to develop applications on multi-core processors was well taught through the lab programs, which proved an efficient way of learning pthreads, OpenMP and the various synchronization techniques for eliminating deadlock situations.

4.2 Conclusion
In Chapter 1, by critically comparing throughput and latency for the available multi-core arbitration schemes, round robin was found to be better than the other arbitration schemes, since it is a fair-queuing packet scheduler with good fairness and delay properties and low complexity; even so, round robin has a number of drawbacks, and replacements may well appear in the future. In Chapter 2, it was shown that multi-core hardware is clearly increasing software complexity by driving the need for multi-threaded applications. Based on the rising rate of multi-core hardware adoption in both enterprise and consumer devices, the challenge of creating multi-threaded applications is here to stay for software developers. In view of this, a producer-consumer application was successfully created using the pthread APIs.
Both threads share the same linked list, with synchronization provided by a mutex. The test cases were developed by critically analyzing the application code and the assignment requirements, and all of them were tested successfully. In Chapter 3, the consumer-producer application with the extended priority concept was explained with a sequence diagram showing the relation between four static-priority producer threads, one consumer thread, four queues and their synchronization mechanism, together with the parallelized program using pthreads and the test cases used to validate the functionality.
Appendix-1
Appendix-2
