COMPUTER LABORATORY-4 LAB MANUAL BE COMPUTER ENGINEERING

B.E. Computer Engineering Computer Laboratory - IV
Pune Vidyarthi Griha’s COLLEGE OF ENGINEERING, Nasik - 4
LAB MANUAL
COMPUTER
LABORATORY-IV
Subject Code: 410454
PROF. ANAND N. GHARU
2017 - 18

List of Assignments:
Group A:
1. Using Divide and Conquer Strategies design a cluster/Grid of BBB or Raspberry Pi or
Computers in network to run a function for Binary Search Tree using C /C++/
Java/Python/ Scala.
2. Using Divide and Conquer Strategies design a class for Concurrent Quick Sort using
C++.
3. Write a MPI program for calculating a quantity called coverage from data files.
Hint :- Program distributes computation efficiently across the cluster. The program
should be able to work with any number of nodes and should yield the same results as
the serial code.
4. Write a program on an unloaded cluster for several different numbers of nodes and
record the time taken in each case. Draw a graph of execution time against the number
of nodes.
5. Build a small compute cluster using Raspberry Pi/BBB modules to implement Booth's
Multiplication algorithm.
6. EL-IV BAI : Use Business intelligence and analytics tools to recommend the
combination of share purchases and sales for maximizing the profit.
7. EL-IV MA : Write a mobile application to generate a Scientific calculator using
J2ME/ Python/ Scala/ C++/ Android.
Group B:
1. 8-Queens Matrix is Stored using JSON/XML having first Queen placed, use back-tracking
to place remaining Queens to generate final 8-queen's Matrix using Python.
Create a backtracking scenario and use HPC architecture (Preferably BBB) for
computation of next placement of a queen.
2. Develop a stack sampling using threads using VTune Amplifier.
3. Write a program to check task distribution using Gprof.
4. Implement OBST Tree search using HPC task sub-division. Merge the results to get
final result.
5. Perform concurrent ODD-Even Merge sort using HPC infrastructure (preferably
BBB) using Python/ Scala/ Java/ C++.
6. EL-IV Frame the suitable assignment to perform computing using BIA tools
effectively.
7. EL-IV Write a Mobile App program using J2ME /Python /Scala /Java /Android to
check the palindrome in a given string.
Group C:
1. Write HTML5 programming techniques to compile a text PDF file integrating Latex.

Assignment No: A1
Title:
Using Divide and Conquer Strategies design a cluster/Grid of BBB or Raspberry Pi or
Computers in network to run a function for Binary Search Tree using C /C++/ Java/Python/
Scala.
Aim:
Using Divide and Conquer Strategies, design a cluster of computers in a network to run a
function for Binary Search Tree using C.
Prerequisites:
• MPICH
• Ubuntu
Objective:
1) To study algorithmic examples in distributed, concurrent and parallel
environments
2) To develop problem solving abilities using Mathematical Modeling
3) To develop time and space efficient algorithms
4) To effectively use multi-core or distributed, concurrent/Parallel environments.
Theory:
Divide and Conquer:
In the divide and conquer method, a given problem is handled in three steps:
I. Divide the problem into smaller sub-problems.
II. Solve the sub-problems independently.
III. Combine the solutions of the sub-problems into a solution of the whole.
If a sub-problem is still too large, divide and conquer is applied to it again. Hence recursive
algorithms are used in the divide and conquer technique.

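• ALGORITHM:
Algorithm DC(P)
{
If P is small enough then
Return Solution(P)
Else
{
Divide P into sub-problems P1, P2, ..., Pk, where k >= 1
Apply DC to each sub-problem
Return Combine(DC(P1), DC(P2), ..., DC(Pk))
}
}
• Recurrence relation
T(n) = g(n), if n is small
T(n) = T(n1) + T(n2) + ... + T(nk) + f(n), when n is sufficiently large
Here n1, n2, ..., nk are the sizes of the k sub-problems, g(n) is the time to solve a small
problem directly, and f(n) is the time to divide P and combine the sub-solutions.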
• APPLICATION:
1) Binary search
2) Merge sort
3) Quick sort
4) Strassen's matrix multiplication
5) Convex hull problems
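The DC(P) template above can be made concrete with merge sort, one of the listed applications. The following is a minimal Python sketch, not the manual's reference implementation; the function names dc_sort and merge are chosen here for illustration.

```python
def merge(left, right):
    """Combine step: merge two sorted lists into one sorted list."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # at most one of these two tails is non-empty
    merged.extend(right[j:])
    return merged


def dc_sort(p):
    """Divide and conquer: small problems are solved directly, others are split."""
    if len(p) <= 1:           # "P is small enough": already sorted
        return list(p)
    mid = len(p) // 2         # divide into two sub-problems
    return merge(dc_sort(p[:mid]), dc_sort(p[mid:]))  # conquer, then combine


print(dc_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```

For this instance the running time satisfies T(n) = 2T(n/2) + O(n), which gives O(n log n).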
BINARY SEARCH TREE
• A binary search tree is a binary tree in which each node has a value greater than
every node of its left subtree and less than every node of its right subtree.
• Complexity
Operation   Average     Worst case
Space       O(n)        O(n)
Search      O(log n)    O(n)
Insert      O(log n)    O(n)
Delete      O(log n)    O(n)
• In computer science, binary search trees (BST), sometimes called ordered or sorted
binary trees, are a particular type of container: data structures that store "items" (such
as numbers, names etc.) in memory. They allow fast lookup, addition and removal of
items, and can be used to implement either dynamic sets of items, or lookup tables
that allow finding an item by its key (e.g., finding the phone number of a person by
name).
• Binary search trees keep their keys in sorted order, so that lookup and other
operations can use the principle of binary search: when looking for a key in a tree (or
a place to insert a new key), they traverse the tree from root to leaf, making
comparisons to keys stored in the nodes of the tree and deciding, based on the
comparison, to continue searching in the left or right subtrees.
• On average, this means that each comparison allows the operations to skip about half
of the tree, so that each lookup, insertion or deletion takes time proportional to the
logarithm of the number of items stored in the tree. This is much better than the linear
time required to find items by key in an (unsorted) array, but slower than the
corresponding operations on hash tables.

• The major advantage of binary search trees over other data structures is that the
related sorting algorithms and search algorithms such as in-order traversal can be very
efficient; they are also easy to code. Binary search trees are a fundamental data
structure used to construct more abstract data structures such as sets, multisets, and
associative arrays.
• Some of their disadvantages are as follows: The shape of the binary search tree depends
entirely on the order of insertions and deletions, and it can become degenerate.
When inserting or searching for an element in a binary search tree, the key of each
visited node has to be compared with the key of the element to be inserted or found.
The keys in the binary search tree may be long, which increases the run time. After a
long intermixed sequence of random insertion and deletion, the expected height of the
tree approaches the square root of the number of keys, √n, which grows much faster than
log n.
• Types: There are many types of binary search trees, for example:
1] AVL trees
2] Red-black trees
Both are forms of self-balancing binary search trees.
• Operations:
Binary search trees support three main operations: insertion of elements, deletion of
elements, and lookup (checking whether a key is present).
1) Searching: Searching a binary search tree for a specific key can be a recursive or
an iterative process. We begin by examining the root node. If the tree is null, the
key we are searching for does not exist in the tree. Otherwise, if the key equals that
of the root, the search is successful. If the key is less than that of the root, we search
the left subtree; if it is greater, we search the right subtree. This is repeated until
the key is found or the remaining subtree is null.
2) Insertion: Insertion begins as a search would begin; if the key is not equal to that
of the root, we search the left or right subtrees as before. Eventually, we will reach
an external node and add the new key-value pair (here encoded as a record
'newNode') as its right or left child, depending on the node's key.
3) Deletion: There are three cases. Deleting a node with no children: simply remove
the node from the tree. Deleting a node with one child: remove the node and replace
it with its child. Deleting a node with two children: call the node to be deleted N.
Do not delete N. Instead, choose either its in-order successor node or its in-order
predecessor node, R. Copy the value of R to N, then recursively call delete on R,
which falls under one of the first two cases.
4) Tree traversal: Once the binary search tree has been created, its elements can be
retrieved in-order by recursively traversing the left subtree of the root node,
accessing the node itself, then recursively traversing the right subtree of the node,
continuing this pattern with each node in the tree as it's recursively accessed. As
with all binary trees, one may conduct a pre-order traversal or a post-order
traversal, but neither is likely to be useful for binary search trees. An in-order
traversal of a binary search tree will always result in a sorted list of node items
(numbers, strings or other comparable items).
• Application:
1) Sort:
A binary search tree can be used to implement a simple sorting algorithm. Similar to
heap sort, we insert all the values we wish to sort into a new ordered data structure—
in this case a binary search tree—and then traverse it in order.
2) Priority queue operations
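The search, insertion, deletion and in-order traversal described above, together with the sorting application, can be sketched in Python as follows. This is an illustrative, unbalanced BST that ignores duplicate keys; the names Node, insert, search, delete, in_order and tree_sort are chosen for this sketch and are not prescribed by the assignment.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Insert key and return the (possibly new) subtree root; duplicates are ignored."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    elif key > root.key:
        root.right = insert(root.right, key)
    return root


def search(root, key):
    """Iterative lookup: walk from the root towards a leaf."""
    while root is not None and root.key != key:
        root = root.left if key < root.key else root.right
    return root is not None


def delete(root, key):
    """Delete key (if present) and return the new subtree root."""
    if root is None:
        return None
    if key < root.key:
        root.left = delete(root.left, key)
    elif key > root.key:
        root.right = delete(root.right, key)
    else:
        if root.left is None:       # no child or only a right child
            return root.right
        if root.right is None:      # only a left child
            return root.left
        succ = root.right           # two children: find the in-order successor
        while succ.left is not None:
            succ = succ.left
        root.key = succ.key         # copy the successor's value into this node
        root.right = delete(root.right, succ.key)
    return root


def in_order(root):
    """Return the keys in sorted order."""
    if root is None:
        return []
    return in_order(root.left) + [root.key] + in_order(root.right)


def tree_sort(values):
    """Sort by inserting everything into a BST and traversing it in order."""
    root = None
    for v in values:
        root = insert(root, v)
    return in_order(root)
```

Inserting keys that are already sorted makes this tree degenerate into a linked list, which is the O(n) worst case listed in the complexity table.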
Conclusion:
Hence, we have successfully implemented a function for Binary Search Tree using C
and Divide and Conquer Strategies.

Assignment No: A2
Title:
Using Divide and Conquer Strategies design a class for Concurrent Quick Sort using C++.
Aim:
Implement a Concurrent Quick Sort using divide and conquer strategy
Prerequisites:
• Basic knowledge of concurrent C++ programming.
• Ubuntu
Objectives:
1. To understand the importance of Divide and Conquer Strategies
2. To learn Quick sort
Theory:
Quick Sort:
QuickSort is a Divide and Conquer algorithm. It picks an element as pivot and partitions the
given array around the picked pivot. There are many different versions of quickSort that pick
pivot in different ways.
1) Always pick first element as pivot.
2) Always pick last element as pivot
3) Pick a random element as pivot.
4) Pick median as pivot.
The key process in quickSort is partition(). Given an array and an element x of the array
as pivot, the target of partition() is to put x at its correct position in the sorted array,
put all smaller elements (smaller than x) before x, and put all greater elements (greater
than x) after x. All this should be done in linear time.
Partition Algorithm:
There can be many ways to do partition. One simple scheme starts from the leftmost
element and keeps track of the index of smaller (or equal to) elements as i. While traversing,
if we find a smaller element, we advance i and swap the current element with array[i];
otherwise we ignore the current element. The pseudocode below uses another common scheme,
in which two indices scan the array from both ends and out-of-place pairs are swapped.
partition(array, lower, upper)
{
pivot is array[lower]
LEFT = lower + 1, RIGHT = upper
while (true)
{
scan from right to left using index called RIGHT
stop when locate an element that should be left of pivot
scan from left to right using index called LEFT
stop when locate an element that should be right of pivot
if (RIGHT and LEFT cross)
break
swap array[RIGHT] and array[LEFT], then move both indices one step inward
}
pos = RIGHT (location where LEFT/RIGHT cross)
swap pivot and array[pos]
all values left of pivot are <= pivot
all values right of pivot are >= pivot
return pos
}
Example:
Time Complexity:
Best case complexity of quick sort is O(n log n)
Worst case complexity is O(n^2)
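As a hedged illustration of the two-index partition scheme, here is a small sequential quicksort in Python with the first element as pivot. The variable names follow the pseudocode (lower, upper, left, right); the exact stopping conditions are one reasonable choice for this sketch, not the only correct one.

```python
def partition(array, lower, upper):
    """Partition array[lower..upper] around array[lower]; return the pivot's final index."""
    pivot = array[lower]
    left, right = lower + 1, upper
    while True:
        # Scan from the right: stop at an element that belongs left of the pivot.
        while right >= left and array[right] > pivot:
            right -= 1
        # Scan from the left: stop at an element that belongs right of the pivot.
        while left <= right and array[left] < pivot:
            left += 1
        if left >= right:          # the indices have met or crossed
            break
        array[left], array[right] = array[right], array[left]
        left += 1
        right -= 1
    # right now marks the last position holding a value <= pivot.
    array[lower], array[right] = array[right], array[lower]
    return right


def quicksort(array, lower=0, upper=None):
    """Sort array in place."""
    if upper is None:
        upper = len(array) - 1
    if lower < upper:
        pos = partition(array, lower, upper)
        quicksort(array, lower, pos - 1)
        quicksort(array, pos + 1, upper)


data = [7, 2, 9, 4, 7, 1, 8]
quicksort(data)
print(data)  # [1, 2, 4, 7, 7, 8, 9]
```

With the first element as pivot, an already sorted input produces the O(n^2) worst case, because every partition splits off only one element.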
Concurrent Quick sort:
Quicksort can be parallelized in a variety of ways. In the context of recursive decomposition,
during each call of QUICKSORT, the array is partitioned into two parts and each part is
solved recursively. Sorting the smaller arrays represents two completely independent sub
problems that can be solved in parallel. Therefore, one way to parallelize quicksort is to
execute it initially on a single process; then, when the algorithm performs its recursive calls,
it assigns one of the sub problems to one process and the other to another process. Now each of these
processes sorts its array by using quicksort. The algorithm terminates when the arrays cannot
be further partitioned. Upon termination, each process holds an element of the array, and the
sorted order can be recovered by traversing the processes.
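The idea of handing one sub-array to another worker can be sketched with Python threads. This is only an illustration of the recursive decomposition, under two assumptions made here: a depth parameter (hypothetical, not from the text) limits how many threads are created, and a last-element pivot is used for brevity. The assignment itself asks for a C++ class, and in CPython the global interpreter lock means the threads demonstrate the structure rather than a real speed-up.

```python
import threading


def _partition(a, lo, hi):
    """Partition a[lo..hi] around a[hi]; return the pivot's final index."""
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    return i


def concurrent_quicksort(a, lo=0, hi=None, depth=2):
    """Sort a[lo..hi] in place; the top `depth` levels sort one half in a new thread."""
    if hi is None:
        hi = len(a) - 1
    if lo >= hi:
        return
    pos = _partition(a, lo, hi)
    if depth > 0:
        # The two halves do not overlap, so they can be sorted independently.
        worker = threading.Thread(
            target=concurrent_quicksort, args=(a, lo, pos - 1, depth - 1)
        )
        worker.start()
        concurrent_quicksort(a, pos + 1, hi, depth - 1)
        worker.join()              # wait for the other half before returning
    else:
        concurrent_quicksort(a, lo, pos - 1, 0)
        concurrent_quicksort(a, pos + 1, hi, 0)
```

Because each call works on a disjoint index range of the same list, no locking is needed; the join() call plays the role of collecting the sorted parts.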
Conclusion:
Thus we have studied and implemented concurrent Quick sort.

Assignment No: A3
Title:
Write a MPI program for calculating a quantity called coverage from data files.
Hint: - Program distributes computation efficiently across the cluster. The program should be
able to work with any number of nodes and should yield the same results as the serial code.
Aim:
The aim of this assignment is to form a cluster and use it through an MPI program.
Prerequisites:
• Ubuntu 14.04 (64 bit preferred), MPICH2 Software
• Student should know the basics of MPI programming.
Objective:
1. To understand the concept of the Message Passing Interface (MPI)
2. To effectively use multi-core or distributed, concurrent/Parallel environments.
Theory:
What is MPI?
M P I = Message Passing Interface
MPI is a specification for the developers and users of message passing libraries. By itself, it is
NOT a library - but rather the specification of what such a library should be. MPI primarily
addresses the message-passing parallel programming model: data is moved from the address
space of one process to that of another process through cooperative operations on each
process. Simply stated, the goal of the Message Passing Interface is to provide a widely used
standard for writing message passing programs.
The Message Passing Interface Standard (MPI) is a message passing library standard based
on the consensus of the MPI Forum, which has over 40 participating organizations, including
vendors, researchers, software library developers, and users.
The goal of the Message Passing Interface is to establish a portable, efficient, and flexible
standard for message passing that will be widely used for writing message passing programs.
MPI is not an IEEE or ISO standard, but has in fact, become the "industry standard" for
writing message passing programs on HPC platforms.
MPI Programming Model:
As architecture trends changed, shared memory SMPs were combined over networks creating
hybrid distributed memory / shared memory systems.

MPI implementers adapted their libraries to handle both types of underlying memory
architectures seamlessly. They also adapted/developed ways of handling different
interconnect and protocols.
MPI runs on virtually any hardware platform:
• Distributed Memory
• Shared Memory
• Hybrid
General MPI Program Structure:

Communicators and Groups:
MPI uses objects called communicators and groups to define which collection of processes
may communicate with each other. Most MPI routines require you to specify a communicator
as an argument. Simply use MPI_COMM_WORLD whenever a communicator is required - it is the
predefined communicator that includes all of your MPI processes.
3.4 Level of Thread Support:
MPI libraries vary in their level of thread support:
• MPI_THREAD_SINGLE - Level 0: Only one thread will execute.
• MPI_THREAD_FUNNELED - Level 1: The process may be multi-threaded, but only
the main thread will make MPI calls - all MPI calls are funneled to the main thread.
• MPI_THREAD_SERIALIZED - Level 2: The process may be multi-threaded, and
multiple threads may make MPI calls, but only one at a time. That is, calls are not
made concurrently from two distinct threads as all MPI calls are serialized.
• MPI_THREAD_MULTIPLE - Level 3: Multiple threads may call MPI with no
restrictions.
3.5 Pros of MPI:
o runs on either shared or distributed memory architectures
o can be used on a wider range of problems than OpenMP
o each process has its own local variables
o distributed memory computers are less expensive than large shared memory
computers

3.6 Cons of MPI:
o requires more programming changes to go from serial to parallel version
o can be harder to debug
o performance is limited by the communication network between the nodes
Elementary MPI Data types:
MPI data type            C equivalent
MPI_SHORT                short int
MPI_INT                  int
MPI_LONG                 long int
MPI_LONG_LONG            long long int
MPI_UNSIGNED_CHAR        unsigned char
MPI_UNSIGNED_SHORT       unsigned short int
MPI_UNSIGNED             unsigned int
MPI_UNSIGNED_LONG        unsigned long int
MPI_UNSIGNED_LONG_LONG   unsigned long long int
MPI_FLOAT                float
MPI_DOUBLE               double
MPI_LONG_DOUBLE          long double
MPI_BYTE                 char (raw 8-bit byte, sent without interpretation)

MPI Scatter, Gather:
MPI_Gather(
void* send_data,
int send_count,
MPI_Datatype send_datatype,
void* recv_data,
int recv_count,
MPI_Datatype recv_datatype,
int root,
MPI_Comm communicator)
MPI_Bcast(
void* data, int count, MPI_Datatype datatype, int root, MPI_Comm communicator)
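The scatter, compute and gather pattern behind these calls can be illustrated without an MPI installation. The sketch below simulates the ranks inside one Python process; it makes no MPI calls. The manual does not define "coverage", so the sketch assumes, purely for illustration, that coverage is the fraction of data entries greater than zero. The functions scatter, local_count, coverage_parallel and coverage_serial are hypothetical names.

```python
def scatter(data, num_procs):
    """Split data into num_procs contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(data), num_procs)
    chunks, start = [], 0
    for rank in range(num_procs):
        size = base + (1 if rank < extra else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks


def local_count(chunk):
    """Work done by one rank: count the covered (positive) entries in its chunk."""
    return sum(1 for value in chunk if value > 0)


def coverage_parallel(data, num_procs):
    """Simulated parallel run: scatter, compute per rank, gather, reduce at the root."""
    chunks = scatter(data, num_procs)                 # root distributes the data
    gathered = [local_count(c) for c in chunks]       # each rank computes, root gathers
    return sum(gathered) / len(data)                  # root combines the partial counts


def coverage_serial(data):
    """Reference serial computation."""
    return sum(1 for value in data if value > 0) / len(data)


sample = [0, 3, 1, 0, 0, 2, 5, 0, 1, 1]
print(coverage_serial(sample), coverage_parallel(sample, 3))  # 0.6 0.6
```

Gathering integer counts rather than per-chunk ratios is what keeps the result identical to the serial code for any number of processes, including counts that do not divide the data evenly.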
MPI Cluster Installation steps:
Prerequisite: Ubuntu 14.04 (64 bit preferred)
Installations:
1. Execute command: sudo apt-get update
2. Install MPI:
sudo apt-get install openmpi-bin openmpi-common libopenmpi1.6 libopenmpi-dev
3. On Slave nodes install SSH Server: sudo apt-get install openssh-server
4. On Master node: sudo apt-get install openssh-client
5. Generate ssh key on master as well as slave: ssh-keygen -t dsa
6. Enter any string as the passphrase when prompted
7. Execute on the master node:
cp /home/username/.ssh/id_dsa.pub /home/username/.ssh/authorized_keys
8. Make sure slave machines know that the master is authorized to access everything on
the slave:
scp /home/mpiuser/.ssh/id_dsa.pub username@IP_address:.ssh/authorized_keys
9. Change file permissions on both master and slave:
chmod 700 /home/mpiuser/.ssh
chmod 600 /home/mpiuser/.ssh/authorized_keys
10. Test ssh connection: ssh username@IP_Address
11. To ensure SSH does not ask for password, use ssh-agent to remember ssh password:
eval `ssh-agent`
12. Tell ssh-agent the password for the SSH key: ssh-add ~/.ssh/id_dsa
13. Test by logging in that it doesn't ask for password: ssh username@IP_Address
14. Configuring Open MPI: to let Open MPI know the cluster computers' information, write a
host_file. Example:
# The Hostfile for Open MPI
# The master node, 'slots=2' is used because it is a dual-processor machine.
127.0.0.1 slots=2
# The following slave nodes are single processor machines:
10.10.210.101
10.10.210.102
10.10.210.103

15. Compile MPI program: mpicc testprogram.c
16. To run program on two processes on local machine: mpirun -np 2 ./a.out
17. To run over 6 processes on cluster: mpirun -np 6 --hostfile host_file ./a.out
Conclusion:
Hence, we have implemented the MPI program for calculating a quantity called coverage
from data files.

Assignment No: A4
Title:
Write a program on an unloaded cluster for several different numbers of nodes and record the
time taken in each case. Draw a graph of execution time against the number of nodes.
Aim:
The aim of this assignment is to record the time taken to execute an MPI program for several
different numbers of nodes.
Prerequisites:
• Student should know the basics of MPI programming.
Objective:
• To write a program on an unloaded cluster for several different numbers of nodes and
record the time taken in each case. Draw a graph of execution time against the number
of nodes
Theory:
A cluster is a group of the same or similar elements gathered or occurring closely together.
A computer cluster consists of a set of loosely or tightly connected computers that work
together so that, in many respects, they can be viewed as a single system.
Unlike grid computers, computer clusters have each node set to perform the same task,
controlled and scheduled by software. The components of a cluster are usually connected to each
other through fast local area networks, with each node (computer used as a server) running its
own instance of an operating system. There are three types of server clusters, based on how
the cluster systems, called nodes, are connected to the devices that store the cluster configuration
and state data. This data must be stored in a way that allows each active node to obtain the
data even if one or more nodes are down.
MPI Routines:
MPI_Init: Initialize the MPI execution environment
Synopsis: int MPI_Init( int *argc, char ***argv )
Input Parameters:
argc Pointer to the number of arguments
argv Pointer to the argument vector
Thread and Signal Safety:
This routine must be called by one thread only. That thread is called the main thread and
must be the thread that calls MPI_Finalize.
MPI_Comm_size: Determines the size of the group associated with a communicator
Synopsis: int MPI_Comm_size( MPI_Comm comm, int *size )
Input Parameters:
comm communicator (handle)
Output Parameters:
size number of processes in the group of comm (integer)

MPI_Comm_rank: Determines the rank of the calling process in the communicator
Synopsis: int MPI_Comm_rank( MPI_Comm comm, int *rank )
Input Parameters:
comm communicator (handle)
Output Parameters:
rank rank of the calling process in the group of comm (integer)
MPI_Send: Performs a blocking send
Synopsis: int MPI_Send(const void *buf, int count, MPI_Datatype datatype, int dest, int
tag, MPI_Comm comm)
Input Parameters:
buf initial address of send buffer (choice)
count number of elements in send buffer (nonnegative integer)
datatype datatype of each send buffer element (handle)
dest rank of destination (integer)
tag message tag (integer)
comm communicator (handle)
MPI_Recv: Blocking receive for a message
Synopsis: int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag,
MPI_Comm comm, MPI_Status *status)
Output Parameters:
buf initial address of receive buffer (choice)
status status object (Status)
Input Parameters:
count maximum number of elements in receive buffer (integer)
datatype datatype of each receive buffer element (handle)
source rank of source (integer)
tag message tag (integer)
comm communicator (handle)
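MPI_Scatter(
void* send_data,
int send_count,
MPI_Datatype send_datatype,
void* recv_data,
int recv_count,
MPI_Datatype recv_datatype,
int root,
MPI_Comm communicator)
MPI_Gather(
void* send_data,
int send_count,
MPI_Datatype send_datatype,
void* recv_data,
int recv_count,
MPI_Datatype recv_datatype,
int root,
MPI_Comm communicator)
MPI_Bcast(void* data, int count, MPI_Datatype datatype, int root, MPI_Comm communicator)
MPI_Barrier(MPI_Comm communicator)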
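MPI_Wtime(): Returns the elapsed wall-clock time in seconds (as a double) on the calling
process. The difference between two calls gives the time taken by the code between them.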
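The measurement procedure (take a time stamp, run the work on a given number of workers, take a second time stamp, repeat for other worker counts) can be sketched as below. This is a single-machine stand-in, not an MPI program: threads play the role of nodes and time.perf_counter plays the role of MPI_Wtime. The workload (a sum of squares) and the names work, timed_run and measure are assumptions made for this sketch.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def work(chunk):
    """Hypothetical workload for one worker: sum of squares over its share."""
    return sum(i * i for i in chunk)


def timed_run(n, workers):
    """Split range(n) over `workers` workers and return (result, elapsed seconds)."""
    chunks = [range(w, n, workers) for w in range(workers)]  # cyclic distribution
    start = time.perf_counter()                # analogous to a first MPI_Wtime() call
    with ThreadPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(work, chunks))
    elapsed = time.perf_counter() - start      # analogous to a second MPI_Wtime() call
    return total, elapsed


def measure(n, worker_counts):
    """Record the result and time taken for each number of workers."""
    return {w: timed_run(n, w) for w in worker_counts}


for workers, (total, seconds) in measure(20000, [1, 2, 4]).items():
    print(workers, total, round(seconds, 6))
```

The recorded (workers, seconds) pairs are the points to plot for the graph of execution time against the number of nodes; the totals should be identical in every row.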
Conclusion:
Thus, we have studied the execution time on an unloaded cluster for several different
numbers of nodes.

Assignment No: A5
Title:
To build a small compute cluster using Raspberry Pi/BBB modules to implement
Booth's Multiplication Algorithm.
Aim:
The aim of this assignment is to build a small compute cluster using Raspberry Pi/BBB modules
to perform Booth's Multiplication Algorithm.
Prerequisites:
• BeagleBone kit
• Student should know basic concepts of Raspberry Pi/BBB.
• Student should know basic concepts of Booth's Multiplication Algorithm.
Objective:
1. To build a small compute cluster using Raspberry Pi/BBB modules to perform
Booth's Multiplication Algorithm.
Theory:
Booth's multiplication algorithm is a multiplication algorithm that multiplies two
signed binary numbers in two's complement notation. The algorithm was invented by
Andrew Donald Booth in 1950 while doing research on crystallography at Birkbeck College
in Bloomsbury, London.
Booth used desk calculators that were faster at shifting than adding and created the algorithm
to increase their speed. Booth's algorithm is of interest in the study of computer architecture.
Booth's algorithm examines adjacent pairs of bits of the N-bit multiplier Y in signed two's
complement representation, including an implicit bit below the least significant bit, y_{-1} = 0.
For each bit y_i, for i running from 0 to N-1, the bits y_i and y_{i-1} are considered. Where these
two bits are equal, the product accumulator P is left unchanged. Where y_i = 0 and y_{i-1} = 1, the
multiplicand times 2^i is added to P; and where y_i = 1 and y_{i-1} = 0, the multiplicand times 2^i
is subtracted from P. The final value of P is the signed product.
The multiplicand and product are not specified; typically, these are both also in two's
complement representation, like the multiplier, but any number system that supports addition
and subtraction will work as well. As stated here, the order of the steps is not determined.
Typically, it proceeds from LSB to MSB, starting at i = 0; the multiplication by 2^i is then
typically replaced by incremental shifting of the P accumulator to the right between steps;
low bits can be shifted out, and subsequent additions and subtractions can then be done just
on the highest N bits of P.[1]
There are many variations and optimizations on these details.

The algorithm is often described as converting strings of 1's in the multiplier to a high-order
+1 and a low-order –1 at the ends of the string. When a string runs through the MSB, there is
no high-order +1, and the net effect is interpretation as a negative of the appropriate value.
Booth's algorithm rests on the same idea as the decimal identity 99 * N = 100 * N - N, where
the right-hand side is easier to calculate.
In binary, multiplication by a power of two is simply a shift, and in hardware, shifts can be
essentially free (routing requires no gates) though variable shifts require either multiplexers
or multiple clock cycles.
Thus instead of multiplying n * 7 as (n * 4) + (n * 2) + (n * 1), which requires 2 additions,
Booth recoding allows us to implement it as (n * 8) - (n * 1), requiring one subtraction.
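The recoding rule can be checked with a few lines of Python. The sketch below turns each pair of adjacent multiplier bits into a digit +1, 0 or -1 (the digit at position i is the bit below position i minus the bit at position i, with an implicit 0 below the least significant bit), so that the multiplier equals the sum of digit * 2^i. The function names booth_recode and recoded_value are chosen here for illustration.

```python
def booth_recode(y, nbits):
    """Return the Booth digits of the nbits-bit two's complement value y, LSB first."""
    digits = []
    prev = 0                          # implicit bit below the least significant bit
    for i in range(nbits):
        bit = (y >> i) & 1            # bit i of y; works for negative y as well
        digits.append(prev - bit)     # equal bits -> 0, (0,1) -> +1, (1,0) -> -1
        prev = bit
    return digits


def recoded_value(digits):
    """Evaluate the recoded digits: sum of digit * 2^i."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


print(booth_recode(7, 4))  # [-1, 0, 0, 1], i.e. 7 = 8 - 1
```

Multiplying n by 7 with these digits needs one subtraction (at position 0) and one addition (at position 3), matching (n * 8) - (n * 1).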
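5.1 Booth's Multiplication Flowchart:
Figure 5.1: Booth's Multiplication Flowchart
5.2 Advantages:
Booth's Multiplication Algorithm (invented by A.D. Booth in 1950) takes advantage of
the fact that, in ordinary shift-and-add multiplication, the time taken depends on the number
of 1's in the multiplier: a multiplier bit of 1 requires adding the multiplicand to the partial
product, but a multiplier bit of 0 does not. By treating a run of 1's as a single block, Booth's
algorithm needs one subtraction and one addition per run instead of one addition per 1.

5.3 Disadvantages:
The number of additions and subtractions grows with the number of runs of 1's in the
multiplier. A multiplier with alternating bits (for example 0101) needs an addition or a
subtraction at every bit position, which is more work than plain shift-and-add.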
Building a Compute Cluster with the BeagleBone Black:
BeagleBone Black: Launched in 2008, the original BeagleBoard was developed
by Texas Instruments as an open source computer. It featured a 720 MHz Cortex-A8 ARM chip
and 256 MB of memory. The BeagleBoard-xM and BeagleBone were released in subsequent
years, leading to the BeagleBone Black as the most recent release. Though its $45 price tag is
a little higher than a Raspberry Pi, it has a faster 1 GHz Cortex-A8 chip, 512 MB of RAM and
extra USB connections. In addition to 2 GB of onboard storage that comes with a pre-installed
Linux distribution, there is a micro SD card slot allowing you to load additional
versions of Linux and boot to them instead.
Setting up the Cluster
The list of equipment is as follows:
1x 8 port gigabit switch
3x beaglebone blacks
3x ethernet cables
3x 5V 2 amp power supplies
The BeagleBone Black supports HDMI output, so the boards can be used as standalone computers.
The simplest way to get a BeagleBone Black running is to use the supplied USB cable to
hook it up to an existing computer and SSH to the pre-installed OS. Initial searches for
BeagleBone-compatible distributions reveal that there are a few places to download them. The
last step is to power the boards up and boot them from the micro SD cards. This requires
holding down the user boot button while connecting the 5V power connector. The button is
located on a little raised section near the USB port and tells the device to read from the SD
card. Once you see the lights flashing repeatedly you can release the button. Since each
instance will have the same default hostname when initially booting, it is advisable to power
them on one at a time and follow the steps below to set the IP and hostname before powering
up the next one.
Step 1) Follow the above steps to connect the devices for the cluster.
Step 2) Power on all the devices after making the connections.
Step 3) Reset the Wi-Fi router by pressing the reset button at the rear of the router for a few
seconds. Resetting the router is required so that its default IP address can be used.
The default IP is 192.168.1.1
Step 4) Open a browser and type 192.168.1.1. The router login screen opens and asks for the
username and password of the router (default is admin). Press OK.

After entering the username and password, the router home screen opens. Click the Status tab.
After clicking Status, click the Local Network tab.

On the Local Network screen there is a button named DHCP Client Table. Click it.
The DHCP client table is then displayed.
In this table, node1 and nodem are the BeagleBone Blacks and student-Vostro-3902 is the
PC connected in the cluster.
Note down the IP addresses of these three devices (two BeagleBone Blacks and one PC).
Step 5) Use the following link for the setup:
http://wallfloweropen.com/?project=beaglebone-black-cluster-demo-build

Follow all the steps given on the page at the above link carefully and exactly as written.
Repeat steps 2.1 through 2.6 for each BeagleBone Black.
Explanation for step 2.3 of the above link:
In our execution, the 1st BBB got the IP 192.168.1.104.
We log in to the 1st BBB using the following command in the terminal:
$ ssh root@192.168.1.104
Now modify the primary network interface eth0 in the file /etc/network/interfaces
using the command $ sudo nano /etc/network/interfaces
iface eth0 inet static
address 192.168.1.103
netmask 255.255.255.0
network 192.168.1.0
gateway 192.168.1.1
Modify the address as per the IP assigned to the BBB in your cluster.
Follow the remaining commands as given on the web page.
Instead of the Python program, we executed C programs with MPI.
Booth's Multiplication Algorithm:
Booth's multiplication algorithm is a multiplication algorithm that multiplies two
signed binary numbers in two's complement notation. The algorithm was invented by Andrew
Donald Booth in 1950.
Booth's algorithm can be implemented by repeatedly adding (with ordinary unsigned binary
addition) one of two predetermined values A and S to a product P, then performing a
rightward arithmetic shift on P. Let m and r be the multiplicand and multiplier, respectively;
and let x and y represent the number of bits in m and r.
1. Determine the values of A and S, and the initial value of P. All of these numbers
should have a length equal to (x + y + 1).
1. A: Fill the most significant (leftmost) bits with the value of m. Fill the
remaining (y + 1) bits with zeros.
2. S: Fill the most significant bits with the value of (−m) in two's complement
notation. Fill the remaining (y + 1) bits with zeros.
3. P: Fill the most significant x bits with zeros. To the right of this, append the
value of r. Fill the least significant (rightmost) bit with a zero.
2. Determine the two least significant (rightmost) bits of P.
1. If they are 01, find the value of P + A. Ignore any overflow.
2. If they are 10, find the value of P + S. Ignore any overflow.
3. If they are 00, do nothing. Use P directly in the next step.
4. If they are 11, do nothing. Use P directly in the next step.
3. Arithmetically shift the value obtained in the 2nd step by a single place to the right.
Let P now equal this new value.
4. Repeat steps 2 and 3 until they have been done y times.
5. Drop the least significant (rightmost) bit from P. This is the product of m and r.

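Example:
Find 3 × (−4), with m = 3 and r = −4, and x = 4 and y = 4:
• m = 0011, -m = 1101, r = 1100
• A = 0011 0000 0
• S = 1101 0000 0
• P = 0000 1100 0
• Perform the loop four times:
1. P = 0000 1100 0. The last two bits are 00.
   P = 0000 0110 0. Arithmetic right shift.
2. P = 0000 0110 0. The last two bits are 00.
   P = 0000 0011 0. Arithmetic right shift.
3. P = 0000 0011 0. The last two bits are 10.
   P = 1101 0011 0. P = P + S.
   P = 1110 1001 1. Arithmetic right shift.
4. P = 1110 1001 1. The last two bits are 11.
   P = 1111 0100 1. Arithmetic right shift.
• The product is 1111 0100, which is −12.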
The above mentioned technique is inadequate when the multiplicand is the most negative
number that can be represented (e.g. if the multiplicand has 4 bits then this value is −8). One
possible correction to this problem is to add one more bit to the left of A, S and P. This then
follows the implementation described above, with modifications in determining the bits of A
and S; e.g., the value of m, originally assigned to the first x bits of A, will be assigned to the
first x+1 bits of A. Below, we demonstrate the improved technique by multiplying −8 by 2
using 4 bits for the multiplicand and the multiplier:
• A = 1 1000 0000 0
• S = 0 1000 0000 0
• P = 0 0000 0010 0
• Perform the loop four times:
1. P = 0 0000 0010 0. The last two bits are 00.
   P = 0 0000 0001 0. Right shift.
2. P = 0 0000 0001 0. The last two bits are 10.
   P = 0 1000 0001 0. P = P + S.
   P = 0 0100 0000 1. Right shift.
3. P = 0 0100 0000 1. The last two bits are 01.
   P = 1 1100 0000 1. P = P + A.
   P = 1 1110 0000 0. Right shift.
4. P = 1 1110 0000 0. The last two bits are 00.
   P = 1 1111 0000 0. Right shift.
• The product is 11110000 (after discarding the first and the last bit) which is −16.
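The A, S, P procedure, including the extra leading bit used for the most negative multiplicand, can be sketched in Python as follows. The parameter names m, r, x and y follow the text; the function name booth_multiply and the use of Python integers as bit registers are choices made for this sketch, and it assumes m fits in x bits and r fits in y bits of two's complement.

```python
def booth_multiply(m, r, x, y):
    """Multiply an x-bit multiplicand m by a y-bit multiplier r (both signed)."""
    width = x + 1                          # one extra bit so that -m always fits
    total = width + y + 1                  # register length of A, S and P
    mask = (1 << total) - 1
    field = (1 << width) - 1

    a = (m & field) << (y + 1)             # A: m in the top bits, zeros below
    s = ((-m) & field) << (y + 1)          # S: -m in the top bits, zeros below
    p = (r & ((1 << y) - 1)) << 1          # P: zeros, then r, then one 0 bit

    for _ in range(y):
        last_two = p & 0b11
        if last_two == 0b01:
            p = (p + a) & mask             # add A, ignore overflow
        elif last_two == 0b10:
            p = (p + s) & mask             # add S, ignore overflow
        sign = p & (1 << (total - 1))
        p = (p >> 1) | sign                # arithmetic right shift by one place

    p >>= 1                                # drop the least significant bit
    bits = width + y
    if p & (1 << (bits - 1)):              # interpret as a signed value
        p -= 1 << bits
    return p


print(booth_multiply(3, -4, 4, 4))   # -12
print(booth_multiply(-8, 2, 4, 4))   # -16
```

With x = y = 4 the registers in the second call take exactly the values listed in the −8 by 2 example above (A = 1 1000 0000 0, S = 0 1000 0000 0, P = 0 0000 0010 0).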
Conclusion:
Thus, we have built a small compute cluster using Raspberry Pi/BBB modules and
performed Booth's Multiplication Algorithm.

Assignment No: B1
Title:
8-Queens Matrix is Stored using JSON/XML having first Queen placed, use back-tracking to
place remaining Queens to generate final 8-queen's Matrix using Python. Create a
backtracking scenario and use HPC architecture (Preferably BBB) for computation of next
placement of a queen.
Objective:-
1. To understand the concept of backtracking.
2. To understand the representation of the 8-queens matrix in JSON/XML.
3. To generate the 8-queens matrix using backtracking.
4. To develop problem-solving abilities using mathematical modelling.
5. To develop time- and space-efficient algorithms.
Pre-Requisites:-
Ubuntu OS, BBB Kit
Theory:
The 8-queens puzzle is the problem of placing eight chess queens on an 8×8 chessboard so
that no two queens threaten each other. Thus, a solution requires that no two queens share the
same row, column, or diagonal. The 8-queens puzzle is an example of the more general
n-queens problem of placing n queens on an n×n chessboard, where solutions exist for all
natural numbers n with the exception of n = 2 and n = 3.
The 8-queens puzzle has 92 distinct solutions. If solutions that differ only by symmetry
operations (rotations and reflections) of the board are counted as one, the puzzle has 12
fundamental solutions.
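The counts quoted above can be checked with a short Python sketch. The function names solve_all, symmetries and count_fundamental are assumptions made for this illustration. A solution is stored as a tuple in which entry c is the row of the queen in column c, and two solutions are treated as the same fundamental solution when one is a rotation or reflection of the other.

```python
def solve_all(n):
    """Return every n-queens solution; entry c of a solution is the row used in column c."""
    solutions = []

    def place(queens):
        col = len(queens)
        if col == n:
            solutions.append(tuple(queens))
            return
        for row in range(n):
            # A new queen is safe if it shares no row and no diagonal with earlier queens.
            if all(row != r and abs(row - r) != col - c for c, r in enumerate(queens)):
                queens.append(row)
                place(queens)
                queens.pop()            # backtrack

    place([])
    return solutions


def symmetries(solution):
    """Return the 8 boards obtained from a solution by rotations and reflections."""
    n = len(solution)
    points = list(enumerate(solution))                      # (column, row) pairs
    boards = []
    for _ in range(4):
        points = [(r, n - 1 - c) for c, r in points]        # rotate by 90 degrees
        mirrored = [(n - 1 - c, r) for c, r in points]      # reflect left to right
        for variant in (points, mirrored):
            boards.append(tuple(row for _col, row in sorted(variant)))
    return boards


def count_fundamental(n):
    """Count solutions up to symmetry by keeping one canonical board per orbit."""
    return len({min(symmetries(s)) for s in solve_all(n)})
```

With these definitions, len(solve_all(8)) gives the number of distinct solutions and count_fundamental(8) the number of fundamental ones.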
Description:
Backtracking:
Backtracking is a general algorithm for finding all (or some) solutions to some computational
problems, notably constraint satisfaction problems. It incrementally builds candidates to the
solutions, and abandons each partial candidate c (backtracks) as soon as it determines that c
cannot possibly be completed to a valid solution. The classic textbook example of the use of
backtracking is the eight queens puzzle, which asks for all arrangements of eight chess queens
on a standard chessboard so that no queen attacks any other. In the common backtracking
approach, the partial candidates are arrangements of k queens in the first k rows of the board,
all in different rows and columns. Any partial solution that contains two mutually attacking
queens can be abandoned, since it cannot possibly be completed to a valid solution.
Backtracking can be applied only for problems which admit the concept of a partial candidate
solution and a relatively quick test of whether it can possibly be completed to a valid
solution. It is useless, for example, for locating a given value in an unordered table. When it
is applicable, however, backtracking is often much faster than brute force enumeration of all
complete candidates, since it can eliminate a large number of candidates with a single test.

Solving 8 Queen Problem by backtracking:
The 8-queens problem is a special case of the more general n-queens problem. The basic
idea is to place n queens on an n-by-n board so that they do not attack each other. As we
can expect, the difficulty of solving the problem increases with n. We briefly introduce the
solution by backtracking. For example, Q1 attacks some positions, so Q2 has to comply with
these constraints and take a position that is not directly attacked by Q1. Placing Q3 is
harder, since we have to satisfy the constraints of both Q1 and Q2. Continuing in the same
way, we may reach a point where the constraints make the placement of the next queen
impossible. We then need to relax the constraints and find a new solution. To do this, we go
backwards and look for a new admissible placement.
To keep everything in order we follow a simple rule: last placed, first displaced. In other
words, if we successfully place a queen in the ith column but cannot find a position for the
(i+1)th queen, then going backwards we first try to find another admissible position for the
ith queen. This process is called backtracking. Let us discuss this with an example. For the
purpose of this handout we will find a solution of the 4-queens problem.
Advantages of Backtracking:
• It is a step-by-step representation of a solution to a given problem, which is
very easy to understand.
• It has a definite procedure.
• It is easy to first develop an algorithm, then convert it into a flowchart, and then
into a computer program.
• It is independent of the programming language.
• It is easy to debug, as every step has its own logical sequence.
Disadvantages of Backtracking:
• It is time consuming, since many candidates may have to be explored in the worst case.
Algorithm:
1. Place the queens column-wise, starting from the leftmost column.
2. If all queens are placed:
   a. Return true and print the solution matrix.
3. Else:
   a. Try all the rows in the current column.
   b. Check whether a queen can be placed safely in the current row. If yes, mark the
      current cell in the solution matrix as 1 and try to solve the rest of the problem
      recursively.
   c. If placing the queen in the above step leads to a solution, return true.
   d. If placing the queen in the above step does not lead to a solution, BACKTRACK:
      mark the current cell in the solution matrix as 0 and try the next row.
4. If all the rows are tried and nothing worked, return false. If this happens for the first
   column to be filled, print NO SOLUTION.
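The steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the JSON layout {"first_queen_row": ...} and the names is_safe, solve and solve_from_json are hypothetical choices for this example, since the manual does not fix an input format. The first queen is taken to sit in the leftmost column, and the remaining columns are filled by backtracking.

```python
import json


def is_safe(queens, row, col):
    """queens[c] is the row of the queen already placed in column c."""
    return all(r != row and abs(r - row) != abs(c - col) for c, r in enumerate(queens))


def solve(queens, n=8):
    """Extend the partial placement column by column; return True once n queens are placed."""
    col = len(queens)
    if col == n:
        return True
    for row in range(n):
        if is_safe(queens, row, col):
            queens.append(row)
            if solve(queens, n):
                return True
            queens.pop()                # backtrack and try the next row
    return False


def solve_from_json(text, n=8):
    """Read the row of the first queen (hypothetical JSON key) and return the 0/1 matrix."""
    first_row = json.loads(text)["first_queen_row"]
    queens = [first_row]
    if not solve(queens, n):
        return None                     # corresponds to NO SOLUTION
    return [[1 if queens[c] == r else 0 for c in range(n)] for r in range(n)]
```

For example, solve_from_json('{"first_queen_row": 0}') returns an 8×8 list of lists with a 1 in every cell that holds a queen; json.dumps can be used to store that matrix again.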
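Complexity of N-Queens:
A simple upper bound on the time complexity is O(n^n), since the recursive NQueen function
tries up to n rows in each of the n columns. A tighter bound is possible: because no two
queens may share a row, the number of complete placements examined is at most n!, so the
number of placements explored is O(n!). The safety check place() costs O(k) for the kth
queen, which adds at most a factor polynomial in n. In the best case a solution is found
after exploring only a small part of the search tree; in the worst case the whole pruned
tree is explored.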
BeagleBone:
BeagleBone Black Overview:
The BeagleBone Black is the latest addition to the BeagleBoard.org family and, like its
predecessors, is designed to address the Open Source Community, early adopters, and anyone
interested in a low-cost ARM Cortex-A8 based processor. It has been equipped with a
minimum set of features to allow the user to experience the power of the processor. It is not
intended as a full development platform, as many of the features and interfaces supplied by
the processor are not accessible from the BeagleBone Black through onboard support. It is
not a complete product designed to do any particular function. It is a
foundation for experimentation and learning how to program the processor and to access the
peripherals by the creation of your own software and hardware. It also offers access to many
of the interfaces and allows for the use of add-on boards called capes, to add many different
combinations of features. A user may also develop their own board or add their own circuitry.
Board Component Locations:
This section describes the key components on the board. It provides information on their
location and function. Familiarize yourself with the various components on the board.
1. Connectors, LEDs, and Switches
• DC Power is the main DC input that accepts 5V power.
• Power Button alerts the processor to initiate the power down sequence.
• 10/100 Ethernet is the connection to the LAN.
• Serial Debug is the serial debug port.
• USB Client is a mini USB connection to a PC that can also power the board.
• BOOT switch can be used to force a boot from the SD card.
• There are four blue LEDs that can be used by the user.
• Reset Button allows the user to reset the processor.
• uSD slot is where a uSD card can be installed.
• microHDMI connector is where the display is connected.
• USB Host can be connected to different USB devices such as Wi-Fi, Bluetooth,
Keyboard, etc.

Features Of BeagleBone Black:
Key Components
• Sitara AM3359AZCZ100 is the processor for the board.
• Micron 512MB DDR3L is the Double Data Rate RAM memory.
• TPS65217C PMIC provides the power rails to the various components on the board.
• SMSC Ethernet PHY is the physical interface to the network.
• Micron eMMC is an onboard MMC chip that holds up to 2GB of data.
• HDMI Framer provides control for an HDMI or DVI-D display with an adapter.
Connectivity:
1. Connect the small connector on the USB cable to the board as shown in Figure 1. The
connector is on the bottom side of the board.
2. Connect the large connector of the USB cable to your PC or laptop USB port.
3. The board will power on and the power LED will be on as shown in Figure 2 below.

4. Apply Power
The final step is to plug in the DC power supply to the DC power jack as shown in Figure 4
below.
5. Booting the Board
As soon as the power is applied to the board, it will start the booting up process. When the
board starts to boot the LEDs will come on in sequence as shown in Figure 5 below. It will
take a few seconds for the status LEDs to come on, so be patient. The LEDs will be flashing
in an erratic manner as it boots the Linux kernel.
Steps to run the program on the BeagleBone:
1. In the Ubuntu terminal, type the following commands to access the BeagleBone
terminal.
• sudo su
• minicom -s
• Select "Serial port setup"
• Press Enter
• Type A
• Change the serial device to ttyACM0
• Press Enter
• Type G
• Press Enter
• Save setup as dfl
• Exit
2. Now open the BeagleBone terminal and log in with username: debian, password: temppwd
3. Type vi progname.py
Press i to enter insert mode and type the code in the editor.
4. Save the code: press Esc, then type :wq and press Enter.
5. To run the program, type the following command in the BeagleBone terminal.
python progname.py
Conclusion:
Hence, we have successfully generated the 8-queens matrix using backtracking in Python on
the BeagleBone.

Assignment No: B2
Title:
Develop a stack sampling using threads using VTune Amplifier.
Aim:
To develop stack sampling using threads with VTune Amplifier.
Prerequisites:
• Ubuntu OS
Objective:
1. To understand the concept of VTune Amplifier.
Theory:
Intel VTune Amplifier is a commercial application for software performance analysis for 32
and 64-bit x86 based machines, and has both GUI and command line interfaces. It is
available for both Linux and Microsoft Windows operating systems. Although basic features
work on both Intel and AMD hardware, advanced hardware-based sampling requires an Intel-
manufactured CPU.
It is available as part of Intel Parallel Studio or as a stand-alone product.
Code Optimization
VTune Amplifier assists in various kinds of code profiling including stack sampling, thread
profiling and hardware event sampling. The profiler result consists of details such as time
spent in each subroutine, which can be drilled down to the instruction level. The time taken
by the instructions is indicative of any stalls in the pipeline during instruction execution.
The tool can be also used to analyze thread performance. The new GUI can filter data based
on a selection in the timeline.
Stack Sampling:
(Stack-based memory allocation):
Stacks in computing architectures are regions of memory where data is added or removed in a
last-in-first-out (LIFO) manner.
In most modern computer systems, each thread has a reserved region of memory referred to
as its stack. When a function executes, it may add some of its state data to the top of the
stack; when the function exits it is responsible for removing that data from the stack. At a
minimum, a thread's stack is used to store the location of function calls in order to allow
return statements to return to the correct location, but programmers may further choose to
explicitly use the stack. If a region of memory lies on the thread's stack, that memory is said
to have been allocated on the stack.
Because the data is added and removed in a last-in-first-out manner, stack-based memory
allocation is very simple and typically faster than heap-based memory allocation (also known
as dynamic memory allocation). Another feature is that memory on the stack is automatically,
and very efficiently, reclaimed when the function exits, which can be convenient for the
programmer if the data is no longer required. If, however, the data needs to be kept in some
form, then it must be copied from the stack before the function exits. Therefore, stack based
allocation is suitable for temporary data or data which is no longer required after the creating
function exits.
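Because every thread has its own stack, a sampling profiler can periodically record the call stack of each thread and count how often each stack is seen. The sketch below illustrates only this idea using the Python standard library; it is not VTune Amplifier and does not use any VTune API. The names busy_work, sample_stacks and run_demo are assumptions made for this example, and sys._current_frames() is a CPython-specific function.

```python
import collections
import sys
import threading
import time
import traceback


def busy_work(stop_event):
    """Hypothetical CPU-bound worker that gives the sampler something to observe."""
    total = 0
    while not stop_event.is_set():
        total += sum(i * i for i in range(1000))
    return total


def sample_stacks(duration=0.3, interval=0.01):
    """Periodically record the call stack of every thread except the sampler itself."""
    counts = collections.Counter()
    sampler_id = threading.get_ident()
    deadline = time.time() + duration
    while time.time() < deadline:
        for thread_id, frame in sys._current_frames().items():
            if thread_id == sampler_id:
                continue
            stack = tuple(entry.name for entry in traceback.extract_stack(frame))
            counts[stack] += 1          # one sample for this call stack
        time.sleep(interval)
    return counts


def run_demo():
    """Start one worker thread, sample it for a short time, then stop it."""
    stop = threading.Event()
    worker = threading.Thread(target=busy_work, args=(stop,), name="worker")
    worker.start()
    try:
        return sample_stacks()
    finally:
        stop.set()
        worker.join()
```

Calling run_demo().most_common(3) lists the call stacks that were sampled most often; stacks that appear in many samples correspond to the hotspots that a profiler would report.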
Pseudo Procedure:
Dedicated users of the previous generation's VTune™ Performance Analyzer remember that
the tool supported Java application profiling. Over time, this feature disappeared from the
radar, but since then customers have clamored for Java support in the current VTune
Amplifier XE. Profiling pure Java applications and more importantly mixed Java and native
C/C++ applications is becoming necessary again. In response to this request, Java profiling
has been added in the new Intel(R) VTune™ Amplifier XE 2013 in addition to the JITed
application profiling support.
Why does someone need Java application profiling? The main purpose of performance
profiling is identifying functions or code locations which take up most of the CPU's time, and
finding out how effectively they use this computing resource. Even though Java code
execution is handled by a Managed Runtime Environment, it can be as ineffective in terms of
data management as in programs written using native languages. For example, if you're
conscious about the performance of your data mining Java application, you need to take into
consideration your target platform memory architecture, cache hierarchy and latency of
access to memory levels. From the platform microarchitecture point of view, profiling of
Java applications is similar to profiling native applications but with one major difference:
since users want to see timing metrics against their program source code, the profiling tool
must be able to map performance metrics of the binary code either compiled or interpreted by
the JVM back to the original source code in Java or C/C++.
With VTune Amplifier XE Hotspot analysis you get a list of the hottest methods along with
their timing metrics and call stacks. Note that a workload distribution over threads is also
displayed in the time line view of results. Thread naming helps to identify where exactly the
most resource consuming code was executed.
Those who are pursuing maximum performance on a platform may apply some tricks like
writing and compiling performance critical modules of their Java project in native languages
like C or even assembly. This way of programming helps to employ powerful CPU resources
like vector computing (implemented through SIMD units and instruction sets). In this case, the
heavy calculating functions become hotspots in the profiling results, which is expected as
they do most of the job. However, you might be interested not only in hotspot functions, but
also in identifying the locations in Java code from which those functions were called through
a JNI interface. Tracing such cross-runtime calls in mixed-language algorithm implementations
could be a challenge.
In order to help analysis of mixed code profiling results, VTune Amplifier XE "stitches"
the Java call stack with the subsequent native call stack of C/C++ functions. The reverse call
stack stitching works as well.
The most advanced usage of the tool is profiling and optimizing Java applications for the
microarchitecture of the CPU utilized in your platform. Although this may sound paradoxical
because Java and JVM technology is intended to free a programmer from machine
architecture specific coding, once Java code is optimized for current Intel microarchitectures
it will most probably keep this advantage for future generations of CPUs. VTune Amplifier
XE provides a state of the art Hardware Event-based profiling technology, which monitors
hardware events in the CPU's pipeline and can identify code pitfalls that limit most effective
execution of instructions in the CPU. The hardware performance metrics are available and
can be displayed against the application's modules, functions, and Java code source lines.
Hardware Event-based sampling collection with stacks is also available; it is useful when
you need to find out a call path for a function called in a driver or middleware layer in your
system.

It's fairly easy to configure your performance analysis using either the VTune Amplifier GUI
or command line tool. One way is to embed your java command in a batch file or executable
script.
For example, in my run.cmd file I have the following command:
java.exe -Xcomp -Djava.library.path=mixed_dll\ia32 -cp
C:\Design\Java\mixed_stacks MixedStacksTest 3 2
I just need to put a path to the run.cmd file in the Application field of the Launch Application
configuration in the Project Configuration of my VTune Amplifier XE project. In addition I
select "Auto" as the managed code profiling mode and keep analysis of child processes enabled
with the corresponding switch. That's it. Now I can start an analysis.

Similarly, you can configure an analysis in the command line tool. For example, with
Hotspots analysis you can use the following command:
amplxe-cl -collect hotspots -- run.bat
or directly:
amplxe-cl -collect hotspots -- java.exe -Xcomp
-Djava.library.path=mixed_dll\ia32 -cp C:\Design\Java\mixed_stacks
MixedStacksTest 3 2
In case your Java application needs to run for some time or cannot be launched at the start of
this analysis, on Windows* you may attach the tool to the Java process. Change the Target
type selector to "Attach to Process" and add your process name or PID.

You may face some obstacles while profiling Java applications. A JVM does funny tricks
with binary code and in some cases details of exact correspondence between executed
instruction address and source line numbers may be distorted. As a result, we may observe a
slight slipping of timing results down to the next source code lines. If it's a loop, the time
metric may slip upward. You should keep this in mind and be attentive to unlikely results.
You should expect that a JVM will interpret some rarely called methods instead of compiling
them for the sake of performance. The tool marks such calls as "!Interpreter" in the restored
call stack; identifying the name of an interpreted call may be a feature in future product
updates. If you would like such functions to be displayed in stacks with their names, force the
JVM to compile them by using the "-Xcomp" option. However, the timing characteristics
may change noticeably if many small or rarely used functions are being called during
execution. Note that, due to inlining during the compilation stage, some functions might not
appear in the stack.

The following are some limitations:
• It's difficult to support all Java Runtime Environments (JRE) available in the market,
so at the moment the tool supports Oracle* Java 6 and 7.
• Java application profiling is supported for Hotspots analysis and Hardware Event-based
analysis (e.g. Lightweight Hotspots), but Concurrency analysis is limited as
some embedded Java synchronization primitives (which do not call operating system
synchronization objects) cannot be recognized by the tool. As a result, some of the
timing metrics may be distorted for Concurrency as well as for Locks & Waits
analysis.
• The tool cannot attach to a Java process on Linux. Attaching is supported on Windows at
the moment.
• There are no dedicated libraries supplying a user API for collection control in the Java
source code. However, you may want to try applying the native API by wrapping the
__itt calls with JNI calls.
The Java support feature is still developing in the product. The Java run-times and Java virtual
machines are also changing, with new JDK updates coming out every few months. So you
may face some problems in stack unwinding or in retrieving symbols from the JVM. In those
cases, ask the Intel support team for help.
Below is a detailed list of tricks you might want to consider when profiling Java or mixed
applications on different platforms.

Additional command line Oracle JDK Java VM options that change the behavior of the Java
VM:
• On Linux x86, use the client Oracle JDK Java VM instead of the server Java VM, i.e.
either explicitly specify "-client" or simply do not specify "-server" as an Oracle JDK
Java VM command line option.
• On Linux x64, try specifying the '-XX:-UseLoopCounter' command line option, which
switches off on-the-fly substitution of the interpreted method with the compiled
version.
• On Windows, try specifying '-Xcomp', which forces JIT compilation for better quality of
stack walking.
Note: when you force the JVM to compile initially interpreted functions, the timing of your
application may change, and for small and rarely called functions compilation would be less
performance effective than interpretation.
On Linux, try to change the stack unwinding mode to "After collection":
• Click the New Analysis button in the VTune Amplifier XE tool bar.
• Choose the 'Hotspots' analysis type and right-click.
• Select 'Copy from current' in the context menu.
• In the opened 'Custom Analysis' dialog, select 'After collection' in the 'Stack unwinding
mode' drop-down list and press the 'OK' button.
• Start collection using this new analysis type.
*Other names and brands may be declared as the property of others.
Conclusion:
Hence, we have successfully developed stack sampling using threads with VTune
Amplifier.

Assignment No: B3
Title:
Write a program to check task distribution using Gprof.l
Aim:
The aim of this assignment is to check task distribution using Gprof.l.
Prerequisites:
• Students should know basic Linux commands.
• Students should know Gprof.1
Objective:
To write a program to check task distribution using Gprof.l
Theory:
Profiling allows you to learn where your program spent its time and which functions called
which other functions while it was executing. This information can show you which pieces of
your program are slower than you expected, and might be candidates for rewriting to make
your program execute faster. It can also tell you which functions are being called more or less
often than you expected. This may help you spot bugs that had otherwise been unnoticed.
Since the profiler uses information collected during the actual execution of your program, it
can be used on programs that are too large or too complex to analyze by reading the source.
However, how your program is run will affect the information that shows up in the profile
data. If you don't use some feature of your program while it is being profiled, no profile
information will be generated for that feature.
"gprof" produces an execution profile of C, Pascal, or Fortran77 programs. The effect of
called routines is incorporated in the profile of each caller. The profile data is taken from the
call graph profile file (gmon.out default) which is created by programs that are compiled with
the -pg option of "cc", "pc", and "f77". The -pg option also links in versions of the library
routines that are compiled for profiling. "Gprof" reads the given object file (the default is
"a.out") and establishes the relation between its symbol table and the call graph profile from
gmon.out. If more than one profile file is specified, the "gprof" output shows the sum of the
profile information in the given profile files.
"Gprof" calculates the amount of time spent in each routine. Next, these times are propagated
along the edges of the call graph. Cycles are discovered, and calls into a cycle are made to
share the time of the cycle.
gprof
• gprof is the GNU Project PROFiler (gnu.org/software/binutils/).
• It requires recompilation of the code.
• Compiler options and libraries provide wrappers for each routine call, and periodic
sampling of the program.
• A default gmon.out file is produced with the function call information.
• gprof links the symbol list in the executable with the data in gmon.out.

GNU gprof time profiler:
1. Gives detailed time statistics for each subroutine.
2. Creates a relative graph for all subroutines.
3. Helps to analyze the program bottleneck.
4. Adds about 30% extra run-time cost.
Convert the produced profile data into a text file:
gprof ListOfOptions ExecuteFile StatFiles > OutputFile
$ gprof -b test2 gmon.out > output.txt
- ListOfOptions can be omitted.
- ExecuteFile can be omitted when the file name is a.out.
- StatFiles can be omitted when the file name is gmon.out.
List of Options
-b: brief output; omit the explanatory text that describes each table in OutputFile.
-e(E) SRName: exclude the subroutine SRName from the table (-E also excludes its elapsed
time).
-f(F) SRName: only display the subroutine SRName in the table (-F also restricts the elapsed
time to it).
-s: combine more than one StatFile into a single one with the default file name gmon.sum.
-z: also display the subroutines that are never used in the program.
Types of Profiles:
1. Flat Profile
• CPU time spent in each function (self and cumulative)
• Number of times a function is called
• Useful to identify the most expensive routines
2. Call Graph
• Number of times a function was called by other functions
• Number of times a function called other functions
• Useful to identify function relations
• Suggests places where function calls could be eliminated
3. Annotated Source
• Indicates the number of times a line was executed
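Example Program
Subroutine relative graph (as recorded in the call graph output below): main calls A and D;
A calls B1 and B2; B1 calls C1, C2 and C3; D calls B2.

$ gcc -pg test.c -o test
$ ./test
$ gprof -b test gmon.out > output
$ more output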
Output:
Each sample counts as 0.01 seconds.
  %    cumulative    self               self      total
 time    seconds    seconds    calls   s/call    s/call   name
71.90     30.17      30.17        1     30.17     30.17   C3
19.42     38.32       8.15        2      4.07      4.07   B2
 7.99     41.67       3.35        1      3.35      3.35   C2
 0.00     41.67       0.00        1      0.00     37.60   A
 0.00     41.67       0.00        1      0.00     33.52   B1
 0.00     41.67       0.00        1      0.00      0.00   C1
 0.00     41.67       0.00        1      0.00      4.07   D
% time: the percentage of the total program elapsed time taken by self seconds.
cumulative seconds: the running sum of self seconds down the table.
self seconds: total time spent in the subroutine itself over all calls, not including its
children's elapsed time; equal to (self s/call) × (calls).
calls: total number of times the subroutine was called by its parents.
self s/call: elapsed time per call, not including its children's elapsed time.
total s/call: elapsed time per call, including its children's elapsed time.
name: subroutine name.
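The relations between the columns can be checked with a few lines of Python. This is only an illustrative sketch: the name flat_profile is an assumption made for this example, and the rows are the three routines with non-zero self time taken from the output above.

```python
from itertools import accumulate

# (name, self seconds, calls) copied from the flat profile above.
rows = [("C3", 30.17, 1), ("B2", 8.15, 2), ("C2", 3.35, 1)]


def flat_profile(rows):
    """Return (name, cumulative seconds, self seconds per call) for each row."""
    running = accumulate(self_seconds for _name, self_seconds, _calls in rows)
    return [(name, round(total, 2), self_seconds / calls)
            for (name, self_seconds, calls), total in zip(rows, running)]
```

The cumulative seconds produced by flat_profile(rows) match the second column of the table, and the self seconds per call of B2 is 8.15 / 2 = 4.075, which the table shows as 4.07.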
Call Graph For Above Program:
index  %time    self  children  called   name
                                             <spontaneous>
[1]    100.0    0.00    41.67            main [1]
                0.00    37.60     1/1        A [2]
                0.00     4.07     1/1        D [6]
------------------------------------------------------------------
                0.00    37.60     1/1        main [1]
[2]     90.2    0.00    37.60     1      A [2]
                0.00    33.52     1/1        B1 [3]
                4.07     0.00     1/2        B2 [5]
------------------------------------------------------------------
                0.00    33.52     1/1        A [2]
[3]     80.4    0.00    33.52     1      B1 [3]
               30.17     0.00     1/1        C3 [4]
                3.35     0.00     1/1        C2 [7]
                0.00     0.00     1/1        C1 [8]
------------------------------------------------------------------
               30.17     0.00     1/1        B1 [3]
[4]     72.4   30.17     0.00     1      C3 [4]
------------------------------------------------------------------
                4.07     0.00     1/2        A [2]
                4.07     0.00     1/2        D [6]
[5]     19.6    8.15     0.00     2      B2 [5]
------------------------------------------------------------------
                0.00     4.07     1/1        main [1]
[6]      9.8    0.00     4.07     1      D [6]
                4.07     0.00     1/2        B2 [5]
------------------------------------------------------------------
                3.35     0.00     1/1        B1 [3]
[7]      8.0    3.35     0.00     1      C2 [7]
------------------------------------------------------------------
                0.00     0.00     1/1        B1 [3]
[8]      0.0    0.00     0.00     1      C1 [8]
------------------------------------------------------------------
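The way times are propagated along the edges of the call graph can be illustrated with a short Python sketch. It is not gprof itself: the names self_time, calls, total_calls and total_time are assumptions made for this example, the numbers are copied from the output above, and the sketch assumes a call graph without cycles. A callee's total time is shared among its callers in proportion to the number of calls each caller made.

```python
# Self time in seconds per function, taken from the flat profile above.
self_time = {"main": 0.0, "A": 0.0, "B1": 0.0, "B2": 8.15,
             "C1": 0.0, "C2": 3.35, "C3": 30.17, "D": 0.0}

# calls[caller][callee] = number of calls, taken from the call graph above.
calls = {
    "main": {"A": 1, "D": 1},
    "A": {"B1": 1, "B2": 1},
    "B1": {"C1": 1, "C2": 1, "C3": 1},
    "D": {"B2": 1},
}


def total_calls(callee):
    """Number of times callee was called, summed over all of its callers."""
    return sum(edges.get(callee, 0) for edges in calls.values())


def total_time(func):
    """Self time of func plus the share of each callee's total time caused by func."""
    total = self_time[func]
    for callee, count in calls.get(func, {}).items():
        total += total_time(callee) * count / total_calls(callee)
    return total
```

For example, B2 is called once from A and once from D, so each of them is charged half of the 8.15 seconds; total_time("A") is therefore 33.52 + 4.075 seconds, which the listing rounds to 37.60.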
Several forms of output are available from the analysis.
1. The flat profile shows how much time your program spent in each function, and how
many times that function was called. If you simply want to know which functions
burn most of the cycles, it is stated concisely here.
2. The call graph shows, for each function, which functions called it, which other
functions it called, and how many times. There is also an estimate of how much time
was spent in the subroutines of each function. This can suggest places where you
might try to eliminate function calls that use a lot of time.
3. The annotated source listing is a copy of the program's source code, labeled with the
number of times each line of the program was executed.
Profiling has several steps:
• You must compile and link your program with profiling enabled. See section
Compiling a Program for Profiling.
• You must execute your program to generate a profile data file. See section Executing
the Program.
• You must run gprof to analyze the profile data. See section gprof Command
Summary.
Gprof comes pre-installed with most Linux distributions, but if that's not the case
with your Linux distro, you can download and install it through a command line package
manager like apt-get or yum. For example, run the following command to download and
install gprof on Debian-based systems:
sudo apt-get install binutils

Compiling a program for profiling:
The first step in generating profile information for your program is to compile and link it with
profiling enabled.
To compile a source file for profiling, specify the `-pg' option when you run the compiler.
(This is in addition to the options you normally use.)
To link the program for profiling, if you use a compiler such as cc to do the linking, simply
specify `-pg' in addition to your usual options. The same option, `-pg', alters either
compilation or linking to do what is necessary for profiling. Here are examples:
cc -g -c myprog.c utils.c -pg
cc -o myprog myprog.o utils.o -pg
The `-pg' option also works with a command that both compiles and links:
cc -o myprog myprog.c utils.c -g -pg
Executing the Program:
Your program will write the profile data into a file called `gmon.out' just before exiting. If
there is already a file called `gmon.out', its contents are overwritten. There is currently no
way to tell the program to write the profile data under a different name, but you can rename
the file afterward if you are concerned that it may be overwritten.
Conclusion:
Thus, we have checked task distribution using Gprof.l

Assignment No: B4
AIM
Implement OBST Tree search using HPC task sub-division. Merge the results to get final
result.
Objective:-
1. To understand the concept of OBST.
2. To develop problem-solving abilities using mathematical modelling.
Pre-Requisites:-
Ubuntu OS
Theory :
An optimal binary search tree is a binary search tree for which the nodes are arranged on
levels such that the tree cost (the expected number of comparisons per search, weighted by
how often each key is accessed) is minimum. For the purpose of a better presentation of optimal
binary search trees, we will consider "extended binary search trees", which have the keys
stored at their internal nodes. Suppose n keys k1, k2, …, kn are stored at the internal
nodes of a binary search tree. It is assumed that the keys are given in sorted order, so that
k1 < k2 < … < kn. An extended binary search tree is obtained from the binary search tree by
adding successor nodes to each of its terminal nodes, as indicated in the following figure by
squares.

Advantage:
The major advantage of binary search trees over other data structures is that the
related sorting algorithms and search algorithms, such as in-order traversal, can be very
efficient; they are also easy to code.
Disadvantages:
• The shape of the binary search tree depends entirely on the order of insertions and
deletions, and can become degenerate.
• When inserting or searching for an element in a binary search tree, the key of each visited
node has to be compared with the key of the element to be inserted or found.
• The keys in the binary search tree may be long, and the run time may increase.
• After a long intermixed sequence of random insertions and deletions, the expected height of
the tree approaches the square root of the number of keys, √n, which grows much faster
than log n.
Analysis:
The dynamic programming construction of an optimal binary search tree has a time
complexity of O(n^3) and a space complexity of O(n^2). The time can be lowered to O(n^2)
by restricting the range of candidate roots examined for each subtree (Knuth's optimization).
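The O(n^3) construction can be sketched in Python as follows. This is a minimal sketch under stated assumptions: it uses the simplified model in which only the access frequencies of the n keys are given (no weights for unsuccessful searches), and the name optimal_bst_cost is chosen for this example. All entries cost[i][j] with the same length j - i + 1 are independent of each other, so each diagonal of the table could be divided among HPC workers and the results merged before the next diagonal is started.

```python
def optimal_bst_cost(freq):
    """Return the minimum total search cost for sorted keys with the given frequencies."""
    n = len(freq)
    if n == 0:
        return 0
    # prefix[i] = freq[0] + ... + freq[i-1], so any range sum costs O(1).
    prefix = [0] * (n + 1)
    for i, f in enumerate(freq):
        prefix[i + 1] = prefix[i] + f
    # cost[i][j] = minimum cost of a search tree built from keys i..j (inclusive).
    cost = [[0] * n for _ in range(n)]
    for i in range(n):
        cost[i][i] = freq[i]
    for length in range(2, n + 1):              # one diagonal of the table per length
        for i in range(n - length + 1):
            j = i + length - 1
            weight = prefix[j + 1] - prefix[i]  # every key in i..j moves one level deeper
            best = min(
                (cost[i][r - 1] if r > i else 0) + (cost[r + 1][j] if r < j else 0)
                for r in range(i, j + 1)        # try every key r as the root
            )
            cost[i][j] = best + weight
    return cost[0][n - 1]
```

For example, for three keys with frequencies 34, 8 and 50 the function returns 142, obtained by making the third key the root and the first key its left child.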
Conclusion:
Thus, we successfully implemented an OBST search using HPC task sub-division.

Assignment No: B5
AIM:
Perform concurrent Odd-Even Merge Sort using HPC infrastructure (preferably BBB) using
Python/Java/C++
Objective:-
1. To understand the concept of odd-even merge sort.
2. To develop problem-solving abilities using mathematical modelling.
Pre-Requisites:-
Ubuntu OS, BBB Kit
Theory:
Concept of Odd-Even Merge Sort:
The odd-even merge sort algorithm was developed by K. E. Batcher. It is based on a merge
algorithm that merges two sorted halves of a sequence into a completely sorted sequence.
In contrast to merge sort, this algorithm is not data-dependent, i.e. the same comparisons are
performed regardless of the actual data. Therefore, odd-even mergesort can be implemented
as a sorting network.
Algorithm:
Let X1 = k_1, k_2, …, k_m and X2 be the two sorted sequences, each of length m, to be merged.
Step 0. If m = 1, merge the sequences with one comparison.
Step 1. Partition X1 and X2 into their odd and even parts. That is, partition X1 into
X1odd = k_1, k_3, …, k_{m-1} and X1even = k_2, k_4, …, k_m. Similarly, partition X2 into
X2odd and X2even.
Step 2. Recursively merge X1odd with X2odd using m processors. Let
L1 = l_1, l_2, …, l_m be the result. Note that X1odd, X1even, X2odd, and X2even are in sorted
order. At the same time, merge X1even with X2even using the other m processors to get
L2 = l_{m+1}, l_{m+2}, …, l_{2m}.
Step 3. Shuffle L1 and L2; that is, form the sequence
L = l_1, l_{m+1}, l_2, l_{m+2}, …, l_m, l_{2m}. Compare every pair (l_{m+i}, l_{i+1}) and
interchange them if they are out of order. That is, compare l_{m+1} with l_2 and interchange
them if need be, compare l_{m+2} with l_3 and interchange them if need be, and so on. Output
the resultant sequence.

Example:
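The steps of the algorithm can be traced with a short sequential Python sketch. It is only an illustration under stated assumptions: both inputs are sorted and have the same power-of-two length, the two recursive merges run one after the other rather than on separate processors, and the function names are chosen here and are not part of the lab program.

```python
def odd_even_merge(x1, x2):
    """Merge two sorted lists of equal power-of-two length m (Steps 0-3)."""
    m = len(x1)
    if m != len(x2) or m == 0 or m & (m - 1):
        raise ValueError("inputs must have the same power-of-two length")
    # Step 0: a single comparison merges two one-element sequences
    if m == 1:
        return [min(x1[0], x2[0]), max(x1[0], x2[0])]
    # Step 1: odd part = k_1, k_3, ... (indices 0, 2, ...); even part = k_2, k_4, ...
    # Step 2: merge the odd parts and the even parts recursively
    l1 = odd_even_merge(x1[0::2], x2[0::2])
    l2 = odd_even_merge(x1[1::2], x2[1::2])
    # Step 3: shuffle into l_1, l_{m+1}, l_2, l_{m+2}, ..., l_m, l_{2m}
    merged = [None] * (2 * m)
    merged[0::2] = l1
    merged[1::2] = l2
    # compare each pair (l_{m+i}, l_{i+1}); they sit at indices 2i-1 and 2i
    for i in range(1, 2 * m - 1, 2):
        if merged[i] > merged[i + 1]:
            merged[i], merged[i + 1] = merged[i + 1], merged[i]
    return merged


def odd_even_merge_sort(seq):
    """Sort a list whose length is a power of two by recursive odd-even merging."""
    n = len(seq)
    if n <= 1:
        return list(seq)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    half = n // 2
    return odd_even_merge(odd_even_merge_sort(seq[:half]),
                          odd_even_merge_sort(seq[half:]))


print(odd_even_merge([2, 3, 6, 8], [1, 4, 5, 7]))  # [1, 2, 3, 4, 5, 6, 7, 8]
```

For X1 = 2, 3, 6, 8 and X2 = 1, 4, 5, 7 the odd parts merge to L1 = 1, 2, 5, 6 and the even parts to L2 = 3, 4, 7, 8. The shuffle gives 1, 3, 2, 4, 5, 7, 6, 8, and the final compare-exchange step swaps (3, 2) and (7, 6) to produce the sorted sequence.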
Analysis:
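Let T(n) be the number of comparisons performed by odd-even merge on a sequence of total
length n (n a power of 2). Then for n > 2 we have
T(n) = 2·T(n/2) + n/2 - 1.
With T(2) = 1 this gives
T(n) = (n/2)·(log2(n) - 1) + 1, which is in O(n·log(n)).
Sorting n keys by recursively applying this merge therefore needs O(n·log(n)^2) comparisons.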
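As a quick sanity check, the recurrence and the closed form can be compared numerically. The sketch below is only a check under the assumption that n is a power of two; the helper names are hypothetical.

```python
def merge_comparisons(n):
    """Evaluate the recurrence T(n) = 2*T(n/2) + n/2 - 1 with T(2) = 1."""
    if n == 2:
        return 1
    return 2 * merge_comparisons(n // 2) + n // 2 - 1


def closed_form(n):
    """Evaluate (n/2) * (log2(n) - 1) + 1 for n a power of two."""
    log_n = n.bit_length() - 1   # exact log2 for powers of two
    return (n // 2) * (log_n - 1) + 1


for n in (2, 4, 8, 16):
    print(n, merge_comparisons(n), closed_form(n))
```

For n = 2, 4, 8, 16 both expressions give 1, 3, 9, 25.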
Conclusion:
Thus, we successfully implemented Odd-Even Merge Sort using HPC infrastructure.

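Program:
// Odd-Even Merge Sort (MPI): each rank sorts its own block, then neighbouring
// ranks exchange and merge their blocks in alternating odd and even phases.
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

/* Merge the sorted arrays ina[0..lena) and inb[0..lenb) into out. */
int merge(double *ina, int lena, double *inb, int lenb, double *out) {
    int i, j;
    int outcount = 0;
    for (i = 0, j = 0; i < lena; i++) {
        /* check the bound before reading inb[j] */
        while (j < lenb && inb[j] < ina[i]) {
            out[outcount++] = inb[j++];
        }
        out[outcount++] = ina[i];
    }
    while (j < lenb)
        out[outcount++] = inb[j++];
    return 0;
}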

/* Recursively sort a[start..end), using b as scratch space. */
int domerge_sort(double *a, int start, int end, double *b) {
    if ((end - start) <= 1) return 0;
    int mid = (end + start) / 2;
    domerge_sort(a, start, mid, b);
    domerge_sort(a, mid, end, b);
    merge(&(a[start]), mid - start, &(a[mid]), end - mid, &(b[start]));
    int i;
    for (i = start; i < end; i++)
        a[i] = b[i];
    return 0;
}

/* Sort the n elements of a in place. */
int merge_sort(int n, double *a) {
    if (n <= 1) return 0;   /* nothing to sort; avoids a zero-length array */
    double b[n];
    domerge_sort(a, 0, n, b);
    return 0;
}

/* Print the n values held by this rank (n must be at least 1). */
void printstat(int rank, int iter, char *txt, double *la, int n) {
    printf("[%d] %s iter %d: <", rank, txt, iter);
    int j;
    for (j = 0; j < n - 1; j++)
        printf("%6.3lf,", la[j]);
    printf("%6.3lf>\n", la[n - 1]);
}

void MPI_Pairwise_Exchange(int localn, double *locala, int sendrank, int recvrank,
                           MPI_Comm comm) {
    /*
     * the sending rank just sends the data and waits for the results;
     * the receiving rank receives it, sorts the combined data, and returns
     * the correct half of the data.
     */
    int rank;
    double remote[localn];
    double all[2 * localn];
    const int mergetag = 1;
    const int sortedtag = 2;
    MPI_Comm_rank(comm, &rank);
    if (rank == sendrank) {
        /* use the communicator passed in, not MPI_COMM_WORLD */
        MPI_Send(locala, localn, MPI_DOUBLE, recvrank, mergetag, comm);
        MPI_Recv(locala, localn, MPI_DOUBLE, recvrank, sortedtag, comm,
                 MPI_STATUS_IGNORE);
    } else {
        MPI_Recv(remote, localn, MPI_DOUBLE, sendrank, mergetag, comm,
                 MPI_STATUS_IGNORE);
        merge(locala, localn, remote, localn, all);
        /* the lower rank keeps the smaller half, the higher rank the larger */
        int theirstart = 0, mystart = localn;
        if (sendrank > rank) {
            theirstart = localn;
            mystart = 0;
        }
        MPI_Send(&(all[theirstart]), localn, MPI_DOUBLE, sendrank, sortedtag,
                 comm);
        int i;
        for (i = mystart; i < mystart + localn; i++)
            locala[i - mystart] = all[i];
    }
}

int MPI_OddEven_Sort(int n, double *a, int root, MPI_Comm comm)
{
    int rank, size, i;
    double *local_a;
    // get rank and size of comm
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    // every rank must receive the same number of elements
    if (n % size != 0) {
        if (rank == root)
            fprintf(stderr, "n must be a multiple of the number of processes\n");
        return MPI_ERR_COUNT;
    }
    local_a = (double *) calloc(n / size, sizeof(double));
    // scatter the array a to local_a
    MPI_Scatter(a, n / size, MPI_DOUBLE, local_a, n / size, MPI_DOUBLE,
                root, comm);
    // sort local_a
    merge_sort(n / size, local_a);
    // odd-even part: size phases are enough to sort the blocks
    for (i = 1; i <= size; i++) {
        printstat(rank, i, "before", local_a, n / size);
        if ((i + rank) % 2 == 0) { // i and rank have the same parity
            if (rank < size - 1) {
                MPI_Pairwise_Exchange(n / size, local_a, rank, rank + 1, comm);
            }
        } else if (rank > 0) {
            MPI_Pairwise_Exchange(n / size, local_a, rank - 1, rank, comm);
        }
    }
    printstat(rank, i - 1, "after", local_a, n / size);
    // gather local_a to a
    MPI_Gather(local_a, n / size, MPI_DOUBLE, a, n / size, MPI_DOUBLE,
               root, comm);
    if (rank == root)
        printstat(rank, i - 1, " all done ", a, n);
    free(local_a);
    return MPI_SUCCESS;
}
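
int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int n = argc - 1;   /* the values to sort are given on the command line */
    if (n < 1) {
        fprintf(stderr, "usage: %s v1 v2 ... vn\n", argv[0]);
        MPI_Finalize();
        return 1;
    }
    double a[n];
    int i;
    for (i = 0; i < n; i++)
        a[i] = atof(argv[i + 1]);
    MPI_OddEven_Sort(n, a, 0, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}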

Assignment No: C1
Aim:
Write a program using HTML5 programming techniques to compile a text PDF file integrating LaTeX.
Objective:
• To learn the programming techniques in HTML5
Theory:
Introduction:
The Web Hypertext Application Technology Working Group (WHATWG) began work on
the new standard in 2004. At that time, HTML 4.01 had not been updated since 2000, and
the World Wide Web Consortium (W3C) was focusing future developments on XHTML 2.0.
In 2009, the W3C allowed the XHTML 2.0 Working Group's charter to expire and decided
not to renew it. W3C and WHATWG are currently working together on the development of
HTML5.
While some features of HTML5 are often compared to Adobe Flash, the two technologies are
very different. Both include features for playing audio and video within web pages, and for
using Scalable Vector Graphics. However, HTML5 on its own cannot be used for animation
or interactivity – it must be supplemented with CSS3 or JavaScript. There are many Flash
capabilities that have no direct counterpart in HTML5. See Comparison of HTML5 and
Flash.
Although HTML5 has been well known among web developers for years, its interactive
capabilities became a topic of mainstream media around April 2010[11][12][13][14] after Apple
Inc's then-CEO Steve Jobs issued a public letter titled "Thoughts on Flash" where he
concluded that "Flash is no longer necessary to watch video or consume any kind of web
content" and that "new open standards created in the mobile era, such as HTML5, will
win".[15] This sparked a debate in web development circles suggesting that, while HTML5
provides enhanced functionality, developers must consider the varying browser support of the
different parts of the standard as well as other functionality differences between HTML5 and
Flash.[16] In early November 2011, Adobe announced that it would discontinue development
of Flash for mobile devices and reorient its efforts in developing tools using HTML5.
Features and APIs:
The W3C proposed a greater reliance on modularity as a key part of the plan to make
faster progress, meaning identifying specific features, either proposed or already existing in
the spec, and advancing them as separate specifications. Some technologies that were
originally defined in HTML5 itself are now defined in separate specifications:
• HTML Working Group – HTML Canvas 2D Context
• Web Apps WG – Web Messaging, Web Workers, Web Storage, WebSocket API,
Server-Sent Events
• IETF HyBi WG – WebSocket Protocol
• WebRTC WG – WebRTC
• W3C Web Media Text Tracks CG – WebVTT
After the standardization of the HTML5 specification in October 2014, the core vocabulary
and features are being extended in four ways. Likewise, some features that were removed
from the original HTML5 specification have been standardized separately as modules, such
as Microdata and Canvas. Technical specifications introduced as HTML5 extensions such as
Polyglot Markup have also been standardized as modules. Some W3C specifications that
were originally separate specifications have been adapted as HTML5 extensions or features,
such as SVG. Some features that might have slowed down the standardization of HTML5
will be standardized as upcoming specifications, instead. HTML 5.1 was published as a
W3C Recommendation in November 2016.
Logical Structure:
PDF's logical structure features provide a mechanism for incorporating structural
information about a document's content into a PDF file. Such information might include, for
example, the organization of the document into chapters, headings, paragraphs and sections
or the identification of special elements such as figures, tables, and footnotes. The logical
structure features are extensible, allowing applications that produce PDF files to choose what
structural information to include and how to represent it, while enabling PDF consumers to
navigate a file without knowing the producer's structural conventions.
PDF logical structure shares basic features with standard document markup languages such as
HTML, SGML, and XML. A document's logical structure is expressed as a hierarchy of
structure elements, each represented by a dictionary object. Like their counterparts in other
markup languages, PDF structure elements can have content and attributes. In PDF, rendered
document content takes over the role occupied by text in HTML, SGML, and XML.
A PDF document's logical structure is stored separately from its visible content, with pointers
from each to the other. This separation allows the ordering and nesting of logical elements to
be entirely independent of the order and location of graphics objects on the document's pages.
The logical structure of a document is described by a hierarchy of objects called the structure
hierarchy or structure tree. At the root of the hierarchy is a dictionary object called the
structure tree root, located by means of the StructTreeRoot entry in the document catalog. See
Section 14.7.2, ("Structure Hierarchy") in PDF 1.7 (ISO 32000-1): Table 322 shows the
entries in the structure tree root dictionary. The K entry specifies the immediate children of
the structure tree root, which are structure elements.
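The hierarchy described above can be pictured with a small Python sketch. Plain Python dicts stand in for PDF dictionary objects here; this does not parse a real PDF file, and the sample tree and the outline function are made up for illustration. The keys Type, S (structure type) and K (children) follow the entry names used in the PDF specification, and an integer child stands for a reference to marked page content.

```python
def outline(node, depth=0):
    """Return the structure types found under `node`, indented by nesting depth."""
    lines = []
    if node.get("Type") == "StructElem":
        lines.append("  " * depth + node["S"])
        depth += 1
    for kid in node.get("K", []):
        if isinstance(kid, dict):        # nested structure element
            lines.extend(outline(kid, depth))
        # integer kids point at page content and carry no structure of their own
    return lines


# Hypothetical structure tree: a document with one heading and one paragraph.
struct_tree_root = {
    "Type": "StructTreeRoot",
    "K": [
        {"Type": "StructElem", "S": "Document", "K": [
            {"Type": "StructElem", "S": "H1", "K": [0]},
            {"Type": "StructElem", "S": "P", "K": [1]},
        ]},
    ],
}

print("\n".join(outline(struct_tree_root)))
```

The walk visits only the structure elements, which shows how the logical order is kept apart from the page content that the integer entries point to.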
LaTeX:
LaTeX uses a markup language in order to describe document structure and
presentation. LaTeX converts your source text, combined with the markup, into a high quality
document. For the purpose of analogy, web pages work in a similar way: the HTML is used
to describe the document, but it is your browser that presents it in its full glory - with
different colours, fonts, sizes, etc.
The input for LaTeX is a plain text file. You can create it with any text editor. It contains the
text of the document, as well as the commands that tell LaTeX how to typeset the text.
A minimal example looks something like the following (the commands will be explained
later):
\documentclass{article}

\begin{document}
Hello world!
\end{document}
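One possible way to turn such a source file into a PDF from a program is sketched below in Python. It assumes a TeX distribution that provides the pdflatex command is installed; the helper names build_document and compile_pdf and the default job name are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_document(body, document_class="article"):
    """Return LaTeX source that wraps `body` in a minimal document."""
    return (
        f"\\documentclass{{{document_class}}}\n"
        "\n"
        "\\begin{document}\n"
        f"{body}\n"
        "\\end{document}\n"
    )


def compile_pdf(source, workdir, jobname="hello"):
    """Write `source` to workdir/jobname.tex and run pdflatex on it.

    Returns the path of the PDF, or None when pdflatex is not installed.
    """
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    tex_path = workdir / f"{jobname}.tex"
    tex_path.write_text(source, encoding="utf-8")
    if shutil.which("pdflatex") is None:
        return None
    subprocess.run(
        ["pdflatex", "-interaction=nonstopmode", tex_path.name],
        cwd=workdir, check=True, stdout=subprocess.DEVNULL,
    )
    return workdir / f"{jobname}.pdf"
```

Calling compile_pdf(build_document("Hello world!"), "out") would write out/hello.tex and, where pdflatex is available, produce out/hello.pdf.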
Spaces
The LaTeX compiler normalises whitespace so that whitespace characters, such as [space] or
[tab], are treated uniformly as "space": several consecutive "spaces" are treated as one,
"space" opening a line is generally ignored, and a single line break also yields "space". A
double line break (an empty line), however, defines the end of a paragraph; multiple empty
lines are also treated as the end of a paragraph. An example of applying these rules is
presented below: the user's input (.tex) is shown first, followed by the rendered output
(.dvi/.pdf/.ps).
Input:
It does not matter whether you
enter one or several     spaces
after a word.

An empty line starts a new
paragraph.

Output:
It does not matter whether you enter one or several spaces after a word.
An empty line starts a new paragraph.
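The whitespace rules can be imitated in a few lines of Python. This is only a sketch of the rule as stated above, not of how the LaTeX compiler is implemented, and the function name latex_paragraphs is hypothetical.

```python
import re


def latex_paragraphs(source):
    """Split text into paragraphs the way the whitespace rules describe:
    one or more empty lines end a paragraph, and every other run of
    spaces, tabs and single line breaks collapses to one space."""
    blocks = re.split(r"\n\s*\n", source.strip())
    return [" ".join(block.split()) for block in blocks if block.strip()]


text = (
    "It does not matter whether you\n"
    "enter one or several     spaces\n"
    "after a word.\n"
    "\n"
    "An empty line starts a new\n"
    "paragraph."
)
for paragraph in latex_paragraphs(text):
    print(paragraph)
```

The two printed lines match the rendered output of the example: one line per paragraph, with single spaces between words.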
Reserved Characters
The following symbols are reserved characters that either have a special meaning under
LaTeX or are unavailable in all the fonts. If you enter them directly in your text, they will
normally not print but rather make LaTeX do things you did not intend.
# $ % ^ & _ { } ~ \
As you will see, these characters can be used in your documents all the same by adding a
prefix backslash:
\# \$ \% \^{} \& \_ \{ \} \~{} \textbackslash{}
The backslash character \ cannot be entered by adding another backslash in front of it (\\);
this sequence is used for line breaking. For introducing a backslash in math mode, you can
use \backslash instead.
The commands \~ and \^ produce respectively a tilde and a hat which is placed over the next
letter. For example \~n gives ñ. That's why you need braces to specify there is no letter as
argument. You can also use \textasciitilde and \textasciicircum to enter these characters, or
other commands.
If you want to insert text that might contain several particular symbols (such as URIs), you
can consider using the \verb command, which will be discussed later in the section on
formatting. For source code, see Source Code Listings.
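When the document text is generated by a program, the escaping rules can be applied automatically. The Python sketch below uses the backslash-prefixed forms described in this section; the function name escape_latex is hypothetical.

```python
# Replacement for each reserved character, as listed in the text.
LATEX_ESCAPES = {
    "#": r"\#",
    "$": r"\$",
    "%": r"\%",
    "&": r"\&",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "^": r"\^{}",
    "~": r"\~{}",
    "\\": r"\textbackslash{}",
}


def escape_latex(text):
    """Escape LaTeX reserved characters in plain text, one character at a
    time so that the backslashes added here are not escaped again."""
    return "".join(LATEX_ESCAPES.get(ch, ch) for ch in text)


print(escape_latex("50% of $x_1 & y"))  # 50\% of \$x\_1 \& y
```

Working character by character avoids the double-escaping that chained string replacements would cause, for example escaping the backslash that was just inserted in front of %.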

The 'less than' (<) and 'greater than' (>) characters are the only visible ASCII characters (not
reserved) that will not print correctly. See Special Characters for an explanation and a
workaround.
Non-ASCII characters (e.g. accents, diacritics) can be typed in directly for most cases.
However you must configure the document appropriately. The other symbols and many more
can be printed with special commands as in mathematical formulae or as accents. We will
tackle this issue in Special Characters.
Conclusion:
Thus, we have studied the programming techniques used to compile a text PDF file integrating LaTeX.
