The document discusses exploiting multi-core architectures to speed up the processing of satellite data. It describes a multithreaded application, built on a master-slave model, that parallelizes preprocessing steps such as extracting auxiliary data, Reed-Solomon decoding, and decompression. Performance analysis shows that the application scales with available system resources and achieves its maximum speedup at seven threads per core. While the theoretical speedup is bounded by Amdahl's law, accounting for parallelization overhead gives a more realistic performance measure.
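The master-slave model described above can be sketched as a pool of worker threads to which a master hands data frames. This is a minimal illustration, not the paper's implementation: the three stage functions are placeholders for the real auxiliary-data extraction, Reed-Solomon decoding, and decompression.

```python
from concurrent.futures import ThreadPoolExecutor
import os

def preprocess(frame):
    # Placeholder stages standing in for the real preprocessing chain:
    frame = frame.strip()    # "extract auxiliary data"
    frame = frame[::-1]      # stand-in for Reed-Solomon decoding
    return frame.upper()     # stand-in for decompression

def run_master_slave(frames, threads_per_core=7):
    # The paper's best speedup came at roughly 7 threads per core;
    # the master submits frames and collects results in input order.
    workers = threads_per_core * (os.cpu_count() or 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(preprocess, frames))

print(run_master_slave(["  frame-a ", " frame-b  "]))
```

Because `pool.map` preserves input order, the master needs no extra bookkeeping to reassemble results.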
Effective Sparse Matrix Representation for the GPU Architectures - IJCSEA Journal
General-purpose computation on the graphics processing unit (GPU) is prominent in today's high-performance computing era. Porting data-parallel applications onto the GPU yields a baseline performance improvement simply from the larger number of computational units; better performance is possible when the application is tuned to the architecture at hand. One widely used, computation-intensive kernel in sparse-matrix-based applications is sparse matrix-vector multiplication (SpMV). Most existing sparse matrix data formats were developed with the central processing unit (CPU) or multi-core CPUs in mind. This paper proposes a new sparse matrix format designed for the graphics processor architecture that, for the class of applications that fit the format, gives a 2x to 5x performance improvement over CSR (compressed sparse row format), 2x to 54x over COO (coordinate format), and 3x to 10x over the CSR vector kernel. It also yields 10% to 133% improvements in CPU-GPU memory transfer of the sparse matrix's access information. The paper details the new format and its requirements, with complete experimental results and comparisons.
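For reference, the CSR baseline that the proposed format is measured against stores only non-zeros plus per-row offsets. A minimal CPU-side sketch of CSR construction and SpMV (no GPU code, names are illustrative):

```python
# Build CSR (compressed sparse row) arrays from a dense matrix:
# `values` holds non-zeros, `col_idx` their columns, and `row_ptr[r]`
# marks where row r's entries start inside `values`.
def dense_to_csr(dense):
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

def csr_spmv(values, col_idx, row_ptr, x):
    # y[r] = sum of A[r][j] * x[j] over row r's stored non-zeros.
    return [sum(values[k] * x[col_idx[k]]
                for k in range(row_ptr[r], row_ptr[r + 1]))
            for r in range(len(row_ptr) - 1)]

A = [[1, 0, 2],
     [0, 0, 3],
     [4, 5, 0]]
vals, cols, ptr = dense_to_csr(A)
print(csr_spmv(vals, cols, ptr, [1, 1, 1]))  # row sums: [3, 3, 9]
```

COO, by contrast, stores an explicit (row, col, value) triple per non-zero, which is why its access-information transfer is larger.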
Fault Tolerance in Big Data Processing Using Heartbeat Messages and Data Repl... - IJSRD
Big data is a popular term for the exponential growth and availability of data, both structured and unstructured. Rapidly growing demands on big data processing impose a heavy burden on computation, communication, and storage in geographically distributed data centers, so it is necessary to minimize the cost of big data processing, which also includes the cost of fault tolerance. Big data processing involves two types of faults: node failure and data loss. Both can be recovered from using heartbeat messages, which act as acknowledgement messages between two servers. This paper presents a study of node failure and recovery, data replication, and heartbeat messages.
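The heartbeat idea can be sketched as a monitor that records when each server last reported in and flags any node silent longer than a timeout; that node's data would then be restored from a replica. This is a toy sketch with illustrative names, not the paper's protocol:

```python
class HeartbeatMonitor:
    """Flags a node as failed when its last heartbeat is older than `timeout`."""

    def __init__(self, timeout=3.0):
        self.timeout = timeout
        self.last_seen = {}   # node name -> timestamp of last heartbeat

    def heartbeat(self, node, now):
        # A heartbeat doubles as an acknowledgement that the node is alive.
        self.last_seen[node] = now

    def failed_nodes(self, now):
        # Nodes silent for longer than the timeout are presumed failed.
        return [n for n, t in self.last_seen.items()
                if now - t > self.timeout]

mon = HeartbeatMonitor(timeout=3.0)
mon.heartbeat("server-a", now=0.0)
mon.heartbeat("server-b", now=0.0)
mon.heartbeat("server-a", now=4.0)   # only server-a keeps reporting
print(mon.failed_nodes(now=5.0))     # → ['server-b']
```

Passing timestamps explicitly keeps the sketch deterministic; a real monitor would use the system clock and trigger replication on failure.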
Here are some questions and their solutions. These questions belong to the CSE department of Mawlana Bhashani Science and Technology University (MBSTU), Tangail, Bangladesh.
MULTI-CORE PROCESSORS: CONCEPTS AND IMPLEMENTATIONS - ijcsit
This research paper compares two multi-core processor machines, the Intel Core i7-4960X processor (Ivy Bridge E) and the AMD Phenom II X6. It starts by introducing a single-core processor machine to motivate the need for multi-core processors, then explains the multi-core processor machine and the issues that arise in implementing them. It also provides real-life example machines such as the TILEPro64 and the Epiphany-IV 64-core 28nm microprocessor (E64G401). The methodology used to compare the Intel Core i7 and AMD Phenom II processors begins by explaining how processor performance is measured, then lists the technical specifications most relevant to the comparison. The comparison is then run using different metrics: power, the use of Hyper-Threading technology, operating frequency, AES encryption and decryption, and the characteristics of the cache memory, such as its size, classification, and memory controller. Finally, it reaches a rough verdict on which of the two has the better overall performance.
International Journal of Computational Engineering Research (IJCER) is an international, monthly, English-language online journal. The journal publishes original research work that contributes significantly to scientific knowledge in engineering and technology.
Hadoop is an open-source implementation of MapReduce. Hadoop stores large volumes of data in the Hadoop Distributed File System (in a distributed manner), and that data is processed in parallel by the MapReduce model. MapReduce is a scalable and efficient programming model for large-scale data-intensive applications. In a homogeneous environment it keeps data transmission overhead low, because Hadoop schedules by data location and all nodes in the cluster have comparable workloads and computing speeds. Unfortunately, it is difficult to exploit data locality in heterogeneous or shared environments. The performance of MapReduce is improved using a data prefetching mechanism that fetches data from a slower remote node to a faster node, reducing job execution time. This mechanism hides the overhead of data transmission behind data processing when data is not local, and the prefetching design reduces data transmission overhead effectively.
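The core of the prefetching mechanism is overlap: while block i is being processed, block i+1 is already being fetched from the remote node. A minimal sketch of that pipeline, with `fetch` and `process` as stand-ins for a remote HDFS read and a map task:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(block_id):
    # Stand-in for reading a non-local block from a slower remote node.
    return f"data-{block_id}"

def process(data):
    # Stand-in for the map-side computation on a fetched block.
    return data.upper()

def run_with_prefetch(block_ids):
    results = []
    with ThreadPoolExecutor(max_workers=1) as prefetcher:
        pending = prefetcher.submit(fetch, block_ids[0])
        for nxt in block_ids[1:] + [None]:
            data = pending.result()           # wait for the current block
            if nxt is not None:
                pending = prefetcher.submit(fetch, nxt)  # start next fetch
            results.append(process(data))     # compute while fetch runs
    return results

print(run_with_prefetch([1, 2, 3]))
```

The next fetch is submitted before processing begins, so transfer time is hidden behind computation whenever processing a block takes at least as long as fetching one.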
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Authoring Your OWN creative, electronic book for mathematics: the MC-squared ... - Christian Bokhove
Workshop at the International Conference on Mathematics Textbook Research and Development, 29-31 July 2014, Southampton, UK.
The EU-funded ‘MC-squared’ project is working with a number of European communities to develop digital, interactive, creative mathematics ‘textbooks’ that the project calls ‘cBooks’. The cBooks are authored in a Digital Mathematics Environment in which participants can construct books with various interactive ‘widgets’. This paper outlines the MC-squared project, illustrating an interactive storyboard of the Digital Mathematics Environment architecture, including examples of how cBook designers can author interactive ‘widgets’. The workshop that relates to this paper is augmented, of course, by suitable ‘hands-on’ materials aimed at two possible cBooks: one focusing on aspects of geometric and spatial thinking using building blocks, the other on aspects of number and fractions.
Multi-core processor and Multi-channel memory architecture - Umair Amjad
Content of presentation:
Multi-core processors
Multi-channel memory architecture
Comparison between single and multi channel memory
Conclusion
References
EMPLOYING MULTI CORE ARCHITECTURE TO OPTIMIZE ON PERFORMANCE, FOR APPROACH IN... - IAEME Publication
Cloud detection is an important task in meteorological applications. Cloud information is especially important for now-casting purposes [1] and as an input to various satellite-based estimations of atmospheric and surface parameters [2-4]. The sun is the principal source of energy in the solar system. Clouds have high reflectance and absorption, properties used to distinguish them from land, water, or sea areas. There is a critical demand for applications that can detect the presence of cloud from available satellite image data, so that predictions of radiated solar energy can be optimised and the energy budget predicted more easily.
This is a PPT on the life and works of John Von Neumann. This PPT contains:
1. John Von Neumann
2. Background
3. Early Life
4. College Days
5. Progress In the US
6. Contributions
7. Contributions in the Computer Field
8. Stored Program Concept “Von Neumann Architecture”
9. Concepts behind the modern electronic digital computer
10. Atomic Bomb
11. Influences
12. End Of The Road
13. Honours
14. Conclusion
15. Thank You
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching, and research in the fields of Engineering and Technology. We bring together Scientists, Academicians, Field Engineers, Scholars, and Students of related fields of Engineering and Technology.
An octa core processor with shared memory and message-passing - eSAT Journals
Abstract: In this era of fast, high-performance computing, efficient optimizations are needed both in the processor architecture and in the memory hierarchy. Advancing communication and multimedia applications keep pushing up the number of cores in the main processor: dual-core, quad-core, octa-core, and so on. But to enhance the overall performance of a multi-processor chip, inter-core synchronization must also improve. Thus, an MPSoC with 8 cores supporting both message-passing and shared-memory inter-core communication mechanisms is implemented on a Virtex 5 LX110T FPGA. Each core is based on the MIPS III (Microprocessor without Interlocked Pipelined Stages) ISA, handles only integer instructions, and has a six-stage pipeline with a data hazard detection unit and forwarding logic. The eight processing cores and one central shared-memory core are interconnected using a 3x3 2-D mesh topology Network-on-Chip (NoC) with a virtual channel router. The router is four-stage pipelined, supports the DOR X-Y routing algorithm, and uses round-robin arbitration. To verify and test the functionality of the fully synthesized multi-core processor, a matrix multiplication operation is mapped onto it; the multiplications and additions for each element of the resultant matrix are partitioned and scheduled among the eight cores to achieve maximum throughput. All processor design code is written in Verilog HDL. Keywords: MPSoC, message-passing, shared memory, MIPS, ISA, wormhole router, network-on-chip, SIMD, data level parallelism, 2-D mesh, virtual channel
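The DOR X-Y routing algorithm mentioned in the abstract is simple to sketch: a packet first travels along the X dimension until its column matches the destination, then along Y. Restricting turns this way (never Y back to X) makes the scheme deadlock-free on a mesh. A small Python sketch on node coordinates, not the Verilog router itself:

```python
def xy_route(src, dst):
    # Dimension-ordered (X-Y) routing on a 2-D mesh: correct the X
    # coordinate one hop at a time, then the Y coordinate.
    x, y = src
    hops = [src]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        hops.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        hops.append((x, y))
    return hops

# On the 3x3 mesh, (0,0) to (2,1): two X hops, then one Y hop.
print(xy_route((0, 0), (2, 1)))
# → [(0, 0), (1, 0), (2, 0), (2, 1)]
```

The path length is always the Manhattan distance between source and destination, which is why X-Y routing is minimal as well as deadlock-free.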
A data center network is a system in which multiple servers are connected to each other to share information and resources. Multiple remote offices or users connect to the data center network and servers for resource or information sharing. The remote offices connect to the data center server via VPN; each branch is connected through multiple ISPs for failover, using the OSPF routing protocol.
Efficient and scalable multitenant placement approach for in memory database ... - CSITiaesprime
Of late, the multitenant model with an in-memory database has become a prominent research area. This paper uses the advantages of multitenancy to reduce hardware and labor costs and to make storage available by sharing database memory and file execution. Its purpose is to give an overview of the proposed Supple architecture for implementing an in-memory database backend with multitenancy, applicable in public and private cloud settings. The backend in-memory database uses a column-oriented approach with a dictionary-based compression technique. We used a dedicated sample benchmark for the workload processing and also adopted an SLA penalty model. In particular, we present two approximation algorithms, multi-tenant placement (MTP) and best-fit greedy, to show the quality of tenant placement. The experimental results show that the MTP algorithm is scalable and efficient in comparison with the best-fit greedy algorithm over the proposed architecture.
Implementation of Speed Efficient Image Processing algorithm on Multi-Process... - AM Publications
Today’s world demands faster, more accurate, and power-efficient embedded devices. These devices are not only used for control applications but are also expected to perform advanced applications like image processing and audio/video encoding/decoding, which need more computational speed at low power. Multiprocessor systems-on-chip (MPSoC) are one option for dealing with these increasing computational needs. The proposed research work develops a multi-processor environment on an FPGA chip using the MicroBlaze soft-core processor (from Xilinx). Advances in FPGA technology have made implementing such architectures on a single chip feasible and very appealing. This MPSoC environment is used to reduce the overhead on the central processor (master). An image processing application is then developed to run in the multi-processor environment, and an improvement in the speed of operation is observed.
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORS - cscpconf
The main aim of our research is to find the limit of Amdahl's law for multicore processors, i.e., the number of cores beyond which adding cores no longer improves the efficiency of the overall CMP (Chip Multi-Processor, a.k.a. multicore processor) architecture. We expect this limit to lie either in the architecture of the multicore processor or in the programming. We surveyed the multicore processor architectures of various chip manufacturers, namely INTEL™, AMD™, IBM™, etc., and the techniques they use to improve multicore performance, and we conducted cluster experiments to find this limit. In this paper we propose an alternate multicore processor design based on the results of our cluster experiments.
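The ceiling these experiments probe follows directly from Amdahl's law: with parallel fraction p on n cores, speedup is 1 / ((1 - p) + p / n), which approaches 1 / (1 - p) as n grows. A few lines of Python make the saturation visible:

```python
def amdahl_speedup(p, n):
    # Amdahl's law: p is the parallelizable fraction of the workload,
    # n the number of cores; the serial fraction (1 - p) is the limit.
    return 1.0 / ((1.0 - p) + p / n)

# With p = 0.9 the speedup saturates near 1 / (1 - 0.9) = 10,
# no matter how many cores are added.
for n in (2, 8, 64, 1_000_000):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

Past the saturation point, extra cores contribute almost nothing, which is the architectural limit the cluster experiments look for.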
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
A Novel Method for Prevention of Bandwidth Distributed Denial of Service Attacks - IJERD Editor
Distributed Denial of Service (DDoS) attacks have become a massive threat to the Internet, whose traditional architecture is vulnerable to attacks like DDoS. An attacker first acquires an army of zombies, then instructs that army when to start an attack and whom the attack should target. This paper reviews the different techniques used to perform DDoS attacks, the tools used to carry them out, and the countermeasures for detecting attackers and eliminating Bandwidth Distributed Denial of Service (B-DDoS) attacks, including the various flooding techniques used in DDoS attacks.
The main purpose of this paper is to design an architecture that can reduce Bandwidth Distributed Denial of Service attacks and keep the victim site or server available to normal users by eliminating the zombie machines. Our primary focus is to discuss how normal machines are turned into zombies (bots), how an attack is initiated, the DDoS attack procedure, and how an organization can keep its server from becoming a DDoS victim. To demonstrate this, we implemented a simulated environment with Cisco switches, routers, a firewall, some virtual machines, and some attack tools to reproduce a real DDoS attack. Using time scheduling, resource limiting, system logs, access control lists, and the Modular Policy Framework, we stopped the attack and identified the attacker (bot) machines.
Hearing loss is one of the most common human impairments; it is estimated that by 2015 more than 700 million people will suffer mild deafness. Most can be helped by hearing aid devices, depending on the severity of their hearing loss. This paper describes the implementation and characterization of a dual-channel transmitter front end (TFE) for digital hearing aid (DHA) applications that uses novel micro-electro-mechanical-systems (MEMS) audio transducers and ultra-low-power, power-scalable analog-to-digital converters (ADCs), enabling a very low form factor, energy-efficient implementation for next-generation DHAs. The contribution of the design is the implementation of the dual-channel MEMS microphones and the power-scalable ADC system.
Influence of tensile behaviour of slab on the structural Behaviour of shear c... - IJERD Editor
A composite beam is composed of a steel beam and a slab connected by means of shear connectors, such as studs installed on the top flange of the steel beam, to form a structure behaving monolithically. This study analyzes the effects of the tensile behavior of the slab on the structural behavior of the shear connection, such as slip stiffness and maximum shear force, in composite beams subjected to hogging moment. The results show that the
shear studs located in the crack-concentration zones due to large hogging moments sustain significantly smaller
shear force and slip stiffness than the other zones. Moreover, the reduction of the slip stiffness in the shear
connection appears also to be closely related to the change in the tensile strain of rebar according to the increase
of the load. Further experimental and analytical studies shall be conducted considering variables such as the
reinforcement ratio and the arrangement of shear connectors to achieve efficient design of the shear connection
in composite beams subjected to hogging moment.
Gold prospecting using Remote Sensing ‘A case study of Sudan’ - IJERD Editor
Gold has been extracted from northeast Africa for more than 5000 years, and this may be the first place where the metal was extracted. The Arabian-Nubian Shield (ANS) is an exposure of Precambrian crystalline rocks on the flanks of the Red Sea; the crystalline rocks are mostly Neoproterozoic in age. The ANS spans Israel, Jordan, Egypt, Saudi Arabia, Sudan, Eritrea, Ethiopia, Yemen, and Somalia. It consists of juvenile continental crust that formed between 900 and 550 Ma, when intra-oceanic arcs welded together along ophiolite-decorated arcs. Primary Au mineralization probably developed in association with the growth of the intra-oceanic arcs and the evolution of back-arcs. Multiple episodes of deformation have obscured the primary metallogenic setting, but at least some of the deposits preserve evidence that they originated as sea-floor massive sulphide deposits.
The Red Sea Hills Region is a vast span of rugged, harsh and inhospitable sector of the Earth with
inimical moon-like terrain, nevertheless since ancient times it is famed to be an abode of gold and was a major
source of wealth for the Pharaohs of ancient Egypt. The Pharaohs old workings have been periodically
rediscovered through time. Recent endeavours by the Geological Research Authority of Sudan led to the
discovery of a score of occurrences with gold and massive sulphide mineralizations. In the nineties of the
previous century the Geological Research Authority of Sudan (GRAS) in cooperation with BRGM utilized
satellite data of Landsat TM using spectral ratio technique to map possible mineralized zones in the Red Sea
Hills of Sudan. The outcome of the study mapped a gossan-type gold mineralization. The band ratio technique was applied to the Arbaat area and a signature of an alteration zone was detected; such alteration zones are commonly associated with mineralization. A field check confirmed the existence of a stockwork of gold-bearing quartz in the alteration zone. Another type of gold mineralization discovered using remote sensing is the gold associated with metachert in the Atmur Desert.
Reducing Corrosion Rate by Welding Design - IJERD Editor
The paper addresses the importance of welding design in preventing corrosion of steel. Welding is used to join pipes, bridge profiles, spindles, and many other parts of engineering constructions, and the problems associated with welding, especially corrosion, are common issues in these fields. Corrosion can be reduced by many methods: painting, controlling humidity, and also good welding design. The research found that reducing residual stress in the weld reduces the corrosion rate.
Preheating at 500 °C and 600 °C gives a better condition for reducing the corrosion rate than preheating at 400 °C. For all welding groove types, material preheated at 500 °C or 600 °C lost 0.5%-0.69% after the 14-day corrosion test, while material preheated at 400 °C lost 0.57%-0.76%.
The welding groove also influences the corrosion rate. X- and V-type welding grooves give a better condition for reducing the corrosion rate than 1/2V and 1/2X welding grooves. After the 14-day corrosion test, samples with an X-type welding groove lost 0.5%-0.57%, samples with a V-type groove lost 0.51%-0.59%, and samples with 1/2V and 1/2X grooves lost 0.58%-0.71%.
Router 1X3 – RTL Design and Verification - IJERD Editor
Routing is the process of moving a packet of data from source to destination, enabling messages to pass from one computer to another and eventually reach the target machine. A router is a networking device that forwards data packets between computer networks. It is connected to two or more data lines from different networks (as opposed to a network switch, which connects data lines from one single network). This paper mainly emphasizes the study of the router device and its top-level architecture, and how the various sub-modules of the router, i.e., the register, FIFO, FSM, and synchronizer, are synthesized, simulated, and finally connected to the top module.
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha... - IJERD Editor
This paper presents a component within the flexible ac-transmission system (FACTS) family, called
distributed power-flow controller (DPFC). The DPFC is derived from the unified power-flow controller (UPFC)
with an eliminated common dc link. The DPFC has the same control capabilities as the UPFC, which comprise
the adjustment of the line impedance, the transmission angle, and the bus voltage. The active power exchange
between the shunt and series converters, which is through the common dc link in the UPFC, is now through the
transmission lines at the third-harmonic frequency. The DPFC employs multiple small single-phase converters, which reduces equipment cost, requires no voltage isolation between phases, and increases redundancy and thereby reliability. The principle and analysis of the DPFC are presented in this paper, along with the corresponding simulation results carried out on a scaled prototype.
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVR - IJERD Editor
Power quality has become an increasingly pivotal issue for industrial electricity consumers in recent times. Modern industries employ sensitive power electronic equipment, control devices, and non-linear loads as part of automated processes to increase energy efficiency and productivity. Voltage disturbances are the most common power quality problem, as the use of large numbers of sophisticated and sensitive electronic devices in industrial systems has increased. This paper discusses the design and simulation of a dynamic voltage restorer (DVR) for improving power quality and reducing the harmonic distortion of sensitive loads. Power quality problems arise from non-standard voltage, current, and frequency; electronic devices are very sensitive loads, and voltage sag, swell, flicker, and harmonics are among the problems they face in the power system. The compensation capability of a DVR depends primarily on its maximum voltage injection ability and the amount of stored energy available within the restorer. The device is connected in series with the distribution feeder at medium voltage. A fuzzy logic control produces the gate pulses for the DVR control circuit, and the circuit is simulated using MATLAB/SIMULINK software.
Study on the Fused Deposition Modelling In Additive Manufacturing - IJERD Editor
Additive manufacturing, also popularly known as 3-D printing, is a process where a product is created
in a succession of layers. It is based on a novel incremental materials manufacturing philosophy. Unlike
conventional manufacturing processes, where material is removed from a given work piece to derive the final
shape of a product, 3-D printing builds the product from scratch, obviating the necessity to cut away material
and thus preventing wastage of raw materials. Commonly used raw materials for the process are ABS plastic,
PLA and nylon; recently the use of gold, bronze and wood has also been implemented. The complexity factor
of this process is effectively zero, in that an object of any shape and size can be manufactured.
Spyware triggering system by particular string value - IJERD Editor
This computer programme can be used for good or bad purposes, in hacking or in general use. It can
be seen as the next step beyond hacking techniques such as keyloggers and spyware. In this system, once a
user or hacker stores a particular string as input, the software continually compares the user's typing activity
with that stored string and, if a match is found, launches a spyware programme.
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a... - IJERD Editor
This paper presents a blind steganalysis technique to effectively attack JPEG steganographic
schemes, i.e. Jsteg, F5, Outguess and DWT-based. The proposed method exploits the correlations between
block-DCT coefficients from intra-block and inter-block relations, and the statistical moments of the
characteristic functions of the test image are selected as features. The features are extracted from the BDCT
JPEG 2-array. A Support Vector Machine with cross-validation is implemented for the classification. The
proposed scheme gives improved results in attacking these schemes.
Secure Image Transmission for Cloud Storage System Using Hybrid SchemeIJERD Editor
- Data over the cloud is transferred or transmitted between servers and users. Privacy of that
data is very important as it belongs to personal information. If data get hacked by the hacker, can be
used to defame a person’s social data. Sometimes delay are held during data transmission. i.e. Mobile
communication, bandwidth is low. Hence compression algorithms are proposed for fast and efficient
transmission, encryption is used for security purposes and blurring is used by providing additional
layers of security. These algorithms are hybridized for having a robust and efficient security and
transmission over cloud storage system.
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i... - IJERD Editor
A thorough review of existing literature indicates that the Buckley-Leverett equation is applied to
waterflood analyses directly, without any adjustments for real reservoir scenarios, which introduces quite a
number of errors into these analyses. Also, for most waterflood scenarios a radial investigation is more
appropriate than a simplified linear system. This study investigates the adoption of the Buckley-Leverett
equation to estimate the radius of invasion of the displacing fluid during waterflooding. The model is also
adapted for a microbial flood, and a comparative analysis is conducted for both waterflooding and microbial
flooding. Results from the analysis not only record success in determining the radial distance of the leading
edge of water during the flooding process, but also give a clearer understanding of the applicability of
microbes to enhance oil production through in-situ production of bio-products like biosurfactants, biogenic
gases, bio-acids etc.
Gesture Gaming on the World Wide Web Using an Ordinary Web CameraIJERD Editor
- Gesture gaming is a method by which users having a laptop/pc/x-box play games using natural or
bodily gestures. This paper presents a way of playing free flash games on the internet using an ordinary webcam
with the help of open source technologies. Emphasis in human activity recognition is given on the pose
estimation and the consistency in the pose of the player. These are estimated with the help of an ordinary web
camera having different resolutions from VGA to 20mps. Our work involved giving a 10 second documentary to
the user on how to play a particular game using gestures and what are the various kinds of gestures that can be
performed in front of the system. The initial inputs of the RGB values for the gesture component is obtained by
instructing the user to place his component in a red box in about 10 seconds after the short documentary before
the game is finished. Later the system opens the concerned game on the internet on popular flash game sites like
miniclip, games arcade, GameStop etc and loads the game clicking at various places and brings the state to a
place where the user is to perform only gestures to start playing the game. At any point of time the user can call
off the game by hitting the esc key and the program will release all of the controls and return to the desktop. It
was noted that the results obtained using an ordinary webcam matched that of the Kinect and the users could
relive the gaming experience of the free flash games on the net. Therefore effective in game advertising could
also be achieved thus resulting in a disruptive growth to the advertising firms.
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And... - IJERD Editor
The LLC resonant frequency converter is basically a combination of series and parallel resonant
circuits. The LCC resonant converter has the disadvantage that, although it has two resonant frequencies,
the lower resonant frequency lies in the ZCS region [5], so for this application the converter cannot be
designed to work at that resonant frequency. The LLC resonant converter has existed for a very long time,
but because its characteristics were not well understood it was used as a series resonant converter with an
essentially passive (resistive) load. Here, it is designed to operate at a switching frequency higher than the
resonant frequency of the series resonant tank of Lr and Cr, where the converter acts very similarly to a
series resonant converter. The benefit of the LLC resonant converter is its narrow switching frequency range
at light load [6]. The control circuit plays a very important role: the 555 timer used here provides a perfect
square wave, as the control circuit exhibits no slew rate, which makes the square wave sharp and robust.
The dead-band circuit provides an exclusive dead band of a few microseconds so as to avoid the simultaneous
firing of the two pairs of IGBTs when one pair switches off and the other on within the slightest period of
time. The isolator circuit is associated with every circuit used, because it acts as a driver, and isolation for
each IGBT is provided by an exclusive transformer supply [3]. The IGBTs are fired with the appropriate
signals from the preceding boards, and finally a high-frequency rectifier circuit with a filtering capacitor is
used to obtain a clean dc waveform. The basic goal of this particular analysis is to observe the waveforms
and characteristics of converters with differently positioned passive elements in the form of tank circuits.
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu... - IJERD Editor
The LLC resonant frequency converter is basically a combination of series and parallel resonant
circuits. The LCC resonant converter has the disadvantage that, although it has two resonant frequencies,
the lower resonant frequency lies in the ZCS region [5], so for this application the converter cannot be
designed to work at that resonant frequency. The LLC resonant converter has existed for a very long time,
but because its characteristics were not well understood it was used as a series resonant converter with an
essentially passive (resistive) load. Here, it is designed to operate at a switching frequency higher than the
resonant frequency of the series resonant tank of Lr and Cr, where the converter acts very similarly to a
series resonant converter. The benefit of the LLC resonant converter is its narrow switching frequency range
at light load [6]. The control circuit plays a very important role: the 555 timer used here provides a perfect
square wave, as the control circuit exhibits no slew rate, which makes the square wave sharp and robust.
The dead-band circuit provides an exclusive dead band of a few microseconds so as to avoid the simultaneous
firing of the two pairs of IGBTs when one pair switches off and the other on within the slightest period of
time. The isolator circuit is associated with every circuit used, because it acts as a driver, and isolation for
each IGBT is provided by an exclusive transformer supply [3]. The IGBTs are fired with the appropriate
signals from the preceding boards, and finally a high-frequency rectifier circuit with a filtering capacitor is
used to obtain a clean dc waveform. The basic goal of this particular analysis is to observe the waveforms
and characteristics of converters with differently positioned passive elements in the form of tank circuits.
The supporting simulation is done using the PSIM 6.0 software tool.
An amateur radio operator, also known as a HAM, communicates with other HAMs through radio
waves. Wireless communication in which the Moon is used as a natural satellite is called Moon-bounce or
the EME (Earth-Moon-Earth) technique. Long distance communication (DXing) using Very High Frequency
(VHF) amateur HAM radio has been difficult, yet even with a modest setup comprising a good transceiver,
power amplifier and a high-gain antenna with high directivity, VHF DXing is possible. Generally a 2X11
YAGI antenna along with a rotor to set the horizontal and vertical angles is used. Moon tracking software
gives the exact location and visibility of the Moon at both stations, and other vital data needed to acquire
the real-time position of the Moon.
"MS-Extractor: An Innovative Approach to Extract Microsatellites on 'Y' Chrom... - IJERD Editor
Simple Sequence Repeats (SSR), also known as microsatellites, have been extensively used as
molecular markers due to their abundance and high degree of polymorphism. The nucleotide sequences of
polymorphic forms of the same gene should be 99.9% identical, so extracting microsatellites from the gene
is crucial. When microsatellite repeat counts are compared, a large difference indicates a disorder. The Y
chromosome likely contains 50 to 60 genes that provide instructions for making proteins. Because only males
have the Y chromosome, the genes on this chromosome tend to be involved in male sex determination and
development. Several microsatellite extractors exist, but they fail to extract microsatellites from large data
sets gigabytes or terabytes in size. The proposed tool, "MS-Extractor: An Innovative Approach to Extract
Microsatellites on 'Y' Chromosome", can extract both perfect and imperfect microsatellites from large data
sets of the human 'Y' genome. The proposed system uses string matching with a sliding window approach
to locate microsatellites and extract them.
Importance of Measurements in Smart Grid - IJERD Editor
Driven by the need for reliable supply, independence from fossil fuels, and the capability to provide
clean energy at a fixed and lower cost, the existing power grid structure is transforming into the Smart Grid.
The development of a smart energy distribution grid is a current goal of many nations. A Smart Grid should
have new capabilities such as self-healing, high reliability, energy management, and real-time pricing. This
new era of the smart future grid will lead to major changes in existing technologies at the generation,
transmission and distribution levels. The incorporation of renewable energy resources and distributed
generators in the existing grid will increase the complexity, optimization problems and instability of the
system. This will lead to a paradigm shift in the instrumentation and control requirements of Smart Grids
for high-quality, stable and reliable electricity supply. The monitoring of the grid system's state and stability
relies on the availability of reliable measurement data. In this paper, the measurement areas that highlight
new measurement challenges, the development of Smart Meters, and the critical parameters of electric
energy to be monitored for improving the reliability of power systems are discussed.
Study of Macro level Properties of SCC using GGBS and Lime stone powder - IJERD Editor
One of the major environmental concerns is the disposal of waste materials and the utilization of
industrial by-products. Limestone quarries produce millions of tons of waste dust powder every year. Having
a considerably high degree of fineness in comparison to cement, this material may be utilized as a partial
replacement for cement. For this purpose an experiment was conducted to investigate the possibility of using
limestone powder in the production of SCC combined with GGBS, and how it affects the fresh and mechanical
properties of SCC. First, SCC is made by replacing cement with GGBS in percentages of 10, 20, 30, 40 and
50; then, taking the optimum GGBS mix, limestone powder is blended into the mix in percentages of 5, 10,
15 and 20 as a partial replacement for cement. Test results show that the SCC mix with a combination of
30% GGBS and 15% limestone powder gives the maximum compressive strength, with fresh properties also
within the limits prescribed by EFNARC.
Democratizing Fuzzing at Scale by Abhishek Arya - abh.arya
Presented at NUS: Fuzzing and Software Security Summer School 2024
This keynote talks about the democratization of fuzzing at scale, highlighting the collaboration between open source communities, academia, and industry to advance the field of fuzzing. It delves into the history of fuzzing, the development of scalable fuzzing platforms, and the empowerment of community-driven research. The talk will further discuss recent advancements leveraging AI/ML and offer insights into the future evolution of the fuzzing landscape.
Quality defects in TMT Bars, Possible causes and Potential Solutions - PrashantGoswami42
Maintaining high-quality standards in the production of TMT bars is crucial for ensuring structural integrity in construction. Addressing common defects through careful monitoring, standardized processes, and advanced technology can significantly improve the quality of TMT bars. Continuous training and adherence to quality control measures will also play a pivotal role in minimizing these defects.
Event Management System Vb Net Project Report.pdf - Kamal Acharya
In the present era, the scope of information technology is growing very fast; we do not see any area
untouched by this industry. Its scope has become wider, including business and industry, household business,
communication, education, entertainment, science, medicine, engineering, distance learning, weather
forecasting, career searching and so on.
My project, named "Event Management System", is software that stores and maintains all events
coordinated in a college. It is also helpful for printing related reports. My project will help to record the
events coordinated by faculties, with their name, event subject, date and details, in an efficient and effective
way.
In my system, a user can record all events coordinated by a particular faculty member. In the
proposed system some more features are added which differentiate it from the existing system, such as
security.
Hybrid optimization of pumped hydro system and solar - Engr. Abdul-Azeez.pdf - fxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. 
Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Explore the innovative world of trenchless pipe repair with our comprehensive guide, "The Benefits and Techniques of Trenchless Pipe Repair." This document delves into the modern methods of repairing underground pipes without the need for extensive excavation, highlighting the numerous advantages and the latest techniques used in the industry.
Learn about the cost savings, reduced environmental impact, and minimal disruption associated with trenchless technology. Discover detailed explanations of popular techniques such as pipe bursting, cured-in-place pipe (CIPP) lining, and directional drilling. Understand how these methods can be applied to various types of infrastructure, from residential plumbing to large-scale municipal systems.
Ideal for homeowners, contractors, engineers, and anyone interested in modern plumbing solutions, this guide provides valuable insights into why trenchless pipe repair is becoming the preferred choice for pipe rehabilitation. Stay informed about the latest advancements and best practices in the field.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two types of water scarcity: one is physical, and the other is economic water scarcity.
Cosmetic shop management system project report.pdf - Kamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin
and are prone to skin trouble. The information needed to alleviate this problem is on the back of each
product, but it is tough to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products
may be good fits for us. The system includes various function programs to carry out the above-mentioned
tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of the general workflow and administration process of the shop. The main processes of the system focus on the customer's request, where the system is able to search for the most appropriate products and deliver them to the customer. It should help the employees to quickly identify the list of cosmetic products that have reached the minimum quantity, keep track of the expiry date of each cosmetic product, and find the rack number in which a product is placed. It is also a faster and more efficient way of working.
Automobile Management System Project Report.pdf - Kamal Acharya
The proposed project is developed to manage the automobiles in an automobile dealer company. The main modules in this project are login, automobile management, customer management, sales, complaints and reports. The first module is the login: the automobile showroom owner should log in to use the system. The username and password are verified and, if correct, the next form opens; if not, an error message is shown.
When a customer searches for an automobile, if it is available they are taken to a page that shows its details, including automobile name, automobile ID, quantity, price etc. The "Automobile Management System" is useful for maintaining automobiles and customers effectively, and hence helps establish a good relation between the customer and the automobile organization. It contains various customized modules for maintaining automobile and stock information accurately and safely.
When an automobile is sold to a customer, stock is reduced automatically; when a new purchase is made, stock is increased automatically. While selecting automobiles for sale, the proposed software automatically checks the total available stock of that particular item; if it is less than 5, the software notifies the user to purchase more of that item.
Also, when the user tries to sell items which are not in stock, the system prompts the user that the stock is not enough. Customers of this system can search for an automobile and purchase one easily. On the other hand, the stock of automobiles can be maintained perfectly by the automobile shop manager, overcoming the drawbacks of the existing system.
Gen AI Study Jams _ For the GDSC Leads in India.pdf
Exploiting Multi Core Architectures for Process Speed Up
International Journal of Engineering Research and Development
e-ISSN: 2278-067X, p-ISSN: 2278-800X, www.ijerd.com
Volume 11, Issue 07 (July 2015), PP.08-15
Exploiting Multi Core Architectures for Process Speed Up
Deepika Roy, N. Saikumari, Ch Umakanth, M. Manju Sarma, B. Lakshmi
National Remote Sensing Centre, Indian Space Research Organization, India
Abstract:- As a part of the Indian Space program, Indian Space Research Organization has been developing
and deploying Indian Remote Sensing (IRS) satellites. National Remote Sensing Centre (NRSC) is responsible
for acquiring, processing and disseminating the data products to different users. Pre-processing of satellite
data is one of the important steps in satellite data processing. It involves decoding, decompressing and
reformatting the satellite data so that it can be passed in an appropriate form to the next level of processing.
The pre-processing step is time consuming and needs to be optimized to deliver the data products with
minimum turnaround time. This paper presents the design and performance analysis of a dynamically
scalable multithreaded process for pre-processing of IRS satellite data.
Keywords:- Multi CPU, Multi Core, Multithreading, Data Parallelism, Performance Speedup
I. INTRODUCTION
The growing application needs of remote sensing data demand higher spatial, spectral and temporal
resolutions, thus increasing the volumes of data. Added to this, the demand for near real-time supply of data has
increased the need for faster and faster processing. Hence improving process execution times has become an
important concern.
Earlier, the servers used for data processing were single-core machines. Saturation of the clock speed
of single-core processors made CMP (Chip Multi-Processors) the mainstream in CPU designs [1]. The multiple
cores in a processor run at a lower frequency but yield speedup because all cores can execute in parallel. To
exploit the full capacity of multiple cores, the software should be multithreaded, which is achieved by splitting
the workload across different cores [2]. This introduced the multithreaded programming paradigm.
Applications which use massive data sets are termed Data Intensive [5]. Due to the large amount of
data to be processed, data-intensive applications are I/O bound, and hence there is a considerable amount of
CPU idle time which can be utilized by overlapping processing and I/O threads. The number of threads per
processor must be kept optimal so that the advantages of multithreading are not overshadowed by the overhead
involved in thread creation and synchronization.
Applications where complex computations are involved are termed Compute Intensive [5]. Due to
heavy utilization of the processor, these are mostly CPU bound. For such applications it is apt to run one
thread per processor; more than one thread per processor results in unwanted overhead, with time wasted in
scheduling the threads and context switching.
The independent frame/block wise format of satellite downlink data makes it a suitable candidate for
multithreaded implementation using data parallelism.
II. PARALLEL COMPUTING DESIGN METHODOLOGY
Parallel computing is a form of computation in which calculations are carried out simultaneously.
There are several different levels of parallel computing: bit-level, instruction level, data, and task. The form of
parallelism implemented in this paper is SIMD (single instruction multiple data).
The sequential programming method can utilize only one core, resulting in underutilization of the
multi-core architecture.
In a multithreaded application, a large data set is broken into multiple subsets and each is assigned to
an individual core. After processing is complete, these subsets are recombined into a single output. This
implementation framework yields high performance by efficiently utilizing multiple cores in a multi-CPU,
multi-core architecture.
III. MULTITHREADING MODEL
The data parallelism model employed here is a variant of the Master-Slave (Boss-Worker) model [3]. In
this design, two threads function as bosses: one assigns tasks and the other controls the worker threads. The
first boss thread determines the number of worker threads, partitions the data/work and assigns the jobs to
each worker thread [4].
The number of worker threads is determined at run time based on the number of free
processors/cores and available memory. As the application is designed to run in a time-sharing environment,
memory usage must be kept optimal to avoid swapping.
The first boss thread also sets up various parameters for the worker threads. It initializes information
for each thread like the thread number, data subset to be processed, the output file and location.
Each worker thread performs the same task on a different data partition. Instead of keeping total subset
data in the memory, the worker thread runs in a loop for predefined blocks of the subset. This ensures optimal
utilization of memory resource. The second boss thread waits for the completion of all the worker threads,
consolidates the output result and performs the final cleaning up.
The execution of the worker threads is controlled using semaphores. One semaphore is used to control
the whole group of worker threads, to reduce overhead; it is initialized to the number of worker threads in
that group and decremented each time a new worker thread is created. The other semaphore is used to control
the joining of the worker threads.
Fig 1: Boss-Worker Thread Model
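The boss-worker scheme above can be sketched in C with POSIX threads and a completion semaphore. This is a minimal illustration under stated assumptions, not the authors' code: the `task_t` structure, the `parallel_sum` workload and the partitioning are hypothetical stand-ins for the Level-0 processing jobs.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

/* One task per worker: its data partition and a slot for its result. */
typedef struct {
    const int *data;
    size_t start, end;   /* partition [start, end) */
    long partial;        /* written by the worker */
    sem_t *done;         /* posted on completion */
} task_t;

static void *worker(void *arg) {
    task_t *t = arg;
    t->partial = 0;
    for (size_t i = t->start; i < t->end; i++)
        t->partial += t->data[i];
    sem_post(t->done);   /* tell the joining boss this worker is finished */
    return NULL;
}

/* First boss role: partition the data and spawn the workers.
   Second boss role: wait on the semaphore, join, consolidate the output. */
long parallel_sum(const int *data, size_t n, int nworkers) {
    pthread_t tids[nworkers];
    task_t tasks[nworkers];
    sem_t done;
    sem_init(&done, 0, 0);
    size_t chunk = (n + nworkers - 1) / nworkers;
    for (int i = 0; i < nworkers; i++) {
        size_t start = (size_t)i * chunk;
        if (start > n) start = n;
        size_t end = start + chunk < n ? start + chunk : n;
        tasks[i] = (task_t){ data, start, end, 0, &done };
        pthread_create(&tids[i], NULL, worker, &tasks[i]);
    }
    for (int i = 0; i < nworkers; i++)
        sem_wait(&done);            /* all workers have signalled completion */
    long total = 0;
    for (int i = 0; i < nworkers; i++) {
        pthread_join(tids[i], NULL);
        total += tasks[i].partial;  /* consolidate the per-worker results */
    }
    sem_destroy(&done);
    return total;
}
```

In the paper's design the consolidation step is a separate boss thread; here both boss roles run in the calling thread for brevity, but the semaphore-then-join sequence is the same.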
IV. APPLICATION
Parallel computing based on Data Parallelism has been applied for Level-0 data processing of recent
IRS satellites such as Oceansat-2, Cartosat-2B, Resourcesat-2 and RISAT-1. The Level-0 processing of IRS
satellite data involves various steps for pre-processing of raw data like extraction of auxiliary data, decoding,
decompression and reformatting of raw satellite data to make it suitable for data product generation.
A typical application involves reading a data partition from disk (I/O operation), processing it
(CPU-bound processing) and then writing the processed data back to disk (I/O operation). Thus we have to
choose the number of threads per processor such that the I/O operation of one thread can be interleaved with
the CPU-bound processing of another thread, so the minimum number of threads assigned to each processor
is 2. The number of threads per processing unit can be increased based on the CPU idle time.
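As a rough sketch of this sizing rule: the worker count can be derived from the number of online cores reported by the system. The function name `worker_thread_count` and its parameters are illustrative, not from the paper.

```c
#include <unistd.h>

/* At least two threads per processing unit, so that one thread's I/O can
   be interleaved with another's CPU-bound processing; the per-core factor
   can be raised further based on observed CPU idle time. */
int worker_thread_count(long online_cores, int threads_per_core) {
    if (online_cores < 1)
        online_cores = 1;       /* sysconf may return -1 on error */
    if (threads_per_core < 2)
        threads_per_core = 2;   /* minimum of 2 threads per core */
    return (int)online_cores * threads_per_core;
}

/* Typical call site:
 *   int n = worker_thread_count(sysconf(_SC_NPROCESSORS_ONLN), 2);
 */
```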
The following processes are implemented using multithreaded approach for faster processing.
A. Auxiliary Data Extraction Process
IRS data is received as raw frames each consisting of auxiliary data followed by video data. Auxiliary
data gives the details to interpret the actual image data. It contains information about compression modes,
imaging modes, sensor modes, information about payload health and position. It is necessary to extract this
auxiliary data for further processing of image data.
This process extracts the auxiliary data from every frame of the raw data and writes it into a separate
file. The similarity in format of raw data frames makes it convenient to partition the data among threads. Each
thread then performs similar processing on the data partition assigned to it.
Each thread is provided with the start and end line of its data partition. All threads simultaneously
access the raw file for reading at predefined offsets. The extracted auxiliary data is likewise written
simultaneously into the output file at predefined offset positions. This simultaneous reading and writing of
the same raw file is done using system calls in conjunction with the POSIX threads library.
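The offset-based concurrent I/O described above can be sketched with `pread`/`pwrite`, which take an explicit file offset and therefore need no shared file-position lock between threads. The frame geometry (`FRAME_LEN`, `AUX_LEN`) and the assumption that auxiliary data occupies the start of each frame are hypothetical placeholders, not the actual IRS frame format.

```c
#include <unistd.h>
#include <sys/types.h>

enum { FRAME_LEN = 1024, AUX_LEN = 32 };  /* hypothetical frame geometry */

/* Offsets depend only on the frame number, so every thread can compute
   its own read and write positions without coordinating with others. */
static off_t raw_offset(long frame) { return (off_t)frame * FRAME_LEN; }
static off_t aux_offset(long frame) { return (off_t)frame * AUX_LEN; }

/* Extract auxiliary data (assumed here to be the first AUX_LEN bytes of
   each frame) for the partition [first, last) from rawfd into auxfd. */
int extract_aux(int rawfd, int auxfd, long first, long last) {
    unsigned char frame[FRAME_LEN];
    for (long f = first; f < last; f++) {
        if (pread(rawfd, frame, FRAME_LEN, raw_offset(f)) != FRAME_LEN)
            return -1;   /* short read: partition exceeds the file */
        if (pwrite(auxfd, frame, AUX_LEN, aux_offset(f)) != AUX_LEN)
            return -1;
    }
    return 0;
}
```

Because each worker writes only inside its own offset range, no mutex is needed around the output file.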
B. Reed Solomon Decoding
The IRS satellite data is encoded on board using the Reed-Solomon encoding scheme. Encoding helps in
detecting and correcting errors that occur during data transmission and acquisition. The RS encoding scheme
used here is RS(255,247): for every 247 symbols, 8 parity symbols are appended and sent as a block of 255
symbols, identifying it as an RS block. On ground, the 8 parity symbols are used for error detection and
correction of the 247 data symbols.
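The RS(255,247) block bookkeeping is simple arithmetic; a small helper pair (the names are ours, for illustration) makes the per-frame partitioning explicit:

```c
/* RS(255,247): each code block carries 247 data symbols + 8 parity symbols. */
enum { RS_N = 255, RS_K = 247 };

/* Number of complete RS blocks carried in a frame of the given size. */
long rs_blocks(long frame_bytes) { return frame_bytes / RS_N; }

/* Data bytes remaining after the parity symbols of those blocks are stripped. */
long rs_data_bytes(long frame_bytes) { return rs_blocks(frame_bytes) * RS_K; }
```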
Each RS block requires the same operation to be performed, and each frame of raw data contains many RS
blocks. This gives scope for parallel computing, so data partitioning between threads is done based on frames.
The data partitioning scheme has to ensure that there is no dependency between the threads while
handling different satellite data. All threads decode their data subsets and write the decoded data simultaneously
into the output file at their designated offsets. Each thread also counts the number of errors found. In the end,
the joining boss thread combines these per-thread counters into a single value characterizing the errors in the
raw data.
C. Decompression
IRS satellites apply compression to the payload data to reduce the downlink data rate. Many types of
compression schemes have been used based on the imaging and payload type; some examples are Differential
Pulse Code Modulation (DPCM), Multi Linear Gain (MLG) and JPEG-like compression. The concept of data
partitioning and multithreading is employed here as well to exploit multi-core architectures.
V. IMPLEMENTATION
The application is implemented using POSIX library for threads and C programming language on
LINUX platform. The POSIX thread library, IEEE POSIX 1003.1c standard, is a standards based thread API for
C/C++ [6]. Pthreads is a shared memory programming model where parallelism takes the form of parallel
function invocations by threads which can access shared global data. Pthreads defines a set of C programming
language types, functions and constants. It allows one to spawn a new concurrent process flow. It is most
effective on multi-processor or multi-core systems where the process flow can be scheduled to run on another
processor thus gaining speed through parallel or distributed processing.
There are other multithreading models available like compiler based OpenMP and Cilk. Our
application uses POSIX threads as it gives a low level control of thread creation, scheduling, joining,
synchronization, communication and more flexibility in program design. Even though programming with
Pthreads needs considerable threading-specific code, the finer grained control over threading applications
provided by its greater range of primitive functions makes POSIX threads the natural choice for our software[7].
Synchronization between threads is achieved using POSIX semaphores.
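A minimal boss/worker synchronization pattern with POSIX semaphores is sketched below; the semaphore names, worker count and stand-in workload are assumptions for illustration, not the application's actual structure.

```c
/* Boss/worker sketch with POSIX semaphores: the boss posts one
   "work ready" token per worker, then waits on a "done" semaphore
   until every worker has finished before joining them. */
#include <pthread.h>
#include <semaphore.h>

#define NWORKERS 4

static sem_t work_ready, work_done;
static int results[NWORKERS];

static void *slave(void *arg)
{
    int id = *(int *)arg;
    sem_wait(&work_ready);  /* block until the boss releases work */
    results[id] = id * id;  /* stand-in for real data processing */
    sem_post(&work_done);   /* signal completion to the boss */
    return NULL;
}

static void run_boss(void)
{
    pthread_t tid[NWORKERS];
    int ids[NWORKERS];

    sem_init(&work_ready, 0, 0);
    sem_init(&work_done, 0, 0);
    for (int i = 0; i < NWORKERS; i++) {
        ids[i] = i;
        pthread_create(&tid[i], NULL, slave, &ids[i]);
    }
    for (int i = 0; i < NWORKERS; i++) /* release all workers */
        sem_post(&work_ready);
    for (int i = 0; i < NWORKERS; i++) /* wait for every worker */
        sem_wait(&work_done);
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(tid[i], NULL);
}
```

The counting semaphore lets the boss release and collect all workers without busy-waiting or per-worker condition variables.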
This multithreaded design is generic in nature and can easily be extended to other IRS satellite level-0
processing applications that offer scope for data parallelism.
The design works on any multi-core system and scales up or down dynamically with the system
architecture, which in turn determines the level of parallelism that can be achieved.
VI. DEPLOYMENT
The application is deployed on different configurations and scales itself up or down dynamically
based on the processing units and memory available.
Deployment on Intel and AMD processors required no modification of the software, as it is
POSIX compliant and hence easily portable. It has been deployed on architectures such as an Intel Xeon 4 CPU
4 core server with 16 GB of memory running Red Hat Enterprise Linux 5.3, an Intel Itanium (IA-64) 4 CPU
single-core server with 4 GB of memory running RHEL 4, and an AMD Opteron 4 CPU 8 core server with
32 GB of memory running RHEL 5.5.
VII. PERFORMANCE EVALUATION
A. Timing Analysis
The timing analysis was done on two different types of application. The time taken to execute the
sequential and parallel versions was measured on a 4 CPU 8 core server (32 cores in total). The timing
analysis for the two test cases follows.
1) Test Case 1 - Reed Solomon Decoding
In the first test case, Reed Solomon decoding was run on 6.3 GB of raw data.
The sequential application took 403 seconds to execute.
The following table lists the time taken by the multithreaded data-parallel application with a varying number of
threads per core.
TABLE I: Time taken to run Test Case 1 with different numbers of threads

Threads per core   Total threads   Time taken (s)
        1                32              72
        2                64              70
        3                96              65
        4               128              63
        5               160              63
        6               192              60
        7               224              50
        8               256              60
        9               288              68
       10               320              70
Fig 2: Number of threads per core vs. time for Test Case 1
2) Test Case 2 - Reed Solomon Decoding with DPCM Decompression
In the second test case, the application performed both Reed-Solomon decoding and Differential Pulse
Code Modulation decompression on 9.6 GB of raw data.
The sequential application took 985 seconds to execute.
The following table lists the time taken by the multithreaded data-parallel application with a varying
number of threads per core.
TABLE II: Time taken to run Test Case 2 with different numbers of threads

Threads per core   Total threads   Time taken (s)
        1                32             720
        2                64             362
        3                96             343
        4               128             264
        5               160             192
        6               192             138
        7               224             132
        8               256             317
        9               288             358
       10               320             390
Fig 3: Number of threads per core vs. time for Test Case 2
It can be observed from the graphs in Figure 2 and Figure 3 that the maximum speedup is
achieved when 7 threads are assigned to each core, i.e. when 224 threads are created in total. It is also observed
that when the number of threads per core grows beyond 7, the execution time starts to increase. This can be
attributed to the fact that the more threads there are, the more time is spent on thread overheads such as
creation, synchronization, scheduling and swapping, so the speedup falls. Hence the number of threads
should be chosen based on the nature of the application: beyond a certain limit, increasing the number of
threads no longer raises the speedup and may even degrade performance.
B. Speedup Calculation
The speedup achieved by parallelizing the application can be calculated as:
Speedup = T(1) / T(N)                                    (1)

Where
T(1): the time required to run the serial code on a single processor
T(N): the time required to run the parallel code on N processors
Here T(1) contains the time for the serial portion Ts and the parallelizable portion Tp of the code; hence
T(1) = Ts + Tp.
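Applying equation (1) to the measured numbers in Tables I and II gives the actual speedups reported in this section; a small check, with the helper function name being an assumption for illustration:

```c
/* Equation (1) applied to the measured times: sequential runs of 403 s
   and 985 s against the best parallel runs of 50 s and 132 s (7 threads
   per core, Tables I and II). */
static double speedup(double t1, double tn)
{
    return t1 / tn; /* S = T(1) / T(N) */
}
/* speedup(403.0, 50.0)  -> 8.06, rounding to the reported factor of 8
   speedup(985.0, 132.0) -> about 7.46, rounding to the factor of 7 */
```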
Theoretically, if a task is split among N processors it should complete in 1/N of the serial time, giving
a speedup of N. In a real-world scenario, however, there will always be some tasks that must run serially and
cannot be parallelized. These serial tasks run no faster on a multi-core system than on a single-processor
system, and the time spent executing them limits the total speedup of the parallel application.
These limitations are taken into account to some extent by Amdahl’s Law, which states that the
speedup of a program using multiple processors in parallel computing is limited by the time needed for the
sequential fraction of the program. So speedup according to this law can be calculated as:
S(N) = T(1) / T(N) = (Ts + Tp) / (Ts + Tp/N)             (2)

Where
Ts: time taken by the serial part of the code
Tp: time taken by the parallelizable part of the code
N: number of processing units (currently 32)
For test case 1,
T(1) = 403 seconds
Ts = 7 seconds
Tp = 396 seconds
N = 32
Substituting these values in equation (2), we get the speedup factor according to Amdahl's Law for test case 1
as 20.
For test case 2,
T(1) = 985 seconds
Ts = 14 seconds
Tp = 971 seconds
N = 32
Substituting these values in equation (2), we get the speedup factor according to Amdahl's Law for test case 2
as 22.
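Equation (2) can be checked directly with the measured values; the function name is an illustrative assumption:

```c
/* Equation (2): Amdahl's-Law speedup for serial time ts, parallelizable
   time tp and n processing units. With the measured values
   (ts = 7 s, tp = 396 s) and (ts = 14 s, tp = 971 s) at n = 32, it
   evaluates to about 20.8 and 22.2, which the text rounds to 20 and 22. */
static double amdahl(double ts, double tp, double n)
{
    return (ts + tp) / (ts + tp / n);
}
```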
However, the speedup factors thus calculated for both test cases according to Amdahl's Law are
far too optimistic, because the law ignores the overhead incurred in parallelizing the code.
A more realistic calculation of the speedup includes the parallelization overhead as well [8], so the
following terms are added:
Tis: The (average) additional serial time spent on inter-thread communication, thread creation and setup,
and so forth, across all parallelized tasks. This time can depend on N, the number of processors, in a
variety of ways, but the simplest assumption is that each thread expends this time one after the
other, so that the total additional serial time is N*Tis. In our application, the overhead of thread
creation, setting up the initial environment and inputs for the threads, and thread synchronization amounts
to 9.8 seconds for test case 1 and 38 seconds for test case 2.
Tip: The (average) additional time spent by each processor on the setup and the work it does in
parallel. This may well include idle time, which is often important enough to be accounted for separately.
This time amounts to about 2 microseconds per thread.
By considering the above factors, the expected speedup can be calculated as:

Speedup = (Ts + Tp) / (Ts + N*Tis + Tp/N + Tip)          (3)
For test case 1, substituting the two overhead times in equation (3) gives a calculated speedup of
13. The real maximum speedup achieved for test case 1 at 7 threads per core, using equation (1) and Table I,
is 8.
For test case 2, substituting the two overhead times in equation (3) gives a calculated speedup of
12. The real maximum speedup achieved for test case 2 at 7 threads per core, using equation (1) and Table II,
is 7.
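Equation (3) can also be evaluated numerically. Under the reading that the quoted 9.8 s and 38 s are the total additional serial time N*Tis, it gives about 13.8 and 12.0, close to the quoted factors of 13 and 12; the function name and parameterization are illustrative assumptions.

```c
/* Equation (3): speedup with parallelization overhead included.
   total_tis is the total additional serial time N*Tis (9.8 s for test
   case 1, 38 s for test case 2); tip is about 2 microseconds and is
   negligible here. */
static double overhead_speedup(double ts, double tp, double n,
                               double total_tis, double tip)
{
    return (ts + tp) / (ts + total_tis + tp / n + tip);
}
```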
It is observed in both test cases that the actual maximum speedup achieved is close to, but less than,
the calculated speedup. This can be attributed to the time spent merging the outputs of the threads, as well
as variable delays in thread scheduling, which depends on the operating system and is beyond the control of
the application. Hence the achieved speedup is in accordance with the expected values for our application.
C. CPU Performance Analysis
CPU performance was analysed on an Intel Xeon 4 CPU single-core server with Hyper-Threading
enabled. Figure 4 shows the CPU utilization of the sequential application. Only one CPU reaches
maximum utilization at any instant, while the other processing nodes are mostly idle. Memory
utilization is also very low, with only 174 MB of the 4 GB used.
Fig 4: CPU & Memory utilization - sequential processing
Figure 5 shows the performance of multithreaded application on the same server.
Fig 5: CPU & Memory utilization - multithreaded processing
As the application starts and threads are created, all the CPUs reach peak (100%) utilization. In this
run, 7 threads execute in parallel, indicating that all the cores are being used to their maximum. The
memory profile also shows that 1.9 GB of the 4 GB of available memory is in use. The multithreaded
application thus efficiently utilizes all the available processing and memory resources to speed up the data
processing.
VIII. CONCLUSION
The speedup achieved for test case 1, with a raw data size of 6.3 GB, is 8 against a theoretically
calculated speedup factor of 13. For test case 2, with 9.6 GB of raw data, the achieved speedup factor is 7
against a theoretically calculated factor of 12. In both test cases the achieved speedup is very close to the
expected value, allowing for some variable delays in threading overheads, when 7 threads are assigned to
each core of a 4 CPU 8 core server.
The speedup achieved has resulted in a considerable reduction of the turnaround time for data product
generation, thereby meeting the required delivery timelines. The application design is generic and can easily be
configured for new applications for future IRS satellite missions, reducing implementation and
testing times.
ACKNOWLEDGEMENTS
The authors extend their sincere gratitude to Deputy Director, SDAPSA and Director, NRSC for their
constant encouragement and support.
REFERENCES
[1]. Bryan Schauer, "Multicore Processors – A Necessity", ProQuest Discovery Guides, September 2008, http://www.csa.com/discoveryguides/multicore/review.pdf
[2]. M. Aater Suleman, Moinuddin K. Qureshi, Onur Mutlu, Yale N. Patt, "Accelerating Critical Section Execution with Asymmetric Multi-Core Architectures", January 2010
[3]. Bradford Nichols, Dick Buttlar, and Jackie Farrell, "Pthreads Programming", First Edition, O'Reilly & Associates, September 1996
[4]. Ian J. Ghent, "Faster results by multi-threading data steps", SAS Global Forum, 2010, http://support.sas.com/resources/papers/proceedings10/109-2010.pdf
[5]. Richard T. Kouzes, Gordon A. Anderson, Stephen T. Elbert, Ian Gorton, and Deborah K. Gracio, "The Changing Paradigm of Data-Intensive Computing", IEEE Computer Society, 2009
[6]. D. Butenhof, "Programming With POSIX Threads", Addison-Wesley, 1997
[7]. Andrew Binstock, "Threading Models for High-Performance Computing: Pthreads or OpenMP?", October 2010, http://software.intel.com/en-us/articles/threading-models-for-high-performance-computing-pthreads-or-openmp
[8]. Robert G. Brown, "Maximizing Beowulf Performance", Proceedings of the 4th Annual Linux Showcase & Conference, USENIX Association, October 2000, https://www.usenix.org/legacy/publications/library/proceedings/als00/2000papers/papers/full_papers/brownrobert/brownrobert.pdf