To support its large-scale, multidisciplinary scientific research efforts across all the university campuses, carried out by research personnel spread over virtually every corner of the state, the state of Nevada needs to build and leverage its own Cyber infrastructure. Following the well-established as-a-service model, this state-wide Cyber infrastructure, which consists of data acquisition, data storage, advanced instruments, visualization, computing and information processing systems, and people, all seamlessly linked together through a high-speed network, is designed and operated to deliver the benefits of Cyber infrastructure-as-a-Service (CaaS). There are three major service groups in this CaaS, namely (i) supporting infrastructural services that comprise sensors, computing/storage/networking hardware, operating systems, management tools, virtualization, and the message passing interface (MPI); (ii) data transmission and storage services that provide connectivity to various big data sources, as well as cached and stored datasets in a distributed storage backend; and (iii) processing and visualization services that give users access to the rich processing and visualization tools and packages essential to various scientific research workflows. Built on commodity hardware and open-source software packages, the Southern Nevada Research Cloud (SNRC) and a data repository in a separate location constitute a low-cost solution for delivering all these services around CaaS. The service-oriented architecture and implementation of the SNRC are geared to hide as much of the detail of big data processing and cloud computing as possible from end users; scientists only need to learn and access an interactive web-based interface to conduct their collaborative, multidisciplinary, data-intensive research. The capability and ease of use of the SNRC are demonstrated through a use case that derives a solar radiation model from a large dataset by regression analysis.
CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN SCIENTIFIC RESEARCH
International Journal of Computer Science & Information Technology (IJCSIT) Vol 9, No 3, June 2017
DOI: 10.5121/ijcsit.2017.9303

CYBER INFRASTRUCTURE AS A SERVICE TO EMPOWER MULTIDISCIPLINARY, DATA-DRIVEN SCIENTIFIC RESEARCH

Xiangrong Ma¹, Zhao Fu¹, Yingtao Jiang¹, Mei Yang¹, Haroon Stephen²
¹Department of Electrical and Computer Engineering, University of Nevada, Las Vegas, Las Vegas, USA 89154
²Department of Civil and Environmental Engineering and Construction, University of Nevada, Las Vegas, Las Vegas, USA 89154
ABSTRACT
In supporting its large-scale, multidisciplinary scientific research efforts across all the university campuses and by research personnel spread over literally every corner of the state, the state of Nevada needs to build and leverage its own Cyber infrastructure. Following the well-established as-a-service model, this state-wide Cyber infrastructure, which consists of data acquisition, data storage, advanced instruments, visualization, computing and information processing systems, and people, all seamlessly linked together through a high-speed network, is designed and operated to deliver the benefits of Cyber infrastructure-as-a-Service (CaaS). There are three major service groups in this CaaS, namely (i) supporting infrastructural services that comprise sensors, computing/storage/networking hardware, the operating system, management tools, virtualization, and the message passing interface (MPI); (ii) data transmission and storage services that provide connectivity to various big data sources, as well as cached and stored datasets in a distributed storage backend; and (iii) processing and visualization services that provide user access to rich processing and visualization tools and packages essential to various scientific research workflows. Built on commodity hardware and open-source software packages, the Southern Nevada Research Cloud (SNRC) and a data repository in a separate location constitute a low-cost solution to deliver all these services around CaaS. The service-oriented architecture and implementation of the SNRC are geared to encapsulate as much detail of big data processing and cloud computing as possible away from end users; instead, scientists only need to learn and access an interactive web-based interface to conduct their collaborative, multidisciplinary, data-intensive research. The capability and easy-to-use features of the SNRC are demonstrated through a use case that derives a solar radiation model from a large dataset by regression analysis.
KEYWORDS
Cyber infrastructure-as-a-Service; cloud computing; big data; MapReduce; data-driven scientific research.
1. INTRODUCTION
Funded by the U.S. National Science Foundation (NSF) since 2013, the Nevada Solar Nexus project [1] marks a long-term, large-scale, state-wide effort to study the impacts of solar energy generation on the limited water resources and fragile desert environment in the state of Nevada. One of the key elements of the Nexus project, truly multidisciplinary by nature, is to build and leverage its Cyber infrastructure (CI) to support a large array of Nexus researchers (college professors, postdocs, graduate students, undergraduate students, and community participants) in their research endeavors. To date, these Nexus researchers have collected over 30 GB of geospatial data (atmosphere, precipitation, soil, and vegetation) and 1 TB of image data from 13 sensor towers at multiple sites, and the size of the data collected is continuously growing every day at a very fast pace. These valuable data are all delivered and deposited directly to the Nevada Research Data Center (NRDC), a data repository physically located at a university campus in northern Nevada. Apart from the data storage and management services offered by the NRDC, southern Nevada is tasked to provide all the remaining cyber infrastructure services, including data integration, data mining, data analysis/processing, and data visualization, as well as other needed computing and information processing services, through a private cloud named the Southern Nevada Research Cloud (SNRC). All the sensors and the CI services provided by the NRDC and SNRC are seamlessly linked together by high-speed networks. If fully realized, this Nexus CI will drive the entire research community to take advantage of all its capabilities to pursue their data-intensive, multidisciplinary research ambitions.
Design, implementation, and operation of the Nexus CI follow the increasingly prevalent "as a service" model, hereafter dubbed Cyber infrastructure-as-a-Service (CaaS). With CaaS as the delivery model, the SNRC is specially designed to bring the traditionally separate High Performance Computing (HPC) and big data processing platforms together to create a unified cloud platform. In addition, through a web-based service endpoint, integration of the SNRC and NRDC allows end users to effortlessly access and process large remote datasets through high-speed networks and advanced data processing and storage capabilities. The CaaS can be divided into three major service groups:
Figure 1. The major service groups and services of the CaaS model.
1) Supporting Infrastructural Services that comprise all types of hardware, the operating system, management tools, virtualization, and the message passing interface (MPI);
2) Data Transmission and Storage Services that provide the connectivity to big data sources, as well as cached and stored datasets in a distributed storage backend;
3) Processing and Visualization Services that provide users access to the computing resources and tools essential to process and visualize large datasets.
All the major services in each of the three service groups of CaaS are shown in Figure 1. End users can access all of these services through a web-based interface. Note that the majority of the services are built upon open-source projects, which considerably cuts down the development cost and time. The time and financial resources thus saved are instead diverted to the components unique to the Nexus CI, such as data caching and system management. In what follows, Section 2 details the design and implementations related to the infrastructural-level services, such as hardware, operating system, virtualization, and networking. Section 3 and Section 4 cover the design and implementation issues of the storage and processing subsystems, respectively. The capability and easy-to-use features of the SNRC are demonstrated through a use case described in Section 5. Finally, the paper is summarized in Section 6.
2. SUPPORTING INFRASTRUCTURAL SERVICES
2.1. Hardware Resources
All the hardware components needed by the SNRC were determined by evaluating various factors, tradeoffs, and scenarios, including cost, turn-around time, estimation and prediction of current and future data volumes, number of users to be supported, vendor reputation, and type and level of technical support from vendors. The major hardware components of the SNRC are listed in Figure 2, and they are housed in a server rack shown in the same figure.
Figure 2. The hardware components of the SNRC.
2.2. Operating System
Traditional GNU/Linux distributions have some limitations in a cloud-computing context, for
reasons summarized below.
• Compatibility: Most traditional enterprise Linux distributions have a release cycle spanning three to five years, pinned to certain long-term-support kernels and foundational tools and libraries (e.g., C libraries). By the time a new release ships, those components may have already become obsolete, making them incompatible with many tools that are desired or must be supported for scientific research.
• Unnecessary bloat: The one-size-fits-all design tends to make the distributions unnecessarily large and bloated, which leads to an enlarged test matrix for a new release.
• Poor optimization: Since all the binary distributions attempt to support as many platforms as possible, they usually target the most popular micro-architectures in the market. Thus, new instructions and other less popular hardware features are typically poorly supported, or even left out entirely.
These limitations forced us to develop a new distribution to meet the needs of the SNRC. The new distribution was based on Gentoo Linux [6], a source-based rolling-release meta-distribution which offers a large degree of flexibility for distribution customization and optimization. Our distribution was highly customized to include just the essentials and userland applications, while performance optimization was performed to reflect the specific processor architectures and application environment that we were working with. Security was enhanced with kernel hardening [7]. The distribution already saw an average 2% performance gain with compiler-flag tuning alone. Introducing faster libraries had another huge impact on application performance; we noticed up to a 36× speedup on matrix operations when choosing OpenBLAS [8] over netlib [9]. However, deployment of the new distribution on physical computing nodes met some practical problems. First, this distribution experienced resistance from the system administrative staff, who did not possess the skill set and understanding of the underlying operating system internals. Secondly, support of physical hardware required more device drivers and administrative tools, which were not always useful in the user application environment. Instead, CentOS 7 was finally adopted as the base OS running on the physical machines (Figure 1) upon its public release, and the new distribution was later repackaged as a cloud image, running inside the virtual machines and containers.
2.3. Virtualization and Networking
Virtualization helps improve the scalability, efficiency, and availability of resources. Traditional virtualization tools, such as libvirt, were found to have poor scalability in our own experimental study.

Figure 3. The network architecture.

As the number of virtual instances increases, the network performance could drop dramatically. In our experiment, 40 virtual instances were set up by libvirt with KVM [10], but 2 out of the 40 nodes constantly suffered a packet loss rate of more than 90%. This serious problem was solved by adding OpenStack [11] and Open vSwitch (OVS) [12]. OVS supports standard management interfaces and protocols and enables network automation through programmatic extension. To isolate traffic between data processing and cluster control/management services, networks were segmented into three different Virtual LANs (VLANs), as illustrated in Figure 3.
Figure 4 shows how traffic is routed from the north to the south (i.e., traffic between a virtual instance and the external network). Firewall and packet state tracking are handled by the security groups, which can be configured within OpenStack. To send packets from a virtual instance to the external network (the Internet in our case), the instance tap interface first forwards the packets to the Linux bridge (qbr), after which the packets pass through the OVS integration bridge (br-int) port (qvb). Patch ports, a pair of virtual devices that act as a patch cable, pair the integration bridge and the tunnel bridge. The tunnel bridge sends the data packets to the controller node over a physical interface using Generic Routing Encapsulation (GRE) [13], as our payload protocols are compatible, but the payload addresses are not. The controller next forwards the data packets from the tunnel bridge to the integration bridge. Since there might be multiple virtual networks, a namespace router is included. This router forwards packets from the integration bridge to its gateway (qg), which is patched with the external bridge. The external bridge sends the data packets to the physical gateway that connects to the Internet.
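The bridge-and-patch-port plumbing described above can be sketched with a few ovs-vsctl commands. This is only a hand-provisioned illustration under assumed names (in the deployed system, OpenStack Neutron creates these ports automatically), and the GRE remote address is a placeholder.

```shell
# Create the integration and tunnel bridges
ovs-vsctl add-br br-int
ovs-vsctl add-br br-tun

# Pair them with patch ports (a virtual "patch cable")
ovs-vsctl add-port br-int patch-tun \
    -- set interface patch-tun type=patch options:peer=patch-int
ovs-vsctl add-port br-tun patch-int \
    -- set interface patch-int type=patch options:peer=patch-tun

# GRE tunnel port toward the controller node (address is a placeholder)
ovs-vsctl add-port br-tun gre0 \
    -- set interface gre0 type=gre options:remote_ip=192.0.2.10
```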
Figure 4. Network traffic flow (North-South).
2.4. Management
Not only do management tools help reduce the human labor needed, they actually are a key piece
to the success of this project. Various tools have been deployed to ease the configuration and
management task throughout the development. In specific, hardware resources are managed
through Open Stack and IPMI tools, while users are managed through LDAP. Besides these
existing tools, a number of scripts were developed in-house to automate the management tasks.
The script developed for OS installation and configuration, for instance, enables the MAC
address to be hashed into the hostnames and internal IP address, and then it generates Kick Start
files based on the customized templates. TFTP and other services were configured to allow
computing nodes to boot from the Intel Preboote Xecution Environment (PXE), loading the OS
image from the head node through the network and then completing the installation.
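The MAC-to-hostname idea above can be sketched in a few lines of Python. The hashing scheme, address range, and template here are illustrative assumptions, not the actual in-house script.

```python
# Sketch: hash a node's MAC address into a stable hostname and internal
# IP, then render a (toy) Kickstart network line from a template.
import hashlib

def mac_to_identity(mac, prefix="10.0.0."):
    # Normalize case so "AA:BB" and "aa:bb" map to the same node identity
    digest = hashlib.sha256(mac.lower().encode()).hexdigest()
    node_id = int(digest[:4], 16) % 200 + 10   # stable id in 10..209
    hostname = f"node-{node_id:03d}"
    return hostname, prefix + str(node_id)

# A toy stand-in for the customized Kickstart templates
KICKSTART_TEMPLATE = "network --hostname={hostname} --ip={ip}\n"

def render_kickstart(mac):
    hostname, ip = mac_to_identity(mac)
    return KICKSTART_TEMPLATE.format(hostname=hostname, ip=ip)
```

Because the hostname and IP are pure functions of the MAC, a reinstalled node always comes back with the same identity, with no state to keep on the head node.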
3. DATA TRANSMISSION AND STORAGE SERVICES
3.1. Data Transmission
Data transmission is one central service that needs to be provided by the CaaS. Allowing users to have seamless access to external datasets saves them a huge amount of time otherwise spent on data preparation. An efficient data transmission model shall minimize redundant data transmissions and optimize the network connection queries to give users the best possible Quality-of-Service (QoS) experience. In the SNRC, we implemented a transmission scheme that involves resource caching and connection tracking and management, as illustrated in Figure 5.
The data transmission unit consists of five major components: REST clients, a global connection manager (CM), a caching server, a sanitizer, and persistent local storage (e.g., a database or a file system). A REST client serves as an adapter to different data sources, and provides a user API to query and download datasets stored in a remote site. A caching server saves a REST request and
its corresponding response as one key/value pair to avert redundant transmissions. The CM puts all the external network requests into a queue, schedules them based on their QoS requirements, and monitors the connection status. To service one data request, the following steps are typically involved.
• The system first checks whether the data are already in the local storage system. If yes, the location of the data is immediately returned.
• If the data are not locally available, check whether they are in the caching server. If it is a hit, return the data.
• Otherwise (it is a miss), send a request to the CM, which will queue and execute that request. Upon fulfilling the request, the data will be sent back to the caller that initiated the request.
• The sanitizer will validate the data and store them into the local storage system.
Figure 5. The workflow to service a data loading request.
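The request path above can be sketched as a short Python function. The storage, cache, and CM objects here are stand-ins; in the real system they are a database or file system, the caching server, and the global connection manager.

```python
# Sketch of the four-step data-request workflow described above.
def service_request(key, local_storage, cache, cm):
    # 1) Already persisted locally? Return its location/content right away.
    if key in local_storage:
        return local_storage[key]
    # 2) Cached response from an earlier identical REST request?
    if key in cache:
        return cache[key]
    # 3) Miss: queue the request with the connection manager and wait.
    data = cm.fetch(key)
    # 4) Sanitize/validate, then persist for future requests.
    local_storage[key] = validate(data)
    return local_storage[key]

def validate(data):
    # Stand-in sanitizer: the real one would check schema and value ranges.
    return data

class DummyCM:
    """Stand-in connection manager that 'downloads' a payload."""
    def fetch(self, key):
        return f"payload-for-{key}"
```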
3.2. Distributed Storage Subsystem
A typical distributed storage system, like Red Hat Global File System (GFS) and Oracle Cluster
File System (OCFS), segments and stores data into different blobs among the hosts, while leaving
processingtoend user. A MapReduce[14] based system(Google GFS and Hadoop) is more
feasible for our purposes as it seamlessly integrates storage and processing.The MapReduce
model involves applying a map operation to key/value pairs, then a reduce operation to all the
values sharing the same key, and a merge operation to process the result. Hadoop [15] is a
framework for running applications on large clusters built from commodity hardware. It takes
advantage of data locality [16] to reduce the communication cost in parallel processing. Hadoop
implements the MapReduce paradigm and provides a distributed filesystem – the Hadoop
Distributed File System(HDFS). However, when data are stored on a single storage cluster and
shared among users, access control and QoS can be addressed by setting up two storage clusters
(Figure 6): TheVirtual Storage Cluster(VSC) and the Physical Storage Cluster(PSC).
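The MapReduce model described above (map emits key/value pairs, a shuffle groups values by key, and reduce folds each group) can be illustrated with a toy in-memory word count; this is a conceptual sketch, not Hadoop's actual API.

```python
# Toy, in-memory sketch of the MapReduce model: map -> shuffle -> reduce.
from collections import defaultdict

def map_fn(line):
    for word in line.split():
        yield word, 1              # emit one key/value pair per word

def reduce_fn(key, values):
    return key, sum(values)        # fold all values sharing the key

def mapreduce(lines):
    groups = defaultdict(list)
    for line in lines:             # map phase
        for k, v in map_fn(line):
            groups[k].append(v)    # "shuffle": group values by key
    # reduce phase, merged into one result dictionary
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

counts = mapreduce(["solar solar water", "water solar"])
```

In Hadoop, the same map and reduce steps run in parallel across the cluster, with HDFS placing the map tasks near the data blocks they read.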
Figure 6. Storage system design.

The VSC resides in a virtual cluster managed by OpenStack, and can be accessed by users. With an essentially identical architecture as the VSC, the PSC can provide similar functionalities, but it mainly services the persistent storage needs. MapReduce jobs initiated by users will be dispatched to the intermediate cluster, while streaming sensor data will be stored in the persistent cluster. Note that as high as an order of magnitude of performance penalty can be paid if a wrong storage scheme is adopted, mandating the system to be fine-tuned to support I/O-intensive applications. In this study, to maximize the I/O throughput, each physical hard drive was partitioned into two volumes: the cinder volume and the HDFS volume.

As illustrated in Figure 6, HDFS volumes are configured as a block storage device for the persistent cluster using LVM, while cinder volumes are managed by cinder (the block storage service for OpenStack), and both are exposed to the virtual instances through iSCSI (Internet Small Computer System Interface). In this context, virtual instances should be built (with written scripts) to ensure that an instance has been granted access only to the disk in its hosted node.

With respect to the diversity of real-world scientific computational needs, the SNRC was tested in terms of network performance and distributed parallel data processing performance. Since the primary focus of the SNRC is on scientific big data processing, the HiBench [18] benchmark suite was used to test overall system performance in terms of speed (i.e., job running time), throughput, and I/O bandwidth. HiBench consists of a set of workloads from both synthetic and real-world applications. In our test, four workloads (database join, aggregation, Bayesian classification, and K-means clustering) were selected to cover typical application scenarios in scientific data processing, three popular micro benchmarks were included, and two HDFS benchmarks (DFS read and write) were used for assessing the system performance of the HDFS.

Table I. Distributed processing performance.
Tests were performed on the physical storage cluster among 10 data nodes, and the results are tabulated in Table I. The HDFS workload test showed that the throughput of HDFS could reach up to 294 MB/s on read and 251 MB/s on write. When dealing with the "sort" workload (Table I), the system could process a total of 3.29 GB of data in about 35 seconds, a mere throughput of 92.68 MB/s. When the better MapReduce-based algorithms were used, TeraSort could process 32 GB of data in only 116 seconds, boosting the throughput to 274.40 MB/s, which is close to the DFS I/O peak throughput.
4. PROCESSING AND VISUALIZATION
The multidisciplinary research flavor of Nexus requires diverse tool and programming language support. For decades, scientists have been using imperative programming languages such as C, C++, or Fortran to code and run their scientific models, after which they submit their programs to a system like the Portable Batch System (PBS); parallelism is achieved by using multi-threading and message passing libraries. As the programming paradigm shifts, declarative programming languages, such as Matlab/Octave and R, have gained popularity. They offer comparable performance but demand much shorter development time.

As the world comes to rely more on distributed storage, an additional burden is put on users to acquire a deeper understanding of MapReduce, and they may even be required to learn a language both new and old to them: Java. Resilient Distributed Datasets (RDDs), abstract parallel data structures [17], allow users to explicitly persist intermediate results in memory, encapsulate implementation details away from users, and offer a rich set of operators for data processing. Spark [18], based on the concept of RDDs, is easier to program and performs better when all the data can fit into memory; it also supports real-time processing using the existing machine-learning libraries and other toolboxes.
The IPython Notebook (now known as the Jupyter Notebook) can integrate multiple computing kernels into one single computational environment, combining code execution, rich text, plots, and rich media into an interactive document. The notebook servers can be deployed in hypervisor-based VMs or Linux containers (e.g., Docker) with near-native performance. These processing frameworks and tools (and many others) were bundled together and integrated into our cloud distribution. Figure 7 shows the organization of these components. From a remote device, a user logs into the notebook system, calling a REST client to import data from a remote site into the HDFS. The data are saved in the HDFS and cached as an RDD in memory using Spark. The RDD can be processed in Octave, R, or Python using any machine learning libraries or other packages; final and intermediate results can all be visualized within the notebook instantly.

Figure 7. The processing and visualization framework.
5. USECASE STUDY: BUILDING A SOLAR RADIATION MODEL
The objective of this use case is to showcase how to build a solar radiation model that can guide the allocation of solar energy resources in different geographical locations of Nevada at different times. This model will be built from the geospatial datasets collected from the 13 sensor stations over six years, stored in the NRDC, and processed using the SNRC platform.
5.1. Data Preparation and Analysis
There are 2,236 measurements performed every minute, bringing a total of 7 billion-plus data records in the dataset. These data can be transferred from the NRDC to the SNRC Jupyter notebook with the data transmission API given below:
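The original code listing did not survive extraction; the following is a minimal sketch of what such a REST transfer call might look like. The endpoint URL and query parameter names are placeholders, not the actual NRDC API.

```python
# Sketch of a REST client call for pulling a dataset into the notebook.
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://sensor.example.edu/api/measurements"  # placeholder endpoint

def build_request_url(site, sensor, start, end):
    """Compose the query URL for one dataset request."""
    params = {"site": site, "sensor": sensor, "start": start, "end": end}
    return BASE_URL + "?" + urlencode(params)

def fetch_dataset(site, sensor, start, end, dest="dataset.csv"):
    """Download the requested records and save them to a local CSV file."""
    url = build_request_url(site, sensor, start, end)
    with urlopen(url, timeout=60) as resp, open(dest, "wb") as out:
        out.write(resp.read())
    return dest
```

In the SNRC workflow, this request would first pass through the caching server and connection manager described in Section 3.1, so a repeated query returns without touching the network.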
Since our focus is limited to solar radiation measured at photovoltaic (PV) panels, all irrelevant data records would have to be purged. After purging, the resulting CSV file would contain all the solar radiation data of the past six years, with the first nine rows as the headers, followed by rows of numerical values arranged according to their timestamp indices. The pandas library provides a two-dimensional heterogeneous tabular data structure named DataFrame [19], and arithmetic operations can be performed on both rows and columns. Note that most of the functions provided by pandas' DataFrame have their equivalents in Spark, which is also supported by the SNRC.
The downloaded CSV file can be parsed into a DataFrame by running the following:
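The original listing was lost in extraction; this is a hedged sketch of the parsing step. The column names and the nine-row header layout are assumptions based on the description above.

```python
# Sketch: parse the purged CSV into a pandas DataFrame indexed by timestamp.
import io

import pandas as pd

# Stand-in for the downloaded file: one header row plus timestamped values.
csv_text = (
    "timestamp,pv_radiation\n"
    "2016-06-01 11:59:00,512.3\n"
    "2016-06-01 12:00:00,515.8\n"
    "2016-06-01 12:01:00,514.1\n"
)

df = pd.read_csv(
    io.StringIO(csv_text),   # for the real file, pass its path instead
    # skiprows=8,            # and skip all but the last of the nine header rows
    parse_dates=["timestamp"],
    index_col="timestamp",
)
```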
In our case, parsing the downloaded file (2.2 GB) into the DataFrame took less than 1 minute to complete. We chose to study the monthly mean radiation at 12:00 AM (Pacific Time), which involved down-sampling the radiation data. Since the time series data are indexed by their timestamps, we can locate values using the built-in functions provided by pandas, and have such data later converted to a dictionary data structure with the year as its key. Now we can visualize the processed data in a bar diagram, as shown in Figure 8.a.
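The down-sampling and dictionary conversion described above might look like the following pandas sketch, run here on synthetic data; the column name "pv_radiation" and the 12:00 sample time are assumptions.

```python
# Sketch: keep one reading per day at 12:00, average by month,
# then key the monthly means by year.
import numpy as np
import pandas as pd

# Synthetic stand-in for the multi-year minute-level series.
idx = pd.date_range("2015-01-01", "2016-12-31 23:00", freq="h")
rng = np.random.default_rng(1)
df = pd.DataFrame({"pv_radiation": rng.uniform(0, 900, len(idx))}, index=idx)

noon = df.at_time("12:00")                      # one sample per day
monthly = noon["pv_radiation"].groupby(
    [noon.index.year, noon.index.month]).mean() # monthly means
by_year = {year: monthly.loc[year]
           for year in monthly.index.get_level_values(0).unique()}
```

Each `by_year[year]` is then a 12-entry series, ready to plot as one group of bars per year.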
Figure 8. a) Monthly average solar radiation, and b) daily radiation by month.

5.2. Model Curve Fitting

Figure 8.b helps scientists visualize how solar radiation varies with the hours of a day. A theoretical study would generate a physics-backed model to explain the relationship between the solar position and the reflection rate. On the other hand, since we already have enough observed data, one can quickly derive and fine-tune a numerical model using these observed data. Simple inspection of the data plotted in Figure 8.b can lead to a hypothesis that the solar radiation maintains a sinusoidal relationship with respect to time. That is,

Y = A sin(BX + C) + D

where X is the timestamp in minutes, Y is the radiation, and A, B, C, D are parameters to be estimated. These parameters, for instance, can be easily obtained with least-square approximation by calling a function from the Notebook.
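The least-square fit of A, B, C, D might be done as follows; the specific function used in the paper's notebook is not shown, so this sketch uses SciPy's `curve_fit` on synthetic data as one reasonable choice.

```python
# Sketch: fit Y = A*sin(B*X + C) + D by least squares with scipy.
import numpy as np
from scipy.optimize import curve_fit

def sinusoid(x, a, b, c, d):
    return a * np.sin(b * x + c) + d

# Synthetic stand-in for one day of radiation readings (X in minutes).
x = np.arange(0.0, 1440.0, 10.0)
rng = np.random.default_rng(0)
y = sinusoid(x, 500.0, 2 * np.pi / 1440.0, -np.pi / 2, 520.0) \
    + rng.normal(0, 5, x.size)

# p0 seeds the optimizer near a daily (1440-minute) period.
popt, _ = curve_fit(sinusoid, x, y, p0=[400.0, 2 * np.pi / 1440.0, -1.5, 500.0])
a, b, c, d = popt
```

A reasonable initial guess for B (here, one cycle per day) matters: sinusoid fits are sensitive to the starting frequency and phase.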
The regression model thus obtained can be readily compared with the observed data by plotting them together in Figure 9. Apparently, this regression model fits the observed data really well.

Figure 9. Results obtained from the regression model are compared against the observed data.

6. SUMMARY

This paper described various services and features of the SNRC, a platform specially developed to support a large research team engaging in multidisciplinary, data-driven research efforts.
Organized around the delivery model of Cyber infrastructure-as-a-Service (CaaS), the SNRC currently can store and process tens of terabytes of data and support over 100 researchers scattered around multiple campuses and remote sites. The capability of the SNRC can be easily scaled up to process hundreds to thousands of terabytes of data and support significantly more users with the installation of additional commodity hardware resources.
REFERENCES
[1] Nvsolarnexus.org, “The Solar-Energy-Water-Environment Nexus Project,” [Online]. Available:
http://nvsolarnexus.org.
[2] S. Dascalu, F. C. Harris Jr, M. McMahon Jr, E. Fritzinger, S. Strachan, and R. Kelley, “An Overview
of the Nevada Climate Change Portal,” Proc. 7th International Congress on Environmental Modelling
and Software (iEMSs), 2014, vol. 1, no. 2014, pp. 75–82.
[3] V. D. Le, M. M. Neff, R. V Stewart, R. Kelley, E. Fritzinger, S. M. Dascalu, and F. C. Harris,
“Microservice-based architecture for the NRDC,” Proc. IEEE International Conference on Industrial
Informatics(INDIN), 2015, pp. 1659–1664.
[4] Gentoo Foundation Inc., “Gentoo Linux Project,” [Online]. Available: http://www.gentoo.org.
[5] A. Chuvakin, “Linux Kernel Hardening,” [Online]. Available:
http://www.symantec.com/connect/articles/linux-kernel-hardening.
[6] Z. Xianyi, W. Qian, and Z. Chothia, “OpenBLAS,” [Online]. Available: http://xianyi.github.io/OpenBLAS.
[7] S. Browne, J. Dongarra, E. Grosse, and T. Rowan, “The Netlib mathematical software repository,” D-Lib Magazine, vol. 1, no. 3, Sep. 1995.
[8] A. Kivity, Y. Kamay, D. Laor, U. Lublin, and A. Liguori, “KVM: the Linux Virtual Machine
Monitor,” Proc. Linux Symposium, 2007, vol. 1, pp. 225–230.
[9] O. Sefraoui, M. Aissaoui, and M. Eleuldj, “OpenStack: Toward an Open-Source Solution for Cloud
Computing,” International Journal of Computer Applications, vol. 55, no. 3, pp. 38–42, Oct. 2012.
[10] B. Pfaff, J. Pettit, T. Koponen, E. Jackson, A. Zhou, J. Rajahalme, J. Gross, A. Wang, J. Stringer, P. Shelar, and others, “The design and implementation of Open vSwitch,” Proc. 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2015, pp. 117–130.
[11] D. Farinacci, T. Li, S. Hanks, D. Meyer, and P. Traina, “Generic routing encapsulation (GRE),” RFC
2784, Mar. 2000.
[12] J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters,” Communications of the ACM, vol. 51, no. 1, pp. 1–13, Jan. 2008.
[13] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, “The Hadoop distributed file system,” Proc. 26th
IEEE Symposium on Massive Storage Systems and Technologies, 2010.
[14] A. Rogers and K. Pingali, "Process decomposition through locality of reference," ACM SIGPLAN
Notices, vol. 24, no. 7, pp. 69–80, Jul. 1989.
[15] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I.
Stoica, “Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing,”
Proc. 9th USENIX conference on Networked Systems Design and Implementation, 2012, pp. 2-2.
[16] Shengsheng Huang, Jie Huang, Jinquan Dai, Tao Xie, and Bo Huang, “The HiBench benchmark suite:
characterization of the MapReduce-based data analysis,” Proc. 26th International Conference on Data
Engineering Workshops, Mar. 2010.
[17] “pandas documentation,” [Online]. Available: http://pandas.pydata.org/pandas-docs.