Table of Contents
1. Letter of Transmittal
2. Project Objectives and Goals
3. Strategic Plan
4. Organization Structure
5. Choosing the Right Vendor
6. Project Design
7. Critical Success Factors
8. Implementation Plan
8.1 IDEF0
8.2 WBS
8.3 RACI
8.4 Gantt Chart
8.5 CPM
9. Project Execution Plan
10. Project Maintenance Plan
11. Budgeting and Financial Analysis
12. Tasks and Responsibilities
13. Roles and Responsibilities
14. Risk Assessment
14.1 Risk Matrix
15. SWOT Analysis
16. Future Scope
17. References
1. LETTER OF TRANSMITTAL
Professor Klosterman
Subject: Proposal for setting up a High Performance Computing (HPC)
facility for the University.
The objective of the project is to provide state-of-the-art computing infrastructure,
resources and high-quality services to researchers across the different
departments of the university by setting up a centralized High Performance
Computing (HPC) facility. This investment is required for the university to stay
internationally competitive.
High Performance Computing is advancing rapidly and making a significant
impact in every field of science and technology. Moore's law, the observation
that the number of transistors on a chip doubles roughly every two years, implies
that computing power doubles every 18 to 24 months. This trend has held for the
last four decades, resulting in ever faster computers that steadily reduce the time
needed to solve grand challenge problems. Breakthroughs in semiconductor
manufacturing and innovative computer architectures built from commodity parts
make High Performance Computing affordable and manageable today.
The estimated initial budget for the HPC facility is US$ 2 million. The computer
system will need a periodic upgrade roughly every 3 years to take advantage of
new technologies.
Sincerely,
Sharyu Mahale
2. PROJECT OBJECTIVES AND GOALS
Ø The main objective of this project is to set up a centralized HPC facility in
the university to cater to the computing needs of students and researchers
Ø To evaluate, procure and install the HPC system within 2-3 months and within
the allocated budget of US$ 2M
Ø To choose the HPC system that provides seamless scalability for at least 5
years, resulting in investment protection and smooth enhancement of
computing capabilities to keep in step with technology advancements
Ø To ensure that the computing resources are always made available to the
researchers with minimal or no downtime
Ø To set up a team of specialists who can monitor the usage of the facility
constantly
Ø To encourage industry collaboration in areas of national interest and get
additional funding by way of project grants
Ø To create a prestigious reference site (a case study) for business promotion
and brand building, so as to be recognized as a leading HPC solution provider
Ø To strengthen relationships with the various OEMs involved in the project for
future support
Ø To attract OEMs who are left out of the current project for upcoming
requirements
3. STRATEGIC PLAN
Ø Routine communications between the stakeholders and the project manager
Ø Adopting a collaborative, customer-centric approach to gain confidence
Ø Motivating the team
Ø Organizing online webinars on new technology topics and roadmap sharing
Ø Conducting training programs to facilitate customer participation
Ø Offering training and certification for various software tools, compilers and
CUDA programming to attract customers
4. ORGANIZATION STRUCTURE
Ø A projectized organization structure is used, as every employee is assigned a
specific task for the entire course of the project
Ø The project manager is consulted on and accountable for every task
Ø The PM will communicate with the clients and resolve any grievances
Figure 1: Organizational Structure (boxes shown: Project Manager; External Consultant; Purchasing Department; Purchase Manager; Finance Manager; Sales Representative; Engineering Department; Systems Administrator; Domain Specialist; Computer Science Engineer; Electrical Department; Site Preparation Specialist; Power and Cooling Specialist)
5. CHOOSING THE RIGHT VENDOR
Ø After matching the needs of the University, appropriate vendors (OEMs)
providing different solutions are chosen.
Following is the vendor list:
1. Fujitsu, which constitutes the biggest component of the overall solution and
includes:
• Fujitsu PRIMERGY Servers
• Fujitsu ETERNUS Storage
• Fujitsu HCS Cluster Manager
2. Mellanox InfiniBand Switches as Primary Interconnect
3. D-Link/Digisol Network Switches as Secondary Interconnect
4. Intel Parallel Studio XE 2015
Ø The design is a fully balanced HPC service that maintains synergy between the
fundamental factors affecting the efficiency of an HPC system, namely: compute
power, memory throughput and storage I/O subsystem capabilities
Ø The architecture is chosen carefully to meet the requirements. The latest
technological products will ensure processing performance delivery even under
very demanding or diverse environments
6. PROJECT DESIGN
Ø After understanding the requirements of HPC, the following design is proposed
Ø The proposed logical diagram recommends a solution that best meets the
requirements
Figure 2: Logical diagram for the 224-node cluster
7. CRITICAL SUCCESS FACTORS
The successful setting up of an operational HPC Facility depends on the following
factors:
Ø Forming a core group of experts specializing in computer architecture and
software, together with domain specialists. An external consultant should
preferably be part of this team
Ø Prepare a comprehensive document that clearly spells out the architecture and
performance requirements of the proposed HPC system. This requires a great
deal of interaction with the key user community and their specific computing
needs
Ø Shortlist and invite globally reputed computer vendors with proven capabilities
for a technical presentation to the University’s core committee members
Ø Critical evaluation of bids from selected vendors and choosing a vendor among
the technically qualified ones with the best price/performance
Ø Ensuring successful implementation of the selected HPC system and running
comprehensive tests on the system to ensure that it meets the performance
parameters set forward in the RFP
Ø Providing the computing resources to various users based on established criteria
and ensuring maximum uptime of the system. This has to be achieved by a team
of engineers from the supplier providing 24x7 onsite technical support
8. IMPLEMENTATION PLAN
Ø Once all the documents required for the HPC system are gathered, the project
moves on to its implementation phase
Ø Tools like IDEF0, WBS, RACI, Gantt Chart and CPM help keep the project on
time and on budget
IDEF0
Ø IDEF0 is a function modeling methodology for describing
manufacturing functions, which offers a functional modeling language for the
analysis, development, reengineering, and integration of information systems;
business processes; or software engineering analysis[I]
Figure 3: IDEF0 Diagram
WBS
Ø This tool breaks the work down into subtasks, which makes the various tasks
easy to monitor
Ø A work breakdown structure (WBS), in project management and systems
engineering, is a deliverable-oriented decomposition of a project into smaller
components[II]
Figure 4: Work breakdown structure (WBS) Diagram
RACI
Ø This tool helps the project manager keep track of who holds authority over
each part of the project and monitor the work
Ø Uncertainty about authority can be avoided by allotting responsibilities to
the various team members
Figure 5: RACI Diagram
RACI stands for: R - Responsible, A - Accountable, C - Consulted, I - Informed
• Responsible = person or role that actually does or completes the item
• Accountable = person or role answerable for ensuring that the item is
completed
• Consulted = person or role whose subject matter expertise is required in
order to complete the item
• Informed = person or role that needs to be kept informed of the status of
item completion
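A RACI chart can also be checked mechanically. The sketch below is illustrative only: the task names and role assignments are hypothetical (they are not taken from Figure 5), and the rule it enforces, exactly one Accountable and at least one Responsible per task, is a common RACI convention rather than something stated in this proposal.

```python
# Hypothetical RACI chart: task -> {role: RACI letter}.
# Task names and assignments are illustrative, not taken from Figure 5.
raci_chart = {
    "Prepare RFP": {
        "Project Manager": "A",
        "Purchase Manager": "R",
        "External Consultant": "C",
        "Finance Manager": "I",
    },
    "Install cluster": {
        "Project Manager": "A",
        "Systems Administrator": "R",
        "Power and Cooling Specialist": "C",
    },
    "Run acceptance tests": {
        "Systems Administrator": "R",
        "Domain Specialist": "C",
    },
}


def find_raci_problems(chart):
    """Return (task, problem) pairs for rows that break the assumed convention."""
    problems = []
    for task, assignments in chart.items():
        letters = list(assignments.values())
        if letters.count("A") != 1:
            problems.append((task, "needs exactly one Accountable"))
        if letters.count("R") < 1:
            problems.append((task, "needs at least one Responsible"))
    return problems


for task, problem in find_raci_problems(raci_chart):
    print(f"{task}: {problem}")
```

In this made-up chart only "Run acceptance tests" is flagged, because nobody is marked Accountable for it.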
Gantt chart
Ø The Gantt chart illustrates the start and end dates of every task in the project
Ø Interdependencies between the tasks are taken into consideration
Ø Independent tasks are carried out simultaneously to reduce the total number
of days
CPM (Critical Path Method)
Ø The project can be completed in 66 days
Ø The critical path for the project is START-2-5-7-8-9-13-15-17-18-20-END
Ø None of the activities on the critical path can be delayed; they need to be done
on time to avoid any delay in the overall time frame
Ø Activities 3, 4, 11, 12 and 14 can be delayed within their available float, as
they are not part of the critical path
Figure 7: Critical Path Diagram
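The critical path method can be sketched as a forward pass (earliest finish times) followed by a backward pass (latest finish times); activities with zero float form the critical path. The activity network below is hypothetical and much smaller than the project's own network in Figure 7; the names `activities` and `critical_path` are introduced only for this illustration.

```python
# Hypothetical activity network: name -> (duration in days, list of predecessors).
# These are NOT the project's activities; they only illustrate the method.
activities = {
    "A": (3, []),
    "B": (5, ["A"]),
    "C": (2, ["A"]),
    "D": (4, ["B", "C"]),
    "E": (1, ["D"]),
}


def critical_path(acts):
    """Return (project duration, critical activities, float per activity)."""
    # Forward pass: earliest finish of each activity.
    earliest_finish = {}
    order = []  # topological order, built as activities become schedulable
    remaining = dict(acts)
    while remaining:
        ready = [name for name, (_, preds) in remaining.items()
                 if all(p in earliest_finish for p in preds)]
        if not ready:
            raise ValueError("cycle or unknown predecessor in activity network")
        for name in ready:
            duration, preds = remaining.pop(name)
            start = max((earliest_finish[p] for p in preds), default=0)
            earliest_finish[name] = start + duration
            order.append(name)
    project_duration = max(earliest_finish.values())

    # Backward pass: latest finish that does not delay the project.
    latest_finish = {name: project_duration for name in acts}
    for name in reversed(order):
        duration, preds = acts[name]
        for p in preds:
            latest_finish[p] = min(latest_finish[p], latest_finish[name] - duration)

    total_float = {n: latest_finish[n] - earliest_finish[n] for n in acts}
    critical = [n for n in order if total_float[n] == 0]
    return project_duration, critical, total_float


duration, critical, slack = critical_path(activities)
print(duration, critical, slack)
```

For this toy network the project takes 13 days, the critical path is A-B-D-E, and activity C has 3 days of float, which is the sense in which non-critical activities can be delayed.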
9. PROJECT EXECUTION PLAN
Content | Description | Time | Remark
Open hardware and check BOM | Open hardware boxes, check and verify BOM with customer | 1st day |
Rack Stacking | Mount hardware on rack and network cabling | 2nd day |
Update Firmware | Update hardware firmware | 3rd day | If required
Installation | Installation of Rocks cluster manager and CentOS on master node and compute nodes; installation of Torque resource manager; installation of InfiniBand driver and IB-aware MPI; installation of open source compiler | 3rd day |
Application Installation | Installation of user application, run and test sample program | 4th day |
Benchmarking submission | Benchmark running and submission after customer satisfaction | 5th day |
Documentation | Prepare and submit documentation | One week after installation |
Training | As required in the RFP | |
10. PROJECT MAINTENANCE PLAN
In addition to cutting-edge products and solutions, Micropoint Computers Private
Limited, in association with Fujitsu and other OEMs, delivers world-class
maintenance and support services: from integrating new technologies into IT
infrastructures to providing fast and uncomplicated maintenance and support,
proactively and reactively, according to defined SLAs.
Complex HPC solutions often result in multivendor environments whose
maintenance services have to be managed effectively. Coordination between several
hardware and software service providers is necessary to ensure the smooth running
of an HPC environment.
Some of the key features of Managed Maintenance services are as follows:
Ø Establish clearly defined service responsibilities
Ø Provide a central contact person for maintenance and support
Ø Provide real-time information about service levels and the quality of the service
delivery
Ø Guarantee the availability of the multivendor environment according to the
defined SLA
Ø Enable flexible modification of services to meet changing requirements
Ø Reduce costs and ensure greater economic viability for maintenance work in
multivendor environments
Ø Improve the satisfaction of internal and external customers
Ø Availability of spares specific to installation and locations across the country
Ø Fixing any technical glitches on time to avoid shutdowns
11. BUDGETING AND FINANCIAL ANALYSIS
Ø The product-wise cost for every specification is given below. A bottom-up
budget analysis is done for this project
Ø The initial investment for this project is obtained from the company's funds
Ø The total cost for the project is US$ 1,844,341.80 (excluding the optional
items)
Ø The major portion of the project cost is direct cost
Terms of Payment
Ø 50 percent advance before project execution
Ø Balance 40 percent after the delivery of all equipment
Ø 10 percent after satisfactory installation and commissioning
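The milestone amounts implied by these terms can be worked out as follows. This is a minimal sketch: it assumes the percentages apply to the quoted total of US$ 1,844,341.80 without optional items, which the proposal does not state explicitly, and the names `PAYMENT_TERMS` and `payment_schedule` are introduced only for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Milestone shares from the Terms of Payment above.
PAYMENT_TERMS = [
    ("Advance before project execution", Decimal("0.50")),
    ("After delivery of all equipment", Decimal("0.40")),
    ("After installation and commissioning", Decimal("0.10")),
]


def payment_schedule(contract_value):
    """Split a contract value into milestone payments, rounded to cents."""
    cent = Decimal("0.01")
    return [(label, (contract_value * share).quantize(cent, rounding=ROUND_HALF_UP))
            for label, share in PAYMENT_TERMS]


# Assumption: the terms apply to the quoted total without optional items.
total = Decimal("1844341.80")
for label, amount in payment_schedule(total):
    print(f"{label}: US$ {amount:,}")
```

Under that assumption the three payments are US$ 922,170.90, US$ 737,736.72 and US$ 184,434.18.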
Finances for the project
Ø Micropoint Computers Private Limited is an established HPC solution provider
with a reputation and experience built over the last 15 to 20 years
Ø Micropoint has excellent relations with, and open credit facilities from, OEMs
(Original Equipment Manufacturers) and their distributors, such as DELL, Fujitsu,
Mellanox, Intel, NVIDIA, Ingram, and Avnet
Ø The credit facility is available for a limited period of 45 to 90 days
Ø The credit period can be extended if deliveries or customer payments are
delayed
Ø Only in extreme cases are bank funds and short-term finance required; this
would amount to about half a million dollars for a period of 3 to 6 months
Product-wise Cost:
Sr. No | Name and Specification of the Item | Qty | Unit Price | Total Price
1 | Master Nodes | | |
1.1 | Master Node (in HA Mode) | 2 | US$ 7,612.00 | US$ 15,224.00
1.2 | Login/Compilation nodes | | |
1.2.1 | CPU Only | 2 | US$ 7,612.00 | US$ 15,224.00
1.2.2 | GPU Enabled - Two Nodes in One Chassis | 1 | US$ 14,258.00 | US$ 14,258.00
1.2.3 | Intel Xeon Phi Enabled - Two Nodes in One Chassis | 1 | US$ 14,258.00 | US$ 14,258.00
2 | Compute Nodes | | |
2.1 | Only CPU (64GB RAM) | 126 | US$ 5,609.00 | US$ 706,734.32
2.2 | With 512GB RAM | 4 | US$ 10,542.00 | US$ 42,168.00
3 | GPU Enabled Compute Nodes - Two Nodes in One Chassis | 8 | US$ 26,530.00 | US$ 212,239.96
4 | Intel Xeon Phi Enabled Compute Nodes - Two Nodes in One Chassis | 8 | US$ 27,259.00 | US$ 218,072.00
5 | Management node | 1 | US$ 5,810.00 | US$ 5,810.00
6 | Storage: 1 SET | 1 | US$ 221,351.00 | US$ 221,351.00
7 | Software | | |
7.1 | Cluster management tool, 1 SET. Unit price per node of tool must be quoted to take into account any changes in the quantity of compute nodes. | 1 | US$ 29,500.00 | US$ 29,500.00
7.2 | Intel Parallel Studio XE 2015 cluster edition with 10 user floating licenses, 1 SET | 1 | US$ 28,067.75 | US$ 28,067.75
7.3 | Operating System - CentOS, open source. Unit price per node of OS must be quoted to take into account any changes in the quantity of compute nodes. | 1 | US$ 0.00 | US$ 0.00
8 | Primary Communication Network | | |
8.1 | Single chassis switch scalable up to 300 ports, with backplane configured for 100% non-blocking for all ports, without leafs. As per BOM | 1 | US$ 97,076.00 | US$ 97,076.00
8.2 | Leaf blade with optical cables, quantity same as number of ports per leaf blade, of appropriate lengths. As per BOM | 12 | US$ 13,095.00 | US$ 157,139.96
9 | Secondary Communication Network - As per BOM | 1 | US$ 22,607.00 | US$ 22,607.00
GRAND TOTAL | One million seven hundred ninety-nine thousand seven hundred thirty | | | US$ 1,799,730.00
10 | Installation and commissioning charges | LS | Free of cost | Free of cost
11 | Any additional items required to complete the solution (provide detailed itemized list). On-site spares as per requirement | 1 | Free of cost | Free of cost
12 | Manpower on site for system administration (for 1st year) | 2 | US$ 7,435.30 | US$ 14,870.60
13 | Manpower on site for system administration (for 2nd year) | 2 | US$ 7,435.30 | US$ 14,870.60
14 | Manpower on site for system administration (for 3rd year) | 2 | US$ 7,435.30 | US$ 14,870.60

Optional Items:
Sr. No | Name and Specification of the Item | Qty | Unit Price | Total Price
1 | AMC for all above mandatory items for 4th year | LS | US$ 266,526.21 | US$ 266,526.21
2 | AMC for all above mandatory items for 5th year | LS | US$ 3,165,226.21 | US$ 3,165,226.21
3 | Manpower on site for system administration (for 4th year) | 2 | US$ 7,435.30 | US$ 14,870.60
4 | Manpower on site for system administration (for 5th year) | 2 | US$ 8,950.45 | US$ 17,900.90
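The bottom-up roll-up behind these figures can be reproduced with a short calculation. The sketch below copies the Total Price values of mandatory items 1.1 to 9 and the three years of on-site manpower (items 12 to 14) from the table; the variable names are introduced only for illustration.

```python
from decimal import Decimal

# Total prices in US$ copied from mandatory items 1.1 to 9 of the cost table.
line_totals = {
    "1.1": "15224.00", "1.2.1": "15224.00", "1.2.2": "14258.00", "1.2.3": "14258.00",
    "2.1": "706734.32", "2.2": "42168.00", "3": "212239.96", "4": "218072.00",
    "5": "5810.00", "6": "221351.00", "7.1": "29500.00", "7.2": "28067.75",
    "7.3": "0.00", "8.1": "97076.00", "8.2": "157139.96", "9": "22607.00",
}
hardware_software = sum(Decimal(value) for value in line_totals.values())

# Items 12 to 14: on-site system administration for years 1 to 3.
manpower = 3 * Decimal("14870.60")

print(hardware_software)             # sum of items 1.1 to 9
print(hardware_software + manpower)  # project cost without optional items
```

The line totals add up to US$ 1,799,729.99 and, with manpower, to US$ 1,844,341.79. Both are within one cent of the stated grand total and project cost; the difference presumably comes from rounding in the quoted prices.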
12. TASKS AND RESPONSIBILITIES
The project consists of the following three key phases for successful
implementation
1. Project proposal and approval
Ø This is a very critical first phase of the project. It includes preparation of a
detailed requirement document that clearly outlines the need for the HPC
system, the ROI analysis, proposed configuration and approximate budget for
procurement, implementation, training and successful deployment of the HPC
system
Ø This document should clearly specify other costs such as real estate, electric
supply, UPS with backup generators, computer site preparation, manpower
requirements for HPC operations
Ø The proposal has to be sent to the funding agency after the document is
approved by the core committee of the University and the Director. Once the
project funding is approved, the other phases will follow
2. Vendor selection and HPC system installation
Ø This phase consists of sending Request for Proposals to the HPC vendors,
selection of the HPC vendor after a detailed and comprehensive evaluation
process, awarding the contract and supply and installation of the HPC system
and acceptance of the same after running the stipulated tests
3. HPC system deployment
Ø The final phase consists of releasing the HPC resources to the users, monitoring
the usage and maintaining uptime of the system
13. ROLES AND RESPONSIBILITIES
Each department and its personnel have a specific role to play which leads to the
successful implementation of the HPC project.
Administration and Finance Department
Ø The administration & finance department is responsible for the commercial and
financial aspects of the project
Ø Its job is to release the final RFP to the vendors, validate the commercial
aspects of the proposal and ensure that this is in line with the Terms and
conditions of the RFP in all respects, coordinate with the technical evaluation
committee to shortlist the qualified vendors, enter in to contract negotiations
and issue the Purchase Contract to the selected vendor
Ø This department is also responsible for releasing payment to the vendor after
they have fulfilled their obligations as per the RFP stipulations
Engineering Department
Ø The engineering department is in charge of setting up and running the plant
and handles all of its technical aspects
Ø The engineers make sure the plant is set up and runs smoothly, and they also
troubleshoot any occasional problems that may arise
Computer Science Department and Computer Centre
Ø They form a part of the technical evaluation committee and play a major role in
forming the technical specifications of the project. They are also responsible for
evaluation and validation of the technical aspects of the bids received from
various vendors
Ø Once the HPC system is delivered and installed, this department will oversee
the various tests performed on the HPC system as per the RFP norms and sign
off the acceptance test report
Domain specialists
Ø The domain specialists come from various user departments and are experts in
various disciplines – Physical sciences, Life Sciences, engineering and others,
who actually make sure that the HPC system configuration is optimized for the
various applications
Ø This group is responsible for defining the performance criteria of the HPC
system and providing benchmark programs to be tested on the system
External Consultant
Ø The external consultant comes from a reputable outside organization that has
already procured and deployed a similar HPC system
Ø The consultant provides technical input to the University team from the
beginning, through the RFP and vendor selection, up to the acceptance of the
HPC system
Ø The consultant can only make recommendations to the university but cannot
make the final decision
Electrical engineer and site preparation specialist
Ø These experts are responsible for the power and cooling aspects as well as
defining the space requirements and data center preparation
14. RISK ASSESSMENT
RISK MATRIX
Ø Different risks associated with the project are shown below along with their
likelihood and severity
Ø Critical risks should be given higher priority than the others, as they can have
adverse effects on the project
Risk No | Risk Associated | Likelihood | Severity
1 | Technical glitches | 5 | 4
2 | Delay in time frame | 3 | 4
3 | Scope creep | 2 | 4
4 | Over budget | 2 | 3
5 | Low team motivation | 2 | 4
6 | Conflicts of opinions | 2 | 2

Risk matrix (cells list risk numbers):
Probability \ Severity | Low | Medium | High
High | | | 1
Medium | | | 2
Low | 6 | 4 | 3, 5

Legend: Ignore the threats / Monitor the threats / Critical threats
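One way to turn the ratings above into a priority order is to score each risk as likelihood times severity. This is a sketch under stated assumptions: the multiplication rule is a common convention that the proposal does not itself specify, and the Low/Medium/High cut-offs in `band` are chosen here only so that they agree with where each risk sits in the matrix.

```python
# (risk number, description, likelihood, severity) from the risk table above.
risks = [
    (1, "Technical glitches", 5, 4),
    (2, "Delay in time frame", 3, 4),
    (3, "Scope creep", 2, 4),
    (4, "Over budget", 2, 3),
    (5, "Low team motivation", 2, 4),
    (6, "Conflicts of opinions", 2, 2),
]


def band(value):
    """Map a rating to Low/Medium/High.

    The cut-offs are an assumption chosen to agree with the matrix placement,
    not values given in the proposal.
    """
    if value >= 4:
        return "High"
    if value == 3:
        return "Medium"
    return "Low"


# Rank by likelihood x severity (assumed scoring rule); ties keep table order.
ranked = sorted(risks, key=lambda r: r[2] * r[3], reverse=True)
for number, name, likelihood, severity in ranked:
    print(number, name, likelihood * severity, band(likelihood), band(severity))
```

With this scoring, technical glitches (score 20) and delay in time frame (score 12) rank highest, matching the two risks that the matrix places above the Low probability row.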
15. SWOT ANALYSIS
STRENGTHS | WEAKNESSES | OPPORTUNITIES | THREATS
Sufficient capital available | Currency exchange rates | Niche target market | Time constraints
Experience in the field for 25 years | Time frames, deadlines and pressures | New technologies | Scope creep
Availability of excellent vendors | | Expansion of customer base |
Strong technical support | | Constant improvement in technical support |
16. FUTURE SCOPE
Once the HPC installation is complete, modifications can be carried out
effectively.
Ø Additional material can be added at extra cost, as per the line-item price
breakup offered
Ø Depending on demand, the processing power can be increased by adding
compute nodes and increasing the memory of the existing system for better performance
Ø Similarly, the storage system can be upgraded seamlessly for additional capacity.
The design is scalable for further expansion over the next 18 months
(generally after 12 to 18 months the technology changes and the hardware
becomes irrelevant for high performance