Breaking Bottlenecks
LSF @ AMD
28 May 2003
LSF @ AMD – History
•We’ve used LSF for 7 years, from K6 to the
Opteron and beyond
•My group has 3 large clusters, the largest with
several thousand systems, mostly Athlon Linux.
Large LSF Clusters = Hard Work!
•How do we do it?
•A Great LSF Team
•Finding and using good tools
•I’ll talk about types of tools and some specific
examples
Tools by Example
•These examples are from our large environment
•They should be useful anywhere people want to
work smarter.
Grim Reality: Thousands of Systems
•System Database
–MySQL + Perl + local programs
–Updated daily, automatically
•Trouble Ticket Database
–RT - Request Tracker
–Used by sysadmins and customers
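A minimal sketch of the kind of local program that could feed a system database once a day. The function name and the choice of fields are assumptions for illustration only; the slide does not describe the actual schema or collector.

```shell
#!/bin/sh
# Hypothetical inventory collector: prints one tab-separated record for this host.
# The fields (hostname, OS, kernel release, architecture, date) are an
# assumption for illustration, not the schema described in the talk.
collect_facts() {
    printf '%s\t%s\t%s\t%s\t%s\n' \
        "$(uname -n)" "$(uname -s)" "$(uname -r)" "$(uname -m)" "$(date +%Y-%m-%d)"
}
```

Run daily from cron on every host, the records could be appended to a shared file and bulk-loaded into MySQL, whose LOAD DATA statement reads tab-separated fields by default.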
Coordination: Large Systems Team,
Large Customer Base
•RT: trouble ticket system
•We use RT to:
–Track customer problems
–Track bugs in vendor software
–Schedule and control changes to the LSF cluster
•You need one of these, unless you like to work too
hard.
Test or Change Many (or All!) Systems?
•We use clsh (the cluster shell) to run programs on
many systems serially or in parallel.
•Clsh can execute programs in our cluster at over
600 systems/minute.
•Example: run `uname -a` on all systems in the
tx_linux netgroup:
–$ clsh -ng tx_linux 'uname -a'
•Scared yet?
–$ sudo clsh -ng tx_linux 'halt'
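clsh itself is a local tool, so the following is only a rough sketch of the same idea built from ssh and xargs. The function name fanout, the RUNNER variable, and the host list on stdin are assumptions, and the -P option is a common xargs extension (GNU and BSD) rather than part of POSIX.

```shell
#!/bin/sh
# Hypothetical stand-in for clsh (not the real tool): run one command on many hosts.
# RUNNER is the program used to reach a host. It defaults to ssh and can be
# set to echo for a dry run.
RUNNER=${RUNNER:-ssh}

# Usage: fanout JOBS COMMAND < hostlist
# Reads one hostname per line on stdin and runs COMMAND on up to JOBS hosts at
# a time, prefixing every output line with the host it came from.
fanout() {
    # xargs appends each hostname as the last argument, so inside the inner
    # shell: $0 = runner, $1 = command, $2 = host.
    xargs -n 1 -P "$1" sh -c '
        "$0" "$2" "$1" 2>&1 | while IFS= read -r line; do
            printf "%s: %s\n" "$2" "$line"
        done
    ' "$RUNNER" "$2"
}
```

For example, `fanout 20 'uname -a' < hosts.txt` would contact up to 20 hosts at once, where hosts.txt is a hypothetical file with one hostname per line. Setting RUNNER=echo first prints what would be run instead of running it, which is worth doing before anything like 'halt'.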
Programs Crashing or Hanging?
•Trace a running program
–# strace -p <PID>
•Run a program while tracing it
–$ strace -t -v -f /bin/hostname
•Everything is a file, find the files
–$ lsof -p <PID>
Teamwork
•Working Together
–While Apart!
–At inconvenient times...
•IRC – for text chat
•Telephone headsets – for voice chat!
•VNC – for shared X sessions
The Tool for Tools - Perl
•Obvious facts (?)
–Cross-platform
–Great software library (CPAN)
–Well known in the EDA and Unix world
–Fun to use (for some strange folks, anyway)
–The strong attractive force
1000 foot view of the cluster
•Cricket/RRDtool
•System Accounting
•Syslog Server
The Second Law / Entropy
•Entropy is
•Misteaks happen
•RCS/CVS/SCCS/…
–You must use revision control, or chaos will win
•Sudo
–Use sudo for root access, for logging and assigning limited
privilege
Acute vs Chronic Trouble
•How do we diagnose and fix symptoms that are not
easily reproduced?
•Lsfbug: a program for users
– Saves Unix environment
– Saves LSF environment
– Submits a test job to LSF
– Emails the output to the LSF team
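The four steps listed for Lsfbug could be sketched roughly as follows. This is not the actual Lsfbug program: the function name, the report layout, and the choice of `bsub -I hostname` as the test job are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical lsfbug-style collector; a sketch, not the tool from the talk.
lsfbug_report() {
    echo "== host and user =="
    uname -a
    id
    echo "== Unix environment =="
    env | sort
    echo "== LSF environment =="
    # LSF settings conventionally live in variables prefixed LSF_ or LSB_.
    env | grep -E '^(LSF|LSB)_' | sort
    echo "== LSF test job =="
    if command -v bsub >/dev/null 2>&1; then
        # Assumed test job: run hostname interactively through LSF.
        bsub -I hostname 2>&1
    else
        echo "bsub not found in PATH"
    fi
}
```

A user would then send the report with something like `lsfbug_report | mail -s "lsfbug report" lsf-team@example.com`, where the address is a placeholder.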
Cross-Platform Compatibility
•Use similar paths for similar tools – regardless of
the OS or OS version
– Perl should always be at the same place – even for AIX and Linux
and HP-UX and …
•Install user tools on NFS servers
•Use package management software (opt_depot,
stow)
•Install systems w/Kickstart/Jumpstart/Ignite
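One way to keep a tool at the same path everywhere is to install each package in its own directory and symlink its programs into a common bin directory, which is the general approach of tools such as stow. The sketch below illustrates that idea with plain `ln`; it is not how stow or opt_depot are implemented, and the function name and directory layout are assumptions.

```shell
#!/bin/sh
# Hypothetical helper: expose every program in PKGDIR/bin under TARGET/bin.
# PKGDIR should be an absolute path so the symlinks resolve from anywhere.
# Usage: link_pkg PKGDIR TARGET
link_pkg() {
    pkg=$1
    target=$2
    mkdir -p "$target/bin" || return 1
    for f in "$pkg"/bin/*; do
        # Skip the unexpanded pattern when the package has no programs.
        [ -e "$f" ] || continue
        # -f replaces an existing link, so upgrading a package repoints the name.
        ln -sf "$f" "$target/bin/$(basename "$f")"
    done
}
```

With this layout, users on every OS keep one path in their scripts (TARGET/bin/perl, for instance) while each platform's NFS server holds its own build underneath.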
Expertise
•Unix Generally
•OS Specific: Linux, Solaris, HP-UX
•Hardware Specific
•Networking
–routing, switching
–services
• NFS, NIS, DNS, NTP
Tools List 1
•clsh mailto:quentin.fennessy@amd.com
•cricket http://cricket.sourceforge.net
•CVS http://www.cvshome.org
•ecc module http://www.anime.net/~goemon/linux-ecc
•ethereal http://www.ethereal.com
•fping http://www.fping.com
•hping2 http://www.hping.org
•iozone http://www.iozone.org
•ircd http://www.funet.fi/~irc/server
•lsof ftp://vic.cc.purdue.edu/pub/tools/unix/lsof
Tools List 2
•mtr http://www.bitwizard.nl/mtr
•mysql http://www.mysql.com
•ntop http://www.ntop.org
•opt_depot http://www.arlut.utexas.edu/csd/opt_depot
•Perl http://www.perl.com
•RCS http://www.gnu.org/software/rcs/rcs.html
•Rrdtool http://freshmeat.net/projects/rrdtool
•rsync http://rsync.samba.org/rsync
•RT http://www.bestpractical.com/rt
•SMART tools http://smartmontools.sourceforge.net
•stow http://www.gnu.org/software/stow/stow.html
Tools List 3
•strace http://sourceforge.net/projects/strace
•sudo http://www.courtesan.com/sudo
•tusc Use Google
•vnc http://www.uk.research.att.com/vn
•xchat http://www.xchat.org
Reading List
•The Practice of System and Network
Administration, by Limoncelli and Hogan
–http://www.sysadminfocus.com
•The Unix System Administration Handbook
–http://www.admin.com
Trademark Attribution
AMD, the AMD Arrow Logo and combinations thereof
are trademarks of Advanced Micro Devices, Inc.
Other product names used in this presentation are
for identification purposes only and may be
trademarks of their respective companies.
