Evergreen Sysadmin Survival Skills
Slides that accompanied a three-hour crash training course on sysadmin survival skills useful for sysadmins of Evergreen open source library software. Session led by Don McMorris, Equinox Software.

Published in: Education, Technology
Transcript

  • 1. Evergreen Sysadmin Survival Skills A presentation for the Evergreen International Conference 2009 Don McMorris, Equinox Software Inc.
  • 2. Target Audience • Personnel who will administer the ILS servers or will provide upper-tier support in an Evergreen installation, including roles in: • Hardware • Operating System • Network • Monitoring • Deep analysis/troubleshooting
  • 3. Assumptions • Experience administering Debian-based systems at the console level • File structure organization • Configuration file maintenance • dpkg/apt, CPAN, configure/make/make install, etc. • Success installing basic Evergreen 1.4 server • Administration of a medium to large deployment • A lot of the talk will still be relevant to smaller installs too, though!
  • 4. OpenSRF architecture (handout) • XMPP Server (via ejabberd) • Facilitates IPC • OpenSRF Router • Tracks what applications are running, and where • Directs requests to individual service instances • OpenSRF Service • Takes requests, performs some type of process, and responds to the client • Evergreen consists of several OpenSRF services • OpenSRF Client • Sends requests to OpenSRF services • May be standalone or running as part of service (sub-requests)
  • 5. Source: http://evergreen-ils.org/documentation/OpenILS_Structural_Overview.png
  • 6. Source: http://www.evergreen-ils.org/documentation/OpenSRF_Server_Architecture.jpg
  • 8. Core OpenSRF Services • opensrf.settings • opensrf.math • Used to test OpenSRF communication chain • opensrf.dbmath • Used to test OpenSRF communication chain and application database connectivity.
  • 9. Evergreen OpenSRF Services • open-ils.actor (user/org unit update functions) • open-ils.auth (user authentication) • open-ils.cat (cataloging functions) • open-ils.circ (circulation functions) • open-ils.collections (money collection [agency] functions) • open-ils.creditcard (WIP, credit card payment processing) • open-ils.cstore (general retrieval functions; C implementation) • open-ils.fielder (field mapper functions) • open-ils.ingest (process bib records – create indexes/etc.) • open-ils.penalty (standing penalty functions) • open-ils.permacrud ([user] permission functions)
  • 10. Evergreen OpenSRF Services (cont.) • open-ils.reporter (reporter functions) • open-ils.reporter-store (reporter storage functions [C implementation]) • open-ils.search (search functions) • open-ils.storage (general retrieval functions; perl implementation) • open-ils.supercat (returns bib records in various formats) • open-ils.vandelay (end-user bib import/export functions)
  • 11. Typical Hardware Setup • Database boxes • Main RW DB • RO DBs • Report server w/RO DB • N+1 Server bricks, each consisting of: • One "head" (ejabberd, apache, open-ils.settings) • One or more "drones" (general application processing) • A "single server brick" refers to a single server with ejabberd, apache, and all applicable applications • Utility servers • N+1 SIP servers (may be part of server bricks) • "Batch processing" server (fine generator, hold targeter, etc) • 2X memcache servers • 2X firewall/Load balancers • Logger • Monitoring/alerts, mail, etc.
  • 12. Typical Hardware Setup • Redundant Ethernet • All servers have dual NICs bonded in active/failover mode • Each NIC wired to independent Ethernet switch • Redundant power (on Critical servers [DB, Memcache, etc]) • Multiple power supplies (2-4) • Each power supply independently fed • Separate electrical panel • Separate UPS system • Hardware RAID • Debian Linux
  • 13. Hardware Example • 2M patrons • 10M items • 16M circs/year • ~280 branches
  • 14. Hardware Example • T3 (45mbit) to library network; redundant 10m to public Internet • Database boxes • Main RW DB (4x quad-core Xeon, 64GB RAM) • 2X RO DBs (4x quad-core Xeon, 64GB RAM) • Report server w/RO DB (4x quad-core Xeon, 32GB RAM) • Server bricks, each consisting of: • Head (quad-core Xeon, 4GB RAM) • 2X drones (2x quad-core Xeon, 8GB RAM) • Utility servers • 2X SIP servers (2x quad-core Xeon, 16GB RAM) • Batch processing server (quad-core Xeon, 4GB RAM) • 2X memcache servers (dual-core Xeon, 4GB RAM) • 2X firewall/Load balancers (quad-core Xeon, 4GB RAM, 4 NIC) • Logger (dual-core Opteron, 4GB RAM)
  • 15. Typical Software Setup • Debian Lenny • PostgreSQL 8.2 (8.3 in testing) • Syslog-ng • Nagios (with NRPE) • Linux-ha, lvs, ldirector • NFS • OpenSRF 1.0 • Evergreen 1.4
  • 16. Network Architecture • Border connection(s) come in to pair of load balancer/firewall boxes • Active/Hot Standby • Linux HA/Heartbeat used to perform automatic failover • LVS/ldirector used to query brick heads (ldirectorping.txt) • Brick is "removed from rotation" when ldirectorping.txt isn't found or has invalid contents • Incoming connections load-balanced round-robin to brick heads • Connections NAT'd • LAN consists of 2 separate physical switches • Each server has one connection to each • Switches are linked via cross-over cable or similar • Packet manipulators (such as caching proxies) tend to cause issues
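
The "removed from rotation" rule can be sketched as a small shell function. This is only an illustration of the decision the slide describes, not ldirector's own implementation: the function name, the file path argument, and the expected marker string are all hypothetical.

```shell
#!/bin/sh
# Sketch: decide whether a brick head stays in rotation based on its
# ping file. Both arguments are assumptions for illustration:
#   $1 = path to a local copy of ldirectorping.txt
#   $2 = the exact line the file is expected to contain
brick_in_rotation() {
    ping_file=$1
    expected=$2
    # File not found (or unreadable): remove the brick from rotation.
    [ -r "$ping_file" ] || return 1
    # Invalid contents: remove the brick from rotation.
    # -x matches whole lines only, -F treats the marker as a fixed string.
    grep -qxF -e "$expected" "$ping_file"
}

# Example use (hypothetical path and marker):
#   brick_in_rotation /tmp/ldirectorping.txt pong && echo "in rotation"
```

Renaming or emptying the ping file on a brick head is then enough to drain that brick before maintenance.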
  • 18. Open inbound ports (border) • 80 HTTP • 443 HTTPS • 22 SSH (possibly restricted by source IP) • 210 Z39.50 server • 6001 SIP2 RAW (NOTE: SIP communication is plain-text)
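
As a rough sketch, the port list above could be turned into iptables commands like this. The function only prints the commands so they can be reviewed first. The use of the INPUT chain and the default SSH source network (a documentation-only range) are assumptions, not part of the original setup.

```shell
#!/bin/sh
# Sketch: print (do not apply) iptables commands for the border ports.
# $1 = source network allowed to use SSH; the default is a placeholder
#      documentation range and must be replaced with a real admin network.
print_border_rules() {
    ssh_source=${1:-192.0.2.0/24}
    # HTTP, HTTPS, Z39.50 and SIP2 are open to any source.
    for port in 80 443 210 6001; do
        echo "iptables -A INPUT -p tcp --dport $port -j ACCEPT"
    done
    # SSH is restricted by source IP.
    echo "iptables -A INPUT -p tcp -s $ssh_source --dport 22 -j ACCEPT"
}

# Example use: review the output, then pipe it to sh as root if it looks right.
#   print_border_rules 10.1.0.0/16
```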
  • 19. Logging • Syslog-ng • Central logging server • All logs, including Evergreen, Apache, Postgres, and system • Typical directory structure: • Evergreen: /var/log/remote/prod/%Y/%m/%d/file.%H.log • System: /var/log/remote/sys/$HOSTNAME/%Y/%m/%d/file.%H.log • gzip nightly • Off-system archive as necessary
  • 20. Checking the logs
      • Identify which log file(s) to look at
      • What date and time are you looking for?
      • Identify thread trace from logs
        DEMOSYS-DB2:/var/log/remote/prod/2009/04/30# grep 312345000098765 osrfsys.10.log
        ...
        2009-04-30 10:54:02 DEMOSYS-BRICK0-DRONE1 open-ils.circ: [INFO:5119:ScriptRunner.pm:60:1241097309492131] script_runner: circ_permit_hold : Patron=98725, Patron_Username=21234000054577, Patron_Profile_Group=Patron, Patron_Fines=, Patron_OverdueCount=, Patron_Items_Out=, Patron_Barcode=21234000054577, Patron_Library=Summer City Public Library Requestor=circ1, Copy=3870901, Copy_Barcode=312345000098765, Copy_status=Available, Circ_Mod=DVD, Circ_Lib=SUMMER, Copy_location=Stacks, Item_Owning_lib=4, Volume=5834287, Record=1984545, Is_Renewal: no, Is_Precat: no, Hold_request_lib=SUMMER, Hold_Pickup_Lib=4
        ...
      • Search for thread trace to get the complete thread
        DEMOSYS-DB2:/var/log/remote/prod/2009/04/30# grep ':1241097309492131]' osrfsys.10.log
      • If logs have been gzip'd, 'zgrep' can be used
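
The two grep steps can be combined into one helper. This is a sketch under the assumption that every Evergreen log line carries a bracketed block shaped like the example, [LEVEL:PID:File:Line:Trace]; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: print the complete thread for the first log line that mentions
# a search string such as a barcode.
#   $1 = string to search for
#   $2 = log file (plain, or gzip'd with a .gz suffix)
thread_for() {
    search=$1
    logfile=$2
    # Use zgrep for compressed logs, plain grep otherwise.
    case $logfile in
        *.gz) grep_cmd=zgrep ;;
        *)    grep_cmd=grep ;;
    esac
    # The thread trace is the last colon-separated field inside [...].
    trace=$("$grep_cmd" -F -e "$search" "$logfile" | head -n 1 |
        sed -n 's/.*\[[A-Z]*:[0-9]*:[^]]*:\([0-9][0-9]*\)\].*/\1/p')
    [ -n "$trace" ] || return 1
    # Same pattern as the manual step: ':<trace>]'
    "$grep_cmd" -F -e ":$trace]" "$logfile"
}

# Example use:
#   thread_for 312345000098765 osrfsys.10.log
```

Only the first matching line is used to pick the trace, so a barcode that appears in several threads within the same hour still needs a manual look.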
  • 21. Checking the logs (cont.)
      • Checking the postgres logs may be worthwhile also
        DEMOSYS-DB2:/var/log/remote/prod/2009/04/30# grep 312345000098765 pg.10.log
        ...
        2009-04-30 10:56:04 DEMOSYS-DB1 postgres[4298]: [607-7] barcode = '312345000098765', call_number = 5834287, circ_as_type = NULL, circ_lib = 4, circ_modifier = 'DVD', circulate = 't'
        ...
        DEMOSYS-DB2:/var/log/remote/prod/2009/04/30# grep -F 'postgres[4298]: [607-' pg.10.log
        ...
      • In this case, note we use the app name and PID in addition to the thread trace
        postgres[4298]: [607-7]
          • postgres = app name
          • 4298 = pid
          • 607 = thread trace
      • Postgres also uses a sequence ID
          • “-7” = “this is the 7th line of the query”
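
To read a query that syslog has split across several lines, the matching lines can be collected and their prefixes stripped. A minimal sketch, assuming the prefix format shown above; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch: print the text of one logged PostgreSQL query.
#   $1 = backend PID (4298 in the example)
#   $2 = thread trace (607 in the example)
#   $3 = log file
pg_statement() {
    pid=$1
    trace=$2
    logfile=$3
    # -F matches the square brackets literally; without it grep would
    # treat them as a regular-expression bracket expression.
    grep -F -e "postgres[$pid]: [$trace-" "$logfile" |
        # Drop everything up to and including the "[607-7] " prefix.
        sed 's/^.*postgres\[[0-9]*\]: \[[0-9]*-[0-9]*\] //'
}

# Example use:
#   pg_statement 4298 607 pg.10.log
```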
  • 22. Monitoring • ESI uses Nagios, but is looking into additions/alternatives • mrtg + snmp • Splunk • Ping • Free memory • System Load Average • > 1.0 sustained average on an app server often indicates a “hung” process • Critical processes (apache, jabber, memcache, clark-kent.pl, etc) • Disk Free Space
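
The load-average rule of thumb lends itself to a Nagios-style check. A sketch only: it uses the 5-minute figure from /proc/loadavg (Linux-specific) as a stand-in for "sustained", and the function name and message wording are assumptions. Exit status 0 means OK and 1 means WARNING, following the usual Nagios plugin convention.

```shell
#!/bin/sh
# Sketch: warn when the load average exceeds a threshold.
#   $1 = threshold (1.0 in the rule of thumb above)
#   $2 = load value to test; optional, read from /proc/loadavg if omitted
check_load() {
    threshold=$1
    # Field 2 of /proc/loadavg is the 5-minute average.
    load=${2:-$(cut -d ' ' -f 2 /proc/loadavg)}
    # awk does the floating-point comparison; exit 0 means load > threshold.
    if awk -v l="$load" -v t="$threshold" 'BEGIN { exit !(l > t) }'; then
        echo "WARNING - load average $load > $threshold"
        return 1
    fi
    echo "OK - load average $load"
    return 0
}

# Example use:
#   check_load 1.0
```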
  • 23. Monitoring (continued) • Lock file age • Fine generator • Hold targeter • Backup status • File age • WAL file age • Sync to external backup • Service timeouts (“Returning NULL” in the gateway logs) • Bandwidth Usage • Motherboard sensors • Third-party ping service
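
A lock-file age check can be sketched in the same Nagios style. The idea is that a lock which has existed far longer than a normal run suggests a stuck fine generator or hold targeter. The function name, the lock path, and the age limit are hypothetical, and the sketch assumes a find that supports -mmin (GNU and BSD find both do).

```shell
#!/bin/sh
# Sketch: report CRITICAL (exit 2) when a lock file is older than a limit.
#   $1 = lock file path
#   $2 = maximum acceptable age in minutes
check_lock_age() {
    lockfile=$1
    max_minutes=$2
    # No lock file means the job is not running, which is fine.
    if [ ! -e "$lockfile" ]; then
        echo "OK - $lockfile not present"
        return 0
    fi
    # find prints the path only if it was modified more than N minutes ago.
    if [ -n "$(find "$lockfile" -mmin +"$max_minutes")" ]; then
        echo "CRITICAL - $lockfile older than $max_minutes minutes"
        return 2
    fi
    echo "OK - $lockfile is recent"
    return 0
}

# Example use (hypothetical lock path):
#   check_lock_age /tmp/hold_targeter.lock 60
```

For backup and WAL files the logic is inverted, since a missing or old file is the problem there, so those checks would treat absence as CRITICAL.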
  • 24. Database schema (handout) • Action • Circulations, holds, surveys, etc • Actor • Org Units (libraries, branches, bookmobiles, etc), users, cards, workstations • Asset • Call numbers (volume), copies (items), statcats • Auditor • Audit data (usually org units, users, copies, bib records) • Authority • Authority Record data
  • 25. Database schema (continued) • Biblio • Bibliographic record data (not including metabib entries) • Config • Configuration tables (audiences, bib levels, item forms, circ rules, etc) • Container • Buckets (biblio, call number, item, user) • Metabib • Indexed bibliographic record fields • Money • Fines, payments, etc.
  • 26. Database schema (continued) • Offline • Tables for the offline staff client • Permission • Profile groups, user group maps, permission allocations, etc. • Reporter • Reports, Report Templates, Report Folders, and Report Views • Vandelay • Bib/Authority import/export
  • 27. Cron Jobs • All: • ntpdate • Utility: • Hold targeter • Hold thawer • Fine generator • Reshelving completer • Notices/collections • Backup • General backup scripts • rsync
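
Recurring jobs such as these should not overlap if one run takes longer than the cron interval. The wrapper below is a generic sketch of lock-file protection, not the locking that Evergreen's own scripts perform; the function name and lock path are hypothetical.

```shell
#!/bin/sh
# Sketch: run a command only if no other run holds the lock file.
#   $1 = lock file path; remaining arguments = command to run
run_with_lock() {
    lockfile=$1
    shift
    # With noclobber (set -C) the redirect fails if the file already
    # exists, so the existence test and the creation are a single step.
    if ! ( set -C; echo "$$" > "$lockfile" ) 2>/dev/null; then
        echo "lock $lockfile is held, skipping this run" >&2
        return 1
    fi
    status=0
    "$@" || status=$?
    # Release the lock whether or not the command succeeded.
    rm -f "$lockfile"
    return "$status"
}

# Example use from a script called by cron (hypothetical names):
#   run_with_lock /tmp/nightly-backup.lock /usr/local/bin/nightly-backup
```

A lock left behind by a crashed run blocks every later run, which is why lock file age is worth monitoring.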
  • 28. Backups • Central file store • Large systems may use dedicated server • SCP/rsync/etc. data to central file store • Many servers (ex: DB) "push" data to backup store • Backup server sometimes "pulls" data from servers • Sync data store to external/removable media • USB hard drive works great for this • Encrypting backups (eg: with cryptfs) definitely a good idea • Rotate backups • Ideally, every morning • Off-site • Secure storage (preferably in a safe designed specifically for backup media)
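
A "push" to the central file store might look like the sketch below. It assumes the store is reachable as a local directory (for example an NFS mount) and uses tar instead of the SCP/rsync transfer named on the slide; the function name and archive naming scheme are hypothetical.

```shell
#!/bin/sh
# Sketch: push a dated archive of one directory into the backup store.
#   $1 = directory to back up
#   $2 = backup store directory (assumed to be locally mounted)
#   $3 = date stamp; optional, defaults to today
push_backup() {
    src_dir=$1
    store_dir=$2
    stamp=${3:-$(date +%Y-%m-%d)}
    archive="$store_dir/$(basename "$src_dir")-$stamp.tar.gz"
    # Archive relative to the parent so paths inside start with the dir name.
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1
    # Verify the archive can be read back before trusting it.
    tar -tzf "$archive" >/dev/null || return 1
    echo "$archive"
}

# Example use (hypothetical store path):
#   push_backup /openils/conf /mnt/backupstore
```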
  • 29. Conify (with demo) • “System Settings” in staff client • Moves OU Types, OU Names, permissions, etc. from the backend into the staff client • Autogen still required • New in 1.4 • Improvements in 1.6
  • 30. Circ Policies (with demo)
      • Circ matrix defined in /openils/var/circ/
        # ls -lh /openils/var/circ/
        total 64K
        -rwxr-xr-x 1 opensrf opensrf  25K 2009-05-18 21:47 circ_duration.js
        -rwxr-xr-x 1 opensrf opensrf  730 2009-05-06 16:46 circ_groups.js
        -rwxr-xr-x 1 opensrf opensrf 1.6K 2009-05-18 21:00 circ_item_config.js
        -rwxr-xr-x 1 opensrf opensrf 8.1K 2009-05-06 16:46 circ_lib.js
        -rwxr-xr-x 1 opensrf opensrf  439 2009-05-06 16:46 circ_permit_copy.js
        -rwxr-xr-x 1 opensrf opensrf  840 2009-05-06 16:46 circ_permit_hold.js
        -rwxr-xr-x 1 opensrf opensrf  652 2009-05-06 16:46 circ_permit_patron.js
        -rwxr-xr-x 1 opensrf opensrf  137 2009-05-06 16:46 circ_permit_renew.js
      • Major moves from files to DB in 1.6 (some support in 1.4)
      • Will facilitate config via staff client
      • Very flexible. Can address several aspects of users, copies, bibs, etc.
      • Typically will use combination of item library, library performing action (circulation, hold, etc), patron group, etc.
  • 32. Q&A / Unaddressed items
  • 33. Resources • Evergreen ILS dokuwiki: http://www.evergreen-ils.org/dokuwiki/ • Evergreen SVN Repo (web UI): http://svn.evergreen-ils.org/trac/ILS/ • IRC: chat.freenode.net #evergreen • Mailing lists: http://www.evergreen-ils.org/listserv.php • Equinox Software Inc.: http://www.esilibrary.com • Presentation materials: http://www.esilibrary.com/~dmcmorris/eg09/ Presentation by: Don McMorris, Operations Manager, Equinox Software Inc. (770) 709-5569 / 877-OPEN-ILS x5569 dmcmorris@esilibrary.com
