Evergreen Sysadmin Survival Skills

A presentation for the Evergreen International Conference 2009
Don McMorris, Equinox Software Inc.
Slides that accompanied a three-hour crash training course on sysadmin survival skills useful for sysadmins of Evergreen open source library software. Session led by Don McMorris, Equinox Software.

Evergreen Sysadmin Survival Skills

  1. Evergreen Sysadmin Survival Skills
     A presentation for the Evergreen International Conference 2009
     Don McMorris, Equinox Software Inc.
  2. Target Audience
     • Personnel who will administer the ILS servers or will provide upper-tier support in an Evergreen installation, including roles in:
         • Hardware
         • Operating System
         • Network
         • Monitoring
         • Deep analysis/troubleshooting
  3. Assumptions
     • Experience in administration of Debian-based system at a console level
         • File structure organization
         • Configuration file maintenance
         • dpkg/apt, CPAN, configure/make/make install, etc.
     • Success installing basic Evergreen 1.4 server
     • Administration of a medium to large deployment
         • A lot of the talk will still be relevant to smaller installs too, though!
  4. OpenSRF architecture (handout)
     • XMPP Server (via ejabberd)
         • Facilitates IPC
     • OpenSRF Router
         • Tracks what applications are running, and where
         • Directs requests to individual service instances
     • OpenSRF Service
         • Takes requests, performs some type of process, and responds to the client
         • Evergreen consists of several OpenSRF services
     • OpenSRF Client
         • Sends requests to OpenSRF services
         • May be standalone or running as part of service (sub-requests)
  5. Source: http://evergreen-ils.org/documentation/OpenILS_Structural_Overview.png
  6. Source: http://www.evergreen-ils.org/documentation/OpenSRF_Server_Architecture.jpg
  8. Core OpenSRF Services
     • opensrf.settings
     • opensrf.math
         • Used to test OpenSRF communication chain
     • opensrf.dbmath
         • Used to test OpenSRF communication chain and application database connectivity.
  9. Evergreen OpenSRF Services
     • open-ils.actor (user/org unit update functions)
     • open-ils.auth (user authentication)
     • open-ils.cat (cataloging functions)
     • open-ils.circ (circulation functions)
     • open-ils.collections (money collection [agency] functions)
     • open-ils.creditcard (WIP, credit card payment processing)
     • open-ils.cstore (general retrieval functions; C implementation)
     • open-ils.fielder (field mapper functions)
     • open-ils.ingest (process bib records – create indexes/etc.)
     • open-ils.penalty (standing penalty functions)
     • open-ils.permacrud ([user] permission functions)
  10. Evergreen OpenSRF Services (cont.)
      • open-ils.reporter (reporter functions)
      • open-ils.reporter-store (reporter storage functions [C implementation])
      • open-ils.search (search functions)
      • open-ils.storage (general retrieval functions; Perl implementation)
      • open-ils.supercat (returns bib records in various formats)
      • open-ils.vandelay (end-user bib import/export functions)
  11. Typical Hardware Setup
      • Database boxes
          • Main RW DB
          • RO DBs
          • Report server w/RO DB
      • N+1 Server bricks, each consisting of:
          • One "head" (ejabberd, apache, opensrf.settings)
          • One or more "drones" (general application processing)
          • A "single server brick" refers to a single server with ejabberd, apache, and all applicable applications
      • Utility servers
          • N+1 SIP servers (may be part of server bricks)
          • "Batch processing" server (fine generator, hold targeter, etc)
          • 2X memcache servers
          • 2X firewall/Load balancers
          • Logger
          • Monitoring/alerts, mail, etc.
  12. Typical Hardware Setup
      • Redundant Ethernet
          • All servers have dual NICs bonded in active/failover mode
          • Each NIC wired to independent Ethernet switch
      • Redundant power (on Critical servers [DB, Memcache, etc])
          • Multiple power supplies (2-4)
          • Each power supply independently fed
              • Separate electrical panel
              • Separate UPS system
      • Hardware RAID
      • Debian Linux
  13. Hardware Example
      • 2M patrons
      • 10M items
      • 16M circs/year
      • ~280 branches
  14. Hardware Example
      • T3 (45 Mbit/s) to library network; redundant 10 Mbit/s to public Internet
      • Database boxes
          • Main RW DB (4x quad-core Xeon, 64GB RAM)
          • 2X RO DBs (4x quad-core Xeon, 64GB RAM)
          • Report server w/RO DB (4x quad-core Xeon, 32GB RAM)
      • Server bricks, each consisting of:
          • Head (quad-core Xeon, 4GB RAM)
          • 2X drones (2x quad-core Xeon, 8GB RAM)
      • Utility servers
          • 2X SIP servers (2x quad-core Xeon, 16GB RAM)
          • Batch processing server (quad-core Xeon, 4GB RAM)
          • 2X memcache servers (dual-core Xeon, 4GB RAM)
          • 2X firewall/Load balancers (quad-core Xeon, 4GB RAM, 4 NIC)
          • Logger (dual-core Opteron, 4GB RAM)
  15. Typical Software Setup
      • Debian Lenny
      • PostgreSQL 8.2 (8.3 in testing)
      • Syslog-ng
      • Nagios (with NRPE)
      • Linux-HA, LVS, ldirectord
      • NFS
      • OpenSRF 1.0
      • Evergreen 1.4
  16. Network Architecture
      • Border connection(s) come in to pair of load balancer/firewall boxes
          • Active/Hot Standby
          • Linux HA/Heartbeat used to perform automatic failover
          • LVS/ldirectord used to query brick heads (ldirectorping.txt)
              • Brick is "removed from rotation" when ldirectorping.txt isn't found or has invalid contents
          • Incoming connections load-balanced round-robin to brick heads
          • Connections NAT'd
      • LAN consists of 2 separate physical switches
          • Each server has one connection to each
          • Switches are linked via cross-over cable or similar
      • Packet manipulators (such as caching proxies) tend to cause issues
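
The "removed from rotation" rule above can be approximated by hand when debugging a brick. This is a minimal sketch, not the load balancer's own check: the function name, the local copy of ldirectorping.txt, and the marker string `pong` are all assumptions for illustration; the slides do not say what the file should contain.

```shell
#!/bin/sh
# brick_in_rotation FILE EXPECTED
# Succeeds (exit 0) only if FILE exists and contains the EXPECTED string.
# FILE stands in for a fetched copy of ldirectorping.txt; EXPECTED is a
# hypothetical marker string, not a value taken from the slides.
brick_in_rotation() {
    ping_file=$1
    expected=$2
    [ -f "$ping_file" ] || return 1        # file not found: brick leaves rotation
    grep -qF -- "$expected" "$ping_file"   # invalid contents: brick leaves rotation
}

# Example (hypothetical path and marker):
#   brick_in_rotation /tmp/ldirectorping.txt pong && echo "in rotation"
```

Both failure cases from the slide (file missing, wrong contents) end in the same non-zero exit status, which is all a load balancer needs to decide.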
  18. Open inbound ports (border)
      • 80    HTTP
      • 443   HTTPS
      • 22    SSH (possibly restricted by source IP)
      • 210   Z39.50 server
      • 6001  SIP2 RAW (NOTE: SIP communication is plain-text)
  19. Logging
      • Syslog-ng
      • Central logging server
          • All logs, including Evergreen, Apache, Postgres, and system
      • Typical directory structure:
          • Evergreen: /var/log/remote/prod/%Y/%m/%d/file.%H.log
          • System: /var/log/remote/sys/$HOSTNAME/%Y/%m/%d/file.%H.log
      • gzip nightly
      • Off-system archive as necessary
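
Since the log tree is keyed by date and hour, it helps to have a helper that turns "which log, which hour" into a path. A small sketch under stated assumptions: the function names are made up for illustration, `osrfsys` is the file stem that appears in the grep examples in this deck, and `messages` in the usage line is a hypothetical system log name.

```shell
#!/bin/sh
# evergreen_log_path NAME YEAR MONTH DAY HOUR
# Expands the /var/log/remote/prod/%Y/%m/%d/file.%H.log layout.
# NAME is the log file stem; the date parts are zero-padded strings.
evergreen_log_path() {
    printf '/var/log/remote/prod/%s/%s/%s/%s.%s.log\n' "$2" "$3" "$4" "$1" "$5"
}

# system_log_path HOST NAME YEAR MONTH DAY HOUR
# Same idea for the per-host system log tree.
system_log_path() {
    printf '/var/log/remote/sys/%s/%s/%s/%s/%s.%s.log\n' "$1" "$3" "$4" "$5" "$2" "$6"
}

# Example:
#   evergreen_log_path osrfsys 2009 04 30 10
#   system_log_path DEMOSYS-DB1 messages 2009 04 30 10
```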
  20. Checking the logs
      • Identify which log file(s) to look at
          • What date and time are you looking for?
      • Identify thread trace from logs
          DEMOSYS-DB2:/var/log/remote/prod/2009/04/30# grep 312345000098765 osrfsys.10.log
          ...
          2009-04-30 10:54:02 DEMOSYS-BRICK0-DRONE1 open-ils.circ: [INFO:5119:ScriptRunner.pm:60:1241097309492131] script_runner: circ_permit_hold : Patron=98725, Patron_Username=21234000054577, Patron_Profile_Group=Patron, Patron_Fines=, Patron_OverdueCount=, Patron_Items_Out=, Patron_Barcode=21234000054577, Patron_Library=Summer City Public Library Requestor=circ1, Copy=3870901, Copy_Barcode=312345000098765, Copy_status=Available, Circ_Mod=DVD, Circ_Lib=SUMMER, Copy_location=Stacks, Item_Owning_lib=4, Volume=5834287, Record=1984545, Is_Renewal: no, Is_Precat: no, Hold_request_lib=SUMMER, Hold_Pickup_Lib=4
          ...
      • Search for thread trace to get the complete thread
          DEMOSYS-DB2:/var/log/remote/prod/2009/04/30# grep ':1241097309492131]' osrfsys.10.log
      • If logs have been gzip'd, 'zgrep' can be used
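
The two grep steps on this slide can be chained so the thread trace does not have to be copied by hand. A sketch, assuming the log tag always has the `[LEVEL:PID:FILE:LINE:TRACE]` shape seen in the sample line; the function names are illustrative, not part of Evergreen.

```shell
#!/bin/sh
# thread_trace: read log lines on stdin and print the thread trace of the
# first line that carries a [LEVEL:PID:FILE:LINE:TRACE] tag.
thread_trace() {
    sed -n 's/.*\[[A-Z]*:[0-9]*:[^]:]*:[0-9]*:\([0-9]*\)].*/\1/p' | head -n 1
}

# full_thread TRACE FILE: print every line of FILE that belongs to TRACE.
# Falls back to zgrep when the log has already been gzip'd.
full_thread() {
    case $2 in
        *.gz) zgrep -F ":$1]" "$2" ;;
        *)    grep -F ":$1]" "$2" ;;
    esac
}

# Example, run from the day's log directory:
#   trace=$(grep 312345000098765 osrfsys.10.log | thread_trace)
#   full_thread "$trace" osrfsys.10.log
```

Matching on `:TRACE]` rather than the bare number keeps barcodes or IDs that happen to contain the same digits out of the results.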
  21. Checking the logs (cont.)
      • Checking the postgres logs may be worthwhile also
          DEMOSYS-DB2:/var/log/remote/prod/2009/04/30# grep 312345000098765 pg.10.log
          ...
          2009-04-30 10:56:04 DEMOSYS-DB1 postgres[4298]: [607-7] barcode = '312345000098765', call_number = 5834287, circ_as_type = NULL, circ_lib = 4, circ_modifier = 'DVD', circulate = 't'
          ...
          DEMOSYS-DB2:/var/log/remote/prod/2009/04/30# grep -F 'postgres[4298]: [607-' pg.10.log
          ...
      • In this case, note we use the app name and PID in addition to the thread trace
          postgres[4298]: [607-7]
          • postgres = app name
          • 4298 = pid
          • 607 = thread trace
      • Postgres also uses a sequence ID
          • “-7” = “this is the 7th line of the query”
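
The `postgres[PID]: [THREAD-SEQ]` tag can be split into its parts mechanically. A sketch using the field names from this slide; the function names are illustrative. Note that the search uses `grep -F`, because in a regular expression the square brackets would be read as a bracket expression rather than literal text.

```shell
#!/bin/sh
# pg_tag_fields: read syslog lines on stdin and print "PID THREAD SEQ" for
# each line carrying a tag of the form postgres[PID]: [THREAD-SEQ].
pg_tag_fields() {
    sed -n 's/.*postgres\[\([0-9]*\)]: \[\([0-9]*\)-\([0-9]*\)].*/\1 \2 \3/p'
}

# pg_statement PID THREAD FILE: print all lines of one logged statement.
# grep -F matches the brackets as fixed text.
pg_statement() {
    grep -F "postgres[$1]: [$2-" "$3"
}

# Example, run from the day's log directory:
#   grep 312345000098765 pg.10.log | pg_tag_fields
#   pg_statement 4298 607 pg.10.log
```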
  22. Monitoring
      • ESI uses Nagios, but is looking into additions/alternatives
          • mrtg + snmp
          • Splunk
      • Ping
      • Free memory
      • System Load Average
          • > 1.0 sustained average on an app server often indicates a “hung” process
      • Critical processes (apache, jabber, memcache, clark-kent.pl, etc)
      • Disk Free Space
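
The load-average rule of thumb maps naturally onto a Nagios-style check, where exit status 0 means OK, 1 WARNING, and 2 CRITICAL. A sketch only: the function name and the critical threshold of 4.0 in the usage line are assumptions; only the 1.0 figure comes from the slide.

```shell
#!/bin/sh
# check_load LOAD WARN CRIT
# Compares a load average against two thresholds and returns a Nagios-style
# status: 0 = OK, 1 = WARNING, 2 = CRITICAL. awk does the floating-point
# comparison, which plain sh arithmetic cannot.
check_load() {
    awk -v load="$1" -v warn="$2" -v crit="$3" 'BEGIN {
        if (load + 0 >= crit + 0) { print "CRITICAL - load " load; exit 2 }
        if (load + 0 >= warn + 0) { print "WARNING - load " load; exit 1 }
        print "OK - load " load
        exit 0
    }'
}

# Example on Linux, using the 5-minute average (second field of
# /proc/loadavg) as a stand-in for "sustained":
#   check_load "$(cut -d' ' -f2 /proc/loadavg)" 1.0 4.0
```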
  23. Monitoring (continued)
      • Lock file age
          • Fine generator
          • Hold targeter
      • Backup status
          • File age
          • WAL file age
          • Sync to external backup
      • Service timeouts (“Returning NULL” in the gateway logs)
      • Bandwidth Usage
      • Motherboard sensors
      • Third-party ping service
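
A lock file that has been around too long usually means the batch job behind it is stuck. The check below is a minimal sketch assuming GNU find (as shipped with Debian) for `-mmin`; the function name, the lock file path in the usage line, and the 60-minute limit are hypothetical. It treats a missing file as OK, which suits lock files; for backup files a missing file should probably be an alert instead.

```shell
#!/bin/sh
# check_file_age FILE MAX_MINUTES
# Returns 0 if FILE is absent or was modified within MAX_MINUTES,
# and 2 (Nagios CRITICAL) if FILE is older than that.
check_file_age() {
    [ -e "$1" ] || { echo "OK - $1 not present"; return 0; }
    # find prints the file name only when it is older than the limit
    if [ -n "$(find "$1" -mmin +"$2")" ]; then
        echo "CRITICAL - $1 older than $2 minutes"
        return 2
    fi
    echo "OK - $1 is recent"
}

# Example (hypothetical lock file path):
#   check_file_age /tmp/hold_targeter.lock 60
```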
  24. Database schema (handout)
      • Action
          • Circulations, holds, surveys, etc
      • Actor
          • Org Units (libraries, branches, bookmobiles, etc), users, cards, workstations
      • Asset
          • Call numbers (volume), copies (items), statcats
      • Auditor
          • Audit data (usually org units, users, copies, bib records)
      • Authority
          • Authority Record data
  25. Database schema (continued)
      • Biblio
          • Bibliographic record data (not including metabib entries)
      • Config
          • Configuration tables (audiences, bib levels, item forms, circ rules, etc)
      • Container
          • Buckets (biblio, call number, item, user)
      • Metabib
          • Indexed bibliographic record fields
      • Money
          • Fines, payments, etc.
  26. Database schema (continued)
      • Offline
          • Tables for the offline staff client
      • Permission
          • Profile groups, user group maps, permission allocations, etc.
      • Reporter
          • Reports, Report Templates, Report Folders, and Report Views
      • Vandelay
          • Bib/Authority import/export
  27. Cron Jobs
      • All:
          • ntpdate
      • Utility:
          • Hold targeter
          • Hold thawer
          • Fine generator
          • Reshelving completer
          • Notices/collections
      • Backup
          • General backup scripts
          • rsync
  28. Backups
      • Central file store
          • Large systems may use dedicated server
      • SCP/rsync/etc. data to central file store
          • Many servers (ex: DB) "push" data to backup store
          • Backup server sometimes "pulls" data from servers
      • Sync data store to external/removable media
          • USB hard drive works great for this
          • Encrypting backups (e.g. with cryptfs) definitely a good idea
      • Rotate backups
          • Ideally, every morning
          • Off-site
          • Secure storage (preferably in a safe designed specifically for backup media)
  29. Conify (with demo)
      • “System Settings” in staff client
      • Moves OU Types, OU Names, permissions, etc. from the backend into the staff client
      • Autogen still required
      • New in 1.4
      • Improvements in 1.6
  30. Circ Policies (with demo)
      • Circ matrix defined in /openils/var/circ/
          # ls -lh /openils/var/circ/
          total 64K
          -rwxr-xr-x 1 opensrf opensrf  25K 2009-05-18 21:47 circ_duration.js
          -rwxr-xr-x 1 opensrf opensrf  730 2009-05-06 16:46 circ_groups.js
          -rwxr-xr-x 1 opensrf opensrf 1.6K 2009-05-18 21:00 circ_item_config.js
          -rwxr-xr-x 1 opensrf opensrf 8.1K 2009-05-06 16:46 circ_lib.js
          -rwxr-xr-x 1 opensrf opensrf  439 2009-05-06 16:46 circ_permit_copy.js
          -rwxr-xr-x 1 opensrf opensrf  840 2009-05-06 16:46 circ_permit_hold.js
          -rwxr-xr-x 1 opensrf opensrf  652 2009-05-06 16:46 circ_permit_patron.js
          -rwxr-xr-x 1 opensrf opensrf  137 2009-05-06 16:46 circ_permit_renew.js
      • Major moves from files to DB in 1.6 (some support in 1.4)
          • Will facilitate config via staff client
      • Very flexible. Can address several aspects of users, copies, bibs, etc.
          • Typically will use combination of item library, library performing action (circulation, hold, etc), patron group, etc.
  32. Q&A / Unaddressed items
  33. Resources
      • Evergreen ILS dokuwiki: http://www.evergreen-ils.org/dokuwiki/
      • Evergreen SVN Repo (web UI): http://svn.evergreen-ils.org/trac/ILS/
      • IRC: chat.freenode.net #evergreen
      • Mailing lists: http://www.evergreen-ils.org/listserv.php
      • Equinox Software Inc.: http://www.esilibrary.com
      • Presentation materials: http://www.esilibrary.com/~dmcmorris/eg09/
      Presentation by: Don McMorris, Operations Manager, Equinox Software Inc.
      (770) 709-5569 / 877-OPEN-ILS x5569
      dmcmorris@esilibrary.com
