Troubleshooting Urouter Problems: WebEx Presentation

• Overview of exclusive and shared architecture
• Do’s and don’ts
• Under the hood – the Urouter log
• Troubleshooting techniques


  1. 1. Troubleshooting Urouter Problems
Oct. 31, 2013
Chris Breemer, Compuware Technical Support

Table of contents
1. Overview of exclusive and shared architecture
2. Client-side troubleshooting
3. Under the hood – the urouter log
4. Server-side troubleshooting
5. Troubleshooting runtime problems
6. Troubleshooting web connection problems

What NOT to expect
• A comprehensive discussion of all possible urouter and userver features.
• A configuration and performance tuning guide.
• A Tomcat/IIS/Web configuration guide.
• A document for all platforms. Only Windows and Unix are discussed.
• A list of all possible problems and their solution.

What we’ll do
• Take you through the entire connection process
• Examine the urouter trace file to get the big picture of what happens under the hood
• Highlight the usual spots where things can go wrong
• Present some of the tools that can be used for monitoring and troubleshooting
  2. 2. 1 – Overview of exclusive and shared architecture

Traditionally, the Uniface client-server architecture gave every client its own dedicated polyserver. This simple and robust protocol was entirely satisfactory at the time it was designed. Over the years, however, as installations grew bigger and bigger, the limitations of this approach became clear:

• Too many processes created on the server. 300 clients meant 300 polyserver processes on the server, using up a lot of memory and other resources. Most platforms also have restrictions and quotas that limit the number of polyservers.
• Too many database connections, because each polyserver process needs to connect to the database, which puts a strain on the database.
• High licensing costs. This was not an issue with the old SEK-based licensing, but it is with the new DLM licensing, which checks on numbers.

Therefore, in Uniface 8 we introduced the shared architecture. The key to the shared architecture is a new middleware process called the urouter. Clients now only communicate with the urouter, which maintains a pool of uservers that can basically serve any client, as long as that client does not ask for a specific server. With this approach it is possible to have far fewer userver processes (userver is the new name for polyserver) than there are client processes. This can greatly reduce the Uniface footprint on the server.

It is possible to assign servers a specific role, like being a database-only server, application-only server, file-only server, or even a server specific to one database. This feature is seldom if ever used, however, and we will not discuss it here. In this presentation, any server is a database server, application server, and file server simultaneously.

By default, a database request from a client has no preference for a specific server. In that case the urouter can pick one of the currently running uservers that are in status Idle, and assign it the client’s request. Typically, that will be the most recently used userver.

This is a good moment to discuss the specific states that shared uservers can be in:

Idle, no state
The userver is not busy executing a request, it is not locked to a client, and has no state (i.e. no instances). It is available to serve any client’s request as long as that client does not have a preference. In a web environment, where stateless components are used, this is the state we expect to see for most uservers.

Idle, has state
The userver is not busy executing a request and is not locked to a client, but it has state (i.e. one or more open instances). It is available to serve any client’s request as long as that client does not have a preference, but a client that activates the open instance will use only this userver.
  3. 3. Locked
The userver is locked to a specific client when it is in a transaction. This typically happens when the client modifies an occurrence and thus locks a row in the database on the server. This userver is not available to serve requests from other clients until the transaction is committed (i.e. the modifications are stored). During this time the client in question will ONLY use this same userver. Note that this userver can also have state for the same or any other client. You can see this in the Router Monitor, but not in the urouter log.

Busy
The userver is currently executing a request for a client and is obviously not available for anything else until the request is completed and the response is sent back to the urouter.

Locked uservers often cause a problem in old applications that were migrated from Uniface 7 or earlier. These applications often keep transactions open for a long time, so that new uservers have to be started all the time. In the worst case, one could end up with as many uservers as there are clients, plus the added overhead of the urouter juggling all that data traffic. This is clearly not good for performance. For this reason, customers with old applications are often advised to use exclusive uservers only. The shared architecture is best suited for applications that were specifically designed for it.

Another potential pitfall is the fact that shared uservers can hold state for multiple clients, thus enabling one client to overwrite or access the other client’s data. Consider this case:

• Client 1 creates an instance of service A and activates an operation in A that sets a component variable in the service. This is all handled by the same userver, say the server with sid=1, which now is in status Idle, has state.
• Client 2 also creates an instance of service A and activates the same operation in A, setting the component variable to another value. Because server 1 is available, it will handle the request, using the same instance.
• Client 1 now wants to read back the value it had set, but gets the value that was set by client 2.

In combination with the Locked status, this can also lead to unexpected wait situations. Consider this example:

• Client 1 retrieves a row and creates an instance of service A. This is all handled by the same userver, say the server with sid=1, which now is in status Idle, has state.
• Client 2 also retrieves a row and modifies it. Because server 1 was available, both requests are handled by server 1, and the modification causes server 1 to be locked to client 2.
• Client 1 now wants to activate an operation in the instance it had created. This can obviously only be handled by server 1, but server 1 is now locked, causing client 1 to wait until client 2 completes its transaction.

Both scenarios can happen in a migrated client-server application that does not manage its instances and transactions carefully. In such cases, using exclusive servers may be the better choice.
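The two pitfalls described above can be reproduced with a small toy model. This is only an illustrative Python sketch, not Uniface code: the class ToyUserver, the exception WouldWait and the method names are all made up for this example, and the model assumes (as in the scenarios above) that both clients end up on the same shared userver and the same instance of service A.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


class WouldWait(Exception):
    """Raised where a real client would block until the lock is released."""


@dataclass
class ToyUserver:
    """Hypothetical stand-in for one shared userver (not a Uniface API)."""
    sid: int
    locked_by: Optional[str] = None
    instances: Dict[str, dict] = field(default_factory=dict)

    def status(self) -> str:
        if self.locked_by is not None:
            return "Locked"
        return "Idle, has state" if self.instances else "Idle, no state"

    def _check(self, client: str) -> None:
        # A locked userver only serves the client that owns the lock.
        if self.locked_by not in (None, client):
            raise WouldWait(f"sid={self.sid} is locked to {self.locked_by}")

    def activate(self, client: str, service: str, value=None):
        """Activate an operation; the instance is shared by all clients."""
        self._check(client)
        instance = self.instances.setdefault(service, {})
        if value is not None:
            instance["component_variable"] = value
        return instance.get("component_variable")

    def modify_row(self, client: str) -> None:
        """Modifying an occurrence starts a transaction and locks the userver."""
        self._check(client)
        self.locked_by = client

    def commit(self, client: str) -> None:
        if self.locked_by == client:
            self.locked_by = None


if __name__ == "__main__":
    server = ToyUserver(sid=1)
    server.activate("client 1", "A", value="set by client 1")
    server.activate("client 2", "A", value="set by client 2")
    # Client 1 reads back the value and gets what client 2 wrote.
    print(server.activate("client 1", "A"))

    server.modify_row("client 2")          # server 1 is now Locked
    try:
        server.activate("client 1", "A")   # client 1 would have to wait
    except WouldWait as exc:
        print("client 1 waits:", exc)
```

In this model an exclusive userver would simply be one ToyUserver per client, which is why neither problem can occur there.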
  4. 4. The Router Monitor

To find out if shared uservers are locked and/or have state, you use the Router Monitor (urmon.exe). You first need to connect to the urouter in question, using Urouter->Connect, then choose Servers->Show from the pulldown menu. By clicking the small icons to the left of the Server ID fields you can bring up a Server state form for each userver. In this example it shows that one of our two uservers is Locked and the other isn’t. Both are Idle; however, Server 1 is not available to any client other than the one owning the lock.

The 3 fields together labeled Current State can have the respective values Idle/Busy, Locked/Has context and Starting/Exiting.

Unfortunately, there is no way to find out why exactly an userver is locked, or what context it is that an userver is holding. The urouter does not keep this kind of information.

It is worth noting that you cannot see exclusive uservers in the Router Monitor (although you may see them briefly during the connection process).
  5. 5. 2 - Client side troubleshooting

Where to start investigating when things don’t work depends on what information you are given. If the customer or end user rings saying “I can’t get into the application”, and you know the application involves an urouter, it is probably best to start right from the beginning, verifying all steps one by one. A common mistake is to take certain things for granted, for example that the client is connecting to the urouter that you think it should be, whereas maybe it is going somewhere completely different. It is good practice not to take ANYTHING for granted, even the things that you are sure should not be wrong. Remember Murphy!

As an example, let’s investigate a classic client/server scenario (we’ll get to web later), where a client application on Windows is supposed to start an exclusive userver on Solaris to retrieve data from Oracle. Many user applications have an application-specific logon screen that verifies the user credentials in the server database, and it is often at that point that something fails. The failure can be anywhere between client and database, but the user’s perception is only that they “can’t log on”. Typical error symptoms here can be:

• an hourglass
• an application that no longer responds
• an error message, such as:

    Logon to database failed
    Logon (TCP:violet+10094|chris|***|userver) failed with status -21, Network logon error

The most important thing is to get the EXACT error symptoms from the user. If they say there was an error on the screen, have them do it again and take a screenshot before closing the application. Always ask for a message frame with $ioprint = 255.

Always ask for the customer’s assignment file(s), but do not take for granted that they send the right file(s). Whenever you look at an assignment file, ask yourself if the application is indeed using this file. The quickest way to find out is to insert a deliberate error in the file and verify that the application now refuses to start up. Uniface will always give a clear message about assignment file errors (except in a server process, which is very inconvenient; we’ll get to that). For example, insert this line in your assignment file

    [error]

and start the application. This results in a transcript window with the exact error and error location.
  6. 6. If you don’t see that error, your application is NOT using this asn-file. You will see such an error on the screen even if the transcript is redirected to a file elsewhere in the asn-file(s). I have found this little trick extremely useful over the years.

Having made 100% sure which assignment file(s) the application uses, we can now start putting things in there to troubleshoot. If the symptom was an hourglass, it is best to apply a client connect timeout:

    [settings]
    $net_timeout cct=20s

This will cause the client to give up after 20 seconds when the network path can’t be connected.

The next step is to verify which path is causing the problem. In this case this will be a network path. Once you have located the path in the asn-file, do verify that the application indeed tries to open this path, by replacing the password in the logon string with a question mark, e.g.

    $def tcp:violet+10094|chris|?|userver -ex

This should cause Uniface to display a logon form for path $DEF. If you don’t see such a logon form, you are definitely not using this path. Keep in mind that the same path could be defined elsewhere, maybe in usys.asn or an included asn-file. It is best to avoid any redundancy in your assignments, and not rely on Uniface’s rules about which assignment takes precedence (which depends on the kind of assignment and is not always what you’d expect).
  7. 7. This is a good place to give some general tips about maintaining assignment files so that they are easier to read and maintain. White space is allowed in many places; do take advantage of it. Keeping your asn-files tidy, clear, and unambiguous is a good practice which pays off when there’s a problem. Some tips that can be useful (though you may not want to follow all of them):

• Use spaces (not tabs) to create a tabular layout
• Equal signs can generally be omitted (except when part of the right-side value)
• Work alphabetically
• Use uppercase for section names, lowercase elsewhere (unless it is case sensitive, like filenames on Unix)
• Avoid unnecessary comments and commented-out lines
• Avoid empty sections
• Avoid redundancy
• Do not use any settings or flags unless you know why
• Specify full pathnames for files
• Do not rely on Uniface’s precedence rules (avoid duplication across asn-files)

To illustrate some of these points, see how much clearer and tidier an average asn-file can become:
  8. 8. A typical jumble…

    [SETTINGS]
    $trace_is_true
    $variation=CUS
    $keyboard=MSWINX
    $enhanced_edit=all
    $active_field=col=21
    $curocc_video=col=21
    ;$def_curocc_video=col=21
    ;;; Initialize Booleans as false
    $STORE_BOOLEANS_AS_FALSE
    ;search in DICT first !
    $search_object=DBMS_FIRST
    $search_descriptor=DBMS_FIRST
    ;search only in DOL and URR
    ;$search_object=FILE_ONLY
    ;$search_descriptor=FILE_ONLY
    ;;; Uniface License File
    ;;$license_options LM_LICENSE_FILE="P:UNIFACE8Uniface_License.xml"
    ;;; Uniface 9 License file
    $license_options LM_LICENSE_FILE=7188@licserver

    [USER_3GL]
    ;;; demandload=KERNEL32.DLL,ADVAPI32.DLL,CusMail32.dll, ScreenPrint.dll
    ;;; Load User DLL's
    .. dllCusMail32.dll /preload
    ADVAPI32.DLL /preload
    .. dllGAPI32.DLL /preload
    ;;; Load local kernel32.dll laden
    KERNEL32.DLL /preload

    [LOGICALS]

    [PATHS]
    $DICT ORA:ORA10G|oradict|oradict
    $IDF = $DICT
    $SYS = $DICT
    $UUU = $DICT

    [ENTITIES]
    *.TEXT $DICT:*.*
    *.DICT $DICT:*.*
    *.UVCS $DICT:*.*

… made nice and tidy !

    [SETTINGS]
    $active_field             col=21
    $curocc_video             col=21
    $enhanced_edit            all
    $keyboard                 mswinx
    $license_options          lm_license_file=7188@licserver
    $search_descriptor        dbms_first
    $search_object            dbms_first
    $store_booleans_as_false
    $trace_is_true
    $variation                cus

    [PATHS]
    $dict    ora:ora10g|oradict|oradict
    $idf     $dict
    $sys     $dict
    $uuu     $dict

    [ENTITIES]
    *.UVCS   $dict:*.*

    [USER_3GL]
    ..dllcusmail32.dll    /preload
    ..dllgapi32.dll       /preload
    advapi32.dll          /preload
    kernel32.dll          /preload
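The advice to avoid duplicate assignments across asn-files can be checked mechanically. The following Python sketch is not a Uniface tool; it assumes only the syntax visible in the examples above (section names in square brackets, comments starting with a semicolon, a name followed by an optional equal sign and a value), and the function names parse_asn and find_duplicates as well as the file name app.asn are hypothetical.

```python
import re
from collections import defaultdict

SECTION = re.compile(r"^\[(?P<name>[^\]]+)\]\s*$")


def parse_asn(text):
    """Yield (section, name, value) for every assignment line in text."""
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue                      # skip blank lines and comments
        match = SECTION.match(line)
        if match:
            section = match.group("name").upper()
            continue
        parts = line.split(None, 1)
        name = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        if "=" in name:                   # e.g. $active_field=col=21
            name, _, rest = name.partition("=")
            value = (rest + " " + value).strip()
        value = value.lstrip("=").strip()  # e.g. $IDF = $DICT
        yield section, name.lower(), value


def find_duplicates(files):
    """Map (section, name) to [(filename, value), ...] when defined twice or more."""
    seen = defaultdict(list)
    for filename, text in files.items():
        for section, name, value in parse_asn(text):
            seen[(section, name)].append((filename, value))
    return {key: hits for key, hits in seen.items() if len(hits) > 1}


if __name__ == "__main__":
    files = {
        "usys.asn": "[PATHS]\n$def tcp:violet+10094|chris|?|userver -ex\n",
        "app.asn": "[paths]\n; local override\n$DEF = tcp:violet+13001|chris|?|userver\n",
    }
    for (section, name), hits in find_duplicates(files).items():
        print(section, name, hits)
```

Section and assignment names are compared case-insensitively here, which is an assumption of this sketch; included asn-files are not followed.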
  9. 9. Back to troubleshooting the connection. Now that we know beyond doubt which path is causing the problem, let’s look at the components of the path, e.g.

    $tcp = violet+13001|john|smith|orsv

This simple path assumes quite a few things:

• We can reach the host violet on the network
• That machine has an urouter running on port 13001
• That machine has an account for user john (password smith)
• The urouter assignment file on that machine has the UST orsv defined
• The server described in the UST orsv can be started under the account john

Let’s check all this step by step. Is the network accessible, and can we actually reach that host? On the command line, do

    $ ping violet

for a sanity check. The expected output will look like this:

    C:>ping violet
    Pinging violet.emea.cpwr.corp [172.16.32.135] with 32 bytes of data:
    Reply from 172.16.32.135: bytes=32 time<1ms TTL=255
    Reply from 172.16.32.135: bytes=32 time<1ms TTL=255
    Reply from 172.16.32.135: bytes=32 time<1ms TTL=255
    Reply from 172.16.32.135: bytes=32 time<1ms TTL=255
    Ping statistics for 172.16.32.135:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
    Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 0ms, Average = 0ms

The ping command is available on Windows, Unix and Linux. It can already diagnose certain connection issues. For example:

    C:>ping violet
    Pinging violet.emea.cpwr.corp [172.16.33.217] with 32 bytes of data:
    Reply from 172.16.43.135: Destination host unreachable.
    Reply from 172.16.43.135: Destination host unreachable.
    Reply from 172.16.43.135: Destination host unreachable.
    Reply from 172.16.43.135: Destination host unreachable.
    Ping statistics for 172.16.33.217:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
  10. 10. This host was known to the network (the DNS server on IP address 172.16.43.135) but could not be reached. Most often this means the machine is turned off, or does not have a working network connection, or was maybe disposed of without reconfiguring the DNS tables. When you get such an error, consult your network administrator.

In Uniface, an unreachable host will cause the client to wait forever (showing the hourglass) unless you have a client connect timeout, in which case it will eventually show TCP error 10060 (on Windows; on Unix the number will be different).

    TCP error [10060]: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. (10060)
    Logon (TCP:violet+13001|chris|***|userver -ex) failed with status -18, Failed to connect to URouter

This is on Windows. If the client is on Unix, this situation will produce error 111 (Connection refused):

    TCP error [111]: Connection refused (111)
    Logon (TCP:violet+10094|chris|***|userver -ex) failed with status -18, Failed to connect to URouter

The message Connection refused sometimes confuses people. It is not a connection problem, or a rights/access/permission problem; it means that you could connect to the server but no process is listening on the specified port.

Another typical ping response is this:

    C:>ping violet
    Ping request could not find host violet. Please check the name and try again.

meaning that you have the host name wrong or misspelled. In Uniface, a wrong hostname will produce the following error in the message frame:

    TCP: No such host (-4)
    Logon (TCP:violet+13001|chris|***|userver -ex) failed with status -18, Failed to connect to URouter

If the ping is successful, we know the host is alive and reachable. So how do we know it’s listening on port 13001? You can of course go and log on to that machine and check, which we will do, but you can already get some info right here using the telnet command, which is available on all Unix/Linux and most Windows versions. Telnet actually tries to connect, rather than just doing a network sanity test. It is important to use both ping (which works on the IP level) and telnet (which works on the TCP level). While you cannot use telnet to actually converse with the urouter, you can use it to test the connection, specifying the port you want to reach:

    C:>telnet violet 13001
    Connecting To violet...Could not open connection to the host, on port 13001: Connect failed

Note that telnet does not tell you WHY it can’t connect. In this case we already made sure with ping that the host was alive and kicking, so the conclusion is that no process is listening on violet on port 13001. This typically means that urouter is not running, or that we have the port number wrong.
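Where telnet is not installed, the same TCP-level test can be scripted. The Python sketch below is an illustration under assumptions, not part of Uniface: it splits a logon string of the form shown earlier (host+port|user|password|UST, with an optional "tcp:" prefix) and then tries a plain TCP connect. The names UnifacePath, parse_path and probe are invented for this example. Unlike telnet, it does report whether the name could not be resolved, the connection was refused, or the attempt timed out.

```python
import socket
from typing import NamedTuple


class UnifacePath(NamedTuple):
    host: str
    port: int
    user: str
    password: str
    ust: str


def parse_path(logon: str) -> UnifacePath:
    """Split e.g. 'tcp:violet+13001|john|smith|orsv' into its components."""
    fields = (logon.strip().split("|") + ["", "", ""])[:4]
    target, user, password, ust = fields
    if ":" in target:                     # drop an optional 'tcp:' prefix
        target = target.split(":", 1)[1]
    host, _, port = target.partition("+")
    return UnifacePath(host, int(port), user, password, ust)


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Return 'unknown host', 'listening', 'refused', 'timeout' or 'error: ...'."""
    try:
        socket.getaddrinfo(host, port)    # name resolution, like ping's first step
    except socket.gaierror:
        return "unknown host"
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "listening"            # something accepted the connection
    except ConnectionRefusedError:
        return "refused"                  # host reachable, nothing on that port
    except socket.timeout:
        return "timeout"                  # no answer at all within the timeout
    except OSError as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    path = parse_path("violet+13001|john|smith|orsv")
    print(path.host, path.port, probe(path.host, path.port))
```

A result of "listening" only proves that some process accepted the connection on that port; it does not prove that the process is an urouter.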
  11. 11. The other possible response of telnet is … nothing at all! That usually means it has connected! On Windows, the screen goes blank, and you have to press Ctrl-] (the Telnet escape character) to get the prompt:

    Welcome to Microsoft Telnet Client
    Escape Character is 'CTRL+]'
    Microsoft Telnet>

On Unix, telnet will display the telnet banner:

    Trying 172.16.32.135...
    Connected to cwnl-violet.emea.cpwr.corp (172.16.32.135).
    Escape character is '^]'.

and here too you press Ctrl-] to get the prompt:

    ^]
    telnet>

Now is a good time to verify you are indeed connected to port 13001 on violet. Use TcpView (a tool to be presented hereafter):
  12. 12. Or else use the netstat command:

    C: >netstat
    Active Connections
    Proto  Local Address          Foreign Address               State
    TCP    0.0.0.0:80             AMS090861D1:0                 LISTENING
    TCP    0.0.0.0:135            AMS090861D1:0                 LISTENING
    TCP    0.0.0.0:445            AMS090861D1:0                 LISTENING
    TCP    0.0.0.0:623            AMS090861D1:0                 LISTENING
    TCP    127.0.0.1:51635        AMS090861D1:51549             ESTABLISHED
    TCP    127.0.0.1:51938        AMS090861D1:0                 LISTENING
    TCP    127.0.0.1:56213        AMS090861D1:56214             ESTABLISHED
    ...
    TCP    172.16.43.135:56623    emea-ams-fs101:microsoft-ds   CLOSE_WAIT
    TCP    172.16.43.135:56643    65.55.246.20:https            ESTABLISHED
    TCP    172.16.43.135:56651    lhr08s02-in-f4:https          TIME_WAIT
    TCP    172.16.43.135:56655    lhr08s02-in-f0:https          TIME_WAIT
    TCP    172.16.43.135:56656    emea-ams-ps002:microsoft-ds   ESTABLISHED
    TCP    172.16.43.135:56659    cwnl-violet:13001             ESTABLISHED
    TCP    172.16.43.135:56660    db3msgr6011506:https          ESTABLISHED

You should find your connection on the specified host and port with the status ESTABLISHED. Now it should definitely be possible to make a connection to a userver on the other end.

Before moving on to server-side troubleshooting, let’s mention some very useful (and free!) tools I regularly use on a Windows client. We’ll see some of these tools in action later on.
  13. 13. TcpView from http://technet.microsoft.com/en-us/sysinternals/bb897437.aspx See all network connections as well as services listening on ports. TcpView is basically a graphical interface on top of netstat, but is easier to use. Also you can see process properties, and kill specific processes and sockets. A good way to see if your local service is listening or your client has been connected to a service.
  14. 14. Process Monitor from http://technet.microsoft.com/en-us/sysinternals/bb896645.aspx Monitor system activity of all or selected processes. This can include file I/O, registry activity, network activity and process activity. A great way to find out which file is not being found, or why an “access denied” error is given. You can also see details about sockets and process and thread creation. Generally this is the first tool I deploy when troubleshooting any file-related problem, on Windows.
  15. 15. Process Explorer from http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx See all running processes and their relations and properties. You can see DLL’s and all other types of resources in use by a specific process, as well as many properties of the process like threads and stack traces. Ideal to check which files, pipes, etc are being used, and which DLL’s are loaded and if they are of the correct version. A very handy option of Process Explorer is the Find->Handle or DLL function in the pulldown menu, which shows you which process has a specific file or DLL open.
  16. 16. Dependency Walker from http://www.dependencywalker.com/ Examine DLL dependencies and exports/entrypoints/ordinals of .exe and .dll files. The profiling option for .exe files is like a debugger, a great way to investigate problems related to loading and initializing dll’s.
  17. 17. Network Monitor from http://www.microsoft.com/en-us/download/details.aspx?id=4865 The tool for monitoring of network traffic in selected processes and conversations. The sure way to see what really goes over the network line, and see details about several network layers. The tool has knowledge of nearly all network-related protocols. Network Monitor is a step up from earlier tools like WireShark (formerly Ethereal) and tcpdump (Unix only). One disadvantage is that (AFAIK) you cannot monitor network traffic within the machine, i.e. between a client and a locally running urouter.
  18. 18. Redmond Path from http://download.cnet.com/Redmond-Path/3000-2094_4-10811594.html Maintaining the Windows PATH variable is an arduous and error-prone task if you use the standard Windows interface. This GUI path manager makes life a lot easier, and is great to remove unwanted entries from the PATH and move things around.
  19. 19. 3 - Under the hood – the urouter log

In most cases the key to troubleshooting connection problems is first of all the urouter log. By default, the logfile generated by urouter only displays top-level errors such as the dreaded -25, -17 and -16 errors. These are quite useless, except for alerting you to the fact that there IS a problem, so you always need to go back to the customer to reproduce the problem with full logging. To see the details, you must use $ioprint = 255, and to get the maximum info, use tracing. The recommended settings are these:

    [settings]
    $ioprint      255
    $trc_start    urouter.trc
    $trc_levels   9A-Za-z6c5s0R5t0z0N
    $trc_info     cat,lvl,dtt

These are the settings used for this presentation. The term "urouter log" will henceforth mean the urouter.trc file generated by these settings.

TIP: While troubleshooting, put your logging where you can SEE it. Don’t stuff it away in files with hard-to-remember names and locations. Nothing beats seeing things happen in real time. Once you succeed in writing a log- or tracefile, download a program like WinTail or BareTail so you can follow the log and see stuff rolling over the screen as you progress. This is on Windows. On Unix, you can use the tail command to follow a file being written to. Also handy on Unix is directing your logfile to a terminal window, e.g.

    [settings]
    $ioprint = 255
    $putmess_logfile = /dev/pts/2

Before using the urouter log for troubleshooting, let's go through it following a successful connection. This gives a pretty good idea of the flow of events, and it tells you what you should expect to see (never examine a log without knowing what it SHOULD look like). We will list all lines from the log, interspersed with explanations of what is about to happen.

The following is what you see when an urouter is started on Windows. The startup banner contains valuable information about the environment. On Windows, it will display the full command line, unless it is a service we are starting, in which case it displays the service name (as in this case: Uniface 96 Development URouter). On other platforms, the command line is not available in this banner.
  20. 20.
    [startup] ======================================================================
    [startup] Date/time : 2013-09-20 13:09:29.27
    [startup] Uniface : [MSW] 9.6.03.02, X301 (Sep 18 2013), $ioprint=255
    [startup] Command : Uniface 96 Development URouter, pid=3620
    [startup] Directory : d:uf96commonbin
    [startup] OS : Windows 7 Service Pack 1 (Build 7601)
    [startup] Processor : Intel64 Family 6 Model 42 Stepping 7, Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 8149 Mb
    [startup] Hostname : AMS090861D1.clients.emea.cpwr.corp
    [startup] User : system
    [startup] ======================================================================

Urouter.exe, which is not much more than a startup shell, loads urout.dll, which contains most or all of the urouter code. Always check that the DLL version is the one expected (i.e. the same as the version of urouter.exe). This check is of course specific to Windows; on other platforms we don't have the luxury of DLL versions. Also displayed is the source version number of rout.c, which is important as that file contains most of the urouter code.

    9 1F 1379682569 Loaded 'urout' from d:uf96commonbinurout.dll, version: 9.6.03
    9 2F 1379682569 CONT_ID=%fv: rout.c-163 % %dc: Mon Mar 04 16:01:33 2013 % X301

Urouter declares itself started and creates a listen thread:

    9 3F 1379682569 URouter started at 20-sep-2013 13:09:29
    9 4F 1379682569 URouter pid=3620;rid=E689D463-26F2-4930-BADC-F2B7D5DFD2B3
    9 5F 1379682569 started thread to listen to TCP:+13001
    9 6F 1379682569 UROUTERSTART: waiting for listening threads
    9 7F 1379682569 listen_net: new thread active, cnt=2, lst=0, pmq=0

The listen thread loads the PSV middleware DLL umwpsv10.dll. This DLL implements the polyserver protocol, such as handshaking and building of specific client-server messages. Again, do verify the DLL version.

    9 8F 1379682569 Loaded 'umwpsv10' from d:uf96commonbinumwpsv10.dll, version: 9.6.03 X301

The listen thread loads the TCP driver, utcp10.dll, so that it can start listening for connection requests. This DLL is the actual interface to TCP/IP, i.e. the network, using sockets for connection. As always, do check the DLL version.

    9 9F 1379682569 Loaded 'utcp10' from d:uf96commonbinutcp10.dll, version: 9.6.03 X301

Some internal housekeeping…

    5 1s 1379682569 UNWTCP: enter TCP(6424304) call=NETINFO, chn=0, lst=0
    5 2s 1379682569 UNWTCP: exit TCP(6424304) call=NETINFO, chn=0, lst=0, result=NET_SUCCESS, err=0
    5 3s 1379682569 UNWTCP: enter TCP(6424304) call=NETCREATE_SHARED, chn=0, lst=0

Urouter now calls bind() (this is a function in the TCP socket API) to associate the name of the host with the socket.

    3 4s 1379682569 UNWTCP: TCP6create : bind(): chn=613 hst=AMS090861D1.clients.emea.cpwr.corp on TCP4
  21. 21. Some more housekeeping. Unfortunately, the listen() call that will put the socket in listen mode is not displayed here, unless it fails.

    5 5s 1379682569 UNWTCP: exit TCP(6424304) call=NETCREATE_SHARED, chn=0, lst=613, result=NET_SUCCESS, err=0
    5 6s 1379682569 UNWTCP: enter TCP(6459368) call=NETINSTANCE, chn=0, lst=613
    5 7s 1379682569 UNWTCP: exit TCP(6459368) call=NETINSTANCE, chn=0, lst=613, result=NET_SUCCESS, err=0
    5 8s 1379682569 UNWTCP: enter TCP(6459368) call=NETCONNECT, chn=0, lst=613
    9 10F 1379682569 UROUTERSTART: All listening threads started

This is where the log pauses when urouter is successfully started up. It is now listening for connection requests on its designated port 13001. It is always a good idea to verify, using a command like netstat or a tool like TcpView, that the socket is in LISTEN or LISTENING mode. On Unix or Linux (or on Windows, if you prefer the command line to the GUI) you use the netstat command:

    $ netstat -a | grep 13001
    *.13001 *.* 0 0 49152 0 LISTEN

So now our urouter is all set up to go, waiting to get to work. From now on, the flow of events depends on whether the client requested an exclusive or shared userver. Let's first look at an exclusive connection.

The request from a client comes in:

    9 10F 1379682041 accepted new connection on TCP:+13001
    5 11s 1379682041 UNWTCP: enter TCP(3512392) call=NETINSTANCE, chn=0, lst=609
    5 12s 1379682041 UNWTCP: exit TCP(3512392) call=NETINSTANCE, chn=0, lst=609, result=NET_SUCCESS, err=0
    5 13s 1379682041 UNWTCP: enter TCP(3512392) call=NETCONNECT, chn=0, lst=609

A thread is created to handle the request, so that urouter has its hands free to accept new requests:
  22. 22.
    9 2Z 1379682041 thpsv: new thread received u=3511008, thp=7798664, net=3510248, upsv=0, rmth=0, tha=0
    1 3Z 1379682041 thpsv: new thread active chn=621, cnt=3, lst=1, pmq=0, cc=1

The urouter reads the socket to obtain the client's connection request:

    5 14s 1379682041 UNWTCP: enter TCP(3510248) call=NETGET, chn=621, lst=609
    5 15s 1379682041 UNWTCP: exit TCP(3510248) call=NETGET, chn=621, lst=609, result=NET_SUCCESS, err=0

The request was for an exclusive userver (EXCLTCON). You see full details about the requesting client (clt=) and the logon information (log=).

    9 11F 1379682041 From Client:chn=621;len=151: EXCLTCON;
    9 12F 1379682041 clt=(hst=172.16.43.135,AMS090861D1.clients.emea.cpwr.corp;pid=14968;tid=13548;sid=0;usr=cwnl-chris;ust=)
    9 13F 1379682041 log=(hst=TCP:localhost+13001;usr=emeacwnl-chris;ust=userver -ex)

Next is some housekeeping:

    9 14F 1379682041 reguser: nid=172.16.43.135, node=AMS090861D1.clients.emea.cpwr.corp, pid=14968, ust=
    1 1a 1379682041 claimsrv: want sid=0 usr=cwnl-chris ust=userver mnem=ANY ex=1 strt=1
    5 2a 1379682041 prepare_to_wait: for sid=1; ust=userver
    5 3a 1379682041 prepare_to_wait: Queued client entry #1 in server queue
    5 16s 1379682041 UNWTCP: enter TCP(3510248) call=NET_GETOSCHAN, chn=621, lst=609
    5 17s 1379682041 UNWTCP: exit TCP(3510248) call=NET_GETOSCHAN, chn=621, lst=609, result=NET_SUCCESS, err=0

The big moment: urouter is going to start our userver. Note that urouter assigns a server id (srvid or sid) to each userver it starts. For an exclusive server this is not so relevant, but for shared servers this number is all-important.

    9 15F 1379682041 svstart: starting server: user=emeacwnl-chris; pgm=d:uf96commonbinuserver.exe -srvid=1 -dnp=TCP:+13001||DA0210E1-A6DC-44C5-89E1-58627D9ABCAF| -drv=ANY -ust=userver -chn=620 -ex /adm=d:uf96commonadm dir=D:uf96
    1 1S 1379682041 useCreatePAU: Inheriting handle=620

The svstart line is a very important line, the one you will always be looking for first. Here, control is effectively passed to the operating system to start the userver process. It is obviously a point where many things can go wrong. If so, an error message will usually be reported immediately. Note that userver is passed a handle to the open connection with the client (chn=620). This is known as connection inheritance. The userver will use this same channel (after urouter has closed its copy of it) to talk directly with the client.

In this example there is no error, so urouter reports that userver was successfully started, and proceeds to wait for the userver to report back. This is again a point where things can go wrong, if userver has some startup problem or aborts prematurely. Note the process id (pid); it may come in handy when you go looking for the userver process, or for a log- or tracefile that contains the pid in its name.

    9 16F 1379682041 svstart: Succesfully launched server, new pid=672
    5 4a 1379682041 handle_wait: wait for server sid=1; ust=userver
    5 5a 1379682041 handle_wait: wait for client entry #1 in server queue
Next, we see the newly created userver establishing a network connection to urouter (it has just been started so it has none yet). At this stage, the userver is just another client to urouter, with a request that needs to be handled, so a new thread is created which will be terminated once the userver has successfully registered.

    3 18s 1379682041 UNWTCP: TCPaccept: chn=665 got host=AMS090861D1.clients.emea.cpwr.corp on TCP4
    5 19s 1379682041 UNWTCP: exit TCP(3512392) call=NETCONNECT, chn=665, lst=609, result=NET_SUCCESS, err=0
    9 17F 1379682041 accepted new connection on TCP:+13001
    5 20s 1379682041 UNWTCP: enter TCP(3524160) call=NETINSTANCE, chn=0, lst=609
    5 21s 1379682041 UNWTCP: exit TCP(3524160) call=NETINSTANCE, chn=0, lst=609, result=NET_SUCCESS, err=0
    5 22s 1379682041 UNWTCP: enter TCP(3524160) call=NETCONNECT, chn=0, lst=609
    9 4Z 1379682041 thpsv: new thread received u=3523320, thp=7798664, net=3512392, upsv=0, rmth=0, tha=0
    1 5Z 1379682041 thpsv: new thread active chn=665, cnt=4, lst=1, pmq=0, cc=1
    5 23s 1379682041 UNWTCP: enter TCP(3512392) call=NETGET, chn=665, lst=609

The server registration comes in (SRVCON). Only now can urouter be sure the userver is alive and kicking, and include it in its internal administration.
    5 24s 1379682041 UNWTCP: exit TCP(3512392) call=NETGET, chn=665, lst=609, result=NET_SUCCESS, err=0
    9 18F 1379682041 From Server:chn=665;len=219: SRVCON;
    9 19F 1379682041 clt=(hst=172.16.43.135,AMS090861D1.clients.emea.cpwr.corp;pid=672;tid=7216;sid=1;usr=cwnl-chris;ust=userver)
    9 20F 1379682041 log=(hst=TCP:AMS090861D1.clients.emea.cpwr.corp+13001;usr=cwnl-chris;ust=userver -drv=ANY oschn=620;rid=DA0210E1-A6DC-44C5-89E1-58627D9ABCAF)
    5 1c 1379682041 srvload: local server registering sid=1;rid=DA0210E1-A6DC-44C5-89E1-58627D9ABCAF
    9 21F 1379682041 reguser: nid=172.16.43.135, node=AMS090861D1.clients.emea.cpwr.corp, pid=672, ust=userver
    9 22F 1379682041 srvload: this is server sid=1

Urouter tells userver it has been successfully registered (CONANS) and can start communicating directly with the client.

    9 23F 1379682041 To Server:chn=665;len=3: CONANS; continue:sid=1:
    5 25s 1379682041 UNWTCP: enter TCP(3512392) call=NETPUT, chn=665, lst=609
    5 26s 1379682041 UNWTCP: exit TCP(3512392) call=NETPUT, chn=665, lst=609, result=NET_SUCCESS, err=0

The following trace lines indicate that the client and userver are now connected to each other.

    5 6a 1379682041 notify_next_client: finding next client for sid=1
    5 7a 1379682041 notify_next_client: client #1 in server queue can use sid=1
    1 6Z 1379682041 thpsv: thread exit, cnt=3, lst=1, pmq=0, cc=0
    9 24F 1379682041 handle_wait: Queued client #1 continues with sid=1, ust=userver
    5 8a 1379682041 exclusive match reserved

Here, urouter closes its end of the inherited socket:

    5 27s 1379682041 UNWTCP: enter TCP(3510248) call=NETCLOSE, chn=621, lst=609
    5 28s 1379682041 UNWTCP: exit TCP(3510248) call=NETCLOSE, chn=621, lst=609, result=NET_SUCCESS, err=0

You may ask why it shows chn=621 here, and not chn=620 as we saw above. It’s really the same thing. Internally, Uniface uses the actual handle plus one, and is not always consistent in which value to display.
The following lines look alarming: stopping server? We’ve just started it! But really, all this means is that the entries for this userver, as well as its client, are being removed from urouter’s internal administration. The userver and client of course remain running.

    9 25F 1379682041 Stopping server sid=1; shut=0 mode=normal
    9 26F 1379682041 Reason for stop: Serv entry is given free
    5 29s 1379682041 UNWTCP: enter TCP(3512392) call=NETDISCONNECT, chn=665, lst=609
    5 30s 1379682041 UNWTCP: exit TCP(3512392) call=NETDISCONNECT, chn=0, lst=0, result=NET_SUCCESS, err=0
    5 9a 1379682041 notify_next_client: finding next client for sid=1
    5 10a 1379682041 notify_next_client: no clients found for sid=1

Lastly, urouter removes the thread it had created to handle the server registration.

    1 7Z 1379682041 thpsv: thread exit, cnt=2, lst=1, pmq=0, cc=0

At this point, urouter effectively forgets all about what has just happened. Client and userver are connected and can do without the urouter. This also means that exclusive connections are not visible in the Router Monitor (URMON). So, for problems between a client and an exclusive server, once they have connected to each other, the urouter log is not the place to look. We will get back to that.

Now, let’s see how a shared connection looks under the hood. The urouter log will be virtually the same until the moment when a client connects. The difference starts once urouter discovers what connection the client wants.
Whereas with the exclusive connection we saw

    9 11F 1379682041 From Client:chn=621;len=151: EXCLTCON;

with a shared connection we now see

    9 11F 1380822143 From Client:chn=621;len=147: CLTCON;
    9 12F 1380822143 clt=(hst=172.16.43.135,AMS090861D1.clients.emea.cpwr.corp;pid=13288;tid=2884;sid=0;usr=cwnl-chris;ust=)
    9 13F 1380822143 log=(hst=TCP:localhost+13001;usr=emea\cwnl-chris;ust=userver)
    9 14F 1380822143 reguser: nid=172.16.43.135, node=AMS090861D1.clients.emea.cpwr.corp, pid=13288, ust=

and from here on things go a bit differently. Instead of starting a server right away, urouter ‘shakes hands’ with the client, then sends back a message asking what exactly it wants to be done.

    9 15F 1380822143 To Client:chn=621;len=2: CONANS; continue:
    5 16s 1380822143 UNWTCP: enter TCP(7704552) call=NETPUT, chn=621, lst=613
    5 17s 1380822143 UNWTCP: exit TCP(7704552) call=NETPUT, chn=621, lst=613, result=NET_SUCCESS, err=0
    5 18s 1380822143 UNWTCP: enter TCP(7704552) call=NETGET, chn=621, lst=613
    5 19s 1380822143 UNWTCP: exit TCP(7704552) call=NETGET, chn=621, lst=613, result=NET_SUCCESS, err=0
    9 16F 1380822143 From Client:chn=621;len=43: HANDSHAKE; pv=9:max=4096:ver=9.6~007F
    9 17F 1380822143 To Client:chn=621;len=46: HANDSHAKE; pv=9:max=4096:ver=9.6~007F
    5 20s 1380822143 UNWTCP: enter TCP(7704552) call=NETPUT, chn=621, lst=613
    5 21s 1380822143 UNWTCP: exit TCP(7704552) call=NETPUT, chn=621, lst=613, result=NET_SUCCESS, err=0
    5 22s 1380822143 UNWTCP: enter TCP(7704552) call=NETGET, chn=621, lst=613
    5 23s 1380822143 UNWTCP: exit TCP(7704552) call=NETGET, chn=621, lst=613, result=NET_SUCCESS, err=0
Note the handshaking details. The client and urouter exchange their major Uniface version (9.6) to be sure they are compatible. No handshaking is done between urouter and userver, as you might have expected. The urouter expects the userver to be the same version (which is reasonable, as it has started the userver). The handshaking is more or less informational. A handshaking error is reported only when the client and server have a different major release. An exception here is for web requests. In that case, the client is the WRD, which for some reason reports version 8.1 in the handshake.

Next, the message from the client is received. In this case it is a database request (DBREQ) caused by the client doing a retrieve:

    9 18F 1380822143 From Client:chn=621;len=160: DBREQ; typ=D;av=I;op=I;mod=129;iop=255;ign=0;
    9 19F 1380822143 hop=0;dbg=0;pid=13288;tid=2884;qid=0;ins=0;
    1 1a 1380822143 claimsrv: want sid=0 usr=cwnl-chris ust=userver mnem=ANY ex=0 strt=1
    5 2a 1380822143 prepare_to_wait: for sid=1; ust=userver
    5 3a 1380822143 prepare_to_wait: Queued client entry #1 in server queue

The “claimsrv” line above is important. The item want sid=0 signals that this client request is not for a specific userver. This client evidently has no transaction, state, or instances open in a userver, or else it would ask to be served by that specific userver (want sid=N). Instead, the client specifies that this request can be served by any userver that is available (i.e. in Idle state). This is typically what we would expect to see in a web application using stateless requests. Urouter, seeing that no uservers are running yet, decides that this request will be handled by the userver with sid=1, and queues it in anticipation of the userver becoming available.
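To get a quick overview of which requests asked for a specific userver, the claimsrv lines can be summarized with awk. This is a minimal sketch: the function name claims is hypothetical, and reading ex=1 as "exclusive" and ex=0 as "shared" is an inference from the two traces shown here, not a documented meaning.

```shell
# Summarize every claimsrv line of a urouter log read on stdin.
# Assumption: ex=1 marks an exclusive request, ex=0 a shared one
# (inferred from the exclusive and shared traces in this presentation).
claims() {
    awk '/claimsrv: want / {
        sid = ""; ex = ""
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^sid=/) sid = substr($i, 5)
            if ($i ~ /^ex=/)  ex  = substr($i, 4)
        }
        kind = (ex == "1") ? "exclusive" : "shared"
        want = (sid == "0") ? "no specific userver" : ("userver sid=" sid)
        print kind ": " want
    }'
}

# Usage:
#   claims < urouter.log
```

Requests that show "userver sid=N" are the ones tied to a particular userver by an open transaction, state, or instance.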
Next we see our userver being started:

    9 20F 1380822143 svstart: starting server: user=emea\cwnl-chris; pgm=d:\uf96\common\bin\userver.exe -srvid=1 -dnp=TCP:+13001||CD24AB7D-470A-43F8-9FA1-EFE98719820E| -drv=ANY -ust=userver /adm=d:\uf96\common\adm -dir=D:\uf96
    9 21F 1380822143 svstart: Succesfully launched server, new pid=4040
    5 4a 1380822143 handle_wait: wait for server sid=1; ust=userver
    5 5a 1380822143 handle_wait: wait for client entry #1 in server queue

This is almost the same line as we saw for an exclusive userver, except for the absence of the -ex flag (obviously) and the -chn=NNN argument (this userver will communicate with urouter, not with the client). Now urouter will wait for the userver to connect back, which we see happening here, followed by the creation of a new thread and some housekeeping:

    3 24s 1380822143 UNWTCP: TCPaccept: chn=665 got host=AMS090861D1.clients.emea.cpwr.corp on TCP4
    5 25s 1380822143 UNWTCP: exit TCP(7706152) call=NETCONNECT, chn=665, lst=613, result=NET_SUCCESS, err=0
    9 22F 1380822143 accepted new connection on TCP:+13001
    5 26s 1380822143 UNWTCP: enter TCP(7720504) call=NETINSTANCE, chn=0, lst=613
    5 27s 1380822143 UNWTCP: exit TCP(7720504) call=NETINSTANCE, chn=0, lst=613, result=NET_SUCCESS, err=0
    5 28s 1380822143 UNWTCP: enter TCP(7720504) call=NETCONNECT, chn=0, lst=613
    9 4Z 1380822143 thpsv: new thread received u=7719120, thp=7576760, net=7706152, upsv=0, rmth=0, tha=0
    1 5Z 1380822143 thpsv: new thread active chn=665, cnt=4, lst=1, pmq=0, cc=1
    5 29s 1380822143 UNWTCP: enter TCP(7706152) call=NETGET, chn=665, lst=613
    5 30s 1380822143 UNWTCP: exit TCP(7706152) call=NETGET, chn=665, lst=613, result=NET_SUCCESS, err=0
Next we see the message from the userver coming in, asking to be registered as a shared server (SRVCON):

    9 23F 1380822143 From Server:chn=665;len=208: SRVCON;
    9 24F 1380822143 clt=(hst=172.16.43.135,AMS090861D1.clients.emea.cpwr.corp;pid=4040;tid=11328;sid=1;usr=cwnl-chris;ust=userver)
    9 25F 1380822143 log=(hst=TCP:AMS090861D1.clients.emea.cpwr.corp+13001;usr=cwnl-chris;ust=userver -drv=ANY;rid=CD24AB7D-470A-43F8-9FA1-EFE98719820E)
    5 1c 1380822143 srvload: local server registering sid=1;rid=CD24AB7D-470A-43F8-9FA1-EFE98719820E
    9 26F 1380822143 reguser: nid=172.16.43.135, node=AMS090861D1.clients.emea.cpwr.corp, pid=4040, ust=userver
    9 27F 1380822143 srvload: this is server sid=1

The server id (sid=1) is the id for this server. These IDs are handed out sequentially by urouter and are the main keys in the urouter’s administration. The registration being done, urouter sends a confirmation answer (CONANS) back to the userver:

    9 28F 1380822143 To Server:chn=665;len=3: CONANS; continue:sid=1:
    5 31s 1380822143 UNWTCP: enter TCP(7706152) call=NETPUT, chn=665, lst=613
    5 32s 1380822143 UNWTCP: exit TCP(7706152) call=NETPUT, chn=665, lst=613, result=NET_SUCCESS, err=0

and searches its administration for a client that can be served by this userver (this will typically be the client that just posted the request):

    5 6a 1380822143 notify_next_client: finding next client for sid=1
    5 7a 1380822143 notify_next_client: client #1 in server queue can use sid=1
    1 6Z 1380822143 thpsv: thread exit, cnt=3, lst=1, pmq=0, cc=0
    9 29F 1380822143 handle_wait: Queued client #1 continues with sid=1, ust=userver
    5 8a 1380822143 capable match reserved

Having decided that queued client 1 and userver 1 are a match, urouter forwards the client’s database request (DBREQ) to the userver:

    9 30F 1380822143 To Server:chn=665;len=160: DBREQ; typ=D;av=I;op=I;mod=129;iop=255;ign=0;
    9 31F 1380822143 hop=0;dbg=0;pid=13288;tid=2884;qid=0;ins=0;
    5 33s 1380822143 UNWTCP: enter TCP(7706152) call=NETPUT, chn=665, lst=613
    5 34s 1380822143 UNWTCP: exit TCP(7706152) call=NETPUT, chn=665, lst=613, result=NET_SUCCESS, err=0
    5 35s 1380822143 UNWTCP: enter TCP(7706152) call=NETGET, chn=665, lst=613
    5 36s 1380822143 UNWTCP: exit TCP(7706152) call=NETGET, chn=665, lst=613, result=NET_SUCCESS, err=0
You need to realize that a simple retrieve done by the client will result in a number of database requests being sent to the database driver via the urouter (Logon, Open Table, and Select/Fetch). The one you see here is actually the Logon request (never mind all those letter codes) and the answer from the server promptly follows and is passed back to the client:

    9 32F 1380822143 From Server:chn=665;len=118: ANSWER; typ=Z;av=I;op=M;ret=0,0;
    9 33F 1380822143 hop=0;dbg=0;pid=13288;tid=2884;qid=0;ins=0;
    9 34F 1380822143 To Client:chn=621;len=118: ANSWER; typ=Z;av=I;op=M;ret=0,0;
    9 35F 1380822143 hop=0;dbg=0;pid=13288;tid=2884;qid=0;ins=0;
    5 37s 1380822143 UNWTCP: enter TCP(7704552) call=NETPUT, chn=621, lst=613
    5 38s 1380822143 UNWTCP: exit TCP(7704552) call=NETPUT, chn=621, lst=613, result=NET_SUCCESS, err=0
    5 39s 1380822143 UNWTCP: enter TCP(7706152) call=NETGET, chn=665, lst=613
    5 40s 1380822143 UNWTCP: exit TCP(7706152) call=NETGET, chn=665, lst=613, result=NET_SUCCESS, err=0
    9 36F 1380822143 From Server:chn=665;len=1150: ANSWER; typ=Z;av=I;op=Z;ret=0,0;
    9 37F 1380822143 hop=0;dbg=0;pid=13288;tid=2884;qid=0;ins=0;
    9 38F 1380822143 To Client:chn=621;len=1150: ANSWER; typ=Z;av=I;op=Z;ret=0,0;
    9 39F 1380822143 hop=0;dbg=0;pid=13288;tid=2884;qid=0;ins=0;
    5 41s 1380822143 UNWTCP: enter TCP(7704552) call=NETPUT, chn=621, lst=613
    5 42s 1380822143 UNWTCP: exit TCP(7704552) call=NETPUT, chn=621, lst=613, result=NET_SUCCESS, err=0
    9 40F 1380822143 sid=1 ready: not locked and no state
    5 9a 1380822143 notify_next_client: finding next client for sid=1
    5 10a 1380822143 notify_next_client: no clients found for sid=1
    5 43s 1380822143 UNWTCP: enter TCP(7704552) call=NETGET, chn=621, lst=613
    5 44s 1380822143 UNWTCP: exit TCP(7704552) call=NETGET, chn=621, lst=613, result=NET_SUCCESS, err=0

Note that there are actually TWO responses from the server, both passed back to the client.
One of these is the server’s message frame information, which is sent back to the client because we specified $ioprint=255. Without the use of $ioprint, you’ll see only one answer here. Depending on the level of ioprint, there can be more server responses.

Note that after completely processing this request, the urouter reports the state of this userver:

    sid=1 ready: not locked and no state

and will check if there are any other pending requests in the queue for this specific userver. There are none at this moment, so urouter goes and listens for the next client request.

We see a similar exchange of data for the two other driver requests (Open Table and Select/Fetch), which we’ll not include here; it’s just more of the same.
Finally, when this client exits, we see urouter disconnecting the client (though NOT the userver) from its administration, and terminating the thread that was created for this client:

    9 103F 1380822145 From Client:chn=621;len=36: CLSNETREQ; typ=X;av=I;op=Z;mod=0;iop=255;ign=0;
    9 104F 1380822145 hop=0;dbg=0;pid=13288;tid=2884;qid=0;ins=0;
    9 105F 1380822145 To Client:chn=621;len=32: ANSWER; typ=Z;av=X;op=Z;ret=0,0;
    9 106F 1380822145 hop=0;dbg=0;pid=13288;tid=2884;qid=0;ins=0;
    5 105s 1380822145 UNWTCP: enter TCP(7704552) call=NETPUT, chn=621, lst=613
    5 106s 1380822145 UNWTCP: exit TCP(7704552) call=NETPUT, chn=621, lst=613, result=NET_SUCCESS, err=0
    5 107s 1380822145 UNWTCP: enter TCP(7704552) call=NETGET, chn=621, lst=613
    5 108s 1380822145 UNWTCP: exit TCP(7704552) call=NETGET, chn=0, lst=0, result=Ignoring error, err=-12
    9 107F 1380822145 client gone, searching for servers to stop for client: [(AMS090861D1.clients.emea.cpwr.corp/172.16.43.135) cltpid=13288]
    5 109s 1380822145 UNWTCP: enter TCP(7704552) call=NETDISCONNECT, chn=0, lst=0
    5 110s 1380822145 UNWTCP: exit TCP(7704552) call=NETDISCONNECT, chn=0, lst=0, result=NET_SUCCESS, err=0
    1 7Z 1380822145 thpsv: thread exit, cnt=2, lst=1, pmq=0, cc=0

This concludes our tour of the urouter log, in situations where everything goes well. This knowledge will come in handy when things don’t go well.

You may have been wondering what the numbers at the beginning of each line mean, for example in

    5 108s 1380822145 UNWTCP: exit TCP(7704552) call=NETGET, chn=0, lst=0, result=Ignoring error, err=-12

The first two fields hold the level and, combined in one field, the sequence number and category of the message. These are of practical use to Compuware Technical Support only. The large number is the timestamp, in the standard format of seconds since the Epoch (i.e. since Jan. 1, 1970, 00:00:00, or on Windows since Dec. 30, 1899, 00:00:00).
It was displayed in this format because I mistakenly used this setting for the tracing:

    $trc_info cat,lvl,dtm

where dtm means “datetime” (number of seconds since the Epoch), instead of the recommended

    $trc_info cat,lvl,dtt

where dtt means “delta time and thread id” (elapsed time since the start of the log). Then the lines would have looked like this:

    9 F 0:00.664.41 t=1: URouter started at 24-oct-2013 14:16:41

The delta time is in the following format:

    minutes:seconds.milliseconds.microseconds

Both formats make it hard to calculate the absolute datetime for a specific line, even though the starting time of the log is printed near the top:

    9 F 0:00.664.41 t=1: URouter started at 24-oct-2013 14:16:41

To make life easier, we also print the absolute datetime whenever a real error is logged, e.g.

    9 F 0:16.313.74 t=3: [Thu Oct 24 14:43:59 2013] err=-25: thpsv: Problems handling request
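If you do end up with dtm timestamps, they can be turned back into readable dates from the shell. The sketch below assumes the values count seconds since Jan. 1, 1970 (as the ones in the traces above do) and assumes GNU date, whose -d @SECONDS form is not available everywhere (BSD date uses -r SECONDS instead). The function name epoch_to_utc is made up for illustration.

```shell
# Convert a dtm-style timestamp (seconds since 1970-01-01 UTC) to a UTC date.
# Assumes GNU date; on BSD-style systems use: date -u -r "$1" '+%Y-%m-%d %H:%M:%S'
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

# Usage, with the timestamp from the exclusive-connection trace:
#   epoch_to_utc 1379682041
```

The result is in UTC, so the local time zone of the server still has to be taken into account when comparing with other logs.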
4 - Server-side troubleshooting

So, we have been able to verify that we’ve done everything right on the client, and we know what to expect from urouter when all goes well. Yet, there will usually be some problem that we need to find. Let’s examine the checks that can be done, using the one-step-at-a-time approach, and the tools to use.

For Windows, the tools were already mentioned. On Unix/Linux, the most important tool to know about is truss (which is called strace on Linux). As a rule, these programs come with the operating system. If they are not there, ask your system administrator to install them. We’ll talk more about truss/strace later.

The first check is whether the urouter process is indeed running. On Windows, use Task Manager or Process Explorer. On Unix, use the ps command. If it is running, shut it down because we are going to start from scratch.

From the urouter shortcut, service definition, script, or whatever it is that starts the urouter, get the name of the assignment file. This will usually be uniface/adm/urouter.asn but this could have been overruled at the command line etc. It is important to be 100% sure about what assignment file(s) is/are being used. If you don’t know exactly, find it using Process Monitor (on Windows) or truss/strace on Unix/Linux. An example of doing this on Linux with strace:

    $ strace -o strace.log common/bin/urouter
    ^C
    $ grep '.asn' strace.log
    open("/h/chris/uf/96/lia/uniface/adm/usys.asn", O_RDONLY) = 3
    open("/h/chris/uf/96/lia/common/adm/usys.asn", O_RDONLY) = 4
    open("urouter.asn", O_RDONLY) = -1 ENOENT (No such file or directory)
    open("urouter.asn", O_RDONLY) = -1 ENOENT (No such file or directory)
    open("/h/chris/uf/96/lia/uniface/adm/urouter.asn", O_RDONLY) = 3

This tells you exactly which asn files are being opened, and which are tried but not found (the urouter.asn in the working directory). On Windows, you get similar information from Process Monitor if you filter on “Path ends with .asn”.
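When the strace log is large, the grep above can be taken one step further by labelling each assignment file as opened or failed. This is a sketch only: the function name asn_opens is hypothetical, and it assumes strace output in the form shown above (one open or openat call per line, with a return value of -1 on failure).

```shell
# Read strace output on stdin and report, for every .asn file,
# whether the open succeeded or failed (return value -1).
asn_opens() {
    awk -F'"' '/^open(at)?\(/ && $2 ~ /\.asn$/ {
        status = ($0 ~ /= -1 /) ? "failed" : "opened"
        print status ": " $2
    }'
}

# Usage:
#   asn_opens < strace.log
```

A "failed" entry is not necessarily a problem: as the example above shows, urouter first tries urouter.asn in the working directory before falling back to uniface/adm.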
If you THINK you know what assignment files urouter uses, and can’t or don’t want to run a tool to verify it, then insert a syntax error in the file, and verify that urouter now refuses to start. Unfortunately it will not report the error, like a client application would.

Although urouter does open and read uniface/adm/usys.asn, like any Uniface process does, there is usually not much in there that is relevant to urouter. It is good practice to write the urouter assignment file so that it is self-contained and does not need assignments in usys.asn. A typical urouter assignment file is simple and small, e.g.

    [SETTINGS]
    $ioprint          255
    $putmess_logfile  urouter.log
    $default_net      TCP:+13001|||

    [SERVERS]
    userver  /h/chris/uf/96/lia/common/bin/userver /dir=/h/chris/uf/96/lia

As a rule, you don’t need anything else besides the [SETTINGS] and [SERVERS] sections. The above is usually enough for a first successful test.
Note that there are different places where urouter’s port number can be defined: in the asn, on the command line, or in /etc/services. I find it most useful to keep this information in the asn file:

    $default_net TCP:+13001|||

so that it can be easily changed and you don’t need to look in different places. When you have started the urouter, verify in the trace file that urouter is indeed using this port number by looking for this line:

    started thread to listen to TCP:+13001

and then use netstat to check that it is indeed listening:

    $ netstat -a|grep 13001
    tcp        0      0 *:13001        *:*        LISTEN

What if urouter does not start? There could be several reasons for this:

1) Errors with loading dll’s or shared libraries (e.g. LD_LIBRARY_PATH not set, or on Windows the Uniface bin folder is not in PATH). Tools like Process Monitor, Dependency Walker, truss/strace will usually show what is wrong.

2) Invalid image type (e.g. trying to start a LIA executable on LIB). Make sure you have installed the correct platform.

3) Assignment statement errors. No message is given for this (unlike in a client). Check your urouter/userver assignment files for syntax errors by using them with IDF, e.g.

    $ $idf /asn=uniface/adm/urouter.asn
    8008 - Assignment error: '[SETINGS]' in uniface/adm/urouter.asn:2

4) Logfile in use. No message is given for this. If you need to run multiple instances of urouter, make sure the logfile names specified in the urouter asn are unique by using one or more of the special tokens in the file name:

    Token   Expanded to
    -----   -----------
    %p      process id
    %u      username
    %t      timestamp
    %h      hostname
5) Port in use (another urouter already running on the same port number). No message is given for this, but you can find this error in the urouter log:

    1 s UNWTCP: TCP6create : bind(): chn=6 hst=0.0.0.0 failed ret=98 Address already in use
    8 s UNWTCP: TCPclose: chn=6 success
    1 s UNWTCP: TCP6create : failed ret=98
    1 s UNWTCP: exit TCP(506652288) call=NETCREATE_SHARED, chn=0, lst=0, result=NETERR_UNKNOWN, err=98
    9 F can't create listen channel at TCP:+13001
    5 s UNWTCP: enter TCP(506652288) call=NETMSG, chn=0, lst=0
    5 s UNWTCP: exit TCP(506652288) call=NETMSG, chn=0, lst=0, result=NET_SUCCESS, err=0
    9 F TCP (98) TCP error [98]: Address already in use

This is a good moment to talk about the main troubleshooting tool on Unix: truss (an acronym for Trace Utility for System Calls and Signals). On Linux, this program is called strace. On older versions of HP-UX, it used to be called tusc, but current versions have truss also. On most machines truss/strace is available by default, but if it isn’t, and you have to troubleshoot, insist on getting it installed. It’s the first tool I turn to when troubleshooting any file-related problem on Unix. From here on we use the name truss to denote either strace or truss. The programs are largely the same, but for specific options you always need to consult the local man page.

You can use truss in one of two ways:

1) Start a program under truss, using the full command line as argument. For example

    $ truss common/bin/urouter /pri=255

2) Hook truss up to an already running process, using the process id (pid) as argument. For example

    $ truss -p pid

In this most basic scenario, truss outputs on the screen all Unix system calls made by the process, with their arguments and return values, and signals. Be prepared for a lot of output; even the simplest one-liner C program can already produce a page or more of output. There are many command line parameters to control what you want to see and what you don’t.
For example, if you are interested only in seeing which files are accessed, you can add the -topen argument. For example:

    $ truss -topen,write $idf /who

produces output like this:
    open("/var/ld/64/ld.config", O_RDONLY) = 3
    open("/usr/lib/64/libc.so.1", O_RDONLY) = 3
    open("/usr/lib/64/libdl.so.1", O_RDONLY) = 3
    open("/usr/platform/SUNW,Sun-Fire-V490/lib/sparcv9/libc_psr.so.1", O_RDONLY) = 3
    open("/.machine", O_RDONLY) Err#2 ENOENT
    open("/var/ld/64/ld.config", O_RDONLY) = 3
    open("/usr/lib/secure/64/s9_preload.so.1", O_RDONLY) = 3
    open("/lib/sparcv9/libCrun.so.1", O_RDONLY) = 3
    open("/lib/sparcv9/libm.so.1", O_RDONLY) = 3
    open("/lib/sparcv9/libCstd.so.1", O_RDONLY) = 3
    open("/lib/sparcv9/libc.so.1", O_RDONLY) = 3
    open("/h/chris/uf/96/so9/common/lib/libucall.so", O_RDONLY) = 3
    open("/lib/sparcv9/libnsl.so.1", O_RDONLY) = 3
    open("/lib/sparcv9/libsocket.so.1", O_RDONLY) = 3
    open("/lib/sparcv9/libdl.so.1", O_RDONLY) = 3
    open("/lib/sparcv9/librt.so.1", O_RDONLY) = 3
    open("/lib/sparcv9/libmalloc.so.1", O_RDONLY) = 3
    open("/lib/sparcv9/libthread.so.1", O_RDONLY) = 3
    open("/usr/lib/64/libmp.so.2", O_RDONLY) = 3
    open("/usr/lib/64/libaio.so.1", O_RDONLY) = 3
    open("/usr/lib/64/libmd5.so.1", O_RDONLY) = 3
    open("/usr/platform/SUNW,Sun-Fire-V490/lib/sparcv9/libc_psr.so.1", O_RDONLY) = 3
    open("/h/chris/uf/96/so9/common/lib/libulib.so", O_RDONLY) = 3
    open("/h/chris/uf/96/so9/common/lib/libuenc.so", O_RDONLY) = 3
    open("/h/chris/uf/96/so9/common/lib/libdlm64.so", O_RDONLY) = 3
    open("/h/chris/uf/96/so9/common/lib/liburtl.so", O_RDONLY) = 3
    open("/var/run/name_service_door", O_RDONLY) = 3
    open("/h/chris/uf/96/so9/common/adm/usys.ini", O_RDONLY) Err#2 ENOENT
    open("/h/chris/uf/96/so9/common/adm/usys.ini", O_RDONLY) Err#2 ENOENT
    open("/H/CHRIS/UF/96/SO9/COMMON/ADM/USYS.INI", O_RDONLY) Err#2 ENOENT
    open("/h/chris/uf/96/so9/uniface/adm/usys.asn", O_RDONLY) = 4
    open("/h/chris/uf/96/so9/common/adm/usys.asn", O_RDONLY) = 5
    open("idf.asn", O_RDONLY) = 4
    open("/usr/share/lib/zoneinfo/MET", O_RDONLY) = 4
    open("/h/chris/uf/96/so9/common/adm/usys.ini", O_RDONLY) Err#2 ENOENT
    open("/h/chris/uf/96/so9/common/adm/usys.ini", O_RDONLY) Err#2 ENOENT
    open("/H/CHRIS/UF/96/SO9/COMMON/ADM/USYS.INI", O_RDONLY) Err#2 ENOENT
    open("/h/chris/uf/96/so9/common/usys/usys.urr", O_RDONLY) = 5
    open("/h/chris/uf/96/so9/common/usys/udesc.urr", O_RDONLY) = 5
    open("/h/chris/uf/96/so9/common/usys/uobj.dol", O_RDONLY) Err#2 ENOENT
    open("/h/chris/uf/96/so9/common/usys/uobj.dol", O_RDONLY) Err#2 ENOENT
    open("/H/CHRIS/UF/96/SO9/COMMON/USYS/UOBJ.DOL", O_RDONLY) Err#2 ENOENT
    open("/h/chris/uf/96/so9/common/usys/usys.dol", O_RDONLY) = 5

I find truss most useful for finding errors on opening files: what files does the program try to open, where does it look for them, and what is the result. In particular, the loading of shared libraries is interesting, because this is the only way to be sure which directory in the LD_LIBRARY_PATH a library is actually loaded from. For this purpose you need to look at the open and stat system calls. For example if you do

    $ truss -o truss.log -topen,stat common/bin/urouter
you can see in the output only the calls used to locate and open files:

    stat("/h/chris/uf/96/so9/common/lib/libsocket.so.1", 0xFFFFFFFF7FFFE2F0) Err#2 ENOENT
    stat("/h/chris/uf/dlm41/Linux/64/libsocket.so.1", 0xFFFFFFFF7FFFE2F0) Err#2 ENOENT
    stat("/local/products/dbms/oracle1020/lib/libsocket.so.1", 0xFFFFFFFF7FFFE2F0) Err#2 ENOENT
    stat("/local/products/dbms/oracle1020/rdbms/lib/libsocket.so.1", 0xFFFFFFFF7FFFE2F0) Err#2 ENOENT
    stat("/cwnl/solaris/compilers/cc57CC57/SUNWspro/lib/rw7/v9/libsocket.so.1", 0xFFFFFFFF7FFFE2F0) Err#2 ENOENT
    stat("/cwnl/solaris/compilers/cc57CC57/SUNWspro/lib/v9/libsocket.so.1", 0xFFFFFFFF7FFFE2F0) Err#2 ENOENT
    stat("/opt/SUNWspro/lib/v9/libsocket.so.1", 0xFFFFFFFF7FFFE2F0) Err#2 ENOENT
    stat("/usr/ccs/lib/sparcv9/libsocket.so.1", 0xFFFFFFFF7FFFE2F0) Err#2 ENOENT
    stat("/lib/sparcv9/libsocket.so.1", 0xFFFFFFFF7FFFE2F0) = 0
    open("/lib/sparcv9/libsocket.so.1", O_RDONLY) = 3

showing the multiple locations where the system tries to find a shared library. The above is not exceptional, but sometimes the list of failed attempts gets really long, which is of course not good for performance. It is good practice to review your LD_LIBRARY_PATH (or LD_LIBRARY_PATH64, or LIBPATH, or SHLIB_PATH, depending on your platform) and make sure there are no unwanted, non-existing or duplicate directories, and that the most used directories (like the Uniface lib directory) are not at the very bottom of a long list.

Some truss flags you need to know about:

-o file
    Directs the output to a file, which is usually preferable over getting tons of stuff on your screen. Truss does not work well with I/O redirection.

-t func1,func2,…         (truss)
-e trace=func1,func2,…   (strace)
    Directs truss only to trace the specified functions.

-t !func3,func4,…        (truss)
-e trace=!func3,func4,…  (strace)
    Directs truss NOT to trace the specified functions.

-t !nanosleep            (truss)
-e trace=!nanosleep      (strace)
    Prevents the output from filling up with nanosleep() calls. Calling this function is what urouter does when it has nothing better to do. You’ll want to use this especially when you hook up to a running urouter.

-w 2
    Trace full I/O buffers for file descriptor 2 (this is the Unix standard error channel, stderr). Very handy to see system-level error messages that would otherwise be lost. We’ll see an example of this later on. Other file descriptors could be used here too, obviously, like 1 for standard output, stdout.
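The advice above to review the library search path for non-existing and duplicate directories can be sketched as a small shell function. The name check_libpath is made up for illustration; the function only inspects the path string it is given, so pass it whichever variable applies on your platform.

```shell
# Report duplicate and non-existing directories in a colon-separated search path.
# $1: the path to check, e.g. "$LD_LIBRARY_PATH"
# The body runs in a subshell so the IFS and globbing changes do not leak out.
check_libpath() (
    IFS=:
    set -f                      # no filename globbing on the split words
    seen=:
    for dir in $1; do
        [ -n "$dir" ] || continue
        case $seen in
            *":$dir:"*) echo "duplicate: $dir" ;;
            *)          [ -d "$dir" ] || echo "missing: $dir" ;;
        esac
        seen="$seen$dir:"
    done
)

# Usage:
#   check_libpath "$LD_LIBRARY_PATH"
```

The check should be run in the same environment (account, profile scripts) that starts urouter, since that is the path the uservers will inherit.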
-f
    Follow child processes. Use this when you also want to trace all executables spawned by your program. When tracing urouter, you generally do not want to trace all the uservers. When tracing a userver, you generally DO want to trace spawned programs. However, be aware that this option can lead to enormous amounts of output, especially when shell scripts are being executed.

Back to the troubleshooting trip. Having ascertained that we can reach port 13001 on the server, and that urouter is indeed running on that port, it should be possible to make a connection. If you start the client, a line like this should appear in the urouter log

    9 10F 1379682041 accepted new connection on TCP:+13001

and in netstat, you should now see an additional connection with status ESTABLISHED:

    $ netstat -a | grep 13001
          *.13001              *.*                                           0      0 49152      0 LISTEN
    cwnl-violet.13001    AMS090861D1.clients.emea.cpwr.corp.63508  65024      0 49640      0 ESTABLISHED

The urouter now receives the network logon path provided by the client. As an example, suppose the client logon path is

    $psv = tcp:violet+13001|chris|urouter2013|prod001 -ex

which means the client requested to start an exclusive userver on the machine violet under the account chris with password urouter2013 and with UST prod001.

The urouter must translate the UST prod001 into an actual userver command line that can be started. This information is found in the urouter’s assignment file.
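To make the structure of such a logon path explicit, the sketch below splits it into the parts named above. The function name parse_logon_path is hypothetical, and the layout it assumes (connector:host+port|user|password|UST followed by options) is taken only from the example in this presentation. The password is deliberately not printed.

```shell
# Split a network logon path into host, port, user, UST and options.
# $1: a path such as 'tcp:violet+13001|chris|urouter2013|prod001 -ex'
parse_logon_path() {
    printf '%s\n' "$1" | awk -F'|' '{
        split($1, net, ":")          # net[1] = connector, net[2] = host+port
        split(net[2], hp, "+")       # hp[1] = host, hp[2] = port
        n = split($4, ust, " ")      # ust[1] = UST name, the rest are options
        opts = ""
        for (i = 2; i <= n; i++) opts = opts (i > 2 ? " " : "") ust[i]
        print "host=" hp[1]
        print "port=" hp[2]
        print "user=" $2
        print "ust=" ust[1]
        print "options=" opts
    }'
}

# Usage:
#   parse_logon_path 'tcp:violet+13001|chris|urouter2013|prod001 -ex'
```

Comparing the ust= value with the names in the [SERVERS] section of the urouter assignment file is a quick way to spot a client asking for a UST that urouter does not know.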
The UST definition.

The urouter assignment file contains the parameters for each known userver (known as the UST definition) in the [SERVERS] section, e.g.

    [SERVERS]
    prod001 /h/chris/uf/94/so9/common/bin/userver /dir=/home/prod001 /pri=255 /max=1

In the above example the UST is prod001, which is what the client requested in the UST part of the network logon path. The definition for this UST is thus

    /h/chris/uf/94/so9/common/bin/userver /dir=/home/prod001 /pri=255 /max=1

This looks like a full userver command line but it isn’t, at least not yet. Certain parts, like /dir and /max, will not be passed to userver but processed by urouter before starting the userver. Urouter will add command line arguments of its own to the userver command. It is important to realize that there is NO SYNTAX CHECKING here. Everything that is not recognized by urouter is passed verbatim to the userver. That could be command line switches (starting with / or -) or program arguments. They will be put in the correct order by urouter (i.e. switches first, arguments last). The full userver command line can be found in the urouter log, as we have already seen, in the line starting with svstart:

The full command line for the above UST will be

    9 F 0:03.905.09 t=3: svstart: starting server: user=chris; pgm=/h/chris/uf/94/so9/common/bin/userver/h/chris/uf/94/so9/common/bin/userver -srvid=1 -pri=255 -dnp=TCP:+13001||0AF1C028-3B14-11E3-B6E9-A25E8C4312FA| -drv=ANY -ust=userver -chn=8 -ex -dir=/h/chris/uf/94/so9

The /dir switch needs special mention. When specified, urouter attempts to make this the working directory for the userver process. However there is NO ERROR CHECKING. When it fails, because the directory does not exist, or is misspelled, or is not accessible, urouter will proceed to start the userver in the current directory. Additionally, it also passes the /dir switch to the userver (this is a bug…), which will silently exit because it cannot set the directory.
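Because a bad /dir value fails silently, it can be worth checking the directories before starting urouter. The sketch below reads an assignment file on stdin and reports /dir (or -dir) values in the [SERVERS] section that are not existing directories. The function name check_ust_dirs is made up, and the parsing assumes the simple one-line-per-UST layout shown above; it is not a full asn parser.

```shell
# Read a urouter assignment file on stdin and report UST definitions whose
# /dir= (or -dir=) value is not an existing directory.
# Assumes one UST definition per line inside the [SERVERS] section.
check_ust_dirs() {
    awk '
        /^\[/ { in_servers = (toupper($1) == "[SERVERS]"); next }
        in_servers && NF {
            for (i = 2; i <= NF; i++)
                if ($i ~ "^[/-]dir=") print $1, substr($i, 6)
        }' |
    while read -r ust dir; do
        [ -d "$dir" ] || echo "UST $ust: /dir=$dir is not an existing directory"
    done
}

# Usage:
#   check_ust_dirs < uniface/adm/urouter.asn
```

Note that this only tests existence for the account running the check; the account under which the userver is eventually started may have different permissions.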
TIP – A nice undocumented feature, available as from R122, E102 and 9.6.01, is the possibility to run the userver under a debugger-like program without the need for a wrapper script or program. This is a great way to diagnose problems encountered during userver startup. This can be done in the UST definition by prepending the debugger command and a + sign before the name of the userver. The debugger program in question will usually be truss (on Unix) or Dependency Walker (on Windows). For example:
    userver = /usr/bin/truss -f -o /tmp/truss.log + /uf/bin/userver /dir=...
or
    userver = c:\tools\depends.exe /pb /od:c:\userver.dwi + D:\uf\bin\userver.exe /dir=...
The plus sign is essential to avoid urouter mashing up the command line and getting confused about which switches belong to which program.
  36. 36. Please note that full path names must be used for truss and depends.exe. At this stage, there isn’t a command interpreter that will go and look for a command in a list of locations. It has to be just right. When in doubt where a command is located, use the command which (on Unix) or where (on Windows). For example:
    $ which truss
    /usr/bin/truss
or
    C:\Users\cwnl-chris>where depends
    D:\Tools\depends.exe
Another trick here is to have the UST refer to a wrapper script instead of the userver executable:
    userver = /uf/bin/userver.sh /dir=...
which gives you complete flexibility about what to do before the userver is launched. In its most basic form, the wrapper script looks like this
    #!/bin/ksh
    . /h/chris/uf/94/so9/common/adm/insunis
    /h/chris/uf/94/so9/common/bin/userver $*
simply passing all arguments on to the userver. The nice thing is that here you have an opportunity to
1) Set any environment variables the userver needs (think of ORACLE_HOME, ORACLE_SID, DSQUERY, etc.)
2) Do initialization, logging, etc.
3) Check the userver exit code and handle possible core dumps
4) Use truss if your version is older than R122 or E102 so you can’t use the above trick. This is described in more detail in this article http://frontline.compuware.com/products/uf/tech/22752.aspx (note: the exec command used in this article is no longer needed in recent versions).
Don’t forget to set the executable bit on the script. I have never attempted a wrapper script on Windows and don’t know if it is possible.
Having parsed the UST definition, urouter will handle the security aspects of the request. It is important to understand what exactly is about to happen now. The keywords are authentication and impersonation.
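As an illustration of points 2) and 3) of the wrapper-script list above, here is a slightly extended wrapper sketch. The function name launch_userver and the log file argument are assumptions made for this example; the environment setup (sourcing insunis, database variables) would go before the call, as in the basic script.

```shell
#!/bin/sh
# Hypothetical wrapper sketch: run the real userver, keep its output, record its exit code.
# Usage: launch_userver <userver-binary> <logfile> [userver arguments...]
launch_userver() {
    bin=$1
    log=$2
    shift 2
    echo "$(date) starting $bin $*" >> "$log"
    # stdout and stderr of a server process normally get lost; append them to the log.
    # "$@" passes every argument through unchanged, including ones containing spaces.
    "$bin" "$@" >> "$log" 2>&1
    rc=$?
    echo "$(date) $bin ended with exit code $rc" >> "$log"
    return $rc
}
```

A real wrapper would end with something like launch_userver /h/chris/uf/94/so9/common/bin/userver /tmp/userver_wrapper.log "$@" (the log location is just a placeholder). An exit code above 128 usually means the process was killed by a signal, which is a hint to go looking for a core file.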
  37. 37. Authentication
First of all, the username and password provided by the client must be checked for validity on the target system. This validation consists of 3 checks:
• Username and password are both required. Neither of them can be empty.
• The user must exist (i.e. has an active account) on the target system.
• The password for this username must be valid.
If one of these checks fails, urouter rejects the connection request and returns error -21 (Authentication error), causing the client to report a logon failure:
    Client connect error: -21: cretpsv: Authentication of user/password failed for user foobar
    8061 - Network error detected ( (0)). Logon (TCP:violet+13001|foobar|***|userver -ex) failed with status -21, Network logon error
Don’t be worried about the message “Network error detected”. This is Uniface speak for “I could not get connected to the userver”. If there really is a network error you would see the details of it.
It is worth noting that in most cases, authentication does not require root rights. But in some circumstances it does, e.g. on a Unix system that uses shadow passwords, because /etc/shadow is readable only by root.
On Windows, processes require a set of User Rights. Discussing these is outside the scope of this presentation. A Uniface installation sets the correct rights for urouter and userver. If however Windows still reports “A required privilege is not held by the client” when trying to start a urouter or userver, it is hard (or maybe impossible) to find out which specific user privilege is missing. When in doubt it usually helps to assign these rights to all users involved:
• Act as part of the Operating System
• Create a token object
• Log on as a batch job
• Log on as a service
• Replace a process level token
  38. 38. Impersonation
Before starting an userver, the urouter must create a subprocess with the user credentials of the specified user. In Unix terms, it needs to change the userid. For obvious reasons, this action always requires root permission. However, if the desired userid is equal to the current userid, there is no need to change it, and urouter will skip this action. In this case the urouter does not need to run as root, but it means that you can use only one user for all your uservers, i.e. the user as which the urouter is running.
Things work slightly differently on Windows. Here, authentication and impersonation are not separate actions, but are both implemented by the CreateProcessAsUser() function. This function is called regardless of whether the urouter and userver have the same user id. Typically on Windows, urouter will be running as a service under the system account NT AUTHORITY\SYSTEM.
Uniface provides a mechanism to implement custom authentication, called a Security Driver. The idea is that customers can develop their own 3GL code in C to make urouter perform whatever authentication they desire. This C code must use the macros defined in the include file uniface\3gl\include\zsecint.h and the code must export the function usecappl() which can be called by urouter. Because this is implemented by a C call-out, a security driver must be included in the [USER_3GL] section of the urouter’s assignment file:
    [user_3gl]
    mysecdriver(usecappl)
Note that it is ONLY the authentication you can customize here, not the impersonation. For this reason security drivers are mainly used on Unix. On Windows, a security driver could be used to impose additional authentication for an userver. This step is then performed before calling CreateProcessAsUser(), which will in turn still do the standard Windows authentication. As far as I know, no security drivers are being deployed on platforms other than Unix. 
Besides custom authentication, a security driver can also optionally implement encryption of network logon strings and/or postmessage headers. The sample security driver provided with Uniface does just this, and can be activated in the asn like this
    [user_3gl]
    zsecdrv(usecappl)
Note that there is however no way in Uniface to encrypt all data traffic passing from client to userver. Such functionality is not currently on the roadmap.
• A security driver that does only authentication needs to be included in the urouter assignment only.
• A security driver that does encryption of logon strings and/or message headers must be included in ALL assignment files (client, urouter and userver).
  39. 39. In practice, few if any customers write their own security driver, but over the years a couple of security drivers have been provided by Technical Support to various customers:
    NAME    PURPOSE
    hpux    Support for shadow passwords on HP-UX and Trusted HP-UX (neither is handled by the default authentication).
    dummy   A security driver that does NO authentication, i.e. does not check the password (but SOME bogus password must be supplied because it is mandatory). Note that the user must still be a valid user, because it will be used in the impersonate step.
    pam     On Solaris, authenticates using the PAM (Pluggable Authentication Module). By default, Uniface only uses the PAM on Linux.
    upass   A combination of a security driver and setuid-root program that allows urouter to run as a normal user on a system using shadow passwords. Recall that validating a shadow password requires root access. Some customers object to an urouter being root-owned. With this implementation, the actual validation is done by executing the setuid-root program upass. Available from ftp://ftp.compuware.com/pub/uniface/outgoing/cbr/upass-3.06.tar
While the sample code and include file of the security driver look rather complicated, the actual code can be very succinct. For example the source code of the dummy security driver reads like this, nicely illustrating the concept:
    #include "zsecdrv.h"                        /* Include header from uniface/3gl/include */

    long usecappl (USecDrv *Sec)                /* Entry point to be called by urouter */
    {
        if ( Sec->Function == USEC_DRVINFO )    /* Urouter to secdriver: What functions do you implement ? */
        {
            USecSetUserPassVal(Sec);            /* Secdriver to urouter: I do user/password validation only */
        }
        return USEC_SUCCESS;                    /* Actual validation simply returns OK always */
    }
  40. 40. Setting the userver’s environment. For an userver to work properly, it will typically need a bunch of environment variables – at least on Unix, less so on Windows. Think of the variables USYS etc. set by insunis, the variables needed to access a database like ORACLE_SID or DSQUERY, and the variables needed to locate shared libraries (LD_LIBRARY_PATH and friends). There are several ways an userver on Unix can obtain its necessary environment:
• By inheritance from its parent, the urouter. If you set all variables in a script or terminal session before starting urouter, all uservers get them too.
• By using the -su option in the UST definition. This will cause urouter to run su to execute the target user’s logon profile. This is however a complicated process and not all user profiles are suitable to be executed in a server process (because they may do something with the screen or keyboard). Also, the use of /su may be restricted on some systems. This is why I never recommend using /su but instead the 3rd option:
• By using a wrapper script as described earlier. This is an efficient and convenient way to make sure each userver gets exactly what it needs.
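Whichever of these three routes you choose, it helps to verify that the variables really arrived in the userver's environment. A minimal sketch, assuming you call it from the wrapper script just before launching the userver; require_env is a hypothetical helper, and the variable names you pass are whatever your own installation needs.

```shell
#!/bin/sh
# Hypothetical helper: report environment variables that are unset or empty.
# Usage: require_env VAR [VAR...]   (returns non-zero if any is missing)
require_env() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            missing=1
        fi
    done
    return $missing
}
```

For an Oracle-based userver one might call require_env USYS ORACLE_HOME ORACLE_SID LD_LIBRARY_PATH and refuse to start when it fails, so that the problem shows up in the wrapper's log instead of as a silent timeout.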
  41. 41. Starting the userver. Having successfully authenticated the username and password, (on Unix) forked a subprocess with the target user’s credentials, and (if necessary) having set the working directory and user environment, urouter is now ready to start the actual userver process. This is a point where a lot of things can go wrong, and it is the most problematic stage to troubleshoot, as we give control to the operating system and have to wait until an userver is up and running.
The first class of problems that can be encountered here are issues with the userver executable itself, preventing the OS from loading it. These include:
    File not found              The file may have been moved, or you could have misspelled the name
    Insufficient permission     The file does not have read and/or execute permission for the user in question
    Wrong type of executable    Possibly the file is for another platform (e.g. LIA instead of LIB)
In all these cases, the urouter does not actually detect the problem. Instead it suggests that the userver has been successfully launched - but that only means, at this point, that a new process has been created which still needs to execute the userver. The failure to execute is not trapped, which I believe to be a bug, and instead the urouter times out waiting for the userver to respond. 
You find in the urouter log (some lines left out for brevity):
    9 F 4:17.968.17 t=3: svstart: starting server: user=chris; pgm=/this/is/a/bogus/path//userver -srvid=1 -dnp=TCP:+13001||BA3CE7A2-3AFF-11E3-95CF-BC93302E3648| -drv=ANY -ust=userver -chn=8 -ex -dir=/foo/bar
    9 5 4:17.971.48 t=3: svstart: Succesfully launched server, new pid=18243
    9 9 4:17.971.60 t=3: handle_wait: wait for server sid=1; ust=userver
    9 F 5:20.726.65 t=1: clean_sweep: Server startup timed out after 63 seconds, sid=1
    a F 5:20.727.29 t=1: Stopping server sid=1; shut=0 mode=normal
    F F 5:20.728.26 t=3: [Tue Oct 22 11:58:16 2013] err=-25: getsrv: handle_wait wait failed
And the client only reports a -25 error:
    8061 - Network error detected ( (0)). Logon (TCP:violet+13001|chris|***|userver -ex) failed with status -25, UServer unexpectedly gone
The client error text “UServer unexpectedly gone” is also a little misleading, suggesting that there has been an userver process when in fact there never was one. Problems like this are best tackled by running Process Monitor (on Windows) or running the userver under truss. That will reveal the actual problem. Or, you can use a wrapper script which picks up the standard output channels of userver (which by default get lost for a server).
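Because urouter does not trap a failed launch, the first two problem classes (file not found, insufficient permission) can be ruled out by hand before digging into traces. The following is a rough sketch; check_executable is a hypothetical helper, and it should be run as the user under which the userver will be started.

```shell
#!/bin/sh
# Hypothetical pre-flight check for the executable named in a UST definition.
# Return codes: 0 ok, 1 not found, 2 not a regular file, 3 no read/execute permission.
check_executable() {
    f=$1
    [ -e "$f" ] || { echo "not found: $f" >&2; return 1; }
    [ -f "$f" ] || { echo "not a regular file: $f" >&2; return 2; }
    if [ ! -r "$f" ] || [ ! -x "$f" ]; then
        echo "no read/execute permission: $f" >&2
        return 3
    fi
    return 0
}
```

The third class, a binary built for another platform, is not covered by this check; the output of the standard file command on the userver binary is usually enough to spot that.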
  42. 42. The next class of problems are userver startup problems. That is, the OS has successfully started the userver process, but it exits more or less immediately (in any case before reporting back to urouter, and before being able to produce a log or trace file) because of some initialization error. Some common problems:
1) Assignment statement error. For example, the userver asn-file contains some unrecognized word, for example the first line is
    [BLAAAAAAA]
This will also cause a server timeout and -25 error. As said earlier, userver does not log this kind of error anywhere, which is a bit of a pain. It is therefore good practice to sanity-check the userver assignment file. Two ways to do this:
• Run idf with the userver’s assignment file. You don’t expect idf to start, only report the error:
    $ $idf /asn=common/adm/userver.asn
    8008 - Assignment error: '[BLAAAAAAA]' in uniface/adm/userver.asn:1
• Run the userver under truss in a command window:
    $ truss common/bin/userver
    . . .
    open("/h/chris/uf/94/so9/uniface/adm/userver.asn", O_RDONLY) = 4
    read(4, " [ B L A", 4) = 4
    read(4, " A A A A A A ]\n\n [ S E".., 1020) = 808
    close(4) = 0
    lseek(1, 0, SEEK_CUR) = 1723552
    lseek(2, 0, SEEK_CUR) = 1723587
    lseek(2, 0, SEEK_CUR) = 1723622
    lseek(1, 0, SEEK_CUR) = 1723657
    lseek(2, 0, SEEK_CUR) = 1723692
    lseek(2, 0, SEEK_CUR) = 1723727
    _exit(1)
If you see userver exiting immediately after reading some assignment line, you can be sure that line is wrong.
In earlier Uniface versions, you could also get an “Assignment statement error” when the log or trace file could not be written. In recent versions, this is no longer a fatal error. Userver will now create a file userverNNNNN.log in the current directory and write the error(s) in it, e.g.
    ULOG Error: Failed to open log /foo/userver.log
    ULOG Error: Failed to write to /foo/userver.log
There are also assignment statement errors which are not detected at startup, only when the specific assignment is being used. 
Think for example of database connector parameters. These will be parsed and checked by the connector in question when the path is first accessed, and as a rule, a clear message is given in the Uniface message frame.
  43. 43. 2) Failure to load a shared library. Suppose someone has deleted the DLM installation directory. The userver will then not be able to start as it statically depends on libdlm64.so. In a terminal window this is easily detected by simply running userver from the command line:
    $ common/bin/userver
    ld.so.1: userver: fatal: libdlm64.so: open failed: No such file or directory
In a server environment, this is harder to detect as the standard error channel is not preserved, nor does an error like this end up in the urouter log (and an userver log is not created because the process cannot load). Also, the environment may not be the same as in a terminal window. Best thing is to run userver under truss. Let’s use some custom truss flags:
    [SERVERS]
    userver /usr/bin/truss -t write -w 2 -o truss.log + /h/chris/uf/94/so9/common/bin/userver
specifying to trace only write calls, and display the full I/O buffer for file descriptor 2 (stderr). This produces output that nicely shows the problem:
    write(2, 0xFFFFFFFF7F332790, 77) = 77
       l d . s o . 1 :   u s e r v e r :   f a t a l :   l i b d l m 6 4 . s o :   o p e n   f a i l e d :   N o   s u c h   f i l e   o r   d i r e c t o r y \n
3) Userver is of wrong architecture or linkage. For example if you try to run Uniface 9.6 for Redhat EL Linux 6.x on Redhat EL 4.x you get the error
    $ common/bin/userver
    common/bin/userver: error while loading shared libraries: requires glibc 2.5 or later dynamic linker
As you see this is easily diagnosed from the command line, but for good measure it can also be shown with truss, as in the previous example.
From here on, the userver should be able to produce a logfile, so that we no longer have to grope in the grey area between urouter and userver. You can still get network errors in the client, though, particularly if the userver dies prematurely, but at least you will now have some logging available. Before moving on to all the problems that can still occur, some more tips.
  44. 44. Manage your shared library paths. On Windows all executable files (.exe and .dll files) are found via one environment variable, PATH. Typically, any software installation puts its directories in this system-wide variable, so that programs will mostly pick up the correct files. On Unix this is different and more complex: separate environment variables are used. The PATH variable is used to locate commands, the LD_LIBRARY_PATH variable (LD_LIBRARY_PATH_64 on Solaris) to locate shared libraries. Furthermore on Unix, environment variables are local to the process that defines them, unless you export them, in which case they become visible to all child processes. A variable defined in a certain process is never visible in other processes except the defining process and its children. This is why you need to execute the insunis script, which defines a lot of environment variables for Uniface, with the dot prefix:
    $ . common/adm/insunis
so that it is executed by the current shell. If you forget the dot, the Unix shell will execute the commands in a subshell, and the variables will only be valid for that subshell, which exits immediately after. Upon returning to the current shell, they are gone and forgotten. The variable exported in insunis that concerns us is LD_LIBRARY_PATH(_64). For example this one from Solaris:
    LD_LIBRARY_PATH_64=$USYSLIB:/h/chris/uf/dlm41/SunOS/64:$LD_LIBRARY_PATH_64 ; export LD_LIBRARY_PATH_64
By default, this takes care of the Uniface and DLM libraries. As a rule, you will need to add the directories for any databases you use, e.g. the Oracle bin directory. As seen earlier, long paths can lead to needlessly long search trips for an executable or shared library. It is good practice to keep your paths organized and free from duplicate, unused, or wrong directories. If the list is still long it can help performance to order the list according to usage, i.e. the most often used directories first. 
On Solaris, you can also manage the locating and loading of shared libraries with the crle (configure runtime linking environment) command. It’s beyond the scope of this paper, but worth keeping in mind.
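The advice about keeping library paths free from duplicate and wrong directories is easy to automate. Below is a small sketch; audit_path is a hypothetical name, and it works on any colon-separated list, so it can be used for PATH as well as LD_LIBRARY_PATH or LD_LIBRARY_PATH_64.

```shell
#!/bin/sh
# Hypothetical helper: flag duplicate and nonexistent directories in a
# colon-separated search path. Prints one line per problem entry.
audit_path() {
    printf '%s\n' "$1" | tr ':' '\n' | {
        seen=:
        while IFS= read -r d; do
            case $seen in
                *":$d:"*) echo "DUPLICATE $d" ;;           # already listed earlier
                *) [ -d "$d" ] || echo "MISSING $d" ;;     # first occurrence, but no such directory
            esac
            seen="$seen$d:"
        done
    }
}
```

Run it in the same environment the userver gets, for example audit_path "$LD_LIBRARY_PATH" at the top of the wrapper script, since a terminal session may have a different path than the server process.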
  45. 45. Pre-starting servers. A good way to troubleshoot uservers in advance of testing the application is to use the [PRE_START] section in the urouter asn-file. Surprisingly, this is possible for exclusive as well as shared uservers. A pre-started exclusive userver is visible in the Urouter Monitor, showing the /ex flag, until such time as a client connects to it. At that moment, it disappears from view. To pre-start an userver, include the complete logon path (as specified on the client) in the [PRE_START] section, for example
    [PRE_START]
    tcp:localhost+13001|chris|bla|userver
Urouter and DLM. Urouter does not do anything with licensing, in the sense that it does not require a license file or server. It does not check out any features. However on Unix, you cannot start urouter unless DLM is installed and the DLM directory added in LD_LIBRARY_PATH. This is because urouter is statically dependent on libulib.so, which in turn is statically dependent on libdlm64.so. So if that dependency cannot be resolved, you may see an error like this when starting urouter:
    $ urouter
    ld.so.1: urouter: fatal: libdlm64.so: open failed: No such file or directory
    Killed
  46. 46. 6 - Troubleshooting runtime problems. We have covered just about all situations that can prevent an userver from being started and going about its job. Some things can still go wrong in the early stage.
License error. Unlike urouter, which does not use DLM, an userver will typically want to check out a license feature (unless it uses only the Sequential Driver, $SEQ, which is free of license). A license error is passed back to the client message frame, e.g.
    Server: Using license option LM_LICENSE_FILE 7188@lic.emea.cpwr.corp
    Server: Checkout USRVORA: -1
    Server: The licensed number of concurrent users has been reached; try again later.
    Borrowed :: The application that was requested is not licensed.
    7188@lic.emea.cpwr.corp :: A connection could not be established between this client and the license server.
    7188@lic.emea.cpwr.corp :: The licensed number of concurrent users has been reached; try again later.
    compulock :: The application that was requested is not licensed.
    Fatal error: 8011 - License not available.
    Server: Fatal error: 8011 - License not available.
    Server: 2013-10-25 10:23:24.88 - Uniface session stopped
  47. 47. Failure to load a shared library / Database environment not set up. We have already discussed shared library loading problems, but that was for libraries which were statically linked. Most shared libraries, in particular the database connectors, are loaded dynamically upon first use. As with static dependencies, this requires LD_LIBRARY_PATH to be set. If that is not the case, Uniface cannot load the database connector. This is not apparent on the client side:
    Server: Using license option LM_LICENSE_FILE 7188@cwnl-license1.emea.cpwr.corp
    Server: Feature USRVORA expires in 74 days
    Server: Checkout USRVORA: 1
    [-2](_read) READ:2 [-2] done<end of module>
as it just reports error -2 (Occurrence not found). The userver trace file reveals the real problem:
    9 9 9 8z 16F 9z SYS_I010: dlopen: ld.so.1: userver: fatal: uora62.so: open failed: No such file or directory
    Unable to open uora62; error ld.so.1: userver: fatal: uora62.so: open failed: No such file or directory
    SYS_I011: udllvec: UDBORA00 not found in uora62
This is also misleading, though. It suggests that libuora62.so could not be found, whereas the actual problem is that this file is there but it cannot locate the Oracle libraries. Recent versions of Uniface do a better job of reporting the exact problem:
    Could not load /h/chris/uf/94/so9/common/lib/libuora64.so. dlerror: ld.so.1: uniface: fatal: libclntsh.so.11.1: open failed: No such file or directory
From this point, an userver will usually be communicating with its client, and, at least initially, working normally. The common things that can still go wrong are
• Crashes
• Hangs
• Memory and CPU usage
• Urouter refuses to start more uservers
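For the connector-loading problem described above (the connector is found, but one of its own dependencies is not), the standard ldd command can list the unresolved libraries without starting Uniface at all. The filter below is a sketch; missing_libs is a hypothetical name, and it relies only on the fact that ldd marks unresolved entries with the words "not found".

```shell
#!/bin/sh
# Hypothetical filter: read ldd output on stdin and print the names of the
# libraries that could not be resolved.
missing_libs() {
    awk '/not found/ { print $1 }'
}
```

A typical call would be ldd /h/chris/uf/94/so9/common/lib/libuora64.so | missing_libs, executed with the same LD_LIBRARY_PATH the userver gets. An empty result means every dependency was resolved in that environment.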
  48. 48. Troubleshooting crashes. Terminology first. Some people call every problem a crash, for instance when userver reports an unexpected/fatal error to the client, or if urouter reports a dead server. At Compuware we use the stricter meaning of crash, which is that the program has terminated and this has been reported by the operating system. On Unix, you know a program has crashed when it suddenly ends with
    Segmentation Fault(coredump)
or
    Bus Error (coredump)
On Windows, you know a program has crashed when you get one of the various Windows popups saying it “has experienced a problem and needs to be shut down”, “terminated unexpectedly”, “has stopped working” or something similar. Uniface fatal errors like “9010 Out of memory” or “9024 Logon error” are not crashes – just Uniface fatal errors.
Typically, crashes are “handled” by the operating system in one way or another. On Unix, a crash usually results in a core dump being created in the program’s current directory. Depending on the type of system and configuration, a crash can also be logged in the local syslog. Traditionally the name of the core dump file is core but different systems have different rules. Linux sensibly adds the process id to the name, e.g. core.2525. On some systems, core files can also be disabled, and/or specific naming rules assigned (see e.g. the coreadm command on Solaris). When a crash is reported or suspected, Compuware Support will always ask for the core file, which contains valuable information about the state of the system at that moment, and in particular the stack trace. Some useful commands to look for and examine core files:
    $ ls -l core*
    -rw------- 1 chris chris 41490316 Oct 25 17:55 core
    $ file core
    core: ELF 64-bit MSB core file SPARCV9 Version 1, from 'userver'
    $ strings core | head
    CORE
    userver
    /h/chris/uf/94/so9/common/bin/userver -srvid=1 -dnp=TCP:+13001||39A3B9EA-3D84-1
    . . .
or
    $ strings core | grep userver
  49. 49. As you see core files can get quite big, especially if the process was eating memory before it crashed. It can also be hard to examine a core file on another system than it was generated on. For these reasons Compuware will usually ask for a stack trace generated from the core, rather than the file itself. For this you need to have the name of the core file, the name of the executable that dumped it (as shown above you can see that with the strings command), and a debugger like dbx, gdb, or wdb. For our purpose, all 3 debuggers work the same. Invoke your debugger, for example gdb on Linux, with the userver executable and core file names as arguments: $ gdb /h/chris/uf/94/so9/common/bin/userver core . . . t@1 (l@1) terminated by signal SEGV (Segmentation Fault) 0xffffffff7e1a5aac: _so_recv+0x000c: bcc,pt %icc,_so_recv+0x28 (dbx) ! 0xffffffff7e1a5ac8 This already displays the type of crash and location. Enter the command where to display the stack trace: (dbx) where current thread: t@1 =>[1] _so_recv(0x4, 0xffffffff7fffbede, 0x2, 0x0, 0xffffffff7de0e318, 0x73), at 0xffffffff7e1a5aac [2] do_recv(0x100171580, 0x6, 0xffffffff7fffbede, 0x2, 0xffffffff7de265b8, 0xffffffff7bc08490), at 0xffffffff7bc0347c [3] TCPreceive(0x100171580, 0x6, 0xffffffff7fffc280, 0x1000, 0xffffffff7fffc1b8, 0x10589c), at 0xffffffff7bc03ba4 [4] UNWTCP(0x100171580, 0x73, 0x10012a0f0, 0x0, 0x0, 0x0), at 0xffffffff7bc07278 [5] dorcv(0x100171580, 0x200, 0xffffffff7fffe678, 0x0, 0x47, 0x0), at 0xffffffff7be0196c [6] recmsg(0x100171580, 0x0, 0xffffffffffffffff, 0xffffffff7fffeac8, 0x1, 0xe0), at 0xffffffff7be01ec8 [7] umwgo(0xffffffff7fffe980, 0x0, 0xffffffffffffffff, 0x10012a0f0, 0xffffffff7be03f68, 0xffffffff7c2645d0), at 0xffffffff7c0c5f30 [8] urecmsg(0x100171580, 0x0, 0x200, 0xffffffff7fffeac8, 0x1, 0x0), at 0xffffffff7c0c66c4 [9] srvloop(0x10012a0f0, 0x0, 0x7ffd, 0xffffffff7fffeac8, 0xffffffff7b70a020, 0x1), at 0xffffffff7b60868c [10] USERVERSTART(0x10012a0f0, 0x50, 0x100129190, 0x100bec, 
0x1, 0xffffffff7b70a020), at 0xffffffff7b609630 [11] USRVMAIN(0x10012d7b0, 0x10012d8cc, 0x5, 0x10012a0f0, 0x100101440, 0x0), at 0x100001168 [12] UMAIN(0x5, 0xffffffff7fffee58, 0xffffffff7fffed98, 0x0, 0x100101ac8, 0x100000ee8), at 0xffffffff7df03d20 [13] main(0x5, 0x40, 0x0, 0x10070c, 0x0, 0x100101440), at 0x100000d68 (dbx) For debuggable binaries (sometimes provided by Support to help troubleshooting) this would also show the source and line number.
  50. 50. When you have neither of these debuggers installed, good old adb (Absolute Debugger - should be present on all Unix systems) will do the job, albeit without the ability to display source information. The command line syntax is the same as for dbx/gdb/wdb. Note that adb does not display a prompt, it just sits there waiting for your input. The command you enter to display the stack trace is $c : $ adb /h/chris/uf/94/so9/common/bin/userver core core file = core -- program ``/h/chris/uf/94/so9/common/bin/userver'' on platform SUNW,Sun-Fire-V490 SIGSEGV: Segmentation Fault adb: warning: core file is from SunOS 5.10 Generic_142909-17; shared text mappings may not match installed libraries $c libc.so.1`_so_recv+0xc(100171580, 6, ffffffff7fffbede, 2, ffffffff7de265b8, ffffffff7bc08490) libutcp10.so`TCPreceive+0x2c(100171580, 6, ffffffff7fffc280, 1000, ffffffff7fffc1b8, 10589c) libutcp10.so`UNWTCP+0x670(100171580, 73, 10012a0f0, 0, 0, 0) libumwpsv10.so`dorcv+0x84(100171580, 200, ffffffff7fffe678, 0, 47, 0) libumwpsv10.so`recmsg+0x130(100171580, 0, ffffffffffffffff, ffffffff7fffeac8, 1, e0) liburtl.so`umwgo+0xf8(ffffffff7fffe980, 0, ffffffffffffffff, 10012a0f0, ffffffff7be03f68, ffffffff7c2645d0) liburtl.so`urecmsg+0x34(100171580, 0, 200, ffffffff7fffeac8, 1, 0) libuserv.so`srvloop+0x1e4(10012a0f0, 0, 7ffd, ffffffff7fffeac8, ffffffff7b70a020, 1) libuserv.so`USERVERSTART+0x200(10012a0f0, 50, 100129190, 100bec, 1, ffffffff7b70a020) USRVMAIN+0x280(10012d7b0, 10012d8cc, 5, 10012a0f0, 100101440, 0) libucall.so`UMAIN+0x38(5, ffffffff7fffee58, ffffffff7fffed98, 0, 100101ac8, 100000ee8) main+0x38(5, 40, 0, 10070c, 0, 100101440) _start+0x17c(0, ffffffff7fffee58, ffffffff7f60e640, ffffffff7f1bf588, ffffff00, ffffffff7f710000) To exit adb, press Control-D. 
On Solaris and HP-UX, you can also use the pstack command to produce the stack trace: $ pstack core Producing output like this: $ pstack core core 'core' of 28762: /h/chris/uf/94/so9/common/bin/userver -srvid=1 -dnp=TCP:+13001||39A3B9 ----------------- lwp# 1 / thread# 1 -------------------ffffffff7e1a5aac _so_recv (100171580, 6, ffffffff7fffbede, 2, ffffffff7de265b8, ffffffff7bc08490) + c ffffffff7bc03ba4 TCPreceive (100171580, 6, ffffffff7fffc280, 1000, ffffffff7fffc1b8, 10589c) + 2c ffffffff7bc07278 UNWTCP (100171580, 73, 10012a0f0, 0, 0, 0) + 670 ffffffff7be0196c dorcv (100171580, 200, ffffffff7fffe678, 0, 47, 0) + 84 ffffffff7be01ec8 recmsg (100171580, 0, ffffffffffffffff, ffffffff7fffeac8, 1, e0) + 130 ffffffff7c0c5f30 umwgo (ffffffff7fffe980, 0, ffffffffffffffff, 10012a0f0, ffffffff7be03f68, ffffffff7c2645d0) + f8 ffffffff7c0c66c4 urecmsg (100171580, 0, 200, ffffffff7fffeac8, 1, 0) + 34 ffffffff7b60868c srvloop (10012a0f0, 0, 7ffd, ffffffff7fffeac8, ffffffff7b70a020, 1) + 1e4 ffffffff7b609630 USERVERSTART (10012a0f0, 50, 100129190, 100bec, 1, ffffffff7b70a020) + 200 0000000100001168 USRVMAIN (10012d7b0, 10012d8cc, 5, 10012a0f0, 100101440, 0) + 280 ffffffff7df03d20 UMAIN (5, ffffffff7fffee58, ffffffff7fffed98, 0, 100101ac8, 100000ee8) + 38 0000000100000d68 main (5, 40, 0, 10070c, 0, 100101440) + 38 0000000100000cfc _start (0, ffffffff7fffee58, ffffffff7f60e640, ffffffff7f1bf588, ffffff00, ffffffff7f710000) + 17c ----------------- lwp# 2 / thread# 2 -------------------ffffffff7e1a5818 _libc_nanosleep (7a120, ffffffff7d2106f4, 52cb3570, 111954, ffffffff7cbbc8ac, ffffffff7cb60b48) + 8 ffffffff7cbbc8ac l101001111 (1f4, cf, 526a9453, ffffffff7ccfae70, 16b280, 7a120) + a4 ffffffff7cb64558 l010011111 (2ec8, 0, 0, 0, 0, 0) + 428 ffffffff7d2181a8 _lwp_start (0, 0, 0, 0, 0, 0)
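When gdb is the debugger at hand, the stack trace can also be produced without an interactive session, which is convenient when the trace has to be mailed to Support. This is a sketch under the assumption that gdb is installed and in the PATH; core_backtrace is a hypothetical helper name, while -batch and -ex are regular gdb options.

```shell
#!/bin/sh
# Hypothetical helper: print the stack trace of a core file non-interactively.
# Usage: core_backtrace <executable> <core-file>
core_backtrace() {
    exe=$1
    core=$2
    [ -r "$exe" ] || { echo "cannot read executable: $exe" >&2; return 2; }
    [ -r "$core" ] || { echo "cannot read core file: $core" >&2; return 2; }
    command -v gdb >/dev/null 2>&1 || { echo "gdb not found in PATH" >&2; return 3; }
    # -batch exits after the commands have run; "where" prints the stack trace.
    gdb -batch -ex where "$exe" "$core"
}
```

Redirect the output to a file, for example core_backtrace /h/chris/uf/94/so9/common/bin/userver core > trace.txt, and send that file instead of the core itself.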
  51. 51. Crash analysis on Windows is a bit less straightforward though not essentially different. First of all, any crash should have been logged in the Windows Application Event Log. The Event Logs are very useful to search for the history of problems. Whenever you have (or suspect) a crash, always look there first; it already displays some useful information. NB - We have seen cases where userver crashes were logged in the event log but none of the users had ever noticed any problem! This can happen in a web environment, where after a (random) crash, a new userver is started, the request is repeated and succeeds.
  52. 52. Traditionally, Windows is shipped with the “postmortem” debugger Dr. Watson. You can install Dr. Watson as the default exit debugger with the command
    drwtsn32 -i
which will put the Dr. Watson command line in the AeDebug subkey of registry key HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion. This value specifies the program that is invoked on a process when it crashes. Other possibilities here are the Visual Studio debugger, WinDbg, or the JIT debugger:
    msdev.exe -p %ld -e %ld
    C:\debuggers\windbg.exe -p %ld -e %ld -g
    C:\Windows\system32\VSjitdebugger.exe -p %ld -e %ld
  53. 53. Dr. Watson will, for each crash, log an entry in its log file drwtsn32.log. The location of that file can be seen/configured by running Dr. Watson interactively:
    C:\> drwtsn32
The Dr. Watson log can be useful input for Compuware support to analyze crashes. As crashes are appended in the log, it can get quite large over the years, and it makes sense to clear it every now and then. Dr. Watson also offers a possibility to create user mode dumps which can be useful for Compuware to analyze crashes.
  54. 54. In recent Windows versions (as from Windows Vista), Microsoft has discontinued Dr. Watson, although it can still be downloaded and it still works. Instead of Dr. Watson we now have Windows Error Reporting (WER) that enables you to collect a user mode dump (sometimes referred to as minidump or crash dump) when an application crashes. This is also configured in the registry, in key
    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps
The name of the created dump consists of the executable, the process id (pid) and the suffix .dmp, e.g. uniface.exe.4332.dmp. Dump files can be analyzed by Compuware support.
  55. 55. Troubleshooting hangs. Uservers can become unresponsive or appear to ‘hang’. This usually manifests in the client by the hourglass or *busy* indicator not going away, or by urouter reporting a timeout. When this happens we must try to find out what the userver is doing, or what it is waiting for. The most obvious thing to check is whether the userver may be waiting on a database lock. How to find that out depends on the database in question and goes beyond the scope of this presentation. You might be able to deduce it from examining the current state of the process. To examine the state of a process on Windows, the best tools are Process Monitor (to see if it is still doing something) and Process Explorer (to see what resources and DLLs it has open, and what the process’s threads are doing). After starting Process Monitor, use the Filter function in the pulldown menu to restrict output to userver.exe.
