
TCPIP Networks for DBAs




  1. Everything a DBA Should Know about TCP/IP Networks / Chen (Gwen) Shapira
  2. My Stories: ORA-12545 on connection to RAC / Job does not finish running / Reading 2M rows / Copying redo logs to DR site
  3. Will Show: Collect hard data – Don’t guess / When & What to tune / Back of the envelope calculations
  4. ORA-12545 Connecting to RAC
  5. Why guess when you can Capture?
  6. (no text on slide)
  7. (no text on slide)
  8. (no text on slide)
  9. I want to connect / Go to that server! Bye! / Go where???
  10. Solutions: Fix LOCAL_LISTENER / Fix DNS
  11. Batch Job Never Finishes
  12. Capture on both Client & Server
  13. (no text on slide)
  14. (no text on slide)
  15. Run this procedure / ACK!
  16. Two hours later… / Hello? Are you alive? No? / BYE! / Waiting
  17. The firewall is eating my packets!
  18. Solutions: Talk to network admin / Configure SQLNET.EXPIRE_TIME / Configure tcp_keepalive_time
  19. Give Me 2M Rows ASAP
  20. Start with Wait Events
  21. SQL*Net Message to client - Meaningless
  22. SQL*Net Message from client - Nearly Meaningless
  23. Do the numbers make sense? Bytes Sent / Time / Roundtrips
  24. Tune the ArraySize / Or setFetchSize() / With 2M rows: Fetch 10 => 200,000 Roundtrips; Fetch 5000 => 400 Roundtrips
  25. (Don’t) Tune SDU: Oracle’s buffer – 2K or 8K / Can set to max – 32K / Can set to multiple of 1476 bytes / Highly unlikely target
  26. Beware: Compulsive Tuning Disorder
  27. Get Redo Logs to DR Site
  28. Q1: Bandwidth?
  29. OC3 => 155 Mb/s => ~70G/hour => ~60G with headers
  30. Key problem: Line utilization
  31. Q2: Latency? TNSPing Roundtrip time – 500 ms
  32. Data < 1500 bytes / 500 ms / ACK
  33. Data < 1500 bytes / ACK
  34. 155 Mb/s * 500 ms ≈ 9.7 MBytes
  35. Advertised Windows: net.core.wmem_default / net.core.wmem_max / net.core.rmem_default / net.core.rmem_max
  36. Congestion Window (chart labels: Errors / Window Size / Time)
  37. WAN Accelerator / $ $ $ $
  38. Remember: Collect hard data – Don’t guess / When & What to tune / Back of the envelope calculations
  39. Questions?

Editor's Notes

  • ORA-12545: "Connect failed because target host or object does not exist". The client can’t connect to the Oracle server. The cause can be anything from a misspelled host in TNSNAMES.ORA to a misconfigured firewall. Normally it is a reproducible error, but in our case it happened only 50% of the time.
  • Click Capture->Options to get to this screen. Select the right interface. Add filters. If capturing large amounts of data, or overnight, configure the capture to stop automatically or use a ring buffer.
  • Now that we know EXACTLY what the problem is, we can look for a solution. If the server is redirecting using the wrong server name (a local name instead of a VIP, for instance), fix the LOCAL_LISTENER parameter. If the server is redirecting correctly, the DNS should be fixed to allow the client to connect to that server.
  • The client connects to the server with SQLPLUS and calls a stored procedure. Several hours later there is still no reply from the server. The server no longer shows any session from the client. The log table shows that the procedure stopped running in the middle.
  • After two hours the server sent a keepalive. A few minutes later it sent another one, and then a few more. When the server received no reply, it closed the connection. Oracle rolled back the last statement and closed the session. 8 hours later the client is still waiting for a reply, while the server has long forgotten about the whole thing.
  • This is a bit of a guess. I do know that the server is sending keep alives and the client is not receiving them. Obviously they get lost somewhere. Since this is a LAN, the firewall is a likely candidate.
  • Now that we know EXACTLY what the problem is and have the captures to prove it, we can discuss the mystery with the network admin. The discussion is likely to involve the times that jobs run and the firewall connection timeout settings.
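
The keepalive settings named on the Solutions slide only help if the probes go out before the firewall gives up on the idle connection. The sketch below is an illustration, not the author's procedure: it shows a per-socket keepalive in Python and a simple comparison of the two timers. The function names and the 3600-second firewall timeout are hypothetical; the 7200-second value corresponds to the two hours described in the notes.

```python
import socket

def enable_keepalive(sock: socket.socket, idle_seconds: int) -> None:
    """Turn on TCP keepalive for one socket and, where supported, set the idle time."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # TCP_KEEPIDLE is the per-socket counterpart of the tcp_keepalive_time
    # kernel setting on Linux; it is not available on every platform.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_seconds)

def keepalive_beats_firewall(keepalive_idle_seconds: int,
                             firewall_timeout_seconds: int) -> bool:
    """True if the first keepalive is sent before the firewall drops the idle connection."""
    return keepalive_idle_seconds < firewall_timeout_seconds

# Hypothetical firewall idle timeout of one hour.
FIREWALL_TIMEOUT = 3600

print(keepalive_beats_firewall(7200, FIREWALL_TIMEOUT))  # two-hour keepalive: too late
print(keepalive_beats_firewall(600, FIREWALL_TIMEOUT))   # ten-minute keepalive: in time

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    enable_keepalive(s, 600)
```

The real firewall timeout has to come from the network admin; the comparison is only as good as that number.
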
  • This event measures the time it takes for Oracle to request the operating system to send a buffer of data to the client. It is all local and normally takes very little time.
  • This event measures the time it took for the message to reach the client, for the client to process it, and for the client's reply to reach the server. It is nearly meaningless because it measures too many things and you can’t really know where the time went. That’s why DBAs were taught to ignore it. You can capture on client and server to learn more: when did the server send the messages, how long did they take to arrive, and how long until the client sent its replies?
  • Check the statistics: how long did it take to run the query? How much data? How many roundtrips? You are tuning the network, so we can assume that nearly all the query time is spent on the network, or that you determined by other means how much time is spent on the network. Ask your network admin: "I sent 1500K on our LAN and it took 30 secs, is this reasonable?" If not, maybe your capture holds clues to the problem. Do you see retransmits? Does the number of roundtrips make sense?
  • Note that 10 is Java’s default fetch size. The difference in the number of roundtrips is huge and has a real performance impact. Set the fetch or array size as high as you can. The only limit is the amount of memory you have for storing the rows (and with SQLPLUS there is a 5000 row limit).
  • There is a lot of advice on the net for tuning SDU, but very little proof that it ever helped improve performance. Set it to the maximum size (32K) if you want to avoid context switches caused by Oracle filling the buffer. Set it to a multiple of 1476 bytes to minimize the number of packets sent. If you get an improvement, let me know.
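
The sanity check in this note comes down to three ratios. A minimal sketch follows, assuming "1500K" means 1,500,000 bytes and using a made-up roundtrip count of 1,000; the function name is hypothetical, and the inputs would come from whatever session statistics you collected.

```python
def summarize_transfer(bytes_sent: int, elapsed_seconds: float, roundtrips: int) -> dict:
    """Derive the ratios worth showing to a network admin."""
    return {
        "bytes_per_second": bytes_sent / elapsed_seconds,
        "bytes_per_roundtrip": bytes_sent / roundtrips,
        "seconds_per_roundtrip": elapsed_seconds / roundtrips,
    }

# 1500K in 30 seconds (from the note); 1,000 roundtrips is an illustrative assumption.
summary = summarize_transfer(bytes_sent=1_500_000, elapsed_seconds=30, roundtrips=1_000)
for name, value in summary.items():
    print(f"{name}: {value}")
```

With these inputs the transfer averages 50,000 bytes per second, which is the kind of number a network admin can judge at a glance.
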
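
The roundtrip counts on the slide can be reproduced with a few lines of arithmetic. This is a sketch only: it ignores any extra roundtrip a driver may need to detect the end of the result set, and the 1 ms roundtrip time used for the time estimate is an assumed LAN latency, not a figure from the talk.

```python
def roundtrips(total_rows: int, fetch_size: int) -> int:
    """Fetch calls needed to pull total_rows at fetch_size rows per call (ceiling division)."""
    if fetch_size < 1:
        raise ValueError("fetch_size must be at least 1")
    return (total_rows + fetch_size - 1) // fetch_size

def roundtrip_wait_seconds(total_rows: int, fetch_size: int, rtt_seconds: float) -> float:
    """Time spent purely waiting on roundtrips, ignoring transfer and processing time."""
    return roundtrips(total_rows, fetch_size) * rtt_seconds

ROWS = 2_000_000
ASSUMED_RTT = 0.001  # hypothetical 1 ms LAN roundtrip

for fetch_size in (10, 5000):
    trips = roundtrips(ROWS, fetch_size)
    wait = roundtrip_wait_seconds(ROWS, fetch_size, ASSUMED_RTT)
    print(f"fetch size {fetch_size}: {trips} roundtrips, about {wait:.1f} s waiting")
```
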
  • Make sure you have a real performance problem *and* that the network times are really the issue before you start fiddling with random knobs. After fiddling with knobs make sure it really improved performance *and* that you know why.
  • You have Data Guard or a similar setup and you need to move large amounts of data to a remote site on a continuous basis.
  • But will we really get 60G? The best way to know is to test. If you are testing for Data Guard, test with Oracle’s file transfer and not SCP, since SCP sets its own parameters. If you get less than you expect, it may still be enough, and then you are fine. Only tune if you *need* to move data faster.
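
The back-of-the-envelope figure from the bandwidth slide can be checked as follows. This is a sketch under stated assumptions: the `efficiency` parameter is a hypothetical knob for protocol overhead, and the 250 GB redo volume is invented for the example. The slide gives only the raw ~70G/hour and the ~60G "with headers" estimate, so the code derives the efficiency those two numbers imply rather than assuming one.

```python
def gigabytes_per_hour(bandwidth_bits_per_sec: float, efficiency: float = 1.0) -> float:
    """Convert a line rate in bits/s to GB/hour (1 GB = 1e9 bytes)."""
    bytes_per_sec = bandwidth_bits_per_sec / 8 * efficiency
    return bytes_per_sec * 3600 / 1e9

def hours_to_transfer(data_gb: float, gb_per_hour: float) -> float:
    """Hours needed to move data_gb at a sustained gb_per_hour."""
    return data_gb / gb_per_hour

OC3_BITS_PER_SEC = 155e6

raw = gigabytes_per_hour(OC3_BITS_PER_SEC)
implied_efficiency = 60 / raw  # what the slide's ~60G figure implies

print(f"raw capacity: {raw:.2f} GB/hour")
print(f"efficiency implied by 60 GB/hour: {implied_efficiency:.0%}")
# Hypothetical redo volume of 250 GB at the slide's 60 GB/hour estimate.
print(f"hours to ship 250 GB: {hours_to_transfer(250, 60):.1f}")
```
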
  • If you don’t get all the bandwidth you expect, the problem is usually line utilization. Maybe someone else is using the line, but it could be something else.
  • If we wait for each ACK before we send the next packet, we send one packet of under 1500 bytes per 500 ms roundtrip, so we’ll get an amazing 3 KB/s of bandwidth out of our 155 Mb/s line. To get more bandwidth, we need to have more data “in flight”.
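
The 3 KB/s figure follows directly from one 1500-byte packet per 500 ms roundtrip. A small sketch of that calculation, with a hypothetical function name:

```python
def stop_and_wait_throughput(packet_bytes: int, rtt_seconds: float) -> float:
    """Bytes per second when only one packet is in flight per roundtrip."""
    return packet_bytes / rtt_seconds

LINE_BYTES_PER_SEC = 155e6 / 8  # 155 Mb/s line expressed in bytes per second

throughput = stop_and_wait_throughput(packet_bytes=1500, rtt_seconds=0.5)
utilization = throughput / LINE_BYTES_PER_SEC

print(f"throughput: {throughput:.0f} bytes/s")   # 3000 bytes/s
print(f"line utilization: {utilization:.4%}")
```
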
  • How much in flight do we need?
  • The magic number is bandwidth times latency – also known as bandwidth delay product.
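
As a worked example of the bandwidth delay product, the sketch below uses the talk's 155 Mb/s line and 500 ms roundtrip. It also shows the reverse calculation: the throughput ceiling imposed by a given window. The 64 KiB window is an illustrative value, not one from the slides, and the function names are hypothetical. The buffer settings listed on the Advertised Windows slide are presumably what limits this window in practice.

```python
def bandwidth_delay_product_bytes(bandwidth_bits_per_sec: float, rtt_seconds: float) -> float:
    """Bytes that must be in flight to keep the line full."""
    return bandwidth_bits_per_sec * rtt_seconds / 8

def window_limited_throughput(window_bytes: int, rtt_seconds: float) -> float:
    """Maximum bytes per second when at most window_bytes can be unacknowledged."""
    return window_bytes / rtt_seconds

BANDWIDTH = 155e6   # bits per second
RTT = 0.5           # seconds

bdp = bandwidth_delay_product_bytes(BANDWIDTH, RTT)
print(f"data in flight needed: {bdp / 1e6:.2f} MB")

ASSUMED_WINDOW = 64 * 1024  # illustrative 64 KiB window
ceiling = window_limited_throughput(ASSUMED_WINDOW, RTT)
print(f"throughput ceiling with a 64 KiB window: {ceiling / 1024:.0f} KiB/s")
```

If the window is smaller than the bandwidth delay product, the window and not the line sets the throughput.
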
  • You’ve done the math and you need more bandwidth than you can squeeze out of the line. Now you have the data to go to your management and ask for a WAN Accelerator: a clever device that uses multiple tricks, including compression and caching, to move data fast across the internet.