Friday, 13 June 2014

How to Trace IIS Website Hangs and Slowness Issues
Hangs are fairly common in production applications, and can be incredibly frustrating to troubleshoot. The main reasons are:
·         They may happen only intermittently and can be hard to catch.
·         They can be caused by complex, interrelated factors that are difficult to isolate.
In this article, I'll show you how to systematically isolate and diagnose most hangs in production.

STEP 1: Is it really a hang?

An IIS website hangs when it appears to stop serving incoming requests, with requests either taking a very long time or timing out. This is generally caused by all available application threads becoming blocked, which causes subsequent requests to get queued (or sometimes by the number of active requests exceeding configured concurrency limits).
It’s important to differentiate the following kinds of hangs:
1.       Full hang. All requests to your application are very slow or time out. Symptoms include detectable request queuing and sometimes 503 Service Unavailable errors when queue limits are reached.
NOTE: Most hangs do not involve high CPU, and are often called "low CPU hangs". Also, most of the time, high CPU does not by itself cause a hang. In rare cases, you may also get a "high CPU hang", which we don't cover here.
2.       Rolling hang. Most requests are slow, but eventually load. This usually occurs before a full hang develops, but may also be a stable state for an application that is overloaded.
3.       Slow requests. Only specific URLs in your application are slow. This is generally not a true hang, but rather a performance problem in a specific part of your application.
Here are 3 "reasonable" early detection signs:
1.       "Http Service Request Queues\MaxQueueItemAge" performance counter increasing. This means IIS is falling behind in request processing, so all incoming requests are waiting at least this long to begin getting processed.
2.       The "Http Service Request Queues\ArrivalRate" counter exceeds the "W3SVC_W3WP\Requests / Sec" counter for the application pool's worker process over a period of time. This means more requests are coming into the system than are being processed, which always eventually results in queueing.
3.       The best way to detect a hang: snapshot the currently executing requests. If the number of currently executing requests keeps growing, this reliably tells you that requests are piling up, which will always lead to higher latencies and request queueing.
Most importantly, this can also tell you which URLs are causing the hang, and which requests are queued.
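Here is a minimal sketch of the first two signs, assuming you already collect the counters at a fixed interval (for example with Performance Monitor or typeperf) into a list of samples. The CounterSample class and the "every sample in the window" rule are hypothetical choices for illustration, not established thresholds:

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """One reading of the counters above (hypothetical container)."""
    max_queue_item_age_ms: float  # Http Service Request Queues\MaxQueueItemAge
    arrival_rate: float           # Http Service Request Queues\ArrivalRate
    requests_per_sec: float       # W3SVC_W3WP\Requests / Sec for the pool's w3wp.exe


def queue_age_increasing(samples):
    """Sign 1: MaxQueueItemAge rises from every sample to the next."""
    ages = [s.max_queue_item_age_ms for s in samples]
    return len(ages) >= 2 and all(later > earlier for earlier, later in zip(ages, ages[1:]))


def arrivals_outpace_processing(samples):
    """Sign 2: ArrivalRate exceeds Requests / Sec in every sample of the window."""
    return bool(samples) and all(s.arrival_rate > s.requests_per_sec for s in samples)


def early_hang_signs(samples):
    """Return the names of the early warning signs present in the window."""
    signs = []
    if queue_age_increasing(samples):
        signs.append("MaxQueueItemAge increasing")
    if arrivals_outpace_processing(samples):
        signs.append("ArrivalRate exceeds Requests / Sec")
    return signs


if __name__ == "__main__":
    window = [CounterSample(100, 55, 50), CounterSample(500, 60, 40), CounterSample(1500, 65, 30)]
    # prints ['MaxQueueItemAge increasing', 'ArrivalRate exceeds Requests / Sec']
    print(early_hang_signs(window))
```

Either sign only tells you a hang is likely. The third sign, snapshotting executing requests, is what confirms it.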

You can view all currently executing requests in InetMgr (IIS Manager) by selecting the server node, opening Worker Processes, and picking your application pool's worker process.

You can also automate this by using the AppCmd command line tool:
%windir%\system32\inetsrv\appcmd list requests /elapsed:10000
This shows the requests that are currently executing, optionally limited to those running longer than the /elapsed value you specify (in milliseconds, so 10000 is 10 seconds). I recommend an elapsed filter of at least 5 seconds.
If you see multiple requests that are taking a long time to execute AND more and more requests are accumulating, you likely have a hang. If you DO NOT see requests accumulating, it's likely that you have slow requests in some parts of your application, but not a hang.
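The decision rule above can be sketched in Python. This assumes you have taken two or more appcmd list requests snapshots a few seconds apart and reduced each one to a list of elapsed times in milliseconds. The function name, the 5-second cutoff, and the "two or more slow requests" rule are illustrative assumptions:

```python
def classify_snapshots(snapshots, slow_ms=5000):
    """
    snapshots: one list per appcmd snapshot, oldest first, holding the elapsed
    time in ms of each executing request. slow_ms mirrors the suggested
    elapsed filter of at least 5 seconds.
    """
    if len(snapshots) < 2:
        raise ValueError("need at least two snapshots to see requests accumulating")
    counts = [len(snapshot) for snapshot in snapshots]
    accumulating = all(later > earlier for earlier, later in zip(counts, counts[1:]))
    slow_now = sum(1 for ms in snapshots[-1] if ms >= slow_ms)
    if slow_now >= 2 and accumulating:
        return "likely hang"        # long-running requests AND a growing backlog
    if slow_now >= 1:
        return "slow requests"      # slow parts of the app, but nothing piling up
    return "no hang detected"


if __name__ == "__main__":
    print(classify_snapshots([[30465, 1279], [32000, 6000, 900], [34000, 8000, 2000, 700]]))  # likely hang
    print(classify_snapshots([[12000, 300], [14000, 200]]))  # slow requests
```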
Detecting a hang reliably is surprisingly difficult. While you can almost always tell when you have a hang by requesting your website externally, detecting it from inside the server is much harder.
There are many possible places where a hang can happen, and many possible signs of hangs. Most of these signs are unreliable on their own (e.g. ASP.NET queueing counters), and the reliable ones (executing requests, thread snapshots) are prohibitively expensive to monitor all the time. With LeanSentry, we solved this problem by using progressive hang detection, which starts out with lightweight monitoring of more than a dozen different performance counters, and then confirms a likely hang with executing request snapshots and the debugger.

 

STEP 2: Diagnose the hang


Once you confirm the hang, the next step is to determine where it's taking place.
It's not IIS (but check it anyway).

IIS hangs happen when all available IIS threads are blocked, causing IIS to stop dequeueing additional requests. This is rare these days, because IIS request threads almost never block. Instead, IIS hands off request processing to an ASP.NET, Classic ASP, or FastCGI application, freeing up its threads to dequeue more requests.
To quickly eliminate IIS as the source of the hang, check:
·         "Http Service Request Queues\CurrentQueueSize" counter. If it's 0, IIS is having no problems dequeueing requests.
·         "W3SVC_W3WP\Active Threads Count" counter. This will almost always be 0 or 1, because IIS threads almost never block. If it's significantly higher, you likely have IIS thread blockage due to a custom module, or because you explicitly configured ASP.NET to run on IIS threads. Consider increasing the MaxPoolThreads registry value.
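As a rough sketch, the two checks can be expressed as one Python function over the counter values you read. The function name and the busy-thread threshold of 2 (standing in for "significantly higher than 0 or 1") are hypothetical:

```python
def iis_layer_findings(current_queue_size, active_threads, busy_threshold=2):
    # current_queue_size: Http Service Request Queues\CurrentQueueSize
    # active_threads: W3SVC_W3WP\Active Threads Count for the pool's w3wp.exe
    # busy_threshold: hypothetical cutoff for "significantly higher than 0 or 1"
    findings = []
    if current_queue_size > 0:
        findings.append(f"{current_queue_size} requests waiting to be dequeued by IIS")
    if active_threads >= busy_threshold:
        findings.append(
            f"{active_threads} IIS threads busy: check custom modules, "
            "ASP.NET configured to run on IIS threads, or MaxPoolThreads"
        )
    # An empty list means IIS looks healthy, so look inside the application next.
    return findings
```

An empty result corresponds to the common case, "it's not IIS".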
Diagnose the hang.

Snapshot the currently executing requests to identify where blockage is taking place.
        REQUEST "7000000780000548" (url:GET /test.aspx, time:30465 msec, client:localhost, stage:ExecuteRequestHandler, module:ManagedPipelineHandler)
        REQUEST "f200000280000777" (url:GET /test.aspx, time:29071 msec, client:localhost, stage:ExecuteRequestHandler, module:ManagedPipelineHandler)
        ...
        REQUEST "6f00000780000567" (url:GET /, time:1279 msec, client:localhost, stage:AuthenticateRequest, module:WindowsAuthentication)
        REQUEST "7500020080000648" (url:GET /login, time:764 msec, client:localhost, stage:AuthenticateRequest, module:WindowsAuthentication)
You can use the resulting list of executing requests to learn A LOT about what's happening, including which URL is causing the blockage and which requests are queued.
Expert tip #1: identifying requests causing the hang. You can identify which requests are the ones causing the hang because they will be at the front of the list, taking the longest time to execute. They will generally all be stuck in the same module and stage, and often the same URL.
If the hang is being caused by a specific ASP.NET controller or page, the module will say "IsapiModule" (Classic mode) or "ManagedPipelineHandler" (Integrated mode), and the stage will say "ExecuteRequestHandler". The URL should then point to the responsible page or controller.
Expert tip #2: Identifying queued requests. See the block of requests at the bottom of the list? These are the queued requests!
In Integrated mode, these will all have the module/stage of the first ASP.NET module in the pipeline. This will generally be "WindowsAuthentication" in "AuthenticateRequest", or sometimes "Session" in "AcquireRequestState".
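To apply both tips to a long list, you can group the appcmd output by module and stage. Here is a minimal Python sketch, assuming the text output format shown above (one REQUEST line per request). The regular expression is written against that sample and may need adjusting for other appcmd versions:

```python
import re
from collections import Counter

# Matches one line of "appcmd list requests" text output, e.g.
# REQUEST "7000000780000548" (url:GET /test.aspx, time:30465 msec, client:localhost, stage:ExecuteRequestHandler, module:ManagedPipelineHandler)
REQUEST_RE = re.compile(
    r'REQUEST "(?P<id>[0-9a-fA-F]+)" \(url:(?P<verb>\S+) (?P<url>[^,\s]+), '
    r"time:(?P<ms>\d+) msec, client:[^,]*, "
    r"stage:(?P<stage>[^,]+), module:(?P<module>[^)]+)\)"
)


def group_requests(appcmd_output):
    """Group executing requests by (module, stage), longest-running group first."""
    groups = {}
    for m in REQUEST_RE.finditer(appcmd_output):
        key = (m["module"], m["stage"])
        g = groups.setdefault(key, {"count": 0, "max_ms": 0, "urls": Counter()})
        g["count"] += 1
        g["max_ms"] = max(g["max_ms"], int(m["ms"]))
        g["urls"][m["url"]] += 1
    # Tip #1: the group holding the oldest requests is usually the blockage.
    # Tip #2: the short-running groups after it are usually the queued requests.
    return sorted(groups.items(), key=lambda kv: kv[1]["max_ms"], reverse=True)
```

You could save the output with appcmd list requests > requests.txt and pass the file contents to group_requests. For the sample above, the first group would be ManagedPipelineHandler/ExecuteRequestHandler on /test.aspx, followed by the queued WindowsAuthentication/AuthenticateRequest requests.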

STEP 3: What code is causing the hang? (for developers)

At this point, you've confirmed the hang and determined where in your application it's located (e.g. the URL). The next and final step is for the developer to figure out what in the application code is causing the hang.
Are you that developer? Then you know how hard it is to make this final leap, because most of the time hangs are very hard to reproduce in a test environment. Because of this, you'll likely need to analyze the hang in production while it's still happening.
Here is how:
1.       Make sure you have Debugging Tools for Windows installed on the server (takes longer), or get ProcDump (faster).
Expert tip #3: It always pays to have these tools available on each production server ahead of time. Taking a dump is usually faster, has less impact on your production process, and lets you analyze the hang offline. However, taking a dump can be a problem if your process memory is many GB in size.
2.       Identify the worker process for the application pool having the hang. The executing request list will show you the process ID if you run it with the /xml switch.
3.       Attach the debugger to the process, OR take a dump with ProcDump and load it in a debugger later.
        REM attach the debugger live (if you are fast)
        ntsd -p [PID]
        REM or take a full memory dump and open it later
        procdump -ma [PID] c:\dump.dmp
        ntsd -z c:\dump.dmp

4.       Snapshot the thread stacks, and exit. Make sure to detach before closing the debugger, to avoid killing the process!
        $$ .NET 4.0 and later
        .loadby sos clr
        $$ .NET 2.0 to 3.5 (use instead of the line above)
        .loadby sos mscorwks
        $$ print the managed stack of every thread
        ~*e!clrstack
        .detach
        qq

5.       The output will show you the code where each thread is currently executing. It will look like this:
        OS Thread Id: 0x88b4 (7)
        RetAddr          Call Site
        000007fed5a43ec9 ASP.test_aspx.Page_Load(System.Object, System.EventArgs)
        000007fee5a50562 System.Web.UI.Control.OnLoad(System.EventArgs)
        000007fee5a4caec System.Web.UI.Control.LoadRecursive()
        000007fee5a4beb0 System.Web.UI.Page.ProcessRequest()
        000007ff001b0219 System.Web.UI.Page.ProcessRequest(System.Web.HttpContext)
        000007fee5a53357 ASP.test_aspx.ProcessRequest(System.Web.HttpContext)
        000007fee61fcc14 System.Web.Hosting.PipelineRuntime.ProcessRequestNotification(IntPtr, IntPtr, IntPtr, Int32)

6.       Wait 10-20 seconds and do it again. If you are taking dumps, just take two dumps 10 seconds or so apart (ProcDump can do this in one command: procdump -ma -n 2 -s 10 [PID]).
Alright. Once you have the two thread stack lists, your objective is to find thread ids that have the same stack in both snapshots. These stacks show the code that is blocking the threads, and thereby causing the hang.
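Comparing two large stack lists by eye is tedious. Here is a minimal Python sketch that parses two saved !clrstack outputs (in the format shown in step 5) and returns the threads whose managed stacks are identical in both. It compares call sites only and ignores addresses. The parsing rules are assumptions based on that sample output:

```python
import re

# "OS Thread Id: 0x88b4 (7)" starts each thread's section in ~*e!clrstack output.
THREAD_RE = re.compile(r"OS Thread Id: (0x[0-9a-fA-F]+)")
# A frame line: one or more hex address columns, then the call site.
FRAME_RE = re.compile(r"^\s*(?:[0-9a-fA-F`]{8,}\s+)+(?P<site>\S.*?)\s*$")


def parse_clrstack(text):
    """Map OS thread id -> tuple of call sites, skipping threads with no managed frames."""
    stacks, current = {}, None
    for line in text.splitlines():
        header = THREAD_RE.search(line)
        if header:
            current = header.group(1).lower()
            stacks[current] = []
            continue
        frame = FRAME_RE.match(line)
        if current is not None and frame:
            stacks[current].append(frame.group("site"))
    return {tid: tuple(frames) for tid, frames in stacks.items() if frames}


def stuck_threads(first_snapshot, second_snapshot):
    """Threads whose managed stack did not change between the two snapshots."""
    before, after = parse_clrstack(first_snapshot), parse_clrstack(second_snapshot)
    return {tid: before[tid] for tid in sorted(before.keys() & after.keys()) if before[tid] == after[tid]}
```

If many threads share one stack, that stack is your prime suspect. collections.Counter(stuck_threads(a, b).values()) groups them by stack.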
NOTE: If you see only a couple of threads with the same stack, or none, it's likely because you either a) have a rolling hang, where requests are slow but still moving, or b) your application is asynchronous. If it's async, debugging hangs is WAY harder, because it's nearly impossible to tell where requests are blocked without stacks. In this case, you need to implement custom application tracing across async boundaries to help you debug hangs. I will blog more about this in the near future.

7 comments:

  1. Some important IIS error codes and details:

    401 - Unauthorized: Access is denied due to invalid credentials.
    You do not have permission to view this directory or page using the credentials that you supplied.
    401.1 - Unauthorized: Access is denied due to invalid credentials.
    You do not have permission to view this directory or page using the credentials that you supplied.
    401.2 - Unauthorized: Access is denied due to server configuration.
    You do not have permission to view this directory or page using the credentials that you supplied because your Web browser is sending a WWW-Authenticate header field that the Web server is not configured to accept.

    401.3 - Unauthorized: Access is denied due to an ACL set on the requested resource.
    You do not have permission to view this directory or page due to the access control list (ACL) that is configured for this resource on the Web server.

    401.4 - Unauthorized: Authorization failed by filter installed on the Web server.
    You might not have permission to view this directory or page using the credentials that you supplied. The Web server has a filter installed to verify users connecting to the server and it failed to authenticate your credentials.

    401.5 - Unauthorized: Authorization failed by an ISAPI/CGI application.
    The URL you attempted to reach has an ISAPI or CGI application installed that verifies user credentials before proceeding. This application cannot verify your credentials.

  2. 403 - Forbidden: Access is denied.
    You do not have permission to view this directory or page using the credentials that you supplied.

    403.1 - Forbidden: Execute access is denied.
    You have attempted to execute a CGI, ISAPI, or other executable program from a directory that does not allow programs to be executed.

    403.2 - Forbidden: Read access is denied.
    There is a problem with the page you are looking for and it cannot be displayed. This error can occur if you are trying to display an HTML page that resides in a directory that is configured to allow Execute or Script permissions only.

    403.3 - Forbidden: Write access is denied.
    There is a problem saving the page to the Web site. This error can occur if you attempt to upload a file or modify a file in a directory that does not allow Write access.

    403.4 - Forbidden: SSL is required to view this resource.
    The page you are trying to access is secured with Secure Sockets Layer (SSL).

    403.5 - Forbidden: SSL 128 is required to view this resource.
    The resource you are trying to access is secured with a 128-bit version of Secure Sockets Layer (SSL). In order to view this resource, you need a browser that supports this version of SSL.

    403.6 - Forbidden: IP address of the client has been rejected.
    The Web server you are attempting to reach has a list of IP addresses that are not allowed to access the Web site, and the IP address of your browsing computer is on this list.

    403.7 - Forbidden: SSL client certificate is required.
    The page you are attempting to access requires your browser to have a Secure Sockets Layer (SSL) client certificate that the Web server will recognize. The client certificate is used for identifying you as a valid user of the resource.

  3. 403.8 - Forbidden: DNS name of the client is rejected.
    The Web server you are attempting to reach has a list of DNS names that are not allowed to access this Web site, and the DNS name of your browsing computer is on this list.

    403.9 - Forbidden: Too many clients are trying to connect to the Web server.
    The Web server is too busy to process your request at this time.

    403.10 - Forbidden: Web server is configured to deny Execute access.
    You have attempted to execute a CGI, ISAPI, or other executable program from a directory that does not allow programs to be executed.
    403.11 - Forbidden: Password has been changed.
    You do not have permission to view this directory or page using the credentials that you supplied.

    403.12 - Forbidden: Client certificate is denied access by the server certificate mapper.
    The account to which your client certificate is mapped on the Web server has been denied access to this Web site. A Secure Sockets Layer (SSL) client certificate is used for identifying you as a valid user of the resource.

    403.13 - Forbidden: Client certificate has been revoked on the Web server.
    Your client certificate was revoked, or the revocation server could not be contacted. A Secure Sockets Layer (SSL) client certificate is used for identifying you as a valid user of the resource.

  4. 403.14 - Forbidden: Directory listing denied.
    The Web server is configured not to display a list of the contents of this directory.

    403.15 - Forbidden: Client access licenses have exceeded limits on the Web server.
    There are too many people accessing the Web site at this time. The Web server has exceeded its Client Access License limit.

    403.16 - Forbidden: Client certificate is ill-formed or is not trusted by the Web server.
    Your client certificate is untrusted or invalid. A Secure Sockets Layer (SSL) client certificate is used for identifying you as a valid user of the resource.

    403.17 - Forbidden: Client certificate has expired or is not yet valid.
    Your client certificate has expired or is not yet valid. A Secure Sockets Layer (SSL) client certificate is used for identifying you as a valid user of the resource.

    403.18 - Forbidden: Cannot execute requested URL in the current application pool.
    The specified request cannot be executed in the application pool that is configured for this resource on the Web server.

    403.19 - Forbidden: Cannot execute CGIs for the client in this application pool.
    The configured user for this application pool does not have sufficient privileges to execute CGI applications.

  5. 404 - File or directory not found.
    The resource you are looking for might have been removed, had its name changed, or is temporarily unavailable.

    404.1 - File or directory not found: Web site not accessible on the requested port.
    The Web site you are trying to access has an IP address that is configured not to accept requests that specify a port number.

    404.2 - File or directory not found: Lockdown policy prevents this request.
    The page you are requesting cannot be served due to the Web service extensions that are configured on the Web server.

    404.3 - File or directory not found: MIME map policy prevents this request.
    The page you are requesting cannot be served due to the Multipurpose Internet Mail Extensions (MIME) map policy that is configured on the Web server. The page you requested has a file name extension that is not recognised, and is therefore not allowed.

    404.4 - File or directory not found: No module handler is registered to handle the request.
    The resource you are looking for does not have a module or handler associated with it. It cannot be handled and served.

    404.5 - URL sequence denied.
    The specified URL sequence is not accepted by the server.

    404.6 - HTTP verb denied.
    The specified HTTP verb is not accepted by the server.

    404.7 - File extension denied.
    The specified file extension of the resource is not accepted by the server.

    404.8 - URL namespace hidden.
    The namespace of the specified URL is hidden by configuration.

    404.9 - File attribute hidden.
    The requested file has a hidden attribute which prevents it from being served.

  6. 404.10 - Request header too long.
    One of the request headers is longer than the specified limit configured in the server.

    404.11 - URL is double-escaped.
    This URL is denied because it is susceptible to double-escaping attacks.

    404.12 - URL has high bit characters.
    This URL is denied because it has high-bit characters.

    404.13 - Content-Length too large.
    This URL is denied because the Content-Length set is longer than specified by configuration.

    404.14 - URL too long.
    This URL is denied because its length is longer than specified by configuration.

    404.15 - Query-String too long.
    This URL is denied because its Query-String is longer than specified by configuration.

    405 - HTTP verb used to access this page is not allowed.
    The page you are looking for cannot be displayed because an invalid method (HTTP verb) was used to attempt access.

    406 - Client browser does not accept the MIME type of the requested page.
    The page you are looking for cannot be opened by your browser because it has a file name extension that your browser does not accept.

    412 - Precondition set by the client failed when evaluated on the Web server.
    The request was not completed due to preconditions that are set in the request header. Preconditions prevent the requested method from being applied to a resource other than the one intended. An example of a precondition is testing for expired content in the page cache of the client.

    500 - Internal server error.
    There is a problem with the resource you are looking for, and it cannot be displayed.

    500.13 - Server error: Web server is too busy.
    The request cannot be processed at this time. The amount of traffic exceeds the Web site's configured capacity.

    500.14 - Server error: Invalid application configuration on the server.
    The request cannot be processed due to application configuration errors on the Web server.

    500.15 - Server error: Direct requests for GLOBAL.ASA are not allowed.
    GLOBAL.ASA is a special file that cannot be accessed directly by your browser.

    500.16 - Server error: UNC authorization credentials incorrect.
    The page you are requesting cannot be accessed due to UNC authorization settings that are configured incorrectly on the Web server.

    500.17 - Server error: URL authorization store cannot be found.
    The URL Authorization store for the page you requested cannot be found on the Web server, therefore your credentials cannot be verified.

  7. 500.18 - Server error: URL authorization store cannot be opened.
    The URL Authorization store for the page you requested cannot be opened on the Web server, therefore your credentials cannot be verified.

    500.19 - Server error: Data for this file is configured improperly.
    The requested page cannot be accessed because of a configuration error.

    501 - Header values specify a method that is not implemented.
    The page you are looking for cannot be displayed because a header value in the request does not match certain configuration settings on the Web server. For example, a request header might specify a POST to a static file that cannot be posted to, or specify a Transfer-Encoding value that cannot make use of compression.

    502 - Web server received an invalid response while acting as a gateway or proxy server.
    There is a problem with the page you are looking for, and it cannot be displayed. When the Web server (while acting as a gateway or proxy) contacted the upstream content server, it received an invalid response from the content server.
