
The Java servlet API, like most Java APIs, is synchronous: a thread is allocated to a request for the entire duration of the handling of an HTTP request. This has been sufficient to allow web 1.0 applications implemented on servlets to scale to enterprise-level loads. However, web 2.0 applications, specifically those using Ajax and Comet techniques, present a traffic profile that cannot be handled scalably by synchronous servlets. This paper examines the problem and introduces the features that have been implemented in the Jetty 6 server to address it.
The traditional Java IO model (the java.net.Socket API) associates a thread with every TCP/IP connection. This was a reasonable approach when the majority of web browsers used non-persistent HTTP/1.0 and the TCP/IP connection was closed after every response. However, today almost all HTTP traffic is carried over one or more persistent connections, which are kept open after a response has completed so that the expense and latency of creating a new connection can be avoided on subsequent requests. These connections can be kept open for tens of seconds, even minutes, while users are reading pages, completing forms or have stopped using the website altogether.
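To make the cost of this model concrete, here is a minimal sketch of the traditional java.net approach, written for this explanation rather than taken from any server. Each accepted socket gets its own thread, which remains allocated while it blocks in `readLine()` waiting for the next request on the persistent connection. The `ThreadPerConnectionServer` name and the line-based "echo" protocol are illustrative assumptions standing in for HTTP.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Sketch of the traditional java.net model: one thread per TCP/IP connection.
// The per-connection thread stays allocated for the whole life of a persistent
// connection, including while it sits blocked waiting for the next request.
final class ThreadPerConnectionServer implements Closeable {
    private final ServerSocket server;

    ThreadPerConnectionServer(int port) throws IOException {
        server = new ServerSocket(port);
        Thread acceptor = new Thread(this::acceptLoop, "acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    int port() { return server.getLocalPort(); }

    private void acceptLoop() {
        while (true) {
            Socket socket;
            try {
                socket = server.accept();
            } catch (IOException closed) {
                return; // the server socket was closed
            }
            // A new thread, with its own stack, for every accepted connection.
            Thread worker = new Thread(() -> serve(socket), "conn-" + socket.getPort());
            worker.setDaemon(true);
            worker.start();
        }
    }

    // Line-based echo stands in for HTTP request/response handling.
    private void serve(Socket socket) {
        try (Socket s = socket;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String line;
            // readLine() blocks between requests: the thread is idle but not released.
            while ((line = in.readLine()) != null) {
                out.println("echo:" + line);
            }
        } catch (IOException ignored) {
            // connection reset or closed by the client
        }
    }

    @Override
    public void close() throws IOException { server.close(); }
}
```

With ten thousand mostly idle browsers, this design parks on the order of ten thousand threads, each with its own stack, even though almost none of them are doing useful work at any moment.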
Thus servers using traditional IO will have one or two connections, threads and the associated buffers allocated for every client browser concurrently using the server. Even though modern JVMs are much better at handling large numbers of threads, this is still a problem for scaling, as the memory used by every thread stack and buffer is not insignificant.
This issue can be partially resolved by the server reducing the idle timeout of persistent connections when it is low on resources, so that idle connections are closed to free resources for new connections with active requests. However, this approach trends towards inefficient non-persistent connections being used as a throttle to protect the server's inefficient use of resources. High request rates may be achieved, but only over a few concurrent connections and thus for only a few concurrent users.
Java 1.4 introduced the java.nio package, which supports non-blocking (asynchronous) IO, so that threads can be allocated to connections only when requests are being processed. When a connection is idle, its associated thread and buffers can be returned to pools for reuse and the connection can be added to an NIO select set to wait for new activity without requiring significant resources. Once a request is received on a connection, a thread and buffer are allocated and the synchronous servlet API is invoked to handle the request.
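The select-set arrangement can be sketched with a standard `java.nio.channels.Selector`. Idle channels sit in the select set costing no thread, and a pooled worker thread (and a buffer) is borrowed only when a channel becomes readable. This is a hypothetical sketch, not Jetty's connector: the `SelectDispatcher` class, its `selectOnce` step and the `handler` callback are invented for illustration, and in the usage below a `Pipe` stands in for a client connection so the example needs no network.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.Queue;
import java.util.concurrent.*;
import java.util.function.Consumer;

// Hypothetical sketch: idle connections wait in a select set; a pooled worker
// thread and a buffer are borrowed only when a connection has data to read.
final class SelectDispatcher implements AutoCloseable {
    private final Selector selector;
    private final ExecutorService workers = Executors.newFixedThreadPool(2);
    private final Queue<SelectionKey> rearm = new ConcurrentLinkedQueue<>();
    private final Consumer<String> handler;

    SelectDispatcher(Consumer<String> handler) throws IOException {
        this.selector = Selector.open();
        this.handler = handler;
    }

    // An idle connection costs a registration, not a thread.
    void register(SelectableChannel channel) throws IOException {
        channel.configureBlocking(false);
        channel.register(selector, SelectionKey.OP_READ);
    }

    // One pass of the select loop; a real server would loop forever.
    void selectOnce(long timeoutMs) throws IOException {
        // Connections whose requests have completed go back into the select set.
        SelectionKey idle;
        while ((idle = rearm.poll()) != null) {
            if (idle.isValid()) idle.interestOps(SelectionKey.OP_READ);
        }
        selector.select(timeoutMs);
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            if (!key.isValid() || !key.isReadable()) continue;
            key.interestOps(0); // a worker owns this connection for now
            ReadableByteChannel channel = (ReadableByteChannel) key.channel();
            workers.submit(() -> serve(key, channel));
        }
    }

    private void serve(SelectionKey key, ReadableByteChannel channel) {
        ByteBuffer buf = ByteBuffer.allocate(1024); // allocated per request only
        try {
            int n = channel.read(buf);
            if (n < 0) { key.cancel(); return; }    // client closed the connection
            if (n > 0) {
                buf.flip();
                handler.accept(StandardCharsets.UTF_8.decode(buf).toString());
            }
        } catch (IOException e) {
            key.cancel();
            return;
        }
        rearm.add(key);     // the connection is idle again
        selector.wakeup();  // let the select loop restore its interest set
    }

    @Override
    public void close() throws IOException {
        workers.shutdown();
        selector.close();
    }
}
```

Re-arming is queued back to the selecting thread rather than done from the worker, which avoids relying on `interestOps` being safe to call concurrently with a blocked `select()`.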
This thread-per-request model allows much greater scaling of connections to web 1.0 applications without sacrificing the efficiencies of persistent connections. It allows a server to support both high request rates and high numbers of concurrent users.
The thread-per-request model works on the assumption of a typical web 1.0 request profile: a burst of requests is received that represents a user accessing a page of the application, followed by a pause while the user reads the page, selects the next navigation link, completes a form or does nothing.
Web 2.0 applications using Ajax techniques erode this assumption as they are able to issue new requests while a page is being read or a form is being completed. Thus the pauses between requests have become shorter and the thread-per-request model less efficient because a user will more frequently have requests being handled within the server.
Web 2.0 applications using Comet techniques totally invalidate the thread-per-request model. Comet applications use long-lived requests that are held by the server so that it can send asynchronous responses to the client. Thus an Ajax/Comet client will almost always have an outstanding request held by the server, so that a response can be sent in reaction to an asynchronous event on the server. Because the servlet API is synchronous, a thread must be allocated to each of these requests, even if the servlet handling it is idly waiting for an asynchronous event or a timeout.
The following table shows that a Web 1.0 server can handle 10000 users with 500 threads and 32MB of thread stacks, which is easily achievable with current JVMs and servers. For a Web 2.0 application these requirements grow by more than an order of magnitude, to 10600 threads and 694MB of stack memory, which pushes the limits of current servers without even considering the resource requirements of the application:
| Quantity | Formula | Web 1.0 | Web 2.0 + Comet |
|---|---|---|---|
| Users | u | 10000 | 10000 |
| Requests/Burst | b | 5 | 2 |
| Burst period (s) | p | 20 | 5 |
| Request Duration (s) | d | 0.200 | 0.150 |
| Poll Duration (s) | D | 0 | 10 |
| Request rate (req/s) | rr=u*b/p | 2500 | 4000 |
| Poll rate (req/s) | pr=u/D | 0 | 1000 |
| Total (req/s) | r=rr+pr | 2500 | 5000 |
| Concurrent requests | c=rr*d+pr*D | 500 | 10600 |
| Min Threads | T=c | 500 | 10600 |
| Stack memory | S=64*1024*T | 32MB | 694MB |
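The table's arithmetic can be reproduced with a few lines of Java. The request rate divides by the burst period p and the poll rate by the poll duration D (taken as zero when there is no polling); those are the formulas that yield the tabulated values. `CapacityModel` is an illustrative name, and stack memory assumes 64KiB per thread, reported in decimal megabytes, which is how 694MB arises.

```java
// Illustrative model of the thread-per-request estimates in the table above.
final class CapacityModel {
    final double users, requestsPerBurst, burstPeriod, requestDuration, pollDuration;

    CapacityModel(double u, double b, double p, double d, double D) {
        users = u; requestsPerBurst = b; burstPeriod = p; requestDuration = d; pollDuration = D;
    }

    double requestRate() { return users * requestsPerBurst / burstPeriod; }       // rr = u*b/p
    double pollRate()    { return pollDuration > 0 ? users / pollDuration : 0; }  // pr = u/D
    double totalRate()   { return requestRate() + pollRate(); }                   // r = rr+pr

    // c = rr*d + pr*D: held polls count as concurrent requests for their whole duration.
    double concurrent() { return requestRate() * requestDuration + pollRate() * pollDuration; }

    // Thread-per-request: every concurrent request, held or active, needs a thread (T = c).
    long minThreads() { return Math.round(concurrent()); }

    // S = 64*1024*T bytes, expressed in decimal megabytes.
    static double stackMB(long threads) { return 64 * 1024 * threads / 1e6; }
}
```

For example, `new CapacityModel(10000, 2, 5, 0.150, 10).minThreads()` evaluates 4000 × 0.150 + 1000 × 10, giving the 10600 threads of the Comet column.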
This analysis only considers threads and their stacks, but there are many other commonly pooled resources that are allocated during servlet request handling: input buffers, output buffers, character-to-byte converters, byte-to-character converters, etc. All of these may suffer an order of magnitude increase in usage in a Web 2.0 application.
If Web 2.0 Comet applications are going to be scalably deployed on Java servlet servers, then a mechanism needs to be found to allow a thread and associated resources to only be allocated to a request when that thread can be productively used to progress the handling of the request and the production of the response. The thread-per-request model must evolve to a thread-per-active-request model.
While Web 2.0 Ajax/Comet applications have exposed a flaw in the thread-per-request model, several other use-cases common even to web 1.0 applications expose the same flaw. Whenever a servlet needs to wait for a resource or an event, the thread-per-request model is degraded, as request threads and memory are allocated to requests that are simply waiting, and those resources are not being used productively.
Consider the scenario where some URLs within a web application require a JDBC connection from a limited connection pool. Many requests may be blocked waiting for a JDBC connection, and each will consume a thread from the server's thread pool. It is possible that all the available threads become blocked and are unavailable to handle requests/users that do not require JDBC, so that a few URLs starve the rest of the application of resources.
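The starvation effect can be reproduced with standard `java.util.concurrent` classes. In this hypothetical demo a `Semaphore` with one permit stands in for the JDBC connection pool and a three-thread executor for the server's thread pool; the `JdbcStarvationDemo` name and the pool sizes are assumptions chosen to make the effect visible, not measurements of any real server.

```java
import java.util.concurrent.*;

// Hypothetical demo: threads blocked on a scarce resource starve unrelated requests.
final class JdbcStarvationDemo {
    // Returns true if a request that needs no JDBC connection completed within
    // waitMs while JDBC-bound requests were occupying the server's threads.
    static boolean plainRequestServed(long waitMs) throws Exception {
        ExecutorService serverThreads = Executors.newFixedThreadPool(3); // the server's thread pool
        Semaphore jdbcPool = new Semaphore(1);                           // one pooled JDBC connection
        CountDownLatch slowQuery = new CountDownLatch(1);
        try {
            // Three requests for JDBC-backed URLs: one holds the connection during a
            // slow query, the other two block waiting for it. Every thread is now in use.
            for (int i = 0; i < 3; i++) {
                serverThreads.submit(() -> {
                    jdbcPool.acquire();
                    try {
                        slowQuery.await();
                    } finally {
                        jdbcPool.release();
                    }
                    return null;
                });
            }
            // A request that needs no JDBC connection queues behind them.
            Future<String> plain = serverThreads.submit(() -> "static page");
            boolean served;
            try {
                plain.get(waitMs, TimeUnit.MILLISECONDS);
                served = true;
            } catch (TimeoutException starved) {
                served = false;
            }
            slowQuery.countDown(); // the query finishes and threads free up...
            plain.get();           // ...and only then is the plain request served
            return served;
        } finally {
            slowQuery.countDown();
            serverThreads.shutdown();
        }
    }
}
```

Calling `JdbcStarvationDemo.plainRequestServed(200)` returns `false`: the plain request cannot run until the slow query releases a thread, even though it never touches the database.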
Servlets often spend time waiting: obtaining JDBC connections; using JDBC connections; calling other remote services; proxying requests; waiting to receive input; waiting to flush output, etc. With the current servlet API, the only way for a servlet to wait is to block the thread allocated to the request by the servlet container.
Unfortunately, the solution to the waiting servlet problem is not as simple as providing servlets with a non-blocking IO API or an event-driven JDBC API. Even if these non-blocking APIs were available to the servlet, there is no mechanism to return the thread to the pool when one of these APIs indicates that no productive work can be done. Any mechanism to free the thread must deal with the calling stack, which may involve one or more Filters, RequestDispatchers and/or other ServletContexts. If the calling stack is not discarded, then the resources (specifically memory) it holds cannot be reused and scalability is not achieved.
Jetty 6 has implemented a number of inter-related features that provide an integrated solution to the problem of scaling servlets that wait.
Jetty uses the java.nio library for its main HTTP connector implementation. This supports a thread-per-request model that approaches thread-per-active-request when used together with other Jetty features. The connections received by the NIO connectors are always run in non-blocking IO mode, and blocking semantics are applied only to the input and output streams that wrap the connection when it is passed to a servlet. Where possible and appropriate, advantage has been taken of advanced NIO features such as gather writes, direct buffers and memory-mapped file buffers.
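One way to give a servlet blocking stream semantics over a connection that stays in non-blocking mode is to wait on a private `Selector` only when a read finds no data. The `BlockingChannelInputStream` below is a hypothetical sketch of that idea, not Jetty's implementation; in the usage, a `Pipe` stands in for the socket.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: blocking InputStream semantics over a channel that stays
// in non-blocking mode. The caller's thread blocks only inside read(), i.e. only
// while a servlet is actually reading.
final class BlockingChannelInputStream extends InputStream {
    private final ReadableByteChannel channel;
    private final Selector selector;

    <C extends SelectableChannel & ReadableByteChannel> BlockingChannelInputStream(C ch)
            throws IOException {
        ch.configureBlocking(false);  // the connection itself never blocks
        this.channel = ch;
        this.selector = Selector.open();
        ch.register(selector, SelectionKey.OP_READ);
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n < 0 ? -1 : one[0] & 0xff;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        ByteBuffer buf = ByteBuffer.wrap(b, off, len);
        while (true) {
            int n = channel.read(buf);       // non-blocking: may return 0
            if (n != 0) return n;            // bytes read, or -1 at end of stream
            selector.select();               // no data yet: wait until readable
            selector.selectedKeys().clear();
        }
    }

    @Override
    public void close() throws IOException {
        selector.close();
        channel.close();
    }
}
```

The channel can remain registered with the connector's own selector as well, since a channel may be registered with more than one selector.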
To give servlets an efficient waiting mechanism, Jetty 6 introduced a Continuation mechanism, which allows the handling of the current request to be suspended and resumed at a later time in response to a timeout or an asynchronous event. A continuation is obtained by a servlet via a portable API that will work in any servlet container. In containers other than Jetty, a simple WaitingContinuation is used, where suspend is the equivalent of a wait() and resume the equivalent of a notify(), so the application works but does not avoid the problem of thread allocation.
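The fallback behaviour can be sketched with `wait()`/`notifyAll()`. `SimpleWaitingContinuation` is a hypothetical class modelled on the description of WaitingContinuation above, not Jetty's source; in particular, having `suspend(long)` return whether it was resumed is an assumption of this sketch. The calling thread stays blocked for the whole wait, which is exactly the cost described.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch modelled on the WaitingContinuation described above:
// suspend() is a timed wait() and resume() a notifyAll(). The request's thread
// (and stack) stays allocated for the entire wait.
final class SimpleWaitingContinuation {
    private boolean resumed;
    private Object event;

    // Returns true if resumed before the timeout, false if the timeout expired.
    synchronized boolean suspend(long timeoutMs) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (!resumed) { // loop guards against spurious wakeups
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) return false;
            TimeUnit.NANOSECONDS.timedWait(this, remaining); // releases the monitor while waiting
        }
        return true;
    }

    // Called from another thread when the asynchronous event occurs.
    synchronized void resume(Object event) {
        this.event = event;
        this.resumed = true;
        notifyAll();
    }

    synchronized Object event() { return event; }
}
```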
When run in Jetty, the continuation instance obtained will be a RetryContinuation, which, when suspended, actually throws a special RuntimeException that allows the request thread to exit the Filter/RequestDispatcher/Servlet chain and unwind the stack to the container. The thread is returned to the pool while Jetty holds the request until either a timeout occurs or the resume method is called, at which point the request is retried by dispatching it to the servlet again.
This mechanism relies on the stateless nature of HTTP and the ability to retry idempotent requests. It allows requests to wait within the container rather than within the servlet, without the expense of an allocated thread and buffers.
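The retry control flow can be illustrated with a toy container. Everything here (`RetryRequest`, `ToyRequest`, `ToyContainer` and its list of held requests) is hypothetical and far simpler than Jetty's RetryContinuation. It shows only the shape of the technique: the servlet suspends by throwing, the exception unwinds the stack to the container so no thread is held, and resume re-dispatches the same request, which the servlet then handles from the start.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical: thrown by suspend() to unwind the Filter/RequestDispatcher/Servlet
// stack back to the container.
final class RetryRequest extends RuntimeException {
    RetryRequest() { super(null, null, false, false); } // cheap: no stack trace captured
}

final class ToyRequest {
    boolean resumed;
    Object event; // what the asynchronous event delivered (null on timeout)
    final StringBuilder response = new StringBuilder();

    // First dispatch: unwind to the container. Retried dispatch: return normally.
    void suspend() {
        if (!resumed) throw new RetryRequest();
    }
}

final class ToyContainer {
    private final Consumer<ToyRequest> servlet;
    private final List<ToyRequest> held = new ArrayList<>();
    int dispatches;

    ToyContainer(Consumer<ToyRequest> servlet) { this.servlet = servlet; }

    void dispatch(ToyRequest request) {
        dispatches++;
        try {
            servlet.accept(request);
        } catch (RetryRequest suspended) {
            // The stack has unwound; only this list entry remains while waiting.
            held.add(request);
        }
    }

    // Resume (or, with a null event, a timeout) re-dispatches the held request.
    void resume(ToyRequest request, Object event) {
        if (!held.remove(request)) return;
        request.event = event;
        request.resumed = true;
        dispatch(request); // a real container would use a pooled thread here
    }

    int heldCount() { return held.size(); }
}
```

Because the servlet runs again from the top, everything it does before `suspend()` must be safe to repeat, which is the idempotency requirement noted above.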
The following table indicates how Web 2.0 Comet applications can be implemented with continuations with only a modest increase in the server requirements:
| Quantity | Formula | Web 1.0 | Web 2.0 + Comet | Web 2.0 + Comet + Continuations |
|---|---|---|---|---|
| Users | u | 10000 | 10000 | 10000 |
| Requests/Burst | b | 5 | 2 | 2 |
| Burst period (s) | p | 20 | 5 | 5 |
| Request Duration (s) | d | 0.200 | 0.150 | 0.175 |
| Poll Duration (s) | D | 0 | 10 | 10 |
| Request rate (req/s) | rr=u*b/p | 2500 | 4000 | 4000 |
| Poll rate (req/s) | pr=u/D | 0 | 1000 | 1000 |
| Total (req/s) | r=rr+pr | 2500 | 5000 | 5000 |
| Concurrent requests | c=rr*d+pr*D | 500 | 10600 | 10700 |
| Min Threads | T=c (without continuations); T=r*d (with continuations) | 500 | 10600 | 875 |
| Stack memory | S=64*1024*T | 32MB | 694MB | 57MB |
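With continuations, a thread is needed only while a request is actively being handled for d seconds, so the thread count follows from the total request rate multiplied by the request duration (Little's law), rather than from the number of concurrent, mostly held, requests. Worked through with the table's own figures for the continuations column:

```latex
\begin{aligned}
\mathit{rr} &= \frac{u\,b}{p} = \frac{10000 \times 2}{5} = 4000~\text{req/s}, \qquad
\mathit{pr} = \frac{u}{D} = \frac{10000}{10} = 1000~\text{req/s} \\
r &= \mathit{rr} + \mathit{pr} = 5000~\text{req/s} \\
c &= \mathit{rr}\,d + \mathit{pr}\,D = 4000 \times 0.175 + 1000 \times 10 = 10700 \\
T &= r\,d = 5000 \times 0.175 = 875 \\
S &= 64 \times 1024 \times T = 65536 \times 875 = 57\,344\,000~\text{bytes} \approx 57~\text{MB}
\end{aligned}
```

The 10000 held polls are still counted in c, but while they wait they occupy neither a thread nor a stack.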
Jetty 6 uses a split buffer architecture and dynamic buffer allocation. An idle connection has no buffers allocated to it; once a request is received, only a small request header buffer is allocated.
Many (most) requests have no content or only small content, so the header buffer is often the only buffer required for a request. Only if the request indicates that it has large content is an input buffer allocated.
If the servlet waits (hopefully using a continuation) before writing a response, then no response buffers are allocated during the wait.
Only when the servlet starts writing a response is an output buffer allocated and only when the response is committed is a response header buffer allocated and written with the output buffer in an efficient gather write operation.
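A minimal sketch of this allocation policy follows. The `LazyBuffers` class, its method names and the 4KiB/64KiB buffer sizes are hypothetical rather than Jetty's classes; the gather write uses the standard `GatheringByteChannel.write(ByteBuffer[])`, and in the usage a `Pipe` sink stands in for the socket. A real server would return buffers to a pool rather than simply dropping them.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.*;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of split, on-demand buffers for one connection.
final class LazyBuffers {
    static final int HEADER_SIZE = 4 * 1024;   // small buffers for headers
    static final int CONTENT_SIZE = 64 * 1024; // large buffers for content

    ByteBuffer requestHeader, requestContent, responseContent, responseHeader;

    // A request is arriving: a small header buffer is enough to parse it.
    void onRequestArrived() { requestHeader = ByteBuffer.allocate(HEADER_SIZE); }

    // Only a request that declares large content gets an input buffer.
    void onLargeContent() { requestContent = ByteBuffer.allocate(CONTENT_SIZE); }

    // The output buffer is allocated on the servlet's first write, not before.
    void write(String s) {
        if (responseContent == null) responseContent = ByteBuffer.allocate(CONTENT_SIZE);
        responseContent.put(s.getBytes(StandardCharsets.ISO_8859_1));
    }

    // On commit the response header buffer is allocated and written together
    // with the content using gather writes (one write(ByteBuffer[]) per call).
    long commit(GatheringByteChannel out, String statusAndHeaders) throws IOException {
        responseHeader = ByteBuffer.allocate(HEADER_SIZE);
        responseHeader.put(statusAndHeaders.getBytes(StandardCharsets.ISO_8859_1));
        responseHeader.flip();
        ByteBuffer content = responseContent != null ? responseContent : ByteBuffer.allocate(0);
        content.flip();
        ByteBuffer[] parts = { responseHeader, content };
        long written = 0;
        while (responseHeader.hasRemaining() || content.hasRemaining()) {
            written += out.write(parts);
        }
        return written;
    }

    // Request complete: drop every buffer so the idle connection holds none.
    void recycle() { requestHeader = requestContent = responseContent = responseHeader = null; }

    boolean holdsAnyBuffer() {
        return requestHeader != null || requestContent != null
                || responseContent != null || responseHeader != null;
    }
}
```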
These strategies mean that Jetty 6 allocates buffers only when they are required, and thus it is possible to have fewer, larger buffers within the same memory footprint. Having large input and output buffers is a significant advantage, as it reduces the probability that the servlet code will block while reading from or writing to a slow network. Furthermore, as the IO need only be blocking during the call to the servlet, it is possible to fill the first input buffer and flush the output buffer using asynchronous IO. The larger the buffers, the more often all the IO can be done in non-blocking mode without an allocated thread and stack.
Often an HTTP request header arrives in a separate TCP/IP packet from the request content. Furthermore, a request may use the Expect: 100-continue mechanism to demand a handshake response before sending any content packets. If a server dispatches a request to a servlet as soon as the request header is received, then that servlet is likely to block waiting to read the content in the next TCP/IP packet.
Jetty 6 can delay the dispatch of a request to a servlet until content has been read with asynchronous IO and is available for the servlet to handle immediately. In the near future, Jetty will be able to delay dispatch until the entire content has been received (if the buffer is large enough), thus allowing only non-blocking IO to be used to read the entire request; the servlet will then not block on input, as all the content will be available in memory.
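The delayed-dispatch idea can be sketched as a small accumulator that is fed whatever bytes each non-blocking read returns, and that calls the servlet only once the header and all `Content-Length` bytes are buffered. `DelayedDispatch` is hypothetical: it handles only `Content-Length` (no chunked encoding), re-decodes the buffer on each call for simplicity, and merely reports when a `100 Continue` interim response should be sent rather than writing it.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;
import java.util.function.BiConsumer;

// Hypothetical sketch: buffer a request with non-blocking reads and dispatch
// to the servlet only when the header and the whole body are in memory.
final class DelayedDispatch {
    private final ByteArrayOutputStream received = new ByteArrayOutputStream();
    private final BiConsumer<String, byte[]> servlet;
    private int headerEnd = -1;     // index just past "\r\n\r\n", once seen
    private int contentLength = 0;
    private boolean expectContinue, dispatched;

    DelayedDispatch(BiConsumer<String, byte[]> servlet) { this.servlet = servlet; }

    // Feed the bytes returned by one non-blocking read; returns true once dispatched.
    boolean onBytes(byte[] chunk) {
        if (dispatched) return true;
        received.write(chunk, 0, chunk.length);
        byte[] all = received.toByteArray();
        if (headerEnd < 0) {
            String text = new String(all, StandardCharsets.ISO_8859_1);
            int end = text.indexOf("\r\n\r\n");
            if (end < 0) return false;  // header still incomplete: no thread needed
            headerEnd = end + 4;
            for (String line : text.substring(0, end).split("\r\n")) {
                String lower = line.toLowerCase(Locale.ROOT);
                if (lower.startsWith("content-length:"))
                    contentLength = Integer.parseInt(line.substring(15).trim());
                if (lower.startsWith("expect:") && lower.contains("100-continue"))
                    expectContinue = true;
            }
        }
        if (all.length - headerEnd < contentLength) return false; // body still arriving
        String header = new String(all, 0, headerEnd, StandardCharsets.ISO_8859_1);
        byte[] body = Arrays.copyOfRange(all, headerEnd, headerEnd + contentLength);
        dispatched = true;
        servlet.accept(header, body);   // the servlet never blocks on input
        return true;
    }

    // True while a client that sent "Expect: 100-continue" is still owed the
    // interim "HTTP/1.1 100 Continue" response; no servlet thread is needed for it.
    boolean shouldSendContinue() { return expectContinue && !dispatched; }
}
```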
Jetty 6 employs a number of innovative strategies to ensure that only the resources that are actually required are assigned to a connection, and only for as long as they are needed. This careful resource management gives Jetty an architecture designed to scale to meet the needs of Ajax/Comet applications, as well as the other use-cases that require servlets to wait.
Currently these solutions have all been implemented within the bounds of the current servlet API. However, the Jetty developers are in discussion with the servlet specification JSR and other container implementers in order to find common ground for an extension to the servlet API that will allow more explicit handling of asynchronous events.
(c)opyright Webtide 2006. Some rights reserved. Creative Commons License