Query-strings and Cookies caching
General: PipeBoost can cache multiple copies of
the same page when the page is specific to a user or a group, so that
each user receives their own copy. This can be an efficient way to
accelerate a web application when used in tandem with cache
pre-validation.
How it works: Multi-copy caching is accomplished
by reading a cookie (or cookies) that uniquely identifies a particular
user or group and combining it with the query-string. These together
form a unique identifier of a particular page copy. The page
identifier is used to store and later locate a cached copy of the
page.
For example, a script could have query-strings of this form:
GetReport.asp?Month=10&Page=1&CustID=1371
GetReport.asp?Month=10&Page=1&CustID=5054
where the argument CustID is a unique identifier of a customer.
Alternatively, the application could set a CustID cookie to the
required value and not pass the argument in the query-string at all.
This cookie has to be registered with PipeBoost, so that PipeBoost
uses its value in combination with the query-string to form a page
identifier. It is not enough to simply store the customer ID in a
session variable, because PipeBoost has no means of accessing session
data. Cookies are registered on the PipeBoost Cookies property page of
a web site.
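The idea of a page identifier can be illustrated with a small Python model. This is only a sketch of the concept, not PipeBoost's actual key format: the function name page_identifier, the "|" separator, the Theme cookie, and the rule that unregistered cookies are ignored are all assumptions made for illustration.

```python
def page_identifier(path, query_string, cookies, registered_cookies):
    """Build a cache key from the query-string plus registered cookie values.

    Hypothetical model: cookies not listed in registered_cookies are ignored.
    """
    parts = [path, query_string]
    for name in sorted(registered_cookies):
        parts.append(f"{name}={cookies.get(name, '')}")
    return "|".join(parts)


registered = ["CustID"]          # cookies registered for this site (assumed)
report = "/GetReport.asp"
query = "Month=10&Page=1"

id_1371 = page_identifier(report, query, {"CustID": "1371", "Theme": "dark"}, registered)
id_5054 = page_identifier(report, query, {"CustID": "5054", "Theme": "dark"}, registered)
id_1371_light = page_identifier(report, query, {"CustID": "1371", "Theme": "light"}, registered)

print(id_1371 != id_5054)        # True: each customer gets a separate copy
print(id_1371 == id_1371_light)  # True: the unregistered Theme cookie is ignored
```

In this model, the same query-string yields a different cached copy for each CustID value, while cookies that are not registered have no effect on which copy is served.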
A web developer has to create dedicated cookies for use with PipeBoost.
One cannot rely on cookies like ASPSESSIONID, because such cookies
change too often and the developer has no control over them. A
frequently changing cookie produces a new page identifier on almost
every visit, so cached copies are rarely reused and the whole caching
system loses its effectiveness.
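The effect of a frequently changing cookie can be seen in a toy simulation. The sketch below is a simplified Python model, not PipeBoost code: count_cache_hits and the sample session values are hypothetical, and the cache is reduced to a set of (query-string, cookie value) keys.

```python
def count_cache_hits(requests, key_cookie):
    """Count how many requests are served from a cache keyed on one cookie."""
    cache = set()
    hits = 0
    for query_string, cookies in requests:
        key = (query_string, cookies.get(key_cookie, ""))
        if key in cache:
            hits += 1
        else:
            cache.add(key)  # first request for this key: generate and store
    return hits


# Customer 1371 opens the same report in three separate sessions.
requests = [
    ("Month=10&Page=1", {"CustID": "1371", "ASPSESSIONID": "AAAA"}),
    ("Month=10&Page=1", {"CustID": "1371", "ASPSESSIONID": "BBBB"}),
    ("Month=10&Page=1", {"CustID": "1371", "ASPSESSIONID": "CCCC"}),
]

hits_by_customer = count_cache_hits(requests, "CustID")
hits_by_session = count_cache_hits(requests, "ASPSESSIONID")
print(hits_by_customer)  # 2
print(hits_by_session)   # 0
```

Keyed on the stable CustID cookie, the second and third visits are served from the cache. Keyed on the session cookie, every visit looks like a new page and nothing is reused.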
This system should only be used for scripts with a reasonably small,
bounded number of possible outcomes. A good use is daily report
pages that are specific to a customer and take a long time to
generate. An improper use is a large search engine, where it is not
possible to cache even a fraction of the frequent queries.
Security Considerations: Of course, the above GetReport example is simplified. Some more work
has to be done to use this system securely.
Access to cache
Reviewing the above example one can see it is quite easy to change the
CustID to any value and get someone else’s cached data without logging
in as that customer. PipeBoost does not and cannot authenticate the
requests for you. If your script responds with a status code 200 and
PipeBoost has a fresh cached copy of the requested page, it will
simply send that copy, assuming everything else allows compression.
This issue can be easily resolved in a number of ways:
Method 1 - Authenticate
This method relies on session state and may not be suitable for
sites that do not use sessions. The solution is simple:
store the CustID as a session variable and compare its value
with the value supplied in the query-string
or a cookie. If the values do not match, emit an Access denied
page. This alone may not be enough, because PipeBoost may still send
out a cached copy (as discussed earlier); to prevent that,
the script has to set the X-Pb-SendCache response field. The whole
modification to the code would look something like this:
If Request("CustID") <> Session("CustID") Then
    'make sure PipeBoost does not send a cached copy
    Response.AddHeader "X-Pb-SendCache", "no"
    'make sure PipeBoost does not cache this page either
    Response.CacheControl = "no-cache"
    Server.Transfer("/login.asp?error=AccessDenied")
End If
If CustID is supplied in a cookie instead of the
query-string, change Request("CustID") to
Request.Cookies("CustID").
Method 2 - Use strong encryption
This method relies on strong symmetric-key block ciphers and
does not require a session state. If you choose to store all of
the important session data in cookies, you should use this
method. The solution is: encrypt and encode the CustID value
when forming the URL or setting a cookie and decrypt it when the
request comes back. PipeBoost will use encrypted values to form
a page identifier, which is just as unique as the original
decrypted value, but practically impossible to guess. For
example, URLs could look like this:
GetReport.asp?Month=10&Page=1&CustID=5GHI082V62H77Q3A
GetReport.asp?Month=10&Page=1&CustID=JU634F5TYQWO80I9
You must either avoid persistent cookies or authenticate the
requests (as in Method 1); otherwise you cannot ensure the
security of your customers’ data. If you need even more security,
combine the two methods. In either case, use an
industry-standard strong cipher, because security through
obscurity has repeatedly been shown not to work.
Cache Size: Be very careful about which files you
enable multi-copy caching for. HTTP makes no distinction
between files that accept query-strings and files that do not;
a URL to any file can carry a query-string. Apart from the
configuration that you provide, PipeBoost cannot determine what should
be cached as a single copy and what as multiple copies.
There are two basic considerations: memory and disk
space. PipeBoost keeps the entire cache database index in memory for
high-speed access. The cached files are stored on disk in the location
that you specify. Every 1,000 cached files take up ~400 KB of
memory, because of memory allocation granularity. So what would happen
if you enabled multi-copy caching for all .html files and your site
had about 500 of them (not a big site by today’s standards)? Let us
assume that your Maximum cached files/file option is set to 2000. An
attacker who knows about your setup could request each of those 500
HTML files with 2,000 different query-strings, which would result in
500 x 2,000 = 1,000,000 cached files. This would require ~400 MB of
RAM to keep the cache DB index! And at an average size of 20 KB per
HTML file, or 5 KB after compression, the total disk space needed to
store the cache would be about 5 GB.
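The arithmetic above can be reproduced with a short Python calculation. It is a back-of-the-envelope sketch: the ~400 KB per 1,000 files figure and the 5 KB compressed size come from the text, the function name estimate_cache_footprint is hypothetical, and decimal units (1 MB = 1,000 KB) are assumed to match the rounded numbers quoted.

```python
INDEX_KB_PER_1000_FILES = 400  # approximate index cost quoted in the text


def estimate_cache_footprint(files, copies_per_file, compressed_kb_per_copy):
    """Return (cached files, index size in MB, disk usage in GB)."""
    cached_files = files * copies_per_file
    index_mb = cached_files * INDEX_KB_PER_1000_FILES / 1000 / 1000
    disk_gb = cached_files * compressed_kb_per_copy / 1_000_000
    return cached_files, index_mb, disk_gb


cached_files, index_mb, disk_gb = estimate_cache_footprint(
    files=500, copies_per_file=2000, compressed_kb_per_copy=5
)
print(cached_files)  # 1000000
print(index_mb)      # 400.0
print(disk_gb)       # 5.0
```

Running the same function with only the handful of files that actually need multi-copy caching shows how quickly the worst case shrinks.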
Improper configuration can easily open the door to a
Denial of Service attack on your server. The safest approach is to
enable multi-copy caching only for the files that use it and benefit
from it.
Limitations:
· There are only so many copies of a page that can be efficiently
cached. Currently, PipeBoost can handle up to 16384 cached copies per
page.
· A web application has to set cookie or query-string values that
uniquely identify a user or group to make the resulting page
identifier unique.
· Only one type of page can be cached for a particular file at any
given time. The type is defined by the Cache-Control response
header field (please also see PipeBoost Cookies). It can be either
public or private. Normally, your script or application would only
produce one type of page. If for some reason you decide to switch to
the other type, you have to delete all cached copies of this file
before caching will resume.
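A quick way to check a script against the per-page copy limit is to multiply the number of distinct values each identifying argument or cookie can take. The Python sketch below assumes that every combination is eventually requested and that "page" means a single file such as GetReport.asp; the customer, month, and page counts are hypothetical.

```python
MAX_COPIES_PER_PAGE = 16384  # limit quoted in the text


def copies_needed(*distinct_values_per_argument):
    """Worst-case number of cached copies for one file."""
    total = 1
    for count in distinct_values_per_argument:
        total *= count
    return total


# GetReport.asp keyed on CustID, Month and Page (hypothetical counts).
small_site = copies_needed(200, 12, 5)
large_site = copies_needed(1000, 12, 5)
print(small_site, small_site <= MAX_COPIES_PER_PAGE)  # 12000 True
print(large_site, large_site <= MAX_COPIES_PER_PAGE)  # 60000 False
```

When the worst case exceeds the limit, only part of the possible copies can be cached at once, so it may be worth reducing the number of arguments that take part in the page identifier.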