WORKSHOP
Session Tracking on the Web

Why Session Tracking?

Before we discuss why session tracking (also called user tracking) is important, let's first understand how the Web's fundamental protocol, HTTP (HyperText Transfer Protocol), works. Web browsers use HTTP to communicate with Web servers. HTTP/1.0 behaves as a connectionless protocol: once a browser's request for a Web page is satisfied by the Web server, the connection between the Web server and the browser is closed. Consider what happens when you request a page from Yahoo! by going to http://www.yahoo.com/: the browser opens a connection to the Web server, sends its request, receives the page, and the connection is then closed.
The next time you send a request for another page on Yahoo!'s site, even within a matter of seconds, a new connection is opened and closed as described above. Because a new connection has to be established each time a request is sent to a Web server, the Web server does not know that it is the same user, say Joe Smith, who made the request just a few moments ago. In Web parlance, the Web server doesn't know the state. However, knowledge of the state is very important for some business applications, and this is where session tracking comes in.

Session tracking refers to the mechanism that allows a Web server to track user data, or sessions, as the user goes from one page to the next in a website. For example, it enables a shopping site to remember what you already have in your shopping basket, or to remember what you were buying when you move from a regular server to a secure server that collects your credit card information. To summarize, keeping track of users between the Web pages of an application is called maintaining or tracking state.

This doesn't mean that a session or state has to be kept "open" all the time. Web servers can be configured to expire a user's session after a time period, so that unnecessary session data does not have to be carried around for very long. An expired session usually manifests itself as a message from the Web server that says something like "Your Session Has Expired." Most sites will want you to log in (or sign in) again after the session expires.

How to Track Sessions?

Now that we know that the current version of the HTTP protocol (1.0) is stateless, or connectionless, and that it is often necessary to maintain state, let's explore the three typical ways in which session tracking is accomplished.

1. Cookies

Netscape Navigator stores cookies in a single file, named cookies.txt, in the Users directory. Here's what it looks like:
As the second line in the file snippet above indicates, Netscape's cookie file uses a format that is specified by the cookie_spec.html file at Netscape's site. You can also see that NexTag.com stores my email address so that it can automatically fill in my email when I sign in on the site. Internet Explorer, by contrast, stores each cookie individually in the Temporary Internet Files directory (in the Windows directory) with names such as cookie:vijay@www.website.com. If you want to open these files, feel free to do so, but don't modify them. Your next visit to those sites may not work properly if you make any changes.

A cookie can also be used to store other information about a user, to offer a personalized experience not only within the same session (or visit) but over multiple sessions as well. For example, Amazon.com greets me with a "Hello, Vijay R" each time I go to the site because it stores information in a cookie that identifies who I am. By the way, Amazon.com accomplishes its 1-click feature with cookies as well.

Although the use of cookies is quite common, Web servers cannot always depend on them for session tracking. Some older browsers do not support cookies, and a user can disable cookies in the browser if she feels uncomfortable about their use. Where cookies cannot be used, Web servers can use a mechanism called URL rewriting to track user session information.

2. URL Rewriting
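As a rough illustration of URL rewriting, the sketch below appends session information to a URL as a query parameter and reads it back on the server side. The parameter name `sessionid`, the helper names, and the session value are assumptions made for this example only; real servers choose their own naming and often embed the identifier in the path instead.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

# Hypothetical parameter name, chosen only for this illustration.
SESSION_PARAM = "sessionid"


def rewrite_url(url, session_id):
    """Return `url` with the session ID appended as a query parameter."""
    parts = urlparse(url)
    # Keep existing query parameters, dropping any stale session ID.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != SESSION_PARAM
    ]
    query.append((SESSION_PARAM, session_id))
    return urlunparse(parts._replace(query=urlencode(query)))


def extract_session_id(url):
    """Recover the session ID from a rewritten URL, or None if it is absent."""
    for key, value in parse_qsl(urlparse(url).query):
        if key == SESSION_PARAM:
            return value
    return None


rewritten = rewrite_url("http://myexchange.com/offers", "abc123")
print(rewritten)
print(extract_session_id(rewritten))
```

A server using this approach would have to pass every link it emits through something like `rewrite_url`; a link that is missed arrives without the identifier, and `extract_session_id` returns `None` for it.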
http://myexchange.com/offers

URL rewriting has the advantage of working with browsers that do not accept cookies. The disadvantage is that care must be taken to append the session information to every URL. Missing even one will cause the user to lose her session.

3. Hidden form fields

<INPUT TYPE="HIDDEN" NAME="sessionInfo" VALUE="username">

When a form is submitted, the "sessionInfo" variable and the specified value (username in the above example) are sent to the Web server along with the information from other FORM fields such as text fields, radio buttons, and check boxes. If every page that a user visits contains a hidden field with the session information, the Web server can easily get access to the user's session information as the user moves from page to page. The disadvantage, of course, is that every page needs to have the hidden field with the session information.

A Better Solution to Session Tracking: HTTP/1.1

Folks at the World Wide Web Consortium (W3C) and the IETF have been aware of this problem, and a new version of HTTP has been specified: HTTP/1.1, which, in addition to other enhancements, supports persistent connections. This feature allows the connection between a browser and a Web server to remain open until it times out or is explicitly closed. A persistent connection can make it a lot easier for Web servers to track user sessions, since the same connection between the browser and the Web server can, in principle, be used for the life of the session.

In addition to addressing the session tracking problem discussed so far, a persistent connection also makes the interaction with a Web server more efficient, as there is no overhead involved in opening and closing a connection for each individual request. Consider a scenario where a browser requests a Web page with three unique graphics from a Web server. Under HTTP/1.0, the browser will open a total of four connections: one for each image, and one for the HTML document itself.
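The four-versus-one difference can be observed with a small local experiment. The sketch below starts a throwaway HTTP/1.1 server on the loopback interface, fetches four resources first over a fresh connection each time and then over one persistent connection, and counts how many connections the server accepted. The handler class, the helper functions, and the resource paths are all hypothetical names invented for this illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical resources: one HTML page plus three images.
PATHS = ["/page.html", "/a.gif", "/b.gif", "/c.gif"]


class CountingHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # lets the server keep connections open
    connection_count = 0           # connections accepted so far

    def setup(self):
        # setup() runs once per accepted connection, not once per request.
        super().setup()
        CountingHandler.connection_count += 1

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        # Content-Length tells the client where the response ends,
        # which is what allows the connection to be reused.
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_with_new_connection_each_time(port):
    """HTTP/1.0-style behaviour: open, request, close, repeat."""
    for path in PATHS:
        conn = http.client.HTTPConnection("127.0.0.1", port)
        conn.request("GET", path, headers={"Connection": "close"})
        conn.getresponse().read()
        conn.close()


def fetch_over_one_connection(port):
    """HTTP/1.1 persistent connection: all requests share one connection."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    for path in PATHS:
        conn.request("GET", path)
        conn.getresponse().read()  # read fully before reusing the connection
    conn.close()


def count_connections(fetch):
    """Run `fetch` against a fresh local server; return connections accepted."""
    CountingHandler.connection_count = 0
    server = ThreadingHTTPServer(("127.0.0.1", 0), CountingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        fetch(server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()
    return CountingHandler.connection_count


print(count_connections(fetch_with_new_connection_each_time))
print(count_connections(fetch_over_one_connection))
```

The counter lives in `setup()` rather than `do_GET()` because the point is to count connections, not requests; both runs issue the same four requests.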
Opening and closing each connection to the Web server is fairly time consuming. HTTP/1.1 therefore yields significant performance benefits, as it allows all requests to share a single connection.

So why isn't the Web a better place yet? Because HTTP/1.1 has to be supported by both the browser and the Web server. Although most new browsers and servers support HTTP/1.1, a substantial number of Web browsers and servers still run an older version of HTTP. Until they are upgraded, we won't experience the full benefits of HTTP/1.1.

In this article I have tried to give you a broad overview of session tracking and HTTP/1.1. If you have specific questions, feel free to send me an e-mail at vijay@nextag.com.
© Internet Technical Group
Last update: April 15, 2000
URL: http://www.sandia.gov/itg/newsletter/mar00/workshop_session_management.html
hosted by Sandia National Labs

Disclaimer: Neither Sandia Corporation, the United States Government, nor any agency thereof, nor any of their employees makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately-owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise does not necessarily constitute or imply its endorsement, recommendation, or favoring by Sandia Corporation, the United States Government, or any agency thereof. The views and opinions expressed herein do not necessarily state or reflect those of Sandia Corporation, the United States Government or any agency thereof.