User Session Identification Using Reference Length

Jozef Kapusta, Michal Munk, Martin Drlík


One of the tasks of web log mining is discovering behaviour patterns of web site visitors. Based on the discovered patterns, which are represented by sequence rules, an organization can modify and improve its web site. The data for the analysis are obtained from the web server log file. Because these data are anonymous, uniquely identifying a web site visitor is a problem. The paper deals with less commonly used navigation-driven methods of user session identification. These methods assume that the user passes through several navigation pages during her/his visit until she/he finds the content page with the required information. The content page is a page on which the user spends considerably more time than on navigation pages, and it is considered to be the end of the session. The search for the next content page via navigation pages constitutes a new user session. Pages are divided into content and navigation pages by calculating a cut-off time C. It is also essential to verify that the variable representing the time a user spends on a particular page follows an exponential distribution. We prepared an experiment with data obtained from the log file of a university web server. We tested whether the time spent on web pages is exponentially distributed, and we estimated the value of the cut-off time. The results confirm our assumption that navigation-driven methods can be used for proper user session identification.
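As a minimal sketch of the idea, the cut-off time can be computed with the formula commonly given for the reference length method, C = -ln(1 - γ) / λ. This assumes that page view times are exponentially distributed with rate λ, estimated here as the reciprocal of the mean observed time. The parameter γ (the assumed proportion of navigation pages), the function names, and the input layout are illustrative assumptions, not taken from the paper.

```python
import math


def cutoff_time(times, gamma):
    """Estimate the cut-off time C under an assumed exponential distribution.

    times: observed times spent on pages (e.g. in seconds), all positive.
    gamma: assumed proportion of navigation pages, strictly between 0 and 1.
    """
    if not times:
        raise ValueError("times must not be empty")
    if not 0 < gamma < 1:
        raise ValueError("gamma must lie strictly between 0 and 1")
    # Rate estimate for the exponential distribution: 1 / mean time.
    lam = len(times) / sum(times)
    # C is the gamma-quantile of the exponential distribution.
    return -math.log(1 - gamma) / lam


def split_sessions(views, cutoff):
    """Split one visitor's ordered page views into sessions.

    views: list of (page, seconds) pairs in chronological order.
    A page viewed longer than the cut-off is treated as a content page
    and closes the current session.
    """
    sessions, current = [], []
    for page, seconds in views:
        current.append(page)
        if seconds > cutoff:
            sessions.append(current)
            current = []
    if current:
        # Trailing navigation pages with no content page after them.
        sessions.append(current)
    return sessions
```

In this sketch, every page viewed longer than C closes a session, and any trailing navigation pages form a final, unfinished session. How that remainder and the unknown viewing time of a visitor's last page should be handled is left open here.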


