User Session Identification Using Reference Length

Jozef Kapusta, Michal Munk, Martin Drlík

Abstract


Discovering behaviour patterns of website visitors is one of the tasks of web log mining. The patterns found, represented by sequence rules, can be used to modify and improve an organization's website. The data for the analysis come from the web server log file. Because these data are anonymous, they raise the problem of uniquely identifying the website visitor. This paper deals with the less commonly used navigation-driven methods of user session identification. These methods assume that a user passes through several navigation pages during a visit until she/he finds the content page with the required information. A content page is a page on which the user spends considerably more time than on navigation pages, and it is considered to be the end of the session. The search for the next content page through navigation pages constitutes a new user session. Pages are divided into content and navigation pages by calculating a cut-off time C. This calculation requires verifying that the variable representing the time a user spends on a particular page follows an exponential distribution. We prepared an experiment with data from the log file of a university web server. We tested whether the time spent on web pages follows an exponential distribution, and we estimated the value of the cut-off time. The results support our assumption that navigation-driven methods can be used for proper user session identification.
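
The idea of splitting sessions at content pages can be illustrated with a minimal sketch. The abstract does not state how C is computed, so the sketch assumes the form commonly given for the reference length method: if page-view times are exponentially distributed with rate λ and a share γ of all page views are navigation pages, then C = -ln(1 - γ) / λ, with λ estimated as the reciprocal of the mean time. The function names, the sample visits, and the value γ = 0.6 are hypothetical and serve only as illustration.

```python
import math

def cutoff_time(times, nav_share):
    """Cut-off C under an assumed exponential model: C = -ln(1 - nav_share) / lam."""
    lam = len(times) / sum(times)  # rate estimate: reciprocal of the mean time
    return -math.log(1.0 - nav_share) / lam

def split_sessions(visits, cutoff):
    """Split one visitor's time-ordered (page, seconds) records into sessions.

    A page viewed longer than the cut-off is treated as a content page
    and closes the current session.
    """
    sessions, current = [], []
    for page, seconds in visits:
        current.append(page)
        if seconds > cutoff:
            sessions.append(current)
            current = []
    if current:  # trailing navigation pages with no content page
        sessions.append(current)
    return sessions

# Hypothetical page views of a single visitor (page, seconds spent).
visits = [("/", 5), ("/study", 8), ("/study/programmes", 120),
          ("/", 4), ("/contact", 90)]

# nav_share = 0.6 is an assumed proportion of navigation pages.
C = cutoff_time([s for _, s in visits], nav_share=0.6)
sessions = split_sessions(visits, C)
print(round(C, 1), sessions)
```

With these sample values the cut-off falls between the short and the long page views, so the visit is split into two sessions, each ending at a page the visitor stayed on for a long time. In practice the exponential assumption would first have to be checked on the real log data, as the abstract points out.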
