School of Business publications portal
Aalto University School of Business Master's Theses are now in the Aaltodoc publication archive (Aalto University institutional repository)
Thesis number: 13524
Predicting website audience demographics based on browsing history
Author: Ivanova, Eleonora
Title: Predicting website audience demographics based on browsing history
Year: 2013  Language: eng
Department: Department of Information and Service Economy
Academic subject: Quantitative Methods of Economics
Index terms: economic science; media; internet; consumer behaviour; evaluation; ratings; statistical science
Pages: 126
Full text:
hse_ethesis_13524.pdf (PDF, 6 MB; 5608679 bytes)
Key terms: demographic prediction; demographic targeting; browsing behavior; clickstream analysis; web user profiling; web analytics; classification; logistic regression; web cookies
Objectives of the study:

The objective of the study was to explore whether demographics can be predicted from the browsing behavior of web users. To this end, the prediction of online audience demographics was addressed from three perspectives. First, the study examined the quality of input data for predictive models and its impact on the accuracy of predictions. Second, it analyzed how the demographics of web users influence their online behavior. Finally, it focused on identifying factors useful for prediction.

Academic background and methodology:

The scientific literature records several previous attempts to predict online audience demographics, and some studies examine demographic differences in online behavior. However, the quality of input data for predictive models is almost entirely ignored. Two theoretical frameworks for the study were formed on the basis of the literature review. The other research method used in this study is statistical analysis, including t-tests, z-tests, ANOVA, linear regression and logistic regression models.
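
The abstract names z-tests among the statistical methods but gives no detail on how they were applied. As a minimal illustrative sketch, and not the procedure used in the thesis, a two-proportion z-test could compare the ad click rates of two demographic groups. The function name `two_proportion_z_test` and all counts below are hypothetical and invented for illustration only.

```python
import math


def two_proportion_z_test(clicks_a, n_a, clicks_b, n_b):
    """Two-sided z-test for the difference between two proportions."""
    p_a = clicks_a / n_a
    p_b = clicks_b / n_b
    # Pooled proportion under the null hypothesis of equal click rates.
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts (not taken from the thesis): ad clicks and
# page views for two demographic groups, A and B.
z, p = two_proportion_z_test(clicks_a=120, n_a=10_000, clicks_b=90, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value would suggest that the two groups differ in click rate, which is the kind of behavioral difference a predictive model could then try to exploit.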

Findings and conclusions:

The study showed that several factors greatly degrade the quality of input data for models predicting online audience demographics. This reduces the accuracy of predictions in several ways, such as smaller datasets, overestimation of the size of some demographic groups and incorrect models. The study also indicated that demographic groups differ in their online behavior, including preferred website content, website visiting patterns over time and likelihood of clicking online ads. Information on these aspects of online behavior can thus be used to predict the demographics of web users.
Electronic publications are subject to copyright. The publications can be read freely and printed for personal use. Use for commercial purposes is forbidden.