Recommender Systems for Social Bookmarking
Ph.D. Thesis

Ph.D. Thesis, Tilburg University, 2009
ISBN 978-90-8559-582-3

Recommender systems belong to a class of personalized information filtering technologies that aim to identify which items in a collection might be of interest to a particular user. Recommendations can be made using a variety of information sources related to both the user and the items: past user preferences, demographic information, item popularity, the metadata characteristics of the items, etc. Social bookmarking websites, with their emphasis on open collaborative information access, offer an ideal scenario for the application of recommender systems technology. They allow users to manage their favorite bookmarks online through a web interface and, in many cases, to tag the content they have added to the system with keywords. The underlying application then makes all information sharable among users. Examples of social bookmarking services include Delicious, Diigo, Furl, CiteULike, and BibSonomy.
In my Ph.D. thesis I describe the work I have done on item recommendation for social bookmarking, i.e., recommending interesting bookmarks to users based on the content they bookmarked in the past. In my experiments I distinguish between two types of information sources. The first is the usage data contained in the folksonomy, which represents the past selections and transactions of all users, i.e., who added which items, and with what tags. The second is the metadata describing the bookmarks or articles on a social bookmarking website, such as title, description, authorship, tags, and temporal and publication-related metadata. I compare and combine the content-based approach with the more common usage-based approaches. I evaluate my approaches on four data sets constructed from three different social bookmarking websites: BibSonomy, CiteULike, and Delicious. In addition, I investigate several methods for combining different recommendation algorithms and show which of those methods can successfully improve recommendation performance.
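To make the usage-based side concrete, here is a minimal sketch of user-based neighbourhood recommendation over folksonomy triples (who added which item, with what tag). It is only an illustration under stated assumptions, not the implementation evaluated in the thesis: the toy data, the function names (build_profiles, cosine, recommend), and the choice of cosine similarity over binary item profiles are all assumptions made for this example.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy folksonomy: (user, item, tag) triples.
posts = [
    ("alice", "paper_a", "recsys"),
    ("alice", "paper_b", "tagging"),
    ("bob", "paper_a", "recsys"),
    ("bob", "paper_c", "ir"),
    ("carol", "paper_b", "folksonomy"),
    ("carol", "paper_c", "ir"),
    ("carol", "paper_d", "spam"),
]


def build_profiles(posts):
    """Collapse the triples into one set of items per user (tags ignored here)."""
    profiles = defaultdict(set)
    for user, item, _tag in posts:
        profiles[user].add(item)
    return dict(profiles)


def cosine(a, b):
    """Cosine similarity between two binary item profiles given as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(user, profiles, k=2, n=3):
    """Rank items the user has not added yet by the summed similarity
    of the k most similar users who did add them."""
    own = profiles.get(user, set())
    neighbours = sorted(
        ((cosine(own, items), other)
         for other, items in profiles.items() if other != user),
        reverse=True,
    )[:k]
    scores = defaultdict(float)
    for sim, other in neighbours:
        for item in profiles[other] - own:
            scores[item] += sim
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _score in ranked[:n]]


if __name__ == "__main__":
    print(recommend("alice", build_profiles(posts)))
```

The tags are deliberately ignored in this sketch. A tag-aware or metadata-based variant would replace the item sets with tag or term vectors while keeping the same neighbourhood structure.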
Finally, I consider two growing pains that accompany the maturation of social bookmarking websites: spam and duplicate content. I examine how widespread each of these problems is for social bookmarking and how to develop effective automatic methods for detecting such unwanted content. I also investigate the influence spam and duplicate content can have on item recommendation.
In the news
In collaboration with CiteULike, I have applied my work to their website to provide automated article recommendations to their users. Our preliminary experiments have generated a couple of mentions in the blogosphere.
After my defense date I plan to make several resources available here. They will be made available in several phases and will include:
  • The data sets I used in my experiments
  • The algorithms I used in my experiments, bundled together in a Python package