A Practical Series on Recommendation Systems

Today, I am announcing a series of posts I am developing about recommendation systems. The series is aimed at software/machine learning engineers. I have two goals for the series:

  • Provide practical and implementable strategies for delivering recommendations in real time
  • Present the mathematical intuition behind recommender problems

The reason for the first goal is that I believe the existing guides to recommendation systems are either (a) too academic to be implementable or (b) too simple to be useful. We’ll use real data and real code to demonstrate recommendation techniques, and there’ll be a lot of cleaning and preprocessing to do to make these systems work. The data set we use will be big enough that we need to be clever but small enough that we can run the examples locally. And as in real life, the question of whether the recommendations are good or bad won’t have an immediately obvious answer.

As to the second goal, I believe it’s the developer’s level of intuition for the mathematics of recommender algorithms that separates good recommenders from great ones. You don’t need much math beyond basic linear algebra to become great at building recommendation systems. But you will need an intuition for how and why recommender systems work and, as we’ll see, linear algebra is almost always the answer. We’ll always develop the mathematics in service of our first goal: to build very practical and useful recommendation systems.
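To give a first taste of what "linear algebra is the answer" can look like, here is a minimal sketch of a co-occurrence recommender on a toy matrix. Everything in it is an assumption made for illustration: the four book titles, the made-up reading history, and the helper names transpose, matmul, and recommend. It uses plain Python lists so that it depends on nothing beyond the standard library, and it is not the implementation the series will build.

```python
# Toy interaction matrix: rows are users, columns are books.
# A[u][b] == 1 means user u has read book b. The data is made up.
books = ["Dune", "Emma", "Hyperion", "Persuasion"]
A = [
    [1, 0, 1, 0],  # user 0 read Dune and Hyperion
    [1, 0, 1, 0],  # user 1 read Dune and Hyperion
    [0, 1, 0, 1],  # user 2 read Emma and Persuasion
    [1, 1, 0, 0],  # user 3 read Dune and Emma
]


def transpose(m):
    """Swap the rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Multiply two list-of-lists matrices."""
    y_cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]


# Co-occurrence matrix C = A^T A.
# C[i][j] counts the users who have read both book i and book j.
C = matmul(transpose(A), A)


def recommend(user, k=1):
    """Return up to k unread books, ranked by co-occurrence with the user's reads."""
    # The user's row times C sums the co-occurrence rows of the books they read.
    scores = matmul([A[user]], C)[0]
    unread = [b for b in range(len(books)) if A[user][b] == 0]
    ranked = sorted(unread, key=lambda b: scores[b], reverse=True)
    return [books[b] for b in ranked[:k]]


print(recommend(3))  # user 3 has read Dune and Emma
```

The whole recommender is two matrix products: one to count which books are read together, and one to score unread books against a user's history. A real data set would need sparse matrices and some form of normalization, which this sketch deliberately leaves out.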

The data we’ll use

Throughout the series, we’ll use the Goodreads data made available by UCSD. This data includes:

  • Reading, rating, and review data from over 800k users and over 2M books
  • Metadata for each book

Our goal using this data is to build a system that can suggest to users what to read next.

The specific datasets we’ll need from the UCSD site are:

Required Software

We’ll work in Python throughout the series. Our examples will use Python 3.8, but we’ll try to keep them compatible with all Python 3 versions. In addition to the standard libraries, we’ll make extensive use of:

without much explanation.

We’ll present simple code examples in-line in the posts to illustrate the computations. For more complete implementations, we’ll link out to the companion GitHub project for the series.

Contents (Work in Progress)

  1. Representing user behaviors and preferences
    1. Use matrix algebra to recommend books
    2. A co-occurrence recommender for Goodreads data
    3. A probability model
    4. Markov chain model recap
  2. Collaborative filtering and factorization techniques
  3. TBD
  4. TBD