One thing that I have been thinking a lot about, since writing my chapter on matrix factorization methods and now while writing a chapter on graph theory, is the idea of a recommendation engine. (If you want to get the book free as I write it, click here.) Recommendation engines have become super relevant, and they are probably one of the most important pieces of technology in retail marketing.

Recommendation engines keep you glued to your streaming service, clicking on articles from your favorite news outlet, and buying from your favorite retailers. They have taken the place of sales associates in the online space; they are literally everywhere you go online. The problem is that the level of sophistication varies greatly. I have a recommendation engine on this blog myself, and it does not use a fancy algorithm at all. In fact, all it does is prioritize content whose tags are similar to the ones I manually add to each article. Nothing really sophisticated here, mainly because I don’t want to spend valuable time maintaining it.

So as I was thinking about graph theory, my mind wandered over to the connections between matrix factorization and the basic idea of recommendation engines. It turns out that you can think of a recommendation engine as learning embeddings on a bipartite graph. I won’t go into the gory math details here. At any rate, I got to thinking about how good deep learning algorithms have become at learning embeddings, and I realized that I could get a deep learning model to generate recommendations for a given user.

To be clear, I am not going to productionize this model. I don’t think that I have enough data from my logs to train such a model effectively, and again, I don’t want to maintain a model.

The Data

Obviously, to train a deep learning model, we need some data. In general, we only need one type of data to do this: an edge list from a bipartite network, where we have two node types, a “user” type and a “product” type. The scare quotes indicate that there is a lot of wiggle room in what we consider a user and a product. In general, a user is whoever we are making a recommendation to, and a product is whatever we are making a recommendation about.

Strictly speaking, we only need those two columns, because we can be tricky about the last one: if we only have positive cases in our data, we can generate negative cases synthetically. To train a machine learning model, though, we need something like a rating for each user and product pair. Fortunately for us, the data that we will use has a rating.
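As an aside, here is a minimal sketch of that synthetic-negative trick, under the assumption that all you have is a table of positive user/product interactions (the column names here are purely illustrative):

import numpy as np
import pandas as pd

def sample_negatives(positives, n_samples, seed=0):
    """Randomly pair users with items, keeping only pairs that never
    appear in the positive data and labeling them -1."""
    rng = np.random.default_rng(seed)
    seen = set(zip(positives.userId, positives.itemId))
    users = positives.userId.unique()
    items = positives.itemId.unique()
    rows = []
    while len(rows) < n_samples:
        pair = (rng.choice(users), rng.choice(items))
        if pair not in seen:
            rows.append({'userId': pair[0], 'itemId': pair[1], 'rating': -1})
    return pd.DataFrame(rows)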

We are going to be using the MovieLens small dataset. You can download it from here. It has the following format:


   userId  movieId  rating  timestamp
0       1        1     4.0  964982703
1       1        3     4.0  964981247
2       1        6     4.0  964982224
3       1       47     5.0  964983815
4       1       50     5.0  964982931
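
For reference, here is roughly how I pull the ratings into a DataFrame; the path assumes you unzipped the small dataset into the working directory:

import pandas as pd

# ratings.csv ships inside the ml-latest-small archive
df = pd.read_csv('ml-latest-small/ratings.csv')
print(df.head())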

There isn’t a whole bunch of feature engineering that we need to do here. I do want to change my ratings around just a little bit, collapsing the 0.5 to 5.0 star scale down to -1 (bad), 0 (neutral), and 1 (good). I can do that with the following code:

# collapse the star ratings to -1 (bad), 0 (neutral), and 1 (good)
df['rating'] = df['rating'].map({0.5: -1, 1: -1, 1.5: -1, 2: -1, 2.5: -1, 3: 0, 3.5: 1, 4: 1, 4.5: 1, 5: 1})

Other than that, our data is pretty clean, so we’ll leave it at that.

The Model

So at this point, what you are probably most interested in is the model for handing out recommendations. The best thing to do in this case is to take a play out of the collaborative filtering playbook and come up with some sort of matrix factorization method.

Probably the simplest thing to do is to create a latent space for users and a latent space for movies, and then take the dot product of vectors from those two spaces. That dot product serves as a predicted rating for the user/movie pair. With that in place, all we need to do is backpropagate the error between the prediction and the true rating the user gave.
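In symbols, this is the classic matrix factorization objective: give user $i$ a latent vector $u_i$, give movie $j$ a latent vector $v_j$, predict $\hat{r}_{ij} = u_i \cdot v_j$, and minimize

$$\sum_{(i,j)} \left( r_{ij} - u_i \cdot v_j \right)^2$$

over the observed ratings $r_{ij}$.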

from keras.layers import Input, Embedding, Flatten, Dot
from keras.models import Model

# user tower: embed each userId into a 5-dimensional latent space
user_input = Input(shape=[1], name="User-Input")
user_embedding = Embedding(len(df.userId.unique())+1, 5, name="User-Embedding")(user_input)
user_vec = Flatten(name="Flatten-users")(user_embedding)

# movie tower: movieIds are not contiguous, so size the embedding by the largest id
movie_input = Input(shape=[1], name="Movie-Input")
movie_embedding = Embedding(df.movieId.max()+1, 5, name="Movie-Embedding")(movie_input)
movie_vec = Flatten(name="Flatten-movies")(movie_embedding)

# predicted rating = dot product of the two latent vectors
prod = Dot(name="Dot-Product", axes=1)([user_vec, movie_vec])
model = Model([user_input, movie_input], prod)
model.compile('adam', 'mean_squared_error')

That’s the entire code for our model. Let’s go through it in three chunks. The first chunk takes a user id as input, embeds the user into a 5-dimensional space, and then flattens the result so that we have a vector.

The next chunk is very similar, in that we take a movie id as input and spit out a 5-dimensional vector for the movie.

The last chunk takes the dot product of these two vectors and produces a single number. We then define the model by saying that we want to take the two inputs and output the dot product of their latent embeddings. The last line tells Keras how to do the backpropagation: minimize the mean squared error using the Adam optimizer.

That’s it. It really is just collaborative filtering. I don’t think that we have done any real deep learning yet. All of this could have been defined and done without a deep learning library like Keras, but Keras sure made it convenient.
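One thing the code above doesn’t show is the actual training call. Here is roughly what mine looked like; the three epochs match what I report below, and everything else is just the Keras defaults:

# the model takes the user ids and movie ids as two parallel inputs
model.fit([df.userId, df.movieId], df.rating, epochs=3, verbose=1)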

Getting Predictions

There are a few ways that you can approach the prediction problem. You can do everything at run-time and serve up predictions dynamically, on the fly. There is something to be said for that.

In my experience, however, I just don’t have the horsepower that would require. What you can do instead is pre-compute the recommendations and store them in a database somewhere. That is really easy to do here, since all you have to feed the model is a user id and a movie id. You can then store the scores and ids in a database table and retrieve them the next time your user logs in.
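For example, here is a minimal sketch of pre-computing scores for a single user against every movie; the top-10 cutoff at the end is an arbitrary choice of mine:

import numpy as np

movie_ids = df.movieId.unique()
user_ids = np.full(len(movie_ids), 610)  # pair user 610 with every movie

scores = model.predict([user_ids, movie_ids]).ravel()
top_movies = movie_ids[np.argsort(scores)[::-1][:10]]  # ten highest-scoring movie ids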

So how do you actually get the scores? Here is an example.

import numpy as np

# score the pair (user 610, movie 1)
model.predict([np.array([610]), np.array([1])])

This will give us a recommendation score for user 610 and movie 1. Assuming that you have the movie titles in a database, you can look these ids up to make sense of them. I won’t do that here.

Here is the output you get:

array([[1.0163633]], dtype=float32)

It looks like this movie should very much be recommended to this user. I only trained for 3 iterations, because time. I would suspect that after more iterations you wouldn’t see values above 1 like this in the recommendations, since our recommendation scores are supposed to run between -1 and 1.

Let’s go deeper

So when I say let’s go deeper, I mean let’s take advantage of backpropagation and add layers to our architecture. Here, I just add two layers to each of the “towers” in my model. Essentially, I create a deep neural network on top of each of my embedding vectors and then use the same dot-product trick. You can also add layers to the bottom of the network if you want to; I’ll leave that to you if you are interested. The idea here is that you can mess around with the architecture all day and night to try to squeeze out all the performance that you can from this algorithm.

So here is the code for my new model:

from keras.layers import Input, Embedding, Flatten, Dense, Dot
from keras.models import Model

# user tower: embedding followed by two Dense layers
# (Dense is linear by default; pass activation='relu' for real non-linearity)
user_input = Input(shape=[1], name="User-Input")
user_embedding = Embedding(len(df.userId.unique())+1, 5, name="User-Embedding")(user_input)
user_vec = Flatten(name="Flatten-users")(user_embedding)
user1 = Dense(100, name='uDense1')(user_vec)
user2 = Dense(100, name='uDense2')(user1)

# movie tower: again sized by the largest movieId since the ids are not contiguous
movie_input = Input(shape=[1], name="Movie-Input")
movie_embedding = Embedding(df.movieId.max()+1, 5, name="Movie-Embedding")(movie_input)
movie_vec = Flatten(name="Flatten-movies")(movie_embedding)
movie1 = Dense(100, name='mDense1')(movie_vec)
movie2 = Dense(100, name='mDense2')(movie1)

prod = Dot(name="Dot-Product", axes=1)([user2, movie2])
model = Model([user_input, movie_input], prod)
model.compile('adam', 'mean_squared_error')

# a side model that exposes the raw 5-d movie embeddings (used for clustering later)
movie_model = Model(movie_input, movie_vec)

I ran this model for 3 iterations like the other one, and I got a much lower error; however, the results weren’t that much more impressive. Qualitatively, it appears that I got pretty much the same predictions, perhaps a bit more precise, but still much the same. It also took way longer to run, so take that into account.

Anyway, there you have it. I don’t have time right now to talk about one last point, which is using this algorithm to cluster individuals and movies together. This is important because clustering them gives you the basis for a search algorithm. There are also neat tricks for hashing the vectors into a continuous space, etc. Perhaps that is a good topic for another blog post.
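If you want a head start, here is a minimal sketch of the movie half of that clustering idea, using the movie_model defined above plus scikit-learn; the 20 clusters are an arbitrary choice:

import numpy as np
from sklearn.cluster import KMeans

movie_ids = df.movieId.unique()
movie_vectors = movie_model.predict(movie_ids.reshape(-1, 1))  # one 5-d embedding per movie
clusters = KMeans(n_clusters=20, random_state=0).fit_predict(movie_vectors)
# movies in the same cluster sit close together in the learned latent space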
