Low ranking accuracy of the example with MovieLens20M? #24
Comments
Hey Saul, One could say similar things of TensorFlow. Google has open sourced its …

Scott

On Fri, May 20, 2016 at 8:17 AM, Saúl notifications@github.com wrote:

> Hi Scott, thanks for your contribution. Please let me clarify: I am not asking for the exact configuration Amazon uses for its systems. I believe it would suffice to provide one that performs well enough on a public dataset such as MovieLens 20M, using, for instance, configurations found in papers such as this one. It is awesome that Amazon releases code like this; I am just kindly requesting a little bit of guidance on how to make the provided example work. Best wishes
So one easy first step would be to add denoising to the example network, no? Second, if you're willing, that's a very cool paper (I saw something like …)

Scott
PS here's how to do that first step...
@saulvargas, is it possible to get the scripts you used to generate your test and train datasets?
Hi @rgeorgej, sure! I can prepare a simple repository with everything required to reproduce the experiment I performed, although it might take me a couple of days... I'll let you know. Cheers
Hey Saul, I'd love to make a benchmark out of this for both training speed and predictive performance. Any progress here?
Hi, I've been working in my spare time on a script and some Java code to fully reproduce the experiment I performed two weeks ago. It is still work in progress, as I have yet to include the steps for DSSTNE, but meanwhile you can take a look here: https://github.com/saulvargas/dsstne-comparison/ Basically, executing run.sh downloads the original MovieLens20M dataset, performs a random 80/20 split into training and test, generates some CF baselines with RankSys, and then evaluates the precision@10 of those baselines. Cheers
Hi @rgeorgej, sorry it took so long, but https://github.com/saulvargas/dsstne-comparison/ now has all the code required to reproduce the experiments I conducted, including training DSSTNE with my 80%/20% split of the MovieLens20M data. I hope it helps. Just execute the steps in run.sh (the ub recommender might take a while to train). Cheers
Thanks @saulvargas for all the help. We will take it from here on our side and get you a decent config with good offline performance for the MovieLens data.
So with a fairly simple autoencoder applied to an 80/20 split of the dataset, I get a precision@10 of 8.75%. That's a far cry from your best efforts, but that's right out of the gate, so let's see where we can take it.
So I was splitting 80/20 on users, while you were splitting on individual movie views. I'll have that data for you by tomorrow. I got P@10 to 9.3% for that partitioning on the second try, though, so I suspect there's a lot of headroom for improvement.
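The two partitioning protocols being compared here hold out different things: one withholds a fraction of individual (user, movie) views, the other withholds all views of a fraction of the users. A minimal sketch of the difference (illustrative Python, not the code either side actually used):

```python
import random

def split_by_interaction(interactions, train_frac=0.8, seed=42):
    """Hold out a random 20% of individual (user, movie) views;
    every user may appear in both train and test."""
    rng = random.Random(seed)
    shuffled = list(interactions)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

def split_by_user(interactions, train_frac=0.8, seed=42):
    """Hold out all views of a random 20% of users;
    test users are never seen during training."""
    rng = random.Random(seed)
    users = sorted({u for u, _ in interactions})
    rng.shuffle(users)
    train_users = set(users[:int(len(users) * train_frac)])
    train = [(u, m) for u, m in interactions if u in train_users]
    test = [(u, m) for u, m in interactions if u not in train_users]
    return train, test
```

Under the user split, a recommender is scored on users it has never observed, which generally depresses precision relative to the interaction split.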
With this fix: 32.7% P@10, 48.4% P@1. I will post to GitHub tonight after work, but here's the first submission, incorporating input denoising and a sparseness penalty in the hidden layer:

{ … }
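The body of the posted config did not survive in this thread. For orientation, here is a sketch of what a DSSTNE config.json with input denoising and a hidden-layer sparseness penalty looks like, following the schema used by the DSSTNE examples; the hidden-layer width and the p/beta values are illustrative guesses, not the settings actually posted:

```json
{
    "Version" : 0.7,
    "Name" : "AE",
    "Kind" : "FeedForward",

    "SparsenessPenalty" : {
        "p" : 0.5,
        "beta" : 2.0
    },

    "Denoising" : {
        "p" : 0.2
    },

    "ScaledMarginalCrossEntropy" : {
        "oneTarget" : 1.0,
        "zeroTarget" : 0.0,
        "oneScale" : 1.0,
        "zeroScale" : 1.0
    },

    "Layers" : [
        { "Kind" : "Input",  "N" : "auto", "DataSet" : "gl_input", "Sparse" : true },
        { "Kind" : "Hidden", "Type" : "FullyConnected", "N" : 128, "Activation" : "Sigmoid", "Sparse" : true },
        { "Kind" : "Output", "Type" : "FullyConnected", "N" : "auto", "DataSet" : "gl_output", "Activation" : "Sigmoid", "Sparse" : true }
    ],

    "ErrorFunction" : "ScaledMarginalCrossEntropy"
}
```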
Round 2: MAP@10 of 41.1% and P@10 of 35.3%. Network supplied below:

{ … }
Hi there, I have a related issue about the MovieLens example and the choice of ErrorFunction. I wonder whether, during the input/output data generation stage (generateNetCDF ...), the timestamps are actually recorded in the gl_input.nc file and used as an implicit measure of user-movie affinity, rather than being replaced by a label '1' indicating whether the user has watched the movie or not. If the former is the case, I think a regression-type error function such as L2 should be used in this autoencoder; 'ScaledMarginalCrossEntropy' is relevant to a classification setting (like the alternative scenario I mentioned above). Thanks,
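The distinction being drawn above is between treating each user-movie entry as a real-valued target (regression) versus a binary watched/not-watched label (classification). In their vanilla forms the two error functions look like this; note that DSSTNE's ScaledMarginalCrossEntropy additionally applies scaling and margin parameters, which are omitted in this sketch:

```python
import math

def l2_loss(pred, target):
    """Squared error: appropriate when the target is a real-valued
    affinity score (regression)."""
    return (pred - target) ** 2

def cross_entropy_loss(pred, target, eps=1e-7):
    """Plain binary cross-entropy: appropriate when the target is a
    0/1 watched/not-watched label (classification)."""
    p = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))
```

With 1's substituted for the timestamps, the targets are binary and the cross-entropy family is the natural fit; if raw timestamps were kept as targets, an L2-style loss would make more sense.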
Hi all, sorry it took me so long to get back to you on this. Unfortunately I have not been able to obtain decent ranking accuracy with either of the two last configurations that @scottlegrand kindly provided; they basically perform as badly as the original one under my evaluation methodology. I suspect we may be applying different evaluation protocols and, therefore, the provided configurations may not be adequate for the one I am interested in. If I find some time, I will try to learn enough about ANNs to understand how to come up with a configuration that results in high ranking accuracy for my setup. I think we can close this issue now. For your reference, I am sharing the data I generated: https://www.dropbox.com/s/krk8mkzynn9igqv/dsstne-comparison-data.zip?dl=0 The code is already here: https://github.com/saulvargas/dsstne-comparison/
Has anyone run this on the 100K dataset and evaluated its accuracy in terms of MSE? Are these benchmarks from DSSTNE publicly available?
Hi,
I've been playing around today with DSSTNE, with the goal of running the MovieLens20M example and comparing the NN in the example against some state-of-the-art CF algorithms that I have implemented here. From my evaluation (which is by no means exhaustive or perfect), the example provided by DSSTNE does not seem to be competitive with state-of-the-art CF algorithms.
To summarise: I downloaded the original MovieLens 20M dataset and performed a random 80%-20% partition. I transformed the training subset into the DSSTNE format, with the only difference that I do not include the timestamps of the dataset but 1's for all movies (is this actually important?). I generated recommendations with my CF algorithms (popularity, user-based and matrix factorisation) and, following the steps in the example, the predictions of DSSTNE. Finally, I evaluated the performance against the test subset using precision at cutoff 10.
These are the results; the configuration provided in your example does not seem to work very well:
pop 0.10974162112149495
ub 0.24097987334078072
mf 0.25135912784469483
dsstne 0.056956854920365056
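The precision@10 numbers above are computed by checking how many of each user's top-10 recommendations appear among that user's held-out test interactions. A minimal sketch of the metric (illustrative only; the actual numbers come from the RankSys evaluation in the linked repository):

```python
from collections import defaultdict

def precision_at_k(recommendations, test_interactions, k=10):
    """Mean precision@k over users with at least one test interaction.

    recommendations: {user: ranked list of movie ids}
    test_interactions: iterable of (user, movie) pairs held out for testing
    """
    relevant = defaultdict(set)
    for user, movie in test_interactions:
        relevant[user].add(movie)
    scores = []
    for user, movies in recommendations.items():
        if user not in relevant:
            continue  # user has no held-out interactions to score against
        hits = sum(1 for m in movies[:k] if m in relevant[user])
        scores.append(hits / k)
    return sum(scores) / len(scores) if scores else 0.0
```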
I am no expert in ANNs, so I cannot easily figure out how I should modify the parameters in the config.json provided in the example to make it work better. Have you compared the performance of the example with similar CF algorithms? If so, could you please share some results/insights?
Cheers
Saúl