yvanzo: for some reason the setting to change my VM from 10 to 16gb didn't take, hardware acceleration was off and video memory was at 4mb. I've tweaked the settings back to something more reasonable so I'm going to give it another go.
alastairp: that day when you were talking about batches of 100, were you suggesting to input data of 100 users here and get the recs of all 100 together?
[listenbrainz-server] paramsingh merged pull request #1063 (master…pydantic-model-dataframes): Pydantic model for data returned from spark (create_dataframes.py) https://github.com/metabrainz/listenbrainz-serv...
alastairp
pristine___: it was a hypothesis based on the number of joins you were doing
yes, get the recs of all 100, and so when you do the join to go from index -> recording/artist ids (I think that's what it was doing?) you're only doing 1 join per 100 users, not 100 joins
alastairp: I have collected recs of all the users and performed a single join. I made this improvement in the new PR (#1073). The join here is not the only bottleneck. The script is slow because we were generating recs for all users one at a time, i.e. we were calling the func I linked above for every user. So this morning I realised: what if we call that func (generate_rec) once for 100/1000 users? It would
drastically reduce the run time. Thanks for the hypothesis.
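(For illustration only: the batching idea pristine___ describes looks roughly like this in plain Python. The real job runs on Spark; `generate_rec`, the batch size, and the join helper below are hypothetical stand-ins, not the actual listenbrainz code.)

```python
# Sketch of the batching optimisation: instead of calling the
# recommender once per user (paying the per-call overhead every time),
# call it once per batch of users, then do a single join per batch.
# `generate_rec` is a stand-in for the real recommender call.

BATCH_SIZE = 100  # hypothetical; chosen to keep the later join small

def generate_rec(user_ids):
    """Pretend recommender: returns (user_id, recording_id) pairs."""
    return [(u, u * 10 + k) for u in user_ids for k in range(2)]

def join_with_mbids(recs, id_to_mbid):
    """One join per batch: map recording ids to their MBIDs."""
    return [(u, id_to_mbid[rid]) for u, rid in recs if rid in id_to_mbid]

def recommend_all(user_ids, id_to_mbid):
    results = []
    # len(user_ids) / BATCH_SIZE recommender calls instead of len(user_ids)
    for i in range(0, len(user_ids), BATCH_SIZE):
        batch = user_ids[i:i + BATCH_SIZE]
        recs = generate_rec(batch)
        results.extend(join_with_mbids(recs, id_to_mbid))
    return results
```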
BrainzGit
[listenbrainz-server] paramsingh opened pull request #1074 (master…param/move-python-tests-to-jenkins): [wip] move python tests over to jenkins from travis https://github.com/metabrainz/listenbrainz-serv...
i have today off and no plans, so am working on MeB stuff
well, technically working on MeB stuff is a plan
ruaok
lol
I'm reading your doc and we both thought of completely different stuff.
in a really good way, from what I can see.
d4rkie joined the channel
iliekcomputers
it's why i wrote it down!
to make sure we were in sync
ruaok
<3
Nyanko-sensei has quit
let me throw my thoughts on the bottom of the paper -- it almost feels like sullying your clean plan, lol.
iliekcomputers
sure
pristine___
iliekcomputers: will have a look. Thanks
Gazooo794 has quit
Gazooo794 joined the channel
alastairp
pristine___: oh, interesting. I'm not sure that I understand - do you mean that to predict items for 100 users is approximately as fast as predicting items for 1 user?
ruaok
that would not surprise me given spark, really.
iliekcomputers
me either
ruaok
iliekcomputers: do you use google photos?
alastairp
it makes a certain amount of sense. if that's the case, it's the same kind of speedup we got in AB highlevel too - the majority of the time was spent loading the models and getting them into a format in memory that is useful. actually passing data through the model to get an output is the fast part
iliekcomputers
yes
ruaok
alastairp: that
alastairp
sweet, looking forward to a 100x speedup overnight, then
ruaok
iliekcomputers: I love the scrollbar in the main photos timeline. it gives a good overview of your photos timeline.
imagine if FB had that.
alastairp
FB? 😱
Rotab
FB 😍
pristine___
alastairp: yup. Runtime for 100 (or even 200) users ~ runtime for one user. I am limiting it to 100 users because I need to perform a join afterwards. The join can lead to OOM because it will be a cross join. If the join weren't there, I would have generated recs for all (10k) users at once. :p
alastairp
pristine___: great. what exactly is the join for?
I'm not sure exactly how this storage part of spark works, but can you generate all at once, then generate a new data table/datastore/RDD/whatever for a smaller number of users and join against that?
pristine___
So we get recording ids from the recommender. The join is to get the corresponding recording mbids.
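(A side note on that join, as an illustration: mapping recording ids to MBIDs is an equi-join on recording_id, which behaves like a hash join rather than a cross join. The sketch below is hypothetical plain Python, not the actual Spark job; the function and column names are placeholders.)

```python
# Hypothetical sketch of the recs -> MBID join described above.
# An equi-join on recording_id can be done as a hash join: build a
# lookup table from the (recording_id, mbid) side once, then probe it
# once per recommendation row. Cost is O(|recs| + |mapping|), not the
# O(|recs| * |mapping|) a true cross join would pay.

def join_recs_with_mbids(recs, recording_mbids):
    """recs: iterable of (user_id, recording_id, score) rows.
    recording_mbids: iterable of (recording_id, mbid) rows."""
    lookup = dict(recording_mbids)           # build side, scanned once
    return [
        (user_id, lookup[rec_id], score)     # probe side, one pass
        for user_id, rec_id, score in recs
        if rec_id in lookup                  # drop ids with no known MBID
    ]
```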
ruaok
iliekcomputers: dumped my thoughts. we can discuss now while things are fresh or we can let them simmer for a bit and discuss later....
iliekcomputers
let me give it a read
alastairp
pristine___: how many rows in the final recommendations?
I'm surprised that postgres can join a billion rows without breaking a sweat but spark runs out of memory
ruaok
that reminds me, I want to try and read MLHD into timescale....
alastairp
oh nice
pristine___: anyway, I'd seriously look into doing the predictions on a lot of users, but the join on a smaller subset
iliekcomputers
ruaok: read through it, left a few comments, mostly agreements, i think we're on the same page in terms of what we want to do
in the long term
ruaok
I expected as much reading your stuff. let me read the comments.
pristine___
alastairp: Given your postgres remark, I was thinking of testing the join for all users. I have hit OOMs in the past, but maybe they were not because of the size of the dataset. It will be a nice thing to test on the cluster
iliekcomputers
not sure if you have any comments on the features that I'm proposing to build specifically, maybe more comments will come up when i have a more technical design ;)
alastairp
pristine___: yeah, right. this is definitely something that we should try, and only do it differently if it doesn't work
pristine___
Rec for all and join for all. I am excited to see how much the runtime will be reduced.
alastairp: on a side note, I was just going through pyspark github. It is fairly easy to get all time recs for users. And many other things.
I started looking into the spark stuff about a month ago but got distracted this month, I'll start looking at it again
ruaok
iliekcomputers: I have two key things to discuss about your proposal. 1) recommending a track without giving any comment on the recommendation feels like it's missing something, somehow incomplete. 2) the interactions with BrainzPlayer need improvements for a better user experience
iliekcomputers
hmm.
let's chat about 1
ruaok
k.
iliekcomputers
let's chat about 1 first, i mean
ruaok
I fully understand that adding a comment option opens Pandora's box.
iliekcomputers
i'm not totally against it. i was trying to keep things simple. if we do go down that road, i assume the comment would ideally live in CB
reosarevok
should the user recommending give the comment, or the one who gets recommended?
(or both)
ruaok
hmmm. CB.
iliekcomputers
the user recommending the track, i would say
ruaok
another can of worms, really.
iliekcomputers
i would be against allowing threads for now
so the users who see the recommendation wouldn't be able to comment
ruaok
yes, no threads or comments. #dontreadthecomments
reosarevok
haha
ruaok
counter-recommend, not comment.
reosarevok
One thing that would be cool is to be told whether the user you recommended it to loved/hated it
(you have buttons for that, right?)
ruaok
that I see useful.
reosarevok
So, you get *some* reaction, but without the comments
ruaok
up/down voting.
reosarevok
If you want to actually talk about it, send the user a message
iliekcomputers
again, a good idea, not sure if we want it in the initial version.
reosarevok
Via MB or whatever (that'd be easier once we had MeB accounts not mainly living in MB :p)
ruaok
+1 to both iliekcomputers and reosarevok
I think limiting the features in the first run will give us a better idea of how to continue in the long run
reosarevok
Yeah
iliekcomputers
this project would essentially set up the base, allowing people to add friends
once that's done, there's a whole host of things we could do
reosarevok
IMO as a start, just allowing people to send recommendations, and nothing else, is enough
ruaok
and recommend a recording, yes?
iliekcomputers
yeah.
reosarevok
I mean, that's what we sometimes do here, with zas dropping a bandcamp link or whatever
ruaok
great. I think that is a very good first step.
reosarevok: THAT
this gives zas a vehicle to do this. and I want to read his feed.
then lets move to point 2.
iliekcomputers
wait
reosarevok
Later you can decide whether you want a "timeline" with "your friend X got a BB badge!!"