yvanzo: for some reason the setting to change my VM from 10 to 16GB didn't take, hardware acceleration was off and video memory was at 4MB. I've tweaked the settings back to something more reasonable, so I'm going to give it another go.
alastairp: that day when you were talking about batches of 100, were you suggesting to input data of 100 users in here and get the recs of 100 altogether?
[listenbrainz-server] paramsingh merged pull request #1063 (master…pydantic-model-dataframes): Pydantic model for data returned from spark (create_dataframes.py) https://github.com/metabrainz/listenbrainz-server…
2020-09-04 24842, 2020
alastairp
pristine___: it was a hypothesis based on the number of joins you were doing
2020-09-04 24816, 2020
alastairp
yes, get the recs of all 100, and so when you do the join to go from index -> recording/artist ids (I think that's what it was doing?) you're only doing 1 join per 100 users, not 100 joins
alastairp: I have collected recs of all the users and performed a single join. I made this improvement in the new PR (#1073). The join here is not the only bottleneck. The script is slow because we were generating recs for all users one at a time, i.e. we were calling the func I linked above for every user. So this morning I realised: what if we call that func (generate_rec) once for 100/1000 users? It will
2020-09-04 24819, 2020
pristine___
drastically reduce the run time. Thanks for the hypothesis.
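A rough PySpark sketch of that batching idea (illustrative only: the model path, column name, and use of the pyspark.ml ALS API are assumptions, not the actual ListenBrainz code):

from pyspark.sql import SparkSession
from pyspark.ml.recommendation import ALSModel

spark = SparkSession.builder.getOrCreate()

# Hypothetical: a trained ALS model whose userCol is "user_id".
model = ALSModel.load("hdfs:///data/listenbrainz/als-model")

def generate_recs(user_ids, num_recs=100):
    """Generate recommendations for a whole batch of users in one
    distributed call, rather than calling the model once per user."""
    users_df = spark.createDataFrame([(u,) for u in user_ids], ["user_id"])
    # One row per user, with a nested `recommendations` array column.
    return model.recommendForUserSubset(users_df, num_recs)

# Old approach (slow): generate_recs([user]) inside a loop over every user.
# New approach: generate_recs(batch_of_100_user_ids) once per batch.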
2020-09-04 24814, 2020
BrainzGit
[listenbrainz-server] paramsingh opened pull request #1074 (master…param/move-python-tests-to-jenkins): [wip] move python tests over to jenkins from travis https://github.com/metabrainz/listenbrainz-server…
i have today off and no plans, so am working on MeB stuff
2020-09-04 24807, 2020
iliekcomputers
well, technically working on MeB stuff is a plan
2020-09-04 24821, 2020
ruaok
lol
2020-09-04 24856, 2020
ruaok
I'm reading your doc and we both thought of completely different stuff.
2020-09-04 24810, 2020
ruaok
in a really good way, from what I can see.
2020-09-04 24853, 2020
d4rkie joined the channel
2020-09-04 24853, 2020
iliekcomputers
it's why i wrote it down!
2020-09-04 24814, 2020
iliekcomputers
to make sure we were in sync
2020-09-04 24828, 2020
ruaok
<3
2020-09-04 24831, 2020
Nyanko-sensei has quit
2020-09-04 24802, 2020
ruaok
let me throw my thoughts on the bottom of the paper -- it almost feels like sullying your clean plan, lol.
2020-09-04 24820, 2020
iliekcomputers
sure
2020-09-04 24838, 2020
pristine___
iliekcomputers: will have a look. Thanks
2020-09-04 24802, 2020
Gazooo794 has quit
2020-09-04 24847, 2020
Gazooo794 joined the channel
2020-09-04 24804, 2020
alastairp
pristine___: oh, interesting. I'm not sure that I understand - do you mean that to predict items for 100 users is approximately as fast as predicting items for 1 user?
2020-09-04 24848, 2020
ruaok
that would not surprise me given spark, really.
2020-09-04 24817, 2020
iliekcomputers
me either
2020-09-04 24806, 2020
ruaok
iliekcomputers: do you use google photos?
2020-09-04 24816, 2020
alastairp
it makes a certain amount of sense. if that's the case, it's the same kind of speedup we got in AB highlevel too - the majority of the time was spent loading the models and getting them into a format in memory that is useful. actually passing data through the model to get an output is the fast part
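A toy illustration of that pattern (hypothetical model and path; the point is only that the expensive load happens once per batch, not once per item):

import pickle

def predict_batch(model_path, items):
    # Slow part: load/deserialize the model once for the whole batch.
    with open(model_path, "rb") as f:
        model = pickle.load(f)
    # Fast part: pushing each item through the already-loaded model.
    return [model.predict(item) for item in items]

# The slow variant reloads the model for every single item:
# def predict_one(model_path, item):
#     with open(model_path, "rb") as f:
#         model = pickle.load(f)
#     return model.predict(item)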
2020-09-04 24818, 2020
iliekcomputers
yes
2020-09-04 24827, 2020
ruaok
alastairp: that
2020-09-04 24851, 2020
alastairp
sweet, looking forward to a 100x speedup overnight, then
2020-09-04 24852, 2020
ruaok
iliekcomputers: I love the scrollbar in the main photos timeline. it gives a good overview of your photos timeline.
2020-09-04 24857, 2020
ruaok
imagine if FB had that.
2020-09-04 24808, 2020
alastairp
FB? 😱
2020-09-04 24803, 2020
Rotab
FB 😍
2020-09-04 24841, 2020
pristine___
alastairp: yup. Runtime for 100 (or even 200) users ~ runtime for one user. I am limiting it to 100 users because I need to perform a join afterwards. The join can lead to OOM because it will be a cross join. If the join weren't there, I would have generated recs for all (10k) users at once. :p
2020-09-04 24812, 2020
alastairp
pristine___: great. what exactly is the join for?
2020-09-04 24846, 2020
alastairp
I'm not sure exactly how this storage part of spark works, but can you generate all at once, then generate a new data table/datastore/RDD/whatever for a smaller number of users and join against that?
2020-09-04 24807, 2020
pristine___
So we get recording ids from the recommender. The join is to get the corresponding recording mbids.
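In other words, something like this (column and DataFrame names are made up for illustration; `recs` is assumed to already be flattened to one row per user/recording pair):

# recs:           user_id, recording_id, rating   (output of the recommender)
# recordings_df:  recording_id, recording_mbid    (lookup table built earlier)
recs_with_mbids = (
    recs.join(recordings_df, on="recording_id", how="inner")
        .select("user_id", "recording_mbid", "rating")
)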
2020-09-04 24821, 2020
ruaok
iliekcomputers: dumped my thoughts. we can discuss now while things are fresh or we can let them simmer for a bit and discuss later....
2020-09-04 24834, 2020
iliekcomputers
let me give it a read
2020-09-04 24846, 2020
alastairp
pristine___: how many rows in the final recommendations?
2020-09-04 24818, 2020
alastairp
I'm surprised that postgres can join a billion rows without breaking a sweat but spark runs out of memory
2020-09-04 24800, 2020
ruaok
that reminds me, I want to try and read MLHD into timescale....
2020-09-04 24806, 2020
alastairp
oh nice
2020-09-04 24842, 2020
alastairp
pristine___: anyway, I'd seriously look into doing the predictions on a lot of users, but the join on a smaller subset
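One way to read that suggestion, purely as a sketch (DataFrame names and the nested schema are hypothetical):

from pyspark.sql.functions import explode

# Recommend for every user in one distributed call, then flatten and
# do the recording_id -> MBID join in smaller user batches.
all_recs = model.recommendForAllUsers(100)
flat = (
    all_recs.select("user_id", explode("recommendations").alias("rec"))
            .select("user_id", "rec.recording_id", "rec.rating")
)
for user_batch in user_id_batches:  # hypothetical iterable of user-id lists
    subset = flat.where(flat["user_id"].isin(user_batch))
    subset.join(recordings_df, "recording_id").write.mode("append").parquet(out_path)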
2020-09-04 24851, 2020
iliekcomputers
ruaok: read through it, left a few comments, mostly agreements, i think we're on the same page in terms of what we want to do
2020-09-04 24857, 2020
iliekcomputers
in the long term
2020-09-04 24818, 2020
ruaok
I expected as much reading your stuff. let me read the comments.
2020-09-04 24831, 2020
pristine___
alastairp: Given your postgres remark, I was thinking of testing the join for all users. I have hit OOM errors in the past, but maybe they were not because of the size of the dataset. It will be a nice thing to test on the cluster
2020-09-04 24856, 2020
iliekcomputers
not sure if you have any comments on the features that I'm proposing to build specifically, maybe more comments will come up when i have a more technical design ;)
2020-09-04 24801, 2020
alastairp
pristine___: yeah, right. this is definitely something that we should try, and only do it differently if it doesn't work
2020-09-04 24819, 2020
pristine___
Rec for all and join for all. I am excited to see how much the runtime will be reduced.
2020-09-04 24841, 2020
pristine___
alastairp: on a side note, I was just going through the pyspark GitHub. It is fairly easy to get all-time recs for users. And many other things.
I started looking into the spark stuff about a month ago but got distracted this month, I'll start looking at it again
2020-09-04 24846, 2020
ruaok
iliekcomputers: I have two key things to discuss about your proposal. 1) recommending a track without giving any comment on the recommendation feels like it's missing something. somehow incomplete. 2) The interactions with BrainzPlayer need improvements for a better user experience
2020-09-04 24824, 2020
iliekcomputers
hmm.
2020-09-04 24827, 2020
iliekcomputers
let's chat about 1
2020-09-04 24835, 2020
ruaok
k.
2020-09-04 24840, 2020
iliekcomputers
let's chat about 1 first, i mean
2020-09-04 24848, 2020
ruaok
I fully understand that adding a comment option opens Pandora's box.
2020-09-04 24852, 2020
iliekcomputers
i'm not totally against it. i was trying to keep things simple. if we do go down that road, i assume the comment would ideally live in CB
2020-09-04 24801, 2020
reosarevok
should the user recommending give the comment, or the one who gets recommended?
2020-09-04 24810, 2020
reosarevok
(or both)
2020-09-04 24811, 2020
ruaok
hmmm. CB.
2020-09-04 24814, 2020
iliekcomputers
the user recommending the track, i would say
2020-09-04 24819, 2020
ruaok
another can of worms, really.
2020-09-04 24831, 2020
iliekcomputers
i would be against allowing threads for now
2020-09-04 24845, 2020
iliekcomputers
so the users who see the recommendation wouldn't be able to comment
2020-09-04 24847, 2020
ruaok
yes, no threads or comments. #dontreadthecomments
2020-09-04 24852, 2020
reosarevok
haha
2020-09-04 24803, 2020
ruaok
counter-recommend, not comment.
2020-09-04 24806, 2020
reosarevok
One thing that would be cool is to be told whether the user you recommended it to loved/hated it
2020-09-04 24816, 2020
reosarevok
(you have buttons for that, right?)
2020-09-04 24822, 2020
ruaok
that I see useful.
2020-09-04 24824, 2020
reosarevok
So, you get *some* reaction, but without the comments
2020-09-04 24826, 2020
ruaok
up/down voting.
2020-09-04 24848, 2020
reosarevok
If you want to actually talk about it, send the user a message
2020-09-04 24805, 2020
iliekcomputers
again, a good idea, not sure if we want it in the initial version.
2020-09-04 24808, 2020
reosarevok
Via MB or whatever (that'd be easier once we had MeB accounts not mainly living in MB :p)
2020-09-04 24831, 2020
ruaok
+1 to both iliekcomputers and reosarevok
2020-09-04 24858, 2020
ruaok
I think limiting the features in the first run will give us a better idea of how to continue in the long run
2020-09-04 24809, 2020
reosarevok
Yeah
2020-09-04 24813, 2020
iliekcomputers
this project would essentially set up the base, allowing people to add friends
2020-09-04 24825, 2020
iliekcomputers
once that's done, there's a whole host of things we could do
2020-09-04 24836, 2020
reosarevok
IMO, as a start, just allowing people to send recommendations, and nothing else, is enough
2020-09-04 24838, 2020
ruaok
and recommend a recording, yes?
2020-09-04 24843, 2020
iliekcomputers
yeah.
2020-09-04 24850, 2020
reosarevok
I mean, that's what we sometimes do here, with zas dropping a bandcamp link or whatever
2020-09-04 24851, 2020
ruaok
great. I think that is a very good first step.
2020-09-04 24857, 2020
ruaok
reosarevok: THAT
2020-09-04 24806, 2020
ruaok
this provides a vehicle for zas to do this. and I want to read his feed.
2020-09-04 24812, 2020
ruaok
then let's move to point 2.
2020-09-04 24819, 2020
iliekcomputers
wait
2020-09-04 24823, 2020
reosarevok
Later you can decide whether you want a "timeline" with "your friend X got a BB badge!!"