yvanzo: for some reason the setting to change my VM from 10 to 16gb didn't take, hardware acceleration was off and video memory was at 4mb. I've tweaked the settings back to something more reasonable so I'm going to give it another go.
alastairp: that day when you were talking about batches of 100, were you suggesting to input data of 100 users here and get the recs of all 100 together?
[listenbrainz-server] paramsingh merged pull request #1063 (master…pydantic-model-dataframes): Pydantic model for data returned from spark (create_dataframes.py) https://github.com/metabrainz/listenbrainz-serv...
alastairp
pristine___: it was a hypothesis based on the number of joins you were doing
yes, get the recs of all 100, and so when you do the join to go from index -> recording/artist ids (I think that's what it was doing?) you're only doing 1 join per 100 users, not 100 joins
alastairp: I have collected recs of all the users and performed a single join. I made this improvement in the new PR (#1073). The join here is not the only bottleneck. The script is slow because we were generating recs for all users one at a time, i.e. we were calling the func I linked above for every user. So this morning I realised: what if we call that func (generate_rec) once for 100/1000 users? It would
drastically reduce the run time. Thanks for the hypothesis.
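(For illustration only: the batching idea pristine___ describes looks roughly like this in plain Python. The real job runs on Spark; `generate_rec`, the batch size, and the join helper below are hypothetical stand-ins, not the actual listenbrainz code.)

```python
# Sketch of the batching optimisation: instead of calling the
# recommender once per user (paying the per-call overhead every time),
# call it once per batch of users, then do a single join per batch.
# `generate_rec` is a stand-in for the real recommender call.

BATCH_SIZE = 100  # hypothetical; chosen to keep the later join small

def generate_rec(user_ids):
    """Pretend recommender: returns (user_id, recording_id) pairs."""
    return [(u, u * 10 + k) for u in user_ids for k in range(2)]

def join_with_mbids(recs, id_to_mbid):
    """One join per batch: map recording ids to their MBIDs."""
    return [(u, id_to_mbid[rid]) for u, rid in recs if rid in id_to_mbid]

def recommend_all(user_ids, id_to_mbid):
    results = []
    # len(user_ids) / BATCH_SIZE recommender calls instead of len(user_ids)
    for i in range(0, len(user_ids), BATCH_SIZE):
        batch = user_ids[i:i + BATCH_SIZE]
        recs = generate_rec(batch)
        results.extend(join_with_mbids(recs, id_to_mbid))
    return results
```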
BrainzGit
[listenbrainz-server] paramsingh opened pull request #1074 (master…param/move-python-tests-to-jenkins): [wip] move python tests over to jenkins from travis https://github.com/metabrainz/listenbrainz-serv...
i have today off and no plans, so am working on MeB stuff
well, technically working on MeB stuff is a plan
ruaok
lol
I'm reading your doc and we both thought of completely different stuff.
in a really good way, from what I can see.
d4rkie joined the channel
iliekcomputers
it's why i wrote it down!
to make sure we were in sync
ruaok
<3
Nyanko-sensei has quit
let me throw my thoughts on the bottom of the paper -- it almost feels like sullying your clean plan, lol.
iliekcomputers
sure
pristine___
iliekcomputers: will have a look. Thanks
Gazooo794 has quit
Gazooo794 joined the channel
alastairp
pristine___: oh, interesting. I'm not sure that I understand - do you mean that to predict items for 100 users is approximately as fast as predicting items for 1 user?
ruaok
that would not surprise me given spark, really.
iliekcomputers
me either
ruaok
iliekcomputers: do you use google photos?
alastairp
it makes a certain amount of sense. if that's the case, it's the same kind of speedup we got in AB highlevel too - the majority of the time was spent loading the models and getting them into a format in memory that is useful. actually passing data through the model to get an output is the fast part
iliekcomputers
yes
ruaok
alastairp: that
alastairp
sweet, looking forward to a 100x speedup overnight, then
ruaok
iliekcomputers: I love the scrollbar in the main photos timeline. it gives a good overview of your photos timeline.
imagine if FB had that.
alastairp
FB? 😱
Rotab
FB 😍
pristine___
alastairp: yup. Runtime for 100 (or even 200) users ~ runtime for one user. I am limiting it to 100 users because I need to perform a join afterwards. The join can lead to OOM because it will be a cross join. If the join weren't there, I would have generated recs for all (10k) users at once. :p
alastairp
pristine___: great. what exactly is the join for?
I'm not sure exactly how this storage part of spark works, but can you generate all at once, then generate a new data table/datastore/RDD/whatever for a smaller number of users and join against that?
pristine___
So we get recording ids from the recommender. The join is to get the corresponding recording mbids.
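(A side note on that join, as an illustration: mapping recording ids to MBIDs is an equi-join on recording_id, which behaves like a hash join rather than a cross join. The sketch below is hypothetical plain Python, not the actual Spark job; the function and column names are placeholders.)

```python
# Hypothetical sketch of the recs -> MBID join described above.
# An equi-join on recording_id can be done as a hash join: build a
# lookup table from the (recording_id, mbid) side once, then probe it
# once per recommendation row. Cost is O(|recs| + |mapping|), not the
# O(|recs| * |mapping|) a true cross join would pay.

def join_recs_with_mbids(recs, recording_mbids):
    """recs: iterable of (user_id, recording_id, score) rows.
    recording_mbids: iterable of (recording_id, mbid) rows."""
    lookup = dict(recording_mbids)           # build side, scanned once
    return [
        (user_id, lookup[rec_id], score)     # probe side, one pass
        for user_id, rec_id, score in recs
        if rec_id in lookup                  # drop ids with no known MBID
    ]
```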
ruaok
iliekcomputers: dumped my thoughts. we can discuss now while things are fresh or we can let them simmer for a bit and discuss later....
iliekcomputers
let me give it a read
alastairp
pristine___: how many rows in the final recommendations?
I'm surprised that postgres can join a billion rows without breaking a sweat but spark runs out of memory
ruaok
that reminds me, I want to try and read MLHD into timescale....
alastairp
oh nice
pristine___: anyway, I'd seriously look into doing the predictions on a lot of users, but the join on a smaller subset
iliekcomputers
ruaok: read through it, left a few comments, mostly agreements, i think we're on the same page in terms of what we want to do
in the long term
ruaok
I expected as much reading your stuff. let me read the comments.
pristine___
alastairp: Given your postgres remark, I was thinking of testing the join for all users. I have hit OOMs in the past, but maybe they were not because of the size of the dataset. It will be a nice thing to test on the cluster
iliekcomputers
not sure if you have any comments on the features that I'm proposing to build specifically, maybe more comments will come up when i have a more technical design ;)
alastairp
pristine___: yeah, right. this is definitely something that we should try, and only do it differently if it doesn't work
pristine___
Rec for all and join for all. I am excited to see how much the runtime will be reduced.
alastairp: on a side note, I was just going through pyspark github. It is fairly easy to get all time recs for users. And many other things.
I started looking into the spark stuff about a month ago but got distracted this month, I'll start looking at it again
ruaok
iliekcomputers: I have two key things to discuss about your proposal. 1) recommending a track without giving any comment on the recommendation feels like it's missing something, somehow incomplete. 2) the interactions with BrainzPlayer need improvements for a better user experience
iliekcomputers
hmm.
let's chat about 1
ruaok
k.
iliekcomputers
let's chat about 1 first, i mean
ruaok
I fully understand that adding a comment option opens Pandora's box.
iliekcomputers
i'm not totally against it. i was trying to keep things simple. if we do go down that road, i assume the comment would ideally live in CB
reosarevok
should the user recommending give the comment, or the one who gets recommended?
(or both)
ruaok
hmmm. CB.
iliekcomputers
the user recommending the track, i would say
ruaok
another can of worms, really.
iliekcomputers
i would be against allowing threads for now
so the users who see the recommendation wouldn't be able to comment
ruaok
yes, no threads or comments. #dontreadthecomments
reosarevok
haha
ruaok
counter-recommend, not comment.
reosarevok
One thing that would be cool is to be told whether the user you recommended it to loved/hated it
(you have buttons for that, right?)
ruaok
that I see useful.
reosarevok
So, you get *some* reaction, but without the comments
ruaok
up/down voting.
reosarevok
If you want to actually talk about it, send the user a message
iliekcomputers
again, a good idea, not sure if we want it in the initial version.
reosarevok
Via MB or whatever (that'd be easier once we had MeB accounts not mainly living in MB :p)
ruaok
+1 to both iliekcomputers and reosarevok
I think limiting the features in the first run will give us a better idea of how to continue in the long run
reosarevok
Yeah
iliekcomputers
this project would essentially set up the base, allowing people to add friends
once that's done, there's a whole host of things we could do
reosarevok
IMO as a start, just allowing people to send recommendations, and nothing else, is enough
ruaok
and recommend a recording, yes?
iliekcomputers
yeah.
reosarevok
I mean, that's what we sometimes do here, with zas dropping a bandcamp link or whatever
ruaok
great. I think that is a very good first step.
reosarevok: THAT
this gives zas a vehicle to do this. and I want to read his feed.
then lets move to point 2.
iliekcomputers
wait
reosarevok
Later you can decide whether you want a "timeline" with "your friend X got a BB badge!!"