in #metabrainz

2:11 AM
yyoung

yvanzo: Hi, about the implementation, I found something that still needs discussion
2:12 AM
In my initial thought, to add a link to the list, users will have to: 1. enter the url 2. pass the validation
2:13 AM
However, URL validation isn't enough, because if there's a relationship error after the link is added, users will have to open the popover to change the URL, which is kind of annoying
2:14 AM
Therefore we'll have to require the user to also select a relationship type before appending the link
2:15 AM
But that'll make adding links on blur impossible, leaving only pressing enter or maybe other ways
2:17 AM
Also, if I didn't get it wrong, in the new UI, users select relationship type after the link is added to the list, so we can't do relationship validation in advance either
2:17 AM
What do you think? :)
3:07 AM
ritiek joined the channel
3:53 AM
gcrk__ joined the channel
3:57 AM
gcrk has quit
4:02 AM
Protab joined the channel
4:09 AM
Rotab has quit
5:26 AM
gcrk__ has quit
6:50 AM
MRiddickW has quit
7:05 AM
gcrk joined the channel
7:42 AM
ritiek has quit
8:20 AM
lucifer

ruaok: morning. hi! do you know some spark sql?
8:21 AM
ruaok

moin. some. never actually written a query, so probably not too helpful.
8:22 AM
lucifer

ah ok. i wanted to get a sanity check on my artist similarity query.
8:27 AM
ruaok

I can try.
8:44 AM
lucifer

https://www.irccloud.com/pastebin/jyqqYaQx/
8:44 AM
explode is equivalent to unnest of postgres.
8:46 AM
I wanted to put the explode column in the FROM or group by but it isn't supported there so had to create a temp table.
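(Editor's sketch, not the pasted query: a plain-Python illustration of what explode/unnest does to a list-valued column. The row layout and the names `rows` and `explode_rows` are hypothetical. Rows with an empty list produce no output rows.)

```python
# Hypothetical input: one row per user, with a list-valued column of MBIDs.
rows = [
    ("alice", ["mbid-a", "mbid-b"]),
    ("bob", ["mbid-c"]),
    ("carol", []),
]


def explode_rows(rows):
    """Yield one (user_name, mbid) row per element of the list column."""
    for user_name, mbids in rows:
        for mbid in mbids:
            yield (user_name, mbid)


exploded = list(explode_rows(rows))
for row in exploded:
    print(row)
```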
9:16 AM
ruaok: i think i found a bug in dataframe generation. the where clause looks wrong. https://github.com/metabrainz/listenbrainz-serv...
9:16 AM
ruaok

I wonder if such a bug would affect the user similarity stuff.
9:16 AM
ruaok is digesting the query above
9:17 AM
lucifer

firstly, i am not sure how this even works with where; count is an aggregate, so it should have been having. anyway, moving on: this is comparing each row with the threshold, whereas it should be comparing the sum of all listens of a user.
9:17 AM
ruaok

can we chat about the first query before we dive into this bug?
9:17 AM
lucifer

sure
9:18 AM
ruaok

k. so the temp table is only there to expand the arguments we pass to the query?
9:19 AM
lucifer

yes
9:19 AM
ruaok

and then the rest seems to be equivalent to a standard PG query that does a count/group by.
9:19 AM
`dense_rank() over(order by user_name) as user_id`
9:19 AM
is the switch from user_name to user_id intentional here?
9:20 AM
lucifer

yes, we want to assign each user_name and mbid a numeric index that can be used as index in the matrix.
9:20 AM
ruaok

k
9:21 AM
ah, I see how. the dense rank function creates these artist id and user ids suitable for feeding to spark.
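(Editor's sketch of the idea only, in plain Python rather than Spark SQL: a dense rank maps each distinct name to a gap-free integer, which can then serve as a matrix index. The helper name `dense_rank` and the sample names are made up for illustration.)

```python
def dense_rank(values):
    """Map each value to a 1-based rank over the sorted distinct values.

    Equal values share a rank and there are no gaps, which mirrors what
    dense_rank() over (order by ...) produces.
    """
    ranks = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
    return [ranks[v] for v in values]


user_names = ["rob", "amy", "rob", "zed", "amy"]
user_ids = dense_rank(user_names)
print(list(zip(user_names, user_ids)))
```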
9:21 AM
yes, that looks sane to me.
9:22 AM
lucifer

cool, thanks. i forgot to add threshold to this, currently working on that.
9:23 AM
ruaok

hmm. is this for artist similarities or for user similarities?
9:23 AM
lucifer

user similarities.
9:23 AM
ruaok

"artist similarity query." for user similarities, I figure.
9:23 AM
k
9:23 AM
yes, looks fully sane.
9:24 AM
now lets look at the dataframe issue.
9:24 AM
lucifer

yes
9:24 AM
ruaok

because I wonder if fixing that might have an impact on the user similarities based on recordings.
9:25 AM
lucifer

i think it should, my understanding is that currently it is ignoring all listen counts less than 50.
9:25 AM
that is listen counts of a particular recording of a particular user
9:26 AM
ruaok

oh. and not total listen counts per user? because that was the goal and this would most certainly explain how things went to shit.
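(Editor's sketch of the difference being discussed, in plain Python with made-up data; the real code is a Spark SQL query that is not reproduced here. The threshold of 50 comes from the chat, and treating it as "keep counts of at least 50" is an assumption.)

```python
from collections import defaultdict

THRESHOLD = 50  # value mentioned in the chat; the comparison operator is assumed

# Hypothetical rows: (user_name, recording, listen_count)
listens = [
    ("amy", "rec-1", 30),
    ("amy", "rec-2", 40),
    ("bob", "rec-1", 60),
    ("cat", "rec-3", 10),
]

# Behaviour described as the bug: each row is compared with the threshold,
# so a user's individual recordings are dropped one by one.
per_row = [row for row in listens if row[2] >= THRESHOLD]

# Behaviour described as the goal: sum listens per user first, then keep
# every row of the users whose total reaches the threshold.
totals = defaultdict(int)
for user, _recording, count in listens:
    totals[user] += count
per_user = [row for row in listens if totals[row[0]] >= THRESHOLD]

print(per_row)
print(per_user)
```

In this toy data amy has 70 listens in total but no single recording with 50 or more, so the per-row filter removes her entirely while the per-user filter keeps both of her rows.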
9:26 AM
lucifer

right
9:26 AM
and same goes for recs
9:27 AM
ruaok

oh yes, that most certainly does what you say it does.
9:27 AM
and it explains the top similar users and how borked it is.
9:28 AM
lucifer

right, i checked some top similar users: they don't have similar recordings, rather lots of recordings listened to a large number of times.
9:28 AM
also explains why i am many times in top similar users :)
9:28 AM
ruaok

exactly that.
9:28 AM
phew, finally an explanation for it.
9:28 AM
lets fix this query and re-run user similarities.
9:28 AM
lucifer

yup, on it
9:28 AM
ruaok

thanks
9:32 AM
Etua joined the channel
9:39 AM
gcrk: ping
9:39 AM
gcrk

ruaok, pong
9:39 AM
ruaok

hey, have you started using that API to lookup your tracks?
9:40 AM
gcrk

hi :) Didn't get to it yet
9:40 AM
ruaok

ok, thanks. then I need to hunt down a problem we're having. :)
9:41 AM
gcrk

I am wondering where to build something around
9:41 AM
you are talking about mbid lookup?
9:41 AM
ruaok

yes.
9:41 AM
something has made that endpoint busy or angry. not sure what yet
9:41 AM
gcrk

I was considering trying to hack a lookup function into listenbrainz itself, so if a listen does not have a mbid I can just hit "lookup mbid" and get there
9:42 AM
its probably the easiest way to play around with it
9:43 AM
On the other hand, something like a pull function into funkwhale would be neat, but i have a long list of todos there
9:43 AM
ruaok

depends on what your goal is.
9:43 AM
I think the first thing is to figure out what works -- then we can figure out how to deploy it.
9:44 AM
gcrk

oh I expected this is already somehow stable?
9:44 AM
ruaok

everything is still in flux, to varying degrees.
9:44 AM
the APIs we expose on labs.api.listenbrainz.org are stable, but undocumented.
9:44 AM
meaning that we haven't committed to scaling them yet.
9:45 AM
and I see that the very powerful typesense lookup we offer is not likely going to scale very well on our servers.
9:46 AM
so, we'll likely need to move to lookup matches from our MBID mapping that we're working on.
9:46 AM
gcrk

So my whole motivation is to get to this recommendation stuff, which obviously needs good listening data. so the fact that funkwhale reports listens without mbid seems to be a problem to work on first
9:46 AM
ruaok

very much agreed.
9:46 AM
then, somehow, make it so that our recommendation engine can make effective recommendations without being a privacy nightmare.
9:47 AM
gcrk

I wont be any help in this regard :(
9:48 AM
Just because I have no idea about these algorithms :D
9:49 AM
ruaok

no need for you to understand that part -- we've got the knowledge for that.
9:49 AM
we need candidate sets to recommend effectively.
9:50 AM
a candidate set is a list of track that can be recommended for any given user.
9:50 AM
*tracks
9:50 AM
gcrk

So yeah, let's dump my current state of mind before lunch: I think the funkwhale community would get angry if funkwhale starts changing the metadata without review. So we would need some kind of suggestions where users can simply hit "take over this data"
9:50 AM
in the best case its only the mbid
9:50 AM
ruaok

very much agreed.
9:51 AM
because this process can go wrong. we see this all the time when people put too much blind faith into picard.
9:51 AM
load 10,000 albums and lookup/then save.
9:51 AM
then we get an email "YOU DESTROYED MY MUSIC COLLECTION". 🙄
9:52 AM
tandy[m]

<ruaok "then we get an email "YOU DESTRO"> lol
9:52 AM
gcrk

And besides the technical problems with that I don't think its something we should encourage. it drives away users from their libraries and we want to raise the value of music, not follow this spotify shit where its only about consuming whatever appeals
9:52 AM
tandy[m]

i wish beets had more man power behind it
9:53 AM
picard is doing really well tho, compared to when i used to use it in 2019
9:53 AM
ruaok

gcrk: <3 I love how you think. matches my desires perfectly.
9:54 AM
gcrk

its a little bit stolen from the beets website I think, I just added some spotify hate
9:54 AM
ruaok

I'm glad you turned up here -- I think funkwhale could be our killer app for our recommendations.
9:54 AM
people who have FW installations are very much the music nerds we want.
9:54 AM
gcrk

I actually think we should make this a plugin though
9:55 AM
so each user can decide to "leak" the listens and get recommendations in exchange, or not
9:55 AM
ruaok

yeah, I am not even thinking about the implementation details yet.
9:55 AM
there are too many major privacy issues to deal with first.
9:55 AM
gcrk

well, its good to have some considerations though to prepare the stuff
9:55 AM
ruaok

everything MUST be opt-in.
9:56 AM
gcrk

anyway, lets stop pretending to work and get some lunch. see ya! :)
9:56 AM
ruaok

bye!
10:11 AM
Etua has quit
10:41 AM
yvanzo: bitmap: we've had some emails over the weekend about CAA being borked. I looked at the RMQ graph and it seems sane. are things working correctly then?
10:56 AM
lucifer: the typesense server freaked out and couldn't be restarted. I upgraded to 0.20.0 and rebuilt the index. lookup speed has drastically improved now. now around 80 reqs/s. :) :)
10:56 AM
lucifer

awesome :DD
11:15 AM
BrainzGit

[critiquebrainz] dependabot-preview[bot] opened pull request #366 (master…dependabot/npm_and_yarn/codemirror-5.62.0): [Security] Bump codemirror from 5.45.0 to 5.62.0 https://github.com/metabrainz/critiquebrainz/pu...
11:15 AM
[critiquebrainz] dependabot-preview[bot] closed pull request #362 (master…dependabot/npm_and_yarn/codemirror-5.61.1): [Security] Bump codemirror from 5.45.0 to 5.61.1 https://github.com/metabrainz/critiquebrainz/pu...
11:19 AM
gcrk has quit
11:30 AM
lucifer

ruaok: we didn't add a page for missing musicbrainz data yet, right? is that a part of jasondk's gsoc project?
11:31 AM
ruaok

correct. no.
11:32 AM
lucifer

cool, i'll take a stab at it later this week then.
11:32 AM
ruaok

where is the superman emoji??
11:34 AM
lucifer

this one? 🦸
11:34 AM
ruaok

ahhh, yes.
11:35 AM
!m 🦸 lucifer
11:35 AM
BrainzBot

You're doing good work, 🦸 lucifer!
11:35 AM
lucifer

lol :D
11:36 AM
ruaok

hmm. I'm not one to suggest that we imitate amazon's corporate culture, but this is interesting: https://www.justingarrison.com/blog/2021-03-15-...
11:37 AM
we do this for releases and it has made our complicated releases much much more successful. and we create docs for hackdays and more complicated features.
12:00 PM
ritiek joined the channel
12:19 PM
alastairp

OK
12:19 PM
conference reviews finished
12:23 PM
ruaok: I especially like the ability to go back to a document and see why we decided to do something. We've found a few things that seem like stupid decisions we made, but no paper trail of why we originally thought it was a good idea
12:23 PM
I've started trying to add a bit more detail to these documents that we've been making, I guess time will tell as to if they were useful or not
12:24 PM
Mineo has quit
12:27 PM
lucifer

https://www.irccloud.com/pastebin/zrrbuMbd/
12:27 PM
ok that quickly got out of hand :(
12:27 PM
ruaok: ^ artist query with thresholding.
12:28 PM
i think the solution for recordings will be similar (it's using self joins though, why?), will need to wrap the query in a subquery to add a where clause.
12:29 PM
Mineo joined the channel
12:30 PM
actually, ignore that query, it'll mess up the artist_id stuff, let me rethink it.
12:52 PM
MRiddickW joined the channel
1:01 PM
alastairp

lucifer: hi, were we planning on doing an AB discussion, or did we do that on Friday?
1:20 PM
lucifer

alastairp: yeah, we had discussed the python3 migration briefly on friday. between then and now i ran the futurize tool on the ab codebase and it seems only minimal changes are required to run on python 3.
13:21 PM
ruaok

`mbids = [x for x in mbids if x]`
13:21 PM
alastairp

amazing, sounds good
13:21 PM
ruaok

that reads funny, but is awesome. :)
13:21 PM
alastairp

ruaok: to skip '' and None ?
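(Editor's sketch with made-up values: the comprehension keeps only truthy items, so it drops '' and None. It would also drop any other falsy value, such as 0 or an empty list, if one ever appeared in the input.)

```python
# Hypothetical input mixing real-looking ids with '' and None.
mbids = ["a1", "", None, "b2"]

# Keep only truthy entries; '' and None are both falsy in Python.
mbids = [x for x in mbids if x]

print(mbids)
```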