yvanzo: Hi, about the implementation, I found something that still needs discussion
In my initial thought, to add a link to the list, users will have to: 1. enter the url 2. pass the validation
However, URL validation isn't enough, because if there's a relationship error after the link is added, users will have to open the popover to change the URL, which is kind of annoying
Therefore we'll have to require the user to also select a relationship type before appending the link
But that'll make adding links on blur impossible, leaving only pressing enter or maybe other ways
Also, if I didn't get it wrong, in the new UI, users select relationship type after the link is added to the list, so we can't do relationship validation in advance either
What do you think? :)
lucifer
ruaok: morning. hi! do you know some spark sql?
ruaok
moin. some. never actually written a query, so probably not too helpful.
lucifer
ah ok. i wanted to get a sanity check on my artist similarity query.
I wonder if such a bug would affect the user similarity stuff.
ruaok is digesting the query above
lucifer
firstly, i am not sure how this even works with where; count is an aggregate, so it should have been having. anyway, moving on: this is comparing each row with the threshold, whereas it should be comparing the sum of all listens of a user.
ruaok
can we chat about the first query before we dive into this bug?
lucifer
sure
ruaok
k. so the temp table is only there to expand the arguments we pass to the query?
lucifer
yes
ruaok
and then the rest seems to be equivalent to a standard PG query that does a count/group by.
`dense_rank() over(order by user_name) as user_id`
is the switch from user_name to user_id intentional here?
lucifer
yes, we want to assign each user_name and mbid a numeric index that can be used as index in the matrix.
ruaok
k
ah, I see how. the dense rank function creates these artist id and user ids suitable for feeding to spark.
yes, that looks sane to me.
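For reference, the index assignment discussed above can be sketched outside SQL. This is a minimal pure-Python sketch of what `dense_rank() over(order by ...)` produces, not the actual query; the `listens` rows and the `dense_rank` helper are hypothetical names made up for illustration.

```python
def dense_rank(values):
    """Map each distinct value to a 1-based rank in sorted order, with no gaps."""
    ranks = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
    return [ranks[v] for v in values]


# Hypothetical listen rows: (user_name, mbid)
listens = [
    ("rob", "mbid-b"),
    ("lucifer", "mbid-a"),
    ("rob", "mbid-a"),
    ("alastair", "mbid-c"),
]

# Every distinct name or mbid gets one stable integer, usable as a matrix index.
user_ids = dense_rank([user for user, _ in listens])
mbid_ids = dense_rank([mbid for _, mbid in listens])

print(user_ids)  # [3, 2, 3, 1]
print(mbid_ids)  # [2, 1, 1, 3]
```

Because the ranks are dense, the largest id equals the number of distinct values, which is what makes them suitable as row and column indices.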
lucifer
cool, thanks. i forgot to add threshold to this, currently working on that.
ruaok
hmm. is this for artist similarities or for user similarities?
lucifer
user similarities.
ruaok
"artist similarity query." for user similarities, I figure.
k
yes, looks fully sane.
now lets look at the dataframe issue.
lucifer
yes
ruaok
because I wonder if fixing that might have an impact on the user similarities based on recordings.
lucifer
i think it should, my understanding is that currently it is ignoring all listen counts less than 50.
that is listen counts of a particular recording of a particular user
ruaok
oh. and not total listen counts per user? because that was the goal and this would most certainly explain how things went to shit.
lucifer
right
and same goes for recs
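The difference between the two threshold behaviours can be shown with a small sketch. It uses Python's built-in `sqlite3` as a stand-in for Spark SQL, and the table name, column names and sample rows are all assumptions for illustration; only the threshold of 50 comes from the discussion.

```python
import sqlite3

THRESHOLD = 50  # value mentioned in the chat; everything else here is hypothetical

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE listens (user_name TEXT, recording_mbid TEXT, listen_count INTEGER)"
)
conn.executemany(
    "INSERT INTO listens VALUES (?, ?, ?)",
    [
        ("heavy_repeater", "mbid-a", 60),
        ("heavy_repeater", "mbid-b", 5),
        ("broad_listener", "mbid-a", 20),
        ("broad_listener", "mbid-b", 20),
        ("broad_listener", "mbid-c", 20),
        ("casual", "mbid-a", 3),
    ],
)

# Buggy shape: the threshold is compared against each (user, recording) row,
# so only recordings played many times by one user survive.
per_row = conn.execute(
    "SELECT user_name, recording_mbid FROM listens "
    "WHERE listen_count >= ? ORDER BY user_name, recording_mbid",
    (THRESHOLD,),
).fetchall()

# Intended shape: the threshold is compared against each user's total
# (an aggregate, hence HAVING), and all rows of qualifying users are kept.
per_user = conn.execute(
    """
    SELECT user_name, recording_mbid
      FROM listens
     WHERE user_name IN (
               SELECT user_name
                 FROM listens
             GROUP BY user_name
               HAVING SUM(listen_count) >= ?
           )
  ORDER BY user_name, recording_mbid
    """,
    (THRESHOLD,),
).fetchall()

print(per_row)
print(per_user)
```

In this sample the per-row filter keeps a single row, while the per-user filter keeps every recording of both users whose totals reach the threshold and drops only the casual listener.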
ruaok
oh yes, that most certainly does what you say it does.
and it explains the top similar users and how borked it is.
lucifer
right, i checked some top similar users, they don't have similar recordings but rather lots of recordings listened to a large number of times.
also explains why i am many times in top similar users :)
ruaok
exactly that.
phew, finally an explanation for it.
lets fix this query and re-run user similarities.
lucifer
yup, on it
ruaok
thanks
gcrk: ping
gcrk
ruaok, pong
ruaok
hey, have you started using that API to lookup your tracks?
gcrk
hi :) Didn't get to it yet
ruaok
ok, thanks. then I need to hunt down a problem we're having. :)
gcrk
I am wondering where to build something around
you are talking about mbid lookup?
ruaok
yes.
something has made that endpoint busy or angry. not sure what yet
gcrk
I was considering trying to hack a lookup function into listenbrainz itself, so if a listen does not have a mbid I can just hit "lookup mbid" and get there
its probably the easiest way to play around with it
On the other side something like a pull function into funkwhale would be neat, but i have a long list of todos there
ruaok
depends on what your goal is.
I think the first thing is to figure out what works -- then we can figure out how to deploy it.
meaning that we haven't committed to scaling them yet.
and I see that the very powerful typesense lookup we offer is not likely going to scale very well on our servers.
so, we'll likely need to move to lookup matches from our MBID mapping that we're working on.
gcrk
So my whole motivation is to get to this recommendation stuff, which obviously needs good listening data. so the fact that funkwhale reports listens without mbid seems to be a problem to work on first
ruaok
very much agreed.
then, somehow, make it so that our recommendation engine can make effective recommendations without being a privacy nightmare.
gcrk
I wont be any help in this regard :(
Just because I have no idea about these algorithms :D
ruaok
no need for you to understand that part -- we've got the knowledge for that.
we need candidate sets to recommend effectively.
a candidate set is a list of track that can be recommended for any given user.
*tracks
gcrk
So yeah, let's dump my current state of mind before lunch: I think the community of funkwhale would get angry if funkwhale starts changing the metadata without review. So we would need some kind of suggestions where users can simply hit "take over this data"
in the best case its only the mbid
ruaok
very much agreed.
because this process can go wrong. we see this all the time when people put too much blind faith into picard.
load 10,000 albums and lookup/then save.
then we get an email "YOU DESTROYED MY MUSIC COLLECTION". 🙄
tandy[m]
<ruaok "then we get an email "YOU DESTRO"> lol
gcrk
And besides the technical problems with that I don't think its something we should encourage. it drives away users from their libraries and we want to raise the value of music, not follow this spotify shit where its only about consuming whatever appeals
tandy[m]
i wish beets had more man power behind it
picard is doing really well tho, compared to when i used to use it in 2019
ruaok
gcrk: <3 I love how you think. matches my desires perfectly.
gcrk
its a little bit stolen from the beets website I think, I just added some spotify hate
ruaok
I'm glad you turned up here -- I think funkwhale could be our killer app for our recommendations.
people who have FW installations are very much the music nerds we want.
gcrk
I actually think we should make this a plugin though
so each user can decide to "leak" the listens and get recommendations in exchange, or not
ruaok
yeah, I am not even thinking about the implementation details yet.
there are too many major privacy issues to deal with first.
gcrk
well, its good to have some considerations though to prepare the stuff
ruaok
everything MUST be opt-in.
gcrk
anyway, lets stop pretending to work and get some lunch. see ya! :)
ruaok
bye!
yvanzo: bitmap: we've had some emails over the weekend about CAA being borked. I looked at the RMQ graph and it seems sane. are things working correctly then?
lucifer: the typesense server freaked out and couldn't be restarted. I upgraded to 0.20.0 and rebuilt the index. lookup speed has drastically improved now. now around 80 reqs/s. :) :)
lucifer
awesome :DD
BrainzGit
[critiquebrainz] 14dependabot-preview[bot] opened pull request #366 (03master…dependabot/npm_and_yarn/codemirror-5.62.0): [Security] Bump codemirror from 5.45.0 to 5.62.0 https://github.com/metabrainz/critiquebrainz/pu...
[critiquebrainz] 14dependabot-preview[bot] closed pull request #362 (03master…dependabot/npm_and_yarn/codemirror-5.61.1): [Security] Bump codemirror from 5.45.0 to 5.61.1 https://github.com/metabrainz/critiquebrainz/pu...
lucifer
ruaok: we didn't add a page for missing musicbrainz data yet, right? is that a part of jasondk's gsoc project?
we do this for releases and it has made our complicated releases much much more successful. and we create docs for hackdays and more complicated features.
alastairp
OK
conference reviews finished
ruaok: I especially like the ability to go back to a document and see why we decided to do something. We've found a few things that seem like stupid decisions we made, but no paper trail of why we originally thought it was a good idea
I've started trying to add a bit more detail to these documents that we've been making, I guess time will tell as to if they were useful or not
i think the solution for recordings will be similar (it's using self joins though, why?), will need to wrap the query in a subquery to add a where clause.
actually, ignore that query, it'll mess up the artist_id stuff, let me rethink it.
alastairp
lucifer: hi, were we planning on doing an AB discussion, or did we do that on Friday?
lucifer
alastairp: yeah, we had discussed the python3 migration briefly on friday. between then and now i ran futurize tool on the ab codebase and it seems only minimal changes are required to run on python 3.
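As a generic illustration of the kind of small change a Python 2 to 3 port tends to involve, here is a sketch that runs the same way on both versions. It is not taken from the AB codebase; the `average_length` function and its inputs are hypothetical.

```python
from __future__ import absolute_import, division, print_function


def average_length(durations):
    # With `division` imported, `/` is true division on Python 2 as well,
    # matching Python 3 behaviour (3.5 here rather than 3).
    return sum(durations) / len(durations)


# Python 2 would accept `print "average:", ...` as a statement;
# the function form below is valid on both versions.
print("average:", average_length([3, 4]))
```

Code that already avoids print statements and relies on explicit integer division with `//` typically needs few edits of this kind.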