#metabrainz

      • yyoung
        yvanzo: Hi, about the implementation, I found something that still needs discussion
      • My initial thinking was that, to add a link to the list, users would have to: 1. enter the URL 2. pass the validation
      • However, URL validation isn't enough, because if there's a relationship error after the link is added, users will have to open the popover to change the URL, which is kind of annoying
      • Therefore we'll have to require the user to also select a relationship type before appending the link
      • But that'll make adding links on blur impossible, leaving only pressing enter or maybe other ways
      • Also, if I didn't get it wrong, in the new UI, users select relationship type after the link is added to the list, so we can't do relationship validation in advance either
      • What do you think? :)
      • ritiek joined the channel
      • gcrk__ joined the channel
      • gcrk has quit
      • Protab joined the channel
      • Rotab has quit
      • gcrk__ has quit
      • MRiddickW has quit
      • gcrk joined the channel
      • ritiek has quit
      • lucifer
        ruaok: morning. hi! do you know some spark sql?
      • ruaok
        moin. some. never actually written a query, so probably not too helpful.
      • lucifer
        ah ok. i wanted to get a sanity check on my artist similarity query.
      • ruaok
        I can try.
      • lucifer
      • explode is equivalent to unnest of postgres.
      • I wanted to put the explode column in the FROM or group by but it isn't supported there so had to create a temp table.
      • ruaok: i think i found a bug in dataframe generation. the where clause looks wrong. https://github.com/metabrainz/listenbrainz-serv...
      • ruaok
        I wonder if such a bug would affect the user similarity stuff.
      • ruaok is digesting the query above
      • lucifer
        firstly, i am not sure how this even works with where; count is an aggregate, so it should have been having. anyway, moving on: this is comparing each row with the threshold, whereas it should be comparing the sum of all listens of a user.
      • ruaok
        can we chat about the first query before we dive into this bug?
      • lucifer
        sure
      • ruaok
        k. so the temp table is only there to expand the arguments we pass to the query?
      • lucifer
        yes
      • ruaok
        and then the rest seems to be equivalent to a standard PG query that does a count/group by.
      • `dense_rank() over(order by user_name) as user_id`
      • is the switch from user_name to user_id intentional here?
      • lucifer
        yes, we want to assign each user_name and mbid a numeric index that can be used as index in the matrix.
      • ruaok
        k
      • ah, I see how. the dense rank function creates these artist id and user ids suitable for feeding to spark.
      • yes, that looks sane to me.
      • lucifer
        cool, thanks. i forgot to add threshold to this, currently working on that.
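
Editor's sketch of the query shape discussed above, in plain Python rather than Spark SQL: explode the per-listen artist list, count per (user, artist) pair, then assign dense-rank ids usable as matrix indices. The sample data and the names `listens`, `dense_rank` and `matrix_rows` are assumptions for illustration; the actual query was shared as a paste that is not in this log.

```python
from collections import Counter

# Hypothetical input: (user_name, list of artist MBIDs credited on the listen).
listens = [
    ("rob", ["artist-b", "artist-a"]),
    ("alice", ["artist-a"]),
    ("rob", ["artist-a"]),
]

# "explode" (unnest in Postgres): one row per (user, artist) pair.
exploded = [(user, artist) for user, artists in listens for artist in artists]

# count(*) ... group by user_name, artist
counts = Counter(exploded)


def dense_rank(values):
    """Map each distinct value to a 1-based rank in sorted order with no gaps."""
    return {v: i for i, v in enumerate(sorted(set(values)), start=1)}


# Equivalent in spirit to dense_rank() over (order by user_name) as user_id.
user_ids = dense_rank(u for u, _ in counts)
artist_ids = dense_rank(a for _, a in counts)

# (user_id, artist_id, listen_count) triples, ready to index into a matrix.
matrix_rows = sorted((user_ids[u], artist_ids[a], n) for (u, a), n in counts.items())
print(matrix_rows)  # [(1, 1, 1), (2, 1, 2), (2, 2, 1)]
```
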
      • ruaok
        hmm. is this for artist similarities or for user similarities?
      • lucifer
        user similarities.
      • ruaok
        "artist similarity query." for user similarities, I figure.
      • k
      • yes, looks fully sane.
      • now lets look at the dataframe issue.
      • lucifer
        yes
      • ruaok
        because I wonder if fixing that might have an impact on the user similarities based on recordings.
      • lucifer
        i think it should, my understanding is that currently it is ignoring all listen counts less than 50.
      • that is listen counts of a particular recording of a particular user
      • ruaok
        oh. and not total listen counts per user? because that was the goal and this would most certainly explain how things went to shit.
      • lucifer
        right
      • and same goes for recs
      • ruaok
        oh yes, that most certainly does what you say it does.
      • and it explains the top similar users and how borked it is.
      • lucifer
        right, i checked some top similar users, they don't have similar recordings, but rather lots of recordings listened to a large number of times.
      • also explains why i am many times in top similar users :)
      • ruaok
        exactly that.
      • phew, finally an explanation for it.
      • lets fix this query and re-run user similarities.
      • lucifer
        yup, on it
      • ruaok
        thanks
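
Editor's sketch of the bug described above, in plain Python rather than Spark SQL. The rows, the `>=` comparison and the function names are assumptions for illustration; only the threshold value of 50 and the per-row versus per-user distinction come from the conversation.

```python
from collections import defaultdict

THRESHOLD = 50  # value mentioned in the chat

# Hypothetical rows: (user_name, recording, listen_count for that recording).
rows = [
    ("alice", "rec-1", 30),
    ("alice", "rec-2", 30),
    ("bob", "rec-1", 60),
    ("bob", "rec-3", 5),
    ("carol", "rec-2", 10),
]


def filter_per_row(rows, threshold):
    """Buggy behaviour: compare each (user, recording) count with the threshold."""
    return [r for r in rows if r[2] >= threshold]


def filter_per_user(rows, threshold):
    """Intended behaviour: keep every row of users whose total listens reach the threshold."""
    totals = defaultdict(int)
    for user, _, count in rows:
        totals[user] += count
    return [r for r in rows if totals[r[0]] >= threshold]


print(filter_per_row(rows, THRESHOLD))   # only bob's heavily played recording survives
print(filter_per_user(rows, THRESHOLD))  # all rows of alice and bob survive, carol is dropped
```

With the per-row filter, only recordings that a single user played many times remain, which would fit the observation that the top similar users shared heavily repeated recordings rather than similar taste.
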
      • Etua joined the channel
      • gcrk: ping
      • gcrk
        ruaok, pong
      • ruaok
        hey, have you started using that API to lookup your tracks?
      • gcrk
        hi :) Didn't get to it yet
      • ruaok
        ok, thanks. then I need to hunt down a problem we're having. :)
      • gcrk
        I am wondering where to build something around it
      • you are talking about mbid lookup?
      • ruaok
        yes.
      • something has made that endpoint busy or angry. not sure what yet
      • gcrk
        I was considering trying to hack a lookup function into listenbrainz itself, so if a listen does not have a mbid I can just hit "lookup mbid" and get there
      • its probably the easiest way to play around with it
      • On the other side something like a pull function into funkwhale would be neat, but i have a long list of todos there
      • ruaok
        depends on what your goal is.
      • I think the first thing is to figure out what works -- then we can figure out how to deploy it.
      • gcrk
        oh I expected this is already somehow stable?
      • ruaok
        everything is still in flux, to varying degrees.
      • the APIs we expose on labs.api.listenbrainz.org are stable, but undocumented.
      • meaning that we haven't committed to scaling them yet.
      • and I see that the very powerful typesense lookup we offer is not likely going to scale very well on our servers.
      • so, we'll likely need to move to lookup matches from our MBID mapping that we're working on.
      • gcrk
        So my whole motivation is to get to this recommendation stuff, which obviously needs good listening data. so the fact funkwhale reports listens without mbid seems to be a problem to work on first
      • ruaok
        very much agreed.
      • then, somehow, make it so that our recommendation engine can make effective recommendations without being a privacy nightmare.
      • gcrk
        I wont be any help in this regard :(
      • Just because I have no idea about these algorithms :D
      • ruaok
        no need for you to understand that part -- we've got the knowledge for that.
      • we need candidate sets to recommend effectively.
      • a candidate set is a list of track that can be recommended for any given user.
      • *tracks
      • gcrk
        So yeah, let's dump my current state of mind before lunch: I think the community of funkwhale would get angry if funkwhale starts changing the metadata without review. So we would need some kind of suggestions where users can simply hit "take over this data"
      • in the best case its only the mbid
      • ruaok
        very much agreed.
      • because this process can go wrong. we see this all the time when people put too much blind faith into picard.
      • load 10,000 albums and lookup/then save.
      • then we get an email "YOU DESTROYED MY MUSIC COLLECTION". 🙄
      • tandy[m]
        <ruaok "then we get an email "YOU DESTRO"> lol
      • gcrk
        And besides the technical problems with that I don't think its something we should encourage. it drives away users from their libraries and we want to raise the value of music, not follow this spotify shit where its only about consuming whatever appeals
      • tandy[m]
        i wish beets had more man power behind it
      • picard is doing really well tho, compared to when i used to use it in 2019
      • ruaok
        gcrk: <3 I love how you think. matches my desires perfectly.
      • gcrk
        its a little bit stolen from the beets website I think, I just added some spotify hate
      • ruaok
        I'm glad you turned up here -- I think funkwhale could be our killer app for our recommendations.
      • people who have FW installations are very much the music nerds we want.
      • gcrk
        I actually think we should make this a plugin though
      • so each user can decide to "leak" the listens and get recommendations in exchange, or not
      • ruaok
        yeah, I am not even thinking about the implementation details yet.
      • there are too many major privacy issues to deal with first.
      • gcrk
        well, its good to have some considerations though to prepare the stuff
      • ruaok
        everything MUST be opt-in.
      • gcrk
        anyway, lets stop pretending to work and get some lunch. see ya! :)
      • ruaok
        bye!
      • Etua has quit
      • yvanzo: bitmap: we've had some emails over the weekend about CAA being borked. I looked at the RMQ graph and it seems sane. are things working correctly then?
      • lucifer: the typesense server freaked out and couldn't be restarted. I upgraded to 0.20.0 and rebuilt the index. lookup speed has drastically improved now. now around 80 reqs/s. :) :)
      • lucifer
        awesome :DD
      • BrainzGit
        [critiquebrainz] dependabot-preview[bot] opened pull request #366 (master…dependabot/npm_and_yarn/codemirror-5.62.0): [Security] Bump codemirror from 5.45.0 to 5.62.0 https://github.com/metabrainz/critiquebrainz/pu...
      • [critiquebrainz] dependabot-preview[bot] closed pull request #362 (master…dependabot/npm_and_yarn/codemirror-5.61.1): [Security] Bump codemirror from 5.45.0 to 5.61.1 https://github.com/metabrainz/critiquebrainz/pu...
      • gcrk has quit
      • lucifer
        ruaok: we didn't add a page for missing musicbrainz data yet, right? is that a part of jasondk's gsoc project?
      • ruaok
        correct. no.
      • lucifer
        cool, i'll take a stab at it later this week then.
      • ruaok
        where is the superman emoji??
      • lucifer
        this one? 🦸
      • ruaok
        ahhh, yes.
      • !m 🦸 lucifer
      • BrainzBot
        You're doing good work, 🦸 lucifer!
      • lucifer
        lol :D
      • ruaok
        hmm. I'm not one to suggest that we imitate amazon's corporate culture, but this is interesting: https://www.justingarrison.com/blog/2021-03-15-...
      • we do this for releases and it has made our complicated releases much much more successful. and we create docs for hackdays and more complicated features.
      • ritiek joined the channel
      • alastairp
        OK
      • conference reviews finished
      • ruaok: I especially like the ability to go back to a document and see why we decided to do something. We've found a few things that seem like stupid decisions we made, but no paper trail of why we originally thought it was a good idea
      • I've started trying to add a bit more detail to these documents that we've been making, I guess time will tell as to if they were useful or not
      • Mineo has quit
      • lucifer
      • ok that quickly went out of hand :(
      • ruaok: ^ artist query with thresholding.
      • i think the solution for recordings will be similar (it's using self joins though, why?), will need to wrap the query in a subquery to add a where clause.
      • Mineo joined the channel
      • actually, ignore that query, it'll mess up the artist_id stuff, let me rethink it.
      • MRiddickW joined the channel
      • alastairp
        lucifer: hi, were we planning on doing an AB discussion, or did we do that on Friday?
      • lucifer
        alastairp: yeah, we had discussed the python3 migration briefly on friday. between then and now i ran futurize tool on the ab codebase and it seems only minimal changes are required to run on python 3.
      • ruaok
        `mbids = [x for x in mbids if x]`
      • alastairp
        amazing, sounds good
      • ruaok
        that reads funny, but is awesome. :)
      • alastairp
        ruaok: to skip '' and None ?
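
Editor's sketch of what that comprehension does, using made-up placeholder strings instead of real MBIDs: it keeps only truthy items, so both `''` and `None` are dropped.

```python
# Placeholder values, not real MBIDs.
mbids = ["mbid-1", "", None, "mbid-2"]

# Keep only truthy entries; '' and None are both falsy in Python.
mbids = [x for x in mbids if x]

print(mbids)  # ['mbid-1', 'mbid-2']
```

Note that the same test would also drop other falsy values such as `0` or an empty list, which is harmless for a list that only holds strings and `None`.
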