#metabrainz

      • ruaok
        it's stable, but not officially supported right now -- it's not scalable.
      • 2021-06-16 16733, 2021

      • ruaok
        meaning that we can't deploy it to 1000s of users.
      • 2021-06-16 16757, 2021

      • gcrk
        reliable in the sense of "how likely do I get the right result"?
      • 2021-06-16 16704, 2021

      • ruaok
        but we're working on a project where we're going back through 443M listens and matching them to MBIDs.
      • 2021-06-16 16725, 2021

      • ruaok
        gcrk: I won't speculate on that -- it really depends on the quality of the metadata you feed it.
      • 2021-06-16 16703, 2021

      • ruaok
        as you can see you can use any of these endpoints as an API: https://labs.api.listenbrainz.org/recording-searc…
      • 2021-06-16 16725, 2021

      • ruaok
        go and run a few thousand tracks through it and get an idea yourself.
      • 2021-06-16 16734, 2021

      • gcrk
        I'll do :) thanks for the pointers
      • 2021-06-16 16702, 2021

      • MRiddickW has quit
      • 2021-06-16 16735, 2021

      • ruaok
        np
      • 2021-06-16 16735, 2021

      • Mineo has quit
      • 2021-06-16 16757, 2021

      • lucifer
        ruaok: so is there anything else we can do to speed up this mbid mapping process? I'd be happy to help with it.
      • 2021-06-16 16738, 2021

      • ruaok
        I think there are other more important things to work on. remember that we don't get direct benefit from matching anything that is more than 2 years old.
      • 2021-06-16 16747, 2021

      • ruaok
        I guess that is good for stats, maybe.
      • 2021-06-16 16713, 2021

      • ruaok
        if you're bored, let's start planning out how to update spark to not use MSIDs.
      • 2021-06-16 16728, 2021

      • ruaok
        and get better artist similarity going.
      • 2021-06-16 16729, 2021

      • lucifer
        sure.
      • 2021-06-16 16751, 2021

      • ruaok
        I'll keep on making this mapping better for at least another couple of days. but then it's just a matter of letting it rip.
      • 2021-06-16 16736, 2021

      • lucifer
        makes sense
      • 2021-06-16 16742, 2021

      • Mineo joined the channel
      • 2021-06-16 16748, 2021

      • lucifer
        regarding getting listens to spark in real time, one issue is file size. too small parquet files are inefficient for spark. so we could keep stuff in memory till ~100MB before writing but then we risk losing that 100MB of listens in spark if something goes wrong before they are written to disk.
      • 2021-06-16 16738, 2021
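
The buffering trade-off lucifer describes could be sketched roughly as follows. This is only an illustration: `ListenBuffer`, its `flush` callback and the JSON-based size estimate are hypothetical names and choices, not part of the ListenBrainz or Spark code. Whatever is still pending in memory when the process dies is lost, which is exactly the risk mentioned above.

```python
import json
from typing import Callable, Dict, List


class ListenBuffer:
    """Accumulate listens in memory and flush them as one batch
    once an approximate size threshold is reached (hypothetical helper)."""

    def __init__(self, flush: Callable[[List[Dict]], None],
                 max_bytes: int = 100 * 1024 * 1024):
        self._flush = flush          # e.g. a function that writes one parquet file
        self._max_bytes = max_bytes  # ~100MB by default, as suggested in the chat
        self._items: List[Dict] = []
        self._size = 0

    def add(self, listen: Dict) -> None:
        self._items.append(listen)
        # Rough size estimate: length of the JSON encoding (an assumption).
        self._size += len(json.dumps(listen).encode("utf-8"))
        if self._size >= self._max_bytes:
            self.flush()

    def flush(self) -> None:
        if not self._items:
            return
        batch, self._items, self._size = self._items, [], 0
        self._flush(batch)

    @property
    def pending(self) -> int:
        """Number of listens that would be lost on a crash right now."""
        return len(self._items)
```

A lower threshold shrinks the window of possible loss but produces more, smaller files, which is the inefficiency the threshold is meant to avoid.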

      • ruaok
        I think even getting listens into spark in real time is not all that important now that we've got the dumps/imports working well.
      • 2021-06-16 16752, 2021

      • ruaok
        let's work on making use of the stable services we have now.
      • 2021-06-16 16719, 2021

      • lucifer
        yes, makes sense. so the current aim is to replace msids with mbids in spark?
      • 2021-06-16 16726, 2021

      • ruaok
        yes.
      • 2021-06-16 16732, 2021

      • ruaok
        rip out all the MSID->MBID mapping.
      • 2021-06-16 16749, 2021

      • ruaok
        and we'll need to adjust the spark dumps to fetch and output MBIDs.
      • 2021-06-16 16710, 2021

      • ruaok
        I think that may just need to be a custom dump, and not a transmogrification.
      • 2021-06-16 16736, 2021

      • ruaok
        in fact, we could make the spark dumps smaller by only dumping the fields that actually get consumed in spark.
      • 2021-06-16 16741, 2021

      • lucifer
        so why not just dump the last 2 years data for spark.
      • 2021-06-16 16748, 2021

      • ruaok
        IIRC there is too much data in spark that spark never looks at.
      • 2021-06-16 16707, 2021

      • ruaok
        if we dump only 2 years then the "all times" stats are not correct.
      • 2021-06-16 16718, 2021

      • ruaok
        but I see the allure in what you're suggesting.
      • 2021-06-16 16719, 2021

      • lucifer
        ah right, forgot about that.
      • 2021-06-16 16713, 2021

      • ruaok
        that said, we may not want to move the stats stuff to MBIDs yet.
      • 2021-06-16 16719, 2021

      • ruaok
        just the recommendation stuff.
      • 2021-06-16 16753, 2021

      • lucifer
        i see so we'll be having 2 dumps for spark.
      • 2021-06-16 16749, 2021

      • ruaok
        I didn't suggest that, but I can't rule that out either.
      • 2021-06-16 16721, 2021

      • ruaok
        I think we should carefully look at what fields spark uses and then custom tailor the dumps to only fetch the used fields from the DB and make smaller customized dumps for spark.
      • 2021-06-16 16701, 2021

      • ruaok
        which I think will actually be faster than transmogrifying. transmogrifying is far slower than I had hoped it would be
      • 2021-06-16 16752, 2021

      • lucifer
        makes sense. i'll see what fields spark is using.
      • 2021-06-16 16729, 2021

      • ruaok
        great.
      • 2021-06-16 16751, 2021

      • lucifer
        only the tags field is unused.
      • 2021-06-16 16717, 2021

      • lucifer
        artist/release/track_msid/mbids/name are all used in respective stats at least.
      • 2021-06-16 16728, 2021

      • ruaok
        ah, I see we already did this trick. alas.
      • 2021-06-16 16737, 2021

      • kepstin has quit
      • 2021-06-16 16737, 2021

      • JuniorJPDJ has quit
      • 2021-06-16 16737, 2021

      • yyoung[m] has quit
      • 2021-06-16 16738, 2021

      • akshaaatt[m] has quit
      • 2021-06-16 16738, 2021

      • tandy[m] has quit
      • 2021-06-16 16743, 2021

      • elomatreb[m] has quit
      • 2021-06-16 16755, 2021

      • elomatreb[m] joined the channel
      • 2021-06-16 16701, 2021

      • yyoung[m] joined the channel
      • 2021-06-16 16701, 2021

      • kepstin joined the channel
      • 2021-06-16 16701, 2021

      • JuniorJPDJ joined the channel
      • 2021-06-16 16712, 2021

      • tandy[m] joined the channel
      • 2021-06-16 16712, 2021

      • akshaaatt[m] joined the channel
      • 2021-06-16 16752, 2021

      • CatQuest
        aw man ruaok I knew you'd be exited about funkwhale the minute i saw the guy and outsidecontext talk in #musicbrainz :D
      • 2021-06-16 16709, 2021

      • CatQuest
        excited *
      • 2021-06-16 16751, 2021

      • ruaok
        yeah, me too. :)
      • 2021-06-16 16730, 2021

      • ritiek has quit
      • 2021-06-16 16757, 2021

      • ritiek joined the channel
      • 2021-06-16 16717, 2021

      • gcrk
        ruaok, I just enabled the scrobbling to music brainz for me
      • 2021-06-16 16729, 2021

      • gcrk
        I think I am already experiencing issues since my tracks don't always have mbids
      • 2021-06-16 16727, 2021

      • outsidecontext
        gcrk: what issues do you have? The listens should still get submitted, just without IDs
      • 2021-06-16 16745, 2021

      • BrainzGit
        [critiquebrainz] dependabot-preview[bot] opened pull request #365 (master…dependabot/npm_and_yarn/postcss-7.0.36): [Security] Bump postcss from 7.0.14 to 7.0.36 https://github.com/metabrainz/critiquebrainz/pull…
      • 2021-06-16 16709, 2021

      • gcrk
        outsidecontext, yeah right, it gets submitted but when I click the track it leads nowhere
      • 2021-06-16 16722, 2021

      • gcrk
        or throws an error
      • 2021-06-16 16749, 2021

      • ruaok
        monkey: ^^
      • 2021-06-16 16719, 2021

      • ruaok
        what is your user name on LB, gcrk ?
      • 2021-06-16 16724, 2021

      • outsidecontext
        That's odd, the listens shouldn't be linked at all I think. Or did that change recently?
      • 2021-06-16 16734, 2021

      • gcrk
        gcrkrause
      • 2021-06-16 16741, 2021

      • ruaok
        they should be linked if an MBID is present.
      • 2021-06-16 16703, 2021

      • ruaok wonders whether gcrk is German
      • 2021-06-16 16707, 2021

      • gcrk
        maybe funkwhale does report an mbid but it's "None"
      • 2021-06-16 16727, 2021

      • gcrk
        ruaok, let's just say I can speak German :)
      • 2021-06-16 16745, 2021

      • ruaok
        ah, .at or .ch?
      • 2021-06-16 16705, 2021

      • gcrk
        I am living in DE but i wouldn't identify with being "deutsch"
      • 2021-06-16 16750, 2021

      • ruaok
        :)
      • 2021-06-16 16755, 2021

      • monkey
        Hm, the links are working for me and redirect to the correct MusicBrainz page
      • 2021-06-16 16700, 2021

      • ruaok
        ok, listens are arriving, which is the most important thing.
      • 2021-06-16 16715, 2021

      • ruaok
        yeah, for me too.
      • 2021-06-16 16716, 2021

      • ruaok
        huh.
      • 2021-06-16 16718, 2021

      • gcrk
        monkey, I deleted the wrong ones since I was suspicious they slow down the page
      • 2021-06-16 16725, 2021

      • monkey
        Ahaa
      • 2021-06-16 16751, 2021

      • gcrk
        let's see if I can generate new ones
      • 2021-06-16 16731, 2021

      • monkey
        In any case, there shouldn't be a link if there's no MBID. If it redirects you to "…/recording/None", then I suspect FW is sending the MBID as the string "None".
      • 2021-06-16 16731, 2021

      • monkey
        Could that be possible?
      • 2021-06-16 16717, 2021

      • gcrk
        monkey, sure, that's quite likely
      • 2021-06-16 16720, 2021

      • outsidecontext
        I also would have suspected they get submitted that way. But that would mean they are actually stored as a string "None" in FW already. The submission plugin checks if they are non-empty: https://dev.funkwhale.audio/funkwhale/funkwhale/-…
      • 2021-06-16 16707, 2021

      • outsidecontext
        the plugin could do a validity check if it is a proper MBID. but I think that this actually should rather be done by the server, monkey ?
      • 2021-06-16 16751, 2021

      • monkey
        I'm well versed in the front-end side of things, less so on the server side ingestion:) Passing the hot potato to ruaok !
      • 2021-06-16 16715, 2021

      • gcrk
        I think we should make sure we don't send None there on the Funkwhale side
      • 2021-06-16 16751, 2021

      • BrainzGit
        [musicbrainz-server] reosarevok opened pull request #2147 (master…MBS-11724): MBS-11724: Fix typo https://github.com/metabrainz/musicbrainz-server/…
      • 2021-06-16 16754, 2021

      • ruaok
        too starchy, monkey. can you pass the jamon??
      • 2021-06-16 16732, 2021

      • monkey
        I confirm I'm seeing a "None" MBID now
      • 2021-06-16 16736, 2021

      • outsidecontext
        gcrk: if FW already has this as a string "None" that should be investigated, as it looks like either a bug in FW when reading the data or your files already had the MBID tags with a literal value of "None", which would indicate a bug in the tool which wrote them
      • 2021-06-16 16740, 2021

      • monkey
        This is the listen in question
      • 2021-06-16 16707, 2021

      • gcrk
        hm, funkwhale does not have "None" stored there
      • 2021-06-16 16724, 2021

      • ruaok
        lucifer: oy. we have a problem with listen mbid validation. ^^ Can you please hack up a PR for that?
      • 2021-06-16 16748, 2021

      • lucifer
        ruaok: sure
      • 2021-06-16 16753, 2021

      • ruaok
        thx
      • 2021-06-16 16755, 2021

      • lucifer reads the backlog
      • 2021-06-16 16710, 2021

      • outsidecontext
        gcrk: can you check what it has stored for the file in question?
      • 2021-06-16 16746, 2021

      • gcrk
        outsidecontext, not really, but I can see what is stored in the funkwhale database with the django backend
      • 2021-06-16 16754, 2021

      • gcrk
        and mbid seems to be empty
      • 2021-06-16 16743, 2021

      • gcrk
        Not sure though if the django admin displays "None" as simply an empty field
      • 2021-06-16 16737, 2021

      • lucifer
        ruaok: i see. so the intended fix is to validate the mbid/msid in payloads and drop them if those are invalid?
      • 2021-06-16 16751, 2021

      • ruaok
        yes
      • 2021-06-16 16705, 2021

      • lucifer
        👍 makes sense.
      • 2021-06-16 16711, 2021
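
The fix agreed on here (validate the MBID in a payload and drop it if invalid) might look something like the sketch below. It is not the listenbrainz-server code: `is_valid_mbid` and `drop_invalid_mbids` are hypothetical helpers, and the only field name taken from this discussion is `recording_mbid`.

```python
import uuid
from typing import Any, Dict, Tuple


def is_valid_mbid(value: Any) -> bool:
    """Return True only for a string in canonical hyphenated UUID form."""
    if not isinstance(value, str):
        return False
    try:
        # uuid.UUID also accepts braces, a urn: prefix and un-hyphenated hex,
        # so compare against the canonical rendering to reject those forms.
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


def drop_invalid_mbids(additional_info: Dict[str, Any],
                       keys: Tuple[str, ...] = ("recording_mbid",)) -> Dict[str, Any]:
    """Return a copy of additional_info without invalid MBID fields."""
    cleaned = dict(additional_info)
    for key in keys:
        if key in cleaned and not is_valid_mbid(cleaned[key]):
            del cleaned[key]
    return cleaned
```

With this, a payload carrying `"recording_mbid": "None"` would be stored without that field, so the front-end would have nothing to link.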

      • outsidecontext
        gcrk: but then https://dev.funkwhale.audio/funkwhale/funkwhale/-… should actually not set the MBID at all, that's odd
      • 2021-06-16 16713, 2021

      • ruaok
        well just mbid. msids are assigned by us
      • 2021-06-16 16721, 2021

      • gcrk
        outsidecontext, I think there is quite some magic serialization in between which can go wrong
      • 2021-06-16 16706, 2021

      • outsidecontext
        probably, yes. could be somewhere in between that None gets converted to a string.
      • 2021-06-16 16714, 2021

      • gcrk
        we can simply do something like "if track.album && track.album != None"
      • 2021-06-16 16730, 2021

      • gcrk
        but maybe it's better to fix the serializer
      • 2021-06-16 16742, 2021
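
On the client side, the guard gcrk suggests could be sketched as below. In Python, `str(None)` yields the string `"None"`, which is one way a missing value can turn into a literal "None" somewhere in a serialization chain. This is not the Funkwhale plugin's actual code; `mbid_or_none` and `build_additional_info` are hypothetical names used only for illustration.

```python
from typing import Any, Dict, Optional


def mbid_or_none(value: Any) -> Optional[str]:
    """Normalize a possibly-missing MBID: None, empty strings and the
    stringified "None" all count as missing."""
    if value is None:
        return None
    text = str(value).strip()
    if not text or text == "None":
        return None
    return text


def build_additional_info(recording_mbid: Any = None) -> Dict[str, str]:
    """Only include recording_mbid in the payload when a usable value exists."""
    info: Dict[str, str] = {}
    mbid = mbid_or_none(recording_mbid)
    if mbid is not None:
        info["recording_mbid"] = mbid
    return info
```

Filtering at the point where the payload is built only masks the symptom; fixing the serializer, as suggested above, would address the cause.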

      • Mineo has quit
      • 2021-06-16 16714, 2021

      • outsidecontext
        gcrk: or is there any chance this is using an older funkwhale? that issue actually was in the original version that made it into 1.1, see https://dev.funkwhale.audio/funkwhale/funkwhale/-…
      • 2021-06-16 16728, 2021

      • gcrk
        the patch you linked is part of 1.1
      • 2021-06-16 16734, 2021

      • outsidecontext
        ah, no. that fix was from before, yes
      • 2021-06-16 16750, 2021

      • Mineo joined the channel
      • 2021-06-16 16744, 2021

      • lucifer
        ruaok: we already have those checks.... https://github.com/metabrainz/listenbrainz-server…
      • 2021-06-16 16706, 2021

      • ruaok
        phew. but where did that None come from in the screenshot above?
      • 2021-06-16 16735, 2021

      • lucifer
        i checked, it's in the db as well.
      • 2021-06-16 16750, 2021

      • lucifer
        (the "None")
      • 2021-06-16 16711, 2021

      • ruaok
        monkey: where did that screenshot with None in it come from??
      • 2021-06-16 16723, 2021

      • ruaok
        thanks for checking, lucifer
      • 2021-06-16 16747, 2021

      • ruaok
        you asked if I had indexes... I do on recording_msid. I tried a compound index on recording_msid and match_type and a single index on match_type, but the materialize section didn't improve.
      • 2021-06-16 16739, 2021

      • outsidecontext
        gcrk, ruaok: I just tested on my funkwhale instance with freshly uploaded files without MBID. The files are unlinked, no "None" here
      • 2021-06-16 16707, 2021

      • gcrk
        which funkwhale version?
      • 2021-06-16 16727, 2021

      • outsidecontext
        the 1.1.2 release version
      • 2021-06-16 16752, 2021

      • outsidecontext
        and if I understand the code lucifer linked to correctly, submission would actually fail if FW sent "recording_mbid": "None"
      • 2021-06-16 16707, 2021

      • outsidecontext
        but this all makes the issue just even stranger :(
      • 2021-06-16 16724, 2021

      • gcrk
        my database stores an empty value
      • 2021-06-16 16726, 2021

      • gcrk
        I just checked
      • 2021-06-16 16734, 2021

      • outsidecontext
        as the developer of the client plugin I blame the server :p
      • 2021-06-16 16707, 2021

      • ruaok
        as the server developer lucifer objects.
      • 2021-06-16 16717, 2021

      • ruaok
        if I may speak for lucifer. :)
      • 2021-06-16 16741, 2021

      • monkey
        ruaok: The screenshot is from the front-end, in the developer console (with react devtools)
      • 2021-06-16 16720, 2021

      • outsidecontext
        yes, the code looks solid for the server as well. so overall this issue is impossible to happen, those are the worst
      • 2021-06-16 16742, 2021

      • outsidecontext
        gcrk: if debug logging is enabled the plugin actually should log the request data as something like "ListenBrainz single: { thepayload }", maybe you could check that
      • 2021-06-16 16756, 2021

      • lucifer
        ruaok: i wish i could object but this `select * from listen where data->'track_metadata'->'additional_info'->>'recording_mbid'::text = 'None';` returned 4972 rows
      • 2021-06-16 16731, 2021

      • ruaok
        whoops.
      • 2021-06-16 16759, 2021

      • lucifer
        looking further into it.
      • 2021-06-16 16710, 2021

      • gcrk
        outsidecontext, just DEBUG=true?