go and run a few thousand tracks through it and get an idea yourself.
2021-06-16 16734, 2021
gcrk
I'll do :) thanks for the pointers
2021-06-16 16702, 2021
MRiddickW has quit
2021-06-16 16735, 2021
ruaok
np
2021-06-16 16735, 2021
Mineo has quit
2021-06-16 16757, 2021
lucifer
ruaok: so is there anything else we can do to speed up this mbid mapping process? I'd be happy to help with it.
2021-06-16 16738, 2021
ruaok
I think there are other more important things to work on. remember that we don't get direct benefit from matching anything that is more than 2 years old.
2021-06-16 16747, 2021
ruaok
I guess that is good for stats, maybe.
2021-06-16 16713, 2021
ruaok
if you're bored, let's start planning out how to update spark to not use MSIDs.
2021-06-16 16728, 2021
ruaok
and get better artist similarity going.
2021-06-16 16729, 2021
lucifer
sure.
2021-06-16 16751, 2021
ruaok
I'll keep on making this mapping better for at least another couple of days. but then it's just a matter of letting it rip.
2021-06-16 16736, 2021
lucifer
makes sense
2021-06-16 16742, 2021
Mineo joined the channel
2021-06-16 16748, 2021
lucifer
regarding getting listens to spark in real time, one issue is file size. too small parquet files are inefficient for spark. so we could keep stuff in memory till ~100MB before writing but then we risk losing that 100MB of listens in spark if something goes wrong before they are written to disk.
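The trade-off described here (buffer listens until a size threshold, then write one large file) can be sketched roughly as below. This is only an illustration, not the ListenBrainz code: the `ListenBuffer` class, the `flush` callback and the JSON-length size estimate are all assumptions, and a real writer would produce a parquet file inside the callback.

```python
import json
from typing import Callable, Dict, List


class ListenBuffer:
    """Hold listens in memory and hand them over in one batch once a size threshold is hit."""

    def __init__(self, flush: Callable[[List[Dict]], None],
                 max_bytes: int = 100 * 1024 * 1024):
        # flush is a hypothetical callback that would write one parquet file.
        self.flush_callback = flush
        self.max_bytes = max_bytes
        self.pending: List[Dict] = []
        self.pending_bytes = 0

    def add(self, listen: Dict) -> None:
        # Approximate the size of a listen by its JSON-encoded length.
        self.pending.append(listen)
        self.pending_bytes += len(json.dumps(listen).encode("utf-8"))
        if self.pending_bytes >= self.max_bytes:
            self.flush()

    def flush(self) -> None:
        if not self.pending:
            return
        # Anything still in self.pending is lost if the process dies
        # before this call returns; that is the risk mentioned above.
        self.flush_callback(self.pending)
        self.pending = []
        self.pending_bytes = 0


# Usage with a tiny threshold so the batching is visible.
written: List[List[Dict]] = []
buffer = ListenBuffer(flush=written.append, max_bytes=200)
for i in range(10):
    buffer.add({"user": "example", "track": f"track {i}", "listened_at": 1623800000 + i})
buffer.flush()  # write whatever is left over
```

A larger threshold gives fewer, bigger files but widens the window in which unwritten listens can be lost; persisting the pending batch somewhere durable first would be one way to narrow it.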
2021-06-16 16738, 2021
ruaok
I think even getting listens into spark in real time is not all that important now that we've got the dumps/imports working well.
2021-06-16 16752, 2021
ruaok
let's work on making use of the stable services we have now.
2021-06-16 16719, 2021
lucifer
yes, makes sense. so the current aim is to replace msids with mbids in spark?
2021-06-16 16726, 2021
ruaok
yes.
2021-06-16 16732, 2021
ruaok
rip out all the MSID->MBID mapping.
2021-06-16 16749, 2021
ruaok
and we'll need to adjust the spark dumps to fetch and output MBIDs.
2021-06-16 16710, 2021
ruaok
I think that may just need to be a custom dump, and not a transmogrification.
2021-06-16 16736, 2021
ruaok
in fact, we could make the spark dumps smaller by only dumping the fields that actually get consumed in spark.
2021-06-16 16741, 2021
lucifer
so why not just dump the last 2 years data for spark.
2021-06-16 16748, 2021
ruaok
IIRC there is too much data in spark that spark never looks at.
2021-06-16 16707, 2021
ruaok
if we dump only 2 years then the "all times" stats are not correct.
2021-06-16 16718, 2021
ruaok
but I see the allure in what you're suggesting.
2021-06-16 16719, 2021
lucifer
ah right, forgot about that.
2021-06-16 16713, 2021
ruaok
that said, we may not want to move the stats stuff to MBIDs yet.
2021-06-16 16719, 2021
ruaok
just the recommendation stuff.
2021-06-16 16753, 2021
lucifer
i see, so we'll have 2 dumps for spark.
2021-06-16 16749, 2021
ruaok
I didn't suggest that, but I can't rule that out either.
2021-06-16 16721, 2021
ruaok
I think we should carefully look at what fields spark uses and then custom tailor the dumps to only fetch the used fields from the DB and make smaller customized dumps for spark.
2021-06-16 16701, 2021
ruaok
which I think will actually be faster than transmogrifying. transmogrifying is far slower than I had hoped it would be
artist/release/track_msid/mbids/name are all used in respective stats at least.
2021-06-16 16728, 2021
ruaok
ah, I see we already did this trick. alas.
2021-06-16 16737, 2021
kepstin has quit
2021-06-16 16737, 2021
JuniorJPDJ has quit
2021-06-16 16737, 2021
yyoung[m] has quit
2021-06-16 16738, 2021
akshaaatt[m] has quit
2021-06-16 16738, 2021
tandy[m] has quit
2021-06-16 16743, 2021
elomatreb[m] has quit
2021-06-16 16755, 2021
elomatreb[m] joined the channel
2021-06-16 16701, 2021
yyoung[m] joined the channel
2021-06-16 16701, 2021
kepstin joined the channel
2021-06-16 16701, 2021
JuniorJPDJ joined the channel
2021-06-16 16712, 2021
tandy[m] joined the channel
2021-06-16 16712, 2021
akshaaatt[m] joined the channel
2021-06-16 16752, 2021
CatQuest
aw man ruaok I knew you'd be exited about funkwhale the minute i saw the guy and outsidecontext talk in #musicbrainz :D
2021-06-16 16709, 2021
CatQuest
excited *
2021-06-16 16751, 2021
ruaok
yeah, me too. :)
2021-06-16 16730, 2021
ritiek has quit
2021-06-16 16757, 2021
ritiek joined the channel
2021-06-16 16717, 2021
gcrk
ruaok, I just enabled the scrobbling to music brainz for me
2021-06-16 16729, 2021
gcrk
I think I am already experiencing issues since my tracks don't always have mbids
2021-06-16 16727, 2021
outsidecontext
gcrk: what issues do you have? The listens should still get submitted, just without IDs
2021-06-16 16745, 2021
BrainzGit
[critiquebrainz] dependabot-preview[bot] opened pull request #365 (master…dependabot/npm_and_yarn/postcss-7.0.36): [Security] Bump postcss from 7.0.14 to 7.0.36 https://github.com/metabrainz/critiquebrainz/pull…
2021-06-16 16709, 2021
gcrk
outsidecontext, yeah right, it gets submitted but when I click the track it leads nowhere
That's odd, the listens shouldn't be linked at all I think. Or did that change recently?
2021-06-16 16734, 2021
gcrk
gcrkrause
2021-06-16 16741, 2021
ruaok
they should be linked if an MBID is present.
2021-06-16 16703, 2021
ruaok wonders whether gcrk is German
2021-06-16 16707, 2021
gcrk
maybe funkwhale does report an mbid but it's "None"
2021-06-16 16727, 2021
gcrk
ruaok, let's just say I can speak German :)
2021-06-16 16745, 2021
ruaok
ah, .at or .ch?
2021-06-16 16705, 2021
gcrk
I am living in DE but I wouldn't identify as being "German"
2021-06-16 16750, 2021
ruaok
:)
2021-06-16 16755, 2021
monkey
Hm, the links are working for me and redirect to the correct MusicBrainz page
2021-06-16 16700, 2021
ruaok
ok, listens are arriving, which is the most important thing.
2021-06-16 16715, 2021
ruaok
yeah, for me too.
2021-06-16 16716, 2021
ruaok
huh.
2021-06-16 16718, 2021
gcrk
monkey, I deleted the wrong ones since I was suspicious they slow down the page
2021-06-16 16725, 2021
monkey
Ahaa
2021-06-16 16751, 2021
gcrk
let's see if I can generate new ones
2021-06-16 16731, 2021
monkey
In any case, there shouldn't be a link if there's no MBID. If it redirects you to "…/recording/None", then I suspect FW is sending the MBID as the string "None".
2021-06-16 16731, 2021
monkey
Could that be possible?
2021-06-16 16717, 2021
gcrk
monkey, sure, that's quite likely
2021-06-16 16720, 2021
outsidecontext
I also would have suspected they get submitted that way. But that would mean they are actually as a string "None" in FW already. The submission plugin checks if they are non-empty: https://dev.funkwhale.audio/funkwhale/funkwhale/-…
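As a generic illustration of why a non-empty check alone would not catch this: in Python, converting a missing value to text yields the four-character string "None", which is truthy. The sketch below does not reproduce the Funkwhale plugin; `to_text` and `passes_non_empty_check` are hypothetical helpers used only to show the pitfall.

```python
from typing import Optional


def to_text(mbid: Optional[str]) -> str:
    # Hypothetical conversion step: str(None) produces the literal text "None".
    return str(mbid)


def passes_non_empty_check(value: str) -> bool:
    # A plain truthiness test, as a non-empty check would typically be written.
    return bool(value)


stringified = to_text(None)
print(repr(stringified))                     # 'None'
print(passes_non_empty_check(stringified))   # True
```

Whether any component in this chain actually performs such a conversion is exactly what remains to be found out.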
2021-06-16 16707, 2021
outsidecontext
the plugin could do a validity check if it is a proper MBID. but I think that this actually should rather be done by the server, monkey ?
2021-06-16 16751, 2021
monkey
I'm well versed in the front-end side of things, less so on the server-side ingestion :) Passing the hot potato to ruaok !
2021-06-16 16715, 2021
gcrk
I think we should make sure we don't send None there on the Funkwhale side
gcrk: if FW already has this as a string "None" that should be investigated, as it looks like either a bug in FW when reading the data or your files already had the MBID tags with a literal value of "None", which would indicate a bug in the tool which wrote them
2021-06-16 16740, 2021
monkey
This is the listen in question
2021-06-16 16707, 2021
gcrk
hm, funkwhale does not have "None" stored there
2021-06-16 16724, 2021
ruaok
lucifer: oy. we have a problem with listen mbid validation. ^^ Can you please hack up a PR for that?
2021-06-16 16748, 2021
lucifer
ruaok: sure
2021-06-16 16753, 2021
ruaok
thx
2021-06-16 16755, 2021
lucifer reads the backlog
2021-06-16 16710, 2021
outsidecontext
gcrk: can you check what it has stored for the file in question?
2021-06-16 16746, 2021
gcrk
outsidecontext, not really, but I can see what is stored in the funkwhale database with the django backend
2021-06-16 16754, 2021
gcrk
and mbid seems to be empty
2021-06-16 16743, 2021
gcrk
Not sure though if the django admin displays "None" as simply an empty field
2021-06-16 16737, 2021
lucifer
ruaok: i see. so the intended fix is to validate the mbid/msid in payloads and drop them if those are invalid?
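A minimal sketch of the kind of check being discussed, assuming MBIDs are expected in canonical hyphenated UUID form. The helper names `clean_mbid` and `drop_invalid_mbids` are hypothetical and this is not the actual ListenBrainz validation code; only the `recording_mbid` key comes from the discussion.

```python
import uuid
from typing import Optional, Tuple


def clean_mbid(value: object) -> Optional[str]:
    """Return the MBID in canonical lowercase form, or None if it is not a valid UUID."""
    if not isinstance(value, str):
        return None
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        # Covers strings such as "None" or "" that are not UUIDs at all.
        return None
    canonical = str(parsed)
    # uuid.UUID also accepts braces, a urn: prefix and missing hyphens;
    # reject those so only the canonical hyphenated form gets through.
    return canonical if canonical == value.lower() else None


def drop_invalid_mbids(additional_info: dict,
                       keys: Tuple[str, ...] = ("recording_mbid",)) -> dict:
    """Return a copy of additional_info without MBID fields that fail validation."""
    cleaned = dict(additional_info)
    for key in keys:
        if key in cleaned and clean_mbid(cleaned[key]) is None:
            del cleaned[key]
    return cleaned


print(drop_invalid_mbids({"recording_mbid": "None", "source": "example"}))
```

Dropping the bad field keeps the listen itself; rejecting the whole submission would be the stricter alternative.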
you asked if I had indexes... I do on recording_msid. I tried a compound index on recording_msid and match_type and a single index on match_type, but the materialize section didn't improve.
2021-06-16 16739, 2021
outsidecontext
gcrk, ruaok: I just tested on my funkwhale instance with freshly uploaded files without MBID. No "None", and the files are unlinked, no "None" here
2021-06-16 16707, 2021
gcrk
which funkwhale version?
2021-06-16 16727, 2021
outsidecontext
the 1.1.2 release version
2021-06-16 16752, 2021
outsidecontext
and if I understand the code lucifer linked to correctly, submission would actually fail if FW sent "recording_mbid": "None"
2021-06-16 16707, 2021
outsidecontext
but this all makes the issue just even stranger :(
2021-06-16 16724, 2021
gcrk
my database stores an empty value
2021-06-16 16726, 2021
gcrk
I just checked
2021-06-16 16734, 2021
outsidecontext
as the developer of the client plugin I blame the server :p
2021-06-16 16707, 2021
ruaok
as the server developer lucifer objects.
2021-06-16 16717, 2021
ruaok
if I may speak for lucifer. :)
2021-06-16 16741, 2021
monkey
ruaok: The screenshot is from the front-end, in the developer console (with react devtools)
2021-06-16 16720, 2021
outsidecontext
yes, the code looks solid on the server too. so overall this issue should be impossible, those are the worst
2021-06-16 16742, 2021
outsidecontext
gcrk: if debug logging is enabled the plugin actually should log the request data as something like "ListenBrainz single: { thepayload }", maybe you could check that
2021-06-16 16756, 2021
lucifer
ruaok: i wish i could object but this `select * from listen where data->'track_metadata'->'additional_info'->>'recording_mbid'::text = 'None';` returned 4972 rows