Always prefer the submitted MBIDs when available, otherwise use the mapped ones.
and I think that should be made consistent everywhere, unless there is a clear case for it to be different.
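A minimal sketch of that preference rule, assuming each listen carries an optional user-submitted MBID and an optional mapped MBID. The function and argument names are hypothetical, not the actual ListenBrainz code.

```python
from typing import Optional


def pick_mbid(submitted: Optional[str], mapped: Optional[str]) -> Optional[str]:
    """Prefer the user-submitted MBID; fall back to the mapped one.

    Hypothetical helper: names are illustrative, not ListenBrainz's schema.
    """
    # Treat an empty string the same as a missing value.
    if submitted:
        return submitted
    return mapped or None
```

Keeping the rule in one helper like this is one way to make it consistent everywhere, as suggested above.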
lucifer
makes sense. i agree.
alastairp
at any point should we compare user-submitted MBIDs against the metadata somehow?
as a way of trying to determine if the mbid makes sense or is garbage
ruaok
likely.
lucifer
not sure how we would do that but makes sense to do it.
ruaok
I'm kinda taking this as an opportunity to get more user involvement. If your MBIDs suck, do better.
alastairp
lucifer: you said that we prefer mapping mbids in stats, because that's what spark uses?
lucifer
yes, the spark dumps only use mapped mbids, hence stats use only mapped mbids.
zas
ruaok: pong
alastairp
when stats come back to the site, how are those mbids used? are there situations where a difference between the submitted/computed mbids is going to cause stats to break or show incomplete data?
lucifer
spark uses the mbids to group listens, so different mbids for the same track show up as two entries in stats. e.g. if some of a user's listens are mapped but some aren't, they show up twice. this has been reported quite a few times.
do you know how the rsync rules between ftp.musicbrainz.org and our FTP server are set up?
lucifer
now if the user-submitted listens have a different mbid than what the mapping would give, and only some listens were submitted with an mbid, we'd have dupes.
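To illustrate the dupes described above, here is a sketch that assumes spark groups each listen by a single key: the chosen MBID, or (as an assumption for illustration) the artist/track names when a listen has no MBID at all. The field names are hypothetical.

```python
from collections import Counter
from typing import Tuple, Union

# A group key is an MBID, or (artist, track) names for listens without one.
GroupKey = Union[str, Tuple[str, str]]


def group_key(listen: dict, prefer_submitted: bool) -> GroupKey:
    submitted = listen.get("submitted_mbid")
    mapped = listen.get("mapped_mbid")
    mbid = (submitted or mapped) if prefer_submitted else mapped
    if mbid:
        return mbid
    # Assumption: listens without any MBID are grouped by their names.
    return (listen["artist"], listen["track"])


def count_by_key(listens, prefer_submitted: bool) -> Counter:
    return Counter(group_key(l, prefer_submitted) for l in listens)


# Three listens of the same recording; only the first carries a submitted
# MBID, and it differs from the one the mapper chose.
listens = [
    {"artist": "A", "track": "T", "submitted_mbid": "mbid-1", "mapped_mbid": "mbid-2"},
    {"artist": "A", "track": "T", "submitted_mbid": None, "mapped_mbid": "mbid-2"},
    {"artist": "A", "track": "T", "submitted_mbid": None, "mapped_mbid": "mbid-2"},
]

print(count_by_key(listens, prefer_submitted=True))   # two entries for one track
print(count_by_key(listens, prefer_submitted=False))  # a single entry
```

Preferring submitted MBIDs splits one track across two keys, while using only mapped MBIDs keeps it together, which is the trade-off discussed for stats.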
ruaok
we're pruning dumps, but the FTP site has many more dumps, causing their FTP server disk to nearly fill up.
zas
no idea how and when it was set up (it wasn't by me), anything in syswiki
lucifer
i ran a query to check where the user submitted mbid is different from mapped one, for listens of last week its roughly 1% of all mapped listens.
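The actual check was a database query; as a rough in-memory illustration of the same metric, this sketch computes the share of mapped listens whose user-submitted MBID differs from the mapped one. The field names are hypothetical.

```python
def mismatch_ratio(listens) -> float:
    """Fraction of mapped listens whose submitted MBID differs from the mapped one."""
    # The denominator is all mapped listens, matching "~1% of all mapped listens".
    mapped = [l for l in listens if l.get("mapped_mbid")]
    if not mapped:
        return 0.0
    differing = sum(
        1
        for l in mapped
        if l.get("submitted_mbid") and l["submitted_mbid"] != l["mapped_mbid"]
    )
    return differing / len(mapped)
```

Listens with no submitted MBID count toward the denominator but can never be a mismatch.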
zas
?
alastairp
lucifer: right. so it should be possible to use the user-submitted mbid here if it exists, and only fall back to the mapper if not?
looks like we need to adjust the rsync command here to include the --delete option.
lucifer
makes sense.
what are the implications? say, if due to some error, dumps go missing temporarily or even permanently from kiss.
ruaok
then the FTP site reflects that.
lucifer
right, do we have anything as a fallback in that case to put those back up?
ruaok
I'm not quite sure I understand the case you're guarding against.
can you elaborate?
lucifer
we have had issues earlier where dumps were temporarily unavailable on lemmy, though that was mostly due to storage box issues. not sure if there are other reasons that could cause this in future.
ruaok: alastairp: we didn't get to finish the discussion. use user-submitted mbids over mapped ones, and try to compare user-submitted mbids with mapped ones. anything other than that?
ruaok
more the first part than the latter part, but yes.
lucifer
should we use user submitted mbids in stats? there may be more duplicates due to that for some users.
ruaok
I mean, if we compare user submitted vs matched... then what?
lucifer
yeah, we'd need manual intervention to see what's right. so it goes back to asking the user.
ruaok
I want to say for stats we should use mapped ids. I have a feeling that that will give better results.
lucifer
i think so too.
also, according to a query i ran, for listens of the last week ~1% of listens have user-submitted mbids that are different from the mapped ones.
thoughts about anything to do there?
ruaok
nothing.
and of those 1% how many were *really* wrong and how many were "different album in the same release group" sort of problems?
lucifer
latter for the few i checked.
ruaok
nothing left to do there.
lucifer
👍
moving on, about artist/track/release names. use user submitted ones or mapped?
i lean in favor of mapped everywhere except listens page. on the listens page, a future enhancement could be to show both and let the user choose?
ruaok
"i lean in favor of mapped everywhere except listens page. " agreed.
lucifer
👍
ruaok
"a future enhancement could be to show both and let the user choose?" at best. we'll need to learn how this shakes out and what users want.
lucifer
yeah indeed
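A sketch of the agreed rule for artist/track/release names: show the user-submitted name on the listens page and the mapped name everywhere else, falling back to the submitted name when a listen is unmapped. The page identifiers and the function name are hypothetical.

```python
from typing import Optional


def display_name(submitted: str, mapped: Optional[str], page: str) -> str:
    # Listens page: show exactly what the user submitted.
    if page == "listens":
        return submitted
    # Elsewhere: prefer the mapped (MusicBrainz) name; unmapped listens
    # fall back to the submitted name.
    return mapped or submitted
```

Showing both names and letting the user choose, the possible future enhancement, would replace the early return on the listens page.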
next up, manual mapping. how to deal with it? say different users map an msid to different mbids.
this is another part of the missing mb data work riksucks is doing: listens that could not be mapped but do exist in MB. so the other day we were discussing a way to let users manually map those.
ruaok
I think we should not worry about that right this second.
we have so many other things to do, I'd rather have us focus on features, than the perfect mapping tools.
lucifer
not worry as in not implement manual mapping for now or something else?
ruaok
one should drive demand and thus clarity for the other.
don't implement for now.
keep on radar to learn how to do it, but for now, let's move on.
lucifer: are we calculating site-wide stats on a regular basis now?
lucifer
ruaok: +1. agreed
ruaok: no. not deployed yet.
ruaok
do you think that will be deployed in the next 3 weeks?
lucifer
yes, i intend to deploy today
ruaok
great.
lucifer
we'll have sitewide stats but not all for now.
ruaok
and of course there is an API endpoint for it yes?
lucifer
daily activity and listening activity are unfinished currently.
yes api is there
frontend ui is still pending
ruaok
I really only need "top recordings of 2021"
lucifer
yes that is doable. we currently don't have 2021 though, because all periods are "last" periods.
ruaok
ah yes, we need a "to date" report for that.
lucifer
should be a minor change to generate for just 2021 though.
ruaok
that would be quite helpful!
lucifer
sure, will add that.
ruaok
thx
lucifer
thoughts on how to store in db? add to_date_week to enum or a new column for last/to_date period?
ruaok
enum, I'd say.
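A sketch of the enum approach, with "to date" ranges added alongside the existing "last" ones. Only "to_date_week" is named in the discussion; the other member names and the window semantics (calendar weeks starting on Monday, half-open [start, end) windows) are assumptions for illustration.

```python
import enum
from datetime import date, timedelta
from typing import Tuple


class StatsRange(enum.Enum):
    # Hypothetical members; only "to_date_week" comes from the discussion.
    LAST_WEEK = "last_week"
    LAST_MONTH = "last_month"
    LAST_YEAR = "last_year"
    TO_DATE_WEEK = "to_date_week"
    TO_DATE_YEAR = "to_date_year"


def date_window(r: StatsRange, today: date) -> Tuple[date, date]:
    """Return a half-open [start, end) window under the assumed semantics."""
    tomorrow = today + timedelta(days=1)
    week_start = today - timedelta(days=today.weekday())  # Monday of this week
    month_start = today.replace(day=1)
    year_start = today.replace(month=1, day=1)
    if r is StatsRange.LAST_WEEK:
        return week_start - timedelta(days=7), week_start
    if r is StatsRange.LAST_MONTH:
        prev_month_start = (month_start - timedelta(days=1)).replace(day=1)
        return prev_month_start, month_start
    if r is StatsRange.LAST_YEAR:
        return year_start.replace(year=year_start.year - 1), year_start
    if r is StatsRange.TO_DATE_WEEK:
        return week_start, tomorrow
    if r is StatsRange.TO_DATE_YEAR:
        # e.g. "top recordings of 2021" computed during 2021
        return year_start, tomorrow
    raise ValueError(f"unsupported range: {r}")
```

Storing the range as a single enum value keeps one column per stat, instead of adding a separate last/to_date column.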
lucifer
👍
monkey: alastairp: any PRs to merge? i'll do a release.
monkey
Maybe 1708 if you've got time for a quick second look
alastairp
nothing here
lucifer
lgtm, will deploy it on test before merging.
ruaok: safe to close LB-53 now i guess? we don't intend to do that anymore
I've been testing #1713 as well, which I am quite confident with now.
(after spending a full day on something I thought would take an hour)
lucifer
ruaok: alastairp and i were thinking of enabling sentry tracing in prod. it'll help us learn about api usage and performance metrics. in beta/test we trace every request; for prod we should use a low value (maybe 5%). i haven't seen noticeable delays on beta or test, so i think it should be fine. if we feel it's slowing things down, we can always set it back to 0; it's a matter of changing a consul config value. thoughts?
ruaok
do it
lucifer
awesome, thanks!
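A sketch of picking the tracing sample rate from config, assuming the consul-rendered config exposes a value under a hypothetical key SENTRY_TRACES_SAMPLE_RATE. The per-environment defaults mirror the plan above (every request on beta/test, 5% on prod). The result would be passed to the Sentry Python SDK as sentry_sdk.init(..., traces_sample_rate=rate).

```python
import math
from typing import Mapping


def traces_sample_rate(config: Mapping[str, str], environment: str) -> float:
    """Pick the Sentry tracing sample rate (hypothetical config key)."""
    # beta/test trace every request; prod defaults to a low rate.
    default = 1.0 if environment in ("beta", "test") else 0.05
    raw = config.get("SENTRY_TRACES_SAMPLE_RATE")
    try:
        rate = float(raw) if raw is not None else default
    except ValueError:
        rate = default
    if math.isnan(rate):
        rate = default
    # Clamp to [0, 1]; a value of 0 turns tracing off.
    return min(max(rate, 0.0), 1.0)
```

Setting the config value back to 0 turns tracing off without a code change.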
monkey: alastairp: i also debugged LB-993. suggestions before i implement the fix?