Always prefer the submitted MBIDs when available, otherwise use the mapped ones.
2021-11-08 31212, 2021
ruaok
and I think that should be made consistent everywhere, unless there is a clear case that it needs to be different.
2021-11-08 31235, 2021
lucifer
makes sense. i agree.
2021-11-08 31206, 2021
alastairp
at any point should we compare user-submitted MBIDs against the metadata somehow?
2021-11-08 31217, 2021
alastairp
as a way of trying to determine if the mbid makes sense or is garbage
2021-11-08 31231, 2021
ruaok
likely.
2021-11-08 31258, 2021
lucifer
not sure how we would do that but makes sense to do it.
2021-11-08 31203, 2021
ruaok
I'm kinda taking this as an opportunity to get more user involvement. If your MBIDs suck, do better.
2021-11-08 31206, 2021
alastairp
lucifer: you said that we prefer mapping mbids in stats, because that's what spark uses?
2021-11-08 31234, 2021
lucifer
yes, the spark dumps only used mapped mbids hence stats use only mapped mbids.
2021-11-08 31256, 2021
zas
ruaok: pong
2021-11-08 31258, 2021
alastairp
when stats come back to the site, how are those mbids used? are there situations where a difference in the submitted/computed stats are going to cause stats to break or show incomplete data?
2021-11-08 31247, 2021
lucifer
the mbids are used to group listens by spark so different mbids will show up twice in stats. like if some of user listens are mapped but some aren't then they show up twice. this has been reported quite a few times.
do you know how the rsync rules between the ftp.musicbrainz.org and our FTP server are setup?
2021-11-08 31238, 2021
lucifer
now if the user submitted listens have a different mbid than what mapping would give and only some listens were submitted with a mbid, we'd have dupes.
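To illustrate the duplicate problem lucifer describes, here is a minimal sketch. The listen dictionaries, field names, and the grouping key are assumptions made for illustration, not the actual Spark job.

```python
from collections import Counter

# Hypothetical listens of one track: two carry an MBID, one does not.
listens = [
    {"track_name": "Song A", "recording_mbid": "mbid-1"},
    {"track_name": "Song A", "recording_mbid": None},
    {"track_name": "Song A", "recording_mbid": "mbid-1"},
]


def group_key(listen):
    # Assumed grouping key: the MBID is part of the key, so listens of the
    # same track with and without an MBID land in different groups.
    return (listen["track_name"], listen["recording_mbid"])


counts = Counter(group_key(listen) for listen in listens)

for key, count in counts.items():
    print(key, count)
```

The same track ends up as two separate entries, which is the "shows up twice" behaviour reported by users.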
2021-11-08 31254, 2021
ruaok
we're pruning dumps, but the FTP site has many more dumps, causing their FTP server disk to nearly fill up.
2021-11-08 31232, 2021
zas
no idea how and when it was set up (it wasn't by me), anything in syswiki
2021-11-08 31234, 2021
lucifer
i ran a query to check where the user submitted mbid is different from mapped one, for listens of last week its roughly 1% of all mapped listens.
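A rough sketch of the kind of check described here, done in plain Python rather than as a database query. The field names `submitted_mbid` and `mapped_mbid` are hypothetical, and the sample data is made up purely to exercise the function.

```python
def mismatch_ratio(listens):
    """Fraction of mapped listens whose submitted MBID differs from the mapped one."""
    # Only listens that the mapper resolved count towards the denominator.
    mapped = [l for l in listens if l.get("mapped_mbid")]
    if not mapped:
        return 0.0
    differing = sum(
        1
        for l in mapped
        if l.get("submitted_mbid") and l["submitted_mbid"] != l["mapped_mbid"]
    )
    return differing / len(mapped)


# Made-up sample: four mapped listens, one with a conflicting submitted MBID.
sample = [
    {"submitted_mbid": "a", "mapped_mbid": "a"},
    {"submitted_mbid": "x", "mapped_mbid": "b"},
    {"submitted_mbid": None, "mapped_mbid": "c"},
    {"submitted_mbid": "d", "mapped_mbid": "d"},
    {"submitted_mbid": "e", "mapped_mbid": None},  # unmapped, ignored
]
print(mismatch_ratio(sample))
```

Listens without a submitted MBID are not counted as mismatches, since there is nothing to compare against.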
2021-11-08 31235, 2021
zas
?
2021-11-08 31246, 2021
alastairp
lucifer: right. so should be possible to use user-submitted mbid here if it exists, and only if not use the mapper?
looks like we need to adjust the rsync command here to include --delete option.
2021-11-08 31227, 2021
lucifer
makes sense.
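The rule agreed on above (use the user-submitted MBID if it exists, otherwise fall back to the mapper) could be sketched as follows. The function and argument names are hypothetical and only illustrate the fallback order.

```python
from typing import Optional


def choose_recording_mbid(
    submitted_mbid: Optional[str], mapped_mbid: Optional[str]
) -> Optional[str]:
    """Prefer the user-submitted MBID; fall back to the mapped one."""
    # `or` also treats an empty string as missing, which is assumed
    # to be the desired behaviour here.
    return submitted_mbid or mapped_mbid
```

If neither value is present the function returns `None`, leaving it to the caller to treat the listen as unmapped.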
2021-11-08 31217, 2021
lucifer
what are the implications? say if due to some error dumps go missing temporarily or even permanently from kiss.
2021-11-08 31238, 2021
CatQuest has quit
2021-11-08 31250, 2021
ruaok
the the FTP site reflects that.
2021-11-08 31215, 2021
ruaok
*then
2021-11-08 31239, 2021
lucifer
right, do we have anything as a fallback in that case to put those back up?
2021-11-08 31210, 2021
ruaok
I'm not quite sure I understand the case you're guarding against.
2021-11-08 31213, 2021
ruaok
can you elaborate?
2021-11-08 31231, 2021
lucifer
we have had issues earlier where dumps were temporarily unavailable on lemmy. though that was mostly due to storage box issues. not sure if there are other reasons due to which such thing can happen in future.
ruaok: alastairp: we didn't get to conclude the discussion. use user submitted mbids over mapped. try to compare user submitted mbids with mapped. other than that?
2021-11-08 31216, 2021
ruaok
more the first part than the latter part, but yes.
2021-11-08 31227, 2021
lucifer
should we use user submitted mbids in stats? there may be more duplicates due to that for some users.
2021-11-08 31231, 2021
ruaok
I mean, if we compare user submitted vs matched... then what?
2021-11-08 31204, 2021
lucifer
yeah, we'd need manual intervention to see what's right. so it goes back to asking the user.
2021-11-08 31211, 2021
ruaok
I want to say for stats we should use mapped ids. I have a feeling that that will give better results.
2021-11-08 31220, 2021
lucifer
i think so too.
2021-11-08 31204, 2021
lucifer
also, acc to a query i ran, for listens of the last week there are ~1% listens where user submitted mbids are different from mapped ones.
2021-11-08 31222, 2021
lucifer
thoughts about anything to do there?
2021-11-08 31235, 2021
ruaok
nothing.
2021-11-08 31209, 2021
ruaok
and of those 1% how many were *really* wrong and how many were "different album in the same release group" sort of problems?
2021-11-08 31220, 2021
lucifer
latter for the few i checked.
2021-11-08 31232, 2021
ruaok
nothing left to do there.
2021-11-08 31228, 2021
lucifer
👍
2021-11-08 31259, 2021
lucifer
moving on, about artist/track/release names. use user submitted ones or mapped?
2021-11-08 31252, 2021
lucifer
i lean in favor of mapped everywhere except listens page. on the listens page, a future enhancement could be to show both and let the user choose?
2021-11-08 31231, 2021
ruaok
"i lean in favor of mapped everywhere except listens page. " agreed.
2021-11-08 31251, 2021
lucifer
👍
2021-11-08 31202, 2021
ruaok
"a future enhancement could be to show both and let the user choose?" at best. we'll need to learn how this shakes out and what users want.
2021-11-08 31211, 2021
lucifer
yeah indeed
2021-11-08 31216, 2021
lucifer
next up, manual mapping. how to deal with it? say different users map a msid to different mbids.
2021-11-08 31212, 2021
lucifer
this is another part of the missing mb data stuff riksucks is working on. for listens that could not get mapped but do exist in MB. so the other day we were discussing about a way to let users manually map those.
2021-11-08 31213, 2021
ruaok
I think we should not worry about that right this second.
2021-11-08 31236, 2021
ruaok
we have so many other things to do, I'd rather have us focus on features, than the perfect mapping tools.
2021-11-08 31242, 2021
lucifer
not worry as in not implement manual mapping for now or something else?
2021-11-08 31251, 2021
ruaok
one should drive demand and thus clarity for the other.
2021-11-08 31201, 2021
ruaok
dont implement for now.
2021-11-08 31212, 2021
ruaok
keep on radar to learn how to do it, but for now, lets move on.
2021-11-08 31248, 2021
ruaok
lucifer: are we calculating site-wide stats on a regular basis now?
2021-11-08 31223, 2021
lucifer
ruaok: +1. agreed
2021-11-08 31234, 2021
lucifer
ruaok: no. not deployed yet.
2021-11-08 31253, 2021
ruaok
do you think that will be deployed in the next 3 weeks?
2021-11-08 31203, 2021
lucifer
yes, i intend to deploy today
2021-11-08 31210, 2021
ruaok
great.
2021-11-08 31223, 2021
lucifer
we'll have sitewide stats but not all for now.
2021-11-08 31224, 2021
ruaok
and of course there is an API endpoint for it yes?
2021-11-08 31239, 2021
lucifer
daily activity and listening activity are unfinished currently.
2021-11-08 31242, 2021
lucifer
yes api is there
2021-11-08 31251, 2021
lucifer
frontend ui is still pending
2021-11-08 31201, 2021
ruaok
I really only need "top recordings of 2021"
2021-11-08 31241, 2021
lucifer
yes that is doable. we currently don't have 2021 though because all periods are last.
2021-11-08 31208, 2021
ruaok
ah yes, we need a "to date" report for that.
2021-11-08 31209, 2021
lucifer
should be a minor change to generate for just 2021 though.
2021-11-08 31233, 2021
ruaok
that would be quite helpful!
2021-11-08 31252, 2021
lucifer
sure, will add that.
2021-11-08 31256, 2021
ruaok
thx
2021-11-08 31222, 2021
lucifer
thoughts on how to store in db? add to_date_week to enum or a new column for last/to_date period?
2021-11-08 31244, 2021
ruaok
enum, I'd say.
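A minimal sketch of the enum approach, assuming that a "last" period means the previous full period and a "to date" period runs from the start of the current period until today. The `to_date_week` name comes from the discussion; the class, member names, and helper below are hypothetical.

```python
from datetime import date
from enum import Enum


class StatsRange(Enum):
    # Assumed existing style: the previous full period.
    LAST_YEAR = "year"
    # New style: from the start of the current period until today.
    TO_DATE_YEAR = "to_date_year"


def range_bounds(stats_range: StatsRange, today: date):
    """Return the (start, end) dates covered by the given range."""
    if stats_range is StatsRange.LAST_YEAR:
        return date(today.year - 1, 1, 1), date(today.year - 1, 12, 31)
    if stats_range is StatsRange.TO_DATE_YEAR:
        return date(today.year, 1, 1), today
    raise ValueError(f"unsupported range: {stats_range}")


start, end = range_bounds(StatsRange.TO_DATE_YEAR, date(2021, 11, 8))
print(start, end)
```

Keeping the "to date" variants as extra enum values means a report such as "top recordings of 2021" needs no additional column, only a new value and its date bounds.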
2021-11-08 31252, 2021
lucifer
👍
2021-11-08 31225, 2021
lucifer
monkey: alastairp: any PRs to merge? i'll do a release.
2021-11-08 31251, 2021
monkey
Maybe 1708 if you've got time for a quick second look
2021-11-08 31207, 2021
alastairp
nothing here
2021-11-08 31210, 2021
lucifer
lgtm, will deploy on test to test before merging.
2021-11-08 31256, 2021
lucifer
ruaok: safe to close LB-53 now i guess? we don't intend to do that anymore
I've been testing #1713 as well, which I am quite confident with now.
2021-11-08 31204, 2021
monkey
(after spending a full day on something I thought would take an hour)
2021-11-08 31209, 2021
lucifer
ruaok: alastairp and i were thinking to enable sentry tracing in prod. it'll help to know api usage and performance metrics. in beta/test we trace every request, for prod we should use a low value (5% maybe). i myself haven't seen noticeable delays on beta or test so i think it should be fine. if we feel it's slowing down stuff we can always set it back to 0. it's a matter of changing a consul config value. thoughts?
2021-11-08 31242, 2021
ruaok
do it
2021-11-08 31254, 2021
lucifer
awesome, thanks!
2021-11-08 31244, 2021
lucifer
monkey: alastairp: i also debugged LB-993. suggestions before i implement the fix?