in #metabrainz

11:20 AM
ruaok

Always prefer the submitted MBIDs when available, otherwise use the mapped ones.
11:21 AM
and I think that should be made consistent everywhere, unless there is a clear case that it needs to be different.
11:21 AM
lucifer

makes sense. i agree.
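
A minimal sketch of the rule agreed above: use the submitted MBID when there is one, otherwise fall back to the mapped one. The function and parameter names are hypothetical and not taken from the ListenBrainz codebase.

```python
from typing import Optional


def pick_recording_mbid(submitted_mbid: Optional[str],
                        mapped_mbid: Optional[str]) -> Optional[str]:
    """Prefer the user-submitted MBID; fall back to the mapper's MBID.

    Both arguments are hypothetical names. An empty string is treated
    the same as a missing MBID.
    """
    return submitted_mbid or mapped_mbid or None
```

A listen with neither MBID simply yields `None`, so callers still have to handle the unmapped case.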
11:22 AM
alastairp

at any point should we compare user-submitted MBIDs against the metadata somehow?
11:22 AM
as a way of trying to determine if the mbid makes sense or is garbage
11:22 AM
ruaok

likely.
11:22 AM
lucifer

not sure how we would do that but makes sense to do it.
11:23 AM
ruaok

I'm kinda taking this as an opportunity to get more user involvement. If your MBIDs suck, do better.
11:23 AM
alastairp

lucifer: you said that we prefer mapping mbids in stats, because that's what spark uses?
11:23 AM
lucifer

yes, the spark dumps only used mapped mbids hence stats use only mapped mbids.
11:23 AM
zas

ruaok: pong
11:23 AM
alastairp

when stats come back to the site, how are those mbids used? are there situations where a difference in the submitted/computed stats is going to cause stats to break or show incomplete data?
11:25 AM
lucifer

the mbids are used to group listens by spark so different mbids will show up twice in stats. like if some of a user's listens are mapped but some aren't then they show up twice. this has been reported quite a few times.
11:25 AM
ruaok

zas: https://tickets.metabrainz.org/browse/LB-992
11:25 AM
BrainzBot

LB-992: Too many LB full dumps on FTP site
11:26 AM
ruaok

do you know how the rsync rules between the ftp.musicbrainz.org and our FTP server are set up?
11:26 AM
lucifer

now if the user submitted listens have a different mbid than what mapping would give and only some listens were submitted with a mbid, we'd have dupes.
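
The duplication described here can be illustrated with a small sketch. It assumes stats are grouped by MBID when one is present and by track name otherwise; that grouping key and the field names are assumptions for illustration, not the actual Spark job.

```python
from collections import Counter


def top_recordings(listens):
    """Count listens per grouping key: the MBID if present, else the track name."""
    counts = Counter()
    for listen in listens:
        key = listen.get("recording_mbid") or listen["track_name"]
        counts[key] += 1
    return counts


# Three listens of the same song, but only two of them carry an MBID.
listens = [
    {"track_name": "Song A", "recording_mbid": "mbid-1"},
    {"track_name": "Song A", "recording_mbid": "mbid-1"},
    {"track_name": "Song A", "recording_mbid": None},
]
stats = top_recordings(listens)
```

The same song ends up under two keys, which is the "shows up twice" effect. Two different MBIDs for the same recording would split the count in the same way.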
11:26 AM
ruaok

we're pruning dumps, but the FTP site has many more dumps, causing their FTP server disk to nearly fill up.
11:27 AM
zas

no idea how and when it was set up (it wasn't by me), anything in syswiki
11:27 AM
lucifer

i ran a query to check where the user submitted mbid is different from mapped one, for listens of last week it's roughly 1% of all mapped listens.
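
The actual query is not shown in the log. As a rough sketch of the same measurement over in-memory rows, with hypothetical field names `submitted_mbid` and `mapped_mbid`:

```python
def mismatch_rate(listens):
    """Fraction of mapped listens whose submitted MBID differs from the mapped one.

    Listens without a submitted MBID count as mapped but never as mismatched.
    """
    mapped = [l for l in listens if l.get("mapped_mbid")]
    if not mapped:
        return 0.0
    differing = sum(
        1 for l in mapped
        if l.get("submitted_mbid") and l["submitted_mbid"] != l["mapped_mbid"]
    )
    return differing / len(mapped)
```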
11:27 AM
zas

?
11:27 AM
alastairp

lucifer: right. so should be possible to use user-submitted mbid here if it exists, and only if not use the mapper?
11:28 AM
zas

https://github.com/metabrainz/syswiki/blob/3fb3...
11:28 AM
ruaok

no wait, something is amiss.
11:28 AM
lucifer

alastairp: yes should be possible.
11:28 AM
ruaok

I manually deleted the extra dumps from OUR server to solve the problem.
11:28 AM
zas: ignore me for now.
11:28 AM
lucifer

ruaok: you deleted full dumps so i checked incremental dumps.
11:29 AM
ZaphodBeeblebrox joined the channel
11:29 AM
ZaphodBeeblebrox has quit
11:29 AM
ZaphodBeeblebrox joined the channel
11:29 AM
ruaok

and there are loads of incremental dumps too, yes?
11:29 AM
lucifer

https://www.irccloud.com/pastebin/827V8smT/
11:30 AM
ruaok

lucifer: https://github.com/metabrainz/listenbrainz-serv...
11:30 AM
lucifer

only 30 in ftp dir inside cron.
11:30 AM
ruaok

the rsync command is there.
11:30 AM
lucifer

but loads on data.mb.org
11:31 AM
ruaok

looks like we need to adjust the rsync command here to include --delete option.
11:31 AM
lucifer

makes sense.
11:32 AM
what are the implications? say if due to some error dumps go missing temporarily or even permanently from kiss.
11:32 AM
CatQuest has quit
11:32 AM
ruaok

the the FTP site reflects that.
11:33 AM
*then
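
To make the implication concrete: rsync's `--delete` removes files on the receiving side that no longer exist on the sending side. The sketch below models that with plain sets of file names; it is an illustration of the semantics, not the cron script or rsync itself.

```python
def mirror(source: set, destination: set, delete: bool) -> set:
    """Return the destination's file set after a sync from source.

    Without delete, files only accumulate on the destination.
    With delete, the destination ends up as an exact copy of the source.
    """
    if delete:
        return set(source)
    return destination | source


ours = {"dump-2", "dump-3"}        # pruned dumps on our side
ftp = {"dump-1", "dump-2"}         # mirror still holding an old dump

without_delete = mirror(ours, ftp, delete=False)
with_delete = mirror(ours, ftp, delete=True)
after_outage = mirror(set(), with_delete, delete=True)
```

With `delete=True` the old `dump-1` disappears from the mirror, and if the source is temporarily empty the mirror empties too, which is the case being asked about.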
11:33 AM
lucifer

right, do we have anything as a fallback in that case to put those back up?
11:34 AM
ruaok

I'm not quite sure I understand the case you're guarding against.
11:34 AM
can you elaborate?
11:36 AM
lucifer

we have had issues earlier where dumps were temporarily unavailable on lemmy. though that was mostly due to storage box issues. not sure if there are other reasons due to which such thing can happen in future.
11:37 AM
ruaok

ah, ok.
11:37 AM
yeah, I wouldn't worry about it.
11:37 AM
lucifer

huh, we do already have --delete there though https://github.com/metabrainz/listenbrainz-serv...
11:37 AM
ruaok

given that our storage is a lot more reliable now.
11:37 AM
lucifer

makes sense
11:38 AM
ruaok

ok, perhaps we need to investigate the dump logs and see what the output of rsync is
11:38 AM
ZaphodBeeblebrox is now known as CatQuest
11:38 AM
and why it's not actually doing the delete.
11:38 AM
CatQuest is now known as ZaphodBeeblebrox
11:42 AM
lucifer

https://www.irccloud.com/pastebin/U0lZXA7g/
11:42 AM
we are using verbose output but don't see anything useful here.
11:42 AM
ruaok

is there a way to increase verbosity?
11:44 AM
lucifer

not sure, will need to check.
11:44 AM
oh i see https://askubuntu.com/questions/476041/how-do-i...
11:46 AM
ruaok

that should be an easy fix then.
12:11 PM
monkey

Deploying a PR to test.LB
12:17 PM
OK akshat PR #1713 is ready to be reviewed, and just deployed to test.LB
12:18 PM
akshat

Amazing, thanks!
12:25 PM
BrainzGit

[listenbrainz-server] MonkeyDo merged pull request #1719 (master…monkey-fix-pinned-recording-link-color): Fix pinned recording link color https://github.com/metabrainz/listenbrainz-serv...
12:37 PM
[listenbrainz-server] MonkeyDo merged pull request #1704 (master…monkey-brainzplayer-metadata-fix): Move brainzplayer_metadata object in submitted listens https://github.com/metabrainz/listenbrainz-serv...
13:00 PM
lucifer

ruaok: alastairp: we didn't get to conclude the discussion. use user submitted mbids over mapped. try to compare user submitted mbids with mapped. other than that?
13:01 PM
ruaok

more the first part than the latter part, but yes.
13:01 PM
lucifer

should we use user submitted mbids in stats? there may be more duplicates due to that for some users.
13:01 PM
ruaok

I mean, if we compare user submitted vs matched... then what?
13:02 PM
lucifer

yeah, we'd need manual intervention to see what's right. so it goes back to asking the user.
13:02 PM
ruaok

I want to say for stats we should use mapped ids. I have a feeling that that will give better results.
13:02 PM
lucifer

i think so too.
13:03 PM
also, acc to a query i ran, for listens of the last week there are ~1% of listens where user submitted mbids are different from mapped ones.
13:03 PM
thoughts about anything to do there?
13:03 PM
ruaok

nothing.
13:04 PM
and of those 1% how many were *really* wrong and how many were "different album in the same release group" sort of problems?
13:04 PM
lucifer

latter for the few i checked.
13:04 PM
ruaok

nothing left to do there.
13:05 PM
lucifer

👍
13:05 PM
moving on, about artist/track/release names. use user submitted ones or mapped?
13:06 PM
i lean in favor of mapped everywhere except listens page. on the listens page, a future enhancement could be to show both and let the user choose?
13:08 PM
ruaok

"i lean in favor of mapped everywhere except listens page. " agreed.
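
A sketch of the naming rule just agreed: mapped names everywhere except the listens page. The field names and the `page` argument are hypothetical, and falling back to the submitted name for unmapped listens is an assumption not stated in the discussion.

```python
def display_track_name(listen: dict, page: str) -> str:
    """Pick which track name to show for a listen on a given page."""
    submitted = listen["submitted_track_name"]
    mapped = listen.get("mapped_track_name")
    # The listens page shows what the user sent; unmapped listens have
    # nothing else to show (assumed fallback).
    if page == "listens" or not mapped:
        return submitted
    return mapped
```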
13:08 PM
lucifer

👍
13:09 PM
ruaok

"a future enhancement could be to show both and let the user choose?" at best. we'll need to learn how this shakes out and what users want.
13:09 PM
lucifer

yeah indeed
13:10 PM
next up, manual mapping. how to deal with it? say different users map a msid to different mbids.
13:11 PM
this is another part of the missing mb data stuff riksucks is working on. for listens that could not get mapped but do exist in MB. so the other day we were discussing a way to let users manually map those.
13:11 PM
ruaok

I think we should not worry about that right this second.
13:11 PM
we have so many other things to do, I'd rather have us focus on features, than the perfect mapping tools.
13:11 PM
lucifer

not worry as in not implement manual mapping for now or something else?
13:11 PM
ruaok

one should drive demand and thus clarity for the other.
13:12 PM
don't implement for now.
13:12 PM
keep on radar to learn how to do it, but for now, let's move on.
13:18 PM
lucifer: are we calculating site-wide stats on a regular basis now?
13:24 PM
lucifer

ruaok: +1. agreed
13:24 PM
ruaok: no. not deployed yet.
13:24 PM
ruaok

do you think that will be deployed in the next 3 weeks?
13:25 PM
lucifer

yes, i intend to deploy today
13:25 PM
ruaok

great.
13:25 PM
lucifer

we'll have sitewide stats but not all for now.
13:25 PM
ruaok

and of course there is an API endpoint for it yes?
13:25 PM
lucifer

daily activity and listening activity are unfinished currently.
13:25 PM
yes api is there
13:25 PM
frontend ui is still pending
13:26 PM
ruaok

I really only need "top recordings of 2021"
13:26 PM
lucifer

yes that is doable. we currently don't have 2021 though because all periods are last.
13:27 PM
ruaok

ah yes, we need a "to date" report for that.
13:27 PM
lucifer

should be a minor change to generate for just 2021 though.
13:27 PM
ruaok

that would be quite helpful!
13:27 PM
lucifer

sure, will add that.
13:27 PM
ruaok

thx
13:28 PM
lucifer

thoughts on how to store in db? add to_date_week to enum or a new column for last/to_date period?
13:28 PM
ruaok

enum, I'd say.
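
The enum option could look roughly like the sketch below. Only `to_date_week` is named in the discussion; the class name and every other member are hypothetical placeholders, and the real database enum may differ.

```python
from enum import Enum


class StatsRange(str, Enum):
    # Existing "last" periods (member names are hypothetical).
    LAST_WEEK = "last_week"
    LAST_YEAR = "last_year"
    # New "to date" periods added as extra enum values.
    TO_DATE_WEEK = "to_date_week"   # name taken from the discussion
    TO_DATE_YEAR = "to_date_year"   # hypothetical, e.g. for "top recordings of 2021"


def is_to_date(stats_range: StatsRange) -> bool:
    """True for periods that run from the start of the period until now."""
    return stats_range.value.startswith("to_date_")
```

Keeping both kinds of period in one enum avoids a second column, at the cost of encoding the last/to_date distinction in the value name.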
13:28 PM
lucifer

👍
13:29 PM
monkey: alastairp: any PRs to merge? i'll do a release.
13:29 PM
monkey

Maybe 1708 if you've got time for a quick second look
13:32 PM
alastairp

nothing here
13:32 PM
lucifer

lgtm, will deploy on test to test before merging.
13:34 PM
ruaok: safe to close LB-53 now i guess? we don't intend to do that anymore
13:34 PM
BrainzBot

LB-53: Create a cluster based on all messybrainz submissions with the same meta_sha256 https://tickets.metabrainz.org/browse/LB-53
13:35 PM
ruaok

yea
13:40 PM
monkey

I've been testing #1713 as well, which I am quite confident with now.
13:41 PM
(after spending a full day on something I thought would take an hour)
13:47 PM
lucifer

ruaok: i and alastairp were thinking to enable sentry tracing in prod. it'll help to know api usage and performance metrics. in beta/test we trace every request, for prod we should do a low value (5% maybe). i myself haven't seen noticeable delays on beta or test so i think it should be fine. if we feel it's slowing down stuff we can always set it back to 0. it's a matter of changing a consul config value. thoughts?
13:47 PM
ruaok

do it
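
A sketch of how the per-environment rate could be chosen. The Sentry Python SDK takes the rate via the `traces_sample_rate` option of `sentry_sdk.init`; the helper below and its environment names are hypothetical, and the value would come from the consul config in practice.

```python
def traces_sample_rate(environment: str, prod_rate: float = 0.05) -> float:
    """Return the fraction of requests to trace for an environment.

    beta/test trace every request; everything else uses prod_rate,
    which can be set to 0 to switch tracing off again.
    """
    if environment in ("beta", "test"):
        return 1.0
    if not 0.0 <= prod_rate <= 1.0:
        raise ValueError("sample rate must be between 0 and 1")
    return prod_rate


# The result would be passed along as:
#   sentry_sdk.init(dsn=..., traces_sample_rate=traces_sample_rate(env))
```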
13:47 PM
lucifer

awesome, thanks!
13:48 PM
monkey: alastairp: i also debugged LB-993. suggestions before i implement the fix?
13:48 PM
BrainzBot

LB-993: User feedback endpoint returns 502 when querying too many recordings https://tickets.metabrainz.org/browse/LB-993
13:49 PM
ruaok

I'm working on planning out my part in the 2021 review stuff and there will be a few queries that will need a boatload of disk space.
13:49 PM
lucifer

(see comment on ticket for details)
13:49 PM
ruaok

shall we schedule a disk upgrade?
13:49 PM
lucifer

ah that reminds me of gaga upgrade
13:49 PM
monkey

I see no issue in increasing buffer size
13:49 PM
lucifer

sure makes sense
13:49 PM
alastairp

lucifer: same issue here in the AB bulk get api query
13:49 PM
ruaok

ok, let me kick off the process.
13:50 PM
lucifer

ah i was wondering if we had seen this in AB. (similarity endpoint?) but i thought it had too low a limit to face this issue.
13:50 PM
alastairp

lucifer: https://github.com/metabrainz/acousticbrainz-se...
13:50 PM
lucifer

makes sense
13:51 PM
so let's double the buffer size? the query is fast for this case in LB so I don't see issues there.
13:51 PM
alastairp

though I see our solution there was just to limit the max number of items
13:51 PM
yes, let's upgrade the buffer size
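
For comparison, the other approach mentioned (capping the number of items per request) might be sketched like this. The limit value and function name are hypothetical and not taken from the AcousticBrainz or ListenBrainz code.

```python
MAX_ITEMS_PER_REQUEST = 100  # hypothetical limit


def check_batch_size(recordings, max_items=MAX_ITEMS_PER_REQUEST):
    """Reject requests that ask for more recordings than the limit allows."""
    if len(recordings) > max_items:
        raise ValueError(
            f"too many recordings: {len(recordings)} > {max_items}"
        )
    return recordings
```

A cap like this turns an opaque 502 into an explicit client error, while raising the buffer size keeps large requests working.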
13:51 PM
lucifer

👍
13:53 PM
ruaok

crap, we didn't consider something when working out how to add mapped MBIDs....
13:53 PM
we didn't include the "listenstore offline" function.
13:54 PM
lucifer

can't we just take mbid mapping containers and it'll restart from there when it comes back up?
13:54 PM
*take down
13:54 PM
ruaok

hmm, actually that might already be taken care of by the fetch listens being turned off during listenstore offline.