#metabrainz

      • ruaok
        Always prefer the submitted MBIDs when available, otherwise use the mapped ones.
      • and I think that should be made consistent everywhere, unless there is a clear case that it needs to be different.
      • lucifer
        makes sense. i agree.
      • alastairp
        at any point should we compare user-submitted MBIDs against the metadata somehow?
      • as a way of trying to determine if the mbid makes sense or is garbage
      • ruaok
        likely.
      • lucifer
        not sure how we would do that but makes sense to do it.
      • ruaok
        I'm kinda taking this as an opportunity to get more user involvement. If your MBIDs suck, do better.
      • alastairp
        lucifer: you said that we prefer mapping mbids in stats, because that's what spark uses?
      • lucifer
        yes, the spark dumps only used mapped mbids hence stats use only mapped mbids.
      • zas
        ruaok: pong
      • alastairp
        when stats come back to the site, how are those mbids used? are there situations where differences in the submitted/computed stats are going to cause stats to break or show incomplete data?
      • lucifer
        the mbids are used to group listens by spark, so different mbids will show up twice in stats. like if some of a user's listens are mapped but some aren't, then they show up twice. this has been reported quite a few times.
      • BrainzBot
        LB-992: Too many LB full dumps on FTP site
      • ruaok
        do you know how the rsync rules between ftp.musicbrainz.org and our FTP server are set up?
      • lucifer
        now if the user submitted listens have a different mbid than what mapping would give and only some listens were submitted with an mbid, we'd have dupes.
      • ruaok
        we're pruning dumps, but the FTP site has many more dumps, causing their FTP server disk to nearly fill up.
      • zas
        no idea how and when it was set up (it wasn't by me), anything in syswiki
      • lucifer
        i ran a query to check where the user submitted mbid is different from the mapped one. for listens of last week it's roughly 1% of all mapped listens.
      • zas
        ?
      • alastairp
        lucifer: right. so should be possible to use user-submitted mbid here if it exists, and only if not use the mapper?
      • ruaok
        no wait, something is amiss.
      • lucifer
        alastairp: yes should be possible.
      • ruaok
        I manually deleted the extra dumps from OUR server to solve the problem.
      • zas: ignore me for now.
      • lucifer
        ruaok: you deleted full dumps so i checked incremental dumps.
      • ZaphodBeeblebrox joined the channel
      • ZaphodBeeblebrox has quit
      • ZaphodBeeblebrox joined the channel
      • ruaok
        and there are loads of incremental dumps too, yes?
      • lucifer
        only 30 in ftp dir inside cron.
      • ruaok
        the rsync command is there.
      • lucifer
        but loads on data.mb.org
      • ruaok
        looks like we need to adjust the rsync command here to include the --delete option.
      • lucifer
        makes sense.
      • what are the implications? say if due to some error dumps go missing temporarily or even permanently from kiss.
      • CatQuest has quit
      • ruaok
        then the FTP site reflects that.
      • lucifer
        right, do we have anything as a fallback in that case to put those back up?
      • ruaok
        I'm not quite sure I understand the case you're guarding against.
      • can you elaborate?
      • lucifer
        we have had issues earlier where dumps were temporarily unavailable on lemmy, though that was mostly due to storage box issues. not sure if there are other reasons such a thing could happen in future.
      • ruaok
        ah, ok.
      • yeah, I wouldn't worry about it.
      • lucifer
        huh, we do already have --delete there though https://github.com/metabrainz/listenbrainz-serv...
      • ruaok
        given that our storage is a lot more reliable now.
      • lucifer
        makes sense
      • ruaok
        ok, perhaps we need to investigate the dump logs and see what the output of rsync is
      • ZaphodBeeblebrox is now known as CatQuest
      • and why it's not actually doing the delete.
      • CatQuest is now known as ZaphodBeeblebrox
      • lucifer
        we are using verbose output but don't see anything useful here.
      • ruaok
        is there a way to increase verbosity?
      • lucifer
        not sure, will need to check.
      • ruaok
        that should be an easy fix then.
      • monkey
        Deploying a PR to test.LB
      • OK akshat, PR #1713 is ready to be reviewed, and just deployed to test.LB
      • akshat
        Amazing, thanks!
      • BrainzGit
        [listenbrainz-server] MonkeyDo merged pull request #1719 (master…monkey-fix-pinned-recording-link-color): Fix pinned recording link color https://github.com/metabrainz/listenbrainz-serv...
      • [listenbrainz-server] MonkeyDo merged pull request #1704 (master…monkey-brainzplayer-metadata-fix): Move brainzplayer_metadata object in submitted listens https://github.com/metabrainz/listenbrainz-serv...
      • lucifer
        ruaok: alastairp: we didn't get to conclude the discussion. use user submitted mbids over mapped. try to compare user submitted mbids with mapped. other than that?
      • ruaok
        more the first part than the latter part, but yes.
      • lucifer
        should we use user submitted mbids in stats? there may be more duplicates due to that for some users.
      • ruaok
        I mean, if we compare user submitted vs matched... then what?
      • lucifer
        yeah, we'd need manual intervention to see what's right. so it goes back to asking the user.
      • ruaok
        I want to say for stats we should use mapped ids. I have a feeling that that will give better results.
      • lucifer
        i think so too.
      • also, according to a query i ran, for listens of the last week there are ~1% of listens where user submitted mbids are different from mapped ones.
      • thoughts about anything to do there?
      • ruaok
        nothing.
      • and of those 1% how many were *really* wrong and how many were "different album in the same release group" sort of problems?
      • lucifer
        latter for the few i checked.
      • ruaok
        nothing left to do there.
      • lucifer
        👍
      • moving on, about artist/track/release names. use user submitted ones or mapped?
      • i lean in favor of mapped everywhere except listens page. on the listens page, a future enhancement could be to show both and let the user choose?
      • ruaok
        "i lean in favor of mapped everywhere except listens page. " agreed.
      • lucifer
        👍
      • ruaok
        "a future enhancement could be to show both and let the user choose?" at best. we'll need to learn how this shakes out and what users want.
      • lucifer
        yeah indeed
      • next up, manual mapping. how to deal with it? say different users map a msid to different mbids.
      • this is another part of the missing mb data stuff riksucks is working on, for listens that could not get mapped but do exist in MB. the other day we were discussing a way to let users manually map those.
      • ruaok
        I think we should not worry about that right this second.
      • we have so many other things to do, I'd rather have us focus on features, than the perfect mapping tools.
      • lucifer
        not worry as in not implement manual mapping for now or something else?
      • ruaok
        one should drive demand and thus clarity for the other.
      • don't implement for now.
      • keep on radar to learn how to do it, but for now, let's move on.
      • lucifer: are we calculating site-wide stats on a regular basis now?
      • lucifer
        ruaok: +1. agreed
      • ruaok: no. not deployed yet.
      • ruaok
        do you think that will be deployed in the next 3 weeks?
      • lucifer
        yes, i intend to deploy today
      • ruaok
        great.
      • lucifer
        we'll have sitewide stats but not all for now.
      • ruaok
        and of course there is an API endpoint for it yes?
      • lucifer
        daily activity and listening activity are unfinished currently.
      • yes api is there
      • frontend ui is still pending
      • ruaok
        I really only need "top recordings of 2021"
      • lucifer
        yes that is doable. we currently don't have 2021 though because all periods are "last" periods.
      • ruaok
        ah yes, we need a "to date" report for that.
      • lucifer
        should be a minor change to generate for just 2021 though.
      • ruaok
        that would be quite helpful!
      • lucifer
        sure, will add that.
      • ruaok
        thx
      • lucifer
        thoughts on how to store in db? add to_date_week to enum or a new column for last/to_date period?
      • ruaok
        enum, I'd say.
      • lucifer
        👍
      • monkey: alastairp: any PRs to merge? i'll do a release.
      • monkey
        Maybe 1708 if you've got time for a quick second look
      • alastairp
        nothing here
      • lucifer
        lgtm, will deploy on test to test before merging.
      • ruaok: safe to close LB-53 now i guess? we don't intend to do that anymore
      • BrainzBot
        LB-53: Create a cluster based on all messybrainz submissions with the same meta_sha256 https://tickets.metabrainz.org/browse/LB-53
      • ruaok
        yea
      • monkey
        I've been testing #1713 as well, which I am quite confident with now.
      • (after spending a full day on something I thought would take an hour)
      • lucifer
        ruaok: alastairp and i were thinking of enabling sentry tracing in prod. it'll help to know api usage and performance metrics. in beta/test we trace every request, for prod we should use a low value (5% maybe). i myself haven't seen noticeable delays on beta or test so i think it should be fine. if we feel it's slowing down stuff we can always set it back to 0. it's a matter of changing a consul config value. thoughts?
      • ruaok
        do it
      • lucifer
        awesome, thanks!
      • monkey: alastairp: i also debugged LB-993. suggestions before i implement the fix?
      • BrainzBot
        LB-993: User feedback endpoint returns 502 when querying too many recordings https://tickets.metabrainz.org/browse/LB-993
      • ruaok
        I'm working on planning out my part in the 2021 review stuff and there will be a few queries that will need a boatload of disk space.
      • lucifer
        (see comment on ticket for details)
      • ruaok
        shall we schedule a disk upgrade?
      • lucifer
        ah that reminds me of gaga upgrade
      • monkey
        I see no issue in increasing buffer size
      • lucifer
        sure makes sense
      • alastairp
        lucifer: same issue here in the AB bulk get api query
      • ruaok
        ok, let me kick off the process.
      • lucifer
        ah i was wondering if we had seen this in AB. (similarity endpoint?) but i thought it had too low a limit to face this issue.
      • alastairp
      • lucifer
        makes sense
      • so let's double the buffer size? the query is fast for this case in LB so I don't see issues there.
      • alastairp
        though I see our solution there was just to limit the max number of items
      • yes, let's upgrade the buffer size
      • lucifer
        👍
      • ruaok
        crap, we didn't consider something when working out how to add mapped MBIDs....
      • we didn't include the "listenstore offline" function.
      • lucifer
        can't we just take down the mbid mapping containers and it'll restart from there when it comes back up?
      • ruaok
        hmm, actually that might already be taken care of by the fetch listens being turned off during listenstore offline.
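
The MBID handling discussed earlier in this log (prefer user-submitted MBIDs in general, but group by mapped MBIDs for stats so the same recording does not show up twice) can be sketched roughly as below. This is only an illustration under stated assumptions: `effective_mbid`, the listen dict layout, and the placeholder MBID strings are all hypothetical and are not taken from the ListenBrainz codebase.

```python
from collections import Counter
from typing import Optional


def effective_mbid(submitted: Optional[str], mapped: Optional[str]) -> Optional[str]:
    """Hypothetical helper: prefer the user-submitted MBID, else the mapped one."""
    return submitted or mapped


# Three listens of the same track; only one was submitted with an MBID,
# and that MBID differs from what the mapper produced (made-up values).
listens = [
    {"track": "Song A", "submitted": None, "mapped": "mbid-1"},
    {"track": "Song A", "submitted": "mbid-2", "mapped": "mbid-1"},
    {"track": "Song A", "submitted": None, "mapped": "mbid-1"},
]

# Grouping by mapped MBID only: the track appears once.
by_mapped = Counter(l["mapped"] for l in listens)

# Grouping by "submitted if present, else mapped": the track is split in two.
by_preferred = Counter(effective_mbid(l["submitted"], l["mapped"]) for l in listens)

print(by_mapped)
print(by_preferred)
```

With this toy data, grouping by the mapped MBID yields a single entry, while the submitted-first rule splits the same track into two entries, which is the duplicate problem raised for stats.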