#metabrainz

      • jesus2099 joined the channel
      • jesus2099 has quit
      • ritiek has quit
      • ritiek joined the channel
      • ritiek has quit
      • BrainzGit
        [bookbrainz-site] akashgp09 opened pull request #659 (series-entity…series-delete-handler): feat(series): deletion handler https://github.com/bookbrainz/bookbrainz-site/p...
      • [listenbrainz-server] jdaok opened pull request #1538 (master…Pinned-Recording-Modal): Pinned recording modal https://github.com/metabrainz/listenbrainz-serv...
      • texke has quit
      • texke joined the channel
      • d4rkie joined the channel
      • d4rk-ph0enix has quit
      • ZaphodBeeblebrox joined the channel
      • ZaphodBeeblebrox has quit
      • ZaphodBeeblebrox joined the channel
      • akashgp09 joined the channel
      • CatQuest has quit
      • ruaok
        moooin!
      • time for the accounting gulag for me...
      • zas: alastairp: monkey: yvanzo: invoices please
      • Zastai joined the channel
      • alastairp
        lucifer: great catch
      • ruaok: on the way
      • lucifer: right, I was aware that they were going to be different. So the trick now is to work out exactly where they're different. unfortunately part of this might involve debugging internal gaia representations, let me have a quick look to see if I can find a good way to do this
      • I'll be mostly away today if you want to do something else, but we could pick it up tomorrow
      • lucifer
        morning!
      • alastairp: Hi! indeed they can be different but the difference in the accuracy field is huge, 0.89% vs 90%.
      • alastairp
        so one of them is clearly doing the wrong thing, then?
      • lucifer
        (unless it's an error in how we display it and it's actually 89%)
      • alastairp
        given that they're off by a factor of 100, that sounds very possible
      • lucifer
        the values in the table are similar though.
      • indeed, the values in the table determine the accuracy right?
      • alastairp
        yes, right
      • sum of (num correct/num possible)
      • or somesuch
      • lucifer
        makes sense, display error probably then. i'll see if i can fix it.
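The accuracy calculation and the suspected display error can be sketched as follows. This is a minimal illustration, assuming the table is a confusion matrix with true classes as rows and predicted classes as columns. The `accuracy` helper and the table values are hypothetical and are not taken from gaia, sklearn, or the AcousticBrainz code.

```python
def accuracy(confusion):
    """Return overall accuracy as a fraction in [0, 1].

    confusion[i][j] counts items of true class i predicted as class j,
    so the correct predictions sit on the diagonal.
    """
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return correct / total


# Hypothetical 2-class table: 45 + 44 correct out of 100 items.
table = [
    [45, 5],
    [6, 44],
]
fraction = accuracy(table)

mislabelled = f"{fraction}%"   # a 0-1 fraction shown with a percent sign
scaled = f"{fraction:.0%}"     # the same value scaled to a percentage
print(mislabelled, "vs", scaled)
```

If one tool stores the fraction and the other stores the percentage, the same table would produce two numbers that differ by a factor of 100.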
      • alastairp
        ok, so. one other thing now - we should have 2 files representing the model. the gaia one ends in .history, I'm not sure how the sklearn one is
      • lucifer
        then we can work on the bigger problem of making sklearn and gaia converge :)
      • alastairp
        if it's just 1% it could just be an issue of how it split the data randomly when training
      • we have a way of generating highlevel files from lowlevel + model: hl_calc for gaia, and there are some scripts for sklearn
      • lucifer
        i see, let me see if i can find the .history equivalent.
      • alastairp
        so one next trick would be to take another random sample of lowlevel files and run them through both
      • and look at the resulting hl files and see how consistent they are
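A consistency check along these lines might look like the sketch below. It assumes each tool's highlevel output has already been reduced to a mapping from recording id to predicted label. The ids, labels, and the `agreement` helper are hypothetical, and parsing the actual hl files is left out.

```python
def agreement(gaia_labels, sklearn_labels):
    """Compare predictions for the recordings both tools processed.

    Returns the fraction of shared ids with identical labels and the
    sorted list of ids where the two tools disagree.
    """
    shared = sorted(set(gaia_labels) & set(sklearn_labels))
    if not shared:
        raise ValueError("no recordings in common")
    differing = [rid for rid in shared if gaia_labels[rid] != sklearn_labels[rid]]
    return 1 - len(differing) / len(shared), differing


# Hypothetical predictions for four sampled lowlevel files.
gaia = {"rec-1": "danceable", "rec-2": "not_danceable",
        "rec-3": "danceable", "rec-4": "danceable"}
sklearn_out = {"rec-1": "danceable", "rec-2": "not_danceable",
               "rec-3": "not_danceable", "rec-4": "danceable"}

rate, differing = agreement(gaia, sklearn_out)
print(f"agreement: {rate:.0%}, differing: {differing}")
```

The list of differing ids is the useful part for debugging, since those are the files worth inspecting in both pipelines.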
      • lucifer
        also gaia took ~8 mins to run but sklearn almost ~30 mins.
      • alastairp
        gaia uses multi processing I believe, does sklearn?
      • lucifer
        it probably supports it, not sure if we configure it to use it though
      • alastairp
        with default settings, gaia sweeps through about 700 combinations of parameters (different values of C, gamma, preprocessing steps)
      • and each combination is independent
      • sklearn definitely has helpers for this kind of stuff
      • lucifer
        yes makes sense
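Because each parameter combination is independent, the sweep parallelizes naturally. The sketch below uses only the standard library to show the shape of the problem. The parameter values and the `score` function are hypothetical stand-ins, not gaia's defaults. In scikit-learn, `GridSearchCV` takes an `n_jobs` argument for the same purpose.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Hypothetical search space; the real defaults are not given in the chat.
C_VALUES = [0.1, 1, 10, 100]
GAMMA_VALUES = [0.001, 0.01, 0.1]
PREPROCESSING = ["basic", "normalized", "gaussianized"]


def score(params):
    """Toy stand-in for training and evaluating one model."""
    c, gamma, _prep = params
    return -(abs(c - 10) + abs(gamma - 0.01))


# Every (C, gamma, preprocessing) combination is one independent job.
grid = list(product(C_VALUES, GAMMA_VALUES, PREPROCESSING))

# Threads keep the sketch simple; CPU-bound training would normally be
# spread over separate processes instead.
with ThreadPoolExecutor(max_workers=4) as pool:
    scores = list(pool.map(score, grid))

best_index = max(range(len(grid)), key=scores.__getitem__)
best_params = grid[best_index]
print(len(grid), "combinations, best:", best_params)
```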
      • alastairp
        I'm going out for a moment, back soon
      • lucifer
        👍
      • ritiek joined the channel
      • ritiek has quit
      • yyoung
        reosarevok: Seems CI is not triggered in PRs?
      • reosarevok
        Huh. Does seem that way
      • lucifer
        https://status.circleci.com/ Github Webhooks are down for CircleCI probably that's why.
      • down for everybody actually https://www.githubstatus.com/
      • reosarevok
        Yeah, that's also probably why we don't have BrainzGit messages here
      • lucifer
        yup, makes sense.
      • ruaok: FYI, spark cluster is unusable currently. I am trying to request user similarity based on artists but the executors crash. I have been trying to debug it for some time now.
      • ruaok
        k
      • BrainzGit
        [musicbrainz-server] reosarevok opened pull request #2156 (master…MBS-11756): MBS-11756: Collapse artist roles when there are too many https://github.com/metabrainz/musicbrainz-serve...
      • yyoung
        reosarevok: Selenium tests timed out, can you rerun it? https://ci.metabrainz.org/job/musicbrainz-serve...
      • Zastai
        lucifer: with the .89 vs 90, could it be as simple as a different decimal point leading to issues (e.g. one always using '.', the other using a locale-specific one that might end up being ',')?
      • lucifer
        Zastai: might be possible but i don't think we do localization in AB. i think it's rather related to one tool storing it on a 0 to 1 scale while the other directly outputs it in %.
      • Zastai
        makes sense
      • lucifer
        ruaok: i think the issue is fixed now. user similarity values based on artists have been generated.
      • and it is causing the top similar page to ISE; doing a release should fix it.
      • any other PRs you want to merge ruaok, if not I'll release now.
      • ruaok
        nothing, go for it.
      • lucifer
        👍
      • BrainzGit
        [listenbrainz-server] release v-2021-07-01.0 has been published by github-actions[bot]: https://github.com/metabrainz/listenbrainz-serv...
      • lucifer
        top similar loads but not showing as expected.
      • global similarity using artists.
      • akshaaatt[m]
        lucifer: Hola!
      • ruaok
        woah, very different results.
      • lucifer
        hi akshaaatt[m]
      • yup, indeed
      • akshaaatt[m]
        lucifer: I was focusing on two things recently: 1) Trying to see if there was an alternative available for our current barcode scanner; for now the conclusion is to leave things the way they are. 2) Thinking about and designing the search activities and further
      • So I have some good idea regarding 2
      • Will start work on a new branch rn, or do you suggest we focus on the previous PRs first?
      • I am comfortable as per your directions :)
      • Sophist_UK has quit
      • lucifer
        i think either is fine. regarding open PRs, i think its mostly done from your side.
      • i would suggest that we first work on adding settings to hide stuff and the remaining onboarding stuff.
      • akshaaatt[m]
        Right!
      • I shall add that to the onboarding PR then?
      • lucifer
        sure onboarding stuff can go to that PR. settings to the webview one methinks?
      • akshaaatt[m]
        Yeah right! That was what I was saying then. We could add the settings as soon as these PRs were dealt with
      • But they are just settings, so I think it shouldn't be a problem.
      • lucifer
        yup
      • ruaok
        param and/or lucifer: can you please look at the react/javascript portions of #1538?
      • lucifer
        i am not very familiar with it but sure, i'll take a look later today.
      • param
        yeah, happy to help
      • ruaok
        ah, sorry. param said he'd be able to do it.
      • ok, cool, thanks param.
      • lucifer: #1535 is ready to merge I think. wanna sanity check it so that #1538 can be rebased?
      • lucifer
        one less encounter with js :). thanks param
      • ruaok: sure on it
      • ruaok
        a man after my own ♥️
      • lucifer
        lol XD
      • 1535 approved, should i merge?
      • ruaok
        plz!
      • BrainzGit
        [listenbrainz-server] amCap1712 merged pull request #1535 (master…add-blurb-content-limit-to-pin-recordings): Add blurb content limit to pin recordings https://github.com/metabrainz/listenbrainz-serv...
      • ruaok
        lucifer: so the listenbrainz outdated dumps email is wrong. it says 464 is 16 days old. all the files are june 26. odd.
      • oh, I see.
      • lucifer
        the dump-id was of 15th.
      • ruaok
        it goes by dump date, not by file date.
      • lucifer
        right
      • i had seen the mail in the morning and checked the cron logs, the full dump is underway currently.
      • ruaok
        maybe the check needs to be extended by a day.
      • lucifer
        yeah i think that makes sense.
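The behaviour discussed here, where age is measured from the dump date rather than from the file timestamps, can be sketched as below. The `is_dump_outdated` helper, the dates, and both threshold values are hypothetical. The real check and its limits live in listenbrainz-server and are not shown in the chat.

```python
from datetime import date


def is_dump_outdated(dump_date, today, max_age_days):
    """Return True when the dump is older than the allowed number of days.

    The age is computed from the dump's own date, not from the
    modification time of the files it produced.
    """
    age_days = (today - dump_date).days
    return age_days > max_age_days


# Hypothetical dates, chosen to be 16 days apart as in the email.
dump_date = date(2021, 6, 15)
today = date(2021, 7, 1)

before = is_dump_outdated(dump_date, today, max_age_days=15)
after = is_dump_outdated(dump_date, today, max_age_days=16)
print(before, after)
```

Extending the limit by one day stops the alert from firing while the next scheduled dump is still being written.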
      • BrainzGit
        [listenbrainz-server] MonkeyDo opened pull request #1539 (master…monkey-idle-brainzplayer): BrainzPlayer datasources improvements https://github.com/metabrainz/listenbrainz-serv...
      • monkey
        Is anyone using test.LB for anything specific? I'd like to deploy this PR ^ so that we can test it live
      • It should improve the player, reduce verbosity and make it clear to users they need to connect a music service
      • BrainzGit
        [listenbrainz-server] mayhem opened pull request #1540 (master…extend-outdated-dumps-check): Extend complaining about outdated monthly dumps by one day. https://github.com/metabrainz/listenbrainz-serv...
      • ruaok quietly slinks away
      • lucifer
        monkey: no one is using test.lb currently.
      • monkey
        Okidoke, thanks :)
      • lucifer
        (i know because i had updated it last :))
      • ruaok: that was a really tough one. phew. finally completed reviewing it.
      • ruaok
        lol. soo many characters to describe a one character fix.
      • lucifer
        😂
      • ruaok
        the test failure seems unrelated.
      • lucifer
        yup, i re-ran tests.
      • the failure is related to the old resource is closed issue probably.
      • BrainzGit
        [listenbrainz-server] mayhem merged pull request #1540 (master…extend-outdated-dumps-check): Extend complaining about outdated monthly dumps by one day. https://github.com/metabrainz/listenbrainz-serv...
      • ruaok
        lucifer: the parquet files for the new spark dumps... should they be organized by year folders and then all the listens for a single day in a single file?
      • or however those concepts translate. :)
      • lucifer
        any way would be fine, we are going to control the read logic as well.
      • let me check a few things and see if one way is better than others.
      • ruaok
        ok
      • lucifer
        there's LB-722
      • BrainzBot
        LB-722: Restructure data in hdfs to allow easier updates https://tickets.metabrainz.org/browse/LB-722
      • lucifer
        and another thing is that hdfs is less efficient when reading small files.
      • structuring by years makes sense to me, below that we should take a look into what's the average size of a day's listens file.
      • iirc, i have seen many 2MB files in spark. hdfs default block size is 128 MB so a lot of space is wasted.
      • trying to keep the usual file size between 64-128 MB would be good.
      • (size of each parquet file)
      • ruaok
        what if we just write files with monotonically increasing numbers, all just under 128MB?
      • numbers as filenames, that is.
      • should they be in sorted order?
      • lucifer
        i think that would work. we could ask spark to read the files in descending order and stop when an entire block is out of range of the time period.
      • yes sorted would be good, we can minimize loading unneeded files this way.
      • ruaok
        ok, hopefully I can get that started tomorrow.
      • lucifer
        cool, thanks!
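The layout discussed above (numbered files kept under a size limit, written in time order, read newest-first until a file falls entirely outside the requested period) can be sketched in plain Python. `FileMeta`, `plan_files`, and `files_for_range` are hypothetical names, sizes are tiny stand-ins for the 128 MB target, and no Spark or HDFS calls are involved.

```python
from dataclasses import dataclass


@dataclass
class FileMeta:
    name: str     # numeric file name, increasing with time
    min_ts: int   # earliest listen timestamp in the file
    max_ts: int   # latest listen timestamp in the file


def plan_files(listens, max_bytes):
    """Pack (timestamp, size_in_bytes) listens, in time order, into numbered files."""
    files, current, used = [], [], 0
    for ts, size in sorted(listens):
        if current and used + size > max_bytes:
            files.append(FileMeta(f"{len(files)}.parquet", current[0], current[-1]))
            current, used = [], 0
        current.append(ts)
        used += size
    if current:
        files.append(FileMeta(f"{len(files)}.parquet", current[0], current[-1]))
    return files


def files_for_range(files, start_ts, end_ts):
    """Walk files newest-first and stop once a file ends before the range."""
    selected = []
    for meta in reversed(files):
        if meta.max_ts < start_ts:
            break  # files are sorted, so every older file is out of range too
        if meta.min_ts <= end_ts:
            selected.append(meta.name)
    return selected


# Ten hypothetical listens of 40 bytes each, with a 100-byte file limit.
files = plan_files([(ts, 40) for ts in range(1, 11)], max_bytes=100)
needed = files_for_range(files, start_ts=6, end_ts=8)
print([f.name for f in files], needed)
```

Keeping the files sorted is what makes the early `break` safe; with unsorted files every file's range would have to be checked.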