#metabrainz

      • diru1100
        yvanzo: Yes, that is exactly what I would like to accomplish. But the trained model file by Leo_Verto is not available to work on, or at least I haven't found it so far...
      • yvanzo
        Yes, this is why a trainer must be made first.
      • Leo_Verto
        I can provide the trained model, it might make sense to re-train it on newer spam though.
      • yvanzo
        When Leo_Verto worked on it, he had direct access to private data. It simplified development, but it also made it difficult to fully open source it.
      • We want to avoid this by publishing everything that is necessary to train a model, except the private data; hence the need for dummy data.
      • The trainer must be deployed on MeB servers along with the MB DB, so it will have access to private data.
      • diru1100
        Yes, that is my intention too :) but a model trained on dummy data would have a very bad foundation from the beginning, unless the dummy data closely resembles the real spam.
      • This brings me to my other issue: what I found is that the data we keep through online learning is minimal relative to the data the model was trained on. This might not change the model's perception anytime soon. To change this, I want to retrain the model every week/month with new editor data. For this to happen, first we have to send the data for the model to test, send it back to SpamNinja, let them
      • classify the SB result, store the results, and train the model based on the final review.
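A minimal sketch of the periodic retrain cycle described above (classify, review, store, retrain). Everything here is hypothetical: `WordCountModel` is a toy word-count classifier standing in for the real model, and `reviewer` stands in for the human review step; neither reflects the actual project code.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class WordCountModel:
    """Toy stand-in for the real classifier (hypothetical)."""
    spam_words: Counter = field(default_factory=Counter)
    ham_words: Counter = field(default_factory=Counter)

    def train(self, examples):
        # Full retrain: discard old counts and rebuild from all stored examples.
        self.spam_words.clear()
        self.ham_words.clear()
        for text, is_spam in examples:
            target = self.spam_words if is_spam else self.ham_words
            target.update(text.lower().split())

    def predict(self, text):
        # Spam if the words were seen more often in spam than in non-spam.
        words = text.lower().split()
        spam_score = sum(self.spam_words[w] for w in words)
        ham_score = sum(self.ham_words[w] for w in words)
        return spam_score > ham_score


def review_cycle(model, store, new_profiles, reviewer):
    """One weekly/monthly cycle over newly seen editor data."""
    for text in new_profiles:
        guess = model.predict(text)    # 1. the model classifies the editor data
        final = reviewer(text, guess)  # 2. a reviewer confirms or overrides the guess
        store.append((text, final))    # 3. the reviewed label is stored
    model.train(store)                 # 4. retrain on everything reviewed so far
    return model
```

Because the whole pipeline only needs `(text, label)` pairs, it can be exercised end to end with dummy data and a scripted `reviewer` before any private data is involved.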
      • reosarevok
        yvanzo: did I get it correctly in the Muziekweb email that we should also change the cleanup to standardize to .nl?
      • yvanzo
        reosarevok: yup, I think so
      • reosarevok
        Ok
      • yvanzo
        reosarevok: and probably remove language too.
      • reosarevok
        We already do, it seems
      • yvanzo
        diru1100: let's have something based on dummy data that can be retrained with dummy feedback first :)
      • We can restart from a proper model after that, does it make sense?
      • diru1100
        yes, you want to test out the whole process with dummy data everywhere, first?
      • yvanzo
        yes :)
      • bitmap
        reosarevok: could you rebase https://github.com/metabrainz/musicbrainz-serve... when you have a moment ('cause it's still referencing musicbrainz_collate so breaking for me)
      • reosarevok
        Sure
      • Gimme a sec
      • diru1100
        ok sure, I can use dummy data and do it, np. Which approach do you want me to follow for updating the model? The online learning one, or the retrain every week/month one? Or shall we test both of those as well?
      • BrainzGit
        [musicbrainz-server] yvanzo merged pull request #1526 (master…easiest-install-notes): Replace musicbrainz-vm with musicbrainz-docker https://github.com/metabrainz/musicbrainz-serve...
      • yvanzo
        diru1100: online learning at least, both if you feel it could be worth it :)
      • and if you want to do both, start with the simplest one, so something can be tested sooner.
      • Leo_Verto
        I think the problem with online learning is that the email and website tokenizers will need to be recreated once in a while to include new spam domains. This automatically invalidates all previously trained models.
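A minimal sketch of why rebuilding a tokenizer can invalidate a trained model, assuming (hypothetically) that the tokenizer maps each known domain to a fixed feature index and the model stores one weight per index. The domains, weights, and function names are illustrative only and are not taken from the actual project.

```python
def build_vocab(domains):
    # Map each known domain to a feature index; sorted for determinism.
    return {domain: i for i, domain in enumerate(sorted(set(domains)))}


def encode(domain, vocab):
    # One-hot vector with one slot per vocabulary entry; unknown domains are all zeros.
    vec = [0] * len(vocab)
    if domain in vocab:
        vec[vocab[domain]] = 1
    return vec


def score(vec, weights):
    # The weights only make sense for the vocabulary they were trained with.
    if len(vec) != len(weights):
        raise ValueError("feature vector and weights have different sizes")
    return sum(v * w for v, w in zip(vec, weights))


old_vocab = build_vocab(["example.org", "spam.example"])
weights = [-1.0, 2.0]  # hypothetical weights trained against old_vocab

# Recreating the tokenizer with a new spam domain shifts the indices.
new_vocab = build_vocab(["example.org", "spam.example", "newspam.example"])
```

With `old_vocab`, `spam.example` sits at index 1 and matches its weight. With `new_vocab` it moves to index 2 and the vector grows to three slots, so the old weights no longer line up with the features and the model would have to be retrained.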
      • KindTwo joined the channel
      • KindOne has quit
      • reosarevok
        bitmap: rebased
      • bitmap
        thx
      • KindOne joined the channel
      • KindTwo has quit
      • reosarevok
        Guess I should be rebasing pretty much everything really!
      • BrainzGit
        [musicbrainz-server] reosarevok merged pull request #1522 (master…MBS-10839): MBS-10839: Add merge button to recording list in artist overview https://github.com/metabrainz/musicbrainz-serve...
      • BrainzBot
        MBS-10839: "Add selected recordings for merging" missing in standalone-only overview https://tickets.metabrainz.org/browse/MBS-10839
      • BrainzGit
        [musicbrainz-server] reosarevok merged pull request #1519 (master…MBS-10834): MBS-10834: Account flag to disable writing edit notes https://github.com/metabrainz/musicbrainz-serve...
      • BrainzBot
        MBS-10834: New account flag for disabling ability to write edit notes https://tickets.metabrainz.org/browse/MBS-10834
      • v6lur joined the channel
      • diru1100
        yvanzo: I think I can complete phase 1 with just dummy data, and it doesn't involve any update methods. I will research which way is better until then and we can go with that?
      • yvanzo
        works for me!
      • diru1100
        Cool :)
      • yvanzo
        :)
      • KindTwo joined the channel
      • KindOne has quit
      • KindTwo is now known as KindOne
      • shivam-kapila
        ruaok: Added GET API endpoints too to the PR
      • ruaok: Mr_Monkey's listens couldn't be reached after 3 pages because of this check. Three weeks of data are missing, but the window is 15 days.
      • ruaok
        yeah, fixing two tests and adding buttons prompting the user to reload with a longer range are the two things left to do.
      • BrainzGit
        [musicbrainz-server] reosarevok opened pull request #1530 (master…MBS-10849): MBS-10849: Check for allowNew on AddReleaseGroup https://github.com/metabrainz/musicbrainz-serve...
      • BrainzBot
        MBS-10849: Add release group preview on RE shows "This entity has been removed" https://tickets.metabrainz.org/browse/MBS-10849
      • ruaok
        shivam-kapila: I trust your react skills are better than mine, right? mine are non-existent...
      • shivam-kapila
        I know react. Can't rate it though.
      • ruaok
        well, if you're up for helping, we need to add a button...
      • that gives the user the option to "look harder" for listens.
      • shivam-kapila
        Hm. I can do that.
      • ruaok
        ok, in the react props I will pass a new value called "try_harder" (boolean).
      • if true, show text and a button at the bottom of the listens in the center
      • "We could not find any more listens, but there may be more."
      • [ try harder ]
      • try harder should load the page with the same parameters, but with try_harder=1 as an extra argument.
      • does that make sense?
      • shivam-kapila
        Yeah
      • I will get this done
      • ruaok
        <3
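A minimal sketch of the link the "try harder" button would point to: the current page URL with the same parameters plus `try_harder=1`. The function name, the example URL, and the `max_ts` parameter are hypothetical; only `try_harder` comes from the discussion above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def try_harder_url(current_url):
    """Return current_url with the same parameters plus try_harder=1."""
    parts = urlsplit(current_url)
    # Keep the existing parameters, dropping any old try_harder so it is not duplicated.
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "try_harder"
    ]
    params.append(("try_harder", "1"))
    return urlunsplit(parts._replace(query=urlencode(params)))
```

For example, `try_harder_url("https://example.org/user/someone?max_ts=1593000000")` yields `https://example.org/user/someone?max_ts=1593000000&try_harder=1`.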
      • v6lur has quit
      • BestSteve has quit
      • v6lur joined the channel
      • bitmap
        yvanzo: would like to deploy https://github.com/metabrainz/musicbrainz-serve... today so we don't have to babysit the cron container tonight & over the weekend
      • yvanzo
        bitmap: reviewed
      • BrainzGit has quit
      • BrainzGit joined the channel
      • jmp_music has quit
      • d4rkie joined the channel
      • BrainzGit
        [musicbrainz-server] mwiencek merged pull request #1528 (production…slow-reports): Prevent two statement timeouts in reports https://github.com/metabrainz/musicbrainz-serve...
      • Nyanko-sensei has quit
      • MajorLurker joined the channel
      • BrainzGit has left the channel
      • BrainzGit joined the channel
      • v6lur has quit
      • ruaok
        shivam-kapila: timescale-rebased-again is pushed -- something is making the connection to the db blow up during the integration tests. maybe you have a moment to look at it -- I suspect i'll be afk all day tomorrow.
      • zas
        bitmap, yvanzo: it seems container mb website on ludwig is misbehaving. Can you have a look?
      • bitmap, yvanzo: around?
      • ok, I'll just restart it, logs don't help at all (zillions of "Can't use an undefined value as a subroutine reference at /home/musicbrainz/carton-local/lib/perl5/Plack/Util.pm line 14")
      • I also restarted the ws container on ludwig, everything is back to normal now, not sure what happened (but it seems both containers were affected)
      • ZaphodBeeblebrox has quit
      • Chinmay3199 has quit
      • rdswift
        zas: The Picard docs refer to the following as basic tags, but I haven't yet found a release that will produce them. Do you know if they are still valid, or have they been deprecated? musicbrainz_originalalbumid, musicbrainz_originalartistid, musicbrainz_releasetrackid, originalalbum, originalartist
      • d4rkie has quit
      • Nyanko-sensei joined the channel