#metabrainz

/

      • d4rkie has quit
      • d4rk-ph0enix joined the channel
      • BrainzGit
        [critiquebrainz] 14TrellixVulnTeam opened pull request #475 (03master…master): CVE-2007-4559 Patch https://github.com/metabrainz/critiquebrainz/pu...
      • [sir] 14amCap1712 opened pull request #140 (03master…fix-deprecation-warnings): Fix deprecation warnings in SIR https://github.com/metabrainz/sir/pull/140
      • [sir] 14amCap1712 opened pull request #141 (03fix-deprecation-warnings…fix-more-warnings): Fix more SQLAlchemy related warnings https://github.com/metabrainz/sir/pull/141
      • [sir] 14amCap1712 opened pull request #142 (03fix-more-warnings…2to3): Port Sir to Python 3 https://github.com/metabrainz/sir/pull/142
      • -- BotBot disconnected, possible missing messages --
      • -- BotBot disconnected, possible missing messages --
      • BrainzBot joined the channel
      • kgz joined the channel
      • rektide joined the channel
      • yvanzo
        O’Moin
      • lucifer
        morning!
      • BrainzGit
        [troi-recommendation-playground] 14amCap1712 opened pull request #72 (03main…recent-listens-filter): Add RecentListensTimestampLookup element https://github.com/metabrainz/troi-recommendati...
      • lucifer
        yvanzo: hi! while testing the SQLAlchemy PR, did you notice any performance regression in indexing?
      • yvanzo
        lucifer: No but the performances were so uneven before that it is hard to compare.
      • lucifer
        i see, makes sense.
      • yvanzo
        Did you notice anything like that?
      • lucifer
        while testing the Sir Python 3 PR on wolf, i noticed that SIR is making multiple queries to fetch the data of a single object. the N + 1 select problem.
      • yvanzo
        Might it be related to mbdata?
      • lucifer
        i don't remember how much time it used to take earlier so not sure if its a regression.
      • possibly, but i did checkout v3.0.1 version and did a quick test and saw the same issue there.
      • yvanzo
        I can make a speed comparison test on an isolated VM.
      • lucifer
        i'll try to do a clean setup and see if it happens on v3.0.1 as well. in any case, makes sense to fix the issue.
      • oh that would be great, thanks!
      • i'll look into fixing the N + 1 select problem after Sir is ported to Py 3.
      • yvanzo
        lucifer: did you test reindexing with #140 & #141 already?
      • lucifer
        yvanzo: no i didn't reindex with those. i am reindexing with #142 currently which changes from all the PRs.
      • *which contains
      • fwiw, the real data indexing tests pass with #140 and #141.
      • yvanzo
        how do you check queries being made while testing?
      • lucifer
        testing as in when running the unit tests or while testing indexing on the entire live database?
      • for unit tests, add `echo=True` to `create_engine` call in `sir.util`. and add a `assert False` at the end of the test you want to view queries for (because pytest doesn't logs for passing tests by default).
      • for live database test, run sir reindex with `--sqltimings` option
      • yvanzo
        as in how did you notice it but both are interesting
      • lucifer
      • here is a snippet of logs from --sqltimings
      • as you can see its loading the comment field in a separate query instead of the one used to load main artist data
      • for the unit tests, this is the hack i use to view queries: https://github.com/metabrainz/sir/commit/5d8706...
      • SQLAlchemy now has a `raisedload` feature, which raises an exception if it detects multiple queries being made to fetch columns. we can look into adding that once the issue is resolved. that'll probably help in preventing regressions.
      • v6lur has quit
      • riksucks
        hello lucifer!
      • I was lurking jira and found LB-1151, I believe that when I had coded the frontend of missing mb I had also reported of recordings that are already mapped in MB. I think we even brainstormed on how to go about it.
      • BrainzBot
        LB-1151: Cannot select existing recording in "missing metadata" page https://tickets.metabrainz.org/browse/LB-1151
      • riksucks
        I think what came of it was, storing these false positives in a db and not serve these false positives in the missing mb endpoint, and later on over time map them. I might be wrong
      • is it how we decided, because I do remember that there were some uncertainties, which is why we didn't proceed.
      • cc monkey aerozol
      • trolley has quit
      • trolley joined the channel
      • v6lur joined the channel
      • lucifer
        riksucks: hi! we not only want to store the false positives but also be able to say that this false positive should have been mapped to this recording in musicbrainz.
      • for that part of the feature, we probably need some more data in LB (the improvements to mb_metadata_cache which is why that is currently on hold)
      • BrainzGit
        [musicbrainz-server] 14reosarevok opened pull request #2696 (03master…MBS-12674): MBS-12674: Don't assume offset data exists https://github.com/metabrainz/musicbrainz-serve...
      • reosarevok
        bitmap, yvanzo ^ can one of you check this today? I'd like to put this (and other merged fixes) into beta tonight
      • yvanzo: do you need a second ticket for the refcount bit of https://github.com/metabrainz/musicbrainz-serve... ? Or is it fine if I just widen the existing ticket to cover it?
      • yvanzo
        reosarevok: not personally; either way is fine as long as it is covered by a ticket since it is a user-facing change.
      • reosarevok
        Ok
      • For the other issue, dunno if you saw the https://github.com/metabrainz/musicbrainz-serve... PR but it's only half-fixable :/
      • For the ticket, I changed "Expand time before auto-deletion to 48 hours" to "Standardize time before auto-deletion to 48 hours" - and I had already added this comment https://tickets.metabrainz.org/browse/MBS-10861... - does that seem enough ? :)
      • BrainzBot
        MBS-10861: Standardize time before auto-deletion to 48 hours
      • BrainzGit
        [musicbrainz-server] 14reosarevok opened pull request #2697 (03master…MBS-12672): MBS-12672: Add "Genre" to edit search like other entities https://github.com/metabrainz/musicbrainz-serve...
      • yvanzo
        reosarevok: Why unreferenced ACs should be autoremoved earlier if it is causing issues?
      • reosarevok
        Because otherwise any artists used in those that should be autoremoved stay around for over a week - but also, IIRC I mentioned that this issue is not really helped by a longer autoremoval time
      • Because anyone can reuse the AC in an open edit at any time during the autoremoval-pending period
      • So if it's 7 days, but somebody adds the edit 2 days before the end of the 7 days, it's the same as if it was 2 days and they entered it at the beginning
      • (the only issue caused, in any case, should be that the edit adds a new AC rather than reusing the old, now autoremoved one, at least by bitmap's understanding)
      • yvanzo
        (which is quite annoying as we support AC MBIDs)
      • reosarevok
        Basically, as far as I can tell, the only way to avoid that happening at all is checking whether any edit data anywhere is using the AC, which seems unfeasible, or have a separate edits_pending-like thing that we update when any edit uses a specific AC, which I guess is doable, but complicated
      • Or, I guess, we redefine edits_pending so that every time an AC is used in an open edit it gets updated, but then most popular ACs would be highlighted pretty much all the time, which does not seem desirable
      • We can at least stop removals when the AC is being edited directly (it has actual edits_pending)
      • That much is easy
      • And then we can think if there's a good way we can improve the setup with a schema change so that other edits are also detected / stop the removal maybe :)
      • yvanzo
        Do you have at hand any example of edit missing ACs?
      • reosarevok
        Not really - it's a theoretical worry right now, although I bet some are out there
      • But basically: empty an AC of uses so it will get added to the unreferenced list, then enter an open (pending) edit that would use it again, for example by adding it to a new track in an existing release or by editing a recording artist
      • The good thing is that since releases for example get added while pending, it being used in a new release would not cause issues (neither would, of course, it being used in any automatically applied edit)
      • alastairp
        lucifer: great notes about testing sir! thanks
      • would be nice to put them in a debugging/admin doc :)
      • please let me know if I can help on anything too
      • today I'm going to get back into oauth
      • monkey
        aerozol: Hi! One more mockup request for you :) It looks like we'll soon need to add more user preferences to listenbrainz (looking at LB#2206), and the existing settings page is a bit haphazard (https://listenbrainz.org/profile). Not sure where more settings options would currently fit. Could you help us prepare a better setting page? We'll eventually want separate sections for general profile settings, brainzplayer settings,
      • account tokens & deletion, etc.
      • BrainzBot
        Add troi preference for auto spotify export: https://github.com/metabrainz/listenbrainz-serv...
      • monkey
        A section for recommendation settings, perhaps? ^
      • yvanzo
        alastairp: About Solr in MB, do you want to have a feature-release before moving to Solr 9?
      • alastairp
        yvanzo: I think a feature release would be good - the sort change + reosarevok's PR for genre?
      • would be a good way to also test sir+sqlalchemy upgrade without also introducing additional complexity of another upgrade with solr?
      • yvanzo
        yes, plus dropping id for tags and cdstubs, because it has a dependency on mbssss
      • alastairp
        great. for tags + cdstubs if that includes a schema change then we'll need to reindex those cores?
      • yvanzo
        lucifer: Does it seem good to plan for another py2 release of SIR to match changes made to other changes to other search repositories?
      • alastairp
        as I mentioned in the PR for my search order PR, I think we can get away with just shutting down the core, editing the data file, and starting it up again
      • it means that we don't need to reindex the large cores
      • yvanzo
        alastairp: For tags and cdstubs, I guess that it is not needed to reindex those cores to keep it working, reindexing would just lighten these cores; To be confirmed with tests.
      • These two ones are not large cores anyway.
      • alastairp
        because it changes/removes a column from the schema I recommend to reindex
      • yes, and because they're small I think that it should be fine to do this
      • yvanzo
        Will look into it more closely then, if it requires reindexing cores, maybe it should go with Solr 9 release instead.
      • alastairp
        hmm. but that means we would need to reindex _all_ cores?
      • instead of just the two?
      • yvanzo
        For Solr 9? Yes.
      • I assumed so at least.
      • lucifer
        alastairp: sure, i'll open a PR later to add it to sir documentation.
      • yvanzo: sounds good to do a py2 release if needed.
      • mayhem: for the filter constant, thoughts on naming? DAYS_OF_RECENT_LISTENS_TO_EXCLUDE ?
      • mayhem
        SGTM
      • lucifer
        alastairp, yvanzo: i couldn't find the same section in solr 7 docs but from solr 8. https://solr.apache.org/guide/8_0/reindexing.ht...
      • reindexing seems optional in case of field removal.
      • mayhem
        lucifer: seems there are loads of open questions on troi -- I'm around if you want to discuss.
      • Maxr1998 has quit
      • lucifer
        sure, in a min.
      • yvanzo
        thanks
      • lucifer
      • BrainzGit
        [troi-recommendation-playground] 14mayhem merged pull request #42 (03main…add-genre-filter): Add a genre filter element https://github.com/metabrainz/troi-recommendati...
      • [troi-recommendation-playground] 14amCap1712 merged pull request #72 (03main…recent-listens-filter): Add RecentListensTimestampLookup element https://github.com/metabrainz/troi-recommendati...
      • lucifer
        mayhem: let's start
      • mayhem is ready
      • mayhem
        what the open question there?
      • lucifer
        for exporting existing playlists from LB to spotify, i added this patch. i am not sure whether a patch is the right way to do this and whether this patch should live in LB or troi?
      • mayhem
        hmmm.
      • if we're going to use troi to generate playlists for the time being, then leaving it here for now sounds fine.
      • lucifer
        one thing for sure that should live in troi is the deserialize json to a playlist object.
      • mayhem
        when/if we outgrow troi for this purpose, we'll reexamine.
      • lucifer
        sure sounds good.
      • mayhem
        agreed.
      • lucifer
        another related thing is we need to external playlist urls in the playlists. so that we can return the url of the spotify playlist to the frontend.
      • mayhem
        makes sense.
      • lucifer
        i added an external_urls field to the playlist object. currently, its a list but i think it makes to make it a dict so that in future we can eaily allow submitting to multiple sites.
      • mayhem
        good plan.
      • lucifer
        relatedly, we probably want to show the number of tracks not matched or which tracks exactly were not matched i guess. how do we implement that?
      • should we attach a generic metadata dict to each playlist?
      • actually for the current needs, we could iterate once more over the existing recordings in LB so probably let's just do that.
      • mayhem
        we already have this, no?
      • mayhem finds a link
      • lucifer
      • algorithm_metadata ?
      • mayhem
        yes.
      • lucifer
        that sounds wrong for the kind of data we want to store.
      • i want to store something like 47 / 50 tracks matched. here are the 3 tracks we couldn't match.
      • mayhem
        my vision for this field is to store things that relate to the playlist at hand.
      • ah, I see. yes.
      • playlist level data vs track level. got it.
      • lucifer
        i see. another thing is we filter algorithm metadata in LB iirc.
      • mayhem
        I think we need a playlist level dict, agreed.
      • I think we should start thinking about adding a transparency field to the alg metadata.
      • which should contain human readable text as to WHY this track is in their playlist.
      • lucifer
        yeah so currently if we submit algorithm metadata as a normal user, its ignored. https://github.com/metabrainz/listenbrainz-serv...
      • transparency field makes sense
      • mayhem nods
      • oh and about the external uris as well, those also need to have a field in JSPF extensions i think.