[sir] amCap1712 opened pull request #141 (fix-deprecation-warnings…fix-more-warnings): Fix more SQLAlchemy related warnings https://github.com/metabrainz/sir/pull/141
yvanzo: hi! while testing the SQLAlchemy PR, did you notice any performance regression in indexing?
yvanzo
lucifer: No, but performance was so uneven before that it is hard to compare.
lucifer
i see, makes sense.
yvanzo
Did you notice anything like that?
lucifer
while testing the Sir Python 3 PR on wolf, i noticed that SIR is making multiple queries to fetch the data of a single object. the N + 1 select problem.
yvanzo
Might it be related to mbdata?
lucifer
i don't remember how much time it used to take earlier, so not sure if it's a regression.
possibly, but i did check out v3.0.1, ran a quick test, and saw the same issue there.
yvanzo
I can make a speed comparison test on an isolated VM.
lucifer
i'll try to do a clean setup and see if it happens on v3.0.1 as well. in any case, makes sense to fix the issue.
oh that would be great, thanks!
i'll look into fixing the N + 1 select problem after Sir is ported to Py 3.
yvanzo
lucifer: did you test reindexing with #140 & #141 already?
lucifer
yvanzo: no i didn't reindex with those. i am currently reindexing with #142, which contains changes from all the PRs.
fwiw, the real data indexing tests pass with #140 and #141.
yvanzo
how do you check queries being made while testing?
lucifer
testing as in when running the unit tests or while testing indexing on the entire live database?
for unit tests, add `echo=True` to the `create_engine` call in `sir.util`, and add an `assert False` at the end of the test you want to view queries for (because pytest doesn't show logs for passing tests by default).
for a live database test, run sir reindex with the `--sqltimings` option
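A minimal sketch of the unit-test approach (illustrative names, not sir.util's actual code): `echo=True` makes SQLAlchemy log every SQL statement it emits, which is what exposes repeated per-object SELECTs.
```python
from sqlalchemy import create_engine

def db_engine(uri: str):
    # echo=True logs each emitted SQL statement, so a burst of
    # near-identical SELECTs per object points at the N+1 problem.
    return create_engine(uri, echo=True)
```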
yvanzo
as in, how did you notice it? but both are interesting
lucifer
SQLAlchemy now has a `raiseload` feature, which raises an exception when a relationship would be lazily loaded with an extra query. we can look into adding that once the issue is resolved. that'll probably help in preventing regressions.
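A self-contained sketch of that idea (hypothetical models, not Sir's actual schema): the explicit `joinedload` fetches the relationship up front, and `raiseload("*")` turns any other lazy load into an immediate error instead of a hidden query.
```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import (Session, declarative_base, joinedload,
                            raiseload, relationship)

Base = declarative_base()

class Artist(Base):
    __tablename__ = "artist"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    credits = relationship("Credit", back_populates="artist")

class Credit(Base):
    __tablename__ = "credit"
    id = Column(Integer, primary_key=True)
    artist_id = Column(Integer, ForeignKey("artist.id"))
    artist = relationship("Artist", back_populates="credits")

engine = create_engine("sqlite://", echo=True)
Base.metadata.create_all(engine)

with Session(engine) as session:
    # credits is loaded in the same query; touching any other lazy
    # relationship would raise instead of silently querying again.
    artists = (
        session.query(Artist)
        .options(joinedload(Artist.credits), raiseload("*"))
        .all()
    )
```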
riksucks
hello lucifer!
I was lurking on jira and found LB-1151. I believe that when I coded the frontend of missing mb, I had also reported recordings that are already mapped in MB. I think we even brainstormed on how to go about it.
I think what came of it was: store these false positives in a db, stop serving them from the missing mb endpoint, and map them over time. I might be wrong.
is that how we decided? I do remember there were some uncertainties, which is why we didn't proceed.
cc monkey aerozol
lucifer
riksucks: hi! we not only want to store the false positives but also be able to say that this false positive should have been mapped to this recording in musicbrainz.
for that part of the feature, we probably need some more data in LB (the improvements to mb_metadata_cache, which is why that part is currently on hold)
reosarevok
For the ticket, I changed "Expand time before auto-deletion to 48 hours" to "Standardize time before auto-deletion to 48 hours" - and I had already added this comment https://tickets.metabrainz.org/browse/MBS-10861... - does that seem enough? :)
BrainzBot
MBS-10861: Standardize time before auto-deletion to 48 hours
reosarevok: Why should unreferenced ACs be autoremoved earlier if it is causing issues?
reosarevok
Because otherwise any artists used only in those ACs, which should also be autoremoved, stay around for over a week - but also, IIRC, I mentioned that this issue is not really helped by a longer autoremoval time
Because anyone can reuse the AC in an open edit at any time during the autoremoval-pending period
So if it's 7 days, but somebody adds the edit 2 days before the end of the 7 days, it's the same as if it was 2 days and they entered it at the beginning
(the only issue caused, in any case, should be that the edit adds a new AC rather than reusing the old, now autoremoved one, at least by bitmap's understanding)
yvanzo
(which is quite annoying as we support AC MBIDs)
reosarevok
Basically, as far as I can tell, the only way to avoid that happening at all is checking whether any edit data anywhere is using the AC, which seems unfeasible, or having a separate edits_pending-like thing that we update when any edit uses a specific AC, which I guess is doable, but complicated
Or, I guess, we redefine edits_pending so that every time an AC is used in an open edit it gets updated, but then most popular ACs would be highlighted pretty much all the time, which does not seem desirable
We can at least stop removals when the AC is being edited directly (it has actual edits_pending)
That much is easy
And then we can think if there's a good way we can improve the setup with a schema change so that other edits are also detected / stop the removal maybe :)
yvanzo
Do you have at hand any example of an edit missing ACs?
reosarevok
Not really - it's a theoretical worry right now, although I bet some are out there
But basically: empty an AC of uses so it will get added to the unreferenced list, then enter an open (pending) edit that would use it again, for example by adding it to a new track in an existing release or by editing a recording artist
The good thing is that since releases for example get added while pending, it being used in a new release would not cause issues (neither would, of course, it being used in any automatically applied edit)
alastairp
lucifer: great notes about testing sir! thanks
would be nice to put them in a debugging/admin doc :)
please let me know if I can help on anything too
today I'm going to get back into oauth
monkey
aerozol: Hi! One more mockup request for you :) It looks like we'll soon need to add more user preferences to listenbrainz (looking at LB#2206), and the existing settings page is a bit haphazard (https://listenbrainz.org/profile). Not sure where more settings options would currently fit. Could you help us prepare a better settings page? We'll eventually want separate sections for general profile settings, brainzplayer settings,
yvanzo
alastairp: About Solr in MB, do you want to have a feature release before moving to Solr 9?
alastairp
yvanzo: I think a feature release would be good - the sort change + reosarevok's PR for genre?
it would be a good way to also test the sir+sqlalchemy upgrade without introducing the additional complexity of another upgrade with solr at the same time
yvanzo
yes, plus dropping id for tags and cdstubs, because it has a dependency on mbssss
alastairp
great. for tags + cdstubs if that includes a schema change then we'll need to reindex those cores?
yvanzo
lucifer: Does it seem good to plan for another py2 release of SIR to match the changes made to other search repositories?
alastairp
as I mentioned in my search order PR, I think we can get away with just shutting down the core, editing the data file, and starting it up again
it means that we don't need to reindex the large cores
yvanzo
alastairp: For tags and cdstubs, I guess that reindexing those cores is not needed to keep them working; reindexing would just lighten these cores. To be confirmed with tests.
These two are not large cores anyway.
alastairp
because it changes/removes a column from the schema, I recommend reindexing
yes, and because they're small I think that it should be fine to do this
yvanzo
Will look into it more closely then; if it requires reindexing cores, maybe it should go with the Solr 9 release instead.
alastairp
hmm. but that means we would need to reindex _all_ cores?
instead of just the two?
yvanzo
For Solr 9? Yes.
I assumed so at least.
lucifer
alastairp: sure, i'll open a PR later to add it to sir documentation.
yvanzo: sounds good to do a py2 release if needed.
mayhem: for the filter constant, thoughts on naming? DAYS_OF_RECENT_LISTENS_TO_EXCLUDE ?
for exporting existing playlists from LB to spotify, i added this patch. i am not sure whether a patch is the right way to do this and whether this patch should live in LB or troi?
mayhem
hmmm.
if we're going to use troi to generate playlists for the time being, then leaving it here for now sounds fine.
lucifer
one thing that should definitely live in troi is deserializing the json into a playlist object.
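A minimal sketch of that deserialization (hypothetical field names based on a JSPF-style payload, not troi's actual API):
```python
import json
from dataclasses import dataclass, field

@dataclass
class Track:
    title: str
    creator: str

@dataclass
class Playlist:
    title: str
    tracks: list = field(default_factory=list)

def playlist_from_json(payload: str) -> Playlist:
    # Parse a {"playlist": {"title": ..., "track": [...]}} document
    # into plain objects the rest of the pipeline can work with.
    data = json.loads(payload)["playlist"]
    tracks = [Track(t["title"], t["creator"]) for t in data.get("track", [])]
    return Playlist(title=data["title"], tracks=tracks)
```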
mayhem
when/if we outgrow troi for this purpose, we'll reexamine.
lucifer
sure sounds good.
mayhem
agreed.
lucifer
another related thing is we need to store external playlist urls in the playlists, so that we can return the url of the spotify playlist to the frontend.
mayhem
makes sense.
lucifer
i added an external_urls field to the playlist object. currently, it's a list, but i think it makes sense to make it a dict so that in the future we can easily allow submitting to multiple sites.
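A small sketch of that shape (assumed field names, not LB's actual model); keying by service name is what makes adding more sites cheap later:
```python
from dataclasses import dataclass, field

@dataclass
class PlaylistExport:
    title: str
    # Keyed by service, e.g.
    # {"spotify": "https://open.spotify.com/playlist/..."}
    external_urls: dict = field(default_factory=dict)
```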
mayhem
good plan.
lucifer
relatedly, we probably want to show the number of tracks not matched, or exactly which tracks were not matched i guess. how do we implement that?
should we attach a generic metadata dict to each playlist?
actually, for the current needs, we could iterate once more over the existing recordings in LB, so probably let's just do that.
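A rough sketch of that iteration (assumed shapes, illustrative only): compare the exported tracks against the recordings LB already knows about.
```python
def unmatched_tracks(playlist_tracks: list[dict], known_mbids: set[str]) -> list[dict]:
    # Tracks whose recording_mbid is missing or unknown were not matched.
    return [t for t in playlist_tracks
            if t.get("recording_mbid") not in known_mbids]
```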