[sir] amCap1712 opened pull request #141 (fix-deprecation-warnings…fix-more-warnings): Fix more SQLAlchemy related warnings https://github.com/metabrainz/sir/pull/141
yvanzo: hi! while testing the SQLAlchemy PR, did you notice any performance regression in indexing?
yvanzo
lucifer: No, but performance was so uneven before that it is hard to compare.
lucifer
i see, makes sense.
yvanzo
Did you notice anything like that?
lucifer
while testing the Sir Python 3 PR on wolf, i noticed that SIR is making multiple queries to fetch the data of a single object. the N + 1 select problem.
yvanzo
Might it be related to mbdata?
lucifer
i don't remember how much time it used to take earlier, so not sure if it's a regression.
possibly, but i did check out v3.0.1, ran a quick test, and saw the same issue there.
yvanzo
I can make a speed comparison test on an isolated VM.
lucifer
i'll try to do a clean setup and see if it happens on v3.0.1 as well. in any case, makes sense to fix the issue.
oh that would be great, thanks!
i'll look into fixing the N + 1 select problem after Sir is ported to Py 3.
yvanzo
lucifer: did you test reindexing with #140 & #141 already?
lucifer
yvanzo: no i didn't reindex with those. i am currently reindexing with #142, which contains changes from all the PRs.
fwiw, the real data indexing tests pass with #140 and #141.
yvanzo
how do you check queries being made while testing?
lucifer
testing as in when running the unit tests or while testing indexing on the entire live database?
for unit tests, add `echo=True` to the `create_engine` call in `sir.util`, and add an `assert False` at the end of the test you want to view queries for (because pytest doesn't show logs for passing tests by default).
for a live database test, run sir reindex with the `--sqltimings` option
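A minimal sketch of the unit-test approach (illustrative names, not sir.util's actual code): `echo=True` makes SQLAlchemy log every SQL statement it emits, which is what exposes repeated per-object SELECTs.
```python
from sqlalchemy import create_engine

def db_engine(uri: str):
    # echo=True logs each emitted SQL statement, so a burst of
    # near-identical SELECTs per object points at the N+1 problem.
    return create_engine(uri, echo=True)
```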
yvanzo
as in, how did you notice it? but both are interesting
lucifer
SQLAlchemy now has a `raiseload` feature, which raises an exception when a relationship would be lazily loaded with an extra query. we can look into adding that once the issue is resolved. that'll probably help in preventing regressions.
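A self-contained sketch of that idea (hypothetical models, not Sir's actual schema): the explicit `joinedload` fetches the relationship up front, and `raiseload("*")` turns any other lazy load into an immediate error instead of a hidden query.
```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import (Session, declarative_base, joinedload,
                            raiseload, relationship)

Base = declarative_base()

class Artist(Base):
    __tablename__ = "artist"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    credits = relationship("Credit", back_populates="artist")

class Credit(Base):
    __tablename__ = "credit"
    id = Column(Integer, primary_key=True)
    artist_id = Column(Integer, ForeignKey("artist.id"))
    artist = relationship("Artist", back_populates="credits")

engine = create_engine("sqlite://", echo=True)
Base.metadata.create_all(engine)

with Session(engine) as session:
    # credits is loaded in the same query; touching any other lazy
    # relationship would raise instead of silently querying again.
    artists = (
        session.query(Artist)
        .options(joinedload(Artist.credits), raiseload("*"))
        .all()
    )
```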
riksucks
hello lucifer!
I was lurking on jira and found LB-1151. I believe that when I coded the frontend of missing mb, I had also reported recordings that are already mapped in MB. I think we even brainstormed on how to go about it.
I think what came of it was: store these false positives in a db, stop serving them from the missing mb endpoint, and map them over time. I might be wrong.
is that how we decided? I do remember there were some uncertainties, which is why we didn't proceed.
cc monkey aerozol
lucifer
riksucks: hi! we not only want to store the false positives but also be able to say that this false positive should have been mapped to this recording in musicbrainz.
for that part of the feature, we probably need some more data in LB (the improvements to mb_metadata_cache, which is why that part is currently on hold)
reosarevok
For the ticket, I changed "Expand time before auto-deletion to 48 hours" to "Standardize time before auto-deletion to 48 hours" - and I had already added this comment https://tickets.metabrainz.org/browse/MBS-10861... - does that seem enough? :)
BrainzBot
MBS-10861: Standardize time before auto-deletion to 48 hours
reosarevok: Why should unreferenced ACs be autoremoved earlier if it is causing issues?
reosarevok
Because otherwise any artists used only in those ACs, which should also be autoremoved, stay around for over a week - but also, IIRC, I mentioned that this issue is not really helped by a longer autoremoval time
Because anyone can reuse the AC in an open edit at any time during the autoremoval-pending period
So if it's 7 days, but somebody adds the edit 2 days before the end of the 7 days, it's the same as if it was 2 days and they entered it at the beginning
(the only issue caused, in any case, should be that the edit adds a new AC rather than reusing the old, now autoremoved one, at least by bitmap's understanding)
yvanzo
(which is quite annoying as we support AC MBIDs)
reosarevok
Basically, as far as I can tell, the only way to avoid that happening at all is checking whether any edit data anywhere is using the AC, which seems unfeasible, or having a separate edits_pending-like thing that we update when any edit uses a specific AC, which I guess is doable, but complicated
Or, I guess, we redefine edits_pending so that every time an AC is used in an open edit it gets updated, but then most popular ACs would be highlighted pretty much all the time, which does not seem desirable
We can at least stop removals when the AC is being edited directly (it has actual edits_pending)
That much is easy
And then we can think if there's a good way we can improve the setup with a schema change so that other edits are also detected / stop the removal maybe :)
yvanzo
Do you have at hand any example of an edit missing ACs?
reosarevok
Not really - it's a theoretical worry right now, although I bet some are out there
But basically: empty an AC of uses so it will get added to the unreferenced list, then enter an open (pending) edit that would use it again, for example by adding it to a new track in an existing release or by editing a recording artist
The good thing is that since releases for example get added while pending, it being used in a new release would not cause issues (neither would, of course, it being used in any automatically applied edit)
alastairp
lucifer: great notes about testing sir! thanks
would be nice to put them in a debugging/admin doc :)
please let me know if I can help on anything too
today I'm going to get back into oauth
monkey
aerozol: Hi! One more mockup request for you :) It looks like we'll soon need to add more user preferences to listenbrainz (looking at LB#2206), and the existing settings page is a bit haphazard (https://listenbrainz.org/profile). Not sure where more settings options would currently fit. Could you help us prepare a better settings page? We'll eventually want separate sections for general profile settings, brainzplayer settings,
yvanzo
alastairp: About Solr in MB, do you want to have a feature release before moving to Solr 9?
alastairp
yvanzo: I think a feature release would be good - the sort change + reosarevok's PR for genre?
it would be a good way to also test the sir+sqlalchemy upgrade without introducing the additional complexity of another upgrade with solr at the same time
yvanzo
yes, plus dropping id for tags and cdstubs, because it has a dependency on mbssss
alastairp
great. for tags + cdstubs if that includes a schema change then we'll need to reindex those cores?
yvanzo
lucifer: Does it seem good to plan for another py2 release of SIR to match the changes made to other search repositories?
alastairp
as I mentioned in my search order PR, I think we can get away with just shutting down the core, editing the data file, and starting it up again
it means that we don't need to reindex the large cores
yvanzo
alastairp: For tags and cdstubs, I guess that reindexing those cores is not needed to keep them working; reindexing would just lighten these cores. To be confirmed with tests.
These two are not large cores anyway.
alastairp
because it changes/removes a column from the schema, I recommend reindexing
yes, and because they're small I think that it should be fine to do this
yvanzo
Will look into it more closely then; if it requires reindexing cores, maybe it should go with the Solr 9 release instead.
alastairp
hmm. but that means we would need to reindex _all_ cores?
instead of just the two?
yvanzo
For Solr 9? Yes.
I assumed so at least.
lucifer
alastairp: sure, i'll open a PR later to add it to sir documentation.
yvanzo: sounds good to do a py2 release if needed.
mayhem: for the filter constant, thoughts on naming? DAYS_OF_RECENT_LISTENS_TO_EXCLUDE ?
for exporting existing playlists from LB to spotify, i added this patch. i am not sure whether a patch is the right way to do this and whether this patch should live in LB or troi?
mayhem
hmmm.
if we're going to use troi to generate playlists for the time being, then leaving it here for now sounds fine.
lucifer
one thing that should definitely live in troi is deserializing the json into a playlist object.
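A minimal sketch of that deserialization (hypothetical field names based on a JSPF-style payload, not troi's actual API):
```python
import json
from dataclasses import dataclass, field

@dataclass
class Track:
    title: str
    creator: str

@dataclass
class Playlist:
    title: str
    tracks: list = field(default_factory=list)

def playlist_from_json(payload: str) -> Playlist:
    # Parse a {"playlist": {"title": ..., "track": [...]}} document
    # into plain objects the rest of the pipeline can work with.
    data = json.loads(payload)["playlist"]
    tracks = [Track(t["title"], t["creator"]) for t in data.get("track", [])]
    return Playlist(title=data["title"], tracks=tracks)
```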
mayhem
when/if we outgrow troi for this purpose, we'll reexamine.
lucifer
sure sounds good.
mayhem
agreed.
lucifer
another related thing is we need to store external playlist urls in the playlists, so that we can return the url of the spotify playlist to the frontend.
mayhem
makes sense.
lucifer
i added an external_urls field to the playlist object. currently, it's a list, but i think it makes sense to make it a dict so that in the future we can easily allow submitting to multiple sites.
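A small sketch of that shape (assumed field names, not LB's actual model); keying by service name is what makes adding more sites cheap later:
```python
from dataclasses import dataclass, field

@dataclass
class PlaylistExport:
    title: str
    # Keyed by service, e.g.
    # {"spotify": "https://open.spotify.com/playlist/..."}
    external_urls: dict = field(default_factory=dict)
```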
mayhem
good plan.
lucifer
relatedly, we probably want to show the number of tracks not matched, or exactly which tracks were not matched i guess. how do we implement that?
should we attach a generic metadata dict to each playlist?
actually, for the current needs, we could iterate once more over the existing recordings in LB, so probably let's just do that.
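A rough sketch of that iteration (assumed shapes, illustrative only): compare the exported tracks against the recordings LB already knows about.
```python
def unmatched_tracks(playlist_tracks: list[dict], known_mbids: set[str]) -> list[dict]:
    # Tracks whose recording_mbid is missing or unknown were not matched.
    return [t for t in playlist_tracks
            if t.get("recording_mbid") not in known_mbids]
```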