in #metabrainz

4:03 AM
Shubh joined the channel
5:26 AM
monotux has quit
5:27 AM
monotux joined the channel
7:11 AM
BrainzGit

[musicbrainz-server] yvanzo merged pull request #2458 (master…mbs-12258-update-if-needed): MBS-12258: Skip updating up-to-date containers https://github.com/metabrainz/musicbrainz-serve...
7:36 AM
reosarevok

yvanzo: I was thinking about MBS-12273
7:36 AM
BrainzBot

MBS-12273: Improve misleading collection edits header https://tickets.metabrainz.org/browse/MBS-12273
7:36 AM
reosarevok

And I get the feeling the best option would actually be to just rename the collection type names from "Artist" to "Artist collection", etc
7:37 AM
To match what we do with series, which is already "Recording series", "Release series", etc.
7:37 AM
Would that cause a mess in sir? Do we currently index collections at all?
7:42 AM
I mean, I could also append (collection) to the headers, but just renaming seems simpler and better to me tbh :)
8:30 AM
cuanim joined the channel
8:44 AM
BrainzGit

[musicbrainz-server] reosarevok opened pull request #2464 (master…MBS-9234): MBS-9234: Don't select a default type when adding new series https://github.com/metabrainz/musicbrainz-serve...
9:00 AM
yyoung[m] has quit
9:21 AM
mayhem

moooin!
9:21 AM
BrainzGit

[musicbrainz-server] reosarevok merged pull request #2440 (master…MBS-12227): MBS-12227: Don’t include spammer editors in "valid" statistics https://github.com/metabrainz/musicbrainz-serve...
9:22 AM
mayhem

alastairp: ping me when you're ready to work on the metadata viewer stuff. turns out I have to solve the following problems today: canonical recordings (✅), canonical releases (easy), recording to canonical release (not too hard). Then I can add release focused stuff to the metadata cache.
9:23 AM
nothing like cranking out 3 useful datasets so I can provide freaking coverart to the viewer.
9:23 AM
cuanim has quit
9:27 AM
cuanim joined the channel
9:49 AM
monkey

Wee !
9:52 AM
To be fair we do need other recording info, and the viewer would not look right without cover art :)
9:53 AM
mayhem

not complaining -- these things need to get done and yes no cover art is a total deal breaker.
9:55 AM
milosh has quit
10:07 AM
texke has quit
10:15 AM
canonical releases data now computing.
10:24 AM
texke joined the channel
10:25 AM
alastairp

mayhem: morning. I'm here today
10:25 AM
alastairp opens metadata doc
10:26 AM
mayhem

moin!
10:27 AM
given where I am at and a sizable project that needs doing, it might be good for you to work on the auto update feature of the mb_metadata cache.
10:27 AM
alastairp

monkey: https://sebastienlorber.com/records-and-tuples-... this was an interesting read
10:27 AM
mayhem

I'm going to be busy generating the release data today and likely tomorrow.
10:27 AM
alastairp

mayhem: right - reading the replication packets?
10:27 AM
mayhem

yes, but not that low level. :)
10:28 AM
the process, as I envision it:
10:28 AM
1. When the data gets generated, save a last-checked timestamp.
10:28 AM
2. Then, in an hourly cron job, wake up and read the last-checked timestamp.
10:29 AM
3. Fetch MBIDs for artists, recordings and releases that have changed since the last run.
10:29 AM
4. Mark these MBIDs as dirty.
10:30 AM
5. Fetch all rows that are marked as dirty and re-fetch the data, unsetting the dirty flag.
10:30 AM
6. Save an updated last-checked timestamp.
10:30 AM
7. Go Back to 2.
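
The seven steps above can be sketched roughly as follows. This is a minimal in-memory illustration, not the actual ListenBrainz code: `CacheRow`, `MetadataCache`, `update_cache` and the two callables passed in are hypothetical names, and the real cache lives in a Postgres table rather than a dict. One pass of `update_cache` corresponds to steps 2 to 6; the hourly cron job supplies step 7.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable, Optional


@dataclass
class CacheRow:
    """One cached entry (hypothetical shape, for illustration only)."""
    mbid: str
    data: dict
    dirty: bool = False


@dataclass
class MetadataCache:
    rows: Dict[str, CacheRow] = field(default_factory=dict)
    # Step 1: set when the data is first generated.
    last_checked: datetime = datetime.min.replace(tzinfo=timezone.utc)


def update_cache(
    cache: MetadataCache,
    fetch_changed_mbids: Callable[[datetime], Iterable[str]],
    fetch_metadata: Callable[[str], dict],
    now: Optional[datetime] = None,
) -> int:
    """Run one update pass and return the number of rows refreshed."""
    # Take the new timestamp before querying, so changes that land while
    # this pass is running are picked up by the next pass.
    now = now or datetime.now(timezone.utc)

    # Steps 2-3: read the last-checked timestamp, fetch MBIDs changed since.
    changed = set(fetch_changed_mbids(cache.last_checked))

    # Step 4: mark the matching rows as dirty.
    for mbid in changed:
        row = cache.rows.get(mbid)
        if row is not None:
            row.dirty = True

    # Step 5: re-fetch every dirty row and unset the flag.
    refreshed = 0
    for row in cache.rows.values():
        if row.dirty:
            row.data = fetch_metadata(row.mbid)
            row.dirty = False
            refreshed += 1

    # Step 6: save the updated last-checked timestamp.
    cache.last_checked = now
    return refreshed
```

Keeping the dirty flag separate from the re-fetch means a pass that fails halfway leaves the remaining rows marked, so the next run can pick them up.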
10:32 AM
alastairp

so to confirm - you have code for generating the data, both from scratch and given a set of candidate mbids (mbids of what?)
10:32 AM
mayhem

yes.
10:32 AM
alastairp

is there a db with this generated data somewhere so I can look at its structure?
10:32 AM
mayhem

bono
10:32 AM
and gaga.
10:32 AM
mb_metadata_cache on gaga.
10:33 AM
it is not up to date, however. I've already created a new column in the table called "artist_mbids UUID[] NOT NULL".
10:34 AM
and soon there will be a "release_mbid UUID NOT NULL" column.
10:34 AM
alastairp

right - that's just what I was going to ask. I think I saw you mention this last night. each row will have a column containing the recording mbid, artist mbids, and release mbids that this row affects?
10:34 AM
mayhem

and there is a recording_mbid column. those are your three mbid columns: if one of those MBIDs changes, mark the row dirty.
10:34 AM
alastairp

so we don't have to walk through the MB database to find these relations
10:34 AM
perfect
10:35 AM
mayhem

I think I just answered your question, yes?
10:35 AM
I've not added the GIN index on artist_mbids yet. that still needs to be done.
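
To illustrate why an index on the array column matters for dirty-marking: a GIN index is essentially an inverted index from each array element to the rows containing it. The sketch below mimics that idea in plain Python. The row dicts and both function names are hypothetical; only the three column names (`recording_mbid`, `artist_mbids`, `release_mbid`) come from the discussion above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_artist_index(rows: List[dict]) -> Dict[str, Set[int]]:
    """Map each artist MBID to the set of row positions that contain it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for row_id, row in enumerate(rows):
        for artist_mbid in row["artist_mbids"]:
            index[artist_mbid].add(row_id)
    return index


def rows_to_mark_dirty(
    rows: List[dict],
    artist_index: Dict[str, Set[int]],
    changed_mbids: Iterable[str],
) -> Set[int]:
    """Return positions of rows touched by any changed MBID."""
    changed = set(changed_mbids)
    dirty: Set[int] = set()

    # Array column: one lookup per changed MBID instead of scanning every
    # row's artist list, which is the job a GIN index would do in Postgres.
    for mbid in changed:
        dirty |= artist_index.get(mbid, set())

    # Scalar columns: a plain equality check (a regular index in the database).
    for row_id, row in enumerate(rows):
        if row["recording_mbid"] in changed or row["release_mbid"] in changed:
            dirty.add(row_id)
    return dirty
```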
10:35 AM
alastairp

yeah, that was the last bit that I wasn't sure about
10:36 AM
mayhem

I'm now also making changes to the canonical-recordings branch in order to calculate the precursor data sets for the release stuff. not yet sure how best to handle those branches....
10:37 AM
alastairp

the metadata mapping is the mb-metadata-cache branch?
10:37 AM
mayhem

yes
10:37 AM
canonical recordings and canonical releases are being calculated right now.
10:47 AM
BrainzGit

[musicbrainz-server] 14reosarevok opened pull request #2465 (03master…MBS-12275): MBS-12275 / MBS-12276: YouTube playlist cleanup / validation improvements https://github.com/metabrainz/musicbrainz-serve...
10:59 AM
Freso

My brain cannot retain attention in anything for more than 30 seconds today. Going to go for a walk and not look at a screen for a while or try to force my brain. I’ll post notes from last night’s meeting tomorrow. :\
11:01 AM
CatQuest

heeey guys, is everything fish?
11:01 AM
fish.
11:02 AM
!m freso, take care of yourself first.
11:02 AM
BrainzBot

You're doing good work, freso, take care of yourself first.!
11:03 AM
CatQuest

mayhem: not that I've put it in the doc, because I don't even think I could access it if I tried (though I might attempt later if there is a link).
11:03 AM
but Internet Archive and/or Wikidata are my open source <3
11:06 AM
can I also say I love this idea of giving back? great 10++
11:06 AM
👏
11:06 AM
ah i can access the doc, yay
11:17 AM
yvanzo

reosarevok: Collections are not indexed at all; See https://github.com/metabrainz/mbsssss
11:46 AM
atj

personally, I would be looking at software libraries which you rely on / have used for a long period of time
11:47 AM
those are often maintained by one person with little recognition
11:47 AM
mayhem

atj: agreed. once you've picked yours, enter them into the spreadsheet.
11:47 AM
CatQuest

oh, ok
11:47 AM
alastairp

https://xkcd.com/2347/ etc
11:47 AM
CatQuest

!recall dependency
11:47 AM
BrainzBot

https://xkcd.com/2347/
11:48 AM
mayhem

atj: my first suggestions are ones that make a huge difference to us (spark, typesense), but I guess apache doesn't really need our money so much.
11:48 AM
maybe I'll pick something else.
11:48 AM
hmm. should that XKCD be our inspiration? ideally we only support these random thanklessly maintained projects?
11:48 AM
atj

I'm not best placed to know which Python or Perl libraries you rely on
11:49 AM
mayhem

atj: that isn't the idea either. what sysadmin tools are worth supporting?
11:49 AM
let everyone speak to their corner of open source.
11:49 AM
speak of?
11:49 AM
asymmentric joined the channel
11:50 AM
alastairp

I'm currently thinking about the difference between dependencies that we use in our projects, and the tooling that we use to make it
11:50 AM
mayhem

does there need to be a distinction?
11:51 AM
atj

I'll have a think. I was just conveying my thoughts on the approach that I think would make the most difference.
11:51 AM
alastairp

not in terms of actually making a donation, but I initially thought about things like flask, but then I realised that I use something like iterm or tmux just as much if not more
11:51 AM
mayhem

I'm fully open to feedback on how to do this better. but I really like the focus on the small projects.
11:52 AM
alastairp: yes, when you start looking at it, there are gobs and gobs of projects we use and never think of.
12:03 PM
lucifer

mayhem: afaik, the money directly donated to apache doesn't go to maintainers of various projects. so might be better to directly donate to the maintainers of projects (if the info is available; usually it is, but probably not always).
12:07 PM
mayhem

lucifer: yes. US based non-profits cannot receive donations that are earmarked for a specific purpose.
12:07 PM
so, yeah, better not apache.
12:10 PM
lucifer

ah right, i had forgotten that.
12:17 PM
asymmentric has quit
12:29 PM
yyoung[m] joined the channel
12:30 PM
mayhem: mb-metadata-cache is the branch for this sprint?
12:30 PM
Ansh

lucifer: To show the releases for labels, events for places, I need to add some functions. So should I add them in CB or in BU ?
12:31 PM
lucifer

Ansh, whats the purpose of those functions?
12:31 PM
alastairp

Ansh: this is to retrieve data from the musicbrainz database? I think it should go in the BU methods
12:31 PM
lucifer

mayhem: or better question, which branch has the mb metadata endpoints?
12:32 PM
mayhem

mb-metadata-cache
12:32 PM
Ansh

alastairp: Yes I need to get data from mb database.
12:32 PM
lucifer

👍
12:32 PM
mayhem

lucifer: but I am not sure if it makes sense to put the other metadata work into that.
12:32 PM
alastairp

Ansh: see for example, the musicbrainz API has a parameter to get releases for labels: https://musicbrainz.org/doc/MusicBrainz_API#Sub...
12:32 PM
mayhem

I think the UI might best be in a different branch. backend in this one? what do you think?
12:33 PM
alastairp

Ansh: such as this: https://musicbrainz.org/ws/2/label/1391bdc7-a22...
12:33 PM
lucifer

i am adding the mbid lookup endpoint so probably a different branch targeting this one so that it can be reviewed separately but i can use the same blueprint etc.
12:33 PM
alastairp

so it makes sense to modify https://github.com/metabrainz/brainzutils-pytho... to allow an include parameter for 'releases'
12:33 PM
lucifer

ui in a different branch based on this one makes sense.
12:33 PM
mayhem

lucifer: sure, just branch off my branch then.
12:34 PM
lucifer

👍
12:36 PM
Ansh

alastairp: Got it. Also, after adding it to BU, can it be directly used in CB? Because the versions we are using are different.
12:36 PM
alastairp

Ansh: look at how CB loads the BU dependency: https://github.com/metabrainz/critiquebrainz/bl...
12:37 PM
you can temporarily modify this to point directly to your branch and then rebuild CB in order to install it
12:38 PM
Ansh

Understood
12:41 PM
lucifer

mayhem: the listenbrainz.db.metadata seems to be missing in that branch. forgot to commit?
12:43 PM
mayhem

https://github.com/metabrainz/listenbrainz-serv...
12:43 PM
it's there.
12:43 PM
lucifer

huh, it's missing here locally. probably an issue on my end.
12:45 PM
ah, i see it now. sorry for the false alarm 😓
1:54 PM
mayhem: oh, i think we forgot to discuss the msid situation again. the artist name / recording name lookup endpoint won't have msids, so how do we record the results in the mapping tables? should we generate an msid based on artist name and recording name on the fly?
1:56 PM
the potential downside is that the usual listens endpoint considers some extra fields while generating msids, so there can be dupes. but the mapper is great at handling those, so i don't think it should be an issue.
1:56 PM
alternatively, we look up each time and do not store the result of a match.
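
As a rough illustration of the "generate an msid on the fly" option and its dupe risk: the sketch below derives a deterministic UUID from artist name and recording name. It is not how ListenBrainz actually computes MSIDs. The namespace, the normalisation and the `make_msid` helper are all assumptions made for the example.

```python
import uuid
from typing import Mapping, Optional

# Hypothetical namespace for this sketch; any fixed UUID would do.
LOOKUP_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "msid-sketch.example")


def _normalise(value: str) -> str:
    """Case-fold and collapse whitespace so trivial variants map together."""
    return " ".join(value.casefold().split())


def make_msid(
    artist_name: str,
    recording_name: str,
    extra: Optional[Mapping[str, str]] = None,
) -> uuid.UUID:
    """Derive a deterministic version-5 UUID from the given fields."""
    parts = {"artist_name": artist_name, "recording_name": recording_name}
    parts.update(extra or {})
    # Sort the fields so the result does not depend on argument order.
    key = "\x1f".join(f"{k}={_normalise(v)}" for k, v in sorted(parts.items()))
    return uuid.uuid5(LOOKUP_NAMESPACE, key)
```

With this scheme, `make_msid("Portishead", "Glory Box")` and the same call with an extra `release_name` field produce different identifiers for what is arguably the same recording, which is the kind of duplicate described above.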
1:58 PM
mayhem

my plan was not to store the results.
1:58 PM
because the existing mapping may do a better job.
1:58 PM
lucifer

i see, makes sense.
1:58 PM
mayhem

and if we're really happy with the lighter lookup endpoint, we can add the saving later.
1:59 PM
lucifer

yeah it may also help that once the now playing stuff turns into actual listens, some enhancements in the mapper can have it consider temporal relations, detect albums, and so on.
1:59 PM
👍
1:59 PM
mayhem

exactly.
2:08 PM
alastairp

lucifer: hi, remind me - I think that we have a constraint on some postgres array fields to ensure that they are the correct shape, is that right?
2:09 PM
mayhem

we do.
2:10 PM
or we did. looking.
2:11 PM
https://github.com/metabrainz/listenbrainz-serv...
2:11 PM
alastairp

that's the one I was thinking of, thanks.
2:11 PM
mayhem: do you see a value in adding such a constraint to artist_mbids?
2:12 PM
(in the metadata table)
2:12 PM
mayhem

it can't hurt, but there will be such limited access to the table (from just a few bits of code) that it might not really be needed.
2:13 PM
alastairp

yeah, right. I thought the same thing
2:14 PM
mayhem

ok, bono now has the `mapping.recording_canonical_release` table
2:15 PM
which could also be used to add cover art to the playlists in LB.
2:22 PM
monkey

I agree with you atj, I'm also looking for OSS projects that make my life easier AND don't have a lot of visibility or contributions already
2:28 PM
agatzk has quit
2:28 PM
agatzk joined the channel