alastairp: ping me when you're ready to work on the metadata viewer stuff. turns out I have to solve the following problems today: canonical recordings (✅), canonical releases (easy), recording to canonical release (not too hard). Then I can add release-focused stuff to the metadata cache.
nothing like cranking out 3 useful datasets so I can provide freaking coverart to the viewer.
cuanim has quit
cuanim joined the channel
monkey
Wee !
To be fair we do need other recording info, and the viewer would not look right without cover art :)
mayhem
not complaining -- these things need to get done and yes no cover art is a total deal breaker.
milosh has quit
texke has quit
canonical releases data now computing.
texke joined the channel
alastairp
mayhem: morning. I'm here today
alastairp opens metadata doc
mayhem
moin!
given where I am at and a sizable project that needs doing, it might be good for you to work on the auto update feature of the mb_metadata cache.
I'm going to be busy generating the release data today and likely tomorrow.
alastairp
mayhem: right - reading the replication packets?
mayhem
yes, but not that low level. :)
the process, as I envision it:
1. When the data gets generated, save a last-checked timestamp.
2. Then for an hourly cron job, wake up, read last checked timestamp.
3. Fetch MBIDs for artists, recordings and releases that have changed since the last run.
4. Mark these MBIDs as dirty.
5. Fetch all rows that are marked as dirty and re-fetch the data, unsetting the dirty flag.
6. Save an updated last-checked timestamp.
7. Go Back to 2.
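A minimal sketch of steps 2 to 6 above, using in-memory rows instead of the real table. The `CacheRow` class, the `dirty` field, and the two callables `fetch_changed_mbids` and `fetch_metadata` are hypothetical stand-ins for whatever the actual code does; only the three MBID columns come from the discussion.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable


@dataclass
class CacheRow:
    """One cached row with the three MBID columns (layout is an assumption)."""
    recording_mbid: str
    artist_mbids: list[str]
    release_mbid: str
    data: dict
    dirty: bool = False


def mark_dirty(rows: list[CacheRow], changed_mbids: Iterable[str]) -> int:
    """Step 4: flag every row that references one of the changed MBIDs."""
    changed = set(changed_mbids)
    marked = 0
    for row in rows:
        if (row.recording_mbid in changed
                or row.release_mbid in changed
                or changed.intersection(row.artist_mbids)):
            row.dirty = True
            marked += 1
    return marked


def refresh_dirty(rows: list[CacheRow], fetch_metadata) -> int:
    """Step 5: re-fetch the data for dirty rows and unset the flag."""
    refreshed = 0
    for row in rows:
        if row.dirty:
            row.data = fetch_metadata(row.recording_mbid)
            row.dirty = False
            refreshed += 1
    return refreshed


def run_update(rows, last_checked, fetch_changed_mbids, fetch_metadata, now=None):
    """Steps 2-6 for one cron run; returns the new last-checked timestamp."""
    # Take the new timestamp before querying, so that edits made while this
    # run is in progress are picked up by the next run instead of skipped.
    now = now or datetime.now(timezone.utc)
    mark_dirty(rows, fetch_changed_mbids(last_checked))
    refresh_dirty(rows, fetch_metadata)
    return now
```

The hourly cron job would call `run_update` once and persist the returned timestamp for the next run.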
alastairp
so to confirm - you have code for generating the data, both from scratch and given a set of candidate mbids (mbids of what?)
mayhem
yes.
alastairp
is there a db with this generated data somewhere so I can look at its structure?
mayhem
bono
and gaga.
mb_metadata_cache on gaga.
it is not up to date, however. I've already created a new column in the table called "artist_mbids UUID[] NOT NULL".
and soon there will be a "release_mbid UUID NOT NULL" column.
alastairp
right - that's just what I was going to ask. I think I saw you mention this last night. each row will have a column containing the recording mbid, artist mbids, and release mbids that this row affects?
mayhem
and there is a recording_mbid column. those are your three mbid columns that if one of those MBIDs changes, mark the row dirty.
alastairp
so we don't have to walk through the MB database to find these relations
perfect
mayhem
I think I just answered your question, yes?
I've not added the GIN index on artist_mbids yet. that still needs to be done.
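A rough sketch of what the index and the "mark dirty" query could look like, written as SQL strings in Python. It assumes `mb_metadata_cache` is the table name and that the flag is a boolean column called `dirty`; the index name is made up too. The `%(mbids)s` placeholder is the pyformat style accepted by DB-API drivers such as psycopg2, which adapts a Python list to a Postgres array.

```python
# Assumed names: table mb_metadata_cache, boolean column "dirty", index name.
CREATE_GIN_INDEX = """
CREATE INDEX IF NOT EXISTS mb_metadata_cache_artist_mbids_idx
    ON mb_metadata_cache USING GIN (artist_mbids)
"""

# "&&" is the array overlap operator, which the default GIN operator class
# for arrays supports. The casts are needed because a list of strings would
# otherwise be sent as a text array.
MARK_DIRTY = """
UPDATE mb_metadata_cache
   SET dirty = TRUE
 WHERE recording_mbid = ANY(%(mbids)s::uuid[])
    OR release_mbid = ANY(%(mbids)s::uuid[])
    OR artist_mbids && %(mbids)s::uuid[]
"""


def build_mark_dirty(changed_mbids):
    """Return (sql, params) ready to pass to cursor.execute()."""
    mbids = sorted({str(mbid) for mbid in changed_mbids})
    return MARK_DIRTY, {"mbids": mbids}
```

With a psycopg2 cursor this would be used as `cursor.execute(*build_mark_dirty(changed))`.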
alastairp
yeah, that was the last bit that I wasn't sure about
mayhem
I'm now also making changes to the canonical-recordings branch in order to calculate the precursor data sets for the release stuff. not sure how best to handle those branches yet....
alastairp
the metadata mapping is the mb-metadata-cache branch?
mayhem
yes
canonical recordings and canonical releases are being calculated right now.
My brain cannot retain attention in anything for more than 30 seconds today. Going to go for a walk and not look at a screen for a while or try to force my brain. I’ll post notes from last night’s meeting tomorrow. :\
CatQuest
heeey guys, is everything fish?
fish.
!m freso, take care of yourself first.
BrainzBot
You're doing good work, freso, take care of yourself first.!
CatQuest
mayhem: not that I've put it in the doc, because I don't even think I could access it if I tried (though I might attempt later if there is a link).
but Internet Archive and/or Wikidata are my open source <3
can I also say I love this idea of giving back? great 10++
atj: my first suggestions are ones that make a huge difference to us (spark, typesense), but I guess apache doesn't really need our money so much.
maybe I'll pick something else.
hmm. should that XKCD be our inspiration? ideally we only support these random thanklessly maintained projects?
atj
I'm not best placed to know which Python or Perl libraries you rely on
mayhem
atj: that isn't the idea either. which sysadmin tools are worth supporting?
let everyone speak to their corner of open source.
speak of?
asymmentric joined the channel
alastairp
I'm currently thinking about the difference between dependencies that we use in our projects, and the tooling that we use to make them
mayhem
does there need to be a distinction?
atj
I'll have a think. I was just conveying my thoughts on the approach that I think would make the most difference.
alastairp
not in terms of actually making a donation, but I initially thought about things like flask, but then I realised that I use something like iterm or tmux just as much if not more
mayhem
I'm fully open to feedback on how to do this better. but I really like the focus on the small projects.
alastairp: yes, when you start looking at it, there are gobs and gobs of projects we use and never think of.
lucifer
mayhem: afaik, the money directly donated to apache doesn't go to maintainers of various projects. so might be better to directly donate to the maintainers of projects (the info is usually available, but probably not always).
mayhem
lucifer: yes. US based non-profits cannot receive donations that are earmarked for a specific purpose.
so, yeah, better not apache.
lucifer
ah right, i had forgotten that.
asymmentric has quit
yyoung[m] joined the channel
mayhem: mb-metadata-cache is the branch for this sprint?
Ansh
lucifer: To show the releases for labels, events for places, I need to add some functions. So should I add them in CB or in BU ?
lucifer
Ansh, whats the purpose of those functions?
alastairp
Ansh: this is to retrieve data from the musicbrainz database? I think it should go in the BU methods
lucifer
mayhem: or better question, which branch has the mb metadata endpoints?
mayhem
mb-metadata-cache
Ansh
alastairp: Yes I need to get data from mb database.
lucifer
👍
mayhem
lucifer: but I am not sure if it makes sense to put the other metadata work into that.
i am adding the mbid lookup endpoint so probably a different branch targeting this one so that it can be reviewed separately but i can use the same blueprint etc.
huh its missing here locally. probably issue on my end.
ah, i see it now. sorry for the false alarm 😓
mayhem: oh, i think we forgot to discuss the msid situation again. artist name, recording name lookup endpoint won't have msids so how do we record the results in the mapping tables. should we generate a msid based on artist name and recording name on the fly?
the potential downside is that the usual listens endpoint considers some extra fields while generating msids so there can be dupes. but the mapper is great at handling those so don't think it should be an issue.
alternatively, we lookup each time and do not store the result of a match.
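For the first option above (generating an msid on the fly), a minimal sketch of deriving a deterministic UUID from just the artist name and recording name. This is not the existing msid algorithm, which per the discussion considers extra fields; the namespace, the normalisation, and the function name are all assumptions for illustration.

```python
import json
import uuid

# Hypothetical namespace, made up for this sketch only.
SKETCH_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "msid-sketch.example")


def normalise(text: str) -> str:
    """Case-fold and collapse whitespace so trivial variants match."""
    return " ".join(text.casefold().split())


def msid_from_names(artist_name: str, recording_name: str) -> uuid.UUID:
    """Derive a stable UUID from the two names (assumed scheme)."""
    key = json.dumps(
        {"artist": normalise(artist_name), "recording": normalise(recording_name)},
        sort_keys=True,
    )
    return uuid.uuid5(SKETCH_NAMESPACE, key)
```

Because only two fields go into the key, an id produced this way would generally differ from one produced by the usual listens path for the same track, which is the duplicate concern raised above.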
mayhem
my plan was not to store the results.
because the existing mapping may do a better job.
lucifer
i see makes sense.
mayhem
and if we're really happy with the lighter lookup endpoint, we can add the saving later.
lucifer
yeah it may also help that once the now playing stuff turns into actual listens, some enhancements in the mapper can have it consider temporal relations, detect albums, and so on.
👍
mayhem
exactly.
alastairp
lucifer: hi, remind me - I think that we have a constraint on some postgres array fields to ensure that they are the correct shape, is that right?