the first one was registered at the same time as musicbrainz.org -- back then there was no google and search was just getting started, so typo domains were a real thing.
also, there was a thought that people might not find a .org (kinda not well known then) and that a .com should be reserved to redirect to the .org
kinda outdated concepts.
I could see keeping bookbrainz.com. any thoughts, monkey?
back then, maybe 15 years ago, we talked about possibly doing foodbrainz and tvbrainz.
foodbrainz has been done, but I don't recall the name. Freso, do you?
Freso
I think FilmBrainz is more likely to be a thing sometime than TVBrainz, so not sure TVBrainz is needed. There is already OpenFoodFacts and I haven’t seen many legit community requests for FoodBrainz, so I don’t think that’s going to happen ever.
mayhem
tvbrainz? What's TV?
Freso
Yes, OpenFoodFacts. :p
zas
lucifer: good point, it was one of my concerns. that said, there are plenty of domains that can be used to phish
lucifer
makes sense
mayhem
I'm open to keeping musicbrains.org -- for phishing concerns.
we also have moviebrainz.org, which is still a possibility.
Freso
Yeah.
mayhem
ok, motion carried, I'll make that happen.
akshaaatt
I think our domains are unique as is. Would someone really try to snap up the domains if we don't have them?
mayhem
akshaaatt: they always do.
CatQuest
gamebrainz
Freso
akshaaatt: 🤷
akshaaatt
Oh!
mayhem
ok, onward to the next topic.
Freso
mayhem: Supporting open source
mayhem
we'll soon be getting paid for the ODI participation.
TOPIC: MetaBrainz Community and Development channel | MusicBrainz non-development: #musicbrainz | BookBrainz: #bookbrainz | Channel is logged; see https://musicbrainz.org/doc/IRC for details | Agenda: Supporting open source (ruaok), Next meeting (Freso)
and I've decided I want to help open source in general a bit, so I want us to lead by example.
akshaaatt
++
monkey
+1000
mayhem
I am dedicating a budget of $6000 annually to this cause -- at least to start.
the first year is being paid out by Microsoft/ODI.
I'd like each team member to identify 2 or 3 open source projects that they think should be supported.
akshaaatt
Sounds superb!
mayhem
I will create a spreadsheet for this in a minute.
monkey
Yeah, I love the idea!
mayhem
propose a project and propose an annual support payment. add a link to where to support the project.
reosarevok
Neat
mayhem
then in maybe 2 weeks we can have another meeting where we talk about the chosen projects and allocate funds, ok?
monkey
👍
mayhem
then, I want to create a page on meb.org that states this.
akshaaatt
Can we do something like an internal GSoC, where we let students/people submit proposals and then pay and mentor them to work on the projects? This could happen during November, December and January, when people are looking for internships anyway.
Freso
Sounds good.
mayhem
I'm going to state what % of our income we're dedicating to OSS and challenge other companies to do the same or better.
let's make some noise about this, because imagine if someone like Google did this as well?
OSS developers are burning out, so let's see if we can help a little.
fin. back to freso.
Freso
Freso: Next meeting
lucifer
sounds great. awesome!
Freso
This is just a PSA that Europe switches to DST on Sunday, so for Indians and USians and others that do not follow Europe’s DST schedule, note that next week’s meeting will be an hour… uh, later? for you.
lucifer
earlier
Freso
Earlier. Thanks lucifer. :p
alastairp
an hour different
Freso
fin.
And that wraps up today’s meeting too.
Thank you everyone for your time! Stay safe out there. :)
</BANG>
akshaaatt
Thank you!
monkey
Cheers !
lucifer
mayhem: you may be able to speed up the query a bit: change the table to unlogged before writing the data and then back to logged after writing. the downside is that if something crashes, the table loses its data, but you'd run the query again in that case anyway
TOPIC: MetaBrainz Community and Development channel | MusicBrainz non-development: #musicbrainz | BookBrainz: #bookbrainz | Channel is logged; see https://musicbrainz.org/doc/IRC for details | Agenda: Reviews
*is if something crashes
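A minimal sketch of lucifer's unlogged-table suggestion, assuming a DB-API cursor such as psycopg2's. The function name and `RecordingCursor` (a stand-in that only records calls, so the sketch runs without a database) are hypothetical. Both `ALTER TABLE ... SET UNLOGGED` and `SET LOGGED` rewrite the table, so the trick pays off mainly when the table starts out empty; and, as mayhem points out right after, it only helps if the time is spent writing rather than in the query itself.

```python
from typing import Any, Iterable, List, Sequence, Tuple


def load_via_unlogged(cursor: Any, table: str, insert_sql: str,
                      rows: Iterable[Sequence[Any]]) -> None:
    """Bulk-load rows into `table` with write-ahead logging switched off.

    `table` is interpolated straight into the SQL, so it must be a trusted,
    hard-coded name, never user input.
    """
    # Unlogged tables skip the write-ahead log, which makes the inserts cheaper.
    # Trade-off: PostgreSQL truncates an unlogged table after a crash.
    cursor.execute(f"ALTER TABLE {table} SET UNLOGGED")
    cursor.executemany(insert_sql, list(rows))
    # Switch back so the finished table is crash-safe again.
    cursor.execute(f"ALTER TABLE {table} SET LOGGED")


class RecordingCursor:
    """Hypothetical stand-in for a DB-API cursor that just records calls."""

    def __init__(self) -> None:
        self.calls: List[Tuple[Any, ...]] = []

    def execute(self, sql: str, params: Any = None) -> None:
        self.calls.append(("execute", sql))

    def executemany(self, sql: str, seq: List[Sequence[Any]]) -> None:
        self.calls.append(("executemany", sql, len(seq)))
```

With a real connection, pass `conn.cursor()` instead of `RecordingCursor()` and commit afterwards.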
mayhem
the query time is all in the execute() -- the writing is relatively fast.
lucifer
ah 👍
alastairp
mayhem: is the query significantly faster when you just apply it to a few rows (i.e. when a replication packet comes in?)
mayhem
it is instantaneous when I request only one row, so I would expect so.
yeah, so I guess it doesn't matter how long it takes (within reason...) if we're only going to run it once
alastairp
PrathameshG: hi, I'm here. not sure how late it is for you or if you want to talk
what have you managed to do?
lucifer
oh, but how do you apply it to only some rows? look up edits and figure out the entities that need to be refreshed?
Sophist-UK joined the channel
alastairp
lucifer: the plan is to consume replication packets, which say which rows have changed
PrathameshG
alastairp: Hey there. Don't worry, I'll be online for another hour.
lucifer
i see, makes sense.
mayhem
lucifer: this is why we are so damned anal about the created/last_updated columns. so that downstream users like this can clearly understand what changed.
and I have a GIN index on an ARRAY column that lists artist_mbids, so I can quickly mark rows as dirty.
and then one query to select the dirty rows in a CTE with the rest of the query.
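GIN is an inverted index: for an array column it stores, for each element, which rows contain it. That is what makes "which rows mention any of these artists?" cheap; in SQL that question is the array overlap operator `&&` (e.g. `WHERE artist_mbids && %s`), which GIN's default array operator class supports. A minimal in-memory sketch of the same idea, where the row ids, artist MBIDs and the "changed artists" input (standing in for whatever would be pulled out of a replication packet) are all hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def build_inverted_index(rows: Dict[int, List[str]]) -> Dict[str, Set[int]]:
    """Map each artist MBID to the ids of the rows whose array contains it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for row_id, artist_mbids in rows.items():
        for mbid in artist_mbids:
            index[mbid].add(row_id)
    return dict(index)


def find_dirty_rows(index: Dict[str, Set[int]],
                    changed_artist_mbids: Iterable[str]) -> Set[int]:
    """Ids of rows sharing at least one artist with the changed set (an overlap test)."""
    dirty: Set[int] = set()
    for mbid in changed_artist_mbids:
        # Artists that appear in no row simply contribute nothing.
        dirty |= index.get(mbid, set())
    return dirty
```

Only the dirty ids then need to be fed back into the expensive query, which is the role the CTE plays above.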
PrathameshG
alastairp: I was on a vacation and got stung by a bee on my hand, so sadly I wasn't able to do much 🤦‍♂️
Still, so far I've managed to get used to my environment on bono, and I've tried loading and testing some of the datasets.
monkey
That a buzzkill…
That's*
alastairp
monkey: buzzz off
monkey
Yes honey.
alastairp
PrathameshG: oops, hope everything is ok. but don't worry about it - we're happy for you to just play around and look at the data. no pressure to do anything
lucifer
oh, i forgot to mention earlier (~2 hrs ago): LB prod has been updated
alastairp
PrathameshG: so it sounds like you now know how to do things, but you're not sure what to do?
PrathameshG
No no, it's completely alright. I'd be more than happy to take responsibility, take on targets and try to complete them
Yes, that's exactly what's happening right now. I don't have a clue what I have to do
alastairp
so first of all: last week you were talking about a bunch of interesting ideas you had for looking up MLHD data in last.fm and other sources
PrathameshG
Yes that's right
alastairp
just because you have an account on bono, don't think that you only have to do what we suggest; feel free to use the resources for your own project too
that being said, let me explain to you what we wanted to do with the MLHD
PrathameshG
Thanks a lot, I was thinking of running some network intensive stuff on it.
Please go ahead
alastairp
some history: experience has shown us that a lot of data in last.fm is wrong. about 10 or so years ago (someone correct me if I'm wrong), musicbrainz made a big change to its database structure, and we introduced a number of new concepts. it seems that maybe last.fm were late to identify this change
the result of this is that sometimes when they give you a "recording mbid", it might not actually be one. it might be a track mbid (a "track" is a recording on a specific release: e.g. a single recording that appears on 2 releases has 2 tracks)
so one thing that we're not sure about, is if all of the ids in the data files are actually correct and if they actually exist in musicbrainz
the first thing we wanted to do was look through the data files, cross-reference them with the database, and see which things exist, which things have been deleted for some reason, and which things are in the database but with the wrong id
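A rough sketch of the cross-referencing step, assuming the relevant MusicBrainz data has already been loaded into Python mappings: the set of current recording MBIDs, old MBIDs of merged recordings (MusicBrainz keeps these in `recording_gid_redirect`), and track MBID → recording MBID (from the `track` table). The enum, function and mapping names are hypothetical. An MBID that matches nothing can't be classified further here: it may have been deleted or may never have existed.

```python
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class MbidStatus(Enum):
    RECORDING = "recording"  # a current recording MBID
    REDIRECT = "redirect"    # old MBID of a recording that was merged into another
    TRACK = "track"          # actually a track MBID, not a recording MBID
    UNKNOWN = "unknown"      # matches nothing: deleted, or never existed


def classify_mbid(
    mbid: str,
    recording_gids: Set[str],
    recording_redirects: Dict[str, str],
    track_to_recording: Dict[str, str],
) -> Tuple[MbidStatus, Optional[str]]:
    """Classify one MBID from the data files and return the recording MBID it maps to."""
    mbid = mbid.strip().lower()  # compare UUIDs in their canonical lowercase form
    if mbid in recording_gids:
        return MbidStatus.RECORDING, mbid
    if mbid in recording_redirects:
        return MbidStatus.REDIRECT, recording_redirects[mbid]
    if mbid in track_to_recording:
        return MbidStatus.TRACK, track_to_recording[mbid]
    return MbidStatus.UNKNOWN, None
```

Running this over every MBID in a file and tallying the statuses (e.g. with `collections.Counter`) gives the "exists / deleted / wrong id" breakdown described above.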
PrathameshG
Alright, so we have to revalidate all the MBIDs
Sounds very doable
So I'll just start by writing scripts to evaluate the mbids and cross check them with the musicbrainz db.
I'll keep giving you updates for the same :)
And of course, if you've anything to add on it, please go ahead and mention it
^ Found the snippet above that mayhem posted in the last conversation. Will try to implement some of it along the way 👌
alastairp
feel free to share with us the code that you write, there are many tricks that one can apply to make something like this comparison fast
PrathameshG
Yes absolutely
alastairp
yes right - that's a broader outline of what we hope to do
PrathameshG
Firstly, I was wondering: would I have to hit the database for each row?
That's gonna be really intense for the database
alastairp
yeah, exactly ;)
PrathameshG
Alrighty, I'll get started with 1 dataset first then :))
Will update you soon.
alastairp
databases are really good at doing things in bulk; my first instinct would be to process at least 1 file at a time
lucifer
for checking whether an mbid is a track mbid or a recording mbid, you could probably get away with just reading the index, which should be fast enough. also do the lookups in batches.
alastairp
for example, I'm just looking through some of the sample 00 files - some have 150k rows, some have 40k rows. postgresql will have no problem if you pass in a query with 100,000 parameters
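A minimal sketch of the batched lookup alastairp and lucifer describe, assuming a psycopg2-style cursor and the MusicBrainz schema, where `recording.gid` is a `uuid` column. psycopg2 passes a Python list as a PostgreSQL array, so one `= ANY(...)` query covers a whole batch instead of one round trip per row, and with the index on `gid` PostgreSQL can often answer it from the index alone. `BATCH_SIZE`, `LOOKUP_SQL` and the helper names are hypothetical, and `FakeCursor` only exists so the sketch runs without a database. Because the cast to `uuid[]` fails the whole batch if any string isn't a valid UUID, malformed MBIDs should be filtered out first.

```python
from typing import Any, Iterable, Iterator, List, Set

# Hypothetical batch size; PostgreSQL copes with large arrays, so tune to taste.
BATCH_SIZE = 100_000

# Assumes the MusicBrainz schema, where recording.gid is a uuid column.
LOOKUP_SQL = "SELECT gid::text FROM recording WHERE gid = ANY(%s::uuid[])"


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield lists of at most `size` items."""
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def existing_recording_mbids(cursor: Any, mbids: Iterable[str],
                             size: int = BATCH_SIZE) -> Set[str]:
    """Return the subset of `mbids` that exist as recording MBIDs, one query per batch."""
    found: Set[str] = set()
    for batch in batched(mbids, size):
        cursor.execute(LOOKUP_SQL, (batch,))  # the list is sent as a single array parameter
        found.update(row[0] for row in cursor.fetchall())
    return found


class FakeCursor:
    """Hypothetical stand-in for a psycopg2 cursor, for trying the code without a database."""

    def __init__(self, known: Set[str]) -> None:
        self.known = known
        self.queries = 0
        self._rows: List[tuple] = []

    def execute(self, sql: str, params: tuple) -> None:
        self.queries += 1
        self._rows = [(m,) for m in params[0] if m in self.known]

    def fetchall(self) -> List[tuple]:
        return self._rows
```

`set(mbids) - existing_recording_mbids(cursor, mbids)` then gives the MBIDs that need a closer look (track MBIDs, redirects, or deleted entities).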
PrathameshG
Got it 👍 lucifer alastairp
alastairp
however I wonder if there are some even easier ways to do this, I'm just trying something - one moment
PrathameshG
yep
Also, just to confirm: our primary concern is the recording MBID, right?
alastairp
yes, right - because we already have these relationships in the musicbrainz database, our plan is to ignore the artist and release ids and re-compute them
PrathameshG
Sounds good. So I'll just drop the artist/release ID columns during analysis. Will speed up the process a bit
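A minimal sketch of reading only the columns that matter, assuming each MLHD line is tab-separated as timestamp, artist MBIDs, release MBID, recording MBID (this layout is an assumption to check against the actual files). The function and constant names are hypothetical. Listens without a recording MBID are skipped, since there is nothing to validate for them.

```python
import csv
from typing import Iterator, TextIO, Tuple

# Assumed MLHD column layout (verify against the real files):
# timestamp <TAB> artist MBIDs <TAB> release MBID <TAB> recording MBID
TIMESTAMP_COL = 0
RECORDING_COL = 3


def read_recording_listens(f: TextIO) -> Iterator[Tuple[int, str]]:
    """Yield (timestamp, recording MBID) pairs, dropping the artist and release columns."""
    reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) <= RECORDING_COL:
            continue  # short or malformed line
        recording_mbid = row[RECORDING_COL].strip()
        if not recording_mbid:
            continue  # this listen has no recording MBID
        yield int(row[TIMESTAMP_COL]), recording_mbid
```

If the files are gzip-compressed, open them with `gzip.open(path, "rt")` and pass the resulting text stream in.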