the first one was registered at the same time as musicbrainz.org -- back then there was no google and search was just starting, so typo domains were a real thing.
2022-03-21 08039, 2022
mayhem
also, there was a thought that people might not find a .org (kinda not well known then) and that a .com should be reserved to redirect to the .org
2022-03-21 08054, 2022
mayhem
kinda outdated concepts.
2022-03-21 08006, 2022
mayhem
I could see keeping bookbrainz.com. any thoughts, monkey?
back then, maybe 15 years ago, we talked about possibly doing foodbrainz and tvbrainz.
2022-03-21 08053, 2022
mayhem
foodbrainz has been done, but I don't recall the name. Freso, do you?
2022-03-21 08058, 2022
Freso
I think FilmBrainz is more likely to be a thing sometime than TVBrainz, so not sure TVBrainz is needed. There is already OpenFoodFacts and I haven’t seen much legit community requests for FoodBrainz, so I don’t think that’s going to happen ever.
2022-03-21 08001, 2022
mayhem
tvbrainz? What's TV?
2022-03-21 08002, 2022
Freso
Yes, OpenFoodFacts. :p
2022-03-21 08013, 2022
zas
lucifer: good point, it was one of my concerns; that said, there are plenty of domains that can be used to phish
2022-03-21 08024, 2022
lucifer
makes sense
2022-03-21 08039, 2022
mayhem
I'm open to keeping musicbrains.org -- because of phishing concerns.
we also have moviebrainz.org which is still a possibility.
2022-03-21 08056, 2022
Freso
Yeah.
2022-03-21 08059, 2022
mayhem
ok, motion carried, I'll make that happen.
2022-03-21 08059, 2022
akshaaatt
I think our domains are unique as is. Would someone try to snap up the domains if we don't have them?
2022-03-21 08008, 2022
mayhem
akshaaatt: they always do.
2022-03-21 08009, 2022
CatQuest
gamebrainz
2022-03-21 08010, 2022
Freso
akshaaatt: 🤷
2022-03-21 08013, 2022
akshaaatt
Oh!
2022-03-21 08022, 2022
mayhem
ok, onward to the next topic.
2022-03-21 08031, 2022
Freso
mayhem: Supporting open source
2022-03-21 08034, 2022
mayhem
we'll soon be getting paid for the ODI participation.
2022-03-21 08045, 2022
TOPIC: MetaBrainz Community and Development channel | MusicBrainz non-development: #musicbrainz | BookBrainz: #bookbrainz | Channel is logged; see https://musicbrainz.org/doc/IRC for details | Agenda: Supporting open source (ruaok), Next meeting (Freso)
2022-03-21 08054, 2022
mayhem
and I've decided I want to help open source in general a bit, so I want us to lead by example.
2022-03-21 08004, 2022
akshaaatt
++
2022-03-21 08015, 2022
monkey
+1000
2022-03-21 08015, 2022
mayhem
I am dedicating a budget of $6000 annually to this cause -- at least to start.
2022-03-21 08034, 2022
mayhem
the first year is being paid out by Microsoft/ODI.
2022-03-21 08003, 2022
mayhem
I'd like each team member to identify 2 or 3 open source projects that they think should be supported.
2022-03-21 08011, 2022
akshaaatt
Sounds superb!
2022-03-21 08025, 2022
mayhem
I will create a spreadsheet for this in a minute.
2022-03-21 08028, 2022
monkey
Yeah, I love the idea!
2022-03-21 08041, 2022
mayhem
propose a project and propose an annual support payment. add a link to where to support the project.
2022-03-21 08059, 2022
reosarevok
Neat
2022-03-21 08003, 2022
mayhem
then in maybe 2 weeks we can have another meeting where we talk about the chosen projects and allocate funds, ok?
2022-03-21 08012, 2022
monkey
👍
2022-03-21 08015, 2022
mayhem
then, I want to create a page on meb.org that states this.
2022-03-21 08025, 2022
akshaaatt
Can we do something like an internal GSoC-style program where students/people submit proposals and we pay and mentor them to work on the projects? This could happen in November, December and January, when people are looking for internships anyway.
2022-03-21 08038, 2022
Freso
Sounds good.
2022-03-21 08040, 2022
mayhem
I'm going to state what % of our income we're dedicating to OSS and challenge other companies to do the same or better.
2022-03-21 08001, 2022
mayhem
let's make some noise about this, because imagine if someone like Google did this as well?
2022-03-21 08016, 2022
mayhem
OSS developers are burning out, so let's see if we can help a little.
2022-03-21 08020, 2022
mayhem
fin. back to freso.
2022-03-21 08028, 2022
Freso
Freso: Next meeting
2022-03-21 08039, 2022
lucifer
sounds great. awesome!
2022-03-21 08018, 2022
Freso
This is just a PSA that Europe switches to DST on Sunday, so for Indians and USians and others that do not follow Europe’s DST schedule, note that next week’s meeting will be an hour… uh, later? for you.
2022-03-21 08031, 2022
lucifer
earlier
2022-03-21 08039, 2022
Freso
Earlier. Thanks lucifer. :p
2022-03-21 08043, 2022
alastairp
an hour different
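The direction of the shift that Freso and lucifer sort out above is a small piece of time-zone arithmetic. A minimal sketch, assuming a hypothetical meeting slot of 19:00 European local time and hard-coding the offsets (CET = UTC+1, CEST = UTC+2 from the 2022-03-27 switch, India = UTC+5:30 year-round) rather than using a time-zone database:

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, hard-coded for illustration only.
CET = timezone(timedelta(hours=1), "CET")              # Europe, winter time
CEST = timezone(timedelta(hours=2), "CEST")            # Europe, summer time
IST = timezone(timedelta(hours=5, minutes=30), "IST")  # India, no DST

# Hypothetical meeting slot: 19:00 European local time on two Mondays
# either side of the 2022-03-27 switch.
before = datetime(2022, 3, 21, 19, 0, tzinfo=CET)
after = datetime(2022, 3, 28, 19, 0, tzinfo=CEST)

for label, meeting in (("before", before), ("after", after)):
    local = meeting.astimezone(IST)
    utc = meeting.astimezone(timezone.utc)
    print(f"{label}: {local:%H:%M} IST ({utc:%H:%M} UTC)")
# before: 23:30 IST (18:00 UTC)
# after: 22:30 IST (17:00 UTC)
```

The European wall-clock time stays the same while the UTC time moves back an hour, so the meeting lands an hour earlier for anyone whose clocks do not change. A real script would use `zoneinfo.ZoneInfo` zones instead of fixed offsets so the switch date is handled automatically.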
2022-03-21 08003, 2022
Freso
fin.
2022-03-21 08015, 2022
Freso
And that wraps up today’s meeting too.
2022-03-21 08027, 2022
Freso
Thank you everyone for your time! Stay safe out there. :)
2022-03-21 08032, 2022
Freso
</BANG>
2022-03-21 08037, 2022
akshaaatt
Thank you!
2022-03-21 08038, 2022
monkey
Cheers !
2022-03-21 08039, 2022
lucifer
mayhem: you may be able to speed up the query a bit, change the table to unlogged before writing data and then to logged after writing. downside is something crashes table loses data but you'll run the query again in that case anyway
2022-03-21 08045, 2022
TOPIC: MetaBrainz Community and Development channel | MusicBrainz non-development: #musicbrainz | BookBrainz: #bookbrainz | Channel is logged; see https://musicbrainz.org/doc/IRC for details | Agenda: Reviews
2022-03-21 08057, 2022
lucifer
*is if something crashes
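lucifer's suggestion can be wrapped in a small helper. A minimal sketch, assuming a DB-API cursor talking to PostgreSQL (e.g. psycopg2) and a trusted table name; the identifier check and the example table name are hypothetical, while `ALTER TABLE ... SET UNLOGGED` / `SET LOGGED` are standard PostgreSQL (9.5 and later):

```python
import re
from contextlib import contextmanager

# Hypothetical safeguard: only allow plain `table` or `schema.table` names,
# because the identifier is interpolated into the SQL text directly.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


@contextmanager
def unlogged(cursor, table):
    """Switch `table` to UNLOGGED for a bulk write, then back to LOGGED."""
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"refusing suspicious table name: {table!r}")
    cursor.execute(f"ALTER TABLE {table} SET UNLOGGED")
    yield cursor
    # Only reached when the with-block did not raise. On failure the table is
    # left UNLOGGED (its contents may be lost after a crash); the job is re-run.
    cursor.execute(f"ALTER TABLE {table} SET LOGGED")

# Usage (hypothetical table name):
# with unlogged(cur, "mapping.tmp_recording_stats"):
#     cur.execute("INSERT INTO mapping.tmp_recording_stats SELECT ...")
```

As mayhem notes in his reply, this only helps when writing dominates the runtime. Switching back to LOGGED typically rewrites the table into the WAL, so the saving comes from logging the data once in bulk instead of row by row.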
2022-03-21 08012, 2022
mayhem
the query time is all in the execute() -- the writing is relatively fast.
2022-03-21 08030, 2022
lucifer
ah 👍
2022-03-21 08038, 2022
alastairp
mayhem: is the query significantly faster when you just apply it to a few rows (i.e. when a replication packet comes in?)
2022-03-21 08003, 2022
mayhem
it is instantaneous when I request only one row, so I would expect so.
yeah, so I guess it doesn't matter how long it takes (within reason...) if we're only going to run it once
2022-03-21 08058, 2022
alastairp
PrathameshG: hi, I'm here. not sure how late it is for you or if you want to talk
2022-03-21 08002, 2022
alastairp
what have you managed to do?
2022-03-21 08015, 2022
lucifer
oh but how do you apply it to only some rows? look up edits and figure out which entities need to be refreshed?
2022-03-21 08032, 2022
Sophist-UK joined the channel
2022-03-21 08059, 2022
alastairp
lucifer: the plan is to consume replication packets, which say which rows have changed
2022-03-21 08004, 2022
PrathameshG
alastairp: Hey there, DW I'll be online for another hour.
2022-03-21 08027, 2022
lucifer
i see, makes sense.
2022-03-21 08025, 2022
mayhem
lucifer: this is why we are so damned anal about the created/last_updated columns. so that downstream users like this can clearly understand what changed.
2022-03-21 08003, 2022
mayhem
and I have a GIN index on an ARRAY column that lists artist_mbids, so I can quickly mark rows as dirty.
2022-03-21 08031, 2022
mayhem
and then one query to select the dirty rows in a CTE with the rest of the query.
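The pattern mayhem describes (GIN index on an array column, mark rows dirty, then recompute only the dirty rows via a CTE) can be sketched as follows. This is a minimal sketch, not mayhem's code: the cache table `recording_cache` and its `artist_mbids uuid[]` and `dirty boolean` columns are hypothetical, the recompute query is a simplified stand-in using the MusicBrainz `recording`, `artist_credit_name` and `artist` tables, and a DB-API cursor such as psycopg2's is assumed.

```python
# Hypothetical cache table: recording_cache(recording_mbid uuid,
#                                           artist_mbids uuid[], dirty boolean)
CREATE_GIN_INDEX = """
CREATE INDEX IF NOT EXISTS recording_cache_artist_mbids_gin
    ON recording_cache USING gin (artist_mbids)
"""

# `&&` (array overlap) is supported by the default GIN array operator class,
# so this lookup can use the index instead of scanning the whole table.
MARK_DIRTY = """
UPDATE recording_cache
   SET dirty = true
 WHERE artist_mbids && %s::uuid[]
"""

# Select the dirty rows in a CTE and join them to the (simplified) expensive
# query, so it only runs for rows that actually changed.
SELECT_DIRTY = """
WITH dirty_rows AS (
    SELECT recording_mbid FROM recording_cache WHERE dirty
)
SELECT r.gid AS recording_mbid,
       array_agg(a.gid ORDER BY acn.position) AS artist_mbids
  FROM dirty_rows d
  JOIN musicbrainz.recording r ON r.gid = d.recording_mbid
  JOIN musicbrainz.artist_credit_name acn ON acn.artist_credit = r.artist_credit
  JOIN musicbrainz.artist a ON a.id = acn.artist
 GROUP BY r.gid
"""


def mark_dirty(cursor, changed_artist_mbids):
    """Flag cache rows that mention any changed artist; return rows flagged."""
    mbids = sorted(set(changed_artist_mbids))
    if not mbids:
        return 0
    cursor.execute(MARK_DIRTY, (mbids,))
    return cursor.rowcount


def recompute_dirty(cursor):
    """Run the expensive query only for rows currently flagged dirty."""
    cursor.execute(SELECT_DIRTY)
    return cursor.fetchall()
```

The changed artist MBIDs would come from the replication packets alastairp mentions; writing the recomputed rows back and clearing `dirty` is left out to keep the sketch short.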
PrathameshG
alastairp: I was on a vacation and got stung by a bee on my hand so sadly I wasn't able to do much 🤦‍♂️
2022-03-21 08026, 2022
PrathameshG
Although, so far I've managed to get used to my environment on bono, and tried out loading and testing some of the data sets.
2022-03-21 08049, 2022
monkey
That a buzzkill…
2022-03-21 08002, 2022
monkey
That's*
2022-03-21 08010, 2022
alastairp
monkey: buzzz off
2022-03-21 08020, 2022
monkey
Yes honey.
2022-03-21 08047, 2022
alastairp
PrathameshG: oops, hope everything is ok. but don't worry about it - we're happy for you to just play around and look at the data. no pressure to do anything
2022-03-21 08005, 2022
lucifer
oh i forgot to mention earlier (~2hrs ago), LB prod updated
2022-03-21 08012, 2022
alastairp
PrathameshG: so it sounds like you now know how to do things, but you're not sure what to do?
2022-03-21 08035, 2022
PrathameshG
Nono, it's completely alright, I'd be more than happy to take responsibility and take up targets and try to complete them
2022-03-21 08052, 2022
PrathameshG
Yes, that's exactly what's happening right now. I don't have a clue what I have to do
2022-03-21 08008, 2022
alastairp
so first of all - last week you were talking about a bunch of interesting ideas you had for looking up MLHD data in last.fm and other sources
2022-03-21 08034, 2022
PrathameshG
Yes that's right
2022-03-21 08035, 2022
alastairp
just because you have an account on bono don't think that you only have to do what we suggest, feel free to use the resources for your own project too
2022-03-21 08002, 2022
alastairp
that being said, let me explain to you what we wanted to do with the MLHD
2022-03-21 08040, 2022
PrathameshG
Thanks a lot, I was thinking of running some network intensive stuff on it.
2022-03-21 08054, 2022
PrathameshG
Please go ahead
2022-03-21 08024, 2022
alastairp
some history: experience has shown us that a lot of data in last.fm is wrong. about 10 or so years ago (someone correct me if I'm wrong), musicbrainz made a big change to its database structure, and we introduced a number of new concepts. it seems that maybe last.fm were late to identify this change
2022-03-21 08028, 2022
alastairp
the result of this is that sometimes when they give you a "recording mbid", this might not actually be correct. it might be a track mbid (a "track" is a recording on a specific release - a single recording that appears on 2 releases has 2 tracks, each with its own mbid)
2022-03-21 08057, 2022
alastairp
so one thing that we're not sure about is whether all of the ids in the data files are actually correct and whether they actually exist in musicbrainz
2022-03-21 08032, 2022
alastairp
the first thing we wanted to do was look through the data files and cross-reference them with the database and see which things exist, which things have been deleted for some reason, and which things are in the database but with the wrong id
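The cross-referencing alastairp describes boils down to sorting each MBID into a few buckets. A minimal sketch, assuming the lookup tables have already been loaded in bulk from the MusicBrainz `recording`, `recording_gid_redirect` (old MBIDs of merged recordings) and `track` tables; the function, enum and placeholder ids are hypothetical, and track-MBID redirects are ignored for brevity:

```python
from enum import Enum


class MbidStatus(Enum):
    RECORDING = "recording"    # a current recording MBID
    REDIRECTED = "redirected"  # old MBID of a recording that was merged
    TRACK = "track"            # actually a track MBID, not a recording MBID
    MISSING = "missing"        # not found: deleted, or never existed


def classify_mbid(mbid, recording_mbids, recording_redirects, track_to_recording):
    """Return (status, canonical recording MBID or None) for one MBID.

    recording_mbids:     set of current recording MBIDs
    recording_redirects: dict of old recording MBID -> current recording MBID
    track_to_recording:  dict of track MBID -> recording MBID
    """
    if mbid in recording_mbids:
        return MbidStatus.RECORDING, mbid
    if mbid in recording_redirects:
        return MbidStatus.REDIRECTED, recording_redirects[mbid]
    if mbid in track_to_recording:
        return MbidStatus.TRACK, track_to_recording[mbid]
    return MbidStatus.MISSING, None


# alastairp's example with placeholder ids: one recording on two releases,
# so two tracks, both pointing back at the same recording.
recordings = {"rec-a"}
redirects = {"rec-old": "rec-a"}
tracks = {"trk-1": "rec-a", "trk-2": "rec-a"}
for mbid in ("rec-a", "rec-old", "trk-2", "gone"):
    status, recording = classify_mbid(mbid, recordings, redirects, tracks)
    print(mbid, status.value, recording)
```

Checking the recording table first means an id that is a valid recording MBID is never reinterpreted, and only the leftovers fall through to the redirect and track lookups.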
2022-03-21 08032, 2022
PrathameshG
Alright, so we have to revalidate all the MBIDs
2022-03-21 08043, 2022
PrathameshG
Sounds very doable
2022-03-21 08033, 2022
PrathameshG
So I'll just start by writing scripts to evaluate the mbids and cross check them with the musicbrainz db.
2022-03-21 08034, 2022
PrathameshG
I'll keep giving you updates for the same :)
2022-03-21 08059, 2022
PrathameshG
And of course, if you've anything to add on it, please go ahead and mention it
^ Found the snippet above that mayhem posted in the last conversation. Will try to implement some of it along the way 👌
2022-03-21 08040, 2022
alastairp
feel free to share with us the code that you write, there are many tricks that one can apply to make something like this comparison fast
2022-03-21 08050, 2022
PrathameshG
Yes absolutely
2022-03-21 08009, 2022
alastairp
yes right - that's a broader outline of what we hope to do
2022-03-21 08023, 2022
PrathameshG
Firstly, I was wondering if I'd have to hit the database on each row?
2022-03-21 08024, 2022
PrathameshG
That's gonna be really intense for the database
2022-03-21 08030, 2022
alastairp
yeah, exactly ;)
2022-03-21 08009, 2022
PrathameshG
Alrighty, I'll get started with 1 dataset first then :))
2022-03-21 08013, 2022
PrathameshG
Will update you soon.
2022-03-21 08049, 2022
alastairp
databases are really good at doing things in bulk, my first intuition would be to process at least 1 file at once
2022-03-21 08052, 2022
lucifer
for checking whether the mbid is a track mbid or a recording mbid, you could probably get away with just reading the index, which should be fast enough. also do lookups in batches.
2022-03-21 08036, 2022
alastairp
for example, I'm just looking through some of the sample 00 files - some have 150k rows, some have 40k rows. postgresql will have no problem if you pass in a query with 100,000 parameters
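Instead of one query per row, each batch of MBIDs can be passed as a single array parameter. A minimal sketch, assuming a psycopg2-style DB-API cursor (which adapts a Python list to a PostgreSQL array) and the MusicBrainz `recording.gid` column; the helper names and the batch size are hypothetical choices, not tuned values:

```python
import uuid
from itertools import islice

# One round trip per batch; `gid = ANY(array)` should be able to use the
# unique index on recording.gid. Casting to text keeps results as plain str.
FIND_RECORDINGS = (
    "SELECT gid::text FROM musicbrainz.recording WHERE gid = ANY(%s::uuid[])"
)


def normalize_mbid(value):
    """Return the canonical lowercase MBID string, or None if not a UUID."""
    try:
        return str(uuid.UUID(value))
    except (ValueError, AttributeError, TypeError):
        return None


def batched(iterable, size):
    """Yield lists of at most `size` items (Python 3.12 has itertools.batched)."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def existing_recording_mbids(cursor, mbids, batch_size=50_000):
    """Return the subset of `mbids` that are current recording MBIDs."""
    # Malformed ids are dropped up front: one bad value would make the
    # ::uuid[] cast fail for the whole batch.
    valid = sorted({m for m in map(normalize_mbid, mbids) if m is not None})
    found = set()
    for chunk in batched(valid, batch_size):
        cursor.execute(FIND_RECORDINGS, (chunk,))
        found.update(row[0] for row in cursor.fetchall())
    return found
```

Anything left over after this pass is a candidate for the redirect and track lookups; the same batched pattern works against `recording_gid_redirect.gid` and `track.gid`.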
2022-03-21 08041, 2022
PrathameshG
Got it 👍 lucifer alastairp
2022-03-21 08059, 2022
alastairp
however I wonder if there are some even easier ways to do this, I'm just trying something - one moment
2022-03-21 08008, 2022
PrathameshG
yep
2022-03-21 08046, 2022
PrathameshG
Also, just to confirm our primary concern is with the recording-mbid right?
2022-03-21 08040, 2022
alastairp
yes, right - because we already have these relationships in the musicbrainz database, our plan is to ignore the artist and release ids and re-compute them
2022-03-21 08031, 2022
PrathameshG
Sounds good. So I'll just drop the artist/release ID columns during analysis. Will speed up the process a bit
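Dropping the artist and release columns can happen while reading each file, so only the distinct recording MBIDs are kept in memory and sent to the database. A minimal sketch, assuming each MLHD file is gzipped, tab-separated, and has the recording MBID in its fourth column (verify the column layout against the dataset's own documentation before relying on it); the function and constant names are hypothetical:

```python
import csv
import gzip

# Assumption: the 4th tab-separated column holds the recording MBID.
RECORDING_COLUMN = 3


def recording_mbids_from_file(path, column=RECORDING_COLUMN):
    """Collect the distinct, non-empty recording MBIDs from one MLHD file."""
    mbids = set()
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        for row in csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            # Skip short rows and listens that carry no recording MBID.
            if len(row) > column and row[column]:
                mbids.add(row[column])
    return mbids
```

Deduplicating per file keeps each batch small, since popular recordings show up many times in a single listening history.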