#metabrainz

      • Freso
        🙋
      • Dealt with reports and forum stuff and such, being around+about, etc.
      • fin.
      • Aaaand… that’s it for reviews, I believe.
      • Thank you for your reviews! :)
      • We have a few items on the agenda, so…
      • ruaok: Expiring domains
      • mayhem
        so, have a look at this list we have expiring soon: https://gist.github.com/mayhem/436428f8e925c50e...
      • the first one was registered at the same time as musicbrainz.org -- back then there was no Google, search was just starting, and typo domains were a real thing.
      • also, there was a thought that people might not find a .org (kinda not well known then) and that a .com should be reserved to redirect to the .org
      • kinda outdated concepts.
      • I could see keeping bookbrainz.com. any thoughts monkey ?
      • musicbrains.org I think we can ditch. any objections?
      • Freso
        +1
      • zas
        +1
      • monkey
        Do we have any data of how many hits they get?
      • lucifer
        it currently redirects to mb.org though so someone could try to phish?
      • monkey
        Generally I don't see the need for it
      • mayhem
        we have no stats.
      • but given that some people care, let's keep bb.com and ditch musicbrains.org
      • then...
      • back then, maybe 15 years ago, we talked about possibly doing foodbrainz and tvbrainz.
      • foodbrainz has been done, but I don't recall the name. Freso, do you?
      • Freso
        I think FilmBrainz is more likely to be a thing sometime than TVBrainz, so not sure TVBrainz is needed. There is already OpenFoodFacts and I haven’t seen much legit community requests for FoodBrainz, so I don’t think that’s going to happen ever.
      • mayhem
        tvbrainz? What's TV?
      • Freso
        Yes, OpenFoodFacts. :p
      • zas
        lucifer: good point, it was one of my concerns, that said there are plenty of domains that can be used to phish
      • lucifer
        makes sense
      • mayhem
        I'm open to keeping musicbrains.org -- for phishing concerns.
      • but TV and food. we're just not going to do that.
      • zas
        +1^^
      • Freso
        +1
      • alastairp
        +1
      • mayhem
        so to summarize: keep boobrainz.com, musicbrains.org and ditch the tv and food domains.
      • any objections?
      • monkey
        +1
      • CatQuest
        boobrainz? awesome
      • mayhem
      • we also have moviebrainz.org which is still a possibility.
      • Freso
        Yeah.
      • mayhem
        ok, motion carried, I'll make that happen.
      • akshaaatt
        I think our domains are unique as is. Would someone try to zip up the domains if we don't have them?
      • mayhem
        akshaaatt: they always do.
      • CatQuest
        gamebrainz
      • Freso
        akshaaatt: 🤷
      • akshaaatt
        Oh!
      • mayhem
        ok, onward to the next topic.
      • Freso
        mayhem: Supporting open source
      • mayhem
        we'll soon be getting paid for the ODI participation.
      • and I've decided I want to help open source in general a bit, so I want us to lead by example.
      • akshaaatt
        ++
      • monkey
        +1000
      • mayhem
        I am dedicating a budget of $6000 annually to this cause -- at least to start.
      • the first year is being paid out by Microsoft/ODI.
      • I'd like each teammember to identify 2 or 3 open source projects that they think should be supported.
      • akshaaatt
        Sounds superb!
      • mayhem
        I will create a spreadsheet for this in a minute.
      • monkey
        Yeah, I love the idea!
      • mayhem
        propose a project and propose an annual support payment. add a link to where to support the project.
      • reosarevok
        Neat
      • mayhem
        then in maybe 2 weeks we can have another meeting where we talk about the chosen projects and allocate funds, ok?
      • monkey
        👍
      • mayhem
        then, I want to create a page on meb.org that states this.
      • akshaaatt
        Can we do something like an internal GSoC thing where we let students/people provide proposals, and we pay and mentor them for the projects? This could happen during the months of Nov, Dec, Jan when people are looking for internships anyway
      • Freso
        Sounds good.
      • mayhem
        I'm going to state what % of our income we're dedicating to OSS and challenge other companies to do the same or better.
      • let's make some noise about this, because imagine if someone like Google did this as well?
      • OSS developers are burning out, so let's see if we can help a little.
      • fin. back to freso.
      • Freso
        Freso: Next meeting
      • lucifer
        sounds great. awesome!
      • Freso
        This is just a PSA that Europe switches to DST on Sunday, so for Indians and USians and others that do not follow Europe’s DST schedule, note that next week’s meeting will be an hour… uh, later? for you.
      • lucifer
        earlier
      • Freso
        Earlier. Thanks lucifer. :p
      • alastairp
        an hour different
      • Freso
        fin.
      • And that wraps up today’s meeting too.
      • Thank you everyone for your time! Stay safe out there. :)
      • </BANG>
      • akshaaatt
        Thank you!
      • monkey
        Cheers !
      • lucifer
        mayhem: you may be able to speed up the query a bit: change the table to unlogged before writing data and then back to logged after writing. downside is that if something crashes the table loses data, but you'll run the query again in that case anyway
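A minimal sketch of the unlogged-table trick lucifer describes, assuming psycopg2 and hypothetical table names (mapping_tmp, mapping_src), not the actual ListenBrainz tables:

```python
# Sketch of the unlogged-table trick; table names and DSN are assumptions.
import psycopg2

conn = psycopg2.connect("dbname=listenbrainz")  # hypothetical DSN
with conn, conn.cursor() as cur:
    # Skip WAL while bulk-writing: if PostgreSQL crashes, the table's contents
    # are lost, but the expensive query would simply be re-run in that case.
    cur.execute("ALTER TABLE mapping_tmp SET UNLOGGED")
    cur.execute("""
        INSERT INTO mapping_tmp (recording_mbid, artist_mbids)
             SELECT recording_mbid, artist_mbids FROM mapping_src
    """)
    # Make the table crash-safe again once the bulk write is done.
    cur.execute("ALTER TABLE mapping_tmp SET LOGGED")
```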
      • mayhem
        the query time is all in the execute() -- the writing is relatively fast.
      • lucifer
        ah 👍
      • alastairp
        mayhem: is the query significantly faster when you just apply it to a few rows (i.e. when a replication packet comes in?)
      • mayhem
        it is instantaneous when I request only one row, so I would expect so.
      • team members should be able to edit
      • alastairp
        yeah, so I guess it doesn't matter how long it takes (within reason...) if we're only going to run it once
      • PrathameshG: hi, I'm here. not sure how late it is for you or if you want to talk
      • what have you managed to do?
      • lucifer
        oh but how do you apply it to only some rows? look up edits and figure out the entities that need to be refreshed?
      • alastairp
        lucifer: the plan is to consume replication packets, which say which rows have changed
      • PrathameshG
        alastairp: Hey there, DW I'll be online for another hour.
      • lucifer
        i see, makes sense.
      • mayhem
        lucifer: this is why we are so damned anal about the created/last_updated columns. so that downstream users like this can clearly understand what changed.
      • and I have a GIN index on a ARRAY column that lists artist_mbids, so I can quickly mark rows as dirty.
      • and then one query to select the dirty rows in a CTE with the rest of the query.
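Roughly what that could look like, as a sketch only; the table and column names here (popularity, artist_mbids) are assumptions rather than the real schema:

```python
# Hedged sketch of the "dirty rows" approach: a GIN index on an array column
# of artist MBIDs, plus a CTE that selects only the rows touched by a change.
import psycopg2

conn = psycopg2.connect("dbname=listenbrainz")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE INDEX IF NOT EXISTS popularity_artist_mbids_gin
                 ON popularity USING gin (artist_mbids)
    """)
    # MBIDs reported as changed, e.g. extracted from a replication packet.
    changed_artists = ["b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d"]
    cur.execute("""
        WITH dirty AS (
            SELECT id
              FROM popularity
             WHERE artist_mbids && %s::uuid[]  -- overlap test, served by the GIN index
        )
        SELECT id FROM dirty  -- the real query recomputes just these rows here
    """, (changed_artists,))
    rows_to_refresh = [r[0] for r in cur.fetchall()]
```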
      • lucifer
        yup makes sense. nice! :D
      • mayhem
        it is going to be really beefy after that.
      • lucifer
        one day, this will hopefully land and make it all automatic: https://commitfest.postgresql.org/23/2138/ !!
      • Incremental refresh of materialized views
      • PrathameshG
        alastairp: I was on a vacation and got stung by a bee on my hand so sadly I wasn't able to do much 🤦‍♂️
      • Although, so far I've managed to get used to my environment on bono, and tried out loading and testing some of the data sets.
      • monkey
        That's a buzzkill…
      • alastairp
        monkey: buzzz off
      • monkey
        Yes honey.
      • alastairp
        PrathameshG: oops, hope everything is ok. but don't worry about it - we're happy for you to just play around and look at the data. no pressure to do anything
      • lucifer
        oh i forgot to mention earlier (~2hrs ago), LB prod updated
      • alastairp
        PrathameshG: so it sounds like you now know how to do things, but you're not sure what to do?
      • PrathameshG
        Nono, it's completely alright, I'd be more than happy to take responsibility and take up targets and try to complete them
      • Yes, that's exactly what's happening right now. I don't have a clue what I have to do
      • alastairp
        so first of all - last week you were talking about a bunch of interesting ideas that you had to look up mlhd data in last.fm and other sources
      • PrathameshG
        Yes that's right
      • alastairp
        just because you have an account on bono don't think that you only have to do what we suggest, feel free to use the resources for your own project too
      • that being said, let me explain to you what we wanted to do with the mlhd
      • PrathameshG
        Thanks a lot, I was thinking of running some network intensive stuff on it.
      • Please go ahead
      • alastairp
        some history: experience has shown us that a lot of data in last.fm is wrong. about 10 or so years ago (someone correct me if I'm wrong), musicbrainz made a big change to its database structure, and we introduced a number of new concepts. it seems that maybe last.fm were late to identify this change
      • the result of this is that sometimes when they give you a "recording mbid", this might not actually be correct. it might be a track mbid (a "track" is a recording on a specific release - you could have only 1 recording, 2 releases, and 2 tracks)
      • so one thing that we're not sure about, is if all of the ids in the data files are actually correct and if they actually exist in musicbrainz
      • the first thing we wanted to do was look through the data files and cross-reference them with the database to see which things exist, which things have been deleted for some reason, and which things are in the database but with the wrong id
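As a rough first pass at that cross-check (the tab-separated file layout with the recording MBID in the last column is an assumption about the MLHD files; recording and recording_gid_redirect are the standard MusicBrainz tables):

```python
# Sketch: collect the recording MBIDs from one MLHD file and ask the
# MusicBrainz database which still exist, which have been merged away
# (redirected), and which are unknown. File layout and DSN are assumptions.
import csv
import psycopg2

def check_mlhd_file(path, dsn="dbname=musicbrainz_db"):
    with open(path) as f:
        mbids = list({row[-1] for row in csv.reader(f, delimiter="\t") if row and row[-1]})

    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        # MBIDs of current recordings.
        cur.execute("SELECT gid::text FROM recording WHERE gid = ANY(%s::uuid[])", (mbids,))
        existing = {r[0] for r in cur}
        # MBIDs that were merged and now redirect to another recording.
        cur.execute("SELECT gid::text FROM recording_gid_redirect WHERE gid = ANY(%s::uuid[])", (mbids,))
        redirected = {r[0] for r in cur}

    unknown = set(mbids) - existing - redirected
    return existing, redirected, unknown
```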
      • PrathameshG
        Alright, so we have to revalidate all the MBIDs
      • Sounds very doable
      • So I'll just start by writing scripts to evaluate the mbids and cross check them with the musicbrainz db.
      • I'll keep giving you updates for the same :)
      • And of course, if you've anything to add on it, please go ahead and mention it
      • ^ Found this above snippet that mayhem posted in the last convo. Will try to implement some of it along the way 👌
      • alastairp
        feel free to share with us the code that you write, there are many tricks that one can apply to make something like this comparison fast
      • PrathameshG
        Yes absolutely
      • alastairp
        yes right - that's a broader outline of what we hope to do
      • PrathameshG
        Firstly, I was wondering if I'd have to hit the database on each row?
      • That's gonna be really intense for the database
      • alastairp
        yeah, exactly ;)
      • PrathameshG
        Alrighty, I'll get started with 1 dataset first then :))
      • Will update you soon.
      • alastairp
        databases are really good at doing things in bulk, my first intuition would be to process at least 1 file at once
      • lucifer
        for checking whether an mbid is a track mbid or a recording mbid, you could probably get away with just reading the index, which should be fast enough. also do lookups in batches.
      • alastairp
        for example, I'm just looking through some of the sample 00 files - some have 150k rows, some have 40k rows. postgresql will have no problem if you pass in a query with 100,000 parameters
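For instance (a sketch, with the same schema assumptions as above), the whole file's worth of MBIDs can go to the server as one array parameter and come back already classified:

```python
# One statement for the whole batch: each input MBID comes back tagged as a
# current recording, a redirect (merged recording), or unknown. The helper
# name is illustrative, not existing LB/MB code.
def classify_mbids(cur, mbids):
    cur.execute("""
        SELECT m.mbid::text,
               CASE WHEN r.gid  IS NOT NULL THEN 'recording'
                    WHEN rr.gid IS NOT NULL THEN 'redirect'
                    ELSE 'unknown'
               END AS status
          FROM unnest(%s::uuid[]) AS m(mbid)
     LEFT JOIN recording r               ON r.gid  = m.mbid
     LEFT JOIN recording_gid_redirect rr ON rr.gid = m.mbid
    """, (mbids,))
    return dict(cur.fetchall())
```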
      • PrathameshG
        Got it 👍 lucifer alastairp
      • alastairp
        however I wonder if there are some even easier ways to do this, I'm just trying something - one moment
      • PrathameshG
        yep
      • Also, just to confirm our primary concern is with the recording-mbid right?
      • alastairp
        yes, right - because we already have these relationships in the musicbrainz database our plan is to ignore the artist and release ids and re-compute them
      • PrathameshG
        Sounds good. So I'll just drop the artist/release ID columns during analysis. Will speed up the process a bit