#metabrainz

      • petitminion has quit
      • 2023-11-01 30556, 2023

      • Pratha-Fish
        hey bitmap you around?
      • 2023-11-01 30504, 2023

      • tux0r converts bitmap to png
      • 2023-11-01 30531, 2023

      • Pratha-Fish xd
      • 2023-11-01 30505, 2023

      • lusciouslover joined the channel
      • 2023-11-01 30517, 2023

      • lucifer
        mayhem: when you have time, can you please add https://github.com/metabrainz/pylistenbrainz/pull… to coreteam
      • 2023-11-01 30507, 2023

      • antlarr has quit
      • 2023-11-01 30536, 2023

      • reosarevok
        yvanzo: see MBS-13343 - where should docker issues be reported? Github issues rather than jira? If so we should indicate it somewhere in jira maybe, if possible
      • 2023-11-01 30537, 2023

      • BrainzBot
        MBS-13343: musicbrainz docker stuck on port 5000 even with env variable set to something else https://tickets.metabrainz.org/browse/MBS-13343
      • 2023-11-01 30502, 2023

      • mayhem
        moooin!
      • 2023-11-01 30503, 2023

      • mayhem
        lucifer: done!
      • 2023-11-01 30505, 2023

      • reosarevok
        Hmm. Seems babel is struggling with flow all of a sudden? for gettext:
      • 2023-11-01 30508, 2023

      • reosarevok
      • 2023-11-01 30512, 2023

      • reosarevok
        yvanzo: ^ seen that one before?
      • 2023-11-01 30545, 2023

      • mayhem
        feels nice to not be wasting *huge* amounts of resources here. https://ludic.mataroa.blog/blog/what-the-goddamn-…
      • 2023-11-01 30537, 2023

      • reosarevok
        yvanzo, bitmap: is there any difference between "delete" and "remove"?
      • 2023-11-01 30555, 2023

      • reosarevok
        "This place has no relationships and will be removed automatically in the next few days. If this is not intended, please add more data to this place." - it probably should say "add some relationships" rather than just "more data", which is what we already say for works?
      • 2023-11-01 30527, 2023

      • reosarevok
        If not (since we have "more data" elsewhere where the data can be releases or recordings for example) then we should probably change the work one to be consistent
      • 2023-11-01 30535, 2023

      • Maxr1998_ joined the channel
      • 2023-11-01 30539, 2023

      • Maxr1998 has quit
      • 2023-11-01 30543, 2023

      • bitmap
        reosarevok: not in musicbrainz AFAIK. "delete" sounds a bit more permanent, I guess
      • 2023-11-01 30501, 2023

      • bitmap
        Pratha-Fish: hey, I'm back
      • 2023-11-01 30533, 2023

      • reosarevok
        Yeah, wondering because AFAICT we use them weirdly interchangeably
      • 2023-11-01 30507, 2023

      • reosarevok
        So maybe we should stick to one
      • 2023-11-01 30522, 2023

      • reosarevok
        (it also still annoys me that some entities have an /add url and some a /create url)
      • 2023-11-01 30548, 2023

      • bitmap
        so delete remove or remove delete?
      • 2023-11-01 30536, 2023

      • bitmap
        not sure which one we use more
      • 2023-11-01 30547, 2023

      • reosarevok
        relete demove?
      • 2023-11-01 30501, 2023

      • reosarevok
        We use remove a lot more, so I'd probably drop delete
      • 2023-11-01 30512, 2023

      • reosarevok
        Or well, it feels like we do, "Remove {entity}" edits et
      • 2023-11-01 30514, 2023

      • reosarevok
        *etc
      • 2023-11-01 30558, 2023

      • reosarevok
        Seems we mostly use deleted for editors
      • 2023-11-01 30500, 2023

      • reosarevok
        But not only
      • 2023-11-01 30513, 2023

      • reosarevok
        I guess keeping it just for that meaning could make sense, if you say it sounds more permanent?
      • 2023-11-01 30522, 2023

      • antlarr joined the channel
      • 2023-11-01 30547, 2023

      • Pratha-Fish
      • 2023-11-01 30509, 2023

      • bitmap
        I think I prefer remove too
      • 2023-11-01 30524, 2023

      • Pratha-Fish
        I'll get back to you in ~10 minutes right after I finish lunch. Having some issues comparing areas from the two sources 🫠
      • 2023-11-01 30549, 2023

      • bitmap
        ok :)
      • 2023-11-01 30516, 2023

      • Pratha-Fish
        hey reosarevok if you have 10 minutes, we can also maybe take up this opportunity to do an overall survey of the whole project?
      • 2023-11-01 30543, 2023

      • reosarevok
        I do have 10 minutes, yes :) So we can do that
      • 2023-11-01 30555, 2023

      • Pratha-Fish
        The current issue that I am facing is as follows:
      • 2023-11-01 30555, 2023

      • Pratha-Fish
        We have pristine data coming from both sources, in almost exactly the same format.
      • 2023-11-01 30557, 2023

      • Pratha-Fish
        e.g.
      • 2023-11-01 30535, 2023

      • Pratha-Fish
        (generating, just a sec)
      • 2023-11-01 30525, 2023

      • Pratha-Fish
      • 2023-11-01 30501, 2023

      • Pratha-Fish
        NVM
      • 2023-11-01 30532, 2023

      • kellnerd
        Trying to demonstrate the issue has solved it? :)
      • 2023-11-01 30532, 2023

      • reosarevok
        Figured it out? :D
      • 2023-11-01 30545, 2023

      • Pratha-Fish
        kellnerd: I only wish 😭
      • 2023-11-01 30549, 2023

      • Pratha-Fish
        Ran into a little bug
      • 2023-11-01 30552, 2023

      • Pratha-Fish
        Anyway
      • 2023-11-01 30504, 2023

      • Pratha-Fish
        The structure of the data fetched from musicbrainz is almost the same as well. But the id columns in it use musicbrainz_ids instead of wikidata ids
      • 2023-11-01 30515, 2023

      • Pratha-Fish
        Soo basically comparing based on ids is not an option
      • 2023-11-01 30554, 2023

      • reosarevok
        You should be comparing based on the wikidata ids you need to get from the URLs :)
      • 2023-11-01 30503, 2023
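
A minimal sketch of the URL-based matching suggested above, assuming each MusicBrainz area row carries its Wikidata URL as a plain string. The function name `wikidata_id_from_url`, the helper `index_by_wikidata_id`, and the row fields `gid` and `url` are hypothetical and only serve the illustration.

```python
import re
from typing import Dict, Iterable, Optional

# Matches an entity ID such as Q64 in a Wikidata page or entity URL.
_QID_RE = re.compile(
    r"wikidata\.org/(?:wiki|entity)/(Q[1-9]\d*)(?:[/?#]|$)", re.IGNORECASE
)


def wikidata_id_from_url(url: str) -> Optional[str]:
    """Return the Q-identifier found in a Wikidata URL, or None if absent."""
    match = _QID_RE.search(url.strip())
    return match.group(1).upper() if match else None


def index_by_wikidata_id(rows: Iterable[dict]) -> Dict[str, dict]:
    """Key rows by Wikidata ID; rows without a usable URL are left out."""
    index: Dict[str, dict] = {}
    for row in rows:
        qid = wikidata_id_from_url(row.get("url") or "")
        if qid is not None:
            index[qid] = row
    return index


# Hypothetical rows: "gid" stands in for a MusicBrainz area MBID.
mb_rows = [
    {"gid": "area-1", "url": "https://www.wikidata.org/wiki/Q64"},
    {"gid": "area-2", "url": None},
]
mb_by_qid = index_by_wikidata_id(mb_rows)
```

Rows from the Wikidata side can then be looked up in `mb_by_qid` by their own Q-identifier, which sidesteps the mismatch between MusicBrainz IDs and Wikidata IDs.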

      • Pratha-Fish
        The big idea is, to somehow index areas even with the same names but different subdivisions and countries such that each area generates a unique index for it to be queried
      • 2023-11-01 30522, 2023
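
One way to read the indexing idea above is a composite key built from the area name plus its subdivision and country. The sketch below assumes rows shaped as dicts with hypothetical fields `id`, `name`, `subdivision` and `country`; it is an illustration, not the project's actual code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

AreaKey = Tuple[str, str, str]


def area_key(name: str, subdivision: Optional[str], country: Optional[str]) -> AreaKey:
    """Build a case-insensitive key; missing parents become empty strings."""
    def norm(value: Optional[str]) -> str:
        return (value or "").strip().casefold()

    return (norm(name), norm(subdivision), norm(country))


def build_index(rows: Iterable[dict]) -> Dict[AreaKey, List[str]]:
    """Group row IDs by (name, subdivision, country)."""
    index: Dict[AreaKey, List[str]] = defaultdict(list)
    for row in rows:
        key = area_key(row["name"], row.get("subdivision"), row.get("country"))
        index[key].append(row["id"])
    return dict(index)


# Hypothetical sample rows.
rows = [
    {"id": "a", "name": "Springfield", "subdivision": "Illinois", "country": "United States"},
    {"id": "b", "name": "Springfield", "subdivision": "Massachusetts", "country": "United States"},
    {"id": "c", "name": "springfield ", "subdivision": "Illinois", "country": "United States"},
]
index = build_index(rows)
```

Same-named areas in different subdivisions get different keys, while any key that maps to more than one ID marks a possible duplicate worth a closer look.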

      • Pratha-Fish
        reosarevok: that's exactly what I am trying right now!
      • 2023-11-01 30544, 2023

      • Pratha-Fish
        Not sure how well it's gonna work tho, but let's hope that ALL entries in musicbrainz have a wikidata id associated with it
      • 2023-11-01 30548, 2023

      • Pratha-Fish
        *id -> url
      • 2023-11-01 30532, 2023

      • reosarevok
        99.99% should
      • 2023-11-01 30533, 2023

      • Pratha-Fish
        As for the second issue, we have some repeating areas going with the same subdivision and country, but different wikidata URLs
      • 2023-11-01 30533, 2023

      • Pratha-Fish
        lemme fetch an example real quick
      • 2023-11-01 30529, 2023

      • Pratha-Fish
        Ahh where are the examples when you need them 🫠
      • 2023-11-01 30544, 2023

      • Pratha-Fish
      • 2023-11-01 30557, 2023

      • Pratha-Fish
        The big idea with these entries is detecting them.
      • 2023-11-01 30543, 2023

      • reosarevok
        That does seem to be a full duplicate, yeah
      • 2023-11-01 30557, 2023

      • reosarevok
        They're probably rare, but I'm sure there will be more
      • 2023-11-01 30503, 2023

      • Pratha-Fish
        I was just wondering what could we even do about it
      • 2023-11-01 30506, 2023

      • reosarevok
        I think it seems fine to skip those / log them
      • 2023-11-01 30530, 2023

      • reosarevok
        If you detect you'd add a second area with the same name and parents at least, you could log it and not do it automatically
      • 2023-11-01 30532, 2023

      • Pratha-Fish
        I can calculate them based on the city name, but what if they turn out to be cities with same name but different subdivision? or even country?
      • 2023-11-01 30553, 2023

      • Pratha-Fish
        So I tried generating an index of such areas to filter out such areas, but here's the result
      • 2023-11-01 30522, 2023

      • reosarevok
        (since if we *do* add it we should actually add a disambiguation anyway so it's better if a human does it)
      • 2023-11-01 30529, 2023
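
The log-and-skip approach suggested above could look roughly like this. It is a sketch under assumptions: areas are dicts with hypothetical fields `name` and `parents`, and the logger name `area_import` is made up for the example.

```python
import logging
from typing import Iterable, List, Tuple

logger = logging.getLogger("area_import")


def name_and_parents(area: dict) -> Tuple[str, Tuple[str, ...]]:
    """Key an area by its normalized name and its sorted parent names."""
    name = area["name"].strip().casefold()
    parents = tuple(sorted(p.strip().casefold() for p in area["parents"]))
    return (name, parents)


def split_candidates(
    existing: Iterable[dict], candidates: Iterable[dict]
) -> Tuple[List[dict], List[dict]]:
    """Return (to_add, skipped); skipped entries are logged for human review."""
    seen = {name_and_parents(area) for area in existing}
    to_add: List[dict] = []
    skipped: List[dict] = []
    for cand in candidates:
        key = name_and_parents(cand)
        if key in seen:
            logger.warning("Skipping possible duplicate: %s %s", cand["name"], cand["parents"])
            skipped.append(cand)
        else:
            seen.add(key)  # also catches duplicates within the candidate batch
            to_add.append(cand)
    return to_add, skipped


# Hypothetical sample data.
existing = [{"name": "Springfield", "parents": ["Illinois", "United States"]}]
candidates = [
    {"name": "Springfield", "parents": ["United States", "Illinois"]},
    {"name": "Springfield", "parents": ["Massachusetts", "United States"]},
]
to_add, skipped = split_candidates(existing, candidates)
```

Anything that lands in `skipped` is left for a human editor, who can add a disambiguation if the second area turns out to be genuine.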

      • Pratha-Fish
        We have a lot of city data with no parent areas 💀 https://usercontent.irccloud-cdn.com/file/c5qDcMO…
      • 2023-11-01 30553, 2023

      • reosarevok
        Hmm. In MB?
      • 2023-11-01 30556, 2023

      • reosarevok
        Or in WD?
      • 2023-11-01 30501, 2023

      • Pratha-Fish
        It's from MB
      • 2023-11-01 30503, 2023

      • reosarevok
        In MB cities with no parents should be rare
      • 2023-11-01 30508, 2023

      • Pratha-Fish
        but same story from wiki
      • 2023-11-01 30530, 2023

      • Pratha-Fish
        Actually wiki could be suffering from some other issue due to my own bad code
      • 2023-11-01 30536, 2023

      • Pratha-Fish
      • 2023-11-01 30546, 2023

      • Pratha-Fish
        reosarevok: let's see how rare
      • 2023-11-01 30525, 2023

      • reosarevok
        wikidata having a few amount of those seems more likely, you should just ignore them if so
      • 2023-11-01 30538, 2023

      • reosarevok
        a few amount. me fail english that's unpossible
      • 2023-11-01 30541, 2023

      • Pratha-Fish
        reosarevok: we have 4885 of those in MB surprisingly
      • 2023-11-01 30548, 2023

      • reosarevok
        That does not sound right
      • 2023-11-01 30550, 2023

      • reosarevok
        Examples? :)
      • 2023-11-01 30553, 2023

      • Pratha-Fish
        on it
      • 2023-11-01 30519, 2023

      • Pratha-Fish
        Apparently there's also something wrong with my data 😑
      • 2023-11-01 30532, 2023

      • Pratha-Fish
        Will have to take a moment to get it fixed
      • 2023-11-01 30548, 2023

      • reosarevok
        That's ok
      • 2023-11-01 30510, 2023

      • Pratha-Fish
        Ah yes.
      • 2023-11-01 30523, 2023

      • Pratha-Fish
        I had to reset my dev environment a couple of days ago
      • 2023-11-01 30537, 2023

      • Pratha-Fish
        So looks like I am missing something in the musicbrainz database
      • 2023-11-01 30544, 2023

      • Pratha-Fish
        Here's how I've set it up so far:
      • 2023-11-01 30545, 2023

      • Pratha-Fish
        1. Dropped musicbrainz_db and reimported it using the datadumps generated by bitmap (contained all areas and relations last I checked.)
      • 2023-11-01 30545, 2023

      • Pratha-Fish
        2. Created the materialized tables as well
      • 2023-11-01 30537, 2023

      • Pratha-Fish
        But before the reset, my query used to fetch around 700k rows along with parent data. But now it barely fetches 200k with NO PARENT DATA
      • 2023-11-01 30541, 2023

      • bitmap
        you sure you imported the second dump and not the first one?
      • 2023-11-01 30553, 2023

      • Pratha-Fish
        Yes, I am pretty sure
      • 2023-11-01 30557, 2023

      • Pratha-Fish
        But I'll try again just in case
      • 2023-11-01 30507, 2023

      • bitmap
        select count(*) from area_containment; -> 320019 in production
      • 2023-11-01 30546, 2023

      • Pratha-Fish
        Yep, looks like I don't have area_containment data 🤦‍♂️
      • 2023-11-01 30521, 2023

      • Pratha-Fish
        I remember running the materialized table command in musicbrainz docker as well.
      • 2023-11-01 30527, 2023

      • Pratha-Fish
        I'll try restarting it
      • 2023-11-01 30531, 2023

      • bitmap
        is the table completely empty?
      • 2023-11-01 30508, 2023

      • Pratha-Fish
        There was no table detected 💀
      • 2023-11-01 30541, 2023

      • bitmap
        lol
      • 2023-11-01 30526, 2023

      • bitmap
        it should always exist in the schema, even if it's empty
      • 2023-11-01 30530, 2023

      • Pratha-Fish
        lmao
      • 2023-11-01 30539, 2023

      • Pratha-Fish
        are you sure you spelled the table name right?
      • 2023-11-01 30546, 2023

      • BrainzGit
        [musicbrainz-server] 14reosarevok opened pull request #3066 (03master…use-delete-for-user-only): Use "remove" consistently for non-editor cases https://github.com/metabrainz/musicbrainz-server/…
      • 2023-11-01 30554, 2023

      • bitmap
        pretty sure
      • 2023-11-01 30505, 2023

      • Pratha-Fish
        I'll try dropping and reimporting the dump then ig
      • 2023-11-01 30511, 2023

      • Pratha-Fish
        It takes a while tho
      • 2023-11-01 30526, 2023

      • Pratha-Fish
        But while it's done, let's discuss the current state of the project ig
      • 2023-11-01 30530, 2023

      • bitmap
        where is the table "not detected"? how are you checking for it?
      • 2023-11-01 30535, 2023

      • Pratha-Fish
        *being executed
      • 2023-11-01 30548, 2023

      • bitmap
        could also be a search_path issue
      • 2023-11-01 30550, 2023

      • Pratha-Fish
        ran this in bash `psql -U musicbrainz -h localhost -c "select count(*) from area_containment;" -d musicbrainz`
      • 2023-11-01 30505, 2023

      • Pratha-Fish
      • 2023-11-01 30552, 2023

      • bitmap
        is your database name musicbrainz or musicbrainz_db? (you mentioned the latter earlier, but your psql command uses musicbrainz)
      • 2023-11-01 30527, 2023

      • Pratha-Fish
        🤦‍♂️
      • 2023-11-01 30541, 2023

      • Pratha-Fish
        fixed it and it worked 💀
      • 2023-11-01 30548, 2023

      • reosarevok
        Yay
      • 2023-11-01 30550, 2023

      • Pratha-Fish
        the count is 320019
      • 2023-11-01 30508, 2023

      • bitmap
        😁
      • 2023-11-01 30508, 2023

      • Pratha-Fish
        Thank god. Those resets take quite a while on my device 💀
      • 2023-11-01 30502, 2023

      • Pratha-Fish
        Yayy now my query is fetching parent areas as well!
      • 2023-11-01 30558, 2023

      • reosarevok
        Yay
      • 2023-11-01 30500, 2023

      • Pratha-Fish
        Give me a sec, I am running some compute to figure out fatherless areas on musicbrainz
      • 2023-11-01 30506, 2023

      • reosarevok
        Perfect
      • 2023-11-01 30534, 2023

      • Pratha-Fish cackles at the concept of "fatherless" areas
      • 2023-11-01 30544, 2023

      • reosarevok
        No fatherland for you!
      • 2023-11-01 30516, 2023

      • reosarevok
        bitmap: "Automatically subscribe me to artists I create", "Batch-create new works", "This user has not created any entities"
      • 2023-11-01 30524, 2023

      • reosarevok
        All those sound like it should be "added" to me ...
      • 2023-11-01 30553, 2023

      • reosarevok
        "Only the editor who created an edit can cancel it." that sounds like it should be entered :)
      • 2023-11-01 30503, 2023

      • Pratha-Fish
        Alright, looks like we still have 2349 fatherless cities in musicbrainz 🫠
      • 2023-11-01 30511, 2023
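
A rough sketch of how the parentless-city check could be done once the data is in memory. It assumes containment arrives as (descendant, parent) ID pairs, in the spirit of the `area_containment` table mentioned earlier; the exact columns, the area fields `id`, `gid`, `name`, `type`, and the type label "City" are assumptions for the example.

```python
from typing import Iterable, List, Tuple


def parentless_cities(
    areas: Iterable[dict], containment: Iterable[Tuple[int, int]]
) -> List[dict]:
    """Return city areas that never appear as a descendant of any parent."""
    with_parent = {descendant for descendant, _parent in containment}
    return [
        area
        for area in areas
        if area["type"] == "City" and area["id"] not in with_parent
    ]


# Hypothetical sample data; the gid values are placeholders, not real MBIDs.
areas = [
    {"id": 1, "gid": "gid-country", "name": "Exampleland", "type": "Country"},
    {"id": 2, "gid": "gid-city-a", "name": "City A", "type": "City"},
    {"id": 3, "gid": "gid-city-b", "name": "City B", "type": "City"},
]
containment = [(2, 1)]  # City A is contained in Exampleland

orphans = parentless_cities(areas, containment)
for area in orphans:
    print(area["gid"], area["name"])
```

Printing the GID for each hit makes it easy to open the area in MusicBrainz and check by hand whether the missing parent is a real data gap.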

      • reosarevok
        Ok, examples :)
      • 2023-11-01 30526, 2023

      • bitmap
        reosarevok: both your suggestions sgtm
      • 2023-11-01 30538, 2023

      • reosarevok
        I think I might look over "create" strings next then
      • 2023-11-01 30544, 2023

      • petitminion joined the channel
      • 2023-11-01 30506, 2023

      • reosarevok
        Pratha-Fish: I'm around for 15 min more max, and then in a few hours btw :)
      • 2023-11-01 30520, 2023

      • Pratha-Fish
        Alright, trying to make it as fast as possible
      • 2023-11-01 30544, 2023

      • reosarevok
        It's fine later too :)
      • 2023-11-01 30521, 2023

      • reosarevok
        But let's see
      • 2023-11-01 30531, 2023

      • reosarevok
        Check the GID, open it in MB, see if it seems correct
      • 2023-11-01 30554, 2023

      • monkey
        aerozol: Saw these fun CSS gradients, thought of you: https://codepen.io/thebabydino/details/GRRpzNX
      • 2023-11-01 30529, 2023

      • petitminion has quit
      • 2023-11-01 30533, 2023

      • Pratha-Fish