0:43 AM
petitminion has quit
2023-11-01 30556, 2023
1:19 AM
Pratha-Fish
hey bitmap you around?
2023-11-01 30504, 2023
2:00 AM
tux0r converts bitmap to png
2023-11-01 30531, 2023
2:05 AM
Pratha-Fish xd
2023-11-01 30505, 2023
2:41 AM
lusciouslover joined the channel
2023-11-01 30517, 2023
3:44 AM
lucifer
2023-11-01 30507, 2023
4:50 AM
antlarr has quit
2023-11-01 30536, 2023
6:43 AM
reosarevok
yvanzo: see MBS-13343 - where should docker issues be reported? Github issues rather than jira? If so we should indicate it somewhere in jira maybe, if possible
2023-11-01 30537, 2023
6:43 AM
BrainzBot
2023-11-01 30502, 2023
9:17 AM
mayhem
moooin!
2023-11-01 30503, 2023
9:17 AM
mayhem
lucifer: done!
2023-11-01 30505, 2023
9:44 AM
reosarevok
Hmm. Seems babel is struggling with flow all of a sudden? for gettext:
2023-11-01 30508, 2023
9:44 AM
reosarevok
2023-11-01 30512, 2023
9:44 AM
reosarevok
yvanzo: ^ seen that one before?
2023-11-01 30545, 2023
9:49 AM
mayhem
2023-11-01 30537, 2023
10:19 AM
reosarevok
yvanzo, bitmap: is there any difference between "delete" and "remove"?
2023-11-01 30555, 2023
10:28 AM
reosarevok
"This place has no relationships and will be removed automatically in the next few days. If this is not intended, please add more data to this place." - it probably should say "add some relationships" rather than just "more data", which is what we already say for works?
2023-11-01 30527, 2023
10:30 AM
reosarevok
If not (since we have "more data" elsewhere where the data can be releases or recordings for example) then we should probably change the work one to be consistent
2023-11-01 30535, 2023
11:03 AM
Maxr1998_ joined the channel
2023-11-01 30539, 2023
11:04 AM
Maxr1998 has quit
2023-11-01 30543, 2023
11:23 AM
bitmap
reosarevok: not in musicbrainz AFAIK. "delete" sounds a bit more permanent, I guess
2023-11-01 30501, 2023
11:24 AM
bitmap
Pratha-Fish: hey, I'm back
2023-11-01 30533, 2023
11:24 AM
reosarevok
Yeah, wondering because AFAICT we use them weirdly interchangeably
2023-11-01 30507, 2023
11:25 AM
reosarevok
So maybe we should stick to one
2023-11-01 30522, 2023
11:25 AM
reosarevok
(it also still annoys me that some entities have an /add url and some a /create url)
2023-11-01 30548, 2023
11:25 AM
bitmap
so delete remove or remove delete?
2023-11-01 30536, 2023
11:26 AM
bitmap
not sure which one we use more
2023-11-01 30547, 2023
11:26 AM
reosarevok
relete demove?
2023-11-01 30501, 2023
11:27 AM
reosarevok
We use remove a lot more, so I'd probably drop delete
2023-11-01 30512, 2023
11:27 AM
reosarevok
Or well, it feels like we do, "Remove {entity}" edits et
2023-11-01 30514, 2023
11:27 AM
reosarevok
*etc
2023-11-01 30558, 2023
11:27 AM
reosarevok
Seems we mostly use deleted for editors
2023-11-01 30500, 2023
11:28 AM
reosarevok
But not only
2023-11-01 30513, 2023
11:28 AM
reosarevok
I guess keeping it just for that meaning could make sense, if you say it sounds more permanent?
2023-11-01 30522, 2023
11:28 AM
antlarr joined the channel
2023-11-01 30547, 2023
11:28 AM
Pratha-Fish
2023-11-01 30509, 2023
11:29 AM
bitmap
I think I prefer remove too
2023-11-01 30524, 2023
11:29 AM
Pratha-Fish
I'll get back to you in ~10 minutes right after I finish lunch. Having some issues comparing areas from the two sources 🫠
2023-11-01 30549, 2023
11:29 AM
bitmap
ok :)
2023-11-01 30516, 2023
11:36 AM
Pratha-Fish
hey reosarevok if you have 10 minutes, we can also maybe take up this opportunity to do an overall survey of the whole project?
2023-11-01 30543, 2023
11:36 AM
reosarevok
I do have 10 minutes, yes :) So we can do that
2023-11-01 30555, 2023
11:36 AM
Pratha-Fish
The current issue that I am facing is as follows:
2023-11-01 30555, 2023
11:36 AM
Pratha-Fish
We have pristine data coming from both sources, in almost exactly the same format.
2023-11-01 30557, 2023
11:36 AM
Pratha-Fish
e.g.
2023-11-01 30535, 2023
11:37 AM
Pratha-Fish
(generating, just a sec)
2023-11-01 30525, 2023
11:38 AM
Pratha-Fish
2023-11-01 30501, 2023
11:41 AM
Pratha-Fish
NVM
2023-11-01 30532, 2023
11:41 AM
kellnerd
Trying to demonstrate the issue has solved it? :)
2023-11-01 30532, 2023
11:41 AM
reosarevok
Figured it out? :D
2023-11-01 30545, 2023
11:41 AM
Pratha-Fish
kellnerd: I only wish 😭
2023-11-01 30549, 2023
11:41 AM
Pratha-Fish
Ran into a little bug
2023-11-01 30552, 2023
11:41 AM
Pratha-Fish
Anyway
2023-11-01 30504, 2023
11:42 AM
Pratha-Fish
The structure of the data fetched from musicbrainz is almost the same as well. But the id columns in it use musicbrainz_ids instead of wikidata ids
2023-11-01 30515, 2023
11:42 AM
Pratha-Fish
Soo basically comparing based on ids is not an option
2023-11-01 30554, 2023
11:42 AM
reosarevok
You should be comparing based on the wikidata ids you need to get from the URLs :)
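A minimal sketch of the suggestion above, assuming the MusicBrainz URL relationships point at the usual `wikidata.org/wiki/Q…` or `wikidata.org/entity/Q…` forms. The helper name `wikidata_qid` is hypothetical and not taken from the project's code.

```python
import re
from typing import Optional

# Assumption: Wikidata item URLs end in /wiki/Q<digits> or /entity/Q<digits>.
_QID_RE = re.compile(r"wikidata\.org/(?:wiki|entity)/(Q[1-9]\d*)", re.IGNORECASE)


def wikidata_qid(url: str) -> Optional[str]:
    """Return the Q-identifier found in a Wikidata URL, or None if there is none."""
    match = _QID_RE.search(url)
    return match.group(1).upper() if match else None
```

With both sources reduced to the same Q-identifier, rows can be joined on that value rather than on MusicBrainz ids.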
2023-11-01 30503, 2023
11:43 AM
Pratha-Fish
The big idea is, to somehow index areas even with the same names but different subdivisions and countries such that each area generates a unique index for it to be queried
2023-11-01 30522, 2023
11:43 AM
Pratha-Fish
reosarevok: that's exactly what I am trying right now!
2023-11-01 30544, 2023
11:43 AM
Pratha-Fish
Not sure how well it's gonna work tho, but let's hope that ALL entries in musicbrainz have a wikidata id associated with it
2023-11-01 30548, 2023
11:43 AM
Pratha-Fish
*id -> url
2023-11-01 30532, 2023
11:44 AM
reosarevok
99.99% should
2023-11-01 30533, 2023
11:44 AM
Pratha-Fish
As for the second issue, we have some repeating areas going with the same subdivision and country, but different wikidata URLs
2023-11-01 30533, 2023
11:45 AM
Pratha-Fish
lemme fetch an example real quick
2023-11-01 30529, 2023
11:46 AM
Pratha-Fish
Ahh where are the examples when you need them 🫠
2023-11-01 30544, 2023
11:46 AM
Pratha-Fish
2023-11-01 30557, 2023
11:47 AM
Pratha-Fish
The big idea with these entries is detecting them.
2023-11-01 30543, 2023
11:48 AM
reosarevok
That does seem to be a full duplicate, yeah
2023-11-01 30557, 2023
11:48 AM
reosarevok
They're probably rare, but I'm sure there will be more
2023-11-01 30503, 2023
11:49 AM
Pratha-Fish
I was just wondering what could we even do about it
2023-11-01 30506, 2023
11:49 AM
reosarevok
I think it seems fine to skip those / log them
2023-11-01 30530, 2023
11:49 AM
reosarevok
If you detect you'd add a second area with the same name and parents at least, you could log it and not do it automatically
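The log-and-skip idea could look roughly like the sketch below. It assumes each candidate area has already been reduced to a name, a subdivision, a country and a Wikidata id; the `Area` type, the `split_ambiguous` helper and the logger name are all hypothetical.

```python
import logging
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple

logger = logging.getLogger("area_import")  # hypothetical logger name


class Area(NamedTuple):
    name: str
    subdivision: Optional[str]
    country: Optional[str]
    wikidata_id: str


def split_ambiguous(areas: Iterable[Area]) -> Tuple[List[Area], List[Area]]:
    """Separate areas that are unique by (name, subdivision, country) from clashes."""
    by_key: Dict[Tuple[str, str, str], List[Area]] = defaultdict(list)
    for area in areas:
        key = (
            area.name.casefold(),
            (area.subdivision or "").casefold(),
            (area.country or "").casefold(),
        )
        by_key[key].append(area)

    safe: List[Area] = []
    skipped: List[Area] = []
    for key, group in by_key.items():
        if len({a.wikidata_id for a in group}) > 1:
            # Same name and parents but different Wikidata items: leave for a human.
            logger.warning("Skipping ambiguous area %s: %s", key,
                           sorted(a.wikidata_id for a in group))
            skipped.extend(group)
        else:
            safe.append(group[0])
    return safe, skipped
```

Areas that share a name but differ in subdivision or country get different keys, so they are not flagged; only full name-and-parents clashes end up in the skipped list.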
2023-11-01 30532, 2023
11:49 AM
Pratha-Fish
I can calculate them based on the city name, but what if they turn out to be cities with same name but different subdivision? or even country?
2023-11-01 30553, 2023
11:49 AM
Pratha-Fish
So I tried generating an index of such areas to filter out such areas, but here's the result
2023-11-01 30522, 2023
11:50 AM
reosarevok
(since if we *do* add it we should actually add a disambiguation anyway so it's better if a human does it)
2023-11-01 30529, 2023
11:50 AM
Pratha-Fish
2023-11-01 30553, 2023
11:50 AM
reosarevok
Hmm. In MB?
2023-11-01 30556, 2023
11:50 AM
reosarevok
Or in WD?
2023-11-01 30501, 2023
11:51 AM
Pratha-Fish
It's from MB
2023-11-01 30503, 2023
11:51 AM
reosarevok
In MB cities with no parents should be rare
2023-11-01 30508, 2023
11:51 AM
Pratha-Fish
but same story from wiki
2023-11-01 30530, 2023
11:51 AM
Pratha-Fish
Actually wiki could be suffering from some other issue due to my own bad code
2023-11-01 30536, 2023
11:51 AM
Pratha-Fish
2023-11-01 30546, 2023
11:51 AM
Pratha-Fish
reosarevok: let's see how rare
2023-11-01 30525, 2023
11:52 AM
reosarevok
wikidata having a few amount of those seems more likely, you should just ignore them if so
2023-11-01 30538, 2023
11:52 AM
reosarevok
a few amount. me fail english that's unpossible
2023-11-01 30541, 2023
11:52 AM
Pratha-Fish
reosarevok: we have 4885 of those in MB surprisingly
2023-11-01 30548, 2023
11:52 AM
reosarevok
That does not sound right
2023-11-01 30550, 2023
11:52 AM
reosarevok
Examples? :)
2023-11-01 30553, 2023
11:52 AM
Pratha-Fish
on it
2023-11-01 30519, 2023
11:54 AM
Pratha-Fish
Apparently there's also something wrong with my data 😑
2023-11-01 30532, 2023
11:54 AM
Pratha-Fish
Will have to take a moment to get it fixed
2023-11-01 30548, 2023
11:54 AM
reosarevok
That's ok
2023-11-01 30510, 2023
11:56 AM
Pratha-Fish
Ah yes.
2023-11-01 30523, 2023
11:56 AM
Pratha-Fish
I had to reset my dev environment a couple of days ago
2023-11-01 30537, 2023
11:56 AM
Pratha-Fish
So looks like I am missing something in the musicbrainz database
2023-11-01 30544, 2023
11:57 AM
Pratha-Fish
Here's how I've set it up so far:
2023-11-01 30545, 2023
11:57 AM
Pratha-Fish
1. Dropped musicbrainz_db and reimported it using the datadumps generated by bitmap (contained all areas and relations last I checked.)
2023-11-01 30545, 2023
11:57 AM
Pratha-Fish
2. Created the materialized tables as well
2023-11-01 30537, 2023
11:58 AM
Pratha-Fish
But before the reset, my query used to fetch around 700k rows along with parent data. But now it barely fetches 200k with NO PARENT DATA
2023-11-01 30541, 2023
12:00 PM
bitmap
you sure you imported the second dump and not the first one?
2023-11-01 30553, 2023
12:00 PM
Pratha-Fish
Yes, I am pretty sure
2023-11-01 30557, 2023
12:00 PM
Pratha-Fish
But I'll try again just in case
2023-11-01 30507, 2023
12:02 PM
bitmap
select count(*) from area_containment; -> 320019 in production
2023-11-01 30546, 2023
12:03 PM
Pratha-Fish
Yep, looks like I don't have area_containment data 🤦‍♂️
2023-11-01 30521, 2023
12:04 PM
Pratha-Fish
I remember running the materialized table command in musicbrainz docker as well.
2023-11-01 30527, 2023
12:04 PM
Pratha-Fish
I'll try restarting it
2023-11-01 30531, 2023
12:04 PM
bitmap
is the table completely empty?
2023-11-01 30508, 2023
12:05 PM
Pratha-Fish
There was no table detected 💀
2023-11-01 30541, 2023
12:05 PM
bitmap
lol
2023-11-01 30526, 2023
12:06 PM
bitmap
it should always exist in the schema, even if it's empty
2023-11-01 30530, 2023
12:06 PM
Pratha-Fish
lmao
2023-11-01 30539, 2023
12:06 PM
Pratha-Fish
are you sure you spelled the table name right?
2023-11-01 30546, 2023
12:06 PM
BrainzGit
2023-11-01 30554, 2023
12:06 PM
bitmap
pretty sure
2023-11-01 30505, 2023
12:07 PM
Pratha-Fish
I'll try dropping and reimporting the dump then ig
2023-11-01 30511, 2023
12:07 PM
Pratha-Fish
It takes a while tho
2023-11-01 30526, 2023
12:07 PM
Pratha-Fish
But while it's done, let's discuss the current state of the project ig
2023-11-01 30530, 2023
12:07 PM
bitmap
where is the table "not detected"? how are you checking for it?
2023-11-01 30535, 2023
12:07 PM
Pratha-Fish
*being executed
2023-11-01 30548, 2023
12:07 PM
bitmap
could also be a search_path issue
2023-11-01 30550, 2023
12:07 PM
Pratha-Fish
ran this in bash `psql -U musicbrainz -h localhost -c "select count(*) from area_containment;" -d musicbrainz`
2023-11-01 30505, 2023
12:08 PM
Pratha-Fish
2023-11-01 30552, 2023
12:08 PM
bitmap
is your database name musicbrainz or musicbrainz_db? (you mentioned the latter earlier, but your psql command uses musicbrainz)
2023-11-01 30527, 2023
12:09 PM
Pratha-Fish
🤦‍♂️
2023-11-01 30541, 2023
12:09 PM
Pratha-Fish
fixed it and it worked 💀
2023-11-01 30548, 2023
12:09 PM
reosarevok
Yay
2023-11-01 30550, 2023
12:09 PM
Pratha-Fish
the count is 320019
2023-11-01 30508, 2023
12:10 PM
bitmap
😁
2023-11-01 30508, 2023
12:10 PM
Pratha-Fish
Thank god. Those resets take quite a while on my device 💀
2023-11-01 30502, 2023
12:11 PM
Pratha-Fish
Yayy now my query is fetching parent areas as well!
2023-11-01 30558, 2023
12:11 PM
reosarevok
Yay
2023-11-01 30500, 2023
12:12 PM
Pratha-Fish
Give me a sec, I am running some compute to figure out fatherless areas on musicbrainz
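One way such a check might be written, as a sketch only: it assumes the containment data has been loaded as (descendant, parent) id pairs, for example from `area_containment`. That column layout and the helper name `areas_without_parent` are assumptions, not the query actually used here.

```python
from typing import Iterable, Set, Tuple


def areas_without_parent(
    area_ids: Iterable[int],
    containment: Iterable[Tuple[int, int]],
) -> Set[int]:
    """Return the ids that never appear as a descendant in the containment pairs."""
    has_parent = {descendant for descendant, _parent in containment}
    return set(area_ids) - has_parent
```

Countries would naturally show up in this result, so restricting `area_ids` to cities beforehand is probably needed to get a meaningful count.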
2023-11-01 30506, 2023
12:12 PM
reosarevok
Perfect
2023-11-01 30534, 2023
12:12 PM
Pratha-Fish cackles at the concept of "fatherless" areas
2023-11-01 30544, 2023
12:12 PM
reosarevok
No fatherland for you!
2023-11-01 30516, 2023
12:13 PM
reosarevok
bitmap: "Automatically subscribe me to artists I create", "Batch-create new works", "This user has not created any entities"
2023-11-01 30524, 2023
12:13 PM
reosarevok
All those sound like it should be "added" to me ...
2023-11-01 30553, 2023
12:13 PM
reosarevok
"Only the editor who created an edit can cancel it." that sounds like it should be entered :)
2023-11-01 30503, 2023
12:14 PM
Pratha-Fish
Alright, looks like we still have 2349 fatherless cities in musicbrainz 🫠
2023-11-01 30511, 2023
12:14 PM
reosarevok
Ok, examples :)
2023-11-01 30526, 2023
12:14 PM
bitmap
reosarevok: both your suggestions sgtm
2023-11-01 30538, 2023
12:14 PM
reosarevok
I think I might look over "create" strings next then
2023-11-01 30544, 2023
12:14 PM
petitminion joined the channel
2023-11-01 30506, 2023
12:15 PM
reosarevok
Pratha-Fish: I'm around for 15 min more max, and then in a few hours btw :)
2023-11-01 30520, 2023
12:15 PM
Pratha-Fish
Alright, trying to make it as fast as possible
2023-11-01 30544, 2023
12:15 PM
reosarevok
It's fine later too :)
2023-11-01 30521, 2023
12:16 PM
reosarevok
But let's see
2023-11-01 30531, 2023
12:16 PM
reosarevok
Check the GID, open it in MB, see if it seems correct
2023-11-01 30554, 2023
12:16 PM
monkey
2023-11-01 30529, 2023
12:27 PM
petitminion has quit
2023-11-01 30533, 2023
12:27 PM
Pratha-Fish