0:43 AM
petitminion has quit
1:19 AM
Pratha-Fish
hey bitmap you around?
2:00 AM
tux0r converts bitmap to png
2:05 AM
Pratha-Fish xd
2:41 AM
lusciouslover joined the channel
4:50 AM
antlarr has quit
6:43 AM
reosarevok
yvanzo: see MBS-13343 - where should Docker issues be reported? GitHub issues rather than Jira? If so, we should maybe indicate that somewhere in Jira, if possible
9:17 AM
mayhem
moooin!
9:17 AM
lucifer: done!
9:44 AM
reosarevok
Hmm. Seems babel is struggling with flow all of a sudden? for gettext:
9:44 AM
yvanzo: ^ seen that one before?
10:19 AM
reosarevok
yvanzo, bitmap: is there any difference between "delete" and "remove"?
10:28 AM
"This place has no relationships and will be removed automatically in the next few days. If this is not intended, please add more data to this place." - it probably should say "add some relationships" rather than just "more data", which is what we already say for works?
10:30 AM
If not (since we have "more data" elsewhere where the data can be releases or recordings for example) then we should probably change the work one to be consistent
11:03 AM
Maxr1998_ joined the channel
11:04 AM
Maxr1998 has quit
11:23 AM
bitmap
reosarevok: not in musicbrainz AFAIK. "delete" sounds a bit more permanent, I guess
11:24 AM
Pratha-Fish: hey, I'm back
11:24 AM
reosarevok
Yeah, wondering because AFAICT we use them weirdly interchangeably
11:25 AM
So maybe we should stick to one
11:25 AM
(it also still annoys me that some entities have an /add url and some a /create url)
11:25 AM
bitmap
so delete remove or remove delete?
11:26 AM
not sure which one we use more
11:26 AM
reosarevok
relete demove?
11:27 AM
We use remove a lot more, so I'd probably drop delete
11:27 AM
Or well, it feels like we do, "Remove {entity}" edits et
11:27 AM
*etc
11:27 AM
Seems we mostly use deleted for editors
11:28 AM
But not only
11:28 AM
I guess keeping it just for that meaning could make sense, if you say it sounds more permanent?
11:28 AM
antlarr joined the channel
11:29 AM
bitmap
I think I prefer remove too
11:29 AM
Pratha-Fish
I'll get back to you in ~10 minutes right after I finish lunch. Having some issues comparing areas from the two sources 😫
11:29 AM
bitmap
ok :)
11:36 AM
Pratha-Fish
hey reosarevok if you have 10 minutes, we can also maybe take up this opportunity to do an overall survey of the whole project?
11:36 AM
reosarevok
I do have 10 minutes, yes :) So we can do that
11:36 AM
Pratha-Fish
The current issue that I am facing is as follows:
11:36 AM
We have pristine data coming from both sources, in almost exactly the same format.
11:36 AM
e.g.
11:37 AM
(generating, just a sec)
11:41 AM
NVM
11:41 AM
kellnerd
Trying to demonstrate the issue has solved it? :)
11:41 AM
reosarevok
Figured it out? :D
11:41 AM
Pratha-Fish
kellnerd: I only wish
11:41 AM
Ran into a little bug
11:41 AM
Anyway
11:42 AM
The structure of the data fetched from musicbrainz is almost the same as well. But the id columns in it use musicbrainz_ids instead of wikidata ids
11:42 AM
Soo basically comparing based on ids is not an option
11:42 AM
reosarevok
You should be comparing based on the wikidata ids you need to get from the URLs :)
11:43 AM
Pratha-Fish
The big idea is to index areas so that even those with the same name but different subdivisions and countries each get a unique index that can be queried
11:43 AM
reosarevok: that's exactly what I am trying right now!
11:43 AM
Not sure how well it's gonna work tho, but let's hope that ALL entries in musicbrainz have a wikidata id associated with it
11:43 AM
*id -> url
11:44 AM
reosarevok
99.99% should
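A minimal sketch of the comparison reosarevok suggests: pull the Wikidata item ID out of each URL relationship and key the MusicBrainz rows on it. The row shape `(gid, url)` and both function names are assumptions made for illustration, not part of the project's code.

```python
import re

# Matches a Wikidata item ID (Q followed by digits) in a /wiki/ or /entity/ URL.
_QID_RE = re.compile(r"wikidata\.org/(?:wiki|entity)/(Q[1-9]\d*)(?:[/?#]|$)")


def wikidata_id_from_url(url):
    """Return the QID embedded in a Wikidata URL, or None if there is none."""
    match = _QID_RE.search(url)
    return match.group(1) if match else None


def index_by_wikidata_id(mb_rows):
    """Map QID -> MusicBrainz area GID for rows shaped like (gid, url).

    The (gid, url) row shape is hypothetical; adapt it to the real query.
    Rows whose URL is not a Wikidata URL are skipped.
    """
    index = {}
    for gid, url in mb_rows:
        qid = wikidata_id_from_url(url)
        if qid is not None:
            index[qid] = gid
    return index
```

Rows from the Wikidata side can then be looked up in the resulting dict by their own QID, so area names never need to be compared directly.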
11:44 AM
Pratha-Fish
As for the second issue, we have some repeating areas going with the same subdivision and country, but different wikidata URLs
11:45 AM
lemme fetch an example real quick
11:46 AM
Ahh where are the examples when you need them 😫
11:47 AM
The big idea with these entries is detecting them.
11:48 AM
reosarevok
That does seem to be a full duplicate, yeah
11:48 AM
They're probably rare, but I'm sure there will be more
11:49 AM
Pratha-Fish
I was just wondering what could we even do about it
11:49 AM
reosarevok
I think it seems fine to skip those / log them
11:49 AM
If you detect you'd add a second area with the same name and parents at least, you could log it and not do it automatically
11:49 AM
Pratha-Fish
I can calculate them based on the city name, but what if they turn out to be cities with the same name but a different subdivision? Or even country?
11:49 AM
So I tried generating an index of such areas to filter them out, but here's the result
11:50 AM
reosarevok
(since if we *do* add it we should actually add a disambiguation anyway so it's better if a human does it)
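A rough sketch of the "skip and log" approach described above, assuming each candidate area is a dict with the hypothetical keys `name`, `subdivision`, `country` and `wikidata_id`. Areas that share a name and parents but carry different Wikidata IDs are logged and held back for a human to review instead of being added automatically.

```python
import logging
from collections import defaultdict

logger = logging.getLogger("area_import")  # logger name is an assumption


def split_duplicates(areas):
    """Separate areas that are safe to add from suspected duplicates.

    `areas` is an iterable of dicts with the hypothetical keys
    "name", "subdivision", "country" and "wikidata_id".
    """
    # Group by name plus parents, so same-named cities in different
    # subdivisions or countries end up in different groups.
    groups = defaultdict(list)
    for area in areas:
        key = (area["name"], area["subdivision"], area["country"])
        groups[key].append(area)

    safe, suspects = [], []
    for key, members in groups.items():
        distinct_ids = {m["wikidata_id"] for m in members}
        if len(distinct_ids) > 1:
            # Same name and parents, different Wikidata items: leave to a human.
            logger.warning("Possible duplicate %s: %s", key, sorted(distinct_ids))
            suspects.extend(members)
        else:
            safe.append(members[0])
    return safe, suspects
```

Grouping on the full (name, subdivision, country) key is what keeps cities with the same name but different parents from being flagged.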
11:50 AM
Hmm. In MB?
11:50 AM
Or in WD?
11:51 AM
Pratha-Fish
It's from MB
11:51 AM
reosarevok
In MB cities with no parents should be rare
11:51 AM
Pratha-Fish
but same story from wiki
11:51 AM
Actually wiki could be suffering from some other issue due to my own bad code
11:51 AM
reosarevok: let's see how rare
11:52 AM
reosarevok
wikidata having a few amount of those seems more likely, you should just ignore them if so
11:52 AM
a few amount. me fail english that's unpossible
11:52 AM
Pratha-Fish
reosarevok: we have 4885 of those in MB surprisingly
11:52 AM
reosarevok
That does not sound right
11:52 AM
Examples? :)
11:52 AM
Pratha-Fish
on it
11:54 AM
Apparently there's also something wrong with my data
11:54 AM
Will have to take a moment to get it fixed
11:54 AM
reosarevok
That's ok
11:56 AM
Pratha-Fish
Ah yes.
11:56 AM
I had to reset my dev environment a couple of days ago
11:56 AM
So looks like I am missing something in the musicbrainz database
11:57 AM
Here's how I've set it up so far:
11:57 AM
1. Dropped musicbrainz_db and reimported it using the datadumps generated by bitmap (contained all areas and relations last I checked.)
11:57 AM
2. Created the materialized tables as well
11:58 AM
But before the reset, my query used to fetch around 700k rows along with parent data. But now it barely fetches 200k with NO PARENT DATA
12:00 PM
bitmap
you sure you imported the second dump and not the first one?
12:00 PM
Pratha-Fish
Yes, I am pretty sure
12:00 PM
But I'll try again just in case
12:02 PM
bitmap
select count(*) from area_containment; -> 320019 in production
12:03 PM
Pratha-Fish
Yep, looks like I don't have area_containment data 🤦‍♂️
12:04 PM
I remember running the materialized table command in musicbrainz docker as well.
12:04 PM
I'll try restarting it
12:04 PM
bitmap
is the table completely empty?
12:05 PM
Pratha-Fish
There was no table detected
12:05 PM
bitmap
lol
12:06 PM
it should always exist in the schema, even if it's empty
12:06 PM
Pratha-Fish
lmao
12:06 PM
are you sure you spelled the table name right?
12:06 PM
bitmap
pretty sure
12:07 PM
Pratha-Fish
I'll try dropping and reimporting the dump then ig
12:07 PM
It takes a while tho
12:07 PM
But while it's done, let's discuss the current state of the project ig
12:07 PM
bitmap
where is the table "not detected"? how are you checking for it?
12:07 PM
Pratha-Fish
*being executed
12:07 PM
bitmap
could also be a search_path issue
12:07 PM
Pratha-Fish
ran this in bash: `psql -U musicbrainz -h localhost -c "select count(*) from area_containment;" -d musicbrainz`
12:08 PM
bitmap
is your database name musicbrainz or musicbrainz_db? (you mentioned the latter earlier, but your psql command uses musicbrainz)
12:09 PM
Pratha-Fish
🤦‍♂️
12:09 PM
fixed it and it worked
12:09 PM
reosarevok
Yay
12:09 PM
Pratha-Fish
the count is 320019
12:10 PM
bitmap
π
12:10 PM
Pratha-Fish
Thank god. Those resets take quite a while on my device
12:11 PM
Yayy now my query is fetching parent areas as well!
12:11 PM
reosarevok
Yay
12:12 PM
Pratha-Fish
Give me a sec, I am running some compute to figure out fatherless areas on musicbrainz
12:12 PM
reosarevok
Perfect
12:12 PM
Pratha-Fish cackles at the concept of "fatherless" areas
12:12 PM
No fatherland for you!
12:13 PM
bitmap: "Automatically subscribe me to artists I create", "Batch-create new works", "This user has not created any entities"
12:13 PM
All those sound like it should be "added" to me ...
12:13 PM
"Only the editor who created an edit can cancel it." that sounds like it should be entered :)
12:14 PM
Pratha-Fish
Alright, looks like we still have 2349 fatherless cities in musicbrainz 😫
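One way a "fatherless" check like this could look, sketched against a toy in-memory SQLite database rather than the real PostgreSQL one. The `area_containment` column names (`descendant`, `parent`, `depth`) and the reduced `area` table are assumptions here; a real query would presumably also filter on area type so that countries, which naturally have no parent, are not counted.

```python
import sqlite3


def areas_without_parent(conn):
    """Return (id, name) of areas that never appear as a descendant."""
    query = """
        SELECT a.id, a.name
        FROM area AS a
        LEFT JOIN area_containment AS ac ON ac.descendant = a.id
        WHERE ac.descendant IS NULL
        ORDER BY a.id
    """
    return conn.execute(query).fetchall()


# Toy data only: an in-memory stand-in with an assumed, simplified schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE area (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE area_containment (descendant INTEGER, parent INTEGER, depth INTEGER);
    INSERT INTO area VALUES (1, 'Country'), (2, 'City A'), (3, 'City B');
    INSERT INTO area_containment VALUES (2, 1, 1);
""")
orphans = areas_without_parent(conn)
```

Returning the rows rather than just a count makes it easy to spot-check a few of them by hand.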
12:14 PM
reosarevok
Ok, examples :)
12:14 PM
bitmap
reosarevok: both your suggestions sgtm
12:14 PM
reosarevok
I think I might look over "create" strings next then
12:14 PM
petitminion joined the channel
12:15 PM
Pratha-Fish: I'm around for 15 min more max, and then in a few hours btw :)
12:15 PM
Pratha-Fish
Alright, trying to make it as fast as possible
12:15 PM
reosarevok
It's fine later too :)
12:16 PM
But let's see
12:16 PM
Check the GID, open it in MB, see if it seems correct
12:27 PM
petitminion has quit
12:27 PM
Pratha-Fish