12:43 am
petitminion has quit
1:19 am
Pratha-Fish
hey bitmap you around?
2:00 am
tux0r converts bitmap to png
2:05 am
Pratha-Fish xd
2:41 am
lusciouslover joined the channel
3:44 am
lucifer
4:50 am
antlarr has quit
6:43 am
reosarevok
yvanzo: see MBS-13343 - where should Docker issues be reported? GitHub issues rather than Jira? If so, we should indicate it somewhere in Jira maybe, if possible
6:43 am
BrainzBot
9:17 am
mayhem
moooin!
9:17 am
lucifer: done!
9:44 am
reosarevok
Hmm. Seems babel is struggling with flow all of a sudden? for gettext:
9:44 am
9:44 am
yvanzo: ^ seen that one before?
9:49 am
mayhem
10:19 am
reosarevok
yvanzo, bitmap: is there any difference between "delete" and "remove"?
10:28 am
"This place has no relationships and will be removed automatically in the next few days. If this is not intended, please add more data to this place." - it probably should say "add some relationships" rather than just "more data", which is what we already say for works?
10:30 am
If not (since we have "more data" elsewhere where the data can be releases or recordings for example) then we should probably change the work one to be consistent
11:03 am
Maxr1998_ joined the channel
11:04 am
Maxr1998 has quit
11:23 am
bitmap
reosarevok: not in musicbrainz AFAIK. "delete" sounds a bit more permanent, I guess
11:24 am
Pratha-Fish: hey, I'm back
11:24 am
reosarevok
Yeah, wondering because AFAICT we use them weirdly interchangeably
11:25 am
So maybe we should stick to one
11:25 am
(it also still annoys me that some entities have an /add url and some a /create url)
11:25 am
bitmap
so delete remove or remove delete?
11:26 am
not sure which one we use more
11:26 am
reosarevok
relete demove?
11:27 am
We use remove a lot more, so I'd probably drop delete
11:27 am
Or well, it feels like we do, "Remove {entity}" edits et
11:27 am
*etc
11:27 am
Seems we mostly use deleted for editors
11:28 am
But not only
11:28 am
I guess keeping it just for that meaning could make sense, if you say it sounds more permanent?
11:28 am
antlarr joined the channel
11:28 am
Pratha-Fish
11:29 am
bitmap
I think I prefer remove too
11:29 am
Pratha-Fish
I'll get back to you in ~10 minutes, right after I finish lunch. Having some issues comparing areas from the two sources 😫
11:29 am
bitmap
ok :)
11:36 am
Pratha-Fish
hey reosarevok if you have 10 minutes, we can also maybe take up this opportunity to do an overall survey of the whole project?
11:36 am
reosarevok
I do have 10 minutes, yes :) So we can do that
11:36 am
Pratha-Fish
The current issue that I am facing is as follows:
11:36 am
We have pristine data coming from both sources, in almost exactly the same format.
11:36 am
e.g.
11:37 am
(generating, just a sec)
11:38 am
11:41 am
NVM
11:41 am
kellnerd
Trying to demonstrate the issue has solved it? :)
11:41 am
reosarevok
Figured it out? :D
11:41 am
Pratha-Fish
kellnerd: I only wish
11:41 am
Ran into a little bug
11:41 am
Anyway
11:42 am
The structure of the data fetched from musicbrainz is almost the same as well. But the id columns in it use musicbrainz_ids instead of wikidata ids
11:42 am
Soo basically comparing based on ids is not an option
11:42 am
reosarevok
You should be comparing based on the wikidata ids you need to get from the URLs :)
11:43 am
Pratha-Fish
The big idea is, to somehow index areas even with the same names but different subdivisions and countries such that each area generates a unique index for it to be queried
11:43 am
reosarevok: that's exactly what I am trying right now!
11:43 am
Not sure how well it's gonna work tho, but let's hope that ALL entries in musicbrainz have a wikidata id associated with it
11:43 am
*id -> url
11:44 am
reosarevok
99.99% should
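A minimal sketch of the ID extraction discussed above, assuming the Wikidata relationship URLs end in a Q-identifier under `/wiki/` or `/entity/`. The function name `wikidata_id_from_url` is hypothetical and not taken from the project's code.

```python
import re
from typing import Optional

# Matches a Q-identifier at the end of a Wikidata entity URL.
# Assumption: URLs look like https://www.wikidata.org/wiki/Q64
# or http://www.wikidata.org/entity/Q64.
_QID_RE = re.compile(r"wikidata\.org/(?:wiki|entity)/(Q[1-9]\d*)/?$")


def wikidata_id_from_url(url: str) -> Optional[str]:
    """Return the Wikidata ID embedded in `url`, or None if there is none."""
    match = _QID_RE.search(url.strip())
    return match.group(1) if match else None
```

Areas whose URL yields `None` (the rare ones without a usable Wikidata link) can then be set aside instead of being compared by ID.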
11:44 am
Pratha-Fish
As for the second issue, we have some repeating areas going with the same subdivision and country, but different wikidata URLs
11:45 am
lemme fetch an example real quick
11:46 am
Ahh where are the examples when you need them 😫
11:46 am
11:47 am
The big idea with these entries is detecting them.
11:48 am
reosarevok
That does seem to be a full duplicate, yeah
11:48 am
They're probably rare, but I'm sure there will be more
11:49 am
Pratha-Fish
I was just wondering what could we even do about it
11:49 am
reosarevok
I think it seems fine to skip those / log them
11:49 am
If you detect you'd add a second area with the same name and parents at least, you could log it and not do it automatically
11:49 am
Pratha-Fish
I can calculate them based on the city name, but what if they turn out to be cities with same name but different subdivision? or even country?
11:49 am
So I tried generating an index of such areas to filter out such areas, but here's the result
11:50 am
reosarevok
(since if we *do* add it we should actually add a disambiguation anyway so it's better if a human does it)
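One way the "log it and don't add it automatically" suggestion could look, as a sketch only. The row shape (`name`, `subdivision`, `country`, `wikidata_url`) and the function name `split_duplicates` are assumptions made for illustration, not the project's actual data model.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Area = Dict[str, str]  # hypothetical row: name, subdivision, country, wikidata_url


def split_duplicates(areas: List[Area]) -> Tuple[List[Area], List[Area]]:
    """Split rows into (safe to process, flagged for human review).

    Rows are grouped by name + subdivision + country, so cities that share a
    name but have different parents land in different groups. A group is
    flagged when it holds more than one distinct Wikidata URL.
    """
    groups: Dict[Tuple[str, str, str], List[Area]] = defaultdict(list)
    for area in areas:
        key = (
            area["name"].casefold(),
            area["subdivision"].casefold(),
            area["country"].casefold(),
        )
        groups[key].append(area)

    unique: List[Area] = []
    flagged: List[Area] = []
    for rows in groups.values():
        urls = {row["wikidata_url"] for row in rows}
        if len(urls) > 1:
            flagged.extend(rows)  # same name and parents, different entities
        else:
            unique.append(rows[0])
    return unique, flagged
```

The flagged rows would be written to a log for an editor to check, since a real second area with the same name and parents needs a disambiguation anyway.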
11:50 am
Pratha-Fish
11:50 am
reosarevok
Hmm. In MB?
11:50 am
Or in WD?
11:51 am
Pratha-Fish
Its from MB
11:51 am
reosarevok
In MB cities with no parents should be rare
11:51 am
Pratha-Fish
but same story from wiki
11:51 am
Actually wiki could be suffering from some other issue due to my own bad code
11:51 am
11:51 am
reosarevok: let's see how rare
11:52 am
reosarevok
wikidata having a few amount of those seems more likely, you should just ignore them if so
11:52 am
a few amount. me fail english that's unpossible
11:52 am
Pratha-Fish
reosarevok: we have 4885 of those in MB surprisingly
11:52 am
reosarevok
That does not sound right
11:52 am
Examples? :)
11:52 am
Pratha-Fish
on it
11:54 am
Apparently there's also something wrong with my data
11:54 am
Will have to take a moment to get it fixed
11:54 am
reosarevok
That's ok
11:56 am
Pratha-Fish
Ah yes.
11:56 am
I had to reset my dev environment a couple of days ago
11:56 am
So looks like I am missing something in the musicbrainz database
11:57 am
Here's how I've set it up so far:
11:57 am
1. Dropped musicbrainz_db and reimported it using the datadumps generated by bitmap (contained all areas and relations last I checked.)
11:57 am
2. Created the materialized tables as well
11:58 am
But before the reset, my query used to fetch around 700k rows along with parent data. But now it barely fetches 200k with NO PARENT DATA
12:00 pm
bitmap
you sure you imported the second dump and not the first one?
12:00 pm
Pratha-Fish
Yes, I am pretty sure
12:00 pm
But I'll try again just in case
12:02 pm
bitmap
select count(*) from area_containment; -> 320019 in production
12:03 pm
Pratha-Fish
Yep, looks like I don't have area_containment data 🤦‍♂️
12:04 pm
I remember running the materialized table command in musicbrainz docker as well.
12:04 pm
I'll try restarting it
12:04 pm
bitmap
is the table completely empty?
12:05 pm
Pratha-Fish
There was no table detected
12:05 pm
bitmap
lol
12:06 pm
it should always exist in the schema, even if it's empty
12:06 pm
Pratha-Fish
lmao
12:06 pm
are you sure you spelled the table name right?
12:06 pm
BrainzGit
12:06 pm
bitmap
pretty sure
12:07 pm
Pratha-Fish
I'll try dropping and reimporting the dump then ig
12:07 pm
It takes a while tho
12:07 pm
But while it's done, let's discuss the current state of the project ig
12:07 pm
bitmap
where is the table "not detected"? how are you checking for it?
12:07 pm
Pratha-Fish
*being executed
12:07 pm
bitmap
could also be a search_path issue
12:07 pm
Pratha-Fish
ran this in bash `psql -U musicbrainz -h localhost -c "select count(*) from area_containment;" -d musicbrainz`
12:08 pm
12:08 pm
bitmap
is your database name musicbrainz or musicbrainz_db? (you mentioned the latter earlier, but your psql command uses musicbrainz)
12:09 pm
Pratha-Fish
🤦‍♂️
12:09 pm
fixed it and it worked
12:09 pm
reosarevok
Yay
12:09 pm
Pratha-Fish
the count is 320019
12:10 pm
bitmap
π
12:10 pm
Pratha-Fish
Thank god. Those resets take quite a while on my device
12:11 pm
Yayy now my query is fetching parent areas as well!
12:11 pm
reosarevok
Yay
12:12 pm
Pratha-Fish
Give me a sec, I am running some compute to figure out fatherless areas on musicbrainz
12:12 pm
reosarevok
Perfect
12:12 pm
Pratha-Fish cackles at the concept of "fatherless" areas
12:12 pm
No fatherland for you!
12:13 pm
bitmap: "Automatically subscribe me to artists I create", "Batch-create new works", "This user has not created any entities"
12:13 pm
All those sound like it should be "added" to me ...
12:13 pm
"Only the editor who created an edit can cancel it." that sounds like it should be entered :)
12:14 pm
Pratha-Fish
Alright, looks like we still have 2349 fatherless cities in musicbrainz 😫
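A sketch of how "fatherless" areas can be found with an anti-join. It runs against an in-memory SQLite database with a deliberately simplified schema. The `area_containment` column names (`descendant`, `parent`, `depth`) are an assumption about the materialized table, and the sample rows are made up.

```python
import sqlite3
from typing import List


def parentless_area_ids(conn: sqlite3.Connection) -> List[int]:
    """Return ids of areas that never appear as a descendant in area_containment."""
    rows = conn.execute(
        """
        SELECT a.id
        FROM area AS a
        LEFT JOIN area_containment AS ac ON ac.descendant = a.id
        WHERE ac.descendant IS NULL
        ORDER BY a.id
        """
    ).fetchall()
    return [row[0] for row in rows]


def demo() -> List[int]:
    """Build a tiny made-up dataset and list its parentless areas."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE area (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE area_containment (descendant INTEGER, parent INTEGER, depth INTEGER);
        INSERT INTO area VALUES
            (1, 'Some country'),
            (2, 'City with parent'),
            (3, 'City without parent');
        INSERT INTO area_containment VALUES (2, 1, 1);
        """
    )
    try:
        return parentless_area_ids(conn)
    finally:
        conn.close()
```

In this toy data the country itself also comes back as parentless, so a real query would additionally need to filter by area type to count only cities.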
12:14 pm
reosarevok
Ok, examples :)
12:14 pm
bitmap
reosarevok: both your suggestions sgtm
12:14 pm
reosarevok
I think I might look over "create" strings next then
12:14 pm
petitminion joined the channel
12:15 pm
Pratha-Fish: I'm around for 15 min more max, and then in a few hours btw :)
12:15 pm
Pratha-Fish
Alright, trying to make it as fast as possible
12:15 pm
reosarevok
It's fine later too :)
12:16 pm
But let's see
12:16 pm
Check the GID, open it in MB, see if it seems correct
12:16 pm
monkey
12:27 pm
petitminion has quit
12:27 pm
Pratha-Fish