Ooh that is really nice though. Great use of the whole screen
I would like the opportunity to feed back on the details at some point if possible. Overall the usability looks great, how I would use it anyway. Some details could be workshopped a bit
Non-design feedback: As a user what would make this perfect is if I can go back and forth in time, and also filter by tags
I'm sure that will be easy 😁
mayhem
mooin!
aerozol: yes, back and forth in time is exactly the idea!
lucifer
monkey: hi! when you have some time, can you please look into how we could make something like the slider in the mockup above?
lucifer: the branch spotify-release-index has the first attempt at the inverted index. except it takes hours to run the query -- my machine went to sleep after 4 or 5 hours or so, before it completed.
if you feel like checking whether I'm screwing up my query, take a look at mbid_mapping/mapping/spotify_metata_index.py
lucifer
i cannot see the branch on github yet, is it pushed?
mayhem
otherwise I'll get back on it later next week.
sorry, fixed.
lucifer
thanks, will look into it. to confirm the index did build but took a long time?
mayhem
it never finished.
lucifer
ah ok
mayhem
on gaga there is a subset tmp_sp_metadata table that it does complete on.
sorry, rushing off to a conference this morning. not fully present
lucifer
yes there now, thanks!
mayhem
first conference in... how long?
I wonder how long before I walk out in disgust.
lucifer
hehe lol
ansh
lucifer: I found that every time we create a new draft review, or update a draft review, the avg_ratings table gets updated. This shouldn't happen, so I've fixed that too. After this PR is merged, we need to run 2 queries in our main database: 1. to delete older draft revisions, 2. to delete the avg_ratings for the draft reviews.
alastairp
morning
lucifer
ansh: i see, makes sense. please add a comment on the PR so that we don't forget about this later.
alastairp
ansh: good catch, so we should only update it when something is published or edited?
ansh
yes
So whenever a draft review is published, the ratings get updated
Pratha-Fish
alastairp: Hi there
alastairp
hi Pratha-Fish, how are you?
Pratha-Fish
I'm doing fine, just got sidetracked for a few days due to personal issues.
I am back on track now tho
Working on the cleanup script RN
alastairp
ok great
do you have a draft version of it online that we can look at?
Pratha-Fish
I've not committed any changes yet, but I can do it RN
alastairp
yes please
Pratha-Fish
Currently I am writing it all in a jupyter notebook, but the structure of the script is there
alastairp: I've pushed the changes on the repo. Please refer to the notebook named "test_clean_master.ipynb"
CatQuest
hi Pratha-Fish !
🐟
Pratha-Fish
CatQuest: Hi!🐟
mayhem
Zas, alastairp , monkey, Freso : invoices plz
lucifer
mayhem: the query is sane. just too much data, and turning jsonb into columns on the fly. asked on the pg channel as well; suggestions range from using a materialized view, which can improve speed by not bringing data from pg to python, but the query itself will still remain slow iiuc. we'll probably need to normalize the data to some extent to make this performant.
mayhem
Yeah, given that we need to rerun the query at least on a weekly basis, yes we need to.
Would making the key jsonb columns into real columns help this?
Or maybe we need to create artist/album/track tables for easier querying
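A sketch of what that normalization might look like: flattening an album's jsonb blob into separate album/artist/track rows so queries hit real columns instead of unpacking jsonb on the fly. Field names here are guesses based on Spotify's album object shape, not the actual table layout:

```python
# Flatten a parsed Spotify album JSON object (as stored in a jsonb column)
# into rows for hypothetical album / artist / track tables.
# Field names ("id", "name", "artists", "tracks"/"items", etc.) are
# illustrative assumptions, not the real schema.

def flatten_album(album: dict):
    album_row = (album["id"], album["name"], album.get("release_date"))
    # one row per credited artist, linked back to the album
    artist_rows = [(a["id"], a["name"], album["id"]) for a in album["artists"]]
    # one row per track, keeping the track number for ordering
    track_rows = [
        (t["id"], t["name"], t["track_number"], album["id"])
        for t in album["tracks"]["items"]
    ]
    return album_row, artist_rows, track_rows
```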
Pratha-Fish
alastairp: Is there a table that I can query using recording_MBID that gives an artist_credit_list as well as release_MBID?
alastairp
Pratha-Fish: that sounds like the previous query that we had to get the artist mbids
Pratha-Fish
alastairp: yes, but apparently I lost the query somewhere 🥲
alastairp
the `recording.artist_credit` field maps to `artist_credit.id` then `artist_credit_name.artist_credit` links to `artist_credit.id` and `artist_credit_name.artist` links to `artist.id` (from where you can get the `artist.gid`)
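The join chain described here can be sketched end-to-end. A toy sqlite version (tiny mock schema and made-up rows, not the real MusicBrainz data) just to show how the joins line up:

```python
import sqlite3

# Mock of the MB tables in the join chain:
# recording -> artist_credit -> artist_credit_name -> artist
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artist (id INTEGER PRIMARY KEY, gid TEXT, name TEXT);
CREATE TABLE artist_credit (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE artist_credit_name (artist_credit INTEGER, artist INTEGER, position INTEGER);
CREATE TABLE recording (id INTEGER PRIMARY KEY, gid TEXT, name TEXT, artist_credit INTEGER);

INSERT INTO artist VALUES (1, 'aaaa-1111', 'Artist A'), (2, 'bbbb-2222', 'Artist B');
INSERT INTO artist_credit VALUES (10, 'Artist A & Artist B');
INSERT INTO artist_credit_name VALUES (10, 1, 0), (10, 2, 1);
INSERT INTO recording VALUES (100, 'rec-gid-1', 'Some Song', 10);
""")

# One row per (recording, credited artist) pair, artists in credit order.
rows = conn.execute("""
    SELECT r.gid, a.gid
      FROM recording r
      JOIN artist_credit ac ON r.artist_credit = ac.id
      JOIN artist_credit_name acn ON acn.artist_credit = ac.id
      JOIN artist a ON acn.artist = a.id
     ORDER BY acn.position
""").fetchall()
```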
Pratha-Fish
I tried exploring existing queries, but none of them yielded the expected results
alastairp
give it a go to write a query, and let me know if you have any troubles
Pratha-Fish
sure. I am learning DBMS this semester as well. Shouldn't be too hard
lucifer
mayhem: yeah, i think a track level table would be a good start.
alastairp
Pratha-Fish: for release mbid, you can use `mapping.canonical_musicbrainz_data.recording_mbid`
Pratha-Fish
`...on ac.id=artist_credit;` seems to be working except for the release MBID part
lucifer
i'll look into how spotify treats multiple releases of an album to see if it makes sense.
alastairp
Pratha-Fish: yes, right. however note that `ac.gid` is the mbid of the _artist credit_, not the mbids of the artists that make up that artist credit
Pratha-Fish
alastairp: right. I noticed that too. Looks like mapping.canonical_musicbrainz_data is the best bet here
lucifer
note that that table only has canonical recordings data not all recordings.
Pratha-Fish
lucifer: I was about to ask that same question lol
mayhem
lucifer: ordering by album_id to ensure that albums are processed in order
Pratha-Fish
Also, should it be a problem, considering that we'll be fetching canonical rec mbids for all possible recordings anyway?
lucifer
mayhem: oh duh, right. makes sense. i forgot to look at the python code.
alastairp
Pratha-Fish: ah right, canonical_musicbrainz_data also has a list of artist_mbids, perfect
Pratha-Fish
alastairp: But as lucifer pointed out, it only has canonical MBIDs. We'll be cleaning every rec_mbid to find its canonical_mbid anyway, but in case there's recording_MBIDs that work just fine, but for some reason don't have a canonical_MBID, we could be restricting the data too much.
monkey
chinmay: (cc lucifer) For the slider on the right of the fresh release page, we could very simply use an HTML slider and customize it with CSS. It's going to be the easiest and lightest method: https://blog.hubspot.com/website/html-slider
alastairp
lucifer: isn't an mbid which only has 1 recording the canonical mbid of itself?
lucifer
alastairp: yes. and there's two ways to check canonical recordings in our tables. 1) exists in canonical_musicbrainz_data 2) does not exist in canonical_recording_redirect table.
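A toy illustration of the two checks with made-up mbids (as noted later in the discussion, standalone recordings are an edge case where the two checks can disagree, since they appear in neither table):

```python
# Two equivalent ways to test whether a recording mbid is canonical,
# modelled as plain sets. mbid-c redirects to some canonical mbid.
canonical_musicbrainz_data = {"mbid-a", "mbid-b"}   # canonical rows
canonical_recording_redirect = {"mbid-c"}           # non-canonical -> canonical keys
all_recordings = {"mbid-a", "mbid-b", "mbid-c"}

def is_canonical_v1(mbid):
    # 1) exists in canonical_musicbrainz_data
    return mbid in canonical_musicbrainz_data

def is_canonical_v2(mbid):
    # 2) does not exist in canonical_recording_redirect
    return mbid not in canonical_recording_redirect

# For non-standalone recordings the two checks agree.
assert all(is_canonical_v1(m) == is_canonical_v2(m) for m in all_recordings)
```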
alastairp
there are 27.8m recordings, 21.5m canonical_musicbrainz_datas and 5.9m recording redirects
that pretty much adds up to the same number (modulo a bit of rounding)
lucifer
monkey: oh duh, makes sense, thanks.
alastairp: yes, give or take 100k for standalone recordings;
alastairp
lucifer: so I expect then that every recording in MB should have an entry in canonical_musicbrainz_data?
lucifer
alastairp: you mean after redirecting?
alastairp
if it's in musicbrainz_data, and isn't in recording_redirect, then that's fine. that's still the metadata for that recording mbid
monkey
There's some subtlety as to what action we take when the user changes the input (in particular we probably want a good debounce function to reduce the number of times we update the page) but it's all fairly easily figureoutable
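On the frontend this would likely be lodash's `debounce` or a hand-rolled JS equivalent; the same trailing-edge idea sketched in Python, just to pin down the behaviour:

```python
import threading

def debounce(wait):
    """Decorator: run fn only after `wait` seconds pass with no new calls
    (trailing edge) -- each call cancels the previously scheduled one."""
    def decorator(fn):
        timer = None
        def wrapped(*args, **kwargs):
            nonlocal timer
            if timer is not None:
                timer.cancel()
            timer = threading.Timer(wait, fn, args, kwargs)
            timer.start()
        return wrapped
    return decorator
```

So a burst of slider-change events results in a single page update once the user stops dragging.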
alastairp
lucifer: no, not after redirecting. I mean for items which don't have a redirect
monkey
Do we have the entire list of fresh releases when we load the page?
lucifer
alastairp: ah ok, yes.
monkey: yes.
monkey
OK, even easier then
We can use the percentage value from the slider to display the right slice of results.
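Mapping the slider percentage to a slice could look something like this (the window size and the clamping behaviour are assumptions, purely to illustrate the idea):

```python
def slice_for_percentage(releases, pct, window=25):
    """Map a slider value (0-100) to a window of the fresh-releases list.
    pct=0 shows the start of the list, pct=100 the end."""
    if not releases:
        return []
    start = round((len(releases) - window) * pct / 100)
    # clamp so the window never runs off either end of the list
    start = max(0, min(start, max(len(releases) - window, 0)))
    return releases[start:start + window]
```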
alastairp
Pratha-Fish: it looks like it's fine, you should have all of the required metadata in that table
Pratha-Fish
alastairp: Great!
And either way, we'll be finding redirects and canonical IDs for all IDs, so we can check the few remaining outliers manually to see what we're missing
mayhem
First mention of Blockchain and NFTs. Sigh
lucifer
alastairp: 2 catches which probably don't matter, but fyi: 1) standalone recordings -- no canonical table has those, but they're a small proportion anyway. 2) there will always be some canonical recording data in canonical_musicbrainz_data for every canonical mbid, but the canonical mbids themselves are not stable across runs; generating them twice will probably yield different results each time.
mayhem
Huh. 70% of bandcamp users are looking for new music. A stark contrast to the rest of the industry.
monkey
Indeed
alastairp
lucifer: I think we found that the proportion of standalone recordings in mlhd was small enough that we can probably ignore them for this? I think it'll be fine
lucifer: the non-stable across runs is because we might have 2 things with all the same fields, and so there is no explicit order?
lucifer
yes to both.
alastairp
I think that's OK - perhaps if possible we should add a tie breaker on recording.gid or recording.id anyway to make it stable, but perhaps not a huge issue either
lucifer
the difference is in medium or track number which our ordering does not consider.
makes sense
alastairp
just some thinking out loud - I think it makes sense that if we declare something as "canonical", then over multiple runs that should stay stable. I don't mind if something becomes "more canonical" based on more data being added to the mb database, that's kinda normal
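The tie-breaker idea, sketched with a stand-in `score` field in place of whatever ranking columns the real query orders by: when two candidates compare equal on every ranking field, falling back to the recording gid makes repeated runs pick the same canonical row.

```python
# Two candidate recordings that tie on the ranking field(s).
# "score" is a hypothetical stand-in for the real ordering columns.
candidates = [
    {"score": 5, "gid": "bbbb"},
    {"score": 5, "gid": "aaaa"},
]

# Sort key: best score first, then gid as a deterministic tie-breaker.
canonical = min(candidates, key=lambda r: (-r["score"], r["gid"]))
assert canonical["gid"] == "aaaa"   # stable regardless of input order
```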
atj
mayhem: 99% sure the address on the wiki for AirBNB #1 is incorrect and the address on the booking page is correct.
one of the reviews mentions that a restaurant called Xines Nord is across the street, which tallies with the address on the booking page.
mayhem
Yes, that is finally clear. Please update the wiki.
ansh: i can look into it in some time, but on a very quick scan you probably want at least one of the histograms in the DO UPDATE to be excluded.histogram
remember that the new data will be in excluded.histogram and the old one in just histogram.
ansh
Oh understood. But I am getting this error on line 11 of this paste. "VALUES ('42ca4d72-41cd-4874-aedf-8ff6bb2c18d2', 'release_group', 80, 1, jsonb_set(histogram, '1' , (COALESCE(histogram->>'2','0')::int + 1)::text::jsonb))"
lucifer
ah right. you cannot use the columns of a table in a values clause like that.
ansh
I think it is because, since I am adding a new row, there is no previous value to update, hence the error.
lucifer
just always insert a new histogram in values and let do update handle the case of existing rows
ansh
okay, I'll try that
lucifer
if you really need the old row in values, we can use a CTE to get that row before the insert, but so far i don't think it should be needed.
ansh
Can we set the value for this column from the default value `{"1" : 0, "2" : 0, "3" : 0, "4" : 0, "5" : 0}` to `{"1" : 0, "2" : 1, "3" : 0, "4" : 0, "5" : 0}`?
while adding a new row without conflict ?
lucifer
sure instead of using histogram in values, hardcode the default.
ansh
No I mean, I have set the default to `{"1" : 0, "2" : 0, "3" : 0, "4" : 0, "5" : 0}`. Now whenever I add a new row, I can just update `2` to 1 ?
* the value of key "2" to 1
lucifer
ansh: iiuc you set this as the default value for the column in create table. in that case, not sure, but probably no, you can't do that. however you can directly put this value in the jsonb_set call and that should work.
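lucifer's suggestion -- always insert a fresh histogram in VALUES and let DO UPDATE increment the existing one -- simulated with plain dicts (bucket keys and the default shape are taken from the discussion above; the table key is a made-up illustration):

```python
# Default histogram shape from the discussion: one bucket per rating 1-5.
DEFAULT = {"1": 0, "2": 0, "3": 0, "4": 0, "5": 0}

def upsert_rating(table, key, rating):
    """Simulate INSERT ... ON CONFLICT DO UPDATE for the histogram column:
    the VALUES side always carries a fresh histogram with the new rating
    bumped; on conflict, the stored histogram's bucket is incremented."""
    if key not in table:
        new_hist = dict(DEFAULT)      # hardcoded default, per lucifer
        new_hist[rating] = 1          # the VALUES row, no conflict
        table[key] = new_hist
    else:
        table[key][rating] += 1       # the DO UPDATE path
    return table[key]
```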
Pratha-Fish
alastairp: I checked out what would happen if we straightup replace all artist_MBIDs and release_MBIDs from MLHD with the ones we fetch based upon cleaned recording_MBIDs. Here's the results:
% Coverage of recording MBIDs: w/ MLHD: 0.87
% Coverage of recording MBIDs: w/ cleaned recording_MBID: 0.97
% Coverage of release MBIDs: w/ MLHD: 0.87
% Coverage of release MBIDs: w/ cleaned recording_MBID: 0.79