In the Figma file, second frame, first page: top artists of 2022
2023-01-02 00213, 2023
jasje
top albums of 2022
2023-01-02 00245, 2023
lucifer
jasje: you can construct the url for the image this way: `https://archive.org/download/mbid-{caa_release_mbid}/mbid-{caa_release_mbid}-{caa_id}_thumb500.jpg`
2023-01-02 00219, 2023
lucifer
items in the top album list have the caa_id and caa_release_mbid fields present and not null if cover art is available.
2023-01-02 00230, 2023
lucifer
if the fields are missing, then no cover art is available.
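A minimal sketch (Python, hypothetical helper rather than actual ListenBrainz code) of building that thumbnail URL from a top-albums item, following lucifer's description above:

```python
from typing import Optional

def cover_art_url(item: dict) -> Optional[str]:
    """Build the 500px Cover Art Archive thumbnail URL for a top-albums item."""
    caa_id = item.get("caa_id")
    caa_release_mbid = item.get("caa_release_mbid")
    # If either field is missing or null, no cover art is available.
    if not caa_id or not caa_release_mbid:
        return None
    return (
        f"https://archive.org/download/mbid-{caa_release_mbid}/"
        f"mbid-{caa_release_mbid}-{caa_id}_thumb500.jpg"
    )
```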
[musicbrainz-android] dependabot[bot] opened pull request #175 (master…dependabot/gradle/org.jetbrains.kotlin-kotlin-gradle-plugin-1.8.0): Bump kotlin-gradle-plugin from 1.7.10 to 1.8.0 https://github.com/metabrainz/musicbrainz-android…
2023-01-02 00245, 2023
BrainzGit
[musicbrainz-android] dependabot[bot] closed pull request #163 (master…dependabot/gradle/org.jetbrains.kotlin-kotlin-gradle-plugin-1.7.22): Bump kotlin-gradle-plugin from 1.7.10 to 1.7.22 https://github.com/metabrainz/musicbrainz-android…
2023-01-02 00253, 2023
BrainzGit
[musicbrainz-android] dependabot[bot] opened pull request #176 (master…dependabot/gradle/com.squareup.okhttp3-mockwebserver-5.0.0-alpha.11): Bump mockwebserver from 5.0.0-alpha.7 to 5.0.0-alpha.11 https://github.com/metabrainz/musicbrainz-android…
2023-01-02 00229, 2023
Toasty joined the channel
2023-01-02 00221, 2023
Pratha-Fish
Hi alastairp, Hope you had a great holiday :)
2023-01-02 00244, 2023
alastairp
hi Pratha-Fish, how are you?
2023-01-02 00255, 2023
alastairp
I had a busy break, but it was fulfilling
2023-01-02 00228, 2023
Pratha-Fish
These breaks never last long enough 🥲
2023-01-02 00256, 2023
Pratha-Fish
Even my college was going to open back around ~20th Jan, but it looks like they changed their mind and started it again from today itself
2023-01-02 00207, 2023
alastairp
your college is so weird
2023-01-02 00225, 2023
Pratha-Fish
alastairp: You have seen nothing yet 💀
2023-01-02 00234, 2023
alastairp
if mine changed anything from what was agreed 2 years ago, the unions would shut everything down
2023-01-02 00222, 2023
Pratha-Fish
Well, you went to a pretty good college. My college campus itself looks like something out of Far Cry 3... With gangs and stuff
2023-01-02 00245, 2023
alastairp
😬
2023-01-02 00200, 2023
Pratha-Fish
Thankfully some good friends make college life manageable haha
2023-01-02 00233, 2023
Pratha-Fish
But anyway, hopefully I'll be taking more days off ahead, so work shouldn't be much of a problem
Sorry you had to go through all that effort even though I had 4 months to see the work through 🥲
2023-01-02 00244, 2023
Toasty has quit
2023-01-02 00255, 2023
alastairp
Pratha-Fish: yes, right. loading some more data from musicbrainz, and then being very detailed about how we treated each bit of data and what we do with it in each case
2023-01-02 00224, 2023
Pratha-Fish
Hmm
2023-01-02 00229, 2023
alastairp
Pratha-Fish: you would be surprised... remember that 90% of what I wrote in this change was easy for me because 1) I've done this kind of thing for 15 years, or 2) you did all of the heavy lifting to answer all of our unknown questions about the data
2023-01-02 00242, 2023
alastairp
I'm still really happy with where we got to
2023-01-02 00235, 2023
Pratha-Fish
Well that was one hell of a learning curve haha. I still don't know how we got to the end in the first place lol
2023-01-02 00245, 2023
alastairp
the key was to work out which bits of data your conversion code had left behind; I tried to discuss this carefully in the comments in the process_df_new function
2023-01-02 00225, 2023
Pratha-Fish
Thanks for the comments, they have been pretty helpful
2023-01-02 00239, 2023
Pratha-Fish
I'll try to leave more along the way too
2023-01-02 00255, 2023
Pratha-Fish
alastairp: So can you give me a brief overview of the new changes? And what steps we need to take next
2023-01-02 00239, 2023
Pratha-Fish
^ Whenever you're free that is
2023-01-02 00210, 2023
alastairp
Pratha-Fish: your code worked well for items which had a recording mbid, and for which the recording mbid was valid
2023-01-02 00227, 2023
alastairp
so for recordings we go through our steps: look up if there is a redirect and replace it if necessary; look up if there is a canonical id and replace it if necessary; and then look up artist and release information
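A rough sketch of those three steps, assuming the redirect, canonical-id, and recording-info mappings are available as plain dicts (the names here are hypothetical, not the actual ListenBrainz/MLHD code):

```python
def resolve_recording(recording_mbid, redirects, canonical_ids, recording_info):
    # 1) follow a redirect if the MBID has been merged into another recording
    recording_mbid = redirects.get(recording_mbid, recording_mbid)
    # 2) replace it with the canonical recording MBID if one exists
    recording_mbid = canonical_ids.get(recording_mbid, recording_mbid)
    # 3) look up artist and release information for the final MBID
    info = recording_info.get(recording_mbid)
    if info is None:
        return None  # unknown recording; fall back to the release/artist cases
    return {"recording_mbid": recording_mbid, **info}
```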
2023-01-02 00217, 2023
alastairp
there's one pending item that came up in my testing here where we have what are called "non album tracks" in musicbrainz - that is, a recording with no related album. It turns out that there were quite a few of these, and I think it's due to bad data in the mlhd; we need to come up with a better way of looking these up
2023-01-02 00203, 2023
alastairp
Then I was looking at the case of "what if there is an artist and release id, but no recording?" - you had already considered this a bit (your keep_missing, turn_blank parameters)
2023-01-02 00246, 2023
alastairp
when I started looking at the data in detail, a few things became clear to me. first, we were talking about having 2 datasets, one with only rows that have all columns (artist, release, recording), and one that has all rows from mlhd (without throwing away bad data)
2023-01-02 00222, 2023
alastairp
I realised that we could actually make a single dataset that contains both of these, by making one set of files with the "all column" data, and another set of files with the same filenames in a separate directory containing _only_ the incomplete rows. This means if you need only all-column data, you read just 1 file, and if you want all rows, you read 2 files and merge the rows together
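For illustration, a sketch of reading that layout back, assuming Parquet files with identical names under hypothetical "complete/" and "incomplete/" directories:

```python
import os
import pandas as pd

def load_listens(base_dir: str, filename: str, include_incomplete: bool = False) -> pd.DataFrame:
    # Rows that have artist, release, and recording MBIDs.
    complete = pd.read_parquet(os.path.join(base_dir, "complete", filename))
    if not include_incomplete:
        return complete
    # The same filename in the other directory holds only the incomplete rows,
    # so concatenating the two gives back every row from the original file.
    incomplete = pd.read_parquet(os.path.join(base_dir, "incomplete", filename))
    return pd.concat([complete, incomplete], ignore_index=True)
```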
2023-01-02 00223, 2023
Pratha-Fish
"non album tracks" that's a new one
2023-01-02 00246, 2023
alastairp
this is great because it means we don't need to make the dataset 2x as big to get all of the necessary data
2023-01-02 00203, 2023
Pratha-Fish
What do you mean by "column data" here?
2023-01-02 00229, 2023
alastairp
I mean the columns from the mlhd: timestamp, artist id, release id, recording id
2023-01-02 00247, 2023
Pratha-Fish
Ah I see
2023-01-02 00258, 2023
alastairp
so I mean "rows for which we have an artist id, a release id, and a recording id"
2023-01-02 00207, 2023
Pratha-Fish
And while reading this, something just popped up in my mind
2023-01-02 00229, 2023
alastairp
because there are a bunch of rows with only an artist and release id, and sometimes there are rows with only a timestamp (we know someone listened to something at this time, but there's no record of what it was)
2023-01-02 00224, 2023
Pratha-Fish
Given that people tend to listen to the same songs again and again, we can just make a dataset of unique rows, and then couple it with another dataset with just the row_ID and a list of timestamps at which the track was listened to
2023-01-02 00208, 2023
alastairp
yes, possibly. there are actually databases that are designed to consider/store data in this alternate format. I'm not sure how much space it would save, but we can definitely try it and find out
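As an illustration of the idea, a pandas sketch that splits a listens DataFrame into a unique-tracks table and a row_id → timestamps table (the column names are assumed):

```python
import pandas as pd

def split_unique_and_timestamps(df: pd.DataFrame):
    key_cols = ["artist_mbid", "release_mbid", "recording_mbid"]
    # One row per distinct track, keyed by a synthetic row_id.
    unique = df[key_cols].drop_duplicates().reset_index(drop=True)
    unique["row_id"] = unique.index
    # Map every listen back to its row_id and collect the listen timestamps.
    timestamps = (
        df.merge(unique, on=key_cols)
          .groupby("row_id")["timestamp"]
          .apply(list)
          .reset_index()
    )
    return unique, timestamps
```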
2023-01-02 00255, 2023
Pratha-Fish
Yes exactly
2023-01-02 00222, 2023
Pratha-Fish
I came across another last.fm dataset too a while ago, and they seemed to distribute the data this way
2023-01-02 00211, 2023
alastairp
the other thing that came up was some of our previous questions about recordings, but in this case applied to releases. So, I did the same process - 1) perform a redirect lookup, 2) find a canonical release id (this is a new dataset that I made only a month ago), 3) find the artist of the release
2023-01-02 00226, 2023
alastairp
this brought up another question about whether the release id is actually the correct field to use - we realised that in probably 99% of the cases where people want to use this dataset, they're really just interested in knowing the general concept of "what album did someone listen to", not the specific version/format/year that it was released in
2023-01-02 00245, 2023
alastairp
in this case, the release group id is a better choice. so we need to work out how to add this to the dataset
2023-01-02 00241, 2023
Pratha-Fish
That definitely sounds a lot better
2023-01-02 00242, 2023
Pratha-Fish
Also, with our previous code, we were clearly missing out on some huge chunks of data where the recording MBID wasn't present, but the release MBID or artist MBID was
2023-01-02 00205, 2023
Pratha-Fish
And I guess we can also derive artist MBIDs from release MBIDs too
2023-01-02 00231, 2023
alastairp
right, so I addressed that in the second and third parts of the process_df_new function
2023-01-02 00201, 2023
alastairp
see that there is an 'if recording' check, and then if things are successful there is a 'continue', otherwise it falls through to the 'if release' and 'if artist' cases
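Schematically (not the actual process_df_new code; the lookup dicts and helper below are hypothetical), the fall-through looks like:

```python
def classify_rows(rows, recording_info, release_info, artist_info):
    complete, incomplete = [], []
    for row in rows:  # each row: dict with timestamp + artist/release/recording MBIDs
        rec = recording_info.get(row.get("recording_mbid"))
        if rec is not None:
            complete.append({**row, **rec})
            continue  # fully resolved, skip the fallback cases
        rel = release_info.get(row.get("release_mbid"))
        if rel is not None:
            incomplete.append({**row, **rel})
            continue
        art = artist_info.get(row.get("artist_mbid"))
        incomplete.append({**row, **art} if art is not None else row)  # may be timestamp only
    return complete, incomplete
```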
2023-01-02 00236, 2023
Pratha-Fish
Great!
2023-01-02 00254, 2023
Pratha-Fish
It makes me wonder, how's the processing time looking as of now?
2023-01-02 00214, 2023
alastairp
I don't really know - I think slower than your original code, but not by much