ruaok: not sure i understand what's wrong with those tracks. acousticbrainz.org/b58da12b-3182-4afc-b5ff-7646... shows 10 submissions, 9 of which are 185 bpm and the dataset hoster shows 184 so checks out. am i missing something here?
those outlier peaks on the right side of the graphs? I'd guess those are wrong.
(through no fault of your own)
yvanzo
O’Moin
ruaok
moin, yvanzo!
lucifer: it might be a good idea to pull up some of the tracks that make up monkey's right-hand spike and see if those tracks are all correct. I rather doubt it.
so I'm thinking out loud - there are a few fields on the bpm histogram that might be useful. e.g. I'm looking at `bpm_histogram_first_peak_weight`; given the name of the field, it might indicate how "strong" a bpm estimate is
monkey
Maybe the alg didn't get fed enough African rhythms
ruaok
lucifer: thanks. I'll turn that into a playlist for inspection in a bit.
alastairp
in which case we could remove items which have less certainty
I think monkey might be on to something too - I suspect it's pretty good for pop/rockish songs, so maybe we could filter out there as well
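[editor's note: a minimal sketch of the certainty filter proposed above, dropping submissions whose `bpm_histogram_first_peak_weight` is low. The threshold and the dict layout are illustrative assumptions, not AB's actual schema or a tuned value.]

```python
def filter_confident(submissions, min_weight=0.5):
    """Keep only submissions whose first histogram peak carries
    enough weight to trust the reported BPM. min_weight is an
    illustrative threshold, not a tuned value."""
    return [s for s in submissions
            if s.get("bpm_histogram_first_peak_weight", 0.0) >= min_weight]

subs = [
    {"bpm": 185.0, "bpm_histogram_first_peak_weight": 0.82},
    {"bpm": 123.0, "bpm_histogram_first_peak_weight": 0.31},
]
print(filter_confident(subs))  # only the 185.0 submission survives
```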
ruaok
also reminds me of the problem where the alg might pick the wrong range... something about BPM being twice or half of the true value...
alastairp
as a very basic filter, _maybe_ we could assume that a fast track is also loud?
yes - exactly that. sometimes it might pick a value twice or half, due to misidentifying the peaks
ruaok
ok, so a possible approach is to identify these cases and then adjust BPM?
note that the bpm was identified as 123 once and 185 the other 9 times
alastairp
this is one of the reasons why we return the 1st and 2nd histogram peak, too. it's possible that we could ignore items where these 2 peaks are close (and therefore the algorithm is uncertain about which one to choose)
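[editor's note: one reading of "these 2 peaks are close" is that the second peak's weight rivals the first's. A sketch under that assumption — field semantics are guessed from the names, and the ratio is arbitrary:]

```python
def is_ambiguous(first_peak_weight: float, second_peak_weight: float,
                 ratio: float = 0.8) -> bool:
    """True when the second histogram peak carries almost as much
    weight as the first, i.e. the extractor had two nearly equally
    plausible tempos and its choice shouldn't be trusted."""
    if first_peak_weight <= 0:
        return True
    return second_peak_weight / first_peak_weight >= ratio

print(is_ambiguous(0.8, 0.75))  # True: peaks nearly tied
print(is_ambiguous(0.8, 0.2))   # False: clear winner
```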
ruaok
it is clearly not 123 either.
alastairp
lucifer: are you doing any filtering/processing on this data?
lucifer
nope
ruaok
185 / 2, quite possible.
monkey
Maybe it'll be worth doing some sorting by mood instead of BPM?
lucifer
actually yes, alastairp. selecting the bpm which is closest to the mean of all bpms of that recording mbid.
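[editor's note: the selection rule described above — pick the submitted BPM closest to the mean across all submissions for a recording MBID — sketched out; the function name is invented:]

```python
from statistics import mean

def pick_bpm(bpms):
    """From all submissions for one recording MBID, pick the BPM
    closest to the mean of the submitted values."""
    m = mean(bpms)
    return min(bpms, key=lambda b: abs(b - m))

# Nine submissions at 185 and one at 123 pull the mean to 178.8,
# so the 185 cluster wins.
print(pick_bpm([185] * 9 + [123]))  # 185
```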
ruaok
mood is a much higher level of data, thus even more unreliable, monkey
I trust nothing of the high level data in AB.
alastairp
the cantelows example - I could imagine that it's finding the high plucked notes as "beats", and therefore miscalculating the BPM
monkey
I mean, BPM doesn't look super reliable or appropriate for what we're doing, but strong confidence in Aggressive/Not Aggressive might make for a better sorting
alastairp
lucifer: note that the ?n=0 submission has 123 as the first peak and 185 as the 2nd peak
monkey
That was mostly what the rollercoaster effect was for me. Calm song followed by an aggressive one.
lucifer
alastairp: currently i am ignoring those fields just using bpm field. i have those peak fields available as well in the dump if we want to try some stuff out.
ruaok
lucifer: any chance you could make a dump of the mood as you did for BPM/key?
lucifer
ruaok: a nicer version to import to bono for playlist
sure, can do but it'll probably take a long time. took 2 days to dump bpm/key.
ruaok
shit.
lucifer
maybe if i dump on frank it'll be faster? saving network trips to kiss.
ruaok
ok, then I'll try fetching from AB for testing.
lucifer
yeah that sounds better. if something pans out, we can do a full dump.
alastairp
dortmund and rosamerica are definitely "better" genres, but they have very few categories, we'd be better off finishing the genre import and using tags instead
well, we could always dump moods + weights at the same time
lucifer
yup that's possible
alastairp
(although moods require a few joins into separate tables)
lucifer: yes, I'd dump directly on frank
ruaok
could we create a dataset hoster that gives access to moods given a list of MBIDs?
lucifer
ab image doesn't have psql so i dumped through lb-web. i'll start a temporary postgres container on frank this time then.
ruaok
meaning that the dataset hoster queries frank and takes care of the picking of the instance of the MBID
lucifer
bono has a subset of ab db fwiw.
this data is accessible through ab api so we could just use that.
alastairp
yes to both of those - we can prototype it on bono and directly connect to db, if it works, release ds hoster on kiss connecting to frankdb
lucifer
and opt the bono ip out of ratelimit if that becomes an issue.
ruaok
I'll work on the AB api for now -- if that shows promise we can expand on that.
but first let me make a playlist from those MBIDs
alastairp
that being said, we already have the bulk get specific (ll) feature API for AB, which is basically that. not sure why we never finished the hl version of this
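[editor's note: for the fetch-from-AB route, a sketch of querying the bulk low-level endpoint mentioned here. The URL shape and the response layout (MBID → submission offset → document) follow my reading of the AB API docs; verify against them before relying on this.]

```python
def bulk_lowlevel_url(mbids, base="https://acousticbrainz.org"):
    """Build a bulk low-level request URL; recording MBIDs are
    separated by semicolons in the recording_ids parameter."""
    return f"{base}/api/v1/low-level?recording_ids=" + ";".join(mbids)

def extract_bpms(response_json):
    """Pull rhythm.bpm out of each returned document. Each MBID maps
    to one or more numbered submissions (offsets)."""
    return {(mbid, offset): doc["rhythm"]["bpm"]
            for mbid, offsets in response_json.items()
            for offset, doc in offsets.items()}

# Offline demo with a canned response (no network call):
fake = {"some-mbid": {"0": {"rhythm": {"bpm": 185.0}},
                      "1": {"rhythm": {"bpm": 92.5}}}}
print(extract_bpms(fake))
```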
lucifer: I sometimes use docker exec on the pg instance on frank to get a psql shell
lucifer
wondering if pg_dump could be faster than \copy, alastairp
that is, pg_dump the whole table, transfer it to michael, and bring up a pg instance there. import the dump and let spark connect to the pg instance directly.
alastairp
ah, right. I'm not sure where the current slowdown is - is it due to getting a field from the json? or was it slow last time because of the round trip between two different servers?
if you're after a pg dump
frank /home/alastair/acousticbrainz-pgdump-2021-06-03.pgdump
lucifer
ah nice.
the size of pgdumps is too large so probably not a good idea to do this.
alastairp
so, let's try your previous dump, including the weights, directly from frank and see if it's any faster