ruaok: not sure i understand what's wrong with those tracks. acousticbrainz.org/b58da12b-3182-4afc-b5ff-764687… shows 10 submissions, 9 of which are 185 bpm and the dataset hoster shows 184 so checks out. am i missing something here?
those outlier peaks on the right side of the graphs? I'd guess those are wrong.
2021-12-01 33542
ruaok
(through no fault of your own)
2021-12-01 33532
yvanzo
O’Moin
2021-12-01 33545
ruaok
moin, yvanzo!
2021-12-01 33529
ruaok
lucifer: it might be a good idea to pull up some of the tracks that make up monkey's right hand spike and see if those tracks are all correct. I rather doubt it.
so I'm thinking out loud - there are a few fields on the bpm histogram that might be useful. e.g. I'm looking at `bpm_histogram_first_peak_weight`, given the name of the field it might indicate how "strong" a bpm is
2021-12-01 33546
monkey
Maybe the alg didn't get fed enough African rhythms
2021-12-01 33552
ruaok
lucifer: thanks. I'll turn that into a playlist for inspection in a bit.
2021-12-01 33508
alastairp
in which case we could remove items which have less certainty
2021-12-01 33535
alastairp
I think monkey might be on to something too - I suspect it's pretty good for pop/rockish songs, so maybe we could filter out there as well
2021-12-01 33553
ruaok
also reminds me of the problem where the alg might pick a wrong range... something about BPM being twice or half of the stated value....
2021-12-01 33554
alastairp
as a very basic filter, _maybe_ we could assume that a fast track is also loud?
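alastairp's "a fast track is also loud" sanity check could be sketched as below. The `average_loudness` field name comes from AB's low-level data (Essentia normalises it to roughly [0, 1]), but both thresholds are made-up illustration values, not anything AB applies:

```python
def plausible_fast_track(bpm, average_loudness,
                         bpm_threshold=150.0, loudness_threshold=0.5):
    """Distrust a high BPM estimate on a quiet track.

    Slow tracks pass unconditionally; fast tracks must also be
    reasonably loud. Thresholds are illustrative assumptions.
    """
    return bpm < bpm_threshold or average_loudness >= loudness_threshold
```

This would only ever discard data, never correct it, so it is a conservative first filter.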
2021-12-01 33525
alastairp
yes - exactly that. sometimes it might pick a value twice or half, due to misidentifying the peaks
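The half/double ("octave") error described here is usually handled by folding estimates into one canonical tempo range. A minimal sketch; the [70, 140) bounds are an assumption for illustration, not anything the AB extractor does:

```python
def fold_bpm(bpm, low=70.0, high=140.0):
    """Fold a BPM estimate into [low, high) by doubling or halving.

    Tempo detectors sometimes report half or twice the true tempo;
    folding maps all such variants onto one canonical range so that
    submissions for the same recording can be compared.
    """
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    while bpm < low:
        bpm *= 2
    while bpm >= high:
        bpm /= 2
    return bpm
```

Note that 185 and 92.5 fold to the same value while 123 stays distinct, so folding explains a 185-vs-92.5 disagreement but not the 123 submission discussed above.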
2021-12-01 33544
ruaok
ok, so a possible approach is to identify these cases and then adjust BPM?
note that the bpm was identified as 125 once and 185 9 other times
2021-12-01 33531
lucifer
*123 once
2021-12-01 33543
alastairp
this is one of the reasons why we return the 1st and 2nd histogram peak, too. it's possible that we could ignore items where these 2 peaks are close (and therefore the algorithm is uncertain about which one to choose)
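A sketch of that filter, using the AB low-level field names (`bpm_histogram_first_peak_*`, `bpm_histogram_second_peak_*`). "Close" could mean close in tempo or close in weight, so both are checked here; the tolerance values are invented for illustration:

```python
def is_uncertain(first_peak_bpm, second_peak_bpm,
                 first_peak_weight, second_peak_weight,
                 bpm_tolerance=0.05, weight_ratio=0.7):
    """Flag a submission whose two histogram peaks are too close to trust.

    Uncertain if the peaks sit at nearly the same tempo, or if the
    second peak's weight rivals the first's. Thresholds are assumptions.
    """
    if first_peak_weight <= 0:
        return True
    close_bpm = abs(first_peak_bpm - second_peak_bpm) <= bpm_tolerance * first_peak_bpm
    strong_rival = second_peak_weight / first_peak_weight >= weight_ratio
    return close_bpm or strong_rival
```

Submissions flagged this way would simply be dropped before aggregating per-recording BPM.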
2021-12-01 33551
ruaok
it is clearly not 125 either.
2021-12-01 33552
alastairp
lucifer: are you doing any filtering/processing on this data?
2021-12-01 33557
lucifer
nope
2021-12-01 33501
ruaok
185 / 2, quite possible.
2021-12-01 33510
monkey
Maybe it'll be worth doing some sorting by mood instead of BPM?
actually yes, alastairp. selecting the bpm which is closest to the mean of all bpms of that recording mbid.
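The "closest to the mean of all submissions" rule proposed here is a one-liner over the per-recording BPM list:

```python
from statistics import mean

def consensus_bpm(bpms):
    """Pick the submitted BPM closest to the mean of all submissions
    for one recording MBID. Returns None if there are no submissions."""
    if not bpms:
        return None
    m = mean(bpms)
    return min(bpms, key=lambda b: abs(b - m))
```

For the example above (nine submissions at 185, one at 123) this selects 185. The mean is still pulled toward outliers, though; using the median as the anchor would be more robust when bad submissions dominate.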
2021-12-01 33531
ruaok
mood is a much higher level of data, thus even more unreliable, monkey
2021-12-01 33545
ruaok
I trust nothing of the high level data in AB.
2021-12-01 33518
alastairp
the cantelows example - I could imagine that it's finding the high plucked notes as "beats", and therefore miscalculating the BPM
2021-12-01 33540
monkey
I mean, BPM doesn't look super reliable or appropriate for what we're doing, but strong confidence in Aggressive/Not Aggressive might make for a better sorting
2021-12-01 33508
alastairp
lucifer: note that the ?n=0 submission has 123 as the first peak and 185 as the 2nd peak
2021-12-01 33525
monkey
That was mostly what the rollercoaster effect was for me. Calm song followed by aggressive one.
2021-12-01 33502
lucifer
alastairp: currently i am ignoring those fields and just using the bpm field. i have those peak fields available in the dump as well if we want to try some stuff out.
lucifer: any chance you could make a dump of the mood as you did for BPM/key?
2021-12-01 33520
lucifer
ruaok: a nicer version to import to bono for playlist
2021-12-01 33550
lucifer
sure, can do but it'll probably take a long time. took 2 days to dump bpm/key.
2021-12-01 33500
ruaok
shit.
2021-12-01 33505
lucifer
maybe if i dump on frank it'll be faster? saves the network trips to kiss.
2021-12-01 33508
ruaok
ok, then I'll try fetching from AB for testing.
2021-12-01 33535
lucifer
yeah that sounds better. if something pans out, we can do a full dump.
2021-12-01 33551
alastairp
dortmund and rosamerica are definitely "better" genres, but they have very few categories; we'd be better off finishing the genre import and using tags instead
well, we could always dump moods + weights at the same time
2021-12-01 33507
lucifer
yup that's possible
2021-12-01 33539
alastairp
(although moods require a few joins into separate tables)
2021-12-01 33559
alastairp
lucifer: yes, I'd dump directly on frank
2021-12-01 33522
ruaok
could we create a dataset hoster that gives access to moods given a list of MBIDs?
2021-12-01 33546
lucifer
ab image doesn't have psql so i dumped through lb-web. i'll start a temporary postgres container on frank this time then.
2021-12-01 33552
ruaok
meaning that the dataset hoster queries frank and takes care of the picking of the instance of the MBID
2021-12-01 33510
lucifer
bono has a subset of ab db fwiw.
2021-12-01 33538
lucifer
this data is accessible through ab api so we could just use that.
2021-12-01 33556
alastairp
yes to both of those - we can prototype it on bono and directly connect to db, if it works, release ds hoster on kiss connecting to frankdb
2021-12-01 33559
lucifer
and opt the bono ip out of ratelimit if that becomes an issue.
2021-12-01 33505
ruaok
I'll work on the AB api for now -- if that shows promise we can expand on that.
2021-12-01 33515
ruaok
but first let me make a playlist from those MBIDs
2021-12-01 33510
alastairp
that being said, we already have the bulk get specific (ll) feature API for AB, which is basically that. not sure why we never finished the hl version of this
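That bulk low-level API can be hit with nothing but the stdlib. A sketch; the MBID separator and the dotted `features` path follow AB's documented query syntax, but treat the exact `features` value as an assumption:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AB_ROOT = "https://acousticbrainz.org/api/v1"

def bulk_lowlevel_url(mbids, features=None):
    """Build a URL for AB's bulk low-level endpoint.

    Multiple recording MBIDs are joined with ';'; an optional
    `features` filter (e.g. "rhythm.bpm") trims the response.
    """
    params = {"recording_ids": ";".join(mbids)}
    if features:
        params["features"] = features
    return f"{AB_ROOT}/low-level?{urlencode(params)}"

def fetch_lowlevel(mbids, features=None):
    """Fetch low-level data for a list of MBIDs in one request."""
    with urlopen(bulk_lowlevel_url(mbids, features), timeout=30) as resp:
        return json.load(resp)
```

The response is keyed by MBID, with one entry per submission offset, so the caller still has to pick which submission to trust.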
2021-12-01 33548
alastairp
lucifer: I sometimes use docker exec on the pg instance on frank to get a psql shell
wondering if pg_dump could be faster than \copy, alastairp
2021-12-01 33500
lucifer
that is, pg_dump the whole table, transfer it to michael and bring up a pg instance there. import the dump and let spark connect to the pg instance directly.
2021-12-01 33530
alastairp
ah, right. I'm not sure where the current slowdown is - is it due to getting a field from the json? or was it slow last time because of the round trip between two different servers?
2021-12-01 33550
alastairp
if you're after a pg dump
2021-12-01 33501
alastairp
frank /home/alastair/acousticbrainz-pgdump-2021-06-03.pgdump
2021-12-01 33557
lucifer
ah nice.
2021-12-01 33551
lucifer
the size of pgdumps is too so probably not a good idea to do this.
2021-12-01 33556
lucifer
*too large
2021-12-01 33557
alastairp
so, let's try your previous dump, including the weights, directly from frank and see if it's any faster