#metabrainz

      • BrainzGit
        [musicbrainz-server] santiagofn opened pull request #2353 (master…patch-3): Give more details about changing PostgreSQL config https://github.com/metabrainz/musicbrainz-serve...
      • lucifer
        ruaok: not sure i understand what's wrong with those tracks. acousticbrainz.org/b58da12b-3182-4afc-b5ff-7646... shows 10 submissions, 9 of which are 185 bpm and the dataset hoster shows 184 so checks out. am i missing something here?
      • reosarevok
        lucifer: https://www.youtube.com/watch?v=4RAjifC9ji0 does not exactly sound like a 185 BPM party track to me
      • Does sound really nice tho :)
      • lucifer
        oh! i see. makes sense. yeah something not right there.
      • ruaok
        moooin!
      • lucifer: not your fault at all -- AB calculates the wrong BPM for the track.
      • and supposedly BPM was the "good" data from AB, but it isn't reliable either. which really sucks.
      • when alastairp returns we'll have to ask if there is some improved algorithm in the wings, but I think for now we're dead in the water as far as BPM.
      • your graph might work, but it will be cute, not accurate.
      • those outlier peaks on the right side of the graphs? I'd guess those are wrong.
      • (through no fault of your own)
      • yvanzo
        O’Moin
      • ruaok
        moin, yvanzo!
      • lucifer: it might be a good idea to pull up some of the tracks that make up monkey's right-hand spike and see if those tracks are all correct. I rather doubt it.
      • BrainzGit
        [musicbrainz-server] yvanzo merged pull request #2353 (master…patch-3): Give more details about changing PostgreSQL config https://github.com/metabrainz/musicbrainz-serve...
      • alastairp
        morning
      • monkey
        Moin!
      • akshaaatt
        moin!
      • BrainzGit
        [bookbrainz-site] MonkeyDo merged pull request #719 (master…fix#BB-630): fix[BB-630]: fix regex to match ~ in url https://github.com/bookbrainz/bookbrainz-site/p...
      • ruaok
        moin alastairp. feeling better?
      • alastairp
        ruaok: mas o menos. a lot better than monday
      • some interesting bpm stuff going on, I see
      • ruaok
        glad to hear that!
      • yeah, interesting is not the word I would use.
      • is there another BPM alg available that we could run the low level data through?
      • alastairp
        we can't re-process the existing AB data - any BPM algorithm would require more detailed data
      • ruaok
        I was afraid of that.
      • is there any data in AB that is reliable?
      • and what is the shortest path for us to make some data that is reliable?
      • alastairp
        as far as I'm aware, BPM is one of the most reliable types of data that we have. what problems are you seeing with it?
      • ruaok
      • this playlist is sorted by BPM -- from slow to fast back to slow.
      • Go to the playlist and jump to the middle of Catalowes, Sick Society and Lon.
      • blick bassey is consistently wrong in BPM.
      • then these peaks on the upper end of BPM? likely all wrong tracks. probably pointing to a systemic problem in the BPM detection.
      • *wrong BPM
      • basically, I was trying to use BPM to smooth out the rollercoaster problem. Instead I made it MUCH worse.
      • alastairp
        it's possible that there is a category of recordings for which the BPM detection doesn't work very well
      • ruaok
        seems that way. which makes the data wholly useless for recommendation purposes.
      • alastairp
        remember, the algorithm looks for "loud peaks", and then computes the difference between them, assuming that each peak is a beat
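[Editor's sketch] The peak-to-peak idea described above can be illustrated in a few lines of Python. This is a minimal sketch under stated assumptions, not the extractor AB actually runs: the function name `bpm_from_peak_times` and the sample peak times are hypothetical. It shows how extra loud peaks that are not beats shrink the interval between peaks and inflate the estimate.

```python
from statistics import median


def bpm_from_peak_times(peak_times):
    """Estimate tempo from peak times in seconds, assuming every peak is a beat.

    Hypothetical helper for illustration only.
    """
    if len(peak_times) < 2:
        raise ValueError("need at least two peaks to measure an interval")
    # Differences between consecutive peaks, in seconds.
    intervals = [b - a for a, b in zip(peak_times, peak_times[1:])]
    # 60 seconds divided by the typical seconds-per-beat gives beats per minute.
    return 60.0 / median(intervals)


# Made-up data: one peak every 0.5 s.
true_beats = [0.0, 0.5, 1.0, 1.5, 2.0]
# Same made-up track, but with an extra loud peak halfway between each beat.
with_extra_peaks = [0.0, 0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0]

print(bpm_from_peak_times(true_beats))
print(bpm_from_peak_times(with_extra_peaks))
```

With the extra peaks the interval halves, so the estimated tempo doubles even though the underlying beat has not changed.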
      • ruaok
        yep, understood
      • lucifer
      • ruaok: ^ the recordings making up that peak
      • alastairp
        so I'm thinking out loud - there are a few fields on the bpm histogram that might be useful. e.g. I'm looking at `bpm_histogram_first_peak_weight`, given the name of the field it might indicate how "strong" a bpm is
      • monkey
        Maybe the alg didn't get fed enough African rhythms
      • ruaok
        lucifer: thanks. I'll turn that into a playlist for inspection in a bit.
      • alastairp
        in which case we could remove items which have less certainty
      • I think monkey might be on to something too - I suspect it's pretty good for pop/rockish songs, so maybe we could filter out there as well
      • ruaok
        also reminds me of the problem where the alg might pick a wrong range... something about BPM twice or half of the stated value....
      • alastairp
        as a very basic filter, _maybe_ we could assume that a fast track is also loud?
      • yes - exactly that. sometimes it might pick a value twice or half, due to misidentifying the peaks
      • ruaok
        ok, so a possible approach is to identify these cases and then adjust BPM?
      • lucifer
      • note that the bpm was identified as 125 once and 185 9 other times
      • *123 once
      • alastairp
        this is one of the reasons why we return the 1st and 2nd histogram peak, too. it's possible that we could ignore items where these 2 peaks are close (and therefore the algorithm is uncertain about which one to choose)
      • ruaok
        it is clearly not 125 either.
      • alastairp
        lucifer: are you doing any filtering/processing on this data?
      • lucifer
        nope
      • ruaok
        185 / 2, quite possible.
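[Editor's sketch] One possible way to "adjust BPM" for the half/double case is to fold every value into a single octave of tempo. This is only a sketch: the function name `fold_bpm` and the 70-140 range are assumptions chosen for illustration, not something AB or the dataset hoster does, and a range like this will be wrong for tracks that really are faster or slower.

```python
def fold_bpm(bpm, low=70.0, high=140.0):
    """Halve or double a BPM value until it falls in [low, high).

    The default range is an assumption; high must be at least 2 * low
    so that every positive value can land inside it.
    """
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    if high < 2 * low:
        raise ValueError("high must be at least twice low")
    while bpm >= high:
        bpm /= 2.0
    while bpm < low:
        bpm *= 2.0
    return bpm


print(fold_bpm(185.0))  # the 185 value discussed above
print(fold_bpm(123.0))  # already inside the range, left untouched
```

Folding does not tell you which octave is correct; it only makes values comparable so that a 185 and a 92.5 for the same kind of track are not sorted far apart.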
      • monkey
        Maybe it'll be worth doing some sorting by mood instead of BPM?
      • BrainzGit
        [musicbrainz-android] akshaaatt opened pull request #97 (master…year_in_music): YIM addition https://github.com/metabrainz/musicbrainz-andro...
      • lucifer
        actually yes, alastairp. selecting the bpm which is closest to the mean of all bpms of that recording mbid.
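[Editor's sketch] The selection rule lucifer describes (pick the submission whose BPM is closest to the mean of all submissions for that recording MBID) can be sketched as below. The function name `closest_to_mean` is hypothetical, and the input uses the counts mentioned earlier in the discussion: one submission at 123 and nine at 185.

```python
def closest_to_mean(bpms):
    """Return the submitted BPM value nearest to the mean of all submissions."""
    if not bpms:
        raise ValueError("need at least one BPM value")
    mean = sum(bpms) / len(bpms)
    return min(bpms, key=lambda bpm: abs(bpm - mean))


submissions = [123.0] + [185.0] * 9
print(closest_to_mean(submissions))
```

This rule picks the value most submissions agree on, so it removes one-off outliers but cannot help when the extractor is wrong in the same way on every submission.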
      • ruaok
        mood is a much higher level of data, thus even more unreliable, monkey
      • I trust nothing of the high level data in AB.
      • alastairp
        the cantelows example - I could imagine that it's finding the high plucked notes as "beats", and therefore miscalculating the BPM
      • monkey
        I mean, BPM doesn't look super reliable or appropriate for what we're doing, but strong confidence in Aggressive/Not Aggressive might make for a better sorting
      • alastairp
        lucifer: note that the ?n=0 submission has 123 as the first peak and 185 as the 2nd peak
      • monkey
        That was mostly what the rollercoaster effect was for me. Calm song followed by aggressive one.
      • lucifer
        alastairp: currently i am ignoring those fields just using bpm field. i have those peak fields available as well in the dump if we want to try some stuff out.
      • alastairp
        https://acousticbrainz.org/091ca532-17ed-4804-a... here's a track a few items before catelows
      • bpm_histogram_first_peak_weight: "mean": 0.484
      • compared to catelows: bpm_histogram_first_peak_weight "mean": 0.19
      • BrainzGit
        [musicbrainz-android] akshaaatt opened pull request #98 (master…notifications): Notifications and NewsBrainz Setup https://github.com/metabrainz/musicbrainz-andro...
      • alastairp
        lucifer: you don't have the peak weights in the dump, by any chance?
      • lucifer
        i'll check
      • alastairp
        I recommend that we try it again but only select items with a higher first peak weight
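[Editor's sketch] Filtering on first peak weight could look like the following. The field name `bpm_histogram_first_peak_weight` and the two weights (0.484 and 0.19) come from the discussion above; the flat row layout, the `label` key, the function name and the 0.3 cutoff are assumptions for illustration, and a usable cutoff would have to be found by listening to what gets kept.

```python
def filter_by_first_peak_weight(rows, min_weight=0.3):
    """Keep rows whose first histogram peak weight is at least min_weight.

    Rows without a weight are dropped, since their certainty is unknown.
    The 0.3 default is an arbitrary assumption, not a tuned value.
    """
    kept = []
    for row in rows:
        weight = row.get("bpm_histogram_first_peak_weight")
        if weight is not None and weight >= min_weight:
            kept.append(row)
    return kept


rows = [
    {"label": "track a few items before catelows",
     "bpm_histogram_first_peak_weight": 0.484},
    {"label": "catelows",
     "bpm_histogram_first_peak_weight": 0.19},
]

confident = filter_by_first_peak_weight(rows)
print([row["label"] for row in confident])
```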
      • ruaok
        lucifer: I'm willing to try out the moods -- perhaps aggressive is one that would be ok.
      • lucifer
      • ruaok
        lucifer: any chance you could make a dump of the mood as you did for BPM/key?
      • lucifer
        ruaok: a nicer version to import to bono for playlist
      • sure, can do but it'll probably take a long time. took 2 days to dump bpm/key.
      • ruaok
        shit.
      • lucifer
        maybe if i dump on frank it'll be faster? saving network trips to kiss.
      • ruaok
        ok, then I'll try fetching from AB for testing.
      • lucifer
        yeah that sounds better. if something pans out, we can do a full dump.
      • alastairp
        dortmund and rosamerica are definitely "better" genres, but they have very few categories, we'd be better off finishing the genre import and using tags instead
      • lucifer
        alastairp: {"id":26804382,"recording_mbid":"b10a16f5-5413-4ab3-9ed4-b99bbf9cbc77","bpm":"97.3801803589","bpm_histogram_first_peak_bpm_mean":"98","bpm_histogram_first_peak_bpm_median":"98","bpm_histogram_second_peak_bpm_mean":"83","bpm_histogram_second_peak_bpm_median":"83","key_key":"A","key_scale":"major"}
      • these are the fields in the dump currently
      • alastairp
        ah, drat
      • lucifer
        so no peak weights :/
      • alastairp
        well, we could always dump moods + weights at the same time
      • lucifer
        yup that's possible
      • alastairp
        (although moods require a few joins into separate tables)
      • lucifer: yes, I'd dump directly on frank
      • ruaok
        could we create a dataset hoster that gives access to moods given a list of MBIDs?
      • lucifer
        ab image doesn't have psql so i dumped through lb-web. i'll start a temporary postgres container on frank this time then.
      • ruaok
        meaning that the dataset hoster queries frank and takes care of the picking of the instance of the MBID
      • lucifer
        bono has a subset of ab db fwiw.
      • this data is accessible through ab api so we could just use that.
      • alastairp
        yes to both of those - we can prototype it on bono and directly connect to db, if it works, release ds hoster on kiss connecting to frankdb
      • lucifer
        and opt the bono ip out of ratelimit if that becomes an issue.
      • ruaok
        I'll work on the AB api for now -- if that shows promise we can expand on that.
      • but first let me make a playlist from those MBIDs
      • alastairp
        that being said, we already have the bulk get specific (ll) feature API for AB, which is basically that. not sure why we never finished the hl version of this
      • lucifer: I sometimes use docker exec on the pg instance on frank to get a psql shell
      • lucifer
        ah right, that should work too.
      • alastairp
        correct
      • json is stored in the toast data blob
      • lucifer
        hmm interesting. i am wondering if just pg_dump is enough.
      • oh so pg_total_size?
      • uh yeah. dumping whole table is not feasible.
      • alastairp
        that looks better
      • lucifer: what are you trying to do?
      • lucifer
        wondering if pg_dump could be faster than \copy, alastairp
      • that is pg_dump the whole table, transfer to michael and bring up a pg instance there. import the dump and let spark connect to the pg instance directly.
      • alastairp
        ah, right. I'm not sure where the current slowdown is - is it due to getting a field from the json? or was it slow last time because of the round trip between two different servers?
      • if you're after a pg dump
      • frank /home/alastair/acousticbrainz-pgdump-2021-06-03.pgdump
      • lucifer
        ah nice.
        the size of pgdumps is too large, so probably not a good idea to do this.
      • alastairp
        so, let's try your previous dump, including the weights, directly from frank and see if it's any faster
      • BrainzGit
        [bookbrainz-site] MonkeyDo merged pull request #720 (master…fix#BB-511): Fix[BB-511]: sorted identifiers alphabetically https://github.com/bookbrainz/bookbrainz-site/p...
      • lucifer
        sure, should we try now or wait for ruaok to test using api first?
      • ruaok
        I'm still making the MBID playlist (making new troi elements for it), so I'd say don't wait.
      • hopefully done soon with this step.