we finally found out it was because mayhem was reading audio files from a disk slower than the one used to store the database, hiding the performance gain (which concerns only the database).
2024-01-16 01647, 2024
outsidecontext
ah, ok :) happy to test this later here as well
2024-01-16 01635, 2024
mayhem
new SSD arrives later this week, so that problem goes away. :)
2024-01-16 01635, 2024
Sophist-UK joined the channel
2024-01-16 01651, 2024
Sophist_UK has quit
2024-01-16 01617, 2024
rana_satyaraj
I'm new here. I have set up the ListenBrainz development environment, but I'm having trouble finding something to work on. Can anyone point me in the right direction, maybe give me some tasks to do? It could be anything as long as it's coding.
2024-01-16 01614, 2024
mayhem
rana_satyaraj: hi! I'm looking but I can never find the "easy first bugs" label in jira
lucifer: Hello! Did you see LB-1455 by any chance? Wondering if it is due to how often we rebuild the cache or if there's something else going on there that prevents it being added to the cache.
zas: you can run create on an existing DB file and it will make the new table for you.
2024-01-16 01612, 2024
zas
great, but I think we'll still need better handling of schema updates at some point
2024-01-16 01627, 2024
mayhem
yep.
2024-01-16 01635, 2024
mayhem
I didn't think we'd need it that soon, lol.
2024-01-16 01646, 2024
zas
:D
2024-01-16 01633, 2024
zas
outsidecontext: the way we manage the catalog of audio files in listenbrainz-content-resolver could be done in Picard btw, in order to speed up music collection updates/tag resyncs etc.
2024-01-16 01606, 2024
monkey
Oh boy. mayhem do I have a fun mapping pickle for you!
2024-01-16 01606, 2024
monkey
These two are not the same recording and not the same artist: pray (by Eve) and Pray (by EVE)
one of the things I was thinking about is that getting this right is... hard.
2024-01-16 01626, 2024
mayhem
I could get the right results by changing the window size to 5 or 10 seconds.
2024-01-16 01637, 2024
mayhem
but obviously that doesn't work in the real world.
2024-01-16 01659, 2024
mayhem
the thing I had always wondered about is using machine learning to really solve this problem.
2024-01-16 01621, 2024
musicListenerSam
hmm, perhaps instead of creating a voting classifier for multiple models we could begin with a voting classifier for specific window sizes
2024-01-16 01627, 2024
musicListenerSam
that would be a start
2024-01-16 01642, 2024
mayhem
not sure that is the right approach.
2024-01-16 01655, 2024
mayhem
I have a feeling that we should pick a middle of the road window size.
2024-01-16 01612, 2024
mayhem
and then use a peak detector -- that part is the trickiest.
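A rough sketch of that fixed-window approach, assuming a plain numpy/scipy pipeline rather than the actual bpm-detector code: autocorrelate an onset-strength envelope over one window and pick the strongest peak in a plausible tempo range. The window length, hop size, and BPM range below are illustrative.

```python
import numpy as np
from scipy.signal import find_peaks

def estimate_bpm(samples: np.ndarray, sr: int, window_s: float = 10.0,
                 frame: int = 1024, hop: int = 512,
                 bpm_range: tuple = (60.0, 200.0)) -> float:
    """Estimate BPM from the first `window_s` seconds of a mono signal."""
    window = samples[: int(window_s * sr)]

    # Onset strength: positive change in frame energy between hops.
    energies = np.array([np.sum(window[i:i + frame] ** 2)
                         for i in range(0, len(window) - frame, hop)])
    onset = np.maximum(np.diff(energies), 0.0)
    onset -= onset.mean()

    # Autocorrelation of the onset envelope; peaks mark candidate beat periods.
    ac = np.correlate(onset, onset, mode="full")[len(onset) - 1:]

    fps = sr / hop                                  # onset frames per second
    min_lag = int(fps * 60.0 / bpm_range[1])        # fastest tempo considered
    max_lag = int(fps * 60.0 / bpm_range[0])        # slowest tempo considered
    peaks, props = find_peaks(ac[min_lag:max_lag], height=0)
    if len(peaks) == 0:
        return 0.0                                  # no clear periodicity
    best_lag = min_lag + peaks[np.argmax(props["peak_heights"])]
    return 60.0 * fps / best_lag
```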
2024-01-16 01627, 2024
mayhem
what if instead we feed the generated data to something like a neural net?
2024-01-16 01647, 2024
mayhem
we'd need to build a decent training data set, with audio files and expected (verified) BPM values.
2024-01-16 01600, 2024
mayhem
then we can train a BPM classifier with that data.
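A minimal sketch of that training step, assuming per-track feature vectors and verified BPM labels already exist on disk (the file names and feature shape here are hypothetical); scikit-learn's MLPRegressor stands in for whatever network ends up being used:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Hypothetical inputs: one fixed-length feature vector per track (e.g. an
# onset-envelope autocorrelation) and the verified BPM for that track.
X = np.load("features.npy")        # shape (n_tracks, n_features)
y = np.load("verified_bpm.npy")    # shape (n_tracks,)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = MLPRegressor(hidden_layer_sizes=(128, 64), max_iter=1000,
                     random_state=42)
model.fit(X_train, y_train)

# Mean absolute error in BPM on held-out tracks.
mae = np.mean(np.abs(model.predict(X_test) - y_test))
print(f"held-out MAE: {mae:.1f} BPM")
```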
2024-01-16 01604, 2024
mayhem
what do you think?
2024-01-16 01621, 2024
musicListenerSam
yup, we will surely need to start with the data
2024-01-16 01632, 2024
musicListenerSam
I think the neural network is the right approach
2024-01-16 01657, 2024
musicListenerSam
the algos can often fail in more dynamic scenarios, where the neural network thrives
2024-01-16 01634, 2024
musicListenerSam
training the BPM classifier with a good dataset would be a huge plus
2024-01-16 01606, 2024
riksucks has quit
2024-01-16 01621, 2024
musicListenerSam
I guess I'll look into the dataset building for now then. I guess Spotify has a lot of BPM data, or so I've heard
2024-01-16 01629, 2024
mayhem
it does.
2024-01-16 01637, 2024
mayhem
but I don't know if we can trust it.
2024-01-16 01646, 2024
arsh has quit
2024-01-16 01651, 2024
mayhem
AcousticBrainz has this data, but we can't rely on it.
2024-01-16 01634, 2024
vscode_ has quit
2024-01-16 01643, 2024
mayhem
so my take was to make a collection of releases from many different genres, and work out a BPM value for each track in the collection.
2024-01-16 01649, 2024
musicListenerSam
in that case, where else can we look for reliable sources of BPM data?
2024-01-16 01600, 2024
musicListenerSam
hmm
2024-01-16 01602, 2024
mayhem
what do you think is a good training dataset size for this?
2024-01-16 01652, 2024
mayhem
there are other algorithms out there. we could download as many as we can find, run them all, pull in AB/Spotify, and if we get agreement, the track goes into the collection.
2024-01-16 01615, 2024
mayhem
that might, however, select for easy cases, so we may need to hand-resolve the edge cases.
2024-01-16 01657, 2024
Shubh has quit
2024-01-16 01615, 2024
musicListenerSam
frankly speaking, if I take releases with an average duration of 3 minutes each, and since audio files are large, I think 1 GB worth of data would be a good start. something that can be achieved in the beginning
2024-01-16 01622, 2024
musicListenerSam
ya, I agree with that
2024-01-16 01651, 2024
musicListenerSam
we could run a script matching the two: the data from the Spotify API for BPM vs. the algo, and if it's the same, it passes
2024-01-16 01653, 2024
ShivamAwasthi has quit
2024-01-16 01621, 2024
musicListenerSam
to enhance accuracy, we could run multiple algorithms in parallel and set a threshold for acceptance
2024-01-16 01644, 2024
mayhem
agreed.
2024-01-16 01649, 2024
musicListenerSam
that way we would have a reduced number of false BPM results in the dataset.
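A sketch of that agreement check: a track only enters the training collection when enough independent estimates (local algorithms plus the AB/Spotify values) agree within a tolerance. The tolerance and minimum count below are placeholders to be tuned, not values from the discussion.

```python
from statistics import median

def consensus_bpm(estimates: list[float], tolerance: float = 3.0,
                  min_agree: int = 3) -> float | None:
    """Return the agreed BPM, or None if the sources disagree."""
    estimates = [e for e in estimates if e and e > 0]
    if len(estimates) < min_agree:
        return None
    center = median(estimates)
    agreeing = [e for e in estimates if abs(e - center) <= tolerance]
    if len(agreeing) >= min_agree:
        return sum(agreeing) / len(agreeing)
    return None   # disagreement: hold the track back for manual review

# e.g. two local algorithms plus AcousticBrainz and Spotify for one track:
print(consensus_bpm([120.1, 119.6, 121.0, 60.2]))   # ~120.2 (three agree)
print(consensus_bpm([120.1, 89.5, 60.2]))           # None (no consensus)
```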
2024-01-16 01652, 2024
Freso has quit
2024-01-16 01637, 2024
mayhem
let me see what I can do to collect this dataset.
2024-01-16 01635, 2024
musicListenerSam
as far as the edge cases are concerned, once we have a neural network that works on the larger chunk of data, certain cases should stand out, say soft music or some other case that the dataset misses. we can then work towards those data needs specifically, perhaps using attention modelling of some sort
2024-01-16 01636, 2024
mayhem
zas: are you following this convo?
2024-01-16 01651, 2024
musicListenerSam
shouldn't be that hard once we are at that point
2024-01-16 01624, 2024
mayhem
for ambient and classical music, i.e. music without a clear beat, we should ideally say: Nope, can't determine BPM, rather than giving the wrong BPM.
2024-01-16 01646, 2024
zas
mayhem: yes
2024-01-16 01605, 2024
musicListenerSam
hmm, I'll look into the dataset creation as well.
2024-01-16 01625, 2024
musicListenerSam
perhaps for classical and ambient we can set a confidence threshold for the model's prediction
2024-01-16 01626, 2024
mayhem
so, we're trying to come up with a machine learning BPM alg -- I have a feeling it's been done before, but none of these approaches ever made it to open source.
2024-01-16 01639, 2024
musicListenerSam
below a certain prediction confidence we just say: no BPM detected
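That abstention rule is simple to express; a sketch, assuming the detector reports some confidence score alongside its estimate (the 0.6 cut-off is a placeholder to be tuned on the labelled set):

```python
from typing import Optional

CONFIDENCE_THRESHOLD = 0.6   # placeholder; tune against the labelled dataset

def bpm_or_none(bpm: float, confidence: float,
                threshold: float = CONFIDENCE_THRESHOLD) -> Optional[float]:
    """Report a BPM only when the detector is confident enough."""
    return bpm if confidence >= threshold else None

# e.g. a driving techno track vs. a beatless ambient piece:
print(bpm_or_none(128.0, 0.92))   # -> 128.0
print(bpm_or_none(74.3, 0.18))    # -> None ("no BPM detected")
```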
2024-01-16 01648, 2024
mayhem
musicListenerSam: why don't you use my music service as a source of music for now. let me worry about the dataset.
2024-01-16 01659, 2024
musicListenerSam
okay
2024-01-16 01606, 2024
mayhem
zas: your collection has more breadth than mine does.
2024-01-16 01607, 2024
mayhem
would you be willing to contribute 5 albums each from punk, jazz, metal for the training dataset?
2024-01-16 01644, 2024
zas
np, but genres like jazz & metal are rather fuzzy
2024-01-16 01604, 2024
mayhem
yep, understood. they are just poorly represented in my collection.
2024-01-16 01606, 2024
zas
bpm of doom metal is near zero, while bpm of death metal is rather high
2024-01-16 01616, 2024
mayhem
which is why I want both.
2024-01-16 01629, 2024
mayhem
the more edge-casey sorts of music you can help us with, the better.
2024-01-16 01659, 2024
musicListenerSam
hmm, ok so I guess I understand what I need to do next. (y) I'll ping you, mayhem, in case of any more exciting developments, and if we feel the changes are an improvement we can add them to the bpm-detector repo
2024-01-16 01633, 2024
mayhem
yep. if you give me your github handle, I'll give you commit access to that repo.
2024-01-16 01643, 2024
mayhem
and tomorrow, I will start building a test dataset.
2024-01-16 01656, 2024
mayhem
it should be good enough to start testing with tomorrow, but getting to a significant size will still take some time.
2024-01-16 01620, 2024
musicListenerSam
okay, sure
2024-01-16 01609, 2024
musicListenerSam
we'll start testing in small batches for now anyway, so the dataset size shouldn't be a hurdle for now