perhaps we need to have a test set of 100,000 tracks for testing that we can run quickly.
2021-12-01 33501, 2021
ruaok
and when that 100,000 tracks produces some decent data, do we open it up for wider submissions.
2021-12-01 33520, 2021
ruaok
and we really need to find people with large music collections and find a way to use them for bootstrapping a reboot.
2021-12-01 33519, 2021
ruaok
should we stop all AB work until we have a new plan in place?
2021-12-01 33546, 2021
reosarevok
What's the current state of AB alternatives?
2021-12-01 33508, 2021
ruaok
in what sense, reosarevok ? open alternatives to AB?
2021-12-01 33511, 2021
alastairp
I think that having a plan about what we want to do with AB is a good idea
2021-12-01 33514, 2021
reosarevok
As in, have researchers managed better algorithms elsewhere? I guess Spotify or whoever might internally, but nothing open?
2021-12-01 33523, 2021
alastairp
that is, just throwing data at a database isn't working
2021-12-01 33545, 2021
alastairp
reosarevok: there are many datasets with extracted features, based on algorithm x
2021-12-01 33553, 2021
ruaok
well, the algs to hand were promised to be much better than they are.
2021-12-01 33523, 2021
alastairp
but they fall into the same problem that caused us to start AB in the first place - that is, they're fixed in time, fixed in dataset size
2021-12-01 33541, 2021
ruaok
so, if we can't rely on the original premise of taking things from academia and putting them into production, then the whole value proposition of AB falls on its face.
2021-12-01 33505, 2021
reosarevok
Yeah, I was mostly wondering if there's something else that has already "replaced" AB
2021-12-01 33513, 2021
reosarevok
Or if what we have is the least bad there is
2021-12-01 33527, 2021
ruaok
nothing open. it's a very large effort.
2021-12-01 33529, 2021
alastairp
and I think this comes back to the question of scale. When you're testing on 1000 items and you get good results, it's easy to say that it works well
2021-12-01 33518, 2021
ruaok
which is why we need to start with something that runs in a reasonable amount of time, yet is representative of the whole picture.
2021-12-01 33546, 2021
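One way to get a test set that "runs in a reasonable amount of time, yet is representative" is stratified sampling: split the collection into groups and draw from each group in proportion to its size. The following is only a minimal sketch under assumptions. The `group` field, the `stratified_sample` function and the choice of grouping key are hypothetical and are not part of AB.

```python
import random
from collections import defaultdict

def stratified_sample(tracks, key, sample_size, seed=0):
    """Pick roughly sample_size tracks, keeping each group's share of the whole.

    `tracks` is any list of records; `key` maps a record to its group label
    (hypothetical: genre, region, decade, ...).
    """
    rng = random.Random(seed)  # fixed seed so the test set is reproducible
    groups = defaultdict(list)
    for track in tracks:
        groups[key(track)].append(track)

    total = len(tracks)
    sample = []
    for members in groups.values():
        # At least one track per group so small groups are not dropped entirely.
        quota = max(1, round(sample_size * len(members) / total))
        sample.extend(rng.sample(members, min(quota, len(members))))
    return sample

# Hypothetical collection: 900 tracks in group "a", 100 in group "b".
tracks = [{"id": i, "group": "a" if i < 900 else "b"} for i in range(1000)]
subset = stratified_sample(tracks, key=lambda t: t["group"], sample_size=100)
```

Because of rounding and the one-track minimum, the result can differ slightly from the requested size when there are many small groups.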
reosarevok
I think it's fine if AB doesn't work well on African music yet, but then we should be saying "hey, we know we have this issue, who has a large, diverse collection of African music and is willing to help us with it"
2021-12-01 33558, 2021
reosarevok
I guess the problem is the whole picture is so absurdly wide
2021-12-01 33554, 2021
alastairp
reosarevok: right, but the question here also is if it's just a matter of collecting the data, or if you actually have to perform research on field x in order to learn how to deal with African music
2021-12-01 33555, 2021
ruaok
reosarevok: the problem is, how do you tell someone which parts work and which don't?
2021-12-01 33522, 2021
ruaok
if I call an API I expect it to work or I expect to see a confidence rating of the quality of the results. We have few tools to provide such things.
2021-12-01 33526, 2021
alastairp
which was one of the big premises of compmusic - that you needed specific algorithms
2021-12-01 33503, 2021
reosarevok
Well, your users should tell you what looks wrong, I guess :) But yeah, it's hard to use that programmatically
2021-12-01 33528, 2021
alastairp
for example, one change to essentia since we released AB is that it now gives 5 different key estimations - as we realised that the one "standard" model that we thought worked well really only worked well on a small subset of data
2021-12-01 33542, 2021
reosarevok
I would be surprised if the same algorithms which can work with EDM, jazz and classical break with African music
2021-12-01 33550, 2021
reosarevok
But it might be that they don't work as well on that either :)
2021-12-01 33555, 2021
alastairp
but also, at the scale of AB, knowing 1 piece of data is wrong in 10m files doesn't give a huge amount of context
2021-12-01 33511, 2021
ruaok
reosarevok: for instance, I can't use AB in any of my playlist work. I doubt anyone else could.
2021-12-01 33529, 2021
reosarevok
ruaok: is it meant for that though? I thought it was meant to mostly just be a long-term slow research project :)
2021-12-01 33538, 2021
ruaok
if they do, they are getting shit results. and just think of how many people have already done research based on AB. clearly without vetting the results.
2021-12-01 33557, 2021
ruaok
yes, the idea was to have results after 5 years.
2021-12-01 33520, 2021
reosarevok
I'd expect research to be done as in "we ran AB on this huge collection of African music and it worked / didn't work and this is what we saw"
2021-12-01 33525, 2021
reosarevok
But I guess that might not be happening
2021-12-01 33555, 2021
alastairp
I think that one big problem with AB is that we thought "oh yes, it can follow the research as it improves", but then we didn't make it follow essentia upgrades
2021-12-01 33514, 2021
reosarevok
What's the main problem with making it follow upgrades? That it needs to re-scan everything?
2021-12-01 33537, 2021
ruaok
alastairp: do you have any faith that updated essentia algs would actually scale better?
2021-12-01 33537, 2021
alastairp
yes - a combination of technical and social hurdles
2021-12-01 33506, 2021
alastairp
I'm sure that current essentia algorithms are "better" than the AB ones
2021-12-01 33513, 2021
alastairp
but we're stuck on the definition of better
2021-12-01 33538, 2021
alastairp
on 20m tracks there are still going to be awful results
2021-12-01 33519, 2021
reosarevok
Yeah. How doable is it to *know* they are awful? I understand the automatic confidence isn't always great?
2021-12-01 33534, 2021
ruaok
I think if we continue with AB we need to make things "algorithms first".
2021-12-01 33558, 2021
ruaok
first prove out that an algorithm works and scales well. then adopt it into AB and run it over data.
2021-12-01 33501, 2021
lucifer
80% accuracy on 20m is still 4m tracks wrong.
2021-12-01 33512, 2021
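The arithmetic behind that remark, written out as a small sketch. The function name `expected_wrong` is made up for illustration, and the accuracies other than 80% are example values, not figures from the discussion.

```python
def expected_wrong(total_tracks, accuracy):
    """Expected number of tracks with a wrong value at a given accuracy."""
    return round(total_tracks * (1 - accuracy))

# 20m tracks, as in the message above; 0.95 and 0.99 are illustrative only.
for accuracy in (0.80, 0.95, 0.99):
    print(f"{accuracy:.0%} accurate -> {expected_wrong(20_000_000, accuracy):,} wrong")
```

Even at accuracies that would look excellent on a small research dataset, the absolute number of wrong tracks at this scale stays large.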
reosarevok
Also, how doable would it be to combine submitting LB listens with AB submission?
2021-12-01 33524, 2021
ruaok
reosarevok: not doable at all.
2021-12-01 33527, 2021
reosarevok
For people running local plugins on like VLC or something
2021-12-01 33528, 2021
alastairp
that's what I was looking at this morning on the BPM algorithms - I had hoped that the histogram strength would show us when there was uncertainty - but in many cases it was pretty confident in its result
2021-12-01 33538, 2021
ruaok
90% of our listens come from spotify.
2021-12-01 33508, 2021
lucifer
spotify provides an audio analysis api fwiw so we could get that data for comparison with ab at least.
2021-12-01 33524, 2021
reosarevok
ruaok: sure, I'm asking for old school people
2021-12-01 33526, 2021
alastairp
and hence 95% of research is data management and evaluation
2021-12-01 33544, 2021
reosarevok
Since I'm assuming we won't be getting access to all of Spotify :p
2021-12-01 33524, 2021
reosarevok
Ideally of course something like AB would have an agreement with something like Spotify, but I assume everyone in that market already has their own inhouse stuff and are not willing to help anybody else
2021-12-01 33508, 2021
reosarevok
lucifer: 80% accuracy is probably the most you can hope for, really - I mean, people shouldn't expect magic when using automatic stuff
2021-12-01 33525, 2021
reosarevok
If you want perfection, use human-built playlists
2021-12-01 33555, 2021
lucifer
reosarevok: indeed and looking at the research paper that describes the current ab algorithm, my understanding is that 80% accuracy is the best case.
2021-12-01 33555, 2021
reosarevok
My Spotify release radar for example is a huge mess, playlist-wise, so either they don't even try to sort it, or they do a terrible job of it
2021-12-01 33534, 2021
reosarevok
(it's usually full of "rap-classical-metal-rap-classical" in random orders like that)
2021-12-01 33549, 2021
CatQuest
hah
2021-12-01 33505, 2021
ruaok
I think I am going to spend some time working out if the annoy stuff has any utility. because to date, I haven't been convinced of that.
2021-12-01 33509, 2021
CatQuest
btw I mean I would happily submit ab stuff
2021-12-01 33510, 2021
reosarevok
So maybe the main issue is not AB data as much as expectations
2021-12-01 33525, 2021
CatQuest
also yes, i mean it's automated
2021-12-01 33538, 2021
reosarevok
alastairp: how often is essentia updated?
2021-12-01 33551, 2021
CatQuest
i was thinking that having a way to let users on eg mb give feedback on ab data shown on recordings might be useful?
2021-12-01 33504, 2021
reosarevok
I don't think it'd be doable to ask people to resubmit more than once a year or so, but having a new data version every year might not be that bad?
2021-12-01 33510, 2021
alastairp
reosarevok: we try and keep it up to date with new updates to algorithms as they are released
2021-12-01 33518, 2021
alastairp
but again, only a few people involved in doing that
2021-12-01 33521, 2021
CatQuest
like if a lot of people "downvote" a bpm tag from ab
2021-12-01 33536, 2021
reosarevok
alastairp: sure, but how often are algorithms released? :D
CatQuest's point isn't bad either, the more data we show (as "we don't know if this is good") in MB and elsewhere, the more we could find where we have stuff that just looks bad
2021-12-01 33541, 2021
CatQuest
:D
2021-12-01 33541, 2021
alastairp
improvements happen all the time. but sometimes that improvement is "we no longer screw up on this small part of this test dataset"
2021-12-01 33517, 2021
reosarevok
alastairp: so would it be doable to say "we package all new improvements for the year once a year, and offer a new version of AB that supports that, but needs re-scanning"?
2021-12-01 33525, 2021
ruaok
so, my feeling is that the only AB work that should be happening in the short term is to find new algs that are usable and making a plan for how to reboot.
2021-12-01 33527, 2021
alastairp
reosarevok: that was one of the original ideas
2021-12-01 33536, 2021
reosarevok
I guess it would make the data take a huuuge amount of space though if we have yearly versions of all the data?
2021-12-01 33509, 2021
CatQuest
archive old data?
2021-12-01 33512, 2021
alastairp
reosarevok: so maybe retire old versions? but then what do you do if an MBID gets processed with n-5 and never gets re-done. do you accept the old (maybe worse) version, or do you delete it?
2021-12-01 33529, 2021
CatQuest
mark it as old but keep
2021-12-01 33530, 2021
reosarevok
alastairp: maybe retire old versions *except* for stuff not in any newer version?
2021-12-01 33537, 2021
alastairp
reosarevok: yes, perhaps
2021-12-01 33551, 2021
reosarevok
And then allow people to optionally ask the API for "latest version, but fill the gaps with historical"?
2021-12-01 33502, 2021
CatQuest
show on mb that it needs to be rescanned. call to people for rescanning
2021-12-01 33503, 2021
reosarevok
So you can choose if you only want the latest, or all
2021-12-01 33507, 2021
CatQuest
also also, make scanning easier. much, much easier
2021-12-01 33529, 2021
CatQuest
oh i like that idea reo
2021-12-01 33537, 2021
alastairp
reosarevok: that was my idea for what to do when we got a new version of the extractor. stop accepting the old one, when you request an mbid get the new one if it exists otherwise use the old one
2021-12-01 33551, 2021
CatQuest
having some way to scan ab with picard would be :chef:
2021-12-01 33521, 2021
alastairp
this was always a long-term plan, but it relied on having AB dev resources, having a stable release cadence for essentia, etc, etc
2021-12-01 33531, 2021
CatQuest
:(
2021-12-01 33545, 2021
reosarevok
alastairp: sounds good, although I think we could still have a way to specifically say "I would rather get 0 results than old results"
2021-12-01 33513, 2021
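The lookup rule discussed here (return the newest extractor version if it exists, otherwise fall back to an older one, unless the caller prefers no result over an old result) could look roughly like this. It is a sketch under assumptions, not the AB API: the in-memory `store`, the `lookup` function and the `allow_old` flag are all hypothetical names.

```python
from typing import Optional

def lookup(store: dict, mbid: str, latest_version: int,
           allow_old: bool = True) -> Optional[dict]:
    """Return features for an MBID, preferring the latest extractor version.

    `store` is a hypothetical mapping {mbid: {extractor_version: features}}.
    With allow_old=False the caller gets None rather than stale data.
    """
    versions = store.get(mbid, {})
    if latest_version in versions:
        return versions[latest_version]
    older = [v for v in versions if v < latest_version]
    if allow_old and older:
        # Fill the gap with the most recent historical version.
        return versions[max(older)]
    return None

# Hypothetical data: "a" was rescanned with version 2, "b" only has version 1.
store = {
    "a": {1: {"bpm": 120}, 2: {"bpm": 121}},
    "b": {1: {"bpm": 90}},
}
print(lookup(store, "b", latest_version=2))                   # falls back to version 1
print(lookup(store, "b", latest_version=2, allow_old=False))  # no result instead of old data
```

Making the fallback opt-out rather than opt-in is a design choice; the discussion leaves open which default is better.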
reosarevok
CatQuest: there's a Picard plugin, but I dunno how well it works?
2021-12-01 33521, 2021
CatQuest
I still think it can happen. just. idk, lb is being prioritized now. if prioritizing ab will make lb better, I'm sure we can do that
2021-12-01 33528, 2021
reosarevok
Or maybe that's just to *use* data
2021-12-01 33541, 2021
CatQuest
mhm
2021-12-01 33503, 2021
reosarevok
Oh, seems so
2021-12-01 33512, 2021
reosarevok
Anyway, I'm sure it's doable
2021-12-01 33551, 2021
reosarevok
alastairp: how many resources would that take? Are we talking "you spending a month a year on it"? or "needs a full-time person"?
2021-12-01 33524, 2021
reosarevok
If we update once a year, I'm assuming it needs one big push to make that multi-version system work, and then just some time to update every year?
reosarevok: I think that development work on AB to support this kind of feature extraction is probably only a few months of work, if that
2021-12-01 33553, 2021
alastairp
however, I think that building up QA for algorithms, making improvements, and rolling them out is a full time job for an entire data processing team
2021-12-01 33550, 2021
reosarevok
Oh, I mean, yes, I'd expect the QA would be "hey, our community has detected these issues, whoever wants to do some research using AB, you can look into improvements for that"
2021-12-01 33500, 2021
reosarevok
I can't expect we're going to be doing the algo improvements ourselves
2021-12-01 33509, 2021
alastairp
I'm skeptical that a feedback button on an AB page to collect issues would be useful for the long-term improvement of algorithms, though
2021-12-01 33541, 2021
CatQuest
we'll be training countless neuralnets to do it for us! :D
2021-12-01 33549, 2021
alastairp
a researcher can't do anything with an mbid and "this is wrong". Perhaps they could do more with mbid + bpm annotation (in the case of bpms)
2021-12-01 33517, 2021
CatQuest
that's what I meant. the "what is wrong" must be included
2021-12-01 33533, 2021
alastairp
because really, you'd need audio in order to make improvements (this is basically a dataset)
2021-12-01 33535, 2021
reosarevok
alastairp: I was expecting they could try to find the similarities between what kind of things are wrong, if there are enough reports
2021-12-01 33541, 2021
reosarevok
But that'd anyway involve a lot of reports :)
2021-12-01 33510, 2021
alastairp
I really don't have enough experience in this area to know if many reports of that form would be useful
2021-12-01 33513, 2021
reosarevok
as in "well, we have a lot of reports for music of genre X"
2021-12-01 33527, 2021
reosarevok
"so we should specifically try to find a good amount of genre X and see what we find"
2021-12-01 33529, 2021
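A rough sketch of that idea: group structured reports (MBID plus what is wrong, as suggested earlier) by genre and compare against how many tracks each genre has, so that large genres do not dominate just by size. The report fields, genre labels and the `report_rate_by_genre` function are hypothetical; nothing like this exists in AB as far as this discussion says.

```python
from collections import Counter

def report_rate_by_genre(reports, tracks_per_genre):
    """Fraction of tracks in each genre that have at least one 'wrong value' report filed.

    Assumes at most one report per track, which a real system would need to enforce.
    """
    counts = Counter(report["genre"] for report in reports)
    return {
        genre: counts[genre] / n_tracks
        for genre, n_tracks in tracks_per_genre.items()
        if n_tracks > 0
    }

# Hypothetical reports: each says which value is wrong and what it should be.
reports = [
    {"mbid": "m1", "field": "bpm", "reported": 140, "genre": "x"},
    {"mbid": "m2", "field": "bpm", "reported": 70, "genre": "x"},
    {"mbid": "m3", "field": "key", "reported": "A minor", "genre": "x"},
    {"mbid": "m4", "field": "bpm", "reported": 128, "genre": "y"},
]
rates = report_rate_by_genre(reports, {"x": 10, "y": 100})
worst_first = sorted(rates, key=rates.get, reverse=True)
```

Whether rates like these are actually useful to a researcher without the audio is exactly the open question in the conversation.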
reosarevok
But yeah, dunno
2021-12-01 33546, 2021
alastairp
yeah, I think that large collections of features -> genres is one of the things that AB _can_ do well.
2021-12-01 33521, 2021
alastairp
unfortunately, right about the time we released it, people got all in on deep learning, which requires orders of magnitude more features than what we put in AB
2021-12-01 33503, 2021
alastairp
and training models once you have more than ~1000 examples starts taking exponentially more time
2021-12-01 33529, 2021
alastairp
so again - in small sets of research data, the data + algorithms looked good, and really did give good results
2021-12-01 33501, 2021
alastairp
but if you try and apply an 8 class genre classifier to a million tracks, you're going to have problems real quick
2021-12-01 33512, 2021
reosarevok
Sure
2021-12-01 33542, 2021
reosarevok
So essentia isn't expected to be updated with deep learning algos?
So, remind me, how does AB work with essentia again?
2021-12-01 33532, 2021
reosarevok
Is essentia the bit that runs on the files locally, then submits the data up to AB?
2021-12-01 33546, 2021
reosarevok
(just wondering about the "orders of magnitude more features than what we put in AB")
2021-12-01 33526, 2021
alastairp
right. essentia is a library of algorithms (some are "process this audio into a representation that can be used for machine learning" and some are "give me the bpm of audio"). there is a single binary which runs a bunch of different algorithms over an audio file (the 'music extractor'), which is the AB extractor. then the ab submitter takes the result of that and submits it
So "orders of magnitude more features than what we put in AB" just means "because we haven't updated it"?
2021-12-01 33513, 2021
reosarevok
Or is there actually a hardcoded issue why AB can't support those?
2021-12-01 33520, 2021
alastairp
partially yes, just a matter of adding it (as I said at the beginning of this discussion, we had already started having a discussion about adding a new data type/extractor to AB)
2021-12-01 33518, 2021
alastairp
partially no - the more detailed data that you add, the easier it becomes to reverse that data back into audio
2021-12-01 33539, 2021
alastairp
so then the question of what AB is changes a bit - do you want it to just output single, good values? (accurate bpm, key, etc). if so, you can do this but then you can't use the data in the database to improve algorithms. you have to improve them on external collections of music, then roll out a new version, and do what we discussed about rotating old versions out
2021-12-01 33526, 2021
alastairp
or maybe you want it to be a collection of detailed features that allow people to use these features independently to build new models, new algorithms etc without needing to have access to large collections of audio
2021-12-01 33500, 2021
reosarevok
So now we're doing a) ?
2021-12-01 33525, 2021
reosarevok
Also, "add detailed data, and see what ungodly mess comes back when trying to turn it back into audio" sounds hilarious
2021-12-01 33516, 2021
alastairp
current AB is a bit of both - it includes specific features that required detailed audio data (bpm, key), but then it includes the chroma features which are used for training new models
2021-12-01 33514, 2021
alastairp
https://gist.github.com/bmcfee/a40c3ab83f166a3892… this is an interesting experiment doing exactly that - we have some demo pages somewhere that allow us to play back the reproduced audio, let me see if I can find it
2021-12-01 33507, 2021
reosarevok
So the doubt is "how many more features can we allow before someone sues us for piracy"?