perhaps we need to have a test set of 100,000 tracks for testing that we can run quickly.
and only when those 100,000 tracks produce some decent data do we open it up for wider submissions.
and we really need to find people with large music collections and find a way to use them for bootstrapping a reboot.
should we stop all AB work until we have a new plan in place?
reosarevok
What's the current state of AB alternatives?
ruaok
in what sense, reosarevok ? open alternatives to AB?
alastairp
I think that having a plan about what we want to do with AB is a good idea
reosarevok
As in, have researchers managed better algorithms elsewhere? I guess Spotify or whoever might internally, but nothing open?
alastairp
that is, just throwing data at a database isn't working
reosarevok: there are many datasets with extracted features, based on algorithm x
ruaok
well, the algs to hand were promised to be much better than they are.
alastairp
but they fall into the same problem that caused us to start AB in the first place - that is, they're fixed in time, fixed in dataset size
ruaok
so, if we can't rely on the original premise of taking things from academia and putting them into production, then the whole value proposition of AB falls on its face.
reosarevok
Yeah, I was mostly wondering if there's something else that has already "replaced" AB
Or if what we have is the least bad there is
ruaok
nothing open. it's a very large effort.
alastairp
and I think this comes back to the question of scale. When you're testing on 1000 items and you get good results, it's easy to say that it works well
ruaok
which is why we need to start with something that runs in a reasonable amount of time, yet is representative of the whole picture.
reosarevok
I think it's fine if AB doesn't work well on African music yet, but then we should be saying "hey, we know we have this issue, who has a large, diverse collection of African music and is willing to help us with it"
I guess the problem is the whole picture is so absurdly wide
alastairp
reosarevok: right, but the question here also is if it's just a matter of collecting the data, or if you actually have to perform research on field x in order to learn how to deal with African music
ruaok
reosarevok: the problem is, how do you tell someone which parts work and which don't?
if I call an API I expect it to work or I expect to see a confidence rating of the quality of the results. We have few tools to provide such things.
alastairp
which was one of the big premises of compmusic - that you needed specific algorithms
reosarevok
Well, your users should tell you what looks wrong, I guess :) But yeah, it's hard to use that programmatically
alastairp
for example, one change to essentia since we released AB is that it now gives 5 different key estimations - as we realised that the one "standard" model that we thought worked well really only worked well on a small subset of data
reosarevok
I would be surprised if the same algorithms which can work with EDM, jazz and classical break with African music
But it might be that they don't work as well on that either :)
alastairp
but also, at the scale of AB, knowing 1 piece of data is wrong in 10m files doesn't give a huge amount of context
ruaok
reosarevok: for instance, I can't use AB in any of my playlist work. I doubt anyone else could.
reosarevok
ruaok: is it meant for that though? I thought it was meant to mostly just be a long-term slow research project :)
ruaok
if they do, they are getting shit results. and just think of how many people have already done research based on AB. clearly without vetting the results.
yes, the idea was to have results after 5 years.
reosarevok
I'd expect research to be done as in "we ran AB on this huge collection of African music and it worked / didn't work and this is what we saw"
But I guess that might not be happening
alastairp
I think that one big problem with AB is that we thought "oh yes, it can follow the research as it improves", but then we didn't make it follow essentia upgrades
reosarevok
What's the main problem with making it follow upgrades? That it needs to re-scan everything?
ruaok
alastairp: do you have any faith that updated essentia algs would actually scale better?
alastairp
yes - a combination of technical and social hurdles
I'm sure that current essentia algorithms are "better" than the AB ones
but we're stuck on the definition of better
on 20m tracks there are still going to be awful results
reosarevok
Yeah. How doable is it to *know* they are awful? I understand the automatic confidence isn't always great?
ruaok
I think if we continue with AB we need to make things "algorithms first".
first prove out that an algorithm works and scales well. then adopt it into AB and run it over data.
lucifer
80% accuracy on 20m is still 4m tracks wrong.
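The arithmetic behind that figure can be checked with a minimal sketch. The function name `expected_errors` is made up for illustration, and the sketch assumes "accuracy" means the fraction of tracks that get a correct result:

```python
def expected_errors(total_tracks: int, accuracy: float) -> int:
    """Number of tracks expected to get a wrong result at a given accuracy."""
    # Rounding absorbs floating-point noise from (1.0 - accuracy).
    return round(total_tracks * (1.0 - accuracy))


# 20m tracks is the collection size mentioned in the discussion; the extra
# accuracy levels are only there to show how slowly the error count shrinks.
for accuracy in (0.80, 0.90, 0.99):
    wrong = expected_errors(20_000_000, accuracy)
    print(f"{accuracy:.0%} accurate on 20m tracks: {wrong:,} wrong")
```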
reosarevok
Also, how doable would it be to combine submitting LB listens with AB submission?
ruaok
reosarevok: not doable at all.
reosarevok
For people running local plugins on like VLC or something
alastairp
that's what I was looking at this morning on the BPM algorithms - I had hoped that the histogram strength would show us when there was uncertainty - but in many cases it was pretty confident in its result
ruaok
90% of our listens come from spotify.
lucifer
spotify provides an audio analysis api fwiw so we could get that data for comparison with ab at least.
reosarevok
ruaok: sure, I'm asking for old school people
alastairp
and hence 95% of research is data management and evaluation
reosarevok
Since I'm assuming we won't be getting access to all of Spotify :p
Ideally of course something like AB would have an agreement with something like Spotify, but I assume everyone in that market already has their own inhouse stuff and are not willing to help anybody else
lucifer: 80% accuracy is probably the most you can hope for, really - I mean, people shouldn't expect magic when using automatic stuff
If you want perfection, use human-built playlists
lucifer
reosarevok: indeed and looking at the research paper that describes the current ab algorithm, my understanding is that 80% accuracy is the best case.
reosarevok
My Spotify release radar for example is a huge mess, playlist-wise, so either they don't even try to sort it, or they do a terrible job of it
(it's usually full of "rap-classical-metal-rap-classical" in random orders like that)
CatQuest
hah
ruaok
I think I am going to spend some time working out if the annoy stuff has any utility. because to date, I haven't been convinced of that.
CatQuest
btw I mean I would happily submit ab stuff
reosarevok
So maybe the main issue is not AB data as much as expectations
CatQuest
also yes, i mean it's automated
reosarevok
alastairp: how often is essentia updated?
CatQuest
I was thinking that having a way to let users on eg mb give feedback on ab data shown on recordings might be useful?
reosarevok
I don't think it'd be doable to ask people to resubmit more than once a year or so, but having a new data version every year might not be that bad?
alastairp
reosarevok: we try and keep it up to date with new updates to algorithms as they are released
but again, only a few people involved in doing that
CatQuest
like if a lot of people like "downvote" a bpm tag from ab
reosarevok
alastairp: sure, but how often are algorithms released? :D
CatQuest's point isn't bad either, the more data we show (as "we don't know if this is good") in MB and elsewhere, the more we could find where we have stuff that just looks bad
CatQuest
:D
alastairp
improvements happen all the time. but sometimes that improvement is "we no longer screw up on this small part of this test dataset"
reosarevok
alastairp: so would it be doable to say "we package all new improvements for the year once a year, and offer a new version of AB that supports that, but needs re-scanning"?
ruaok
so, my feeling is that the only AB work that should be happening in the short term is to find new algs that are usable and making a plan for how to reboot.
alastairp
reosarevok: that was one of the original ideas
reosarevok
I guess it would make the data take a huuuge amount of space though if we have yearly versions of all the data?
CatQuest
archive old data?
alastairp
reosarevok: so maybe retire old versions? but then what do you do if an MBID gets processed with n-5 and never gets re-done. do you accept the old (maybe worse) version, or do you delete it?
CatQuest
mark it as old but keep
reosarevok
alastairp: maybe retire old versions *except* for stuff not in any newer version?
alastairp
reosarevok: yes, perhaps
reosarevok
And then allow people to optionally ask the API for "latest version, but fill the gaps with historical"?
CatQuest
show on mb that it needs to be rescanned. call to people for rescanning
reosarevok
So you can choose if you only want the latest, or all
CatQuest
also also, make scanning easier. much, much easier
oh i like that idea reo
alastairp
reosarevok: that was my idea for what to do when we got a new version of the extractor. stop accepting the old one, when you request an mbid get the new one if it exists otherwise use the old one
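The lookup rule described here, together with the "only the latest, or all" option suggested earlier, could look roughly like the sketch below. It is not AB code: `CURRENT_VERSION`, `accepts_submission`, `get_features`, the integer version numbers and the in-memory `store` layout are all assumptions made for illustration.

```python
from typing import Optional

# Assumed: extractor versions are plain integers and only this one is accepted.
CURRENT_VERSION = 2

# Hypothetical store layout: MBID -> {extractor version: feature document}.
Store = dict[str, dict[int, dict]]


def accepts_submission(extractor_version: int) -> bool:
    """Stop accepting submissions made with an older extractor."""
    return extractor_version == CURRENT_VERSION


def get_features(store: Store, mbid: str, latest_only: bool = False) -> Optional[dict]:
    """Return the current-version document if it exists, otherwise the newest old one.

    With latest_only=True the caller gets None rather than historical data.
    """
    versions = store.get(mbid, {})
    if CURRENT_VERSION in versions:
        return versions[CURRENT_VERSION]
    if latest_only or not versions:
        return None
    # Fill the gap with the most recent historical version we still hold.
    return versions[max(versions)]


store: Store = {
    "mbid-a": {1: {"bpm": 120}, 2: {"bpm": 121}},
    "mbid-b": {1: {"bpm": 90}},
}
print(get_features(store, "mbid-a"))
print(get_features(store, "mbid-b"))
print(get_features(store, "mbid-b", latest_only=True))
```

Keeping the fallback behind a flag lets a caller choose between maximum coverage and only data from the newest extractor.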
CatQuest
having some way to scan ab with picard would be :chef:
alastairp
this was always a long-term plan, but it relied on having AB dev resources, having a stable release cadence for essentia, etc, etc
CatQuest
:(
reosarevok
alastairp: sounds good, although I think we could still have a way to specifically say "I would rather get 0 results than old results"
CatQuest: there's a Picard plugin, but I dunno how well it works?
CatQuest
I still think it can happen. just. idk, lb is being prioritized now. if prioritizing ab will make lb better, I'm sure we can do that
reosarevok
Or maybe that's just to *use* data
CatQuest
mhm
reosarevok
Oh, seems so
Anyway, I'm sure it's doable
alastairp: how many resources would that take? Are we talking "you spending a month a year on it"? or "needs a full-time person"?
If we update once a year, I'm assuming it needs one big push to make that multi-version system work, and then just some time to update every year?
BrainzGit
[troi-recommendation-playground] mayhem opened pull request #41 (main…year-review): Year in music and a whole pile of other general development https://github.com/metabrainz/troi-recommendati...
alastairp
reosarevok: I think that development work on AB to support this kind of feature extraction is probably only a few months of work, if that
however, I think that building up QA for algorithms, making improvements, and rolling them out is a full time job for an entire data processing team
reosarevok
Oh, I mean, yes, I'd expect the QA would be "hey, our community has detected these issues, whoever wants to do some research using AB, you can look into improvements for that"
I can't expect we're going to be doing the algo improvements ourselves
alastairp
I'm skeptical that a feedback button on an AB page to collect issues would be useful for the long-term improvement of algorithms, though
CatQuest
we'll be training countless neuralnets to do it for us! :D
alastairp
a researcher can't do anything with an mbid and "this is wrong". Perhaps they could do more with mbid + bpm annotation (in the case of bpms)
CatQuest
that's what I meant. the "what is wrong" must be included
alastairp
because really, you'd need audio in order to make improvements (this is basically a dataset)
reosarevok
alastairp: I was expecting they could try to find the similarities between what kind of things are wrong, if there are enough reports
But that'd anyway involve a lot of reports :)
alastairp
I really don't have enough experience in this area to know if many reports of that form would be useful
reosarevok
as in "well, we have a lot of reports for music of genre X"
"so we should specifically try to find a good amount of genre X and see what we find"
But yeah, dunno
alastairp
yeah, I think that large collections of features -> genres is one of the things that AB _can_ do well.
unfortunately, right about the time we released it, people got all in on deep learning, which requires orders of magnitude more features than what we put in AB
and training models once you have more than ~1000 examples starts taking exponentially more time
so again - in small sets of research data, the data + algorithms looked good, and really did give good results
but if you try and apply an 8-class genre classifier to a million tracks, you're going to have problems real quick
reosarevok
Sure
So essentia isn't expected to be updated with deep learning algos?
So, remind me, how does AB work with essentia again?
Is essentia the bit that runs on the files locally, then submits the data up to AB?
(just wondering about the "orders of magnitude more features than what we put in AB")
alastairp
right. essentia is a library of algorithms (some are "process this audio into a representation that can be used for machine learning" and some are "give me the bpm of audio"). there is a single binary which runs a bunch of different algorithms over an audio file (the 'music extractor'), which is the AB extractor. then the ab submitter takes the result of that and submits it
reosarevok
So "orders of magnitude more features than what we put in AB" just means "because we haven't updated it"?
Or is there actually a hardcoded issue why AB can't support those?
alastairp
partially yes, just a matter of adding it (as I said at the beginning of this discussion, we had already started having a discussion about adding a new data type/extractor to AB)
partially no - the more detailed data that you add, the easier it becomes to reverse that data back into audio
so then the question of what AB is changes a bit - do you want it to just output single, good values? (accurate bpm, key, etc). if so, you can do this but then you can't use the data in the database to improve algorithms. you have to improve them on external collections of music, then roll out a new version, and do what we discussed about rotating old versions out
or maybe you want it to be a collection of detailed features that allow people to use these features independently to build new models, new algorithms etc without needing to have access to large collections of audio
reosarevok
So now we're doing a) ?
Also, "add detailed data, and see what ungodly mess comes back when trying to turn it back into audio" sounds hilarious
alastairp
current AB is a bit of both - it includes specific features that required detailed audio data (bpm, key), but then it includes the chroma features which are used for training new models
https://gist.github.com/bmcfee/a40c3ab83f166a38... this is an interesting experiment doing exactly that - we have some demo pages somewhere that allow us to play back the reproduced audio, let me see if I can find it
reosarevok
So the doubt is "how many more features can we allow before someone sues us for piracy"?