I should do those graphs, lossy stuff has been growing a lot more lately
kepstin-laptop
my lossless stuff hasn't finished yet
this data set's gonna be a bit more weighted towards japanese pop than most, i think ;)
ianmcorvidae
haha
probably got more estonian hip-hop than the average dataset just by my 6 CDs worth :P
kepstin-laptop
well, it can only improve the results, right?
ianmcorvidae
yup!
I was thinking I should write a crappy recommender to kick us off
something obviously terrible like levenshtein distance of the JSON :P
kepstin-laptop
hmm, something with just the low-level data? Could do something silly like just match bpm and key
you like this song in C# major at 140bpm, so you'll obviously like this other one too!
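A minimal sketch of the "match bpm and key" idea, for illustration only. The `Track` fields, the `similar_tracks` helper, and the 2 bpm tolerance are assumptions made up for this example; they are not taken from the actual low-level data format.

```python
from dataclasses import dataclass


@dataclass
class Track:
    # Hypothetical fields standing in for values read from low-level data.
    title: str
    bpm: float
    key: str    # e.g. "C#"
    scale: str  # "major" or "minor"


def similar_tracks(seed, candidates, bpm_tolerance=2.0):
    """Return candidates in the seed's key and scale with a bpm within the tolerance."""
    return [
        t for t in candidates
        if t is not seed
        and t.key == seed.key
        and t.scale == seed.scale
        and abs(t.bpm - seed.bpm) <= bpm_tolerance
    ]


library = [
    Track("liked song", 140.0, "C#", "major"),
    Track("same key and tempo", 141.0, "C#", "major"),
    Track("same key, slower", 90.0, "C#", "major"),
    Track("same tempo, other key", 140.0, "A", "minor"),
]
matches = similar_tracks(library[0], library[1:])
print([t.title for t in matches])  # ['same key and tempo']
```

Exact key matching plus a narrow tempo window is about as naive as it gets, which is the point of the suggestion.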
ianmcorvidae
that's far more sophisticated than I was thinking XD
I mean, I'm really thinking in the vein of making a truly terrible recommender that anyone can do better than, because I want to goad them into doing so :P
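For the record, the deliberately terrible "levenshtein distance of the JSON" recommender could look roughly like the sketch below. The `recommend` helper and the tiny example documents are hypothetical; real submissions are far larger, and comparing them character by character like this would be very slow as well as meaningless.

```python
import json


def levenshtein(a: str, b: str) -> int:
    """Edit distance by dynamic programming, keeping only two rows."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def recommend(seed_id, documents):
    """Return the id whose serialized JSON is textually closest to the seed's."""
    seed_text = json.dumps(documents[seed_id], sort_keys=True)
    others = [k for k in documents if k != seed_id]
    return min(
        others,
        key=lambda k: levenshtein(seed_text, json.dumps(documents[k], sort_keys=True)),
    )


# Hypothetical, drastically shortened documents.
documents = {
    "a": {"bpm": 140, "key": "C#"},
    "b": {"bpm": 141, "key": "C#"},
    "c": {"bpm": 90, "key": "A"},
}
print(recommend("a", documents))  # b
```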
CallerNo6
listeners who like songs with "satan" in the title will probably like other songs with "satan" in the title?
kepstin-laptop wonders if there's something really silly and easy you could do which would on average perform worse than random matching.
ianmcorvidae
hah
CallerNo6
I've been assured that nobody's smart enough to be wrong all the time. But it can't hurt to try?
kepstin-laptop
doesn't have to be all the time
just on average :)
(if you actually got it wrong all the time, you could presumably just flip your rating and get something actually useful)
ruaok
alastairp: do you have a sec to talk about jesus christ your lord and saviour?
er wait.
how about the schema for the highlevel table? :)
alastairp
I can see how you might confuse them
ruaok
in particular I'm thinking of what version info we should track.
alastairp
they're both world-changing
ruaok
heh. :)
alastairp
are you at the lab, or will we do it here?
ruaok
here. mom is in town and I only have half days while aleta baby-sits mom.
ruaok wishes he was in the lab
alastairp
I don't know what high-level features or algorithms will be in the output
ruaok
yeah, that too.
so, my inclination is to store: json, timestamp and essentia_git_sha
since, I am thinking that only the AB server should ever calculate high level stuff.
is that even a reasonable assumption?
alastairp
split per algorithm?
ruaok
ideally, but I just don't know if the essentia codebase is really ready for that.
I think we may just need to start with one version and get a move on.
the good thing is that we can re-calculate this at any time.
alastairp
right. that'd be a good start then
ruaok
ok, I'll get moving on that.
any signs of dima?
alastairp
if there are many algorithms, there's no difference between 1 binary that spits out lots of bits of json, and many binaries that each spit out their own
no, but he normally does afternoons, I think
I'll try and grab him as soon as I can
ruaok returns from a mom interruption
I have to put out some ssl fires on freesound first, but back to this asap
do you want to do anything about highlevel_json / raw_json table names?
ruaok
unsure.
we are not likely to need the split and view as we do for the lowlevel stuff.
first question is if ianmcorvidae intended for all the json to go into one table.
my gut instinct says to use two tables.
for scalability.
and then deciding on the names.
alastairp
right
ruaok
but ianmcorvidae is sleeping, right now.
but assuming you're ok with the columns in said tables, I'll press on for now.
changing table names during the review phase is easy.
combining tables less so, but I think having two tables is desirable.
we're not losing anything having separate tables.
alastairp
yes, I think 2 is a good idea
otherwise, fine
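A rough sketch of the two-table layout discussed above, storing the json, a timestamp, and essentia_git_sha. It uses Python's built-in sqlite3 module only so that it is self-contained; it says nothing about the database the server actually runs. The table names `highlevel` and `highlevel_json`, the `recording_id` column, and the `store_highlevel` helper are all assumptions for illustration, since the names were still undecided in this conversation.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE highlevel (
    id               INTEGER PRIMARY KEY,
    recording_id     TEXT NOT NULL,  -- hypothetical identifier column
    essentia_git_sha TEXT NOT NULL,  -- version of the code that produced the data
    submitted        TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE highlevel_json (
    id   INTEGER PRIMARY KEY REFERENCES highlevel(id),
    data TEXT NOT NULL               -- the JSON document itself
);
""")


def store_highlevel(conn, recording_id, git_sha, document):
    """Insert the metadata row and the JSON row in one transaction."""
    with conn:
        cur = conn.execute(
            "INSERT INTO highlevel (recording_id, essentia_git_sha) VALUES (?, ?)",
            (recording_id, git_sha),
        )
        conn.execute(
            "INSERT INTO highlevel_json (id, data) VALUES (?, ?)",
            (cur.lastrowid, json.dumps(document)),
        )
    return cur.lastrowid


# Placeholder values; the real high-level output was not yet known.
row_id = store_highlevel(conn, "example-recording", "abc123", {"example": "value"})
```

Keeping the large JSON payload in its own table means queries over the metadata never have to touch it, and since the data can be recalculated at any time, either table can be rebuilt.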
ruaok
ok, I'll keep moving then.
not sure I can get a PR up for the high level stuff today, but I'll try.
hm.
I'll build no locking support into the highlevel stuff.
I'm going to assume that there will be one master program that looks at the DB, determines which highlevel data needs to be calculated, and fires off a thread that will then calculate the highlevel data.
it then takes ending threads and stores the data into the DB.
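The single-master design described above might be sketched as follows. Plain dicts stand in for the database, and `find_pending`, `compute_highlevel`, and `run_once` are hypothetical names; the real calculation would come from essentia and is only a placeholder here. Because the master thread is the only writer, no locking is needed.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def find_pending(lowlevel, highlevel):
    """Ids that have low-level data but no high-level result yet."""
    return [rid for rid in lowlevel if rid not in highlevel]


def compute_highlevel(lowlevel_doc):
    """Placeholder for the real high-level calculation."""
    return {"input_keys": sorted(lowlevel_doc)}


def run_once(lowlevel, highlevel, max_workers=4):
    """Calculate missing results in worker threads; only this thread stores them."""
    pending = find_pending(lowlevel, highlevel)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(compute_highlevel, lowlevel[rid]): rid for rid in pending}
        for future in as_completed(futures):
            # Workers never touch the store; results are written here as they finish.
            highlevel[futures[future]] = future.result()
    return len(pending)


lowlevel = {"a": {"bpm": 140}, "b": {"bpm": 90, "key": "A"}}
highlevel = {"a": {"input_keys": ["bpm"]}}
print(run_once(lowlevel, highlevel))  # 1
print(run_once(lowlevel, highlevel))  # 0
```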
Nyanko-sensei joined the channel
ardoRic
does the vm update the musicbrainz-server code automatically, or should I check it out again ?
ruaok
just do a git pull on it.
it doesn't update automatically
KillDaBOB_ joined the channel
chirlu` joined the channel
KillDaBOB joined the channel
Nyanko-sensei joined the channel
ijabz1 joined the channel
kepstin-laptop
so, >100k recordings now :)
alastairp
this is great. 10% of our target in 5 days
at this rate that'll be ~400k by the end of the month, so if we get more people running it in the coming week I think 500k or more is really doable
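The arithmetic behind that projection, as a quick sketch. It assumes the submission rate stays constant and takes the target to be ten times the current count, since 100k is described as 10% of it; the variable names are only illustrative.

```python
submitted = 100_000
days_elapsed = 5
target = submitted * 10          # 100k is 10% of the target
rate = submitted / days_elapsed  # recordings per day so far


def days_until(total):
    """Days still needed to reach `total` at the current rate."""
    return (total - submitted) / rate


print(rate)                 # 20000.0
print(days_until(400_000))  # 15.0
print(days_until(target))   # 45.0
```

So ~400k corresponds to about 15 more days at the current pace, and anything beyond that by the same date needs a higher rate, hence the push for more people running the tool.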
kepstin-laptop
I've just about hit all the music I have now, though.
keeping the rate up probably really requires getting more people to run the tool :)
alastairp
right, but the only reason we've not opened this up wider is that the tools still have problems
rob is confident, and I agree with him, that we can dump this tool on 2-4x as many people immediately
which will keep up our submission speed
kepstin-laptop has started to run it on the stuff he only has in lossy formats now
kepstin-laptop
(which is a bunch of touhou arranges, mostly)
Nyanko-sensei joined the channel
ruaok
in fact, I think we should start tapping people on the shoulders quietly and ask them to jump in.
alastairp
right
ruaok
we need to get derwin in on this.
nikki is still working on her stuff
nikki
although when I'll be able to actually run it on *all* of my music is another question
ijabz1
if we can get either an osx or windows version available soon it will be a lot easier to get more users
nikki
(right now I can't do korean stuff, because apparently linux has a bug in its support for korean filenames on hfs filesystems)
JesseW joined the channel
ruaok
ijabz1: that is our goal for friday, if at all possible
ijabz1
great
jesus2099_ joined the channel
alastairp
i wish
LordSputnik
btw, have about 12k lossless tracks for scanning - are there instructions anywhere? :)