but this could be so amazing if it all comes together
mayhem
what is challenging about the datasets?
alastairp
just so many cool ideas
mayhem
I know that feeling, but not in that context.
alastairp
mostly that it's very easy to say what we want to do
but actually doing it takes a bunch more
artist filtering is a good example
"artist filtering" - 2 words!
mayhem
yet, ugly bag of nastiness.
alastairp
it touches dataset editor, training, access to the mb database
mayhem
what are you filtering artists on? what is the goal?
alastairp
the generally accepted practice in machine learning is that you shouldn't have more than one example in a dataset by the same artist
because there is the chance of the algorithm learning the style of an artist, or worse, the production style of an album
so the idea is to let someone make a dataset, but when they go to the summary view, we can say "you have 150 items in this class, but once we compare to all the other classes you have, we have to remove 48 of them because there are items in other classes that share the same artist"
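[Editor's sketch, not from the conversation: one way the summary-view count described above could be computed. The data layout (a dict mapping class name to (recording_id, artist_id) pairs) and the function name artist_filter_summary are hypothetical, and the rule used here, dropping every item whose artist appears in more than one class, is only one possible reading of the idea. It assumes artist IDs are already canonical, so aliases and merged IDs are not handled.]

```python
from collections import defaultdict


def artist_filter_summary(dataset):
    """Count, per class, the items whose artist also appears in another class.

    dataset: {class_name: [(recording_id, artist_id), ...]}  (hypothetical layout)
    """
    # Map each artist to the set of classes it appears in.
    classes_by_artist = defaultdict(set)
    for class_name, items in dataset.items():
        for _recording_id, artist_id in items:
            classes_by_artist[artist_id].add(class_name)

    summary = {}
    for class_name, items in dataset.items():
        # An item conflicts if its artist is present in more than one class.
        removed = sum(
            1 for _recording_id, artist_id in items
            if len(classes_by_artist[artist_id]) > 1
        )
        summary[class_name] = {"total": len(items), "removed": removed}
    return summary


# Made-up example data: artist-b appears in both classes.
example = {
    "rock": [("r1", "artist-a"), ("r2", "artist-b"), ("r3", "artist-c")],
    "pop": [("r4", "artist-b"), ("r5", "artist-d")],
}
summary = artist_filter_summary(example)
for name, counts in summary.items():
    print(f"{name}: {counts['total']} items, {counts['removed']} would be removed")
```

[A live editor would presumably rerun something like this whenever an item is added or removed. Where the artist IDs come from is left open here.]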
mayhem
and even there are tricky things. are you going to consider performed as artists? i.e. snoop dogg and calvin broadus in the same set.
alastairp
you should write a paper about this!
I would put money on the fact that no one has ever considered this
mayhem
thanks, but... no.
alastairp
:D
Freso
alastairp_: But we consider it all the time?
mayhem
not sure if this is misplaced optimism, but I think that the data sets that MeB/UPF produce are going to be far more interesting and thought out than other things that exist out there.
alastairp
anyway, in principle easy, but the live feedback in the dataset editor is something I'd really like, which opens up a few more questions
do we use metadata in the lowlevel files? get it from musicbrainz?
merged artist ids are the same thing
Freso: yeah, yeah
Freso
;)
alastairp
yeah, that's point 1.4
we already know that the hacky datasets that we made are better
but they're still not live
zas: wiki http doesn't redirect to https. is that expected?
zas
nope (at least since we decided to move to https), can you create a ticket for it?
alastairp
will do
Freso: are we planning on splitting out the ideas page for SoC like last year's one?
if yes, I'd be happy to help you work towards that tomorrow morning
Freso
alastairp: I don't know.
alastairp: I didn't do the splitting last year, and I'm not really involved with GSoC this year.
alastairp
ah! ok
mayhem
alastairp: yes, that'd be nice.
we just need to lift the stuff from last year's page.
hibiscuskazeneko joined the channel
alastairp
mayhem: done
splitting now
I just copied everything directly. We [that is, not me] will need to edit stuff that's changed
I hate mediawiki
someone else can fix picard :)
SothoTalKer
tsk
Freso
alastairp: There you go! SothoTalKer volunteered!
alastairp
thanks, SothoTalKer!
CallerNo6
alastairp, you can always dump mediawiki drudgery on me
on my plate? Idioms are hard.
SothoTalKer
oh, let's see whoever is faster. i need to prepare some food :)
ibrahimsharaf joined the channel
ibrahimsharaf
Hello developers
mayhem
CallerNo6: you did the awesome ideas page for last year, yeah?
I'd love the same treatment for this year, please.
In fact, it might be good to make a template for us to copy, so we don't reinvent the wheel each year.
CallerNo6
copy]
Freso
Hi ibrahimsharaf
ibrahimsharaf
I've been researching for GSoC 2017 ideas, and I've been interested in AcousticBrainz (New machine learning infrastructure)
I am good with C++, Python, and I have basic ML knowledge
I've been playing with scikit-learn for some time, solved some Kaggle problems
alastairp
mayhem: I've already copied it
ibrahimsharaf
So how can I start?
alastairp
CallerNo6: this year picard is on the list. I tried to make a nice table with a same-size logo as all the other projects, but I failed
ibrahimsharaf: wow, I just wrote that 15 minutes ago!
ibrahimsharaf: we're still working out exactly what we want this project to involve
CatQuest
Freso: I see that as a weird kissyface emoticon o_O
gcilou
Freso: ?
alastairp
it's also worth noting that SoC doesn't start for a long time! We have some people interested in working on these projects before SoC starts
gcilou
Oh
Yeah
CatQuest
Soc?
alastairp
we'd love for people to participate in our projects outside of the program too
gcilou
Summer of code
alastairp
CatQuest: it's that time of year again
CatQuest
arg, stay with one acronym!
alastairp
it's always been SoC!
(it's not GCI)
gcilou
Or GSoc
CatQuest
before people used GosC
alastairp
oh
CatQuest
erh GSoC
alastairp
sorry, to me they're identical
CatQuest
ಠ_ಠ
ibrahimsharaf
@Freso I'll check the link out, thanks
Freso
Google Service on Chip
CatQuest
wow just learned if I shift alt on the - key i get —
oohh noo not the hyphens!
Freso
That's not a hyphen.
SothoTalKer
endash
alastairp
ibrahimsharaf: if you are interested in AcousticBrainz development in general, it would be a good idea to follow the acousticbrainz-specific getting-started guide at https://wiki.musicbrainz.org/Development/Summer... and set up the server too
CatQuest
I've officially given up a long time ago so you might as well not bother :D
SothoTalKer
the picard logo is bigger, that's why it also is bigger in the wiki
alastairp
yep, I got that far :)
SothoTalKer
gcilou was the one with the image editing skills, no? :D
gcilou
Si
bitmap
mayhem: do you think we should copy over any of the MB ideas from 2016? I don't remember anyone expressing interest in any of them last year...