this is interesting, because I have no idea exactly how unique it's going to be
2014-10-10 28354, 2014
alastairp
different build of ffmpeg, same file? maybe different
2014-10-10 28359, 2014
alastairp
mp3, flac, definitely different
2014-10-10 28312, 2014
alastairp
mp3, different mp3? maybe different
2014-10-10 28321, 2014
ianmcorvidae
well, the notion is that if the data's completely the same it's not worth keeping both
2014-10-10 28330, 2014
ianmcorvidae
if there's more that should go into that calculation then that's also fine
2014-10-10 28334, 2014
alastairp
yes, true
2014-10-10 28339, 2014
ianmcorvidae
just trying to do better than "keep the last 5 that happened to be submitted"
2014-10-10 28350, 2014
alastairp
right
2014-10-10 28353, 2014
ianmcorvidae
(or "keep only the first, or the first lossless")
2014-10-10 28316, 2014
alastairp
it's just that "the same" in terms of features can be different
2014-10-10 28324, 2014
alastairp
I agree that "exactly the same" is useless
2014-10-10 28338, 2014
ianmcorvidae
well, sure, though this is on the whole JSON data
2014-10-10 28339, 2014
alastairp
but it's a big change (and potentially computationally expensive) for just that
2014-10-10 28353, 2014
ianmcorvidae
so it should also change for things like build differences etc., with what I have
2014-10-10 28302, 2014
alastairp
right, I suspect that this will only dedup the exact same person running it twice
2014-10-10 28306, 2014
alastairp
yeah, it'll change on build
2014-10-10 28325, 2014
alastairp
it makes me feel a little funny, so I'll wait for rob to weigh in
2014-10-10 28329, 2014
alastairp
(thanks though!)
2014-10-10 28341, 2014
alastairp
I'm just fixing the json exporter for you
2014-10-10 28344, 2014
ianmcorvidae
just in terms of it not creating much uniqueness?
2014-10-10 28347, 2014
alastairp
yeah
2014-10-10 28320, 2014
ianmcorvidae
fair enough -- I think something like splitting things up a bit might make sense eventually, such that smaller things are stored individually (so that each version thing only needs storing once, for example, but also if the whole lowlevel category comes out the same, or so) -- not sure exactly where to break it up
2014-10-10 28307, 2014
ianmcorvidae
which is the way to make this catch things better, I think, isolate "the tags changed" from "the build changed" from "the features changed"
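A minimal sketch of the per-section idea discussed above, hashing each top-level part of the JSON separately so that "the tags changed", "the build changed" and "the features changed" can be told apart. The section names (`tags`, `version`, `lowlevel`) and the flat layout are assumptions made for illustration, not the actual AcousticBrainz document structure, and SHA-1 is just one possible digest.

```python
import hashlib
import json


def section_digests(document):
    """Return one hex digest per top-level section of a feature document."""
    digests = {}
    for name, section in document.items():
        # Sort keys so that equal data always serialises to the same bytes.
        canonical = json.dumps(section, sort_keys=True, separators=(",", ":"))
        digests[name] = hashlib.sha1(canonical.encode("utf-8")).hexdigest()
    return digests


def changed_sections(old, new):
    """List the section names whose content differs between two documents."""
    old_digests = section_digests(old)
    new_digests = section_digests(new)
    names = set(old_digests) | set(new_digests)
    return sorted(n for n in names if old_digests.get(n) != new_digests.get(n))


# Hypothetical documents: same tags and features, different build.
first = {
    "tags": {"title": ["Some Song"]},
    "version": {"extractor_build": "A"},
    "lowlevel": {"average_loudness": 0.5},
}
second = {
    "tags": {"title": ["Some Song"]},
    "version": {"extractor_build": "B"},
    "lowlevel": {"average_loudness": 0.5},
}
print(changed_sections(first, second))
```

With something like this, two submissions that differ only in the build section could share the stored tags and lowlevel data instead of counting as entirely distinct.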
I understand why he didn't escape keys - they're supposed to all come from internal code and you should never name a pool (essentia term for a key) with that
2014-10-10 28333, 2014
ianmcorvidae
yeah, makes sense
2014-10-10 28334, 2014
alastairp
but in the case of tags it just gets everything from taglib and dumps it there
2014-10-10 28305, 2014
ianmcorvidae
I figured it was something like that XD
2014-10-10 28351, 2014
alastairp
ok, fixed in another branch, you can merge it if you want
2014-10-10 28327, 2014
alastairp
I need to redo the branches, one for each of my fixes and one combining everything for us guys - it's because I want dmitry to be able to pull what he wants into master
2014-10-10 28302, 2014
alastairp
unfortunately it might mean we end up with hashids in abz that are no longer in the tree. that'll be annoying
next thing I want to do when we have more data is meta-stats over the mb database
2014-10-10 28327, 2014
alastairp
how many complete albums, how much of an artist's collection
2014-10-10 28346, 2014
alastairp
then meta-meta stats. how many pop albums as determined by lastfm tags (when are we getting genres?)
2014-10-10 28324, 2014
ijabz1
alastairp: the only mb data you are storing is mbrecordingid, or are you storing acoustid as well?
2014-10-10 28319, 2014
ijabz1
in acoustbrainz ?
2014-10-10 28321, 2014
alastairp
only recordingid
2014-10-10 28326, 2014
alastairp
in the case where there is no recordingid tag in the file I want to do an acoustid lookup, it would be fine to add that as additional metadata
2014-10-10 28341, 2014
ijabz1
I just wonder because one acoustid can match multiple recordingids, and when that is the case there is the chance that the mapping to recordingid is wrong
2014-10-10 28347, 2014
alastairp
once I add that functionality we can also add an option to always submit acoustid if the person has it installed
2014-10-10 28348, 2014
alastairp
right
2014-10-10 28310, 2014
alastairp
ok, so if we find recordingid by fingerprinting it's a good idea to submit acoustid too
2014-10-10 28310, 2014
ijabz1
having the acoustid would allow you to postcheck bad data at a later date
imho, matching acoustid isn't a good idea, it would mean some kind of autotagging, which will lead to many errors (acoustid associated with incorrect recording, acoustid matching multiple recordings, etc...), imho you should just encourage people to tag their files using Picard (which is using acoustid, but user can check if correct)
2014-10-10 28301, 2014
ijabz1
Picard does autotag, i dont think people are going to want to start retagging their collection in order to contribute to acoustbrainz
2014-10-10 28344, 2014
ijabz1
im just saying that if their files already contain an acoustid its useful to send that, as it helps verify that the data is correct or indeed incorrect
2014-10-10 28306, 2014
alastairp
agreed. I would propose matching with acoustid but marking the data as such
2014-10-10 28325, 2014
alastairp
e.g. "I'm happy to deal with potentially bad files" or "I only want almost certain files"
2014-10-10 28343, 2014
ijabz1
Maybe, thats not really what Im saying though, Ill try again.
2014-10-10 28339, 2014
ijabz1
If the songs have an mbrecordingid then they have already been tagged by some method, be that Picard, SongKong, etc.
2014-10-10 28312, 2014
ijabz1
You just need the mbrecordingid to serve as the key, but if the user has additional metadata such as acoustid already in the file then they should send
2014-10-10 28341, 2014
ijabz1
that as well, this helps verify at some later stage if you see bad data
2014-10-10 28335, 2014
ijabz1
e.g, The Acoustid for that MBRecordingid matches to many MBRecordingIds, higher risk
2014-10-10 28315, 2014
ijabz1
or vice versa, none of the Acoustids known for that MBRecordingid match the one sent by the user, higher risk
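The two "higher risk" cases above can be sketched as a small check. This is only an illustration under stated assumptions: `known` is a hypothetical in-memory mapping from AcoustID to the set of recording IDs it is linked to, and `acoustid_risk` is an invented helper name; a real check would query AcoustID or MusicBrainz data instead.

```python
def acoustid_risk(recording_id, submitted_acoustid, known):
    """Return the reasons a submission looks risky (empty list if none).

    known: hypothetical dict mapping acoustid -> set of recording ids.
    """
    reasons = []

    # Case 1: the submitted AcoustID is linked to more than one recording.
    linked = known.get(submitted_acoustid, set())
    if len(linked) > 1:
        reasons.append("acoustid matches multiple recordings")

    # Case 2: AcoustIDs are known for this recording, but not the one sent.
    known_for_recording = {
        acoustid for acoustid, recordings in known.items()
        if recording_id in recordings
    }
    if known_for_recording and submitted_acoustid not in known_for_recording:
        reasons.append("acoustid not among those known for this recording")

    return reasons


# Hypothetical data: a1 is shared by two recordings, a2 belongs to r3 only.
known = {"a1": {"r1", "r2"}, "a2": {"r3"}}
print(acoustid_risk("r1", "a1", known))
print(acoustid_risk("r3", "a2", known))
```

Because the AcoustID is stored alongside the recording ID rather than used for matching, a check like this could be run at any later date without re-submitting anything.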
2014-10-10 28356, 2014
zas
it reminds me of http://musicbrainz.org/release/09186fe9-18af-47e5… where acoustids on both discs are the same, second disc has tracks without main voices... ;) acoustids are totally messed up on this one, kinda expected
2014-10-10 28343, 2014
zas
i wonder how track 1-1 and track 2-2 can share the same acoustid (now)
2014-10-10 28328, 2014
alastairp
ijabz1: we send every tag that taglib finds
2014-10-10 28338, 2014
zas
i mean track 1-2 and 2-1
2014-10-10 28304, 2014
alastairp
if taglib tells us there is a tag for acoustid, it'll get sent (I'm not sure if this means that taglib needs to know how to parse an acoustid tag)
alastairp: Always storing AcoustIDs wouldn't be bad either, since recordings do sometimes need to be split up.
2014-10-10 28339, 2014
Freso
Also, caught up with back log: nvm. ;)
2014-10-10 28343, 2014
ijabz1 joined the channel
2014-10-10 28308, 2014
Nyanko-sensei joined the channel
2014-10-10 28346, 2014
ijabz1 joined the channel
2014-10-10 28316, 2014
ijabz1 joined the channel
2014-10-10 28305, 2014
Freso
Man. Those show stopper bugs are really annoying. :(
2014-10-10 28302, 2014
ianmcorvidae joined the channel
2014-10-10 28335, 2014
Leftmost joined the channel
2014-10-10 28306, 2014
ijabz1 joined the channel
2014-10-10 28341, 2014
tungol joined the channel
2014-10-10 28351, 2014
21WABMWUW joined the channel
2014-10-10 28313, 2014
yeeeargh
i'm a bit curious what kind of music you guys are scanning. i didn't encounter that chord-bug once yet. the only errors i got were a bunch of replaygain/silence bugs with tracks which were either literally silence or tracks with a large amount of silence between two songs (hidden tracks)
2014-10-10 28337, 2014
Freso
I've scanned some hip hop, R&B, dancehall, Christmas music stuff, pop, folk/trad., ...
2014-10-10 28333, 2014
Freso
I have one thread hanging right now, but have had several bailing out on a "IOError: [Errno 2] No such file or directory: '/tmp/tmpy5hQly.json'"
2014-10-10 28301, 2014
alastairp
yeah, that'll be because the extractor fails to write the file, and the submitter tries to blindly open it
2014-10-10 28311, 2014
Freso
Yep.
2014-10-10 28312, 2014
alastairp
bug fix for that will be coming in the weekend
2014-10-10 28319, 2014
Freso
And it's consistent for the files it happens to.
2014-10-10 28321, 2014
Freso
...
2014-10-10 28338, 2014
Freso
Which, in retrospect, I should have probably collected somewhere for easy re-submission...
2014-10-10 28303, 2014
alastairp
yeah, we have no way of marking a file as submitted, or bad for submitting
2014-10-10 28313, 2014
alastairp
also, it'd be nice to know how the extractor failed on those ones
2014-10-10 28317, 2014
alastairp
to report bugs if needed
2014-10-10 28353, 2014
Freso
Yep. But I figure the ones I run into are the ones already reported last night, so I'll wait until those are sorted out before reporting new stuff. :)
2014-10-10 28326, 2014
Freso
It would also be nice if it would continue with the rest of the queued files and then report at the end which ones didn't work...
2014-10-10 28350, 2014
alastairp
code. patch. etc
2014-10-10 28301, 2014
alastairp
seriously, I have about 4 different things going on here
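A rough sketch of the behaviour asked for above: do not open the extractor's output blindly, keep going through the queue when one file fails, and report the failures at the end. The `extract` and `submit` callables are hypothetical stand-ins for the real extractor and submitter, which are not shown in this log.

```python
import json


def submit_all(paths, extract, submit):
    """Process every queued file and return a list of (path, error) failures.

    extract(path) is assumed to return the name of a JSON file it wrote;
    submit(features) is assumed to send the parsed features to the server.
    """
    failures = []
    for path in paths:
        try:
            result_file = extract(path)
            # If the extractor never wrote the file, open() raises here
            # and the failure is recorded instead of stopping the run.
            with open(result_file) as handle:
                features = json.load(handle)
            submit(features)
        except (IOError, OSError, ValueError) as error:
            failures.append((path, str(error)))
    return failures


def report(failures):
    """Print one line per failed file so they can be collected and retried."""
    for path, error in failures:
        print("failed: %s (%s)" % (path, error))
```

The returned list also gives a simple place to collect the failing files for later re-submission or for bug reports against the extractor.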