alastairp: Likely. :) It's also a young (sub)site, so IMHO it's fine to fumble around for a bit.
2016-01-05 00543, 2016
Freso
Girish_: 47 open tasks right now.
2016-01-05 00538, 2016
alastairp
OK, tweeted
2016-01-05 00544, 2016
alastairp
hah
2016-01-05 00511, 2016
alastairp
from post to like by jherskowitz in 30 seconds
2016-01-05 00516, 2016
Freso goes to like and retweet
2016-01-05 00511, 2016
alastairp
so many notifications
2016-01-05 00546, 2016
alastairp
I never get this many likes on my personal tweets
2016-01-05 00555, 2016
alastairp
oh, that’s right. because I never tweet
2016-01-05 00510, 2016
ruaok
maybe if you added a picture to your account it wouldn't look like you're a spammer. :)
2016-01-05 00523, 2016
ruaok
doesn't even have to be a picture of you.
2016-01-05 00546, 2016
alastairp
I happen to like my egg thank you very much
2016-01-05 00515, 2016
Girish_ has quit
2016-01-05 00524, 2016
LordSputnik
Freso: and no, I don't think Last.fm have fixed anything, but I found a working API endpoint that can get the user's play counts, and another that can get their loved tracks
2016-01-05 00530, 2016
kepstin
they never broke any of their apis, afaik. only the ability to get new api keys
2016-01-05 00548, 2016
alastairp
nah, they quite seriously broke the output of many apis
2016-01-05 00554, 2016
kepstin
oh, huh?
2016-01-05 00558, 2016
alastairp
invalid xml, different results
2016-01-05 00514, 2016
kepstin
well, the only one I care about was the listening history, which at least still appears to be complete
2016-01-05 00528, 2016
ruaok
ok, off to the epiphany parade.
2016-01-05 00553, 2016
LordSputnik
kepstin: the user library API is still completely dead
2016-01-05 00502, 2016
LordSputnik
library.getTracks has been deleted
2016-01-05 00535, 2016
LordSputnik
also one of getAlbums or getArtists, can't remember which
2016-01-05 00548, 2016
alastairp
ruaok: have fun!
2016-01-05 00524, 2016
kepstin
well, the albums/artists ones weren't really that useful due to lack of disambiguation, and they can be re-created from the listening history anyways
2016-01-05 00552, 2016
kepstin
ah, the api you want to use is 'user.getrecenttracks', which is a paginated list of complete scrobbling history including track mbids if submitted
2016-01-05 00509, 2016
kepstin
still works afaik, dunno if the format got changed
2016-01-05 00519, 2016
LordSputnik
kepstin: yeah that's what I moved beets over to yesterday ;) I spent some time looking through the API docs, to see what could replace Library.getTracks, and found that one
2016-01-05 00536, 2016
kepstin
that
2016-01-05 00501, 2016
kepstin
that's always the one I've used, since the library stuff was annoying due to last.fm library not being very useful
2016-01-05 00519, 2016
LordSputnik
although annoyingly pylast doesn't give back MBIDs, so I had to derive a custom class that did
2016-01-05 00534, 2016
kepstin
and the getrecenttracks of course includes the time of each play, which i don't think the library api included?
2016-01-05 00555, 2016
kepstin
in any case, for a listenbrainz import, the getrecenttracks api is far more complete than the current page-scraping method; i guess the only reason it wasn't used was the inability to get a new api key?
2016-01-05 00506, 2016
kepstin
(which has since been fixed)
2016-01-05 00531, 2016
reosarevok
IIRC they were saying they would move to a slower method soon
2016-01-05 00536, 2016
reosarevok
So I imagine that's what they meant?
2016-01-05 00538, 2016
reosarevok
But I dunno :)
2016-01-05 00532, 2016
kepstin
pretty much the main reason I haven't imported my stuff to listenbrainz yet is the fact that it didn't pull in the mbids from my scrobbles.
2016-01-05 00508, 2016
kepstin
(do note that while the 'getrecenttracks' api returns artist, album, and track mbids, only the *track* mbid comes from the scrobble, the others are matched in the last.fm server and are often missing or wrong)
2016-01-05 00533, 2016
kepstin
(and the "track" mbid is of course from pre-ngs, so it corresponds to an ngs recording)
2016-01-05 00511, 2016
alastairp
mmm, messy data is messy
2016-01-05 00558, 2016
kepstin
to be specific, the 'track' mbid in last.fm comes from the 'MUSICBRAINZ_TRACKID' tag (or equivalent in other formats), which in post-ngs picard versions is set to the recording id.
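[Editor's sketch] The paging that kepstin describes for 'user.getrecenttracks' could look roughly like the following. This is a minimal sketch, not the ListenBrainz importer: the JSON field names ("recenttracks", "track", "#text", "date"/"uts", "@attr"/"totalPages") and the API root are assumptions about the Last.fm response format, and `page_params`, `iter_listens` and `fetch_page` are hypothetical names. Only the track MBID is kept, since per the discussion it is the only one that comes from the scrobble itself.

```python
from typing import Callable, Dict, Iterator

# Assumed API root; check the Last.fm docs before relying on it.
API_ROOT = "https://ws.audioscrobbler.com/2.0/"


def page_params(user: str, api_key: str, page: int, limit: int = 200) -> Dict[str, str]:
    """Query parameters for one page of user.getrecenttracks."""
    return {
        "method": "user.getrecenttracks",
        "user": user,
        "api_key": api_key,
        "format": "json",
        "limit": str(limit),
        "page": str(page),
    }


def iter_listens(fetch_page: Callable[[int], dict]) -> Iterator[dict]:
    """Walk every page and yield one dict per scrobble.

    fetch_page(n) must return the decoded JSON for page n. It is injected
    so the paging logic can be exercised without network access.
    """
    page = 1
    while True:
        body = fetch_page(page)["recenttracks"]
        tracks = body.get("track", [])
        if isinstance(tracks, dict):  # a one-track page may not be wrapped in a list
            tracks = [tracks]
        for t in tracks:
            if "date" not in t:  # assumed shape of a "now playing" entry
                continue
            yield {
                "artist": t["artist"]["#text"],
                "track": t["name"],
                "listened_at": int(t["date"]["uts"]),
                # Empty string means no MBID was submitted with the scrobble.
                "recording_mbid": t.get("mbid") or None,
            }
        total_pages = int(body.get("@attr", {}).get("totalPages", page))
        if page >= total_pages:
            break
        page += 1
```

With an HTTP client of your choice, `fetch_page` would be a small function that requests `API_ROOT` with `page_params(user, api_key, n)` and decodes the JSON body.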
2016-01-05 00547, 2016
regagain has quit
2016-01-05 00529, 2016
kepstin
interesting. so acousticbrainz currently covers about 1/4 of the tracks in the msd?
2016-01-05 00524, 2016
kepstin
I'm assuming that's probably not intentional, but is just overlap between data people submitted naturally and data in the msd.
reosarevok, I am not sure that the existing setup is ready for the modifications he wishes to make.
2016-01-05 00529, 2016
reosarevok
That's a perfectly valid answer for them too I guess :)
2016-01-05 00520, 2016
alastairp
kepstin: yeah
2016-01-05 00531, 2016
alastairp
honestly, I had hoped we would have more
2016-01-05 00547, 2016
alastairp
the matching algorithm is pretty strict though, I think we can easily find a bunch more
2016-01-05 00532, 2016
alastairp
though, we use the search server to get the first set of results, so I guess that’s kind of fuzzy
2016-01-05 00514, 2016
alastairp
we claim to have almost 2 million unique tracks. In the coming week I’m going to dig into this and try and deduplicate on an artist id/track name level, and see if that’s actually true
2016-01-05 00535, 2016
alastairp
recording ids in the same release group, perhaps
2016-01-05 00555, 2016
alastairp
because I think part of the reason we have such a low match is that we don’t actually have as many uniques as we thought we did
2016-01-05 00529, 2016
chirlu` has quit
2016-01-05 00501, 2016
chirlu` joined the channel
2016-01-05 00530, 2016
kepstin
how are you measuring uniques right now? just unique recording mbids?
2016-01-05 00558, 2016
kepstin
might be over-estimating a bit due to unmerged recordings, i guess.
and of course there's a fair number of similar recordings that'll never be merged, too.
2016-01-05 00501, 2016
alastairp
yeah, so track name grouped by artist should give an interesting distribution
2016-01-05 00553, 2016
kepstin
recording or track artist ids? :)
2016-01-05 00514, 2016
kepstin has several cases where they're different, in particular if the same recording has different artist credits on different releases
2016-01-05 00554, 2016
alastairp
uhh. good question :)
2016-01-05 00507, 2016
alastairp
I’ll publish an IPython notebook and you can correct it for me!
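[Editor's sketch] The "track name grouped by artist" distribution alastairp mentions could be computed along these lines. This is only an illustration under assumptions: the input rows, the `normalise` rule (case-folding and whitespace collapsing) and all function names are hypothetical, and the sketch does not decide kepstin's question of whether the artist id should be the recording artist or the track artist; it uses whichever id it is given.

```python
from collections import Counter
from typing import Iterable, Tuple


def normalise(title: str) -> str:
    """Crude title key: case-folded, runs of whitespace collapsed."""
    return " ".join(title.casefold().split())


def group_sizes(rows: Iterable[Tuple[str, str, str]]) -> Counter:
    """Count distinct recording MBIDs per (artist id, normalised title).

    rows are (recording_mbid, artist_id, title) tuples; a recording MBID
    that appears more than once is only counted the first time.
    """
    seen = set()
    sizes = Counter()
    for mbid, artist_id, title in rows:
        if mbid in seen:
            continue
        seen.add(mbid)
        sizes[(artist_id, normalise(title))] += 1
    return sizes


def size_distribution(sizes: Counter) -> Counter:
    """Map group size -> number of groups with that many recordings."""
    return Counter(sizes.values())
```

A distribution dominated by groups of size 1 would suggest the unique count is roughly right; a long tail of larger groups would point at unmerged or near-duplicate recordings.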
2016-01-05 00558, 2016
kepstin
assuming musicbrainz is perfect, deduplicating by recording id makes sense - as long as you normalize recording ids to handle merged recordings properly.
2016-01-05 00506, 2016
kepstin
big assumption tho :)
2016-01-05 00532, 2016
alastairp
actually, we don’t handle merged stuff at all either
2016-01-05 00548, 2016
alastairp
I wonder how many of the ids we have (from people's tags) have been subsequently merged
2016-01-05 00553, 2016
kepstin
well, fixing that would certainly help your duplicate count
2016-01-05 00535, 2016
kepstin
stuff like those beatles tracks have had a lot of merges over time from compilations, etc, and I suspect many users might still have old recording mbids in their files.
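[Editor's sketch] Normalizing recording ids for merges before counting uniques, as kepstin suggests, might be sketched like this. The `redirects` mapping (old MBID to the MBID it was merged into) is assumed to have been obtained separately from MusicBrainz data; `canonical` and `unique_recordings` are hypothetical helper names, not AcousticBrainz code.

```python
from typing import Dict, Iterable, Set


def canonical(mbid: str, redirects: Dict[str, str]) -> str:
    """Follow merge redirects until reaching an MBID with no redirect.

    The `seen` set guards against an accidental cycle in the mapping.
    """
    seen = set()
    while mbid in redirects and mbid not in seen:
        seen.add(mbid)
        mbid = redirects[mbid]
    return mbid


def unique_recordings(mbids: Iterable[str], redirects: Dict[str, str]) -> Set[str]:
    """Set of distinct recordings after resolving merged MBIDs."""
    return {canonical(m, redirects) for m in mbids}
```

Comparing `len(set(mbids))` with `len(unique_recordings(mbids, redirects))` would give a first estimate of how much merged ids inflate the unique count.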
and "Evaluating a dataset times out because validation checks that each mbid exists in the lowlevel table." doesn't seem to be related to the editor directly
2016-01-05 00536, 2016
alastairp
no, I just made a ticket with a whole bunch of stuff that came up when I was using it
2016-01-05 00546, 2016
Gentlecat
right
2016-01-05 00508, 2016
Gentlecat
how is the new schema looking?
2016-01-05 00517, 2016
alastairp
I was doing ^ today
2016-01-05 00520, 2016
Gentlecat
merging any time soon?
2016-01-05 00523, 2016
alastairp
so we still need to do the same things
2016-01-05 00535, 2016
alastairp
dumps, imports, stats, verification of highlevel, conversion
2016-01-05 00553, 2016
alastairp
they’re all at the top of my list at work for the next 2 weeks
2016-01-05 00509, 2016
Gentlecat
anything related to datasets directly?
2016-01-05 00540, 2016
Gentlecat
I want to implement something that would allow adding new models into hl evaluation
2016-01-05 00547, 2016
Gentlecat
some kind of admin interface
2016-01-05 00542, 2016
alastairp
right
2016-01-05 00508, 2016
alastairp
I was looking at getting the merge done as soon as possible, because that dataset stuff depends on having the new schema
2016-01-05 00515, 2016
alastairp
so I wanted to merge, and then look at stuff like that
2016-01-05 00501, 2016
alastairp
the other big dataset thing is to use lowlevel.id instead of lowlevel.mbid, and allow classless collections (e.g. just like a musicbrainz collection)
"Isaac Bashevis Singer (czyta Jerzy Stuhr)" clearly should be two artists - is there a nice way of saying "Isaac Bashevis Singer read by Jerzy Stuhr" in Polish without needing the parens?
2016-01-05 00552, 2016
drsaunde
semi-colon in between?
2016-01-05 00537, 2016
stanislas
drsaunde: no, that would not be correct unfortunately
2016-01-05 00506, 2016
drsaunde
why not?
2016-01-05 00540, 2016
stanislas
reosarevok: I am thinking of something that might suit you. You could use 'Isaac Bashevis Singer, czyta Jerzy Stuhr'.
2016-01-05 00547, 2016
kepstin
it's not hard to do "[Isaac Bashevis Singer] (czyta [Jerzy Stuhr])" as artist credits, we do that all the time for japanese character vocalist artists
2016-01-05 00551, 2016
reosarevok
I guess, that's better than the parens :)
2016-01-05 00559, 2016
reosarevok
kepstin: I know, it's just so ugly :p
2016-01-05 00511, 2016
kepstin
it's standard formatting in japan, so we just deal with it
2016-01-05 00553, 2016
stanislas
reosarevok: I've looked on various Polish sites and it is Isaac Bashevis Singer (czyta Jerzy Stuhr) :)
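[Editor's sketch] The two-artist credit kepstin describes, "[Isaac Bashevis Singer] (czyta [Jerzy Stuhr])", can be modelled as a list of credited names, each followed by a join phrase. The `Credit` type and `render` function below are hypothetical illustrations of that idea, not MusicBrainz code.

```python
from typing import List, NamedTuple


class Credit(NamedTuple):
    name: str         # artist name as credited
    join_phrase: str  # literal text that follows this name


def render(credits: List[Credit]) -> str:
    """Concatenate each credited name with its join phrase."""
    return "".join(c.name + c.join_phrase for c in credits)


# The parenthesised form found on Polish sites:
with_parens = [
    Credit("Isaac Bashevis Singer", " (czyta "),
    Credit("Jerzy Stuhr", ")"),
]

# The comma form stanislas proposed:
with_comma = [
    Credit("Isaac Bashevis Singer", ", czyta "),
    Credit("Jerzy Stuhr", ""),
]
```

Either way both people stay linked as separate artists; only the join phrases differ.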