alastairp: Likely. :) It's also a young (sub)site, so IMHO it's fine to fumble around for a bit.
Girish_: 47 open tasks right now.
alastairp
OK, tweeted
hah
from post to like by jherskowitz in 30 seconds
Freso goes to like and retweet
so many notifications
I never get this many likes on my personal tweets
oh, that’s right. because I never tweet
ruaok
maybe if you added a picture to your account it wouldn't look like you're a spammer. :)
doesn't even have to be a picture of you.
alastairp
I happen to like my egg thank you very much
Girish_ has quit
LordSputnik
Freso: and no, I don't think Last.fm have fixed anything, but I found a working API endpoint that can get the user's play counts, and another that can get their loved tracks
kepstin
they never broke any of their apis, afaik. only the ability to get new api keys
alastairp
nah, they quite seriously broke the output of many apis
kepstin
oh, huh?
alastairp
invalid xml, different results
kepstin
well, the only one I care about was the listening history, which at least still appears to be complete
ruaok
ok, off to the epiphany parade.
LordSputnik
kepstin: the user library API is still completely dead
library.getTracks has been deleted
also one of getAlbums or getArtists, can't remember which
alastairp
ruaok: have fun!
kepstin
well, the albums/artists ones weren't really that useful due to lack of disambiguation, and they can be re-created from the listening history anyways
ah, the api you want to use is 'user.getrecenttracks', which is a paginated list of complete scrobbling history including track mbids if submitted
still works afaik, dunno if the format got changed
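[editor's note] A minimal sketch of walking the paginated 'user.getrecenttracks' history that kepstin describes. The endpoint URL, the JSON field names ("recenttracks", "track", "@attr", "totalPages", "nowplaying") and the helper names are assumptions based on Last.fm's public JSON output and may not match what the API returns today. The `fetch` parameter is injectable so the paging logic can be tried without a network connection or an API key.

```python
import json
import urllib.parse
import urllib.request

# Assumed API root; check the Last.fm documentation before relying on it.
API_ROOT = "https://ws.audioscrobbler.com/2.0/"


def fetch_page(user, api_key, page, limit=200):
    """Fetch one page of user.getrecenttracks and return the decoded JSON."""
    params = urllib.parse.urlencode({
        "method": "user.getrecenttracks",
        "user": user,
        "api_key": api_key,
        "format": "json",
        "limit": limit,
        "page": page,
    })
    with urllib.request.urlopen(API_ROOT + "?" + params) as resp:
        return json.load(resp)


def iter_scrobbles(user, api_key, fetch=fetch_page):
    """Yield every scrobble for a user, one page at a time."""
    page = 1
    while True:
        data = fetch(user, api_key, page)["recenttracks"]
        tracks = data.get("track", [])
        if isinstance(tracks, dict):
            # Defensive: treat a lone track object as a one-item list.
            tracks = [tracks]
        for track in tracks:
            # Skip the "now playing" entry, which is not a finished scrobble.
            if track.get("@attr", {}).get("nowplaying") == "true":
                continue
            yield track
        total_pages = int(data.get("@attr", {}).get("totalPages", page))
        if page >= total_pages:
            break
        page += 1
```

Usage would look like `for track in iter_scrobbles("some_user", "YOUR_API_KEY"): ...`, where both arguments are placeholders.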
LordSputnik
kepstin: yeah that's what I moved beets over to yesterday ;) I spent some time looking through the API docs, to see what could replace Library.getTracks, and found that one
kepstin
that's always the one I've used, since the library stuff was annoying due to last.fm library not being very useful
LordSputnik
although annoyingly pylast doesn't give back MBIDs, so I had to derive a custom class that did
kepstin
and the getrecenttracks of course includes the time of each play, which i don't think the library api included?
in any case, for a listenbrainz import, the getrecenttracks api is far more complete than the current page-scraping method; i guess the only reason it wasn't used was the inability to get a new api key?
(which has since been fixed)
reosarevok
IIRC they were saying they would move to a slower method soon
So I imagine that's what they meant?
But I dunno :)
kepstin
pretty much the main reason I haven't imported my stuff to listenbrainz yet is the fact that it didn't pull in the mbids from my scrobbles.
(do note that while the 'getrecenttracks' api returns artist, album, and track mbids, only the *track* mbid comes from the scrobble, the others are matched in the last.fm server and are often missing or wrong)
(and the "track" mbid is of course from pre-ngs, so it corresponds to an ngs recording)
alastairp
mmm, messy data is messy
kepstin
to be specific, the 'track' mbid in last.fm comes from the 'MUSICBRAINZ_TRACKID' tag (or equivalent in other formats), which in post-ngs picard versions is set to the recording id.
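[editor's note] A small sketch of the point above: when converting a 'getrecenttracks' entry, keep only the track MBID (which corresponds to a recording in post-NGS MusicBrainz) and ignore the artist and album MBIDs that Last.fm matched on its side. The input field names ("mbid", "#text", "uts") are assumed from Last.fm's JSON output, and the output dictionary is a hypothetical shape for illustration, not the ListenBrainz submission format.

```python
def to_listen(track):
    """Convert one Last.fm track entry into a simple listen dictionary.

    Only the track-level MBID is trusted; it is stored as the recording MBID.
    Artist and album MBIDs are deliberately ignored because Last.fm matches
    them server-side and they are often missing or wrong.
    """
    date = track.get("date") or {}
    listen = {
        "artist_name": (track.get("artist") or {}).get("#text", ""),
        "track_name": track.get("name", ""),
        "release_name": (track.get("album") or {}).get("#text", ""),
        "listened_at": int(date["uts"]) if "uts" in date else None,
    }
    recording_mbid = track.get("mbid") or ""
    if recording_mbid:
        listen["recording_mbid"] = recording_mbid
    return listen
```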
regagain has quit
interesting. so acousticbrainz currently covers about 1/4 of the tracks in the msd?
I'm assuming that's probably not intentional, but is just overlap between data people submitted naturally and data in the msd.
reosarevok, I am not sure that the existing setup is ready for the modifications he wishes to make.
reosarevok
That's a perfectly valid answer for them too I guess :)
alastairp
kepstin: yeah
honestly, I had hoped we would have more
the matching algorithm is pretty strict though, I think we can easily find a bunch more
though, we use the search server to get the first set of results, so I guess that’s kind of fuzzy
we claim to have almost 2 million unique tracks. In the coming week I’m going to dig into this and try and deduplicate on an artist id/track name level, and see if that’s actually true
recording ids in the same release group, perhaps
because I think part of the reason we have such a low match is that we don’t actually have as many uniques as we thought we did
chirlu` has quit
chirlu` joined the channel
kepstin
how are you measuring uniques right now? just unique recording mbids?
might be over-estimating a bit due to unmerged recordings, i guess.
and of course there's a fair number of similar recordings that'll never be merged, too.
alastairp
yeah, so track name grouped by artist should give an interesting distribution
kepstin
recording or track artist ids? :)
kepstin has several cases where they're different, in particular if the same recording has different artist credits on different releases
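[editor's note] A rough sketch of the "track name grouped by artist" idea. The input rows of (recording MBID, artist ID, title) are a hypothetical layout, and whether the artist ID is the recording artist or the track artist is left to the caller, since that is exactly the open question here. The title normalization is a deliberately crude assumption.

```python
from collections import Counter, defaultdict


def normalize_title(title):
    """Case-fold and collapse whitespace so trivially different titles match."""
    return " ".join(title.casefold().split())


def recordings_per_title(rows):
    """Map (artist_id, normalized title) to the number of distinct recording MBIDs."""
    groups = defaultdict(set)
    for recording_mbid, artist_id, title in rows:
        groups[(artist_id, normalize_title(title))].add(recording_mbid)
    return {key: len(mbids) for key, mbids in groups.items()}


def size_distribution(group_sizes):
    """Count how many (artist, title) groups contain 1, 2, 3, ... recordings."""
    return Counter(group_sizes.values())
```

Groups with a size above 1 are candidates for duplicates: several recording MBIDs sharing one artist and title.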
alastairp
uhh. good question :)
I’ll publish an IPython notebook and you can correct it for me!
kepstin
assuming musicbrainz is perfect, deduplicating by recording id makes sense - as long as you normalize recording ids to handle merged recordings properly.
big assumption tho :)
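[editor's note] A minimal sketch of normalizing recording IDs before counting uniques. It assumes a `redirects` mapping from merged (old) recording MBIDs to their current MBIDs has already been built, for example from MusicBrainz's recording redirect data; how that mapping is obtained is outside this sketch, and the function names are hypothetical.

```python
def canonical(mbid, redirects):
    """Follow merge redirects until reaching an MBID that is not redirected.

    The `seen` set guards against accidental cycles in the mapping.
    """
    seen = set()
    while mbid in redirects and mbid not in seen:
        seen.add(mbid)
        mbid = redirects[mbid]
    return mbid


def unique_recordings(mbids, redirects):
    """Return the set of canonical recording MBIDs for the submitted MBIDs."""
    return {canonical(mbid, redirects) for mbid in mbids}
```

Comparing `len(set(mbids))` with `len(unique_recordings(mbids, redirects))` gives a rough idea of how much merged recordings inflate the unique count.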
alastairp
actually, we don’t handle merged stuff at all either
I wonder how many of the ids we have (from people's tags) have been subsequently merged
kepstin
well, fixing that would certainly help your duplicate count
stuff like those beatles tracks have had a lot of merges over time from compilations, etc, and I suspect many users might still have old recording mbids in their files.
and "Evaluating a dataset times out because validation checks that each mbid exists in the lowlevel table." doesn't seem to be related to the editor directly
alastairp
no, I just made a ticket with a whole bunch of stuff that came up when I was using it
Gentlecat
right
how is new schema looking?
alastairp
I was doing ^ today
Gentlecat
merging any time soon?
alastairp
so we still need to do the same things
dumps, imports, stats, verification of highlevel, conversion
they’re all at the top of my list at work for the next 2 weeks
Gentlecat
anything related to datasets directly?
I want to implement something that would allow adding new models into hl evaluation
some kind of admin interface
alastairp
right
I was looking at getting the merge done as soon as possible, because that dataset stuff depends on having the new schema
so I wanted to merge, and then look at stuff like that
the other big dataset thing is to use lowlevel.id instead of lowlevel.mbid, and allow classless collections (e.g. just like a musicbrainz collection)
"Isaac Bashevis Singer (czyta Jerzy Stuhr)" clearly should be two artists - is there a nice way of saying "Isaac Bashevis Singer read by Jerzy Stuhr" in Polish without needing the parens?
drsaunde
semi-colon in between?
stanislas
drsaunde: no, that would not be correct unfortunately
drsaunde
why not?
stanislas
reosarevok: I am thinking of something that might suit you. You could use 'Isaac Bashevis Singer, czyta Jerzy Stuhr'.
kepstin
it's not hard to do "[Isaac Bashevis Singer] (czyta [Jerzy Stuhr])" as artist credits, we do that all the time for japanese character vocalist artists
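[editor's note] A small illustration of how such an artist credit can be modelled: each credited artist carries a name plus a join phrase that follows it. The "name" and "joinphrase" keys mirror the style of MusicBrainz's JSON output, but treat the exact structure here as an assumption for illustration.

```python
def render_credit(credit):
    """Concatenate each credited name with the join phrase that follows it."""
    return "".join(part["name"] + part.get("joinphrase", "") for part in credit)


# Two separate artists; the parentheses live entirely in the join phrases.
credit = [
    {"name": "Isaac Bashevis Singer", "joinphrase": " (czyta "},
    {"name": "Jerzy Stuhr", "joinphrase": ")"},
]
rendered = render_credit(credit)
```

Here `rendered` is the same string as the original single-artist name, while each person remains a distinct, linkable artist.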
reosarevok
I guess, that's better than the parens :)
kepstin: I know, it's just so ugly :p
kepstin
it's standard formatting in japan, so we just deal with it
stanislas
reosarevok: I've looked on various Polish sites and it is Isaac Bashevis Singer (czyta Jerzy Stuhr) :)