in #metabrainz

17:44 PM
alastairp

(jekyll can do this automatically too)
17:44 PM
if we can come up with some ideas for different categories then it could be OK
17:44 PM
I guess, mappings, datasets, demos
17:44 PM
opatel99 joined the channel
17:53 PM
JesseW has quit
17:55 PM
weeksio has quit
17:59 PM
CallerNo6

Is there a known issue with CB and line-wrapping? e.g. https://critiquebrainz.org/review/81f0aa06-8b9c...
17:59 PM
CallerNo6 looks for an open ticket
18:05 PM
alastairp

Freso: http://labs.acousticbrainz.org/mappings/million...
18:06 PM
Freso

I think that last mapping is superfluous. :p
18:06 PM
alastairp

I don't
18:06 PM
because the url is the name of the thing
18:06 PM
I mean, the last part
18:06 PM
it’s not a “million song dataset”
18:06 PM
Freso shrugs
18:06 PM
it’s a “million song dataset mapping”
18:07 PM
Freso

It's also under mappings, so it's implied that it's a mapping already.
18:09 PM
alastairp

yeah, quite possibly
18:09 PM
I removed the categories. I quite like just the name of the thing
18:11 PM
Girish_ joined the channel
18:11 PM
Freso

Aw.
18:12 PM
Freso likes his hackable URLs
18:12 PM
Girish_

Hi! I'm participating in Google Code-in. Is there a separate channel for the competition?
18:12 PM
Freso

Girish_: Nope. This is it. Welcome! :)
18:13 PM
Girish_

Are the tasks added recently? I didn't see them earlier.
18:13 PM
Tasks for MetaBrainz.
18:13 PM
alastairp

Freso: hackable how?
18:13 PM
chirlu`

CallerNo6: Apparently the author used <pre> or something similar?
18:14 PM
Freso

alastairp: If I am at http://labs.acousticbrainz.org/mappings/million... (or get the URL handed to me), I may want to cut out the last bit and go to http://labs.acousticbrainz.org/mappings/ to see other mappings.
18:15 PM
alastairp

yeah, that’s a fair point
18:15 PM
Freso

Girish_: We have plenty of tasks.
18:15 PM
CallerNo6

chirlu`, yeah, I'm pretty sure it's copy/pasted from blogspot, so funky markup is likely.
18:15 PM
alastairp

Freso: we can probably reverse engineer it in if we need to
18:15 PM
Freso

Girish_: https://codein.withgoogle.com/dashboard/tasks/?...
18:16 PM
alastairp: Likely. :) It's also a young (sub)site, so IMHO it's fine to fumble around for a bit.
18:16 PM
Girish_: 47 open tasks right now.
18:18 PM
alastairp

OK, tweeted
18:18 PM
hah
18:19 PM
from post to like by jherskowitz in 30 seconds
18:19 PM
Freso goes to like and retweet
18:20 PM
so many notifications
18:21 PM
I never get this many likes on my personal tweets
18:21 PM
oh, that’s right. because I never tweet
18:22 PM
ruaok

maybe if you added a picture to your account it wouldn't look like you're a spammer. :)
18:22 PM
doesn't even have to be a picture of you.
18:22 PM
alastairp

I happen to like my egg thank you very much
18:24 PM
Girish_ has quit
18:25 PM
LordSputnik

Freso: and no, I don't think Last.fm have fixed anything, but I found a working API endpoint that can get the user's play counts, and another that can get their loved tracks
18:27 PM
kepstin

they never broke any of their apis, afaik. only the ability to get new api keys
18:27 PM
alastairp

nah, they quite seriously broke the output of many apis
18:27 PM
kepstin

oh, huh?
18:27 PM
alastairp

invalid xml, different results
18:28 PM
kepstin

well, the only one I care about was the listening history, which at least still appears to be complete
18:28 PM
ruaok

ok, off to the epiphany parade.
18:28 PM
LordSputnik

kepstin: the user library API is still completely dead
18:29 PM
library.getTracks has been deleted
18:29 PM
also one of getAlbums or getArtists, can't remember which
18:29 PM
alastairp

ruaok: have fun!
18:30 PM
kepstin

well, the albums/artists ones weren't really that useful due to lack of disambiguation, and they can be re-created from the listening history anyways
18:31 PM
ah, the api you want to use is 'user.getrecenttracks', which is a paginated list of complete scrobbling history including track mbids if submitted
18:32 PM
still works afaik, dunno if the format got changed
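
A minimal sketch of paging through 'user.getrecenttracks', as described above. The JSON field names used here ("recenttracks", "track", "@attr", "totalPages", "#text", "uts") are assumptions about the response layout and may not match what the API returns today; the function names are hypothetical. No request is sent; the code only builds the URL and parses an already-decoded page.

```python
from urllib.parse import urlencode

API_ROOT = "https://ws.audioscrobbler.com/2.0/"


def recent_tracks_url(user, api_key, page=1, limit=200):
    """Build the request URL for one page of a user's scrobble history."""
    params = {
        "method": "user.getrecenttracks",
        "user": user,
        "api_key": api_key,
        "format": "json",
        "limit": limit,
        "page": page,
    }
    return API_ROOT + "?" + urlencode(params)


def extract_listens(payload):
    """Turn one decoded JSON page into (listens, total_pages).

    Assumed layout: payload["recenttracks"]["track"] holds the plays and
    payload["recenttracks"]["@attr"]["totalPages"] holds the page count.
    """
    recent = payload["recenttracks"]
    tracks = recent.get("track", [])
    if isinstance(tracks, dict):
        # A page with a single play may arrive as an object, not a list.
        tracks = [tracks]
    listens = []
    for track in tracks:
        date = track.get("date")
        if date is None:
            # Entries without a timestamp (e.g. "now playing") are skipped.
            continue
        listens.append({
            "artist": track["artist"]["#text"],
            "title": track["name"],
            "listened_at": int(date["uts"]),
            # Empty string means no MBID was submitted with the scrobble.
            "track_mbid": track.get("mbid") or None,
        })
    return listens, int(recent["@attr"]["totalPages"])
```

A caller would fetch page 1, read the page count, then request the remaining pages one by one and concatenate the listens.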
18:33 PM
LordSputnik

kepstin: yeah that's what I moved beets over to yesterday ;) I spent some time looking through the API docs, to see what could replace Library.getTracks, and found that one
18:33 PM
kepstin

that's always the one I've used, since the library stuff was annoying due to last.fm library not being very useful
18:34 PM
LordSputnik

although annoyingly pylast doesn't give back MBIDs, so I had to derive a custom class that did
18:34 PM
kepstin

and the getrecenttracks of course includes the time of each play, which i don't think the library api included?
18:36 PM
in any case, for a listenbrainz import, the getrecenttracks api is far more complete than the current page-scraping method; i guess the only reason it wasn't used was the inability to get a new api key?
18:37 PM
(which has since been fixed)
18:38 PM
reosarevok

IIRC they were saying they would move to a slower method soon
18:38 PM
So I imagine that's what they meant?
18:38 PM
But I dunno :)
18:39 PM
kepstin

pretty much the main reason I haven't imported my stuff to listenbrainz yet is the fact that it didn't pull in the mbids from my scrobbles.
18:41 PM
(do note that while the 'getrecenttracks' api returns artist, album, and track mbids, only the *track* mbid comes from the scrobble, the others are matched in the last.fm server and are often missing or wrong)
18:42 PM
(and the "track" mbid is of course from pre-ngs, so it corresponds to an ngs recording)
18:44 PM
alastairp

mmm, messy data is messy
18:45 PM
kepstin

to be specific, the 'track' mbid in last.fm comes from the 'MUSICBRAINZ_TRACKID' tag (or equivalent in other formats), which in post-ngs picard versions is set to the recording id.
18:48 PM
regagain has quit
18:49 PM
interesting. so acousticbrainz currently covers about 1/4 of the tracks in the msd?
18:51 PM
I'm assuming that's probably not intentional, but is just overlap between data people submitted naturally and data in the msd.
18:54 PM
reosarevok

Someone is asking details about the solr thing in the forum (in case Mineo or weeksio feel like answering) http://forums.musicbrainz.org/viewtopic.php?pid...
18:56 PM
CJ_

reosarevok, I am not sure that the existing setup is ready for the modifications he wishes to make.
18:56 PM
reosarevok

That's a perfectly valid answer for them too I guess :)
19:05 PM
alastairp

kepstin: yeah
19:05 PM
honestly, I had hoped we would have more
19:05 PM
the matching algorithm is pretty strict though, I think we can easily find a bunch more
19:06 PM
though, we use the search server to get the first set of results, so I guess that’s kind of fuzzy
19:08 PM
we claim to have almost 2 million unique tracks. In the coming week I’m going to dig into this and try and deduplicate on an artist id/track name level, and see if that’s actually true
19:08 PM
recording ids in the same release group, perhaps
19:08 PM
because I think part of the reason we have such a low match is that we don’t actually have as many uniques as we thought we did
19:21 PM
chirlu` has quit
19:24 PM
chirlu` joined the channel
19:26 PM
kepstin

how are you measuring uniques right now? just unique recording mbids?
19:26 PM
might be over-estimating a bit due to unmerged recordings, i guess.
19:27 PM
alastairp

yeah, just recording mbids
19:28 PM
which results in weird stuff like https://twitter.com/AcousticBrainz/status/66004...
19:28 PM
kepstin

and of course there's a fair number of similar recordings that'll never be merged, too.
19:29 PM
alastairp

yeah, so track name grouped by artist should give an interesting distribution
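
A rough sketch of the "track name grouped by artist" idea. The input row shape (recording MBID, artist ID, title) and the title normalization are assumptions for illustration, not how AcousticBrainz stores or matches its data.

```python
from collections import defaultdict


def normalize_title(title):
    """Case-fold and collapse whitespace so trivially different titles match."""
    return " ".join(title.casefold().split())


def group_recordings(rows):
    """Group recording MBIDs by (artist id, normalized track name).

    rows: iterable of (recording_mbid, artist_id, title) tuples.
    """
    groups = defaultdict(set)
    for recording_mbid, artist_id, title in rows:
        groups[(artist_id, normalize_title(title))].add(recording_mbid)
    return groups


def size_distribution(groups):
    """Map group size -> number of groups with that many distinct MBIDs."""
    dist = defaultdict(int)
    for mbids in groups.values():
        dist[len(mbids)] += 1
    return dict(dist)
```

Groups with more than one MBID are candidate duplicates. Whether the artist ID comes from the recording or the track credit changes the grouping, so the choice needs to be made explicitly.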
19:29 PM
kepstin

recording or track artist ids? :)
19:30 PM
kepstin has several cases where they're different, in particular if the same recording has different artist credits on different releases
19:30 PM
alastairp

uhh. good question :)
19:31 PM
I’ll publish an IPython notebook and you can correct it for me!
19:32 PM
kepstin

assuming musicbrainz is perfect, deduplicating by recording id makes sense - as long as you normalize recording ids to handle merged recordings properly.
19:33 PM
big assumption tho :)
19:33 PM
alastairp

actually, we don’t handle merged stuff at all either
19:33 PM
I wonder how many of the ids we have (from people's tags) have been subsequently merged
19:33 PM
kepstin

well, fixing that would certainly help your duplicate count
19:37 PM
stuff like those beatles tracks have had a lot of merges over time from compilations, etc, and I suspect many users might still have old recording mbids in their files.
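
A small sketch of normalizing recording IDs before counting uniques, as suggested above. The `redirects` mapping (old MBID to the MBID it was merged into) is hypothetical and would have to be built from MusicBrainz's merge data; the function names are illustrative.

```python
def canonical_mbid(mbid, redirects):
    """Follow merge redirects until reaching an ID that was not merged away."""
    seen = set()
    # The `seen` set guards against a cycle in bad redirect data.
    while mbid in redirects and mbid not in seen:
        seen.add(mbid)
        mbid = redirects[mbid]
    return mbid


def count_unique(mbids, redirects):
    """Count distinct recordings after resolving merged IDs."""
    return len({canonical_mbid(m, redirects) for m in mbids})
```

Comparing the count with and without the redirect map would show how many submitted IDs have since been merged.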
19:40 PM
gcilou

Gentlecat, It's working now!! I resubmitted
19:47 PM
Gentlecat

alastairp: we should split up http://tickets.musicbrainz.org/browse/AB-94 into something more specific
19:47 PM
like adding pagination, live editing
19:47 PM
alastairp

yes
19:48 PM
Gentlecat

and "Evaluating a dataset times out because validation checks that each mbid exists in the lowlevel table." doesn't seem to be related to the editor directly
19:48 PM
alastairp

no, I just made a ticket with a whole bunch of stuff that came up when I was using it
19:48 PM
Gentlecat

right
19:49 PM
how is the new schema looking?
19:49 PM
alastairp

I was doing ^ today
19:49 PM
Gentlecat

merging any time soon?
19:49 PM
alastairp

so we still need to do the same things
19:49 PM
dumps, imports, stats, verification of highlevel, conversion
19:49 PM
they’re all at the top of my list at work for the next 2 weeks
19:50 PM
Gentlecat

anything related to datasets directly?
19:50 PM
I want to implement something that would allow adding new models into hl evaluation
19:50 PM
some kind of admin interface
19:51 PM
alastairp

right
19:52 PM
I was looking at getting the merge done as soon as possible, because that dataset stuff depends on having the new schema
19:52 PM
so I wanted to merge, and then look at stuff like that
19:53 PM
the other big dataset thing is to use lowlevel.id instead of lowlevel.mbid, and allow classless collections (e.g. just like a musicbrainz collection)
19:58 PM
opatel99 has quit
20:00 PM
reosarevok

stanislas: you around?
20:05 PM
stanislas

reosarevok: yep :)
20:05 PM
reosarevok

http://beta.musicbrainz.org/release/7cf4d284-5c...
20:06 PM
"Isaac Bashevis Singer (czyta Jerzy Stuhr)" clearly should be two artists - is there a nice way of saying "Isaac Bashevis Singer read by Jerzy Stuhr" in Polish without needing the parens?
20:07 PM
drsaunde

semi-colon in between?
20:08 PM
stanislas

drsaunde: no, that would not be correct unfortunately
20:09 PM
drsaunde

why not?
20:11 PM
stanislas

reosarevok: I am thinking of something that might suit you. You could use 'Isaac Bashevis Singer, czyta Jerzy Stuhr'.
20:11 PM
kepstin

it's not hard to do "[Isaac Bashevis Singer] (czyta [Jerzy Stuhr])" as artist credits, we do that all the time for japanese character vocalist artists
20:11 PM
reosarevok

I guess, that's better than the parens :)
20:11 PM
kepstin: I know, it's just so ugly :p
20:12 PM
kepstin

it's standard formatting in japan, so we just deal with it
20:12 PM
stanislas

reosarevok: I've looked on various Polish sites and it is Isaac Bashevis Singer (czyta Jerzy Stuhr) :)
20:13 PM
reosarevok

Hmm. Fiiine, I guess I can leave the parens :p
20:13 PM
Thanks :)
20:13 PM
stanislas

reosarevok: for example: http://www.gandalf.com.pl/u/opowiadania-dla-dzi...
20:13 PM
reosarevok

Did you also happen to find the tracklist for the missing CDs?
20:15 PM
Oh, http://www.bemowo.e-bp.pl/bemowo/ini.php?dol=0&... seems to have it
20:15 PM
Guess I can fix the stuff then
20:16 PM
stanislas

reosarevok: That library is like 10km from my home.
20:17 PM
reosarevok

haha - was the first result when looking for the artists and the track titles