#metabrainz

      • alastairp
        (jekyll can do this automatically too)
      • if we can come up with some ideas for different categories then it could be OK
      • I guess, mappings, datasets, demos
      • opatel99 joined the channel
      • JesseW has quit
      • weeksio has quit
      • CallerNo6
        Is there a known issue with CB and line-wrapping? e.g. https://critiquebrainz.org/review/81f0aa06-8b9c...
      • CallerNo6 looks for an open ticket
      • alastairp
      • Freso
        I think that last mapping is superfluous. :p
      • alastairp
        I don't
      • because the url is the name of the thing
      • I mean, the last part
      • it’s not a “million song dataset”
      • Freso shrugs
      • it’s a “million song dataset mapping”
      • Freso
        It's also under mappings, so it's implied that it's a mapping already.
      • alastairp
        yeah, quite possibly
      • I removed the categories. I quite like just the name of the thing
      • Girish_ joined the channel
      • Freso
        Aw.
      • Freso likes his hackable URLs
      • Girish_
        Hi! I'm participating in Google Code In. Is there a separate channel for the competition?
      • Freso
        Girish_: Nope. This is it. Welcome! :)
      • Girish_
        Are the tasks added recently? I didn't see them earlier.
      • Tasks for Metabrainz.
      • alastairp
        Freso: hackable how?
      • chirlu`
        CallerNo6: Apparently the author used <pre> or something similar?
      • Freso
        alastairp: If I am at http://labs.acousticbrainz.org/mappings/million... (or get the URL handed to me), I may want to cut out the last bit and go to http://labs.acousticbrainz.org/mappings/ to see other mappings.
      • alastairp
        yeah, that’s a fair point
      • Freso
        Girish_: We have plenty of tasks.
      • CallerNo6
        chirlu`, yeah, I'm pretty sure it's copy/pasted from blogspot, so funky markup is likely.
      • alastairp
        Freso: we can probably reverse engineer it in if we need to
      • Freso
      • alastairp: Likely. :) It's also a young (sub)site, so IMHO it's fine to fumble around for a bit.
      • Girish_: 47 open tasks right now.
      • alastairp
        OK, tweeted
      • hah
      • from post to like by jherskowitz in 30 seconds
      • Freso goes to like and retweet
      • so many notifications
      • I never get this many likes on my personal tweets
      • oh, that’s right. because I never tweet
      • ruaok
        maybe if you added a picture to your account it wouldn't look like you're a spammer. :)
      • doesn't even have to be a picture of you.
      • alastairp
        I happen to like my egg thank you very much
      • Girish_ has quit
      • LordSputnik
        Freso: and no, I don't think Last.fm have fixed anything, but I found a working API endpoint that can get the user's play counts, and another that can get their loved tracks
      • kepstin
        they never broke any of their apis, afaik. only the ability to get new api keys
      • alastairp
        nah, they quite seriously broke the output of many apis
      • kepstin
        oh, huh?
      • alastairp
        invalid xml, different results
      • kepstin
        well, the only one I care about was the listening history, which at least still appears to be complete
      • ruaok
        ok, off to the epiphany parade.
      • LordSputnik
        kepstin: the user library API is still completely dead
      • library.getTracks has been deleted
      • also one of getAlbums or getArtists, can't remember which
      • alastairp
        ruaok: have fun!
      • kepstin
        well, the albums/artists ones weren't really that useful due to lack of disambiguation, and they can be re-created from the listening history anyways
      • ah, the api you want to use is 'user.getrecenttracks', which is a paginated list of complete scrobbling history including track mbids if submitted
      • still works afaik, dunno if the format got changed
      • LordSputnik
        kepstin: yeah that's what I moved beets over to yesterday ;) I spent some time looking through the API docs, to see what could replace Library.getTracks, and found that one
      • kepstin
        that's always the one I've used, since the library stuff was annoying due to last.fm library not being very useful
      • LordSputnik
        although annoyingly pylast doesn't give back MBIDs, so I had to derive a custom class that did
      • kepstin
        and the getrecenttracks of course includes the time of each play, which i don't think the library api included?
      • in any case, for a listenbrainz import, the getrecenttracks api is far more complete than the current page-scraping method; i guess the only reason it wasn't used was the inability to get a new api key?
      • (which has since been fixed)
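A minimal sketch of walking 'user.getrecenttracks' page by page, along the lines kepstin describes. The endpoint URL, the query parameters and the JSON field names (`recenttracks`, `@attr`, `totalPages`, `date.uts`, `mbid`) are assumptions about the Last.fm response shape, which may have changed; `build_url`, `total_pages` and `extract_listens` are hypothetical helper names. No request is sent here: the parser runs on a hand-written sample page with placeholder values.

```python
from urllib.parse import urlencode

API_ROOT = "http://ws.audioscrobbler.com/2.0/"  # assumed endpoint


def build_url(user, api_key, page=1, limit=200):
    """Build the URL for one page of a user's scrobble history."""
    params = {
        "method": "user.getrecenttracks",
        "user": user,
        "api_key": api_key,
        "format": "json",
        "limit": limit,
        "page": page,
    }
    return API_ROOT + "?" + urlencode(params)


def total_pages(page):
    """Read the page count from the (assumed) '@attr' block."""
    return int(page["recenttracks"]["@attr"]["totalPages"])


def extract_listens(page):
    """Turn one decoded JSON page into (timestamp, artist, title, track_mbid) tuples."""
    listens = []
    for track in page["recenttracks"]["track"]:
        date = track.get("date")
        if date is None:
            # Assumed: a "now playing" entry carries no timestamp, so skip it.
            continue
        listens.append((
            int(date["uts"]),
            track["artist"]["#text"],
            track["name"],
            track.get("mbid") or None,  # empty string: no MBID was submitted
        ))
    return listens


# Hand-written sample with placeholder values, not real Last.fm output.
SAMPLE_PAGE = {
    "recenttracks": {
        "@attr": {"page": "1", "totalPages": "3"},
        "track": [
            {"name": "Now Playing Song", "mbid": "",
             "artist": {"#text": "Example Artist", "mbid": ""}},
            {"name": "First Song",
             "mbid": "00000000-0000-0000-0000-000000000001",
             "artist": {"#text": "Example Artist", "mbid": ""},
             "date": {"uts": "1452000000"}},
            {"name": "Second Song", "mbid": "",
             "artist": {"#text": "Other Artist", "mbid": ""},
             "date": {"uts": "1451990000"}},
        ],
    }
}

listens = extract_listens(SAMPLE_PAGE)
```

An importer would call `build_url` for pages 1 through `total_pages(first_page)` and concatenate the results. Only the track MBID is kept, since the chat notes that the artist and album MBIDs are matched server-side and are less reliable.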
      • reosarevok
        IIRC they were saying they would move to a slower method soon
      • So I imagine that's what they meant?
      • But I dunno :)
      • kepstin
        pretty much the main reason I haven't imported my stuff to listenbrainz yet is the fact that it didn't pull in the mbids from my scrobbles.
      • (do note that while the 'getrecenttracks' api returns artist, album, and track mbids, only the *track* mbid comes from the scrobble, the others are matched in the last.fm server and are often missing or wrong)
      • (and the "track" mbid is of course from pre-ngs, so it corresponds to an ngs recording)
      • alastairp
        mmm, messy data is messy
      • kepstin
        to be specific, the 'track' mbid in last.fm comes from the 'MUSICBRAINZ_TRACKID' tag (or equivalent in other formats), which in post-ngs picard versions is set to the recording id.
      • regagain has quit
      • interesting. so acousticbrainz currently covers about 1/4 of the tracks in the msd?
      • I'm assuming that's probably not intentional, but is just overlap between data people submitted naturally and data in the msd.
      • reosarevok
        Someone is asking details about the solr thing in the forum (in case Mineo or weeksio feel like answering) http://forums.musicbrainz.org/viewtopic.php?pid...
      • CJ_
        reosarevok, I am not sure that the existing setup is ready for the modifications he wishes to make.
      • reosarevok
        That's a perfectly valid answer for them too I guess :)
      • alastairp
        kepstin: yeah
      • honestly, I had hoped we would have more
      • the matching algorithm is pretty strict though, I think we can easily find a bunch more
      • though, we use the search server to get the first set of results, so I guess that’s kind of fuzzy
      • we claim to have almost 2 million unique tracks. In the coming week I’m going to dig into this and try and deduplicate on an artist id/track name level, and see if that’s actually true
      • recording ids in the same release group, perhaps
      • because I think part of the reason we have such a low match is that we don’t actually have as many uniques as we thought we did
      • chirlu` has quit
      • chirlu` joined the channel
      • kepstin
        how are you measuring uniques right now? just unique recording mbids?
      • might be over-estimating a bit due to unmerged recordings, i guess.
      • alastairp
        yeah, just recording mbids
      • which results in weird stuff like https://twitter.com/AcousticBrainz/status/66004...
      • kepstin
        and of course there's a fair number of similar recordings that'll never be merged, too.
      • alastairp
        yeah, so track name grouped by artist should give an interesting distribution
      • kepstin
        recording or track artist ids? :)
      • kepstin has several cases where they're different, in particular if the same recording has different artist credits on different releases
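A rough sketch of the "track name grouped by artist" count discussed here, compared against the plain count of unique recording MBIDs. The row layout, the sample values and the `normalize_title` rule are all assumptions for illustration; whether `artist_id` is the recording artist or the track artist is left to whoever builds the rows.

```python
from collections import defaultdict


def normalize_title(title):
    """Case-fold and collapse whitespace so trivially different titles compare equal."""
    return " ".join(title.casefold().split())


def group_by_artist_and_title(rows):
    """Group recording MBIDs by (artist_id, normalized title).

    rows: iterable of (recording_mbid, artist_id, title) tuples.
    """
    groups = defaultdict(set)
    for recording_mbid, artist_id, title in rows:
        groups[(artist_id, normalize_title(title))].add(recording_mbid)
    return groups


# Placeholder identifiers, not real MBIDs.
ROWS = [
    ("rec-1", "artist-a", "Yesterday"),
    ("rec-2", "artist-a", "yesterday "),
    ("rec-3", "artist-a", "Help!"),
    ("rec-1", "artist-a", "Yesterday"),   # same recording submitted twice
    ("rec-4", "artist-b", "Yesterday"),
]

groups = group_by_artist_and_title(ROWS)
unique_recordings = len({row[0] for row in ROWS})
unique_pairs = len(groups)

# Sizes of each group give the distribution mentioned in the chat.
distribution = sorted(len(mbids) for mbids in groups.values())
```

Groups with more than one MBID are candidates for duplicates, though some will be genuinely different recordings of the same song.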
      • alastairp
        uhh. good question :)
      • I’ll publish an IPython notebook and you can correct it for me!
      • kepstin
        assuming musicbrainz is perfect, deduplicating by recording id makes sense - as long as you normalize recording ids to handle merged recordings properly.
      • big assumption tho :)
      • alastairp
        actually, we don’t handle merged stuff at all either
      • I wonder how many of the ids we have (from people's tags) have been subsequently merged
      • kepstin
        well, fixing that would certainly help your duplicate count
      • stuff like those beatles tracks have had a lot of merges over time from compilations, etc, and I suspect many users might still have old recording mbids in their files.
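A minimal sketch of normalizing merged recording MBIDs before counting uniques, as suggested above. It assumes a mapping from merged-away MBIDs to their current target is available (for example, exported from MusicBrainz redirect data); the `REDIRECTS` dict, the sample IDs and the function names are hypothetical.

```python
def resolve(mbid, redirects):
    """Follow merge redirects until reaching an MBID that is not redirected."""
    seen = set()
    while mbid in redirects and mbid not in seen:
        seen.add(mbid)  # guard against an accidental redirect cycle
        mbid = redirects[mbid]
    return mbid


def count_unique(mbids, redirects):
    """Count distinct recordings after mapping each MBID to its current target."""
    return len({resolve(m, redirects) for m in mbids})


# Placeholder identifiers: old-1 was merged into old-2, which was later
# merged into current-1; old-3 was merged into current-1 directly.
REDIRECTS = {"old-1": "old-2", "old-2": "current-1", "old-3": "current-1"}
SUBMITTED = ["old-1", "old-3", "current-1", "current-2"]

naive_count = len(set(SUBMITTED))
merged_count = count_unique(SUBMITTED, REDIRECTS)
```

The gap between `naive_count` and `merged_count` is the over-estimate caused by stale MBIDs in people's tags; it does not address similar recordings that were never merged.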
      • gcilou
        Gentlecat, It's working now!! I resubmitted
      • Gentlecat
        alastairp: we should split up http://tickets.musicbrainz.org/browse/AB-94 into something more specific
      • like adding pagination, live editing
      • alastairp
        yes
      • Gentlecat
        and "Evaluating a dataset times out because validation checks that each mbid exists in the lowlevel table." doesn't seem to be related to the editor directly
      • alastairp
        no, I just made a ticket with a whole bunch of stuff that came up when I was using it
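One possible way to avoid a per-MBID existence check, sketched under heavy assumptions: the real validation code and schema are not shown in this log, and sqlite3 stands in here for the actual database. The idea is to look up MBIDs in chunks with one query each and report only the missing ones. The table layout, `missing_mbids` and `chunk_size` are hypothetical.

```python
import sqlite3


def missing_mbids(conn, mbids, chunk_size=500):
    """Return the MBIDs with no row in lowlevel, using one query per chunk."""
    wanted = list(dict.fromkeys(mbids))  # de-duplicate while keeping order
    found = set()
    for start in range(0, len(wanted), chunk_size):
        chunk = wanted[start:start + chunk_size]
        placeholders = ",".join("?" for _ in chunk)
        rows = conn.execute(
            f"SELECT DISTINCT mbid FROM lowlevel WHERE mbid IN ({placeholders})",
            chunk,
        )
        found.update(row[0] for row in rows)
    return [m for m in wanted if m not in found]


# In-memory stand-in for the lowlevel table, with placeholder MBIDs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE lowlevel (id INTEGER PRIMARY KEY, mbid TEXT)")
conn.executemany(
    "INSERT INTO lowlevel (mbid) VALUES (?)",
    [("a",), ("b",), ("b",)],  # the same MBID can have several submissions
)

missing = missing_mbids(conn, ["a", "c", "b", "c"])
```

Whether this would be enough to stop the timeout depends on indexes and dataset size, which the discussion does not specify.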
      • Gentlecat
        right
      • how is new schema looking?
      • alastairp
        I was doing ^ today
      • Gentlecat
        merging any time soon?
      • alastairp
        so we still need to do the same things
      • dumps, imports, stats, verification of highlevel, conversion
      • they’re all at the top of my list at work for the next 2 weeks
      • Gentlecat
        anything related to datasets directly?
      • I want to implement something that would allow adding new models into hl evaluation
      • some kind of admin interface
      • alastairp
        right
      • I was looking at getting the merge done as soon as possible, because that dataset stuff depends on having the new schema
      • so I wanted to merge, and then look at stuff like that
      • the other big dataset thing is to use lowlevel.id instead of lowlevel.mbid, and allow classless collections (e.g. just like a musicbrainz collection)
      • opatel99 has quit
      • reosarevok
        stanislas: you around?
      • stanislas
        reosarevok: yep :)
      • reosarevok
      • "Isaac Bashevis Singer (czyta Jerzy Stuhr)" clearly should be two artists - is there a nice way of saying "Isaac Bashevis Singer read by Jerzy Stuhr" in Polish without needing the parens?
      • drsaunde
        semi-colon in between?
      • stanislas
        drsaunde: no, that would not be correct unfortunately
      • drsaunde
        why not?
      • stanislas
        reosarevok: I am thinking of something that might suit you. You could use 'Isaac Bashevis Singer, czyta Jerzy Stuhr'.
      • kepstin
        it's not hard to do "[Isaac Bashevis Singer] (czyta [Jerzy Stuhr])" as artist credits, we do that all the time for japanese character vocalist artists
      • reosarevok
        I guess, that's better than the parens :)
      • kepstin: I know, it's just so ugly :p
      • kepstin
        it's standard formatting in japan, so we just deal with it
      • stanislas
        reosarevok: I've looked on various Polish sites and it is Isaac Bashevis Singer (czyta Jerzy Stuhr) :)
      • reosarevok
        Hmm. Fiiine, I guess I can leave the parens :p
      • Thanks :)
      • stanislas
      • reosarevok
        Did you also happen to find the tracklist for the missing CDs?
      • Guess I can fix the stuff then
      • stanislas
        reosarevok: That library is like 10km from my home.
      • reosarevok
        haha - was the first result when looking for the artists and the track titles