#metabrainz

      • zas
        ruaok: well, i'm not sure about the exact issue, i think docker itself is leaking stuff, the unnamed volume isn't really the issue
      • that's related to aufs, i cleaned up dangling containers/images/volumes, and rebooted to clear temp stuff, but that was only 3% of disk space. Then i stopped docker, moved /var/lib/docker, and restarted it, the difference is huge
      • only did it on prince for now
      • i think named volumes for /home/search/ dirs may reduce useless data transfer and possibly loads (since it will not start from zero on container restart)
      • reosarevok
        Ether_Man: we mostly take them from Wikidata, https://github.com/metabrainz/musicbrainz-serve... is the relevant code in case it's useful
      • (I guess you figured the first half out, but just in case the second half helps :) )
      • Ether_Man
        reosarevok, any reason you're not simply using the commons-api? Seems to be a lot of waste to go through wikidata
      • reosarevok
        We're picking the image set as the artist image in Wikidata
      • (which is the link we have for the artist)
      • Freso
        Ether_Man: We try to link as much as possible to Wikidata, which is also what the Wikimedia Foundation prefers to use as their unique identifiers.
      • reosarevok
        If we have a manually-set image for the artist, then we use that instead, of course, but otherwise Wikidata is the obvious place to check :)
      • Freso
        So Wikidata is the entry point to Wikipedia, Commons, Quotes, …
      • Ether_Man
        Right. Which gives you the html for the image on commons. Isn't it more efficient to ask the commons api for the image url directly?
      • reosarevok
        Can you ask directly for the image linked as main image to a specific wikidata ID?
      • If so, it'd be, I guess
      • Ether_Man
      • Gives you both the url to the html page, and the image directly, as well as the url to the html for that specific revision
      • reosarevok
        Yes, but you're asking for "Image:AlYankovicByKristineSlipson.jpg"
      • We don't have that, we just have "this is then entity at Wikidata ID Q24343543" or whatever
      • *the entity
      • zas
        ruaok: proceeding on boingo now
      • Ether_Man
        Right. But the data you get from there, is the url to the html descriptor page of the image.
      • ruaok
        k
      • reosarevok
        Sure, and then we call Commons with it: https://github.com/metabrainz/musicbrainz-serve... etc
      • Ether_Man
        Right, but you're essentially fetching that full html page and then just discarding everything but the image. Rather than asking the API for just the image.
      • reosarevok
        get_commons_image does exactly the API call you said https://github.com/metabrainz/musicbrainz-serve...
      • (unless I'm missing something)
      • Ether_Man
        And before anyone gets annoyed. I'm not meaning this as critique or anything, I'm genuinely interested which is more efficient here
      • reosarevok
        And get_wikidata_properties also makes an API call: https://github.com/metabrainz/musicbrainz-serve...
      • Ether_Man
        oh sorry then :)
      • reosarevok
        It does mean we need to make two calls, of course :)
      • Ether_Man
        I read it as if it were reading the descriptor page :)
      • reosarevok
        But at the same time, it saves us worrying about whether the images move or change or whatever, and if someone adds a new better image to the Wikidata page, we get it automatically, which is nice
      • Ether_Man
        Well yea, but one minimal call, and then the image, rather than a whole html page, at least feels like it should be more efficient :)
      • reosarevok
        (before we started basing everything on Wikidata, we'd have to fix a few hundred Wikipedia links a month for example because they kept removing them or redirecting them :D )
      • But that's also, I guess, why they're not automatically added to the ws results
      • Freso
      • reosarevok
        (because we'd have to call Wikidata/Commons every time to give the URL, which the user might not even need, so it's better if the user does it)
      • Freso
        Yeah. We provide the data we have, which includes a lot of additional IDs and links that data users can use to query for more/other data on other services.
      • Ether_Man
        If only Plex Agents actually had a system to STORE all those links, they'd be really handy indeed for me :)
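The two-call lookup discussed above (Wikidata entity ID first, then Commons for the file URL) can be sketched roughly as follows. This is not the musicbrainz-server code; it is a minimal illustration that assumes the public MediaWiki API actions `wbgetclaims` (with property P18, "image") and `query` with `prop=imageinfo`. The function names are hypothetical, and the entity ID is the placeholder from the conversation. The functions only build URLs and parse already-decoded JSON, so fetching is left to the caller.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def wikidata_image_claim_url(entity_id):
    """Call 1: ask Wikidata for the P18 (image) claims of an entity."""
    params = {
        "action": "wbgetclaims",
        "entity": entity_id,
        "property": "P18",
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def image_filename_from_claims(payload):
    """Pick the first image file name out of a decoded wbgetclaims response."""
    for claim in payload.get("claims", {}).get("P18", []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value")
        if value:
            return value
    return None


def commons_imageinfo_url(filename):
    """Call 2: ask Commons for the direct URL of that file."""
    params = {
        "action": "query",
        "titles": "File:" + filename,
        "prop": "imageinfo",
        "iiprop": "url",
        "format": "json",
    }
    return COMMONS_API + "?" + urlencode(params)


def image_url_from_imageinfo(payload):
    """Pick the direct image URL out of a decoded imageinfo response."""
    for page in payload.get("query", {}).get("pages", {}).values():
        for info in page.get("imageinfo", []):
            if "url" in info:
                return info["url"]
    return None
```

Starting from the Wikidata ID rather than a stored file name is what lets a newly chosen image on the Wikidata entity show up without any change on the caller's side, at the cost of the second request.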
      • agentsim has quit
      • Norwich_ joined the channel
      • Norwich_
        Do you sell CDs ?
      • ruaok
        no. we don't sell anything.
      • Norwich_
        Someone pinched my CRIMCD86 ... I have an empty Case !!
      • Norwich_ has quit
      • ZaphodBeeblebrox
        ..right
      • I was expecting a leadup to a joke honestly.
      • MajorLurker
        you got it
      • ZaphodBeeblebrox
        "Do you dig graves?"
      • "Yea, they're alright"
      • "I think they're wonderful"
      • (and if people get that reference they are officially Cool™)
      • MajorLurker
        hahah.... new jazz group
      • ZaphodBeeblebrox
        what part? :P
      • MajorLurker
        the digging part bro.... ;)
      • ZaphodBeeblebrox
      • github joined the channel
      • github
        [picard] samj1912 closed pull request #713: Remove unneeded unicode prefix (master...remove_unicode) https://git.io/v9WcW
      • github has left the channel
      • samj1912
        !m Freso
      • BrainzBot
        You're doing good work, Freso!
      • samj1912
        :D
      • nice blog post :D
      • also
      • !m everyone :D
      • BrainzBot
        You're doing good work, everyone :D!
      • ZaphodBeeblebrox
        :D
      • reosarevok
        Freso: are you going to tweet the community thing with MeB?
      • UmkaDK joined the channel
      • amanmehta has quit
      • UmkaDK has quit
      • github joined the channel
      • github
        [picard] mineo opened pull request #714: Reject functions with required keyword-only arguments (master...reject-kwonly) https://git.io/v9lcH
      • github has left the channel
      • Leo_Verto[m]
        Ether_Man: so you're developing a MB plex plugin?
      • Freso: great blog post!
      • Ether_Man
        Leo_Verto[m], Sort of. I'm mainly updating and reworking my old one
      • Leo_Verto[m]
        Sounds pretty useful, even though I use Emby :P
      • are you setting a custom user agent?
      • Ether_Man
        Well it can't get more useless than the last.fm agent that comes with plex at least...
      • Leo_Verto[m]
        heh
      • Ether_Man
        Yes. 'MusicbrainzPlexAgent/1.0.0.0 ( [redacted]@gmail.com )' :)
      • Leo_Verto[m]
        ah, then you shouldn't run into any problems with rate limiting
      • SothoTalKer
        well....
      • Leo_Verto[m]
        I mean, if it's not using a generic UA and there's a way to contact the dev, issues can be resolved more diplomatically
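A custom User-Agent in the shape quoted above (application name, version, contact in parentheses) could be attached to a request roughly like this. It is a minimal sketch using Python's standard `urllib`; the helper name `build_request`, the application name, the contact address, and the URL in the usage example are all placeholders, not anything from the actual Plex agent.

```python
from urllib.request import Request


def build_request(url, app_name, app_version, contact):
    """Build a request whose User-Agent identifies the app and its developer."""
    user_agent = f"{app_name}/{app_version} ( {contact} )"
    return Request(url, headers={"User-Agent": user_agent})


# Hypothetical usage: nothing is sent until the request is passed to urlopen().
request = build_request(
    "https://example.org/ws/", "ExampleAgent", "1.0", "dev@example.org"
)
```

Keeping the contact detail in the header is what makes it possible for server operators to reach the developer before resorting to a block.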
      • Ether_Man
        I'm more worried about being banned for doing something really dumb against the API really. But I also know that there's a lot of plex users that are seriously unhappy with the last.fm agent... Doesn't help that they released a paid agent, and broke the entire thing in the process so UI doesn't actually reflect the metadata for things like artist name and album titles... Unless ofc, you're using the paid version...
      • SothoTalKer
        Leo_Verto[m]: i am a good citizen, too and my script gets a 503 sometimes :p
      • usually in the evening and night european time.
      • and i make 1 request every 5 seconds :p
      • Leo_Verto[m]
        hmm, does the API really throw 503s when rate-limiting?
      • Ether_Man
        Yes
      • According to the doc at least https://musicbrainz.org/doc/XML_Web_Service/Rat... "and decline (http 503) the rest."
      • reosarevok
      • BrainzBot
        MBS-5827: for /ws/3, use HTTP 429 rather than HTTP 503
      • Leo_Verto[m]
        yeah, that seems like the cleaner solution
      • ruaok
        Leo_Verto[m]: yes it does. we used 503 before 429 existed.
      • Leo_Verto[m]
        ah
      • and SothoTalKer's 503's might be caused by other things, right?
      • SothoTalKer
        Leo_Verto[m]: that's actually a fun thing. the first reply is a 503. when you request again too fast, you get an { "Error": "Server is busy" } or something :p
      • Leo_Verto[m]: no it's not. my code is great and has no flaws. :p
      • Leo_Verto[m]
        hmm, unfortunately stats.meb.org doesn't seem to record 503s but couldn't those also be caused by genuine server-overload?
      • SothoTalKer
        that's the case, indeed :x
      • when the usa is sleeping, i almost never get an error :-)
      • zas
      • around 7k per minute
      • Leo_Verto[m]
        wow
      • yeah, I can see why those aren't on the other graphs :P
      • SothoTalKer
        and i am responsible for a few :x
      • Leo_Verto[m]
        are 403s banned clients?
      • zas
        yes
      • ruaok
        403 are reserved for SothoTalKer. :)
      • SothoTalKer
        *growl*
      • zas
        that said, 503s can be either individual rate limit or global one (the code is the same)
      • Leo_Verto[m]
        I didn't realize MB was blocking more than three times as many requests as it let through
      • zas
        one can check the headers returned by the rate limiter for more info about the reason for the rate limit
      • Leo_Verto[m]
        is the list of the worst offenders public?
      • zas
        We block also at IP level (those don't appear of course, mbstats is based on actual web logs)
      • SothoTalKer
        i can imagine why. All those userscripts make heavy use of the webservice :x
      • Leo_Verto[m]
        so most of those 403s are probably not offending clients but certain software/apps used by tons of clients then, right?
      • zas
        yes, this is why we'll move to api key based ws in the future, to have better control
      • SothoTalKer
        how would i request a key? do i need to sign up with MB to get a key?
      • zas
        prolly, it doesn't exist yet, but that's planned
      • SothoTalKer
        you could still have the old keyless api for those people who want to run their own search servers?
      • Leo_Verto[m]
        are the people causing more than a hundred thousand failing requests using custom UAs or just generic ones?
      • how do you even not notice your app being blocked
      • khan joined the channel
      • zas
        users are supposed to comply with simple requirements concerning UA and rate; if you are blocked you get 403s, not 503s
      • SothoTalKer
        i should print out my error messages XD
      • zas
        503s are when rate limited (and headers can tell when to retry, and if it is the global rate limit, in which case your rate may be correct, but servers are overloaded)
      • FYI we serve ~333 200 responses per second on the ws alone, that's ~30M per day, 10501488000 per year...
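The distinction made above (403 means blocked, 503 means rate limited and worth retrying later) could be handled on the client side roughly as follows. This is a sketch under assumptions: the conversation only says that headers can tell when to retry, so the `Retry-After` header name (in seconds) is an assumption, as are the function names and the exponential fallback delay. The `fetch` callable is injected so the logic can be tried without any network access.

```python
import time


def parse_retry_delay(headers, fallback):
    """Return the server-suggested delay in seconds, or the fallback.

    Assumes a numeric "Retry-After" header; the real header name may differ.
    """
    try:
        return max(0.0, float(headers.get("Retry-After")))
    except (TypeError, ValueError):
        return fallback


def fetch_with_retry(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it stops returning 503 or attempts run out.

    fetch() must return a (status, headers, body) tuple.
    """
    for attempt in range(1, max_attempts + 1):
        status, headers, body = fetch()
        if status == 403:
            # Blocked clients should fix their UA or rate, not retry.
            raise PermissionError("403: client appears to be blocked")
        if status != 503:
            return status, body
        if attempt < max_attempts:
            # Prefer the server's hint; otherwise back off exponentially.
            sleep(parse_retry_delay(headers, base_delay * 2 ** (attempt - 1)))
    raise RuntimeError("still rate limited after %d attempts" % max_attempts)
```

Because a 503 may come from the global limit rather than the individual one, a well-behaved client can still see it occasionally; backing off is the appropriate reaction in both cases.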
      • SothoTalKer
        zas: did you see an increase in requests within the last month?
      • hibiscuskazeneko has quit
      • github joined the channel
      • github
        [picard] mineo opened pull request #715: patch_version: Use sys.platform as the default platform name (master...patch-version-default) https://git.io/v9l8M
      • github has left the channel
      • SothoTalKer
        Leo_Verto[m]: regarding the survey. what's the difference between rarely and sometimes? :p
      • Slurpee joined the channel
      • Slurpee has quit
      • Slurpee joined the channel
      • zas
        SothoTalKer: we had an increase in requests since 4/18
      • SothoTalKer
        that explains it :-)
      • Leo_Verto[m]
        SothoTalKer: as a non-native speaker, I'd say rarely is "I've used it once or twice" and sometimes is more along the lines of "I use it every once in a while"
      • do you feel I should clarify/change that in the survey?
      • SothoTalKer
        I would not mind clarifying it. One question is marked as mandatory. is this intended?
      • reosarevok
        ruaok, Freso, one of you two interested?: https://twitter.com/Tpt93/status/85902034563385...
      • ruaok
        I'd do that. :)
      • ruaok responds
      • Leo_Verto[m]
        SothoTalKer: I was planning to set most of them as mandatory (except for the last section), I just didn't do that for now to make looking through the survey easier
      • ZaphodBeeblebrox
        arg now even twitter does this annoying "cropping your avatar into a circle with css/js" bullshit I have no idea why they are doing