#metabrainz


      • alastairp
        yes, the old dump is a different table structure
      • 2019-05-09 12936, 2019

      • alastairp
        and for years I've been saying that I'd fix them, and make a small dump for developers/testing
      • 2019-05-09 12940, 2019

      • alastairp
        and never got around to it
      • 2019-05-09 12942, 2019

      • alastairp
        so that's a thing
      • 2019-05-09 12938, 2019

      • aidanlw17
        Gotcha - happens to all of us
      • 2019-05-09 12956, 2019

      • aidanlw17
        Maybe we can add that to the list of PRs to get done then?
      • 2019-05-09 12910, 2019

      • aidanlw17
        It would probably be helpful when we get into building the similarity index
      • 2019-05-09 12925, 2019

      • alastairp
        yeah, right. we have some PRs
      • 2019-05-09 12950, 2019

      • alastairp
      • 2019-05-09 12959, 2019

      • alastairp
        but it's really difficult to test. takes a long time
      • 2019-05-09 12923, 2019

      • alastairp
        this is why I'm excited about the submission offset patch, it's going to speed up dumps significantly
      • 2019-05-09 12929, 2019

      • alastairp
        and will solve many of the problems that I thought we had
      • 2019-05-09 12942, 2019

      • alastairp
        so maybe this summer will be the time to get these finished
      • 2019-05-09 12906, 2019

      • alastairp
        yes, for testing similarity we have a handful of things to do there - we can probably get you a bunch of items for you to test with
      • 2019-05-09 12926, 2019

      • alastairp
        once we want to test large-scale we'll get another development server and make an entire copy of the current AB database
      • 2019-05-09 12921, 2019

      • aidanlw17
        yeah! I think so... I'm up for it :) hopefully we can get the offset patch up by Monday or on the weekend?
      • 2019-05-09 12947, 2019

      • alastairp
        my goal is by the end of next week
      • 2019-05-09 12915, 2019

      • aidanlw17
        Sounds good. I'll let you know when I get the second part up, and we can probably plan out some of the testing when we go over the proposal monday
      • 2019-05-09 12901, 2019

      • alastairp
        sounds good
      • 2019-05-09 12906, 2019

      • zas
      • 2019-05-09 12942, 2019

      • ruaok
        thank you!
      • 2019-05-09 12954, 2019

      • alastairp
        have you just started this statistic?
      • 2019-05-09 12902, 2019

      • ruaok
        yes
      • 2019-05-09 12916, 2019

      • alastairp
        great
      • 2019-05-09 12941, 2019

      • alastairp
        how easy is it to split into GET/POST and url? (highlevel/lowlevel get)
      • 2019-05-09 12913, 2019

      • zas
        alastairp: it isn't a full-blown web log analyzer, but rather a very quick one I created for mbs high traffic to get near-real-time stats
      • 2019-05-09 12948, 2019

      • alastairp
        👍 ok
      • 2019-05-09 12959, 2019

      • alastairp
        how long will you be in Barcelona for?
      • 2019-05-09 12907, 2019

      • zas
        till 18
      • 2019-05-09 12913, 2019

      • alastairp
        great
      • 2019-05-09 12918, 2019

      • alastairp
        let's do something before you go
      • 2019-05-09 12925, 2019

      • ZoeB joined the channel
      • 2019-05-09 12932, 2019

      • zas
        sure :) like drinking beers ??
      • 2019-05-09 12942, 2019

      • alastairp
        (and so we can discuss logging in more detail)
      • 2019-05-09 12946, 2019

      • alastairp
        sounds good
      • 2019-05-09 12900, 2019

      • alastairp
        I can bring some of Mr_Monkey and my homebrew to officebrainz
      • 2019-05-09 12912, 2019

      • ruaok
        black IPA, please!
      • 2019-05-09 12918, 2019

      • alastairp
        mmmm
      • 2019-05-09 12922, 2019

      • alastairp
        no more black ipa sorry :(
      • 2019-05-09 12925, 2019

      • ruaok
        not that I was asked.
      • 2019-05-09 12927, 2019

      • alastairp
        we have some imperial ipa
      • 2019-05-09 12927, 2019

      • ruaok
        boo.
      • 2019-05-09 12928, 2019

      • zas
        btw, wait a bit while the stats gather more data; it does a sum each minute, so we should have something significant in 20 mins
      • 2019-05-09 12958, 2019

      • alastairp
        the black ipa is difficult
      • 2019-05-09 12912, 2019

      • alastairp
        it still smells pretty stout-y straight out of the bottle
      • 2019-05-09 12918, 2019

      • alastairp
        and it skunks up really quickly
      • 2019-05-09 12919, 2019

      • Mr_Monkey
        ruaok: The imperial IPA is quite nice!
      • 2019-05-09 12933, 2019

      • ruaok
        ok, I shan't be too picky.
      • 2019-05-09 12937, 2019

      • alastairp
        after 2-3 months it's basically a stout. it loses all of the hops
      • 2019-05-09 12951, 2019

      • alastairp
        I have to experiment a bit more with it
      • 2019-05-09 12937, 2019

      • zas
        1k 200s per minute... hmmm
      • 2019-05-09 12943, 2019

      • zas
        that's a lot
      • 2019-05-09 12942, 2019

      • zas
        and a lot of 404s too (almost 500 per minute)
      • 2019-05-09 12919, 2019

      • alastairp
        right - because people query the API for all mbids that they have to get data, if we don't have data for that mbid we return 404
      • 2019-05-09 12922, 2019

      • ruaok
        404s are not surprising.
      • 2019-05-09 12928, 2019

      • ruaok
        that. :)
      • 2019-05-09 12940, 2019

      • alastairp
        remember, no rate limiting or api keys on AB
      • 2019-05-09 12924, 2019

      • alastairp
        we have bulk-get endpoints for lowlevel, we should encourage that more
      • 2019-05-09 12941, 2019
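To illustrate the bulk-get suggestion above, here is a minimal sketch of how a client might batch its MBIDs into a few bulk requests instead of issuing one request per recording. The base URL, the `recording_ids` parameter with semicolon-separated IDs, and the cap of 25 IDs per request are assumptions taken from memory of the AcousticBrainz API documentation and are not stated in this log; check them against the current docs before relying on them. The helper names are hypothetical.

```python
from typing import Iterator, List
from urllib.parse import quote

# Assumptions (not stated in the log): endpoint path, parameter name, per-request cap.
BULK_LOWLEVEL_URL = "https://acousticbrainz.org/api/v1/low-level"
MAX_IDS_PER_REQUEST = 25


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def bulk_lowlevel_urls(mbids: List[str],
                       batch_size: int = MAX_IDS_PER_REQUEST) -> List[str]:
    """Build one bulk-get URL per batch of MBIDs (hypothetical helper)."""
    urls = []
    for batch in chunked(mbids, batch_size):
        # IDs are joined with ';' and the separator is left unescaped.
        ids = quote(";".join(batch), safe=";")
        urls.append(f"{BULK_LOWLEVEL_URL}?recording_ids={ids}")
    return urls
```

With 60 MBIDs this produces three URLs rather than 60 single-recording requests, and MBIDs the server has no data for would no longer each cost a separate 404.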

      • ruaok
        alastairp: I'm going to bump up work_mem again. ok for me to proceed?
      • 2019-05-09 12949, 2019

      • zas
        still ~1.5k requests / min -> 25 req/s, but we'll get a better figure after a while
      • 2019-05-09 12918, 2019
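The per-minute sums and the 1.5k/min to 25 req/s conversion mentioned above can be sketched as follows. This is not zas's tool, only a minimal illustration of the idea; the event format (a Unix timestamp in seconds plus an HTTP status code) and the function names are assumptions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def tally_per_minute(events: Iterable[Tuple[int, int]]) -> Dict[int, Counter]:
    """Group (unix_seconds, http_status) events into per-minute status counts."""
    buckets: Dict[int, Counter] = {}
    for timestamp, status in events:
        minute_start = timestamp - timestamp % 60  # floor to the start of the minute
        buckets.setdefault(minute_start, Counter())[status] += 1
    return buckets


def per_second(requests_per_minute: float) -> float:
    """Convert a per-minute request count to an average per-second rate."""
    return requests_per_minute / 60.0
```

For the figure quoted in the channel, `per_second(1500)` gives 25.0.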

      • alastairp
        I'm making a list of things to discuss in the AcousticBrainz board on trello
      • 2019-05-09 12942, 2019

      • zas
        good thing > 50% are gzipped
      • 2019-05-09 12957, 2019

      • zas
        for mbs web service that's a very low 3% ...
      • 2019-05-09 12909, 2019

      • zas
        and 45% for mb website
      • 2019-05-09 12932, 2019

      • zas
        is there any rate limit ?
      • 2019-05-09 12949, 2019

      • alastairp
        no rate limit
      • 2019-05-09 12904, 2019

      • ruaok
        that should be fixed pretty soon, methinks.
      • 2019-05-09 12905, 2019

      • alastairp
      • 2019-05-09 12950, 2019

      • aidanlw17
        alastairp should I join the trello?
      • 2019-05-09 12910, 2019

      • alastairp
        I'm not sure you can, but it's not important
      • 2019-05-09 12920, 2019

      • alastairp
        I use it only to keep track of tickets that I've merged but not released
      • 2019-05-09 12930, 2019

      • alastairp
        we do everything else in jira
      • 2019-05-09 12911, 2019

      • aidanlw17
        Okay no worries then. Just wanted to make sure I wasn't missing something important
      • 2019-05-09 12928, 2019

      • ZoeB
        Hi! I have a JSON API question: when I access http://musicbrainz.org/ws/2/release-group?artist=… it doesn't include https://musicbrainz.org/release/7f76f20e-acbb-4c3… Can anyone see why?
      • 2019-05-09 12942, 2019

      • ZoeB
        I thought it was because it's joint by another artist as well, but upon a closer look, https://musicbrainz.org/release-group/7a2bb171-77… *is* included and that's a joint artist work too... Is there any other reason that EP might be excluded from the results?
      • 2019-05-09 12916, 2019

      • alastairp
        ZoeB: there are only 25 results there, but 31 in total
      • 2019-05-09 12920, 2019

      • ZoeB
        (I can see an argument that I *may* have over-engineered my website to be reliant upon this...)
      • 2019-05-09 12925, 2019

      • alastairp
        did you use the offset/limit?
      • 2019-05-09 12930, 2019

      • ZoeB
        Oh, it's paginated?
      • 2019-05-09 12933, 2019

      • alastairp
        yep
      • 2019-05-09 12940, 2019

      • alastairp
        default 25, you can select up to 100
      • 2019-05-09 12954, 2019

      • travis-ci joined the channel
      • 2019-05-09 12954, 2019

      • travis-ci
        Project bookbrainz-site build #2161: passed in 4 min 0 sec: https://travis-ci.org/bookbrainz/bookbrainz-site/…
      • 2019-05-09 12954, 2019

      • travis-ci has left the channel
      • 2019-05-09 12954, 2019

      • ZoeB
        &limit=100?
      • 2019-05-09 12909, 2019

      • alastairp
        that looks good
      • 2019-05-09 12916, 2019
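The offset/limit paging described above can be sketched like this. It is a minimal example, not an official client: the `release-groups` and `release-group-count` response keys are what the JSON browse responses are assumed to contain, and the User-Agent string is a placeholder. A real client should send a meaningful User-Agent and pause between requests to respect the MusicBrainz rate limit.

```python
import json
from typing import Callable, Dict, List
from urllib.parse import urlencode
from urllib.request import Request, urlopen

WS_ROOT = "https://musicbrainz.org/ws/2"


def fetch_json(url: str) -> Dict:
    """Fetch a URL and decode the JSON body (placeholder User-Agent)."""
    request = Request(url, headers={"User-Agent": "example-app/0.1 ( you@example.org )"})
    with urlopen(request) as response:
        return json.load(response)


def browse_release_groups(artist_mbid: str,
                          fetch: Callable[[str], Dict] = fetch_json,
                          limit: int = 100) -> List[Dict]:
    """Collect every release group for an artist by walking the pages."""
    results: List[Dict] = []
    offset = 0
    while True:
        query = urlencode({"artist": artist_mbid, "limit": limit,
                           "offset": offset, "fmt": "json"})
        page = fetch(f"{WS_ROOT}/release-group?{query}")
        batch = page.get("release-groups", [])
        results.extend(batch)
        offset += len(batch)
        # Stop on an empty page or once the reported total has been reached.
        if not batch or offset >= page.get("release-group-count", 0):
            return results
```

Passing `fetch` as a parameter keeps the paging logic testable without network access; with the default page size of 25, an artist with 31 release groups needs two requests, with `limit=100` only one.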

      • ZoeB
        Thank you!
      • 2019-05-09 12928, 2019

      • alastairp
        I note that you're also doing a query for release-groups, but you're saying that a release isn't in the results
      • 2019-05-09 12935, 2019

      • alastairp
        was that an error?
      • 2019-05-09 12922, 2019

      • ruaok
        AB is back. the work_mem doesn't seem to be reducing the temp files. let me let it settle down for a bit.
      • 2019-05-09 12906, 2019

      • zas
        ruaok: what's the size of the ab database ?
      • 2019-05-09 12924, 2019

      • ruaok
        waaaay bigger than reo's mum.
      • 2019-05-09 12930, 2019

      • zas
        i mean in megabytes ? does it fit in ram or not ?
      • 2019-05-09 12934, 2019

      • ruaok
        no.
      • 2019-05-09 12923, 2019

      • ZoeB
        I'm somewhat manually recursing, pulling in "http://musicbrainz.org/ws/2/release?release-group={$releaseGroupID}&inc=recordings+artist-credits+url-rels&fmt=json" for each result. I only had https://wiki.musicbrainz.org/Development/JSON_Web… to go on, so a bit of trial and error was involved. It's not my best work, I'll be honest... (;-.-)
      • 2019-05-09 12912, 2019

      • ruaok
        zas: 588G /var/lib/docker/volumes/postgres-acousticbrainz-data/_data/base/130618
      • 2019-05-09 12927, 2019

      • alastairp
        right, that's fine. I just wasn't sure if you were doing that step when you said "it's not there"
      • 2019-05-09 12927, 2019

      • ruaok
        that is the largest file on disk.
      • 2019-05-09 12932, 2019

      • Mr_Monkey
        iliekcomputers: I can't seem to request your review from the PR page, so here I am: I fixed the issues I was having, but would love another pair of eyes to look at https://github.com/bookbrainz/bookbrainz-site/pul…
      • 2019-05-09 12933, 2019

      • ruaok
        close to 600GB in total
      • 2019-05-09 12917, 2019

      • ZoeB
        Ah, that fixed it, thank you so much!
      • 2019-05-09 12940, 2019

      • alastairp
        ZoeB: you can also do /release?artist={artist-id} if that helps? if you get 100 items at a time it might be less queries than doing 1 for every release-group-id?
      • 2019-05-09 12928, 2019

      • alastairp
        https://musicbrainz.org/doc/Development/XML_Web_S… has more documentation, including what types you can search for using what other types
      • 2019-05-09 12946, 2019

      • alastairp
        ignore the fact that it says XML Web Service, it's the same syntax, the only difference is &fmt=
      • 2019-05-09 12947, 2019

      • ZoeB
        Thank you, I'll look into refactoring it like that! I definitely don't want to strain your server.
      • 2019-05-09 12956, 2019

      • djwhitey has quit
      • 2019-05-09 12911, 2019

      • iliekcomputers
        Mr_Monkey: yes, sorry, I'll look at it today (he said again).
      • 2019-05-09 12929, 2019

      • zas
        frank has hard drives, not SSD, much slower
      • 2019-05-09 12930, 2019

      • ruaok
        work mem at 128MB is doing better, but I feel that value is too large.
      • 2019-05-09 12953, 2019
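One reason a large work_mem can feel risky: in PostgreSQL the setting limits each sort or hash operation rather than the whole server, so several operations across many concurrent queries can each claim that amount. A rough back-of-the-envelope sketch follows; the connection and operation counts are hypothetical and are not figures from this log.

```python
def worst_case_work_mem_bytes(work_mem_mb: int,
                              concurrent_queries: int,
                              ops_per_query: int) -> int:
    """Upper bound if every sort/hash operation used its full work_mem at once."""
    return work_mem_mb * 1024 * 1024 * concurrent_queries * ops_per_query


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB."""
    return num_bytes / 1024 ** 3


# Hypothetical load: 50 concurrent queries, 2 sort/hash operations each.
estimate = to_gib(worst_case_work_mem_bytes(128, 50, 2))  # 12.5 GiB
```

This is only an upper bound; actual use depends on how many operations really need that much memory at the same time.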

      • Mr_Monkey
        iliekcomputers: Having fixed my issues (I forgot to copy crucial service files a couple of PRs ago), there's no huge rush.
      • 2019-05-09 12913, 2019

      • ZoeB has left the channel
      • 2019-05-09 12918, 2019

      • zas
        ruaok: where can i see the current frank's pg config ?
      • 2019-05-09 12958, 2019

      • ruaok
      • 2019-05-09 12907, 2019

      • ruaok
        are all the default values.
      • 2019-05-09 12958, 2019

      • zas
        shared_buffers is 128M ? that looks very low to me
      • 2019-05-09 12907, 2019

      • zas
      • 2019-05-09 12911, 2019

      • ruaok
      • 2019-05-09 12925, 2019

      • ruaok
        those are the actual values that override the defaults.
      • 2019-05-09 12935, 2019

      • zas
        ah ok
      • 2019-05-09 12939, 2019

      • ruaok
        16GB, but it should be 32GB or even 40GB.
      • 2019-05-09 12945, 2019

      • ruaok
        that is the next thing I want to change.
      • 2019-05-09 12931, 2019

      • zas
        yup, i'd say at least 32GB, especially for such big db
      • 2019-05-09 12938, 2019

      • ruaok
      • 2019-05-09 12902, 2019

      • ruaok
        but I want to let the current work_mem at 128MB run for a bit.
      • 2019-05-09 12908, 2019

      • zas
        lgtm, but i don't expect a miracle
      • 2019-05-09 12940, 2019

      • ruaok
        the miracle fixing comes from an ORDER BY clause gets removed.
      • 2019-05-09 12953, 2019

      • ruaok
        *being
      • 2019-05-09 12946, 2019

      • ruaok
        disks pegged to 100% again.
      • 2019-05-09 12901, 2019

      • ruaok
        load 23. ok, never mind, I'll push this out now.
      • 2019-05-09 12929, 2019

      • ruaok
        hit approve on the PR, zas?
      • 2019-05-09 12920, 2019

      • zas
        done
      • 2019-05-09 12923, 2019

      • ruaok
        thx
      • 2019-05-09 12949, 2019

      • zas
        sorry, my connection is unstable (phone+train)
      • 2019-05-09 12900, 2019

      • ruaok
        no worries.
      • 2019-05-09 12910, 2019

      • ruaok
        you on AVE or TGV?
      • 2019-05-09 12951, 2019

      • zas
        ave
      • 2019-05-09 12912, 2019

      • zas
        did you notice frank's disk I/O is mostly writes?
      • 2019-05-09 12954, 2019

      • yvanzo
        hi zas: I checked MBS-7130 and it is not resolved yet.
      • 2019-05-09 12955, 2019

      • BrainzBot
        MBS-7130: Recordings and tracklists: allow setting milliseconds (for other formats than CDDA) https://tickets.metabrainz.org/browse/MBS-7130
      • 2019-05-09 12904, 2019

      • ruaok
        yes. query temp storage.
      • 2019-05-09 12909, 2019

      • ruaok
        that's what I've been working to address.
      • 2019-05-09 12956, 2019

      • zas
        ok, makes sense, hence work_mem changes
      • 2019-05-09 12914, 2019

      • zas
        what was the highest value you tried for work_mem ?
      • 2019-05-09 12949, 2019

      • ruaok
        the current of 128MB
      • 2019-05-09 12935, 2019

      • chhavi_ joined the channel
      • 2019-05-09 12901, 2019

      • zas
        try 512M, just to see if it has any effect, i think queries and/or indexes just need huge optimization, especially if tables/db are big
      • 2019-05-09 12947, 2019

      • ruaok
        first I want to see the effect of more shared buffers.
      • 2019-05-09 12926, 2019

      • zas
        the traffic isn't that high, we could set up telegraf to collect frank's pg stats as we do for bowie, but it needs some config on pg side, we may do that tomorrow
      • 2019-05-09 12945, 2019

      • zas
        just to see how many transactions etc...
      • 2019-05-09 12913, 2019

      • yvanzo
        ruaok: I just checked the Alpine CVE, Solr doesn’t rely on system authentication, thus mb-solr is not affected.