#metabrainz

      • Sophist_UK joined the channel
      • 2020-02-22 05300, 2020

      • supersandro2000 has quit
      • 2020-02-22 05315, 2020

      • supersandro2000 joined the channel
      • 2020-02-22 05349, 2020

      • ruaok picked a bad time for a nap.
      • 2020-02-22 05350, 2020

      • Sophist-UK has quit
      • 2020-02-22 05307, 2020

      • zas
        About MB / Json APIs, I was reading https://www.haproxy.com/blog/using-haproxy-as-an-…
      • 2020-02-22 05320, 2020

      • zas
        Can MB provide such JWT tokens using current oauth stuff?
      • 2020-02-22 05350, 2020

      • zas
        yvanzo, bitmap: one thing prevents us from providing an embedded Bandcamp player: we have Bandcamp URLs for streamable albums, but not the Bandcamp Album ID (called BAID below)
      • 2020-02-22 05315, 2020

      • BrainzGit
        [listenbrainz-server] paramsingh merged pull request #739 (master…importer-tests): Write tests for the LastFM Importer https://github.com/metabrainz/listenbrainz-server…
      • 2020-02-22 05325, 2020

      • zas
        we can actually retrieve it from Bandcamp web pages
      • 2020-02-22 05346, 2020

      • zas
        because it is in head/meta (and in cookies)
      • 2020-02-22 05336, 2020

      • zas
        I think it would actually be useful to store it in the DB: release:mbid -> BAID
      • 2020-02-22 05358, 2020

      • zas
        what do you think?
      • 2020-02-22 05333, 2020
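zas's point that the BAID can be read from the page's head/meta can be sketched roughly as follows. This is a minimal Python sketch, not MusicBrainz code: it assumes the ID is stored as JSON in a `<meta>` tag named `bc-page-properties` under an `item_id` key, which should be checked against a real Bandcamp page before relying on it. `MetaCollector` and `extract_baid` are hypothetical names, and fetching the page is left out.

```python
import json
from html.parser import HTMLParser
from typing import Dict, Optional


class MetaCollector(HTMLParser):
    """Collect name -> content pairs from <meta> tags (HTMLParser unescapes entities in attributes)."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name")
        content = attr_map.get("content")
        if name and content is not None:
            self.meta[name] = content


def extract_baid(page_html: str, meta_name: str = "bc-page-properties") -> Optional[int]:
    """Return the Bandcamp album ID found in the page's <meta> tags, or None.

    ASSUMPTION: the ID sits in a JSON blob in <meta name=meta_name> under the
    key "item_id"; verify this against a real page first.
    """
    parser = MetaCollector()
    parser.feed(page_html)
    parser.close()
    raw = parser.meta.get(meta_name)
    if raw is None:
        return None
    try:
        props = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(props, dict) or props.get("item_id") is None:
        return None
    try:
        return int(props["item_id"])
    except (TypeError, ValueError):
        return None
```

The returned value could then populate the release:mbid -> BAID table zas proposes.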

      • ruaok
        130M listens imported, no degradation
      • 2020-02-22 05336, 2020

      • iliekcomputers
        pristine__: hi
      • 2020-02-22 05320, 2020

      • Sophist-UK joined the channel
      • 2020-02-22 05311, 2020

      • iliekcomputers
        ruaok: the newleader isn't a hetzner vm now, right?
      • 2020-02-22 05326, 2020

      • iliekcomputers
        i'm seeing degraded performance on it while loading a new dump
      • 2020-02-22 05331, 2020

      • ruaok
        hetzner bare metal.
      • 2020-02-22 05339, 2020

      • iliekcomputers
        so i thought i'd restart the spark containers
      • 2020-02-22 05300, 2020

      • iliekcomputers
        and now it's stuck at that for some reason.
      • 2020-02-22 05305, 2020

      • iliekcomputers
      • 2020-02-22 05332, 2020

      • ruaok
        let's see if the nodes are healthy
      • 2020-02-22 05307, 2020

      • Sophist_UK has quit
      • 2020-02-22 05356, 2020

      • iliekcomputers
        restart worked
      • 2020-02-22 05310, 2020

      • iliekcomputers
        i mean it restarted
      • 2020-02-22 05316, 2020

      • iliekcomputers
        not sure of performance yet
      • 2020-02-22 05349, 2020

      • pristine__
        iliekcomputers: hi
      • 2020-02-22 05312, 2020

      • iliekcomputers
        pristine__: nvm, just saw your reply to my reply on the pr
      • 2020-02-22 05319, 2020

      • ruaok
        iliekcomputers: remind me, what were our rules for listen uniqueness?
      • 2020-02-22 05325, 2020

      • ruaok
        (user, timestamp) ?
      • 2020-02-22 05335, 2020

      • ruaok
        (user, timestamp, recording_msid) ?
      • 2020-02-22 05344, 2020

      • iliekcomputers
        B
      • 2020-02-22 05349, 2020

      • ruaok
        K
      • 2020-02-22 05321, 2020

      • ruaok
        that will be fun to put that unique index on the old data.
      • 2020-02-22 05351, 2020

      • ruaok
        I guess I should make the importer handle it, since all the listens are sorted and doing a dup check would be rather simple.
      • 2020-02-22 05355, 2020
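The importer-side duplicate check ruaok describes, using the (user, timestamp, recording_msid) uniqueness rule, could be sketched like this, assuming listens arrive sorted by (user, timestamp). Only `recording_msid` comes from the discussion; the `Listen` tuple and its `user_name`/`listened_at` fields are hypothetical stand-ins for the real import format.

```python
from typing import Iterable, Iterator, NamedTuple, Optional, Set, Tuple


class Listen(NamedTuple):
    # HYPOTHETICAL record shape; only recording_msid is named in the chat.
    user_name: str
    listened_at: int  # Unix timestamp
    recording_msid: str


def dedup_sorted_listens(listens: Iterable[Listen]) -> Iterator[Listen]:
    """Yield each (user, timestamp, recording_msid) combination once.

    Assumes input sorted by (user_name, listened_at); the order of
    recording_msid values within one timestamp does not matter.
    """
    current_key: Optional[Tuple[str, int]] = None
    seen_msids: Set[str] = set()
    for listen in listens:
        key = (listen.user_name, listen.listened_at)
        if key != current_key:
            # New (user, timestamp) group: in sorted input, older groups cannot recur.
            current_key = key
            seen_msids = set()
        if listen.recording_msid in seen_msids:
            continue  # duplicate of a listen already yielded
        seen_msids.add(listen.recording_msid)
        yield listen
```

Keeping only the msids seen at the current (user, timestamp) bounds memory, and unlike comparing neighbours it still catches duplicates interleaved with other msids at the same timestamp.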

      • ruaok
        zas: I'm eyeing a dedicated time series database server in order to bring more capacity to LB.
      • 2020-02-22 05303, 2020

      • ruaok
        but we have no rackspace.
      • 2020-02-22 05338, 2020

      • ruaok
        could we get rid of another server or do we need to go to using VLANs?
      • 2020-02-22 05356, 2020

      • zas
        I don't think we can get rid of any atm
      • 2020-02-22 05341, 2020

      • ruaok
        I see a lot of underutilized machines when I look at stats.
      • 2020-02-22 05342, 2020

      • ruaok
        paco, serge, cage.
      • 2020-02-22 05336, 2020

      • ruaok
        paco, for instance.
      • 2020-02-22 05353, 2020

      • ruaok
        60GB of ram free. 8% CPU
      • 2020-02-22 05314, 2020

      • ruaok
        are you *sure* we can't forego one of these machines?
      • 2020-02-22 05322, 2020

      • ruaok
        the timescale import is 50% done, 35k rows/s still, 20 - 30% disk, 8% CPU
      • 2020-02-22 05334, 2020

      • yvanzo
        zas: that could also be retrieved client-side using React.js
      • 2020-02-22 05355, 2020

      • yvanzo
        zas: storing it in the DB will be possible with attributes, on my todolist.
      • 2020-02-22 05303, 2020

      • yvanzo
        ruaok: can you please add me on trello?
      • 2020-02-22 05355, 2020

      • ruaok
        what was your username?
      • 2020-02-22 05324, 2020

      • yvanzo
        yvanzo
      • 2020-02-22 05306, 2020

      • ruaok
        added to team and invited to that board.
      • 2020-02-22 05338, 2020

      • yvanzo
        received, thanks!
      • 2020-02-22 05333, 2020

      • Etua has quit
      • 2020-02-22 05311, 2020

      • Etua joined the channel
      • 2020-02-22 05358, 2020

      • Protab joined the channel
      • 2020-02-22 05358, 2020

      • Rotab has quit
      • 2020-02-22 05335, 2020

      • zas
        ruaok: I'll have a look on Monday; we can reduce the redundancy of MB services a bit and free up cage, for example
      • 2020-02-22 05316, 2020

      • ruaok
        as long as it doesn't lead to SPoF, then great!
      • 2020-02-22 05312, 2020

      • Protab is now known as Rotab
      • 2020-02-22 05346, 2020

      • Etua has quit
      • 2020-02-22 05317, 2020

      • Cyna
        bitmap: reosarevok https://github.com/metabrainz/musicbrainz-server/… might be ready for review
      • 2020-02-22 05329, 2020

      • Cyna
        Waiting for CI to complete its testing
      • 2020-02-22 05343, 2020

      • eharris has quit
      • 2020-02-22 05326, 2020

      • Etua joined the channel
      • 2020-02-22 05343, 2020

      • ruaok
        import complete 215 minutes, including building an index.
      • 2020-02-22 05301, 2020

      • rdswift
        outsidecontext, ping.
      • 2020-02-22 05351, 2020

      • outsidecontext
        rdswift: hi
      • 2020-02-22 05352, 2020

      • rdswift
        Just wondering whether you want the PR for the Picard function documentation to target metabrainz/picard-website/master or the 2.3.1 branch on your clone.
      • 2020-02-22 05335, 2020

      • outsidecontext
        please make it against the 2.3.1 branch. then we merge it and finally merge this 2.3.1 pull request
      • 2020-02-22 05335, 2020

      • rdswift
        I didn't see a 2.3.1 branch on metabrainz
      • 2020-02-22 05312, 2020

      • outsidecontext
        ah, I thought I had it pushed there :(
      • 2020-02-22 05350, 2020

      • rdswift
        The PR from your clone points to the master branch on metabrainz
      • 2020-02-22 05307, 2020

      • rdswift
        Thus my confusion.
      • 2020-02-22 05316, 2020

      • outsidecontext
        fixed now, branch is picard-2.3.1
      • 2020-02-22 05327, 2020

      • rdswift
        Perfect! Thanks.
      • 2020-02-22 05314, 2020

      • rdswift
        Also, I thought I would just add the new functions to the documentation for now, and treat the restructure of the docs as a separate project.
      • 2020-02-22 05358, 2020

      • outsidecontext
        yes, that makes sense
      • 2020-02-22 05309, 2020

      • rdswift
        Thanks.
      • 2020-02-22 05317, 2020

      • outsidecontext
        thank you :)
      • 2020-02-22 05326, 2020

      • Sophist-UK has quit
      • 2020-02-22 05338, 2020

      • rdswift
        Not sure if it's an issue, but codacy is not set up to run on the new metabrainz/picard-website/2.3.1 branch so the testing on my PR is failing.
      • 2020-02-22 05315, 2020

      • tmontney joined the channel
      • 2020-02-22 05314, 2020

      • tmontney
      • 2020-02-22 05348, 2020

      • tmontney
        Running on Ubuntu 18.04, first issue started at cpanm --installdeps --notest .
      • 2020-02-22 05304, 2020

      • tmontney
        LWP::Protocol::https failed, but running sudo apt-get install libssl-dev solved it
      • 2020-02-22 05353, 2020

      • tmontney
        ./script/compile_resources.sh failed complaining about "no scenarios" but removing cmdtest and installing yarn via https://classic.yarnpkg.com/en/docs/install/#debi… got me a bit further
      • 2020-02-22 05314, 2020

      • tmontney
        however still fails due to version mismatch
      • 2020-02-22 05351, 2020

      • tmontney
        ./admin/InitDb.pl --createdb --clean claims 'could not open extension control file "/usr/share/postgresql/11/extension/musicbrainz_collate.control"'
      • 2020-02-22 05355, 2020

      • tmontney
        but the file exists
      • 2020-02-22 05341, 2020

      • tmontney
        Instance is a fresh VMWare VM
      • 2020-02-22 05352, 2020

      • tmontney
        I know https://musicbrainz.org/doc/MusicBrainz_Server/Se… says VMWare is not supported, but I assume that only applies if you use the .ova
      • 2020-02-22 05310, 2020

      • ruaok
        we actually suggest using this now:
      • 2020-02-22 05311, 2020

      • ruaok
      • 2020-02-22 05324, 2020

      • ruaok
      • 2020-02-22 05338, 2020

      • ruaok
        fire up a VM, install docker, then run that. wait. done.
      • 2020-02-22 05340, 2020

      • tmontney
        Obviously I missed that but where was that on the site?
      • 2020-02-22 05356, 2020

      • tmontney
        Landed here from Google: https://musicbrainz.org/, then saw the "MusicBrainz Database" download section
      • 2020-02-22 05307, 2020

      • tmontney
        Saw "Download and/or install the database"
      • 2020-02-22 05326, 2020

      • tmontney
        the Setup section should include the docker option
      • 2020-02-22 05333, 2020

      • tmontney
        because that's awesome, that should be much easier
      • 2020-02-22 05345, 2020

      • ruaok
        ok, correction. we should state that that is now the preferred version.
      • 2020-02-22 05349, 2020

      • ruaok
        yvanzo: don't you think?
      • 2020-02-22 05307, 2020

      • ruaok
        i'm super happy to get rid of VM images.
      • 2020-02-22 05313, 2020

      • tmontney
        I'll give the docker option a try
      • 2020-02-22 05319, 2020

      • tmontney
        but is the Install.md still valid?
      • 2020-02-22 05350, 2020

      • ruaok
        as long as you use this branch and follow the instructions in it, yes: https://github.com/metabrainz/musicbrainz-docker/…
      • 2020-02-22 05351, 2020

      • tmontney
        I assume it should say "install with Docker" or "install manually"
      • 2020-02-22 05322, 2020

      • tmontney
        ok
      • 2020-02-22 05352, 2020

      • ruaok
        I follow these instructions to install docker: https://docs.docker.com/install/linux/docker-ce/u…
      • 2020-02-22 05302, 2020

      • ruaok
        iliekcomputers: so far I am really appreciating timescale. using it, it seems about as snappy as influx for the operations we need.
      • 2020-02-22 05330, 2020

      • ruaok
        but it has updates and it is postgres. the one hack we won't be able to get rid of is the per-user listen count time series.
      • 2020-02-22 05322, 2020

      • tmontney
        no I meant say someone didn't want to use docker
      • 2020-02-22 05331, 2020

      • tmontney
        is there a manual method still, or is this the only supported way?
      • 2020-02-22 05350, 2020

      • tmontney
        e.g. their distro doesn't support docker
      • 2020-02-22 05313, 2020

      • ruaok
        only documented way. clearly it will always be feasible to install by hand, but it would require some small amount of insanity.
      • 2020-02-22 05343, 2020

      • ruaok
        it's just grown too complex to realistically install by hand
      • 2020-02-22 05352, 2020

      • tmontney
        fair enough
      • 2020-02-22 05327, 2020

      • Etua has quit
      • 2020-02-22 05315, 2020

      • eharris joined the channel
      • 2020-02-22 05327, 2020

      • tmontney
        one other question
      • 2020-02-22 05339, 2020

      • tmontney
        the whole reason I'm interested in musicbrainz is search
      • 2020-02-22 05350, 2020

      • ruaok
        if you use the VM, search is built in.
      • 2020-02-22 05357, 2020

      • tmontney
        there's an API right?
      • 2020-02-22 05302, 2020

      • ruaok
        yep.
      • 2020-02-22 05306, 2020

      • tmontney
        I have strings of text, some a bit garbled/partial
      • 2020-02-22 05317, 2020

      • tmontney
        that I'd like to find the closest match
      • 2020-02-22 05317, 2020

      • ruaok
        it is exactly the one we use in production, but no rate limits.
      • 2020-02-22 05321, 2020

      • ruaok
        OH!
      • 2020-02-22 05327, 2020

      • ruaok
        really? I have a data set for you.
      • 2020-02-22 05334, 2020

      • tmontney
        a data set?
      • 2020-02-22 05354, 2020

      • tmontney
        what for
      • 2020-02-22 05316, 2020

      • ruaok
        given an artist and a recording string, it gives you an MB recording ID (and a consistent release) and an artist ID.
      • 2020-02-22 05345, 2020

      • ruaok
      • 2020-02-22 05350, 2020

      • tmontney
        A string could look like Man In The Box by ALICE IN CHAINS
      • 2020-02-22 05302, 2020

      • tmontney
        however, I believe they all come in as {Title} by {Artist}
      • 2020-02-22 05309, 2020

      • tmontney
        so i could do a simple string split
      • 2020-02-22 05315, 2020

      • ruaok
        if you can reliably split them, this could work.
      • 2020-02-22 05325, 2020

      • ruaok
        it would be a simple index lookup.
      • 2020-02-22 05346, 2020
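If the strings really are `{Title} by {Artist}`, the split plus exact index lookup could look roughly like this. Splitting on the last "by" (case-insensitive) keeps titles such as "Stand By Me" intact, but it will mis-split an artist name that itself contains "by". The index layout, a dict keyed by normalized (artist, recording) strings, is a hypothetical stand-in for ruaok's data set, whose actual format isn't given here.

```python
import re
from typing import Dict, Optional, Tuple

# " by " with any surrounding whitespace, matched case-insensitively.
_BY = re.compile(r"\s+by\s+", re.IGNORECASE)

# HYPOTHETICAL index shape: (normalized artist, normalized recording) -> IDs,
# e.g. {"recording_mbid": ..., "release_mbid": ..., "artist_mbid": ...}.
Index = Dict[Tuple[str, str], Dict[str, str]]


def split_title_artist(text: str) -> Optional[Tuple[str, str]]:
    """Split '{Title} by {Artist}' on the last 'by'; None if no usable split."""
    matches = list(_BY.finditer(text))
    if not matches:
        return None
    last = matches[-1]
    title = text[:last.start()].strip()
    artist = text[last.end():].strip()
    if not title or not artist:
        return None
    return title, artist


def normalize(s: str) -> str:
    """Case-fold and collapse whitespace so 'ALICE  IN CHAINS' == 'Alice in Chains'."""
    return " ".join(s.casefold().split())


def lookup(index: Index, text: str) -> Optional[Dict[str, str]]:
    """Exact lookup only: a single garbled character means no match."""
    parts = split_title_artist(text)
    if parts is None:
        return None
    title, artist = parts
    return index.get((normalize(artist), normalize(title)))
```

As ruaok notes, this is a plain key lookup, so partial or garbled strings simply come back as `None`.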

      • tmontney
        I'd say about 95% of the text is fully accurate
      • 2020-02-22 05351, 2020

      • tmontney
        but I'd rather not miss any
      • 2020-02-22 05353, 2020

      • ruaok
        but it doesn't handle inexact searches. this data is based on all the data that has ever been submitted to listenbrainz.
      • 2020-02-22 05313, 2020

      • ruaok
        if you care about accuracy, then I would go with the VM with search.
      • 2020-02-22 05332, 2020
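For the inexact case, a search query against a local mirror could be built along these lines. The `/ws/2/recording` search endpoint with a Lucene `query`, `fmt=json` and `limit` follows the public web service; the `http://localhost:5000` base URL is an assumption about the local Docker setup, and this sketch only builds the URL without sending a request.

```python
from urllib.parse import urlencode


def lucene_phrase(value: str) -> str:
    """Quote a value as a Lucene phrase; inside quotes only \\ and " need escaping."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def recording_search_url(title: str, artist: str,
                         base_url: str = "http://localhost:5000",  # ASSUMED local mirror address
                         limit: int = 5) -> str:
    """Build a recording search URL combining recording and artist fields."""
    query = f"recording:{lucene_phrase(title)} AND artist:{lucene_phrase(artist)}"
    params = {"query": query, "fmt": "json", "limit": limit}
    return f"{base_url.rstrip('/')}/ws/2/recording?{urlencode(params)}"
```

Results from search are ranked, so for garbled input it is probably worth looking at the top few candidates rather than taking the first hit as certain.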

      • tmontney
        Yeah, because otherwise it would've been like me hitting YouTube or Google and taking the first result as 100% accurate
      • 2020-02-22 05302, 2020

      • tmontney
        i shouldn't even say I care about accuracy
      • 2020-02-22 05306, 2020

      • ruaok
      • 2020-02-22 05308, 2020

      • tmontney
        just "here's another opinion"
      • 2020-02-22 05344, 2020

      • tmontney
        interesting
      • 2020-02-22 05314, 2020

      • ruaok
        iliekcomputers: 120k rows/s now. 4 threads. SSD is saturated.
      • 2020-02-22 05329, 2020

      • ruaok
        with dedup