#metabrainz

      • iliekcomputers
        ruaok: ferbncode: we still need to make the dataframes persistent
      • 2018-04-01 09128, 2018

      • iliekcomputers
      • 2018-04-01 09159, 2018

      • ruaok
        yeah, I noticed that the script trailed off a bit, but I wanted to start playing with it anyway.
      • 2018-04-01 09115, 2018

      • ruaok
        the old recommendation training ran 10 minutes on the cluster. O_O
      • 2018-04-01 09154, 2018

      • iliekcomputers
        yike
      • 2018-04-01 09157, 2018

      • iliekcomputers
        yikes
      • 2018-04-01 09123, 2018

      • ruaok
        yeah. and that was with 4 * 27GB of ram.
      • 2018-04-01 09115, 2018

      • ruaok
        and the LB dump appears corrupt. I can reproduce mineo's problem.
      • 2018-04-01 09132, 2018

      • outsidecontext joined the channel
      • 2018-04-01 09145, 2018

      • outsidecontext has quit
      • 2018-04-01 09146, 2018

      • drsaunde has quit
      • 2018-04-01 09154, 2018

      • ruaok
        iliekcomputers: the other LB dump is corrupt too. :(
      • 2018-04-01 09104, 2018

      • ruaok
      • 2018-04-01 09134, 2018

      • ruaok
        we need to fix it so that it won't recommend tracks you've got in your listen history.
      • 2018-04-01 09100, 2018

      • UmkaDK has quit
      • 2018-04-01 09148, 2018

      • iliekcomputers
        We subtract user tracks from all tracks already: https://github.com/metabrainz/listenbrainz-recomm…
      • 2018-04-01 09118, 2018

      • iliekcomputers
        not sure why that isn't working
      • 2018-04-01 09148, 2018

      • bukwurm joined the channel
      • 2018-04-01 09112, 2018

      • ruaok
        maybe it is recommending different versions of the same recording?
      • 2018-04-01 09108, 2018

      • akhilesh has quit
      • 2018-04-01 09115, 2018

      • akhilesh joined the channel
      • 2018-04-01 09136, 2018

      • iliekcomputers
        maybe you didn't listen to the track in 2017?
      • 2018-04-01 09150, 2018

      • ruaok
        ah! great observation!
      • 2018-04-01 09123, 2018
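
A minimal sketch of the point above, assuming the "heard" set is built only from listens inside a time window. The function name, track IDs, and dates are hypothetical and not taken from the listenbrainz-recommendation code. It shows how a track heard before 2017 slips back into the candidates when only 2017 listens are subtracted:

```python
from datetime import datetime, timezone


def recommendable_tracks(all_tracks, listens, since=None):
    """Return tracks the user has not listened to.

    listens: iterable of (track_id, listened_at) pairs.
    since:   if given, only listens at or after this time count as heard.
    """
    heard = {
        track
        for track, listened_at in listens
        if since is None or listened_at >= since
    }
    return set(all_tracks) - heard


all_tracks = {"track-a", "track-b", "track-c"}
listens = [
    ("track-a", datetime(2016, 5, 1, tzinfo=timezone.utc)),  # heard before 2017
    ("track-b", datetime(2017, 8, 1, tzinfo=timezone.utc)),  # heard in 2017
]

start_2017 = datetime(2017, 1, 1, tzinfo=timezone.utc)

# Subtracting only 2017 listens leaves track-a as a candidate.
only_2017 = recommendable_tracks(all_tracks, listens, since=start_2017)

# Subtracting the full history removes it.
full_history = recommendable_tracks(all_tracks, listens)
```

Subtracting against the full listen history, rather than the training window, is one way to avoid recommending tracks the user already knows.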

      • ruaok
        the latest data dump inside the LB container is not corrupt.
      • 2018-04-01 09129, 2018

      • ruaok
        copying that to the master.
      • 2018-04-01 09121, 2018

      • sentriz joined the channel
      • 2018-04-01 09116, 2018

      • UmkaDK joined the channel
      • 2018-04-01 09159, 2018

      • ruaok starts an LB data load
      • 2018-04-01 09129, 2018

      • bukwurm
        LordSputnik: The tests on bb-data do indeed fail. Has there been a schema change in the last two years?
      • 2018-04-01 09115, 2018

      • bukwurm
        Here's a gist: relation "bookbrainz._editor_entity_visits" does not exist
      • 2018-04-01 09136, 2018

      • bukwurm
        column "link_phrase" of relation "relationship_type" does not exist
      • 2018-04-01 09131, 2018

      • bukwurm
        (and neither does reverse_link_phrase)
      • 2018-04-01 09144, 2018

      • bukwurm
        Same with "editor_type"
      • 2018-04-01 09125, 2018

      • bukwurm
        Nopes, not editor_type, sorry.
      • 2018-04-01 09135, 2018

      • bukwurm
        Initially I ran into some permission issues and thought those were causing it, but I've added all permissions.
      • 2018-04-01 09153, 2018

      • iliekcomputers
        ruaok: did it stop with an error?
      • 2018-04-01 09101, 2018

      • ruaok
        iliekcomputers: ferbncode: yep.
      • 2018-04-01 09111, 2018

      • UmkaDK has quit
      • 2018-04-01 09113, 2018

      • ruaok
        out of heap space trying to import a user with 28k listens.
      • 2018-04-01 09129, 2018

      • ruaok
        I think we need to brainstorm how to refactor the import to not have to do much work on the client
      • 2018-04-01 09146, 2018

      • iliekcomputers
        first idea that comes to mind is
      • 2018-04-01 09100, 2018

      • iliekcomputers
        we modify the listenbrainz dumps code to add username to the listen json doc
      • 2018-04-01 09113, 2018

      • iliekcomputers
        after that create a new dump and load it directly using sc.textFile
      • 2018-04-01 09151, 2018
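
A rough sketch of that idea in plain Python, assuming the dump is written as one JSON document per line. The `user_name` field name and the helper names are hypothetical; the real dump layout is whatever the listenbrainz-server dump code produces. With every listen carrying its own user, a loader can process one line at a time instead of building a whole user's history on the client:

```python
import io
import json


def write_dump(listens_by_user, out):
    """Write one JSON document per line, tagging each listen with its user."""
    for user_name, listens in listens_by_user.items():
        for listen in listens:
            doc = dict(listen, user_name=user_name)  # hypothetical field name
            out.write(json.dumps(doc) + "\n")


def read_dump(lines):
    """Yield listens one at a time so the whole dump never sits in memory."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# An in-memory buffer stands in for the dump file.
buf = io.StringIO()
write_dump(
    {"alice": [{"track_name": "Song 1"}], "bob": [{"track_name": "Song 2"}]},
    buf,
)
buf.seek(0)
loaded = list(read_dump(buf))
```

In Spark, the same per-line parse could be expressed as `sc.textFile(path).map(json.loads)`, which leaves the splitting and distribution of the file to the cluster.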

      • ruaok
        doh. that's a great idea.
      • 2018-04-01 09103, 2018

      • ruaok
        given that we're running into this problem, others will have the same problem.
      • 2018-04-01 09105, 2018

      • ruaok
        let's do that.
      • 2018-04-01 09132, 2018

      • iliekcomputers
        alright, on it
      • 2018-04-01 09151, 2018

      • LordSputnik
      • 2018-04-01 09112, 2018

      • LordSputnik
        That's the data the client will send to the server for relationships (from the edition editor)
      • 2018-04-01 09128, 2018

      • bukwurm
        LordSputnik: Ok
      • 2018-04-01 09149, 2018

      • bukwurm
        LordSputnik: The relationship tables I have (got through the pg_dump, and they're the same as the schemaSpy diagram on readthedocs) don't have link_phrase?
      • 2018-04-01 09159, 2018

      • bukwurm
        How do I update it?
      • 2018-04-01 09114, 2018

      • ruaok
        Mineo: well spotted, the import runs faster with that corrected. :)
      • 2018-04-01 09123, 2018

      • LordSputnik
        bukwurm: let me check what's going on
      • 2018-04-01 09128, 2018

      • LordSputnik
      • 2018-04-01 09141, 2018

      • ruaok
        iliekcomputers: we should still proceed with the LB dump change -- that will still make this faster.
      • 2018-04-01 09111, 2018

      • ruaok
        but now it dies with an assertion error:
      • 2018-04-01 09112, 2018

      • ruaok
      • 2018-04-01 09133, 2018

      • bukwurm
      • 2018-04-01 09153, 2018

      • LordSputnik
        bukwurm: ok, so the relationship thing is fairly easy - that's part of the new relationship editor stuff and not yet applied on the server
      • 2018-04-01 09159, 2018

      • LordSputnik
        So you'll need to run the migration script
      • 2018-04-01 09107, 2018

      • bukwurm
        LordSputnik: Ok
      • 2018-04-01 09111, 2018

      • LordSputnik
      • 2018-04-01 09117, 2018

      • LordSputnik
        "up.sql" will upgrade you there
      • 2018-04-01 09137, 2018

      • bukwurm
        LordSputnik: Ok
      • 2018-04-01 09101, 2018

      • bukwurm
        LordSputnik: Also, one side question.
      • 2018-04-01 09109, 2018

      • LordSputnik
        And you're right that there's no _editor_entity_visits on the server, that'll need looking into
      • 2018-04-01 09132, 2018

      • bukwurm
        bookbrainz_test is using same schema and sequences as bookbrainz db
      • 2018-04-01 09159, 2018

      • bukwurm
        This is causing "insert into "bookbrainz"."relationship_set" ("id") values ($1) returning "id" - duplicate key value violates unique constraint "relationship_set_pkey""
      • 2018-04-01 09101, 2018

      • D4RK-PH0ENiX has quit
      • 2018-04-01 09127, 2018

      • bukwurm
        Any idea how to fix it? I am not well versed with postgres
      • 2018-04-01 09104, 2018

      • bukwurm
        But I get the notion that sequence of bookbrainz db is pointing to the next value to be inserted with few relations
      • 2018-04-01 09129, 2018

      • bukwurm
        And as the bookbrainz-data relations (tables) are still empty, this error comes up
      • 2018-04-01 09130, 2018

      • LordSputnik
        bukwurm: is that during tests?
      • 2018-04-01 09135, 2018

      • bukwurm
        Yes!
      • 2018-04-01 09139, 2018

      • LordSputnik
        That usually means that the tables haven't been truncated properly
      • 2018-04-01 09102, 2018

      • LordSputnik
        That will only happen when we specify an ID when inserting data, and the ID already exists in the table
      • 2018-04-01 09114, 2018
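
A small illustration of that failure mode, using Python's built-in `sqlite3` as a stand-in for PostgreSQL (an assumption made only to keep the sketch self-contained; BookBrainz itself runs on PostgreSQL, and `insert_fixture` is a hypothetical helper). Inserting an explicit ID into a table that was not emptied after the previous run hits the primary key constraint:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relationship_set (id INTEGER PRIMARY KEY)")


def insert_fixture(conn, row_id):
    """Insert a row with an explicit ID, as a test fixture might."""
    conn.execute("INSERT INTO relationship_set (id) VALUES (?)", (row_id,))


insert_fixture(conn, 1)  # first test run: fine

try:
    insert_fixture(conn, 1)  # second run without cleanup: ID 1 already exists
    collided = False
except sqlite3.IntegrityError:
    collided = True

# SQLite has no TRUNCATE, so DELETE stands in for it here.
conn.execute("DELETE FROM relationship_set")
insert_fixture(conn, 1)  # after cleanup the same explicit ID works again
row_count = conn.execute("SELECT COUNT(*) FROM relationship_set").fetchone()[0]
```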

      • bukwurm
        LordSputnik: Ok
      • 2018-04-01 09119, 2018

      • bukwurm
        Let me check
      • 2018-04-01 09142, 2018

      • LordSputnik
        If you rerun the test when the relationship issue is fixed, the truncation stuff should work properly :)
      • 2018-04-01 09147, 2018

      • bukwurm
        Ok that's written in the err message itself :P
      • 2018-04-01 09104, 2018

      • bukwurm
        LordSputnik: Ok, that's great!
      • 2018-04-01 09108, 2018

      • bukwurm
        On it!
      • 2018-04-01 09109, 2018

      • LordSputnik
        (we should also change the tests so that truncation doesn't break when something else breaks)
      • 2018-04-01 09124, 2018
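
One possible shape for that, sketched in Python with `sqlite3` rather than the project's own test setup (the table, helper names, and the simulated failure are all hypothetical). Putting the cleanup in a `finally` block means the table is emptied even when the test body fails for an unrelated reason:

```python
import sqlite3


def run_test_with_cleanup(conn, test_body):
    """Run test_body, then always empty the table, even if the test raised."""
    try:
        test_body(conn)
    finally:
        # DELETE stands in for TRUNCATE, which SQLite does not have.
        conn.execute("DELETE FROM relationship_set")


def failing_test(conn):
    """Insert a fixture row, then fail for a reason unrelated to the table."""
    conn.execute("INSERT INTO relationship_set (id) VALUES (1)")
    raise RuntimeError("unrelated failure, e.g. a missing column")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE relationship_set (id INTEGER PRIMARY KEY)")

try:
    run_test_with_cleanup(conn, failing_test)
except RuntimeError:
    pass  # the test failure still propagates to the caller

leftover_rows = conn.execute("SELECT COUNT(*) FROM relationship_set").fetchone()[0]
```

Most test frameworks offer an equivalent hook that runs after each test regardless of outcome, which would serve the same purpose.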

      • bukwurm
        Ok
      • 2018-04-01 09100, 2018

      • ruaok
        iliekcomputers: ferbncode: more data: https://gist.github.com/mayhem/6f714ab5d1a312acbd…
      • 2018-04-01 09124, 2018

      • bukwurm
        LordSputnik: I'll look into it too. :D
      • 2018-04-01 09140, 2018

      • iliekcomputers
      • 2018-04-01 09147, 2018

      • github joined the channel
      • 2018-04-01 09147, 2018

      • github
        [listenbrainz-server] paramsingh opened pull request #397: Add username to listen json docs in dumps (production...usernames-in-dumps) https://git.io/vxKNM
      • 2018-04-01 09147, 2018

      • github has left the channel
      • 2018-04-01 09101, 2018

      • bukwurm_
        LordSputnik: Probably should update the schemaSpy diagram on the docs. That's a lifesaver for beginners!
      • 2018-04-01 09153, 2018

      • LordSputnik
        bukwurm_: definitely, that applies to everything on the docs really!
      • 2018-04-01 09155, 2018

      • D4RK-PH0ENiX joined the channel
      • 2018-04-01 09125, 2018

      • UmkaDK joined the channel
      • 2018-04-01 09148, 2018

      • D4RK-PH0ENiX has quit
      • 2018-04-01 09102, 2018

      • D4RK-PH0ENiX joined the channel
      • 2018-04-01 09116, 2018

      • LordSputnik
        bukwurm_: aha! So I think _editor_entity_visits used to exist. Then we restored the database from a dump, and it's not in the dumps (it's personal info), so we lost it
      • 2018-04-01 09129, 2018

      • github joined the channel
      • 2018-04-01 09129, 2018

      • github
        [listenbrainz-server] mayhem closed pull request #397: Add username to listen json docs in dumps (production...usernames-in-dumps) https://git.io/vxKNM
      • 2018-04-01 09129, 2018

      • github has left the channel
      • 2018-04-01 09105, 2018

      • LordSputnik
        bukwurm_: I've re-added it now, but you'll want to reflect that in your own copy by using the last few lines of https://github.com/bookbrainz/bookbrainz-sql/blob…
      • 2018-04-01 09132, 2018

      • bukwurm
        LordSputnik: Ok
      • 2018-04-01 09123, 2018

      • akhilesh_ joined the channel
      • 2018-04-01 09124, 2018

      • bukwurm
        LordSputnik: Does BB-site have functionality to add the explorer achievement?
      • 2018-04-01 09130, 2018

      • LordSputnik
        Yup
      • 2018-04-01 09138, 2018

      • LordSputnik
        I'm not sure why there was no error
      • 2018-04-01 09153, 2018

      • bukwurm
        Weird
      • 2018-04-01 09107, 2018

      • bukwurm
        LordSputnik: Another thing, bb-site uses unminified react
      • 2018-04-01 09130, 2018

      • bukwurm
        Is it by design?
      • 2018-04-01 09158, 2018

      • akhilesh_
        LordSputnik: hey :)
      • 2018-04-01 09117, 2018

      • bukwurm
      • 2018-04-01 09139, 2018

      • ruaok triggers a dump on LB
      • 2018-04-01 09142, 2018

      • travis-ci joined the channel
      • 2018-04-01 09143, 2018

      • travis-ci
        Project bookbrainz-data-js build #753: passed in 3 min 26 sec: https://travis-ci.org/bookbrainz/bookbrainz-data-…
      • 2018-04-01 09143, 2018

      • travis-ci has left the channel
      • 2018-04-01 09107, 2018

      • LordSputnik
        bukwurm: well, it should get minified along with everything else in the build process
      • 2018-04-01 09122, 2018

      • LordSputnik
        I think that's merged now, maybe not updated on the server
      • 2018-04-01 09144, 2018

      • bukwurm_
        LordSputnik: Ok, cool
      • 2018-04-01 09149, 2018

      • UmkaDK has quit
      • 2018-04-01 09115, 2018

      • akhilesh_
        LordSputnik: Did you see the new entity count table on the stats page?
      • 2018-04-01 09139, 2018

      • LordSputnik
        akhilesh_: no, not yet, maybe later today
      • 2018-04-01 09146, 2018

      • LordSputnik
        I've been a bit busy the last couple of days
      • 2018-04-01 09129, 2018

      • bukwurm
      • 2018-04-01 09155, 2018

      • bukwurm
        LordSputnik: The local build is also showing the same. 🤔
      • 2018-04-01 09134, 2018

      • akhilesh_
        LordSputnik: ok, after you've seen it, maybe you can suggest what other tables could be added to the stats page.
      • 2018-04-01 09142, 2018

      • bukwurm
        I assume the production and dev environment run different builds.
      • 2018-04-01 09110, 2018

      • LordSputnik
        bukwurm: maybe. I'd have to check that
      • 2018-04-01 09128, 2018

      • LordSputnik
        akhilesh_: ok, I'll have a think and let you know
      • 2018-04-01 09107, 2018

      • bukwurm
        Meanwhile, I'll check the build process. Never used gulp/grunt, so lots to learn! :)
      • 2018-04-01 09149, 2018

      • LordSputnik
        bukwurm: neither do we, so that's good :D
      • 2018-04-01 09100, 2018

      • LordSputnik
        We use command line scripts called from npm directly
      • 2018-04-01 09109, 2018

      • LordSputnik
        There's no real need for gulp/grunt
      • 2018-04-01 09140, 2018

      • bukwurm_
        Oh
      • 2018-04-01 09157, 2018

      • LordSputnik
        We did used to use gulp, but then simplified things
      • 2018-04-01 09159, 2018

      • bukwurm_
        I saw a grunt file somewhere
      • 2018-04-01 09112, 2018

      • bukwurm_
        Lol it was in the .gitignore
      • 2018-04-01 09115, 2018

      • LordSputnik
        Although if we move to webpack in the near future, I suspect that some of the build process will need to change
      • 2018-04-01 09118, 2018

      • LordSputnik
        Haha!
      • 2018-04-01 09117, 2018

      • bukwurm_
        LordSputnik: Webpack is hot cake right now :P
      • 2018-04-01 09147, 2018

      • UmkaDK joined the channel
      • 2018-04-01 09159, 2018

      • bukwurm_
        I'll open up JIRA for it! :)
      • 2018-04-01 09135, 2018

      • bukwurm
        I think webpack is a GSoC org this year
      • 2018-04-01 09146, 2018

      • iliekcomputers
        ruaok: we modified load_data to use the latest dump structure
      • 2018-04-01 09100, 2018

      • iliekcomputers
        pushed to master
      • 2018-04-01 09107, 2018

      • ruaok
        awesome. I'll run it as soon as I get the dumps.
      • 2018-04-01 09132, 2018

      • ruaok
        making good progress -- now to make a script to train from the loaded RDDs?
      • 2018-04-01 09146, 2018

      • iliekcomputers
        yeah, starting on that now
      • 2018-04-01 09149, 2018

      • ruaok
        I'm going to look at dockerizing a spark setup -- that stuff I was using yesterday was bunk.
      • 2018-04-01 09103, 2018

      • ruaok
        but I think I found a clever way to make it work for a single cluster.