#metabrainz

      • shivam-kapila
        What do we use BigQuery for currently?
      • 2020-05-02 12313, 2020

      • iliekcomputers
        nothing, it's mostly a private archive right now
      • 2020-05-02 12356, 2020

      • shivam-kapila
        Shouldn't we clean it out of the LB repo, or is there some reason it's still there?
      • 2020-05-02 12301, 2020

      • sumedh joined the channel
      • 2020-05-02 12348, 2020

      • jmp_music joined the channel
      • 2020-05-02 12327, 2020

      • iliekcomputers
        yeah, we should remove all the bigquery stats code
      • 2020-05-02 12333, 2020

      • iliekcomputers
        the bigquery-writer still works
      • 2020-05-02 12342, 2020

      • shivam-kapila
        Yes I meant the stats part
      • 2020-05-02 12314, 2020

      • iliekcomputers
        i'll open a ticket, it's a bit non-trivial
      • 2020-05-02 12339, 2020

      • iliekcomputers
        it's at least an hour or two of work
      • 2020-05-02 12344, 2020

      • iliekcomputers
        LB-550
      • 2020-05-02 12345, 2020

      • BrainzBot
        LB-550: Remove bigquery statistics code https://tickets.metabrainz.org/browse/LB-550
      • 2020-05-02 12319, 2020

      • iliekcomputers
        shivam-kapila: i made your changes to the docs PR
      • 2020-05-02 12304, 2020

      • shivam-kapila
        Yeah I had a look. Thanks :). Your comment about the develop.sh makes me inclined to think that we can have a separate file to wrap non docker-compose tasks.
      • 2020-05-02 12304, 2020

      • iliekcomputers
        what tasks are you talking about?
      • 2020-05-02 12313, 2020

      • shivam-kapila
        The manage ones
      • 2020-05-02 12322, 2020

      • shivam-kapila
        ./develop.sh manage
      • 2020-05-02 12325, 2020

      • shivam-kapila
        types
      • 2020-05-02 12326, 2020

      • iliekcomputers
        manage.py is supposed to be that
      • 2020-05-02 12340, 2020

      • iliekcomputers
        we're making a manage script over a manage script
      • 2020-05-02 12343, 2020

      • moufl has quit
      • 2020-05-02 12346, 2020

      • iliekcomputers
        makes no sense
      • 2020-05-02 12354, 2020

      • moufl joined the channel
      • 2020-05-02 12329, 2020

      • shivam-kapila
        Oh yes. Sorry. That was dumb :/
      • 2020-05-02 12355, 2020

      • iliekcomputers
        nah, that's ok. i'm just trying to reduce complexity in the dev environment as much as possible.
      • 2020-05-02 12358, 2020

      • iliekcomputers
        for example
      • 2020-05-02 12315, 2020

      • iliekcomputers
        both develop.sh and spark_develop.sh have a `format_namenode` for some reason
      • 2020-05-02 12346, 2020

      • iliekcomputers
        I don't think we need the `develop.sh npm` either, that automatically happens
      • 2020-05-02 12313, 2020

      • shivam-kapila
        Yes we can remove that
      • 2020-05-02 12346, 2020

      • iliekcomputers
        `psql` I'm okay with, but it's not documented anywhere
      • 2020-05-02 12320, 2020

      • iliekcomputers
        `manage` i'm not so sure about
      • 2020-05-02 12323, 2020

      • shivam-kapila
        > both develop.sh and spark_develop.sh have a `format_namenode` for some reason
      • 2020-05-02 12323, 2020

      • shivam-kapila
        The one in develop.sh can be removed. It's useless. The one in spark_develop.sh accomplishes the same thing
      • 2020-05-02 12341, 2020

      • iliekcomputers
        yes
      • 2020-05-02 12309, 2020

      • iliekcomputers
        spark_develop.sh format namenode used to take a clusterId argument that i had no idea where to find
      • 2020-05-02 12327, 2020

      • iliekcomputers
        removing it still worked
      • 2020-05-02 12333, 2020

      • iliekcomputers shrugs
      • 2020-05-02 12349, 2020

      • shivam-kapila
        Yes it will
      • 2020-05-02 12318, 2020

      • iliekcomputers
        we need to adopt a policy of only adding stuff that's necessary. the dev environment has gotten so complicated that it's easier to develop spark stuff and run it on production data.
      • 2020-05-02 12331, 2020

      • iliekcomputers
        vs just developing locally
      • 2020-05-02 12316, 2020

      • shivam-kapila
        clusterid was used to format the namenode of the same cluster s of datanode
      • 2020-05-02 12321, 2020

      • shivam-kapila
        as*
      • 2020-05-02 12308, 2020

      • shivam-kapila
        Actually, if we don't provide the clusterID and run the format command twice, the link between the datanode and namenode will break and they won't work
      • 2020-05-02 12316, 2020

      • shivam-kapila
        ClusterID was used for this reason
      • 2020-05-02 12336, 2020

      • iliekcomputers
        what is the ID that I'm supposed to pass tho?
      • 2020-05-02 12341, 2020

      • iliekcomputers
        where do I find it?
      • 2020-05-02 12323, 2020

      • iliekcomputers
        it also doesn't error out well enough, I ran it two or three times and then was confused why it wasn't working, only realized that it needs some extra argument
      • 2020-05-02 12316, 2020

      • shivam-kapila
        The clusterID part is somewhere in docs. Give me a sec
      • 2020-05-02 12347, 2020

      • iliekcomputers
        LB-551
      • 2020-05-02 12348, 2020

      • BrainzBot
        LB-551: Simplify the develop.sh scripts https://tickets.metabrainz.org/browse/LB-551
      • 2020-05-02 12317, 2020

      • shivam-kapila
      • 2020-05-02 12329, 2020

      • shivam-kapila
        Here you see the cat command to fetch the ID
      • 2020-05-02 12329, 2020

      • SothoTalKer has quit
      • 2020-05-02 12301, 2020

      • shivam-kapila
        3rd point
      • 2020-05-02 12342, 2020

      • iliekcomputers
        we should remove HACKING.md
      • 2020-05-02 12349, 2020

      • SothoTalKer joined the channel
      • 2020-05-02 12352, 2020

      • iliekcomputers
        all of it should be on listenbrainz.readthedocs
      • 2020-05-02 12349, 2020

      • iliekcomputers
        new devs will not read HACKING.md to find this stuff lol
      • 2020-05-02 12311, 2020

      • shivam-kapila
        Seems like dev env things need a standard that also needs to be documented
      • 2020-05-02 12342, 2020

      • iliekcomputers
        readthedocs is the standard
      • 2020-05-02 12321, 2020

      • iliekcomputers
        for docs at least :D
      • 2020-05-02 12333, 2020

      • iliekcomputers
        LB-552
      • 2020-05-02 12333, 2020

      • BrainzBot
        LB-552: Should not need to format hadoop on first run https://tickets.metabrainz.org/browse/LB-552
      • 2020-05-02 12303, 2020

      • shivam-kapila
        It's not a config problem
      • 2020-05-02 12350, 2020

      • shivam-kapila
        The Hadoop docs say that the namenode needs to be formatted the first time so it can obtain the clusterID of the datanode after the format
      • 2020-05-02 12313, 2020

      • shivam-kapila
        Although they also recommend never formatting the namenode again unless it's really required
      • 2020-05-02 12314, 2020

      • shivam-kapila
        Also, HACKING.md has nothing much that isn't in the docs. The main thing that's different is the format namenode part, which I agree should be in spark-dev-env
      • 2020-05-02 12328, 2020

      • iliekcomputers
        not in spark-dev-env
      • 2020-05-02 12344, 2020

      • iliekcomputers
        the first format should just work (automatically or in a single command)
      • 2020-05-02 12359, 2020

      • shivam-kapila
        automatic looks good
      • 2020-05-02 12319, 2020

      • iliekcomputers
        the clusterId finding shit should be easy to do, if a user runs `./spark-develop.sh format` without a clusterID they should get an error that tells them exactly what to do
      • 2020-05-02 12326, 2020

      • iliekcomputers
        where to find the ID and how to pass it
      • 2020-05-02 12352, 2020
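A minimal sketch of the kind of error described here, written in Python for illustration only (the real scripts are shell). It assumes the cluster ID is read from a Hadoop `VERSION` properties file on the datanode; the path constant and both function names are hypothetical.

```python
from typing import Optional

# Hypothetical location; the real path depends on dfs.datanode.data.dir.
ASSUMED_VERSION_PATH = "/path/to/datanode/current/VERSION"


def read_cluster_id(version_text: str) -> Optional[str]:
    """Return the clusterID value from the text of a Hadoop VERSION file."""
    for line in version_text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key == "clusterID" and value:
            return value
    return None


def missing_cluster_id_message(version_path: str = ASSUMED_VERSION_PATH) -> str:
    """Build an error that says where to find the ID and how to pass it."""
    return (
        "format: missing cluster ID.\n"
        f"Find it with: grep clusterID {version_path}\n"
        "Then re-run the format command with that ID as its argument."
    )
```

The message names both the lookup command and the next step, so a user who hits it does not need to search the docs.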

      • iliekcomputers
        the dev-env docs should be as simple as possible with the minimum number of steps possible
      • 2020-05-02 12323, 2020

      • iliekcomputers
        otherwise, it's just a pita to read them and run so many steps
      • 2020-05-02 12306, 2020

      • shivam-kapila
        Right now, if they don't pass the clusterID, I guess the format will still work. But if they run it multiple times, the link breakage will occur. ClusterID is an optional param for the format command
      • 2020-05-02 12334, 2020

      • iliekcomputers
        shivam-kapila: that's not the case in current master
      • 2020-05-02 12349, 2020

      • iliekcomputers
        if you run ./spark-develop.sh format on master, it won't work
      • 2020-05-02 12304, 2020

      • iliekcomputers
        i fixed it in the PR by removing the clusterID argument
      • 2020-05-02 12325, 2020

      • iliekcomputers
        is there a usecase for formatting hadoop after the first format?
      • 2020-05-02 12340, 2020

      • shivam-kapila
        We mostly don't need it.
      • 2020-05-02 12344, 2020

      • iliekcomputers
        i'm wondering if we can just remove the format command entirely, just run it the first time.
      • 2020-05-02 12356, 2020

      • iliekcomputers
        if someone needs to format their data, they should use hdfs -rm
      • 2020-05-02 12325, 2020

      • shivam-kapila
        We can remove that
      • 2020-05-02 12349, 2020

      • shivam-kapila
        Then your current fix, as you pushed it into the PR, will work
      • 2020-05-02 12311, 2020

      • shivam-kapila
        A note can be added that *Please run this command only once*
      • 2020-05-02 12317, 2020

      • iliekcomputers
        ideally it would run automatically, and we can remove the entire step from the doc
      • 2020-05-02 12330, 2020

      • iliekcomputers
      • 2020-05-02 12331, 2020

      • BrainzBot
        LB-553: Improve developer environment and docs
      • 2020-05-02 12333, 2020

      • iliekcomputers
        opened an epic
      • 2020-05-02 12339, 2020

      • shivam-kapila
        Also for multiple formats we can add the HACKING.md into FAQs replacing `./spark_develop.sh format` with `docker-compose ...........`
      • 2020-05-02 12332, 2020

      • shivam-kapila
        So you mean to add the format without clusterID into `./spark_develop.sh build`?
      • 2020-05-02 12355, 2020

      • shivam-kapila
        So it happens itself?
      • 2020-05-02 12337, 2020

      • iliekcomputers
        wouldn't work with build
      • 2020-05-02 12356, 2020

      • iliekcomputers
        i'll have to think about how to do it specifically, but that's an implementation detail
      • 2020-05-02 12325, 2020

      • shivam-kapila
        That's why I asked. If we build multiple times then the format will occur multiple times and the link between nodes will break
      • 2020-05-02 12323, 2020

      • iliekcomputers
        yep
      • 2020-05-02 12332, 2020
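One way the "format automatically, but only once" idea could be guarded, sketched in Python for illustration. It relies on a formatted namenode directory containing `current/VERSION`; the function name and the directory argument are hypothetical, and this is not taken from the ListenBrainz scripts.

```python
import os


def namenode_needs_format(name_dir: str) -> bool:
    """True when name_dir has no current/VERSION file, i.e. was never formatted."""
    return not os.path.isfile(os.path.join(name_dir, "current", "VERSION"))
```

A startup step could call this check and run the format only when it returns `True`, so repeated builds would not reformat an existing namenode.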

      • iliekcomputers
        the ticket about removing the command entirely (or removing it with hdfs -rm): https://tickets.metabrainz.org/browse/LB-552
      • 2020-05-02 12333, 2020

      • BrainzBot
        LB-552: Should not need to format hadoop on first run
      • 2020-05-02 12350, 2020

      • iliekcomputers
        wait
      • 2020-05-02 12351, 2020

      • iliekcomputers
      • 2020-05-02 12351, 2020

      • BrainzBot
        LB-555: Improve the spark-develop.sh format command
      • 2020-05-02 12339, 2020

      • iliekcomputers
        the epic has a nice list of tasks now, so at least we have a way forward now :)
      • 2020-05-02 12343, 2020

      • iliekcomputers
        thanks shivam-kapila
      • 2020-05-02 12336, 2020

      • iliekcomputers
      • 2020-05-02 12341, 2020

      • iliekcomputers
        this part shouldn't be in the setting up doc
      • 2020-05-02 12306, 2020

      • shivam-kapila
        Oh yes.
      • 2020-05-02 12357, 2020

      • djwhitey joined the channel
      • 2020-05-02 12301, 2020

      • iliekcomputers
        done in current PR
      • 2020-05-02 12312, 2020

      • shivam-kapila
        iliekcomputers: I added a comment about a mistake I made in my docs PR. I just noticed it on readthedocs. Can you plz resolve that too?
      • 2020-05-02 12359, 2020

      • iliekcomputers
        yes.
      • 2020-05-02 12306, 2020

      • moufl has quit
      • 2020-05-02 12352, 2020

      • moufl joined the channel
      • 2020-05-02 12303, 2020

      • shivam-kapila
        About LB-549: Doesn't that happen because the spotify reader fetches the latest 50 listens again after deleting all listens? P.S. Not urgent. This just caught my eye
      • 2020-05-02 12304, 2020

      • BrainzBot
        LB-549: Listen Deletion Doesn't Work https://tickets.metabrainz.org/browse/LB-549
      • 2020-05-02 12307, 2020

      • iliekcomputers
        yeah, that makes sense
      • 2020-05-02 12326, 2020

      • iliekcomputers
        i'll reply to the jira saying that they should disconnect their spotify
      • 2020-05-02 12334, 2020

      • iliekcomputers
        shivam-kapila: added the dump docs change
      • 2020-05-02 12326, 2020

      • shivam-kapila
        Thanks :)
      • 2020-05-02 12359, 2020

      • shivam-kapila
        We also need to reset user stats after we delete the listens. I will open a ticket for that
      • 2020-05-02 12302, 2020

      • iliekcomputers
        shivam-kapila: yes please, thanks
      • 2020-05-02 12308, 2020

      • iliekcomputers
        put them in the statistics epic
      • 2020-05-02 12313, 2020

      • iliekcomputers
      • 2020-05-02 12314, 2020

      • BrainzBot
        LB-534: In case of error 504 the API returns HTML instead of JSON
      • 2020-05-02 12326, 2020

      • iliekcomputers
        I'm not sure if there's any way around this
      • 2020-05-02 12305, 2020

      • iliekcomputers
      • 2020-05-02 12305, 2020

      • BrainzBot
        LB-533: 400 Bad Request: Incorrect timestamp argument max_ts: null when browsing listens
      • 2020-05-02 12355, 2020

      • alastairp
        iliekcomputers: not any way around what?
      • 2020-05-02 12317, 2020

      • alastairp
        535 or 533?
      • 2020-05-02 12325, 2020

      • shivam-kapila
        LB-533: Oh, I had fixed that. I was reading up on frontend tests to add a test for that
      • 2020-05-02 12327, 2020

      • iliekcomputers
        533
      • 2020-05-02 12334, 2020

      • iliekcomputers
        no sorry, 534
      • 2020-05-02 12342, 2020

      • iliekcomputers
        the 504 comes from the gateways, not LB
      • 2020-05-02 12342, 2020

      • alastairp
        534, we do that in AB
      • 2020-05-02 12350, 2020

      • iliekcomputers
        oh
      • 2020-05-02 12351, 2020

      • alastairp
        oh, right. 504
      • 2020-05-02 12354, 2020

      • alastairp
        sorry, you're right
      • 2020-05-02 12332, 2020

      • iliekcomputers
        shivam-kapila: can you just pull request the fix, i'll look into adding a test
      • 2020-05-02 12355, 2020

      • shivam-kapila
        Ok I will do that. I will rebase it over master first
      • 2020-05-02 12356, 2020

      • alastairp
        and it's true that for other 5xx errors we probably do return JSON, so we can't just say "don't bother parsing a 5xx error"
      • 2020-05-02 12318, 2020
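Since a client cannot simply skip parsing every 5xx body, one option is to check the response's Content-Type before decoding. A minimal Python sketch; the function name and the shape of the fallback dictionary are assumptions, not the ListenBrainz API.

```python
import json
from typing import Any, Dict


def parse_error_body(status: int, content_type: str, body: str) -> Dict[str, Any]:
    """Decode a JSON error body if there is one, else build a generic error."""
    if content_type.split(";")[0].strip().lower() == "application/json":
        try:
            parsed = json.loads(body)
            if isinstance(parsed, dict):
                return parsed
        except json.JSONDecodeError:
            pass  # Header claimed JSON but the body was not; fall through.
    # e.g. an HTML 504 page produced by a gateway rather than the application
    return {"code": status, "error": "non-JSON error response"}
```

This keeps JSON errors from the application intact while still giving callers a predictable structure when a gateway answers with HTML.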

      • iliekcomputers
        alastairp: yeah.
      • 2020-05-02 12330, 2020

      • alastairp
        but to do it from the frontend, that fe would have to inspect the path, and return a different error message depending on the value of the path
      • 2020-05-02 12341, 2020

      • alastairp
        I'd definitely raise it as a sysadmin ticket to see if there's anything we can do about it
      • 2020-05-02 12336, 2020

      • shivam-kapila
        iliekcomputers: LB-556
      • 2020-05-02 12336, 2020

      • BrainzBot
        LB-556: Reset user stats when a user deletes their listens https://tickets.metabrainz.org/browse/LB-556
      • 2020-05-02 12322, 2020

      • iliekcomputers
        MBH-538
      • 2020-05-02 12323, 2020

      • BrainzBot
        MBH-538: Can we return JSON for ListenBrainz API Gateway Timeout requests? https://tickets.metabrainz.org/browse/MBH-538
      • 2020-05-02 12341, 2020

      • iliekcomputers
        shivam-kapila: thanks!
      • 2020-05-02 12341, 2020

      • alastairp
        iliekcomputers: nice
      • 2020-05-02 12357, 2020

      • moufl has quit
      • 2020-05-02 12349, 2020

      • moufl joined the channel