#metabrainz

      • chaban has quit
      • chaban joined the channel
      • ayerhart joined the channel
      • D4RK-PH0ENiX has quit
      • Nyanko-sensei joined the channel
      • Lotheric_ joined the channel
      • Lotheric has quit
      • Lotheric_ is now known as Lotheric
      • travis-ci joined the channel
      • travis-ci
        Project bookbrainz-data-js build #1181: passed in 1 min 49 sec: https://travis-ci.org/bookbrainz/bookbrainz-dat...
      • travis-ci has left the channel
      • Darkloke joined the channel
      • Darkloke
        Hi all. My question is a little off-topic, but I am asking here because there are developers here and maybe you could give some advice to a noob like me. I am trying to make a java script which could parse the GEMA PRO database and extract track names with details. Please PM me if you are interested.
      • pristine__
        ruaok: moin. Can we create a tunnel to the worker nodes? Stderr and stdout for the workers are visible on port 8081.
      • ruaok
        moin!
      • visible on port 8081 inside the container?
      • pristine__
        We have created a tunnel for port 4040 running in listenbrainz jobs on leader. This is the Spark UI. Now in this UI there are tabs (stderr, stdout) for every executor to view executor logs. When I click on one of these, a window opens and something like *10.x.x.5x:8081.xxxxx* appears in the address bar, but nothing is displayed.
      • So I guess we need a tunnel for 8081.
      • ruaok
        ok, yes that means a tunnel into a different machine and then into the container. hmm. I'll have to ponder this.
      • pristine__
        Yeah. We just want executor logs to debug. Now, by default Spark logs to spark_home/logs. On leader, spark/logs is empty. Idk why. I have spent days on this but idk.
      • By default workers store stderr and stdout in spark_home/work
      • Can you log in to one of the workers and check /usr/local/spark/work
      • ?
      • ruaok
        that might be easier.
      • any worker?
      • pristine__
        I have sent a request to join the Spark users list, so I can ask about the empty logs there. The UI serves the same purpose, but it will stop once the job stops.
      • Yeah. Any worker.
      • But I still say that we should try for the workers UI. It is easier to comprehend. The stored logs (which must be cleaned regularly) can help us to see error history even if the job has ended. Whilst the job is running, the UI can be great.
      • ruaok
        let's see if the files are useful first.
      • on leader in /home/vansika is worker-logs.zip
      • pristine__
        A sec
      • ruaok: can you go to /usr/local/spark and send me a screenshot
      • And then /usr/local/spark/work and a screenshot
      • ruaok
        you have all of the contents of the work directory in the zip file. did the zip file not work?
      • pristine__
        No, it worked. I just want to check something
      • ruaok
      • pristine__
        Cool. What about the app I ran yesterday? Are its logs stored on other workers? We don't have logs from the previous week/month. Are they cleaned up automatically?
      • So many questions. Lol
      • ruaok
        I presume the other workers have similar files. and I would expect them to get cleaned up after x days, which is likely configurable in some config file.
      • is the output useful?
      • pristine__
        Loads of it. I will go through it and get back to you
      • Can we discuss a lil about cleaning up models?
      • (thanks for the zip)
      • ruaok
        sure.
      • pristine__
        So I was thinking, whenever we save a model (which will most probably be months apart) should we clean up the previous one(s)?
      • ruaok
        I wonder what our thinking here should be.
      • clean up by default and mark others as "saved, in use"?
      • or manually clean up and only carefully delete items.
      • or maybe, just keep the latest one? I guess the question is how we specify which model to use for recommendations.
      • pristine__
        The latest one
      • I guess
      • ruaok
        perhaps, this question is premature.
      • pristine__
        If we don't delete, we may run out of space in time.
      • ruaok
        it is a good question, but we're not fully certain how we're going to use data yet.
      • we will.
      • how about we do something simple to start with and simply keep the X latest models, but delete everything else?
      • pristine__
        Cool. So maybe until we are sure about it, I will manually delete all the models created while testing.
      • Yes. Sounds good.
      • ruaok
        ok
      • pristine__
        What should X be?
      • ruaok
        7?
      • pristine__
        Ummm. It actually depends on how much data we are using for training. Like for around 6 months, consider one gb. 1*7 = 7gb
      • 7*3 = 21gb
      • After replication.
      • Also, we should have a json or parquet for storing metadata about models (which data it was trained on, when it was trained, size, etc.)
      • Maybe 4 to start with.
      • ruaok
        :)
      • pristine__
        I was just thinking, why would we require folder models? Could not get to an answer?
      • Older*
      • ruaok
        we may find a model that works well and put it into production.
      • but at the same time we will want to continue evolving models.
      • pristine__
        Oh. Right.
      • ruaok
        we need to keep at least one around for production. possibly keep more around for various production scenarios.
      • pristine__
        This project is wow. Everything has to be done from scratch, so much brainstorming. Yay! Thank you <3
      • ruaok
        I know the feeling. part of it is exciting, part of it is tiring. but it has been tiring for 20 years, so I am used to it.
      • pristine__
        I have never done something like this before. I like it.
      • Better than going to college
      • Lol.
      • I am on bunk today😎
      • CatQuest
      • (esp the alt text)
      • ruaok
        yes, indeed.
      • if people make stupid requests, they get stupid macros. lol
      • pristine__
        ruaok: one consumer was running on the worker from where you got the zip?
      • Container*
      • (stupid autocorrect)
      • CatQuest
        "statistics in the last 30 days until today" <-- thats the one I'd want
      • I mean like asking for statistics for "a month" is useful for looking at last year's "what did I listen to in january"
      • but for "the last month" nothing special has changed between 1st of july and 31st of june
      • BestSteve has quit
      • Gazooo joined the channel
      • BestSteve joined the channel
      • ruaok
        pristine__: I didn't carefully check to see, sorry.
      • are the logs useful. do we need to find a way to get them to you?
      • pristine__
        Should I get back to you at night? I have a test at 3:30, so I have not looked at them yet. Each stderr is too big and there are around 8 of them.
      • Around 9 IST*
      • Check the containers whenever you can :)
      • BestSteve has quit
      • ruaok
        ok, good luck on the test.
      • what should I check in the containers?
      • BestSteve joined the channel
      • yvanzo
        Mr_Monkey: there currently are 10 open SEC tickets related to BB, see https://tickets.metabrainz.org/issues/?filter=1...
      • ruaok
        ohhh, a spanking by the security czar. bad news.
      • yvanzo
        Would it be alright to archive (making read-only) abandoned repositories such as https://github.com/metabrainz/xmpp-messaging-se... ?
      • ruaok
        yes, please.
      • yvanzo
        Alright, it can be unarchived at any time.
      • ruaok
        spellew: ping
      • Mr_Monkey
        Thanks yvanzo ! I was away for a bit and they piled up. I'll look at them.
      • Nyanko-sensei has quit
      • yvanzo
        There might be false positives, e.g. 80% of alerts on MBS were not applicable.
      • ruaok: the proper way to address SEC-40 is to go to https://github.com/metabrainz/listenbrainz-serv... and to dismiss as “A fix has already been started”
      • BrainzBot
        SEC-40: [listenbrainz-server] CVE-2019-10744: lodash < 4.17.13 https://tickets.metabrainz.org/browse/SEC-40
      • ruaok
        ah, thanks.
      • done
      • yvanzo
        yup, now closed :)
      • D4RK-PH0ENiX joined the channel
      • bitmap: I guess design-system alerts can all be dismissed as “Vulnerable code is not actually used”
      • D4RK-PH0ENiX has quit
      • D4RK-PH0ENiX joined the channel
      • spellew
        ruaok: o/
      • If this is about my passport expiring soon, I was planning on getting it renewed
      • ruaok
        no, the prices for the flight keep changing. and not for the better. :(
      • but, do get the passport renewed, don't wait.
      • this one has a one night stop in DUB, but with a 91€ hotel, it still is 110€ cheaper than the shorter flight without a stop
      • could also be done leaving sunday evening.
      • zas
        ruaok: can you have a look at Lemmy's disk space ?
      • ruaok
        bad again?
      • stable where it's at, no?
      • meaning not good, but stable.
      • iliekcomputers: you around?
      • aidanlw17
        hi alastairp, the first PR is all ready for you to review!
      • alastairp
        great, I'm doing errands this morning but will look in a few hours
      • ruaok
        zas: do you know what is normally stored in /var/lib/docker/aufs ?
      • aidanlw17
        No problem!
      • yvanzo
        hi zas: can you confirm SEC-32 is harmless?
      • BrainzBot
        SEC-32: [picard-website] CVE-2019-10744: lodash.merge < 4.6.2 https://tickets.metabrainz.org/browse/SEC-32
      • yvanzo
        (can be done by reviewing and dismissing https://github.com/metabrainz/picard-website/ne... )
      • spellew
        ruaok: It's fine
      • ruaok
        leave sunday or tuesday?
      • spellew
        Tuesday?
      • I'd prefer that, if it works for you
      • ruaok
        ok, stick around, let's buy this.
      • spellew
        Ok
      • ruaok
        apparently booking this ticket is hard. :(
      • ok, I'm finding a much nicer flight combo, but on two separate tickets. which can be dicey, but you'll have one fewer stop to make.
      • I think you need to move, spellew.
      • might be easier. 🤣
      • what other options are there for you to get to NYC?