#metabrainz

      • D4RK-PH0ENiX has quit
      • 2019-01-15 01557, 2019

      • D4RK-PH0ENiX joined the channel
      • 2019-01-15 01543, 2019

      • D4RK-PH0ENiX has quit
      • 2019-01-15 01521, 2019

      • D4RK-PH0ENiX joined the channel
      • 2019-01-15 01500, 2019

      • dseomn has quit
      • 2019-01-15 01524, 2019

      • Gazooo joined the channel
      • 2019-01-15 01505, 2019

      • Leo_Verto_ joined the channel
      • 2019-01-15 01542, 2019

      • Leo_Verto has quit
      • 2019-01-15 01542, 2019

      • Leo_Verto_ is now known as Leo_Verto
      • 2019-01-15 01526, 2019

      • akhilesh joined the channel
      • 2019-01-15 01554, 2019

      • G0re joined the channel
      • 2019-01-15 01527, 2019

      • Gore|woerk has quit
      • 2019-01-15 01527, 2019

      • iliekcomputers
        Morning!
      • 2019-01-15 01544, 2019

      • iliekcomputers
        Today's a good day to have a good day.
      • 2019-01-15 01553, 2019

      • yvanzo
        mo'in'
      • 2019-01-15 01507, 2019

      • Zialus_PT has quit
      • 2019-01-15 01514, 2019

      • Zialus joined the channel
      • 2019-01-15 01526, 2019

      • outsidecontext joined the channel
      • 2019-01-15 01526, 2019

      • Zialus has quit
      • 2019-01-15 01536, 2019

      • Zialus joined the channel
      • 2019-01-15 01554, 2019

      • akhilesh has quit
      • 2019-01-15 01523, 2019

      • BestSteve has quit
      • 2019-01-15 01554, 2019

      • BestSteve joined the channel
      • 2019-01-15 01526, 2019

      • michelv joined the channel
      • 2019-01-15 01500, 2019

      • akhilesh joined the channel
      • 2019-01-15 01539, 2019

      • madmouser1 joined the channel
      • 2019-01-15 01547, 2019

      • madmouser1_ has quit
      • 2019-01-15 01509, 2019

      • outsidecontext has quit
      • 2019-01-15 01524, 2019

      • outsidecontext joined the channel
      • 2019-01-15 01558, 2019

      • iliekcomputers
        woohoo, file reading is okay. :)
      • 2019-01-15 01523, 2019

      • iliekcomputers
        >File loaded in 0.93 seconds
      • 2019-01-15 01547, 2019

      • iliekcomputers
        >File processed in 68.25 seconds
      • 2019-01-15 01558, 2019

      • iliekcomputers
        >Listens / sec = 27870.46
      • 2019-01-15 01537, 2019
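
The three figures quoted above can be cross-checked against each other. The script that printed them is not shown, so the following assumes the rate was computed as listens divided by processing time; under that assumption, a minimal sketch of the implied size of the file:

```python
# Figures quoted in the log above.
processing_seconds = 68.25
listens_per_second = 27870.46

# Assumption (not confirmed by the log): rate = listens / processing time,
# so the number of listens in the file is rate * time.
listens_in_file = listens_per_second * processing_seconds

print(f"~{listens_in_file / 1e6:.2f} million listens in the file")
```

If the rate was instead measured over load plus processing time, the implied count would be slightly higher.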

      • lks has quit
      • 2019-01-15 01526, 2019

      • adhawkins has quit
      • 2019-01-15 01511, 2019

      • adhawkins joined the channel
      • 2019-01-15 01532, 2019

      • adhawkins has quit
      • 2019-01-15 01531, 2019

      • adhawkins joined the channel
      • 2019-01-15 01522, 2019

      • adhawkins has quit
      • 2019-01-15 01506, 2019

      • adhawkins joined the channel
      • 2019-01-15 01515, 2019

      • ruaok
        moooin!
      • 2019-01-15 01526, 2019

      • ruaok
        27k listens a sec, hmmm, ok.
      • 2019-01-15 01531, 2019

      • ruaok
        did the import ever finish?
      • 2019-01-15 01551, 2019

      • _switch_ has quit
      • 2019-01-15 01556, 2019

      • iliekcomputers
        ruaok: moin!
      • 2019-01-15 01500, 2019

      • _switch_ joined the channel
      • 2019-01-15 01520, 2019

      • iliekcomputers
        it wrote a bunch of listens and died around 2018/june for some reason.
      • 2019-01-15 01559, 2019

      • iliekcomputers
        running another one with more time and logging.
      • 2019-01-15 01514, 2019

      • ruaok
        but it started with 200x?
      • 2019-01-15 01540, 2019

      • iliekcomputers
        yeah, it wrote dataframes up until 2017 (starting from 2002)
      • 2019-01-15 01555, 2019

      • iliekcomputers
        i checked them and they were valid.
      • 2019-01-15 01502, 2019

      • ruaok
        promising!
      • 2019-01-15 01518, 2019

      • jwf has quit
      • 2019-01-15 01508, 2019

      • jwf joined the channel
      • 2019-01-15 01501, 2019

      • Gazooo has quit
      • 2019-01-15 01545, 2019

      • D4RK-PH0ENiX has quit
      • 2019-01-15 01556, 2019

      • D4RK-PH0ENiX joined the channel
      • 2019-01-15 01538, 2019

      • CatQuest
        hey guys, how old is picard?
      • 2019-01-15 01554, 2019

      • CatQuest
        like we should celebrate 10 yr anniversary soon maybe
      • 2019-01-15 01547, 2019

      • D4RK-PH0ENiX has quit
      • 2019-01-15 01559, 2019

      • amCap1712 joined the channel
      • 2019-01-15 01553, 2019

      • amCap1712
        yvanzo: I have made substantial changes to the draft. I have included a more detailed workflow and elaborated on my deadlines. I still have more changes to make, but before that, please review it and tell me whether I am on the right track. Here's the link: https://community.metabrainz.org/t/gsoc-2019-brin…
      • 2019-01-15 01537, 2019

      • ruaok
        CatQuest: more than 10 years.
      • 2019-01-15 01556, 2019

      • ruaok
        picard was started when I was living in a place where I lived 2003 - 2007.
      • 2019-01-15 01542, 2019

      • D4RK-PH0ENiX joined the channel
      • 2019-01-15 01531, 2019

      • iliekcomputers
        amCap1712: at this point, I personally would work on some code stuff, gsoc is still a long way away, you could probably make some major contributions before gsoc actually starts.
      • 2019-01-15 01553, 2019

      • iliekcomputers
        ruaok: ok to work on spark stuff today?
      • 2019-01-15 01500, 2019

      • CatQuest
        oh no
      • 2019-01-15 01532, 2019

      • CatQuest
        anyway we should look at the ancientest commit messages and see when it was. then do some sort of celebration for 15 years of picard or the like
      • 2019-01-15 01554, 2019

      • CatQuest
        (depending on when it was it might have to wait until 20 years)
      • 2019-01-15 01554, 2019

      • CatQuest
        but i want to do something
      • 2019-01-15 01556, 2019

      • CatQuest
        it's so cool
      • 2019-01-15 01509, 2019

      • CatQuest
        i remember when picard was still a pipedream :D
      • 2019-01-15 01531, 2019

      • CatQuest
        any excuse to party right?
      • 2019-01-15 01538, 2019

      • CatQuest
        gelato or cake and a banner on the blog :D
      • 2019-01-15 01545, 2019

      • CatQuest
        chavi could design it
      • 2019-01-15 01547, 2019

      • CatQuest
        or m idk
      • 2019-01-15 01557, 2019

      • CatQuest
        sorry chhavi
      • 2019-01-15 01519, 2019

      • ruaok
        iliekcomputers: yes, ready.
      • 2019-01-15 01524, 2019

      • ruaok
        sorry for the late start.
      • 2019-01-15 01501, 2019

      • iliekcomputers
        No worries at all. :)
      • 2019-01-15 01507, 2019

      • ruaok
        on my radar: line by line code review, volume/job support
      • 2019-01-15 01514, 2019

      • iliekcomputers
        Yes, cool.
      • 2019-01-15 01517, 2019

      • ruaok
        where should we start?
      • 2019-01-15 01526, 2019

      • ruaok
        do you have some output from the most recent script to share?
      • 2019-01-15 01531, 2019

      • iliekcomputers
        I added times, let me pull up some stats on average.
      • 2019-01-15 01550, 2019

      • ruaok
        raw dump for me please.
      • 2019-01-15 01554, 2019

      • ruaok
        or raw output.
      • 2019-01-15 01504, 2019

      • ruaok
        "don't whitewash it jim, I can take it!"
      • 2019-01-15 01504, 2019

      • amCap1712
        iliekcomputers: surely I will.
      • 2019-01-15 01526, 2019

      • CatQuest
        good luck on your gsoc proposal amCap1712
      • 2019-01-15 01537, 2019

      • iliekcomputers
        It contains a lot of spark info.
      • 2019-01-15 01539, 2019

      • amCap1712
        CatQuest: thanks
      • 2019-01-15 01541, 2019

      • iliekcomputers
        One sec
      • 2019-01-15 01512, 2019

      • iliekcomputers
        so umm, it started writing the dataframes and the backlog is gone because of spark logs...
      • 2019-01-15 01525, 2019

      • ruaok
        ok, whatever you got.
      • 2019-01-15 01541, 2019

      • iliekcomputers
        anyways, i was monitoring it continuously and it was taking around 70s on average for each listen file.
      • 2019-01-15 01553, 2019

      • iliekcomputers
        this includes the upload to hdfs etc too
      • 2019-01-15 01509, 2019

      • iliekcomputers
      • 2019-01-15 01532, 2019

      • iliekcomputers
        getting to (https://github.com/metabrainz/listenbrainz-recomm…) doesn't take much time, around 0.1s on average.
      • 2019-01-15 01544, 2019

      • iliekcomputers
        but that is probably because spark lazy loads stuff.
      • 2019-01-15 01510, 2019

      • iliekcomputers
        listens/sec here (https://github.com/metabrainz/listenbrainz-recomm…) were around 40k
      • 2019-01-15 01550, 2019

      • iliekcomputers
        spark is very chatty, i think we might benefit from disabling INFO logs and running once.
      • 2019-01-15 01559, 2019

      • ruaok
        the first two links are the same.
      • 2019-01-15 01531, 2019

      • ruaok
        was that intentional?
      • 2019-01-15 01545, 2019

      • iliekcomputers
        hmm, sorry i pasted it without the message initially.
      • 2019-01-15 01557, 2019

      • iliekcomputers
        i added timings for how much time it takes to write a dataframe but all that is lost
      • 2019-01-15 01505, 2019

      • iliekcomputers
      • 2019-01-15 01521, 2019

      • ruaok
        ok, let's do that. we need to have more sane logs.
      • 2019-01-15 01534, 2019

      • ruaok
        do you have a job running right now?
      • 2019-01-15 01538, 2019

      • iliekcomputers
        yeah.
      • 2019-01-15 01540, 2019

      • ruaok
        have you estimated how long it will take to complete?
      • 2019-01-15 01526, 2019

      • iliekcomputers
        it's done all the processing and is just writing the dataframes now
      • 2019-01-15 01530, 2019

      • iliekcomputers
        done till 2012
      • 2019-01-15 01537, 2019

      • ruaok
        how many hours so far?
      • 2019-01-15 01551, 2019

      • ruaok
        should we thread this script?
      • 2019-01-15 01553, 2019

      • iliekcomputers
        2.9 h
      • 2019-01-15 01530, 2019

      • ruaok
        then we can pick the number of threads to run concurrently.
      • 2019-01-15 01543, 2019

      • outsidecontext has quit
      • 2019-01-15 01551, 2019

      • ruaok
        or we have one thread do the processing and one thread do the writing of dataframes.
      • 2019-01-15 01507, 2019
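
The second idea, one thread processing and one thread writing, is a producer/consumer arrangement. The sketch below uses only the Python standard library; `process_file`, `write_dataframe` and `run_pipeline` are hypothetical stand-ins, not functions from the actual import script:

```python
import queue
import threading


def process_file(name):
    # Hypothetical stand-in for the per-file processing step.
    return f"dataframe-for-{name}"


def write_dataframe(df, written):
    # Hypothetical stand-in for writing a dataframe to storage.
    written.append(df)


def run_pipeline(files):
    # Bounded queue so processing cannot run far ahead of writing.
    handoff = queue.Queue(maxsize=2)
    written = []
    done = object()  # sentinel that tells the writer to stop

    def producer():
        for name in files:
            handoff.put(process_file(name))
        handoff.put(done)

    def consumer():
        while True:
            item = handoff.get()
            if item is done:
                break
            write_dataframe(item, written)

    threads = [
        threading.Thread(target=producer),
        threading.Thread(target=consumer),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return written


print(run_pipeline(["2002", "2003", "2004"]))
```

Threads in CPython only overlap usefully when the work is I/O-bound or happens outside the interpreter, so whether this helps here depends on where the time is actually spent.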

      • outsidecontext joined the channel
      • 2019-01-15 01510, 2019

      • ruaok
        let's tune down the log messages.
      • 2019-01-15 01529, 2019

      • ruaok
        then let's do a run over a few files to gather stats. then we can follow the code and see how long things take.
      • 2019-01-15 01536, 2019

      • ruaok
        then we can determine how to improve it.
      • 2019-01-15 01544, 2019
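
One way to gather the per-stage stats described above is a small timing helper that keeps the numbers in memory, so they survive even when the console is flooded with log output. This is a sketch with hypothetical names (`timed`, `summarize`, the stage labels); the `time.sleep` calls only stand in for real work:

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Stage name -> list of elapsed times in seconds.
timings = defaultdict(list)


@contextmanager
def timed(stage):
    # Record how long the wrapped block took, even if it raises.
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage].append(time.perf_counter() - start)


def summarize(collected):
    # Per-stage run count, total and mean duration.
    return {
        stage: {
            "runs": len(values),
            "total_s": sum(values),
            "mean_s": sum(values) / len(values),
        }
        for stage, values in collected.items()
    }


# Usage: wrap each stage of the per-file loop.
for _ in range(3):
    with timed("load"):
        time.sleep(0.01)  # placeholder for loading a file
    with timed("write"):
        time.sleep(0.01)  # placeholder for writing a dataframe

for stage, stats in summarize(timings).items():
    print(stage, stats)
```

Because Spark evaluates transformations lazily, a timer around a transformation alone measures little more than plan construction; the cost shows up in whichever stage triggers an action, which matches the 0.1s observation earlier in the log.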

      • iliekcomputers
        ok, I'll stop the code now.
      • 2019-01-15 01505, 2019

      • iliekcomputers
        make changes to run over ~20 files (the entire dump has ~100)
      • 2019-01-15 01511, 2019

      • iliekcomputers
        ?
      • 2019-01-15 01519, 2019

      • ruaok
        make it 10.
      • 2019-01-15 01539, 2019

      • iliekcomputers
        yokay. cool.
      • 2019-01-15 01524, 2019
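
For a rough idea of how long the 10-file test run should take, the ~70s per-file average quoted earlier can be scaled up. This assumes the average holds and that files are handled one after another; `estimated_minutes` is a hypothetical helper:

```python
def estimated_minutes(n_files, seconds_per_file=70):
    # Sequential estimate: each file costs the same average time.
    return n_files * seconds_per_file / 60


print(f"10 files:  ~{estimated_minutes(10):.1f} min")
print(f"100 files: ~{estimated_minutes(100) / 60:.1f} h")
```

The estimate covers only the per-file stage that the 70s figure was measured on, not the later dataframe writing.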

      • ruaok
        iliekcomputers: did you do something on lemmy that causes a lot of disk usage?
      • 2019-01-15 01558, 2019

      • iliekcomputers
        haven't touched lemmy in a while.
      • 2019-01-15 01548, 2019

      • ruaok
        there was a massive spike in disk usage. got to 96%.
      • 2019-01-15 01551, 2019

      • ruaok
        zas? you?
      • 2019-01-15 01503, 2019

      • iliekcomputers
        when?
      • 2019-01-15 01522, 2019

      • iliekcomputers
        new dumps were created today...
      • 2019-01-15 01522, 2019

      • ruaok
        oh duh.
      • 2019-01-15 01525, 2019

      • ruaok
        today is the 15th.
      • 2019-01-15 01528, 2019

      • ruaok
        yeah.
      • 2019-01-15 01531, 2019

      • ruaok
        well, still scary.
      • 2019-01-15 01537, 2019

      • ruaok
        I'm cleaning old containers.
      • 2019-01-15 01551, 2019

      • ruaok
        can you please log in and see if you have random dumps laying around in your home dir you no longer need?
      • 2019-01-15 01514, 2019

      • iliekcomputers
        alright.
      • 2019-01-15 01530, 2019

      • iliekcomputers
        i did have dumps in there, cleaned up.
      • 2019-01-15 01552, 2019

      • ruaok
        much better. 21% reduction.
      • 2019-01-15 01554, 2019

      • ruaok
        thanks!
      • 2019-01-15 01508, 2019

      • iliekcomputers
        no worries, sorry.
      • 2019-01-15 01535, 2019

      • iliekcomputers
        i've pushed the test code. let me see how to turn spark logs to ERROR.
      • 2019-01-15 01537, 2019
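
One way to do this from PySpark is `SparkContext.setLogLevel`, which overrides the log level at runtime. The wrapper name `quiet_spark_logs` below is hypothetical, and the log does not say which approach was actually used:

```python
def quiet_spark_logs(spark, level="ERROR"):
    """Raise the log threshold on an existing SparkSession."""
    # setLogLevel takes effect once the SparkContext exists; messages
    # emitted during startup are not affected by this call.
    spark.sparkContext.setLogLevel(level)
    return spark


# Usage, assuming pyspark is installed and a session can be created:
#   from pyspark.sql import SparkSession
#   spark = quiet_spark_logs(SparkSession.builder.getOrCreate())
```

Silencing startup output as well would need a change to Spark's log4j configuration rather than a call from the script.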

      • ruaok
        how do you normally invoke Dockerfile.jobs?
      • 2019-01-15 01544, 2019

      • ruaok is looking into the volume thing
      • 2019-01-15 01559, 2019

      • iliekcomputers
        ruaok: i push it and then have a script do a `docker run`.