#metabrainz

      • SothoTalKer has quit
      • 2020-10-01 27517, 2020

      • davic joined the channel
      • 2020-10-01 27525, 2020

      • kori has quit
      • 2020-10-01 27556, 2020

      • d4rkie has quit
      • 2020-10-01 27539, 2020

      • Nyanko-sensei joined the channel
      • 2020-10-01 27520, 2020

      • kori joined the channel
      • 2020-10-01 27550, 2020

      • Lotheric
        ruaok, idea for your hack weekend: https://newsroom.spotify.com/2020-09-29/how-to-ma…
      • 2020-10-01 27558, 2020

      • kori has quit
      • 2020-10-01 27527, 2020

      • kori joined the channel
      • 2020-10-01 27547, 2020

      • kori has quit
      • 2020-10-01 27551, 2020

      • kori joined the channel
      • 2020-10-01 27516, 2020

      • kori has quit
      • 2020-10-01 27521, 2020

      • kori joined the channel
      • 2020-10-01 27504, 2020

      • kori has quit
      • 2020-10-01 27529, 2020

      • kori joined the channel
      • 2020-10-01 27503, 2020

      • kori has quit
      • 2020-10-01 27540, 2020

      • kori joined the channel
      • 2020-10-01 27513, 2020

      • thomasross has quit
      • 2020-10-01 27502, 2020

      • kori has quit
      • 2020-10-01 27504, 2020

      • MajorLurker has quit
      • 2020-10-01 27550, 2020

      • _lucifer
        pristine___: ping
      • 2020-10-01 27523, 2020

      • supersandro2000 has quit
      • 2020-10-01 27555, 2020

      • kori joined the channel
      • 2020-10-01 27547, 2020

      • pristine___
        _lucifer: pong
      • 2020-10-01 27520, 2020

      • _lucifer
        pristine___: i am getting java oom errors. what should i set driver memory for spark as?
      • 2020-10-01 27529, 2020

      • kori has quit
      • 2020-10-01 27552, 2020

      • pristine___
        There is no ideal value as such. Depends on your machine. You will have to tweak configs to get the right value for your machine. https://stackoverflow.com/questions/53631853/spar….
      • 2020-10-01 27552, 2020

      • pristine___
        This might help.
      • 2020-10-01 27547, 2020

      • pristine___
        You will have to calculate driver memory, executor memory and other configs based on your machine. Fun maths :p
      • 2020-10-01 27515, 2020
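
      A minimal PySpark sketch of the kind of config sizing described above; the
      memory figures are illustrative assumptions, not recommendations.

        # Memory values must be tuned to the local machine, as discussed above;
        # these numbers are placeholders, not recommendations.
        from pyspark.sql import SparkSession

        spark = (
            SparkSession.builder
            .appName("listenbrainz-local-dev")       # hypothetical app name
            .config("spark.driver.memory", "4g")     # leave headroom for the OS on an 8G laptop
            .config("spark.executor.memory", "2g")
            .config("spark.sql.shuffle.partitions", "64")
            .getOrCreate()
        )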

      • _lucifer
        lol ok :)
      • 2020-10-01 27525, 2020

      • _lucifer
        what are these values for your machine btw?
      • 2020-10-01 27525, 2020

      • pristine___
        I use a few MBs of data so it doesn't matter :p
      • 2020-10-01 27547, 2020

      • _lucifer
        yeah right :|
      • 2020-10-01 27521, 2020

      • pristine___
        You can ask ishaanshah, I guess he is also using full dumps.
      • 2020-10-01 27551, 2020

      • _lucifer
        ishaanshah: ping :)
      • 2020-10-01 27537, 2020

      • kori joined the channel
      • 2020-10-01 27523, 2020

      • _lucifer
        pristine___: btw, have you tried out google colab?
      • 2020-10-01 27512, 2020

      • pristine___
        Not yet
      • 2020-10-01 27530, 2020

      • pristine___
        But do say, what's on your mind?
      • 2020-10-01 27509, 2020

      • _lucifer
        i was thinking if we could set up a jupyter notebook for quickly experimenting with reca
      • 2020-10-01 27514, 2020

      • _lucifer
        recs.
      • 2020-10-01 27554, 2020

      • _lucifer
        using colab, we may be able to run workloads using k80 gpus so speed and memory will be less of an issue
      • 2020-10-01 27525, 2020
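
      A rough sketch of bootstrapping PySpark in a hosted notebook such as Colab;
      the pyspark wheel bundles a local Spark, so a plain pip install is enough.
      Names here are placeholders.

        # Notebook shell escape; assumes the hosted environment allows pip installs.
        !pip install pyspark

        from pyspark.sql import SparkSession

        # Local-mode session inside the notebook VM; "recs-sandbox" is a made-up name.
        spark = (
            SparkSession.builder
            .master("local[*]")
            .appName("recs-sandbox")
            .getOrCreate()
        )
        print(spark.version)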

      • ishaanshah
        _lucifer: pong!
      • 2020-10-01 27559, 2020

      • _lucifer
        ishaanshah: hi, do you use full dumps or incremental dumps locally while working with spark?
      • 2020-10-01 27518, 2020

      • ishaanshah
        multiple incremental dumps, not full
      • 2020-10-01 27532, 2020

      • _lucifer
        ah ok
      • 2020-10-01 27552, 2020

      • _lucifer
        i am using that for listens too
      • 2020-10-01 27502, 2020

      • _lucifer
        but for the mapping a full dump
      • 2020-10-01 27512, 2020

      • ishaanshah
        yeah I used full for mapping too
      • 2020-10-01 27516, 2020

      • ishaanshah
        but got OOM
      • 2020-10-01 27527, 2020

      • _lucifer
        yeah same here
      • 2020-10-01 27541, 2020

      • _lucifer
        were you able to tweak the config to get it working?
      • 2020-10-01 27550, 2020

      • ishaanshah
        I have an 8G laptop
      • 2020-10-01 27558, 2020

      • ishaanshah
        and the mapping is 11G
      • 2020-10-01 27508, 2020

      • kori has quit
      • 2020-10-01 27516, 2020

      • _lucifer
        i too have an 8 gig one
      • 2020-10-01 27536, 2020

      • ishaanshah
        so I dont think theres any way we can fix it, using a smaller dump would be better
      • 2020-10-01 27557, 2020

      • _lucifer
        yeah right that would certainly fix this
      • 2020-10-01 27558, 2020

      • pristine___
        _lucifer: can do, next week when I am back in town
      • 2020-10-01 27509, 2020

      • _lucifer
        great!
      • 2020-10-01 27520, 2020

      • pristine___
        _lucifer: told ya to not use full dump mapping :p
      • 2020-10-01 27535, 2020

      • _lucifer
        yeah you were right :)
      • 2020-10-01 27557, 2020

      • pristine___
        Though I think we really need to have smaller dumps for dev
      • 2020-10-01 27521, 2020

      • _lucifer
        i have re-run it and am monitoring to see where it fails
      • 2020-10-01 27535, 2020

      • pristine___
        For better user experience.
      • 2020-10-01 27555, 2020

      • ishaanshah
        _lucifer: How much memory do we get for free on colab?
      • 2020-10-01 27500, 2020

      • pristine___
        Can you open a ticket for smaller dumps _lucifer ?
      • 2020-10-01 27517, 2020

      • _lucifer
        ishaanshah: i was trying to find the same
      • 2020-10-01 27526, 2020

      • _lucifer
        pristine___: yeah sure will do that
      • 2020-10-01 27548, 2020

      • ishaanshah
        I am interested in having some kind of cloud testing env for spark
      • 2020-10-01 27502, 2020

      • _lucifer
        12gig
      • 2020-10-01 27514, 2020

      • ishaanshah
        I used databricks, but it has limitations for free accounts
      • 2020-10-01 27520, 2020

      • ishaanshah
        and doesnt work for full dumps
      • 2020-10-01 27531, 2020

      • _lucifer
        yeah right
      • 2020-10-01 27510, 2020

      • ishaanshah
        > 12gig
      • 2020-10-01 27512, 2020

      • ishaanshah
        :(
      • 2020-10-01 27553, 2020

      • _lucifer
        how much do you get on databricks?
      • 2020-10-01 27502, 2020

      • ishaanshah
        15G
      • 2020-10-01 27510, 2020

      • ishaanshah
        but limited storage
      • 2020-10-01 27515, 2020

      • ishaanshah
        so cant download the dumps
      • 2020-10-01 27535, 2020

      • _lucifer
        okay but that 12 does not include gpu
      • 2020-10-01 27524, 2020

      • _lucifer
        i'll see if i can find another alternative
      • 2020-10-01 27538, 2020

      • _lucifer
        ishaanshah: what about kaggle, 16 g + 30h gpu/week
      • 2020-10-01 27542, 2020

      • kori joined the channel
      • 2020-10-01 27556, 2020

      • ishaanshah
        _lucifer: Hmm, looks promising, I haven't personally used kaggle though
      • 2020-10-01 27500, 2020

      • ishaanshah
        does it support spark?
      • 2020-10-01 27506, 2020

      • shivam-kapila
        GCP gives up to 26g on demand
      • 2020-10-01 27546, 2020

      • _lucifer
        ishaanshah: yup, pyspark is just like any other ml lib. i just installed pyspark on my pc and experimented using the python console
      • 2020-10-01 27505, 2020
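
      Roughly what such a console experiment might look like after installing
      with pip install pyspark; the file path and column name are made up for
      illustration.

        from pyspark.sql import SparkSession

        # Local-mode session started from a plain Python console.
        spark = SparkSession.builder.master("local[*]").getOrCreate()

        # Hypothetical sample dump and column name, purely for illustration.
        listens = spark.read.json("listens_sample.json")
        listens.groupBy("user_name").count().show(10)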

      • _lucifer
        lol, gcp banned my account
      • 2020-10-01 27553, 2020

      • shivam-kapila
        Good
      • 2020-10-01 27500, 2020

      • ishaanshah
        _lucifer: ooh nice, let me know if you are able to run mapping on kaggle
      • 2020-10-01 27524, 2020

      • _lucifer
        yeah, will try and let you know :D
      • 2020-10-01 27546, 2020

      • ishaanshah
        I like to experiment with the queries before writing them for production, notebook-like environments are good for this
      • 2020-10-01 27555, 2020

      • shivam-kapila
        ishaanshah: you mentioned zeppelin once
      • 2020-10-01 27513, 2020

      • shivam-kapila
        Wont it solve the issue if we have a zeppelin layer in prod
      • 2020-10-01 27518, 2020

      • ishaanshah
        shivam-kapila: yes but it requires you to use your own PC
      • 2020-10-01 27531, 2020

      • shivam-kapila
        Ouch
      • 2020-10-01 27541, 2020

      • ishaanshah
        I dont have a powerful enough pc to join huge datasets
      • 2020-10-01 27551, 2020

      • ishaanshah
        we can add it to prod but its not an easy task
      • 2020-10-01 27552, 2020

      • shivam-kapila
        Mine is slower than yours
      • 2020-10-01 27555, 2020

      • _lucifer
        thats true for almost all of us
      • 2020-10-01 27510, 2020

      • shivam-kapila
        Yeah I saw the zeppelin integration
      • 2020-10-01 27518, 2020

      • shivam-kapila
        Its somewhat tedious
      • 2020-10-01 27531, 2020

      • shivam-kapila
        Anyways I think a smaller mapping is needed
      • 2020-10-01 27552, 2020

      • _lucifer
        also a listen dataset for that
      • 2020-10-01 27502, 2020

      • shivam-kapila
        Cloud isnt as flexible
      • 2020-10-01 27520, 2020

      • _lucifer
        so that the listens are actually in the mapping and we can get meaningful results
      • 2020-10-01 27554, 2020

      • shivam-kapila
        yes that
      • 2020-10-01 27532, 2020

      • shivam-kapila
        ideally we can pick the latest 5 inc dumps and have a corresponding mapping
      • 2020-10-01 27556, 2020

      • shivam-kapila
        IG thats enough
      • 2020-10-01 27556, 2020

      • _lucifer
        yeah makes sense
      • 2020-10-01 27527, 2020
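
      One way such a matched dev-sized mapping could be derived: keep only the
      mapping rows whose recordings appear in the chosen incremental listen
      dumps. Paths and the recording_msid column name are assumptions.

        # Assumes an existing SparkSession `spark` and parquet copies of the dumps;
        # the paths and the "recording_msid" column are assumptions for illustration.
        listens = spark.read.parquet("listens_inc_1.parquet", "listens_inc_2.parquet")
        mapping = spark.read.parquet("msid_mbid_mapping.parquet")

        small_mapping = mapping.join(
            listens.select("recording_msid").distinct(),
            on="recording_msid",
            how="inner",
        )
        small_mapping.write.mode("overwrite").parquet("dev_mapping.parquet")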

      • ishaanshah
        _lucifer: this has some functions to download files from FTP and extracting them, maybe helpful https://usercontent.irccloud-cdn.com/file/mzgcsVC…
      • 2020-10-01 27532, 2020
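
      The linked file is truncated above; a hedged sketch of download-and-extract
      helpers along those lines, with the host and paths left as placeholders.

        import ftplib
        import tarfile

        def download_dump(host: str, remote_path: str, local_path: str) -> None:
            """Fetch a dump archive over anonymous FTP."""
            with ftplib.FTP(host) as ftp:
                ftp.login()  # anonymous login
                with open(local_path, "wb") as f:
                    ftp.retrbinary(f"RETR {remote_path}", f.write)

        def extract_dump(archive_path: str, dest_dir: str) -> None:
            """Unpack a .tar.* dump archive (compression is auto-detected)."""
            with tarfile.open(archive_path) as tar:
                tar.extractall(dest_dir)

        # download_dump("ftp.example.org", "/pub/dumps/listens.tar.xz", "listens.tar.xz")
        # extract_dump("listens.tar.xz", "./data")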

      • _lucifer
        but getting that corresponding mapping can be hard
      • 2020-10-01 27556, 2020

      • shivam-kapila
        dunno think o
      • 2020-10-01 27512, 2020

      • _lucifer
        ishaanshah: thanks, i was just going to write these myself. a lot of time saved :D
      • 2020-10-01 27526, 2020

      • ishaanshah
        :D
      • 2020-10-01 27538, 2020

      • shivam-kapila
        theres a dedicated spark extension for jupyter notebook
      • 2020-10-01 27536, 2020
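
      The extension isn't named here; one common option for wiring an existing
      Spark install into a Jupyter notebook is findspark, sketched below as an
      assumption rather than necessarily the extension meant above.

        # findspark locates SPARK_HOME and adds pyspark to sys.path so the
        # notebook kernel can create a session; one option among several.
        import findspark
        findspark.init()

        from pyspark.sql import SparkSession
        spark = SparkSession.builder.getOrCreate()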

      • _lucifer
        nice!
      • 2020-10-01 27535, 2020

      • Nyanko-sensei has quit
      • 2020-10-01 27535, 2020

      • _lucifer has quit
      • 2020-10-01 27535, 2020

      • leonardo has quit
      • 2020-10-01 27535, 2020

      • imdeni has quit
      • 2020-10-01 27535, 2020

      • mruszczyk has quit
      • 2020-10-01 27535, 2020

      • diru1100 has quit
      • 2020-10-01 27535, 2020

      • reg[m] has quit
      • 2020-10-01 27535, 2020

      • joshuaboniface has quit
      • 2020-10-01 27535, 2020

      • djinni` has quit
      • 2020-10-01 27555, 2020

      • rdswift_ joined the channel
      • 2020-10-01 27546, 2020

      • rdswift has quit
      • 2020-10-01 27550, 2020

      • rdswift_ is now known as rdswift
      • 2020-10-01 27526, 2020

      • testfreenode joined the channel
      • 2020-10-01 27504, 2020

      • _lucifer joined the channel
      • 2020-10-01 27544, 2020

      • _lucifer
        pristine___: ishaanshah: took one hour but request dataframes completed successfully so the issue is not with the mapping
      • 2020-10-01 27504, 2020

      • testfreenode has quit
      • 2020-10-01 27526, 2020

      • pristine___
        Dataframes created in an hour?
      • 2020-10-01 27537, 2020

      • _lucifer
        yeah
      • 2020-10-01 27512, 2020

      • ishaanshah
        _lucifer: on kaggle or on local dev?
      • 2020-10-01 27525, 2020

      • _lucifer
        local
      • 2020-10-01 27537, 2020

      • _lucifer
        ok my bad it was 2 hours
      • 2020-10-01 27550, 2020

      • _lucifer
        but successful
      • 2020-10-01 27552, 2020

      • ishaanshah
        Oh the full mapping worked?
      • 2020-10-01 27506, 2020

      • ishaanshah
        What changes did you make?
      • 2020-10-01 27510, 2020

      • ishaanshah
        To the config
      • 2020-10-01 27511, 2020

      • _lucifer
        none
      • 2020-10-01 27538, 2020

      • ishaanshah
        You said you got an OOM at first right?
      • 2020-10-01 27540, 2020

      • _lucifer
        i too had thought the issue was the mapping but i had issued all the commands at once the last time
      • 2020-10-01 27553, 2020

      • _lucifer
        this time i am running all commands one by one as they complete
      • 2020-10-01 27507, 2020

      • _lucifer
        there are three left, one of which should be the culprit
      • 2020-10-01 27516, 2020

      • ishaanshah
        Oh
      • 2020-10-01 27534, 2020

      • ishaanshah
        I dont know why it ran out of memory when I did it