-
SothoTalKer has quit
2020-10-01 27517, 2020
-
davic joined the channel
2020-10-01 27525, 2020
-
kori has quit
2020-10-01 27556, 2020
-
d4rkie has quit
2020-10-01 27539, 2020
-
Nyanko-sensei joined the channel
2020-10-01 27520, 2020
-
kori joined the channel
2020-10-01 27550, 2020
-
Lotheric
2020-10-01 27558, 2020
-
kori has quit
2020-10-01 27527, 2020
-
kori joined the channel
2020-10-01 27547, 2020
-
kori has quit
2020-10-01 27551, 2020
-
kori joined the channel
2020-10-01 27516, 2020
-
kori has quit
2020-10-01 27521, 2020
-
kori joined the channel
2020-10-01 27504, 2020
-
kori has quit
2020-10-01 27529, 2020
-
kori joined the channel
2020-10-01 27503, 2020
-
kori has quit
2020-10-01 27540, 2020
-
kori joined the channel
2020-10-01 27513, 2020
-
thomasross has quit
2020-10-01 27502, 2020
-
kori has quit
2020-10-01 27504, 2020
-
MajorLurker has quit
2020-10-01 27550, 2020
-
_lucifer
pristine___: ping
2020-10-01 27523, 2020
-
supersandro2000 has quit
2020-10-01 27555, 2020
-
kori joined the channel
2020-10-01 27547, 2020
-
pristine___
_lucifer: pong
2020-10-01 27520, 2020
-
_lucifer
pristine___: i am getting java oom errors. what should i set driver memory for spark as?
2020-10-01 27529, 2020
-
kori has quit
2020-10-01 27552, 2020
-
pristine___
2020-10-01 27552, 2020
-
pristine___
This might help.
2020-10-01 27547, 2020
-
pristine___
You will have to calculate driver memory, executor memory and other configs based on your machine. Fun maths :p
2020-10-01 27515, 2020
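The "fun maths" above can be sketched as a rough rule of thumb. This is a hypothetical helper, not official Spark guidance: the fractions (reserve ~25% of RAM for the OS, give ~40% of the rest to the driver) are illustrative assumptions to tune for your own workload.

```python
def spark_memory_settings(total_ram_gb, os_reserve_frac=0.25, driver_frac=0.4):
    """Roughly split machine RAM into Spark driver/executor memory.

    Hypothetical helper: the fractions are illustrative assumptions,
    not official Spark guidance -- tune them for your own workload.
    """
    usable = total_ram_gb * (1 - os_reserve_frac)  # leave headroom for the OS
    driver = int(usable * driver_frac)             # value for spark.driver.memory
    executor = int(usable - driver)                # value for spark.executor.memory
    return {
        "spark.driver.memory": f"{driver}g",
        "spark.executor.memory": f"{executor}g",
    }

# For an 8G laptop this yields 2g driver / 4g executor:
print(spark_memory_settings(8))
```

The returned values would go into `spark-defaults.conf` or be passed via `SparkSession.builder.config`.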
-
_lucifer
lol ok :)
2020-10-01 27525, 2020
-
_lucifer
what are these values for your machine btw?
2020-10-01 27525, 2020
-
pristine___
I use a few MBs of data so it doesn't matter :p
2020-10-01 27547, 2020
-
_lucifer
yeah right :|
2020-10-01 27521, 2020
-
pristine___
You can ask ishaanshah, I guess he's also using full dumps.
2020-10-01 27551, 2020
-
_lucifer
ishaanshah: ping :)
2020-10-01 27537, 2020
-
kori joined the channel
2020-10-01 27523, 2020
-
_lucifer
pristine___: btw, have you tried out google colab?
2020-10-01 27512, 2020
-
pristine___
Not yet
2020-10-01 27530, 2020
-
pristine___
But do tell, what's on your mind?
2020-10-01 27509, 2020
-
_lucifer
i was thinking if we could set up a jupyter notebook for quickly experimenting with recs
2020-10-01 27514, 2020
-
_lucifer
using colab, we may be able to run workloads using k80 gpus so speed and memory will be less of an issue
2020-10-01 27525, 2020
-
ishaanshah
_lucifer: pong!
2020-10-01 27559, 2020
-
_lucifer
ishaanshah: hi, do you use full dumps or incremental dumps locally while working with spark?
2020-10-01 27518, 2020
-
ishaanshah
multiple incremental dumps, not full
2020-10-01 27532, 2020
-
_lucifer
ah ok
2020-10-01 27552, 2020
-
_lucifer
i am too using that for listens
2020-10-01 27502, 2020
-
_lucifer
but for the mapping a full dump
2020-10-01 27512, 2020
-
ishaanshah
yeah I used full for mapping too
2020-10-01 27516, 2020
-
ishaanshah
but got OOM
2020-10-01 27527, 2020
-
_lucifer
yeah same here
2020-10-01 27541, 2020
-
_lucifer
were you able to tweak the config to get it working?
2020-10-01 27550, 2020
-
ishaanshah
I have an 8G laptop
2020-10-01 27558, 2020
-
ishaanshah
and the mapping is 11G
2020-10-01 27508, 2020
-
kori has quit
2020-10-01 27516, 2020
-
_lucifer
i too have an 8 gig one
2020-10-01 27536, 2020
-
ishaanshah
so I dont think theres any way we can fix it, using a smaller dump would be better
2020-10-01 27557, 2020
-
_lucifer
yeah right that would certainly fix this
2020-10-01 27558, 2020
-
pristine___
_lucifer: can do, next week when I am back in town
2020-10-01 27509, 2020
-
_lucifer
great!
2020-10-01 27520, 2020
-
pristine___
_lucifer: told ya to not use full dump mapping :p
2020-10-01 27535, 2020
-
_lucifer
yeah you were right :)
2020-10-01 27557, 2020
-
pristine___
Though I think we really need to have smaller dumps for dev
2020-10-01 27521, 2020
-
_lucifer
i have re-run it and am monitoring to see where it fails
2020-10-01 27535, 2020
-
pristine___
For better user experience.
2020-10-01 27555, 2020
-
ishaanshah
_lucifer: How much memory do we get for free on colab?
2020-10-01 27500, 2020
-
pristine___
Can you open a ticket for smaller dumps _lucifer ?
2020-10-01 27517, 2020
-
_lucifer
ishaanshah: i was trying to find the same
2020-10-01 27526, 2020
-
_lucifer
pristine___: yeah sure will do that
2020-10-01 27548, 2020
-
ishaanshah
I am interested in having some kind of cloud testing env for spark
2020-10-01 27502, 2020
-
_lucifer
12gig
2020-10-01 27514, 2020
-
ishaanshah
I used databricks, but it has limitations for free accounts
2020-10-01 27520, 2020
-
ishaanshah
and doesnt work for full dumps
2020-10-01 27531, 2020
-
_lucifer
yeah right
2020-10-01 27510, 2020
-
ishaanshah
> 12gig
2020-10-01 27512, 2020
-
ishaanshah
:(
2020-10-01 27553, 2020
-
_lucifer
how much do you get on databricks?
2020-10-01 27502, 2020
-
ishaanshah
15G
2020-10-01 27510, 2020
-
ishaanshah
but limited storage
2020-10-01 27515, 2020
-
ishaanshah
so cant download the dumps
2020-10-01 27535, 2020
-
_lucifer
okay but that 12 does not include gpu
2020-10-01 27524, 2020
-
_lucifer
i'll see if i can find another alternative
2020-10-01 27538, 2020
-
_lucifer
ishaanshah: what about kaggle, 16 g + 30h gpu/week
2020-10-01 27542, 2020
-
kori joined the channel
2020-10-01 27556, 2020
-
ishaanshah
_lucifer: Hmm, looks promising, I haven't personally used kaggle though
2020-10-01 27500, 2020
-
ishaanshah
does it support spark?
2020-10-01 27506, 2020
-
shivam-kapila
Gcp gives upto 26g on demand
2020-10-01 27546, 2020
-
_lucifer
ishaanshah: yup, pyspark is just like any other ml lib. i just installed pyspark on my pc and experimented using the python console
2020-10-01 27505, 2020
-
_lucifer
lol, gcp banned my account
2020-10-01 27553, 2020
-
shivam-kapila
Good
2020-10-01 27500, 2020
-
ishaanshah
_lucifer: ooh nice, let me know if you are able to run mapping on kaggle
2020-10-01 27524, 2020
-
_lucifer
yeah, will try and let you know :D
2020-10-01 27546, 2020
-
ishaanshah
I like to experiment with the queries before writing them for production, notebook-like environments are good for this
2020-10-01 27555, 2020
-
shivam-kapila
ishaanshah: you mentioned zeppelin once
2020-10-01 27513, 2020
-
shivam-kapila
Wont it solve the issue if we have a zeppelin layer in prod
2020-10-01 27518, 2020
-
ishaanshah
shivam-kapila: yes but it requires you to use your own PC
2020-10-01 27531, 2020
-
shivam-kapila
Ouch
2020-10-01 27541, 2020
-
ishaanshah
I dont have a powerful enough pc to join huge datasets
2020-10-01 27551, 2020
-
ishaanshah
we can add it to prod but its not an easy task
2020-10-01 27552, 2020
-
shivam-kapila
Mine is slower than yours
2020-10-01 27555, 2020
-
_lucifer
thats true for almost all of us
2020-10-01 27510, 2020
-
shivam-kapila
Yeah I saw the zeppelin integration
2020-10-01 27518, 2020
-
shivam-kapila
Its somewhat tedious
2020-10-01 27531, 2020
-
shivam-kapila
Anyways I think a smaller mapping is needed
2020-10-01 27552, 2020
-
_lucifer
also a listen dataset for that
2020-10-01 27502, 2020
-
shivam-kapila
Cloud isnt as flexible
2020-10-01 27520, 2020
-
_lucifer
so that the listens are actually in the mapping and we can get meaningful results
2020-10-01 27554, 2020
-
shivam-kapila
yes that
2020-10-01 27532, 2020
-
shivam-kapila
ideally we can pick the latest 5 inc dumps and have a corresponding mapping
2020-10-01 27556, 2020
-
shivam-kapila
IG thats enough
2020-10-01 27556, 2020
-
_lucifer
yeah makes sense
2020-10-01 27527, 2020
-
ishaanshah
2020-10-01 27532, 2020
-
_lucifer
but getting that corresponding mapping can be hard
2020-10-01 27556, 2020
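One way to derive that corresponding mapping is from the listens themselves: collect the (artist, track) pairs that occur in the chosen incremental dumps and keep only the mapping rows that match. A minimal sketch with plain Python sets; the field names (`artist_name`, `track_name`, `mbid`) are illustrative assumptions, not the actual dump schema:

```python
def shrink_mapping(listens, mapping):
    """Keep only mapping rows whose (artist, track) pair appears in listens.

    `listens` and `mapping` are lists of dicts; the keys used here are
    illustrative assumptions, not the real dump schema.
    """
    wanted = {(l["artist_name"], l["track_name"]) for l in listens}
    return [m for m in mapping
            if (m["artist_name"], m["track_name"]) in wanted]

listens = [{"artist_name": "A", "track_name": "t1"}]
mapping = [
    {"artist_name": "A", "track_name": "t1", "mbid": "123"},
    {"artist_name": "B", "track_name": "t2", "mbid": "456"},
]
# Only the ("A", "t1") row survives the shrink:
print(shrink_mapping(listens, mapping))
```

At dump scale the same idea would be a semi-join in Spark rather than in-memory sets, but the filtering logic is the same.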
-
shivam-kapila
dunno, think so
2020-10-01 27512, 2020
-
_lucifer
ishaanshah: thanks, i was just going to write these myself. a lot of time saved :D
2020-10-01 27526, 2020
-
ishaanshah
:D
2020-10-01 27538, 2020
-
shivam-kapila
theres a dedicated spark extension for jupyter notebook
2020-10-01 27536, 2020
-
_lucifer
nice!
2020-10-01 27535, 2020
-
Nyanko-sensei has quit
2020-10-01 27535, 2020
-
_lucifer has quit
2020-10-01 27535, 2020
-
leonardo has quit
2020-10-01 27535, 2020
-
imdeni has quit
2020-10-01 27535, 2020
-
mruszczyk has quit
2020-10-01 27535, 2020
-
diru1100 has quit
2020-10-01 27535, 2020
-
reg[m] has quit
2020-10-01 27535, 2020
-
joshuaboniface has quit
2020-10-01 27535, 2020
-
djinni` has quit
2020-10-01 27555, 2020
-
rdswift_ joined the channel
2020-10-01 27546, 2020
-
rdswift has quit
2020-10-01 27550, 2020
-
rdswift_ is now known as rdswift
2020-10-01 27526, 2020
-
testfreenode joined the channel
2020-10-01 27504, 2020
-
_lucifer joined the channel
2020-10-01 27544, 2020
-
_lucifer
pristine___: ishaanshah took one hour but request dataframes completed successfully so issue is not with the mapping
2020-10-01 27504, 2020
-
testfreenode has quit
2020-10-01 27526, 2020
-
pristine___
Dataframes created in an hour?
2020-10-01 27537, 2020
-
_lucifer
yeah
2020-10-01 27512, 2020
-
ishaanshah
_lucifer: on kaggle or on local dev?
2020-10-01 27525, 2020
-
_lucifer
local
2020-10-01 27537, 2020
-
_lucifer
ok my bad it was 2 hours
2020-10-01 27550, 2020
-
_lucifer
but successful
2020-10-01 27552, 2020
-
ishaanshah
Oh the full mapping worked?
2020-10-01 27506, 2020
-
ishaanshah
What changes did you make?
2020-10-01 27510, 2020
-
ishaanshah
To the config
2020-10-01 27511, 2020
-
_lucifer
none
2020-10-01 27538, 2020
-
ishaanshah
You said you got an OOM at first right?
2020-10-01 27540, 2020
-
_lucifer
i too had thought the issue was the mapping but i had issued all commands at once the last time
2020-10-01 27553, 2020
-
_lucifer
this time i am running all commands one by one as they complete
2020-10-01 27507, 2020
-
_lucifer
there are three left one of which should be the culprit
2020-10-01 27516, 2020
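Running the remaining commands one by one to find the culprit can be sketched as a small runner that executes each stage and reports the first one that dies. The stage names below are made up for illustration; in practice each callable would kick off a spark job:

```python
def find_culprit(stages):
    """Run (name, callable) pairs in order; return the first name that raises."""
    for name, fn in stages:
        try:
            fn()
        except Exception as exc:
            print(f"{name} failed: {exc}")  # likely the OOM culprit
            return name
    return None

def ok():
    pass

def boom():
    raise MemoryError("simulated OOM")  # stand-in for a java oom error

# Illustrative stages -- real ones would be spark jobs run one by one.
stages = [
    ("create_dataframes", ok),
    ("train_model", boom),
    ("generate_recs", ok),
]
print(find_culprit(stages))
```

Stopping at the first failure mirrors the manual process above: each stage only runs once the previous one has completed.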
-
ishaanshah
Oh
2020-10-01 27534, 2020
-
ishaanshah
I dont know why it ran out of memory when I did it