_lucifer
pristine___: ping
-
pristine___
_lucifer: pong
-
_lucifer
pristine___: i am getting Java OOM errors. what should i set the driver memory for spark to?
-
pristine___
-
This might help.
-
You will have to calculate driver memory, executor memory and other configs based on your machine. Fun maths :p
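-
A minimal sketch of what tuning those configs might look like for a local run; the app name and every value here are illustrative assumptions for an ~8G machine, not project defaults:

```python
# Hedged sketch: Spark memory settings for a local run on an ~8G laptop.
# All values are assumptions to adjust; the driver heap has to fit in RAM
# alongside the OS, so the numbers are deliberately conservative.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("lb-local-experiments")             # hypothetical app name
    .master("local[*]")                          # use all local cores
    .config("spark.driver.memory", "4g")         # raise this for driver-side OOMs
    .config("spark.executor.memory", "4g")       # executor heap (runs inside the driver JVM in local mode)
    .config("spark.driver.maxResultSize", "1g")  # cap data collected back to the driver
    .getOrCreate()
)
```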
-
_lucifer
lol ok :)
-
what are these values for your machine btw?
-
pristine___
I use a few MBs of data so it doesn't matter :p
-
_lucifer
yeah right :|
-
pristine___
You can ask ishaanshah, I guess he's also using full dumps.
-
_lucifer
ishaanshah: ping :)
-
pristine___: btw, have you tried out google colab?
-
pristine___
Not yet
-
But say, what's on your mind?
-
_lucifer
i was thinking if we could set up a jupyter notebook for quickly experimenting with recs.
-
using colab, we may be able to run workloads on k80 gpus so speed and memory will be less of an issue
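-
If the Colab idea pans out, a first cell might look roughly like this; PySpark is assumed to be pip-installed in an earlier cell, and the parquet path is a placeholder, not a real dump location:

```python
# Rough sketch of a Colab setup cell for experimenting with recs.
# Nothing here is ListenBrainz-specific API, just plain PySpark; the driver
# memory value assumes Colab's ~12G of RAM minus some headroom.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("recs-colab")                 # hypothetical name
    .master("local[*]")
    .config("spark.driver.memory", "10g")  # assumption: leave ~2G for the OS
    .getOrCreate()
)

# Placeholder path: whatever dump sample gets uploaded to the notebook.
listens = spark.read.parquet("sample_listens.parquet")
listens.printSchema()
```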
-
ishaanshah
_lucifer: pong!
-
_lucifer
ishaanshah: hi, do you use full dumps or incremental dumps locally while working with spark?
-
ishaanshah
multiple incremental dumps, not full
-
_lucifer
ah ok
-
i am using those for listens too
-
but for the mapping a full dump
-
ishaanshah
yeah I used full for mapping too
-
but got OOM
-
_lucifer
yeah same here
-
were you able to tweak the config to get it working?
-
ishaanshah
I have an 8G laptop
-
and the mapping is 11G
-
_lucifer
i too have an 8 gig one
-
ishaanshah
so I don't think there's any way we can fix it, using a smaller dump would be better
-
_lucifer
yeah right that would certainly fix this
-
pristine___
_lucifer: can do, next week when I am back in town
-
_lucifer
great!
-
pristine___
_lucifer: told ya not to use the full dump mapping :p
-
_lucifer
yeah you were right :)
-
pristine___
Though I think we really need to have smaller dumps for dev
-
_lucifer
i have re-run it and am monitoring to see where it fails
-
pristine___
For better user experience.
-
ishaanshah
_lucifer: How much memory do we get for free on colab?
-
pristine___
Can you open a ticket for smaller dumps, _lucifer?
-
_lucifer
ishaanshah: i was trying to find the same
-
pristine___: yeah sure will do that
-
ishaanshah
I am interested in having some kind of cloud testing env for spark
-
_lucifer
12gig
-
ishaanshah
I used databricks, but it has limitations for free accounts
-
and doesn't work for full dumps
-
_lucifer
yeah right
-
ishaanshah
> 12gig
-
:(
-
_lucifer
how much do you get on databricks?
-
ishaanshah
15G
-
but limited storage
-
so can't download the dumps
-
_lucifer
okay but that 12 does not include gpu
-
i'll see if i can find another alternative
-
ishaanshah: what about kaggle, 16G + 30h gpu/week
-
ishaanshah
_lucifer: Hmm, looks promising, I haven't personally used kaggle though
-
does it support spark?
-
shivam-kapila
GCP gives up to 26G on demand
-
_lucifer
ishaanshah: yup, pyspark is just like any other ML lib. i just installed pyspark on my pc and experimented using the python console
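-
For context, the console experiment described above might look something like this; the parquet path and column names are invented for illustration:

```python
# Hedged sketch of poking at a listens sample from a plain python console.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.master("local[*]").getOrCreate()

listens = spark.read.parquet("listens_sample.parquet")  # hypothetical sample file
(listens
    .groupBy("artist_name")                       # assumed column name
    .agg(F.count("*").alias("listen_count"))
    .orderBy(F.desc("listen_count"))
    .show(10))
```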
-
lol, gcp banned my account
-
shivam-kapila
Good
-
ishaanshah
_lucifer: ooh nice, let me know if you are able to run mapping on kaggle
-
_lucifer
yeah, will try and let you know :D
-
ishaanshah
I like to experiment with the queries before writing them for production, notebook-like environments are good for this
-
shivam-kapila
ishaanshah: you mentioned Zeppelin once
-
Won't it solve the issue if we have a Zeppelin layer in prod?
-
ishaanshah
shivam-kapila: yes but it requires you to use your own PC
-
shivam-kapila
Ouch
-
ishaanshah
I don't have a powerful enough pc to join huge datasets
-
we can add it to prod but it's not an easy task
-
shivam-kapila
Mine is slower than yours
-
_lucifer
that's true for almost all of us
-
shivam-kapila
Yeah I saw the Zeppelin integration
-
It's somewhat tedious
-
Anyways I think a smaller mapping is needed
-
_lucifer
also a listen dataset for that
-
shivam-kapila
Cloud isn't as flexible
-
_lucifer
so that the listens are actually in the mapping and we can get meaningful results
-
shivam-kapila
yes that
-
ideally we can pick the latest 5 inc dumps and have a corresponding mapping
-
IG that's enough
-
_lucifer
yeah makes sense
-
ishaanshah
-
_lucifer
but getting that corresponding mapping can be hard
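-
One hedged sketch of how that corresponding mapping could be derived: join the full mapping against the sampled listens and keep only the rows that are actually referenced. The paths and the recording_msid column are assumptions, not the real schema:

```python
# Sketch: build a dev-sized mapping from a few incremental listen dumps.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

listens = spark.read.parquet("inc_dumps/*.parquet")   # hypothetical sampled listens
mapping = spark.read.parquet("full_mapping.parquet")  # hypothetical full mapping

dev_mapping = mapping.join(
    listens.select("recording_msid").distinct(),  # assumed join key
    on="recording_msid",
    how="inner",  # drop mapping rows no sampled listen points at
)
dev_mapping.write.mode("overwrite").parquet("dev_mapping.parquet")
```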
-
shivam-kapila
dunno, I think so
-
_lucifer
ishaanshah: thanks, i was just going to write these myself. a lot of time saved :D
-
ishaanshah
:D
-
shivam-kapila
there's a dedicated spark extension for jupyter notebooks
-
_lucifer
nice!
-
pristine___: ishaanshah: took one hour but request dataframes completed successfully so the issue is not with the mapping
-
pristine___
Dataframes created in an hour?
-
_lucifer
yeah
-
ishaanshah
_lucifer: on kaggle or on local dev?
-
_lucifer
local
-
ok my bad, it was 2 hours
-
but successful
-
ishaanshah
Oh the full mapping worked?
-
What changes did you make?
-
To the config
-
_lucifer
none
-
ishaanshah
You said you got an OOM at first right?
-
_lucifer
i too had thought the issue was the mapping but i had issued all commands at once the last time
-
this time i am running all commands one by one as they complete
-
there are three left, one of which should be the culprit
-
ishaanshah
Oh
-
I don't know why it ran out of memory when I did it