#metabrainz

      • SothoTalKer has quit
      • 2020-10-01 27517, 2020

      • davic joined the channel
      • 2020-10-01 27525, 2020

      • kori has quit
      • 2020-10-01 27556, 2020

      • d4rkie has quit
      • 2020-10-01 27539, 2020

      • Nyanko-sensei joined the channel
      • 2020-10-01 27520, 2020

      • kori joined the channel
      • 2020-10-01 27550, 2020

      • Lotheric
        ruaok, idea for your hack weekend: https://newsroom.spotify.com/2020-09-29/how-to-ma…
      • 2020-10-01 27558, 2020

      • kori has quit
      • 2020-10-01 27527, 2020

      • kori joined the channel
      • 2020-10-01 27547, 2020

      • kori has quit
      • 2020-10-01 27551, 2020

      • kori joined the channel
      • 2020-10-01 27516, 2020

      • kori has quit
      • 2020-10-01 27521, 2020

      • kori joined the channel
      • 2020-10-01 27504, 2020

      • kori has quit
      • 2020-10-01 27529, 2020

      • kori joined the channel
      • 2020-10-01 27503, 2020

      • kori has quit
      • 2020-10-01 27540, 2020

      • kori joined the channel
      • 2020-10-01 27513, 2020

      • thomasross has quit
      • 2020-10-01 27502, 2020

      • kori has quit
      • 2020-10-01 27504, 2020

      • MajorLurker has quit
      • 2020-10-01 27550, 2020

      • _lucifer
        pristine___: ping
      • 2020-10-01 27523, 2020

      • supersandro2000 has quit
      • 2020-10-01 27555, 2020

      • kori joined the channel
      • 2020-10-01 27547, 2020

      • pristine___
        _lucifer: pong
      • 2020-10-01 27520, 2020

      • _lucifer
        pristine___: i am getting java oom errors. what should i set driver memory for spark as?
      • 2020-10-01 27529, 2020

      • kori has quit
      • 2020-10-01 27552, 2020

      • pristine___
        There is no ideal value as such. Depends on your machine. You will have to tweak configs to get the right value for your machine. https://stackoverflow.com/questions/53631853/spar….
      • 2020-10-01 27552, 2020

      • pristine___
        This might help.
      • 2020-10-01 27547, 2020

      • pristine___
        You will have to calculate driver memory, executor memory and other configs based on your machine. Fun maths :p
      • 2020-10-01 27515, 2020
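
      A minimal PySpark sketch of the kind of config sizing described above; the
      memory figures are illustrative assumptions, not recommendations.

        # Memory values must be tuned to the local machine, as discussed above;
        # these numbers are placeholders, not recommendations.
        from pyspark.sql import SparkSession

        spark = (
            SparkSession.builder
            .appName("listenbrainz-local-dev")       # hypothetical app name
            .config("spark.driver.memory", "4g")     # leave headroom for the OS on an 8G laptop
            .config("spark.executor.memory", "2g")
            .config("spark.sql.shuffle.partitions", "64")
            .getOrCreate()
        )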

      • _lucifer
        lol ok :)
      • 2020-10-01 27525, 2020

      • _lucifer
        what are these values for your machine btw?
      • 2020-10-01 27525, 2020

      • pristine___
        I use a few MBs of data so it doesn't matter :p
      • 2020-10-01 27547, 2020

      • _lucifer
        yeah right :|
      • 2020-10-01 27521, 2020

      • pristine___
        You can ask ishaanshah, I guess he is also using full dumps.
      • 2020-10-01 27551, 2020

      • _lucifer
        ishaanshah: ping :)
      • 2020-10-01 27537, 2020

      • kori joined the channel
      • 2020-10-01 27523, 2020

      • _lucifer
        pristine___: btw, have you tried out google colab?
      • 2020-10-01 27512, 2020

      • pristine___
        Not yet
      • 2020-10-01 27530, 2020

      • pristine___
        But do say, what's on your mind?
      • 2020-10-01 27509, 2020

      • _lucifer
        i was thinking if we could set up a jupyter notebook for quickly experimenting with reca
      • 2020-10-01 27514, 2020

      • _lucifer
        recs.
      • 2020-10-01 27554, 2020

      • _lucifer
        using colab, we may be able to run workloads using k80 gpus so speed and memory will be less of an issue
      • 2020-10-01 27525, 2020
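
      A rough sketch of bootstrapping PySpark in a hosted notebook such as Colab;
      the pyspark wheel bundles a local Spark, so a plain pip install is enough.
      Names here are placeholders.

        # Notebook shell escape; assumes the hosted environment allows pip installs.
        !pip install pyspark

        from pyspark.sql import SparkSession

        # Local-mode session inside the notebook VM; "recs-sandbox" is a made-up name.
        spark = (
            SparkSession.builder
            .master("local[*]")
            .appName("recs-sandbox")
            .getOrCreate()
        )
        print(spark.version)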

      • ishaanshah
        _lucifer: pong!
      • 2020-10-01 27559, 2020

      • _lucifer
        ishaanshah: hi, do you use full dumps or incremental dumps locally while working with spark?
      • 2020-10-01 27518, 2020

      • ishaanshah
        multiple incremental dumps, not full
      • 2020-10-01 27532, 2020

      • _lucifer
        ah ok
      • 2020-10-01 27552, 2020

      • _lucifer
        i am using that for listens too
      • 2020-10-01 27502, 2020

      • _lucifer
        but for the mapping a full dump
      • 2020-10-01 27512, 2020

      • ishaanshah
        yeah I used full for mapping too
      • 2020-10-01 27516, 2020

      • ishaanshah
        but got OOM
      • 2020-10-01 27527, 2020

      • _lucifer
        yeah same here
      • 2020-10-01 27541, 2020

      • _lucifer
        were you able to tweak the config to get it working?
      • 2020-10-01 27550, 2020

      • ishaanshah
        I have an 8G laptop
      • 2020-10-01 27558, 2020

      • ishaanshah
        and the mapping is 11G
      • 2020-10-01 27508, 2020

      • kori has quit
      • 2020-10-01 27516, 2020

      • _lucifer
        i too have an 8 gig one
      • 2020-10-01 27536, 2020

      • ishaanshah
        so I dont think theres any way we can fix it, using a smaller dump would be better
      • 2020-10-01 27557, 2020

      • _lucifer
        yeah right that would certainly fix this
      • 2020-10-01 27558, 2020

      • pristine___
        _lucifer: can do, next week when I am back in town
      • 2020-10-01 27509, 2020

      • _lucifer
        great!
      • 2020-10-01 27520, 2020

      • pristine___
        _lucifer: told ya to not use full dump mapping :p
      • 2020-10-01 27535, 2020

      • _lucifer
        yeah you were right :)
      • 2020-10-01 27557, 2020

      • pristine___
        Though I think we really need to have smaller dumps for dev
      • 2020-10-01 27521, 2020

      • _lucifer
        i have re-run it and am monitoring to see where it fails
      • 2020-10-01 27535, 2020

      • pristine___
        For better user experience.
      • 2020-10-01 27555, 2020

      • ishaanshah
        _lucifer: How much memory do we get for free on colab?
      • 2020-10-01 27500, 2020

      • pristine___
        Can you open a ticket for smaller dumps _lucifer ?
      • 2020-10-01 27517, 2020

      • _lucifer
        ishaanshah: i was trying to find the same
      • 2020-10-01 27526, 2020

      • _lucifer
        pristine___: yeah sure will do that
      • 2020-10-01 27548, 2020

      • ishaanshah
        I am interested in having some kind of cloud testing env for spark
      • 2020-10-01 27502, 2020

      • _lucifer
        12gig
      • 2020-10-01 27514, 2020

      • ishaanshah
        I used databricks, but it has limitations for free accounts
      • 2020-10-01 27520, 2020

      • ishaanshah
        and doesnt work for full dumps
      • 2020-10-01 27531, 2020

      • _lucifer
        yeah right
      • 2020-10-01 27510, 2020

      • ishaanshah
        > 12gig
      • 2020-10-01 27512, 2020

      • ishaanshah
        :(
      • 2020-10-01 27553, 2020

      • _lucifer
        how much do you get on databricks?
      • 2020-10-01 27502, 2020

      • ishaanshah
        15G
      • 2020-10-01 27510, 2020

      • ishaanshah
        but limited storage
      • 2020-10-01 27515, 2020

      • ishaanshah
        so cant download the dumps
      • 2020-10-01 27535, 2020

      • _lucifer
        okay but that 12 does not include gpu
      • 2020-10-01 27524, 2020

      • _lucifer
        i'll see if i can find another alternative
      • 2020-10-01 27538, 2020

      • _lucifer
        ishaanshah: what about kaggle, 16 g + 30h gpu/week
      • 2020-10-01 27542, 2020

      • kori joined the channel
      • 2020-10-01 27556, 2020

      • ishaanshah
        _lucifer: Hmm, looks promising, I haven't personally used kaggle though
      • 2020-10-01 27500, 2020

      • ishaanshah
        does it support spark?
      • 2020-10-01 27506, 2020

      • shivam-kapila
        GCP gives up to 26g on demand
      • 2020-10-01 27546, 2020

      • _lucifer
        ishaanshah: yup, pyspark is just like any other ml lib. i just installed pyspark on my pc and experimented using the python console
      • 2020-10-01 27505, 2020
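
      Roughly what such a console experiment might look like after installing
      with pip install pyspark; the file path and column name are made up for
      illustration.

        from pyspark.sql import SparkSession

        # Local-mode session started from a plain Python console.
        spark = SparkSession.builder.master("local[*]").getOrCreate()

        # Hypothetical sample dump and column name, purely for illustration.
        listens = spark.read.json("listens_sample.json")
        listens.groupBy("user_name").count().show(10)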

      • _lucifer
        lol, gcp banned my account
      • 2020-10-01 27553, 2020

      • shivam-kapila
        Good
      • 2020-10-01 27500, 2020

      • ishaanshah
        _lucifer: ooh nice, let me know if you are able to run mapping on kaggle
      • 2020-10-01 27524, 2020

      • _lucifer
        yeah, will try and let you know :D
      • 2020-10-01 27546, 2020

      • ishaanshah
        I like to experiment with the queries before writing them for production, notebook-like environments are good for this
      • 2020-10-01 27555, 2020

      • shivam-kapila
        ishaanshah: you mentioned zeppelin once
      • 2020-10-01 27513, 2020

      • shivam-kapila
        Wont it solve the issue if we have a zeppelin layer in prod
      • 2020-10-01 27518, 2020

      • ishaanshah
        shivam-kapila: yes but it requires you to use your own PC
      • 2020-10-01 27531, 2020

      • shivam-kapila
        Ouch
      • 2020-10-01 27541, 2020

      • ishaanshah
        I dont have a powerful enough pc to join huge datasets
      • 2020-10-01 27551, 2020

      • ishaanshah
        we can add it to prod but its not an easy task
      • 2020-10-01 27552, 2020

      • shivam-kapila
        Mine is slower than yours
      • 2020-10-01 27555, 2020

      • _lucifer
        thats true for almost all of us
      • 2020-10-01 27510, 2020

      • shivam-kapila
        Yeah I saw the zeppelin integration
      • 2020-10-01 27518, 2020

      • shivam-kapila
        Its somewhat tedious
      • 2020-10-01 27531, 2020

      • shivam-kapila
        Anyways I think a smaller mapping is needed
      • 2020-10-01 27552, 2020

      • _lucifer
        also a listen dataset for that
      • 2020-10-01 27502, 2020

      • shivam-kapila
        Cloud isnt as flexible
      • 2020-10-01 27520, 2020

      • _lucifer
        so that the listens are actually in the mapping and we can get meaningful results
      • 2020-10-01 27554, 2020

      • shivam-kapila
        yes that
      • 2020-10-01 27532, 2020

      • shivam-kapila
        ideally we can pick the latest 5 inc dumps and have a corresponding mapping
      • 2020-10-01 27556, 2020

      • shivam-kapila
        IG thats enough
      • 2020-10-01 27556, 2020

      • _lucifer
        yeah makes sense
      • 2020-10-01 27527, 2020
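
      One way such a matched dev-sized mapping could be derived: keep only the
      mapping rows whose recordings appear in the chosen incremental listen
      dumps. Paths and the recording_msid column name are assumptions.

        # Assumes an existing SparkSession `spark` and parquet copies of the dumps;
        # the paths and the "recording_msid" column are assumptions for illustration.
        listens = spark.read.parquet("listens_inc_1.parquet", "listens_inc_2.parquet")
        mapping = spark.read.parquet("msid_mbid_mapping.parquet")

        small_mapping = mapping.join(
            listens.select("recording_msid").distinct(),
            on="recording_msid",
            how="inner",
        )
        small_mapping.write.mode("overwrite").parquet("dev_mapping.parquet")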

      • ishaanshah
        _lucifer: this has some functions to download files from FTP and extracting them, maybe helpful https://usercontent.irccloud-cdn.com/file/mzgcsVC…
      • 2020-10-01 27532, 2020
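
      The linked file is truncated above; a hedged sketch of download-and-extract
      helpers along those lines, with the host and paths left as placeholders.

        import ftplib
        import tarfile

        def download_dump(host: str, remote_path: str, local_path: str) -> None:
            """Fetch a dump archive over anonymous FTP."""
            with ftplib.FTP(host) as ftp:
                ftp.login()  # anonymous login
                with open(local_path, "wb") as f:
                    ftp.retrbinary(f"RETR {remote_path}", f.write)

        def extract_dump(archive_path: str, dest_dir: str) -> None:
            """Unpack a .tar.* dump archive (compression is auto-detected)."""
            with tarfile.open(archive_path) as tar:
                tar.extractall(dest_dir)

        # download_dump("ftp.example.org", "/pub/dumps/listens.tar.xz", "listens.tar.xz")
        # extract_dump("listens.tar.xz", "./data")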

      • _lucifer
        but getting that corresponding mapping can be hard
      • 2020-10-01 27556, 2020

      • shivam-kapila
        dunno think o
      • 2020-10-01 27512, 2020

      • _lucifer
        ishaanshah: thanks, i was just going to write these myself. a lot of time saved :D
      • 2020-10-01 27526, 2020

      • ishaanshah
        :D
      • 2020-10-01 27538, 2020

      • shivam-kapila
        theres a dedicated spark extension for jupyter notebook
      • 2020-10-01 27536, 2020
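
      The extension isn't named here; one common option for wiring an existing
      Spark install into a Jupyter notebook is findspark, sketched below as an
      assumption rather than necessarily the extension meant above.

        # findspark locates SPARK_HOME and adds pyspark to sys.path so the
        # notebook kernel can create a session; one option among several.
        import findspark
        findspark.init()

        from pyspark.sql import SparkSession
        spark = SparkSession.builder.getOrCreate()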

      • _lucifer
        nice!
      • 2020-10-01 27535, 2020

      • Nyanko-sensei has quit
      • 2020-10-01 27535, 2020

      • _lucifer has quit
      • 2020-10-01 27535, 2020

      • leonardo has quit
      • 2020-10-01 27535, 2020

      • imdeni has quit
      • 2020-10-01 27535, 2020

      • mruszczyk has quit
      • 2020-10-01 27535, 2020

      • diru1100 has quit
      • 2020-10-01 27535, 2020

      • reg[m] has quit
      • 2020-10-01 27535, 2020

      • joshuaboniface has quit
      • 2020-10-01 27535, 2020

      • djinni` has quit
      • 2020-10-01 27555, 2020

      • rdswift_ joined the channel
      • 2020-10-01 27546, 2020

      • rdswift has quit
      • 2020-10-01 27550, 2020

      • rdswift_ is now known as rdswift
      • 2020-10-01 27526, 2020

      • testfreenode joined the channel
      • 2020-10-01 27504, 2020

      • _lucifer joined the channel
      • 2020-10-01 27544, 2020

      • _lucifer
        pristine___: ishaanshah: took one hour but request dataframes completed successfully so the issue is not with the mapping
      • 2020-10-01 27504, 2020

      • testfreenode has quit
      • 2020-10-01 27526, 2020

      • pristine___
        Dataframes created in an hour?
      • 2020-10-01 27537, 2020

      • _lucifer
        yeah
      • 2020-10-01 27512, 2020

      • ishaanshah
        _lucifer: on kaggle or on local dev?
      • 2020-10-01 27525, 2020

      • _lucifer
        local
      • 2020-10-01 27537, 2020

      • _lucifer
        ok my bad it was 2 hours
      • 2020-10-01 27550, 2020

      • _lucifer
        but successful
      • 2020-10-01 27552, 2020

      • ishaanshah
        Oh the full mapping worked?
      • 2020-10-01 27506, 2020

      • ishaanshah
        What changes did you make?
      • 2020-10-01 27510, 2020

      • ishaanshah
        To the config
      • 2020-10-01 27511, 2020

      • _lucifer
        none
      • 2020-10-01 27538, 2020

      • ishaanshah
        You said you got an OOM at first right?
      • 2020-10-01 27540, 2020

      • _lucifer
        i too had thought the issue was the mapping but i had issued all the commands at once the last time
      • 2020-10-01 27553, 2020

      • _lucifer
        this time i am running all commands one by one as they complete
      • 2020-10-01 27507, 2020

      • _lucifer
        there are three left, one of which should be the culprit
      • 2020-10-01 27516, 2020

      • ishaanshah
        Oh
      • 2020-10-01 27534, 2020

      • ishaanshah
        I dont know why it ran out of memory when I did it