Yeah, I had a look, thanks :). Your comment about develop.sh makes me inclined to think that we can have a separate file to wrap non-docker-compose tasks.
2020-05-02 12304, 2020
iliekcomputers
what tasks are you talking about?
2020-05-02 12313, 2020
shivam-kapila
The manage ones
2020-05-02 12322, 2020
shivam-kapila
./develop.sh manage
2020-05-02 12325, 2020
shivam-kapila
types
2020-05-02 12326, 2020
iliekcomputers
manage.py is supposed to be that
2020-05-02 12340, 2020
iliekcomputers
we're making a manage script over a manage script
2020-05-02 12346, 2020
iliekcomputers
makes no sense
2020-05-02 12329, 2020
shivam-kapila
Oh yes. Sorry. That was dumb :/
2020-05-02 12355, 2020
iliekcomputers
nah, that's ok. i'm just trying to reduce complexity in the dev environment as much as possible.
2020-05-02 12358, 2020
iliekcomputers
for example
2020-05-02 12315, 2020
iliekcomputers
both develop.sh and spark_develop.sh have a `format_namenode` for some reason
2020-05-02 12346, 2020
iliekcomputers
I don't think we need the `develop.sh npm` either; that happens automatically
2020-05-02 12313, 2020
shivam-kapila
Yes we can remove that
2020-05-02 12346, 2020
iliekcomputers
`psql` I'm okay with, but it's not documented anywhere
2020-05-02 12320, 2020
iliekcomputers
`manage` i'm not so sure about
2020-05-02 12323, 2020
shivam-kapila
> both develop.sh and spark_develop.sh have a `format_namenode` for some reason
2020-05-02 12323, 2020
shivam-kapila
The one in develop.sh can be removed. It's useless. The one in spark_develop.sh accomplishes the same thing
2020-05-02 12341, 2020
iliekcomputers
yes
2020-05-02 12309, 2020
iliekcomputers
spark_develop.sh format namenode used to take a clusterId argument that i had no idea where to find
2020-05-02 12327, 2020
iliekcomputers
removing it still worked
2020-05-02 12333, 2020
iliekcomputers shrugs
2020-05-02 12349, 2020
shivam-kapila
Yes it will
2020-05-02 12318, 2020
iliekcomputers
we need to adopt a policy of only adding stuff that's necessary. the dev environment has gotten so complicated that it's easier to develop spark stuff and run it on production data.
2020-05-02 12331, 2020
iliekcomputers
vs just developing locally
2020-05-02 12316, 2020
shivam-kapila
clusterid was used to format the namenode of the same cluster as the datanode
2020-05-02 12308, 2020
shivam-kapila
Actually, if we don't provide the clusterID and run the format command twice, the link between the datanode and the namenode will break and they won't work
2020-05-02 12316, 2020
shivam-kapila
ClusterID was used for this reason
2020-05-02 12336, 2020
iliekcomputers
what is the ID that I'm supposed to pass tho?
2020-05-02 12341, 2020
iliekcomputers
where do I find it?
2020-05-02 12323, 2020
iliekcomputers
it also doesn't error out well enough; I ran it two or three times and was confused why it wasn't working, and only later realized that it needs some extra argument
2020-05-02 12316, 2020
shivam-kapila
The clusterID part is somewhere in the docs. Give me a sec
The Hadoop docs say the namenode needs to be formatted the first time so that the datanodes can obtain the namenode's clusterID after the format
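For reference, a rough sketch of how the clusterID behaves with stock Hadoop commands (the storage paths below are assumptions; the real ones come from `dfs.namenode.name.dir` and `dfs.datanode.data.dir` in the containers):

```sh
# First format: the namenode mints a fresh clusterID, and datanodes
# adopt it when they first register with the namenode.
hdfs namenode -format

# The ID is recorded in the storage directories (paths assumed):
cat /hadoop/dfs/name/current/VERSION   # namenode side
cat /hadoop/dfs/data/current/VERSION   # datanode side

# Re-formatting without an ID mints a NEW clusterID, so existing
# datanodes no longer match and refuse to register. Passing the old
# ID keeps them linked:
hdfs namenode -format -clusterid CID-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```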
2020-05-02 12313, 2020
shivam-kapila
Although they also encourage never formatting the namenode again unless it's absolutely required
2020-05-02 12314, 2020
shivam-kapila
Also, HACKING.md has nothing much that isn't in the docs. The main thing that's different is the format namenode one, which I agree should be in spark-dev-env
2020-05-02 12328, 2020
iliekcomputers
not in spark-dev-env
2020-05-02 12344, 2020
iliekcomputers
the first format should just work (automatically or in a single command)
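A minimal sketch of what "automatic" could look like, assuming a hypothetical container entrypoint and the usual namenode storage layout:

```sh
# Hypothetical entrypoint guard: format only on the very first run,
# detected by the absence of the namenode's VERSION file.
NAME_DIR=/hadoop/dfs/name   # assumed; should match dfs.namenode.name.dir
if [ ! -f "$NAME_DIR/current/VERSION" ]; then
    hdfs namenode -format -nonInteractive
fi
```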
2020-05-02 12359, 2020
shivam-kapila
automatic looks good
2020-05-02 12319, 2020
iliekcomputers
the clusterId finding shit should be easy to do: if a user runs `./spark-develop.sh format` without a clusterID, they should get an error that tells them exactly what to do
2020-05-02 12326, 2020
iliekcomputers
where to find the ID and how to pass it
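A sketch of what that check might look like inside the script's `format` handler, where `$1` would be the clusterID argument (container and path names here are illustrative, not the actual ones):

```sh
# Hypothetical guard at the top of the `format` subcommand:
# fail fast with instructions instead of a cryptic Hadoop error.
if [ -z "$1" ]; then
    echo "Usage: ./spark-develop.sh format <clusterID>" >&2
    echo "Find the ID in the datanode's VERSION file, e.g.:" >&2
    echo "  docker exec <datanode-container> cat /hadoop/dfs/data/current/VERSION" >&2
    exit 1
fi
hdfs namenode -format -clusterid "$1"
```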
2020-05-02 12352, 2020
iliekcomputers
the dev-env docs should be as simple as possible with the minimum number of steps possible
2020-05-02 12323, 2020
iliekcomputers
otherwise, it's just a pita to read them and run so many steps
2020-05-02 12306, 2020
shivam-kapila
Right now, if they don't pass the clusterID, I guess the format will still work. But if they run it multiple times, the link breakage will occur. ClusterID is an optional param for the format command
2020-05-02 12334, 2020
iliekcomputers
shivam-kapila: that's not the case in current master
2020-05-02 12349, 2020
iliekcomputers
if you run ./spark-develop.sh format on master, it won't work
2020-05-02 12304, 2020
iliekcomputers
i fixed it in the PR by removing the clusterID argument
2020-05-02 12325, 2020
iliekcomputers
is there a use case for formatting hadoop after the first format?
2020-05-02 12340, 2020
shivam-kapila
We mostly don't need it.
2020-05-02 12344, 2020
iliekcomputers
i'm wondering if we can remove the format command entirely and just run it the first time.
2020-05-02 12356, 2020
iliekcomputers
if someone needs to format their data, they should use hdfs -rm
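For completeness, the full form of that command (the path is a placeholder):

```sh
# Remove data from HDFS without re-formatting the namenode:
hdfs dfs -rm -r /path/to/data
```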
2020-05-02 12325, 2020
shivam-kapila
We can remove that
2020-05-02 12349, 2020
shivam-kapila
Then your current fix as you pushed it into the PR will work
2020-05-02 12311, 2020
shivam-kapila
A note can be added: *Please run this command only once*
2020-05-02 12317, 2020
iliekcomputers
ideally it would run automatically, and we can remove the entire step from the doc
shivam-kapila
iliekcomputers: I added a comment about a mistake I made in my docs PR. I just noticed it on readthedocs. Can you please resolve that too?
2020-05-02 12359, 2020
iliekcomputers
yes.
2020-05-02 12303, 2020
shivam-kapila
About LB-549: Doesn't that happen because the Spotify reader fetches the latest 50 listens again after deleting all listens? P.S. Not urgent, this just caught my eye.