ruaok
so, what have you learned, iliekcomputers?
-
iliekcomputers
ruaok: so i was mucking around with hadoop and config values yesterday.
-
there are two interfaces that clients can use to read/write/interact with hdfs
-
one is the standard rpc interface, the other is an http api interface called webhdfs
-
ruaok
do you have a feeling for which is preferred?
-
iliekcomputers
ruaok: most of the simpler python clients I've looked at use webhdfs
-
the client I wrote the code with uses webhdfs too
-
`hdfs dfs -put` used RPC
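-
(A minimal sketch, not from the discussion, of how a WebHDFS request URL is put together. The `/webhdfs/v1` prefix and the `op` query parameter come from the WebHDFS REST API; the helper name `webhdfs_url`, the example path, and the use of port 9870, the default NameNode HTTP port in Hadoop 3, are assumptions for illustration.)

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host: str, port: int, path: str, op: str, **params: str) -> str:
    """Build a WebHDFS REST URL: http://host:port/webhdfs/v1/<path>?op=<OP>&..."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # The operation name always goes in the `op` query parameter.
    query = urlencode({"op": op, **params})
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


if __name__ == "__main__":
    # Host, port and path are assumptions; adjust them to the actual cluster.
    print(webhdfs_url("hadoop-master", 9870, "/user/test/foo", "LISTSTATUS"))
```

(With WebHDFS, a client only needs an HTTP library, which is likely why the simpler Python clients favour it over the RPC interface.)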
-
ruaok
and that is the one interface we got working so far. lol.
-
iliekcomputers
yeah, about that.
-
i changed some config values and used `stop-master-service` and `start-master-service`
-
and now the put thing isn't working...
-
ruaok
heh, lol.
-
iliekcomputers
i'm not sure why exactly
-
ruaok
hang on, there might be uncommitted changes in my repo.
-
iliekcomputers
the latest hadoop-yarn image is just the master branch of github/meb/hadoop-docker
-
ruaok
the playground repo only has these two lines
-
+ -p published=8088,target=8088,mode=host \
-
+ -p published=9864,target=9864 \
-
which don't affect anything.
-
iliekcomputers
maybe it's the worker nodes. how exactly do I find out what IPs they are on and whether they're running?
-
from the PR Leo_Verto made, it seemed like `hdfs dfsadmin -report` wasn't trustworthy.
-
ruaok
changes committed, and yes those would make a difference.
-
I've built a new image and am pushing it now
-
push.
-
+ed
-
iliekcomputers
ok cool
-
ruaok
you should be able to stop/start and have it work again.
-
iliekcomputers
let me try the put thing again.
-
-
ruaok
did you restart both the master and workers?
-
iliekcomputers
i don't have access to the workers.
-
ruaok
you don't need it.
-
iliekcomputers
afaik
-
ruaok
./stop-workers.sh
-
iliekcomputers
ah
-
ruaok
./start-workser.sh
-
iliekcomputers
oh cool
-
ruaok
-s
-
the nice thing about this system is that you never have to care about the worker nodes. they get fired up and join the cluster.
-
iliekcomputers
Usage: start-worker-service.sh <replicas>
-
replicas is a number?
-
3?
-
ruaok
yep
-
(to both)
-
iliekcomputers
noice, that's pretty cool.
-
still the same error
-
can you confirm for once that the workers are on 10.0.0.24, 10.0.0.23 etc?
-
ruaok
the 10 network is the overlay network and every time you start and stop the workers, they get new IPs.
-
I'm going to restart the cluster using my setup. hang on
-
now it works again.
-
why?
-
iliekcomputers
ruaok: foo is an empty file
-
ruaok
yea, I just touched it.
-
iliekcomputers
once it has data, it doesn't work...
-
try again, i echoed some text into it.
-
ruaok
doh.
-
huh, interesting.
-
but that used to work, no?
-
well, we should really focus on the http version of this interface and go with that.
-
are you annoyed by the fact that the only error messages that hadoop gives are java stacktraces?
-
iliekcomputers
yes.
-
not very helpful
-
ruaok
we'd get flayed alive if we wrote code like that.
-
but in the java world, I guess you're used to pain.
-
iliekcomputers
I think we should look at what ports are accessible to the master from the workers
-
ruaok
let me log into a worker
-
iliekcomputers
there's probably some reason why the master isn't able to connect with the workers
-
ruaok
-
the workers have far fewer ports open.
-
iliekcomputers
-
The ports 9866 and 9864 are the ones we want accessible from the master for sure
-
-
ruaok
as in a process on master makes a connection to port 9866/9864 on a worker?
-
iliekcomputers
ruaok: yes
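-
(A minimal sketch, not from the discussion, of one way to check whether a process on the master can open a TCP connection to those ports on a worker. The helper name `port_open` and the timeout are assumptions; it only tests that the port accepts connections, not that the datanode behind it is healthy.)

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, timeouts and DNS failures.
        return False
```

(Run from inside the master container, `port_open("<worker-ip>", 9866)` and `port_open("<worker-ip>", 9864)` would show whether both ports are reachable; the worker IP has to be looked up each time, since the overlay addresses change on restart.)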
-
ruaok
ok, both of those are exposed on the inside of the worker containers.
-
could you ping 10.0.0.39 ?
-
iliekcomputers
ping works
-
ruaok
ok, then we need to find which setting causes those ports to be bound on all interfaces.
-
dfs.namenode.http-bind-host ?
-
ruaok starts a build
-
iliekcomputers
-
ruaok
yay!
-
both ports are not reachable from the master node.
-
*now
-
iliekcomputers
I'm still getting connection refused from inside the master container
-
ruaok
ok, remember that hadoop-master is the master node.
-
03a472b53968 is a worker node
-
-
iliekcomputers
no data received. but it connected.
-
ruaok
probably an invalid wget command, but at least the port is open.
-
iliekcomputers
yes
-
I see that the client is trying to connect to incorrect IPs.
-
This was the thing that Leo_Verto had fixed
-
ruaok
did we lose his PR somehow?
-
iliekcomputers
core-site.xml in /usr/local/hadoop/etc/hadoop/ has the config value set correctly.
-
maybe it needs to be in hdfs-site.xml ?
-
ruaok
can't hurt to try
-
ruaok is on it
-
change applied, cluster restarted.
-
try it again, iliekcomputers
-
iliekcomputers
-
two workers errored but I think it worked
-
ruaok
I wonder if that is because we just restarted things...
-
thefar8
Hi CatQuest
-
iliekcomputers
it is weird.
-
from the logs it still seems like it can't connect to 2 of the datanodes.
-
thefar8
just got a long list of gamelan set (it's 26) from my Javanese Art teacher :)
-
ruaok
which ones?
-
remember that you can look at which ones are alive here:
-
-
iliekcomputers
can you add dfs.datanode.use.datanode.hostname to true once too?
-
ruaok
once?
-
I didn't parse what you're requesting
-
iliekcomputers
whoops
-
can you set dfs.datanode.use.datanode.hostname to true in hdfs-site.xml?
-
and then I'll try again.
-
ruaok
done
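-
(A minimal sketch, not from the discussion, of what setting that value amounts to: a `<property>` entry with `<name>` and `<value>` children inside `<configuration>` in hdfs-site.xml. The helper name `set_property` is hypothetical, and the example starts from an empty configuration rather than the real file, whose other contents are not shown here.)

```python
import xml.etree.ElementTree as ET


def set_property(config_xml: str, name: str, value: str) -> str:
    """Set `name` to `value` in a Hadoop *-site.xml document, adding the property if absent."""
    root = ET.fromstring(config_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            # Property already present: update its value in place.
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # Property not found: append a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # An empty configuration is used here purely for illustration.
    print(set_property("<configuration></configuration>",
                       "dfs.datanode.use.datanode.hostname", "true"))
```

(The cluster has to be restarted afterwards for the new value to take effect, as was done for the earlier config change.)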
-
iliekcomputers
no error this time, woo!
-
now let's check if the http interface works...
-
iliekcomputers crosses fingers