yvanzo[m]: I don't think it's that interesting to most people tbh. This post is mainly for people running mirrors right?
yvanzo[m]
Yes, and editors, probably not worth mentioning it then.
atj[m]
zas: Might be me, but I did a curl HTTPS request to the new solr-priv host and it responded with a redirect to HTTP
Can you check? I'm a bit busy today trying to tie things up before my flight tomorrow
pite joined the channel
discordbrainz
<netizen1997> I really don't know how to deal with a “connection reset” error, can someone help me please?
mayhem[m]
got more context, netizen1997?
discordbrainz
<netizen1997> Sorry for my poor English. I'm using Java with OkHttpClient, here is my code
not much of a java guy, but do you need all those options passed in the headers?
discordbrainz
<netizen1997> I'm also new to Java; I just didn't know what to do, so I added too many options to my headers.
mayhem[m]
I'd say skip everything but the User-Agent header.
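A minimal sketch of that advice in Python (netizen1997 mentions below that Python works for them): the request carries only a User-Agent header. The chat doesn't say which API was being called; the MusicBrainz `/ws/2/` URL and the `ExampleApp` identification string are assumptions for illustration.

```python
import urllib.request

# Assumption: a MusicBrainz-style identification string "AppName/version ( contact )".
# Both the app name and the contact address are placeholders.
USER_AGENT = "ExampleApp/0.1 ( someone@example.org )"


def build_request(url: str) -> urllib.request.Request:
    """Build a GET request whose only header is the User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Send the request and return the raw response body."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()


# Assumed endpoint, for illustration only; nothing is sent until fetch() is called.
req = build_request("https://musicbrainz.org/ws/2/artist/?query=test&fmt=json")
print(req.header_items())
```

Note that urllib normalises header names with `str.capitalize()`, so the stored key is `User-agent`; servers treat header names case-insensitively.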
discordbrainz
<netizen1997> I've tried that, but it failed.
mayhem[m]
anyone here with more java experience who could help?
minimal joined the channel
discordbrainz
<techmorningstar> @netizen1997 sounds like a network issue to me.
<netizen1997> But when I use Python, it works without any config... :lookdown:
<netizen1997> Java is too complex, I hate it, but I really need it. :lookdown:
<netizen1997> My code works fine when sending requests to other websites.
Jigen
also netizen: please use a pastebin for that
discordbrainz
<netizen1997> Pastebin? Does that mean I should delete my message?
lucifer[m]
netizen1997: tried it locally and it works for me, so it's likely a network issue.
rimskii: can you share the psql command you used on wolf?
discordbrainz
<netizen1997> Thank you very much, maybe I need a proxy? I'll try it later.
mayhem[m]
lucifer: the TfidfVectorizer refuses to run in parallel in threads. I think something is blocking inside scikit-learn.
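One plausible explanation, not confirmed in the chat: on a standard CPython build the GIL lets only one thread execute Python bytecode at a time, so any pure-Python part of the vectorizer (tokenising, counting) will not overlap across threads. A quick stdlib check is to divide the process CPU time by the wall time of a thread pool run, which roughly estimates how many cores were kept busy. The two workloads below are illustrative stand-ins, not the vectorizer itself.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def pure_python_work(n: int) -> int:
    """CPU-bound pure-Python loop; on a GIL build it holds the GIL while running."""
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


def hashing_work(n: int) -> int:
    """Hashes a large buffer; CPython's hashlib releases the GIL for big inputs."""
    data = b"x" * n
    return len(hashlib.sha256(data).digest())


def cores_busy(work: Callable[[int], int], n: int, tasks: int = 8, workers: int = 4) -> float:
    """Approximate cores kept busy: CPU time of the whole process / wall time."""
    cpu0, wall0 = time.process_time(), time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work, [n] * tasks))
    return (time.process_time() - cpu0) / (time.perf_counter() - wall0)


print(f"pure Python: ~{cores_busy(pure_python_work, 300_000):.1f} cores")
print(f"hashlib:     ~{cores_busy(hashing_work, 20_000_000):.1f} cores")
```

On a multi-core machine the pure-Python figure typically stays near 1 while the hashing figure can go higher; the exact numbers depend on the machine and the Python build.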
lucifer
mayhem[m]: code snippet?
mayhem[m]
I have a threaded version of the same code now, should I check that in?
zerodogg has quit
discordbrainz
<netizen1997> shit!!! It works when I set a proxy..
lucifer[m]
mayhem[m]: sure
discordbrainz
<netizen1997> Thanks again, now I know how to do my next task.
mayhem[m]
pushed
discordbrainz
<netizen1997> It's OK when I use Edge to visit the API without setting a proxy, so I assumed it would also work when my code sends a request without a proxy (in fact, it should work, right?). There's no difference between using a browser and using code, right? So, it
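A likely explanation, though not confirmed in the chat: browsers such as Edge use the operating system's proxy settings, while Java's default `ProxySelector` (which OkHttp falls back to when no proxy is set) only consults them if the `java.net.useSystemProxies` property is true, and it defaults to false. Python's `urllib.request.getproxies()` reads the `http_proxy`/`https_proxy` environment variables and, on Windows and macOS, the system settings, which would explain why Python worked without config. A minimal Python sketch of routing requests through an explicit proxy; the `127.0.0.1:7890` address is a hypothetical placeholder.

```python
import urllib.request
from typing import Optional

# Hypothetical local proxy address; replace with the proxy you actually use.
PROXY = "http://127.0.0.1:7890"


def make_opener(proxy: Optional[str]) -> urllib.request.OpenerDirector:
    """Return an opener that sends http/https traffic through `proxy`.

    With proxy=None, use whatever urllib detects from the environment
    (proxy environment variables, or system settings on Windows/macOS).
    """
    if proxy is None:
        proxies = urllib.request.getproxies()
    else:
        proxies = {"http": proxy, "https": proxy}
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


# What the standard library would pick up with no explicit configuration.
print("detected proxies:", urllib.request.getproxies())

opener = make_opener(PROXY)
# opener.open(request, timeout=10) would now go through the proxy.
```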
lucifer[m]
mayhem: do you have the logs of this when you ran it?
<lucifer[m]> "rimskii: can you share the..." <- `psql postgresql://musicbrainz:musicbrainz@localhost:5433/musicbrainz_db`
lucifer[m]
it's using multiple cores though, right?
mayhem[m]
yes
see wolf.
73% of CPU used
rimskii[m] uploaded an image: (1278KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/ElvkZYvHBJKUmsudYnyXxWGe/Screenshot%202024-06-21%20at%2020.09.30.png >
rimskii[m]
when I run
> \dt
mayhem[m]: omg maybe it's because of me
lucifer[m]
rimskii: can you run `\dn`?
mayhem[m]
rimskii: no, that is me. :)
rimskii[m] uploaded an image: (199KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/kAqHrInPNXLTORfpNSlJGHGC/Screenshot%202024-06-21%20at%2020.10.46.png >
lucifer[m]
mayhem: i have a hunch that sklearn has internal locking or something to limit the number of cores the library is allowed to use
there is a library called joblib which can be used to change that
mayhem[m]
that is something that scientists would do. rather than learn computer science, lol.
lucifer[m] uploaded an image: (75KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/dGzSSXAnZUiqJPWqTeVfaVJM/image.png >
lucifer[m]
oh wow i see what's going on
rimskii: if you are running it on wolf directly, use 5432 as the port
apparently there is another db running on wolf on port 5433.
i'll see if i can find what is running that and shut it down
rimskii[m]
oh okay
lucifer[m]
done
rimskii[m]
thank you!
lucifer[m]
try again.
rimskii[m]
its working now!
thanks
5432 working
5433 is shut down
lucifer[m]
cool. to be clear, 5433 when you run locally, 5432 when you run on wolf.
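lucifer's rule of thumb as a tiny helper. The user, password and database name come from the psql command shared earlier in the log; the `on_wolf` flag is a hypothetical parameter for illustration.

```python
def musicbrainz_db_uri(on_wolf: bool) -> str:
    """Build the psql connection URI for the MusicBrainz database.

    Per the chat: port 5432 when running directly on wolf, 5433 when
    running locally. `on_wolf` is a hypothetical flag for illustration.
    """
    port = 5432 if on_wolf else 5433
    return f"postgresql://musicbrainz:musicbrainz@localhost:{port}/musicbrainz_db"


print(musicbrainz_db_uri(on_wolf=True))
```

The resulting string can be passed straight to `psql` as its only argument.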
bitmap[m]
lucifer[m]: that was for the MB area GSoC project IIRC, so not needed
lucifer[m]
makes sense
mayhem: spark has tf-idf vectorizer too btw
mayhem[m]
good to know.
the problem with your threadpool approach is that it keeps the results in RAM until everything is processed, which makes the memory footprint much worse.
lucifer[m]
can you try adding a del future inside the loop that processes the finished futures
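A stdlib sketch of the idea behind that suggestion: handle each result as soon as its future completes and stop referencing it. One caveat: `del future` only removes the loop variable, so if the futures are also stored in a list, that list still holds every future and its result. This sketch therefore keeps at most `max_in_flight` futures alive at once; `build_index`, `save_result` and the parameter values are hypothetical stand-ins for the real index-building and persistence code.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Iterable, List


def build_index(batch: List[str]) -> dict:
    """Hypothetical stand-in for the real (expensive) per-batch index build."""
    return {"size": len(batch), "first": batch[0] if batch else None}


def run_bounded(
    batches: Iterable[List[str]],
    save_result: Callable[[dict], None],
    workers: int = 8,
    max_in_flight: int = 16,
) -> int:
    """Process batches in a thread pool, keeping only a bounded set of pending
    futures so finished results do not pile up in memory."""
    done_count = 0
    pending = set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in batches:
            pending.add(pool.submit(build_index, batch))
            if len(pending) >= max_in_flight:
                # Block until at least one future finishes, then handle those.
                finished, pending = wait(pending, return_when=FIRST_COMPLETED)
                for future in finished:
                    save_result(future.result())
                    done_count += 1
                # Completed futures are no longer in `pending`, so their results
                # can be freed once `finished` is reassigned.
        # Drain whatever is still running after the last submission.
        for future in wait(pending).done:
            save_result(future.result())
            done_count += 1
    return done_count


saved = []
print(run_bounded(([str(i)] * 3 for i in range(100)), saved.append, workers=4, max_in_flight=8))
```

Feeding `batches` from a generator matters here too: a pre-built list of all batches would keep every input in memory regardless of how the futures are handled.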
mayhem[m]
added. let me run it over the whole data and see what happens. hard to tell when it might finish though.
ha. your code was missing the thread_data = [] assignment, so that thread_data kept growing.
not having the growing thread data causes.... only one process to execute at a time.
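The fix mayhem describes, sketched with hypothetical names (`submit_in_batches`, `batch_size`): the accumulator is rebound to a fresh list after every submission so it never grows across batches, and the final partial batch is flushed at the end.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def submit_in_batches(
    items: Iterable[T],
    submit: Callable[[List[T]], None],
    batch_size: int = 1000,
) -> None:
    """Group items into batches of `batch_size` and hand each batch to `submit`."""
    thread_data: List[T] = []
    for item in items:
        thread_data.append(item)
        if len(thread_data) >= batch_size:
            submit(thread_data)
            # Rebind rather than clear(): the submitted list may still be in use
            # by a worker thread, so it must not be mutated here.
            thread_data = []
    if thread_data:
        # Flush the last, partially filled batch.
        submit(thread_data)


batches = []
submit_in_batches(range(2500), batches.append, batch_size=1000)
print([len(b) for b in batches])
```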
lucifer[m]
oh lol 😅
mayhem[m]
ok, time to take a closer look at job control.
lucifer[m]
mayhem[m]: huh that is weird
mayhem[m]
there is deffo something weird.
I don't think the job control stuff is going to work. Each of the indexes we build is tiny, so it probably never decides to use more than one thread.
d4rk-ph0enix joined the channel
d4rkie has quit
lucifer[m]
mayhem: it is using 8 cores for me, running my script with the thread_data fix
check wolf
d4rk has quit
mayhem[m]
ok, interesting. See if it finishes in a reasonable amount of time.
I need to get moving for today anyways
lucifer[m]
`free` doesn't show any major uptick in RAM usage so far
d4rkie joined the channel
mayhem[m]
yeah, it might not be as big a problem as I suspected.
but for this POC, fine.
discordbrainz
<salaxceitor> hi! how are you guys doing?
<salaxceitor> I came back after some time off and I'm trying to set up the development environment
<salaxceitor> I'm getting a blank screen with these errors in the browser:
<salaxceitor> - Failed to load resource: the server responded with a status of 404 (NOT FOUND)
- indexPage.js:1 Failed to load resource: the server responded with a status of 404 (NOT FOUND)
- close_hover.png:1 Failed to load resource: net::ERR_CONNECTION_REFUSED
<salaxceitor> no errors in the docker logs
monkey[m]
Hello salaxceitor, which project are you trying to set up?
atj: we get 302 redirects on both https://solrcloud-sir & solrcloud-privileged; that's Solr answering with this 302
it returns http because the internal connection is done over http
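The explanation above in code form: Solr only sees the internal HTTP connection, so an absolute `Location` it builds starts with `http://`. A minimal sketch of detecting that downgrade, and of rewriting it back to HTTPS the way a reverse proxy might; the rewrite is an assumed approach (the chat doesn't say how, or whether, it was changed) and `solr.example.org` is a placeholder host.

```python
from urllib.parse import urljoin, urlsplit, urlunsplit


def redirect_target(request_url: str, location: str) -> str:
    """Resolve a (possibly relative) Location header against the request URL."""
    return urljoin(request_url, location)


def is_downgrade(request_url: str, location: str) -> bool:
    """True if an HTTPS request is redirected to plain HTTP."""
    target = urlsplit(redirect_target(request_url, location))
    return urlsplit(request_url).scheme == "https" and target.scheme == "http"


def fix_location(request_url: str, location: str) -> str:
    """Rewrite an http:// redirect on the same host back to the public https:// URL.

    The public netloc (host and port) of the original request replaces the
    redirect's, so an internal port in the Location is dropped as well.
    """
    req = urlsplit(request_url)
    target = urlsplit(redirect_target(request_url, location))
    if req.scheme == "https" and target.scheme == "http" and target.hostname == req.hostname:
        return urlunsplit(("https", req.netloc, target.path, target.query, target.fragment))
    return target.geturl()


print(fix_location("https://solr.example.org/solr", "http://solr.example.org/solr/"))
```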
lucifer[m]
mayhem: I think the statement that logged timings for each batch is wrong, which is why some indexes appear to be built vastly slower than others
mayhem: no, it failed both times with ProcessPoolExecutor because nmslib indices apparently cannot be pickled. I can work on a version that saves the indices to disk and loads them in the main thread later.
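One common way per-batch timings get skewed in a pool (an assumption; the actual logging code isn't in the chat) is measuring from the moment a task is submitted, which folds queue waiting time into the number, so later batches look much slower. A sketch that times the work inside the worker and, for comparison, also reports the time since submission; `build_index` is a hypothetical stand-in that just sleeps.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import List, Tuple


def build_index(batch_id: int) -> int:
    """Hypothetical stand-in for building one index; sleeps briefly."""
    time.sleep(0.05)
    return batch_id


def timed_build(batch_id: int) -> Tuple[int, float]:
    """Run the build and measure only the time spent doing the work."""
    start = time.monotonic()
    result = build_index(batch_id)
    return result, time.monotonic() - start


def run(n_batches: int, workers: int = 2) -> List[Tuple[int, float, float]]:
    """Return (batch_id, work_seconds, seconds_since_submit) for each batch."""
    rows = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        submitted = {}
        for i in range(n_batches):
            t = time.monotonic()
            submitted[pool.submit(timed_build, i)] = t
        for future in as_completed(submitted):
            batch_id, work = future.result()
            # Measured from submit time, this also includes time spent queued.
            since_submit = time.monotonic() - submitted[future]
            rows.append((batch_id, work, since_submit))
    return rows


for batch_id, work, since_submit in sorted(run(8)):
    print(f"batch {batch_id}: work {work:.2f}s, since submit {since_submit:.2f}s")
```

With 8 batches and 2 workers, the work time stays roughly constant while the since-submit figure grows for later batches.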
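A sketch of the workaround described above: each worker process builds its index, writes it to disk, and returns only the file path (a plain string, which pickles fine); the main process loads the files afterwards. `ToyIndex` is a hypothetical stand-in for an nmslib index with JSON save/load; a real index would use its own on-disk format, which this sketch doesn't cover.

```python
import json
import os
import tempfile
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Dict, List


class ToyIndex:
    """Hypothetical stand-in for an index object that cannot be pickled."""

    def __init__(self, items: List[str]) -> None:
        self.items = sorted(items)

    def save(self, path: str) -> None:
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.items, f)

    @classmethod
    def load(cls, path: str) -> "ToyIndex":
        with open(path, encoding="utf-8") as f:
            return cls(json.load(f))


def build_and_save(batch_id: int, items: List[str], out_dir: str) -> str:
    """Runs in a worker: build the index, write it to disk, return only the path."""
    path = os.path.join(out_dir, f"index_{batch_id}.json")
    ToyIndex(items).save(path)
    return path  # a str crosses the process boundary fine, unlike the index


def build_all(batches: Dict[int, List[str]], out_dir: str,
              use_processes: bool = True) -> Dict[int, ToyIndex]:
    """Build indexes in parallel, then load them back in the main process.

    use_processes=False swaps in a thread pool, which is handy for debugging.
    """
    pool_cls = ProcessPoolExecutor if use_processes else ThreadPoolExecutor
    with pool_cls() as pool:
        futures = {bid: pool.submit(build_and_save, bid, items, out_dir)
                   for bid, items in batches.items()}
        paths = {bid: fut.result() for bid, fut in futures.items()}
    return {bid: ToyIndex.load(path) for bid, path in paths.items()}


def demo(use_processes: bool = True) -> Dict[int, List[str]]:
    """Build two tiny indexes in a temporary directory and return their contents."""
    with tempfile.TemporaryDirectory() as tmp:
        indexes = build_all({0: ["b", "a"], 1: ["d", "c"]}, tmp, use_processes)
        return {bid: idx.items for bid, idx in indexes.items()}
```

When using processes, call `demo()` from under an `if __name__ == "__main__":` guard, since platforms that spawn worker processes re-import the main module.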
mayhem[m]
Ahhh, pickling. I was wondering how it was going to do the process boundary transfer
I wonder if the vectorizer code is separable from the rest of scikit-learn so that we can run it without silly constraints
I guess this is what Boeing pilots must contend with when they fly Airbus....
lucifer[m]
lolol
mayhem[m]
But I suspect Airbus let their pilots fly on all engines if they want...
lucifer[m] sent a code block: https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/khyNUcpDWceGaqMaWvSYIoKv
lucifer[m]
ran it on a subset of the data
and i think it is doing the processing in parallel.
mayhem: yes, it is doing everything in parallel. Pressing H after opening top, I see multiple threads.