#metabrainz

      • atj[m]
        yvanzo[m]: I don't think it's that interesting to most people tbh. This post is mainly for people running mirrors right?
      • yvanzo[m]
        Yes, and editors, probably not worth mentioning it then.
      • atj[m]
        [@zas](https://matrix.to/#/@zas666:matrix.org): Might be me but I did a curl HTTPS req to the new solr-priv host and it responded with a redirect to HTTP
      • Can you check, I'm a bit busy today trying to tie things up before my flight tomorrow
      • pite joined the channel
      • discordbrainz
        <netizen1997> really don't know how to deal with “connection reset” error, can someone help me please?
      • mayhem[m]
        got more context, netizen1997?
      • discordbrainz
        <netizen1997> sorry for my poor english. i'm using java with okhttpclient, here is my code
      • <netizen1997>
        OkHttpClient client = new OkHttpClient().newBuilder()
            .callTimeout(10, TimeUnit.SECONDS)
            .connectTimeout(10, TimeUnit.SECONDS)
            .readTimeout(10, TimeUnit.SECONDS)
            .connectionPool(new ConnectionPool(10, 10, TimeUnit.SECONDS))
            .build();
        String url = "https://musicbrainz.org/ws/2/release?query=lanterns";
        // url = "https://blog.csdn.net/super_kingking/article/details/70992012";
        Request request = new Request.Builder()
            .url(url)
            .method("GET", null)
            .addHeader("User-Agent", "album manager/1.0(xiangyu97@live.cn), personal use")
            .addHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.7")
            .addHeader("Accept-Encoding", "gzip, deflate, br, zstd")
            .addHeader("Accept-Language", "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6")
            .addHeader("Accept-Language", "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6")
            .build();
        System.out.println(request.toString());
        try {
            Response response = client.newCall(request).execute();
      • <netizen1997> get
      • <netizen1997> java.lang.RuntimeException: java.net.SocketException: Connection reset
      • mayhem[m]
        not much of a java guy, but do you need all those options passed in the headers?
      • discordbrainz
        <netizen1997> i'm also a beginner at java, i just didn't know what to do, so i added too many options to my headers..
      • mayhem[m]
        I'd say skip everything but the User-Agent header.
      • discordbrainz
        <netizen1997> i have tried but failed.
      • mayhem[m]
        anyone here with more java experience who could help?
      • minimal joined the channel
      • discordbrainz
        <techmorningstar> @netizen1997 sounds like a network issue to me.
      • <netizen1997> but when i use python, it works without any config set...:lookdown:
      • <netizen1997> java is too complex, i hate it, but i really need it.:lookdown:
      • <netizen1997> my code works fine when sending a request to other websites..
      • Jigen
        also netizen: please use a paste bin for that
      • discordbrainz
        <netizen1997> `
      • <netizen1997> paste bin? does that mean i should delete my message?
      • lucifer[m]
        netizen1997: tried it locally and it works for me so likely a network issue.
      • rimskii: can you share the psql command you used on wolf?
      • discordbrainz
        <netizen1997> thank you very much, maybe i need a proxy? i will try it later
      • mayhem[m]
        lucifer: the TfidfVectorizer refuses to run in parallel in threads. I think there is something blocking inside scikit learn.
      • lucifer
        mayhem[m]: code snippet?
      • mayhem[m]
        I have a thread version of the same code now, should I check that in ?
      • zerodogg has quit
      • discordbrainz
        <netizen1997> shit!!! it works when i set proxy..
      • lucifer[m]
        mayhem[m]: sure
      • discordbrainz
        <netizen1997> thanks again, now i know how to do my next piece of work.
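
The resolution reached above (send only a User-Agent header, and route through a proxy where the network requires one) can be sketched in Python, which netizen1997 mentions also using. This is a minimal sketch under stated assumptions: the application name, contact address, and proxy URL are hypothetical placeholders, and `build_request`, `make_opener`, and `fetch` are illustrative names rather than part of any MusicBrainz client.

```python
import urllib.request

API_URL = "https://musicbrainz.org/ws/2/release?query=lanterns"
# Hypothetical application name and contact address; replace with your own.
USER_AGENT = "album-manager/1.0 ( contact@example.org )"


def build_request(url=API_URL, user_agent=USER_AGENT):
    """Create a GET request that sets only the User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def make_opener(proxy_url=None):
    """Return an opener; route traffic through proxy_url when one is given."""
    if proxy_url is None:
        return urllib.request.build_opener()
    proxies = {"http": proxy_url, "https": proxy_url}
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


def fetch(proxy_url=None, timeout=10):
    """Send the request and return the raw response body as bytes."""
    opener = make_opener(proxy_url)
    with opener.open(build_request(), timeout=timeout) as response:
        return response.read()
```

Calling `fetch()` connects directly, while `fetch("http://127.0.0.1:8080")` goes through a proxy at that (assumed) local address.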
      • mayhem[m]
        pushed
      • discordbrainz
        <netizen1997> It's ok when i use edge to visit the api without setting a proxy, so I always thought it would also work when I use my code to send a request without a proxy (in fact, it should work, right?), there is no difference between using a browser and using code, right? So, it
      • lucifer[m]
        mayhem: do you have the logs of this when you ran it?
      • mayhem[m]
      • really weird results.
      • d4rk has quit
      • d4rk joined the channel
      • lucifer[m]
        mayhem: can you try this? https://github.com/amCap1712/fast-fuzzy
      • using threadpoolexecutor
      • mayhem[m] hates threadpoolexecutor
      • oh why so?
      • mayhem[m]
        I've never worked out how to make dynamic jobs work. let me read your code.
      • ah, I see how you do it.
      • rimskii[m]
        <lucifer[m]> "rimskii: can you share the..." <- A minute
      • mayhem[m]
        your code is running lucifer. 106% CPU.
      • how the threads are being started is not the problem (nor did I suspect that to be the problem)
      • lucifer[m]
        hmm i see
      • mayhem[m]
        the vectorizer function is where it all goes bad. but I've skimmed the source for it and I can't see why it would be a problem.
      • lucifer[m]
        Can you change ThreadPoolExecutor to ProcessPoolExecutor
      • And see if that changes anything?
      • zerodogg joined the channel
      • mayhem[m]
        that does.
      • still highly variable times per chunk.
      • rimskii[m]
        <lucifer[m]> "rimskii: can you share the..." <- `psql postgresql://musicbrainz:musicbrainz@localhost:5433/musicbrainz_db`
      • lucifer[m]
        it is using multiple cores though, right?
      • mayhem[m]
        yes
      • see wolf.
      • 73% of CPU used
      • rimskii[m] uploaded an image: (1278KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/ElvkZYvHBJKUmsudYnyXxWGe/Screenshot%202024-06-21%20at%2020.09.30.png >
      • rimskii[m]
        when I run
      • > \dt
      • mayhem[m]: omg maybe it's because of me
      • lucifer[m]
        rimskii: can you run `\dn`?
      • mayhem[m]
        rimskii: no, that is me. :)
      • rimskii[m] uploaded an image: (199KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/kAqHrInPNXLTORfpNSlJGHGC/Screenshot%202024-06-21%20at%2020.10.46.png >
      • lucifer[m]
        mayhem: i have a hunch that sklearn has internal locking or something to limit the number of cores the library is allowed to use
      • there is a library called joblib which can be used to change that
      • mayhem[m]
        that is something that scientists would do. rather than learn computer science, lol.
      • lucifer[m] uploaded an image: (75KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/dGzSSXAnZUiqJPWqTeVfaVJM/image.png >
      • lucifer[m]
        oh wow i see what's going on
      • rimskii: if you are running it on wolf directly, use 5432 as the port
      • apparently there is another db running on wolf on port 5433.
      • i'll see if i can find what is running that and shut it down
      • rimskii[m]
        oh okay
      • lucifer[m]
        done
      • rimskii[m]
        thank you!
      • lucifer[m]
        try again.
      • rimskii[m]
        its working now!
      • thanks
      • 5432 working
      • 5433 is shut down
      • lucifer[m]
        cool. to be clear, 5433 when you run locally, 5432 when you run on wolf.
      • bitmap[m]
        lucifer[m]: that was for the MB area GSoC project IIRC, so not needed
      • lucifer[m]
        makes sense
      • mayhem: spark has tf-idf vectorizer too btw
      • mayhem[m]
        good to know.
      • the problem with your threadpool approach is that it saves the results in ram until everything is processed. which makes the memory footprint much worse.
      • lucifer[m]
        can you try adding a del future inside the loop that processes the finished futures
      • mayhem[m]
        added. let me run it over the whole data and see what happens. hard to tell when it might finish though.
      • lucifer[m]
        or futures.remove(future) maybe
      • mayhem[m]: sounds good
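
One way to keep finished results from piling up in RAM, along the lines of the `del future` and `futures.remove(future)` suggestions above, is to cap the number of in-flight futures and hand each result off as soon as it completes. This is a minimal sketch, not the code being discussed: `process_chunks`, `work`, `handle_result`, and `max_pending` are hypothetical names.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def _drain(pending, handle_result):
    """Block until at least one future finishes, consume it, return the rest."""
    done, still_pending = wait(pending, return_when=FIRST_COMPLETED)
    for future in done:
        # Once handled, nothing keeps a reference to the future or its result.
        handle_result(future.result())
    return still_pending


def process_chunks(chunks, work, handle_result, max_workers=8, max_pending=16):
    """Run work(chunk) on a thread pool, holding at most max_pending futures."""
    pending = set()
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        for chunk in chunks:
            pending.add(executor.submit(work, chunk))
            if len(pending) >= max_pending:
                pending = _drain(pending, handle_result)
        # Consume whatever is still running after the last submit.
        while pending:
            pending = _drain(pending, handle_result)
```

Results arrive in completion order, so `handle_result` should not rely on the order of `chunks`.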
      • BrainzGit
        [musicbrainz-server] mwiencek merged pull request #3294 (beta…eaa-type-pot): Add `event_art_archive.art_type` to extract_pot_db https://github.com/metabrainz/musicbrainz-serve...
      • minimal has quit
      • mayhem[m]
        ha. your code was missing the thread_data = [] assignment, so that thread_data kept growing.
      • not having the growing thread data causes.... only one process to execute at a time.
      • lucifer[m]
        oh lol 😅
      • mayhem[m]
        ok, time to take a closer look at job control.
      • lucifer[m]
        mayhem[m]: huh that is weird
      • mayhem[m]
        there is deffo something weird.
      • I don't think the job control stuff is going to work. each of the indexes we build is tiny. so it probably never decides to use more than one thread.
      • d4rk-ph0enix joined the channel
      • d4rkie has quit
      • lucifer[m]
        mayhem: it is using 8 cores for me, running my script with the thread_data fix
      • check wolf
      • d4rk has quit
      • mayhem[m]
        ok, interesting. see if it finishes in reasonable amount of time.
      • I need to get moving for today anyways
      • lucifer[m]
        `free` doesn't seem to show any major uptick in RAM usage so far
      • d4rkie joined the channel
      • mayhem[m]
        yeah, it might not be as big a problem as I suspect.
      • but for this POC, fine.
      • discordbrainz
        <salaxceitor> hi! how are you guys doing
      • <salaxceitor> came back after some time off and I'm trying to set up the development environment
      • <salaxceitor> i am getting a blank screen with these errors in the browser
      • <salaxceitor> - Failed to load resource: the server responded with a status of 404 (NOT FOUND) indexPage.js:1 Failed to load resource: the server responded with a status of 404 (NOT FOUND) close_hover.png:1 Failed to load resource: net::ERR_CONNECTION_REFUSED
      • <salaxceitor> no errors in the docker logs
      • monkey[m]
        Hello salaxceitor, which project are you trying to set up?
      • BrainzGit
        [musicbrainz-server] mwiencek opened pull request #3296 (master…mbs-13615): MBS-13615: Skip same-entity relationships when batch-adding https://github.com/metabrainz/musicbrainz-serve...
      • zas[m]
        atj: we get 302 redirects on both https://solrcloud-sir & solrcloud-privileged, that's solr answering with this 302
      • it returns http because the internal connection is done over http
      • lucifer[m]
        mayhem: the statement that logged timings for each batch is wrong i think, due to which it appears that some indexes are built vastly slower than others
      • zas[m]
        atj: I think we need something as described in this thread: https://www.eclipse.org/lists/jetty-users/msg10...
      • mayhem[m]
        Has it finished building the whole index ever?
      • lucifer[m]
      • don't divide by recording_data
      • mayhem: no, it failed both times with ProcessPoolExecutor because nmslib indices cannot be pickled apparently. I can work on a version that saves the indices to disk and loads them in the main thread later.
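
The workaround described above (workers write each index to disk, and the parent loads it afterwards) can be sketched as follows. This is a minimal sketch under stated assumptions: `build_index` is a hypothetical stand-in that returns a plain dict saved as JSON, whereas the real code would use nmslib's own save and load routines, and `build_and_save` and `build_all` are illustrative names. Only the file path, which pickles without trouble, crosses the process boundary.

```python
import json
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def build_index(chunk):
    """Hypothetical stand-in for building a per-chunk search index."""
    return {"size": len(chunk), "items": sorted(chunk)}


def build_and_save(job):
    """Worker: build one index, write it to disk, return only the file path."""
    chunk_id, chunk, out_dir = job
    path = Path(out_dir) / f"index_{chunk_id}.json"
    path.write_text(json.dumps(build_index(chunk)))
    return str(path)


def build_all(chunks, out_dir, executor_cls=ProcessPoolExecutor):
    """Build every index in a worker pool, then load them in the parent."""
    jobs = [(i, chunk, out_dir) for i, chunk in enumerate(chunks)]
    with executor_cls() as executor:
        paths = list(executor.map(build_and_save, jobs))
    # Loading happens in the parent, so no index object is ever pickled.
    return [json.loads(Path(p).read_text()) for p in paths]
```

When run as a script with the default `ProcessPoolExecutor`, the call to `build_all` belongs under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.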
      • mayhem[m]
        Ahhh, pickling. I was wondering how it was going to do the process boundary transfer
      • I wonder if the vectorizer code is separable from the rest of scikit learn so that we can run without silly constraints
      • I guess this is what Boeing pilots must contend with when they fly Airbus....
      • lucifer[m]
        lolol
      • mayhem[m]
        But I suspect Airbus let their pilots fly on all engines if they want...
      • lucifer[m] sent a code block: https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/khyNUcpDWceGaqMaWvSYIoKv
      • lucifer[m]
        ran it on a subset of the data
      • and i think it is doing the processing in parallel.
      • mayhem: yes, it is doing everything in parallel. pressing H after opening top i see multiple threads.
      • mayhem[m]
        Threads not procs? How did you accomplish that?
      • lucifer[m]
        in top?
      • mayhem[m]
        No, in the script
      • lucifer[m]
        ah okay that, threadpoolexecutor
      • mayhem[m]
        In top threads are shown as procs