both search servers stopped answering at the same time, at 16:27 UTC
stanislas
Freso: i think i finally solved the issue about installing my plugin
Freso: and i updated it so you might take a look
Freso: restarting calibre (but not shutting it down with ctrl-c) helps
ruaok wishes the color between left and right were the same
ruaok
zas: the CLOSE_WAIT... I still can't decide if that is a cause or a symptom.
zas
it is a symptom
regagain_ joined the channel
ruaok
yeah, I think so too
zas
the search server still accepts connections while the answering threads are blocked or something
ruaok nods
looks at established
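To check whether CLOSE_WAIT sockets pile up on a search server while ESTABLISHED ones stay flat, a small Python sketch like this could tally sockets per TCP state from Linux's /proc/net/tcp. It assumes a Linux host and IPv4 sockets only (IPv6 sockets are listed in /proc/net/tcp6); `count_tcp_states` is a hypothetical helper, and the search server's real listening port is not given in the log.

```python
from collections import Counter

# Hex state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED",
    "02": "SYN_SENT",
    "03": "SYN_RECV",
    "04": "FIN_WAIT1",
    "05": "FIN_WAIT2",
    "06": "TIME_WAIT",
    "07": "CLOSE",
    "08": "CLOSE_WAIT",
    "09": "LAST_ACK",
    "0A": "LISTEN",
    "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text, local_port=None):
    """Count sockets per TCP state, optionally only those on local_port.

    local_port is deployment-specific (assumption: pass the search
    server's listening port); None counts every IPv4 socket.
    """
    counts = Counter()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        port = int(local_address.split(":")[1], 16)  # port is hex-encoded
        if local_port is not None and port != local_port:
            continue
        counts[TCP_STATES.get(state.upper(), state)] += 1
    return counts
```

Usage: `count_tcp_states(open("/proc/net/tcp").read(), local_port=PORT)`, sampled periodically while a server is healthy and again when it stops answering.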
ruaok
do you have the stack trace for when things are borked?
zas
i have one
ruaok
I think we should continue on the path of creating the google doc that we started last week.
zas
on ernie in /home/zas/
ruaok
update with everything that has changed -- now we're able to really ask for help since we're not using Fred Flintstone's tools anymore.
zas
yes
ruaok
the stacktrace is much more varied this time.
last time they were all stuck in icu code, now a lot are stuck throwing an exception.
ok, the plot is thickening
a lot of threads are blocked in writing network IO.
which suggests a gateway (related) issue, not a search issue.
yeeeargh joined the channel
zas: around the time of search server crashes, have you looked at syslog on the active gateway?
ruaok sees nothing of interest.
so, if I read the stackdump correctly, it looks like it is dying trying to write the results to the caller.
which is nginx
zas: I think we may want to examine our nginx setup and see if we're running out of ... something.
dpmittal has left the channel
zas: ping me when you're back please.
opatel99 joined the channel
opatel99
Mineo: I have to admit, I am stumped...
Mineo
if you tell me why, maybe we can fix that :)
opatel99
ELI5, what should I do? The threading seems straightforward, but the first portion of your comments was cryptic to me. Should albums with MBIDs be clustered?
Mineo
imho only if the option to ignore mbids is true
the automatic clustering would be most useful for files that are not associated with anything in MB yet
if there are already MBIDs in the files, the MBIDs are much better information than can be provided by the clustering
opatel99
Ok... what if there is a combination?
Mineo
of options?
opatel99
of files with MBID and no MBID. Should I cluster the ones with no MBID and leave the MBIDs alone?
Mineo
ah, I had not actually thought of that yet
opatel99
:o
Mineo
in that case, I think it would make sense to have an additional method on the tagger object like 'cluster_non_mbid_files' or something that goes through all unmatched files and collects the ones without MBIDs and clusters those
tl;dr: yes
opatel99
Okay. Now what about that in combination with ignore MBIDs? If that option is selected, do everything?
Mineo
yes, just cluster all files in that case
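The rule agreed on above can be sketched in Python. This is not Picard's code: `AudioFile`, `MBID_TAGS`, `has_mbid`, and `select_files_for_clustering` are hypothetical names, and it assumes a file counts as "having an MBID" if it carries a release or recording MBID tag (assumed here to be `musicbrainz_albumid` / `musicbrainz_recordingid`).

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins; Picard's real File/Tagger classes look different.
# Assumption: a file "has an MBID" if it carries either of these tags.
MBID_TAGS = ("musicbrainz_albumid", "musicbrainz_recordingid")


@dataclass
class AudioFile:
    path: str
    metadata: dict = field(default_factory=dict)


def has_mbid(f):
    return any(f.metadata.get(tag) for tag in MBID_TAGS)


def select_files_for_clustering(unmatched_files, ignore_mbids):
    """Decide which unmatched files the automatic clustering should handle.

    ignore_mbids=True  -> cluster every file.
    ignore_mbids=False -> cluster only files without MBIDs; files that have
                          MBIDs are left alone, since the MBIDs are better
                          information than clustering can provide.
    """
    if ignore_mbids:
        return list(unmatched_files)
    return [f for f in unmatched_files if not has_mbid(f)]
```

Something like `cluster_non_mbid_files` on the tagger would then pass this selection to the existing clustering code.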
opatel99
Cool. Giving it a shot.
Got any more Picard tasks up your sleeve btw? I am kinda useless here without Picard...
typhoe
Hello again, when trying to import dumps for the first time with the command "./admin/InitDb.pl -- --createdb --import /tmp/dumps/mbdump*.tar.bz2 --echo", I get an error "psql: FATAL: role "musicbrainz" does not exist"
zas
ruaok: i checked the nginx conf with bitmap, and we saw nothing wrong with it (that doesn't mean nothing is wrong)
typhoe
Should I create a role or create a clean db (--createdb --clean) before?
ruaok
understood.
there is an admin interface that we can get current stats from, yes?
Mineo
opatel99: I was thinking of making a task to improve/rewrite https://picard.musicbrainz.org/docs/scripting/ because a lot of people struggle with scripting, but I'm not yet sure what exactly needs to be improved
ruaok
I wonder if we should graph the number of buffers, number of connections, anything for the search* configurations.
this latest stacktrace really suggests that this is an internal configuration issue and not a lucene/java issue.
I'd love to get your read on the current state of things.
akirom has quit
stanislas
LordSputnik, Leftmost: I've done my second plugin. I would be grateful if you would review my work. I've not submitted it on gci yet, I just want to know your opinion at this stage. Link to my repo : https://github.com/stasszczesniak/CalibreBookBr...
ruaok
zas: this doesn't seem detailed enough for my desires, but let's start graphing:
opatel99: You could expand your horizons! I'm about to add two more CB tasks, and there's a bunch of unclaimed beets tasks too, as I mentioned previously.
stanislas: I won't be able to try it out until tomorrow, but one thing you could do to improve would be to split out the bits of code that initialize the UI into separate functions, so the large methods you have at the moment become smaller and easier to maintain
Freso
^ +1 (even if I haven't actually looked at the code :))
zas
ruaok: this has been collected for some time already, where possible (i.e. where the module is enabled)
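If the stats zas mentions come from nginx's stub_status module (an assumption; the log only says "module enabled"), its plain-text page can be parsed into numbers worth graphing next to the search outages. Per the nginx documentation, `handled` stays equal to `accepts` unless a resource limit such as `worker_connections` is reached, so their difference is one cheap "running out of something" signal. `parse_stub_status` is a hypothetical helper.

```python
import re

# Assumption: the stats come from nginx's stub_status module, whose page
# looks like:
#   Active connections: 291
#   server accepts handled requests
#    16630948 16630948 31070465
#   Reading: 6 Writing: 179 Waiting: 106


def parse_stub_status(text):
    """Turn an nginx stub_status page into a dict of integers."""
    active = re.search(r"Active connections:[ \t]*(\d+)", text)
    totals = re.search(r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.M)
    states = re.search(
        r"Reading:[ \t]*(\d+)[ \t]+Writing:[ \t]*(\d+)[ \t]+Waiting:[ \t]*(\d+)", text
    )
    if not (active and totals and states):
        raise ValueError("unrecognised stub_status output")
    accepts, handled, requests = map(int, totals.groups())
    reading, writing, waiting = map(int, states.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        # Cumulative since nginx started; graph its rate of change.
        "dropped": accepts - handled,
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }
```

Given the stack dumps, `writing` is probably the most interesting series to watch around the time the search servers stop answering.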
Mineo
regarding the stackdump: I wonder why a lot of threads are in some EOFException, all coming from eclipse-persistence's JSONWriterRecord
stanislas
LordSputnik: Ok, I will try to clean up my code. Thanks. Maybe Leftmost is willing to review it today.
LordSputnik
stanislas: hopefully! I'm willing, but university deadlines mean that I've not got the time this evening :(
opatel99
I have exams this entire week... Gonna be so behind once I am done..
stanislas
LordSputnik: Oh I understand, I have a geometry exam tomorrow :)
reosarevok
stanislas: less coding, more studying! ;)
regagain_ has quit
LordSputnik
stanislas: I could add a day onto the task if you like, just in case we don't have it wrapped up by tomorrow evening?
Mineo
regarding the bb plugin for calibre: you're aware that you're working around Qt's event model by using urllib, right?
ruaok
Mineo: yes, exactly that.
stanislas
LordSputnik: seems like a good idea
Mineo: What do you mean ?
Freso
stanislas: I'll give it a whirl. :)
ruaok
JSONWriterRecord sounds like the bog-standard "send the response to the caller" step, and it gets stuck somehow.
and that somehow would be caused by nginx, since that is what is on the other end.
thus leading me to think we need to examine our nginx config.
stanislas
Mineo: No, I don't.
ruaok
Mineo: does that line of thinking make sense to you?
stanislas
Mineo: I don't even understand what you mean by "working around Qt's event model by using urllib"
Mineo
ruaok: what's surprising to me is that none of the EOFExceptions are related to xml responses getting written, although I suspect the number of those to be way higher than the json ones
stanislas
Mineo: Are you talking about my plugin ?
Mineo
oh, wait, the website uses json as well, right?
ruaok
yes.
IIRC
still, that is an interesting observation.
Mineo
stanislas: sorry, I didn't want to try having two conversations at once :-)
stanislas
Mineo; ok
Mineo
stanislas: calibre seems to be built on Qt, which models everything i/o-related (reading files, sending data over the network etc.) as events with callbacks attached to them
this allows it to do other things while i/o is happening in the background without having to spawn a new thread for every action
by using urllib to request data from bookbrainz, everything else is blocked while the http request is in progress
this works if bookbrainz is responding fast, but doesn't work quite so well if the bookbrainz servers take a long time to respond or are completely offline
stanislas
Mineo: Would doing all HTTPS requests in another thread solve the problem?
Mineo
yes and no :P I would expect there to be some helper methods for plugins in calibre
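A generic Python sketch of the "do it in another thread" answer: the blocking urllib call runs on a worker thread and the caller gets a callback instead of waiting. It is not calibre's API; as Mineo says, calibre may already provide helpers for plugins, which would be preferable. `fetch_json` and `run_in_background` are hypothetical names, and the callback does not run on the GUI thread, so in a Qt application the result would still have to be handed back to it (for example through a signal) before any widget is touched.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical helpers, not calibre's plugin API: a small shared pool so a
# slow or offline server ties up a worker thread instead of the GUI thread.
_executor = ThreadPoolExecutor(max_workers=4)


def fetch_json(url, timeout=10):
    """Blocking HTTP GET + JSON decode; only call this off the GUI thread.

    The timeout bounds how long a worker waits on an unresponsive server.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_in_background(fn, on_done, *args):
    """Submit fn(*args) to the pool and return immediately.

    on_done(future) is called when fn finishes; future.result() returns the
    value or re-raises fn's exception. It usually runs on the worker thread
    (or on the caller's thread if fn has already finished), so a Qt app must
    forward the result to the GUI thread before touching widgets.
    """
    future = _executor.submit(fn, *args)
    future.add_done_callback(on_done)
    return future
```

Usage would look like `run_in_background(fetch_json, on_result, url)`, where `on_result` and `url` belong to the plugin.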
opatel99
Mineo: Can you explain why the upload for many files fails with auto cluster, but not without?
regagain_ joined the channel
bitmap
ruaok: we've been getting ISEs for cut-off JSON response from the search server since at least 2013, probably longer
Mineo
opatel99: no, I don't really know why that happens, but I think the clustering engine is not meant to be called from multiple threads
ruaok
bitmap: that is interesting.
opatel99
So QSemaphore reserves threads?
ruaok
how frequent are those?
I wonder if they happen due to index rotation or if they happen when the servers choke
can we graph the occurrence of those, bitmap, zas?
bitmap
we usually get a couple/few a day I think
Mineo
opatel99: no, it allows you to count the number of active threads (which you can't just do by incrementing a normal variable in multiple threads)
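A Python analogue of Mineo's point, using `threading.Lock` and `threading.BoundedSemaphore` rather than Qt's QSemaphore: a plain `active += 1` shared by several threads is a read-modify-write that can lose updates, so the counter below is guarded by a lock, and a bounded semaphore caps how many jobs run at once. `ActiveCounter` and `run_workers` are hypothetical names. If the clustering engine is indeed not thread-safe, calls into it would additionally need to be serialized (e.g. `max_concurrent=1`).

```python
import threading


class ActiveCounter:
    """Thread-safe count of workers currently running.

    A bare `active += 1` is a read-modify-write, so two threads can read the
    same old value and one increment is lost; the lock makes it atomic.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest concurrency observed

    def __enter__(self):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self._active -= 1
        return False

    @property
    def active(self):
        with self._lock:
            return self._active


def run_workers(jobs, max_concurrent):
    """Run each job on its own thread, at most max_concurrent at a time."""
    counter = ActiveCounter()
    gate = threading.BoundedSemaphore(max_concurrent)

    def worker(job):
        with gate, counter:  # wait for a slot, then count ourselves active
            job()

    threads = [threading.Thread(target=worker, args=(j,)) for j in jobs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```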
bitmap
they include the JSON and show where it gets cut off (then mbserver fails to parse it, hence the ISE)
ruaok
ok, a few a day really seems to be related to the index rotation.
not too much we can do about that with the current setup
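To graph the cut-off JSON responses bitmap describes and check them against the index-rotation schedule, the failures could be bucketed by hour. A minimal Python sketch, assuming the response bodies and timestamps can be pulled out of the ISE reports as (datetime, body) pairs; `fails_to_parse` and `failures_per_hour` are hypothetical names. It counts any body that is not valid JSON, which covers truncation but is not limited to it.

```python
import json
from collections import Counter


def fails_to_parse(body):
    """True if a search response body is not valid JSON (a cut-off
    response, as in the ISEs, is one way this happens)."""
    try:
        json.loads(body)
    except json.JSONDecodeError:
        return True
    return False


def failures_per_hour(responses):
    """responses: iterable of (datetime, body) pairs.

    Assumption: these are extracted from the ISE reports. Returns a Counter
    keyed by the start of the hour in which each failure happened.
    """
    counts = Counter()
    for ts, body in responses:
        if fails_to_parse(body):
            counts[ts.replace(minute=0, second=0, microsecond=0)] += 1
    return counts
```

If the busy hours line up with index rotation, that supports ruaok's reading; if they line up with the outages instead, the servers choking is the more likely cause.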