in #metabrainz

0:20 AM
MajorLurker joined the channel
0:21 AM
lorenzuru has quit
0:23 AM
lorenzuru joined the channel
0:25 AM
MajorLurker has quit
0:54 AM
MajorLurker joined the channel
0:58 AM
MajorLurker has quit
1:02 AM
d4rkie has quit
1:03 AM
Nyanko-sensei joined the channel
3:57 AM
MajorLurker joined the channel
4:01 AM
MajorLurker has quit
5:21 AM
MajorLurker joined the channel
5:25 AM
MajorLurker has quit
5:27 AM
adk0971 joined the channel
7:17 AM
roger_that joined the channel
7:24 AM
roger_that

Hi yvanzo.... sorry I got disconnect from the chat and didn't see your message. I did some digging and I think the amqp trigger retries after a failure, so I think it's just a warning, but I had some other questions about the indexer because it does seem to be taking a long time for the live indexing. I followed the thread on github, I seem to be
7:24 AM
seeing a lot of "Index limit exceeded. Entity: recording, Total Rows: 5038988" but is it ok to bump that number up in my indexer.ini? What happens after the index limit is exceeded? It seems like activity in the indexer stops for about 10-15 minutes then processes more messages until it hits the index limit again. Just wanted to figure out what was
7:24 AM
going on so that maybe I could debug it. Sorry for the long message! Thanks!
7:56 AM
roger_that has quit
9:05 AM
Lotheric has quit
9:10 AM
adk0971 has quit
9:22 AM
iliekcomputers

good morning!
9:22 AM
MajorLurker joined the channel
9:26 AM
MajorLurker has quit
9:34 AM
Lotheric joined the channel
9:43 AM
roger_that joined the channel
9:54 AM
dseomn_ joined the channel
9:55 AM
ZaphodBeeblebrox joined the channel
9:55 AM
ZaphodBeeblebrox has quit
9:55 AM
ZaphodBeeblebrox joined the channel
9:59 AM
navap1 joined the channel
10:00 AM
spuniun- joined the channel
10:01 AM
Zhele_ joined the channel
10:01 AM
ijc_ joined the channel
10:02 AM
kloeri_ joined the channel
10:02 AM
yvanzo

roger_that: It limits the number of rows queried from PostgreSQL at once. If exceeded, it aborts processing the current messages and requeues them as failed.
10:03 AM
spuniun has quit
10:03 AM
dseomn has quit
10:03 AM
CatQuest has quit
10:03 AM
Zhele has quit
10:03 AM
ijc has quit
10:03 AM
kloeri has quit
10:03 AM
navap has quit
10:03 AM
dseomn_ is now known as dseomn
10:04 AM
It is ok to bump that number up as long as it fits allocated resources.
10:06 AM
BrainzGit has quit
10:08 AM
BrainzGit joined the channel
10:08 AM
Etua joined the channel
10:17 AM
adk0971 joined the channel
10:17 AM
Etua has quit
10:18 AM
ZaphodBeeblebrox is now known as CatQuest
10:26 AM
Mr_Monkey

Top of the time-of-day to you!
10:28 AM
ruaok

moin!
10:35 AM
adk0971 has quit
10:35 AM
travis-ci joined the channel
10:35 AM
travis-ci

Project bookbrainz-site build #3666: passed in 4 min 28 sec: https://travis-ci.org/bookbrainz/bookbrainz-sit...
10:35 AM
travis-ci has left the channel
10:50 AM
roger_that20 joined the channel
10:51 AM
roger_that has quit
10:52 AM
roger_that20

yvano: So if those messages fail, my search indexer can never be up to date?
10:52 AM
I have 8 cpus, 32gb ram... is it not enough to run the live indexing?
10:55 AM
And also... why do the failed messages cause such a big delay in processing the next messages? (Although the search server seems to be processing stuff - there's just nothing in the indexer logs for about 10-15 minutes after the failed query)
10:55 AM
I'm guessing the big queries use a lot of ram?
11:03 AM
Lotheric has quit
11:06 AM
Lotheric joined the channel
11:11 AM
BrainzGit

[musicbrainz-server] reosarevok opened pull request #1934 (master…MBS-11407): MBS-11407: Make Controller->error use React errors https://github.com/metabrainz/musicbrainz-serve...
11:15 AM
yvanzo

roger_that20: exactly, these queries require a lot of RAM and take a lot of time to be processed by PostgreSQL, so better reject them to allow for other queries to go through.
11:16 AM
It does not use the full power of your CPUs/RAM because each component (indexer, solr, postgres) are limited for safety.
11:17 AM
roger_that20

I see
11:17 AM
yvanzo

The tricky part is to correctly allocate a balanced amount of resources to each component so it works smoothly.
11:17 AM
roger_that20

What happens after they fail? Like I'm trying to understand what happens between the failed query and the next batch of messages that get processed
11:17 AM
yvanzo

Rejected messages are not lost, they are in a failed queue that can be processed later on.
11:18 AM
roger_that20

Oh ok
11:18 AM
kloeri_ is now known as kloeri
11:18 AM
And if I restart the containers.... the messages are saved?
11:18 AM
Like if I wanted to change the indexer_limit
11:18 AM
yvanzo

Yes, they are in the *_mqdata volume
11:18 AM
adk0971 joined the channel
11:20 AM
roger_that20

I noticed though, in that github thread... that JoshDi set his index limit to 1500000
11:20 AM
So i wonder how those messages were getting processed
11:21 AM
yvanzo

It’s not complete but it gives an overview at least: https://sir.readthedocs.io/en/stable/import.html
11:22 AM
roger_that20

Oh great
11:22 AM
I'll give that a read
11:31 AM
The docker version uses sir 1.0.2 though I believe?
11:31 AM
What do you guys have set for your indexer_limit?
11:34 AM
_lucifer

ruaok: is the the config file used for lb spark same as that for spark?
11:34 AM
*as that for lb webserver
11:37 AM
roger_that20 has quit
11:47 AM
ruaok

no those are different. one is in listenbrainz, the other in listenbrainz_spark
11:48 AM
_lucifer

yes, i understand that. i am asking about the file used to populate the configuration values in production.
11:48 AM
BrainzGit

[musicbrainz-server] reosarevok opened pull request #1935 (master…MBS-10012): MBS-10012: Treat http and https version of link as same for adding https://github.com/metabrainz/musicbrainz-serve...
11:49 AM
_lucifer

like for lb webserver there is docker-server-configs/consul/LB/config.prod.json. is this file used for configuration of spark as well?
11:49 AM
ruaok

the lb web server config.py is generated by consul.
11:50 AM
the spark config file is manually generated the current configuration.
11:50 AM
in /home/request_consumer/listenbrainz-server
11:50 AM
on leader.
11:50 AM
_lucifer

oh ok. thanks!
11:50 AM
ruaok

np
11:51 AM
ruaok sighs at the bank needing a "wet" signature.
11:51 AM
_lucifer

should we add that config to git also?
11:51 AM
ruaok busts out gimp and the photocopy filter
11:51 AM
somewhere in docker-server-configs for future reference
11:51 AM
ruaok

configs should not really ever go into git.
11:51 AM
oh, I see.
11:51 AM
reosarevok

ruaok: would it be *too* silly to drop some water on the signature area of the paper before scanning? :p
11:51 AM
ruaok

yes, that isn't a bad idea.
11:52 AM
_lucifer

great, i'll open a pr
11:52 AM
ruaok

reosarevok: I don't intend to scan anything. this is why the photocopy filter in gimp was invented.
11:52 AM
reosarevok

Oh
11:52 AM
I see :D
11:52 AM
ruaok

should I be cheecky and add a coffee stain filter too?
11:54 AM
reosarevok

A "written-in-pen" style "fuck stupid banks" on size 4 on the margins? :p
11:54 AM
ruaok

oh that will help for sure.
11:56 AM
reosarevok

bitmap, yvanzo: curious about your opinion on https://tickets.metabrainz.org/browse/MBS-10004 ?
11:56 AM
BrainzBot

MBS-10004: inconsistency between JSON-LD and documentation
11:56 AM
reosarevok

They kinda have a point that JSON-LD URIs probably should not be https?
11:57 AM
yvanzo

makes sense
11:58 AM
reosarevok

bitmap: do you remember if the https there was intentional?
11:59 AM
yvanzo

This is the same reason we use http for VIAF URLs.
12:01 PM
roger_that20: The only difference between 1.x and 2.x is about indexing recording’s first release date (in 2.x only).
12:03 PM
1.x is a bit behind 2.x for a few other points but that it will be catching up with them.
12:04 PM
reosarevok wonders about https://tickets.metabrainz.org/browse/MBS-9997 - seems that solved "itself", which usually means we fixed something in the meantime :p
12:05 PM
reosarevok

Related to the other -LD ticket is https://tickets.metabrainz.org/browse/MBS-9987 - also about main url vs identifier
12:05 PM
BrainzBot

MBS-9987: JSON-LD: Use "Concept URI" for Wikidata IRIs in the sameAs relation
12:06 PM
reosarevok

Kinda wish the two would be the same but :D
12:06 PM
My only worry is whether changing either of these will cause any issues with data users because the same value will be seen as changed or something
12:12 PM
adk0971 has quit
12:33 PM
ruaok

51M recordings in messybrainz. O_O
12:33 PM
that will take a moment or two to lookup.
12:44 PM
reosarevok

It does sound messy
12:46 PM
ruaok

I think I can bring the number down to a more reasonable amount.
12:55 PM
sumedh joined the channel
13:13 PM
BrainzGit

[musicbrainz-server] reosarevok opened pull request #1936 (master…MBS-11408): MBS-11408: Clarify Edit Note Author edit search options https://github.com/metabrainz/musicbrainz-serve...
13:13 PM
sumedh has quit
13:24 PM
MajorLurker joined the channel
13:29 PM
MajorLurker has quit
13:52 PM
sumedh joined the channel
14:02 PM
_lucifer

ruaok: iliekcomputers: i was looking in sentry logs and saw that there are over 11m spotify errors related to same user over the past few months. any ideas?
14:05 PM
ruaok

yeah, it been on my list to do. there are some simple things, some more complicated.
14:06 PM
_lucifer

i see that spotipy hides the actual error message.
14:06 PM
ruaok

I was trying to do weekly sentry cleanups for while, but got distracted by other things.
14:06 PM
_lucifer

i can help :)
14:06 PM
ruaok

the 400 errors are harder to clean up -- I think some users may have deleted their spotify account, but not turned off LB listen saving.
14:07 PM
I think once we get a 400 error enough times, we need to disable that account. we used to have some logic like that, but it ended up disabled 2/3 of the accounts. ;(
14:07 PM
please do help!3
14:09 PM
_lucifer

makes sense
14:09 PM
the 400 message might actually have some useful info.
14:10 PM
i'll try to reproduce this error locally and see if we can come up with a better solution
14:11 PM
ruaok

kewl.
14:12 PM
_lucifer: alastairp I'm running a "not nice" query on bono. if it starts smoking and burning, you know whom to blame.
14:12 PM
alastairp

ok