#metabrainz

      • MajorLurker joined the channel
      • lorenzuru has quit
      • lorenzuru joined the channel
      • MajorLurker has quit
      • MajorLurker joined the channel
      • MajorLurker has quit
      • d4rkie has quit
      • Nyanko-sensei joined the channel
      • MajorLurker joined the channel
      • MajorLurker has quit
      • MajorLurker joined the channel
      • MajorLurker has quit
      • adk0971 joined the channel
      • roger_that joined the channel
      • roger_that
        Hi yvanzo.... sorry I got disconnected from the chat and didn't see your message. I did some digging and I think the amqp trigger retries after a failure, so I think it's just a warning, but I had some other questions about the indexer because it does seem to be taking a long time for the live indexing. I followed the thread on github, I seem to be
      • seeing a lot of "Index limit exceeded. Entity: recording, Total Rows: 5038988" but is it ok to bump that number up in my indexer.ini? What happens after the index limit is exceeded? It seems like activity in the indexer stops for about 10-15 minutes then processes more messages until it hits the index limit again. Just wanted to figure out what was
      • going on so that maybe I could debug it. Sorry for the long message! Thanks!
      • roger_that has quit
      • Lotheric has quit
      • adk0971 has quit
      • iliekcomputers
        good morning!
      • MajorLurker joined the channel
      • MajorLurker has quit
      • Lotheric joined the channel
      • roger_that joined the channel
      • dseomn_ joined the channel
      • ZaphodBeeblebrox joined the channel
      • ZaphodBeeblebrox has quit
      • ZaphodBeeblebrox joined the channel
      • navap1 joined the channel
      • spuniun- joined the channel
      • Zhele_ joined the channel
      • ijc_ joined the channel
      • kloeri_ joined the channel
      • yvanzo
        roger_that: It limits the number of rows queried from PostgreSQL at once. If exceeded, it aborts processing the current messages and requeues them as failed.
      • spuniun has quit
      • dseomn has quit
      • CatQuest has quit
      • Zhele has quit
      • ijc has quit
      • kloeri has quit
      • navap has quit
      • dseomn_ is now known as dseomn
      • It is ok to bump that number up as long as it fits allocated resources.
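A minimal sketch of the limit check described above, assuming the setting lives in a `[sir]` section under a key named `index_limit` (both names, and the value 40000, are assumptions for illustration; check your own indexer.ini for the real ones). The row count is the one from the quoted log line.

```python
import configparser

# Hypothetical excerpt of an indexer.ini. The section name, key name and
# value are assumptions made for this example, not taken from the chat.
EXAMPLE_INI = """
[sir]
index_limit = 40000
"""


def load_index_limit(ini_text: str, default: int = 40000) -> int:
    """Read the row limit from INI text, falling back to a default."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.getint("sir", "index_limit", fallback=default)


def exceeds_limit(total_rows: int, limit: int) -> bool:
    """True when a query would return more rows than the limit allows."""
    return total_rows > limit


limit = load_index_limit(EXAMPLE_INI)
# 5038988 is the "Total Rows" value from the log message quoted above.
too_big = exceeds_limit(5038988, limit)
```

With these assumed values `too_big` is true, which matches the situation in the log: the messages behind that query get rejected rather than indexed.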
      • BrainzGit has quit
      • BrainzGit joined the channel
      • Etua joined the channel
      • adk0971 joined the channel
      • Etua has quit
      • ZaphodBeeblebrox is now known as CatQuest
      • Mr_Monkey
        Top of the time-of-day to you!
      • ruaok
        moin!
      • adk0971 has quit
      • travis-ci joined the channel
      • travis-ci
        Project bookbrainz-site build #3666: passed in 4 min 28 sec: https://travis-ci.org/bookbrainz/bookbrainz-sit...
      • travis-ci has left the channel
      • roger_that20 joined the channel
      • roger_that has quit
      • roger_that20
        yvanzo: So if those messages fail, my search indexer can never be up to date?
      • I have 8 cpus, 32gb ram... is it not enough to run the live indexing?
      • And also... why do the failed messages cause such a big delay in processing the next messages? (Although the search server seems to be processing stuff - there's just nothing in the indexer logs for about 10-15 minutes after the failed query)
      • I'm guessing the big queries use a lot of ram?
      • Lotheric has quit
      • Lotheric joined the channel
      • BrainzGit
        [musicbrainz-server] reosarevok opened pull request #1934 (master…MBS-11407): MBS-11407: Make Controller->error use React errors https://github.com/metabrainz/musicbrainz-serve...
      • yvanzo
        roger_that20: exactly, these queries require a lot of RAM and take a lot of time to be processed by PostgreSQL, so better reject them to allow for other queries to go through.
      • It does not use the full power of your CPUs/RAM because each component (indexer, solr, postgres) is limited for safety.
      • roger_that20
        I see
      • yvanzo
        The tricky part is to correctly allocate a balanced amount of resources to each component so it works smoothly.
      • roger_that20
        What happens after they fail? Like I'm trying to understand what happens between the failed query and the next batch of messages that get processed
      • yvanzo
        Rejected messages are not lost, they are in a failed queue that can be processed later on.
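The flow yvanzo describes (reject oversized work, keep it in a failed queue, process it later) can be sketched roughly as follows. This is an in-memory illustration only, not the indexer's actual code: `handle_batch`, `row_counts` and the message names are hypothetical, and the real failed queue lives in the message broker.

```python
from collections import deque


def handle_batch(messages, row_counts, index_limit):
    """Index messages under the limit; park the rest in a failed queue.

    `row_counts` maps each message to the number of rows its query
    would fetch (a stand-in for asking PostgreSQL).
    """
    indexed = []
    failed = deque()
    for msg in messages:
        if row_counts[msg] > index_limit:
            failed.append(msg)   # rejected, but kept for later
        else:
            indexed.append(msg)  # small enough to index right away
    return indexed, failed


# Hypothetical messages and row counts for illustration.
row_counts = {"artist-update": 120, "recording-update": 5038988}
indexed, failed = handle_batch(row_counts.keys(), row_counts, index_limit=40000)

# Later, for example after raising the limit, the failed queue is retried.
retried, still_failed = handle_batch(list(failed), row_counts, index_limit=6000000)
```

The point of the design is that one expensive query does not block the cheap ones: small messages keep flowing while the large one waits for a retry.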
      • roger_that20
        Oh ok
      • kloeri_ is now known as kloeri
      • And if I restart the containers.... the messages are saved?
      • Like if I wanted to change the indexer_limit
      • yvanzo
        Yes, they are in the *_mqdata volume
      • adk0971 joined the channel
      • roger_that20
        I noticed though, in that github thread... that JoshDi set his index limit to 1500000
      • So i wonder how those messages were getting processed
      • yvanzo
        It’s not complete but it gives an overview at least: https://sir.readthedocs.io/en/stable/import.html
      • roger_that20
        Oh great
      • I'll give that a read
      • The docker version uses sir 1.0.2 though I believe?
      • What do you guys have set for your indexer_limit?
      • _lucifer
        ruaok: is the config file used for lb spark same as that for spark?
      • *as that for lb webserver
      • roger_that20 has quit
      • ruaok
        no those are different. one is in listenbrainz, the other in listenbrainz_spark
      • _lucifer
        yes, i understand that. i am asking about the file used to populate the configuration values in production.
      • BrainzGit
        [musicbrainz-server] reosarevok opened pull request #1935 (master…MBS-10012): MBS-10012: Treat http and https version of link as same for adding https://github.com/metabrainz/musicbrainz-serve...
      • _lucifer
        like for lb webserver there is docker-server-configs/consul/LB/config.prod.json. is this file used for configuration of spark as well?
      • ruaok
        the lb web server config.py is generated by consul.
      • the spark config file is manually generated from the current configuration.
      • in /home/request_consumer/listenbrainz-server
      • on leader.
      • _lucifer
        oh ok. thanks!
      • ruaok
        np
      • ruaok sighs at the bank needing a "wet" signature.
      • _lucifer
        should we add that config to git also?
      • ruaok busts out gimp and the photocopy filter
      • somewhere in docker-server-configs for future reference
      • ruaok
        configs should not really ever go into git.
      • oh, I see.
      • reosarevok
        ruaok: would it be *too* silly to drop some water on the signature area of the paper before scanning? :p
      • ruaok
        yes, that isn't a bad idea.
      • _lucifer
        great, i'll open a pr
      • ruaok
        reosarevok: I don't intend to scan anything. this is why the photocopy filter in gimp was invented.
      • reosarevok
        Oh
      • I see :D
      • ruaok
        should I be cheeky and add a coffee stain filter too?
      • reosarevok
        A "written-in-pen" style "fuck stupid banks" on size 4 on the margins? :p
      • ruaok
        oh that will help for sure.
      • reosarevok
        bitmap, yvanzo: curious about your opinion on https://tickets.metabrainz.org/browse/MBS-10004 ?
      • BrainzBot
        MBS-10004: inconsistency between JSON-LD and documentation
      • reosarevok
        They kinda have a point that JSON-LD URIs probably should not be https?
      • yvanzo
        makes sense
      • reosarevok
        bitmap: do you remember if the https there was intentional?
      • yvanzo
        This is the same reason we use http for VIAF URLs.
      • roger_that20: The only difference between 1.x and 2.x is about indexing recording’s first release date (in 2.x only).
      • 1.x is a bit behind 2.x on a few other points, but it will be catching up with them.
      • reosarevok wonders about https://tickets.metabrainz.org/browse/MBS-9997 - seems that solved "itself", which usually means we fixed something in the meantime :p
      • reosarevok
        Related to the other -LD ticket is https://tickets.metabrainz.org/browse/MBS-9987 - also about main url vs identifier
      • BrainzBot
        MBS-9987: JSON-LD: Use "Concept URI" for Wikidata IRIs in the sameAs relation
      • reosarevok
        Kinda wish the two would be the same but :D
      • My only worry is whether changing either of these will cause any issues with data users because the same value will be seen as changed or something
      • adk0971 has quit
      • ruaok
        51M recordings in messybrainz. O_O
      • that will take a moment or two to lookup.
      • reosarevok
        It does sound messy
      • ruaok
        I think I can bring the number down to a more reasonable amount.
      • sumedh joined the channel
      • BrainzGit
        [musicbrainz-server] reosarevok opened pull request #1936 (master…MBS-11408): MBS-11408: Clarify Edit Note Author edit search options https://github.com/metabrainz/musicbrainz-serve...
      • sumedh has quit
      • MajorLurker joined the channel
      • MajorLurker has quit
      • sumedh joined the channel
      • _lucifer
        ruaok: iliekcomputers: i was looking in sentry logs and saw that there are over 11m spotify errors related to same user over the past few months. any ideas?
      • ruaok
        yeah, it's been on my list to do. there are some simple things, some more complicated.
      • _lucifer
        i see that spotipy hides the actual error message.
      • ruaok
        I was trying to do weekly sentry cleanups for a while, but got distracted by other things.
      • _lucifer
        i can help :)
      • ruaok
        the 400 errors are harder to clean up -- I think some users may have deleted their spotify account, but not turned off LB listen saving.
      • I think once we get a 400 error enough times, we need to disable that account. we used to have some logic like that, but it ended up disabling 2/3 of the accounts. ;(
      • please do help!
      • _lucifer
        makes sense
      • the 400 message might actually have some useful info.
      • i'll try to reproduce this error locally and see if we can come up with a better solution
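One way the "disable after enough 400 errors" idea could look, as a rough sketch only. This is not ListenBrainz code: `SpotifyLink`, `record_response` and the threshold of 5 are hypothetical. Counting only consecutive 400s and resetting on any other response is an assumption, chosen to make it less likely that a transient failure disables a working account.

```python
from dataclasses import dataclass


@dataclass
class SpotifyLink:
    """Hypothetical record of one user's Spotify connection."""
    user: str
    consecutive_400s: int = 0
    enabled: bool = True


def record_response(link: SpotifyLink, status_code: int, max_400s: int = 5) -> SpotifyLink:
    """Update the error streak and disable the link once it is long enough."""
    if status_code == 400:
        link.consecutive_400s += 1
        if link.consecutive_400s >= max_400s:
            link.enabled = False
    else:
        # Any non-400 response breaks the streak.
        link.consecutive_400s = 0
    return link


link = SpotifyLink(user="example_user")
for status in (400, 400, 200, 400, 400, 400, 400, 400):
    record_response(link, status)
```

In this run the 200 resets the counter, so the link is only disabled after the five 400s that follow it.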
      • ruaok
        kewl.
      • _lucifer: alastairp I'm running a "not nice" query on bono. if it starts smoking and burning, you know whom to blame.
      • alastairp
        ok