if k8s ends up being too far to one side on the complexity scale, nomad (by hashicorp) has been brought up occasionally in here before and could be worth investigating
2025-02-21 05208, 2025
julian45[m]
and finally, one last thing before i step off my soapbox and go focus on things i should be doing: this may be based on a naive understanding of how ansible is currently used within MeB, but it may be beneficial to us to use a tool like AWX^ to help centralize ansible usage & observability related to it. having one place from which to run roles/playbooks would make it easier to see which are currently being used/enforced for any
2025-02-21 05208, 2025
julian45[m]
given target server, as well as a place to trigger runs of playbooks, keep track of execution history, and provide a single consistent execution environment for ansible jobs. https://github.com/ansible/awx
2025-02-21 05208, 2025
julian45[m]
^ OSS upstream to red hat's ansible automation platform product (formerly known as ansible tower)
2025-02-21 05207, 2025
minimal has quit
2025-02-21 05213, 2025
leonardo- joined the channel
2025-02-21 05238, 2025
leonardo has quit
2025-02-21 05246, 2025
lucifer[m]
mayhem: julian45 yup i have been thinking of upgrading timescale PG db to postgres 16 for parity with MB, if we want to do a server upgrade best to club both together.
2025-02-21 05209, 2025
lucifer[m]
a simple master/standby for timescale would be great, or at least adding barman backups to it. ideally without adding the complexity of new tools.
but, the index only indexes the first x characters of each string. (which is why I was looking at the graphs)
2025-02-21 05212, 2025
lucifer[m]
3.5G is with how many characters?
2025-02-21 05223, 2025
mayhem[m]
I am now debating if I should store all the excess chars of each string in the index, or take the results and fetch them from PG.
2025-02-21 05243, 2025
mayhem[m]
30 characters, currently. but most of the index data is literally all the strings and mbids residing in ram.
2025-02-21 05256, 2025
mayhem[m]
nmslib only has a single int that can be stored in the index.
2025-02-21 05224, 2025
lucifer[m]
i think doing a subsequent PG query makes sense to me.
2025-02-21 05238, 2025
mayhem[m]
so, secondary data needs to be stored in the index (which makes indexes huge and slow to load) OR we simply make the index as light as possible and then ask PG for the results.
2025-02-21 05245, 2025
lucifer[m]
the biggest user of this index would be the mapper i think and that only needs a recording mbid.
2025-02-21 05211, 2025
mayhem[m]
lucifer[m]: it does, but not if we use PG twice. right now we check for exact matches using PG and if nothing is found, we go to typesense.
2025-02-21 05214, 2025
lucifer[m]
you could make an index in PG on the int and recording_mbid.
2025-02-21 05242, 2025
mayhem[m]
we could also add recording_id to the canonical data.
2025-02-21 05243, 2025
lucifer[m]
and it would never query the table for that query.
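(A hedged sketch of that covering-index suggestion: the table and column names used here, mapping.canonical_data / id / recording_mbid, are placeholders rather than the actual LB schema, and a psycopg2 connection is assumed.)

```python
import psycopg2

# A covering index lets Postgres answer the int -> recording_mbid lookup with an
# index-only scan, i.e. without ever visiting the table heap.
SETUP_SQL = """
CREATE INDEX IF NOT EXISTS canonical_data_id_mbid_idx
    ON mapping.canonical_data (id)
    INCLUDE (recording_mbid);
"""

def ensure_index(conn):
    with conn.cursor() as cur:
        cur.execute(SETUP_SQL)
    conn.commit()

def resolve_recording_mbids(conn, ids):
    """Map the ints coming back from the search index to recording MBIDs."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT id, recording_mbid FROM mapping.canonical_data WHERE id = ANY(%s)",
            (list(ids),),
        )
        return dict(cur.fetchall())
```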
2025-02-21 05255, 2025
lucifer[m]
sure.
2025-02-21 05203, 2025
mayhem[m]
because this index is built off canonical data, not all MB data.
2025-02-21 05218, 2025
lucifer[m]
sounds good to me.
2025-02-21 05221, 2025
mayhem[m]
but it really comes down to one of two ways:
2025-02-21 05235, 2025
mayhem[m]
1) Store everything in the index and have indexes be slow to load.
2025-02-21 05243, 2025
mayhem[m]
2) store nothing in index and fetch everything from PG.
2025-02-21 05208, 2025
mayhem[m]
and I am leaning towards 2.
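(What option 2 could look like end to end, as a sketch only: `vectors`, `canonical_ids` and the psycopg2 `conn` are assumed to already exist, the vectorisation of the canonical strings is elided, and the PG table name is a placeholder.)

```python
import nmslib

# The search index stores nothing but a plain integer id per item; all strings
# and MBIDs stay in Postgres and are fetched once per query.
index = nmslib.init(method="hnsw", space="cosinesimil")
index.addDataPointBatch(vectors, ids=canonical_ids)   # payload is just an int per item
index.createIndex({"post": 2})

def search(query_vector, conn, k=10):
    ids, distances = index.knnQuery(query_vector, k=k)
    with conn.cursor() as cur:                         # one PG round trip for the details
        cur.execute(
            "SELECT id, recording_mbid FROM mapping.canonical_data WHERE id = ANY(%s)",
            ([int(i) for i in ids],),
        )
        mbids = dict(cur.fetchall())
    return [(int(i), mbids.get(int(i)), float(d)) for i, d in zip(ids, distances)]
```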
2025-02-21 05215, 2025
lucifer[m]
yup same.
2025-02-21 05221, 2025
mayhem[m]
cool.
2025-02-21 05229, 2025
lucifer[m]
for mbid mapper you can even skip the pg query.
2025-02-21 05230, 2025
mayhem[m]
now comes the question: how the fuck do we host this?
2025-02-21 05249, 2025
lucifer[m]
just write the recording id to the table and let the recording mbid be resolved later.
2025-02-21 05200, 2025
mayhem[m]
lucifer: maybe. if the query strings are shorter than the max chars, possibly.
2025-02-21 05210, 2025
lucifer[m]
either way indexes would optimize it.
2025-02-21 05228, 2025
mayhem[m]
but hosting is a pita.
2025-02-21 05240, 2025
lucifer[m]
mayhem[m]: how much ram is consumed when all recording indexes have been built?
2025-02-21 05244, 2025
mayhem[m]
ideally it would be a single-process, multi-threaded app.
2025-02-21 05227, 2025
mayhem[m]
lucifer[m]: don't know, and I realize that this is a bad pattern. we will never want all of MB resident in the index. that's nonsense. LB users are going to be listening to a subset of all the data, so we should only keep the active things in the index.
2025-02-21 05247, 2025
mayhem[m]
let's leave this for one sec. we'll get to it.
2025-02-21 05249, 2025
lucifer[m]
so an lru cache?
2025-02-21 05252, 2025
mayhem[m]
yes!
2025-02-21 05212, 2025
mayhem[m]
but we can't have a purely threaded app until the GIL is reliably gone.
2025-02-21 05225, 2025
lucifer[m]
i think we could host it on a vm or a separate server with enough ram.
2025-02-21 05231, 2025
mayhem[m]
it needs to be multi-process, multi-thread.
2025-02-21 05239, 2025
mayhem[m]
yes!
2025-02-21 05201, 2025
mayhem[m]
I am thinking of dividing the dataset into chunks.
2025-02-21 05224, 2025
lucifer[m]
sharding based on names and hosting multiple instances of the app?
2025-02-21 05227, 2025
mayhem[m]
on the first level, we'll decide to break the data into P chunks where P is the number of desired processes.
2025-02-21 05259, 2025
mayhem[m]
say strings that start with A-E in process #1, F-H in #2 and so on.
2025-02-21 05229, 2025
mayhem[m]
when a request comes in, it gets resolved to a backend process to handle.
2025-02-21 05246, 2025
mayhem[m]
the process carries it out and returns the results to the dispatcher.
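(A purely illustrative sketch of that first-letter routing: the number of processes, the alphabet split, and the function names are all placeholders, not a description of how LB actually dispatches requests.)

```python
import string

P = 4  # number of backend processes; illustrative only

# Static routing table: split the alphabet into P roughly equal ranges,
# e.g. with P=4 this gives a-g -> 0, h-m -> 1, n-t -> 2, u-z -> 3.
ROUTES = {ch: (i * P) // 26 for i, ch in enumerate(string.ascii_lowercase)}

def pick_backend(query: str) -> int:
    """Decide which backend process owns the shard for this query string."""
    first = query.strip().lower()[:1]
    return ROUTES.get(first, P - 1)   # digits and symbols fall through to the last shard
```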
2025-02-21 05203, 2025
mayhem[m]
but each process' data is further broken into MANY smaller chunks.
2025-02-21 05221, 2025
mayhem[m]
and the many smaller chunks are not all loaded at load time -- everything is lazy loaded.
2025-02-21 05238, 2025
mayhem[m]
but there is a flat file that has all the built indexes serialized to disk.
2025-02-21 05255, 2025
lucifer[m]
have you tried multithreaded querying?
2025-02-21 05257, 2025
mayhem[m]
and the index we load into ram, says: this index chunk for this query is in file X, offset O, length L.
2025-02-21 05210, 2025
mayhem[m]
fetch the index, unpickle, query.
2025-02-21 05242, 2025
mayhem[m]
and then we set an upper memory consumption limit. if the proc gets to that limit, it dumps LRU indexes.
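(A rough sketch of that lazy-load / LRU scheme, assuming a pre-built directory mapping each chunk key to (file, offset, length) in the flat file of pickled index chunks; the names and the byte-based memory accounting are illustrative.)

```python
import pickle
from collections import OrderedDict

class ChunkCache:
    """Lazy-load pickled index chunks from a flat file, evicting least recently
    used chunks once an approximate memory budget is exceeded."""

    def __init__(self, directory, max_bytes):
        self.directory = directory        # chunk_key -> (path, offset, length)
        self.max_bytes = max_bytes
        self.loaded = OrderedDict()       # chunk_key -> unpickled index, in LRU order
        self.loaded_bytes = 0

    def get(self, chunk_key):
        if chunk_key in self.loaded:
            self.loaded.move_to_end(chunk_key)          # mark as most recently used
            return self.loaded[chunk_key]

        path, offset, length = self.directory[chunk_key]
        with open(path, "rb") as f:                     # fetch the chunk from disk...
            f.seek(offset)
            index = pickle.loads(f.read(length))        # ...and unpickle it

        self.loaded[chunk_key] = index
        self.loaded_bytes += length
        while self.loaded_bytes > self.max_bytes and len(self.loaded) > 1:
            old_key, _ = self.loaded.popitem(last=False)         # dump the LRU chunk
            self.loaded_bytes -= self.directory[old_key][2]
        return index
```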
2025-02-21 05245, 2025
lucifer[m]
like host the index behind a uwsgi flask app and make multiple concurrent requests and see if it works. it might handle itself automatically for you.
according to this, the actual search code doesn't hold the GIL, only the glue code does, so it's possible it might work.
2025-02-21 05244, 2025
lucifer[m]
so worth a try if you already haven't.
2025-02-21 05256, 2025
mayhem[m]
you're suggesting a single flask app, single process, and see what happens?
2025-02-21 05224, 2025
lucifer[m]
actually even simpler. just a python process with a threadpoolexecutor to query the indexer.
2025-02-21 05207, 2025
lucifer[m]
test say 1k-10k items. on a single thread and two threads. and compare overall time to execute.
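(Roughly what that comparison could look like, as a sketch: an already-built nmslib-style `index` and a list of 1k-10k pre-built `queries` are assumed to exist.)

```python
import time
from concurrent.futures import ThreadPoolExecutor

def timed_run(num_threads, queries, k=10):
    """Run the same query workload with N worker threads, return wall-clock seconds."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(lambda q: index.knnQuery(q, k=k), queries))
    return time.monotonic() - start

t1 = timed_run(1, queries)
t2 = timed_run(2, queries)
print(f"1 thread: {t1:.2f}s  2 threads: {t2:.2f}s  speedup: {t1 / t2:.2f}x")
```

If the search code really does release the GIL, the two-thread run should approach a 2x speedup; if it stays near 1x, the glue code is serialising everything.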
2025-02-21 05211, 2025
mayhem[m]
well, a flask app is our end goal, so let's express it in terms of that. :)
2025-02-21 05244, 2025
mayhem[m]
I suppose I can stand up a simple flask endpoint and try it.
2025-02-21 05202, 2025
lucifer[m]
sure, a flask app, but the dev server is limited to one thread, so run it under uwsgi with workers and enable-threads to use threads instead of processes.
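(Guessing at the invocation meant here, not a tested command: something like `uwsgi --http :8080 --wsgi-file app.py --processes 1 --threads 8 --enable-threads`, since uwsgi's --threads and --enable-threads options serve requests from threads inside a single worker rather than from separate forked processes.)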
2025-02-21 05217, 2025
mayhem[m]
yep, fer sure.
2025-02-21 05225, 2025
mayhem[m]
let me do that
2025-02-21 05249, 2025
lucifer[m]
i think a threadpoolexecutor is a simpler and more accurate test though. and easier to debug too if needed.
2025-02-21 05255, 2025
mayhem[m]
I detest threadpoolexecutor, I have to say. it's always mental gymnastics to get it to do what I need it to do.
2025-02-21 05258, 2025
mayhem[m]
and I am not sure if accurate is the correct term. our goal is to run under uwsgi, so why not test there? testing in an artificial setup may not reflect reality.
2025-02-21 05204, 2025
the4oo4 joined the channel
2025-02-21 05250, 2025
yvanzo[m]
julian45: About the SSO app in Jira provided by miniOrange, a long-standing Atlassian partner: the app is well maintained and closely follows new versions of Jira, including 10.x. Happy to replace it with Jira 10.x's native SSO feature if that turns out to be possible, but otherwise we can just keep using this app.
2025-02-21 05202, 2025
yvanzo[m]
bitmap: Not sure. I would have guessed that it came from some Ansible repository but I couldn’t find anything. However, it matches the user `brainz` in the sshd container for fullexport.
2025-02-21 05227, 2025
monkey[m]
ansh: I've been tweaking the mobile UI PR, and I think it's in a good state to get some feedback.
2025-02-21 05227, 2025
monkey[m]
I'd like yours and aerozol's first if you have any, then perhaps we can deploy it to beta for a little while to get feedback from the community?
<yvanzo[m]> "bitmap: Not sure. I would have..." <- ah right, the brainz user is also in sshd-musicbrainz-json-dumps-incremental (which is based on the same sshd image). thanks! I'll just add a comment then
2025-02-21 05249, 2025
suvid[m] joined the channel
2025-02-21 05249, 2025
suvid[m]
I was planning on working on this ticket:... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/cHtvNcHeyvAAYslGKIWZODBZ>)
2025-02-21 05226, 2025
julian45[m]
<yvanzo[m]> "julian45: About the SSO app in..." <- Got it, thanks! I was not aware of miniOrange's relationship with Atlassian, so this is good context for me to have.
2025-02-21 05223, 2025
monkey[m]
<suvid[m]> "I was planning on working on..." <- suvid: For this, you would want a toggle to turn of BrainzPlayer entirely. We already hava apage for BP settings at https://listenbrainz.org/settings/brainzplayer/
2025-02-21 05223, 2025
monkey[m]
Then there will be some conditional rendering in a few places depending on the activation state of BP (for example hiding the play icon buttons on all the listencards, not rendering or loading the BrainzPlayer component, etc.)
2025-02-21 05247, 2025
mthax has quit
2025-02-21 05239, 2025
mthax joined the channel
2025-02-21 05242, 2025
mthax has quit
2025-02-21 05219, 2025
mthax joined the channel
2025-02-21 05202, 2025
mthax has quit
2025-02-21 05209, 2025
mthax joined the channel
2025-02-21 05240, 2025
jasje[m] joined the channel
2025-02-21 05240, 2025
jasje[m]
Note for concerned: I won't be available till the end of this month (28th feb). Available for imp stuff only :)
2025-02-21 05208, 2025
jasje[m]
* be available (travelling) till the
2025-02-21 05203, 2025
aerozol[m]
Love the stats mayhem! I'll share them this weekend
2025-02-21 05230, 2025
aerozol[m]
reosarevok: re MBS-13945, I saw that the other day and it seemed relatively straightforward? The wording could use some work (I'm not sure "source MBID" will be universally understood) but I don't know if there's a downside. Whether it's done in the UI or via modbot I assume is a technical question. Let me know if you want me to comment on the ticket re. anything in particular