if k8s ends up being too far to one side on the complexity scale, nomad (by hashicorp) has been brought up occasionally in here before and could be worth investigating
2025-02-21 05208, 2025
julian45[m]
and finally, one last thing before i step off my soapbox and go focus on things i should be doing: this may be based on a naive understanding of how ansible is currently used within MeB, but it may be beneficial to us to use a tool like AWX^ to help centralize ansible usage & observability related to it. having one place from which to run roles/playbooks would make it easier to see which are currently being used/enforced for any
2025-02-21 05208, 2025
julian45[m]
given target server, as well as a place to trigger runs of playbooks, keep track of execution history, and provide a single consistent execution environment for ansible jobs. https://github.com/ansible/awx
2025-02-21 05208, 2025
julian45[m]
^ OSS upstream to red hat's ansible automation platform product (formerly known as ansible tower)
2025-02-21 05207, 2025
minimal has quit
2025-02-21 05213, 2025
leonardo- joined the channel
2025-02-21 05238, 2025
leonardo has quit
2025-02-21 05246, 2025
lucifer[m]
mayhem: julian45 yup i have been thinking of upgrading timescale PG db to postgres 16 for parity with MB, if we want to do a server upgrade best to club both together.
2025-02-21 05209, 2025
lucifer[m]
a simple master/standby for timescale would be great, or at least adding barman backups to it. ideally without adding the complexity of new tools.
but, the index only indexes the first x characters of each string. (which is why I was looking at the graphs)
2025-02-21 05212, 2025
lucifer[m]
3.5G is with how many characters?
2025-02-21 05223, 2025
mayhem[m]
I am now debating if I should store all the excess chars of each string in the index, or take the results and fetch them from PG.
2025-02-21 05243, 2025
mayhem[m]
30 characters, currently. but most of the index data is literally all the strings and mbids residing in ram.
2025-02-21 05256, 2025
mayhem[m]
nmslib only has a single int that can be stored in the index.
2025-02-21 05224, 2025
lucifer[m]
i think doing a subsequent PG query makes sense to me.
2025-02-21 05238, 2025
mayhem[m]
so, secondary data needs to be stored in the index (which makes indexes huge and slow to load) OR we simply make the index as light as possible and then ask PG for the results.
2025-02-21 05245, 2025
lucifer[m]
the biggest user of this index would be the mapper i think and that only needs a recording mbid.
2025-02-21 05211, 2025
mayhem[m]
lucifer[m]: it does, but not if we use PG twice. right now we check for exact matches using PG and if nothing is found, we go to typesense.
2025-02-21 05214, 2025
lucifer[m]
you could make an index in PG on the int and recording_mbid.
2025-02-21 05242, 2025
mayhem[m]
we could also add recording_id to the canonical data.
2025-02-21 05243, 2025
lucifer[m]
and it would never query the table for that query.
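(A hedged sketch of that covering-index suggestion: the table and column names used here, mapping.canonical_data / id / recording_mbid, are placeholders rather than the actual LB schema, and a psycopg2 connection is assumed.)

```python
import psycopg2

# A covering index lets Postgres answer the int -> recording_mbid lookup with an
# index-only scan, i.e. without ever visiting the table heap.
SETUP_SQL = """
CREATE INDEX IF NOT EXISTS canonical_data_id_mbid_idx
    ON mapping.canonical_data (id)
    INCLUDE (recording_mbid);
"""

def ensure_index(conn):
    with conn.cursor() as cur:
        cur.execute(SETUP_SQL)
    conn.commit()

def resolve_recording_mbids(conn, ids):
    """Map the ints coming back from the search index to recording MBIDs."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT id, recording_mbid FROM mapping.canonical_data WHERE id = ANY(%s)",
            (list(ids),),
        )
        return dict(cur.fetchall())
```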
2025-02-21 05255, 2025
lucifer[m]
sure.
2025-02-21 05203, 2025
mayhem[m]
because this index is built off canonical data, not all MB data.
2025-02-21 05218, 2025
lucifer[m]
sounds good to me.
2025-02-21 05221, 2025
mayhem[m]
but it really comes down to one of two ways:
2025-02-21 05235, 2025
mayhem[m]
1) Store everything in the index and have indexes be slow to load.
2025-02-21 05243, 2025
mayhem[m]
2) store nothing in index and fetch everything from PG.
2025-02-21 05208, 2025
mayhem[m]
and I am leaning towards 2.
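(What option 2 could look like end to end, as a sketch only: `vectors`, `canonical_ids` and the psycopg2 `conn` are assumed to already exist, the vectorisation of the canonical strings is elided, and the PG table name is a placeholder.)

```python
import nmslib

# The search index stores nothing but a plain integer id per item; all strings
# and MBIDs stay in Postgres and are fetched once per query.
index = nmslib.init(method="hnsw", space="cosinesimil")
index.addDataPointBatch(vectors, ids=canonical_ids)   # payload is just an int per item
index.createIndex({"post": 2})

def search(query_vector, conn, k=10):
    ids, distances = index.knnQuery(query_vector, k=k)
    with conn.cursor() as cur:                         # one PG round trip for the details
        cur.execute(
            "SELECT id, recording_mbid FROM mapping.canonical_data WHERE id = ANY(%s)",
            ([int(i) for i in ids],),
        )
        mbids = dict(cur.fetchall())
    return [(int(i), mbids.get(int(i)), float(d)) for i, d in zip(ids, distances)]
```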
2025-02-21 05215, 2025
lucifer[m]
yup same.
2025-02-21 05221, 2025
mayhem[m]
cool.
2025-02-21 05229, 2025
lucifer[m]
for mbid mapper you can even skip the pg query.
2025-02-21 05230, 2025
mayhem[m]
now comes the question: how the fuck do we host this?
2025-02-21 05249, 2025
lucifer[m]
just write the recording id to the table and let the recording mbid be resolved later.
2025-02-21 05200, 2025
mayhem[m]
lucifer: maybe. if the query strings are shorter than the max chars, possibly.
2025-02-21 05210, 2025
lucifer[m]
either way indexes would optimize it.
2025-02-21 05228, 2025
mayhem[m]
but hosting is a pita.
2025-02-21 05240, 2025
lucifer[m]
mayhem[m]: how much ram is consumed when all recording indexes have been built?
2025-02-21 05244, 2025
mayhem[m]
ideally it would be a single-process, multi-threaded app.
2025-02-21 05227, 2025
mayhem[m]
lucifer[m]: don't know, and I realize that this is a bad pattern. we will never want all of MB resident in the index. that's nonsense. LB users are going to be listening to a subset of all the data, so we should only keep the active things in the index.
2025-02-21 05247, 2025
mayhem[m]
let's leave this for one sec. we'll get to it.
2025-02-21 05249, 2025
lucifer[m]
so an lru cache?
2025-02-21 05252, 2025
mayhem[m]
yes!
2025-02-21 05212, 2025
mayhem[m]
but we can't have a purely threaded app until the GIL is reliably gone.
2025-02-21 05225, 2025
lucifer[m]
i think we could host it on a vm or a separate server with enough ram.
2025-02-21 05231, 2025
mayhem[m]
it needs to be multi-process, multi-thread.
2025-02-21 05239, 2025
mayhem[m]
yes!
2025-02-21 05201, 2025
mayhem[m]
I am thinking of dividing the dataset into chunks.
2025-02-21 05224, 2025
lucifer[m]
sharding based on names and hosting multiple instances of the app?
2025-02-21 05227, 2025
mayhem[m]
on the first level, we'll decide to break the data into P chunks where P is the number of desired processes.
2025-02-21 05259, 2025
mayhem[m]
say strings that start with A-E in process #1, F-H in #2 and so on.
2025-02-21 05229, 2025
mayhem[m]
when a request comes in, it gets resolved to a backend process to handle.
2025-02-21 05246, 2025
mayhem[m]
the process carries it out and returns the results to the dispatcher.
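(A purely illustrative sketch of that first-letter routing: the number of processes, the alphabet split, and the function names are all placeholders, not a description of how LB actually dispatches requests.)

```python
import string

P = 4  # number of backend processes; illustrative only

# Static routing table: split the alphabet into P roughly equal ranges,
# e.g. with P=4 this gives a-g -> 0, h-m -> 1, n-t -> 2, u-z -> 3.
ROUTES = {ch: (i * P) // 26 for i, ch in enumerate(string.ascii_lowercase)}

def pick_backend(query: str) -> int:
    """Decide which backend process owns the shard for this query string."""
    first = query.strip().lower()[:1]
    return ROUTES.get(first, P - 1)   # digits and symbols fall through to the last shard
```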
2025-02-21 05203, 2025
mayhem[m]
but each process' data is further broken into MANY smaller chunks.
2025-02-21 05221, 2025
mayhem[m]
and the many smaller chunks are not all loaded at load time -- everything is lazy loaded.
2025-02-21 05238, 2025
mayhem[m]
but there is a flat file that has all the built indexes serialized to disk.
2025-02-21 05255, 2025
lucifer[m]
have you tried multithreaded querying?
2025-02-21 05257, 2025
mayhem[m]
and the index we load into ram, says: this index chunk for this query is in file X, offset O, length L.
2025-02-21 05210, 2025
mayhem[m]
fetch the index, unpickle, query.
2025-02-21 05242, 2025
mayhem[m]
and then we set an upper memory consumption limit. if the proc gets to that limit, it dumps LRU indexes.
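(A rough sketch of that lazy-load / LRU scheme, assuming a pre-built directory mapping each chunk key to (file, offset, length) in the flat file of pickled index chunks; the names and the byte-based memory accounting are illustrative.)

```python
import pickle
from collections import OrderedDict

class ChunkCache:
    """Lazy-load pickled index chunks from a flat file, evicting least recently
    used chunks once an approximate memory budget is exceeded."""

    def __init__(self, directory, max_bytes):
        self.directory = directory        # chunk_key -> (path, offset, length)
        self.max_bytes = max_bytes
        self.loaded = OrderedDict()       # chunk_key -> unpickled index, in LRU order
        self.loaded_bytes = 0

    def get(self, chunk_key):
        if chunk_key in self.loaded:
            self.loaded.move_to_end(chunk_key)          # mark as most recently used
            return self.loaded[chunk_key]

        path, offset, length = self.directory[chunk_key]
        with open(path, "rb") as f:                     # fetch the chunk from disk...
            f.seek(offset)
            index = pickle.loads(f.read(length))        # ...and unpickle it

        self.loaded[chunk_key] = index
        self.loaded_bytes += length
        while self.loaded_bytes > self.max_bytes and len(self.loaded) > 1:
            old_key, _ = self.loaded.popitem(last=False)         # dump the LRU chunk
            self.loaded_bytes -= self.directory[old_key][2]
        return index
```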
2025-02-21 05245, 2025
lucifer[m]
like host the index behind a uwsgi flask app and make multiple concurrent requests and see if it works. it might handle itself automatically for you.
according to this, the actual search code doesn't hold the GIL, only the glue code does, so it's possible it might work.
2025-02-21 05244, 2025
lucifer[m]
so worth a try if you already haven't.
2025-02-21 05256, 2025
mayhem[m]
you're suggesting a single flask app, single process, and see what happens?
2025-02-21 05224, 2025
lucifer[m]
actually even simpler. just a python process with a threadpoolexecutor to query the indexer.
2025-02-21 05207, 2025
lucifer[m]
test say 1k-10k items. on a single thread and two threads. and compare overall time to execute.
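(Roughly what that comparison could look like, as a sketch: an already-built nmslib-style `index` and a list of 1k-10k pre-built `queries` are assumed to exist.)

```python
import time
from concurrent.futures import ThreadPoolExecutor

def timed_run(num_threads, queries, k=10):
    """Run the same query workload with N worker threads, return wall-clock seconds."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(lambda q: index.knnQuery(q, k=k), queries))
    return time.monotonic() - start

t1 = timed_run(1, queries)
t2 = timed_run(2, queries)
print(f"1 thread: {t1:.2f}s  2 threads: {t2:.2f}s  speedup: {t1 / t2:.2f}x")
```

If the search code really does release the GIL, the two-thread run should approach a 2x speedup; if it stays near 1x, the glue code is serialising everything.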
2025-02-21 05211, 2025
mayhem[m]
well, a flask app is our end goal, so let's express it in terms of that. :)
2025-02-21 05244, 2025
mayhem[m]
I suppose I can stand up a simple flask endpoint and try it.
2025-02-21 05202, 2025
lucifer[m]
sure, a flask app, but the dev server is limited to one thread, so run it under uwsgi with workers and enable-threads to use threads instead of processes.
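(Guessing at the invocation meant here, not a tested command: something like `uwsgi --http :8080 --wsgi-file app.py --processes 1 --threads 8 --enable-threads`, since uwsgi's --threads and --enable-threads options serve requests from threads inside a single worker rather than from separate forked processes.)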
2025-02-21 05217, 2025
mayhem[m]
yep, fer sure.
2025-02-21 05225, 2025
mayhem[m]
let me do that
2025-02-21 05249, 2025
lucifer[m]
i think a threadpoolexecutor is a simpler and more accurate test though. and easier to debug too if needed.
2025-02-21 05255, 2025
mayhem[m]
I detest threadpoolexecutor, I have to say. it's always mental gymnastics to get it to do what I need it to do.
2025-02-21 05258, 2025
mayhem[m]
and I am not sure if accurate is the correct term. our goal is to run under uwsgi, so why not test there? testing in an artificial setup may not reflect reality.
2025-02-21 05204, 2025
the4oo4 joined the channel
2025-02-21 05250, 2025
yvanzo[m]
julian45: About the SSO app in Jira provided by miniOrange, a long-standing Atlassian partner: the app is well maintained and closely follows new versions of Jira, including 10.x. Happy to replace it with Jira 10.x's native SSO feature if that turns out to be possible, but otherwise we can just keep using this app.
2025-02-21 05202, 2025
yvanzo[m]
bitmap: Not sure. I would have guessed that it came from some Ansible repository but I couldn’t find anything. However, it matches the user `brainz` in the sshd container for fullexport.
2025-02-21 05227, 2025
monkey[m]
ansh: I've been tweaking the mobile UI PR, and I think it's in a good state to get some feedback.
2025-02-21 05227, 2025
monkey[m]
I'd like yours and aerozol's first if you have any, then perhaps we can deploy it to beta for a little while to get feedback from the community?
<yvanzo[m]> "bitmap: Not sure. I would have..." <- ah right, the brainz user is also in sshd-musicbrainz-json-dumps-incremental (which is based on the same sshd image). thanks! I'll just add a comment then
2025-02-21 05249, 2025
suvid[m] joined the channel
2025-02-21 05249, 2025
suvid[m]
I was planning on working on this ticket:... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/cHtvNcHeyvAAYslGKIWZODBZ>)
2025-02-21 05226, 2025
julian45[m]
<yvanzo[m]> "julian45: About the SSO app in..." <- Got it, thanks! I was not aware of miniOrange's relationship with Atlassian, so this is good context for me to have.
2025-02-21 05223, 2025
monkey[m]
<suvid[m]> "I was planning on working on..." <- suvid: For this, you would want a toggle to turn of BrainzPlayer entirely. We already hava apage for BP settings at https://listenbrainz.org/settings/brainzplayer/
2025-02-21 05223, 2025
monkey[m]
Then there will be some conditional rendering in a few places depending on the activation state of BP (for example hiding the play icon buttons on all the listencards, not rendering or loading the BrainzPlayer component, etc.)
2025-02-21 05247, 2025
mthax has quit
2025-02-21 05239, 2025
mthax joined the channel
2025-02-21 05242, 2025
mthax has quit
2025-02-21 05219, 2025
mthax joined the channel
2025-02-21 05202, 2025
mthax has quit
2025-02-21 05209, 2025
mthax joined the channel
2025-02-21 05240, 2025
jasje[m] joined the channel
2025-02-21 05240, 2025
jasje[m]
Note for concerned: I won't be available till the end of this month (28th feb). Available for imp stuff only :)
2025-02-21 05208, 2025
jasje[m]
* be available (travelling) till the
2025-02-21 05203, 2025
aerozol[m]
Love the stats mayhem! I'll share them this weekend
2025-02-21 05230, 2025
aerozol[m]
reosarevok: re MBS-13945, I saw that the other day and it seemed relatively straightforward? The wording could use some work (I'm not sure "source MBID" will be universally understood) but I don't know if there's a downside. Whether it's done in the UI or via modbot I assume is a technical question. Let me know if you want me to comment on the ticket re. anything in particular