<fettuccinae[m]> "https://listenbrainz.readthedocs..." <- Ok so I need to look into the Spotify reader container?
<suvid[m]> "i had some general queries..." <- > <@suvid:matrix.org> i had some general queries regarding the listens import code... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/...>)
The importer service related code is in this directory.
The listenbrainz-spotify-reader container invokes the spotify.py file and runs the importer code
suvid[m]
ohh
and it does it using cronjobs?
lucifer[m]
No
suvid[m]
like at every interval
lucifer[m]
It keeps running continuously at all times
Doing one pass over all users, then another pass over all users, and repeating infinitely
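The loop shape lucifer describes can be sketched roughly like this (a minimal sketch only; `fetch_importable_users` and `import_listens` are hypothetical stand-ins, not the actual ListenBrainz code, and the real service runs with no pass limit):

```python
import time

def fetch_importable_users():
    # hypothetical stand-in for the query that lists users with Spotify linked
    return ["user_a", "user_b", "user_c"]

def import_listens(user):
    # hypothetical stand-in for fetching from Spotify and submitting listens
    return f"imported {user}"

def run_forever(max_passes=None):
    """Sweep over all users, then start over; the real service never stops.

    max_passes exists only so this sketch can terminate for demonstration.
    """
    results = []
    passes = 0
    while max_passes is None or passes < max_passes:
        for user in fetch_importable_users():
            results.append(import_listens(user))
        passes += 1
        time.sleep(0)  # the real service may pause/throttle between passes
    return results

print(run_forever(max_passes=2))
```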
suvid[m]
ohh
lucifer so the spotify-reader container just invokes the spotify.py file? or something else as well?
where can i view what all it calls?
sorry for such a beginner query 😅
lucifer[m]
Just the spotify.py file
mayhem[m]
lucifer: moin! I'm working on the shared memory implementation and I have to say, that will be the way to go. shared memory is great (see postgres).
however, the nmslib only supports persisting indexes to disk. You need to pass a filename -- passing a stream is not supported
I'm obviously trying to avoid the hit of writing to disk only to load into ram again.
A RAM disk would do the trick, but that is a pain to setup.
there are several suggested ways of patching the open call so that it returns a memory stream instead of a file stream, but this code is very likely actual C code, deep in nmslib.
short of a ram disk, can you think of any alternatives?
zas: you might have some insights as well.
lucifer[m]
mayhem: how about save to a file first and then memory map the file?
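lucifer's save-then-mmap suggestion can be sketched with the stdlib alone (a sketch under assumptions: the bytes here stand in for a saved index file; after the first write, reads are typically served from the page cache rather than hitting the disk again):

```python
import mmap
import tempfile

# Stand-in for an index file that would have been written once by the library.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"index-bytes")
    path = f.name

# Memory-map the file read-only; subsequent accesses go through the
# page cache, so the "load" is mostly a RAM operation after the write.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    print(mm[:11])
    mm.close()
```

This still pays the initial write, which is exactly the cost mayhem wants to avoid below.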
mayhem[m]
I don't ever want the file to go to disk.
making a ram disk is pretty easy it turns out. that might be the best way.
and it's a speed-up improvement, not critical to have.
lucifer[m]
cool, are you using pyfilesystem?
mayhem[m]
sudo mount -o size=10M -t tmpfs none /mnt/tmpfs
not sure I see the point of pyfilesystem
lucifer[m]
sounds good.
zas[m]
So basically you don't want to persist indexes? Isn't that the default? Can you point me at actual code? https://github.com/nmslib/nmslib/blob/2ae537802... seems to indicate one has to call saveIndex for it to be written to disk, and there's also a loadIndex. But I'm not even sure what indexes we talk about.
mayhem[m]
zas[m]: exactly that -- load and save index to and from disk. I don't ever want to hit the disk, I want a pure ram operation.
speed is of the essence in this case for the new mapping server.
zas[m]
But ... createIndex() seems to use RAM, and load/save are meant to persist those, but aren't those load/save calls under your control? I mean if you don't want to use disk it seems to me that's perfectly possible (just don't use save/loadIndex()). But maybe I miss something, I know nothing about this lib nor your use of it.
Of course the RAM disk solution works (and is easy to set up), but what I don't understand is the "nmslib only supports persisting indexes to disk." part, it seems that bindings say different. You can create an index and don't save it at all.
mayhem[m]
I am building a system where an index needs to be shared with other processes in shared ram. so I need to get an in-ram index and get it into shared ram.
if I could persist to a buffer, I could copy that buffer to shared ram and I am done.
but I can only persist the index to disk. thus the need for the ram disk.
otherwise I get the hit of going to disk and right back from it, when I would prefer to avoid that.
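The "copy that buffer to shared ram" step mayhem wants can be sketched with Python's stdlib `multiprocessing.shared_memory` (a sketch under assumptions: the byte string stands in for the output of nmslib's `saveIndex()` on a tmpfs path, and the segment name is hypothetical):

```python
from multiprocessing import shared_memory

def publish(data: bytes, name: str) -> shared_memory.SharedMemory:
    """Copy a byte buffer into a named shared-memory segment."""
    shm = shared_memory.SharedMemory(create=True, size=len(data), name=name)
    shm.buf[:len(data)] = data
    return shm

def read(name: str, size: int) -> bytes:
    """Attach to the segment by name, as another process would."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return bytes(shm.buf[:size])
    finally:
        shm.close()

if __name__ == "__main__":
    # Stand-in for the bytes saveIndex() would have written to a tmpfs file.
    fake_index = b"fake-index-bytes"
    seg = publish(fake_index, "lb_index_demo")
    try:
        print(read("lb_index_demo", len(fake_index)))
    finally:
        seg.close()
        seg.unlink()  # free the segment when done
```

The remaining gap is exactly the one discussed here: getting the index bytes into that buffer without `saveIndex()` touching a real disk, hence the ramdisk.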
zas[m]
Ah, I get it, actually you want to persist indexes. ;)
mayhem[m]
I do, yes.
ram disk it is. venga.
zas[m]
So the ramdisk looks the best approach to me, it is simple and reliable and doesn't require any change in the app
lucifer: yesterday I got rid of the shards and just made the simplest uwsgi workers. no sharing of data. I got at most 75 reqs/s out of that.
Now with shared memory its 200 reqs/s.
lucifer[m]
awesome.
mayhem[m]
and if I pre-build the indexes I suspect that is going to be much higher.
this is finally starting to come into focus.
it's amazing that it can do as well as it does, without pre-built indexes.
tomorrow I'll pre-build indexes and add cache management and then we can see the real performance.
but I suspect I'll be greenlighted for finishing all the features, since this may actually work ok.
lucifer[m]
sounds great.
did you add a validation step yet? to make sure it's working correctly.
bitmap[m]
<zas[m]> "bitmap: ^^ not sure what..." <- I didn't see any alert either. and it looks like the container logs for that time period are already gone... I didn't see anything in the PG logs.
[listenbrainz-server] 14ahmvdev closed pull request #3221 (03master…master): LB-1760: Fix the link in the details section to redirect to the correct file. https://github.com/metabrainz/listenbrainz-serv...
question: why isn't critiquebrainz a part of GSOC projects?
It falls behind in terms of UI design but i could literally see it as being an alternative to letterboxd for music
mayhem[m]
hi ahmad!
we've kinda deprecated CritiqueBrainz as a separate project -- overall we feel that a lot of these things should be available to ListenBrainz users, so we're planning on migrating or adding features from CB to LB.
if you wanted to propose a project that takes the useful bits of CB and adds them to LB, we'd consider it.
however, we are very hesitant on accepting UI projects from GSoC students. we have pretty solid design guidelines and coordinating between our designer, the mentor and the student against a tight deadline doesn't work all that well.
So, if you focus on API work, then that should work great. If you want to do UI work, you'll have to be a wizard and really impress us to take a project on.
glucosesniffer[m]
mayhem[m]: well the last line just scared me off, ill stick to existing ideas lmfao
mayhem[m]
sorry, but best to make things clear early on. :)
we've done GSoC so many times that we know what works for us so we pick things that tend to have the best outcomes for all of us.
glucosesniffer[m]
mayhem[m]: i thought itd be easy work so i was thinking of proposing that as a project idea, but honestly if you hadn't clarified i would've gone with it and bitten off more than i could chew. apart from gsoc, if you guys decide to incorporate some features from CB to LB in the future, i would love to contribute
mayhem[m]
glucosesniffer[m: find a non-gui way to contribute then!
julian45[m]: I think I would prefer the GUI option this time. I use these services once every 6 months and GUIs are much more easily discoverable than having to re-learn a CLI.
glucosesniffer[m]
mayhem[m]: lolol
mayhem[m]
but if zas strongly prefers a CLI, then that's fine by me.
glucosesniffer[m]
mayhem[m]: will try!
zas[m]
I'm fine with the simplest one. Not like we will manage a lot of users anyway.
julian45[m]
good to know! unfortunately neither of the options i presented are particularly simple (a bit of the nature of the beast when it comes to identity providers), but each is relatively easy to reach MVP and do ongoing work in.
i do have a follow-up question: the CLI-only option can theoretically handle user auth and SSH key distribution for *nix hosts, but the web GUI-first option can't. if the implementation of this feature was deemed strong enough and usable enough to viably replace the current ansible-based user and key mgmt processes, would that skew things either way?
mayhem[m]
julian45[m]: My nose is not close enough to the grindstone to answer that question. Zas will have a better answer than I.
zas[m]
Clearly, if it is possible to use it instead of Ansible for SSH key deployment, I guess we should opt for the solution that allows that; it would be very convenient and safer. Though how easy is it to set up compared to the current Ansible "solution" (which is far from perfect but rather simple and reliable)?