jasje: I have made the final PR of the playlist feature. Let me know if any changes are required. Then we can merge it into the main branch.
jasje[m]
HemangMishra[m]: Need time to test
HemangMishra[m] uploaded an image: (736KiB) < https://matrix.chatbrainz.org/_matrix/media/v3/download/matrix.org/jhCzeuhZjPransvQZQvwedGO/image.png >
HemangMishra[m]
HemangMishra[m]: Also, should I change the gradient or rounded corners of the description section? The Artists page and others have rounded corners but Profile>listens doesn't, so I'm not sure which one to follow.
reosarevok[m]
Beta update done
rustynova[m]
Hi! I plan to make some contributions to Melba again now that the project is in a more stable state (and not bound by the rules of GSoC).
But I do wonder, is the project on hiatus? Or does it just need contributors? There's a pending PR by yellowhatpro that hasn't been touched in 4 months...
For now I'd just make some refactoring changes to clean things up a bit (and finally rebase my own PR).
yellowhatpro[m] joined the channel
yellowhatpro[m]
Hi, rustynova. Yeah, you can work on the project. I have been busy with some other projects outside of MetaBrainz, so I have not been able to work on melba.
vardhan_ joined the channel
rustynova[m]
Well, that will be a side project for me too. Something to take my mind off other projects. Will try at least a commit per week though
lucifer[m]
mayhem: hi! i am doing MLHD+ chunk by chunk processing and i think we might be able to do it on the current cluster. it processes one chunk in an hour and there are 16 chunks in total, so that's about a day. however, each chunk produces an output of 400G to store, and that's without replication; with replication it's 3x. so the current bottleneck is disk space.
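For rough scale, taking the chunk figures above at face value (an assumption; actual chunk sizes will vary):

```
16 chunks × ~400 GB ≈ 6.4 TB of output, unreplicated
× 2 replication ≈ 12.8 TB
× 3 replication ≈ 19.2 TB
```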
mayhem[m]
ok, let me see what our options are.
lucifer[m]
zas: mayhem: i see hetzner has 5TB storage boxes for $13/month. so in the worst case we keep this around for a month, it's 65 USD, still cheaper than the vms.
mayhem[m]
(but, squeee, that is exciting!)
the storage boxes run via SAMBA and end up being slow and unreliable.
lucifer[m]
i am guessing it's HDD not SSD, so slower, but we have 12 hours of spark free time daily so i could do 3-4 chunks a day.
i see
mayhem[m]
it's for storing things, not for working on things.
I tried to run navidrome on top of a storage box and it lasted a week before it shat itself.
lucifer[m]
i can disable replication cluster wide or reduce it to 2 for the duration of mlhd processing, but even then i expect at least 10 TB of disk space to be required.
mayhem[m]
we could have disks put into those machines, but it would be a real pain.
given all this, I still think we should use new VMs to get this done.
lucifer[m]
i see.
mayhem[m]
and we can attach storage volumes (fast and suitable for working) to the nodes to reach the DB levels we need.
lucifer[m]
is it possible to have multiple disks at a single mount point?
mayhem[m]
no, unix doesn't allow that.
why do you need that?
lucifer[m]
actually nvm, i found the way to specify multiple directories/mounts to hdfs
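The HDFS setting in question is presumably the datanode data-directory property, which accepts a comma-separated list of paths; a minimal hdfs-site.xml sketch, with two hypothetical mount points:

```xml
<!-- hdfs-site.xml: the datanode spreads blocks across all listed directories -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/mnt/vol1/hdfs/data,/mnt/vol2/hdfs/data</value>
</property>
```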
mayhem[m]
if you wanted that you would have to use something like LVM and make a virtual drive.
phew.
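For illustration, a minimal sketch of that LVM approach, assuming two hypothetical spare disks /dev/sdb and /dev/sdc pooled into one mount point:

```sh
# combine two physical disks into a single logical volume (device names are placeholders)
pvcreate /dev/sdb /dev/sdc
vgcreate workvg /dev/sdb /dev/sdc
lvcreate -l 100%FREE -n workdata workvg
mkfs.ext4 /dev/workvg/workdata
mkdir -p /mnt/workdata
mount /dev/workvg/workdata /mnt/workdata
```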
lucifer[m]
zas: would it be possible for the new VMs to connect to existing spark servers?
i need a smaller server for running the leader, but i guess we could get a smaller vm for that and keep it all isolated from the current cluster.
mayhem: so i am thinking CCX53 * 3 + CCX53 * 1 + 10 TB storage volume on each.
<aerozol[m]> "I think this was about the..." <- Yeah, I think we have enough stuff to show and tell; that would be nice. Not sure we have any feature on the immediate horizon to justify waiting for.
Let me know if you want some help, and in what form. For example I can collate everything I think should go in there, and let you filter, massage and post?
oy. I think it will be better to wait, zas -- I think you'll do it much faster than me trying to learn this all.
zas[m]
Actually it would be good if you add VMs to Ansible (this part is easy), and then learn how to deploy (basically run the bootstrap.yml playbook, then the site.yml playbook). You need ssh to be properly configured (ssh root@yourvm should work for bootstrap, and then ssh youruser@yourvm should work for site).
You risk nothing trying (at worst we rebuild the VM and start over). Properly configuring the network is a must (of course, else we lose connection)
for the vswitch part, that's trickier, because we don't have such a setup yet, and I expect some changes will be needed regarding network/firewall settings to handle this case in Ansible.
Bootstrap needs root access because users aren't created yet, then site can run on user+sudo
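Concretely, the two-step deploy described above would look roughly like this; the inventory file, host alias, and user are placeholders, and the real MetaBrainz invocation may differ:

```sh
# 1. bootstrap as root (users don't exist on the VM yet)
ansible-playbook -i hosts bootstrap.yml --limit newvm -u root

# 2. full site deploy as a regular user, escalating with sudo
ansible-playbook -i hosts site.yml --limit newvm -u youruser --become
```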
mayhem[m]
well, this sounds more interesting than the thing I am fighting now...
<lucifer[m]> "mayhem: so i am thinking CCX53 *..." <- this is actually not clear. one machine with 10TB storage?
lucifer[m]
each.
mayhem[m]
and 3 stock CCX53s?
so 4 CCX53s with 10TB each?
lucifer[m]
oh sorry, i made a typo. 3 CCX53's and 1 CCX23. with 10 TB each yes.
but i did think of an alternative plan to tackle this dataset too if you want to wait.
mayhem[m]
what is that alternate plan?
lucifer[m]
process each file individually in parallel using duckdb or somesuch, and then just do the final combination step in spark.
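A rough sketch of that alternative, assuming the chunk files are something DuckDB can read directly; the file layout, column names, and aggregation below are illustrative, not the actual MLHD+ schema:

```python
# process each MLHD+ file independently with DuckDB, in parallel;
# the per-file outputs would later be combined in a single Spark step
from multiprocessing import Pool
from pathlib import Path
import duckdb

def process_file(path: str) -> str:
    out = path + ".parquet"
    con = duckdb.connect()  # fresh in-memory database per worker
    # illustrative query: aggregate listens per recording within one file
    con.execute(f"""
        COPY (
            SELECT recording_mbid, count(*) AS listen_count
            FROM read_csv_auto('{path}')
            GROUP BY recording_mbid
        ) TO '{out}' (FORMAT PARQUET)
    """)
    con.close()
    return out

if __name__ == "__main__":
    files = [str(p) for p in Path("mlhd_chunks").glob("*.csv")]
    with Pool(processes=8) as pool:
        outputs = pool.map(process_file, files)
    print(f"wrote {len(outputs)} intermediate files")
```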
mayhem[m]
sounds like a lot of code for you to write.
And I can't get a dedicated CPU VM. we've reached some limit I need to have raised.
lucifer[m]
hard to tell without taking a shot at it, might be a day or less or more.
mayhem[m]
maybe spend an hour or 2 on it and see if that approach is promising?
zas[m]
If those machines are temporary, we may not add them to Ansible at all, and just do a basic config manually
mayhem[m]
yes, all temp. but we'll see if we actually need them
lucifer[m]
i don't anticipate issues showing up until, say, 10% of the dataset is done, but sure, let me try.
zas[m]
ok, tell me, I'll be happy to help, but I'm very busy right now controlling this absurd traffic surge...
mayhem[m]
zas[m]: focus on the surge. lucifer will try an alternate path.
d4rkie has quit
d4rkie joined the channel
d4rkie has quit
d4rkie joined the channel
d4rk-ph0enix has quit
d4rk-ph0enix joined the channel
vardhan_ has quit
leftmostcatUTC-8 has quit
bitmap[m]
reosarevok: yvanzo: lucifer: hello, are we meeting now?
reosarevok[m]
Hi! I thought we were! :)
yvanzo[m]
I thought it was 1h from now, but I'm around already.
monkey[m]
mayhem, lucifer: An interesting timezone-related issue I had never seen before was reported (LB-1766): `can't compare offset-naive and offset-aware datetimes`. The ticket was assigned to me, but I don't know if I'm the ideal candidate. Can I assign it to either of you?
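For context, a minimal Python reproduction of that error and the usual fix; the variable names are illustrative and not taken from LB-1766:

```python
from datetime import datetime, timezone

naive = datetime.utcnow()           # no tzinfo attached
aware = datetime.now(timezone.utc)  # tzinfo=UTC

try:
    naive < aware
except TypeError as exc:
    print(exc)  # can't compare offset-naive and offset-aware datetimes

# the usual fix: make both sides timezone-aware before comparing
naive_as_utc = naive.replace(tzinfo=timezone.utc)
print(naive_as_utc < aware)
```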