#metabrainz


      • bitmap
        reosarevok: yvanzo: artwork-indexer-prod container is now running on serge
      • yvanzo
        🎉
      • bitmap
        seems to be working, except the IA went down right after I deployed it 😬
      • good test of the error handling at least
      • ursa-major has quit
      • ursa-major joined the channel
      • zas has quit
      • zas joined the channel
      • snobdiggy2 joined the channel
      • yellowhatpro joined the channel
      • minimal has quit
      • Maxr1998 has quit
      • Maxr1998 joined the channel
      • Shubham joined the channel
      • Kladky joined the channel
      • mara42 joined the channel
      • yellowhatpro
        Hi @bitmap, thanks for the reply on community forum. I've revisited the proposal, will soon make appropriate changes.
      • There are a couple of things I would love to discuss as well.
      • First of all, regarding the producer/consumer architecture.
      • There will be a task running to save URLs, which will receive URLs from a single source. But say I also have a task that retries saving any unsaved URLs, and the main task, which sends new URLs to the "URL saving task", and all these tasks running concurrently; doesn't a notify/listen mechanism make sense here?
      • It's like 2 producers (the main task and the retry task) producing/notifying a channel, and the consumer/listener just has to process the URLs from the channel.
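The two-producers-one-consumer shape described above can be sketched with a plain channel, no LISTEN/NOTIFY involved. This is a minimal std-only illustration, not the project's actual code; the URLs and task bodies are hypothetical placeholders.

```rust
use std::sync::mpsc;
use std::thread;

// Two producers (the main task and the retry task) feed one consumer
// over a single channel; returns how many URLs the consumer received.
fn run_pipeline() -> usize {
    let (tx, rx) = mpsc::channel::<String>();
    let tx_retry = tx.clone();

    // main task: sends hypothetical freshly extracted URLs
    let main_task = thread::spawn(move || {
        for u in ["https://example.com/a", "https://example.com/b"] {
            tx.send(u.to_string()).unwrap();
        }
    });

    // retry task: re-queues a hypothetical previously failed URL
    let retry_task = thread::spawn(move || {
        tx_retry.send("https://example.com/retry".to_string()).unwrap();
    });

    main_task.join().unwrap();
    retry_task.join().unwrap();

    // the channel closes once both senders are dropped, ending the iterator
    rx.iter().count()
}

fn main() {
    println!("{}", run_pipeline());
}
```

In an async codebase the same shape would use `tokio::sync::mpsc` with two cloned senders, but the ownership and shutdown semantics are the same.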
      • mara42 has quit
      • Satyaraj[m] joined the channel
      • Satyaraj[m]
        lucifer (IRC): Do i need to include any UI mockups in the proposal?
      • aabbi15 joined the channel
      • aabbi15
        yvanzo: Yes thanks to you too for helping me throughout ;)
      • Yes I do know that i18next is limited to JS/TS projects but it does work well in those :)
      • mara42 joined the channel
      • aabbi15 has quit
      • lucifer
        Satyaraj[m]: not needed.
      • mara42 has quit
      • pranav[m] joined the channel
      • pranav[m] uploaded an image: (2345KiB) < https://matrix.moviebrainz.org/_matrix/media/v3/download/matrix.org/KVGFCaYbgcMxjSVfQonPpNsv/ima_3c196d4.jpeg >
      • pranav[m]
        Hey aerozol, instead of having a play button, can we play the song if a user taps on the card?
      • Because that component has already been created
      • eharris_ has quit
      • eharris joined the channel
      • mara42 joined the channel
      • huhridge joined the channel
      • mara42 has quit
      • mara42 joined the channel
      • Satyaraj[m] has quit
      • Shubham has quit
      • mara42 has quit
      • kellnerd
        monkey: Finally posted my reworked proposal on the forums: https://community.metabrainz.org/t/gsoc-2024-bo...
      • A bit later than I planned to, but still earlier than last year :sweat_smile:
      • mara42 joined the channel
      • I think the project still has enough potential for 350 hours, but I am not sure whether I can put so many hours into GSoC before the Delhi summit. So I would rather only apply for 175 hours and work on it longer if I can still find the time.
      • Let me know what you think about this thought and my proposed timeline.
      • mara42 has quit
      • huhridge has quit
      • mara42 joined the channel
      • Tarun_0x0 joined the channel
      • atj
      • mara42 has quit
      • mara42 joined the channel
      • Tarun_0x0 has quit
      • pranav[m] has quit
      • huhridge joined the channel
      • kellnerd
        🤦
      • theflash__ joined the channel
      • theflash__
        akshaaatt: hey!, can you please edit the ideas page with the new project name
      • I am submitting the proposal to the GSoC website
      • mara42 has quit
      • mara42 joined the channel
      • mara42 has quit
      • mara42 joined the channel
      • huhridge has quit
      • huhridge joined the channel
      • huhridge has quit
      • bitmap
        !m kellnerd
      • BrainzBot
        You're doing good work, kellnerd!
      • bitmap
        hi yellowhatpro
      • what you say makes a lot of sense if high concurrency is needed, but I think this is something else missing from the proposal
      • akshaaatt
        theflash__: that page doesn’t matter though. You can ignore it. Just submit your proposal to the website now. ✌️
      • bitmap
        yellowhatpro: if you have a mirror DB set up, you may be able to estimate how many URLs need to be saved per hour, and I expect even processing those serially would be enough
      • unless you are processing all old edits too, but that wasn't clear to me :) (because the older the edit is, the more likely that the links will be broken)
      • I also expected that the number of retries needed wouldn't be so high as to need a dedicated producer, but I could be wrong :)
      • yellowhatpro
        hi bitmap
      • Alright, I'm getting the point, but the rate of adding new URLs and the rate of saving them are still very high, no?
      • regarding the number of URLs to process, I was actually taking the difference of the first and last edit processed in a day
      • I saw around 30k new edits created last day (if I am not screwing up the math)
      • bitmap
        yes, it seems we have around 30k edits per day as an upper bound. however, not all of those will be modifying URLs, not all of them will have edit notes, and not all of the edit notes will have URLs
      • yellowhatpro
      • These edits + edit notes combined will be much greater.
      • yeah right, most don't have URLs
      • but some can have multiple URLs right?
      • Like we are not bound to 1 URL per edit_note
      • bitmap
        yeah, they can, but I'm guessing on average there will be less than 1 URL per edit. I could be wrong of course, which is why getting some statistics would be nice!
      • yellowhatpro
        On average, right.
      • kellnerd imagines a lot of a-tisket cache pages being archived soon
      • Is there any overhead if we do notify/listen mechanism?
      • bitmap
        not really, I was just bringing it up so that we don't prematurely overengineer things (which could make it more difficult to maintain)
      • yellowhatpro
        The main reason, as I understood it, is that I can easily manage different async tasks using a channel, passing messages to the URL-saving part
      • Oh right, I thought about it too this morning when I read the comment.
      • I was like waiiiiiit, can we?
      • can we just sql it
      • bitmap
        it's definitely possible to just select the next URL to be processed from the table; you could even do this concurrently without LISTEN/NOTIFY
      • theflash__
        akshaaatt: alright!
      • yellowhatpro
        Oh nice... can you give me some lead on this? Like, is there any special function or keyword to use?
      • bitmap
        I gave an example in my comment where you simply order the results by the criteria you want, and then select the first result with LIMIT 1. To do this concurrently, you can add FOR UPDATE ... SKIP LOCKED so that multiple tasks don't select the same URL
      • mara42 has quit
      • mara42 joined the channel
      • LISTEN/NOTIFY would be advantageous if we had hundreds of workers polling the database, but I don't think that amount of concurrency is needed here
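The claim query bitmap describes can be sketched in SQL. This is an illustrative fragment only: the `url_queue` table and its column names are hypothetical, not from the actual proposal or schema.

```sql
-- Each worker claims the next pending URL; SKIP LOCKED makes concurrent
-- workers skip rows another transaction has already claimed.
BEGIN;
SELECT id, url
  FROM url_queue
 WHERE status = 'pending'
 ORDER BY created_at          -- whatever ordering criteria you want
 LIMIT 1
   FOR UPDATE SKIP LOCKED;
-- ...call the save API for the selected URL, then update its status...
COMMIT;
```

The row lock is held only for the duration of the transaction, so long-running API calls would usually be done after marking the row as in-progress rather than inside the same transaction.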
      • yellowhatpro
        Right, it makes sense.
      • huhridge joined the channel
      • huhridge has quit
      • btw, I keep thinking about this. If the rate of production is > the rate of consumption of URLs, and assuming around 20k URLs to be processed on a daily basis (edit notes + edits + retries), will we be able to save every single one of them (12 URLs a minute * 60 * 24 = 17280 saves a day being our quota)?
      • Guest58 joined the channel
      • bitmap
        I don't think we will have anywhere near that many unique URLs per day, but if we do then we'll figure something out :)
      • keep in mind that even in edit forms where you can submit batches of edits, there is only one edit note field, so all the edits submitted in that session will have the same note
      • yellowhatpro
        whew!!
      • Ohh. I'll check it once then.
      • minimal joined the channel
      • Guest58 has quit
      • Aah yes. Another question. For the retry mechanism, do we just check when was the last time we tried/retried saving the URL? Say a URL fails to save for some reason; what can we do in that case? My thought was to keep a retry count, and after crossing that count we can drop the URL from the table. But can we tolerate a couple of URLs not being saved?
      • bitmap
        I suppose it depends on what kinds of errors the save API returns. if it's simply overloaded, then obviously we can retry, but if the error is inherent to the URL, then we shouldn't make another attempt
      • but a retry count may be needed to limit the number of attempts, yes
      • I haven't played with the SPN API enough to know, so it'd be nice to investigate this for your proposal :)
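The policy discussed above, retrying transient failures up to a count limit but never retrying errors inherent to the URL, can be sketched as a small decision function. The error variants and the limit of 5 are assumptions for illustration, not documented SPN API behavior.

```rust
// Hypothetical classification of save-API failures; the real SPN API's
// error responses would need to be investigated and mapped onto these.
enum SaveError {
    Transient, // e.g. service overloaded: worth retrying later
    Permanent, // e.g. the URL itself is invalid or blocked: give up
}

const MAX_RETRIES: u32 = 5; // assumed limit, to be tuned

// Decide whether a failed save should be attempted again.
fn should_retry(err: &SaveError, retry_count: u32) -> bool {
    match err {
        SaveError::Permanent => false,
        SaveError::Transient => retry_count < MAX_RETRIES,
    }
}

fn main() {
    assert!(should_retry(&SaveError::Transient, 0));
    assert!(!should_retry(&SaveError::Transient, MAX_RETRIES));
    assert!(!should_retry(&SaveError::Permanent, 0));
    println!("ok");
}
```

A retry timestamp plus this count would live alongside the URL row, so the "next URL to process" query can also defer rows that were retried too recently.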
      • yellowhatpro
        Yeah I did. I even mailed them, because apparently they can let a user save 100k URLs a day; the API is just one of the many ways they provide
      • huhridge joined the channel
      • munishk joined the channel
      • Kladky has quit
      • huhridge has quit
      • munishk
        Hi @lucifer, I tried with adding websockets to test.sh as well, but still suffering from the same issue 😢
      • mayhem: I have added the dates and removed server-side stretch goals from the proposal. Added items to the community bonding period
      • huhridge joined the channel
      • rimskii[m] joined the channel
      • rimskii[m]
        Hi, guys! Genuine question: do you think it is enough if you can import a max of 100 tracks from Spotify playlists? Or should I add more? I'm afraid there would be problems with a rate limit
      • huhridge has quit
      • Kladky joined the channel
      • huhridge joined the channel
      • theflash__ has quit
      • mara42 has quit
      • dvzrv has quit
      • dvzrv joined the channel
      • Tarun_0x0 joined the channel
      • huhridge
        lucifer: for the IA caching, how are we planning to seed it? like do we just use a collection?
      • or should we start with a collection, and then seed it with all the artists that are returned?
      • mayhem as well
      • huhridge has quit
      • munishk has quit
      • atj
        yellowhatpro: one small feature to consider might be a domain ignore list
      • you probably don't want to archive MB URLs for example
      • and some domains explicitly exclude IA
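atj's domain ignore list could be a simple host-suffix check before a URL is queued. A std-only sketch, with naive host extraction (a real implementation would likely use the `url` crate); the ignored domains shown are illustrative examples.

```rust
// Naive host extraction: take what sits between "://" and the next "/".
fn host(url: &str) -> Option<&str> {
    let rest = url.split("://").nth(1)?;
    rest.split('/').next()
}

// A URL is ignored if its host matches an ignored domain exactly
// or is a subdomain of one.
fn is_ignored(url: &str, ignore: &[&str]) -> bool {
    match host(url) {
        Some(h) => ignore
            .iter()
            .any(|d| h == *d || h.ends_with(&format!(".{}", d))),
        None => true, // unparseable URL: skip rather than try to archive it
    }
}

fn main() {
    let ignore = ["musicbrainz.org"]; // example entry: don't archive MB URLs
    assert!(is_ignored("https://musicbrainz.org/artist/x", &ignore));
    assert!(is_ignored("https://beta.musicbrainz.org/artist/x", &ignore));
    assert!(!is_ignored("https://example.com/page", &ignore));
    println!("ok");
}
```

The same list could also hold domains that explicitly exclude the Internet Archive, since both cases mean "never submit this host for archiving".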
      • huhridge joined the channel
      • rimskii[m] has quit
      • Derailed has quit
      • Derailed joined the channel
      • Tarun_0x0 has quit
      • huhridge has quit