reosarevok: yvanzo: artwork-indexer-prod container is now running on serge
yvanzo
🎉
bitmap
seems to be working, except the IA went down right after I deployed it 😬
good test of the error handling at least
ursa-major has quit
ursa-major joined the channel
zas has quit
zas joined the channel
snobdiggy2 joined the channel
yellowhatpro joined the channel
minimal has quit
Maxr1998 has quit
Maxr1998 joined the channel
Shubham joined the channel
Kladky joined the channel
mara42 joined the channel
yellowhatpro
Hi @bitmap, thanks for the reply on the community forum. I've revisited the proposal, and will soon make the appropriate changes.
There are a couple of things I would love to discuss as well.
First of all, regarding the producer/consumer architecture.
There will be a task running to save URLs, which will receive the URLs from a single source. But say I also have a task that retries saving any unsaved URL, plus the main task, which will send the URL-saving task new URLs, and all these tasks run concurrently; doesn't a notify/listen mechanism make sense here?
It's like 2 producers (the main task and the retry task) producing/notifying a channel, and the consumer/listener just has to process the URLs from the channel.
mara42 has quit
Satyaraj[m] joined the channel
Satyaraj[m]
lucifer (IRC): Do i need to include any UI mockups in the proposal?
aabbi15 joined the channel
aabbi15
yvanzo: Yes thanks to you too for helping me throughout ;)
Yes I do know that i18next is limited to JS/TS projects but it does work well in those :)
mara42 joined the channel
aabbi15 has quit
lucifer
Satyaraj[m]: not needed.
mara42 has quit
pranav[m] joined the channel
pranav[m] uploaded an image: (2345KiB) < https://matrix.moviebrainz.org/_matrix/media/v3/download/matrix.org/KVGFCaYbgcMxjSVfQonPpNsv/ima_3c196d4.jpeg >
pranav[m]
Hey aerozol, instead of having a play button, can we play the song if a user taps on the card?
A bit later than I planned to, but still earlier than last year :sweat_smile:
mara42 joined the channel
I think the project still has enough potential for 350 hours, but I am not sure whether I can put so many hours into GSoC before the Delhi summit. So I would rather only apply for 175 hours and work on it longer if I can still find the time.
Let me know what you think about this thought and my proposed timeline.
akshaaatt: hey! Can you please edit the ideas page with the new project name?
I am submitting my proposal to the GSoC website
mara42 has quit
mara42 joined the channel
mara42 has quit
mara42 joined the channel
huhridge has quit
huhridge joined the channel
huhridge has quit
bitmap
!m kellnerd
BrainzBot
You're doing good work, kellnerd!
bitmap
hi yellowhatpro
what you say makes a lot of sense if high concurrency is needed, but I think that's something else that's missing from the proposal
akshaaatt
theflash__: that page doesn’t matter though. You can ignore it. Just submit your proposal to the website now. ✌️
bitmap
yellowhatpro: if you have a mirror DB set up, you may be able to estimate how many URLs need to be saved per hour, and I expect even processing those serially would be enough
unless you are processing all old edits too, but that wasn't clear to me :) (because the older the edit is, the more likely that the links will be broken)
I also expected that the number of retries needed wouldn't be so high as to need a dedicated producer, but I could be wrong :)
yellowhatpro
hi bitmap
Aight, I am getting the point, but the rate of adding new URLs and the rate of saving them is still very high, no?
regarding the number of URLs to process, I was actually taking the difference of the first and last edit processed in a day
I saw around 30k new edits created last day (if I am not screwing up the math)
bitmap
yes, it seems we have around 30k edits per day as an upper bound. however, not all of those will be modifying URLs, not all of them will have edit notes, and not all of the edit notes will have URLs
yellowhatpro
These edits + edit notes combined will be much greater,
yeah right, most don't have URLs
but some can have multiple URLs right?
Like we are not bound to 1 URL per edit_note
bitmap
yeah, they can, but I'm guessing on average there will be less than 1 URL per edit. I could be wrong of course, which is why getting some statistics would be nice!
yellowhatpro
On average, right.
kellnerd imagines a lot of a-tisket cache pages being archived soon
Is there any overhead if we do notify/listen mechanism?
bitmap
not really, I was just bringing it up so that we don't prematurely overengineer things (which could make it more difficult to maintain)
yellowhatpro
The main reason, as I understood it, is that I can easily manage different async tasks using a channel, passing messages to the URL-saving part
Oh right, I thought about it too this morning when I read the comment.
I was like waiiiiiit, can we?
can we just sql it
bitmap
it's definitely possible to just select the next URL to be processed from the table; you could even do this concurrently without LISTEN/NOTIFY
theflash__
akshaaatt: alright!
yellowhatpro
Oh nice.. can you give me some pointers on this? Like, is there any special function or keyword to use?
bitmap
I gave an example in my comment where you simply order the results by the criteria you want, and then select the first result with LIMIT 1. To do this concurrently, you can add FOR UPDATE ... SKIP LOCKED so that multiple tasks don't select the same URL
mara42 has quit
mara42 joined the channel
LISTEN/NOTIFY would be advantageous if we had hundreds of workers polling the database, but I don't think that amount of concurrency is needed here
yellowhatpro
Right, it makes sense.
huhridge joined the channel
huhridge has quit
btw, I keep thinking about this: if the rate of production > rate of consumption of URLs, and assuming around 20k URLs to be processed on a daily basis (edit notes + edits + retries), will we be able to save every single one of them (12 URLs a minute × 60 × 24 = 17280 saves a day being our quota)?
Guest58 joined the channel
bitmap
I don't think we will have anywhere near that many unique URLs per day, but if we do then we'll figure something out :)
keep in mind that even in edit forms where you can submit batches of edits, there is only one edit note field, so all the edits submitted in that session will have the same note
yellowhatpro
whew!!
Ohh. I'll check it once then.
minimal joined the channel
Guest58 has quit
Aah yes. Another question. For the retry mechanism, do we just check when the last time we tried/retried saving the URL was? Say a URL can't be saved for some reason, what can we do in that case? My thought was to keep a retry count, and after crossing that count we can drop the URL from the table. But can we tolerate a couple of URLs not being saved?
bitmap
I suppose it depends on what kinds of errors the save API returns. if it's simply overloaded, then obviously we can retry, but if the error is inherent to the URL, then we shouldn't make another attempt
but a retry count may be needed to limit the number of attempts, yes
I haven't played with the SPN API enough to know, so it'd be nice to investigate this for your proposal :)
yellowhatpro
Yeah I did. I even mailed them, because apparently they can let a user save 100k URLs a day; it's just that the API is one of the many ways they provide
huhridge joined the channel
munishk joined the channel
Kladky has quit
huhridge has quit
munishk
Hi @lucifer, I tried with adding websockets to test.sh as well, but still suffering from the same issue 😢
mayhem: I have added the dates and removed the server-side stretch goals from the proposal. Added items to the community bonding period
huhridge joined the channel
rimskii[m] joined the channel
rimskii[m]
Hi, guys! Genuine question: do you think it's enough if you can import a max of 100 tracks from Spotify playlists? Or should I add more? I'm afraid there would be problems with the rate limit
huhridge has quit
Kladky joined the channel
huhridge joined the channel
theflash__ has quit
mara42 has quit
dvzrv has quit
dvzrv joined the channel
Tarun_0x0 joined the channel
huhridge
lucifer: for the IA caching, how are we planning to seed it? like do we just use a collection?
or should we start with a collection, and then seed it with all the artists that are returned?
mayhem as well
huhridge has quit
munishk has quit
atj
yellowhatpro: one small feature to consider might be a domain ignore list
you probably don't want to archive MB URLs for example