#metabrainz


      • bitmap
        reosarevok: yvanzo: artwork-indexer-prod container is now running on serge
      • yvanzo
        🎉
      • bitmap
        seems to be working, except the IA went down right after I deployed it 😬
      • good test of the error handling at least
      • ursa-major has quit
      • ursa-major joined the channel
      • zas has quit
      • zas joined the channel
      • snobdiggy2 joined the channel
      • yellowhatpro joined the channel
      • minimal has quit
      • Maxr1998 has quit
      • Maxr1998 joined the channel
      • Shubham joined the channel
      • Kladky joined the channel
      • mara42 joined the channel
      • yellowhatpro
        Hi @bitmap, thanks for the reply on community forum. I've revisited the proposal, will soon make appropriate changes.
      • There are a couple of things I would love to discuss as well.
      • First of all, regarding the producer/consumer architecture.
      • There will be a task running to save URLs, which will receive URLs from a single source. But say I also have a task that retries saving any unsaved URLs, and the main task, which sends new URLs to the "URL saving task", and all these tasks running concurrently; doesn't a notify/listen mechanism make sense here?
      • It's like 2 producers (the main task and the retry task) producing/notifying a channel, and the consumer/listener just has to process the URLs from the channel.
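The two-producers-one-consumer shape described above can be sketched with a plain channel, no LISTEN/NOTIFY involved. This is a minimal std-only illustration, not the project's actual code; the URLs and task bodies are hypothetical placeholders.

```rust
use std::sync::mpsc;
use std::thread;

// Two producers (the main task and the retry task) feed one consumer
// over a single channel; returns how many URLs the consumer received.
fn run_pipeline() -> usize {
    let (tx, rx) = mpsc::channel::<String>();
    let tx_retry = tx.clone();

    // main task: sends hypothetical freshly extracted URLs
    let main_task = thread::spawn(move || {
        for u in ["https://example.com/a", "https://example.com/b"] {
            tx.send(u.to_string()).unwrap();
        }
    });

    // retry task: re-queues a hypothetical previously failed URL
    let retry_task = thread::spawn(move || {
        tx_retry.send("https://example.com/retry".to_string()).unwrap();
    });

    main_task.join().unwrap();
    retry_task.join().unwrap();

    // the channel closes once both senders are dropped, ending the iterator
    rx.iter().count()
}

fn main() {
    println!("{}", run_pipeline());
}
```

In an async codebase the same shape would use `tokio::sync::mpsc` with two cloned senders, but the ownership and shutdown semantics are the same.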
      • mara42 has quit
      • Satyaraj[m] joined the channel
      • Satyaraj[m]
        lucifer (IRC): Do i need to include any UI mockups in the proposal?
      • aabbi15 joined the channel
      • aabbi15
        yvanzo: Yes thanks to you too for helping me throughout ;)
      • Yes I do know that i18next is limited to JS/TS projects but it does work well in those :)
      • mara42 joined the channel
      • aabbi15 has quit
      • lucifer
        Satyaraj[m]: not needed.
      • mara42 has quit
      • pranav[m] joined the channel
      • pranav[m] uploaded an image: (2345KiB) < https://matrix.moviebrainz.org/_matrix/media/v3/download/matrix.org/KVGFCaYbgcMxjSVfQonPpNsv/ima_3c196d4.jpeg >
      • pranav[m]
        Hey aerozol, instead of having a play button, can we play the song if a user taps on the card?
      • Because that component has already been created
      • eharris_ has quit
      • eharris joined the channel
      • mara42 joined the channel
      • huhridge joined the channel
      • mara42 has quit
      • mara42 joined the channel
      • Satyaraj[m] has quit
      • Shubham has quit
      • mara42 has quit
      • kellnerd
        monkey: Finally posted my reworked proposal on the forums: https://community.metabrainz.org/t/gsoc-2024-bo...
      • A bit later than I planned to, but still earlier than last year :sweat_smile:
      • mara42 joined the channel
      • I think the project still has enough potential for 350 hours, but I am not sure whether I can put so many hours into GSoC before the Delhi summit. So I would rather only apply for 175 hours and work on it longer if I can still find the time.
      • Let me know what you think about this thought and my proposed timeline.
      • mara42 has quit
      • huhridge has quit
      • mara42 joined the channel
      • Tarun_0x0 joined the channel
      • atj
      • mara42 has quit
      • mara42 joined the channel
      • Tarun_0x0 has quit
      • pranav[m] has quit
      • huhridge joined the channel
      • kellnerd
        🤦
      • theflash__ joined the channel
      • theflash__
        akshaaatt: hey!, can you please edit the ideas page with the new project name
      • I am submitting the proposal to the GSoC website
      • mara42 has quit
      • mara42 joined the channel
      • mara42 has quit
      • mara42 joined the channel
      • huhridge has quit
      • huhridge joined the channel
      • huhridge has quit
      • bitmap
        !m kellnerd
      • BrainzBot
        You're doing good work, kellnerd!
      • bitmap
        hi yellowhatpro
      • what you say makes a lot of sense if high concurrency is needed, but I think this is something else missing from the proposal
      • akshaaatt
        theflash__: that page doesn’t matter though. You can ignore it. Just submit your proposal to the website now. ✌️
      • bitmap
        yellowhatpro: if you have a mirror DB set up, you may be able to estimate how many URLs need to be saved per hour, and I expect even processing those serially would be enough
      • unless you are processing all old edits too, but that wasn't clear to me :) (because the older the edit is, the more likely that the links will be broken)
      • I also expected that the number of retries needed wouldn't be so high as to need a dedicated producer, but I could be wrong :)
      • yellowhatpro
        hi bitmap
      • Alright, I'm getting the point, but the rate of adding new URLs and the rate of saving them are still very high, no?
      • regarding the number of URLs to process, I was actually taking the difference of the first and last edit processed in a day
      • I saw around 30k new edits created last day (if I am not screwing up the math)
      • bitmap
        yes, it seems we have around 30k edits per day as an upper bound. however, not all of those will be modifying URLs, not all of them will have edit notes, and not all of the edit notes will have URLs
      • yellowhatpro
      • These edits + edit notes combined will be much greater.
      • yeah right, most don't have URLs
      • but some can have multiple URLs right?
      • Like we are not bound to 1 URL per edit_note
      • bitmap
        yeah, they can, but I'm guessing on average there will be less than 1 URL per edit. I could be wrong of course, which is why getting some statistics would be nice!
      • yellowhatpro
        On average, right.
      • kellnerd imagines a lot of a-tisket cache pages being archived soon
      • Is there any overhead if we do notify/listen mechanism?
      • bitmap
        not really, I was just bringing it up so that we don't prematurely overengineer things (which could make it more difficult to maintain)
      • yellowhatpro
        The main reason, as I understood it, is that I can easily manage different async tasks using a channel, passing messages to the URL-saving part
      • Oh right, I thought about it too this morning when I read the comment.
      • I was like waiiiiiit, can we?
      • can we just sql it
      • bitmap
        it's definitely possible to just select the next URL to be processed from the table; you could even do this concurrently without LISTEN/NOTIFY
      • theflash__
        akshaaatt: alright!
      • yellowhatpro
        Oh nice... can you give me some lead on this? Like, is there any special function or keyword to use?
      • bitmap
        I gave an example in my comment where you simply order the results by the criteria you want, and then select the first result with LIMIT 1. To do this concurrently, you can add FOR UPDATE ... SKIP LOCKED so that multiple tasks don't select the same URL
      • mara42 has quit
      • mara42 joined the channel
      • LISTEN/NOTIFY would be advantageous if we had hundreds of workers polling the database, but I don't think that amount of concurrency is needed here
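The claim query bitmap describes can be sketched in SQL. This is an illustrative fragment only: the `url_queue` table and its column names are hypothetical, not from the actual proposal or schema.

```sql
-- Each worker claims the next pending URL; SKIP LOCKED makes concurrent
-- workers skip rows another transaction has already claimed.
BEGIN;
SELECT id, url
  FROM url_queue
 WHERE status = 'pending'
 ORDER BY created_at          -- whatever ordering criteria you want
 LIMIT 1
   FOR UPDATE SKIP LOCKED;
-- ...call the save API for the selected URL, then update its status...
COMMIT;
```

The row lock is held only for the duration of the transaction, so long-running API calls would usually be done after marking the row as in-progress rather than inside the same transaction.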
      • yellowhatpro
        Right, it makes sense.
      • huhridge joined the channel
      • huhridge has quit
      • btw, I keep thinking about this. If the rate of production is > the rate of consumption of URLs, and assuming around 20k URLs to be processed on a daily basis (edit notes + edits + retries), will we be able to save every single one of them (12 URLs a minute * 60 * 24 = 17280 saves a day being our quota)?
      • Guest58 joined the channel
      • bitmap
        I don't think we will have anywhere near that many unique URLs per day, but if we do then we'll figure something out :)
      • keep in mind that even in edit forms where you can submit batches of edits, there is only one edit note field, so all the edits submitted in that session will have the same note
      • yellowhatpro
        whew!!
      • Ohh. I'll check it once then.
      • minimal joined the channel
      • Guest58 has quit
      • Aah yes. Another question. For the retry mechanism, do we just check when was the last time we tried/retried saving the URL? Say a URL fails to save for some reason; what can we do in that case? My thought was to keep a retry count, and after crossing that count we can drop the URL from the table. But can we tolerate a couple of URLs not being saved?
      • bitmap
        I suppose it depends on what kinds of errors the save API returns. if it's simply overloaded, then obviously we can retry, but if the error is inherent to the URL, then we shouldn't make another attempt
      • but a retry count may be needed to limit the number of attempts, yes
      • I haven't played with the SPN API enough to know, so it'd be nice to investigate this for your proposal :)
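The policy discussed above, retrying transient failures up to a count limit but never retrying errors inherent to the URL, can be sketched as a small decision function. The error variants and the limit of 5 are assumptions for illustration, not documented SPN API behavior.

```rust
// Hypothetical classification of save-API failures; the real SPN API's
// error responses would need to be investigated and mapped onto these.
enum SaveError {
    Transient, // e.g. service overloaded: worth retrying later
    Permanent, // e.g. the URL itself is invalid or blocked: give up
}

const MAX_RETRIES: u32 = 5; // assumed limit, to be tuned

// Decide whether a failed save should be attempted again.
fn should_retry(err: &SaveError, retry_count: u32) -> bool {
    match err {
        SaveError::Permanent => false,
        SaveError::Transient => retry_count < MAX_RETRIES,
    }
}

fn main() {
    assert!(should_retry(&SaveError::Transient, 0));
    assert!(!should_retry(&SaveError::Transient, MAX_RETRIES));
    assert!(!should_retry(&SaveError::Permanent, 0));
    println!("ok");
}
```

A retry timestamp plus this count would live alongside the URL row, so the "next URL to process" query can also defer rows that were retried too recently.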
      • yellowhatpro
        Yeah I did. I even mailed them, because apparently they can let a user save 100k URLs a day; the API is just one of the many ways they provide
      • huhridge joined the channel
      • munishk joined the channel
      • Kladky has quit
      • huhridge has quit
      • munishk
        Hi @lucifer, I tried with adding websockets to test.sh as well, but still suffering from the same issue 😢
      • mayhem: I have added the dates and removed server-side stretch goals from the proposal. Added items to the community bonding period
      • huhridge joined the channel
      • rimskii[m] joined the channel
      • rimskii[m]
        Hi, guys! Genuine question: do you think it is enough if you can import a max of 100 tracks from Spotify playlists? Or should I add more? I'm afraid there would be problems with a rate limit
      • huhridge has quit
      • Kladky joined the channel
      • huhridge joined the channel
      • theflash__ has quit
      • mara42 has quit
      • dvzrv has quit
      • dvzrv joined the channel
      • Tarun_0x0 joined the channel
      • huhridge
        lucifer: for the IA caching, how are we planning to seed it? like do we just use a collection?
      • or should we start with a collection, and then seed it with all the artists that are returned?
      • mayhem as well
      • huhridge has quit
      • munishk has quit
      • atj
        yellowhatpro: one small feature to consider might be a domain ignore list
      • you probably don't want to archive MB URLs for example
      • and some domains explicitly exclude IA
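atj's domain ignore list could be a simple host-suffix check before a URL is queued. A std-only sketch, with naive host extraction (a real implementation would likely use the `url` crate); the ignored domains shown are illustrative examples.

```rust
// Naive host extraction: take what sits between "://" and the next "/".
fn host(url: &str) -> Option<&str> {
    let rest = url.split("://").nth(1)?;
    rest.split('/').next()
}

// A URL is ignored if its host matches an ignored domain exactly
// or is a subdomain of one.
fn is_ignored(url: &str, ignore: &[&str]) -> bool {
    match host(url) {
        Some(h) => ignore
            .iter()
            .any(|d| h == *d || h.ends_with(&format!(".{}", d))),
        None => true, // unparseable URL: skip rather than try to archive it
    }
}

fn main() {
    let ignore = ["musicbrainz.org"]; // example entry: don't archive MB URLs
    assert!(is_ignored("https://musicbrainz.org/artist/x", &ignore));
    assert!(is_ignored("https://beta.musicbrainz.org/artist/x", &ignore));
    assert!(!is_ignored("https://example.com/page", &ignore));
    println!("ok");
}
```

The same list could also hold domains that explicitly exclude the Internet Archive, since both cases mean "never submit this host for archiving".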
      • huhridge joined the channel
      • rimskii[m] has quit
      • Derailed has quit
      • Derailed joined the channel
      • Tarun_0x0 has quit
      • huhridge has quit