yvanzo[m]: right, I guess the idea is that the polling task would start from the very first edit (another scenario triggers couldn't cover)
2024-06-18 17053, 2024
yvanzo[m]
That wasn’t my idea, but it seems even better.
2024-06-18 17027, 2024
yellowhatpro[m]
I was thinking to archive the URLs that are there before the service starts, but should we be doing it simultaneously, while the archiver is archiving the current set of URLs
2024-06-18 17032, 2024
yellowhatpro[m]
?
2024-06-18 17017, 2024
yvanzo[m]
That is a possibility.
2024-06-18 17055, 2024
yellowhatpro[m]
But I am limiting my thoughts due to rate limiting as well.
2024-06-18 17055, 2024
yellowhatpro[m]
I know that we won't be getting many URLs per edit/edit note
2024-06-18 17001, 2024
bitmap[m]
I guess it depends on how fast it can archive the existing URLs
2024-06-18 17024, 2024
yellowhatpro[m]
* edit/edit note, but the previous ones are still many right
2024-06-18 17038, 2024
yvanzo[m]
It will be rate limited by the API.
2024-06-18 17054, 2024
yellowhatpro[m]
bitmap[m]: But we still are capped by IA rate limit right
2024-06-18 17031, 2024
yvanzo[m]
I don’t see any test in the repo.
2024-06-18 17038, 2024
bitmap[m]
yes, you can do a rough calculation based on the IA rate limit and number of existing URLs
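A minimal sketch of that rough calculation, in Rust. The function name is hypothetical, and any figures passed to it are placeholders: neither the actual IA rate limit nor the number of existing URLs is stated in this discussion.

```rust
use std::time::Duration;

/// Rough time needed to submit `url_count` URLs when at most
/// `requests_per_minute` submissions are allowed.
/// Returns `None` when the rate is zero, since the backlog would never drain.
fn estimated_backlog_duration(url_count: u64, requests_per_minute: u64) -> Option<Duration> {
    if requests_per_minute == 0 {
        return None;
    }
    // Round up so that a partially used minute still counts as a full one.
    let partial_minute = u64::from(url_count % requests_per_minute != 0);
    let minutes = url_count / requests_per_minute + partial_minute;
    Some(Duration::from_secs(minutes * 60))
}
```

For instance, one million URLs at an assumed 10 requests per minute comes to 100,000 minutes, roughly 69 days, which gives an idea of whether the backlog and newly added URLs can share the same quota.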
2024-06-18 17025, 2024
yvanzo[m]
Do you have a mock API for testing, or is it always running with the real API?
2024-06-18 17032, 2024
yellowhatpro[m]
yvanzo[m]: yeah sorry, I was exploring tests past week.
2024-06-18 17032, 2024
yellowhatpro[m]
Didn't add any rn. Will start writing them
2024-06-18 17057, 2024
yellowhatpro[m]
yvanzo[m]: Currently there is no mock API. I am assuming the URL to be saved, and then it would return some JOB_ID, which I would update in the internet_archive_urls table
In MBS, we do record/replay HTTP requests for integration tests.
2024-06-18 17035, 2024
bitmap[m]
<yellowhatpro[m]> "I see. I will try to keep it..." <- I wouldn't expect a separate repo is needed, rather an existing mock API like yvanzo referenced is used
2024-06-18 17008, 2024
yellowhatpro[m]
Yupp, will add the required mocks then.
2024-06-18 17013, 2024
yellowhatpro[m]
Any tip for db tests btw?
2024-06-18 17014, 2024
yellowhatpro[m]
Like what areas should I cover in that aspect?
2024-06-18 17054, 2024
yvanzo[m]
Start with something very simple, such as hand-crafted examples if needed.
2024-06-18 17002, 2024
bitmap[m]
<yellowhatpro[m]> "What other aspects should we..." <- how are you generating this list? automatically adding domains that fail consistently? allowing manual entries?
2024-06-18 17010, 2024
yellowhatpro[m]
Currently was thinking to manually enter, but I shall check if there is any domain filtering list
2024-06-18 17032, 2024
yellowhatpro[m]
Adding domains that fail consistently is one good thing I can implement
2024-06-18 17004, 2024
yvanzo[m]
If you need an example to start with, musicbrainz.org URLs should be filtered out.
2024-06-18 17004, 2024
yellowhatpro[m]
Like if there is any site, which fails to archive ever after x retries, I can prolly add it to some domain filtering table
2024-06-18 17022, 2024
yvanzo[m]
A manually-curated list would be nice to have just in case some domain names must be temporarily ignored.
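A minimal sketch of such a manually curated exclusion list, assuming it is held as a plain slice of domain names. The function names `host_of` and `is_excluded` are hypothetical, and the host extraction is deliberately simplistic (it does not handle IPv6 literals, for example); a real implementation would more likely rely on a URL-parsing crate.

```rust
/// Extracts the lowercase host from an absolute URL using only the standard library.
/// Simplistic on purpose: IPv6 literals and other edge cases are not handled.
fn host_of(url: &str) -> Option<String> {
    let (_, rest) = url.split_once("://")?;
    // The authority part ends at the first '/', '?' or '#'.
    let authority = rest.split(|c: char| matches!(c, '/' | '?' | '#')).next()?;
    // Drop optional userinfo ("user:pass@") and port (":8080").
    let host_port = authority.rsplit('@').next()?;
    let host = host_port.split(':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host.to_ascii_lowercase())
    }
}

/// Returns true when the URL's host is one of `excluded_domains`
/// or a subdomain of one of them. Unparseable URLs are not excluded here.
fn is_excluded(url: &str, excluded_domains: &[&str]) -> bool {
    match host_of(url) {
        Some(host) => excluded_domains
            .iter()
            .any(|domain| host == *domain || host.ends_with(&format!(".{domain}"))),
        None => false,
    }
}
```

With `["musicbrainz.org"]` as the list, both `https://musicbrainz.org/...` and `https://beta.musicbrainz.org/...` would be skipped, while a lookalike such as `notmusicbrainz.org` would not. Domains that fail consistently could later be appended to the same list.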
2024-06-18 17036, 2024
yellowhatpro[m]
Right
2024-06-18 17038, 2024
discordbrainz
<05rustynova> DB tests depend on your DB driver. Sqlx is auto testing on your DB at compile time. Add the serde feature and it's done.
2024-06-18 17052, 2024
discordbrainz
<05rustynova> CDD FTW
2024-06-18 17054, 2024
fletchto99_ joined the channel
2024-06-18 17020, 2024
yellowhatpro[m]
Yupp I am using sqlx
2024-06-18 17029, 2024
yellowhatpro[m]
Oh that's cool
2024-06-18 17002, 2024
fletchto99 has quit
2024-06-18 17002, 2024
fletchto99_ is now known as fletchto99
2024-06-18 17050, 2024
yvanzo[m]
bitmap, yellowhatpro: What else did you want to discuss?
2024-06-18 17052, 2024
discordbrainz
<05rustynova> I know that sea orm also uses sqlx under the hood, but I don't know about the compile time checking. Although for ORMs it's just a matter of checking the schema in the code against the schema in the DB.
2024-06-18 17058, 2024
yellowhatpro[m]
yvanzo[m]: I also remember reo once said that we should prevent spam users edits as well iirc.
2024-06-18 17012, 2024
yvanzo[m]
Yes but that is a separate issue.
2024-06-18 17046, 2024
yvanzo[m]
Basically, when retrieving rows from edit_data, it should be checked that the edit author isn’t marked as spammer.
2024-06-18 17006, 2024
yvanzo[m]
Same for edit_note.
2024-06-18 17038, 2024
yvanzo[m]
Also consider edit_note_change for later on.
2024-06-18 17039, 2024
yellowhatpro[m]
<yvanzo[m]> "That is one more use case for..." <- Also this. Sorry, I think I misunderstood earlier, but is it just a cli script to write, right? Sorta simple methods to write entries to `internet_archive_urls` ?
2024-06-18 17039, 2024
yellowhatpro[m]
Can you elaborate a bit on this. I might have misunderstood it from earlier meet as well.
2024-06-18 17038, 2024
yvanzo[m]
yellowhatpro: Ok, from “week 1”, “a small, workable app, which can perform the above-mentioned tasks” and “will be tested individually” made me think that the app would be able to perform any of these tasks individually when requested through arguments.
2024-06-18 17017, 2024
atj[m]
zas, yvanzo: just FYI, now that we have a firewall I've updated Solr to listen on the wildcard address, so you can make API requests to localhost as well as the LAN IP. The HAProxy backend configuration is now the same on all nodes.
2024-06-18 17039, 2024
yellowhatpro[m]
Oh got it. what I meant was to work on them separately, and checking if I was able to get the result from it correctly.
2024-06-18 17045, 2024
atj[m]
I've force-pushed the updates to my solr branch in the metabrainz-ansible repo
2024-06-18 17008, 2024
yvanzo[m]
For example: To queue a URL for archiving: `melba --queue-url http://example.com/`; To queue an edit for archiving the URLs it contains: `melba --queue-edit 42`
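A minimal sketch of how those two options could be parsed, using only the standard library to stay dependency-free. The `Command` enum and `parse_args` function are hypothetical names, not part of the actual project; an argument-parsing crate would normally replace this hand-rolled matching.

```rust
/// The two tasks from the example invocations above.
#[derive(Debug, PartialEq)]
enum Command {
    QueueUrl(String),
    QueueEdit(u64),
}

/// Parses `--queue-url <URL>` or `--queue-edit <ID>` from the arguments
/// that follow the program name.
fn parse_args<I>(args: I) -> Result<Command, String>
where
    I: IntoIterator<Item = String>,
{
    let mut args = args.into_iter();
    let flag = args
        .next()
        .ok_or("expected --queue-url or --queue-edit")?;
    let value = args
        .next()
        .ok_or_else(|| format!("{flag} requires a value"))?;
    match flag.as_str() {
        "--queue-url" => Ok(Command::QueueUrl(value)),
        "--queue-edit" => value
            .parse::<u64>()
            .map(Command::QueueEdit)
            .map_err(|e| format!("invalid edit id {value:?}: {e}")),
        other => Err(format!("unknown option {other}")),
    }
}
```

In a binary this would be called as `parse_args(std::env::args().skip(1))`, with each `Command` variant dispatched to the matching task so that every task can be run and debugged on its own.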
2024-06-18 17014, 2024
yellowhatpro[m]
For that purpose, I thought working on separate branches would be fine
2024-06-18 17003, 2024
bitmap[m]
<yvanzo[m]> "bitmap, yellowhatpro: What..." <- sorry I had to go afk for a moment, nothing specific from me, just to answer any of yellowhatpro's concerns
2024-06-18 17036, 2024
yvanzo[m]
yellowhatpro[m]: It has to be in the same branch eventually.
2024-06-18 17016, 2024
yellowhatpro[m]
yvanzo[m]: I didn't think it that way, but I can come up with cmd line args feature support
2024-06-18 17035, 2024
yellowhatpro[m]
yvanzo[m]: Yupp got the point
2024-06-18 17002, 2024
bitmap[m]
<yellowhatpro[m]> "Any tip for db tests btw?..." <- look at what DB queries the archiver is making, and make sure you have test data that satisfies each type of query. at least one example for each edit type
2024-06-18 17012, 2024
yvanzo[m]
yellowhatpro[m]: See the crate clap
2024-06-18 17039, 2024
yellowhatpro[m]
bitmap[m]: Got it, will work on this ✅
2024-06-18 17024, 2024
yellowhatpro[m]
yvanzo[m]: yeah I have used it in the past so will impl it soon
2024-06-18 17055, 2024
yellowhatpro[m]
So to summarize... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/LIVSpudkUJfryuVSSRTXJkzP>)
2024-06-18 17056, 2024
yellowhatpro[m]
I should cover these aspects then
2024-06-18 17001, 2024
yvanzo[m]
Did you have any other question about transformations?
2024-06-18 17034, 2024
yellowhatpro[m]
Not at the moment, I was only curious about the filtering part, and any other edit types
Even .jpeg images can be used as reference, better archiving these too.
2024-06-18 17009, 2024
yellowhatpro[m]
Alrightyy. media stays
2024-06-18 17047, 2024
yellowhatpro[m]
<yellowhatpro[m]> "So to summarize..." <- > <@yellowhatpro:matrix.org> So to summarize... (full message at <https://matrix.chatbrainz.org/_matrix/media/v3/download/chatbrainz.org/NsJtSxANqsdxrIyPQtzgkxuG>)
2024-06-18 17052, 2024
yvanzo[m]
About your summary: Tests can be split into documentation tests (the easiest to start with) and unit tests (simple to add as well and very useful). Integration tests can be delayed until there is proper mocking of SQL and HTTP.
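A minimal sketch of what a documentation test and a unit test look like side by side in Rust. The function `extract_urls` and the crate path `melba::extract_urls` are hypothetical, chosen only for illustration; the doc example uses a tilde fence, which rustdoc treats like a backtick fence. Documentation tests run for library crates, so this assumes the function lives in a library target.

```rust
/// Returns the whitespace-separated words of `text` that look like HTTP(S) URLs.
///
/// # Examples
///
/// ~~~
/// use melba::extract_urls; // hypothetical crate path
/// let urls = extract_urls("source: https://example.com/a (checked)");
/// assert_eq!(urls, vec!["https://example.com/a"]);
/// ~~~
pub fn extract_urls(text: &str) -> Vec<&str> {
    text.split_whitespace()
        .filter(|word| word.starts_with("http://") || word.starts_with("https://"))
        .collect()
}

#[cfg(test)]
mod tests {
    use super::*;

    // Unit test with a hand-crafted input and no URL in it.
    #[test]
    fn ignores_text_without_urls() {
        assert!(extract_urls("no links here").is_empty());
    }

    // Unit test covering several URLs in one note.
    #[test]
    fn finds_every_url() {
        let note = "http://example.com/ and https://example.org/x";
        assert_eq!(
            extract_urls(note),
            vec!["http://example.com/", "https://example.org/x"]
        );
    }
}
```

Both kinds run with `cargo test` and need neither a database nor network access, which is why they can come before integration tests.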
2024-06-18 17022, 2024
yvanzo[m]
I would suggest to prioritize tests at first (as it can help with spotting basic issues), then command-line (as it can help with debugging), and the rest.
2024-06-18 17046, 2024
yellowhatpro[m]
Thanks. working on it ✅
2024-06-18 17047, 2024
yvanzo[m]
You’re doing well, keep it up! 😎
2024-06-18 17057, 2024
yvanzo[m]
bitmap, yellowhatpro: Thanks for your time!
2024-06-18 17009, 2024
atj[m]
Sounds like this is going to be a really useful project
2024-06-18 17052, 2024
atj[m]
I've looked at IA in vain many times trying to see if edits were correct etc.
2024-06-18 17009, 2024
yellowhatpro[m]
yvanzo[m]: Thanks. I didn't outline many things earlier in proposal, but as we move I can see there are many things to work on. Thanks y'all for help, will make things work
2024-06-18 17006, 2024
yellowhatpro[m]
atj[m]: Oh cool. Well that's motivation
2024-06-18 17046, 2024
atj[m]
yvanzo: I don't know what your thoughts are but I feel like we either do the Solr switch tomorrow or wait 2 weeks until I'm back from holiday.
2024-06-18 17048, 2024
yvanzo[m]
atj, bitmap: I patched all the MB web containers to allow switching to the new cluster (using HTTPS).
2024-06-18 17050, 2024
v6lur joined the channel
2024-06-18 17002, 2024
atj[m]
Is there a new mb-solr release? I'm using a snapshot extracted from the Docker build I think
reosarevok: yvanzo: MBS-13628 is caused by the production-cron container only having stub edit classes for the EAA edit types, which don't implement the proper `alter_edit_pending` methods.
2024-06-18 17054, 2024
bitmap[m]
I'm wondering if we should temporarily deploy the beta image there, since it would be difficult to extract all of the necessary bits of code needed to the production branch (and complicate reverting the changes later)
We have type descriptions, so that's trivial, it just needs copying from https://beta.musicbrainz.org/admin/attributes/Eve… (ideally we'd actually make that public but the PR doing that is still not approved IIRC so we can put them on the wiki for now)
2024-06-18 17010, 2024
bitmap[m]
BrainzBot: determining which images are marked as front is unfortunately a schema change
2024-06-18 17014, 2024
yvanzo[m]
bitmap: It can be useful to deploy beta to production-cron to also test using the PG standby.
2024-06-18 17021, 2024
reosarevok[m]
The non-poster one very much does not seem in any case like it needs to happen in beta
2024-06-18 17051, 2024
reosarevok[m]
Oh, if it can't, more for that, but in any case, it's not a beta ticket really, it's a perfectly fine long term improvement suggeston
2024-06-18 17056, 2024
reosarevok[m]
s/suggeston/suggestion/
2024-06-18 17047, 2024
yvanzo[m]
It isn’t complicated to implement either.
2024-06-18 17003, 2024
yvanzo[m]
We can make a decision at least before forgetting about it.
2024-06-18 17024, 2024
bitmap[m]
I was also thinking we were just querying /front, but that's not accurate, we should just be able to query for a fallback image from the database
2024-06-18 17059, 2024
bitmap[m]
but indexing them as front images is a schema change at least
2024-06-18 17020, 2024
reosarevok[m]
Allowing something like flyer seems fine to me
2024-06-18 17021, 2024
yvanzo[m]
front is for release? are we discussing events?
2024-06-18 17024, 2024
reosarevok[m]
As fallback
2024-06-18 17039, 2024
reosarevok[m]
Yes, events, but it's still the concept of "frontiest" because there's no better way :)