yvanzo[m]: right, I guess the idea is that the polling task would start from the very first edit (another scenario triggers couldn't cover)
yvanzo[m]
That wasn’t my idea, but it seems even better.
yellowhatpro[m]
I was thinking to archive the URLs that are there before the service starts, but should we be doing it simultaneously, while the archiver is archiving the current set of URLs?
yvanzo[m]
That is a possibility.
yellowhatpro[m]
But I am limiting my thoughts due to rate limiting as well.
I know that we won't be getting many URLs per edit/edit note
bitmap[m]
I guess it depends on how fast it can archive the existing URLs
yellowhatpro[m]
* edit/edit note, but the previous ones are still many right
yvanzo[m]
It will be rate limited by the API.
yellowhatpro[m]
bitmap[m]: But we still are capped by IA rate limit right
yvanzo[m]
I don’t see any test in the repo.
bitmap[m]
yes, you can do a rough calculation based on the IA rate limit and number of existing URLs
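The rough calculation mentioned here can be sketched as follows. This is only an illustration in Python (the archiver itself is a Rust project); the function name and both figures in the example call are hypothetical and do not come from the discussion or from the actual IA limits.

```python
def backlog_days(url_count: int, requests_per_minute: float) -> float:
    """Days needed to submit `url_count` URLs at a steady request rate."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    minutes = url_count / requests_per_minute
    return minutes / (60 * 24)


# Hypothetical figures: 1,000,000 existing URLs, 10 requests per minute.
print(round(backlog_days(1_000_000, 10), 1))  # prints 69.4
```

Plugging in the real rate limit and the real count of existing URLs would show whether archiving the backlog alongside new edits is practical.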
yvanzo[m]
Do you have a mock API for testing, or is it always running with the real API?
yellowhatpro[m]
yvanzo[m]: yeah sorry, I was exploring tests past week.
Didn't add any rn. Will start writing them
yvanzo[m]: Currently there is no mock API. I am assuming the URL to be saved, and then it would return some JOB_ID, which I would update in the internet_archive_urls table
In MBS, we do record/replay HTTP requests for integration tests.
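A minimal sketch of what a mock for the save call could look like, assuming the flow described above (submit a URL, get back a job ID, store it). It is written in Python for brevity rather than in the project's Rust; `FakeArchiveClient`, `archive_url`, and the dict standing in for the `internet_archive_urls` table are all hypothetical names, not the project's code or the real API.

```python
import itertools


class FakeArchiveClient:
    """Stand-in for the real save API: never touches the network."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.requested = []  # every URL the code under test asked to save

    def save(self, url: str) -> str:
        self.requested.append(url)
        return f"job-{next(self._ids)}"  # made-up job ID format


def archive_url(client, table: dict, url: str) -> str:
    """Submit `url` and record the returned job ID for it."""
    job_id = client.save(url)
    table[url] = {"job_id": job_id}  # dict stands in for the DB table
    return job_id
```

Because the client is passed in, tests can use the fake while the service uses the real one.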
bitmap[m]
<yellowhatpro[m]> "I see. I will try to keep it..." <- I wouldn't expect a separate repo is needed, rather an existing mock API like yvanzo referenced is used
yellowhatpro[m]
Yupp, will add the required mocks then.
Any tip for db tests btw?
Like what areas should I cover in that aspect?
yvanzo[m]
Start with something very simple, such as hand-crafted examples if needed.
bitmap[m]
<yellowhatpro[m]> "What other aspects should we..." <- how are you generating this list? automatically adding domains that fail consistently? allowing manual entries?
yellowhatpro[m]
Currently was thinking to manually enter, but I shall check if there is any domain filtering list
Adding domains that fail consistently is one good thing I can implement
yvanzo[m]
If you need an example to start with, musicbrainz.org URLs should be filtered out.
yellowhatpro[m]
Like if there is any site, which fails to archive even after x retries, I can prolly add it to some domain filtering table
yvanzo[m]
A manually-curated list would be nice to have just in case some domain names must be temporarily ignored.
yellowhatpro[m]
Right
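The two ideas above (a manually curated list, plus skipping domains that keep failing) could be combined roughly as below. This is a sketch in Python, not the project's Rust code; the class name, the subdomain matching, and the default threshold of 3 failures are assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit


class DomainFilter:
    def __init__(self, manual_denylist, max_failures=3):
        self.manual = {d.lower() for d in manual_denylist}
        self.max_failures = max_failures  # the "x retries" threshold
        self.failures = Counter()         # failed attempts per host

    @staticmethod
    def _host(url):
        return (urlsplit(url).hostname or "").lower()

    def _manually_blocked(self, host):
        # Match the listed domain itself and any of its subdomains.
        return any(host == d or host.endswith("." + d) for d in self.manual)

    def record_failure(self, url):
        self.failures[self._host(url)] += 1

    def should_archive(self, url):
        host = self._host(url)
        if not host or self._manually_blocked(host):
            return False
        return self.failures[host] < self.max_failures


f = DomainFilter(["musicbrainz.org"])
print(f.should_archive("https://musicbrainz.org/artist/x"))  # prints False
print(f.should_archive("http://example.com/page"))           # prints True
```

In the service the failure counts would presumably live in a table rather than in memory, so they survive restarts.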
discordbrainz
<05rustynova> DB tests depend on your DB driver. Sqlx is auto testing on your DB at compile time. Add the serde feature and it's done.
<05rustynova> CDD FTW
yellowhatpro[m]
Yupp I am using sqlx
Oh that's cool
yvanzo[m]
bitmap, yellowhatpro: What else did you want to discuss?
discordbrainz
<05rustynova> I know that SeaORM also uses sqlx under the hood, but I don't know about the compile time checking. Although for ORMs it's just a matter of checking the schema in the code against the schema in the DB.
yellowhatpro[m]
yvanzo[m]: I also remember reo once said that we should prevent spam users' edits as well iirc.
yvanzo[m]
Yes but that is a separate issue.
Basically, when retrieving rows from edit_data, it should be checked that the edit author isn’t marked as spammer.
Same for edit_note.
Also consider edit_note_change for later on.
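The spammer check described above amounts to joining back to the edit author when reading `edit_data`. Below is a sketch using Python's `sqlite3` with a deliberately simplified, hypothetical schema: the `is_spammer` column and the table layouts are assumptions for illustration, and the real MusicBrainz schema may store the flag differently.

```python
import sqlite3


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Simplified stand-ins for the real tables (hypothetical columns).
        CREATE TABLE editor (id INTEGER PRIMARY KEY, is_spammer INTEGER NOT NULL);
        CREATE TABLE edit (id INTEGER PRIMARY KEY, editor INTEGER NOT NULL);
        CREATE TABLE edit_data (edit INTEGER PRIMARY KEY, data TEXT NOT NULL);
        INSERT INTO editor VALUES (1, 0), (2, 1);
        INSERT INTO edit VALUES (10, 1), (11, 2);
        INSERT INTO edit_data VALUES
            (10, '{"url": "http://example.com/"}'),
            (11, '{"url": "http://spam.example/"}');
    """)
    return conn


def edit_data_from_non_spammers(conn):
    """Return (edit id, data) rows whose author is not marked as spammer."""
    cursor = conn.execute("""
        SELECT ed.edit, ed.data
        FROM edit_data AS ed
        JOIN edit AS e ON e.id = ed.edit
        JOIN editor AS u ON u.id = e.editor
        WHERE u.is_spammer = 0
        ORDER BY ed.edit
    """)
    return cursor.fetchall()


print([row[0] for row in edit_data_from_non_spammers(make_demo_db())])  # prints [10]
```

The same join shape would apply to `edit_note`, keyed on the note's author instead of the edit's.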
yellowhatpro[m]
<yvanzo[m]> "That is one more use case for..." <- Also this. Sorry, I think I misunderstood earlier, but is it just a cli script to write, right? Sorta simple methods to write entries to `internet_archive_urls` ?
Can you elaborate a bit on this. I might have misunderstood it from earlier meet as well.
yvanzo[m]
yellowhatpro: Ok, from “week 1”, “a small, workable app, which can perform the above-mentioned tasks” and “will be tested individually” made me think that the app would be able to perform any of these tasks individually when requested through arguments.
atj[m]
zas, yvanzo: just FYI, now that we have a firewall I've updated Solr to listen on the wildcard address, so you can make API requests to localhost as well as the LAN IP. The HAProxy backend configuration is now the same on all nodes.
yellowhatpro[m]
Oh got it. what I meant was to work on them separately, and checking if I was able to get the result from it correctly.
atj[m]
I've force-pushed the updates to my solr branch in the metabrainz-ansible repo
yvanzo[m]
For example: To queue a URL for archiving: melba --queue-url http://example.com/; To queue an edit for archiving the URLs it contains: melba --queue-edit 42
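The two invocations above could be parsed along these lines. This sketch uses Python's `argparse` as a stand-in; the project is Rust and would use its own argument parser. The flag names come from the message, while making them mutually exclusive and falling back to a polling loop when neither is given are assumptions.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="melba")
    # Assumption: only one task is requested per invocation.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--queue-url", metavar="URL",
                       help="queue a single URL for archiving")
    group.add_argument("--queue-edit", metavar="EDIT_ID", type=int,
                       help="queue the URLs contained in an edit")
    return parser


def describe(argv):
    """Return a description of the task the arguments select."""
    args = build_parser().parse_args(argv)
    if args.queue_url is not None:
        return f"queue URL {args.queue_url}"
    if args.queue_edit is not None:
        return f"queue URLs from edit {args.queue_edit}"
    return "run the default polling loop"  # assumed default behaviour


print(describe(["--queue-edit", "42"]))  # prints: queue URLs from edit 42
```

Exposing each task as its own flag also makes it easy to exercise them one at a time while debugging.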
yellowhatpro[m]
For that purpose, I thought working on separate branches would be fine
bitmap[m]
<yvanzo[m]> "bitmap, yellowhatpro: What..." <- sorry I had to go afk for a moment, nothing specific from me, just to answer any of yellowhatpro's concerns
yvanzo[m]
yellowhatpro[m]: It has to be in the same branch eventually.
yellowhatpro[m]
yvanzo[m]: I didn't think it that way, but I can come up with cmd line args feature support
yvanzo[m]: Yupp got the point
bitmap[m]
<yellowhatpro[m]> "Any tip for db tests btw?..." <- look at what DB queries the archiver is making, and make sure you have test data that satisfies each type of query. at least one example for each edit type
yvanzo[m]
yellowhatpro[m]: See the crate clap
yellowhatpro[m]
bitmap[m]: Got it, will work on this ✅
yvanzo[m]: yeah I have used it in past so will impl it soon
About your summary: Tests can be split into documentation tests (the easiest to start with) and unit tests (simple to add as well and very useful). Integration tests can be delayed until there is proper mocking of SQL and HTTP.
I would suggest to prioritize tests at first (as it can help with spotting basic issues), then command-line (as it can help with debugging), and the rest.
yellowhatpro[m]
Thanks. working on it ✅
yvanzo[m]
You’re doing well, keep it up! 😎
bitmap, yellowhatpro: Thanks for your time!
atj[m]
Sounds like this is going to be a really useful project
I've looked at IA in vain many times trying to see if edits were correct etc.
yellowhatpro[m]
yvanzo[m]: Thanks. I didn't outline many things earlier in proposal, but as we move I can see there are many things to work on. Thanks y'all for help, will make things work
atj[m]: Oh cool. Well that's motivation
atj[m]
yvanzo: I don't know what your thoughts are but I feel like we either do the Solr switch tomorrow or wait 2 weeks until I'm back from holiday.
yvanzo[m]
atj, bitmap: I patched all the MB web containers to allow switching to the new cluster (using HTTPS).
atj[m]
Is there a new mb-solr release? I'm using a snapshot extracted from the Docker build I think
reosarevok: yvanzo: MBS-13628 is caused by the production-cron container only having stub edit classes for the EAA edit types, which don't implement the proper `alter_edit_pending` methods.
I'm wondering if we should temporarily deploy the beta image there, since it would be difficult to extract all of the necessary bits of code needed to the production branch (and complicate reverting the changes later)
We have type descriptions, so that's trivial, it just needs copying from https://beta.musicbrainz.org/admin/attributes/E... (ideally we'd actually make that public but the PR doing that is still not approved IIRC so we can put them on the wiki for now)
bitmap[m]
BrainzBot: determining which images are marked as front is unfortunately a schema change
yvanzo[m]
bitmap: It can be useful to deploy beta to production-cron to also test using the PG standby.
reosarevok[m]
The non-poster one very much does not seem in any case like it needs to happen in beta
Oh, if it can't, more for that, but in any case, it's not a beta ticket really, it's a perfectly fine long term improvement suggestion
yvanzo[m]
It isn’t complicated to implement either.
We can make a decision at least before forgetting about it.
bitmap[m]
I was also thinking we were just querying /front, but that's not accurate, we should just be able to query for a fallback image from the database
but indexing them as front images is a schema change at least
reosarevok[m]
Allowing something like flyer seems fine to me
yvanzo[m]
front is for release? are we discussing events?
reosarevok[m]
As fallback
Yes, events, but it's still the concept of "frontiest" because there's no better way :)