yes, that happens currently with the PG setup, but that is not the behaviour with the proposed couchdb setup.
2022-07-22 20327, 2022
alastairp
mm, now I'm a bit confused as to how this works with the couchdb setup
2022-07-22 20337, 2022
lucifer
artists_this_month_Jan - [A's stats, B's stats]
2022-07-22 20344, 2022
lucifer
A hasn't submitted in Feb, so spark only generates B's stats as there are no listens for A
2022-07-22 20348, 2022
lucifer
artists_this_month_Feb - [B's stats]
2022-07-22 20356, 2022
lucifer
once the Feb database is finished inserting, the end message comes from spark and deletes Jan's database.
2022-07-22 20340, 2022
alastairp
ah. there are only 2 databases present while the insert is happening into the new one?
2022-07-22 20342, 2022
lucifer
so when we query for A now in Feb, we get no stats.
2022-07-22 20344, 2022
alastairp
and then the old one is deleted?
2022-07-22 20346, 2022
lucifer
yes right
2022-07-22 20353, 2022
alastairp
I understood that there were always 2
2022-07-22 20358, 2022
alastairp
ok, now that makes sense, thanks
2022-07-22 20300, 2022
lucifer
ah sorry.
2022-07-22 20305, 2022
alastairp
np
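To recap the rotation described above, here is a minimal sketch of the lifecycle against CouchDB's HTTP API (the URL, names, and helpers are invented for illustration; this is not the actual ListenBrainz code):

```python
import requests

COUCH = "http://localhost:5984"  # assumed CouchDB URL, for illustration

def insert_stats(prefix: str, period: str, stats: list) -> None:
    """Create the new per-period database and bulk-insert the stats
    spark generated, e.g. prefix='artists_this_month', period='Feb'."""
    new_db = f"{prefix}_{period}"
    requests.put(f"{COUCH}/{new_db}")  # 201 on create, 412 if it already exists
    requests.post(f"{COUCH}/{new_db}/_bulk_docs", json={"docs": stats})

def handle_end_message(prefix: str, period: str) -> None:
    """On spark's end message, delete every older database with the same
    prefix, so only the newly inserted one remains."""
    keep = f"{prefix}_{period}"
    for db in requests.get(f"{COUCH}/_all_dbs").json():
        if db.startswith(prefix) and db != keep:
            requests.delete(f"{COUCH}/{db}")
```

So while an insert is in flight there are briefly two databases for a stat type, and the old one disappears only once the end message arrives.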
2022-07-22 20307, 2022
alastairp
ok, dumps?
2022-07-22 20336, 2022
lucifer
sure
2022-07-22 20358, 2022
lucifer
so we export stats biweekly as part of full dumps.
2022-07-22 20312, 2022
lucifer
we'll have to add a new json dump for stats since PG dumps don't fit this.
2022-07-22 20339, 2022
lucifer
the issue i am currently stuck on is how to coordinate the dump process with the insert process.
2022-07-22 20310, 2022
alastairp
so that we don't dump a half-inserted table?
2022-07-22 20356, 2022
lucifer
yes. that's 1 possibility; the other is that we are dumping the full database but the end message arrives and deletes it while we haven't finished exporting.
2022-07-22 20325, 2022
alastairp
mmm, right
2022-07-22 20358, 2022
lucifer
first option i have in mind is this:
2022-07-22 20308, 2022
lucifer
retrieve all user ids from the databases. look up each user's stats using the usual lookup. some stats come from today's database, others from yesterday's database.
2022-07-22 20305, 2022
lucifer
this works, but a possible drawback is that it may be slower, and the dump might have stats for users from different days.
2022-07-22 20352, 2022
lucifer
one user's "this week" is actually this week, but another's is last week because their stats for this week haven't been generated yet.
2022-07-22 20324, 2022
lucifer
but again, this is the status quo. so i am only concerned about the speed issue.
2022-07-22 20312, 2022
lucifer
note that there are 2 dbs here because i am thinking of the bad case where the export and insert times conflict.
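A rough sketch of that first option (helper names invented here; it assumes the usual lookup tries the newest database first and falls back to the older one, which is how some users end up with yesterday's stats in the dump):

```python
import json
import requests

COUCH = "http://localhost:5984"  # assumed CouchDB URL, for illustration

def databases_for(prefix: str) -> list:
    """All databases for one stat type, newest first."""
    dbs = [db for db in requests.get(f"{COUCH}/_all_dbs").json()
           if db.startswith(prefix)]
    return sorted(dbs, reverse=True)

def lookup_stats(databases: list, user_id: int):
    """The usual lookup: today's database first, yesterday's as fallback."""
    for db in databases:
        resp = requests.get(f"{COUCH}/{db}/{user_id}")
        if resp.status_code == 200:
            return resp.json()
    return None

def dump_stats(prefix: str, user_ids: list, outfile: str) -> None:
    """Dump one stat type to a JSON-lines file, one user per line."""
    databases = databases_for(prefix)
    with open(outfile, "w") as f:
        for user_id in user_ids:  # in practice, fetched in batches
            stats = lookup_stats(databases, user_id)
            if stats is not None:
                f.write(json.dumps(stats) + "\n")
```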
2022-07-22 20322, 2022
alastairp
yes, right
2022-07-22 20359, 2022
alastairp
you know, I'm thinking again that both of these problems (insertion and dumps) can probably be solved cleanly by storing the current db name in postgres...
2022-07-22 20317, 2022
alastairp
it seems like we're creating all sorts of workarounds because we decided that we didn't want to do it
2022-07-22 20335, 2022
lucifer
yes, but there's another issue which isn't solved by putting the database name in postgres
2022-07-22 20342, 2022
alastairp
well, two out of three ain't bad
2022-07-22 20300, 2022
lucifer
say we stored yesterday's database name in postgres. a new database is created for today. the database insertion completes, then it goes on to delete yesterday's database.
2022-07-22 20344, 2022
lucifer
we aren't done exporting yesterday's database but it went away.
2022-07-22 20317, 2022
lucifer
we could probably build in a retry and start exporting the new one again, and this time we know it won't error because there's no way it takes more than 1 day to export
2022-07-22 20351, 2022
alastairp
yeah, we could also have a flag which says "this db is being dumped", and if so don't delete it
2022-07-22 20312, 2022
alastairp
will you do the dump by retrieving batches, or will you get it all at once?
2022-07-22 20346, 2022
lucifer
alternatively we could take a SELECT FOR UPDATE lock which blocks the insert process from updating the database's name in PG
2022-07-22 20351, 2022
lucifer
batches.
2022-07-22 20324, 2022
alastairp
will the spark writer add multiple types of stats in a single run?
2022-07-22 20337, 2022
lucifer
umm, actually a lock may not work, but yes, the flag sounds good.
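A sketch of the flag variant with a hypothetical tracking table in PG (the table and column names here are invented for illustration, not the actual schema):

```python
import psycopg2

conn = psycopg2.connect("dbname=listenbrainz")  # assumed DSN

def mark_dump_started(db_name: str) -> None:
    """Dump process sets the flag before it starts exporting."""
    with conn, conn.cursor() as cur:
        cur.execute(
            "UPDATE couchdb_databases SET being_dumped = TRUE WHERE name = %s",
            (db_name,),
        )

def safe_to_delete(db_name: str) -> bool:
    """End-message handler checks the flag before deleting the database."""
    with conn, conn.cursor() as cur:
        cur.execute(
            "SELECT being_dumped FROM couchdb_databases WHERE name = %s",
            (db_name,),
        )
        row = cur.fetchone()
        return row is not None and not row[0]
```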
2022-07-22 20306, 2022
lucifer
It's always 1) Start message 2) Stats for a particular type and range 3) End message
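For illustration, a minimal dispatcher for that three-message sequence (the message shape and helpers are invented here, just to show the ordering):

```python
def create_database(name: str) -> None:
    print(f"start: create {name}")            # stub for illustration

def insert_stats(name: str, docs: list) -> None:
    print(f"insert {len(docs)} docs into {name}")

def delete_older_databases(name: str) -> None:
    print(f"end: delete databases older than {name}")

def handle_message(message: dict) -> None:
    """Messages for one stat type and range always arrive in order:
    1) start, 2) the stats themselves, 3) end."""
    kind = message["type"]
    if kind == "start":
        create_database(message["database"])
    elif kind == "stats":
        insert_stats(message["database"], message["data"])
    elif kind == "end":
        delete_older_databases(message["database"])
```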
2022-07-22 20306, 2022
alastairp
not sure if we want it blocking after creating 1 type, and then having to wait for dumps to finish before it progresses on to the next
2022-07-22 20347, 2022
alastairp
ah, so in the case of a block, it'd happen only in response to the end message for a particular type?
2022-07-22 20352, 2022
lucifer
yes
2022-07-22 20314, 2022
alastairp
and we'd also have to decide whose responsibility it is to delete the old database if we find ourselves in this situation
2022-07-22 20322, 2022
lucifer
yes, we have the option to 1) block the spark reader till the export for that stat is done and then it can delete 2) leave the database as it is, it gets deleted the next day 3) add a cron job.
2022-07-22 20334, 2022
lucifer
i like 2 most fwiw.
2022-07-22 20356, 2022
alastairp
yeah, 2 sounds good
2022-07-22 20317, 2022
lucifer
there's another way to do this if we want to avoid storing in PG. write a file named, say, LOCKED to the database. the spark reader checks for this file in the couchdb database before deleting. if it's there, it moves on, and the next day's cleanup handles it.
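A sketch of that variant, with the LOCKED sentinel written as an ordinary CouchDB document (document id and helper names assumed for illustration):

```python
import requests

COUCH = "http://localhost:5984"  # assumed CouchDB URL, for illustration
LOCK_DOC = "LOCKED"              # hypothetical sentinel document id

def lock_for_dump(db: str) -> None:
    """Dump process writes the sentinel before it starts exporting."""
    requests.put(f"{COUCH}/{db}/{LOCK_DOC}", json={})

def maybe_delete(db: str) -> bool:
    """Spark reader: skip deletion while the sentinel exists; the
    database then gets removed by the next day's cleanup instead."""
    if requests.get(f"{COUCH}/{db}/{LOCK_DOC}").status_code == 200:
        return False
    requests.delete(f"{COUCH}/{db}")
    return True
```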
2022-07-22 20305, 2022
alastairp
if we don't want to use postgres, that also sounds fine
2022-07-22 20304, 2022
lucifer
this one is easier to implement currently, so I'll try this out first; if i am stuck then i'll try out the PG impl.
2022-07-22 20319, 2022
lucifer
thanks!
2022-07-22 20342, 2022
alastairp
no problem. looking forward to seeing how it turns out
monkey: thanks for the clarification. I'll see what other pattern can be used to match the consistency of the existing code.
2022-07-22 20349, 2022
chinmay
Some context: I am working on implementing filters for my SoC project, which includes some array manipulation in a `Filters` component and sharing it with the `ReleaseCard` component. I think I will need some data sharing when I work on a `Timeline` component later
2022-07-22 20356, 2022
chinmay
I am kind of stuck on implementing filters and I should have asked for help before instead of mindlessly trying out things until something works.
[musicbrainz-server] reosarevok opened pull request #2594 (master…MBS-12515): MBS-12515: Check _gid_redirect table exists before trying to use it https://github.com/metabrainz/musicbrainz-server/…
2022-07-22 20305, 2022
BrainzGit
[musicbrainz-server] reosarevok merged pull request #2594 (master…MBS-12515): MBS-12515: Check _gid_redirect table exists before trying to use it https://github.com/metabrainz/musicbrainz-server/…