#metabrainz

      • lucifer
        yes that happens currently with PG setup but that is not the behaviour with the proposed couchdb setup.
      • alastairp
        mm, now I'm a bit confused as to how this works with the couchdb setup
      • lucifer
        artists_this_month_Jan - [A's stats, B's stats]
      • A hasn't submitted in Feb, so spark only generates B's stats as there are no listens for A
      • artists_this_month_Feb - [B's stats]
      • once the Feb database has finished inserting, the end message comes from spark and Jan's database is deleted (sketched below).
      • alastairp
        ah. there are only 2 databases present while the insert is happening into the new one?
      • lucifer
        so when we query for A now in Feb, we get no stats.
      • alastairp
        and then the old one is deleted?
      • lucifer
        yes right
      • alastairp
        I understood that there were always 2
      • ok, now that makes sense, thanks
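
A minimal sketch of the rotation lucifer describes, assuming a CouchDB endpoint at http://localhost:5984 and using the month-suffixed names from the chat; the `rotate` helper and the `requests` usage are illustrative assumptions, not ListenBrainz code.

```python
# Sketch of the per-period database rotation (names/URL are assumptions).
import requests

COUCHDB = "http://localhost:5984"  # assumed CouchDB endpoint

def rotate(stat_type: str, new_suffix: str, old_suffix: str) -> None:
    new_db = f"{stat_type}_{new_suffix}"  # e.g. artists_this_month_Feb
    old_db = f"{stat_type}_{old_suffix}"  # e.g. artists_this_month_Jan
    # Start message: create the new period's database before inserting.
    requests.put(f"{COUCHDB}/{new_db}")  # 201 Created, 412 if it already exists
    # ... spark-generated stats are inserted into new_db here ...
    # End message: delete the previous period's database. A user with no
    # listens in the new period (A above) now has no stats anywhere.
    requests.delete(f"{COUCHDB}/{old_db}")
```

So at most two databases per stat type exist at once, and only during the insert window.
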
      • lucifer
        ah sorry.
      • alastairp
        np
      • ok, dumps?
      • lucifer
        sure
      • so we export stats biweekly as part of full dumps.
      • we'll have to add a new json dump for stats since PG dumps don't fit this.
        the issue i am currently stuck on is how to coordinate the dump process with the insert process.
      • alastairp
        so that we don't dump a half-inserted table?
      • lucifer
        yes, that's 1 possibility. the other one is that we are dumping the full database but the end message arrives and deletes it while we haven't finished exporting.
      • alastairp
        mmm, right
      • lucifer
        first option i have in mind is this:
      • retrieve all user ids from the databases, then look up each user's stats using the usual lookup: some stats come from today's database, others from yesterday's database (sketched below).
      • this works, but a possible drawback is that it may be slower, and the dump might have stats for users from different days.
      • one user's "this week" is actually this week, but another's is last week because their stats for this week haven't been generated yet.
      • but again, this is the status quo, so i am only concerned about the speed issue.
      • note that i am assuming 2 dbs here because i am thinking of the bad case where the export and insert times conflict.
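
A rough sketch of dump option 1 under the same assumptions (local CouchDB, `requests`); `fetch_stats` and `dump_stats` are hypothetical names, not the actual dump code.

```python
# Sketch of dump option 1: collect user ids from both databases, then fetch
# each user's stats with the usual lookup (today's db first, else yesterday's).
import requests

COUCHDB = "http://localhost:5984"  # assumed CouchDB endpoint

def fetch_stats(db: str, user_id: str) -> dict | None:
    r = requests.get(f"{COUCHDB}/{db}/{user_id}")
    return r.json() if r.status_code == 200 else None

def dump_stats(today_db: str, yesterday_db: str):
    user_ids = set()
    for db in (today_db, yesterday_db):
        rows = requests.get(f"{COUCHDB}/{db}/_all_docs").json()["rows"]
        user_ids.update(row["id"] for row in rows)
    for user_id in sorted(user_ids):
        # usual lookup: today's stats if present, otherwise yesterday's,
        # which is why the dump can mix stats from different days
        yield user_id, fetch_stats(today_db, user_id) or fetch_stats(yesterday_db, user_id)
```
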
      • alastairp
        yes, right
      • you know, I'm thinking again that both of these problems (insertion and dumps) can probably be solved cleanly by putting the current db name into postgres...
      • it seems like we're creating all sorts of workarounds because we decided that we didn't want to do it
      • lucifer
        yes, but there's another issue which isn't solved by putting the database name in postgres
      • alastairp
        well, two out of three ain't bad
      • lucifer
        say we stored yesterday's database name in postgres. a new database is created for today. the database insertion completes, then it goes on to delete yesterday's database.
      • we aren't done exporting yesterday's database, but it went away.
      • we could probably build in a retry and start exporting the new one again, and this time we know it won't error because there's no way it takes more than 1 day to export
      • alastairp
        yeah, we could also have a flag which says "this db is being dumped", and if so don't delete it
      • will you do the dump by retrieving batches, or will you get it all at once?
      • lucifer
        alternatively we could do a SELECT FOR UPDATE lock which blocks the insert process from updating the database's name in PG (sketched below)
      • batches.
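
A sketch of the SELECT FOR UPDATE idea, assuming a hypothetical `couchdb_databases` table in Postgres that maps a stat type to its active database name; note that lucifer concludes just below that the lock may not work here.

```python
# Sketch: hold a row lock on the entry naming the active database so a
# concurrent UPDATE/DELETE of that row blocks until the dump commits.
import psycopg2

conn = psycopg2.connect("dbname=listenbrainz")  # assumed DSN
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT database_name FROM couchdb_databases"
        " WHERE stat_type = %s FOR UPDATE",  # hypothetical table/columns
        ("artists_this_month",),
    )
    db_name = cur.fetchone()[0]
    # ... export db_name in batches while the row lock is held; the insert
    # process blocks on this row until the transaction commits ...
```
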
      • alastairp
        will the spark writer add multiple types of stats in a single run?
      • lucifer
        umm, actually the lock may not work, but yes, the flag sounds good.
      • It's always 1) Start message 2) Stats for a particular type and range 3) End message
      • alastairp
        not sure if we want it blocking after creating 1 type, and then having to wait for dumps to finish before it progresses onto the next
      • ah, so in the case of a block, it'd happen only in response to the end message for a particular type?
      • lucifer
        yes
      • alastairp
        and we'd also have to decide whose responsibility it is to delete the old database if we find ourselves in this situation
      • lucifer
        yes, we have the option to 1) block the spark reader till the export for that stat is done, then it can delete 2) leave the database as it is; it gets deleted the next day 3) add a cron job.
      • i like 2 most fwiw.
      • alastairp
        yeah, 2 sounds good
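
A sketch of option 2, again with assumed names: on the End message the spark reader deletes every stale database for the stat type except any being dumped, so a skipped database is simply removed by the next day's pass.

```python
# Sketch of option 2: skip databases that are being dumped; they get
# cleaned up on the next day's End message instead.
import requests

COUCHDB = "http://localhost:5984"  # assumed CouchDB endpoint

def cleanup(stat_type: str, current_db: str, being_dumped: set[str]) -> None:
    for db in requests.get(f"{COUCHDB}/_all_dbs").json():
        if db.startswith(stat_type) and db != current_db:
            if db in being_dumped:
                continue  # leave it; tomorrow's cleanup will remove it
            requests.delete(f"{COUCHDB}/{db}")
```
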
      • lucifer
        there's another way to do this if we want to avoid storing in PG: write a file named, say, LOCKED to the database. the spark reader checks for this file in the couchdb database before deleting. if it's there, it moves on, and then the next day's cleanup removes it (sketched below).
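
A sketch of the LOCKED-marker variant, with assumed names: the dump process writes a document with id LOCKED into the database it is exporting, and the spark reader checks for it before deleting; no unlock step is needed because the whole database is dropped by a later cleanup anyway.

```python
# Sketch of the LOCKED document approach (document id and helpers assumed).
import requests

COUCHDB = "http://localhost:5984"  # assumed CouchDB endpoint

def lock(db: str) -> None:
    requests.put(f"{COUCHDB}/{db}/LOCKED", json={})  # marker document

def is_locked(db: str) -> bool:
    return requests.get(f"{COUCHDB}/{db}/LOCKED").status_code == 200

def delete_if_unlocked(db: str) -> None:
    if is_locked(db):
        return  # dump in progress; the next day's cleanup removes the db
    requests.delete(f"{COUCHDB}/{db}")
```
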
      • alastairp
        if we don't want to use postgres, that also sounds fine
      • lucifer
        this one is easier to implement currently, so I'll try it out first; if i am stuck then i will try out the PG impl.
      • thanks!
      • alastairp
        no problem. looking forward to seeing how it turns out
      • CatQuest joined the channel
      • CatQuest has left the channel
      • monkey
        Hi chinmay!
      • I'm pretty much AFK for the week, but in short you understood right, we only have the one global context we use for shared props in LB.
      • The rest is higher-order components like the alert component, and basic passing of props down from components to children
      • That doesn't mean we can't have more contexts if they make sense.
      • BrainzGit
        [musicbrainz-server] mwiencek merged pull request #2591 (master…flow-0.183.0): Upgrade Flow to 0.183.0 https://github.com/metabrainz/musicbrainz-serve...
      • ephemer0l has quit
      • chinmay
        monkey: thanks for the clarification. I'll see what other pattern can be used to match the consistency of the existing code.
      • Some context: I am working on implementing filters for my SoC project, which includes some array manipulation in a `Filters` component and sharing it with the `ReleaseCard` component. I think I will need some data sharing when I work on a `Timeline` component later
      • I am kind of stuck on implementing filters and I should have asked for help before instead of mindlessly trying out things until something works.
      • BrainzGit
        [musicbrainz-server] reosarevok merged pull request #2592 (production…MBS-12512): MBS-12512: Dump genre_alias_type for sample DB https://github.com/metabrainz/musicbrainz-serve...
      • ephemer0l joined the channel
      • ephemer0l has quit
      • Sophist_UK joined the channel
      • Sophist-UK has quit
      • ephemer0l joined the channel
      • [musicbrainz-server] reosarevok opened pull request #2594 (master…MBS-12515): MBS-12515: Check _gid_redirect table exists before trying to use it https://github.com/metabrainz/musicbrainz-serve...
      • [musicbrainz-server] reosarevok merged pull request #2594 (master…MBS-12515): MBS-12515: Check _gid_redirect table exists before trying to use it https://github.com/metabrainz/musicbrainz-serve...
      • ephemer0l has quit
      • ephemer0l joined the channel