#metabrainz

/

16:14 PM
lucifer

yes that happens currently with PG setup but that is not the behaviour with the proposed couchdb setup.

2022-07-22 20327, 2022

16:16 PM
alastairp

mm, now I'm a bit confused as to how this works with the couchdb setup

2022-07-22 20337, 2022

16:16 PM
lucifer

artists_this_month_Jan - [A's stats, B's stats]

2022-07-22 20344, 2022

16:16 PM
lucifer

A has't submitted in Feb, spark only generates B stats as no listens for A

2022-07-22 20348, 2022

16:16 PM
lucifer

artists_this_month_Feb - [B'stats]

2022-07-22 20356, 2022

16:16 PM
lucifer

once Feb database is finished inserting, end message comes from spark and deletes Jan's database.

2022-07-22 20340, 2022

16:17 PM
alastairp

ah. there are only 2 databases present while the insert is happening into the new one?

2022-07-22 20342, 2022

16:17 PM
lucifer

so when we query for A now in Feb, we get no stats.

2022-07-22 20344, 2022

16:17 PM
alastairp

and then the old one is deleted?

2022-07-22 20346, 2022

16:17 PM
lucifer

yes right

2022-07-22 20353, 2022

16:17 PM
alastairp

I understood that there were always 2

2022-07-22 20358, 2022

16:17 PM
alastairp

ok, now that makes sense, thanks

2022-07-22 20300, 2022

16:18 PM
lucifer

ah sorry.

2022-07-22 20305, 2022

16:18 PM
alastairp

np

2022-07-22 20307, 2022

16:18 PM
alastairp

ok, dumps?

2022-07-22 20336, 2022

16:18 PM
lucifer

sure

2022-07-22 20358, 2022

16:18 PM
lucifer

so we export stats biweekly as part of full dumps.

2022-07-22 20312, 2022

16:19 PM
lucifer

we'll have to add a new json dump for stats since PG dumps don't fit this.

2022-07-22 20339, 2022

16:19 PM
lucifer

the issue i am currently stuck at is how to coordinate dump process with insert process.

2022-07-22 20310, 2022

16:20 PM
alastairp

so that we don't dump a half-inserted table?

2022-07-22 20356, 2022

16:20 PM
lucifer

yes. that's 1 possibility, other one is that we are dumping the full database but end message arrives and deletes it while we haven't finished exporting.

2022-07-22 20325, 2022

16:21 PM
alastairp

mmm, right

2022-07-22 20358, 2022

16:21 PM
lucifer

first option i have in mind is this:

2022-07-22 20308, 2022

16:23 PM
lucifer

retrieve all user ids from databases. lookup each user's stats using usual lookup. some stats from today's database, others from yesterdays database.

2022-07-22 20305, 2022

16:25 PM
lucifer

this works but possible drawback is that it maybe slower and the dump might have stats for users from different days.

2022-07-22 20352, 2022

16:25 PM
lucifer

one user's this week is actually this week but other's last week because their stat for this week hasn't been generated yet.

2022-07-22 20324, 2022

16:26 PM
lucifer

but again this is status quo. so i am only concerned about the speed issue.

2022-07-22 20312, 2022

16:27 PM
lucifer

note that 2 dbs here because i am thinking of the bad case that export and insert time is conflicting.

2022-07-22 20322, 2022

16:27 PM
alastairp

yes, right

2022-07-22 20359, 2022

16:27 PM
alastairp

you know, I'm thinking again that both of these problems (insertion and dumps) can probably be solved clearly by putting current db into postgres...

2022-07-22 20317, 2022

16:28 PM
alastairp

it seems like we're creating all sorts of workarounds because we decided that we didn't want to do it

2022-07-22 20335, 2022

16:29 PM
lucifer

yes but there's another issue which isn't solved by putting database name in postgres

2022-07-22 20342, 2022

16:30 PM
alastairp

well, two out of three aint bad

2022-07-22 20300, 2022

16:31 PM
lucifer

stored yesterday's database name in postgres. new database created for today. database insertion completes then it goes on to delete yesterday's database.

2022-07-22 20344, 2022

16:31 PM
lucifer

we aren't done exporting yesterday's database but it went away.

2022-07-22 20317, 2022

16:32 PM
lucifer

we could probably build in retry and start exporting new one again and this time we know it won't error because no way its taking more 1 day to export

2022-07-22 20351, 2022

16:32 PM
alastairp

yeah, we could also have a flag which says "this db is being dumped", and if so don't delete it

2022-07-22 20312, 2022

16:33 PM
alastairp

will you do the dump by retrieving batches, or will you get it all at once?

2022-07-22 20346, 2022

16:33 PM
lucifer

alternatively we could do a SELECT FOR UPDATE lock which blocks the insert process from updating the database's name from in PG

2022-07-22 20351, 2022

16:33 PM
lucifer

batches.

2022-07-22 20324, 2022

16:35 PM
alastairp

will the spark writer add multiple types of stats in a single run?

2022-07-22 20337, 2022

16:35 PM
lucifer

umm actually lock may not work but yes flag sounds good.

2022-07-22 20306, 2022

16:36 PM
lucifer

Its always 1) Start message 2) Stats for particular type and range 3) End message

2022-07-22 20306, 2022

16:36 PM
alastairp

not sure if we want it blocking after creating 1 type, and then having to wait for dumps to finish before it progresses onto the next

2022-07-22 20347, 2022

16:36 PM
alastairp

ah, so in the case of a block, it'd happen only in response to the end message for a particular type?

2022-07-22 20352, 2022

16:36 PM
lucifer

yes

2022-07-22 20314, 2022

16:37 PM
alastairp

and we'd also have to decide whose responsiblity it is to delete the old database if we find ourselves in this situation

2022-07-22 20322, 2022

16:38 PM
lucifer

yes we have the option to 1) block spark reader till export for that stat is done and it can delete 2) leave the database as it is, it gets deleted next day 3) add cron job.

2022-07-22 20334, 2022

16:38 PM
lucifer

i like 2 most fwiw.

2022-07-22 20356, 2022

16:38 PM
alastairp

yeah, 2 sounds good

2022-07-22 20317, 2022

16:41 PM
lucifer

there's another way to do this if we want to avoid storing in PG. write a file named say LOCKED to database. spark reader checks for this file in couchdb database before deleting. if its there it moves on and then next day cleanup,

2022-07-22 20305, 2022

16:45 PM
alastairp

if we don't want to use postgres, that also sounds fine

2022-07-22 20304, 2022

16:47 PM
lucifer

this one is easier to implement currently so I'll try this out first, if i am stuck then will try out PG impl.

2022-07-22 20319, 2022

16:47 PM
lucifer

thanks!

2022-07-22 20342, 2022

16:49 PM
alastairp

no problem. looking forward to see how it turns out

2022-07-22 20344, 2022

17:39 PM
CatQuest joined the channel

2022-07-22 20349, 2022

17:39 PM
CatQuest

zas: https://www.reddit.com/r/ProgrammerDadJokes/comme…

2022-07-22 20329, 2022

17:40 PM
CatQuest has left the channel

2022-07-22 20311, 2022

17:51 PM
monkey

Hi chinmay !

2022-07-22 20303, 2022

17:52 PM
monkey

I'm pretty much AFK for the week, but in short you understood right, we only have the one global context we use for shared props in LB.

2022-07-22 20353, 2022

17:52 PM
monkey

The rest is higher order components like the alert component, and basic passing down props from components to children

2022-07-22 20316, 2022

17:55 PM
monkey

That doesn't mean we can't have more contexts if they make sense.

2022-07-22 20349, 2022

17:59 PM
BrainzGit

[musicbrainz-server] 14mwiencek merged pull request #2591 (03master…flow-0.183.0): Upgrade Flow to 0.183.0 https://github.com/metabrainz/musicbrainz-server/…

2022-07-22 20309, 2022

18:38 PM
ephemer0l has quit

2022-07-22 20312, 2022

18:45 PM
chinmay

monkey: thanks for the clarification. I'll see what other pattern can be used to match the consistency of the existing code..

2022-07-22 20349, 2022

18:47 PM
chinmay

Some context: I am working on implementing filters for my soc project which includes some array manipulation in a `Filters` component and sharing it with `ReleaseCard` component. I think I will need some data sharing when I work on a `Timeline` component later

2022-07-22 20356, 2022

18:48 PM
chinmay

I am kind of stuck on implementing filters and I should have asked for help before instead of mindlessly trying out things until something works.

2022-07-22 20334, 2022

19:17 PM
BrainzGit

[musicbrainz-server] 14reosarevok merged pull request #2592 (03production…MBS-12512): MBS-12512: Dump genre_alias_type for sample DB https://github.com/metabrainz/musicbrainz-server/…

2022-07-22 20335, 2022

19:39 PM
ephemer0l joined the channel

2022-07-22 20323, 2022

19:45 PM
ephemer0l has quit

2022-07-22 20300, 2022

19:50 PM
Sophist_UK joined the channel

2022-07-22 20301, 2022

19:50 PM
Sophist-UK has quit

2022-07-22 20339, 2022

19:50 PM
ephemer0l joined the channel

2022-07-22 20308, 2022

20:04 PM
BrainzGit

[musicbrainz-server] 14reosarevok opened pull request #2594 (03master…MBS-12515): MBS-12515: Check _gid_redirect table exists before trying to use it https://github.com/metabrainz/musicbrainz-server/…

2022-07-22 20305, 2022

20:27 PM
BrainzGit

[musicbrainz-server] 14reosarevok merged pull request #2594 (03master…MBS-12515): MBS-12515: Check _gid_redirect table exists before trying to use it https://github.com/metabrainz/musicbrainz-server/…

2022-07-22 20301, 2022

21:04 PM
ephemer0l has quit

2022-07-22 20335, 2022

23:01 PM
ephemer0l joined the channel