#metabrainz

      • lucifer
        yes that happens currently with PG setup but that is not the behaviour with the proposed couchdb setup.
      • alastairp
        mm, now I'm a bit confused as to how this works with the couchdb setup
      • lucifer
        artists_this_month_Jan - [A's stats, B's stats]
      • A hasn't submitted in Feb, so spark only generates B's stats as there are no listens for A
      • artists_this_month_Feb - [B's stats]
      • once the Feb database has finished inserting, the end message comes from spark and Jan's database is deleted (sketched below).
      • alastairp
        ah. there are only 2 databases present while the insert is happening into the new one?
      • lucifer
        so when we query for A now in Feb, we get no stats.
      • alastairp
        and then the old one is deleted?
      • lucifer
        yes right
      • alastairp
        I understood that there were always 2
      • ok, now that makes sense, thanks
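
A minimal sketch of the rotation lucifer describes, assuming a CouchDB endpoint at http://localhost:5984 and using the month-suffixed names from the chat; the `rotate` helper and the `requests` usage are illustrative assumptions, not ListenBrainz code.

```python
# Sketch of the per-period database rotation (names/URL are assumptions).
import requests

COUCHDB = "http://localhost:5984"  # assumed CouchDB endpoint

def rotate(stat_type: str, new_suffix: str, old_suffix: str) -> None:
    new_db = f"{stat_type}_{new_suffix}"  # e.g. artists_this_month_Feb
    old_db = f"{stat_type}_{old_suffix}"  # e.g. artists_this_month_Jan
    # Start message: create the new period's database before inserting.
    requests.put(f"{COUCHDB}/{new_db}")  # 201 Created, 412 if it already exists
    # ... spark-generated stats are inserted into new_db here ...
    # End message: delete the previous period's database. A user with no
    # listens in the new period (A above) now has no stats anywhere.
    requests.delete(f"{COUCHDB}/{old_db}")
```

So at most two databases per stat type exist at once, and only during the insert window.
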
      • lucifer
        ah sorry.
      • alastairp
        np
      • ok, dumps?
      • lucifer
        sure
      • so we export stats biweekly as part of full dumps.
      • we'll have to add a new json dump for stats since PG dumps don't fit this.
        the issue i am currently stuck on is how to coordinate the dump process with the insert process.
      • alastairp
        so that we don't dump a half-inserted table?
      • lucifer
        yes, that's 1 possibility. the other one is that we are dumping the full database but the end message arrives and deletes it while we haven't finished exporting.
      • alastairp
        mmm, right
      • lucifer
        first option i have in mind is this:
      • retrieve all user ids from the databases, then look up each user's stats using the usual lookup: some stats come from today's database, others from yesterday's database (sketched below).
      • this works, but a possible drawback is that it may be slower, and the dump might have stats for users from different days.
      • one user's "this week" is actually this week, but another's is last week because their stats for this week haven't been generated yet.
      • but again, this is the status quo, so i am only concerned about the speed issue.
      • note that i am assuming 2 dbs here because i am thinking of the bad case where the export and insert times conflict.
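
A rough sketch of dump option 1 under the same assumptions (local CouchDB, `requests`); `fetch_stats` and `dump_stats` are hypothetical names, not the actual dump code.

```python
# Sketch of dump option 1: collect user ids from both databases, then fetch
# each user's stats with the usual lookup (today's db first, else yesterday's).
import requests

COUCHDB = "http://localhost:5984"  # assumed CouchDB endpoint

def fetch_stats(db: str, user_id: str) -> dict | None:
    r = requests.get(f"{COUCHDB}/{db}/{user_id}")
    return r.json() if r.status_code == 200 else None

def dump_stats(today_db: str, yesterday_db: str):
    user_ids = set()
    for db in (today_db, yesterday_db):
        rows = requests.get(f"{COUCHDB}/{db}/_all_docs").json()["rows"]
        user_ids.update(row["id"] for row in rows)
    for user_id in sorted(user_ids):
        # usual lookup: today's stats if present, otherwise yesterday's,
        # which is why the dump can mix stats from different days
        yield user_id, fetch_stats(today_db, user_id) or fetch_stats(yesterday_db, user_id)
```
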
      • alastairp
        yes, right
      • you know, I'm thinking again that both of these problems (insertion and dumps) can probably be solved cleanly by putting the current db name into postgres...
      • it seems like we're creating all sorts of workarounds because we decided that we didn't want to do it
      • lucifer
        yes, but there's another issue which isn't solved by putting the database name in postgres
      • alastairp
        well, two out of three ain't bad
      • lucifer
        say we stored yesterday's database name in postgres. a new database is created for today. the database insertion completes, then it goes on to delete yesterday's database.
      • we aren't done exporting yesterday's database, but it went away.
      • we could probably build in a retry and start exporting the new one again, and this time we know it won't error because there's no way it takes more than 1 day to export
      • alastairp
        yeah, we could also have a flag which says "this db is being dumped", and if so don't delete it
      • will you do the dump by retrieving batches, or will you get it all at once?
      • lucifer
        alternatively we could do a SELECT FOR UPDATE lock which blocks the insert process from updating the database's name in PG (sketched below)
      • batches.
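
A sketch of the SELECT FOR UPDATE idea, assuming a hypothetical `couchdb_databases` table in Postgres that maps a stat type to its active database name; note that lucifer concludes just below that the lock may not work here.

```python
# Sketch: hold a row lock on the entry naming the active database so a
# concurrent UPDATE/DELETE of that row blocks until the dump commits.
import psycopg2

conn = psycopg2.connect("dbname=listenbrainz")  # assumed DSN
with conn, conn.cursor() as cur:
    cur.execute(
        "SELECT database_name FROM couchdb_databases"
        " WHERE stat_type = %s FOR UPDATE",  # hypothetical table/columns
        ("artists_this_month",),
    )
    db_name = cur.fetchone()[0]
    # ... export db_name in batches while the row lock is held; the insert
    # process blocks on this row until the transaction commits ...
```
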
      • alastairp
        will the spark writer add multiple types of stats in a single run?
      • lucifer
        umm, actually the lock may not work, but yes, the flag sounds good.
      • It's always 1) Start message 2) Stats for a particular type and range 3) End message
      • alastairp
        not sure if we want it blocking after creating 1 type, and then having to wait for dumps to finish before it progresses onto the next
      • ah, so in the case of a block, it'd happen only in response to the end message for a particular type?
      • lucifer
        yes
      • alastairp
        and we'd also have to decide whose responsibility it is to delete the old database if we find ourselves in this situation
      • lucifer
        yes, we have the option to 1) block the spark reader till the export for that stat is done, then it can delete 2) leave the database as it is; it gets deleted the next day 3) add a cron job.
      • i like 2 most fwiw.
      • alastairp
        yeah, 2 sounds good
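
A sketch of option 2, again with assumed names: on the End message the spark reader deletes every stale database for the stat type except any being dumped, so a skipped database is simply removed by the next day's pass.

```python
# Sketch of option 2: skip databases that are being dumped; they get
# cleaned up on the next day's End message instead.
import requests

COUCHDB = "http://localhost:5984"  # assumed CouchDB endpoint

def cleanup(stat_type: str, current_db: str, being_dumped: set[str]) -> None:
    for db in requests.get(f"{COUCHDB}/_all_dbs").json():
        if db.startswith(stat_type) and db != current_db:
            if db in being_dumped:
                continue  # leave it; tomorrow's cleanup will remove it
            requests.delete(f"{COUCHDB}/{db}")
```
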
      • lucifer
        there's another way to do this if we want to avoid storing in PG: write a file named, say, LOCKED to the database. the spark reader checks for this file in the couchdb database before deleting. if it's there, it moves on, and then the next day's cleanup removes it (sketched below).
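
A sketch of the LOCKED-marker variant, with assumed names: the dump process writes a document with id LOCKED into the database it is exporting, and the spark reader checks for it before deleting; no unlock step is needed because the whole database is dropped by a later cleanup anyway.

```python
# Sketch of the LOCKED document approach (document id and helpers assumed).
import requests

COUCHDB = "http://localhost:5984"  # assumed CouchDB endpoint

def lock(db: str) -> None:
    requests.put(f"{COUCHDB}/{db}/LOCKED", json={})  # marker document

def is_locked(db: str) -> bool:
    return requests.get(f"{COUCHDB}/{db}/LOCKED").status_code == 200

def delete_if_unlocked(db: str) -> None:
    if is_locked(db):
        return  # dump in progress; the next day's cleanup removes the db
    requests.delete(f"{COUCHDB}/{db}")
```
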
      • alastairp
        if we don't want to use postgres, that also sounds fine
      • lucifer
        this one is easier to implement currently, so I'll try it out first; if i am stuck then i will try out the PG impl.
      • thanks!
      • alastairp
        no problem. looking forward to seeing how it turns out
      • CatQuest joined the channel
      • CatQuest has left the channel
      • monkey
        Hi chinmay!
      • I'm pretty much AFK for the week, but in short you understood right, we only have the one global context we use for shared props in LB.
      • The rest is higher-order components like the alert component, and basic passing of props down from components to children
      • That doesn't mean we can't have more contexts if they make sense.
      • BrainzGit
        [musicbrainz-server] mwiencek merged pull request #2591 (master…flow-0.183.0): Upgrade Flow to 0.183.0 https://github.com/metabrainz/musicbrainz-serve...
      • ephemer0l has quit
      • chinmay
        monkey: thanks for the clarification. I'll see what other pattern can be used to match the consistency of the existing code.
      • Some context: I am working on implementing filters for my SoC project, which includes some array manipulation in a `Filters` component and sharing it with the `ReleaseCard` component. I think I will need some data sharing when I work on a `Timeline` component later
      • I am kind of stuck on implementing filters and I should have asked for help before instead of mindlessly trying out things until something works.
      • BrainzGit
        [musicbrainz-server] reosarevok merged pull request #2592 (production…MBS-12512): MBS-12512: Dump genre_alias_type for sample DB https://github.com/metabrainz/musicbrainz-serve...
      • ephemer0l joined the channel
      • ephemer0l has quit
      • Sophist_UK joined the channel
      • Sophist-UK has quit
      • ephemer0l joined the channel
      • [musicbrainz-server] reosarevok opened pull request #2594 (master…MBS-12515): MBS-12515: Check _gid_redirect table exists before trying to use it https://github.com/metabrainz/musicbrainz-serve...
      • [musicbrainz-server] reosarevok merged pull request #2594 (master…MBS-12515): MBS-12515: Check _gid_redirect table exists before trying to use it https://github.com/metabrainz/musicbrainz-serve...
      • ephemer0l has quit
      • ephemer0l joined the channel