#metabrainz


      • alastairp
        mmm. that's an interesting question which is kinda difficult to answer
      • 2022-07-22 20357, 2022

      • Pratha-Fish
        alastairp: That one's in the making too :)
      • 2022-07-22 20301, 2022

      • alastairp
        in fact, if you have a process that has a file open (e.g. you did `fp = open("/path/to/file")`) and you still haven't closed it
      • 2022-07-22 20316, 2022

      • alastairp
        then it's actually not deleted
      • 2022-07-22 20326, 2022

      • alastairp
        linux only deletes a file once all open file handles are closed
      • 2022-07-22 20334, 2022
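
Here is a minimal Python sketch of the behaviour alastairp describes. It assumes Linux or another POSIX system; on Windows, removing a file that is still open usually fails instead. The helper name `read_after_unlink` and the file contents are illustrative only. Removing the name does not free the data while a handle is still open.

```python
import os
import tempfile


def read_after_unlink(text: str) -> tuple[bool, str]:
    """Write `text` to a temp file, open it, delete it, then read via the open handle."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "w") as f:
        f.write(text)

    fp = open(path)      # an open handle, like `fp = open("/path/to/file")`
    os.remove(path)      # unlink the name; the inode survives while fp is open
    try:
        still_listed = os.path.exists(path)  # False: the directory entry is gone
        contents = fp.read()                 # but the data is still readable
    finally:
        fp.close()       # last handle closed: only now can the kernel free the space
    return still_listed, contents


print(read_after_unlink("still here\n"))  # on Linux prints: (False, 'still here\n')
```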

      • mayhem
        alastairp: thanks!
      • 2022-07-22 20348, 2022

      • alastairp
        so... maybe you have it open somewhere? there's a way of poking about to find a reference to it
      • 2022-07-22 20309, 2022

      • alastairp
        I've definitely used this to recover files before
      • 2022-07-22 20318, 2022
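
On Linux, one way of "poking about" is through `/proc/<pid>/fd`. Each open descriptor appears there as a symlink, and a deleted target reads back with a ` (deleted)` suffix. The sketch below assumes Linux and uses hypothetical helper names (`find_deleted_open_files`, `recover`). It scans one process's descriptors and copies a deleted file's contents back out through the still-open descriptor.

```python
import os
import shutil

DELETED = " (deleted)"


def find_deleted_open_files(pid: str = "self") -> list[tuple[str, str]]:
    """Return (fd_link, original_path) for descriptors of `pid` that point at deleted files.

    Linux-only: relies on /proc/<pid>/fd symlinks.
    """
    fd_dir = f"/proc/{pid}/fd"
    found = []
    for name in os.listdir(fd_dir):
        fd_link = os.path.join(fd_dir, name)
        try:
            target = os.readlink(fd_link)
        except OSError:
            continue  # the descriptor was closed while we were scanning
        if target.endswith(DELETED):
            found.append((fd_link, target[: -len(DELETED)]))
    return found


def recover(fd_link: str, dest: str) -> None:
    """Copy a deleted-but-open file's contents to `dest` via its /proc fd link."""
    shutil.copyfile(fd_link, dest)
```

For another process you would pass its pid as a string, e.g. `find_deleted_open_files("1234")`, which normally requires the same user or root.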

      • Pratha-Fish
        alastairp: Apparently VS Code killed all instances before deleting it ⚰️
      • 2022-07-22 20342, 2022

      • alastairp
        it's also why (for example) if you delete all of your apache log files because you're running out of disk space, but don't restart apache, you find that your disk space hasn't actually been freed up
      • 2022-07-22 20349, 2022

      • alastairp
        (this one is from personal experience)
      • 2022-07-22 20340, 2022

      • alastairp
        and it's why apache has a mode where you can send it a signal (SIGHUP) and it'll close and re-open all of its file handles, so when you use a tool such as logrotate to manage your logfiles you can tell apache to give up the filehandle so that you can delete them
      • 2022-07-22 20303, 2022
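
The following Python sketch illustrates the reopen-on-SIGHUP pattern. It is not Apache's implementation. It assumes a POSIX system, a handler installed from the main thread, and a hypothetical `ReopeningLog` class. After a rotation tool renames the log file, the process keeps writing to the renamed file until it is signalled. The signal makes it close that handle and open a fresh file at the original path.

```python
import signal


class ReopeningLog:
    """Append-only log that reopens its path when the process receives SIGHUP."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.fp = open(path, "a")
        # Must be called from the main thread; replaces any existing SIGHUP handler.
        signal.signal(signal.SIGHUP, self._on_sighup)

    def _on_sighup(self, signum, frame) -> None:
        self.reopen()

    def reopen(self) -> None:
        self.fp.close()                 # give up the handle to the rotated file
        self.fp = open(self.path, "a")  # start a fresh file at the original path

    def write(self, line: str) -> None:
        self.fp.write(line + "\n")
        self.fp.flush()
```

From a shell, the equivalent trigger would be `kill -HUP <pid>` once the old file has been moved aside.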

      • Pratha-Fish
        Wow that one's interesting
      • 2022-07-22 20349, 2022

      • Pratha-Fish
        By Apache do you mean the Apache web server, or a general trait amongst Apache products (Spark, etc.)?
      • 2022-07-22 20321, 2022

      • skelly37
        outsidecontext, zas, rdswift: Improper exit finally solved with just one line!
      • 2022-07-22 20323, 2022

      • skelly37
        Leaving in a few hours until Monday, I'll read your reviews later :)
      • 2022-07-22 20350, 2022

      • BrainzGit
        [musicbrainz-server] 14reosarevok opened pull request #2593 (03master…MBS-12497): MBS-12497: Drop AC redirects when last use is removed https://github.com/metabrainz/musicbrainz-server/…
      • 2022-07-22 20308, 2022

      • reosarevok
        yvanzo, bitmap: this is a draft I need your input on, to decide how best to approach this ^
      • 2022-07-22 20354, 2022

      • alastairp
        Pratha-Fish: hah, I guess I'm old enough that "apache" still means "the webserver" to me :)
      • 2022-07-22 20320, 2022

      • Pratha-Fish
        _That's some experience right there_
      • 2022-07-22 20356, 2022

      • reosarevok
        yvanzo, bitmap: CASCADE seems like the least bad option to me, but that's a schema change - all good fixes seem like schema changes unless I'm missing something
      • 2022-07-22 20328, 2022

      • reosarevok is also old, apparently
      • 2022-07-22 20325, 2022

      • alastairp
        reosarevok: it happens to the best of us
      • 2022-07-22 20332, 2022

      • akshaaatt
        Hi yellowhatpro! It’s fine that you thought about the animations, but as you said, we should proceed with the mockup for now.
      • 2022-07-22 20339, 2022

      • akshaaatt
        Hi aerozol! That sounds nice, but we need to put effort into getting the profile page done first. The listening now feature sounds cool, but I don’t know how complex it would be to implement.
      • 2022-07-22 20302, 2022

      • Lotheric has quit
      • 2022-07-22 20343, 2022

      • Lotheric joined the channel
      • 2022-07-22 20346, 2022

      • Lotheric has quit
      • 2022-07-22 20341, 2022

      • Lotheric joined the channel
      • 2022-07-22 20304, 2022

      • skelly37
        zas, outsidecontext: I've changed my recent PR a bit, made it as little "dirty" as possible by storing info about unexpected removal in Pipe and then checking it after we capture the tagger.run()'s exit code to determine if we can exit nicely.
      • 2022-07-22 20341, 2022

      • skelly37 has quit
      • 2022-07-22 20304, 2022

      • Hotrod2k joined the channel
      • 2022-07-22 20324, 2022

      • Hotrod2k
        Hello, looking for an IRC channel to download books?
      • 2022-07-22 20324, 2022

      • Hotrod2k
        #bookbrainz
      • 2022-07-22 20311, 2022

      • alastairp
        Hotrod2k: no
      • 2022-07-22 20332, 2022

      • Hotrod2k has quit
      • 2022-07-22 20331, 2022

      • yuzie has quit
      • 2022-07-22 20312, 2022

      • lucifer
        mayhem: 👍
      • 2022-07-22 20324, 2022

      • lucifer
        alastairp: hi! around?
      • 2022-07-22 20347, 2022

      • alastairp
        lucifer: I am
      • 2022-07-22 20309, 2022

      • lucifer
        alastairp: I'm brainstorming about couchdb dumps and wanted to discuss them.
      • 2022-07-22 20313, 2022

      • alastairp
        sure
      • 2022-07-22 20344, 2022

      • lucifer
        we generate stats from spark daily. once the stats have been generated, we send a start message to LB, and the reader creates a database named {stat_type}_{range}_YYYYMMDD (usually today's date) to store the stat.
      • 2022-07-22 20316, 2022

      • lucifer
        then we send all the stats and insert them into this database. finally, we send an end message to mark the end of stats insertion.
      • 2022-07-22 20337, 2022

      • lucifer
        LB then deletes the older databases of that particular stat and range.
      • 2022-07-22 20343, 2022
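
Here is a small sketch of the naming and cleanup steps lucifer describes. The function names are hypothetical and no CouchDB calls are made. It only shows how `{stat_type}_{range}_YYYYMMDD` names can be built, and how, once the end message arrives, the other databases for the same stat and range could be selected for deletion.

```python
from datetime import date


def stats_db_name(stat_type: str, stats_range: str, day: date) -> str:
    """Build a name like artists_week_20220722 ({stat_type}_{range}_YYYYMMDD)."""
    return f"{stat_type}_{stats_range}_{day:%Y%m%d}"


def stale_stats_dbs(all_dbs: list[str], stat_type: str, stats_range: str, keep: str) -> list[str]:
    """Every database for this stat/range other than `keep` is stale after the end message."""
    prefix = f"{stat_type}_{stats_range}_"
    stale = []
    for name in all_dbs:
        suffix = name[len(prefix):]
        # Require an 8-digit date suffix so e.g. artists_this_week_* is never matched.
        if name.startswith(prefix) and len(suffix) == 8 and suffix.isdigit() and name != keep:
            stale.append(name)
    return stale
```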

      • alastairp
        the spark reader does this?
      • 2022-07-22 20346, 2022

      • lucifer
        yes
      • 2022-07-22 20353, 2022

      • alastairp
        ok
      • 2022-07-22 20303, 2022

      • lucifer
        we do not have a way to mark databases as ready, so when the user wants to view a stat, we query the databases whose names start with {stat_type}_{range}, in descending order of the date suffix.
      • 2022-07-22 20321, 2022

      • alastairp
        oh, interesting
      • 2022-07-22 20331, 2022

      • lucifer
        say we have artists_week_20220722 and artists_week_20220723
      • 2022-07-22 20348, 2022

      • alastairp
        can you rename databases?
      • 2022-07-22 20350, 2022

      • lucifer
        check in the 23 one; if the user's data isn't found, check in the 22 one; if it's still not found, return 204.
      • 2022-07-22 20313, 2022
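
The newest-to-oldest lookup can be sketched like this, with a plain dict standing in for CouchDB. In the real setup, each probe would be one HTTP GET of a document. Keying documents by user id and the `fetch_user_stat` name are assumptions for illustration. Because the date suffix is YYYYMMDD, sorting names in reverse string order puts the newest database first.

```python
from typing import Optional


def fetch_user_stat(
    dbs: dict[str, dict[int, dict]], stat_type: str, stats_range: str, user_id: int
) -> tuple[int, Optional[dict]]:
    """Try each {stat_type}_{range}_YYYYMMDD database, newest first; 204 if none has the user."""
    prefix = f"{stat_type}_{stats_range}_"
    for name in sorted((n for n in dbs if n.startswith(prefix)), reverse=True):
        doc = dbs[name].get(user_id)  # one HTTP request per database in the real setup
        if doc is not None:
            return 200, doc
    return 204, None
```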

      • lucifer
        no. (future versions of couchdb may support it though)
      • 2022-07-22 20321, 2022

      • alastairp
        boo
      • 2022-07-22 20352, 2022

      • alastairp
        what's a "database" in couchdb?
      • 2022-07-22 20358, 2022

      • alastairp
        does it match a pg database, or a pg table?
      • 2022-07-22 20341, 2022

      • lucifer
        i'd consider it a table. with each row as 1 document.
      • 2022-07-22 20344, 2022

      • alastairp
        right
      • 2022-07-22 20301, 2022

      • lucifer
        most of the relevant code to interact with couchdb is located here.
      • 2022-07-22 20323, 2022

      • alastairp
        so just to confirm - your current question is to know how we can decide if a couchdb database is ready to be queried?
      • 2022-07-22 20302, 2022

      • lucifer
        ah no, we can't decide it afaik so we just try latest to oldest.
      • 2022-07-22 20350, 2022

      • alastairp
        did we discuss having a separate piece of data indicating the db to check? (e.g. in redis or postgres)
      • 2022-07-22 20300, 2022

      • lucifer
        it should be 2 HTTP queries at most: today's database or yesterday's. we delete daily, so at most 2 databases exist at any time (when the latest one is not ready yet).
      • 2022-07-22 20301, 2022

      • alastairp
        I recall us talking about it but can't remember what we decided
      • 2022-07-22 20332, 2022

      • lucifer
        yes we discussed that, iirc we decided in favor of the current impl.
      • 2022-07-22 20332, 2022

      • alastairp
        is this 2 queries per [requesting a user's stats]?
      • 2022-07-22 20339, 2022

      • lucifer
        yes
      • 2022-07-22 20305, 2022

      • lucifer
        2 HTTP queries to be specific.
      • 2022-07-22 20316, 2022

      • alastairp
        so it's possible that for 1 user they might have something in today's db, but for another user it's not yet been inserted and so it goes to yesterday's?
      • 2022-07-22 20326, 2022

      • lucifer
        yes
      • 2022-07-22 20337, 2022

      • lucifer
        that happens currently as well fwiw.
      • 2022-07-22 20341, 2022

      • alastairp
        right
      • 2022-07-22 20359, 2022

      • lucifer
        also, for the same user, some stats may be from today but others older
      • 2022-07-22 20320, 2022

      • alastairp
        I'm just thinking through a few cases: is it possible that a user will have stats for yesterday, but even after they're fully computed and inserted into the db there are none for today?
      • 2022-07-22 20314, 2022

      • lucifer
        yes. that can happen in 2 cases. the user deleted their account or they didn't submit any listens for current range.
      • 2022-07-22 20333, 2022

      • alastairp
        yeah, no listens for current range is what I was thinking of
      • 2022-07-22 20334, 2022

      • lucifer
        however, in case 1 the user lookup in the db fails, so couchdb is never queried.
      • 2022-07-22 20342, 2022

      • alastairp
        right
      • 2022-07-22 20328, 2022

      • lucifer
        case 2: it's possible on days like the 1st of the year, when the last year stat changes years.
      • 2022-07-22 20359, 2022

      • lucifer
        this is also the current behavior of the pg tables.
      • 2022-07-22 20318, 2022

      • alastairp
        you know what, I was just wondering about going back to our original discussion about saving the "current" table somewhere. and that's actually 2 queries anyway
      • 2022-07-22 20328, 2022

      • Pratha-Fish
        alastairp: I had to leave the mapper script on hold today due to some network issues. I'll try to get it done over the weekend though, so if there are any instructions you'd like to give me in advance, please go ahead :)
      • 2022-07-22 20343, 2022

      • alastairp
        Pratha-Fish: hi, no problem. I don't think I have anything extra that I need to add
      • 2022-07-22 20347, 2022

      • lucifer
        yeah indeed.
      • 2022-07-22 20352, 2022

      • alastairp
        let's talk either over the weekend or on monday
      • 2022-07-22 20302, 2022

      • Pratha-Fish
        alastairp: Alright :)
      • 2022-07-22 20313, 2022

      • alastairp
        lucifer: can couchdb's bulk insert methods help us?
      • 2022-07-22 20320, 2022

      • lucifer
        i guess redis may be faster than http but not very reliable. PG probably similar to http.
      • 2022-07-22 20331, 2022

      • alastairp
        (how large is a stats run? is it a risk for us to insert everything at once?)
      • 2022-07-22 20343, 2022

      • lucifer
        yes, already using those, but in batches of 10-25.
      • 2022-07-22 20310, 2022

      • lucifer
        yeah, can't insert everything in 1 go because each stat type × range combination is between 40 MB and 1.5 GB.
      • 2022-07-22 20314, 2022
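
Here is a sketch of splitting one stat's documents into fixed-size batches for CouchDB's `POST /{db}/_bulk_docs` endpoint, whose request body is a JSON object with a `"docs"` array. The default batch size of 25 follows the 10-25 range mentioned above. The helper name and document shape are hypothetical, and sending the requests is left out. If `docs` is itself a lazy iterator, only one batch is held in memory at a time, which matters for stats in the 40 MB to 1.5 GB range.

```python
import json
from collections.abc import Iterable, Iterator
from itertools import islice


def bulk_docs_bodies(docs: Iterable[dict], batch_size: int = 25) -> Iterator[str]:
    """Yield one JSON request body per batch of at most `batch_size` documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    it = iter(docs)
    while batch := list(islice(it, batch_size)):
        yield json.dumps({"docs": batch})
```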

      • alastairp
        right. not all at once
      • 2022-07-22 20315, 2022

      • alastairp
        mmhm
      • 2022-07-22 20356, 2022

      • alastairp
        approximately how long does it take to store all stats?
      • 2022-07-22 20317, 2022

      • lucifer
        i'll need to check but >4 hrs iirc.
      • 2022-07-22 20331, 2022

      • lucifer
        this includes time to generate those as well.
      • 2022-07-22 20323, 2022

      • lucifer
        in the long run, it'll probably become infeasible to generate all this data daily, but I may be wrong. we'll see when that time nears.
      • 2022-07-22 20352, 2022

      • alastairp
        yeah, right. at some point in time it might be better to add in a flag for the current db too
      • 2022-07-22 20320, 2022

      • alastairp
        but at the moment, if we already have this behaviour in postgres, and if it'll seamlessly work as new data gets added, I don't see a huge problem with it
      • 2022-07-22 20322, 2022

      • lucifer
        sorry, not sure which flag you meant?
      • 2022-07-22 20317, 2022

      • alastairp
        oh, a setting in postgres or redis which says what the current database is
      • 2022-07-22 20300, 2022
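
Here is a sketch of the "current database" pointer alastairp suggests. An in-memory dict stands in for the Redis key or Postgres row, and the class name and key format are hypothetical. With this approach, a read becomes one pointer lookup plus one CouchDB request, matching the "2 queries anyway" remark earlier in the discussion.

```python
from typing import Optional


class CurrentDbPointer:
    """Stand-in for a Redis/Postgres setting naming the ready database per stat and range."""

    def __init__(self) -> None:
        self._current: dict[str, str] = {}

    def mark_ready(self, stat_type: str, stats_range: str, db_name: str) -> Optional[str]:
        """Point readers at `db_name` once the end message arrives; return the old db to drop."""
        key = f"{stat_type}_{stats_range}"
        previous = self._current.get(key)
        self._current[key] = db_name
        return previous

    def current(self, stat_type: str, stats_range: str) -> Optional[str]:
        return self._current.get(f"{stat_type}_{stats_range}")
```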

      • lucifer
        ah ok, i was referring to the inserting part becoming infeasible, but sure, we can try redis/pg later.
      • 2022-07-22 20315, 2022

      • lucifer
        there's 1 difference from the current behaviour. say you suddenly stopped submitting listens: the pg tables are never cleared, so old stats remain, but in couchdb we delete old stats daily, so outdated ones will no longer remain.
      • 2022-07-22 20322, 2022

      • alastairp
        oh yes, that may also be a problem in the future
      • 2022-07-22 20350, 2022

      • lucifer
        maybe we can discuss some options for that at summit.
      • 2022-07-22 20358, 2022

      • alastairp
        but we have different couchdb tables for each stat range type, right?
      • 2022-07-22 20305, 2022

      • lucifer
        right
      • 2022-07-22 20317, 2022

      • alastairp
        so if I stop submitting, in 1 day I'll stop having daily stats, in 1 week I'll stop having weekly stats
      • 2022-07-22 20324, 2022

      • lucifer
        right
      • 2022-07-22 20328, 2022

      • alastairp
        and my yearly stats won't stop showing until the next time we compute yearly ones?
      • 2022-07-22 20340, 2022

      • alastairp
        so that sounds like an improvement over the current setup?
      • 2022-07-22 20342, 2022

      • lucifer
        we compute all stats daily.
      • 2022-07-22 20331, 2022

      • lucifer
        but all of the year's listens are considered for yearly stats, so if you stop submitting for a year then that stat will go away
      • 2022-07-22 20308, 2022

      • alastairp
        yes, right. I'm following
      • 2022-07-22 20329, 2022

      • lucifer
        in the current setup, you'd continue seeing the last year's stats
      • 2022-07-22 20300, 2022

      • lucifer
        it's probably more accurate indeed.
      • 2022-07-22 20355, 2022

      • lucifer
        any other suggestions about insert/fetch process? if not let's move to dumps.
      • 2022-07-22 20300, 2022

      • alastairp
        oh, mmm
      • 2022-07-22 20325, 2022

      • alastairp
        can you give me some examples with date boundaries for seeing last [period]'s stats?
      • 2022-07-22 20338, 2022

      • alastairp
        e.g. imagine if I have stats for Jan 1-Jan 31
      • 2022-07-22 20345, 2022

      • alastairp
        and then I stop listening all of Feb
      • 2022-07-22 20352, 2022

      • alastairp
        then during Feb I'll see Jan's stats?
      • 2022-07-22 20359, 2022

      • lucifer
        yes
      • 2022-07-22 20316, 2022

      • alastairp
        and a user who is actively listening... will see month-to-date stats, or still Jan?
      • 2022-07-22 20355, 2022

      • lucifer
        we have 2 monthly ranges. Last Month is the full last month, in this case always Jan. the other range is This Month, which is month-to-date.
      • 2022-07-22 20334, 2022

      • lucifer
        so for This Month, user who submits in Feb will see Feb stats whereas the not submitting one will continue to see Jan.
      • 2022-07-22 20307, 2022
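
The two monthly ranges lucifer describes can be expressed as date bounds. This sketch assumes inclusive bounds and uses hypothetical helper names. It is not necessarily how the stats jobs compute their windows.

```python
from datetime import date, timedelta


def this_month(today: date) -> tuple[date, date]:
    """This Month (month-to-date): the 1st of the current month up to today, inclusive."""
    return today.replace(day=1), today


def last_month(today: date) -> tuple[date, date]:
    """Last Month: the whole previous calendar month, inclusive."""
    last_day = today.replace(day=1) - timedelta(days=1)
    return last_day.replace(day=1), last_day
```

For a date in February, `last_month` always yields January, while `this_month` moves forward with the calendar.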

      • alastairp
        does a stat document include what range it's for?
      • 2022-07-22 20334, 2022

      • lucifer
        yes, in the current setup it's a column. in the couchdb setup, the database name contains it.
      • 2022-07-22 20345, 2022

      • alastairp
        if we're able to identify it (after we retrieve it from storage), it probably makes sense to say "you have no stats for the current month"
      • 2022-07-22 20350, 2022

      • alastairp
        oh
      • 2022-07-22 20302, 2022

      • alastairp
        but we don't know if this is because there are none, or if it's not been inserted yet
      • 2022-07-22 20306, 2022

      • alastairp
        back to this again
      • 2022-07-22 20348, 2022

      • lucifer
        we'll search in the older db and if there are none there either then say no stats.
      • 2022-07-22 20353, 2022

      • alastairp
        so it does feel a bit weird to say "month-to-date" and show Jan when we're in Feb
      • 2022-07-22 20304, 2022

      • lucifer
        indeed
      • 2022-07-22 20355, 2022

      • alastairp
        but as you said, this is the same as current behaviour?
      • 2022-07-22 20317, 2022

      • lucifer
        this is the current behaviour because we never delete the old stats: we only insert into the stats table in PG. when spark sends new stats, existing stats are overwritten. if spark sends no stat for this month, the old one remains.
      • 2022-07-22 20331, 2022

      • alastairp
        right
      • 2022-07-22 20349, 2022

      • lucifer
        in the couchdb setup, we only keep the stats that spark sent this time and get rid of the old db entirely. so the old stat goes away.
      • 2022-07-22 20316, 2022
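
The difference between the two storage behaviours can be reduced to dicts keyed by user id. These are hypothetical helpers, not the actual ListenBrainz code. The Postgres path overwrites only the rows Spark sent, so a user with no new listens keeps a stale stat. The CouchDB path keeps only what Spark sent, in the new database.

```python
def pg_style_update(stored: dict[int, str], sent: dict[int, str]) -> dict[int, str]:
    """Overwrite the rows Spark sent; rows it did not send keep their old value."""
    merged = dict(stored)
    merged.update(sent)
    return merged


def couchdb_style_update(stored: dict[int, str], sent: dict[int, str]) -> dict[int, str]:
    """The new database holds only what Spark sent; `stored` is dropped with the old database."""
    return dict(sent)
```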

      • alastairp
        I can see a possibility that we might want to set it up somehow so that "current x" stats always show the current stats, but for now I think it's OK to leave as-is
      • 2022-07-22 20318, 2022

      • lucifer
        so month-to-date does not show anything in Feb for the user who submitted nothing in Feb.
      • 2022-07-22 20317, 2022

      • lucifer
        iiuc, then the couchdb setup already does that "current x" always showing the current stats?
      • 2022-07-22 20353, 2022

      • alastairp
        the way I was understanding it, the user who hasn't listened to anything in feb will show "month of jan" when showing month-to-date stats, even in feb?