#metabrainz


      • dseomn has quit
      • davic has quit
      • dseomn joined the channel
      • Nyanko-sensei has quit
      • Nyanko-sensei joined the channel
      • MajorLurker joined the channel
      • MajorLurker has quit
      • Sigyn has quit
      • Sigyn joined the channel
      • yef has quit
      • yef joined the channel
      • yef has quit
      • yef joined the channel
      • Rohan_Pillai joined the channel
      • Rohan_Pillai has quit
      • sumedh joined the channel
      • sumedh has quit
      • Rohan_Pillai joined the channel
      • MajorLurker joined the channel
      • Rohan_Pillai has quit
      • MajorLurker has quit
      • sumedh joined the channel
      • Rohan_Pillai joined the channel
      • sumedh has quit
      • Rohan_Pillai has quit
      • Rohan_Pillai joined the channel
      • ShraddhaAg_ joined the channel
      • revi_ joined the channel
      • leonh_ joined the channel
      • mat_ joined the channel
      • milkii_ joined the channel
      • pprkut_ joined the channel
      • BrainzGit
        [listenbrainz-server] jdaok opened pull request #1314 (master…timestampformat): Timestampformat https://github.com/metabrainz/listenbrainz-serv...
      • urluck_ joined the channel
      • Rohan_Pillai has quit
      • Protab joined the channel
      • Rohan_Pillai joined the channel
      • ShraddhaAg has quit
      • mat___ has quit
      • milkii has quit
      • leonh has quit
      • pprkut has quit
      • revi has quit
      • Rotab has quit
      • urluck has quit
      • pprkut_ is now known as pprkut
      • urluck_ is now known as urluck
      • revi_ is now known as revi
      • ruaok
        ooiin!
      • _lucifer
        ruaok: should the messages sent back to lemmy be one message per user, or all users in a single message?
      • reosarevok
        yvanzo: I know you've talked about configurable columns for data display before - do you know if we have a ticket for that? (https://tickets.metabrainz.org/browse/MBS-11414 is about that and I'm wondering if it's a dupe)
      • BrainzBot
        MBS-11414: Collection view should allow managing (thus adding missing) columns
      • ruaok
        _lucifer: all in one.
      • _lucifer
        👍
      • ruaok
        updating the table row by row would be painfully slow. I plan to insert rows into a new table and then atomically swap the tables into production.
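Here is a minimal sketch of the "all in one" message ruaok asks for. It also shows how that message could be flattened back into rows for bulk-loading a fresh table, instead of updating the live table row by row. The message keys ("type", "data") and the user-to-scores layout are hypothetical. They are not the actual ListenBrainz/lemmy message format.

```python
import json


def build_similar_users_message(similar_users):
    """Pack the results for ALL users into a single message.

    similar_users: dict mapping user name -> {other user name: similarity score}.
    The "type"/"data" keys are hypothetical, chosen for illustration only.
    """
    return json.dumps({"type": "similar_users", "data": similar_users})


def rows_from_message(message):
    """Flatten one message into (user, other_user, score) tuples,
    ready for a single bulk INSERT into a freshly created table."""
    data = json.loads(message)["data"]
    return [
        (user, other, score)
        for user, others in data.items()
        for other, score in others.items()
    ]
```

With a single message, the consumer receives the complete data set at once. It can therefore build the replacement table in one pass and never has to merge partial updates.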
      • _lucifer
        ruaok: i just pushed the initial implementation for user similarity. i was going through how lemmy requests similar users and think that it'll probably need a couple of changes.
      • ruaok
        ok, what does it need?
      • _lucifer
        since we decided earlier to separate dataframe creation, the similar user request should just send a threshold
      • ruaok
        ah, ok. np, will fix.
      • _lucifer
        before sending a request for similar users, we need to manually request dataframes
      • ruaok
        makes sense.
      • _lucifer
        that part uses days instead of years, so the request should send number of days instead of years
      • ruaok
        theoretically that part should not need any changes right?
      • just use days, yes?
      • years argument removed.
      • _lucifer
        the days part no. but the request should now send a job_type as well to denote whether the dataframe is being generated for recommendations or user similarity
      • ruaok
        what are the two exact string values possible for job_type?
      • _lucifer
        i am using "recommendation" and "user_similarity" for now but that can be changed
      • ruaok
        I hope to have similar artist collaborative filtering soon. that will make the candidate set selection for recording CF work a lot better.
      • will the dataframes generated for "recommendation" be suitable for artist recommendation and recording recommendation?
      • if not, we should name "recommendation" to "recommendation_recording".
      • _lucifer
        i think those will be different, we were able to reuse the dataframes in this case because we use recordings for both things
      • recommendation_recording sounds good, on a similar note will we want to have user_similarity based on artists?
      • ruaok
        going with "recommendation_recording" then.
      • > on a similar note will we want to have user_similarity based on artists?
      • I don't see an immediate need for that; we need to look at the results of what you've created so far.
      • then we'll see.
      • but if you find yourself bored, you could work on the CF artists feature. I theory most of it is copypasta.
      • _lucifer
        makes sense.
      • ruaok
        In theory...
      • _lucifer
        sure, but I need to test and iron out this feature first :)
      • ruaok
        agreed.
      • the data saving is the primary task for today. hopefully we can test later this afternoon.
      • _lucifer
        i'll be unavailable between 2-6PM CET. let's do it after 6 today or tomorrow
      • ruaok
        ok, I won't be available after 6pm, so let's see what we can do before then, or tomorrow.
      • _lucifer
        cool. in the meanwhile, i'll work on documenting the spark side and writing unit tests.
      • ruaok
        great.
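The request changes agreed above could look roughly like the sketch below. There are three of them: days instead of years, a job_type on the dataframe request, and a similar-users request that carries only a threshold and is sent after the dataframes exist. Only the two job_type strings come from the conversation. The function names, the "request"/"params" keys and the example values are hypothetical.

```python
# The two job_type values settled on in the discussion.
VALID_JOB_TYPES = {"recommendation_recording", "user_similarity"}


def create_dataframes_request(days, job_type):
    """Hypothetical builder for the dataframe request (sent first).

    Uses a number of days instead of years, plus job_type to say whether the
    dataframes are for recording recommendations or user similarity.
    """
    if job_type not in VALID_JOB_TYPES:
        raise ValueError(f"unknown job_type: {job_type!r}")
    if days <= 0:
        raise ValueError("days must be positive")
    return {"request": "create_dataframes", "params": {"days": days, "job_type": job_type}}


def similar_users_request(threshold):
    """Hypothetical builder for the similar-users request (sent second).

    Dataframe creation is a separate request now, so only the threshold is sent.
    """
    return {"request": "similar_users", "params": {"threshold": threshold}}
```

Usage, in the order the requests would be sent: `[create_dataframes_request(180, "user_similarity"), similar_users_request(0.1)]`. Here 180 days and a 0.1 threshold are placeholder values.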
      • reosarevok
        bitmap: is this deadlock connected to the ones you're hoping to remove? https://tickets.metabrainz.org/browse/MBS-9683
      • BrainzBot
        MBS-9683: Database deadlock on add artist edits
      • Mr_Monkey
        iliekcomputers: Hi! Do we have a definitive format for the `user/XXX/feed` API endpoint? I know /feed/listens was returning a list of listens, but I've assumed the following structure instead and wanted to compare with your plan:
      • alastairp
        I've recently been using a chrome profile that doesn't have an ad blocker installed, and the internet is awful. truly terrible
      • many kudos to metabrainz for not having ads on any of its products
      • ruaok
        the number of asswipes we need to fend off each week trying to sell us ads. sigh.
      • the advertising requests catch-all has reduced them a lot though: https://metabrainz.org/contact
      • alastairp: got a sec to discuss a minor topic.
      • Mr_Monkey
      • alastairp
        ruaok: you've got 10 minutes before I go for a bike ride
      • ruaok
        Mr_Monkey: yep. and that deadline string is always next month 1st. :)
      • alastairp: ok, for atomically rotating postgres tables into place.
      • Rohan_Pillai has quit
      • Mr_Monkey
        lulz
      • ruaok
        one sec
      • Mr_Monkey
        Oh hey alastairp, your pet peeve has been answered I think! https://github.com/metabrainz/listenbrainz-serv...
      • alastairp
        which pet peeve? I've got lots of them :)
      • ruaok
        sorry package delivery.
      • PG table rotation.
      • there is a feature to rename a table, which allows atomic swapping in of tables.
      • it, however, does not rename its indexes. which is a pain in the ass.
      • so, we either rename all the indexes or we give indexes unique names so we don't have to rename indexes.
      • this is a pattern I've come across twice now and on a table that has more than 1 index, this process gets butt ugly.
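With the "rename all the indexes" option, the statements can at least be generated rather than written by hand. This sketch assumes the live table's indexes are named `{table}_{suffix}` and the rebuilt table is `{table}_new` with indexes `{table}_new_{suffix}`. The helper name and naming scheme are hypothetical. In PostgreSQL, index names share the schema's relation namespace with tables. The live indexes therefore have to be moved aside before the new ones can take their names.

```python
def swap_with_index_renames(table, index_suffixes):
    """Hypothetical helper: SQL that swaps f"{table}_new" into place as `table`,
    renaming indexes so the live table keeps its usual index names.

    ALTER TABLE ... RENAME TO does not rename a table's indexes, hence the
    explicit ALTER INDEX ... RENAME TO statements.
    """
    new, old = f"{table}_new", f"{table}_old"
    stmts = ["BEGIN", f"ALTER TABLE {table} RENAME TO {old}"]
    # Move the live indexes out of the way first, freeing their names.
    stmts += [f"ALTER INDEX {table}_{s} RENAME TO {old}_{s}" for s in index_suffixes]
    stmts.append(f"ALTER TABLE {new} RENAME TO {table}")
    # Now the new table's indexes can take over the canonical names.
    stmts += [f"ALTER INDEX {new}_{s} RENAME TO {table}_{s}" for s in index_suffixes]
    # The swap is committed before the old copy is dropped.
    stmts += ["COMMIT", f"DROP TABLE {old}"]
    return stmts
```

Run the list with autocommit enabled (psycopg2's `connection.autocommit = True`) so the explicit BEGIN/COMMIT delimit the swap. The names are interpolated unquoted, which assumes trusted, lowercase identifiers.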
      • alastairp
        or rename old indexes then make new table with correct names, then move
      • which isn't great either
      • why are you rotating tables? what new data is coming in, and why is it so different that we need to make a new table and rename it?
      • ruaok
        similar users, for instance. we would have to diff the existing data against the new data to update it, or just blow it all away, insert the new data and swap the tables in.
      • the latter is MUCH faster, much less error prone.
      • alastairp
        data structure is the same?
      • ruaok
        exactly the same.
      • alastairp
        just throwing some ideas around without knowing the problem area too well: views?
      • ruaok
        and TRUNCATE would have an exclusive lock on the table for too long.
      • alastairp
      • ruaok
        what I am doing is the right thing to do, I am sure. that is not what I want to talk about.
      • alastairp
        let me mull this over on the ride, will let you know later this afternoon
      • ruaok
        I haven't even gotten to ask my question yet....
      • d4rkie joined the channel
      • Nyanko-sensei has quit
      • Protab is now known as Rotab
      • reosarevok
        yvanzo: in https://github.com/metabrainz/musicbrainz-serve... - tests are failing because of one French example
      • With the code change, that returns "Acte 1, no. 7 : Chœur : « Voyons brigadier »"
      • Is that actually wrong though?
      • MajorLurker joined the channel
      • MajorLurker has quit
      • ruaok
        iliekcomputers: you up for a quick technical discussion?
      • iliekcomputers
        Yep
      • ruaok
        cool.
      • for the similar users feature I need to create a parallel table, populate it and then swap it in in one transaction.
      • nothing challenging here.
      • more a "how do we do this cleanly" question.
      • we have the table definition in create_tables.sql.
      • but now I need to run that single table creation script again as part of an INSERT INTO query.
      • which duplicates a critical table definition and that blows.
      • any idea how to have that knowledge live in code and the .sql file?
      • iliekcomputers
        What do you mean by parallel table?
      • ruaok
        the similar_users table is in production. now we want to update the table with new data from spark.
      • the fastest way to do this is not to diff the table, but to create a new parallel table with the same table structure, INSERT INTO, CREATE INDEX, then rename the tables in a single transaction (ALTER TABLE ... RENAME TO).
      • this allows the table to always be available with no downtime.
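The parallel-table flow ruaok describes might look like the sketch below, combined with the "unique index names" option from earlier. `CREATE TABLE ... (LIKE ...)` is standard PostgreSQL and copies column definitions from the live table. That is one way to keep the schema defined only once, in create_tables.sql. The helper name, the `_new`/`_old` suffixes and the `generation` tag are hypothetical. `conn` is assumed to be a DB-API connection using psycopg2-style `%s` placeholders.

```python
def rotate_table(conn, table, columns, rows, index_columns, generation):
    """Hypothetical helper: rebuild `table` from `rows` and swap it in.

    generation: a tag unique per run (e.g. a date string). It goes into the
    index names so new and old indexes never collide and need no renaming.
    """
    new_table, old_table = f"{table}_new", f"{table}_old"
    cur = conn.cursor()

    # 1. Build the parallel table. LIKE copies the column definitions from the
    #    live table, so they are not duplicated here.
    cur.execute(f"DROP TABLE IF EXISTS {new_table}")  # leftover from a failed run
    cur.execute(f"CREATE TABLE {new_table} (LIKE {table} INCLUDING DEFAULTS INCLUDING CONSTRAINTS)")

    # 2. Bulk insert first, then build indexes once over the loaded data.
    col_list = ", ".join(columns)
    placeholders = ", ".join(["%s"] * len(columns))
    cur.executemany(f"INSERT INTO {new_table} ({col_list}) VALUES ({placeholders})", rows)
    for col in index_columns:
        cur.execute(f"CREATE INDEX {table}_{col}_idx_{generation} ON {new_table} ({col})")
    conn.commit()

    # 3. Swap: both renames commit together, so readers see either the old
    #    table or the new one, never a missing table.
    cur.execute(f"ALTER TABLE {table} RENAME TO {old_table}")
    cur.execute(f"ALTER TABLE {new_table} RENAME TO {table}")
    conn.commit()

    # 4. Drop the previous generation, along with its indexes.
    cur.execute(f"DROP TABLE {old_table}")
    conn.commit()
```

`LIKE` without `INCLUDING INDEXES` does not copy primary keys or unique constraints. If the table has them, they would need to be added after the load. Names are interpolated directly, which assumes trusted identifiers; `psycopg2.sql.Identifier` is the safer route in real code.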
      • iliekcomputers
        So every time new data comes in, we'll create a new table, drop the old one and rename the new one?
      • ruaok
        almost.