ruaok: should the messages sent back to lemmy be one message per user, or all users in a single message?
reosarevok
yvanzo: I know you've talked about configurable columns for data display before - do you know if we have a ticket for that? (https://tickets.metabrainz.org/browse/MBS-11414 is about that and I'm wondering if it's a dupe)
BrainzBot
MBS-11414: Collection view should allow managing (thus adding missing) columns
ruaok
_lucifer: all in one.
_lucifer
👍
ruaok
updating the table row by row would be painfully slow. I plan to insert rows into a new table and then atomically swap the tables into production.
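(As a rough illustration, the all-in-one payload might look something like the sketch below; the field names and structure are hypothetical, not the actual message format.)

```python
# Hypothetical shape of the single "all users in one message" payload that
# spark would send back; field names and values are illustrative only.
similar_users_message = {
    "type": "similar_users",
    "data": [
        {"user_name": "user_a", "similar_users": {"user_b": 0.87, "user_c": 0.54}},
        {"user_name": "user_b", "similar_users": {"user_a": 0.87, "user_d": 0.41}},
        # ... one entry per user, all in the same message
    ],
}
```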
_lucifer
ruaok: i just pushed the initial implementation for user similarity. i was going through how lemmy requests similar users and think that it'll probably need a couple of changes.
ruaok
ok, what does it need?
_lucifer
since we decided earlier to separate dataframe creation, the similar users request should just send a threshold
ruaok
ah, ok. np, will fix.
_lucifer
before sending a request for similar users, we need to manually request dataframes
ruaok
makes sense.
_lucifer
that part uses days instead of years, so the request should send the number of days instead of years
ruaok
theoretically that part should not need any changes right?
just use days, yes?
years argument removed.
_lucifer
the days part, no. but the request should now also send a job_type, to denote whether the dataframe is being generated for recommendations or for user similarity
ruaok
what are the two exact string values possible for job_type?
_lucifer
i am using "recommendation" and "user_similarity" for now but that can be changed
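(For concreteness, the two requests being discussed might look roughly like the sketch below. Only the job_type values come from the conversation; the other key names and the query identifiers are assumptions.)

```python
# Sketch of the two requests lemmy would send, per the discussion above.
# Only the job_type values are from the conversation; the other key names
# and the query identifiers are assumptions.

# 1. Ask spark to (re)generate the dataframes first, passing days rather
#    than years, plus a job_type saying what the dataframes will be used for.
create_dataframes_request = {
    "query": "cf.create_dataframes",    # hypothetical query name
    "params": {
        "days": 180,                    # window size in days
        "job_type": "user_similarity",  # or "recommendation"
    },
}

# 2. Then ask for the similar users themselves; since dataframe creation
#    is now a separate step, only a threshold needs to be sent.
similar_users_request = {
    "query": "cf.similar_users",        # hypothetical query name
    "params": {
        "threshold": 0.5,
    },
}
```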
ruaok
I hope to have similar artist collaborative filtering soon. that will make the candidate set selection for recording CF work a lot better.
will the dataframes generated for "recommendation" be suitable for artist recommendation and recording recommendation?
if not, we should rename "recommendation" to "recommendation_recording".
_lucifer
i think those will be different; we were able to reuse the dataframes in this case because we use recordings for both things
recommendation_recording sounds good. on a similar note, will we want to have user_similarity based on artists?
ruaok
going with "recommendation_recording" then.
> on a similar note will we want to have user_similarity based on artists?
I don't see an immediate need for that -- we need to look at the results of what you've created so far.
then we'll see.
but if you find yourself bored, you could work on the CF artists feature. In theory most of it is copypasta.
_lucifer
makes sense.
ruaok
In theory...
_lucifer
sure, but we need to test and iron out this feature first :)
ruaok
agreed.
the data saving is the primary task for today. hopefully we can test later this afternoon.
_lucifer
i'll be unavailable between 2 and 6 PM CET. let's do it after 6 today, or tomorrow
ruaok
ok, I won't be available after 6 PM, so let's see what we can do before then. or tomorrow.
_lucifer
cool. in the meanwhile, i'll work on documenting the spark side and writing unit tests.
iliekcomputers: Hi! Do we have a definitive format for the `user/XXX/feed` API endpoint? I know /feed/listens was returning a list of listens, but I've assumed the following structure instead and wanted to compare with your plan:
ruaok
similar users, for instance. we would have to diff the existing data against the new data to update it, or just blow it all away, insert the new data, and swap the tables in.
the latter is MUCH faster and much less error-prone.
alastairp
data structure is the same?
ruaok
exactly the same.
alastairp
just throwing some ideas around without knowing the problem area too well: views?
ruaok
and TRUNCATE would have an exclusive lock on the table for too long.
With the code change, that returns "Acte 1, no. 7 : Chœur : « Voyons brigadier »"
Is that actually wrong though?
ruaok
iliekcomputers: you up for a quick technical discussion?
iliekcomputers
Yep
ruaok
cool.
for the similar users feature I need to create a parallel table, populate it, and then swap it in, in one transaction.
nothing challenging here.
more a "how do we do this cleanly" question.
we have the table definition in create_tables.sql.
but now I need to run that single table creation script again as part of an INSERT INTO query.
which duplicates a critical table definition and that blows.
any idea how to have that knowledge live in code and the .sql file?
iliekcomputers
What do you mean by parallel table?
ruaok
the similar_users table is in production. now we want to update the table with new data from spark.
the fastest way to do this is not to diff the table, but to create a new parallel table with the same table structure, INSERT INTO, CREATE INDEX, then in a transaction RENAME TABLE.
this allows the table to always be available with no downtime.
iliekcomputers
So every time new data comes in, we'll create a new table, drop the old one and rename the new one?
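(A minimal sketch of the swap ruaok is describing, assuming PostgreSQL and psycopg2. The table is called similar_users per the discussion above, but the column names and the helper function are hypothetical; CREATE TABLE ... (LIKE ...) is one way to reuse the definition from create_tables.sql without duplicating it in code.)

```python
import psycopg2
from psycopg2.extras import execute_values


def swap_in_similar_users(conn, rows):
    """Load freshly computed similarity data into a scratch table and
    atomically swap it into place, so the live similar_users table is
    never unavailable.

    `rows` is assumed to be an iterable of (user_id, data) tuples; the
    real column layout lives in create_tables.sql.
    """
    with conn:  # one transaction; commits on success, rolls back on error
        with conn.cursor() as cur:
            # Reuse the live table's structure (including indexes and
            # constraints) instead of duplicating its DDL in code.
            cur.execute("DROP TABLE IF EXISTS similar_users_new")
            cur.execute(
                "CREATE TABLE similar_users_new (LIKE similar_users INCLUDING ALL)"
            )

            # Bulk insert the new data received from spark.
            execute_values(
                cur,
                "INSERT INTO similar_users_new (user_id, data) VALUES %s",
                rows,
            )

            # Swap the tables; readers only ever see a complete table.
            cur.execute("ALTER TABLE similar_users RENAME TO similar_users_old")
            cur.execute("ALTER TABLE similar_users_new RENAME TO similar_users")
            cur.execute("DROP TABLE similar_users_old")
```

(INCLUDING ALL copies the indexes up front to keep the sketch short; for a very large load it may be faster to create the indexes after the bulk insert, as ruaok suggests.)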