9:37 AM
pristine___
2020-08-13 22652, 2020
9:37 AM
pristine___
This is where we perform the join on mapping.
2020-08-13 22659, 2020
9:37 AM
iliekcomputers
yes. there might be people who've only connected spotify to listen to music.
2020-08-13 22607, 2020
9:38 AM
pristine___
And listens.
2020-08-13 22620, 2020
9:38 AM
yvanzo
reosarevok: gh:MBS#1610 is approved too
2020-08-13 22621, 2020
9:38 AM
BrainzBot
2020-08-13 22630, 2020
9:38 AM
iliekcomputers
will have to look at the permissions we have in the permission column too.
2020-08-13 22655, 2020
9:38 AM
reosarevok
Thanks!
2020-08-13 22645, 2020
9:39 AM
ruaok
pristine___: yeah, confirmed. not a mapping problem. a missing data problem in MB -- not a glaring one, but still.
2020-08-13 22608, 2020
9:40 AM
pristine___
Hmm... I like the idea of reporting it to the users
2020-08-13 22620, 2020
9:40 AM
ruaok
the individual artists are there, but the collaborations have not been entered. that means it's #2 (as you said) and we should prepare a report so that users can go enter the data.
2020-08-13 22642, 2020
9:40 AM
ruaok
2020-08-13 22607, 2020
9:41 AM
ruaok
the best we can do is seed the release editor with the data we already know -- and the link above explains how to do that.
2020-08-13 22643, 2020
9:41 AM
ruaok
so, if we can write a report that has a pile of links on it that open the release editor and seed it with the data we have, then the process of adding the data to MB is made a wee bit easier. and we want exactly that.
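A minimal sketch of what one entry in such a report could look like, assuming the MusicBrainz release-editor seeding interface (form fields such as `name`, `artist_credit.names.N.name` and `mediums.N.track.N.name` submitted to `/release/add`). Because seeding is done by submitting a form, each "link" is rendered here as a small auto-filled form with a button. The function names are hypothetical, and the field names should be checked against the seeding docs:

```python
import html

# Endpoint and field names follow the MusicBrainz release-editor seeding docs
# as far as this sketch assumes; verify before relying on them.
MB_RELEASE_ADD_URL = "https://musicbrainz.org/release/add"


def release_seed_fields(release_name, credited_artists, track_names, edit_note=""):
    """Flatten what LB already knows about a missing release into seeding fields.

    credited_artists: list of (artist_name, join_phrase) pairs, in credit order.
    track_names: titles of the known tracks, all placed on the first medium.
    """
    fields = {"name": release_name}
    for i, (artist_name, join_phrase) in enumerate(credited_artists):
        fields[f"artist_credit.names.{i}.name"] = artist_name
        fields[f"artist_credit.names.{i}.join_phrase"] = join_phrase
    for i, title in enumerate(track_names):
        fields[f"mediums.0.track.{i}.name"] = title
    if edit_note:
        fields["edit_note"] = edit_note
    return fields


def seed_form_html(fields, label="Add to MusicBrainz"):
    """Render one report entry as a form that POSTs the seed data to the release editor."""
    inputs = "\n".join(
        f'  <input type="hidden" name="{html.escape(key)}" value="{html.escape(value)}">'
        for key, value in fields.items()
    )
    return (
        f'<form action="{MB_RELEASE_ADD_URL}" method="post" target="_blank">\n'
        f"{inputs}\n"
        f'  <button type="submit">{html.escape(label)}</button>\n'
        "</form>"
    )
```

Even with a single known track, the editor opens with the release title, artist credit and that track already filled in, which is the "wee bit easier" part.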
2020-08-13 22644, 2020
9:42 AM
pristine___
With the data we already know?
2020-08-13 22648, 2020
9:42 AM
ruaok
yes!
2020-08-13 22650, 2020
9:42 AM
ruaok
2020-08-13 22641, 2020
9:43 AM
ruaok
iliekcomputers: are any of those perm combos we should not update the record_listens for?
2020-08-13 22650, 2020
9:43 AM
pristine___
But we want to add the data that we don't have.
2020-08-13 22652, 2020
9:43 AM
iliekcomputers
i think we can do it for anyone with the `user-read-recently-played` permission.
2020-08-13 22601, 2020
9:44 AM
pristine___
I think I am not clear about it.
2020-08-13 22609, 2020
9:44 AM
ruaok
> With the data we already know?
2020-08-13 22618, 2020
9:44 AM
ruaok
with the data from LB, not the data from MB.
2020-08-13 22653, 2020
9:44 AM
pristine___
Woops. So you mean the data we have spotted in LB but ain't available in MB
2020-08-13 22655, 2020
9:44 AM
ruaok
so for artist `Zack Knight, Jasmin Walia` we already have those two artists in the MB db.
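A minimal sketch of turning such a collaboration string into artist-credit seed fields while linking the artists MB already has. It assumes the LB artist string is comma-separated as in the example; `known_mbids` is a hypothetical name-to-MBID lookup (in practice it would come from the mapping data). Naive splitting will break for artist names that themselves contain ", ":

```python
def artist_credit_seed_fields(artist_string, known_mbids, separator=", "):
    """Turn an LB artist string like "A, B" into artist-credit seeding fields.

    known_mbids: hypothetical dict mapping artist names to MBIDs already in MB.
    """
    names = [n.strip() for n in artist_string.split(separator) if n.strip()]
    fields = {}
    for i, name in enumerate(names):
        prefix = f"artist_credit.names.{i}"
        fields[f"{prefix}.name"] = name
        # The last credited artist gets an empty join phrase.
        fields[f"{prefix}.join_phrase"] = separator if i < len(names) - 1 else ""
        mbid = known_mbids.get(name)
        if mbid:
            # Pointing at the existing artist saves the editor a search.
            fields[f"{prefix}.mbid"] = mbid
    return fields
```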
2020-08-13 22658, 2020
9:44 AM
pristine___
?
2020-08-13 22603, 2020
9:45 AM
ruaok
exactly that.
2020-08-13 22610, 2020
9:45 AM
pristine___
Oooooo. Right
2020-08-13 22611, 2020
9:45 AM
ruaok
in as much as it is possible.
2020-08-13 22639, 2020
9:45 AM
pristine___
I will have to think about the implementation. I get the basic idea though
2020-08-13 22646, 2020
9:45 AM
ruaok
likely we are only going to have 1 track's worth of info -- but even that makes adding that track easier, especially in light of releases.
2020-08-13 22600, 2020
9:46 AM
ruaok
also, in a lot of cases we will have a spotify id, right?
2020-08-13 22609, 2020
9:46 AM
pristine___
Hmm
2020-08-13 22623, 2020
9:46 AM
ruaok
so, then we could fetch the spotify metadata for the release for that track and use it to seed the release editor.
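A minimal sketch of the conversion step, assuming the album JSON has already been fetched from the Spotify Web API (the album object carries `name`, `artists`, `external_urls` and a `tracks.items` page whose entries have `name`, `duration_ms`, `disc_number` and `track_number`). The seeding field names are the same assumptions as for the release-editor seeding, and passing `length` in milliseconds is an assumption too. Only the first page of tracks is handled; fetching and authentication are left out:

```python
def spotify_album_to_seed_fields(album):
    """Map a Spotify album object onto MB release-editor seeding fields."""
    fields = {"name": album["name"]}
    artists = album.get("artists", [])
    for i, artist in enumerate(artists):
        fields[f"artist_credit.names.{i}.name"] = artist["name"]
        fields[f"artist_credit.names.{i}.join_phrase"] = ", " if i < len(artists) - 1 else ""
    # Spotify numbers discs and tracks from 1; the seeding fields are 0-based.
    for track in album["tracks"]["items"]:
        prefix = f"mediums.{track['disc_number'] - 1}.track.{track['track_number'] - 1}"
        fields[f"{prefix}.name"] = track["name"]
        fields[f"{prefix}.length"] = str(track["duration_ms"])  # assumed: milliseconds
    url = album.get("external_urls", {}).get("spotify")
    if url:
        fields["urls.0.url"] = url
    return fields
```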
2020-08-13 22625, 2020
9:46 AM
ruaok
woooooo!
2020-08-13 22611, 2020
9:47 AM
sumedh has quit
2020-08-13 22645, 2020
9:47 AM
pristine___
Yay. Also I am happy to know that the queries in candidate_sets are fine :)
2020-08-13 22611, 2020
9:48 AM
pristine___
Cool. So this sums up the discussion on mapping.
2020-08-13 22648, 2020
9:48 AM
pristine___
Lemme know whenever you review the artist-artist-relation code (the ticket)
2020-08-13 22627, 2020
9:49 AM
pristine___
I don't have my laptop today but I will be online.
2020-08-13 22624, 2020
9:51 AM
ruaok
2020-08-13 22650, 2020
9:51 AM
ruaok
once I finish this spotify perms thing, I'm on it.
2020-08-13 22651, 2020
9:52 AM
iliekcomputers
lgtm
2020-08-13 22602, 2020
9:53 AM
iliekcomputers
actually
2020-08-13 22608, 2020
9:53 AM
iliekcomputers
for precaution's sake
2020-08-13 22623, 2020
9:53 AM
iliekcomputers
could we first extract a list of users this would change?
2020-08-13 22638, 2020
9:53 AM
iliekcomputers
so that we know which ones to revert in case it all goes kaput
2020-08-13 22611, 2020
9:54 AM
iliekcomputers
select user_id from spotify_user where {same condition as in update query}
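The precaution can be sketched as: save the ids matched by the update's WHERE clause in the same transaction as the update, so a revert can target exactly those users. This is a minimal sketch using SQLite as a stand-in for PostgreSQL; the `spotify_user` columns, the `spotify_user_backup` table and the condition (permission includes `user-read-recently-played`, `record_listens` not yet set) are assumptions drawn from the discussion, not the actual ListenBrainz schema:

```python
import sqlite3

# Hypothetical schema: spotify_user(user_id, permission TEXT, record_listens INTEGER).
# The condition is an assumption; what matters is that the SELECT and UPDATE share it.
CONDITION = "record_listens = 0 AND permission LIKE '%user-read-recently-played%'"


def enable_with_backup(conn):
    """Save the ids of the rows the update will touch, then update them."""
    with conn:  # commits on success, rolls back if anything raises
        conn.execute("CREATE TABLE IF NOT EXISTS spotify_user_backup (user_id INTEGER PRIMARY KEY)")
        conn.execute(
            f"INSERT OR IGNORE INTO spotify_user_backup SELECT user_id FROM spotify_user WHERE {CONDITION}"
        )
        conn.execute(f"UPDATE spotify_user SET record_listens = 1 WHERE {CONDITION}")
    return [row[0] for row in conn.execute("SELECT user_id FROM spotify_user_backup ORDER BY user_id")]


def revert(conn):
    """Undo the update for exactly the users saved above."""
    with conn:
        conn.execute(
            "UPDATE spotify_user SET record_listens = 0 "
            "WHERE user_id IN (SELECT user_id FROM spotify_user_backup)"
        )
```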
2020-08-13 22623, 2020
9:56 AM
iliekcomputers
2020-08-13 22607, 2020
9:57 AM
iliekcomputers
2020-08-13 22628, 2020
10:02 AM
ruaok
ok, user list saved.
2020-08-13 22631, 2020
10:03 AM
v6lur joined the channel
2020-08-13 22639, 2020
10:03 AM
ruaok
2020-08-13 22615, 2020
10:04 AM
iliekcomputers
lgtm.
2020-08-13 22612, 2020
10:07 AM
ruaok
2020-08-13 22652, 2020
10:07 AM
iliekcomputers
cool, cool, cool.
2020-08-13 22601, 2020
10:08 AM
iliekcomputers
459 is a lot 🙈
2020-08-13 22643, 2020
10:08 AM
ruaok pops open grafana to see what that does to our queue.
2020-08-13 22649, 2020
10:09 AM
ruaok
probably won't be visible.
2020-08-13 22603, 2020
10:21 AM
ruaok
pristine___: the code to calculate artist credit similarities works fine, but the dump code is borked.
2020-08-13 22637, 2020
10:21 AM
pristine___
Where at? Any link?
2020-08-13 22619, 2020
10:22 AM
pristine___
Of the code
2020-08-13 22613, 2020
10:24 AM
ruaok
2020-08-13 22637, 2020
10:24 AM
ruaok
here I am cobbling artist credit names together, when a fully assembled one is already in the DB.
2020-08-13 22658, 2020
10:24 AM
ruaok
I just need to fetch it instead. but I think I need to find lunch first, can't concentrate.
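For context, a sketch of why the fetch is simpler: in the MusicBrainz schema each `artist_credit_name` row holds a `position`, `name` and `join_phrase`, and concatenating them in order yields the string already stored in `artist_credit.name`. The query constant and function name below are hypothetical illustrations, not the actual dump code:

```python
# Hypothetical query for the simpler route: select the pre-assembled credit.
FETCH_CREDIT_NAMES_SQL = """
    SELECT id, name
      FROM artist_credit
     WHERE id = ANY(%s)
"""


def assemble_artist_credit(rows):
    """Rebuild a credit string from artist_credit_name-style rows.

    rows: iterable of (position, name, join_phrase). The result matches what
    artist_credit.name already holds, so re-assembling it is redundant work.
    """
    ordered = sorted(rows, key=lambda row: row[0])
    return "".join(name + join_phrase for _, name, join_phrase in ordered)
```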
2020-08-13 22608, 2020
10:26 AM
pristine___
No hurry. I am happy that we figured out the problem. shivam-kapila will probably have a better similar artist playlist by the end of the day :)
2020-08-13 22641, 2020
10:28 AM
iliekcomputers
ruaok: how many listens would you estimate we're adding every day
2020-08-13 22603, 2020
10:29 AM
iliekcomputers
i think it's at least 800-900k
2020-08-13 22647, 2020
10:34 AM
ruaok
I honestly have no idea.
2020-08-13 22655, 2020
10:34 AM
ruaok
it will be much more now. :)
2020-08-13 22616, 2020
10:35 AM
ruaok
but before we fixed these accounts, I think it was far less than that.
2020-08-13 22629, 2020
10:35 AM
ruaok
we may fetch that many from spotify, but 95% are dups.
2020-08-13 22625, 2020
10:36 AM
ruaok
yeah, 350M. I mean timescale knocked a few M of those out, but it's nowhere approaching 1M per day. sadly.
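Putting ruaok's 95% duplicate estimate next to the 800-900k figure gives a rough sense of the gap (both numbers are the rough estimates from this exchange, not measurements):

```latex
\begin{aligned}
\text{new listens per day} &\approx \text{listens fetched per day} \times (1 - 0.95) \\
&\approx 900\,000 \times 0.05 = 45\,000 \quad (\text{or } 40\,000 \text{ for } 800\,000)
\end{aligned}
```

That is consistent with the point that the daily growth is nowhere near 1M new listens.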
2020-08-13 22635, 2020
10:36 AM
ruaok
but, I'd love for you to be right.
2020-08-13 22606, 2020
10:37 AM
iliekcomputers
:D
2020-08-13 22620, 2020
10:37 AM
iliekcomputers
i think we've been having a few good days at least.
2020-08-13 22621, 2020
10:38 AM
ishaanshah
iliekcomputers: can we store the number of listens received that day in the DB at the end of the day?
2020-08-13 22633, 2020
10:38 AM
ishaanshah
maybe we can make a graph for that?
2020-08-13 22622, 2020
10:39 AM
iliekcomputers
yeah, we could.
2020-08-13 22637, 2020
10:39 AM
iliekcomputers
i won't do it in the PR i have, but that's a good idea.
2020-08-13 22658, 2020
10:39 AM
iliekcomputers
i'll follow up with a cron job that takes it from redis and stores it in pg
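A minimal sketch of that cron job, using SQLite as a stand-in for PostgreSQL. The cache key layout isn't given here, so the Redis read is hidden behind a hypothetical `get_count` callable; the `daily_listen_count` table is also hypothetical:

```python
import datetime
import sqlite3


def store_daily_listen_count(conn, get_count, day):
    """Persist one day's listen count from the cache into a table for graphing.

    get_count: hypothetical callable returning the cached count for `day`
    (e.g. a thin wrapper around a Redis GET), or None if nothing was cached.
    """
    count = int(get_count(day) or 0)
    with conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS daily_listen_count "
            "(day TEXT PRIMARY KEY, listen_count INTEGER NOT NULL)"
        )
        # Re-running the job for the same day overwrites instead of duplicating.
        # (In PostgreSQL this would be INSERT ... ON CONFLICT (day) DO UPDATE.)
        conn.execute(
            "INSERT OR REPLACE INTO daily_listen_count (day, listen_count) VALUES (?, ?)",
            (day.isoformat(), count),
        )
    return count
```

Run shortly after midnight with `day = datetime.date.today() - datetime.timedelta(days=1)`; the sitewide-stats graph then only needs to read this table.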
2020-08-13 22646, 2020
10:40 AM
ishaanshah
sounds good :D
2020-08-13 22618, 2020
10:41 AM
ishaanshah
I'll look into the graph part later then
2020-08-13 22644, 2020
10:41 AM
ishaanshah
would be a good addition to sitewide stats
2020-08-13 22630, 2020
10:43 AM
iliekcomputers
++
2020-08-13 22609, 2020
10:45 AM
SothoTalKer_ has quit
2020-08-13 22639, 2020
10:46 AM
SothoTalKer joined the channel
2020-08-13 22634, 2020
10:47 AM
pristine___
> maybe we can make a graph for that?
2020-08-13 22640, 2020
10:47 AM
pristine___
<3
2020-08-13 22658, 2020
11:25 AM
v6lur has quit
2020-08-13 22618, 2020
11:44 AM
BrainzGit
2020-08-13 22637, 2020
11:48 AM
travis-ci joined the channel
2020-08-13 22637, 2020
11:48 AM
travis-ci
2020-08-13 22637, 2020
11:48 AM
travis-ci has left the channel
2020-08-13 22647, 2020
11:50 AM
ishaanshah
pristine___: ping
2020-08-13 22647, 2020
11:50 AM
iliekcomputers
2020-08-13 22637, 2020
11:52 AM
travis-ci joined the channel
2020-08-13 22637, 2020
11:52 AM
travis-ci
2020-08-13 22637, 2020
11:52 AM
travis-ci has left the channel
2020-08-13 22626, 2020
11:59 AM
BrainzGit
2020-08-13 22611, 2020
12:01 PM
CatQuest
2020-08-13 22629, 2020
12:03 PM
travis-ci joined the channel
2020-08-13 22629, 2020
12:03 PM
travis-ci
2020-08-13 22629, 2020
12:03 PM
travis-ci has left the channel
2020-08-13 22605, 2020
12:13 PM
travis-ci joined the channel
2020-08-13 22605, 2020
12:13 PM
travis-ci
2020-08-13 22605, 2020
12:13 PM
travis-ci has left the channel
2020-08-13 22630, 2020
12:20 PM
ruaok
2020-08-13 22653, 2020
12:20 PM
ruaok
do you know what is going on with those two SVG files? I didn't modify them.
2020-08-13 22611, 2020
12:21 PM
ruaok
reset --hard doesn't get rid of them, and neither do rm and checkout.
2020-08-13 22613, 2020
12:22 PM
pristine___
ishaanshah: yeah
2020-08-13 22624, 2020
12:22 PM
iliekcomputers
not sure.
2020-08-13 22627, 2020
12:22 PM
ishaanshah
Hi
2020-08-13 22631, 2020
12:22 PM
iliekcomputers
first time i'm seeing it.
2020-08-13 22642, 2020
12:22 PM
yvanzo has quit
2020-08-13 22603, 2020
12:23 PM
ishaanshah
I was using MSID MBID mapping to improve the results for stats
2020-08-13 22637, 2020
12:23 PM
ishaanshah
I ran into an out-of-memory error while using it
2020-08-13 22617, 2020
12:24 PM
ishaanshah
I just wanted to ask: would it cause an issue on prod?
2020-08-13 22631, 2020
12:24 PM
ishaanshah
I mean is there any optimisation that can be made
2020-08-13 22632, 2020
12:24 PM
ruaok
pristine___: bug fixed, working on a new dump now. with up-to-date data even
2020-08-13 22640, 2020
12:24 PM
yvanzo joined the channel
2020-08-13 22603, 2020
12:25 PM
ishaanshah
I'll link the query, just a sec
2020-08-13 22606, 2020
12:26 PM
pristine___
ruaok: yay. Will you upload it on FTP?
2020-08-13 22619, 2020
12:26 PM
ishaanshah
2020-08-13 22629, 2020
12:26 PM
ruaok
yes, pristine___
2020-08-13 22633, 2020
12:26 PM
ishaanshah
line 75
2020-08-13 22640, 2020
12:27 PM
pristine___
A sec
2020-08-13 22631, 2020
12:29 PM
pristine___
Why do you want to left join?
2020-08-13 22622, 2020
12:30 PM
ishaanshah
So that we don't skip artists which haven't been mapped
2020-08-13 22632, 2020
12:30 PM
pristine___
Not related to optimization, was just curious.
2020-08-13 22634, 2020
12:30 PM
pristine___
Ah
2020-08-13 22639, 2020
12:30 PM
BrainzGit
2020-08-13 22644, 2020
12:30 PM
ishaanshah
inner would skip those, right?
2020-08-13 22632, 2020
12:31 PM
pristine___
OOM is generally when two huge tables are joined
2020-08-13 22648, 2020
12:31 PM
pristine___
> inner would skip those right
2020-08-13 22608, 2020
12:32 PM
pristine___
Yes. Recommendations use inner since we strictly need MBIDs
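The difference between the two join types on the MSID to MBID mapping can be shown with a plain-Python sketch (not the Spark query itself; the `artist_msid`/`artist_mbid` field names are assumptions):

```python
def join_mapping(rows, msid_to_mbid, how="left"):
    """Attach artist MBIDs to rows via an MSID -> MBID mapping.

    how="left":  unmapped rows are kept with artist_mbid=None (stats: don't
                 lose artists that have not been mapped yet).
    how="inner": unmapped rows are dropped (recommendations: MBIDs are required).
    """
    if how not in ("left", "inner"):
        raise ValueError(f"unsupported join type: {how!r}")
    joined = []
    for row in rows:
        mbid = msid_to_mbid.get(row["artist_msid"])
        if mbid is None and how == "inner":
            continue
        joined.append({**row, "artist_mbid": mbid})
    return joined
```

A left join returns at least as many rows as an inner join on the same inputs, so switching stats to a left join can only increase the output size, which is worth keeping in mind when chasing the OOM.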
2020-08-13 22628, 2020
12:32 PM
pristine___
Broadcast join is one of the options.
2020-08-13 22627, 2020
12:34 PM
ishaanshah
broadcast joins generally work with one small and one large table right?
2020-08-13 22627, 2020
12:35 PM
travis-ci joined the channel
2020-08-13 22627, 2020
12:35 PM
travis-ci
2020-08-13 22627, 2020
12:35 PM
travis-ci has left the channel
2020-08-13 22657, 2020
12:37 PM
pristine___
I am not sure. But the basic idea is that each executor should have a copy of the table to minimize shuffling and stuff. I had OOMs back then; I played a lot with driver memory, executor memory and other configs and came down to the configs we use now. They might need to be changed as the data size grows.
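The usual rule of thumb is that a broadcast join pays off when one side is small enough to copy to every executor. The mechanics can be simulated in plain Python (this is not Spark code): the small table is hashed once, a copy goes to each executor, and each partition of the large table is joined locally, so the large side is never shuffled. In PySpark the equivalent hint is `pyspark.sql.functions.broadcast`, e.g. `listens.join(broadcast(mapping), "msid", "left")`, and Spark also broadcasts automatically below `spark.sql.autoBroadcastJoinThreshold` (10 MB by default). For a left join only the right-hand side can be broadcast, and if that table does not fit in executor memory, broadcasting just moves the OOM. The function name and row layout here are illustrative:

```python
def broadcast_hash_join(large_partitions, small_rows, key, how="inner"):
    """Simulate a broadcast hash join (plain Python, not Spark).

    The small table is hashed once and a full copy is (conceptually) shipped
    to every executor. Each partition of the large table is then joined
    locally against that copy, so the large table is never shuffled by key.
    """
    if how not in ("inner", "left"):
        raise ValueError(f"unsupported join type: {how!r}")
    hashed = {}
    for row in small_rows:
        hashed.setdefault(row[key], []).append(row)
    # Columns contributed by the small side, filled with None for unmatched left rows.
    small_columns = {col for row in small_rows for col in row if col != key}
    results = []
    for partition in large_partitions:  # in Spark, each partition is a separate task
        out = []
        for row in partition:
            matches = hashed.get(row[key], [])
            for match in matches:
                out.append({**row, **match})
            if not matches and how == "left":
                out.append({**row, **{col: None for col in small_columns}})
        results.append(out)
    return results
```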
2020-08-13 22616, 2020
12:38 PM
BrainzGit
2020-08-13 22611, 2020
12:39 PM
pristine___
I would like to have a look at the error if possible
2020-08-13 22637, 2020
12:39 PM
ishaanshah
oops I closed the terminal window
2020-08-13 22642, 2020
12:39 PM
ishaanshah
I will reproduce it
2020-08-13 22647, 2020
12:39 PM
ishaanshah
give me 2 mins
2020-08-13 22630, 2020
12:40 PM
travis-ci joined the channel
2020-08-13 22630, 2020
12:40 PM
travis-ci
2020-08-13 22630, 2020
12:40 PM
travis-ci has left the channel