0:18 AM
Jigwally joined the channel
0:21 AM
Dugongue has quit
0:48 AM
Newgongue joined the channel
0:52 AM
Jigwally has quit
1:47 AM
Dugongue joined the channel
1:49 AM
Newgongue has quit
2:14 AM
thomasross has quit
2:38 AM
amCap1712 joined the channel
3:40 AM
Leo_Verto_ joined the channel
3:42 AM
Leo_Verto has quit
3:42 AM
Leo_Verto_ is now known as Leo_Verto
6:55 AM
outsidecontext joined the channel
7:23 AM
michelv joined the channel
7:30 AM
outsidecontext has quit
7:31 AM
outsidecontext joined the channel
7:57 AM
yvanzo
mo''in'
8:17 AM
Sophist-UK has quit
8:18 AM
Sophist-UK joined the channel
8:20 AM
Sophist_UK joined the channel
8:22 AM
Sophist-UK has quit
8:27 AM
amCap1712 has quit
8:55 AM
reosarevok
bitmap: still around?
8:57 AM
iliekcomputers
Moin!
9:05 AM
Gore|work joined the channel
9:23 AM
michelv has quit
9:35 AM
Sophist-UK joined the channel
9:38 AM
ruaok
Moooin!
9:38 AM
ruaok is sitting on a train back into town.
9:39 AM
Sophist_UK has quit
9:42 AM
reosarevok
choo choo
9:58 AM
yvanzo
sl
10:05 AM
Gazooo has quit
10:07 AM
Gazooo joined the channel
10:23 AM
Gazooo has quit
10:24 AM
Gazooo joined the channel
10:33 AM
madmouser1 has quit
10:46 AM
iliekcomputers
Isn't working remotely great?
10:46 AM
:) :)
10:51 AM
amCap1712 joined the channel
11:18 AM
CatQuest
t-t-tr-trains!
11:28 AM
zas
the Discourse backup issue is still happening: it works in manual mode but not in automatic mode, which fails partway and leaves a lot of files behind (hence the disk space issue we had Friday), so backups cannot complete
11:29 AM
currently rebuilding web container again
12:19 PM
madmouser1 joined the channel
12:29 PM
iliekcomputers
12:29 PM
ruaok
great timing. I've just pulled up that branch and was about to ping you.
12:30 PM
iliekcomputers
takes around 1.5s on average to calculate monthly stats for a user
12:30 PM
:D
12:30 PM
ruaok
that long?
12:31 PM
iliekcomputers
i think a lot of it is the prints that are there
12:31 PM
I was doing a json dumps to see what the stats are.
12:31 PM
ruaok
I wonder if that changes once the cluster is warmed up.
12:31 PM
iliekcomputers
queries taking around 0.01s etc.
12:31 PM
ruaok
ahhh, ok.
12:31 PM
I wonder what the sum total time for all users will be.
12:36 PM
pristine_ joined the channel
12:36 PM
hi pristine_!
12:37 PM
iliekcomputers: idle thought... I wonder what would happen if we ran the top artist queries for all users first, then the top release queries for all users.
12:37 PM
pristine_ has left the channel
12:38 PM
I think making different queries for the same user probably leads to loads of cache invalidation.
12:38 PM
pristine__ joined the channel
12:38 PM
not important now, just musing.
12:38 PM
iliekcomputers
ruaok: hmm, good point.
12:38 PM
it is currently doing artist, release, recording, artist, release, recording...
12:38 PM
ruaok
yep.
12:39 PM
iliekcomputers
ruaok: the low times were a bug, the prints were before the collect() calls.
12:39 PM
ruaok
oh boo!
12:39 PM
iliekcomputers
taking around 1.5s for each user still.
12:39 PM
the collect calls are the bottleneck as far as i can see.
12:39 PM
ruaok
just for a quick test...
12:40 PM
pristine__ has quit
12:40 PM
can you disable fetching anything other than just the artist query?
12:40 PM
pristine_ joined the channel
12:40 PM
iliekcomputers
right, just noting rn that artist stats are taking around 0.7s
12:40 PM
ruaok
and see if that is any faster than doing them all for one user in one go?
12:41 PM
perfect -- that was going to be my next question.
12:41 PM
also weird that artist takes half the time.
12:42 PM
odd. collect() is the bottleneck. not query. I wonder why that is.
12:42 PM
iliekcomputers
ruaok: collect brings the data from all the nodes into the master.
12:42 PM
so it probably involves network calls.
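The timing bug mentioned above comes down to lazy evaluation: in Spark, building a query does no work, and an action such as collect() triggers the computation and returns the rows to the driver. A minimal sketch of the pitfall in plain Python, assuming a hypothetical `LazyQuery` class as a stand-in for a Spark query (the class, the row values and the 0.05s delay are all made up for illustration):

```python
import time


class LazyQuery:
    """Hypothetical stand-in for a lazily evaluated Spark query."""

    def __init__(self, work_seconds):
        self.work_seconds = work_seconds

    def collect(self):
        # The expensive part only happens here, as with a Spark action.
        time.sleep(self.work_seconds)
        return [("artist_a", 12), ("artist_b", 7)]


def timed_wrong(query):
    """Stops the clock before collect(), so only query setup is measured."""
    start = time.monotonic()
    elapsed = time.monotonic() - start  # bug: measured before the work runs
    rows = query.collect()
    return rows, elapsed


def timed_right(query):
    """Stops the clock after collect(), so the real work is included."""
    start = time.monotonic()
    rows = query.collect()
    elapsed = time.monotonic() - start
    return rows, elapsed


_, wrong = timed_wrong(LazyQuery(0.05))
_, right = timed_right(LazyQuery(0.05))
print(f"before collect(): {wrong:.3f}s, after collect(): {right:.3f}s")
```

The first figure looks fast only because nothing has executed yet, which would match the misleadingly low query times reported earlier.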
12:45 PM
yeah, nice. average seems to be around 0.4s now
12:45 PM
ruaok
I wonder if batches of users are the way to go.
12:46 PM
pick 100 users or so, do artist for each user, then release, ... collect everything. collate the results and stuff them into the pipeline.
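The batching idea could be sketched roughly as follows. This is only an illustration of the ordering (one entity type for every user in the batch before moving to the next type), not the LB-playground code: `run_query`, `stats_for_batch`, `process` and `publish` are hypothetical names, and the real per-user query would be a Spark job rather than a string.

```python
from itertools import islice

ENTITIES = ("artist", "release", "recording")
BATCH_SIZE = 100  # "100 users or so" from the discussion


def batches(items, size):
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def run_query(entity, user):
    """Hypothetical placeholder for the per-user top-entity query."""
    return f"top {entity}s for {user}"


def stats_for_batch(users):
    """Run one entity type for every user before moving on to the next."""
    results = {user: {} for user in users}
    for entity in ENTITIES:
        for user in users:
            results[user][entity] = run_query(entity, user)
    return results


def process(users, publish):
    """Collate each batch of results and hand it to `publish`."""
    for batch in batches(users, BATCH_SIZE):
        publish(stats_for_batch(batch))
```

For example, `process(all_users, sent.append)` with 250 users would call `publish` three times, with 100, 100 and 50 users per batch.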
12:46 PM
pristine_
ruaok: heya!!
12:47 PM
ruaok
sorry for not having paid much attention to your PRs. you're on top of the todo list for today.
12:47 PM
pristine_
iliekcomputers: ping pong!!
12:47 PM
ruaok: Not a problem.
12:47 PM
iliekcomputers
pristine_: :) :)
12:47 PM
pristine_
oh, that's great. Thank you:)
12:48 PM
Slurpee joined the channel
12:48 PM
Slurpee has quit
12:48 PM
Slurpee joined the channel
12:49 PM
so how far did we go?
12:49 PM
do we need to look for an alternative for .collect()?
12:49 PM
ruaok
iliekcomputers is currently running the LB-playground branch to see how fast it goes.
12:49 PM
pristine_
oh, okay
12:49 PM
ruaok
not so much an alternative, but we may need to use it more wisely.
12:50 PM
pristine_
I am here and would only leave after the meeting.
12:50 PM
would wait :)
12:50 PM
ruaok
pristine_: on PR #16, iliekcomputers left a note about taking a value from the config. that is something you can jump on.
12:50 PM
iliekcomputers
yes, I changed the code to do 100 entities on each query. :)
12:51 PM
also, there's a bug in the time calculation where the time taken prints before the collect() call.
12:52 PM
ruaok
which config file are you referring to in that comment, iliekcomputers? I don't see one for LB-playground.
12:52 PM
time to add one? or simply add one at the top of the module?
12:52 PM
pristine_
ruaok: sure:)
12:52 PM
ruaok
(I'm ok with the latter, tbh)
12:52 PM
iliekcomputers
ruaok: I created a config.py.sample inside the `listenbrainz_spark` module
12:52 PM
ruaok nods
12:53 PM
took 8m35s to calculate stats for all users for jan 2019.
12:53 PM
artist stats only.
12:53 PM
iliekcomputers wishes he had printed the number of users...
12:53 PM
ruaok
I just wonder if adding these config values is necessary on a config.py setup -- I doubt it.
12:53 PM
8:35? wow.
12:53 PM
pristine_
iliekcomputers: oh yes, got the bug :(
12:54 PM
ruaok
add the count of users and run again.
12:54 PM
cache should be warm and yield a faster result.
12:55 PM
iliekcomputers
1091 users
12:55 PM
ruaok
.5s per user.
13:03 PM
iliekcomputers
8m22.647s
13:03 PM
ruaok
ok, not very interesting.
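The per-user figures follow from simple division: both runs covered 1091 users, so they come to roughly 0.47s and 0.46s per user, which is why the warm cache is "not very interesting". A small sketch of the arithmetic (the function names are illustrative only):

```python
USERS = 1091  # users with listens in Jan 2019, as reported above


def to_seconds(minutes, seconds):
    """Convert a minutes/seconds wall-clock reading into seconds."""
    return minutes * 60 + seconds


def per_user(total_seconds, users=USERS):
    """Average time spent per user."""
    return total_seconds / users


first_run = to_seconds(8, 35)       # 8m35s, cold
second_run = to_seconds(8, 22.647)  # 8m22.647s, cache warm

print(f"first run:  {per_user(first_run):.3f}s per user")
print(f"second run: {per_user(second_run):.3f}s per user")
print(f"speed-up:   {(first_run - second_run) / first_run:.1%}")
```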
13:03 PM
iliekcomputers
13:04 PM
gonna have to do it in batches.
13:06 PM
ruaok
just checking, was the 8m figure for all artists for all 1091 users?
13:06 PM
iliekcomputers
calculated top 100 artists for the 1091 users that had listened to something in Jan 2019.
13:11 PM
>I just wonder if adding these config values is necessary on a config.py setup -- I doubt it.
13:11 PM
where else could we keep them?
13:11 PM
rabbitmq host, sentry etc
13:11 PM
ruaok
in the module itself.
13:11 PM
service config values, yes.
13:11 PM
but how many items we fetch? we change that less often, if ever at all.
13:12 PM
iliekcomputers
yeah, that def can be inside the module.
13:12 PM
ruaok
no need to clutter the config files.... and they are cluttered with stuff that hardly gets changed.
13:12 PM
given that we're starting a new module, maybe we should set new guidelines now for what goes in and what doesn't.
13:12 PM
pristine_
Should I add the limit in config?
13:13 PM
ruaok
pristine_: no, add it at the top of the module.
13:13 PM
given that we change it very little...
13:13 PM
pristine_
Okay:)
13:13 PM
iliekcomputers
>given that we're starting a new module, maybe we should set new guidelines now for what goes in and what doesn't.
13:13 PM
pristine_
limit = 100?
13:13 PM
iliekcomputers
yes please :)
13:13 PM
pristine_: yep, for now.
13:14 PM
pristine_
sounds good
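The convention agreed on here, service settings (RabbitMQ host, Sentry, etc.) in config.py and rarely changed tuning values at the top of the module, might look something like the sketch below. The function `top_entities` and its signature are hypothetical; only the value 100 comes from the discussion.

```python
# Top of a hypothetical stats module.

# Tuning value that rarely changes: kept here rather than in config.py.
LIMIT = 100  # the "limit = 100" agreed on in the chat


def top_entities(counts, limit=LIMIT):
    """Return the `limit` most listened-to entities as (name, count) pairs."""
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:limit]
```

Callers that need a different cut-off can still pass `limit=` explicitly, so nothing has to be added to the config file for it.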
13:27 PM
iliekcomputers
13:27 PM
ruaok
let me look in a bit.
13:28 PM
I'm still a bit confused by PR #16 for LB-playground
13:30 PM
pristine_
confused?
13:30 PM
like?
13:30 PM
ruaok
I'm leaving comments on the PR -- I will publish a review in a second.