I used to have only one measurement called "listen" which turned out to be a bad idea.
2017-04-19 10944, 2017
ruaok
each user has their own measurement now.
2017-04-19 10922, 2017
zas
why was it a bad idea?
2017-04-19 10943, 2017
ruaok
too many points in one measurement and the load on lemmy shot to 100.
2017-04-19 10958, 2017
ruaok
spread it across many more measurements and influx is much happier.
2017-04-19 10905, 2017
zas
yup, makes sense
2017-04-19 10937, 2017
ruaok
that was hard to see when first using influx. like other non-relational data stores, it takes a while to change your thinking.
2017-04-19 10959, 2017
zas
well, in theory, using user_name as a tag is ok; using separate measurements for each user is obviously more complex when it comes to summing up listens
2017-04-19 10937, 2017
ruaok
yeah, but summing up listens is old skool thinking. adding a new measurement to keep track of them is key.
2017-04-19 10951, 2017
ruaok
write more, don't update/delete
2017-04-19 10956, 2017
zas
yes, much faster approach
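A minimal sketch of the two layouts discussed above, written as InfluxDB line-protocol strings. The measurement, tag, and field names (`listen`, `listen_count`, `user_name`, `track_name`, `count`) are assumptions for illustration, not the actual ListenBrainz schema, and the helpers assume names without spaces, commas, or quotes, so no escaping is done.

```python
def listen_point_shared(user_name, track_name, ts_ns):
    """One shared "listen" measurement, with the user stored as a tag."""
    return f'listen,user_name={user_name} track_name="{track_name}" {ts_ns}'


def listen_point_per_user(user_name, track_name, ts_ns):
    """One measurement per user: the user name becomes the measurement name."""
    return f'{user_name} track_name="{track_name}" {ts_ns}'


def listen_count_point(user_name, count, ts_ns):
    """Append a new count point to a separate (hypothetical) measurement
    instead of summing listens or updating an existing row."""
    return f'listen_count,user_name={user_name} count={count}i {ts_ns}'
```

The third helper reflects the "write more, don't update/delete" idea: each new total is written as a fresh point rather than recomputed from the listens.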
2017-04-19 10927, 2017
bochecha has quit
2017-04-19 10924, 2017
zag has quit
2017-04-19 10918, 2017
d4rkie has quit
2017-04-19 10945, 2017
D4RK-PH0ENiX joined the channel
2017-04-19 10915, 2017
samj1912 joined the channel
2017-04-19 10944, 2017
ruaok wonders if logging into jira is something he could track as a fitness activity in Google Fit.
2017-04-19 10912, 2017
agentsim joined the channel
2017-04-19 10949, 2017
agentsim has quit
2017-04-19 10927, 2017
SothoTalKer joined the channel
2017-04-19 10902, 2017
Freso
Did anyone ever hear the complaint that "MB wants too much personal data during registration" before?
2017-04-19 10913, 2017
Quesito hopes ruaok knows how unreliable the data is from wearables....
2017-04-19 10920, 2017
SothoTalker_ has quit
2017-04-19 10954, 2017
Quesito
Freso: that has to be a cop-out for "I'm lazy"....
2017-04-19 10955, 2017
ruaok
Quesito: yep, it is a suggestion at best.
2017-04-19 10911, 2017
Quesito
;)
2017-04-19 10925, 2017
SothoTalker_ joined the channel
2017-04-19 10942, 2017
ruaok
yeah, pure BS, Freso.
2017-04-19 10917, 2017
SothoTalKer has quit
2017-04-19 10947, 2017
alastairp
Freso: kind of interesting that those releases are in mb at all!
2017-04-19 10938, 2017
Freso
alastairp: :)
2017-04-19 10947, 2017
reosarevok
Freso: I guess some people might somehow think that all the stuff in the profile is mandatory?
2017-04-19 10957, 2017
reosarevok
I know a bunch of artists are confused about all the fields in the Add Artist page for example because they assume every single one is mandatory and they don't know what IPI is or something
2017-04-19 10921, 2017
Freso
Yeah.
2017-04-19 10927, 2017
reosarevok
(to be fair, "bolded label means mandatory" is neither obvious nor explained)
2017-04-19 10948, 2017
Freso
Will IPIs and ISNIs be moved to attributes with the schema change? Or will that have to be done later down the line?
2017-04-19 10932, 2017
ruaok
crap. the metabrainz press corps realized that chocolate should be present in the office at all times.
for that piece of code, we're talking about a batch which is limited to the size of a block of listens.
2017-04-19 10926, 2017
ruaok
sub 100.
2017-04-19 10933, 2017
alastairp
Hmm
2017-04-19 10937, 2017
ruaok
ie. not worth talking about for the most part.
2017-04-19 10941, 2017
alastairp
Yeah
2017-04-19 10910, 2017
alastairp
Sounds like premature optimisation... My recommendation would be to optimise for readability not speed
2017-04-19 10923, 2017
ruaok
the bigger problem we have is that we do too much serializing/deserializing the incoming listens. and passing over them too many times to sanity check them.
2017-04-19 10940, 2017
alastairp
Especially considering the two timestamp conversions that you're doing in the same loop
2017-04-19 10945, 2017
ruaok
well, I've done a pile of stress tests to see what is going to be a problem.
2017-04-19 10901, 2017
alastairp
Yeah, we wondered about that when we first did it
2017-04-19 10908, 2017
alastairp
Even with ujson?
2017-04-19 10915, 2017
ruaok
the influx schema was a problem, but is much better now.
2017-04-19 10935, 2017
ruaok
yes, even with ujson. we're just doing too much stuff that needs streamlining.
2017-04-19 10900, 2017
ruaok
to the point where, when importing listens, the sanity checking of the incoming data is the major bottleneck now.
2017-04-19 10944, 2017
ruaok
which is why I stuffed tons of fake listens directly into rabbitmq to find problems.
2017-04-19 10902, 2017
alastairp
Right
2017-04-19 10918, 2017
ruaok
I'm now happy that we'll avoid the obvious walls and i have other, less pressing issues on my radar.
2017-04-19 10943, 2017
ruaok
but my focus still remains data integrity, which I'm almost happy with.
2017-04-19 10956, 2017
alastairp
This is things like 'validate_listens'?
2017-04-19 10908, 2017
alastairp
Or is it bigger than that?
2017-04-19 10914, 2017
ruaok
I need to run a test and make sure that all data that enters correctly ends up at BQ.
2017-04-19 10921, 2017
ruaok
yeah, mostly all that.
2017-04-19 10929, 2017
alastairp
I'll see if I can put it through a profiler to see where the bad parts are
2017-04-19 10938, 2017
alastairp
That's not very difficult
2017-04-19 10942, 2017
ruaok
it is too early for a profiler.
2017-04-19 10948, 2017
ruaok
there is just stupid shit going on. :)
2017-04-19 10913, 2017
ruaok
we can find problems by inspection, much like shooting fish in a barrel.
2017-04-19 10926, 2017
ruaok
but, this is easy to fix as opposed to getting the schema wrong.
2017-04-19 10928, 2017
alastairp
Sure. So if you can point out a stupid thing, we can fix it. But the profiler will tell us the same thing without having to wait for you
2017-04-19 10952, 2017
alastairp
We can inspect for loops and things
2017-04-19 10909, 2017
ruaok
sure, if you want to do more premature optimization, go for it. :) :)
2017-04-19 10945, 2017
ruaok
I think in passing the listens around we do more than one conversion of the data and that is dumb.
2017-04-19 10956, 2017
Gentlecat
profiler actually helps you figure out if something is premature or not
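A rough sketch of what putting a batch through a profiler could look like, using the standard-library `cProfile` and `pstats` modules. `validate_listen` and `handle_batch` here are hypothetical stand-ins, not the real ListenBrainz functions; `handle_batch` deliberately parses each payload twice so the repeated conversion shows up in the report.

```python
import cProfile
import io
import json
import pstats


def validate_listen(listen):
    # Hypothetical sanity check, standing in for the real validation code.
    return isinstance(listen.get("listened_at"), int) and "track_name" in listen


def handle_batch(raw_batch):
    # Parses every payload once to validate it and a second time to use it.
    valid = [raw for raw in raw_batch if validate_listen(json.loads(raw))]
    return [json.loads(raw) for raw in valid]


def profile_batch(raw_batch):
    """Run handle_batch under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = handle_batch(raw_batch)
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    # Sort by cumulative time and print only the ten most expensive entries.
    stats.sort_stats("cumulative").print_stats(10)
    return result, out.getvalue()
```

Calling `profile_batch` on a list of JSON strings returns the parsed listens plus a text report; the call count for `json.loads` in that report should be about twice the batch size, which is the kind of evidence being argued about here.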
2017-04-19 10922, 2017
ruaok
do premature optimization with a profiler to determine if optimization is premature?
2017-04-19 10925, 2017
ruaok
I like it, very meta. :)
2017-04-19 10933, 2017
ruaok
and recursive.
2017-04-19 10908, 2017
ruaok
in any case, I've got much more significant issues to tackle for the time being, so I am going to focus on those
2017-04-19 10919, 2017
alastairp
Just had a look at validate. There are not many loops :(
2017-04-19 10920, 2017
ruaok
I know what is slow and that is sufficient for now.
2017-04-19 10940, 2017
ruaok
I think the slowness comes one level up.
2017-04-19 10941, 2017
alastairp
Uuid validation. Perhaps that takes some time
2017-04-19 10945, 2017
alastairp
Ah, ok
2017-04-19 10951, 2017
ruaok
parse for validate, then store.
2017-04-19 10957, 2017
D4RK-PH0ENiX joined the channel
2017-04-19 10959, 2017
ruaok
parse/convert again for dedup
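One way the flow described here could be streamlined, as a sketch only: parse each incoming payload exactly once and reuse the parsed object for validation and dedup. The field names (`user_name`, `listened_at`, `track_name`) and the helper functions are assumptions for illustration, not the project's actual code.

```python
import json


def is_valid(listen):
    # Hypothetical sanity check on an already-parsed listen.
    return isinstance(listen.get("listened_at"), int) and bool(listen.get("track_name"))


def dedup_key(listen):
    # Hypothetical identity of a listen: same user, time, and track.
    return (listen.get("user_name"), listen["listened_at"], listen["track_name"])


def process_batch(raw_batch):
    """Parse each payload once, then validate and dedup the parsed object."""
    seen = set()
    unique = []
    for raw in raw_batch:
        listen = json.loads(raw)  # the only parse for this payload
        if not is_valid(listen):
            continue
        key = dedup_key(listen)
        if key in seen:
            continue
        seen.add(key)
        unique.append(listen)  # ready to store without converting again
    return unique
```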
2017-04-19 10916, 2017
Gentlecat
if you don't know for sure what's slow then it's just guesswork
2017-04-19 10917, 2017
alastairp
Right, now I see where you're coming from
2017-04-19 10937, 2017
alastairp
That's definitely the right place to start, then
2017-04-19 10937, 2017
Gentlecat
and there are tools to help with that, so we should use them
2017-04-19 10938, 2017
ruaok
we need to review the code flow, not individual statements.
if I understand what you mean by "code flow" correctly
2017-04-19 10932, 2017
D4RK-PH0ENiX has quit
2017-04-19 10939, 2017
D4RK-PH0ENiX joined the channel
2017-04-19 10938, 2017
zas
ruaok: making a loop to add to a set() isn't the best idea when you can just pass the iterable as the set() argument: s = set(); for x in r: s.add(x) vs s = set(r)
2017-04-19 10957, 2017
ruaok
zas: sure, but for this test it doesn't matter.
2017-04-19 10957, 2017
zas
but i don't know if it is actually faster, or by how much
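A small sketch for settling that question with the standard-library `timeit` module. The function names and the default batch size of 100 (roughly the "sub 100" batch mentioned earlier) are assumptions for illustration; actual timings depend on the machine and Python version, so none are quoted here.

```python
import timeit


def build_with_loop(items):
    # Explicit loop: one Python-level method call per element.
    s = set()
    for x in items:
        s.add(x)
    return s


def build_with_constructor(items):
    # Pass the iterable straight to set(); the iteration happens inside set().
    return set(items)


def compare(n=100, repeats=10000):
    """Return (loop_seconds, constructor_seconds) for building a set of n items."""
    items = list(range(n))
    loop_t = timeit.timeit(lambda: build_with_loop(items), number=repeats)
    ctor_t = timeit.timeit(lambda: build_with_constructor(items), number=repeats)
    return loop_t, ctor_t
```

Both functions produce the same set, so on a batch this small the choice is mostly about readability, whatever the measured difference turns out to be.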