_lucifer: is error_message a string, or a dictionary?
_lucifer
string
alastairp
so in this case you're only using json.dumps to add quotes to it?
_lucifer
yes, i was. but the format approach worked as well, so i'll go ahead with it.
alastairp
I've done this before where I use json.JSONEncoder().encode(val)
because, consider:
s = 'foo"bar'; json.dumps(s)
'"foo\\"bar"'
_lucifer
ah! makes sense.
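The point above can be checked with a quick stdlib-only sketch: for a plain string, `json.dumps` and `json.JSONEncoder().encode` give identical, properly escaped output (`dumps` is essentially a convenience wrapper around the encoder), which is why either works here:

```python
import json

s = 'foo"bar'

# dumps escapes the embedded quote, yielding valid JSON
print(json.dumps(s))                 # "foo\"bar"

# the encoder gives the same result for this input
print(json.JSONEncoder().encode(s))  # "foo\"bar"

assert json.dumps(s) == json.JSONEncoder().encode(s) == '"foo\\"bar"'
```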
ruaok
alastairp: got a sec for a sanity check?
alastairp
give me 5, just fighting with haproxy
ruaok
perfect. it will take me 5 minutes to explain. ;D
_lucifer
is there any difference between using dumps or the encoder?
ruaok
the listen counts are very expensive to calculate, and when we first push this branch into production the site will appear effectively broken because nothing is loading.
to mitigate that, I plan to write a script that:
1) Pick the last inserted timestamp from the listen table.
2) For each user, set a zero listen count and a zero timestamp.
3) Iterate over every listen from the beginning of time up to this timestamp; tabulate listen counts and min/max times.
_lucifer
how expensive are we talking about?
ruaok
4) At the end of this script, INCREMENT each user's count by the calculated total. Update timestamps carefully so that calculated timestamps won't overwrite any timestamps that may have been recorded since this process started.
This process allows for the timescale_writer to keep writing new listens and for the update script and the timescale writer to coexist peacefully.
In theory we should catch all the listens as they come in.
thoughts?
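The four-step plan above can be sketched with plain data structures (a hypothetical illustration only; the real script would read from the listen table and INCRBY a cache counter, neither of which is shown in the discussion):

```python
from collections import defaultdict

def tabulate(listens, cutoff):
    """One pass over (user, timestamp) pairs at or before `cutoff`;
    returns {user: (count, min_ts, max_ts)}."""
    totals = {}
    for user, ts in listens:
        if ts > cutoff:
            continue  # listens after the cutoff belong to the live writer
        if user not in totals:
            totals[user] = [0, ts, ts]
        entry = totals[user]
        entry[0] += 1
        entry[1] = min(entry[1], ts)
        entry[2] = max(entry[2], ts)
    return {u: tuple(v) for u, v in totals.items()}

# Step 4: INCREMENT the live counters instead of overwriting them, so
# listens written while the script ran are preserved. `counters` stands
# in for the real cache (the production version would use redis INCRBY).
counters = defaultdict(int)
history = [("alice", 10), ("bob", 20), ("alice", 30), ("alice", 99)]
for user, (count, _min_ts, _max_ts) in tabulate(history, cutoff=50).items():
    counters[user] += count

print(dict(counters))  # {'alice': 2, 'bob': 1}
```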
_lucifer: I saw one take 85s.
_lucifer
😵
alastairp
here
ruaok
exactly, which is why I really want to stop computing them.
alastairp
yeah, doing a script to pre-compute these things makes a lot of sense. does this mean we'll have to make a deploy with the infrastructure available but disabled? or will you be able to do it before the big deploy?
tabulate listen counts and time - this will just find min/max time and num listens for each user?
ruaok
> tabulate listen counts and time - this will just find min/max time and num listens for each user?
yes
> or will you be able to do it before the big deploy?
I think that if we deploy the timescale_writer first, then we can run the tabulate script, then push the web container, then I think the right thing will happen.
More correctly:
1. tabulate.
2. deploy timescale writer
3. tabulate again
4. deploy web
then the release should be seamless.
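The four-step rollout can be sketched as a shell script; `tabulate` and `deploy` are placeholder functions, since the actual commands are not shown in the discussion:

```shell
# Illustrative ordering only -- placeholder functions, not real commands.
tabulate() { echo "tabulating listen counts up to now"; }
deploy()   { echo "deploying $1 container"; }

tabulate                  # 1. seed counts from historical listens
deploy timescale_writer   # 2. new writer keeps counts incremented
tabulate                  # 3. catch listens that landed during step 2
deploy web                # 4. site reads warm counters; nothing looks broken
```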
alastairp
what service updates max and num listens when necessary? timescale writer?
ruaok
was timescale writer, but a lot of that logic has moved to timescale listenstore.
where it belongs.
alastairp
but the writer container, not the web container?
ruaok
there is a lot of really needed cleanup in this PR.
alastairp
do you need any special code running in a container in order to do 1?
ruaok
yes, it runs in the writer container with no special code needed.
_lucifer
beta and prod share the cache container, so we could run the script in beta if needed?
ruaok
speaking in terms of python modules, that logic moved to the listenstore.
_lucifer: yeah, sure. good idea.
alastairp
ruaok: yeah, that's what I was trying to get at. running in beta should be fine
ruaok
the key being, the writer needs to be the first thing to move as part of the deployment.
you can test all this on test.lb right now.
load your feed page to find out what I mean.
it will appear broken, I assure you.
alastairp
what's the purpose of 3? because of stuff that might come in between tabulate and when we shut down the old timescale writer?
ruaok
#1 is needed for #2 to work right. #3 is the part that does the actual work.
I suppose #1 could be reduced to "set all redis keys to 0."
but that is more code to write, lol
alastairp
ah, I follow now
_lucifer
another thing: how about getting the list of all users and then querying the API in the beta container for each user,
instead of writing another script for the same task?
ruaok
that could work, but it would tie up the DB for hours. and it would make N passes over the data, as opposed to 1.
and this other script I am talking about, all its pieces already exist.
it's just a matter of conjuring them into one script.
there's definitely something to be said for just sshing into a computer and running your webserver in a screen
_lucifer
nice but why the move to kubes?
alastairp
because our IT department supports it
_lucifer
ah! :D
yvanzo
I would love to read more about your experience with it :)
alastairp
🤮
lol
nah, it's not too bad. we're lucky that IT gave us a set of templates to copy and fill out.
yvanzo
:D
alastairp
there are a lot of moving parts, and coming into it with no knowledge about how everything fits together, there was a lot of guessing. I'm sure that as we migrate more services to it we'll come to understand better how everything works