alastairp: i worked ahead on the download dataset feature and added chunking. worked nicely for a ~5K recordings dataset. now trying it on a ~55K recordings dataset.
2021-06-03 15452, 2021
ruaok
moooin!
2021-06-03 15404, 2021
ruaok chuckles at expertsexchange
2021-06-03 15406, 2021
lucifer
it downloaded 2 GB and then my internet connection gave up. but this probably means that the feature is working because we load the entire zip first and then send it for download.
2021-06-03 15406, 2021
ruaok
that goof was made by my former comp sci professor. and then one of my first bosses was the guy who leaked the DeCSS key. I've had an interesting professional life. :)
2021-06-03 15430, 2021
lucifer
lol 😆
2021-06-03 15449, 2021
ruaok
zas: question about grafana graphs/influx queries for when you have a moment.
I see you have missing data, are they submitted every minute?
2021-06-03 15416, 2021
lucifer
yup, very nice :D
2021-06-03 15451, 2021
ruaok
should be. all graphs have the same missing data points. not sure why.
2021-06-03 15450, 2021
ruaok
for smoothing do you normally use moving average, zas? how many seconds?
2021-06-03 15432, 2021
lucifer
the interval drop down on the top right is not working for me. does it work for you?
2021-06-03 15432, 2021
ruaok
lucifer: the one with the arrows is how frequently it should update. the one with the clock is the range to see. which one are you referring to?
2021-06-03 15444, 2021
lucifer
the one with the arrow.
2021-06-03 15404, 2021
ruaok
does it not auto-update for you then?
2021-06-03 15425, 2021
lucifer
no, it's not auto-updating either. looks like another missing minute.
2021-06-03 15432, 2021
zas
mov avg uses a number of values, so if you do mov avg(5) and you have data every min, it will be over 5 mins
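A minimal sketch of the point zas makes here: the moving average is taken over a count of values, not over a span of seconds, so with one value per minute a window of 5 covers 5 minutes. The function name and sample data are hypothetical, and averaging the first few points over a shorter window is an assumption of this sketch, not necessarily what Grafana or InfluxDB do.

```python
from collections import deque


def moving_average(values, window):
    """Average each value with up to (window - 1) values before it."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)  # holds only the last `window` values
    smoothed = []
    for value in values:
        recent.append(value)
        smoothed.append(sum(recent) / len(recent))
    return smoothed


# One sample per minute, so window=5 smooths over the last 5 minutes.
per_minute = [10, 20, 30, 40, 50, 60]
smoothed = moving_average(per_minute, 5)
```

A larger window flattens short spikes further, which is the over-smoothing risk mentioned a few messages later.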
2021-06-03 15414, 2021
ruaok
ah, values. I see.
2021-06-03 15430, 2021
ruaok
lucifer: sometimes it helps to close the tab and start over in a new one.
2021-06-03 15448, 2021
alastairp
lucifer: cool, that's great that bulk download seems to work. Can I try it myself (maybe I have a faster connection)
2021-06-03 15403, 2021
zas
don't over smooth, as it will hide problems ;)
2021-06-03 15414, 2021
lucifer
ruaok: right sorry, was network issue on my end. works now.
2021-06-03 15415, 2021
ruaok
what value do you normally use?
2021-06-03 15422, 2021
alastairp
but also, perhaps we could put a low limit (5k? 10k?) over which you can't download
2021-06-03 15441, 2021
lucifer
alastairp: yes. you can try it out, it's up on similarity.ab
2021-06-03 15453, 2021
zas
I don't smooth most of the time, or just a little, so 5 or 10 max
2021-06-03 15445, 2021
ruaok moves it back to 5
2021-06-03 15446, 2021
zas
another approach
2021-06-03 15446, 2021
lucifer
alastairp: i think it makes sense to put a limit, currently i have increased nginx timeout limits to avoid gateway timeouts but not sure we want to do it in production.
lucifer: yeah, exactly. for now we want to try to avoid that timeout. what did you increase it to?
2021-06-03 15415, 2021
zas
see variables definition in dashboard config
2021-06-03 15431, 2021
zas
and look at group by time() in queries
2021-06-03 15445, 2021
zas
imho that's a better way to smooth out the data
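The alternative zas describes, grouping by time with a larger interval, can be sketched as bucketing points into fixed-width windows and averaging each bucket. This is only an illustration of the idea in plain Python; the function name, the (timestamp, value) tuple layout, and the choice of mean as the aggregate are assumptions, and the real dashboards do this inside the query.

```python
from collections import defaultdict


def group_by_time(points, interval_seconds):
    """Average (timestamp, value) points into fixed-width time buckets."""
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Align each point to the start of the bucket it falls into.
        bucket_start = timestamp - (timestamp % interval_seconds)
        buckets[bucket_start].append(value)
    return [
        (start, sum(vals) / len(vals))
        for start, vals in sorted(buckets.items())
    ]


# Per-minute points (timestamps in seconds) reduced to 5-minute buckets.
points = [(0, 1), (60, 3), (120, 5), (300, 10), (360, 20)]
five_minute = group_by_time(points, 300)
```

Unlike a moving average, each output point here summarises a disjoint window, so the number of points drops as the interval grows.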
2021-06-03 15447, 2021
alastairp
and I guess it's also related to user download speed, not just how quickly we can select and compress it
2021-06-03 15414, 2021
lucifer
alastairp: 1d :p, but i think a ~10 min timeout would work as well.
2021-06-03 15424, 2021
alastairp
most datasets have very few items anyway (less than 1000, maybe less than 100 too).
2021-06-03 15429, 2021
alastairp
ah, I was thinking 60s
2021-06-03 15403, 2021
zas
by default it is on "auto", which basically uses an interval based on current time window displayed
2021-06-03 15406, 2021
alastairp
zas: what is connection timeout on openresty gateways?
2021-06-03 15415, 2021
lucifer
60s is the default, it timed out on the 5k recordings test.
2021-06-03 15436, 2021
lucifer
so i increased it arbitrarily to be able to test the download
2021-06-03 15445, 2021
ruaok
zas: the name in the example you showed is "interva" not "interval". intentional?
2021-06-03 15401, 2021
zas
yes
2021-06-03 15409, 2021
ruaok
dark. magic.
2021-06-03 15416, 2021
alastairp
as we discussed this idea of background processing, we should just release this current feature with a low limit, and use background processing in the future
alastairp: connection timeout is rather vague, we have millions of settings. what do you want to know exactly?
2021-06-03 15445, 2021
zas
which query? which error?
2021-06-03 15446, 2021
alastairp
lucifer: what setting did you change?
2021-06-03 15419, 2021
lucifer
uwsgi_read_timeout on nginx.
2021-06-03 15406, 2021
ruaok
zas: and then group by time and select a larger interval, zas? you don't have that in the example...
2021-06-03 15416, 2021
lucifer
proxy_read_timeout, proxy_connect_timeout and proxy_send_timeout on nginx-proxy.
2021-06-03 15429, 2021
zas
ruaok: what do you mean?
2021-06-03 15446, 2021
alastairp
lucifer: ah, right. do you know how those items interact with each other? are they all needed?
2021-06-03 15452, 2021
alastairp
zas: we have a view in acousticbrainz that may take a long time to finish (it reads data from the database and generates a zip). a value for uwsgi_read_timeout of 60s is too short
ruaok: sometimes you need to save/reload the dashboard
2021-06-03 15424, 2021
ruaok
just did that, no change.
2021-06-03 15425, 2021
alastairp
lucifer: ok, no problem. let's wait until zas and ruaok finish and see if we can look into this more
2021-06-03 15431, 2021
lucifer
sure, thanks!
2021-06-03 15449, 2021
ruaok
shan't be long.
2021-06-03 15455, 2021
ruaok
not showing up, zas. :(
2021-06-03 15425, 2021
zas
alastairp: I'm not sure that's the correct approach to fix this issue, what if your zip takes 2 hours to generate, we keep connections open, and gateways will soon be out of resources
2021-06-03 15423, 2021
alastairp
zas: yes, understood. to minimise this issue we're going to set a maximum number of items that you can compress. we need to do some experiments to decide what the good limit is
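A rough sketch of the item limit being proposed: refuse to build the zip when the dataset has more recordings than a configured maximum. Every name here is hypothetical, and the value 5000 is only the lower of the figures floated earlier in the conversation; the actual limit is still to be decided by experiment.

```python
# Hypothetical limit; the real value is to be chosen after experiments.
MAX_DOWNLOAD_RECORDINGS = 5000


class DatasetTooLargeError(Exception):
    """Raised when a dataset has too many recordings to zip in one request."""


def check_download_allowed(recording_count, limit=MAX_DOWNLOAD_RECORDINGS):
    """Raise DatasetTooLargeError if the dataset exceeds the limit."""
    if recording_count > limit:
        raise DatasetTooLargeError(
            f"dataset has {recording_count} recordings, limit is {limit}"
        )


# A view would call this before reading from the database and zipping.
check_download_allowed(1000)
```

A count-based limit does not guarantee a finish time, since the time to zip a given number of recordings depends on server load.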
2021-06-03 15437, 2021
zas
alastairp: really, I don't think that's a good idea to increase timeouts (60s to answer an HTTP query is already very looong)
2021-06-03 15450, 2021
alastairp
we have a future plan that we will have a background process to generate the zip - you will send a request and come back when it's been generated and download it
2021-06-03 15455, 2021
alastairp
but that's not ready yet
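The background-processing plan described above could look roughly like the following: the request only registers a job and returns an id, a worker builds the zip, and the user polls the status later. This is a sketch under stated assumptions, not the planned implementation; the class, its methods, and the in-memory thread-based storage are all hypothetical, and a real deployment would more likely use a persistent job queue.

```python
import threading
import uuid


class ZipJobs:
    """In-memory registry of background zip-building jobs (illustrative only)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._jobs = {}
        self._threads = {}

    def submit(self, build_zip):
        """Start build_zip() in a worker thread and return a job id at once."""
        job_id = uuid.uuid4().hex
        worker = threading.Thread(
            target=self._run, args=(job_id, build_zip), daemon=True
        )
        with self._lock:
            self._jobs[job_id] = {"status": "pending", "result": None}
            self._threads[job_id] = worker
        worker.start()
        return job_id

    def _run(self, job_id, build_zip):
        try:
            update = {"status": "done", "result": build_zip()}
        except Exception as exc:  # record the failure instead of losing it
            update = {"status": "failed", "result": repr(exc)}
        with self._lock:
            self._jobs[job_id] = update

    def wait(self, job_id, timeout=None):
        """Block until the job's worker finishes or the timeout passes."""
        self._threads[job_id].join(timeout)

    def status(self, job_id):
        """Return a copy of the job's current state for polling."""
        with self._lock:
            return dict(self._jobs[job_id])


jobs = ZipJobs()
job_id = jobs.submit(lambda: b"zip-bytes")  # stand-in for the real zip builder
```

With this shape the HTTP request that starts the job returns immediately, so the 60s gateway timeout applies only to the final download of the finished file.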
2021-06-03 15423, 2021
ruaok
alastairp: yes, indeed, increasing the timeouts will lead to disaster.
2021-06-03 15451, 2021
alastairp
so can I get an answer to my initial question - what are the timeouts currently?
2021-06-03 15457, 2021
zas
60s
2021-06-03 15418, 2021
alastairp
ok great. I'm happy to work within 60s for now
2021-06-03 15418, 2021
lucifer
cool, so let's set a limit on the number of recordings that can be zipped in 60s?
2021-06-03 15442, 2021
ruaok
that's also not a good idea. if the server is loaded that goal post moves.
2021-06-03 15401, 2021
ruaok
sadly, this is a really tricky problem to solve.
2021-06-03 15432, 2021
lucifer
also we need to solve this for LB as well, currently we export user listens in the same way.
2021-06-03 15433, 2021
alastairp
yeah, there's not much we can do at the moment here. We have a PR open which is useful in a small number of situations so I'd like to get it merged
2021-06-03 15452, 2021
alastairp
we've already talked about adding in background processing - as lucifer says it's useful for LB as well
lucifer: and you can see that the mbid writer is missing writing data -- the timescale writer is not missing data points, so the metric system is working as we expect.