-
alastairp
db._connection.reset() :)
-
I still don’t like how we’re looping through both lowlevel and highlevel_json
-
it means we do twice as much work
-
we can do lowlevel.id=highlevel.id, highlevel.data=highlevel_json.id
-
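(A minimal sketch of the relationship being described, assuming psycopg2 and the table names as written: highlevel shares its id with lowlevel, and highlevel.data holds the id of the matching highlevel_json row.)

    import psycopg2

    conn = psycopg2.connect("dbname=acousticbrainz")  # connection details are illustrative
    cursor = conn.cursor()

    # One lowlevel row, its highlevel row (same id), and the
    # highlevel_json row that highlevel.data points at.
    cursor.execute("""
        SELECT ll.id, hlj.id
          FROM lowlevel ll
          JOIN highlevel hl ON hl.id = ll.id
          JOIN highlevel_json hlj ON hlj.id = hl.data
    """)
-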
Gentlecat
I didn't have much time for optimization yesterday :)
-
alastairp
and so when we’ve decided that a lowlevel is bad, automatically go to fix the hl json
-
OK, no problem!
-
do you think you can do that?
-
I’ll continue testing this
-
Gentlecat
in one update query?
-
alastairp
well, perhaps do 2 update_row()s, but only get the candidate lowlevel ids
-
then do another query to join into the highlevel_json table to get its id
-
did I explain that clearly?
-
diana_olhovik joined the channel
-
Gentlecat
-
alastairp
cool
-
I think the hl row id is wrong
-
highlevel.id = lowlevel.id
-
Gentlecat
ohh
-
alastairp
except the highlevel_json.id is different - it’s stored in highlevel.data
-
Gentlecat
right
-
alastairp
also, your string interpolation in cursor.execute doesn’t work
-
i just fixed it on mine
-
you need to do interpolation for the table name, but then pass the actual data as the second argument to execute
-
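(A minimal sketch of the execute() pattern being described, assuming psycopg2; table, row_id, and fixed_json are illustrative names.)

    import psycopg2

    conn = psycopg2.connect("dbname=acousticbrainz")  # connection details are illustrative
    cursor = conn.cursor()

    table = "lowlevel"  # hard-coded, known-safe table name
    row_id = 42         # illustrative values
    fixed_json = '{}'

    # The table name can't be passed as a query parameter, so it goes in
    # via string interpolation; the actual data goes in the second
    # argument to execute(), which escapes it properly.
    cursor.execute("UPDATE %s SET data = %%s WHERE id = %%s" % table,
                   (fixed_json, row_id))
    conn.commit()
-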
Gentlecat
right
-
alastairp
Row #True is bad! Fixing...
-
I guess that should be a number!
-
but otherwise, great. it works here
-
Gentlecat
without high-level stuff?
-
alastairp
yeah
-
cursor.execute("SELECT id FROM %s" % table)
-
without the loop this has to be lowlevel again
-
Gentlecat
just fixed that too
-
alastairp
(y)
-
Mineo
aren't you just getting all the ids from lowlevel in do_magic only to load the row itself in is_bad? any reason to not load both at once?
-
oh
-
Mineo is not completely awake
-
carry on doing whatever awesome things you're doing :)
-
alastairp
yes, postgres aborts the query if the json in the row is bad
-
and python’s json module won’t say it’s bad, only postgres once you try and access it as a json field (rather than text)
-
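(A minimal sketch of that failure mode, assuming psycopg2 and that the data column is stored as text: the query only fails once postgres parses the value as json, so each candidate id gets its own probe, with a rollback after a failed one.)

    import psycopg2

    conn = psycopg2.connect("dbname=acousticbrainz")  # connection details are illustrative
    cursor = conn.cursor()

    def is_bad(row_id):
        # The ::json cast is what makes postgres parse the stored text
        # and abort the query on a bad row; fetching it as plain text
        # would always succeed.
        try:
            cursor.execute("SELECT data::json FROM lowlevel WHERE id = %s",
                           (row_id,))
            cursor.fetchone()
            return False
        except psycopg2.DataError:
            conn.rollback()  # reset the aborted transaction before the next probe
            return True
-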
Gentlecat
alastairp: updated
-
alastairp
hl_data_dict=purify(get_data_as_text("highlevel_json", row['id'])),
-
still wrong :(
-
sorry
-
Gentlecat
ugh, forgot about this one
-
alastairp
perhaps you could do the join to hl_json at the initial select
-
so we don’t have to do a million small selects
-
return json, sha256
-
jason
-
Gentlecat
I'll probably have to rewrite it completely then
-
updated again
-
alastairp
nah, just select ll.id, hl.data from ll join hl on hl.id=ll.id
-
and pass both ids into update_rows
-
Gentlecat
-
alastairp
you could go back to update_row(table, data, id) in this case
-
yeah
-
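(A sketch of that single-pass flow, reusing the hypothetical is_bad() above and the update_row(table, data, id) signature from the chat; fix_json() is an illustrative stand-in for the actual repair step.)

    # One join fetches each lowlevel id together with the highlevel_json
    # id that highlevel.data points at, instead of a million small selects.
    cursor.execute("""
        SELECT ll.id, hl.data
          FROM lowlevel ll
          JOIN highlevel hl ON hl.id = ll.id
    """)
    for ll_id, hl_json_id in cursor.fetchall():
        if is_bad(ll_id):
            # A bad lowlevel row means its highlevel json gets fixed too.
            update_row("lowlevel", fix_json("lowlevel", ll_id), ll_id)
            update_row("highlevel_json", fix_json("highlevel_json", hl_json_id),
                       hl_json_id)
-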
Gentlecat
right
-
alastairp: try again
-
alastairp
cool. working!
-
a few small things I had to fix with % arguments to execute
-
hmm, weird
-
KeyError: 'metadata'
-
Gentlecat
metadata is missing?
-
how is that possible
-
alastairp
ah, interesting
-
so
-
if the highlevel extractor can’t compute anything, it inserts {} into highlevel_json
-
in this example, it couldn’t
-
Gentlecat
why would it do that?
-
alastairp
I wonder if our extractor is as strict as postgres
-
and this is why it failed
-
why would what do what?
-
extractor insert {}, or fail?
-
Gentlecat
insert empty json
-
MBJenkins
dufferzafar0: Add missing jsonify import
-
Gentlecat
to prevent itself from running on the same row again?
-
alastairp
yep
-
exactly
-
Gentlecat
ok
-
alastairp
ok, I’ll just replace it with .get(‘’, {})
-
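(A guess at the shape of that fix; the 'metadata' key is inferred from the KeyError above, since a {} document has no keys at all.)

    # hl_data is the parsed highlevel_json document. When the extractor
    # couldn't compute anything it stored {}, so plain indexing raises
    # KeyError; .get() with a default keeps empty documents harmless.
    metadata = hl_data.get("metadata", {})
-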
uh oh
-
I have this really funny feeling
-
that we only have 1 bad row ;)
-
Gentlecat
fun
-
alastairp
oh, no. I got another one
-
hmm, but only 1 more it seems
-
Gentlecat: cool. we seem to be ready
-
thanks for your work
-
Gentlecat
exciting!
-
now what exactly are we ready for? :)
-
ruaok
LOL
-
heh. :)
-
alastairp
hah
-
so, we have this paper for a conference
-
and I wrote all this stuff about how we had 1 million tracks
-
and then the paper was accepted, and the final version is due tomorrow
-
Mineo
and now we suddenly have nearly 2 million tracks!
-
Gentlecat
do you actually need to provide all the data with the paper?
-
alastairp
Mineo: bingo!
-
Mineo
(I can't actually check how many there are at the moment because abz.org ISEs)
-
alastairp
some of our stats are to do with the number of unique items in the metadata
-
Gentlecat
uh oh
-
alastairp
which we need to parse the json for
-
uh oh
-
what did I do?
-
did we just delete 700k items?
-
Gentlecat
yay
-
restart uwsgi?
-
check logs I guess
-
alastairp
ok, better. it has a database connection
-
I just keep seeing connection already closed
-
even after restarting uwsgi
-
Gentlecat
ohhh
-
we might need to update paths to high-level extractor too
-
but I've got no idea how we run it there
-
alastairp
sure, but that shouldn’t affect the database, right?
-
yeah, I can do that
-
I mean, it doesn’t affect the website
-
Gentlecat
well, it was running after the update was deployed
-
alastairp
yeah
-
Gentlecat
I even saw new submissions
-
alastairp
and then I played with the database
-
Gentlecat
with high-level data
-
alastairp
right
-
Gentlecat
what's in uwsgi logs?
-
alastairp
that’ll be because the program was already running
-
just connection closed
-
Gentlecat
what if you just try to start server manually?
-
from manage.py
-
alastairp
weird
-
lost synchronization with server: got message type ...
-
this is a postgres error
-
I’ve /never/ seen it before
-
Gentlecat
hm
-
alastairp
-
OK. I set that setting
-
but it’s really slow
-
better now. maybe it was just postgres being sluggish
-
weird. I’m doing a dump, and it’s stuck on the incremental_dumps table
-
that seems a weird table to be stuck on
-
oh, then there’s that thing where pxz is using 1000% cpu
-
Gentlecat
what do you mean stuck?
-
alastairp
well, it looked like it was doing nothing
-
but it just seems like it’s streaming 600k items into an xz file
-
no problem at all :)
-
Freso
alastairp | did we just delete 700k items? — XD
-
alastairp
it’s ok. we didn’t. I’m just doing the backup now, *after* I destructively edited the database
-
everything is under control
-
MBJenkins
* Michael Wiencek: Fix npm warning about knockout-arraytransforms
-
* Michael Wiencek: Replace deprecated react-tools with babel