-
alastairp
db._connection.reset() :)
-
I still don’t like how we’re looping through both lowlevel and highlevel_json
-
it means we do twice as much work
-
we can do lowlevel.id=highlevel.id, highlevel.data=highlevel_json.id
-
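(A minimal sketch of the relationship being described, assuming psycopg2 and the table names as written: highlevel shares its id with lowlevel, and highlevel.data holds the id of the matching highlevel_json row.)

    import psycopg2

    conn = psycopg2.connect("dbname=acousticbrainz")  # connection details are illustrative
    cursor = conn.cursor()

    # One lowlevel row, its highlevel row (same id), and the
    # highlevel_json row that highlevel.data points at.
    cursor.execute("""
        SELECT ll.id, hlj.id
          FROM lowlevel ll
          JOIN highlevel hl ON hl.id = ll.id
          JOIN highlevel_json hlj ON hlj.id = hl.data
    """)
-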
Gentlecat
I didn't have much time for optimization yesterday :)
-
alastairp
and so when we’ve decided that a lowlevel is bad, automatically go to fix the hl json
-
OK, no problem!
-
do you think you can do that?
-
I’ll continue testing this
-
Gentlecat
in one update query?
-
alastairp
well, perhaps do 2 update_row()s, but only get the candidate lowlevel ids
-
then do another query to join into the highlevel_json table to get its id
-
did I explain that clearly?
-
diana_olhovik joined the channel
-
Gentlecat
-
alastairp
cool
-
I think the hl row id is wrong
-
highlevel.id = lowlevel.id
-
Gentlecat
ohh
-
alastairp
except the highlevel_json.id is different - it’s stored in highlevel.data
-
Gentlecat
right
-
alastairp
also, your string interpolation in cursor.execute doesn’t work
-
i just fixed it on mine
-
you need to do interpolation for the table name, but then pass the actual data as the second argument to execute
-
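(A minimal sketch of the execute() pattern being described, assuming psycopg2; table, row_id, and fixed_json are illustrative names.)

    import psycopg2

    conn = psycopg2.connect("dbname=acousticbrainz")  # connection details are illustrative
    cursor = conn.cursor()

    table = "lowlevel"  # hard-coded, known-safe table name
    row_id = 42         # illustrative values
    fixed_json = '{}'

    # The table name can't be passed as a query parameter, so it goes in
    # via string interpolation; the actual data goes in the second
    # argument to execute(), which escapes it properly.
    cursor.execute("UPDATE %s SET data = %%s WHERE id = %%s" % table,
                   (fixed_json, row_id))
    conn.commit()
-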
Gentlecat
right
-
alastairp
Row #True is bad! Fixing...
-
I guess that should be a number!
-
but otherwise, great. it works here
-
Gentlecat
without high-level stuff?
-
alastairp
yeah
-
cursor.execute("SELECT id FROM %s" % table)
-
without the loop this has to be lowlevel again
-
Gentlecat
just fixed that too
-
alastairp
(y)
-
Mineo
aren't you just getting all the ids from lowlevel in do_magic only to load the row itself in is_bad? any reason to not load both at once?
-
oh
-
Mineo is not completely awake
-
carry on doing whatever awesome things you're doing :)
-
alastairp
yes, postgres aborts the query if the json in the row is bad
-
and python’s json module won’t say it’s bad, only postgres once you try and access it as a json field (rather than text)
-
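(A minimal sketch of that failure mode, assuming psycopg2 and that the data column is stored as text: the query only fails once postgres parses the value as json, so each candidate id gets its own probe, with a rollback after a failed one.)

    import psycopg2

    conn = psycopg2.connect("dbname=acousticbrainz")  # connection details are illustrative
    cursor = conn.cursor()

    def is_bad(row_id):
        # The ::json cast is what makes postgres parse the stored text
        # and abort the query on a bad row; fetching it as plain text
        # would always succeed.
        try:
            cursor.execute("SELECT data::json FROM lowlevel WHERE id = %s",
                           (row_id,))
            cursor.fetchone()
            return False
        except psycopg2.DataError:
            conn.rollback()  # reset the aborted transaction before the next probe
            return True
-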
Gentlecat
alastairp: updated
-
alastairp
hl_data_dict=purify(get_data_as_text("highlevel_json", row['id'])),
-
still wrong :(
-
sorry
-
Gentlecat
ugh, forgot about this one
-
alastairp
perhaps you could do the join to hl_json at the initial select
-
so we don’t have to do a million small selects
-
return json, sha256
-
jason
-
Gentlecat
I'll probably have to rewrite it completely then
-
updated again
-
alastairp
nah, just select ll.id, hl.data from ll join hl on hl.id=ll.id
-
and pass both ids into update_rows
-
Gentlecat
-
alastairp
you could go back to update_row(table, data, id) in this case
-
yeah
-
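(A sketch of that single-pass flow, reusing the hypothetical is_bad() above and the update_row(table, data, id) signature from the chat; fix_json() is an illustrative stand-in for the actual repair step.)

    # One join fetches each lowlevel id together with the highlevel_json
    # id that highlevel.data points at, instead of a million small selects.
    cursor.execute("""
        SELECT ll.id, hl.data
          FROM lowlevel ll
          JOIN highlevel hl ON hl.id = ll.id
    """)
    for ll_id, hl_json_id in cursor.fetchall():
        if is_bad(ll_id):
            # A bad lowlevel row means its highlevel json gets fixed too.
            update_row("lowlevel", fix_json("lowlevel", ll_id), ll_id)
            update_row("highlevel_json", fix_json("highlevel_json", hl_json_id),
                       hl_json_id)
-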
Gentlecat
right
-
alastairp: try again
-
alastairp
cool. working!
-
a few small things I had to fix with % arguments to execute
-
hmm, weird
-
KeyError: 'metadata'
-
Gentlecat
metadata is missing?
-
how is that possible
-
alastairp
ah, interesting
-
so
-
if the highlevel extractor can’t compute anything, it inserts {} into highlevel_json
-
in this example, it couldn’t
-
Gentlecat
why would it do that?
-
alastairp
I wonder if our extractor is as strict as postgres
-
and this is why it failed
-
why would what do what?
-
extractor insert {}, or fail?
-
Gentlecat
insert empty json
-
MBJenkins
dufferzafar0: Add missing jsonify import
-
Gentlecat
to prevent itself from running on the same row again?
-
alastairp
yep
-
exactly
-
Gentlecat
ok
-
alastairp
ok, I’ll just replace it with .get(‘’, {})
-
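(A guess at the shape of that fix; the 'metadata' key is inferred from the KeyError above, since a {} document has no keys at all.)

    # hl_data is the parsed highlevel_json document. When the extractor
    # couldn't compute anything it stored {}, so plain indexing raises
    # KeyError; .get() with a default keeps empty documents harmless.
    metadata = hl_data.get("metadata", {})
-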
uh oh
-
I have this really funny feeling
-
that we only have 1 bad row ;)
-
Gentlecat
fun
-
alastairp
oh, no. I got another one
-
hmm, but only 1 more it seems
-
Gentlecat: cool. we seem to be ready
-
thanks for your work
-
Gentlecat
exciting!
-
now what exactly are we ready for? :)
-
ruaok
LOL
-
heh. :)
-
alastairp
hah
-
so, we have this paper for a conference
-
and I wrote all this stuff about how we had 1 million tracks
-
and then the paper was accepted, and the final version is due tomorrow
-
Mineo
and now we suddenly have nearly 2 million tracks!
-
Gentlecat
do you actually need to provide all the data with the paper?
-
alastairp
Mineo: bingo!
-
Mineo
(I can't actually check how many there are at the moment because abz.org ISEs)
-
alastairp
some of our stats are to do with the number of unique items in the metadata
-
Gentlecat
uh oh
-
alastairp
which we need to parse the json for
-
uh oh
-
what did I do?
-
did we just delete 700k items?
-
Gentlecat
yay
-
restart uwsgi?
-
check logs I guess
-
alastairp
ok, better. it has a database connection
-
I just keep seeing connection already closed
-
even after restarting uwsgi
-
Gentlecat
ohhh
-
we might need to update paths to high-level extractor too
-
but I've got no idea how we run it there
-
alastairp
sure, but that shouldn’t affect the database, right?
-
yeah, I can do that
-
I mean, it doesn’t affect the website
-
Gentlecat
well, it was running after the update was deployed
-
alastairp
yeah
-
Gentlecat
I even saw new submissions
-
alastairp
and then I played with the database
-
Gentlecat
with high-level data
-
alastairp
right
-
Gentlecat
what's in uwsgi logs?
-
alastairp
that’ll be because the program was already running
-
just connection closed
-
Gentlecat
what if you just try to start server manually?
-
from manage.py
-
alastairp
weird
-
lost synchronization with server: got message type ...
-
this is a postgres error
-
I’ve /never/ seen it before
-
Gentlecat
hm
-
alastairp
-
OK. I set that setting
-
but it’s really slow
-
better now. maybe it was just postgres being sluggish
-
weird. I’m doing a dump, and it’s stuck on the incremental_dumps table
-
that seems a weird table to be stuck on
-
oh, then there’s that thing where pxz is using 1000% cpu
-
Gentlecat
what do you mean stuck?
-
alastairp
well, it looked like it was doing nothing
-
but it just seems like it’s streaming 600k items into an xz file
-
no problem at all :)
-
Freso
alastairp | did we just delete 700k items? — XD
-
alastairp
it’s ok. we didn’t. I’m just doing the backup now, *after* I destructively edited the database
-
everything is under control
-
MBJenkins
* Michael Wiencek: Fix npm warning about knockout-arraytransforms
-
* Michael Wiencek: Replace deprecated react-tools with babel