I was working on making the scripts independent. I don't think we should still keep the htmls bidirectional.
the HTMLs shall be independent, as the parent files.
no?
ruaok
Is it more difficult to keep them bidirectional?
pristine__
umm....how will we pass HTML file names to scripts?
ruaok
Aren't we going to have job UUIDs? Use those for filenames and links
pristine__
for instance the first script, create_dataframes.py
I do 'queries-{}.html'.format(uuid4())
so how will the script train_models.py know this filename to pass it to html template?
I think I am missing something?
Gazooo has quit
ruaok
Hmmm. I guess not. But then they can't be linked at all, no?
It's not that important, go ahead without it if it is a problem.
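A minimal sketch of how the job-UUID suggestion could work, under stated assumptions: the first script generates one UUID for the whole job, later scripts receive that same UUID (for example as a command-line argument), and every script derives HTML filenames from it with a shared helper. The helper `report_filename` and the `models` prefix are hypothetical; only the `queries-<uuid>.html` pattern comes from the discussion above.

```python
import uuid


def report_filename(prefix, job_id):
    """Build an HTML report name from a prefix and a shared job UUID (hypothetical helper)."""
    return "{}-{}.html".format(prefix, job_id)


# create_dataframes.py would generate the job UUID once...
job_id = str(uuid.uuid4())
queries_html = report_filename("queries", job_id)

# ...and train_models.py, given the same job_id, can rebuild the earlier
# filename to link back to it without the name itself being passed around.
models_html = report_filename("models", job_id)
link_back = report_filename("queries", job_id)
```

With this approach only the job UUID has to travel between scripts, so the pages could still link to each other in both directions.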
Gazooo joined the channel
pristine__
ruaok: yeah. What I will do is include the necessary information in all HTML files so that the reader has all the info needed to understand that particular file.
And should I tell you how the lookup time reduced?
ruaok
Please do!
ruaok is about to go AFK for the day
reosarevok
bitmap, yvanzo : does the ko issue GitHub is alerting about affect us?
pristine__
ruaok: we were using RDDs for lookup. To effectively use RDDs we must parallelize them, which I still need to understand. Also, RDDs are slow. So we used a dataframe for lookup (a dataframe by default has 200 partitions) and they are fast.
zas: I have to make deep changes to SEC workflow, did you set up anything I should preserve?
alastairp
and all functionality will be made free
zas
yvanzo: like ?
reosarevok
alastairp: but it can't automatically fix any bugs resulting from these dependency upgrades, right? Not sure what the usefulness is then :/
yvanzo
reosarevok, alastairp: in this specific case, we use a custom version of KO 2 with backported patches maintained by bitmap.
zas: I just noticed SEC has a unique workflow compared to other projects.
alastairp
reosarevok: when it was brought up the last time I commented that I didn't really like automatic tools for reasons similar to that. I was just copypasting from HN
zas
ah, yes, you can change it if needed
yvanzo
zas: For example, what distinguishes Approved from Done?
alastairp
having said that, it makes a PR, rather than directly committing the change, so tests can find bugs, and they can be fixed before merging
yvanzo
zas: Ok, I mostly need clearer statuses and direct transitions from Open to any.
alastairp: There is also an “Automated security fixes” option in GitHub.
That creates PRs too.
This is beta though.
Freso
alastairp: Related: https://dependabot.com/blog/gemnasium/ : "Finally, for us, Gemnasium's blog post is a warning of what can happen to businesses in a platform ecosystem. We believe Dependabot adds a lot of value over GitHub's dependency graph, and over Gemnasium, but if GitHub were to replicate our functionality they would likely crush us. We don't believe that's in their interest, but are staying as close to them as possible." :)
yvanzo
reosarevok, bitmap: I incidentally put mbs in beta of automated security fixes, PRs are disabled but I did not find how to remove that new tab :/
reosarevok: we have root/release/caa_darkened.tt but I guess it's not working for some reason (haven't looked)
btw, if you have any technical details about those radio platforms not getting cover art, can you forward them to me?
aidanlw17
alastairp: I’m free to meet when you are
bitmap
reosarevok, yvanzo: I can backport the knockout patch so we can dismiss that alert
I tried upgrading before and it broke too much stuff iirc
so I'd rather focus on slowly removing it like jquery
yvanzo nods
alastairp
ferbncode: is there a test that would have picked up that problem on the homepage? if not, then it'd be good that spellew writes a test for that too
it looks like there's no CI in CB too, if you need help setting that up then I'm sure someone could help you with it. iliekcomputers should be around next week
aidanlw17: hi, how are you?
ferbncode
alastairp: right, I ran tests in my local setup and they all passed, there should be a test. I'll add it to the ticket.
sentry reports errors in CB prod, and there is Jenkins that runs tests, I'll ping iliekcomputers for deployment once it's fixed 👍
aidanlw17
alastairp: good! How are you?
alastairp
yeah, the issue with sentry of course is that it only notifies admins once and will swallow subsequent occurrences of errors
ferbncode: do you receive error messages from sentry for CB? If not, perhaps that might be something that we should enable
aidanlw17: good, but the week's not finished yet
did you try and import the data dump?
ferbncode
alastairp: I receive error messages for CB from sentry.
I'll be more proactive and keep an eye on them 😅
alastairp
ferbncode: we have integration in sentry to create tickets on jira from the sentry page
aidanlw17
alastairp: I did try and import the data dump, but I haven't been successful. I've used `./develop.sh run --rm webserver python2 manage.py import_data path_to_the_archive` and `./develop.sh run --rm webserver python2 manage.py init_db --force path_to_the_archive` before to import smaller archives that I have made, but those were .tar.xz files and the .sql.bz2 doesn't work with that, is that correct?
After trying that, I tried to copy the dump into the docker postgres container volume and use pg_restore
alastairp
ah, sorry that I didn't tell you how to import it
right, so. import_data isn't correct, it's a postgres database dump
but pg_restore is only for binary dumps, and this is a text dump
so you should be able to run `bzcat [thefile] | psql`
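A small sketch of how one might check which kind of dump a file is before choosing a tool. It relies on pg_dump custom-format archives beginning with the magic string `PGDMP`, while a plain text dump is just SQL. The function name `dump_kind` is hypothetical, and the check does not cover pg_dump's tar or directory formats.

```python
import bz2


def dump_kind(path):
    """Guess whether a PostgreSQL dump is a custom-format archive or plain SQL text."""
    with open(path, "rb") as f:
        head = f.read(5)
    if head.startswith(b"BZh"):
        # bzip2-compressed file: inspect the first bytes of the decompressed stream
        with bz2.open(path, "rb") as f:
            head = f.read(5)
    # pg_dump custom-format archives begin with the magic string "PGDMP"
    if head == b"PGDMP":
        return "custom (pg_restore)"
    return "plain text (psql)"
```

A `.sql.bz2` file that reports `plain text (psql)` matches the `bzcat [thefile] | psql` route above.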
aidanlw17: I found something else while reading last week too: from https://github.com/spotify/annoy - "another feature that really sets Annoy apart: it has the ability to use static files as indexes. In particular, this means you can share index across processes."
this is interesting. it means that the lack of ability to update the index is a _feature_ of annoy. it's a tradeoff that they made to allow multiple processes to read the same index file
so we should definitely see if we also need this tradeoff - either we want to do this, or it might be more important for us to update easily. I still think that it's OK that we start with annoy, but if we see our requirements change we might want to look at this again
bitmap
yvanzo: I guess there's no "In Review" step for SEC?
also not sure what the difference between Resolve and Close is
alastairp
if it's the same workflow as AB, resolve turns it into "patch sent", and close turns it into "fixed"
(or closed for any other reason, etc)
bitmap
hmm. here they both open the same screen and let you pick the resolution
aidanlw17
alastairp: interesting, I had previously read about its use of static files but I didn't make the connection to the update functionality
alastairp
bitmap: yes, but the state after you click ok will be different
yvanzo
bitmap: just update alert status on GitHub
bitmap: we should probably have an MBS ticket for that
aidanlw17
alastairp: So it could be helpful for us in terms of using the index in multiple ways at the same time, but if our update mechanism doesn't work we could look for something that updates rather than allowing multiple processes?
alastairp
right. I'm still a little unclear about what our final result of this analysis will be
for example, if the annoy index is small enough, perhaps we could just distribute this to people who want to be able to compute their own similarity?
but I have no idea what the size will be until we make it
aidanlw17
Yeah that is unclear
alastairp
something that allows multiple access would be great if we wanted to make an API endpoint that uses it
aidanlw17
Do you mean make the index distributable rather than creating the api endpoints that use it?
alastairp
but if a separate tool returns responses fast enough, perhaps it doesn't matter if we can only make one query at a time
right, exactly
aidanlw17
Yeah I see
alastairp
we currently have two possible use-cases that I'm aware of: "what are some similar items to mbid x" and "what are some similar items to [this lowlevel file]"
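The two use cases map onto two query styles: look up by an item already in the index, or look up by a feature vector computed from a new file. The sketch below illustrates both with an exact brute-force search in plain Python, not with Annoy itself; the class, method names, and toy vectors are all hypothetical. Annoy exposes comparable `get_nns_by_item` and `get_nns_by_vector` calls.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class BruteForceIndex:
    """Exact nearest-neighbour lookup; a stand-in for an approximate index."""

    def __init__(self):
        self.vectors = {}  # mbid -> feature vector

    def add(self, mbid, vector):
        self.vectors[mbid] = list(vector)

    def similar_to_vector(self, vector, n):
        # "similar items to [this lowlevel file]": query with an arbitrary vector
        ranked = sorted(self.vectors, key=lambda m: euclidean(self.vectors[m], vector))
        return ranked[:n]

    def similar_to_mbid(self, mbid, n):
        # "similar items to mbid x": query with the stored vector, excluding x itself
        ranked = self.similar_to_vector(self.vectors[mbid], n + 1)
        return [m for m in ranked if m != mbid][:n]


index = BruteForceIndex()
index.add("mbid-a", [0.0, 0.0])
index.add("mbid-b", [0.1, 0.0])
index.add("mbid-c", [5.0, 5.0])

by_item = index.similar_to_mbid("mbid-a", 1)
by_vector = index.similar_to_vector([4.0, 4.0], 1)
```

The second style needs the query vector to be built from the lowlevel file with the same features used for the indexed items, which is outside this sketch.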