No, I don't either. I'd like to have something more like Wikipedia's talk pages, but that's more complex, so I've left it out for now.
Leftmost
I'm still a little concerned that an annotations table would balloon, and I think we should treat disambig like other data since it should be smallish.
LordSputnik
Storing them separately should reduce the amount of space they use, rather than increase it
Leftmost
Yeah.
LordSputnik
The size of those tables will almost certainly be smaller than the size of the Entity table, since most entities won't have them
Leftmost
My current line of thinking is annotation as ID, disambig as inline data, encourage people to think of annotations as a stopgap for storing information our schema can't yet incorporate and that may be lost down the road if we move to a better system.
LordSputnik
(and the ids are set to null in the EntityTree if there's no disambig/annotation)
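A minimal sketch of the layout described here, using an in-memory SQLite database purely for illustration. The table and column names (`entity_tree`, `disambiguation_id`, `annotation_id`) are assumptions, not the actual bbschema definitions. An EntityTree row holds nullable foreign keys, and a LEFT JOIN returns NULL when a tree has no disambiguation:

```python
import sqlite3

# Hypothetical schema: EntityTree rows point at separate disambiguation and
# annotation tables via nullable foreign keys (NULL means "none").
SCHEMA = """
CREATE TABLE disambiguation (id INTEGER PRIMARY KEY, comment TEXT NOT NULL);
CREATE TABLE annotation (id INTEGER PRIMARY KEY, content TEXT NOT NULL);
CREATE TABLE entity_tree (
    id INTEGER PRIMARY KEY,
    disambiguation_id INTEGER REFERENCES disambiguation(id),
    annotation_id INTEGER REFERENCES annotation(id)
);
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO disambiguation (id, comment) VALUES (1, 'the poet')")
    # Tree 1 has a disambiguation; tree 2 has neither, so both ids are NULL.
    conn.execute("INSERT INTO entity_tree (id, disambiguation_id, annotation_id) VALUES (1, 1, NULL)")
    conn.execute("INSERT INTO entity_tree (id, disambiguation_id, annotation_id) VALUES (2, NULL, NULL)")
    return conn


def disambiguation_for(conn, tree_id):
    # LEFT JOIN so trees without a disambiguation still yield a row (with NULL).
    row = conn.execute(
        """SELECT d.comment FROM entity_tree t
           LEFT JOIN disambiguation d ON d.id = t.disambiguation_id
           WHERE t.id = ?""",
        (tree_id,),
    ).fetchone()
    return row[0] if row else None
```

Several trees can share one disambiguation row by pointing at the same id, which is where the space saving would come from.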
Leftmost
I'm reluctant to commit to storing an annotations table forever, as something about it makes me sad inside.
Does that seem reasonable?
LordSputnik
I still don't see the benefit in storing disambiguations inline, I'm afraid
I don't think we gain from it, and we increase the amount of memory needed
Leftmost
I see it as a rehash of the names table in MB, which proved to be a pain to maintain and turned out to be hard to define semantically. I think we should be storing names inline and I don't think that disambig is materially different from a name as it relates to storage.
Gentlecat has left the channel
We're committing to a certain amount of denormalization with the data table as it stands.
LordSputnik
How so?
I thought things were reasonably normalized
Leftmost
Multiple data tables can contain substantially similar data referring to the same entity with only minor changes, and that's intentional.
LordSputnik
Well, it's not redundant
The information is in the combinations of aliases/annotation/disambiguation/data
There's no way of storing that which is more efficient than foreign keys to separate tables, and no way we can reduce the duplication without losing the information
ocharles_: kuno: ping, if you're around to give some database-fu
Leftmost
LordSputnik, it also creates a separate join to a generic table which can have an arbitrarily large set of data, which has caused its own set of problems with edits on MB.
The table will never be as large as the edits table, but it carries a performance cost and stores entity-specific information in a generic table.
LordSputnik
The EntityData table?
Leftmost
No, the disambiguation table.
kuno
graph databases!
Leftmost
We don't store end dates in a separate table referenced by ID because end dates are small and it creates overhead to do a lookup.
Disambig is larger but still relatively small, should still have an upper limit on size, and would also create lookup overhead.
Storing per-entity information in a generic table seems worse to me than small duplications of data.
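For contrast, a rough sketch of the inline alternative, again with SQLite for illustration only. The `entity_data` table name and the 255-character cap are assumptions standing in for "an upper limit on size". Reading the disambiguation needs no join, at the cost of repeating the text in every data row that carries it:

```python
import sqlite3

# Hypothetical layout: the disambiguation is a column on the data row itself.
# The 255-character limit is an assumed value, enforced with a CHECK constraint.
SCHEMA = """
CREATE TABLE entity_data (
    id INTEGER PRIMARY KEY,
    disambiguation TEXT CHECK (length(disambiguation) <= 255)
);
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO entity_data (id, disambiguation) VALUES (1, 'the poet')")
    conn.execute("INSERT INTO entity_data (id, disambiguation) VALUES (2, NULL)")
    return conn


def disambiguation_for(conn, data_id):
    # Single-table read: no lookup in a separate table is needed.
    row = conn.execute(
        "SELECT disambiguation FROM entity_data WHERE id = ?", (data_id,)
    ).fetchone()
    return row[0] if row else None
```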
LordSputnik
Oh, so you're suggesting we merge the Disambiguation and EntityData tables
Leftmost
Yes.
LordSputnik
That makes more sense
I thought you meant the Disambiguation and EntityTree tables
Leftmost
Oh, sorry. I think I was being really unclear about that.
LordSputnik
To be honest, I'm not 100% sure we need to separate EntityTree and EntityData, but I wanted to hear from ocharles_ before I thought any more about that
Since EntityTree is just an entity-type independent way of storing pointers to other tables, and there's no reason EntityData couldn't do that
Leftmost
Here's what I'm thinking, high-level, in terms of data structures:
LordSputnik
So, I'd suggest we keep that as it is for now, until I've discussed that with ocharles_
Oh go on
Leo_Verto
oshit
Leftmost
An Entity is a structure with an ID, a type, a rooted tree, and a node pointer associated with it. Each node of the rooted tree is a revision, which contains an edit note, a date, a pointer to a parent node, and a pointer to the data which makes up that revision. The node pointer on the entity points to the current master revision node.
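The structure described above could be sketched roughly like this. Field names are illustrative assumptions; the point is that each revision only knows its parent, and the entity keeps a pointer to the current master node:

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime
from typing import Any, Iterator, Optional


@dataclass(eq=False)
class Revision:
    """One node in an entity's revision tree (hypothetical field names)."""
    note: str                    # edit note
    date: datetime
    parent: Optional[Revision]   # None only for the root revision
    data: Any                    # pointer to the data making up this revision


@dataclass(eq=False)
class Entity:
    id: int
    type: str
    root: Revision    # root of the rooted revision tree
    master: Revision  # the current master revision node

    def history(self) -> Iterator[Revision]:
        """Walk from the master revision back to the root via parent pointers."""
        node: Optional[Revision] = self.master
        while node is not None:
            yield node
            node = node.parent
```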
Leo_Verto
restarting the entire network right now
LordSputnik
Leo_Verto: yeah I've seen :P
Leftmost: let me draw that
mb-chat-logger joined the channel
Leftmost: ok, where are aliases?
Leftmost
Not sure offhand. That was a very high-level picture and I'm still not used to the idea of putting aliases in a separate table.
LordSputnik
Ok, so, it doesn't group revisions together at all
Leftmost
No, only by parent.
LordSputnik
But if we're not voting I guess we don't really need edits to group things?
Leftmost
Application of an edit is accomplished by moving the master revision pointer.
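A minimal sketch of applying an edit by moving the master pointer. The rule that a new revision must build on the current master (otherwise it needs merging first) is an assumption added for illustration; it covers applying new edits, not reverts:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(eq=False)
class Revision:
    data: Any
    parent: Optional["Revision"] = None


@dataclass(eq=False)
class Entity:
    master: Revision  # current master revision node


def descends_from(node: Optional[Revision], ancestor: Revision) -> bool:
    """True if `ancestor` is `node` itself or lies on node's parent chain."""
    while node is not None:
        if node is ancestor:
            return True
        node = node.parent
    return False


def apply_edit(entity: Entity, revision: Revision) -> None:
    """Apply an edit by pointing master at `revision`.

    Assumption: only revisions that build on the current master are applied
    this way; a revision on a parallel branch would need merging first.
    """
    if not descends_from(revision, entity.master):
        raise ValueError("revision does not build on the current master")
    entity.master = revision
```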
LordSputnik
Oh, I mean, in the current schema, edits are groups of revisions which get applied together
Leftmost
Right, I meant in my schema.
LordSputnik
Ok, I think this would work, but we still have the issue of storing different data for different entity types
Leftmost
Not necessarily. Our entity structure can store type without using a typed tree.
Each EntityData structure is associated with only a single Entity, so everything up to that point can be generic.
LordSputnik
Well, what would the EntityData for each revision contain?
Leftmost
Any type-agnostic revision information and a pointer to a type-specific data struct.
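One way to picture this, as a sketch only: `CreatorData` and `EditionData` are hypothetical type-specific structs, and storing the disambiguation and annotation inline on the generic part follows the direction agreed a few lines later rather than any existing schema:

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class CreatorData:
    """Hypothetical type-specific data for a creator."""
    begin_date: Optional[str] = None
    end_date: Optional[str] = None


@dataclass
class EditionData:
    """Hypothetical type-specific data for an edition."""
    page_count: Optional[int] = None


TYPE_NAMES = {CreatorData: "Creator", EditionData: "Edition"}


@dataclass
class EntityData:
    """Type-agnostic revision data plus a pointer to the type-specific part."""
    disambiguation: Optional[str]  # stored inline (assumption)
    annotation: Optional[str]      # stored inline (assumption)
    specific: Union[CreatorData, EditionData]

    @property
    def entity_type(self) -> str:
        # The type is recoverable from the specific struct, so the generic
        # part never needs a typed tree.
        return TYPE_NAMES[type(self.specific)]
```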
LordSputnik
So, like an EntityTree is now?
Leftmost
Yes, similar.
LordSputnik
So, the main difference is in the organisation of revisions, then
(and also presumably storing annotations and disambiguations inline in the entity-type agnostic data)
Leftmost
Yeah.
LordSputnik
Right, I'll think about that
Let's go onto the final few topics
SSH
Leo_Verto
yeah
LordSputnik
Leftmost: in order to set you up on the bb.org sandbox, I need the public SSH keys for the PCs you want to connect from (there's no password authentication afaik)
So I guess the best way to do that would be in an email
Ok, next is Search (who wanted to talk about that?)
Leftmost
I did.
LordSputnik
btw Leo_Verto: if you wanted SSH access to the bookbrainz server, we can probably sort that out too (although I should probably check with ruaok first)
Leftmost
I've looked into it, and it seems that choosing between Solr and Elasticsearch largely comes down to preference. I've got some experience with Elasticsearch. It's dead easy to use and dead easy to get running. Any objection to moving forward with an ES search implementation?
LordSputnik
No objection here
Leo_Verto
probably not essential but could be useful in certain situations
LordSputnik
Ok, I'll ask him next Monday, when he's not so busy (hopefully)
Leftmost: either way I'd have to learn one or the other, and if it's easy to set up, we can probably switch between them without too much fuss if we really need to
Leftmost
LordSputnik, public key sent.
Leo_Verto
*decrypting transmission*
LordSputnik
Leftmost: ok, I'll see what I can do when I have a proper internet connection again
Finally, Setup (also you, Leftmost?) :)
Leftmost
Yep.
Just a general note that we should work on making setup easier. I don't know if I mentioned that yesterday.
LordSputnik
nope
What bits particularly?
I thought it was fairly straightforward (especially compared to MB)
Leo_Verto
mhm
Leftmost
It wasn't difficult or confusing, but I think in general it could be smoother.
Leo_Verto
the frontend setup is pretty well documented by now
LordSputnik
so, currently, you have to install bbschema, install postgres and redis, then clone bbws
Install dependencies with pip, then launch the ws
Leo_Verto
oh yeah, installing redis is entirely undocumented
Leftmost
The READMEs for each component should include a short setup howto and any system reqs, I'd say, and -schema or -ws should be able to set up the database with only a user/password to its name.
LordSputnik
Clone the site, npm install the dependencies, compile the javascript, then launch the site
Ok, currently database setup is done in a separate script in bbschema
Leftmost
I got it set up without difficulty, just things that may be worth keeping in mind.
LordSputnik
we could move that somewhere else, maybe, or try to integrate it with the ws config file
Leo_Verto
if we get the config system done, we could provide configs for having a whole local setup or just working on the site using the bb.org ws
LordSputnik
(or have a separate bbschema config file containing the database settings?)
Leftmost
It probably belongs in -ws if -schema is just a lib for interacting purely with the schema instead of making database calls.
LordSputnik
Well, that's the thing...
I've just started moving some of the editing logic into schema
There's a blurry line between them
Leftmost
Hmm. That may be worth some thought, then.
LordSputnik
Ok, anything else for discussion today?
Leftmost
Not that I can think of. Do you wanna think over the -ws/-schema split and talk about it next meeting, maybe?
LordSputnik
Ok, should we aim for 9:30 next time?
Leo_Verto
I can do the publisher/edition forms provided the ws endpoint exists
Leftmost
Sounds good to me. Same bat time (plus ten minutes), same bat channel?
LordSputnik
Leo_Verto: yes, it does - same endpoint as the creator/publisher ones
Leo_Verto
unless we want to hold that off until after the new call model
LordSputnik
Leftmost: haha yeah
Leo_Verto
ah good
LordSputnik
Leftmost: do you still have some time now?
Leftmost
I do.
LordSputnik
Ok, I think we can resolve the differences between your new proposed schema and our current one
Leftmost
Leo_Verto, up to you. It may be next meeting before I even have PRs up for discussion on frontend stuff, and new changes should be relatively easy to integrate.
LordSputnik
So, your proposal has revisions organised in a tree for each entity, which is optimal for parallel revisions being merged, right?
Leo_Verto
oh, one more thing
I want to start working on better accessibility of the site
stuff like adding title texts for icons
LordSputnik
Go for it, that's always welcome
Leftmost
LordSputnik, right.
LordSputnik
labels for screenreaders are also helpful, but I've been leaving them out up until now :(
Leftmost
Leo_Verto, please do. I even have an accessibility consultant if you want feedback.
Leo_Verto
is there a proper way to implement those labels?
Leftmost
(My father is blind.)
LordSputnik
Leftmost: is better support for merging parallel work the only reason for using a tree rather than a list of revisions?
Leftmost
There is. I can get you more information by tomorrow if you want.
LordSputnik, off the top of my head, yeah.
LordSputnik
Leo_Verto: there's something like an "aria-*" attribute - it's documented in Bootstrap, I think
There's also form labelling stuff baked into HTML, I believe.
Okay.
LordSputnik
Leftmost: ok, so which one we use should depend on whether we need that level of detail in the revision history
Leo_Verto
oh and I also want to add a nice button to the editor page to send a message, the problem here is vertically centering the button
Leftmost
LordSputnik, I think it's a useful concept. It's a very simple data structure, it's list-like if there's no parallel editing, and it makes it easier for multiple changes to be made and merged.
It's also very adaptable in terms of how we choose to display this to the end user.
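To illustrate why the tree helps with merging, here is a sketch of finding the revision where two parallel edits diverged, using only parent pointers. Using that common ancestor as the base of a three-way merge is one common option, not something settled here. With no parallel editing the tree is a single chain, and the result is simply the older of the two revisions:

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass(eq=False)  # identity-based hashing, so nodes can go in a set
class Revision:
    name: str
    parent: Optional["Revision"] = None


def ancestors(node: Optional[Revision]) -> Iterator[Revision]:
    """Yield the node and each of its ancestors up to the root."""
    while node is not None:
        yield node
        node = node.parent


def merge_base(a: Revision, b: Revision) -> Optional[Revision]:
    """Nearest common ancestor of two revisions: where parallel edits diverged."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    return None
```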
LordSputnik
Ok
Leftmost
I've been doing a lot of data structures reading lately, so my mind may still be there. :)
LordSputnik
Second thing is- do we really want edits/merge requests *and* revisions
Leftmost
I don't see the need myself.
LordSputnik
Assuming the primary way of peer review is through verifying data, rather than reviewing edits, I don't think so either
So, we have a tree of revisions, no edits, and I'd assume that revisions can be reverted easily
Leftmost
Yep. A revert would basically just be a new revision with an old dataset.
LordSputnik
Leftmost: wouldn't it just be setting the master_revision further down the tree?
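The two revert options discussed above can be sketched side by side. Names are hypothetical. The first adds a new revision whose data copies an older one, so the revert itself appears in the history; the second only points master back at the earlier node, which is cheaper but leaves no record of the revert in the tree unless it is logged elsewhere:

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(eq=False)
class Revision:
    data: Any
    note: str = ""
    parent: Optional["Revision"] = None


@dataclass(eq=False)
class Entity:
    master: Revision


def revert_by_new_revision(entity: Entity, target: Revision) -> Revision:
    """Add a revision on top of master that reuses an older revision's data."""
    entity.master = Revision(target.data, note="revert", parent=entity.master)
    return entity.master


def revert_by_moving_master(entity: Entity, target: Revision) -> Revision:
    """Point master back at an earlier revision; later nodes stay in the tree."""
    entity.master = target
    return entity.master
```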