in #bookbrainz-devel

22:27 PM
LordSputnik

No, I don't either, I'd like to have something more like Wikipedia's talk pages, but that's more complex, so I've left it out for now
22:27 PM
Leftmost

I'm still a little concerned that an annotations table would balloon, and I think we should treat disambig like other data since it should be smallish.
22:28 PM
LordSputnik

Storing them separately should reduce the amount of space they use, rather than increase it
22:28 PM
Leftmost

Yeah.
22:28 PM
LordSputnik

The size of those tables will almost certainly be smaller than the size of the Entity table, since most entities won't have them
22:29 PM
Leftmost

My current line of thinking is annotation as ID, disambig as inline data, encourage people to think of annotations as a stopgap for storing information our schema can't yet incorporate and that may be lost down the road if we move to a better system.
22:29 PM
LordSputnik

(and the ids are set to null in the EntityTree if there's no disambig/annotation)
22:29 PM
Leftmost

I'm reluctant to commit to storing an annotations table forever, as something about it makes me sad inside.
22:30 PM
Does that seem reasonable?
22:30 PM
LordSputnik

I still don't see the benefit in storing disambiguations inline, I'm afraid
22:30 PM
I don't think we gain from it, and we increase the amount of memory needed
22:32 PM
Leftmost

I see it as a rehash of the names table in MB, which proved to be a pain to maintain and turned out to be hard to define semantically. I think we should be storing names inline and I don't think that disambig is materially different from a name as it relates to storage.
22:33 PM
Gentlecat has left the channel
22:34 PM
We're committing to a certain amount of denormalization with the data table as it stands.
22:34 PM
LordSputnik

How so?
22:34 PM
I thought things were reasonably normalized
22:35 PM
Leftmost

Multiple data tables can contain substantially similar data referring to the same entity with only minor changes, and that's intentional.
22:36 PM
LordSputnik

Well, it's not redundant
22:37 PM
The information is in the combinations of aliases/annotation/disambiguation/data
22:37 PM
There's no way of storing that which is more efficient than foreign keys to separate tables, and no way we can reduce the duplication without losing the information
22:38 PM
ocharles_: kuno: ping, if you're around to give some database-fu
22:39 PM
Leftmost

LordSputnik, it also creates a separate join to a generic table which can have an arbitrarily large set of data, which has caused its own set of problems with edits on MB.
22:39 PM
The table will never be as large as the edits table, but it carries a performance cost and stores entity-specific information in a generic table.
22:40 PM
LordSputnik

The EntityData table?
22:40 PM
Leftmost

No, the disambiguation table.
22:40 PM
kuno

graph databases!
22:41 PM
Leftmost

We don't store end dates in a separate table referenced by ID because end dates are small and it creates overhead to do a lookup.
22:42 PM
Disambig is larger but still relatively small, should still have an upper limit on size, and would also create lookup overhead.
22:42 PM
Storing per-entity information in a generic table seems worse to me than small duplications of data.
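A minimal sketch of the layout Leftmost is arguing for, assuming a Python model: disambiguation is stored inline as an ordinary bounded field (like an end date), while the potentially large annotation text stays in a separate table referenced by ID. The class names, the `make_entity_data` helper and the 255-character limit are all hypothetical, not the actual bbschema definitions.

```python
from dataclasses import dataclass
from typing import Optional

MAX_DISAMBIG_LEN = 255  # hypothetical upper limit on disambiguation size


@dataclass(frozen=True)
class Annotation:
    # Lives in its own table; can be arbitrarily long free text.
    id: int
    content: str


@dataclass(frozen=True)
class EntityData:
    # Small, bounded text stored inline, so reading it needs no extra lookup.
    disambiguation: Optional[str]
    # Reference into the separate annotation table; None if there is none.
    annotation_id: Optional[int]
    end_date: Optional[str] = None


def make_entity_data(disambiguation, annotation_id=None, end_date=None):
    """Build an EntityData row, enforcing the size limit on inline disambiguation."""
    if disambiguation is not None and len(disambiguation) > MAX_DISAMBIG_LEN:
        raise ValueError("disambiguation exceeds the inline size limit")
    return EntityData(disambiguation, annotation_id, end_date)
```

The trade-off under discussion is visible here: the inline field duplicates a little text across revisions, but avoids a join to a generic table on every read.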
22:42 PM
LordSputnik

Oh, so you're suggesting we merge the Disambiguation and EntityData tables
22:42 PM
Leftmost

Yes.
22:43 PM
LordSputnik

That makes more sense
22:43 PM
I thought you meant the Disambiguation and EntityTree tables
22:43 PM
Leftmost

Oh, sorry. I think I was being really unclear about that.
22:43 PM
LordSputnik

To be honest, I'm not 100% sure we need to separate EntityTree and EntityData, but I wanted to hear from ocharles_ before I thought any more about that
22:44 PM
Since EntityTree is just an entity-type independent way of storing pointers to other tables, and there's no reason EntityData couldn't do that
22:47 PM
Leftmost

Here's what I'm thinking, high-level, in terms of data structures:
22:47 PM
LordSputnik

So, I'd suggest we keep that as it is for now, until I've discussed that with ocharles_
22:48 PM
Oh go on
22:51 PM
Leo_Verto

oshit
22:51 PM
Leftmost

An Entity is a structure with an ID, a type, a rooted tree, and a node pointer associated with it. Each node of the rooted tree is a revision, which contains an edit note, a date, a pointer to a parent node, and a pointer to the data which makes up that revision. The node pointer on the entity points to the current master revision node.
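The structure Leftmost describes can be sketched roughly as follows, assuming an in-memory Python model rather than database tables. Field names such as `master` and the `history` helper are illustrative assumptions; the point is only that each revision holds a parent pointer, and the entity holds a pointer to its current master node.

```python
import datetime
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Revision:
    """One node of an entity's rooted revision tree."""
    id: int
    note: str                     # edit note
    date: datetime.date
    parent: Optional["Revision"]  # None only for the root revision
    data: Any                     # the data which makes up this revision


@dataclass
class Entity:
    id: int
    type: str        # e.g. "Creator" or "Edition"
    root: Revision
    master: Revision  # points at the current master revision node


def history(entity: Entity) -> List[int]:
    """Walk parent pointers from the master revision back to the root."""
    ids = []
    node: Optional[Revision] = entity.master
    while node is not None:
        ids.append(node.id)
        node = node.parent
    return ids
```

For example, an entity whose master is revision 2 (child of root revision 1) has the history `[2, 1]`.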
22:51 PM
Leo_Verto

restarting the entire network right now
22:51 PM
LordSputnik

Leo_Verto: yeah I've seen :P
22:52 PM
Leftmost: let me draw that
22:54 PM
mb-chat-logger joined the channel
22:54 PM
Leftmost: ok, where are aliases?
22:55 PM
Leftmost

Not sure offhand. That was a very high-level picture and I'm still not used to the idea of putting aliases in a separate table.
22:56 PM
LordSputnik

Ok, so, it doesn't group revisions together at all
22:56 PM
Leftmost

No, only by parent.
22:56 PM
LordSputnik

But if we're not voting I guess we don't really need edits to group things?
22:56 PM
Leftmost

Application of an edit is accomplished by moving the master revision pointer.
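A small sketch of that idea, under the assumption that revisions are kept in a dictionary keyed by ID: a proposed change is recorded as a child of the current master, and applying it does nothing except move the master pointer. The `propose`/`apply` method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class Revision:
    id: int
    parent_id: Optional[int]
    note: str
    data: Any


class Entity:
    def __init__(self, entity_id: int, initial_data: Any):
        self.id = entity_id
        self.revisions: Dict[int, Revision] = {
            1: Revision(1, None, "Created", initial_data)
        }
        self.master_id = 1

    def propose(self, note: str, data: Any) -> Revision:
        """Record a revision branching from the current master, without applying it."""
        rev = Revision(max(self.revisions) + 1, self.master_id, note, data)
        self.revisions[rev.id] = rev
        return rev

    def apply(self, rev_id: int) -> None:
        """Applying an edit only moves the master pointer; no data is copied."""
        if rev_id not in self.revisions:
            raise KeyError(rev_id)
        self.master_id = rev_id

    @property
    def current(self) -> Any:
        return self.revisions[self.master_id].data
```

Until `apply` is called, readers still see the old master data, so a pending change costs nothing but its own revision row.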
22:57 PM
LordSputnik

Oh, I mean, in the current schema, edits are groups of revisions which get applied together
22:57 PM
Leftmost

Right, I meant in my schema.
22:58 PM
LordSputnik

Ok, I think this would work, but we still have the issue of storing different data for different entity types
22:58 PM
Leftmost

Not necessarily. Our entity structure can store type without using a typed tree.
22:59 PM
Leo_Verto

bb.org back up
23:00 PM
Leftmost

Each EntityData structure is associated with only a single Entity, so everything up to that point can be generic.
23:00 PM
LordSputnik

Well, what would the EntityData for each revision contain?
23:02 PM
Leftmost

Any type-agnostic revision information and a pointer to a type-specific data struct.
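One way to picture this split, assuming Python dataclasses: the generic `EntityData` carries the type-agnostic fields and a reference to exactly one type-specific struct. `CreatorData`, `EditionData`, their fields and the `entity_type` helper are hypothetical placeholders, not the real schema.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class CreatorData:
    begin_date: Optional[str]
    end_date: Optional[str]


@dataclass(frozen=True)
class EditionData:
    isbn: Optional[str]
    page_count: Optional[int]


TypedData = Union[CreatorData, EditionData]


@dataclass(frozen=True)
class EntityData:
    # Type-agnostic revision information (disambiguation stored inline).
    disambiguation: Optional[str]
    annotation_id: Optional[int]
    # Pointer to the type-specific data struct for this revision.
    typed: TypedData


_TYPE_NAMES = {CreatorData: "Creator", EditionData: "Edition"}


def entity_type(data: EntityData) -> str:
    """Recover the entity type from the kind of type-specific struct attached."""
    return _TYPE_NAMES[type(data.typed)]
```

Everything above the `typed` field can be handled generically, which is the property Leftmost points out for the revision tree.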
23:02 PM
LordSputnik

So, like an EntityTree is now?
23:03 PM
Leftmost

Yes, similar.
23:04 PM
LordSputnik

So, the main difference is in the organisation of revisions, then
23:05 PM
(and also presumably storing annotations and disambiguations inline in the entity-type agnostic data)
23:06 PM
Leftmost

Yeah.
23:06 PM
LordSputnik

Right, I'll think about that
23:07 PM
Let's go onto the final few topics
23:07 PM
SSH
23:07 PM
Leo_Verto

yeah
23:08 PM
LordSputnik

Leftmost: in order to set you up on the bb.org sandbox, I need the public SSH keys for the PCs you want to connect from (there's no password authentication afaik)
23:09 PM
So I guess the best way to do that would be in an email
23:11 PM
Ok, next is Search (who wanted to talk about that?)
23:11 PM
Leftmost

I did.
23:12 PM
LordSputnik

btw Leo_Verto: if you wanted SSH access to the bookbrainz server, we can probably sort that out too (although I should probably check with ruaok first)
23:12 PM
Leftmost

I've looked into it and it seems that choosing between Solr and Elasticsearch largely comes down to preference. I've got some experience with Elasticsearch. It's dead easy to use and dead easy to get running. Any objection to moving forward with an ES search implementation?
23:13 PM
LordSputnik

No objection here
23:13 PM
Leo_Verto

probably not essential but could be useful in certain situations
23:13 PM
LordSputnik

Ok, I'll ask him next Monday, when he's not so busy (hopefully)
23:14 PM
Leftmost: either way I'd have to learn one or the other, and if it's easy to set up, we can probably switch between them without too much fuss if we really need to
23:15 PM
Leftmost

LordSputnik, public key sent.
23:15 PM
Leo_Verto

*decrypting transmission*
23:15 PM
LordSputnik

Leftmost: ok, I'll see what I can do when I have a proper internet connection again
23:15 PM
Finally, Setup (also you, Leftmost?) :)
23:16 PM
Leftmost

Yep.
23:16 PM
Just a general note that we should work on making setup easier. I don't know if I mentioned that yesterday.
23:17 PM
LordSputnik

nope
23:17 PM
What bits particularly?
23:17 PM
I thought it was fairly straightforward (especially compared to MB)
23:17 PM
Leo_Verto

mhm
23:18 PM
Leftmost

It wasn't difficult or confusing, but I think in general it could be smoother.
23:18 PM
Leo_Verto

the frontend setup is pretty well documented by now
23:19 PM
LordSputnik

so, currently, you have to install bbschema, install postgres and redis, then clone bbws
23:20 PM
Install dependencies with pip, then launch the ws
23:20 PM
Leo_Verto

oh yeah, installing redis is entirely undocumented
23:20 PM
Leftmost

The READMEs for each component should include a short setup howto and any system reqs, I'd say, and -schema or -ws should be able to set up the database with only a user/password to its name.
23:20 PM
LordSputnik

Clone the site, npm install the dependencies, compile the javascript, then launch the site
23:21 PM
Ok, currently database setup is done in a separate script in bbschema
23:21 PM
Leftmost

I got it set up without difficulty, just things that may be worth keeping in mind.
23:21 PM
LordSputnik

we could move that somewhere else, maybe, or try to integrate it with the ws config file
23:21 PM
Leo_Verto

if we get the config system done, we could provide configs for having a whole local setup or just working on the site using the bb.org ws
23:21 PM
LordSputnik

(or have a separate bbschema config file containing the database settings?)
23:22 PM
Leftmost

It probably belongs in -ws if -schema is just a lib for interacting purely with the schema instead of making database calls.
23:23 PM
LordSputnik

Well, that's the thing...
23:23 PM
I've just started moving some of the editing logic into schema
23:23 PM
There's a blurry line between them
23:23 PM
Leftmost

Hmm. That may be worth some thought, then.
23:24 PM
LordSputnik

Ok, anything else for discussion today?
23:24 PM
Leftmost

Not that I can think of. Do you wanna think over the -ws/-schema split and talk about it next meeting, maybe?
23:25 PM
LordSputnik

Ok, should we aim for 9:30 next time?
23:25 PM
Leo_Verto

I can do the publisher/edition forms provided the ws endpoint exists
23:25 PM
Leftmost

Sounds good to me. Same bat time (plus ten minutes), same bat channel?
23:25 PM
LordSputnik

Leo_Verto: yes, it does - same endpoint as the creator/publisher ones
23:25 PM
Leo_Verto

unless we want to hold that off until after the new call model
23:25 PM
LordSputnik

Leftmost: haha yeah
23:25 PM
Leo_Verto

ah good
23:26 PM
LordSputnik

Leftmost: do you still have some time now?
23:26 PM
Leftmost

I do.
23:26 PM
LordSputnik

Ok, I think we can resolve the differences between your new proposed schema and our current one
23:26 PM
Leftmost

Leo_Verto, up to you. It may be next meeting before I even have PRs up for discussion on frontend stuff, and new changes should be relatively easy to integrate.
23:27 PM
LordSputnik

So, your proposal has revisions organised in a tree for each entity, which is optimal for parallel revisions being merged, right?
23:28 PM
Leo_Verto

oh, one more thing
23:28 PM
I want to start working on better accessibility of the site
23:28 PM
stuff like adding title texts for icons
23:29 PM
LordSputnik

Go for it, that's always welcome
23:29 PM
Leftmost

LordSputnik, right.
23:29 PM
LordSputnik

labels for screenreaders are also helpful, but I've been leaving them out up until now :(
23:30 PM
Leftmost

Leo_Verto, please do. I even have an accessibility consultant if you want feedback.
23:30 PM
Leo_Verto

is there a proper way to implement those labels?
23:30 PM
Leftmost

(My father is blind.)
23:30 PM
LordSputnik

Leftmost: is better support for merging parallel work the only reason for using a tree rather than a list of revisions?
23:30 PM
Leftmost

There is. I can get you more information by tomorrow if you want.
23:30 PM
LordSputnik, off the top of my head, yeah.
23:30 PM
LordSputnik

Leo_Verto: there's something like an "aria-*" attribute - it's documented in bootstrap, I think
23:31 PM
Leo_Verto

Leftmost, you should join our mailing list https://groups.io/org/groupsio/bookbrainz-devel...
23:31 PM
Leftmost

There's also form labelling stuff baked into HTML, I believe.
23:31 PM
Okay.
23:31 PM
LordSputnik

Leftmost: ok, so which one we use should depend on whether we need that level of detail in the revision history
23:33 PM
Leo_Verto

oh and I also want to add a nice button to the editor page to send a message, the problem here is vertically centering the button
23:33 PM
Leftmost

LordSputnik, I think it's a useful concept. It's a very simple data structure, it's list-like if there's no parallel editing, and it makes it easier for multiple changes to be made and merged.
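To illustrate why the tree degrades gracefully, here is a sketch assuming revisions are stored as a map from revision ID to parent ID. With no parallel editing every node has at most one child, so the tree is effectively a list; when two edits branch from the same parent, their common ancestor is the natural base for a three-way merge. The helper names are hypothetical.

```python
from typing import Dict, List, Optional

# Each revision ID maps to its parent's ID (None for the root revision).
Parents = Dict[int, Optional[int]]


def ancestors(parents: Parents, rev: Optional[int]) -> List[int]:
    """Return rev and all of its ancestors, nearest first."""
    chain = []
    while rev is not None:
        chain.append(rev)
        rev = parents[rev]
    return chain


def is_linear(parents: Parents) -> bool:
    """True when no revision has more than one child, i.e. the tree is list-like."""
    child_counts: Dict[int, int] = {}
    for parent in parents.values():
        if parent is not None:
            child_counts[parent] = child_counts.get(parent, 0) + 1
    return all(count <= 1 for count in child_counts.values())


def common_ancestor(parents: Parents, a: int, b: int) -> Optional[int]:
    """Nearest revision shared by both branches: the base for merging them."""
    seen = set(ancestors(parents, a))
    for rev in ancestors(parents, b):
        if rev in seen:
            return rev
    return None
```

For instance, with revisions 3 and 4 both branching from revision 2, `common_ancestor` returns 2.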
23:33 PM
It's also very adaptable in terms of how we choose to display this to the end user.
23:34 PM
LordSputnik

Ok
23:34 PM
Leftmost

I've been doing a lot of data structures reading lately, so my mind may still be there. :)
23:35 PM
LordSputnik

Second thing is- do we really want edits/merge requests *and* revisions
23:35 PM
Leftmost

I don't see the need myself.
23:36 PM
LordSputnik

Assuming the primary way of peer review is through verifying data, rather than reviewing edits, I don't think so either
23:37 PM
So, we have a tree of revisions, no edits, and I'd assume that revisions can be reverted easily
23:38 PM
Leftmost

Yep. A revert would basically just be a new revision with an old dataset.
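A sketch of that approach, assuming a simple in-memory revision log: a revert creates a new child of the current master that copies an older revision's data, so history is extended with a note rather than rewritten (as opposed to just moving the master pointer back). `RevisionLog`, `commit` and `revert` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Revision:
    id: int
    parent_id: Optional[int]
    note: str
    data: dict


class RevisionLog:
    def __init__(self, initial_data: dict):
        self.revisions: Dict[int, Revision] = {
            1: Revision(1, None, "Initial import", dict(initial_data))
        }
        self.master_id = 1
        self._next_id = 2

    def commit(self, note: str, data: dict) -> Revision:
        """Add a child of the current master and make it the new master."""
        rev = Revision(self._next_id, self.master_id, note, dict(data))
        self.revisions[rev.id] = rev
        self.master_id = rev.id
        self._next_id += 1
        return rev

    def revert(self, target_id: int, note: str) -> Revision:
        """Revert by committing a copy of an older revision's data, with a note."""
        return self.commit(note, self.revisions[target_id].data)
```

Because the revert is itself a revision, the intermediate change (and the reason it was undone) stays visible in the history.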
23:39 PM
LordSputnik

Leftmost: wouldn't it just be setting the master_revision further down the tree?
23:39 PM
OH
23:39 PM
* Oh, no, I guess you'd want a note too
23:39 PM
rather than silently changing history