LordSputnik, are there any open questions about the database still?
LordSputnik
Leftmost: only planning how to do merging
I could maybe blast off a procedure for that tonight
Leftmost: if you feel like it, add a couple more of those port jade -> react tasks to GCI
I've done one but leaving it unpublished for the time being
Leftmost
Oh, right. I'll keep thinking about how we can deal with revision parenting, since I think that addresses most of the merging problems except the actual property selection.
LordSputnik
(revision and entity display might be easy ones for beginners)
Leftmost: ah yeah, I'll think about it now, maybe I'll come up with something
I read up on git internals but that hasn't helped much, apart from getting me to think about NES again and that leading to the new model for relationships
Leftmost
The obvious way is having a revision_parent table, but I'm not sure if that's overkill.
Yeah, I came up with about the same schema for that driving home from the store.
LordSputnik
So, when we merge, we have two revisions - those are parents to the merge revision, correct?
Then, if we revert the merge (split), we create two new revisions, with the previous data, and with the merge revision as the parent?
Leftmost
Yes.
(Can we call this CRUMB? Create, Read, Update, Merge, Belete? :))
LordSputnik
Haha, that or DRUMC?
Leftmost
Additionally, a create relationship revision would modify the relationship sets of two different entities. Ideally, the entities would share the same revision and so their master revisions would be the parents. Reverting an add would be single-parent single-child, deleting a relationship would be double-parent (or potentially single-parent) single-child.
LordSputnik
So I think we may need a separate table to keep track of revision relations
Lotheric has quit
With PK (parent_revision_id, child_revision_id)
Freso
alastairp: Seems like we have a student from New Zealand :)
Lotheric joined the channel
LordSputnik
(effectively storing the edges of the directed acyclic graph of revisions)
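A minimal sketch of the edge table described above, assuming SQLite. Only the `(parent_revision_id, child_revision_id)` primary key comes from the discussion; the `revision` table, its `note` column, and the helper names are hypothetical. It records a merge (two parents, one child) and a split that reverts it (one parent, two children).

```python
import sqlite3

# Hypothetical schema: only the composite primary key is taken from the chat.
SCHEMA = """
CREATE TABLE revision (
    id   INTEGER PRIMARY KEY,
    note TEXT NOT NULL
);
CREATE TABLE revision_parent (
    parent_revision_id INTEGER NOT NULL REFERENCES revision(id),
    child_revision_id  INTEGER NOT NULL REFERENCES revision(id),
    PRIMARY KEY (parent_revision_id, child_revision_id)
);
"""


def add_revision(conn, rev_id, note, parents=()):
    """Insert a revision and one edge per parent."""
    conn.execute("INSERT INTO revision (id, note) VALUES (?, ?)", (rev_id, note))
    conn.executemany(
        "INSERT INTO revision_parent VALUES (?, ?)",
        [(parent, rev_id) for parent in parents],
    )


def parents_of(conn, rev_id):
    rows = conn.execute(
        "SELECT parent_revision_id FROM revision_parent "
        "WHERE child_revision_id = ? ORDER BY parent_revision_id",
        (rev_id,),
    )
    return [row[0] for row in rows]


def children_of(conn, rev_id):
    rows = conn.execute(
        "SELECT child_revision_id FROM revision_parent "
        "WHERE parent_revision_id = ? ORDER BY child_revision_id",
        (rev_id,),
    )
    return [row[0] for row in rows]


def build_example():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    add_revision(conn, 1, "entity A master")
    add_revision(conn, 2, "entity B master")
    add_revision(conn, 3, "merge A + B", parents=(1, 2))       # two parents
    add_revision(conn, 4, "split: restore A data", parents=(3,))
    add_revision(conn, 5, "split: restore B data", parents=(3,))
    return conn
```

With this layout, `parents_of(conn, 3)` gives the two merged revisions and `children_of(conn, 3)` gives the two revisions created by the split.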
drsaunders has quit
Leftmost
Yeah, that sounds right.
drsaunders joined the channel
(Man, I love graphs.)
LordSputnik
That may be the most CS sentence I've ever put together :P
Freso: did you see the side-scrolling xkcd game the other day?
Freso
Nope.
Leftmost
LordSputnik, I got lost in that for over an hour...
Freso
But I do kind of want the thing explainer.
LordSputnik
Leftmost: I went left, got stuck in the volcano, then went right all the way
I also explored the star destroyer a little, but not fully :P
Leftmost
You got stuck in the volcano? But there's more past there!
Hmm. Storing source and target entities in the relationship table is now technically denormalized.
LordSputnik
Leftmost: Oh I went back again after reloading :P
Leftmost: oh yeah, the relationship table is pretty much just (id, type) now
To get the entity BBIDs (hopefully not often needed), you'd join through relationship_set__relationship, relationship_set, <entity>_data, <entity>_revision, <entity>_header :P
Although we *will* need them every time we display a relationship... hmmm
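The join path above could look roughly like this for one entity type. This is a sketch assuming SQLite; the table names follow the ones mentioned in the chat with `edition` standing in for `<entity>`, and every column name (`set_id`, `data_id`, `master_revision_id`, and so on) is a guess, not the actual schema.

```python
import sqlite3

# Hypothetical columns throughout; only the table names echo the discussion.
SCHEMA = """
CREATE TABLE relationship (id INTEGER PRIMARY KEY, type_id INTEGER NOT NULL);
CREATE TABLE relationship_set (id INTEGER PRIMARY KEY);
CREATE TABLE relationship_set__relationship (
    set_id          INTEGER NOT NULL REFERENCES relationship_set(id),
    relationship_id INTEGER NOT NULL REFERENCES relationship(id),
    PRIMARY KEY (set_id, relationship_id)
);
CREATE TABLE edition_data (
    id INTEGER PRIMARY KEY,
    relationship_set_id INTEGER REFERENCES relationship_set(id)
);
CREATE TABLE edition_revision (
    id INTEGER PRIMARY KEY,
    bbid TEXT NOT NULL,
    data_id INTEGER NOT NULL REFERENCES edition_data(id)
);
CREATE TABLE edition_header (
    bbid TEXT PRIMARY KEY,
    master_revision_id INTEGER NOT NULL REFERENCES edition_revision(id)
);
"""

# relationship -> set -> data -> revision -> header (master revisions only)
BBIDS_FOR_RELATIONSHIP = """
SELECT h.bbid
FROM relationship_set__relationship AS rsr
JOIN relationship_set AS rs ON rs.id = rsr.set_id
JOIN edition_data     AS d  ON d.relationship_set_id = rs.id
JOIN edition_revision AS r  ON r.data_id = d.id
JOIN edition_header   AS h  ON h.master_revision_id = r.id
WHERE rsr.relationship_id = ?
ORDER BY h.bbid
"""


def build_example():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO relationship VALUES (1, 10)")
    conn.executemany("INSERT INTO relationship_set VALUES (?)", [(1,), (2,)])
    conn.execute("INSERT INTO relationship_set__relationship VALUES (1, 1)")
    # edition-a's master data uses set 1; edition-b used set 1 in an old
    # revision, but its master revision now points at the empty set 2.
    conn.executemany("INSERT INTO edition_data VALUES (?, ?)",
                     [(1, 1), (2, 1), (3, 2)])
    conn.executemany("INSERT INTO edition_revision VALUES (?, ?, ?)",
                     [(1, "edition-a", 1), (2, "edition-b", 2), (3, "edition-b", 3)])
    conn.executemany("INSERT INTO edition_header VALUES (?, ?)",
                     [("edition-a", 1), ("edition-b", 3)])
    return conn


def bbids_for_relationship(conn, relationship_id):
    return [row[0] for row in conn.execute(BBIDS_FOR_RELATIONSHIP, (relationship_id,))]
```

Joining through the header's master revision is what keeps superseded revisions (edition-b's old one here) out of the result.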
Nyanko-sensei has quit
LordSputnik is now imagining a society of sysadmins like https://www.youtube.com/watch?v=UOs-4J6rr-w
I think that could actually save the world :P
Leftmost
You'd need to determine the relationship type, the type of the other entity (which I think becomes non-trivial when we use the flag, as we need to test which entity type column we need to check), then select the entity of that type which has a master revision with data containing a relationship set containing that relationship.
Has to be a better way than that. :-P
qu3ntin02 joined the channel
qu3ntin02
Hey guys
Does bookbrainz have an API?
Freso
This question sure is popular.
Leftmost
If we don't store isSource at the relationship set level (shouldn't really be relevant when only looking at the set, right?), then storing source and target BBIDs in the relationship table isn't denormalized and simplifies everything, heh.
mildused
LordSputnik, What is the difference between the two?
LordSputnik
mildused: which two? :)
Leftmost
mildused, one is the production instance, one is the source for it.
mildused
got it
Leftmost
LordSputnik, can you think of an instance in which we want to know whether something is source or target but don't want to join against the relationship table? That seems like information for the relationship data itself.
mildused
Why not Node.js?
darwin
why not zoidberg
LordSputnik
Leftmost: but you could have an entity A whose entity_data points to a relationship_set, and also an entity B with different entity_data pointing to the same relationship_set, meaning that the info stored in the relationship would be wrong
Leftmost
Because python was the direction we initially chose to look. That's changing, though.
LordSputnik
mildused: yeah, WS 1.0 will be node.js :)
mildused
So... how far off is that?
LordSputnik
What we're discussing right now is the reason we haven't started on that yet
(ie. schema 1.0)
mildused
Ohh haha
Leftmost
LordSputnik, I don't follow. Why would the info stored in the relationship be wrong?
mildused
still using Redis?
LordSputnik
Leftmost: because two different entities could be the source for a relationship through their relationship_sets
mildused: yup
Because relationship carries so little information
Leftmost
LordSputnik, how? The relationship set wouldn't know anything about whether something is a source.
LordSputnik
I need to think about the relationships a bit more, something is definitely not good with the way they are in that diagram
Then the whole graph needs to change to connect relationship and entity
Bookzombie joined the channel
But that still wouldn't work if you keep relationship_set
Leftmost
I'm not sure I see why.
LordSputnik
Because any number of entities can share the relationship set, meaning that the relationship BBIDs could easily be invalidated
Consider an entity with relationships being merged into an entity with no relationships
The resulting entity has a different BBID, but the same set of relationships as entity A
mildused
So tasks should be done for the python web service?
LordSputnik
So that will cause problems, but I don't know exactly what problems until we have a rough merge process worked out
mildused: yup, exactly
Stuff like testing for the current WS will be useful when we come to write tests for the new WS in a few months' time
Nyanko-sensei joined the channel
Leftmost
LordSputnik, the procedure would be the same, I think, as if you merged two entities both with relationships: new relationship data will have to be created pointing to the correct entity and a new relationship set created.
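A small sketch of both sides of this exchange, in plain Python with hypothetical class and field names. It assumes relationships carry source and target BBIDs: if a merged entity simply reuses entity A's relationship set, the stored BBIDs still name A; the procedure described above copies the relationships into a new set that points at the merged entity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    id: int
    type: str
    source_bbid: str
    target_bbid: str


@dataclass(frozen=True)
class RelationshipSet:
    id: int
    relationships: tuple  # tuple of Relationship


def stale_relationships(entity_bbid, rel_set):
    """Relationships in the set that do not mention the entity owning the set."""
    return [
        rel for rel in rel_set.relationships
        if entity_bbid not in (rel.source_bbid, rel.target_bbid)
    ]


def rewrite_for_merge(rel_set, old_bbid, new_bbid, next_rel_id, next_set_id):
    """Copy the set, pointing every mention of old_bbid at new_bbid."""
    copies = []
    for offset, rel in enumerate(rel_set.relationships):
        copies.append(Relationship(
            id=next_rel_id + offset,
            type=rel.type,
            source_bbid=new_bbid if rel.source_bbid == old_bbid else rel.source_bbid,
            target_bbid=new_bbid if rel.target_bbid == old_bbid else rel.target_bbid,
        ))
    return RelationshipSet(id=next_set_id, relationships=tuple(copies))


# Entity A (with one relationship) is merged into an entity with none,
# producing a merged entity with a new BBID, here called "bbid-C".
set_a = RelationshipSet(1, (Relationship(1, "publisher", "bbid-A", "bbid-P"),))
reused = stale_relationships("bbid-C", set_a)        # set shared as-is
set_c = rewrite_for_merge(set_a, "bbid-A", "bbid-C", next_rel_id=2, next_set_id=2)
rewritten = stale_relationships("bbid-C", set_c)     # set copied and repointed
```

Here `reused` contains the one relationship that still names `bbid-A`, while `rewritten` is empty, at the cost of duplicating the relationship rows.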
LordSputnik
Leftmost: OK, but you'd also have to duplicate the relationships themselves, which is why I'm not sure how much RelationshipSet makes sense when we have BBIDs on the relationship table
Need to think some more
Leftmost
Okay.
LordSputnik
Another problem. If you have 5 relationships between an entity and 5 other entities, then you have to make 6 revisions to change them all
If you revert only one of those revisions, the relationships become out of sync
So, I don't even think we can handle relationships in the same versioning system as entity data
identifiers work because they only involve one entity (the other is some external identifier)
Leftmost
LordSputnik, we wouldn't need to have more than one revision, I think.
Now that revisions are no longer limited to a single parent, we can change arbitrarily many entities with a single revision.
I'm not sure if it's a good thing to change related entities, though. That does seem to balloon pretty fast.
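To make the trade-off concrete, here is a rough in-memory sketch, with hypothetical names throughout. It assumes one revision may record changes to several entities at once, with each touched entity's master revision as a parent. Reverting that revision restores every entity together, whereas restoring only one side leaves the relationship sets out of sync.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Revision:
    id: int
    parents: tuple  # parent revision ids (one master revision per touched entity)
    before: dict    # bbid -> frozenset of relationship ids before the change
    after: dict     # bbid -> frozenset of relationship ids after the change


def apply_revision(state, rev):
    """Return a new state with every entity touched by rev updated."""
    new_state = dict(state)
    new_state.update(rev.after)
    return new_state


def make_revert(rev, new_id):
    """A revert is a single-parent revision that swaps before and after."""
    return Revision(new_id, (rev.id,), before=rev.after, after=rev.before)


def in_sync(state):
    """Every relationship id must appear in exactly two entities' sets."""
    counts = Counter(rel for rels in state.values() for rel in rels)
    return all(n == 2 for n in counts.values())


# One entity "E" related to five others, one relationship each.
state = {"E": frozenset({1, 2, 3, 4, 5})}
state.update({f"P{i}": frozenset({i}) for i in range(1, 6)})

# A single revision removes all five relationships from all six entities.
remove_all = Revision(
    id=20,
    parents=(10, 11, 12, 13, 14, 15),  # made-up master revision ids
    before=dict(state),
    after={bbid: frozenset() for bbid in state},
)
removed = apply_revision(state, remove_all)
restored = apply_revision(removed, make_revert(remove_all, 21))

# Restoring only E, as six independent revisions would allow, breaks the pairing.
partial = dict(removed)
partial["E"] = state["E"]
```

`restored` equals the original `state`, while `in_sync(partial)` is false because each relationship then appears on only one side.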
darwin
(DBA suggests cascading updates are often bad)
Leftmost
I'm not willing to give up yet, though. I think putting relationships into the same versioning system can still work, just need to figure out how.
Rayna joined the channel
Freso
LordSputnik | Another problem. If you have 5 relationships between an entity and 5 other entities, then you have to make 6 revisions to change them all -- or 10 revisions; 5 revisions on the single entity and 1 revision on each of the five. If a revision is reverted, you "just" "undo" the related revision on the other end too.
s/;/:/
Leftmost
Ontologically speaking, a relationship is a link between two entities. If the identity of one of the entities changes, we want to change the link but not the other entity.
This does create practical problems in finding out what that other entity is, though.
Rayna has quit
LordSputnik, okay, thinking about it: to fetch an edition with a publisher rel, we go EditionHeader -> EditionRevision -> EditionData -> RelationshipSet -> RelationshipSet_Relationship -> Relationship -> RelationshipType already, just to display the edition side. We already know the other side is a publisher, so we just have to do the other side without the last two steps, which is only one join more than we'd have to do to get the publisher side of it anyhow.
We could possibly even eliminate the PublisherHeader part of it, since PublisherRevision knows the BBID.
Well, no, we probably don't want to do that. We still want to get the master revision if we're fetching the edition's master revision.
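The chain above, written out as one query. This is only a sketch assuming SQLite, with snake_case versions of the table names from the chat and guessed column names; the `published by` label is made up. The edition side walks from header to relationship type, and the publisher side walks back from the same relationship to PublisherHeader, so only master revisions are matched on both ends.

```python
import sqlite3

# All column names are assumptions; the tables mirror the names used above.
SHARED = """
CREATE TABLE relationship_type (id INTEGER PRIMARY KEY, label TEXT NOT NULL);
CREATE TABLE relationship (
    id INTEGER PRIMARY KEY,
    type_id INTEGER NOT NULL REFERENCES relationship_type(id)
);
CREATE TABLE relationship_set (id INTEGER PRIMARY KEY);
CREATE TABLE relationship_set__relationship (
    set_id          INTEGER NOT NULL REFERENCES relationship_set(id),
    relationship_id INTEGER NOT NULL REFERENCES relationship(id),
    PRIMARY KEY (set_id, relationship_id)
);
"""
PER_ENTITY = """
CREATE TABLE {e}_data (
    id INTEGER PRIMARY KEY,
    relationship_set_id INTEGER REFERENCES relationship_set(id)
);
CREATE TABLE {e}_revision (
    id INTEGER PRIMARY KEY,
    bbid TEXT NOT NULL,
    data_id INTEGER NOT NULL REFERENCES {e}_data(id)
);
CREATE TABLE {e}_header (
    bbid TEXT PRIMARY KEY,
    master_revision_id INTEGER NOT NULL REFERENCES {e}_revision(id)
);
"""

EDITION_PUBLISHER_RELS = """
SELECT rel.id, rt.label, ph.bbid
FROM edition_header AS eh
JOIN edition_revision AS er ON er.id = eh.master_revision_id
JOIN edition_data     AS ed ON ed.id = er.data_id
JOIN relationship_set AS ers ON ers.id = ed.relationship_set_id
JOIN relationship_set__relationship AS ersr ON ersr.set_id = ers.id
JOIN relationship      AS rel ON rel.id = ersr.relationship_id
JOIN relationship_type AS rt  ON rt.id = rel.type_id
-- publisher side: same relationship, walked back to the publisher's header
JOIN relationship_set__relationship AS prsr ON prsr.relationship_id = rel.id
JOIN relationship_set   AS prs ON prs.id = prsr.set_id
JOIN publisher_data     AS pd  ON pd.relationship_set_id = prs.id
JOIN publisher_revision AS pr  ON pr.data_id = pd.id
JOIN publisher_header   AS ph  ON ph.master_revision_id = pr.id
WHERE eh.bbid = ?
ORDER BY rel.id
"""


def build_example():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SHARED)
    for entity in ("edition", "publisher"):
        conn.executescript(PER_ENTITY.format(e=entity))
    conn.execute("INSERT INTO relationship_type VALUES (1, 'published by')")
    conn.execute("INSERT INTO relationship VALUES (1, 1)")
    conn.executemany("INSERT INTO relationship_set VALUES (?)", [(1,), (2,)])
    # The same relationship sits in the edition's set (1) and the publisher's set (2).
    conn.executemany("INSERT INTO relationship_set__relationship VALUES (?, ?)",
                     [(1, 1), (2, 1)])
    conn.execute("INSERT INTO edition_data VALUES (1, 1)")
    conn.execute("INSERT INTO edition_revision VALUES (1, 'edition-1', 1)")
    conn.execute("INSERT INTO edition_header VALUES ('edition-1', 1)")
    conn.execute("INSERT INTO publisher_data VALUES (1, 2)")
    conn.execute("INSERT INTO publisher_revision VALUES (1, 'publisher-1', 1)")
    conn.execute("INSERT INTO publisher_header VALUES ('publisher-1', 1)")
    return conn


def publisher_rels_for_edition(conn, edition_bbid):
    return conn.execute(EDITION_PUBLISHER_RELS, (edition_bbid,)).fetchall()
```

Dropping the final join to `publisher_header` and reading `pr.bbid` instead would save a step, but it would also match non-master publisher revisions, which is the concern raised in the last message.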