-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathdesign_notes.txt
163 lines (123 loc) · 10.4 KB
/
design_notes.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
Build process is a function: build = {src,target} : bin.
src has repo.
bin has repo.
bin refers back to src. Either atomically or via post-glue.
src can't atomically {refer to, contain} bin as that would be a merkle DAG collision. Only via post-glue.
Post-glue := Non-intrusive state glued on externally afterwards:
1) src can have a git notes branch of where OK-binaries exists (in {git bin repo, NFS, artifactory, etc})
2) src can have a git tag on each commit, of sha in binary repo holding the OK-binaries.
MPED: Prefer 1 ie git-notes as they are future-proof in {scope, content}.
Post-glue can always be added/changed/redesigned, but the atomic design of bin repo needs upfront consideration:
bin repo can atomically refer back to src repo SHA in some ways:
1) commit tree: bin can have src/ as submodule (shallow update=none) which points to src SHA. Commit on tag or branch.
2) commit msg: bin can put src SHA in commit message. Commit on tag or branch.
3) raw {tree,hash}-object held alive by a tag.
The non-atomic post-glue would be:
4) bin can put src SHA in tag-name on the commit.
5) bin can put src SHA as [annotated tag]-name on the commit.
6) bin can put src SHA as git-note on the commit.
MPED: Prefer 1 ie submodule as that is more native and can easily be checked out - and clicked on in UIs. Policy lives in .gitmodules, not our own stuff.
Considerations for bin repo design:
a) Garbage collection of unneeded bin-artifacts.
GC requires we can detach references from the commit. This is easy if there are no refs from parent-commits, i.e no parents.
No parents means the commit will be a new root, only containing itself on this commit history DAG.
We can choose to use either branches or tags for this. Only branches can be directly checked out. See tag wisdom below.
b) O(1) lookup time: Resolve src-SHA to bin-artifact fast.
We can have a ref (branch or tag) refer to the bin-artifact by the src-SHA via convention:
E.g.: branch auto/SomePrefix/${SRC_REPO}/${SRC_SHA}/${target_sanitized}
for example: auto/SomePrefix/firmware/abbabeef/fwbase-hi2
- which is a branch that holds 1 commit: Where message contains the {build agent+environment+src-branch name}, and treeish holds the {artifacts+submodule+build-command/script}.
The commit message can also hold the target-command line, but the build target command may also be useful to re-run (for audit,validation).
This branch name convention should (modulo environmental changes), make idempotency possible:
We can quickly check if src-SHA exists and what targets from it has been built, and build what may be missing.
c) Name space collision/pollution.
This branch having 1 commit is also known as an orphan branch.
This branch having 1 commit can be deleted. This will only delete the ref, not the artifact blobs. Blobs can be GC'd.
Let's say all such 1-commit branches are ephemeral by default meaning deleted after, say, 30 days.
During these 30 days, we as Manufacturer can protect the orphan branches' blobs or even the whole orphan branch:
Only blobs: Add a merge commit on a "Release" branch - this adds a ref on the blobs. But we lose the branch ref name.
Whole branch: Add another branch with some other Prefix.
Extend the branch by one more commit, which could be empty. The important thing would be the commit message:
- Explanation: Is this a release or a customer who wants to keep it around.
- Reset the TTL in the first commit.
Consolidating this into a tangible example:
src repo: firmware on gerrit or gitlab
bin repo: firmware.bin on gitlab
Assume src repo has SHA ABBABEEF42.
Some CI agent runs 'forge fwbase-hi2' on firmware@ABBABEEF42 successfully, producing obj/fwbase-hi2/ directory.
We commit obj/fwbase-hi2/ on firmware.bin as new orphan branch 'auto/src2bin/firmware/ABBABEEF42/fwbase-hi2'.
We commit obj/fwbase-hi2/ as obj/fwbase-hi2/ using 1:1 folder mapping in this case[1].
The commit has a message headline of "auto: firmware@ABBABEEF42: Build of fwbase-hi2"
The commit has a message body of "TTL: 30 days".
This commit is pushed to firmware.bin repo.
Each day, a CI job looks at all auto/src2bin/*-branches' HEAD commit:
If current time now >= commit.time + commit.message.body.TTL then we delete the branch.
If commit.message.body does not have a TTL field, we treat it as Infinite.
TTL lives with the commit and could be changed without any new deployment.
gen_7.0 could get a higher TTL than feature/foo branch.
This strategy will let anyone protect a branch by adding a commit without the TTL-field.
Or let anyone add-back or change the TTL of a branch.
This would also work with amend and force push (but discouraged).
Branches can be cloned directly, tags cannot.
Commits are better for containing data than short tags. We can add other traceability meta-data in later. E.g. source-repo branch having ABBABEEF42.
[1] other repos may want to have this become the new root, but 1:1 means multiple targets from different orphan branches can be merged without conflicts.
Tag wisdom:
- Lightweight tags are just mutable pointers. Therefore races can occur. No message or date. Best-practise is to consider them Private.
- Annotated tags are immutable + have commit message. Good for release notes. Best-practise is to consider them Public.
- All tags live in the same global flat namespace.
- Can't clone directly from tag. Only branches can be cloned directly.
-------------
Implementing download.
We sit in SRC_REPO on a SRC_SHA commit.
To discover available artifacts, we can:
a) Run from within SRC_REPO:
$ git_recycle_bin --list blah --bin-remote $BIN_REMOTE --src-sha $SRC_SHA # ask server-side bin repo
OK, but this requires us to know the URI of the bin remote.
It is likely that different kinds of artifacts may be pushed to different remotes. E.g. HTML vs ELF.
I.e. this is not discoverable.
b) Look for avilable git notes.
Run from within SRC_REPO:
$ git fetch origin "refs/notes/*:refs/notes/*"
A note binds to the SRC_SHA.
Many "channels" of notes can exist - because notes have their own branch, DAG and history.
One such "channel" branch can inform about artifacts.
The artifact channel would live under a refspec such as:
refs/notes/artifact/<TARGET> # OK. Simple. Filterable by target name.
refs/notes/artifact/<BIN_REMOTE>/<TARGET> # OK. Segmented, filterable. Perhaps a BIN_REMOTE holds HTML, others ELF.
refs/notes/artifact/<BIN_REMOTE> # HMM. Spammy? As different artifacts would race to enter the git note and thus need sort_uniq rebase.
refs/notes/artifact/<BIN_REMOTE>/<TARGET>/<DATE> # BAD. Pollution: This would give too many refs over time as everything will be unique.
refs/notes/artifact/<BUILD_HOST>/<TARGET> # BAD. Why only fetch from one build host? Hosts come and go, hostname still doesn't capture buildenv.
It makes sense to duplicate information in note refspec and the notes' contents - this allows change of either in the future.
With notes fetched, a note binds to SRC_SHA and contains text.
The text will be sorted lines of Human JSON, such as:
{date:2023-07-27/13.14+0200, sha:abcdef1234567890, target:fwbase-hi2, ttl:30d, remote:firmware.bin}
{date:2023-07-27/13.14+0200, sha:0987654321fedcba, target:user-manual, ttl:90d, remote:docs.html}
In more general, the format:
{date:<BIN_SHA_COMMIT_DATE>, sha:<BIN_SHA>, target:<TARGET>, ttl:<TTL>, remote:<BIN_REMOTE>}
Because humans are also expected to look at these notes, the fields are ordered.
Fields are ordered by a pattern of:
[Sortable fixed-width fundamental-info is more eye-pleasing] [Variable payload info] [Repetitive low information density]
Now that we can list server-side artifacts, we can download and place one.
What if there is already a {older,newer} artifact?
If there is no artifact already,
We are in SRC repo on HEAD SHA.
We can build on HEAD ABBA, then checkout some completely other commit BABE - now our local binary is not related to BABE-source.
Build-system will realize that binary (artifact from ABBA-source) is not derived from BABE, and rebuild.
server-side-has-for-HEAD:yes workspace-binary:none Just download the artifact, skip build
server-side-has-for-HEAD:yes workspace-binary:current Just download the artifact, skip build. Can we be idempotent and avoid the download?
server-side-has-for-HEAD:yes workspace-binary:older Just download the artifact, skip build. Must align to HEAD
server-side-has-for-HEAD:yes workspace-binary:newer Just download the artifact, skip build. Must align to HEAD
server-side-has-for-HEAD:no workspace-binary:none Just build. Will be a full first build.
server-side-has-for-HEAD:no workspace-binary:current Just build. May be sparse rebuild.
server-side-has-for-HEAD:no workspace-binary:older Just build. May be sparse rebuild.
server-side-has-for-HEAD:no workspace-binary:newer Just build. May be sparse rebuild.
* If src is dirty, give up. We can't relate dirty-HEAD with server-side binary artifact. Just build.
* If src is clean, try download the server-side binary for the current HEAD.
- available: Fetch+overwrite always (TODO: Can we make this idempotent?). Skip build.
- unavailable: Just build.
(Post-build invocation of git-recycle-bin to publish the built artifact to make it generally available - though only CI systems should publish)
(Do we need state for idempotency? Treeish in rbgit, or placing a metadata file next to.)
How do we download and populate?
$ rbgit reset BIN_SHA --hard
- This will mutate only the files that are in treeish of the BIN_SHA orhan commit, leaving other files unchanged.
- This can be done in sequence for each artifact in turn.