-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disk usage concerns #9
Comments
iirc, Mac and Linux file systems throw a file save event. You would write code to look for that event and filter to the git folder watched. |
That's more in line with efficiency concerns of #5 This issue is about removing branches (and objects) that aren't needed. I'm thinking about heuristics like
|
I have not done too much digging into the code to see how the state is managed but is disk space such a high commodity that a cleanup event must be triggered every N minutes? My suggestion would be to have the cleanup even run at the program start (possibly delayed) and expose a cleanup command to the end-user to run on-demand (i.e. Anyway, this is a neat issue to have and I commend you for wanting to save end-user disks. I look forward to seeing how this issue gets resolved. |
Perhaps there could be a system similar to log rotation, 'branch rotation'. Dura could create Not sure if that's stepping into the territory of being too complex? |
Ring Buffer approachI'll call @JakeStanger's idea the "ring buffer" approach. There's a lot of variations in it, but it amounts to
(fwiw using a date instead of an index solves some of your problems) The main problem with this approach is deciding which backups are safe to lose. Meanwhile, people are choking on branches. There's probably a discrepancy between "how much data they're okay losing" and "how many branches they want to see". i.e. Many users would prefer to see only the most recent So how do we
B-Tree approachDisclaimer: this idea is terrible and I love it Each commit can have multiple parents. I'm not sure what the limit is, but that's your base. Create commits that refer to other commits such that you build up a B-Tree with a single commit on top. That's your dura branch (#31). You now only have one branch. The leaf nodes of this B-tree are the commits that are currently Adding commitsB-trees are fast to append to. You always add to the right-most (newest) side of the tree. When the tree fills up, you make a new parent. New regular commits would create Removing commitsWhen the history gets truly old and crusty (2 years seems adequate), removing from the left side is just as fast as inserting on the right. ElephantThe elephant in the room is that this could explode the width of the git log in I wonder if there's a way to make this not matter. Maybe you can manipulate the timestamps so they appear later (I don't think this will work). Maybe we can solve this through hybrid mode. Hybrid modeMixing the approaches, we can use the B-tree for cold storage, and the ring buffer for more relevant commits (hot storage?). One idea is to have maintain all |
Following up on my last message, an octopus merge is a merge commit with more than 2 parents. There's apparently no hard limit, except that Github history viewer won't handle 100k and a 66-way merge in the Linux kernel seemed to break viewers. You could do either the ring buffer approach or the B-tree approach. They both seem to do a let better as you increase parents. I think I'll take a stab at this. I'll make it configurable, so that you can effectively toggle the behavior on/off, reduce/enhance the effect, etc. I think I need to see it in action |
I'd agree it's clear the ring-buffer idea isn't sufficient by itself. I was trying to come up with answers to the ring-buffer questions and basically re-invented your idea of hot/cold storage, so the hybrid mode sounds good to me. I must admit the B-Tree approach is mostly going over my head. It might be a good idea to visualize it somehow, if not for this then for end-user documentation. |
Alirght, here's my best shot at whiteboarding it. B-Tree
There's no limit to the number of parents, so you could in theory have 1 red node with all blue branches parenting into it, like a spider with 200 legs (9 in this case). But there really are limits (they just aren't stated), so we have to put a limit on it, so you need an octopus or B-tree. Octopus CollectiveAnother variation is to only have a 1st level of the B-tree. This would vastly reduce the number of branches, but it becomes harder to ignore the octopus commits. With the B-tree you can do I'm starting to see the value in the Octopus Collective. I had initially thought it would be hard to exclude patterns of branches until I wrote this. |
Thanks for taking the time to draw those out, that does help a lot. Sorry it's taken me a while to get back to it. So let's say someone's been working on a huge refactor and not made any 'real' commits for several days. Assuming the hybrid ring buffer/octopus model was implemented, and you want to restore a commit from a few days ago ago that's no longer in the ring buffer:
|
A lobste.rs user is concerned about the disk usage. I've wondered the same thing myself.
Does anyone has ideas about how to get real data about how big of a problem this is?
I've wanted to build in some sort of GC, but it seems a little tough to get right. I could see doing it time-based or branch based (delete
dura
branches when their tracking branch is deleted). I'm in favor of branch-based, but there's still some corner cases (like, what if thedura
branch is created againstbranch-1
but then you deletebranch-1
and createbranch-2
at the same commit?)The text was updated successfully, but these errors were encountered: