dotwaffle 2 days ago [-]
I have done extensive research on content-defined chunking (CDC), and it almost never works out, because most utilities don't create compressed archives in an "rsyncable" (rsync does CDC) format. I actually saved a lot of storage with restic when I switched my backups of certain things so that files were stored in archives uncompressed and sorted in a stable order. I know syncthing eventually removed CDC and just went with fixed-size blocks.
Bazel, on the other hand, is completely in control of this, and it makes perfect sense to do this at that point -- and it seems to be a relatively efficient implementation too, really nice to see!
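A minimal sketch of the "uncompressed, stable order" idea above, assuming Python's standard `tarfile` module. `build_stable_tar` is a hypothetical helper made up for illustration; it is not how restic or Bazel do it. It sorts member names and pins the metadata so the same inputs always give the same bytes, which is what lets a chunk-based deduplicator find repeats:

```python
import io
import tarfile
from typing import Mapping


def build_stable_tar(files: Mapping[str, bytes]) -> bytes:
    """Pack files into an uncompressed tar with sorted names and fixed metadata."""
    buf = io.BytesIO()
    # mode "w" writes a plain tar with no compression layer on top
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name in sorted(files):  # stable member order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            # Pin everything that would otherwise vary between runs or machines.
            info.mtime = 0
            info.uid = info.gid = 0
            info.uname = info.gname = ""
            info.mode = 0o644
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


a = build_stable_tar({"b.txt": b"two", "a.txt": b"one"})
b = build_stable_tar({"a.txt": b"one", "b.txt": b"two"})
print(a == b)
```

Insertion order of the input doesn't matter here, so the two archives come out byte-identical. Compressing the result afterwards would generally undo the benefit unless the compressor is itself chunk-friendly.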
a_t48 2 days ago [-]
This is something I'm very interested in implementing for Docker builds. I've tested CDC on the final image outputs: it produces smaller outputs, but requires tuning the trade-off between space saved and request count when pulling. For the build cache it might be even more advantageous.
stabbles 2 days ago [-]
Isn't that rather difficult given the `.tar.gz` layers?
a_t48 1 days ago [-]
I have a custom pull client/registry/builder that uses a different format, but can output standard OCI if needed.
tracnar 2 days ago [-]
It also supports .tar but that's probably not very commonly used.
auscompgeek 2 days ago [-]
In theory eStargz layers should be amenable to CDC.
a_t48 1 days ago [-]
It feels that way, but eStargz is still only addressable as a single layer, or range of one.
londons_explore 2 days ago [-]
Doesn't this mean that malicious inputs can deliberately cause super tiny or super huge chunks?
rienbdj 2 days ago [-]
Bazel caches tend to have a size limit.
You need to trust your build execution machine anyway. They have your source code and you will be executing the artifacts that they produce!
ramchip 2 days ago [-]
The same is true without CDC, and you can configure a maximum size.
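To illustrate the point about bounding chunk sizes, here is a rough sketch of a gear-style content-defined chunker with a hard minimum and maximum. It is not Bazel's implementation; `cdc_chunks`, the `GEAR` table, and the default sizes are all assumptions chosen for the example. The rolling hash picks boundaries, but no chunk before the last can be shorter than `min_size` and none can be longer than `max_size`, whatever the input looks like:

```python
import hashlib

# 256 pseudo-random 32-bit values, derived deterministically from SHA-256.
# (Illustrative table; real implementations ship their own constants.)
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:4], "big")
        for i in range(256)]


def cdc_chunks(data: bytes, min_size: int = 64, avg_size: int = 256,
               max_size: int = 1024) -> list:
    """Split data at content-defined boundaries, clamped to [min_size, max_size].

    avg_size is assumed to be a power of two; it sets how many hash bits
    must be zero for a boundary, so it is only a rough target.
    """
    bits = avg_size.bit_length() - 1
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Gear rolling hash: older bytes shift out of the 32-bit state.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if length < min_size:
            continue  # never cut before the minimum
        # Cut when the top `bits` bits are zero, or when the cap is reached.
        if (h >> (32 - bits)) == 0 or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk may be shorter than min_size
    return chunks


# Even a degenerate input cannot produce a chunk larger than max_size.
sizes = [len(c) for c in cdc_chunks(b"\x00" * 5000)]
print(max(sizes) <= 1024)
```

With the clamp in place, a crafted input can at worst push every chunk to one of the two bounds, which is the same worst case as fixed-size chunking at that size.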