Is there a way to hard-link all duplicate objects across folders containing multiple Git repositories?
Background:
I host a Git server on my company's server (a Linux machine). The idea is to have one main canonical repository that no user can push to; instead, each user forks the canonical repository (cloning it into the user's home directory, which initially creates hard links to the canonical objects).
/canonical/Repo
/Dev1/Repo   (objects hard-linked to /canonical/Repo when initially cloned)
/Dev2/Repo   (objects hard-linked to /canonical/Repo when initially cloned)
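The hard-linking on the initial clone comes from Git's local-clone optimization: when the source and destination are on the same filesystem, `git clone` hard-links the files under `.git/objects` instead of copying them. A minimal sketch to see this in action (repository names are illustrative; `stat -c %h` prints the hard-link count on Linux):

```shell
# create a small "canonical" repository with one commit
git init -q canonical
( cd canonical && echo data > file && git add file \
  && git -c user.name=dev -c user.email=dev@example.com commit -qm init )

# a plain local clone hard-links the object files by default
git clone -q canonical Dev1

# any loose object in canonical now has link count 2 (shared with Dev1)
obj=$(find canonical/.git/objects -type f | head -n 1)
stat -c %h "$obj"   # → 2
```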
Everything works fine. The problem arises when:

Dev1 pushes a huge commit to his fork on the server (/Dev1/Repo).
Dev2 fetches it to his local machine, makes his own changes, and pushes them to his own fork on the server (/Dev2/Repo).

(Now the same 'huge' files live in both developers' forks on the server. They are not hard-linked automatically.)
This is eating up my server space like crazy!
How can I hard-link the objects duplicated between the two forks (or against the canonical repository, for that matter), so that server space is saved while each developer still gets all the data when cloning from his or her fork to a local machine?
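One note on the mechanics of such deduplication: two loose objects at the same path under `.git/objects` are byte-identical by construction, because the path is derived from the content's SHA-1. So duplicated loose objects can in principle be re-linked by hand. A rough sketch (the server paths in the comment are hypothetical; this only covers loose objects, not packfiles, and should run while the repositories are idle):

```shell
#!/bin/bash
# Hard-link every loose object in the fork ($2) that also exists in
# the canonical repository ($1). Same object path implies the same
# SHA-1, and therefore byte-identical content.
relink_loose_objects() {
  local canonical=$1 fork=$2
  # loose objects live in two-character fan-out directories: objects/ab/cdef...
  find "$fork" -type f -path '*/??/*' | while read -r obj; do
    rel=${obj#"$fork"/}
    src=$canonical/$rel
    if [ -f "$src" ] && ! [ "$obj" -ef "$src" ]; then
      ln -f "$src" "$obj"   # replace the duplicate with a hard link
    fi
  done
}

# e.g. (hypothetical server paths):
# relink_loose_objects /canonical/Repo/.git/objects /Dev2/Repo/.git/objects
```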
Best Answer
Now the same 'huge' file resides in both the developer's forks on the server. It does not create a hard-link automatically
Actually, with Git 2.20, this issue may go away thanks to delta islands, a new way of doing delta computation that ensures an object existing in one fork is never stored as a delta against another object that is not present in the same fork's repository.
See commit fe0ac2f, commit 108f530, commit f64ba53 (16 Aug 2018) by Christian Couder (chriscool).
Helped-by: Jeff King (peff) and Duy Nguyen (pclouds).
See commit 9eb0986, commit 16d75fa, commit 28b8a73, commit c8d521f (16 Aug 2018) by Jeff King (peff).
Helped-by: Jeff King (peff) and Duy Nguyen (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit f3504ea, 17 Sep 2018.)
Add delta-islands.{c,h}

Hosting providers that allow users to "fork" existing repositories want those forks to share as much disk space as possible.
Alternates are an existing solution to keep all the objects from all the forks in a unique central repository, but this can have some drawbacks. Especially when packing the central repository, deltas will be created between objects from different forks.
This can make cloning or fetching a fork much slower and much more CPU intensive, as Git might have to compute new deltas for many objects to avoid sending objects from a different fork.
Because the inefficiency primarily arises when an object is deltified against another object that does not exist in the same fork, we partition objects into sets that appear in the same fork, and define "delta islands". When finding a delta base, we do not allow an object outside the same island to be considered as its base.
So "delta islands" is a way to store objects from different forks in the same repository and packfile without having deltas between objects from different forks.
This patch implements the delta islands mechanism in "delta-islands.{c,h}", but does not yet make use of it. A few new fields are added in 'struct object_entry' in "pack-objects.h", though.
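The alternates mechanism that the commit message refers to lets one repository borrow objects from another object store instead of duplicating them: the fork simply lists the canonical object directory in `.git/objects/info/alternates`. A minimal sketch (repository names are illustrative):

```shell
# a canonical repository with one commit
git init -q central
( cd central && echo hi > f && git add f \
  && git -c user.name=t -c user.email=t@example.com commit -qm c1 )

# an empty fork that borrows central's objects instead of copying them
git init -q fork
echo "$PWD/central/.git/objects" > fork/.git/objects/info/alternates

# the fork can now resolve objects it does not store itself
oid=$(git -C central rev-parse HEAD)
git -C fork cat-file -t "$oid"   # → commit
```

The trade-off described above applies: a shared store saves disk, but a naive repack of it may produce deltas that cross fork boundaries.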
See Documentation/git-pack-objects.txt: Delta Islands:
DELTA ISLANDS

When possible, pack-objects tries to reuse existing on-disk deltas to avoid having to search for new ones on the fly. This is an important optimization for serving fetches, because it means the server can avoid inflating most objects at all and just send the bytes directly from disk.
This optimization can't work when an object is stored as a delta against a base which the receiver does not have (and which we are not already sending). In that case the server "breaks" the delta and has to find a new one, which has a high CPU cost. Therefore it's important for performance that the set of objects in on-disk delta relationships match what a client would fetch.
In a normal repository, this tends to work automatically. The objects are mostly reachable from the branches and tags, and that's what clients fetch. Any deltas we find on the server are likely to be between objects the client has or will have.
But in some repository setups, you may have several related but separate groups of ref tips, with clients tending to fetch those groups independently. For example, imagine that you are hosting several "forks" of a repository in a single shared object store, and letting clients view them as separate repositories through GIT_NAMESPACE or separate repositories using the alternates mechanism. A naive repack may find that the optimal delta for an object is against a base that is only found in another fork. But when a client fetches, they will not have the base object, and we'll have to find a new delta on the fly.
A similar situation may exist if you have many refs outside of refs/heads/ and refs/tags/ that point to related objects (e.g., refs/pull or refs/changes used by some hosting providers). By default, clients fetch only heads and tags, and deltas against objects found only in those other groups cannot be sent as-is.
Delta islands solve this problem by allowing you to group your refs into distinct "islands". Pack-objects computes which objects are reachable from which islands, and refuses to make a delta from an object A against a base which is not present in all of A's islands. This results in slightly larger packs (because we miss some delta opportunities), but guarantees that a fetch of one island will not have to recompute deltas on the fly due to crossing island boundaries.
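To actually use this on a server (Git 2.20+), refs are grouped with `pack.island` regexes: a capture group in the regex names the island, and `git repack -i` (`--delta-islands`) applies them during repacking. A hedged sketch, assuming the forks' refs have been mapped into per-fork namespaces such as refs/virtual/<fork>/… (a layout the hosting side must set up itself; Git does not create it for you):

```shell
# inside the shared/canonical repository on the server
git init -q --bare shared.git

# each capture-group value (one per fork) becomes its own island
git -C shared.git config --add pack.island 'refs/virtual/([^/]+)/heads'
git -C shared.git config --add pack.island 'refs/virtual/([^/]+)/tags'

# repack with delta islands enabled: no delta will cross fork boundaries
git -C shared.git repack -a -d -i
```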
But there was a side effect: some commands became more verbose. Git 2.23 (Q3 2019) fixes that.
See commit bdbdf42 (20 Jun 2019) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit a4c8352, 9 Jul 2019.)
delta-islands: respect progress flag

The delta island code always prints "Marked %d islands", even if progress has been suppressed with --no-progress or by sending stderr to a non-tty.
Let's pass a progress boolean to load_delta_islands(). We already do the same thing for the progress meter in resolve_tree_islands().
Regarding "linux - de-duplicating Git forks on a server", a similar question can be found on Stack Overflow: https://stackoverflow.com/questions/25843553/