fscache: add not-found directory cache to fscache #994

Merged 2 commits into git-for-windows:master on Jan 25, 2017

jeffhostetler

@jeffhostetler jeffhostetler commented Dec 13, 2016

Teach FSCACHE to remember "not found" directories.

This is a performance optimization.

FSCACHE is a performance optimization available for Windows. It
intercepts POSIX-style lstat() calls and answers them from an
in-memory directory listing built with FindFirst/FindNext. It
catches the first lstat() call in a directory, uses
FindFirst/FindNext to read the list of files (and attribute data)
for the entire directory into the cache, and short-circuits
subsequent lstat() calls in the same directory. This gives a major
performance boost on Windows.

However, it does not remember "not found" directories. When STATUS
runs and there are missing directories, the lstat() interception
fails to find the parent directory and simply returns ENOENT for the
file -- it does not remember that the FindFirst on the directory
failed. Thus each subsequent lstat() call in the same directory
re-attempts the FindFirst. This completely defeats any performance
gains.

This can be seen by doing a sparse-checkout on a large repo and
then doing a read-tree to reset the skip-worktree bits and then
running status.

This change reduced status times for my very large repo by 60%.
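
The idea described above can be illustrated with a minimal, portable sketch of a negative directory cache. It is not the code from this pull request: the names `nfd_node`, `nfd_contains`, `nfd_remember`, `scan_directory` and `cached_lstat` are hypothetical, the hash table is a simple chained table rather than Git's hashmap, and the expensive FindFirst call is replaced by a stub that always fails and counts its invocations.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch; none of these names come from fscache.c. */
#define NFD_BUCKETS 64

struct nfd_node {
	struct nfd_node *next;
	char dirname[]; /* NUL-terminated directory name */
};

static struct nfd_node *nfd_table[NFD_BUCKETS];
static unsigned long scan_attempts; /* counts simulated FindFirst calls */

static unsigned int nfd_hash(const char *s)
{
	unsigned int h = 5381;

	while (*s)
		h = h * 33 + (unsigned char)*s++;
	return h % NFD_BUCKETS;
}

/* Returns 1 if `dir` was previously recorded as not found. */
static int nfd_contains(const char *dir)
{
	struct nfd_node *n;

	for (n = nfd_table[nfd_hash(dir)]; n; n = n->next)
		if (!strcmp(n->dirname, dir))
			return 1;
	return 0;
}

/* Records `dir` as not found; silently skips caching if malloc fails. */
static void nfd_remember(const char *dir)
{
	size_t len = strlen(dir);
	unsigned int h = nfd_hash(dir);
	struct nfd_node *n = malloc(sizeof(*n) + len + 1);

	if (!n)
		return;
	memcpy(n->dirname, dir, len + 1);
	n->next = nfd_table[h];
	nfd_table[h] = n;
}

/* Stand-in for the expensive directory listing; always fails here. */
static int scan_directory(const char *dir)
{
	(void)dir;
	scan_attempts++;
	return -1;
}

/* Returns 0 if the file could be looked up, -1 (think ENOENT) otherwise. */
static int cached_lstat(const char *dir, const char *file)
{
	(void)file;
	if (nfd_contains(dir))
		return -1; /* known-missing directory: skip the scan */
	if (scan_directory(dir) < 0) {
		nfd_remember(dir);
		return -1;
	}
	return 0;
}
```

With this sketch, three `cached_lstat()` calls for different files under the same missing directory trigger only one simulated scan; without `nfd_remember()` each call would scan again.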

@jeffhostetler jeffhostetler changed the title from "WIP add not-found-directory cache to fscache" to "Add not-found directory cache to fscache" Dec 15, 2016
@jeffhostetler jeffhostetler changed the title from "Add not-found directory cache to fscache" to "fscache: add not-found directory cache to fscache" Dec 15, 2016
@whoisj

whoisj commented Dec 19, 2016

Ooh, this should be a pretty decent performance improvement when using sparse. 😁

@@ -6,8 +6,83 @@
static int initialized;
static volatile long enabled;
static struct hashmap map;
static struct hashmap map_nfd; /* not found directories */

static struct nfd_entry *nfd_alloc(const char *name, size_t namelen, unsigned int hash)
{
	struct nfd_entry *nfd = xcalloc(1, sizeof(struct nfd_entry) + namelen + 1);

@jeffhostetler
Author

@dscho I pulled your fixup and added one more.

@dscho
Member

dscho commented Jan 13, 2017

@jeffhostetler okay, good. Please note that the fixup! <original-oneline> format is picked up by git rebase -i --autosquash, so your latest commit will not be handled automatically...

Other than that, I think the only thing we may want to consider is to try our hand at a test that verifies somehow that non-existing directories are not accessed more than once. We could introduce a new GIT_TRACE_FSCACHE category to that end, for example, and validate its output.

What do you think?
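
A rough sketch of what an opt-in trace for failed directory lookups might look like, so that a test could count the emitted lines. This is not Git's trace API: `trace_wanted`, `trace_dir_not_found`, the `trace_lines` counter and the message text are all hypothetical, and the rule that an unset, empty, "0" or "false" value disables tracing is a simplifying assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: treats unset, "", "0" and "false" as disabled. */
static int trace_wanted(const char *value)
{
	return value && *value && strcmp(value, "0") && strcmp(value, "false");
}

static unsigned long trace_lines; /* number of trace lines written */

/*
 * Writes one line per failed directory lookup when tracing is enabled.
 * `env_value` would typically be getenv("GIT_TRACE_FSCACHE").
 */
static void trace_dir_not_found(FILE *out, const char *env_value,
				const char *dir)
{
	if (!trace_wanted(env_value))
		return;
	fprintf(out, "fscache: directory not found: %s\n", dir);
	trace_lines++;
}
```

A caller could invoke `trace_dir_not_found(stderr, getenv("GIT_TRACE_FSCACHE"), dir)` (with `<stdlib.h>` included) at the point where the directory scan fails; a test would then assert that each missing directory appears at most once in the output.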

@whoisj

whoisj commented Jan 13, 2017

> We could introduce a new GIT_TRACE_FSCACHE category to that end, for example, and validate its output.

Brilliant. The more optional tracing we enable, the easier debugging will be in the future.

@jeffhostetler
Author

@dscho Yeah. I'll re-title the commit and look at adding the tracing. I'll try to fix this up next week. Thanks!

@dscho
Member

dscho commented Jan 16, 2017

> The more optional tracing we enable, the easier debugging will be in the future.

Well, we should not overdo it, in particular in performance-critical code such as FSCache... 😄

@jeffhostetler
Author

In this version, I added a simple GIT_TRACE_FSCACHE key to log each time FindFirst fails.
On my very large test repository (3.1M files with a 35K sparse checkout), the number of calls
to FindFirst went from 9.1M to 450K. The size of the trace log went from 1.1GB to 55MB.
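
To put the figures above in relative terms, here is a small arithmetic sketch; the helper name `percent_reduction` is hypothetical and the inputs are the rounded numbers quoted in the comment.

```c
/* Percentage by which `after` is smaller than `before`. */
static double percent_reduction(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

Calling `percent_reduction(9.1e6, 450e3)` gives roughly 95%, so about 19 out of every 20 FindFirst calls were avoided. The trace log shrinks by about the same proportion (1.1GB to 55MB, assuming decimal units), which is consistent with one log line per failed FindFirst.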

@dscho dscho merged commit c610cc4 into git-for-windows:master Jan 25, 2017
@dscho
Member

dscho commented Jan 25, 2017

Excellent! I added a test that I merged, too (10b99b6).

@dscho dscho added this to the v2.11.1 milestone Jan 25, 2017
dscho added a commit to git-for-windows/build-extra that referenced this pull request Jan 25, 2017
Performance [was enhanced when using fscache in a massively sparse
checkout](git-for-windows/git#994).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
@jeffhostetler jeffhostetler deleted the jeffhostetler/fscache_nfd branch January 25, 2017 20:00
dscho added a commit to dscho/git that referenced this pull request Feb 1, 2017
…er/fscache_nfd

fscache: add not-found directory cache to fscache
dscho added a commit that referenced this pull request Feb 2, 2017
fscache: add not-found directory cache to fscache

3 participants