Claude Code ships two distinct search primitives that share the same underlying engine. Understanding the split helps you use (and reason about) them correctly.
| Tool | User-facing name | What it searches | Returns | Hard limit |
|---|---|---|---|---|
| Glob | Search | File names (glob pattern) | Array of file paths sorted by mtime | 100 files (configurable via `globLimits`) |
| Grep | Search | File contents (regex) | Paths, lines, or counts depending on mode | 250 lines/files (default `head_limit`) |
GlobTool even reuses GrepTool's renderToolResultMessage directly (see GlobTool/UI.tsx line 53).
Despite being named "Glob," this tool does not use Node's fs.glob, fast-glob, or any JavaScript glob library. It delegates entirely to ripgrep via utils/glob.ts.
```ts
export async function glob(
  filePattern: string,
  cwd: string,
  { limit, offset }: { limit: number; offset: number },
  abortSignal: AbortSignal,
  toolPermissionContext: ToolPermissionContext,
): Promise<{ files: string[]; truncated: boolean }> {
  // …absolute-path extraction, ignore-pattern assembly…
  const args = [
    '--files', // list files instead of searching content
    '--glob', searchPattern,
    '--sort=modified',
    ...(noIgnore ? ['--no-ignore'] : []),
    ...(hidden ? ['--hidden'] : []),
  ]
  const allPaths = await ripGrep(args, searchDir, abortSignal)
  const truncated = allPaths.length > offset + limit
  const files = allPaths.slice(offset, offset + limit)
  return { files, truncated }
}
```
The key insight: ripgrep with --files --glob <pattern> is a high-performance glob traversal. Ripgrep's multi-threaded directory walker is significantly faster than Node's fs.readdir-based glob implementations on large codebases. The same binary is reused for both name-search (Glob) and content-search (Grep).
ripgrep's --glob flag only accepts relative patterns. When the model passes an absolute path pattern like /src/utils/*.ts, extractGlobBaseDirectory() splits it into a baseDir (/src/utils) and a relativePattern (*.ts). The search then runs with searchDir = /src/utils.
```ts
export function extractGlobBaseDirectory(pattern: string): {
  baseDir: string
  relativePattern: string
} {
  // Find first glob special char: * ? [ {
  const match = pattern.match(/[*?[{]/)
  if (!match || match.index === undefined) {
    // Literal path — use dirname / basename split
    return { baseDir: dirname(pattern), relativePattern: basename(pattern) }
  }
  const staticPrefix = pattern.slice(0, match.index)
  const lastSep = Math.max(
    staticPrefix.lastIndexOf('/'),
    staticPrefix.lastIndexOf(sep),
  )
  if (lastSep === -1) return { baseDir: '', relativePattern: pattern }
  return {
    baseDir: staticPrefix.slice(0, lastSep),
    relativePattern: pattern.slice(lastSep + 1),
  }
}
```
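A quick way to see the split in action is to run the function's logic standalone. The sketch below is a trimmed, self-contained copy of the logic shown above, exercised on the /src/utils/*.ts case from the text:

```ts
import { dirname, basename, sep } from 'node:path'

// Trimmed, runnable copy of the extractGlobBaseDirectory logic above.
function extractGlobBaseDirectory(pattern: string): {
  baseDir: string
  relativePattern: string
} {
  const match = pattern.match(/[*?[{]/)
  if (!match || match.index === undefined) {
    // Literal path — plain dirname/basename split
    return { baseDir: dirname(pattern), relativePattern: basename(pattern) }
  }
  const staticPrefix = pattern.slice(0, match.index)
  const lastSep = Math.max(
    staticPrefix.lastIndexOf('/'),
    staticPrefix.lastIndexOf(sep),
  )
  if (lastSep === -1) return { baseDir: '', relativePattern: pattern }
  return {
    baseDir: staticPrefix.slice(0, lastSep),
    relativePattern: pattern.slice(lastSep + 1),
  }
}

// Absolute pattern: the static prefix becomes the search directory.
console.log(extractGlobBaseDirectory('/src/utils/*.ts'))
// → { baseDir: '/src/utils', relativePattern: '*.ts' }

// Literal path (no glob chars): dirname/basename split.
console.log(extractGlobBaseDirectory('/src/index.ts'))
// → { baseDir: '/src', relativePattern: 'index.ts' }
```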
Two environment variables tune the traversal:

- `CLAUDE_CODE_GLOB_NO_IGNORE=false` — respect .gitignore (default: ignore it and include everything)
- `CLAUDE_CODE_GLOB_HIDDEN=false` — exclude hidden files (default: include them)

Why ignore .gitignore by default? Claude needs to see build artifacts, generated files, and node_modules structure to answer questions like "is this dependency installed?" or "what files were emitted by the build?" Respecting gitignore by default would hide too much context.
The output_mode parameter controls both the ripgrep flags passed and the shape of the returned data. Each mode is a distinct ripgrep invocation strategy.
**files_with_matches** (default) — Returns only file paths that contain at least one match. Low token cost — ideal for "which files use this pattern?" Uses `rg -l`; output is sorted by mtime (most recently modified first). Result: `filenames[]`, `numFiles`.

**content** — Returns the matching lines with optional context lines before/after. Supports -n (line numbers), -B/-A/-C context, and multiline mode. Result: `content` string, `numLines`.

**count** — Returns per-file match counts in filename:N format. Useful for "how often does this pattern appear and where?" Result: `content` string (raw), `numMatches`, `numFiles`.
```ts
// Add output mode flags
if (output_mode === 'files_with_matches') {
  args.push('-l')
} else if (output_mode === 'count') {
  args.push('-c')
}
// content mode: no flag needed — rg defaults to printing match lines

// Line numbers only apply in content mode
if (show_line_numbers && output_mode === 'content') {
  args.push('-n')
}

// Context flags (-C supersedes -B/-A)
if (output_mode === 'content') {
  if (context !== undefined) {
    args.push('-C', context.toString())
  } else if (context_c !== undefined) {
    args.push('-C', context_c.toString())
  } else {
    if (context_before !== undefined) args.push('-B', context_before.toString())
    if (context_after !== undefined) args.push('-A', context_after.toString())
  }
}
```
After ripgrep returns file paths, GrepTool runs Promise.allSettled(results.map(_ => fs.stat(_))) to fetch mtimes, then sorts descending. The most recently modified files appear first. This is intentional: the most recently changed files are the most relevant to the current task.
When process.env.NODE_ENV === 'test', results are sorted by filename instead, ensuring deterministic ordering in VCR test fixtures. This is a common pattern in the codebase.
```ts
const stats = await Promise.allSettled(
  results.map(_ => getFsImplementation().stat(_)),
)
const sortedMatches = results
  .map((_, i) => {
    const r = stats[i]!
    return [_, r.status === 'fulfilled' ? (r.value.mtimeMs ?? 0) : 0] as const
  })
  .sort((a, b) => {
    if (process.env.NODE_ENV === 'test') return a[0].localeCompare(b[0])
    const timeComparison = b[1] - a[1]
    return timeComparison === 0 ? a[0].localeCompare(b[0]) : timeComparison
  })
  .map(_ => _[0])
```
Both tools support head_limit and offset parameters that work like a Unix tail -n +N | head -N pipeline. The default head_limit is 250 — generous enough for most searches while preventing context bloat on broad patterns.
```ts
function applyHeadLimit<T>(
  items: T[],
  limit: number | undefined,
  offset: number = 0,
): { items: T[]; appliedLimit: number | undefined } {
  // Explicit 0 = unlimited escape hatch
  if (limit === 0) {
    return { items: items.slice(offset), appliedLimit: undefined }
  }
  const effectiveLimit = limit ?? 250 // DEFAULT_HEAD_LIMIT
  const sliced = items.slice(offset, offset + effectiveLimit)
  // appliedLimit is ONLY set when truncation occurred,
  // so the model knows there may be more results to page
  const wasTruncated = items.length - offset > effectiveLimit
  return {
    items: sliced,
    appliedLimit: wasTruncated ? effectiveLimit : undefined,
  }
}
```
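A short usage sketch of the function above makes the three behaviors concrete (the 300-file result set is invented for illustration):

```ts
// Self-contained copy of applyHeadLimit for demonstration.
function applyHeadLimit<T>(
  items: T[],
  limit: number | undefined,
  offset: number = 0,
): { items: T[]; appliedLimit: number | undefined } {
  if (limit === 0) {
    return { items: items.slice(offset), appliedLimit: undefined } // unlimited escape hatch
  }
  const effectiveLimit = limit ?? 250
  const sliced = items.slice(offset, offset + effectiveLimit)
  const wasTruncated = items.length - offset > effectiveLimit
  return { items: sliced, appliedLimit: wasTruncated ? effectiveLimit : undefined }
}

const results = Array.from({ length: 300 }, (_, i) => `file${i}.ts`)

// 300 items, default limit 250 → truncated, pagination hint is set.
console.log(applyHeadLimit(results, undefined).appliedLimit) // → 250

// Second page: offset=250 leaves 50 items, which fit → no hint.
console.log(applyHeadLimit(results, undefined, 250).appliedLimit) // → undefined

// Explicit 0: everything is returned, no hint.
console.log(applyHeadLimit(results, 0).items.length) // → 300
```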
Three key design decisions in applyHeadLimit:
- `limit=0` is the unlimited escape hatch. The model can pass head_limit=0 when it explicitly wants all results regardless of size. The schema description warns "use sparingly — large result sets waste context."
- `appliedLimit` is set only when truncation occurred. If the full result fits within the limit, appliedLimit is undefined, so the model sees the pagination hint only when there is actually more to page through.
- When truncation occurs, the tool result block includes a `[Showing results with pagination = limit: 250]` annotation. The model reads this and knows it can call the tool again with offset=250 to get the next page.
Every Grep and Glob invocation ultimately calls ripgrep. But which ripgrep binary runs depends on a three-mode resolution chain, evaluated once per process and memoized.
The three modes:

- **system** — The user has USE_BUILTIN_RIPGREP set to a falsy value AND `rg` is on PATH. Uses the system-installed binary via the command name `rg` (not the resolved path — see the security note below).
- **embedded** — Running in bundled (native) Bun mode. ripgrep is statically compiled into the Bun executable and spawned via process.execPath with argv0='rg'; the process checks argv[0] and dispatches as ripgrep.
- **builtin** — The default npm install. A platform-specific binary ships at vendor/ripgrep/<arch>-<platform>/rg[.exe]. On macOS, a code-signing step may be needed (see below).
```ts
type RipgrepConfig = {
  mode: 'system' | 'builtin' | 'embedded'
  command: string
  args: string[]
  argv0?: string
}

const getRipgrepConfig = memoize((): RipgrepConfig => {
  const userWantsSystemRipgrep = isEnvDefinedFalsy(process.env.USE_BUILTIN_RIPGREP)
  if (userWantsSystemRipgrep) {
    const { cmd: systemPath } = findExecutable('rg', [])
    if (systemPath !== 'rg') {
      // SECURITY: Use command name 'rg', NOT systemPath.
      // Prevents ./rg.exe in cwd from being executed (PATH hijacking)
      return { mode: 'system', command: 'rg', args: [] }
    }
  }
  if (isInBundledMode()) {
    return {
      mode: 'embedded',
      command: process.execPath,
      args: ['--no-config'],
      argv0: 'rg',
    }
  }
  // builtin: platform-specific vendored binary
  const command =
    process.platform === 'win32'
      ? path.resolve(rgRoot, `${process.arch}-win32`, 'rg.exe')
      : path.resolve(rgRoot, `${process.arch}-${process.platform}`, 'rg')
  return { mode: 'builtin', command, args: [] }
})
```
On macOS, the vendored rg binary ships with only a linker signature (not an ad-hoc signature). macOS Gatekeeper blocks unsigned or minimally-signed binaries from running. On first use, codesignRipgrepIfNecessary() checks for linker-signed in codesign -vv output and re-signs the binary with codesign --sign - (self-sign / ad-hoc). It also strips the quarantine xattr (com.apple.quarantine) that macOS adds to downloaded files.
This codesign step applies only to builtin mode. The embedded and system ripgrep binaries are assumed to already be properly signed. The check is also guarded by alreadyDoneSignCheck — it runs at most once per process lifetime.
The embedded mode (with argv0) must use spawn() — Node's execFile does not support the argv0 override needed to make the Bun executable believe it is rg. All other modes use execFile with a maxBuffer: 20MB cap.
Timeouts are platform-aware: 20 seconds on standard platforms, 60 seconds on WSL (which has a 3–5x file I/O penalty). Overrideable via CLAUDE_CODE_GLOB_TIMEOUT_SECONDS. Kill escalation uses SIGTERM first, then SIGKILL after 5 seconds — because ripgrep blocked on deep filesystem traversal may not respond to SIGTERM.
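The timeout selection can be sketched as a pure function. The helper name and signature below are hypothetical — only the 20s/60s defaults and the CLAUDE_CODE_GLOB_TIMEOUT_SECONDS override come from the text:

```ts
// Hypothetical helper — name and signature are illustrative, not from the codebase.
function resolveGlobTimeoutMs(
  env: Record<string, string | undefined>,
  isWsl: boolean,
): number {
  const override = Number(env.CLAUDE_CODE_GLOB_TIMEOUT_SECONDS)
  // A valid positive override wins; otherwise fall back to platform defaults.
  if (Number.isFinite(override) && override > 0) return override * 1000
  return (isWsl ? 60 : 20) * 1000 // WSL gets 3x headroom for slow file I/O
}

console.log(resolveGlobTimeoutMs({}, false)) // → 20000
console.log(resolveGlobTimeoutMs({}, true))  // → 60000
console.log(resolveGlobTimeoutMs({ CLAUDE_CODE_GLOB_TIMEOUT_SECONDS: '45' }, true)) // → 45000
```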
In resource-constrained environments (Docker, CI), ripgrep may fail with EAGAIN (os error 11 / "Resource temporarily unavailable") when spawning worker threads. The retry strategy is precise: one retry with -j 1 (single thread) for that specific call only. A previous version persisted single-threaded mode globally, but this caused timeouts on large repos where EAGAIN was a transient startup error.
```ts
if (!isRetry && isEagainError(stderr)) {
  logForDebugging(`rg EAGAIN error, retrying with -j 1`)
  logEvent('tengu_ripgrep_eagain_retry', {})
  ripGrepRaw(
    args,
    target,
    abortSignal,
    (retryError, retryStdout, retryStderr) =>
      handleResult(retryError, retryStdout, retryStderr, true),
    true, // singleThread = true for this call only
  )
  return
}

function isEagainError(stderr: string): boolean {
  return (
    stderr.includes('os error 11') ||
    stderr.includes('Resource temporarily unavailable')
  )
}
```
The telemetry call countFilesRoundedRg() uses a dedicated streaming counter rather than ripGrep(). On a repo with 247k files, buffering all paths into a string and then splitting on newlines materializes ~16MB in memory. The streaming version counts newline bytes per chunk — peak memory is one stream chunk (~64KB). Results are rounded up to the next power of 10 for privacy (e.g., 8 → 10, 42 → 100, 8500 → 10000).
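The privacy rounding is small enough to sketch. The helper name below is hypothetical; the behavior is inferred from the examples in the text (8 → 10, 42 → 100, 8500 → 10000), which round up to the next power of 10:

```ts
// Hypothetical helper — rounds a file count up to the next power of 10,
// so telemetry reports a coarse bucket rather than an exact repo size.
function roundUpToPowerOf10(n: number): number {
  if (n <= 1) return 1
  return 10 ** Math.ceil(Math.log10(n))
}

console.log(roundUpToPowerOf10(8))    // → 10
console.log(roundUpToPowerOf10(42))   // → 100
console.log(roundUpToPowerOf10(8500)) // → 10000
```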
The MAX_BUFFER_SIZE constant is set to 20MB. Large monorepos with 200k+ files can easily produce stdout larger than Node's default 1MB execFile buffer. If even 20MB is exceeded, execFile fails with an ERR_CHILD_PROCESS_STDIO_MAXBUFFER error — the code treats the captured output as a partial result and drops the last (potentially incomplete) line.
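The partial-result recovery amounts to one string operation. A minimal sketch, assuming stdout is a newline-separated path list (the helper name is hypothetical):

```ts
// When execFile aborts with ERR_CHILD_PROCESS_STDIO_MAXBUFFER, the captured
// stdout almost certainly ends mid-line — drop the trailing fragment.
function recoverPartialPaths(partialStdout: string): string[] {
  const lines = partialStdout.split('\n')
  // Unless the capture happened to end exactly on a newline,
  // the last element is a truncated path — discard it.
  if (!partialStdout.endsWith('\n')) lines.pop()
  return lines.filter(Boolean)
}

console.log(recoverPartialPaths('src/a.ts\nsrc/b.ts\nsrc/ut'))
// → ['src/a.ts', 'src/b.ts']
```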
All returned paths are converted from absolute to relative (via toRelativePath) before being included in the tool result sent to the model. Absolute paths like /Users/moiz/project/src/utils/ripgrep.ts cost more tokens than src/utils/ripgrep.ts. On a result set of 250 files, this saves hundreds of tokens per call.
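The conversion is essentially Node's path.relative. A sketch of what toRelativePath plausibly does — the real signature and edge-case handling may differ:

```ts
import { relative } from 'node:path'

// Sketch only — the actual toRelativePath in the codebase may differ.
function toRelativePath(absPath: string, cwd: string): string {
  const rel = relative(cwd, absPath)
  // path.relative returns '' when the paths are identical
  return rel === '' ? '.' : rel
}

console.log(toRelativePath('/Users/moiz/project/src/utils/ripgrep.ts', '/Users/moiz/project'))
// → 'src/utils/ripgrep.ts'
```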
Both Glob and Grep declare isConcurrencySafe() = true. They are read-only operations with no shared mutable state — the model can (and does) issue multiple search calls in parallel within the same turn.
Grep appends --max-columns 500 to every ripgrep invocation. This prevents base64-encoded data, minified JavaScript, or other long single-line content from flooding the output. Lines longer than 500 characters are truncated with a [omitted long line] annotation from ripgrep.
Grep automatically excludes these version control system directories from every search:
```ts
const VCS_DIRECTORIES_TO_EXCLUDE = [
  '.git',
  '.svn',
  '.hg',
  '.bzr',
  '.jj',
  '.sl',
] as const
```
This covers Git, Subversion, Mercurial, Bazaar, Jujutsu, and Sapling. These directories contain large binary objects, pack files, and index databases — searching them creates noise and can be extremely slow.
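One plausible way these names become ripgrep arguments is as negated --glob exclusions. The exact flag shape used in the codebase is an assumption here; ripgrep's `--glob '!pattern'` syntax itself is standard:

```ts
const VCS_DIRECTORIES_TO_EXCLUDE = ['.git', '.svn', '.hg', '.bzr', '.jj', '.sl'] as const

// Hypothetical assembly — each VCS directory becomes a negated --glob exclusion.
const exclusionArgs: string[] = VCS_DIRECTORIES_TO_EXCLUDE.flatMap(
  dir => ['--glob', `!**/${dir}/**`],
)

console.log(exclusionArgs.slice(0, 2)) // → ['--glob', '!**/.git/**']
```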
Glob does not apply these exclusions explicitly, but uses --no-ignore by default which lets ripgrep's own traversal logic handle them — ripgrep's built-in behavior skips .git directories unless overridden.
If a regex pattern starts with a dash (e.g., -v), ripgrep would interpret it as a command-line flag rather than a search pattern. GrepTool handles this correctly:
```ts
// If pattern starts with dash, use -e flag to specify it as a pattern.
// This prevents ripgrep from interpreting it as a command-line option
if (pattern.startsWith('-')) {
  args.push('-e', pattern)
} else {
  args.push(pattern)
}
```
The -e flag tells ripgrep "what follows is a pattern, not a flag." This is a class of injection vulnerability common to tools that shell out: if user-supplied text is appended to a CLI command naively, a leading dash can hijack the flag parsing.
Stat calls are skipped for paths starting with \\ or // (UNC paths on Windows). The comment explains: "SECURITY: Skip filesystem operations for UNC paths to prevent NTLM credential leaks." A stat call on a UNC path triggers an SMB authentication handshake that can leak NTLM hashes to a malicious server.
The glob parameter of GrepTool accepts one or more file patterns to filter which files are searched. The parsing logic handles two cases:
```ts
if (glob) {
  const globPatterns: string[] = []
  const rawPatterns = glob.split(/\s+/)
  for (const rawPattern of rawPatterns) {
    // Brace patterns must NOT be split on commas
    // e.g. "*.{ts,tsx}" is one pattern, not ["*.{ts", "tsx}"]
    if (rawPattern.includes('{') && rawPattern.includes('}')) {
      globPatterns.push(rawPattern)
    } else {
      // Split comma-separated patterns without braces
      globPatterns.push(...rawPattern.split(',').filter(Boolean))
    }
  }
  for (const globPattern of globPatterns.filter(Boolean)) {
    args.push('--glob', globPattern)
  }
}
```
The model can pass glob: "*.js,*.ts" (comma-separated) or glob: "*.{ts,tsx}" (brace-expanded). The parser correctly identifies brace patterns and passes them as single --glob arguments to ripgrep, which handles brace expansion natively.
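Wrapping the parsing logic above in a standalone helper (the function name is illustrative, not from the codebase) shows both cases side by side:

```ts
// Illustrative wrapper around the brace-aware parsing logic shown above.
function parseGlobPatterns(glob: string): string[] {
  const globPatterns: string[] = []
  for (const rawPattern of glob.split(/\s+/)) {
    if (rawPattern.includes('{') && rawPattern.includes('}')) {
      globPatterns.push(rawPattern) // brace pattern stays whole
    } else {
      globPatterns.push(...rawPattern.split(',').filter(Boolean))
    }
  }
  return globPatterns.filter(Boolean)
}

console.log(parseGlobPatterns('*.js,*.ts'))  // → ['*.js', '*.ts']
console.log(parseGlobPatterns('*.{ts,tsx}')) // → ['*.{ts,tsx}']
```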
The type parameter offers an alternative: type: "js" maps to ripgrep's --type js, which uses ripgrep's built-in file type definitions (includes both .js and .jsx). This is more efficient than a glob because ripgrep resolves type definitions at startup rather than matching each path against a pattern.
Key takeaways:

- files_with_matches (-l), content (no flag), and count (-c) are fundamentally different ripgrep invocations — not post-processing variants. Choosing the right mode avoids unnecessary work.
- head_limit=0 is the unlimited escape hatch.
- Binary resolution runs system → embedded → builtin. The security detail is non-obvious: when using system ripgrep, the command is spelled "rg" (not the resolved path) to prevent PATH-hijacking by a local rg.exe.
- The EAGAIN retry applies -j 1 only for that one call and restores multi-threaded behavior immediately after. Transient errors should not permanently degrade performance.
- In files_with_matches mode, results are sorted by modification time (most recent first). The assumption: files you just edited are more relevant to the current task than files that haven't been touched in months.

Review questions:

1. When GlobTool receives a pattern like /src/utils/*.ts, what transformation happens before the ripgrep call?
2. Which ripgrep output mode returns only file paths (not matching lines)?
3. What does head_limit=0 mean in the Grep tool?
4. Why does the system-mode ripgrep use the command "rg" rather than the full resolved path from findExecutable?
5. In files_with_matches mode, how are results ordered?
6. What happens when a Grep pattern starts with a dash (e.g., -v)?