Designing cross-boundary data flow tracking: from HTTP fetch() to backend handler

Code indexers index code. They don’t index connections between code.

A frontend fetch('/api/resources') and a backend router.get('/api/resources', handler) live in different files, different services, often different repositories. Every existing static analysis tool treats them as separate, unrelated things. The developer — or the AI agent — has to hold the connection in their head.

Grafema records that connection as an explicit edge in its graph. Here’s how we built cross-boundary tracing, what we tried that didn’t work, and where the current implementation still falls short.

The problem nobody solved yet

Think about what an AI agent actually needs to understand a full-stack change. A product manager files a bug: “The data source connection fails silently.” The agent needs to:

  1. Find the frontend component that manages data source connections
  2. Find the API call that component makes
  3. Find the backend handler that processes that call
  4. Find the database query inside the handler
  5. Understand the error handling at each hop

Without cross-boundary tracking, the agent does this by grepping. It reads the frontend component, guesses at the endpoint URL, searches for that string in the backend, finds three files that match, reads each one, picks the most plausible candidate. This takes 8–15 minutes of agent turns and is wrong often enough to be a real problem.

With explicit graph edges connecting frontend requests to backend handlers, step 2→3 is a single query. The agent doesn’t guess — it follows an edge.

Why this is hard: three approaches we tried

Approach 1: Pure regex matching

The first prototype was three regular expressions: one for fetch(...) calls, one for axios.get(...) calls, one for router.get(...) route registrations. Extract the URL strings, compare them. If they match, emit a connection.

This works on toy examples and fails on real codebases in predictable ways.

ToolJet has routes mounted under a prefix:

// app.js
app.use('/api/v2', userRouter);

// userRouter.js
router.get('/users', handler);  // full path: /api/v2/users

A regex on userRouter.js sees /users. A regex on the frontend sees /api/v2/users. These don’t match. The connection is lost unless you separately parse the mount call and combine them — at which point you’re doing AST analysis anyway.
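To make the mismatch concrete, here is a minimal sketch of a regex-only comparison. The two patterns and the sample source strings are illustrative assumptions, not the expressions from our prototype.

```javascript
// Hypothetical regexes in the spirit of the first prototype (not the originals).
const FETCH_RE = /fetch\(\s*['"]([^'"]+)['"]/g;
const ROUTE_RE = /router\.(?:get|post|put|delete)\(\s*['"]([^'"]+)['"]/g;

// Collect the first capture group of every match in a source string.
function extractUrls(source, regex) {
  return [...source.matchAll(regex)].map((m) => m[1]);
}

const frontendSource = "fetch('/api/v2/users');";
const routerSource = "router.get('/users', handler);";

const requested = extractUrls(frontendSource, FETCH_RE); // ['/api/v2/users']
const declared = extractUrls(routerSource, ROUTE_RE);    // ['/users']

// Exact string comparison finds nothing: the '/api/v2' prefix lives in app.js,
// which the route regex never sees.
const matches = requested.filter((url) => declared.includes(url));
console.log(matches.length); // 0
```

Both extractions succeed on their own file. The comparison fails only because neither regex knows about the mount call.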

Template literals break regex matching in a different way:

fetch(`/api/v2/users/${userId}`)

You can regex-match the template, extract /api/v2/users/, and try to match it against routes with :id params. Now you’re implementing URL pattern matching, which is a non-trivial problem on its own.

We abandoned pure regex after two weeks. The edge cases ate us alive.

Approach 2: Pure AST analysis

The second attempt was a full AST pass using Babel. Parse every file, walk the AST, detect fetch() call expressions, extract argument values, do the same for Express router.get() calls.

AST analysis handles template literals correctly — you can distinguish StringLiteral nodes from TemplateLiteral nodes and normalize both to a canonical form. It handles mounted routers correctly because you can follow the app.use() call to find the prefix and combine it with the route’s own path.

What pure AST can’t handle: URLs that are computed at runtime.

const endpoint = config.apiBase + '/users/' + userId;
fetch(endpoint);

There is no AST representation of what endpoint evaluates to at runtime. You’d need a symbolic execution engine to even attempt it. For this case, pure AST analysis is the same as regex: it sees a variable and gives up.

On ToolJet, 29 of 318 frontend HTTP requests (roughly 9%) fell into this category. We don’t ignore the problem, but we don’t block on it either.

Approach 3: Hybrid AST + normalization (what we use)

The current implementation has three components:

FetchAnalyzer runs during the analysis phase. It walks the AST of frontend files and creates http:request nodes for every detected fetch(), axios.get(), custom wrapper call, etc. For StringLiteral arguments it stores the exact URL. For TemplateLiteral arguments it normalizes ${variable} to :param and stores the pattern. For computed values it stores "dynamic".
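A rough sketch of that classification step, assuming Babel-style node shapes (StringLiteral has value; TemplateLiteral has quasis and expressions). The helper name describeUrlArgument and the returned descriptor are hypothetical, and the nodes are written by hand so the example runs without a parser.

```javascript
// Hypothetical helper: turn the first argument of a fetch-like call into a URL
// descriptor. Node shapes follow Babel's AST (StringLiteral, TemplateLiteral).
function describeUrlArgument(node) {
  if (node.type === 'StringLiteral') {
    return { kind: 'exact', url: node.value };
  }
  if (node.type === 'TemplateLiteral') {
    // quasis are the static chunks; there is always one more quasi than expressions.
    let url = '';
    node.quasis.forEach((quasi, i) => {
      url += quasi.value.cooked;
      if (i < node.expressions.length) url += ':param';
    });
    return { kind: 'pattern', url };
  }
  // Identifiers, concatenation, member expressions and so on.
  return { kind: 'dynamic', url: 'dynamic' };
}

// Hand-built node for fetch(`/api/v2/users/${userId}`)
const templateArg = {
  type: 'TemplateLiteral',
  quasis: [{ value: { cooked: '/api/v2/users/' } }, { value: { cooked: '' } }],
  expressions: [{ type: 'Identifier', name: 'userId' }],
};

console.log(describeUrlArgument(templateArg).url); // /api/v2/users/:param
console.log(describeUrlArgument({ type: 'Identifier', name: 'endpoint' }).url); // dynamic
```

This sketch replaces every placeholder with :param regardless of position, so a template that starts with a base-URL variable would need extra handling.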

ExpressRouteAnalyzer does the same for backend files. It creates http:route nodes for every router.get(), app.post(), etc. It stores the route path as declared, plus a fullPath field that the MountPointResolver enricher will fill in.

HTTPConnectionEnricher runs during the enrichment phase — after all analysis is complete. This is a critical constraint we’ll explain below. The enricher iterates over all http:request nodes, normalizes their URLs, and tries to match each one against all http:route nodes (also normalized). A match creates an INTERACTS_WITH edge.

The URL normalization is the key piece. Both sides get converted to the same canonical form before comparison:

  • /api/v2/users/${id} → /api/v2/users/:param
  • /api/v2/users/:userId → /api/v2/users/:param
  • /api/v2/users/123 → /api/v2/users/123 (exact match attempted first)

This gives parametric matching without building a full routing engine.
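The normalization and lookup can be sketched roughly as follows. The function names are hypothetical, and the fallback rule (a literal request segment such as 123 may match a :param route segment when no exact match exists) is our reading of “exact match attempted first”, not a description of the enricher’s actual code.

```javascript
// Hypothetical normalizer: collapse ${expr} and :name segments to ':param'.
function normalizeUrl(url) {
  const normalized = url
    .replace(/\$\{[^}]*\}/g, ':param')   // template placeholders
    .replace(/:[A-Za-z_]\w*/g, ':param') // Express-style named params
    .replace(/\/+$/, '');                // trailing slashes
  return normalized || '/';
}

// Segment-wise comparison: a route ':param' accepts any request segment.
function segmentsMatch(requestUrl, routeUrl) {
  const req = normalizeUrl(requestUrl).split('/');
  const route = normalizeUrl(routeUrl).split('/');
  if (req.length !== route.length) return false;
  return req.every((seg, i) => seg === route[i] || route[i] === ':param');
}

// Exact match on the canonical form first, parametric match second.
function findRoute(requestUrl, routes) {
  const normalized = normalizeUrl(requestUrl);
  return (
    routes.find((r) => normalizeUrl(r) === normalized) ??
    routes.find((r) => segmentsMatch(requestUrl, r)) ??
    null
  );
}

const routes = ['/api/v2/users', '/api/v2/users/:userId'];
console.log(findRoute('/api/v2/users/${id}', routes)); // /api/v2/users/:userId
console.log(findRoute('/api/v2/users/123', routes));   // /api/v2/users/:userId
console.log(findRoute('/api/v2/orders', routes));      // null
```

Comparing segment by segment keeps the matcher small: no wildcards, no optional segments, no regex routes.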

The enrichment phase constraint

Why does HTTPConnectionEnricher run in the enrichment phase, not the analysis phase?

Analysis runs in parallel. Each file is processed by its analyzer independently. If you tried to match requests to routes during analysis, you’d race — the frontend file might be processed before the backend routes are created, and the match would fail.

The enrichment phase runs after all analysis is complete, in a single-threaded pass over the finished graph. All http:request nodes and all http:route nodes exist before any enricher runs. The HTTPConnectionEnricher can safely read both sets and create connections between them.

This is a general pattern in Grafema’s pipeline: any operation that needs to create edges between nodes from different files must happen in enrichment. Analysis creates nodes; enrichment connects them. Violating this constraint produces non-deterministic graphs where some connections appear on some runs and not others.
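A toy model shows why the ordering matters. Parallel scheduling is simulated here by processing the same two files in both orders. The file records and function names are hypothetical, not Grafema’s API.

```javascript
// Toy inputs: what each analyzer would discover in each file.
const files = [
  { name: 'ResourcesService.jsx', requests: ['/api/v2/resources'], routes: [] },
  { name: 'resourcesController.js', requests: [], routes: ['/api/v2/resources'] },
];

// Variant A: match while analyzing. A request only sees routes created so far.
function analyzeAndMatch(orderedFiles) {
  const routes = new Set();
  const edges = [];
  for (const file of orderedFiles) {
    file.routes.forEach((r) => routes.add(r));
    for (const url of file.requests) {
      if (routes.has(url)) edges.push(url);
    }
  }
  return edges;
}

// Variant B: analysis only creates nodes; enrichment connects them afterwards.
function analyzeThenEnrich(orderedFiles) {
  const requests = [];
  const routes = new Set();
  for (const file of orderedFiles) {
    requests.push(...file.requests);
    file.routes.forEach((r) => routes.add(r));
  }
  return requests.filter((url) => routes.has(url)); // enrichment pass
}

const frontendFirst = files;
const backendFirst = [...files].reverse();

console.log(analyzeAndMatch(frontendFirst).length, analyzeAndMatch(backendFirst).length);     // 0 1
console.log(analyzeThenEnrich(frontendFirst).length, analyzeThenEnrich(backendFirst).length); // 1 1
```

Variant A produces a different graph depending on which file happens to finish first. Variant B produces the same edge either way.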

A concrete example from ToolJet

ToolJet’s ResourcesService.jsx (frontend) contains:

await authFetch('/api/v2/resources', { method: 'POST', body: ... })

FetchAnalyzer creates an http:request node: method: POST, url: /api/v2/resources, library: authFetch.

The backend has resourcesController.js with:

router.post('/resources', ResourcesController.create)

mounted under /api/v2 in app.js:

app.use('/api/v2', resourcesRouter)

ExpressRouteAnalyzer creates an http:route node: method: POST, path: /resources. MountPointResolver enriches it to fullPath: /api/v2/resources.
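The prefix resolution for this example can be sketched as below. The joinMountPath helper and the shape of the mounts map are assumptions for illustration; the real MountPointResolver may work differently, and nested routers are not handled here.

```javascript
// Hypothetical helper: combine a mount prefix with a router-local path.
function joinMountPath(prefix, path) {
  const joined = `${prefix}/${path}`.replace(/\/{2,}/g, '/');
  // Keep '/' for the root, otherwise drop a trailing slash.
  return joined.length > 1 ? joined.replace(/\/$/, '') : joined;
}

// Toy inputs standing in for what the analyzers would have recorded.
const mounts = new Map([['resourcesRouter', '/api/v2']]);
const route = { router: 'resourcesRouter', method: 'POST', path: '/resources' };

const fullPath = joinMountPath(mounts.get(route.router) ?? '', route.path);
console.log(fullPath); // /api/v2/resources
```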

HTTPConnectionEnricher normalizes both URLs to /api/v2/resources, finds the match, and creates:

http:request:ResourcesService.jsx:47 --[INTERACTS_WITH]--> http:route:resourcesController.js:12

A Datalog query to traverse this:

npx @grafema/cli query --raw '
  type(Req, "http:request"),
  attr(Req, "url", "/api/v2/resources"),
  attr(Req, "method", "POST"),
  edge(Req, Route, "INTERACTS_WITH"),
  edge(Route, Handler, "HANDLED_BY")
'

Result: the exact handler function, its file, its line number. No file reading required to get there.

Current limitations

Being specific matters here. The current implementation misses three categories of connections:

Dynamic route parameters in fetch URLs. fetch('/api/v2/resources/' + id) is stored as a dynamic URL because FetchAnalyzer treats string concatenation as a computed value and does not try to resolve it. Template literals (/api/v2/resources/${id}) work; concatenation doesn’t. This affects roughly 9% of ToolJet’s frontend requests.

Non-Express backend frameworks. The ExpressRouteAnalyzer handles Express and basic app.use() patterns. Fastify, Hono, tRPC, and NestJS aren’t supported out of the box. This is addressable with plugins — the plugin development guide shows a FastifyRouteAnalyzer as a complete example.

External API calls. fetch('https://api.stripe.com/charges') creates an EXTERNAL node but doesn’t trace inside the external service. This is intentional — we can’t analyze code we don’t have — but it means some frontend requests connect to EXTERNAL rather than to a handler you can navigate to.
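One simple way to make that split is to treat absolute http(s) URLs as external and path-only URLs as candidates for route matching. This is a sketch under that assumption, with a hypothetical classifyTarget helper. It would misclassify an absolute URL that points at your own backend.

```javascript
// Hypothetical classifier: absolute http(s) URLs are treated as external,
// path-only URLs as candidates for matching against local routes.
function classifyTarget(url) {
  if (/^https?:\/\//i.test(url)) {
    return { kind: 'external', host: new URL(url).host };
  }
  return { kind: 'internal', path: url };
}

console.log(classifyTarget('https://api.stripe.com/charges'));
// { kind: 'external', host: 'api.stripe.com' }
console.log(classifyTarget('/api/v2/resources'));
// { kind: 'internal', path: '/api/v2/resources' }
```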

The full list is in the cross-service tracing docs.

Why this matters for AI agents

An agent that can answer “which backend handler processes this fetch call?” is working from facts, not inference. That’s a qualitative difference.

The alternative — grep for the URL string, find multiple matches, pick the most plausible one — produces wrong answers often enough to compound. If the agent incorrectly identifies the handler, it reads the wrong function, understands the wrong data flow, and writes a fix that targets the wrong code. The error isn’t visible until the fix fails.

Explicit graph edges don’t eliminate all navigation errors. But on ToolJet they replace a guess with a lookup for the roughly 91% of frontend requests whose URLs are statically resolvable. That’s enough to change the error rate meaningfully.

What’s next: schema inference

Knowing which handler processes a request is step one. Knowing what data flows through that connection is step two.

The handler returns something. What shape is it? A TypeScript interface? A Zod schema? A raw Sequelize model? The frontend component receives something. Does its type match what the handler sends?

That’s the problem we’re working on now. The next post in this series covers our approach to schema inference — and why it’s harder than it looks.


Cross-service tracing is available in Grafema now. Full documentation: /docs/cross-service-tracing. Start with /docs/getting-started.