深度解析 Anthropic 面试题 “Converting Stack Samples to a Trace”

2025年暑期Anthropic发布了一些新岗位,我们来看看它的去年的一些题目吧。

首先我们来看题目原文

Problem: Converting stack samples to a trace

Sampling profilers are a performance analysis tool for finding the slow parts of your code by periodically sampling the entire call stack. In our problem the samples will be a list of Samples of a float timestamp and a list of function names, in order by timestamp, like this:

  • Sample stacks contain every function that is currently executing
  • The stacks are in order from outermost function (like "main") to the innermost currently executing function
  • An unlimited amount of execution/change can happen between samples. We don’t have all the function calls that happened, just some samples we want to visualize.
        
from dataclasses import dataclass

@dataclass
class Sample:
    ts: float
    stack: list[str]

example_samples = [Sample(7.5, ["main", "my_outer_function", "my_inner_function"])]
example_samples
        
    

Sometimes it’s nice to visualize these samples on a chronological timeline of the call stack using a trace visualizer UI. Time is on the X axis and nesting is on the Y axis.

These UIs were built for instrumentation that records the start and end of functions. We want to use this to visualize the data we have as best we can, and so need to convert the sorted samples into a list of start and end events for each function call.

The returned events should be in a list order that is properly nested as if they were recorded. For example, a nested function call’s start event is after the one from the enclosing function’s, and the nested call’s end event is before the enclosing call’s end event:

|------outer------|
|--inner--|

would be ordered: start outer, start inner, end inner, end outer

Assume call frames in the last sample haven’t finished. The resulting events should use this type:

        
@dataclass
class Event:
    kind: str   # either 'start' or 'end'
    ts: float
    name: str

def convert_to_trace(samples: list[Sample]) -> list[Event]:
    raise Exception("unimplemented")

samples = [
    Sample(7.5, ["main"]),
    Sample(9.2, ["main", "my_fn"]),
    Sample(10.7, ["main"])
]
convert_to_trace(samples)
# expected_result = [
#    Event('start', 7.5, 'main'),
#    Event('start', 9.2, 'my_fn'),
#    Event('end',   10.7, 'my_fn'),
# ]
        
    

这道题真正考察的是:给定一系列按时间升序排列的栈采样(每条采样里记录了从最外层到最内层的调用栈),我们并不知道中间到底发生了哪些函数调用进出,只能看到离散的快照。面试官希望你能用“start/end”事件把这些快照拼接成一条看得懂的调用轨迹:较外层的函数先 startend,较内层的函数在它的 start/end 之间;同时,最后一帧的函数因为还没退出,不生成 end

最开始,候选人按 CSOahelp实时面试辅助的提示,写出了这样一个“前后栈对比”版本:

  1. prev_stack 保存上一条样本的调用栈(初始为空);
  2. 遍历每个 Sample(ts, stack)
    • 找出 prev_stackstack 的最长公共前缀长度 L
    • prev_stack[L:] 中剩下的函数倒序生成 Event('end', ts, func)
    • stack[L:] 中新增的函数正序生成 Event('start', ts, func)
    • 更新 prev_stack = stack
        
def convert_to_trace(samples: List[Sample]) -> List[Event]:
    events = []
    prev_stack = []

    for sample in samples:
        cur_stack = sample.stack
        ts = sample.ts

        # 1. 找到 prev_stack 和 cur_stack 的公共前缀长度
        common_len = 0
        while (common_len < len(prev_stack)
               and common_len < len(cur_stack)
               and prev_stack[common_len] == cur_stack[common_len]):
            common_len += 1

        # 2. 针对在 prev_stack 里但不在 cur_stack 的函数,倒序生成 'end'
        for func in reversed(prev_stack[common_len:]):
            events.append(Event('end', ts, func))

        # 3. 针对在 cur_stack 里但不在 prev_stack 的函数,顺序生成 'start'
        for func in cur_stack[common_len:]:
            events.append(Event('start', ts, func))

        prev_stack = cur_stack

    return events

        
    

这样,既保证了内层函数的结束先于外层,也保证了外层函数的开始先于内层;遍历完所有样本,除了最后一次栈中的函数(因为没有后续样本让它消失)都得到了成对的 start/end。按样本数量 N 和最大栈深度 D,时间复杂度 O(N·D),面试中算是又快又清晰的方案。

接着,CSOahelp 提示要考虑以下几点,并辅导候选人一步步完善:

  • 避免重复事件
    如果连续两次采样栈完全相同,就不该发任何事件;需要在对比公共前缀时先判断 prev_stack == stack,直接跳过生成阶段。
  • 空栈与深度跳变
    当栈从空直接跳到深度大于 1,或从深度 3 直接跳到 1,需要单测保证 end/start 顺序没问题。
  • 单元测试覆盖
    写几个典型用例:
    • 完全相同的连续样本不出事件;
    • 逐层压入、逐层弹出;
    • 跳层进入、跳层退出;
    • 空栈→非空→空栈;
      并用断言验证事件序列精确匹配预期。
  • 去掉过度扩展
    之前加的 min_count 抖动平滑逻辑虽然有思考价值,但超出了题意,建议先提交最简洁的版本,再在后面加注释说明如何扩展。
  • 注释与可读性
    在代码里对“公共前缀”“倒序 end”“正序 start”各步骤加简短注释,让面试官一眼就看懂核心思路。

最终,我们协助候选人大哥交出了一个既逻辑清晰、注释完备,又加了充分测试的答案版本,完美契合面试要求。

如果您也在准备 Anthropic 的面试,不妨试试靠自己解决这道题。和其他大厂的纯算法题还是不一样的,非常具有参考价值。

如果你也在准备Anthropic、Amazon、Meta、TikTok等大厂的算法与系统设计面试,却不清楚如何拆题和应对各种边界,欢迎添加微信 csvohelp,即可领取北美面试求职通关秘诀。我们也有代面试,面试辅助,OA代写等服务助您早日上岸~

Leave a Reply

Your email address will not be published. Required fields are marked *