Calc Time to First Token (TTFT) and token speed #779
base: dev
Conversation
Pull request overview
This PR adds Time to First Token (TTFT) and token speed calculation capabilities to the LLM performance tracking infrastructure. The implementation aims to provide better observability into LLM response times and throughput.
- Enhanced the `timed_with_status` decorator to calculate and log content-based performance metrics (speed in chars/s, content length)
- Added TTFT tracking in OpenAI streaming responses to measure latency from request to first token (see the sketch after this list)
- Introduced model name tracking to correlate performance metrics with specific models
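In essence, the two new metrics reduce to a few lines of timing arithmetic. A minimal, runnable sketch (not the PR's actual code; the stand-in `stream` and the print statements are illustrative assumptions):

```python
import time

stream = iter(["Hello", ", ", "world"])  # stand-in for an LLM token stream

start = time.perf_counter()
first_token_time = None
chars = 0
for token in stream:
    if first_token_time is None:
        first_token_time = time.perf_counter()
        # Time To First Token: latency from the request to the first chunk
        print(f"TTFT: {(first_token_time - start) * 1000.0:.2f}ms")
    chars += len(token)

elapsed_s = time.perf_counter() - start
# "Token speed" is measured as character throughput (chars/s), as in the decorator
print(f"speed: {(chars / elapsed_s if elapsed_s > 0 else 0.0):.1f} chars/s")
```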
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| src/memos/utils.py | Modified the timed_with_status decorator to add performance metrics calculation including total time in seconds, content length, and speed (chars/s) for successful function calls with truthy results |
| src/memos/llms/openai.py | Added TTFT (Time To First Token) tracking in the generate_stream method with timing logic and model name logging for the first token received |
```
- log_extra_args:
    - can be a dict: fixed contextual fields that are always attached to logs;
    - or a callable: like `fn(*args, **kwargs) -> dict`, used to dynamically generate contextual fields at runtime.
- target_models: list of target models to filter performance tracking
```
Copilot AI · Dec 25, 2025
The documentation mentions a new parameter "target_models" but this parameter is not actually implemented in the function signature or used in the function body. Either remove this documentation line or implement the parameter.
Suggested change (delete this line):

```
- target_models: list of target models to filter performance tracking
```
```python
if success_flag and result:
    # Calculate content metrics
    content_length = len(result) if isinstance(result, str) else 0
```
Copilot AI · Dec 25, 2025
The logic assumes that when success_flag is True and result is truthy, the result will be a string for length calculation. However, this is a generic decorator that could be applied to any function. If the result is not a string (e.g., a list, dict, or custom object), content_length silently falls back to 0 and the logged speed becomes meaningless. Consider skipping the content metrics for non-string results, or document that this decorator should only be used with string-returning functions.
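One way to make that guard explicit (a sketch; `content_metrics` is a hypothetical helper, not part of the PR):

```python
def content_metrics(result, elapsed_s):
    """Return (content_length, chars_per_s), or (None, None) when not applicable."""
    if isinstance(result, str) and result:
        length = len(result)
        return length, (length / elapsed_s if elapsed_s > 0 else 0.0)
    # Non-string or empty results carry no content metrics
    return None, None


print(content_metrics("hello world", 0.5))        # (11, 22.0)
print(content_metrics({"not": "a string"}, 0.5))  # (None, None)
```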
```python
# Get model name from self.config
self = args[0] if args else None
model_name = (
    getattr(self, "config", {}).get("model_name_or_path", "unknown")
    if self
    else "unknown"
)
```
Copilot AI · Dec 25, 2025
Accessing self.config via getattr here is fragile. If args is empty, or args[0] has no config attribute, this silently falls back to "unknown"; worse, if config is an object rather than a dict (elsewhere in this PR it is accessed as self.config.model_name_or_path), the .get call raises AttributeError. This assumes all decorated functions are methods with a specific structure. Consider adding more robust type checking or documenting this requirement clearly.
Suggested change (replacing the snippet above):

```python
# Get model name from self.config, if available and well-formed
self_obj = args[0] if args else None
model_name = "unknown"
if self_obj is not None:
    config = getattr(self_obj, "config", None)
    if isinstance(config, dict):
        model_name = config.get("model_name_or_path", "unknown")
```
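Since the streaming code in this same PR reads the config as an attribute (`self.config.model_name_or_path`), a variant that tolerates both dict- and object-style configs may be safer. A sketch extending the suggestion above (the object branch is an assumption):

```python
self_obj = args[0] if args else None
model_name = "unknown"
if self_obj is not None:
    config = getattr(self_obj, "config", None)
    if isinstance(config, dict):
        model_name = config.get("model_name_or_path", "unknown")
    elif config is not None:
        # Object-style config (attribute access), as used in openai.py
        model_name = getattr(config, "model_name_or_path", "unknown")
```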
```python
elif (
    success_flag
):  # Standard logging for non-target models or when not tracking metrics
    msg = (
        f"[TIMER_WITH_STATUS] {log_prefix or fn.__name__} "
        f"took {elapsed_ms:.0f} ms{status_info}, args: {ctx_str}"
    )
    logger.info(msg)
```
Copilot AI · Dec 25, 2025
The logging condition is confusing. The branch at line 103-105 logs when success_flag is True but result is falsy (None, empty string, 0, False, etc.). This means successful executions with falsy results use the standard logging format, while successful executions with truthy results use the performance metrics format. This logic seems inconsistent - consider what should happen when a function successfully returns an empty string or None.
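One way to make the success branches consistent (a sketch reusing the names from the snippets above; the exact message format is an assumption):

```python
if success_flag:
    if isinstance(result, str) and result:
        # Non-empty string result: log with content metrics
        speed = len(result) / (elapsed_ms / 1000.0) if elapsed_ms > 0 else 0.0
        logger.info(
            f"[TIMER_WITH_STATUS] {log_prefix or fn.__name__} "
            f"took {elapsed_ms:.0f} ms{status_info}, "
            f"len={len(result)}, speed={speed:.1f} chars/s"
        )
    else:
        # Falsy or non-string result: fall back to the standard format
        logger.info(
            f"[TIMER_WITH_STATUS] {log_prefix or fn.__name__} "
            f"took {elapsed_ms:.0f} ms{status_info}, args: {ctx_str}"
        )
```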
```python
# 尝试从响应中获取实际模型信息
actual_model = getattr(chunk, "model", None) or self.config.model_name_or_path
requested_model = self.config.model_name_or_path

# Print TTFT info - 显示请求模型和实际模型(如果不一致)
```
Copilot AI · Dec 25, 2025
The comment is in Chinese. For consistency with the rest of the codebase which uses English comments, this should be translated to English.
Suggested change (replacing the snippet above):

```python
# Try to get the actual model information from the response
actual_model = getattr(chunk, "model", None) or self.config.model_name_or_path
requested_model = self.config.model_name_or_path

# Print TTFT info - show requested model and actual model (if they differ)
```
```python
first_token_time = None
start_time = time.perf_counter()

for chunk in response:
    delta = chunk.choices[0].delta

    # Calculate TTFT on first token
    if first_token_time is None:
        first_token_time = time.perf_counter()
        ttft_ms = (first_token_time - start_time) * 1000.0

        # 尝试从响应中获取实际模型信息
        actual_model = getattr(chunk, "model", None) or self.config.model_name_or_path
        requested_model = self.config.model_name_or_path

        # Print TTFT info - 显示请求模型和实际模型(如果不一致)
        if actual_model != requested_model:
            logger.info(
                f"TTFT: {ttft_ms:.2f}ms | Requested: {requested_model} | Actual: {actual_model}"
            )
        else:
            logger.info(f"TTFT: {ttft_ms:.2f}ms | {requested_model}")
```
Copilot AI · Dec 25, 2025
The TTFT (Time To First Token) calculation and logging happens inside the generator loop but only once when first_token_time is None. However, this timing logic doesn't account for cases where the stream might be empty or where no chunks are yielded. Additionally, since this code is inside generate_stream which is decorated with @timed_with_status, there's now redundant timing logic - the decorator already tracks total time. Consider whether TTFT should be tracked separately from the decorator's timing or if this overlaps with the decorator's functionality.
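A sketch of an empty-stream guard, reusing the loop above (the post-loop warning is an assumption):

```python
first_token_time = None
start_time = time.perf_counter()

for chunk in response:
    # ... existing per-chunk handling ...
    if first_token_time is None:
        first_token_time = time.perf_counter()
        ttft_ms = (first_token_time - start_time) * 1000.0
        logger.info(f"TTFT: {ttft_ms:.2f}ms | {self.config.model_name_or_path}")

# Runs once the stream is exhausted; if no chunk ever arrived, TTFT is undefined
if first_token_time is None:
    logger.warning("TTFT: stream ended before any chunk was received")
```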
Copilot AI · Dec 25, 2025
The new TTFT timing logic and model name retrieval in generate_stream lacks test coverage. The existing test at line 46-103 in test_openai.py doesn't verify the TTFT calculation or the logging of model information. Consider adding tests to verify that TTFT is calculated correctly and that model name logging works as expected.
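A possible shape for such a test (a sketch only: the `OpenAILLM` import path, the `openai_config` fixture, and the chunk structure are assumptions inferred from the diff, and the real fixtures in test_openai.py may differ):

```python
from types import SimpleNamespace
from unittest.mock import MagicMock

from memos.llms.openai import OpenAILLM  # class/path assumed from src/memos/llms/openai.py


def make_chunk(text, model="some-model"):
    # Mimic the attributes the streaming loop reads: chunk.choices[0].delta and chunk.model
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)], model=model)


def test_ttft_logged_once_per_stream(caplog, openai_config):  # config fixture assumed
    llm = OpenAILLM(openai_config)
    llm.client = MagicMock()
    llm.client.chat.completions.create.return_value = iter(
        [make_chunk("Hello"), make_chunk(" world")]
    )

    with caplog.at_level("INFO"):
        list(llm.generate_stream([{"role": "user", "content": "hi"}]))

    ttft_records = [r.message for r in caplog.records if "TTFT" in r.message]
    assert len(ttft_records) == 1  # TTFT should be logged exactly once, on the first chunk
```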
Description
Summary: (summary)
Fix: #(issue)
Docs Issue/PR: (docs-issue-or-pr-link)
Reviewer: @(reviewer)
Checklist: