src/memos/templates/mem_reader_prompts.py: 99 changes (74 additions, 25 deletions)
@@ -227,6 +227,13 @@
SIMPLE_STRUCT_DOC_READER_PROMPT = """You are an expert text analyst for a search and retrieval system.
Your task is to process a document chunk and generate a single, structured JSON object.

Comment on lines 227 to 229
Copilot AI Mar 5, 2026

The PR description still contains template placeholders (e.g., “Fixes @issue_number”) and no concrete testing steps. Please update the description with the linked issue number and how this prompt change was validated (example inputs / regression checks), since prompt updates can materially change extraction behavior.

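One way to give the requested "regression checks" a concrete form is a schema test run over sample model responses, for example the JSON in the new worked examples. The sketch below is an assumption, not MemOS code: the helper name `check_reader_output` is hypothetical, and it only checks the fields the templates ask for (`"memory list"` items with `key`, `memory_type`, `value`, `tags`, plus a top-level `summary`), with `memory_type` limited to the two values the updated templates allow.

```python
import json

# Values the updated templates allow for "memory_type" (a tuple, so that
# unhashable values such as lists compare safely with `in`).
ALLOWED_MEMORY_TYPES = ("LongTermMemory", "UserMemory")


def check_reader_output(raw: str) -> list[str]:
    """Return schema problems found in one model response (hypothetical helper)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top level is not a JSON object"]

    problems: list[str] = []
    items = data.get("memory list")
    if not isinstance(items, list) or not items:
        problems.append('"memory list" is missing or empty')
        items = []
    for i, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {i}: not a JSON object")
            continue
        # "key" and "value" must be non-empty strings.
        for field in ("key", "value"):
            if not isinstance(item.get(field), str) or not item[field].strip():
                problems.append(f"item {i}: {field!r} is missing or empty")
        if item.get("memory_type") not in ALLOWED_MEMORY_TYPES:
            problems.append(f"item {i}: unexpected memory_type {item.get('memory_type')!r}")
        tags = item.get("tags")
        if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
            problems.append(f"item {i}: 'tags' must be a list of strings")

    if not isinstance(data.get("summary"), str):
        problems.append('"summary" is missing or not a string')
    return problems


sample = json.dumps({
    "memory list": [
        {"key": "k", "memory_type": "UserMemory", "value": "v", "tags": ["hiking"]}
    ],
    "summary": "s",
})
print(check_reader_output(sample))  # → []
```

Returning a list of problems instead of raising on the first one keeps a failing regression case readable when a prompt change breaks several fields at once.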
+If given context, use it as supplementary background for understanding the current document content, such as:
+- Explaining references to people, events, or entities in the document
+- Understanding the user's plans, actions, or preferences
+- Judging which information might be relevant to the user's memory
+
+The context is only for aiding understanding of the document content and should not be used to fabricate information that is not present in the original text.
+
Please perform:
1. Identify key information that reflects factual content, insights, decisions, or implications from the documents — including any notable themes, conclusions, or data points. Allow a reader to fully understand the essence of the chunk without reading the original text.
2. Resolve all time, person, location, and event references clearly:
@@ -242,15 +249,15 @@
- Prioritize completeness and fidelity over conciseness.
- Do not generalize or skip details that could be contextually meaningful.

-Return a single valid JSON object with the following structure:
+Return a valid JSON object:

{
"memory list": [
{
"key": <string, a concise title of the `value` field>,
-"memory_type": "LongTermMemory",
-"value": <A clear and accurate paragraph that comprehensively summarizes the main points, arguments, and information within the document chunk — written in English if the input memory items are in English, or in Chinese if the input is in Chinese>,
-"tags": <A list of relevant thematic keywords (e.g., ["deadline", "team", "planning"])>
+"memory_type": <"LongTermMemory" or "UserMemory">,
+"value": <A clear paragraph summarizing the main points, arguments, and information within the document chunk>,
+"tags": <A list of relevant thematic keywords>
}
...
],
@@ -263,6 +270,40 @@

{custom_tags_prompt}

+Example:
+
+Reference context:
+role-user: I plan to carry this for hiking on Mount Siguniang.
+role-bob: Me too.
+
+Input text chunk:
+This hiking backpack has a capacity of 40L and weighs approximately 1.2kg. It uses a lightweight aluminum frame structure designed for long-distance high-altitude trekking. The backpack also includes side straps that can secure cylindrical gear, allowing quick access to equipment during hiking.
+
+Output:
+{
+"memory list": [
+{
+"key": "40L lightweight high-altitude trekking backpack",
+"memory_type": "LongTermMemory",
+"value": "The document describes a hiking backpack with a 40L capacity and a weight of approximately 1.2kg. It features a lightweight aluminum frame structure designed to support long-distance trekking in high-altitude environments.",
+"tags": ["hiking backpack", "trekking gear", "outdoor equipment"]
+},
+{
+"key": "Backpack side strap equipment attachment design",
+"memory_type": "LongTermMemory",
+"value": "The backpack includes side strap structures that allow cylindrical gear to be secured externally, enabling hikers to quickly access equipment during movement.",
+"tags": ["backpack design", "gear attachment", "outdoor equipment"]
+},
+{
+"key": "Mount Siguniang hiking equipment plan",
+"memory_type": "UserMemory",
+"value": "Based on the provided context, the user and Bob plan to use this hiking backpack during a trekking trip to Mount Siguniang, indicating that this equipment is part of their hiking preparation.",
+"tags": ["user plan", "Mount Siguniang", "hiking"]
+}
+],
+"summary": "The text describes a hiking backpack designed for high-altitude trekking, with a 40L capacity and a weight of approximately 1.2kg. It features a lightweight aluminum frame structure for improved load stability and endurance during long-distance hiking. The backpack also includes side straps that allow cylindrical gear to be attached externally for quick access. Based on the provided context, the backpack is planned to be used by the user and Bob for a hiking trip to Mount Siguniang, linking the equipment description to the user's outdoor activity plans."
+}
+
If given context, use it as a supplement to the document information extraction; if no context is given, directly process the document information.
Reference context:
{context}
@@ -275,6 +316,13 @@
SIMPLE_STRUCT_DOC_READER_PROMPT_ZH = """您是搜索与检索系统的文本分析专家。
您的任务是处理文档片段,并生成一个结构化的 JSON 列表对象。

+如果提供了参考上下文,请将上下文作为理解当前文档内容的补充背景,例如:
+- 解释文档中的人物、事件或指代
+- 理解用户的计划、行为或偏好
+- 判断哪些信息可能与用户记忆相关
+
+上下文只用于辅助理解文档内容,不得编造原文中不存在的信息。
+
请执行以下操作:
1. 识别反映文档中事实内容、见解、决策或含义的关键信息——包括任何显著的主题、结论或数据点,使读者无需阅读原文即可充分理解该片段的核心内容。
2. 清晰解析所有时间、人物、地点和事件的指代:
@@ -296,7 +344,7 @@
"memory list": [
{
"key": <字符串,`value` 字段的简洁标题>,
-"memory_type": "LongTermMemory",
+"memory_type": <字符串,"LongTermMemory" 或 "UserMemory">,
"value": <一段清晰准确的段落,全面总结文档片段中的主要观点、论据和信息——若输入摘要为英文,则用英文;若为中文,则用中文>,
"tags": <相关主题关键词列表(例如,["截止日期", "团队", "计划"])>
}
@@ -316,42 +364,43 @@
{context}

示例:
+
+参考的上下文:
+role-user: 我打算背这个去四姑娘山徒步
+role-bob: 我也是
+
输入的文本片段:
-在Kalamang语中,亲属名词在所有格构式中的行为并不一致。名词 esa“父亲”和 ema“母亲”只能在技术称谓(teknonym)中与第三人称所有格后缀共现,而在非技术称谓用法中,带有所有格后缀是不合语法的。相比之下,大多数其他亲属名词并不允许所有格构式,只有极少数例外。
-语料中还发现一种“双重所有格标记”的现象,即名词同时带有所有格后缀和独立的所有格代词。这种构式在语料中极为罕见,其语用功能尚不明确,且多出现在马来语借词中,但也偶尔见于Kalamang本族词。
-此外,黏着词 =kin 可用于表达多种关联关系,包括目的性关联、空间关联以及泛指的群体所有关系。在此类构式中,被标记的通常是施事或关联方,而非被拥有物本身。这一用法显示出 =kin 可能处于近期语法化阶段。
+这款登山背包容量40L,重量约1.2kg,采用轻量化铝合金支架结构,适合高海拔长距离徒步使用。背包侧面带有可固定圆柱形随行物品的织带结构,方便在行走过程中快速取放装备。

输出:
{
"memory list": [
{
-"key": "亲属名词在所有格构式中的不一致行为",
+"key": "40L轻量化高海拔徒步登山背包",
"memory_type": "LongTermMemory",
-"value": "Kalamang语中的亲属名词在所有格构式中的行为存在显著差异,其中“父亲”(esa)和“母亲”(ema)仅能在技术称谓用法中与第三人称所有格后缀共现,而在非技术称谓中带所有格后缀是不合语法的。",
-"tags": ["亲属名词", "所有格", "语法限制"]
+"value": "文档描述了一款容量40L、重量约1.2kg的登山背包,该背包采用轻量化铝合金支架结构,适合高海拔长距离徒步环境使用,并具有良好的负重与稳定性设计。",
+"tags": ["登山背包", "徒步装备", "户外装备"]
},
{
-"key": "双重所有格标记现象",
+"key": "登山背包侧面固定随行装备结构",
"memory_type": "LongTermMemory",
-"value": "语料中存在名词同时带有所有格后缀和独立所有格代词的双重所有格标记构式,但该现象出现频率极低,其具体语用功能尚不明确。",
-"tags": ["双重所有格", "罕见构式", "语用功能"]
+"value": "该登山背包侧面配有织带结构,可用于固定圆柱形随行物品,使用户在徒步过程中能够快速取放相关装备,提高行走时的便利性。",
+"tags": ["背包设计", "侧挂结构", "户外装备"]
},
{
-"key": "双重所有格与借词的关系",
-"memory_type": "LongTermMemory",
-"value": "双重所有格标记多见于马来语借词中,但也偶尔出现在Kalamang本族词中,显示该构式并非完全由语言接触触发。",
-"tags": ["语言接触", "借词", "构式分布"]
-},
-{
-"key": "=kin 的关联功能与语法地位",
-"memory_type": "LongTermMemory",
-"value": "黏着词 =kin 用于表达目的性、空间或群体性的关联关系,其标记对象通常为关联方而非被拥有物,这表明 =kin 可能处于近期语法化过程中。",
-"tags": ["=kin", "关联关系", "语法化"]
+"key": "四姑娘山徒步装备计划",
+"memory_type": "UserMemory",
+"value": "结合上下文信息,用户和Bob计划在四姑娘山徒步活动中使用该款登山背包作为随身装备,这表明该装备已被纳入他们的徒步行程准备。",
+"tags": ["用户计划", "四姑娘山", "徒步"]
}
],
-"summary": "该文本描述了Kalamang语中所有格构式的多样性与不对称性。亲属名词在所有格标记上的限制显示出语义类别内部的分化,而罕见的双重所有格构式则反映了构式层面的不稳定性。同时,=kin 的多功能关联用法及其分布特征为理解该语言的语法化路径提供了重要线索。"
+"summary": "该文本介绍了一款适用于高海拔徒步环境的登山背包,其容量为40L、重量约1.2kg,并采用轻量化铝合金支架结构以提高负重稳定性。此外,背包侧面还设计了可固定圆柱形随行物品的织带结构,方便在徒步过程中快速取放装备。结合用户提供的上下文信息,该装备被计划用于四姑娘山徒步行程,因此不仅具有装备信息价值,也与用户的户外活动计划相关。"
}

+如果给定了上下文,就结合上下文信息作为文档信息提取的补充,如果没有给定上下文,请直接处理文档信息。
+参考的上下文:
+{context}
+
Comment on lines +400 to +403
Copilot AI Mar 5, 2026

SIMPLE_STRUCT_DOC_READER_PROMPT_ZH repeats the “如果给定了上下文...” instruction and includes {context} twice (lines 400-403 duplicate lines 362-364). This adds unnecessary tokens and can confuse the model about which context to use. Recommend keeping a single context instruction + {context} placeholder (ideally right before 文档片段: {chunk_text}) and removing the duplicate block.

Suggested change
-如果给定了上下文,就结合上下文信息作为文档信息提取的补充,如果没有给定上下文,请直接处理文档信息。
-参考的上下文:
-{context}
-

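The duplication flagged in this comment can also be caught mechanically before a prompt change is merged. A minimal sketch, assuming the templates are plain Python strings that spell their placeholders as `{context}` and `{chunk_text}`; the helper name, the placeholder list, and the toy template are illustrative only, not part of MemOS:

```python
# Placeholders each reader prompt is expected to contain exactly once
# (an assumption based on the templates in this diff).
EXPECTED_ONCE: tuple[str, ...] = ("context", "chunk_text")


def placeholders_not_exactly_once(
    template: str, names: tuple[str, ...] = EXPECTED_ONCE
) -> dict[str, int]:
    """Return {name: count} for placeholders that are missing or repeated (hypothetical helper)."""
    counts = {name: template.count("{" + name + "}") for name in names}
    return {name: n for name, n in counts.items() if n != 1}


# Toy template with the same shape as the duplicated block (not the real prompt text).
toy = "参考的上下文:\n{context}\n\n示例: ...\n\n参考的上下文:\n{context}\n\n文档片段:\n{chunk_text}\n"
print(placeholders_not_exactly_once(toy))  # → {'context': 2}
```

Flagging missing placeholders (count 0) as well as repeated ones lets a single unit test over each prompt constant guard against both kinds of template drift.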
文档片段:
{chunk_text}
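Because these templates mix placeholders (`{context}`, `{chunk_text}`, `{custom_tags_prompt}`) with literal JSON braces in the output schema and worked examples, `str.format()` cannot fill them directly: it would try to parse every `{` as a replacement field. How MemOS actually renders these prompts is not shown in this diff; the sketch below (a hypothetical `render_prompt`) is one safe option under that assumption, substituting only known names in a single regex pass.

```python
import re


def render_prompt(template: str, values: dict[str, str]) -> str:
    """Fill known {name} placeholders in one pass, leaving all other braces intact.

    Hypothetical helper: str.format() is avoided because the template contains
    literal JSON braces, which format() would treat as replacement fields.
    """
    if not values:
        return template
    # Match only the placeholder names we were given, e.g. {context} or {chunk_text}.
    pattern = re.compile(r"\{(" + "|".join(map(re.escape, values)) + r")\}")
    # A single sub() pass never rescans inserted text, so a value that itself
    # contains "{context}" is not substituted a second time.
    return pattern.sub(lambda m: values[m.group(1)], template)


template = 'Return a valid JSON object:\n{\n"summary": ...\n}\nReference context:\n{context}\n{chunk_text}\n'
print(render_prompt(template, {"context": "role-bob: Me too.", "chunk_text": "This hiking backpack ..."}))
```

Any placeholder not listed in `values` is left verbatim, so a caller should pass every name the template uses (for the English template that includes `{custom_tags_prompt}`).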
