add new flag to skip hints with the current goal in the hint source t… #310
Review by Korbit AI
Korbit automatically attempts to detect when you fix issues in new commits.
| Category | Issue | Status |
|---|---|---|
| | Inefficient row-wise apply for goal filtering | ✅ Fix detected |
| | Missing error handling for JSON parsing in CSV converters | |
| | Eager JSON parsing during CSV load | |
Files scanned
| File Path | Reviewed |
|---|---|
| src/agentlab/utils/hinting.py | ✅ |
src/agentlab/utils/hinting.py
Outdated
    def get_current_goal_hints(self, goal_str: str):
        hints_df = self.hint_db[
            self.hint_db.apply(lambda x: goal_str in x.source_trace_goals, axis=1)
This comment was marked as resolved.
    converters={
        "trace_paths_json": lambda x: json.loads(x) if pd.notna(x) else [],
        "source_trace_goals": lambda x: json.loads(x) if pd.notna(x) else [],
    },
Eager JSON parsing during CSV load 
What is the issue?
JSON parsing is performed for every row during CSV loading, even for columns that may not be used frequently.
Why this matters
This eager parsing approach increases memory usage and loading time, especially for large CSV files. If these JSON fields are only accessed occasionally, the parsing overhead is wasted for unused data.
Suggested change ∙ Feature Preview
Consider lazy parsing by keeping JSON fields as strings initially and parsing them on-demand:
    # Load without converters initially
    self.hint_db = pd.read_csv(self.hint_db_path, header=0, index_col=None, dtype=str)

    # Parse JSON only when needed
    def _get_source_trace_goals(self, row_index):
        goals_str = self.hint_db.loc[row_index, 'source_trace_goals']
        return json.loads(goals_str) if pd.notna(goals_str) else []
    dtype=str,
    converters={
        "trace_paths_json": lambda x: json.loads(x) if pd.notna(x) else [],
        "source_trace_goals": lambda x: json.loads(x) if pd.notna(x) else [],
Missing error handling for JSON parsing in CSV converters 
What is the issue?
The JSON parsing converter does not handle potential JSON parsing errors that could occur with malformed JSON data in the CSV file.
Why this matters
If the CSV contains invalid JSON in the source_trace_goals column, json.loads() will raise a JSONDecodeError, causing the entire hint database loading to fail and breaking the application startup.
Suggested change ∙ Feature Preview
Add try-except blocks to handle JSON parsing errors gracefully:
    "source_trace_goals": lambda x: json.loads(x) if pd.notna(x) else [] if pd.notna(x) else [],
Better solution:
    import logging

    # Assumed module-level logger; the original snippet uses `logger` without defining it.
    logger = logging.getLogger(__name__)

    def safe_json_parse(x):
        # Parse a JSON cell; return an empty list for missing or malformed values.
        if pd.notna(x):
            try:
                return json.loads(x)
            except json.JSONDecodeError:
                logger.warning(f"Failed to parse JSON: {x}")
                return []
        return []

    # Then use:
    "source_trace_goals": safe_json_parse,
    "trace_paths_json": safe_json_parse,
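One case the suggestion above does not spell out: as far as I understand, pandas `read_csv` converters receive the raw cell text, so an empty cell would arrive as an empty string rather than NaN, and `json.loads("")` raises `JSONDecodeError`. The sketch below is a stdlib-only variant that treats empty cells as missing. The name `parse_json_list` and the choice to return `[]` for non-list JSON are assumptions for illustration, not part of this PR.

```python
import json
import logging

logger = logging.getLogger(__name__)


def parse_json_list(cell):
    """Parse a CSV cell expected to hold a JSON list.

    Hypothetical helper: returns [] for empty, missing, malformed,
    or non-list values instead of raising.
    """
    # Empty or whitespace-only cells are treated as "no value".
    if cell is None or not str(cell).strip():
        return []
    try:
        value = json.loads(cell)
    except (json.JSONDecodeError, TypeError):
        # TypeError covers non-string inputs such as a float NaN.
        logger.warning("Failed to parse JSON cell: %r", cell)
        return []
    # Assumption: downstream code expects a list, so anything else is dropped.
    return value if isinstance(value, list) else []
```

A function like this could be passed in place of the lambdas in the `converters` mapping; whether silently dropping bad rows is acceptable depends on how the hint database is produced.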
    skip_hints = []
    if self.skip_hints_for_current_task:
        skip_hints = self.get_current_task_hints(task_name)
        skip_hints += self.get_current_task_hints(task_name)
It's a minor issue but if skip_hints_for_current_task is True, there is no need to check for skip_hints_for_current_goal, right?
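A minimal sketch of what this suggestion could look like, assuming that every hint tied to the current goal is also covered by the current task's hints (the thread does not confirm this). The function name `collect_skip_hints` and its callable parameters are hypothetical and stand in for `get_current_task_hints` and `get_current_goal_hints`.

```python
from typing import Callable, List


def collect_skip_hints(
    skip_for_task: bool,
    skip_for_goal: bool,
    get_task_hints: Callable[[], List[str]],
    get_goal_hints: Callable[[], List[str]],
) -> List[str]:
    """Return the hints to exclude.

    The goal lookup runs only when task-level skipping is off, which is
    the short-circuit proposed in the review comment.
    """
    if skip_for_task:
        return list(get_task_hints())
    if skip_for_goal:
        return list(get_goal_hints())
    return []
```

If the subset assumption does not hold, for example when the same goal appears in traces of a different task, both lookups would still be needed and their results merged.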
* fixes
* add new deps
* use external embedding service in task hints retrieval
* gpt5 fixes
* first cut
* update
* add event listeners and launcher
* Add codegen step-wise recoder agent
* adding task hints to generic agent
* fix repeated llm configs
* load env vars in codegen agent
* same hints retrieval for both generic and tooluse agents
* filter out current task hints if needed
* fix llm config, add gpt-5
* fix
* pass new flag and fix db path passing issue
* fix goal text
* fix current task hints exclusion
* remove old reqs
* remove recorder from that brach
* log task errors
* expore agentlabxray
* remove commented old chunk
* share xray only when env flag present
* Add StepWiseQueriesPrompt for enhanced query handling in GenericAgent
* update hinting agent retrieval
* stepwise hint retrieval
* added shrink method
* (wip) refactor hinting index
* (wip) clean up prompt file
* add scripts to run generic and hinter agents, update tmlr config for hinter
* move HintsSource to separate hinting file
* update hinter agent and prompt
* fix prompt for task hint
* undo changes to tmlr config
* update hinter agent
* formatting
* bug fix hint retrieval
* improve launch script
* get queries only for step level hint
* Add webarenalite to agentlab loop.py
* update stepwise hint queries prompt
* fix exc logging
* non empty instruction
* allow less then max hint queries
* add generic agent gpt5-nano config
* make ray available on toolkit
* check that hints db exists
* Fix assignment of queries_for_hints variable
* Improve generic agent hinter (#309)
* Make LLM retreival topic index selection more robust
* add new flag to skip hints with the current goal in the hint source t… (#310)
* add new flag to skip hints with the current goal in the hint source traces
* Rename generic agent hinter to hint_use_agent (#311)
* rename generic_agent_hinter to hint_use_agent for clarity
* Add deprecation warning and module alias for generic_agent_hinter
* improve module aliasing for submodules
* Add todo rename agent name
* black
* bugfix: check for hint_db only when use_task_hint is true.
* fix: address missing initialization and correct args reference in choose_hints method
* black
* bugfix: skip HintSource init if use_task_hint is false
* Fix incorrect references for docs retrieval hinter agent (#313)
* address comments
* format
* Add Environment Variable for Ray port (#315)
* add env variable for ray port
* document env variables
* undo removed llm_config
* undo unnessary change
* add missing default values for hint prompt flags
* black
* update names in scripts
* use default prompt in hintSource for Tool Use agent
* remove experiment scripts
---------
Co-authored-by: Oleh Shliazhko <[email protected]>
Co-authored-by: Hadi Nekoei <[email protected]>
Co-authored-by: Oleh Shliazhko <[email protected]>
Co-authored-by: recursix <[email protected]>
Co-authored-by: Patrice Bechard <[email protected]>
Co-authored-by: Hadi Nekoei <[email protected]>
Co-authored-by: Patrice Bechard <[email protected]>
This pull request enhances the hint selection logic in src/agentlab/utils/hinting.py to allow skipping hints not only for the current task but also for the current goal. It also improves how the hint database is loaded by parsing certain columns as JSON, making downstream processing more robust.
Hint skipping improvements:
* Added a skip_hints_for_current_goal parameter to the hinting utility, enabling the option to skip hints associated with the current goal in addition to the current task.
* Updated the choose_hints_llm and choose_hints_emb methods to use the new goal-based skipping logic, ensuring hints related to the current goal are excluded when specified.
* Added a get_current_goal_hints method that retrieves hints associated with the current goal from the hint database.
Hint database loading improvements:
* The trace_paths_json and source_trace_goals columns are now parsed as JSON objects, improving data consistency and simplifying later access.
…races
Description by Korbit AI
What change is being made?
Add a new flag skip_hints_for_current_goal to optionally skip hints that relate to the current goal, and implement the underlying logic and data handling to support skipping based on the current goal.
Why are these changes being made?
To allow finer control over hint selection by excluding hints tied to the current goal, reducing noise or leakage of goal-specific hints. This is implemented by parsing stored goal references and filtering hints accordingly, with the new flag applied in both LLM and embedding-based hint retrieval paths.
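The goal-based filtering described above can be sketched without pandas as follows. This is an illustration under stated assumptions, not the code in hinting.py: rows are plain dicts, the `hint` key is a hypothetical column name, and `source_trace_goals` is assumed to be an already-parsed list of goal strings matched exactly.

```python
from typing import Dict, List


def hints_for_goal(hint_rows: List[Dict], goal_str: str) -> List[str]:
    """Return hint texts whose source traces include goal_str (exact match)."""
    return [
        row["hint"]
        for row in hint_rows
        if goal_str in row.get("source_trace_goals", [])
    ]


def filter_hints(
    hint_rows: List[Dict], goal_str: str, skip_hints_for_current_goal: bool
) -> List[str]:
    """Return candidate hints, dropping goal-derived ones when the flag is set."""
    if not skip_hints_for_current_goal:
        return [row["hint"] for row in hint_rows]
    skip = set(hints_for_goal(hint_rows, goal_str))
    return [row["hint"] for row in hint_rows if row["hint"] not in skip]


rows = [
    {"hint": "use the search bar", "source_trace_goals": ["find a red shirt"]},
    {"hint": "check the cart icon", "source_trace_goals": ["buy a mug"]},
]
remaining = filter_hints(rows, "find a red shirt", skip_hints_for_current_goal=True)
```

Here `remaining` keeps only the hint whose source traces do not mention the current goal, which is the leakage the flag is meant to prevent.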