Collaborator

@amanjaiswal73892 amanjaiswal73892 commented Oct 21, 2025

This pull request enhances the hint selection logic in src/agentlab/utils/hinting.py to allow skipping hints not only for the current task but also for the current goal. It also improves how the hint database is loaded by parsing certain columns as JSON, making downstream processing more robust.

Hint skipping improvements:

  • Added a skip_hints_for_current_goal parameter to the hinting utility, enabling the option to skip hints associated with the current goal in addition to the current task. [1] [2]
  • Updated the choose_hints_llm and choose_hints_emb methods to use the new goal-based skipping logic, ensuring hints related to the current goal are excluded when specified. [1] [2]
  • Added a new get_current_goal_hints method that retrieves hints associated with the current goal from the hint database.
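The interplay of the two skip flags can be sketched without the real hint database. Everything here is illustrative: `HintRecord`, its `task_name` field, and `hints_to_skip` are hypothetical names, not the code in `hinting.py`; only `source_trace_goals` and the two flag concepts come from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class HintRecord:
    """Hypothetical stand-in for one row of the hint database."""
    hint: str
    task_name: str  # assumed column name, not confirmed by the PR
    source_trace_goals: list = field(default_factory=list)


def hints_to_skip(records, task_name, goal_str, skip_task=True, skip_goal=True):
    """Collect hint texts to exclude, preserving first-seen order."""
    skipped = []
    for record in records:
        from_task = skip_task and record.task_name == task_name
        from_goal = skip_goal and goal_str in record.source_trace_goals
        if (from_task or from_goal) and record.hint not in skipped:
            skipped.append(record.hint)
    return skipped


records = [
    HintRecord("Use the search bar", "shop.1", ["buy a book"]),
    HintRecord("Open the cart first", "shop.2", ["buy a book", "buy a pen"]),
    HintRecord("Sort by date", "forum.7", ["find the latest post"]),
]

# The second hint belongs to another task but shares the goal,
# so only the goal-based check excludes it.
print(hints_to_skip(records, "shop.1", "buy a book"))
```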

Hint database loading improvements:

  • Modified the CSV loading logic to automatically parse the trace_paths_json and source_trace_goals columns as JSON objects, improving data consistency and simplifying later access. [1] [2]
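A small, self-contained sketch of the loading idea, assuming a two-column CSV held in memory. The `hint` column, the sample rows, and `parse_json_list` are made up for illustration; the converter deliberately tolerates blank or non-string cells so it does not depend on how pandas represents an empty field.

```python
import io
import json

import pandas as pd


def parse_json_list(cell):
    """Parse a JSON-encoded cell; blank or non-string cells become []."""
    if not isinstance(cell, str) or not cell.strip():
        return []
    return json.loads(cell)


# Hypothetical data: the second row has an empty source_trace_goals cell.
csv_text = '''hint,source_trace_goals
Use the search bar,"[""buy a book""]"
Sort by date,
'''

hint_db = pd.read_csv(
    io.StringIO(csv_text),
    converters={"source_trace_goals": parse_json_list},
)

# Each cell is now a Python list, so membership tests work directly.
print(hint_db["source_trace_goals"].tolist())
```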

Description by Korbit AI

What change is being made?

Add a new flag skip_hints_for_current_goal to optionally skip hints that relate to the current goal, and implement the underlying logic and data handling to support skipping based on the current goal.

Why are these changes being made?

To allow finer control over hint selection by excluding hints tied to the current goal, reducing noise or leakage of goal-specific hints. This is implemented by parsing stored goal references and filtering hints accordingly, with the new flag applied in both LLM and embedding-based hint retrieval paths.

@korbit-ai korbit-ai bot left a comment

Review by Korbit AI

Korbit automatically attempts to detect when you fix issues in new commits.
Category | Issue | Status
Performance | Inefficient row-wise apply for goal filtering | ✅ Fix detected
Error Handling | Missing error handling for JSON parsing in CSV converters |
Performance | Eager JSON parsing during CSV load |
Files scanned
File Path Reviewed
src/agentlab/utils/hinting.py


def get_current_goal_hints(self, goal_str: str):
    hints_df = self.hint_db[
        self.hint_db.apply(lambda x: goal_str in x.source_trace_goals, axis=1)

This comment was marked as resolved.
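The thread does not show which fix was applied. One common way to avoid a row-wise `DataFrame.apply(..., axis=1)` is to build the boolean mask from the single column that is needed. The sketch below assumes `source_trace_goals` already holds Python lists; `goal_mask` and the sample frame are hypothetical.

```python
import pandas as pd


def goal_mask(hint_db: pd.DataFrame, goal_str: str) -> pd.Series:
    """Boolean mask of rows whose source_trace_goals list contains goal_str."""
    goals_column = hint_db["source_trace_goals"]
    # Iterating over one column avoids materialising a Series per row.
    return pd.Series(
        [goal_str in goals for goals in goals_column],
        index=hint_db.index,
        dtype=bool,
    )


hint_db = pd.DataFrame(
    {
        "hint": ["Use the search bar", "Sort by date"],
        "source_trace_goals": [["buy a book"], ["find the latest post"]],
    }
)

matching = hint_db[goal_mask(hint_db, "buy a book")]
print(matching["hint"].tolist())
```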

Comment on lines +56 to +59
converters={
    "trace_paths_json": lambda x: json.loads(x) if pd.notna(x) else [],
    "source_trace_goals": lambda x: json.loads(x) if pd.notna(x) else [],
},
Eager JSON parsing during CSV load (category: Performance)
What is the issue?

JSON parsing is performed for every row during CSV loading, even for columns that may not be used frequently.

Why this matters

This eager parsing approach increases memory usage and loading time, especially for large CSV files. If these JSON fields are only accessed occasionally, the parsing overhead is wasted for unused data.

Suggested change ∙ Feature Preview

Consider lazy parsing by keeping JSON fields as strings initially and parsing them on-demand:

# Load without converters initially
self.hint_db = pd.read_csv(self.hint_db_path, header=0, index_col=None, dtype=str)

# Parse JSON only when needed
def _get_source_trace_goals(self, row_index):
    goals_str = self.hint_db.loc[row_index, 'source_trace_goals']
    return json.loads(goals_str) if pd.notna(goals_str) else []
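If on-demand parsing were adopted, repeated lookups of the same cell would re-parse the same string each time. A possible refinement, sketched here with a hypothetical `parse_goals` helper, is to memoise on the raw string; it returns a tuple so that cached results cannot be mutated by callers.

```python
import json
from functools import lru_cache


@lru_cache(maxsize=None)
def parse_goals(raw: str) -> tuple:
    """Parse a JSON array string once and reuse the result afterwards."""
    return tuple(json.loads(raw)) if raw else ()


goals = parse_goals('["buy a book", "buy a pen"]')
print("buy a pen" in goals)
```

Whether this is worth it depends on how often the same rows are read; for a small hint database the eager converters in the PR may be simpler.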
dtype=str,
converters={
    "trace_paths_json": lambda x: json.loads(x) if pd.notna(x) else [],
    "source_trace_goals": lambda x: json.loads(x) if pd.notna(x) else [],
Missing error handling for JSON parsing in CSV converters (category: Error Handling)
What is the issue?

The JSON parsing converter does not handle potential JSON parsing errors that could occur with malformed JSON data in the CSV file.

Why this matters

If the CSV contains invalid JSON in the source_trace_goals column, json.loads() will raise a JSONDecodeError, causing the entire hint database loading to fail and breaking the application startup.

Suggested change ∙ Feature Preview

Add try-except blocks to handle JSON parsing errors gracefully:

"source_trace_goals": lambda x: json.loads(x) if pd.notna(x) else [] if pd.notna(x) else [],

Better solution:

def safe_json_parse(x):
    if pd.notna(x):
        try:
            return json.loads(x)
        except json.JSONDecodeError:
            logger.warning(f"Failed to parse JSON: {x}")
            return []
    return []

# Then use:
"source_trace_goals": safe_json_parse,
"trace_paths_json": safe_json_parse,
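A variant of the same idea that uses only the standard library, as a sketch rather than the merged code. `safe_json_list` is a hypothetical name. It treats blank and non-string cells as empty without logging, and wraps a non-list JSON value in a list so that a later `goal_str in goals` test cannot silently become a substring match on a bare string.

```python
import json
import logging

logger = logging.getLogger(__name__)


def safe_json_list(cell) -> list:
    """Return the cell parsed as a list; fall back to [] on bad input."""
    if not isinstance(cell, str) or not cell.strip():
        return []  # missing value (NaN, None) or blank cell
    try:
        parsed = json.loads(cell)
    except json.JSONDecodeError:
        logger.warning("Failed to parse JSON cell: %r", cell[:80])
        return []
    return parsed if isinstance(parsed, list) else [parsed]


print(safe_json_list('["buy a book"]'))
```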
skip_hints = []
if self.skip_hints_for_current_task:
    skip_hints = self.get_current_task_hints(task_name)
    skip_hints += self.get_current_task_hints(task_name)
Collaborator

It's a minor issue but if skip_hints_for_current_task is True, there is no need to check for skip_hints_for_current_goal, right?
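Whether the second check is redundant depends on the data: it is only redundant if every hint tied to the current goal was also recorded under the current task, which the thread does not establish. If both checks are kept, an order-preserving union keeps the combined list free of duplicates. `merge_skip_lists` and the sample hints below are hypothetical.

```python
def merge_skip_lists(task_hints, goal_hints):
    """Union of both lists, keeping first-seen order and dropping duplicates."""
    return list(dict.fromkeys([*task_hints, *goal_hints]))


task_hints = ["Use the search bar"]
goal_hints = ["Use the search bar", "Open the cart first"]

merged = merge_skip_lists(task_hints, goal_hints)
print(merged)
```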

@hnekoeiq hnekoeiq merged commit f06c6d0 into generic_agent_hinter Oct 21, 2025
6 checks passed
@hnekoeiq hnekoeiq deleted the improve_generic_agent_hinter branch October 21, 2025 21:40
amanjaiswal73892 added a commit that referenced this pull request Nov 27, 2025
* fixes

* add new deps

* use external embedding service in task hints retrieval

* gpt5 fixes

* first cut

* update

* add event listeners and launcher

* Add codegen step-wise recoder agent

* adding task hints to generic agent

* fix repeated llm configs

* load env vars in codegen agent

* same hints retrieval for both generic and tooluse agents

* filter out current task hints if needed

* fix llm config, add gpt-5

* fix

* pass new flag and fix db path passing issue

* fix goal text

* fix current task hints exclusion

* remove old reqs

* remove recorder from that brach

* log task errors

* expore agentlabxray

* remove commented old chunk

* share xray only when env flag present

* Add StepWiseQueriesPrompt for enhanced query handling in GenericAgent

* update hinting agent retrieval

* stepwise hint retrieval

* added shrink method

* (wip) refactor hinting index

* (wip) clean up prompt file

* add scripts to run generic and hinter agents, update tmlr config for hinter

* move HintsSource to separate hinting file

* update hinter agent and prompt

* fix prompt for task hint

* undo changes to tmlr config

* update hinter agent

* formatting

* bug fix hint retrieval

* improve launch script

* get queries only for step level hint

* Add webarenalite to agentlab loop.py

* update stepwise hint queries prompt

* fix exc logging

* non empty instruction

* allow less then max hint queries

* add generic agent gpt5-nano config

* make ray available on toolkit

* check that hints db exists

* Fix assignment of queries_for_hints variable

* Improve generic agent hinter (#309)

* Make LLM retreival topic index selection more robust

* add new flag to skip hints with the current goal in the hint source t… (#310)

* add new flag to skip hints with the current goal in the hint source traces

* Rename generic agent hinter to hint_use_agent (#311)

* rename generic_agent_hinter to hint_use_agent for clarity

* Add deprecation warning and module alias for generic_agent_hinter

* improve module aliasing for submodules

* Add todo rename agent name

* black

* bugfix: check for hint_db only when use_task_hint is true.

* fix: address missing initialization and correct args reference in choose_hints method

* black

* bugfix: skip HintSource init if use_task_hint is false

* Fix incorrect references for docs retrieval hinter agent (#313)

* address comments

* format

* Add Environment Variable for Ray port (#315)

* add env variable for ray port

* document env variables

* undo removed llm_config

* undo unnessary change

* add missing default values for hint prompt flags

* black

* update names in scripts

* use default prompt in hintSource for Tool Use agent

* remove experiment scripts

---------

Co-authored-by: Oleh Shliazhko <[email protected]>
Co-authored-by: Hadi Nekoei <[email protected]>
Co-authored-by: Oleh Shliazhko <[email protected]>
Co-authored-by: recursix <[email protected]>
Co-authored-by: Patrice Bechard <[email protected]>
Co-authored-by: Hadi Nekoei <[email protected]>
Co-authored-by: Patrice Bechard <[email protected]>
3 participants