Rough draft of tool support by robotrapta · Pull Request #3 · groundlight/trl

robotrapta · 2025-02-07T00:28:04Z

No description provided.


sunildkumar

nice! left some questions.

sunildkumar · 2025-02-07T00:35:09Z

trl/trainer/qwen_grpo_trainer.py

+        # Check if the stop string is in the completions
+        # We need to convert the tensor to a string.
+        if self.tool_defn.completion_has_tool_call(prompt_completion_str):
+            tool_response_str = self.tool_defn.call_tool(prompt_completion_str)


why doesn't the dataclass have call_tool?
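One way to answer this is to keep detection and execution on the same object. The following is a minimal sketch that assumes a tag-delimited call format. The class name `ToolDefinition`, the `<tool>`/`</tool>`/`<result>` tags, and the field names are hypothetical and do not describe the PR's actual `tool_defn` type. It also catches exceptions raised inside the tool, in the spirit of the later commit "Catches exceptions in tool calls."

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolDefinition:
    """Hypothetical tool definition: detects a tool call in decoded text and runs it."""

    name: str
    fn: Callable[[str], str]
    start_tag: str = "<tool>"  # assumed call format, not taken from the PR
    end_tag: str = "</tool>"

    def completion_has_tool_call(self, text: str) -> bool:
        # Generation is assumed to stop right after the closing tag, so only the tail is checked.
        return text.rstrip().endswith(self.end_tag)

    def call_tool(self, text: str) -> str:
        # Use the last call in the text; earlier calls were already answered.
        start = text.rfind(self.start_tag)
        end = text.rfind(self.end_tag)
        if start == -1 or end < start:
            return "<result>error: no complete tool call found</result>"
        args = text[start + len(self.start_tag):end].strip()
        try:
            result = self.fn(args)
        except Exception as exc:  # report the failure to the model instead of crashing the rollout
            result = f"error: {exc}"
        return f"<result>{result}</result>"
```

With both methods on the dataclass, the trainer only needs `self.tool_defn` and never has to know how a call is parsed or executed.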

sunildkumar · 2025-02-07T00:53:32Z

trl/trainer/qwen_grpo_trainer.py

+        # We need to convert the tensor to a string.
+        if self.tool_defn.completion_has_tool_call(prompt_completion_str):
+            tool_response_str = self.tool_defn.call_tool(prompt_completion_str)
+            tool_response_ids_list = self.processing_class.tokenizer.encode(tool_response_str, add_special_tokens=False)


I'm assuming this doesn't add an extra BOS token?
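In Hugging Face tokenizers, `add_special_tokens=False` is the switch that suppresses special tokens such as BOS, so the call as written should not insert one. For a given tokenizer, you can confirm this by comparing `tokenizer.encode(s, add_special_tokens=False)` with `tokenizer.encode(s)`, or by checking `tokenizer.bos_token_id`. If you want an extra guard when splicing the tool response into the sequence, here is a minimal sketch. The helper name `splice_tool_response` is hypothetical.

```python
from __future__ import annotations


def splice_tool_response(
    prompt_completion_ids: list[int],
    tool_response_ids: list[int],
    bos_token_id: int | None,
) -> list[int]:
    """Hypothetical helper: append tool-response ids, dropping a stray leading BOS if present."""
    if bos_token_id is not None and tool_response_ids[:1] == [bos_token_id]:
        tool_response_ids = tool_response_ids[1:]
    return prompt_completion_ids + tool_response_ids
```

In the trainer you would pass `self.processing_class.tokenizer.bos_token_id`, which is `None` for tokenizers that define no BOS token. In that case the guard does nothing.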

sunildkumar · 2025-02-07T01:08:42Z

trl/trainer/qwen_grpo_trainer.py

        return inputs

+    def _generate_completion(
+        self, model: PreTrainedModel, prompt_inputs: dict[str, torch.Tensor]


I think prompt_inputs might be a BatchFeature: https://huggingface.co/docs/transformers/en/main_classes/feature_extractor#transformers.BatchFeature

Also, thanks for making this a function; it's clearly the right move.
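Here is a sketch of the annotation this comment suggests. `BatchFeature` subclasses `collections.UserDict`, so `model.generate(**prompt_inputs)` unpacks it exactly like a dict. The `TYPE_CHECKING` guard keeps `torch` and `transformers` out of the runtime imports. The class name `ToolTrainerSketch` and the method body are placeholders, not the PR's code.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import TYPE_CHECKING, Any

if TYPE_CHECKING:  # evaluated only by type checkers, so there is no runtime dependency
    import torch
    from transformers import BatchFeature, PreTrainedModel


class ToolTrainerSketch:
    """Hypothetical stand-in for the trainer; only the signature matters here."""

    def _generate_completion(
        self,
        model: PreTrainedModel,
        prompt_inputs: BatchFeature | Mapping[str, torch.Tensor],
    ) -> Any:
        # BatchFeature is a UserDict, so ** unpacking works the same as for a plain dict.
        return model.generate(**prompt_inputs)
```

The same annotation would apply to `_generate_single_completion_with_tools`.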

sunildkumar · 2025-02-07T01:10:15Z

trl/trainer/qwen_grpo_trainer.py

+        return prompt_completion_ids
+
+    def _generate_single_completion_with_tools(
+        self, model: PreTrainedModel, prompt_inputs: dict[str, torch.Tensor], max_steps: int = 10


same nit here - BatchFeature

sunildkumar · 2025-02-07T01:11:00Z

trl/trainer/qwen_grpo_trainer.py

+        (Note that 46*44 is 2024).
+        """
+        conv = SingleConversationWithTools(prompt_inputs, self.tool_defn, self.processing_class)
+        # Loop until tool isn't called, of we max out


Suggested change

# Loop until tool isn't called, of we max out

# Loop until tool isn't called, or we max out
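The docstring quoted in this thread gives `image_grid_thw` as [1, 46, 44] and `pixel_values` as 2024x1176, and notes that 46*44 is 2024. All three numbers follow from the grid if we assume Qwen2-VL's default vision settings: 14-pixel patches, temporal patch size 2, 3 channels, and a 2x2 spatial merge. The PR doesn't state these settings, so check them against the model config. The function name is hypothetical.

```python
# Assumed Qwen2-VL vision defaults; not stated in the PR, so verify against the model config.
CHANNELS = 3
PATCH_SIZE = 14
TEMPORAL_PATCH_SIZE = 2
SPATIAL_MERGE_SIZE = 2


def expected_vision_shapes(grid_thw: tuple[int, int, int]) -> tuple[tuple[int, int], int]:
    """Return (pixel_values shape, number of image placeholder tokens) for one image grid."""
    t, h, w = grid_thw
    num_patches = t * h * w  # rows of pixel_values: 1 * 46 * 44 = 2024
    patch_dim = CHANNELS * TEMPORAL_PATCH_SIZE * PATCH_SIZE * PATCH_SIZE  # columns: 3 * 2 * 14 * 14 = 1176
    num_image_tokens = num_patches // SPATIAL_MERGE_SIZE**2  # each 2x2 block of patches becomes one token
    return (num_patches, patch_dim), num_image_tokens


print(expected_vision_shapes((1, 46, 44)))  # → ((2024, 1176), 506)
```

Under these assumptions, 506 of the 710 `input_ids` would be image placeholders. As far as we know, 151655 is the id of Qwen2-VL's `<|image_pad|>` token, which would explain the run of 151655 in the middle of `input_ids`.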

sunildkumar · 2025-02-07T01:22:04Z

trl/trainer/qwen_grpo_trainer.py

+        - input_ids: [1, 710] ints.  Some stuff at the beginning and the end, the middle full of 151655
+        - attention_mask: [1, 710] ints.  All 1
+        - pixel_values: 2024x1176 floats.  The image.
+        - image_grid_thw: a 1x3 tensor with values: [1, 46, 44].


nit: maybe add a short comment about what max_steps is.

My understanding: generation stops once a tool is called, and then this code processes the tool call. So max_steps is the maximum number of tool calls we're willing to process for a single completion?
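Here is a minimal sketch of the loop described above. Hypothetical callables stand in for generation, tool detection, and tool execution. In this sketch, `max_steps` counts generation rounds. That means at most `max_steps - 1` tool responses are spliced in, and the last round may end on an unanswered call. The PR may count differently, which is why a comment on the parameter would help.

```python
from typing import Callable


def generate_with_tools(
    conversation: str,
    generate_step: Callable[[str], str],
    has_tool_call: Callable[[str], bool],
    call_tool: Callable[[str], str],
    max_steps: int = 10,
) -> str:
    """Hypothetical: alternate generation and tool calls for a single completion.

    max_steps caps the number of generation rounds, which bounds how many
    tool calls get answered (at most max_steps - 1).
    """
    for step in range(max_steps):
        conversation += generate_step(conversation)  # generation stops at a tool call or at EOS
        if not has_tool_call(conversation) or step == max_steps - 1:
            break  # loop until a tool isn't called, or we max out
        conversation += call_tool(conversation)  # splice in the tool response and keep going
    return conversation
```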

robotrapta added 11 commits February 5, 2025 18:03

First crack at tool hook.

5f15aaa

It's calling my completion now.

29a8ef1

Progress - sorta maybe almost incorporating the tool response into the conversation.

c1c8a92

Computes completions + tool-calls individually and pads them together.

e6e4761

Tool call with strings not ids.

7d72188

Much closer to incorporating tool responses.

69291c4

More debug output.

92fab09

Taking out pdb.

f73e913

Merge remote-tracking branch 'origin/improve_performance' into vlmtool

c3c10fe

Taking out secret decoder ring.

8186689

Fixing bug with including the prompt in the completion output.

c6da400
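Two ideas from this batch of commits can be sketched on plain id lists: generating each completion on its own and then padding the results together, and keeping the prompt out of the completion output. The helper names are hypothetical, and the PR's actual fix isn't shown in this thread. For tensors, PyTorch's `torch.nn.utils.rnn.pad_sequence(..., batch_first=True, padding_value=pad_id)` does the padding step.

```python
def completions_only(prompt_completion_ids: list[list[int]], prompt_lengths: list[int]) -> list[list[int]]:
    """Drop each sequence's own prompt prefix; separately generated sequences can have different prompt lengths."""
    return [ids[n:] for ids, n in zip(prompt_completion_ids, prompt_lengths)]


def pad_right(sequences: list[list[int]], pad_token_id: int) -> tuple[list[list[int]], list[list[int]]]:
    """Right-pad to a common length and build the matching attention mask (1 = real token, 0 = pad)."""
    max_len = max(len(s) for s in sequences)
    input_ids = [s + [pad_token_id] * (max_len - len(s)) for s in sequences]
    attention_mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in sequences]
    return input_ids, attention_mask


completions = completions_only([[7, 7, 1, 2, 3], [8, 4]], prompt_lengths=[2, 1])
print(pad_right(completions, pad_token_id=0))  # → ([[1, 2, 3], [4, 0, 0]], [[1, 1, 1], [1, 0, 0]])
```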

sunildkumar approved these changes Feb 7, 2025


robotrapta added 6 commits February 6, 2025 23:13

VERBOSE from environment. Catches exceptions in tool calls.

d2ee50b

Multiple verbosity levels.

5ac1d04

Prints when it's done generating.

efc25df

Loss magnifier to avoid underflow.

d88f6c2

Turning off loss magnifier by default.

ae74b4a

Reducing loss magnifier default to 1

68965d1
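Here is a minimal sketch of the loss magnifier idea. It assumes the magnifier just multiplies the loss by a constant before the backward pass, much like static loss scaling in mixed-precision training. The names are hypothetical. The float16 round-trip only illustrates why very small values underflow. Note that gradients come out scaled by the same factor. Adam-style optimizers largely cancel a constant scale (apart from the epsilon term), but plain SGD or gradient-norm clipping would see the larger values.

```python
import struct


def to_float16(x: float) -> float:
    """Round-trip a float through IEEE 754 half precision using the stdlib struct 'e' format."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def magnified_loss(loss: float, loss_magnifier: float = 1.0) -> float:
    """Hypothetical: scale the loss before backward so tiny values survive low precision.

    The default of 1.0 makes this a no-op, matching the final commit's default.
    """
    return loss * loss_magnifier


tiny_loss = 1e-8
print(to_float16(magnified_loss(tiny_loss)))  # underflows to 0.0 in float16
print(to_float16(magnified_loss(tiny_loss, loss_magnifier=1024.0)) > 0.0)  # True
```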

robotrapta wants to merge 17 commits into improve_performance from vlmtool


2 participants
