-
Notifications
You must be signed in to change notification settings - Fork 10
Implement pipeline and new schema for image datasets #158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
@MajikalExplosions could you check the failing pre-commit checks and Python unit tests? Thanks! |
neubig
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
submitting a review comment so I pop it off my review stack, but please re-request review when tests are passing!
neubig
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments!
|
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment, like Feel free to include any additional details that might help me get this PR into a better state. You can manage your notification settings |
neubig
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, basically looks good! One comment though, I think we should try to fix the schema to support things we need instead of using getattr and hasattr
|
|
||
| # Handle nested image observation | ||
| image_path = None | ||
| if hasattr(event, "image_observation") and event.image_observation: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused, where would this nested image observation come from? I didn't see it in other parts of the code.
In general, "getattr" and "hasattr" are kinda anti-patterns in Python programming. They are indicative of not strictly adhering to type definitions, and can cause all kinds of tricky runtime errors. Let's try to write this without using these.
Summary
Implements multimodal (image + text) fine-tuning support, enabling 9 datasets to preserve visual information for training vision-language models instead of converting images to text descriptions.
Implementation
schema/observation/image.py: Schema enhancements that:
ImageAnnotation:content_description,clickable,editableNone)agents/openhands/std_to_sft.py: Multimodal SFT conversion that:
<image>tokens in conversation text at appropriate positions_image_pathmetadata[clickable],[editable])imagesarrayDataset converters (raw_to_standardized.py): Updated 9 image datasets.
content_descriptionwith resource ID/hint/tooltip, andclickable/editablefrom raw data fieldstexttocontent_description, marks all elements asclickable=Truecontent_descriptionwith XPath informationdatasets/openhands/screenshots/{trajectory_id}/directory structureTesting
In progress. Some verification is done on the output.
Notes
count(<image> tokens) == len(images array))