bug: memory of position_encoding_table is not malloced correctly. #790

@johnson-magic

Description

Branch/Tag/Commit

main

Docker Image Version

nvcr.io/nvidia/pytorch:22.12-py3

GPU name

A10

CUDA Driver

535.54.03

Reproduced Steps

1. docker run -ti --gpus all --rm nvcr.io/nvidia/pytorch:22.12-py3 bash
2. git clone --recursive https://github.com/NVIDIA/FasterTransformer.git
3. cd FasterTransformer
4. mkdir build
5. cd build
6. cmake -DSM=86 -DCMAKE_BUILD_TYPE=Release ..
7. make -j14
8. CUDA_VISIBLE_DEVICES=0 ./satrn 1 1 8 64 2048 4022 3 100 576 512 0 0.0 0

Abnormal Phenomena:

In

val = val + position_encoding[step_offset + col_index];

step_offset advances in strides of hidden_units, so the position encoding table should be sized as max_seq_len_ * hidden_units_, not max_seq_len_ * vocab_size_.

So I think

cudaD2Dcpy(weights_ptr[0], other.weights_ptr[0], max_seq_len_ * vocab_size_);

should be

cudaD2Dcpy(weights_ptr[0], other.weights_ptr[0], max_seq_len_ * hidden_units_);
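
The indexing argument can be checked on the host without a GPU. The following is a minimal sketch, not code from FasterTransformer: `flat_index`, `required_elements`, and `covers_lookup` are hypothetical helper names, and it assumes `step_offset` equals `step * hidden_units` with `col_index` ranging over `[0, hidden_units)`, as the description above suggests.

```cpp
#include <cassert>
#include <cstddef>

// Flat index read by the lookup: step_offset + col_index.
// Assumption: step_offset = step * hidden_units, col_index < hidden_units.
std::size_t flat_index(std::size_t step, std::size_t col_index, std::size_t hidden_units) {
    assert(col_index < hidden_units);
    const std::size_t step_offset = step * hidden_units;
    return step_offset + col_index;
}

// Number of elements the table needs so that every (step, col_index)
// pair with step < max_seq_len stays in bounds.
std::size_t required_elements(std::size_t max_seq_len, std::size_t hidden_units) {
    if (max_seq_len == 0 || hidden_units == 0) {
        return 0;
    }
    // Largest index touched, plus one; equals max_seq_len * hidden_units.
    return flat_index(max_seq_len - 1, hidden_units - 1, hidden_units) + 1;
}

// True when a buffer of `allocated` elements covers every index the lookup reads.
bool covers_lookup(std::size_t allocated, std::size_t max_seq_len, std::size_t hidden_units) {
    return allocated >= required_elements(max_seq_len, hidden_units);
}
```

With illustrative values max_seq_len = 100 and hidden_units = 512, the lookup needs 51200 elements. A buffer sized by max_seq_len * vocab_size is larger than needed when vocab_size is 4022 and too small when vocab_size is 256, so in both cases the size is tied to the wrong dimension.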

The same incorrect size appears in two other places:

cudaD2Dcpy(weights_ptr[0], other.weights_ptr[0], max_seq_len_ * vocab_size_);

deviceMalloc(&weights_ptr[0], max_seq_len_ * vocab_size_);

I have opened a PR to try to fix it. @byshiue

Labels: bug