
Add inference code #199

Open · wants to merge 2 commits into main

Conversation

wade3han

Tested with my own fine-tuned 7B Alpaca model:

python inference.py \
    --model_name_or_path {model_path}
Instruction: Tell me about alpacas.
|  2499 | Al       | -15.960 | 0.00%
| 29886 | p        | -33.403 | 0.00%
|   562 | ac       | -32.065 | 0.00%
|   294 | as       | -24.586 | 0.00%
|   526 | are      | -20.448 | 0.00%
|   263 | a        | -17.845 | 0.00%
|  6606 | species  | -16.602 | 0.00%
|   310 | of       | -15.564 | 0.00%
|  4275 | South    | -11.832 | 0.00%
|  3082 | American | -22.230 | 0.00%
|  3949 | cam      | -12.354 | 0.00%
|   295 | el       | -34.635 | 0.00%
|   333 | id       | -19.849 | 0.00%
| 29892 | ,        | -20.313 | 0.00%
...
| 29889 | .        | -25.931 | 0.00%
|     2 | </s>     | -21.040 | 0.00%
Response:  Alpacas are a species of South American camelid, related to the llama. They are smaller than llamas and typically have finer fiber. Alpacas are primarily bred for their fiber, which can be spun into soft and luxurious yarns. They are also used for their meat, which is similar to that of a chicken. Alpacas are social animals and live in herds with a dominant male leader.</s>

...

Largely influenced by https://github.com/kriskrisliu/stanford_alpaca/tree/krisliu
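For reference, the per-token columns in the output above (token id, token text, log-probability, probability) can be reproduced from raw logits with a numerically stable log-softmax. A minimal sketch of the idea; the function name and toy inputs are illustrative and not taken from the PR:

```python
import math

def logprob_table(logits_per_step, chosen_ids, vocab):
    """For each generation step, compute the log-probability (and
    probability) of the chosen token under a softmax over that step's
    logits, mirroring the four columns printed by the inference script."""
    rows = []
    for logits, tok_id in zip(logits_per_step, chosen_ids):
        # log-sum-exp with the max subtracted for numerical stability
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        lp = logits[tok_id] - log_z          # log-softmax of the chosen token
        rows.append((tok_id, vocab[tok_id], lp, math.exp(lp)))
    return rows

# Toy example: two equally likely tokens -> log-prob of each is -ln(2).
rows = logprob_table([[0.0, 0.0]], [0], ["a", "b"])
```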

@MrRace

MrRace commented Apr 10, 2023

    indices = sequences[:, cut_idx:] + beam_sequence_indices
RuntimeError: The size of tensor a (114) must match the size of tensor b (259) at non-singleton dimension 1

Have you met an error like this? @wade3han
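For context, a mismatch like "tensor a (114) must match tensor b (259)" usually means two token-id tensors of different sequence lengths are being combined element-wise; padding (or truncating) them to a common length first is the usual remedy. A minimal list-based sketch of the idea, with illustrative names; the actual fix depends on how inference.py slices `sequences`:

```python
def pad_to_length(seqs, length, pad_id=0):
    """Right-pad each token-id sequence to a common length so that
    element-wise operations on stacked sequences have matching shapes."""
    return [s + [pad_id] * (length - len(s)) for s in seqs]

a = [1, 2, 3]
b = [4, 5]
max_len = max(len(a), len(b))
padded = pad_to_length([a, b], max_len)  # both rows now have length 3
```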

@wade3han
Author

No, I didn't encounter that error. Can you give me more context?

@MrRace

MrRace commented Apr 10, 2023

> No, I didn't encounter that error. Can you give me more context?

I just used:

instructions = [
        # "In the style of Lu Xun, complain about the recent canteen price hikes"
        "模仿鲁迅的风格, 吐槽一下最近食堂饭菜涨价",
    ]

@diichen

diichen commented Apr 13, 2023

same problem.

@magnificent1208

RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling cublasSgemm( handle, opa, opb, m, n, k, &alpha, a, lda, b, ldb, &beta, c, ldc)

The same error occurs when running inference with both llama-7b-hf and the fine-tuned model.

@diichen

diichen commented Apr 14, 2023

Cool! The problem has been fixed.

@BaoBaoGitHub

BaoBaoGitHub commented Jul 20, 2023

Thanks for the code!

However, I ran into some problems running the code on my server with three 3090 GPUs (24 GB VRAM each).
I resolved the out-of-memory error by commenting out the line model.cuda().
Then I fixed the error "Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!" by commenting out the line num_beams=4,.

I understand that model.cuda() moves the whole model onto the first GPU.
But what happens when I comment out num_beams=4? Why does that fix the error?
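One likely explanation (an assumption, not confirmed in this thread): with the model spread across several GPUs, beam search reorders candidate sequences and cached tensors using index tensors that can land on a different GPU than the tensors they index, while greedy decoding (the default once num_beams=4 is removed) never performs that reorder. The general remedy is to move tensors onto a common device before combining them. A toy sketch of the rule, using a mock Tensor with a device tag instead of real CUDA tensors:

```python
from dataclasses import dataclass

@dataclass
class Tensor:
    data: list
    device: str

    def to(self, device):
        # Return a copy tagged with the target device.
        return Tensor(self.data, device)

def add(a, b):
    # PyTorch raises on cross-device ops; the fix is to align devices first.
    if a.device != b.device:
        b = b.to(a.device)
    return Tensor([x + y for x, y in zip(a.data, b.data)], a.device)

result = add(Tensor([1, 2], "cuda:0"), Tensor([3, 4], "cuda:1"))
```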
