[Python] How to Run 70B LLMs on a Single 4GB GPU

https://generativeai.pub/how-to-run-70b-llms-on-a-single-4gb-gpu-d1c61ed5258c

# Source: https://generativeai.pub/how-to-run-70b-llms-on-a-single-4gb-gpu-d1c61ed5258c

from airllm import AutoModel

MAX_LENGTH = 128

# load the model from the Hugging Face hub
model = AutoModel.from_pretrained("garage-bAInd/Platypus2-70B-instruct")

# or load the model from a local path
# model = AutoModel.from_pretrained("/home/ubuntu/.cache/huggingface/hub/models--garage-bAInd--Platypus2-70B-instruct/snapshots/b585e74bcaae02e52665d9ac6d23f4d0dbc81a0f")

# prepare the input text
input_text = [
    'What is the capital of the United States?',
]

# tokenize the input text
input_tokens = model.tokenizer(input_text,
    return_tensors="pt",
    return_attention_mask=False,
    truncation=True,
    max_length=MAX_LENGTH,
    padding=False)

# generate the output text (.cuda() moves the input IDs to the GPU, so a CUDA device is required)
generation_output = model.generate(
    input_tokens['input_ids'].cuda(),
    max_new_tokens=20,
    use_cache=True,
    return_dict_in_generate=True)

# decode the output text
output = model.tokenizer.decode(generation_output.sequences[0])

# print the output text
print(output)
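
To see why a 70B model does not normally fit on a small GPU, and why processing it one layer at a time can help, here is a rough back-of-the-envelope estimate. It is only a sketch under assumptions that are not in the original post: 70 billion parameters, fp16 weights (2 bytes each), and 80 transformer layers of roughly equal size. The helper names are hypothetical, and the estimate ignores embeddings, activations, and the KV cache.

```python
# Rough weight-memory estimate for a large model.
# Assumptions (not from the original post): 70e9 parameters,
# fp16 weights (2 bytes per parameter), 80 equally sized layers.

def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed to hold all weights at once, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def per_layer_memory_gb(num_params: float, num_layers: int,
                        bytes_per_param: int = 2) -> float:
    """Memory needed if only one layer's weights are resident at a time."""
    return weight_memory_gb(num_params, bytes_per_param) / num_layers


total_gb = weight_memory_gb(70e9)             # whole model in fp16
layer_gb = per_layer_memory_gb(70e9, 80)      # one layer in fp16

print(f"whole model: {total_gb:.2f} GB")
print(f"single layer: {layer_gb:.2f} GB")
```

Under these assumptions the full weights are far larger than 4 GB, while a single layer is comfortably below it. That is the gap a layer-by-layer loading approach relies on, at the cost of reading weights from disk repeatedly during generation.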

License: Attribution-NonCommercial-NoDerivs

AI 3D Printing
