A service-based solution for multi-GPU parallel processing (based on LitServe) #667

Closed
randydl opened this issue Sep 27, 2024 · 30 comments
Labels
enhancement New feature or request

Comments

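@randydl
Contributor

randydl commented Sep 27, 2024

Paths to jpg, png, and pdf files are supported (the client converts images to PDF before uploading). For batch processing, simply call the client's do_parse function from multiple threads; the server automatically processes the requests in parallel across multiple GPUs.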

pip install -U litserve python-multipart filetype
pip install torch==2.3.1 torchvision==0.18.1 torchaudio==2.3.1
pip install -U magic-pdf[full] --extra-index-url https://wheels.myhloli.com
pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118

server.py

import torch
import filetype
import json, uuid
import litserve as ls
from unittest.mock import patch
from fastapi import HTTPException
from magic_pdf.tools.common import do_parse
from magic_pdf.model.doc_analyze_by_custom_model import ModelSingleton


class MinerUAPI(ls.LitAPI):
    def __init__(self, output_dir='/tmp'):
        self.output_dir = output_dir

    @staticmethod
    def clean_memory(device):
        import gc
        if torch.cuda.is_available():
            with torch.cuda.device(device):
                torch.cuda.empty_cache()
                torch.cuda.ipc_collect()
        gc.collect()

    def setup(self, device):
        # LitServe calls setup() once per worker with that worker's device string.
        # Patch get_device so the models are loaded onto this worker's device.
        with patch('magic_pdf.model.doc_analyze_by_custom_model.get_device') as mock_obj:
            mock_obj.return_value = device
            model_manager = ModelSingleton()
            model_manager.get_model(True, False)
            model_manager.get_model(False, False)
            mock_obj.assert_called()
            print('Model initialization complete!')

    def decode_request(self, request):
        file = request['file'].file.read()
        kwargs = json.loads(request['kwargs'])
        assert filetype.guess_mime(file) == 'application/pdf'
        return file, kwargs

    def predict(self, inputs):
        try:
            pdf_name = str(uuid.uuid4())
            do_parse(self.output_dir, pdf_name, inputs[0], [], **inputs[1])
            return pdf_name
        except Exception as e:
            raise HTTPException(status_code=500, detail=f'{e}')
        finally:
            self.clean_memory(self.device)

    def encode_response(self, response):
        return {'output_dir': response}


if __name__ == '__main__':
    server = ls.LitServer(MinerUAPI(), accelerator='gpu', devices=[0, 1], timeout=False)
    server.run(port=8000)

client.py

import json
import pymupdf
import requests
import numpy as np
from loguru import logger
from joblib import Parallel, delayed


def to_pdf(file_path):
    with pymupdf.open(file_path) as f:
        if f.is_pdf:
            pdf_bytes = f.tobytes()
        else:
            pdf_bytes = f.convert_to_pdf()
        return pdf_bytes


def do_parse(file_path, url='http://127.0.0.1:8000/predict', **kwargs):
    try:
        kwargs.setdefault('parse_method', 'auto')
        kwargs.setdefault('debug_able', False)

        response = requests.post(url,
            data={'kwargs': json.dumps(kwargs)},
            files={'file': to_pdf(file_path)}
        )

        if response.status_code == 200:
            output = response.json()
            output['file_path'] = file_path
            return output
        else:
            raise Exception(response.text)
    except Exception as e:
        logger.error(f'File: {file_path} - Info: {e}')


if __name__ == '__main__':
    files = ['/tmp/small_ocr.pdf']
    n_jobs = np.clip(len(files), 1, 4)
    results = Parallel(n_jobs, prefer='threads', verbose=10)(
        delayed(do_parse)(p) for p in files
    )
    print(results)
randydl added the enhancement label Sep 27, 2024
myhloli pinned this issue Sep 27, 2024

@BlackMoki-bot

BlackMoki-bot commented Sep 28, 2024

Hello, when I run the code, the server keeps reporting "Exception: Parsing error: 'Layoutlmv3_Predictor' object has no attribute 'parameters'", and the client keeps reporting "requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/predict".
http://127.0.0.1 itself is reachable. What could be causing this? Any guidance would be much appreciated!

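@randydl
Contributor Author

randydl commented Sep 30, 2024

> Hello, when I run the code, the server keeps reporting "Exception: Parsing error: 'Layoutlmv3_Predictor' object has no attribute 'parameters'", and the client keeps reporting "requests.exceptions.HTTPError: 500 Server Error: Internal Server Error for url: http://127.0.0.1:8000/predict". http://127.0.0.1 itself is reachable. What could be causing this? Any guidance would be much appreciated!

It looks like the problem is in your processing code, not in the service.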

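randydl changed the title from "An API solution for multi-GPU parallel processing, based on LitServe (FastAPI)" to "A service-based implementation for multi-GPU parallel processing (based on LitServe)" Oct 8, 2024
randydl changed the title from "A service-based implementation for multi-GPU parallel processing (based on LitServe)" to "A service-based solution for multi-GPU parallel processing (based on LitServe)" Oct 8, 2024
@flow3rdown

After switching to this code, table recognition became extremely slow. What could be the reason?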

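@randydl
Contributor Author

randydl commented Oct 12, 2024

> After switching to this code, table recognition became extremely slow. What could be the reason?

Is it also slow when you run the magic-pdf CLI instead of going through the service?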

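@flow3rdown

> > After switching to this code, table recognition became extremely slow. What could be the reason?
>
> Is it also slow when you run the magic-pdf CLI instead of going through the service?

With the CLI the speed is normal. The table recognition model is TableMaster.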

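@PoisonousBromineChan

I don't really understand how to use the code. Out of habit I started server.py first, changed the file path in client.py to my own, and then ran the client. The error I got mentions small_ocr.pdf, even though none of the files I want to process is small_ocr.pdf, and I don't know how to fix it.
Is there a simpler way, such as editing magic-pdf.json directly and setting the device field to multiple CUDA devices?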

@randydl
Contributor Author

randydl commented Oct 16, 2024

You probably changed the code incorrectly; it runs fine on my side. Once the file path is changed, small_ocr.pdf should no longer appear; it is only an example file. @PoisonousBromineChan

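@flow3rdown

> You probably changed the code incorrectly; it runs fine on my side. Once the file path is changed, small_ocr.pdf should no longer appear; it is only an example file. @PoisonousBromineChan

Is the table recognition speed normal when you run it on your side?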

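@ywh-my

ywh-my commented Oct 18, 2024

Thanks, I got it working. After additionally installing the library with pip install python-multipart and starting the server program, the request succeeded.
Also, if you want to output only the .md file to save storage space and time, you can do the following:
from magic_pdf.libs.MakeContentConfig import MakeMode  # add this line

Modify the do_parse call: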

        do_parse(self.output_dir, pdf_name, inputs[0], [],
                 **inputs[1],
                 f_draw_span_bbox=False,
                 f_draw_layout_bbox=False,
                 f_dump_md=True,
                 f_dump_middle_json=False,
                 f_dump_model_json=False,
                 f_dump_orig_pdf=False,
                 f_dump_content_list=False,
                 f_make_md_mode=MakeMode.MM_MD,
                 f_draw_model_bbox=False)

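@randydl
Contributor Author

randydl commented Oct 18, 2024

> > You probably changed the code incorrectly; it runs fine on my side. Once the file path is changed, small_ocr.pdf should no longer appear; it is only an example file. @PoisonousBromineChan
>
> Is the table recognition speed normal when you run it on your side?

I have not verified tables yet. I will try it when I have time.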

@234687552

Problem description:

Calling MinerU through LitServe as in server.py, I found that table recognition is extremely slow.

System & environment:

PRETTY_NAME="Ubuntu 24.04 LTS"

Python 3.10.14

magic-pdf version 0.7.1

paddlepaddle-gpu 3.0.0b1

magic-pdf.json configuration

{
    "bucket_info":{
        "bucket-name-1":["ak", "sk", "endpoint"],
        "bucket-name-2":["ak", "sk", "endpoint"]
    },
    "models-dir":"/opt/models",
    "device-mode":"cuda",
    "table-config": {
        "model": "TableMaster",
        "is_table_recog_enable": true,
        "max_time": 400
    }
}

Test PDF link:

https://github.com/opendatalab/MinerU/blob/master/demo/demo1.pdf

Using LitServe

Log output:

2024-10-19 21:10:57.105 | INFO | magic_pdf.libs.pdf_check:detect_invalid_chars:57 - cid_count: 0, text_len: 1501, cid_chars_radio: 0.0
2024-10-19 21:10:57.861 | INFO | magic_pdf.model.pdf_extract_kit:__call__:170 - layout detection cost: 0.68
Model initialization complete!
Setup complete for worker 3.

0: 1888x1344 4 embeddings, 92.2ms
Speed: 12.7ms preprocess, 92.2ms inference, 13.2ms postprocess per image at shape (1, 3, 1888, 1344)
2024-10-19 21:10:58.633 | INFO | magic_pdf.model.pdf_extract_kit:__call__:200 - formula nums: 4, mfr time: 0.2
2024-10-19 21:10:58.640 | INFO | magic_pdf.model.pdf_extract_kit:__call__:291 - ------------------table recognition processing begins-----------------
2024-10-19 21:14:13.524 | INFO | magic_pdf.model.pdf_extract_kit:__call__:300 - ------------table recognition processing ends within 194.88404989242554s-----
2024-10-19 21:14:13.525 | INFO | magic_pdf.model.pdf_extract_kit:__call__:317 - table cost: 194.89
2024-10-19 21:14:13.525 | INFO | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:124 - doc analyze cost: 196.3451521396637
2024-10-19 21:14:13.567 | INFO | magic_pdf.pdf_parse_union_core:pdf_parse_union:221 - page_id: 0, last_page_cost_time: 0.0
2024-10-19 21:14:13.663 | INFO | magic_pdf.para.para_split_v2:__detect_list_lines:143 - 发现了列表,列表行数:[(0, 1)], [[0]]
2024-10-19 21:14:13.663 | INFO | magic_pdf.para.para_split_v2:__detect_list_lines:156 - 列表行的第0到第1行是列表
2024-10-19 21:14:13.797 | INFO | magic_pdf.pipe.UNIPipe:pipe_mk_markdown:48 - uni_pipe mk mm_markdown finished
2024-10-19 21:14:13.805 | INFO | magic_pdf.pipe.UNIPipe:pipe_mk_uni_format:43 - uni_pipe mk content list finished
2024-10-19 21:14:13.805 | INFO | magic_pdf.tools.common:do_parse:119 - local output dir is /tmp/91dc2fda-fb5c-431f-bbce-9dcdc8ce3596/auto

Using the command line

/opt/mineru_venv/bin/magic-pdf -p origin.pdf -m auto

Log output:

[10/19 21:41:53 d2.checkpoint.detection_checkpoint]: [DetectionCheckpointer] Loading from /opt/models/Layout/model_final.pth ...
[10/19 21:41:53 fvcore.common.checkpoint]: [Checkpointer] Loading from /opt/models/Layout/model_final.pth ...
2024-10-19 21:41:56.518 | INFO     | magic_pdf.model.pdf_extract_kit:__init__:159 - DocAnalysis init done!
2024-10-19 21:41:56.518 | INFO     | magic_pdf.model.doc_analyze_by_custom_model:custom_model_init:98 - model init cost: 21.35542368888855
2024-10-19 21:41:57.207 | INFO     | magic_pdf.model.pdf_extract_kit:__call__:170 - layout detection cost: 0.61

0: 1888x1344 4 embeddings, 91.9ms
Speed: 9.7ms preprocess, 91.9ms inference, 1.1ms postprocess per image at shape (1, 3, 1888, 1344)
2024-10-19 21:41:57.948 | INFO     | magic_pdf.model.pdf_extract_kit:__call__:200 - formula nums: 4, mfr time: 0.19
2024-10-19 21:41:57.956 | INFO     | magic_pdf.model.pdf_extract_kit:__call__:291 - ------------------table recognition processing begins-----------------
[2024/10/19 21:41:59] ppocr DEBUG: dt_boxes num : 18, elapse : 0.045398712158203125
[2024/10/19 21:41:59] ppocr DEBUG: dt_boxes num : 18, elapse : 0.045398712158203125
[2024/10/19 21:41:59] ppocr DEBUG: rec_res num  : 18, elapse : 0.047318220138549805
[2024/10/19 21:41:59] ppocr DEBUG: rec_res num  : 18, elapse : 0.047318220138549805
2024-10-19 21:41:59.425 | INFO     | magic_pdf.model.pdf_extract_kit:__call__:300 - ------------table recognition processing ends within 1.4687747955322266s-----
2024-10-19 21:41:59.425 | INFO     | magic_pdf.model.pdf_extract_kit:__call__:317 - table cost: 1.47
2024-10-19 21:41:59.425 | INFO     | magic_pdf.model.doc_analyze_by_custom_model:doc_analyze:124 - doc analyze cost: 2.828835964202881
2024-10-19 21:41:59.467 | INFO     | magic_pdf.pdf_parse_union_core:pdf_parse_union:221 - page_id: 0, last_page_cost_time: 0.0
2024-10-19 21:42:00.020 | INFO     | magic_pdf.para.para_split_v2:__detect_list_lines:143 - 发现了列表,列表行数:[(0, 1)], [[0]]
2024-10-19 21:42:00.020 | INFO     | magic_pdf.para.para_split_v2:__detect_list_lines:156 - 列表行的第0到第1行是列表
2024-10-19 21:42:00.154 | INFO     | magic_pdf.pipe.UNIPipe:pipe_mk_markdown:48 - uni_pipe mk mm_markdown finished
2024-10-19 21:42:00.162 | INFO     | magic_pdf.pipe.UNIPipe:pipe_mk_uni_format:43 - uni_pipe mk content list finished
2024-10-19 21:42:00.162 | INFO     | magic_pdf.tools.common:do_parse:119 - local output dir is output/origin/auto
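
To put a number on the gap between the two runs, the "table cost" values can be pulled out of the logs and compared. The following is only a small sketch: the helper name table_cost is made up for illustration, and the two log lines are copied from the runs above.

```python
import re

# One "table cost" line from each run above (LitServe first, CLI second).
LITSERVE_LOG = ("2024-10-19 21:14:13.525 | INFO | "
                "magic_pdf.model.pdf_extract_kit:__call__:317 - table cost: 194.89")
CLI_LOG = ("2024-10-19 21:41:59.425 | INFO     | "
           "magic_pdf.model.pdf_extract_kit:__call__:317 - table cost: 1.47")


def table_cost(line: str) -> float:
    """Return the seconds reported after 'table cost:' in a log line."""
    match = re.search(r"table cost: ([0-9.]+)", line)
    if match is None:
        raise ValueError(f"no table cost in line: {line!r}")
    return float(match.group(1))


slowdown = table_cost(LITSERVE_LOG) / table_cost(CLI_LOG)
print(f"table recognition slowdown under LitServe: {slowdown:.1f}x")
```

A difference of more than two orders of magnitude on the same page points toward the table model not running on the GPU at all in the LitServe run, rather than ordinary serving overhead.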

@234687552

234687552 commented Oct 22, 2024

I wonder whether this is what makes table recognition so slow:

https://github.com/opendatalab/MinerU/blob/master/magic_pdf/model/ppTableModel.py#L46

IMG20241022-111905

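@myhloli
Collaborator

myhloli commented Oct 22, 2024

> I wonder whether this is what makes table recognition so slow:
>
> https://github.com/opendatalab/MinerU/blob/master/magic_pdf/model/ppTableModel.py#L46
>
> IMG20241022-111905

That is indeed the cause. The matching rule is hard-coded there; we will fix it.
For now, you can temporarily change it to: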

use_gpu = True if device.startswith("cuda") else False
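
Why the workaround helps can be seen from plain string comparison. The sketch below assumes the hard-coded rule is an exact comparison against "cuda" (as the line quoted elsewhere in this thread suggests) and that LitServe hands each worker a device string of the form "cuda:N"; the two function names are hypothetical and exist only for this comparison.

```python
def use_gpu_exact(device: str) -> bool:
    # Assumed original rule: only the bare string "cuda" enables the GPU.
    return device == "cuda"


def use_gpu_prefix(device: str) -> bool:
    # Temporary workaround: "cuda" and any "cuda:N" enable the GPU.
    return device.startswith("cuda")


for device in ("cuda", "cuda:0", "cuda:1", "cpu"):
    print(f"{device:7} exact={use_gpu_exact(device)} prefix={use_gpu_prefix(device)}")
```

Under the exact rule, a worker that receives "cuda:1" silently falls back to CPU for table recognition, which would be consistent with the slow table step in the LitServe log.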

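@234687552

Problem description:

Serving the API as in server.py and load-testing it with 15 concurrent requests on 4 GPUs, I found that gpu[0] is always saturated while the other GPUs stay relatively idle.

Expected result:

The load is spread evenly across the GPUs.

Command run during the test:

nvidia-smi --loop=1

Log output: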

                                                                                   
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
Wed Oct 23 19:59:02 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.12              Driver Version: 550.90.12      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L20                     On  |   00000000:D3:00.0 Off |                    0 |
| N/A   68C    P0            228W /  350W |   19876MiB /  46068MiB |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L20                     On  |   00000000:D4:00.0 Off |                    0 |
| N/A   50C    P0            146W /  350W |    9629MiB /  46068MiB |     38%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L20                     On  |   00000000:D6:00.0 Off |                    0 |
| N/A   45C    P0            154W /  350W |    9629MiB /  46068MiB |     46%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA L20                     On  |   00000000:D7:00.0 Off |                    0 |
| N/A   45C    P0             90W /  350W |    9629MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
Wed Oct 23 19:59:04 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.12              Driver Version: 550.90.12      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L20                     On  |   00000000:D3:00.0 Off |                    0 |
| N/A   68C    P0            246W /  350W |   20234MiB /  46068MiB |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L20                     On  |   00000000:D4:00.0 Off |                    0 |
| N/A   51C    P0            155W /  350W |    9629MiB /  46068MiB |     43%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L20                     On  |   00000000:D6:00.0 Off |                    0 |
| N/A   43C    P0            130W /  350W |    9629MiB /  46068MiB |      5%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA L20                     On  |   00000000:D7:00.0 Off |                    0 |
| N/A   45C    P0             93W /  350W |    9629MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
Wed Oct 23 19:59:05 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.12              Driver Version: 550.90.12      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L20                     On  |   00000000:D3:00.0 Off |                    0 |
| N/A   68C    P0            217W /  350W |   20234MiB /  46068MiB |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L20                     On  |   00000000:D4:00.0 Off |                    0 |
| N/A   50C    P0            158W /  350W |    9629MiB /  46068MiB |     34%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L20                     On  |   00000000:D6:00.0 Off |                    0 |
| N/A   43C    P0             88W /  350W |    9629MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA L20                     On  |   00000000:D7:00.0 Off |                    0 |
| N/A   45C    P0             90W /  350W |    9629MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

image-20241023200204846
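
For repeated load tests, the full nvidia-smi table is hard to compare by eye. A more compact option is the CSV query mode of nvidia-smi. The sketch below is an illustration only: the function names are made up, and the sample values are taken from the first snapshot above so that the parsing can be run without a GPU.

```python
import subprocess

# nvidia-smi CSV query: one line per GPU, "index, utilization %, memory MiB".
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(csv_text: str) -> list:
    """Turn the CSV text into a list of per-GPU dictionaries."""
    stats = []
    for line in csv_text.strip().splitlines():
        index, util, mem = (field.strip() for field in line.split(","))
        stats.append({"index": int(index), "util_pct": int(util), "mem_mib": int(mem)})
    return stats


def read_gpu_stats() -> list:
    """Query the local GPUs (requires nvidia-smi on PATH)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_stats(out)


# Values from the 19:59:02 snapshot above, in the CSV layout of QUERY.
SAMPLE = "0, 100, 19876\n1, 38, 9629\n2, 46, 9629\n3, 0, 9629\n"
stats = parse_gpu_stats(SAMPLE)
busiest = max(stats, key=lambda s: s["util_pct"])
print("busiest GPU:", busiest)
```

Logging these three numbers once per second during a test makes it easy to see whether one GPU carries both more memory and more compute than the others.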

@randydl
Contributor Author

randydl commented Oct 24, 2024

@234687552 Do you have table recognition enabled? If so, try disabling it and test the load balancing again; that will tell us whether table recognition is the cause.

@randydl
Contributor Author

randydl commented Oct 24, 2024

> Thanks, I got it working. After additionally installing the library with pip install python-multipart and starting the server program, the request succeeded. Also, if you want to output only the .md file to save storage space and time, you can do the following: from magic_pdf.libs.MakeContentConfig import MakeMode  # add this line
>
> Modify the do_parse call:
>
>         do_parse(self.output_dir, pdf_name, inputs[0], [],
>                  **inputs[1],
>                  f_draw_span_bbox=False,
>                  f_draw_layout_bbox=False,
>                  f_dump_md=True,
>                  f_dump_middle_json=False,
>                  f_dump_model_json=False,
>                  f_dump_orig_pdf=False,
>                  f_dump_content_list=False,
>                  f_make_md_mode=MakeMode.MM_MD,
>                  f_draw_model_bbox=False)

A simpler approach is to pass these parameters when calling do_parse in the client; there is no need to modify the server code.
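
A rough sketch of what that could look like on the client side follows. The helper md_only_kwargs is hypothetical. Because the client sends the options through json.dumps, every value must be JSON-serializable, so the sketch passes the Markdown mode as a plain string; that MakeMode.MM_MD equals 'mm_markdown' is an assumption and should be checked against the installed magic-pdf version.

```python
import json


def md_only_kwargs() -> dict:
    """Options from the comment above that keep only the Markdown output."""
    return {
        "f_draw_span_bbox": False,
        "f_draw_layout_bbox": False,
        "f_dump_md": True,
        "f_dump_middle_json": False,
        "f_dump_model_json": False,
        "f_dump_orig_pdf": False,
        "f_dump_content_list": False,
        # Assumption: MakeMode.MM_MD is the plain string 'mm_markdown'.
        "f_make_md_mode": "mm_markdown",
        "f_draw_model_bbox": False,
    }


# The client serializes the options exactly like this before posting them.
payload = json.dumps(md_only_kwargs())
print(payload)

# With client.py from the first post, the call would then be:
#   do_parse('/tmp/small_ocr.pdf', **md_only_kwargs())
```

The server unpacks these options with **inputs[1], so they reach magic-pdf's do_parse without any change to server.py.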

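@234687552

234687552 commented Oct 24, 2024

> @234687552 Do you have table recognition enabled? If so, try disabling it and test the load balancing again; that will tell us whether table recognition is the cause.

Situation

Table recognition was previously enabled: "is_table_recog_enable": true

After disabling it and retesting: gpu[0] no longer stays saturated, and the other GPUs run in a relatively balanced way.

Disabling table recognition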

cat ~/magic-pdf.json

{
    "bucket_info":{
        "bucket-name-1":["ak", "sk", "endpoint"],
        "bucket-name-2":["ak", "sk", "endpoint"]
    },
    "models-dir":"/opt/models",
    "device-mode":"cuda",
    "table-config": {
        "model": "TableMaster",
        "is_table_recog_enable": false,
        "max_time": 400
    }
}

GPU usage

nvidia-smi --loop=1

                                                                                        
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
Thu Oct 24 10:07:57 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.12              Driver Version: 550.90.12      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L20                     On  |   00000000:D3:00.0 Off |                    0 |
| N/A   58C    P0            169W /  350W |   15238MiB /  46068MiB |     47%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L20                     On  |   00000000:D4:00.0 Off |                    0 |
| N/A   59C    P0            165W /  350W |    9627MiB /  46068MiB |     43%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L20                     On  |   00000000:D6:00.0 Off |                    0 |
| N/A   54C    P0            154W /  350W |    9627MiB /  46068MiB |     22%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA L20                     On  |   00000000:D7:00.0 Off |                    0 |
| N/A   53C    P0            109W /  350W |    9619MiB /  46068MiB |     15%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
Thu Oct 24 10:07:58 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.12              Driver Version: 550.90.12      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L20                     On  |   00000000:D3:00.0 Off |                    0 |
| N/A   62C    P0            193W /  350W |   15238MiB /  46068MiB |     76%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L20                     On  |   00000000:D4:00.0 Off |                    0 |
| N/A   60C    P0            175W /  350W |    9627MiB /  46068MiB |     48%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L20                     On  |   00000000:D6:00.0 Off |                    0 |
| N/A   52C    P0            176W /  350W |    9627MiB /  46068MiB |     56%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA L20                     On  |   00000000:D7:00.0 Off |                    0 |
| N/A   60C    P0            192W /  350W |    9629MiB /  46068MiB |     79%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
Thu Oct 24 10:08:00 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.12              Driver Version: 550.90.12      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L20                     On  |   00000000:D3:00.0 Off |                    0 |
| N/A   57C    P0            204W /  350W |   15238MiB /  46068MiB |     42%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L20                     On  |   00000000:D4:00.0 Off |                    0 |
| N/A   59C    P0            189W /  350W |    9627MiB /  46068MiB |     86%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L20                     On  |   00000000:D6:00.0 Off |                    0 |
| N/A   51C    P0            114W /  350W |    9627MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA L20                     On  |   00000000:D7:00.0 Off |                    0 |
| N/A   54C    P0            114W /  350W |    9629MiB /  46068MiB |     19%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

image-20241024101150096

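@234687552

In our actual use case table recognition has to stay enabled, and I don't yet know how to make table recognition spread evenly across multiple GPUs on a single machine.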

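@randydl
Contributor Author

randydl commented Oct 24, 2024

> In our actual use case table recognition has to stay enabled, and I don't yet know how to make table recognition spread evenly across multiple GPUs on a single machine.

So my guess was right: this is still caused by a bug in table recognition. Probably somewhere in the code the table model is still loaded with .cuda, so device strings such as cuda:1 are not honored. As a result every table model is loaded onto GPU 0, which is why GPU 0 is saturated.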

@randydl
Contributor Author

randydl commented Oct 24, 2024

For the TableMaster table recognition model, the bug is here:
https://github.com/opendatalab/MinerU/blob/master/magic_pdf/model/ppTableModel.py#L55
Changing only use_gpu = True if device == "cuda" else False is not enough; how the use_gpu variable is used also needs to be investigated.

For the struct_eqtable table model, the bug is here:
https://github.com/opendatalab/MinerU/blob/master/magic_pdf/model/pek_sub_modules/structeqtable/StructTableModel.py#L9
This one should be easy to fix: changing it to self.model = StructTable(self.model_path, self.max_new_tokens, self.max_time).to(device) should make it take effect.

@myhloli @234687552

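@myhloli
Collaborator

myhloli commented Oct 24, 2024

> For the TableMaster table recognition model, the bug is here: https://github.com/opendatalab/MinerU/blob/master/magic_pdf/model/ppTableModel.py#L55 Changing only use_gpu = True if device == "cuda" else False is not enough; how the use_gpu variable is used also needs to be investigated.
>
> For the struct_eqtable table model, the bug is here: https://github.com/opendatalab/MinerU/blob/master/magic_pdf/model/pek_sub_modules/structeqtable/StructTableModel.py#L9 This one should be easy to fix: changing it to self.model = StructTable(self.model_path, self.max_new_tokens, self.max_time).to(device) should make it take effect.
>
> @myhloli @234687552

Paddle specifies the GPU differently from torch, and at the moment Paddle always uses the first card for acceleration. Our development focus is still on improving parsing quality, so for now we cannot spare anyone to optimize the multi-GPU allocation logic. Developers who are able to solve the multi-GPU allocation problem are welcome to submit a PR.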

@randydl
Contributor Author

randydl commented Oct 24, 2024

server.py

import os
import torch
import filetype
import json, uuid
import litserve as ls
from fastapi import HTTPException
from magic_pdf.tools.common import do_parse
from magic_pdf.model.doc_analyze_by_custom_model import ModelSingleton


class MinerUAPI(ls.LitAPI):
    def __init__(self, output_dir='/tmp'):
        self.output_dir = output_dir

    @staticmethod
    def clean_memory(device):
        import gc
        if torch.cuda.is_available():
            with torch.cuda.device(device):
                torch.cuda.empty_cache()
                torch.cuda.ipc_collect()
        gc.collect()

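    def setup(self, device):
        # LitServe calls setup once per worker with a device string such as 'cuda:1'.
        device = torch.device(device)
        # Expose only this worker's GPU so that every model, including the
        # Paddle-based table model, is loaded onto it.
        os.environ['CUDA_VISIBLE_DEVICES'] = str(device.index)
        model_manager = ModelSingleton()
        model_manager.get_model(True, False)
        model_manager.get_model(False, False)
        print('Model initialization complete!')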

    def decode_request(self, request):
        file = request['file'].file.read()
        kwargs = json.loads(request['kwargs'])
        assert filetype.guess_mime(file) == 'application/pdf'
        return file, kwargs

    def predict(self, inputs):
        try:
            pdf_name = str(uuid.uuid4())
            do_parse(self.output_dir, pdf_name, inputs[0], [], **inputs[1])
            return pdf_name
        except Exception as e:
            raise HTTPException(status_code=500, detail=f'{e}')
        finally:
            self.clean_memory(self.device)

    def encode_response(self, response):
        return {'output_dir': response}


if __name__ == '__main__':
    server = ls.LitServer(MinerUAPI(), accelerator='gpu', devices=[0, 1], timeout=False)
    server.run(port=8000)

magic-pdf.json

{
    "bucket_info":{
        "bucket-name-1":["ak", "sk", "endpoint"],
        "bucket-name-2":["ak", "sk", "endpoint"]
    },
    "models-dir":"/opt/models",
    "device-mode":"cuda",
    "table-config": {
        "model": "TableMaster",
        "is_table_recog_enable": true,
        "max_time": 400
    }
}

Try replacing server.py with the new code I provided, enable table recognition, and run the load test again; it should work now. @234687552
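The idea behind the new setup method can be sketched in isolation: each worker process hides every GPU except its own through CUDA_VISIBLE_DEVICES, so frameworks that always pick the first card end up on the intended one. The helper name pin_process_to_gpu is hypothetical. The sketch assumes the variable is set before CUDA is initialised in that process, since the CUDA runtime reads it at initialisation.

```python
import os


def pin_process_to_gpu(device: str) -> str:
    """Hide all GPUs except the one named by `device` (e.g. "cuda:1").

    Hypothetical helper for illustration only. Returns the device name
    that refers to the pinned GPU from inside this process.
    """
    _, _, index = device.partition(":")
    physical_index = index or "0"  # a bare "cuda" means the first GPU
    os.environ["CUDA_VISIBLE_DEVICES"] = physical_index
    # With a single visible GPU, the process sees it re-indexed as ordinal 0.
    return "cuda:0"


local_device = pin_process_to_gpu("cuda:1")
print(os.environ["CUDA_VISIBLE_DEVICES"])  # 1
print(local_device)                        # cuda:0
```

Note the re-indexing: after masking, the original string cuda:1 no longer names a valid device inside the worker, so any later code that selects a device explicitly should use the local name instead.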

@234687552

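Situation
@randydl

GPU usage is now balanced across the cards (see the logs and screenshot below), but clean_memory raises an exception.

The change I applied, for reference: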

    def setup(self, device):
        device = torch.device(device)
        os.environ['CUDA_VISIBLE_DEVICES'] = str(device.index)
        model_manager = ModelSingleton()
        model_manager.get_model(True, False)
        model_manager.get_model(False, False)
        print(f'Model initialization complete!')

Exception stack trace:

Please check the error trace for more details.
Traceback (most recent call last):
  File "/opt/mineru_venv/lib/python3.10/site-packages/litserve/loops.py", line 134, in run_single_loop
    y = _inject_context(
  File "/opt/mineru_venv/lib/python3.10/site-packages/litserve/loops.py", line 55, in _inject_context
    return func(*args, **kwargs)
  File "/app/app.py", line 144, in predict
    self.clean_memory(self.device)
  File "/app/app.py", line 83, in clean_memory
    with torch.cuda.device(device):
  File "/opt/mineru_venv/lib/python3.10/site-packages/torch/cuda/__init__.py", line 365, in __enter__
    self.prev_idx = torch.cuda._exchange_device(self.idx)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
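A likely reading of this trace, offered as an assumption rather than a confirmed diagnosis: once CUDA_VISIBLE_DEVICES exposes a single GPU, the only valid ordinal inside the worker is 0, while self.device still holds the original string such as cuda:1. The small model below mimics that rule without touching CUDA; the function name ordinal_is_valid is hypothetical.

```python
def ordinal_is_valid(device: str, cuda_visible_devices: str) -> bool:
    """Return True if `device` names a GPU that exists after masking.

    Simplified model for illustration: visible GPUs are re-indexed
    from 0 in the order they are listed in CUDA_VISIBLE_DEVICES.
    """
    visible = [d for d in cuda_visible_devices.split(",") if d.strip()]
    _, _, index = device.partition(":")
    ordinal = int(index) if index else 0
    return ordinal < len(visible)


print(ordinal_is_valid("cuda:1", "1"))  # False: only ordinal 0 exists
print(ordinal_is_valid("cuda:0", "1"))  # True
print(ordinal_is_valid("cuda:0", "0"))  # True: the worker on GPU 0 is unaffected
```

Under this model the worker pinned to physical GPU 0 would not hit the error, while workers pinned to any other card would.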

GPU usage

nvidia-smi --loop=1

                                                                                        
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
Thu Oct 24 20:54:03 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.12              Driver Version: 550.90.12      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L20                     On  |   00000000:D3:00.0 Off |                    0 |
| N/A   51C    P0            135W /  350W |   11611MiB /  46068MiB |     18%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L20                     On  |   00000000:D4:00.0 Off |                    0 |
| N/A   54C    P0            124W /  350W |   11435MiB /  46068MiB |     23%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L20                     On  |   00000000:D6:00.0 Off |                    0 |
| N/A   48C    P0            112W /  350W |   12227MiB /  46068MiB |     20%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA L20                     On  |   00000000:D7:00.0 Off |                    0 |
| N/A   51C    P0            124W /  350W |   11435MiB /  46068MiB |     26%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
Thu Oct 24 20:54:05 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.12              Driver Version: 550.90.12      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L20                     On  |   00000000:D3:00.0 Off |                    0 |
| N/A   51C    P0            117W /  350W |   11611MiB /  46068MiB |     23%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L20                     On  |   00000000:D4:00.0 Off |                    0 |
| N/A   54C    P0            130W /  350W |   11435MiB /  46068MiB |     27%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L20                     On  |   00000000:D6:00.0 Off |                    0 |
| N/A   48C    P0            118W /  350W |   12227MiB /  46068MiB |     23%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA L20                     On  |   00000000:D7:00.0 Off |                    0 |
| N/A   51C    P0            132W /  350W |   11435MiB /  46068MiB |     31%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+
Thu Oct 24 20:54:06 2024       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.12              Driver Version: 550.90.12      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L20                     On  |   00000000:D3:00.0 Off |                    0 |
| N/A   51C    P0            125W /  350W |   11611MiB /  46068MiB |     27%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   1  NVIDIA L20                     On  |   00000000:D4:00.0 Off |                    0 |
| N/A   54C    P0            138W /  350W |   11435MiB /  46068MiB |     30%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   2  NVIDIA L20                     On  |   00000000:D6:00.0 Off |                    0 |
| N/A   48C    P0            126W /  350W |   12227MiB /  46068MiB |     27%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
|   3  NVIDIA L20                     On  |   00000000:D7:00.0 Off |                    0 |
| N/A   52C    P0            143W /  350W |   11435MiB /  46068MiB |     36%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

image-20241024205743118

@randydl
Contributor Author

randydl commented Oct 24, 2024

Thanks, looks like we are making progress! Try deleting the line with torch.cuda.device(device): @234687552

@234687552

Thanks, looks like we are making progress! Try deleting the line with torch.cuda.device(device): @234687552

Thanks for the support. Multi-GPU processing now works correctly.
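For readers following along, the cleanup routine with the device context manager removed might look like the sketch below. It is written as a standalone function rather than the static method used in server.py, and the import guard is an addition so the snippet also runs on machines without PyTorch; neither detail comes from the thread.

```python
import gc


def clean_memory() -> None:
    """Release cached GPU memory without selecting a device explicitly."""
    try:
        import torch
    except ImportError:  # assumption: lets the sketch run without PyTorch
        torch = None

    if torch is not None and torch.cuda.is_available():
        # No torch.cuda.device(...) context here: the worker only sees the
        # GPU exposed through CUDA_VISIBLE_DEVICES, so none is selected.
        torch.cuda.empty_cache()
        torch.cuda.ipc_collect()
    gc.collect()


clean_memory()
```

Because each worker sees exactly one GPU, there is no need to name a device during cleanup, which avoids passing a stale index.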

@randydl
Contributor Author

randydl commented Oct 25, 2024

For the TableMaster table recognition model, the bug is here: https://github.com/opendatalab/MinerU/blob/master/magic_pdf/model/ppTableModel.py#L55 Changing only use_gpu = True if device == "cuda" else False is not enough; how the use_gpu variable is used also needs to be investigated.
For the struct_eqtable table model, the bug is here: https://github.com/opendatalab/MinerU/blob/master/magic_pdf/model/pek_sub_modules/structeqtable/StructTableModel.py#L9 This one should be easy to fix: changing it to self.model = StructTable(self.model_path, self.max_new_tokens, self.max_time).to(device) should make it take effect.
@myhloli @234687552

Paddle specifies the GPU differently from torch, and at the moment Paddle always uses the first card for acceleration. Our development focus is still on improving parsing quality, so for now we cannot spare anyone to optimise the multi-GPU allocation logic. Developers who are able to solve the multi-GPU allocation problem are welcome to submit a PR.

After yesterday's debugging we have basically solved it. I will test a bit more, and if everything holds up I will open a PR.

@myhloli
Collaborator

myhloli commented Oct 25, 2024

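@randydl You can submit it to the project directory on the dev branch. Follow the other projects there and create a directory for the code files and a README.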

@randydl
Contributor Author

randydl commented Oct 25, 2024

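@randydl You can submit it to the project directory on the dev branch. Follow the other projects there and create a directory for the code files and a README.

OK.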

@myhloli
Collaborator

myhloli commented Nov 5, 2024

@myhloli myhloli closed this as completed Nov 5, 2024
@Sakura4036

Sakura4036 commented Nov 8, 2024

@myhloli @234687552 Hello, could you please take a look at this pip installation issue @897
