[Engineering] How to allocate GPUs sensibly for a deep learning API under gunicorn

Reader submission 1282 2025-04-04

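Background

My boss raised a requirement: when gunicorn starts multiple worker processes, how do we make sure the PyTorch models are spread evenly across the available GPUs? In principle, if each process could get something like its own index, the assignment would be simple. So the core problem boils down to this: how does a worker process obtain its index?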

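Analysis

I first looked for an existing discussion of the problem and found https://github.com/benoitc/gunicorn/issues/1278. Many people have the same need, but the PRs proposed there do not seem to have led to a solution, so the next step was to see what the official documentation offers.

Reading further through the settings documentation at http://docs.gunicorn.org/en/latest/settings.html shows that the server hooks described there let us assign an ID to each worker at the moment it is started.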

Practice

First we write gunicorn_conf.py:


# RTFM -> http://docs.gunicorn.org/en/latest/settings.html#settings
import os

from service.config import WORKERS

bind = '0.0.0.0:2048'
workers = WORKERS
timeout = 300
max_requests = 2000
max_requests_jitter = 500


def on_starting(server):
    """
    Attach a set of IDs that can be temporarily re-used.
    Used on reloads when each worker exists twice.
    """
    server._worker_id_overload = set()


def nworkers_changed(server, new_value, old_value):
    """
    Gets called on startup too.
    Set the current number of workers. Required if we raise the worker count
    temporarily using TTIN because server.cfg.workers won't be updated and if
    one of those workers dies, we wouldn't know the ids go that far.
    """
    server._worker_id_current_workers = new_value


def _next_worker_id(server):
    """
    If there are IDs open for re-use, take one. Else look for a free one.
    """
    if server._worker_id_overload:
        return server._worker_id_overload.pop()

    in_use = set(w._worker_id for w in server.WORKERS.values() if w.alive)
    free = set(range(1, server._worker_id_current_workers + 1)) - in_use
    return free.pop()


def on_reload(server):
    """
    Add a full set of ids into overload so it can be re-used once.
    """
    server._worker_id_overload = set(range(1, server.cfg.workers + 1))


def pre_fork(server, worker):
    """
    Attach the next free worker_id before forking off.
    """
    worker._worker_id = _next_worker_id(server)


def post_fork(server, worker):
    """
    Put the worker_id into an env variable for further use within the app.
    """
    os.environ["APP_WORKER_ID"] = str(worker._worker_id)
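To see how the ID bookkeeping behaves when a worker dies and is replaced, the selection rule can be exercised without gunicorn. The following is only a sketch: `spawn`, the PIDs, and the `SimpleNamespace` objects are hypothetical stand-ins for gunicorn's arbiter and workers, and only `next_worker_id` mirrors the `_next_worker_id` logic from the config.

```python
from types import SimpleNamespace


def next_worker_id(server):
    """Same selection rule as _next_worker_id in gunicorn_conf.py."""
    if server._worker_id_overload:
        return server._worker_id_overload.pop()
    in_use = {w._worker_id for w in server.WORKERS.values() if w.alive}
    free = set(range(1, server._worker_id_current_workers + 1)) - in_use
    return free.pop()


def spawn(server, pid):
    """Hypothetical stand-in for the arbiter forking one worker."""
    worker = SimpleNamespace(alive=True, _worker_id=None)
    worker._worker_id = next_worker_id(server)  # what pre_fork does
    server.WORKERS[pid] = worker
    return worker


# Fake arbiter state: no workers yet, target of 3 workers, nothing queued for reuse.
server = SimpleNamespace(
    WORKERS={},
    _worker_id_overload=set(),
    _worker_id_current_workers=3,
)

first_ids = sorted(spawn(server, pid)._worker_id for pid in (101, 102, 103))

# Simulate the worker holding ID 2 dying, then a replacement being spawned.
dead_pid = next(pid for pid, w in server.WORKERS.items() if w._worker_id == 2)
del server.WORKERS[dead_pid]
replacement = spawn(server, 104)

print(first_ids, replacement._worker_id)  # [1, 2, 3] 2
```

The replacement picks up the ID that was freed, so the set of IDs in use stays within 1..N and a restart triggered by `max_requests` does not shift a worker to a new slot.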

With this in place, each child process can read its own index from the APP_WORKER_ID environment variable.


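# -*- coding: utf-8 -*-
import os

import torch


def set_process_gpu():
    # APP_WORKER_ID is exported by post_fork in gunicorn_conf.py; default to 1 outside gunicorn.
    worker_id = int(os.environ.get('APP_WORKER_ID', 1))
    devices = os.environ.get('CUDA_VISIBLE_DEVICES', '')
    if not devices:
        print('CUDA_VISIBLE_DEVICES is not set in the current environment, so use the default')

    device_count = torch.cuda.device_count()
    if device_count == 0:
        # No usable GPU: skip the selection instead of dividing by zero below.
        return

    rand_max = 9527  # fixed offset; it only shifts which GPU worker 1 lands on
    gpu_index = (worker_id + rand_max) % device_count
    print('current worker id {} set the gpu id: {}'.format(worker_id, gpu_index))
    torch.cuda.set_device(gpu_index)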

With this function each process can easily set the GPU it runs on, so the processes are spread evenly according to the number of GPUs.
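The even-spread claim can be checked without a GPU by applying the same modulo formula to a range of worker IDs. This is a small sketch: `gpu_index_for` and `distribution` are illustrative names that do not appear in the article, and the worker and GPU counts are made-up inputs.

```python
from collections import Counter

OFFSET = 9527  # the constant the article calls rand_max


def gpu_index_for(worker_id, device_count, offset=OFFSET):
    """Same formula as set_process_gpu, without touching torch."""
    return (worker_id + offset) % device_count


def distribution(workers, device_count):
    """Count how many of the worker IDs 1..workers land on each GPU index."""
    return Counter(gpu_index_for(w, device_count) for w in range(1, workers + 1))


print(sorted(distribution(4, 2).items()))  # [(0, 2), (1, 2)]
print(sorted(distribution(8, 3).items()))  # [(0, 3), (1, 3), (2, 2)]
```

Because the worker IDs are consecutive, the per-GPU counts differ by at most one whatever the offset is; the offset only changes which GPU gets the extra worker. Note also that `torch.cuda.device_count()` and `torch.cuda.set_device()` work on the devices visible to the process, so when CUDA_VISIBLE_DEVICES is set, the index is relative to that list.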

gunicorn -c gunicorn_conf.py wsgi:app

wsgi.py is the module that holds the actual app; start it as usual.
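The article does not show wsgi.py, so the following is only a guess at its shape: a bare WSGI callable rather than whatever framework the original service used. It assumes gunicorn's `preload_app` is left off, so that the module is imported inside each worker after `post_fork` has exported APP_WORKER_ID. `select_gpu` is a hypothetical helper that repeats the article's formula, and the torch import is made optional only so the sketch also runs on a machine without PyTorch.

```python
import os

try:
    import torch
except ImportError:  # keep the sketch runnable where PyTorch is not installed
    torch = None


def select_gpu():
    """Pick the GPU for this worker; return None when no GPU is usable."""
    worker_id = int(os.environ.get("APP_WORKER_ID", 1))
    if torch is None or not torch.cuda.is_available():
        return None
    gpu_index = (worker_id + 9527) % torch.cuda.device_count()
    torch.cuda.set_device(gpu_index)
    return gpu_index


# Runs once per worker at import time, before any model would be loaded.
GPU_INDEX = select_gpu()


def app(environ, start_response):
    """Minimal WSGI app that reports which worker and GPU served the request."""
    worker_id = os.environ.get("APP_WORKER_ID", "1")
    body = "worker={} gpu={}\n".format(worker_id, GPU_INDEX).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Selecting the device before the model is loaded means the weights are placed on the chosen GPU from the start. If the module were imported in the master process instead, every worker would see the same state and APP_WORKER_ID would not yet be set.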

