name | about | labels |
---|---|---|
Bug Report | Use this template for reporting a bug | kind/bug |
yolov3_darknet53: when training reaches the execution phase and one worker process is killed with kill -9, the reported error message is not as expected
Hardware Environment (Ascend/GPU/CPU):
/device ascend
Software Environment (Mandatory):
-- MindSpore version (e.g., 1.7.0.Bxxx) :
-- Python version (e.g., Python 3.7.5) :
-- OS platform and distribution (e.g., Linux Ubuntu 16.04):
-- GCC/Compiler version (if compiled from source):
MindSpore package: commit_id = '[sha1]:5a94f41a,[branch]:(HEAD,origin/r2.3.q1,r2.3.q1)'
Run package: runpkg_version: Milan_C17/20240321
Execute Mode (Mandatory) (PyNative/Graph):
/mode pynative
/mode graph
test_ms_fmea_kill9_worker_process_0002
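1. Copy the network script and apply the required configuration changes to it.
2. Run the yolov3-darknet53 training script.
3. When the 8-device training run reaches the compile phase, kill one process with kill -9.
The test case is executed as follows: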
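The fault injection in step 3 can be illustrated with a minimal sketch. This is not the code of test_ms_fmea_kill9_worker_process_0002.py: the function name `kill9_one_worker` and the sleeping dummy workers are hypothetical stand-ins for the real training processes, and a Linux host is assumed. It only shows how a single worker can be sent SIGKILL and how the kill shows up in the process exit codes.

```python
import os
import signal
import subprocess
import sys


def kill9_one_worker(num_workers=8, victim=0):
    """Start dummy workers, SIGKILL one, and return every worker's exit code."""
    # Hypothetical stand-in: each worker only sleeps; a real run would launch
    # the distributed training script here instead.
    cmd = [sys.executable, "-c", "import time; time.sleep(30)"]
    workers = [subprocess.Popen(cmd) for _ in range(num_workers)]
    try:
        # Equivalent of running `kill -9 <pid>` against one worker.
        os.kill(workers[victim].pid, signal.SIGKILL)
        workers[victim].wait(timeout=10)
        # poll() returns None for workers that are still running.
        return [w.poll() for w in workers]
    finally:
        # Clean up the surviving dummy workers.
        for w in workers:
            if w.poll() is None:
                w.kill()
            w.wait()


if __name__ == "__main__":
    print(kill9_one_worker(num_workers=8, victim=0))
```

On POSIX systems `subprocess` reports a process terminated by a signal with a negative return code, so the killed worker appears as `-signal.SIGKILL` while the others are still running at the time of the check.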
source /home/miniconda3/bin/activate ci
export TRAIN_MODE=GRAPH_MODE
export DEVICE_TYPE=Ascend_Arm
export ENV_DEVICE=0
source solution_test/env_set.source -e ascend
cd solution_test/cases/03subject_test/00reliability_availability/01fault_injection/01business_fault/02process/00process_exit/ms_distribute_framework
pytest -s test_ms_fmea_kill9_worker_process_0002.py
Expected result: the log contains messages about the node exception and the process exit.
EE1001: The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship.
TraceBack (most recent call last):
ctx is NULL![FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:4677]
The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]