name | about | labels |
---|---|---|
Bug Report | Use this template for reporting a bug | kind/bug |
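Pangu_alpha network, with Cell sharing enabled to improve performance on the VM backend: GE's asynchronous interface ge::Session::RunGraphWithStreamAsync performs a stream synchronization internally, which causes a timeout.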
Hardware Environment (Ascend/GPU/CPU):
/device Ascend910A
Software Environment (Mandatory):
-- MindSpore version (e.g., 1.7.0.Bxxx) :
-- Python version (e.g., Python 3.7.5) :
-- OS platform and distribution (e.g., Linux Ubuntu 16.04):
-- GCC/Compiler version (if compiled from source):
Run package: HiAI/Milan_C17/20240414
MindSpore version: 2.3.0.B210
Execute Mode (Mandatory) (PyNative/Graph):
/mode graph
Test case path: solution_test/cases/01frame_func/03distributed/compiler_enhance/cell_share/
Test case: test_ms_cell_sharing_normal_001.py
Feature: compilation performance improvement
----------------------------------------------------
- Ascend Error Message:
----------------------------------------------------
EE9999: Inner Error!
EE9999: 2024-04-28-17:04:57.723.444 Failed to synchronize sink stream, retCode=0x7150050![FUNC:LoadCompleteByStream][FILE:model.cc][LINE:557]
TraceBack (most recent call last):
Model load complete process failed, model_id=3, retCode=0x7150050.[FUNC:ModelLoadComplete][FILE:context.cc][LINE:2388]
rtModelLoadComplete execute failed, reason=[the model stream execute failed][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
Call rtModelLoadComplete(rt_model_handle_) fail, ret: 0x7BC83[FUNC:DoTaskSink][FILE:davinci_model.cc][LINE:613]
Synchronize failed.[FUNC:UnbindStream][FILE:model.cc][LINE:427]
Unbind stream failed, model_id=3, stream_id=95, retCode=0x7150050.[FUNC:ModelUnbindStream][FILE:context.cc][LINE:2357]
rtModelUnbindStream execute failed, reason=[the model stream execute failed][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
Unbind stream failed, model_id=3, stream_id=94, retCode=0x7150050.[FUNC:ModelUnbindStream][FILE:context.cc][LINE:2357]
Unbind stream failed, model_id=3, stream_id=93, retCode=0x7150050.[FUNC:ModelUnbindStream][FILE:context.cc][LINE:2357]
Tear down stream failed, stream is binded, stream_id=95, model_id=3, retCode=0x7030004.[FUNC:TearDownStream][FILE:context.cc][LINE:433]
rtStreamDestroy execute failed, reason=[the relationship between the model and stream is incorrect.][FUNC:FuncErrorReason][FILE:error_message_manage.cc][LINE:53]
Tear down stream failed, stream is binded, stream_id=94, model_id=3, retCode=0x7030004.[FUNC:TearDownStream][FILE:context.cc][LINE:433]
Tear down stream failed, stream is binded, stream_id=93, model_id=3, retCode=0x7030004.[FUNC:TearDownStream][FILE:context.cc][LINE:433]
GraphManager RunGrapWithStreamhAsync failed,session id = 0, graph id = 30, stream = 0xaaab172a6960.[FUNC:RunGraphWithStreamAsync][FILE:inner_session.cc][LINE:513]
[Run][Graph]Run graph with stream asyn failed, error code = 507011, session id = 0,graph id = 30, stream = 0xaaab172a6960.[FUNC:RunGraphWithStreamAsync][FILE:ge_api.cc][LINE:800]
DEVICE[0] PID[94293]:
EXCEPTION STREAM:
Exception info:TGID=94293, model id=0, stream id=200, stream phase=SCHEDULE
Message info[0]:RTS_HWTS: slot_id=45, stream_id=200, task_id=260, result=261
Other info[0]:time=2024-04-28-17:04:48.071.993, function=send_timeout_cq_msg_with_result, line=1390, error code=0x105
EXCEPTION TASK:
Exception info:TGID=94293, model id=0, stream id=0, stream phase=DESTROY, task id=0, task type=aicore kernel, recently received task id=2488, recently send task id=2488, task phase=COMPLETE
Message info[0]:modelId=0 streamId=200 taskId=260 taskType=14
Other info[0]:time=2024-04-28-17:04:48.072.032, function=set_model_exec_fail, line=797, error code=0x91
(Please search "CANN Common Error Analysis" at https://www.mindspore.cn for error code description)
----------------------------------------------------
- C++ Call Stack: (For framework developers)
----------------------------------------------------
mindspore/ccsrc/plugin/device/ascend/hal/hardware/ge_graph_executor.cc:1295 RunGraphRefMode
[INFO] RUNTIME(94293,python):2024-04-28-17:07:17.523.738 [runtime.cc:1831] 94293 ~Runtime: deconstruct runtime
[INFO] RUNTIME(94293,python):2024-04-28-17:07:18.024.052 [runtime.cc:1838] 94293 ~Runtime: wait monitor success, use=5.
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.024.095 [context.cc:433]94293 TearDownStream:report error module_type=0, module_name=EE9999
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.024.106 [context.cc:433]94293 TearDownStream:Tear down stream failed, stream is binded, stream_id=93, model_id=3627651248, retCode=0x7030004.
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.024.213 [context.cc:433]94293 TearDownStream:report error module_type=0, module_name=EE9999
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.024.222 [context.cc:433]94293 TearDownStream:Tear down stream failed, stream is binded, stream_id=94, model_id=3627651248, retCode=0x7030004.
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.024.247 [context.cc:433]94293 TearDownStream:report error module_type=0, module_name=EE9999
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.024.253 [context.cc:433]94293 TearDownStream:Tear down stream failed, stream is binded, stream_id=95, model_id=3627651248, retCode=0x7030004.
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.024.292 [context.cc:433]94293 TearDownStream:report error module_type=0, module_name=EE9999
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.024.300 [context.cc:433]94293 TearDownStream:Tear down stream failed, stream is binded, stream_id=199, model_id=3623942240, retCode=0x7030004.
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.024.327 [context.cc:433]94293 TearDownStream:report error module_type=0, module_name=EE9999
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.024.334 [context.cc:433]94293 TearDownStream:Tear down stream failed, stream is binded, stream_id=206, model_id=0, retCode=0x7030004.
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.024.360 [context.cc:433]94293 TearDownStream:report error module_type=0, module_name=EE9999
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.024.366 [context.cc:433]94293 TearDownStream:Tear down stream failed, stream is binded, stream_id=205, model_id=0, retCode=0x7030004.
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.024.393 [context.cc:433]94293 TearDownStream:report error module_type=0, module_name=EE9999
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.024.416 [context.cc:433]94293 TearDownStream:Tear down stream failed, stream is binded, stream_id=204, model_id=0, retCode=0x7030004.
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.024.440 [context.cc:433]94293 TearDownStream:report error module_type=0, module_name=EE9999
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.024.447 [context.cc:433]94293 TearDownStream:Tear down stream failed, stream is binded, stream_id=203, model_id=0, retCode=0x7030004.
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.024.463 [context.cc:433]94293 TearDownStream:report error module_type=0, module_name=EE9999
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.024.468 [context.cc:433]94293 TearDownStream:Tear down stream failed, stream is binded, stream_id=202, model_id=0, retCode=0x7030004.
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.029.011 [context.cc:433]94293 TearDownStream:report error module_type=0, module_name=EE9999
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.029.041 [context.cc:433]94293 TearDownStream:Tear down stream failed, stream is binded, stream_id=201, model_id=0, retCode=0x7030004.
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.029.087 [context.cc:433]94293 TearDownStream:report error module_type=0, module_name=EE9999
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.029.094 [context.cc:433]94293 TearDownStream:Tear down stream failed, stream is binded, stream_id=200, model_id=0, retCode=0x7030004.
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.029.130 [context.cc:433]94293 TearDownStream:report error module_type=0, module_name=EE9999
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.029.136 [context.cc:433]94293 TearDownStream:Tear down stream failed, stream is binded, stream_id=96, model_id=4294967295, retCode=0x7030004.
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.029.194 [context.cc:433]94293 TearDownStream:report error module_type=0, module_name=EE9999
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.029.200 [context.cc:433]94293 TearDownStream:Tear down stream failed, stream is binded, stream_id=97, model_id=4294967295, retCode=0x7030004.
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.029.217 [context.cc:433]94293 TearDownStream:report error module_type=0, module_name=EE9999
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.029.223 [context.cc:433]94293 TearDownStream:Tear down stream failed, stream is binded, stream_id=98, model_id=4294967295, retCode=0x7030004.
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.029.252 [context.cc:433]94293 TearDownStream:report error module_type=0, module_name=EE9999
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.029.258 [context.cc:433]94293 TearDownStream:Tear down stream failed, stream is binded, stream_id=207, model_id=4294967295, retCode=0x7030004.
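For triage, the repeated TearDownStream errors above can be grouped by model_id to see which streams were still bound to which model at teardown. The following is a minimal sketch and not part of the original report: the helper name `group_teardown_failures` and the regular expression are assumptions derived only from the log format shown in this issue, and the sample is a three-line excerpt of that log.

```python
import re
from collections import defaultdict

# Assumed pattern: matches the "Tear down stream failed" lines exactly as they
# are worded in this report; other CANN versions may word them differently.
TEARDOWN_RE = re.compile(
    r"Tear down stream failed, stream is binded, "
    r"stream_id=(?P<stream>\d+), model_id=(?P<model>\d+), "
    r"retCode=(?P<ret>0x[0-9A-Fa-f]+)"
)


def group_teardown_failures(log_text):
    """Return {model_id: sorted stream ids} for every tear-down failure found."""
    streams_by_model = defaultdict(set)
    for match in TEARDOWN_RE.finditer(log_text):
        streams_by_model[int(match["model"])].add(int(match["stream"]))
    return {model: sorted(streams) for model, streams in streams_by_model.items()}


# Three lines copied from the runtime log in this issue.
sample = """\
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.024.106 [context.cc:433]94293 TearDownStream:Tear down stream failed, stream is binded, stream_id=93, model_id=3627651248, retCode=0x7030004.
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.024.222 [context.cc:433]94293 TearDownStream:Tear down stream failed, stream is binded, stream_id=94, model_id=3627651248, retCode=0x7030004.
[ERROR] RUNTIME(94293,python):2024-04-28-17:07:18.029.094 [context.cc:433]94293 TearDownStream:Tear down stream failed, stream is binded, stream_id=200, model_id=0, retCode=0x7030004.
"""

summary = group_teardown_failures(sample)
for model_id, stream_ids in summary.items():
    print(f"model_id={model_id}: streams {stream_ids}")
```

Running the same helper over the full log would show whether stream 200, the stream named in the EXCEPTION STREAM section, belongs to the same model as the other bound streams.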
Assigning to 林清客.