name | about | labels |
---|---|---|
Bug Report | Use this template for reporting a bug | kind/bug |
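Resnet50 network with ms_function added at random: after exporting to mindir and comparing mindir inference against ckpt inference, the process exits abnormally with no error message. The stack trace shows a Segmentation fault.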
Hardware Environment (Ascend/GPU/CPU):
Please delete the backend not involved:
/device ascend/GPU
Software Environment (Mandatory):
-- MindSpore version (e.g., 1.7.0.Bxxx) :
-- Python version (e.g., Python 3.7.5) :
-- OS platform and distribution (e.g., Linux Ubuntu 16.04):
-- GCC/Compiler version (if compiled from source):
MindSpore version: commit_id = '[sha1]:8abeb488,[branch]:(HEAD,origin/r2.3,r2.3)'
Run package version: Milan_C17/20240206
Execute Mode (Mandatory) (PyNative/Graph):
Please delete the mode not involved:
/mode pynative
/mode graph
test_ms_jit_network_001_mindir_infer
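1. Initialize the parameters and copy the network script.
2. In the Resnet50 network definition script, with dynamic LossScale enabled, randomly select 3-5 functions and construct methods and decorate them with @ms_function.
3. Train the Resnet50 network on the cifar10 dataset on a single device.
4. Run the full-network training script, save the ckpt, and export the ckpt to mindir.
5. Call the load_ckpt_mindir_compare script to load the ckpt and the mindir separately and compare the results.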
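For readers without access to the test suite, step 2 can be sketched roughly as follows. This is only an illustration under stated assumptions: `jit_stub` is a hypothetical stand-in for `ms_function` so the snippet runs without MindSpore, `TinyNet` and its method names are made up, and the real test case may edit the script source rather than decorate at runtime.

```python
import functools
import random


def jit_stub(fn):
    """Hypothetical stand-in for ms_function; it only tags the wrapped function."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    wrapper.is_jit = True
    return wrapper


def decorate_random(cls, candidates, decorator, rng, low=3, high=5):
    """Decorate between `low` and `high` randomly chosen methods of `cls`."""
    count = rng.randint(low, min(high, len(candidates)))
    chosen = rng.sample(candidates, count)
    for name in chosen:
        setattr(cls, name, decorator(getattr(cls, name)))
    return sorted(chosen)


class TinyNet:
    """Toy class standing in for the Resnet50 script (assumption, not the real network)."""

    def conv(self, x):
        return x + 1

    def bn(self, x):
        return x * 2

    def relu(self, x):
        return max(x, 0)

    def pool(self, x):
        return x - 1

    def fc(self, x):
        return x * 3

    def construct(self, x):
        return self.fc(self.pool(self.relu(self.bn(self.conv(x)))))


# A fixed seed makes the random selection replayable when a run crashes.
rng = random.Random(0)
names = ["conv", "bn", "relu", "pool", "fc", "construct"]
chosen = decorate_random(TinyNet, names, jit_stub, rng)
print("decorated:", chosen)
print("output:", TinyNet().construct(1))
```

Because the selection is random, recording the seed (or the list of decorated functions) alongside the core dump would make it easier to tell which combination triggers the crash.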
Steps to run the test case:
source /home/miniconda3/bin/activate ci
export TRAIN_MODE=PYNATIVE_MODE
export DEVICE_TYPE=GPU_PCIE
export ENV_DEVICE=1
source solution_test/env_set.source -e cuda11.6
cd solution_test/cases/01frame_func/04model_save_load/ms_function/ms_function_mindir_infer/
pytest -s test_ms_jit_network_001_mindir_infer.py
Expected result: training runs normally and the test case passes.
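The pass criterion from step 5 (ckpt inference and mindir inference agree) can be sketched as a tolerance check. The load_ckpt_mindir_compare script itself is not shown in this issue, so the helper names, the tolerances, and the sample values below are all assumptions for illustration.

```python
import math


def flatten(values):
    """Yield scalars from arbitrarily nested lists or tuples."""
    for v in values:
        if isinstance(v, (list, tuple)):
            yield from flatten(v)
        else:
            yield v


def outputs_match(ckpt_out, mindir_out, rel_tol=1e-3, abs_tol=1e-3):
    """Return True when both outputs have the same size and are element-wise close."""
    a = list(flatten(ckpt_out))
    b = list(flatten(mindir_out))
    if len(a) != len(b):
        return False
    return all(
        math.isclose(x, y, rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(a, b)
    )


# Made-up logits standing in for the two inference results.
ckpt_logits = [[0.10, 0.90], [0.75, 0.25]]
mindir_logits = [[0.1002, 0.8999], [0.7501, 0.2498]]
swapped_logits = [[0.10, 0.90], [0.25, 0.75]]

print("ckpt vs mindir:", outputs_match(ckpt_logits, mindir_logits))
print("ckpt vs swapped:", outputs_match(ckpt_logits, swapped_logits))
```

In this report the comparison is never reached, since the process dies before producing a result.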
When the process exits, the last INFO-level log lines are as follows:
[INFO] RUNTIME_FRAMEWORK(108500,7f51bd67b740,python):2024-02-23-18:19:09.741.761 [mindspore/ccsrc/runtime/graph_scheduler/actor/data_prepare_actor.cc:1016] PrepareDataForSequenceAndScalarValue] Prepare device data for value node: ValueNode 0.0001
[INFO] RUNTIME_FRAMEWORK(108500,7f51bd67b740,python):2024-02-23-18:19:09.741.784 [mindspore/ccsrc/runtime/graph_scheduler/actor/data_prepare_actor.cc:845] PrepareDataForValueNodeTensor] Prepare device data for value node: ValueNode Tensor(shape=[4], dtype=Int64, value=[256 256 3 3]), output index: 0 device address:0x55a4061e88d0
[INFO] RUNTIME_FRAMEWORK(108500,7f51bd67b740,python):2024-02-23-18:19:09.741.805 [mindspore/ccsrc/runtime/graph_scheduler/actor/data_prepare_actor.cc:1016] PrepareDataForSequenceAndScalarValue] Prepare device data for value node: ValueNode 0.0001
[INFO] RUNTIME_FRAMEWORK(108500,7f51bd67b740,python):2024-02-23-18:19:09.741.821 [mindspore/ccsrc/runtime/graph_scheduler/actor/data_prepare_actor.cc:1016] PrepareDataForSequenceAndScalarValue] Prepare device data for value node: ValueNode 0
[INFO] RUNTIME_FRAMEWORK(108500,7f51bd67b740,python):2024-02-23-18:19:09.741.974 [mindspore/ccsrc/runtime/graph_scheduler/actor/data_prepare_actor.cc:695] PrepareDataForHostTensorQueue] Prepare host data, input tensor size: 2, arg size: 0
[INFO] RUNTIME_FRAMEWORK(108500,7f51bd67b740,python):2024-02-23-18:19:09.741.993 [mindspore/ccsrc/runtime/graph_scheduler/actor/data_source_actor.cc:39] FetchData] Data source actor(kernel_graph_4_HostDSActor) fetches data.
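Since the log simply stops without an error message, one way to confirm that the process was killed by a signal is to inspect the exit status of the child process. The sketch below uses only the Python standard library; `describe_exit` and `run_and_describe` are hypothetical helper names, and the negative return code convention applies to POSIX systems only.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable description."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # On POSIX, subprocess reports death by signal N as return code -N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "signal {}".format(-returncode)
        return "killed by " + name
    return "exited with status {}".format(returncode)


def run_and_describe(cmd):
    """Run `cmd` and describe how it ended."""
    proc = subprocess.run(cmd)
    return describe_exit(proc.returncode)


# Harmless demonstration with a child interpreter that exits with status 3.
print(run_and_describe([sys.executable, "-c", "raise SystemExit(3)"]))
# What a segfaulting child would be reported as.
print(describe_exit(-signal.SIGSEGV))
```

Wrapping the training command in something like `run_and_describe` would make the abnormal exit visible in the test log even when the framework prints nothing.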
Stack trace:
$ gdb python -c core-108500
GNU gdb (Ubuntu 8.1.1-0ubuntu1) 8.1.1
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...done.
...
...
...
Core was generated by `python train.py --device_target=GPU --data_path=/home/workspace/mindspore_datas'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f51911d5006 in ?? ()
[Current thread is 1 (LWP 108500)]
(gdb) bt
#0 0x00007f51911d5006 in ?? ()
#1 0x00007f51ae9a1dd0 in ?? ()
#2 0x00007f51ae9a1c18 in ?? ()
#3 0x000055a3ff6b25c9 in ?? ()
#4 0x000055a3ff6b25c9 in ?? ()
#5 0x000055a3ff6b25c9 in ?? ()
#6 0x000055a3ff6b25c8 in ?? ()
#7 0x000055a3ff6b25d9 in ?? ()
#8 0x000055a3ff6b27c8 in ?? ()
#9 0x00007f51ae9ad3c0 in ?? ()
#10 0x00007f5100000010 in ?? ()
#11 0x000055a3ff6b25c8 in ?? ()
#12 0x00007f51ae9a1df8 in ?? ()
#13 0x0000000000000006 in ?? ()
#14 0x0000000000000000 in ?? ()
(gdb)
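The backtrace above has no resolved symbols, so it does not show where the crash happened. As a complementary step, Python's standard `faulthandler` module can dump the Python-level traceback when SIGSEGV arrives. This is a minimal sketch, assuming it is placed at the top of the training script; the log file name is made up, and the dump covers Python frames only, not the C++ frames inside the framework.

```python
import faulthandler
import os
import tempfile

# Hypothetical log location; any writable path works.
fault_log_path = os.path.join(tempfile.gettempdir(), "train_fault.log")

# Keep the file object alive for the whole process: faulthandler writes
# directly to its file descriptor when a fatal signal is received.
fault_log = open(fault_log_path, "w")
faulthandler.enable(file=fault_log, all_threads=True)

print("faulthandler enabled:", faulthandler.is_enabled())
print("tracebacks will be written to:", fault_log_path)
```

The same effect is available without editing the script by running `python -X faulthandler train.py ...` or setting `PYTHONFAULTHANDLER=1`, in which case the traceback goes to stderr.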
Routing to 周培晨.