name | about | labels |
---|---|---|
Bug Report | Use this template for reporting a bug | kind/bug |
ABINet network overfitting test case fails with: Equal-op0 op dtype is not same, type1:DT_INT64, type2:DT_INT32
Hardware Environment (Ascend/GPU/CPU): Ascend
/device ascend
Run package: run0414
mindspore:2.3.0rc1.B200_daily0425 commit_id = '[sha1]:b5a20264,[branch]:(HEAD,origin/master,origin/HEAD,master)'
Software Environment (Mandatory):
-- MindSpore version (e.g., 1.7.0.Bxxx) :
-- Python version (e.g., Python 3.7.5) :
-- OS platform and distribution (e.g., Linux Ubuntu 16.04):
-- GCC/Compiler version (if compiled from source):
Execute Mode (Mandatory) (PyNative/Graph): Graph
/mode graph
PyTest/testcase/mindocr/daily_models/rec/abinet/test_abinet_resnet45_en_ascend.py
Expected behavior: loss meets the target and performance meets the target.
[ERROR] GE(2564977,python):2024-04-26-13:56:32.308.307 [graph_prepare.cc:2321]2564977 InferShapeForPreprocess: ErrorNo: 4294967295(failed) [COMP][PRE_OPT][Run][GePasses] infershape for preprocess failed, ret:4294967295.
[ERROR] GE(2564977,python):2024-04-26-13:56:32.308.338 [graph_prepare.cc:1769]2564977 FormatAndShapeProcess: ErrorNo: 1343242270(Prepare Graph infershape failed) [COMP][PRE_OPT][Call][InferShapeForPreprocess] Prepare Graph infershape failed
[INFO] GE(2564977,python):2024-04-26-13:56:32.308.349 [graph_prepare.cc:2008][EVENT]2564977 PrepareDynShape:[GEPERFTRACE] The time cost of Prepare::FormatAndShapeProcess is [906823] micro second.
[ERROR] GE(2564977,python):2024-04-26-13:56:32.308.359 [graph_prepare.cc:2008]2564977 PrepareDynShape: ErrorNo: 1343242270(Prepare Graph infershape failed) [COMP][PRE_OPT][Process][Prepare_FormatAndShapeProcess] failed
[INFO] GE(2564977,python):2024-04-26-13:56:32.308.369 [graph_manager.cc:1083][EVENT]2564977 PreRunOptimizeOriginalGraph:[GEPERFTRACE] The time cost of GraphManager::stages.preparer.PrepareDynShape is [1079515] micro second.
[ERROR] GE(2564977,python):2024-04-26-13:56:32.308.379 [graph_manager.cc:1083]2564977 PreRunOptimizeOriginalGraph: ErrorNo: 1343242270(Prepare Graph infershape failed) [COMP][PRE_OPT][Process][GraphManager_stages.preparer.PrepareDynShape] failed
[ERROR] GE(2564977,python):2024-04-26-13:56:32.308.392 [graph_manager.cc:3817]2564977 OptimizeGraph: ErrorNo: 1343242270(Prepare Graph infershape failed) [COMP][PRE_OPT][Run][PreRunOptimizeOriginalGraph] failed for graph:kernel_graph0, session_id:0
[ERROR] GE(2564977,python):2024-04-26-13:56:32.308.405 [pne_model_builder.cc:125]2564977 OptimizeGraph: ErrorNo: 4294967295(failed) [COMP][PRE_OPT][Optimize][Graph] failed, graph = kernel_graph0, engine = NPU
[ERROR] GE(2564977,python):2024-04-26-13:56:32.332.195 [graph_manager.cc:1286]2564977 PreRun: ErrorNo: 4294967295(failed) [COMP][PRE_OPT][Build][Model] failed, session_id:0, graph_id:1.
[INFO] ATRACE(2564977,python):2024-04-26-13:56:32.343.673 [tracer_schedule.c:187](tid:2564977) destory object RUNTIME_ATRACE_DEV64_TS0, exitSave(false).
[ERROR] GE(2564977,python):2024-04-26-13:56:32.344.057 [graph_manager.cc:4409]2564977 CompileGraph: ErrorNo: 4294967295(failed) [COMP][PRE_OPT][Call][PreRun] Failed, graph_id:1, session_id:0.
[ERROR] GE(2564977,python):2024-04-26-13:56:32.347.532 [inner_session.cc:964]2564977 CompileGraph: ErrorNo: 4294967295(failed) [COMP][PRE_OPT][Compile][Graph]Failed, InnerSession:0, graph_id:1.
[ERROR] GE(2564977,python):2024-04-26-13:56:32.347.636 [ge_api.cc:1165]2564977 CompileGraph: ErrorNo: 4294967295(failed) [COMP][PRE_OPT][Compile][Graph]Compile graph failed, error code:1343225857, session_id:0, graph_id:1.
[ERROR] GE_ADPT(2564977,ffffada6d020,python):2024-04-26-13:56:32.350.920 [mindspore/ccsrc/transform/graph_ir/graph_runner.cc:425] CompileGraph] Call GE CompileGraph Failed, ret is: 1343225857
Traceback (most recent call last):
File "/home/zl/jenkins/workspace/Kits/source_code/mindocr_daily/tools/train_overfit_mindocr.py", line 276, in <module>
main(args, config)
File "/home/zl/jenkins/workspace/Kits/source_code/mindocr_daily/tools/train_overfit_mindocr.py", line 161, in main
model.train(
File "/home/miniconda3/envs/Python380/lib/python3.8/site-packages/mindspore/train/model.py", line 1082, in train
self._train(epoch,
File "/home/miniconda3/envs/Python380/lib/python3.8/site-packages/mindspore/train/model.py", line 115, in wrapper
func(self, *args, **kwargs)
File "/home/miniconda3/envs/Python380/lib/python3.8/site-packages/mindspore/train/model.py", line 630, in _train
self._train_process(epoch, train_dataset, list_callback, cb_params, initial_epoch, valid_infos)
File "/home/miniconda3/envs/Python380/lib/python3.8/site-packages/mindspore/train/model.py", line 932, in _train_process
outputs = self._train_network(*next_element)
File "/home/miniconda3/envs/Python380/lib/python3.8/site-packages/mindspore/nn/cell.py", line 697, in __call__
out = self.compile_and_run(*args, **kwargs)
File "/home/miniconda3/envs/Python380/lib/python3.8/site-packages/mindspore/nn/cell.py", line 1020, in compile_and_run
self.compile(*args, **kwargs)
File "/home/miniconda3/envs/Python380/lib/python3.8/site-packages/mindspore/nn/cell.py", line 998, in compile
_cell_graph_executor.compile(self, *self._compile_args, phase=self.phase,
File "/home/miniconda3/envs/Python380/lib/python3.8/site-packages/mindspore/common/api.py", line 1630, in compile
result = self._graph_executor.compile(obj, args, kwargs, phase, self._use_vm_mode())
RuntimeError: Compile graph kernel_graph0 failed.
----------------------------------------------------
- Ascend Error Message:
----------------------------------------------------
E40024: 2024-04-26-13:56:31.122.886 Failed call Python Func/Meathod [call_op_func], Reason[Traceback (most recent call last):
File "/home/miniconda3/envs/Python380/lib/python3.8/site-packages/te_fusion/fusion_manager.py", line 816, in call_op_func
return op_func(*inputs, *outputs, *new_attrs)
File "/usr/local/Ascend/latest/opp/built-in/op_impl/ai_core/tbe/impl/dynamic/concat_d.py", line 60, in op_select_format
return concat_v2_op_select_format(input_values, output_data, concat_dim, kernel_name)
File "/usr/local/Ascend/latest/opp/built-in/op_impl/ai_core/tbe/impl/concat_v2_d.py", line 140, in op_select_format
if input_shape[concat_dim] % align_len != 0:
IndexError: list index out of range
]
Possible Cause: The Python Func/Meathod does not exist.
TraceBack (most recent call last):
Failed call Python Func/Meathod [call_op_func], Reason[Traceback (most recent call last):
File "/home/miniconda3/envs/Python380/lib/python3.8/site-packages/te_fusion/fusion_manager.py", line 816, in call_op_func
return op_func(*inputs, *outputs, *new_attrs)
File "/usr/local/Ascend/latest/opp/built-in/op_impl/ai_core/tbe/impl/concat_d.py", line 62, in op_select_format
return concat_v2_op_select_format(input_values, output_data, concat_dim, kernel_name)
File "/usr/local/Ascend/latest/opp/built-in/op_impl/ai_core/tbe/impl/concat_v2_d.py", line 140, in op_select_format
if input_shape[concat_dim] % align_len != 0:
IndexError: list index out of range
]
op[Default/network-NetWithLossWrapper/_net-BaseModel/backbone-ABINetIterBackbone/vision-BaseVision/Equal-op0], The Default/network-NetWithLossWrapper/_net-BaseModel/backbone-ABINetIterBackbone/vision-BaseVision/Equal-op0 op dtype is not same, type1:DT_INT64, type2:DT_INT32[FUNC:CheckTwoInputDtypeSame][FILE:util.cc][LINE:115]
Verifying Default/network-NetWithLossWrapper/_net-BaseModel/backbone-ABINetIterBackbone/vision-BaseVision/Equal-op0 failed.[FUNC:InferShapeAndType][FILE:infershape_pass.cc][LINE:132]
Call InferShapeAndType for node:Default/network-NetWithLossWrapper/_net-BaseModel/backbone-ABINetIterBackbone/vision-BaseVision/Equal-op0(Equal) failed[FUNC:Infer][FILE:infershape_pass.cc][LINE:120]
process pass InferShapePass on node:Default/network-NetWithLossWrapper/_net-BaseModel/backbone-ABINetIterBackbone/vision-BaseVision/Equal-op0 failed, ret:4294967295[FUNC:RunPassesOnNode][FILE:base_pass.cc][LINE:570]
[Call][PreRun] Failed, graph_id:1, session_id:0.[FUNC:CompileGraph][FILE:graph_manager.cc][LINE:4409]
[Compile][Graph]Compile graph failed, error code:1343225857, session_id:0, graph_id:1.[FUNC:CompileGraph][FILE:ge_api.cc][LINE:1165]
(Please search "CANN Common Error Analysis" at https://www.mindspore.cn for error code description)
----------------------------------------------------
- C++ Call Stack: (For framework developers)
----------------------------------------------------
mindspore/ccsrc/plugin/device/ascend/hal/hardware/ge_graph_executor.cc:971 CompileGraph
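The decisive line in the Ascend error message above is the `op dtype is not same` entry, which is easy to lose among the GE stack lines. As a rough sketch (the regular expression is modelled only on the single line shown in this report, and the helper name `find_dtype_mismatches` is hypothetical), such entries could be pulled out of a saved log like this:

```python
import re

# Pattern modelled on the "op dtype is not same" line in this report.
# Other CANN versions may format the message differently (assumption).
MISMATCH_RE = re.compile(
    r"op\[(?P<op>[^\]]+)\].*?op dtype is not same, "
    r"type1:(?P<type1>\w+), type2:(?P<type2>\w+)"
)


def find_dtype_mismatches(log_text):
    """Return (op_name, type1, type2) for each dtype-mismatch line in log_text."""
    results = []
    for line in log_text.splitlines():
        match = MISMATCH_RE.search(line)
        if match:
            results.append(
                (match.group("op"), match.group("type1"), match.group("type2"))
            )
    return results


# The failing node name and message, copied from the error output above.
OP = (
    "Default/network-NetWithLossWrapper/_net-BaseModel/"
    "backbone-ABINetIterBackbone/vision-BaseVision/Equal-op0"
)
SAMPLE = (
    f"op[{OP}], The {OP} op dtype is not same, "
    "type1:DT_INT64, type2:DT_INT32"
    "[FUNC:CheckTwoInputDtypeSame][FILE:util.cc][LINE:115]"
)

if __name__ == "__main__":
    for op_name, type1, type2 in find_dtype_mismatches(SAMPLE):
        print(op_name, type1, type2)
```

Running the same helper over the logs of the other affected test cases would show whether they fail on the same kind of node.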
Test case test_ms_resnet50_imagenet_boost_train_infer_0001 also hits this issue.
The following test cases also hit this issue:
test_ms_bert_large_boost_en_wiki_train_infer_8p_0001
test_ms_bert_large_boost_en_wiki_train_infer_epoch_40_8p_0003
The following two test cases hit this issue:
test_ms_mobilenetv2_imagenet2012_train_ascend_check_fps_1p_0001
test_ms_mobilenetv2_imagenet2012_train_ascend_check_fps_1p_0001
Bisection traced the regression to this PR: !68591: pick feature-2.3-kbk-infer-opt
2024/4/24 14:45:28 afcd1ec3b8d Milan_C17/20240414 pass
2024/4/24 13:59:07 831ae0d7933 Milan_C17/20240414 fail
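For readers unfamiliar with the bisection step, here is a minimal sketch of the idea. It is not the tooling actually used for this issue: the commit labels are placeholders, and `is_bad` is a hypothetical callback that in practice would build the commit and run the failing pytest case. It assumes the commits are ordered oldest to newest, the first is known good, the last is known bad, and every commit after the first bad one is also bad.

```python
def bisect_first_bad(commits, is_bad):
    """Return the first bad commit in an oldest-to-newest list.

    Assumes commits[0] is good, commits[-1] is bad, and the result
    flips from good to bad exactly once along the list.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the first bad commit is at mid or earlier
        else:
            lo = mid  # the first bad commit is after mid
    return commits[hi]


if __name__ == "__main__":
    # Placeholder history; "c3" plays the role of the offending merge.
    history = ["c0", "c1", "c2", "c3", "c4", "c5"]
    first_bad = bisect_first_bad(history, lambda c: history.index(c) >= 3)
    print(first_bad)
```

Each probe halves the remaining range, so a day of merged commits needs only a handful of test runs.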
Root cause identified: the ArgMax operator is now implemented with ArgMaxV2. Previously the AICPU CustArgMax operator was used. With the ArgMaxV2 implementation, GE fails during infershape:
PASS: ge_onnx_00000020_graph_1_after_infershape.pbtxt
output_desc_dtype:0
ge::CustArgMax
s: "DT_INT32"
ge::ArgMaxV2
s: "DT_INT64"
As a result, the two inputs of the downstream Equal node have different dtypes, which triggers the error.
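The failure mode can be illustrated with a small model. This is not GE code: `check_two_input_dtype_same` is a toy stand-in named after the `CheckTwoInputDtypeSame` function in the error message, the output dtypes come from the dump comparison above, and the assumption that the other input of Equal is `DT_INT32` is inferred from the reported `type2`.

```python
# Output dtypes as recorded in the after-infershape dump above.
ARGMAX_OUTPUT_DTYPE = {
    "CustArgMax": "DT_INT32",  # previous AICPU implementation
    "ArgMaxV2": "DT_INT64",    # new implementation before the fix
}


def check_two_input_dtype_same(op_name, dtype1, dtype2):
    """Toy stand-in for the check GE reports; raises when the dtypes differ."""
    if dtype1 != dtype2:
        raise TypeError(
            f"{op_name} op dtype is not same, type1:{dtype1}, type2:{dtype2}"
        )
    return dtype1


def equal_on_argmax(argmax_impl, other_dtype="DT_INT32"):
    """Model Equal(argmax_output, other); other_dtype is an assumption."""
    return check_two_input_dtype_same(
        "Equal-op0", ARGMAX_OUTPUT_DTYPE[argmax_impl], other_dtype
    )


if __name__ == "__main__":
    print(equal_on_argmax("CustArgMax"))  # both inputs agree
    try:
        equal_on_argmax("ArgMaxV2")
    except TypeError as err:
        print(err)  # same shape of message as in the report
```

In this model the error disappears as soon as the ArgMaxV2 output dtype matches the other input again, which is what propagating the requested output type to the operator is meant to achieve.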
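The new ArgMaxV2 operator implementation is now used, but the dtype definition in mindspore/ccsrc/transform/graph_ir/op_declare/elewise_calculation_ops_declare.cc was not updated accordingly.
The code is changed as follows: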
// ArgMaxV2
INPUT_MAP(ArgMaxV2) = {{1, INPUT_DESC(x)}, {2, INPUT_DESC(dimension)}, {3, INPUT_DESC(dtype)}};
ATTR_INPUT_MAP(ArgMaxV2) = {{"axis", "dimension"}};
INPUT_ATTR_MAP(ArgMaxV2) = {{kIndex3, ATTR_DESC(dtype, AnyTraits<GEType>())}};
ATTR_MAP(ArgMaxV2) = {{"output_type", ATTR_DESC(dtype, AnyTraits<GEType>())}};
OUTPUT_MAP(ArgMaxV2) = {{0, OUTPUT_DESC(y)}};
REG_ADPT_DESC(ArgMaxV2, kNameArgMaxV2, ADPT_DESC(ArgMaxV2))
REG_ADPT_DESC(ArgMax, kNameArgmax, ADPT_DESC(ArgMaxV2));
https://e.gitee.com/mind_spore/dashboard?issue=I9K1EM
https://e.gitee.com/mind_spore/dashboard?issue=I9K0T5
!68963:fix bug: argmax v2 attr map
!68962:fix bug: argmax v2 attr map
Regression version:
runpkg_version: Milan_C17/20240414
mindspore: 2.3.0rc2
commit_id = '[sha1]:39ac2284,[branch]:(HEAD,origin/master,origin/HEAD,master)'
Regression steps: pytest -s PyTest/testcase/mindocr/daily_models/rec/abinet/test_abinet_resnet45_en_ascend.py
Basic functionality: training runs normally and loss meets the target
Test conclusion: regression passed
Regression tester: jianyunchao
Regression date: 2024-5-5