name | about | labels |
---|---|---|
Bug Report | Use this template for reporting a bug | kind/bug |
[vit_b_32_224] Training accuracy does not improve normally
Hardware environment (Ascend/GPU/CPU):
/device ascend
In the previous CI run of this test case, accuracy rose normally:
run package: Milan_C17/20240414
MindSpore package: master_20240417142516_74e1f3ea86a988bceb71bd7f02b674af4c97c89a/
Execution mode (PyNative/Graph):
/mode graph
test_ms_lab_vit_b32_224_acc3_ascend_train_infer_8p_0005
1. Get the code from solution_test.
2. cd solution_test/cases/02network/00cv/vit_b/train/
3. pytest -s test_ms_lab_vit_b32_224_acc3_ascend_train_infer_8p_0005.py
(The final training command is: cd /home/jenkins/workspace/TDT_deployment/solution_test/cases/02network/00cv/vit_b/train/test_ms_lab_vit_b32_224_acc3_ascend_train_infer_8p_0005;bash run_distribute_train.sh /home/workspace/config/hccl_8p.json --config configs/vit/vit_b32_224_ascend.yaml --data_dir /home/workspace/mindspore_dataset/ImageNet2012 --distribute True --ckpt_path /home/workspace/mindspore_ckpt/mindcv_models_ckpt/accuracy_cropping_ckpt_ascend/vit/vit_b_32_224-166_312.ckpt --val_while_train True --resume_opt False --ckpt_save_interval 1 --val_interval 1 > train.log 2>&1 &)
4. Check whether the training accuracy is normal.
Expected result: [vit_b_32_224] training accuracy improves normally.
acc test_data_list : [1.788, 3.822, 6.178, 7.638, 9.61, 11.136, 12.892, 14.756, 17.314, 19.976, 20.296, 23.756, 26.674, 29.262, 31.196, 32.538, 34.088, 35.936, 36.102, 36.536, 38.788, 40.084, 40.056, 41.454, 42.144, 41.38, 41.296, 43.15, 43.41, 42.902, 44.878, 44.332, 45.474, 46.144, 46.604, 47.598]
acc standard_data : [
66.362, 66.632, 66.738, 67.084, 67.222, 67.036, 67.16, 67.29, 67.106, 67.324, 67.788, 67.732, 67.72, 67.826,
67.862, 67.904, 68.36, 68.526, 68.312, 68.654, 68.564, 68.91, 68.73, 69.14, 69.106, 69.176, 69.572, 69.408,
69.562, 69.474, 69.698, 69.64, 69.976, 70.286, 70.61, 70.174
]
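The two lists above can be compared epoch by epoch to quantify how far the failing run falls short of the baseline. The following is a minimal sketch, not the check the CI test actually performs: the helper names `accuracy_gaps` and `within_tolerance` and the 1.0 percentage-point tolerance are assumptions made for illustration. The values are copied from the report.

```python
# Accuracy values (Top-1, in percent) copied from the report above.
test_acc = [
    1.788, 3.822, 6.178, 7.638, 9.61, 11.136, 12.892, 14.756, 17.314,
    19.976, 20.296, 23.756, 26.674, 29.262, 31.196, 32.538, 34.088,
    35.936, 36.102, 36.536, 38.788, 40.084, 40.056, 41.454, 42.144,
    41.38, 41.296, 43.15, 43.41, 42.902, 44.878, 44.332, 45.474,
    46.144, 46.604, 47.598,
]
standard_acc = [
    66.362, 66.632, 66.738, 67.084, 67.222, 67.036, 67.16, 67.29, 67.106,
    67.324, 67.788, 67.732, 67.72, 67.826, 67.862, 67.904, 68.36, 68.526,
    68.312, 68.654, 68.564, 68.91, 68.73, 69.14, 69.106, 69.176, 69.572,
    69.408, 69.562, 69.474, 69.698, 69.64, 69.976, 70.286, 70.61, 70.174,
]


def accuracy_gaps(measured, baseline):
    """Return baseline minus measured accuracy for each epoch."""
    if len(measured) != len(baseline):
        raise ValueError("both lists must cover the same number of epochs")
    return [round(b - m, 3) for m, b in zip(measured, baseline)]


def within_tolerance(measured, baseline, tol=1.0):
    """True if no epoch falls more than `tol` points below the baseline.

    The default tolerance is a hypothetical value, not the CI threshold.
    """
    return all(gap <= tol for gap in accuracy_gaps(measured, baseline))


gaps = accuracy_gaps(test_acc, standard_acc)
print(f"epochs compared: {len(gaps)}")
print(f"first-epoch gap: {gaps[0]:.3f}, final-epoch gap: {gaps[-1]:.3f}")
print("within tolerance:", within_tolerance(test_acc, standard_acc))
```

Reporting the per-epoch gap rather than only the final value makes it visible that the run starts near zero accuracy, as if the checkpoint passed via `--ckpt_path` had not taken effect, although the report itself does not establish the cause.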
[2024-04-27 21:45:31] mindcv.utils.callbacks INFO - Epoch: [168/36], batch: [312/312], loss: 6.361636, lr: 0.000121, time: 186.347856s
[2024-04-27 21:45:39] mindcv.utils.callbacks INFO - Validation Top_1_Accuracy: 3.8220%, Top_5_Accuracy: 11.5860%, time: 7.969699s
[2024-04-27 21:45:39] mindcv.utils.callbacks INFO - => New best val acc: 3.8220%
[2024-04-27 21:45:53] mindcv.utils.callbacks INFO - Saving model to ./ckpt/vit_b_32_224-168_312.ckpt
[2024-04-27 21:46:03] mindcv.utils.checkpoint_manager INFO - Top-k accuracy checkpoints:
./ckpt/vit_b_32_224-168_312.ckpt 0.03821999952197075
./ckpt/vit_b_32_224-167_312.ckpt 0.017880000174045563
[2024-04-27 21:46:03] mindcv.utils.callbacks INFO - Total time since last epoch: 217.913440(train: 186.399469, val: 7.969699)s, ETA: -28764.574142s
[2024-04-27 21:46:03] mindcv.utils.callbacks INFO - --------------------------------------------------------------------------------
[INFO] RUNTIME(3206661,python):2024-04-27-21:49:07.835.208 [engine.cc:1709] 3211067 ReportTimeoutProc: report timeout! streamId=2, taskId=174, execId=65535, pendingNum=2, reportCount=122, parseTaskCount=122, msec=65535, curSec=816459488
[2024-04-27 21:50:33] mindcv.utils.callbacks INFO - Epoch: [169/36], batch: [312/312], loss: 6.413715, lr: 0.000174, time: 269.775203s
[2024-04-27 21:50:40] mindcv.utils.callbacks INFO - Validation Top_1_Accuracy: 6.1780%, Top_5_Accuracy: 16.7240%, time: 7.595529s
[2024-04-27 21:50:40] mindcv.utils.callbacks INFO - => New best val acc: 6.1780%
[2024-04-27 21:50:52] mindcv.utils.callbacks INFO - Saving model to ./ckpt/vit_b_32_224-169_312.ckpt
[2024-04-27 21:51:02] mindcv.utils.checkpoint_manager INFO - Top-k accuracy checkpoints:
./ckpt/vit_b_32_224-169_312.ckpt 0.06178000196814537
./ckpt/vit_b_32_224-168_312.ckpt 0.03821999952197075
./ckpt/vit_b_32_224-167_312.ckpt 0.017880000174045563
[2024-04-27 21:51:02] mindcv.utils.callbacks INFO - Total time since last epoch: 299.642142(train: 269.843112, val: 7.595529)s, ETA: -39852.404830s
[2024-04-27 21:51:02] mindcv.utils.callbacks INFO - --------------------------------------------------------------------------------
[INFO] RUNTIME(3206661,python):2024-04-27-21:54:07.607.217 [engine.cc:1709] 3211067 ReportTimeoutProc: report timeout! streamId=2, taskId=254, execId=65535, pendingNum=2, reportCount=170, parseTaskCount=170, msec=65535, curSec=816759260
[2024-04-27 21:55:59] mindcv.utils.callbacks INFO - Epoch: [170/36], batch: [312/312], loss: 6.344756, lr: 0.000227, time: 296.020940s
[2024-04-27 21:56:08] mindcv.utils.callbacks INFO - Validation Top_1_Accuracy: 7.6380%, Top_5_Accuracy: 19.7800%, time: 9.011860s
[2024-04-27 21:56:08] mindcv.utils.callbacks INFO - => New best val acc: 7.6380%
[2024-04-27 21:56:21] mindcv.utils.callbacks INFO - Saving model to ./ckpt/vit_b_32_224-170_312.ckpt
[2024-04-27 21:56:29] mindcv.utils.checkpoint_manager INFO - Top-k accuracy checkpoints:
./ckpt/vit_b_32_224-170_312.ckpt 0.07637999951839447
./ckpt/vit_b_32_224-169_312.ckpt 0.06178000196814537
./ckpt/vit_b_32_224-168_312.ckpt 0.03821999952197075
./ckpt/vit_b_32_224-167_312.ckpt 0.017880000174045563
[2024-04-27 21:56:29] mindcv.utils.callbacks INFO - Total time since last epoch: 326.168938(train: 296.084333, val: 9.011860)s, ETA: -43706.637681s
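The per-epoch validation accuracies can be pulled out of a log like the one above with a short script. This is a sketch under assumptions: the function name `parse_validation_accuracy` is hypothetical, and the regular expression only matches the `Validation Top_1_Accuracy: ...%, Top_5_Accuracy: ...%` line format shown in this log. The sample lines are copied from the excerpt.

```python
import re

# Matches lines such as:
# "... INFO - Validation Top_1_Accuracy: 3.8220%, Top_5_Accuracy: 11.5860%, time: ..."
VAL_PATTERN = re.compile(
    r"Validation Top_1_Accuracy: ([\d.]+)%, Top_5_Accuracy: ([\d.]+)%"
)


def parse_validation_accuracy(lines):
    """Return a list of (top1, top5) percentages, one per validation line."""
    results = []
    for line in lines:
        match = VAL_PATTERN.search(line)
        if match:
            results.append((float(match.group(1)), float(match.group(2))))
    return results


# Sample lines copied from the log excerpt above.
sample_log = [
    "[2024-04-27 21:45:39] mindcv.utils.callbacks INFO - Validation Top_1_Accuracy: 3.8220%, Top_5_Accuracy: 11.5860%, time: 7.969699s",
    "[2024-04-27 21:45:39] mindcv.utils.callbacks INFO - => New best val acc: 3.8220%",
    "[2024-04-27 21:50:40] mindcv.utils.callbacks INFO - Validation Top_1_Accuracy: 6.1780%, Top_5_Accuracy: 16.7240%, time: 7.595529s",
    "[2024-04-27 21:56:08] mindcv.utils.callbacks INFO - Validation Top_1_Accuracy: 7.6380%, Top_5_Accuracy: 19.7800%, time: 9.011860s",
]

accuracies = parse_validation_accuracy(sample_log)
for top1, top5 in accuracies:
    print(f"top1={top1:.3f}%  top5={top5:.3f}%")
```

To run it against a full log, the same function can be given an open file object, for example the `train.log` produced by the training command, since iterating over a file yields its lines.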
Please assign maintainer to check this issue.
@chentangyu
Thank you for your question. You can comment //mindspore-assistant to get help faster.
Not reproduced with this MindSpore package: master_20240426230938_a87635b6f65bc6225173fec53883351ebf449b66
Validation Top_1_Accuracy: 66.1080%, Top_5_Accuracy: 86.8300%
Validation Top_1_Accuracy: 66.4900%, Top_5_Accuracy: 87.0200%
Validation Top_1_Accuracy: 67.0460%, Top_5_Accuracy: 87.3060%
run package: Milan_C17/20240414
MindSpore package: master_20240426230938_a87635b6f65bc6225173fec53883351ebf449b66
Passed after rerun.