Improving task-agnostic BERT distillation with layer mapping search
Jiao, X.; Chang, H.; Yin, Y.; Shang, L.; Jiang, X.; Chen, X.; Li, L.; Wang, F.; Liu, Q.
Neurocomputing 461: 194-203
2021
ISSN/ISBN: 0925-2312 DOI: 10.1016/j.neucom.2021.07.050
Accession: 084710732

References
; Yang, Zijiang; Tang, Tiantian; Liu, Chengcheng; Lin, Yun; Gui, Guan 2026: Receiver-Agnostic Radio Frequency Fingerprint Identification Using BERT and Two-Stage Knowledge Distillation IEEE Communications Letters
Liu, Y.; Meng, F.; Lin, Z.; Fu, P.; Cao, Y.; Wang, W.; Zhou, J. 2022: Learning to Win Lottery Tickets in BERT Transfer via Task-agnostic Mask Training NAACL 2022 - 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference: 5840-5857
Gao, Y.; Bai, H.; Jie, Z.; Ma, J.; Jia, K.; Liu, W. 2020: MTL-NAS: Task-Agnostic Neural Architecture Search towards General-Purpose Multi-Task Learning Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition: 11540-11549
Ganesh, R.R.; Rai, A.; Sethi, A.; Malhotra, A.; Ranu, S. 2025: Tag2M- A Task-Agnostic Knowledge Distillation Framework for Distilling Gnn to MLP Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2: 2859-2870
Lin, J.; Gregg, R.D.; Shull, P.B. 2024: Improving Task-Agnostic Energy Shaping Control of Powered Exoskeletons with Task/Gait Classification IEEE Robotics and Automation Letters 9(8): 6848-6855
Wang, W.; Wei, F.; Dong, L.; Bao, H.; Yang, N.; Zhou, M. 2020: MINILM: Deep self-attention distillation for task-agnostic compression of pre-trained transformers Advances in Neural Information Processing Systems 2020-December
Pan, Z.; Wu, Q.; Jiang, H.; Xia, M.; Luo, X.; Zhang, J.; Lin, Q.; Ruhle, V.; Yang, Y.; Lin, C.Y.; Zhao, H.V.; Qiu, L.; Zhang, D. 2024: LLMLingua-2: Data Distillation for Efficient and Faithful Task-Agnostic Prompt Compression Proceedings of the Annual Meeting of the Association for Computational Linguistics: 963-981
Kitamura, T.; Suzuki, Y. 2025: Finding Adequate Additional Layer of Auxiliary Task in BERT-Based Multi-task Learning Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 15342 LNCS: 67-83
Jiang, Y.; Shang, Y.; Liu, Z.; Shen, H.; Xiao, Y.; Xiong, W.; Xu, S.; Yan, W.; Jin, D. 2020: BERT2DNN: BERT distillation with massive unlabeled data for online e-commerce search Proceedings - IEEE International Conference on Data Mining, ICDM 2020-November: 212-221
Toikkanen, M.; Kim, J.-w. 2025: Improving Respiratory Sound Classification with Architecture-Agnostic Knowledge Distillation from Ensembles Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH: 1023-1027
Lin, Y.-J.; Chen, K.-Y.; Kao, H.-Y. 2023: LAD: Layer-Wise Adaptive Distillation for BERT Model Compression Sensors 23(3)
Zhang, Z.; Lu, Y.; Wang, T.; Wei, X.; Wei, Z. 2024: DDK: Dynamic structure pruning based on differentiable search and recursive knowledge distillation for BERT Neural Networks: the Official Journal of the International Neural Network Society 173: 106164
Ashihara, T.; Moriya, T.; Matsuura, K.; Tanaka, T. 2022: Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2022-September: 411-415
2022: Blending Weighted TF-IDF BERT for Improving Semantic Search ICARC 2022 - 2nd International Conference on Advanced Research in Computing: Towards a Digitally Empowered Society: 154-159
Xu, D.D.K.; Mukherjee, S.; Liu, X.; Dey, D.; Wang, W.; Zhang, X.; Awadallah, A.H.; Gao, J. 2022: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models Advances in Neural Information Processing Systems 35
Chen, D.; Li, Y.; Qiu, M.; Wang, Z.; Li, B.; Ding, B.; Deng, H.; Huang, J.; Lin, W.; Zhou, J. 2020: AdaBERT: Task-adaptive BERT compression with differentiable neural architecture search IJCAI International Joint Conference on Artificial Intelligence 2021-January: 2463-2469
Hataya, R.; Nakayama, H. 2022: DJMix: Unsupervised Task-agnostic Image Augmentation for Improving Robustness of Convolutional Neural Networks Proceedings of the International Joint Conference on Neural Networks 2022-July
Abolghasemi, A.; Verberne, S.; Azzopardi, L.A. 2022: Improving BERT-based Query-by-Document Retrieval with Multi-task Optimization Lecture Notes in Computer Science 13186 LNCS: 3-12
Li, J.; Li, X.; Wang, T.; Wang, S.; Cao, Y.; Xu, C.; Dou, D. 2023: Improving Bert Fine-Tuning via Stabilizing Cross-Layer Mutual Information Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing 2023-June
Zhu, X.; Li, J.; Liu, Y.; Wang, W. 2023: Improving Differentiable Architecture Search via self-distillation Neural Networks: the Official Journal of the International Neural Network Society 167: 656-667