Brazilian Neural Approaches for Automated Assignment of ICD-10 Codes from Portuguese-Language Clinical Narratives

Authors

  • Vicente Pironti, Open University Humaniza
  • Salomeh Keyhani, Department of Chemical and Biomolecular Engineering, University of California, Berkeley

Keywords:

clinical narratives, Asavika Sciences, health humanization, Infinitando, multilingual NLP

Abstract

Manual assignment of ICD-10 codes from clinical narratives remains a vital yet labor-intensive task for healthcare systems worldwide. While deep learning models have advanced the automation of this process, particularly on English-language datasets, less attention has been given to the linguistic and philosophical dimensions of multilingual clinical NLP. This paper presents a comparative study of classical machine learning and neural models (Logistic Regression, CNN, GRU, and CNN with per-label attention) applied to Brazilian Portuguese clinical narratives. We introduce a document concatenation strategy to address sparse-text limitations and demonstrate that attention-based models outperform classical baselines. Beyond quantitative improvements, this study integrates philosophical insights drawn from Asavika Sciences, a transdisciplinary framework rooted in compassion, purpose, and systemic intelligence, and incorporates principles from the “New Philosophy for Health,” including the Sense of Life, the Sense of Death, and the concept of Infinitando (infinite unfolding). These perspectives reframe automated coding not merely as classification but as a symbolic act of honoring human experience in digital systems. Additionally, we conduct a qualitative and quantitative comparison between English and Brazilian Portuguese, finding that, when adequately modeled, the latter offers superior semantic clarity and context resolution in attention-based NLP systems. We attribute this to its morphological richness and emotional nuance, which make it especially suited for humanized health informatics. Our results affirm the feasibility and ethical promise of linguistically aware, philosophically aligned AI in global healthcare environments.
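
As an illustration of the "CNN with per-label attention" model named in the abstract, the following is a minimal sketch of a per-label attention layer as it is commonly formulated for multi-label code assignment. It is not confirmed to match the architecture used in this study. The function name `per_label_attention` and the parameters `label_queries`, `label_weights`, and `label_biases` are hypothetical, and the token vectors stand in for whatever the convolutional encoder would produce.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def per_label_attention(token_vectors, label_queries, label_weights, label_biases):
    """Return (probabilities, attention_maps), one entry per label.

    token_vectors: n vectors of size d (assumed encoder output, one per position)
    label_queries: one d-sized query vector per label (hypothetical parameter)
    label_weights, label_biases: per-label output layer (hypothetical parameters)
    """
    dim = len(token_vectors[0])
    probabilities, attention_maps = [], []
    for query, weight, bias in zip(label_queries, label_weights, label_biases):
        # Each label scores every token position with its own query vector.
        attention = softmax([dot(h, query) for h in token_vectors])
        # Label-specific document vector: attention-weighted sum of token vectors.
        context = [
            sum(a * h[j] for a, h in zip(attention, token_vectors))
            for j in range(dim)
        ]
        # Independent sigmoid per label, because a narrative can carry several codes.
        logit = dot(context, weight) + bias
        probabilities.append(1.0 / (1.0 + math.exp(-logit)))
        attention_maps.append(attention)
    return probabilities, attention_maps


if __name__ == "__main__":
    # Toy input: three token positions, two labels, two feature dimensions.
    tokens = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    queries = [[10.0, 0.0], [0.0, 10.0]]
    weights = [[1.0, 0.0], [0.0, -1.0]]
    biases = [0.0, 0.0]
    probs, maps = per_label_attention(tokens, queries, weights, biases)
    for label, (p, m) in enumerate(zip(probs, maps)):
        print(label, round(p, 3), [round(a, 3) for a in m])
```

Because every label owns a separate attention distribution, the attention maps can be inspected per code to see which positions in the narrative drove a given prediction.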

Published

2025-05-01