This code implements the radiology report extraction used in our paper:
Chong Ma, Zihao Wu, Jiaqi Wang, Shaochen Xu, Yaonai Wei, Zhengliang Liu, Xi Jiang, Lei Guo, Xiaoyan Cai, Shu Zhang, Tuo Zhang, Dajiang Zhu, Dinggang Shen, Tianming Liu, Xiang Li
ImpressionGPT: An Iterative Optimizing Framework for Radiology Report Summarization with ChatGPT.
OpenI
1. Download.
Download the reports from OpenI-Reports.
2. Extract.
Open extract_openI.py and run the following functions in order:
1). Run "extract_findings_and_impression_openi()" to extract the findings and impression sections from the .xml files.
2). Run "gen_openi_data()" to generate a random train/test split.
3). Run "gen_report_and_label()" to generate the final data used in ImpressionGPT. The file openi_findings_label.csv, found in /res/, was generated with the CheXpert report labeler.
Alternatively, you can use the post-processed data I generated, available in /Results/openI test data/.
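The OpenI extraction steps can be illustrated with a minimal Python sketch. It is not the code in extract_openI.py: it assumes each OpenI .xml report stores its sections as AbstractText elements with Label="FINDINGS" and Label="IMPRESSION", and the helper names (parse_openi_report, keep_complete_reports, split_train_test), the 20% test ratio and the seed are hypothetical choices for illustration.

```python
import random
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional, Tuple


def parse_openi_report(xml_text: str) -> Dict[str, Optional[str]]:
    """Return the FINDINGS and IMPRESSION text of one OpenI report (hypothetical helper).

    Assumes sections are stored as <AbstractText Label="..."> elements.
    """
    root = ET.fromstring(xml_text)
    sections: Dict[str, Optional[str]] = {"findings": None, "impression": None}
    for node in root.iter("AbstractText"):
        label = (node.get("Label") or "").lower()
        if label in sections:
            text = " ".join((node.text or "").split())  # collapse internal whitespace
            sections[label] = text or None  # treat empty sections as missing
    return sections


def keep_complete_reports(xml_by_id: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Keep only reports that have both a findings and an impression section."""
    pairs: Dict[str, Tuple[str, str]] = {}
    for report_id, xml_text in xml_by_id.items():
        s = parse_openi_report(xml_text)
        if s["findings"] and s["impression"]:
            pairs[report_id] = (s["findings"], s["impression"])
    return pairs


def split_train_test(
    report_ids: List[str], test_ratio: float = 0.2, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle the IDs reproducibly and cut them into (train, test) lists."""
    ids = sorted(report_ids)  # sort first so the split depends only on the seed
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_ratio)
    return ids[n_test:], ids[:n_test]
```

Sorting the IDs before shuffling means the split does not depend on the order in which the .xml files happened to be listed, so the same seed always reproduces the same train/test partition.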
MIMIC-CXR
1. Download.
You must first obtain a license from PhysioNet. Then you can download the original reports from MIMIC-CXR-2.0. If you don't want to do that, you can use the post-processed data I generated, available in /Results/mimic test data/.
2. Extract.
Open extract_mimic.py and run the following functions in order:
1). Run "extract_sections()" to extract the findings and impression sections from the .txt files. This step relies on code implemented by MIT-LCP.
2). Run "select_test_data_from_sections()" to select the reports belonging to the official test split. The file all_test_ids.csv can be found in /res/.
3). Run "data_clean_for_mimic()" to remove irrelevant sentences from the test reports, such as "The diagnosis results was communicated with Dr.__ at __."
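Steps 2) and 3) amount to filtering reports by ID and dropping sentences that carry no clinical content. The Python sketch below illustrates that idea only; it is not the code in extract_mimic.py. It assumes the extracted sections are already in a dict keyed by study ID and that the IDs from all_test_ids.csv have been loaded into a list, and the helper names and the NOISE_PATTERNS rule are hypothetical.

```python
import re
from typing import Dict, Iterable

# Hypothetical rule: a sentence saying results were communicated, discussed or
# notified to a doctor. The repository's actual cleaning rules may differ.
NOISE_PATTERNS = [
    re.compile(r"\b(communicated|discussed|notified)\b.*\bdr\.", re.IGNORECASE),
]

# Sentence boundary: whitespace after ., ! or ?, but not after the title "Dr."
SENTENCE_BOUNDARY = re.compile(r"(?<![Dd]r\.)(?<=[.!?])\s+")


def select_test_reports(reports: Dict[str, str], test_ids: Iterable[str]) -> Dict[str, str]:
    """Keep only the reports whose study ID is in the official test split."""
    wanted = set(test_ids)
    return {sid: text for sid, text in reports.items() if sid in wanted}


def clean_report(text: str) -> str:
    """Drop sentences matching any noise pattern and rejoin the remaining ones."""
    normalized = " ".join(text.split())  # flatten the line breaks found in raw reports
    sentences = SENTENCE_BOUNDARY.split(normalized)
    kept = [s for s in sentences if s and not any(p.search(s) for p in NOISE_PATTERNS)]
    return " ".join(kept)
```

Keeping the cleaning rules in a list makes it straightforward to add further patterns after inspecting which boilerplate sentences remain in the test reports.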
If you use this code, or otherwise find our work valuable, please cite:
@article{ma2023impressiongpt,
title={ImpressionGPT: an iterative optimizing framework for radiology report summarization with chatGPT},
author={Ma, Chong and Wu, Zihao and Wang, Jiaqi and Xu, Shaochen and Wei, Yaonai and Liu, Zhengliang and Guo, Lei and Cai, Xiaoyan and Zhang, Shu and Zhang, Tuo and others},
journal={arXiv preprint arXiv:2304.08448},
year={2023}
}