Ideally, deep learning models should be trained with large, representative, high-quality annotated datasets. In reality, we often have to deal with datasets that are small, biased, noisy, and sparsely or weakly annotated, or even completely unannotated. Many research efforts address these problems. Here I would like to discuss the noisy annotation problem caused by expertise errors, i.e., the inconsistencies between different observers due to human subjectivity, using medical image segmentation as an example. Conventional wisdom treats this type of noisy annotation as a bad thing. To deal with it, we often try to reach consensus among a group of expert observers during data annotation and use various strategies to mitigate its adverse effects during training. However, sometimes, or even often, we may need to respect this type of noisy annotation. This is because medicine is still an art in many cases. Evidence-based medicine and clinical guidelines give physicians only the floor, not the ceiling. There is room for physicians to exercise their own judgment, which leads to variation in clinical practice. There is often no ground truth to tell which choice is best. Additionally, variation among physicians can be inherent, arising from differences in how they handle the tradeoffs between outcome and toxicity, cost and benefit, etc. We have to face this reality when we develop and deploy deep learning models to solve clinical problems. Some strategies will be discussed.
Dr. Steve Jiang holds the Barbara Crittenden Endowed Professorship in Cancer Research in the Department of Radiation Oncology at UT Southwestern Medical Center. He is the vice chair and chief of the Division of Medical Physics & Engineering.