Poster Presentation Joint 2016 COSA and ANZBCTG Annual Scientific Meeting

Content analysis of clinical letters for breast cancer patients in the adjuvant setting – the first step towards automated extraction of clinical data (#293)

Patricia Banks 1, Lawrence Cavedon 2, Karin Verspoor 3, Graham Pitson 4
  1. Peter MacCallum Cancer Centre, Melbourne, Victoria, Australia
  2. School of Science, RMIT University, Melbourne, Victoria, Australia
  3. Department of Computing and Information Systems, The University of Melbourne, Melbourne, Victoria, Australia
  4. Andrew Love Cancer Centre, Geelong, VIC, Australia


Detailed, reportable follow-up information is lacking for many oncology patients. For many breast cancer patients, follow-up documentation is restricted to free-text clinical letters. The quality of such letters is known to vary widely, and their actual content is largely unstudied. The aim of this study was to analyse the topics discussed in clinical letters of breast cancer patients, with a view to assessing whether machine learning techniques might be able to extract useful clinical content that is otherwise hidden as free text.


A schema was developed for the annotation of free-text clinical letters. The schema included important content such as use and toxicity of endocrine therapy, tests and test results, disease status (i.e. recurrence) and follow-up plans. Two annotators reviewed 200 letters from a cohort of 55 ER-positive, early-stage breast cancer patients. Annotations were compared and revised to ensure a high level of consistency.
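As an illustration of how the annotations of two reviewers might be compared, the sketch below computes Cohen's kappa for one schema category. The abstract does not state which agreement measure, if any, was used; kappa is swapped in here as a common choice. The function name `cohen_kappa` and the toy labels are hypothetical.

```python
def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of binary (0/1) labels.

    Each position is one letter; 1 means the annotator marked the category
    (e.g. "toxicity mentioned") as present, 0 means absent.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of letters where both annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal rate of marking "present".
    p_a = sum(labels_a) / n
    p_b = sum(labels_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Toy data only, not taken from the study.
annotator_1 = [1, 1, 0, 0, 1, 0, 1, 1]
annotator_2 = [1, 0, 0, 0, 1, 0, 1, 1]
kappa = cohen_kappa(annotator_1, annotator_2)
```

A low kappa for a category would point to parts of the schema that need clearer definitions before annotations are revised.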


Most clinical letters followed a similar format regardless of the author. The endocrine therapy used was mentioned at least once in 92% of letters, with 90% also discussing timing (e.g. current, previous or future). Toxicity of endocrine therapy, or lack thereof, was noted in 59% of letters. A comment on disease status was found in 89%, with 75% also recording some prior or future test. Follow-up plans were mentioned in 95% of letters, with 82% detailing the timing.
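Percentages of this kind can be derived from the annotations by counting, for each schema category, the letters that carry at least one annotation of that category. The following is a minimal sketch under the assumption that each letter is reduced to a set of category labels; the label names, the `coverage` function and the toy data are hypothetical and do not reproduce the study's figures.

```python
def coverage(letters, category):
    """Percentage of letters with at least one annotation of `category`.

    `letters` is a list in which each item is the set of category labels
    annotated in one letter.
    """
    if not letters:
        return 0.0
    hits = sum(1 for labels in letters if category in labels)
    return 100.0 * hits / len(letters)


# Toy data only: four letters with illustrative labels.
annotated_letters = [
    {"endocrine_therapy", "disease_status", "follow_up_plan"},
    {"endocrine_therapy", "toxicity", "follow_up_plan"},
    {"disease_status", "tests"},
    {"endocrine_therapy", "disease_status", "tests", "follow_up_plan"},
]

follow_up_pct = coverage(annotated_letters, "follow_up_plan")
toxicity_pct = coverage(annotated_letters, "toxicity")
```

Counting per letter rather than per annotation matches the "mentioned at least once" wording, since repeated mentions within one letter do not inflate the figure.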


Around 90% of clinical letters for early-stage breast cancer patients mention the prescribed endocrine therapy, disease status and follow-up plans. Toxicity (or its absence) and tests performed are noted less frequently. This quantitative content analysis of letters is novel and has identified specific information that could be targeted for extraction with machine learning techniques, enabling larger-scale analysis of treatment patterns and outcomes.