A significant proportion of clinical data is stored as unstructured free-text reports, such as discharge summaries or radiology reports, which makes it difficult to process and analyse on a large scale. Text analytics methods such as document retrieval and information extraction can address this challenge. I conducted a three-month pilot study on using IBM Watson Content Analytics to identify relevant documents in large-scale collections of clinical reports (~6.5 million documents in total). My task was to retrieve documents that contain positive instances of certain conditions: for example, “mild hydronephrosis is noted” is a positive instance, whereas “no evidence of hydronephrosis” is a negative instance. The custom rule-based models built using IBM Watson Content Analytics achieved very good results for this task.
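The distinction between positive and negative instances can be illustrated with a minimal Python sketch. It is not the IBM Watson Content Analytics model built in the pilot study: the negation cue list and the function name `has_positive_mention` are assumptions chosen purely for illustration, and a real rule set would need to be far richer.

```python
import re

# Hypothetical negation cues for illustration only; these are NOT the
# rules used in the pilot study.
NEGATION_CUES = re.compile(
    r"\b(no|not|without|negative for|absence of)\b",
    re.IGNORECASE,
)


def has_positive_mention(text: str, term: str) -> bool:
    """Return True if `term` occurs in some sentence of `text`
    with no negation cue earlier in that same sentence."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for match in re.finditer(re.escape(term), sentence, re.IGNORECASE):
            preceding = sentence[: match.start()]
            if not NEGATION_CUES.search(preceding):
                return True
    return False


reports = [
    "Mild hydronephrosis is noted.",
    "No evidence of hydronephrosis.",
]
# Keep only reports with at least one non-negated mention.
relevant = [r for r in reports if has_positive_mention(r, "hydronephrosis")]
```

Treating any negation cue earlier in the sentence as negating the term is a deliberately crude heuristic: a sentence such as "No fracture, but hydronephrosis is present" would be wrongly rejected, which is the kind of case that more carefully scoped rules are needed to handle.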