CogStack is an application framework for extracting information from unstructured data sources such as electronic clinical records, where the majority of the information content is locked up (i.e. not programmatically queryable) in multiple formats of unstructured data (e.g. binary Word documents, PDFs, images and text fields). Once the data is extracted, harmonised and processed, multiple uses based on information retrieval and extraction become possible, including Natural Language Processing (NLP), enterprise search, alerting, cohort selection and research.

An enormous and growing amount of data in electronic health records (EHRs) is unstructured text. Text is an expressive and natural way for clinicians to document the clinical narrative. Unfortunately, free text is challenging to use in service planning, care and research because it is difficult to translate into a format that computers can understand. Natural Language Processing through CogStack will establish this interoperability and will therefore be fundamental in delivering the NHS Long Term Plan.

CogStack is composed of a range of adaptable, modular, interoperable tools that provide tiered functionality for a variety of use cases:

  1. Centralise and lake clinical data, including structured data (e.g. observations and results) and unstructured data (e.g. clinical narratives such as clinic letters, discharge and admission summaries and radiology reports), in varying formats such as binary Word documents, PDFs and images.
  2. Search and visualise millions of distinct data points in near-real-time – ‘unlocking’ capabilities that would previously have taken days or months.
  3. Natural Language Processing to map clinical text to standardised clinical terminologies (SNOMED-CT), giving interoperable clinical data combined with semantic context. This allows cohorting based on “find all patients with a heart attack”, regardless of how this has been referred to in the clinical text, such as “patient had myocardial infarct”, “MI”, “infarct of heart” or “cardiac infarct”, while distinguishing mentions that refer to someone else, such as “the patient’s father had a MI”.
  4. Deep phenotyping using NLP allows accelerated NHS clinical coding, disease registry submissions and advanced cohorting for observational studies.
  5. Population health dashboards for combining data from structured and text components of the electronic health record to track patient outcomes, enhance patient safety and improve patient care.
  6. Advanced analytics using generative AI for virtual trial emulation, high-dimensional patient or disease modelling and digital patient twins.
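
The cohorting idea in item 3 can be illustrated with a deliberately tiny sketch. This is not CogStack's NLP, which the text above does not describe in detail; it is a toy dictionary lookup with a crude family-history check, written only to show what "same concept, different wording, right experiencer" means. The concept label, synonym table, cue words, function names and sample notes are all hypothetical, and the label is a placeholder rather than a real SNOMED-CT identifier.

```python
import re

# Hypothetical placeholder label, NOT a real SNOMED-CT identifier.
CONCEPT_MI = "CONCEPT_MYOCARDIAL_INFARCTION"

# Toy synonym table: surface forms in clinical text -> concept label.
SYNONYMS = {
    "heart attack": CONCEPT_MI,
    "myocardial infarct": CONCEPT_MI,
    "mi": CONCEPT_MI,
    "infarct of heart": CONCEPT_MI,
    "cardiac infarct": CONCEPT_MI,
}

# Toy cue words suggesting the mention is about a relative, not the patient.
FAMILY_CUES = ("father", "mother", "brother", "sister", "family history")


def find_mentions(text):
    """Return a set of (concept, experiencer) pairs found in one note."""
    lowered = text.lower()
    # Crude assumption: one family cue anywhere marks the whole note.
    is_family = any(cue in lowered for cue in FAMILY_CUES)
    experiencer = "family" if is_family else "patient"
    mentions = set()
    for phrase, concept in SYNONYMS.items():
        # Whole-word match so that "mi" does not fire inside other words.
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            mentions.add((concept, experiencer))
    return mentions


def select_cohort(notes, concept):
    """Return sorted patient ids whose note mentions the concept for the patient."""
    return sorted(
        patient_id
        for patient_id, text in notes.items()
        if (concept, "patient") in find_mentions(text)
    )


# Hypothetical sample notes keyed by made-up patient ids.
NOTES = {
    "p1": "Patient had myocardial infarct.",
    "p2": "The patient's father had a MI.",
    "p3": "Admitted following cardiac infarct.",
    "p4": "No chest pain reported.",
}

print(select_cohort(NOTES, CONCEPT_MI))
```

In this sketch, the two differently worded notes resolve to the same concept, while the note about the patient's father is excluded. A real system would also need to handle negation, misspellings, ambiguous abbreviations and temporality, none of which this toy attempts.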

So far, there are 4 mature enterprise-wide deployments, at King’s College Hospital, Guy’s and St Thomas’ Hospital, University College London Hospitals and South London and Maudsley Hospitals. These core sites have processed >200 million clinical documents. CogStack deployments have supported digital transformation, clinical trial recruitment, population health management, clinical audits, service planning, and clinical research.

Many other hospitals around the UK are joining the community, including University Hospitals Birmingham, Lancashire Teaching Hospitals and Northern Care Alliance, and other hospitals in the Netherlands and Australia are rolling out the technology.


Prof Richard Dobson
Academic Lead, Professor of Health Informatics
Prof James Teo
Clinical Lead, Professor of Neurology
Tom Searle
Programme Manager and PhD student
Dr Anthony Shek
Senior AI Engineer
Zeljko Kraljevic
Research Fellow
Barbara Rafferty
Head of Digital & Data
Aleksandra Foy
NLP Workstream Manager
Kawsar Noor
Research Software Developer
Vlad Dinu
Research Software Engineer
Dr Angus Roberts
Senior Lecturer in Health Informatics
Amos Folarin
Software Development Lead
Yamiko Msosa
Postdoctoral Research Systems Engineer
Alex Handy
Senior Research Fellow and Data Scientist
Tao Wang
Postdoctoral Research Associate
Dr Joshua Au-Yeung
CogStack Clinical Fellow
Honghan Wu
Clinical Informatics
Dan Bean
Clinical Informatics
© 2020 CogStack - PhiDataLab | Made by Suara