Research Publications


Simon C Williams, Kawsar Noor, Siddharth Sinha, Richard JB Dobson, Thomas Searle, Jonathan P Funnell, John G Hanrahan, William R Muirhead, Neil Kitchen, Hala Kanona, Sherif Khalil, Shakeel R Saeed, Hani J Marcus, Patrick Grover

Zeljko Kraljevic, Dan Bean, Anthony Shek, Rebecca Bendayan, Harry Hemingway, Joshua Au Yeung, Alexander Deng, Alfred Baston, Jack Ross, Esther Idowu, James T Teo*, Richard J B Dobson*

Msosa, Yamiko Joseph; Codling, David; Wang, Tao; Broadbent, Matthew; Roberts, Angus; Harland, Robert; McGuire, Philip; Stewart, Rob; Dobson, Richard JB;

Wu, J; Biswas, D; Seale, T; Bean, D; Fairhurst, N; Kaye, G; Dobson, R; Chowienczyk, P; Shah, A; O’Gallagher, K; European Heart Journal, 2023


Biswas, D. et al. Natural Language Processing to Identify Racial and Ethnic Disparities in Aortic Stenosis. medRxiv 2023.12.15.23300011 (2023) doi:10.1101/2023.12.15.23300011

Kraljevic, Z. et al. Validating Transformers for Redaction of Text from Electronic Health Records in Real-World Healthcare. in 2023 IEEE 11th International Conference on Healthcare Informatics (ICHI) 544–549 (IEEE, 2023).


Au Yeung, J., Wang, Y. Y., Kraljevic, Z. & Teo, J. T. H. Artificial intelligence (AI) for neurologists: do digital neurones dream of electric sheep? Pract. Neurol. 23, 476–488 (2023)

Wu, J. et al. Artificial Intelligence methods for Improved Detection of undiagnosed Heart Failure with Preserved Ejection Fraction (HFpEF). Eur. J. Heart Fail. (2023) doi:10.1002/ejhf.3115

Searle, T., Ibrahim, Z., Teo, J. & Dobson, R. J. B. Discharge summary hospital course summarisation of in patient Electronic Health Record text with clinical concept guided deep pre-trained Transformer models. J. Biomed. Inform. 141, 104358 (2023)

Bean, D. M., Kraljevic, Z., Shek, A., Teo, J. & Dobson, R. J. B. Hospital-wide natural language processing summarising the health data of 1 million patients. PLOS Digit Health 2, e0000218 (2023)

Cannata, A. et al. (2022) Prognostic relevance of demographic factors in cardiac magnetic resonance-proven acute myocarditis: A cohort study. Front Cardiovasc Med. 2022 Oct 13;9:1037837. doi: 10.3389/fcvm.2022.1037837

Farran, D. et al. (2022) Anticoagulation for atrial fibrillation in people with serious mental illness in the general hospital setting. Journal of psychiatric research, 153, 167–173. doi: 10.1016/j.jpsychires.2022.06.044

Farajidavar, N. et al. (2022). Diagnostic signature for heart failure with preserved ejection fraction (HFpEF): a machine learning approach using multi-modality electronic health record data. BMC cardiovascular disorders, 22(1), 567. doi: 10.1186/s12872-022-03005-w

Funnell, J. P. et al. (2022) Characterization of patients with idiopathic normal pressure hydrocephalus using natural language processing within an electronic healthcare record system. Journal of neurosurgery, 1–9. doi: 10.3171/2022.9.JNS221095

Ibrahim, Z. M. et al. (2022). A Knowledge Distillation Ensemble Framework for Predicting Short- and Long-Term Hospitalization Outcomes From Electronic Health Records Data. IEEE journal of biomedical and health informatics, 26(1), 423–435. doi: 10.1109/JBHI.2021.3089287

Kraljevic, Z. et al. (2022) Foresight – Generative Pretrained Transformer (GPT) for Modelling of Patient Timelines using EHRs. arXiv:2212.08072 [cs] [Preprint].

Noor, K. et al. (2022). Deployment of a Free-Text Analytics Platform at a UK National Health Service Research Hospital: CogStack at University College London Hospitals. JMIR medical informatics, 10(8), e38122. doi: 10.2196/38122

Roy, R. et al. (2022) Accuracy of ICD-10 codes for patients with acute myocarditis: a retrospective study at a large tertiary centre in London, UK, European Heart Journal, Volume 43, Issue Supplement_2, October 2022, ehac544.1704, doi: 10.1093/eurheartj/ehac544.1704

Searle, T. et al. (2022) Summarisation of Electronic Health Records with Clinical Concept Guidance. arXiv:2211.07126 [cs] [Preprint].

Wu, H. et al. (2022) A survey on clinical natural language processing in the United Kingdom from 2007 to 2022. NPJ digital medicine, 5(1), 186. doi: 10.1038/s41746-022-00730-6

Bendayan, R. et al. (2021) ‘Cognitive Trajectories in Comorbid Dementia With Schizophrenia or Bipolar Disorder: The South London and Maudsley NHS Foundation Trust Biomedical Research Centre (SLaM BRC) Case Register’, The American Journal of Geriatric Psychiatry, 29(6), pp. 604–616. doi:10.1016/j.jagp.2020.10.018.

Bittar, A. et al. (2021) ‘Using General-purpose Sentiment Lexicons for Suicide Risk Assessment in Electronic Health Records: Corpus-Based Analysis’, JMIR Medical Informatics, 9(4), p. e22397. doi:10.2196/22397.

Carr, E. et al. (2021) ‘Evaluation and improvement of the National Early Warning Score (NEWS2) for COVID-19: a multi-hospital study’, BMC Medicine, 19(1), p. 23. doi:10.1186/s12916-020-01893-3.

Casey, A. et al. (2021) ‘A systematic review of natural language processing applied to radiology reports’, BMC Medical Informatics and Decision Making, 21(1), p. 179. doi:10.1186/s12911-021-01533-7.

Chilman, N. et al. (2021) ‘Text mining occupations from the mental health electronic health record: a natural language processing approach using records from the Clinical Record Interactive Search (CRIS) platform in south London, UK’, BMJ Open, 11(3), p. e042274. doi:10.1136/bmjopen-2020-042274.

Coats, T. et al. (2021) ‘An open‐source, expert‐designed decision tree application to support accurate diagnosis of myeloid malignancies’, eJHaem, 2(2), pp. 261–265. doi:10.1002/jha2.182.

Davidson, E.M. et al. (2021) ‘The reporting quality of natural language processing studies: systematic review of studies of radiology reports’, BMC Medical Imaging, 21(1), p. 142. doi:10.1186/s12880-021-00671-8.

Dong, H., Suárez-Paniagua, V., Whiteley, W., et al. (2021) ‘Explainable automated coding of clinical notes using hierarchical label-wise attention networks and label embedding initialisation’, Journal of Biomedical Informatics, 116, p. 103728. doi:10.1016/j.jbi.2021.103728.

Dong, H., Suárez-Paniagua, V., Zhang, H., et al. (2021) ‘Rare Disease Identification from Clinical Notes with Ontologies and Weak Supervision’, arXiv:2105.01995 [cs] [Preprint].

Hong, S. et al. (2021) ‘TMEM106B and CPOX are genetic determinants of cerebrospinal fluid Alzheimer’s disease biomarker levels’, Alzheimer’s & Dementia, 17(10), pp. 1628–1640. doi:10.1002/alz.12330.

Irving, J. et al. (2021) ‘Using Natural Language Processing on Electronic Health Records to Enhance Detection and Prediction of Psychosis Risk’, Schizophrenia Bulletin, 47(2), pp. 405–414. doi:10.1093/schbul/sbaa126.

Kraljevic, Z., Shek, A., et al. (2021) ‘MedGPT: Medical Concept Prediction from Clinical Narratives’, arXiv:2107.03134 [cs] [Preprint].

Kraljevic, Z., Searle, T., et al. (2021) ‘Multi-domain clinical natural language processing with MedCAT: The Medical Concept Annotation Toolkit’, Artificial Intelligence in Medicine, 117, p. 102083. doi:10.1016/j.artmed.2021.102083.

Lau, I.S. et al. (2021) ‘Natural language word embeddings as a glimpse into healthcare language and associated mortality surrounding end of life’, BMJ Health & Care Informatics, 28(1), p. e100464. doi:10.1136/bmjhci-2021-100464.

O’Gallagher, K. et al. (2021) ‘Pre-existing cardiovascular disease rather than cardiovascular risk factors drives mortality in COVID-19’, BMC Cardiovascular Disorders, 21(1), p. 327. doi:10.1186/s12872-021-02137-9.

Oliver, D. et al. (2021) ‘Real-world implementation of precision psychiatry: Transdiagnostic risk calculator for the automatic detection of individuals at-risk of psychosis’, Schizophrenia Research, 227, pp. 52–60. doi:10.1016/j.schres.2020.05.007.

Rannikmäe, K. et al. (2021) ‘Developing automated methods for disease subtyping in UK Biobank: an exemplar study on stroke’, BMC Medical Informatics and Decision Making, 21(1), p. 191. doi:10.1186/s12911-021-01556-0.

Searle, T., et al. (2021). Estimating redundancy in clinical text. Journal of biomedical informatics, 124, 103938. doi: 10.1016/j.jbi.2021.103938

Shek, A. et al. (2021) ‘Machine learning‐enabled multitrust audit of stroke comorbidities using natural language processing’, European Journal of Neurology, 28(12), pp. 4090–4097. doi:10.1111/ene.15071.

Teo, J.T.H. et al. (2021) ‘Real-time clinician text feeds from electronic health records’, npj Digital Medicine, 4(1), p. 35. doi:10.1038/s41746-021-00406-7.

Viani, N. et al. (2021) ‘A natural language processing approach for identifying temporal disease onset information from mental healthcare text’, Scientific Reports, 11(1), p. 757. doi:10.1038/s41598-020-80457-0.

Wu, H. et al. (2021) ‘Ensemble learning for poor prognosis predictions: A case study on SARS-CoV-2’, Journal of the American Medical Informatics Association, 28(4), pp. 791–800. doi:10.1093/jamia/ocaa295.

Zakeri, R. et al. (2021) ‘Biological responses to COVID-19: Insights from physiological and blood biomarker profiles’, Current Research in Translational Medicine, 69(2), p. 103276. doi:10.1016/j.retram.2021.103276.
