Department of Data Science

The Department of Data Science is a cross-divisional department that focuses on the application of computational approaches, machine learning and artificial intelligence to address key questions in translational cancer research. In particular, we apply data science to drug discovery, chemical biology and adaptive therapy in the clinic. The department is part of a wider vibrant Cancer Informatics community at The Institute of Cancer Research, London.

*This page is currently under construction, please check back later for further information.*

Underpinning our research is the integration of complex, multidisciplinary data to connect different domains of science. We develop integrative resources and machine learning algorithms to discover hidden knowledge in data.

The department is composed of four teams:

Computational Biology and Chemogenomics

The Computational Biology and Chemogenomics team is part of the Cancer Research UK Cancer Therapeutics Unit (CTU). The team focuses on the application of computational biology, molecular modelling and systems biology approaches to inform and support drug discovery projects. A key activity of the team is our Early Stage Target Evaluation.

We use our canSAR platform and other integrative expertise to objectively assess and prioritise novel targets for our drug discovery. We also innovate and apply bioinformatics, structural computational biology, molecular modelling and systems computational biology to support and inform drug discovery projects at all stages within the CTU.

Computational Chemical Biology

The Computational Chemical Biology team focuses on the development of novel methodologies to address key challenges in cancer translational research, particularly at the interface between Chemistry and Biology.

We develop and apply integrative Big Data approaches to inform key decisions in drug discovery and translational discovery research. We develop innovative, world-leading approaches for assessing target ‘druggability’: the suitability of a protein to be modulated by a therapeutic drug. We apply these to assess and prioritise the tractability of cancer genes for drug discovery.

To enable our research, we developed canSAR, the world’s largest public cancer drug discovery resource. canSAR integrates data from patient multiomic studies, protein annotation, clinical trials, protein 3D structure, cellular networks and pathways, chemistry, pharmacology and functional genomics. Altogether, canSAR integrates over 10 billion experimental measurements from all these disciplines. This integration allows researchers to answer key questions and generate hypotheses very rapidly, empowering innovation in drug discovery.

We also develop machine learning and AI approaches to discover hidden patterns and make recommendations for drug discovery. In particular, we provide the leading public resource on target druggability assessment. We apply our predictive models to highlight hidden opportunities in novel drug targets, identify risks associated with them, and suggest key experiments to de-risk them.

We utilise canSAR as a key component of our drug discovery at the ICR. It is also used by more than 200,000 users from over 250 countries, in both academia and industry, to inform their drug discovery.

Chemical probes are widely used in translational biological research and target validation to study specific proteins. However, the use of poor, non-selective probes by the community is rife due to lack of resources to better inform chemical probe selection, with profound implications for data robustness and reproducibility.

We have developed Probe Miner to provide a statistical, objective, big-data-driven assessment and ranking of chemical probes based on pharmacological data in canSAR. Probe Miner uses medicinal chemistry and chemical biology data curated within canSAR to analyse chemical probes for fitness. Probe Miner is regularly updated and provides a user-friendly access point for researchers.

The Knowledge Hub

Despite great successes and advances in therapy that allow us to treat 50% of cancer patients effectively, many patients still do not respond to therapy, relapse, or develop long-term side effects from their treatment.

To address these problems, we must develop a picture of each patient that embraces the complex interplay between a large number of factors.

The ICR Core Bioinformatics Facility

The ICR Core Bioinformatics Facility provides bioinformatics support to projects across all areas of research at the ICR. We develop and apply bioinformatics pipelines, including NGS, proteomics and systems biology, to support projects in clinical and discovery research teams at the ICR and The Royal Marsden NHS Foundation Trust.

We also provide a hub for information and bioinformatics capabilities at the ICR. We organise and run bioinformatics training courses and flash talks.

Members of the department

We take great pride in our diversity and inclusivity. The department currently has a truly international membership, with researchers from disciplines as broad as genetics, computer science, mathematics, physics, chemistry and more.

  • Costas Mitsopoulos
  • James Campbell
  • Albert Antolin
  • Patrizio di Micco
  • Bugra Ozer (Data Science, Bioinformatics)
  • Christos Kanas (Chemoinformatics)
  • Joe Tym (Software)
  • Eloy Villasclaras Fernandez (Software, Dev Ops)
  • Veronica Garcia-Perez (Software, Dev Ops, Clinical Imaging)
  • Sheng Yu (Data Science, Data Modelling)
  • Santosh Bokefode