JBCA ML Club Workshop

As the volume of information available to science increases, adopting and developing new techniques for processing this data is essential. To better understand the opportunities that machine learning could offer data-intensive researchers, we are running a 3-day workshop that will provide introductory sessions on data exploration, a diverse range of speakers from academia and industry, and networking opportunities to meet other data-intensive researchers and professionals. Our focus for the 3 days is to explore recent advances in machine learning from fields outside astronomy and to develop a better understanding of where these techniques may be best applied in our research. We encourage participants with different backgrounds and levels of experience with machine learning to attend, as the workshop will offer sessions for everyone from novices to advanced users.

Jodrell Bank Centre for Astrophysics Code of Conduct for Meetings, Conferences & Events

JBCA is dedicated to providing an environment free from discrimination, bullying, harassment or victimisation, where all staff, students, visitors and guests are treated with respect and dignity.

In line with the University of Manchester policy on Equality and Diversity, JBCA aims to create a culture of diversity where all are valued for their contribution and individuality.

JBCA will not tolerate harassment of any group or individual with regard to age, disability, ethnicity, gender, religion, sexual orientation, physical appearance or political opinion.

This code applies to all staff, students and visitors of JBCA, as well as attendees at events hosted by the organisation.

  • All communications (both verbal and written) should be appropriate for a professional audience that includes people from a diverse range of backgrounds. Sexual or sexist language and imagery should not be used at any time during meetings, conferences and events, including in presentations and posters.
  • Be kind to others, and treat each other with respect. Do not insult or put down other attendees. Behave professionally. Remember that harassment and sexist, racist, or exclusionary jokes are not appropriate.
  • Individuals asked to stop any inappropriate behaviour are expected to comply immediately. Any attendees violating this code may be asked to leave the event at the sole discretion of the event organisers, in consultation with the JBCA Directorate.

Programme

A detailed programme will follow, but the general outline of the workshop is as follows:

  • Tuesday 13th: Introduction to Machine Learning for data-intensive science
    Time Session Speaker Slides
    9:00-10:00 Coffee and Registration
    10:00-10:05 Welcome & Logistics Alex Clarke Download
    10:05-10:15 Workshop Overview & Open Science Rachael Ainsworth
    10:15-10:30 Intro to JBCA Sara Summerton Download
    10:30-11:00 Machine Learning Basics Alex Clarke Download
    11:00-11:25 Introduction to Neural Networks Therese Cantwell Download
    11:25-11:35 Comfort Break
    11:35-12:05 Introduction to Support Vector Machines Joe Fennell Download
    12:05-12:30 Introduction to Random Forests & t-SNE Alex Clarke Download
    12:30-13:45 Lunch
    13:45-14:30 Machine Learning with Music & Audio Alex Clarke Download
    14:30-15:15 Machine Learning with Satellite data Joe Fennell Download
    15:15-15:30 Comfort Break
    15:30-16:15 Fine-Tuning Convolutional Neural Networks Therese Cantwell Download
    16:15-19:00 Pizza social, Lovell Room, Alan Turing Building
  • Wednesday 14th: Machine Learning applications and methods
    Time Session Speaker Slides
    8:45-9:15 Coffee and Registration
    9:15-9:30 Day 2 Introduction Anna Scaife
    9:30-10:15 The Automation of Science Ross King
    10:15-11:00 Flash Talks 1: Joe Hanson, Richard Henchman, Ann Gledson, Matt Crooks
    11:00-11:15 Comfort Break
    11:15-12:00 Data-driven optimization - Designing meta-heuristics for data-rich optimization settings. Julia Handl
    12:00-12:30 Flash Talks 2: Karim Sidaoui, Hongmin Tang, Arianna Dagliati
    12:30-13:45 Lunch
    13:45-14:30 An overview of the ML and Data Science work at IBM research UK Blair Edwards
    14:30-15:15 Assessing Mental Health through Social Media and Smart Devices: Current State, Challenges and Future Directions Adam Tsakalidis Download
    15:15-15:30 Comfort Break
    15:30-16:15 Learning at Scale Sidharth Kashyap
    16:15-17:00 Flash Talks 3: Stephanie Shoop-Worrall, Zafiirah Hosenie, Alexia Sampri
    17:00-18:00 Drinks Reception
  • Thursday 15th: Machine Learning applications and methods
    Time Session Speaker Slides
    8:30-9:00 Coffee
    9:00-9:45 Toward learning from few examples ('one-shot' learning) using neuromorphic hardware Kamelia Dimova
    9:45-10:30 Online Inference Using Distribution-Valued Stochastic Processes Thomas House
    10:30-10:45 Comfort Break
    10:45-11:30 Shipping a Machine Learning model to Production; is it always smooth sailing? Leanne Fitzpatrick Download
    11:30-12:15 Flash Talks 4: Georgia Ionescu
    12:15-12:45 Discussion and closing comments
    12:45-13:45 Lunch

Venue

The workshop will take place in the Bell-Burnell Lecture Room located in the Schuster Building on the University of Manchester's main campus.

Invited Speakers

We have invited speakers from both academia and industry to talk about the latest research and applications in Machine Learning:

  • Ross King (University of Manchester)
    Title: The Automation of Science
    Abstract: The application of Artificial Intelligence (AI) to science has a distinguished history. Recent progress in AI and laboratory automation has made possible the full automation of simple forms of scientific research. A Robot Scientist is a physically implemented robotic system that applies techniques from AI to execute cycles of automated scientific experimentation: hypothesis formation, selection of efficient experiments to discriminate between hypotheses, execution of experiments using laboratory automation equipment, and analysis of results. The motivation for Robot Scientists is both to better understand science and to make science more efficient. Our Robot Scientist ‘Adam’ was the first machine to autonomously discover novel scientific knowledge. Our Robot Scientist ‘Eve’ was originally designed to automate drug discovery, with a focus on neglected tropical diseases. In chess there is a continuum of ability from novices up to Grandmasters. We argue that this is true in science as well, from the simple research that Adam and Eve can do up to an Einstein or Newton. If this is true, then it is likely that advances in AI and lab automation will drive the development of ever-smarter Robot Scientists. The physics Nobel laureate Frank Wilczek is on record as saying (10 years ago) that in 100 years’ time the best physicist will be a machine. If this comes to pass, it will transform not only technology but also our understanding of science and the Universe.
  • Blair Edwards (IBM Hartree Centre)
    Title: An overview of the ML and Data Science work at IBM Research UK
  • Kamelia Dimova (Mindtrace)
    Title: Toward learning from few examples ('one-shot' learning) using neuromorphic hardware
    Abstract: Mindtrace is a start-up company that aims to develop learning algorithms that bring artificial intelligence closer to the learning capabilities of the brain. We aim to do so using specialist hardware that itself embodies some of the key processing characteristics of the brain. This talk will give an overview of the challenges of unsupervised, 'one-shot' learning and current approaches to solving them.
  • Sidharth Kashyap (Intel)
    Title: Learning at Scale
    Abstract: Deep learning is still in its nascent stages, and the systems and methodologies being used to solve problems with it are evolving constantly. We need to understand the underlying building blocks and then think about scale-out methodologies in order to solve real-world problems that go beyond recognizing a few images. We shall explore these and discuss the current state of the art in large-scale deep learning.
  • Adam Tsakalidis (The Alan Turing Institute)
    Title: Assessing Mental Health through Social Media and Smart Devices: Current State, Challenges and Future Directions
    Abstract: Predicting mental health from smartphone and social media data on a longitudinal basis has recently attracted great interest, with very promising results reported across many studies. Such approaches have the potential to revolutionise mental health assessment, if their development and evaluation follow a real-world deployment setting. In this talk, we will provide an overview of the field and our work, in two parts. In the first part, we will present the first study to combine textual data from social media (i.e., posts and messages) and logs from smart devices to assess an individual's mental health. In particular, we will present a multiple kernel learning approach that leverages such heterogeneous data to assess an individual's mental health over time with very high accuracy. In the second part of the talk, we will take a closer look at whether such approaches can be employed in a real-world setting. Using extensive experimentation and evaluation of current state-of-the-art approaches, we will outline major open challenges in the field and provide directions for future research.
  • Julia Handl (University of Manchester)
    Title: Data-driven optimization: Designing meta-heuristics for data-rich optimization settings
    Abstract: Optimization, in direct or indirect form, lies at the heart of most machine learning approaches. The reverse of this relationship is less well explored: the design of optimization methods in data-rich problem domains can benefit significantly from integrating the available data. In this context, machine learning has a significant role to play in extracting the most relevant patterns and relationships in the data. In my talk, I will discuss the benefits of a data-driven approach to the design of meta-heuristic optimisers, with examples from applications ranging from data clustering to planning problems and structure prediction.
  • Leanne Fitzpatrick (Hello Soda)
    Title: Shipping a Machine Learning model to Production; is it always smooth sailing?
    Abstract: Deploying data services in production environments can face many barriers to entry, from culture to frameworks to real-time practicalities. Many data scientists are trained to develop machine learning models, but how can these be seamlessly integrated into the rest of the production technology stack, and what pain points can this bring? Despite developing a range of in-house capabilities, processes and practices to tackle this problem, it’s not always smooth sailing! Pitfalls such as internal schema development, package management, and testing can all leave you wondering why you wanted to deploy your model in the first place! We’ll discuss how to get started with deploying a machine learning model, considerations around the practicalities of the build, and how to bridge gaps between a technical team’s skill sets to enable data products in production.
  • Thomas House (University of Manchester)
    Title: Online Inference Using Distribution-Valued Stochastic Processes
    Abstract: Gaussian processes are a commonly used technique in statistical learning and allow for flexible data-driven inference with few (hyper-)parameters. I will present a generalisation of this approach to other distribution-valued processes that allow for calibration of SDE- and pure-jump Markov-based models, as well as model-free inference for highly non-Gaussian data such as small counts and nominal observations.

Flash Talk Abstracts

There will be a number of flash talks on Wednesday and Thursday:

  • Joe Hanson (Flash Talks 1)
    I'm applying machine learning to research cosmic magnetism in astrophysics, and my work involves classification with convolutional neural networks. This novel application of machine learning also necessitates further research into some specific problems, including handling class imbalance and transitioning (e.g. through transfer or semi-supervised learning) from labelled simulated datasets to real (and unlabelled) observations.
  • Richard Henchman (Flash Talks 1)
    We develop theory for the total quantification of molecular flexibility using computer simulation. A key challenge is to automate the process so that it works for any simulation.
  • Ann Gledson (Flash Talks 1)
    Ann has been conducting research since 2000 in the areas of data analytics, text mining, interface design and software engineering. She will give an overview of the variety of data science tasks she has recently worked on in her current role as a Research Software Engineer at the University of Manchester. For example, in the InnovateUK CityVerve project she developed a web application that combines disparate streams of IoT data and allows users to apply various analysis techniques.
  • Matt Crooks (Flash Talks 1)
    How well can we differentiate school days from school holidays using viewing figures for CBBC programmes on BBC iPlayer? Well enough, in fact, to identify snow days!
  • Karim Sidaoui (Flash Talks 2)
    Understanding customer experience using artificial intelligence and machine learning.
  • Hongmin Tang (Flash Talks 2)
    For the last few decades, radio galaxy morphology has been classified by visual inspection. Given that convolutional neural networks (CNNs) can extract features from image data, we adopted a 13-layer convolutional network architecture to classify selected radio survey images. The best approach achieved an overall model accuracy of over 90%. We also explored transfer learning approaches.
  • Arianna Dagliati (Flash Talks 2)
    Use of Topological Data Analysis approaches to tackle precision medicine issues. Application of unsupervised methods to identify temporal phenotypes.
  • Stephanie Shoop-Worrall (Flash Talks 3)
    There are six key outcomes for young people with arthritis, which may change over time in different directions. We aim to understand whether groups of young people experience different outcome patterns.
  • Zafiirah Hosenie (Flash Talks 3)
    I will outline the application of machine learning techniques for variable star classification. In particular, I will demonstrate various methods for feature visualisation and evaluation applicable to high-dimensional datasets.
  • Alexia Sampri (Flash Talks 3)
    Challenges in the aggregation of biomedical datasets and probabilistic approaches to overcome representational heterogeneity: a Systemic Lupus Erythematosus exemplar.
  • Georgia Ionescu (Flash Talks 4)
    Deep learning can be used to predict breast cancer by analysing mammographic images. Our method has been tested on a large dataset and showed results comparable with those of human readers (medical experts).

Register

Registration is now closed.