Big Data, Data Prep
Exploring reaction data for machine learning: synthesis planning and reaction predictions
Prof Alexei Lapkin, Department of Chemical Engineering and Biotechnology, University of Cambridge
Data is critical for machine learning, but in chemistry data is scarce and must be prepared before it can be used in machine learning workflows. In the SRE group we have approached this problem from several directions: evaluating how to use large datasets of available reaction data (Reaxys), looking into data recording standards (an extension of InChI to record process data, and the UDM standard), and preparing data for ML pipelines starting from the Open Reaction Database (ORD) schema as well as from Reaxys. We then productised some of these tools via our start-up company, Chemical Data Intelligence (CDI) Pte Ltd. In the talk I’ll discuss the academic work on this topic and the route to commercialisation.