University of Cambridge - Flatiron Institute /taxonomy/external-affiliations/flatiron-institute en New datasets will train AI models to think like scientists /research/news/new-datasets-will-train-ai-models-to-think-like-scientists <div class="field field-name-field-news-image field-type-image field-label-hidden"><div class="field-items"><div class="field-item even"><img class="cam-scale-with-grid" src="/sites/default/files/styles/content-580x288/public/news/research/news/polymathic-ai.jpg?itok=J6Vf_9mh" alt="A mosaic of simulations included in the Well collection of datasets" title="A mosaic of simulations included in the Well collection of datasets, Credit: Alex Meng, Aaron Watters and the Well Collaboration" /></div></div></div><div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p>The initiative, called <a href="https://polymathic-ai.org/">Polymathic AI</a>, uses technology like that powering large language models such as OpenAI’s ChatGPT or Google’s Gemini. But instead of ingesting text, the project’s models learn using scientific datasets from across astrophysics, biology, acoustics, chemistry, fluid dynamics and more, essentially giving the models cross-disciplinary scientific knowledge.</p> <p>“These datasets are by far the most diverse large-scale collections of high-quality data for machine learning training ever assembled for these fields,” said team member Michael McCabe from the Flatiron Institute in New York City. “Curating these datasets is a critical step in creating multidisciplinary AI models that will enable new discoveries about our universe.”</p> <p>On 2 December, the Polymathic AI team released two of its open-source training dataset collections to the public (a colossal 115 terabytes, from dozens of sources) for the scientific community to use to train AI models and enable new scientific discoveries. 
For comparison, GPT-3 used 45 terabytes of uncompressed, unformatted text for training, which ended up being around 0.5 terabytes after filtering.</p> <p>The full datasets are available to download for free on <a href="https://huggingface.co/">HuggingFace</a>, a platform hosting AI models and datasets. The Polymathic AI team provides further information about the datasets in <a href="https://nips.cc/virtual/2024/poster/97882">two</a> <a href="https://nips.cc/virtual/2024/poster/97791">papers</a> accepted for presentation at the <a href="https://neurips.cc/">NeurIPS</a> machine learning conference, to be held later this month in Vancouver, Canada.</p> <p>“Just as LLMs such as ChatGPT learn to use common grammatical structure across languages, these new scientific foundation models might reveal deep connections across disciplines that we’ve never noticed before,” said Cambridge team lead <a href="https://astroautomata.com/">Dr Miles Cranmer</a> from Cambridge’s Institute of Astronomy. “We might uncover patterns that no human can see, simply because no one has ever had both this breadth of scientific knowledge and the ability to compress it into a single framework.”</p> <p>AI tools such as machine learning are increasingly common in scientific research, and were recognised in two of this year’s <a href="/research/news/university-of-cambridge-alumnus-awarded-2024-nobel-prize-in-physics">Nobel</a> <a href="/research/news/university-of-cambridge-alumni-awarded-2024-nobel-prize-in-chemistry">Prizes</a>. Still, such tools are typically purpose-built for a specific application and trained using data from that field. The Polymathic AI project instead aims to develop models that are truly polymathic, like people whose expert knowledge spans multiple areas. The project’s team reflects intellectual diversity, with physicists, astrophysicists, mathematicians, computer scientists and neuroscientists.</p> <p>The first of the two new training dataset collections focuses on astrophysics. 
Dubbed the Multimodal Universe, the dataset contains hundreds of millions of astronomical observations and measurements, such as portraits of galaxies taken by NASA’s James Webb Space Telescope and measurements of our galaxy’s stars made by the European Space Agency’s Gaia spacecraft.</p> <p>The other collection, called the Well, comprises over 15 terabytes of data from 16 diverse datasets. These datasets contain numerical simulations of biological systems, fluid dynamics, acoustic scattering, supernova explosions and other complicated processes. Cambridge researchers played a major role in developing both dataset collections, working alongside Polymathic AI and other international collaborators.</p> <p>While these diverse datasets may seem disconnected at first, they all require the modelling of mathematical equations called partial differential equations. Such equations pop up in problems related to everything from quantum mechanics to embryo development and can be incredibly difficult to solve, even for supercomputers. One of the goals of the Well is to enable AI models to churn out approximate solutions to these equations quickly and accurately.</p> <p>“By uniting these rich datasets, we can drive advancements in artificial intelligence not only for scientific discovery, but also for addressing similar problems in everyday life,” said Ben Boyd, PhD student in the Institute of Astronomy.</p> <p>Gathering the data for those datasets posed a challenge, said team member Ruben Ohana from the Flatiron Institute. The team collaborated with scientists to gather and create data for the project. “The creators of numerical simulations are sometimes sceptical of machine learning because of all the hype, but they’re curious about it and how it can benefit their research and accelerate scientific discovery,” he said.</p> <p>The Polymathic AI team is now using the datasets to train AI models. 
In the coming months, they will deploy these models on various tasks to see how successful these well-rounded, well-trained AIs are at tackling complex scientific problems.</p> <p>“It will be exciting to see if the complexity of these datasets can push AI models to go beyond merely recognising patterns, encouraging them to reason and generalise across scientific domains,” said Dr Payel Mukhopadhyay from the Institute of Astronomy. “Such generalisation is essential if we ever want to build AI models that can truly assist in conducting meaningful science.”</p> <p>“Until now, we haven’t had a curated scientific-quality dataset cover such a wide variety of fields,” said Cranmer, who is also a member of Cambridge’s Department of Applied Mathematics and Theoretical Physics. “These datasets are opening the door to true generalist scientific foundation models for the first time. What new scientific principles might we discover? We're about to find out, and that's incredibly exciting.”</p> <p>The Polymathic AI project is run by researchers from the Simons Foundation and its Flatiron Institute, New York University, the University of Cambridge, Princeton University, the French Centre National de la Recherche Scientifique and the Lawrence Berkeley National Laboratory.</p> <p>Members of the Polymathic AI team from the University of Cambridge include PhD students, postdoctoral researchers and faculty across four departments: the Department of Applied Mathematics and Theoretical Physics, the Department of Pure Mathematics and Mathematical Statistics, the Institute of Astronomy and the Kavli Institute for Cosmology.</p> </div></div></div><div class="field field-name-field-content-summary field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p><p>What can exploding stars teach us about how blood flows through an artery? Or swimming bacteria about how the ocean’s layers mix? 
A collaboration of researchers, including from the University of Cambridge, has reached a milestone toward training artificial intelligence models to find and use transferable knowledge between fields to drive scientific discovery.</p> </p></div></div></div><div class="field field-name-field-image-credit field-type-link-field field-label-hidden"><div class="field-items"><div class="field-item even"><a href="https://polymathic-ai.org/" target="_blank">Alex Meng, Aaron Watters and the Well Collaboration</a></div></div></div><div class="field field-name-field-image-desctiprion field-type-text field-label-hidden"><div class="field-items"><div class="field-item even">A mosaic of simulations included in the Well collection of datasets</div></div></div><div class="field field-name-field-cc-attribute-text field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even"><p><a href="https://creativecommons.org/licenses/by-nc-sa/4.0/" rel="license"><img alt="Creative Commons License." src="/sites/www.cam.ac.uk/files/inner-images/cc-by-nc-sa-4-license.png" style="border-width: 0px; width: 88px; height: 31px;" /></a><br /> The text in this work is licensed under a <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>. Images, including our videos, are Copyright © University of Cambridge and licensors/contributors as identified. All rights reserved. 
We make our image and video content available in a number of ways – on our <a href="/">main website</a> under its <a href="/about-this-site/terms-and-conditions">Terms and conditions</a>, and on a <a href="/about-this-site/connect-with-us">range of channels including social media</a> that permit your use and sharing of our content under their respective Terms.</p> </div></div></div><div class="field field-name-field-show-cc-text field-type-list-boolean field-label-hidden"><div class="field-items"><div class="field-item even">Yes</div></div></div> Mon, 02 Dec 2024 15:59:08 +0000 sc604 248583 at Scientists begin building AI for scientific discovery using tech behind ChatGPT /research/news/scientists-begin-building-ai-for-scientific-discovery-using-tech-behind-chatgpt <div class="field field-name-field-news-image field-type-image field-label-hidden"><div class="field-items"><div class="field-item even"><img class="cam-scale-with-grid" src="/sites/default/files/styles/content-580x288/public/news/research/news/gettyimages-1398047278-dp.jpg?itok=-K0YLB_o" alt="Network and data connection on a dark blue background." 
title="Network and data connection on a dark blue background., Credit: Yuichiro Chino via Getty Images" /></div></div></div><div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p>While ChatGPT deals in words and sentences, the team’s AI will learn from numerical data and physics simulations from across scientific fields to aid scientists in modelling everything from supergiant stars to the Earth’s climate.</p> <p>The team launched the initiative, called <a href="https://polymathic-ai.org/">Polymathic AI</a>, earlier this week, alongside the publication of a series of <a href="https://arxiv.org/abs/2310.02994">related</a> <a href="https://arxiv.org/abs/2310.02989">scientific</a> <a href="https://arxiv.org/abs/2310.03024">papers</a> on the arXiv.org open access repository.</p> <p>“This will completely change how people use AI and machine learning in science,” said Polymathic AI principal investigator Shirley Ho, a group leader at the Flatiron Institute’s Center for Computational Astrophysics in New York City.</p> <p>The idea behind Polymathic AI “is similar to how it’s easier to learn a new language when you already know five languages,” said Ho.</p> <p>Starting with a large, pre-trained model, known as a foundation model, can be both faster and more accurate than building a scientific model from scratch. That can be true even if the training data isn’t obviously relevant to the problem at hand.</p> <p>“It’s been difficult to carry out academic research on full-scale foundation models due to the scale of computing power required,” said co-investigator Miles Cranmer, from Cambridge’s Department of Applied Mathematics and Theoretical Physics and Institute of Astronomy. 
“Our collaboration with the Simons Foundation has provided us with unique resources to start prototyping these models for use in basic science, which researchers around the world will be able to build from – it’s exciting.”</p> <p>“Polymathic AI can show us commonalities and connections between different fields that might have been missed,” said co-investigator Siavash Golkar, a guest researcher at the Flatiron Institute’s Center for Computational Astrophysics. “In previous centuries, some of the most influential scientists were polymaths with a wide-ranging grasp of different fields. This allowed them to see connections that helped them get inspiration for their work. With each scientific domain becoming more and more specialised, it is increasingly challenging to stay at the forefront of multiple fields. I think this is a place where AI can help us by aggregating information from many disciplines.”</p> <p>The Polymathic AI team includes researchers from the Simons Foundation and its Flatiron Institute, New York University, the University of Cambridge, Princeton University and the Lawrence Berkeley National Laboratory. The team includes experts in physics, astrophysics, mathematics, artificial intelligence and neuroscience.</p> <p>Scientists have used AI tools before, but they’ve primarily been purpose-built and trained using relevant data. “Despite rapid progress of machine learning in recent years in various scientific fields, in almost all cases, machine learning solutions are developed for specific use cases and trained on some very specific data,” said co-investigator Francois Lanusse, a cosmologist at the Centre national de la recherche scientifique (CNRS) in France. 
“This creates boundaries both within and between disciplines, meaning that scientists using AI for their research do not benefit from information that may exist, but in a different format, or in a different field entirely.”</p> <p>Polymathic AI’s project will learn using data from diverse sources across physics and astrophysics (and eventually fields such as chemistry and genomics, its creators say) and apply that multidisciplinary savvy to a wide range of scientific problems. The project will “connect many seemingly disparate subfields into something greater than the sum of their parts,” said project member Mariel Pettee, a postdoctoral researcher at Lawrence Berkeley National Laboratory.</p> <p>“How far we can make these jumps between disciplines is unclear,” said Ho. “That’s what we want to do: to try and make it happen.”</p> <p>ChatGPT has well-known limitations when it comes to accuracy (for instance, the chatbot says 2,023 times 1,234 is 2,497,582 rather than the correct answer of 2,496,382). Polymathic AI’s project will avoid many of those pitfalls, Ho said, by treating numbers as actual numbers, not just characters on the same level as letters and punctuation. The training data will also use real scientific datasets that capture the physics underlying the cosmos.</p> <p>Transparency and openness are a big part of the project, Ho said. “We want to make everything public. 
We want to democratise AI for science in such a way that, in a few years, we’ll be able to serve a pre-trained model to the community that can help improve scientific analyses across a wide variety of problems and domains.”</p> </div></div></div><div class="field field-name-field-content-summary field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p><p>An international team of scientists, including from the University of Cambridge, have launched a new research collaboration that will leverage the same technology behind ChatGPT to build an AI-powered tool for scientific discovery.</p> </p></div></div></div><div class="field field-name-field-image-credit field-type-link-field field-label-hidden"><div class="field-items"><div class="field-item even"><a href="/" target="_blank">Yuichiro Chino via Getty Images</a></div></div></div><div class="field field-name-field-image-desctiprion field-type-text field-label-hidden"><div class="field-items"><div class="field-item even">Network and data connection on a dark blue background.</div></div></div><div class="field field-name-field-cc-attribute-text field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even"><p><a href="https://creativecommons.org/licenses/by-nc-sa/4.0/" rel="license"><img alt="Creative Commons License." src="/sites/www.cam.ac.uk/files/inner-images/cc-by-nc-sa-4-license.png" style="border-width: 0px; width: 88px; height: 31px;" /></a><br /> The text in this work is licensed under a <a href="https://creativecommons.org/licenses/by-nc-sa/4.0/">Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License</a>. Images, including our videos, are Copyright © University of Cambridge and licensors/contributors as identified. All rights reserved. 
We make our image and video content available in a number of ways – as here, on our <a href="/">main website</a> under its <a href="/about-this-site/terms-and-conditions">Terms and conditions</a>, and on a <a href="/about-this-site/connect-with-us">range of channels including social media</a> that permit your use and sharing of our content under their respective Terms.</p> </div></div></div><div class="field field-name-field-show-cc-text field-type-list-boolean field-label-hidden"><div class="field-items"><div class="field-item even">Yes</div></div></div> Fri, 13 Oct 2023 10:55:15 +0000 sc604 242661 at Mathematics explains how giant ‘whirlpools’ form in developing egg cells /research/news/mathematics-explains-how-giant-whirlpools-form-in-developing-egg-cells <div class="field field-name-field-news-image field-type-image field-label-hidden"><div class="field-items"><div class="field-item even"><img class="cam-scale-with-grid" src="/sites/default/files/styles/content-580x288/public/news/research/news/animation.jpg?itok=dvsL0OCq" alt="" title="Credit: None" /></div></div></div><div class="field field-name-body field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p>Egg cells are among the largest cells in the animal kingdom. Unpropelled, a protein could take hours or even days to drift from one side of a forming egg cell to the other. Luckily, nature has developed a faster way: scientists have spotted cell-spanning whirlpools in the immature egg cells of animals such as mice, zebrafish and fruit flies. These vortices make cross-cell commutes take just a fraction of the time. But scientists didn’t know how these crucial flows formed.</p> <p>Using mathematical modeling, researchers say they now have an answer. The gyres result from the collective behavior of rodlike molecular tubes called microtubules that extend inward from the cells’ membranes. 
Their <a href="https://doi.org/10.1103/PhysRevLett.126.028103">results</a> are reported in the journal <em>Physical Review Letters.</em></p> <p>“While much is not understood about the biological function of these flows, they distribute nutrients and other factors that organise the body plan and guide development,” said study co-lead author David Stein, a research scientist at the Flatiron Institute’s Center for Computational Biology (CCB) in New York City. And given how widely they have been observed, “they are probably even in humans.”</p> <p>Scientists have studied cellular flows since the late 18th century, when Italian physicist Bonaventura Corti peered inside cells using his microscope. He found fluids in constant motion, but scientists didn’t understand the mechanisms driving these flows until the 20th century.</p> <p>The culprits, they found, are molecular motors that walk along the microtubules. Those motors haul large biological payloads such as lipids. Carrying the cargo through a cell’s relatively thick fluids is like dragging a beach ball through honey. As the payloads move through the fluid, the fluid moves too, creating a small current.</p> <p>Sometimes those currents aren’t so small. In certain developmental stages of a common fruit fly’s egg cell, scientists spotted whirlpool-like currents that spanned the entire cell. In these cells, microtubules extend inward from the cell’s membrane like stalks of wheat. Molecular motors climbing these microtubules push downward on the microtubule as they ascend. That downward force bends the microtubule, redirecting the resulting flows.</p> <p>Previous studies looked at this bending mechanism, but only for isolated microtubules. 
Those studies predicted that the microtubules would wave around in circles, but their behavior didn’t match the observations.</p> <p>“The mechanism of the swirling instability is disarmingly simple, and the agreement between our calculations and the experimental observations by various groups lends support to the idea that this is indeed the process at work in fruit fly egg cells,” said Professor Raymond Goldstein from Cambridge’s Department of Applied Mathematics and Theoretical Physics. “Further experimental tests should be able to probe details of the transition between disordered and ordered flows, where there is still much to be understood.”</p> <p>In the new study, the researchers added a key factor to their model: the influence of neighboring microtubules. That addition showed that the fluid flows generated by the payload-ferrying motors bend nearby microtubules in the same direction. With enough motors and a dense enough packing of microtubules, the authors found that all the microtubules eventually lean together like wheat stalks caught in a strong breeze. This collective alignment orients all the flows in the same direction, creating the cell-wide vortex seen in real fruit fly cells.</p> <p>While grounded in reality, the new model is stripped down to the bare essentials to make clearer the conditions responsible for the swirling flows. The researchers are now working on versions that more realistically capture the physics behind the flows to understand better the role the currents play in biological processes.</p> <p>Stein serves as the co-lead author of the new study along with Gabriele De Canio, a researcher at the University of Cambridge. 
They co-authored the study with CCB director and New York University professor Michael Shelley and University of Cambridge professors Eric Lauga and Raymond Goldstein.</p> <p>This work was supported by the US National Science Foundation, the Wellcome Trust, the European Research Council, the Engineering and Physical Sciences Research Council, and the Schlumberger Chair Fund.</p> <p> </p> <p><em>Reference:<br /> D.B. Stein, G. De Canio, E. Lauga, M.J. Shelley, and R.E. Goldstein, “<a href="https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.126.028103">Swirling Instability of the Microtubule Cytoskeleton</a>”, Physical Review Letters (2021). DOI: 10.1103/PhysRevLett.126.028103</em></p> </div></div></div><div class="field field-name-field-content-summary field-type-text-with-summary field-label-hidden"><div class="field-items"><div class="field-item even"><p><p>The swirling currents occur when the rodlike structures that extend inward from the cells’ membranes bend in tandem, like stalks of wheat caught in a strong breeze, according to a study from the University of Cambridge and the Flatiron Institute.</p> </p></div></div></div><div class="field field-name-field-content-quote field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even">The mechanism of the swirling instability is disarmingly simple, and the agreement between our calculations and experimental observations supports the idea that this is indeed the process at work in fruit fly egg cells</div></div></div><div class="field field-name-field-content-quote-name field-type-text field-label-hidden"><div class="field-items"><div class="field-item even">Raymond Goldstein</div></div></div><div class="field field-name-field-cc-attribute-text field-type-text-long field-label-hidden"><div class="field-items"><div class="field-item even"><p><a href="http://creativecommons.org/licenses/by/4.0/" rel="license"><img alt="Creative Commons License" src="https://i.creativecommons.org/l/by/4.0/88x31.png" 
style="border-width:0" /></a><br /> The text in this work is licensed under a <a href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>. Images, including our videos, are Copyright © University of Cambridge and licensors/contributors as identified. All rights reserved. We make our image and video content available in a number of ways – as here, on our <a href="/">main website</a> under its <a href="/about-this-site/terms-and-conditions">Terms and conditions</a>, and on a <a href="/about-this-site/connect-with-us">range of channels including social media</a> that permit your use and sharing of our content under their respective Terms.</p> </div></div></div><div class="field field-name-field-show-cc-text field-type-list-boolean field-label-hidden"><div class="field-items"><div class="field-item even">Yes</div></div></div> Wed, 13 Jan 2021 16:35:52 +0000 sc604 221331 at