Satellite Data Can Help Limit the Dangers of Windblown Dust
-
Similar Topics
-
By NASA
Smarter Searching: NASA AI Makes Science Data Easier to Find
Image snapshot taken from NASA Worldview of NASA’s Global Precipitation Measurement (GPM) mission on March 15, 2025 showing heavy rain across the southeastern U.S. with an overlay of the GCMD Keyword Recommender for Earth Science, Atmosphere, Precipitation, Droplet Size.
NASA Worldview
Imagine shopping for a new pair of running shoes online. If each seller described them differently (one calling them “sneakers,” another “trainers,” and someone else “footwear for exercise”), you’d quickly feel lost in a sea of mismatched terminology. Fortunately, most online stores use standardized categories and filters, so you can click through a simple path, Women’s > Shoes > Running Shoes, and quickly find what you need.
Now, scale that problem to scientific research. Instead of sneakers, think “aerosol optical depth” or “sea surface temperature.” Instead of a handful of retailers, it is thousands of researchers, instruments, and data providers. Without a common language for describing data, finding relevant Earth science datasets would be like trying to locate a needle in a haystack, blindfolded.
That’s why NASA created the Global Change Master Directory (GCMD), a standardized vocabulary that helps scientists tag their datasets in a consistent and searchable way. But as science evolves, so does the challenge of keeping metadata organized and discoverable.
To meet that challenge, NASA’s Office of Data Science and Informatics (ODSI) at the agency’s Marshall Space Flight Center (MSFC) in Huntsville, Alabama, developed the GCMD Keyword Recommender (GKR): a smart tool designed to help data providers and curators assign the right keywords, automatically.
Smarter Tagging, Accelerated Discovery
The upgraded GKR model isn’t just a technical improvement; it’s a leap forward in how we organize and access scientific knowledge. By automatically recommending precise, standardized keywords, the model reduces the burden on human curators while ensuring metadata quality remains high. This makes it easier for researchers, students, and the public to find exactly the datasets they need.
It also sets the stage for broader applications. The techniques used in GKR, like applying focal loss to rare-label classification problems and adapting pre-trained transformers to specialized domains, can benefit fields well beyond Earth science.
Metadata Matchmaker
The newly upgraded GKR model tackles a massive challenge in information science known as extreme multi-label classification. That’s a mouthful, but the concept is straightforward: Instead of predicting just one label, the model must choose many, sometimes dozens, from a set of thousands. Each dataset may need to be tagged with multiple, nuanced descriptors pulled from a controlled vocabulary.
Think of it like trying to identify all the animals in a photograph. If there’s just a dog, it’s easy. But if there’s a dog, a bird, a raccoon hiding behind a bush, and a unicorn that only shows up in 0.1% of your training photos, the task becomes far more difficult. That’s what GKR is up against: tagging complex datasets with precision, even when examples of some keywords are scarce.
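The multi-label idea described above can be illustrated with a small Python sketch. It is not the GKR implementation: the raw scores, the 0.5 threshold, and the function name `recommend_keywords` are assumptions made for illustration, and only the keyword names come from this article. The point is that each keyword gets its own independent yes/no decision, so a dataset can receive several tags at once.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def recommend_keywords(logits: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return every keyword whose own probability clears the threshold.

    Each keyword is scored independently, so any number of them
    (including none) can be selected for one dataset.
    """
    scored = {kw: sigmoid(z) for kw, z in logits.items()}
    chosen = [kw for kw, p in scored.items() if p >= threshold]
    # Most confident keywords first
    return sorted(chosen, key=lambda kw: scored[kw], reverse=True)


# Hypothetical raw scores for one dataset description (illustrative values only)
example_logits = {
    "PRECIPITATION": 2.1,
    "DROPLET SIZE": 0.4,
    "SEA SURFACE TEMPERATURE": -3.0,
    "AEROSOL OPTICAL DEPTH": -1.2,
}

print(recommend_keywords(example_logits))
```

In a real system the vocabulary would hold thousands of keywords rather than four, which is what makes the problem "extreme."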
And the problem is only growing. The new version of GKR now considers more than 3,200 keywords, up from about 430 in its earlier iteration. That’s a sevenfold increase in vocabulary complexity, and a major leap in what the model needs to learn and predict.
To handle this scale, the GKR team didn’t just add more data; they built a more capable model from the ground up. At the heart of the upgrade is INDUS, an advanced language model trained on a staggering 66 billion words drawn from scientific literature across disciplines—Earth science, biological sciences, astronomy, and more.
NASA ODSI’s GCMD Keyword Recommender AI model automatically tags scientific datasets with the help of INDUS, a large language model trained on NASA scientific publications across the disciplines of astrophysics, biological and physical sciences, Earth science, heliophysics, and planetary science.
NASA
“We’re at the frontier of cutting-edge artificial intelligence and machine learning for science,” said Sajil Awale, a member of the NASA ODSI AI team at MSFC. “This problem domain is interesting, and challenging, because it’s an extreme classification problem where the model needs to differentiate even very similar keywords/tags based on small variations of context. It’s exciting to see how we have leveraged INDUS to build this GKR model because it is designed and trained for scientific domains. There are opportunities to improve INDUS for future uses.”
This means that the new GKR isn’t just guessing based on word similarities; it understands the context in which keywords appear. It’s the difference between a model knowing that “precipitation” might relate to weather versus recognizing when it means a climate variable in satellite data.
And while the older model was trained on only 2,000 metadata records, the new version had access to a much richer dataset of more than 43,000 records from NASA’s Common Metadata Repository. That increased exposure helps the model make more accurate predictions.
The Common Metadata Repository is the backend behind the following data search and discovery services:
Earthdata Search
International Data Network
Learning to Love Rare Words
One of the biggest hurdles in a task like this is class imbalance. Some keywords appear frequently; others might show up just a handful of times. Traditional machine learning approaches, like cross-entropy loss, which was used initially to train the model, tend to favor the easy, common labels, and neglect the rare ones.
To solve this, NASA’s team turned to focal loss, a strategy that reduces the model’s attention to obvious examples and shifts focus toward the harder, underrepresented cases.
The result? A model that performs better across the board, especially on the keywords that matter most to specialists searching for niche datasets.
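To see why focal loss shifts attention toward hard cases, here is a minimal Python sketch of the commonly cited binary form, FL = -(1 - p_t)^gamma * log(p_t), compared with plain cross-entropy. The article does not give the formula or settings the GKR team used, so the value gamma = 2.0 and the example probabilities are assumptions, and the optional class-weighting term is left out for brevity.

```python
import math


def binary_cross_entropy(p: float, y: int) -> float:
    """Cross-entropy for one label; p is the predicted probability that y == 1.

    Assumes 0 < p < 1 so the logarithm is defined.
    """
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def binary_focal_loss(p: float, y: int, gamma: float = 2.0) -> float:
    """Focal loss: cross-entropy scaled by (1 - p_t) ** gamma.

    Confident, correct predictions (p_t near 1) are scaled toward zero,
    so hard or rare examples dominate the total loss.
    gamma = 2.0 is an assumed value, not one reported in the article.
    """
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Two positive examples: one the model already gets right, one it gets badly wrong
examples = {"easy (p=0.95)": (0.95, 1), "hard (p=0.10)": (0.10, 1)}

for name, (p, y) in examples.items():
    ce = binary_cross_entropy(p, y)
    fl = binary_focal_loss(p, y)
    print(f"{name}: cross-entropy={ce:.4f}  focal={fl:.4f}  kept fraction={fl / ce:.4f}")
```

With these assumed numbers, the easy example keeps only a tiny fraction of its cross-entropy loss while the hard example keeps most of it. Setting gamma to 0 recovers ordinary cross-entropy.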
From Metadata to Mission
Ultimately, science depends not only on collecting data, but on making that data usable and discoverable. The updated GKR tool is a quiet but critical part of that mission. By bringing powerful AI to the task of metadata tagging, it helps ensure that the flood of Earth observation data pouring in from satellites and instruments around the globe doesn’t get lost in translation.
In a world awash with data, tools like GKR help researchers find the signal in the noise and turn information into insight.
Beyond powering GKR, the INDUS large language model is also enabling innovation across other NASA SMD projects. For example, INDUS supports the Science Discovery Engine by helping automate metadata curation and improving the relevancy ranking of search results. The diverse applications reflect INDUS’s growing role as a foundational AI capability for SMD.
The INDUS large language model is funded by the Office of the Chief Science Data Officer within NASA’s Science Mission Directorate at NASA Headquarters in Washington. The Office of the Chief Science Data Officer advances scientific discovery through innovative applications and partnerships in data science, advanced analytics, and artificial intelligence.
Last Updated Jul 09, 2025
View the full article
-
By European Space Agency
Astronomers using the European Space Agency’s Cheops mission have caught an exoplanet that seems to be triggering flares of radiation from the star it orbits. These tremendous explosions are blasting away the planet’s wispy atmosphere, causing it to shrink every year.
This is the first-ever evidence for a ‘planet with a death wish’. Though such interactions have been theorised to be possible since the nineties, the flares seen in this research are around 100 times more energetic than expected.
View the full article
-
By NASA
An unexpectedly strong solar storm rocked our planet on April 23, 2023, sparking auroras as far south as southern Texas in the U.S. and taking the world by surprise.
Two days earlier, the Sun blasted a coronal mass ejection (CME) — a cloud of energetic particles, magnetic fields, and solar material — toward Earth. Space scientists took notice, expecting it could cause disruptions to Earth’s magnetic field, known as a geomagnetic storm. But the CME wasn’t especially fast or massive, and it was preceded by a relatively weak solar flare, suggesting the storm would be minor. But it became severe.
Using NASA heliophysics missions, new studies of this storm and others are helping scientists learn why some CMEs have more intense effects — and better predict the impacts of future solar eruptions on our lives.
During the night of April 23 to 24, 2023, a geomagnetic storm produced auroras that were witnessed as far south as Arizona, Arkansas, and Texas in the U.S. This photo shows green aurora shimmering over Larimore, North Dakota, in the early morning of April 24.
Copyright Elan Azriel, used with permission
Why Was This Storm So Intense?
A paper published in the Astrophysical Journal on March 31 suggests the CME’s orientation relative to Earth likely caused the April 2023 storm to become surprisingly strong.
The researchers gathered observations from five heliophysics spacecraft across the inner solar system to study the CME in detail as it emerged from the Sun and traveled to Earth.
They noticed a large coronal hole near the CME’s birthplace. Coronal holes are areas where the solar wind — a stream of particles flowing from the Sun — floods outward at higher than normal speeds.
“The fast solar wind coming from this coronal hole acted like an air current, nudging the CME away from its original straight-line path and pushing it closer to Earth’s orbital plane,” said the paper’s lead author, Evangelos Paouris of the Johns Hopkins Applied Physics Laboratory in Laurel, Maryland. “In addition to this deflection, the CME also rotated slightly.”
Paouris says this turned the CME’s magnetic fields opposite to Earth’s magnetic field and held them there — allowing more of the Sun’s energy to pour into Earth’s environment and intensifying the storm.
The strength of the April 2023 geomagnetic storm was a surprise in part because the coronal mass ejection (CME) that produced it followed a relatively weak solar flare, seen as the bright area to the lower right of center in this extreme ultraviolet image of the Sun from NASA’s Solar Dynamics Observatory. The CMEs that produce severe geomagnetic storms are typically preceded by stronger flares. However, a team of scientists think fast solar wind from a coronal hole (the dark area below the flare in this image) helped rotate the CME and made it more potent when it struck Earth.
NASA/SDO
Cool Thermosphere
Meanwhile, NASA’s GOLD (Global-scale Observations of Limb and Disk) mission revealed another unexpected consequence of the April 2023 storm at Earth.
Before, during, and after the storm, GOLD studied the temperature in the middle thermosphere, a part of Earth’s upper atmosphere about 85 to 120 miles overhead. During the storm, temperatures increased throughout GOLD’s wide field of view over the Americas. But surprisingly, after the storm, temperatures fell to about 90 to 198 degrees Fahrenheit below their pre-storm levels (from about 980 to 1,070 degrees Fahrenheit before the storm to 870 to 980 degrees Fahrenheit afterward).
“Our measurement is the first to show widespread cooling in the middle thermosphere after a strong storm,” said Xuguang Cai of the University of Colorado, Boulder, lead author of a paper about GOLD’s observations published in the journal JGR Space Physics on April 15, 2025.
The thermosphere’s temperature is important, because it affects how much drag Earth-orbiting satellites and space debris experience.
“When the thermosphere cools, it contracts and becomes less dense at satellite altitudes, reducing drag,” Cai said. “This can cause satellites and space debris to stay in orbit longer than expected, increasing the risk of collisions. Understanding how geomagnetic storms and solar activity affect Earth’s upper atmosphere helps protect technologies we all rely on — like GPS, satellites, and radio communications.”
Predicting When Storms Strike
To predict when a CME will trigger a geomagnetic storm, or be “geoeffective,” some scientists are combining observations with machine learning. A paper published last November in the journal Solar Physics describes one such approach called GeoCME.
Machine learning is a type of artificial intelligence in which a computer algorithm learns from data to identify patterns, then uses those patterns to make decisions or predictions.
Scientists trained GeoCME by giving it images from the NASA/ESA (European Space Agency) SOHO (Solar and Heliospheric Observatory) spacecraft of different CMEs that reached Earth along with SOHO images of the Sun before, during, and after each CME. They then told the model whether each CME produced a geomagnetic storm.
Then, when it was given images from three different science instruments on SOHO, the model’s predictions were highly accurate. Out of 21 geoeffective CMEs, the model correctly predicted all 21 of them; of 7 non-geoeffective ones, it correctly predicted 5 of them.
“The algorithm shows promise,” said heliophysicist Jack Ireland of NASA’s Goddard Space Flight Center in Greenbelt, Maryland, who was not involved in the study. “Understanding if a CME will be geoeffective or not can help us protect infrastructure in space and technological systems on Earth. This paper shows machine learning approaches to predicting geoeffective CMEs are feasible.”
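The counts reported above for GeoCME can be turned into standard classification metrics with a few lines of Python. The four counts are taken from this article (21 of 21 geoeffective CMEs and 5 of 7 non-geoeffective CMEs predicted correctly); the function name `summarize` and the choice of metrics are illustrative assumptions, not figures quoted from the paper.

```python
def summarize(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Compute common metrics from confusion-matrix counts.

    tp: geoeffective CMEs predicted as geoeffective
    fn: geoeffective CMEs predicted as non-geoeffective
    tn: non-geoeffective CMEs predicted as non-geoeffective
    fp: non-geoeffective CMEs predicted as geoeffective
    """
    return {
        "recall": tp / (tp + fn),            # share of real storms that were caught
        "specificity": tn / (tn + fp),       # share of non-storms correctly dismissed
        "precision": tp / (tp + fp),         # share of storm alerts that were right
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Counts from the article: 21 of 21 geoeffective and 5 of 7 non-geoeffective correct
metrics = summarize(tp=21, fn=0, tn=5, fp=2)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Read this way, the model missed no storms in the test set but raised two false alarms, so its overall accuracy of 26 out of 28 hides a weaker showing on the smaller non-geoeffective group.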
The white cloud expanding outward in this image sequence is a coronal mass ejection (CME) that erupted from the Sun on April 21, 2023. Two days later, the CME struck Earth and produced a surprisingly strong geomagnetic storm. The images in this sequence are from a coronagraph on the NASA/ESA (European Space Agency) SOHO (Solar and Heliospheric Observatory) spacecraft. The coronagraph uses a disk to cover the Sun and reveal fainter details around it. The Sun’s location and size are indicated by a small white circle. The planet Jupiter appears as a bright dot on the far right.
NASA/ESA/SOHO
Earlier Warnings
During a severe geomagnetic storm in May 2024 — the strongest to rattle Earth in over 20 years — NASA’s STEREO (Solar Terrestrial Relations Observatory) measured the magnetic field structure of CMEs as they passed by.
When a CME headed for Earth hits a spacecraft first, that spacecraft can often measure the CME and its magnetic field directly, helping scientists determine how strong the geomagnetic storm will be at Earth. Typically, the first spacecraft to get hit are one million miles from Earth toward the Sun at a place called Lagrange Point 1 (L1), giving us only 10 to 60 minutes of advance warning.
By chance, during the May 2024 storm, when several CMEs erupted from the Sun and merged on their way to Earth, NASA’s STEREO-A spacecraft happened to be between us and the Sun, about 4 million miles closer to the Sun than L1.
A paper published March 17, 2025, in the journal Space Weather reports that if STEREO-A had served as a CME sentinel, it could have provided an accurate prediction of the resulting storm’s strength 2 hours and 34 minutes earlier than a spacecraft could at L1.
According to the paper’s lead author, Eva Weiler of the Austrian Space Weather Office in Graz, “No other Earth-directed superstorm has ever been observed by a spacecraft positioned closer to the Sun than L1.”
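As a rough consistency check on the figures above, the short Python sketch below relates distance to warning time. It assumes the CME travels at a constant speed and that the extra 2 hours and 34 minutes of warning equals the time the CME needed to cover the extra 4 million miles; neither assumption is stated in the article, so the result is a back-of-the-envelope estimate only. The function names are hypothetical.

```python
MILES_TO_KM = 1.609344  # exact definition of the international mile


def implied_speed_km_s(distance_miles: float, lead_time_minutes: float) -> float:
    """Speed needed to cover the distance in the given time (constant-speed assumption)."""
    return distance_miles * MILES_TO_KM / (lead_time_minutes * 60.0)


def travel_time_minutes(distance_miles: float, speed_km_s: float) -> float:
    """Time to cover the distance at a constant speed."""
    return distance_miles * MILES_TO_KM / speed_km_s / 60.0


# Figures from the article
extra_distance_miles = 4_000_000    # STEREO-A was about this much closer to the Sun than L1
extra_lead_minutes = 2 * 60 + 34    # 2 hours 34 minutes of additional warning
l1_distance_miles = 1_000_000       # L1 is about this far from Earth

speed = implied_speed_km_s(extra_distance_miles, extra_lead_minutes)
l1_warning = travel_time_minutes(l1_distance_miles, speed)

print(f"Implied CME speed: about {speed:.0f} km/s")
print(f"Warning time from L1 at that speed: about {l1_warning:.1f} minutes")
```

Under these assumptions the implied speed is roughly 700 kilometers per second, and the corresponding warning time from L1 falls inside the 10 to 60 minute range quoted earlier.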
Earth’s Lagrange points are places in space where the gravitational pulls of the Sun and Earth balance, making them relatively stable locations to put spacecraft.
NASA
By Vanessa Thomas
NASA’s Goddard Space Flight Center, Greenbelt, Md.
View the full article
-
By European Space Agency
The Meteosat Third Generation Sounder (MTG-S1) satellite, which is hosting the instrument for the Copernicus Sentinel-4 mission, has been placed inside the nose cone of the Falcon 9 launch rocket and is ready for the scheduled liftoff at 23:03 CEST on Tuesday, 1 July.
View the full article
-
By European Space Agency
At ESA’s Living Planet Symposium, scientists have unveiled how the combination of different long-term, high-resolution satellite datasets from ESA’s Climate Change Initiative is shedding new light on the South American Gran Chaco – one of the world’s most endangered dry forest ecosystems. These data reveal, in remarkable clarity, that fire is the primary driver of widespread, accelerating deforestation across the region.
View the full article