RG: Machine Learning

Revision as of 14:25, 29 April 2024

This page is under development.

Chairs

Table 1: RG Chairs.
Name                      Affiliation         Email
1. Hyunju Connor          NASA GSFC           hyunju.k.connor@nasa.gov
2. Matthew Argall         UNH                 Matthew.Argall@unh.edu
3. Xiangning Chu          LASP, CU Boulder    xiangning.chu@lasp.colorado.edu
4. Bashi Ferdousi         AFRL                banafsheh.ferdousi@spaceforce.edu
5. Valluri Sai Gowtam     UAF                 svalluri@alaska.edu

About the ML-GEM Resource Group

Machine Learning-based Geospace Environment Modeling (ML-GEM) is a new resource group selected by the GEM Steering Committee. It has two primary goals: advancing system-of-systems science in Sun-Earth interaction from a data-driven perspective, and developing an ML-based geospace environment model by integrating community-wide ML efforts.

Machine Learning (ML) efforts have grown rapidly within the GEM community over the past few years. However, there has been no collective initiative to develop a new ML-based Geospace Environment Model (ML-GEM) that connects ML-based models from the Sun to the solar wind, magnetosphere, and upper atmosphere in order to systematically understand and predict Sun-Earth interactions. Our new Focus Group (FG) aims to gather and lead diverse machine-learning efforts within the GEM community, with the primary goal of developing ML-GEM and gaining insights into our geospace systems from a data-driven perspective.

Over the next four years, our diverse FG team, which includes four early-career scientists and two women, aims to unite ML efforts across the GEM community. We will actively explore innovative ways to link existing and developing ML models and establish the initial architecture of a new data-driven global geospace environment model. These efforts will subsequently lead to system-of-systems research to understand the collective behavior of geospace systems in response to incoming solar wind and Interplanetary Magnetic Field (IMF) conditions. Additionally, we will share new ML techniques and ML-ready datasets, outline the pros and cons of ML approaches, and offer lessons learned, providing a resource for newcomers starting their ML projects. The main research area of this FG is Global General Circulation Modeling (GGCM). However, our topics also cover the other four GEM research areas and align well with all current GEM FGs.

Background

Recently, Machine Learning (ML) techniques have demonstrated significant potential in heliophysics research. These computational methods uncover complex relationships and patterns between inputs and outputs by learning from the ever-growing body of space- and ground-based observations, rather than relying on a set of predetermined equations. ML models excel at predicting the dynamic response of geospace systems to time-varying solar wind and IMF input, a capability beyond the reach of traditional empirical models designed for static systems under steady input conditions. This approach to data handling enables the discovery of hidden behaviors in our dynamic systems, such as global tail reconnection lines (Stephens et al. 2023). Notably, certain ML models have surpassed their empirical counterparts, marking ML-based models as the next generation of statistical models.

ML models have multiple broader impacts. Firstly, ML models can supply realistic inputs for physics-based models, such as solar wind input for global MHD models and high-latitude forcing for upper atmosphere models. Secondly, they can serve as valuable validation tools for physics-based calculations of geospace systems, facilitating comparisons between MHD-based (or physics-based) and ML-based (or statistical) outputs, such as global auroral precipitation patterns and cross polar cap potentials. Thirdly, ML techniques offer excellent data-mining tools, aiding in event selection, such as magnetopause crossings and substorm onsets. Lastly, although training ML models is time-consuming, trained models generate outputs from inputs promptly, allowing them to nowcast space weather without the massive computations typically required in physics-based models.

The GEM community has accumulated various ML models that cover a range of systems from the solar wind to the magnetosphere and ionosphere. Some ML models predict solar wind and IMF based on solar EUV images (Upendran et al. 2020; Raju & Das 2021), while others ensure accurate SW/IMF propagation from solar wind monitors to the Earth’s bow shock nose (Baumann & McCloskey 2021; O’Brien et al. 2023). Furthermore, certain models replicate responses in the magnetosheath, cusp, ring current, radiation belt, plasmasphere, ionosphere, and thermosphere to time-varying solar wind/IMF conditions and geomagnetic indices (Li et al., 2023; Ma et al., 2023; Chu et al., 2017, 2021; Cao et al., 2023; Raptis et al., 2020; Gowtam et al., 2019; Licata et al., 2022). There are also models designed explicitly for mining interesting events from vast heliophysics datasets (Stephens et al., 2019; Arnold et al., 2023). However, there have been no community-wide efforts to interconnect the existing ML models, develop an ML-based geospace environment model, and investigate how the individual geospace systems collectively respond to the incoming solar wind drivers. Our focus group proposes to integrate growing ML efforts within the GEM community across all applicable topics and fields, aiming to pioneer a new generation of space weather prediction models and conduct system-of-systems science research.

Our focus group will advance our understanding of heliophysics by integrating cutting-edge ML techniques and coupling them with other toolkits within the GEM community. For example, explainable ML techniques can reveal the relative contributions of different factors to radiation belt acceleration and loss processes (Ma et al., 2022). Solving theoretical equations, such as a current continuity equation, with physical parameters derived from ML models reveals how ionospheric electrodynamics responds to time-varying solar wind/IMF drivers (Gowtam et al. 2023). Coupling first-principles simulations with ML models, such as a global MHD model coupled with an ML inner-magnetosphere model, can reproduce the unstable and eruptive nature of the geospace system, offering insights that differ from the more conservative system response seen in typical modeling (Sciola, 2022).

Our FG activities will transform the perception of ML within the community, shifting it from being seen as a black box to being recognized as a fundamental toolkit for geospace physicists, alongside numerical simulations, theoretical approaches, and observation analysis. Coupling among these toolkits would be innovative and at the frontier of physical discoveries in the next decade. GEM should commence this transformative step now.

Timeliness

ML activities within the heliophysics community are mature enough to take the next innovative step. In past years, GEM convened an annual machine learning session during the summer workshop, but it was not organized under a focus group. Consequently, the session has been limited in its ability to set a common goal and maintain the community’s momentum toward that goal. Our focus group is timely in overcoming this limitation. We will address the community’s growing interest in ML techniques and steer the community toward a common goal: developing an ML-based geospace environment model and conducting systems science research from a data-driven perspective.

Goals and Objectives

Our overarching goals are to develop an ML-based geospace environment model and advance system-of-systems science in Sun-Earth interaction. Over the 4-year focus group period, we will implement the following objectives to achieve these goals:

1. Invite community-wide ML efforts: Extend invitations to a diverse community, including GEM, CEDAR, SHINE, and ML communities, fostering broader participation and collaboration.

2. Share ML advancements: Disseminate the latest ML techniques, including their advantages and disadvantages, and lessons learned. This knowledge-sharing will support community-wide geospace environment modeling efforts.

3. Facilitate integration discussions: Stimulate discussions on integrating diverse ML models across geospace systems, such as solar wind, magnetosheath, cusp, ring current, radiation belt, plasmasphere, and ionosphere.

4. Develop ML-based models: Initiate the development of an ML-based geospace environment model and encourage the strategic creation and inclusion of ML components in ML-GEM for enhanced performance.

5. Explore systematic responses: Investigate system-of-systems science in the solar wind – Earth interaction by leveraging ML-GEM and other ML models.

6. Compile a catalog of ML-ready datasets: Cleaning and preparing datasets constitute a substantial portion of machine learning model development. We will curate a list of existing ML-ready datasets in heliophysics for various scientific purposes.
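
As an illustration of what "linking" models (objectives 3 and 4) could mean in practice, the following minimal Python sketch chains independent model stages through a shared dictionary of named quantities. Everything in it is an assumption made for illustration: the `chain` helper, the stage names, the quantity names, and the toy formulas are hypothetical placeholders, not ML-GEM components and not physical models.

```python
from typing import Callable, Dict, List

# A "state" maps quantity names to values; each stage reads what it needs
# and returns the new quantities it produces. All names are hypothetical.
State = Dict[str, float]
Stage = Callable[[State], State]


def chain(stages: List[Stage]) -> Stage:
    """Compose stages so each one sees the outputs of all earlier stages."""
    def run(state: State) -> State:
        merged = dict(state)  # copy so the caller's input is not modified
        for stage in stages:
            merged.update(stage(merged))
        return merged
    return run


def propagate_solar_wind(state: State) -> State:
    # Placeholder for a solar wind propagation model: passes Bz through unchanged.
    return {"bz_at_bow_shock_nT": state["bz_at_monitor_nT"]}


def inner_magnetosphere(state: State) -> State:
    # Placeholder for an inner-magnetosphere model: a toy, non-physical formula
    # that responds only to southward (negative) Bz.
    southward = max(0.0, -state["bz_at_bow_shock_nT"])
    return {"ring_current_proxy": -2.0 * southward}


model = chain([propagate_solar_wind, inner_magnetosphere])
result = model({"bz_at_monitor_nT": -5.0})
print(result)
```

The design choice shown here is that stages communicate only through named quantities, so a trained model from any group could replace a placeholder stage as long as it reads and writes the agreed names.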

GEM-2024 Activities

ML-GEM chairs have scheduled four sessions for the upcoming 2024 GEM Summer Workshop, to be held in Fort Collins, Colorado, on June 23-28:

1. ML-GEM stand-alone session: All ML efforts across the GEM research areas are invited.

2. ML-GEM joint session with the Inner Magnetosphere Focus Groups: ML efforts particularly in the inner magnetosphere research area are invited.

3. ML-GEM discussion session: Please submit a summary of your ML model (1-2 slides; the slide format will be announced) and join the discussion on how to integrate your model into a unified, data-driven geospace environment model.

4. ML-GEM tutorial session: A hands-on tutorial on the Long Short-Term Memory (LSTM) technique that models time-series data. This tutorial will use the LSTM model and the SuperMAG geomagnetic field data, published in Blandin et al. (2022; https://doi.org/10.3389/fspas.2022.846291).

If you are interested in giving a talk in these sessions, please submit your talk at the following website: https://forms.gle/fmsU2eeBFFwD4tDQ8.
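
For readers unfamiliar with the LSTM technique named in the tutorial session, the following is a minimal NumPy sketch of the forward pass of a single LSTM layer over a time series. It is not the model of Blandin et al. (2022) and does not use SuperMAG data: the input is synthetic random noise, the weights are random and untrained, and the function name, gate ordering, and array sizes are assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_forward(inputs, W, U, b):
    """Run one LSTM layer over an array of shape (time, features).

    W: (4*hidden, features), U: (4*hidden, hidden), b: (4*hidden,).
    Assumed gate order in the stacked weights: input, forget, candidate, output.
    Returns the hidden state at every time step, shape (time, hidden).
    """
    hidden = U.shape[1]
    h = np.zeros(hidden)  # hidden state (short-term memory)
    c = np.zeros(hidden)  # cell state (long-term memory)
    outputs = []
    for x_t in inputs:
        z = W @ x_t + U @ h + b
        i = sigmoid(z[0:hidden])                 # input gate
        f = sigmoid(z[hidden:2 * hidden])        # forget gate
        g = np.tanh(z[2 * hidden:3 * hidden])    # candidate cell update
        o = sigmoid(z[3 * hidden:4 * hidden])    # output gate
        c = f * c + i * g                        # keep part of old memory, add new
        h = o * np.tanh(c)                       # expose part of the memory
        outputs.append(h)
    return np.array(outputs)


rng = np.random.default_rng(0)
n_steps, n_features, n_hidden = 60, 3, 8
series = rng.standard_normal((n_steps, n_features))  # synthetic stand-in data
W = 0.1 * rng.standard_normal((4 * n_hidden, n_features))
U = 0.1 * rng.standard_normal((4 * n_hidden, n_hidden))
b = np.zeros(4 * n_hidden)

states = lstm_forward(series, W, U, b)
print(states.shape)  # (60, 8)
```

In a real application the weights would be learned from data, and a final layer would map the hidden states to the predicted quantity; deep-learning libraries provide trainable LSTM layers for that purpose.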