Machine learning

Given the large volume of Earth Observation (EO) data with high spatial resolution and high revisit frequency, acquired by multiple sensors, AI techniques are needed to identify complex patterns automatically. These techniques can support the monitoring of many aspects of the polar regions, such as:

  • Climate Change Indicators
  • Snow
  • Permafrost
  • Sea Ice
  • Icebergs
  • Glaciers and Icesheets
  • Ocean
  • Atmosphere
  • Biosphere

Machine learning for polar monitoring is supported by the advanced computing infrastructure of Polar TEP, which hosts training data, algorithms, and results; provides a discussion forum for the polar machine learning community; and provides computing resources for machine learning.

Polar TEP has implemented the MLflow platform to support machine learning activities. MLflow is an open-source platform for managing all stages of the machine learning (ML) lifecycle, including experimentation, reproducibility, deployment, and a central model registry. MLflow offers four components:


MLflow Tracking
The MLflow Tracking component is an API and UI for logging parameters, code versions, metrics, and output files when running your machine learning code and for later visualizing the results. MLflow Tracking lets you log and query experiments using Python, REST, R, and Java APIs.
MLflow Tracking is organized around runs, which are executions of some piece of data science code, for example a single model-training job. MLflow Tracking supports automatic logging (autologging) for many popular libraries, such as TensorFlow, scikit-learn, Spark, and PyTorch. Runs can be recorded to local files, to a SQLAlchemy-compatible database, or remotely to a tracking server. The tracking UI lets users visualize logged metrics directly and search and compare runs to find the best-performing ones.

MLflow Projects
An MLflow Project is a format for packaging data science code in a reusable and reproducible way. In addition, the Projects component includes an API and command-line tools for running projects, making it possible to chain together projects into workflows.
Each project is a directory of files, or a Git repository, containing the code to run. An MLproject file in the project's root can specify the software environment and the entry points, together with their parameters.

MLflow Models
An MLflow Model is a standard format for packaging machine learning models that can be used in a variety of downstream tools—for example, real-time serving through a REST API or batch inference on Apache Spark. The format defines a convention that lets you save a model in different “flavors” that can be understood by different downstream tools.
Flavors let downstream tools work with MLflow Models through a common interface, without a separate integration for each ML library. The flavors a model supports are recorded in its MLmodel file. A model signature defines the inputs a model expects and the outputs it produces, which is needed, for example, when deploying the model as a REST API. The Model API supports saving, loading, and logging models, as well as adding flavors. MLflow also provides an evaluate API for evaluating previously built models on one or more datasets.


MLflow Model Registry
The MLflow Model Registry component is a centralized model store, set of APIs, and UI for collaboratively managing the full lifecycle of an MLflow Model. It can be used through both the UI and the API. It provides model lineage (which MLflow experiment and run produced the model), model versioning, stage transitions (for example, from staging to production), and annotations. Model versioning allows models to be archived and redeployed later.