Archive




Interruptions


09.03.2021 18:32

Dear colleagues,

We are experiencing problems with /lustre and therefore have to restart the main metadata server (MDS).

We apologize for the inconvenience.

GSI HPC group

10.03.2021 08:23

Dear colleagues,

/lustre is still unavailable after a file system check of the primary metadata server (MDS).

Our investigations are ongoing.

Best

GSI HPC group

10.03.2021 18:14

Dear colleagues,

Unfortunately, multiple restarts and consistency checks on the Lustre metadata servers (MDS) did not fix the availability issue.
We have therefore decided to restart the complete Lustre file system, including all clients, in particular Virgo.

The batch nodes will be drained and active jobs that are stuck will presumably be lost.

We are sorry this issue cannot be fixed with less interruption.

GSI HPC group

11.03.2021 12:03

Dear colleagues,

Virgo has been drained and is shut down at the moment.
All other Lustre clients have been disconnected or shut down by now.

Many clients had to be rebooted to properly unmount Lustre.
Some clients hung during reboot, which caused some interruptions.
We are very sorry for that.

We will soon start to reboot the Lustre servers.
If everything goes as planned, Lustre and Virgo will become available again tonight.

I'd like to stress that this is a very unusual situation which, to my knowledge, has happened only once before in about 15 years of Lustre operations at GSI.
It is very unfortunate that it is happening now, during beamtime.

Christopher Huhn on behalf of the GSI HPC group

12.03.2021 10:20

Dear colleagues,

The Lustre servers have been successfully restarted and the file system is available again.

At the moment, access to /lustre is possible via lustre.hpc.gsi.de. We will remount Lustre on all dedicated Lustre clients over the next few hours while we closely observe the situation on the servers.

Virgo is available again for job submission with limited capacity. We will ramp up the capacity gradually, but Virgo may not reach its full capacity before Monday.

Again, we apologize for the inconvenience caused by this incident.

Christopher Huhn on behalf of the GSI HPC group

