Friday, October 12, 2018

Account Creation and Password Resets Temporarily Down, October 12

Due to ongoing maintenance, account creation and password resets are down today.

At roughly 6:15PM, there will be brief NFS downtime as we attempt to fix the issue.

Thanks for flying OCF!

Scheduled maintenance on the night of 2018-10-12

The OCF is anticipating a short period of intermittent service unavailability in order to perform some additional maintenance on our hypervisors as a follow-up to last week's maintenance event. Specifically, we intend to migrate NFS to our new fileserver, reinstall our hypervisors onto new disks, and possibly migrate our mirrors server to new hardware. We are scheduling this event for low-utilization hours at night to minimize any disruption to our users.

Thanks for flying OCF and send us an email if you have any questions!

Monday, October 08, 2018

IPv6 Connectivity Issues on October 8

Starting last night, the OCF has been experiencing some connectivity issues to our public SSH server over IPv6. If you are having trouble logging in to ssh.ocf.berkeley.edu, please try using IPv4 to connect. To do this, you can add -4 to your SSH connection command, like so:

ssh -4 <OCF username>@ssh.ocf.berkeley.edu

If your SSH client does not support the -4 flag, you can also connect directly to our server's IPv4 address. To do this, just connect to `169.229.226.25` instead of `ssh.ocf.berkeley.edu`.
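
If you connect often and would rather not type the flag each time, one option is to force IPv4 in your client configuration. This is only a sketch, assuming an OpenSSH client that reads `~/.ssh/config`; the host alias `ocf` is an arbitrary name chosen for illustration, not something the OCF provides.

```
# ~/.ssh/config (OpenSSH client)
# "ocf" is a hypothetical alias; pick any name you like.
Host ocf
    HostName ssh.ocf.berkeley.edu
    User <OCF username>
    # Use IPv4 only; this has the same effect as passing -4 on the command line.
    AddressFamily inet
```

With an entry like this, `ssh ocf` should connect over IPv4. You may want to remove the `AddressFamily` line once IPv6 connectivity is restored.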

Some other services such as MySQL may also experience issues over IPv6. If neither IPv6 nor IPv4 is working for you, please let us know.

Thank you for being patient while we restore full connectivity.

UPDATE 2018-10-09 1:45AM: IPv6 connectivity should be restored to all our user-facing services.

Wednesday, October 03, 2018

Downtime on October 6

The OCF will be experiencing downtime due to scheduled maintenance on October 6th, from 9PM to 12AM. Hosted websites will experience downtime as we briefly reboot the servers to apply critical security updates. Once our servers are rebooted, users accessing our public login server and our apphosting server will not be able to write files while we move NFS to a new host. This read-only period will last no longer than 15 minutes, and all operations should behave as normal by midnight at the latest.

Thanks for flying with the OCF!

8:53 Update: We've powered off the servers to do networking hardware and kernel updates.
11:03 Update: We've decided not to do the NFS migration tonight, but networking updates have been performed and services should be back up in the next hour.
11:42 Update: Most of our public servers should be back now, apart from our public mirrors and our own website (vhosts should be fine).
12:17 Update: Our website is back, but our software mirrors are not back yet.
12:51 Update: Everything should be back and working now except our HPC control server, which we are still debugging.
1:35 Update: This is the all clear, everything should be working now! Feel free to let us know by emailing help@ocf.berkeley.edu if you notice anything wrong.

Friday, June 01, 2018

Small amount of downtime today (6/1)

A few of our services had a short period of downtime today (6/1/2018) between 10:10 PM and 11:19 PM due to some networking issues on one of our hypervisors:

  • OCF website: Down from 10:24 PM to 10:47 PM, intermittently up and down until 11:19 PM.
  • phpMyAdmin: Down from 10:10 PM to 10:48 PM, intermittently up and down until 11:19 PM.
  • OCF mirrors (rsync/ftp/http/https): Down from 10:28 PM to 11:09 PM.

We have fully recovered all affected services and put a couple of protections in place to avoid this kind of issue again.

Monday, March 19, 2018

Short downtime tonight (3/19) for security updates

There will be a brief period of downtime tonight (3/19/2018), starting around 8:15 PM. It will likely last about 30-45 minutes as we restart our servers to apply kernel updates.

Edit (8:20 PM): We have started. See you on the flip side!
Edit (8:40 PM): We have rebooted a majority of our public services, and they should be coming up soon. Public mirrors are still to come.
Edit (8:50 PM): Our MySQL database did not start correctly, so we have started that now. We are also in the process of rebooting our public software mirrors host.
Edit (9:00 PM): Restarting our servers is fully complete, thanks for your patience!

Friday, February 09, 2018

NFS downtime on 2018-02-09

NFS (containing home directories, cron jobs, and public_html web directories) was down for a short period of time on 2018-02-09 from 4:41 PM to around 5:06 PM. In this time, files stored in user home directories and all hosted websites (including https://www.ocf.berkeley.edu) were unavailable. Logins were also affected, both in the OCF lab and from outside the lab to our SSH server. Thank you for your patience while we brought everything back online.

Please contact us if you have any questions or concerns!

Wednesday, January 17, 2018

Downtime on 1/17/18

We're currently recovering from a multi-drive RAID failure on one of our servers. Logins, MySQL, printing, account creation, and our internal DNS are all down for now. We'll post status updates as we bring things back up.

Downtime was on 1/17 from 13:27 to around 18:00 in total.

Note: MySQL and account data are being rolled back to a backup from 2 AM this morning (1/17), so any accounts created or database changes made since then will be rolled back.

Update 1/17 15:25: MySQL, logins, and DNS are all in the process of being restored. Database restoration will likely take around another hour, perhaps more, but logins and DNS are both close to being fixed.

Update 1/17 16:22: Logins (LDAP/Kerberos) and DNS have been restored. MySQL is getting closer and will probably take about another half hour to restore. Printing is still a work in progress, but getting closer. Account creation is still down.

Update 1/17 18:09: Everything except account creation should be back up again. We are working on restoring all accounts created today and then we will re-enable account creation. MySQL databases have been restored from our most recent backup (2 AM this morning). We have noticed some lingering issues with DNS that we will keep checking and update here if it stays a problem. Let us know if you encounter any other problems with our services!

Update 1/17 19:58: Account creation has been re-enabled. Accounts created earlier today still need to be re-created, but accounts created from this point on should work fine. We also fixed a DNSSEC issue that was preventing valid DNS responses in some cases. Contact us if you notice any other problems!

Update 1/18 03:21: All the missing accounts have been re-created, but their owners will need to reset their passwords, since we purposefully do not log passwords for security reasons and they were lost in our Kerberos rollback. These accounts can reset their passwords through our website.