Saturday, October 03, 2020

Mirrors down while we rebuild the RAID array

We are replacing a drive in our mirrors server, so it is currently down. We expect the work to be done by 2PM PST on 10/3/20.

In addition, starting at 12:01AM on 10/5, we will temporarily shut down the mirrors service to allow the RAID array to rebuild properly. This may cause a prolonged outage.

EDIT: The rebuild is now complete. We're syncing the mirrors live so you might see some outdated packages.

Monday, September 28, 2020

Mirrors out of date due to server issues

We're seeing extended problems syncing a number of our mirrors, including Ubuntu and Kali, so some of them may be very out of date.

We're currently working on resolving these issues and will update when we have more information.

Update: The issue has been resolved.

Wednesday, September 16, 2020

Scheduled downtime for Sunday 9/20/20 from 12pm-5pm

The current situation with COVID has caused us to fall behind on a number of software upgrades that are critical to keeping our systems running efficiently and securely, especially on our physical servers. Fortunately, we've been able to secure a block of time from the university to perform these upgrades. However, upgrading our physical hypervisors may require shutting down some production virtual machines. We will try to keep downtime to a minimum.

The following services (and possibly others) may be intermittently down during this time period (9/20/20 12-5pm):

  • Web hosting
  • App hosting
  • Shell servers 
  • Possibly everything else if we decide to upgrade our production hypervisor

The following services will be down for an extended period of time:

  • HPC
  • staffvms on scurvy (basically all of them)

We suggest that you avoid scheduling large jobs on HPC, and that you have plans in place if your site is expecting a large number of visitors during this time period.

Saturday, July 25, 2020

MySQL upgrade Saturday 7/26

As part of our work to transition from stretch to buster for our MySQL server, we'll be migrating user data today around 9:34pm.

To do this, we'll set up replication from the primary to a dev instance, and then switch the 'mysql' hostname over to that instance. There will be a short cutover period when we switch the hostname. This post will be updated if any issues arise.

EDIT: The upgrade has been completed. We had under 10 minutes of read-only time and around 5 minutes of downtime.
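
For anyone curious how a replica is usually judged ready before a cutover like the one above, here is a minimal sketch. It is not the procedure we actually ran. It assumes you already have one row of `SHOW SLAVE STATUS` output as a mapping of column names to values; the function name `replica_ready_for_cutover` and the `max_lag_seconds` parameter are hypothetical and exist only for illustration.

```python
from typing import Mapping, Optional


def replica_ready_for_cutover(
    status: Mapping[str, Optional[object]],
    max_lag_seconds: int = 0,
) -> bool:
    """Return True if a replica looks caught up enough to take over.

    `status` is assumed to be one row of `SHOW SLAVE STATUS`, keyed by
    column name, with None standing in for SQL NULL.
    """
    # Both replication threads must be running on the replica.
    if status.get("Slave_IO_Running") != "Yes":
        return False
    if status.get("Slave_SQL_Running") != "Yes":
        return False

    # Seconds_Behind_Master is NULL when lag cannot be determined,
    # so treat a missing or None value as "not ready".
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        return False

    return int(lag) <= max_lag_seconds


if __name__ == "__main__":
    row = {
        "Slave_IO_Running": "Yes",
        "Slave_SQL_Running": "Yes",
        "Seconds_Behind_Master": "0",
    }
    print(replica_ready_for_cutover(row))
```

Making the primary read-only before checking is what lets the lag settle at zero, which is why a migration of this kind typically has a read-only window that is longer than the actual downtime.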

Saturday, July 04, 2020

RSO Group account creation/password reset broken due to expired signatory status

As of today, RSO signatory status for the previous school year has expired. This means that if you were previously a signatory, you can no longer use the web interface to create a new OCF group account for your RSO or to reset the password of an existing one.

We encourage you to re-register your RSO and add signatories as soon as possible. We are evaluating our options, which depend on how long it takes for signatory status to be reinstated in CalLink, especially given the turbulent current state of affairs. Most likely, we'll be forced to suspend account creation until we can verify your status as a signatory. If you need help resetting the password of an existing OCF group account, contact help@ocf.berkeley.edu.

Sunday, May 17, 2020

Major service outage

The OCF is currently undergoing an unplanned reboot of our main hypervisor, which may disrupt most, if not all, of our services. We expect to be back up shortly.

Edit (19:08 PDT): Most services are back, but we're still working to stabilize everything and to make sure services like HPC are fully working.

Edit (19:22 PDT): All services should be back. Let us know at help@ocf.berkeley.edu if you notice that something is down.

Friday, April 03, 2020

Moderate Service Outage

The OCF is currently investigating an outage that has disrupted several services, including our main website (ocfweb), ircbot, and Keycloak. All other services, such as web hosting and mirrors, should be functioning normally.

We will update this post once we have more information on the situation.

Update 01:03 PM: Updated the scope of the outage to include ocfweb.

Update 01:21 PM: The issue has been traced back to internal DNS issues and has now been resolved. We will continue to monitor our services for issues.