Wednesday, September 22, 2021

Emergency Mirrors Maintenance

2021-09-27 19:05 PST: Mirrors are back up, and this downtime should be resolved. If you notice any issues using our mirrors, please reach out to us at help@ocf.berkeley.edu.


We are currently (9/22) running emergency maintenance on mirrors.ocf.berkeley.edu. Due to an earlier drive failure we've had to replace the drives. We expect the mirrors server to be down until Monday, but we will keep this blog post updated.

Monday, July 05, 2021

Intermittent service outages during summer break

The OCF is aware of intermittent service outages during the summer break affecting our website, web hosting services, mailing services, mirrors, and internal infrastructure. Volunteer staffers have begun investigating the issue and plan to resolve it in the coming weeks. Please expect delays in this process due to the limited number of staff available in the summer and other factors beyond our control. We will update this post once we have more information on the situation.

If you are a user or represent a group that has been adversely impacted by a recent OCF service outage, please do not hesitate to contact help@ocf.berkeley.edu so we can investigate further.

Update: This has been resolved.

Monday, May 31, 2021

Major web service outage

The OCF is currently investigating an ongoing service outage that has disrupted web services including our main website (ocfweb) and user-hosted websites. All other services, such as our login servers (tsunami), HPC, and mirrors, should be functioning as normal.

Due to the timing of this outage, our response will remain limited through the early part of the night. We will update this post once we have more information on the situation.

Saturday, October 03, 2020

Mirrors down while we rebuild the RAID array

We are replacing a drive in our mirrors server, so it is currently down. We expect this to be done by 2PM PST on October 3, 2020.

In addition, starting at 12:01AM on October 5, we will temporarily shut down the mirrors service to allow the RAID array to rebuild properly. This may cause a prolonged outage.

EDIT: The rebuild is now complete. We're syncing the mirrors live so you might see some outdated packages.

Monday, September 28, 2020

Mirrors out of date due to server issues

We're observing extended issues in syncing a number of our mirrors including Ubuntu and Kali, and so some of them may be very out of date.

We're currently working on resolving these issues and will update when we have more information.

Update: The issue has been resolved.

Wednesday, September 16, 2020

Scheduled downtime for Sunday 9/20/20 from 12pm-5pm

The current situation with COVID has caused us to fall behind on a number of software upgrades that are critical to keeping our systems running efficiently and securely, especially on our physical servers. Fortunately, we've been able to secure a block of time from the university to perform these upgrades. However, in order to upgrade our physical hypervisors, it may be necessary to shut down some production virtual machines. We will try to keep downtime to a minimum.

The following services (and possibly others) may be intermittently down during this time period (9/20/20 12-5pm):

  • Web hosting
  • App hosting
  • Shell servers 
  • Possibly everything else if we decide to upgrade our production hypervisor

The following services will be down for an extended period of time:

  • HPC
  • staffvms on scurvy (basically all of them)

We suggest that you avoid scheduling large jobs on HPC, and that you have plans in place if your site is expecting a large number of visitors during this time period.

Saturday, July 25, 2020

MySQL upgrade Saturday 726

As part of our work to transition from stretch to buster for our MySQL server, we'll be migrating user data today around 9:34pm.

To do this, we'll set up replication from the primary to a dev instance, and then switch the 'mysql' hostname over to that instance. There will be a short cutover period once we switch over the host. This post will be updated if any issues arise.

EDIT: The upgrade has been completed. We had under 10 minutes of read-only time and around 5 minutes of downtime.
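
For readers curious about the cutover described in this post, here is a minimal sketch of the kind of check one might run before switching a hostname over to a replica. It is not the script we used. It assumes the output row of `SHOW SLAVE STATUS` has already been fetched into a Python dict by some client library, and the function name `replica_ready_for_cutover` and the `max_lag_seconds` parameter are hypothetical, chosen only for illustration.

```python
def replica_ready_for_cutover(status, max_lag_seconds=0):
    """Decide whether a replica looks caught up enough to take over.

    `status` is assumed to be one row of SHOW SLAVE STATUS as a dict.
    The keys used here are the standard column names from that statement.
    """
    # Both replication threads must be running on the replica.
    io_ok = status.get("Slave_IO_Running") == "Yes"
    sql_ok = status.get("Slave_SQL_Running") == "Yes"

    # Seconds_Behind_Master is NULL (None) when replication is not running.
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        return False

    return bool(io_ok and sql_ok and int(lag) <= max_lag_seconds)


# Example: a replica that is running and fully caught up.
healthy = {
    "Slave_IO_Running": "Yes",
    "Slave_SQL_Running": "Yes",
    "Seconds_Behind_Master": 0,
}
print(replica_ready_for_cutover(healthy))
```

Making the primary read-only before running a check like this is what keeps the replica from falling behind again between the check and the hostname switch, which is why a cutover of this kind involves a short read-only window.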