FS#80218 - [gitlab] Since updating to 16.5.1-2, a part of this keeps looping, giving a 502 in the web UI

Attached to Project: Arch Linux
Opened by simonzack (simonzack) - Friday, 10 November 2023, 14:55 GMT
Last edited by Toolybird (Toolybird) - Sunday, 12 November 2023, 20:54 GMT
Task Type Bug Report
Category Packages: Extra
Status Closed
Assigned To Anatol Pomozov (anatolik), Caleb Maclennan (alerque)
Architecture All
Severity Low
Priority Normal
Reported Version
Due in Version Undecided
Due Date Undecided
Percent Complete 100%
Votes 0
Private No

Details

Description:

Today I updated `gitlab 16.1.4-1 -> 16.5.1-2`.

Ever since the update, the web UI gives a 502.

The following constantly loops in `/var/log/gitlab/application_json.log`. It takes up an entire CPU thread.

```
{"severity":"DEBUG","time":"2023-11-10T14:09:08.632Z","message":"ActiveRecord connection established"}
...
{"severity":"INFO","time":"2023-11-10T14:09:15.753Z","message":"stopped","memwd_reason":"background task stopped","memwd_handler_class":"Gitlab::Memory::Watchdog::Handlers::PumaHandler","memwd_sleep_time_s":60,"pid":35950,"worker_id":"puma_master","memwd_rss_bytes":648003584}
```
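To check whether the same entries really repeat, the JSON lines can be tallied by severity and message. The following is a minimal sketch, not part of the original report: the helper name `count_messages` is hypothetical, and the sample lines are shortened copies of the two entries quoted above.

```python
import json
from collections import Counter


def count_messages(lines):
    """Count (severity, message) pairs across JSON log lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip elided or truncated lines such as "..."
        if not isinstance(entry, dict):
            continue
        counts[(entry.get("severity"), entry.get("message"))] += 1
    return counts


# Shortened copies of the entries quoted above (illustrative only).
sample = [
    '{"severity":"DEBUG","time":"2023-11-10T14:09:08.632Z",'
    '"message":"ActiveRecord connection established"}',
    "...",
    '{"severity":"INFO","time":"2023-11-10T14:09:15.753Z","message":"stopped",'
    '"memwd_reason":"background task stopped","worker_id":"puma_master"}',
]

for (severity, message), n in count_messages(sample).most_common():
    print(n, severity, message)
```

On a real system the lines would come from `/var/log/gitlab/application_json.log`; a small set of pairs with steadily growing counts would be consistent with the loop described here.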

`/var/log/gitlab/production_json.log` also gives the example.org domain, despite me setting it to a custom one.

```
{"method":"GET","path":"/","format":"html","controller":"RootController","action":"index","status":302,"location":"http://example.org/users/sign_in","time":"2023-11-10T14:08:58.261Z","params":[],"correlation_id":"b78617ed-7d1d-4ca8-b45a-c9e98727be6a","meta.caller_id":"RootController#index","meta.feature_category":"groups_and_projects","meta.client_id":"ip/","request_urgency":"low","target_duration_s":5,"redis_calls":26,"redis_duration_s":0.002655,"redis_read_bytes":26,"redis_write_bytes":4277,"redis_feature_flag_calls":26,"redis_feature_flag_duration_s":0.002655,"redis_feature_flag_read_bytes":26,"redis_feature_flag_write_bytes":4277,"db_count":13,"db_write_count":0,"db_cached_count":0,"db_replica_count":0,"db_primary_count":13,"db_main_count":13,"db_main_replica_count":0,"db_replica_cached_count":0,"db_primary_cached_count":0,"db_main_cached_count":0,"db_main_replica_cached_count":0,"db_replica_wal_count":0,"db_primary_wal_count":0,"db_main_wal_count":0,"db_main_replica_wal_count":0,"db_replica_wal_cached_count":0,"db_primary_wal_cached_count":0,"db_main_wal_cached_count":0,"db_main_replica_wal_cached_count":0,"db_replica_duration_s":0.0,"db_primary_duration_s":0.002,"db_main_duration_s":0.002,"db_main_replica_duration_s":0.0,"cpu_s":2.192361,"pid":35830,"worker_id":"puma_master","rate_limiting_gates":[],"db_duration_s":0.00099,"view_duration_s":0.0,"duration_s":0.01341}
```
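One way to see which host name GitLab actually uses for redirects is to collect the hosts from the `location` field of these entries. This is a rough sketch under assumptions: `redirect_hosts` is a hypothetical helper, and the sample line keeps only a few fields of the entry quoted above.

```python
import json
from urllib.parse import urlsplit


def redirect_hosts(lines):
    """Return the set of host names found in the "location" field."""
    hosts = set()
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        if not isinstance(entry, dict):
            continue
        location = entry.get("location")
        if location:
            hosts.add(urlsplit(location).hostname)
    return hosts


# Trimmed copy of the entry quoted above (illustrative only).
sample = [
    '{"method":"GET","path":"/","status":302,'
    '"location":"http://example.org/users/sign_in"}',
]

print(redirect_hosts(sample))
```

If only `example.org` shows up, the custom host setting is probably not being picked up by the running instance.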

Additional info:
* package version(s) 16.5.1-2
* config and/or log files etc.
* link to upstream bug report, if any

Steps to reproduce:
* Install *GitLab* 16.5.1-2.

Closed by Toolybird (Toolybird)
Sunday, 12 November 2023, 20:54 GMT
Reason for closing: Duplicate
Additional comments about closing: Merging into FS#80233
Comment by Toolybird (Toolybird) - Friday, 10 November 2023, 21:42 GMT
> Today I updated `gitlab 16.1.4-1 -> 16.5.1-2`

For such a complex beastie, that long between updates is asking for trouble. You might have better luck asking about upgrade issues in the Arch support channels. To date, nobody else has reported this specific problem, and there is no confirmation of an Arch packaging issue in the info provided. Recent related issues FS#80129, FS#80077, and FS#79922 are likely not relevant but might be worth checking out.
Comment by simonzack (simonzack) - Saturday, 11 November 2023, 07:01 GMT
I was away for around a month and didn't have access to my computer. That was usually fine for me in the past, but I think *GitLab* is a bit special.

Thanks for the links. Indeed I don't think those issues are too relevant.

Today I also discovered the following error:

$ sudo journalctl --unit=gitlab-workhorse.service --boot=0
...
Nov 11 17:45:22 proxima systemd[1]: gitlab-workhorse.service: Main process exited, code=exited, status=2/INVALIDARGUMET
...
Nov 11 17:45:22 proxima systemd[1]: gitlab-workhorse.service: Failed with result 'exit-code'.
...

I see the forum post <https://bbs.archlinux.org/viewtopic.php?id=290040> also mentions `gitlab-workhorse` crashing, although I don't have their `gitlab-sidekiq` error.
Comment by AK (Andreaskem) - Saturday, 11 November 2023, 07:52 GMT
Going from 16.1 directly to 16.5 does not seem to be a supported upgrade path.
https://docs.gitlab.com/ee/update/#upgrade-paths

***
GitLab 16: 16.0.x (only instances with lots of users or large pipeline variables history) > 16.1 (instances with NPM packages in their Package Registry) > 16.2.x (only instances with large pipeline variables history) > 16.3 > latest 16.Y.Z.
***

https://docs.gitlab.com/ee/update/#required-upgrade-stops
***
Required upgrade stops are versions of GitLab that you must upgrade to before upgrading to later versions. Required upgrade stops allow required background migrations to finish.

During GitLab 16.x, we are scheduling required upgrade stops beforehand so users can better plan out appropriate upgrade stops and downtime when necessary.

The first scheduled required upgrade stop has been announced for 16.3.x. When planning upgrades, please take this into account.
***
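The upgrade path quoted above can be sketched as a small check that lists the stops between an installed and a target package version. This is only an illustration under assumptions: `parse_version`, `upgrade_plan`, and `REQUIRED_STOPS` are hypothetical names, only the unconditional 16.3 stop from the quoted text is encoded, and the conditional stops (16.0.x, 16.1, 16.2.x) are left out.

```python
def parse_version(text):
    """Turn '16.5.1-2' into (16, 5): keep major.minor, drop patch and pkgrel."""
    major, minor = text.split("-")[0].split(".")[:2]
    return int(major), int(minor)


# Assumption: only the unconditional stop named in the quoted docs.
REQUIRED_STOPS = [(16, 3)]


def upgrade_plan(current, target, stops=REQUIRED_STOPS):
    """List required stops strictly between current and target, then target."""
    cur, tgt = parse_version(current), parse_version(target)
    plan = [f"{major}.{minor}" for major, minor in sorted(stops)
            if cur < (major, minor) < tgt]
    return plan + [target]


print(upgrade_plan("16.1.4-1", "16.5.1-2"))
```

For the versions in this report the plan includes a stop at 16.3 before 16.5.1-2, which matches the point that a direct jump from 16.1 to 16.5 skips a required stop.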
Comment by simonzack (simonzack) - Sunday, 12 November 2023, 09:48 GMT
Thanks for the comments.

I first restored my *PostgreSQL* database from backup. Then I tried to upgrade version-by-version.

The version v16.3 doesn't exist as a package in *Arch Linux*, so I went straight to 16.4.1-1.

This still worked for me. I tested editing issues and pushing.

However, the next upgrade, to 16.5.1-2, results in a hang.
Comment by simonzack (simonzack) - Sunday, 12 November 2023, 14:29 GMT
Turns out this issue is the same as FS#80233.

My very approximate guess is that when *GitLab* doesn't see *Puma* being active, it just restarts itself, so it hangs.

Thank you @lahwaacz for opening that issue!
