remove observedGeneration from postgresDatabase#1812
remove observedGeneration from postgresDatabase#1812DmytroPI-dev wants to merge 5 commits intofeature/database-controllersfrom
Conversation
| predicate.GenerationChangedPredicate{}, | ||
| predicate.Funcs{ | ||
| UpdateFunc: func(e event.UpdateEvent) bool { | ||
| oldObj, okOld := e.ObjectOld.(*cnpgv1.Database) |
There was a problem hiding this comment.
whats the difference between this predicate and GenerationChangedPredicate?
There was a problem hiding this comment.
If my understanding is correct, GenerationChangedPredicate is triggered only when CNPG Database spec changes. If we have status-only update, generation might not always change. Owner reference changes on the owned CNPG Database would also stop triggering reconciliation.
| } | ||
|
|
||
| func expectReadyStatus(current *enterprisev4.PostgresDatabase, generation int64, expectedDatabase enterprisev4.DatabaseInfo) { | ||
| func expectReadyStatus(current *enterprisev4.PostgresDatabase, expectedDatabase enterprisev4.DatabaseInfo) { |
There was a problem hiding this comment.
I'd still leave generation in status. Reason - we will know what is the CRD version status was emited for
| c := rc.Client | ||
| logger := log.FromContext(ctx).WithValues("postgresDatabase", postgresDB.Name) | ||
| ctx = log.IntoContext(ctx, logger) | ||
| defer func() { |
There was a problem hiding this comment.
Why do we want to do defer here? We always catch the errors explicitly close to the called function so what benefit this brings?
There was a problem hiding this comment.
Remove defer, add explicit requeue on conflict.
| err = nil | ||
| }() | ||
| logger.Info("Reconciling PostgresDatabase") | ||
| wasReady := postgresDB.Status.Phase != nil && *postgresDB.Status.Phase == string(readyDBPhase) |
There was a problem hiding this comment.
Could you explain what it does and how we use it?
There was a problem hiding this comment.
The idea was to reduce EventPostgresDatabaseReady on every successful reconcile.
| EventClusterNotReady = "ClusterNotReady" | ||
| EventRoleConflict = "RoleConflict" | ||
| EventUserSecretsFailed = "UserSecretsFailed" | ||
| EventUserSecretsDriftDetected = "UserSecretsDriftDetected" |
There was a problem hiding this comment.
TIL: we should start using roles instead of users
| } | ||
|
|
||
| func reconcileUserSecrets(ctx context.Context, c client.Client, scheme *runtime.Scheme, postgresDB *enterprisev4.PostgresDatabase) error { | ||
| func reconcileUserSecrets(ctx context.Context, c client.Client, scheme *runtime.Scheme, postgresDB *enterprisev4.PostgresDatabase, existingDatabases map[string]struct{}) error { |
There was a problem hiding this comment.
this function should not be bothered with logic checking what db is already provisioned, it should be clean
| } | ||
|
|
||
| func ensureSecret(ctx context.Context, c client.Client, scheme *runtime.Scheme, postgresDB *enterprisev4.PostgresDatabase, roleName, secretName string) error { | ||
| func ensureSecret(ctx context.Context, c client.Client, scheme *runtime.Scheme, postgresDB *enterprisev4.PostgresDatabase, roleName, secretName string, databaseAlreadyProvisioned bool) error { |
There was a problem hiding this comment.
this function should not be responsible for checking if db is already provisioned
| } | ||
| } | ||
|
|
||
| func validateManagedSecret(secret *corev1.Secret, roleName string) error { |
There was a problem hiding this comment.
I wonder why we need this in the first place? What would be the use-case for example username not matching roleName?
There was a problem hiding this comment.
To prevent adopting secret in case of manual changes, for example. Does this makes sense?
0b9acab to
c333f1e
Compare
| Expect(current.Status.Databases[0].AdminUserSecretRef).NotTo(BeNil()) | ||
| Expect(current.Status.Databases[0].RWUserSecretRef).NotTo(BeNil()) | ||
| Expect(current.Status.Databases[0].ConfigMapRef).NotTo(BeNil()) | ||
| Expect(current.Status.ObservedGeneration).NotTo(BeNil()) |
There was a problem hiding this comment.
Isn't that supposed to be removed from the general logic?
There was a problem hiding this comment.
Yes, we removed observedGeneration to act as a database readiness check, but left it to have an idea know what is the CRD version status was emitted for. See here
| case ClusterReady: | ||
| rc.emitOnConditionTransition(postgresDB, postgresDB.Status.Conditions, clusterReady, EventClusterValidated, "Referenced PostgresCluster is ready") | ||
| if err := updateStatus(clusterReady, metav1.ConditionTrue, reasonClusterAvailable, "Cluster is operational", provisioningDBPhase); err != nil { | ||
| if result, conflictErr, ok := requeueOnConflict(err, "persisting cluster ready status"); ok { |
There was a problem hiding this comment.
Do I understand correctly, adding this logic is a replacement for observed generation?
There was a problem hiding this comment.
If asking about this:
if result, conflictErr, ok := requeueOnConflict(err, "persisting cluster ready status"); ok {
return result, conflictErr
}
- this is to catch errors on version conflicts and requeue immediately to soothe reconciliation storm.
| if result, conflictErr, ok := requeueOnConflict(err, "reconciling user secrets"); ok { | ||
| return result, conflictErr | ||
| } | ||
| var driftErr *secretReconcileError |
There was a problem hiding this comment.
are we sure, that It will always be drift related?
What I'm after is having It coupled to the name secretReconcileError, which seems quite generic. Even If the underlaying struct could be verbose and drift only in values.
| updateStatus := func(conditionType conditionTypes, conditionStatus metav1.ConditionStatus, reason conditionReasons, message string, phase reconcileDBPhases) error { | ||
| return persistStatus(ctx, c, postgresDB, conditionType, conditionStatus, reason, message, phase) | ||
| } | ||
| requeueOnConflict := func(err error, action string) (ctrl.Result, error, bool) { |
There was a problem hiding this comment.
more of a nit: Will having a lamda created here not cause to miss scope for us. Ie. If we have k8s conflict error occuring in a few places. Will the action log be sufficient and fully satisfying our business needs? Also, there could be some metrics done in the future from reconcilation misses(conflicts) which could prove useful for system admins.
There was a problem hiding this comment.
Add typed conflict categories instead of lambda
Description
This PR aligns
PostgresDatabasewith the watch-driven reconciliation approach used forPostgresCluster. It removes the reconcile skip based onObservedGeneration, reacts to ownedSecret/ConfigMapdrift even when the spec is unchanged, repairs configmap drift declaratively, and degrades status instead of recreating previously provisioned managed secrets that were deleted.Key Changes
postgresdatabase_controller.goadd explicit drift-trigger predicates for ownedSecret,ConfigMap, and relevant CNPGDatabasechanges. Conflict retries are requeued without surfacing noisy reconcile errors.database.goremove theObservedGenerationshort-circuit and revalidated stages on every reconcile. Add drift handling for managed user secrets, configmap repair/re-adoption, retained-resource adoption, conflict-friendly database reconcile handling, and quieter ready/not-found paths.events.goAdd events for drift and retained-resource adoption.cluster.goreduced log noise on transient status-update conflicts.Testing and Verification
postgresdatabase_controller_test.gofor:database_unit_test.gofor secret/configmap drift and adoption paths.DeleteandRetainpolicies.Related Issues
CPI-1950
PR Checklist