Upgrade xapp-frame-go module in integration tests

diff --git a/docs/overview.rst b/docs/overview.rst

index 5ba3e31..73f8ed9 100644
--- a/docs/overview.rst
+++ b/docs/overview.rst
@@ -47,9 +47,15 @@ Xapps to A1
  ~~~~~~~~~~~
  There are three scenarios in which Xapps are to send a message to A1:
  
-1. When an xapp receives a CREATE or UPDATE message for a policy instance. Xapps must respond to these requests by sending a message of type 20011 to A1. The schema for that message is defined by ``downstream_notification_schema`` in ``docs/a1_xapp_contract_openapi.yaml``
-2. Since policy instances can "deprecate" other instances, there are times when xapps need to asyncronously tell A1 that a policy is no longer active. Same message type and schema. The only difference between case 1 and 2 is that case 1 is a "reply" and case 2 is "unsolicited".
-3. Xapps can request A1 to re-send all instances of a type using a query, message 20012. When A1 receives this (TBD HERE, STILL BE WORKED OUT)
+1. When an xapp receives a CREATE message for a policy instance. Xapps must respond to these requests by sending a message of type 20011 to A1.
+   The schema for that message is defined by ``downstream_notification_schema`` in ``docs/a1_xapp_contract_openapi.yaml``.
+   Note: if the xapp uses RTS (RMR's return-to-sender) for this, remember to change the message type before replying!
+2. Since policy instances can "deprecate" other instances, there are times when xapps need to asynchronously tell A1 that a policy is no longer active. This uses the same message type and schema as case 1.
+3. Xapps can request A1 to re-send all instances of a type T using a query, message 20012.
+   The schema for that message is defined by ``policy_query_schema`` in ``docs/a1_xapp_contract_openapi.yaml`` (just a body with ``{policy_type_id: ...}``).
+   When A1 receives this, A1 will send the xapp a CREATE message N times, where N is the number of policy instances for type T. The xapp should reply normally to each of those as per bullet 1.
+   That is, once the xapp has sent the query, the N CREATE messages and the N replies proceed "as normal".
+   The only difference is that the query, rather than an external caller to A1, kicks off the process.
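
The query-and-reply flow in scenario 3 can be sketched as follows. This is a minimal illustration, not A1 or xapp framework code: the message types 20011 and 20012 come from the text above, but the helper names, the ``handler_id`` and ``status`` fields, and the dict shape of the CREATE message are assumptions made for the example. The authoritative schemas are the ones in ``docs/a1_xapp_contract_openapi.yaml``.

```python
import json

A1_POLICY_RESPONSE = 20011  # xapp -> A1 reply/notification (from the text above)
A1_POLICY_QUERY = 20012     # xapp -> A1 "re-send all instances of type T" (from the text above)


def build_query(policy_type_id):
    """Return (message type, payload) asking A1 to re-send every instance of a type."""
    body = {"policy_type_id": policy_type_id}
    return A1_POLICY_QUERY, json.dumps(body).encode()


def build_reply(create_msg, handler_id, status="OK"):
    """Return (message type, payload) answering one CREATE message.

    The message type is set explicitly to 20011 rather than reusing the
    incoming one, which is the point of the RTS note in scenario 1.
    Field names here are assumed for illustration.
    """
    body = {
        "policy_type_id": create_msg["policy_type_id"],
        "policy_instance_id": create_msg["policy_instance_id"],
        "handler_id": handler_id,
        "status": status,
    }
    return A1_POLICY_RESPONSE, json.dumps(body).encode()


def fake_a1_resend(instances_by_type, query_payload):
    """Hypothetical stand-in for A1: one CREATE message per stored instance of type T."""
    type_id = json.loads(query_payload)["policy_type_id"]
    instances = instances_by_type.get(type_id, {})
    return [
        {
            "operation": "CREATE",
            "policy_type_id": type_id,
            "policy_instance_id": instance_id,
            "payload": body,
        }
        for instance_id, body in instances.items()
    ]


if __name__ == "__main__":
    stored = {20000: {"inst-a": {"threshold": 5}, "inst-b": {"threshold": 9}}}
    _, query = build_query(20000)
    for msg in fake_a1_resend(stored, query):
        print(build_reply(msg, handler_id="example-xapp"))
```

With N instances stored for the queried type, the stand-in produces N CREATE messages and the xapp builds N replies, each exactly as it would for a CREATE triggered by an external caller.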
  
  
  Known differences from A1 1.0.0 spec
@@ -73,3 +79,23 @@ In some cases, the spec is deficient and we are "ahead", in other cases this doe
  7. [Spec is ahead] The spec defines that a query of all policy instances should return the full bodies, however right now the RIC A1m returns a list of IDs (assuming subsequent queries can fetch the bodies).
  
  8. [?] The spec document details some very specific "types", but the RIC A1m allows these to be loaded in (see #1). For example, spec section 4.2.6.2. We believe this should be removed from the spec and rather defined as a type. Xapps can be created that define new types, so the spec will quickly become "stale" if "types" are defined in the spec.
+
+
+Resiliency
+----------
+
+A1 is resilient to most failures, but not yet to all of them (though a solution is known).
+
+A1 uses the RIC SDL library to persist all policy state information: this includes the policy types, policy instances, and policy statuses.
+If A1 fails after building up state (in which case Kubernetes restarts it), none of this state is lost.
+
+The one bit of volatile state that *is currently* held in A1 is its "next second" job queue.
+Specifically, when policy instances are created or deleted, A1 creates jobs in a job queue (in memory).
+An rmr thread polls that queue every second, dequeues the jobs, and performs them.
+
+If A1 were killed at *exactly* the right time, jobs could be lost, meaning the PUT or DELETE of an instance wouldn't actually take effect.
+This isn't drastic, as the operations are idempotent and could always be re-performed.
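
A simplified model of the in-memory queue shows why losing and then repeating a job is harmless. This is a sketch under stated assumptions, not A1's implementation: the job tuple layout, the ``enqueue`` and ``drain`` names, and the use of a plain dict as the "applied" state are all hypothetical. In a real deployment ``drain`` would be called by the polling thread once per second.

```python
import queue


def enqueue(jobs, operation, policy_type_id, instance_id, body=None):
    """Record a PUT or DELETE job for later processing (hypothetical job shape)."""
    jobs.put((operation, policy_type_id, instance_id, body))


def drain(jobs, state):
    """Dequeue every pending job and apply it to state; return how many ran."""
    performed = 0
    while True:
        try:
            operation, type_id, instance_id, body = jobs.get_nowait()
        except queue.Empty:
            return performed
        key = (type_id, instance_id)
        if operation == "PUT":
            state[key] = body        # repeating a PUT leaves the same value
        elif operation == "DELETE":
            state.pop(key, None)     # repeating a DELETE is a no-op
        performed += 1


if __name__ == "__main__":
    jobs, state = queue.Queue(), {}
    enqueue(jobs, "PUT", 20000, "inst-a", {"threshold": 5})
    enqueue(jobs, "PUT", 20000, "inst-a", {"threshold": 5})  # re-performed job
    drain(jobs, state)
    print(state)
```

Anything still sitting in ``jobs`` when the process dies is gone, which is the loss window described above. Because each operation is idempotent, the caller can simply issue the same PUT or DELETE again.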
+
+In order for A1 to be considered completely resilient, this job queue would need to be moved to SDL.
+SDL uses Redis as a backend, and Redis natively supports queues via its list type and the LPUSH and RPOP commands.
+I've asked the SDL team to consider an extension to SDL to support these Redis operations.
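
As an illustration of what such a queue could look like, here is a minimal sketch of a FIFO job queue built only on LPUSH and RPOP. It is not an SDL API: SDL does not expose these operations at the time of writing, so ``ListJobQueue``, the ``a1.jobs`` key, and ``FakeListClient`` are hypothetical names. The queue accepts any client object with ``lpush(key, *values)`` and ``rpop(key)`` methods, which matches the method names of the redis-py client. The in-memory fake is there only so the example runs without a Redis server.

```python
import json


class ListJobQueue:
    """FIFO queue over a list-like store: push on the left, pop from the right."""

    def __init__(self, client, key="a1.jobs"):  # key name is an assumption
        self._client = client
        self._key = key

    def push(self, job):
        self._client.lpush(self._key, json.dumps(job))

    def pop(self):
        """Return the oldest job, or None if the queue is empty."""
        raw = self._client.rpop(self._key)
        return None if raw is None else json.loads(raw)


class FakeListClient:
    """In-memory stand-in that mimics LPUSH/RPOP semantics for the example."""

    def __init__(self):
        self._lists = {}

    def lpush(self, key, *values):
        items = self._lists.setdefault(key, [])
        for value in values:
            items.insert(0, value)   # LPUSH adds at the head of the list
        return len(items)

    def rpop(self, key):
        items = self._lists.get(key)
        if not items:
            return None
        return items.pop()           # RPOP removes from the tail


if __name__ == "__main__":
    jobs = ListJobQueue(FakeListClient())
    jobs.push({"op": "PUT", "policy_type_id": 20000, "instance": "inst-a"})
    jobs.push({"op": "DELETE", "policy_type_id": 20000, "instance": "inst-b"})
    print(jobs.pop())
    print(jobs.pop())
    print(jobs.pop())
```

Because the list would live in Redis rather than in the A1 process, queued jobs would survive a restart. Note that a plain RPOP removes a job before it is performed, so a crash between the pop and the work could still drop that one job; the idempotency of the operations remains the fallback there.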