docs/overview.rst

   1 .. This work is licensed under a Creative Commons Attribution 4.0 International License.
   2 .. SPDX-License-Identifier: CC-BY-4.0
   3 .. Copyright (C) 2019-2024 Wind River Systems, Inc.
   4
   5 Infrastructure Overview (INF)
   6 =============================
   7
   8 This project is a reference implementation of the O-Cloud infrastructure. It implements a real-time platform (RTP) on which the O-CU and O-DU are deployed.
   9
  10 In the O-RAN architecture, the O-DU and O-CU can be deployed in different scenarios.
  11 They can be container based or VM based, and both are supported in this release.
  12 In general, the performance-sensitive parts of the 5G stack require a real-time platform.
  13 This applies especially to the O-DU, whose L1 and L2 layers need real-time behavior,
  14 so the platform should support preemptive scheduling.
  15
  16 The following requirements address the container-based solution:
  17
  18 1. Support the real time kernel
  19 2. Support Node Feature Discovery
  20 3. Support CPU Affinity and Isolation
  21 4. Support Dynamic HugePages Allocation
  22
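As a rough illustration of requirements 1, 3 and 4 above, the sketch below reads the standard Linux files that expose the real-time kernel flag, the isolated CPU list and the huge page counters. It is not part of the INF code base; the function names are hypothetical, and Node Feature Discovery is not covered.

```python
"""Sketch: inspect a Linux host for real-time, CPU isolation and huge page state."""
from pathlib import Path


def parse_cpu_list(text: str) -> list[int]:
    """Expand a kernel CPU list such as '2-5,8' into [2, 3, 4, 5, 8]."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty list is reported as an empty line
        if "-" in part:
            start, end = part.split("-")
            cpus.extend(range(int(start), int(end) + 1))
        else:
            cpus.append(int(part))
    return cpus


def parse_hugepages(meminfo: str) -> dict[str, int]:
    """Extract the HugePages_* counters and Hugepagesize (in kB) from /proc/meminfo text."""
    fields: dict[str, int] = {}
    for line in meminfo.splitlines():
        key, _, value = line.partition(":")
        if key.startswith("HugePages_") or key == "Hugepagesize":
            fields[key] = int(value.split()[0])
    return fields


def read_text(path: str, default: str = "") -> str:
    """Read a file, returning `default` when it is absent or unreadable."""
    try:
        return Path(path).read_text()
    except OSError:
        return default


def host_report() -> dict:
    """Collect the real-time, CPU isolation and huge page state of this host."""
    return {
        # PREEMPT_RT kernels expose this file with the value 1.
        "realtime_kernel": read_text("/sys/kernel/realtime").strip() == "1",
        "isolated_cpus": parse_cpu_list(read_text("/sys/devices/system/cpu/isolated")),
        "hugepages": parse_hugepages(read_text("/proc/meminfo")),
    }
```

Calling ``host_report()`` on a worker node returns a dictionary that can be compared against the values expected for an O-DU workload. On a host without a real-time kernel or isolated CPUs, the first two entries are simply ``False`` and an empty list.
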
  23 For networking, the following should be supported:
  24
  25 1. Multiple network interfaces
  26 2. High-performance data plane, including a DPDK-based vswitch and PCI pass-through/SR-IOV
  27
  28 O-Cloud Components
  29 ------------------
  30
  31 In this project, the following O-Cloud components and services are enabled:
  32
  33 1. Fault Management
  34
  35    - Framework for infrastructure services to raise and persist alarm and event data.
  36
  37      - Set, clear and query customer alarms
  38      - Generate customer logs for significant events
  39
  40    - Maintains an Active Alarm List
  41    - Provides a REST API to query alarms and events, which are also available through SNMP traps
  42    - Support for alarm suppression
  43    - Operator alarms
  44
  45      - On platform nodes and resources
  46      - On hosted virtual resources
  47
  48    - Operator logs - Event List
  49
  50      - Logging of sets/clears of alarms
  51      - Related to platform nodes and resources
  52      - Related to hosted virtual resources
  53
  54 2. Configuration Management
  55
  56    - Manages Installation and Commissioning
  57
  58      - Auto-discovery of new nodes
  59      - Full infrastructure management
  60      - Manage installation parameters (e.g. console, root disks)
  61
  62    - Nodal Configuration
  63
  64      - Node role, role profiles
  65      - Core, memory (including huge page) assignments
  66      - Network Interfaces and storage assignments
  67
  68    - Hardware Discovery
  69
  70      - CPU/cores, SMT, processors, memory, huge pages
  71      - Storage, ports
  72      - GPUs, storage, Crypto/compression H/W
  73
  74 3. Software Management
  75
  76    - Manages Installation and Commissioning
  77
  78      - Auto-discovery of new nodes
  79      - Full infrastructure management
  80      - Manage installation parameters (e.g. console, root disks)
  81
  82    - Nodal Configuration
  83
  84      - Node role, role profiles
  85      - Core, memory (including huge page) assignments
  86      - Network Interfaces and storage assignments
  87
  88    - Hardware Discovery
  89
  90      - CPU/cores, SMT, processors, memory, huge pages
  91      - Storage, ports
  92      - GPUs, storage, Crypto/compression H/W
  93
  94 4. Host Management
  95
  96    - Full life-cycle and availability management of the physical hosts
  97    - Detects and automatically handles host failures and initiates recovery
  98    - Monitoring and fault reporting for:
  99
 100      - Cluster connectivity
 101      - Critical process failures
 102      - Resource utilization thresholds, interface states
 103      - H/W fault / sensors, host watchdog
 104      - Activity progress reporting
 105
 106    - Interfaces with board management (BMC)
 107
 108      - For out of band reset
 109      - Power-on/off
 110      - H/W sensor monitoring
 111
 112 5. Service Management
 113
 114    - Manages high availability of critical infrastructure and cluster services
 115
 116      - Supports multiple redundancy models: N or N+M
 117      - Active or passive monitoring of services
 118      - Allows for specifying the impact of a service failure and escalation policy
 119      - Automatically recovers failed services
 120
 121    - Uses multiple messaging paths to avoid split-brain communication failures
 122
 123      - Up to 3 independent communication paths
 124      - LAG can also be configured for multi-link protection of each path
 125      - Messages are authenticated using HMAC SHA-512,
 126        if configured and enabled on an interface-by-interface basis
 127
 128 6. Ansible bootstrap to implement zero-touch provisioning
 129
 130 Ansible configuration functions are enabled for the infrastructure itself, including image installation and service configuration.
 131
 132 NOTE: These features leverage StarlingX (www.starlingx.io). In the current release, they are only available on the IA platform.
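
The Fault Management behavior listed above (set, clear and query alarms, an Active Alarm List, an Event List that logs sets and clears, and alarm suppression) can be pictured with a small in-memory model. This is only a sketch to make the concepts concrete. It is not the StarlingX Fault Management API, and all class, field and alarm names are hypothetical.

```python
"""Sketch: in-memory model of an active alarm list with an event log."""
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Alarm:
    alarm_id: str   # hypothetical identifier, e.g. "demo.cpu.high"
    entity: str     # platform node or hosted resource the alarm is raised against
    severity: str   # e.g. "critical", "major", "minor"
    reason: str


@dataclass
class FaultManager:
    active: dict = field(default_factory=dict)    # Active Alarm List
    events: list = field(default_factory=list)    # Event List: sets and clears
    suppressed: set = field(default_factory=set)  # alarm IDs hidden from queries

    def _log(self, state: str, alarm: Alarm) -> None:
        """Record a set or clear in the event list with a UTC timestamp."""
        stamp = datetime.now(timezone.utc).isoformat()
        self.events.append((stamp, state, alarm.alarm_id, alarm.entity))

    def set_alarm(self, alarm: Alarm) -> None:
        """Raise (or refresh) an alarm for one entity and log the set."""
        self.active[(alarm.alarm_id, alarm.entity)] = alarm
        self._log("set", alarm)

    def clear_alarm(self, alarm_id: str, entity: str) -> bool:
        """Clear an active alarm; return False if it was not active."""
        alarm = self.active.pop((alarm_id, entity), None)
        if alarm is None:
            return False
        self._log("clear", alarm)
        return True

    def suppress(self, alarm_id: str) -> None:
        """Hide every alarm with this ID from default queries."""
        self.suppressed.add(alarm_id)

    def query(self, include_suppressed: bool = False) -> list:
        """Return active alarms, skipping suppressed IDs unless asked otherwise."""
        return [
            alarm for alarm in self.active.values()
            if include_suppressed or alarm.alarm_id not in self.suppressed
        ]
```

In this model a suppressed alarm stays in the active list and is still logged. It is only filtered out of ``query()`` results, which is one plausible reading of "alarm suppression" rather than a statement of how the platform implements it.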
 133
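Service Management is described above as authenticating its messages with HMAC and SHA-512. The following minimal sketch shows what that kind of authentication looks like with Python's standard ``hmac`` module. The message layout (payload followed by a 64-byte tag), the shared key and the function names are assumptions made for illustration, not the platform's wire format.

```python
"""Sketch: authenticate a cluster message with HMAC-SHA-512."""
import hashlib
import hmac
from typing import Optional

DIGEST_SIZE = hashlib.sha512().digest_size  # 64 bytes


def sign(key: bytes, payload: bytes) -> bytes:
    """Return the payload followed by its HMAC-SHA-512 tag."""
    tag = hmac.new(key, payload, hashlib.sha512).digest()
    return payload + tag


def verify(key: bytes, message: bytes) -> Optional[bytes]:
    """Return the payload if the trailing tag is valid, otherwise None."""
    if len(message) < DIGEST_SIZE:
        return None
    payload, tag = message[:-DIGEST_SIZE], message[-DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha512).digest()
    # compare_digest runs in constant time to avoid leaking tag bytes.
    return payload if hmac.compare_digest(tag, expected) else None
```

A receiver that holds the same key accepts ``verify(key, sign(key, b"heartbeat"))`` and rejects any message whose payload or tag was altered in transit, or that was signed with a different key.
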
 134 Multi OS and Deployment Configurations
 135 --------------------------------------
 136
 137 * The INF project supports multiple operating systems. The following are currently supported:
 138
 139   * StarlingX
 140
 141     * Debian 11 (bullseye)
 142     * CentOS 7
 143     * Yocto 2.7 (warrior)
 144
 145   * OKD
 146
 147     * Fedora CoreOS 38
 148
 149 A variety of deployment configuration options are supported:
 150
 151 1. **All-in-one Simplex**
 152
 153   A single physical server providing all three cloud functions (controller, worker and storage).
 154
 155 2. **All-in-one Duplex**
 156
 157   Two HA-protected physical servers, both running all three cloud functions (controller, worker and storage), optionally with up to 50 worker nodes added to the cluster.
 158
 159 3. **All-in-one Duplex + up to 50 worker nodes**
 160
 161   Two HA-protected physical servers, both running all three cloud functions (controller, worker and storage), plus up to 50 worker nodes added to the cluster.
 162
 163 4. **Standard with Storage Cluster on Controller Nodes**
 164
 165   A two-node HA controller + storage node cluster, managing up to 200 worker nodes.
 166
 167 5. **Standard with Storage Cluster on dedicated Storage Nodes**
 168
 169   A two-node HA controller cluster with a 2-9 node Ceph storage cluster, managing up to 200 worker nodes.
 170
 171 6. **Distributed Cloud**
 172
 173   Distributed Cloud configuration supports an edge computing solution by providing central management and orchestration for a geographically distributed network of StarlingX systems.
 174
 175 **NOTE:**
 176
 177  - For Debian- and CentOS-based images, all of the above deployment configurations are supported.
 178  - For Yocto-based images, only deployment configurations 1-3 are supported, and only the container-based solution is supported; VM-based deployment is not supported yet.
 179
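The node limits stated in the deployment list and the note above can be captured in a small lookup table, which makes it easy to check a planned layout before installation. The sketch below is illustrative only: the configuration keys, the ``os_image`` values and the ``validate`` helper are hypothetical, the two All-in-one Duplex entries are folded into one, and Distributed Cloud is left out because the text gives no numeric limits for it.

```python
"""Sketch: check a planned cluster layout against the documented limits."""
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentLimits:
    controllers: int       # number of controller servers
    max_workers: int       # maximum additional worker nodes
    storage_nodes: tuple   # (min, max) dedicated storage nodes


# Values taken from the deployment list above; the keys are hypothetical labels.
DEPLOYMENTS = {
    "aio-simplex": DeploymentLimits(1, 0, (0, 0)),
    "aio-duplex": DeploymentLimits(2, 50, (0, 0)),
    "standard-controller-storage": DeploymentLimits(2, 200, (0, 0)),
    "standard-dedicated-storage": DeploymentLimits(2, 200, (2, 9)),
}

# Per the note above, Yocto-based images support only configurations 1-3.
YOCTO_SUPPORTED = {"aio-simplex", "aio-duplex"}


def validate(name: str, workers: int = 0, storage: int = 0,
             os_image: str = "debian") -> list[str]:
    """Return a list of problems; an empty list means the layout fits."""
    limits = DEPLOYMENTS.get(name)
    if limits is None:
        return [f"unknown deployment configuration: {name}"]
    problems = []
    if os_image == "yocto" and name not in YOCTO_SUPPORTED:
        problems.append(f"{name} is not supported on Yocto-based images")
    if not 0 <= workers <= limits.max_workers:
        problems.append(f"{name} allows at most {limits.max_workers} worker nodes")
    low, high = limits.storage_nodes
    if not low <= storage <= high:
        problems.append(f"{name} requires {low}-{high} dedicated storage nodes")
    return problems
```

For example, ``validate("aio-duplex", workers=50)`` returns an empty list, while ``validate("aio-duplex", workers=51)`` reports that the worker limit is exceeded.
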
 180 About Yocto and OpenEmbedded
 181 ----------------------------
 182 The Yocto Project is an open source collaboration project that provides templates,
 183 tools and methods to help you create custom Linux-based systems for embedded and
 184 IoT products, regardless of the hardware architecture.
 185
 186 OpenEmbedded is a build automation framework and cross-compile environment used
 187 to create Linux distributions for embedded devices. The OpenEmbedded framework
 188 is developed by the OpenEmbedded community, which was formally established in 2003.
 189 OpenEmbedded is the recommended build system of the Yocto Project, which is a Linux
 190 Foundation workgroup that assists commercial companies in the development of Linux-based
 191 systems for embedded products.
 192
 193
 194 About StarlingX
 195 ---------------
 196 StarlingX is a complete cloud infrastructure software stack for the edge, used by the most demanding applications in industrial IoT, telecom, video delivery and other ultra-low latency use cases. With the deterministic low latency required by edge applications and tools that make distributed edge manageable, StarlingX provides a production-ready, container-based infrastructure for scalable edge implementations.
 197
 198 About OKD
 199 ---------
 200 OKD is a complete open source container application platform and the community Kubernetes distribution that powers Red Hat OpenShift.
 201
 202 Contact info
 203 ------------
 204 If you need support or want to add new features/components, please feel free to contact the following:
 205
 206  - Jackie Huang <jackie.huang@windriver.com>