From df497a64d6a61eab11a9ad0de51136afec9dcc4f Mon Sep 17 00:00:00 2001
From: ashishj1729 <jashish1729@gmail.com>
Date: Wed, 12 Feb 2025 14:58:42 +0530
Subject: [PATCH] Refining Documentation

Change-Id: I30f76c352787e9a47a15dacbeacf9967072439d6
Signed-off-by: ashishj1729 <jashish1729@gmail.com>
---
 docs/installation-guide.rst | 299 ++++++++++++++++++++++++++++----------------
 docs/ranpm-installation.rst |   3 +-
 2 files changed, 195 insertions(+), 107 deletions(-)

diff --git a/docs/installation-guide.rst b/docs/installation-guide.rst
index d49d12e..f14bb80 100755
--- a/docs/installation-guide.rst
+++ b/docs/installation-guide.rst
@@ -84,75 +84,114 @@
 Ensure image version is correct.
 
-Check running state of all pods and services using below command
+Check the running state of all pods and services using the command below:
 
 .. code:: bash
 
-    kubectl get pods --all-namespaces
-    kubectl get svc --all-namespaces
-
+    ~$ kubectl get pods --all-namespaces
+
+    NAMESPACE      NAME                                               READY   STATUS    RESTARTS      AGE
+    kubeflow       cache-deployer-deployment-cf9646b9c-jxlqc          1/1     Running   0             53m
+    kubeflow       cache-server-56d4959c9-sz948                       1/1     Running   0             53m
+    kubeflow       leofs-bfc4794f5-7xfdn                              1/1     Running   0             56m
+    kubeflow       metadata-envoy-deployment-9c7db86d8-7rlkf          1/1     Running   0             53m
+    kubeflow       metadata-grpc-deployment-d94cc8676-mhw4l           1/1     Running   5 (47m ago)   53m
+    kubeflow       metadata-writer-cd5dd8f7-6qsx6                     1/1     Running   1 (46m ago)   53m
+    kubeflow       minio-5dc6ff5b96-4f9xd                             1/1     Running   0             53m
+    kubeflow       ml-pipeline-85b6bf5f67-5x9lq                       1/1     Running   2             53m
+    kubeflow       ml-pipeline-persistenceagent-fc7c944d4-bjz5n       1/1     Running   1 (46m ago)   53m
+    kubeflow       ml-pipeline-scheduledworkflow-676478b778-h42kx     1/1     Running   0             53m
+    kubeflow       ml-pipeline-ui-76bc4d6c99-8rw9x                    1/1     Running   0             53m
+    kubeflow       ml-pipeline-viewer-crd-8574556b89-g5xw7            1/1     Running   0             53m
+    kubeflow       ml-pipeline-visualizationserver-5d7c54f495-mhdtj   1/1     Running   0             53m
+    kubeflow       mysql-5b446b5744-mcqlw                             1/1     Running   0             53m
+    kubeflow       workflow-controller-679dcfdd4f-c64bj               1/1     Running   0             53m
+    traininghost   aiml-dashboard-667c546669-rslbz                    1/1     Running   0             38m
+    traininghost   aiml-notebook-5689459959-hd8r4                     1/1     Running   0             38m
+    traininghost   cassandra-0                                        1/1     Running   0             41m
+    traininghost   data-extraction-bd7dc6747-98ddq                    1/1     Running   0             39m
+    traininghost   kfadapter-75c88574d5-ww7qb                         1/1     Running   0             38m
+    traininghost   modelmgmtservice-56874bfc67-ct6lk                  1/1     Running   0             38m
+    traininghost   tm-757bf57cb-rlx7v                                 1/1     Running   0             39m
+    traininghost   tm-db-postgresql-0                                 1/1     Running   0             53m
+
+
+**Note: In the K Release, the dashboard is not supported. We recommend using cURL to interact with the AIMLFW components.
+Details are provided in the following sections for each operation required for model training.**
+
+
+Software Uninstallation & Upgrade
+---------------------------------
 
-Check the AIMLFW dashboard by using the following url
-Note: In K Release, dashboard is not supported. We recomment to use cURL to interact with AIMLFW components.
-Details are provided in further section for each operation required for model training.
+Run the following script to uninstall the `traininghost`:
 
 .. code:: bash
 
-    http://localhost:32005/
+    bin/uninstall_traininghost.sh
 
-In case of any change required in the RECIPE_EXAMPLE/example_recipe_latest_stable.yaml file after installation,
-the following steps can be followed to reinstall with new changes.
+To update the AIMLFW components, follow the steps below to ensure that the new changes are properly installed and integrated.
 
 .. code:: bash
 
+    # Step 1: Uninstall the existing AIMLFW components
     bin/uninstall.sh
-    bin/install.sh -f RECIPE_EXAMPLE/example_recipe_latest_stable.yaml
-
-Software Uninstallation
------------------------
+    # Step 2: Update the RECIPE_EXAMPLE/example_recipe_latest_stable.yaml file
+    # Make the necessary changes to the recipe file here
 
-.. code:: bash
+    # Step 3: Reinstall the AIMLFW components with the updated recipe
+    bin/install.sh -f RECIPE_EXAMPLE/example_recipe_latest_stable.yaml
 
-    bin/uninstall_traininghost.sh
 
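+As a sanity check after reinstalling, one can wait until every AIMLFW pod is Ready again before proceeding. A minimal sketch, assuming the `traininghost` namespace shown in the pod listing above:
+
+.. code:: bash
+
+    # Wait up to 5 minutes for all pods in the traininghost namespace to become Ready
+    kubectl wait --for=condition=Ready pods --all -n traininghost --timeout=300s
+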
 .. _install-influx-db-as-datalake:
 .. _reference2:
 
-Install Influx DB as datalake (Optional)
------------------------------------------
+DataLake Installation
+----------------------
+
+In the context of AIMLFW, a datalake can be used to store and manage large amounts of data generated by various sources.
+
+This section provides a detailed guide on how to install and configure a datalake for AIMLFW. Currently, the following methods of ingesting data for model training are supported: a standalone InfluxDB installation, or the Non-RT RIC DME prepared as a data source for AIMLFW.
+
 
-Standalone Influx DB installation can be used if DME is not used as a data source.
+1. Install Influx DB as datalake (Optional)
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+A standalone Influx DB can be installed using the following commands:
 
 .. code:: bash
 
     helm repo add bitnami https://charts.bitnami.com/bitnami
     helm install my-release bitnami/influxdb --version 5.13.5
-    kubectl exec -it bash
+
 
-From below command we can get username, org name, org id and access token
+    ~$ kubectl get pods
 
-.. code:: bash
+    NAME                                   READY   STATUS    RESTARTS   AGE
+    my-release-influxdb-85888dfd97-77dwg   1/1     Running   0          15m
 
-    cat bitnami/influxdb/influxd.bolt | tr -cd "[:print:]"
+Use the following command to get the `INFLUX_DB_TOKEN`, which is required while creating a feature group.
 
-eg: {"id":"0a576f4ba82db000","token":"xJVlOom1GRUxDNkldo1v","status":"active","description":"admin's Token","orgID":"783d5882c44b34f0","userID":"0a576f4b91edb000","permissions" ...
+.. code:: bash
 
-Use the tokens further in the below configurations and in the recipe file.
+    kubectl get secret my-release-influxdb -o jsonpath="{.data.admin-user-token}" | base64 --decode
 
-Following are the steps to add qoe data to Influx DB.
+**The rest of this section provides a detailed guide to onboarding the test data used to execute model training.**
 
 Execute the below command to create a bucket inside the Influx DB container:
 
 .. code:: bash
 
-    influx bucket create -n UEData -o primary -t
+    # <INFLUX_DB_TOKEN> is the InfluxDB token collected in the previous step
+    kubectl exec -it <pod name> -- influx bucket create -n UEData -o primary -t <INFLUX_DB_TOKEN>
 
+Note: The bucket name `UEData` will be referenced while creating the feature group in later steps.
 
-Install the following dependencies
+
+Install the following dependencies, which are required for parsing and onboarding data from the `.csv` file:
 
 .. code:: bash
 
@@ -168,7 +207,7 @@ Use the :file:`insert.py` in ``ric-app/qp repository`` to upload the qoe data in
 
     git clone -b f-release https://gerrit.o-ran-sc.org/r/ric-app/qp
     cd qp/qp
 
-Update :file:`insert.py` file with the following content:
+Overwrite the :file:`insert.py` file with the following content:
 
 .. code-block:: python
 
@@ -181,7 +220,7 @@ Update :file:`insert.py` file with the following content:
 
 class INSERTDATA:
     def __init__(self):
-        self.client = InfluxDBClient(url = "http://localhost:8086", token="")
+        self.client = InfluxDBClient(url = "http://localhost:8086", token="<token>")
 
 def explode(df):
@@ -225,15 +264,15 @@ Update :file:`insert.py` file with the following content:
 
     populatedb()
 
-Update ```` in :file:`insert.py` file
+Update ``<token>`` in :file:`insert.py` with the InfluxDB token collected in the previous step.
 
-Follow below command to port forward to access Influx DB
+Use the below command to port-forward, so that the script can access Influx DB (as no NodePort is exposed for InfluxDB):
 
 .. code:: bash
 
     kubectl port-forward svc/my-release-influxdb 8086:8086
 
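+With the port-forward running, one can optionally confirm that InfluxDB is reachable before executing the script. A minimal check, assuming the standard InfluxDB 2.x health endpoint:
+
+.. code:: bash
+
+    # Returns HTTP 200 with a short JSON status when InfluxDB is up
+    curl -i http://localhost:8086/health
+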
-To insert data:
+Execute the following script to onboard the test data to the local InfluxDB:
 
 .. code:: bash
 
@@ -243,74 +282,39 @@ To check inserted data in Influx DB , execute below command inside the Influx DB
 
 .. code:: bash
 
-    influx query 'from(bucket: "UEData") |> range(start: -1000d)' -o primary -t
-
-
+    # <token> is the access token collected in the previous step
+    kubectl exec -it <pod name> -- influx query 'from(bucket: "UEData") |> range(start: -1000d)' -o primary -t <token>
 
-.. _reference3:
-
-Prepare Non-RT RIC DME as data source for AIMLFW (optional)
------------------------------------------------------------
-Bring up the RANPM setup by following the steps mentioned in the file install/README.md present in the repository `RANPM repository `__
 
+    Result: _result
+    Table: keys: [_start, _stop, _field, _measurement]
+                       _start:time                      _stop:time   _field:string  _measurement:string                           _time:time   _value:int
+    ------------------------------  ------------------------------  --------------  -------------------  ------------------------------  -----------
+    2022-05-18T12:52:18.008858111Z  2025-02-11T12:52:18.008858111Z      availPrbDl             liveCell  2025-01-23T17:01:22.563381000Z           45
+    2022-05-18T12:52:18.008858111Z  2025-02-11T12:52:18.008858111Z      availPrbDl             liveCell  2025-01-23T17:01:22.573381000Z           91
+    2022-05-18T12:52:18.008858111Z  2025-02-11T12:52:18.008858111Z      availPrbDl             liveCell  2025-01-23T17:01:22.583381000Z          273
+    2022-05-18T12:52:18.008858111Z  2025-02-11T12:52:18.008858111Z      availPrbDl             liveCell  2025-01-23T17:01:22.593381000Z           53
 
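+If the test data ever needs to be onboarded again from scratch, the bucket can be emptied first. A minimal sketch, assuming the same pod, organization, and token as above:
+
+.. code:: bash
+
+    # Delete all points in the UEData bucket over a wide time range
+    kubectl exec -it <pod name> -- influx delete --bucket UEData --org primary --start 1970-01-01T00:00:00Z --stop 2030-01-01T00:00:00Z -t <INFLUX_DB_TOKEN>
+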
-Once all the pods are in running state, follow the below steps to prepare ranpm setup for AIMLFW qoe usecase data access
-The scripts files are present in the folder demos/hrelease/scripts of repository `AIMLFW repository `__
-
-Note: The following steps need to be performed in the VM where the ranpm setup is installed.
-
-.. code:: bash
-    git clone "https://gerrit.o-ran-sc.org/r/aiml-fw/aimlfw-dep"
-    cd aimlfw-dep/demos/hrelease/scripts
-    ./get_access_tokens.sh
-Output of ./get_access_tokens.sh can be used during feature group creation step.
+.. _reference3:
 
-Execute the below script
-
-.. code:: bash
-
-    ./prepare_env_aimlfw_access.sh
-
-Add feature group from AIMLFW dashboard, example on how to create a feature group is shown in this demo video: `Feature group creation demo `__
-
-Execute below script to push qoe data into ranpm setup
-
-.. code:: bash
-
-    ./push_qoe_data.sh
-
-Example for executing above script
-
-.. code:: bash
-
-    ./push_qoe_data.sh gnb300505 30 c4/B2
-
-Steps to check if data is upload correctly
-
-
-.. code:: bash
-
-    kubectl exec -it influxdb2-0 -n nonrtric -- bash
-    influx query 'from(bucket: "pm-logg-bucket") |> range(start: -1000000000000000000d)' |grep pdcpBytesDl
 
+Prepare Non-RT RIC DME as data source for AIMLFW (optional)
+-----------------------------------------------------------
 
-Steps to clear the data in InfluxDB
+Please refer to the `RANPM Installation Guide `__ to install the Non-RT RIC RANPM setup and prepare the DME as a data source for AIMLFW.
 
-.. code:: bash
-
-    kubectl exec -it influxdb2-0 -n nonrtric -- bash
-    influx delete --bucket pm-logg-bucket --start 1801-01-27T05:00:22.305309038Z --stop 2023-11-14T00:00:00Z
 
-Feature group creation
+Feature Group Creation
 ----------------------
 
-From AIMLFW dashboard create feature group (Training Jobs-> Create Feature Group ) Or curl
+A feature group is a logical entity representing a structured dataset, often stored in a feature store, which ensures consistency and reusability across different ML models and pipelines.
+
 
-NOTE: Here is a curl request to create feature group using curl
+Following is the cURL request to create a feature group:
 
 .. code:: bash
 
@@ -319,20 +323,23 @@ NOTE: Here is a curl request to create feature group using curl
     --data '{
         "featuregroup_name": "<featuregroup name>",
         "feature_list": "<features separated by comma>",
-        "datalake_source": "InfluxSource",
+        "datalake_source": "<datalake source>",
        "enable_dme": <true/false>,
        "host": "<datalake host>",
        "port": "<datalake port>",
-        "dme_port": "",
+        "dme_port": "<dme port>",
        "bucket": "<bucket name>",
        "token": "<influx db token>",
-        "source_name": "",
-        "measured_obj_class": "",
-        "measurement": "",
+        "source_name": "<dme source name>",
+        "measured_obj_class": "<measured object class>",
+        "measurement": "<measurement name>",
        "db_org": "<influx db org>"
    }'
 
+
+Below are two examples covering the supported scenarios for data ingestion.
+
+**1. Non-RT RIC DME based feature group creation for the QoE use case**
 
 .. code:: bash
 
@@ -354,7 +361,7 @@ NOTE: Below are some example values to be used for the DME based feature group c
        "db_org": "est"
    }
    '
 
-NOTE: Below are some example values to be used for the standalone influx DB creation for qoe usecase. Dme is not used in this example.
+**2. Standalone Influx DB based feature group creation for the QoE use case**
 
 .. code:: bash
 
@@ -376,10 +383,13 @@ NOTE: Below are some example values to be used for the standalone influx DB crea
        "db_org": "primary"
    }'
 
-Register Model (compulsory)
----------------------------
+Register Model
+---------------
+
+A model MUST be registered with the Model Management Service before submitting any training request.
+A model is uniquely identified by its modelName and modelVersion.
+Following is a sample cURL request to register a model.
 
-Register the model using the below steps using Model management service for training.
 
 .. code:: bash
 
@@ -400,41 +410,118 @@ Register the model using the below steps using Model management service for trai
    }
  }'
 
+    # inputDataType & outputDataType represent the input (features) and the output of trained models.
+    # Note: Currently, outputDataType is not functionally used in the implementation.
+
 
 Model Discovery
 ---------------
 
-Model discovery can be done using the following API endpoint:
+This section describes model discovery and its available options.
 
-
-To fetch all registered models, use the following API endpoint:
+a. To fetch all registered models, use the following API endpoint:
 
 .. code:: bash
 
    curl --location 'http://<VM IP>:32006/ai-ml-model-discovery/v1/models'
 
-To fetch models with model name , use the following API endpoint:
+b. To fetch models with a given modelName, use the following API endpoint:
 
 .. code:: bash
 
    curl --location 'http://<VM IP>:32006/ai-ml-model-discovery/v1/models?model-name=<model name>'
 
-To fetch specific model, use the following API endpoint:
+c. To fetch a specific model, use the following API endpoint:
 
 .. code:: bash
 
    curl --location 'http://<VM IP>:32006/ai-ml-model-discovery/v1/models?model-name=<model name>&model-version=<model version>'
 
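+The discovery endpoints return JSON, so the responses can be piped through a JSON processor for readability. A small convenience sketch, assuming `jq` is installed on the client:
+
+.. code:: bash
+
+    # Pretty-print the list of registered models
+    curl -s 'http://<VM IP>:32006/ai-ml-model-discovery/v1/models' | jq '.'
+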
+Onboarding Training/Re-Training Pipelines
+-------------------------------------------
+
+Training and retraining pipelines in AIMLFW (the AI/ML Framework for O-RAN SC) are structured sequences of steps designed to train or retrain ML models. These pipelines automate the execution of data processing, model training, evaluation, and storage, ensuring a streamlined workflow.
+
+1. Onboard Pre-Existing Pipeline
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+
+2. Onboard Custom Pipeline
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+| To use a training/retraining pipeline in AIMLFW, it needs to be onboarded into the system. This involves the following steps:
+| **a. Pipeline Definition**: A pipeline must be defined in code (e.g., Python) using the Kubeflow Pipelines SDK. It includes all necessary ML steps, such as data ingestion, preprocessing, training, and model deployment.
+| **b. Pipeline Registration**: The pipeline must be registered in Kubeflow Pipelines so that AIMLFW can utilize it. This is done by compiling the pipeline code and uploading it to the pipeline repository.
+| **c. Pipeline Invocation**: During training instance creation, users must specify the required pipeline. AIMLFW uses the registered pipeline to launch a training/retraining job.
+
+Following is sample `pseudo-code` for a custom pipeline, which a user can implement and onboard.
+
+.. code:: python
+
+    import requests
+    from kfp import dsl
+    from kfp.compiler import Compiler
+
+    @dsl.pipeline(
+        name="Model Training Pipeline",
+        description="A sample pipeline for training a machine learning model"
+    )
+    def training_pipeline():
+        # Implement the training pipeline steps here
+        ...
+
+    # Compile the pipeline to a YAML file
+    Compiler().compile(training_pipeline, "<pipeline file>.yaml")
+
+    # Upload the compiled pipeline to AIMLFW
+    requests.post("http://<VM IP>:32002/pipelines/<pipeline name>/upload",
+                  files={'file': open("<pipeline file>.yaml", 'rb')})
+
+
+One can refer to the `kubeflow documentation `__ when implementing a pipeline.
+
+
 Training job creation with DME or Standalone InfluxDB as data source
 --------------------------------------------------------------------
 
 #. AIMLFW should be installed by following the steps in section :ref:`Software Installation and Deployment `
 #. The RANPM setup should be installed and configured as per the steps mentioned in section :ref:`Prepare Non-RT RIC DME as data source for AIMLFW `
 #. After the training job is created and executed successfully, the model can be deployed using the steps mentioned in section :ref:`Deploy trained qoe prediction model on Kserve ` or
    :ref:`Steps to deploy model using Kserve adapter `
 
   NOTE: The QoE training function does not come pre-uploaded; we need to go to the training function page, create the training function, and run the qoe-pipeline notebook.
 
+The following cURL request creates a training job:
 
 .. code:: bash
 
+    curl --location 'http://<VM IP>:32002/ai-ml-model-training/v1/training-jobs' \
+    --header 'Content-Type: application/json' \
+    --data '{
+        "modelId": {
+            "modelname": "modeltest15",
+            "modelversion": "1"
+        },
+        "model_location": "",
+        "training_config": {
+            "description": "trainingjob for testing",
+            "dataPipeline": {
+                "feature_group_name": "<feature group name>",
+                "query_filter": "",
+                "arguments": "{'epochs': 1}"
+            },
+            "trainingPipeline": {
+                "training_pipeline_name": "qoe_Pipeline_testing_1",
+                "training_pipeline_version": "qoe_Pipeline_testing_1",
+                "retraining_pipeline_name": "qoe_Pipeline_retrain",
+                "retraining_pipeline_version": "2"
+            }
+        },
+        "training_dataset": "",
+        "validation_dataset": "",
+        "notification_url": "",
+        "consumer_rapp_id": "",
+        "producer_rapp_id": ""
+    }'
 
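+After a training job is submitted, its progress can be followed via the Training Manager. A minimal sketch; the listing path below is an assumption mirroring the creation endpoint above, so consult the Training Manager API reference for the exact query paths of your release:
+
+.. code:: bash
+
+    # List training jobs and inspect the state of the job created above (path assumed from the POST endpoint)
+    curl --location 'http://<VM IP>:32002/ai-ml-model-training/v1/training-jobs'
+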
 .. code:: bash
 
    curl --location 'http://<VM IP>:32002/ai-ml-model-training/v1/training-jobs' \
diff --git a/docs/ranpm-installation.rst b/docs/ranpm-installation.rst
index 7316412..b7b8ea0 100644
--- a/docs/ranpm-installation.rst
+++ b/docs/ranpm-installation.rst
@@ -254,7 +254,8 @@ Execute below script to push qoe data into ranpm setup
 
 The following script downloads `cells.csv `__ ,
 filters the data based on ``Cell Identity``,
-For each PM report, the script convert the PM-report to XML documents, uploads it to one of 'pm-https-server', and sends a File-Ready event on Kafka-topic signifying that the PM report is ready to be processed by RANPM
+For each PM report, the script converts the PM report to an XML document, uploads it to one of the 'pm-https-server' instances, and sends a File-Ready event on a Kafka topic signifying that the PM report is ready to be processed by RANPM.
+Once a file is processed, the PM reports are stored under the bucket `pm-logg-bucket` with measurement `test,ManagedElement=nodedntest,GNBDUFunction=1004,NRCellDU=<Cell Identity>`, which will be referenced while creating the feature group in later steps.
 
 Example for executing above script
-- 
2.16.6