diff --git a/.github/workflows/update-plugin-docs.yml b/.github/workflows/update-plugin-docs.yml index a4da869c..97c1e48a 100644 --- a/.github/workflows/update-plugin-docs.yml +++ b/.github/workflows/update-plugin-docs.yml @@ -37,7 +37,6 @@ jobs: run: | source venv/bin/activate python docs/generate_plugin_doc_bundle.py \ - --package nodescraper.plugins.inband \ --output docs/PLUGIN_DOC.md \ --update-readme-help diff --git a/.mypy.ini b/.mypy.ini index f9d68f19..cf6c2344 100644 --- a/.mypy.ini +++ b/.mypy.ini @@ -1,5 +1,6 @@ [mypy] # Global mypy configuration +mypy_path = test/unit [mypy-nodescraper.base.regexanalyzer] ignore_errors = True diff --git a/README.md b/README.md index 9b3b25ae..dce86aef 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,6 @@ # Node Scraper Node Scraper is a tool which performs automated data collection and analysis for the purposes of -system debug. +system debug. For details on what data is collected and analyzed, see the [plugin reference table](docs/PLUGIN_DOC.md). ## Table of Contents - [Installation](#installation) diff --git a/docs/PLUGIN_DOC.md b/docs/PLUGIN_DOC.md index 0ca1366f..cf7cb371 100644 --- a/docs/PLUGIN_DOC.md +++ b/docs/PLUGIN_DOC.md @@ -1,9 +1,10 @@ # Plugin Documentation -# Plugin Table +# IB Plugins | Plugin | Collection | Analyzer Args | Collection Args | DataModel | Collector | Analyzer | | --- | --- | --- | --- | --- | --- | --- | +| GenericCollectionPlugin | Runs each command from collection_args.commands on the target (in-band host or BMC over OOB SSH).
Commands are user-configured; there are no fixed CMD_* class fields. | **Analyzer Args:**
- `checks`: list[nodescraper.plugins.generic_collection.analyzer_args.CommandCheck] — Per-command validation rules keyed by collected command name. | **Collection Args:**
- `commands`: list[nodescraper.plugins.generic_collection.collector_args.CommandSpec] — Named commands to run. Each entry must include 'name' and 'command'. Prefer small textual stdout; see class docstring...
- `sudo`: bool — Default sudo setting for commands that do not specify sudo.
- `timeout`: int — Default per-command timeout in seconds.
- `include_stdout`: bool — Default: include each command's stdout in collected results for analysis. When false, stdout is omitted from stored r... | [GenericCollectionDataModel](#GenericCollectionDataModel-Model) | [GenericCollectionCollector](#Collector-Class-GenericCollectionCollector) | [GenericAnalyzer](#Data-Analyzer-Class-GenericAnalyzer) | | AmdSmiPlugin | bad-pages
firmware --json
list --json
metric -g all
partition --json
process --json
ras --cper --folder={folder}
ras --afid --cper-file {cper_file}
static -g all --json
static -g {gpu_id} --json
topology
version --json
xgmi -l
xgmi -m | **Analyzer Args:**
- `check_static_data`: bool — If True, run static data checks (e.g. driver version, partition mode).
- `expected_gpu_processes`: Optional[int] — Expected number of GPU processes.
- `expected_max_power`: Optional[int] — Expected maximum power value (e.g. watts).
- `expected_power_management`: Optional[str] — Expected amd-smi metric power_management value per GPU (e.g. DISABLED for active/full power, ENABLED for power-manage...
- `expected_driver_version`: Optional[str] — Expected AMD driver version string.
- `expected_memory_partition_mode`: Optional[str] — Expected memory partition mode (e.g. sp3, dp).
- `expected_compute_partition_mode`: Optional[str] — Expected compute partition mode.
- `expected_firmware_versions`: Optional[dict[str, str]] — Expected firmware versions keyed by amd-smi fw_id (e.g. PLDM_BUNDLE).
- `l0_to_recovery_count_error_threshold`: Optional[int] — L0-to-recovery count above which an error is raised.
- `l0_to_recovery_count_warning_threshold`: Optional[int] — L0-to-recovery count above which a warning is raised.
- `vendorid_ep`: Optional[str] — Expected endpoint vendor ID (e.g. for PCIe).
- `vendorid_ep_vf`: Optional[str] — Expected endpoint VF vendor ID.
- `devid_ep`: Optional[str] — Expected endpoint device ID.
- `devid_ep_vf`: Optional[str] — Expected endpoint VF device ID.
- `sku_name`: Optional[str] — Expected SKU name string for GPU.
- `expected_xgmi_speed`: Optional[list[float]] — Expected xGMI speed value(s) (e.g. link rate).
- `analysis_range_start`: Optional[datetime.datetime] — Start of time range for time-windowed analysis.
- `analysis_range_end`: Optional[datetime.datetime] — End of time range for time-windowed analysis. | **Collection Args:**
- `analysis_firmware_ids`: Optional[list[str]] — amd-smi fw_id values to record in analysis_ref.firmware_versions
- `cper_file_path`: Optional[str] — Path to CPER folder or file for RAS AFID collection (ras --afid --cper-file). | [AmdSmiDataModel](#AmdSmiDataModel-Model) | [AmdSmiCollector](#Collector-Class-AmdSmiCollector) | [AmdSmiAnalyzer](#Data-Analyzer-Class-AmdSmiAnalyzer) | | BiosPlugin | sh -c 'cat /sys/devices/virtual/dmi/id/bios_version'
wmic bios get SMBIOSBIOSVersion /Value | **Analyzer Args:**
- `exp_bios_version`: list[str] — Expected BIOS version(s) to match against collected value (str or list).
- `regex_match`: bool — If True, match exp_bios_version as regex; otherwise exact match. | - | [BiosDataModel](#BiosDataModel-Model) | [BiosCollector](#Collector-Class-BiosCollector) | [BiosAnalyzer](#Data-Analyzer-Class-BiosAnalyzer) | | CmdlinePlugin | cat /proc/cmdline | **Analyzer Args:**
- `required_cmdline`: Union[str, List] — Command-line parameters that must be present (e.g. 'pci=bfsort').
- `banned_cmdline`: Union[str, List] — Command-line parameters that must not be present.
- `os_overrides`: Dict[str, nodescraper.plugins.inband.cmdline.cmdlineconfig.OverrideConfig] — Per-OS overrides for required_cmdline and banned_cmdline (keyed by OS identifier).
- `platform_overrides`: Dict[str, nodescraper.plugins.inband.cmdline.cmdlineconfig.OverrideConfig] — Per-platform overrides for required_cmdline and banned_cmdline (keyed by platform). | - | [CmdlineDataModel](#CmdlineDataModel-Model) | [CmdlineCollector](#Collector-Class-CmdlineCollector) | [CmdlineAnalyzer](#Data-Analyzer-Class-CmdlineAnalyzer) | @@ -24,6 +25,7 @@ | PciePlugin | lspci -d {vendor_id}: -nn
lspci -x
lspci -xxxx
lspci -PP
lspci -PP -d {vendor_id}:{dev_id}
lspci -PP -D -d {vendor_id}:{dev_id}
lspci -PP -D
lspci -vvv
lspci -vvvt | **Analyzer Args:**
- `exp_speed`: int — Expected PCIe link speed (generation 1–5).
- `exp_width`: int — Expected PCIe link width in lanes (1–16).
- `exp_sriov_count`: int — Expected SR-IOV virtual function count.
- `exp_gpu_count_override`: Optional[int] — Override expected GPU count for validation.
- `exp_max_payload_size`: Union[Dict[int, int], int, NoneType] — Expected max payload size: int for all devices, or dict keyed by device ID.
- `exp_max_rd_req_size`: Union[Dict[int, int], int, NoneType] — Expected max read request size: int for all devices, or dict keyed by device ID.
- `exp_ten_bit_tag_req_en`: Union[Dict[int, int], int, NoneType] — Expected 10-bit tag request enable: int for all devices, or dict keyed by device ID. | - | [PcieDataModel](#PcieDataModel-Model) | [PcieCollector](#Collector-Class-PcieCollector) | [PcieAnalyzer](#Data-Analyzer-Class-PcieAnalyzer) | | ProcessPlugin | top -b -n 1
rocm-smi --showpids
top -b -n 1 -o %CPU | **Analyzer Args:**
- `max_kfd_processes`: int — Maximum allowed number of KFD (Kernel Fusion Driver) processes; 0 disables the check.
- `max_cpu_usage`: float — Maximum allowed CPU usage (percent) for process checks. | **Collection Args:**
- `top_n_process`: int — Number of top processes by CPU usage to collect (e.g. for top -b -n 1 -o %CPU). | [ProcessDataModel](#ProcessDataModel-Model) | [ProcessCollector](#Collector-Class-ProcessCollector) | [ProcessAnalyzer](#Data-Analyzer-Class-ProcessAnalyzer) | | RdmaPlugin | rdma link -j
rdma dev
rdma link
rdma statistic -j | - | - | [RdmaDataModel](#RdmaDataModel-Model) | [RdmaCollector](#Collector-Class-RdmaCollector) | [RdmaAnalyzer](#Data-Analyzer-Class-RdmaAnalyzer) | +| RegexSearchPlugin | - | Runs RegexSearchAnalyzer: user-defined patterns via analysis_args.error_regex (same shape as Dmesg).
Emits regex match events with optional per-file source in the description when scanning directories.
**Analyzer Args:**
- `error_regex`: Optional[list[dict[str, Any]]] — Regex patterns to search for; each dict may include regex (str), message, event_category, event_priority (same as Dme...
- `interval_to_collapse_event`: int — Seconds within which repeated events are collapsed into one.
- `num_timestamps`: int — Number of timestamps to include per event in output. | - | [RegexSearchData](#RegexSearchData-Model) | - | [RegexSearchAnalyzer](#Data-Analyzer-Class-RegexSearchAnalyzer) | | RocmPlugin | {rocm_path}/opencl/bin/*/clinfo
env | grep -Ei 'rocm|hsa|hip|mpi|openmp|ucx|miopen'
ls /sys/class/kfd/kfd/proc/
grep -i -E 'rocm' /etc/ld.so.conf.d/*
{rocm_path}/bin/rocminfo
ls -v -d {rocm_path}*
ls -v -d {rocm_path}-[3-7]* | tail -1
ldconfig -p | grep -i -E 'rocm'
grep . -H -r -i {rocm_path}/.info/* | **Analyzer Args:**
- `exp_rocm`: Union[str, list] — Expected ROCm version string(s) to match (e.g. from rocminfo).
- `exp_rocm_latest`: str — Expected 'latest' ROCm path or version string for versioned installs.
- `exp_rocm_sub_versions`: dict[str, Union[str, list]] — Map sub-version name (e.g. version_rocm) to expected string or list of allowed strings. | **Collection Args:**
- `rocm_path`: str — Base path to ROCm installation (e.g. /opt/rocm). Used for rocminfo, clinfo, and version discovery. | [RocmDataModel](#RocmDataModel-Model) | [RocmCollector](#Collector-Class-RocmCollector) | [RocmAnalyzer](#Data-Analyzer-Class-RocmAnalyzer) | | StoragePlugin | sh -c 'df -lH -B1 | grep -v 'boot''
wmic LogicalDisk Where DriveType="3" Get DeviceId,Size,FreeSpace | - | **Collection Args:**
- `skip_sudo`: bool — If True, do not use sudo when running df and related storage commands. | [StorageDataModel](#StorageDataModel-Model) | [StorageCollector](#Collector-Class-StorageCollector) | [StorageAnalyzer](#Data-Analyzer-Class-StorageAnalyzer) | | SysSettingsPlugin | cat /sys/{}
ls -1 /sys/{}
ls -l /sys/{} | **Analyzer Args:**
- `checks`: Optional[list[nodescraper.plugins.inband.sys_settings.analyzer_args.SysfsCheck]] — List of sysfs checks (path, expected values or pattern, display name). | **Collection Args:**
- `paths`: list[str] — Sysfs paths to read (cat). Paths with '*' are collected with ls -l (e.g. class/net/*/device).
- `directory_paths`: list[str] — Sysfs paths to list (ls -1); used for checks that match entry names by regex. | [SysSettingsDataModel](#SysSettingsDataModel-Model) | [SysSettingsCollector](#Collector-Class-SysSettingsCollector) | [SysSettingsAnalyzer](#Data-Analyzer-Class-SysSettingsAnalyzer) | @@ -31,8 +33,42 @@ | SyslogPlugin | ls -1 /var/log/syslog* 2>/dev/null | grep -E '^/var/log/syslog(\.[0-9]+(\.gz)?)?$' || true
ls -1 /var/log/messages* 2>/dev/null | grep -E '^/var/log/messages(\.[0-9]+(\.gz)?)?$' || true | - | - | [SyslogData](#SyslogData-Model) | [SyslogCollector](#Collector-Class-SyslogCollector) | - | | UptimePlugin | uptime | - | - | [UptimeDataModel](#UptimeDataModel-Model) | [UptimeCollector](#Collector-Class-UptimeCollector) | - | +# OOB plugins + +| Plugin | Collection | Analyzer Args | Collection Args | DataModel | Collector | Analyzer | +| --- | --- | --- | --- | --- | --- | --- | +| OobGenericCollectionPlugin | Runs each command from collection_args.commands on the target (in-band host or BMC over OOB SSH).
Commands are user-configured; there are no fixed CMD_* class fields. | **Analyzer Args:**
- `checks`: list[nodescraper.plugins.generic_collection.analyzer_args.CommandCheck] — Per-command validation rules keyed by collected command name. | **Collection Args:**
- `commands`: list[nodescraper.plugins.generic_collection.collector_args.CommandSpec] — Named commands to run. Each entry must include 'name' and 'command'. Prefer small textual stdout; see class docstring...
- `sudo`: bool — Default sudo setting for commands that do not specify sudo.
- `timeout`: int — Default per-command timeout in seconds.
- `include_stdout`: bool — Default: include each command's stdout in collected results for analysis. When false, stdout is omitted from stored r... | [GenericCollectionDataModel](#GenericCollectionDataModel-Model) | [GenericCollectionCollector](#Collector-Class-GenericCollectionCollector) | [GenericAnalyzer](#Data-Analyzer-Class-GenericAnalyzer) | +| OobBmcArchivePlugin | SSH (BMC) shell: tar+gzip archives for each path in collection_args (see PathSpec entries).
Uses sudo on the BMC when collection_args paths require elevated access. | - | **Collection Args:**
- `paths`: list[nodescraper.plugins.ooband.bmc_archive.collector_args.PathSpec] — Named BMC paths to archive with tar czf -. Configure in plugin config under plugins.OobBmcArchivePlugin.collection_ar...
- `sudo`: bool — Default sudo setting for paths that do not specify sudo.
- `timeout`: int — Default per-path tar timeout in seconds.
- `skip_if_missing`: bool — Skip paths that do not exist on the BMC instead of failing collection.
- `ignore_failed_read`: bool — When true, pass GNU tar's --ignore-failed-read when the remote tar supports it. | [BmcArchiveDataModel](#BmcArchiveDataModel-Model) | [BmcArchiveCollector](#Collector-Class-BmcArchiveCollector) | - | +| RedfishEndpointPlugin | Redfish GET: explicit paths from collection_args.uris (parallel when max_workers>1).
Optional paged GET following the Members collection OData nextLink field when follow_next_link is true.
Redfish GET tree: when discover_tree is true, walks from api_root using OData resource id links and Members navigation (depth and endpoint caps from collection_args). | For each entry in analysis_args.checks, reads JSON paths in collected responses and compares values to constraints (eq, min/max, anyOf, regex, etc.).
URI key "*" runs checks against every collected response body.
**Analyzer Args:**
- `checks`: dict[str, dict[str, Union[int, float, str, bool, dict[str, Any]]]] — Map: URI or '*' -> { property_path: constraint }. URI keys must match a key in the collected responses (exact match).... | **Collection Args:**
- `uris`: list[str] — Redfish URIs to GET. Ignored when discover_tree is True.
- `discover_tree`: bool — If True, discover endpoints from the BMC Redfish tree (service root and links) instead of using uris.
- `tree_max_depth`: int — When discover_tree is True: max traversal depth (1=service root only, 2=root + collections, 3=+ members).
- `tree_max_endpoints`: int — When discover_tree is True: max endpoints to discover (0=no limit).
- `max_workers`: int — Max concurrent GETs (1=sequential). Use >1 for async endpoint fetches.
- `follow_next_link`: bool — If True, follow Redfish Members collection OData nextLink pagination for each URI and merge all pages into a single r...
- `max_pages`: int — When follow_next_link is True: safety cap on the number of pages to follow per URI (default 200). | [RedfishEndpointDataModel](#RedfishEndpointDataModel-Model) | [RedfishEndpointCollector](#Collector-Class-RedfishEndpointCollector) | [RedfishEndpointAnalyzer](#Data-Analyzer-Class-RedfishEndpointAnalyzer) | +| RedfishOemDiagPlugin | Redfish LogService.CollectDiagnosticData for each entry in collection_args.oem_diagnostic_types (collection_args.log_service_path selects the LogService).
Optional binary archives under the plugin log path when log_path is set. | Summarizes success/failure per OEM diagnostic type from collected results.
When analysis_args.require_all_success is true, fails the run if any type failed collection.
**Analyzer Args:**
- `require_all_success`: bool — If True, analysis fails when any OEM type collection failed. | **Collection Args:**
- `log_service_path`: str — Redfish path to the LogService (e.g. DiagLogs).
- `oem_diagnostic_types_allowable`: Optional[list[str]] — Allowable OEM diagnostic types for this architecture/BMC. When set, used for validation and as default for oem_diagno...
- `oem_diagnostic_types`: list[str] — OEM diagnostic types to collect. When empty and oem_diagnostic_types_allowable is set, defaults to that list.
- `task_timeout_s`: int — Max seconds to wait for each BMC task. | [RedfishOemDiagDataModel](#RedfishOemDiagDataModel-Model) | [RedfishOemDiagCollector](#Collector-Class-RedfishOemDiagCollector) | [RedfishOemDiagAnalyzer](#Data-Analyzer-Class-RedfishOemDiagAnalyzer) | +| ServiceabilityPluginMI3XX | - | **Analyzer Args:**
- `hub_python_module`: Optional[str] — Import path for the hub module (class implements hub_analyze_method); hub_options forwards kwargs.
- `hub_display_name`: Optional[str] — Optional label for analyzer status messages.
- `afid_sag_path`: Optional[str] — Path to hub config (e.g. AFID_SAG.json); passed as hub_init_path_kwarg.
- `hub_init_path_kwarg`: str — Hub __init__ keyword that receives afid_sag_path.
- `hub_analyze_method`: str — Hub method called with rf_events first (default get_service_info).
- `skip_hub`: bool — If True, only build afid_events without running the service hub.
- `cper_decode_module`: Optional[str] — Module import path for CPER decoding when events include CPER attachments.
- `cper_decode_method`: str — Callable on cper_decode_module: file-like CPER in, (return_code, decode_dict) out.
- `hub_options`: Optional[dict[str, Any]] — Extra kwargs for hub __init__ and analyze; collected cper_data overrides cper_data key.
- `from_ac_cycle`: int — from_ac_cycle kwarg for the hub analyze call (merged after hub_options).
- `from_date`: Optional[str] — Optional from_date for the hub analyze call (merged after hub_options).
- `designation_serials`: Optional[dict[str, str]] — Optional designation_serials for the hub analyze call (merged after hub_options).
- `suppress_service_actions`: Optional[list[str]] — Optional suppress_service_actions for the hub analyze call (merged after hub_options). | **Collection Args:**
- `uri`: Optional[str] — Optional alias for ``rf_event_log_uri``. When both ``uri`` and ``rf_event_log_uri`` are explicitly set to non-empty v...
- `rf_event_log_uri`: str — Redfish URI for the event log ``Entries`` collection.
- `rf_chassis_devices`: Optional[List[str]] — Chassis designations for Assembly GETs; required with ``rf_assembly_uri_template``.
- `rf_assembly_uri_template`: Optional[str] — Redfish URI template containing ``{device}`` for each chassis Assembly resource.
- `rf_firmware_bundle_uri`: Optional[str] — Redfish URI for firmware bundle inventory when subclasses extract component details.
- `follow_next_link`: bool — If True, follow Members@odata.nextLink up to max_pages; else single GET.
- `max_pages`: int — Safety cap on the number of pages when following event log pagination.
- `top`: Optional[int] — Most recent N entries via $skip after count probe; None collects full window.
- `reference_time`: Optional[str] — Optional ISO-8601 date or date-time used with time_operator (e.g. 2026-05-17 or 2026-05-17T13:01:00).
- `time_operator`: Optional[Literal['>', '>=', '<', '<=', '==']] — Comparison operator applied when reference_time is set. | [ServiceabilityDataModel](#ServiceabilityDataModel-Model) | [MI3XXCollector](#Collector-Class-MI3XXCollector) | [MI3XXAnalyzer](#Data-Analyzer-Class-MI3XXAnalyzer) | +| ServiceabilityPluginBase | - | - | - | [ServiceabilityDataModel](#ServiceabilityDataModel-Model) | [ServiceabilityCollectorBase](#Collector-Class-ServiceabilityCollectorBase) | - | + # Collectors +## Collector Class GenericCollectionCollector + +### Description + +Run user-configured shell commands and report per-command success. + +**Bases**: ['InBandDataCollector'] + +**Link to code**: [generic_collection_collector.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/generic_collection/generic_collection_collector.py) + +### Class Variables + +- **SUPPORTED_OS_FAMILY**: `{, , }` + +### Provides Data + +GenericCollectionDataModel + +### Documented collection + +- Runs each command from collection_args.commands on the target (in-band host or BMC over OOB SSH). +- Commands are user-configured; there are no fixed CMD_* class fields. + ## Collector Class AmdSmiCollector ### Description @@ -947,8 +983,115 @@ UptimeDataModel - uptime +## Collector Class BmcArchiveCollector + +### Description + +Archive BMC directories over SSH using tar czf - . + +**Bases**: ['InBandDataCollector'] + +**Link to code**: [bmc_archive_collector.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/ooband/bmc_archive/bmc_archive_collector.py) + +### Class Variables + +- **SUPPORTED_OS_FAMILY**: `{, }` +- **REMOTE_ARCHIVE_TEMPLATE**: `/tmp/node_scraper_{name}.tar.gz` +- **_tar_ignore_failed_read_supported**: `None` + +### Provides Data + +BmcArchiveDataModel + +### Documented collection + +- SSH (BMC) shell: tar+gzip archives for each path in collection_args (see PathSpec entries). +- Uses sudo on the BMC when collection_args paths require elevated access. 
+ +## Collector Class RedfishEndpointCollector + +### Description + +Collects Redfish endpoint responses for URIs specified in config. + +**Bases**: ['RedfishDataCollector'] + +**Link to code**: [endpoint_collector.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/ooband/redfish_endpoint/endpoint_collector.py) + +### Provides Data + +RedfishEndpointDataModel + +### Documented collection + +- Redfish GET: explicit paths from collection_args.uris (parallel when max_workers>1). +- Optional paged GET following the Members collection OData nextLink field when follow_next_link is true. +- Redfish GET tree: when discover_tree is true, walks from api_root using OData resource id links and Members navigation (depth and endpoint caps from collection_args). + +## Collector Class RedfishOemDiagCollector + +### Description + +Collects Redfish OEM diagnostic logs (e.g. JournalControl, AllLogs) via LogService.CollectDiagnosticData. + +**Bases**: ['RedfishDataCollector'] + +**Link to code**: [oem_diag_collector.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/ooband/redfish_oem_diag/oem_diag_collector.py) + +### Provides Data + +RedfishOemDiagDataModel + +### Documented collection + +- Redfish LogService.CollectDiagnosticData for each entry in collection_args.oem_diagnostic_types (collection_args.log_service_path selects the LogService). +- Optional binary archives under the plugin log path when log_path is set. + +## Collector Class MI3XXCollector + +### Description + +Collect MI3XX BMC Redfish data: event log members (with pagination), firmware inventory, + CPER attachment bytes for qualifying events, and optional assembly/chassis metadata. 
+ +**Bases**: ['ServiceabilityCollectorBase'] + +**Link to code**: [mi3xx_collector.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/serviceability/mi3xx/mi3xx_collector.py) + +### Provides Data + +ServiceabilityDataModel + +## Collector Class ServiceabilityCollectorBase + +### Description + +OOB Redfish collection skeleton; subclasses implement filtering, CPER handling, and JSON parsing. + +**Bases**: ['RedfishDataCollector', 'Generic'] + +**Link to code**: [serviceability_collector.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/serviceability/serviceability_collector.py) + +### Provides Data + +ServiceabilityDataModel + # Data Models +## GenericCollectionDataModel Model + +### Description + +Results for each command configured in collection_args. + +**Link to code**: [generic_collection_data.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/generic_collection/generic_collection_data.py) + +**Bases**: ['DataModel'] + +### Model annotations and fields + +- **results**: `list[nodescraper.plugins.generic_collection.generic_collection_data.CommandCollectionResult]` + ## AmdSmiDataModel Model ### Description @@ -1286,6 +1429,22 @@ Data model for RDMA (Remote Direct Memory Access) statistics and link informatio - **dev_list**: `list[nodescraper.plugins.inband.rdma.rdmadata.RdmaDevice]` - **link_list_text**: `list[nodescraper.plugins.inband.rdma.rdmadata.RdmaLinkText]` +## RegexSearchData Model + +### Description + +Loaded file or directory contents passed to the analyzer (via --data). 
+ +**Link to code**: [regex_search_data.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/inband/regex_search/regex_search_data.py) + +**Bases**: ['DataModel'] + +### Model annotations and fields + +- **content**: `str` +- **data_root**: `str` +- **files**: `dict[str, str]` + ## RocmDataModel Model **Link to code**: [rocmdata.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/inband/rocm/rocmdata.py) @@ -1378,8 +1537,85 @@ Data model for in band syslog logs - **current_time**: `str` - **uptime**: `str` +## BmcArchiveDataModel Model + +### Description + +Collected BMC directory archives. + +**Link to code**: [bmc_archive_data.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/ooband/bmc_archive/bmc_archive_data.py) + +**Bases**: ['DataModel'] + +### Model annotations and fields + +- **results**: `list[nodescraper.plugins.ooband.bmc_archive.bmc_archive_data.ArchiveCollectionResult]` +- **archives**: `list[nodescraper.connection.inband.inband.BinaryFileArtifact]` + +## RedfishEndpointDataModel Model + +### Description + +Collected Redfish endpoint responses: URI -> JSON body. + +**Link to code**: [endpoint_data.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/ooband/redfish_endpoint/endpoint_data.py) + +**Bases**: ['DataModel'] + +### Model annotations and fields + +- **responses**: `dict[str, dict]` + +## RedfishOemDiagDataModel Model + +### Description + +Collected Redfish OEM diagnostic log results: OEM type -> result (success, error, metadata). + +**Link to code**: [oem_diag_data.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/ooband/redfish_oem_diag/oem_diag_data.py) + +**Bases**: ['DataModel'] + +### Model annotations and fields + +- **results**: `dict[str, nodescraper.plugins.ooband.redfish_oem_diag.oem_diag_data.OemDiagTypeResult]` + +## ServiceabilityDataModel Model + +### Description + +Collected Redfish responses and intermediate serviceability fields. 
+ +**Link to code**: [serviceability_data.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/serviceability/serviceability_data.py) + +**Bases**: ['DataModel'] + +### Model annotations and fields + +- **responses**: `dict[str, Any]` +- **rf_events**: `list[Any]` +- **assembly_info**: `Dict[str, DeviceInfo]` +- **cper_raw**: `Dict[str, str]` +- **cper_data**: `Dict[str, Any]` +- **component_details**: `Optional[str]` +- **log_path**: `Optional[str]` +- **bmc_host**: `Optional[str]` +- **afid_events**: `List[AfidEvent]` +- **serviceability**: `Optional[ServiceabilityBlock]` +- **result**: `Optional[ServiceabilityResult]` + # Data Analyzers +## Data Analyzer Class GenericAnalyzer + +### Description + +Validate generic collection command results against analysis_args checks. + +**Bases**: ['DataAnalyzer'] + +**Link to code**: [generic_analyzer.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/generic_collection/generic_analyzer.py) + ## Data Analyzer Class AmdSmiAnalyzer ### Description @@ -1709,6 +1945,20 @@ Check RDMA statistics for errors (RoCE and other RDMA error counters). **Link to code**: [rdma_analyzer.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/inband/rdma/rdma_analyzer.py) +## Data Analyzer Class RegexSearchAnalyzer + +### Description + +Run user-provided regexes against text loaded from --data (file or directory). 
+ +**Bases**: ['RegexAnalyzer'] + +**Link to code**: [regex_search_analyzer.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/inband/regex_search/regex_search_analyzer.py) + +### Class Variables + +- **ERROR_REGEX**: `[]` + ## Data Analyzer Class RocmAnalyzer ### Description @@ -1753,8 +2003,58 @@ Check sysctl matches expected sysctl details **Link to code**: [sysctl_analyzer.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/inband/sysctl/sysctl_analyzer.py) +## Data Analyzer Class RedfishEndpointAnalyzer + +### Description + +Checks Redfish endpoint responses against configured thresholds and key/value rules. + +**Bases**: ['DataAnalyzer'] + +**Link to code**: [endpoint_analyzer.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/ooband/redfish_endpoint/endpoint_analyzer.py) + +### Documented analysis + +- For each entry in analysis_args.checks, reads JSON paths in collected responses and compares values to constraints (eq, min/max, anyOf, regex, etc.). +- URI key "*" runs checks against every collected response body. + +## Data Analyzer Class RedfishOemDiagAnalyzer + +### Description + +Analyzes Redfish OEM diagnostic log collection results. + +**Bases**: ['DataAnalyzer'] + +**Link to code**: [oem_diag_analyzer.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/ooband/redfish_oem_diag/oem_diag_analyzer.py) + +### Documented analysis + +- Summarizes success/failure per OEM diagnostic type from collected results. +- When analysis_args.require_all_success is true, fails the run if any type failed collection. + +## Data Analyzer Class MI3XXAnalyzer + +### Description + +Build AFID events from collected data and run the configured service hub. 
+ +**Bases**: ['DataAnalyzer'] + +**Link to code**: [mi3xx_analyzer.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/serviceability/mi3xx/mi3xx_analyzer.py) + # Analyzer Args +## Analyzer Args Class GenericAnalyzerArgs + +**Bases**: ['AnalyzerArgs'] + +**Link to code**: [analyzer_args.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/generic_collection/analyzer_args.py) + +### Annotations / fields + +- **checks**: `list[nodescraper.plugins.generic_collection.analyzer_args.CommandCheck]` — Per-command validation rules keyed by collected command name. + ## Analyzer Args Class AmdSmiAnalyzerArgs **Bases**: ['AnalyzerArgs'] @@ -1973,6 +2273,22 @@ Arguments for PCIe analyzer - **max_kfd_processes**: `int` — Maximum allowed number of KFD (Kernel Fusion Driver) processes; 0 disables the check. - **max_cpu_usage**: `float` — Maximum allowed CPU usage (percent) for process checks. +## Analyzer Args Class RegexSearchAnalyzerArgs + +### Description + +Arguments for RegexSearchAnalyzer (dict items match Dmesg-style error_regex). + +**Bases**: ['AnalyzerArgs'] + +**Link to code**: [analyzer_args.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/inband/regex_search/analyzer_args.py) + +### Annotations / fields + +- **error_regex**: `Optional[list[dict[str, Any]]]` — Regex patterns to search for; each dict may include regex (str), message, event_category, event_priority (same as Dmesg analyzer error_regex). +- **interval_to_collapse_event**: `int` — Seconds within which repeated events are collapsed into one. +- **num_timestamps**: `int` — Number of timestamps to include per event in output. + ## Analyzer Args Class RocmAnalyzerArgs **Bases**: ['AnalyzerArgs'] @@ -2021,3 +2337,57 @@ Sysfs settings for analysis via a list of checks (path, expected values, name). - **exp_vm_dirty_ratio**: `Optional[int]` — Expected vm.dirty_ratio value. 
- **exp_vm_dirty_writeback_centisecs**: `Optional[int]` — Expected vm.dirty_writeback_centisecs value. - **exp_kernel_numa_balancing**: `Optional[int]` — Expected kernel.numa_balancing value. + +## Analyzer Args Class RedfishEndpointAnalyzerArgs + +### Description + +Analyzer args for config-driven Redfish checks. + +**Bases**: ['AnalyzerArgs'] + +**Link to code**: [analyzer_args.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/ooband/redfish_endpoint/analyzer_args.py) + +### Annotations / fields + +- **checks**: `dict[str, dict[str, Union[int, float, str, bool, dict[str, Any]]]]` — Map: URI or '*' -> { property_path: constraint }. URI keys must match a key in the collected responses (exact match). Use '*' as the key to apply the inner constraints to every collected response body. Property paths use '/' for nesting and indices, e.g. 'Status/Health', 'PowerControl/0/PowerConsumedWatts'. Constraints: 'eq' — value must equal the given literal (int, float, str, bool). 'min' — value must be numeric and >= the given number. 'max' — value must be numeric and <= the given number. 'anyOf' — value must be in the given list (OR; any match passes). Example: { "/redfish/v1/Systems/1": { "Status/Health": { "anyOf": ["OK", "Warning"] }, "PowerState": "On" }, "*": { "Status/Health": { "anyOf": ["OK"] } } }. + +## Analyzer Args Class RedfishOemDiagAnalyzerArgs + +### Description + +Analyzer args for Redfish OEM diagnostic log results. + +**Bases**: ['AnalyzerArgs'] + +**Link to code**: [analyzer_args.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/ooband/redfish_oem_diag/analyzer_args.py) + +### Annotations / fields + +- **require_all_success**: `bool` — If True, analysis fails when any OEM type collection failed. + +## Analyzer Args Class ServiceabilityAnalyzerArgs + +### Description + +Analyzer args for serviceability plugins that run a configurable Python hub. 
+ +**Bases**: ['AnalyzerArgs'] + +**Link to code**: [analyzer_args.py](https://github.com/amd/node-scraper/blob/HEAD/nodescraper/plugins/serviceability/analyzer_args.py) + +### Annotations / fields + +- **hub_python_module**: `Optional[str]` — Import path for the hub module (class implements hub_analyze_method); hub_options forwards kwargs. +- **hub_display_name**: `Optional[str]` — Optional label for analyzer status messages. +- **afid_sag_path**: `Optional[str]` — Path to hub config (e.g. AFID_SAG.json); passed as hub_init_path_kwarg. +- **hub_init_path_kwarg**: `str` — Hub __init__ keyword that receives afid_sag_path. +- **hub_analyze_method**: `str` — Hub method called with rf_events first (default get_service_info). +- **skip_hub**: `bool` — If True, only build afid_events without running the service hub. +- **cper_decode_module**: `Optional[str]` — Module import path for CPER decoding when events include CPER attachments. +- **cper_decode_method**: `str` — Callable on cper_decode_module: file-like CPER in, (return_code, decode_dict) out. +- **hub_options**: `Optional[dict[str, Any]]` — Extra kwargs for hub __init__ and analyze; collected cper_data overrides cper_data key. +- **from_ac_cycle**: `int` — from_ac_cycle kwarg for the hub analyze call (merged after hub_options). +- **from_date**: `Optional[str]` — Optional from_date for the hub analyze call (merged after hub_options). +- **designation_serials**: `Optional[dict[str, str]]` — Optional designation_serials for the hub analyze call (merged after hub_options). +- **suppress_service_actions**: `Optional[list[str]]` — Optional suppress_service_actions for the hub analyze call (merged after hub_options). 
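The `checks` mapping documented for `RedfishEndpointAnalyzerArgs` above (URI or `'*'` keys, `/`-separated property paths, and the `eq` / `min` / `max` / `anyOf` constraints) can be illustrated with a minimal sketch. This is not the plugin's implementation: `resolve_path`, `satisfies`, and `run_checks` are hypothetical names, and treating a bare literal such as `"PowerState": "On"` as shorthand for equality is an assumption drawn from the documented example.

```python
from typing import Any

_MISSING = object()  # sentinel for "property path not present in the body"


def resolve_path(body: Any, path: str) -> Any:
    """Walk a '/'-separated path; numeric segments index into lists."""
    current = body
    for segment in path.split("/"):
        if isinstance(current, list):
            if not segment.isdigit() or int(segment) >= len(current):
                return _MISSING
            current = current[int(segment)]
        elif isinstance(current, dict):
            if segment not in current:
                return _MISSING
            current = current[segment]
        else:
            return _MISSING
    return current


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def satisfies(value: Any, constraint: Any) -> bool:
    """Return True if value meets every key present in the constraint."""
    if value is _MISSING:
        return False
    if not isinstance(constraint, dict):
        # Assumption: a bare literal is shorthand for an 'eq' constraint.
        return value == constraint
    if "eq" in constraint and value != constraint["eq"]:
        return False
    if "min" in constraint and not (_is_number(value) and value >= constraint["min"]):
        return False
    if "max" in constraint and not (_is_number(value) and value <= constraint["max"]):
        return False
    if "anyOf" in constraint and value not in constraint["anyOf"]:
        return False
    return True


def run_checks(checks: dict, responses: dict) -> list[str]:
    """Apply checks to collected responses and return failure messages."""
    failures: list[str] = []
    for uri_key, rules in checks.items():
        if uri_key == "*":
            targets = list(responses.items())  # apply to every response body
        elif uri_key in responses:
            targets = [(uri_key, responses[uri_key])]  # exact URI match only
        else:
            failures.append(f"{uri_key}: no collected response")
            continue
        for uri, body in targets:
            for path, constraint in rules.items():
                if not satisfies(resolve_path(body, path), constraint):
                    failures.append(f"{uri}: {path} failed {constraint!r}")
    return failures


responses = {
    "/redfish/v1/Systems/1": {
        "Status": {"Health": "OK"},
        "PowerState": "On",
        "PowerControl": [{"PowerConsumedWatts": 250}],
    }
}
checks = {
    "/redfish/v1/Systems/1": {
        "Status/Health": {"anyOf": ["OK", "Warning"]},
        "PowerState": "On",
    },
    "*": {"Status/Health": {"anyOf": ["OK"]}},
}
print(run_checks(checks, responses))  # prints []
```

Returning a list of failure messages rather than a single boolean keeps every violated constraint visible in one pass, which is one plausible way to feed an analyzer's event reporting.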
diff --git a/docs/generate_plugin_doc_bundle.py b/docs/generate_plugin_doc_bundle.py index b7676b6a..cd9897b0 100644 --- a/docs/generate_plugin_doc_bundle.py +++ b/docs/generate_plugin_doc_bundle.py @@ -26,10 +26,8 @@ """ Usage python generate_plugin_doc_bundle.py \ - --package /home/alexbara/node-scraper/nodescraper/plugins/inband \ - --output PLUGIN_DOC.md \ + --output docs/PLUGIN_DOC.md \ --update-readme-help - """ import argparse import importlib @@ -43,8 +41,14 @@ from typing import Any, Iterable, List, Optional, Type LINK_BASE_DEFAULT = "https://github.com/amd/node-scraper/blob/HEAD/" -REL_ROOT_DEFAULT = "nodescraper/plugins/inband" -DEFAULT_ROOT_PACKAGE = "nodescraper.plugins" +REL_ROOT_DEFAULT = "nodescraper/plugins" +# Import and document every concrete plugin under nodescraper.plugins (inband, ooband, +# generic_collection, regex_search, serviceability, …). +PACKAGE_PLUGINS_ROOT = "nodescraper.plugins" +# ``plugins_for_package_prefix`` matches on ``cls.__module__``; keep the trailing dot so +# ``nodescraper.plugins`` itself does not match every module starting with that string. +PLUGIN_MODULE_PREFIX = f"{PACKAGE_PLUGINS_ROOT}." 
+DEFAULT_PACKAGES = (PACKAGE_PLUGINS_ROOT,) def get_attr(obj: Any, name: str, default: Any = None) -> Any: @@ -182,6 +186,54 @@ def find_inband_plugin_base(): return get_attr(base_mod, "InBandDataPlugin") +def find_oob_plugin_bases() -> tuple[type, ...]: + """Return OOB plugin base classes (Redfish + BMC SSH) used to discover OOB plugins.""" + base_mod = importlib.import_module("nodescraper.base") + oob = get_attr(base_mod, "OOBandDataPlugin") + oob_ssh = get_attr(base_mod, "OOBSSHDataPlugin") + bases = [b for b in (oob, oob_ssh) if b is not None] + return tuple(bases) + + +def is_concrete_plugin_class(cls: type) -> bool: + if not inspect.isclass(cls): + return False + return not bool(get_attr(cls, "__abstractmethods__", set())) + + +def all_subclasses_union(bases: Iterable[type]) -> set[type]: + """All distinct concrete descendants across one or more base classes (transitive).""" + merged: set[type] = set() + for base in bases: + merged |= all_subclasses_single(base) + return merged + + +def all_subclasses_single(cls: type) -> set[type]: + seen, out, work = set(), set(), [cls] + while work: + parent = work.pop() + for sub in parent.__subclasses__(): + if sub not in seen: + seen.add(sub) + out.add(sub) + work.append(sub) + return out + + +def plugins_for_package_prefix(base_classes: Iterable[type], package_prefix: str) -> List[type]: + """Non-abstract plugin classes under ``base_classes`` whose ``__module__`` starts with *package_prefix*.""" + found: List[type] = [] + for cls in all_subclasses_union(base_classes): + mod = getattr(cls, "__module__", "") or "" + if not mod.startswith(package_prefix): + continue + if not is_concrete_plugin_class(cls): + continue + found.append(cls) + return found + + def link_anchor(obj: Any, kind: str) -> str: if obj is None or not inspect.isclass(obj): return "-" @@ -228,6 +280,126 @@ def add_cmd(s: Any): return cmds +# Optional human-readable bullets for plugins without CMD_* shell snippets (e.g. Redfish). 
+DOCUMENTATION_COLLECTION_ITEMS_ATTR = "DOCUMENTATION_COLLECTION_ITEMS" +DOCUMENTATION_ANALYSIS_ITEMS_ATTR = "DOCUMENTATION_ANALYSIS_ITEMS" + + +def _documentation_lines_for_attr(cls: Any, attr_name: str) -> List[str]: + if cls is None or not inspect.isclass(cls): + return [] + raw = get_attr(cls, attr_name, None) + if raw is None: + return [] + if isinstance(raw, str): + return [ln.strip() for ln in raw.splitlines() if ln.strip()] + if isinstance(raw, (list, tuple)): + return [str(x).strip() for x in raw if isinstance(x, str) and str(x).strip()] + return [] + + +def merge_unique_lines(*line_groups: Iterable[str]) -> List[str]: + """Concatenate line groups, dropping exact duplicates while preserving order.""" + seen: set[str] = set() + out: List[str] = [] + for group in line_groups: + for line in group: + if line not in seen: + seen.add(line) + out.append(line) + return out + + +def extract_collection_lines_for_table(plugin_cls: type, collector_cls: Any) -> List[str]: + """Shell CMD_* lines plus optional DOCUMENTATION_COLLECTION_ITEMS (collector then plugin).""" + cmd_lines: List[str] = [] + if inspect.isclass(collector_cls): + cmd_lines = extract_cmds_from_classvars(collector_cls) + doc_collector = _documentation_lines_for_attr( + collector_cls, DOCUMENTATION_COLLECTION_ITEMS_ATTR + ) + doc_plugin = _documentation_lines_for_attr(plugin_cls, DOCUMENTATION_COLLECTION_ITEMS_ATTR) + return merge_unique_lines(cmd_lines, doc_collector, doc_plugin) + + +def extract_analysis_doc_lines_for_table(plugin_cls: type, analyzer_cls: Any) -> List[str]: + """Optional DOCUMENTATION_ANALYSIS_ITEMS (analyzer then plugin) for the analyzer column.""" + doc_an = _documentation_lines_for_attr(analyzer_cls, DOCUMENTATION_ANALYSIS_ITEMS_ATTR) + doc_pl = _documentation_lines_for_attr(plugin_cls, DOCUMENTATION_ANALYSIS_ITEMS_ATTR) + return merge_unique_lines(doc_an, doc_pl) + + +def iter_plugin_collector_classes(plugin_cls: type) -> List[type]: + """Return collector class(es) for a plugin 
(supports tuple COLLECTOR via DataPlugin.get_collector_classes).""" + gcs = getattr(plugin_cls, "get_collector_classes", None) + if callable(gcs): + try: + return [c for c in gcs() if inspect.isclass(c)] + except Exception: + return [] + return [] + + +def collector_has_table_collection_coverage(plugin_cls: type, collector_cls: type) -> bool: + """True if the plugin table Collection cell would be non-empty from CMD_* or documentation lines.""" + if extract_cmds_from_classvars(collector_cls): + return True + if _documentation_lines_for_attr(collector_cls, DOCUMENTATION_COLLECTION_ITEMS_ATTR): + return True + if _documentation_lines_for_attr(plugin_cls, DOCUMENTATION_COLLECTION_ITEMS_ATTR): + return True + return False + + +def analyzer_has_table_analysis_coverage( + plugin_cls: type, analyzer_cls: type, analyzer_args_cls: Any +) -> bool: + """True if the Analyzer Args table cell would be non-empty from regex/args extraction or doc lines.""" + if _documentation_lines_for_attr(analyzer_cls, DOCUMENTATION_ANALYSIS_ITEMS_ATTR): + return True + if _documentation_lines_for_attr(plugin_cls, DOCUMENTATION_ANALYSIS_ITEMS_ATTR): + return True + if extract_regexes_and_args_from_analyzer(analyzer_cls, analyzer_args_cls): + return True + return False + + +def collect_plugin_doc_table_coverage_messages(plugins: List[type]) -> List[str]: + """Messages for plugins whose generated table would show '-' for collection or analysis unjustifiably.""" + msgs: List[str] = [] + for p in plugins: + pname = p.__name__ + for c in iter_plugin_collector_classes(p): + if not collector_has_table_collection_coverage(p, c): + msgs.append( + f"{pname}: collector {c.__name__} has no CMD_* command strings and no " + f"{DOCUMENTATION_COLLECTION_ITEMS_ATTR} on the collector or plugin." 
+ ) + an = get_attr(p, "ANALYZER", None) + aargs = get_attr(p, "ANALYZER_ARGS", None) + if inspect.isclass(an) and not analyzer_has_table_analysis_coverage(p, an, aargs): + msgs.append( + f"{pname}: analyzer {an.__name__} has no extractable analyzer table content " + f"(built-in regexes / *REGEX* attrs / analyzer args fields) and no " + f"{DOCUMENTATION_ANALYSIS_ITEMS_ATTR} on the analyzer or plugin." + ) + return msgs + + +def emit_plugin_doc_coverage_warnings(msgs: List[str], *, strict: bool) -> None: + if not msgs: + return + sys.stderr.write("PLUGIN_DOC.md table coverage warnings:\n") + for m in msgs: + sys.stderr.write(f" WARNING: {m}\n") + if strict: + sys.stderr.write( + f"error: {len(msgs)} plugin documentation coverage warning(s) " + "(--strict-plugin-doc-coverage)\n" + ) + sys.exit(1) + + def extract_regexes_and_args_from_analyzer( analyzer_cls: type, args_cls: Optional[type] ) -> List[str]: @@ -335,7 +507,8 @@ def escape_table_cell(s: str) -> str: """ if not s: return s - return s.replace("|", "&#124;").replace("\n", " ").replace("\r", " ") + # Avoid @ in cells (e.g. OData property names) being turned into mail/mention links in Outlook/HTML viewers. 
+ return s.replace("|", "&#124;").replace("@", "&#64;").replace("\n", " ").replace("\r", " ") def md_header(text: str, level: int = 2) -> str: @@ -454,14 +627,14 @@ def generate_plugin_table_rows(plugins: List[type]) -> List[List[str]]: an = get_attr(p, "ANALYZER", None) args = get_attr(p, "ANALYZER_ARGS", None) collector_args_cls = get_attr(p, "COLLECTOR_ARGS", None) - cmds: List[str] = [] - if inspect.isclass(col): - cmds = extract_cmds_from_classvars(col) + cmds = extract_collection_lines_for_table(p, col) - # Extract regexes and args from analyzer - regex_and_args = [] + # Extract regexes and args from analyzer; optional DOCUMENTATION_ANALYSIS_* lines first + regex_and_args: List[str] = extract_analysis_doc_lines_for_table( + p, an if inspect.isclass(an) else None + ) if inspect.isclass(an): - regex_and_args = extract_regexes_and_args_from_analyzer(an, args) + regex_and_args.extend(extract_regexes_and_args_from_analyzer(an, args)) # Extract collection args from collector args class collection_args_lines = extract_collection_args_from_collector_args(collector_args_cls) @@ -504,7 +677,13 @@ def render_collector_section(col: type, link_base: str, rel_root: Optional[str]) _url = setup_link(col, link_base, rel_root) s += md_kv("Link to code", f"[{Path(_url).name}]({_url})") - exclude = {"__doc__", "__module__", "__weakref__", "__dict__"} + exclude = { + "__doc__", + "__module__", + "__weakref__", + "__dict__", + DOCUMENTATION_COLLECTION_ITEMS_ATTR, + } cv = class_vars_dump(col, exclude) if cv: s += md_header("Class Variables", 3) + md_list(cv) @@ -516,6 +695,10 @@ def render_collector_section(col: type, link_base: str, rel_root: Optional[str]) if cmds: s += md_header("Commands", 3) + md_list(cmds) + doc_coll = _documentation_lines_for_attr(col, DOCUMENTATION_COLLECTION_ITEMS_ATTR) + if doc_coll: + s += md_header("Documented collection", 3) + md_list(doc_coll) + return s @@ -529,11 +712,21 @@ def render_analyzer_section(an: type, link_base: str, rel_root: Optional[str]) - 
_url = setup_link(an, link_base, rel_root) s += md_kv("Link to code", f"[{Path(_url).name}]({_url})") - exclude = {"__doc__", "__module__", "__weakref__", "__dict__"} + exclude = { + "__doc__", + "__module__", + "__weakref__", + "__dict__", + DOCUMENTATION_ANALYSIS_ITEMS_ATTR, + } cv = class_vars_dump(an, exclude) if cv: s += md_header("Class Variables", 3) + md_list(cv) + doc_an = _documentation_lines_for_attr(an, DOCUMENTATION_ANALYSIS_ITEMS_ATTR) + if doc_an: + s += md_header("Documented analysis", 3) + md_list(doc_an) + # Add regex patterns if present (pass None for args_cls since we don't have context here) regex_info = extract_regexes_and_args_from_analyzer(an, None) if regex_info: @@ -648,9 +841,21 @@ def main(): description="Generate Plugin Table and detail sections with setup_link + rel-root." ) ap.add_argument( - "--package", default=DEFAULT_ROOT_PACKAGE, help="Dotted package or filesystem path" + "--package", + action="append", + dest="packages", + default=None, + metavar="PKG", + help=( + "Dotted package or filesystem path to import in addition to the default plugin " + f"packages ({', '.join(DEFAULT_PACKAGES)}). Repeatable." + ), + ) + ap.add_argument( + "--output", + default="docs/PLUGIN_DOC.md", + help="Output Markdown file (default: docs/PLUGIN_DOC.md under repo root)", ) - ap.add_argument("--output", default="PLUGIN_DOC.md", help="Output Markdown file") ap.add_argument( "--update-readme-help", action="store_true", @@ -661,31 +866,57 @@ def main(): default=None, help="Path to README.md (default: README.md in current working directory)", ) + ap.add_argument( + "--strict-plugin-doc-coverage", + action="store_true", + help=( + "Exit with status 1 if any plugin lacks CMD_* / DOCUMENTATION_COLLECTION_ITEMS " + "for collectors or lacks analyzer table content / DOCUMENTATION_ANALYSIS_ITEMS " + "when an analyzer is defined." 
+ ), + ) args = ap.parse_args() - root = args.package - root_path = Path(root) - if os.sep in root or root_path.exists(): - root = dotted_from_path(root_path) - base = find_inband_plugin_base() - import_all_modules(root) - - def all_subclasses(cls: Type) -> set[type]: - seen, out, work = set(), set(), [cls] - while work: - parent = work.pop() - for sub in parent.__subclasses__(): - if sub not in seen: - seen.add(sub) - out.add(sub) - work.append(sub) - return out - - plugins = [c for c in all_subclasses(base) if c is not base] - plugins = [c for c in plugins if not get_attr(c, "__abstractmethods__", set())] - plugins.sort(key=lambda c: f"{c.__module__}.{c.__name__}".lower()) - - rows = generate_plugin_table_rows(plugins) + normalized_extra: List[str] = [] + if args.packages: + for root in args.packages: + root_path = Path(root) + if os.sep in root or root_path.exists(): + root = dotted_from_path(root_path) + normalized_extra.append(root) + + # Always import the full nodescraper.plugins tree; append optional extras. 
+ to_import: List[str] = [] + seen_pkg: set[str] = set() + for pkg in list(DEFAULT_PACKAGES) + normalized_extra: + if pkg not in seen_pkg: + seen_pkg.add(pkg) + to_import.append(pkg) + + for pkg in to_import: + import_all_modules(pkg) + + inband_base = find_inband_plugin_base() + oob_bases = find_oob_plugin_bases() + + ib_plugins = sorted( + plugins_for_package_prefix((inband_base,), PLUGIN_MODULE_PREFIX), + key=lambda c: f"{c.__module__}.{c.__name__}".lower(), + ) + oob_plugins = sorted( + plugins_for_package_prefix(oob_bases, PLUGIN_MODULE_PREFIX), + key=lambda c: f"{c.__module__}.{c.__name__}".lower(), + ) + plugins = sorted( + set(ib_plugins) | set(oob_plugins), + key=lambda c: f"{c.__module__}.{c.__name__}".lower(), + ) + + coverage_msgs = collect_plugin_doc_table_coverage_messages(plugins) + emit_plugin_doc_coverage_warnings(coverage_msgs, strict=args.strict_plugin_doc_coverage) + + ib_rows = generate_plugin_table_rows(ib_plugins) + oob_rows = generate_plugin_table_rows(oob_plugins) headers = [ "Plugin", "Collection", @@ -718,8 +949,10 @@ def all_subclasses(cls: Type) -> set[type]: out = [] out.append(md_header("Plugin Documentation", 1)) - out.append(md_header("Plugin Table", 1)) - out.append(render_table(headers, rows)) + out.append(md_header("IB Plugins", 1)) + out.append(render_table(headers, ib_rows)) + out.append(md_header("OOB plugins", 1)) + out.append(render_table(headers, oob_rows)) if collectors: out.append(md_header("Collectors", 1)) diff --git a/nodescraper/cli/cli.py b/nodescraper/cli/cli.py index 129ef136..30dc8792 100644 --- a/nodescraper/cli/cli.py +++ b/nodescraper/cli/cli.py @@ -32,6 +32,7 @@ import platform import sys import uuid +from collections.abc import Callable, Sequence from typing import Optional import nodescraper @@ -65,6 +66,7 @@ from nodescraper.constants import DEFAULT_LOGGER from nodescraper.enums import ExecutionStatus, SystemInteractionLevel, SystemLocation from nodescraper.models import SystemInfo +from 
nodescraper.models.pluginresult import PluginResult from nodescraper.pluginexecutor import PluginExecutor from nodescraper.pluginregistry import PluginRegistry @@ -461,6 +463,7 @@ def main( arg_input: Optional[list[str]] = None, *, host_cli_args: Optional[argparse.Namespace] = None, + plugin_run_result_hooks: Optional[Sequence[Callable[[PluginResult], None]]] = None, ): """Main entry point for the CLI @@ -468,6 +471,8 @@ def main( arg_input (Optional[list[str]], optional): list of args to parse. Defaults to None. host_cli_args: Optional namespace from an embedding host (e.g. detect-errors) for code that calls get_plugin_run_invocation during the plugin queue. + plugin_run_result_hooks: Optional callbacks invoked with each plugin's :class:`PluginResult` + after ``run()`` completes (used by embedded hosts such as error-scraper). """ if arg_input is None: arg_input = sys.argv[1:] @@ -643,6 +648,7 @@ def main( sname=sname, host_cli_args=host_cli_args, session_id=str(uuid.uuid4()), + plugin_run_result_hooks=plugin_run_result_hooks, ) log_system_info(log_path, system_info, logger) diff --git a/nodescraper/cli/embed.py b/nodescraper/cli/embed.py index 60d94515..b1e91c37 100644 --- a/nodescraper/cli/embed.py +++ b/nodescraper/cli/embed.py @@ -27,9 +27,11 @@ from __future__ import annotations import argparse +from collections.abc import Callable, Sequence from typing import Optional from nodescraper.cli.cli import get_cli_top_level_subcommands +from nodescraper.models.pluginresult import PluginResult CLI_TOP_LEVEL_SUBCOMMANDS = get_cli_top_level_subcommands() @@ -45,29 +47,38 @@ def run_cli_return_code( argv: list[str], *, host_cli_args: Optional[argparse.Namespace] = None, + plugin_run_result_hooks: Optional[Sequence[Callable[[PluginResult], None]]] = None, ) -> int: """Run nodescraper in-process; same behavior as :func:`run_main_return_code`. Args: argv: Tokens after the program name. host_cli_args: Optional host namespace forwarded to :func:`nodescraper.cli.cli.main`. 
+ plugin_run_result_hooks: Optional callbacks invoked with each + :class:`~nodescraper.models.pluginresult.PluginResult` after a plugin finishes (embed hosts). Returns: Integer exit code (``SystemExit`` is mapped, not raised). """ - return run_main_return_code(argv, host_cli_args=host_cli_args) + return run_main_return_code( + argv, + host_cli_args=host_cli_args, + plugin_run_result_hooks=plugin_run_result_hooks, + ) def run_main_return_code( arg_input: list[str], *, host_cli_args: Optional[argparse.Namespace] = None, + plugin_run_result_hooks: Optional[Sequence[Callable[[PluginResult], None]]] = None, ) -> int: """Run :func:`nodescraper.cli.cli.main` and map ``SystemExit`` to an exit code. Args: arg_input: Tokens after the program name. host_cli_args: Optional host namespace for embedded runs. + plugin_run_result_hooks: Optional per-plugin result callbacks for embedded runs. Returns: Integer exit code. @@ -75,7 +86,11 @@ def run_main_return_code( from nodescraper.cli.cli import main try: - main(arg_input, host_cli_args=host_cli_args) + main( + arg_input, + host_cli_args=host_cli_args, + plugin_run_result_hooks=plugin_run_result_hooks, + ) except SystemExit as exc: code = exc.code if code is None: diff --git a/nodescraper/cli/invocation.py b/nodescraper/cli/invocation.py index ee59e4a6..9edc7214 100644 --- a/nodescraper/cli/invocation.py +++ b/nodescraper/cli/invocation.py @@ -28,6 +28,7 @@ import argparse import logging +from collections.abc import Callable, Sequence from contextlib import contextmanager from contextvars import ContextVar from dataclasses import dataclass @@ -72,6 +73,7 @@ class PluginRunInvocation: sname: str host_cli_args: Optional[argparse.Namespace] = None session_id: Optional[str] = None + plugin_run_result_hooks: tuple[Callable[[PluginResult], None], ...] 
= () def run_plugin_queue_with_invocation( @@ -86,8 +88,12 @@ def run_plugin_queue_with_invocation( sname: str, host_cli_args: Optional[argparse.Namespace] = None, session_id: Optional[str] = None, + plugin_run_result_hooks: Optional[Sequence[Callable[[PluginResult], None]]] = None, ) -> list[PluginResult]: """Constructs the plugin executor, binds invocation context, and runs the plugin queue.""" + hooks_tuple: tuple[Callable[[PluginResult], None], ...] = ( + tuple(plugin_run_result_hooks) if plugin_run_result_hooks else () + ) inv = PluginRunInvocation( plugin_reg=plugin_reg, parsed_args=parsed_args, @@ -99,6 +105,7 @@ def run_plugin_queue_with_invocation( sname=sname, host_cli_args=host_cli_args, session_id=session_id, + plugin_run_result_hooks=hooks_tuple, ) plugin_executor = PluginExecutor( logger=logger, @@ -108,6 +115,7 @@ def run_plugin_queue_with_invocation( log_path=log_path, plugin_registry=plugin_reg, session_id=session_id, + plugin_run_result_hooks=hooks_tuple, ) with plugin_run_invocation_scope(inv): return plugin_executor.run_queue() diff --git a/nodescraper/configbuilder.py b/nodescraper/configbuilder.py index 7823b95a..bc8f1b8a 100644 --- a/nodescraper/configbuilder.py +++ b/nodescraper/configbuilder.py @@ -24,6 +24,7 @@ # ############################################################################### import enum +import inspect import logging from typing import Any, Optional, Type, Union @@ -64,9 +65,17 @@ def gen_config(self, plugin_names: list[str]) -> PluginConfig: @classmethod def _build_plugin_config(cls, plugin_class: Type[PluginInterface]) -> dict: type_map = TypeUtils.get_func_arg_types(plugin_class.run, plugin_class) + run_sig = inspect.signature(plugin_class.run) config = {} for arg, arg_data in type_map.items(): + param = run_sig.parameters.get(arg) + # abstraction level for the ServiceabilityPlugin to allow kwargs for hub call + if param is not None and param.kind in ( + inspect.Parameter.VAR_KEYWORD, + inspect.Parameter.VAR_POSITIONAL, 
+ ): + continue cls._update_config(arg, arg_data, config) return config diff --git a/nodescraper/interfaces/dataanalyzertask.py b/nodescraper/interfaces/dataanalyzertask.py index 0e6b3b06..fd6cc284 100644 --- a/nodescraper/interfaces/dataanalyzertask.py +++ b/nodescraper/interfaces/dataanalyzertask.py @@ -99,7 +99,7 @@ def wrapper( result = analyzer.result result.finalize(analyzer.logger) - analyzer._run_hooks(result) + analyzer._run_hooks(result, data=data) return result diff --git a/nodescraper/interfaces/datacollectortask.py b/nodescraper/interfaces/datacollectortask.py index 3c30a6ea..60826b16 100644 --- a/nodescraper/interfaces/datacollectortask.py +++ b/nodescraper/interfaces/datacollectortask.py @@ -204,7 +204,8 @@ def __init_subclass__(cls, **kwargs) -> None: if not issubclass(cls.DATA_MODEL, DataModel): raise TypeError(f"DATA_MODEL must be a subclass of DataModel in {cls.__name__}") if hasattr(cls, "collect_data"): - cls.collect_data = collect_decorator(cls.collect_data) + if "collect_data" in vars(cls): + cls.collect_data = collect_decorator(cls.collect_data) else: raise TypeError(f"Data collector {cls.__name__} must implement collect_data") diff --git a/nodescraper/interfaces/plugin.py b/nodescraper/interfaces/plugin.py index 06959b54..9e22d346 100644 --- a/nodescraper/interfaces/plugin.py +++ b/nodescraper/interfaces/plugin.py @@ -26,7 +26,7 @@ import abc import inspect import logging -from typing import Callable, Generic, Optional, Type, Union +from typing import Any, Callable, Generic, Optional, Type, Union from nodescraper.constants import DEFAULT_EVENT_REPORTER, DEFAULT_LOGGER from nodescraper.models import PluginResult, SystemInfo @@ -125,7 +125,7 @@ def _update_queue(self, queue_item: tuple) -> None: self.queue_callback(queue_item) @abc.abstractmethod - def run(self, **kwargs) -> PluginResult: + def run(self, **kwargs: Any) -> PluginResult: """Plugin run function Returns: diff --git a/nodescraper/pluginexecutor.py b/nodescraper/pluginexecutor.py 
index 4f3febed..bb22b8a9 100644 --- a/nodescraper/pluginexecutor.py +++ b/nodescraper/pluginexecutor.py @@ -30,6 +30,7 @@ import logging import uuid from collections import deque +from collections.abc import Callable, Sequence from typing import Optional, Type, Union from pydantic import BaseModel @@ -38,6 +39,7 @@ from nodescraper.connection.oob_ssh import OobSshConnectionManager from nodescraper.constants import DEFAULT_LOGGER from nodescraper.interfaces import ConnectionManager, DataPlugin, PluginInterface +from nodescraper.interfaces.taskresulthook import TaskResultHook from nodescraper.models import PluginConfig, SystemInfo from nodescraper.models.pluginresult import PluginResult from nodescraper.pluginregistry import PluginRegistry @@ -57,6 +59,7 @@ def __init__( plugin_registry: Optional[PluginRegistry] = None, log_path: Optional[str] = None, session_id: Optional[str] = None, + plugin_run_result_hooks: Optional[Sequence[Callable[[PluginResult], None]]] = None, ): if logger is None: @@ -89,7 +92,11 @@ def __init__( self.log_path = log_path - self.connection_result_hooks = [] + self.plugin_run_result_hooks: list[Callable[[PluginResult], None]] = ( + list(plugin_run_result_hooks) if plugin_run_result_hooks else [] + ) + + self.connection_result_hooks: list[TaskResultHook] = [] if log_path: self.connection_result_hooks.append(FileSystemLogHook(log_base_path=log_path)) @@ -263,7 +270,10 @@ def run_queue(self) -> list[PluginResult]: continue self.logger.info("-" * 50) - plugin_results.append(plugin_inst.run(**run_payload)) + plugin_result = plugin_inst.run(**run_payload) + plugin_results.append(plugin_result) + for hook in self.plugin_run_result_hooks: + hook(plugin_result) except Exception as e: self.logger.exception( "Unexpected exception when running plugin %s: %s", plugin_name, e diff --git a/nodescraper/plugins/generic_collection/generic_collection_collector.py b/nodescraper/plugins/generic_collection/generic_collection_collector.py index 873f572a..1c15462b 
100644 --- a/nodescraper/plugins/generic_collection/generic_collection_collector.py +++ b/nodescraper/plugins/generic_collection/generic_collection_collector.py @@ -41,6 +41,11 @@ class GenericCollectionCollector( DATA_MODEL = GenericCollectionDataModel SUPPORTED_OS_FAMILY: set[OSFamily] = {OSFamily.WINDOWS, OSFamily.LINUX, OSFamily.UNKNOWN} + DOCUMENTATION_COLLECTION_ITEMS: tuple[str, ...] = ( + "Runs each command from collection_args.commands on the target (in-band host or BMC over OOB SSH).", + "Commands are user-configured; there are no fixed CMD_* class fields.", + ) + def collect_data( self, args: Optional[GenericCollectionCollectorArgs] = None ) -> tuple[TaskResult, Optional[GenericCollectionDataModel]]: diff --git a/nodescraper/plugins/inband/amdsmi/amdsmidata.py b/nodescraper/plugins/inband/amdsmi/amdsmidata.py index fd603028..3b8aae3c 100644 --- a/nodescraper/plugins/inband/amdsmi/amdsmidata.py +++ b/nodescraper/plugins/inband/amdsmi/amdsmidata.py @@ -523,6 +523,9 @@ class StaticCacheInfoItem(AmdSmiBaseModel): na_validator = field_validator("cache_size", mode="before")(na_to_none) +_STATIC_CLOCK_FREQ_LEVEL_VALIDATOR_FIELDS = tuple(f"Level_{i}" for i in range(16)) + + class StaticFrequencyLevels(AmdSmiBaseModel): """Static clock frequency levels; each level is normalized to ``ValueUnit``.""" @@ -534,8 +537,21 @@ class StaticFrequencyLevels(AmdSmiBaseModel): Level_0: ValueUnit = Field(..., alias="Level 0") Level_1: Optional[ValueUnit] = Field(default=None, alias="Level 1") Level_2: Optional[ValueUnit] = Field(default=None, alias="Level 2") - - _level_value_unit = field_validator("Level_0", "Level_1", "Level_2", mode="before")( + Level_3: Optional[ValueUnit] = Field(default=None, alias="Level 3") + Level_4: Optional[ValueUnit] = Field(default=None, alias="Level 4") + Level_5: Optional[ValueUnit] = Field(default=None, alias="Level 5") + Level_6: Optional[ValueUnit] = Field(default=None, alias="Level 6") + Level_7: Optional[ValueUnit] = Field(default=None, 
alias="Level 7") + Level_8: Optional[ValueUnit] = Field(default=None, alias="Level 8") + Level_9: Optional[ValueUnit] = Field(default=None, alias="Level 9") + Level_10: Optional[ValueUnit] = Field(default=None, alias="Level 10") + Level_11: Optional[ValueUnit] = Field(default=None, alias="Level 11") + Level_12: Optional[ValueUnit] = Field(default=None, alias="Level 12") + Level_13: Optional[ValueUnit] = Field(default=None, alias="Level 13") + Level_14: Optional[ValueUnit] = Field(default=None, alias="Level 14") + Level_15: Optional[ValueUnit] = Field(default=None, alias="Level 15") + + _level_value_unit = field_validator(*_STATIC_CLOCK_FREQ_LEVEL_VALIDATOR_FIELDS, mode="before")( coerce_value_unit_input ) diff --git a/nodescraper/plugins/inband/pcie/pcie_collector.py b/nodescraper/plugins/inband/pcie/pcie_collector.py index eb3bb5f7..624122ec 100755 --- a/nodescraper/plugins/inband/pcie/pcie_collector.py +++ b/nodescraper/plugins/inband/pcie/pcie_collector.py @@ -489,10 +489,20 @@ def get_cap_cfg( for cap_id, cap_addr in cap_data.items(): if cap_id == 0: continue - if cap_addr >= 0x100: - cap_enum: Enum = ExtendedCapabilityEnum(cap_id) - else: - cap_enum = CapabilityEnum(cap_id) + cap_type = ExtendedCapabilityEnum if cap_addr >= 0x100 else CapabilityEnum + try: + cap_enum: Enum = cap_type(cap_id) + except ValueError: + # Unknown / not-yet-modeled capability id. Skip it instead of + # aborting the whole collection so one new cap id can't take + # down the entire PCIe plugin. 
+ self.logger.warning( + "Skipping unknown %s id 0x%X at offset 0x%X", + cap_type.__name__, + cap_id, + cap_addr, + ) + continue cap_cls = self.get_cap_struct(cap_enum) if cap_cls is None: continue diff --git a/nodescraper/plugins/inband/pcie/pcie_data.py b/nodescraper/plugins/inband/pcie/pcie_data.py index 83a03403..70da6375 100644 --- a/nodescraper/plugins/inband/pcie/pcie_data.py +++ b/nodescraper/plugins/inband/pcie/pcie_data.py @@ -157,6 +157,7 @@ class ExtendedCapabilityEnum(Enum): ALT_PROTOCOL = 0x002B # Alternate Protocol Extended Capability SFI = 0x002C # System Firmware Intermediary (SFI)Extended Capability DOE = 0x2E # 0x2e Data Object Exchange + IDE = 0x2F # 0x2f Integrity and Data Encryption (IDE) INT_DOE = 0x30 # 0x30 Integrity and Data Encryption diff --git a/nodescraper/plugins/regex_search/__init__.py b/nodescraper/plugins/inband/regex_search/__init__.py similarity index 97% rename from nodescraper/plugins/regex_search/__init__.py rename to nodescraper/plugins/inband/regex_search/__init__.py index 708b6b04..b8ee4a8e 100644 --- a/nodescraper/plugins/regex_search/__init__.py +++ b/nodescraper/plugins/inband/regex_search/__init__.py @@ -1,28 +1,28 @@ -############################################################################### -# -# MIT License -# -# Copyright (c) 2026 Advanced Micro Devices, Inc. -# -# Permission is hereby granted, free of charge, to any person obtaining a copy -# of this software and associated documentation files (the "Software"), to deal -# in the Software without restriction, including without limitation the rights -# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -# copies of the Software, and to permit persons to whom the Software is -# furnished to do so, subject to the following conditions: -# -# The above copyright notice and this permission notice shall be included in all -# copies or substantial portions of the Software. 
-# -# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -# SOFTWARE. -# -############################################################################### -from .regex_search_plugin import RegexSearchPlugin - -__all__ = ["RegexSearchPlugin"] +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+# +############################################################################### +from .regex_search_plugin import RegexSearchPlugin + +__all__ = ["RegexSearchPlugin"] diff --git a/nodescraper/plugins/regex_search/analyzer_args.py b/nodescraper/plugins/inband/regex_search/analyzer_args.py similarity index 97% rename from nodescraper/plugins/regex_search/analyzer_args.py rename to nodescraper/plugins/inband/regex_search/analyzer_args.py index b30acb7e..254d6a13 100644 --- a/nodescraper/plugins/regex_search/analyzer_args.py +++ b/nodescraper/plugins/inband/regex_search/analyzer_args.py @@ -1,50 +1,50 @@ -############################################################################### -# -# MIT License -# -# Copyright (c) 2026 Advanced Micro Devices, Inc. -# -# Permission is hereby granted, free of charge, to any person obtaining a copy -# of this software and associated documentation files (the "Software"), to deal -# in the Software without restriction, including without limitation the rights -# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -# copies of the Software, and to permit persons to whom the Software is -# furnished to do so, subject to the following conditions: -# -# The above copyright notice and this permission notice shall be included in all -# copies or substantial portions of the Software. -# -# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -# SOFTWARE. 
-# -############################################################################### -from typing import Any, Optional - -from pydantic import Field - -from nodescraper.models import AnalyzerArgs - - -class RegexSearchAnalyzerArgs(AnalyzerArgs): - """Arguments for RegexSearchAnalyzer (dict items match Dmesg-style error_regex).""" - - error_regex: Optional[list[dict[str, Any]]] = Field( - default=None, - description=( - "Regex patterns to search for; each dict may include regex (str), message, " - "event_category, event_priority (same as Dmesg analyzer error_regex). " - ), - ) - interval_to_collapse_event: int = Field( - default=60, - description="Seconds within which repeated events are collapsed into one.", - ) - num_timestamps: int = Field( - default=3, - description="Number of timestamps to include per event in output.", - ) +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +from typing import Any, Optional + +from pydantic import Field + +from nodescraper.models import AnalyzerArgs + + +class RegexSearchAnalyzerArgs(AnalyzerArgs): + """Arguments for RegexSearchAnalyzer (dict items match Dmesg-style error_regex).""" + + error_regex: Optional[list[dict[str, Any]]] = Field( + default=None, + description=( + "Regex patterns to search for; each dict may include regex (str), message, " + "event_category, event_priority (same as Dmesg analyzer error_regex). " + ), + ) + interval_to_collapse_event: int = Field( + default=60, + description="Seconds within which repeated events are collapsed into one.", + ) + num_timestamps: int = Field( + default=3, + description="Number of timestamps to include per event in output.", + ) diff --git a/nodescraper/plugins/regex_search/regex_search_analyzer.py b/nodescraper/plugins/inband/regex_search/regex_search_analyzer.py similarity index 97% rename from nodescraper/plugins/regex_search/regex_search_analyzer.py rename to nodescraper/plugins/inband/regex_search/regex_search_analyzer.py index 0b4384f4..85da6501 100644 --- a/nodescraper/plugins/regex_search/regex_search_analyzer.py +++ b/nodescraper/plugins/inband/regex_search/regex_search_analyzer.py @@ -1,102 +1,102 @@ -############################################################################### -# -# MIT License -# -# Copyright (c) 2026 Advanced Micro Devices, Inc. 
-# -# Permission is hereby granted, free of charge, to any person obtaining a copy -# of this software and associated documentation files (the "Software"), to deal -# in the Software without restriction, including without limitation the rights -# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -# copies of the Software, and to permit persons to whom the Software is -# furnished to do so, subject to the following conditions: -# -# The above copyright notice and this permission notice shall be included in all -# copies or substantial portions of the Software. -# -# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -# SOFTWARE. -# -############################################################################### -import os -from typing import Optional, Union - -from nodescraper.base.regexanalyzer import ErrorRegex, RegexAnalyzer, RegexEvent -from nodescraper.enums import ExecutionStatus -from nodescraper.models import TaskResult - -from .analyzer_args import RegexSearchAnalyzerArgs -from .regex_search_data import RegexSearchData - - -class RegexSearchAnalyzer(RegexAnalyzer[RegexSearchData, RegexSearchAnalyzerArgs]): - """Run user-provided regexes against text loaded from --data (file or directory).""" - - DATA_MODEL = RegexSearchData - - ERROR_REGEX: list[ErrorRegex] = [] - - def _build_regex_event( - self, regex_obj: ErrorRegex, match: Union[str, list[str]], source: str - ) -> RegexEvent: - """Augment the default event text with a file path when the origin is a concrete path. 
- - Args: - regex_obj: Metadata for the rule that produced the match. - match: Substring or grouped capture text from the pattern. - source: Origin label, or an absolute path when matching per file. - - Returns: - Match record with an extended description when a path-like source is present. - """ - event = super()._build_regex_event(regex_obj, match, source) - if source and source != "regex_search": - event.description = f"{regex_obj.message} [file: {source}]" - return event - - def analyze_data( - self, - data: RegexSearchData, - args: Optional[RegexSearchAnalyzerArgs] = None, - ) -> TaskResult: - """Scan loaded inputs with the given patterns, or mark the task not run if inputs are incomplete. - - Args: - data: Aggregated and per-file text loaded from the user data path. - args: Optional pattern list and timing knobs; omitted or empty patterns skip work. - - Returns: - Work outcome with match events, or a not-run status when patterns are absent. - """ - if args is None or not args.error_regex: - self.result.status = ExecutionStatus.NOT_RAN - self.result.message = "Analysis args need to be provided for the analyzer to run" - return self.result - - final_regex = self._convert_and_extend_error_regex(args.error_regex, []) - - if data.files: - for rel_path in sorted(data.files.keys()): - file_content = data.files[rel_path] - abs_source = os.path.normpath(os.path.join(data.data_root, rel_path)) - self.result.events += self.check_all_regexes( - content=file_content, - source=abs_source, - error_regex=final_regex, - num_timestamps=args.num_timestamps, - interval_to_collapse_event=args.interval_to_collapse_event, - ) - else: - self.result.events += self.check_all_regexes( - content=data.content, - source=data.data_root or "regex_search", - error_regex=final_regex, - num_timestamps=args.num_timestamps, - interval_to_collapse_event=args.interval_to_collapse_event, - ) - return self.result +############################################################################### +# +# 
MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+# +############################################################################### +import os +from typing import Optional, Union + +from nodescraper.base.regexanalyzer import ErrorRegex, RegexAnalyzer, RegexEvent +from nodescraper.enums import ExecutionStatus +from nodescraper.models import TaskResult + +from .analyzer_args import RegexSearchAnalyzerArgs +from .regex_search_data import RegexSearchData + + +class RegexSearchAnalyzer(RegexAnalyzer[RegexSearchData, RegexSearchAnalyzerArgs]): + """Run user-provided regexes against text loaded from --data (file or directory).""" + + DATA_MODEL = RegexSearchData + + ERROR_REGEX: list[ErrorRegex] = [] + + def _build_regex_event( + self, regex_obj: ErrorRegex, match: Union[str, list[str]], source: str + ) -> RegexEvent: + """Augment the default event text with a file path when the origin is a concrete path. + + Args: + regex_obj: Metadata for the rule that produced the match. + match: Substring or grouped capture text from the pattern. + source: Origin label, or an absolute path when matching per file. + + Returns: + Match record with an extended description when a path-like source is present. + """ + event = super()._build_regex_event(regex_obj, match, source) + if source and source != "regex_search": + event.description = f"{regex_obj.message} [file: {source}]" + return event + + def analyze_data( + self, + data: RegexSearchData, + args: Optional[RegexSearchAnalyzerArgs] = None, + ) -> TaskResult: + """Scan loaded inputs with the given patterns, or mark the task not run if inputs are incomplete. + + Args: + data: Aggregated and per-file text loaded from the user data path. + args: Optional pattern list and timing knobs; omitted or empty patterns skip work. + + Returns: + Work outcome with match events, or a not-run status when patterns are absent. 
+ """ + if args is None or not args.error_regex: + self.result.status = ExecutionStatus.NOT_RAN + self.result.message = "Analysis args need to be provided for the analyzer to run" + return self.result + + final_regex = self._convert_and_extend_error_regex(args.error_regex, []) + + if data.files: + for rel_path in sorted(data.files.keys()): + file_content = data.files[rel_path] + abs_source = os.path.normpath(os.path.join(data.data_root, rel_path)) + self.result.events += self.check_all_regexes( + content=file_content, + source=abs_source, + error_regex=final_regex, + num_timestamps=args.num_timestamps, + interval_to_collapse_event=args.interval_to_collapse_event, + ) + else: + self.result.events += self.check_all_regexes( + content=data.content, + source=data.data_root or "regex_search", + error_regex=final_regex, + num_timestamps=args.num_timestamps, + interval_to_collapse_event=args.interval_to_collapse_event, + ) + return self.result diff --git a/nodescraper/plugins/regex_search/regex_search_data.py b/nodescraper/plugins/inband/regex_search/regex_search_data.py similarity index 97% rename from nodescraper/plugins/regex_search/regex_search_data.py rename to nodescraper/plugins/inband/regex_search/regex_search_data.py index a12b2841..1e094d45 100644 --- a/nodescraper/plugins/regex_search/regex_search_data.py +++ b/nodescraper/plugins/inband/regex_search/regex_search_data.py @@ -1,107 +1,107 @@ -############################################################################### -# -# MIT License -# -# Copyright (c) 2026 Advanced Micro Devices, Inc. 
-# -# Permission is hereby granted, free of charge, to any person obtaining a copy -# of this software and associated documentation files (the "Software"), to deal -# in the Software without restriction, including without limitation the rights -# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -# copies of the Software, and to permit persons to whom the Software is -# furnished to do so, subject to the following conditions: -# -# The above copyright notice and this permission notice shall be included in all -# copies or substantial portions of the Software. -# -# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -# SOFTWARE. -# -############################################################################### -import os -from pathlib import Path -from typing import Union - -from pydantic import Field - -from nodescraper.models import DataModel -from nodescraper.utils import get_unique_filename - - -class RegexSearchData(DataModel): - """Loaded file or directory contents passed to the analyzer (via --data).""" - - content: str - data_root: str = "" - files: dict[str, str] = Field(default_factory=dict) - - def log_model(self, log_path: str) -> None: - """Persist the aggregated text payload as one log file under the given base path. - - Args: - log_path: Directory where the log file should be written. - - Returns: - None. 
- """ - log_name = os.path.join(log_path, get_unique_filename(log_path, "regex_search_source.log")) - with open(log_name, "w", encoding="utf-8") as log_file: - log_file.write(self.content) - - @classmethod - def import_model(cls, model_input: Union[dict, str]) -> "RegexSearchData": - """Import datamodel. - - Args: - model_input: Keyed fields for direct validation, or a path string to load from disk. - - Returns: - Instance with content, root path, and per-file bodies filled in. - """ - if isinstance(model_input, dict): - return cls.model_validate(model_input) - if isinstance(model_input, str): - return cls._from_filesystem_path(model_input) - raise ValueError("Invalid input for regex search data") - - @classmethod - def _from_filesystem_path(cls, path: str) -> "RegexSearchData": - """Read one file or every file under a directory into a merged view plus a path-to-text map. - - Args: - path: Absolute or resolvable path to a file or directory. - - Returns: - Instance built from the read text and discovered relative paths. 
- - """ - path = os.path.abspath(path) - if not os.path.exists(path): - raise FileNotFoundError(f"Path not found: {path}") - if os.path.isfile(path): - text = Path(path).read_text(encoding="utf-8", errors="replace") - rel = os.path.basename(path) - data_root = os.path.dirname(path) or os.path.abspath(os.path.curdir) - return cls(content=text, data_root=data_root, files={rel: text}) - if os.path.isdir(path): - files: dict[str, str] = {} - parts: list[str] = [] - for root, _dirs, filenames in os.walk(path): - for name in sorted(filenames): - fp = os.path.join(root, name) - if not os.path.isfile(fp): - continue - rel = os.path.relpath(fp, path) - try: - text = Path(fp).read_text(encoding="utf-8", errors="replace") - except OSError: - continue - files[rel] = text - parts.append(f"===== {rel} =====\n{text}") - return cls(content="\n".join(parts), data_root=path, files=files) - raise ValueError(f"Unsupported path type: {path}") +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +import os +from pathlib import Path +from typing import Union + +from pydantic import Field + +from nodescraper.models import DataModel +from nodescraper.utils import get_unique_filename + + +class RegexSearchData(DataModel): + """Loaded file or directory contents passed to the analyzer (via --data).""" + + content: str + data_root: str = "" + files: dict[str, str] = Field(default_factory=dict) + + def log_model(self, log_path: str) -> None: + """Persist the aggregated text payload as one log file under the given base path. + + Args: + log_path: Directory where the log file should be written. + + Returns: + None. + """ + log_name = os.path.join(log_path, get_unique_filename(log_path, "regex_search_source.log")) + with open(log_name, "w", encoding="utf-8") as log_file: + log_file.write(self.content) + + @classmethod + def import_model(cls, model_input: Union[dict, str]) -> "RegexSearchData": + """Import datamodel. + + Args: + model_input: Keyed fields for direct validation, or a path string to load from disk. + + Returns: + Instance with content, root path, and per-file bodies filled in. + """ + if isinstance(model_input, dict): + return cls.model_validate(model_input) + if isinstance(model_input, str): + return cls._from_filesystem_path(model_input) + raise ValueError("Invalid input for regex search data") + + @classmethod + def _from_filesystem_path(cls, path: str) -> "RegexSearchData": + """Read one file or every file under a directory into a merged view plus a path-to-text map. + + Args: + path: Absolute or resolvable path to a file or directory. 
+ + Returns: + Instance built from the read text and discovered relative paths. + + """ + path = os.path.abspath(path) + if not os.path.exists(path): + raise FileNotFoundError(f"Path not found: {path}") + if os.path.isfile(path): + text = Path(path).read_text(encoding="utf-8", errors="replace") + rel = os.path.basename(path) + data_root = os.path.dirname(path) or os.path.abspath(os.path.curdir) + return cls(content=text, data_root=data_root, files={rel: text}) + if os.path.isdir(path): + files: dict[str, str] = {} + parts: list[str] = [] + for root, _dirs, filenames in os.walk(path): + for name in sorted(filenames): + fp = os.path.join(root, name) + if not os.path.isfile(fp): + continue + rel = os.path.relpath(fp, path) + try: + text = Path(fp).read_text(encoding="utf-8", errors="replace") + except OSError: + continue + files[rel] = text + parts.append(f"===== {rel} =====\n{text}") + return cls(content="\n".join(parts), data_root=path, files=files) + raise ValueError(f"Unsupported path type: {path}") diff --git a/nodescraper/plugins/regex_search/regex_search_plugin.py b/nodescraper/plugins/inband/regex_search/regex_search_plugin.py similarity index 85% rename from nodescraper/plugins/regex_search/regex_search_plugin.py rename to nodescraper/plugins/inband/regex_search/regex_search_plugin.py index 36d650c6..2a101ff8 100644 --- a/nodescraper/plugins/regex_search/regex_search_plugin.py +++ b/nodescraper/plugins/inband/regex_search/regex_search_plugin.py @@ -1,76 +1,73 @@ -############################################################################### -# -# MIT License -# -# Copyright (c) 2026 Advanced Micro Devices, Inc. 
-# -# Permission is hereby granted, free of charge, to any person obtaining a copy -# of this software and associated documentation files (the "Software"), to deal -# in the Software without restriction, including without limitation the rights -# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell -# copies of the Software, and to permit persons to whom the Software is -# furnished to do so, subject to the following conditions: -# -# The above copyright notice and this permission notice shall be included in all -# copies or substantial portions of the Software. -# -# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE -# SOFTWARE. 
-# -############################################################################### -from typing import Optional, Union - -from nodescraper.connection.inband import InBandConnectionManager, SSHConnectionParams -from nodescraper.enums import EventPriority -from nodescraper.interfaces import DataPlugin -from nodescraper.models import CollectorArgs, TaskResult - -from .analyzer_args import RegexSearchAnalyzerArgs -from .regex_search_analyzer import RegexSearchAnalyzer -from .regex_search_data import RegexSearchData - - -class RegexSearchPlugin( - DataPlugin[ - InBandConnectionManager, - SSHConnectionParams, - RegexSearchData, - CollectorArgs, - RegexSearchAnalyzerArgs, - ] -): - """Analyzer-only plugin: search user regexes against a file or directory (--data).""" - - DATA_MODEL = RegexSearchData - ANALYZER = RegexSearchAnalyzer - - def analyze( - self, - max_event_priority_level: Optional[Union[EventPriority, str]] = EventPriority.CRITICAL, - analysis_args: Optional[Union[RegexSearchAnalyzerArgs, dict]] = None, - data: Optional[Union[str, dict, RegexSearchData]] = None, - ) -> TaskResult: - if analysis_args is None: - missing_error_regex = True - elif isinstance(analysis_args, RegexSearchAnalyzerArgs): - missing_error_regex = not bool(analysis_args.error_regex) - elif isinstance(analysis_args, dict): - er = analysis_args.get("error_regex") - missing_error_regex = er is None or er == [] - else: - missing_error_regex = True - if missing_error_regex: - self.logger.warning( - "RegexSearchPlugin: analysis args need to be provided for the analyzer to run " - "(e.g. --error-regex for each pattern)." - ) - return super().analyze( - max_event_priority_level=max_event_priority_level, - analysis_args=analysis_args, - data=data, - ) +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. 
+# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +from typing import Optional, Union + +from nodescraper.base import InBandDataPlugin +from nodescraper.enums import EventPriority +from nodescraper.models import CollectorArgs, TaskResult + +from .analyzer_args import RegexSearchAnalyzerArgs +from .regex_search_analyzer import RegexSearchAnalyzer +from .regex_search_data import RegexSearchData + + +class RegexSearchPlugin(InBandDataPlugin[RegexSearchData, CollectorArgs, RegexSearchAnalyzerArgs]): + """Analyzer-only plugin: search user regexes against a file or directory (--data).""" + + DATA_MODEL = RegexSearchData + ANALYZER = RegexSearchAnalyzer + ANALYZER_ARGS = RegexSearchAnalyzerArgs + + DOCUMENTATION_ANALYSIS_ITEMS: tuple[str, ...] 
= ( + "Runs RegexSearchAnalyzer: user-defined patterns via analysis_args.error_regex (same shape as Dmesg).", + "Emits regex match events with optional per-file source in the description when scanning directories.", + ) + + def analyze( + self, + max_event_priority_level: Optional[Union[EventPriority, str]] = EventPriority.CRITICAL, + analysis_args: Optional[Union[RegexSearchAnalyzerArgs, dict]] = None, + data: Optional[Union[str, dict, RegexSearchData]] = None, + ) -> TaskResult: + if analysis_args is None: + missing_error_regex = True + elif isinstance(analysis_args, RegexSearchAnalyzerArgs): + missing_error_regex = not bool(analysis_args.error_regex) + elif isinstance(analysis_args, dict): + er = analysis_args.get("error_regex") + missing_error_regex = er is None or er == [] + else: + missing_error_regex = True + if missing_error_regex: + self.logger.warning( + "RegexSearchPlugin: analysis args need to be provided for the analyzer to run " + "(e.g. --error-regex for each pattern)." + ) + return super().analyze( + max_event_priority_level=max_event_priority_level, + analysis_args=analysis_args, + data=data, + ) diff --git a/nodescraper/plugins/ooband/bmc_archive/bmc_archive_collector.py b/nodescraper/plugins/ooband/bmc_archive/bmc_archive_collector.py index 547ba80d..722122ca 100644 --- a/nodescraper/plugins/ooband/bmc_archive/bmc_archive_collector.py +++ b/nodescraper/plugins/ooband/bmc_archive/bmc_archive_collector.py @@ -41,6 +41,11 @@ class BmcArchiveCollector(InBandDataCollector[BmcArchiveDataModel, BmcArchiveCol DATA_MODEL = BmcArchiveDataModel SUPPORTED_OS_FAMILY = {OSFamily.LINUX, OSFamily.UNKNOWN} + DOCUMENTATION_COLLECTION_ITEMS: tuple[str, ...] 
= ( + "SSH (BMC) shell: tar+gzip archives for each path in collection_args (see PathSpec entries).", + "Uses sudo on the BMC when collection_args paths require elevated access.", + ) + REMOTE_ARCHIVE_TEMPLATE = "/tmp/node_scraper_{name}.tar.gz" # None until first probe in a run; collect_data resets so each collection re-probes. _tar_ignore_failed_read_supported: Optional[bool] = None diff --git a/nodescraper/plugins/ooband/redfish_endpoint/collector_args.py b/nodescraper/plugins/ooband/redfish_endpoint/collector_args.py index 189c5edf..6583075e 100644 --- a/nodescraper/plugins/ooband/redfish_endpoint/collector_args.py +++ b/nodescraper/plugins/ooband/redfish_endpoint/collector_args.py @@ -59,7 +59,10 @@ class RedfishEndpointCollectorArgs(CollectorArgs): ) follow_next_link: bool = Field( default=False, - description="If True, follow Members@odata.nextLink pagination for each URI and merge all pages into a single response.", + description=( + "If True, follow Redfish Members collection OData nextLink pagination for each URI " + "and merge all pages into a single response." + ), ) max_pages: int = Field( default=200, diff --git a/nodescraper/plugins/ooband/redfish_endpoint/endpoint_analyzer.py b/nodescraper/plugins/ooband/redfish_endpoint/endpoint_analyzer.py index 59dd7a8d..1e43a71a 100644 --- a/nodescraper/plugins/ooband/redfish_endpoint/endpoint_analyzer.py +++ b/nodescraper/plugins/ooband/redfish_endpoint/endpoint_analyzer.py @@ -89,6 +89,12 @@ class RedfishEndpointAnalyzer(DataAnalyzer[RedfishEndpointDataModel, RedfishEndp DATA_MODEL = RedfishEndpointDataModel + DOCUMENTATION_ANALYSIS_ITEMS: tuple[str, ...] 
= ( + "For each entry in analysis_args.checks, reads JSON paths in collected responses and " + "compares values to constraints (eq, min/max, anyOf, regex, etc.).", + 'URI key "*" runs checks against every collected response body.', + ) + def analyze_data( self, data: RedfishEndpointDataModel, diff --git a/nodescraper/plugins/ooband/redfish_endpoint/endpoint_collector.py b/nodescraper/plugins/ooband/redfish_endpoint/endpoint_collector.py index e0878c1a..37bd839b 100644 --- a/nodescraper/plugins/ooband/redfish_endpoint/endpoint_collector.py +++ b/nodescraper/plugins/ooband/redfish_endpoint/endpoint_collector.py @@ -152,6 +152,13 @@ class RedfishEndpointCollector( DATA_MODEL = RedfishEndpointDataModel + DOCUMENTATION_COLLECTION_ITEMS: tuple[str, ...] = ( + "Redfish GET: explicit paths from collection_args.uris (parallel when max_workers>1).", + "Optional paged GET following the Members collection OData nextLink field when follow_next_link is true.", + "Redfish GET tree: when discover_tree is true, walks from api_root using OData resource id links and " + "Members navigation (depth and endpoint caps from collection_args).", + ) + def collect_data( self, args: Optional[RedfishEndpointCollectorArgs] = None ) -> tuple[TaskResult, Optional[RedfishEndpointDataModel]]: diff --git a/nodescraper/plugins/ooband/redfish_oem_diag/oem_diag_analyzer.py b/nodescraper/plugins/ooband/redfish_oem_diag/oem_diag_analyzer.py index c54d9e2f..11aaa1e8 100644 --- a/nodescraper/plugins/ooband/redfish_oem_diag/oem_diag_analyzer.py +++ b/nodescraper/plugins/ooband/redfish_oem_diag/oem_diag_analyzer.py @@ -38,6 +38,11 @@ class RedfishOemDiagAnalyzer(DataAnalyzer[RedfishOemDiagDataModel, RedfishOemDia DATA_MODEL = RedfishOemDiagDataModel + DOCUMENTATION_ANALYSIS_ITEMS: tuple[str, ...] 
= ( + "Summarizes success/failure per OEM diagnostic type from collected results.", + "When analysis_args.require_all_success is true, fails the run if any type failed collection.", + ) + def analyze_data( self, data: RedfishOemDiagDataModel, diff --git a/nodescraper/plugins/ooband/redfish_oem_diag/oem_diag_collector.py b/nodescraper/plugins/ooband/redfish_oem_diag/oem_diag_collector.py index b406ef38..f2e3d1d2 100644 --- a/nodescraper/plugins/ooband/redfish_oem_diag/oem_diag_collector.py +++ b/nodescraper/plugins/ooband/redfish_oem_diag/oem_diag_collector.py @@ -43,6 +43,12 @@ class RedfishOemDiagCollector( DATA_MODEL = RedfishOemDiagDataModel + DOCUMENTATION_COLLECTION_ITEMS: tuple[str, ...] = ( + "Redfish LogService.CollectDiagnosticData for each entry in collection_args.oem_diagnostic_types " + "(collection_args.log_service_path selects the LogService).", + "Optional binary archives under the plugin log path when log_path is set.", + ) + def __init__(self, *args: Any, **kwargs: Any) -> None: self.log_path = kwargs.pop("log_path", None) super().__init__(*args, **kwargs) diff --git a/nodescraper/plugins/serviceability/__init__.py b/nodescraper/plugins/serviceability/__init__.py new file mode 100644 index 00000000..c5e9f857 --- /dev/null +++ b/nodescraper/plugins/serviceability/__init__.py @@ -0,0 +1,89 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. 
+# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+# +############################################################################### +from .afid_events import build_afid_events_from_data +from .analyzer_args import ServiceabilityAnalyzerArgs +from .mi3xx import ( + MI3XXAnalyzer, + MI3XXCollector, + MI3XXCollectorArgs, + MI3XXDataModel, + MI3XXDeviceInfo, + MI3XXResult, + ServiceabilityPluginMI3XX, + build_mi3xx_reporting_version_fields, +) +from .se_adapter import ( + format_serviceability_solution_lines, + serviceability_block_from_service_result, +) +from .se_models import AfidEvent, ServiceabilityBlock, ServiceabilitySolution +from .se_runner import SeRunError, run_service_hub +from .serviceability_collector import ServiceabilityCollectorBase +from .serviceability_data import ( + DeviceInfo, + ServiceabilityDataModel, + ServiceabilityResult, +) +from .serviceability_plugin_base import ServiceabilityPluginBase +from .time_utils import ( + TimeOperator, + compare_iso_datetime, + is_valid_iso_datetime, + normalize_se_timestamp, + parse_iso_datetime, + satisfies_time_check, +) + +__all__ = [ + "AfidEvent", + "DeviceInfo", + "MI3XXAnalyzer", + "MI3XXCollector", + "MI3XXCollectorArgs", + "MI3XXDataModel", + "MI3XXDeviceInfo", + "MI3XXResult", + "SeRunError", + "ServiceabilityAnalyzerArgs", + "ServiceabilityBlock", + "ServiceabilityCollectorBase", + "ServiceabilityDataModel", + "ServiceabilityPluginBase", + "ServiceabilityPluginMI3XX", + "ServiceabilityResult", + "ServiceabilitySolution", + "TimeOperator", + "build_afid_events_from_data", + "build_mi3xx_reporting_version_fields", + "compare_iso_datetime", + "format_serviceability_solution_lines", + "is_valid_iso_datetime", + "normalize_se_timestamp", + "parse_iso_datetime", + "run_service_hub", + "serviceability_block_from_service_result", + "satisfies_time_check", +] diff --git a/nodescraper/plugins/serviceability/afid_events.py b/nodescraper/plugins/serviceability/afid_events.py new file mode 100644 index 00000000..a84af503 --- /dev/null +++ 
b/nodescraper/plugins/serviceability/afid_events.py @@ -0,0 +1,188 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+# +############################################################################### +from __future__ import annotations + +from typing import Any, Optional + +from .se_models import AfidEvent +from .serviceability_data import ServiceabilityDataModel +from .time_utils import normalize_se_timestamp + +_EVENT_TIMESTAMP_KEYS = ("Created", "EventTimestamp", "Timestamp") +_AFID_KEYS = ("Afid", "AFID", "afid") + + +def build_afid_events_from_data(data: ServiceabilityDataModel) -> list[AfidEvent]: + """Build SE input events from collected Redfish and CPER fields.""" + events: list[AfidEvent] = [] + seen: set[tuple[int, str, str]] = set() + + for rf_event in data.rf_events: + parsed = _afid_event_from_rf_member(rf_event) + if parsed is None: + continue + key = (parsed.afid, parsed.serviceable_unit, parsed.time) + if key in seen: + continue + seen.add(key) + events.append(parsed) + + for unit, payload in data.cper_data.items(): + parsed = _afid_event_from_cper_slot(str(unit), payload) + if parsed is None: + continue + key = (parsed.afid, parsed.serviceable_unit, parsed.time) + if key in seen: + continue + seen.add(key) + events.append(parsed) + + return events + + +def _afid_event_from_rf_member(member: Any) -> Optional[AfidEvent]: + if not isinstance(member, dict): + return None + afid = _extract_afid(member) + unit = _extract_serviceable_unit(member) + timestamp = _extract_timestamp(member) + if afid is None or unit is None or timestamp is None: + return None + return AfidEvent( + afid=afid, + serviceable_unit=unit, + time=normalize_se_timestamp(timestamp), + ) + + +def _afid_event_from_cper_slot(unit: str, payload: Any) -> Optional[AfidEvent]: + if not isinstance(payload, dict): + return None + afid = _extract_afid(payload) + timestamp = _extract_timestamp(payload) + unit_name = str(payload.get("serviceable_unit") or unit).strip() + if afid is None or not unit_name or timestamp is None: + return None + return AfidEvent( + afid=afid, + serviceable_unit=unit_name, + 
time=normalize_se_timestamp(timestamp), + ) + + +def _extract_afid(payload: dict[str, Any]) -> Optional[int]: + for key in _AFID_KEYS: + if key in payload and payload[key] is not None: + return int(payload[key]) + oem = payload.get("Oem") + if isinstance(oem, dict): + for vendor_payload in oem.values(): + found = _extract_afid_from_oem_fragment(vendor_payload) + if found is not None: + return found + return None + + +def _extract_afid_from_oem_fragment(vendor_payload: Any) -> Optional[int]: + """Resolve AFID from one ``Oem`` property value (dict or list of dicts, e.g. ``AMDFieldIdentifiers``).""" + if isinstance(vendor_payload, dict): + for key in _AFID_KEYS: + if key in vendor_payload and vendor_payload[key] is not None: + return int(vendor_payload[key]) + elif isinstance(vendor_payload, list): + for item in vendor_payload: + if isinstance(item, dict): + for key in _AFID_KEYS: + if key in item and item[key] is not None: + return int(item[key]) + return None + + +def _origin_dict_to_unit(value: Any) -> Optional[str]: + if not isinstance(value, dict): + return None + odata_id = value.get("@odata.id") or value.get("odata.id") + if odata_id: + return _unit_from_odata_id(str(odata_id)) + return None + + +def _extract_serviceable_unit(payload: dict[str, Any]) -> Optional[str]: + for key in ("serviceable_unit", "ServiceableUnit", "OriginOfCondition", "Device"): + value = payload.get(key) + if value is None: + continue + if isinstance(value, dict): + odata_id = value.get("@odata.id") or value.get("odata.id") + if odata_id: + return _unit_from_odata_id(str(odata_id)) + text = str(value).strip() + if text: + return _unit_from_odata_id(text) if "/" in text else text + + links = payload.get("Links") or payload.get("links") + if isinstance(links, dict): + ooc = ( + links.get("OriginOfCondition") + or links.get("originOfCondition") + or links.get("OriginofCondition") + ) + unit = _origin_dict_to_unit(ooc) + if unit: + return unit + + oem = payload.get("Oem") + if 
isinstance(oem, dict): + for vendor_payload in oem.values(): + if isinstance(vendor_payload, dict): + unit = vendor_payload.get("serviceable_unit") or vendor_payload.get( + "ServiceableUnit" + ) + if unit is not None and str(unit).strip(): + return str(unit).strip() + elif isinstance(vendor_payload, list): + for item in vendor_payload: + if not isinstance(item, dict): + continue + su = item.get("ServiceableUnits") or item.get("serviceable_units") + if isinstance(su, list) and su: + u = _origin_dict_to_unit(su[0]) + if u: + return u + return None + + +def _extract_timestamp(payload: dict[str, Any]) -> Optional[str]: + for key in _EVENT_TIMESTAMP_KEYS: + value = payload.get(key) + if value is not None and str(value).strip(): + return str(value).strip() + return None + + +def _unit_from_odata_id(odata_id: str) -> str: + segment = odata_id.rstrip("/").split("/")[-1] + return segment or odata_id diff --git a/nodescraper/plugins/serviceability/analyzer_args.py b/nodescraper/plugins/serviceability/analyzer_args.py new file mode 100644 index 00000000..639822cc --- /dev/null +++ b/nodescraper/plugins/serviceability/analyzer_args.py @@ -0,0 +1,150 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. 
+# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +from __future__ import annotations + +from typing import Any, Optional + +from pydantic import Field, field_validator, model_validator + +from nodescraper.models import AnalyzerArgs + + +class ServiceabilityAnalyzerArgs(AnalyzerArgs): + """Analyzer args for serviceability plugins that run a configurable Python hub.""" + + hub_python_module: Optional[str] = Field( + default=None, + description="Import path for the hub module (class implements hub_analyze_method); hub_options forwards kwargs.", + ) + hub_display_name: Optional[str] = Field( + default=None, + description="Optional label for analyzer status messages.", + ) + afid_sag_path: Optional[str] = Field( + default=None, + description="Path to hub config (e.g. 
AFID_SAG.json); passed as hub_init_path_kwarg.", + ) + hub_init_path_kwarg: str = Field( + default="afid_sag", + description="Hub __init__ keyword that receives afid_sag_path.", + ) + hub_analyze_method: str = Field( + default="get_service_info", + description="Hub method called with rf_events first (default get_service_info).", + ) + skip_hub: bool = Field( + default=False, + description="If True, only build afid_events without running the service hub.", + ) + cper_decode_module: Optional[str] = Field( + default=None, + description="Module import path for CPER decoding when events include CPER attachments.", + ) + cper_decode_method: str = Field( + default="analyze_cper", + description="Callable on cper_decode_module: file-like CPER in, (return_code, decode_dict) out.", + ) + hub_options: Optional[dict[str, Any]] = Field( + default=None, + description="Extra kwargs for hub __init__ and analyze; collected cper_data overrides cper_data key.", + ) + from_ac_cycle: int = Field( + default=-1, + ge=-1, + description="from_ac_cycle kwarg for the hub analyze call (merged after hub_options).", + ) + from_date: Optional[str] = Field( + default=None, + description="Optional from_date for the hub analyze call (merged after hub_options).", + ) + designation_serials: Optional[dict[str, str]] = Field( + default=None, + description="Optional designation_serials for the hub analyze call (merged after hub_options).", + ) + suppress_service_actions: Optional[list[str]] = Field( + default=None, + description="Optional suppress_service_actions for the hub analyze call (merged after hub_options).", + ) + + def resolved_hub_options(self) -> dict[str, Any]: + """Merge hub_options with from_ac_cycle, from_date, designation_serials, and suppress_service_actions.""" + merged = dict(self.hub_options or {}) + merged["from_ac_cycle"] = self.from_ac_cycle + if self.from_date is not None: + merged["from_date"] = self.from_date + if self.designation_serials is not None: + 
merged["designation_serials"] = self.designation_serials + if self.suppress_service_actions is not None: + merged["suppress_service_actions"] = self.suppress_service_actions + return merged + + @field_validator("hub_analyze_method", "hub_init_path_kwarg") + @classmethod + def _strip_non_empty_hub_hooks(cls, value: str) -> str: + text = str(value).strip() + if not text: + raise ValueError("must not be empty") + return text + + @field_validator("hub_options", mode="before") + @classmethod + def _none_empty_hub_options(cls, value: object) -> Optional[dict[str, Any]]: + if value is None: + return None + if isinstance(value, dict) and not value: + return None + return value # type: ignore[return-value] + + @field_validator("from_date", mode="before") + @classmethod + def _strip_from_date(cls, value: object) -> Optional[str]: + if value is None: + return None + text = str(value).strip() + return text or None + + @field_validator( + "afid_sag_path", + "hub_python_module", + "hub_display_name", + "cper_decode_module", + ) + @classmethod + def _strip_optional_strings(cls, value: Optional[str]) -> Optional[str]: + if value is None: + return None + text = str(value).strip() + return text or None + + @model_validator(mode="after") + def _require_hub_config_when_running(self) -> ServiceabilityAnalyzerArgs: + if self.skip_hub: + return self + if not self.afid_sag_path: + raise ValueError("afid_sag_path is required when running the service hub.") + if not self.hub_python_module: + raise ValueError("hub_python_module is required when running the service hub.") + return self diff --git a/nodescraper/plugins/serviceability/cper_decode.py b/nodescraper/plugins/serviceability/cper_decode.py new file mode 100644 index 00000000..d4e9b20e --- /dev/null +++ b/nodescraper/plugins/serviceability/cper_decode.py @@ -0,0 +1,145 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. 
+# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+# +############################################################################### +"""Decode collected CPER attachments via a configured Python decode module.""" +from __future__ import annotations + +import base64 +import binascii +import importlib +import io +import logging +from typing import Any, Callable, Optional + + +class CperDecodeError(RuntimeError): + """Raised when the configured CPER decode module cannot be loaded or decoding fails.""" + + +def _load_decode_callable( + cper_decode_module: str, + cper_decode_method: str, +) -> Callable[[io.BytesIO], tuple[int, Any]]: + """Import a decode callable from analysis_args (module + method name).""" + try: + module = importlib.import_module(cper_decode_module) + except ImportError as exc: + raise CperDecodeError( + f"Cannot import cper_decode_module {cper_decode_module!r}: {exc}" + ) from exc + + decode_fn = getattr(module, cper_decode_method, None) + if decode_fn is None: + raise CperDecodeError( + f"Module {cper_decode_module!r} has no callable {cper_decode_method!r}" + ) + if not callable(decode_fn): + raise CperDecodeError(f"{cper_decode_module!r}.{cper_decode_method!r} is not callable") + return decode_fn + + +def count_ras_err_entries(decode_payload: Any) -> int: + """Count RasErr* keys in a decoded CPER triage_result dict.""" + if not isinstance(decode_payload, dict): + return 0 + triage_result = decode_payload.get("triage_result", {}) + if not isinstance(triage_result, dict): + return 0 + return sum(1 for key in triage_result if str(key).startswith("RasErr")) + + +def decode_cper_raw_attachments( + cper_raw: dict[str, str], + *, + cper_decode_module: str, + cper_decode_method: str = "analyze_cper", + logger: Optional[logging.Logger] = None, +) -> dict[str, Any]: + """Decode base64 CPER blobs keyed by Redfish event Id. + + The decode callable must accept a binary file-like object and return + ``(return_code, decode_dict)``. 
Results are passed to the service hub as + ``cper_data``; the hub does not perform CPER decoding itself. + + Returns ``{event_id: {"return_code": int, "decode": dict}}``. + """ + if not cper_raw: + return {} + + decode_fn = _load_decode_callable(cper_decode_module, cper_decode_method) + + decoded: dict[str, Any] = {} + errors: list[str] = [] + + for event_id, payload_b64 in cper_raw.items(): + try: + raw = base64.b64decode(payload_b64, validate=True) + except (binascii.Error, ValueError) as exc: + errors.append(f"event {event_id}: invalid base64 ({exc})") + continue + + try: + return_code, decode_payload = decode_fn(io.BytesIO(raw)) + except Exception as exc: # noqa: BLE001 + msg = f"event {event_id}: {exc}" + errors.append(msg) + if logger is not None: + logger.warning("CPER decode failed for Redfish event %s: %s", event_id, exc) + continue + + if return_code != 0: + errors.append(f"event {event_id}: decode return code {return_code}") + + decoded[str(event_id)] = { + "return_code": return_code, + "decode": decode_payload, + } + if logger is not None: + ras_count = count_ras_err_entries(decode_payload) + if return_code == 0: + logger.info( + "CPER decoded for Redfish event %s (return_code=0, %d RasErr entr%s)", + event_id, + ras_count, + "y" if ras_count == 1 else "ies", + ) + else: + logger.warning( + "CPER decoded for Redfish event %s with non-zero return_code=%s " + "(%d RasErr entr%s)", + event_id, + return_code, + ras_count, + "y" if ras_count == 1 else "ies", + ) + + if errors and not decoded: + raise CperDecodeError("; ".join(errors)) + + if logger is not None and errors: + for msg in errors: + logger.warning("CPER decode issue: %s", msg) + + return decoded diff --git a/nodescraper/plugins/serviceability/mi3xx/__init__.py b/nodescraper/plugins/serviceability/mi3xx/__init__.py new file mode 100644 index 00000000..b97928b3 --- /dev/null +++ b/nodescraper/plugins/serviceability/mi3xx/__init__.py @@ -0,0 +1,46 @@ 
+############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+# +############################################################################### +from .mi3xx_analyzer import MI3XXAnalyzer +from .mi3xx_collector import MI3XXCollector +from .mi3xx_collector_args import MI3XXCollectorArgs +from .mi3xx_data import ( + MI3XXDataModel, + MI3XXDeviceInfo, + MI3XXResult, + build_mi3xx_reporting_version_fields, +) +from .serviceability_plugin_mi3xx import ServiceabilityPluginMI3XX + +__all__ = [ + "MI3XXAnalyzer", + "MI3XXCollector", + "MI3XXCollectorArgs", + "MI3XXDataModel", + "MI3XXDeviceInfo", + "MI3XXResult", + "ServiceabilityPluginMI3XX", + "build_mi3xx_reporting_version_fields", +] diff --git a/nodescraper/plugins/serviceability/mi3xx/mi3xx_analyzer.py b/nodescraper/plugins/serviceability/mi3xx/mi3xx_analyzer.py new file mode 100644 index 00000000..b3e2644d --- /dev/null +++ b/nodescraper/plugins/serviceability/mi3xx/mi3xx_analyzer.py @@ -0,0 +1,213 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +from __future__ import annotations + +from typing import Any, ClassVar, Optional + +from pydantic import BaseModel, Field + +from nodescraper.enums import ExecutionStatus +from nodescraper.interfaces import DataAnalyzer +from nodescraper.models import TaskResult +from nodescraper.plugins.serviceability.afid_events import build_afid_events_from_data +from nodescraper.plugins.serviceability.analyzer_args import ServiceabilityAnalyzerArgs +from nodescraper.plugins.serviceability.cper_decode import ( + CperDecodeError, + decode_cper_raw_attachments, +) +from nodescraper.plugins.serviceability.se_adapter import ( + format_serviceability_solution_lines, +) +from nodescraper.plugins.serviceability.se_models import ServiceabilityBlock +from nodescraper.plugins.serviceability.se_runner import SeRunError, run_service_hub +from nodescraper.plugins.serviceability.serviceability_data import ( + ServiceabilityDataModel, +) + +from .mi3xx_cper_utils import CPER_METHOD_AFID_MAX, should_skip_cper_fetch_or_decode + + +class AfidSagMetadataArtifact(BaseModel): + """Hub AFID_SAG metadata snapshot; written to ``afid_sag_metadata.json``.""" + + ARTIFACT_LOG_BASENAME: ClassVar[str] = "afid_sag_metadata" + + metadata: dict[str, Any] = Field(default_factory=dict) + + +class MI3XXAnalyzer(DataAnalyzer[ServiceabilityDataModel, ServiceabilityAnalyzerArgs]): + """Build AFID events from collected data and run the configured service hub.""" + + DATA_MODEL = ServiceabilityDataModel + + def analyze_data( + self, + data: ServiceabilityDataModel, + args: Optional[ServiceabilityAnalyzerArgs] = None, + ) -> TaskResult: + if args is None: + 
self.result.status = ExecutionStatus.NOT_RAN + self.result.message = "ServiceabilityAnalyzerArgs are required" + return self.result + + events = data.afid_events or build_afid_events_from_data(data) + data.afid_events = events + + if args.skip_hub: + data.serviceability = ServiceabilityBlock(afid_events=events) + self.result.status = ExecutionStatus.OK + self.result.message = f"Built {len(events)} AFID event(s); hub skipped" + self._log_serviceability_solutions(data.serviceability) + return self.result + + parent = self.parent or self.__class__.__name__ + cper_data = data.cper_data or {} + cper_raw_to_decode = self._cper_raw_needing_decode(data) + skipped_cper = len(data.cper_raw or {}) - len(cper_raw_to_decode) + if skipped_cper: + self.logger.info( + "(%s) Skipping CPER decode for %d CPER attachment(s); Redfish log " + "already has usable ACA fields (CPER-method AFID<=%s or no serial on decode)", + parent, + skipped_cper, + CPER_METHOD_AFID_MAX, + ) + if cper_raw_to_decode and not cper_data: + if not args.cper_decode_module: + self.logger.warning( + "(%s) %d CPER attachment(s) collected but cper_decode_module is " + "not set in analysis_args; skipping CPER decode", + parent, + len(cper_raw_to_decode), + ) + else: + self.logger.info( + "(%s) Decoding %d CPER attachment(s) via %s.%s", + parent, + len(cper_raw_to_decode), + args.cper_decode_module, + args.cper_decode_method, + ) + try: + cper_data = decode_cper_raw_attachments( + cper_raw_to_decode, + cper_decode_module=args.cper_decode_module, + cper_decode_method=args.cper_decode_method, + logger=self.logger, + ) + data.cper_data = cper_data + self.logger.info( + "(%s) CPER decode finished: %d of %d attachment(s) decoded", + parent, + len(cper_data), + len(cper_raw_to_decode), + ) + except CperDecodeError as exc: + self.logger.warning( + "(%s) %s; continuing without decoded CPER", + parent, + exc, + ) + elif cper_data: + self.logger.info( + "(%s) Using %d pre-decoded CPER record(s) from collection", + parent, + 
len(cper_data), + ) + + try: + block = run_service_hub( + hub_python_module=args.hub_python_module, # type: ignore[arg-type] + hub_display_name=args.hub_display_name, + afid_events=events, + afid_sag_path=args.afid_sag_path, # type: ignore[arg-type] + rf_events=data.rf_events, + cper_data=cper_data or None, + hub_options=args.resolved_hub_options(), + hub_analyze_method=args.hub_analyze_method, + hub_init_path_kwarg=args.hub_init_path_kwarg, + ) + except (SeRunError, ValueError) as exc: + self.result.status = ExecutionStatus.ERROR + self.result.message = str(exc) + return self.result + + data.serviceability = block + self._append_afid_sag_metadata_artifact(block) + self._log_serviceability_solutions(block) + hub_label = args.hub_display_name or args.hub_python_module + self.result.status = ExecutionStatus.OK + cper_summary = "" + if cper_data: + cper_summary = f", {len(cper_data)} decoded CPER(s)" + elif cper_raw_to_decode: + cper_summary = f", {len(cper_raw_to_decode)} CPER attachment(s) not decoded" + elif data.cper_raw: + cper_summary = f", {len(data.cper_raw)} CPER attachment(s) omitted (ACA on log entry)" + ver_bits: list[str] = [] + if block.hub_version: + ver_bits.append(f"hub {block.hub_version}") + if block.afid_sag_file_version: + ver_bits.append(f"AFID_SAG {block.afid_sag_file_version}") + ver_suffix = f" [{'; '.join(ver_bits)}]" if ver_bits else "" + self.result.message = ( + f"{hub_label}: {len(block.solution)} solution(s) " + f"from {len(data.rf_events)} Redfish event(s){cper_summary}{ver_suffix}" + ) + return self.result + + @staticmethod + def _cper_raw_needing_decode(data: ServiceabilityDataModel) -> dict[str, str]: + """Subset of ``cper_raw`` that still needs configured CPER decode (not already on the log).""" + raw = data.cper_raw or {} + if not raw: + return {} + by_id: dict[str, dict[str, Any]] = {} + for member in data.rf_events: + if not isinstance(member, dict): + continue + eid = member.get("Id") + if eid is not None: + by_id[str(eid)] = 
member + out: dict[str, str] = {} + for event_id, blob in raw.items(): + ev = by_id.get(str(event_id)) + if ev is not None and should_skip_cper_fetch_or_decode(ev): + continue + out[str(event_id)] = blob + return out + + def _append_afid_sag_metadata_artifact(self, block: ServiceabilityBlock) -> None: + if block.afid_sag_metadata is None: + return + self.result.artifacts.append( + AfidSagMetadataArtifact(metadata=dict(block.afid_sag_metadata)) + ) + + def _log_serviceability_solutions(self, block: ServiceabilityBlock) -> None: + parent = self.parent or self.__class__.__name__ + for line in format_serviceability_solution_lines(block): + self.logger.info("(%s) %s", parent, line) diff --git a/nodescraper/plugins/serviceability/mi3xx/mi3xx_collector.py b/nodescraper/plugins/serviceability/mi3xx/mi3xx_collector.py new file mode 100644 index 00000000..d155f14a --- /dev/null +++ b/nodescraper/plugins/serviceability/mi3xx/mi3xx_collector.py @@ -0,0 +1,168 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +from __future__ import annotations + +import base64 +from typing import Any, Optional + +from nodescraper.plugins.serviceability.serviceability_collector import ( + ServiceabilityCollectorBase, +) +from nodescraper.plugins.serviceability.serviceability_data import DeviceInfo +from nodescraper.plugins.serviceability.time_utils import satisfies_time_check + +from .mi3xx_collector_args import MI3XXCollectorArgs +from .mi3xx_cper_utils import CPER_METHOD_AFID_MAX, should_skip_cper_fetch_or_decode + +_EVENT_TIMESTAMP_KEYS = ("Created", "EventTimestamp", "Timestamp") + + +class MI3XXCollector(ServiceabilityCollectorBase[MI3XXCollectorArgs]): + """Collect MI3XX BMC Redfish data: event log members (with pagination), firmware inventory, + CPER attachment bytes for qualifying events, and optional assembly/chassis metadata.""" + + def satisfies_reference_time( + self, + candidate: str, + args: MI3XXCollectorArgs, + ) -> bool: + """Test a timestamp against optional reference-time filter settings.""" + if args.reference_time is None or args.time_operator is None: + return True + return satisfies_time_check(candidate, args.reference_time, args.time_operator) + + def filter_event_members( + self, + members: list[Any], + args: MI3XXCollectorArgs, + ) -> list[Any]: + filtered: list[Any] = [] + for member in members: + if not isinstance(member, dict): + filtered.append(member) + continue + timestamp = self._event_timestamp(member) + if timestamp is None or self.satisfies_reference_time(timestamp, args): + filtered.append(member) + return filtered + + def is_cper_event(self, event: dict) -> bool: + """True when the log entry is a 
Redfish CPER attachment event.""" + return ( + "CPER" in event + and str(event.get("DiagnosticDataType", "")).upper() == "CPER" + and bool(event.get("AdditionalDataURI")) + ) + + def collect_cper_attachments(self, rf_events: list[Any]) -> dict[str, str]: + """Fetch CPER binaries from BMC; decoding runs in the analyzer.""" + parent = self.parent or self.__class__.__name__ + attachments: dict[str, str] = {} + for event in rf_events: + if not isinstance(event, dict) or not self.is_cper_event(event): + continue + uri = event.get("AdditionalDataURI") + event_id = event.get("Id") + if not uri or not event_id: + continue + + if should_skip_cper_fetch_or_decode(event): + self.logger.info( + "(%s) Skipping CPER attachment fetch for Redfish event %s " + "(ACA decode already on log entry; CPER-method AFID<=%s or no serial)", + parent, + event_id, + CPER_METHOD_AFID_MAX, + ) + continue + + try: + resp = self.connection.get_response(uri) + except Exception as exc: # noqa: BLE001 + self.logger.warning( + "(%s) Failed to fetch CPER attachment for event %s: %s", + parent, + event_id, + exc, + ) + continue + if not resp.ok: + self.logger.warning( + "(%s) Failed to fetch CPER attachment for event %s: HTTP %s", + parent, + event_id, + resp.status_code, + ) + continue + + size_bytes = len(resp.content) + attachments[str(event_id)] = base64.b64encode(resp.content).decode("ascii") + self.logger.info( + "(%s) Fetched CPER attachment for Redfish event %s (%d bytes)", + parent, + event_id, + size_bytes, + ) + + if attachments: + self.logger.info( + "(%s) Collected %d CPER attachment(s) for analyzer decode", + parent, + len(attachments), + ) + return attachments + + def parse_assembly_entry( + self, + designation: str, + assembly_member_entry: dict[str, Any], + args: MI3XXCollectorArgs, + ) -> DeviceInfo: + return DeviceInfo( + name=assembly_member_entry.get("Name") or designation, + part_number=assembly_member_entry.get("PartNumber"), + 
production_date=assembly_member_entry.get("ProductionDate"), + serial_number=assembly_member_entry.get("SerialNumber"), + version=assembly_member_entry.get("Version"), + ) + + def extract_component_details( + self, + firmware_inventory_payload: dict[str, Any], + args: MI3XXCollectorArgs, + ) -> Optional[str]: + details = firmware_inventory_payload.get("Details") + if details is not None: + return str(details) + return None + + @staticmethod + def _event_timestamp(event: dict[str, Any]) -> Optional[str]: + for key in _EVENT_TIMESTAMP_KEYS: + value = event.get(key) + if value is not None and str(value).strip(): + return str(value).strip() + return None diff --git a/nodescraper/plugins/serviceability/mi3xx/mi3xx_collector_args.py b/nodescraper/plugins/serviceability/mi3xx/mi3xx_collector_args.py new file mode 100644 index 00000000..8d35cd2e --- /dev/null +++ b/nodescraper/plugins/serviceability/mi3xx/mi3xx_collector_args.py @@ -0,0 +1,172 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +from __future__ import annotations + +from typing import List, Optional + +from pydantic import Field, field_validator, model_validator + +from nodescraper.models import CollectorArgs +from nodescraper.plugins.serviceability.time_utils import ( + TimeOperator, + is_valid_iso_datetime, +) + + +class MI3XXCollectorArgs(CollectorArgs): + """MI3XX OOB Redfish serviceability collector arguments.""" + + uri: Optional[str] = Field( + default=None, + description=( + "Optional alias for ``rf_event_log_uri``. When both ``uri`` and ``rf_event_log_uri`` " + "are explicitly set to non-empty values, ``uri`` wins." + ), + ) + rf_event_log_uri: str = Field( + default="/redfish/v1/Systems/UBB/LogServices/EventLog/Entries", + description="Redfish URI for the event log ``Entries`` collection.", + ) + rf_chassis_devices: Optional[List[str]] = Field( + default=None, + description="Chassis designations for Assembly GETs; required with ``rf_assembly_uri_template``.", + ) + rf_assembly_uri_template: Optional[str] = Field( + default=None, + description="Redfish URI template containing ``{device}`` for each chassis Assembly resource.", + ) + rf_firmware_bundle_uri: Optional[str] = Field( + default=None, + description="Redfish URI for firmware bundle inventory when subclasses extract component details.", + ) + follow_next_link: bool = Field( + default=True, + description="If True, follow Members@odata.nextLink up to max_pages; else single GET.", + ) + max_pages: int = Field( + default=200, + ge=1, + le=10_000, + description="Safety cap on the number of pages when following event log pagination.", + ) + top: Optional[int] = Field( + 
default=None, + ge=1, + description="Most recent N entries via $skip after count probe; None collects full window.", + ) + reference_time: Optional[str] = Field( + default=None, + description=( + "Optional ISO-8601 date or date-time used with time_operator " + "(e.g. 2026-05-17 or 2026-05-17T13:01:00)." + ), + ) + time_operator: Optional[TimeOperator] = Field( + default=None, + description="Comparison operator applied when reference_time is set.", + ) + + @field_validator("rf_event_log_uri") + @classmethod + def _strip_rf_event_log_uri(cls, value: object) -> str: + text = str(value).strip() + if not text: + raise ValueError("rf_event_log_uri must be a non-empty Redfish URI") + return text + + @field_validator("reference_time") + @classmethod + def _validate_reference_time_iso(cls, value: Optional[str]) -> Optional[str]: + if value is None: + return None + text = str(value).strip() + if not text: + raise ValueError("reference_time must be a non-empty ISO-8601 string") + if not is_valid_iso_datetime(text): + raise ValueError(f"reference_time is not ISO-8601 compliant: {value!r}") + return text + + @model_validator(mode="after") + def _require_event_log_uri(self) -> MI3XXCollectorArgs: + if not self.resolved_event_log_uri(): + raise ValueError( + "Provide a non-empty rf_event_log_uri or uri for the event log collection." + ) + return self + + @model_validator(mode="after") + def _assembly_consistency(self) -> MI3XXCollectorArgs: + has_tpl = bool( + self.rf_assembly_uri_template and "{device}" in self.rf_assembly_uri_template + ) + has_dev = bool(self.rf_chassis_devices) + if has_tpl != has_dev: + raise ValueError( + "Provide both rf_assembly_uri_template (with '{device}') and rf_chassis_devices, " + "or omit both to skip assembly collection." 
+ ) + return self + + @model_validator(mode="after") + def _reference_time_requires_operator(self) -> MI3XXCollectorArgs: + has_ref = self.reference_time is not None + has_op = self.time_operator is not None + if has_ref != has_op: + raise ValueError("Provide both reference_time and time_operator, or omit both.") + return self + + @classmethod + def default_event_log_uri(cls) -> str: + """Return the built-in default for ``rf_event_log_uri`` (reads the field default; no duplicate constant).""" + raw = cls.model_fields["rf_event_log_uri"].default + if not isinstance(raw, str): + raise TypeError("rf_event_log_uri field default must be a str") + return raw + + def resolved_event_log_uri(self) -> str: + """Resolve the event log ``Entries`` URI from ``uri`` and ``rf_event_log_uri``.""" + uri_set = "uri" in self.model_fields_set + rf_set = "rf_event_log_uri" in self.model_fields_set + + def _strip(value: Optional[str]) -> str: + if value is None: + return "" + return str(value).strip() + + uri_s = _strip(self.uri) + rf_s = _strip(self.rf_event_log_uri) + + if uri_set and rf_set and uri_s and rf_s: + return uri_s + if rf_set: + return rf_s + if uri_set and uri_s: + return uri_s + if uri_set and not uri_s and not rf_set: + return rf_s + if not uri_set and not rf_set: + return rf_s + return "" diff --git a/nodescraper/plugins/serviceability/mi3xx/mi3xx_cper_utils.py b/nodescraper/plugins/serviceability/mi3xx/mi3xx_cper_utils.py new file mode 100644 index 00000000..bdc4ce15 --- /dev/null +++ b/nodescraper/plugins/serviceability/mi3xx/mi3xx_cper_utils.py @@ -0,0 +1,133 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. 
+# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +from __future__ import annotations + +from typing import Any + +# CPER-method AFIDs <= 34; MI3XX Redfish-method AFIDs 10000–10999. +CPER_METHOD_AFID_MAX = 34 +REDFISH_METHOD_AFID_MIN = 10000 +REDFISH_METHOD_AFID_MAX = 10999 + +_SERIAL_KEYS = ("SerialNumber", "serial_number", "UbbSerial", "ubb_serial") + + +def get_amd_oem_dict(event: dict[str, Any]) -> dict[str, Any]: + """Return the AMD OEM payload dict for a Redfish log member. + + BMC layouts vary: fields may live on Oem directly or under Oem.AMD. + When AMD is absent, returns Oem; when present, returns AMD if it is a dict. 
+ """ + if not isinstance(oem := event.get("Oem"), dict): + return {} + if (amd := oem.get("AMD")) is None: + return oem + return amd if isinstance(amd, dict) else {} + + +def _oem_list_field(oem_dict: dict[str, Any], key: str) -> list[Any]: + """Return a list field from the resolved AMD OEM dict.""" + raw = oem_dict.get(key) + return raw if isinstance(raw, list) else [] + + +def event_afids_from_oem(event: dict[str, Any]) -> list[int]: + """AFIDs from Oem.AMDFieldIdentifiers or Oem.AMD.AMDFieldIdentifiers.""" + raw = _oem_list_field(get_amd_oem_dict(event), "AMDFieldIdentifiers") + out: list[int] = [] + for item in raw: + if not isinstance(item, dict): + continue + for key in ("AFID", "Afid", "afid"): + if (v := item.get(key)) is not None: + try: + out.append(int(v)) + except (TypeError, ValueError): + pass + break + return out + + +def _err_data_arr_entries(event: dict[str, Any]) -> list[dict[str, Any]]: + """ErrDataArr rows from Oem.ErrDataArr or Oem.AMD.ErrDataArr.""" + arr = _oem_list_field(get_amd_oem_dict(event), "ErrDataArr") + return [e for e in arr if isinstance(e, dict)] + + +def event_has_aca_decode(event: dict[str, Any]) -> bool: + """True when the log entry includes ACA-style DecodedData under ErrDataArr.""" + for entry in _err_data_arr_entries(event): + decoded = entry.get("DecodedData") + if isinstance(decoded, dict) and decoded: + return True + return False + + +def _nonempty_serial_in_mapping(obj: Any) -> bool: + if not isinstance(obj, dict): + return False + for key in _SERIAL_KEYS: + val = obj.get(key) + if val is not None and str(val).strip(): + return True + return False + + +def event_aca_includes_serial(event: dict[str, Any]) -> bool: + """Serial (or UBB serial) present on any ErrDataArr row MetaData.""" + return any( + _nonempty_serial_in_mapping(entry.get("MetaData")) for entry in _err_data_arr_entries(event) + ) + + +def is_cper_method_afid(afid: int) -> bool: + """True for CPER-method AFIDs (<= CPER_METHOD_AFID_MAX), including on RF log 
entries.""" + return afid <= CPER_METHOD_AFID_MAX + + +def is_redfish_method_afid(afid: int) -> bool: + """True for MI3XX Redfish-method AFIDs in the 10k range (10000–10999).""" + return REDFISH_METHOD_AFID_MIN <= afid <= REDFISH_METHOD_AFID_MAX + + +def should_skip_cper_fetch_or_decode(event: dict[str, Any]) -> bool: + """Whether to omit CPER binary fetch and configured CPER decode for this Redfish member. + + Skip when: + + * Every OEM-listed AFID is CPER-method (<= CPER_METHOD_AFID_MAX; may match + in-band CPER AFIDs), ACA DecodedData is present, and serial is on the entry; or + * ACA DecodedData is present but no serial — the CPER blob does not add + actionable identity beyond what is already missing from the log. + """ + if not event_has_aca_decode(event): + return False + if not event_aca_includes_serial(event): + return True + afids = event_afids_from_oem(event) + if not afids: + return False + return all(is_cper_method_afid(afid) for afid in afids) diff --git a/nodescraper/plugins/serviceability/mi3xx/mi3xx_data.py b/nodescraper/plugins/serviceability/mi3xx/mi3xx_data.py new file mode 100644 index 00000000..17a60eaa --- /dev/null +++ b/nodescraper/plugins/serviceability/mi3xx/mi3xx_data.py @@ -0,0 +1,186 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. 
+# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +from __future__ import annotations + +import json +import os +from typing import Any, Dict, List, Optional + +from pydantic import BaseModel, Field + +from nodescraper.models import DataModel + + +class MI3XXDeviceInfo(BaseModel): + """Device identity with separate board and product fields.""" + + board_product_name: Optional[str] = Field( + default=None, + description="Board product name (IPMI board information area).", + ) + board_part_number: Optional[str] = Field( + default=None, + description="Board part number.", + ) + board_serial_number: Optional[str] = Field( + default=None, + description="Board serial number.", + ) + board_manufacturing_date: Optional[str] = Field( + default=None, + description=( + "Board manufacturing date as a rendered string " + "(not IPMI minutes-since-1996 encoding)." 
+ ), + ) + product_name: Optional[str] = Field( + default=None, + description="Product name (IPMI product information area).", + ) + product_part_number: Optional[str] = Field( + default=None, + description="Product part or model number.", + ) + product_serial_number: Optional[str] = Field( + default=None, + description="Product serial number.", + ) + product_version: Optional[str] = Field( + default=None, + description="Product version (no board-area equivalent in IPMI FRU).", + ) + oem_extensions: Dict[str, Any] = Field( + default_factory=dict, + description=("Vendor-specific fields: extra board/product data, multirecord, etc."), + ) + + +class MI3XXResult(BaseModel): + """Structured serviceability report output.""" + + node: Optional[str] = None + node_scraper_version: Optional[str] = Field( + default=None, + description="Version of amd-node-scraper that produced this report.", + ) + plugin_name: Optional[str] = Field( + default=None, + description="Name of the serviceability plugin that produced this report.", + ) + plugin_version: Optional[str] = Field( + default=None, + description="Version of the serviceability plugin that produced this report.", + ) + reporter_extensions: Dict[str, str] = Field( + default_factory=dict, + description="Additional tool versions keyed by name.", + ) + service_recommendations: Dict[str, List[dict]] = Field(default_factory=dict) + service_action_definitions: Dict[str, dict] = Field(default_factory=dict) + afid_sag_metadata: Dict[str, Any] = Field(default_factory=dict) + node_info: Dict[str, Any] = Field(default_factory=dict) + extensions: Dict[str, Any] = Field( + default_factory=dict, + description="Additional implementation-specific fields.", + ) + + +def build_mi3xx_reporting_version_fields( + *, + plugin_name: Optional[str] = None, + plugin_version: Optional[str] = None, + node_scraper_version: Optional[str] = None, + **reporter_extensions: str, +) -> Dict[str, Any]: + """Build keyword arguments for result versioning fields. 
+ + Args: + plugin_name: Name of the reporting plugin. + plugin_version: Version of the reporting plugin. + node_scraper_version: Node scraper version; defaults to the installed package version. + reporter_extensions: Additional tool versions as keyword arguments. + + Returns: + Dictionary of versioning fields for a result model. + """ + import nodescraper + + return { + "node_scraper_version": node_scraper_version or nodescraper.__version__, + "plugin_name": plugin_name, + "plugin_version": plugin_version, + "reporter_extensions": dict(reporter_extensions), + } + + +class MI3XXDataModel(DataModel): + """Collected OOB Redfish serviceability data model.""" + + collected_data: Dict[str, Any] = Field( + default_factory=dict, + description="Arbitrary keyed payloads from the collector implementation.", + ) + device_info: Dict[str, MI3XXDeviceInfo] = Field( + default_factory=dict, + description="Optional device identity keyed by implementer-defined labels.", + ) + artifacts: Dict[str, Any] = Field( + default_factory=dict, + description="Filename to JSON-serializable payload for log_model output.", + ) + endpoint: Optional[str] = Field( + default=None, + description="Optional host or service endpoint label (not necessarily a BMC).", + ) + log_path: Optional[str] = None + result: Optional[MI3XXResult] = None + + def log_model(self, log_path: str) -> None: + """Write artifact files and a JSON summary under the log directory. + + Args: + log_path: Directory path for output files. + + Returns: + None. 
+ """ + os.makedirs(log_path, exist_ok=True) + for filename, payload in self.artifacts.items(): + if not filename or not str(filename).strip(): + continue + artifact_path = os.path.join(log_path, str(filename).strip()) + with open(artifact_path, "w", encoding="utf-8") as handle: + json.dump(payload, handle, indent=2) + summary_path = os.path.join(log_path, "MI3XX_data.json") + with open(summary_path, "w", encoding="utf-8") as handle: + json.dump( + self.model_dump( + exclude={"artifacts"}, + mode="json", + ), + handle, + indent=2, + ) diff --git a/nodescraper/plugins/serviceability/mi3xx/serviceability_plugin_mi3xx.py b/nodescraper/plugins/serviceability/mi3xx/serviceability_plugin_mi3xx.py new file mode 100644 index 00000000..d578d949 --- /dev/null +++ b/nodescraper/plugins/serviceability/mi3xx/serviceability_plugin_mi3xx.py @@ -0,0 +1,51 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +from nodescraper.plugins.serviceability.analyzer_args import ServiceabilityAnalyzerArgs +from nodescraper.plugins.serviceability.serviceability_data import ( + ServiceabilityDataModel, +) +from nodescraper.plugins.serviceability.serviceability_plugin_base import ( + ServiceabilityPluginBase, +) +from nodescraper.utils import register_log_dir_name + +from .mi3xx_analyzer import MI3XXAnalyzer +from .mi3xx_collector import MI3XXCollector +from .mi3xx_collector_args import MI3XXCollectorArgs + +register_log_dir_name("ServiceabilityPluginMI3XX", "serviceability_plugin_MI3XX") +register_log_dir_name("MI3XXCollector", "MI3XX_collector") +register_log_dir_name("MI3XXAnalyzer", "MI3XX_analyzer") + + +class ServiceabilityPluginMI3XX(ServiceabilityPluginBase): + """MI3XX OOB Redfish serviceability: BMC event log, CPER attachments, and service hub analysis.""" + + DATA_MODEL = ServiceabilityDataModel + COLLECTOR = MI3XXCollector + ANALYZER = MI3XXAnalyzer + COLLECTOR_ARGS = MI3XXCollectorArgs + ANALYZER_ARGS = ServiceabilityAnalyzerArgs diff --git a/nodescraper/plugins/serviceability/se_adapter.py b/nodescraper/plugins/serviceability/se_adapter.py new file mode 100644 index 00000000..3db9394d --- /dev/null +++ b/nodescraper/plugins/serviceability/se_adapter.py @@ -0,0 +1,338 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. 
+# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +"""Map serviceability plugin models to/from Python service hub results.""" +from __future__ import annotations + +import json +from collections import defaultdict +from typing import Any, Dict, List, Optional, Tuple + +from .se_models import AfidEvent, ServiceabilityBlock, ServiceabilitySolution + +# Hub payload keys commonly holding a one-line human summary (not raw OEM metadata). +_SUMMARY_VALUE_KEYS: Tuple[str, ...] = ( + "short_service", + "short_service_info", + "summary", + "message", + "title", + "recommendation", + "solution", + "service_recommendation", + "action", +) +_UNIT_LABEL_KEYS: Tuple[str, ...] 
= ( + "oem", + "OEM", + "unit", + "serviceable_unit", + "designation", + "chassis", + "device", +) + + +def _hub_version_display(version_info: Any) -> Optional[str]: + """Pick a single hub version string from common hub result version dict layouts.""" + if not isinstance(version_info, dict) or not version_info: + return None + primary = ( + version_info.get("isa_version") + or version_info.get("version") + or version_info.get("engine_version") + or version_info.get("VERSION") + ) + if primary is None: + return None + text = str(primary).strip() + if not text: + return None + bd = version_info.get("build_date") + if bd and str(bd).strip(): + return f"{text} (build {str(bd).strip()})" + return text + + +def _afid_sag_file_version_display(metadata: Any) -> Optional[str]: + """Build AFID_SAG file identity string (pid, revision, variant) from hub metadata.""" + if not isinstance(metadata, dict) or not metadata: + return None + pid = metadata.get("sag_pid") or metadata.get("pid") + rev = metadata.get("revision") + variant = metadata.get("variant") + parts: list[str] = [] + if pid and str(pid).strip(): + parts.append(f"PID {str(pid).strip()}") + if rev and str(rev).strip(): + parts.append(f"revision {str(rev).strip()}") + if variant and str(variant).strip(): + parts.append(f"variant {str(variant).strip()}") + if not parts: + return None + return ", ".join(parts) + + +def _human_summary_line_from_hub_value(value: Any) -> Optional[str]: + """Pick a single human-readable line from a hub fragment (string, number, or dict).""" + if value is None: + return None + if isinstance(value, str): + text = value.strip() + return text or None + if isinstance(value, (int, float)) and not isinstance(value, bool): + return str(value).strip() or None + if isinstance(value, dict): + for key in _SUMMARY_VALUE_KEYS: + if key not in value: + continue + got = _human_summary_line_from_hub_value(value[key]) + if got: + return got + for key in ("service_action", "ServiceAction"): + if key not in 
value: + continue + raw = value[key] + if isinstance(raw, dict): + inner = ( + raw.get("title") + or raw.get("text") + or raw.get("name") + or raw.get("service_action") + ) + if isinstance(inner, str) and inner.strip(): + return inner.strip() + got = _human_summary_line_from_hub_value(raw) + if got: + return got + else: + s = str(raw).strip() + if s: + return s + for alt in ("text", "name", "description", "details"): + if isinstance(value.get(alt), str) and str(value[alt]).strip(): + return str(value[alt]).strip() + return None + text = str(value).strip() + return text or None + + +def _unit_label_from_short_service_item(item: dict[str, Any]) -> str: + for key in _UNIT_LABEL_KEYS: + raw = item.get(key) + if raw is not None and str(raw).strip(): + return str(raw).strip() + return "" + + +def _maybe_unwrap_outer_unit_map(d: dict[str, Any]) -> dict[str, Any]: + """If the hub wraps {wrapper: {unit: {...}}}, return the inner unit map.""" + if len(d) != 1: + return d + _, inner = next(iter(d.items())) + if isinstance(inner, dict) and inner and all(isinstance(v, dict) for v in inner.values()): + return inner + return d + + +def _merged_short_service_lines_from_unit_messages(entries: List[Tuple[str, str]]) -> List[str]: + """Group (unit, message) rows by message; merge units when the message is identical.""" + by_message: dict[str, list[str]] = defaultdict(list) + for unit, msg in entries: + if not msg: + continue + by_message[msg].append(unit or "") + + lines: list[str] = [] + for msg in sorted(by_message.keys(), key=lambda m: (-len(by_message[m]), m.lower())): + units = sorted({u for u in by_message[msg] if u}) + if len(units) <= 1: + u = units[0] if units else "" + lines.append(f"{msg} ({u})" if u else msg) + else: + lines.append(f"{msg} — OEMs/units: {', '.join(units)}") + return lines + + +def _format_short_service_info_for_block(raw: Any) -> Optional[str]: + """Turn hub ``short_service_info`` into multiline log/LLM text (no JSON dump of unit maps).""" + if raw is 
None: + return None + if isinstance(raw, str): + text = raw.strip() + return text or None + if isinstance(raw, (list, tuple)): + if raw and all(isinstance(x, dict) for x in raw): + entries: list[tuple[str, str]] = [] + for item in raw: + assert isinstance(item, dict) + unit = _unit_label_from_short_service_item(item) + msg = _human_summary_line_from_hub_value( + item + ) or _human_summary_line_from_hub_value(item.get("short_service_info")) + if msg: + entries.append((unit, msg)) + lines = _merged_short_service_lines_from_unit_messages(entries) + out = "\n".join(lines).strip() + return out or None + parts = [str(x).strip() for x in raw if x is not None and str(x).strip()] + return "\n".join(parts) if parts else None + if isinstance(raw, dict): + d = _maybe_unwrap_outer_unit_map(raw) + if d and all(isinstance(v, dict) for v in d.values()): + entries = [] + for unit_key, inner in d.items(): + msg = _human_summary_line_from_hub_value(inner) + if msg: + entries.append((str(unit_key).strip(), msg)) + lines = _merged_short_service_lines_from_unit_messages(entries) + out = "\n".join(lines).strip() + if out: + return out + flat_lines: list[str] = [] + for key in sorted(d.keys(), key=lambda x: str(x).lower()): + val = d[key] + if isinstance(val, dict): + msg = _human_summary_line_from_hub_value(val) + if msg: + flat_lines.append(f"{key}: {msg}") + elif val is not None and str(val).strip(): + flat_lines.append(f"{key}: {str(val).strip()}") + if flat_lines: + return "\n".join(flat_lines) + try: + compact = json.dumps(d, sort_keys=True) + except TypeError: + compact = str(d) + compact = compact.strip() + return compact or None + text = str(raw).strip() + return text or None + + +def format_serviceability_solution_lines(block: ServiceabilityBlock) -> list[str]: + """Human-readable lines for logging or console output.""" + lines: list[str] = [] + if block.short_service_info: + lines.append("short_service_info:") + for part in block.short_service_info.splitlines(): + 
lines.append(f" {part}" if part else " ") + lines.append("") + if block.solution_reasoning: + lines.append(block.solution_reasoning) + if block.hub_version: + lines.append(f"Hub version: {block.hub_version}") + if block.afid_sag_file_version: + lines.append(f"AFID_SAG file: {block.afid_sag_file_version}") + if not block.solution: + lines.append("No service actions recommended.") + return lines + for index, solution in enumerate(block.solution, start=1): + units = ", ".join(solution.serviceable_unit) + title = (solution.service_action_title or "").strip() + action = f"service action {solution.service_action_num}" + if title: + action = f"{action} ({title})" + lines.append(f"[{index}] AFID {solution.afid}, {action}, units: [{units}]") + return lines + + +def serviceability_block_from_service_result( + afid_events: list[AfidEvent], + result: Any, + *, + hub_label: str = "Service hub", + rf_event_count: int = 0, +) -> ServiceabilityBlock: + """Build a ``ServiceabilityBlock`` from a hub result with ``service_info``.""" + grouped: dict[tuple[int, int], list[str]] = defaultdict(list) + titles: dict[tuple[int, int], str] = {} + service_info = getattr(result, "service_info", None) or {} + + def _action_title(info: dict[str, Any]) -> str: + raw = info.get("title") or info.get("service_action") or info.get("ServiceAction") + if raw is None: + return "" + if isinstance(raw, dict): + return str(raw.get("title") or raw.get("text") or raw.get("name") or "").strip() + return str(raw).strip() + + for designation, afid_map in service_info.items(): + if not isinstance(afid_map, dict): + continue + unit = str(designation).strip() if designation is not None else "" + for afid_raw, info in afid_map.items(): + if not isinstance(info, dict): + continue + san_raw = info.get("service_action_number") + if san_raw is None: + continue + try: + afid = int(afid_raw) + san = int(san_raw) + except (TypeError, ValueError): + continue + key = (afid, san) + if unit and unit not in grouped[key]: + 
grouped[key].append(unit) + label = _action_title(info) + if label and key not in titles: + titles[key] = label + + solutions = [ + ServiceabilitySolution( + afid=afid, + serviceable_unit=units, + service_action_num=san, + service_action_title=titles.get((afid, san)), + ) + for (afid, san), units in sorted(grouped.items()) + ] + raw_metadata = getattr(result, "afid_sag_metadata", None) + metadata: Dict[str, Any] = raw_metadata if isinstance(raw_metadata, dict) else {} + version_info = ( + getattr(result, "engine_version_info", None) + or getattr(result, "isa_version_info", None) + or getattr(result, "version_info", None) + or {} + ) + hub_version = _hub_version_display(version_info) + afid_sag_file_version = _afid_sag_file_version_display(metadata) + reasoning = ( + f"{hub_label}: {len(solutions)} recommendation(s) from {rf_event_count} Redfish event(s)." + ) + meta_out: Optional[dict[str, Any]] = dict(metadata) if isinstance(raw_metadata, dict) else None + short_service_info = _format_short_service_info_for_block( + getattr(result, "short_service_info", None) + ) + return ServiceabilityBlock( + afid_events=list(afid_events), + solution=solutions, + solution_reasoning=reasoning, + hub_version=hub_version, + afid_sag_file_version=afid_sag_file_version, + afid_sag_metadata=meta_out, + short_service_info=short_service_info, + ) diff --git a/nodescraper/plugins/serviceability/se_models.py b/nodescraper/plugins/serviceability/se_models.py new file mode 100644 index 00000000..addef3ae --- /dev/null +++ b/nodescraper/plugins/serviceability/se_models.py @@ -0,0 +1,102 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. 
+# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +from __future__ import annotations + +from typing import Any, List, Optional + +from pydantic import BaseModel, Field, field_validator + + +class AfidEvent(BaseModel): + """One AFID occurrence on a serviceable unit.""" + + afid: int = Field(description="AMD Fault ID.") + serviceable_unit: str = Field( + description="Unit label (e.g. gpu02); standardized per platform.", + ) + time: str = Field( + description="First-occurrence timestamp (SE format, e.g. 
2026-05-07 12:50:42.096-07:00).", + ) + + @field_validator("serviceable_unit") + @classmethod + def _strip_serviceable_unit(cls, value: str) -> str: + text = str(value).strip() + if not text: + raise ValueError("serviceable_unit must be non-empty") + return text + + +class ServiceabilitySolution(BaseModel): + """Recommended service action for an AFID.""" + + afid: int + serviceable_unit: List[str] = Field( + description="Affected serviceable units for this AFID and service action.", + ) + service_action_num: int = Field( + description="Service action number from AFID_SAG.json.", + ) + service_action_title: Optional[str] = Field( + default=None, + description=("Short service action label from the hub."), + ) + + +class ServiceabilityBlock(BaseModel): + """ANC-style serviceability section: SE input, output, and optional reasoning.""" + + afid_events: List[AfidEvent] = Field( + default_factory=list, + description="Summarized AFID events from collected data.", + ) + solution: List[ServiceabilitySolution] = Field( + default_factory=list, + description="Hub output: recommended service actions.", + ) + solution_reasoning: Optional[str] = Field( + default=None, + description="Human-readable summary of recommendations (counts and hub label).", + ) + hub_version: Optional[str] = Field( + default=None, + description="Service hub package/build version string when the hub returned it.", + ) + afid_sag_file_version: Optional[str] = Field( + default=None, + description="AFID_SAG.json pid/revision/variant string when the hub returned metadata.", + ) + afid_sag_metadata: Optional[dict[str, Any]] = Field( + default=None, + description="Hub-reported AFID_SAG metadata dict when the hub exposes afid_sag_metadata.", + ) + short_service_info: Optional[str] = Field( + default=None, + description=( + "Brief hub summary derived from short_service_info (human-readable lines; " + "per-unit dict payloads are collapsed, identical messages merged with unit lists)." 
+ ), + ) diff --git a/nodescraper/plugins/serviceability/se_runner.py b/nodescraper/plugins/serviceability/se_runner.py new file mode 100644 index 00000000..6ff8b60e --- /dev/null +++ b/nodescraper/plugins/serviceability/se_runner.py @@ -0,0 +1,194 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+# +############################################################################### +"""Invoke a configured Python service hub against collected Redfish events.""" +from __future__ import annotations + +import importlib +import inspect +from pathlib import Path +from typing import Any, Callable, Optional, Type + +from .se_adapter import serviceability_block_from_service_result +from .se_models import AfidEvent, ServiceabilityBlock + + +def _signature_accepts_var_keyword(sig: inspect.Signature) -> bool: + return any(p.kind == inspect.Parameter.VAR_KEYWORD for p in sig.parameters.values()) + + +def _instantiate_hub( + hub_cls: Type[Any], + config_path: str, + init_path_kwarg: str, + hub_options: Optional[dict[str, Any]], +) -> Any: + """Construct the hub with ``config_path`` under ``init_path_kwarg``, plus matching options.""" + init_sig = inspect.signature(hub_cls.__init__) + kwargs: dict[str, Any] = {init_path_kwarg: config_path} + if not hub_options: + return hub_cls(**kwargs) + if _signature_accepts_var_keyword(init_sig): + merged = dict(hub_options) + merged[init_path_kwarg] = config_path + return hub_cls(**merged) + for key, val in hub_options.items(): + if key in init_sig.parameters: + kwargs[key] = val + kwargs[init_path_kwarg] = config_path + return hub_cls(**kwargs) + + +def _call_hub_analyze( + analyze: Callable[..., Any], + rf_events: list[Any], + cper_data: Optional[dict[str, Any]], + hub_options: Optional[dict[str, Any]], +) -> Any: + """Invoke the hub analyze callable with ``cper_data`` and per-parameter ``hub_options``.""" + sig = inspect.signature(analyze) + params = sig.parameters + eo = dict(hub_options or {}) + + if _signature_accepts_var_keyword(sig): + if "cper_data" in params: + eo["cper_data"] = dict(cper_data) if cper_data else None + return analyze(list(rf_events), **eo) + + kw = {k: v for k, v in eo.items() if k in params} + if "cper_data" in params: + kw["cper_data"] = dict(cper_data) if cper_data else None + return 
analyze(list(rf_events), **kw) + + +class SeRunError(RuntimeError): + """Raised when the service hub fails or returns invalid output.""" + + +def run_service_hub( + *, + hub_python_module: str, + hub_display_name: Optional[str] = None, + afid_events: list[AfidEvent], + afid_sag_path: str, + rf_events: list[Any], + cper_data: Optional[dict[str, Any]] = None, + hub_options: Optional[dict[str, Any]] = None, + hub_analyze_method: str = "get_service_info", + hub_init_path_kwarg: str = "afid_sag", +) -> ServiceabilityBlock: + """Run the configured Python service hub and return a :class:`ServiceabilityBlock`. + + The runner imports ``hub_python_module``, picks the unique class that implements + ``hub_analyze_method``, constructs it with the config file path passed as + ``hub_init_path_kwarg``, then calls the analyze method with ``rf_events`` and any + ``hub_options`` keys that match the method signature (plus ``cper_data`` when + supported). Result mapping is handled by :func:`serviceability_block_from_service_result`. + """ + sag_path = Path(afid_sag_path) + if not sag_path.is_file(): + raise SeRunError(f"Hub config file not found: {afid_sag_path}") + + if not rf_events: + raise SeRunError( + "Collected Redfish events are required; re-run collection or use skip_hub." 
+ ) + + label = hub_display_name or hub_python_module + try: + mod = importlib.import_module(hub_python_module) + except ImportError as exc: + raise SeRunError(f"Cannot import {hub_python_module}: {exc}") from exc + + hub_cls = _resolve_hub_class(mod, hub_analyze_method) + + try: + instance = _instantiate_hub( + hub_cls, + afid_sag_path, + hub_init_path_kwarg, + hub_options, + ) + analyze = getattr(instance, hub_analyze_method) + result = _call_hub_analyze( + analyze, + rf_events, + cper_data, + hub_options, + ) + except Exception as exc: + raise SeRunError(f"{label} {hub_analyze_method}() failed: {exc}") from exc + + if result is None: + return ServiceabilityBlock( + afid_events=list(afid_events), + solution=[], + solution_reasoning=f"{label}: no service actions after event filtering.", + ) + + return serviceability_block_from_service_result( + afid_events, + result, + hub_label=label, + rf_event_count=len(rf_events), + ) + + +def _is_hub_class(obj: Any, analyze_method: str = "get_service_info") -> bool: + return inspect.isclass(obj) and callable(getattr(obj, analyze_method, None)) + + +def _resolve_hub_class(mod: Any, analyze_method: str = "get_service_info") -> Type[Any]: + """Find the hub class in ``mod`` that implements ``analyze_method``.""" + package = mod.__name__ + candidates: list[Type[Any]] = [] + seen: set[int] = set() + + def add_candidate(obj: Any) -> None: + if not _is_hub_class(obj, analyze_method): + return + key = id(obj) + if key in seen: + return + seen.add(key) + candidates.append(obj) + + for name in getattr(mod, "__all__", []) or []: + add_candidate(getattr(mod, name, None)) + + for _, obj in inspect.getmembers(mod, inspect.isclass): + obj_module = getattr(obj, "__module__", "") + if obj_module == package or obj_module.startswith(f"{package}."): + add_candidate(obj) + + if len(candidates) == 1: + return candidates[0] + if not candidates: + raise SeRunError( + f"No class with {analyze_method}() found in {package}; " + "check hub_python_module 
and hub_analyze_method in analysis_args." + ) + names = ", ".join(cls.__name__ for cls in candidates) + raise SeRunError(f"Multiple classes with {analyze_method}() in {package}: {names}.") diff --git a/nodescraper/plugins/serviceability/serviceability_collector.py b/nodescraper/plugins/serviceability/serviceability_collector.py new file mode 100644 index 00000000..0ad28643 --- /dev/null +++ b/nodescraper/plugins/serviceability/serviceability_collector.py @@ -0,0 +1,254 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+# +############################################################################### +from __future__ import annotations + +import abc +from typing import Any, ClassVar, Generic, Literal, Optional, Protocol, TypeVar, cast +from urllib.parse import urlparse + +from pydantic import BaseModel, Field + +from nodescraper.base import RedfishDataCollector +from nodescraper.connection.redfish import ( + RF_MEMBERS, + RF_MEMBERS_COUNT, + RedfishGetResult, +) +from nodescraper.enums import ExecutionStatus +from nodescraper.models import CollectorArgs, TaskResult + +from .serviceability_data import DeviceInfo, ServiceabilityDataModel + + +class ServiceabilityUriManifestArtifact(BaseModel): + """Resolved Redfish URIs for this serviceability run (``serviceability_uri_manifest.json``).""" + + ARTIFACT_LOG_BASENAME: ClassVar[str] = "serviceability_uri_manifest" + + artifact_kind: Literal["ServiceabilityUriManifest"] = "ServiceabilityUriManifest" + event_log_uri: str + assembly_get_uris: list[str] = Field(default_factory=list) + firmware_inventory_uri: Optional[str] = None + + +class FirmwareInventoryArtifact(BaseModel): + """Firmware inventory Redfish GET; written to ``firmware_inventory.json`` with path, success, data, error, and status_code fields (same layout as a Redfish GET artifact row).""" + + ARTIFACT_LOG_BASENAME: ClassVar[str] = "firmware_inventory" + + path: str + success: bool + data: Optional[dict[str, Any]] = None + error: Optional[str] = None + status_code: Optional[int] = None + + @classmethod + def from_redfish_get(cls, res: RedfishGetResult) -> FirmwareInventoryArtifact: + return cls.model_validate(res.model_dump(mode="python")) + + +class _ServiceabilityCollectArg(Protocol): + follow_next_link: bool + max_pages: int + top: Optional[int] + rf_assembly_uri_template: Optional[str] + rf_chassis_devices: Optional[list[str]] + rf_firmware_bundle_uri: Optional[str] + + def resolved_event_log_uri(self) -> str: ... 
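The `_ServiceabilityCollectArg` protocol above is structural, so any args object exposing these fields and `resolved_event_log_uri()` should type-check against `TServiceabilityCollectArg`. A minimal sketch under stated assumptions: `CollectArgLike` is a local stand-in for the private protocol, and `ExampleCollectArgs`, its defaults, and the event-log path are hypothetical and not part of this change (a real args class would presumably extend `CollectorArgs`).

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol, runtime_checkable


@runtime_checkable
class CollectArgLike(Protocol):
    """Local stand-in mirroring the members of _ServiceabilityCollectArg."""

    follow_next_link: bool
    max_pages: int
    top: Optional[int]
    rf_assembly_uri_template: Optional[str]
    rf_chassis_devices: Optional[List[str]]
    rf_firmware_bundle_uri: Optional[str]

    def resolved_event_log_uri(self) -> str: ...


@dataclass
class ExampleCollectArgs:
    """Hypothetical args object; every default below is an assumption."""

    event_log_uri: str = "/redfish/v1/Systems/system/LogServices/EventLog/Entries"
    follow_next_link: bool = True
    max_pages: int = 10
    top: Optional[int] = None
    rf_assembly_uri_template: Optional[str] = None
    rf_chassis_devices: Optional[List[str]] = None
    rf_firmware_bundle_uri: Optional[str] = None

    def resolved_event_log_uri(self) -> str:
        # Strip stray whitespace so the value can be used directly as a GET path.
        return self.event_log_uri.strip()


# No inheritance from the protocol is needed; matching members are enough.
args = ExampleCollectArgs(top=50)
```

Because the match is structural, the collector base can stay generic over the args type without importing any concrete args class.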
+ + +TServiceabilityCollectArg = TypeVar("TServiceabilityCollectArg", bound=_ServiceabilityCollectArg) + + +class ServiceabilityCollectorBase( + RedfishDataCollector[ServiceabilityDataModel, CollectorArgs], + Generic[TServiceabilityCollectArg], +): + """OOB Redfish collection skeleton; subclasses implement filtering, CPER handling, and JSON parsing.""" + + DATA_MODEL = ServiceabilityDataModel + + def __init__(self, **kwargs: Any) -> None: + self._log_path: Optional[str] = kwargs.get("log_path") + super().__init__(**kwargs) + + @abc.abstractmethod + def filter_event_members( + self, + members: list[Any], + args: TServiceabilityCollectArg, + ) -> list[Any]: + """Return the event list to retain for downstream analysis.""" + + @abc.abstractmethod + def is_cper_event(self, event: dict) -> bool: + """Return whether a Redfish event entry should be treated as diagnostic-backed.""" + + @abc.abstractmethod + def collect_cper_attachments(self, rf_events: list[Any]) -> dict[str, str]: + """Fetch CPER binary attachments for qualifying events (base64 by event Id).""" + + @abc.abstractmethod + def parse_assembly_entry( + self, + designation: str, + assembly_member_entry: dict[str, Any], + args: TServiceabilityCollectArg, + ) -> DeviceInfo: + """Map one Assemblies[] member dict into DeviceInfo.""" + + @abc.abstractmethod + def extract_component_details( + self, + firmware_inventory_payload: dict[str, Any], + args: TServiceabilityCollectArg, + ) -> Optional[str]: + """Derive component-details text from a firmware inventory GET payload, or None.""" + + def _fetch_event_log(self, args: TServiceabilityCollectArg, uri: str): + if args.follow_next_link: + return self._run_redfish_get_paged(uri, max_pages=args.max_pages, log_artifact=True) + return self._run_redfish_get(uri, log_artifact=True) + + def collect_data( + self, args: Optional[CollectorArgs] = None + ) -> tuple[TaskResult, Optional[ServiceabilityDataModel]]: + if args is None: + self.result.status = ExecutionStatus.NOT_RAN + 
self.result.message = "Collector args are required" + return self.result, None + + svc_args = cast(TServiceabilityCollectArg, args) + event_uri = svc_args.resolved_event_log_uri() + self.logger.info( + "Serviceability: event log Redfish URI %s (follow_next_link=%s)", + event_uri, + svc_args.follow_next_link, + ) + if svc_args.top is not None: + res = self._fetch_top(svc_args, svc_args.top, svc_args.max_pages) + else: + res = self._fetch_event_log(svc_args, event_uri) + + if not res.success or res.data is None: + self.result.status = ExecutionStatus.ERROR + self.result.message = f"Redfish GET failed for {event_uri}: {res.error}" + return self.result, None + + members = res.data.get(RF_MEMBERS, []) + responses = {res.path: res.data} + raw_base_url = getattr(self.connection, "base_url", None) + bmc_host = urlparse(raw_base_url).hostname if raw_base_url else None + + try: + filtered_members = self.filter_event_members(members, svc_args) + except ValueError as exc: + self.result.status = ExecutionStatus.ERROR + self.result.message = f"Event filter failed: {exc}" + return self.result, None + + assembly_info: dict[str, DeviceInfo] = {} + assembly_get_uris: list[str] = [] + tpl = svc_args.rf_assembly_uri_template + devices = svc_args.rf_chassis_devices + if tpl and devices: + for device in devices: + uri_asm = tpl.format(device=device) + assembly_get_uris.append(uri_asm) + self.logger.info( + "Serviceability: assembly Redfish GET %s (chassis designation=%s)", + uri_asm, + device, + ) + assembly_res = self._run_redfish_get(uri_asm, log_artifact=True) + if not assembly_res.success or assembly_res.data is None: + continue + responses[assembly_res.path] = assembly_res.data + + assemblies = assembly_res.data.get("Assemblies", []) + if not assemblies: + continue + + entry = assemblies[0] + assembly_info[device] = self.parse_assembly_entry(device, entry, svc_args) + + cper_raw = self.collect_cper_attachments(filtered_members or []) + + component_details, firmware_uri_used = 
self._fetch_component_details(responses, svc_args) + + data = ServiceabilityDataModel( + responses=responses, + rf_events=filtered_members or [], + assembly_info=assembly_info, + cper_raw=cper_raw, + component_details=component_details, + log_path=self._log_path, + bmc_host=bmc_host, + ) + self.result.artifacts.append( + ServiceabilityUriManifestArtifact( + event_log_uri=event_uri, + assembly_get_uris=assembly_get_uris, + firmware_inventory_uri=firmware_uri_used, + ) + ) + self.result.status = ExecutionStatus.OK + self.result.message = f"Collected {len(members)} event log member(s)" + return self.result, data + + def _fetch_component_details( + self, responses: dict[str, Any], args: TServiceabilityCollectArg + ) -> tuple[Optional[str], Optional[str]]: + """Return ``(component_details, firmware_uri)``; firmware_uri is set when a GET was attempted.""" + fw_uri = args.rf_firmware_bundle_uri + if not fw_uri or not str(fw_uri).strip(): + return None, None + fw_uri = str(fw_uri).strip() + self.logger.info("Serviceability: firmware inventory Redfish GET %s", fw_uri) + fw_res = self._run_redfish_get(fw_uri, log_artifact=False) + self.result.artifacts.append(FirmwareInventoryArtifact.from_redfish_get(fw_res)) + if not fw_res.success or fw_res.data is None: + return None, fw_uri + responses[fw_res.path] = fw_res.data + return self.extract_component_details(fw_res.data, args), fw_uri + + def _fetch_top(self, args: TServiceabilityCollectArg, top: int, max_pages: int): + event_uri = args.resolved_event_log_uri() + probe = self._run_redfish_get(f"{event_uri}?$top=1", log_artifact=True) + if not probe.success or probe.data is None: + return probe + + count = probe.data.get(RF_MEMBERS_COUNT, 0) + + if count <= top: + return self._fetch_event_log(args, event_uri) + + skip = count - top + skip_uri = f"{event_uri}?$skip={skip}" + if args.follow_next_link: + return self._run_redfish_get_paged(skip_uri, max_pages=max_pages, log_artifact=True) + return self._run_redfish_get(skip_uri, 
log_artifact=True) diff --git a/nodescraper/plugins/serviceability/serviceability_data.py b/nodescraper/plugins/serviceability/serviceability_data.py new file mode 100644 index 00000000..b275c579 --- /dev/null +++ b/nodescraper/plugins/serviceability/serviceability_data.py @@ -0,0 +1,107 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+# +############################################################################### +from __future__ import annotations + +import json +import os +from typing import Any, Dict, List, Optional + +from pydantic import BaseModel, Field + +from nodescraper.models import DataModel + +from .se_models import AfidEvent, ServiceabilityBlock + + +class DeviceInfo(BaseModel): + """Chassis fields from Assembly parsing; extra vendor keys belong in oem_extensions.""" + + name: Optional[str] = None + part_number: Optional[str] = None + production_date: Optional[str] = None + serial_number: Optional[str] = None + version: Optional[str] = None + oem_extensions: Dict[str, Any] = Field( + default_factory=dict, + description="Opaque vendor/product extensions parsed by the concrete collector.", + ) + + +class ServiceabilityResult(BaseModel): + """Structured serviceability output (typically populated by a downstream analyzer).""" + + node: Optional[str] = None + service_recommendations: Dict[str, List[dict]] = {} + service_action_definitions: Dict[str, dict] = {} + afid_sag_metadata: Dict[str, Any] = {} + node_info: Dict[str, Any] = {} + + +class ServiceabilityDataModel(DataModel): + """Collected Redfish responses and intermediate serviceability fields.""" + + responses: dict[str, Any] = {} + rf_events: list[Any] = [] + assembly_info: Dict[str, DeviceInfo] = {} + cper_raw: Dict[str, str] = Field( + default_factory=dict, + description=( + "Base64-encoded CPER attachment bytes keyed by Redfish event Id; " + "populated during collection and decoded in the analyzer." 
+ ), + ) + cper_data: Dict[str, Any] = {} + component_details: Optional[str] = None + log_path: Optional[str] = None + bmc_host: Optional[str] = None + afid_events: List[AfidEvent] = Field( + default_factory=list, + description="Service Hub input; built during analysis when not pre-filled.", + ) + serviceability: Optional[ServiceabilityBlock] = Field( + default=None, + description="ANC-style serviceability block (SE input + output).", + ) + result: Optional[ServiceabilityResult] = None + + def log_model(self, log_path: str) -> None: + """Write collector artifacts and optional serviceability.json under log_path.""" + os.makedirs(log_path, exist_ok=True) + responses_path = os.path.join(log_path, "redfish_responses.json") + with open(responses_path, "w", encoding="utf-8") as f: + json.dump(self.responses, f, indent=2) + if self.cper_data: + cper_path = os.path.join(log_path, "cper_data.json") + with open(cper_path, "w", encoding="utf-8") as f: + json.dump(self.cper_data, f, indent=2) + if self.serviceability is not None: + serviceability_path = os.path.join(log_path, "serviceability.json") + with open(serviceability_path, "w", encoding="utf-8") as f: + json.dump( + self.serviceability.model_dump(mode="json"), + f, + indent=2, + ) diff --git a/nodescraper/plugins/serviceability/serviceability_plugin_base.py b/nodescraper/plugins/serviceability/serviceability_plugin_base.py new file mode 100644 index 00000000..67ff45ca --- /dev/null +++ b/nodescraper/plugins/serviceability/serviceability_plugin_base.py @@ -0,0 +1,46 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. 
+# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+# +############################################################################### +from nodescraper.base import OOBandDataPlugin +from nodescraper.models import CollectorArgs + +from .analyzer_args import ServiceabilityAnalyzerArgs +from .serviceability_collector import ServiceabilityCollectorBase +from .serviceability_data import ServiceabilityDataModel + + +class ServiceabilityPluginBase( + OOBandDataPlugin[ + ServiceabilityDataModel, + CollectorArgs, + ServiceabilityAnalyzerArgs, + ], +): + """OOB Redfish plugin stub; subclass with a concrete COLLECTOR and COLLECTOR_ARGS.""" + + DATA_MODEL = ServiceabilityDataModel + COLLECTOR = ServiceabilityCollectorBase + COLLECTOR_ARGS = CollectorArgs + ANALYZER_ARGS = ServiceabilityAnalyzerArgs diff --git a/nodescraper/plugins/serviceability/time_utils.py b/nodescraper/plugins/serviceability/time_utils.py new file mode 100644 index 00000000..7b9465c5 --- /dev/null +++ b/nodescraper/plugins/serviceability/time_utils.py @@ -0,0 +1,147 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +from __future__ import annotations + +from datetime import datetime, timezone +from typing import Literal + +TimeOperator = Literal[">", ">=", "<", "<=", "=="] + +_TIME_OPERATORS: set[str] = {">", ">=", "<", "<=", "=="} + + +def _as_utc_for_compare(value: datetime) -> datetime: + """Normalize naive datetimes to UTC for comparisons against offset-aware values.""" + if value.tzinfo is None: + return value.replace(tzinfo=timezone.utc) + return value.astimezone(timezone.utc) + + +def is_valid_iso_datetime(value: str) -> bool: + """Return whether a string is ISO-8601 compliant. + + Args: + value: Date or date-time string to validate. + + Returns: + True if the value parses as ISO-8601. + """ + try: + parse_iso_datetime(value) + except ValueError: + return False + return True + + +def normalize_se_timestamp(value: str) -> str: + """Normalize a timestamp to the Service Hub wire format. + + Accepts ISO-8601 (``2026-05-07T12:50:42``) and SE-style strings with a space + separator (``2026-05-07 12:50:42.096-07:00``). + """ + text = str(value).strip() + if not text: + raise ValueError("Empty datetime string") + if " " in text and "T" not in text: + return text + parsed = parse_iso_datetime(text) + micro = parsed.microsecond + base = parsed.strftime("%Y-%m-%d %H:%M:%S") + if micro: + base = f"{base}.{micro:06d}".rstrip("0").rstrip(".") + offset = parsed.strftime("%z") + if offset: + return f"{base}{offset[:3]}:{offset[3:]}" + return base + + +def parse_iso_datetime(value: str) -> datetime: + """Parse an ISO-8601 or SE-style date-time string. + + Args: + value: Date (e.g. 
2026-05-17), ISO date-time, or SE format with a space separator. + + Returns: + Parsed datetime. + """ + text = str(value).strip() + if not text: + raise ValueError("Empty datetime string") + if text.endswith("Z"): + text = f"{text[:-1]}+00:00" + if " " in text and "T" not in text: + text = text.replace(" ", "T", 1) + try: + parsed = datetime.fromisoformat(text) + except ValueError as exc: + raise ValueError(f"Not ISO-8601 compliant: {value!r}") from exc + if "T" not in value and "+" not in value and value.count("-") == 2: + return parsed.replace(hour=0, minute=0, second=0, microsecond=0) + return parsed + + +def compare_iso_datetime(left: str, right: str, operator: TimeOperator) -> bool: + """Compare two ISO-8601 values with a relational operator. + + Args: + left: Left-hand date or date-time string. + right: Right-hand date or date-time string. + operator: One of >, >=, <, <=, or ==. + + Returns: + Result of the comparison. + """ + if operator not in _TIME_OPERATORS: + raise ValueError(f"Unsupported time operator: {operator!r}") + left_dt = _as_utc_for_compare(parse_iso_datetime(left)) + right_dt = _as_utc_for_compare(parse_iso_datetime(right)) + if operator == ">": + return left_dt > right_dt + if operator == ">=": + return left_dt >= right_dt + if operator == "<": + return left_dt < right_dt + if operator == "<=": + return left_dt <= right_dt + return left_dt == right_dt + + +def satisfies_time_check( + candidate: str, + reference: str, + operator: TimeOperator, +) -> bool: + """Test whether candidate satisfies operator against reference. + + Args: + candidate: Date or date-time string to test. + reference: Reference date or date-time string. + operator: One of >, >=, <, <=, or ==. + + Returns: + True when the comparison holds. 
+ """ + return compare_iso_datetime(candidate, reference, operator) diff --git a/nodescraper/py.typed b/nodescraper/py.typed new file mode 100644 index 00000000..e69de29b diff --git a/nodescraper/taskresulthooks/filesystemloghook.py b/nodescraper/taskresulthooks/filesystemloghook.py index 831e3fbe..50184b4e 100644 --- a/nodescraper/taskresulthooks/filesystemloghook.py +++ b/nodescraper/taskresulthooks/filesystemloghook.py @@ -28,7 +28,7 @@ from nodescraper.interfaces.taskresulthook import TaskResultHook from nodescraper.models import DataModel, TaskResult -from nodescraper.utils import pascal_to_snake +from nodescraper.utils import resolve_log_dir_name class FileSystemLogHook(TaskResultHook): @@ -43,9 +43,9 @@ def process_result(self, task_result: TaskResult, data: Optional[DataModel] = No """Log task result to the filesystem (single events.json per directory).""" log_path = self.log_base_path if task_result.parent: - log_path = os.path.join(log_path, pascal_to_snake(task_result.parent)) + log_path = os.path.join(log_path, resolve_log_dir_name(task_result.parent)) if task_result.task: - log_path = os.path.join(log_path, pascal_to_snake(task_result.task)) + log_path = os.path.join(log_path, resolve_log_dir_name(task_result.task)) task_result.log_result(log_path) diff --git a/nodescraper/utils.py b/nodescraper/utils.py index e7a201b8..11c3ab57 100644 --- a/nodescraper/utils.py +++ b/nodescraper/utils.py @@ -189,18 +189,35 @@ def get_unique_filename(directory, filename) -> str: count += 1 -def pascal_to_snake(input_str: str) -> str: - """Convert PascalCase to snake_case +_LOG_DIR_NAME_OVERRIDES: dict[str, str] = {} - Args: - input_str (str): string to convert - Returns: - str: converted string +def register_log_dir_name(class_name: str, log_dir_name: str) -> None: + """Register a filesystem log directory name for a task or plugin class.""" + _LOG_DIR_NAME_OVERRIDES[class_name] = log_dir_name + + +def resolve_log_dir_name(class_name: str) -> str: + """Map a class name 
to its log directory (override or snake_case).""" + if class_name in _LOG_DIR_NAME_OVERRIDES: + return _LOG_DIR_NAME_OVERRIDES[class_name] + return pascal_to_snake(class_name) + + +def pascal_to_snake(input_str: str) -> str: + """Convert PascalCase to snake_case. + + Handles embedded acronyms with digits (e.g. ``ServiceabilityPluginMI3XX``, + ``MI3XXCollector``) without splitting into single-letter segments. """ + if not input_str: + return "" if input_str.isupper(): return input_str.lower() - return ("_").join(re.split("(?<=.)(?=[A-Z])", input_str)).lower() + normalized = re.sub(r"([A-Z][A-Z0-9]+)([A-Z][a-z])", r"\1_\2", input_str) + normalized = re.sub(r"([a-z])([A-Z][A-Z0-9]+)", r"\1_\2", normalized) + normalized = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", normalized) + return normalized.lower() def bytes_to_human_readable(input_bytes: int) -> str: diff --git a/pyproject.toml b/pyproject.toml index 1d40c1a8..d2f1bdef 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -1,12 +1,23 @@ [project] name = "amd-node-scraper" dynamic = ["version"] -description = "A framework for automated error detection and data collection" +description = "Automated data collection and analysis for system debug." 
authors = [{ name = "AMD" }] readme = "README.md" requires-python = ">=3.9" +license = { text = "MIT" } -keywords = [] +keywords = [ + "amd", + "debug", + "diagnostics", + "dmesg", + "redfish", + "scraping", + "systems", + "oob", + "in-band", +] classifiers = ["Topic :: Software Development"] @@ -31,13 +42,16 @@ dev = [ "pytest-cov", "mypy", "types-paramiko", + "types-requests", "types-setuptools", ] [project.urls] homepage = "https://github.com/amd/node-scraper" -documentation = "https://github.com/amd/node-scraper" +documentation = "https://github.com/amd/node-scraper/blob/main/docs/PLUGIN_DOC.md" repository = "https://github.com/amd/node-scraper" +changelog = "https://github.com/amd/node-scraper/releases" +issues = "https://github.com/amd/node-scraper/issues" [build-system] requires = ["setuptools==78.1.1", "setuptools-scm==8.1.0"] @@ -49,6 +63,9 @@ include = ['nodescraper'] [tool.setuptools] include-package-data = true +[tool.setuptools.package-data] +nodescraper = ["py.typed"] + [project.scripts] node-scraper = "nodescraper.cli:cli_entry" @@ -67,5 +84,10 @@ profile = "black" select = ["F", "B", "T20", "N", "W", "I", "E"] ignore = ["E501", "N806"] +[tool.mypy] +python_version = "3.9" +mypy_path = ["test/unit"] +explicit_package_bases = true + [tool.setuptools_scm] version_scheme = "post-release" diff --git a/test/unit/cli/test_cli_embed_api.py b/test/unit/cli/test_cli_embed_api.py index db44f6cf..54b95043 100644 --- a/test/unit/cli/test_cli_embed_api.py +++ b/test/unit/cli/test_cli_embed_api.py @@ -53,7 +53,12 @@ def test_run_cli_return_code_and_run_main_return_code_delegate( ) -> None: calls: list[list[str]] = [] - def fake_main(arg_input: list[str], *, host_cli_args=None) -> None: + def fake_main( + arg_input: list[str], + *, + host_cli_args=None, + plugin_run_result_hooks=None, + ) -> None: calls.append(list(arg_input)) raise SystemExit(7) diff --git a/test/unit/framework/common/shared_utils.py b/test/unit/framework/common/shared_utils.py index 
5b882549..7ba16c16 100644 --- a/test/unit/framework/common/shared_utils.py +++ b/test/unit/framework/common/shared_utils.py @@ -23,7 +23,7 @@ # SOFTWARE. # ############################################################################### -from typing import Optional +from typing import Any, Dict, List, Optional from unittest.mock import MagicMock from nodescraper.constants import DEFAULT_EVENT_REPORTER @@ -83,7 +83,14 @@ def build_from_model(cls, model): class DummyDataModel(DataModel): - foo: str = None + foo: Optional[str] = None + some_version: str = "0" + + +# Module-level defaults so ``run`` signatures stay stable for ConfigBuilder tests. +_TEST_PLUGIN_A_LIST_DEFAULT: List[Any] = [1] +_TEST_PLUGIN_A_DICT_DEFAULT: Dict[str, Any] = {} +_TEST_PLUGIN_A_MODEL_DEFAULT = TestModelArg() class TestPluginA(PluginInterface[MockConnectionManager, None]): @@ -95,10 +102,12 @@ def run( self, test_bool_arg: bool = True, test_str_arg: str = "test", - test_list_arg: list[int] = [1], # noqa: B006 - test_dict_arg: dict = {}, # noqa: B006 - test_model_arg: Optional[TestModelArg] = None, - ): + test_list_arg: List[Any] = _TEST_PLUGIN_A_LIST_DEFAULT, + test_dict_arg: Dict[str, Any] = _TEST_PLUGIN_A_DICT_DEFAULT, + test_model_arg: TestModelArg = _TEST_PLUGIN_A_MODEL_DEFAULT, + **kwargs: Any, + ) -> PluginResult: + _ = kwargs return PluginResult( source="testA", status=ExecutionStatus.ERROR, diff --git a/test/unit/framework/test_plugin_executor.py b/test/unit/framework/test_plugin_executor.py index fe9a8954..494551ce 100644 --- a/test/unit/framework/test_plugin_executor.py +++ b/test/unit/framework/test_plugin_executor.py @@ -186,3 +186,18 @@ def test_connection_manager_from_plugin_when_not_in_registry(): assert len(results) == 1 assert results[0].source == "testB" assert results[0].status == ExecutionStatus.OK + + +def test_plugin_run_result_hooks_called_after_each_plugin(plugin_registry): + seen: list[str] = [] + + def hook(res: PluginResult) -> None: + seen.append(res.source) + + 
executor = PluginExecutor( + plugin_configs=[PluginConfig(plugins={"TestPluginB": {}})], + plugin_registry=plugin_registry, + plugin_run_result_hooks=[hook], + ) + executor.run_queue() + assert seen == ["testB"] diff --git a/test/unit/instinct_shaped_engine.py b/test/unit/instinct_shaped_engine.py new file mode 100644 index 00000000..b5989a24 --- /dev/null +++ b/test/unit/instinct_shaped_engine.py @@ -0,0 +1,67 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+# +############################################################################### +from __future__ import annotations + +from typing import Any, Optional + +__all__ = ["InstinctShapedEngine"] + +_LAST_CALL: dict[str, Any] = {} + + +def clear_last_call() -> None: + _LAST_CALL.clear() + + +def get_last_call() -> dict[str, Any]: + return dict(_LAST_CALL) + + +class InstinctShapedEngine: + """Mirrors keyword parameters of ``InstinctServiceAssistant.get_service_info``.""" + + def __init__(self, afid_sag: str) -> None: + self.afid_sag = afid_sag + + def get_service_info( + self, + rf_events: list[Any], + from_ac_cycle: int = -1, + from_date: Optional[str] = None, + cper_data: Optional[dict[str, Any]] = None, + designation_serials: Optional[dict[str, str]] = None, + suppress_service_actions: Optional[list[str]] = None, + ) -> None: + _LAST_CALL.clear() + _LAST_CALL.update( + from_ac_cycle=from_ac_cycle, + from_date=from_date, + cper_data=cper_data, + designation_serials=designation_serials, + suppress_service_actions=suppress_service_actions, + rf_len=len(rf_events), + ) + return None diff --git a/test/unit/mock_python_engine.py b/test/unit/mock_python_engine.py new file mode 100644 index 00000000..09e38a7e --- /dev/null +++ b/test/unit/mock_python_engine.py @@ -0,0 +1,48 @@ +"""Mock Python service hub for unit tests.""" + +from __future__ import annotations + +from types import SimpleNamespace +from typing import Any, Optional + +from serviceability_dummy_data import ( + DUMMY_HUB_VERSION, + DUMMY_SAG_PID, + DUMMY_SAG_REVISION, + DUMMY_SAG_VARIANT, + DUMMY_SERVICE_ACTION_NUM, + DUMMY_SERVICE_ACTION_TITLE, + DUMMY_UNIT_A, +) + + +class MockServiceEngine: + def __init__(self, afid_sag: str) -> None: + self.afid_sag = afid_sag + + def get_service_info( + self, + rf_events: list[dict[str, Any]], + cper_data: Optional[dict[str, Any]] = None, + **kwargs: Any, + ) -> SimpleNamespace: + del cper_data, kwargs + service_info: dict[str, dict[str, dict[str, str]]] = {} + for event 
in rf_events: + afid = event.get("Afid") + unit = event.get("serviceable_unit", DUMMY_UNIT_A) + if afid is None: + continue + service_info.setdefault(str(unit), {})[str(afid)] = { + "service_action_number": str(DUMMY_SERVICE_ACTION_NUM), + "title": DUMMY_SERVICE_ACTION_TITLE, + } + return SimpleNamespace( + service_info=service_info, + afid_sag_metadata={ + "sag_pid": DUMMY_SAG_PID, + "revision": DUMMY_SAG_REVISION, + "variant": DUMMY_SAG_VARIANT, + }, + engine_version_info={"version": DUMMY_HUB_VERSION}, + ) diff --git a/test/unit/plugin/fixtures/afid_sag_sample.json b/test/unit/plugin/fixtures/afid_sag_sample.json new file mode 100644 index 00000000..952999e6 --- /dev/null +++ b/test/unit/plugin/fixtures/afid_sag_sample.json @@ -0,0 +1,8 @@ +{ + "9001": { + "service_action_num": 99 + }, + "9002": { + "service_action_num": 88 + } +} diff --git a/test/unit/plugin/test_afid_events_bmc_schema.py b/test/unit/plugin/test_afid_events_bmc_schema.py new file mode 100644 index 00000000..8529577c --- /dev/null +++ b/test/unit/plugin/test_afid_events_bmc_schema.py @@ -0,0 +1,82 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. 
+# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +"""AFID / serviceable unit extraction for OpenBMC-style LogEntry payloads.""" +from __future__ import annotations + +from serviceability_dummy_data import ( + DUMMY_AFID_A, + DUMMY_AFID_BELOW_RF, + DUMMY_AFID_FATAL_HBM, + DUMMY_TIMESTAMP, + DUMMY_UNIT_A, + DUMMY_UNIT_B, + DUMMY_UNIT_C, + dummy_fatal_hbm_log_entry, + dummy_openbmc_log_entry, + dummy_openbmc_log_entry_serviceable_units_only, +) + +from nodescraper.plugins.serviceability.afid_events import ( + _afid_event_from_rf_member, + build_afid_events_from_data, +) +from nodescraper.plugins.serviceability.serviceability_data import ( + ServiceabilityDataModel, +) + + +def test_afid_event_from_openbmc_log_entry_with_links_and_amd_field_identifiers(): + ev = _afid_event_from_rf_member(dummy_openbmc_log_entry()) + assert ev is not None + assert ev.afid == DUMMY_AFID_BELOW_RF + assert ev.serviceable_unit == DUMMY_UNIT_A + assert DUMMY_TIMESTAMP[:10] in ev.time + + +def test_serviceable_unit_from_oem_serviceable_units_when_no_links(): + ev = _afid_event_from_rf_member(dummy_openbmc_log_entry_serviceable_units_only()) + assert ev is not None + assert ev.afid == DUMMY_AFID_A + assert ev.serviceable_unit == DUMMY_UNIT_B + + +def test_afid_event_fatal_hbm_log_entry(): + ev = _afid_event_from_rf_member(dummy_fatal_hbm_log_entry()) + assert ev is not None + assert ev.afid == DUMMY_AFID_FATAL_HBM + assert ev.serviceable_unit == DUMMY_UNIT_C + + +def 
test_build_afid_events_from_data_includes_openbmc_entries(): + data = ServiceabilityDataModel( + rf_events=[dummy_openbmc_log_entry(), dummy_fatal_hbm_log_entry()], + cper_data={}, + ) + events = build_afid_events_from_data(data) + assert len(events) == 2 + by_afid_oam = {(e.afid, e.serviceable_unit) for e in events} + assert (DUMMY_AFID_BELOW_RF, DUMMY_UNIT_A) in by_afid_oam + assert (DUMMY_AFID_FATAL_HBM, DUMMY_UNIT_C) in by_afid_oam diff --git a/test/unit/plugin/test_amdsmi_data.py b/test/unit/plugin/test_amdsmi_data.py index 9e28fbb9..f6c4f750 100644 --- a/test/unit/plugin/test_amdsmi_data.py +++ b/test/unit/plugin/test_amdsmi_data.py @@ -23,7 +23,7 @@ # SOFTWARE. # ############################################################################### -"""Unit tests for amd-smi pydantic models (ROCm 7.13 / legacy JSON shapes).""" +"""Unit tests for amd-smi pydantic models (legacy JSON, ROCm 7.2+ / AMD-SMI 26.2+).""" from typing import Any, Optional @@ -341,6 +341,36 @@ def test_static_frequency_levels_optional_levels(): assert levels.Level_2 is not None and levels.Level_2.value == 1300 +def test_static_frequency_levels_accepts_level_three_plus(): + """ROCm 7.2+ / AMD-SMI 26.2+ may expose additional DPM levels (e.g. 
Level 3).""" + levels = StaticFrequencyLevels.model_validate( + { + "Level 0": "400 MHz", + "Level 1": "800 MHz", + "Level 2": "1000 MHz", + "Level 3": "1143 MHz", + } + ) + assert levels.Level_3 is not None + assert levels.Level_3.value == 1143 + assert levels.Level_3.unit == "MHz" + + +def test_static_frequency_levels_legacy_amd_smi_three_levels_only(): + """Legacy static JSON: only Level 0–2 (no Level 3+ keys).""" + levels = StaticFrequencyLevels.model_validate( + { + "Level 0": {"value": 500, "unit": "MHz"}, + "Level 1": "900 MHz", + "Level 2": "1300 MHz", + } + ) + assert levels.Level_0.value == 500 + assert levels.Level_2 is not None and levels.Level_2.value == 1300 + assert levels.Level_3 is None + assert levels.Level_15 is None + + def test_static_limit_legacy_max_power(): """Legacy flat max_power field still resolves.""" limit = StaticLimit.model_validate(DUMMY_LIMIT_LEGACY) diff --git a/test/unit/plugin/test_mi3xx_collector.py b/test/unit/plugin/test_mi3xx_collector.py new file mode 100644 index 00000000..96a9d556 --- /dev/null +++ b/test/unit/plugin/test_mi3xx_collector.py @@ -0,0 +1,319 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. 
+# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. +# +############################################################################### +import pytest +from pydantic import ValidationError +from serviceability_dummy_data import ( + DUMMY_BMC_HOST, + DUMMY_CPER_BYTES_BASIC, + DUMMY_CPER_BYTES_RF, + DUMMY_CPER_EVENT_ID_BASIC, + DUMMY_CPER_EVENT_ID_RF, + DUMMY_EVENT_URI, + DUMMY_EVENT_URI_ALT, + DUMMY_TIMESTAMP_EARLIER, + DUMMY_TIMESTAMP_LATER, + dummy_cper_basic_member, + dummy_cper_rf_member, + dummy_cper_skip_member, +) + +from nodescraper.connection.redfish import RF_MEMBERS, RedfishGetResult +from nodescraper.enums import ExecutionStatus +from nodescraper.plugins.serviceability import ( + MI3XXAnalyzer, + MI3XXCollector, + MI3XXCollectorArgs, + MI3XXDataModel, + MI3XXDeviceInfo, + MI3XXResult, + ServiceabilityDataModel, + ServiceabilityPluginBase, + ServiceabilityPluginMI3XX, + build_mi3xx_reporting_version_fields, + compare_iso_datetime, + is_valid_iso_datetime, + satisfies_time_check, +) + +EVENT_URI = DUMMY_EVENT_URI + + +@pytest.fixture +def mi3xx_collector(system_info, redfish_conn_mock): + redfish_conn_mock.base_url = f"https://{DUMMY_BMC_HOST}/redfish/v1" + return MI3XXCollector( + system_info=system_info, + connection=redfish_conn_mock, + log_path="/tmp/mi3xx.log", + ) + + +def test_mi3xx_collector_args_default_event_log_uri(): + args = MI3XXCollectorArgs() + uri = args.resolved_event_log_uri() + assert uri == MI3XXCollectorArgs.default_event_log_uri() + assert uri.startswith("/redfish/") + assert "EventLog" in uri + + +def 
test_mi3xx_collector_args_requires_event_log_uri(): + with pytest.raises(ValidationError): + MI3XXCollectorArgs(uri="", rf_event_log_uri="") + + +def test_mi3xx_collector_args_uri_alias_prefers_uri_when_both_set(): + args = MI3XXCollectorArgs( + uri=f" {DUMMY_EVENT_URI_ALT} ", + rf_event_log_uri=DUMMY_EVENT_URI, + ) + assert args.resolved_event_log_uri() == DUMMY_EVENT_URI_ALT + + +def test_mi3xx_collector_args_strips_rf_event_log_uri(): + args = MI3XXCollectorArgs(rf_event_log_uri=f" {DUMMY_EVENT_URI_ALT} ") + assert args.rf_event_log_uri == DUMMY_EVENT_URI_ALT + assert args.resolved_event_log_uri() == DUMMY_EVENT_URI_ALT + + +def test_mi3xx_collector_args_assembly_requires_both_template_and_devices(): + with pytest.raises(ValidationError): + MI3XXCollectorArgs( + rf_event_log_uri=EVENT_URI, + rf_assembly_uri_template="/redfish/v1/Chassis/{device}/Assembly", + ) + with pytest.raises(ValidationError): + MI3XXCollectorArgs( + rf_event_log_uri=EVENT_URI, + rf_chassis_devices=["dummy-chassis"], + ) + + +def test_mi3xx_collector_args_reference_time_requires_operator(): + with pytest.raises(ValidationError): + MI3XXCollectorArgs( + rf_event_log_uri=EVENT_URI, + reference_time="2000-01-01", + ) + + +def test_mi3xx_collector_args_accepts_iso_date_and_datetime(): + date_args = MI3XXCollectorArgs( + rf_event_log_uri=EVENT_URI, + reference_time="2000-01-01", + time_operator=">=", + ) + assert date_args.reference_time == "2000-01-01" + + +def test_time_utils_iso_validation_and_comparison(): + assert is_valid_iso_datetime("2000-01-01") + assert satisfies_time_check("2000-01-02", "2000-01-01", ">") + assert compare_iso_datetime("2000-01-01T00:00:00", "2000-01-01T00:00:00", "==") + + +def test_serviceability_plugin_mi3xx_wiring(): + assert issubclass(ServiceabilityPluginMI3XX, ServiceabilityPluginBase) + assert ServiceabilityPluginMI3XX.DATA_MODEL is ServiceabilityDataModel + assert ServiceabilityPluginMI3XX.COLLECTOR is MI3XXCollector + assert 
ServiceabilityPluginMI3XX.COLLECTOR_ARGS is MI3XXCollectorArgs + assert ServiceabilityPluginMI3XX.ANALYZER is MI3XXAnalyzer + + +def test_mi3xx_collector_no_args(mi3xx_collector): + result, data = mi3xx_collector.collect_data() + assert result.status == ExecutionStatus.NOT_RAN + assert "required" in result.message.lower() + assert data is None + + +def test_mi3xx_collector_success_minimal(mi3xx_collector, redfish_conn_mock): + redfish_conn_mock.run_get_paged.return_value = RedfishGetResult( + path=EVENT_URI, + success=True, + data={RF_MEMBERS: [{"Id": "dummy-1", "Created": DUMMY_TIMESTAMP_LATER}]}, + status_code=200, + ) + args = MI3XXCollectorArgs(rf_event_log_uri=EVENT_URI) + result, data = mi3xx_collector.collect_data(args=args) + assert result.status == ExecutionStatus.OK + assert data is not None + assert len(data.rf_events) == 1 + assert data.bmc_host == DUMMY_BMC_HOST + assert data.log_path == "/tmp/mi3xx.log" + + +def test_mi3xx_collector_satisfies_reference_time_helper(mi3xx_collector): + args = MI3XXCollectorArgs( + rf_event_log_uri=EVENT_URI, + reference_time="2000-01-01", + time_operator=">=", + ) + assert mi3xx_collector.satisfies_reference_time(DUMMY_TIMESTAMP_LATER, args) + assert not mi3xx_collector.satisfies_reference_time(DUMMY_TIMESTAMP_EARLIER, args) + + +def test_mi3xx_collector_is_cper_event_requires_cper_block_type_and_uri(mi3xx_collector): + assert mi3xx_collector.is_cper_event(dummy_cper_basic_member()) + assert not mi3xx_collector.is_cper_event( + { + "Id": "non-cper", + "AdditionalDataURI": DUMMY_EVENT_URI, + "MessageId": "ResourceEvent.1.2.1.ResourceErrorsDetectedOEM", + } + ) + assert not mi3xx_collector.is_cper_event( + { + "Id": "partial-cper", + "CPER": {"NotificationType": "dummy"}, + "DiagnosticDataType": "CPER", + } + ) + + +def test_mi3xx_collector_fetches_cper_attachments(mi3xx_collector, redfish_conn_mock): + import base64 + from unittest.mock import MagicMock + + redfish_conn_mock.run_get_paged.return_value = RedfishGetResult( 
+ path=EVENT_URI, + success=True, + data={RF_MEMBERS: [dummy_cper_basic_member()]}, + status_code=200, + ) + response = MagicMock() + response.ok = True + response.status_code = 200 + response.content = DUMMY_CPER_BYTES_BASIC + redfish_conn_mock.get_response.return_value = response + + args = MI3XXCollectorArgs(rf_event_log_uri=EVENT_URI) + result, data = mi3xx_collector.collect_data(args=args) + assert result.status == ExecutionStatus.OK + assert data is not None + assert data.cper_raw[DUMMY_CPER_EVENT_ID_BASIC] == base64.b64encode( + DUMMY_CPER_BYTES_BASIC + ).decode("ascii") + assert data.cper_data == {} + + +def test_mi3xx_collector_skips_cper_when_aca_serial_and_low_afids( + mi3xx_collector, redfish_conn_mock +): + redfish_conn_mock.get_response.reset_mock() + redfish_conn_mock.run_get_paged.return_value = RedfishGetResult( + path=EVENT_URI, + success=True, + data={RF_MEMBERS: [dummy_cper_skip_member()]}, + status_code=200, + ) + args = MI3XXCollectorArgs(rf_event_log_uri=EVENT_URI) + result, data = mi3xx_collector.collect_data(args=args) + assert result.status == ExecutionStatus.OK + assert data is not None + assert data.cper_raw == {} + redfish_conn_mock.get_response.assert_not_called() + + +def test_mi3xx_collector_fetches_cper_when_rf_afid(mi3xx_collector, redfish_conn_mock): + import base64 + from unittest.mock import MagicMock + + redfish_conn_mock.get_response.reset_mock() + redfish_conn_mock.run_get_paged.return_value = RedfishGetResult( + path=EVENT_URI, + success=True, + data={RF_MEMBERS: [dummy_cper_rf_member()]}, + status_code=200, + ) + response = MagicMock() + response.ok = True + response.status_code = 200 + response.content = DUMMY_CPER_BYTES_RF + redfish_conn_mock.get_response.return_value = response + + args = MI3XXCollectorArgs(rf_event_log_uri=EVENT_URI) + result, data = mi3xx_collector.collect_data(args=args) + assert result.status == ExecutionStatus.OK + assert data is not None + assert data.cper_raw[DUMMY_CPER_EVENT_ID_RF] == 
base64.b64encode(DUMMY_CPER_BYTES_RF).decode( + "ascii" + ) + redfish_conn_mock.get_response.assert_called_once() + + +def test_mi3xx_collector_filters_events_by_reference_time(mi3xx_collector, redfish_conn_mock): + redfish_conn_mock.run_get_paged.return_value = RedfishGetResult( + path=EVENT_URI, + success=True, + data={ + RF_MEMBERS: [ + {"Id": "dummy-1", "Created": DUMMY_TIMESTAMP_LATER}, + {"Id": "dummy-2", "Created": DUMMY_TIMESTAMP_EARLIER}, + ] + }, + status_code=200, + ) + args = MI3XXCollectorArgs( + rf_event_log_uri=EVENT_URI, + reference_time="2000-01-01", + time_operator=">=", + ) + result, data = mi3xx_collector.collect_data(args=args) + assert result.status == ExecutionStatus.OK + assert data is not None + assert [event["Id"] for event in data.rf_events] == ["dummy-1"] + + +def test_mi3xx_device_info_fields(): + info = MI3XXDeviceInfo( + board_product_name="dummy-board", + board_serial_number="dummy-serial-001", + product_version="0.0-dummy", + ) + assert info.board_product_name == "dummy-board" + assert info.product_version == "0.0-dummy" + + +def test_mi3xx_result_reporting_versions(): + version_fields = build_mi3xx_reporting_version_fields( + plugin_name="dummy_plugin", + plugin_version="0.0-dummy", + node_scraper_version="0.0-dummy", + dummy_hub_version="0.0-dummy", + ) + result = MI3XXResult(node="dummy-node", **version_fields) + assert result.plugin_name == "dummy_plugin" + assert result.reporter_extensions["dummy_hub_version"] == "0.0-dummy" + + +def test_mi3xx_data_model_log_model(tmp_path): + model = MI3XXDataModel( + collected_data={"events": [{"id": 1}]}, + artifacts={"events.json": [{"id": 1}]}, + ) + model.log_model(str(tmp_path)) + assert (tmp_path / "events.json").is_file() + assert (tmp_path / "MI3XX_data.json").is_file() diff --git a/test/unit/plugin/test_mi3xx_cper_utils.py b/test/unit/plugin/test_mi3xx_cper_utils.py new file mode 100644 index 00000000..105ca203 --- /dev/null +++ b/test/unit/plugin/test_mi3xx_cper_utils.py @@ -0,0 
+1,154 @@ +############################################################################### +# +# MIT License +# +# Copyright (c) 2026 Advanced Micro Devices, Inc. +# +# Permission is hereby granted, free of charge, to any person obtaining a copy +# of this software and associated documentation files (the "Software"), to deal +# in the Software without restriction, including without limitation the rights +# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell +# copies of the Software, and to permit persons to whom the Software is +# furnished to do so, subject to the following conditions: +# +# The above copyright notice and this permission notice shall be included in all +# copies or substantial portions of the Software. +# +# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +# SOFTWARE. 
+#
+###############################################################################
+import pytest
+from serviceability_dummy_data import (
+    DUMMY_AFID_BELOW_RF,
+    DUMMY_AFID_FATAL_HBM,
+    DUMMY_RF_CPER_AFID,
+    dummy_aca_err_row,
+)
+
+from nodescraper.plugins.serviceability.mi3xx.mi3xx_cper_utils import (
+    CPER_METHOD_AFID_MAX,
+    REDFISH_METHOD_AFID_MAX,
+    REDFISH_METHOD_AFID_MIN,
+    event_aca_includes_serial,
+    event_afids_from_oem,
+    event_has_aca_decode,
+    get_amd_oem_dict,
+    is_cper_method_afid,
+    is_redfish_method_afid,
+    should_skip_cper_fetch_or_decode,
+)
+
+
+def test_get_amd_oem_dict_layouts():
+    flat = {"Oem": {"AMDFieldIdentifiers": [{"AFID": 1}]}}
+    assert get_amd_oem_dict(flat) == {"AMDFieldIdentifiers": [{"AFID": 1}]}
+
+    nested = {"Oem": {"AMD": {"ErrDataArr": []}}}
+    assert get_amd_oem_dict(nested) == {"ErrDataArr": []}
+
+    assert get_amd_oem_dict({}) == {}
+    assert get_amd_oem_dict({"Oem": None}) == {}
+    assert get_amd_oem_dict({"Oem": "bad"}) == {}
+    assert get_amd_oem_dict({"Oem": {"AMD": "bad"}}) == {}
+
+
+def test_skip_when_afids_below_threshold_and_aca_has_serial():
+    event = {
+        "Oem": {
+            "AMDFieldIdentifiers": [{"AFID": DUMMY_AFID_BELOW_RF}],
+            "ErrDataArr": [dummy_aca_err_row()],
+        }
+    }
+    assert event_afids_from_oem(event) == [DUMMY_AFID_BELOW_RF]
+    assert should_skip_cper_fetch_or_decode(event) is True
+
+
+def test_event_afids_from_oem_nested_amd_block():
+    event = {
+        "Oem": {
+            "AMD": {
+                "AMDFieldIdentifiers": [{"AFID": DUMMY_AFID_BELOW_RF}],
+                "ErrDataArr": [dummy_aca_err_row()],
+            }
+        }
+    }
+    assert event_afids_from_oem(event) == [DUMMY_AFID_BELOW_RF]
+    assert event_has_aca_decode(event) is True
+    assert should_skip_cper_fetch_or_decode(event) is True
+
+
+def test_err_data_arr_entries_nested_amd_block():
+    event = {"Oem": {"AMD": {"ErrDataArr": [dummy_aca_err_row()]}}}
+    assert event_has_aca_decode(event) is True
+    assert event_aca_includes_serial(event) is True
+
+
+def test_afid_method_ranges():
+    assert is_cper_method_afid(DUMMY_AFID_BELOW_RF)
+    assert is_cper_method_afid(CPER_METHOD_AFID_MAX)
+    assert not is_cper_method_afid(CPER_METHOD_AFID_MAX + 1)
+    assert is_redfish_method_afid(DUMMY_RF_CPER_AFID)
+    assert is_redfish_method_afid(REDFISH_METHOD_AFID_MAX)
+    assert not is_redfish_method_afid(REDFISH_METHOD_AFID_MIN - 1)
+    assert not is_redfish_method_afid(REDFISH_METHOD_AFID_MAX + 1)
+    assert not is_redfish_method_afid(DUMMY_AFID_BELOW_RF)
+
+
+def test_no_skip_when_rf_range_afid_even_with_aca_serial():
+    event = {
+        "Oem": {
+            "AMDFieldIdentifiers": [{"AFID": DUMMY_RF_CPER_AFID}],
+            "ErrDataArr": [dummy_aca_err_row()],
+        }
+    }
+    assert should_skip_cper_fetch_or_decode(event) is False
+
+
+def test_skip_when_aca_decode_without_serial():
+    event = {
+        "Oem": {
+            "AMDFieldIdentifiers": [{"AFID": DUMMY_RF_CPER_AFID}],
+            "ErrDataArr": [dummy_aca_err_row(serial=False)],
+        }
+    }
+    assert event_has_aca_decode(event) is True
+    assert event_aca_includes_serial(event) is False
+    assert should_skip_cper_fetch_or_decode(event) is True
+
+
+def test_no_skip_when_no_err_data_decoded():
+    event = {
+        "Oem": {
+            "AMDFieldIdentifiers": [{"AFID": DUMMY_AFID_BELOW_RF}],
+        }
+    }
+    assert should_skip_cper_fetch_or_decode(event) is False
+
+
+def test_no_skip_when_aca_serial_but_no_afid_list():
+    event = {
+        "Oem": {
+            "ErrDataArr": [dummy_aca_err_row()],
+        }
+    }
+    assert event_afids_from_oem(event) == []
+    assert should_skip_cper_fetch_or_decode(event) is False
+
+
+@pytest.mark.parametrize(
+    "afids,expect_skip",
+    [
+        ([DUMMY_AFID_BELOW_RF, DUMMY_AFID_FATAL_HBM], True),
+        ([DUMMY_AFID_BELOW_RF, DUMMY_RF_CPER_AFID], False),
+    ],
+)
+def test_skip_requires_all_afids_cper_method(afids, expect_skip):
+    identifiers = [{"AFID": a} for a in afids]
+    event = {"Oem": {"AMDFieldIdentifiers": identifiers, "ErrDataArr": [dummy_aca_err_row()]}}
+    assert should_skip_cper_fetch_or_decode(event) is expect_skip
diff --git a/test/unit/plugin/test_regex_search_analyzer.py b/test/unit/plugin/test_regex_search_analyzer.py
index ac018ee1..e93b93da 100644
--- a/test/unit/plugin/test_regex_search_analyzer.py
+++ b/test/unit/plugin/test_regex_search_analyzer.py
@@ -28,10 +28,16 @@
 import tempfile
 
 from nodescraper.enums.executionstatus import ExecutionStatus
-from nodescraper.plugins.regex_search.analyzer_args import RegexSearchAnalyzerArgs
-from nodescraper.plugins.regex_search.regex_search_analyzer import RegexSearchAnalyzer
-from nodescraper.plugins.regex_search.regex_search_data import RegexSearchData
-from nodescraper.plugins.regex_search.regex_search_plugin import RegexSearchPlugin
+from nodescraper.plugins.inband.regex_search.analyzer_args import (
+    RegexSearchAnalyzerArgs,
+)
+from nodescraper.plugins.inband.regex_search.regex_search_analyzer import (
+    RegexSearchAnalyzer,
+)
+from nodescraper.plugins.inband.regex_search.regex_search_data import RegexSearchData
+from nodescraper.plugins.inband.regex_search.regex_search_plugin import (
+    RegexSearchPlugin,
+)
 
 EXPECTED_MISSING_ANALYSIS_MSG = "Analysis args need to be provided for the analyzer to run"
 
diff --git a/test/unit/plugin/test_se_runner.py b/test/unit/plugin/test_se_runner.py
new file mode 100644
index 00000000..025aef25
--- /dev/null
+++ b/test/unit/plugin/test_se_runner.py
@@ -0,0 +1,418 @@
+###############################################################################
+#
+# MIT License
+#
+# Copyright (c) 2026 Advanced Micro Devices, Inc.
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+#
+###############################################################################
+import json
+from pathlib import Path
+from types import SimpleNamespace
+from typing import Any
+
+import pytest
+from pydantic import ValidationError
+from serviceability_dummy_data import (
+    DUMMY_AFID_A,
+    DUMMY_AFID_B,
+    DUMMY_AFID_C,
+    DUMMY_DESIGNATION_A,
+    DUMMY_DESIGNATION_B,
+    DUMMY_HUB_VERSION,
+    DUMMY_OEM_VENDOR,
+    DUMMY_RF_EVENT_COUNT,
+    DUMMY_SAG_PID,
+    DUMMY_SAG_REVISION,
+    DUMMY_SAG_VARIANT,
+    DUMMY_SERVICE_ACTION_NUM,
+    DUMMY_TIMESTAMP,
+    DUMMY_UNIT_A,
+    DUMMY_UNIT_B,
+    DUMMY_UNIT_C,
+)
+
+from nodescraper.enums import ExecutionStatus
+from nodescraper.plugins.serviceability import (
+    AfidEvent,
+    MI3XXAnalyzer,
+    SeRunError,
+    ServiceabilityAnalyzerArgs,
+    ServiceabilityBlock,
+    ServiceabilityDataModel,
+    build_afid_events_from_data,
+    format_serviceability_solution_lines,
+    normalize_se_timestamp,
+    run_service_hub,
+    serviceability_block_from_service_result,
+)
+from nodescraper.plugins.serviceability.se_models import ServiceabilitySolution
+
+FIXTURES = Path(__file__).resolve().parent / "fixtures"
+AFID_SAG = FIXTURES / "afid_sag_sample.json"
+EXAMPLE_EVENTS = [
+    AfidEvent(afid=DUMMY_AFID_A, serviceable_unit=DUMMY_UNIT_A, time=DUMMY_TIMESTAMP),
+    AfidEvent(afid=DUMMY_AFID_B, serviceable_unit=DUMMY_UNIT_B, time=DUMMY_TIMESTAMP),
+    AfidEvent(afid=DUMMY_AFID_C, serviceable_unit=DUMMY_UNIT_C, time=DUMMY_TIMESTAMP),
+]
+
+
+def test_afid_event_requires_non_empty_serviceable_unit():
+    with pytest.raises(ValidationError):
+        AfidEvent(afid=1, serviceable_unit=" ", time=DUMMY_TIMESTAMP)
+
+
+def test_normalize_se_timestamp_preserves_format_value():
+    sample = "2000-01-01 12:00:00.000+00:00"
+    assert normalize_se_timestamp(sample) == sample
+
+
+def test_analyzer_args_require_hub_config():
+    with pytest.raises(ValidationError):
+        ServiceabilityAnalyzerArgs()
+    with pytest.raises(ValidationError, match="hub_python_module"):
+        ServiceabilityAnalyzerArgs(afid_sag_path=str(AFID_SAG))
+    args = ServiceabilityAnalyzerArgs(
+        hub_python_module="dummy.test.module",
+        afid_sag_path=str(AFID_SAG),
+    )
+    assert args.hub_python_module == "dummy.test.module"
+
+
+def test_resolved_hub_options_explicit_fields_override_options_bag():
+    args = ServiceabilityAnalyzerArgs(
+        hub_python_module="dummy.test.module",
+        afid_sag_path=str(AFID_SAG),
+        hub_options={"from_ac_cycle": 9, "extra": 1},
+        from_ac_cycle=3,
+        from_date="2025-01-01",
+        designation_serials={"U": "S"},
+        suppress_service_actions=["99"],
+    )
+    merged = args.resolved_hub_options()
+    assert merged["from_ac_cycle"] == 3
+    assert merged["from_date"] == "2025-01-01"
+    assert merged["designation_serials"] == {"U": "S"}
+    assert merged["suppress_service_actions"] == ["99"]
+    assert merged["extra"] == 1
+
+
+def test_format_serviceability_solution_lines():
+    block = ServiceabilityBlock(
+        afid_events=EXAMPLE_EVENTS[:1],
+        solution=[
+            ServiceabilitySolution(
+                afid=DUMMY_AFID_A,
+                serviceable_unit=[DUMMY_DESIGNATION_A, DUMMY_DESIGNATION_B],
+                service_action_num=DUMMY_SERVICE_ACTION_NUM,
+                service_action_title="RMA",
+            )
+        ],
+        solution_reasoning="Dummy test reasoning.",
+        hub_version="1.0.0-test",
+        afid_sag_file_version=(
+            f"PID {DUMMY_SAG_PID}, revision {DUMMY_SAG_REVISION}, variant {DUMMY_SAG_VARIANT}"
+        ),
+    )
+    lines = format_serviceability_solution_lines(block)
+    assert lines[0] == "Dummy test reasoning."
+    assert lines[1] == "Hub version: 1.0.0-test"
+    assert (
+        lines[2]
+        == f"AFID_SAG file: PID {DUMMY_SAG_PID}, revision {DUMMY_SAG_REVISION}, variant {DUMMY_SAG_VARIANT}"
+    )
+    assert f"AFID {DUMMY_AFID_A}" in lines[3]
+    assert DUMMY_DESIGNATION_A in lines[3]
+    assert "service action 99 (RMA)" in lines[3]
+
+
+def test_serviceability_block_from_service_result():
+    result = SimpleNamespace(
+        service_info={
+            DUMMY_DESIGNATION_A: {
+                str(DUMMY_AFID_A): {
+                    "service_action_number": str(DUMMY_SERVICE_ACTION_NUM),
+                    "error_category": "dummy_category",
+                    "error_type": "dummy_type",
+                    "title": "Dummy service action",
+                }
+            },
+            DUMMY_DESIGNATION_B: {
+                str(DUMMY_AFID_A): {
+                    "service_action_number": str(DUMMY_SERVICE_ACTION_NUM),
+                    "error_category": "dummy_category",
+                    "error_type": "dummy_type",
+                    "title": "Dummy service action",
+                }
+            },
+        },
+        afid_sag_metadata={
+            "sag_pid": DUMMY_SAG_PID,
+            "revision": DUMMY_SAG_REVISION,
+            "variant": DUMMY_SAG_VARIANT,
+        },
+        engine_version_info={"version": DUMMY_HUB_VERSION},
+    )
+    block = serviceability_block_from_service_result(
+        EXAMPLE_EVENTS[:1],
+        result,
+        hub_label="Dummy test hub",
+        rf_event_count=DUMMY_RF_EVENT_COUNT,
+    )
+    assert len(block.solution) == 1
+    assert block.solution[0].afid == DUMMY_AFID_A
+    assert block.solution[0].service_action_num == DUMMY_SERVICE_ACTION_NUM
+    assert block.solution[0].service_action_title == "Dummy service action"
+    assert set(block.solution[0].serviceable_unit) == {DUMMY_DESIGNATION_A, DUMMY_DESIGNATION_B}
+    assert block.hub_version == DUMMY_HUB_VERSION
+    assert block.afid_sag_file_version == (
+        f"PID {DUMMY_SAG_PID}, revision {DUMMY_SAG_REVISION}, variant {DUMMY_SAG_VARIANT}"
+    )
+    assert f"{DUMMY_RF_EVENT_COUNT} Redfish event(s)" in block.solution_reasoning
+    assert "Dummy test hub" in block.solution_reasoning
+
+
+def test_serviceability_block_from_service_result_isa_version_info():
+    result = SimpleNamespace(
+        service_info={},
+        afid_sag_metadata={
+            "sag_pid": DUMMY_SAG_PID,
+            "revision": DUMMY_SAG_REVISION,
+            "variant": DUMMY_SAG_VARIANT,
+        },
+        isa_version_info={"VERSION": "1.2.3"},
+    )
+    block = serviceability_block_from_service_result(
+        EXAMPLE_EVENTS[:1],
+        result,
+        hub_label="ISA",
+        rf_event_count=1,
+    )
+    assert block.hub_version == "1.2.3"
+    assert block.afid_sag_file_version == (
+        f"PID {DUMMY_SAG_PID}, revision {DUMMY_SAG_REVISION}, variant {DUMMY_SAG_VARIANT}"
+    )
+
+
+def test_resolve_hub_class_finds_package_export():
+    import types
+
+    submodule = types.ModuleType("fake_engine.impl")
+    submodule.__dict__["EngineImpl"] = type(
+        "EngineImpl",
+        (),
+        {"get_service_info": lambda self, rf_events, cper_data=None: None},
+    )
+    package = types.ModuleType("fake_engine")
+    package.EngineImpl = submodule.EngineImpl  # type: ignore[attr-defined]
+    package.__all__ = ["EngineImpl"]
+
+    from nodescraper.plugins.serviceability.se_runner import _resolve_hub_class
+
+    assert _resolve_hub_class(package) is submodule.EngineImpl
+
+
+def test_run_service_hub_with_mock_module():
+    rf_events = [
+        {"Afid": DUMMY_AFID_A, "serviceable_unit": DUMMY_UNIT_A, "Created": DUMMY_TIMESTAMP},
+        {"Afid": DUMMY_AFID_C, "serviceable_unit": DUMMY_UNIT_C, "Created": DUMMY_TIMESTAMP},
+    ]
+    block = run_service_hub(
+        hub_python_module="mock_python_engine",
+        afid_events=EXAMPLE_EVENTS[:2],
+        afid_sag_path=str(AFID_SAG),
+        rf_events=rf_events,
+    )
+    assert len(block.solution) == 2
+    assert block.solution[0].afid == DUMMY_AFID_A
+    assert block.solution[0].service_action_num == DUMMY_SERVICE_ACTION_NUM
+
+
+def test_run_service_hub_custom_analyze_method_and_path_kwarg():
+    import sys
+    import types
+
+    init_log: list[tuple[str, bool]] = []
+    analyze_log: list[Any] = []
+
+    class AltEngine:
+        def __init__(self, rulebook_path: str, debug: bool = False) -> None:
+            init_log.append((rulebook_path, debug))
+
+        def analyze_events(self, rf_events, cper_data=None):
+            analyze_log.append((list(rf_events), cper_data))
+            return None
+
+    mod = types.ModuleType("alt_service_engine")
+    mod.AltEngine = AltEngine
+    mod.__all__ = ["AltEngine"]
+    sys.modules["alt_service_engine"] = mod
+    try:
+        run_service_hub(
+            hub_python_module="alt_service_engine",
+            afid_events=EXAMPLE_EVENTS[:1],
+            afid_sag_path=str(AFID_SAG),
+            rf_events=[{"Afid": 1}],
+            cper_data={"k": 1},
+            hub_options={"debug": True},
+            hub_analyze_method="analyze_events",
+            hub_init_path_kwarg="rulebook_path",
+        )
+    finally:
+        del sys.modules["alt_service_engine"]
+
+    assert init_log[0][0] == str(AFID_SAG)
+    assert init_log[0][1] is True
+    assert analyze_log[0][1] == {"k": 1}
+
+
+def test_run_service_hub_accepts_hub_options():
+    rf_events = [
+        {"Afid": DUMMY_AFID_A, "serviceable_unit": DUMMY_UNIT_A, "Created": DUMMY_TIMESTAMP},
+    ]
+    block = run_service_hub(
+        hub_python_module="mock_python_engine",
+        afid_events=EXAMPLE_EVENTS[:1],
+        afid_sag_path=str(AFID_SAG),
+        rf_events=rf_events,
+        hub_options={"reporting_level": "verbose"},
+    )
+    assert len(block.solution) == 1
+
+
+def test_run_service_hub_forwards_full_hub_options_kwargs():
+    from instinct_shaped_engine import clear_last_call, get_last_call
+
+    clear_last_call()
+    rf_events = [
+        {"Afid": DUMMY_AFID_A, "serviceable_unit": DUMMY_UNIT_A, "Created": DUMMY_TIMESTAMP},
+    ]
+    run_service_hub(
+        hub_python_module="instinct_shaped_engine",
+        afid_events=EXAMPLE_EVENTS[:1],
+        afid_sag_path=str(AFID_SAG),
+        rf_events=rf_events,
+        cper_data={"decoded": True},
+        hub_options={
+            "from_ac_cycle": 2,
+            "from_date": "2024-06-01",
+            "designation_serials": {"GPU0": "SN1"},
+            "suppress_service_actions": ["42"],
+        },
+    )
+    got = get_last_call()
+    assert got["from_ac_cycle"] == 2
+    assert got["from_date"] == "2024-06-01"
+    assert got["cper_data"] == {"decoded": True}
+    assert got["designation_serials"] == {"GPU0": "SN1"}
+    assert got["suppress_service_actions"] == ["42"]
+
+
+def test_run_service_hub_collected_cper_overrides_hub_options_cper_data():
+    from instinct_shaped_engine import clear_last_call, get_last_call
+
+    clear_last_call()
+    rf_events = [
+        {"Afid": DUMMY_AFID_A, "serviceable_unit": DUMMY_UNIT_A, "Created": DUMMY_TIMESTAMP},
+    ]
+    run_service_hub(
+        hub_python_module="instinct_shaped_engine",
+        afid_events=EXAMPLE_EVENTS[:1],
+        afid_sag_path=str(AFID_SAG),
+        rf_events=rf_events,
+        cper_data={"from_collector": 1},
+        hub_options={"cper_data": {"from_options": 2}, "from_ac_cycle": 0},
+    )
+    assert get_last_call()["cper_data"] == {"from_collector": 1}
+
+
+def test_run_service_hub_missing_sag_raises():
+    with pytest.raises(SeRunError, match="Hub config file not found"):
+        run_service_hub(
+            hub_python_module="mock_python_engine",
+            afid_events=EXAMPLE_EVENTS,
+            afid_sag_path="/nonexistent/dummy_afid_sag.json",
+            rf_events=[{"Afid": DUMMY_AFID_A}],
+        )
+
+
+def test_build_afid_events_from_rf_members():
+    data = ServiceabilityDataModel(
+        rf_events=[
+            {
+                "Afid": DUMMY_AFID_A,
+                "serviceable_unit": DUMMY_UNIT_A,
+                "Created": DUMMY_TIMESTAMP,
+            },
+            {
+                "Oem": {
+                    DUMMY_OEM_VENDOR: {
+                        "Afid": DUMMY_AFID_B,
+                        "serviceable_unit": DUMMY_UNIT_B,
+                    }
+                },
+                "EventTimestamp": DUMMY_TIMESTAMP,
+            },
+        ]
+    )
+    events = build_afid_events_from_data(data)
+    assert len(events) == 2
+    assert events[0].afid == DUMMY_AFID_A
+    assert events[1].afid == DUMMY_AFID_B
+
+
+def test_mi3xx_analyzer_runs_python_hub(system_info):
+    data = ServiceabilityDataModel(
+        rf_events=[
+            {
+                "Afid": DUMMY_AFID_A,
+                "serviceable_unit": DUMMY_UNIT_A,
+                "Created": DUMMY_TIMESTAMP,
+            },
+            {
+                "Afid": DUMMY_AFID_C,
+                "serviceable_unit": DUMMY_UNIT_C,
+                "Created": DUMMY_TIMESTAMP,
+            },
+        ]
+    )
+    analyzer = MI3XXAnalyzer(system_info=system_info)
+    args = ServiceabilityAnalyzerArgs(
+        hub_python_module="mock_python_engine",
+        afid_sag_path=str(AFID_SAG),
+        hub_options={"include_raw_events": False},
+    )
+    result = analyzer.analyze_data(data, args=args)
+    assert result.status == ExecutionStatus.OK
+    assert data.serviceability is not None
+    assert len(data.serviceability.solution) == 2
+
+
+def test_mi3xx_analyzer_writes_serviceability_json(tmp_path, system_info):
+    data = ServiceabilityDataModel(
+        afid_events=EXAMPLE_EVENTS[:1],
+        serviceability=ServiceabilityBlock(
+            afid_events=EXAMPLE_EVENTS[:1],
+            solution=[],
+        ),
+    )
+    data.log_model(str(tmp_path))
+    payload = json.loads((tmp_path / "serviceability.json").read_text(encoding="utf-8"))
+    assert payload["afid_events"][0]["afid"] == DUMMY_AFID_A
diff --git a/test/unit/plugin/test_serviceability_collector.py b/test/unit/plugin/test_serviceability_collector.py
new file mode 100644
index 00000000..1ce3fbb2
--- /dev/null
+++ b/test/unit/plugin/test_serviceability_collector.py
@@ -0,0 +1,344 @@
+###############################################################################
+#
+# MIT License
+#
+# Copyright (c) 2026 Advanced Micro Devices, Inc.
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+#
+###############################################################################
+import json
+from typing import Any, Optional
+
+import pytest
+from pydantic import ValidationError
+from serviceability_dummy_data import DUMMY_BMC_HOST, DUMMY_EVENT_URI
+
+from nodescraper.connection.redfish import (
+    RF_MEMBERS,
+    RF_MEMBERS_COUNT,
+    RedfishGetResult,
+)
+from nodescraper.enums import ExecutionStatus
+from nodescraper.models import CollectorArgs
+from nodescraper.plugins.serviceability import (
+    DeviceInfo,
+    MI3XXCollectorArgs,
+    ServiceabilityAnalyzerArgs,
+    ServiceabilityDataModel,
+    ServiceabilityPluginBase,
+)
+from nodescraper.plugins.serviceability.serviceability_collector import (
+    ServiceabilityCollectorBase,
+)
+
+EVENT_URI = DUMMY_EVENT_URI
+
+
+class _StubServiceabilityCollector(ServiceabilityCollectorBase[MI3XXCollectorArgs]):
+    def filter_event_members(
+        self,
+        members: list[Any],
+        args: MI3XXCollectorArgs,
+    ) -> list[Any]:
+        return members
+
+    def is_cper_event(self, event: dict) -> bool:
+        return False
+
+    def collect_cper_attachments(self, rf_events: list[Any]) -> dict[str, str]:
+        return {}
+
+    def parse_assembly_entry(
+        self,
+        designation: str,
+        assembly_member_entry: dict[str, Any],
+        args: MI3XXCollectorArgs,
+    ) -> DeviceInfo:
+        return DeviceInfo(name=designation, serial_number=assembly_member_entry.get("SerialNumber"))
+
+    def extract_component_details(
+        self,
+        firmware_inventory_payload: dict[str, Any],
+        args: MI3XXCollectorArgs,
+    ) -> Optional[str]:
+        return firmware_inventory_payload.get("Details")
+
+
+@pytest.fixture
+def stub_serviceability_collector(system_info, redfish_conn_mock):
+    redfish_conn_mock.base_url = f"https://{DUMMY_BMC_HOST}/redfish/v1"
+    return _StubServiceabilityCollector(
+        system_info=system_info,
+        connection=redfish_conn_mock,
+        log_path="/tmp/serviceability.log",
+    )
+
+
+def test_mi3xx_collector_args_default_event_log_uri():
+    args = MI3XXCollectorArgs()
+    uri = args.resolved_event_log_uri()
+    assert uri == MI3XXCollectorArgs.default_event_log_uri()
+    assert uri.startswith("/redfish/")
+    assert "EventLog" in uri
+
+
+def test_mi3xx_collector_args_requires_event_log_uri():
+    with pytest.raises(ValidationError):
+        MI3XXCollectorArgs(uri="", rf_event_log_uri="")
+
+
+def test_mi3xx_collector_args_uri_alias_prefers_uri_over_rf_event_log_uri():
+    args = MI3XXCollectorArgs(
+        uri=" /redfish/v1/Systems/Dummy/LogServices/DummyEventLog/EntriesAlt ",
+        rf_event_log_uri="/redfish/v1/Systems/Dummy/LogServices/DummyEventLog/Entries",
+    )
+    assert (
+        args.resolved_event_log_uri()
+        == "/redfish/v1/Systems/Dummy/LogServices/DummyEventLog/EntriesAlt"
+    )
+
+
+def test_mi3xx_collector_args_assembly_requires_both_template_and_devices():
+    with pytest.raises(ValidationError):
+        MI3XXCollectorArgs(
+            rf_event_log_uri=EVENT_URI,
+            rf_assembly_uri_template="/redfish/v1/Chassis/{device}/Assembly",
+        )
+    with pytest.raises(ValidationError):
+        MI3XXCollectorArgs(
+            rf_event_log_uri=EVENT_URI,
+            rf_chassis_devices=["dummy-chassis"],
+        )
+
+
+def test_mi3xx_collector_args_assembly_template_must_include_device_placeholder():
+    with pytest.raises(ValidationError):
+        MI3XXCollectorArgs(
+            rf_event_log_uri=EVENT_URI,
+            rf_assembly_uri_template="/redfish/v1/Chassis/dummy-chassis/Assembly",
+            rf_chassis_devices=["dummy-chassis"],
+        )
+
+
+def test_mi3xx_collector_args_assembly_optional_when_omitted():
+    args = MI3XXCollectorArgs(rf_event_log_uri=EVENT_URI)
+    assert args.rf_assembly_uri_template is None
+    assert args.rf_chassis_devices is None
+
+
+def test_serviceability_plugin_base_wiring():
+    assert ServiceabilityPluginBase.DATA_MODEL is ServiceabilityDataModel
+    assert ServiceabilityPluginBase.COLLECTOR is ServiceabilityCollectorBase
+    assert getattr(ServiceabilityPluginBase, "COLLECTOR_ARGS", CollectorArgs) is CollectorArgs
+    assert ServiceabilityPluginBase.ANALYZER_ARGS is ServiceabilityAnalyzerArgs
+    assert ServiceabilityPluginBase.ANALYZER is None
+
+
+def test_stub_collector_no_args(stub_serviceability_collector):
+    result, data = stub_serviceability_collector.collect_data()
+    assert result.status == ExecutionStatus.NOT_RAN
+    assert "required" in result.message.lower()
+    assert data is None
+
+
+def test_stub_collector_event_log_get_fails(stub_serviceability_collector, redfish_conn_mock):
+    redfish_conn_mock.run_get_paged.return_value = RedfishGetResult(
+        path=EVENT_URI,
+        success=False,
+        error="timeout",
+        status_code=None,
+    )
+    args = MI3XXCollectorArgs(rf_event_log_uri=EVENT_URI)
+    result, data = stub_serviceability_collector.collect_data(args=args)
+    assert result.status == ExecutionStatus.ERROR
+    assert EVENT_URI in result.message
+    assert data is None
+
+
+def test_stub_collector_success_minimal(stub_serviceability_collector, redfish_conn_mock):
+    members = [{"Id": "1"}]
+    redfish_conn_mock.run_get_paged.return_value = RedfishGetResult(
+        path=EVENT_URI,
+        success=True,
+        data={RF_MEMBERS: members},
+        status_code=200,
+    )
+    args = MI3XXCollectorArgs(rf_event_log_uri=EVENT_URI)
+    result, data = stub_serviceability_collector.collect_data(args=args)
+    assert result.status == ExecutionStatus.OK
+    assert data is not None
+    assert data.rf_events == members
+    assert EVENT_URI in data.responses
+    assert data.bmc_host == DUMMY_BMC_HOST
+    assert data.log_path == "/tmp/serviceability.log"
+    redfish_conn_mock.run_get_paged.assert_called_once()
+
+
+def test_stub_collector_filter_raises_maps_to_error(
+    stub_serviceability_collector, redfish_conn_mock
+):
+    class _BadFilter(_StubServiceabilityCollector):
+        def filter_event_members(self, members, args):
+            raise ValueError("bad filter")
+
+    collector = _BadFilter(
+        system_info=stub_serviceability_collector.system_info,
+        connection=redfish_conn_mock,
+    )
+    redfish_conn_mock.run_get_paged.return_value = RedfishGetResult(
+        path=EVENT_URI,
+        success=True,
+        data={RF_MEMBERS: []},
+        status_code=200,
+    )
+    args = MI3XXCollectorArgs(rf_event_log_uri=EVENT_URI)
+    result, data = collector.collect_data(args=args)
+    assert result.status == ExecutionStatus.ERROR
+    assert "Event filter failed" in result.message
+    assert data is None
+
+
+def test_stub_collector_assembly_and_firmware_paths(
+    stub_serviceability_collector, redfish_conn_mock
+):
+    tpl = "/redfish/v1/Chassis/{device}/Assembly"
+    asm_uri = tpl.format(device="dummy-chassis")
+    fw_uri = "/redfish/v1/UpdateService/FirmwareInventory"
+
+    def run_get_side_effect(path: str, *_args, **_kwargs):
+        if path == EVENT_URI:
+            return RedfishGetResult(
+                path=EVENT_URI,
+                success=True,
+                data={RF_MEMBERS: []},
+                status_code=200,
+            )
+        if path == asm_uri:
+            return RedfishGetResult(
+                path=asm_uri,
+                success=True,
+                data={"Assemblies": [{"SerialNumber": "dummy-asm-serial"}]},
+                status_code=200,
+            )
+        if path == fw_uri:
+            return RedfishGetResult(
+                path=fw_uri,
+                success=True,
+                data={"Details": "dummy-fw-summary"},
+                status_code=200,
+            )
+        raise AssertionError(f"unexpected Redfish GET path: {path!r}")
+
+    redfish_conn_mock.run_get.side_effect = run_get_side_effect
+
+    def run_get_paged_forbidden(*_args, **_kwargs):
+        raise AssertionError("run_get_paged must not run when follow_next_link=False")
+
+    redfish_conn_mock.run_get_paged.side_effect = run_get_paged_forbidden
+
+    args = MI3XXCollectorArgs(
+        rf_event_log_uri=EVENT_URI,
+        rf_assembly_uri_template=tpl,
+        rf_chassis_devices=["dummy-chassis"],
+        rf_firmware_bundle_uri=fw_uri,
+        follow_next_link=False,
+    )
+    result, data = stub_serviceability_collector.collect_data(args=args)
+    assert result.status == ExecutionStatus.OK
+    assert data is not None
+    assert "dummy-chassis" in data.assembly_info
+    assert data.assembly_info["dummy-chassis"].serial_number == "dummy-asm-serial"
+    assert data.component_details == "dummy-fw-summary"
+    assert asm_uri in data.responses
+
+
+def test_stub_collector_top_when_count_exceeds_top_uses_skip_and_paged(
+    stub_serviceability_collector, redfish_conn_mock
+):
+    probe = RedfishGetResult(
+        path=f"{EVENT_URI}?$top=1",
+        success=True,
+        data={RF_MEMBERS_COUNT: 100},
+        status_code=200,
+    )
+    window = RedfishGetResult(
+        path=f"{EVENT_URI}?$skip=90",
+        success=True,
+        data={RF_MEMBERS: [{"Id": "last"}]},
+        status_code=200,
+    )
+    redfish_conn_mock.run_get.return_value = probe
+    redfish_conn_mock.run_get_paged.return_value = window
+    args = MI3XXCollectorArgs(rf_event_log_uri=EVENT_URI, top=10)
+    result, data = stub_serviceability_collector.collect_data(args=args)
+    assert result.status == ExecutionStatus.OK
+    assert data is not None
+    assert data.rf_events == [{"Id": "last"}]
+    redfish_conn_mock.run_get.assert_called_once()
+    assert "?$top=1" in redfish_conn_mock.run_get.call_args[0][0]
+    redfish_conn_mock.run_get_paged.assert_called_once_with(
+        f"{EVENT_URI}?$skip=90", max_pages=args.max_pages
+    )
+
+
+def test_stub_collector_top_when_count_within_top_fetches_full_log(
+    stub_serviceability_collector, redfish_conn_mock
+):
+    probe = RedfishGetResult(
+        path=f"{EVENT_URI}?$top=1",
+        success=True,
+        data={RF_MEMBERS_COUNT: 3},
+        status_code=200,
+    )
+    full = RedfishGetResult(
+        path=EVENT_URI,
+        success=True,
+        data={RF_MEMBERS: [{"Id": "a"}, {"Id": "b"}]},
+        status_code=200,
+    )
+    redfish_conn_mock.run_get.return_value = probe
+    redfish_conn_mock.run_get_paged.return_value = full
+    args = MI3XXCollectorArgs(rf_event_log_uri=EVENT_URI, top=50)
+    result, data = stub_serviceability_collector.collect_data(args=args)
+    assert result.status == ExecutionStatus.OK
+    assert data is not None
+    assert len(data.rf_events) == 2
+    redfish_conn_mock.run_get_paged.assert_called_once_with(EVENT_URI, max_pages=args.max_pages)
+
+
+def test_serviceability_data_model_log_model_writes_json(tmp_path):
+    model = ServiceabilityDataModel(
+        responses={"/x": {"ok": True}},
+        cper_data={"slot": {"raw": "data"}},
+    )
+    model.log_model(str(tmp_path))
+    responses_file = tmp_path / "redfish_responses.json"
+    cper_file = tmp_path / "cper_data.json"
+    assert responses_file.is_file()
+    assert cper_file.is_file()
+    assert json.loads(responses_file.read_text(encoding="utf-8")) == {"/x": {"ok": True}}
+    assert json.loads(cper_file.read_text(encoding="utf-8")) == {"slot": {"raw": "data"}}
+
+
+def test_serviceability_data_model_log_model_skips_cper_when_empty(tmp_path):
+    model = ServiceabilityDataModel(responses={})
+    model.log_model(str(tmp_path))
+    assert (tmp_path / "redfish_responses.json").is_file()
+    assert not (tmp_path / "cper_data.json").exists()
diff --git a/test/unit/serviceability_dummy_data.py b/test/unit/serviceability_dummy_data.py
new file mode 100644
index 00000000..06c78d2e
--- /dev/null
+++ b/test/unit/serviceability_dummy_data.py
@@ -0,0 +1,184 @@
+###############################################################################
+#
+# MIT License
+#
+# Copyright (c) 2026 Advanced Micro Devices, Inc.
+#
+# Permission is hereby granted, free of charge, to any person obtaining a copy
+# of this software and associated documentation files (the "Software"), to deal
+# in the Software without restriction, including without limitation the rights
+# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+# copies of the Software, and to permit persons to whom the Software is
+# furnished to do so, subject to the following conditions:
+#
+# The above copyright notice and this permission notice shall be included in all
+# copies or substantial portions of the Software.
+#
+# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+# SOFTWARE.
+#
+###############################################################################
+"""Shared dummy values for serviceability unit tests (not production data)."""
+
+from __future__ import annotations
+
+from typing import Any
+
+DUMMY_AFID_A = 9001
+DUMMY_AFID_B = 9002
+DUMMY_AFID_C = 9003
+DUMMY_AFID_BELOW_RF = 22
+DUMMY_AFID_FATAL_HBM = 25
+DUMMY_RF_CPER_AFID = 10000
+DUMMY_SERVICE_ACTION_NUM = 99
+DUMMY_SERVICE_ACTION_TITLE = "Dummy service action"
+DUMMY_UNIT_A = "dummy_unit_a"
+DUMMY_UNIT_B = "dummy_unit_b"
+DUMMY_UNIT_C = "dummy_unit_c"
+DUMMY_DESIGNATION_A = "DUMMY_SLOT_A"
+DUMMY_DESIGNATION_B = "DUMMY_SLOT_B"
+DUMMY_EVENT_URI = "/redfish/v1/Systems/Dummy/LogServices/DummyEventLog/Entries"
+DUMMY_EVENT_URI_ALT = "/redfish/v1/Systems/Dummy/LogServices/DummyEventLog/EntriesAlt"
+DUMMY_EVENT_LOG_BASE = "/redfish/v1/Systems/Dummy/LogServices/DummyEventLog"
+DUMMY_CPER_ATTACHMENT_URI_1 = f"{DUMMY_EVENT_LOG_BASE}/Attachments/1"
+DUMMY_CPER_ATTACHMENT_URI_2 = f"{DUMMY_EVENT_LOG_BASE}/Attachments/2"
+DUMMY_TIMESTAMP = "2000-01-01T12:00:00+00:00"
+DUMMY_TIMESTAMP_EARLIER = "1999-12-31T12:00:00+00:00"
+DUMMY_TIMESTAMP_LATER = "2000-01-02T12:00:00+00:00"
+DUMMY_RF_EVENT_COUNT = 2
+DUMMY_SAG_PID = "dummy-sag-pid"
+DUMMY_SAG_REVISION = "dummy-rev-0"
+DUMMY_SAG_VARIANT = "dummy-variant-0"
+DUMMY_HUB_VERSION = "0.0.0-dummy"
+DUMMY_BMC_HOST = "dummy-bmc.example"
+DUMMY_OEM_VENDOR = "DummyVendor"
+DUMMY_GPU_SERIAL_NUMBER = "DUMMY-GPU-SERIAL-0001"
+DUMMY_DECODED_ERROR_TYPE = "dummy_error_type"
+DUMMY_RF_EVENT_ID_1 = "dummy-rf-evt-1"
+DUMMY_RF_EVENT_ID_2 = "dummy-rf-evt-2"
+DUMMY_CPER_EVENT_ID_BASIC = "dummy-cper-evt-1"
+DUMMY_CPER_EVENT_ID_SKIP = "dummy-cper-evt-skip"
+DUMMY_CPER_EVENT_ID_RF = "dummy-cper-evt-rf"
+DUMMY_CPER_BYTES_BASIC = b"\x01\x02dummy-cper"
+DUMMY_CPER_BYTES_RF = b"\xaa\xbb"
+
+
+def dummy_chassis_uri(unit: str) -> str:
+    return f"/redfish/v1/Chassis/{unit}"
+
+
+def dummy_aca_err_row(*, serial: bool = True, decoded: bool = True) -> dict[str, Any]:
+    meta = {"SerialNumber": DUMMY_GPU_SERIAL_NUMBER} if serial else {"GpuFw": "dummy-fw"}
+    decoded_data = {"error_type": DUMMY_DECODED_ERROR_TYPE} if decoded else {}
+    return {"DecodedData": decoded_data, "MetaData": meta}
+
+
+def dummy_cper_rf_member() -> dict[str, Any]:
+    """RF-range AFID with ACA decode + serial (CPER attachment fetch expected)."""
+    return {
+        "Id": DUMMY_CPER_EVENT_ID_RF,
+        "Created": DUMMY_TIMESTAMP_LATER,
+        "CPER": {"NotificationType": "dummy-notification-type"},
+        "DiagnosticDataType": "CPER",
+        "AdditionalDataURI": DUMMY_CPER_ATTACHMENT_URI_2,
+        "Oem": {
+            "AMDFieldIdentifiers": [{"AFID": DUMMY_RF_CPER_AFID}],
+            "ErrDataArr": [dummy_aca_err_row()],
+        },
+    }
+
+
+def dummy_cper_skip_member() -> dict[str, Any]:
+    """Low AFID with ACA decode + serial (CPER attachment fetch skipped)."""
+    return {
+        "Id": DUMMY_CPER_EVENT_ID_SKIP,
+        "Created": DUMMY_TIMESTAMP_LATER,
+        "CPER": {"NotificationType": "dummy-notification-type"},
+        "DiagnosticDataType": "CPER",
+        "AdditionalDataURI": DUMMY_CPER_ATTACHMENT_URI_1,
+        "Oem": {
+            "AMDFieldIdentifiers": [{"AFID": DUMMY_AFID_BELOW_RF}],
+            "ErrDataArr": [
+                {
+                    "DecodedData": {"error_type": "dummy_on_die_ecc"},
+                    "MetaData": {"SerialNumber": DUMMY_GPU_SERIAL_NUMBER},
+                }
+            ],
+        },
+    }
+
+
+def dummy_cper_basic_member() -> dict[str, Any]:
+    """CPER event without OEM ACA block (attachment fetch expected)."""
+    return {
+        "Id": DUMMY_CPER_EVENT_ID_BASIC,
+        "Created": DUMMY_TIMESTAMP_LATER,
+        "CPER": {"NotificationType": "dummy-notification-type"},
+        "DiagnosticDataType": "CPER",
+        "AdditionalDataURI": DUMMY_CPER_ATTACHMENT_URI_1,
+    }
+
+
+def dummy_openbmc_log_entry() -> dict[str, Any]:
+    """OpenBMC-style LogEntry with Links OOC and AMDFieldIdentifiers[]."""
+    return {
+        "@odata.id": f"{DUMMY_EVENT_URI}/1",
+        "Created": DUMMY_TIMESTAMP,
+        "Id": DUMMY_RF_EVENT_ID_1,
+        "Links": {
+            "OriginOfCondition": {"@odata.id": dummy_chassis_uri(DUMMY_UNIT_A)},
+        },
+        "Oem": {
+            "AMDFieldIdentifiers": [
+                {
+                    "AFID": DUMMY_AFID_BELOW_RF,
+                    "Description": "dummy on-die ECC, uncorrected, non-fatal",
+                    "ServiceableUnits": [{"@odata.id": dummy_chassis_uri(DUMMY_UNIT_A)}],
+                    "ServiceableUnits@odata.count": 1,
+                }
+            ],
+            "AMDFieldIdentifiers@Members.count": 1,
+        },
+    }
+
+
+def dummy_openbmc_log_entry_serviceable_units_only() -> dict[str, Any]:
+    """LogEntry with ServiceableUnits only (no Links OOC)."""
+    return {
+        "Created": DUMMY_TIMESTAMP,
+        "Oem": {
+            "AMDFieldIdentifiers": [
+                {
+                    "AFID": DUMMY_AFID_A,
+                    "ServiceableUnits": [{"@odata.id": dummy_chassis_uri(DUMMY_UNIT_B)}],
+                }
+            ],
+        },
+    }
+
+
+def dummy_fatal_hbm_log_entry() -> dict[str, Any]:
+    """Minimal CPER-style row with Links + AMDFieldIdentifiers[]."""
+    return {
+        "Created": DUMMY_TIMESTAMP_LATER,
+        "Id": DUMMY_RF_EVENT_ID_2,
+        "Links": {
+            "OriginOfCondition": {"@odata.id": dummy_chassis_uri(DUMMY_UNIT_C)},
+        },
+        "Oem": {
+            "AMDFieldIdentifiers": [
+                {
+                    "AFID": DUMMY_AFID_FATAL_HBM,
+                    "Description": "dummy fatal HBM",
+                    "ServiceableUnits": [{"@odata.id": dummy_chassis_uri(DUMMY_UNIT_C)}],
+                    "ServiceableUnits@odata.count": 1,
+                }
+            ],
+            "AMDFieldIdentifiers@Members.count": 1,
+        },
+    }