docs: MaaS POC validated-models writeup (draft) #797

Draft
EdisonSu768 wants to merge 1 commit into main from docs/maas-poc-validated-models

Conversation

@EdisonSu768

Member

Summary

Draft writeup of the MaaS proof of concept on Ascend 910B4, for iterative enrichment.

Adds docs/en/solutions/AI/Alauda_AI_MaaS_Validated_Models_POC_on_Ascend_910B4.md (English; Chinese is auto-translated by the pipeline).

Covers:

  • Environment: a single node with 8 × Ascend 910B4 (32 GB/card), vllm-ascend, KServe LLMInferenceService + InferNex, OpenAI-compatible MaaS gateway.
  • Validated models: Qwen3.6-27B (W8A8) and DeepSeek-V4-Flash (W4A8).
  • Test methodology: aiperf closed-loop, concurrency 4, two scenarios (8K / 17.5K input, 128 output), n=3.
  • Measured performance: per-model TTFT / ITL / E2E / decode / TPS tables.
  • MaaS gateway access: Scenario 1 OK; Scenario 2 blocked by the gateway request-body buffer limit (remediation noted).
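
For reviewers unfamiliar with the "closed-loop, concurrency 4" wording, here is a minimal Python sketch of what that pattern usually means: each of the 4 workers sends its next request only after its previous one completes. This is not the aiperf implementation. `fake_request` is a hypothetical stand-in that simulates prefill and decode with sleeps, and the ITL formula (decode time divided by output tokens minus one) is the common convention, assumed here rather than taken from the doc.

```python
import asyncio
import statistics
import time
from dataclasses import dataclass


@dataclass
class Sample:
    ttft_s: float    # time to first token
    e2e_s: float     # end-to-end request latency
    out_tokens: int  # number of generated tokens (must be >= 2 for ITL)

    @property
    def itl_s(self) -> float:
        # Mean inter-token latency over the tokens after the first one.
        return (self.e2e_s - self.ttft_s) / (self.out_tokens - 1)


async def fake_request(out_tokens: int) -> Sample:
    """Hypothetical stand-in for one streaming completion call."""
    start = time.perf_counter()
    await asyncio.sleep(0.02)             # simulated prefill
    ttft = time.perf_counter() - start
    for _ in range(out_tokens - 1):
        await asyncio.sleep(0.001)        # simulated decode step
    return Sample(ttft, time.perf_counter() - start, out_tokens)


async def closed_loop(concurrency: int, total_requests: int, out_tokens: int) -> list[Sample]:
    """Run `concurrency` workers; each waits for its response before sending again."""
    remaining = total_requests
    samples: list[Sample] = []

    async def worker() -> None:
        nonlocal remaining
        # No await between the check and the decrement, so this is race-free
        # on a single event loop.
        while remaining > 0:
            remaining -= 1
            samples.append(await fake_request(out_tokens))

    await asyncio.gather(*(worker() for _ in range(concurrency)))
    return samples


def run_once(concurrency: int = 4, total_requests: int = 8, out_tokens: int = 16) -> list[Sample]:
    return asyncio.run(closed_loop(concurrency, total_requests, out_tokens))


def summarize(samples: list[Sample]) -> dict[str, float]:
    """Average the per-request metrics for one run."""
    return {
        "ttft_s": statistics.mean(s.ttft_s for s in samples),
        "itl_s": statistics.mean(s.itl_s for s in samples),
        "e2e_s": statistics.mean(s.e2e_s for s in samples),
    }


if __name__ == "__main__":
    print(summarize(run_once()))
```

Because the loop is closed, the offered load adapts to server latency: at most 4 requests are ever in flight, so the numbers describe behavior at a fixed concurrency and not at a fixed arrival rate.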

Status

This is an intentional draft to build on step by step. Open items are tracked in the doc's
"To Be Enriched" checklist (accuracy eval, Qwen gateway path, diagram, screenshots, DeepSeek
quantization-source decision, capacity notes).

🤖 Generated with Claude Code

Draft writeup of the MaaS proof-of-concept on Ascend 910B4: two
validated models (Qwen3.6-27B W8A8, DeepSeek-V4-Flash W4A8), test
scenarios, measured performance (closed-loop n=3), and MaaS gateway
access findings (Scenario 2 blocked by gateway request-body limit).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
