Skip to content

System Architecture Overview

IoT DC3 runs the "collect → normalize → analyze → act → feedback" loop as a layered, multi-tenant microservice architecture. A single gateway is the only northbound entry point. Four center services each own one segment of the chain. Protocol drivers connect field devices southbound. This page starts with a layered diagram to give you the global picture, then walks through the four key design decisions — what problem each one solves and how they work together — and finally points to the deep-dive pages for every link in the chain.

You are here: you have read Platform Positioning and Core Concepts, and now you're breaking the loop down into a concrete layered structure. From here you can jump into any plane (data / command / auth / domain model).

Architecture at a Glance

This panorama lays out all six layers, the four center services with their ports, the message-bus exchanges, and the optional observability stack in one view — get the whole picture first, then read the logical drill-down below. The diagram adapts to the site's light/dark theme.

Center Services · gRPC interconnect (Facade: api / grpc / local)Driver Layer · 28+ protocol drivers (gRPC ↔ manager, via RabbitMQ)Observability (optional)HTTP /apiJWT verify gRPCCmd▼▲DataWrite▼▲ReadJDBCmetrics/logsWeb ConsoleVue3 · Vite8 · Element Plus · :8080dc3-cliTypeScript CLI (HTTP)AI Agent / MCP ClientChat · Tool calls · OAuth2dc3-gateway · API Gateway :8000Spring Cloud Gateway (WebFlux) · routing + auth filter + rate limitauthAuth · Tenant · RBAC:8300 / gRPC 9300managerDriver/Device/Point/Profile:8400 / gRPC 9400dataPoint value · Command · History:8500 / gRPC 9500agenticAI tool calls · Session:8600 · Spring AIRabbitMQ Message Buspoint_value · point_command · result · event · notify | MQTT :1883ModbusTCP / RTUOPCUA / DAS7 PLCSiemensMQTTBroker+ 24 protocolsBACnet·SNMP·IEC104·CoAP…Field DevicesPLC · Sensor · Meter · Gateway (Modbus / OPC / S7 / MQTT)PostgreSQL Storage• TimescaleDB — point values / history (time-series)• pgvector — AI vector search / memory• AGE — graph (device relations)schema:dc3_auth · dc3_manager · dc3_datadc3_agentic · dc3_common:35432 → 5432EMQX · MQTT Broker:31883 · Dashboard :18083ELK Log PipelineES · Logstash · Kibana :5601 · APMPrometheus + GrafanaMetrics · Dashboards :3000AccessServices / DriversMessage BusData StoreOps (optional)Devices / ExternalAuth flow▼ Command down ▲ Data up

Three-Layer Structure: Access, Platform, Storage & Messaging

The platform isn't one monolith. It's a set of services split by responsibility. From the caller's perspective, there's a single entry point — the gateway dc3-gateway (HTTP 8000), the only externally exposed HTTP port. The HTTP and gRPC ports of the other center services are reachable only on the internal network. The gateway routes requests to the four center services, which don't call each other over HTTP but cooperate cross-process through gRPC facades.

Southbound runs to a different rhythm. Field devices are connected by protocol drivers (dc3-driver-*, 28 in total), and drivers and the data center never call each other directly. They exchange messages asynchronously through RabbitMQ — point values flow northbound (upstream), commands flow southbound (downstream). All persistence lands in PostgreSQL, where time-series data (point value history) is stored in TimescaleDB hypertables.

The dashed lines are gRPC facade calls; the bidirectional solid lines are RabbitMQ exchanges. The difference between those two connection styles is exactly what the four designs below explain. For each service's ports, startup order, and health checks, see Services & Topology.

Monolithic and Distributed Deployment Forms

The diagram shows the default distributed form (four centers as independent processes). The platform can also merge the centers into a single process (dc3-center-single) running on one machine. That's just a deployment topology choice — the business chain doesn't change. The switch is governed by DC3_FACADE_MODE; see Facade Modes.

gRPC Facade: How Centers Call One Another

The four center services frequently need data from each other. Before the data center dispatches a command, for instance, it has to confirm with the management center that the device and point exist and are enabled. If business code assembled HTTP URLs to call peers directly, service boundaries would leak transport details, and the monolithic and distributed deployments couldn't share one codebase.

IoT DC3 solves this with facade interfaces. Cross-service calls program against the contract interfaces in dc3-common-facade-api, so business code knows the interface, not the transport. At runtime, DC3_FACADE_MODE picks the implementation behind the interface:

  • grpc (distributed default) — the implementation comes from dc3-common-facade-grpc, and calls go cross-process via gRPC to the target center.
  • local (single process) — the implementation comes from dc3-common-facade-local-*, and calls are direct in-process method invocations with no network hop.

In other words, "distributed or monolithic" is a deployment switch, not two codebases. The same business logic moves between the two forms just by changing DC3_FACADE_MODE.

Read the Facade Mode Defaults Carefully

In a distributed setup each center defaults to grpc. The management center's application.yml declares dc3.facade.mode: ${DC3_FACADE_MODE:grpc}, and dc3/env/dev.env also sets DC3_FACADE_MODE=grpc. The auth center's base application.yml contains a line dc3.facade.mode: local, but that's a local override for the single-process scenario — it does not mean the distributed default is local. Go by the environment variable and the manager's declaration. For the full distinction, see Facade Modes.

RabbitMQ Async Decoupling: Why Drivers and the Data Center Aren't Directly Connected

Point values are high-frequency and bursty. A single round of collection by a Modbus driver can produce hundreds or thousands of values in an instant. If a driver called the data center synchronously to write to the database, any slowdown along the way would back-pressure the collection thread — drivers would drop offline and data points would be lost. Command downlink is the same: an HTTP request shouldn't block indefinitely while waiting for a device to finish writing registers.

So a layer of RabbitMQ sits between the drivers and the data center, decoupling "production" from "consumption" in time:

  • Upstream (data): drivers publish collection results to the topic exchange dc3.e.value with routing key dc3.r.value.point.{driverServiceName}, landing in the durable queue dc3.q.value.point (7-day TTL, with dead-letter exchange dc3.e.point_value_dead). The data center's PointValueReceiver consumes asynchronously and writes to the database in batches or immediately.
  • Downstream (command): commands are delivered via the exchange dc3.e.point_command to the corresponding driver queue (30-second TTL + dead letter). After the driver executes, it sends the result back to dc3.e.point_command_result (60-second TTL), where the data center's result receiver collects it.

This way point-value writes never block collection, and command dispatch returns a commandId immediately for polling. Messages use durable delivery + manual ack + publisher confirm, with failures handled via redelivery and dead-letter. For the complete exchange, queue, and acknowledgment chain, see Data Plane and Command Plane.

A Failed Write Command Never Returns a Fabricated Value

A write command counts as successful only when the driver's write() returns Boolean.TRUE. On failure the result responseValue=null, and no "looks-successful" value is ever filled in. That's deliberate, to keep false success from misleading the upper layers. A command's PointCommandDTO.expireAt defaults to now + 10s, and the timeout is judged as EXPIRED by the driver at consumption time.

Multi-Tenant Isolation: How tenantId Runs Through Every Layer

The platform is multi-tenant by design. Isolation isn't a filter bolted on afterward — it's a foundation that runs through the entire request chain. Once a request comes in, tenantId is carried all the way along "gateway → center service → gRPC call → database query → cache key." No layer is allowed to fetch data across tenants.

The enforcement points in practice:

  • Query layer: MyBatis-Plus's tenant row handler automatically appends WHERE tenant_id = ? to all TenantOwned entities, covering full-database queries across the four centers.
  • Interface layer: BaseController.requireTenant() returns 404 (rather than leaking data) for cross-tenant single-ID lookups, and filterTenant() strips records that don't belong to the current tenant from batch results.
  • Cross-service: gRPC facade calls carry the tenant ID where the contract supports it, and cache keys also include tenant context.

So when you write a new query, a new gRPC call, or a new cache key, you have to preserve the tenant scope. That's a hard requirement, not an optional optimization. For how isolation is enforced layer by layer and how it works with RBAC, see Auth · Tenant · RBAC.

HMAC Gateway Signing: How the Backend Trusts Who the Caller Is

After authentication, the gateway packs the resolved identity (tenant, login name, principal) into an X-Auth-Principal JSON header and forwards it to the backend center services. The backend makes authorization decisions based on it. The question is: why should the backend trust that this header wasn't forged? Anyone who bypasses the gateway to hit an internal port directly could construct a fake principal header.

The answer is HMAC-SHA256 signing. The gateway signs the principal content with the secret AUTH_HMAC_SECRET (config key dc3.auth.hmac.secret) and puts the signature in the X-Auth-Sign header. The backend verifies it with the same secret and rejects anything that fails. Only the gateway holding the secret can sign a valid request, so a forged principal header is blocked at the backend.

The Production Secret Must Be Changed, or Startup Fails

The factory default of AUTH_HMAC_SECRET is io.github.pnoker.dc3, for development only. When the Spring profile contains pre or pro, if the secret is empty or still equals that default, the service throws IllegalStateException at startup and fail-fasts — refusing to go to production with a weak secret. DC3_SECURITY_KEY (login token signing) must likewise be changed to an environment-specific, strong random value.

Consistency and Scalability

The four center services are themselves stateless. Hot data such as sessions, token denylists, and latest values live in the Caffeine cache or the database, so requests aren't pinned to a particular instance. Each center can scale horizontally: attach a few more instances of the same kind behind the gateway to share the load. No shared memory is needed.

The data center's throughput bottleneck is on the consumption side, and consumption concurrency is tunable. PointValueReceiver consumes dc3.q.value.point with a high-throughput listener container, switching between "immediate write" and "PointValueJob batch write" based on the inbound rate. The batch threshold is controlled by POINT_BATCH_SPEED (default 100 records) and POINT_BATCH_INTERVAL (default 5 ms) — whichever is met first flushes to disk. Under a collection flood, RabbitMQ absorbs the burst first, and concurrent consumption plus batch writes then handle it.

Strong Consistency and Eventual Consistency Coexist

Steps on the request path — tenant isolation, authorization decisions, the command state machine — are strongly consistent (synchronous validation, immediate rejection). Upstream persistence of point values is eventually consistent via MQ: a value is treated as reliably delivered once it enters the queue, while the database write and alarm evaluation complete asynchronously. Understanding this boundary helps when troubleshooting timing issues like "the command is acknowledged but the history query is still a beat behind."

Further Reading

  • Services & Topology — six deployable units, port allocation, startup dependencies, and health checks
  • Facade Modes — the trade-off between grpc and local, switching between monolithic and distributed
  • Data Plane — every hop a point value takes from device to TimescaleDB, and the MQ topology
  • Command Plane — dispatch of read/write commands, the lifecycle state machine, and acknowledgments
  • Auth · Tenant · RBAC — gateway signing, token issuance, permission resolution, and tenant propagation
  • Domain Model — the fields of Profile / Point / Device and the DO/BO/VO layering
  • Module Map — the Maven module structure, the 28 drivers, and their dependencies

Released under the AGPL-3.0 License · 基于 AGPL-3.0 协议发布