Historian and Time-Series Databases for SCADA
SCADA systems generate and consume a continuous stream of operational data: analog measurements, discrete states, alarms, events, quality flags, and operator actions. Without a proper historian or time-series database, this data becomes fragmented across PLCs, HMI servers, SQL tables, and proprietary archives, making it difficult to trend performance, prove compliance, perform root-cause analysis, or feed analytics and predictive maintenance. The engineering challenge is not simply “where to store tags,” but how to design a data layer that preserves time integrity, supports deterministic operations, scales economically, and aligns with industrial cybersecurity and compliance requirements.
1. What a Historian Does in a SCADA Architecture
A historian is an industrial data collection and retrieval system optimized for high-frequency, time-stamped process data. In SCADA, it typically sits between field automation systems and enterprise applications. Its role is to:
- Collect time-series values from PLCs, RTUs, PACs, drives, and meters.
- Store timestamps, quality, and value changes efficiently.
- Support fast retrieval for trends, reports, dashboards, and investigations.
- Preserve event sequences for alarms and operator actions.
- Enable data exchange with MES, ERP, CMMS, and cloud analytics platforms.
In practical terms, a historian is usually optimized for “append and query by time,” while a conventional relational database is optimized for transactional records, joins, and business objects. Time-series databases are increasingly used for the same purpose, especially when open standards, cloud integration, or high-volume analytics are priorities.
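The "append and query by time" access pattern can be illustrated with a minimal in-memory sketch. The class name `TagSeries` and its methods are hypothetical and stand in for what a real historian does on disk with indexing and compression; the point is only that time-ordered appends make range queries a binary search rather than a scan.

```python
from bisect import bisect_left, bisect_right


class TagSeries:
    """Illustrative in-memory series for one tag (hypothetical, not a product API)."""

    def __init__(self):
        self._times = []   # epoch seconds, kept in ascending order
        self._values = []

    def append(self, ts, value):
        # Write path: samples are expected to arrive in time order.
        if self._times and ts < self._times[-1]:
            raise ValueError("out-of-order sample; a real system needs a policy for this")
        self._times.append(ts)
        self._values.append(value)

    def query(self, start, end):
        # Read path: binary search locates the slice [start, end] inclusive.
        lo = bisect_left(self._times, start)
        hi = bisect_right(self._times, end)
        return list(zip(self._times[lo:hi], self._values[lo:hi]))


series = TagSeries()
for t in range(0, 10, 2):
    series.append(t, t * 1.5)
print(series.query(2, 6))
```

A relational table can serve the same query with an index on the timestamp column, but it is not built around this single access path by default.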
2. Historian vs Time-Series Database: Engineering Differences
Historically, products such as OSIsoft PI (now AVEVA PI System), AVEVA Historian, and AspenTech's historians dominated industrial plants because they offered compression, exception reporting, buffering, security, and industrial protocol support. Modern time-series databases such as InfluxDB, TimescaleDB, QuestDB, and cloud-native services can be attractive where openness, DevOps integration, or cost transparency matter.
The engineering distinction is not just branding. A historian generally provides industrial features such as:
- Store-and-forward buffering at edge or gateway level.
- Bad quality and substituted value handling.
- Tag metadata, units, scaling, and asset models.
- Event frames, batch context, and alarm integration.
- Built-in tools for operator trends and shift reports.
A time-series database may instead prioritize:
- SQL-like querying or API-first access.
- Horizontal scale-out and cloud deployment.
- Open data models and integration with data science tools.
- Flexible retention policies and downsampling.
For SCADA engineers, the key question is whether the platform can preserve operational integrity under network outages, cybersecurity constraints, and audit requirements.
3. Data Types That Matter in SCADA
Not all industrial data should be treated the same way. A robust design separates the following classes:
- Analog process values: pressure, flow, temperature, level, power, vibration.
- Discrete states: motor running, valve open, breaker trip, permissive active.
- Alarms and events: limit violations, interlocks, acknowledgments, operator actions.
- Quality information: good, uncertain, bad, substituted, communication lost.
- Batch or production context: lot ID, recipe, campaign, equipment state.
IEC 62541 (OPC UA) supports data quality and timestamps natively, which is one reason it is often preferred for historian integration. For alarm management, IEC 62682 and ISA-18.2 are directly relevant, because alarm records should not be mixed casually with process values; they require lifecycle management, shelving rules, acknowledgment status, and event timestamps.
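One way to keep quality attached to every value is to carry it in the sample record itself. The sketch below is an assumption-laden illustration: the `Quality` names mirror the list above and are a simplification, not the OPC UA status code model, and `Sample` and `mean_of_good` are hypothetical helpers.

```python
from dataclasses import dataclass
from enum import Enum


class Quality(Enum):
    # Simplified labels taken from the list above (not OPC UA StatusCodes).
    GOOD = "good"
    UNCERTAIN = "uncertain"
    BAD = "bad"
    SUBSTITUTED = "substituted"
    COMM_LOST = "communication lost"


@dataclass(frozen=True)
class Sample:
    tag: str
    source_time: float  # epoch seconds, as stamped at the source
    value: float
    quality: Quality


def mean_of_good(samples):
    """Average only GOOD samples; return None if there are none."""
    good = [s.value for s in samples if s.quality is Quality.GOOD]
    return sum(good) / len(good) if good else None


readings = [
    Sample("FT-101", 0.0, 10.0, Quality.GOOD),
    Sample("FT-101", 2.0, 999.0, Quality.BAD),   # e.g. a failed transmitter reading
    Sample("FT-101", 4.0, 12.0, Quality.GOOD),
]
print(mean_of_good(readings))
```

Without the quality field, the bad reading would silently distort the average, which is the failure mode the classification above is meant to prevent.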
4. Architecture Patterns for SCADA Data Collection
There are three common patterns.
4.1 Direct-to-Historian
PLCs or SCADA servers push data directly to the historian. This is simple and low-latency, but it can create coupling and overload the control network if not engineered carefully.
4.2 Edge Gateway with Store-and-Forward
An edge node collects data from controllers using OPC UA, Modbus TCP, EtherNet/IP, PROFINET gateways, or vendor drivers, then buffers locally before forwarding to the central historian. This is the preferred pattern for remote sites, unreliable networks, or sites in scope of NIS2, because it reduces dependency on continuous WAN connectivity and gives a single, controllable point of exchange between the site and the central system.
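The buffering behavior can be sketched with a small local queue. This is a minimal illustration under stated assumptions: the class `StoreAndForwardBuffer`, its table layout, and the `send` callback are all hypothetical, and SQLite is used only because it ships with Python. The key design point is that rows are deleted only after the upstream send succeeds.

```python
import sqlite3


class StoreAndForwardBuffer:
    """Hypothetical edge buffer: persist locally, forward when the link is up."""

    def __init__(self, path=":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS buffer ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, tag TEXT, ts REAL, value REAL)"
        )
        self._db.commit()

    def store(self, tag, ts, value):
        self._db.execute(
            "INSERT INTO buffer (tag, ts, value) VALUES (?, ?, ?)", (tag, ts, value)
        )
        self._db.commit()

    def pending(self):
        return self._db.execute("SELECT COUNT(*) FROM buffer").fetchone()[0]

    def forward(self, send, batch_size=100):
        """Send buffered rows oldest-first; return how many were delivered."""
        sent = 0
        while True:
            rows = self._db.execute(
                "SELECT id, tag, ts, value FROM buffer ORDER BY id LIMIT ?",
                (batch_size,),
            ).fetchall()
            if not rows:
                return sent
            try:
                send([(tag, ts, value) for _, tag, ts, value in rows])
            except ConnectionError:
                return sent  # link is down: keep the rows for the next attempt
            # Delete only what was just delivered (ids up to the last row sent).
            self._db.execute("DELETE FROM buffer WHERE id <= ?", (rows[-1][0],))
            self._db.commit()
            sent += len(rows)
```

In use, the collector calls `store` for every sample and a background task calls `forward` periodically; passing a file path instead of `:memory:` lets the buffer survive a restart of the edge node.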
4.3 Distributed Historian Replication
Multiple local historians aggregate site data and replicate summarized or raw values to a central enterprise historian. This pattern is common in multi-site utilities, oil and gas, and manufacturing groups.
From a compliance and lifecycle perspective, the architecture should support defense in depth, least privilege, secure remote access, and logging. IEC 62443 is the principal family of standards for industrial cybersecurity design. For European operators subject to NIS2, the historian layer is part of the attack surface and should be segmented, monitored, and backed by incident recovery procedures.
5. Storage Models: Raw, Compressed, and Contextual
Historian storage is usually a mix of raw samples and compressed values. Compression is not merely a disk-saving trick; it is a data engineering strategy. The system can store every change, or it can apply deadband and exception rules.
A common compression rule is:
Store a new value only if the change exceeds a threshold or if a maximum time interval has elapsed.
Mathematically, if a tag value at time $t$ is $x(t)$, then a new point is archived when:
$$|x(t)-x(t_{last})| \geq \Delta x$$
or
$$t-t_{last} \geq \Delta t_{max}$$
This reduces storage while preserving operational meaning. However, for safety-related or forensic use cases, excessive compression can destroy evidence. Engineers must define retention and compression policies by tag class, not globally.
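The two archiving conditions above translate directly into a small filter. The sketch below is a plain deadband-plus-timeout rule, with `delta_x` and `delta_t_max` standing for $\Delta x$ and $\Delta t_{max}$; the function name `compress` is illustrative, and commercial historians may use more elaborate algorithms than this.

```python
def compress(samples, delta_x, delta_t_max):
    """Archive a point when the value moves by at least delta_x since the last
    archived point, or when at least delta_t_max seconds have elapsed.

    samples: iterable of (t, x) pairs in ascending time order.
    """
    archived = []
    for t, x in samples:
        if not archived:
            archived.append((t, x))  # always keep the first sample
            continue
        t_last, x_last = archived[-1]
        if abs(x - x_last) >= delta_x or (t - t_last) >= delta_t_max:
            archived.append((t, x))
    return archived


raw = [(0, 10.0), (1, 10.1), (2, 10.2), (3, 10.6), (4, 10.7),
       (5, 10.7), (6, 10.7), (7, 10.7), (8, 10.7)]
print(compress(raw, delta_x=0.5, delta_t_max=5))
# → [(0, 10.0), (3, 10.6), (8, 10.7)]
```

Nine raw samples reduce to three archived points: the first sample, the point where the value moved by more than 0.5, and a heartbeat point after 5 seconds without change. Per-tag-class policies then amount to choosing different `delta_x` and `delta_t_max` values, or bypassing the filter entirely for forensic signals.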
6. Worked Example: Sizing a Simple Historian for a Water Plant
Consider a water treatment plant with the following data sources:
- 200 analog tags sampled every 2 seconds
- 400 discrete tags that change state on average 12 times per hour
- 50 alarm/event tags generating 3 events per hour each
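Assume every analog sample is archived (no deadband filtering), so the result is an upper bound. Assume also that each stored analog sample requires 24 bytes including storage overhead, each discrete event record requires 18 bytes, and each alarm/event record requires 40 bytes including timestamp, quality, and metadata.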
Analog data volume per day:
Samples per tag per day:
$$\frac{24 \times 60 \times 60}{2} = 43{,}200$$
Total analog samples per day:
$$200 \times 43{,}200 = 8{,}640{,}000$$
Storage per day:
$$8{,}640{,}000 \times 24 = 207{,}360{,}000 \text{ bytes} \approx 198 \text{ MiB/day}$$
Discrete data volume per day:
Events per tag per day:
$$12 \times 24 = 288$$
Total discrete events per day:
$$400 \times 288 = 115{,}200$$
Storage per day:
$$115{,}200 \times 18 = 2{,}073{,}600 \text{ bytes} \approx 2.0 \text{ MiB/day}$$
Alarm/event data volume per day:
Events per tag per day:
$$3 \times 24 = 72$$
Total alarm events per day:
$$50 \times 72 = 3{,}600$$
Storage per day:
$$3{,}600 \times 40 = 144{,}000 \text{ bytes} \approx 0.14 \text{ MiB/day}$$
Total daily storage:
$$198 + 2.0 + 0.14 \approx 200.1 \text{ MiB/day}$$
For one year:
$$200.1 \times 365 \approx 73{,}037 \text{ MiB} \approx 71.3 \text{ GiB/year}$$
This is a modest footprint, but real systems often add quality flags, redundancy, replication, reporting indexes, backups, and longer retention. A practical engineering allowance is often 2x to 5x the raw estimate, depending on architecture and retention policy.
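The same arithmetic can be kept as a small script so that tag counts and record sizes are easy to vary. This is a sketch using the figures from the worked example; the function and parameter names are illustrative. Because it works from unrounded byte counts, its results differ slightly from the rounded hand calculation.

```python
SECONDS_PER_DAY = 24 * 60 * 60
MIB = 1024 ** 2
GIB = 1024 ** 3


def daily_bytes(analog_tags=200, analog_period_s=2, analog_record_bytes=24,
                discrete_tags=400, discrete_events_per_hour=12, discrete_record_bytes=18,
                alarm_tags=50, alarm_events_per_hour=3, alarm_record_bytes=40):
    """Return bytes stored per day for each data class, plus the total."""
    analog = analog_tags * (SECONDS_PER_DAY // analog_period_s) * analog_record_bytes
    discrete = discrete_tags * discrete_events_per_hour * 24 * discrete_record_bytes
    alarm = alarm_tags * alarm_events_per_hour * 24 * alarm_record_bytes
    return {"analog": analog, "discrete": discrete, "alarm": alarm,
            "total": analog + discrete + alarm}


day = daily_bytes()
year_gib = day["total"] * 365 / GIB
print(f"daily total: {day['total'] / MIB:.1f} MiB")
print(f"yearly total: {year_gib:.1f} GiB")
print(f"with 2x to 5x allowance: {2 * year_gib:.0f} to {5 * year_gib:.0f} GiB")
```

Analog data dominates the total, so the sampling period and any deadband policy on analog tags are the parameters worth testing first.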
7. Comparison Matrix: Historian vs Relational DB vs Time-Series DB
| Criterion | Industrial Historian | Relational Database | Time-Series Database |
|---|---|---|---|
| Best use case | SCADA, process data, alarms, operator history | Business transactions, master data, reporting | High-volume time-stamped telemetry and analytics |
| Timestamp handling | Strong, often with quality and source time | Possible, but not optimized | Strong, usually core design feature |
| Compression | Built-in industrial compression and deadband | Limited or custom | Often built-in or configurable |
| Store-and-forward | Common | Rare | Sometimes, usually via edge tools |
| Alarm/event context | Usually strong | Possible but manual | Variable |
| SCADA protocol support | Often native or via connectors | Needs middleware | Usually via collectors/gateways |
| Audit and compliance | Good for operational traceability | Strong for business transactions | Depends on implementation |
| Cybersecurity fit | Good if segmented and hardened | Good in IT zones | Good if secured end-to-end |
8. Engineering Requirements: Time Synchronization, Quality, and Retention
Time synchronization is non-negotiable. If timestamps are inconsistent, trending and forensic analysis become unreliable. Use a hierarchy of time sources, preferably NTP with authoritative internal time servers, or PTP where tighter synchronization is necessary. IEC 61850 environments often require particularly careful time alignment, though SCADA historians are more commonly aligned through NTP.
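A simple commissioning check is to compare each sample's source timestamp with the time the collector received it. The sketch below assumes the collector records both values; `find_skewed` and the 2-second threshold are hypothetical, and the measured difference includes network and polling latency, so it indicates a problem rather than measuring clock offset precisely.

```python
def find_skewed(samples, max_skew_s=2.0):
    """Return (tag, skew_seconds) for samples whose source and receive
    timestamps disagree by more than max_skew_s.

    samples: iterable of (tag, source_time, receive_time) in epoch seconds.
    """
    flagged = []
    for tag, source_time, receive_time in samples:
        skew = receive_time - source_time
        if abs(skew) > max_skew_s:
            flagged.append((tag, skew))
    return flagged


observed = [
    ("FT-101", 1000.0, 1000.4),   # normal latency
    ("LT-202", 1000.0, 1007.5),   # source clock behind, or long buffering delay
    ("PT-303", 1010.0, 1000.0),   # source clock ahead of the collector
]
print(find_skewed(observed))
```

A negative skew is the clearest signal of a clock problem, since a sample cannot legitimately arrive before it was measured.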
Quality flags should be preserved from source to archive. A value without quality can be misleading, especially during communications loss or sensor failure. For retention, define at least three tiers:
- Hot retention: recent high-resolution data for operations.
- Warm retention: compressed or downsampled data for investigations.
- Cold archive: long-term records for compliance and trend analysis.
Where regulated reporting is involved, retention periods should be aligned with contractual, environmental, or utility obligations. If the historian supports legal traceability, access control and audit logs should be enabled and reviewed.
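The tiering idea can be sketched as two small helpers: one that assigns a tier by data age and one that downsamples high-resolution data before it moves to the warm tier. The 30-day and 365-day boundaries and the function names are assumptions for illustration, and the plain bucket mean shown here ignores quality flags and uneven sample spacing.

```python
from collections import defaultdict


def tier_for_age(age_days, hot_days=30, warm_days=365):
    """Classify data by age; the boundaries are illustrative defaults."""
    if age_days <= hot_days:
        return "hot"
    if age_days <= warm_days:
        return "warm"
    return "cold"


def downsample_mean(samples, bucket_s):
    """Average (t, x) samples into fixed time buckets of bucket_s seconds."""
    buckets = defaultdict(list)
    for t, x in samples:
        bucket_start = int(t // bucket_s) * bucket_s
        buckets[bucket_start].append(x)
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


print(tier_for_age(7), tier_for_age(90), tier_for_age(400))
print(downsample_mean([(0, 1.0), (30, 3.0), (60, 5.0), (90, 7.0)], bucket_s=60))
```

For tags needed as evidence, keeping minimum and maximum per bucket alongside the mean preserves excursions that an average alone would hide.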
9. Cybersecurity and Compliance Considerations
Historian systems are frequently connected to both OT and IT networks, making them high-value targets. IEC 62443-3-3 security requirements are especially relevant for segmentation, authentication, data confidentiality, integrity, and availability. IEC 62443-2-1 supports security program management, while IEC 62443-4-2 addresses component security capabilities.
In European projects, SCADA integrations that form part of CE-marked machinery should also take the Machinery Directive (2006/42/EC) into account where applicable, especially for control system documentation, validation, and safe-state behavior. For alarm and event logging, IEC 62682 and ISA-18.2 define disciplined alarm management practices, which matter when the historian is used as compliance evidence.
From a procurement standpoint, ask vendors for:
- Supported protocols and native drivers.
- Buffering behavior during WAN outages.
- Role-based access control and audit logging.
- Backup/restore and disaster recovery procedures.
- Evidence of IEC 62443 alignment or certification.
10. Practical Selection Guidance
Select a traditional historian when the project needs strong industrial semantics, alarm/event handling, and proven SCADA integration. Select a time-series database when openness, cloud analytics, or IT-standard tooling is the primary driver. In many modern architectures, the best answer is hybrid: an edge historian or gateway for plant continuity, plus a time-series or data lake layer for enterprise analytics.
Ask these questions early:
- What data must be preserved raw, and what can be compressed?
- What is the maximum tolerable data loss during a network outage?
- Which tags require millisecond timestamps and which do not?
- How will alarms, events, and operator actions be correlated?
- What cybersecurity controls are required by the owner and by NIS2 scope?
These answers drive architecture, storage sizing, licensing, and lifecycle cost far more than vendor feature lists do.
Closing Notes: Common Mistakes and How to Avoid Them
The most common engineering mistakes are treating all tags equally, ignoring timestamp quality, underestimating retention growth, and placing the historian too close to the control network without segmentation. Another frequent error is compressing data aggressively without understanding which signals are needed for forensic analysis or regulatory reporting. Engineers also sometimes fail to define ownership of alarm data, resulting in inconsistent event logs and poor investigation quality. Avoid these problems by creating a tag classification standard, defining retention by data criticality, validating time synchronization, enforcing IEC 62443-aligned segmentation, and testing outage recovery before commissioning. A historian is not just a database; it is part of the plant’s operational memory, and it should be designed with the same rigor as the control system itself.
Frequently asked questions
What is the difference between a SCADA historian and a general-purpose time-series database in industrial projects?
A SCADA historian is purpose-built for high-speed collection of process tags, alarms, and events, with features such as compression, deadbanding, and fast retrieval for operator and reporting use. A general-purpose time-series database can store similar data, but it often lacks native industrial functions such as tag metadata, event framing, and OT-oriented security and failover design. For European projects, the historian architecture should align with IEC 62443 for security and with IEC 62541 (OPC UA), IEC 60870, or IEC 61850 interfaces where applicable.
How should sampling rate and data compression be configured for SCADA historian tags?
Sampling should match the process dynamics and the intended use of the data, because oversampling increases storage and network load without improving engineering value. Compression and exception-based logging are commonly used to preserve meaningful changes while reducing noise, and the implementation should be validated against project requirements and alarm/performance objectives defined in IEC 62682 and ISA-18.2 for alarm-related data.
What are the key design requirements for historian redundancy and high availability in SCADA systems?
For critical infrastructure, the historian should support redundant collectors, database failover, and store-and-forward buffering so data is not lost during network or server outages. Redundancy design should be coordinated with the overall control system availability target and documented in line with IEC 62443-3-3 for system security requirements and IEC 61508/61511 where the historian is used to support safety-related evidence or diagnostics.
How do you integrate a historian with PLCs, RTUs, and SCADA servers on multinational EPC projects?
The preferred approach is usually OPC UA for secure, vendor-neutral data exchange, with native drivers only where required for legacy equipment or protocol constraints. Integration should include clear tag naming, timestamp source rules, and network segmentation, and European projects typically expect compliance with IEC 62541 for OPC UA and IEC 62443 for secure zones and conduits.
What cybersecurity controls are required for historian access in OT networks?
Historian access should be restricted by role-based permissions, strong authentication, network segmentation, and audited remote access, because historians often expose large volumes of operational data to many users. For global projects with European compliance focus, the control set should map to IEC 62443 requirements for zones, conduits, least privilege, and secure remote maintenance, with logging retained to support incident investigation and compliance audits.
How long should SCADA historian data be retained for industrial plants and utilities?
Retention depends on operational, regulatory, and contractual requirements, such as troubleshooting, energy reporting, environmental compliance, and warranty obligations. In practice, short-term high-resolution data may be kept for weeks or months while summarized data is retained longer, and the retention policy should be defined by the owner’s information management rules and any applicable EN or national compliance obligations.
What data model and tag naming conventions are best for historian deployment on large industrial sites?
A consistent asset-based naming structure is essential so tags can be searched, filtered, and mapped to equipment hierarchies across projects and vendors. Good practice is to align historian tag structure with the plant breakdown, P&IDs, and asset register, while using standardized metadata and semantic tags where possible; OPC UA information modeling and ISA-95 concepts are commonly used to improve interoperability.
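A naming standard is easier to enforce when it can be checked automatically at import time. The sketch below assumes a hypothetical four-level convention, `SITE.AREA.EQUIPMENT.SIGNAL`; the pattern and field lengths are illustrative only and would be replaced by the project's own tag classification standard.

```python
import re

# Hypothetical convention: SITE.AREA.EQUIPMENT.SIGNAL, e.g. WTP1.FILT.P-101.FLOW_PV
TAG_PATTERN = re.compile(
    r"^(?P<site>[A-Z0-9]{2,6})"
    r"\.(?P<area>[A-Z0-9]{2,8})"
    r"\.(?P<equipment>[A-Z]{1,4}-\d{3,4})"
    r"\.(?P<signal>[A-Z0-9_]{2,12})$"
)


def parse_tag(name):
    """Return the tag's parts as a dict, or None if it breaks the convention."""
    match = TAG_PATTERN.match(name)
    return match.groupdict() if match else None


print(parse_tag("WTP1.FILT.P-101.FLOW_PV"))
print(parse_tag("pump 1 flow"))
```

Parsing the name into fields also gives a direct mapping to the asset hierarchy, so the same check can populate site, area, and equipment metadata.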
How do historians support reporting, energy management, and fault analysis in SCADA?
Historians provide time-stamped process records that can be used for production reports, energy intensity calculations, event reconstruction, and root-cause analysis after trips or process upsets. For European industrial sites, this is especially valuable when combined with quality and energy management workflows, and the data integrity expectations should be aligned with IEC 61512/ISA-88 for batch contexts and IEC 62443 for secure data handling.