Getting Database Modeling Right

Primary keys, referential integrity, and clean relationships — with a practical focus on IBM i, Db2 for i, and modern backend architectures.
1. Database modeling is architecture
In many IBM i landscapes, databases have grown over time: DDS-described physical files, logical files, RPG programs with decades of validation logic, very few explicit foreign keys, and data integrity that was “always handled by the application.” As long as a system runs in isolation, that can work surprisingly well. The invoice usually arrives later — when integration, reporting, or modernization becomes real.
Typical situations where modeling weaknesses become visible:
- A new Java, .NET, or web front end accesses Db2 for i directly through SQL.
- A data warehouse extracts ERP data regularly through ETL or ELT pipelines.
- An older 5250 application is gradually extended with APIs, microservices, or integration hubs.
- Two companies with their own IBM i systems merge.
- Multi-tenancy or group-wide consolidation is introduced after the fact.
- External systems replicate data through CDC or synchronization processes.
At that point, “the RPG program takes care of it” is no longer enough. Direct SQL access, new services, batch jobs, and reporting tools often bypass that logic. What used to work inside a monolith suddenly becomes visible in an integrated architecture — including duplicates, orphaned child rows, unclear keys, and inconsistent status values.
2. Cutting entities the right way
An entity is not a table name. An entity is a business object with stable identity, clear responsibility, and a traceable life cycle. Tables are simply the technical implementation of that business model.
Example: sales order.
- An order is created, changed, confirmed, shipped, invoiced, canceled, and archived.
- An order line initially exists only within the context of that order.
- If the line is later referenced independently — for example by a return, warehouse document, quality process, or legal retention rule — it needs a clear identity of its own.
CREATE TABLE orders (
id BIGINT GENERATED ALWAYS AS IDENTITY,
order_no VARCHAR(30) NOT NULL,
customer_id BIGINT NOT NULL,
order_status CHAR(1) NOT NULL DEFAULT 'O',
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (id),
CONSTRAINT uq_orders_order_no UNIQUE (order_no),
CONSTRAINT chk_orders_status CHECK (order_status IN ('O','C','I','S'))
);
CREATE TABLE order_positions (
id BIGINT GENERATED ALWAYS AS IDENTITY,
order_id BIGINT NOT NULL,
position_no INTEGER NOT NULL,
item_id BIGINT NOT NULL,
quantity DECIMAL(11,3) NOT NULL,
net_price DECIMAL(15,4) NOT NULL,
PRIMARY KEY (id),
CONSTRAINT fk_pos_order
FOREIGN KEY (order_id) REFERENCES orders(id)
ON DELETE CASCADE,
CONSTRAINT uq_pos_no UNIQUE (order_id, position_no),
CONSTRAINT chk_pos_quantity CHECK (quantity > 0)
);
The business model comes first. If you design tables without a clear business boundary, you create ambiguous ownership: Which table is authoritative? Which column is the truth? Which process is allowed to delete? Which data must remain audit-proof?
3. Primary key: identity is not meaning
A primary key has one job: to identify a row permanently, uniquely, and stably. It is not a display field, not a business rule, and not a human-facing numbering scheme.
Why meaningful keys become expensive over time
- Mergers: Two systems may both have customer number 4711, but for different customers.
- Multi-tenancy: A simple
customer_nosuddenly becomestenant + customer_no. - Rule changes: Number formats change, prefixes are added, and old numbers still remain historically relevant.
- Replication: Decentralized systems assign numbers independently.
- Interfaces: External systems store the key and expect it to remain stable.
Recommended key strategy
| Strategy | Good fit | Note |
|---|---|---|
BIGINT GENERATED ALWAYS AS IDENTITY |
Core entities in a central system | Compact, index-friendly, and a good fit for many classic ERP tables. |
| UUID | Distributed generation, offline capture, integration, external references | On Db2 for i, GENERATE_UUID() creates UUID v4 values. Time-ordered UUID v7 values are typically generated in the application. |
Business key as UNIQUE |
Customer numbers, item numbers, contract numbers, external IDs | Important for the business, but not necessarily the technical primary key. |
| Composite primary key | Pure relationship tables or deliberately small lookup/association tables | Do not use this as the default for core entities; foreign keys quickly become cumbersome. |
Storing UUIDs sensibly on Db2 for i
On Db2 for i, GENERATE_UUID() returns a formatted CHAR(36) value. For smaller indexes and more compact storage, storing the value as BINARY(16) or VARBINARY(16) can make sense. Conversion can be done with VARBINARY_FORMAT on supported releases.
CREATE TABLE customers (
id_bin VARBINARY(16) NOT NULL
DEFAULT VARBINARY_FORMAT(GENERATE_UUID(),
'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX'),
tenant_id SMALLINT NOT NULL,
customer_no VARCHAR(30) NOT NULL,
name1 VARCHAR(80) NOT NULL,
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
rowchg_ts TIMESTAMP NOT NULL
GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP,
PRIMARY KEY (id_bin),
CONSTRAINT uq_customer_tenant_no UNIQUE (tenant_id, customer_no),
CONSTRAINT chk_customer_no_len CHECK (LENGTH(customer_no) BETWEEN 3 AND 30)
);
-- Display the UUID in readable form
SELECT VARCHAR_FORMAT_BINARY(id_bin,
'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX') AS customer_uuid,
tenant_id,
customer_no,
name1
FROM customers;
VARBINARY_FORMAT approach is compact and index-friendly, but it requires a current enough release level. Alternatively, you can store UUIDs as CHAR(36), which is simpler and broadly available. UUID v4 is robust for distributed generation; UUID v7 is time-ordered and usually friendlier to indexes, but it is typically generated in the application.4. Integration, multi-tenancy, and mergers
Integration is the stress test for every data model. As soon as data is moved, mirrored, consolidated, or synchronized in real time across systems, you find out whether identity and relationships are truly stable.
Common integration traps
- Overlapping number ranges: Customer 4711 exists in system A and system B — but they are different customers.
- Multi-tenancy added later: Every uniqueness rule and almost every relationship must be re-evaluated.
- Business keys used as technical coupling: Changes in the business trigger technical migrations.
- No declarative constraints: ETL or reporting processes discover inconsistencies that the application supposedly would never have allowed.
- No change strategy: Without reliable change timestamps or journal-based change capture, delta extraction and CDC become unnecessarily difficult.
5. Modernizing legacy tables on IBM i
Many classic IBM i applications are based on DDS-described files and logical files. That is not automatically bad. The problem starts when implicit assumptions are not visible as SQL constraints. Modern tools then see tables, but not the rules that keep the data clean.
A pragmatic modernization path
- Profile existing data: duplicates, NULL values, invalid status codes, orphaned child records.
- Add a technical ID:
BIGINT IDENTITYor UUID, depending on integration requirements. - Populate existing rows cleanly.
- Protect business keys with
UNIQUEconstraints. - Gradually move references to technical IDs — starting with new tables and services.
- Enable foreign keys, CHECK constraints, and NOT NULL rules only after data cleanup.
- Provide compatibility views so old programs do not have to be rewritten immediately.
-- Example: add a technical UUID column
ALTER TABLE customer_legacy
ADD COLUMN id CHAR(36) CCSID 1208;
-- Populate existing rows
UPDATE customer_legacy
SET id = GENERATE_UUID()
WHERE id IS NULL;
-- Check data quality before adding constraints:
-- 1) Are there any NULL values?
-- 2) Are there duplicates in the business key?
-- 3) Are there dependent rows without a matching parent?
ALTER TABLE customer_legacy
ALTER COLUMN id SET NOT NULL;
ALTER TABLE customer_legacy
ADD CONSTRAINT pk_customer_legacy PRIMARY KEY (id);
ALTER TABLE customer_legacy
ADD CONSTRAINT uq_customer_legacy_tenant_no
UNIQUE (tenant_id, customer_no);
ROW CHANGE TIMESTAMP column is automatically set by Db2 on inserts and updates. That is useful for simple delta detection. For robust auditing, restart/recovery logic, delete history, or mixed modernization scenarios, journaling, CDC, or dedicated audit tables are often the stronger solution.6. Referential integrity belongs in the database
Foreign-key constraints are not duplicated business logic. They are structural invariants: a child row must not point to a parent row that does not exist. That rule belongs where all access paths meet — in the database.
IBM describes referential constraints in exactly that sense: they ensure that values in a dependent table are valid only when they refer to matching values in a parent table. Once defined, they are enforced system-wide when data is changed through SQL and other interfaces.
ALTER TABLE order_line
ADD CONSTRAINT fk_order_line_order
FOREIGN KEY (order_id)
REFERENCES order_header (id)
ON DELETE CASCADE
ON UPDATE RESTRICT;
| Scenario | Typical rule | Reason |
|---|---|---|
| Child is fully dependent | ON DELETE CASCADE |
An order line has no business existence without its order. |
| Child has its own meaning | RESTRICT or NO ACTION |
Deleting the parent should be prevented while dependent data still exists. |
| Audit, legal, historical data | Restrict + soft delete | Data is deactivated logically, not removed uncontrollably. |
| Optional relationship | SET NULL only deliberately |
Use only if the child can legitimately continue without a parent. |
No. Constraints do add checks, but they protect data quality across the system. With proper indexing and deliberate design, they are usually cheaper in business applications than years of cleanup, special-case logic, and troubleshooting.
7. Modeling relationships correctly
One-to-many — the classic case
A customer has many orders. The order references the customer. Not the other way around.
customers (id PK)
orders (id PK, customer_id FK -> customers.id)
Many-to-many — almost never just a dumb cross-reference table
In real systems, many-to-many relationships often gain attributes of their own: quantity, price at the time, discount, status, sequence, source, validity. At that point, the relationship table is an entity in its own right.
CREATE TABLE order_item (
id BIGINT GENERATED ALWAYS AS IDENTITY,
order_id BIGINT NOT NULL,
item_id BIGINT NOT NULL,
position_no INTEGER NOT NULL,
quantity DECIMAL(11,3) NOT NULL,
price_at_order DECIMAL(15,4) NOT NULL,
discount_percent DECIMAL(5,2) NOT NULL DEFAULT 0,
PRIMARY KEY (id),
CONSTRAINT fk_oi_order FOREIGN KEY (order_id)
REFERENCES orders(id) ON DELETE CASCADE,
CONSTRAINT fk_oi_item FOREIGN KEY (item_id)
REFERENCES items(id) ON DELETE RESTRICT,
CONSTRAINT uq_oi_position UNIQUE (order_id, position_no),
CONSTRAINT chk_oi_quantity CHECK (quantity > 0)
);
'1,7,42', or use a JSON array as a substitute for a core relationship. That makes constraints, joins, indexes, reporting, and data quality much harder.8. New table or new column?
One of the most common modeling questions is: Is a new column enough — or do we need a new table?
| Question | If yes | If no |
|---|---|---|
| Does the object have its own life cycle? | New table | Consider a column |
| Can it occur multiple times? | New one-to-many table | A column is often enough |
| Does it need history? | History table with valid-from/valid-to or versioning / temporal tables | Column or timestamp |
| Will it be referenced by other tables? | Separate entity | Attribute of the existing entity |
| Is it rarely populated and truly optional? | Separate detail table or controlled extension structure | Nullable column, if the semantics are clear |
| Is it only a display or cache value? | Denormalize only deliberately | Calculate it or use a view |
SYSTEM VERSIONING (temporal tables). That lets the database automatically maintain history for table changes. It is often cleaner than hand-built history tables and can work transparently for existing applications.9. EAV, JSON, and custom fields
Entity-Attribute-Value, or EAV, stores attributes as rows instead of columns. That looks flexible, but it moves structure out of the model and into the data. Typing, constraints, indexes, and reporting become much more difficult.
CREATE TABLE item_eav (
item_id BIGINT NOT NULL,
attribute_name VARCHAR(80) NOT NULL,
attribute_value VARCHAR(4000),
PRIMARY KEY (item_id, attribute_name)
);
Why EAV hurts for core data
- Loss of type information: Numbers, dates, and flags end up as text.
- Weak validation:
NOT NULL,CHECK, andFOREIGN KEYrules apply only indirectly. - Complex queries: Simple columns become self-joins, pivot logic, and
MAX(CASE ...)expressions. - Difficult reporting: BI tools expect stable columns, not dynamic lists of attributes.
- Index problems: An index on
attribute_namedoes not automatically help value comparisons that require type conversion.
When EAV or JSON can be acceptable
- There are many potential additional attributes, most of which are empty.
- Attributes are customer-specific or tenant-specific.
- The data is not part of central integrity rules.
- Reporting is handled through controlled views, materialized structures, or defined export models.
CREATE TABLE attribute_def (
attribute_id BIGINT GENERATED ALWAYS AS IDENTITY,
attribute_key VARCHAR(80) NOT NULL,
data_type CHAR(1) NOT NULL,
required SMALLINT NOT NULL DEFAULT 0,
PRIMARY KEY (attribute_id),
CONSTRAINT uq_attribute_key UNIQUE (attribute_key),
CONSTRAINT chk_attribute_type CHECK (data_type IN ('S','N','T','B'))
);
CREATE TABLE item_attribute (
item_id BIGINT NOT NULL,
attribute_id BIGINT NOT NULL,
val_string VARCHAR(4000),
val_number DECIMAL(31,10),
val_timestamp TIMESTAMP,
val_boolean SMALLINT,
PRIMARY KEY (item_id, attribute_id),
CONSTRAINT fk_item_attr_def
FOREIGN KEY (attribute_id) REFERENCES attribute_def(attribute_id),
CONSTRAINT chk_exactly_one_value CHECK (
(CASE WHEN val_string IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN val_number IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN val_timestamp IS NULL THEN 0 ELSE 1 END) +
(CASE WHEN val_boolean IS NULL THEN 0 ELSE 1 END) = 1
)
);
10. Normalization, explained pragmatically
Normalization reduces redundancy and prevents common data anomalies. In practice, the goal is not to push every model academically into fifth normal form. The goal is to keep master data clean and use redundancy only deliberately.
order_header (
order_no,
customer_no,
customer_name,
customer_street,
customer_zip,
customer_city,
...
)
If the customer address changes, it has to be updated in every order. If only some rows are updated, the database now contains contradictory data. That is the classic update anomaly.
| Normal form | Practical meaning |
|---|---|
| 1NF | Atomic values, no comma-separated lists, no repeating column groups. |
| 2NF | Non-key attributes depend on the entire key, not just part of it. |
| 3NF / BCNF | No hidden dependencies between non-key attributes. |
| 4NF / 5NF | Relevant for complex multi-valued dependencies and special cases. |
11. Anti-patterns at a glance
- Meaningful primary keys as the default.
VARCHAR(5000), CLOB, or JSON as a universal type for structured core data.- Comma-separated lists in a single column.
- Status values as
CHAR(1)without aCHECKconstraint or lookup table. NULLwithout clear semantics: unknown, not applicable, empty, not yet calculated?- Columns like
phone1,phone2,phone3instead of a proper one-to-many structure. - History tracking without
valid_from,valid_to, version, or audit concept. - Soft delete without an archive and purge strategy.
- Data integrity only in RPG, Java, or .NET.
- EAV for core data instead of true customization scenarios.
ALTER TABLE order_header
ADD CONSTRAINT chk_order_header_status
CHECK (status IN ('O','C','I','X'));
-- O = open, C = confirmed, I = invoiced, X = canceled
12. Practical checklist
- Every core entity has a stable technical primary key.
- Business keys are protected with
UNIQUEconstraints. - Multi-tenancy is modeled explicitly and not hidden inside number ranges.
- Important relationships have declarative foreign keys.
ON DELETEandON UPDATEbehavior is decided and documented from a business perspective.- Status values, value ranges, and mandatory fields are protected with
CHECK,NOT NULL, or lookup tables. - History, auditing, and deletion strategy are clarified before the first production run.
- No comma-separated lists, no repeating column groups, no JSON as a replacement for core relationships.
- EAV or custom fields are used only in controlled extension scenarios, not for central business objects.
- Views encapsulate compatibility and soft-delete filters instead of rebuilding rules in every tool.
- Migrations are prepared with data profiling, test data, a rollback plan, and measurement.
13. Conclusion
UNIQUE, not necessarily as the primary key. Put referential integrity into the database. Model relationships explicitly. Use EAV, JSON, and denormalization only deliberately. And above all: build models that work not just for today’s RPG program, but also for reporting, APIs, migration, mergers — and the developer who will be staring at your schema ten years from now, muttering things you probably do not want quoted.Glossary
Key relational modeling terms — explained briefly.
- Entity
- A business object with stable identity and its own life cycle, such as customer, order, item, or invoice.
- Primary Key
- A unique, non-null key for a table. It should be stable, minimal, and permanently immutable.
- Surrogate Key
- An artificial technical key with no business meaning, such as an identity value or UUID.
- Business Key / Natural Key
- A business-facing key such as customer number, item number, or contract number. Important for users and interfaces, but not automatically ideal as the primary key.
- Foreign Key
- A declarative database rule that ensures a value in a child table points to an existing parent row.
- Referential Integrity
- The property of a data model where relationships between tables remain consistent and orphaned child rows are prevented.
- DDS
- Data Description Specifications: the classic IBM i technology for defining physical and logical files.
- Db2 for i
- The integrated relational database of the IBM i platform.
- Logical File
- A classic IBM i object used for access paths and views over physical files. Useful, but not a substitute for declarative SQL constraints.
- Identity Column
- A column whose values are generated automatically by Db2, for example
BIGINT GENERATED ALWAYS AS IDENTITY. - UUID
- A 128-bit identifier. Db2 for i can generate UUID v4 values with
GENERATE_UUID(); UUID v7 is time-ordered and is often generated in the application. - ROW CHANGE TIMESTAMP
- A Db2 column that is automatically updated on inserts and updates. Useful for delta detection, but not the same as full business-level auditing.
- Normalization
- A set of rules for reducing redundancy and avoiding insert, update, and delete anomalies.
- EAV
- Entity-Attribute-Value: a flexible but often problematic pattern where attributes are stored as key-value rows.
- Soft Delete
- Logical deletion using a flag or timestamp instead of a physical DELETE. It needs an archive and purge strategy.
- CDC
- Change Data Capture: a method for detecting data changes and passing them to other systems.
- ETL / ELT
- Data integration into a data warehouse. Both approaches often bypass application code and therefore benefit from robust database rules.
- Denormalization
- Deliberate redundancy for performance, reporting, or simplification. It should be documented and technically controlled.
- Temporal Tables / SYSTEM VERSIONING
- A Db2 for i feature for automatic, transparent history tracking of table changes by the database.
Sources & further reading
- IBM Docs: Db2 for i SQL Reference 7.6
- IBM Docs: CREATE TABLE – Db2 for i
- IBM Docs: Referential constraints – Db2 for i
- IBM Docs: GENERATE_UUID – Db2 for i
- RFC 9562: Universally Unique IDentifiers (UUIDs)
- IBM Docs: CREATE VIEW – Db2 for i
- Martin Fowler: Patterns of Enterprise Application Architecture – Catalog
- E. F. Codd: A Relational Model of Data for Large Shared Data Banks
Transparency Notice
The content on tiny-tool.de is carefully researched, editorially reviewed, and updated on a regular basis. Sources and quotations are provided in a transparent and traceable manner. However, we do not guarantee the accuracy, completeness, or timeliness of the information provided. Errors cannot be ruled out.
Authorship & Editorial Support: Texts published on tiny-tool.de are original editorial works created by our team (final editorial responsibility: Guido Zeuner). Digital tools — including AI-based assistance systems — may be used solely as support tools for research, structuring, or language refinement. The selection of content, structure, reasoning, and final wording are created and approved by us as natural persons; AI systems are not authors.
Audience Measurement (VG Wort / METIS): To measure text reach, tracking pixels provided by VG Wort are used. For technical reasons, these tracking marks are loaded when the page is accessed and currently cannot be blocked through the cookie banner, as no cookies are set. The measurement is used exclusively for reach statistics; no personal profiles are created. More information is available in our Privacy Notice.
Please note: The content provided on this website is for general informational purposes only and does not constitute professional advice of any kind, including legal, tax, financial, or other specialist advice. Use of the content is at your own risk. Liability for material or immaterial damages is excluded, except in cases of intentional misconduct or gross negligence.
Advertising & Affiliate Links: Some posts may contain promotional references or so-called affiliate links. These are identified accordingly. Clicking such links does not result in any additional cost to you; we may receive a small commission.
Trademark Notice: All brand names, logos, and product names are the property of their respective rights holders and are used solely for identification and descriptive purposes. No affiliation with the companies mentioned is implied.
External Links: This website contains links to external third-party websites. Although we review links carefully, we assume no responsibility for the content of external websites. If we become aware of unlawful content, we will remove the respective links without undue delay.



tiny-tool.de