Getting Database Modeling Right

Futuristic database modeling banner with glowing database icons, ER diagrams, IBM i and Db2 for i best-practice theme, and tiny-tool.de branding.
Good data models are not a side issue: they determine how maintainable, integration-ready, and future-proof applications remain.

Primary keys, referential integrity, and clean relationships — with a practical focus on IBM i, Db2 for i, and modern backend architectures.

Data outlives technology. Frameworks come and go, front ends get rebuilt, interfaces change — but the data model often stays in the engine room for decades. When you design tables today, you are not just deciding on a few column names. You are shaping integration, maintainability, reporting, migration, and modernization for years to come. This article brings together the most important principles of solid relational database modeling — practical, IBM i-aware, and without turning modeling into a religion.

1. Database modeling is architecture

In many IBM i landscapes, databases have grown over time: DDS-described physical files, logical files, RPG programs with decades of validation logic, very few explicit foreign keys, and data integrity that was “always handled by the application.” As long as a system runs in isolation, that can work surprisingly well. The invoice usually arrives later — when integration, reporting, or modernization becomes real.

In short: The database is not a passive storage container. It is the longest-living structural model of the business. A good model makes future modernization possible. A poor model makes every modernization effort expensive.

Typical situations where modeling weaknesses become visible:

  • A new Java, .NET, or web front end accesses Db2 for i directly through SQL.
  • A data warehouse extracts ERP data regularly through ETL or ELT pipelines.
  • An older 5250 application is gradually extended with APIs, microservices, or integration hubs.
  • Two companies with their own IBM i systems merge.
  • Multi-tenancy or group-wide consolidation is introduced after the fact.
  • External systems replicate data through CDC or synchronization processes.

At that point, “the RPG program takes care of it” is no longer enough. Direct SQL access, new services, batch jobs, and reporting tools often bypass that logic. What used to work inside a monolith suddenly becomes visible in an integrated architecture — including duplicates, orphaned child rows, unclear keys, and inconsistent status values.

2. Cutting entities the right way

An entity is not a table name. An entity is a business object with stable identity, clear responsibility, and a traceable life cycle. Tables are simply the technical implementation of that business model.

The most important test: Does this object have its own life cycle? Is it created, changed, canceled, archived, or referenced by other processes independently? If yes, a dedicated entity is often cleaner than just adding a few more columns.

Example: sales order.

  • An order is created, changed, confirmed, shipped, invoiced, canceled, and archived.
  • An order line initially exists only within the context of that order.
  • If the line is later referenced independently — for example by a return, warehouse document, quality process, or legal retention rule — it needs a clear identity of its own.
SQL · Aggregate with root and child
CREATE TABLE orders (
    id BIGINT GENERATED ALWAYS AS IDENTITY,
    order_no VARCHAR(30) NOT NULL,
    customer_id BIGINT NOT NULL,
    order_status CHAR(1) NOT NULL DEFAULT 'O',
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id),
    CONSTRAINT uq_orders_order_no UNIQUE (order_no),
    CONSTRAINT chk_orders_status CHECK (order_status IN ('O','C','I','S'))
);

CREATE TABLE order_positions (
    id BIGINT GENERATED ALWAYS AS IDENTITY,
    order_id BIGINT NOT NULL,
    position_no INTEGER NOT NULL,
    item_id BIGINT NOT NULL,
    quantity DECIMAL(11,3) NOT NULL,
    net_price DECIMAL(15,4) NOT NULL,
    PRIMARY KEY (id),
    CONSTRAINT fk_pos_order
      FOREIGN KEY (order_id) REFERENCES orders(id)
      ON DELETE CASCADE,
    CONSTRAINT uq_pos_no UNIQUE (order_id, position_no),
    CONSTRAINT chk_pos_quantity CHECK (quantity > 0)
);

The business model comes first. If you design tables without a clear business boundary, you create ambiguous ownership: Which table is authoritative? Which column is the truth? Which process is allowed to delete? Which data must remain audit-proof?

3. Primary key: identity is not meaning

A primary key has one job: to identify a row permanently, uniquely, and stably. It is not a display field, not a business rule, and not a human-facing numbering scheme.

Rule of thumb: Business keys may be visible and important. But they should not automatically be the primary key. Business rules change. Identity must not.

Why meaningful keys become expensive over time

  • Mergers: Two systems may both have customer number 4711, but for different customers.
  • Multi-tenancy: A simple customer_no suddenly becomes tenant + customer_no.
  • Rule changes: Number formats change, prefixes are added, and old numbers still remain historically relevant.
  • Replication: Decentralized systems assign numbers independently.
  • Interfaces: External systems store the key and expect it to remain stable.

Recommended key strategy

Strategy Good fit Note
BIGINT GENERATED ALWAYS AS IDENTITY Core entities in a central system Compact, index-friendly, and a good fit for many classic ERP tables.
UUID Distributed generation, offline capture, integration, external references On Db2 for i, GENERATE_UUID() creates UUID v4 values. Time-ordered UUID v7 values are typically generated in the application.
Business key as UNIQUE Customer numbers, item numbers, contract numbers, external IDs Important for the business, but not necessarily the technical primary key.
Composite primary key Pure relationship tables or deliberately small lookup/association tables Do not use this as the default for core entities; foreign keys quickly become cumbersome.

Storing UUIDs sensibly on Db2 for i

On Db2 for i, GENERATE_UUID() returns a formatted CHAR(36) value. For smaller indexes and more compact storage, storing the value as BINARY(16) or VARBINARY(16) can make sense. Conversion can be done with VARBINARY_FORMAT on supported releases.

SQL · Store UUIDs compactly as binary (recommended on current TRs)
CREATE TABLE customers (
    id_bin VARBINARY(16) NOT NULL
      DEFAULT VARBINARY_FORMAT(GENERATE_UUID(),
        'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX'),
    tenant_id SMALLINT NOT NULL,
    customer_no VARCHAR(30) NOT NULL,
    name1 VARCHAR(80) NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    rowchg_ts TIMESTAMP NOT NULL
      GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP,
    PRIMARY KEY (id_bin),
    CONSTRAINT uq_customer_tenant_no UNIQUE (tenant_id, customer_no),
    CONSTRAINT chk_customer_no_len CHECK (LENGTH(customer_no) BETWEEN 3 AND 30)
);

-- Display the UUID in readable form
SELECT VARCHAR_FORMAT_BINARY(id_bin,
       'XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX') AS customer_uuid,
       tenant_id,
       customer_no,
       name1
FROM customers;
Practical tip: The VARBINARY_FORMAT approach is compact and index-friendly, but it requires a current enough release level. Alternatively, you can store UUIDs as CHAR(36), which is simpler and broadly available. UUID v4 is robust for distributed generation; UUID v7 is time-ordered and usually friendlier to indexes, but it is typically generated in the application.

4. Integration, multi-tenancy, and mergers

Integration is the stress test for every data model. As soon as data is moved, mirrored, consolidated, or synchronized in real time across systems, you find out whether identity and relationships are truly stable.

Common integration traps

  • Overlapping number ranges: Customer 4711 exists in system A and system B — but they are different customers.
  • Multi-tenancy added later: Every uniqueness rule and almost every relationship must be re-evaluated.
  • Business keys used as technical coupling: Changes in the business trigger technical migrations.
  • No declarative constraints: ETL or reporting processes discover inconsistencies that the application supposedly would never have allowed.
  • No change strategy: Without reliable change timestamps or journal-based change capture, delta extraction and CDC become unnecessarily difficult.
Good integration models separate concerns: technical identity, business numbers, external references, tenant, validity period, and change information. Cramming all of that into one meaningful ID feels convenient at first — and almost always hurts later.

5. Modernizing legacy tables on IBM i

Many classic IBM i applications are based on DDS-described files and logical files. That is not automatically bad. The problem starts when implicit assumptions are not visible as SQL constraints. Modern tools then see tables, but not the rules that keep the data clean.

A pragmatic modernization path

  1. Profile existing data: duplicates, NULL values, invalid status codes, orphaned child records.
  2. Add a technical ID: BIGINT IDENTITY or UUID, depending on integration requirements.
  3. Populate existing rows cleanly.
  4. Protect business keys with UNIQUE constraints.
  5. Gradually move references to technical IDs — starting with new tables and services.
  6. Enable foreign keys, CHECK constraints, and NOT NULL rules only after data cleanup.
  7. Provide compatibility views so old programs do not have to be rewritten immediately.
SQL · Add a technical ID to a legacy table
-- Example: add a technical UUID column
ALTER TABLE customer_legacy
  ADD COLUMN id CHAR(36) CCSID 1208;

-- Populate existing rows
UPDATE customer_legacy
   SET id = GENERATE_UUID()
 WHERE id IS NULL;

-- Check data quality before adding constraints:
-- 1) Are there any NULL values?
-- 2) Are there duplicates in the business key?
-- 3) Are there dependent rows without a matching parent?

ALTER TABLE customer_legacy
  ALTER COLUMN id SET NOT NULL;

ALTER TABLE customer_legacy
  ADD CONSTRAINT pk_customer_legacy PRIMARY KEY (id);

ALTER TABLE customer_legacy
  ADD CONSTRAINT uq_customer_legacy_tenant_no
  UNIQUE (tenant_id, customer_no);
Do not fly blind: Constraints are not cosmetic. They make existing data defects visible. That is why data profiling, cleanup, test runs, and a rollback plan belong before production activation.
Important note on ROW CHANGE TIMESTAMP: A ROW CHANGE TIMESTAMP column is automatically set by Db2 on inserts and updates. That is useful for simple delta detection. For robust auditing, restart/recovery logic, delete history, or mixed modernization scenarios, journaling, CDC, or dedicated audit tables are often the stronger solution.

6. Referential integrity belongs in the database

Foreign-key constraints are not duplicated business logic. They are structural invariants: a child row must not point to a parent row that does not exist. That rule belongs where all access paths meet — in the database.

IBM describes referential constraints in exactly that sense: they ensure that values in a dependent table are valid only when they refer to matching values in a parent table. Once defined, they are enforced system-wide when data is changed through SQL and other interfaces.

SQL · Define a foreign key deliberately
ALTER TABLE order_line
  ADD CONSTRAINT fk_order_line_order
  FOREIGN KEY (order_id)
  REFERENCES order_header (id)
  ON DELETE CASCADE
  ON UPDATE RESTRICT;
Scenario Typical rule Reason
Child is fully dependent ON DELETE CASCADE An order line has no business existence without its order.
Child has its own meaning RESTRICT or NO ACTION Deleting the parent should be prevented while dependent data still exists.
Audit, legal, historical data Restrict + soft delete Data is deactivated logically, not removed uncontrollably.
Optional relationship SET NULL only deliberately Use only if the child can legitimately continue without a parent.
Myth: “Foreign keys only slow things down.”
No. Constraints do add checks, but they protect data quality across the system. With proper indexing and deliberate design, they are usually cheaper in business applications than years of cleanup, special-case logic, and troubleshooting.

7. Modeling relationships correctly

One-to-many — the classic case

A customer has many orders. The order references the customer. Not the other way around.

Schema · One-to-many relationship
customers (id PK)
orders    (id PK, customer_id FK -> customers.id)

Many-to-many — almost never just a dumb cross-reference table

In real systems, many-to-many relationships often gain attributes of their own: quantity, price at the time, discount, status, sequence, source, validity. At that point, the relationship table is an entity in its own right.

SQL · Resolve many-to-many through a line entity
CREATE TABLE order_item (
    id BIGINT GENERATED ALWAYS AS IDENTITY,
    order_id BIGINT NOT NULL,
    item_id BIGINT NOT NULL,
    position_no INTEGER NOT NULL,
    quantity DECIMAL(11,3) NOT NULL,
    price_at_order DECIMAL(15,4) NOT NULL,
    discount_percent DECIMAL(5,2) NOT NULL DEFAULT 0,
    PRIMARY KEY (id),
    CONSTRAINT fk_oi_order FOREIGN KEY (order_id)
      REFERENCES orders(id) ON DELETE CASCADE,
    CONSTRAINT fk_oi_item FOREIGN KEY (item_id)
      REFERENCES items(id) ON DELETE RESTRICT,
    CONSTRAINT uq_oi_position UNIQUE (order_id, position_no),
    CONSTRAINT chk_oi_quantity CHECK (quantity > 0)
);
Please do not: Store multiple values in one column, such as '1,7,42', or use a JSON array as a substitute for a core relationship. That makes constraints, joins, indexes, reporting, and data quality much harder.

8. New table or new column?

One of the most common modeling questions is: Is a new column enough — or do we need a new table?

Question If yes If no
Does the object have its own life cycle? New table Consider a column
Can it occur multiple times? New one-to-many table A column is often enough
Does it need history? History table with valid-from/valid-to or versioning / temporal tables Column or timestamp
Will it be referenced by other tables? Separate entity Attribute of the existing entity
Is it rarely populated and truly optional? Separate detail table or controlled extension structure Nullable column, if the semantics are clear
Is it only a display or cache value? Denormalize only deliberately Calculate it or use a view
Rule of thumb: As soon as repetition, history, external references, or an independent life cycle enter the picture, a new table is usually cleaner than one more column.
Modern history tracking with temporal tables: Db2 for i supports SYSTEM VERSIONING (temporal tables). That lets the database automatically maintain history for table changes. It is often cleaner than hand-built history tables and can work transparently for existing applications.

9. EAV, JSON, and custom fields

Entity-Attribute-Value, or EAV, stores attributes as rows instead of columns. That looks flexible, but it moves structure out of the model and into the data. Typing, constraints, indexes, and reporting become much more difficult.

SQL · Minimal classic EAV model
CREATE TABLE item_eav (
    item_id BIGINT NOT NULL,
    attribute_name VARCHAR(80) NOT NULL,
    attribute_value VARCHAR(4000),
    PRIMARY KEY (item_id, attribute_name)
);

Why EAV hurts for core data

  • Loss of type information: Numbers, dates, and flags end up as text.
  • Weak validation: NOT NULL, CHECK, and FOREIGN KEY rules apply only indirectly.
  • Complex queries: Simple columns become self-joins, pivot logic, and MAX(CASE ...) expressions.
  • Difficult reporting: BI tools expect stable columns, not dynamic lists of attributes.
  • Index problems: An index on attribute_name does not automatically help value comparisons that require type conversion.

When EAV or JSON can be acceptable

  • There are many potential additional attributes, most of which are empty.
  • Attributes are customer-specific or tenant-specific.
  • The data is not part of central integrity rules.
  • Reporting is handled through controlled views, materialized structures, or defined export models.
Pragmatic approach: The most important 20 to 40 core properties belong in the relational model. Rare optional fields may live in a controlled custom-field structure or JSON column — but not as a replacement for core relationships, status values, or master data quality.
SQL · Controlled EAV with typed values
CREATE TABLE attribute_def (
    attribute_id BIGINT GENERATED ALWAYS AS IDENTITY,
    attribute_key VARCHAR(80) NOT NULL,
    data_type CHAR(1) NOT NULL,
    required SMALLINT NOT NULL DEFAULT 0,
    PRIMARY KEY (attribute_id),
    CONSTRAINT uq_attribute_key UNIQUE (attribute_key),
    CONSTRAINT chk_attribute_type CHECK (data_type IN ('S','N','T','B'))
);

CREATE TABLE item_attribute (
    item_id BIGINT NOT NULL,
    attribute_id BIGINT NOT NULL,
    val_string VARCHAR(4000),
    val_number DECIMAL(31,10),
    val_timestamp TIMESTAMP,
    val_boolean SMALLINT,
    PRIMARY KEY (item_id, attribute_id),
    CONSTRAINT fk_item_attr_def
      FOREIGN KEY (attribute_id) REFERENCES attribute_def(attribute_id),
    CONSTRAINT chk_exactly_one_value CHECK (
      (CASE WHEN val_string IS NULL THEN 0 ELSE 1 END) +
      (CASE WHEN val_number IS NULL THEN 0 ELSE 1 END) +
      (CASE WHEN val_timestamp IS NULL THEN 0 ELSE 1 END) +
      (CASE WHEN val_boolean IS NULL THEN 0 ELSE 1 END) = 1
    )
);

10. Normalization, explained pragmatically

Normalization reduces redundancy and prevents common data anomalies. In practice, the goal is not to push every model academically into fifth normal form. The goal is to keep master data clean and use redundancy only deliberately.

Example · denormalized order data
order_header (
    order_no,
    customer_no,
    customer_name,
    customer_street,
    customer_zip,
    customer_city,
    ...
)

If the customer address changes, it has to be updated in every order. If only some rows are updated, the database now contains contradictory data. That is the classic update anomaly.

Normal form Practical meaning
1NF Atomic values, no comma-separated lists, no repeating column groups.
2NF Non-key attributes depend on the entire key, not just part of it.
3NF / BCNF No hidden dependencies between non-key attributes.
4NF / 5NF Relevant for complex multi-valued dependencies and special cases.
Realistic goal: Aim for 3NF for master data and core business objects. Denormalization for transaction data, reporting, or performance is fine — but only deliberately, documented, and with a clear update strategy.

11. Anti-patterns at a glance

  • Meaningful primary keys as the default.
  • VARCHAR(5000), CLOB, or JSON as a universal type for structured core data.
  • Comma-separated lists in a single column.
  • Status values as CHAR(1) without a CHECK constraint or lookup table.
  • NULL without clear semantics: unknown, not applicable, empty, not yet calculated?
  • Columns like phone1, phone2, phone3 instead of a proper one-to-many structure.
  • History tracking without valid_from, valid_to, version, or audit concept.
  • Soft delete without an archive and purge strategy.
  • Data integrity only in RPG, Java, or .NET.
  • EAV for core data instead of true customization scenarios.
SQL · Protect status values properly
ALTER TABLE order_header
  ADD CONSTRAINT chk_order_header_status
  CHECK (status IN ('O','C','I','X'));

-- O = open, C = confirmed, I = invoiced, X = canceled

12. Practical checklist

  • Every core entity has a stable technical primary key.
  • Business keys are protected with UNIQUE constraints.
  • Multi-tenancy is modeled explicitly and not hidden inside number ranges.
  • Important relationships have declarative foreign keys.
  • ON DELETE and ON UPDATE behavior is decided and documented from a business perspective.
  • Status values, value ranges, and mandatory fields are protected with CHECK, NOT NULL, or lookup tables.
  • History, auditing, and deletion strategy are clarified before the first production run.
  • No comma-separated lists, no repeating column groups, no JSON as a replacement for core relationships.
  • EAV or custom fields are used only in controlled extension scenarios, not for central business objects.
  • Views encapsulate compatibility and soft-delete filters instead of rebuilding rules in every tool.
  • Migrations are prepared with data profiling, test data, a rollback plan, and measurement.

13. Conclusion

Good database modeling is strategic architecture work. Especially on IBM i, where data models often survive 20 to 30 years and multiple technology generations, clean design pays off twice. Separate technical identity from business meaning. Protect business keys with UNIQUE, not necessarily as the primary key. Put referential integrity into the database. Model relationships explicitly. Use EAV, JSON, and denormalization only deliberately. And above all: build models that work not just for today’s RPG program, but also for reporting, APIs, migration, mergers — and the developer who will be staring at your schema ten years from now, muttering things you probably do not want quoted.

Glossary

Key relational modeling terms — explained briefly.

Entity
A business object with stable identity and its own life cycle, such as customer, order, item, or invoice.
Primary Key
A unique, non-null key for a table. It should be stable, minimal, and permanently immutable.
Surrogate Key
An artificial technical key with no business meaning, such as an identity value or UUID.
Business Key / Natural Key
A business-facing key such as customer number, item number, or contract number. Important for users and interfaces, but not automatically ideal as the primary key.
Foreign Key
A declarative database rule that ensures a value in a child table points to an existing parent row.
Referential Integrity
The property of a data model where relationships between tables remain consistent and orphaned child rows are prevented.
DDS
Data Description Specifications: the classic IBM i technology for defining physical and logical files.
Db2 for i
The integrated relational database of the IBM i platform.
Logical File
A classic IBM i object used for access paths and views over physical files. Useful, but not a substitute for declarative SQL constraints.
Identity Column
A column whose values are generated automatically by Db2, for example BIGINT GENERATED ALWAYS AS IDENTITY.
UUID
A 128-bit identifier. Db2 for i can generate UUID v4 values with GENERATE_UUID(); UUID v7 is time-ordered and is often generated in the application.
ROW CHANGE TIMESTAMP
A Db2 column that is automatically updated on inserts and updates. Useful for delta detection, but not the same as full business-level auditing.
Normalization
A set of rules for reducing redundancy and avoiding insert, update, and delete anomalies.
EAV
Entity-Attribute-Value: a flexible but often problematic pattern where attributes are stored as key-value rows.
Soft Delete
Logical deletion using a flag or timestamp instead of a physical DELETE. It needs an archive and purge strategy.
CDC
Change Data Capture: a method for detecting data changes and passing them to other systems.
ETL / ELT
Data integration into a data warehouse. Both approaches often bypass application code and therefore benefit from robust database rules.
Denormalization
Deliberate redundancy for performance, reporting, or simplification. It should be documented and technically controlled.
Temporal Tables / SYSTEM VERSIONING
A Db2 for i feature for automatic, transparent history tracking of table changes by the database.

Sources & further reading

Transparency Notice


🔎
Transparency Notice:
The content on tiny-tool.de is carefully researched, editorially reviewed, and updated on a regular basis. Sources and quotations are provided in a transparent and traceable manner. However, we do not guarantee the accuracy, completeness, or timeliness of the information provided. Errors cannot be ruled out.

Authorship & Editorial Support: Texts published on tiny-tool.de are original editorial works created by our team (final editorial responsibility: Guido Zeuner). Digital tools — including AI-based assistance systems — may be used solely as support tools for research, structuring, or language refinement. The selection of content, structure, reasoning, and final wording are created and approved by us as natural persons; AI systems are not authors.

Audience Measurement (VG Wort / METIS): To measure text reach, tracking pixels provided by VG Wort are used. For technical reasons, these tracking marks are loaded when the page is accessed and currently cannot be blocked through the cookie banner, as no cookies are set. The measurement is used exclusively for reach statistics; no personal profiles are created. More information is available in our Privacy Notice.

Please note: The content provided on this website is for general informational purposes only and does not constitute professional advice of any kind, including legal, tax, financial, or other specialist advice. Use of the content is at your own risk. Liability for material or immaterial damages is excluded, except in cases of intentional misconduct or gross negligence.

Advertising & Affiliate Links: Some posts may contain promotional references or so-called affiliate links. These are identified accordingly. Clicking such links does not result in any additional cost to you; we may receive a small commission.

Trademark Notice: All brand names, logos, and product names are the property of their respective rights holders and are used solely for identification and descriptive purposes. No affiliation with the companies mentioned is implied.

External Links: This website contains links to external third-party websites. Although we review links carefully, we assume no responsibility for the content of external websites. If we become aware of unlawful content, we will remove the respective links without undue delay.