Skip to main content

Comparing tables in the same schema

R

Recently I have encountered a strange data model. Imagine you have an app with many tables for a specific group of users. Then you are asked to add a second group of users to the app. These groups have same tables but must not see records from the other group. Author of this solution decided to copy all objects (not just tables) and use a different prefix for them. Then he made some small adjustments here and there. For tables, views, mviews, triggers and packages. He skipped sequences for some reason.

If you wonder why is this strange, then think about how would you proceed if you have been asked to add a third, fourth or fifth group of users...

The issue I have was how can I quickly compare these tables to analyze how much they overlap? They are in the same scheme and they have different prefixes. I don't want to compare them one by one manually.


Here is the script I wrote for this, just change the prefixes:

WITH x AS (
    SELECT /*+ MATERIALIZE */
        'P1_' AS a,		-- first prefix
        'P2_' AS b		-- second prefix
    FROM DUAL
),
s AS (
    SELECT /*+ MATERIALIZE */
        x.a,
        x.b,
        CASE
            WHEN t.table_name LIKE REPLACE(x.a, '_', '\_') || '%' ESCAPE '\'
                THEN 'A'
            WHEN t.table_name LIKE REPLACE(x.b, '_', '\_') || '%' ESCAPE '\'
                THEN 'B'
            END AS source_group,
        CASE
            WHEN t.table_name LIKE REPLACE(x.a, '_', '\_') || '%' ESCAPE '\'
                THEN REPLACE(t.table_name, x.a, '')
            WHEN t.table_name LIKE REPLACE(x.b, '_', '\_') || '%' ESCAPE '\'
                THEN REPLACE(t.table_name, x.b, '')
            END AS table_name,
        t.table_name AS full_name,
        a.column_name
        --
        -- you could append data_type to column name to make the comparision type dependent
        --
    FROM user_tab_cols a
    JOIN user_tables t
        ON t.table_name     = a.table_name
    LEFT JOIN user_mviews m
        ON m.mview_name     = t.table_name
    CROSS JOIN x
    WHERE (
            t.table_name    LIKE REPLACE(x.a, '_', '\_') || '%' ESCAPE '\'
            OR t.table_name LIKE REPLACE(x.b, '_', '\_') || '%' ESCAPE '\'
        )
        AND a.column_name   NOT LIKE 'SYS\_%' ESCAPE '\'   -- ignore some columns
        AND m.mview_name    IS NULL
),
a AS (
    SELECT /*+ MATERIALIZE */
        a.table_name,
        a.full_name,
        a.all_cols,
        ad.diff_cols
    FROM (
        SELECT
            a.table_name,
            a.full_name,
            LISTAGG(a.column_name, ', ') WITHIN GROUP (ORDER BY a.column_name) AS all_cols
        FROM s a
        WHERE a.source_group = 'A'
        GROUP BY a.table_name, a.full_name
    ) a
    LEFT JOIN (
        SELECT
            a.table_name,
            LISTAGG(a.column_name, ', ') WITHIN GROUP (ORDER BY a.column_name) AS diff_cols
        FROM s a
        LEFT JOIN s b
            ON b.table_name     = a.table_name
            AND b.column_name   = a.column_name
            AND b.source_group  = 'B'
        WHERE a.source_group    = 'A'
            AND b.table_name    IS NULL
        GROUP BY a.table_name
    ) ad
        ON ad.table_name = a.table_name
),
b AS (
    SELECT /*+ MATERIALIZE */
        b.table_name,
        b.full_name,
        b.all_cols,
        bd.diff_cols
    FROM (
        SELECT
            b.table_name,
            b.full_name,
            LISTAGG(b.column_name, ', ') WITHIN GROUP (ORDER BY b.column_name) AS all_cols
        FROM s b
        WHERE b.source_group = 'B'
        GROUP BY b.table_name, b.full_name
    ) b
    LEFT JOIN (
        SELECT
            a.table_name,
            LISTAGG(a.column_name, ', ') WITHIN GROUP (ORDER BY a.column_name) AS diff_cols
        FROM s a
        LEFT JOIN s b
            ON b.table_name     = a.table_name
            AND b.column_name   = a.column_name
            AND b.source_group  = 'A'
        WHERE a.source_group    = 'B'
            AND b.table_name    IS NULL
        GROUP BY a.table_name
    ) bd
        ON bd.table_name = b.table_name
),
r AS (
    SELECT
        NVL(a.table_name, b.table_name) AS table_name,
        a.full_name     AS a_full_name,
        a.all_cols      AS a_all_cols,
        a.diff_cols     AS a_diff_cols,
        b.full_name     AS b_full_name,
        b.all_cols      AS b_all_cols,
        b.diff_cols     AS b_diff_cols
    FROM a
    FULL JOIN b
        ON b.table_name = a.table_name
)
SELECT *
FROM (
    SELECT
        CASE
            WHEN r.b_full_name IS NULL          THEN 'MISSING TABLE'
            WHEN r.b_all_cols = r.a_all_cols    THEN 'NO DIFFERENCE'
            WHEN r.a_diff_cols IS NOT NULL      THEN 'MISSING COLUMNS'
            END AS status,
        r.table_name,
        r.a_full_name       AS full_name,
        CASE
            WHEN r.b_full_name IS NOT NULL
                THEN r.a_diff_cols
            END AS columns
    FROM r
    UNION ALL
    SELECT
        CASE
            WHEN r.a_full_name IS NULL          THEN 'EXTRA TABLE'
            WHEN r.b_diff_cols IS NOT NULL      THEN 'EXTRA COLUMNS'
            END AS status,
        r.table_name,
        r.b_full_name       AS full_name,
        CASE
            WHEN r.a_full_name IS NOT NULL
                THEN r.b_diff_cols
            END AS columns
    FROM r
)
WHERE status IS NOT NULL
ORDER BY table_name, full_name;


Here are some test tables:

CREATE TABLE p1_one (
    id              NUMBER,
    name            VARCHAR2(30),
    missing_col1    NUMBER,
    missing_col2    NUMBER
);
--
CREATE TABLE p2_one (
    id              NUMBER,
    name            VARCHAR2(30),
    extra_col1      NUMBER,
    extra_col2      NUMBER
);
--
CREATE TABLE p1_two (
    id          NUMBER,
    name        VARCHAR2(30)
);
--
CREATE TABLE p2_three (
    id          NUMBER,
    name        VARCHAR2(30)
);
--
CREATE TABLE p1_four (
    id          NUMBER,
    name        VARCHAR2(30)
);
--
CREATE TABLE p2_four (
    id          NUMBER,
    name        VARCHAR2(30)
);

And a cleanup code:

DROP TABLE p1_one;
DROP TABLE p1_two;
DROP TABLE p1_four;
DROP TABLE p2_one;
DROP TABLE p2_three;
DROP TABLE p2_four;
--
PURGE RECYCLEBIN;


Comments