R
Recently I have encountered a strange data model. Imagine you have an app with many tables for a specific group of users. Then you are asked to add a second group of users to the app. These groups have same tables but must not see records from the other group. Author of this solution decided to copy all objects (not just tables) and use a different prefix for them. Then he made some small adjustments here and there. For tables, views, mviews, triggers and packages. He skipped sequences for some reason.
If you wonder why is this strange, then think about how would you proceed if you have been asked to add a third, fourth or fifth group of users...
The issue I have was how can I quickly compare these tables to analyze how much they overlap? They are in the same scheme and they have different prefixes. I don't want to compare them one by one manually.
Here is the script I wrote for this, just change the prefixes:
WITH x AS ( SELECT /*+ MATERIALIZE */ 'P1_' AS a, -- first prefix 'P2_' AS b -- second prefix FROM DUAL ), s AS ( SELECT /*+ MATERIALIZE */ x.a, x.b, CASE WHEN t.table_name LIKE REPLACE(x.a, '_', '\_') || '%' ESCAPE '\' THEN 'A' WHEN t.table_name LIKE REPLACE(x.b, '_', '\_') || '%' ESCAPE '\' THEN 'B' END AS source_group, CASE WHEN t.table_name LIKE REPLACE(x.a, '_', '\_') || '%' ESCAPE '\' THEN REPLACE(t.table_name, x.a, '') WHEN t.table_name LIKE REPLACE(x.b, '_', '\_') || '%' ESCAPE '\' THEN REPLACE(t.table_name, x.b, '') END AS table_name, t.table_name AS full_name, a.column_name -- -- you could append data_type to column name to make the comparision type dependent -- FROM user_tab_cols a JOIN user_tables t ON t.table_name = a.table_name LEFT JOIN user_mviews m ON m.mview_name = t.table_name CROSS JOIN x WHERE ( t.table_name LIKE REPLACE(x.a, '_', '\_') || '%' ESCAPE '\' OR t.table_name LIKE REPLACE(x.b, '_', '\_') || '%' ESCAPE '\' ) AND a.column_name NOT LIKE 'SYS\_%' ESCAPE '\' -- ignore some columns AND m.mview_name IS NULL ), a AS ( SELECT /*+ MATERIALIZE */ a.table_name, a.full_name, a.all_cols, ad.diff_cols FROM ( SELECT a.table_name, a.full_name, LISTAGG(a.column_name, ', ') WITHIN GROUP (ORDER BY a.column_name) AS all_cols FROM s a WHERE a.source_group = 'A' GROUP BY a.table_name, a.full_name ) a LEFT JOIN ( SELECT a.table_name, LISTAGG(a.column_name, ', ') WITHIN GROUP (ORDER BY a.column_name) AS diff_cols FROM s a LEFT JOIN s b ON b.table_name = a.table_name AND b.column_name = a.column_name AND b.source_group = 'B' WHERE a.source_group = 'A' AND b.table_name IS NULL GROUP BY a.table_name ) ad ON ad.table_name = a.table_name ), b AS ( SELECT /*+ MATERIALIZE */ b.table_name, b.full_name, b.all_cols, bd.diff_cols FROM ( SELECT b.table_name, b.full_name, LISTAGG(b.column_name, ', ') WITHIN GROUP (ORDER BY b.column_name) AS all_cols FROM s b WHERE b.source_group = 'B' GROUP BY b.table_name, b.full_name ) b LEFT JOIN ( SELECT a.table_name, LISTAGG(a.column_name, ', ') WITHIN GROUP (ORDER BY a.column_name) AS diff_cols FROM s a LEFT JOIN s b ON b.table_name = a.table_name AND b.column_name = a.column_name AND b.source_group = 'A' WHERE a.source_group = 'B' AND b.table_name IS NULL GROUP BY a.table_name ) bd ON bd.table_name = b.table_name ), r AS ( SELECT NVL(a.table_name, b.table_name) AS table_name, a.full_name AS a_full_name, a.all_cols AS a_all_cols, a.diff_cols AS a_diff_cols, b.full_name AS b_full_name, b.all_cols AS b_all_cols, b.diff_cols AS b_diff_cols FROM a FULL JOIN b ON b.table_name = a.table_name ) SELECT * FROM ( SELECT CASE WHEN r.b_full_name IS NULL THEN 'MISSING TABLE' WHEN r.b_all_cols = r.a_all_cols THEN 'NO DIFFERENCE' WHEN r.a_diff_cols IS NOT NULL THEN 'MISSING COLUMNS' END AS status, r.table_name, r.a_full_name AS full_name, CASE WHEN r.b_full_name IS NOT NULL THEN r.a_diff_cols END AS columns FROM r UNION ALL SELECT CASE WHEN r.a_full_name IS NULL THEN 'EXTRA TABLE' WHEN r.b_diff_cols IS NOT NULL THEN 'EXTRA COLUMNS' END AS status, r.table_name, r.b_full_name AS full_name, CASE WHEN r.a_full_name IS NOT NULL THEN r.b_diff_cols END AS columns FROM r ) WHERE status IS NOT NULL ORDER BY table_name, full_name;
Here are some test tables:
CREATE TABLE p1_one ( id NUMBER, name VARCHAR2(30), missing_col1 NUMBER, missing_col2 NUMBER ); -- CREATE TABLE p2_one ( id NUMBER, name VARCHAR2(30), extra_col1 NUMBER, extra_col2 NUMBER ); -- CREATE TABLE p1_two ( id NUMBER, name VARCHAR2(30) ); -- CREATE TABLE p2_three ( id NUMBER, name VARCHAR2(30) ); -- CREATE TABLE p1_four ( id NUMBER, name VARCHAR2(30) ); -- CREATE TABLE p2_four ( id NUMBER, name VARCHAR2(30) );
And a cleanup code:
DROP TABLE p1_one; DROP TABLE p1_two; DROP TABLE p1_four; DROP TABLE p2_one; DROP TABLE p2_three; DROP TABLE p2_four; -- PURGE RECYCLEBIN;
Comments
Post a Comment