Bookmark and Share

Thursday, August 27, 2009

Data Comparison with DBMS_COMPARISON

The new 11g DBMS_COMPARISON package helps to detect how data of 2 tables differ. The interface is easy to use and quite powerful. Not only, it allows to compare and converge the whole content of 2 tables but you can also work with data samples and with views. This post shows how to use this feature and include flashback query. This can be very useful to use in combination with always changing data, you can only block for a few seconds, of a production system.

Limits and Requirements of DBMS_COMPARISON

Before we start, refer to the limits of DBMS_COMPARISON:

DBMS_COMPARISON and FLASHBACK Query

We'll compare the content of a table with a view that contains a flashback clause. In that case, the view is on the table itself; here is the script to create our example:
create user demo
identified by demo
default tablespace users
temporary tablespace temp;

grant connect, resource to demo;

create table DEMO.T1 (
id number,
field1 varchar2(1000),
field2 varchar2(1000),
constraint T1_PK primary key(id));

insert into demo.T1
(select rownum, to_char(rownum),to_char(rownum)
from dual
connect by level <=10000);

commit;

col scn new_value scn

select dbms_flashback.get_system_change_number scn
from dual;

insert into demo.T1
(select 10000+rownum, to_char(rownum),to_char(rownum)
from dual connect by level <=10000);

commit;

create view demo.t1v
as select * from t1 as of scn &&scn;
Once the schema created, we can use create_comparison and compare to find the differences between the table and the view:
begin
dbms_comparison.create_comparison(
COMPARISON_NAME => 'mycomparison',
SCHEMA_NAME => 'demo',
OBJECT_NAME => 't1v',
DBLINK_NAME => null,
REMOTE_SCHEMA_NAME => 'demo',
REMOTE_OBJECT_NAME => 'T1',
COMPARISON_MODE =>
DBMS_COMPARISON.CMP_COMPARE_MODE_OBJECT,
COLUMN_LIST => '*',
SCAN_MODE =>
DBMS_COMPARISON.CMP_SCAN_MODE_FULL);
end;
/

var scanid number;

SET SERVEROUTPUT ON
DECLARE
consistent BOOLEAN;
scan_info DBMS_COMPARISON.COMPARISON_TYPE;
BEGIN
consistent := DBMS_COMPARISON.COMPARE(
comparison_name => 'mycomparison',
scan_info => scan_info,
perform_row_dif => TRUE);
:scanid:=scan_info.scan_id;
DBMS_OUTPUT.PUT_LINE('Scan ID: '||scan_info.scan_id);
IF consistent=TRUE THEN
DBMS_OUTPUT.PUT_LINE('No differences were found.');
ELSE
DBMS_OUTPUT.PUT_LINE('Differences were found.');
END IF;
END;
/

COLUMN OWNER HEADING 'Comparison Owner' FORMAT A16
COLUMN COMPARISON_NAME HEADING 'Comparison Name' FORMAT A20
COLUMN SCHEMA_NAME HEADING 'Schema Name' FORMAT A11
COLUMN OBJECT_NAME HEADING 'Object Name' FORMAT A11
COLUMN CURRENT_DIF_COUNT HEADING 'Differences' FORMAT 9999999

SELECT c.OWNER,
c.COMPARISON_NAME,
c.SCHEMA_NAME,
c.OBJECT_NAME,
s.CURRENT_DIF_COUNT
FROM DBA_COMPARISON c, DBA_COMPARISON_SCAN_SUMMARY s
WHERE c.COMPARISON_NAME = s.COMPARISON_NAME AND
c.OWNER = s.OWNER AND
s.SCAN_ID = :scanid;

Comparison Owner Comparison Name Schema Name Object Name Differences
---------------- -------------------- ----------- ----------- -----------
SYS MYCOMPARISON DEMO T1V 10000
We can even make the table converge with the view:
SET SERVEROUTPUT ON
DECLARE
scan_info DBMS_COMPARISON.COMPARISON_TYPE;
BEGIN
DBMS_COMPARISON.CONVERGE(
comparison_name => 'mycomparison',
scan_id => :scanid,
scan_info => scan_info,
converge_options => DBMS_COMPARISON.CMP_CONVERGE_LOCAL_WINS);
DBMS_OUTPUT.PUT_LINE('Local Rows Merged: '||scan_info.loc_rows_merged);
DBMS_OUTPUT.PUT_LINE('Remote Rows Merged: '||scan_info.rmt_rows_merged);
DBMS_OUTPUT.PUT_LINE('Local Rows Deleted: '||scan_info.loc_rows_deleted);
DBMS_OUTPUT.PUT_LINE('Remote Rows Deleted: '||scan_info.rmt_rows_deleted);
END;
/

Local Rows Merged: 0
Remote Rows Merged: 0
Local Rows Deleted: 0
Remote Rows Deleted: 10000

select count(*) from demo.t1;

COUNT(*)
----------
10000

select max(id) from demo.t1;

MAX(ID)
----------
10000
You'll be able to do much more from this example; to learn more, check:
You can drop the comparison and the demo schema for your next test:
exec dbms_comparison.drop_comparison('mycomparison');
drop user demo cascade;

How does DBMS_COMPARISON work?

The chapter of the Streams Replication Administrator's Guide explains how the comparison is performed and that ora_hash is used. To get more details, I've traced the compare procedure and the main query look like that:
SELECT ll.l_rowid, rr.r_rowid, NVL(ll."ID", rr."ID") idx_val
FROM
(SELECT l.rowid l_rowid, l."ID",
ora_hash(
NVL(to_char(l."ID"),'ORA$STREAMS$NV'),
4294967295,
ora_hash(
NVL((l."FIELD1"), 'ORA$STREAMS$NV'),
4294967295,
ora_hash(
NVL((l."FIELD2"), 'ORA$STREAMS$NV'),
4294967295,
0
)
)
) l_hash
FROM "DEMO"."T1" l
WHERE l."ID">=:scan_min1 AND l."ID"<=:scan_max1 ) ll
FULL OUTER JOIN
(SELECT /*+ NO_MERGE REMOTE_MAPPED */ r.rowid r_rowid, r."ID",
ora_hash(
NVL(to_char(r."ID"), 'ORA$STREAMS$NV'),
4294967295,
ora_hash(
NVL((r."FIELD1"), 'ORA$STREAMS$NV'),
4294967295,
ora_hash(
NVL((r."FIELD2"), 'ORA$STREAMS$NV'),
4294967295,
0
)
)
) r_hash
FROM "DEMO"."T2" r
WHERE r."ID">=:scan_min1
AND r."ID"<=:scan_max1 ) rr
ON ll."ID"=rr."ID"
WHERE ll.l_hash IS NULL
OR rr.r_hash IS NULL
OR ll.l_hash <> rr.r_hash;
It gives a good idea of what we can expect from the tool, no?

No comments:

Post a Comment