ORA-DB-ONE

Hierarchical Query and With clause

Posted in SQL / PLSQL, Tuning by jackyoon on February 19, 2012

1. How to convert hierarchical query to With clause query

1.1 Hierarchy query

    SELECT LEVEL,
    LPAD(' ', 4*(LEVEL -1)) || first_name || ' ' || last_name
    FROM employees
    START WITH MANAGER_ID IS NULL
    CONNECT BY manager_id = PRIOR employee_id;

1.2 converted With clause

    WITH roots AS ( SELECT employee_id, manager_id,first_name || ' ' || last_name names,
    1 level_grd, department_id, 10 sortnum, 1001 sortnum2
    FROM employees
    WHERE manager_id IS NULL ),
    level2 AS ( SELECT a.employee_id, b.employee_id manager_id, a.first_name || ' ' || a.last_name names,
    2 level_grd, a.department_id, a.employee_id sortnum, 100 sortnum2
    FROM employees a, roots b
    WHERE a.manager_id = b.employee_id ),
    level3 AS ( SELECT a.employee_id, b.employee_id manager_id, a.first_name || ' ' || a.last_name names,
    3 level_grd, a.department_id, b.employee_id sortnum, a.employee_id sortnum2
    FROM employees a, level2 b
    WHERE a.manager_id = b.employee_id ),
    level4 AS ( SELECT a.employee_id, b.employee_id manager_id, a.first_name || ' ' || a.last_name names,
    4 level_grd, a.department_id, c.employee_id sortnum, b.employee_id sortnum2
    FROM employees a, level3 b, level2 c
    WHERE a.manager_id = b.employee_id
    AND b.manager_id = c.employee_id )
    SELECT LPAD(' ', 4 * (level_grd - 1)) || names, level_grd, employee_id, manager_id, sortnum, sortnum2
    FROM roots
    UNION ALL
    SELECT LPAD(' ', 4 * (level_grd - 1)) || names, level_grd, employee_id, manager_id, sortnum, sortnum2
    FROM level2
    UNION ALL
    SELECT LPAD(' ', 4 * (level_grd - 1)) || names, level_grd, employee_id, manager_id, sortnum, sortnum2
    FROM level3
    UNION ALL
    SELECT LPAD(' ', 4 * (level_grd - 1)) || names, level_grd, employee_id, manager_id, sortnum, sortnum2
    FROM level4
    ORDER BY 5, 6, 2;

Find out object that causes DB Wait event

Posted in Tuning by jackyoon on February 28, 2011
SELECT dba_objects.object_name,
dba_objects.object_type,
active_session_history.event,
p1,p2,p3,
round(SUM(active_session_history.wait_time + active_session_history.time_waited) / 1000000, 2) ttl_wait_time,
sql_id
FROM v$active_session_history active_session_history,
dba_objects
WHERE active_session_history.sample_time BETWEEN To_date(’2011021220′,’yyyymmddhh24′) AND To_date(’2011021221′,’yyyymmddhh24′)
AND active_session_history.current_obj# = dba_objects.object_id
–AND sql_id = ’8n2yun9a9833p’
–AND session_id = 263
AND event LIKE ‘direct %’
GROUP BY dba_objects.object_name, dba_objects.object_type, active_session_history.event,p1,p2,p3,sql_id
ORDER BY 7 DESC
/

OBJECT_NAME OBJECT_TYPE EVENT P1 P2 P3 TTL_WAIT_TIME SQL_ID
———————- ——————- ——————– ———- ———- ———- ————- ————-
ORDER_HISTORY TABLE direct path read 36 451136 16 .23 7d5b7yzg8ak19
ORDER_HISTORY TABLE direct path read 32 542848 9 .23 7d5b7yzg8ak19
ORDER_HISTORY TABLE direct path read 32 517456 16 .2 8zh2wuu00w86b
ORDER_HISTORY TABLE direct path read 32 526352 16 .2 8zh2wuu00w86b
ORDER_HISTORY TABLE direct path read 40 158208 16 .19 8zh2wuu00w86b

How to determine the busiest time period using AWR repository

Posted in Tuning by jackyoon on February 28, 2011
SQL> list
1 select begin_snap, end_snap, timestamp begin_timestamp, inst, a/1000000/60
2 DBtime from
3 (
4 select
5 e.snap_id end_snap,
6 lag(e.snap_id) over (order by e.snap_id) begin_snap,
7 lag(s.end_interval_time) over (order by e.snap_id) timestamp,
8 s.instance_number inst,
9 e.value,
10 nvl(value-lag(value) over (order by e.snap_id),0) a
11 from dba_hist_sys_time_model e, DBA_HIST_SNAPSHOT s
12 where s.snap_id = e.snap_id
13 and e.instance_number = s.instance_number
14 and to_char(e.instance_number) like
15 nvl(‘&instance_number’,to_char(e.instance_number))
16 and stat_name = ‘DB time’
17 )
18 where begin_snap between nvl(‘&begin_snap_id’,0) and
19 nvl(‘&end_snap_id’,99999999)
20 and begin_snap=end_snap-1
21* order by dbtime desc
/

Enter value for instance_number: 1
old 15: nvl(‘&instance_number’,to_char(e.instance_number))
new 15: nvl(’3′,to_char(e.instance_number))
Enter value for begin_snap_id: 29656
old 18: where begin_snap between nvl(‘&begin_snap_id’,0) and
new 18: where begin_snap between nvl(’29656′,0) and
Enter value for end_snap_id: 29846
old 19: nvl(‘&end_snap_id’,99999999)
new 19: nvl(’29846′,99999999)

Output of SQL from RAC NODE#1 is:

BEGIN_SNAP END_SNAP BEGIN_TIMESTAMP INST DBTIME
———- ———- —————————————- ———- ———–
29749 29750 12-FEB-11 09.00.13.438 PM 1 137.585001
(DB on RAC node#1 was busiest on 12th Feb between 9 and 10PM)
29748 29749 12-FEB-11 08.00.04.125 PM 1 123.863173
29729 29730 12-FEB-11 01.00.43.509 AM 1 93.7432338
29798 29799 14-FEB-11 10.00.01.110 PM 1 60.5540196
29716 29717 11-FEB-11 12.00.09.630 PM 1 51.24517
29822 29823 15-FEB-11 10.00.29.590 PM 1 46.9265383

SQL>
Output of SQL from RAC NODE#2 is :

29748 29749 12-FEB-11 08.00.04.088 PM 2 98.2957163
(DB on RAC node#2 busiest on 12th Feb between 8 and 9PM)
29749 29750 12-FEB-11 09.00.13.393 PM 2 87.884499
29729 29730 12-FEB-11 01.00.43.546 AM 2 68.2343783
29702 29703 10-FEB-11 10.00.26.056 PM 2 57.1373537

OUTLINE: Replace Bad Plan with Good one.

Posted in Tuning by jackyoon on September 14, 2009

Below is the scenario that I used when we deployed new version of the application. We had very bad performance issue because of slow query which was deployed to the production during the new software release. But because we couldn’t change the query until the next maintenance window, I had to use outline to fix the performance issue.

1) Create outline for bad sql.

create or replace outline outline1 on
select /*+ INDEX(se IDX_STP_EVENT_01) */ b.bank_id,
entity_id,se.last_updated_date_time,se.stp_event_key,se.entity_type,t.trade_id,
se.event_type,se.event_date_time,se.download_status,se.download_date_time,se.is_resend
from whitney.bank b,whitney.stp_event se,whitney.trade t
where b.bank_key = se.entity_key
and b.bank_id = :1
and se.entity_type != ‘CLIENT’
and se.event_date_time >= :2 and se.event_date_time < :3
and event_type != 'SETL'
and t.trade_key = se.trade_key
/

2) Create outline for good sql.

create or replace outline outline2 on
select b.bank_id,entity_id,se.last_updated_date_time,
se.stp_event_key,se.entity_type,t.trade_id,se.event_type,
se.event_date_time,se.download_status,se.download_date_time,se.is_resend
from whitney.bank b,whitney.stp_event se,whitney.trade t
where b.bank_key = se.entity_key
and b.bank_id = :1
and se.entity_type != ‘CLIENT’
and se.event_date_time >= :2 and se.event_date_time < :3
and event_type != 'SETL'
and t.trade_key = se.trade_key
/

3. Overwrite bad outline with the contents of good outline.

rem
rem This script switches bad sql with the good one.
rem

update outln.ol$hints
set ol_name = decode( ol_name,’OUTLINE1′,’OUTLINE2′,’OUTLINE2′,’OUTLINE1′)
WHERE OL_NAME IN (‘OUTLINE1′,’OUTLINE2′);

commit
/

4. Issue the command for the users to use stored outline.
SQL> alter system set use_stored_outlines = true;

5. Check the plan if the bad sql is now using good plan.

Review Oracle’s hidden parameters

Posted in Tuning by jackyoon on April 20, 2009

set line 180

col name format a40

col value format a20

col deflt format a20

col description format a60

col type format a15

select a.ksppinm name,b.ksppstvl value, b.ksppstdf deflt, decode (a.ksppity, 1, ‘boolean’, 2, ‘string’, 3, ‘number’, 4, ‘file’, a.ksppity) type,a.ksppdesc description

from sys.x$ksppi a, sys.x$ksppcv b

where a.indx = b.indx

and a.ksppinm like ‘_%’

order by name

/

PGA_AGGREGATE_TARGET

Posted in Tuning by jackyoon on April 18, 2009

As you may know that you can define instance level maximum PGA memory size for sort, group by, hash join and bitmap merge to reduce system overhead by setting proper optimal size for SQL working area.

PGA_AGGREGATE_TARGET 1G Instance level total PGA memory amount
_PGA_MAX_SIZE 214732800 Maximum PGA size that one process can use ( Size in bytes )
_SMM_MAX_SIZE 104850 Maximum Work Area size that one process can use ( Size in kilo bytes )
_SMM_PX_MAX_SIZE 524288 Total amount of work area size that all slave processes under a PQ coordinator Session can utilize. ( Size in kilo bytes )

According to above table, the maximum amount sort_are that can be allocated in case if we define pga_aggregate_target size of 1 giga bytes is 100m.

So, if we are runing a batch jobs that need lots of sorts then we’d better unset PGA_AGGREGATE_TARGET size and maunally set SORT_AREA_SIZE after set WORK_AREA_SIZE=manual or we need to set undocumented parameter “_SMM_MAX_SIZE” to allow oracle to allocate large work area size. If you need to set this value for utilities lik exp/imp or datapump etc, I would like to set database level trigger then you can set above parameter on a specific sessions that you want to.

You can monitor PGA memory usage for a specific process from v$process view.

  • PGA_USED_MEM – Current PGA memory size used by a process
  • PGA_ALLOC_MEM – Currently allocated PGA memory size including free memory.
  • PGA_MAX_MEM – Maximum PGA memory which has assigned to a processes.

select sid,s.username,spid,PGA_USED_MEM,PGA_ALLOC_MEM,PGA_FREEABLE_MEM,PGA_MAX_MEM
from gv$session s, gv$process p
where s.paddr = p.addr
and  p.spid in ( 26072, 15862 )
/

Enqueue Lock

Posted in Tuning by jackyoon on April 13, 2009

Lock types

Oracle 10g has a new view, V$LOCK_TYPE which provides very good explanations about Enqueue lock type.

SQL> select type,name,description from v$lock_type where type = ‘TX’;

TYPE       NAME            DESCRIPTION
———- ————— ———————————————————————-
TX         Transaction     Lock held by a transaction to allow other transactions to wait for it

Monitor Enqueue lock from GV$LOCK view.

SQL>   select inst_id,sid,type,id1,id2,lmode,request from gv$lock
2* where sid  in (4361,4376);

INST_ID        SID TY        ID1        ID2      LMODE    REQUEST
———-        ———- –    ———-   ———-      ———-     ———-
1         4376 TM     125044             0          3          0
1         4376 TX      458755      56937          6          0
2         4361 TM     125044             0          3          0
2         4361 TX      458755      56937          0          6
According to the above result, SID number 4376 on instance 1 holds TX lock on transaction 458755+56937 and SID number 4361 on instance 2 requests TX lock on the same transaction.



Manage statistics

Posted in Tuning by jackyoon on April 8, 2009

To be frank, so far I haven’t really think about the importance of the statistics though it’s one of the most important part of Cost Based Optimizer. Usually, I create one crontab job which runs every weekend to gather statistics.

These days, I have been reading some of articles about Cost Based Optimizer and how is oracle use statistics in addition to how oracle react depends on different statistics. It provides me a different side of view for oracle optimizer which I didn’t think about before and I realize how important the dbms_stats package is to manage right statistics and to optimize query too.

1. Gathering Actual System statistics

System statistics are gathered to allow the optimizer to consider a system’s I/O and CPU performance and utilization. They allow the optimizer to adjust access paths to allow for the actual performance of a particular system by recording actual system statistics.

The statistics are collected using the DBMS_STATS.GATHER_SYSTEM_STATS procedure and are stored in SYS.AUX_STATS$.

The usage of the statistics is controlled by the _OPTIMIZER_COST_MODEL (hidden) parameter which defaults to CHOOSE. This setting means that CPU costing will be used if system statistics are gathered. If system statistics are not gathered then ‘standard’ I/O costing will be used and collected performance information from the server will not be used. The other settings of _OPTIMIZER_COST_MODEL are ‘CPU’ and ‘IO’ and these force their respective models irrespective of the presence of statistics.

The procedure DBMS_STATS.GATHER_SYSTEM_STATS will generate the actual system statistics for the optimizer to choose more reasonable execution plan. You can also manually set specific statistics for certain resource if required by using DBMS_STATS.SET_SYSTEM_STATS procedure.

SQL> dbms_stats.gather_system_stats(gathering_mode => ‘START’);
SQL> dbms_stats.gather_system_stats(gathering_mode => ‘STOP’);

SQL> dbms_stats.delete_system_stats;

SQL> dbms_stats.set_system_stats( ‘cpuspeed’, 3400 );

SQL> dbms_stats.set_system_stats( ‘sreadtim’, 5 );

SQL> dbms_stats.set_system_stats( ‘mreadtim’, 100 );

SQL> dbms_stats.set_system_stats( ‘mbrc’, 4 );

2. How to detect if your objects have stable statistics.

–> I found a script from metalink and customer it. This can be used from a monitoring script or crontab table to either create monitoring alert or report.

set serverout on size 1000000declare

tab_obj dbms_stats.objecttab;

begin

dbms_stats.gather_schema_stats( ‘SCHEMA_NAME’ , OPTIONS => ‘LIST STALE’, OBJLIST => tab_obj );

if tab_obj.count = 0 then

dbms_output.put_line ( ‘No objects with stale statistics found.’);

end if;

for i in 1 .. tab_obj.count loop

dbms_output.put_line( tab_obj(i).objType||’ ‘||tab_obj(i).ownname||’.'||tab_obj(i).objName||’ has stale statistics.’);

end loop;

end;

/

3. How to restore statistics

There are two ways to backup & restore statistics both are supported from dbms_stats packages.

3-1 Manually backup & restore
Gathering statistics potentially change statistics and can lead changing execution plan, it is recommended to backup the statistics before gathering statistics.

DBMS_STATS package has export procedures for column, table, index and etc. But here I show example of using schema level export.
exec dbms_stats.create_stat_table( 'USERNAME', 'MYSTAT' ); exec dbms_stats.export_schema_stats( 'USERNAME', 'MYSTAT' );
If for some reason you need to restore statistics then you can execute import_schema_stats procedure on dbms_stats package.
exec dbms_stats.import_schema_stats ( 'USERNAME', 'MYSTAT' );
Beginning with Oracle10G when statistics are gathered for a table the old statistics are retained so that should there be any problem with performance of queries dependent on those statistics, the old ones can be restored. You can query this information from DBA_TAB_STATS_HISTORY view.
select owner,TABLE_NAME,STATS_UPDATE_TIME from dba_tab_stats_history where owner = 'WHITNEY';
OWNER TABLE_NAME STATS_UPDATE_TIME ---------- ------------------------------ ---------------------------------- WHITNEY ADMIN_GROUP 06-APR-09 10.00.08.605115 PM +00:00 WHITNEY ADMIN_SESSION 06-APR-09 10.00.08.999956 PM +00:00 If for some reason you need to restore the previous statistics then you can use below.
exec dbms_stats.restore_schema_stats( 'USERNAME', '06-APR-09 10.00.08.605115 PM +00:00' );
4. Lock statistics for volatile tables
Oracle recommends that you collect statistics on volatile tables by 
completing the following steps:

   1.Run the DBMS_STATS.GATHER_TABLE_STATS procedure manually on 
     volatile tables when these tables are at their fullest.
   2.Immediately after the statistics are collected on volatile tables, 
     run the DBMS_STATS.LOCK_TABLE_STATS procedure on these tables.

SQL Statement HINTS

Posted in Tuning by jackyoon on April 6, 2009
Hint Reference:

  - Sub-Queries/views:
	PUSH_SUBQ		Causes all subqueries in a query block to be 
				executed at the earliest possible time.  
				Normally subqueries are executed as the last 
				is applied is outerjoined or remote or joined 
				with a merge join. (>=7.2)

	NO_MERGE(v)		Use this hint in a VIEW to PREVENT it
				being merged into the parent query. (>=7.2)
				or use NO_MERGE(v) in parent query block
				to prevent view V being merged
	MERGE(v)		Do merge view V

 	PUSH_JOIN_PRED(v)       Push join predicates into view V
 	NO_PUSH_JOIN_PRED(v)    Do NOT push join predicates

  - Access:
        FULL(tab)               Use FTS on tab
	CACHE(tab)		If table within <Parameter:CACHE_SIZE_THRESHOLD>
				treat as if it had the CACHE option set. 
				See <Parameter:CACHE_SIZE_THRESHOLD>. Only 
				applies if FTS used.
	NOCACHE(tab)		Do not cache table even if it has CACHE option
				set. Only relevant for FTS.

        CLUSTER(tab)		Use cluster scan to access 'tab'
        HASH(tab)		Use hash scan to access 'tab'

        INDEX(tab [ind])        Use 'ind' scan to access 'tab' - Disables index_ffs
        NO_INDEX (tab [ind])    Do no use 'ind' to access 'tab'  
        INDEX_ASC(tab [ind])    Use 'ind' to access 'tab' for range scan.
        INDEX_DESC(tab {ind])   Use descending index range scan
				   (Join problems pre 7.3)
	INDEX_FFS(tab [ind])    Index fast full scan - rather than FTS.
	NO_INDEX_FFS(tab [ind]) Exclude the option of using Index fast full scan 
                                - rather than FTS.

        INDEX_RRS(tab [ind])    Index Rowid Range scan

        INDEX_COMBINE( tab i1.. i5 ) 
				Try to use some boolean combination of
			 	bitmap index/s i1,i2 etc 

        INDEX_SS(tab [ind])     Use 'ind' to access 'tab' with an 
                                index skip scan
        INDEX_SS_ASC(tab [ind])
        INDEX_SS_DESC(tab [ind])
        NO_INDEX_SS(tab [ind])  Exclude the option of using 'ind' to access 'tab' with an 
                                index skip scan

        USE_CONCAT		Use concatenation (Union All) for OR (or IN) 
				statements. (>=7.2). See Note 17214.1
				(7.2 requires <Event:10078>, 7.3 no hint req)

	NO_EXPAND		Do not perform OR-expansion (Ie: Do not use 
				Concatenation).

	DRIVING_SITE(table)     Forces query execution to be done at the
				site where "table" resides        

  - Joining:
        USE_NL(tab)             Use table 'tab' as the inner table in a 
				Nested Loops join. May not work unless accompanied 
                                by a hint that forces the correct join order (e.g. ORDERED 
				hint).
        NO_USE_NL(tab)          Excludes the use of table 'tab' as the inner table in a 
				Nested Loops join.

        USE_NL_WITH_INDEX(tab)  The USE_NL_WITH_INDEX hint will cause the optimizer to join
                                the specified table to another row source with a nested 
                                loops join using the specified table as the inner table 
                                but only under the following condition: If no index is 
                                specified, the optimizer must be able to use some index 
                                with at least one join predicate as the index key. 
                                If an index is specified, the optimizer must be able to 
                                use that index with at least one join predicate as the index key.

        USE_MERGE(tab..)        Join each specified table with another row source using a sort-merge join.
        NO_USE_MERGE(tab..)     Excludes the option of joining each specified table 
                                with another row source using a sort-merge join.

	USE_HASH(tab1 tab2) 	Join each specified table with another row 
				source with a hash join. 'tab1' is joined to 
				previous row source using a hash join. (>=7.3)
	USE_HASH(tab1 tab2) 	Excludes the option of joining each specified table with another row 
				source with a hash join.

 	STAR_TRANSFORMATION 	Use best plan containing a STAR transformation
				(if there is one)
 	NO_STAR_TRANSFORMATION 	Exclude the option of using the best plan containing a STAR 
                                transformation

        ORDERED                 Access tables in the order of the FROM clause

        LEADING                 This hint specifies only the driving table. From there
                                CBO is free to investigate multiple join orders

  - Parallel Query Option:
        PARALLEL ( table, <degree> [, <instances>] ) 
				Use parallel degree / instances as specified

 	PARALLEL_INDEX(table, [ index, [ degree [,instances] ] ]  )
				Parallel range scan for partitioned index

	PQ_DISTRIBUTE(tab,out,in) How to distribute rows from tab in a PQ
				(out/in may be HASH/NONE/BROADCAST/PARTITION)

        NO_PARALLEL(table)	Starting from 10g this syntax should be used
        NO_PARALLEL_INDEX(table [,index])  
                                Starting from 10g this syntax should be used

  - Miscellaneous 
 	APPEND			Only valid for INSERT .. SELECT.
				Allows INSERT to work like direct load
				or to perform parallel insert. See Note 50592.1
	NOAPPEND		Do not use INSERT APPEND functionality

	REWRITE(v1[,v2])	8.1+ With a view list use eligible materialized view
				Without view list use any eligible MV

        NO_REWRITE              Starting from 10g this syntax should be used   

        NO_UNNEST               Add to a subquery to prevent it from being unnested
        UNNEST                  Unnests specified subquery block if possible

        SWAP_JOIN_INPUTS        Allows the user to switch the inputs of a join.
                                See Note 171940.1

        CARDINALITY(t1 [,..],n) Makes the CBO to use different assumptions about
                                cardinality at the table level

  - Optimizer Mode:
        FIRST_ROWS, ALL_ROWS	Force CBO first rows or all rows.
	RULE              	Force Rule if possible

NOTE: Starting from Oracle version 10.1.0.2 the following hints have been deprecated and 
      should not be used:  AND_EQUAL, HASH_AJ, MERGE_AJ, NL_AJ, HASH_SJ, MERGE_SJ, NL_SJ, 
      ORDERED_PREDICATES, ROWID, STAR.

      Descriptions for these hints are included below for historical reference:

  - Deprecated Hints:

Oracle Database 10g Release 1 (10.1) 

	AND_EQUAL(tab i1.. i5 ) Merge scans of 2 to 5 single column indexes.

	MERGE_AJ(v)	}	Put hint in a NOT IN subquery to perform sort-merge
	HASH_AJ(v)	}	anti-join or hash anti-join or 
        NL_AJ(v)	}       nested loops antijoin (>=7.3)
				Eg: SELECT .. WHERE deptno is not null 
				    AND deptno NOT IN 
					(SELECT /*+ HASH_AJ */ deptno ...)
        HASH_SJ(v) 	}	Transform EXISTS subquery into HASH or MERGE
        MERGE_SJ(v)     }       or nested loops semi-join to access "v"
        NL_SJ(v)        }        

        ORDERED_PREDICATES	Forces optimizer to apply predicates as they appear
                                in the WHERE clause, except for predicates used
				as index keys

        ROWID(tab)              Access tab by ROWID directly
                                  SELECT /*+ ROWID( table ) */ ... 
                                    FROM tab WHERE ROWID between '&1' and '&2';

	STAR			Force a star query plan if possible. A star 
				plan has the largest table in the query last  
				in the join order and joins it with a nested 
				loops join on a  concatenated index. The STAR 
				hint applies when there are at least 3 tables 
				and the large table's concatenated index has 
				at least 3 columns and there are no conflicting
				access or join method hints.   (>=7.3)

        NOPARALLEL(table)	No parallel on "table"

 	NOPARALLEL_INDEX(table [,index])
                                Opposite to PARALLEL_INDEX 

	NOREWRITE		8.1+ Do not rewrite the query

Also, many new hints have been introduced in Oracle 10g which are not listed in this
note. Some of those will be added at a later stage.

How to deal with Skewed Data

Posted in Tuning by jackyoon on April 6, 2009

Oracle analyze can produce wrong execution when data is skewed in a table. The only way the optimizer to produce proper execution plan is to use Histogram though it cannot working 100%.

Generating test data >

drop table t1 purge;

create table t1 ( c1 varchar2(1), c2 int );

create index t1_n1 on t1(c1);

insert into t1
select ‘A’ , 1
from dual
connect by level <= 1000
union all
select ‘B’ , 2
from dual
connect by level <= 100
union all
select ‘C’, 3
from dual
connect by level <= 10;

commit;

Generate statistics without histogram >
exec dbms_stats.gather_table_stats( ‘JYOON’, ‘T1′, method_opt => ‘FOR ALL COLUMNS SIZE 1′ , no_invalidate => false );

Here is the histogram information which oracle generated after you ran the statistics gathering. As you see, the table T1 only has one bucket.

SQL> select column_name,num_buckets from user_tab_columns where table_name=’T1′;

COLUMN_NAME NUM_BUCKETS
—————————— ———–
C1 1
C2 1

Test

select /*+ gather_plan_statistics */ count(*)
from t1
where c1 = ‘A’
/

select * from table
(dbms_xplan.display_cursor(null,null,’all allstats cost last’));

Id Operation Name E-Rows A-Rows
1 SORT AGGREGATE 1 1
* 2 INDEX RANGE SCAN T1_N1 370 1000

As you see the above result, optimizer’s estimated rows are far different from the actual rows and this can be one of the reason that optimizer can produce non-efficient execution plan.

How to get Estimated Rows when there is no histogram:

Base Cardinality = Total number of records in a table = 1110

Number of distinct values in a table = 3

Selectivity = # of distinct values / Base Cardinality = 3 / 1110 = 370 ( This is the estimated rows from the above execution plan. But because of skewed data this calculation is far different from the actual rows. )

Generate statistics with histogram >

exec dbms_stats.gather_table_stats( ‘JYOON’, ‘T1′, method_opt => ‘FOR ALL COLUMNS SIZE SKEWONLY′ , no_invalidate => false );

When we provide skewonly option, oracle determines the columns to collect histograms based on the data distribution of the columns.

Here is the histogram information which oracle generated after you ran the statistics gathering. Oracle created 3 buckets for each columns to store histogram.

SQL> select column_name,num_buckets from user_tab_columns where table_name=’T1′;

COLUMN_NAME NUM_BUCKETS
—————————— ———–
C1 3
C2 3

Test >

select /*+ gather_plan_statistics */ count(*)
from t1
where c1 = ‘A’
/

select * from table
(dbms_xplan.display_cursor(null,null,’all allstats cost last’));

Id Operation Name E-Rows A-Rows
1 SORT AGGREGATE 1 1
* 2 INDEX RANGE SCAN T1_N1 1000 1000

Conclusion: Optimizer can lead proper execution plan when tables have right information for skewed data.

Follow

Get every new post delivered to your Inbox.