Oracle® Database 2 Day + Data Warehousing Guide 11g Release 2 (11.2) Part Number E10578-02 |
|
|
View PDF |
This section discusses how to produce effective business reports derived from business queries, and includes the following topics:
Oracle Database has enhanced SQL's analytical processing capabilities by introducing a family of aggregate and analytic SQL functions. These functions enable you to calculate ranking, percentiles, and moving averages, and allow you to answer queries such as the following:
What are the top 10 products sold by country?
What is the weekly moving average for products in stock?
What percentage of total sales occurs during the fourth quarter?
How much higher is the average discount in the fourth quarter than the discount for yearly average?
What would be the profitability ranking of existing oil refineries if 20 percent of the refineries in a country were closed?
Aggregate functions are a fundamental part of data warehousing because they enable you to derive different types of totals, and then use these totals for additional calculations. To improve aggregate performance in your data warehouse, Oracle Database provides several extensions to the GROUP
BY
clause. The CUBE
, ROLLUP
, GROUPING
, and GROUPING
SETS
functions make querying and reporting easier and faster. The ROLLUP
function calculates aggregations such as SUM
, COUNT
, MAX
, MIN
, and AVG
at increasing levels of aggregation, from the most individual detailed level up to a grand summary total. The CUBE
function is an extension similar to ROLLUP
, enabling a single statement to calculate all possible combinations of aggregations.
Analytic functions compute an aggregate value based on a group of rows. These functions differ from aggregate functions in that they return multiple rows for each group. This group of rows is called a window. This window enables calculations such as moving average or cumulative total. For each row, a window of rows is defined. This window determines the range of rows used to perform the calculations for the current row. Window sizes can be based on either a logical interval such as time, or a physical number of rows. Some functions are used only with windows and are often referred to as window functions.
To enhance performance, aggregate and analytic functions can each perform in parallel: multiple processes can simultaneously execute all of these functions. These capabilities make calculations, analysis, and reporting easier and more efficient, thereby enhancing data warehouse performance, scalability, and simplicity.
You can take advantage of the advanced SQL and PL/SQL capabilities Oracle Database offers to translate business queries into SQL. This section discusses these advanced capabilities, and includes the following topics:
The ROLLUP
function enables a SELECT
statement to calculate multiple levels of subtotals across a specified group of dimensions, as well as a grand total. The ROLLUP
function is a simple extension to the GROUP
BY
clause, so its syntax is extremely easy to use. The ROLLUP
function is highly efficient, adding minimal overhead to a query.The action of the ROLLUP
function is straightforward: it creates subtotals that roll up from the most detailed level to a grand total, following a grouping list specified in the ROLLUP
function. ROLLUP
takes as its argument an ordered list of grouping columns. First, it calculates the standard aggregate values specified in the GROUP
BY
clause. Then, it creates progressively higher-level subtotals, moving from right to left through the list of grouping columns. Finally, it creates a grand total.
When your tasks involve subtotals, particularly when the subtotals are along a hierarchical dimension such as time or geography, use the ROLLUP
function. Also, a ROLLUP
function can simplify and speed up the maintenance of materialized views.
A common request when preparing business reports is to find quarterly sales revenue across different product categories, in order by the amount of revenue. The following query achieves this, and is used for the starting point for building more complicated queries later:
SELECT t.calendar_quarter_desc quarter , p.prod_category category , TO_CHAR(SUM(s.amount_sold),'L999G999G990D00') revenue FROM times t , products p , sales s WHERE t.time_id = s.time_id AND p.prod_id = s.prod_id AND s.time_id BETWEEN TO_DATE('01-JAN-2001','dd-MON-yyyy') AND TO_DATE('31-DEC-2001','dd-MON-yyyy') GROUP BY t.calendar_quarter_desc, p.prod_category ORDER BY t.calendar_quarter_desc , SUM(s.amount_sold); QUARTER CATEGORY REVENUE ------- ------------------------------ -------------- 2001-01 Software/Other $860,819.81 2001-01 Electronics $1,239,287.71 2001-01 Hardware $1,301,343.45 2001-01 Photo $1,370,706.38 2001-01 Peripherals and Accessories $1,774,940.09 2001-02 Software/Other $872,157.38 2001-02 Electronics $1,144,187.90 2001-02 Hardware $1,557,059.59 2001-02 Photo $1,563,475.51 2001-02 Peripherals and Accessories $1,785,588.01 2001-03 Software/Other $877,630.85 2001-03 Electronics $1,017,536.82 2001-03 Photo $1,607,315.63 2001-03 Hardware $1,651,454.29 2001-03 Peripherals and Accessories $2,042,061.04 2001-04 Software/Other $943,296.36 2001-04 Hardware $1,174,512.68 2001-04 Electronics $1,303,838.52 2001-04 Photo $1,792,131.39 2001-04 Peripherals and Accessories $2,257,118.57
This query is useful, but you may want to see the totals for different categories in the same report. The following example illustrates how you can use the ROLLUP
function to add the totals to the original query:
SELECT t.calendar_quarter_desc quarter
, p.prod_category category
, TO_CHAR(SUM(s.amount_sold),'L999G999G990D00') revenue
FROM times t
, products p
, sales s
WHERE t.time_id = s.time_id
AND p.prod_id = s.prod_id
AND s.time_id BETWEEN TO_DATE('01-JAN-2001','dd-MON-yyyy')
AND TO_DATE('31-DEC-2001','dd-MON-yyyy')
GROUP BY ROLLUP(t.calendar_quarter_desc, p.prod_category)
ORDER BY t.calendar_quarter_desc
, SUM(s.amount_sold);
QUARTER CATEGORY REVENUE
------- ------------------------------ -------------
2001-01 Software/Other $860,819.81
2001-01 Electronics $1,239,287.71
2001-01 Hardware $1,301,343.45
2001-01 Photo $1,370,706.38
2001-01 Peripherals and Accessories $1,774,940.09
2001-01 $6,547,097.44
2001-02 Software/Other $872,157.38
2001-02 Electronics $1,144,187.90
2001-02 Hardware $1,557,059.59
2001-02 Photo $1,563,475.51
2001-02 Peripherals and Accessories $1,785,588.01
2001-02 $6,922,468.39
2001-03 Software/Other $877,630.85
2001-03 Electronics $1,017,536.82
2001-03 Photo $1,607,315.63
2001-03 Hardware $1,651,454.29
2001-03 Peripherals and Accessories $2,042,061.04
2001-03 $7,195,998.63
2001-04 Software/Other $943,296.36
2001-04 Hardware $1,174,512.68
2001-04 Electronics $1,303,838.52
2001-04 Photo $1,792,131.39
2001-04 Peripherals and Accessories $2,257,118.57
2001-04 $7,470,897.52
$28,136,461.98
The CUBE
function takes a specified set of grouping columns and creates subtotals for all of the possible combinations. In terms of multidimensional analysis, the CUBE
function generates all the subtotals that can be calculated for a data cube with the specified dimensions. If you have specified CUBE(time, region, department)
, the result set will include all the values that would be included in an equivalent ROLLUP
function plus additional combinations.
Consider using the CUBE
function in any situation requiring cross-tabular reports. The data needed for cross-tabular reports can be generated with a single SELECT
statement using the CUBE
function. Like ROLLUP
, the CUBE
function can be helpful in generating materialized views. Note that population of materialized views is even faster if the query containing a CUBE
function executes in parallel.
You may want to get not only quarterly totals, but also totals for the different product categories for the selected period. The CUBE
function enables this calculation, as shown in the following example.
To use the CUBE
function:
SELECT t.calendar_quarter_desc quarter
, p.prod_category category
, TO_CHAR(SUM(s.amount_sold),'L999G999G990D00') revenue
FROM times t
, products p
, sales s
WHERE t.time_id = s.time_id
AND p.prod_id = s.prod_id
AND s.time_id BETWEEN TO_DATE('01-JAN-2001','dd-MON-yyyy')
AND TO_DATE('31-DEC-2001','dd-MON-yyyy')
GROUP BY CUBE(t.calendar_quarter_desc, p.prod_category)
ORDER BY t.calendar_quarter_desc
, SUM(s.amount_sold);
QUARTER CATEGORY REVENUE
------- ------------------------------ -------------
2001-01 Software/Other $860,819.81
2001-01 Electronics $1,239,287.71
2001-01 Hardware $1,301,343.45
2001-01 Photo $1,370,706.38
2001-01 Peripherals and Accessories $1,774,940.09
2001-01 $6,547,097.44
2001-02 Software/Other $872,157.38
2001-02 Electronics $1,144,187.90
2001-02 Hardware $1,557,059.59
2001-02 Photo $1,563,475.51
2001-02 Peripherals and Accessories $1,785,588.01
2001-02 $6,922,468.39
2001-03 Software/Other $877,630.85
2001-03 Electronics $1,017,536.82
2001-03 Photo $1,607,315.63
2001-03 Hardware $1,651,454.29
2001-03 Peripherals and Accessories $2,042,061.04
2001-03 $7,195,998.63
2001-04 Software/Other $943,296.36
2001-04 Hardware $1,174,512.68
2001-04 Electronics $1,303,838.52
2001-04 Photo $1,792,131.39
2001-04 Peripherals and Accessories $2,257,118.57
2001-04 $7,470,897.52
Software/Other $3,553,904.40
Electronics $4,704,850.95
Hardware $5,684,370.01
Photo $6,333,628.91
Peripherals and Accessories $7,859,707.71
$28,136,461.98
Two challenges arise with the use of the ROLLUP
and CUBE
functions. How can you programmatically determine which result set rows are subtotals, and how do you find the exact level of aggregation for a given subtotal? You often need to use subtotals in calculations such as percent-of-totals, so you need an easy way to determine which rows are the subtotals. What happens if query results contain both stored NULL
values and null values created by a ROLLUP
or CUBE
function? How can you differentiate between the two?
The GROUPING
function handles this problem. Using a single column as its argument, the GROUPING
function returns 1 when it encounters a null value created by a ROLLUP
or CUBE
function. That is, if the null value indicates the row is a subtotal, GROUPING
returns a value of 1. Any other type of value, including a stored NULL
value, returns a value of 0.
When you must handle NULL
values or null values created by a ROLLUP
or CUBE
operation, use the GROUPING
function. One reason you may want to work with null values is to put a description in null fields, for example, text describing that a number represents a total.
You might want more descriptive columns in your report because it is not always clear when a value represents a total. The GROUPING
function enables you to insert labels showing totals in the results of the query as shown in the following example.
To use the GROUPING
function:
SELECT DECODE(GROUPING(t.calendar_quarter_desc) , 0, t.calendar_quarter_desc , 1, 'TOTAL' ) quarter , DECODE(GROUPING(p.prod_category) , 0, p.prod_category , 1, 'TOTAL' ) category , TO_CHAR(SUM(s.amount_sold),'L999G999G990D00') revenue FROM times t , products p , sales s WHERE t.time_id = s.time_id AND p.prod_id = s.prod_id AND s.time_id BETWEEN TO_DATE('01-JAN-2001','dd-MON-yyyy') AND TO_DATE('31-DEC-2001','dd-MON-yyyy') GROUP BY CUBE(t.calendar_quarter_desc, p.prod_category) ORDER BY t.calendar_quarter_desc , SUM(s.amount_sold); QUARTER CATEGORY REVENUE ------- ------------------------------ ------------- 2001-01 Software/Other $860,819.81 2001-01 Electronics $1,239,287.71 2001-01 Hardware $1,301,343.45 2001-01 Photo $1,370,706.38 2001-01 Peripherals and Accessories $1,774,940.09 2001-01 TOTAL $6,547,097.44 2001-02 Software/Other $872,157.38 2001-02 Electronics $1,144,187.90 2001-02 Hardware $1,557,059.59 2001-02 Photo $1,563,475.51 2001-02 Peripherals and Accessories $1,785,588.01 2001-02 TOTAL $6,922,468.39 2001-03 Software/Other $877,630.85 2001-03 Electronics $1,017,536.82 2001-03 Photo $1,607,315.63 2001-03 Hardware $1,651,454.29 2001-03 Peripherals and Accessories $2,042,061.04 2001-03 TOTAL $7,195,998.63 2001-04 Software/Other $943,296.36 2001-04 Hardware $1,174,512.68 2001-04 Electronics $1,303,838.52 2001-04 Photo $1,792,131.39 2001-04 Peripherals and Accessories $2,257,118.57 2001-04 TOTAL $7,470,897.52 TOTAL Software/Other $3,553,904.40 TOTAL Electronics $4,704,850.95 TOTAL Hardware $5,684,370.01 TOTAL Photo $6,333,628.91 TOTAL Peripherals and Accessories $7,859,707.71 TOTAL TOTAL $28,136,461.98
You can selectively specify the set of groups that you want to create using a GROUPING
SETS
function within a GROUP
BY
clause. This allows precise specification across multiple dimensions without computing the whole data cube. In other words, not all dimension totals are needed.
When you want particular subtotals in a data cube, but not all that are possible, use the GROUPING
SETS
function.
You may want to see the total sales numbers based on sales channel. Instead of adding a separate query to retrieve the totals per channel class, you can use the GROUPING
SETS
function as illustrated in the following example.
To use the GROUPING
SETS
function:
SELECT DECODE(GROUPING(t.calendar_quarter_desc) , 0, t.calendar_quarter_desc , 1, 'TOTAL' ) quarter , DECODE(GROUPING(c.channel_class) , 0, c.channel_class , 1 , '--all--' ) channel , DECODE(GROUPING(p.prod_category) , 0, p.prod_category , 1, 'TOTAL' ) category , TO_CHAR(SUM(s.amount_sold),'L999G999G990D00') revenue FROM times t , products p , channels c , sales s WHERE t.time_id = s.time_id AND p.prod_id = s.prod_id AND c.channel_id = s.channel_id AND s.time_id BETWEEN TO_DATE('01-JAN-2001','dd-MON-yyyy') AND TO_DATE('31-DEC-2001','dd-MON-yyyy') GROUP BY GROUPING SETS(c.channel_class, CUBE(t.calendar_quarter_desc, p.prod_category)) ORDER BY t.calendar_quarter_desc , SUM(s.amount_sold); QUARTER CHANNEL CATEGORY REVENUE ------- ------------- ---------------------------- ------------- 2001-01 --all-- Software/Other $860,819.81 2001-01 --all-- Electronics $1,239,287.71 2001-01 --all-- Hardware $1,301,343.45 2001-01 --all-- Photo $1,370,706.38 2001-01 --all-- Peripherals and Accessories $1,774,940.09 2001-01 --all-- TOTAL $6,547,097.44 2001-02 --all-- Software/Other $872,157.38 2001-02 --all-- Electronics $1,144,187.90 2001-02 --all-- Hardware $1,557,059.59 2001-02 --all-- Photo $1,563,475.51 2001-02 --all-- Peripherals and Accessories $1,785,588.01 2001-02 --all-- TOTAL $6,922,468.39 2001-03 --all-- Software/Other $877,630.85 2001-03 --all-- Electronics $1,017,536.82 2001-03 --all-- Photo $1,607,315.63 2001-03 --all-- Hardware $1,651,454.29 2001-03 --all-- Peripherals and Accessories $2,042,061.04 2001-03 --all-- TOTAL $7,195,998.63 2001-04 --all-- Software/Other $943,296.36 2001-04 --all-- Hardware $1,174,512.68 2001-04 --all-- Electronics $1,303,838.52 2001-04 --all-- Photo $1,792,131.39 2001-04 --all-- Peripherals and Accessories $2,257,118.57 2001-04 --all-- TOTAL $7,470,897.52 TOTAL --all-- Software/Other $3,553,904.40 TOTAL --all-- Electronics $4,704,850.95 TOTAL --all-- Hardware $5,684,370.01 TOTAL --all-- Photo $6,333,628.91 TOTAL Indirect TOTAL $6,709,496.66 TOTAL --all-- Peripherals and Accessories $7,859,707.71 TOTAL Others TOTAL $8,038,529.96 TOTAL Direct TOTAL $13,388,435.36 TOTAL --all-- TOTAL $28,136,461.98
Business information processing requires advanced calculations, including complex ranking, subtotals, moving averages, and lead/lag comparisons. These aggregation and analysis tasks are essential in creating business intelligence queries, and are accomplished by the use of window functions.
When you want to perform complex queries and analyze the query results, use the RANK
function.
Users would like to see an additional column that shows the rank of any revenue number within the quarter. The following example illustrates using the RANK
function to achieve this.
To use the RANK
function:
SELECT DECODE(GROUPING(t.calendar_quarter_desc)
, 0, t.calendar_quarter_desc
, 1, 'TOTAL'
) quarter
, DECODE(GROUPING(t.calendar_quarter_desc) + GROUPING(p.prod_category)
, 0, RANK() OVER (PARTITION BY t.calendar_quarter_desc
ORDER BY SUM(s.amount_sold))
, 1, null
) ranking
, DECODE(GROUPING(c.channel_class)
, 0, c.channel_class
, 1 , '--all--'
) channel
, DECODE(GROUPING(p.prod_category)
, 0, p.prod_category
, 1, 'TOTAL'
) category
, TO_CHAR(SUM(s.amount_sold),'L999G999G990D00') revenue
FROM times t
, products p
, channels c
, sales s
WHERE t.time_id = s.time_id
AND p.prod_id = s.prod_id
AND c.channel_id = s.channel_id
AND s.time_id BETWEEN TO_DATE('01-JAN-2001','dd-MON-yyyy')
AND TO_DATE('31-DEC-2001','dd-MON-yyyy')
GROUP BY GROUPING SETS(c.channel_class,
CUBE(t.calendar_quarter_desc, p.prod_category))
ORDER BY t.calendar_quarter_desc
, SUM(s.amount_sold);
QUARTER RANKING CHANNEL CATEGORY REVENUE
------- ------- -------- ---------------------------- --------------
2001-01 1 --all-- Software/Other $860,819.81
2001-01 2 --all-- Electronics $1,239,287.71
2001-01 3 --all-- Hardware $1,301,343.45
2001-01 4 --all-- Photo $1,370,706.38
2001-01 5 --all-- Peripherals and Accessories $1,774,940.09
2001-01 --all-- TOTAL $6,547,097.44
2001-02 1 --all-- Software/Other $872,157.38
2001-02 2 --all-- Electronics $1,144,187.90
2001-02 3 --all-- Hardware $1,557,059.59
2001-02 4 --all-- Photo $1,563,475.51
2001-02 5 --all-- Peripherals and Accessories $1,785,588.01
2001-02 --all-- TOTAL $6,922,468.39
2001-03 1 --all-- Software/Other $877,630.85
2001-03 2 --all-- Electronics $1,017,536.82
2001-03 3 --all-- Photo $1,607,315.63
2001-03 4 --all-- Hardware $1,651,454.29
2001-03 5 --all-- Peripherals and Accessories $2,042,061.04
2001-03 --all-- TOTAL $7,195,998.63
2001-04 1 --all-- Software/Other $943,296.36
2001-04 2 --all-- Hardware $1,174,512.68
2001-04 3 --all-- Electronics $1,303,838.52
2001-04 4 --all-- Photo $1,792,131.39
2001-04 5 --all-- Peripherals and Accessories $2,257,118.57
2001-04 --all-- TOTAL $7,470,897.52
TOTAL --all-- Software/Other $3,553,904.40
TOTAL --all-- Electronics $4,704,850.95
TOTAL --all-- Hardware $5,684,370.01
TOTAL --all-- Photo $6,333,628.91
TOTAL Indirect TOTAL $6,709,496.66
TOTAL --all-- Peripherals and Accessories $7,859,707.71
TOTAL Others TOTAL $8,038,529.96
TOTAL Direct TOTAL $13,388,435.36
TOTAL --all-- TOTAL $28,136,461.98
In this example, the PARTITION BY clause defines the boundaries for the RANK function.
A common business intelligence request is to calculate the contribution of every product category to the total revenue based on a given time period.
You want to get the differences for revenue numbers on a quarter-by-quarter basis. As illustrated in the following example, you can use a window function with a PARTITION
BY
product category to achieve this.
To calculate relative contributions to a total:
SELECT DECODE(GROUPING(t.calendar_quarter_desc) , 0, t.calendar_quarter_desc , 1, 'TOTAL' ) quarter , DECODE(GROUPING(t.calendar_quarter_desc) + GROUPING(p.prod_category) , 0, RANK() OVER (PARTITION BY t.calendar_quarter_desc ORDER BY SUM(s.amount_sold)) , 1, null ) RANKING , DECODE(GROUPING(c.channel_class) , 0, c.channel_class , 1 , '--all--' ) channel , DECODE(GROUPING(p.prod_category) , 0, p.prod_category , 1, 'TOTAL' ) category , TO_CHAR(SUM(s.amount_sold),'L999G999G990D00') revenue , TO_CHAR(100 * RATIO_TO_REPORT(SUM(s.amount_sold)) OVER (PARTITION BY (TO_CHAR(GROUPING(p.prod_category) || t.calendar_quarter_desc))),'990D0') percent FROM times t , products p , channels c , sales s WHERE t.time_id = s.time_id AND p.prod_id = s.prod_id AND c.channel_id = s.channel_id AND s.time_id BETWEEN TO_DATE('01-JAN-2001','dd-MON-yyyy') AND TO_DATE('31-DEC-2001','dd-MON-yyyy') GROUP BY GROUPING SETS(c.channel_class, CUBE(t.calendar_quarter_desc, p.prod_category)) ORDER BY t.calendar_quarter_desc , SUM(s.amount_sold); QUARTER RANKING CHANNEL CATEGORY REVENUE PERC ------- ------- ------- ------------- ------------ ---- 2001-01 1 --all-- Software/Other $860,819.81 13.1 2001-01 2 --all-- Electronics $1,239,287.71 18.9 2001-01 3 --all-- Hardware $1,301,343.45 19.9 2001-01 4 --all-- Photo $1,370,706.38 20.9 2001-01 5 --all-- Peripherals $1,774,940.09 27.1 2001-01 --all-- TOTAL $6,547,097.44 100.0 2001-02 1 --all-- Software/Other $872,157.38 12.6 2001-02 2 --all-- Electronics $1,144,187.90 16.5 2001-02 3 --all-- Hardware $1,557,059.59 22.5 2001-02 4 --all-- Photo $1,563,475.51 22.6 2001-02 5 --all-- Peripherals $1,785,588.01 25.8 2001-02 --all-- TOTAL $6,922,468.39 100.0 2001-03 1 --all-- Software/Other $877,630.85 12.2 2001-03 2 --all-- Electronics $1,017,536.82 14.1 2001-03 3 --all-- Photo $1,607,315.63 22.3 2001-03 4 --all-- Hardware $1,651,454.29 22.9 2001-03 5 --all-- Peripherals $2,042,061.04 28.4 2001-03 --all-- TOTAL $7,195,998.63 100.0 2001-04 1 --all-- Software/Other $943,296.36 12.6 2001-04 2 --all-- Hardware $1,174,512.68 15.7 2001-04 3 --all-- Electronics $1,303,838.52 17.5 2001-04 4 --all-- Photo $1,792,131.39 24.0 2001-04 5 --all-- Peripherals $2,257,118.57 30.2 2001-04 --all-- TOTAL $7,470,897.52 100.0 TOTAL --all-- Software/Other $3,553,904.40 12.6 TOTAL --all-- Electronics $4,704,850.95 16.7 TOTAL --all-- Hardware $5,684,370.01 20.2 TOTAL --all-- Photo $6,333,628.91 22.5 TOTAL Indirect TOTAL $6,709,496.66 11.9 TOTAL --all-- Peripherals $7,859,707.71 27.9 TOTAL Others TOTAL $8,038,529.96 14.3 TOTAL Direct TOTAL $13,388,435.36 23.8 TOTAL --all-- TOTAL $28,136,461.98 50.0 "Peripherals" was used instead of "Peripherals and Accessories" to save space.
A common business intelligence question is how a particular result relates to another result. To do this in a single query, you can use window functions and perform interrow calculations in a single statement.
You may want to know the contribution of every product category to the total revenue for each quarter. You can use the window function RATIO_TO_REPORT
to achieve this result, as illustrated in the following example. Note that you must use concatenation with GROUPING(p.prod_category)
to preclude the total from the RATIO_TO_REPORT
per quarter.
To perform interrow calculations:
SELECT DECODE(GROUPING(t.calendar_quarter_desc) , 0, t.calendar_quarter_desc , 1, 'TOTAL' ) quarter , DECODE(GROUPING(t.calendar_quarter_desc) + GROUPING(p.prod_category) , 0, RANK() OVER (PARTITION BY t.calendar_quarter_desc ORDER BY SUM(s.amount_sold)) , 1, null ) RANKING , DECODE(GROUPING(c.channel_class) , 0, c.channel_class , 1 , '--all--' ) channel , DECODE(GROUPING(p.prod_category) , 0, p.prod_category , 1, 'TOTAL' ) category , TO_CHAR(SUM(s.amount_sold),'L999G999G990D00') revenue , TO_CHAR(100 * RATIO_TO_REPORT(SUM(s.amount_sold)) OVER (PARTITION BY (TO_CHAR(GROUPING(p.prod_category) || t.calendar_quarter_desc))),'990D0') percent , DECODE(GROUPING(t.calendar_quarter_desc) + GROUPING(p.prod_category) , 0, TO_CHAR(SUM(s.amount_sold) - LAG(SUM(s.amount_sold),1) OVER (PARTITION BY p.prod_category ORDER BY t.calendar_quarter_desc),'L999G990D00') , 1, null ) q_q_diff FROM times t , products p , channels c , sales s WHERE t.time_id = s.time_id AND p.prod_id = s.prod_id AND c.channel_id = s.channel_id AND s.time_id BETWEEN TO_DATE('01-JAN-2001','dd-MON-yyyy') AND TO_DATE('31-DEC-2001','dd-MON-yyyy') GROUP BY GROUPING SETS(c.channel_class, CUBE(t.calendar_quarter_desc, p.prod_category)) ORDER BY t.calendar_quarter_desc , SUM(s.amount_sold); QUARTER RANKING CHANNEL CATEGORY REVENUE PERC Q_Q_DIFF ------- ------- ------- ------------- ------------ ---- ---------- 2001-01 1 --all-- Software/Other $860,819.81 13.1 2001-01 2 --all-- Electronics $1,239,287.71 18.9 2001-01 3 --all-- Hardware $1,301,343.45 19.9 2001-01 4 --all-- Photo $1,370,706.38 20.9 2001-01 5 --all-- Peripherals $1,774,940.09 27.1 2001-01 --all-- TOTAL $6,547,097.44 100.0 2001-02 1 --all-- Software/Other $872,157.38 12.6 $11,337.57 2001-02 2 --all-- Electronics $1,144,187.90 16.5 -$95,099.81 2001-02 3 --all-- Hardware $1,557,059.59 22.5 $255,716.14 2001-02 4 --all-- Photo $1,563,475.51 22.6 $192,769.13 2001-02 5 --all-- Peripherals $1,785,588.01 25.8 $10,647.92 2001-02 --all-- TOTAL $6,922,468.39 100.0 2001-03 1 --all-- Software/Other $877,630.85 12.2 $5,473.47 2001-03 2 --all-- Electronics $1,017,536.82 14.1 -$126,651.08 2001-03 3 --all-- Photo $1,607,315.63 22.3 $43,840.12 2001-03 4 --all-- Hardware $1,651,454.29 22.9 $94,394.70 2001-03 5 --all-- Peripherals $2,042,061.04 28.4 $256,473.03 2001-03 --all-- TOTAL $7,195,998.63 100.0 2001-04 1 --all-- Software/Other $943,296.36 12.6 $65,665.51 2001-04 2 --all-- Hardware $1,174,512.68 15.7 -$476,941.61 2001-04 3 --all-- Electronics $1,303,838.52 17.5 $286,301.70 2001-04 4 --all-- Photo $1,792,131.39 24.0 $184,815.76 2001-04 5 --all-- Peripherals $2,257,118.57 30.2 $215,057.53 2001-04 --all-- TOTAL $7,470,897.52 100.0 TOTAL --all-- Software/Other $3,553,904.40 12.6 TOTAL --all-- Electronics $4,704,850.95 16.7 TOTAL --all-- Hardware $5,684,370.01 20.2 TOTAL --all-- Photo $6,333,628.91 22.5 TOTAL Indirect TOTAL $6,709,496.66 11.9 TOTAL --all-- Peripherals $7,859,707.71 27.9 TOTAL Others TOTAL $8,038,529.96 14.3 TOTAL Direct TOTAL $13,388,435.36 23.8 TOTAL --all-- TOTAL $28,136,461.98 50.0 "Peripherals" was used instead of "Peripherals and Accessories" to save space.
You can create moving aggregations with window functions. A moving aggregation can be based on a number of physical rows or it can be a logical time period. Window functions use the PARTITION
keyword, and, for each row in a partition, you can define a sliding window of data. This window determines the range of rows used to perform the calculations for the current row. Window sizes can be based on either a physical number of rows or a logical interval such as time. The window has a starting row and an ending row. Depending on its definition, the window may move at one or both ends. For instance, a window defined for a cumulative SUM
function would have its starting row fixed at the first row of its partition, and its ending row would slide from the starting point all the way to the last row of the partition. In contrast, a window defined for a moving average would have both its starting and ending points slide so that they maintain a constant physical or logical range.
Window functions are commonly used to calculate moving and cumulative versions of SUM
, AVERAGE
, COUNT
, MAX
, MIN
, and many more functions. They can be used only in the SELECT
and ORDER
BY
clauses of the query. Window functions include the FIRST_VALUE
function, which returns the first value in the window; and the LAST_VALUE
function, which returns the last value in the window. These functions provide access to more than one row of a table without requiring a self-join.
The following example shows a query that retrieves a 7-day moving average of product revenue per product, using a logical time interval.
To calculate a moving average:
SELECT time_id , prod_name , TO_CHAR(revenue,'L999G990D00') revenue , TO_CHAR(AVG(revenue) OVER (PARTITION BY prod_name ORDER BY time_id RANGE INTERVAL '7' DAY PRECEDING),'L999G990D00') mv_7day_avg FROM ( SELECT s.time_id, p.prod_name, SUM(s.amount_sold) revenue FROM products p , sales s WHERE p.prod_id = s.prod_id AND s.time_id BETWEEN TO_DATE('25-JUN-2001','dd-MON-yyyy') AND TO_DATE('16-JUL-2001','dd-MON-yyyy') AND p.prod_name LIKE '%Memory%' AND p.prod_category = 'Photo' GROUP BY s.time_id, p.prod_name ) ORDER BY time_id, prod_name; TIME_ID PROD_NAME REVENUE MV_7DAY_AVG --------- ----------------- ------------------ -------------- 26-JUN-01 256MB Memory Card $560.15 $560.15 30-JUN-01 256MB Memory Card $844.00 $702.08 02-JUL-01 128MB Memory Card $3,283.74 $3,283.74 02-JUL-01 256MB Memory Card $3,903.32 $1,769.16 03-JUL-01 256MB Memory Card $699.37 $1,501.71 08-JUL-01 128MB Memory Card $3,283.74 $3,283.74 08-JUL-01 256MB Memory Card $3,903.32 $2,835.34 10-JUL-01 256MB Memory Card $138.82 $1,580.50
Data is usually stored in sparse form. That is, if no value exists for a given combination of dimension values, no row exists in the fact table (the table in a data warehouse that contains the important facts, frequently sales). However, a reader of a business report may want to view the data in dense form, with rows for all combinations of dimension values displayed even when no fact table data exists for them. For example, if a product did not sell during a particular time period, you may still want to see the product for that time period with zero sales value next to it. Moreover, time series calculations can be performed most easily when data is dense along the time dimension. This is because dense data will fill a consistent number of rows for each period, which in turn makes it simple to use window functions with physical offsets.
Data densification is the process of converting sparse data into dense form. To overcome the problem of sparsity, you can use a partition outer join to fill the gaps in a time series or any dimension. Such a join extends the conventional outer join syntax by applying the outer join to each logical partition defined in a query. Oracle Database logically partitions the rows in your query based on the expression you specify in the PARTITION
BY
clause. The result of a partition outer join is a UNION
operation of the outer joins of each of the partitions in the logically partitioned table with the table on the other side of the join. Note that you can use this type of join to fill the gaps in any dimension, not just the time dimension.
When you want to fill in missing rows in a result set or perform time series calculations, use a partition outer join.
You may want to see how a particular product sold over the duration of a number of weeks. In this example, memory cards from the Photo category are used. Because these products are not sold very frequently, there may be weeks that a product is not sold at all. To make convenient comparisons, you must make the data dense using the partition outer join as illustrated in the following example.
To use partition outer join:
SELECT tim.week_ending_day , rev.prod_name product , nvl(SUM(rev.amount_sold),0) revenue FROM (SELECT p.prod_name, s.time_id, s.amount_sold FROM products p , sales s WHERE s.prod_id = p.prod_id AND p.prod_category = 'Photo' AND p.prod_name LIKE '%Memory%' AND s.time_id BETWEEN TO_DATE('25-JUN-2001','dd-MON-yyyy') AND TO_DATE('16-JUL-2001','dd-MON-yyyy') ) rev PARTITION BY (prod_name) RIGHT OUTER JOIN (SELECT time_id, week_ending_day FROM times WHERE week_ending_day BETWEEN TO_DATE('01-JUL-2001','dd-MON-yyyy') AND TO_DATE('16-JUL-2001','dd-MON-yyyy') ) tim ON (rev.time_id = tim.time_id) GROUP BY tim.week_ending_day , rev.prod_name ORDER BY tim.week_ending_day , rev.prod_name; WEEK_ENDI PRODUCT REVENUE --------- ---------------------------------------- ---------- 01-JUL-01 128MB Memory Card 0 01-JUL-01 256MB Memory Card 1404.15 08-JUL-01 128MB Memory Card 6567.48 08-JUL-01 256MB Memory Card 8506.01 15-JUL-01 128MB Memory Card 0 15-JUL-01 256MB Memory Card 138.82
Queries that make extensive use of window functions as well as different types of joins and access many tables can become quite complex. The WITH
clause enables you to eliminate much of this complexity by incrementally building up the query. It lets you reuse the same query block in a SELECT
statement when it occurs more than once within a complex query. Oracle Database retrieves the results of a query block and stores them in the user's temporary tablespace.
When a query has multiple references to the same query block and there are joins and aggregations, use the WITH
clause.
Assume you want to compare the sales of memory card products in the Photo category for the first three week endings in July 2001. The following query takes into account that some products may not have sold at all in that period, and it returns the increase or decrease in revenue relative to the week before. Finally, the query retrieves the percentage contribution of the memory card sales for that particular week. Due to the use of the WITH
clause, individual sections of the query are not very complex.
To use the WITH
clause:
WITH sales_numbers AS ( SELECT s.prod_id, s.amount_sold, t.week_ending_day FROM sales s , times t , products p WHERE s.time_id = t.time_id AND s.prod_id = p.prod_id AND p.prod_category = 'Photo' AND p.prod_name LIKE '%Memory%' AND t.week_ending_day BETWEEN TO_DATE('01-JUL-2001','dd-MON-yyyy') AND TO_DATE('16-JUL-2001','dd-MON-yyyy') ) , product_revenue AS ( SELECT p.prod_name product, s.week_ending_day, SUM(s.amount_sold) revenue FROM products p LEFT OUTER JOIN (SELECT prod_id, amount_sold, week_ending_day FROM sales_numbers) s ON (s.prod_id = p.prod_id) WHERE p.prod_category = 'Photo' AND p.prod_name LIKE '%Memory%' GROUP BY p.prod_name, s.week_ending_day ) , weeks AS ( SELECT distinct week_ending_day week FROM times WHERE week_ending_day BETWEEN TO_DATE('01-JUL-2001','dd-MON-yyyy') AND TO_DATE('16-JUL-2001','dd-MON-yyyy') ) , complete_product_revenue AS ( SELECT w.week, pr.product, nvl(pr.revenue,0) revenue FROM product_revenue pr PARTITION BY (product) RIGHT OUTER JOIN weeks w ON (w.week = pr.week_ending_day) ) SELECT week , product , TO_CHAR(revenue,'L999G990D00') revenue , TO_CHAR(revenue - lag(revenue,1) OVER (PARTITION BY product ORDER BY week),'L999G990D00') w_w_diff , TO_CHAR(100 * RATIO_TO_REPORT(revenue) OVER (PARTITION BY week),'990D0') percentage FROM complete_product_revenue ORDER BY week, product; WEEK PRODUCT REVENUE W_W_DIFF PERCENT --------- ----------------- ------- -------- ------- 01-JUL-01 128MB Memory Card $0.00 0.0 01-JUL-01 256MB Memory Card $1,404.15 100.0 01-JUL-01 64MB Memory Card $0.00 0.0 08-JUL-01 128MB Memory Card $6,567.48 $6,567.48 43.6 08-JUL-01 256MB Memory Card $8,506.01 $7,101.86 56.4 08-JUL-01 64MB Memory Card $0.00 $0.00 0.0 15-JUL-01 128MB Memory Card $0.00 -$6,567.48 0.0 15-JUL-01 256MB Memory Card $138.82 -$8,367.19 100.0 15-JUL-01 64MB Memory Card $0.00 $0.00 0.0