1. How do you rename all of the jobs to support your new file-naming conventions? A: Create an Excel spreadsheet with the old and new names. Export the whole project as a .dsx file. Write a Perl program that does a simple rename of the strings, looking them up in the Excel file. Then import the new .dsx file, preferably into a new project for testing. Recompile all jobs. Be cautious: the job names may also be referenced in your job control jobs or Sequencer jobs, so you have to make the corresponding changes to those Sequencers as well.
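The lookup-and-rename step can be sketched in Python rather than Perl; a minimal sketch, assuming the Excel mapping has already been saved out as plain old-name/new-name pairs (all job names here are hypothetical):

```python
def rename_jobs(dsx_text, mapping):
    """Apply an old-name -> new-name mapping to exported .dsx text.
    Longest names are replaced first so overlapping names don't clobber
    each other (e.g. 'LoadCust' occurring inside 'LoadCustAgg')."""
    for old in sorted(mapping, key=len, reverse=True):
        dsx_text = dsx_text.replace(old, mapping[old])
    return dsx_text
```

Because Sequencer references are plain strings inside the .dsx, a whole-project export-and-replace like this catches them too, which is why the recompile step matters.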
2.
Does the selection of 'Clear the table and Insert rows' in the ODBC stage send a TRUNCATE statement to the DB, or does it do some kind of DELETE logic? A: There is no TRUNCATE on ODBC stages. The 'Clear table...' option issues a DELETE FROM statement. On an OCI stage such as Oracle, you do have both Clear and Truncate options. They are radically different in permissions (TRUNCATE requires you to have ALTER TABLE permission, whereas DELETE doesn't).
3.
Tell me one situation from your last project where you faced a problem, and how did you solve it? A: A. The jobs in which data is read directly from OCI stages were running extremely slowly. I had to stage the data before sending it to the transformer to make the jobs run faster. B. A job aborts in the middle of loading some 500,000 rows. You have the option of either cleaning/deleting the loaded data and then running the fixed job, or running the job again from the row at which it aborted. To make sure the load was proper, we opted for the former.
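Option B (restarting from the aborted row) amounts to keeping a checkpoint of the last committed row; a hypothetical sketch of the bookkeeping, not DataStage's own restart mechanism:

```python
def resume_load(rows, sink, checkpoint=0):
    """Load rows into a sink, skipping rows already committed before an
    abort. 'checkpoint' is the count of rows known to be safely loaded;
    the returned value would be persisted after each commit in a real job."""
    for i in range(checkpoint, len(rows)):
        sink.append(rows[i])
        checkpoint = i + 1
    return checkpoint
```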
4.
The above might raise another question: why do we have to load the dimension tables first, then the fact tables? A: As we load the dimension tables, the primary keys are generated, and these primary keys are foreign keys in the fact tables.
5.
How will you determine the sequence of jobs to load into data warehouse?
A: First we execute the jobs that load the data into Dimension tables, then Fact tables, then load the Aggregator tables (if any).
6.
What are the command-line functions that import and export the DS jobs?
A: A. dsimport.exe - imports the DataStage components. B. dsexport.exe - exports the DataStage components.
7.
What is the utility you use to schedule the jobs on a UNIX server other than using Ascential Director? A: Use the crontab utility along with the dsexecute() function, with the proper parameters passed.
8.
How would you call an external Java function that is not supported by DataStage?
A: Starting from DS 6.0 we have the ability to call external Java functions using a Java package from Ascential. In this case we can even use the command line to invoke the Java function, write the return values from the Java program (if any) to a file, and use that file as a source in a DataStage job.
9.
What will you do in a situation where somebody wants to send you a file, use that file as an input or reference, and then run the job? A: A. Under Windows: use the 'WaitForFileActivity' stage under the Sequencers and then run the job. Maybe you can schedule the sequencer around the time the file is expected to arrive. B. Under UNIX: poll for the file. Once the file has arrived, start the job or sequencer that depends on the file.
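The UNIX polling option can be sketched as a small loop (a sketch only; the timeout and interval values are arbitrary choices):

```python
import os
import time

def wait_for_file(path, timeout=60.0, interval=1.0):
    """Poll for a file's arrival - the UNIX equivalent of the Windows
    WaitForFileActivity stage. Returns True once the file exists,
    False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(interval)
    return os.path.exists(path)
```

A real job would also want to check that the file has stopped growing before reading it, since "exists" does not mean "fully transferred".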
10. Read up on the string functions in DS. A: Functions like [] (the substring operator) and ':' (the concatenation operator). Syntax: string [ [ start, ] length ] or string [ delimiter, instance, repeats ].
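For readers coming from other languages, the two DataStage BASIC substring forms can be mimicked in Python; note the 1-based positions (the ':' operator is just ordinary string concatenation):

```python
def substr(s, start, length):
    """DataStage BASIC string[start, length] - note the 1-based start."""
    return s[start - 1:start - 1 + length]

def field(s, delim, instance, repeats=1):
    """DataStage BASIC string[delimiter, instance, repeats]: return
    'repeats' delimited fields starting at field 'instance' (1-based)."""
    parts = s.split(delim)
    return delim.join(parts[instance - 1:instance - 1 + repeats])
```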
11. How did you connect to DB2 in your last project? A: Most of the time the data was sent to us in the form of flat files; the data was dumped and sent to us. In some cases where we needed to connect to DB2 for look-ups, we used ODBC drivers to connect to DB2 (or) DB2-UDB, depending on the situation and availability. Certainly DB2-UDB is better in terms of performance, as native drivers are always better than ODBC drivers. 'iSeries Access ODBC Driver 9.00.02.02' - ODBC driver to connect to AS400/DB2.
12. What are Sequencers? A: Sequencers are job control programs that execute other jobs with preset Job parameters.
13. How did you handle an 'Aborted' sequencer? A: In almost all cases we had to delete the data it inserted from the DB manually, fix the job, and then run the job again.
14. What are other performance tunings you have done in your last project to increase the performance of slowly running jobs? A:
1. Staged the data coming from ODBC/OCI/DB2UDB stages or any database on the server using Hash/Sequential files, for optimum performance and also for data recovery in case the job aborts.
2. Tuned the OCI stage's 'Array Size' and 'Rows per Transaction' numerical values for faster inserts, updates and selects.
3. Tuned the 'Project Tunables' in Administrator for better performance.
4. Used sorted data for the Aggregator.
5. Sorted the data as much as possible in the DB and reduced the use of DS-Sort, for better job performance.
6. Removed unused data from the source as early as possible in the job.
7. Worked with the DB admin to create appropriate indexes on tables, for better performance of DS queries.
8. Converted some of the complex joins/business logic in DS to stored procedures, for faster execution of the jobs.
9. If an input file has an excessive number of rows and can be split up, then use standard logic to run jobs in parallel.
10. Before writing a routine or a transform, make sure the required functionality is not already in one of the standard routines supplied in the sdk or ds utilities categories. Constraints are generally CPU-intensive and take a significant amount of time to process. This may be the case if the constraint calls routines or external macros, but if it is inline code then the overhead will be minimal.
11. Try to have the constraints in the 'Selection' criteria of the jobs itself. This will eliminate unnecessary records before joins are made.
12. Tuning should occur on a job-by-job basis.
13. Use the power of the DBMS.
14. Try not to use a Sort stage when you can use an ORDER BY clause in the database.
15. Using a constraint to filter a record set is much slower than performing a SELECT ... WHERE ...
16. Make every attempt to use the bulk loader for your particular database. Bulk loaders are generally faster than using ODBC or OLE.
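Tip 9 above (splitting a large input so the same job can run in parallel) can be illustrated with a simple chunking helper; this is a sketch, and contiguous chunks are just one possible split shape:

```python
def split_rows(rows, n_jobs):
    """Split an input row set into n_jobs roughly equal contiguous chunks
    so the same job can be invoked once per chunk, in parallel."""
    size, rem = divmod(len(rows), n_jobs)
    chunks, start = [], 0
    for i in range(n_jobs):
        end = start + size + (1 if i < rem else 0)  # spread the remainder
        chunks.append(rows[start:end])
        start = end
    return chunks
```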
15. How did you handle reject data? A: Typically a reject link is defined and the rejected data is loaded back into the data warehouse. A reject link has to be defined for every output link on which you wish to collect rejected data. Rejected data is typically bad data, like duplicate primary keys or null rows where data is expected.
16. If you worked with DS 6.0 and later versions, what are Link-Partitioner and Link-Collector used for? A: Link Partitioner - used for partitioning the data. Link Collector - used for collecting the partitioned data.
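The partition/collect round trip can be mimicked with round-robin dealing; round-robin is only one of the algorithms the stages offer, so treat this as an illustrative sketch:

```python
def link_partition(rows, n_links):
    """Round-robin partitioner: deal rows across n_links output links."""
    links = [[] for _ in range(n_links)]
    for i, row in enumerate(rows):
        links[i % n_links].append(row)
    return links

def link_collect(links):
    """Collector: merge partitioned links back into a single stream,
    reading round-robin so the original order is restored."""
    out = []
    longest = max(len(l) for l in links)
    for i in range(longest):
        for l in links:
            if i < len(l):
                out.append(l[i])
    return out
```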
17. What are Routines, where/how are they written, and have you written any routines before? A: Routines are stored in the Routines branch of the DataStage Repository, where you can create, view or edit them. The following are the different types of routines: 1) Transform functions 2) Before/after job subroutines 3) Job control routines.
18. What are the OConv() and IConv() functions and where are they used? A: IConv() converts a string to an internal storage format; OConv() converts an expression to an output format.
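For dates, the internal storage format in the UniVerse engine underneath DataStage is a day count, with day 0 being 31 December 1967 (stated here as the commonly documented epoch). A Python sketch of what IConv/OConv do for the 'D' date conversion:

```python
from datetime import date, timedelta

EPOCH = date(1967, 12, 31)  # internal day 0 in the UniVerse/DataStage engine

def iconv_date(d):
    """Sketch of IConv for dates: external date -> internal day number."""
    return (d - EPOCH).days

def oconv_date(n):
    """Sketch of OConv for dates: internal day number -> external date."""
    return EPOCH + timedelta(days=n)
```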
19. How did u connect to DB2 in your last project? A: Using DB2 ODBC drivers.
20.
Do u know about METASTAGE?
A: MetaStage is used to handle the metadata, which is very useful for data lineage and data analysis later on. Metadata defines the type of data we are handling. These data definitions are stored in the repository and can be accessed with the use of MetaStage.
21.
Do you know about INTEGRITY/QUALITY stage?
A: QualityStage can be integrated with DataStage. In QualityStage we have stages like Investigate, Match, and Survivorship, with which we can do the quality-related work. To integrate with DataStage we need the QualityStage plugin.
22.
Explain the differences between Oracle8i/9i?
A: Oracle 8i does not support the pseudo-column sysdate, but 9i does. In Oracle 8i we can create 256 columns in a table, but in 9i we can have up to 1000 columns (fields).
23.
How do you merge two files in DS?
A: Either use the Copy command as a before-job subroutine if the metadata of the 2 files is the same, or create a job to concatenate the 2 files into one if the metadata is different.
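The 'same metadata' case is plain concatenation; a sketch of the file handling only (the 'different metadata' case needs a real job to align columns first):

```python
def concat_files(paths, out_path):
    """Concatenate files with identical metadata into one output file -
    the equivalent of a 'cat a b > c' before-job subroutine."""
    with open(out_path, "w") as out:
        for p in paths:
            with open(p) as f:
                out.write(f.read())
```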
24.
What is DS Designer used for?
A: You use the Designer to build jobs by creating a visual design that models the flow and transformation of data from the data source through to the target warehouse. The Designer graphical interface lets you select stage icons, drop them onto the Designer work area, and add links.
25.
What is DS Administrator used for?
A: The Administrator enables you to set up DataStage users, control the purging of the Repository, and, if National Language Support (NLS) is enabled, install and manage maps and locales.
26.
What is DS Director used for?
A: DataStage Director is used to run and validate the jobs. We can go to DataStage Director from DataStage Designer itself.
27.
What is DS Manager used for?
A: The Manager is a graphical tool that enables you to view and manage the contents of the DataStage Repository.
28.
What are Static Hash files and Dynamic Hash files?
A: As the names themselves suggest, that is what they mean. In general we use Type-30 dynamic hash files. The data file has a default size of 2GB, and the overflow file is used if the data exceeds that size.
29.
What is Hash file stage and what is it used for?
A: Used for look-ups. It is like a reference table. It is also used in place of ODBC/OCI tables for better performance.
30.
Have you ever been involved in updating DS versions (like DS 5.X)? If so, tell us some of the steps you have taken in doing so. A: Yes. The following are some of the steps I have taken:
1) Definitely take a backup of the whole project(s) by exporting the project as a .dsx file.
2) See that you are using the same parent folder for the new version, so that your old jobs using hard-coded file paths still work.
3) After installing the new version, import the old project(s); you have to compile them all again. You can use the 'Compile All' tool for this.
4) Make sure that all your DB DSNs are created with the same names as the old ones. This step is for moving DS from one machine to another.
5) In case you are just upgrading your DB from Oracle 8i to Oracle 9i, there is a tool on the DS CD that can do this for you.
6) Do not stop the 6.0 server before the upgrade; the version 7.0 install process collects project information during the upgrade. There is NO rework (recompilation of existing jobs/routines) needed after the upgrade.
31.
Did you Parameterize the job or hard-coded the values in the jobs?
A: Always parameterized the job. Either the values come from Job Properties or from a 'Parameter Manager' - a third-party tool. There is no way you would hard-code some parameters in your jobs. The most often parameterized variables in a job are: DB DSN name, username, password, and dates with respect to the data to be looked up.
How can I create a dynamic column name in Cognos?
1. Create a calculated column which contains the information that the header is to contain, such as "Report for year 1999" (concatenated text and date-to-string substring extraction). 2. Highlight the report, and then right-click. 3. Select Properties, and then click the Headers/Footers tab. 4. Clear the Column Title Header check box. This will remove the headers from your columns. 5. Reinsert the rest of the column headers; Insert Text will work. For the dynamic column, from the Insert menu, click Data, select the calculated column you created, and insert it into the report. How to improve report performance in ReportNet?
By aggregation. Try to use fewer filters in the report. Use star schema grouping and check the joins. Drill-through functionality increases performance as well. Always try to use the appropriate functionality that can save time; in this way you can improve the performance of a report. What is object security in Framework Manager?
Object security restricts access to specific objects in a project. For example, if you have a user or group of users who you don't want to have access to a particular namespace, you can remove access to that namespace for that user or group. Objects which can have security applied include namespaces, query subjects, query items, filters and folders. You can either Allow (make visible) or Deny (not visible) access to these objects. What is a catalog, and what types of catalogs are there in Cognos?
A catalog is a file containing the information (database tables) that Impromptu users need to create reports. The types are: personal, distributed, shared, and secured. What is the difference between a cascading report and a drill-through report? Why do we go for a drill-through report?
A cascading prompt means populating one prompt using values from another prompt. A drill-through report means extracting the data by selecting a column in a report, i.e. getting the detailed information by clicking on that column's data. How to join multiple databases in a catalog?
Multiple databases cannot be connected in a single catalog, so we use a HotFile for this purpose. How can I schedule reports in Cognos?
By using Cognos Scheduler, one can schedule the running of reports in Impromptu to execute and save them in the desired format.
By using the Cognos macro script language, reports can be executed and distributed to recipients via mail applications. Compiled Cognos macros can be scheduled using Cognos Scheduler. What are Cognos Visualizer and Cognos Scripting?
Cognos Visualizer: Cognos Visualizer provides powerful visual analysis to communicate complex business data quickly and intuitively. By going beyond simple pie or bar graphs, you can view and combine data in a way that lets you immediately see areas of concern, trends, and other indicators. Cognos Scripting: You can use the Cognos Script editor, or any other OLE-compliant editor, to create, modify, compile, and run macros. A macro is a set of commands that automates the running of an application. You can automate most Cognos Business Intelligence applications by using the Cognos Script language (a BASIC-like language), the Cognos Script Editor, and OLE automation. What is the difference between group and associate?
When you group data, Impromptu sorts the data item you are grouping and removes duplicate values. You can organize your report by grouping on one or more data items. By associating a grouped data item with another data item, you can eliminate duplicate information from your report. You can associate one or more data items with a grouped data item. An associated data item should have a one-to-one relationship with the grouped data item. For example, Order No. is a grouped data item. For each order number, there is an Order Date. Order No. and Order Date have a one-to-one relationship. What are drill-down and slicing and dicing? What is the difference between them?
Drilling lets you quickly move from one level of detail to another to explore different aspects of your business. Drilling down moves you down in the hierarchy; drilling up moves you up in the hierarchy. Slicing and dicing: while you can drill to look at information at any level, you can slice and dice to select the exact information you want in your report. How to generate cubes in Cognos?
PowerPlay Transformer contains dimensions, measures, models and cubes. We can generate a cube in different ways: 1. Just right-click on the cube name and build it. 2. Write a script in UNIX and use it to generate the cube. What is a snapshot?
A snapshot is a copy of data. When we create a snapshot, it copies the exact data related to that particular report. We use snapshots whenever we want to compare reports (e.g. comparing this month's report with the previous month's). What is meant by a junk dimension?
A junk dimension is simply a structure that provides a convenient place to store the junk attributes. What is an IQD? What is contained in an IQD? How do you create an IQD in the ReportNet Framework?
1. Create a new namespace.
2. Create a new query subject inside IQDReportNet. 3. In the query subject definition window, pull in all the data you need. 4. After creating it, click on the new query subject and, from the properties pane, select "IQD" from the drop-down menu of externalizeMethod. Done. To publish, you need to create a package: 1. Create a new package; a wizard will open. 2. Name the new package; select the IQDReportNet object from the project. 3. Add security, select the language, select the function list (DB), and select the location in the wizard. 4. Publish the package. When you import data into a catalog you have complex column names. How do you change the names of those columns?
Open Impromptu Administrator -> Catalog -> Folders option. Click the Edit button; from the left pane, select the particular column you want to rename and drag it into the right pane, then select that column, click the Rename button, and change the name. What is a loop in Framework Manager?
Loop: a loop is a closed path (relation) that exists among 3 (or more) tables. For example, if we have 3 tables T1, T2, T3, then a loop exists among these tables only when we create joins in the following fashion: Loop:
T1 ---> T2 ---> T3 ---> T1
To resolve the above problem, we have to create a shortcut (or alias) to the table T1. No loop:
T1 ---> T2 ---> T3 ---> Alias (or) Shortcut of T1
What is the security module used in cognos?
Cognos uses the Kerberos security model, which was developed in the 1980s. How can I test reports in Cognos?
In Cognos ReportNet a report can be tested with the Validate Report option. If there is any error, it will specify the error; otherwise it will give the message 'report specification is valid'.
1. What is the use of the SDR function?
2. Is there any in-built transform for surrogate keys?
3. What is the use of DSAttachJob/DSDetachJob? Where can we find them?
4. What is the most complex job you have created? Tell me the logic of that complex job.
5. How can I know which job is working on which node?
6. How can I know the number of nodes in UNIX, and which are working at present?
7. In the repository tree, how do you find which is a server job and which is a parallel job?
8. How can we create a rank using DataStage? What is the meaning of rank?
9. How do you convert a date in 20-12-07 to dec-20-2007 or 20-dec-2007 or 20-dec-07 in parallel jobs?
10. What is job control? What is the use of it? Explain with steps.
11. Is it possible for two users to access the same job at the same time in DataStage?
12. How do you drop an index before loading data into the target, and how do you rebuild it in DS?
13. What is the meaning of an instance in DataStage? Explain with an example.
14. How do you clean the DS repository?
15. Where are the log files or tables stored in DS?
16. How can we create read-only jobs in DS?
17. Where do we use a config file as a parameter? Tell me about it in the context of your project.
18. Job locking methods: how can we unlock a job? Explain briefly.
19. Is there any possibility of generating an alphanumeric surrogate key? Explain briefly.
20. What are .ctl (control) files? How does the Dataset stage get better performance through these files?
21. How can we view the data of a dataset from disk, other than via right-click > View Data?
22. How can we get a BASIC Transformer in parallel jobs?
23. What are the new features of DataStage 7.1 compared to DataStage 6.1?
24. What is the difference between DataStage and DataStage Scripting?
25. If we are using two sources having the same metadata, how do we check whether the data in the two sources is the same or not? And if the data is not the same and I want to abort the job, how can we do this?
26. Memory-wise, what is the difference between Lookup, Merge and Join?
27. Sparse lookup: where is its option in OCI or DB2?
28. Peek stage and Pivot stage: what is their practical use?
29. How do you reduce warnings?
30. How is input data partitioned across nodes?
31. Involvement in unit testing.
32. How many nodes per partition?
33. The DataStage telnet tool: where is it, and what is its use?
34. How does a node connect with a CPU?
35. What is a dataset used for?
36. How exactly do you schedule your jobs? File watcher jobs? Command watcher jobs?
37. What is Ad-Hoc access? What is the difference between Managed Query and Ad-Hoc access?
38. What are the different ways a project can be moved to production? Examples: export-import and version control.
39. Have you written any custom routines in your project? If so, explain.
40. How do you get log info into a file?
41. What is Clear Status File and when do you use it?
42. What is Cleanup Resources and when do you use it?
43. What are the various reports that could be generated using DataStage?
44. How to remove blank spaces from data?
45. Could DataStage generate test cases?
46. Can one combine pipeline and partition parallelism? How and when?
47. Are there any configuration restrictions for a particular node?
48. What are wrappers?
49. What are buildops, and what is their use?
50. What is page size/array size in Oracle?
51. What is parallel execution in Oracle?
52. What are Bitmap and B-Tree indexes?
53. How can one tune SQL in Oracle?
54. How do we create an index in DataStage?
55. What is the difference between IBM WebSphere DataStage 7.5 (Enterprise Edition) and standard Ascential DataStage 7.5?
56. What is the difference between the Dynamic RDBMS stage and the Static RDBMS stage?
57. How to enable the DataStage engine?
58. Is a Type 30D hash file GENERIC or SPECIFIC?
59. How to attach a .mtr file (MapTrace) via email? The MapTrace is used to record all the execute-map errors.
60. What are the Orchestrate options in the generic stage? What are the option names and values? Name an Orchestrate operator to call. What Orchestrate operators are available in DataStage for the AIX environment?
61. What does scripting mean? Explain briefly how it controls a job.
62. How will you design file watcher jobs?
63. OLAP uses which index?
64. What is the use of a correlated query?
65. When exactly do we use nested tables?
66. What are range lookups?
67. What are connected and unconnected lookups?
68. How do you catch bad rows from an OCI stage?
69. What does CLI stand for?
70. How do you do SQL tuning?
71. What is the meaning of 'pre-load into memory' in a hash file?
72. How did you handle an aborted sequencer?
73. In UNIX, what is poll?
74. In parallel jobs, how can I abort a particular job after 50 warnings, and where do I find the checkpoint for that job (not in a sequencer or job sequence)?
75. What are the problems you faced in your project?
76. Is the truncate option available for ODBC?
77. What is the exact difference between a transform, a routine, and a function?
78. What is the unique function? Where can I find it? How does it delete duplicate records?
79. How exactly do we use stored procedures? Explain the steps briefly.
80. Of 4 jobs, the 1st job aborted after loading 65,000 records instead of 100,000 records; how can you load the rest of the records into the target?
81. If RCP is disabled, what happens?
82. Tell me about OSH. What is its use? Where do we use it?
83. Please tell me the DataStage versions and their release years.
84. What is the difference between a routine and a subroutine?
85. What is the difference between inter-process and in-process?
86. How to use rank and update strategy in DataStage?
87. How can we convert a date (dd/mm/yy) into a Julian date in parallel jobs?
88. How does a lock happen to a job? In particular, if I want to lock a job, how do I lock it?
89. Where do we find log tables?
90. Why do we prefer an Oracle sequence rather than a DataStage routine?
91. How do we find the number of records in a hash file?
92. What is the difference between an inline query and a subquery?
93. What is a conformed dimension layer?
94. If remove-duplicates is possible in Sort, why do we separately go for the Remove Duplicates stage?
95. If a column contains data like ram,rakesh,madhan,suraj,pradeep,bhaskar, and I want to place the names separated by commas into other columns, how can we do that?
96. In the Director job view log, if we get a fatal error with an error code number, how do we debug with that error code number?
Understanding Indexes Submitted by rleishman on Sun, 2007-02-04 04:06
Of iPods and Indexes
I'm not really an "early-adopter" of technology. Don't get me wrong; I love it, I just don't want to feed the addiction. When I do get a new piece of technology though, it's like a fever; I can't concentrate on anything until I've read the manual from cover to cover and found out everything it can do, every built-in gizmo, and every trashy piece of after-market merchandise that can be plugged into it. And I don't think I'm alone here. Working in I.T., there's no shortage of people who can peel off a half-hour litany on their new Blackberry/iPod/Notepad/Digital Watch within a day of purchase.
So why are databases different? I worked with Oracle databases for 5 years before I understood indexes - and it's right there in the manual (Concepts manual, for those interested). I don't mean a deep, spiritual one-ness with indexes; I mean just a basic understanding of the mechanics of the things. I distinctly remember my first tuning methodology: "It's running slow. I think I'll index some of the columns and see if it improves." I should have copyrighted it, because I've seen it used so many times in the last 10 years I could've made a fortune in commissions. If you understand how indexes work, 99 times out of 100 you don't need the suck-it-and-see methodology, because you will know beforehand whether an index will help.
What is an Index? This is covered in the Oracle Concepts manual, of course, but here's the Cliff Notes version.
Blocks
First you need to understand a block. A block - or page for Microsoft boffins - is the smallest unit of disk that Oracle will read or write. All data in Oracle - tables, indexes, clusters - is stored in blocks. The block size is configurable for any given database but is usually one of 4Kb, 8Kb, 16Kb, or 32Kb. Rows in a table are usually much smaller than this, so many rows will generally fit into a single block. So you never read "just one row"; you will always read the entire block and ignore the rows you don't need. Minimising this wastage is one of the fundamentals of Oracle Performance Tuning.
Oracle uses two different index architectures: b-Tree indexes and bitmap indexes. Cluster indexes, bitmap join indexes, function-based indexes, reverse key indexes and text indexes are all just variations on the two main types. b-Tree is the "normal" index, so we will come back to bitmap indexes another time.
The "-Tree" in b-Tree
A b-Tree index is a data structure in the form of a tree - no surprises there - but it is a tree of database blocks, not rows. Imagine the leaf blocks of the index as the pages of a phone book.
o Each page in the book (leaf block in the index) contains many entries, which consist of a name (indexed column value) and an address (ROWID) that tells you the physical location of the telephone (row in the table).
o The names on each page are sorted, and the pages - when sorted correctly - contain a complete sorted list of every name and address.
A sorted list in a phone book is fine for humans, because we have mastered "the flick" - the ability to fan through the book looking for the page that will contain our target without reading the entire page. When we flick through the phone book, we are just reading the first name on each page, which is usually in a larger font in the page header. Oracle cannot read a single name (row) and ignore the rest of the page (block); it needs to read the entire block.
If we had no thumbs, we might find it convenient to create a separate ordered list containing the first name on each page of the phone book along with the page number. This is how the branch blocks of an index work: a reduced list that contains the first row of each block plus the address of that block. In a large phone book, this reduced list containing one entry per page will still cover many pages, so the process is repeated, creating the next level up in the index, and so on until we are left with a single page: the root of the tree. To find the name Gallileo in this b-Tree phone book, we:
o Read page 1. This tells us that page 6 starts with Fermat and that page 7 starts with Hawking.
o Read page 6. This tells us that page 350 starts with the "Fy"s and that page 351 starts with Garibaldi.
o Read page 350, which is a leaf block; we find Gallileo's address and phone number.
That's it; 3 blocks to find a specific row in a million row table. In reality, index blocks often fit 100 or more rows, so b-Trees are typically quite shallow. I have never seen an index with more than 5 levels. Curious? Try this:
SELECT index_name, blevel+1 FROM user_indexes ORDER BY 2;
user_indexes.blevel is the number of branch levels. Always add 1 to include the leaf level; this tells you the number of blocks a unique index scan must read to reach the leaf block. If you're really, really, insatiably curious, try this in SQL*Plus:
ACCEPT index_name PROMPT "Index Name: "
ALTER SESSION SET TRACEFILE_IDENTIFIER = '&index_name';
COLUMN object_id NEW_VALUE object_id
SELECT object_id
FROM   user_objects
WHERE  object_type = 'INDEX'
AND    object_name = upper('&index_name');
ALTER SESSION SET EVENTS 'IMMEDIATE TRACE NAME TREEDUMP LEVEL &object_id';
ALTER SESSION SET TRACEFILE_IDENTIFIER = "";
SHOW PARAMETER user_dump_dest

Give the name of an index on a smallish table (because this will create a BIG file). Now, on the Oracle server, go to the directory shown by the final SHOW PARAMETER user_dump_dest command and find your trace file - the file name will contain your index name. Here is a sample:
2007-01-31 11:51:26.822
----- begin tree dump
branch: 0x68066c8 109078216 (0: nrow: 325, level: 1)
   leaf: 0x68066c9 109078217 (-1: nrow: 694 rrow: 694)
   leaf: 0x68066ca 109078218 (0: nrow: 693 rrow: 693)
   leaf: 0x68066cb 109078219 (1: nrow: 693 rrow: 693)
   leaf: 0x68066cc 109078220 (2: nrow: 693 rrow: 693)
   leaf: 0x68066cd 109078221 (3: nrow: 693 rrow: 693)
   ...
   ...
   leaf: 0x68069cf 109078991 (320: nrow: 763 rrow: 763)
   leaf: 0x68069d0 109078992 (321: nrow: 761 rrow: 761)
   leaf: 0x68069d1 109078993 (322: nrow: 798 rrow: 798)
   leaf: 0x68069d2 109078994 (323: nrow: 807 rrow: 807)
----- end tree dump
This index has only a root branch, with 323 leaf nodes. Each leaf node contains a variable number of index entries - up to 807! A deeper index would be more interesting, but it would take a while to dump.
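The "typically quite shallow" claim follows from the fan-out arithmetic; a sketch that estimates the blocks read by a unique index scan (the entries-per-block figures are illustrative):

```python
import math

def btree_levels(n_rows, entries_per_block):
    """Estimate the number of blocks a unique index scan must read
    (branch levels plus the leaf), i.e. the height of the b-Tree."""
    levels = 1
    blocks = math.ceil(n_rows / entries_per_block)  # leaf blocks needed
    while blocks > 1:
        levels += 1                                 # one more branch level
        blocks = math.ceil(blocks / entries_per_block)
    return levels
```

With around 100 entries per block, a million-row table needs only 3 block reads per lookup, which is why indexes with more than 5 levels are rare.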
"B" is for...
Contrary to popular belief, b is not for binary; it's balanced. As you insert new rows into the table, new rows are inserted into index leaf blocks. When a leaf block is full, another insert will cause the block to be split into two blocks, which means an entry for the new block must be added to the parent branch block. If the branch block is also full, it too is split. The process propagates back up the tree until the parent of a split block has space for one more entry, or the root is reached. A new root is created if the root node splits. Staggeringly, this process ensures that every branch will be the same length. Try it on paper for yourself!
How are Indexes used? Indexes have three main uses:
o
To quickly find specific rows by avoiding a
Full
Table Scan
We've already seen above how a Unique Scan works. Using the phone book metaphor, it's not hard to understand how a Range Scan works in much the same way to find all people named "Gallileo", or all of the names alphabetically between "Smith" and "Smythe". Range Scans can occur when we use >, <, LIKE, or BETWEEN in a WHERE clause. A range scan will find the first row in the range using the same technique as the Unique Scan, but will then keep reading the index up to the end of the range. It is OK if the range covers many blocks.
o
To avoid a table access altogether If all we wanted to do when looking up Gallileo in the phone book was to find his address or phone number, the job would be done. However if we wanted to know his date of birth, we'd have to phone and ask. This takes time. If it was something that we needed all the time, like an email address, we could save time by adding it to the phone book.
Oracle does the same thing. If the information is in the index, then it doesn't bother to read the table. It is a reasonably common technique to add columns to an index, not because they will be used as part of the index scan, but because they save a table access. In fact, Oracle may even perform a Fast Full Scan of an index that it cannot use in a Range or Unique scan just to avoid a table access.
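The idea of a "covering" index entry can be shown with a toy lookup structure (all of the data and names below are invented for illustration):

```python
# A toy "covering index". The table maps rowid -> full row; an index maps
# a key to a rowid, optionally carrying extra columns in the entry itself.
table = {
    7: {"name": "Galileo", "phone": "555-1610", "email": "galileo@example.com"},
}

idx_name = {"Galileo": 7}                                  # index on (name)
idx_name_email = {"Galileo": ("galileo@example.com", 7)}   # index on (name, email)

def email_with_table_access(name):
    rowid = idx_name[name]           # index scan...
    return table[rowid]["email"]     # ...followed by a table access

def email_index_only(name):
    email, _rowid = idx_name_email[name]
    return email                     # answered from the index entry alone

assert email_with_table_access("Galileo") == email_index_only("Galileo")
```

Both return the same answer, but the second never touches `table` at all, which is the saving Oracle makes when the index covers every referenced column.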
o To avoid a sort

This one is not so well known, largely because it is so poorly documented (and in many cases, unpredictably implemented by the Optimizer as well). Oracle performs a sort for many reasons: ORDER BY, GROUP BY, DISTINCT, set operations (eg. UNION), Sort-Merge Joins, uncorrelated IN-subqueries, and Analytic Functions. If a sort operation requires rows in the same order as the index, then Oracle may read the table rows via the index. A sort operation is not necessary, since the rows are returned in sorted order.
Despite all of the instances listed above where a sort is performed, I have only seen three cases where a sort is actually avoided.
1. GROUP BY

  1  select src_sys, sum(actl_expns_amt), count(*)
  2  from ef_actl_expns
  3  where src_sys = 'CDW'
  4  and actl_expns_amt > 0
  5* group by src_sys

------------------------------------------------------------------------
| Id  | Operation                            | Name          |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |               |
|   1 |  SORT GROUP BY NOSORT                |               |
|*  2 |   TABLE ACCESS BY GLOBAL INDEX ROWID | EF_ACTL_EXPNS |
|*  3 |    INDEX RANGE SCAN                  | EF_AEXP_PK    |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("ACTL_EXPNS_AMT">0)
   3 - access("SRC_SYS"='CDW')

Note the NOSORT qualifier in Step 1.
2. ORDER BY

  1  select *
  2  from ef_actl_expns
  3  where src_sys = 'CDW'
  4  and actl_expns_amt > 0
  5* order by src_sys

------------------------------------------------------------------------
| Id  | Operation                            | Name          |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |               |
|*  1 |  TABLE ACCESS BY GLOBAL INDEX ROWID  | EF_ACTL_EXPNS |
|*  2 |   INDEX RANGE SCAN                   | EF_AEXP_PK    |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("ACTL_EXPNS_AMT">0)
   2 - access("SRC_SYS"='CDW')

Note that there is no SORT operation, despite the ORDER BY clause. Compare this to the following:
  1  select *
  2  from ef_actl_expns
  3  where src_sys = 'CDW'
  4  and actl_expns_amt > 0
  5* order by actl_expns_amt

------------------------------------------------------------------------
| Id  | Operation                            | Name          |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |               |
|   1 |  SORT ORDER BY                       |               |
|*  2 |   TABLE ACCESS BY GLOBAL INDEX ROWID | EF_ACTL_EXPNS |
|*  3 |    INDEX RANGE SCAN                  | EF_AEXP_PK    |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("ACTL_EXPNS_AMT">0)
   3 - access("SRC_SYS"='CDW')
3. DISTINCT

  1  select distinct src_sys
  2  from ef_actl_expns
  3  where src_sys = 'CDW'
  4* and actl_expns_amt > 0

------------------------------------------------------------------------
| Id  | Operation                            | Name          |
------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |               |
|   1 |  SORT UNIQUE NOSORT                  |               |
|*  2 |   TABLE ACCESS BY GLOBAL INDEX ROWID | EF_ACTL_EXPNS |
|*  3 |    INDEX RANGE SCAN                  | EF_AEXP_PK    |
------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("ACTL_EXPNS_AMT">0)
   3 - access("SRC_SYS"='CDW')

Again, note the NOSORT qualifier.
This is an extraordinary tuning technique in OLTP systems like SQL*Forms that return one page of detail at a time to the screen. A SQL with a DISTINCT, GROUP BY, or ORDER BY that uses an index to sort can return just the first page of matching rows without having to fetch the entire result set for a sort. This can be the difference between sub-second response time and several minutes or hours.
Everybody repeat after me: "Full Table Scans are not bad"

Up to now, we've seen how indexes can be good. It's not always the case; sometimes indexes are no help at all, or worse: they make a query slower.

A b-Tree index will be no help at all in a reduced scan unless the WHERE clause compares indexed columns using >, <, LIKE, IN, or BETWEEN operators. A b-Tree index cannot be used to scan for any NOT-style operators: eg. !=, NOT IN, NOT LIKE. There are lots of conditions, caveats, and complexities regarding joins, sub-queries, OR predicates, functions (inc. arithmetic and concatenation), and casting that are outside the scope of this article. Consult a good SQL tuning manual.

Much more interesting - and important - are the cases where an index makes a SQL slower. These are particularly common in batch systems that process large quantities of data.
To explain the problem, we need a new metaphor. Imagine a large deciduous tree in your front yard. It's Autumn, and it's your job to pick up all of the leaves on the lawn. Clearly, the fastest way to do this (without a rake, or a leaf-vac...) would be to get down on hands and knees with a bag and work your way back and forth over the lawn, stuffing leaves into the bag as you go. This is a Full Table Scan, selecting rows in no particular order, except that they are nearest to hand. This metaphor works on a couple of levels: you would grab leaves in handfuls, not one by one. A Full Table Scan does the same thing: when a block is read from disk, Oracle caches the next few blocks with the expectation that it will be asked for them very soon. Type this in SQL*Plus:
SHOW PARAMETER db_file_multiblock_read_count
Just to shake things up a bit (and to feed an undiagnosed obsessive compulsive disorder), you decide to pick up the leaves in order of size. In support of this endeavour, you take a digital photograph of the lawn, write an image analysis program to identify and measure every leaf, then load the results into a Virtual Reality headset that will highlight the smallest leaf left on the lawn. Ingenious, yes; but this is clearly going to take a lot longer than a full table scan because you cover much more distance walking from leaf to leaf.

So obviously a Full Table Scan is the faster way to pick up every leaf. But just as obviously, the index (Virtual Reality headset) is the faster way to pick up just the smallest leaf, or even the 100 smallest leaves. As the number rises, we approach a break-even point: a number beyond which it is faster to just Full Table Scan. This number varies depending on the table, the index, the database settings, the hardware, and the load on the server; generally it is somewhere between 1% and 10% of the table. The main reasons for this are:
o As implied above, reading a table in indexed order means more movement for the disk head.
o The db_file_multiblock_read_count setting described earlier means a FTS requires fewer visits to the physical disk.
o Oracle cannot read single rows. To read a row via an index, the entire block must be read with all but one row discarded. So an index scan of 100 rows would read 100 blocks, but a FTS might read 100 rows in a single block.
o Even if none of these things was true, accessing the entire index and the entire table is still more IO than just accessing the table.
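The trade-off in the list above can be put into rough numbers. Here is a deliberately crude Python model; the block count, rows per block, and read-ahead factor are all invented assumptions, and real break-even points shift with caching, clustering factor, and load:

```python
# Back-of-envelope model of the index vs Full Table Scan break-even point.
# All numbers are illustrative assumptions, not Oracle internals.
TABLE_BLOCKS = 10_000      # blocks in the table
ROWS_PER_BLOCK = 100       # rows packed into each block
MULTIBLOCK = 8             # cf. db_file_multiblock_read_count

def fts_io():
    """A full scan reads every block, several blocks per physical read."""
    return TABLE_BLOCKS / MULTIBLOCK

def index_io(rows_wanted):
    """Crude model: each row fetched via the index costs one single-block
    table read (index leaf reads are ignored for simplicity)."""
    return rows_wanted

for n in (100, 1_000, 100_000):
    winner = "index" if index_io(n) < fts_io() else "full scan"
    print(f"{n:>7} rows: index={index_io(n)} reads, fts={fts_io():.0f} reads -> {winner}")
```

Under these assumptions the full scan costs a fixed 1,250 reads no matter how many rows qualify, so the index wins for small row counts and loses badly once the row count climbs past the break-even.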
So what's the lesson here? Know your data! If your query needs 50% of the rows in the table, an index scan just won't help. Not only should you not bother creating or investigating the existence of an index, you should check to make sure Oracle is not already using an index. There are a number of ways to influence index usage; once again, consult a tuning manual. The exception to this rule - there's always one - is when all of the columns referenced in the SQL are contained in the index. If Oracle does not have to access the table, then there is no break-even point; it is generally quicker to scan the index even for 100% of the rows.
Summary

Indexes are not a dark art; they work in an entirely predictable and even intuitive way. Understanding how they work moves Performance Tuning from the realm of guesswork to that of science; so embrace the technology and read the manual.
rleishman's blog
Fantastic article
Submitted by sethunathu on Mon, 2009-07-06 22:47.

Hi Ross,
This is an absolutely fantastic demonstration of the working of indexes in Oracle. It is very simple to understand... full credit to your way of detailing.
Regards,
Sethu
You're awesome
Submitted by atomikrej on Mon, 2009-12-07 18:34.

Thanks for the article, it was very helpful. After reading this I removed unnecessary indexes on some columns and my query runs much faster!
good work
Submitted by ultgnp on Sun, 2010-06-13 01:25.

One of the best articles I've read on indexes. Appreciate your work. Please keep posting such good articles.