Informatica® PowerCenter® 8 Level I Developer Student Guide Version - PC8LID 20060428
Informatica PowerCenter 8 Level I Developer Student Guide Version 8.1 April 2006
Copyright (c) 1998–2006 Informatica Corporation. All rights reserved. Printed in the USA. This software and documentation contain proprietary information of Informatica Corporation and are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software license agreement and as provided in DFARS 227.7202-1(a) and 227.7202-3(a) (1995), DFARS 252.227-7013(c)(1)(ii) (OCT 1988), FAR 12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14 (ALT III), as applicable. The information in this document is subject to change without notice. If you find any problems in the documentation, please report them to us in writing. Informatica Corporation does not warrant that this documentation is error free. Informatica, PowerMart, PowerCenter, PowerChannel, PowerCenter Connect, MX, and SuperGlue are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners. Portions of this software are copyrighted by DataDirect Technologies, 1999-2002. Informatica PowerCenter products contain ACE (TM) software copyrighted by Douglas C. Schmidt and his research group at Washington University and University of California, Irvine, Copyright (c) 1993-2002, all rights reserved. Portions of this software contain copyrighted material from The JBoss Group, LLC.
Your right to use such materials is set forth in the GNU Lesser General Public License Agreement, which may be found at http://www.opensource.org/licenses/lgpl-license.php. The JBoss materials are provided free of charge by Informatica, “as-is”, without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose. Portions of this software contain copyrighted material from Meta Integration Technology, Inc. Meta Integration® is a registered trademark of Meta Integration Technology, Inc. This product includes software developed by the Apache Software Foundation (http://www.apache.org/). The Apache Software is Copyright (c) 1999-2005 The Apache Software Foundation. All rights reserved. This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit and redistribution of this software is subject to terms available at http://www.openssl.org. Copyright 1998-2003 The OpenSSL Project. All Rights Reserved. The zlib library included with this software is Copyright (c) 1995-2003 Jean-loup Gailly and Mark Adler. The Curl license provided with this Software is Copyright 1996-200, Daniel Stenberg. All Rights Reserved. The PCRE library included with this software is Copyright (c) 1997-2001 University of Cambridge. Regular expression support is provided by the PCRE library package, which is open source software, written by Philip Hazel. The source for this library may be found at ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre. InstallAnywhere is Copyright 2005 Zero G Software, Inc. All Rights Reserved. Portions of the Software are Copyright (c) 1998-2005 The OpenLDAP Foundation. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted only as authorized by the OpenLDAP Public License, available at http://www.openldap.org/software/release/license.html. This Software is protected by U.S. Patent Numbers 6,208,990; 6,044,374; 6,014,670; 6,032,158; 5,794,246; 6,339,775 and other U.S. Patents Pending. DISCLAIMER: Informatica Corporation provides this documentation “as is” without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of non-infringement, merchantability, or use for a particular purpose. The information provided in this documentation may include technical inaccuracies or typographical errors. Informatica could make improvements and/or changes in the products described in this documentation at any time without notice.
Table of Contents List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix About This Guide . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xx Purpose . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xx Audience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xx Document Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .xx Other Informatica Resources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi Obtaining Informatica Documentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi Visiting Informatica Customer Portal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi Visiting the Informatica Web Site . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi Visiting the Informatica Developer Network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi Visiting the Informatica Knowledge Base . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi Obtaining Informatica Professional Certification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxi Providing Feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxii Obtaining Technical Support . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xxii
Unit 1: Data Integration Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 Lesson 1-1. Introducing Informatica . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 Lesson 1-2. Data Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 Lesson 1-3. Mappings and Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 Lesson 1-4. Tasks and Workflows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 Tasks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 Workflows . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 Lesson 1-5. Metadata . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Unit 2: PowerCenter Components and User Interface . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 Lesson 2-1. PowerCenter Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 Lesson 2-2. PowerCenter Client Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 Designer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 Workflow Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Unit 2 Lab: Using the Designer and Workflow Manager . . . . . . . . . . . . . . . . . . . . . . . . . . 17 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 Step 1: Launch the Designer and Log Into the Repository . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 Step 2: Navigate Folders . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 Step 3: Navigating the Designer Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 Table of Contents Informatica PowerCenter 8 Level I Developer
Step 4: Create and Save Shortcuts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 Step 5: Launch the Workflow Manager . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 Step 6: Navigating the Workflow Manager Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 Step 7: Workflow Manager Task Toolbar . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 Step 8: Database Connection Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Unit 3: Source Qualifier . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 Lesson 3-1. Source Qualifier Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 Source Qualifier Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 Datatype Conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 Lesson 3-2. Velocity Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 Lab Project . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 Architecture and Connectivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Unit 3 Lab A: Load Payment Staging Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 Section 1: Pass-Through Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 Step 1: Launch the Designer and Review the Source and Target Definitions . . . . . . . . . . . . . . 38 Step 2: Create a Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 Step 3: Create a Workflow and a Session Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 Step 4: Run the Workflow and Monitor the Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 Data Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 Lesson 3-3. Source Qualifier Joins . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Unit 3 Lab B: Load Product Staging Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 Section 2: Homogeneous Join . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 Step 1: Import the Source Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 Step 2: Import the Relational Target Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 Step 3: Create the Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 Step 4: Create the Session and Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 Step 5: Run the Workflow and Monitor the Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 Lesson 3-4. Source Pipelines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Unit 3 Lab C: Load Dealership and Promotions Staging Table . . . . . . . . . . . . . . . . . . . . 59 Section 3: Two Pipeline Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 Step 1: Import the Source Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 Step 2: Import the Target Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 Step 3: Create the Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 Step 4: Create and Run the Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Unit 4: Expression, Filter, File Lists, and Workflow Scheduler . . . . . . . . . . . . . . . . . . . . 67 Lesson 4-1. Expression Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 Lesson 4-2. Filter Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 Lesson 4-3. File Lists . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 Lesson 4-4. Workflow Scheduler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 Run Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Unit 4 Lab: Load the Customer Staging Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 Step 1: Create a Flat File Source Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 Step 2: Create a Relational Target Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82 Step 3: Create a Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 Step 4: Create a Filter Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 Step 5: Create an Expression Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 Step 6: Create and Run the Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88 Step 7: Schedule a Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Unit 5: Joins, Features and Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 Lesson 5-1. Joiner Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 Joiner Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99 Join Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 Joiner Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 Lesson 5-2. Shortcuts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
Unit 5 Lab A: Load Sales Transaction Staging Table . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 Step 1: Create a Flat File Source Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 Step 2: Create a Relational Source Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 Step 3: Create a Relational Target Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 Step 4: Create a Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 Step 5: Create a Joiner Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 Step 6: Link the Target Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 Step 7: Create a Workflow and Session Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110 Step 8: Start the Workflow and View Results in the Workflow Monitor . . . . . . . . . . . . . . . . 111 Data Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Unit 5 Lab B: Features and Techniques I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 Open a Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 Feature 1: Auto Arrange . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 Feature 2: Remove Links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
Feature 3: Revert to Saved . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118 Feature 4: Link Path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 Feature 5: Propagating Ports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 Feature 6: Autolink by Name and Position . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121 Feature 7: Moving Ports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 Feature 8: Shortcut to Port Editing from Normal View . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 Feature 9: Create Transformation Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 Feature 10: Scale-to-Fit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125 Feature 11: Designer Options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 Feature 12: Object Shortcuts and Copies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 Feature 13: Copy Objects Within and Between Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . 128
Unit 6: Lookups and Reusable Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 Lesson 6-1. Lookup Transformation (Connected) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 Lesson 6-2. Reusable Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
Unit 6 Lab A: Load Employee Staging Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142 Step 1: Create a Flat File Source Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142 Step 2: Create a Relational Target Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142 Step 3: Create a Reusable Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142 Step 4: Create a Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144 Step 5: Create a Lookup Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145 Step 6: Add a Reusable Expression Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 Step 7: Link Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147 Step 8: Create and Run the Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
Unit 6 Lab B: Load Date Staging Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 Step 1: Create a Flat File Source Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 Step 2: Create a Relational Target Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 Step 3: Create a Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157 Step 4: Create a Workflow and a Session Task . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159 Step 5: Run the Workflow and Monitor the Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159 Data Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
Unit 7: Debugger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163 Lesson 7-1. Debugging Mappings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
Unit 7 Lab: Using the Debugger . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170 Step 1: Copy and Inspect the Debug Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
Step 2: Step Through the Debug Wizard . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170 Step 3: Use the Debugger to Locate the Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172 Step 4: Fix the Error and Confirm the Data is Correct . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
Unit 8: Sequence Generator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177 Lesson 8-1. Sequence Generator Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
Unit 8 Lab: Load Date Dimension Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185 Step 1: Create a Shortcut to a Shared Relational Source Table . . . . . . . . . . . . . . . . . . . . . . . 185 Step 2: Create a Shortcut to a Shared Relational Target Table . . . . . . . . . . . . . . . . . . . . . . . 185 Step 3: Create a Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 185 Step 4: Create a Sequence Generator Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186 Step 5: Link the Target Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187 Step 6: Create and Run the Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187 Data Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 189
Unit 9: Lookup Caching, More Features and Techniques. . . . . . . . . . . . . . . . . . . . . . . . 191 Lesson 9-1. Lookup Caching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191 Lookup Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
Unit 9 Lab A: Load Promotions Dimension Table (Lookup and Persistent Cache) . . . . 195 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198 Step 1: Create a Shortcut to a Shared Relational Source Table . . . . . . . . . . . . . . . . . . . . . . . 198 Step 2: Create a Shortcut to Shared Relational Target Table . . . . . . . . . . . . . . . . . . . . . . . . 198 Step 3: Create a Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198 Step 4: Create Lookups for the Start and Expiry Date Keys . . . . . . . . . . . . . . . . . . . . . . . . . 198 Step 5: Create and Run the Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201 Data Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 203
Unit 9 Lab B: Features and Techniques II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206 Open a Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206 Feature 1: Find in Workspace . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206 Feature 2: View Object Dependencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 207 Feature 3: Compare Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208 Feature 4: Overview Window . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
Unit 10: Sorter, Aggregator and Self-Join . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213 Lesson 10-1. Sorter Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213 Sorter Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
Lesson 10-2. Aggregator Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216 Aggregator Cache . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 219 Lesson 10-3. Active and Passive Transformations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220 Lesson 10-4. Data Concatenation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221 Lesson 10-5. Self-Join . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 222
Unit 10 Lab: Reload the Employee Staging Table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230 Step 1: Copy an Existing Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 230 Step 2: Examine Source Data to Determine a Key for Self-Join . . . . . . . . . . . . . . . . . . . . . . 230 Step 3: Prepare the New Mapping for Modification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231 Step 4: Create a Sorter Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231 Step 5: Create a Filter Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 232 Step 6: Create an Aggregator Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233 Step 7: Create a Joiner Transformation for the Self-Join . . . . . . . . . . . . . . . . . . . . . . . . . . . 233 Step 8: Get Salaries from the Lookup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234 Step 9: Connect the Joiner and Lookup to the Target . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 235 Step 10: Create and Run the Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236 Data Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
Unit 11: Router, Update Strategy and Overrides . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241 Lesson 11-1. Router Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241 Lesson 11-2. Update Strategy Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 244 Lesson 11-3. Expression Default Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 246 Lesson 11-4. Source Qualifier Override . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247 Lesson 11-5. Target Override . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249 Lesson 11-6. Session Task Mapping Overrides . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250
Unit 11 Lab: Load Employee Dimension Table . . . . . . . . . . . . . . . . . . . . . . . . . . 253 Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257 Step 1: Copy the Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257 Step 2: Edit the Expression Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 257 Step 3: Create a Router Transformation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258 Step 4: Create an Update Strategy for INSERTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 258 Step 5: Create Lookup to DIM_DATES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 259 Step 6: Link upd_INSERTS and lkp_DIM_DATES_INSERTS to Target DIM_EMPLOYEE_INSERTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260 Step 7: Create an Update Strategy for UPDATES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260 Step 8: Create Second Lookup to DIM_DATES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260 Step 9: Link upd_UPDATES and lkp_DIM_DATES_UPDATES to Target DIM_EMPLOYEE_UPDATES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260 Step 10: Link ERRORS Router Group to DIM_EMPLOYEES_ERR . . . . . . . . . . . . . . . . . . 261 Step 11: Create and Run the Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 261
Table of Contents Informatica PowerCenter 8 Level I Developer
  Data Results . . . 264
  Step 12: Prepare, Run, and Monitor the Second Run . . . 264
Unit 12: Dynamic Lookup and Error Logging . . . 269
  Lesson 12-1. Dynamic Lookup Cache . . . 269
  Lesson 12-2. Error Logging . . . 274
    Error Log Types . . . 275
    Log Row Data . . . 277
    Log Source Row Data . . . 277
Unit 12 Lab: Load Customer Dimension Table . . . 279
  Instructions . . . 282
  Step 1: Create a Relational Source Definition . . . 282
  Step 2: Create a Relational Target Definition . . . 282
  Step 3: Create a Mapping . . . 282
  Step 4: Create a Lookup Transformation . . . 282
  Step 5: Create a Filter Transformation . . . 284
  Step 6: Create an Update Strategy . . . 284
  Step 7: Create and Run the Workflow . . . 285
  Data Results . . . 287
    Error Log Results . . . 288
    Reference . . . 288
Unit 13: Unconnected Lookup, Parameters and Variables . . . 291
  Lesson 13-1. Unconnected Lookup Transformations . . . 291
    Connected versus Unconnected Lookup Transformations . . . 294
    Joins versus Lookups . . . 295
  Lesson 13-2. System Variables . . . 295
  Lesson 13-3. Mapping Parameters and Variables . . . 297
Unit 13 Lab: Load Sales Fact Table . . . 303
  Instructions . . . 307
  Step 1: Create an Internal Relationship Between two Source Tables . . . 307
  Step 2: Create a Mapping Parameter . . . 307
  Step 3: Create an Unconnected Lookup . . . 308
  Step 4: Add Unconnected Lookup Test to Expression . . . 309
  Step 5: Create Aggregator Transformation . . . 309
  Step 6: Create and Run the Workflow . . . 312
  Data Results . . . 314
Unit 14: Mapplets . . . 317
  Lesson 14-1. Mapplets . . . 317
    Mapplets . . . 317
    Mapping Input Transformation . . . 319
    Mapping Output Transformation . . . 321
Unit 14 Lab: Create a Mapplet . . . 325
  Instructions . . . 326
  Step 1: Create the Mapplet . . . 326
  Step 2: Add Mapplet to Mapping . . . 327
Unit 15: Mapping Design . . . 331
  Lesson 15-1. Designing Mappings . . . 331
    High Level Process Overview . . . 332
    Mapping Specifics . . . 332
Unit 15 Workshop: Load Promotions Daily Aggregate Table . . . 335
  Workshop Details . . . 336
    Sources and Targets . . . 336
    Mapping Details . . . 336
    Workflow Details . . . 339
    Run Details . . . 340
Unit 16: Workflow Variables and Tasks . . . 343
  Lesson 16-1. Link Conditions . . . 343
  Lesson 16-2. Workflow Variables . . . 344
  Lesson 16-3. Assignment Task . . . 346
  Lesson 16-4. Decision Task . . . 347
  Lesson 16-5. Email Task . . . 348
Unit 16 Lab: Load Product Weekly Aggregate Table . . . 351
  Instructions . . . 353
  Step 1: Copy the Mappings . . . 353
  Step 2: Copy the Existing Workflow . . . 353
  Step 3: Create the Assignment Task . . . 354
  Step 4: Create the Decision Task . . . 356
  Step 5: Create the Session Task . . . 356
  Step 6: Create the Email Task . . . 357
  Step 7: Start the Workflow and Monitor the Results . . . 358
Unit 17: More Tasks and Reusability . . . 363
  Lesson 17-1. Event Wait Task . . . 363
    Pre-Defined Event . . . 364
    User-Defined Event . . . 365
  Lesson 17-2. Event Raise Task . . . 366
  Lesson 17-3. Command Task . . . 367
  Lesson 17-4. Reusable Tasks . . . 369
  Lesson 17-5. Reusable Session Tasks . . . 369
  Lesson 17-6. Reusable Session Configurations . . . 370
  Lesson 17-7. pmcmd Utility . . . 371
Unit 18: Worklets and More Tasks . . . 373
  Lesson 18-1. Worklets . . . 373
  Lesson 18-2. Timer Task . . . 376
  Lesson 18-3. Control Task . . . 377
Unit 18 Lab: Load Inventory Fact Table . . . 381
  Instructions . . . 383
  Step 1: Copy the Mappings . . . 383
  Step 2: Create a Worklet . . . 383
  Step 3: Create a Session Task . . . 383
  Step 4: Create a Timer Task . . . 383
  Step 5: Create an Email Task . . . 384
  Step 6: Create a Control Task . . . 385
  Step 7: Create the Workflow . . . 386
  Step 8: Start the Workflow and Monitor the Results . . . 387
Unit 19: Workflow Design . . . 389
  Lesson 19-1. Designing Workflows . . . 389
    Workflow Overview . . . 390
    Workflow Specifics . . . 390
Unit 19 Workshop: Load All Staging Tables in Single Workflow . . . 393
  Workshop Details . . . 393
Unit 20: Beyond This Course . . . 397
List of Figures
Figure 2-1. Navigator Window . . . 18
Figure 2-2. DEV_SHARED Folder and Subfolders . . . 18
Figure 2-3. Designer Tools . . . 19
Figure 2-4. DEV_SHARED Target subfolder . . . 21
Figure 2-5. Student folder with new objects . . . 22
Figure 2-6. Application Toolbar . . . 22
Figure 2-7. Task Toolbar Default Position . . . 23
Figure 2-8. Task Toolbar After Being Moved . . . 24
Figure 2-9. Relational Connection Browser . . . 25
Figure 3-1. Normal view of the payment flat file definition displayed in the Source Analyzer . . . 38
Figure 3-2. Mapping with Source and Target Definitions . . . 39
Figure 3-3. Normal view of the completed mapping . . . 40
Figure 3-4. Completed Session Task Target Properties . . . 42
Figure 3-5. Completed Workflow . . . 43
Figure 3-6. Successful Run of a Workflow Depicted in the Task View of the Workflow Monitor . . . 43
Figure 3-7. Properties for the Completed Session Run . . . 44
Figure 3-8. Source/Target Statistics for the Completed Session Run . . . 44
Figure 3-9. Data Preview of the STG_PAYMENT Target Table . . . 45
Figure 3-10. Source Definitions with a PK/FK Relationship Displayed in the Source Analyzer . . . 52
Figure 3-11. Normal View of the Completed Mapping . . . 54
Figure 3-12. Generated SQL for the m_Stage_Product Mapping . . . 54
Figure 3-13. Properties of the Completed Session Run . . . 56
Figure 3-14. Source/Target Statistics for the Completed Session Run . . . 56
Figure 3-15. Data Preview of the STG_PRODUCT Target Table . . . 56
Figure 3-16. Normal view of the promotions flat file definition displayed in the Source Analyzer . . . 62
Figure 3-17. Iconic View of the Completed Mapping . . . 63
Figure 3-18. Properties of the Completed Session Run . . . 63
Figure 3-19. Source/Target Statistics for the Completed Session Run . . . 64
Figure 3-20. Data Preview of the STG_DEALERSHIP Target Table . . . 64
Figure 3-21. Data Preview of the STG_PROMOTIONS Target Table . . . 65
Figure 4-1. Source Analyzer View of the customer_layout Flat File Definition . . . 82
Figure 4-2. Target Designer View of the STG_CUSTOMERS Table Relational Definition . . . 83
Figure 4-3. Mapping with Source and Target Definitions . . . 83
Figure 4-4. Mapping with Newly Added Filter Transformation . . . 84
Figure 4-5. Properties Tab of the Filter Transformation . . . 85
Figure 4-6. Completed Properties Tab of the Filter Transformation . . . 85
Figure 4-7. Filter Transformation Linked to the Expression Transformation . . . 86
Figure 4-8. Sample Expression . . . 88
Figure 4-9. Iconic View of the Completed Mapping . . . 88
Figure 4-10. Session Task Source Properties . . . 90
Figure 4-11. Contents of the customer_list.txt File List . . . 90
Figure 4-12. Properties for the Completed Session Run . . . 91
Figure 4-13. Source/Target Statistics for the Completed Session Run . . . 91
Figure 4-14. Data Preview of the STG_CUSTOMERS Target Table . . . 92
Figure 4-15. General Properties for the Workflow . . . 93
Figure 4-16. Customized Repeat Selections . . . 94
Figure 4-17. Completed Schedule Options . . . 94
Figure 5-1. Normal View of the Heterogeneous Sources, Source Qualifiers and Target . . . 106
Figure 5-2. Joiner Transformation Button . . . 107
Figure 5-3. Normal View of Heterogeneous Sources Connected to a Joiner Transformation . . . 107
Figure 5-4. Edit View of the Ports Tab for the Joiner Transformation . . . 108
Figure 5-5. Edit View of the Condition Tab for Joiner Transformation Without a Condition . . . 108
Figure 5-6. Edit View of the Condition Tab for the Joiner Transformation with Completed Condition . . . 109
Figure 5-7. Normal View of Completed Mapping Heterogeneous Sources Not Displayed . . . 110
Figure 5-8. Task Details of the Completed Session Run . . . 111
Figure 5-9. Source/Target Statistics for the Session Run . . . 111
Figure 5-10. Data Preview of the STG_TRANSACTIONS Table . . . 112
Figure 5-11. View of an Unorganized Mapping . . . 116
Figure 5-12. Arranged View of a Mapping . . . 117
Figure 5-13. Iconic View of an Arranged Mapping . . . 117
Figure 5-14. Selecting Multiple Links . . . 118
Figure 5-15. Designer Warning Box . . . 118
Figure 5-16. Selecting the forward link path . . . 119
Figure 5-17. Highlighted forward link path . . . 119
Figure 5-18. Highlighted link path going forward and backward . . . 120
Figure 5-19. Selecting to propagate the attributes . . . 120
Figure 5-20. Propagation attribute dialog box . . . 121
Figure 5-21. Autolink dialog box . . . 122
Figure 5-22. Defining a prefix in the autolink dialog box . . . 123
Figure 5-23. Expression after the AGE port has been moved . . . 124
Figure 5-24. Click and drag method of moving ports . . . 124
Figure 5-25. Creating a transformation using the menu . . . 125
Figure 5-26. Create Transformation dialog box . . . 125
Figure 5-27. Normal View of the Newly Created Aggregator Transformation . . . 125
Figure 5-28. Zoom options . . . 126
Figure 5-29. Navigator window in the Designer . . . 127
Figure 6-1. Source Analyzer view of the employees_layout flat file definition . . . 142
Figure 6-2. Target Designer view of the STG_EMPLOYEES relational table definition . . . 142
Figure 6-3. Transformation edit dialog box showing how to make a transformation reusable . . . 143
Figure 6-4. Question box letting you know the action is irreversible . . . 143
Figure 6-5. Transformation edit dialog box of a reusable transformation . . . 143
Figure 6-6. Navigator window depicting the Transformations node . . . 144
Figure 6-7. Partial mapping with source and target . . . 144
Figure 6-8. Transformation Toolbar . . . 145
Figure 6-9. Lookup Transformation table location dialog box . . . 145
Figure 6-10. Dialog box 1 of the 3 step Flat File Import Wizard . . . 145
Figure 6-11. Normal view of the newly created Lookup Transformation . . . 146
Figure 6-12. Lookup Transformation condition box . . . 147
Figure 6-13. Source properties for the employee_list file list . . . 148
Figure 6-14. Task Details of the completed session run . . . 149
Figure 6-15. Source/Target Statistics of the completed session run . . . 149
Figure 6-16. Data Preview of the STG_EMPLOYEES target table . . . 150
Figure 6-17. Mapping with Source and Target definitions . . . 157
Figure 6-18. Completed Mapping . . . 158
Figure 6-19. Task Details of the completed session run . . . 159
Figure 6-20. Source/Target Statistics for the session run . . . 160
Figure 6-21. Data preview of the STG_DATES table - screen 1 . . . 161
Figure 6-22. Data preview of the STG_DATES table - screen 2 scrolled right . . . 161
Figure 7-1. Debug Session creation dialog box . . . 171
Figure 7-2. Debug Session connections dialog box . . . 171
Figure 7-3. Designer while running a Debug Session . . . 172
Figure 7-4. Customize Toolbars Dialog Box . . . 173
Figure 7-5. Debugger Toolbar . . . 173
Figure 8-1. Expanded view of m_DIM_DATES_LOAD . . . 186
Figure 8-2. Sequence Generator Transformation icon . . . 186
Figure 8-3. Normal view of the sequence generator NEXTVAL port connected to a target column . . . 186
Figure 8-4. Normal view of connected ports to the target . . . 187
Figure 8-5. Task Details of the completed session run . . . 188
Figure 8-6. Source/Target statistics for the session run . . . 188
Figure 8-7. Data Preview of the DIM_DATES table . . . 189
Figure 9-1. m_DIM_PROMOTIONS_LOAD mapping . . . 198
Figure 9-2. m_DIM_DATES from the previous lab that populated the DIM_DATES table . . . 199
Figure 9-3. Select Lookup Table . . . 199
Figure 9-4. Lookup Condition . . . 200
Figure 9-5. m_DIM_PROMOTIONS_LOAD completed mapping . . . 201
Figure 9-6. Task Details of the completed session run . . . 202
Figure 9-7. Source/Target Statistics of the completed session run . . . 202
Figure 9-8. Data Preview of the DIM_PROMOTIONS target table . . . 203
Figure 9-9. Preview files created when Persistent Cache is set on Lookup Transformation . . . 203
Figure 9-10. Find in workspace dialog box . . . 206
Figure 9-11. View Dependencies dialog box . . . 207
Figure 9-12. Transformation compare objects dialog box . . . 208
Figure 9-13. Compare Transformation objects Properties details . . . 209
Figure 9-14. Target comparison dialog box . . . 210
Figure 9-15. Column differences between two target tables . . . 210
Figure 10-1. m_STG_EMPLOYEES_DEALERSHIP_MGR_LOAD mapping . . . 230
Figure 10-2. Employee_central.txt . . . 230
Figure 10-3. Renaming an instance of a Reusable Transformation . . . 231
Figure 10-4. m_STG_EMPLOYEES_DEALERSHIP_MGR_LOAD after most links removed . . . 231
Figure 10-5. Sorter Transformation Icon on Toolbar . . . 231
Figure 10-6. Aggregator Transformation Icon on Toolbar . . . 233
Figure 10-7. Partial mapping flow depicting the flow from the Sorter to the Filter to the Aggregator . . . 233
Figure 10-8. Split data stream joined back together . . . 234
Figure 10-9. Iconic view of the completed self-join mapping . . . 236
Figure 10-10. Source properties for the employee_list.txt file list . . . 236
Figure 10-11. Task Details of the completed session run . . . 237
Figure 10-12. Source/Target Statistics of the completed session run . . . 237
Figure 10-13. Data preview of the self-join of Managers and Employees in the STG_EMPLOYEES target table - screen 1 . . . 238
Figure 10-14. Data preview of the STG_EMPLOYEES target table - screen 2 scrolled right . . . 238
Figure 11-1. Mapping copy Target Dependencies dialog box . . . 257
Figure 11-2. Iconic view of the m_DIM_EMPLOYEES_MAPPING . . . 257
Figure 11-3. Router Groups . . . 258
Figure 11-4. Update Strategy set to INSERT . . . 259
Figure 11-5. Iconic view of the completed mapping . . . 261
Figure 11-6. Source Filter Value . . . 262
Figure 11-7. Writers section of Target schema . . . 263
Figure 11-8. Task Details of the completed session run . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 11-9. Source/Target Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 11-10. Data Results for DIM_EMPLOYEES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 11-11. Data Results for the Error Flat File (Located on the Machine Hosting the Integration Service Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 11-12. Task Details tab results for second run . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 11-13. Source/Target Statistics for second run . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 11-14. Data preview showing updates to the target table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 12-1. Port tab view of a dynamic Lookup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 12-2. Port to Port Association . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 12-3. Iconic View of the Completed Mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 12-4. Error Log Choice Screen . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 12-5. Task Details of the Completed Session Run . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 12-6. Source/Target Statistics for the Session Run . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 12-7. Data preview of the DIM_CUSTOMERS table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 12-8. 
Flat file error log. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 13-1. Source Analyzer view of the STG_TRANSACTIONS and STG_PAYMENT tables . . . . . . . . . Figure 13-2. Declare Parameters and Variables screen. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 13-3. Parameter entry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 13-4. Lookup Ports tab showing input, output and return ports checked/unchecked . . . . . . . . . . . . Figure 13-5. Aggregator ports with Group By ports checked . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 13-6. Finished Aggregator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 13-7. Aggregator to Target Links . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 13-8. Iconic view of the completed mapping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 13-9. Task Details of the completed session run . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 13-10. Source/Target Statistics of the completed session run . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 13-11. Data Preview of the FACT_SALES target table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 14-1. Mapplet Designer view of mplt_AGG_SALES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 14-2. Mapplet Designer view of MPLT_AGG_SALES with Input and Output transformations . . . . Figure 14-3. Iconic view of the m_FACT_SALES_LOAD_MAPPLET_xx mapping . . . . . . . . . . . . . . . . . . Figure 15-1. Source table definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . Figure 15-2. Target table definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 15-3. Task Details of the completed session run . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 15-4. Source/Target Statistics of the completed session run . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 15-5. Data Preview of the FACT_PROMOTIONS_AGG_DAILY table . . . . . . . . . . . . . . . . . . . . . Figure 16-1. Workflow variable declaration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 16-2. Link condition testing if a session run was successful . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 16-3. Assignment Task expression declaration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 16-4. Decision Task Expression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 16-5. Link condition testing for a Decision Task condition of TRUE . . . . . . . . . . . . . . . . . . . . . . . Figure 16-6. Email Task Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 16-7. Completed Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 16-8. Gantt chart view of the completed workflow run . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 16-9. View Workflow Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 16-10. Value of the $$WORKFLOW_RUNS variable after first run . . . . . . . . . . . . . . . . . . . . . . . . Figure 16-11. Gantt chart view of the completed workflow run after the weekly load runs . . . . . . . . . . . . . Figure 16-12. Task Details of the completed session run . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 18-1. Timer Task Relative time setting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 18-2. Email Task Properties Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Figure 18-3. Control Task Properties Tab . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
xvi
. 263 . 263 . 264 . 264 . 265 . 265 . 266 . 283 . 284 . 285 . 285 . 286 . 286 . 287 . 288 . 307 . 308 . 308 . 309 . 310 . 311 . 312 . 312 . 313 . 313 . 314 . 326 . 327 . 328 . 336 . 336 . 340 . 340 . 340 . 354 . 355 . 355 . 356 . 357 . 358 . 358 . 359 . 359 . 360 . 360 . 360 . 384 . 385 . 386
List of Figures Informatica PowerCenter 8 Level I Developer
Figure 18-4. Completed Worklet. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 386 Figure 18-5. Completed Workflow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387 Figure 18-6. Gantt chart view of the completed workflow run. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 387
Preface
Welcome to the PowerCenter 8 Level I Developer course. Data integration is a large undertaking with many potential areas of concern. The PowerCenter infrastructure will greatly assist you in your data integration efforts and alleviate much of your risk. This course prepares you for that challenge by teaching you the most commonly used components of the product. You will build a small data warehouse, using PowerCenter to extract data from source tables and files, transform the data, load it into a staging area, and finally load it into the data warehouse. The instructor will teach you about mappings, transformations, sources, targets, workflows, sessions, workflow tasks, connections, and the Velocity methodology.
About This Guide

Purpose
Welcome to the PowerCenter 8 Level I Developer course. This course is designed to:
♦ Enable you to use PowerCenter developer tools to:
   ♦ Create and debug mappings
   ♦ Create, run, monitor and troubleshoot workflows
♦ Provide experience in designing mappings
Audience
This course is designed for data integration and data warehousing implementers. You should be familiar with data integration and data warehousing terminology and with using Microsoft Windows.
Document Conventions
This guide uses the following formatting conventions:

If you see: >
It means: Indicates a submenu to navigate to.
Example: Click Repository > Connect. In this example, you should click the Repository menu or button and choose Connect.

If you see: boldfaced text
It means: Indicates text you need to type or enter.
Example: Click the Rename button and name the new source definition S_EMPLOYEE.

If you see: UPPERCASE
It means: Database tables and column names are shown in all UPPERCASE.
Example: T_ITEM_SUMMARY

If you see: italicized text
It means: Indicates a variable you must replace with specific information.
Example: Connect to the Repository using the assigned login_id.

If you see: Note:
It means: The following paragraph provides additional facts.
Example: Note: You can select multiple objects to import by using the Ctrl key.

If you see: Tip:
It means: The following paragraph provides suggested uses or a Velocity best practice.
Example: Tip: The m_ prefix for a mapping name is…
Other Informatica Resources
In addition to the student guides, Informatica provides these other resources:
♦ Informatica Documentation
♦ Informatica Customer Portal
♦ Informatica web site
♦ Informatica Developer Network
♦ Informatica Knowledge Base
♦ Informatica Professional Certification
♦ Informatica Technical Support
Obtaining Informatica Documentation
You can access Informatica documentation from the product CD or online help.
Visiting Informatica Customer Portal
As an Informatica customer, you can access the Informatica Customer Portal site at http://my.informatica.com. The site contains product information, user group information, newsletters, access to the Informatica customer support case management system (ATLAS), the Informatica Knowledge Base, and access to the Informatica user community.
Visiting the Informatica Web Site
You can access Informatica’s corporate web site at http://www.informatica.com. The site contains information about Informatica, its background, upcoming events, and your closest sales office. You will also find product information, as well as literature and partner information. The services area of the site includes important information on technical support, training and education, and implementation services.
Visiting the Informatica Developer Network
The Informatica Developer Network is a web-based forum for third-party software developers. You can access the Informatica Developer Network at the following URL: http://devnet.informatica.com
The site contains information on how to create, market, and support customer-oriented add-on solutions based on interoperability interfaces for Informatica products.
Visiting the Informatica Knowledge Base
As an Informatica customer, you can access the Informatica Knowledge Base at http://my.informatica.com. The Knowledge Base lets you search for documented solutions to known technical issues about Informatica products. It also includes frequently asked questions, technical white papers, and technical tips.
Obtaining Informatica Professional Certification
You can take, and pass, exams provided by Informatica to obtain Informatica Professional Certification. For more information, go to: http://www.informatica.com/services/education_services/certification/default.htm
Providing Feedback
Email any comments on this guide to [email protected].
Obtaining Technical Support
There are many ways to access Informatica Technical Support. You can call or email your nearest Technical Support Center listed in the following table, or you can use our WebSupport Service. Use the following email addresses to contact Informatica Technical Support:
♦ [email protected] for technical inquiries
♦ [email protected] for general customer service requests
WebSupport requires a user name and password. You can request a user name and password at http://my.informatica.com.
North America / South America
Informatica Corporation Headquarters
100 Cardinal Way
Redwood City, California 94063
United States
Toll Free: 877 463 2435
Standard Rate: United States: 650 385 5800

Europe / Middle East / Africa
Informatica Software Ltd.
6 Waltham Park
Waltham Road, White Waltham
Maidenhead, Berkshire SL6 3TN
United Kingdom
Toll Free: 00 800 4632 4357
Standard Rate: Belgium: +32 15 281 702; France: +33 1 41 38 92 26; Germany: +49 1805 702 702; Netherlands: +31 306 022 797; United Kingdom: +44 1628 511 445

Asia / Australia
Informatica Business Solutions Pvt. Ltd.
301 & 302 Prestige Poseidon
139 Residency Road
Bangalore 560 025
India
Toll Free: Australia: 00 11 800 4632 4357; Singapore: 001 800 4632 4357
Standard Rate: India: +91 80 5112 5738
Unit 1: Data Integration Concepts
In this unit you will learn about:
♦ Informatica
♦ Data Integration
♦ Mappings and Transformations
♦ Workflows and Tasks
♦ Metadata
Lesson 1-1. Introducing Informatica
Informatica provides data integration tools for both batch and real-time applications.
Informatica is affiliated with many standards organizations, including:
♦ Integration Consortium. www.eaiindustry.org
♦ Object Management Group (OMG). www.omg.org
♦ Common Warehouse Metamodel (CWM). www.omg.org/cwm
♦ Enterprise Grid Alliance. www.gridalliance.org
♦ Global Grid Forum (GGF). www.gridforum.org
♦ XML.org. www.xml.org
♦ Web Services Interoperability Organization. www.ws-i.org
♦ Supply-Chain Council. www.supply-chain.org
♦ Carnegie-Mellon Software Engineering Institute (SEI). www.sei.cmu.edu
♦ APICS Educational and Research Foundation. www.apics.org
♦ Shared Services and Business Process Outsourcing Association (SBPOA). www.sharedxpertise.org
Additional resources about Informatica can be found on the following websites:
♦ www.informatica.com—provides information on Professional Services and Education Services
♦ my.informatica.com—provides access to Technical Support, product documentation, the Velocity methodology, the knowledge base, and mapping templates
♦ devnet.informatica.com—the Informatica Developer Network offers discussion forums, web seminars, and technical papers
Lesson 1-2. Data Integration
Traditionally, data integration has been a batch process: extract, transform, and load (ETL) data from transactional systems into data warehouses.
The ETL process can be imagined as an assembly line.
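To make the assembly-line picture concrete, here is a minimal sketch of the three stations in plain Python. This is illustrative only, not PowerCenter code; the rows, column names, and the 8% tax rule are invented for the example.

```python
# Illustrative only: a toy ETL "assembly line" in plain Python.
# Each stage hands its output to the next, like stations on an assembly line.

def extract():
    # Pretend these rows came from a transactional source table.
    return [
        {"customer": "Acme", "amount": "100.50"},
        {"customer": "Bolt", "amount": "75.25"},
    ]

def transform(rows):
    # Convert datatypes and add a derived column (a hypothetical 8% tax).
    out = []
    for row in rows:
        amount = float(row["amount"])
        out.append({"customer": row["customer"].upper(),
                    "amount": amount,
                    "tax": round(amount * 0.08, 2)})
    return out

def load(rows, warehouse):
    # Append the transformed rows to the "warehouse" (a list here).
    warehouse.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
```

In PowerCenter the same flow is configured graphically in a mapping rather than coded, but the extract-transform-load order of the stations is the same.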
Informatica PowerCenter is deployed for a variety of batch and real-time data integration purposes:
♦ Data Migration. ERP consolidation, legacy conversion, new application implementation, system consolidation
♦ Data Synchronization. Application integration, business-to-business data transfer
♦ Data Warehousing. Business intelligence reporting, data marts, data mart consolidation, operational data stores
♦ Data Hubs. Master data management; reference data hubs; single view of customer, product, supplier, employee, etc.
♦ Business Activity Monitoring. Business process improvement, real-time reporting

Informatica partners with Composite Software for Enterprise Information Integration (EII): on-the-fly federated views and real-time reporting of information spread across multiple data sources, without moving the data into a centralized repository.
Lesson 1-3. Mappings and Transformations

Mappings
Transformations
Transformations change the data they receive.
PowerCenter includes the following types of transformations:
♦ Passive. The number of rows entering and exiting the transformation is the same.
♦ Active. The number of rows exiting the transformation may not be the same as the number of rows entering it.
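The passive/active distinction can be shown with two plain Python functions; this is an analogy, not PowerCenter code, and the sample rows are invented.

```python
# Illustrative only: passive vs. active behavior on a small row set.

rows = [{"id": 1, "qty": 5}, {"id": 2, "qty": 0}, {"id": 3, "qty": 7}]

def expression_like(rows):
    # Passive: one row out for every row in; only the values change.
    return [{**r, "qty_doubled": r["qty"] * 2} for r in rows]

def filter_like(rows):
    # Active: the row count may change; here rows with qty == 0 are dropped.
    return [r for r in rows if r["qty"] > 0]

passive_out = expression_like(rows)
active_out = filter_like(rows)
```

An Expression transformation behaves like the first function (row counts match); a Filter behaves like the second (rows can be dropped).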
Commonly used PowerCenter transformations include:
♦ Source Qualifier - reads sources
♦ Filter - filters data conditionally
♦ Sorter - sorts data
♦ Expression - performs logical/mathematical functions on data
♦ Aggregator - sums, averages, maximum, minimum
♦ Joiner - joins two data flows
♦ Lookup - looks up a corresponding value from a table or flat file
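The list above can be sketched as everyday operations on rows. This is a loose Python analogy with invented sample data; PowerCenter transformations are configured graphically, not coded this way.

```python
# Illustrative analogies only: each commonly used transformation expressed as a
# plain Python operation on dictionaries of rows.

orders = [  # what a Source Qualifier might read from a hypothetical ORDERS table
    {"order_id": 2, "cust_id": 10, "amount": 40.0},
    {"order_id": 1, "cust_id": 20, "amount": 25.0},
    {"order_id": 3, "cust_id": 10, "amount": 35.0},
]
customers = {10: "Acme", 20: "Bolt"}  # Lookup: key -> corresponding value

filtered = [o for o in orders if o["amount"] >= 30.0]            # Filter
ordered = sorted(filtered, key=lambda o: o["order_id"])          # Sorter
with_tax = [{**o, "tax": round(o["amount"] * 0.1, 2)}            # Expression
            for o in ordered]

totals = {}                                                      # Aggregator
for o in with_tax:
    totals[o["cust_id"]] = totals.get(o["cust_id"], 0) + o["amount"]

joined = [{**o, "cust_name": customers[o["cust_id"]]}            # Joiner/Lookup
          for o in with_tax]
```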
Lesson 1-4. Tasks and Workflows

Tasks
A task is an executable set of actions, functions, or commands.
Workflows
A workflow is a set of ordered tasks that describe runtime ETL processes. Tasks can be sequenced serially, in parallel and conditionally. Each linked icon represents a task.
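Serial, parallel, and conditional sequencing can be sketched with a toy scheduler. This is illustrative only; the task names are invented and PowerCenter workflows are built graphically, not with this code.

```python
# Illustrative only: a toy scheduler showing serial, parallel-eligible, and
# conditional task sequencing in the spirit of a workflow.

tasks = {
    "s_load_stage": {"after": [], "run_if": lambda done: True},
    # These two depend only on the stage load, so they could run in parallel.
    "s_load_dim":   {"after": ["s_load_stage"], "run_if": lambda done: True},
    "s_load_fact":  {"after": ["s_load_stage"], "run_if": lambda done: True},
    # Conditional: runs only if both preceding loads completed.
    "s_email":      {"after": ["s_load_dim", "s_load_fact"],
                     "run_if": lambda done: "s_load_dim" in done
                                            and "s_load_fact" in done},
}

def run_workflow(tasks):
    done, order = set(), []
    while len(done) < len(tasks):
        # Tasks whose predecessors are all finished form a "parallel" batch.
        ready = [n for n, t in tasks.items()
                 if n not in done and all(p in done for p in t["after"])]
        if not ready:
            break  # nothing runnable (failed condition or a cycle)
        for name in sorted(ready):  # executed serially here for simplicity
            if tasks[name]["run_if"](done):
                order.append(name)
            done.add(name)
    return order

order = run_workflow(tasks)
```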
Lesson 1-5. Metadata
Metadata, which means “data about data,” is information that describes data. Common contents of metadata include the source or author of a dataset, how the dataset should be accessed, and its limitations.
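As a simple illustration, the metadata for one dataset might look like the following. The values here are all invented; they only show the kinds of facts (source, access, limitations) that metadata records about data.

```python
# Illustrative only: metadata for a hypothetical dataset as a plain dictionary —
# the "data about data," not the data itself.

dataset_metadata = {
    "name": "STG_PAYMENT",
    "source": "flat file extract from a payments system",   # origin of the data
    "author": "ETL team",
    "access": "read via the staging database connection",   # how to get at it
    "limitations": "refreshed nightly; intraday payments are absent",
    "columns": {"payment_id": "integer", "amount": "decimal(10,2)"},
}
```

The PowerCenter repository stores exactly this kind of descriptive information, for sources, targets, mappings, and workflows, in a structured form.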
Unit 2: PowerCenter Components and User Interface
In this unit you will learn about:
♦ PowerCenter Architecture
♦ PowerCenter Client Tools
Lesson 2-1. PowerCenter Architecture
The following screenshot shows the PowerCenter architecture:
♦ Sources—Can be relational tables or heterogeneous files (flat files, VSAM files and XML)
♦ Targets—Can be relational tables or heterogeneous files
♦ Integration Service—The engine that performs all of the extract, transform and load logic
♦ Repository Service—Manages connectivity to the metadata repositories that contain mapping and workflow definitions
♦ Repository Service Process—Multi-threaded process that retrieves, inserts and updates repository metadata
♦ Repository—Contains all of the metadata needed to run ETL processes
♦ Client Tools—Desktop tools used to populate the repository with metadata, execute workflows on the Integration Service, monitor the workflows and manage the repository
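How these components relate can be sketched as a toy model. This is only an analogy: the real services are separate multi-threaded server processes, and every name and structure below is invented for illustration.

```python
# Illustrative only: a toy model of the component relationships.

class Repository:
    """Holds the metadata (mapping and workflow definitions)."""
    def __init__(self):
        self.metadata = {}

class RepositoryService:
    """The path to the repository: retrieves, inserts, and updates metadata."""
    def __init__(self, repository):
        self.repository = repository
    def save(self, name, definition):       # client tools populate metadata
        self.repository.metadata[name] = definition
    def fetch(self, name):
        return self.repository.metadata[name]

class IntegrationService:
    """Runs the ETL logic described by a workflow's metadata."""
    def __init__(self, repo_service):
        self.repo_service = repo_service
    def run_workflow(self, name):
        definition = self.repo_service.fetch(name)
        return f"ran {len(definition['sessions'])} session(s) of {name}"

# A client tool saves a workflow definition, then asks for it to be run.
repo_service = RepositoryService(Repository())
repo_service.save("wf_load_stage", {"sessions": ["s_load_payment"]})
result = IntegrationService(repo_service).run_workflow("wf_load_stage")
```

The key point the sketch captures: client tools and the Integration Service never touch the repository directly; all metadata access goes through the Repository Service.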
Lesson 2-2. PowerCenter Client Tools
Client tools run on Microsoft Windows. All tools access the repository through the Repository Service. The Workflow Manager and Workflow Monitor also connect to the Integration Service. Each client application has its own interface. The interfaces have toolbars, a navigation window on the left, a workspace on the right, and an output window at the bottom.
Designer
Within the Designer, you can display transformations in the following views:
♦ Iconized. Shows the transformation in relation to the rest of the mapping. This also minimizes the screen space needed to display a mapping.
♦ Normal. Shows the flow of data through the transformation. This view is typically used when copying/linking ports to other objects.
♦ Edit. Shows transformation ports and properties; allows editing. This view is used to add, edit, or delete ports and to change any of the transformation attributes or properties.
Workflow Manager
In the Workflow Manager, you can display tasks in the following views:
♦ Iconized (Session task example)
♦ Edit (Session task example)
Unit 2 Lab: Using the Designer and Workflow Manager

Business Purpose
You have been asked to learn how to use Informatica PowerCenter in order to more efficiently accomplish your ETL objectives and automate the development process. Because you have limited or no prior exposure to this software, this exercise will serve to orient you to the basic development interfaces.
Technical Description
PowerCenter includes two development applications: the Designer, which you will use to create mappings, and the Workflow Manager, which you will use to create and start workflows. This exercise is designed to serve as your first hands-on experience with PowerCenter and supplement the instructor demonstrations. You will import source and target definitions from a shortcut folder into your own folder.
Objectives
♦ Learn how to navigate the repository folder structure.
♦ Understand the purpose of the tools accessed from the Designer and Workflow Manager.
♦ Create and save source and target shortcuts.
♦ Learn how to access and edit the database connection objects.
Duration
30 minutes
Instructions

Step 1: Launch the Designer and Log Into the Repository
1. Launch the Designer client application from the desktop icon. If no desktop icon is present, select Start > Programs > Informatica PowerCenter … > Client > PowerCenter Designer.
2. Maximize the Designer window.
Note: Notice the Navigator window on the left side, which should resemble Figure 2-1. However, you may see additional or fewer repositories, depending on your classroom environment.
Figure 2-1. Navigator Window
3. Log into the PC8_DEV repository with the user name studentxx, where xx represents your student number as assigned by the instructor. The password is the same. Passwords are always case-sensitive.
Tip: The user name to log into the repository is an application-level user name—it allows PowerCenter to admit you to the repository with a specific set of application privileges. It is not a database user name.
Step 2: Navigate Folders
1. Double-click the DEV_SHARED folder. This opens the folder and shows you the subfolders associated with it. Figure 2-2 shows the Navigator:
Figure 2-2. DEV_SHARED Folder and Subfolders
Note: Notice that the DEV_SHARED folder has a small blue arm holding it. This icon denotes that DEV_SHARED is a shortcut folder. As you will see later in this lab, objects dragged from a shortcut folder into an open folder create shortcuts to the object.
Tip: Technically, all folders are “shared” with all users who have the appropriate folder permissions, regardless of whether they have the blue arm. Do not confuse repository folders with the directories visible in Windows Explorer. The folders are PowerCenter repository objects and are not related to Windows directories.
2. Expand some of the subfolders to see the objects they hold. Note that some subfolders are empty. When a new object, such as a target definition, is created within a folder, it automatically goes into the appropriate subfolder.
Note: Notice that within the Sources subfolder, the source objects are organized under individual “nodes” (branches in the hierarchy), such as FlatFile, ODBC_EDW, etc. These are based on the type of source and the name of the Data Source Name that was used to import the source definition (more on this later). Very Important: You will need to click on these source nodes to locate source definitions that may be “hiding” from view.
Tip: Subfolders are created and managed automatically. Users cannot create, delete, nest, or rename subfolders. Each PowerCenter application, such as the Designer, shows only subfolders related to the objects that can be created and modified by that application. For example, in the Designer you only see subfolders for sources, targets, mappings, etc.
3. Double-click your individual student folder. For the remainder of the class, you will create and modify objects in this folder. Some pre-made objects have been provided as well.
Note: Your student folder is now the “open” folder. Only one folder at a time can be open. The DEV_SHARED folder is now “expanded.” This distinction is important, as you will see later in this lab.
Step 3: Navigating the Designer Tools
1. Select the menu option Tools > Source Analyzer. The workspace to the right of the Navigator window changes to an empty space.
Note: Note the small toolbar directly to the right of the Navigator window, at the top. These are the five Designer tools. Each tool allows you to create and modify one specific type of object, such as sources. Figure 2-3 shows the Designer tools with the first tool (the Source Analyzer) selected.
Figure 2-3. Designer Tools
2. With your left mouse button, alternately toggle between the five tools. The name of each tool is displayed in the upper left corner of the workspace when that tool is active.
Note: The main menu bar (very top of your screen) changes depending on which tool is active. Because these menus are context-sensitive to which tool is active, you must already be in the appropriate tool to create or modify a specific type of object.
♦ The Source Analyzer tool is used to create or modify source objects. They may be relational, flat file, XML or COBOL sources.
♦ The Target Designer tool is used to create or modify target objects. They may be relational, flat file, or XML. It does not matter whether these targets are part of an actual data warehouse.
♦ The Transformation Developer tool is used to create or modify reusable transformations. Non-reusable transformations are created directly in a mapping or mapplet. This distinction will be covered later in the class.
♦ The Mapplet Designer tool is used to create or modify mapplets.
♦ The Mapping Designer tool is used to create or modify mappings.
Step 4: Create and Save Shortcuts
1. Ensure that the Target Designer is active and that your student folder is open.
Important: In order to copy/shortcut any object into a folder, the destination folder (the folder you are adding to) must be the open folder. If the destination folder is not open, the copy/shortcut will not work.
2. To help view which folder is active, choose View > Workbook to view the PowerCenter Client in Workbook view. The PowerCenter Client displays tabs for each folder at the bottom of the Main window.
3. In the DEV_SHARED folder, expand the Targets subfolder by clicking on the + sign to the left of the subfolder. Figure 2-4 shows the Navigator window:
Figure 2-4. DEV_SHARED Target subfolder
4. Drag and drop the STG_PAYMENT target from the Navigator into the Target Designer workspace. You will see the confirmation message, “Create a shortcut to the target table STG_PAYMENT?”
5. Click Yes at the confirmation message.
6. Expand the Targets subfolder in your student folder. Note that you have added a shortcut to the STG_PAYMENT staging target table in your own folder.
Tip: PowerCenter shortcuts are “pointers” to the original object. They can be used but they cannot be modified as shortcuts. The original object can be modified, and any changes will immediately affect all shortcuts to that object.
7. Open the Source Analyzer tool in your student folder.
8. In the DEV_SHARED folder, expand the Sources subfolder and expand the FlatFile container.
9. Add shortcuts to your folder for the two source definitions listed below:
♦ PROMOTIONS
♦ PAYMENT
10. Confirm that your student folder appears similar to Figure 2-5:
Figure 2-5. Student folder with new objects
11. Use the menu option Repository > Save to save these objects in your student folder.
Tip: You should periodically save changes to the repository when using the Designer or the Workflow Manager. The keyboard shortcut Ctrl+S can also be used. There is no “auto-save” feature.
Step 5: Launch the Workflow Manager
1. Left-click the toolbar icon for the Workflow Manager shown in Figure 2-6. This toolbar is usually above the Navigator window.
Figure 2-6. Application Toolbar Workflow Manager Button
2. Confirm that the Workflow Manager launches and that you are automatically logged into the repository in the same way as you were in the Designer.
3. Maximize the Workflow Manager application.
Tip: Avoid having two or more “instances” of the same PowerCenter application (such as the Workflow Manager) running on a machine at the same time. There is no benefit in doing this, and it can result in confusion when editing objects.
4. Browse through the various folders and subfolders in the Workflow Manager Navigator window as you did in the Designer. Note that only subfolders for the objects that can be created with the Workflow Manager are present: Tasks, Sessions, Worklets, and Workflows.
Note: Although a session object is a type of task, it gets its own subfolder because you will typically have many more sessions than the other types of tasks. Only reusable sessions will appear in the Sessions subfolder. Likewise, only reusable tasks (other than sessions) will appear in the Tasks subfolder.
Step 6: Navigating the Workflow Manager Tools
1. Select the menu option Tools > Task Developer. Just as in the Designer, you will see the workspace clear itself and a toolbar appear to the right of the Navigator window. The idea is the same as with the Designer, except there are three tools instead of five.
2. With your left mouse button, alternately toggle between the three tools. Note that the name of each tool is displayed in the upper left corner of the workspace when that tool is active. Note also the context-sensitive menus, as in the Designer.
♦ The Task Developer tool is used to create or modify reusable tasks.
♦ The Worklet Designer tool is used to create or modify worklets.
♦ The Workflow Designer tool is used to create or modify workflows.
Step 7: Workflow Manager Task Toolbar
The Workflow Manager is equipped with a toolbar that shows an icon for each type of task that can be created. This toolbar is visible by default, but its default location is the top right-hand corner of the screen. We will move the toolbar to a more central location.
1. Locate the “vertical stripe” at the far left-hand side of the task bar, as shown in Figure 2-7:
Figure 2-7. Task Toolbar Default Position
2. With your left mouse button, drag the toolbar toward the left and drop it in a convenient location so that all of the buttons are visible. The top of your Workflow Manager should appear similar to Figure 2-8:
Figure 2-8. Task Toolbar After Being Moved
Step 8: Database Connection Objects

Later in the class, we will create sessions that read data from database source and target tables. In order to open a connection to the respective databases, PowerCenter needs the database login and the designation (i.e., connection string, database name, or server name). Instead of requiring the user to type this information each time a session is created, PowerCenter allows us to create reusable and sharable database connection objects. These objects contain properties describing one database connection. The objects can be associated with multiple sessions to describe source, target, or lookup connections.

1. In the Workflow Manager, select the menu option Connections > Relational. You will see the Relational Connection Browser, similar to Figure 2-9.

Figure 2-9. Relational Connection Browser
Note: Each connection object is organized under a database type.

2. Double-click on the NATIVE_TRANS connection object to display its properties.
3. You will not have write privileges. Click OK.

Note: The connection NATIVE_TRANS will log into the database with the user name sdbu. The connection object will be shared among the students in the class.

4. Double-click on any of the other objects that have your student number. The NATIVE_STG07 connection, for example, will have the user name tdbu07. These are the individual student connections to be used to read from and write to your individual staging tables and the enterprise data warehouse (EDW) tables. Creating additional connection objects is intuitive; experiment if you have extra time.

Tip: Database connection objects are not associated with a specific folder.
Unit 3: Source Qualifier

In this unit you will learn about:
♦ Source Qualifier transformation
♦ Velocity Methodology
♦ Source Qualifier joins
♦ Source pipelines
Lesson 3-1. Source Qualifier Transformation

Type: Active

Description: A Source Qualifier transformation:
♦ Selects records from flat file and relational table sources. Only those fields or columns used in the mapping are selected, based on the output connections.
♦ Converts the data from the source's native datatype to the most compatible PowerCenter transformation datatype.
♦ Generates a SQL query for relational sources.
♦ Can perform homogeneous joins between relational tables on the same database.
Unit 3: Source Qualifier Informatica PowerCenter 8 Level I Developer
Properties

The following list describes the Source Qualifier transformation properties:

SQL Query: Allows you to override the default SQL query that PowerCenter creates at runtime.

User Defined Join: Allows you to specify a join that replaces the default join created by PowerCenter.

Source Filter: Allows you to create a WHERE clause that will be inserted into the SQL query generated at runtime. The WHERE keyword itself is not required. For example: Table1.ID = Table2.ID

Number of Sorted Ports: PowerCenter will insert an ORDER BY clause in the generated SQL query. The ORDER BY will be on the number of ports specified, from the top down. For example, in the sq_Product_Product_Cost Source Qualifier, if the number of sorted ports = 2, the clause will be: ORDER BY PRODUCT.PRODUCT_ID, PRODUCT.GROUP_ID

Tracing Level: Specifies the amount of detail written to the session log.

Select Distinct: Allows you to select distinct values only.

Pre SQL: Allows you to specify SQL that will be run before the pipeline runs. The SQL will be run using the connection specified in the session task.

Post SQL: Allows you to specify SQL that will be run after the pipeline has run. The SQL will be run using the connection specified in the session task.
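The query-shaping properties above (Source Filter and Number of Sorted Ports) can be illustrated with a small stand-in script. This is a sketch, not PowerCenter output: sqlite3 substitutes for the source database, and the PRODUCT rows are invented for illustration.

```python
# Sketch of how Source Qualifier properties shape the generated SQL.
# sqlite3 stands in for the source database; the PRODUCT rows are
# invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT (PRODUCT_ID INT, GROUP_ID INT, PRODUCT_DESC TEXT)")
conn.executemany("INSERT INTO PRODUCT VALUES (?, ?, ?)",
                 [(2, 10, "Sedan"), (1, 10, "Coupe"), (3, 20, "Truck")])

# Default query: only the connected ports appear in the SELECT list.
base = "SELECT PRODUCT.PRODUCT_ID, PRODUCT.GROUP_ID FROM PRODUCT"

# Source Filter: entered without the WHERE keyword, inserted at runtime.
source_filter = "PRODUCT.GROUP_ID = 10"

# Number of Sorted Ports = 2: ORDER BY on the first two ports, top down.
query = (base + " WHERE " + source_filter +
         " ORDER BY PRODUCT.PRODUCT_ID, PRODUCT.GROUP_ID")

rows = conn.execute(query).fetchall()
print(rows)  # [(1, 10), (2, 10)]
```

Note how the filter drops the GROUP_ID 20 row before it ever reaches the pipeline, and the ORDER BY sorts what remains.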
Business Purpose

The use of a Source Qualifier is a product requirement; other types of sources require equivalent transformations (XML Source Qualifier, etc.). It provides an efficient way to filter input fields/columns and to perform homogeneous joins.
Datatype Conversion

Data can be converted from one datatype to another by:
♦ Passing data between ports with different datatypes
♦ Passing data from an expression to a port
♦ Using transformation functions
♦ Using transformation arithmetic operators

Supported conversions are:
♦ Numeric datatypes <=> other numeric datatypes
♦ Numeric datatypes <=> String
♦ Date/Time <=> Date or String (to convert from string to date, the string must be in the default PowerCenter date format MM/DD/YYYY HH24:MI:SS)
Similarly, when writing to a target the Integration Service converts the data to the target’s native datatype. For further information, see the PowerCenter Client Help > Index > port-to-port data conversion.
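As a rough illustration of the default date format mentioned above, the same conversion can be expressed with Python's strptime. The format-string mapping is an assumption made for illustration; this is a stand-in script, not the Integration Service's own converter.

```python
# Illustration of the default PowerCenter string-to-date format
# MM/DD/YYYY HH24:MI:SS, expressed as an assumed-equivalent Python
# strptime pattern. A stand-in, not the Integration Service.
from datetime import datetime

DEFAULT_FORMAT = "%m/%d/%Y %H:%M:%S"  # assumed equivalent of MM/DD/YYYY HH24:MI:SS

converted = datetime.strptime("04/28/2006 13:45:00", DEFAULT_FORMAT)
print(converted)  # 2006-04-28 13:45:00
```

A string that does not match the expected format would raise an error here, much as a malformed source value fails conversion at runtime.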
Lesson 3-2. Velocity Methodology

In labs, we will use Informatica's Velocity methodology. This methodology includes:
♦ Templates
  ♦ Mapping specification templates
  ♦ Source to target field matrix
♦ Naming conventions
  ♦ Object type prefixes: m_, exp_, agg_, wf_, s_, …
♦ Best practices

Velocity covers the entire data integration project life cycle:
Phase 1: Manage
Phase 2: Architect
Phase 3: Design
Phase 4: Build
Phase 5: Test
Phase 6: Deploy
Phase 7: Operate

For more information, see http://devnet.informatica.com (requires registration).
Lab Project

The Mersche Motors data model consists of the following star schemas. The labs predominantly use the Sales star schema.
Data is moved first to the staging area and from there to the data warehouse and target flat files.
The labs can source from flat files and/or a relational database.
Source Tables and Files

The source system has the following relational tables:
DEALERSHIP
PRODUCT
PRODUCT_COST
The source system has the following flat files:
customer_layout
dates
employees_layout
inventory
payment
promotions
sales_transactions
Staging Area

The staging area has the following tables:
STG_CUSTOMERS
STG_DATES
STG_DEALERSHIP
STG_EMPLOYEES
STG_INVENTORY
STG_PAYMENT
STG_PRODUCT
STG_PROMOTIONS
STG_TRANSACTIONS
Data Warehouse

The data warehouse has the following tables:
DIM_CUSTOMERS
DIM_DATES
DIM_DEALERSHIP
DIM_EMPLOYEES
DIM_PAYMENT
DIM_PRODUCT
DIM_PROMOTIONS
FACT_INVENTORY
FACT_PRODUCT_AGG_DAILY
FACT_PRODUCT_AGG_WEEKLY
FACT_PROMOTIONS_AGG_DAILY
FACT_SALES
Architecture and Connectivity

Architecture

The labs use the following architecture and connections:
Integration Service: PC_IService
Repository Name: PC8_DEV
Folders: Student 01 - 20
User Names: student01 - 20
Passwords: student01 - 20
Connectivity

ODBC Connections:
Source Tables: ODBC_TRANS
Staging Area: ODBC_STG (01 - 20)
Data Warehouse: ODBC_EDW (01 - 20)

Native Connections:
Source Tables: NATIVE_TRANS
Staging Area: NATIVE_STG (01 - 20)
Data Warehouse: NATIVE_EDW (01 - 20)

Relational Source: sdbu with password sdbu
Relational Targets: tdbu01 - 20
Passwords: tdbu01 - 20
Unit 3 Lab A: Load Payment Staging Table

Section 1: Pass-Through Mapping

Business Purpose

The staging area of the Mersche Motors data warehouse contains a table that assigns payment type descriptions for each payment ID. Because these descriptions may change, the table must be synchronized daily with the corresponding data located in the operational system. The operational system administrator uses a simple flat file to record and edit these descriptions.
Technical Description

PowerCenter will source from a delimited flat file and insert the data into a database table without performing data transformations. To avoid duplicate records in subsequent loads, we will configure PowerCenter to truncate the target table before each load.
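The truncate-then-load behavior described above can be sketched outside PowerCenter. The file name and column names mirror the lab (payment.txt, PAYMENT_ID, PAYMENT_TYPE_DESC); sqlite3 and the sample rows are invented stand-ins, not the actual session.

```python
# Minimal stand-in for the lab's pass-through session: truncate the
# staging table, then insert rows from a comma-delimited file.
# All data here is invented for illustration.
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STG_PAYMENT (PAYMENT_ID INT, PAYMENT_TYPE_DESC TEXT)")
conn.execute("INSERT INTO STG_PAYMENT VALUES (99, 'stale row')")  # left over from a prior load

# io.StringIO stands in for C:\pmfiles\SrcFiles\payment.txt.
payment_txt = io.StringIO("1,Cash\n2,Check\n3,Credit\n")

# "Truncate target table option": clear the target before each load
# so repeated runs do not produce duplicates.
conn.execute("DELETE FROM STG_PAYMENT")
rows = [(int(pid), desc) for pid, desc in csv.reader(payment_txt)]
conn.executemany("INSERT INTO STG_PAYMENT VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM STG_PAYMENT").fetchone()[0]
print(count)  # 3
```

Without the truncate step, the stale row would survive and the second run would double the row count; that is exactly what the session's truncate option prevents.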
Objectives

♦ Open the Designer tools and switch between workspaces
♦ Import a flat file definition
♦ Import a table definition
♦ Create a simple pass-through mapping
♦ Create a Session task to run the mapping and configure connectivity
♦ Create a Workflow to run the Session task
♦ Run the Workflow and monitor the results
Duration 35 minutes
Unit 3 Lab A: Load Payment Staging Table Informatica PowerCenter 8 Level I Developer
Velocity Deliverable: Mapping Specifications

Mapping Name: m_Stage_Payment_Type_xx
Source System: Flat file
Target System: Oracle Table
Initial Rows: 5
Rows/Load: 5
Short Description: Simple pass-through mapping, comma-delimited flat file to Oracle table
Load Frequency:
Preprocessing: Target truncate
Post Processing:
Error Strategy: Default
Reload Strategy:
Unique Source Fields: PAYMENT_ID, PAYMENT_TYPE_DESC

SOURCES

Files:
File Name: payment.txt
File Location: C:\pmfiles\SrcFiles
Fixed/Delimited: Delimited
Additional File Info: Comma delimiter

TARGETS

Tables:
Schema Owner: TDBUxx

Table Name     Insert   Update   Delete   Unique Key
STG_PAYMENT    X

HIGH LEVEL PROCESS OVERVIEW

Source -> Target

PROCESSING DESCRIPTION (DETAIL)

This is a pass-through mapping with no data transformation.
Velocity Best Practice: This is the Velocity Source to Target Field Matrix. It is displayed here for your reference. In future labs we will be using a shortened version of the matrix.

SOURCE TO TARGET FIELD MATRIX

Target Table   Target Column       Datatype       Source Table   Source Column       Datatype       Expression   Default Value if Null   Data Issues/Quality
STG_PAYMENT    PAYMENT_ID          Number(3,0)    PAYMENT        PAYMENT_ID          Decimal(3,0)
STG_PAYMENT    PAYMENT_TYPE_DESC   Varchar2(20)   PAYMENT        PAYMENT_TYPE_DESC   String(10)
Instructions

Step 1: Launch the Designer and Review the Source and Target Definitions

1. Launch the Designer application by selecting Start > Programs > Informatica PowerCenter … > Client > PowerCenter Designer.

Tip: If an instance of the Designer is already running on your workstation, do not launch another instance. It is unnecessary and potentially confusing to run more than one instance per workstation.

2. Log into the PC8_DEV repository with the user name studentxx and password studentxx, where xx represents your student number as assigned by the instructor.

3. Open your student folder by double-clicking on it.

4. Open the Source Analyzer by selecting the menu option Tools > Source Analyzer.

5. Drag the Shortcut_to_payment source file from the Sources subfolder into the Source Analyzer workspace. Confirm that your source definition appears the same as displayed in Figure 3-1. You may have to drag the box wider to see the Length column.

Figure 3-1. Normal view of the payment flat file definition displayed in the Source Analyzer

6. Open the Target Designer by clicking the respective icon in the toolbar. The icon is shown highlighted below:

7. Drag the Shortcut_to_STG_PAYMENT target table definition from the Targets subfolder into the Target Designer workspace.

8. Review the target definition.
Step 2: Create a Mapping

1. Open the Mapping Designer by clicking the respective icon in the toolbar. The icon is shown highlighted below:
2. Select the menu option Mappings > Create.
a. Delete the default mapping name and enter the name m_Stage_Payment_Type_xx, where xx refers to your student number.
b. Click OK.

Velocity Best Practice: The m_ prefix for a mapping name is specified in the Informatica Velocity Best Practices. Mapping names should be clear and descriptive so that others can immediately understand the purpose of the mapping. Velocity suggests either the name of the targets being accessed or a meaningful description of the function of the mapping.

3. Perform the following steps in the Navigator window:
a. Expand the Sources subfolder.
b. Expand the FlatFile node.
c. Drag-and-drop the source Shortcut_to_payment into the mapping.
4. Expand the Targets subfolder, and drag-and-drop the target Shortcut_to_STG_PAYMENT into the mapping.

5. Select the menu option View > Navigator. This will temporarily remove the Navigator window from view in order to increase your mapping screen space. Your mapping should appear as displayed in Figure 3-2.

Figure 3-2. Mapping with Source and Target Definitions
6. Select the Source Qualifier transformation.
a. Drag-and-drop the port PAYMENT_ID from the Source Qualifier to the PAYMENT_ID field in the target definition.
b. Drag-and-drop the Source Qualifier port PAYMENT_TYPE_DESC to the PAYMENT_TYPE_DESC field in the target definition.

Tip: When linking ports in a mapping as described above, ensure that the tip of your mouse cursor is touching a letter in the name, datatype, or any property of the port you are dragging.

7. Right-click in a blank area within the mapping and choose the menu option Arrange All.
Your mapping should appear the same as displayed in Figure 3-3.

Figure 3-3. Normal view of the completed mapping

8. Type Ctrl+S to save your work to the repository.

Tip: The Output Window displays messages about the results of an action taken in the Designer.

9. Confirm that your Output Window displays the following message:

*******Mapping m_Stage_Payment_Type is VALID *******
mapping m_Stage_Payment_Type inserted.
-----------------------------------------------------
Step 3: Create a Workflow and a Session Task

1. Launch the Workflow Manager by clicking on the respective icon in the toolbar. The icon is shown highlighted below:

2. Open the Workflow Designer by clicking the respective icon in the toolbar. The icon is shown highlighted below:

3. Select the menu option Workflows > Create.
a. Delete the default Workflow name and enter wkf_Load_STG_PAYMENT_xx (xx refers to your student number).

Velocity Best Practice: The wkf_ prefix for a Workflow name is specified in the Informatica Velocity Methodology.

b. Click OK.

4. Select the menu option Tasks > Create.
a. Select Session from the "Select the task type to create" drop-box.
b. Enter the Session name s_m_Stage_Payment_Type_xx (xx refers to your student number).

Velocity Best Practice: The s_ prefix for a session name is specified in the Informatica Velocity Methodology. The Velocity recommendation for a session name is s_mappingname.
c. Click the Create button.
d. The Mappings list box shows the mappings saved in your folder. Confirm that the m_Stage_Payment_Type_xx mapping is selected and click OK.
e. Click Done.

5. Drag the newly created session to the middle of the screen.

Tip: If you select the session icon from the task toolbar instead of using the Tasks > Create menu option, the client tool will name the session for you, using the correct Velocity standard name: s_mappingname. It will also place the session where you click in the workspace.
6. Double-click on the session task that you just created to open it in edit mode.
a. Select the Mapping tab.
b. Select the Source Qualifier icon SQ_Shortcut_to_payment (in the Session properties navigator window).
c. In the Properties area, scroll down and confirm the source file name and location:
   i. Source file directory: $PMSourceFileDir\
   ii. Source filename: payment.txt

Tip: When the Integration Service process runs on UNIX or Linux, the filename is case sensitive.

d. Select the target Shortcut_to_STG_PAYMENT (in the Session properties navigator window).
e. Using the Connections list box, select the NATIVE_STGXX connection object, where XX represents your student number assigned by the instructor.
f. In the Properties area, confirm that the load type is Bulk.

Tip: Setting the load type to Bulk will use the target RDBMS bulk loading facility.

g. In the Properties area, scroll down until the Truncate target table option property is visible. Select the check-box.
Your session task information should appear similar to that displayed in Figure 3-4.

Figure 3-4. Completed Session Task Target Properties

h. Click OK.

7. Type Ctrl+S to save your work to the repository.

8. Confirm that your Output Window displays the message:

*******Workflow wkf_Load_STG_PAYMENT is INVALID*******
Workflow wkf_Load_STG_PAYMENT inserted
------------------------------------------------------
Tip: For this section you have created a non-reusable session within the workflow. This session exists only within the context of the workflow.
9. Click the Link Tasks icon in the Tasks toolbar, shown below.

10. Holding down the left mouse button, drag from the Start task to the s_m_Stage_Payment_Type_xx Session task and release the mouse. This will establish a link from the Start task to the Session task.
Your workflow should appear as displayed in Figure 3-5.

Figure 3-5. Completed Workflow

11. Type Ctrl+S to save your work to the repository. Confirm that your Output Window displays the message:

...Workflow wkf_Load_STG_PAYMENT tasks validation completed with no errors.
******* Workflow wkf_Load_STG_PAYMENT is VALID *******
Workflow wkf_Load_STG_PAYMENT updated.
---------------------------------------
Step 4: Run the Workflow and Monitor the Results

1. Right-click on a blank area near the Workflow and inside the workspace, and select Start Workflow.

2. If the Workflow Monitor is already open, the workflow and session will automatically display. However, if the Monitor opens fresh:
a. Right-click on the PC8_DEV repository and choose Connect.
b. Log in with your studentxx id and password.
c. Right-click on PC_IService and choose Connect.
d. Right-click on your Studentxx folder and choose Open.
e. Right-click on wkf_Load_STG_PAYMENT_xx and select Open Latest 20 Runs.

3. Maximize the Workflow Monitor. Note there are two tabs above the Output window: Gantt Chart and Task View.

4. Select Task View. Your information should appear similar to what is displayed in Figure 3-6.

Figure 3-6. Successful Run of a Workflow Depicted in the Task View of the Workflow Monitor

5. Scroll down through the Task Details window.
Your information should appear as displayed in Figure 3-7.

Figure 3-7. Properties for the Completed Session Run

6. Select the Source/Target Statistics tab. Expand the nodes for the source and target. Note that for the Source and Target objects in the mapping, there is a count of the rows in various categories, such as Applied Rows (success), Affected (transformed), and Rejected, as well as an estimated throughput speed.

Figure 3-8. Source/Target Statistics for the Completed Session Run

7. Right-click the Session again and select Get Session Log.
a. The Session Log will be displayed.
b. Review the log and note the variety of information it shows.
c. Close the Session Log.

8. Select the Gantt Chart tab. Note that the Workflow and the Session are displayed within a horizontal timeline.
Data Results

In the Designer, you can view the data that was loaded into the target.

1. Right-click on the STG_PAYMENT target definition.

2. Select Preview Data.

3. Set the ODBC Data Source drop-box to the ODBC_STG Data Source Name.

4. Enter the user name tdbuxx, where xx represents your student number as assigned by the instructor.

5. Enter the password tdbuxx and click the Connect button. Your data should appear as displayed in Figure 3-9.

Figure 3-9. Data Preview of the STG_PAYMENT Target Table
Lesson 3-3. Source Qualifier Joins

A Source Qualifier can join data from multiple relational tables on the same database (a homogeneous join) if the tables have a primary key-foreign key relationship defined in the Source Analyzer. These columns do not have to be keys on the source database, but they should be indexed for best performance.
The join is performed on the source database at runtime (when SQL generated by the Source Qualifier executes). Joining data in a Source Qualifier allows the Integration Service to read data in multiple tables in a single pass, which can improve session performance.
Where there is no PK/FK relationship, you can specify a User Defined Join. Enter the join condition in the Source Qualifier properties, e.g. tableA.EmployeeID = TableB.EmployeeID. By default you get an inner join; use a SQL Query override to specify other join types.
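The difference between the default inner join and an overridden outer join can be shown with a small sqlite3 sketch. PowerCenter generates and runs such SQL on the source database itself; the tables and rows below are invented for illustration.

```python
# Sketch of the default inner join versus an overridden outer join.
# sqlite3 stands in for the source database; table contents are invented.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableA (EmployeeID INT, Name TEXT)")
conn.execute("CREATE TABLE TableB (EmployeeID INT, Dept TEXT)")
conn.executemany("INSERT INTO TableA VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.execute("INSERT INTO TableB VALUES (1, 'Sales')")

# User Defined Join condition -> inner join: unmatched rows drop out.
inner = conn.execute(
    "SELECT a.Name, b.Dept FROM TableA a, TableB b "
    "WHERE a.EmployeeID = b.EmployeeID ORDER BY a.EmployeeID").fetchall()

# SQL Query override -> any join type, e.g. a left outer join keeps
# rows from TableA that have no match in TableB.
outer = conn.execute(
    "SELECT a.Name, b.Dept FROM TableA a "
    "LEFT OUTER JOIN TableB b ON a.EmployeeID = b.EmployeeID "
    "ORDER BY a.EmployeeID").fetchall()

print(inner)  # [('Ann', 'Sales')]
print(outer)  # [('Ann', 'Sales'), ('Bob', None)]
```

With the default inner join, Bob silently disappears from the pipeline; the outer-join override keeps him with a NULL department, which is often the behavior a dimension load actually needs.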
Example

A business sells a high volume of products and updates the Product Dimension table on a regular basis. To update the dimension table, a join of the PRODUCT and PRODUCT_COST tables is required. Since the source tables are from the same database and have a key relationship, only a single Source Qualifier transformation is needed.
Note the primary key-foreign key relationship between the PRODUCT_ID field of the PRODUCT table and the PRODUCT_CODE field of the PRODUCT_COST table.
Performance Considerations

For relational sources, if not all rows are required, the number of rows processed can be reduced by using a SQL override with an added WHERE clause, or by using the Source Filter attribute. The default SQL generated by the Source Qualifier can also be customized to improve performance.
Unit 3 Lab B: Load Product Staging Table

Section 2: Homogeneous Join

Business Purpose

There are two Oracle tables that together contain vital information about the products sold by Mersche Motors. You need to combine the data from both tables into a single staging table that can be used as a source of data for the data warehouse.

Technical Description

PowerCenter will define a homogeneous join between the two Oracle source tables. The source database server will perform an inner join on the tables, based on a join statement automatically generated by the Source Qualifier. The join result set will be loaded into the staging table.

Objectives

♦ Import relational source definitions
♦ View relationships between relational sources
♦ Use a Source Qualifier to define a homogeneous join and view the join statement
Duration 30 minutes
Unit 3 Lab B: Load Product Staging Table Informatica PowerCenter 8 Level I Developer
Velocity Deliverable: Mapping Specifications

Mapping Name: m_Stage_Product_xx
Source System: Oracle Tables
Target System: Oracle Table
Initial Rows: 48
Rows/Load: 48
Short Description: This mapping joins the product table and the product cost table and loads the data to the staging area
Load Frequency: Once
Preprocessing: Target truncate
Post Processing:
Error Strategy: Default
Reload Strategy:
Unique Source Fields: PRODUCT.PRODUCT_ID, PRODUCT_COST.PRODUCT_CODE

SOURCES

Tables:

Table Name     Schema/Owner   Selection/Filter
PRODUCT        SDBU           N/A
PRODUCT_COST   SDBU           N/A

TARGETS

Tables:
Schema Owner: TDBUxx

Table Name    Insert   Update   Delete   Unique Key
STG_PRODUCT   X

HIGH LEVEL PROCESS OVERVIEW

Source 1 + Source 2 -> Target

PROCESSING DESCRIPTION (DETAIL)

This is a simple mapping that joins the PRODUCT table to the PRODUCT_COST table to obtain cost information for each product. The product details, along with the cost details, are loaded into the staging area.
SOURCE TO TARGET FIELD MATRIX

Target Table   Target Column   Source Table   Source Column   Expression   Default Value if Null
STG_PRODUCT    PRODUCT_ID      PRODUCT        PRODUCT_ID
STG_PRODUCT    GROUP_ID        PRODUCT        GROUP_ID
STG_PRODUCT    PRODUCT_DESC    PRODUCT        PRODUCT_DESC
STG_PRODUCT    GROUP_DESC      PRODUCT        GROUP_DESC
STG_PRODUCT    DIVISION_DESC   PRODUCT        DIVISION_DESC
STG_PRODUCT    SUPPLIER_DESC   PRODUCT_COST   SUPPLIER_DESC
STG_PRODUCT    COMPONENT_ID    PRODUCT_COST   COMPONENT_ID
STG_PRODUCT    PRODUCT_COST    PRODUCT_COST   PRODUCT_COST
Instructions

Step 1: Import the Source Definitions

1. Open the Source Analyzer workspace in the Designer.
a. Right-click in the workspace and select Clear All.
b. Choose the menu option Sources > Import from Database.
   i. Set the ODBC Data Source drop-box to the ODBC_TRANS Data Source Name.
   ii. Enter the user name sdbu.
   iii. Tab down into Owner name and confirm that it defaults to the user name entered above.
   iv. Enter the password sdbu and click the Connect button.
   v. Expand the node in the Select tables area, and expand the TABLES node.
   vi. Import the relational tables PRODUCT and PRODUCT_COST.

Tip: You can select multiple objects for simultaneous import by using the Ctrl key.

2. Save your work. Your Source Analyzer should appear as displayed in Figure 3-10.

Figure 3-10. Source Definitions with a PK/FK Relationship Displayed in the Source Analyzer

Tip: The arrow connecting the keys PRODUCT_ID and PRODUCT_CODE denotes a relationship stored in the Informatica repository. By default, referential integrity (primary to foreign key) relationships defined on a database are imported when each of the tables in the relationship is imported. The arrowhead is on the primary key end (the parent / independent / 'one' end) of the relationship.

Tip: It is not generally a good practice to create two different tables with the same primary key. In correct database design, this method of 'horizontal partitioning' of a table is usually only justified for security or performance reasons. Separating products and their product costs doesn't meet either of these criteria. These two tables, however, give you a very good example of using a homogeneous join in a mapping.
Step 2: Import the Relational Target Definition

1. Open the Target Designer.
a. Right-click in the workspace and select Clear All.
b. Choose the menu option Targets > Import from Database.
   i. Connect using the ODBC Data Source ODBC_STG, the user name tdbuxx, and the password tdbuxx, where xx represents your student number.
   ii. Import the relational target definition STG_PRODUCT.

2. Save your work.
Step 3: Create the Mapping

1. Open the Mapping Designer.

2. If a mapping is visible in the workspace, close it by choosing the menu option Mappings > Close.

3. Create a new mapping named m_Stage_Product_xx. For further details about how to do this, see Step 2, "Create a Mapping" on page 38.

4. Choose the menu option Tools > Options.
a. Select the Tables tab.
   i. Set the Tools drop-box at the top to Mapping Designer.
   ii. Uncheck the check-box Create Source Qualifiers when opening Sources.
   iii. Click OK.

Tip: The check-box described above allows you to specify whether a Source Qualifier transformation will be created automatically every time a Source definition is added to the mapping. Generally, this option is turned off when you want to add several relational Sources to the mapping and create a single Source Qualifier to join them.

5. Add the source definitions PRODUCT and PRODUCT_COST to the mapping. You may need to display the Navigator window by selecting the menu option View > Navigator.

6. Create a Source Qualifier transformation by clicking on the appropriate icon in the transformation toolbar and then clicking in the workspace. The icon is shown highlighted below:

7. In the Select Sources for Source Qualifier Transformation dialog-box, confirm that both sources are selected and click OK.

8. Double-click the Source Qualifier to enter edit mode.

9. Click the Rename button and change the name to sq_Product_Product_Cost.

10. Add the target definition STG_PRODUCT to the mapping.

11. Link each of the output ports in the Source Qualifier to the input port in the target with the same name (i.e., PRODUCT_ID linked to PRODUCT_ID).

12. Link the COST port to the PRODUCT_COST port.
13. Save your mapping and confirm that it is valid. Note that the PRODUCT_CODE port in the Source Qualifier is intended to be unlinked, as it is not required in the target. Confirm that your mapping appears the same as displayed in Figure 3-11.

Figure 3-11. Normal View of the Completed Mapping

14. Edit the Source Qualifier.
a. Click on the Properties tab.
b. Open the SQL Query Editor by clicking the arrow in the SQL Query property.
c. Click the Generate SQL button. Note that the join statement can now be previewed, and that it is an inner join. Also note that the PRODUCT_CODE column is not in the SELECT statement; this is because the column is not linked in the mapping and is not needed. Your SQL Editor should appear as displayed in Figure 3-12.

Figure 3-12. Generated SQL for the m_Stage_Product Mapping
15. Click OK twice.

16. Save your work.

Tip: It is generally not a good practice to save the generated SQL unless there is a need to override it. If you cancel out of the SQL editor, then at runtime the session will create what is called the 'default query'. This is based on the ports and their links in the mapping. If you click OK and leave some SQL in the editor window, you have overridden the default query. Any time you wanted to link a new port out of the Source Qualifier, you would have to go in and regenerate the SQL.

Tip: The relationship between PRODUCT_ID and PRODUCT_CODE was used to generate the inner join statement. If you want to join two source tables on two columns that are not keys, you may establish a relationship between them by dragging the foreign key to the primary key column in the Source Analyzer. You may also modify the join statement to make it an outer join.
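The default-query pattern described above (linked ports only, joined on the PK/FK relationship) can be imitated outside PowerCenter. The SELECT list and join condition follow that pattern; sqlite3, the column subset, and the sample rows are invented stand-ins, not the exact SQL from Figure 3-12.

```python
# Imitation of the default query for sq_Product_Product_Cost: only
# linked ports appear in the SELECT list (PRODUCT_CODE is omitted),
# and the PK/FK relationship yields an inner join. sqlite3 and the
# sample rows are invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT (PRODUCT_ID INT, PRODUCT_DESC TEXT)")
conn.execute("CREATE TABLE PRODUCT_COST (PRODUCT_CODE INT, PRODUCT_COST REAL)")
conn.execute("INSERT INTO PRODUCT VALUES (1, 'Coupe')")
conn.execute("INSERT INTO PRODUCT_COST VALUES (1, 15000.0)")

default_query = (
    "SELECT PRODUCT.PRODUCT_ID, PRODUCT.PRODUCT_DESC, "
    "PRODUCT_COST.PRODUCT_COST "
    "FROM PRODUCT, PRODUCT_COST "
    "WHERE PRODUCT.PRODUCT_ID = PRODUCT_COST.PRODUCT_CODE")

joined = conn.execute(default_query).fetchall()
print(joined)  # [(1, 'Coupe', 15000.0)]
```

Because the query is regenerated at runtime from the linked ports, linking a new port later would add a column to this SELECT list automatically, as long as no override was saved.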
Step 4: Create the Session and Workflow

1. From the Workflow Manager application, open the Workflow Designer tool.

2. If a Workflow is visible in the workspace, close it by choosing the menu option Workflows > Close.

3. Create a new Workflow named wkf_Stage_Product_xx. For further details about how to do this, see Step 3, "Create a Workflow and a Session Task" on page 40.

4. Create a new Session by clicking on the appropriate icon in the task toolbar and then clicking in the workspace. The icon is shown highlighted below:

   Select the mapping m_Stage_Product_xx for the Session.

5. Edit the Session. In the Mapping tab:
   i. Set the relational source connection object property to NATIVE_TRANS.
   ii. Set the relational target connection object property to NATIVE_STGxx, where xx is your student number.
   iii. Check the Truncate target table option property in the target properties.
   iv. In the Properties area, confirm that the load type is Bulk.

6. Link the Start task to the Session task. For further details about how to do this, see Step 3, "Create a Workflow and a Session Task" on page 40.

7. Right-click in the workspace and select Arrange > Horizontal.

8. Save your work.
Step 5: Run the Workflow and Monitor the Results

1. Start the workflow.
Confirm that your Task Details appear the same as displayed in Figure 3-13.

Figure 3-13. Properties of the Completed Session Run

2. Confirm that your Source/Target Statistics appear the same as displayed in Figure 3-14.

Figure 3-14. Source/Target Statistics for the Completed Session Run

3. Using the Preview Data option in the Designer, confirm that your target data appears the same as displayed in Figure 3-15. Be sure to log in with user tdbuxx.

Figure 3-15. Data Preview of the STG_PRODUCT Target Table
Lesson 3-4. Source Pipelines
Unit 3 Lab C: Load Dealership and Promotions Staging Table
Section 3: Two Pipeline Mapping

Business Purpose
Two Dealership and Promotions staging tables must be loaded, one from a relational table and one from a flat file.
Technical Description
Both loads use simple pass-through logic, as in Lab A, so we will combine them into one mapping. Even though two sources and two targets are involved, only one Session is required to run this mapping.
Objectives
♦ Import a fixed-width flat file definition
♦ Define two data flows within one mapping

Duration: 20 minutes
Unit 3 Lab C: Load Dealership and Promotions Staging Table Informatica PowerCenter 8 Level I Developer
Velocity Deliverable: Mapping Specifications
Mapping Name: m_Dealership_Promotions_xx
Source System: Flat file, Oracle Table
Target System: Oracle Table
Initial Rows: 6, 31
Rows/Load: 6, 31
Short Description: Simple pass-through mapping with 2 pipelines. One pipeline extracts from a flat file and loads to an Oracle table. The second pipeline extracts from an Oracle table and loads an Oracle table.
Load Frequency:
Preprocessing: Target truncate
Post Processing:
Error Strategy: Default
Reload Strategy:
Unique Source Fields: DEALERSHIP.DEALERSHIP_ID, PROMOTIONS.PROMO_ID
SOURCES
Tables:
  Table Name: DEALERSHIP    Schema/Owner: SDBU    Selection/Filter: N/A
Files:
  File Name: promotions.txt    File Location: C:\pmfiles\SrcFiles    Fixed/Delimited: Fixed
  Additional File Info:
TARGETS
Tables (Schema Owner: TDBUxx)

  Table Name        Update  Delete  Insert  Unique Key
  STG_DEALERSHIP                    X
  STG_PROMOTIONS                    X
HIGH LEVEL PROCESS OVERVIEW
Source 1 → Target 1
Source 2 → Target 2

PROCESSING DESCRIPTION (DETAIL)
This is a pass-through mapping with no data transformation.
SOURCE TO TARGET FIELD MATRIX

  Target Table     Target Column           Source Table   Source Column
  STG_DEALERSHIP   DEALERSHIP_ID           DEALERSHIP     DEALERSHIP_ID
  STG_DEALERSHIP   DEALERSHIP_MANAGER_ID   DEALERSHIP     DEALERSHIP_MANAGER_ID
  STG_DEALERSHIP   DEALERSHIP_DESC         DEALERSHIP     DEALERSHIP_DESC
  STG_DEALERSHIP   DEALERSHIP_LOCATION     DEALERSHIP     DEALERSHIP_LOCATION
  STG_DEALERSHIP   DEALERSHIP_STATE        DEALERSHIP     DEALERSHIP_STATE
  STG_DEALERSHIP   DEALERSHIP_REGION       DEALERSHIP     DEALERSHIP_REGION
  STG_DEALERSHIP   DEALERSHIP_COUNTRY      DEALERSHIP     DEALERSHIP_COUNTRY
  STG_PROMOTIONS   PROMO_ID                PROMOTIONS     PROMO_ID
  STG_PROMOTIONS   PROMO_DESC              PROMOTIONS     PROMO_DESC
  STG_PROMOTIONS   PROMO_TYPE              PROMOTIONS     PROMO_TYPE
  STG_PROMOTIONS   START_DATE              PROMOTIONS     START_DATE
  STG_PROMOTIONS   EXPIRY_DATE             PROMOTIONS     EXPIRY_DATE
  STG_PROMOTIONS   PROMO_COST              PROMOTIONS     PROMO_COST
  STG_PROMOTIONS   DISCOUNT                PROMOTIONS     DISCOUNT

The Expression and Default Value if Null columns are empty; this is a straight pass-through.
Instructions
Step 1: Import the Source Definitions
1. Import the relational source definition DEALERSHIP. For further details about how to do this, see Step 1, “Import the Source Definitions” on page 52.
2. Edit the source definition for the promotions.txt file in the Source Analyzer.
   a. Click the Advanced button in the lower right of the edit box.
   b. Make sure that the number of bytes to skip between records is set to 2.
Note: A fixed-width flat file has bytes at the end of each row that represent a line feed and a carriage return. Depending on the system the file was created on, you will need to skip the appropriate number of bytes; if you don't, your result set will be offset by 1 or 2 bytes. For files created on a mainframe, set the value to 0; for UNIX/Linux, set it to 1; for all others, set it to 2.
Confirm that your promotions source definition appears the same as displayed in Figure 3-16.
Figure 3-16. Normal view of the promotions flat file definition displayed in the Source Analyzer
Step 2: Import the Target Definitions
Import the relational target definitions STG_DEALERSHIP and STG_PROMOTIONS. For further details about how to do this, see Step 2, “Import the Relational Target Definition” on page 53.
Step 3: Create the Mapping
1. Create a mapping named m_Dealership_Promotions_xx.
   a. Make sure that the option Create Source Qualifiers when Opening Sources is checked (on). For further details about how to do this, see Step 3, “Create the Mapping” on page 53.
   b. Add the Dealership and Promotions source definitions to the mapping.
   c. Confirm that a Source Qualifier was created for each.
   d. Add the STG_DEALERSHIP and STG_PROMOTIONS target definitions to the mapping.
   e. Link the appropriate Source Qualifier ports to the target ports.
2. Save the mapping and confirm that it is valid.
3. Right-click in a blank area within the mapping and choose the menu option Arrange All Iconic.
Confirm that your mapping appears as displayed in Figure 3-17.
Figure 3-17. Iconic View of the Completed Mapping
Step 4: Create and Run the Workflow
1. Create a workflow named wkf_Load_Stage_Dealership_Promotions_xx.
2. Create a Session Task named s_m_Dealership_Promotions_xx that uses the mapping m_Dealership_Promotions_xx.
3. Edit the Session.
   a. Set the database connection objects for the sources and targets in the Session. Note that both relational target database connections need to be set separately. For further details about how to do this, see “Create a Workflow and a Session Task” on page 40 and “Create the Session and Workflow” on page 55.
   b. Confirm that the source location information for the Promotions flat file is set correctly. For further details about how to do this, see “Create a Workflow and a Session Task” on page 40.
   c. Check the Truncate target table option in the target properties.
4. Complete the Workflow, save it, and run it. Confirm that your Task Details appear the same as displayed in Figure 3-18.
Figure 3-18. Properties of the Completed Session Run
Confirm that your Source/Target Statistics appear the same as displayed in Figure 3-19.
Figure 3-19. Source/Target Statistics for the Completed Session Run
5. Preview the target data with user tdbuxx. It should appear the same as Figure 3-20 and Figure 3-21.
Figure 3-20. Data Preview of the STG_DEALERSHIP Target Table
Figure 3-21. Data Preview of the STG_PROMOTIONS Target Table
Unit 4: Expression, Filter, File Lists, and Workflow Scheduler
In this unit you will learn about:
♦ Expression transformations
♦ Filter transformations
♦ File lists
♦ Workflow scheduler
Lesson 4-1. Expression Transformation
Type
Passive.

Description
The Expression transformation lets you modify individual ports (columns) of a single row. It also lets you add and suppress ports. It cannot perform aggregation across multiple rows; for that, use the Aggregator transformation.
Business Purpose
You can modify ports using logical and arithmetic operators or built-in functions for:
♦ Character manipulation (concatenate, truncate, etc.)
♦ Datatype conversion (to char, to date, etc.)
♦ Data cleansing (check nulls, replace string, etc.)
♦ Data manipulation (round, truncate, etc.)
♦ Numerical calculations (exponential, power, log, modulus, etc.)
♦ Scientific calculations (sine, cosine, etc.)
♦ Special (lookup, decode, etc.)
♦ Test (for spaces, number, etc.)
For example, you might need to adjust employee salaries, concatenate first and last names, or convert strings to numbers. You can use the Expression transformation to perform any non-aggregate calculations. You can also use the Expression transformation to test conditional statements before you output the results to target tables or other transformations.
Expression Editor Interface, Variables, and Validation
The Expression Editor interface (shown below) helps the developer construct an expression. Expressions can include numeric and logical operators, functions, ports, and variables.

The Expression Editor provides:
♦ Numeric and arithmetic/logical operator keypads.
♦ A Functions tab for built-in functions.
♦ A Ports tab for port values.
♦ A Variables tab for mapping and system variables.
Expressions resolve to a single value of a specific datatype. For example, the expression LENGTH('HELLO WORLD') / 2 returns a value of 5.5: the LENGTH function calculates the length of the string, including all blank spaces, as 11.
Tip: Highlighting a function and pressing F1 will launch the online help and open it at the highlighted function's section.
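As a cross-check, the same calculation can be reproduced directly in Python (a hedged analogue, not the PowerCenter expression language itself):

```python
# Python analogue of the expression LENGTH('HELLO WORLD') / 2.
# len() counts every character, including the space, just as LENGTH does.
s = "HELLO WORLD"
result = len(s) / 2   # 11 characters / 2
print(result)         # 5.5
```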
Variables and Scope
A transformation variable is created by creating a port and selecting the V check box. When V is checked, the I and O check boxes are grayed out. This indicates that a variable port is neither an input nor an output port.
When a record is processed, the expression is evaluated and the result is assigned to the variable port. The result must be compatible with the datatype selected for the port; otherwise, an error is generated. The variable persists across the entire set of records that traverse the transformation, and it may be used or modified anywhere in the set of data being processed.
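A rough Python sketch of this behavior (an illustrative row-at-a-time model, not PowerCenter internals): the variable holds its value from one row to the next, so it can accumulate state across the record set, such as a change count.

```python
# v_change_count plays the role of a variable port: its expression is
# evaluated per row, and the resulting value persists into the next row.
def process(item_names):
    v_change_count = 0
    results = []
    for item_name in item_names:
        cleaned = item_name.title()      # INITCAP-style title-case cleanup
        if cleaned != item_name:
            v_change_count += 1          # the variable keeps its running total
        results.append(cleaned)
    return results, v_change_count

names, changes = process(["WIDGET", "Gadget", "sprocket"])
print(changes)  # 2 ("WIDGET" and "sprocket" were changed)
```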
Example 1: Check, Clean, and Record Errors
Suppose that we want to address the following:
♦ Clean up item names: The Accounts Receivable department is tired of generating reports with an inconsistent set of item names. Some are in UPPERCASE while others are in lower case; still others are in mixed case. They would like to see all of the data in title case, and they would also like a count of how many changes have been made.
♦ Missing data: The Systems and Application group is concerned that occasionally some incomplete data is sent to end users. They would like to tag each such record as an error and be able to report on and investigate the data where critical fields are missing.
♦ Invalid dates: Due to application issues, dates are occasionally invalid. The AR department, as well as the auditors, are very concerned about this issue. They want every record with a bad date tagged and reported on.
♦ Invalid numbers: The Sales Department is concerned that they occasionally see non-numeric data in a report covering sales discounts where they expect numeric data. Find all errors and tag the records.
The mapping could use the following functions and expressions:

REQ 1 — INITCAP: places the text in title case.
  Expression: INITCAP(ITEM_NAME)
REQ 2 — ISNULL, LENGTH, IS_SPACES: ISNULL checks for a NULL, IS_SPACES looks for a string consisting of spaces, and LENGTH = 0 detects an empty string.
  Expression: ISNULL(port_name) OR LENGTH(port_name) = 0 OR IS_SPACES(port_name)
REQ 3 — IS_DATE: checks the input and determines whether the date is valid.
  Expression: IS_DATE('03/01/2005', 'MM/DD/YYYY')
REQ 4 — IS_NUMBER: checks the input and determines whether the number is valid.
  Expression: IS_NUMBER('3.1415')
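The checks above can be sketched in Python as a hedged analogue (function names here are illustrative, not PowerCenter APIs):

```python
import datetime

def is_missing(value):
    # Analogue of: ISNULL(port) OR LENGTH(port) = 0 OR IS_SPACES(port)
    return value is None or len(value) == 0 or value.isspace()

def is_date(value, fmt="%m/%d/%Y"):
    # Analogue of IS_DATE(value, 'MM/DD/YYYY') for a single format
    try:
        datetime.datetime.strptime(value, fmt)
        return True
    except ValueError:
        return False

def is_number(value):
    # Analogue of IS_NUMBER: does the string parse as a number?
    try:
        float(value)
        return True
    except ValueError:
        return False

print(is_missing("   "))          # True
print(is_date("03/01/2005"))      # True
print(is_number("3.1415"))        # True
print("rED sports CAR".title())   # Red Sports Car  (INITCAP analogue)
```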
Example 2: Calculate Sales Discounting and Inventory Days
Suppose we want to calculate:
♦ Discount tracking: Sales Management would like to compare the suggested sell price to the actual sell price to determine the level of discounting. They plan to do this via a report, and they would like a field that calculates the sales discount.
♦ Days in inventory: The Sales and Marketing departments would like to be able to determine how long an item was in inventory.
The mapping could use the following functions and expressions:

REQ 1 — Arithmetic: a simple arithmetic expression.
  Expression: ((MSRP - ACTUAL) / MSRP) * 100
REQ 2 — TO_DATE, DATE_DIFF: TO_DATE converts a string into an Informatica internal date, and DATE_DIFF calculates the difference between two dates in the units specified by the format operand. Here the difference is returned in days because the format is 'DD'.
  Expression: DATE_DIFF(TO_DATE(SOLD_DT, 'MM/DD/YYYY'), TO_DATE(INVENTORY_DT, 'MM/DD/YYYY'), 'DD')
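The same two calculations can be sketched in Python (the MSRP, actual price, and dates below are assumed sample values, not data from the lab):

```python
from datetime import datetime

# Discount tracking: ((MSRP - ACTUAL) / MSRP) * 100
msrp, actual = 200.0, 150.0
discount_pct = ((msrp - actual) / msrp) * 100
print(discount_pct)  # 25.0

# Days in inventory: analogue of DATE_DIFF(TO_DATE(...), TO_DATE(...), 'DD')
sold_dt = datetime.strptime("03/15/2005", "%m/%d/%Y")       # TO_DATE analogue
inventory_dt = datetime.strptime("03/01/2005", "%m/%d/%Y")  # TO_DATE analogue
days_in_inventory = (sold_dt - inventory_dt).days
print(days_in_inventory)  # 14
```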
Performance Considerations
Avoid performing multiple identical conversions of the same data. Ports that do not need modification should bypass the Expression transformation to save buffers.
Lesson 4-2. Filter Transformation
Type
Active.

Description
The Filter transformation allows rows that meet the filter condition to pass through the transformation. Rows that do not meet the filter condition are skipped.
Business Purpose
A business may choose not to process records that fail a data quality criterion, such as records containing a null value in a field that would cause a target constraint violation, or to eliminate from the process date field values that will not provide useful data.
Example 1
Existing customer dimension records need to be updated to reflect changes to columns such as address. However, only existing customer records are to be updated. The following example uses a Lookup to verify that the customer exists and a Filter to skip records that do not have an existing customer id (MSTR_CUST_ID). An Update Strategy then tags the records that pass the filter condition for update.
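In plain Python, the lookup-then-filter flow described above amounts to the following sketch (the customer ids and rows are assumed sample data):

```python
# master_cust_ids stands in for the Lookup against the customer dimension;
# the list comprehension plays the role of the Filter transformation.
master_cust_ids = {101, 102, 103}          # assumed existing MSTR_CUST_IDs

incoming = [
    {"cust_id": 101, "address": "12 Elm St"},
    {"cust_id": 999, "address": "9 Oak Ave"},   # unknown customer: skipped
]

# Only rows whose customer exists pass; they would then be tagged for update.
updates = [row for row in incoming if row["cust_id"] in master_cust_ids]
print([row["cust_id"] for row in updates])  # [101]
```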
Performance Considerations
Filter records that do not meet the selection criterion as early as possible in a mapping to reduce the number of rows processed and decrease run-time. In fact, any active transformation that decreases the number of rows (the Normalizer and the Router can increase the number of rows) should be placed as early as possible in the mapping to reduce the total number of rows processed downstream and improve performance.
Lesson 4-3. File Lists
In a Session task, you can set the source instance to point to a file list (a list of flat files or XML files).
♦ The session processes each file in turn.
♦ The properties of all files must match the source definition.
♦ Wildcards are not allowed.
♦ All of the files must exist.

Sample file list:
d:\data\eastern_trans.dat
e:\data\midwest_trans.dat
f:\data\canada_trans.dat
Lesson 4-4. Workflow Scheduler
Workflows can be scheduled to run at regular intervals.

Run Options
♦ Run on Integration Service Initialization: runs the workflow each time the Integration Service initializes, and then schedules it based on the other options.
♦ Run on demand: runs the workflow only when asked to.
♦ Run continuously: runs the workflow in continuous mode; when the workflow finishes, it starts again from the beginning.
Unit 4 Lab: Load the Customer Staging Table

Business Purpose
The staging area for the Mersche Motors data warehouse has a customer contacts table. Mersche Motors receives new data from its regional sales offices daily in the form of three text files. The text files are identical in layout. For processing simplicity, Mersche Motors will make use of PowerCenter's ability to read a list of files as a single source. The mapping that does this will run on a nightly schedule at midnight.
Technical Description
PowerCenter will source from a file list containing the names of three delimited flat files from the regional sales offices. All rows with a customer number of 99999 will need to be filtered out. A number of columns will need their data reformatted; this will involve substrings, concatenation, and decodes. The target will be truncated until the mapping is fully tested.
Objectives
♦ Create a Filter transformation to eliminate unwanted rows from a flat file source
♦ Create an Expression transformation to reformat incoming rows before they are written to a target
♦ Use the DECODE function as a small lookup to replace values for incoming data before writing to the target
♦ Create a Session task that will accept a file list as a source
♦ Create a workflow that can run on a schedule

Duration: 60 minutes
Unit 4 Lab: Load the Customer Staging Table Informatica PowerCenter 8 Level I Developer
Velocity Deliverable: Mapping Specifications
Mapping Name: m_Stage_Customer_Contacts_xx
Source System: Flat files
Target System: Oracle Table
Initial Rows: 6184
Rows/Load: 6177
Short Description: Flat file list (customer_east.txt, customer_west.txt, customer_central.txt) of comma-delimited files that need to be filtered and reformatted before they are loaded into the target table.
Load Frequency: Scheduled run every night at midnight.
Preprocessing:
Post Processing:
Error Strategy: Default
Reload Strategy:
Unique Source Fields:
SOURCES
Files:
  File Name: customer_central.txt, customer_east.txt, customer_west.txt
  File Location: C:\pmfiles\SrcFiles
  Fixed/Delimited: Delimited
  Additional File Info: These 3 comma-delimited flat files will be read into the session using a file list named customer_list.txt. The layout of the flat files can be found in customer_layout.txt.

  File Name: customer_list.txt
  File Location: C:\pmfiles\SrcFiles
  Fixed/Delimited: N/A
  Additional File Info: File list
TARGETS
Tables (Schema Owner: TDBUxx)

  Table Name      Update  Delete  Insert  Unique Key
  STG_CUSTOMERS                   X
HIGH LEVEL PROCESS OVERVIEW
Source → Filter → Expression → Target
SOURCE TO TARGET FIELD MATRIX

  Target Table   Target Column    Source Table      Source Column
  STG_CUSTOMER   CUST_ID          customer_layout   CUSTOMER_NO
  STG_CUSTOMER   CUST_NAME        customer_layout   FIRSTNAME, LASTNAME
  STG_CUSTOMER   CUST_ADDRESS     customer_layout   ADDRESS
  STG_CUSTOMER   CUST_CITY        customer_layout   CITY
  STG_CUSTOMER   CUST_STATE       customer_layout   STATE
  STG_CUSTOMER   CUST_ZIP_CODE    customer_layout   ZIP
  STG_CUSTOMER   CUST_COUNTRY     customer_layout   COUNTRY
  STG_CUSTOMER   CUST_PHONE       customer_layout   PHONE_NUMBER
  STG_CUSTOMER   CUST_GENDER      customer_layout   GENDER
  STG_CUSTOMER   CUST_AGE_GROUP   customer_layout   AGE
  STG_CUSTOMER   CUST_INCOME      customer_layout   INCOME
  STG_CUSTOMER   CUST_E_MAIL      customer_layout   EMAIL
  STG_CUSTOMER   CUST_AGE         customer_layout   AGE

Expression notes:
♦ The CUST_NAME column is a concatenation of the firstname and lastname ports.
♦ The CUST_PHONE is a reformat of the PHONE_NUMBER column. PHONE_NUMBER is in the format 9999999999 and needs to be reformatted to (999) 999-9999.
♦ The CUST_GENDER is derived from decoding the GENDER column. GENDER is a 1-character column that contains either 'M' (male) or 'F' (female); any other value resolves to 'UNK'.
♦ The CUST_AGE_GROUP is derived from decoding the AGE column. The valid age groups are LESS THAN 20, 20 TO 29, 30 TO 39, 40 TO 49, 50 TO 60, and GREATER THAN 60.

PROCESSING DESCRIPTION (DETAIL)
This mapping will reformat the customer names, gender, and telephone number columns. Customer inquiries are captured using customer_no 99999; any customer numbers equal to 99999 are considered customer inquiries and not part of the sales transactions, so the mapping will filter them out.
Instructions
Step 1: Create a Flat File Source Definition
1. Launch the Designer client tool.
2. Log into the PC8_DEV repository with the user name studentxx, where xx represents your student number as assigned by the instructor.
3. Open your student folder.
4. Import the customer_layout.txt flat file definition. This file is located in the c:\pmfiles\SrcFiles directory; if the file is located in a different directory, your instructor will specify it. Ensure that the following parameters are selected:
   ♦ Import field names from first line.
   ♦ Comma delimited flat file.
   ♦ Text Qualifier is Double quotes.
   ♦ Format of the Date field is Datetime.
Tip: Only one flat file definition is required when using a file list as a source in PowerCenter. All the files that make up the file list must have the same layout in order for the file list to be processed successfully.
5. Confirm that your source definition appears as displayed in Figure 4-1.
Figure 4-1. Source Analyzer View of the customer_layout Flat File Definition
6. Save your work to the repository.
Step 2: Create a Relational Target Definition
1. Import the STG_CUSTOMERS table found in your tdbuxx schema.
2. Confirm that your target definition appears the same as displayed in Figure 4-2.
Figure 4-2. Target Designer View of the STG_CUSTOMERS Table Relational Definition
Step 3: Create a Mapping
1. Create a new mapping named m_Stage_Customer_Contacts_xx.
2. Add the customer_layout flat file source to the mapping.
3. Add the STG_CUSTOMERS target to the mapping. Your mapping will appear similar to Figure 4-3.
Figure 4-3. Mapping with Source and Target Definitions

Step 4: Create a Filter Transformation
1. Select the Filter transformation tool button located on the Transformation toolbar and place the transformation in the workspace between the Source Qualifier and the target. The icon is shown highlighted below:
Your mapping will appear similar to Figure 4-4.
Figure 4-4. Mapping with Newly Added Filter Transformation
2. Link the following ports from the Source Qualifier to the Filter:
   ♦ CUSTOMER_NO
   ♦ FIRSTNAME
   ♦ LASTNAME
   ♦ ADDRESS
   ♦ CITY
   ♦ STATE
   ♦ ZIP
   ♦ COUNTRY
   ♦ PHONE_NUMBER
   ♦ GENDER
   ♦ INCOME
   ♦ EMAIL
   ♦ AGE
3. Edit the Filter transformation.
   a. Rename it to fil_Customer_No_99999.
   b. Select the Properties tab.
Your display will appear similar to Figure 4-5.
Figure 4-5. Properties Tab of the Filter Transformation
4. Click the dropdown arrow for the Filter Condition transformation attribute to activate the Expression Editor.
5. Remove the TRUE condition from the Expression Editor.
6. Enter the following expression: CUSTOMER_NO != 99999 OR ISNULL(CUSTOMER_NO)
7. Click OK to return to the Properties tab of the Filter transformation. The properties will appear as displayed in Figure 4-6.
Figure 4-6. Completed Properties Tab of the Filter Transformation
8. Click OK to return to the normal view of the mapping object.
Step 5: Create an Expression Transformation
1. Create an Expression transformation directly after the Filter transformation: select the Expression transformation tool button on the Transformation toolbar and place it in the workspace directly after the Filter. The icon is shown highlighted below:
2. Select the following ports from the Filter transformation and pass them to the Expression transformation:
   ♦ FIRSTNAME
   ♦ LASTNAME
   ♦ PHONE_NUMBER
   ♦ GENDER
   ♦ AGE
Your mapping will appear similar to Figure 4-7.
Figure 4-7. Filter Transformation Linked to the Expression Transformation
3. Edit the Expression transformation object.
   a. Rename it exp_Format_Name_Gender_Phone.
   b. Change the port type to input for all of the ports except AGE. (AGE should remain an input/output port.)
   c. Prefix each of these input-only ports with IN_.
   d. Create a new output port after the AGE port by positioning the cursor on the AGE port and clicking the add icon.
      ♦ Port Name = OUT_CUST_NAME
      ♦ Datatype = String
      ♦ Precision = 41
      ♦ Expression = IN_FIRSTNAME || ' ' || IN_LASTNAME
Velocity Best Practice: Prefixing input-only ports with IN_ and output ports with OUT_ is a Velocity best practice. This makes it easier to tell what the ports are without having to open the transformation.
Tip: This new port will concatenate the FIRSTNAME and LASTNAME ports into a single string. Do not use the CONCAT function to concatenate in expressions; use || to achieve concatenation. The CONCAT function is only available for backwards compatibility.
4. Create a new output port after the OUT_CUST_NAME port.
   ♦ Port Name = OUT_CUST_PHONE
   ♦ Datatype = String
   ♦ Precision = 14
   ♦ Expression = '(' || SUBSTR(TO_CHAR(IN_PHONE_NUMBER),1,3) || ') ' || SUBSTR(TO_CHAR(IN_PHONE_NUMBER),4,3) || '-' || SUBSTR(TO_CHAR(IN_PHONE_NUMBER),7,4)
Tip: The expression above uses a technique known as nesting functions. The TO_CHAR function is nested inside the SUBSTR function: TO_CHAR is performed first, and SUBSTR is then performed against the return value from TO_CHAR.
5. Create a new output port after the OUT_CUST_PHONE port.
   ♦ Port Name = OUT_GENDER
   ♦ Datatype = String
   ♦ Precision = 6
   ♦ Expression = DECODE(IN_GENDER, 'M', 'MALE', 'F', 'FEMALE', 'UNK')
Tip: The DECODE function used in the previous expression can be used to replace nested IIF functions or small static lookup tables. This DECODE will return the value MALE if the incoming GENDER port equals M, FEMALE if it equals F, or UNK if it equals anything else besides F or M.
6. Create a new output port after the OUT_GENDER port.
   ♦ Port Name = OUT_AGE_GROUP
   ♦ Datatype = String
   ♦ Precision = 20
   ♦ Expression = Write an expression using the DECODE function that will assign the appropriate age group label to each customer based on their age. Use the online help to see details about DECODE. If after 5 minutes you have not successfully created the DECODE statement, refer to the reference section at the end of the lab for the solution. The valid age ranges and age groups are displayed in the table below.

  Age Range                  Age Group Text
  AGE < 20                   LESS THAN 20
  AGE >= 20 AND AGE <= 29    20 TO 29
  AGE >= 30 AND AGE <= 39    30 TO 39
  AGE >= 40 AND AGE <= 49    40 TO 49
  AGE >= 50 AND AGE <= 60    50 TO 60
  AGE > 60                   GREATER THAN 60
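As a hedged cross-check (Python, not the expression language; note that slice indices here are 0-based, versus SUBSTR's 1-based positions), the name, phone, and gender expressions behave like:

```python
def cust_name(firstname, lastname):
    # IN_FIRSTNAME || ' ' || IN_LASTNAME
    return firstname + " " + lastname

def cust_phone(phone_number):
    s = str(phone_number)                                 # TO_CHAR analogue
    return "(" + s[0:3] + ") " + s[3:6] + "-" + s[6:10]   # three SUBSTRs

def cust_gender(gender):
    # DECODE(IN_GENDER, 'M', 'MALE', 'F', 'FEMALE', 'UNK') analogue
    return {"M": "MALE", "F": "FEMALE"}.get(gender, "UNK")

print(cust_phone(5551234567))  # (555) 123-4567
print(cust_gender("X"))        # UNK
```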
Figure 4-8. Sample Expression
7. Save your work.
8. Connect the following ports from the Expression transformation to the target table:
   AGE → CUST_AGE
   OUT_CUST_NAME → CUST_NAME
   OUT_CUST_PHONE → CUST_PHONE_NMBR
   OUT_GENDER → CUST_GENDER
   OUT_AGE_GROUP → CUST_AGE_GROUP
9. Connect the following ports from the Filter transformation to the target table:
   CUSTOMER_NO → CUST_ID
   ADDRESS → CUST_ADDRESS
   CITY → CUST_CITY
   STATE → CUST_STATE
   ZIP → CUST_ZIP_CODE
   COUNTRY → CUST_COUNTRY
   INCOME → CUST_INCOME
   EMAIL → CUST_E_MAIL
10. Save your work.
11. Verify that your mapping is valid.
12. Right-click in the workspace and select Arrange All Iconic.
Figure 4-9. Iconic View of the Completed Mapping
Step 6: Create and Run the Workflow
1. Launch the Workflow Manager and sign into your assigned folder.
2. Open the Workflow Designer tool and create a new workflow named wkf_Stage_Customer_Contacts_xx.
3. Create a Session task using the Session task tool button.
4. Select m_Stage_Customer_Contacts_xx from the Mapping list box, and click OK.
5. Link the Start object to the s_m_Stage_Customer_Contacts_xx session task.
6. Edit the s_m_Stage_Customer_Contacts_xx session.
7. Under the Mapping tab:
   a. Select SQ_customer_layout located under the Sources folder in the navigator window.
   b. Confirm that Source file directory is set to $PMSourceFileDir\.
   c. In Properties | Attribute | Source filename, type customer_list.txt.
Tip: The source instance you are reading from is known as a file list. It is a list of files that will be appended together and treated as one source file by PowerCenter. The file named in Properties | Attribute | Source filename is a text file that contains the list of text files to be read in as individual sources. To create a file list, open a blank text file with an application such as Notepad and type each text file that is to be read as part of the file list on a separate line. You may precede each file name with directory path information; if you don't, PowerCenter assumes the files are located in the same directory as the file list file itself.
   d. In Properties | Attribute | Source filetype, click the dropdown arrow and change the default from Direct to Indirect.
Tip: When you use the file list feature in PowerCenter, you must set Properties | Attribute | Source filetype to Indirect. The default is Direct. To change the setting, click the dropdown arrow and select the value you want.
Your screen should appear similar to Figure 4-10.
Figure 4-10. Session Task Source Properties
The file list used in this exercise lists three text files found in the default location of the file list file, $PMSourceFileDir\. Figure 4-11 displays the contents of customer_list.txt.
Figure 4-11. Contents of the customer_list.txt File List
   e. Select STG_CUSTOMERS located under the Target folder in the navigator window.
      ♦ Set the relational target connection object property to NATIVE_STGxx, where xx is your student number.
      ♦ Check the Truncate target table option in the target properties.
8. Save your work.
9. Check the Validate messages to ensure your workflow is valid.
10. Start the workflow.
11. Review the session properties.
Your information should appear as displayed in Figure 4-12.
Figure 4-12. Properties for the Completed Session Run
12. Review the Source/Target Statistics. Your statistics should be the same as displayed in Figure 4-13.
Figure 4-13. Source/Target Statistics for the Completed Session Run
13. If your session failed or had errors, troubleshoot and correct them by reviewing the session log and making any necessary changes to your mapping or workflow.
Data Results
Preview the target data from the Designer. Your data should appear as displayed in Figure 4-14.
Figure 4-14. Data Preview of the STG_CUSTOMERS Target Table
Observe the CUST_PHONE, CUST_GENDER, and CUST_AGE_GROUP columns. These columns required transformation using the Expression transformation. Scroll down, review these columns, and verify that you wrote your expressions correctly.
Step 7: Schedule a Workflow
1. After debugging has been completed, run the workflow a final time for an initial table load.
2. Open the Session task for the mapping and ensure the Truncate target table property is checked.
3. Save any changes to the repository.
4. Select Workflows > Edit. This will display the screen seen in Figure 4-15.
Figure 4-15. General Properties for the Workflow
5. Select the Scheduler tab.
6. Select the Edit Scheduler command button.
7. Type sch_Stage_Customers_Contacts_xx in the Name text box.
8. Select the Schedule tab.
a. Clear the Run on demand check box.
b. Select the Customized Repeat radio button and click the Edit button.
   i. Select Week(s) from the Repeat every dropdown box.
   ii. Check the Monday, Tuesday, Wednesday, Thursday, and Friday Weekly check boxes.
   iii. Select the Run once radio button in the Daily frequency group.
Your customized options should appear the same as displayed in Figure 4-16. Figure 4-16. Customized Repeat Selections
   iv. Click OK.
c. Set the Start Date in the Start options group to tomorrow's date.
d. Set the Start Time to 00:01.
e. Select the Forever radio button in the End options group. Your schedule options will appear similar to those displayed in Figure 4-17.
Figure 4-17. Completed Schedule Options
9. Click OK twice.
10. Save your changes to the repository.
11. Right-click in the workspace and select Schedule Workflow.
12. Check the Workflow Monitor to confirm that the workflow has been scheduled.
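The schedule configured above (repeat weekly on Monday through Friday, run once daily at 00:01, starting tomorrow, never ending) can be sanity-checked outside PowerCenter. This Python sketch, which is an illustration and not part of the lab, lists the run times implied by those settings; the start date used is an arbitrary example:

```python
from datetime import date, datetime, time, timedelta

def next_runs(start_date, count=5):
    """Yield the next `count` run times for a schedule that repeats
    weekly on Mon-Fri, running once daily at 00:01."""
    runs = []
    day = start_date
    while len(runs) < count:
        if day.weekday() < 5:  # 0=Monday .. 4=Friday
            runs.append(datetime.combine(day, time(0, 1)))
        day += timedelta(days=1)
    return runs

# Example: a schedule starting on a Saturday skips to Monday.
runs = next_runs(date(2006, 4, 29))  # 2006-04-29 was a Saturday
print(runs[0])  # first run lands on Monday 2006-05-01 at 00:01
```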
References
1. DECODE statement:

   DECODE(TRUE,
          AGE < 20,                 'LESS THAN 20',
          AGE >= 20 AND AGE <= 29,  '20 TO 29',
          AGE >= 30 AND AGE <= 39,  '30 TO 39',
          AGE >= 40 AND AGE <= 49,  '40 TO 49',
          AGE >= 50 AND AGE <= 60,  '50 TO 60',
          AGE > 60,                 'GREATER THAN 60')
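The DECODE(TRUE, ...) pattern evaluates its condition/value pairs in order and returns the value paired with the first condition that is TRUE. For readers more familiar with procedural code, the same age-group logic can be sketched in Python (an illustration only, not PowerCenter syntax):

```python
def age_group(age):
    """Mirror of the DECODE(TRUE, ...) expression: conditions are
    evaluated top to bottom, and the first match wins."""
    if age < 20:
        return 'LESS THAN 20'
    elif 20 <= age <= 29:
        return '20 TO 29'
    elif 30 <= age <= 39:
        return '30 TO 39'
    elif 40 <= age <= 49:
        return '40 TO 49'
    elif 50 <= age <= 60:
        return '50 TO 60'
    elif age > 60:
        return 'GREATER THAN 60'

print(age_group(34))  # → 30 TO 39
```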
Unit 5: Joins, Features and Techniques
In this unit you will learn about:
♦ Heterogeneous joins
♦ Joiner transformations
The labs also illustrate various PowerCenter mapping features and techniques.
Lesson 5-1. Joiner Transformation
Joins combine data from different records (rows). A join selects rows from two different pipelines based on a relationship between the data, e.g., a matching customer ID. One source is designated the Master, the other the Detail.
Unit 5: Joins, Features and Techniques Informatica PowerCenter 8 Level I Developer
Type
Active.

Description
The Joiner transformation combines fields from two data sources into a single data source based on one or more common fields, also known as the join condition.

Business Purpose
A business has data from two different systems that needs to be combined to get the desired results.
Example
A business has sales transaction data in a flat file and product data in a relational table. The company needs to join the sales transactions to the product table to obtain product information. The Joiner transformation accomplishes this task.
Joiner Properties
Join Types
There are four types of join conditions supported by the Joiner transformation: Normal, Master Outer, Detail Outer, and Full Outer.
Joiner Cache: How it Works
♦ There are two types of cache memory: the index cache and the data cache.
♦ All rows from the master source are loaded into cache memory.
♦ The index cache contains all port values from the master source where the port is specified in the join condition.
♦ The data cache contains all master port values not specified in the join condition.
♦ After the cache is loaded, the detail source is compared row by row to the values in the index cache.
♦ Upon a match, the rows from the data cache are included in the output stream.

Key Point
If there is not enough memory specified in the index and data cache properties, the overflow will be written to disk.

Performance Considerations
The master source should be the source that takes up the least amount of space in cache. Another performance consideration is sorting the data prior to the Joiner transformation (discussed later).
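The cache behavior described above amounts to a classic hash join: master rows are cached, keyed by the join-condition ports, and detail rows then stream past and probe the cache. This simplified Python sketch, with invented sample data, illustrates that flow:

```python
# Master source: the smaller source, loaded entirely into cache.
master = [
    {"PRODUCT_ID": 1, "PRODUCT_COST": 17000.0},
    {"PRODUCT_ID": 2, "PRODUCT_COST": 21500.0},
]
# Detail source: the larger source, streamed row by row.
detail = [
    {"PRODUCT": 1, "TRANSACTION_ID": "T100"},
    {"PRODUCT": 2, "TRANSACTION_ID": "T101"},
    {"PRODUCT": 9, "TRANSACTION_ID": "T102"},  # no master match
]

# Index cache holds the join-condition values; data cache holds
# the remaining master ports (here, PRODUCT_COST).
cache = {row["PRODUCT_ID"]: {"PRODUCT_COST": row["PRODUCT_COST"]}
         for row in master}

# Probe phase: each detail row is compared to the index cache;
# on a match, the cached data ports join the output stream.
joined = [{**d, **cache[d["PRODUCT"]]} for d in detail
          if d["PRODUCT"] in cache]  # a normal (inner) join drops T102

print(len(joined))  # → 2
```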
Lesson 5-2. Shortcuts
Unit 5 Lab A: Load Sales Transaction Staging Table

Business Purpose
Mersche Motors receives sales transaction data from its regional sales offices in the form of a text file. The sales transaction data needs to be loaded into the staging table daily.
Technical Description
PowerCenter will source from a flat file and a relational table. A Joiner transformation is used to create one data flow that is then written to a relational target. The flat file is missing one field that the staging table needs: the cost of each product. This value can be read from the STG_PRODUCT table. Each row of the source file contains a value named Product. This value has an identical corresponding value in the PRODUCT_ID column of the STG_PRODUCT table. Use the Joiner transformation to join the flat file to the relational table (a heterogeneous join) and then write the results to the STG_TRANSACTIONS table.
Objectives
♦ Create a Joiner transformation and use it to join two data streams from two different Source Qualifiers.
♦ Select the master side of the join.
♦ Specify a join condition.

Duration: 30 minutes
Unit 5 Lab A: Load Sales Transaction Staging Table Informatica PowerCenter 8 Level I Developer
Velocity Deliverable: Mapping Specifications

Mapping Name:          m_STG_TRANSACTIONS_xx
Source System:         Flat file, Oracle table
Target System:         Oracle table
Initial Rows:          5474
Rows/Load:             5474
Short Description:     Flat file and Oracle table will be joined into one source datastream, which will be written to an Oracle target table.
Load Frequency:        Daily
Preprocessing:         (none)
Post Processing:       (none)
Error Strategy:        Default
Reload Strategy:       (none)
Unique Source Fields:  PRODUCT_ID
Target:                Append

SOURCES
Tables:
  Table Name: STG_PRODUCT   Schema/Owner: TDBUxx   Selection/Filter: (none)
Files:
  File Name: sales_transactions.txt   File Location: C:\pmfiles\SrcFiles
  Fixed/Delimited: Delimited          Additional File Info: Comma delimiter

TARGETS
Tables:
  Table Name: STG_TRANSACTIONS   Schema Owner: TDBUxx
  Insert: X   Update:   Delete:   Unique Key:
HIGH LEVEL PROCESS OVERVIEW
Flat File Source and Relational Source → Joiner Transformation → Relational Target
PROCESSING DESCRIPTION (DETAIL)
The flat file source has all the required detail except for product cost. It will be joined with the relational source into one data stream, allowing access to the product cost for each sales transaction. The relational source has fewer columns and fewer rows. The columns to join on are PRODUCT_ID and PRODUCT. The single data stream will then be written to a staging table. Also, a number of fields in the source file need further scrutiny during the import process.
SOURCE TO TARGET FIELD MATRIX

Target Table        Target Column      Source File          Source Column      Expression   Default Value if Null
STG_TRANSACTIONS    CUST_ID            Sales_transactions   CUST_NO
STG_TRANSACTIONS    PRODUCT_ID         Sales_transactions   PRODUCT
STG_TRANSACTIONS    DEALERSHIP_ID      Sales_transactions   DEALERSHIP
STG_TRANSACTIONS    PAYMENT_DESC       Sales_transactions   PAYMENT_DESC
STG_TRANSACTIONS    PROMO_ID           Sales_transactions   PROMO_ID
STG_TRANSACTIONS    DATE_ID            Sales_transactions   DATE_ID
STG_TRANSACTIONS    TRANSACTION_DATE   Sales_transactions   TRANSACTION_DATE
STG_TRANSACTIONS    TRANSACTION_ID     Sales_transactions   TRANSACTION_ID
STG_TRANSACTIONS    EMPLOYEE_ID        Sales_transactions   EMPLOYEE_ID
STG_TRANSACTIONS    TIME_KEY           Sales_transactions   TIME_KEY
STG_TRANSACTIONS    SELLING_PRICE      Sales_transactions   SELLING PRICE
STG_TRANSACTIONS    UNIT_COST          STG_PRODUCT          PRODUCT_COST
STG_TRANSACTIONS    DELIVERY_CHARGES   Sales_transactions   DELIVERY CHARGES
STG_TRANSACTIONS    SALES_QTY          Sales_transactions   QUANTITY
STG_TRANSACTIONS    DISCOUNT           Sales_transactions   DISCOUNT
STG_TRANSACTIONS    HOLDBACK           Sales_transactions   HOLDBACK
STG_TRANSACTIONS    REBATE             Sales_transactions   REBATE

The Expression and Default Value if Null columns are blank for every row.
Instructions

Step 1: Create a Flat File Source Definition
1. Launch the Designer client tool (if it is not already running) and open your student folder.
2. Open the Source Analyzer tool.
3. Import the sales_transactions.txt comma-delimited flat file.
4. Ensure that the Transaction Date field has a datatype of Datetime.
5. Save the repository.
Step 2: Create a Relational Source Definition
1. Verify that you are in the Source Analyzer tool, and import the STG_PRODUCT table found in your tdbuxx schema. Use ODBC_STG as the ODBC data source. Note that you are importing the table as a source definition, even though it is in your “target” (tdbuxx) schema.
2. Save the repository.
Step 3: Create a Relational Target Definition
1. Open the Target Designer tool.
2. Import the STG_TRANSACTIONS table found in your tdbuxx schema.
Step 4: Create a Mapping
1. Open the Mapping Designer tool.
2. Create a new mapping named m_STG_TRANSACTIONS_xx.
3. Add the sales_transactions flat file source to the new mapping.
4. Add the STG_PRODUCT relational source to the new mapping.
5. Add the STG_TRANSACTIONS relational target to the new mapping. Your mapping should appear similar to Figure 5-1.
Figure 5-1. Normal View of the Heterogeneous Sources, Source Qualifiers and Target
Step 5: Create a Joiner Transformation
1. Select the Joiner transformation icon located on the Transformation toolbar with a single left click. Figure 5-2 shows the Joiner transformation button.
Figure 5-2. Joiner Transformation Button
2. Create a new Joiner transformation.
3. Select all the ports from the SQ_sales_transactions object and copy/link them to the Joiner transformation.
4. Select only the PRODUCT_ID and PRODUCT_COST ports from the SQ_STG_PRODUCT object and copy them to the Joiner transformation. Your mapping should be similar to Figure 5-3.
Figure 5-3. Normal View of Heterogeneous Sources Connected to a Joiner Transformation
5. Edit the Joiner transformation.
   a. Rename it to jnr_Sales_Transaction_To_STG_PRODUCT.
   b. Select the Ports tab.
   c. Set the Master (M) property to the STG_PRODUCT ports.
Tip: Which ports should be the Master? Use the source that is smaller in rows and bytes if the data is not sorted. If the source data is sorted, use the source with the fewest join column duplicates.
Figure 5-4. Edit View of the Ports Tab for the Joiner Transformation
   d. Uncheck the output check box for PRODUCT_ID.
   e. Rename the PRODUCT_ID port to IN_PRODUCT_ID.
6. Select the Condition tab.
   a. Click the Add a new condition button. Figure 5-5 displays the Add a new condition button as selected.
Figure 5-5. Edit View of the Condition Tab for the Joiner Transformation Without a Condition
The Master drop-down box will default to IN_PRODUCT_ID.
   b. Select the Detail drop-down box and set it to PRODUCT. Your condition should be the same as displayed in Figure 5-6.
Figure 5-6. Edit View of the Condition Tab for the Joiner Transformation with Completed Condition
Tip: The Joiner transformation can support multiple port conditions to create a join. If you need multiple port conditions, simply click the Add a new condition button to add the other ports that make up the condition.
   c. Click OK.
7. Save the repository.
Step 6: Link the Target Table
1. Link the following ports from the Joiner transformation to the corresponding columns in the target object:

   JOINER PORT        TARGET COLUMN
   CUST_NO            CUST_ID
   PRODUCT            PRODUCT_ID
   DEALERSHIP         DEALERSHIP_ID
   PAYMENT_DESC       PAYMENT_DESC
   PROMO_ID           PROMO_ID
   DATE_ID            DATE_ID
   TRANSACTION DATE   TRANSACTION_DATE
   TRANSACTION_ID     TRANSACTION_ID
   EMPLOYEE_ID        EMPLOYEE_ID
   TIME_KEY           TIME_KEY
   SELLING PRICE      SELLING_PRICE
   PRODUCT_COST       UNIT_COST
   DELIVERY CHARGES   DELIVERY_CHARGES
   QUANTITY           SALES_QTY
   DISCOUNT           DISCOUNT
   HOLDBACK           HOLDBACK
   REBATE             REBATE
Figure 5-7 displays the ports linked to their corresponding columns.
Figure 5-7. Normal View of Completed Mapping, Heterogeneous Sources Not Displayed
2. Save the repository.
3. Verify that your mapping is valid in the Output window. If the mapping is not valid, correct the problems displayed in the message.
Step 7: Create a Workflow and Session Task
1. Launch the Workflow Manager application (if it is not already running) and log into the repository and your student folder.
2. Open the Workflow Designer tool and create a new workflow named wkf_STG_TRANSACTIONS_xx.
3. Add a new Session task using the Session task icon.
4. Select m_STG_TRANSACTIONS_xx from the Mapping list box and click OK.
5. Link the Start object to the s_m_STG_TRANSACTIONS_xx session task object.
6. Edit the s_m_STG_TRANSACTIONS_xx session task.
   a. Select the Mapping tab.
   b. Select SQ_sales_transactions located under the Sources folder in the Mapping navigator.
   c. Confirm that Properties | Attribute | Source file directory is set to $PMSourceFileDir\
   d. In Properties | Attribute | Source filename, verify that sales_transactions.txt is displayed. The file extension (.txt) must be present.
   e. Select SQ_STG_PRODUCT located under the Sources folder in the navigator window. Set the Connections | Type to your assigned Native_STGxx connection object.
   f. Select STG_TRANSACTIONS located under the Target folder in the navigator window. Set the Connections | Type to your assigned Native_STGxx connection object. Check the Truncate target table option checkbox.
7. Save the repository.
8. Check the validation messages to ensure your workflow is valid. If you received an invalid message, correct the problem(s), then re-validate and save.
Step 8: Start the Workflow and View Results in the Workflow Monitor
1. Start the workflow.
2. Confirm that the Workflow Monitor application launches automatically.
3. Maximize the Workflow Monitor.
4. Double-click the session with your left mouse button and view the Task Details window. Your information should appear similar to Figure 5-8.
Figure 5-8. Task Details of the Completed Session Run
5. Select the Transformation Statistics tab. Your statistics should be similar to Figure 5-9.
Figure 5-9. Source/Target Statistics for the Session Run
6. If your session failed or had an error, proceed to the next step.
7. Right-click the session again and select Get Session Log.
8. Search the session log for the error messages that caused your session to have issues. Read the messages and correct the problem. Rerun your workflow to test your fix(es). Ask your instructor for help if you get stuck.
Data Results
Preview the target data from the Designer. Your data should appear the same as displayed in Figure 5-10.
Figure 5-10. Data Preview of the STG_TRANSACTIONS Table
Unit 5 Lab B: Features and Techniques I

Business Purpose
Management wants to increase the efficiency of the PowerCenter developers.
Technical Description
This lab details the use of 13 PowerCenter Designer features. Each of these features will increase the efficiency of any developer who knows how to use them. At the discretion of the instructor, this lab can also be completed as a demonstration.
Objectives
♦ Auto Arrange
♦ Remove Links
♦ Revert to Saved
♦ Link Path
♦ Propagating Ports
♦ Autolink by Name and Position
♦ Moving Ports
♦ Shortcut to Port Editing from Normal View
♦ Create Transformation Methods
♦ Scale-to-Fit
♦ Designer Options
♦ Object Shortcuts and Copies
♦ Copy Objects Within and Between Mappings

Duration: 50 minutes
Unit 5 Lab B: Features and Techniques I Informatica PowerCenter 8 Level I Developer
Instructions

Open a Mapping
In the Designer tool:
1. Open your student folder.
2. Open the mapping m_Stage_Customer_Contacts_xx.
Feature 1: Auto Arrange
The Designer includes an Arrange feature that reorganizes objects in the Workspace in one simple step. This aids readability and analysis of the mapping flow and can be applied to specific paths through a mapping associated with particular target definitions. In a couple of clicks, this feature can take a mapping that looks like Figure 5-11.
Figure 5-11. View of an Unorganized Mapping
And change it to look like Figure 5-12. Figure 5-12. Arranged View of a Mapping
1. Choose Layout > Arrange All Iconic or right-click in the Workspace and select Arrange All Iconic.
Figure 5-13. Iconic View of an Arranged Mapping
2. Choose Layout > Arrange All or right-click in the Workspace and select Arrange All.
3. Type Ctrl+S to save.
Tip: Notice the mapping would not save. When only formatting changes are made, it is not considered a change. Another change must be made to the repository in order for the formatting to be saved.
Feature 2: Remove Links
Click and drag the pointer over the blue link lines between exp_Format_Name_Gender_Phone and STG_CUSTOMERS.
Figure 5-14. Selecting Multiple Links
Tip: By default, each selected link changes in color from blue to red. If any other objects (e.g., transformations) were selected along with the links, redo the process.
Press the Delete key to remove the connections. Ensure no icons are deleted.
Feature 3: Revert to Saved
Tip: While editing an object in the Designer, if unwanted changes are made, there is a way to revert to a previously saved version, undoing the changes since the last save. The Revert to Saved feature works with the following objects: sources, targets, transformations, mapplets, and mappings.
Tip: For mappings, Revert to Saved reverts all changes to the mapping, not just selected objects. Only the active mapping in the workspace is reverted. In the Source Analyzer, Target Designer, and Transformation Developer, individual objects may be reverted.
1. Select Edit > Revert to Saved.
Figure 5-15. Designer Warning Box
2. Select Yes to proceed.
3. Edit the exp_Format_Name_Gender_Phone expression. In the Ports tab, select the OUT_CUST_NAME port and click the Delete button.
4. Similarly, delete the AGE port.
5. Edit the SQ_customer_layout Source Qualifier and remove the AGE port.
6. Select only the SQ_customer_layout Source Qualifier and choose Edit > Revert to Saved. The same dialog box appears; all changes must be reverted.
7. Select Yes to proceed. Notice all changes were reverted, not just the changes made to SQ_customer_layout.
Feature 4: Link Path
Tracing link paths allows the developer to highlight the path of a port either forward or backward through an entire mapping or mapplet. If the class is doing this lab as a follow-along exercise, do a Revert to Saved so that everyone is synchronized.
1. Ensure that the mapping is in the arranged normal view.
2. Right-click CUSTOMER_NO in the SQ_customer_layout Source Qualifier and choose Select Link Path > Forward.
Figure 5-16. Selecting the forward link path
Notice how the path for CUSTOMER_NO, from SQ_customer_layout all the way to STG_CUSTOMERS, is highlighted in red. Figure 5-17. Highlighted forward link path
3. Right-click the OUT_CUST_NAME port in exp_Format_Name_Gender_Phone and select Link Path > Both.
Figure 5-18. Highlighted link path going forward and backward
Notice how the OUT_CUST_NAME port's path not only shows where it proceeds to the STG_CUSTOMERS target definition, but also its origin all the way back to the customer_layout source definition. Both IN_FIRSTNAME and IN_LASTNAME are used in the formula that produces OUT_CUST_NAME, so both links are highlighted in red.
Feature 5: Propagating Ports
Tip: When a port name, datatype, precision, scale, or description is changed, those changes can be propagated to the rest of the mapping.
1. Edit SQ_customer_layout, change CUSTOMER_NO to CUST_NO, and change the Precision to 10.
2. Click OK.
3. Right-click CUST_NO in the SQ_customer_layout transformation and select Propagate Attributes.
Figure 5-19. Selecting to propagate the attributes
4. Under Attributes to propagate, choose Name and Precision with a Direction of Forward.
Figure 5-20. Propagation attribute dialog box
5. Choose Preview. Notice the arrow between SQ_customer_layout and fil_Customer_No_99999 turns green. The green arrow indicates the places where a change would be made. Why is there only one change?
6. Select Propagate. Was a change made in the filter?
7. Click Close.
8. Edit SQ_customer_layout, change GENDER to CUST_GENDER, and change the Precision to 7.
9. Click OK.
10. Right-click CUST_GENDER in the SQ_customer_layout transformation and select Propagate Attributes.
   a. Under Attributes to propagate, choose Name and Precision with a Direction of Forward.
   b. Select Preview. Notice the green arrows? What will be changed?
   c. Select Propagate.
   d. Edit exp_Format_Name_Gender_Phone and open the Expression Editor for OUT_GENDER. Notice the expression now contains CUST_GENDER.
   e. Close the Propagate dialog box.
Feature 6: Autolink by Name and Position
Tip: Developers can automatically link ports by name in the Designer. Use any of the following options to automatically link by name:
♦ Link by name
♦ Link by name and prefix
♦ Link by name and suffix
The Designer adds links between input and output ports that have the same name. Linking by name is case insensitive. Link by name when using the same port names across transformations.
1. Revert to Saved to reset the mapping.
2. Remove the links between exp_Format_Name_Gender_Phone and the STG_CUSTOMERS target definition.
3. Right-click in the white space inside the mapping and choose Autolink by Name. The Autolink dialog box opens.
Figure 5-21. Autolink dialog box
Tip: Only one transformation may be selected in the From Transformation box, and one or more transformations may be selected in the To Transformations box. For objects that contain groups, such as Router transformations or XML targets, select the group name from the To Transformations list.
4. Select the exp_Format_Name_Gender_Phone transformation from the From Transformation dropdown menu; then highlight the STG_CUSTOMERS transformation in the To Transformations box.
5. Click OK.
Notice that nothing happened. Look carefully at exp_Format_Name_Gender_Phone and STG_CUSTOMERS and you will notice that none of the ports match exactly; therefore autolink by name will not work in this situation. Would autolink by position work?
Tip: When autolinking by name, the Designer adds links between ports that have the same name, case insensitive. The Designer can also link ports based on defined prefixes or suffixes. Adding suffixes and/or prefixes to port names helps identify each port's purpose. For example, a suggested best practice is to use the prefix “OUT_” when the port is derived from input ports that were modified as they pass through the transformation. Without this feature, Autolink would skip over the names that don't match and force the developer to manually link the desired ports.
6. Select Layout > Autolink.
7. Select the exp_Format_Name_Gender_Phone transformation from the From Transformation dropdown menu; then highlight the STG_CUSTOMERS transformation in the To Transformations box.
8. Select the Name radio button.
9. Click More to view the options for entering prefixes and suffixes. Note the button toggles to become the Less button.
10. Type OUT_ in the From Transformation Prefix field.
11. Click OK. Notice that only the OUT_CUST_NAME port was linked, because it is the only port with a matching name.
Figure 5-22. Defining a prefix in the autolink dialog box
Feature 7: Moving Ports
1. Revert to Saved to reset the mapping.
2. Open exp_Format_Name_Gender_Phone and click the Ports tab.
3. Single-click the AGE port and move it to the top using the Up arrow icon found in the upper right corner of the toolbar.
The results will look like Figure 5-23.
Figure 5-23. Expression after the AGE port has been moved
4. Single-click the number to the left of the IN_PHONE_NUMBER port.
5. Single-click and hold the left mouse button, and note the faint square that appears at the bottom of the pointer.
Figure 5-24. Click and drag method of moving ports
6. Move IN_PHONE_NUMBER directly below AGE.
7. Click Cancel to discard the changes.
Feature 8: Shortcut to Port Editing from Normal View
Tip: There is a shortcut to go from the Normal View directly to the desired port in the Edit View; this is especially useful in transformation objects that have dozens of ports.
1. Revert to Saved to reset the mapping.
2. Resize or scroll down until the AGE port appears in exp_Format_Name_Gender_Phone.
3. Double-click the AGE port.
4. Notice you are now in the Ports tab.
5. Delete the AGE port.
Feature 9: Create Transformation Methods
1. Revert to Saved to reset the mapping.
2. On the Transformation toolbar, find the Aggregator Transformation button and single-click it.
3. Move the mouse into the Workspace. The cursor changes to crosshairs.
4. Single-click in the Workspace where you want to place the transformation. The selected transformation appears in the desired location of the Workspace and the cursor changes back to an arrow.
Tip: When the mouse pointer hovers over a transformation icon in the toolbar, the name of the transformation object appears momentarily.
5. Select Transformation > Create.
Figure 5-25. Creating a transformation using the menu
6. Select the Aggregator from the drop-down list.
Figure 5-26. Create Transformation dialog box
7. Enter the name agg_TargetTableName and click Create.
8. Click the Done button and the new transformation appears in the Workspace.
Figure 5-27. Normal View of the Newly Created Aggregator Transformation
Feature 10: Scale-to-Fit
1. Revert to Saved to reset the mapping.
There are features to change the magnification of the contents of the Workspace. Use the toolbar or the Layout menu options to set zoom levels. The toolbar has the following zoom options:
Figure 5-28. Zoom options
2. Click the Zoom out 10% button on the toolbar.
3. Click anywhere in the Workspace and the mapping will zoom out by 10% each time the mouse is clicked.
4. Keep clicking until the mapping is small enough to fit within the window.
Tip: The Zoom out 10% button uses a selected point as the center from which to decrease the current magnification in 10 percent increments.
5. Click the Zoom in 10% button on the toolbar.
6. Click anywhere in the Workspace and the mapping will zoom in by 10% each time the mouse is clicked.
Tip: The Zoom in 10% button increases the current magnification of a selected rectangular area. The degree of magnification depends on the size of the area selected, the Workspace size, and the current magnification.
7. Toggle off the Zoom in 10% button.
8. Click the Scale to Fit button in the toolbar. Another way to do this is to select Layout > Scale to Fit.
Feature 11: Designer Options
Tip: The Designer can display different attributes within each of the transformation objects. You can select which attributes you want displayed, such as Name, Expression (where applicable), Datatype, and Length/Precision.
1. Select Tools > Options.
2. Click the Tables tab.
3. Select the Expression transformation type from the drop-down list.
4. Delete Length/Precision from the selected box.
5. Click OK. Notice how the Length/Precision no longer appears in the Expression transformation.
6. Change the options back to their original settings.
Feature 12: Object Shortcuts and Copies
This feature allows you to create a shortcut to an object (a source or target definition, a mapping, etc.) in any folder. A shortcut is a “pointer” to the original object. If the object is edited, all shortcuts inherit the changes. The shortcut itself cannot be edited.
1. Select and double-click the DEV_SHARED folder. Note that the folder name in the Navigator window is now bold. This means that the folder is open.
Figure 5-29. Navigator window in the Designer
2. Open your student folder by either double-clicking it or by right-clicking it and selecting Open. Note that the DEV_SHARED folder is no longer bold (open) but it remains expanded so you can see the subfolders.
Tip: Only one folder at a time can be open. Any number of folders can be expanded so that the subfolders and objects are visible. As we will see below, it is important to distinguish between expanded folders and the open folder.
3. Open the Mapping Designer and close any mapping that is in the workspace.
4. Expand the Mappings subfolder in the DEV_SHARED folder.
5. Click and drag the m_Stage_Customer_Contacts mapping to the Mapping Designer workspace and release the mouse button.
You will see a confirmation message.
6. Click Yes.
7. Save the changes to the repository. Note that your folder now has a shortcut to the mapping. Select the menu option Mappings > Edit to see how the shortcut location is displayed.
8. Open the Filter transformation in edit mode. Note that all properties are grayed out and not editable. A shortcut can never be edited directly.
9. Perform the same click-and-drag operation with the same mapping, only this time press the [Ctrl] key after you have begun to drag the mapping. Note that this creates a copy of the mapping instead of a shortcut.
10. Click No in the Copy Confirmation message box.
Tip: The destination folder (the folder you are placing the copy or shortcut into) must be the open folder. The origin folder that contains the original object will be expanded.
We will now learn how to copy an object within the same folder. The instructions below copy a mapping, but the same procedure can be used for any other object.
1. In the Navigator window, select any mapping in your folder.
2. Press Ctrl+C on your keyboard, followed immediately by Ctrl+V.
3. Click Yes in the Copy Confirmation message box. The Copy Wizard will be displayed.
4. The red x on the mapping indicates a conflict. Choose Rename for the conflict resolution.
5. Click the Edit button. If desired, you can supply your own new name for the mapping to replace the “1” added by the Designer. Mappings within a folder must have unique names.
6. Click Next, then Finish.
Tip: A common error when copying objects within a folder is to use the mouse to move the cursor from the object to the workspace after copying the object with Ctrl+C. This is unnecessary and will cause the copy operation to fail.
Feature 13: Copy Objects Within and Between Mappings
You may find that you would like to duplicate a given set of transformations within a mapping or a mapplet, preserving the data flow between them. This technique may prove useful if you know that you will need to use the logic contained in the transformations in other mappings or mapplets.
1. Use the Arrange All Iconic feature on the m_Stage_Customer_Contacts_xx mapping.
2. Use your left mouse button to draw a rectangle that encloses the Filter and the Expression transformations. These objects will then appear selected.
3. Press Ctrl+C on your keyboard, followed immediately by Ctrl+V. Note that both transformations have been copied into the mapping, including the data flow between the input and output ports. They have been automatically renamed with a "1" on the end of their names.
4. Open another mapping in the Mapping Designer. It does not matter which mapping is used, provided it is not a shortcut.
5. Press Ctrl+V. The transformations are copied into the open mapping.
Tip: The copy objects within and between mappings feature can be used only within a single folder.
6. Close your folder but do not save the changes.
Unit 6: Lookups and Reusable Transformations
In this unit you will learn about:
♦ Lookup transformations
♦ Reusable transformations

Lesson 6-1. Lookup Transformation (Connected)
Unit 6: Lookups and Reusable Transformations Informatica PowerCenter 8 Level I Developer
Type
Passive.
Description
A Lookup transformation allows the inclusion of additional information in the transformation process from an external database or flat file source. In SQL terms, a Lookup transformation may be thought of as a "sub-query". The basic Lookup transformation types are connected, unconnected, and dynamic.
Properties
We will discuss only some of the properties in this section. The remaining properties will be discussed in other sections.
Option (Lookup Type): Description

Lookup SQL Override (Relational): Overrides the default SQL statement used to query the lookup table. Use only with the lookup cache enabled.

Lookup Table Name (Relational): Specifies the name of the table from which the transformation looks up and caches values.

Lookup Policy on Multiple Match (Flat File, Relational): Determines what happens when the Lookup transformation finds multiple rows that match the lookup condition. You can select the first or last row returned from the cache or lookup source, or report an error.

Lookup Condition (Flat File, Relational): Displays the lookup condition you set in the Condition tab.

Connection Information (Relational): Specifies the database containing the lookup table. You can select the exact database connection or you can use the $Source or $Target variable. If you use one of these variables, the lookup table must reside in the source or target database you specify when you configure the session. If you select the exact database connection, you can also specify what type of database connection it is.

Source Type (Flat File, Relational): Indicates whether the Lookup transformation reads values from a relational database or a flat file.

Tracing Level (Flat File, Relational): Sets the amount of detail included in the session log when you run a session containing this transformation.

Datetime Format (Flat File): If you do not define a datetime format for a particular field in the lookup definition or on the Ports tab, the Integration Service uses the properties defined here. You can enter any datetime format. The default is MM/DD/YYYY HH24:MI:SS.

Thousand Separator (Flat File): If you do not define a thousand separator for a particular field in the lookup definition or on the Ports tab, the Integration Service uses the properties defined here. You can choose no separator, a comma, or a period. The default is no separator.

Decimal Separator (Flat File): If you do not define a decimal separator for a particular field in the lookup definition or on the Ports tab, the Integration Service uses the properties defined here. You can choose a comma or a period decimal separator. The default is period.

Case-Sensitive String Comparison (Flat File): If selected, the Integration Service uses case-sensitive string comparisons when performing lookups on string columns. Note: For relational lookups, case-sensitive comparison depends on database support.

Null Ordering (Flat File): Determines how the Integration Service orders null values. You can choose to sort null values high or low. By default, the Integration Service sorts null values high. Note: For relational lookups, null ordering depends on database support.

Sorted Input (Flat File): Indicates whether or not the lookup file data is sorted.
Business Purpose
A business may bring in data from various sources, but additional data from local sources may be needed, such as product codes, dates, and names.
Example
In the following example, an insurance company pays commissions on each new policy; however, duplicate policies may be submitted by clerical error. The goal is to check submitted policies against the current policy list and reject those that are duplicates. A policy number is passed to a connected Lookup transformation, which checks the current policy table for the pre-existence of that policy. If the policy number exists, the matching policy number is returned; if it does not exist, a null value is returned. The return value is used in the Group Filter Condition of the Router transformation. The Router filter condition is ISNULL(POLICY_NO1) and is based on the return value from the Lookup transformation's POLICY_NO port, NOT the value from the Source Qualifier. Rows from the source that have no match (null return) in the lookup table satisfy the filter condition and pass to the new (POLICY_NEW) target. All other rows go to the Router Default group and are passed to the reject (POLICIES_REJ) target.
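The lookup-and-route logic described above can be simulated in ordinary code to make the null-based split concrete. The Python below is an illustrative sketch only, not PowerCenter code; the policy numbers and variable names are invented for the example:

```python
# Illustrative sketch of the connected Lookup + Router logic.
# current_policies plays the role of the cached lookup table.
current_policies = {"P100", "P200", "P300"}

def lookup_policy(policy_no):
    """Connected Lookup: return the matching POLICY_NO, or None (null) if absent."""
    return policy_no if policy_no in current_policies else None

new_target, reject_target = [], []
submitted = ["P100", "P400", "P200", "P500"]  # invented sample input rows

for policy_no in submitted:
    policy_no1 = lookup_policy(policy_no)   # return port POLICY_NO1
    if policy_no1 is None:                  # Router condition: ISNULL(POLICY_NO1)
        new_target.append(policy_no)        # no match: genuinely new policy
    else:
        reject_target.append(policy_no)     # match found: duplicate, reject

print(new_target)     # ['P400', 'P500']
print(reject_target)  # ['P100', 'P200']
```

Note that the test is on the value returned by the Lookup, exactly as the Router condition in the mapping tests POLICY_NO1 rather than the Source Qualifier value.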
Performance Considerations
All rows pass through a connected Lookup, so there may be performance degradation from executing additional Lookups when they are not needed. Caching a very large table may require a large amount of memory.
Lesson 6-2. Reusable Transformations
You can create a reusable transformation in:
♦ the Transformation Developer
♦ the Mapping Designer, and then 'promote' it
Reusable transformations are listed in the Transformations node of the Navigator. Drag and drop them into any mapping to make a shortcut, and then override the properties as needed.
Key Points
♦ You can also copy them as non-reusable by pressing the Ctrl key while dragging.
♦ You can edit ports only in the Transformation Developer.
♦ Instances dynamically inherit changes.
♦ Source Qualifier transformations cannot be reusable.
♦ Changing reusable transformations can invalidate mappings.
Unit 6 Lab A: Load Employee Staging Table
Business Purpose
Information about Mersche Motors employees is saved to three text files each day. We must read each of these files individually and load them into the staging area. The files do not contain employee salary information, so we must find each employee's salary. We also must reformat some of the other fields.
Technical Description
We have three text files coming in daily with employee information that we would like to put into a file list. We need to find a salary for each employee, concatenate first name and last name, change the format of age and phone number, and add a load date.
Objectives
♦ Reusable Expression
♦ Lookup to Flat File
Duration: 45 minutes
Unit 6 Lab A: Load Employee Staging Table Informatica PowerCenter 8 Level I Developer
Velocity Deliverable: Mapping Specifications
Mapping Name: m_STG_EMPLOYEES_xx
Source System: Flat file
Target System: Oracle table
Initial Rows: 109
Rows/Load: 109
Short Description: File list will be read, source data will be reformatted, a load date will be added, and salary information for each employee will be added.
Load Frequency: Daily
Preprocessing: (none)
Target: Append
Post Processing: (none)
Error Strategy: Default
Reload Strategy: (none)
Unique Source Fields: (none)
SOURCES (Files)
File Name: employees_central.txt, employees_east.txt, employees_west.txt (definition in employees_layout.txt)
File Location: C:\pmfiles\SrcFiles
Fixed/Delimited: Delimited
Additional File Info: These 3 comma-delimited flat files will be read into the session using a file list, employees_list.txt. The layout of the flat files can be found in employees_layout.txt.

File Name: employees_list.txt
File Location: C:\pmfiles\SrcFiles
Fixed/Delimited: File list
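A file list is itself a plain text file that names one source file per line; the Integration Service opens and reads each listed file in turn when the session's source filetype is set to Indirect. Assuming the default directory above, employees_list.txt would contain entries along these lines (the full paths are illustrative; files in the session's source file directory can also be listed by name alone):

```
C:\pmfiles\SrcFiles\employees_central.txt
C:\pmfiles\SrcFiles\employees_east.txt
C:\pmfiles\SrcFiles\employees_west.txt
```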
TARGETS (Tables)
Schema Owner: TDBUxx
Table Name: STG_EMPLOYEES
Insert: X
Update / Delete / Unique Key: (blank)
LOOKUPS
Lookup Name: lkp_salary
Table: salaries.txt
Location: C:\pmfiles\LkpFiles
Match Condition(s): EMPLOYEE_ID = IN_EMPLOYEE_ID
Filter/SQL Override: (none)
HIGH LEVEL PROCESS OVERVIEW
[Diagram: Source -> Lookup and Expression -> Target]

PROCESSING DESCRIPTION (DETAIL)
The mapping will read from three flat files contained in a file list. For each employee ID, we will find the corresponding salary. First name and last name will be concatenated. Age and phone number will be reformatted before loading into the STG_EMPLOYEES Oracle table.
SOURCE TO TARGET FIELD MATRIX

All rows below target the STG_EMPLOYEES table and are read from the employees_layout source file. The Default Value if Null column of the matrix is empty.

Target Column | Data Type | Source Column | Expression
EMPLOYEE_ID | number(p,s) | EMPLOYEE_ID |
EMPLOYEE_NAME | varchar2 | Derived | Concatenate First Name and Last Name.
EMPLOYEE_ADDRESS | varchar2 | ADDRESS |
EMPLOYEE_CITY | varchar2 | CITY |
EMPLOYEE_STATE | varchar2 | STATE |
EMPLOYEE_ZIP_CODE | number(p,s) | ZIP_CODE |
EMPLOYEE_COUNTRY | varchar2 | COUNTRY |
EMPLOYEE_PHONE_NMBR | varchar2 | Derived | The PHONE_NUMBER column is in the format 9999999999 and needs to be reformatted to (999) 999-9999.
EMPLOYEE_FAX_NMBR | varchar2 | FAX_NUMBER |
EMPLOYEE_EMAIL | varchar2 | EMAIL |
EMPLOYEE_GENDER | varchar2 | Derived | GENDER is currently either M or F. It needs to be Male, Female, or UNK.
AGE_GROUP | varchar2 | Derived | AGE_GROUP is derived by decoding the AGE column. The valid age groups are less than 20, 20 to 29, 30 to 39, 40 to 49, 50 to 60, and Greater than 60.
NATIVE_LANG_DESC | varchar2 | NATIVE_LANGUAGE |
SEC_LANG_DESC | varchar2 | SECOND_LANGUAGE |
TER_LANG_DESC | varchar2 | THIRD_LANGUAGE |
POSITION_TYPE | varchar2 | POSITION_TYPE |
REGIONAL_MANAGER | varchar2 | REGIONAL_MANAGER |
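The derived columns in the matrix above (name concatenation, phone reformatting, gender decoding, and age grouping) can be prototyped in ordinary code before being written as PowerCenter expressions. The Python below is an illustrative sketch only, not lab material; the actual reusable Expression uses PowerCenter functions such as DECODE, SUBSTR, and the || concatenation operator:

```python
def concat_name(first, last):
    """EMPLOYEE_NAME: concatenate first and last name."""
    return f"{first} {last}"

def format_phone(phone):
    """EMPLOYEE_PHONE_NMBR: '9999999999' -> '(999) 999-9999'."""
    return f"({phone[0:3]}) {phone[3:6]}-{phone[6:10]}"

def decode_gender(gender):
    """EMPLOYEE_GENDER: 'M'/'F' -> 'Male'/'Female'; anything else -> 'UNK'."""
    return {"M": "Male", "F": "Female"}.get(gender, "UNK")

def age_group(age):
    """AGE_GROUP: bucket an age into the groups listed in the matrix."""
    if age < 20:
        return "less than 20"
    elif age <= 29:
        return "20 to 29"
    elif age <= 39:
        return "30 to 39"
    elif age <= 49:
        return "40 to 49"
    elif age <= 60:
        return "50 to 60"
    return "Greater than 60"

print(concat_name("Ann", "Lee"))       # Ann Lee
print(format_phone("5125551234"))      # (512) 555-1234
print(decode_gender("F"))              # Female
print(age_group(34))                   # 30 to 39
```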
SOURCE TO TARGET FIELD MATRIX (continued)

Target Column | Data Type | Source Column | Expression
DEALERSHIP_ID | number(p,s) | DEALERSHIP_ID |
DEALERSHIP_MANAGER | varchar2 | DEALERSHIP_MANAGER |
EMPLOYEE_SALARY | number(p,s) | Derived | A Salary field for each Employee ID can be found in salaries.txt.
HIRE_DATE | date | HIRE_DATE |
DATE_ENTERED | date | DATE_ENTERED |

All rows target the STG_EMPLOYEES table and are read from the employees_layout source file.
Instructions
Step 1: Create a Flat File Source Definition
1. Launch the Designer client tool (if it is not already running) and log into the PC8_DEV repository.
2. Import the employees_layout.txt comma-delimited flat file into your student folder. Make sure that you import the field names from the first line.
3. Save the repository. Your source definition should look the same as displayed in Figure 6-1.
Figure 6-1. Source Analyzer view of the employees_layout flat file definition
Step 2: Create a Relational Target Definition
1. In the Target Designer, import the STG_EMPLOYEES table.
2. Save the repository. Your target definition should look the same as Figure 6-2.
Figure 6-2. Target Designer view of the STG_EMPLOYEES relational table definition
Step 3: Create a Reusable Transformation
Velocity Best Practice: A Velocity design best practice is to use as many reusable transformations as possible. This decreases development time and keeps the mappings consistent.
1. Open the mapping m_Stage_Customer_Contacts_xx.
2. Edit exp_Format_Name_Gender_Phone and check the Make reusable box on the Transformation tab.
Figure 6-3. Transformation edit dialog box showing how to make a transformation reusable
3. Click Yes when you see the popup box.
Figure 6-4. Question box letting you know the action is irreversible
Tip: Converting a transformation to reusable is irreversible. The transformation will now be saved in the Transformations node within the Navigator window and will be available as a standalone object to drag into any mapping as a shortcut.
Figure 6-5. Transformation edit dialog box of a reusable transformation
4. Review the Transformation dialog box. What differences do you now see?
5. Select the Ports tab. Can you change anything here? Why are you unable to make changes?
6. Open the Transformation Developer by clicking the respective icon in the toolbar.
7. From the Navigator window, locate the Transformations node in your respective student folder.
Figure 6-6. Navigator window depicting the Transformations node
8. Drag exp_Format_Name_Gender_Phone into the Transformation Developer workspace.
9. Edit exp_Format_Name_Gender_Phone and rename it to re_exp_Format_Name_Gender_Phone_Load_Date.
Velocity Best Practice: It is a Velocity recommendation that reusable transformations use the prefix re_. Shortcuts should have the prefix sc_ (or SC_ if you prefer).
10. Select the Ports tab.
a. Change the name of the OUT_CUST_NAME port to OUT_NAME.
b. Change the name of the OUT_CUST_PHONE port to OUT_PHONE.
c. Click OK.
11. Save the repository.
Step 4: Create a Mapping
1. Open the Mapping Designer by clicking the respective icon in the toolbar.
2. Create a new mapping named m_STG_EMPLOYEES_xx.
3. Add the employees_layout.txt flat file source to the new mapping.
4. Add the STG_EMPLOYEES relational target to the new mapping. Your mapping should appear similar to Figure 6-7.
Figure 6-7. Partial mapping with source and target
Step 5: Create a Lookup Transformation
1. Select the Lookup transformation tool button located on the Transformations toolbar with a single left click. The selected icon in Figure 6-8 identifies the Lookup tool button.
Figure 6-8. Transformation Toolbar
2. Move your mouse pointer into the Mapping Designer workspace and single-click your left mouse button. This creates a new Lookup transformation.
3. Choose Import > From Flat File for the location of the lookup table.
Figure 6-9. Lookup Transformation table location dialog box
4. Locate the C:\pmfiles\LkpFiles directory and select the file salaries.txt. If the file is located in a different directory, your instructor will specify it.
5. The Flat File Import Wizard will appear. Confirm that the Delimited option button is selected.
6. Select the Import field names from first line check box. Your wizard should appear similar to Figure 6-10.
Figure 6-10. Dialog box 1 of the 3-step Flat File Import Wizard
7. Click Next.
8. Confirm that only the Comma check box under Delimiters is selected.
9. Select the No quotes option button under Text Qualifier.
10. Click Next.
11. Confirm that the field names are displayed under Column Information. These were imported from the first line of the file.
12. Click Finish.
13. Confirm that your Lookup transformation appears as displayed in Figure 6-11.
Figure 6-11. Normal view of the newly created Lookup Transformation
14. Drag and drop EMPLOYEE_ID from SQ_employees_layout to the new Lookup transformation.
15. Edit the Lookup transformation. Rename it to lkp_salaries.
Velocity Best Practice: Velocity naming conventions specify that Lookup transformations be named lkp_LOOKUP_TABLE_NAME.
16. Click the Ports tab. Rename EMPLOYEE_ID1 to IN_EMPLOYEE_ID.
Velocity Best Practice: It is a Velocity best practice to prefix all input ports to an Expression or Lookup with IN_.
17. Uncheck the output port for IN_EMPLOYEE_ID.
18. Select the Condition tab.
19. Select the Add a new condition button. PowerCenter will choose the first lookup port and the first input port automatically. Your condition should look similar to Figure 6-12.
Figure 6-12. Lookup Transformation condition box
20. Click OK.
21. Save the repository.
Step 6: Add a Reusable Expression Transformation
In the Navigator window, under the Transformations node, click and drag re_exp_Format_Name_Gender_Phone_Load_Date into the Mapping Designer workspace.
Step 7: Link Transformations
1. Link the following ports from SQ_employees_layout to the STG_EMPLOYEES target:
EMPLOYEE_ID -> EMPLOYEE_ID
ADDRESS -> EMPLOYEE_ADDRESS
CITY -> EMPLOYEE_CITY
STATE -> EMPLOYEE_STATE
ZIP_CODE -> EMPLOYEE_ZIP_CODE
COUNTRY -> EMPLOYEE_COUNTRY
FAX_NUMBER -> EMPLOYEE_FAX_NMBR
EMAIL -> EMPLOYEE_EMAIL
NATIVE_LANGUAGE -> NATIVE_LANG_DESC
SECOND_LANGUAGE -> SEC_LANG_DESC
THIRD_LANGUAGE -> TER_LANG_DESC
POSITION_TYPE -> POSITION_TYPE
DEALERSHIP_ID -> DEALERSHIP_ID
REGIONAL_MANAGER -> REGIONAL_MANAGER
DEALERSHIP_MANAGER -> DEALERSHIP_MANAGER
HIRE_DATE -> HIRE_DATE
DATE_ENTERED -> DATE_ENTERED
2. Save the repository.
3. Link the following port from lkp_salaries to STG_EMPLOYEES:
SALARY -> EMPLOYEE_SALARY
4. Link the following ports from SQ_employees_layout to re_exp_Format_Name_Gender_Phone_Load_Date:
FIRSTNAME -> IN_FIRSTNAME
LASTNAME -> IN_LASTNAME
PHONE_NUMBER -> IN_PHONE_NUMBER
GENDER -> IN_GENDER
AGE -> AGE
5. Link the following ports from re_exp_Format_Name_Gender_Phone_Load_Date to STG_EMPLOYEES:
OUT_NAME -> EMPLOYEE_NAME
OUT_PHONE -> EMPLOYEE_PHONE_NMBR
OUT_GENDER -> EMPLOYEE_GENDER
OUT_AGE_GROUP -> AGE_GROUP
6. Save the repository.
Step 8: Create and Run the Workflow
1. Launch the Workflow Manager client and sign into your assigned folder.
2. Open the Workflow Designer tool and create a new workflow named wkf_STG_EMPLOYEES_xx.
3. Create a session task using the session task tool button.
4. Select m_STG_EMPLOYEES_xx from the Mapping list box and click OK.
5. Link the Start object to the s_m_STG_EMPLOYEES_xx session task object.
6. Edit the s_m_STG_EMPLOYEES_xx session.
7. Under the Mapping tab:
♦ Confirm that the Source file directory is set to $PMSourceFileDir\.
♦ In Properties | Attribute | Source filename, type employees_list.txt.
♦ In Properties | Attribute | Source filetype, click the drop-down arrow and change the default from Direct to Indirect. Your Mapping | Source | Properties | Attributes should be the same as Figure 6-13.
Figure 6-13. Source properties for the employees_list file list
♦ Select STG_EMPLOYEES located under the Targets folder in the Navigator window.
♦ Set the relational target connection object property to NATIVE_STGxx, where xx is your student number.
♦ Check the Truncate target table option in the target properties.
♦ Select lkp_salaries from the Transformations folder in the Navigator window.
♦ Verify the Lookup source file directory is $PMLookupFileDir\.
♦ Type salaries.txt in the Lookup filename.
8. Save the repository.
9. Check the Validate messages to ensure your workflow is valid.
10. Start the workflow.
11. Review the Task Details.
Figure 6-14. Task Details of the completed session run
12. Review the Source/Target Statistics.
Figure 6-15. Source/Target Statistics of the completed session run
13. Use the Preview Data feature in the Designer to view the data results.
Figure 6-16. Data Preview of the STG_EMPLOYEES target table
Note: not all rows and columns are shown.
Unit 6 Lab B: Load Date Staging Table
Business Purpose
The date staging area in the operational data store must be loaded with one record for each date covered in the data marts. Each date must be described with the date attributes used in the data mart, such as the month name, quarter number, whether the date is a weekday or a weekend, and so forth.
Technical Description
To load the date staging area, we will use Informatica date functions and variables to transform a date value and date ID. The raw dates are in a flat file.
Objectives
♦ Copy an Expression transformation to convert a string date to various descriptive date columns.
♦ Use the Expression Editor to create or view expressions and become familiar with date function syntax.
♦ Understand the evaluation sequence of input, output, and variable ports.
♦ Learn how to use variable ports.
Duration: 30 minutes
Unit 6 Lab B: Load Date Staging Table Informatica PowerCenter 8 Level I Developer
Velocity Deliverable: Mapping Specifications
Mapping Name: m_STG_DATES_xx
Source System: Flat file
Target System: Oracle table
Initial Rows: 4019
Rows/Load: 4019
Short Description: A text file will run through an expression to do date manipulation and load to our date staging area.
Load Frequency: Once
Preprocessing: (none)
Target: Append
Post Processing: (none)
Error Strategy: Default
Reload Strategy: (none)
Unique Source Fields: (none)
SOURCES (Files)
File Name: dates.txt
File Location: C:\pmfiles\SrcFiles
Fixed/Delimited: Delimited
Additional File Info: Comma delimiter
TARGETS (Tables)
Schema Owner: (blank)
Table Name: STG_DATES
Insert: X
Update / Delete / Unique Key: (blank)
HIGH LEVEL PROCESS OVERVIEW
[Diagram: Source -> Expression -> Target]
SOURCE TO TARGET FIELD MATRIX

All rows below target the STG_DATES table and are read from dates.txt. Except for DATE_ID_LEGACY, every column is derived in the Expression transformation. The Default Value if Null and Data Issues/Quality columns of the matrix are empty.

Target Column | Source Column | Expression
DATE_ID_LEGACY | DATE_ID |
DATE_VALUE | Derived | Reformat the DATE column to MM/DD/YYYY.
DAY_OF_MONTH | derived | The current day of the current month.
MONTH_NUMBER | derived | The month number of the year.
YEAR_VALUE | derived | The year for each record.
DAY_OF_WEEK | derived | The day of the week for the record.
DAY_NAME | derived | The name of the day for the record. EX - TUESDAY
MONTH_NAME | derived | The month name for the record.
DAY_OF_YEAR | derived | The day number of the year for the record. EX - 1-365
MONTH_OF_YEAR | derived | The month number of the year for the record.
WEEK_OF_YEAR | derived | The week number of the year for the record.
DAY_OVERALL | derived | The day number overall.
WEEK_OVERALL | derived | The week number overall.
MONTH_OVERALL | derived | The month number overall.
YEAR_OVERALL | derived | The year number overall.
HOLIDAY_INDICATOR | derived | This flag will tell us whether the record date is a holiday.
WORKDAY_INDICATOR | derived | This flag will tell us whether the record date is a workday.
WEEKDAY_INDICATOR | derived | This flag will tell us whether the record date is a weekday.
WEEKEND_INDICATOR | derived | This flag will tell us whether the record date is a weekend.
QUARTER_OF_YEAR | derived | The quarter number of the year.

PROCESSING DESCRIPTION (DETAIL)
This mapping will generate the date staging table from the dates text file. The Expression transformation is used to derive the different date values.
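The derived date attributes can be prototyped with Python's standard datetime module before writing the PowerCenter equivalents (which use date functions such as TO_DATE and GET_DATE_PART). This sketch is illustrative only and uses an invented example date, not lab data:

```python
from datetime import date

d = date(2005, 5, 20)  # example record date (illustrative)

date_value   = d.strftime("%m/%d/%Y")   # DATE_VALUE: '05/20/2005'
day_of_month = d.day                    # DAY_OF_MONTH: 20
month_number = d.month                  # MONTH_NUMBER: 5
year_value   = d.year                   # YEAR_VALUE: 2005
day_name     = d.strftime("%A")         # DAY_NAME: 'Friday'
month_name   = d.strftime("%B")         # MONTH_NAME: 'May'
day_of_year  = d.timetuple().tm_yday    # DAY_OF_YEAR: 140
quarter      = (d.month - 1) // 3 + 1   # QUARTER_OF_YEAR: 2
weekday_flag = d.weekday() < 5          # WEEKDAY_INDICATOR: True for Mon-Fri
weekend_flag = not weekday_flag         # WEEKEND_INDICATOR

print(date_value, day_name, day_of_year, quarter)
```

The holiday and workday flags would additionally need a calendar of holidays, which is why the lab's Expression derives them from earlier variable ports rather than from the date alone.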
SOURCE TO TARGET FIELD MATRIX (continued)

Target Column | Source Column | Expression
SEASON | derived | The current season.
LAST_DAY_IN_MONTH | derived | Flag to indicate the current date is the last day of the month.
LAST_DAY_IN_QUARTER | derived | Flag to indicate the current date is the last day of the quarter.
LAST_DAY_IN_YEAR | derived | Flag to indicate the current date is the last day of the year.

All rows target the STG_DATES table and are read from dates.txt.
Instructions
Step 1: Create a Flat File Source Definition
1. Launch the Designer (if it is not already running) and connect to the PC8_DEV repository.
2. Open your student folder.
3. Import the dates.txt comma-delimited flat file source using the Flat File Wizard. Make sure that you import the field names from the first line.
4. Save the repository.
Step 2: Create a Relational Target Definition
1. Import the STG_DATES table using the Target Designer.
2. Save the repository.
Step 3: Create a Mapping
1. Create a new mapping named m_STG_DATES_xx.
2. Add the dates flat file source to the mapping.
3. Add the STG_DATES target to the mapping. Your mapping should appear similar to Figure 6-17.
Figure 6-17. Mapping with Source and Target definitions
4. Expand the DEV_SHARED folder.
5. Expand the Transformations subfolder.
a. Select re_exp_STG_DATES.
b. With your left mouse button, drag the transformation toward your mapping but DO NOT DROP IT.
c. Hold down the Ctrl key.
d. Drop the transformation into the mapping.
e. Click Yes on the Copy Confirmation message box.
Note: If the confirmation box says "Shortcut" instead of "Copy", try again and make sure that you hold down the Ctrl key continuously as you drop the transformation into the mapping.
6. Link the two output ports on the Source Qualifier to the two input ports on the Expression transformation, matching the names.
7. Use the Autolink feature to link the output ports in the Expression transformation to the corresponding fields in the target definition, by position.
8. Save the mapping and confirm it is valid. Your mapping will appear the same as in Figure 6-18.
Figure 6-18. Completed Mapping
9. Edit the Expression transformation and click on the Ports tab.
10. Examine the structure of the Expression transformation ports and expressions. Note that the DATE_ID is an integer that is passed directly to the target table unchanged. The input port DATE supplies a string that describes an individual date, such as 'May 20, 2005'. The variable ports will process that string in various ways in order to extract a specific descriptor, such as the day of the week, the quarter, the month, whether the date is a holiday, etc. These descriptors will later be used in the data warehouse to group and filter report data.
11. Examine some of the variable port expressions and see if you can determine how they work. You can use PowerCenter Help to view the syntax for any function. If you wish, ask your instructor for clarification on any of the expressions. Note that variable ports cannot be output ports, so a separate set of output ports is used at the bottom of the transformation in order to output the data to the target. Most of these output ports simply call a variable port. Variable ports were used in this transformation because they are resolved one at a time, top to bottom. In this case, some of the later expressions are dependent on the results of the earlier expressions.
Tip: PowerCenter evaluates ports in the following order: input (including input/output) ports, then variable ports, then output ports. Variables are evaluated in top-down order, so it is important to put them in a specific order.
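As a rough analogy for this evaluation order (Python, not PowerCenter expression syntax), think of the variable ports as assignments executed top to bottom for every row, with the output ports computed last. The port names below are invented for illustration:

```python
from datetime import datetime

def process_row(date_id, date_str):
    """Rough analogy for one row through the Expression transformation.
    Input ports arrive first; the v_ names stand in for variable ports,
    which are evaluated strictly top to bottom; outputs come last."""
    # variable ports (top-down; later ones depend on earlier ones)
    v_date    = datetime.strptime(date_str, "%B %d, %Y")  # parse 'May 20, 2005'
    v_month   = v_date.month                              # depends on v_date
    v_quarter = (v_month - 1) // 3 + 1                    # depends on v_month
    # output ports simply expose the variable results
    return {"DATE_ID": date_id, "MONTH_NUMBER": v_month, "QUARTER": v_quarter}

print(process_row(1, "May 20, 2005"))
```

Reordering the assignments (computing v_quarter before v_month) would fail, which is exactly why the order of variable ports in the transformation matters.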
Step 4: Create a Workflow and a Session Task
1. Launch the Workflow Manager application (if it is not already running) and connect to the PC8_DEV repository.
2. Open your student folder.
3. Create a new workflow named wkf_Load_STG_DATES_xx.
4. Create a session named s_m_STG_DATES_xx that uses the m_STG_DATES_xx mapping.
5. Edit the session you just created.
a. Select the Mapping tab.
b. Select the Source Qualifier icon SQ_dates.
c. In the Properties area, scroll down and confirm the source file name and location. Ensure that the Source Filename property value includes the .txt extension.
d. Select the target STG_DATES.
e. Select your appropriate target connection object.
f. Select the Truncate target table option.
6. Complete the workflow by linking the Start task to the session task.
7. Save the repository.
Step 5: Run the Workflow and Monitor the Results
1. Start the workflow.
2. Maximize the Workflow Monitor and select the Task View.
3. Review the Task Details. Your information should appear the same as in Figure 6-19.
Figure 6-19. Task Details of the completed session run
4. Review the Source/Target Statistics.
Figure 6-20. Source/Target Statistics for the session run
Data Results
Use the Preview Data feature in the Designer to view the data results. Your results should appear similar to those in Figure 6-21 and Figure 6-22.
Figure 6-21. Data preview of the STG_DATES table - screen 1
Figure 6-22. Data preview of the STG_DATES table - screen 2, scrolled right
Unit 7: Debugger
In this unit you will learn about:
♦ Debugging mappings

Lesson 7-1. Debugging Mappings
The Debugger is a wizard-driven Designer tool that runs a test session.
The Integration Service must be running before starting a debug session.
1. Start the Debugger. A spinning Debugger Mode icon is displayed; it stops when the Integration Service is ready.
2. Choose an existing session or define a one-time debug session. Options:
♦ Load or discard target data
♦ Save the debug environment for later use
3. Monitor the Debugger:
♦ Output window - view the Debug or Session log.
♦ Transformation Instance Data window - view transformation data.
♦ Target Instance window - view target data.
4. Move through the session - menu options include:
♦ Next Instance. Runs until it reaches the next transformation or satisfies a breakpoint condition.
♦ Step to Instance. Runs until it reaches the selected transformation instance or satisfies a breakpoint condition.
♦ Show current instance. Displays the current instance in the Transformation Instance window.
♦ Continue. Runs until it satisfies a breakpoint condition.
♦ Break now. Pauses wherever it is currently processing.
5. Modify data and breakpoints. When the Debugger pauses, you can:
♦ Change data
♦ Change variable values
♦ Add or change breakpoints
Unit 7 Lab: Using the Debugger

Business Purpose
The m_STG_DATES_DEBUG mapping contains at least one error that results in bad data being loaded into the target table. This error must be found and corrected so the data warehouse project will be successful.

Technical Description
The Debugger will be used to track down the cause of the error or errors.

Objectives
♦ Use the Debug Wizard.
♦ Use the Debug Toolbar.

Duration
30 minutes
Unit 7 Lab: Using the Debugger Informatica PowerCenter 8 Level I Developer
Velocity Deliverable: Mapping Specifications

Mapping Name: m_STG_DATES_DEBUG
Source System: Flat file
Target System: Oracle table
Initial Rows: 4019
Rows/Load: 4019
Short Description: A text file is run through an Expression transformation to do date manipulation and load our date staging area.
Load Frequency: Once
Preprocessing: Target append
Post Processing: -
Error Strategy: Default
Reload Strategy: -
Unique Source Fields: -

Sources Files
File Name: dates.txt
File Location: C:\pmfiles\SrcFiles
Fixed/Delimited: Delimited
Additional File Info: Comma delimiter

Targets Tables
Table Name: STG_DATES_VIEW
Schema Owner: -
Insert: X
Update: -
Delete: -
Unique Key: -

High Level Process Overview
Source -> Expression -> Target

Processing Description (Detail)
This mapping will generate the date staging table from the dates text file.
Source To Target Field Matrix

Target Table    Target Column   Source File  Source Column  Expression                    Default Value if Null
STG_DATES_VIEW  DATE_ID_LEGACY  dates.txt    DATE_ID
STG_DATES_VIEW  DATE_VALUE      dates.txt    derived        Reformat to MM/DD/YYYY
STG_DATES_VIEW  DAY_OF_MONTH    dates.txt    derived        Current day of current month
STG_DATES_VIEW  MONTH_NUMBER    dates.txt    derived        Month number of the year
STG_DATES_VIEW  MONTH_NAME      dates.txt    derived        Name of the month
STG_DATES_VIEW  YEAR_VALUE      dates.txt    derived        Year for each record
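The derived columns in the matrix above can be sketched in plain Python (illustrative only - the lab mapping computes these with PowerCenter expression functions, and the sample date here is made up):

```python
# Illustrative sketch of the derived columns (not PowerCenter expression code).
from datetime import date

d = date(2006, 4, 28)                      # a hypothetical source date

date_value   = d.strftime("%m/%d/%Y")      # Reformat to MM/DD/YYYY
day_of_month = d.day                       # Current day of current month
month_number = d.month                     # Month number of the year
month_name   = d.strftime("%B")            # Name of the month
year_value   = d.year                      # Year for each record

print(date_value, day_of_month, month_number, month_name, year_value)
# 04/28/2006 28 4 April 2006
```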
Instructions

Step 1: Copy and Inspect the Debug Mapping
1. Expand the DEV_SHARED folder.
   a. Locate the mapping m_STG_DATES_DEBUG and copy it to your folder.
   b. If a source or target conflict occurs, choose Reuse.
2. Save the Repository.
3. Open the mapping in your workspace.
   a. Get an overall idea of what kind of processing is being done.
   b. Read each of the expressions in the Expression transformation. Note that the mapping is a simplified version of the one used in Unit 6 Lab B.

You have been told only that there is an "error" in the data being written to the target, without any further clarification as to the nature of the error.

Tip: Many mapping errors can be found by carefully inspecting the mapping - without using the Debugger. However, if the error cannot be located in a timely fashion in this manner, the Debugger will assist you by showing the actual data passing through the transformation ports. In order to properly use the Debugger, you must first understand the logic of the mapping.
Step 2: Step Through the Debug Wizard
1. Press the F9 key. This invokes the Debug Wizard. The first page of the Debug Wizard is informational. Please read it.

Tip: The Debugger requires a valid mapping and session to run; it cannot help you determine why a mapping is invalid. The Designer Output window will show you the reason(s) why a mapping is invalid.

2. Press the Next button.
Your Wizard should appear similar to Figure 7-1 below. Accept the default setting - Create a debug session instance for this mapping - and press the Next button.
Figure 7-1. Debug Session creation dialog box
The next page of the Wizard allows you to set connectivity properties. This information is familiar to you from creating sessions, except that here it is a subset of the regular session options and is formatted somewhat differently.
3. Set the Target Connection Value to your target schema database connection object. The debugger data will be discarded in a later step, so this value will be ignored.
4. Select the Properties tab at the bottom. Your Wizard should appear as in Figure 7-2 below.
Figure 7-2. Debug Session connections dialog box
   ♦ Ensure that the Source Filename property value includes the .txt extension. In this lab, verify that you enter dates.txt.
   ♦ Ensure that the Target load type property value is set to Normal.
5. Press the Next button.
6. We will not be overriding transformation properties, so press Next again.
7. Accept the defaults on the Session Configuration Wizard page and press Next.
8. The final Wizard page allows us to choose whether or not to discard the target data (the default) and choose which target data to view. Accept the defaults here as well.
9. When you press the Finish button, a Debug session will be created and it will initialize, opening the required database connections. No data will be read until we are ready to view it.
Step 3: Use the Debugger to Locate the Error
When the Debug Wizard Finish button is pressed, the appearance of the Designer interface will change, and it will likely require some minor adjustment to make it more readable. Note that three window panes are visible in the bottom third of the screen. Adjust the horizontal dividers with your mouse until what you see resembles Figure 7-3.
1. Set the Target Instance and Instance drop-boxes as shown in Figure 7-3 as well.
Note: The term instance is sometimes used as a synonym for transformation.
Figure 7-3. Designer while running a Debug Session
As mentioned earlier, the Debug session is initialized at this point but no data is read. We will manually control the debugger so we can easily review the data values and spot the error. The debugger can be controlled via the Designer menu, via hotkeys (described in the menu), or with the Debug Toolbar. We will use the toolbar.
2. The Debug Toolbar is not visible by default. To make it visible, select the menu option Tools > Customize. You will see the dialog box shown in Figure 7-4.
Figure 7-4. Customize Toolbars Dialog Box
3. Select the Debugger toolbar.
4. Click OK. The Debug Toolbar is short. When it is undocked, it appears as in Figure 7-5. If you cannot see it right away, look for the red "stop sign" on the right.
Figure 7-5. Debugger Toolbar
Tip: If you cannot find the Debugger Toolbar after using the menu option to select it, another toolbar has shifted it off the screen. Rearrange the other docked toolbars until you can see it.
5. You can cause one row of data to be read by the Source Qualifier by pressing the third toolbar button (tooltip Next Instance). Note that some data is shown in the Instance window.
6. Toggle the Instance drop-box to the Expression transformation. The data has not yet gone that far.
Note: No data available means null in the Debugger.
7. Press the fourth toolbar button (tooltip Step to Instance). Note that one more row has been read, and the first row has been "pushed" into the Expression transformation and the Target table.
8. Press the Next Instance toolbar button (third) several times. Note that each time it is pressed, one more row is read and one more row (the row that was read from the previous press) is loaded into the target. The Instance window jumps between the Source Qualifier and the Expression (i.e., it follows the row).
9. Press the Step to Instance toolbar button (fourth) several times. Note that it also causes one row to be read and written, but the Instance window shows only the data in one transformation - the one chosen in the drop-box.
10. Examine the data being sent to the target. What is the error? Hint: compare the values with the actual date being read from the source file.

Now that you are familiar with the basics of operating the Debugger, locate the cause of the error.
Step 4: Fix the Error and Confirm the Data is Correct
When you have found the error, you will not be able to fix it while the Debugger is running (try it). The mapping properties are grayed out because there is an "in-use" lock on the mapping.
1. Stop the Debugger by pressing the second toolbar button. Press Yes.
2. Fix the mapping error.
3. Save the Repository.
4. Re-start the Debug Wizard as in Step 2. Note that your Debug session properties (such as connectivity) have been saved locally, making it easier for you to invoke the Debugger again if needed.
5. Confirm that the data being sent to the target is now correct.
Unit 8: Sequence Generator

In this unit you will learn about:
♦ Sequence Generator Transformation
Lesson 8-1. Sequence Generator Transformation

Type
Passive.

Description
The Sequence Generator transformation generates unique numeric values that can be used to create keys. The values created by the Sequence Generator are sequential but not guaranteed to be contiguous. The Sequence Generator is an output-only transformation with two outputs, represented by the NEXTVAL and CURRVAL ports. Typically you connect the NEXTVAL port to generate a new key. When connected to multiple targets, the Sequence Generator generates different sequential values for each target. To use the same value for each target, pass the output of the Sequence Generator through an Expression transformation before connecting it to the targets.
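The multiple-targets behavior can be illustrated with a plain Python sketch (an analogy, not Informatica code; the exact values PowerCenter assigns to each target are not claimed here - only that separate draws from the sequence yield distinct numbers, while a value drawn once and fanned out is shared):

```python
# Illustrative analogy: every draw from the NEXTVAL "port" advances the
# sequence, so two consumers wired directly to it see different numbers.
import itertools

nextval = itertools.count(start=1)             # stands in for the NEXTVAL port
target_a = [next(nextval) for _ in range(3)]   # each draw advances the sequence
target_b = [next(nextval) for _ in range(3)]   # so these values differ
print(target_a, target_b)                      # [1, 2, 3] [4, 5, 6]

# Drawing once per row (the Expression-transformation pattern) and fanning
# the result out gives every target the same key for a given row.
nextval = itertools.count(start=1)
shared = [next(nextval) for _ in range(3)]     # drawn once per row
target_a, target_b = list(shared), list(shared)
print(target_a == target_b)                    # True
```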
Unit 8: Sequence Generator Informatica PowerCenter 8 Level I Developer
Properties

Start Value - The start value of the generated sequence that you want the Integration Service to use if you use the Cycle option. If you select Cycle, the Integration Service cycles back to this value when it reaches the end value.
Increment By - The value you want the sequence generator to increment by.
End Value - The maximum value the Integration Service generates.
Current Value - The current value of the sequence.
Cycle - If selected, the Integration Service cycles through the sequence range.
Number of Cached Values - The number of sequential values the Integration Service caches at a time.
Reset - If selected, the Integration Service generates values based on the original current value for each session.
Tracing Level - Level of detail about the transformation that the Integration Service writes into the session log.
For more details, see the online help.
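As a rough illustration of how Start Value, Increment By, End Value, Current Value, and Cycle interact, here is a hypothetical Python sketch (not Informatica code; it ignores caching, Reset, and tracing):

```python
# Illustrative sketch of the property semantics described above.
def sequence_generator(start_value=0, increment_by=1,
                       end_value=2147483647, current_value=1, cycle=False):
    """Yield NEXTVAL-style values: increment until End Value, then either
    cycle back to Start Value or stop."""
    value = current_value
    while True:
        if value > end_value:
            if cycle:
                value = start_value  # cycle back to the start value
            else:
                return               # sequence exhausted
        yield value
        value += increment_by

gen = sequence_generator(start_value=1, increment_by=1, end_value=3, cycle=True)
print([next(gen) for _ in range(5)])  # [1, 2, 3, 1, 2]
```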
Business Purpose
A business receives customer information which is used to update a data warehouse customer dimension table with a customer history. A sequence generator is used to create surrogate keys to maintain referential integrity within the dimension table, since a customer may have duplicate entries.
Example
The following example shows a partial mapping where the sequence generator is used to generate a new key for the Dates dimension table.
Performance Considerations
It is best to configure the Sequence Generator transformation as close to the target as possible in a mapping; otherwise the mapping will carry extra sequence numbers through the transformation process that are never transformed.
Unit 8 Lab: Load Date Dimension Table

Business Purpose
The Mersche Motors data warehouse has a date dimension table that needs to be loaded. The date dimension needs to be loaded before any of the other dimension tables.
Technical Description
PowerCenter will extract the dates from a shared relational table and load them into a shared relational table. All columns in the source table have matching columns in the target table. A primary key for the target table will be assigned using the Sequence Generator transformation.
Objectives
♦ Create sources and targets based on shortcuts
♦ Create a Sequence Generator transformation
♦ Create unique integer primary key values using the NEXTVAL port

Duration
20 Minutes
Unit 8 Lab: Load Date Dimension Table Informatica PowerCenter 8 Level I Developer
Velocity Deliverable: Mapping Specifications

Mapping Name: m_DIM_DATES_LOAD_xx
Source System: Oracle Table
Target System: Oracle Table
Initial Rows: 4019
Rows/Load: 4019
Short Description: Source relational table will be directly loaded into a relational target. The primary key for the target table will be assigned by a sequence generator.
Load Frequency: Once
Preprocessing: -
Post Processing: -
Error Strategy: Default
Reload Strategy: -
Unique Source Fields: -

SOURCES Tables
Table Name: STG_DATES
Schema/Owner: TDBUxx
Selection/Filter: -

TARGETS Tables
Table Name: DIM_DATES
Schema Owner: TDBUxx
Insert: X
Update: -
Delete: -
Unique Key: DATE_KEY

HIGH LEVEL PROCESS OVERVIEW
Relational Source -> Sequence Generator -> Relational Target

PROCESSING DESCRIPTION (DETAIL)
The Sequence Generator transformation will be used to assign unique integer values for the DATE_KEY field as rows are passed from the STG_DATES table to the DIM_DATES table.
SOURCE TO TARGET FIELD MATRIX

Target Table  Target Column       Source Table  Source Column       Expression                       Default Value if Null
DIM_DATES     DATE_KEY            STG_DATES     derived             NEXTVAL from Sequence Generator
DIM_DATES     DATE_VALUE          STG_DATES     DATE_VALUE
DIM_DATES     DATE_ID_LEGACY      STG_DATES     DATE_ID_LEGACY
DIM_DATES     DATE_OF_MONTH       STG_DATES     DATE_OF_MONTH
DIM_DATES     MONTH_NUMBER        STG_DATES     MONTH_NUMBER
DIM_DATES     YEAR_VALUE          STG_DATES     YEAR_VALUE
DIM_DATES     DAY_OF_WEEK         STG_DATES     DAY_OF_WEEK
DIM_DATES     DAY_NAME            STG_DATES     DAY_NAME
DIM_DATES     MONTH_NAME          STG_DATES     MONTH_NAME
DIM_DATES     DAY_OF_YEAR         STG_DATES     DAY_OF_YEAR
DIM_DATES     MONTH_OF_YEAR       STG_DATES     MONTH_OF_YEAR
DIM_DATES     WEEK_OF_YEAR        STG_DATES     WEEK_OF_YEAR
DIM_DATES     DAY_OVERALL         STG_DATES     DAY_OVERALL
DIM_DATES     WEEK_OVERALL        STG_DATES     WEEK_OVERALL
DIM_DATES     MONTH_OVERALL       STG_DATES     MONTH_OVERALL
DIM_DATES     YEAR_OVERALL        STG_DATES     YEAR_OVERALL
DIM_DATES     HOLIDAY_INDICATOR   STG_DATES     HOLIDAY_INDICATOR
DIM_DATES     WORKDAY_INDICATOR   STG_DATES     WORKDAY_INDICATOR
DIM_DATES     WEEKDAY_INDICATOR   STG_DATES     WEEKDAY_INDICATOR
DIM_DATES     WEEKEND_INDICATOR   STG_DATES     WEEKEND_INDICATOR
DIM_DATES     QUARTER_OF_YEAR     STG_DATES     QUARTER_OF_YEAR
DIM_DATES     SEASON              STG_DATES     SEASON
DIM_DATES     LAST_DAY_IN_MONTH   STG_DATES     LAST_DAY_IN_MONTH
SOURCE TO TARGET FIELD MATRIX (continued)

Target Table  Target Column        Source Table  Source Column        Expression  Default Value if Null
DIM_DATES     LAST_DAY_IN_QUARTER  STG_DATES     LAST_DAY_IN_QUARTER
DIM_DATES     LAST_DAY_IN_YEAR     STG_DATES     LAST_DAY_IN_YEAR
Instructions

Step 1: Create a Shortcut to a Shared Relational Source Table
1. Expand the DEV_SHARED folder and locate the source definition STG_DATES in the ODBC_STG node. Notice that this STG_DATES object is a source, while the STG_DATES that you have already used is a target.
2. Ensure that your student folder is open.
3. Drag and drop the STG_DATES source definition from the DEV_SHARED folder into the Source Analyzer.
4. Click Yes to confirm the shortcut.
5. Rename the shortcut SC_STG_DATES.
You should now see the SC_STG_DATES shortcut in your own student folder.
Velocity Best Practice: The SC_ prefix is the Velocity Best Practice naming convention for shortcut objects.
6. Save your work.
Step 2: Create a Shortcut to a Shared Relational Target Table
1. In the DEV_SHARED folder, locate the target DIM_DATES.
2. Drag and drop DIM_DATES into the Target Designer.
3. Click Yes to confirm the shortcut.
4. Rename the shortcut SC_DIM_DATES.
5. Save your work.
You will now be able to see the SC_DIM_DATES shortcut in your own student folder.
Step 3: Create a Mapping
1. Create a new mapping named m_DIM_DATES_LOAD_xx.
2. Add the SC_STG_DATES relational source to the new mapping.
3. Add the SC_DIM_DATES relational target to the new mapping.
4. Expand the mapping objects.
Your mapping should appear similar to Figure 8-1.
Figure 8-1. Expanded view of m_DIM_DATES_LOAD
Step 4: Create a Sequence Generator Transformation
1. From the Transformation toolbar, select the Sequence Generator transformation icon.
Figure 8-2. Sequence Generator Transformation icon
2. Position the Sequence Generator transformation before the target.
Tip: You can create approximately two billion primary or foreign key values with the Sequence Generator transformation by connecting the NEXTVAL port to the desired transformation or target and using the widest range of values (1 to 2147483647) with the smallest interval (1).
3. From the Sequence Generator transformation, select the NEXTVAL port and link it to the DATE_KEY column of the SC_DIM_DATES target.
Figure 8-3. Normal view of the sequence generator NEXTVAL port connected to a target column
4. Rename the sequence generator seq_DIM_DATES_DATE_KEY.
5. Select the Properties tab and observe the properties available in the sequence generator.
   a. Check the Reset Transformation Attribute Value.
   b. Describe the following properties. Use the Help system to find the answers.
      ♦ Increment by: ________________________________________________
      ♦ Current value: ________________________________________________
6. Click the OK button to return to the Normal view of the sequence generator.
7. Save your work.
Step 5: Link the Target Table
1. Link all the ports from the Source Qualifier transformation to the corresponding columns in the target object using Autolink by name. See Figure 8-4.
Figure 8-4. Normal view of connected ports to the target
2. Save your work.
3. Verify that your mapping is valid in the Output window. If the mapping is not valid, correct the problems that are displayed in the message.
Step 6: Create and Run the Workflow
1. Launch the Workflow Manager (if not already running), connect to the repository, and open your student folder.
2. From the Workflow Designer, create a new workflow named wkf_DIM_DATES_LOAD_xx.
3. Use the Session task icon to create a new Session task.
4. Associate the m_DIM_DATES_LOAD_xx mapping with the new Session task.
5. Link the Start object to the s_m_DIM_DATES_LOAD_xx session task object.
6. Edit the s_m_DIM_DATES_LOAD_xx session task and set the following options in the Mapping tab:
   ♦ Select SQ_SC_STG_DATES from the Sources folder in the navigator window.
   ♦ Set the Connections Value to your assigned NATIVE_STGxx connection value.
   ♦ Select SC_DIM_DATES from the Target folder in the navigator window.
   ♦ Set the Connections Value to your assigned NATIVE_EDWxx connection value.
   ♦ Set the Target load type to Normal.
   ♦ Check the Truncate target table option in the target properties.
7. Save your work.
8. Check the Validate messages to ensure your workflow is valid. If you received an invalid message, correct the problem(s) and re-validate/save.
9. Start the workflow.
10. Review the Task Details. Your information should appear similar to Figure 8-5.
Figure 8-5. Task Details of the completed session run
11. Select the Source/Target Statistics tab. Your statistics should be similar to Figure 8-6.
Figure 8-6. Source/Target statistics for the session run
Data Results
Preview the target data from the Designer. Your data should appear similar to Figure 8-7.
Figure 8-7. Data Preview of the DIM_DATES table
Unit 9: Lookup Caching, More Features and Techniques

In this unit you will learn about:
♦ Lookup caching
♦ Using lookup caching in a mapping

The labs also illustrate more PowerCenter features and techniques.
Lesson 9-1. Lookup Caching

Description
The Lookup transformation allows you to cache the lookup table in memory. This is the default.
Properties

This section discusses the cache-related properties. Dynamic cache will be discussed in a later module. Each option below applies to both flat file and relational lookups.

Lookup Caching Enabled - Indicates whether the Integration Service caches lookup values during the session.
Lookup Cache Directory Name - Specifies the directory used to build the lookup cache files when you configure the Lookup transformation to cache the lookup source. Also used to save the persistent lookup cache files when you select the Lookup Persistent option. By default, the Integration Service uses the $PMCacheDir directory configured for the Integration Service process.
Unit 9: Lookup Caching, More Features and Techniques Informatica PowerCenter 8 Level I Developer
More cache-related properties (each applies to both flat file and relational lookups):

Lookup Cache Persistent - Indicates whether the Integration Service uses a persistent lookup cache.
Lookup Data Cache Size - Indicates the maximum size the Integration Service allocates to the data cache in memory. When the Integration Service cannot store all the data cache data in memory, it pages to disk as necessary.
Lookup Index Cache Size - Indicates the maximum size the Integration Service allocates to the index cache in memory. When the Integration Service cannot store all the index cache data in memory, it pages to disk as necessary.
Cache File Name Prefix - Use only with persistent lookup cache. Specifies the file name prefix to use with persistent lookup cache files.
Recache From Lookup Source - Use only with the lookup cache enabled. When selected, the Integration Service rebuilds the lookup cache from the lookup source when it first calls the Lookup transformation instance. If you use a persistent lookup cache, it rebuilds the persistent cache files before using the cache. If you do not use a persistent lookup cache, it rebuilds the lookup cache in memory before using the cache.
For more detailed information refer to the online help.
Lookup Cache How it Works
♦ There are two types of cache memory: index cache and data cache.
♦ The index cache contains all port values from the lookup table where the port is specified in the lookup condition.
♦ The data cache contains all port values from the lookup table that are not in the lookup condition and that are specified as "output" ports.
♦ After the cache is loaded, values from the Lookup input port(s) that are part of the lookup condition are compared to the index cache.
♦ Upon a match, the rows from the cache are included in the stream.
Key Point
If there is not enough memory specified in the index and data cache properties, the overflow will be written out to disk.
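The index/data cache split can be sketched as a simple in-memory structure (illustrative Python, not Informatica internals; the table and port names are invented for the example):

```python
# Illustrative sketch: the lookup-condition port (CUSTOMER_ID) populates the
# index cache; the remaining output ports populate the data cache.
lookup_table = [
    {"CUSTOMER_ID": 101, "NAME": "Ada", "REGION": "East"},
    {"CUSTOMER_ID": 102, "NAME": "Lin", "REGION": "West"},
]

# Build both caches once, before any input rows are processed.
index_cache = {}   # condition-port value -> row position
data_cache = []    # output-port values, in matching order
for pos, row in enumerate(lookup_table):
    index_cache[row["CUSTOMER_ID"]] = pos
    data_cache.append({"NAME": row["NAME"], "REGION": row["REGION"]})

def lookup(customer_id):
    """Compare the input value to the index cache; return output ports on a match."""
    pos = index_cache.get(customer_id)
    return data_cache[pos] if pos is not None else None

print(lookup(102))  # {'NAME': 'Lin', 'REGION': 'West'}
print(lookup(999))  # None (no match)
```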
Performance Considerations
Lookup caching typically improves performance because the Integration Service need not execute an external read request to perform the lookup. However, this is true only if the time taken to load the lookup cache is less than the time that would be taken to perform the external read requests. To reduce the amount of cache required, turn off or delete any unused output ports in the Lookup transformation. You can also index the lookup file to speed the retrieval time. You can use where clauses in the SQL override to minimize the amount of data written to cache.
Rule Of Thumb
Cache if the number (and size) of records in the lookup table is small relative to the number of mapping rows requiring a lookup.
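A back-of-envelope check of this rule (all costs below are made-up numbers for illustration, not measured PowerCenter timings):

```python
# Hypothetical costs: caching pays off when the one-time cache build plus
# cheap in-memory probes beats per-row external read requests.
rows_needing_lookup = 1_000_000
uncached_lookup_cost = 0.002      # seconds per external read request (assumed)
cache_build_cost = 30.0           # seconds to load the whole lookup table (assumed)
cached_lookup_cost = 0.000_001    # seconds per in-memory probe (assumed)

uncached_total = rows_needing_lookup * uncached_lookup_cost               # 2000 s
cached_total = cache_build_cost + rows_needing_lookup * cached_lookup_cost  # 31 s
print(cached_total < uncached_total)  # True: caching wins for these numbers
```

With a huge lookup table and only a handful of input rows, cache_build_cost dominates and the comparison flips, which is exactly what the rule of thumb predicts.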
Unit 9 Lab A: Load Promotions Dimension Table (Lookup and Persistent Cache)

Business Purpose
Mersche Motors runs a number of promotions that begin and end on certain dates. The promotions are stored in the promotions dimension table. This table stores the start and expiry dates as date keys that reference the date dimension table.
Technical Description
The DIM_PROMOTIONS table requires start and expiration date keys. These exist in the DIM_DATES table that was populated in the previous lab. To obtain these date keys, which were created by the sequence generator, it will be necessary to perform a lookup on the DIM_DATES table in the EDW database. The DIM_DATES table changes infrequently, so it will be loaded into cache in a persistent state. The lookup cache will be used often by other mappings that load dimension tables.
Objectives
♦ Understand how to configure and use a persistent Lookup cache.

Duration
25 minutes
Unit 9 Lab A: Load Promotions Dimension Table (Lookup and Persistent Cache) Informatica PowerCenter 8 Level I Developer
Velocity Deliverable: Mapping Specifications

Mapping Name: m_DIM_PROMOTIONS_LOAD_xx
Source System: Oracle Table
Target System: Oracle Table
Initial Rows: 6
Rows/Load: 6
Short Description: Promotion data is run through the mapping, and a lookup must be performed on the DIM_DATES table to acquire the date keys for the start date and expiration date in the DIM_PROMOTIONS table.
Load Frequency: To be determined
Preprocessing: DIM_DATES must be loaded
Post Processing: -
Error Strategy: Default
Reload Strategy: -
Unique Source Fields: -

SOURCES Tables
Table Name: STG_PROMOTIONS
Schema/Owner: TDBUxx
Selection/Filter: None

TARGETS Tables
Table Name: DIM_PROMOTIONS
Schema Owner: TDBUxx
Insert: X
Update: -
Delete: -
Unique Key: PROMO_ID

LOOKUPS
Lookup Name: lkp_START_DATE_KEY
Table: DIM_DATES
Location: EDW
Match Condition(s): DIM_DATES.DATE_VALUE = STG_PROMOTIONS.START_DATE
Filter/SQL Override: -

Lookup Name: lkp_EXPIRY_DATE_KEY
Table: DIM_DATES
Location: EDW
Match Condition(s): DIM_DATES.DATE_VALUE = STG_PROMOTIONS.EXPIRY_DATE
Filter/SQL Override: -
HIGH LEVEL PROCESS OVERVIEW
Source -> Lookup (two Lookup transformations) -> Target

PROCESSING DESCRIPTION (DETAIL)
This mapping will populate the DIM_PROMOTIONS table with data. In order to successfully populate the DIM_PROMOTIONS table, there must be two lookups on the DIM_DATES table to acquire values for the START_DK and EXPIRY_DK date keys. Students will need to determine which columns to use for the condition in the Lookup transformation.
Note: This lab requires the successful completion of the Unit 8 Lab.
SOURCE TO TARGET FIELD MATRIX

Target Table     Target Column  Source Table     Source Column  Expression  Default Value if Null
DIM_PROMOTIONS   PROMO_ID       STG_PROMOTIONS   PROMO_ID
DIM_PROMOTIONS   PROMO_DESC     STG_PROMOTIONS   PROMO_DESC
DIM_PROMOTIONS   PROMO_TYPE     STG_PROMOTIONS   PROMO_TYPE
DIM_PROMOTIONS   START_DK       DIM_DATES        DATE_KEY
DIM_PROMOTIONS   EXPIRY_DK      DIM_DATES        DATE_KEY
DIM_PROMOTIONS   PROMO_COST     STG_PROMOTIONS   PROMO_COST
DIM_PROMOTIONS   DISCOUNT       STG_PROMOTIONS   DISCOUNT
Instructions

Step 1: Create a Shortcut to a Shared Relational Source Table
1. In the Source Analyzer, create a shortcut to the STG_PROMOTIONS source table from the DEV_SHARED > Sources > ODBC_STG folder.
2. Rename the shortcut to SC_STG_PROMOTIONS.
3. Save your work.

Step 2: Create a Shortcut to a Shared Relational Target Table
1. In the Target Designer, create a shortcut to the DIM_PROMOTIONS target table from the DEV_SHARED > Targets folder.
2. Rename the shortcut to SC_DIM_PROMOTIONS.
Note: If the SC_DIM_DATES target table is not displayed in the Target Designer, drag it in from the Targets folder in your student folder. Notice the primary key-foreign key relationships.
3. Save your work.
Step 3: Create a Mapping
1. Create a new mapping named m_DIM_PROMOTIONS_LOAD_xx.
2. Add the source definition shortcut SC_STG_PROMOTIONS to the mapping.
3. Add the target definition shortcut SC_DIM_PROMOTIONS to the mapping.
4. Arrange the transformations appropriately and Autolink the ports "By Name" between:
   ♦ SQ_SC_STG_PROMOTIONS and SC_DIM_PROMOTIONS.
5. Save your work. It should look like the mapping in Figure 9-1.
Step 4: Create Lookups for the Start and Expiry Date Keys
1. Examine Figure 9-1.
Figure 9-1. m_DIM_PROMOTIONS_LOAD mapping
2. In Figure 9-1, compare START_DATE and EXPIRY_DATE in SQ_SC_STG_PROMOTIONS to START_DK and EXPIRY_DK in the SC_DIM_PROMOTIONS target table. Notice that these two ports are not connected and the datatypes are different. The target requires key values (number), not dates. In what table do these Date Key values exist? _________________________.
3. Examine Figure 9-2.
Figure 9-2. m_DIM_DATES from the previous lab that populated the DIM_DATES table
The date dimension table (DIM_DATES) was populated by the previous lab; the DATE_KEY was generated by the seq_DIM_DATES_DATE_KEY Sequence Generator transformation, and DATE_VALUE has a datatype of date/time.
4. To acquire the value for the START_DK in the DIM_PROMOTIONS target, you need to perform a lookup on the DIM_DATES table. You will base the Lookup Condition on the ____________________ port from the SQ_SC_STG_PROMOTIONS Source Qualifier and the ____________________ column in the DIM_DATES lookup table.
5. Similarly, to acquire the value for the EXPIRY_DK in the DIM_PROMOTIONS target, you will need a second lookup on DIM_DATES as well. You will base the Lookup Condition on the ____________________ port from the SQ_SC_STG_PROMOTIONS Source Qualifier and the ____________________ column in the DIM_DATES lookup table.
6. Add a Lookup transformation to the mapping based on the SC_DIM_DATES (shortcut to DIM_DATES) target table.
Figure 9-3. Select Lookup Table
7. Rename the Lookup transformation to lkp_START_DATE_KEY.
8. Click OK.
9. Click Yes to acknowledge the "Lookup condition is empty" warning. You will define the condition shortly.
10. Drag and drop the START_DATE port from the SQ_SC_STG_PROMOTIONS Source Qualifier to an empty port in the lkp_START_DATE_KEY transformation.
11. Make START_DATE input only.
12. Rename START_DATE to IN_START_DATE.
13. Define the Lookup Condition to look like Figure 9-4:
Figure 9-4. Lookup Condition
14. On the Properties tab, verify the following values:
   ♦ Lookup Table Name = DIM_DATES (default).
   ♦ Lookup Caching Enabled = Checked (default).
   ♦ Lookup Cache Persistent = Checked (needs to be set).
   ♦ Cache File Name Prefix = LKPSTUxx (where xx is your student number).
15. Link the DATE_KEY port from the lkp_START_DATE_KEY transformation to the START_DK port in the SC_DIM_PROMOTIONS target.
16. Save your work.
Note: Notice that this transformation has many ports. We could have unchecked the Output column on all except the ones that we need, but since this Lookup transformation will be persistent, that would have limited its usefulness for other mappings that might leverage it.
The lkp_START_DATE_KEY transformation will not retrieve values for EXPIRY_DK because the lookup conditions will be different. 17.
Create a second Lookup transformation called lkp_EXPIRY_DATE_KEY by selecting the lkp_START_DATE_KEY transformation and pressing Ctrl+C and Ctrl+V.
18.
Make the changes necessary to the Lookup to ensure that the EXPIRY_DATE finds the proper DATE_KEY.
a. Rename it to lkp_EXPIRY_DATE_KEY.
b. Rename port IN_START_DATE to IN_EXPIRY_DATE.
c. Verify the Lookup Condition is correct.
19. Link the EXPIRY_DATE port from the SQ_SC_STG_PROMOTIONS Source Qualifier to the IN_EXPIRY_DATE port in the lkp_EXPIRY_DATE_KEY transformation.
20. Link the DATE_KEY port from the lkp_EXPIRY_DATE_KEY transformation to the EXPIRY_DK port in the SC_DIM_PROMOTIONS target.
21. Save your work.
Figure 9-5. m_DIM_PROMOTIONS_LOAD completed mapping
Step 5: Create and Run the Workflow
1. Launch the Workflow Manager and sign into your assigned folder.
2. Create a new Workflow named wkf_DIM_PROMOTIONS_LOAD_xx.
3. Create a new Session task using the mapping m_DIM_PROMOTIONS_LOAD_xx.
4. Edit the s_m_DIM_PROMOTIONS_LOAD_xx session task.
5. In the Mapping tab:
a. Select SQ_SC_STG_PROMOTIONS located under the Sources folder in the Navigator window.
b. Set the Connections > Type to your assigned NATIVE_STGxx connection object.
c. Select SC_DIM_PROMOTIONS located under the Targets folder in the Navigator window.
d. Set the Connections > Type to your assigned NATIVE_EDWxx connection object.
e. Ensure that the Target load type is set to Normal.
6. Complete the workflow by linking the Start and Session tasks and save your work.
7. Run the workflow.
8. Review the Task Details.
Your information should appear similar to Figure 9-6.
Figure 9-6. Task Details of the completed session run
9. Select the Source/Target Statistics tab. Your statistics tab should appear as in Figure 9-7.
Figure 9-7. Source/Target Statistics of the completed session run
Data Results
Preview the target data. The results should be similar to Figure 9-8. Note the values for START_DK and EXPIRY_DK.
Figure 9-8. Data Preview of the DIM_PROMOTIONS target table
By setting the Lookup Cache Persistent property on the Lookup transformations, two files were created in the cache file directory defined for the Integration Service process. See Figure 9-9. Note that in this lab, these files are on the Integration Service process machine, not your local computer. Also note that the names correspond to the name you entered in the Cache File Name Prefix Lookup property. To view these files, you will need to map to the file system on the Integration Service process machine. Verify that the files have a timestamp similar to when you ran the above workflow.
Figure 9-9. Preview of files created when Persistent Cache is set on a Lookup transformation
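Conceptually, a persistent lookup cache behaves like a keyed map that is built on the first run and saved to disk for reuse by later runs. The Python sketch below models that idea only; it is not PowerCenter's internal cache format, and the file name and date-to-key values are invented for illustration.

```python
# Conceptual model of a persistent lookup cache (NOT PowerCenter internals).
# The cache file name and the DATE -> DATE_KEY values are invented.
import os
import pickle

CACHE_FILE = "LKPSTU01.dat"  # stands in for the Cache File Name Prefix files

def build_cache():
    # In the lab, the cache is built by querying the DIM_DATES lookup table;
    # a small literal dict stands in for that query result here.
    return {"2006-04-01": 101, "2006-04-02": 102}

if os.path.exists(CACHE_FILE):
    with open(CACHE_FILE, "rb") as f:
        cache = pickle.load(f)   # later runs reload the saved cache
else:
    cache = build_cache()
    with open(CACHE_FILE, "wb") as f:
        pickle.dump(cache, f)    # first run persists the cache to disk

date_key = cache.get("2006-04-01")  # the lookup: START_DATE -> DATE_KEY
```

Deleting the cache file forces a rebuild on the next run, which mirrors how a persistent cache must be refreshed when the lookup table changes.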
Unit 9 Lab B: Features and Techniques II
Business Purpose
Management wants to increase the efficiency of the PowerCenter developers.
Technical Description
This lab details the use of four PowerCenter Designer features. Each of these features will increase the efficiency of any developer who knows how to use it appropriately. At the discretion of the instructor, this lab can also be completed as a demonstration.
Objectives
♦ Find Within Workspace
♦ View Object Dependencies
♦ Compare Objects
♦ Overview Window
Duration
15 minutes
Unit 9 Lab B: Features and Techniques II Informatica PowerCenter 8 Level I Developer
Instructions
Open a Mapping
In the Designer tool, perform the following steps:
1. Right-click the DEV_SHARED folder and select Open. This folder will be used in one of the features.
2. Right-click your Studentxx folder and select Open.
3. Drag the m_Stage_Customer_Contacts_xx mapping into the Mapping Designer within your Studentxx folder.
Feature 1: Find in Workspace
Using this feature, you can perform a string search for the name of an object, table, column, or port across all the transformations in the mapping currently open in the workspace. This feature can also be used in the Source Analyzer, Target Designer, Mapplet Designer, and Transformation Developer.
1. Select the Find in Workspace toolbar icon.
2. Type the word "customer" in the Find What text box.
3. Click Find Now.
Your results should appear as in Figure 9-10.
Figure 9-10. Find in Workspace dialog box
Note: In the Find in Workspace feature, the term "fields" can mean columns in sources or targets, or ports in transformations. The term "table" can mean a source or target definition, or a transformation.
Velocity Best Practice: Using the Velocity Methodology object naming conventions (such as transformation type prefixes) makes it easier to identify the found objects in the workspace. For example, in Figure 9-10 we know that SQ_customer_layout is a Source Qualifier and fil_Customer_No_99999 is a Filter.
Feature 2: View Object Dependencies
By viewing object dependencies in the Designer, a user can learn which objects may be affected by changes to source or target definitions, mappings, mapplets, or transformations (reusable or non-reusable). Both direct and indirect dependencies are shown. Object dependencies can also be viewed from the Workflow Manager and the Repository Manager. The Repository Manager shows any of the supported dependencies between a wide range of objects within the repository; see the Repository Guide for a complete list.
1. Select the flat-file source definition promotions in the Navigator window.
2. Right-click and select Dependencies.
You will see the Dependencies dialog box as shown in Figure 9-11.
Figure 9-11. View Dependencies dialog box
3. Click OK.
You will see the View Dependencies window, which shows detailed information about each of the dependencies found. Browse through this window, noting that some of the information relates to Team-Based Development (version control) properties such as Version, Timestamp, and Version Comments.
Note: By clicking the Save button on the toolbar, the dependencies can be saved as an .htm file for future reference.
4. Experiment by viewing the dependencies of other objects.
Tip: Dependencies can also be viewed by right-clicking on an object directly in a workspace, such as a source definition in the Mapping Designer or the Source Analyzer.
Feature 3: Compare Objects
This feature allows you to compare all of the ports and properties of any two objects within a mapping or mapplet.
1. Open the m_DIM_PROMOTIONS_LOAD_xx mapping.
2. Right-click the Lookup transformation lkp_START_DATE_KEY and select Compare Objects.
3. In the Instance 2 drop-down list, select the Lookup transformation lkp_EXPIRY_DATE_KEY. This is the object we wish to compare with the first Lookup transformation. Your screen should appear as Figure 9-12.
Figure 9-12. Transformation compare objects dialog box
4. Click Compare.
5. Browse the tabs in the Transformations window that appears. Select the Properties tab; what you see should be similar to Figure 9-13.
Figure 9-13. Compare Transformation objects Properties details
Note: A great deal of comparative information is displayed in the tabs. All differences appear in red. Ports highlighted in yellow indicate a difference in an expression that may not be easily visible in this view.
We will now learn how to compare objects that are in different folders.
6. Open the target definition STG_DATES in the Target Designer.
7. Right-click the target and select Compare Objects.
8. The Select Targets dialog box allows you to choose a comparison object in another folder. Click the Browse button for Target 2 and select the DIM_DATES table in the DEV_SHARED folder. Your screen should appear as Figure 9-14.
Figure 9-14. Target comparison dialog box
9. Click Compare.
10. Browse the information in the various tabs. Note that this method can quickly tell you the differences, if any, between two objects in two different folders. See Figure 9-15.
Figure 9-15. Column differences between two target tables
Tip: In order to compare objects across folders, both folders must be open.
Feature 4: Overview Window
The Overview window is useful when a large mapping is "zoomed in" on your screen so you can work on individual transformations, but the zoom level makes it difficult to scroll to a different section of the mapping because you cannot see where you are scrolling. The Overview window has been described as a "bird's eye view" of the mapping, enabling you to see your position relative to the entire structure.
1. In the Mapping Designer, set the zoom level to 100 percent.
2. Click the Toggle Overview Window toolbar button.
3. The Overview window appears in the upper-right corner of your screen. Use your left mouse button to drag the dotted rectangle to a different location within the mapping. If you were searching for a target or a source in a large and complex mapping, this feature would make it faster to locate.
Tip: Selected mapping objects appear red in the Overview window.
Unit 10: Sorter, Aggregator and Self-Join
In this unit you will learn about:
♦ Sorter transformations
♦ Aggregator transformations
♦ Active and passive transformations
♦ Data concatenation
♦ Self-joins
Lesson 10-1. Sorter Transformation
Type
Active.
Description
The Sorter transformation sorts the incoming data based on one or more key values; the sort order can be ascending, descending, or mixed. The Sorter transformation's "Distinct" attribute provides a facility to remove duplicates from the input rows.
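The Sorter's behavior can be modeled in a few lines of Python. This is an illustrative sketch only, not PowerCenter code; the port names and data values are invented. It shows a mixed ascending/descending sort on multiple keys, and the Distinct-style duplicate removal in which every port effectively becomes part of the sort key.

```python
# Sketch of Sorter transformation behavior (port names and data invented).
rows = [
    {"ITEM": "B", "QTY": 5},
    {"ITEM": "A", "QTY": 3},
    {"ITEM": "B", "QTY": 5},  # an exact duplicate row
    {"ITEM": "A", "QTY": 7},
]

# Mixed sort order: ascending on ITEM, descending on QTY.
rows.sort(key=lambda r: (r["ITEM"], -r["QTY"]))

# With Distinct set, all ports form the key, so only fully
# identical rows are dropped.
distinct, seen = [], set()
for r in rows:
    key = tuple(r.items())
    if key not in seen:
        seen.add(key)
        distinct.append(r)
# distinct is now A/7, A/3, B/5 -- the duplicate B/5 row is gone.
```

Note that a row differing in any one port survives the Distinct pass; only rows identical in every port are removed.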
Unit 10: Sorter, Aggregator and Self-Join Informatica PowerCenter 8 Level I Developer
Properties
Sorter Cache Size: The Integration Service uses the Sorter Cache Size property to determine the maximum amount of memory it can allocate to perform the sort operation. The Integration Service passes all incoming data into the Sorter transformation before it performs the sort operation. You can specify any amount between 1 MB and 4 GB for the Sorter cache size.
Case Sensitive: The Case Sensitive property determines whether the Integration Service considers case when sorting data. When you enable the Case Sensitive property, the Integration Service sorts uppercase characters higher than lowercase characters.
Work Directory: The directory that the Integration Service uses to create temporary files while it sorts data. After the Integration Service sorts the data, it deletes the temporary files.
Distinct: You can configure the Sorter transformation to treat output rows as distinct. If you configure the Sorter transformation for distinct output rows, the Mapping Designer configures all ports as part of the sort key.
Tracing Level: Sets the amount of detail included in the session log when you run a session containing this transformation.
Null Treated Low: Enable this property if you want the Integration Service to treat null values as lower than any other value when it performs the sort operation.
Transformation Scope: Specifies how the Integration Service applies the transformation logic to incoming data:
- Transaction. Applies the transformation logic to all rows in a transaction. Choose Transaction when a row of data depends on all rows in the same transaction, but does not depend on rows in other transactions.
- All Input. Applies the transformation logic to all incoming data. When you choose All Input, PowerCenter drops incoming transaction boundaries. Choose All Input when a row of data depends on all rows in the source.
Business Purpose
A business may aggregate data on records received from relational sources (databases) or flat files with related records in random order. Sorting the records prior to passing them on to an Aggregator transformation may improve the overall performance of the aggregation task.
Example
In the following example, Gross Profit and Profit Margin are calculated for each item sold. To improve the performance of this session, a Sorter transformation is added before the Aggregator transformation. The Aggregator's "Sorted Input" property must be checked to notify the Aggregator to expect input in sorted order.
Sorter Cache
How It Works
♦ If the cache size specified in the properties exceeds the available amount of memory on the Integration Service process machine, the Integration Service fails the session.
♦ All of the incoming data is passed into cache memory before the sort operation is performed.
♦ If the amount of incoming data is greater than the specified cache size, PowerCenter temporarily stores the data in the Sorter transformation work directory.
Key Points
The Integration Service requires disk space of at least twice the amount of incoming data when storing data in the work directory.
Performance Considerations
Using a Sorter transformation may improve performance over an "Order By" clause in a SQL override in an aggregate session when the source is a database, because the source database may not be tuned with the buffer sizes needed for a database sort.
Lesson 10-2. Aggregator Transformation
Type
Active.
Description
The Aggregator transformation calculates aggregates such as sums, minimum or maximum values across multiple groups of rows. The Aggregator transformation can apply expressions to its ports; however, those expressions are applied to a group of rows, unlike the Expression transformation, which applies calculations on a row-by-row basis only. Aggregate functions are created in output ports only. Function grouping requirements are set using the Aggregator Group By port.
Properties
Cache Directory: Local directory where the Integration Service creates the index and data cache files.
Tracing Level: Amount of detail displayed in the session log for this transformation.
Sorted Input: Indicates input data is presorted by groups. Select this option only if the mapping passes sorted data to the Aggregator transformation.
Aggregator Data Cache Size: Data cache size for the transformation. Default cache size is set to Auto.
Aggregator Index Cache Size: Index cache size for the transformation. Default cache size is set to Auto.
Transformation Scope: Specifies how the Integration Service applies the transformation logic to incoming data:
- Transaction. Applies the transformation logic to all rows in a transaction. Choose Transaction when a row of data depends on all rows in the same transaction, but does not depend on rows in other transactions.
- All Input. Applies the transformation logic to all incoming data. When you choose All Input, PowerCenter drops incoming transaction boundaries. Choose All Input when a row of data depends on all rows in the source.
Business Purpose
A business may want to calculate gross profit or profit margins based on items sold, or summarize weekly, monthly, or quarterly sales activity.
Example
The following example calculates values for units sold (OUT_UNITS_SOLD), revenue (OUT_REVENUE), and cost (OUT_COST) for each promotion id by date.
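A group-by aggregation of this kind can be sketched in Python as a dictionary keyed on the group-by ports. This is a conceptual illustration, not PowerCenter code; the sample rows and column names other than the OUT_* ports are invented.

```python
# Conceptual sketch of an Aggregator grouping by (PROMO_ID, DATE).
# Sample data and input column names are invented.
from collections import defaultdict

sales = [
    {"PROMO_ID": 1, "DATE": "2006-04-01", "UNITS": 2, "REVENUE": 10.0, "COST": 6.0},
    {"PROMO_ID": 1, "DATE": "2006-04-01", "UNITS": 1, "REVENUE": 5.0,  "COST": 3.0},
    {"PROMO_ID": 2, "DATE": "2006-04-01", "UNITS": 4, "REVENUE": 20.0, "COST": 12.0},
]

totals = defaultdict(lambda: {"OUT_UNITS_SOLD": 0, "OUT_REVENUE": 0.0, "OUT_COST": 0.0})
for row in sales:
    group = totals[(row["PROMO_ID"], row["DATE"])]  # the group-by ports
    group["OUT_UNITS_SOLD"] += row["UNITS"]
    group["OUT_REVENUE"] += row["REVENUE"]
    group["OUT_COST"] += row["COST"]

# One output row per unique (PROMO_ID, DATE) combination.
```

Just as in the Aggregator, one output row exists per unique combination of group-by values, and every input row contributes to exactly one group.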
Aggregator Cache
How It Works
♦ There are two types of cache memory: the index cache and the data cache.
♦ All rows are loaded into cache before any aggregation takes place.
♦ The index cache contains the group by port values.
♦ The data cache contains the following port values:
  ♦ Variable and connected output ports.
  ♦ Non group by input ports used in a non-aggregate output expression.
  ♦ Non group by input/output ports.
  ♦ Local variable ports.
  ♦ Ports containing an aggregate function (multiply by three).
♦ One output row will be returned for each unique occurrence of the group by ports.
Key Points
♦ If there is not enough memory specified in the index and data cache properties, the overflow will be written out to disk.
♦ No rows are returned until all of the rows have been aggregated.
♦ Checking the Sorted Input attribute will bypass caching.
♦ You enable automatic memory settings by configuring a value for the Maximum Memory Allowed for Auto Memory Attributes or the Maximum Percentage of Total Memory Allowed for Auto Memory Attributes. If the value is set to zero for either of these attributes, the Integration Service disables automatic memory settings and uses default values.
Performance Considerations
Aggregator performance can be increased by sorting the input data in the same order as the Aggregator Group By ports before the aggregation. The Aggregator's Sorted Input property must be checked. Relational source data can be sorted using an "order by" clause in the Source Qualifier SQL override. Flat file source data can be sorted using an external sort application or the Sorter transformation.
Cache size is also important in assuring optimal performance in the Aggregator. Make sure that your cache size settings are large enough to accommodate all of the data; if they are not, the system will cache out to disk, slowing performance.
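The reason sorted input helps can be seen in a short sketch: when groups arrive contiguously, a group can be emitted as soon as the group-by key changes, instead of caching every input row first. This Python illustration models the idea only; the function and column names are invented.

```python
# Why Sorted Input helps: stream groups out as the key changes,
# instead of caching all rows first. Names are illustrative only.
def sum_sorted(rows, group_key, value_key):
    current_key, total = None, 0
    for row in rows:
        if current_key is not None and row[group_key] != current_key:
            yield {group_key: current_key, "TOTAL": total}  # group complete
            total = 0
        current_key = row[group_key]
        total += row[value_key]
    if current_key is not None:
        yield {group_key: current_key, "TOTAL": total}      # final group

rows = [{"ITEM": "A", "QTY": 1}, {"ITEM": "A", "QTY": 2}, {"ITEM": "B", "QTY": 5}]
result = list(sum_sorted(rows, "ITEM", "QTY"))
# result: the A rows sum to 3, then the single B row yields 5
```

If the input were unsorted, rows of group A could appear after group B had been emitted, so the one-pass approach would produce wrong totals; that is why the Sorted Input property must only be set when the data really is presorted.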
Lesson 10-3. Active and Passive Transformations
Passive transformations operate on one row at a time AND preserve the number of rows. Examples: Expression, Lookup, Sequence Generator.
Active transformations operate on groups of rows AND/OR change the number of rows. Examples: Source Qualifier, Filter, Joiner, Aggregator.
Lesson 10-4. Data Concatenation
Data concatenation brings together different pieces of the same record (row). Data concatenation works only when combining branches of the same source pipeline; for example, one branch has a customer ID and the other branch has the customer name. If either branch contains an active transformation, the correspondence between the branches no longer exists.
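The row-correspondence requirement can be sketched in Python: two branches of one pipeline are rejoined by position, which only works while both branches keep the same row count. The data here is invented for illustration.

```python
# Sketch: two branches of one pipeline rejoined positionally.
# If one branch dropped rows (an active transformation such as a
# Filter), the positional match between branches would break.
source = [(1, "Ann"), (2, "Bob")]

branch_ids   = [row[0] for row in source]          # passive: same row count
branch_names = [row[1].upper() for row in source]  # passive: same row count

recombined = list(zip(branch_ids, branch_names))
```

An active transformation in either branch (filtering, aggregating) would change the row count or order, so row 2 of one branch would no longer correspond to row 2 of the other; this is exactly why the Designer disallows concatenation across an active transformation.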
Lesson 10-5. Self-Join
Description
The Joiner transformation combines fields from two data sources into a single combined data source based on one or more common fields, also known as the join condition. When the values to be combined are located within the same pipeline, a self-join provides a solution. The two pipelines being joined need to be sorted in the same order.
Business Purpose
A business may have to extract data from a single employee master table with employee data such as names, titles, salaries, and reporting departments, and create a new table showing only those employees whose salary is greater than the average salary for their department.
Example
The following example creates a new table with only those employees whose salary is greater than the average salary for the department that they work in.
Key Points
♦ The inputs to the Joiner from the single source must be separated into two data streams.
♦ For self-joins between two branches of the same pipeline:
  ♦ You must add a transformation between the Source Qualifier and the Joiner in at least one branch of the pipeline.
  ♦ Data must be pre-sorted by the join key.
  ♦ Configure the Joiner to accept sorted input.
♦ For self-joins between records from the same source, create two instances of the source and join the pipelines from each source.
Performance Considerations
There is a performance benefit in a self-join because it requires both the master and detail sides to be sorted, which allows the Joiner to use the faster sorted-input join.
Unit 10 Lab: Reload the Employee Staging Table
Business Purpose
Mersche Motors employee data has been loaded into the STG_EMPLOYEES table, but after validating the data it was determined that data was missing. Although the lookup to the salaries.txt file and the workflow were successful, the developer noticed that there is no data in the DEALERSHIP_MANAGER column of the target table. By leveraging the previous mapping that initially loaded the employee data, the developer must put the Dealership Manager's full name in the DEALERSHIP_MANAGER column.
Technical Description
We will copy the m_STG_EMPLOYEES_xx mapping created in a previous lab and modify it to derive the manager name and load it into the DEALERSHIP_MANAGER column of the STG_EMPLOYEES table. To do this we will split the data into two streams: one stream will have all employee records, and the other will have only manager records; the two streams will then be joined back together using the manager records as the master. On the manager stream we will filter on the POSITION_TYPE column for MANAGER records and relate them back to the SALESREP records using the DEALERSHIP_ID; this works because there is only one manager per dealership. We will also need to maintain the lookup to the salaries.txt file to ensure that salary data is still populated.
Objectives
♦ Leverage an existing mapping to solve a data integrity issue
♦ Split the data stream and use a self-join to bring it back together
♦ Copy and modify an existing reusable Expression transformation
Duration
70 minutes
Unit 10 Lab: Reload the Employee Staging Table Informatica PowerCenter 8 Level I Developer
Velocity Deliverable: Mapping Specifications
Mapping Name: m_STG_EMPLOYEES_DEALERSHIP_MGR_LOAD_xx
Source System: Flat File
Target System: Oracle Table
Initial Rows: 109
Rows/Load: 109
Short Description: A file list will be read, the source data will be reformatted, and salary information for each employee will be added. Determine the names of the managers and populate the DEALERSHIP_MANAGER column.
Load Frequency: Daily
Preprocessing: —
Post Processing: —
Error Strategy: Default
Reload Strategy: Target Append
Unique Source Fields: DEALERSHIP_ID, EMPLOYEE_ID

SOURCES
Files:
File Name: employees_central.txt, employees_east.txt, employees_west.txt (definition in employees_layout.txt)
File Location: C:\pmfiles\SrcFiles
Fixed/Delimited: Delimited
Additional File Info: These 3 comma-delimited flat files will be read into the session using the file list employees_list.txt. The layout of the flat files can be found in employees_layout.txt.
File Name: employees_list.txt
File Location: C:\pmfiles\SrcFiles
Fixed/Delimited: File list

TARGETS
Tables:
Schema Owner: TDBUxx
Table Name: STG_EMPLOYEES
Insert: X
Update: —
Delete: —
Unique Key: —

LOOKUPS
Lookup Name: lkp_salary
Table: salaries.txt
Location: C:\pmfiles\LkpFiles
Match Condition(s): EMPLOYEE_ID = IN_EMPLOYEE_ID
Filter/SQL Override: —
HIGH LEVEL PROCESS OVERVIEW
[Diagram: Source → Expression → Sorter; the sorted stream splits, with the master branch passing through a Filter and an Aggregator before rejoining the detail branch at a Joiner; a Lookup then supplies salaries before the Target is loaded.]
PROCESSING DESCRIPTION (DETAIL)
The mapping will read from three flat files contained in a file list. A reusable Expression transformation needs to be copied and modified to receive the employee id from the source. The data will be sorted by dealership id and then the data stream will be split. The master data flow (bottom) will group by DEALERSHIP_ID in the Aggregator, allowing only one row of output from the Aggregator for each unique set of group by ports. The two data flows will be concatenated with a self-join based on dealership id, enabling the mapping to retrieve the dealership manager for each record. A lookup to a salary text file will retrieve the salary information for each employee. The data will then be loaded into the STG_EMPLOYEES table.
SOURCE TO TARGET FIELD MATRIX
Target Table: STG_EMPLOYEES (all rows). Source File: employees_layout (all rows).

Target Column       | Data Type   | Source Column      | Expression / Default Value if Null
EMPLOYEE_ID         | number(p,s) | EMPLOYEE_ID        |
EMPLOYEE_NAME       | varchar2    | Derived            | Concatenate First Name and Last Name.
EMPLOYEE_ADDRESS    | varchar2    | ADDRESS            |
EMPLOYEE_CITY       | varchar2    | CITY               |
EMPLOYEE_STATE      | varchar2    | STATE              |
EMPLOYEE_ZIP_CODE   | number(p,s) | ZIP_CODE           |
EMPLOYEE_COUNTRY    | varchar2    | COUNTRY            |
EMPLOYEE_PHONE_NMBR | varchar2    | Derived            | The PHONE_NUMBER column is in the format 9999999999 and needs to be reformatted to (999) 999-9999.
EMPLOYEE_FAX_NMBR   | varchar2    | FAX_NUMBER         |
EMPLOYEE_EMAIL      | varchar2    | EMAIL              |
EMPLOYEE_GENDER     | varchar2    | Derived            | GENDER is currently either M or F. It needs to be Male, Female or UNK.
AGE_GROUP           | varchar2    | Derived            | AGE_GROUP is derived from the decoding of the AGE column. The valid age groups are less than 20, 20 to 29, 30 to 39, 40 to 49, 50 to 60 and greater than 60.
NATIVE_LANG_DESC    | varchar2    | NATIVE_LANGUAGE    |
SEC_LANG_DESC       | varchar2    | SECOND_LANGUAGE    |
TER_LANG_DESC       | varchar2    | THIRD_LANGUAGE     |
POSITION_TYPE       | varchar2    | POSITION_TYPE      |
REGIONAL_MANAGER    | varchar2    | REGIONAL_MANAGER   |
DEALERSHIP_ID       | number(p,s) | DEALERSHIP_ID      |
DEALERSHIP_MANAGER  | varchar2    | DEALERSHIP_MANAGER | Concatenated FIRSTNAME and LASTNAME of the manager. The employee records are split apart and then joined back together based on DEALERSHIP_ID.
EMPLOYEE_SALARY     | number(p,s) | Derived            | A Salary field for each Employee ID can be found in salaries.txt.
HIRE_DATE           | date        | HIRE_DATE          |
DATE_ENTERED        | date        | DATE_ENTERED       |
Instructions
Step 1: Copy an Existing Mapping
1. Launch the Designer and sign into your assigned folder.
2. Locate the mapping m_STG_EMPLOYEES_xx in the Navigator window.
3. Copy it and rename it m_STG_EMPLOYEES_DEALERSHIP_MGR_LOAD_xx.
4. Open m_STG_EMPLOYEES_DEALERSHIP_MGR_LOAD_xx in the Mapping Designer to make it the current mapping for editing.
5. Save your work.
Figure 10-1. m_STG_EMPLOYEES_DEALERSHIP_MGR_LOAD mapping
Step 2: Examine Source Data to Determine a Key for Self-Join
Figure 10-2 shows the employees_central.txt file. Some columns are not in view or are hidden.
Figure 10-2. Employee_central.txt
Which of these columns can we use to determine manager records?
Answer: ________________________
Which of these columns can we use for a self-join condition to obtain the Dealership Manager name for the employee records?
Answer: ________________________
Step 3: Prepare the New Mapping for Modification
Many of the links will need to be removed in order to build the self-join.
1. Right-click and select Arrange All, then expand the Source Qualifier, Lookup, and Target large enough to view all the ports and links.
2. Remove all the links to the lkp_salaries transformation and all of the links to the STG_EMPLOYEES target.
3. Rename the re_exp_Format_Name_Gender_Phone_Load_Date reusable transformation instance to exp_Format_Name_Gender_Phone_Load_Date_Mgr (notice that the instance name changes but the name of the reusable transformation that this Expression is an instance of stays the same).
Figure 10-3. Renaming an instance of a Reusable Transformation
4. Save your work and notice that the mapping is now invalid. Your mapping should look similar to Figure 10-4 if you Arrange All Iconic.
Figure 10-4. m_STG_EMPLOYEES_DEALERSHIP_MGR_LOAD after most links removed
Step 4: Create a Sorter Transformation
1. Add a Sorter transformation to the mapping and name it srt_EMPLOYEES_DEALERSHIP_ID_DESC.
Figure 10-5. Sorter Transformation Icon on Toolbar
2. Select the following ports from SQ_employees_layout and drag them into the Sorter transformation:
♦ DEALERSHIP_ID
♦ EMPLOYEE_ID
♦ ADDRESS
♦ CITY
♦ STATE
♦ ZIP_CODE
♦ COUNTRY
♦ FAX_NUMBER
♦ EMAIL
♦ NATIVE_LANGUAGE
♦ SECOND_LANGUAGE
♦ THIRD_LANGUAGE
♦ POSITION_TYPE
♦ REGIONAL_MANAGER
♦ HIRE_DATE
♦ DATE_ENTERED
3. Select all the output ports from the exp_Format_Name_Gender_Phone_Load_Date_Mgr transformation and drag them into the srt_EMPLOYEES_DEALERSHIP_ID_DESC transformation.
4. Edit the Sorter transformation.
a. On the DEALERSHIP_ID port, check the checkbox in the 'Key' column to define the sort column.
b. Rename the following ports:
♦ OUT_NAME to EMPLOYEE_NAME
♦ OUT_PHONE to EMPLOYEE_PHONE
♦ OUT_GENDER to EMPLOYEE_GENDER
♦ OUT_AGE_GROUP to AGE_GROUP
5. Save your work.
Step 5: Create a Filter Transformation
The source file contains sales representatives and managers. This stream of the mapping will contain only managers.
1. Create a Filter transformation named fil_MANAGERS.
2. Link the following ports from the srt_EMPLOYEES_DEALERSHIP_ID_DESC transformation to the fil_MANAGERS transformation:
♦ DEALERSHIP_ID
♦ EMPLOYEE_NAME
♦ POSITION_TYPE
3. Set the filter condition to allow only 'MANAGER' position types.
4. Save your work.
Step 6: Create an Aggregator Transformation
The filtered source data may contain multiple entries for a manager. The Aggregator transformation can be used to eliminate the duplicate manager records.
1. Create an Aggregator transformation named agg_MANAGERS.
Figure 10-6. Aggregator Transformation Icon on Toolbar
2. Link the following ports from the fil_MANAGERS transformation to the agg_MANAGERS transformation:
♦ DEALERSHIP_ID
♦ EMPLOYEE_NAME
3. Edit the Aggregator.
♦ On the DEALERSHIP_ID port, check the checkbox in the 'Group By' column.
♦ Under the Properties tab, check the 'Sorted Input' checkbox.
4. Save your work.
Tip: By making DEALERSHIP_ID the group by port, the Aggregator will return one row for each unique DEALERSHIP_ID. This removes the duplicate manager rows.
The mapping depicting the Sorter to Filter to Aggregator flow should be the same as Figure 10-7.
Figure 10-7. Partial mapping flow depicting the flow from the Sorter to the Filter to the Aggregator
Step 7: Create a Joiner Transformation for the Self-Join

1. Create a Joiner transformation and name it jnr_MANAGERS_EMPLOYEES.
2. On the Properties tab, set the Sorted Input property to 'checked.'
3. Click OK on the Edit Transformations dialog, and then click Yes on the 'Join Condition is empty…' dialog. The join condition will be set shortly.
4. Link all ports from the agg_MANAGERS transformation into the jnr_MANAGERS_EMPLOYEES Joiner transformation.
5. Link all ports from the srt_EMPLOYEES_DEALERSHIP_ID_DESC transformation to the jnr_MANAGERS_EMPLOYEES transformation.
6. Edit the jnr_MANAGERS_EMPLOYEES transformation:
   a. Rename the two ports linked from the Aggregator transformation as follows:
      ♦ DEALERSHIP_ID to MANAGER_DEALERSHIP_ID
      ♦ EMPLOYEE_NAME to MANAGER_NAME
   b. Ensure that both ports have checks under the 'M' column, defining them as the Master record.
   c. Rename the following ports linked from the Sorter transformation:
      ♦ DEALERSHIP_ID1 to EMPLOYEE_DEALERSHIP_ID
      ♦ EMPLOYEE_NAME1 to EMPLOYEE_NAME (remove the '1')
   d. Add the following join condition: MANAGER_DEALERSHIP_ID = EMPLOYEE_DEALERSHIP_ID
7. Save your work.
Review Figure 10-8 to verify your work.
Figure 10-8. Split data stream joined back together
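The self-join performed by jnr_MANAGERS_EMPLOYEES can be sketched as a small Python join. This is illustrative only; the one-manager-per-dealership stream from the Aggregator is the master side, the full employee stream from the Sorter is the detail side, and all sample names are hypothetical:

```python
# Illustrative model of the self-join: each employee row picks up the
# manager of its own dealership via the join condition
# MANAGER_DEALERSHIP_ID = EMPLOYEE_DEALERSHIP_ID.
managers = {  # master side: one manager per dealership (from the Aggregator)
    1: "Cy Drew",
    2: "Ann Lee",
}
employees = [  # detail side: every employee row (from the Sorter)
    {"EMPLOYEE_DEALERSHIP_ID": 2, "EMPLOYEE_NAME": "Ann Lee"},
    {"EMPLOYEE_DEALERSHIP_ID": 1, "EMPLOYEE_NAME": "Bob Ray"},
    {"EMPLOYEE_DEALERSHIP_ID": 1, "EMPLOYEE_NAME": "Cy Drew"},
]

joined = [
    {**e, "MANAGER_NAME": managers[e["EMPLOYEE_DEALERSHIP_ID"]]}
    for e in employees
    if e["EMPLOYEE_DEALERSHIP_ID"] in managers
]
```

Every output row keeps the employee's own columns and gains a MANAGER_NAME column, which is exactly what feeds the DEALERSHIP_MANAGER target column later in the lab.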
Step 8: Get Salaries from the Lookup

1. Link the EMPLOYEE_ID port from the jnr_MANAGERS_EMPLOYEES transformation to the IN_EMPLOYEE_ID port in the lkp_salaries Lookup transformation.
Step 9: Connect the Joiner and Lookup to the Target

1. Link the following ports from jnr_MANAGERS_EMPLOYEES to the STG_EMPLOYEES target:
Tip: Some ports can be auto-linked by name; the rest must be done manually.
   MANAGER_NAME --> DEALERSHIP_MANAGER
   EMPLOYEE_DEALERSHIP_ID --> DEALERSHIP_ID
   EMPLOYEE_ID --> EMPLOYEE_ID
   EMPLOYEE_NAME --> EMPLOYEE_NAME
   ADDRESS --> EMPLOYEE_ADDRESS
   CITY --> EMPLOYEE_CITY
   STATE --> EMPLOYEE_STATE
   ZIP_CODE --> EMPLOYEE_ZIP_CODE
   COUNTRY --> EMPLOYEE_COUNTRY
   EMPLOYEE_PHONE --> EMPLOYEE_PHONE_NUMBER
   FAX_NUMBER --> EMPLOYEE_FAX_NUMBER
   EMAIL --> EMPLOYEE_EMAIL
   NATIVE_LANGUAGE --> NATIVE_LANG_DESC
   SECOND_LANGUAGE --> SEC_LANG_DESC
   THIRD_LANGUAGE --> TER_LANG_DESC
   POSITION_TYPE --> POSITION_TYPE
   REGIONAL_MANAGER --> REGIONAL_MANAGER
   HIRE_DATE --> HIRE_DATE
   EMPLOYEE_GENDER --> EMPLOYEE_GENDER
   AGE_GROUP --> AGE_GROUP
   DATE_ENTERED --> DATE_ENTERED
2. Link the SALARY port from the lkp_salaries transformation to the EMPLOYEE_SALARY port in the STG_EMPLOYEES target.
3. Save your work.
Figure 10-9. Iconic view of the completed self-join mapping

Step 10: Create and Run the Workflow

1. Launch the Workflow Manager and sign into your assigned folder.
2. Create a new workflow named wkf_STG_EMPLOYEES_DEALERSHIP_MGR_LOAD_xx.
3. Create a session task using the m_STG_EMPLOYEES_DEALERSHIP_MGR_LOAD_xx mapping.
4. Link the Start task to the s_m_STG_EMPLOYEES_DEALERSHIP_MGR_LOAD_xx session task.
5. Edit session s_m_STG_EMPLOYEES_DEALERSHIP_MGR_LOAD_xx.
   a. In the Mapping tab, confirm that Source file directory is set to $PMSourceFileDir\.
   b. In Properties > Attribute > Source filename, type employees_list.txt and change the Source filetype property from Direct to Indirect. The properties should look similar to Figure 10-10.
Figure 10-10. Source properties for the employee_list.txt file list
   c. Select STG_EMPLOYEES located under the Target folder in the Mapping navigator.
      i. Set the relational target connection object property to NATIVE_STGxx, where xx is your student number.
      ii. Check the Truncate target table option in the target properties. (This needs to be set because the data loaded in a previous lab must be replaced.)
   d. Select lkp_salaries from the Transformations folder on the Mapping tab and verify the following property values:
      ♦ Lookup source file directory = $PMLookupFileDir\
      ♦ Lookup source filename = salaries.txt
6. Save your work.
7. Start the workflow.
8. Review the Task Details.
Figure 10-11. Task Details of the completed session run
9. Review the Source/Target Statistics.
Figure 10-12. Source/Target Statistics of the completed session run
Data Results

Preview the target data from the Designer. Your data should appear the same as displayed in Figure 10-13 and Figure 10-14.
Figure 10-13. Data preview of the self-join of Managers and Employees in the STG_EMPLOYEES target table - screen 1
Figure 10-14. Data preview of the STG_EMPLOYEES target table - screen 2 scrolled right
Note: Not all rows and columns are shown.
Unit 11: Router, Update Strategy and Overrides

In this unit you will learn about:
♦ Router transformation
♦ Update Strategy transformation
♦ Source Qualifier override
♦ Target update override
♦ Session task mapping overrides
Lesson 11-1. Router Transformation
Type: Active.

Description

The Router transformation is similar to the Filter transformation in that it passes rows that meet a Router Group filter condition to the downstream transformation or target. The Router transformation has a single input group and one or more output groups, with each output group representing a filter condition.
Unit 11: Router, Update Strategy and Overrides Informatica PowerCenter 8 Level I Developer
Business Purpose

A business may receive records that must be redirected to specific targets; the records are 'routed' to each target based on conditions on one or more record (row) fields.

Example

In the following example, a business receives sales results based on responses to coupons featured in local newspapers, magazines, and at its website. Each record is loaded into a different target table based on a promotion code.
In the example, the 'DEFAULT' group routes rows that do not meet any of the group filters to an exception table. This captures records where a promo code (PROMO_ID) was incorrectly entered, or a new code that has not yet been included in a filter group.

Performance Considerations

When splitting row data based on field values, a Router transformation has a performance advantage over multiple Filter transformations: a row is read once into the input group and then evaluated once per group. Using multiple Filter transformations instead requires the same row data to be duplicated for each Filter transformation.
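Router semantics can be sketched in Python. This is illustrative only (not PowerCenter code): each row is read once, evaluated against every group filter, routed to each group it matches, and sent to DEFAULT only if it matches no group. The group names and promo codes are hypothetical:

```python
# Illustrative model of Router group evaluation.
groups = {
    "NEWSPAPER": lambda row: row["PROMO_ID"] == "NP",
    "MAGAZINE":  lambda row: row["PROMO_ID"] == "MG",
    "WEB":       lambda row: row["PROMO_ID"] == "WB",
}

def route(rows):
    routed = {name: [] for name in groups}
    routed["DEFAULT"] = []                       # catches bad or unknown codes
    for row in rows:                             # each row is read once...
        matched = False
        for name, condition in groups.items():   # ...but evaluated per group
            if condition(row):
                routed[name].append(row)         # a row can match several groups
                matched = True
        if not matched:
            routed["DEFAULT"].append(row)
    return routed
```

Contrast this with the multiple-Filter approach, where each Filter would need its own copy of the full row stream.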
Lesson 11-2. Update Strategy Transformation
Type: Active.

Description

The Update Strategy transformation 'tags' a row with the appropriate DML (data manipulation language) operation for the PowerCenter writer to apply to a relational target. Each row can be tagged with one of the following flags (the DD prefix stands for Data Driven):
♦ DD_INSERT - tags a row for insert into a target
♦ DD_UPDATE - tags a row for update to a target
♦ DD_DELETE - tags a row for delete from a target
♦ DD_REJECT - tags a row for reject
Note: For the row tags DD_DELETE and DD_UPDATE, the table definition in the mapping must have a key identified; otherwise the session created from that mapping will fail.
Rows tagged with DD_REJECT are passed on to the next transformation or target and subsequently placed in the appropriate 'bad file' if the 'Forward Rejected Rows' attribute is checked (the default). If the attribute is unchecked, rejected rows are skipped.
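The tagging logic can be sketched in Python. This is illustrative only - in PowerCenter the flag is set by an Update Strategy expression, not code - and the existing_keys lookup and EMPLOYEE_ID field are hypothetical stand-ins. The numeric values 0-3 mirror the DD_* constants:

```python
# Illustrative model of Update Strategy row tagging.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3

def tag_row(row, existing_keys):
    """Tag a row DD_REJECT if its key is missing, DD_UPDATE if the key
    already exists in the target, and DD_INSERT if it is new."""
    if row.get("EMPLOYEE_ID") is None:
        return DD_REJECT             # bad row: let the writer reject it
    if row["EMPLOYEE_ID"] in existing_keys:
        return DD_UPDATE             # row exists: update the target
    return DD_INSERT                 # row is new: insert into the target
```

A downstream writer would then apply the corresponding INSERT, UPDATE, DELETE, or reject handling per row.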
Business Purpose

A business process may require more than a single DML action on a target table. A target table may require historical information about previous entries. Rows written to a target table may, based on one or more criteria, have to be inserted, updated, or deleted. The Update Strategy transformation can be applied to meet this requirement.

Example

In the following example, a business wants to maintain the 'MASTER_CUSTOMER' table with current information. Using a set of Filter transformations along with previous mapping objects, two data paths have been developed: one for inserts (DD_INSERT), which adds a sequence number for new records, and one for updates (DD_UPDATE), which updates existing records with new information.

Performance Considerations

Update Strategy transformation performance can vary depending on the mix of updates and inserts. In some cases there may be a performance benefit to splitting a mapping with updates and inserts into two mappings and sessions: one mapping with the inserts and the other with the updates.
Lesson 11-3. Expression Default Values
Lesson 11-4. Source Qualifier Override
Properties

SQL Query - Allows you to override the default SQL query that PowerCenter creates at runtime.
User Defined Join - Allows you to specify a user-defined join.
Source Filter - Allows you to create a WHERE clause that is inserted into the SQL query generated at runtime. The 'WHERE' keyword itself is not required, e.g. Table1.ID = Table2.ID.
Number of Sorted Ports - PowerCenter inserts an ORDER BY clause into the generated SQL query. The ORDER BY covers the number of ports specified, counting from the top down. E.g., in the sq_Product_Product_Cost Source Qualifier, if the number of sorted ports = 2, the ORDER BY will be: ORDER BY PRODUCT.PRODUCT_ID, PRODUCT.GROUP_ID.
Tracing Level - Specifies the amount of detail written to the session log.
Select Distinct - Allows you to select distinct values only.
Pre SQL - Allows you to specify SQL that is run before the pipeline runs, using the connection specified in the session task.
Post SQL - Allows you to specify SQL that is run after the pipeline has run, using the connection specified in the session task.
Output is Deterministic - Source or transformation output that does not change between session runs when the input data is consistent between runs. When you configure this property, the Integration Service does not stage source data for recovery if transformations in the pipeline always produce repeatable data.
Output is Repeatable - Source or transformation output that is in the same order between session runs when the order of the input data is consistent. When output is deterministic and output is repeatable, the Integration Service does not stage source data for recovery.
Lesson 11-5. Target Override
By default, target tables are updated based on key values. You can change this in the target properties:
1. Update Override
2. Generate SQL
3. Edit the UPDATE WHERE clause with non-key items
Lesson 11-6. Session Task Mapping Overrides
You can override some mapping attributes in the Session task Mapping tab.
Examples
♦ Source readers: Turn a relational source into a flat file
♦ User-defined join: Modify a homogeneous join in the Source Qualifier
♦ Source filters: Add a filter to the Source Qualifier
♦ Target writers: Turn a relational target into a flat file
Unit 11 Lab: Load Employee Dimension Table

Business Purpose

The Mersche Motors data warehouse employee table is updated on a daily basis. Source rows from the staging area need to be tested to see whether a row already exists in the dimension table. Rows need to be tagged for update or insert accordingly. Any rows containing bad data need to be written to an error file.

Technical Description

Rows from the STG_EMPLOYEES table need to be loaded into the DIM_EMPLOYEES table. Before loading the rows, EMPLOYEE_ID needs to be tested for NULL values. Invalid rows need to be written to an error file. Valid rows need to be tested to see whether they already exist in DIM_EMPLOYEES and tagged for either INSERT or UPDATE accordingly. Finally, any rows sent to the DIM_EMPLOYEES table need to get valid dates from DIM_DATES.
Objectives
♦ Use of the Update Strategy transformation to tag rows for INSERT or UPDATE.
♦ Use of the Router transformation to conditionally route rows to different target instances.
♦ Source Qualifier Session property override.
♦ Use of the Default Values option for NULL data replacement.
♦ Overriding the Target writer option.

Duration: 60 minutes
Unit 11 Lab: Load Employee Dimension Table Informatica PowerCenter 8 Level I Developer
Velocity Deliverable: Mapping Specifications

Mapping Name: m_DIM_EMPLOYEES_LOAD_xx
Source System: Oracle Table
Target System: Oracle Table, Flat File
Initial Rows: 88, 21
Rows/Load: 85, 21, 3, 0
Short Description: Move data from the staging table to the dimension target table, with error rows written to a flat file. Lookups are required for date entries and to the target table to test for existing rows.
Load Frequency: Daily
Preprocessing:
Target Append/Update:
Post Processing:
Error Strategy: Null employee_id rows written to error file
Reload Strategy:
Unique Source Fields: EMPLOYEE_ID

SOURCES - Tables
  Table Name: STG_EMPLOYEES
  Schema/Owner: TDBUxx
  Selection/Filter: SQ override for daily loads only

TARGETS - Tables
  Table Name: DIM_EMPLOYEES
  Schema Owner: TDBUxx
  Update: X
  Delete:
  Insert: X
  Unique Key: EMPLOYEE_ID

TARGETS - Files
  File Name: dim_employees_err1.out
  File Location: C:\pmfiles\TgtFiles
  Fixed/Delimited: Fixed
  Additional File Info: Based on DIM_EMPLOYEES definition

LOOKUPS
  Lookup Name: lkp_DIM_EMPLOYEES_EMPLOYEE_ID
  Table: DIM_EMPLOYEES
  Location: TDBUxx
  Match Condition(s): STG_EMPLOYEES.EMPLOYEE_ID = DIM_EMPLOYEES.EMPLOYEE_ID
  Filter/SQL Override:

  Lookup Name: lkp_DIM_DATES_INSERTS
  Table: DIM_DATES
  Location: TDBUxx
  Match Condition(s): STG_EMPLOYEES.DATE_ENTERED = DIM_DATES.DATE_VALUE
  Filter/SQL Override: Reuse persistent cache from previous lab

  Lookup Name: lkp_DIM_DATES_UPDATES
  Table: DIM_DATES
  Location: TDBUxx
  Match Condition(s): STG_EMPLOYEES.DATE_ENTERED = DIM_DATES.DATE_VALUE
  Filter/SQL Override: Reuse persistent cache from previous lab
HIGH LEVEL PROCESS OVERVIEW

[Diagram: the relational source passes through an Expression and a Lookup into a Router; the Router splits the stream three ways - through an Update Strategy and a Lookup to a relational target (inserts), through a second Update Strategy and a Lookup to a relational target (updates), and directly to a flat file target (errors).]

PROCESSING DESCRIPTION (DETAIL)

The DIM_EMPLOYEES table needs to be loaded from the STG_EMPLOYEES table. STG_EMPLOYEES holds two days' worth of data, 01/02/2003 and 01/03/2003. The second day contains corrections to some of the first day's data. The mapping needs to be executed twice, and a manual SQ override will be required for both runs. The DIM_EMPLOYEES and DIM_DATES tables will be used as lookup tables. Any rows with a null value for employee_id need to be routed to an error file. Substitute the NULL employee_id with 99999 using the default value option.
SOURCE TO TARGET FIELD MATRIX

All target columns belong to the DIM_EMPLOYEES target table.

Target Column <- Source Table.Source Column (Expression / Default Value if Null)
EMPLOYEE_ID <- STG_EMPLOYEES.EMPLOYEE_ID (Default Value if Null: 99999)
EMPLOYEE_NAME <- STG_EMPLOYEES.EMPLOYEE_NAME
EMPLOYEE_ADDRESS <- STG_EMPLOYEES.EMPLOYEE_ADDRESS
EMPLOYEE_CITY <- STG_EMPLOYEES.EMPLOYEE_CITY
EMPLOYEE_STATE <- STG_EMPLOYEES.EMPLOYEE_STATE
EMPLOYEE_ZIP_CODE <- STG_EMPLOYEES.EMPLOYEE_ZIP_CODE
EMPLOYEE_COUNTRY <- STG_EMPLOYEES.EMPLOYEE_COUNTRY
EMPLOYEE_PHONE_NMBR <- STG_EMPLOYEES.EMPLOYEE_PHONE_NMBR
EMPLOYEE_FAX_NMBR <- STG_EMPLOYEES.EMPLOYEE_FAX_NMBR
EMPLOYEE_EMAIL <- STG_EMPLOYEES.EMPLOYEE_EMAIL
EMPLOYEE_GENDER <- STG_EMPLOYEES.EMPLOYEE_GENDER
AGE_GROUP <- STG_EMPLOYEES.AGE_GROUP
NATIVE_LANG_DESC <- STG_EMPLOYEES.NATIVE_LANG_DESC
SEC_LANG_DESC <- STG_EMPLOYEES.SEC_LANG_DESC
TER_LANG_DESC <- STG_EMPLOYEES.TER_LANG_DESC
POSITION_TYPE <- STG_EMPLOYEES.POSITION_TYPE
DEALERSHIP_ID <- STG_EMPLOYEES.DEALERSHIP_ID
REGIONAL_MANAGER <- STG_EMPLOYEES.REGIONAL_MANAGER
DEALERSHIP_MANAGER <- STG_EMPLOYEES.DEALERSHIP_MANAGER
INSERT_DK <- DIM_DATES.DATE_KEY (Lookup to the DIM_DATES table matching the date entered column from STG_EMPLOYEES to the date value column in DIM_DATES)
UPDATE_DK <- DIM_DATES.DATE_KEY (Lookup to the DIM_DATES table matching the date entered column from STG_EMPLOYEES to the date value column in DIM_DATES)
Instructions

Step 1: Copy the Mapping

1. Launch the Designer and open your assigned folder.
2. Copy the m_DIM_EMPLOYEES_LOAD partial mapping from the DEV_SHARED folder to your student folder and rename it to m_DIM_EMPLOYEES_LOAD_xx.
3. Click Yes when the Target Dependencies dialog box comes up.
Figure 11-1. Mapping copy Target Dependencies dialog box
4. Save your work.

Step 2: Edit the Expression Transformation

1. Open the mapping m_DIM_EMPLOYEES_LOAD_xx. Your mapping should appear similar to Figure 11-2.
Figure 11-2. Iconic view of the m_DIM_EMPLOYEES_MAPPING
2. Edit the exp_NULL_EMPLOYEE_ID Expression transformation and add a Default value of 99999 to the EMPLOYEE_ID port.
3. Click the validate button to validate the default entry, and then click OK.
4. Save your work.

Step 3: Create a Router Transformation

The Router transformation will be used to determine which rows will be inserted, updated, or sent to the error file. This is done by checking the value of the EMPLOYEE_ID port.

1. Add a Router transformation to the mapping.
2. Drag both ports from lkp_DIM_EMPLOYEES_EMPLOYEE_ID into the Router.
3. Drag all ports except EMPLOYEE_ID from exp_NULL_EMPLOYEE_ID to the Router.
4. Edit the Router transformation.
   a. Rename the Router to rtr_DIM_EMPLOYEES.
   b. In the Groups tab, add 3 new groups using the Add new group icon:
      i. Name the first group INSERTS.
      ii. Add the Group filter condition: ISNULL(EMPLOYEE_ID) AND IN_EMPLOYEE_ID != 99999
      iii. Name the second group UPDATES.
      iv. Add the Group filter condition: NOT ISNULL(EMPLOYEE_ID) AND IN_EMPLOYEE_ID != 99999
      v. Name the third group ERRORS.
      vi. Add the Group filter condition: IN_EMPLOYEE_ID = 99999
The Router should look similar to Figure 11-3.
Figure 11-3. Router Groups
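The three group filter conditions can be restated as Python predicates to make their intent clear. This is illustrative only: EMPLOYEE_ID comes from the lookup (null means the row is not yet in DIM_EMPLOYEES), while IN_EMPLOYEE_ID comes from the Expression, where NULLs were replaced with the 99999 default:

```python
# Illustrative restatement of the three Router group filters.
def inserts(employee_id, in_employee_id):
    # ISNULL(EMPLOYEE_ID) AND IN_EMPLOYEE_ID != 99999: valid row, not in DIM yet
    return employee_id is None and in_employee_id != 99999

def updates(employee_id, in_employee_id):
    # NOT ISNULL(EMPLOYEE_ID) AND IN_EMPLOYEE_ID != 99999: valid row, already in DIM
    return employee_id is not None and in_employee_id != 99999

def errors(employee_id, in_employee_id):
    # IN_EMPLOYEE_ID = 99999: the source EMPLOYEE_ID was NULL
    return in_employee_id == 99999
```

Note that every row satisfies exactly one of the three conditions, so no row falls through to the DEFAULT group.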
Step 4: Create an Update Strategy for INSERTS

1. Add an Update Strategy transformation named upd_INSERTS to the mapping.
2. In the Router, scroll down to the INSERTS Router group and drag all ports, except EMPLOYEE_ID1 and HIRE_DATE1, to the upd_INSERTS Update Strategy transformation.
3. Edit the upd_INSERTS Update Strategy transformation:
   a. Rename the IN_EMPLOYEE_ID1 port to EMPLOYEE_ID.
   b. In the Properties tab, select the Update Strategy Expression Value box. Delete the 0 and enter DD_INSERT. See Figure 11-4.
Figure 11-4. Update Strategy set to INSERT
Step 5: Create Lookup to DIM_DATES

1. Create a Lookup transformation named lkp_DIM_DATES_INSERTS that references the SC_DIM_DATES target table.
2. Pass DATE_ENTERED1 from upd_INSERTS to lkp_DIM_DATES_INSERTS.
3. Edit the lkp_DIM_DATES_INSERTS Lookup transformation:
   a. Uncheck the Output checkmarks on all ports except DATE_KEY.
   b. Rename the DATE_ENTERED1 port to IN_DATE_ENTERED.
   c. Create the condition DATE_VALUE = IN_DATE_ENTERED. Ensure that you use DATE_VALUE and not DATE_KEY.
   d. In the Properties tab, set the following values:
      ♦ Lookup cache persistent = Checked (needs to be set)
      ♦ Cache File Name Prefix = LKPSTUxx (where xx is your student number)
Step 6: Link upd_INSERTS and lkp_DIM_DATES_INSERTS to Target DIM_EMPLOYEES_INSERTS

1. Link the DATE_KEY port from lkp_DIM_DATES_INSERTS to the INSERT_DK column in the DIM_EMPLOYEES_INSERTS target.
2. Right-click anywhere in the workspace and select Autolink…
3. Select upd_INSERTS from the From Transformation drop-down box and DIM_EMPLOYEES_INSERTS from the To Transformation box. Click the More button and enter a '1' for From Transformation Suffix.
4. Click OK.
5. Iconize the upd_INSERTS, lkp_DIM_DATES_INSERTS and DIM_EMPLOYEES_INSERTS transformations.
6. Save your work.
Step 7: Create an Update Strategy for UPDATES

1. Create an Update Strategy transformation named upd_UPDATES.
2. In the Router, scroll down to the UPDATES Router group and drag all ports, except IN_EMPLOYEE_ID3 and HIRE_DATE3, to the upd_UPDATES Update Strategy transformation.
3. Edit the upd_UPDATES Update Strategy transformation.
4. In the Properties tab, select the Update Strategy Expression Value box. Delete the 0 and enter DD_UPDATE.

Step 8: Create Second Lookup to DIM_DATES

1. Right-click on the existing lkp_DIM_DATES_INSERTS Lookup transformation and select Copy.
2. Move the cursor to the workspace, right-click, and select Paste.
3. Link DATE_ENTERED3 from upd_UPDATES to IN_DATE_ENTERED in the new Lookup transformation.
4. Edit the new Lookup transformation:
   a. Rename the new Lookup lkp_DIM_DATES_UPDATES.
   b. Ensure the Lookup condition is: DATE_VALUE = IN_DATE_ENTERED.
Step 9: Link upd_UPDATES and lkp_DIM_DATES_UPDATES to Target DIM_EMPLOYEES_UPDATES

1. From lkp_DIM_DATES_UPDATES, link DATE_KEY to UPDATE_DK in DIM_EMPLOYEES_UPDATES.
2. Right-click anywhere in the workspace and select Autolink.
3. Select upd_UPDATES from the From Transformation drop-down box and DIM_EMPLOYEES_UPDATES from the To Transformation box. Click the More button and enter a '3' for From Transformation Suffix.
4. Click OK.
5. Iconize the upd_UPDATES, lkp_DIM_DATES_UPDATES and DIM_EMPLOYEES_UPDATES transformations.
6. Save your work.

Step 10: Link ERRORS Router Group to DIM_EMPLOYEES_ERR Using Autolink…

1. Select the ERRORS group of rtr_DIM_EMPLOYEES from the From Transformation drop-down box and DIM_EMPLOYEES_ERR from the To Transformation box. Click the More>> button and enter a '4' for From Transformation Suffix.
2. Click OK.
3. Delete the link for EMPLOYEE_ID4 and link IN_EMPLOYEE_ID4 instead.
4. Save your work and ensure the mapping is VALID.
5. Arrange All Iconic; the mapping should look similar to Figure 11-5.
Figure 11-5. Iconic view of the completed mapping
Step 11: Create and Run the Workflow

First, run a pre-created workflow that loads three dimension tables.

1. Launch the Workflow Manager and sign into your assigned folder.
2. Locate and run the wkf_U11_Preload_DIM_PAYMENT_DEALERSHIP_PRODUCT_xx workflow. Make sure that it completes successfully and that all rows load successfully.
3. Create a workflow named wkf_DIM_EMPLOYEES_LOAD_xx.
4. Add a new Session task named s_m_DIM_EMPLOYEES_LOAD_xx, using the m_DIM_EMPLOYEES_LOAD_xx mapping.
5. Link the Start task to the new Session task.
6. Edit the Session task Mapping tab:
   a. Select the Connections node in the navigation window.
      i. Change all DB Connection values that relate to the target tables (DIM) to NATIVE_EDWxx.
      ii. Change all DB Connection values that relate to the source tables (STG) to NATIVE_STGxx.
      iii. Change the $Target connection value to NATIVE_EDWxx as well. (This takes care of the three lookup tables pointing to $Target.)
   b. In the Mapping tab navigator window:
      i. Click on SQ_STG_EMPLOYEES.
      ii. Scroll down in the Properties section window to the Source Filter attribute.
      iii. Add the Source Filter condition: DATE_ENTERED = '01/02/2003'
Figure 11-6. Source Filter Value
Tip: It is sometimes easier to add a quick Source Filter in the Session than to go back and modify the mapping, save it, refresh the session, save it, and then run the workflow. SQL overrides will override any entries in the mapping until the override is deleted. If using 'shortcuts', make sure the prefix to the table is deleted before saving the filter.
   c. Click on the target DIM_EMPLOYEES_ERR.
      i. Under the Writers section, change Relational Writer to File Writer. The error handling specifications want error rows written to a file, not a table.
      ii. In the Properties attributes, rename the Output filename to include your student number.
Figure 11-7. Writers section of Target schema
Tip: To create a flat file as a target instead of the original table, simply change the Writers type from Relational to File. A fixed-width flat file based on the format of the target definition is created automatically. The properties of this file can also be altered by the user.
7. Save your work and start the workflow.
8. Review the Task Details and Source/Target statistics. They should be the same as displayed in Figure 11-8 and Figure 11-9.
Figure 11-8. Task Details of the completed session run
Figure 11-9. Source/Target Statistics
Data Results

Preview the DIM_EMPLOYEES target data from the Designer. Your data should appear similar to Figure 11-10.
Figure 11-10. Data Results for DIM_EMPLOYEES
Scroll all the way to the right to confirm that the INSERT_DK column was updated and not the UPDATE_DK column. You may also want to review the three rows that were written to the error file. See the instructor for the location of the files. If the Integration Service process runs on UNIX, you may need special permission from your administrator to see the files.
Figure 11-11. Data Results for the Error Flat File (Located on the Machine Hosting the Integration Service Process)
Step 12: Prepare, Run, and Monitor the Second Run

1. Edit the s_m_DIM_EMPLOYEES_LOAD_xx session task.
2. In the Mapping tab, click SQ_STG_EMPLOYEES in the Navigation window.
3. Scroll down the Properties section and edit the Source Filter to reflect day-two loading: 01/03/2003.
4. Save and run the workflow.
5. Review the Task Details and Source/Target statistics.
They should be the same as displayed in Figure 11-12 and Figure 11-13.
Figure 11-12. Task Details tab results for second run
Figure 11-13. Source/Target Statistics for second run
Preview the DIM_EMPLOYEES target data from the Designer. Scroll to the far right of the data screen and notice that there are now entries for UPDATE_DK and new entries at the bottom of the list for INSERT_DK.
Figure 11-14. Data preview showing updates to the target table
Unit 12: Dynamic Lookup and Error Logging

In this unit you will learn about:
♦ Dynamic lookup cache
♦ Error logging

Lesson 12-1. Dynamic Lookup Cache

Type: Passive.

Description

A basic Lookup transformation allows the inclusion of additional information in the transformation process from an external database or flat file source. However, when the lookup table is also the target, row data may go out of sync with the target table image loaded in memory. The dynamic lookup cache allows the target lookup table image in memory to stay synchronized with its physical table in the database.

Business Purpose

In a data warehouse, dimension tables are frequently updated, and changes to new row data must be captured within a load cycle.
Example

A business updates its customer master table on a daily basis. Within a day, a customer may change their status or correct an error in their information. A new customer record may be added in the morning and a change to that record may be added later in the day; the change (an insert followed by an update) needs to be detected dynamically. The following data is an example of two new records followed by two changed records within the day. The record for David Mulberry shows a change in the zip code from 02061 to 02065. The record for Silvia Williamson shows a change in marital status from 'S' to 'M'.
The following mapping uses the Lookup transformation's Dynamic Lookup Cache option to capture the changes:
Dynamic Cache Properties

For more detailed explanations, consult the online help.

Dynamic Lookup Cache (Lookup Type: Relational) - Indicates to use a dynamic lookup cache. Inserts or updates rows in the lookup cache as it passes rows to the target table. Use only with the lookup cache enabled.

Output Old Value On Update (Lookup Type: Relational) - When you enable this property, the Integration Service outputs old values out of the lookup/output ports. When the Integration Service updates a row in the cache, it outputs the value that existed in the lookup cache before it updated the row based on the input data. When the Integration Service inserts a new row in the cache, it outputs null values.

Insert Else Update (Lookup Type: Relational) - Applies to rows entering the Lookup transformation with the row type of insert. When you select this property and the row type entering the Lookup transformation is insert, the Integration Service inserts the row into the cache if it is new, and updates the row if it exists. If you do not select this property, the Integration Service only inserts new rows into the cache when the row type entering the Lookup transformation is insert.

Update Else Insert (Lookup Type: Relational) - Applies to rows entering the Lookup transformation with the row type of update. When you select this property and the row type entering the Lookup transformation is update, the Integration Service updates the row in the cache if it exists, and inserts the row if it is new. If you do not select this property, the Integration Service only updates existing rows in the cache when the row type entering the Lookup transformation is update.
New Lookup Row values:
0 - The Integration Service does not update or insert the row in the cache.
1 - The Integration Service inserts the row into the cache.
2 - The Integration Service updates the row in the cache.
Key Port Points ♦
The Lookup transformation “Associated Port” matches a Lookup input port with the corresponding port in the Lookup cache.
♦
The “Ignore Null Inputs for Updates” should be checked for ports where null data in the input stream may overwrite the corresponding field in the Lookup cache.
♦
The “Ignore in Comparison” should be checked for any port that is not to be compared.
♦
The flag “New Lookup Row” indicates the type of row manipulation of the cache. If an input row creates an insert in the Lookup cache the flag is set to “1”. If an input row creates an update of the lookup cache the flag is set to “2”. If no change is detected the flag is set to “0”. A Filter or Router transformation can be used with an Update Strategy transformation to set the proper row tag to update a target table.
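The cache-flag behavior above can be sketched in Python. This is an illustrative model, not PowerCenter code: `process_row` and the dict-based cache are invented names, and Insert Else Update behavior is assumed.

```python
# Illustrative model of a dynamic lookup cache (assumed Insert Else Update
# behavior). Returns the NewLookupRow flag: 0 = no change, 1 = insert,
# 2 = update.

def process_row(cache, key, row):
    if key not in cache:
        cache[key] = dict(row)   # new key: insert into the cache -> 1
        return 1
    if cache[key] != row:
        cache[key] = dict(row)   # existing key, changed data: update -> 2
        return 2
    return 0                     # existing key, identical data: no change

cache = {}
flags = [process_row(cache, r["CUST_ID"], r) for r in [
    {"CUST_ID": 10, "CUST_NAME": "Ann"},
    {"CUST_ID": 10, "CUST_NAME": "Ann"},   # duplicate row in the same run
    {"CUST_ID": 10, "CUST_NAME": "Anne"},  # same key, changed data
]]
# flags is now [1, 0, 2]
```

Downstream, a Filter can drop the flag-0 rows and an Update Strategy can map flags 1 and 2 to insert and update row types, as the lab in this unit does.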
Performance Considerations
A large lookup table may require more memory than is available. An SQL override in the Lookup transformation can be used to restrict the rows cached and so reduce the amount of memory used by the Lookup cache.
Lesson 12-2. Error Logging
PowerCenter recognizes the following types of errors:
♦ Transformation error. An error occurs within a transformation. The data row has passed only partway through the mapping transformation logic.
♦ Data reject. The data row is fully transformed according to the mapping logic but cannot be written to the target due to a data issue, for example:
   ♦ Target database constraint violations, out-of-space errors, log space errors, or null values not accepted
   ♦ The target table property 'reject truncated/overflowed rows'
A data reject can also be forced by an Update Strategy transformation. These error types are recorded as follows:

Error Type | Logging OFF (Default) | Logging ON
Transformation errors | All errors written to the session log, then the row is discarded | Fatal errors written to the session log; all errors appended to a flat file or relational tables
Data rejects | Appended to the reject (.bad) file configured for the session target | Written to the row error tables or file
Error logging is set in the Session task:
Error Log Types
Affects location attributes. Values are:
♦ None (no external error logging)
♦ Relational Database - produces 4 tables:
   ♦ PMERR_SESS: Session metadata, e.g. workflow name, session name, repository name
   ♦ PMERR_MSG: Error messages for a row of data
   ♦ PMERR_TRANS: Transformation metadata, e.g. transformation group name, source name, port names with datatypes
   ♦ PMERR_DATA: Error row and source row data in string format, e.g. [indicator1: data1 | indicator2: data2]
♦ Flat File - produces one file containing session metadata followed by de-normalized error information in the following format:
   Transformation || Transformation Mapplet Name || Transformation Group || Partition Index || Transformation Row ID || Error Sequence || Error Timestamp || Error UTC Time || Error Code || Error Message || Error Type || Transformation Data || Source Mapplet Name || Source Name || Source Row ID || Source Row Type || Source Data
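For illustration, a de-normalized line in the flat file format above could be split into named fields as follows. The parsing code and the sample values are hypothetical; only the field names follow the documented layout.

```python
# Split a "||"-delimited flat file error log line into named fields.
# The field list follows the documented flat file layout.

FIELDS = [
    "Transformation", "Transformation Mapplet Name", "Transformation Group",
    "Partition Index", "Transformation Row ID", "Error Sequence",
    "Error Timestamp", "Error UTC Time", "Error Code", "Error Message",
    "Error Type", "Transformation Data", "Source Mapplet Name",
    "Source Name", "Source Row ID", "Source Row Type", "Source Data",
]

def parse_error_line(line):
    values = [v.strip() for v in line.split("||")]
    return dict(zip(FIELDS, values))

# Dummy line with positional markers standing in for real log values.
sample = "||".join(str(i) for i in range(len(FIELDS)))
record = parse_error_line(sample)
```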
Log Row Data
Log Source Row Data
Source row logging does not work downstream of active transformations (where output rows are not uniquely correlated with input rows).
Unit 12 Lab: Load Customer Dimension Table

Business Purpose
The Mersche Motors data warehouse has a customer table that is loaded daily. Many customers visit dealership locations more than once a day, so the warehouse logic has to be able to track multiple visits on the same day. The logic must test whether a customer record has already been loaded in the current run and, if so, determine what, if anything, has changed about the customer.

Technical Description
PowerCenter will source from the staging table STG_CUSTOMERS and load the dimension table DIM_CUSTOMERS. Customer data may occur more than once in the source, so the data must be tested for new rows, existing rows, and invalid rows. A dynamic Lookup is needed because a customer row could occur more than once in the source. Some rows will have null data, so flat file error logging will be used to capture them.

Objectives
♦ Introduce dynamic Lookups.
♦ Reinforce the Update Strategy.
♦ Introduce error logging.

Duration
50 minutes
Unit 12 Lab: Load Customer Dimension Table Informatica PowerCenter 8 Level I Developer
Velocity Deliverable: Mapping Specifications

Mapping Name: m_DIM_CUSTOMERS_DYN_DAILY_LOAD_xx
Source System: Oracle Table
Target System: Oracle Table
Initial Rows: 6177
Rows/Load: 6147
Short Description: Customer data will be loaded into the customer dimension table.
Load Frequency: Daily
Preprocessing:
Post Processing:
Error Strategy: Relational table error logging
Reload Strategy:
Unique Source Fields: CUST_ID

SOURCES

Table Name | Schema/Owner | Selection/Filter
STG_CUSTOMERS | TDBUxx |

TARGETS

Table Name | Schema Owner | Update | Delete | Insert | Unique Key
DIM_CUSTOMERS | TDBUxx | X | X | X | CUST_ID

HIGH LEVEL PROCESS OVERVIEW

Relational Source -> Lookup -> Filter -> Update Strategy -> Relational Target

PROCESSING DESCRIPTION (DETAIL)
The source staging table contains customer data that needs to be tested dynamically against the target dimension table in order to process possible duplicate customers. Based on the test results, each record will be marked for insert if it is new, for update if it requires an update, or for reject if it is determined to be invalid. Invalid rows will be rejected in the Update Strategy and sent to an error logging file.
SOURCE TO TARGET FIELD MATRIX

Target Table | Target Column | Source Table | Source Column
DIM_CUSTOMER | CUST_ID | STG_CUSTOMER | CUST_ID
DIM_CUSTOMER | CUST_NAME | STG_CUSTOMER | CUST_NAME
DIM_CUSTOMER | CUST_ADDRESS | STG_CUSTOMER | CUST_ADDRESS
DIM_CUSTOMER | CUST_CITY | STG_CUSTOMER | CUST_CITY
DIM_CUSTOMER | CUST_STATE | STG_CUSTOMER | CUST_STATE
DIM_CUSTOMER | CUST_ZIP_CODE | STG_CUSTOMER | CUST_ZIP_CODE
DIM_CUSTOMER | CUST_COUNTRY | STG_CUSTOMER | CUST_COUNTRY
DIM_CUSTOMER | CUST_PHONE_NMBR | STG_CUSTOMER | CUST_PHONE_NMBR
DIM_CUSTOMER | CUST_GENDER | STG_CUSTOMER | CUST_GENDER
DIM_CUSTOMER | CUST_AGE_GROUP | STG_CUSTOMER | CUST_AGE_GROUP
DIM_CUSTOMER | CUST_INCOME | STG_CUSTOMER | CUST_INCOME
DIM_CUSTOMER | CUST_E_MAIL | STG_CUSTOMER | CUST_E_MAIL
DIM_CUSTOMER | CUST_AGE | STG_CUSTOMER | CUST_AGE

The Expression and Default Value if Null columns are empty for all rows.
Instructions

Step 1: Create a Relational Source Definition
1. Launch the Designer and sign into your assigned folder.
2. Verify you are in the Source Analyzer tool and create a shortcut to the STG_CUSTOMERS source table found in the DEV_SHARED folder.
3. Rename it to SC_STG_CUSTOMERS.

Step 2: Create a Relational Target Definition
1. Open the Target Designer tool.
2. Create a shortcut to the DIM_CUSTOMERS target table found in the DEV_SHARED folder.
3. Rename it to SC_DIM_CUSTOMERS.

Step 3: Create a Mapping
1. Open the Mapping Designer tool.
2. Create a new mapping named m_DIM_CUSTOMERS_DYN_DAILY_LOAD_xx.
3. Add the SC_STG_CUSTOMERS relational source to the new mapping.
4. Add the SC_DIM_CUSTOMERS relational target to the new mapping.
5. Save your work.
Step 4: Create a Lookup Transformation
1. Create a new Lookup transformation using the SC_DIM_CUSTOMERS table.
2. Drag the border of the Lookup transformation window to make it taller.
3. Select all the ports from SQ_SC_STG_CUSTOMERS and drop them onto an empty port at the bottom of the Lookup.
4. Edit the Lookup transformation.
   a. Rename it lkp_DIM_CUSTOMERS.
   b. Select the Properties tab.
      i. Check the Dynamic Lookup Cache value.
      ii. Check the Insert Else Update value.
   c. Select the Ports tab; prefix all ports from SQ_SC_STG_CUSTOMERS with IN_ and remove the "1" from the end of each name.
   d. Select the Condition tab and create the condition CUST_ID = IN_CUST_ID.
   e. Select the Ports tab again; it should look the same as Figure 12-1. Notice the new port entry called NewLookupRow.
   Figure 12-1. Port tab view of a dynamic Lookup
Note: Dynamic lookups allow inserts and updates to take place in the cache as the same operations take place against the target table.
Note: The Associated Port column allows the association of input ports with lookup ports of different names. This enables PowerCenter to update the Lookup cache with correct values.
Note: NewLookupRow stores one of three values: 0 = no change, 1 = insert, 2 = update.
   f. Under the Associated Port column, click the box where it says "N/A" and select the port name from the list that you want to associate. See Figure 12-2.
   Figure 12-2. Port to Port Association
   g. Associate the remaining ports.
   h. Clear the Output checkmarks for all of the ports prefixed with "IN_".
5. Click OK and save your work.

Step 5: Create a Filter Transformation
1. Create a Filter transformation named fil_ROWS_UNCHANGED.
2. Drag all output ports from the Lookup transformation to the Filter transformation.
3. Create a condition that passes all rows marked for update or insert, plus all rows where CUST_ID is NULL. Any row where NewLookupRow != 0 is deemed an insert or an update. If you need assistance, refer to the reference section at the end of the lab.
Step 6: Create an Update Strategy
1. Create an Update Strategy transformation named upd_DIM_CUSTOMERS.
2. Drag all ports from the Filter transformation to the Update Strategy transformation.
3. Edit the upd_DIM_CUSTOMERS Update Strategy transformation.
   a. Add an Update Strategy Expression that marks the row as an insert, update, or reject. Use the following pseudocode to construct your expression. If you need assistance, refer to the reference section at the end of the lab.
      If CUST_ID is NULL then reject the row
      Else if NewLookupRow equals 1 then mark the row for insert
      Else if NewLookupRow equals 2 then mark the row for update
   b. Ensure the Forward Rejected Rows option is checked. This sends any rejected rows to the error logs, which will be created later.
   Tip: Refer to the Unit 11 lab for details on the Update Strategy transformation.
4. Autolink ports by name to the SC_DIM_CUSTOMERS target.
5. Save your work.
Figure 12-3. Iconic View of the Completed Mapping
Step 7: Create and Run the Workflow
1. Launch the Workflow Manager and sign into your assigned folder.
2. Create a new workflow named wkf_DIM_CUSTOMERS_DYN_DAILY_LOAD_xx.
3. Add a new Session task using the m_DIM_CUSTOMERS_DYN_DAILY_LOAD_xx mapping.
4. Edit the s_m_DIM_CUSTOMERS_DYN_DAILY_LOAD_xx Session task.
   a. Set the connection value for the SQ_STG_CUSTOMERS source to your assigned NATIVE_STGxx connection object.
   b. Set the connection value for the SC_DIM_CUSTOMERS target to your assigned NATIVE_EDWxx connection object.
   c. In the Config Object tab:
      i. In the Error Handling section, change the Error Log Type entry from None to Flat File as shown in Figure 12-4.
      ii. Change the Error Log File Name to PMErrorxx.log, where xx refers to your student number.
   Figure 12-4. Error Log Choice Screen
5. Save your work and start the workflow.
Note: In a production environment, error logging tables or files would be created in a different schema or location than the production schema or file location.
6. Review the Task Details; your information should appear similar to Figure 12-5.
Figure 12-5. Task Details of the Completed Session Run
7. Select the Source/Target Statistics tab. Your statistics should be the same as displayed in Figure 12-6.
Figure 12-6. Source/Target Statistics for the Session Run
Data Results
Preview the target data from the Designer; your data should appear the same as displayed in Figure 12-7.
Figure 12-7. Data preview of the DIM_CUSTOMERS table

Error Log Results
The error log is written to the BadFiles directory configured for the Integration Service process, under the default name PMErrorxx.log. Look in this location for the error log and examine the rows that were written there. The log should appear similar to Figure 12-8.
Figure 12-8. Flat file error log

Reference
1. fil_ROWS_UNCHANGED condition:
   NewLookupRow != 0 OR ISNULL(CUST_ID)
2. upd_DIM_CUSTOMERS expression:
   IIF(ISNULL(CUST_ID), DD_REJECT, IIF(NewLookupRow = 1, DD_INSERT, IIF(NewLookupRow = 2, DD_UPDATE)))
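As a cross-check, the two reference expressions can be modeled in Python. The DD_* numeric codes match PowerCenter's row-type constants, but the row dicts and function names are invented for this sketch.

```python
# Python model of the fil_ROWS_UNCHANGED and upd_DIM_CUSTOMERS logic.
DD_INSERT, DD_UPDATE, DD_DELETE, DD_REJECT = 0, 1, 2, 3

def passes_filter(row):
    # NewLookupRow != 0 OR ISNULL(CUST_ID)
    return row["NewLookupRow"] != 0 or row["CUST_ID"] is None

def update_strategy(row):
    # IIF(ISNULL(CUST_ID), DD_REJECT, IIF(NewLookupRow = 1, DD_INSERT, ...))
    if row["CUST_ID"] is None:
        return DD_REJECT
    if row["NewLookupRow"] == 1:
        return DD_INSERT
    return DD_UPDATE   # NewLookupRow = 2; flag-0 rows were filtered out

rows = [
    {"CUST_ID": 1, "NewLookupRow": 1},     # new customer -> insert
    {"CUST_ID": 2, "NewLookupRow": 0},     # unchanged -> filtered out
    {"CUST_ID": None, "NewLookupRow": 0},  # invalid -> reject
]
tagged = [update_strategy(r) for r in rows if passes_filter(r)]
# tagged is [DD_INSERT, DD_REJECT]
```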
Unit 13: Unconnected Lookup, Parameters and Variables

In this unit you will learn about:
♦ Unconnected Lookup transformation
♦ System variables
♦ Mapping parameters and variables
Lesson 13-1. Unconnected Lookup Transformations
Unit 13: Unconnected Lookup, Parameters and Variables Informatica PowerCenter 8 Level I Developer
Type
Passive.

Description
The unconnected Lookup transformation brings additional information from an external database or flat file source into the transformation process. It is not part of the data flow; instead, it is called from within any transformation that supports expressions.
Business Purpose
A source table or file may have a percentage of records with incomplete data. The holes in the data can be filled by performing a lookup against one or more other tables. Because only a percentage of the rows are affected, it is better to perform the lookup on only those rows that need it, not the entire data set.

Example
In the following example, an insurance business receives records of policy renewals; a small percentage of the records are missing the CUSTOMER_ID field. The following mapping uses an unconnected Lookup transformation to fill in the missing data.
Key Points
♦ Use the lookup function within a conditional statement.
♦ The condition is evaluated for each row, but the lookup function is called only if the condition evaluates to TRUE.
♦ The unconnected Lookup transformation is called using the expression syntax :LKP.lookup_name.
♦ Data from several input ports may be passed to the Lookup transformation, but only one port may be returned.
♦ An unconnected Lookup transformation returns exactly one value, designated by the Lookup transformation's R (return) port.
♦ If the R port is not checked, the mapping will be valid but the session created from the mapping will fail at run time.
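The conditional-call pattern in these key points can be sketched as follows; the dict stands in for the cached lookup table, and all names and values are illustrative.

```python
# Only rows missing CUSTOMER_ID pay the cost of the lookup, mirroring
# IIF(ISNULL(CUSTOMER_ID), :LKP.lkp_name(POLICY_ID), CUSTOMER_ID).

CUSTOMER_LOOKUP = {"POL-100": 501, "POL-200": 502}  # policy -> customer id

def lkp_customer(policy_id):
    """Unconnected-lookup stand-in: inputs in, exactly one value back."""
    return CUSTOMER_LOOKUP.get(policy_id)

def fill_customer_id(row):
    if row["CUSTOMER_ID"] is None:            # condition TRUE: call lookup
        return lkp_customer(row["POLICY_ID"])
    return row["CUSTOMER_ID"]                 # condition FALSE: no lookup call
```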
Performance Considerations
Enabling the Lookup cache attribute can improve performance if the lookup table is static.
Connected versus Unconnected Lookup Transformations
Joins versus Lookups
Lesson 13-2. System Variables
Description
System variables hold information that is derived from the system. The user cannot control the content of a system variable but can use the information it contains. Three variables that we will discuss are described in the table below.

Variable | Description
SESSSTARTTIME | The time the session starts execution, based on the clock of the Integration Service machine.
SYSDATE | The current date/time on the system where PowerCenter is running.
$$$SESSSTARTTIME | The session start time returned as a string.

Business Purpose
The main reason system variables are used when building mappings in PowerCenter is that they provide consistency to program execution. Business and systems professionals will find this very useful when building systems.

Example
Setting a port to the system date. To set the value of a port to the system date, the developer adds an expression within a transformation. For this example we set the DATE_UPDATED port to the system date.
Port: DATE_UPDATED
Datatype: Date
Expression: SYSDATE
Lesson 13-3. Mapping Parameters and Variables
Description
A mapping can use parameters and variables to store information during execution. Each parameter and variable is defined with a specific datatype. Parameters differ from variables in that the value of a parameter is fixed during the run of the mapping, while the value of a variable can change. Both parameters and variables can be accessed from anywhere in the mapping. To create a parameter or variable, select Mappings > Parameters and Variables from within the Mapping Designer in the Designer client.
Scope
Parameters and variables can be used only inside the object in which they are created. For instance, a mapping variable created for mapping_1 can be seen and used only in mapping_1; it is not available in any other mapping or mapplet. As a general rule in Informatica, a variable's scope is the object in which it was created.
User-defined variable and parameter names must always begin with $$, for example $$PARAMETER_NAME or $$VARIABLE_NAME.

To change the value of a variable, you must use one of the following functions within an expression:

Function Name | Usage Notes | Example
SetVariable | Sets the variable to a value that you specify (executes only if a row is marked as insert or update). At the end of a successful session, the Integration Service saves either the MAX or MIN of (start value, final value) to the repository, depending on the aggregation type of the variable. Unless overridden, it uses the saved value as the start value of the variable for the next session run. | SetVariable($$VAR_NAME, 1)
SetCountVariable | Increments a counter variable: +1 if the row type is Insert, -1 if the row type is Delete, 0 for Update and Reject. | SetCountVariable($$COUNT_VAR)
SetMaxVariable | Compares the current value to the value passed into the function. Returns the higher value and sets the current value to the higher value. | SetMaxVariable($$MAX_VAR, 10)
SetMinVariable | Compares the current value to the value passed into the function. Returns the lower value and sets the current value to the lower value. | SetMinVariable($$MIN_VAR, 10)
At the end of a successful session, the values of variables are saved to the repository. The SetVariable function writes the final value of a variable to the repository based on the Aggregation Type selected when the variable was defined. The final value written to the repository is not necessarily the last value processed by the SetVariable function. The final value written to the repository for a variable that has an Aggregate type of Max will be whichever value is greater, current value or initial value. The final value for a variable with a MIN Aggregation Type will be whichever value is smaller, current value or initial value.
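The end-of-session rule can be stated as a small function; this is a rough model of the persistence behavior, not an Informatica API.

```python
# What the Integration Service saves to the repository at the end of a
# successful session, given the variable's aggregation type.

def session_end_value(start_value, final_value, agg_type):
    if agg_type == "MAX":
        return max(start_value, final_value)  # saved value: the greater
    if agg_type == "MIN":
        return min(start_value, final_value)  # saved value: the smaller
    raise ValueError("unknown aggregation type")
```

Note that with a MAX aggregation type, a final value lower than the start value is not persisted; the saved value can only move in the direction of the aggregation.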
Variable Definition
The Integration Service determines the value of a variable by checking the following sources in order of precedence:
1. Parameter File - This file can hold definitions of variables and parameters.
2. Repository Saved Value - Values for variables that were saved in the repository upon the successful completion of a session.
3. Initial Value - The initial value as defined by the user.
4. Default Value - The default value set by the system.

Parameter Definition
The Integration Service determines the value of a parameter by checking the following sources in order of precedence:
1. Parameter File - This file can hold definitions of variables and parameters.
2. Initial Value - The initial value as defined by the user.
3. Default Value - The default value set by the system.
Purpose
Mapping variables and parameters are used:
♦ To simplify mappings by carrying information within or between transformations.
♦ To improve maintainability by allowing quick changes to values within a mapping.

We will discuss two examples, one using a variable and one using a parameter. The first example uses a variable to implement incremental extracts from relational sources. The second example uses parameters to replace "naked" numbers and strings within expressions.
Example 1: Tracking Last Execution Date
To set up a mapping to perform an incremental extract, we use a variable to track when the mapping was last executed. We then use this variable in the SQL that extracts the data to ensure that we pick up only new and modified records. The variable is updated to the session start time when the mapping completes, so that it can be used in the next run. The SQL WHERE clause is modified in the Source Qualifier transformation. The following statement could become part of a Source Qualifier filter:

F1_LAST_UPDATE_DATE >= '$$LAST_RUN_DT'

Where:
♦ F1_LAST_UPDATE_DATE is a database field that contains the date when the record was last touched.
♦ $$LAST_RUN_DT is a user-created mapping variable that holds the date of the last run. Note that the variable is surrounded by single quotes; the quotes are required for proper SQL syntax.

To set the value of $$LAST_RUN_DT, we use the SetVariable function:
SetVariable($$LAST_RUN_DT, SESSSTARTTIME)
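The run-over-run flow can be sketched in Python. The dates and the `build_filter` helper are hypothetical; a real session reads the saved value from the repository rather than a local variable.

```python
# Incremental-extract sketch: substitute the saved variable into the
# source filter, then advance it to the session start time.

def build_filter(last_run_dt):
    return "F1_LAST_UPDATE_DATE >= '%s'" % last_run_dt

last_run_dt = "2006-04-27 00:00:00"        # value saved by the previous run
where_clause = build_filter(last_run_dt)   # used by this run's extract

sess_start_time = "2006-04-28 06:00:00"    # SESSSTARTTIME for this run
last_run_dt = sess_start_time              # SetVariable($$LAST_RUN_DT, SESSSTARTTIME)
```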
Example 2: Replacement of Naked Numbers
It is not always wise to embed a number or character string directly in an expression, because the support team may not understand its meaning. To help eliminate misunderstandings, use parameters to leave a better record of how the value is derived. Without a parameter, the expression might be:

IIF(ISNULL(SOLD_DT), TO_DATE('1/1/3000','MM/DD/YYYY'))

Someone could misinterpret this statement, possibly thinking it is a mistake. If we use a parameter, there is less chance of a misunderstanding. For instance, the following statement is much clearer:

IIF(ISNULL(SOLD_DT), $$OFFICIAL_DEFAULT_DT)

where $$OFFICIAL_DEFAULT_DT is equal to '1/1/3000'. If all mappings used the same parameter and a common parameter file, it would be easy to ensure that all processes used the same value. This would ensure consistency. Additional examples of ways mappings could use variables or parameters to replace naked numbers and strings are shown in the following table.

Variable/Parameter Usage Examples

Reason/Goal | Potential Value | Param or Var | Name
Replace naked numbers with a number, e.g. an expression that determines whether tech support cases have been open more than 100 days. | 100 | Param | $$MAX_NUM_DAYS_OPEN
Replace naked characters - set the value of the processing center where the session is executed, using a variable that is defined in the mapping and has its value set in a parameter file. | 'US' | Var | $$REG_PROC_LOCATION
Consistency - use parameters to make sure that everyone uses the same value in expressions. Create two parameters that represent yes and no, and have all mappings use the same values via a parameter file. | 'Y', 'N' | Param | $$YES_1_CHAR, $$NO_1_CHAR
Unit 13 Lab: Load Sales Fact Table

Business Purpose
Mersche Motors dealerships sometimes give aggressive discounts that are outside the authorized range. These discounts are a small percentage of the rows being processed, but the information needs to be processed accordingly. Also, even though the source staging table contains 7 days of data, this production run will load the entire staging table into the FACT_SALES table in a single workflow.

Technical Description
The information needed resides in two separate staging tables, and the relationship between the two tables does not exist on the database, so referential integrity will have to be created within PowerCenter. Special formulas are needed to process the out-of-range discounts. Mapping parameters and variables will be used to make this more efficient.

Objectives
♦ Unconnected Lookup transformation
♦ Aggregator transformation
♦ Mapping parameters

Duration
35 minutes
Unit 13 Lab: Load Sales Fact Table Informatica PowerCenter 8 Level I Developer
Velocity Deliverable: Mapping Specifications

Mapping Name: m_FACT_SALES_LOAD_xx
Source System: Oracle Tables
Target System: Oracle Table
Initial Rows: 5475, 5
Rows/Load: 5441
Short Description: Two tables must be joined to get the payment id. This relationship does not exist in the RDBMS, so it will need to be created in PowerCenter.
Load Frequency: Daily
Preprocessing:
Post Processing:
Error Strategy: Default
Reload Strategy: Target Append
Unique Source Fields:

SOURCES

Table Name | Schema/Owner | Selection/Filter
STG_TRANSACTIONS | TDBUxx | Where STG_TRANSACTIONS.PAYMENT_DESC = STG_PAYMENT.PAYMENT_TYPE_DESC
STG_PAYMENT | TDBUxx |

TARGETS

Table Name | Schema Owner | Update | Delete | Insert | Unique Key
FACT_SALES | TDBUxx | X | X | X | CUST_ID, PRODUCT_KEY, DEALERSHIP_ID, PROMO_ID, DATE_KEY

LOOKUPS

Lookup Name: lkp_DIM_DATES
Table: DIM_DATES
Location: TDBUxx
Match Condition(s): STG_TRANSACTIONS.TRANSACTION_DATE = DIM_DATES.DATE_VALUE
Filter/SQL Override:
Reuse persistent cache:

Lookup Name: lkp_DIM_PROMOTIONS
Table: DIM_PROMOTIONS
Location: TDBUxx
Match Condition(s): STG_TRANSACTIONS.PROMO_ID = DIM_PROMOTIONS.PROMO_ID
Lookup Type: Unconnected

Lookup Name: lkp_DIM_PRODUCT
Table: DIM_PRODUCT
Location: TDBUxx
Match Condition(s): STG_TRANSACTIONS.PRODUCT_ID = DIM_PRODUCT.PRODUCT_ID
Filter/SQL Override:

HIGH LEVEL PROCESS OVERVIEW

Source1, Source2 -> Lookups -> Expression -> Aggregator -> Target

PROCESSING DESCRIPTION (DETAIL)
The mapping will join two relational tables that do not have a PK-FK relationship; this relationship will have to be created within PowerCenter. Lookups are required to get a date key and a promotion indicator. An unconnected Lookup will be used where a valid discount value needs to be obtained. A number of values must also be derived before being loaded into the FACT_SALES table.
SOURCE TO TARGET FIELD MATRIX

Target Table | Target Column | Source Table | Source Column | Expression
FACT_SALES | CUST_ID | STG_TRANSACTIONS | CUST_ID |
FACT_SALES | PRODUCT_KEY | DIM_PRODUCT | PRODUCT_KEY | Lookup from STG_TRANSACTIONS to DIM_PRODUCT using PRODUCT_ID as the lookup value.
FACT_SALES | DEALERSHIP_ID | STG_TRANSACTIONS | DEALERSHIP_ID |
FACT_SALES | PAYMENT_ID | STG_PAYMENT | PAYMENT_ID | Source Qualifier join on payment description (STG_TRANSACTIONS/STG_PAYMENT).
FACT_SALES | PROMO_ID | STG_TRANSACTIONS | PROMO_ID |
FACT_SALES | DATE_KEY | DIM_DATES | DATE_KEY | Lookup from STG_TRANSACTIONS to DIM_DATES using TRANSACTION_DATE as the lookup value.
FACT_SALES | UNITS_SOLD | STG_TRANSACTIONS | Derived | Sum of SALES_QTY.
FACT_SALES | REVENUE | STG_TRANSACTIONS | Derived | Sum of ((SELLING_PRICE * SALES_QTY) - DISCOUNT - HOLDBACK - REBATE).
FACT_SALES | COST | STG_TRANSACTIONS | Derived | Sum of (UNIT_COST * SALES_QTY).
FACT_SALES | DISCOUNT | STG_TRANSACTIONS, STG_PROMOTIONS | DISCOUNT/Derived | If the discount is > 17.75, look up a discount rate in the STG_PROMOTIONS table. The discount is the discount rate divided by 100, times the selling price.
FACT_SALES | HOLDBACK | STG_TRANSACTIONS | HOLDBACK |
FACT_SALES | REBATE | STG_TRANSACTIONS | REBATE |

The Default Value if Null column is empty for all rows.
Instructions

Step 1: Create an Internal Relationship Between Two Source Tables
1. Launch the Designer and sign into your assigned folder.
2. Drag the STG_TRANSACTIONS and STG_PAYMENT relational source tables into the Source Analyzer workspace.
3. The PAYMENT_DESC column of STG_TRANSACTIONS and the PAYMENT_TYPE_DESC column of STG_PAYMENT both contain the payment type description, so they are logically related and we can build a join on them. Link the PAYMENT_DESC column from STG_TRANSACTIONS to the PAYMENT_TYPE_DESC column of the STG_PAYMENT table. This creates a PK-FK relationship between the two tables.
Note: Creating the PK-FK relationship within the Source Analyzer does not create the relationship on the actual database tables. The relationship is created on the source definitions within PowerCenter only. Your source definitions should look the same as displayed in Figure 13-1.
Figure 13-1. Source Analyzer view of the STG_TRANSACTIONS and STG_PAYMENT tables

Step 2: Create a Mapping Parameter
1. Open the mapping named m_FACT_SALES_LOAD_xx.
2. Add a mapping parameter by clicking Mappings > Parameters and Variables.
3. On the next screen, click the Add a new variable to this table icon. See Figure 13-2.
Figure 13-2. Declare Parameters and Variables screen
4. Create a new parameter:
♦ Parameter Name = $$MAX_DISCOUNT
♦ Type = Parameter
♦ Datatype = decimal
♦ Precision = 15,2
♦ Initial value = 17.25
Figure 13-3. Parameter entry
5. Click OK.
6. Save your work.
Step 3: Create an Unconnected Lookup
1. Create a Lookup transformation using the SC_DIM_PROMOTIONS relational target table and name it lkp_DIM_PROMOTIONS.
2. Under the Ports tab:
   a. Click on PROMO_ID, click the Copy icon, and then click the Paste icon.
   b. Name the new port IN_PROMO_ID and make it an input-only port.
   c. Make DISCOUNT the Return port.
   d. Uncheck the Output ports for all other ports except PROMO_ID and DISCOUNT.
   The Lookup should look the same as Figure 13-4.
   Figure 13-4. Lookup Ports tab showing input, output and return ports checked/unchecked
3. Create the lookup condition comparing PROMO_ID to IN_PROMO_ID.
4. Click OK and save the repository.
Step 4: Add the Unconnected Lookup Test to an Expression
1. Edit the exp_DISCOUNT_TEST Expression transformation.
2. If the IN_DISCOUNT port has a value greater than the value passed in via the mapping parameter, we need to get an acceptable value from the DIM_PROMOTIONS table. The variable port v_DISCOUNT will hold the return value. Edit the v_DISCOUNT variable port and add the expression:
   IIF(IN_DISCOUNT > $$MAX_DISCOUNT, :LKP.LKP_DIM_PROMOTIONS(PROMO_ID), IN_DISCOUNT)
3. The discount is held as a whole number. We need to change it to a percentage and apply it against the selling price to derive the dollar value of the discount. Edit the output port OUT_DISCOUNT and add the expression:
   v_DISCOUNT / 100 * SELLING_PRICE
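The two expressions above combine into the following Python sketch; the promotion rates and prices are invented, and the dict stands in for the lkp_DIM_PROMOTIONS cache.

```python
# Substitute a valid promotion rate when the incoming discount exceeds
# $$MAX_DISCOUNT, then convert the whole-number rate to a dollar amount.

MAX_DISCOUNT = 17.25                    # mapping parameter $$MAX_DISCOUNT
PROMO_DISCOUNTS = {7: 15.0, 8: 12.5}    # stand-in for the promotions lookup

def discount_amount(in_discount, promo_id, selling_price):
    # v_DISCOUNT: IIF(IN_DISCOUNT > $$MAX_DISCOUNT, :LKP...(PROMO_ID), IN_DISCOUNT)
    rate = PROMO_DISCOUNTS[promo_id] if in_discount > MAX_DISCOUNT else in_discount
    # OUT_DISCOUNT: v_DISCOUNT / 100 * SELLING_PRICE
    return rate / 100 * selling_price
```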
Step 5: Create Aggregator Transformation
1. Create an Aggregator transformation named agg_FACT_SALES.
2. Drag the PRODUCT_KEY port from lkp_DIM_PRODUCT to agg_FACT_SALES.
3. Drag the DATE_KEY port from lkp_DIM_DATES to agg_FACT_SALES.
4. Drag the following ports from the Expression transformation to the Aggregator:
♦ PAYMENT_ID
♦ CUST_ID
♦ DEALERSHIP_ID
♦ PROMO_ID
♦ SELLING_PRICE
♦ UNIT_COST
♦ SALES_QTY
♦ HOLDBACK
♦ REBATE
♦ OUT_DISCOUNT
5. Open the Aggregator and re-order the key ports in the following order: CUST_ID, PRODUCT_KEY, DEALERSHIP_ID, PAYMENT_ID, PROMO_ID, DATE_KEY.
6. Group by the ports shown in Figure 13-5.
Figure 13-5. Aggregator ports with Group By ports checked
7. Uncheck the output ports for SELLING_PRICE, UNIT_COST and SALES_QTY.
8. Rename the following ports:
♦ SELLING_PRICE to IN_SELLING_PRICE
♦ UNIT_COST to IN_UNIT_COST
♦ SALES_QTY to IN_SALES_QTY
♦ OUT_DISCOUNT to DISCOUNT
9. Add the following new ports:
♦ Create a new output port after the DISCOUNT port.
Port Name = OUT_UNITS_SOLD
Datatype = decimal
Precision = 3
Expression = SUM(IN_SALES_QTY)
♦ Create a new output port after the OUT_UNITS_SOLD port.
Port Name = OUT_REVENUE
Datatype = decimal
Precision = 15,2
Expression = SUM((IN_SELLING_PRICE * IN_SALES_QTY) - DISCOUNT - HOLDBACK - REBATE)
♦ Create a new output port after the OUT_REVENUE port.
Port Name = OUT_COST
Datatype = decimal
Precision = 15,2
Expression = SUM(IN_UNIT_COST)
The Aggregator ports should be the same as displayed in Figure 13-6.
Figure 13-6. Finished Aggregator
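The three SUM expressions above amount to a group-by aggregation over the six key ports. A hypothetical Python sketch of what agg_FACT_SALES computes follows; the sample rows, values and keys are invented for illustration:

```python
# Hypothetical sketch of the Aggregator: group rows by the six key ports
# and sum units, revenue and cost. Field names mirror the Aggregator
# ports; the two sample rows are invented.
from collections import defaultdict

rows = [
    {"CUST_ID": 1, "PRODUCT_KEY": 7, "DEALERSHIP_ID": 3, "PAYMENT_ID": 2,
     "PROMO_ID": 101, "DATE_KEY": 20060401, "IN_SELLING_PRICE": 20000.0,
     "IN_UNIT_COST": 15000.0, "IN_SALES_QTY": 1, "DISCOUNT": 2000.0,
     "HOLDBACK": 300.0, "REBATE": 500.0},
    {"CUST_ID": 1, "PRODUCT_KEY": 7, "DEALERSHIP_ID": 3, "PAYMENT_ID": 2,
     "PROMO_ID": 101, "DATE_KEY": 20060401, "IN_SELLING_PRICE": 20000.0,
     "IN_UNIT_COST": 15000.0, "IN_SALES_QTY": 2, "DISCOUNT": 2000.0,
     "HOLDBACK": 300.0, "REBATE": 500.0},
]

groups = defaultdict(lambda: {"OUT_UNITS_SOLD": 0, "OUT_REVENUE": 0.0, "OUT_COST": 0.0})
for r in rows:
    key = (r["CUST_ID"], r["PRODUCT_KEY"], r["DEALERSHIP_ID"],
           r["PAYMENT_ID"], r["PROMO_ID"], r["DATE_KEY"])
    g = groups[key]
    g["OUT_UNITS_SOLD"] += r["IN_SALES_QTY"]                  # SUM(IN_SALES_QTY)
    g["OUT_REVENUE"] += (r["IN_SELLING_PRICE"] * r["IN_SALES_QTY"]
                         - r["DISCOUNT"] - r["HOLDBACK"] - r["REBATE"])
    g["OUT_COST"] += r["IN_UNIT_COST"]                        # SUM(IN_UNIT_COST)
```

Both sample rows share the same six-part key, so they collapse into one output row, just as the Aggregator emits one row per distinct group-by key combination.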
10. Use Autolink by name to link the ports from the agg_FACT_SALES transformation to the SC_FACT_SALES target table. You will need to use the prefix OUT_ to link all of the ports.
The results should appear the same as Figure 13-7.
Figure 13-7. Aggregator to Target Links
11. Save your work.
12. Iconize the mapping.
Figure 13-8. Iconic view of the completed mapping
Step 6: Create and Run the Workflow
1. Launch the Workflow Manager client and sign into your assigned folder.
2. Create a new workflow named wkf_FACT_SALES_LOAD_xx.
3. Add a new Session task named s_m_FACT_SALES_LOAD_xx that uses the m_FACT_SALES_LOAD_xx mapping.
4. Edit the s_m_FACT_SALES_LOAD_xx session.
a. Set the connection value for the sq_STG_TRANSACTIONS_PAYMENT source table to NATIVE_STGxx, where xx is your student number.
b. Set the connection value for the SC_FACT_SALES target table to NATIVE_EDWxx, where xx is your student number.
c. Change the Target load type to Normal.
d. Under the Mapping tab, select the lkp_DIM_DATES transformation and ensure that the Cache File Name Prefix is set to your pre-defined persistent cache (LKPSTUxx).
e. Under the Session Properties tab, set the $Target connection value to NATIVE_EDWxx.
5. Save your work.
6. Start the workflow.
7. Review the Task Details.
Figure 13-9. Task Details of the completed session run
8. Review the Source/Target Statistics. Your statistics should be the same as displayed in Figure 13-10.
Figure 13-10. Source/Target Statistics of the completed session run
Data Results
Preview the target data from the Designer; your data should appear the same as displayed in Figure 13-11.
Figure 13-11. Data Preview of the FACT_SALES target table
Note: Not all rows and columns are shown.
Unit 14: Mapplets
In this unit you will learn about:
♦ Mapplets
Lesson 14-1. Mapplets
Description
Mapplets combine multiple mapping objects for re-usability; they can also simplify the maintenance of complex mappings. A mapplet can receive its input data either from an internal source or from the mapping pipeline that calls the mapplet. A mapplet must pass data out via a Mapplet Output transformation.
Mapplet Input Transformation
Type
Passive or Active.
Description
The Mapplet Input transformation acts as an input to a Mapplet.
Example
In the following example, a business needs to apply discounts to its daily sales data, perform a number of lookups and aggregate the sales values. This functionality is used in several types of feeds, so a Mapplet was created to provide it to many mappings. The Mapplet Input transformation receives the sales transactions by customer; discounts are applied, and then two lookups find the product key and date key. An Aggregator sums the cost and revenue. A Mapplet Output transformation passes the output of the Mapplet back into the mapping that called it.
Mapplet Output Transformation
Type
Passive.
Description
The Mapplet Output transformation acts as an output from a Mapplet.
Example The following example illustrates the Mapplet Output transformation.
The following example illustrates a Mapplet with multiple output groups.
Warning: When the mapplet is expanded at runtime, an unconnected output group can result in a transformation having no output connections. If that is illegal, the mapping will be invalid.
Examples:
♦ If the mapplet outputs are fed by an Expression transformation, the mapping is invalid because an Expression requires a connected output.
♦ If the mapplet outputs are fed by a Router, the mapping is valid because a Router can have unconnected output groups.
Unit 14 Lab: Create a Mapplet Business Purpose The team lead has noticed that there are other situations where we can reuse some of the transformations developed in the FACT_SALES load mapping.
Technical Description To take advantage of previously created objects, we will create a mapplet from existing objects used in a previous mapping. This mapplet can then be used in other mappings.
Objectives
♦ Create a Mapplet
Duration 15 Minutes
Instructions
Step 1: Create the Mapplet
1. In the Mapping Designer, re-open the m_FACT_SALES_LOAD_xx mapping.
2. Highlight the following five transformations by holding down the Ctrl key and clicking each with the left mouse button:
♦ lkp_DIM_PROMOTIONS
♦ lkp_DIM_PRODUCT
♦ lkp_DIM_DATES
♦ exp_DISCOUNT_TEST
♦ agg_FACT_SALES
3. Select Edit > Copy or type Ctrl+C.
4. Open the Mapplet Designer and create a mapplet named mplt_AGG_SALES.
a. Select Edit > Paste or type Ctrl+V.
b. Right-click in the workspace and select Arrange All.
c. Select the Scale to Fit icon.
Your mapplet definition should look the same as Figure 14-1.
Figure 14-1. Mapplet Designer view of mplt_AGG_SALES
d. Add a Mapplet Input transformation.
e. Name the Mapplet Input transformation in_TRANSACTIONS.
f. Add a Mapplet Output transformation.
g. Name the Mapplet Output transformation out_TRANSACTIONS.
h. From the exp_DISCOUNT_TEST transformation, drag all Input ports to the Input transformation.
i. From the Aggregator agg_FACT_SALES, drag all Output ports to the Output transformation.
j. Select the Scale to Fit icon.
k. The mapplet should look similar to Figure 14-2.
Figure 14-2. Mapplet Designer view of mplt_AGG_SALES with Input and Output transformations
l. Save your work. Notice the mapplet is invalid. Scroll through the messages in the output window. They point to the expression exp_DISCOUNT_TEST as having an invalid symbol reference. The reference to the parameter $$MAX_DISCOUNT is invalid because it does not exist within the mapplet parameter definition.
Note: Mapping parameters and variables that are created in a mapping are not available for use in a mapplet that is called from the mapping.
m. Create a new parameter:
♦ Parameter Name = $$MAX_DISCOUNT
♦ Type = Parameter
♦ Datatype = decimal
♦ Precision = 15,2
♦ Initial Value = 17.25
5. Save your work.
Step 2: Add Mapplet to Mapping
1. Make a copy of the m_FACT_SALES_LOAD_xx mapping and open it in the Mapping Designer.
2. Rename the mapping to m_FACT_SALES_LOAD_MAPPLET_xx.
3. Delete the 5 transformations that you previously copied to the mapplet.
4. Drag the mapplet mplt_AGG_SALES into the mapping.
5. Use Autolink by name to link the ports from the sq_STG_TRANSACTIONS_PAYMENT to the mplt_AGG_SALES input.
6. Manually link the DISCOUNT port to the IN_DISCOUNT port.
7. Use Autolink by name to link the Output portion of the mapplet to the target. You will need to specify OUT_ for the prefix and 1 for the suffix.
8. Arrange All Iconic.
9. Save your work.
Your mapping should look the same as Figure 14-3.
Figure 14-3. Iconic view of the m_FACT_SALES_LOAD_MAPPLET_xx mapping
Unit 15: Mapping Design
In this unit you will learn about:
♦ Designing mappings
♦ The workshop will give you practice in designing your own mappings.
Lesson 15-1. Designing Mappings
Description
This lesson provides a checklist of topics to consider during the mapping development process. It covers a variety of situations users will have to address and helps them ask the right questions before and during the design process.
What to Consider
The mapping process requires much more up-front research than it might appear. Before designing a mapping, it is important to have a clear picture of the end-to-end processes that the data will flow through.
♦ Design a high-level view of the mapping and document a picture of the process with the mapping, using a textual description to explain exactly what the mapping is supposed to accomplish and the methods or steps it will follow to accomplish its goal.
♦ After the high-level flow has been established, document the details at the field level, listing each of the target fields and the source field(s) that are used to create the target field. Document any expression that may take place in order to generate the target field (e.g., a sum of a field, a multiplication of two fields, a comparison of two fields, etc.). Whatever the rules, be sure to document them at this point and remember to keep it at a physical level. The designer may have to do some investigation at this point for some business rules. For example, the business rules may say 'For active customers, calculate a late fee rate'. The designer of the mapping must determine that, on a physical level, that translates to 'for customers with an ACTIVE_FLAG of "1", multiply the DAYS_LATE field by the LATE_DAY_RATE field'.
♦ Create an inventory of Mappings and Reusable objects. This list is a 'work in progress' and will have to be continually updated as the project moves forward. The lists are valuable to all, but particularly to the lead developer: these objects can be assigned to individual developers and progress tracked over the course of the project.
♦ The administrator or lead developer should gather all of the potential Sources, Targets and Reusable objects and place them in a shared folder accessible to all who may need access to them.
♦ Reusable objects need to be properly documented to make it easier for other developers to determine whether they can or should use them in their own development.
♦ A developer's specifications for a mapping should include the required Sources and Targets, additional information regarding derived ports, and how the ports relate from the source to the target.
♦ The Informatica Velocity methodology provides a matrix that assists in detailing the relationship between source fields and target fields. It also depicts fields that are derived from values in the Source and eventually linked to ports in the target.
♦ If a shared folder for Sources and Targets is not available, the developer will need to obtain the source and target database schema owners, passwords and connect strings. With this information, ODBC connections can be created in the Designer tool to allow access to the Source and Target definitions.
♦ Document any other information about the mapping that is likely to be helpful in developing the mapping. Helpful information may include, for example, source and target database connection information, lookups and how to match data in the lookup tables, data cleansing needed at a field level, potential data issues at a field level, any known issues with particular fields, pre- or post-mapping processing requirements, and any information about specific error handling for the mapping.
♦ The completed mapping design should then be reviewed with one or more team members for completeness and adherence to the business requirements. In addition, the design document should be updated if the business rules change or if more information is gathered during the build process.
High Level Process Overview
[Diagram: a Relational Source, a Lookup, an Expression and a Router feeding three Relational Targets.]
Mapping Specifics
The following tips, in no particular order, will make the mapping development process more efficient.
♦ One of the first things to do is to bring all required source and target objects into the mapping.
♦ Only connect fields that are needed or will be used.
♦ Only connect from the Source Qualifier those fields needed subsequently.
♦ Filter early and often. Only manipulate data that needs to be moved and transformed. Reduce the number of non-essential records that are passed through the mapping.
♦ Decide whether a Source Qualifier join will net the result needed versus creating a Lookup to retrieve desired results.
♦ Reduce the number of transformations. An excessive number of transformations will increase overhead.
♦ Consider increasing the shared memory from 12MB to 25MB or 40MB when using a large number of transformations.
♦ Make use of variables, local or global, to reduce the number of times functions have to be used.
♦ Watch the data types. The Informatica engine converts compatible data types automatically; an excessive number of conversions is inefficient.
♦ Make use of variables, reusable transformations and mapplets for reusable code. These leverage the work done by others.
♦ Use active transformations early in the process to reduce the number of records as early in the mapping as possible.
♦ When joining sources, select the appropriate driving/master table.
♦ Utilize single-pass reads. Design mappings to use one Source Qualifier to populate multiple targets.
♦ Remove or reduce field-level stored procedures. These execute for each record and slow performance.
♦ Lookup Transformation tips:
  ♦ When the source is large, cache lookup table columns for lookup tables of 500,000 rows or less. A standard rule of thumb is not to cache tables over 500,000 rows.
  ♦ Use equality (=) conditions in the Condition tab if possible.
  ♦ Use IIF or DECODE functions when the lookup returns small row sets.
  ♦ Avoid date comparisons in lookups; convert dates to strings.
♦ Operations and Expression Transformation tips:
  ♦ Numeric operations are faster than string operations.
  ♦ Trim Char and Varchar fields before performing comparisons.
  ♦ Operators are faster than functions (e.g., || vs. CONCAT).
  ♦ Use flat files. File reads/writes are faster than database reads/writes on the same server. Fixed-width files are faster to process than delimited files.
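The lookup-caching tip can be pictured as loading a small lookup table into memory once instead of querying it per row. A minimal Python sketch with invented data and names:

```python
# Hypothetical sketch of the lookup-caching tip: read a small lookup table
# once into a dict, then resolve each source row in memory instead of
# issuing one database round trip per row. All data is invented.

lookup_rows = [(101, 10.0), (102, 12.5), (103, 9.0)]   # (PROMO_ID, DISCOUNT)
cache = dict(lookup_rows)                               # built once, before the run

source_promo_ids = [101, 103, 101, 102]
discounts = [cache[p] for p in source_promo_ids]        # O(1) per row, no round trips
print(discounts)
```

The trade-off is memory for speed, which is why the rule of thumb above caps cached tables at roughly 500,000 rows.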
Unit 15 Workshop: Load Promotions Daily Aggregate Table
Business Purpose
Management wants to be able to analyze how certain promotions are performing. They want to be able to gather the promotions by day for each dealership for each product being sold.
Technical Description
The instructions will provide enough detail for you to design and build the mapping necessary to load the promotions aggregate table. It is suggested that you use the Velocity best practices that have been discussed during the course. The workshop provides tables that can be filled in before you start building the mapping. If you are unclear on any of the instructions, please ask the instructor.
Objective
Design and create a mapping to load the aggregate table.
Duration 120 minutes
Workshop Details
Sources and Targets
SOURCE: STG_TRANSACTIONS
This relational table contains sales transactions for 7 days. It is located in the TDBUxx schema and contains 5,475 rows. For the purpose of this mapping we will read all 7 days of data. See Figure 15-1 for the source table layout.
Figure 15-1. Source table definition
TARGET: FACT_PROMOTIONS_AGG_DAILY
This relational table is located in the TDBUxx schema. After running the mapping it should contain 1,073 rows. See Figure 15-2 for the target table layout.
Figure 15-2. Target table definition
Mapping Details
In order to successfully create the mapping you will need to know some additional details.
♦ Management has decided that they don't want to keep track of the Manager Discount or the Employee Discount (PROMO_ID 105 and 200), so these will need to be excluded from the load.
♦ The PRODUCT_KEY can be obtained from the DIM_PRODUCT table by matching on the PRODUCT_ID.
♦ The DATE_KEY can be obtained from the DIM_DATES table by matching the TRANSACTION_DATE to the DATE_VALUE.
♦ UNITS_SOLD is derived by summing the SALES_QTY.
♦ REVENUE is derived by taking the SALES_QTY times the SELLING_PRICE and then subtracting the DISCOUNT, HOLDBACK and REBATE.
  ♦ Most of the discounts are valid, but occasionally they may be higher than the acceptable value of 17.25. When this occurs you will need to obtain an acceptable value based on the PROMO_ID. The acceptable value can be obtained from the DIM_PROMOTIONS table by matching the PROMO_ID.
  ♦ The DISCOUNT is a percentage stored as a number. To calculate the actual discount in dollars you will need to divide the DISCOUNT by 100 and multiply it by the SELLING_PRICE.
♦ REVENUE_PER_UNIT is derived by dividing the REVENUE by the SALES_QTY.
♦ COST is derived by summing the UNIT_COST.
♦ COST_PER_UNIT is derived by summing the UNIT_COST and dividing it by the sum of the SALES_QTY.
♦ Save your work often.
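Taken together, the derivations above can be sketched for a single aggregate group as follows. This is illustrative Python with invented sample values, not part of the workshop solution:

```python
# Hypothetical sketch of the workshop derivations for one aggregate group.
# ACCEPTABLE_DISCOUNT stands in for the DIM_PROMOTIONS lookup; all sample
# transaction values are invented.

MAX_DISCOUNT = 17.25
ACCEPTABLE_DISCOUNT = {101: 10.0}   # PROMO_ID -> acceptable DISCOUNT

transactions = [
    # (PROMO_ID, SALES_QTY, SELLING_PRICE, UNIT_COST, DISCOUNT, HOLDBACK, REBATE)
    (101, 1, 20000.0, 15000.0, 12.0, 300.0, 500.0),
    (101, 2, 20000.0, 15000.0, 25.0, 300.0, 500.0),   # discount over the 17.25 cap
]

units_sold = 0
revenue = cost = 0.0
for promo, qty, price, unit_cost, disc, holdback, rebate in transactions:
    if disc > MAX_DISCOUNT:                # replace unacceptable discounts
        disc = ACCEPTABLE_DISCOUNT[promo]
    disc_dollars = disc / 100 * price      # DISCOUNT is a whole-number percentage
    units_sold += qty                      # UNITS_SOLD = SUM(SALES_QTY)
    revenue += qty * price - disc_dollars - holdback - rebate
    cost += unit_cost                      # COST = SUM(UNIT_COST)

revenue_per_unit = revenue / units_sold    # REVENUE / SUM(SALES_QTY)
cost_per_unit = cost / units_sold          # SUM(UNIT_COST) / SUM(SALES_QTY)
```

Working through one group by hand like this is a useful way to validate the Aggregator expressions before running the session.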
Velocity Deliverable: Mapping Specifications
Mapping Name: m_FACT_PROMOTIONS_AGG_DAILY_LOAD
Source System: Oracle Table
Initial Rows:
Target System: Oracle Table
Rows/Load:
Short Description: Load the daily promotions aggregate table from the STG_TRANSACTIONS table.
Load Frequency: Daily
Preprocessing:
Post Processing:
Target: Append
Error Strategy: Default
Reload Strategy:
Unique Source Fields:

SOURCES
Tables: Table Name | Schema/Owner | Selection/Filter

TARGETS
Tables: Schema Owner | Table Name | Update | Delete | Insert | Unique Key

LOOKUPS
Lookup Name | Table | Location | Match Condition(s) | Filter/SQL Override
Lookup Name | Table | Location | Match Condition(s) | Filter/SQL Override
Lookup Name | Table | Location | Match Condition(s) | Filter/SQL Override

HIGH LEVEL PROCESS OVERVIEW

PROCESSING DESCRIPTION (DETAIL)

SOURCE TO TARGET FIELD MATRIX
Target Table | Target Column | Source File | Source Column | Expression | Default Value if Null
Workflow Details
This is a simple workflow containing a Start task and a Session task.
Run Details
Your Task Details should be similar to Figure 15-3.
Figure 15-3. Task Details of the completed session run
Your Source/Target Statistics should be similar to Figure 15-4.
Figure 15-4. Source/Target Statistics of the completed session run
Your Preview Data results should be similar to Figure 15-5.
Figure 15-5. Data Preview of the FACT_PROMOTIONS_AGG_DAILY table
Unit 16: Workflow Variables and Tasks
In this unit you will learn about:
♦ Link Conditions
♦ Workflow variables
♦ Assignment tasks
♦ Decision tasks
♦ Email tasks
Lesson 16-1. Link Conditions
You can set conditions on workflow links:
♦ If the link condition is True, the next task is executed.
♦ If the link condition is False, the next task is not executed.
To set a condition, right-click a link and enter an expression that evaluates to True or False. You can use workflow variables in the condition (see below).
Lesson 16-2. Workflow Variables
Workflow variables can be either pre-defined or user-defined.
User-defined workflow variables are created by selecting Workflows > Edit and then selecting the Variables tab.
Description
Workflow variables can be user-defined or pre-defined.
User-defined workflow variables can be used to pass information from one point in a workflow to another:
1. Declare workflow variables in the workflow Variables tab.
2. Selecting Persistent writes the last value to the repository and makes it available the next time the workflow is executed.
3. Use an Assignment task to set the value of the variable.
4. Use the variable value later in the workflow.
Pre-defined workflow variables come in two types:
♦ System variables (SYSDATE and WORKFLOWSTARTTIME) can be used, for example, when calculating variable dates and times in Assignment tasks and link conditions.
♦ Task-specific workflow variables are available in Decision, Assignment and Timer tasks, and in link conditions. They include EndTime, ErrorCode, ErrorMsg, FirstErrorCode, FirstErrorMsg, PrevTaskStatus, SrcFailedRows, SrcSuccessRows, StartTime, Status, TgtFailedRows, TgtSuccessRows and TotalTransErrors.
Workflow variables are discussed in more detail in the Workflow Administration Guide.
Business Purpose
A workflow can contain multiple tasks and multiple pipelines. One or more tasks or pipelines may depend on the status of previous tasks.
Example
S2 may depend on the successful run of S1. Success may be defined as session status = Succeeded and the number of source and target failed rows = zero. The link that precedes S2 can be coded so that S2 will not run unless all 3 criteria are true. Use the task-specific workflow variables Status, SrcFailedRows and TgtFailedRows in the link condition expression. In this case, there is no allowance for only some of the 3 conditions being true.
S4 may need to be skipped if S3 finished more than 1 hour past the workflow start time. A truncation and test of WORKFLOWSTARTTIME in the link condition preceding S4 is appropriate.
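The three-part success test on S1 can be sketched as the boolean expression a link condition evaluates. This is hypothetical Python, not PowerCenter link-condition syntax; the dict stands in for the task-specific variables and the status string is an assumption:

```python
# Hypothetical sketch of the link condition preceding S2: S2 runs only if
# all three criteria on S1 hold. The dict mimics the task-specific workflow
# variables; "SUCCEEDED" is an assumed stand-in for the session status value.

s1 = {"Status": "SUCCEEDED", "SrcFailedRows": 0, "TgtFailedRows": 0}

run_s2 = (s1["Status"] == "SUCCEEDED"
          and s1["SrcFailedRows"] == 0
          and s1["TgtFailedRows"] == 0)
print(run_s2)
```

If any one of the three tests fails, the expression is False and, like a False link condition, S2 would not run.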
Lesson 16-3. Assignment Task
Description
The Assignment task establishes the value of a workflow variable (refer to the Workflow Variables lesson) whose value can be used at a later point in the workflow as testing criteria to determine if (or when) other workflow tasks or pipelines should run. It is a 3-step process: create a workflow variable in the workflow properties; establish the value of that variable with an Assignment task; test that variable value at some subsequent point in the workflow.
Business Purpose
Running a workflow task may depend on the results of other tasks or calculations in the workflow. An Assignment task can perform calculations and establish the value of a workflow variable. That value may determine whether other tasks or pipelines are run.
Example
S5 should run at least 1 hour after S2 completes. ASGN1 can be coded to set a time that TIMER1 will wait for before proceeding to S5. To prevent ASGN1 from running until S2 completes, use a Link Condition (refer to the Workflow Design section of this document).
Code the Assignment task ASGN1 (in part) using the PowerCenter TRUNC date function, with pseudocode for the variable date value >= Session2's EndTime + 1 hour. The Timer task TIMER1 will wait for that variable time to arrive before running S5.
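What ASGN1 and TIMER1 accomplish can be sketched with ordinary date arithmetic. A hypothetical Python sketch follows; the truncation granularity (to the minute) and the sample times are invented for illustration:

```python
# Hypothetical sketch of the ASGN1/TIMER1 pattern: compute a wait-until
# time equal to S2's end time plus one hour, after truncating the seconds
# (loosely mirroring what a TRUNC date function does). Times are invented.
from datetime import datetime, timedelta

s2_end_time = datetime(2006, 4, 28, 14, 23, 45)

# Truncate the seconds, then add the one-hour delay TIMER1 should wait for.
wait_until = s2_end_time.replace(second=0, microsecond=0) + timedelta(hours=1)
print(wait_until)   # 2006-04-28 15:23:00
```

The Assignment task plays the role of the calculation; the Timer task plays the role of the wait-until comparison against the computed value.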
Lesson 16-4. Decision Task
Description
Decision tasks enable the workflow designer to set criteria by which the workflow will or will not proceed to the next set of tasks, depending on whether the criteria evaluate to true or false.
Business Purpose
Commonly, workflows have multiple paths. Some are simply concurrent tasks. Others are pipelines of tasks that should only be run if the results of preceding tasks are successful. Still others are pipelines of tasks that should only be run if those results are not successful. What determines the success or failure of a task or group of tasks is user-defined, depending on the business-defined rules and operational rules of processing. That criterion is set as the Decision Condition in a Decision task and subsequently tested for a True or False condition.
Example
If a session, group of sessions or any combination of workflow tasks is successful, a subsequent set of sessions should run. If any one of the tasks fails or does not produce the desired results, those sessions should not be run. Instead, an email should be sent to the processing operator to perhaps run a back-out session, or simply to notify the Development Team Lead or Business Unit Lead that an error condition existed.
Lesson 16-5. Email Task
Description
Email tasks enable PowerCenter to send email messages at various points in a workflow. Users can define email addresses, a subject line and the email message text. When called from within a Session task, the message text can contain variable session-related metadata.
Business Purpose
Various business and operational staff may need to be notified of the progress of a workflow, the status of tasks (or combinations of tasks) within it, or various metadata results of a session.
Example
The Business Unit Team Lead may request to receive an email detailing the time a load finished, the total number of rows loaded and the number of rows rejected. This could be accomplished with a reusable Email task (which allows variable session metadata) called from within a session. If session-specific variable metadata is not required, a standard text message could be sent using a non-reusable Email task that follows the session in the workflow.
Operational staff may request receipt of an email if a session-required source file does not arrive by the time the session is scheduled to run. Receipt of the email message would be the operator's signal that some type of manual intervention or restore routine is required to correct the problem.
Performance Considerations
A running, configured email server is required; however, the impact of the Integration Service sending emails is minimal.
Unit 16 Lab: Load Product Weekly Aggregate Table
Business Purpose
The Mersche Motors data warehouse contains a number of aggregate tables. Management wants to be able to report on total sales for a product on a weekly basis. A weekly product sales aggregate table needs to be loaded for this purpose.
Technical Description
The source for the weekly product aggregate table will be the daily product aggregate table. The mapping to load this table is located in the DEV_SHARED folder. A workflow needs to be created that will run the weekly aggregate load session after the daily aggregate load session has run 7 times. This can be accomplished using an Assignment task, a Decision task, link conditions and Session tasks. A load date equal to the beginning day of the week will be used to provide the date key for the weekly aggregate table. The mapping to accomplish this has already been created and will need to be copied from the DEV_SHARED folder. It contains a mapping variable that is incremented by 1 at the end of the session/mapping run.
Objectives
♦ Assigning Workflow Variables
♦ Incrementing Workflow Variables using the Assignment Task
♦ Branching in a workflow using a Decision Task
♦ Using Link Conditions
Duration 35 Minutes
Velocity Deliverable: Mapping Specifications
Mapping Name: m_FACT_PRODUCT_AGG_WEEKLY_LOAD_xx
Source System: Oracle Tables
Initial Rows: 1390
Target System: Oracle Table
Rows/Load: 1390
Short Description: Load the weekly product aggregate table from the daily product aggregate table.
Load Frequency: Weekly
Preprocessing:
Post Processing:
Target: Append
Error Strategy: Default
Reload Strategy:
Unique Source Fields: PRODUCT_KEY, DEALERSHIP_ID, DATE_KEY

SOURCES
Table Name | Schema/Owner | Selection/Filter
FACT_PRODUCT_AGG_DAILY | TDBUxx |

TARGETS
Schema Owner | Table Name | Update | Delete | Insert | Unique Key
TDBUxx | FACT_PRODUCT_AGG_WEEKLY | X | | X | PRODUCT_KEY, DEALERSHIP_ID, DATE_KEY
HIGH LEVEL PROCESS OVERVIEW (WORKFLOW)
Start Task, Session Task, Assignment Task, Decision Task, then either a Session Task or an Email Task.

PROCESSING DESCRIPTION (DETAIL)
A workflow variable will be defined and set to 0 at the start of the workflow. The first Session task runs the load to the daily product aggregate table. The Assignment task increments the workflow variable by 1. The Decision task uses the MOD function to divide the workflow variable by 7 and check whether it returns 0. If it returns 0, the second Session task runs and loads the weekly aggregate table. If it returns a non-zero value, an Email task runs (the Email task only runs if the Integration Service is associated with a mail server). This workflow must be run 7 times, emulating a week, to verify the process works properly.
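The branching described above can be simulated to confirm the logic before building it. A hypothetical Python sketch follows; no PowerCenter is involved, and the names merely mirror the workflow objects:

```python
# Hypothetical simulation of the workflow branching: $$WORKFLOW_RUNS starts
# at 0, each run's Assignment task increments it, and the Decision task
# tests MOD(runs, 7) = 0 to choose the weekly load over the email.

workflow_runs = 0          # persistent workflow variable, default 0
actions = []

for _ in range(7):         # run the workflow 7 times, emulating a week
    # ... the first Session task loads the daily aggregate here ...
    workflow_runs += 1     # Assignment task: $$WORKFLOW_RUNS + 1
    if workflow_runs % 7 == 0:          # Decision task: MOD($$WORKFLOW_RUNS, 7) = 0
        actions.append("weekly_load")   # second Session task
    else:
        actions.append("email")         # Email task

print(actions)   # email on runs 1-6, weekly_load on the 7th run
```

Because the variable is persistent, run 8 would continue from 7 and the weekly load would fire again on run 14, and so on.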
Instructions
Step 1: Copy the Mappings
1. In the Designer, copy the m_FACT_PRODUCT_AGG_DAILY_LOAD mapping and the m_FACT_PRODUCT_AGG_WEEKLY_LOAD mapping from the DEV_SHARED folder.
2. Select Yes for any Target Dependencies.
3. Select Skip or Reuse to resolve any conflicts.
4. Rename the mappings to include your student number.
5. Save your work.
Step 2: Copy the Existing Workflow
1. In the Workflow Manager, copy the wkf_FACT_PRODUCT_AGG_WEEKLY_LOAD workflow from the DEV_SHARED folder.
2. Resolve the conflict by selecting the m_FACT_PRODUCT_AGG_DAILY_LOAD_xx mapping.
3. Drag the new workflow into the Workflow Designer.
4. Edit the session and make the following changes:
a. Rename it to include your assigned student number.
b. In the Properties tab:
i. Change the $Target Connection Value to reflect your assigned student connection NATIVE_EDWxx.
ii. Change the Session Log File Name to include your student number.
c. Change the source and target connections to reflect your assigned student connections, NATIVE_STGxx and NATIVE_EDWxx respectively.
5. Select the menu option Workflows > Edit.
a. Rename it to include your student number.
b. In the Properties tab, change the Workflow Log File Name to include your student number.
c. In the Variables tab, create a new workflow variable:
♦ Variable Name = $$WORKFLOW_RUNS
♦ Datatype = integer
♦ Persistent = checked
♦ Default Value = 0
Figure 16-1 shows the defined workflow variable: Figure 16-1. Workflow variable declaration
Step 3: Create the Assignment Task
1. Add an Assignment task to the workflow.
2. Link the s_m_FACT_PRODUCT_AGG_DAILY_LOAD_xx Session task to the Assignment task.
3. Double-click the link.
4. Add a link condition to ensure that the Assignment task only executes if the s_m_FACT_PRODUCT_AGG_DAILY_LOAD_xx Session task was successful. See Figure 16-2 for details.
Figure 16-2. Link condition testing if a session run was successful
5. Edit the Assignment task.
6. Rename it to asgn_WORKFLOW_RUNS.
7. In the Expressions tab, create an expression that increments the user-defined variable $$WORKFLOW_RUNS by 1. See Figure 16-3 for details.
Figure 16-3. Assignment Task expression declaration
8. Save your work.
Step 4: Create the Decision Task
1. Add a Decision task to the workflow.
2. Link the asgn_WORKFLOW_RUNS Assignment task to the Decision task.
3. Double-click the link.
4. Add a link condition to ensure that the Decision task only executes if the asgn_WORKFLOW_RUNS Assignment task was successful (refer to the previous step).
5. Edit the Decision task.
   a. Rename it to dcn_RUN_WEEKLY.
   b. In the Properties tab, create a Decision Name expression using the modulus function that checks whether this is the seventh run of the workflow. This can be done by dividing the workflow variable by seven and checking whether the remainder is 0. See Figure 16-4 for details.
Figure 16-4. Decision Task Expression

Tip: The Decision task evaluates the expression and returns a value of either TRUE or FALSE. This value can be checked in a link condition to determine the direction taken.
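The arithmetic the Decision task performs can be sketched outside PowerCenter. The shell sketch below is an illustration only (it is not PowerCenter expression syntax; the variable name is borrowed from the lab):

```shell
#!/bin/sh
# Emulate seven daily runs: the Assignment task increments a counter,
# the Decision task tests MOD(counter, 7) = 0 on each run.
WORKFLOW_RUNS=0   # stand-in for the persistent $$WORKFLOW_RUNS variable

for day in 1 2 3 4 5 6 7; do
  WORKFLOW_RUNS=$((WORKFLOW_RUNS + 1))        # Assignment task: increment by 1
  if [ $((WORKFLOW_RUNS % 7)) -eq 0 ]; then   # Decision task: remainder is 0?
    echo "run $day: TRUE - load the weekly aggregate table"
  else
    echo "run $day: FALSE - send the daily-complete email"
  fi
done
```

Only the seventh iteration takes the TRUE branch, matching the workflow behavior described above.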
Step 5: Create the Session Task
1. Create a Session task named s_m_FACT_PRODUCT_AGG_WEEKLY_LOAD_xx that uses the m_FACT_PRODUCT_AGG_WEEKLY_LOAD_xx mapping.
2. Link the dcn_RUN_WEEKLY Decision task to the s_m_FACT_PRODUCT_AGG_WEEKLY_LOAD_xx Session task.
3. Double-click the link.
4. Add a link condition that checks whether the dcn_RUN_WEEKLY Decision task has returned a value of TRUE, meaning that it is time to load the weekly aggregate table. See Figure 16-5 for details.
Figure 16-5. Link condition testing for a Decision Task condition of TRUE
5. Edit the s_m_FACT_PRODUCT_AGG_WEEKLY_LOAD_xx session.
   a. Set the relational connection value for the SQ_SC_FACT_PRODUCT_AGG_DAILY source to NATIVE_EDWxx, where xx is your student number.
   b. Set the relational connection value for the SC_FACT_PRODUCT_AGG_WEEKLY target to NATIVE_EDWxx, where xx is your student number.
   c. Verify that the Target load type is set to Normal.
   d. In the Properties tab, set the Incremental Aggregation option to on.
Step 6: Create the Email Task
1. Add an Email task to the workflow.
2. Link the dcn_RUN_WEEKLY Decision task to the Email task.
3. Double-click the link. Add a link condition that checks whether the dcn_RUN_WEEKLY Decision task has returned a value of FALSE, meaning that the daily load has completed and that it is NOT time to load the weekly aggregate table.
4. Edit the Email task.
   a. Rename it to eml_DAILY_LOAD_COMPLETE.
   b. In the Properties tab:
      i. Add [email protected] as the Email User Name.
      ii. Add appropriate text for the Email Subject and Email Text. See Figure 16-6 for details.
Figure 16-6. Email Task Properties
5. Right-click in the workspace and select Arrange > Horizontal.
6. Save your work. Your workflow should appear the same as displayed in Figure 16-7.
Figure 16-7. Completed Workflow
Step 7: Start the Workflow and Monitor the Results
The workflow needs to be run seven times in order to see the weekly aggregate session running.
1. Start the workflow.
2. Review the workflow in the Gantt Chart view of the Workflow Monitor. It should appear similar to Figure 16-8.
Figure 16-8. Gantt chart view of the completed workflow run
3. Return to the Workflow Manager.
4. Right-click the wkf_FACT_PRODUCT_AGG_WEEKLY_LOAD_xx workflow in the Navigator window and select View Persistent Values. The value should be set to 1. See Figure 16-9 and Figure 16-10.
Figure 16-9. View Workflow Variables
Note: Each time you run the workflow, this value increases by one.
Figure 16-10. Value of the $$WORKFLOW_RUNS variable after the first run
5. Click Cancel to exit.
6. Run the workflow six more times to emulate a week's normal runs. After the last run, the Gantt Chart view should be similar to Figure 16-11.
Figure 16-11. Gantt chart view of the completed workflow run after the weekly load runs
7. Review the Task Details of the s_m_FACT_PRODUCT_AGG_WEEKLY_LOAD_xx session.
Figure 16-12. Task Details of the completed session run
Unit 17: More Tasks and Reusability

In this unit you will learn about:
♦ Event Wait task
♦ Event Raise task
♦ Command task
♦ Reusable tasks
♦ Reusable Session task
♦ Reusable Session configuration
♦ pmcmd utility
Lesson 17-1. Event Wait Task
Description
Event Wait tasks wait for either the presence of a named flat file (pre-defined event) or some other user-defined event to occur in the workflow processing.
For a predefined event, the task waits for the physical presence of a file in a directory local to the Integration Service process machine. This file is known as an indicator file. If the file does not exist, the Event Wait task will not complete. When the file is found, the Event Wait task completes and the workflow proceeds to subsequent tasks. The Event Wait task can optionally delete the indicator file once detected, or the file can be deleted by a subsequent process. For a user-defined event, the developer:
1. Defines an event in the workflow properties (prior to workflow processing).
2. Includes an Event Wait task at a suitable point in the workflow, where further processing must await some specific event.
3. Includes an Event Raise task at a suitable point in the workflow, e.g. after a parallel pipeline has completed. The Event Raise task sets the event to active. (The Event Raise task is described later.)
This lesson examines the two types separately.
Pre-Defined Event

Business Purpose
An Event Wait task watching for a flat file by name is placed in a workflow because some subsequent processing is dependent on the presence of the file.
Example
A Session task may be expecting to process a flat file as source data. Inserting a pre-defined Event Wait task containing the specific name and location of the flat file causes the workflow to proceed if the file is found. If the file is not found, the workflow goes into a Wait status.
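Conceptually, a pre-defined Event Wait behaves like the polling loop below. This is a sketch only: PowerCenter implements the wait internally, and the file name and interval here are invented for illustration.

```shell
#!/bin/sh
# Poll for an indicator file the way a pre-defined Event Wait does.
# INDICATOR and the 2-second interval are illustrative values only.
INDICATOR=/tmp/orders_ready.ind
touch "$INDICATOR"      # stand-in for the upstream process dropping the file

while [ ! -f "$INDICATOR" ]; do
  sleep 2               # the workflow sits in Wait status until the file appears
done
rm -f "$INDICATOR"      # optional: the Event Wait can delete the indicator file
echo "indicator found - workflow proceeds to subsequent tasks"
```

In the real task there is no loop to write; you simply name the file and check the delete option.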
Performance Considerations
The only known consideration is the length of time the Integration Service may have to wait if the file does not arrive. This potential load window slowdown can be averted by proper workflow design that provides alternatives in case a file does not arrive in a reasonable length of time. Refer to the Email task earlier in this section.
User-Defined Event

Business Purpose
An Event Wait task waiting for the occurrence of a user-defined event is placed strategically so that the workflow does not proceed further until a specific series of pre-determined tasks and conditions has occurred. It always works in concert with an Event Raise task. Per the three steps mentioned above: the user creates the workflow Event, the Event Raise task triggers the Event (or sets it 'active'), and the Event Wait task does not proceed to subsequent tasks until it detects that the specific Event was triggered.
Example
A workflow may have two concurrent pipelines containing various tasks, in this order: Pipeline 1 contains S1 and S2; Pipeline 2 contains S3, S4 and S5. S5 cannot run until S4 runs.
One way to ensure that S5 does not run unless S1 and S2 have run, is to create a workflow Event in the workflow properties, insert an Event Raise task after S2 that triggers (activates) the Event and place a User-Defined Event Wait task after S4 to detect whether the Event has been triggered. If not, the workflow waits until it is triggered.
Performance Considerations
The only known performance consideration is the length of time the Integration Service may have to wait if the Event is not raised. This potential load window slowdown can be averted by proper workflow design that provides alternatives in case the Event does not occur within a reasonable length of time. (Refer to the Email and Timer tasks earlier in this section.)
Lesson 17-2. Event Raise Task
Description
Event Raise tasks are always used in conjunction with user-defined Event Wait tasks. They signal an Event Wait task that a particular set of pre-determined events has occurred. A user-defined event is defined as the completion of the tasks from the Start task to the Event Raise task. It is the same three-step process previously mentioned: the developer defines an 'Event' in the workflow properties; the Event Raise task 'raises' the event at some point in the running workflow; and an Event Wait task placed at a different point in the workflow determines whether the Event has been raised.
Business Purpose
This task allows a signal to be passed from one spot in the workflow to another, indicating that a particular series of pre-determined events has occurred.
Example
This example is the same as the one in the Event Wait task section of this document. A workflow may have two concurrent pipelines containing various tasks, in this order: Pipeline 1 contains S1 and S2; Pipeline 2 contains S3, S4 and S5. S5 cannot run until S4 runs.
One way to ensure that S5 is not run unless S1 and S2 have run, is to create a workflow Event in the workflow properties, insert an Event Raise task after S2 that triggers (activates) the Event and place a User-Defined Event Wait task after S4 to detect whether the Event has been triggered. If not, the workflow waits until it is triggered.
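The Event Raise / Event Wait coordination can be sketched with two background "pipelines" and a flag file standing in for the workflow Event. S1 through S5, the flag path, and the timings are all placeholders; PowerCenter needs no flag file, since the Event lives in the workflow properties.

```shell
#!/bin/sh
# Two concurrent pipelines coordinated by a flag file (the "Event").
EVT=/tmp/evt_s2_done
rm -f "$EVT"

( echo S1; echo S2; touch "$EVT" ) &      # pipeline 1: Event Raise after S2
(
  echo S3; echo S4
  while [ ! -f "$EVT" ]; do sleep 1; done # Event Wait: block before S5
  echo S5
) &
wait                                      # let both pipelines finish
rm -f "$EVT"
echo "both pipelines complete"
```

However the two pipelines are scheduled, S5 cannot print before the flag exists, which is exactly the guarantee the Event Wait task provides.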
Performance Considerations
As before, the only known performance consideration is the length of time the Integration Service may have to wait if the Event is not raised. This potential load window slowdown can be averted by proper workflow design that provides alternatives in case the Event does not occur within a reasonable length of time. (Refer again to the Email and Timer tasks earlier in this section.)
Lesson 17-3. Command Task
Description
Command tasks are inserted in workflows and worklets to enable the Integration Service to run one or more OS commands of any nature. All commands or batch files referenced must be executable by the OS login that owns the Integration Service process.
Business Purpose
OS commands can be used for any operational or business-unit-related procedure and can be run at any point in a workflow. Command tasks can be set to run one or more OS commands or scripts/batch files before proceeding to the next task in the workflow. If more than one command is coded into a Command task, the entire task can be set to fail if any one of the individual commands fails. Additionally and optionally, each individual command can be set not to run if a preceding command has failed.
Example
A Session task that produces an output file could be followed by a Command task that copies the file to another directory or FTPs the file to another machine. The command syntax would be the same as the syntax that would accomplish this at the OS command prompt on the Integration Service process machine. A Session task that relies on flat file data as its source could be preceded by a Command task containing a script that step-by-step verifies the presence of the file, opens it, and verifies or compares control totals or record counts against some external source of information (again, any sequence of steps that could be accomplished at the OS level). A series of multiple concurrent or sequential Sessions in a workflow could all be followed by one Command task coded to copy (or move) all session logs created by the workflow to a special daily backup directory.
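The first example above might be implemented as a small script invoked by the Command task. This is a sketch under stated assumptions: every path and file name is illustrative, not from any lab.

```shell
#!/bin/sh
# The kind of script a Command task might run after a session completes.
# All paths here are illustrative placeholders.
set -e                        # make the whole script fail if any command fails

OUT=/tmp/product_agg.out
ARCHIVE=/tmp/daily_backup
echo "row1" > "$OUT"          # stand-in for the output file the session produced
mkdir -p "$ARCHIVE"

cp "$OUT" "$ARCHIVE/"         # copy the session output to a backup directory
wc -l < "$OUT"                # a simple record-count check on the file
```

With `set -e`, any failing step returns a nonzero exit status, which is how the Command task knows to fail.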
Performance Considerations
The only known consideration is the length of time the OS commands collectively take to run on the Integration Service process machine. This is not within the control of the Integration Service.
Lesson 17-4. Reusable Tasks
♦ Session, Email and Command tasks can be reusable.
♦ Use the Task Developer to create reusable tasks.
♦ Reusable tasks appear in the Navigator Tasks node and can be dragged and dropped into any workflow.
♦ In a workflow, a reusable task is indicated by a special symbol.
Lesson 17-5. Reusable Session Tasks
A Session created directly in a workflow is a non-reusable session; it is specific to that workflow. A session created in the Task Developer workspace is reusable. An instance of a reusable Session can be run in any workflow or worklet. Some of the properties of the session instance are customizable, workflow-by-workflow.
Business Purpose
Occasionally, certain mapping logic may be required to run in multiple workflows. Since a mapping is a reusable object, the developer could code multiple sessions, all based on the same mapping. However, there is a simpler way to create 'like' sessions that are all based on the same mapping: a reusable Session. Once created in the Task Developer, an instance of the reusable Session can be placed in any workflow or worklet.
Examples
If the same mapping needs to be used a number of times, and a number of session properties need to be changed in each use (e.g., time-stamped logs, increased Line Sequential Buffer Length, special error handling), the changes can be made once in the parent session. Every time an instance of the session is placed in a workflow, the session automatically takes on all those customized properties. This results in less developer effort than creating separate new sessions, each with multiple customized session properties.

A business receives 25 data file sources for its 25 customers. The data structure of each customer is different enough that a different mapping is required for each, to get the data into one common format. Once the data is structured the same, each file needs to be run through common mapping logic to further transform the data in a like manner. If 25 output files were created, 25 instances of one reusable Session could be used to process all the data files. Each workflow would contain one customer-specific session/mapping and one instance of the reusable Session, pre-coded with common session properties.
Performance Considerations It is recommended to use reusable session tasks sparingly because retrieving the metadata for a reusable session task and its child instances from the repository takes longer than retrieving the metadata for a non-reusable session task.
Lesson 17-6. Reusable Session Configurations
♦ Define session properties that can be reused by multiple sessions within a folder.
♦ Use the Tasks > Session Configuration menu option or the Tasks toolbar icon.
♦ This opens the Session Config Browser, where you set session properties.
♦ Invoke in Session tasks, in the Config Object tab, Config Name box.
♦ These session properties can be overridden further down in the Config Object tab.
Lesson 17-7. pmcmd Utility
Description
The pmcmd command-line utility allows the developer to perform most Workflow Manager operations outside of the PowerCenter Client tool.
Syntax Example

pmcmd startworkflow -sv integrationservicename -u yourusername -p yourpassword workflowname
This command will start the workflow located on the named Integration Service. You must supply the user name and password to sign in to the Integration Service, as well as the workflow name.
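Because pmcmd is a command-line tool, it can be scripted. The sketch below wraps the syntax example above; the service, user, and workflow names are the same placeholders, and the command is echoed rather than executed so the sketch runs anywhere (on a real Integration Service host, set PMCMD=pmcmd).

```shell
#!/bin/sh
# Dry-run wrapper around the pmcmd call shown above.
PMCMD="echo pmcmd"   # replace 'echo pmcmd' with pmcmd on a real host

$PMCMD startworkflow -sv integrationservicename \
       -u yourusername -p yourpassword workflowname
STATUS=$?            # pmcmd exits 0 on success; a scheduler can branch on this
if [ "$STATUS" -eq 0 ]; then
  echo "workflow start request succeeded"
else
  echo "workflow start request failed" >&2
fi
```

Checking the exit status this way is how external schedulers typically decide whether a pmcmd-launched workflow started successfully.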
Unit 18: Worklets and More Tasks

In this unit you will learn about:
♦ Worklets
♦ Timer task
♦ Control task
Lesson 18-1. Worklets
Description
Worklets are optional processing objects inside workflows. They contain PowerCenter tasks that represent a particular grouping or functionally-related set of tasks. They can be created directly in a workflow (non-reusable) or in the Worklet Designer (reusable).
Business Purpose
A workflow may contain dozens of tasks, whether concurrent or sequential. During workflow design, these naturally develop into groupings of meaningfully-related tasks, run in the appropriate operational order. The workflow can run as-is, from start to finish, executing task-by-task, or the developer can place natural groupings of tasks into worklets. A worklet's relationship to a workflow is like that of a subroutine to a program or an applet to an application. Worklets can be used in a very large workflow to encapsulate the natural groupings of tasks.
Example
This example is similar to the one in the Event Wait task section of this document. Workflow with individual tasks: a workflow may have two concurrent pipelines containing various tasks, in this order: Pipeline 1 contains S1 and S2; Pipeline 2 contains S3 and S4. S5 cannot run until all four sessions run.
Workflow converted to internal worklets: Worklet1 contains S1 and S2; Worklet2 contains S3 and S4; S5 will run after both worklets complete.
Lesson 18-2. Timer Task
Description
Timer tasks are used to keep track of the time at which part of the workflow started. They can be based on:
♦ Absolute Time. The user specifies the date and time to start the timer from.
♦ Datetime Variable. The user provides a variable that tells the Timer task when to start the timer from.
♦ Relative Time. The Timer task can start the timer from the start time of the Timer task, the start time of the workflow or worklet, or the start time of the parent workflow.
Business Purpose
Business or operational processing specifications may require that a workflow run to a certain point and then sit idle for a length of time, or until a fixed physical point in time.
Example
A workflow may contain sessions that should only run for a maximum amount of time. A Timer task can be set to wait for the maximum amount of time and then send an email or abort the workflow if the time limit is exceeded. For example, the Timer task could be set to execute one hour after the start of the workflow.
Performance Considerations
There are no real performance considerations for the Timer task.
Lesson 18-3. Control Task
Description
Control tasks are used to alter the normal processing of a workflow. They can stop, abort or fail any workflow or worklet.
Control Options
Business Purpose
When an error condition exists, operational staff may prefer that a workflow or worklet simply stop, abort or fail rather than be emailed that an error exists.
Example
As with the example in the Pre-Defined Event section (Event Wait task), a workflow may have a session that expects a flat file for source data. If the file does not arrive within one hour after the workflow start time, the desired action may be to fail the workflow. A workflow may also have a session that runs through to a successful processing conclusion but contains data row errors. A Control task could be placed after the session, with conditions set to stop, abort or fail the workflow (and an Email task used to notify someone of the issue).
Unit 18 Lab: Load Inventory Fact Table

Business Purpose
The Inventory fact table load runs directly after the Inventory staging table load. Sometimes the Inventory staging table load runs longer than is acceptable and delays the rest of the process. An email needs to be sent to the administrator if the staging load takes longer than 1 hour. The whole process needs to continue running even if the load takes longer than 1 hour.
Technical Description
The support team has suggested that a worklet be created to control the loading of the staging table. This worklet will contain the staging load Session task, a Timer task to keep track of the run time, an Email task to inform the administrator should the load take longer than 1 hour, and a Control task to stop the worklet if the Session task finishes in less than 1 hour. A workflow will be created that contains the worklet and then runs the fact load after the worklet has completed.
Objectives
♦ Create a worklet
♦ Create a Timer task
♦ Create a Control task
♦ Create an Email task
♦ Use a worklet within a workflow

Duration: 25 minutes
HIGH LEVEL PROCESS OVERVIEW (WORKLET)

Start Task > Timer Task > Email Task
Start Task > Session Task > Control Task

HIGH LEVEL PROCESS OVERVIEW (WORKFLOW)

Start Task > Worklet > Session Task
PROCESSING DESCRIPTION (DETAIL)
Worklet: The top flow runs a Timer task that sets the timer to 1 hour from the start of the task. This is followed by an Email task that sends an email to the administrator if the Timer task completes. The bottom flow runs the staging load Session task followed by a Control task. When the session finishes, the Control task stops the parent (the worklet); this in turn stops the Timer task and causes the Email task not to run. The worklet still completes successfully.
Workflow: The workflow runs the worklet and then the fact load Session task.
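The race between the two flows can be sketched in shell, with timings scaled down to seconds and `sleep` standing in for the real work. Task names follow the lab; everything else is illustrative.

```shell
#!/bin/sh
# Emulate the worklet: a timer+email branch raced against session+control.
( sleep 10 && echo "eml_MAX_RUN_TIME_EXCEEDED: notify administrator" ) &
TIMER_PID=$!                    # top flow: Timer task, then Email task

sleep 1                         # bottom flow: the staging load session runs
echo "s_m_STG_INVENTORY_LOAD: session complete"
kill "$TIMER_PID" 2>/dev/null   # ctl_STOP_TIMEOUT "Stop parent": the timer is
wait "$TIMER_PID" 2>/dev/null   # stopped, so the email branch never fires
echo "worklet complete"
```

If the "session" instead outlasted the "timer", the email line would print first, which is exactly the behavior the lab is verifying.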
Instructions

Step 1: Copy the Mappings
1. Copy the m_STG_INVENTORY_LOAD and the m_FACT_INVENTORY_LOAD mappings from the DEV_SHARED folder.
2. Rename them m_STG_INVENTORY_LOAD_xx and m_FACT_INVENTORY_LOAD_xx.
3. Save your work.
Step 2: Create a Worklet
1. Launch the Workflow Manager client and sign in to your assigned folder.
2. Open the Worklet Designer workspace.
3. Select the menu option Worklets > Create.
4. Delete the default worklet name and enter wklt_STG_INVENTORY_LOAD_xx.

Velocity Best Practice: The wklt_ prefix for a worklet name is specified in the Informatica Velocity Methodology.
Step 3: Create a Session Task
1. Create a Session task named s_m_STG_INVENTORY_LOAD_xx that uses the m_STG_INVENTORY_LOAD_xx mapping.
2. Edit the s_m_STG_INVENTORY_LOAD_xx session.
   a. Ensure that the filename for the SQ_inventory flat file source is set to inventory.txt.
   b. Set the relational connection value for the STG_INVENTORY target to NATIVE_STGxx, where xx is your student number.
   c. Set the Target truncate table option to on.
   d. In the General tab, select the Fail parent if this task fails check box.
3. Link the Start task to the s_m_STG_INVENTORY_LOAD_xx Session task.
Step 4: Create a Timer Task
1. Add a Timer task to the worklet.
2. Edit the Timer task.
   a. Rename the task tim_MAX_RUN_TIME.
   Velocity Best Practice: The tim_ prefix for a Timer task name is specified in the Informatica Velocity Methodology.
   b. In the Timer tab, set the Relative time to start 1 hour from the start time of this task. See Figure 18-1 for details.
   Figure 18-1. Timer Task Relative time setting
3. Link the Start task to the tim_MAX_RUN_TIME Timer task.
4. Save your work.
Step 5: Create an Email Task
1. Add an Email task to the worklet.
2. Edit the Email task.
   a. Rename the task eml_MAX_RUN_TIME_EXCEEDED.
   Velocity Best Practice: The eml_ prefix for an Email task name is specified in the Informatica Velocity Methodology.
   b. In the Properties tab:
      i. Enter [email protected] as the Email User Name.
      ii. Enter Session s_m_STG_INVENTORY_LOAD_xx exceeded max time allotted as the Email Subject.
      iii. Enter something appropriate for the Email Text.
   See Figure 18-2 for details.
   Figure 18-2. Email Task Properties Tab
3. Link the tim_MAX_RUN_TIME Timer task to the eml_MAX_RUN_TIME_EXCEEDED Email task.
4. Save your work.
Step 6: Create a Control Task
1. Add a Control task to the worklet.
2. Edit the Control task.
   a. Rename the task ctl_STOP_TIMEOUT.
   Velocity Best Practice: The ctl_ prefix for a Control task name is specified in the Informatica Velocity Methodology.
   b. In the Properties tab, select Stop parent for the Control Option value. See Figure 18-3 for details.
   Figure 18-3. Control Task Properties Tab
3. Link the s_m_STG_INVENTORY_LOAD_xx Session task to the ctl_STOP_TIMEOUT Control task.
4. Save your work.
5. Right-click anywhere in the Worklet workspace and select Arrange > Horizontal. Your worklet should appear the same as displayed in Figure 18-4.
Figure 18-4. Completed Worklet
Step 7: Create the Workflow
1. Create a workflow named wkf_FACT_INVENTORY_LOAD_xx.
2. Drag the wklt_STG_INVENTORY_LOAD_xx worklet from the Worklets folder in the Navigator window into the workflow.
3. Link the Start task to the wklt_STG_INVENTORY_LOAD_xx worklet.
4. Create a Session task named s_m_FACT_INVENTORY_LOAD_xx that uses the m_FACT_INVENTORY_LOAD_xx mapping.
5. Edit the s_m_FACT_INVENTORY_LOAD_xx session:
   a. Set the relational connection value for the SQ_STG_INVENTORY source to NATIVE_STGxx, where xx is your student number.
   b. Set the relational connection value for the FACT_INVENTORY target to NATIVE_EDWxx, where xx is your student number.
   c. Set the Target load type to Normal.
6. Link the wklt_STG_INVENTORY_LOAD_xx worklet to the s_m_FACT_INVENTORY_LOAD_xx Session task.
7. Add a link condition to ensure that the Session task only executes if the wklt_STG_INVENTORY_LOAD_xx worklet did not fail. Hint: worklet status != FAILED. Your workflow should appear the same as displayed in Figure 18-5.
Figure 18-5. Completed Workflow
Step 8: Start the Workflow and Monitor the Results
1. Start the workflow.
2. Review the workflow in the Gantt Chart view of the Workflow Monitor. When completed, it should appear similar to Figure 18-6.
Figure 18-6. Gantt chart view of the completed workflow run
Unit 19: Workflow Design

In this unit you will learn about:
♦ Designing workflows

The workshop will give you practice in designing your own workflow.
Lesson 19-1. Designing Workflows

Description
This lesson provides a checklist of topics to consider during the workflow development process. It covers a variety of situations developers will have to address and helps them ask the right questions before and during the design process.
Considerations The workflow process requires some up front research. Before designing a workflow, it is important to have a clear picture of the task-to-task processes. ♦
Design a high-level view of the workflow and document the process within the workflow, using a textual description to explain exactly what the workflow is supposed to accomplish and the methods or steps it will follow to accomplish its goal.
♦
The load development process involves the following steps: ♦
♦
Clearly define and document all dependencies ♦ Analyze the processing resources available ♦ Develop operational requirement ♦ Develop tasks, worklets and workflows based on the results Create an inventory of Worklets and Reusable tasks. This list is a 'work in progress' list and will have to be continually updated as the project moves forward. The lists are valuable to all but particularly for the lead developer. Making an up front decision to make all Session, Email and Command tasks reusable will make this easier.
♦
The administrator or lead developer should put together a list of database connections to be used for Source and Target connection values.
♦
Reusable tasks need to be properly documented to make it easier for other developers to determine if they can/should use them in their own development.
♦
If the volume of data is sufficiently low for the available hardware to handle, you may consider volume analysis optional, developing the load process solely on the dependency analysis. Also, if the hardware is not adequate to run the sessions concurrently, you will need to prioritize them. The highest priority within a group is usually assigned to sessions with the most child dependencies.
♦
Another possible component to add into the load process is sending e-mail. Three e-mail options are available for notification during the load process: ♦
♦ Post-session e-mails can be sent after a session completes successfully or when it fails
♦ E-mail tasks can be placed in workflows before or after an event or series of events
♦ E-mails can be sent when workflows are suspended
♦ Document any other information about the workflow that is likely to be helpful in developing it. Helpful information may include, for example, source and target database connection information, pre- or post-workflow processing requirements, and any information about specific error handling for the workflow.
Unit 19: Workflow Design Informatica PowerCenter 8 Level I Developer
♦ Create a Load Dependency Analysis. This should list all sessions by dependency, along with all other events (Informatica or otherwise) that they depend on. Also be sure to specify the dependency relationship between each session or event, the algorithm or logic needed to test each dependency condition during execution, and the impact of each possible dependency test result (e.g., do not run a session, fail a session, fail a parent workflow or worklet, etc.).
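A dependency analysis like this can be checked mechanically. The sketch below, with hypothetical session and event names, uses Kahn's algorithm to derive a valid run order from a dependency table and to detect circular dependencies; it illustrates the analysis, not any PowerCenter facility:

```python
from collections import deque

def execution_order(dependencies):
    """Return a valid run order for a map of session/event -> list of
    prerequisites, raising if the dependencies contain a cycle."""
    indegree = {s: len(parents) for s, parents in dependencies.items()}
    children = {s: [] for s in dependencies}
    for child, parents in dependencies.items():
        for parent in parents:
            children[parent].append(child)
    ready = deque(s for s, d in indegree.items() if d == 0)
    order = []
    while ready:
        session = ready.popleft()
        order.append(session)
        for child in children[session]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(dependencies):
        raise ValueError("circular dependency detected")
    return order

# Hypothetical entries from a load dependency analysis
deps = {
    "ind_file_event": [],
    "s_stage_a": ["ind_file_event"],
    "s_stage_b": ["ind_file_event"],
    "s_load_final": ["s_stage_a", "s_stage_b"],
}
order = execution_order(deps)
```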
♦ Create a Load Volume Analysis. This should list all the sources and the row counts and row widths expected for each session, including all Lookup transformations in addition to the extract sources. The amount of data that is read to initialize a lookup cache can materially affect the initialization and total execution time of a session.
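The arithmetic behind a volume analysis is simply rows times average row width per source. The sketch below uses made-up source names and figures to show the estimate, including a lookup source whose full contents are read to build its cache:

```python
def estimate_volume(sources):
    """Estimate per-source and total data volume (in bytes) for a session
    from expected row counts and average row widths."""
    report = {name: rows * width for name, (rows, width) in sources.items()}
    report["TOTAL"] = sum(report.values())
    return report

# Hypothetical sources: (expected rows, average row width in bytes)
sources = {
    "CUSTOMERS (extract)": (250_000, 180),
    "PRODUCTS (lookup cache)": (40_000, 90),
}
volume = estimate_volume(sources)
```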
♦ The completed workflow design should then be reviewed with one or more team members for completeness and adherence to the business requirements. The design document should be updated if the business rules change or if more information is gathered during the build process.
Workflow Overview
[Diagram: a sample workflow containing a Start task, a Command task, Session Task 1, a Decision task, Session Task 2, a Control task, and an Email task]
Workflow Specifics
The following tips (in no particular order) will make the workflow development process more efficient:
♦ If developing a sequential workflow, use the Workflow Wizard to create Sessions in sequence. There is also an option to create dependencies between the sessions.
♦ Use a parameter file to define the values for parameters and variables used in a workflow, worklet, mapping, or session. A parameter file can be created with a text editor such as WordPad or Notepad; list the parameters or variables and their values in the file. Parameter files can contain the following types of parameters and variables:
♦ Workflow variables
♦ Worklet variables
♦ Session parameters
♦ Mapping parameters and variables
When you use parameters or variables in a workflow, worklet, mapping, or session, the Integration Service checks the parameter file to determine the start value of each parameter or variable. Use a parameter file to initialize workflow variables, worklet variables, mapping parameters, and mapping variables. If you do not define start values for these parameters and variables, the Integration Service checks for the start value of the parameter or variable in other places.
♦ Session parameters must be defined in a parameter file. Since session parameters do not have default values, when the Integration Service cannot locate the value of a session parameter in the parameter file, it fails to initialize the session. To include parameter or variable information for more than one workflow, worklet, or session in a single parameter file, create separate sections for each object within the parameter file.
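As an illustration, a parameter file with separate sections per object might look like the following. The folder, workflow, session, and parameter names here are hypothetical, and section-header conventions can vary by PowerCenter version, so treat this as a sketch rather than a template to copy:

```
[Global]
$$CompanyName=ACME

[MyFolder.WF:wkf_LOAD_ALL_STAGING_TABLES]
$$LoadDate=04/28/2006

[MyFolder.WF:wkf_LOAD_ALL_STAGING_TABLES.ST:s_m_STG_EMPLOYEES]
$DBConnectionSource=ORA_SRC
$InputFile1=/data/incoming/employees.dat
```

Values in a more specific section (a session) generally take precedence over values in a broader one (the workflow or `[Global]`) for that object.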
♦ You can also create multiple parameter files for a single workflow, worklet, or session and change the file that these tasks use as necessary. To specify the parameter file that the Integration Service uses with a workflow, worklet, or session, do either of the following:
♦ Enter the parameter file name and directory in the workflow, worklet, or session properties
♦ Start the workflow, worklet, or session using pmcmd and enter the parameter file name and directory on the command line
♦ On hardware systems that are under-utilized, you may be able to improve performance by processing partitioned data sets in parallel in multiple threads of the same session instance running on the Integration Service node. However, parallel execution may impair performance on over-utilized systems or systems with smaller I/O capacity.
♦ Incremental aggregation is useful for applying captured changes in the source to aggregate calculations in a session. If the source changes only incrementally, and you can capture those changes, you can configure the session to process only the changes. This allows the Integration Service to update your target incrementally, rather than forcing it to process the entire source and repeat the same calculations each time you run the session.
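The idea behind incremental aggregation can be shown with a small sketch. PowerCenter maintains its aggregate cache internally; the dictionary below is only a stand-in, with made-up keys and amounts, to illustrate updating saved totals from captured changes instead of recomputing from the full source:

```python
def incremental_aggregate(cache, changed_rows):
    """Apply only the captured source changes to previously saved
    aggregate totals (conceptual stand-in for an aggregate cache)."""
    for key, amount in changed_rows:
        cache[key] = cache.get(key, 0) + amount
    return cache

# Totals persisted from the previous session run
cache = {"EAST": 1000, "WEST": 500}

# Only the rows that changed since the last run are processed
changes = [("EAST", 25), ("NORTH", 300)]
totals = incremental_aggregate(cache, changes)
```

The unchanged WEST total is never touched, which is where the time savings over a full re-aggregation come from.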
♦ Target load strategies:
♦ Load directly into the target when the data is going to be bulk loaded
♦ Load into flat files and bulk load using an external loader
♦ Load into a mirror database
♦ From the Workflow Manager Tools menu, select Options and select the option to 'Show full names of task'. This will show the entire name of each task in the workflow.
Unit 19 Workshop: Load All Staging Tables in a Single Workflow
Business Purpose
All of the staging tables need to be loaded in a single workflow.
Technical Description
The instructions provide enough detail for you to design and build a workflow that loads all of the staging tables in a single run. If you are unclear about any of the instructions, please ask the instructor.
Objectives
♦ Design and create a workflow to load all of the staging tables
Duration
60 minutes
Workshop Details
Mappings Required
This section contains a list of the mappings that will be used in the workflow:
♦ m_Stage_Payment_Type
♦ m_Stage_Product
♦ m_Dealership_Promotions
♦ m_Stage_Customer_Contacts
♦ m_STG_TRANSACTIONS
♦ m_STG_EMPLOYEES
Workflow/Worklet Details
This section contains the workflow processing details.
1. Name the workflow wkf_LOAD_ALL_STAGING_TABLES. The workflow needs to start at a certain time each day. For this workshop you can set the start time to be a couple of minutes from the time you complete the workflow. Remember that the start time is relative to the time on the Integration Service process machine.
2. No sessions can begin until an indicator file appears. The indicator file will be named fileindxx.txt, and you will create it using any text editor. You will need to place this file in the directory indicated by the instructor after you start the workflow. If you are in a UNIX environment, you may skip this requirement.
3. In order to utilize the CPUs more efficiently, you will want to run some of the sessions concurrently and some of them sequentially:
a. The sessions containing mappings m_Stage_Payment_Type, m_Stage_Product, and m_Dealership_Promotions can be run sequentially.
b. The session containing mapping m_Stage_Customer_Contacts can be run concurrently with the sessions in the previous bullet point.
c. If any of the previous sessions fails, then an email should be sent to the administrator and the workflow aborted. Use [email protected] as the Email User Name.
d. The session containing mapping m_STG_EMPLOYEES can only be run after the 4 previously mentioned sessions complete successfully.
e. The session containing mapping m_STG_TRANSACTIONS needs to run concurrently with m_STG_EMPLOYEES.
f. If either of the previous sessions fails, an email should be sent to the administrator.
4. All sessions need to truncate the target tables. You may want to create reusable sessions from previously created workflows.
5. Management only wants the workflow to run for a maximum of 50 minutes. Should the workflow take longer than 50 minutes, an email must be sent to the administrator. Should the workflow finish in the allotted time, the Timer task will need to be stopped.
There is more than one solution to the workshop. You will know that your solution has worked when all of the sessions have completed successfully.
Unit 20: Beyond This Course
Note: For more information on PowerCenter training, see http://www.informatica.com/services/education_services
Note: For more information and to register to take an exam, see http://www.informatica.com/services/education_services/certification/default.htm