Designing a Chip: Challenges, Trends, and Latin America Opportunity
Victor Grimblatt, R&D Group Director
© Synopsys 2012
SASE 2012
Agenda
• Introduction
• The Evolution of Synthesis
• SoC
• IC Design Methodology
• New Techniques and Challenges
• IP Market, an opportunity for Latin America
Introduction
Interesting Facts from Cisco
• Last year’s mobile data traffic was eight times the size of the entire global Internet in 2000
• Global mobile data traffic grew 2.3-fold in 2011, more than doubling for the 4th year in a row
• Mobile video traffic exceeded 50% for the first time in 2011
• Average smartphone usage nearly tripled in 2011
• In 2011, a 4th generation (4G) connection generated 28x more traffic on average than a non-4G connection
Source: Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2011–2016, Feb 14, 2012
Drives Exploding Need for Bandwidth and Storage
Bandwidth Increase
[Chart: A Decade of Digital Universe Growth, axis 0 to 8000. 2005: 130 Exabytes; 2010: 1.2 Zettabytes; 2015: 7.910 Zettabytes]
• One zettabyte = stacks of books from Earth to Pluto 20 times (72 billion miles)
• If an 11 oz. cup of coffee equals 1 gigabyte, then 1 zettabyte would have the same volume as the Great Wall of China
Source: IBS and Cisco
Tomorrow’s World
• Reality, Augmented Reality, Blended Reality
• Search, Agents, Info That Finds You (and networks that know you)
• 2D, 3D, Immersive Video, Holographics
• Medical, Mobile Medical, Personal Medical
• Person to Person, Machine to Machine, Human Machines
What the Future Has in Store
How Does This Affect Design?
Megatrends Change Design Requirements
Used to Be…
• Computing
• Creating Info
• Compute Power
• Business
• At your desk
• Work
Today It’s…
• Connectivity
• Consuming Info
• Battery Power
• Consumer
• Anywhere, anytime
• Entertainment
Trends Drive Process Migration
[Chart: share of designs (0% to 35%) by process node (≥250nm, 180nm, 130nm, 90nm, 65/55nm, 45/40nm, 32/28nm, 22/20nm, <20nm) for the Last, Current, and Next design]
Synopsys Global User Survey, Feb 2012, N = 1290
and Increasing Gate Count
[Stacked bar chart, 2010 vs. 2011, axis 0% to 50%: share of designs by gate count. Data labels as extracted (bar assignment not recoverable): >100M, 13%; 50-100M, 6%; 20-50M, 7%; >100M, 3%; 50-100M, 3%; 10-20M, 5%; 20-50M, 3%; 10-20M, 5%; 5-10M, 9%; 5-10M, 4%; 2-5M, 6%; 2-5M, 7%]
Synopsys Global User Survey, Feb 2012
and Faster Designs
[Stacked bar chart, 2004 to 2011, axis 0% to 100%: share of designs by clock frequency band (≤50MHz, 51-100MHz, 101-200MHz, 201-300MHz, 301-400MHz, 401-500MHz, 501-750MHz, 751MHz-1GHz, 1-2GHz, >2GHz); one segment is labeled 42%]
Synopsys Global User Survey, Feb 2012, N = 962
… while requiring aggressive Power Management
[Stacked bar chart, 2010 vs. 2011, axis 0% to 400%: use of power management techniques]
• Clock gating
• Multi-Vt leakage optimization
• Multi-voltage domains
• Multi-Corner, Multi-Mode (MCMM) optimization
• Dynamic Voltage/Frequency Scaling (DVFS)
• Lower Vdd operation
• MTCMOS/Power gating
• State retention
• Low Vdd Standby
• Library Variables (e.g., multi-channel length libraries)
• Back-biasing/Well-biasing
• Other
Synopsys Global User Survey, Feb 2012, N = 282
Design Challenges are Multiplying
Example of 28-nm challenges
28 nm is 2X harder than 40 nm
• Unidirectional Poly (and other RDRs)
– GF and TSMC have different preferred orientations (N/S v. E/W). Requires separate layouts, verification & test effort.
– No poly for local routing
• Device segmentation
– Limited device sizes, large analog devices broken up into smaller pieces; increases analog area
• 28 nm IP – area increases without circuit innovation
• Complexity – Approximately 1700 design rule checks at 28nm vs. 700 at 65nm
– 8x the # of corners at 65 v. 28nm
– Lower Vddmin resulting in less design headroom
– Metal resistance doubles from 40 nm to 28 nm
• Global versus local Vth variations due to random doping effects
• Device Aging
– Must take into account device degradation over time due to threshold voltage instability (NBTI/PBTI) and mobility degradation (HCI)
[Figure: 40 nm layout vs. 28 nm analog layout; the 28 nm analog layout is 9% larger than 40 nm due to limitations on poly area]
System on a chip
SoC = Software
[Chart: HW & SW Development Costs ($M, $0.50 to $2.50) over Months 1 to 27, stacked by activity: Spec Development, IP Qualification, RTL Development, RTL Verification, Physical Design, Masks, Post-silicon Validation, Design Management, OS Support, Low-Level SW, App-Specific SW]
Source: IBS, Synopsys
Software is Half the Time to Market For a Typical SoC!
… And Half the Cost
[Chart: Cost ($M, $0 to $175), split into Hardware and Software, by Feature Dimension (Transistor Count): 90nm (60M), 65nm (90M), 45/40nm (130M), 32/28nm (180M), 22/20nm (240M)]
Source: IBS and Synopsys, 2011
Unlike Moore… Software Guys are Pessimists
Page’s Law (2009): “Software gets twice as slow every 18 months.”
Wirth’s Law (1995): “Software is getting slower more rapidly than hardware becomes faster.”
What Can We Do About It?
The Evolution of Synthesis
Placement & Routing
Ronald L. Rivest, Charles M. Fiduccia, Robert M. Mattheyses, GE & MIT, 1982
Source: GE, 1986
Logic Synthesis
David Gregory, Karen Bartlett, Aart J. de Geus, Gary D. Hachtel, GE & University of Colorado at Boulder, 1986
Until Late 80’s
The Implementation Flow Was Quite Straightforward
There Was Already a “Wall”…
Front-End
• Schematic Capture • Timing Simulation
Back-End
• Place & Route • DRC/LVS
Early 90’s
The Relationship Needs Improvements Badly:
“Walls” Now Lead to Iterations, Often Out of Control
Front-End
• RTL Simulation • Logic Synthesis
Back-End
• Place & Route • DRC/LVS
Sign-Off
• Delay Calculation • Timing Simulation
Early 00’s, 130nm, 7+ Metals
PC and Astro+Blast+SilEnsemble – The Relationship Matures
Still, Too Many “Walls”, and # of Iterations Too High
Front-End
• RTL Simulation • Logic, Power & Test Synthesis
• Floorplan • Physical Synthesis
Back-End
• Floorplan • P&R
Sign-Off
• Extraction & STA • DRC/LVS
The Evolution Of The Relationship
Convergence!
• 2003, 90 Nanometers: “Interoperability”
• 2005, 65 Nanometers: “Correlation”
• 2007, 45/40 Nanometers: “Look Ahead”
• 2009…, 32/28 Nanometers: “In-Design”
The Evolution Of The Relationship
Quick Summary
• Late 80’s - Early 90’s. Attempt #1:
– Predict the future based on the past
– Wire load models, broken by nanometer wires
• Mid 90’s. Attempt #2:
– Predict the future based on the present
– Front-end floorplanning, broken by “Frankenstein flows”
• Late 90’s – Today. Attempt #3:
– Partner to create the future, rather than attempt to predict it
– Convergence of synthesis and place & route
– But underlying mathematics is different
Logic Synthesis And Place & Route
A Revolutionary… Evolution: Convergence!
[Figures: Logic Compiler, ca. 1986; Design Compiler, 2010.03]
From Equations to Gates, to… Placed and Routable Gates
SoC
What is High-Level Synthesis?
Designer Intent
User inputs:
• High-level algorithm
• Constraints
c a * b c;
Automation using High-Level Synthesis
HLS Results
HLS outputs:
• Synthesizable RTL
• C-model
• RTL testbench
• Scripts for synthesis, verification and downstream tools
Design technology and methodology
• Develop and verify hardware at a higher level of abstraction
– Much smaller code with fewer bugs introduced
– Rapid architecture exploration
• Automate implementation and verification
– Automatic optimizations that equal hand-coded QoR
– Eliminate manual RTL coding & verification
Example benefits
• 2-5X productivity for initial designs
• 5-10X productivity for design re-use
• Increased exploration leading to better results
• Multi-million gate designs in weeks vs. months
High-Level Synthesis Advantage
Traditional Block Design
• Algorithm Design, Architecture Exploration (Spreadsheets), RTL Coding, RTL Verification, Implementation
• Cycle by cycle functional debug
• For single architecture only
HLS-based Block Design
• Algorithm Design, High-Level Design, RTL automatically generated, RTL Verification, Implementation
• Faster design at higher abstraction
• Quickly evaluate multiple architectures
• Faster, more automatic model-to-RTL validation, reduced RTL-level debug
Better Designs, Faster
Changing FPGA Design Methodology
Classic FPGA Methodology: Top Down Implementation
• Best Quality of Results
• May not be suitable for largest FPGA designs (long runtimes and large memory requirements)
“Divide and Conquer”: Top Down Incremental Implementation
• Reduced Quality of Results
• Shorter runtime: preserve unchanged parts
• “Design Preservation”, block based flows, and Incremental P&R with “SmartGuide”
Emerging “Mix and Match”: Bottom Up and Top Down Flow
• Distributed development
• Better design preservation and isolation
• Design style adjustments needed to achieve optimal timing Quality of Results (e.g. registering module boundaries)
Unified RTL Flow for FPGA and SOC
[Figure: Your IP, DesignWare IP, and DesignWare Building Blocks feed two paths, each with its own DW Implementation: FPGA Synthesis (Synplify Premier/Certify) and ASIC Implementation (Galaxy)]
Common RTL from prototype to production: a combination of IP and tools
All DW Building blocks, minPower and Macrocell Blocks are supported in Synplify Premier and Certify for FPGA-based prototyping
Today’s SOC Designs
• Designs are getting larger and larger.
• Schedule stays the same or shorter despite the increases in design complexity.
• Engineering resources are not increasing to handle this complexity.
How can EDA help manage this complexity?
Many Methods of Designing “SOC Design”…
Similar Approach But End Results Vary … Final Product Varies
Building Blocks
Instructions
1. Preheat the oven to 450.
2. Melt butter and chocolate together in the top of a double boiler or in the microwave. Add sea salt.
3. Meanwhile, beat together the egg, egg yolks, and sugar with a whisk or an electric beater until light and slightly foamy.
4. Add the egg mixture to the warm chocolate; whisk quickly to combine. Add flour and stir just to combine. The batter will be quite thick.
5. Butter small ramekins, or use Reynolds foil cupcake liners.
6. Divide the batter evenly among the ramekins. (You can make the cakes in advance to this point and chill them until you're ready to bake. Be sure to bring the batter back to room temperature before baking.)
7. Baking time will depend on your oven; start with 7 minutes for a thin outer shell with a completely molten interior.
8. Melt a little more chocolate to drizzle on top. Sprinkle a little more salt, and serve with berries or ice cream.
Ever Increasing Chip Size Leads to Hierarchical Design
Flat versus Hierarchical
[Chart: Flat vs. Hierarchical design against Instances (3M, 5M, 15M, …, 100M+), with a Typical Threshold marked]
Ten Best Practices for Hierarchical Design
Understanding These Practices Can Help
#1 Floorplan: Affects design closure
#2 Top-Level Style: Requires different discipline
#3 Block Size: Tradeoff size versus TAT
#4 Modeling: Modeling for top-level closure
#5 Top-Level Closure: Meeting the inter-block signals
#6 Block-Level I/O Paths: Affects block design closure
#7 Block-Level Drivers/Loads: Affects block boundary closure
#8 Inter-Block Critical Paths: Absence helps chip closure
#9 Constraints Management: Affects design closure & TAT
#10 Signoff STA: Correlates to close timing
#1 Floorplan Affects Design Closure
• Partitioning Guidelines
– Logical connectivity
– Clock
– Voltage areas
– Physical size
– Multiple Instantiated Modules (MIM)
• Macro Placement
• Power Planning
• IO Planning
[Figures: Example 1 and Example 2, each showing a Challenge floorplan vs. a Better Approach]
#2 Top-Level Style Requires Different Design Discipline
[Figure: top-level styles Channel, Narrow Channel, and Abutted along an Implementation Complexity axis, with clock and data routing indicated]
#3 Block Size
Tradeoff Size versus TAT (turnaround time)
[Figure: two partitionings of a design, with block sizes ranging from 1.5M to 5M instances]
• Smaller blocks: faster TAT per block but more blocks to integrate
• Larger blocks: longer TAT per block but fewer blocks to integrate
What Is a Reasonable Size? It Depends a Lot on Design Team Preference
Note: Block Size in instances
#4 Modeling
ETM vs. Abstract Model
Extracted Timing Model (ETM)
• Blocks modeled by timing arcs only
• Used for customized IP
Abstract Model
• Interface cells of each block retained
• Recommended for P&R blocks
#5 Top-Level Closure
Meeting Timing on Inter-Block Signals
• Closing top-level inter-block signals can be challenging
• Can be minimized with
– Proper estimation of interface constraints
– Proper floorplanning for signal connectivity between blocks
• Simultaneous optimization of top-level and inter-block signals needed
#6 Block Level I/O Paths
I/O Paths Are Typically Not Finalized Early
[Figure: Typical Hierarchical Structure. The Block Under Design sits between two Adjacent Blocks, with logic between the registers on both sides of each block boundary]
• I/O paths are not finalized during early stage block design
• Overconstraining these paths directs the tool to focus on I/O paths instead of the intra-block paths
• Accuracy of proportional time budgets is affected if interfaces are still changing
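The idea of a proportional time budget can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the segment names, and the delay numbers are hypothetical, and the split shown (each segment receives a share of the clock period proportional to its current estimated delay) is one common reading of the term, not the algorithm of any particular tool.

```python
def proportional_budget(clock_period, segment_delays):
    """Split a clock period across path segments in proportion to their
    current estimated delays (hypothetical helper, for illustration only)."""
    total = sum(segment_delays.values())
    return {name: clock_period * delay / total
            for name, delay in segment_delays.items()}


# Assumed estimates (ns) for one path that crosses two blocks and the top level.
estimates = {"block_a": 0.9, "top": 0.3, "block_b": 1.8}
budgets = proportional_budget(2.0, estimates)
for name, budget in budgets.items():
    print(f"{name}: {budget:.2f} ns")
```

If the interface logic in one block is still changing, its estimate changes, and every other segment's budget shifts with it, which is why the budgets lose accuracy while interfaces are in flux.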
#6 Block Level I/O Paths
Registering Block Outputs Makes Budgeting Easier
[Figure: A Better Approach. The Block Under Design and both Adjacent Blocks drive their outputs directly from registers]
• Registering block outputs makes budgeting easier and less dependent on completeness of the netlist
• Re-partitioning logic hierarchy helps manage constraints complexity
• Partitioning according to power domains / logic hierarchy makes the flow easier
#7: Block Level Drivers and Loads
Modeling I/O with Realistic Values Drives Convergence
• Block interface timing is one of the toughest issues in hierarchical flow
• Realistic model of your input and output ports helps design convergence
[Figure: Block A output port A drives Block B input port B]
• When designing Block A, need to consider load at output port A – set_load
• When designing Block B, need to consider driving cell at input port B – set_driving_cell
#7: Block Level Drivers and Loads
Inter-block Paths Are One Of The Toughest SOC Challenges
[Figure: if no load is specified, the cell cannot be sized correctly]
• Without good estimation of loads and driving cell
– Integrating these blocks forces unnecessary iterations to meet timing
• Budgeting can automatically generate driver and load information
– Generate a quick netlist to run through budgeting for more accurate results
#8: Inter-Block Critical Paths
Absence Helps Chip Closure
[Figure: Block to Block path, crossing Top; Top to Block path]
• Avoid critical paths crossing multiple blocks
– Makes timing closure difficult
• Contain them within the same block or, if you must cross multiple blocks, minimize the number of blocks
• Budgeting, sizing, and load estimations are needed to solve inter-block critical path violations. If the tool cannot see the complete path, it may be a challenge to stitch the segments together at the top level
#8: Inter-Block Critical Paths
Shielding Helps Chip Closure
[Figures: Without Shielding; With Shielding]
• Use shielding to reduce crosstalk effects between the block level and top level to significantly improve timing closure in inter-block critical paths
• Use new Transparent Interface Optimization (TIO) in IC Compiler
#9: Constraints Management
Pay Attention to Constraints
• Infeasible paths are paths that are impossible to meet timing
– Missing false path/multi-cycle path constraints
– Unreasonable input/output delay constraints
[Figures: e.g., infeasible path, insufficient for 1 clock cycle; e.g., infeasible path, i/p delay too large]
• Other things to watch out for
– size_only attributes
– dont_touch attributes
– Multi-cycle paths
– False paths
– Etc.
#10 Signoff Correlation
Tighter Correlation Helps Close Timing
• Use IC Compiler signoff correlation checker system
– Performs both consistency and correlation check with user controllable accuracy level
– Supports both pre-route and post-route checks
#10 Signoff Correlation Flows
Flows for Pre-route and Post-route Correlation Checks
Pre-Route Flow
• Focus on environment and library setup for pre-route correlation
• Certain variables for correlation may have runtime and/or QoR impact on optimization
• Correlation setup may change and re-check may be needed for post-route
Today’s Designs Are Big & Hierarchical
Timing Signoff Challenges
• More effects, more variation
– Impacts accuracy vs. runtime
• Hierarchical P&R vs. flat signoff
– Large machines and runtime
– Interactions between top & block
• 30-40% blocks are tough to close
– 10 to 20 ECO iterations
• Lots of scenarios to analyze
– More machines, more reports
Source: L. Besson, STMicroelectronics
The Nanometer Challenges
Top Issues to Look at
(1) SiON Dielectric/Polysilicon Gate; (2) High-k Dielectric/Metal Gate
Source: ITRS 2009; C.A. Malachowsky, NVIDIA, EDPS 2009; P. Saxena, Intel, ISPD 2003
IC Design Methodology
But, Synthesis has Evolved
• Synthesis has evolved beyond logic mapping
• It’s now predicting and resolving congestion for physical design
• Evolution of synthesis prediction of physical effects is key to progress
And, Physical Design Under Heavy Load
• Increasingly, Physical Design is the driver for implementation schedule
• It’s where the rubber meets the road: speed, die-size, power, yield…
• P&R evolution key to progress
What’s on Designer’s Mind?
Design & Project Management!
How close are we to our design goals?
What’s the status of the blocks right now?
How can I use the experience from this project to plan the next one better?
Is everyone using the same tool version and the standard scripts?
How much compute and license resources are we using?
What’s taking up the most time? Which step? Which block?
Many Flavors Of “Methodology”…
Imagination Is the Only Limit…
Source: www.bk.com 2010
Past “Guidance” doesn’t Always Apply to the Present
• create_clock -period [0.7 * target] for high performance
• set_max_area to “0” for small area
• Use small blocks for fast turnaround time
Things have changed but users are still using the above techniques!
[Figure: the flow in four eras, 2000-2005 “Correlation”, 2005-2008 “Look-ahead”, 2009-2010 “In-Design”, and 2011 “Exploration”. Each flow shows Synthesis, Place & Route, DRC / LVS, and Signoff; the figure also labels Design Planning, Synthesis Exploration, and Implementation]
The Past vs. The Present
Wireload Model (WLM) results in higher frequency during Synthesis than using Design Compiler Topographical (DCT) technology …
[Figures: Figure 1 and Figure 2, two placements of the same circuit]
• With WLM, these two circuits have the same delay
• With DCT, the delay is a reflection of the x-y location of the cells
Which is more realistic?
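The contrast between the two estimates can be sketched in Python. Everything here is an assumption made for illustration: the wire load table, the capacitance per micron, the driver resistance, and the lumped R*C delay are placeholders, not the models used by Design Compiler or any other tool. The point is only that a fanout-based estimate cannot distinguish two placements, while a location-based estimate can.

```python
# Illustrative only: both delay models are simplified assumptions.

# Hypothetical wire load table: fanout -> estimated wire capacitance (pF).
WLM_CAP_BY_FANOUT = {1: 0.010, 2: 0.018, 3: 0.025, 4: 0.031}

CAP_PER_UM = 0.0002     # assumed wire capacitance per micron (pF/um)
DRIVER_RES_KOHM = 2.0   # assumed driver resistance (kOhm); kOhm * pF = ns


def wlm_delay(fanout):
    """Wire delay estimated from fanout alone, ignoring cell locations."""
    return DRIVER_RES_KOHM * WLM_CAP_BY_FANOUT[fanout]


def placed_delay(driver_xy, sink_xys):
    """Wire delay estimated from Manhattan distance between placed cells."""
    length_um = sum(abs(driver_xy[0] - x) + abs(driver_xy[1] - y)
                    for x, y in sink_xys)
    return DRIVER_RES_KOHM * CAP_PER_UM * length_um


# Same net, fanout of 1, placed close together and far apart (microns).
near_sinks = [(10, 5)]
far_sinks = [(400, 250)]

print("WLM estimate, either placement:", wlm_delay(1))
print("Location-based, near placement:", placed_delay((0, 0), near_sinks))
print("Location-based, far placement: ", placed_delay((0, 0), far_sinks))
```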
Ten Best Practices for Design Methodology
#1 Libraries: Know Your Attributes
#2 Setup: Correlation and Runtime
#3 Scripts: Impacts Your Design
#4 Constraints: Watch Your Constraints
#5 Analyze: Analyze-Fix-Proceed
#6 Methodology: One or Two Flows
#7 Optimization: Adjust Accordingly
#8 Signoff: Review Your Environment
#9 Performance: Leverage Your EDA Partner
#10 Low Power: Architecture Drives Power
#1 Libraries: Know Your Attributes
Why is my design larger in area? Why is it taking so long to run?
[Figure: Original Area vs. New Area, After Optimization]
Watch for dont_use, dont_touch, and size_only usage in your libraries and scripts
• Attributes are user-controlled to guide optimization
• Restricting optimization may lead to problems
Technology and IP
Make Sure to Have a Good Quality Library
• A properly designed set of library cells gives optimization engines more choice
– Avoid cells sensitive to minor change in load, which impedes convergence
– Footprint-equivalent cells are useful for final-stage optimization w/ minimal perturbation to other design metrics
– Std. cell pins should be on grid (especially complex cells with small drive strength: higher pin density)
– Multiple variants for each flop (drive strengths, delays, setup times, …)
• Library quality is an enabler for targeted performance
Example: Cell Sensitivity To Load Uncertainty
[Chart: Delay vs. Cload for Cell A and Cell B, with points A, B, C*, and D* marked]
#2 Setup: Correlation and Runtime
What do designers do when they run into these?
• Netlist v1.0, SDC v1.0: Compile, 3.2M instances
• Netlist v1.1, SDC v1.1: Compile, 6.8M instances??
What happened???
• Found issues after days of engineering work
• size_only on 3.7M cells
• SDC with all cells set with set_disable_clock_gating on
Review Your Settings and Input
Understand the Different Objectives
DC Utility Checker
• Detects design issues and dirty constraint styles that can lead to bad runtime/memory and QoR
ICC Utility Checker
• Detects readiness of physical design before going into various implementation stages
PT Utility Checker
• Detects application variables, settings and design issues causing runtime or memory increase
#3 Scripts: Impacts Your Design
When someone tells you “Tool A” is X times faster than “Tool B”
[Figure: Incomplete vs. Complete]
Need to put things in perspective …
• First Step: review your script
– How was the script migrated to “Tool A”?
– Did you also update the script to leverage the latest technologies?
• Early stage of your design, think fast mode
• Final stage of your design, think QoR
Tool Input can Impact Results
Understand How the Tool Can Help Meet Design Goals
• Today’s design requires completeness
• Synopsys tools are tailored for performance, but they also have a mode to run fast
• Recommendations
– The typical complaint is long runtime; choose your goal setting accordingly
– Make sure your script is up to date for your end goal and to take advantage of the latest features
#4 Constraints: Watch Your Constraints
Symptoms of over-constraining: long runtime, excessive buffering and huge violations
• Over-constraining could guide the tool to focus on artificial critical paths
[Figure: the Original Clock period divided into Input Delay, Time Available for logic, and Output Delay]
• Over-constraining happens with
– Unrealistic input and/or output delays
– Tightening the clock period
– Specifying large clock uncertainty
Synopsys tools are designed to work towards meeting design goals… but don’t expect miracles!
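The slide's picture of the clock period being consumed by input and output delays reduces to simple arithmetic. The Python sketch below uses a hypothetical helper and made-up numbers (a 2.0 ns target period, 0.6 ns I/O delays, a 0.7 tightening factor, 0.2 ns uncertainty); it models a path from an input port to an output port, as in the figure, and is not a description of how any tool computes slack.

```python
def time_for_logic(clock_period, input_delay, output_delay, clock_uncertainty=0.0):
    """Time left for logic on an input-to-output path (all values in ns).

    Hypothetical helper: the period, minus clock uncertainty, minus the
    delays reserved for the logic outside the block on either side.
    """
    return clock_period - clock_uncertainty - input_delay - output_delay


target_period = 2.0  # assumed target clock period (ns)

# Realistic constraints: part of the cycle is left for the block's own logic.
realistic = time_for_logic(target_period, input_delay=0.6, output_delay=0.6)

# Over-constrained: period tightened to 0.7 * target plus a large uncertainty.
over = time_for_logic(0.7 * target_period, input_delay=0.6, output_delay=0.6,
                      clock_uncertainty=0.2)

print(f"realistic:        {realistic:.2f} ns available for logic")
print(f"over-constrained: {over:.2f} ns available for logic")
```

With these assumed numbers the over-constrained case leaves essentially no time for logic, so every path through the block looks critical and the tool spends its effort on violations that would not exist at the real target.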
Understanding EDA Tool Will Help
Simple Illustration
Will DC do this transformation?
Circuit A (before): CLKA wns = -0.300, CLKB wns = -0.100
Circuit B (after): CLKA wns = -0.280, CLKB wns = -0.150
Cost = ∑ pi * wi

Default Weights       Delay Cost Before    Delay Cost After
CLKA weight = 1       0.30                 0.28
CLKB weight = 1       0.10                 0.15
Total WNS Cost        0.40        <        0.43
Total cost increased. Transformation rejected. Worst WNS = -0.300

Adjusted Weights      Delay Cost Before    Delay Cost After
CLKA weight = 10      3.00                 2.80
CLKB weight = 1       0.10                 0.15
Total WNS Cost        3.10        >        2.95
Total cost reduced. Transformation accepted √. Worst WNS = -0.280
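The accept/reject decision in this illustration can be reproduced with a short Python sketch. The function names are hypothetical, and treating pi as the magnitude of each clock group's negative slack is an assumption made to match the numbers on the slide; the real cost function in Design Compiler is not documented here.

```python
def total_cost(wns_by_clock, weights):
    """Weighted delay cost: sum of weight * violation magnitude per clock group."""
    return sum(weights[clock] * max(0.0, -wns)
               for clock, wns in wns_by_clock.items())


def accepts(before, after, weights):
    """A transformation is kept only if it lowers the total weighted cost."""
    return total_cost(after, weights) < total_cost(before, weights)


circuit_a = {"CLKA": -0.300, "CLKB": -0.100}  # before the transformation
circuit_b = {"CLKA": -0.280, "CLKB": -0.150}  # after the transformation

default_weights = {"CLKA": 1, "CLKB": 1}
adjusted_weights = {"CLKA": 10, "CLKB": 1}

print("default weights: ", accepts(circuit_a, circuit_b, default_weights))
print("adjusted weights:", accepts(circuit_a, circuit_b, adjusted_weights))
```

With equal weights the CLKB degradation outweighs the CLKA gain, so the worst slack stays at -0.300; raising the CLKA weight tells the optimizer which clock group matters most.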
#5 Analyze: Analyze-Fix-Proceed
A Push-Button Flow Does Not Exist
Know your circuit to guide the tool
Synopsys Galaxy Implementation Flow
DC Graphical
• compile_ultra -spg
• insert_dft
• compile_ultra -spg -incr
IC Compiler
• place_opt -spg
• clock_opt
• route_opt
• signoff_opt
StarRC
• Signoff extraction
PrimeTime SI
• Signoff STA
Analyze results between design stages
#6 Methodology: One or Two Flows
Design specifications and constraints change constantly during the design cycle
• One flow for both exploration & implementation
– Example: 180 nanometers (2000), 225K gates, 11 RAMs, 150 MHz
• Two flows: an exploration flow targeted at early specs & constraints, and an implementation flow for final design realization
– Example: 45 nanometers (2010), 96 mm2, ~300M transistors, 7-9W
Exploration Throughout Galaxy
DC Explorer
• Early RTL Exploration
– Accelerates Design Schedules
Design Compiler
• Look-ahead & Physical Guidance
– Creates a better starting point
IC Compiler
• Design Exploration
– Creates initial floorplan
• Block Feasibility
– Determines physical feasibility
Galaxy Constraint Analyzer
• Continuous improvement
[Figure: Exploration and Implementation tracks from RTL to Physical: RTL Exploration and RTL Synthesis, Design Exploration and Design Planning, Block Feasibility and Block Implementation]
#7 Optimization: Adjust Accordingly
Adjust your constraints to model effects of downstream design steps
An Illustration: Design Compiler
• Account for clock trees
• No hold-timing fixing
• Be careful with critical range
• Do not over-constrain
Manage Design Constraints Throughout
Guidelines For Convergent Timing Closure
• Synthesis and placement
– Do not over-constrain during synthesis
– Use DC SPG flow
– Account for max_transition and clock uncertainty
– Specify pre-CTS estimated constraints
• CTS
– Remove pre-CTS estimated constraints
• Route
– Remove/adjust pre-route constraints
– Adjust crosstalk thresholds
Do Not Over-Complicate Your Flow
[Chart: Timing Closure Profile, MHz (800 to 1,100) across Synthesis, Place, Clock, and Route for three flows: RM (Baseline), Addnl. Customization For High-Performance, and Tuned For Hi-Performance/Low Power; data labels 913, 971, and 1029]
#8 Signoff: Review your Environment
Unlike wine, scripts grow stale with age
[Charts: Runtime (CPU Hrs, 0 to 60) and Memory Usage (GB, 0 to 128, with one bar labeled 172 GB) versus Instances (Million): 1.1, 1.2, 5.5, 37.0, 50+]
Designs run at customer site using revised PrimeTime scripts and latest release version
PrimeTime Scripts: Key Areas to Review
• Environment and setup
– Use latest release and ensure adequate hardware resources
• Reading parasitics
– Use binary parasitics when possible
• Multiple timing updates
– Eliminate redundant/legacy update_timing steps
• Inefficient TCL scripting and reporting
PrimeTime Design Utility Checker can help with some of these tasks
#9 Performance: Leverage Your EDA Partner
• Starting Point
– Built on Synopsys RM
– Understand the new technologies and features
– Easy to use
• Reduce time-to-results
– Automated methodology to achieve 90% of target quickly
– Additional advanced techniques to reach final goal
– Minimize number of iterations or “trial and error”
– Reduce ECO efforts
[Figure: Design Schedule for Typical Flow vs. HSLP Flow across Synthesis, P&R, Signoff + ECO, and Iterations]
HSLP Implementation Best Practices Reduces Time-to-Results
High Performance, Low Power (HSLP) Flow Requires Customization
[Chart: Targets reached (75%, 90%, 100%) over Time for the Typical Flow on regular designs, the Typical Flow on high performance designs, and the HSLP Flow with HSLP Implementation Best Practices followed by design-specific customization; the HSLP Flow reduces time-to-results]
#10 Low Power: Architecture Drives Power
DESIGN TECHNIQUES
• Multiple Voltage (MV) Domains
• Level Shifters (LS)
• Isolation Cells (ISO)
• Retention Registers (RR)
• Always-on Logic (AO)
• Power Switches (MTCMOS)
[Figures: cell schematics showing the VDD, VDDB, VDDI, VDDO, and VSS supplies with IN, OUT, EN, and on/off pins; example architectures with 0.9V, 0.7V, and OFF domains labeled Multi-Supply with shutdown, No State Retention; Multi-Voltage with shutdown; Multi-voltage with shutdown & State Retention (SR)]
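How the architecture drives the need for these special cells can be sketched as a rule of thumb in Python. The `Domain` class, the `crossing_cells` helper, and the domain values are hypothetical, and the two rules (a level shifter when the supply voltages differ, an isolation cell when the driving domain can be shut down) are a simplification of common multi-voltage practice rather than the behavior of any specific tool or power-intent format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Domain:
    """A power domain, reduced to the two properties this sketch needs."""
    name: str
    voltage: float        # supply voltage in volts
    can_shut_down: bool   # True if the domain sits behind a power switch


def crossing_cells(source, sink):
    """Special cells a signal needs when it goes from source to sink."""
    cells = []
    if source.voltage != sink.voltage:
        cells.append("level shifter")
    if source.can_shut_down:
        # The sink must not see a floating input while the source is off.
        cells.append("isolation cell")
    return cells


always_on = Domain("always_on", 0.9, can_shut_down=False)
core = Domain("core", 0.7, can_shut_down=True)

print("core -> always_on:", crossing_cells(core, always_on))
print("always_on -> core:", crossing_cells(always_on, core))
```

Retention registers and always-on logic are a separate decision (whether state must survive shutdown), which this sketch leaves out.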
New Techniques and Challenges
The Race to 20nm Is On!
Leading The Way In 20nm Design
The 20 nm Challenge: Single Exposure
“Last Pitch With Single Exposure ~ 80 Nanometers…”
[Figures: We Can Print This,… But We Cannot Print This]
Source: M. van den Brink, ASML, ITF 2009; P. Magarshack, STMicroelectronics, 2010
The Solution: Double Patterning
A Significant Change
[Figures: We Can Print This, and This,… And Then This!]
Synopsys Solution
DPT Ready IC Compiler P&R, and IC Validator DRC
[Figures: Wide Spacing Enforced; Two-Color Decomposed Design]
Source: Synopsys Research 2011
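Two-color decomposition is commonly modeled as graph two-coloring: shapes that sit closer than the single-exposure pitch conflict and must go on different masks. The Python sketch below illustrates that model only; the function name and the shape and conflict lists are hypothetical, and it makes no claim about how IC Compiler or IC Validator implement decomposition.

```python
from collections import deque


def two_color(shapes, conflicts):
    """Assign each shape to mask 0 or 1 so conflicting shapes differ.

    Returns a dict shape -> mask, or None if no two-mask assignment
    exists (the conflict graph contains an odd cycle).
    """
    adjacency = {shape: [] for shape in shapes}
    for a, b in conflicts:
        adjacency[a].append(b)
        adjacency[b].append(a)

    mask = {}
    for start in shapes:
        if start in mask:
            continue
        mask[start] = 0
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in adjacency[current]:
                if neighbor not in mask:
                    mask[neighbor] = 1 - mask[current]
                    queue.append(neighbor)
                elif mask[neighbor] == mask[current]:
                    return None  # two conflicting shapes forced onto one mask
    return mask


# Three parallel wires at tight pitch: neighbors conflict, the outer pair does not.
print(two_color(["w1", "w2", "w3"], [("w1", "w2"), ("w2", "w3")]))

# Three mutually conflicting shapes cannot be split across two masks.
print(two_color(["a", "b", "c"], [("a", "b"), ("b", "c"), ("a", "c")]))
```

The second case is why placement and routing have to be aware of double patterning: enforcing wider spacing removes a conflict edge and makes the layout decomposable.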
Synopsys Solution
DPT Ready IC Compiler P&R, and IC Validator DRC
Source: Synopsys Research 2011
The Challenge: Planar CMOS
Insufficient Performance, Excessive Power
[Chart: Performance and Power for 32 Nanometer Planar]
Source: K. Kuhn, Intel, IDF 2011
The Solution: Non-Planar CMOS
FinFET or Tri-Gate CMOS
[Chart: Performance and Power for 22 Nanometer Tri-Gate]
Source: K. Kuhn, Intel, IDF 2011
The Solution: Non-Planar CMOS
The First “Revolution”
Source: M. Bohr, Intel, YouTube 2011
There Are Many Flavors, But…
Reality and Fantasy Are not the Same Thing!
FinFET Advantages
FinFET vs Planar Transistor
• Superior drive current
– Active region spans the fin height and thickness (3 sides)
– Ids ∝ (2*Hfin + Tfin) as opposed to just thickness for planar
• Reduced leakage
– Depleted substrate
• Enhanced electron mobility
– High-K gate oxide
– Metal gates in place of PolySilicon
– Strained silicon
– Multiple fins possible to increase total drive strength for higher performance
[Figures: Planar and FinFET transistors, with the Inversion Layer and Fin labeled]
Source: Intel
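The drive-current relation above, Ids ∝ (2*Hfin + Tfin), is easy to put into numbers. The Python sketch below is illustrative only: the function name is hypothetical, the fin dimensions are assumed values rather than data from the slide, and the result is an effective channel width, not a current.

```python
def finfet_effective_width(h_fin_nm, t_fin_nm, n_fins=1):
    """Effective channel width of a FinFET in nm.

    The gate wraps three sides of each fin (two sidewalls plus the top),
    so each fin contributes 2*Hfin + Tfin; multiple fins add up.
    """
    return n_fins * (2 * h_fin_nm + t_fin_nm)


# Assumed, illustrative fin dimensions (nm).
h_fin, t_fin = 34, 8

one_fin = finfet_effective_width(h_fin, t_fin)
three_fins = finfet_effective_width(h_fin, t_fin, n_fins=3)

print(f"1 fin:  {one_fin} nm of channel width")
print(f"3 fins: {three_fins} nm of channel width")
```

Because the fin dimensions are fixed by the process, drive strength comes in whole-fin steps, which is why the slide lists multiple fins as the way to increase total drive strength.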
This Is Not The End of Moore’s Law!
But the Gap Between Intel and the Crowd Is Widening
Source: M. Bohr, Intel, IDF 2011
3D ICs: Technology Trends
Four Main Categories of “> 2D-IC” Ahead
[Figure: four numbered categories, with labels Memory “Cube”, (Wide I/O) Memory “Cube” on Logic, Silicon Interposer, and 3D Stack; TSV, C4, and Bump connections are marked]
3D-IC: Two Basic Configurations Emerging
Addressing Gigascale Design Challenges
Silicon Interposer (2.5D)
• Horizontally connected dies
• Drivers: Consumer, Storage, Networking
• Benefits: Yield, Cost, TTM & Flexibility
3D-IC
• Vertically stacked dies with TSVs
• Drivers: Wireless handset, Processors
• Benefits: Performance, form factor
The “Memory Cube” Now
[Figure: 8 die stack, with dimensions of 560 microns and 50 microns marked]
Source: C.-G. Hwang, Samsung, IEDM 2006
IP Market, an opportunity for Latin America
IP
• An intellectual property core, IP core, or IP block is a reusable unit of logic, cell, or chip layout design that is the intellectual property of one party
• IP cores may be licensed to another party or can be owned and used by a single party alone
• IP cores can be used as building blocks within ASIC chip designs or FPGA logic designs
IP
• IP cores in the electronic design industry have had a profound impact on the design of systems on a chip
• IP core licensors spread the cost of development among multiple chip makers
• IP cores for standard processors, interfaces, and internal functions have enabled chip makers to put more of their resources into developing the differentiating features of their chips, leading to new innovations faster
• Licensing and use of IP cores in chip design came into common practice in the 1990s
Semiconductor IP Market Segments
2011 Design IP Revenue: $1.9B
Processors (CPUs, GPUs, DSPs)
• Microprocessors: 39%
• Fixed Function (GPUs, Security): 15%
• DSP: 5%
Other segments
• Wired Interfaces: 19%
• Memory Cells/Blocks: 10%
• GP Analog/MS: 4%
• Other IP: 4%
• Physical libraries: 3%
• Block Libraries: 1%
Source: Gartner, March 2012
Semiconductor IP Market Size and Synopsys Share
Year    Market Size ($M)    Synopsys Share
CY04    964.0               7.9%
CY05    1,068.3             7.6%
CY06    1,267.3             7.3%
CY07    1,378.2             7.2%
CY08    1,464.1             7.2%
CY09    1,351.0             9.1%
CY10    1,695.0             11.3%
CY11    1,910.9             12.4%
Source: Gartner, March 2012
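Two figures can be derived from the market-size table with a few lines of Python: the compound annual growth rate of the market and the revenue implied by a vendor's share. The helper names are hypothetical and the calculation is ordinary arithmetic on the Gartner numbers quoted above, not something stated on the slide; rounded shares give only approximate revenue.

```python
# Market size in $M and Synopsys share, as quoted from Gartner (March 2012).
market_size = {2004: 964.0, 2005: 1068.3, 2006: 1267.3, 2007: 1378.2,
               2008: 1464.1, 2009: 1351.0, 2010: 1695.0, 2011: 1910.9}
synopsys_share = {2004: 0.079, 2005: 0.076, 2006: 0.073, 2007: 0.072,
                  2008: 0.072, 2009: 0.091, 2010: 0.113, 2011: 0.124}


def cagr(first_value, last_value, years):
    """Compound annual growth rate between two values, years apart."""
    return (last_value / first_value) ** (1 / years) - 1


def implied_revenue(year):
    """Revenue ($M) implied by market size and the rounded share."""
    return market_size[year] * synopsys_share[year]


growth = cagr(market_size[2004], market_size[2011], 2011 - 2004)
print(f"Market CAGR, CY04 to CY11: {growth:.1%}")
print(f"Implied Synopsys IP revenue, CY11: ${implied_revenue(2011):.1f}M")
```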
Top Semiconductor IP Vendors (revenue in $M)
Rank  Company                    2010     2011     Growth    2011 Share
1     ARM Holdings               575.8    732.5    27.2%     38.3%
2     Synopsys                   191.8    236.2    23.2%     12.4%
3     Imagination Technologies   91.5     126.4    38.1%     6.6%
4     MIPS Technologies          85.3     72.1     -15.5%    3.8%
5     Ceva                       44.9     60.2     34.1%     3.2%
6     Silicon Image              38.5     42.8     11.2%     2.2%
7     Rambus                     41.4     38.9     -6.0%     2.0%
8     Tensilica                  31.5     36.3     15.2%     1.9%
9     Mentor Graphics            27.3     23.6     -13.8%    1.2%
10    AuthenTec                  19.6     22.8     16.3%     1.2%
Source: Gartner, March 2012
IP Vendors Also Need to Provide More Functions and Functionality
[Chart, 2005 to 2014: Total Number of IP Blocks per SoC (Avg. # IP Blocks per SoC), split into IP Blocks and IP Subsystems, plotted with % Design Reuse; axis scales 0 to 120 and 10 to 70]
Source: Semico, October 2010
Subsystems: The Next Evolution in The IP Market
What is a Subsystem?
• Complete Solution: HW, SW, Prototype
• Pre-integrated and Verified
• SoC Ready: Seamlessly Drop-in and Go
Thank You