Designing-in DDR2 Memories on PCBs using Allegro PCB SI 15.7
Kai Keskinen
Celestica Design Services



DDR2 papers and interviews are becoming increasingly popular on the Cadence user community so cdnusers asked Engineering Manager Kai Keskinen to talk to us about how he uses Allegro PCB SI 15.7 to design-in DDR2 memories.

Designing-in DDR2 Memories on PCBs using Allegro PCB SI 15.7—from pre-layout simulation to post-layout verification


Introduction
Designing in DDR2 memories is different from working with DDR, although there are many similarities. Working with DDR2 DIMMs is much the same as working with DDR DIMMs, except that data rates and signal edge rates are significantly higher with DDR2, so approaches used with DDR may no longer work. The biggest differences are in embedded designs.

Higher-level design requirements drive memory controller selection, memory speed, and memory size (depth or width expansion). Similarly, board size and maximum board thickness are constrained by the platform. Military/avionics OEMs almost never use DIMMs; for ruggedization reasons, they use embedded designs. Often the starting stackup comes from the previous design on the platform. We almost never get a board where real estate is not an issue. Many of the designs we get use blind and buried vias. Some also use via-in-pad, which is easier to work with than standard via technology because routing channels are not blocked by vias.


Developing Constraints through simulation
Pre-layout simulation starts with the memory controller and the memory. Sometimes, even at this stage, we are already constrained by a preliminary placement dictated by board real estate or thermal requirements. For example, all five memories must be placed as close to the controller as possible, with three on the primary side and two on the secondary side, or all 12 must sit in a 3x4 group on the secondary side.

We then perform simulations and timing analysis to determine the optimum routing for the placement. Depending on the memory controller and memory arrangement, matching requirements may force too much serpentining, so the preliminary placement has to be adjusted.

With width expansion, data is point to point and there are no issues. With depth expansion, the controller can drive up to four SDRAM or DIMM data pins. This is where it gets difficult. On a memory write you usually have a lot of freedom, but on a memory read you can quickly get into trouble unless you select a good placement and topology.

Strobes are a bit more forgiving than data because they are differential, but the JEDEC standard says they have to be monotonic in the switching region. At the pin, we often see non-monotonicity, but the signal at the die is usually clean. We recently had a case of several existing boards that had been simulated with a Rev 1.0 silicon model and then built with no issues. With Rev 2.1 silicon models, the same boards showed non-monotonicity at the pin, but the signal at the die was good. Rev 2.1 increased the edge rate significantly to allow the CPU to drive an 800Mbps interface, even though the designs currently run at half that.
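The monotonicity requirement can be checked on exported waveform samples. The following is a minimal sketch, not the tool's own measurement: the function name, the sample values, and the +/-0.25 V differential switching window are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def is_monotonic_in_window(samples: Sequence[float], v_low: float,
                           v_high: float, rising: bool = True) -> bool:
    """Return True if the edge never reverses direction while it is
    inside the switching window [v_low, v_high].

    A step is checked whenever either of its two samples lies in the window.
    """
    for prev, curr in zip(samples, samples[1:]):
        touches_window = (v_low <= prev <= v_high) or (v_low <= curr <= v_high)
        if not touches_window:
            continue
        step = curr - prev
        if (rising and step < 0) or (not rising and step > 0):
            return False
    return True


# Hypothetical differential strobe samples (volts) for a rising edge.
at_die = [-0.9, -0.5, -0.2, 0.1, 0.4, 0.9]
at_pin = [-0.9, -0.5, -0.1, 0.05, -0.02, 0.3, 0.9]  # small reversal near the crossing

die_ok = is_monotonic_in_window(at_die, -0.25, 0.25)
pin_ok = is_monotonic_in_window(at_pin, -0.25, 0.25)
```

Running the same check on the pin and die waveforms of one net makes the "poor at the pin, clean at the die" situation easy to spot across many nets.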

Clocks can also be easy if you have one clock output per SDRAM or DIMM. A clock fanout buffer or PLL also makes life easy if you have one output pair per SDRAM or DIMM. If you have one clock output to two SDRAMs, or as often happens, you end up with one clock to several SDRAMs and one clock that is shared, you have to take care to ensure you have a good edge at both devices. Usually a balanced T or a daisy chain with parallel termination after the last device works well. You then have to take into consideration the heavier load on the clock going to two devices for timing reasons.

Another issue is timing within the daisy chain. How do you match the address to the clocks? This is a function of the memory controller timing specifications. For some controllers, the clock and address lengths at each DIMM have to match with some delta and tolerance. Some allow you to make the clocks the length of the mean of the address lengths, and so on. This has to be verified with detailed timing analysis before you start routing. The implications for data also have to be considered, since strobe is usually matched to clock with some delta, but data has to be matched to the strobe with a much finer tolerance. You have to minimize serpentining of data because it eats up space and layers quickly.
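As an illustration of the "mean of the address plus a delta, within a tolerance" style of rule, here is a minimal sketch. The function names and every length, delta, and tolerance value are hypothetical; the real numbers come from the memory controller's timing specification.

```python
from statistics import mean


def clock_target_length(address_lengths_mils, delta_mils=0.0):
    """Target clock length: mean of the address lengths plus a fixed delta."""
    return mean(address_lengths_mils) + delta_mils


def within_tolerance(length_mils, target_mils, tolerance_mils):
    """True if a routed length is within +/- tolerance of its target."""
    return abs(length_mils - target_mils) <= tolerance_mils


# Hypothetical address lengths from the controller to one DIMM, in mils.
address_lengths = [2400, 2550, 2475, 2575]
target = clock_target_length(address_lengths, delta_mils=100.0)

clock_ok = within_tolerance(2620, target, tolerance_mils=25)
clock_too_long = within_tolerance(2700, target, tolerance_mils=25)
```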


Tweaking constraints during placement
On multi-DIMM interfaces, address is a daisy chain with the pullup to Vtt after the last DIMM, and it is usually pretty simple. Address on embedded designs is probably the most difficult to implement. Depending on placement and timing, you really want a daisy chain with the pullup to Vtt at the end. Sometimes the timing does not allow this, so you have to use a balanced T. The pullup is usually placed at the node of the T. The matching requirement for the branches of the T is a function of interface speed: it gets tighter as you go faster.

Several of the designs mentioned above in the discussion of strobes use a daisy chain for address. In our simulations, the higher edge rate of the Rev 2.1 die causes the signal at the first SDRAM in the daisy chain to ring back through the AC and DC thresholds, causing a timing violation. There is enough reflection from the pullup and the other packages to cause this ring back at the higher edge rate. Our end customer is investigating this heavily, since it affects several designs already in the field. An AC termination at the driver is required to fix this, but a retrofit is not possible for density reasons.

Getting ready to route
O.K., so now we have a placement and routing rules (hopefully in Constraint Manager, or CM), and we can start routing. Unfortunately, even with 15.7, you cannot constrain everything without doing some interactive work at times.

The simplest configuration is one memory controller to one DIMM. Because the DIMM is wider than the controller, the longest routes are the ones to the outside edges of the DIMM. That longest length drives the constraints for all the other nets. You end up with a lot of serpentining for the nets near the middle of the DIMM, which forces you to use more layers.
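The arithmetic behind this is simple enough to sketch. The example below assumes, purely for illustration, that every net must come within a tolerance of the longest net; the net names, lengths, and 50 mil tolerance are hypothetical.

```python
def serpentine_needed(routed_lengths_mils, tolerance_mils=0.0):
    """Extra length each net needs to come within tolerance of the longest net."""
    target = max(routed_lengths_mils.values())
    return {
        net: max(0.0, target - tolerance_mils - length)
        for net, length in routed_lengths_mils.items()
    }


# Hypothetical direct-route lengths: edge nets are long, middle nets are short.
lengths = {"DQ0": 3200, "DQ31": 2100, "DQ63": 3150}
extra = serpentine_needed(lengths, tolerance_mils=50)
```

In this made-up case the middle net needs over 1000 mils of added length while the edge nets need none, which is the pattern that pushes serpentines onto extra layers.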

On an embedded design, for clocks, the lack of math ability in 15.7 (fixed in 16.0) means you have to measure the clock or strobe lengths to determine the length of the starburst or PLL feedback line. So you route clock and strobe, fix them, and then constrain the starburst or feedback line.

One very recent design used four DIMMs per memory controller with two memory interfaces interleaved so we had eight DIMMs in a 1x8 group. The CPU vendor provided detailed written constraints that had to be put into CM. This again ended up as an interactive design.

The routing to the first DIMM was pretty much the same as an interface to one DIMM. Beyond that, applying the written constraints in CM got hairy. For a given byte lane, at each DIMM, you had to match data to strobe within +/-100 mils. If you applied this constraint as written, i.e. from the memory controller to each DIMM, it was unworkable in practice because the routing to the first DIMM usually used up the whole budget, whether we autorouted or routed manually. Instead, the constraints had to be +/-25 mils for the first DIMM, then +/-50 for the second, +/-75 for the third, and +/-100 for the fourth. The strobes were treated as a matched group with a looser tolerance, but all signals (DQ, DQS, and address/control) had to be matched within +/-500 mils at each DIMM. At that point we had routed our byte lanes but not the clocks. The clocks were actually the last thing routed. Their length was the mean of the data lengths plus a delta that depended on the DIMM slot.
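The stepped budget (+/-25, +/-50, +/-75, +/-100 mils) is the final tolerance split evenly across the four slots. A minimal sketch of checking a byte lane against it follows; the function names and the cumulative lengths are hypothetical, and this is not how Constraint Manager itself evaluates the rule.

```python
def stepped_tolerance(slot, final_tol_mils=100.0, n_slots=4):
    """Tolerance allowed at a given DIMM slot (1-based): 25, 50, 75, 100 by default."""
    return final_tol_mils * slot / n_slots


def check_byte_lane(dq_to_slot_mils, dqs_to_slot_mils, final_tol_mils=100.0):
    """Compare cumulative DQ and DQS lengths from the controller to each slot."""
    n_slots = len(dq_to_slot_mils)
    results = []
    for slot, (dq, dqs) in enumerate(zip(dq_to_slot_mils, dqs_to_slot_mils), start=1):
        tol = stepped_tolerance(slot, final_tol_mils, n_slots)
        results.append(abs(dq - dqs) <= tol)
    return results


# Hypothetical cumulative lengths (mils) from the controller to DIMM 1..4.
dqs = [3000, 3600, 4200, 4800]
dq_good = [3020, 3640, 4270, 4890]  # mismatch grows 20, 40, 70, 90
dq_bad = [3060, 3650, 4260, 4880]   # 60 mils off at the first DIMM

good = check_byte_lane(dq_good, dqs)
bad = check_byte_lane(dq_bad, dqs)
```

The second lane would pass a flat +/-100 mil check at every slot, but it fails the stepped check at the first DIMM because it has already spent more than that slot's share of the budget.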

So now we had a routed board file and we needed to simulate.


Post-layout verification
The first issue was simulation time. Take the case of the four DIMMs per memory controller. For complete coverage, both interfaces should be done.

We had a case where we had four CPUs, each with eight DIMMs hanging off it. (4 DIMMs + CPU) x 72 bits x (fast, typical, slow) = 1080 simulations, for data alone. However, the recommended ODT settings were such that on a memory write, the ODT was off on the receiving DIMM and on for the other three. This meant that when using probe, we had to run four different sets of simulations, each time using a different design link. We use home-grown scripts to extract the delay numbers into the timing spreadsheets. In this design, the DIMM closest to the controller had the worst signal. At 533Mbps, the slow case was marginal, and at 667Mbps, it didn't work.
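The simulation count and the four write ODT configurations can be enumerated directly. This is only a sketch of the bookkeeping: it reads "(4 DIMMs + CPU)" as five devices on one interface, and the device labels and dictionary layout are hypothetical, not the format of the home-grown scripts mentioned above.

```python
from itertools import product

# Five devices on one interface: the controller plus four DIMMs.
devices = ["CPU", "DIMM1", "DIMM2", "DIMM3", "DIMM4"]
data_bits = range(72)
corners = ["fast", "typical", "slow"]

# One data simulation case per (device, bit, corner) combination.
cases = list(product(devices, data_bits, corners))
n_cases = len(cases)  # 5 * 72 * 3


def write_odt_settings(dimms):
    """For each target DIMM on a write: ODT off at the target, on at the others."""
    return {
        target: {d: ("off" if d == target else "on") for d in dimms}
        for target in dimms
    }


odt_sets = write_odt_settings(devices[1:])
```

Each entry in `odt_sets` corresponds to one of the four separate simulation sets, since each needs its own design link.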


Issues at Higher Speeds
Another issue is working with EBD files. At slower speeds, they are O.K. With DIMMs rated for 800Mbps, the fact that the EBD uses lossless lines means overshoots and reflections are not damped the same way. We discovered this the hard way while creating a Rambus NexRIMM EBD model. The simulations with the board file and with the EBD model did not agree at all, even though the EBD file was created from the board file and matched it exactly except for the lossless lines. So, at higher speeds, we need the DIMM board file to get accurate simulations. There is always an NDA problem then.

At higher speeds, package effects become significant. Often, as already discussed, the signal at the pin can look poor while at the die it is clean. Accurate package models also become important. Lumped elements given as min/typ/max package parasitics can often produce too much skew, so your timing is not accurate. For multi-GHz signals, lumped elements in place of transmission lines can act as low-pass filters. A distributed package model is required if package lengths are significant, as they are in most CPUs now. Some CPUs have matched package lengths for the memory interface, and some require you to correct for the package skew on the board. Timing measurements should be from die to die at the higher speeds.
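To see why a lumped package model behaves as a low-pass filter, the series inductance and shunt capacitance can be treated as one section of an ideal LC ladder, whose cutoff is f_c = 1 / (pi * sqrt(L * C)). The sketch below uses that textbook approximation; the 3 nH and 1 pF values are hypothetical and are not taken from any particular package model.

```python
import math


def lumped_lc_cutoff_hz(l_henry, c_farad):
    """Cutoff frequency of an ideal LC ladder low-pass section."""
    return 1.0 / (math.pi * math.sqrt(l_henry * c_farad))


def lumped_lc_delay_s(l_henry, c_farad):
    """Delay of the equivalent lossless line with total inductance L and capacitance C."""
    return math.sqrt(l_henry * c_farad)


# Hypothetical lumped package parasitics.
L_PKG = 3e-9   # 3 nH
C_PKG = 1e-12  # 1 pF

f_cutoff = lumped_lc_cutoff_hz(L_PKG, C_PKG)
t_delay = lumped_lc_delay_s(L_PKG, C_PKG)
```

With these assumed values the cutoff lands at a few GHz, close enough to the harmonic content of a fast edge that the lumped model can visibly slow and distort it, whereas a distributed model of the same package would not impose that cutoff.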



About the author
Celestica Inc. (NYSE: CLS, www.celestica.com) is a multinational electronics manufacturing services company headquartered in Toronto, Canada. Celestica operates a highly sophisticated global manufacturing network with operations in Asia, Europe and the Americas, providing a broad range of integrated services to leading OEMs across a variety of industries ranging from consumer electronics to aerospace/defence. Celestica has 40,000+ employees. Celestica was incorporated as a wholly owned subsidiary of IBM in 1994 and divested in 1996.

We have a diverse team of experts with experience in a variety of industry segments and disciplines. The products we are involved with include leading-edge server/storage boards, single board computers/graphics systems for aerospace/military, carrier grade telecom boards/systems, ATCA, etc. We prefer to be involved at the pre-layout stage of board development but we are often brought in at either the post-route stage or to diagnose and fix boards that do not operate reliably. Die shrinks and component substitutions often create new issues on previously reliable boards. We see all kinds of interfaces from 60X/local bus, PCI, PCIX, SDRAM, DDR, DDR2, PCIe, SRIO, SATA, SAS, OC-12/48/192, Hypertransport, etc. Often, the most time-consuming interfaces are the bi-directional multidrop 60x, PCI, local bus, DSP interfaces running at fairly low speeds. The latest high speed serial chipsets with multiple levels of pre-emphasis and drive strength have taken a lot of the risk out of high speed SERDES design up to 3.125Gbps. Most of the boards we see have 18 to 20+ layers and several thousand components. Constraint-driven design is essential.

After obtaining his Ph.D. at York University in Toronto, Kai worked at MPB Technologies in Montreal in the Electromagnetics group, doing a variety of research in microwave remote sensing, radar cross-section measurements, Cepstral processing of Doppler radar data, airborne and shipborne antenna pattern measurement and prediction, and EMI/EMC work, primarily for military customers. With the downturn in defense funding at the end of the Cold War, he moved to Nortel Networks to do EMI/EMC work on telecom systems and then shifted to Signal Integrity in Nortel’s Interconnect Group. After the telecom meltdown, he started a small SI group at C-MAC Engineering. Following C-MAC’s acquisition by Solectron, Kai joined CoreSim, a design services company that specializes in Signal Integrity and Static timing analysis, board and chip level functional simulation, FPGA and ASIC retargets, and board design. CoreSim’s specialized techniques help external customers and CoreSim’s own designers to get their boards, FPGAs and ASICs right the first time. Kai is currently Engineering Manager for the Design Analysis group at CoreSim, which is now a Celestica subsidiary.

Kai has been using Allegro PCB SI, or SpecctraQuest as it was previously known, for over 7 years and previously he used UniSolve from UniCAD which was bought by Cadence.



Comments
 
akcpcb - 2/17/2008
HAVE A FLOORPLAN OF BUSBUNDLE
wjdai - 8/13/2007
Hope that detail the simulation based the SQ.
 
   
     
Copyright 2006 Cadence Design Systems, Inc.