Gemini uncloaks, announcing threaded SPICE. We talk with them to find out more.

Summary: After three and a half years in development, Gemini Design Technology last week unveiled their threaded circuit simulation technology, claiming performance up to 30X faster than traditional SPICE simulators and up to 10X faster than previous generations of threaded simulators. AllAboutEDA was fortunate to sit down with some of the Gemini team and dig into the details. Is Gemini for you? Read on...

We live in interesting EDA times, especially as far as circuit simulation is concerned. There’s a wealth of activity in many areas - solvers, model types and model evaluation engines, and architectural improvements - all being pushed forward by the powerhouses of circuit simulation (Synopsys and Cadence). In addition, some really exciting and potentially disruptive innovation is coming out of startups who see a market opportunity, both in competing against the incumbent powerhouses and in addressing currently unserved market segments.

The subject of this post is Gemini Design Technology, who last week threw off their self-imposed cloak of stealth and secrecy[1] to announce the company and the technology they’ve been busily beavering away on for the last three years. And pretty exciting technology it is too - something I believe is going to have a profound impact on the accurate circuit simulation market. So I consider myself fortunate indeed to have spent a few hours talking with EDA luminary Jim Solomon, Gemini’s Executive Chairman (and someone who needs no introduction to anyone even remotely interested or involved in analog design technology), and Kent Jaeger, VP Marketing.

The positioning of the Gemini technology is a frontal attack on the accurate SPICE market for IC design and verification, a market dominated in roughly equal share by Cadence’s Spectre and Synopsys’ HSPICE. Gemini’s claim is that their technology will deliver the fastest 100% SPICE-compatible simulator. A bold claim indeed. So how do they do it?

The easy stuff first:

  • Netlist formats: HSPICE, Spectre, Berkeley SPICE, and some IDM-proprietary
  • Model support: BSIM3/4, HISIM, HICUM, VBIC, Verilog-A (and others)
  • Output data: PSF, FSDB, WDF, SPICE raw data
  • Analyses supported: Transient, AC, DC sweep, noise

So at first blush it seems you’ll find it straightforward to:

  1. Read in your netlist
  2. Have your model supported (Verilog-A support is important here, since so many foundries now deliver early models that include a fair amount of Verilog-A code)
  3. Perform the analysis you want
  4. View the output waveforms in the viewer of your choice, including both menu and command-line integration into Cadence’s Virtuoso ADE (Analog Design Environment) for additional post-simulation signal analysis

Ok, that’s a good start. But on what class of circuits can you use it, against what circuits has it been successfully tested, and what were their characteristics? There’s a strong focus on analog and mixed-signal applications, and on fully-extracted post-layout designs. I suggest that this would have them aiming squarely at Spectre, since HSPICE is stronger in the digital design flows, but you could also put the smaller Berkeley Design Automation here, and even some of the tuned Fast-SPICE market looking at A/M-S designs up to a few million elements. I’m not going to regurgitate the Gemini benchmarks here, since they’ll be more plentiful and more current on their web site. Let me just say that those I have seen are impressive, but there’s no substitute for running the software on your own systems, on your own circuits, to get a true representation of the performance, precision and capacity you’re likely to see when it’s deployed in your production flows. For a young company, with a product not yet released to production, they’ve established one of the most mature benchmarking processes and suites of designs I’ve come across. Big kudos.

Now I do want to spend a little time writing about what I see as the innovation behind these claims, because I’m convinced it’s a harbinger of a fundamental shift that’s going to take place in EDA over the next few years. We’re at the early stages, I believe, of a re-architecting of design tools to take advantage of computer and compiler advances in supporting parallel processing, either threaded on multi-/many-core CPUs or distributed across the network.

I don’t want to decry or downplay the benefits gained through a clean-room, efficient implementation of prior art using contemporary techniques and modern algorithms and data structures: parsing, partitioning, and so on. But what I want to write about is the breakthrough Gemini have made in building a SPICE from the ground up around a comprehensive multi-threaded approach, with both model evaluation and the matrix solver being threaded, as well as other often-serial tasks (e.g. partitioning).
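
To make that division of work concrete, here’s a minimal sketch - my own illustration in Python, not Gemini’s code, and every name in it is hypothetical - of one Newton iteration inside a transient step: device model evaluation fans out across a thread pool, while the matrix load and the sparse solve are the traditionally serial stages that a ground-up threaded architecture also has to attack.

    # Illustrative only: a toy Newton step showing which stages a ground-up
    # threaded SPICE can parallelize. In CPython the GIL means this pure-Python
    # version won't actually speed up; real simulators do this work in C/C++.
    from concurrent.futures import ThreadPoolExecutor

    import numpy as np
    from scipy.sparse import lil_matrix, csc_matrix
    from scipy.sparse.linalg import spsolve

    def evaluate_device(device, v):
        """Hypothetical model evaluation for one two-terminal conductance:
        returns Jacobian stamps and residual-current stamps at voltage v."""
        n1, n2, g = device
        i = g * (v[n1] - v[n2])                       # linear I-V for simplicity
        return ([(n1, n1, g), (n1, n2, -g), (n2, n1, -g), (n2, n2, g)],
                [(n1, -i), (n2, i)])

    def newton_step(devices, i_ext, v_guess, n_threads=4):
        n_nodes = len(i_ext)

        # Stage 1: model evaluation -- embarrassingly parallel per device.
        with ThreadPoolExecutor(max_workers=n_threads) as pool:
            results = list(pool.map(lambda d: evaluate_device(d, v_guess), devices))

        # Stage 2: matrix load -- traditionally a serial bottleneck.
        G = lil_matrix((n_nodes, n_nodes))
        rhs = np.array(i_ext, dtype=float)
        for jac_stamps, cur_stamps in results:
            for r, c, val in jac_stamps:
                G[r, c] += val
            for r, val in cur_stamps:
                rhs[r] += val

        # Stage 3: the sparse solve -- the stage Gemini claim to have threaded
        # effectively (spsolve here is just a serial stand-in).
        G[0, :] = 0.0; G[0, 0] = 1.0; rhs[0] = 0.0    # pin node 0 to ground
        delta = spsolve(csc_matrix(G), rhs)
        return v_guess + delta

    # 1 mA injected into a 1k + 1k resistor string: expect [0, 1, 2] volts.
    devices = [(0, 1, 1e-3), (1, 2, 1e-3)]
    print(newton_step(devices, i_ext=[0.0, 0.0, 1e-3], v_guess=np.zeros(3)))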

I’m not going to focus on threaded model evaluation, as there are several products on the market that support it, as well as several in-house simulators from large IDMs. It’s the threaded matrix solver that’s so interesting to me. Parallel matrix computation is something that’s been worked on in academia and industry over the last 20 years without much success; we’ve seen lots of tricks that reduce storage and enhance performance and stability, relying on the fact that the matrices we see in circuit simulation are all pretty sparse (density in the single-digit percent, even fully extracted). The traditional wisdom seems to be that parallel matrix computation gains performance on the solve side, but that synchronization and matrix load worsen with increasing parallelization, so above 3 or 4 parallel computation streams the performance gain tails off. You might see a traditional implementation of a threaded sparse solver give a 2.5X performance improvement on around 4 cores, but that’s about the best it will do, and adding more cores reduces the speed gain. However, the approach taken by Gemini founder Dr. Baolin Yang results in a linear increase in simulation performance as you add cores/threads. In addition, Gemini analyze the circuit netlist to determine the optimum thread and core count, depending on the circuit size, coupling, complexity and other characteristics, ensuring efficient hardware selection. Kudos times two.
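
That tail-off is essentially Amdahl’s law at work: if matrix load, synchronization and other serial work account for even 20% of the solver’s runtime, the speedup saturates quickly no matter how many cores you throw at it. A quick back-of-the-envelope calculation (my numbers, purely illustrative - not Gemini’s):

    # Amdahl's law: speedup(N) = 1 / ((1 - p) + p / N), where p is the
    # parallelizable fraction of the work and N is the number of cores.
    def amdahl_speedup(p, n_cores):
        return 1.0 / ((1.0 - p) + p / n_cores)

    # Assume ~80% of the solver parallelizes (20% is serial load/sync):
    for n in (2, 4, 8, 16):
        print(f"{n:2d} cores -> {amdahl_speedup(0.80, n):.2f}X")
    #  2 cores -> 1.67X
    #  4 cores -> 2.50X   (the familiar "about 2.5X on 4 cores")
    #  8 cores -> 3.33X
    # 16 cores -> 4.00X

And that’s the optimistic model: in practice synchronization costs grow with thread count, so measured speedup can actually fall past the knee, something Amdahl’s simple formula doesn’t even capture. Near-linear scaling therefore implies the serial fraction of the solve has been driven very close to zero, which is exactly what makes Gemini’s claim noteworthy.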

Now, I’d been of the opinion for a while that threading is a short-term solution to a long-term problem. And I appeared to be in good company: Gary Smith, and many others, discussed at DAC 2008 how “threads are dead.” My reasoning was (note the past tense) as follows:

  • The number of cores per CPU or motherboard scales slowly, as can be seen from the Intel and AMD multi-/many-core roadmaps
  • The core count per CPU on those roadmaps stalls at 16, perhaps to allow the rest of the world (operating systems, compilers, applications, etc.) to catch up
  • More cores per CPU will require new memory architectures in order to avoid exhausting memory I/O bandwidth when multiple threads contend for the same memory
  • The compilers and software development environments are immature, and debugging a heavily-threaded application is non-trivial

I had believed that distributing processing tasks across the network would, once the needed innovation was complete, offer greater performance scaling. Sure, there has to be a way to minimize the latency of the network and of passing through the switch, but solve that and you can throw 100 or 1,000 CPUs at some really intractable problems. And with virtualization taking over in data centers, ensuring the same operating system version is in use across the compute farm is easy, as are maintenance and management, so IT data centers will see improving cost/resource efficiency.
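
To be clear about which workloads distribute well today: independent jobs - corners, Monte Carlo samples, separate testbenches - farm out across the network almost for free, and it’s the single tightly-coupled matrix solve that resists distribution because of the latency issue above. Here’s a minimal sketch of the easy case, using local worker processes as stand-ins for farm machines (the corner names and the run_sim_job function are made up for illustration):

    # Illustrative only: the embarrassingly parallel end of the spectrum --
    # independent simulation jobs fanned out to worker processes. On a real
    # compute farm each job would be submitted to a scheduler (LSF, Grid
    # Engine, etc.) and land on a different machine.
    from concurrent.futures import ProcessPoolExecutor

    def run_sim_job(corner):
        """Stand-in for launching one independent circuit simulation run."""
        # In practice: write a netlist for this corner, invoke the simulator,
        # parse the waveforms. Here we just return a dummy figure of merit.
        return corner, sum(ord(c) for c in corner) % 7

    if __name__ == "__main__":
        corners = ["tt_25C", "ss_125C", "ff_m40C", "sf_25C", "fs_25C"]
        with ProcessPoolExecutor(max_workers=4) as pool:
            for corner, fom in pool.map(run_sim_job, corners):
                print(f"{corner}: figure of merit = {fom}")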

But with a well-equipped Dell 4-socket quad-core server (that’s 16 cores in the box) costing around $13k, and 2-socket quad-core servers (8 cores) starting around $5,500, I think we’ll see even dual-core machines obsoleted out of compute farms within the next 2 years[2]. And with the benefits obtainable from solutions like Gemini, I’m coming back to threading as a pragmatic solution to performance scaling for circuit simulation, today. Stand by for more articles on this subject as I work with more companies on parallel processing and am able to share my experiences. Of course, we’d be delighted to read your comments and to hear your experiences of parallel processing in all its many flavors. Where do you prefer threading over distributed processing?
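
For what it’s worth, here’s the per-core arithmetic behind that hardware point, using the list prices quoted above (rounded, and obviously subject to change):

    # Rough cost-per-core comparison from the server prices quoted above.
    servers = [
        ("4-socket quad-core Dell", 13_000, 16),
        ("2-socket quad-core Dell",  5_500,  8),
    ]
    for name, price_usd, cores in servers:
        print(f"{name}: {cores} cores, ~${price_usd / cores:,.0f} per core")
    # 4-socket quad-core Dell: 16 cores, ~$812 per core
    # 2-socket quad-core Dell: 8 cores, ~$688 per core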

In conclusion, Gemini’s technology appears to me to offer extraordinarily high value for high-integrity verification of large and complex analog and mixed-signal designs, particularly where detailed and exhaustive analysis of parasitic-dominated behavior is required, and where the design characteristics lead to either a large number of time-steps or multi-rate behavior. If you’ve got tough circuit simulation problems, you should definitely check out Gemini.

Notes
1 We at AllAboutEDA wrote about Gemini in February 2008, in our article “Circuit Simulation - the next generation?”, so they’ve been on our watch list for a while.

2 You can scarcely find dual-core servers for sale now. Quads are all the rage, currently offering the optimum price/performance.
