Relevant paper (by Intel Research, actually). There are quite a few differences: we are keeping it a lot simpler on the TX and RX ends, but that is part of our secret sauce. I can talk about it offline if you are really interested.
http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=648778...
Going over 50 cm of Twinax is great and all, but that's the ideal environment. When you guys put 16 of these things on a board, routing that PCB and getting good signal integrity is not going to be fun, even after you throw in pre-emphasis and so on.
Getting anything approximating Intel's work, assuming you are rolling all your own IP, is pretty ambitious (and probably a good deal more work than your core design), even at only 6 Gbit/s (faster than PCIe Gen2, btw). Just curious: have you ever taped out a chip on a modern process before?
We aren't trying to reach the 128GB/s figure listed in that paper, only 48GB/s, and our design is much simpler than theirs. In our design, each pin is simply a buffer and a latch, synchronized by a PLL with all the other pins in the interface... much easier to implement and run than a serializer, even for a single pin.
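For anyone checking the numbers, here is a back-of-the-envelope sketch of where 48GB/s comes from, assuming 64 data pins at 6 Gbit/s each (the width and rate that come up further down the thread) and no encoding overhead. The function names are just for illustration; it also shows roughly how many pins the same per-pin rate would need to hit the paper's 128GB/s.

```python
import math

# Back-of-the-envelope bandwidth for a wide parallel bus.
# ASSUMPTION: raw bits only, no line-coding or protocol overhead.

def aggregate_bandwidth_gbytes(data_pins: int, gbit_per_pin: float) -> float:
    """Total raw bandwidth in GB/s (1 byte = 8 bits)."""
    return data_pins * gbit_per_pin / 8

def pins_needed(target_gbytes: float, gbit_per_pin: float) -> int:
    """Smallest number of data pins that reaches the target bandwidth."""
    return math.ceil(target_gbytes * 8 / gbit_per_pin)

if __name__ == "__main__":
    print(f"{aggregate_bandwidth_gbytes(64, 6.0):.0f} GB/s")  # prints "48 GB/s"
    print(pins_needed(128, 6.0))  # prints 171
```

In other words, matching the paper at this per-pin rate would take close to three times the pin count, which is part of why a simpler 64-bit target is attractive.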
I myself have not taped out anything on a modern process, but I have advisors who have. My co-founder and I do have nanofab experience, so we understand the physical complexities of fabrication first-hand.
Ah, so the idea is to have 64 parallel bits coming in at 6 Gbit/s, along with a slower clock that you multiply up to 6 GHz and use to sample the inputs? That will be quite tricky to get working 1) without any analog signal conditioning on the inputs (or outputs), and 2) without inter-bit skew making it impossible to meet timing on your inputs. Best of luck to you, but the interface alone sounds likely to be problematic.
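To put the skew concern in numbers: at 6 Gbit/s one bit lasts about 167 ps, and every pin's data has to arrive inside whatever is left of that after the latch's setup/hold window and clock jitter. The sketch below works that out. The setup, hold, and jitter values and the ~6.7 ps/mm propagation figure (roughly stripline in FR-4) are hypothetical placeholders, not numbers from either design, and it ignores ISI, crosstalk, and PVT variation.

```python
# Rough per-bit timing budget for a parallel bus sampled by one shared clock.
# ASSUMPTION: setup/hold/jitter and ps-per-mm values are hypothetical placeholders.

def unit_interval_ps(gbit_per_s: float) -> float:
    """Length of one bit period (unit interval) in picoseconds."""
    return 1000.0 / gbit_per_s

def skew_margin_ps(gbit_per_s: float, setup_ps: float, hold_ps: float,
                   jitter_ps: float) -> float:
    """Time left for pin-to-pin skew once the latch's setup/hold window
    and clock jitter are taken out of the unit interval."""
    return unit_interval_ps(gbit_per_s) - setup_ps - hold_ps - jitter_ps

def max_length_mismatch_mm(margin_ps: float, ps_per_mm: float) -> float:
    """Convert a skew margin into an allowed trace-length mismatch."""
    return margin_ps / ps_per_mm

if __name__ == "__main__":
    margin = skew_margin_ps(6.0, setup_ps=30, hold_ps=30, jitter_ps=40)
    print(f"UI: {unit_interval_ps(6.0):.1f} ps")
    print(f"skew margin: {margin:.1f} ps")
    # ~6.7 ps/mm: assumed rough figure for stripline in FR-4 (eps_r about 4)
    print(f"max length mismatch: {max_length_mismatch_mm(margin, 6.7):.1f} mm")
```

With those placeholder numbers, all 64 traces would need to be length-matched to within about a centimeter, before accounting for any of the effects left out above.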
That's the basic idea, but we can run the clock at 3 GHz if we use DDR, or 1.5 GHz with QDR. There is some extra magic there which I don't want to talk about publicly just yet ;)
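The clock numbers follow from per-pin rate = clock frequency x transfers per clock cycle. A quick sketch, assuming "QDR" here means four transfers per reference-clock cycle (for example sampling on both edges of two quadrature-phase clocks); note that QDR SRAM uses the term somewhat differently. The table and function name are illustrative only.

```python
# Per-pin data rate = clock frequency x transfers per clock cycle.
# ASSUMPTION: QDR = 4 transfers per reference-clock cycle.
TRANSFERS_PER_CYCLE = {"SDR": 1, "DDR": 2, "QDR": 4}

def clock_ghz_for(target_gbit_per_pin: float, mode: str) -> float:
    """Clock frequency needed to reach the target per-pin rate in a given mode."""
    return target_gbit_per_pin / TRANSFERS_PER_CYCLE[mode]

if __name__ == "__main__":
    for mode in ("SDR", "DDR", "QDR"):
        print(f"{mode}: {clock_ghz_for(6.0, mode):g} GHz")
    # prints "SDR: 6 GHz", "DDR: 3 GHz", "QDR: 1.5 GHz"
```

The lower clock eases distribution, but the bit period on each pin is still about 167 ps, so the skew budget does not get any looser.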
The biggest problem (even with our solutions for skew and crosstalk) is just the number of pins/traces on the board, but that's not unsolvable... nothing a ~10-layer PCB can't handle.