points by thesz 15 years ago

It has 64 words of RAM and 64 words of ROM per core, and an 18-bit word for ALU operations and instructions.

I think it packs more than one MISC command into a word; my guess is about three (six-bit commands).

I cannot wrap my head around how to program that... It's not a beast, more like a field of tiny windmills. One of the designers of the preceding chips once wrote about using one as a systolic engine, but the field of systolic algorithms is quite narrow, AFAIK.

I cannot find a C or Fortran compiler for it, or a compiler for any other high-level language.

My overall impression is that this looks like all the bad ideas from Cell BE ported to the Forth language.

MISC: http://en.wikipedia.org/wiki/Minimal_instruction_set_compute...

John Sokol on early GreenArray-alike designs: http://hardware.slashdot.org/comments.pl?sid=274687&cid=...

metamemetics 15 years ago

My initial thought was: high density, asynchronous cores -> brain modeling. You treat each asynchronous core as a neuron.

Reading up on Charles Moore, this may indeed be the intended use case: http://www.pcai.com/web/ai_info/pcai_forth.html

> Charles Moore created Forth in the 1960s and 1970s to give computers real-time control over astronomical equipment. A number of Forth's features (such as its interactive style) make it a useful language for AI programming, and devoted adherents have developed Forth-based expert systems and neural networks.

Still: 100 billion neurons in the brain / 144 cores per chip * $20 per chip ≈ $13.9 billion. Also, I would guess most modern researchers in this area don't know Forth and are doing high-level programming, virtualizing neurons rather than taking a low-level hardware approach.
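
The arithmetic above, written out (figures as quoted in the comment; the exact product comes to about $13.9 billion):

```python
# Back-of-the-envelope cost of one hardware core per neuron,
# using the figures quoted in the comment above.
neurons = 100e9        # rough neuron count in a human brain
cores_per_chip = 144
price_per_chip = 20.0  # USD, quoted price

chips_needed = neurons / cores_per_chip
total_cost = chips_needed * price_per_chip

print(f"{chips_needed:,.0f} chips")       # 694,444,444 chips
print(f"${total_cost / 1e9:.1f} billion")  # $13.9 billion
```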

  • thesz 15 years ago

    Forth was used in AI in a pretty non-linear way.

    I have a book where the authors developed Lisp on Forth and then proceeded to develop Prolog on the newly created Lisp. Then they demonstrated how to use that Prolog to develop a rule-based expert system.

    There was a saying that Forth amplifies a programmer's ability to develop programs and to make mistakes. If you need an AI tool but do not need your mistakes amplified, stay away from Forth. I think that applies to other areas of domain-specific development as well.

    While I adore Forth, I cannot recommend it to anyone. Especially not for simulating a brain: what if you introduce an error, Forth amplifies it, and we get a hidden psychopath? ;)

    • silentbicycle 15 years ago

      What book is it? You can't just mention a book that has Forth, Lisp, AND Prolog and not give the title. :)

      PAIP and a couple of others have Lisp and Prolog, LOL has Lisp and Forth, and HOPL 2 has all three (but separately); it doesn't sound like any of those, though.

      • TY 15 years ago

        I think he was probably talking about "Designing and Programming Personal Expert Systems" by Carl Townsend, which dates back to 1986.

        From http://www.faqs.org/faqs/computer-lang/forth-faq/part5/

          Contains LISP and Prolog emulations in Forth,
          including a unification algorithm. It also has some
          minimum distance classifier code.
          The application is fault diagnosis in locomotives.
    • metamemetics 15 years ago

      Yep, pretty much all simulation is currently being done with Phil Goodman's Neocortical Simulator (Matlab/C) and NEURON (C, plus recently a Python API) on Blue Gene supercomputers: http://en.wikipedia.org/wiki/Blue_Brain_Project . So the mystery of who will use this chip continues.

      • dmm 15 years ago

        Moore has been building chips like this for years. Someone must be buying them.

        • metamemetics 15 years ago

          Perhaps the military? Small embedded neural nets could be highly useful for visual recognition algos on missiles and drones [ the majority of military planes now being built are unmanned ]. However, I believe the military is now trying to move all drone tech code to a common operating system/language to increase code portability between platforms.

  • unwind 15 years ago

    If you call them up and ask for a quote on 694,444,444 chips, I would kind of expect them to offer you a discount. But remember to ask, just in case.

  • maushu 15 years ago

    They would also occupy a plane with an area of 12,500 m^2, which according to Wolfram|Alpha is equivalent to 1.7 times the area of a FIFA-sanctioned international soccer pitch.
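
    As a sanity check, those numbers imply a die area of about 18 mm^2 per chip (backed out from the figures in this thread, not taken from a datasheet):

```python
# Back out the implied per-chip area from the thread's figures.
chips = 694_444_444        # one core per neuron, 144 cores per chip
total_area_m2 = 12_500     # total plane area quoted above

die_area_mm2 = total_area_m2 * 1e6 / chips  # m^2 -> mm^2, per chip
print(f"{die_area_mm2:.0f} mm^2 per chip")  # 18 mm^2
```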

    • cullenking 15 years ago

      Yeah, however, if someone were to buy $1 billion of them, they might be able to shrink it down to a 30-40 nm process, significantly reducing the footprint. Further conceptualize it by thinking of the footprint of a decent cluster, with all the 1U blades spread out over a certain area. Definitely feasible, though I recognize your computations are die size, not total computer size.

  • GregBuchholz 15 years ago

    Don't forget that these chips are a couple of million times faster than a neuron running at 200 Hz. If we ignore that the brain has more interconnect, you'd only need about $7,000 worth of chips.
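
    Folding that speed ratio into the earlier per-neuron cost estimate (assuming one core can be time-multiplexed over ~2 million simulated neurons; all figures are from this thread):

```python
# Scale the one-core-per-neuron cost by the quoted speed ratio.
neurons = 100e9
cores_per_chip = 144
price_per_chip = 20.0  # USD
speed_ratio = 2e6      # chip clock vs. a ~200 Hz neuron, per the comment

cost = neurons / cores_per_chip * price_per_chip / speed_ratio
print(f"${cost:,.0f}")  # $6,944 -- roughly the $7,000 quoted above
```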

RodgerTheGreat 15 years ago

Three five-bit instructions per word, followed by a single three-bit slot that can only hold a restricted subset of the instruction set (5+5+5+3 = 18 bits). Since the chips have 64 words of RAM and 64 of ROM, you could theoretically pack 512 instructions into each chip. In practice, this figure will be a fair bit lower: jumps, for example, store their target address in the remaining slots of a word, so a single jump can consume as many as four "theoretical instructions".
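
A sketch of that packing arithmetic, assuming the 5+5+5+3 slot layout that makes four slots fit an 18-bit word (as described in GreenArrays' F18 documentation); the opcodes themselves are invented for illustration:

```python
# Pack four instruction slots into one 18-bit word.
# Slot widths assume the F18-style 5+5+5+3 layout; the final
# 3-bit slot can only hold a restricted subset of opcodes.
SLOT_WIDTHS = (5, 5, 5, 3)

def pack(ops):
    """Pack four opcodes (MSB-first) into an 18-bit word."""
    word = 0
    for op, width in zip(ops, SLOT_WIDTHS):
        if op >= 1 << width:
            raise ValueError(f"opcode {op} does not fit in {width} bits")
        word = (word << width) | op
    return word

# 64 words RAM + 64 words ROM = 128 words x 4 slots = 512 slot positions.
print(128 * 4)  # 512
print(f"{pack([0b10101, 0b00011, 0b11111, 0b101]):018b}")
# -> 101010001111111101 (the four slots, concatenated)
```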

However, communicating between CPUs in parallel is very easy. The I/O lines between CPUs carry what is essentially a hardware semaphore: a reading CPU blocks until it gets a write, and a writing CPU blocks until it gets the corresponding read. Bit-indexing the ports also gives you pretty easy fanout.
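
That handshake is essentially a rendezvous channel. A minimal sketch of the synchronization model in Python, with threads standing in for cores (this models the blocking behavior only, not the hardware):

```python
import threading

class Rendezvous:
    """Both sides block until the other arrives, like the
    hardware handshake on the inter-core I/O lines."""
    def __init__(self):
        self._slot = None
        self._ready = threading.Semaphore(0)  # writer has posted a value
        self._taken = threading.Semaphore(0)  # reader has consumed it

    def write(self, value):
        self._slot = value
        self._ready.release()  # let the reader proceed
        self._taken.acquire()  # block until the reader takes the value

    def read(self):
        self._ready.acquire()  # block until a writer posts
        value = self._slot
        self._taken.release()  # release the blocked writer
        return value

chan = Rendezvous()
results = []
t = threading.Thread(target=lambda: results.append(chan.read()))
t.start()
chan.write(42)  # returns only after the reader has taken the value
t.join()
print(results)  # [42]
```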

The docs also mention that CPUs can directly "push" instructions to one another without needing a bootstrap on the receiving end, which lets CPUs act as extended memory for one another, eases debugging, and opens up tantalizing possibilities for self-modifying code.

You aren't going to get very far trying to execute a conventional language on this architecture, but color me interested.

dmm 15 years ago

> My overall impression is that this looks like all bad ideas from Cell BE were ported to Forth language.

What's so bad about the Cell? I know developers who express nothing short of love for the Cell processor.

  • thesz 15 years ago

    Cell BE shipped without compiler support: you couldn't feed your C/C++/Fortran program to a compiler and get back a more or less parallel version of it. That complicated things; you had to parallelize your program by hand.

    The tool support for the Cell BE SPU was close to existent (no, I didn't mix the words up). It was of such low quality that you pretty much had to use Emacs with an assembler highlighting mode to do any serious work for the SPU. The speed difference between gcc and hand-written assembler circa 2007 was about 1.5-2x.

    Both the PPU and the SPU are in-order, so you have to avoid the minefield of random memory accesses. You had to write allocators and the like by hand while respecting the SPU's constraints.

    In-order architectures do not facilitate abstraction. You cannot simply recompile code from an out-of-order x86 for the in-order Cell BE and get reasonable performance (say, about 80% of maximum). You have to optimize aggressively.

    You cannot load much into the SPU's 256 KB of combined data and program memory. Divide those 256 KB in two and you have about 128 KB of program memory (which you should divide again: one half for the working program, another for the program being loaded) and 128 KB, or 8K quadwords (16 bytes per quadword), of data memory. The data memory you should divide again, because one part is constant, or you work on one part while another is being loaded: 4K quadwords. At two quadword operations per cycle, that is 2K cycles to process the whole data block. The latency of the Cell memory subsystem is very high, so those 2K cycles are comparable to the time needed to load that amount of data into the SPU. So you have to be very, very careful to keep the SPU loaded and working.
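
    The budget in that paragraph, step by step (all figures are from the comment; each halving is the double-buffering described above):

```python
# SPU local-store budget, following the comment's arithmetic.
local_store = 256 * 1024           # bytes of SPU local store
program = local_store // 2         # half for code...
data = local_store // 2            # ...half for data
working_program = program // 2     # double-buffer: running vs. being loaded

quadword = 16                              # bytes per quadword
data_quadwords = data // quadword          # 8K quadwords total
working_quadwords = data_quadwords // 2    # double-buffered data: 4K
ops_per_cycle = 2
cycles = working_quadwords // ops_per_cycle  # ~2K cycles per data block

print(working_program // 1024, "KB working program")  # 64 KB
print(data_quadwords, working_quadwords, cycles)      # 8192 4096 2048
```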

    Many new chips suffer from a lack of compiler support, especially for automatic parallelization. Cell BE certainly did; so does GreenArray. Cell BE suffered from the lack of memory on the SPU, its main parallel engine block; GreenArray does as well. Cell BE used a simple-to-implement but hard-to-program in-order architecture in all its processing parts; GreenArray uses a stack architecture, which is extremely hard to program.

    So, in my eyes, GreenArray is Cell BE ported to Forth. ;)

david927 15 years ago

My understanding is that this work is in conjunction with Alan Kay's Viewpoints Research: www.viewpointsresearch.org and what they're working on in languages.

  • gruseom 15 years ago

    Are you sure? How do you know this? Any connection between Alan Kay and Charles Moore -- or even between their teams -- would be worth hearing about.