A 36-core chip design with an Internet-style routing network


Postby Hammer_Time » Sun Jun 29, 2014 5:28 pm

http://www.kurzweilai.net/a-36-core-chi ... on-network

A 36-core chip design with an Internet-style communication network

Chips of the future will resemble little Internets


June 27, 2014

The more cores — or processing units — a computer chip has, the bigger the problem of communication between cores becomes.

Now, Li-Shiuan Peh, the Singapore Research Professor of Electrical Engineering and Computer Science at MIT, speaking at the International Symposium on Computer Architecture, has unveiled a 36-core chip that features a “network-on-chip” to deal with the problem.

The idea: massively multicore chips of the future will need to resemble little Internets, where each core has an associated router, and data travels between cores in packets of fixed size.

The innovation also solves one of the problems that has bedeviled previous attempts to design networks-on-chip: maintaining cache coherence, or ensuring that cores’ locally stored copies of globally accessible data remain up to date.

In today’s chips, all the cores — typically somewhere between two and six — are connected by a single wire, called a bus. When two cores need to communicate, they’re granted exclusive access to the bus. But that approach won’t work as the core count mounts: Cores will spend all their time waiting for the bus to free up, rather than performing computations.

In a network-on-chip, each core is connected only to those immediately adjacent to it. “You can reach your neighbors really quickly,” says Bhavya Daya, an MIT graduate student in electrical engineering and computer science, and first author on the new paper. “You can also have multiple paths to your destination. So if you’re going way across, rather than having one congested path, you could have multiple ones.”
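
The "multiple paths" remark can be made concrete with a small sketch. It assumes the 36 cores sit in a 6×6 grid and are numbered row by row starting at 0; that layout and the function names below are illustrative assumptions, not details taken from the article.

```python
from math import comb

MESH_SIDE = 6  # assumed 6x6 grid for 36 cores


def coords(core, side=MESH_SIDE):
    """Map a core index (0 .. side*side - 1) to (row, col), numbering row by row."""
    return divmod(core, side)


def hop_count(src, dst, side=MESH_SIDE):
    """Minimum number of router-to-router hops between two cores on the grid."""
    (r1, c1), (r2, c2) = coords(src, side), coords(dst, side)
    return abs(r1 - r2) + abs(c1 - c2)


def shortest_path_count(src, dst, side=MESH_SIDE):
    """Number of distinct minimal-length routes between two cores."""
    (r1, c1), (r2, c2) = coords(src, side), coords(dst, side)
    dr, dc = abs(r1 - r2), abs(c1 - c2)
    return comb(dr + dc, dr)


print(hop_count(0, 1))             # 1   (immediate neighbours)
print(hop_count(0, 35))            # 10  (corner to corner)
print(shortest_path_count(0, 35))  # 252 (distinct minimal routes)
```

Neighbours are one hop apart, while a corner-to-corner message has many equally short routes to choose from, which is what lets traffic spread out instead of queuing on one shared bus.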

Maintaining cache coherence

One advantage of a bus, however, is that it makes it easier to maintain cache coherence. Every core on a chip has its own cache, a local, high-speed memory bank in which it stores frequently used data. As it performs computations, it updates the data in its cache, and every so often, it undertakes the relatively time-consuming chore of shipping the data back to main memory.

But what happens if another core needs the data before it’s been shipped? Most chips address this question with a protocol called “snoopy,” because it involves snooping on other cores’ communications. When a core needs a particular chunk of data, it broadcasts a request to all the other cores, and whichever one has the data ships it back.

If all the cores share a bus, then when one of them receives a data request, it knows that it’s the most recent request that’s been issued. Similarly, when the requesting core gets data back, it knows that it’s the most recent version of the data.

But in a network-on-chip, data is flying everywhere, and packets will frequently arrive at different cores in different sequences. The implicit ordering that the snoopy protocol relies on breaks down.
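
A toy model shows why the ordering breaks down. It assumes a 6×6 grid numbered row by row and a delivery latency equal to the hop count; both are simplifying assumptions for illustration only.

```python
MESH_SIDE = 6  # assumed 6x6 grid for 36 cores


def hops(a, b, side=MESH_SIDE):
    """Grid distance between two cores numbered row by row."""
    (r1, c1), (r2, c2) = divmod(a, side), divmod(b, side)
    return abs(r1 - r2) + abs(c1 - c2)


def arrival_order(observer, requesters):
    """Order in which `observer` sees requests that were all issued at the
    same instant, assuming latency is proportional to hop count."""
    return sorted(requesters, key=lambda src: hops(src, observer))


requesters = [0, 35]                  # two cores broadcast at the same time
print(arrival_order(1, requesters))   # [0, 35]
print(arrival_order(34, requesters))  # [35, 0]
```

Cores 1 and 34 disagree about which request came first. On a shared bus every core would have seen the same sequence, and that shared sequence is what the snoopy protocol depends on.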

The MIT researchers solve this problem by equipping their chips with a second network, which shadows the first. The circuits connected to this network are very simple: All they can do is declare that their associated cores have sent requests for data over the main network. But precisely because those declarations are so simple, nodes in the shadow network can combine them and pass them on without incurring delays.

Groups of declarations reach the routers associated with the cores at discrete intervals — intervals corresponding to the time it takes to pass from one end of the shadow network to another. Each router can thus tabulate exactly how many requests were issued during which interval, and by which other cores. The requests themselves may still take a while to arrive, but their recipients know that they’ve been issued.
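
One way to picture declarations that are cheap to combine is a bit vector with one bit per core, merged with a bitwise OR. This is a sketch under that assumption (and assuming at most one announced request per core per interval); the article does not spell out the actual encoding.

```python
NUM_CORES = 36


def notification(core):
    """One-hot bit vector announcing that `core` issued a request."""
    return 1 << core


def merge(*vectors):
    """Combine notifications with bitwise OR, a stand-in for the merging
    done by nodes in the shadow network."""
    merged = 0
    for vector in vectors:
        merged |= vector
    return merged


def senders(vector):
    """List the cores whose bit is set in a merged notification vector."""
    return [core for core in range(NUM_CORES) if (vector >> core) & 1]


# Cores 1 and 10 announce requests during the same interval.
window = merge(notification(1), notification(10))
print(senders(window))  # [1, 10]
```

Because OR is order-independent, every router ends the interval holding the same vector regardless of the path the individual declarations took.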

During each interval, the chip’s 36 cores are given different, hierarchical priorities. Say, for instance, that during one interval, both core 1 and core 10 issue requests, but core 1 has a higher priority. Core 32’s router may receive core 10’s request well before it receives core 1’s. But it will hold it until it’s passed along 1’s.


This hierarchical ordering simulates the chronological ordering of requests sent over a bus, so the snoopy protocol still works. The hierarchy is shuffled during every interval, however, to ensure that in the long run, all the cores receive equal weight.
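
The hold-and-release behaviour can be sketched as follows. The rotation used to reshuffle priorities is an assumption chosen for simplicity; the article only says the hierarchy changes every interval.

```python
NUM_CORES = 36


def priority_order(interval, num_cores=NUM_CORES):
    """Core IDs from highest to lowest priority for an interval.
    A plain rotation is assumed here for illustration."""
    start = interval % num_cores
    return [(start + i) % num_cores for i in range(num_cores)]


def delivery_order(announced, interval):
    """Order in which every router hands the announced requests to its core."""
    rank = {core: pos for pos, core in enumerate(priority_order(interval))}
    return sorted(announced, key=rank.__getitem__)


def releasable(arrived, announced, interval):
    """Requests a router may release now: the longest prefix of the agreed
    order whose packets have actually arrived."""
    released = []
    for core in delivery_order(announced, interval):
        if core not in arrived:
            break  # hold everything behind the missing request
        released.append(core)
    return released


announced = {1, 10}
print(delivery_order(announced, interval=0))       # [1, 10]
print(releasable({10}, announced, interval=0))     # []
print(releasable({1, 10}, announced, interval=0))  # [1, 10]
print(delivery_order(announced, interval=5))       # [10, 1]
```

In interval 0 a router that has only received core 10's request holds it until core 1's shows up; in interval 5 the rotation gives core 10 the higher priority instead.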

The new architecture, called SCORPIO, will be open-source.


Abstract of International Symposium on Computer Architecture presentation

In the many-core era, scalable coherence and on-chip interconnects are crucial for shared memory processors. While snoopy coherence is common in small multicore systems, directory-based coherence is the de facto choice for scalability to many cores, as snoopy relies on ordered interconnects which do not scale. However, directory-based coherence does not scale beyond tens of cores due to excessive directory area overhead or inaccurate sharer tracking. Prior techniques supporting ordering on arbitrary unordered networks are impractical for full multicore chip designs.
We present SCORPIO, an ordered mesh Network-on-Chip (NoC) architecture with a separate fixed-latency, bufferless network to achieve distributed global ordering. Message delivery is decoupled from the ordering, allowing messages to arrive in any order and at any time, and still be correctly ordered.
The architecture is designed to plug-and-play with existing multicore IP and with practicality, timing, area, and power as top concerns. Full-system 36 and 64-core simulations on SPLASH-2 and PARSEC benchmarks show an average application runtime reduction of 24.1% and 12.9%, in comparison to distributed directory and AMD HyperTransport coherence protocols, respectively.
The SCORPIO architecture is incorporated in an 11 mm-by-13 mm chip prototype, fabricated in IBM 45 nm SOI technology, comprising 36 Freescale e200 Power Architecture™ cores with private L1 and L2 caches interfacing with the NoC via ARM AMBA, along with two Cadence on-chip DDR2 controllers.
The chip prototype achieves a post-synthesis operating frequency of 1 GHz (833 MHz post-layout) with an estimated power of 28.8 W (768 mW per tile), while the network consumes only 10% of tile area and 19% of tile power.
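
A quick back-of-the-envelope check of the power figures, using only the numbers quoted in the abstract. The variable names are made up for this sketch, and the abstract does not say where the power outside the tiles goes, so no attribution is attempted.

```python
# Figures quoted in the abstract.
NUM_TILES = 36
TILE_POWER_MW = 768
CHIP_POWER_MW = 28_800
NETWORK_SHARE_OF_TILE_POWER = 0.19

tiles_total_mw = NUM_TILES * TILE_POWER_MW                 # power of all tiles
outside_tiles_mw = CHIP_POWER_MW - tiles_total_mw          # chip total minus tiles
network_per_tile_mw = TILE_POWER_MW * NETWORK_SHARE_OF_TILE_POWER
network_total_mw = NUM_TILES * network_per_tile_mw

print(f"all tiles:         {tiles_total_mw / 1000:.3f} W")
print(f"outside the tiles: {outside_tiles_mw / 1000:.3f} W")
print(f"network per tile:  {network_per_tile_mw:.1f} mW")
print(f"network, 36 tiles: {network_total_mw / 1000:.2f} W")
```

So the 36 tiles account for roughly 27.6 W of the estimated 28.8 W, and the network's 19% share works out to about 146 mW per tile.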

references:
Bhavya K. Daya et al., SCORPIO: A 36-Core Research Chip Demonstrating Snoopy Coherence on a Scalable Mesh NoC with In-Network Ordering, presented at the International Symposium on Computer Architecture, Minneapolis, 2014


http://projects.csail.mit.edu/wiki/pub/ ... ca2014.pdf

Very cool!! I wonder how strong its "single-thread" performance will be, but for parallel workloads this thing should rock! 8)
The richest man is not he who has the most, but he who needs the least. No good deed goes unpunished...


Postby Sauron_Daz » Mon Jun 30, 2014 2:00 am

:cool:
We never think of us as being one of Them. We are always one of Us. It's Them that do the bad things.

Postby Silver » Sun Jul 13, 2014 3:26 am

Interesting story :)
Specs:

Intel Core i7 4930K, Kingston 32GB
ASUS GeForce GTX 770 4GB, ASUS P9X79

Aiming for impossible goals forces thinking beyond mere extrapolation of existing achievements.

Postby clone » Sun Jul 13, 2014 8:08 am

While it's interesting to apply the term "internet-style routing network" to the issue of multi-core load sharing, it seems kinda obvious that they'd use existing principles to solve the challenges faced as more cores fit onto a package.

It seems like the industry is at odds: we have complex general-purpose CPUs (AMD, Intel, IBM) and then a swath of notably simpler, task-specific CPUs now competing for the same space.

Stuffing 36 CPUs onto 1 package in order to perform the tasks 2, 4 & 8 core OOD CPUs did previously seems like competition for its own sake, with a zero-sum result.

I wonder if the driving force behind this resides in the efforts to wrest control from the entrenched brick & mortars.

If you look at just how many ARM CPUs are required to surpass 1 Intel CPU you'll understand what I'm talking about.
When we lose the right to be different, we lose the privilege to be free.

Postby Hammer_Time » Mon Jul 14, 2014 8:15 am

I think it is pretty obvious that this new CPU design is not meant to go up against Intel in the traditional desktop CPU market (where single-thread performance rules), but it will make inroads where parallel workloads are a priority (scientific/engineering research etc.). Perhaps one day this core-communications design will benefit regular desktop PC users too, but that day is far off, methinks. Still, it is an interesting evolution (and expected, as you say); surprised it took them this long to finally put the idea in silicon... all good... an average runtime reduction of 12.9% against AMD's HyperTransport protocol, and 24.1% against a distributed directory, in full-system simulations is nothing to sneeze at...

Postby clone » Mon Jul 14, 2014 6:00 pm

Hammer_Time wrote: but it will make inroads where parallel workloads are a priority

Isn't this already being done on graphics cards, whose architecture is highly optimized for just that type of computing?

I wonder... the improvement is nothing to sneeze at, but I wonder if these types of configs won't wind up being the domain of Nvidia, and possibly AMD if they can manage it. On the surface it would appear they have the head start.

Postby Hammer_Time » Mon Jul 14, 2014 6:06 pm

Yes, to a degree, but GPGPU computing is another segment altogether... really "dumb" cores (aka "shaders") on graphics cards (not even close to the complexity of modern Intel/AMD CPU "cores/modules") being exploited to do parallel floating-point operations... the difference is that the "dumb simple cores" in the 36-core CPU we are discussing are also able to handle integer operations, to a degree... so that is more powerful overall. Although you are right that this style of "internet routing inter-core communications" won't revolutionize traditional processor design, it is an important step in leveraging the power of a bunch of "dumb cores", even better than the Alpha EV-6 design, which was morphed into the "AMD HyperTransport protocol" as we now know it. Not knocking HyperTransport at all, indeed Intel basically "copied" it, just saying this is an evolution and improvement for parallel processing in the future, that is all. (Not that I am a CPU engineer/designer, just talking in broad layman's terms here of course.) This CPU communication protocol may become quite relevant in the future though, as both Intel and AMD ramp up the number of cores (integer and floating-point capable) down the road...

We ( humanity ) sure have come a long way from the first modern cpu design:

Image

Actually the performance between cores does not depend on "integer vs floating point operations performance" obviously, just making a point about gpgpu compute in general here... the future is wide open...

Postby Hammer_Time » Fri Sep 19, 2014 9:06 am


Postby Sauron_Daz » Tue Sep 23, 2014 3:52 am

Hammer_Time wrote: Great article about history of cpu:

http://www.techspot.com/article/874-his ... -computer/

8)