Multiprocessor embedded systems
Massimo Bocchi
Architetture Digitali per l’Elaborazione dei Segnali LS (Digital Architectures for Signal Processing)
Academic year 2003/2004
Motivation

Performance increase
– Pipelined execution
– Parallel execution

Cost effectiveness
– Custom uniprocessors are less cost effective
– Microprocessor design cost can be amortized if the processor core can be reused on the same system

Multi-Processor System-on-Chip (MPSoC)
– Growing transistor count on a single chip should be exploited
– Parallelism becomes an effective use of chip density
Massimo Bocchi, 20/02/2004
ARCES - University of Bologna
Processor-time diagram
[Figure: processor-time diagram of tasks A, B and C on processors P1 and P2, comparing sequential execution, 2-level pipelined execution and 2-level parallel execution]
Main topics
Inter-processor communication and data sharing
Inter-processor synchronization
Interconnection architecture
Multiprocessor classification
In 1966 Flynn proposed a multiprocessor classification:
SISD (single instruction stream – single data stream): a normal single processor based on the von Neumann model
SIMD (single instruction stream – multiple data stream): a single instruction stream is generated by a single control unit but used by several processors; each processor executes the same instruction stream but acts upon different data, generating multiple data streams
MISD (multiple instruction stream – single data stream): no existing implementation can be ascribed to this model
MIMD (multiple instruction stream – multiple data stream): one instruction stream is generated for each processor; each instruction stream acts upon different data
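The difference between the SIMD and MIMD classes can be sketched in a few lines of Python. This is only an illustrative model: the function names `simd_step` and `mimd_step` are hypothetical, and the "processors" are simulated sequentially rather than run in parallel.

```python
def simd_step(instruction, data_lanes):
    """SIMD: one instruction, issued once, is applied by every lane to its own data."""
    return [instruction(x) for x in data_lanes]


def mimd_step(instructions, data_lanes):
    """MIMD: each processor applies its own instruction to its own data."""
    return [instr(x) for instr, x in zip(instructions, data_lanes)]


def double(x):
    return x * 2


def increment(x):
    return x + 1


def square(x):
    return x * x
```

For example, `simd_step(double, [1, 2, 3])` applies the same operation to three data items, while `mimd_step([increment, square, abs], [1, 3, -5])` gives each simulated processor a different operation.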

MIMD multiprocessor classification

A MIMD multiprocessor system needs a mechanism to
load instruction and data memories and to pass
information between the processors.
Two main models:
shared memory multiprocessor systems
– Uniform Memory Access (UMA) multiprocessors or Symmetric
Multiprocessors (SMP)
– Non-Uniform Memory Access (NUMA) multiprocessors or
Asymmetric Multiprocessors (AMP)

multiprocessor systems without shared memory
– dataflow multiprocessor systems
– message-passing multiprocessor systems
Shared memory model

All the processors have access to a common memory space
Cache memories create consistency problems on the shared data
[Figure: processors connected to memories through an interconnection network]
Shared memory multiprocessors
Implicit communication performed through shared variables
Explicit synchronization based on lock/unlock mechanism (semaphores and monitors)
Two types of shared memory systems:
– Uniform Memory Access (UMA)
– Non-Uniform Memory Access (NUMA)
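A minimal sketch of the shared memory style of programming, assuming Python threads stand in for processors. The shared dictionary plays the role of a shared variable (implicit communication) and `threading.Lock` provides the lock/unlock mechanism (explicit synchronization). The name `run_shared_counter` is hypothetical and chosen only for this illustration.

```python
import threading


def run_shared_counter(num_workers=4, increments=1000):
    """Several workers update one shared variable under a lock."""
    shared = {"counter": 0}      # shared variable, visible to every worker
    lock = threading.Lock()      # explicit synchronization object

    def worker():
        for _ in range(increments):
            with lock:           # lock on entry, unlock on exit
                shared["counter"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["counter"]
```

No worker ever sends data to another one: they communicate only by reading and writing the same variable, and the lock keeps concurrent updates from being lost.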
UMA Multiprocessor Systems

All the memory resources feature the same access latency
CPUs access RAM via a bus or an interconnection network
Cache memories and local (private) memories can be used to reduce bus contention or network traffic
Straightforward programmability
[Figure: CPUs with caches and private memories, connected to shared memory through an interconnection network]
NUMA Multiprocessor Systems

Remote memory resources have higher access latency than local memories
The system features a single address space visible to all CPUs
Potentially higher performance due to higher scalability
Difficult to program
[Figure: nodes 1 to n, each containing CPUs with caches and a local memory (M), connected through an interconnection network]
Dataflow model

This model derives directly from data-flow graph (DFG) representations
An instruction is executed within a module when the required operands are available
The computation is data driven, since the data stream activates the nodal operations
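The firing rule described above (a node executes as soon as all of its operands are available) can be sketched as a small interpreter. This is an illustrative model only: `run_dataflow`, `GRAPH` and the graph encoding are assumptions made for the example, and nodes are fired sequentially rather than on separate processing modules.

```python
import operator


def run_dataflow(graph, inputs):
    """graph maps a node name to (function, operand names); inputs maps names to values."""
    values = dict(inputs)        # tokens that are currently available
    pending = dict(graph)        # nodes that have not fired yet
    fired = []                   # firing order, kept for inspection
    while pending:
        # A node is ready when every operand it needs is available.
        ready = [name for name, (_, operands) in pending.items()
                 if all(op in values for op in operands)]
        if not ready:
            raise ValueError("some operands never become available")
        for name in ready:
            func, operands = pending.pop(name)
            values[name] = func(*(values[op] for op in operands))
            fired.append(name)
    return values, fired


# Graph for (a + b) * (c - d); the node order in the dict is deliberately "wrong".
GRAPH = {
    "prod": (operator.mul, ["sum", "diff"]),
    "sum": (operator.add, ["a", "b"]),
    "diff": (operator.sub, ["c", "d"]),
}
```

Although `prod` is listed first, it fires last, because execution order is decided by data availability and not by program order.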
Message-passing model

Direct links are used to pass data between the processors
There are only local memories; each memory can be accessed by only one processor
[Figure: processors, each with its own memory, connected through an interconnection network]
Message-passing systems elements
Node: an independent subsystem containing a processor, local memories and peripherals
Link: a physical communication path between two nodes
Channel: a communication path between two processes in one processor or between two processes executing on different processors
Message Passing Systems
Explicit communication based on send and receive primitives
Implicit synchronization through blocking methods
Since each node runs as an independent cluster, these systems are also referred to as multicomputers.
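A minimal sketch of send and receive primitives, assuming Python threads stand in for nodes and `queue.Queue` objects stand in for channels. The names `run_pipeline`, `send` and `receive` are hypothetical. The blocking `get()` call is what provides the implicit synchronization: a receiver cannot proceed until the matching message has been sent.

```python
import queue
import threading


def run_pipeline():
    """P1 sends m1 to P2; P2 receives it and sends m2 to P3; P3 receives m2."""
    chan_p1_p2 = queue.Queue()   # channel from P1 to P2
    chan_p2_p3 = queue.Queue()   # channel from P2 to P3
    log = []

    def send(channel, message):
        channel.put(message)

    def receive(channel):
        return channel.get()     # blocks until a message is available

    def p1():
        send(chan_p1_p2, "m1")

    def p2():
        m1 = receive(chan_p1_p2)
        log.append(("P2", m1))
        send(chan_p2_p3, "m2 after " + m1)

    def p3():
        m2 = receive(chan_p2_p3)
        log.append(("P3", m2))

    # Start the receivers first to show that start order does not matter.
    threads = [threading.Thread(target=f) for f in (p3, p2, p1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Each process only touches its own local variables and the channels; there is no shared data apart from the log used to observe the result.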
Message communication mechanisms
[Figure: processes P1, P2 and P3 exchanging messages; primitives shown: send (m1, P2), receive (m1, P1), send (m2, P3), receive (m1, P1)]
Interconnection architectures
Bus-based interconnection
[Figure: processors (P) and memories (M) attached to a shared bus]
Interconnection architectures (2)
Switch-based interconnection (network)
– Crossbar switch
– Omega switching network
[Figure: processors (P) and memories (M) connected through switches]
Bus-based multiprocessor systems

A typical implementation of a shared memory model consists of a bus-based multiprocessor system (e.g. an AMBA AHB-based system)
These systems are not easily expandable to contain a large number of bus masters
Synchronization mechanisms are necessary to maintain consistency on the shared data
The shared bus and memory generate contention between bus masters, thus reducing the system speed and the data throughput
Parallel systems architectures
[Figure: parallel architectures arranged by memory model (shared memory, private memory) and by interconnection (switch-based, bus-based), drawn with processors (P) and memories (M)]
How many processors?
Category              Type                   Number of processors
Communication model   Message passing        8-256
                      Shared memory (NUMA)   8-256
                      Shared memory (UMA)    2-64
Physical connection   Switch-based           8-256
                      Bus-based              2-32
MPI standard

Message Passing Interface (MPI) is a standard for writing message-passing programs
Supports an extended message-passing model
It is not targeted to a specific platform
Both Fortran 77 and C languages are supported
Main MPI features are intended to improve performance on scalable parallel computers with specialized inter-processor communication hardware
MPICH is a free portable implementation of the MPI standard
MPI Programming

Single Program Multiple Data (SPMD) model
Communication:
– Point-to-point
– Collective (broadcast primitives)

Synchronization:
– Point-to-point synchronization is performed by message passing and blocking primitives
– Global synchronization is done through the broadcast communication paradigm
SPMD Programming Model

The same program is loaded by all the processors
Each processor can identify itself
Total number of processors is known
Execution can be different on each CPU:
my_ID = getID();
if (my_ID == 1)
    execute task1;
if (my_ID == 2)
    execute task2;
…
if (my_ID == N)
    execute taskN;
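The SPMD pattern can be sketched without an MPI installation by letting Python threads play the role of processors. This is an assumption made for illustration, not how an MPI program is launched; the names `spmd_program` and `launch` are hypothetical. IDs run from 1 to N as in the pseudocode above, and at least two workers are assumed.

```python
import threading


def spmd_program(my_id, num_procs, results):
    """The same function runs on every worker; behaviour depends only on my_id."""
    if my_id == 1:
        # Worker 1 takes a coordinating role.
        results[my_id] = "coordinator of %d workers" % num_procs
    else:
        # Workers 2..N each sum their own interleaved slice of 0..99.
        results[my_id] = sum(range(my_id - 2, 100, num_procs - 1))


def launch(num_procs=4):
    """Start num_procs copies of the same program, each with its own ID."""
    results = {}
    threads = [threading.Thread(target=spmd_program, args=(i, num_procs, results))
               for i in range(1, num_procs + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Every worker knows both its own ID and the total number of workers, which is enough to split the work: with four workers, IDs 2 to 4 together cover the integers 0 to 99 exactly once.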