Sistemi embedded a multiprocessore Motivation

Transcript

Sistemi embedded a multiprocessore Motivation
Sistemi embedded a
multiprocessore
Massimo Bocchi
Architetture Digitali per l’Elaborazione dei Segnali LS
A.A. 2003/2004
Motivation
„
Performance increase
– Pipelined execution
– Parallel execution
„
Cost effectiveness
–
–
„
Custom uniprocessors are less cost effective
Microprocessor design cost can be amortized if the
processor core can be reused on the same system
Multi processor System On Chip (MPSoC
(MPSoC))
– Growing transistor count on a single chip should be
exploited
– Parallelism becomes effective use of chip density
Massimo Bocchi, 20/02/2004
ARCES - University of Bologna
2/23
1
Processor-time diagram
sequential
execution
P2
2-level
pipelined
execution
P2
2-level
parallel
execution
P2
P1 A
P1 A
P1 A
A
A
A
A
A
A
Massimo Bocchi, 20/02/2004
C
C
C
C
C
A
A
A
B
C
C
C
C
C
C
C
C
C
B
A
A
A
B
A
A
A
B
A
A
A
B
C
C
C
C
C
A
C
C
A
A
A
B
B
B
C
C
C
ARCES - University of Bologna
C
3/23
Main topics
„ Inter-processor
communication and data
sharing
„ Inter-processor synchronization
„ Interconnection architecture
Massimo Bocchi, 20/02/2004
ARCES - University of Bologna
4/23
2
Multiprocessor classification
In 1966 Flynn proposed a multiprocessor classification:
SISD (single instruction stream – single data stream): a normal
single processor based on the von Neumann model
„ SIMD (single instruction stream – multiple data stream): a single
instruction stream is generated by a single control unit but used
used by
several processors; each processor executes the same instruction
stream but acts upon different data, generating multiple data
streams
„ MISD (multiple instruction stream – single data stream): no existing
implementations can be referred to this model
„ MIMD (multiple instruction stream – multiple data stream): one
instruction stream is generated for each computer; each instruction
instruction
acts upon different data
„
Massimo Bocchi, 20/02/2004
ARCES - University of Bologna
5/23
MIMD multiprocessor classification
„
A MIMD multiprocessor system needs a mechanism to
load instruction and data memories and to pass
information between the processors.
Two main models:
shared memory multiprocessor systems
– Uniform Memory Access (UMA) multiprocessors or Symmetric
Multiprocessors (SMP)
– NonNon-Uniform Memory Access (NUMA) multiprocessors or
Asymmetric Multiprocessors (AMP)
„
multiprocessor systems without shared memory
– dataflow multiprocessor systems
– messagemessage-passing multiprocessor systems
Massimo Bocchi, 20/02/2004
ARCES - University of Bologna
6/23
3
Shared memory model
„
„
All the processors
have access to a
common memory
space
Cache memories
create consistency
problems on the
shared data
Massimo Bocchi, 20/02/2004
Memories
Interconnection
network
Processors
ARCES - University of Bologna
7/23
Shared memory multiprocessors
„ Implicit
communication performed through
shared variables
„ Explicit synchronization based on
lock/unlock mechanism (semaphores and
monitors)
„ Two types of shared memory systems:
– Uniform Memory Access (UMA)
– NonNon-Uniform Memory Access (NUMA)
Massimo Bocchi, 20/02/2004
ARCES - University of Bologna
8/23
4
UMA Multiprocessor Systems
„
„
„
„
All the memory resources
feature the same access
latency
CPUs can access to RAM
via bus or interconnection
network
Cache memories and
local (private) memories
can be used to reduce
bus contention or
network traffic.
Straightforward
programmability
Private
Private
Memory
Memory
Private
Private
Memory
Memory
CPU
CPU
cache
cache
Shared
Shared
Memory
Memory
Interconnection
network
Massimo Bocchi, 20/02/2004
ARCES - University of Bologna
9/23
NUMA Multiprocessor Systems
„
„
„
„
Remote memory
resources have higher
access latency then local
memories
The system features a
single address space
visible to all CPUs
Potentially higher
performance due to
higher scalability
Difficult to program
Massimo Bocchi, 20/02/2004
Node 1
CPU
CPU
cache
cache
M
Node n
CPU
CPU
cache
cache
M
Interconnection
network
ARCES - University of Bologna
10/23
10/23
5
Dataflow model
„
„
„
This model directly
derives from DFG
representations
An instruction is executed
within a module when the
operands required are
available
The computation is data
driven since the data
stream activates the
nodal operations
Massimo Bocchi, 20/02/2004
ARCES - University of Bologna
11/23
11/23
Message-passing model
„
„
Direct links are used
to pass data between
the processors
There are only local
memories; each
memory can be
accessed only by one
processor
Massimo Bocchi, 20/02/2004
Interconnection
network
Processors
Memories
ARCES - University of Bologna
12/23
12/23
6
Message-passing systems elements
„ Node:
an independent subsystem
containing a processor and local memories
and peripherals
„ Link: a physical communication path
between two nodes
„ Channel: a communication path between
two processes in one processor or
between two processes executing on
different processors
Massimo Bocchi, 20/02/2004
ARCES - University of Bologna
13/23
13/23
Message Passing Systems
communication based on send and
receive primitives
„ Implicit synchronization through blocking
methods
„ Since each node runs as an independent
cluster, these system are also referred as
multicomputers.
„ Explicit
Massimo Bocchi, 20/02/2004
ARCES - University of Bologna
14/23
14/23
7
Message communication
mechanisms
Process P1
Process P2
Process P3
receive (m1, P1)
send (m1, P2)
receive (m1, P1)
send (m2, P3)
Massimo Bocchi, 20/02/2004
ARCES - University of Bologna
15/23
15/23
Interconnection architectures
„ Bus-based
interconnection
M
P
Massimo Bocchi, 20/02/2004
M
P
M
P
P
ARCES - University of Bologna
16/23
16/23
8
Interconnection architectures (2)
„ Switch-based
interconnection (network)
– Crossbar switch
– Omega switching
network
P
M
P
P
M
P
P
M
P
P
M
M
Massimo Bocchi, 20/02/2004
M
M
ARCES - University of Bologna
17/23
17/23
Bus-based multiprocessor systems
„
„
„
„
A typical implementation of a shared memory
model consists of a busbus-based multiprocessor
system (e.g. an AMBA AHBAHB-based system)
These systems are not easily expandable to
contain a large number of bus masters
Synchronization mechanisms are necessary to
maintain consistency on the shared data
The shared bus and memory generate
contentions between bus masters, thus reducing
the system speed and the data throughput
Massimo Bocchi, 20/02/2004
ARCES - University of Bologna
18/23
18/23
9
Parallel systems architectures
Shared Memory
P
M
P
P
P
M
P
P
M
P
M
M
M
M
P
P
P
P
M
M
M
M
P
P
P
P
P
Massimo Bocchi, 20/02/2004
ARCES - University of Bologna
Switch-based
M
M
Bus-based
M
Private Memory
19/23
19/23
How many processors?
Category
Type
Number of
processors
Communication
model
Message passing
8-256
Physical connection
Massimo Bocchi, 20/02/2004
Shared
memory
NUMA
8-256
UMA
2-64
Switch-based
8-256
Bus-based
2-32
ARCES - University of Bologna
20/23
20/23
10
MPI standard
„
„
„
„
„
„
Message Passing Interface is a standard for
writing messagemessage-passing programs
Supports an extended messagemessage-passing model
It is not targeted to a specific platform
Both Fortran 77 and C languages are supported
Main MPI features are intended to improve
performance on scalable parallel computers with
specialized interinter-processor communication
hardware
MPICH is a free portable implementation of MPI
standard
Massimo Bocchi, 20/02/2004
ARCES - University of Bologna
21/23
21/23
MPI Programming
„
„
Single Program Multiple Data (SPMD) model
Communication:
– PointPoint-toto-point
– Collective (broadcast primitives)
„
Synchronization
– PointPoint-toto-point synchronization is performed by
message passing and blocking primitives
– Global synchronization is done by broadcast
communication paradigm
Massimo Bocchi, 20/02/2004
ARCES - University of Bologna
22/23
22/23
11
SPMD Programming Model
„
„
„
„
The same program is loaded by all the
processors
Each processor can identify itself
Total number of processors is known
Execution can be different on each CPU:
My_ID = getID();
if (my_ID == 1)
execute task1;
if (my_ID == 2)
execute task2;
…
if (my_ID == N)
execute taskN;
Massimo Bocchi, 20/02/2004
ARCES - University of Bologna
23/23
23/23
12