PRACTICAL TEST

Roberto Giorgi, Universita' degli Studi di Siena, C215ES04--SL 1 di 17
COTSon installation
• Follow the COTSon User Guide for the general installation procedure:
http://sourceforge.net/p/cotson/code/HEAD/tree/trunk/doc/COTSON_USER_GUIDE-v4.pdf
(here we assume that you have installed COTSon in the “cotson” directory)
• Download:
http://www.dii.unisi.it/~giorgi/teaching/hpca2/betatools/benchmark_cjpeg.tar.gz
• Examples folder:
~/cotson/src/examples/
• Uncompress the CJPEG exercise into the examples folder:
$ tar xvzf benchmark_cjpeg.tar.gz -C ~/cotson/src/examples/
CJPEG program: benchmark
• The CJPEG program belongs to libjpeg-turbo-utils, a set of utilities
for manipulating JPEG images
• This benchmark compresses the named image file, or
the standard input if no file is named, and produces a JPEG
file on the standard output. The currently supported
input file formats include PPM and PGM, among others
• This benchmark needs an INPUT (a PPM image in our case) and
produces an OUTPUT (a JPEG image)
• The directory ( cjpeg_benchmark ) contains the input and
expected output files
[Figure: PPM (192KB) → CJPEG → JPEG (9.6KB)]
CJPEG compile and execution
• Launch the complete benchmark with:
$ make
• Compare the files produced with those in the
expected_output directory
How to launch CJPEG manually
• If you want to launch cjpeg manually, you can use:
$ ./cjpeg < input_large.ppm > output_large.jpeg
– Are the results different?
– What does it mean?
Cache Configuration
• Example cache configurations:
      L1 dcache size   L1 dcache line size   L1 dcache num sets   Memory latency
  A)  1KB              16                    1                    24
  B)  32KB             16                    1                    100
• You can modify the cache configuration inside the Lua example files
vi editor commands:
• vi <file name>: open the file in the editor
• i: enter insert mode
• Esc: exit insert mode
• :wq: write the file and quit
Set cache parameters
• Example of cache configuration for memory A:
Main memory
- mem=Memory{ name="main", latency=24 }
L2 cache:
- l2=Cache{ name="l2cache", size="512kB",
line_size=16, latency=20, num_sets=4, next=mem,
write_policy="WB", write_allocate="true" }
L1 instruction cache:
- ic=Cache{ name="icache", size="1kB", line_size=16,
latency=0, num_sets=1, next=l2,
write_policy="WT", write_allocate="false" }
L1 data cache:
- dc=Cache{ name="dcache", size="1kB", line_size=16,
latency=0, num_sets=1, next=l2,
write_policy="WT", write_allocate="false" }
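As a rough sanity check on configurations A and B, the number of lines an L1 data cache can hold is its size divided by its line size. The sketch below is only an illustration: it assumes that "kB" means 1024 bytes and that line_size is expressed in bytes, and the helper names parse_size and cache_lines are hypothetical (they are not part of COTSon).

```python
def parse_size(text):
    """Convert a size string such as "1kB" to bytes (assumption: kB = 1024 bytes)."""
    text = text.strip()
    if text.lower().endswith("kb"):
        return int(text[:-2]) * 1024
    return int(text)


def cache_lines(size, line_size):
    """Number of lines held by a cache of the given size (line_size assumed in bytes)."""
    return parse_size(size) // line_size


# L1 dcache of configuration A (1kB) and configuration B (32kB), line_size=16
for label, size in (("A", "1kB"), ("B", "32kB")):
    print(label, cache_lines(size, 16))
```

Under these assumptions configuration A holds 64 lines and configuration B holds 2048, which is worth keeping in mind when comparing the miss rates of the two runs.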
What happens?
• Try to launch cjpeg with cache configuration A
and cache configuration B (on large input)
$ make run_cjpeg_benchmark_large_memoryA
$ make run_cjpeg_benchmark_large_memoryB
– What happens to miss rate?
– What happens to IPC?
– Why?
Cache Statistics 1
                         Memory A          Memory B
Input                    input_large.ppm   input_large.ppm
L1 dcache size           1kB               32kB
CPU cycles               132084196         53861000
L1 dcache write_miss     1979661           920373
CPU instructions         34476403          34471151
nSimulation              39334375          16006250
L1 read_miss_rate        0.335606          0.0339938
L1 write_miss_rate       0.456265          0.210756
Instructions Per Cycle   0.261018          0.640002
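The Instructions Per Cycle figures can be recomputed from the CPU instructions and CPU cycles values reported for Memory A and Memory B. The short sketch below does only that division; the dictionary layout and the function name ipc are choices made for this illustration.

```python
# Values copied from the Cache Statistics 1 slide
stats = {
    "A": {"instructions": 34476403, "cycles": 132084196},
    "B": {"instructions": 34471151, "cycles": 53861000},
}


def ipc(instructions, cycles):
    """Instructions per cycle: executed instructions divided by elapsed CPU cycles."""
    return instructions / cycles


for name, s in stats.items():
    print(f"Memory {name}: IPC = {ipc(s['instructions'], s['cycles']):.6f}")
```

Both results agree with the reported values to six decimal places (0.261018 and 0.640002), which confirms which of the unlabeled numbers are the CPU cycles.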
Cache Statistics 2
                         Small Input   Large Input
L1 dcache size           32kB          32kB
Main Memory Access       149498        201586
L2 read                  403369        476308
L1 dcache read           3207862       9911483
L1 dcache write          1533580       4367012
nSimulation              8028125       16006250
Instructions per Cycle   0.404297      0.640002
L1 write miss rate       0.231708      0.210756
L2 read miss rate        0.209424      0.267012
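A miss rate is the number of misses divided by the number of accesses of the same kind. As a cross-check between the two statistics slides, the sketch below divides the L1 dcache write_miss count reported for Memory B (large input) by the L1 dcache write count reported for the large input with the 32kB cache. The function name miss_rate is hypothetical, and treating the two slides as describing the same run is an assumption.

```python
def miss_rate(misses, accesses):
    """Fraction of accesses that missed; returns 0.0 when there were no accesses."""
    return misses / accesses if accesses else 0.0


l1_write_misses = 920373  # "L1 dcache write_miss", Memory B, large input
l1_writes = 4367012       # "L1 dcache write", large input, 32kB L1 dcache

print(f"L1 write miss rate = {miss_rate(l1_write_misses, l1_writes):.6f}")
```

The result matches the reported L1 write miss rate of 0.210756 to six decimal places, which supports reading the two slides as the same run.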
Cache Statistics
As we can see, with the small cache configuration
(Memory A) we need more CPU cycles and we
have more read misses. The L1 miss rate is
lower with the large cache than with the small one, as we expect.
When we use a large input image instead, we
have more main memory accesses and a larger
nSimulation value. In fact, a larger input
requires more CPU work, that is, a higher number of CPU
operations.
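To put numbers on these observations, the ratios between the two cache configurations and between the two input sizes can be computed directly from the reported statistics. This is only a sketch: the helper name ratio is hypothetical and the values are copied from the two statistics slides.

```python
def ratio(a, b):
    """How many times larger a is than b."""
    return a / b


# Memory A (1kB L1 dcache) versus Memory B (32kB L1 dcache), large input
cycles_ratio = ratio(132084196, 53861000)     # CPU cycles
read_miss_ratio = ratio(0.335606, 0.0339938)  # L1 read_miss_rate

# Large input versus small input, 32kB L1 dcache
nsim_ratio = ratio(16006250, 8028125)         # nSimulation
mem_access_ratio = ratio(201586, 149498)      # Main Memory Access

print(f"cycles A/B: {cycles_ratio:.2f}")
print(f"L1 read miss rate A/B: {read_miss_ratio:.2f}")
print(f"nSimulation large/small: {nsim_ratio:.2f}")
print(f"main memory accesses large/small: {mem_access_ratio:.2f}")
```

The small cache needs roughly two and a half times the cycles and shows a read miss rate about ten times higher, while the large input roughly doubles nSimulation.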
Region Of Interest (ROI)
$ cotson_tracer 10 1 0
$ ./cjpeg < input_$BENCHMARK_TYPE.ppm > output-$BENCHMARK_TYPE-$BENCHMARK_MEM.jpeg
$ cotson_tracer 10 1 1
Region Of Interest (ROI)
cotson_tracer is part of the guest tools, preinstalled in the BSD
Call "10" is reserved for the selective sampler
$ cotson_tracer 10 1 0 ## switch to timing
$ ./cjpeg < input_small.ppm > output-small-A.jpeg
$ cotson_tracer 10 1 1 ## back to functional
COTSon tracing
COTSon can be used and modified to receive input from inside
the guest system. The CJPEG exercise is an example of the
exit_trigger functionality. The exit_trigger tells COTSon that you
want the simulation to end when a particular file appears in the
host environment. We then use the send_keyboard function to
instruct the guest system to produce a file and send it
outside (using xput) as this exit trigger, which ends the
simulation.
This is a way of allowing simulation to take place for complete
applications.
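The idea of a file-based exit trigger can be illustrated with a toy polling loop: keep checking whether a given file exists and stop as soon as it appears. The sketch below is not COTSon's implementation; the function name wait_for_trigger and its parameters are hypothetical and only show the general mechanism.

```python
import os
import time


def wait_for_trigger(path, poll_interval=0.1, timeout=None):
    """Return True once `path` exists, or False if `timeout` seconds pass first."""
    deadline = None if timeout is None else time.monotonic() + timeout
    while not os.path.exists(path):
        if deadline is not None and time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True


# Hypothetical usage: block until a file named "terminate" shows up
# if wait_for_trigger("terminate", timeout=3600):
#     print("trigger file found, stopping")
```

Polling is the simplest way to express the mechanism; the timeout is there so that the loop cannot wait forever if the guest never sends the file.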
COTSon tracing
options = {
exit_trigger="terminate",
…..
}
simnow.commands=function()
…
send_keyboard('xget cjpeg.sh cjpeg.sh; ' ..
'chmod +x cjpeg.sh; ' ..
'./cjpeg.sh small A; ' ..
'xput cjpeg.sh ' .. options.exit_trigger)
end
COTSon timing
Current Models:
- TraceStats: simple linear model
- Timer0 : simple linear model + cache hierarchy
- Timer1 : Timer0 + in-order pipeline
- Bandwidth: only limited by memory bandwidth