PRACTICAL TEST
Roberto Giorgi, Universita' degli Studi di Siena, C215ES04

COTSon installation
• Follow the COTSon User Guide for the general installation procedure:
  http://sourceforge.net/p/cotson/code/HEAD/tree/trunk/doc/COTSON_USER_GUIDE-v4.pdf
  (here we assume that you have installed COTSon in the "cotson" directory)
• Download: http://www.dii.unisi.it/~giorgi/teaching/hpca2/betatools/benchmark_cjpeg.tar.gz
• Examples folder: ~/cotson/src/examples/
• Uncompress the CJPEG exercise into the examples folder:
  $ tar xvzf benchmark_cjpeg.tar.gz -C ~/cotson/src/examples/

CJPEG program: benchmark
• The cjpeg program belongs to libjpeg-turbo-utils, a set of utilities for manipulating JPEG images
• This benchmark compresses the named image file, or the standard input if no file is named, and produces a JPEG file on the standard output. The currently supported input file formats are: PPM, PGM, and so on
• This benchmark needs an INPUT (a PPM image in our case) and produces an OUTPUT (a JPEG image)
• The directory (cjpeg_benchmark) contains the input and expected output files

CJPEG
• Input: PPM (192KB)
• Output: JPEG (9.6KB)

CJPEG compile and execution
• Launch the complete benchmark with:
  $ make
• Compare the files produced with those in the expected_output directory

How to launch CJPEG manually
• If you want to launch cjpeg manually, you can use:
  $ ./cjpeg < input-large.ppm > output_large.jpeg
  – Are the results different?
  – What does it mean?
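The comparison against the expected_output directory can be automated. The following is a minimal sketch, assuming the produced files carry the same names as the reference files; the function name compare_outputs and the directory layout are illustrative assumptions, not part of the benchmark package.

```python
import filecmp
from pathlib import Path


def compare_outputs(produced_dir, expected_dir):
    """Compare each file in expected_dir with the same-named file in produced_dir.

    Returns a dict mapping file name to:
      True  -> the produced file is byte-for-byte identical
      False -> the produced file exists but differs
      None  -> the produced file is missing
    """
    results = {}
    for expected in sorted(Path(expected_dir).iterdir()):
        if not expected.is_file():
            continue
        produced = Path(produced_dir) / expected.name
        if not produced.is_file():
            results[expected.name] = None
        else:
            # shallow=False forces a content comparison, not just size/mtime
            results[expected.name] = filecmp.cmp(produced, expected, shallow=False)
    return results
```

For instance, calling compare_outputs(".", "expected_output") from inside the benchmark directory (a hypothetical layout) would report which output files match their references byte for byte.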

Cache Configuration
• Example cache configurations:

  Config   L1 dcache size   L1 dcache line size   L1 dcache num sets   Memory latency
  A)       1KB              16                    1                    24
  B)       32KB             16                    1                    100

• You can modify the cache configuration inside the Lua example file

Linux commands:
• vi <file name>: open the editor
• i: enter insert mode
• esc: exit insert mode
• :wq: write the file and quit

Set cache parameters
• Example of cache configuration for memory A:
  Main memory:
  - mem=Memory{ name="main", latency=24 }
  L2 cache:
  - l2=Cache{ name="l2cache", size="512kB", line_size=16, latency=20, num_sets=4, next=mem, write_policy="WB", write_allocate="true" }
  L1 instruction cache:
  - ic=Cache{ name="icache", size="1kB", line_size=16, latency=0, num_sets=1, next=l2, write_policy="WT", write_allocate="false" }
  L1 data cache:
  - dc=Cache{ name="dcache", size="1kB", line_size=16, latency=0, num_sets=1, next=l2, write_policy="WT", write_allocate="false" }

What happens?
• Try to launch cjpeg with cache configuration A and cache configuration B (on the large input):
  $ make run_cjpeg_benchmark_large_memoryA
  $ make run_cjpeg_benchmark_large_memoryB
  – What happens to the miss rate?
  – What happens to the IPC?
  – Why?
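To build intuition for the miss-rate question, here is a toy model, not COTSon's cache model: a direct-mapped cache with 16-byte lines, driven by a synthetic access pattern that scans an 8 kB array four times in 4-byte steps. The working-set size, stride, number of passes, and the direct-mapped organization are all assumptions chosen for illustration.

```python
def miss_rate(cache_size, line_size=16, working_set=8 * 1024, stride=4, passes=4):
    """Miss rate of a toy direct-mapped cache over a repeated sequential scan."""
    num_lines = cache_size // line_size
    tags = [None] * num_lines            # one stored tag per cache line
    accesses = misses = 0
    for _ in range(passes):
        for addr in range(0, working_set, stride):
            block = addr // line_size    # memory block number
            index = block % num_lines    # cache line this block maps to
            tag = block // num_lines     # identifies which block occupies the line
            accesses += 1
            if tags[index] != tag:
                misses += 1
                tags[index] = tag        # fill the line on a miss
    return misses / accesses


for size in (1024, 32 * 1024):
    print(f"{size // 1024} kB cache: miss rate = {miss_rate(size):.4f}")
# 1 kB cache: miss rate = 0.2500
# 32 kB cache: miss rate = 0.0625
```

In this toy setup the 8 kB working set does not fit in the 1 kB cache, so every line is evicted before it is reused; in the 32 kB cache only the first pass misses. Real benchmark behavior depends on the actual access pattern of cjpeg.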

Cache Statistics 1

                            Memory A          Memory B
  Input                     input_large.ppm   input_large.ppm
  L1 dcache size            1kB               32kB
  CPU cycles                132084196         53861000
  L1 dcache write_miss      1979661           920373
  CPU instructions          34476403          34471151
  nSimulation               39334375          16006250
  L1 read_miss_rate         0.335606          0.0339938
  L1 write_miss_rate        0.456265          0.210756
  Instructions Per Cycle    0.261018          0.640002

Cache Statistics 2

                            Small Input       Large Input
  L1 dcache size            32kB              32kB
  Main Memory Access        149498            201586
  L2 read                   403369            476308
  L1 dcache read            3207862           9911483
  L1 dcache write           1533580           4367012
  nSimulation               8028125           16006250
  Instructions Per Cycle    0.404297          0.640002
  L1 write miss rate        0.231708          0.210756
  L2 read miss rate         0.209424          0.267012

Cache Statistics
As the tables show, with the small cache configuration (1kB L1 dcache) the benchmark needs more CPU cycles and incurs more read misses. As expected, the L1 miss rate is lower with the larger cache than with the smaller one. With the large input image, instead, we see more main memory accesses and a higher nSimulation value. In fact, a larger input requires more CPU work and a higher number of CPU operations.
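The Instructions Per Cycle figures in Cache Statistics 1 can be cross-checked from the CPU instructions and CPU cycles rows. The short sketch below uses only the numbers reported in that table; the variable names are illustrative.

```python
# Values copied from the Cache Statistics 1 table
stats = {
    "Memory A": {"cycles": 132084196, "instructions": 34476403},
    "Memory B": {"cycles": 53861000, "instructions": 34471151},
}


def ipc(entry):
    """Instructions per cycle = retired instructions / elapsed CPU cycles."""
    return entry["instructions"] / entry["cycles"]


for name, entry in stats.items():
    print(f"{name}: IPC = {ipc(entry):.6f}")
# Memory A: IPC = 0.261018
# Memory B: IPC = 0.640002

# The instruction counts are nearly equal, so the IPC gain comes from fewer cycles
cycle_ratio = stats["Memory A"]["cycles"] / stats["Memory B"]["cycles"]
print(f"Cycle-count ratio A/B = {cycle_ratio:.2f}")
# Cycle-count ratio A/B = 2.45
```

Both configurations execute almost the same number of instructions, so the larger L1 data cache improves IPC by reducing the cycles spent waiting on misses.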

Region Of Interest (ROI)
  $ cotson_tracer 10 1 0
  $ ./cjpeg < input_$BENCHMARK_TYPE.ppm > output-$BENCHMARK_TYPE-$BENCHMARK_MEM.jpeg
  $ cotson_tracer 10 1 1

Region Of Interest (ROI)
• cotson_tracer is part of the guest tools, preinstalled in the BSD
• Call "10" is reserved for the selective sampler
  $ cotson_tracer 10 1 0   ## switch to timing
  $ ./cjpeg < input_small.ppm > output-small-A.jpeg
  $ cotson_tracer 10 1 1   ## back to functional

COTSon tracing
COTSon can be used and modified to receive input from inside the guest system. The CJPEG exercise is an example of the exit_trigger functionality. The exit_trigger tells COTSon that you want the simulation to end when a particular file appears in the host environment. We then use the send_keyboard function to instruct the guest system to produce a file and send it outside (using xput) to the host, where it fires the exit trigger and ends the simulation. This is a way of allowing simulation to take place for complete applications.

COTSon tracing
  options = {
    exit_trigger="terminate",
    …..
  }
  simnow.commands=function()
    …
    send_keyboard('xget cjpeg.sh cjpeg.sh; chmod +x cjpeg.sh ; ./cjpeg.sh small A ; xput cjpeg.sh '..options.exit_trigger)
  end

COTSon timing
Current models:
- TraceStats: simple linear model
- Timer0: simple linear model + cache hierarchy
- Timer1: Timer0 + in-order pipeline
- Bandwidth: only limited by memory bandwidth
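The exit_trigger idea from the COTSon tracing slides (end the run when a particular file appears on the host) can be illustrated with a generic file-polling loop. This is only a conceptual sketch of file-based termination and says nothing about how COTSon implements it internally; the function name wait_for_trigger, the timeout, and the polling interval are assumptions.

```python
import os
import time


def wait_for_trigger(path, timeout=10.0, poll_interval=0.1):
    """Poll until the file at `path` exists.

    Returns True as soon as the file is seen, or False if `timeout`
    seconds elapse first.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

A host-side script could, for example, call wait_for_trigger("terminate") and stop collecting results once it returns True, mirroring the role of the file named by exit_trigger.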