Commit | Line | Data |
---|---|---|
b64dcbe3 MN |
1 | FFmpeg & evaluating performance on the PowerPC Architecture HOWTO |
2 | ||
b839da64 | 3 | (c) 2003-2004 Romain Dolbeau <romain@dolbeau.org> |
b64dcbe3 MN |
4 | |
5 | ||
6 | ||
7 | I - Introduction | |
8 | ||
2b552569 DB |
9 | The PowerPC architecture and its SIMD extension AltiVec offer some |
10 | interesting tools to evaluate performance and improve the code. | |
41061adf | 11 | This document tries to explain how to use those tools with FFmpeg. |
b64dcbe3 | 12 | |
2b552569 DB |
13 | The architecture itself offers two ways to evaluate the performance of |
14 | a given piece of code: | |
b64dcbe3 MN |
15 | |
16 | 1) The Time Base Registers (TBL) | |
17 | 2) The Performance Monitor Counter Registers (PMC) | |
18 | ||
41061adf DB |
19 | The first ones are always available, always active, but they're not very |
20 | accurate: the registers increment by one every four *bus* cycles. On | |
21 | my 667 MHz tiBook (ppc7450), this means once every twenty *processor* |
22 | cycles, which is too coarse for our purposes, so we won't use them. |
b64dcbe3 | 23 | |
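The "twenty processor cycles" figure above can be checked with a little arithmetic. The sketch below is only an illustration: the function name is made up, and the 133 MHz bus clock used in the usage note is an assumption about that machine, not a value given in this document.

```c
#include <assert.h>

/* Processor cycles that elapse between two time base increments.
 * cpu_hz and bus_hz are clock frequencies in Hz; bus_cycles_per_tick
 * is the number of bus cycles per increment (four, per the text). */
double cpu_cycles_per_timebase_tick(double cpu_hz, double bus_hz,
                                    unsigned bus_cycles_per_tick)
{
    assert(cpu_hz > 0.0 && bus_hz > 0.0);
    /* Each bus cycle lasts cpu_hz / bus_hz processor cycles. */
    return (cpu_hz / bus_hz) * bus_cycles_per_tick;
}
```

With an assumed 133 MHz bus, cpu_cycles_per_timebase_tick(667e6, 133e6, 4) comes out at roughly twenty, which is far too coarse to time a function that runs for a few hundred cycles.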
41061adf | 24 | The PMC are much more useful: not only can they report cycle-accurate |
2b552569 | 25 | timing, but they can also be used to monitor many other parameters, |
41061adf | 26 | such as the number of AltiVec stalls for every kind of instruction, |
2b552569 DB |
27 | or instruction cache misses. The downside is that not all processors |
28 | support the PMC (all G3, all G4 and the 970 do support them), and | |
29 | they're inactive by default - you need to activate them with a | |
41061adf DB |
30 | dedicated tool. Also, the number of available PMC depends on the |
31 | processor: the various 604 have 2, the various 75x (aka G3) have 4, |
32 | and the various 74xx (aka G4) have 6. |
b64dcbe3 | 33 | |
41061adf DB |
34 | *WARNING*: The PowerPC 970 is not very well documented, and its PMC |
35 | registers are 64 bits wide. For the code to handle this, you *must* |
36 | tune for the 970 (using --tune=970), or the code will assume 32-bit |
2b552569 | 37 | registers. |
b64dcbe3 MN |
38 | |
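To see why the register width matters, consider how an elapsed count is derived from two reads of a counter. The sketch below is a generic illustration and not FFmpeg's code; the function names are hypothetical. Unsigned subtraction at the correct width survives a single wraparound, but only if the width matches the hardware register.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed count between two reads of a 32-bit counter.
 * Unsigned arithmetic wraps modulo 2^32, so the result is still
 * correct if the counter overflowed once between the reads. */
uint32_t pmc_delta32(uint32_t start, uint32_t stop)
{
    return (uint32_t)(stop - start);
}

/* Same idea for a 64-bit counter (the width the text gives for the 970). */
uint64_t pmc_delta64(uint64_t start, uint64_t stop)
{
    return stop - start;
}
```

For example, pmc_delta32(0xFFFFFFF0u, 0x10u) yields 0x20 even though the counter wrapped in between.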
39 | ||
40 | II - Enabling FFmpeg PowerPC performance support | |
41 | ||
41061adf DB |
42 | This needs to be done by hand. First, you need to configure FFmpeg as |
43 | usual, but add the "--powerpc-perf-enable" option. For instance: | |
b64dcbe3 MN |
44 | |
45 | ##### | |
1c1b5a40 | 46 | ./configure --prefix=/usr/local/ffmpeg-svn --cc=gcc-3.3 --tune=7450 --powerpc-perf-enable |
b64dcbe3 MN |
47 | ##### |
48 | ||
1c1b5a40 | 49 | This will configure FFmpeg to install inside /usr/local/ffmpeg-svn, |
2b552569 | 50 | compiling with gcc-3.3 (you should try to use this one or a newer |
41061adf DB |
51 | gcc), and tuning for the PowerPC 7450 (i.e. the newer G4; as a rule of |
52 | thumb, those at 550 MHz and more). It will also enable the PMC. |
b64dcbe3 MN |
53 | |
54 | You may also edit the file "config.h" to enable (uncomment) the following line: |
55 | ||
56 | ##### | |
57 | // #define ALTIVEC_USE_REFERENCE_C_CODE 1 | |
58 | ##### | |
59 | ||
2b552569 DB |
60 | If you enable this line, then the code will not make use of AltiVec, |
61 | but will use the reference C code instead. This is useful to compare | |
41061adf | 62 | performance between two versions of the code. |
b64dcbe3 | 63 | |
41061adf | 64 | Also, the number of enabled PMC is defined in "libavcodec/ppc/dsputil_ppc.h": |
b64dcbe3 MN |
65 | |
66 | ##### | |
67 | #define POWERPC_NUM_PMC_ENABLED 4 | |
68 | ##### | |
69 | ||
41061adf DB |
70 | If you have a G4 CPU, you can enable all 6 PMC. DO NOT enable more |
71 | PMC than available on your CPU! | |
b64dcbe3 | 72 | |
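The rule above can be written down as a small sanity check. This is only a sketch: the enum and function names are hypothetical, and the counts are the ones quoted in the introduction (2 for the 604, 4 for the 75x, 6 for the 74xx). The 970 is left out because this document does not give its count.

```c
#include <assert.h>

/* Hypothetical CPU family tags, for illustration only. */
enum ppc_family { PPC_604, PPC_75X_G3, PPC_74XX_G4 };

/* Number of PMC available per family, as quoted in the introduction. */
int ppc_available_pmc(enum ppc_family family)
{
    switch (family) {
    case PPC_604:     return 2;
    case PPC_75X_G3:  return 4;
    case PPC_74XX_G4: return 6;
    }
    return 0; /* unknown family: assume no counters */
}

/* Returns 1 if num_enabled counters can safely be enabled, else 0. */
int pmc_setting_is_safe(enum ppc_family family, int num_enabled)
{
    assert(num_enabled >= 0);
    return num_enabled <= ppc_available_pmc(family);
}
```

For instance, pmc_setting_is_safe(PPC_75X_G3, 6) is 0: a G3 only has 4 counters, so POWERPC_NUM_PMC_ENABLED must not be raised to 6 there.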
41061adf | 73 | Then, simply compile FFmpeg as usual (make && make install). |
b64dcbe3 MN |
74 | |
75 | ||
76 | ||
77 | III - Using FFmpeg PowerPC performance support | |
78 | ||
41061adf | 79 | This FFmpeg build can be used exactly as usual. But before exiting, FFmpeg
2b552569 | 80 | will dump a per-function report that looks like this: |
b64dcbe3 MN |
81 | |
82 | ##### | |
83 | PowerPC performance report | |
2b552569 DB |
84 | Values are from the PMC registers, and represent whatever the |
85 | registers are set to record. | |
b64dcbe3 MN |
86 | Function "gmc1_altivec" (pmc1): |
87 | min: 231 | |
88 | max: 1339867 | |
89 | avg: 558.25 (255302) | |
90 | Function "gmc1_altivec" (pmc2): | |
91 | min: 93 | |
92 | max: 2164 | |
93 | avg: 267.31 (255302) | |
94 | Function "gmc1_altivec" (pmc3): | |
95 | min: 72 | |
96 | max: 1987 | |
97 | avg: 276.20 (255302) | |
98 | (...) | |
99 | ##### | |
100 | ||
2b552569 | 101 | In this example, PMC1 was set to record CPU cycles, PMC2 was set to |
41061adf | 102 | record AltiVec Permute Stall Cycles, and PMC3 was set to record AltiVec |
2b552569 | 103 | Issue Stalls. |
b64dcbe3 | 104 | |
2b552569 DB |
105 | The function "gmc1_altivec" was monitored 255302 times, and the |
106 | minimum execution time was 231 processor cycles. The max and average | |
107 | aren't much use, as it's very likely the OS interrupted execution for | |
41061adf | 108 | reasons of its own :-( |
b64dcbe3 | 109 | |
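The point about min versus max and average can be illustrated with a small accumulator. This is a sketch under stated assumptions: the struct and function names are hypothetical and are not taken from FFmpeg's sources. It only shows how one interrupted call inflates max and avg while leaving min untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical accumulator for one (function, PMC) pair. */
struct pmc_stat {
    uint64_t min;   /* smallest delta seen so far */
    uint64_t max;   /* largest delta seen so far */
    uint64_t sum;   /* sum of all deltas, for the average */
    uint64_t count; /* number of monitored calls */
};

void pmc_stat_init(struct pmc_stat *s)
{
    s->min = UINT64_MAX;
    s->max = 0;
    s->sum = 0;
    s->count = 0;
}

/* Record one call, given counter values read before and after it. */
void pmc_stat_add(struct pmc_stat *s, uint64_t start, uint64_t stop)
{
    uint64_t delta = stop - start;
    if (delta < s->min) s->min = delta;
    if (delta > s->max) s->max = delta;
    s->sum += delta;
    s->count++;
}

double pmc_stat_avg(const struct pmc_stat *s)
{
    return s->count ? (double)s->sum / (double)s->count : 0.0;
}
```

Feeding it two ordinary calls of 231 and 300 cycles plus one interrupted call of 1339867 cycles gives a min of 231 but an average above 400000, which is why only the minimum is a trustworthy measure of the function itself.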
41061adf DB |
110 | With the exact same settings and source file, but using the reference C |
111 | code we get: | |
b64dcbe3 MN |
112 | |
113 | ##### | |
114 | PowerPC performance report | |
2b552569 DB |
115 | Values are from the PMC registers, and represent whatever the |
116 | registers are set to record. | |
b64dcbe3 MN |
117 | Function "gmc1_altivec" (pmc1): |
118 | min: 592 | |
119 | max: 2532235 | |
120 | avg: 962.88 (255302) | |
121 | Function "gmc1_altivec" (pmc2): | |
122 | min: 0 | |
123 | max: 33 | |
124 | avg: 0.00 (255302) | |
125 | Function "gmc1_altivec" (pmc3): | |
126 | min: 0 | |
127 | max: 350 | |
128 | avg: 0.03 (255302) | |
129 | (...) | |
130 | ##### | |
131 | ||
2b552569 DB |
132 | The minimum is now 592 cycles, so the fastest AltiVec execution is about 2.5x faster than |
133 | the fastest C execution in this example. It's not perfect but it's not | |
134 | bad (well I wrote this function so I can't say otherwise :-). | |
b64dcbe3 | 135 | |
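The speedup quoted above is simply the ratio of the two "min" values reported for pmc1. A minimal sketch, with a hypothetical helper name:

```c
#include <assert.h>

/* Speedup of an optimized routine over a reference routine,
 * computed from the minimum cycle counts of the two reports. */
double min_cycle_speedup(unsigned long reference_min,
                         unsigned long optimized_min)
{
    assert(optimized_min > 0);
    return (double)reference_min / (double)optimized_min;
}
```

With the numbers from the two reports, min_cycle_speedup(592, 231) is a little over 2.5. Comparing minima rather than averages keeps OS interruptions out of the ratio.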
2b552569 | 136 | Once you have that kind of report, you can try to improve things by |
41061adf DB |
137 | finding what goes wrong and fixing it; in the example above, one |
138 | should try to diminish the number of AltiVec stalls, as this *may* | |
139 | improve performance. | |
b64dcbe3 MN |
140 | |
141 | ||
142 | ||
41061adf | 143 | IV - Enabling the PMC in Mac OS X
b64dcbe3 | 144 | |
2b552569 DB |
145 | This is easy. Use "MONster" and "monster". Those tools come from |
146 | Apple's CHUD package, and can be found hidden in the developer web | |
41061adf | 147 | site & FTP site. "MONster" is the graphical application; use it to
2b552569 DB |
148 | generate a config file specifying what each register should |
149 | monitor. Then use the command-line application "monster" to use that | |
150 | config file, and enjoy the results. | |
b64dcbe3 | 151 | |
41061adf | 152 | Note that "MONster" can be used for many other things, but it's |
2b552569 | 153 | documented by Apple and outside the scope of this document.
b64dcbe3 | 154 | |
7bb79b44 GP |
155 | If you are using CHUD 4.4.2 or later, you'll notice that MONster is |
156 | no longer available. It's been superseded by Shark, where |
157 | configuration of PMCs is available as a plugin. | |
158 | ||
b64dcbe3 MN |
159 | |
160 | ||
41061adf | 161 | V - Enabling the PMC on Linux
b64dcbe3 | 162 | |
bbf84ca1 LB |
163 | On Linux you may use oprofile from http://oprofile.sf.net. Depending on the |
164 | version and the CPU, you may need to apply a patch[1] to access a set of the |
165 | possible counters from the userspace application. You can always define them |
166 | using the kernel interface /dev/oprofile/* . |
167 | ||
168 | [1] http://dev.gentoo.org/~lu_zero/development/oprofile-g4-20060423.patch | |
b64dcbe3 | 169 | |
115329f1 | 170 | -- |
bbf84ca1 LB |
171 | Romain Dolbeau <romain@dolbeau.org> |
172 | Luca Barbato <lu_zero@gentoo.org> |