FFmpeg & evaluating performance on the PowerPC Architecture HOWTO

(c) 2003-2004 Romain Dolbeau <romain@dolbeau.org>



I - Introduction

The PowerPC architecture and its SIMD extension AltiVec offer some
interesting tools to evaluate performance and improve the code.
This document tries to explain how to use those tools with FFmpeg.

The architecture itself offers two ways to evaluate the performance of
a given piece of code:

1) The Time Base Registers (TBL)
2) The Performance Monitor Counter Registers (PMC)

The first ones are always available and always active, but they're not
very accurate: the registers increment by one every four *bus* cycles.
On my 667 MHz tiBook (ppc7450), this means once every twenty
*processor* cycles. So we won't use them.
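
To make that arithmetic explicit, here is a minimal C sketch of the
time base granularity. The 133 MHz bus clock used in the note below is
an assumption (it is not stated above), and the function name is made
up for this example:

```c
#include <assert.h>

/*
 * Processor cycles that elapse between two time base increments,
 * assuming the time base advances once every 4 bus cycles (as the
 * text above states).
 */
double cycles_per_timebase_tick(double cpu_hz, double bus_hz)
{
    assert(cpu_hz > 0.0 && bus_hz > 0.0);
    return 4.0 * (cpu_hz / bus_hz);
}
```

With cpu_hz = 667e6 and an assumed bus_hz = 133e6, this gives roughly
20 processor cycles per tick, which is too coarse to time a function
that only runs for a few hundred cycles.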

The PMC are much more useful: not only can they report cycle-accurate
timing, but they can also be used to monitor many other parameters,
such as the number of AltiVec stalls for every kind of instruction,
or instruction cache misses. The downside is that not all processors
support the PMC (all G3, all G4 and the 970 do support them), and
they're inactive by default - you need to activate them with a
dedicated tool. Also, the number of available PMC depends on the
processor: the various 604 have 2, the various 75x (aka G3) have 4,
and the various 74xx (aka G4) have 6.

*WARNING*: The PowerPC 970 is not very well documented, and its PMC
registers are 64 bits wide. To let the code know about this, you *must*
tune for the 970 (using --tune=970), or the code will assume 32-bit
registers.


II - Enabling FFmpeg PowerPC performance support

This needs to be done by hand. First, you need to configure FFmpeg as
usual, but add the "--powerpc-perf-enable" option. For instance:

#####
./configure --prefix=/usr/local/ffmpeg-svn --cc=gcc-3.3 --tune=7450 --powerpc-perf-enable
#####

This will configure FFmpeg to install inside /usr/local/ffmpeg-svn,
compiling with gcc-3.3 (you should try to use this one or a newer
gcc), and tuning for the PowerPC 7450 (i.e. the newer G4; as a rule of
thumb, those at 550 MHz and more). It will also enable the PMC.

You may also edit the file "config.h" to enable (uncomment) the
following line:

#####
// #define ALTIVEC_USE_REFERENCE_C_CODE 1
#####
2b552569
DB
60If you enable this line, then the code will not make use of AltiVec,
61but will use the reference C code instead. This is useful to compare
41061adf 62performance between two versions of the code.
b64dcbe3 63
41061adf 64Also, the number of enabled PMC is defined in "libavcodec/ppc/dsputil_ppc.h":
b64dcbe3
MN
65
66#####
67#define POWERPC_NUM_PMC_ENABLED 4
68#####
69
41061adf
DB
70If you have a G4 CPU, you can enable all 6 PMC. DO NOT enable more
71PMC than available on your CPU!
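
One way to guard against that mistake is a small sanity check. The
sketch below is purely illustrative: the enum and the function names
are hypothetical and not part of FFmpeg, and the counts are simply the
ones listed in the introduction (604: 2, 75x: 4, 74xx: 6).

```c
#include <assert.h>

/* Hypothetical CPU family tags, for illustration only. */
enum ppc_family { PPC_604, PPC_75X, PPC_74XX };

/* Number of PMC per family, as listed in the introduction. */
int pmc_available(enum ppc_family family)
{
    switch (family) {
    case PPC_604:  return 2;
    case PPC_75X:  return 4;
    case PPC_74XX: return 6;
    }
    return 0; /* unknown family: assume no PMC */
}

/* Returns 1 if 'enabled' does not exceed what the family offers. */
int pmc_setting_is_safe(enum ppc_family family, int enabled)
{
    assert(enabled >= 0);
    return enabled <= pmc_available(family);
}
```

For example, pmc_setting_is_safe(PPC_75X, 6) returns 0, because a G3
only has 4 PMC, while pmc_setting_is_safe(PPC_74XX, 6) returns 1.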

Then, simply compile FFmpeg as usual (make && make install).



III - Using FFmpeg PowerPC performance support

This build of FFmpeg can be used exactly as usual. But before exiting,
FFmpeg will dump a per-function report that looks like this:

#####
PowerPC performance report
 Values are from the PMC registers, and represent whatever the
 registers are set to record.
 Function "gmc1_altivec" (pmc1):
        min: 231
        max: 1339867
        avg: 558.25 (255302)
 Function "gmc1_altivec" (pmc2):
        min: 93
        max: 2164
        avg: 267.31 (255302)
 Function "gmc1_altivec" (pmc3):
        min: 72
        max: 1987
        avg: 276.20 (255302)
(...)
#####

In this example, PMC1 was set to record CPU cycles, PMC2 was set to
record AltiVec Permute Stall Cycles, and PMC3 was set to record AltiVec
Issue Stalls.

The function "gmc1_altivec" was monitored 255302 times, and the
minimum execution time was 231 processor cycles. The max and average
aren't much use, as it's very likely the OS interrupted execution for
reasons of its own :-(
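
To see why the minimum is the trustworthy figure, here is a minimal
sketch of how such min/max/avg values could be accumulated from counter
readings taken before and after each call. This is not FFmpeg's actual
code; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical statistics for one function and one counter. */
struct perf_stat {
    uint64_t min;
    uint64_t max;
    uint64_t total;
    uint64_t calls;
};

void perf_stat_init(struct perf_stat *s)
{
    s->min   = UINT64_MAX;
    s->max   = 0;
    s->total = 0;
    s->calls = 0;
}

/* Record one call from counter values read before and after it. */
void perf_stat_add(struct perf_stat *s, uint64_t start, uint64_t stop)
{
    assert(stop >= start);
    uint64_t delta = stop - start;

    if (delta < s->min)
        s->min = delta;
    if (delta > s->max)
        s->max = delta;
    s->total += delta;
    s->calls++;
}

double perf_stat_avg(const struct perf_stat *s)
{
    return s->calls ? (double)s->total / (double)s->calls : 0.0;
}
```

Feeding it the deltas 231, 300 and 1339867 leaves min at 231, but the
single interrupted call sets max to 1339867 and drags the average above
400000. Only the minimum is unaffected by such outliers.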

With the exact same settings and source file, but using the reference C
code we get:

#####
PowerPC performance report
 Values are from the PMC registers, and represent whatever the
 registers are set to record.
 Function "gmc1_altivec" (pmc1):
        min: 592
        max: 2532235
        avg: 962.88 (255302)
 Function "gmc1_altivec" (pmc2):
        min: 0
        max: 33
        avg: 0.00 (255302)
 Function "gmc1_altivec" (pmc3):
        min: 0
        max: 350
        avg: 0.03 (255302)
(...)
#####

The minimum is now 592 cycles, so the fastest AltiVec execution is about
2.5x faster than the fastest C execution in this example. It's not
perfect but it's not bad (well I wrote this function so I can't say
otherwise :-).
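
The comparison is just the ratio of the two minima. As a small sketch
(the function name is made up for this example):

```c
#include <assert.h>

/* Ratio of the reference (C) minimum to the optimized (AltiVec) minimum. */
double speedup(double ref_min_cycles, double opt_min_cycles)
{
    assert(opt_min_cycles > 0.0);
    return ref_min_cycles / opt_min_cycles;
}
```

With the two reports above, speedup(592, 231) is a little over 2.5.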

Once you have that kind of report, you can try to improve things by
finding what goes wrong and fixing it; in the example above, one
should try to reduce the number of AltiVec stalls, as this *may*
improve performance.



IV - Enabling the PMC in Mac OS X

This is easy. Use "MONster" and "monster". Those tools come from
Apple's CHUD package, and can be found hidden in the developer web
site & FTP site. "MONster" is the graphical application; use it to
generate a config file specifying what each register should
monitor. Then use the command-line application "monster" to use that
config file, and enjoy the results.

Note that "MONster" can be used for many other things, but it's
documented by Apple, so it's not my subject here.

If you are using CHUD 4.4.2 or later, you'll notice that MONster is
no longer available. It's been superseded by Shark, where
configuration of PMCs is available as a plugin.



V - Enabling the PMC on Linux

On Linux you may use oprofile from http://oprofile.sf.net. Depending on the
version and the CPU, you may need to apply a patch[1] to access a set of the
possible counters from the userspace application. You can always define them
using the kernel interface /dev/oprofile/* .

[1] http://dev.gentoo.org/~lu_zero/development/oprofile-g4-20060423.patch

--
Romain Dolbeau <romain@dolbeau.org>
Luca Barbato <lu_zero@gentoo.org>