We are trying to determine the number of downloads on one of our websites. We have seen the PDF entry in the File type section of the AWStats report. It reports the number of "Hits", but that number seems very high (e.g. 10K hits for 6K visits).
That leaves us wondering how that count is calculated.
I have also found some Extra Section code to count downloads, but as far as I can tell, robot downloads of our PDF files would not be filtered out by any extra section definition.
Is there a recommended way to count downloads?
I am more concerned with attempted downloads than with completed ones, although the number of completed downloads would be useful as well.
I don't have a simple answer to your question. Here is a summary of the situation regarding downloads.
The PDF entry in the file type section is the number of attempted downloads. You are right that extra sections include robot traffic.
When someone downloads a PDF file, the transfer is typically split into parts: a first response with a 200 code ("OK"), followed by several responses with 206 codes ("Partial Content"), usually because the client requests the file in byte ranges. Each response is written to the access log as a separate line. The total bandwidth should include both the 200 and the 206 responses.
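To see how a single PDF breaks down into 200 and 206 responses on your own server, you can tally the status codes straight from the access log. The following is only a rough sketch, not something AWStats provides: it assumes the Apache combined log format, and the function name `count_pdf_statuses` and the sample lines are made up for illustration.

```python
import re
from collections import Counter

# Assumed layout (Apache combined log format):
# host ident user [time] "METHOD url PROTO" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

def count_pdf_statuses(lines):
    """Count HTTP status codes for requests whose path ends in .pdf."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        path = match.group("url").split("?", 1)[0]  # ignore any query string
        if path.lower().endswith(".pdf"):
            counts[int(match.group("status"))] += 1
    return counts

# Made-up sample: one PDF fetched as one 200 plus two 206 responses.
SAMPLE = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /docs/guide.pdf HTTP/1.1" 200 65536 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [01/Jan/2024:10:00:01 +0000] "GET /docs/guide.pdf HTTP/1.1" 206 32768 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [01/Jan/2024:10:00:02 +0000] "GET /docs/guide.pdf HTTP/1.1" 206 32768 "-" "Mozilla/5.0"',
    '10.0.0.2 - - [01/Jan/2024:10:05:00 +0000] "GET /index.html HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]

counts = count_pdf_statuses(SAMPLE)
print(dict(counts))
```

In the sample, the three log lines for the PDF come from one visitor fetching one file, which is the pattern to look for in a real log.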
The ValidHTTPCodes directive specifies the codes counted by AWStats as valid codes. The default value is:
ValidHTTPCodes="200 304"
You can replace it by:
ValidHTTPCodes="200 206 304"
You will then get bandwidth values that include the partial content. The drawback is that the number of hits will also include the 206 responses, so a single download may be counted several times. Even so, comparing this bandwidth value with the size of the files can help you understand whether most downloads are complete or aborted.
If you do not want to change ValidHTTPCodes for all sections of the report, you can apply the change to a single extra section with the ExtraSectionCodeFilter parameter.
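Since extra sections still include robot traffic, another option is to post-process the access log outside AWStats. The sketch below is a heuristic under stated assumptions, not an AWStats feature: it takes already-parsed records, drops user agents matching a crude, hypothetical robot pattern, counts one attempted download per host and file, and treats the download as complete when the bytes sent in 200 and 206 responses add up to at least the file size. The names `ROBOT_UA` and `summarize_downloads` and all sample values are invented for illustration.

```python
import re
from collections import defaultdict

# Crude, hypothetical robot filter; a real list would be much longer.
ROBOT_UA = re.compile(r"bot|crawl|spider|slurp", re.IGNORECASE)

def summarize_downloads(records, file_sizes):
    """Estimate attempted and complete downloads per URL.

    records: iterable of (host, url, status, bytes_sent, user_agent)
    file_sizes: dict mapping url -> file size in bytes
    """
    sent = defaultdict(int)  # (host, url) -> total bytes sent
    for host, url, status, nbytes, agent in records:
        if status not in (200, 206):
            continue  # only full or partial content counts as a download
        if ROBOT_UA.search(agent):
            continue  # skip traffic that looks like a robot
        sent[(host, url)] += nbytes

    summary = defaultdict(lambda: {"attempted": 0, "complete": 0})
    for (host, url), total in sent.items():
        summary[url]["attempted"] += 1
        # Heuristic: enough bytes were sent to cover the whole file.
        if url in file_sizes and total >= file_sizes[url]:
            summary[url]["complete"] += 1
    return dict(summary)

# Made-up sample data.
records = [
    ("10.0.0.1", "/docs/guide.pdf", 200, 40000, "Mozilla/5.0"),
    ("10.0.0.1", "/docs/guide.pdf", 206, 60000, "Mozilla/5.0"),
    ("10.0.0.2", "/docs/guide.pdf", 200, 15000, "Mozilla/5.0"),
    ("10.0.0.3", "/docs/guide.pdf", 200, 100000, "ExampleBot/1.0"),
]
sizes = {"/docs/guide.pdf": 100000}

result = summarize_downloads(records, sizes)
print(result)
```

Note the limits of this approach: a host that downloads the same file twice is counted once, and several visitors behind one proxy are merged, so the figures are estimates only.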