This is a fantastic site and forum. I can't believe I didn't find it earlier. Here's a summary of my issue; it's detailed, but only to cut down on going back and forth.
We host several of our clients' catalogs on our portal site. Essentially there is a set of different pages for each of them on the main site, so if the main site were xyz.lotus.com/wps/portal, the different catalogs would be at
xyz.lotus.com/wps/portal/123, xyz.lotus.com/wps/portal/456, and so on.
For the last few years we had been collecting stats only for the domain xyz.lotus.com, which generated web stats collectively for all the sub-domains (or other pages) as well. Now we need to generate stats separately for each of them.
What we had before: a conf file for the main domain and a log file.
What I tried:
Instance 1: Created separate conf files, all analysing the same log file.
Instance 2: Created separate conf files, and also filtered the log file into separate log files (with grep and awk), then ran AWStats.
Neither of the instances works for me. In the first instance, all the separate conf files give me exactly the same stats (i.e. hits, pages, etc., which is impossible). Please note that for the domain name in each conf file, I provided the whole URL path, e.g. xyz.lotus.com/wps/portal/123.
In the second instance, I use grep and awk to make new log files for each sub-domain. They are still in the same log format (=1) as before, but running AWStats does not generate anything; it says it found 0 records, etc.
Wow, long post. So in a nutshell, I am not sure whether what I want is possible, since it seems that the logs are being analysed only at the top level, and when analysing at the sub-domain level, somehow all the sub-domain stats come out the same.
It might be that I'm unsure of what domain name to provide. Any help will be appreciated. Thanks in advance!
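For reference, the per-catalog filtering from Instance 2 can be sketched in Python instead of grep and awk. This is only a rough sketch: it assumes records in the combined log format (request line in the first pair of double quotes), the sample records are made up, and `split_by_catalog` is a hypothetical helper name. The point is that each line is kept exactly as it was, so the per-catalog output stays in the original log format.

```python
import re
from collections import defaultdict

PREFIX = "/wps/portal/"  # portal root taken from the URLs in this post


def split_by_catalog(lines, prefix=PREFIX):
    """Group raw log lines by the path segment right after `prefix`."""
    buckets = defaultdict(list)
    for line in lines:
        try:
            request = line.split('"')[1]   # e.g. 'GET /wps/portal/123/details HTTP/1.1'
            path = request.split()[1]      # e.g. '/wps/portal/123/details'
        except IndexError:
            continue                       # no quoted request line: skip the record
        if not path.startswith(prefix):
            continue                       # not below the portal root
        catalog = re.split(r"[/?]", path[len(prefix):], maxsplit=1)[0]
        if catalog:
            buckets[catalog].append(line)  # the line itself is stored unmodified
    return buckets


# Made-up sample records (IPs, dates and sizes are placeholders).
sample = [
    '192.0.2.1 - - [01/Sep/2000:10:00:00 +0000] "GET /wps/portal/123 HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '192.0.2.2 - - [01/Sep/2000:10:00:05 +0000] "GET /wps/portal/123/details?catalog.label=x HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
    '192.0.2.3 - - [01/Sep/2000:10:00:09 +0000] "GET /wps/portal/456/results HTTP/1.1" 200 700 "-" "Mozilla/5.0"',
    '192.0.2.4 - - [01/Sep/2000:10:00:12 +0000] "GET /index.html HTTP/1.1" 200 100 "-" "Mozilla/5.0"',
]

buckets = split_by_catalog(sample)
for catalog, records in sorted(buckets.items()):
    print(catalog, len(records))
```

Each bucket could then be written to its own file and referenced from the matching conf file.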
AWStats includes some options to handle subdomains, but what you have are not subdomains. Subdomains of domain.com are 123.domain.com or abcd.domain.com.
You can use the OnlyFiles directive to selectively process log file entries in a particular directory of your website:
OnlyFiles="REGEX[^\/wps\/portal\/123]"
This will only count hits in the 123 subdirectory. This will probably be good enough, but it will still consider that the web domain is xyz.lotus.com and hits originating from another subdirectory will not be processed as external referrers.
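To see which request paths an anchored prefix pattern such as `^\/wps\/portal\/123` lets through, here is a small sketch. It uses Python's `re` module as a stand-in for the Perl regex engine AWStats uses (for a simple anchored prefix like this the two behave the same), and the request paths are made-up examples. The `strict` pattern is only a suggestion, not something taken from the AWStats documentation.

```python
import re

# The pattern placed inside REGEX[...] in the OnlyFiles directive.
only_files = re.compile(r"^\/wps\/portal\/123")

# Suggested stricter variant: the catalog id must be followed by "/", "?" or end of path.
strict = re.compile(r"^\/wps\/portal\/123(\/|\?|$)")

# Hypothetical request paths as they would appear in the URL field of a log record.
paths = [
    "/wps/portal/123",
    "/wps/portal/123/details?catalog.label=1TxxxzCC1P",
    "/wps/portal/123/results?catalog.label=1Txxx0Ctt1P",
    "/wps/portal/456",
    "/wps/portal/1234",
]

kept = {p: bool(only_files.search(p)) for p in paths}
kept_strict = {p: bool(strict.search(p)) for p in paths}

for p in paths:
    print(f"{p:52} prefix={kept[p]!s:5} strict={kept_strict[p]}")
```

Because the pattern is anchored only at the start, it keeps every URL below /wps/portal/123, including the details and results pages. It would also keep a catalog whose id merely starts with 123 (such as /wps/portal/1234), which the stricter variant avoids.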
Hmm, so I tried the OnlyFiles option but it still doesn't show me any stats being generated. Should I change the domain name in the conf file to be xyz.lotus.com rather than xyz.lotus.com/wps/portal/123 ?
Also, for OnlyFiles, will it only pick up stats for lines containing exactly the entry given there? For example, in my log file I would have entries for, say, xyz.lotus.com/wps/portal/123, and then for specific pages of that 123 catalog as well, like xyz.lotus.com/wps/portal/123/details?catalog.label=1TxxxzCC1P or xyz.lotus.com/wps/portal/123/results?catalog.label=1Txxx0Ctt1P
I need AWStats to generate web stats for all log entries that contain at least xyz.lotus.com/wps/portal/123/, not only for that exact URL, or else the hits on my details and results pages would not be counted.
Also, below is a record from my log file. I was curious to know which entry in this record AWStats actually compares with the domain name that I specified in the conf file?
Regarding the domain name, just use xyz.lotus.com . What follows that does not belong to the domain name. AWStats does not need to find the domain name in your log file. It is not present in the example you gave.
So we recently moved our hosting environment and changed our domain names too, i.e. from xyz.lotus.com to abcd.com. We already had AWStats on lotus.com, and now I've set it up on abcd.com as well. Traffic from lotus.com is redirected to abcd.com.
Running AWStats on both servers results in an anomaly in the web stats generated. For example, for the month of September so far, on lotus.com I see 50 unique visitors for the first 6 days of September. For the same period on abcd.com, I see only 10 unique visitors. If the hits from lotus.com are being redirected to abcd.com (which is happening), shouldn't those be counted as unique hits on abcd.com too (i.e. the web stats over there should show at least 50 unique visitors as well)?
If not, could you explain what constitutes a unique visitor?
Assuming you redirect every hit on xyz.lotus.com to abcd.com, I agree that you should theoretically get at least as many hits on abcd.com as on xyz.lotus.com.
Now what can disturb that theory ?
1. Some users might disable automatic redirection in their browser, but this should be less than 1% of the visitors.
2. Sometimes the redirect is not done for all pages. For example, if you only redirect the home page of xyz.lotus.com to the home page of abcd.com, then you could get hits on other pages of xyz.lotus.com that do not generate hits on abcd.com.
3. AWStats detects robots by their user agents. Some robots mimic the user agents of regular browsers, so AWStats is not able to identify them as robots, and most robots do not follow redirects as human visitors do. I do not expect that 80% of the "visitors" are hidden robots though.
In my opinion, the most likely cause is related to the redirection. How do you do the redirect ?
I'm looking into our redirect rules in more detail now. I will let you know if I see any anomaly.
But from a high level, we are redirecting almost everything from lotus.com to 123.com on the same hierarchy, i.e. for users who have bookmarked, say, a specific details page on lotus.com/wps/portal/test/details?Navcode=xxxx , the redirect rule takes them to that particular details page on the new domain. The same goes for all other pages, so there are detailed redirect rules in place. But that should still count as unique hits on the new domain, shouldn't it?
Having said that, how does AWStats actually generate the unique visitors count from the log files? Is there anything in particular that it looks for, something in the record?
Hmm, OK, I wonder what the issue might be then. One question: will AWStats only count records returned with a 200 or 304 status code towards unique visitors? If there are records with, say, status code 404 Document Not Found, those records would not be counted towards the unique visitors count, right? If so, then I might know what the problem is. One of our feeds directories is broken on the new domain, and I'm seeing huge numbers of 404 errors as below.
So here's one theory. Since we are still migrating to the new domain, I have been merging, say, 10 days of access.logs and then running AWStats on them, WHEREAS earlier on the old domain this was done at midnight every day by a cron job. So I am thinking that if I merge 10 days of access.logs and then run AWStats on them, I am bound to get fewer unique visitors.
Earlier, since we processed logs daily, the unique visitors count was based on the unique IPs in the records on a daily basis, not on a 10-day basis or so.
Does that make sense? If I go back and run AWStats on every access.log.##%^ that I have since our deployment, would that generate more unique visitors (in terms of unique IPs)?
Unique visitors are counted for the whole month. It does not matter whether you update the stats twice a day or once a month. You will end up with the same number of unique visitors.
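As a rough illustration of the difference between a monthly unique count and a sum of daily unique counts, here is a small sketch. It is not AWStats' actual code; it assumes a visitor is identified by client IP, and the (day, IP) pairs are made-up sample data.

```python
from collections import defaultdict

# Hypothetical (day of month, client IP) pairs extracted from one month of logs.
hits = [
    (1, "192.0.2.1"),
    (1, "192.0.2.1"),
    (1, "192.0.2.2"),
    (2, "192.0.2.1"),
    (2, "192.0.2.3"),
    (3, "192.0.2.1"),
]


def monthly_unique(hits):
    """Each IP counts once for the whole month, however often it comes back."""
    return len({ip for _, ip in hits})


def sum_of_daily_unique(hits):
    """Each IP counts once per day on which it appears."""
    per_day = defaultdict(set)
    for day, ip in hits:
        per_day[day].add(ip)
    return sum(len(ips) for ips in per_day.values())


print(monthly_unique(hits))       # 3
print(sum_of_daily_unique(hits))  # 5
```

The monthly figure depends only on the set of IPs seen during the month, so processing the same records in one batch or in daily runs gives the same result. A per-day sum is a different metric and is always at least as large.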
Hmm, so if an IP address accesses the site on one day, it will be counted as 1 unique visitor. Then if the same IP accesses the site on any of the remaining 29 days, it won't be counted as unique anymore?
Is there a way to count unique IPs per day (still for the monthly report though, and not the one that shows a day-by-day view)?
Thanks, that bit helped. Now I have one more loose thread. We generated a web stats report for our new domain based on our Apache logs. We couldn't get a count of unique visitors and visits, because we are behind a proxy firewall which routes end-user hits to our Apache server, so to the server all hits look as if they are coming from this proxy.
But we managed to grab and grep the proxy logs to get our data at that level, with end-user IPs and all. It seems the proxy server writes the log in some odd format of its own.
Questions:
i) To run AWStats on this proxy log format, would I need to change anything in my conf file? And more importantly,
ii) If I only change the LogFile parameter in my conf file to point to this new log and run AWStats, will I see a unique visitors count in my original web stats report, i.e. in the same HTML page/catalog?