Connect to S3 via Boto3. Boto3 is the Amazon Web Services (AWS) SDK for Python.
Fetching CloudFront log files. You can analyse CloudFront log files by downloading them to an EC2 instance. This Python script does the following:
- Reads the S3 bucket name as the first argument.
- Reads the destination path on the EC2 instance as the second argument and saves the CloudFront log files there.
- Combines the gzipped (.gz) log files into a single file per date.
- Decompresses the file.
- Removes comment lines.
- Removes the processed CloudFront log files from S3.
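The download step needs a live S3 bucket, but the combine/decompress/filter steps above can be sketched locally. This is a minimal illustration, not the script's actual code; `combine_and_clean` is a hypothetical helper name:

```python
import gzip


def combine_and_clean(gz_paths, out_path):
    """Decompress each .gz log file, drop CloudFront comment lines
    (those starting with '#'), and append the rest to one combined file."""
    with open(out_path, "w", encoding="utf-8") as out:
        for gz_path in gz_paths:
            with gzip.open(gz_path, "rt", encoding="utf-8") as f:
                for line in f:
                    if not line.startswith("#"):
                        out.write(line)
```

In the real script the `.gz` paths would come from the objects downloaded out of the S3 bucket.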
The script requires Python 3 and the Boto3 library; installing them only takes a few minutes. If you haven't installed them yet, on Unix-like server systems:
$ sudo -i
$ yum -y install python36
$ pip install botocore
$ pip install boto3
Using Boto3
After installing Boto3, set up your AWS security credentials before running the script:
$ aws configure
Alternatively, you can create the credentials file yourself. By default, it is located at ~/.aws/credentials:
[default]
aws_access_key_id = <YOUR_AWS_ACCESS_KEY>
aws_secret_access_key = <YOUR_AWS_SECRET_KEY>
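Boto3 reads this file automatically, so the script never has to parse it. For a quick sanity check, though, the file is standard INI format and can be read with Python's `configparser`; `read_default_credentials` is just an illustrative helper, not part of the script:

```python
import configparser


def read_default_credentials(path):
    """Parse an AWS credentials file and return the [default] key pair."""
    config = configparser.ConfigParser()
    config.read(path)
    section = config["default"]
    return section["aws_access_key_id"], section["aws_secret_access_key"]
```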
Clone this repository and run the script:
$ git clone https://github.com/spakkanen/python-boto3-mv-cloudfront-s3-logs-to-ec2-instances.git
$ cd python-boto3-mv-cloudfront-s3-logs-to-ec2-instances
$ python mv_s3_all_files.py <S3_BUCKET_NAME> <WORK_PATH>
This Python application connects to AWS S3 and fetches the CloudFront log files:
$ python mv_s3_all_files.py <S3_BUCKET_NAME> <WORK_PATH>
You can set up the script to run automatically on a schedule. The schedule uses normal Unix cron/crontab syntax, for example:
2,12,22,32,42,52 * * * * python /root/python-boto3-mv-cloudfront-s3-logs-to-ec2-instances/mv_s3_all_files.py <S3_BUCKET_NAME> <WORK_PATH> > /dev/null
# Removes all files older than 2 weeks.
02 04 * * * find /root/python-boto3-mv-cloudfront-s3-logs-to-ec2-instances/cloudfront_prod1/ -mtime +14 -type f -exec rm -f {} \;
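If you prefer to keep the cleanup step in Python rather than `find`, the same two-week pruning can be sketched like this. This is an assumption about how you might structure it, not code from the repository; `remove_old_files` is a hypothetical name:

```python
import time
from pathlib import Path


def remove_old_files(directory, max_age_days=14):
    """Delete regular files older than max_age_days, roughly equivalent
    to: find <directory> -mtime +14 -type f, then removing each match."""
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for path in Path(directory).iterdir():
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

This could then be called at the end of the scheduled run instead of a separate cron entry.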
The log file is plain tab-separated text, so it is easy to read and process with standard Unix tools.
Count the main page (/) requests in the current date's log file:
$ cat logs/$(date +"%Y%m%d").log | grep -P '\t/\t' | wc -l
List the top 10 entries for main page (/) requests:
$ cat logs/$(date +"%Y%m%d").log | grep -P '\t/\t' | cut -f 5 | sort | uniq -c | sort -nr | head -n10
List the top 10 links that returned 404 errors:
$ cat logs/$(date +"%Y%m%d").log | grep -P '\t404\t' | cut -f 5 | sort | uniq -c | sort -nr | head -n10
List the top 10 entries by hit count:
$ cat logs/$(date +"%Y%m%d").log | cut -f4-5 -d$'\t' | sort | uniq -c | sort -nr | head -n10
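The same `cut | sort | uniq -c | sort -nr | head` pattern can be reproduced in Python with `collections.Counter`. This sketch assumes the combined log keeps CloudFront's tab-separated fields, with 1-based field numbers matching `cut(1)`:

```python
from collections import Counter


def top_hits(log_lines, field, n=10):
    """Count the values of one tab-separated field and return the n most
    common, like: cut -f <field> | sort | uniq -c | sort -nr | head -n<n>."""
    counts = Counter(
        line.split("\t")[field - 1]
        for line in log_lines
        if line.strip()
    )
    return counts.most_common(n)
```

For example, `top_hits(open("logs/20240101.log"), 5)` would tabulate field 5 the way the commands above do.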
Contributions are welcome. Open an issue before sending a pull request. All pull requests are reviewed before they are merged to master.
Please contact: saku@kliffa.fi .
Member Name | GitHub Alias | Company | Role |
---|---|---|---|
Saku Pakkanen | [@spakkanen](https://github.com/spakkanen) | [Kliffa Innovations Oy](https://kliffa.fi/en/) | Committer |