| author | Alex <git@ajschof.me> | 2024-08-20 15:31:05 +0100 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2024-08-20 15:31:05 +0100 |
| commit | 80f531f3756c2db095dce0b0aee30e537d711566 (patch) | |
| tree | 671b2817d4576abd1132aded13f25ba545beff90 /README.md | |
| parent | 3ab3164c2e6f0e7a7ae6755a58914522bf3390a6 (diff) | |
| parent | a393d59e052d3a37d66f7a657a15cad1d486e3b1 (diff) | |
Merge pull request #76 from ajschofield/development
pr: pull development into main
Diffstat (limited to 'README.md')
| -rw-r--r-- | README.md | 52 |
1 files changed, 51 insertions, 1 deletions
@@ -1 +1,51 @@
-# de-project-bentley
\ No newline at end of file
+# ToteSys - Data Engineering Project
+
+[](https://www.python.org/)
+[](https://aws.amazon.com/)
+[](https://www.terraform.io/)
+[](https://www.postgresql.org/)
+[](https://github.com/features/actions)
+
+[](https://github.com/ajschofield/de-project-bentley/actions/workflows/deploy.yml?query=branch%3Amain)
+[](https://github.com/ajschofield/de-project-bentley/deployments/production)
+# Summary
+The project aims to implement a data platform that can extract data from an
+operational database, archive it in a data lake, and make it easily accessible
+within a remodelled OLAP data warehouse.
+
+The solution showcases our skills in:
+
+- Python
+- PostgreSQL
+- Database modelling
+- Amazon Web Services (AWS)
+- Agile methodologies
+
+# Main Objective
+
+Our goal is to create a reliable ETL (Extract, Transform, Load) pipeline that
+can:
+
+1. Extract the data from the `totesys` operational database
+2. Store the data in AWS S3 buckets, that will form our data lake
+3. Transform the data into a suitable schema for the data warehouse
+4. Load the transformed data into the data warehouse hosted on AWS
+
+# Key Features
+
+We aim for the project to have certain features. Some are more prioritised than
+others.
+
+- [ ] Automated data ingestion from `totesys` db
+- [ ] Data storage for ingested and processed data in S3 buckets
+- [ ] Data transformation for data warehouse schema
+- [ ] Automated data loading into the data warehouse schema
+- [ ] Logging and monitoring with CloudWatch
+- [ ] Notifications for errors and successful runs (e.g. successful ingestion)
+- [ ] Visualisation of warehouse data
+
+# Test Coverage
+TBA
+
+# Contributors
+TBA
\ No newline at end of file
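
The four steps listed under "Main Objective" in the README above could be sketched roughly as follows. This is an illustrative outline only, not the project's actual code: the function names, the `staff` columns, the object key layout, and the `dim_staff` table are all assumptions, and plain dicts stand in for the S3 data lake and the warehouse so the flow can run without AWS credentials.

```python
import json
from datetime import datetime, timezone


def extract(source_rows):
    """Step 1: read rows from the operational database (here: a plain list)."""
    return [dict(row) for row in source_rows]


def store(lake, table, rows, now=None):
    """Step 2: write the raw rows to the data lake as one JSON object.

    `lake` is a dict standing in for an S3 bucket; the key layout is hypothetical.
    """
    now = now or datetime.now(timezone.utc)
    key = f"{table}/{now:%Y/%m/%d/%H%M%S}.json"
    lake[key] = json.dumps(rows)
    return key


def transform(lake, key):
    """Step 3: reshape raw rows into a hypothetical warehouse dimension."""
    rows = json.loads(lake[key])
    return [
        {
            "staff_id": r["staff_id"],
            "full_name": f"{r['first_name']} {r['last_name']}",
        }
        for r in rows
    ]


def load(warehouse, table, rows):
    """Step 4: append transformed rows to a warehouse table (a dict of lists)."""
    warehouse.setdefault(table, []).extend(rows)
    return len(rows)


# Example run with made-up data and a fixed timestamp.
lake, warehouse = {}, {}
raw = extract([{"staff_id": 1, "first_name": "Ada", "last_name": "Lovelace"}])
key = store(lake, "staff", raw,
            now=datetime(2024, 8, 20, 15, 31, 5, tzinfo=timezone.utc))
loaded = load(warehouse, "dim_staff", transform(lake, key))
```

Keeping each stage as a separate function that passes only a key or a list of rows to the next mirrors the ingest, process, and load separation the README describes, and would let each stage be swapped for a real S3 or PostgreSQL client later.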
