| author | Alex <git@ajschof.me> | 2024-08-19 12:09:25 +0100 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2024-08-19 12:09:25 +0100 |
| commit | f28e4038d20b4630fafcae9a7825794e529bace2 (patch) | |
| tree | 0c378561e0dde843c0a281c692d137bb6bb0d0a7 /README.md | |
| parent | 5cc511d2afeea262db0db7039c8f83c123da77ea (diff) | |
| parent | 09b8b7903098a988a9a022d0ab607f8131c9c78f (diff) | |
Merge branch 'development' into feature/test-extract-lambda
Diffstat (limited to 'README.md')
| -rw-r--r-- | README.md | 52 |
1 files changed, 51 insertions, 1 deletions
@@ -1 +1,51 @@
-# de-project-bentley
\ No newline at end of file
+# ToteSys - Data Engineering Project
+
+[](https://www.python.org/)
+[](https://aws.amazon.com/)
+[](https://www.terraform.io/)
+[](https://www.postgresql.org/)
+[](https://github.com/features/actions)
+
+[](https://github.com/ajschofield/de-project-bentley/actions/workflows/deploy.yml?query=branch%3Amain)
+[](https://github.com/ajschofield/de-project-bentley/deployments/production)
+# Summary
+The project aims to implement a data platform that can extract data from an
+operational database, archive it in a data lake, and make it easily accessible
+within a remodelled OLAP data warehouse.
+
+The solution showcases our skills in:
+
+- Python
+- PostgreSQL
+- Database modelling
+- Amazon Web Services (AWS)
+- Agile methodologies
+
+# Main Objective
+
+Our goal is to create a reliable ETL (Extract, Transform, Load) pipeline that
+can:
+
+1. Extract the data from the `totesys` operational database
+2. Store the data in AWS S3 buckets, that will form our data lake
+3. Transform the data into a suitable schema for the data warehouse
+4. Load the transformed data into the data warehouse hosted on AWS
+
+# Key Features
+
+We aim for the project to have certain features. Some are more prioritised than
+others.
+
+- [ ] Automated data ingestion from `totesys` db
+- [ ] Data storage for ingested and processed data in S3 buckets
+- [ ] Data transformation for data warehouse schema
+- [ ] Automated data loading into the data warehouse schema
+- [ ] Logging and monitoring with CloudWatch
+- [ ] Notifications for errors and successful runs (e.g. successful ingestion)
+- [ ] Visualisation of warehouse data
+
+# Test Coverage
+TBA
+
+# Contributors
+TBA
\ No newline at end of file
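The four steps listed under "Main Objective" in the added README (extract, store, transform, load) might be sketched roughly as below. This is an illustrative sketch only, not code from the repository: every function name is hypothetical, plain dicts stand in for the S3 data lake and the data warehouse, and the `staff` columns and the `dim_staff` table name are assumptions made purely for the example.

```python
import json
from datetime import datetime, timezone

# Hypothetical stand-ins: a list of dicts plays the role of a `totesys` table,
# and plain dicts play the role of the S3 data lake and the data warehouse.


def extract(source_rows):
    """Step 1: read rows from the operational source (here, a list of dicts)."""
    return [dict(row) for row in source_rows]


def store(lake, table, rows, now=None):
    """Step 2: archive the raw rows in the lake under a timestamped key."""
    now = now or datetime.now(timezone.utc)
    key = f"{table}/{now:%Y-%m-%dT%H-%M-%S}.json"
    lake[key] = json.dumps(rows)
    return key


def transform(lake, key):
    """Step 3: reshape archived rows into an assumed dimension-style schema."""
    rows = json.loads(lake[key])
    return [
        {
            "staff_id": r["staff_id"],
            "full_name": f"{r['first_name']} {r['last_name']}",
        }
        for r in rows
    ]


def load(warehouse, table, rows):
    """Step 4: append the transformed rows to the target warehouse table."""
    warehouse.setdefault(table, []).extend(rows)
    return len(rows)


def run_pipeline(source_rows, lake, warehouse):
    """Chain the four steps and return the number of rows loaded."""
    raw = extract(source_rows)
    key = store(lake, "staff", raw)
    return load(warehouse, "dim_staff", transform(lake, key))
```

Keeping each step as a separate function that reads from and writes to an explicit store mirrors the README's split between ingested and processed data, and would let each stage be tested in isolation before being wired to real AWS services.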
