From 0589b957a41303a2ab8a241d957f1799eb8c74fe Mon Sep 17 00:00:00 2001
From: Alex Schofield
Date: Mon, 17 Feb 2025 14:09:05 +0000
Subject: update table of contents in README.md

---
 README.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

(limited to 'README.md')

diff --git a/README.md b/README.md
index 430fcdc..ebf0c4a 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,9 @@
-- [gdpr-obfuscator](#gdpr-obfuscator)
- * [Minimum Viable Product (MVP)](#minimum-viable-product--mvp-)
- * [Setup](#setup)
- * [Usage](#usage)
+# GDPR Obfuscator - Launchpad Project
+
+1. [Overview](#overview)
+2. [Minimum Viable Product (MVP)](#minimum-viable-product-mvp)
+3. [Setup](#setup)
+4. [Usage](#usage)
 
 ## Overview
 
--
cgit v1.2.3

From 308f0e7befddae8b2306c9e57bdd5903c55ec171 Mon Sep 17 00:00:00 2001
From: Alex Schofield
Date: Mon, 17 Feb 2025 14:13:09 +0000
Subject: update overview section to include JSON and parquet compatibility

---
 README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

(limited to 'README.md')

diff --git a/README.md b/README.md
index ebf0c4a..b82ccbb 100644
--- a/README.md
+++ b/README.md
@@ -7,10 +7,12 @@
 
 ## Overview
 
-A Python library designed to detect and remove Personally Identifiable Information (PII) from CSV files stored in an AWS S3 bucket.
+A Python library designed to detect and remove Personally Identifiable Information (PII) from data formats such as CSV, JSON and Parquet formats.
 
 ## Minimum Viable Product (MVP)
 
+
+
 ## Setup
 
 ## Usage
--
cgit v1.2.3

From 7823850692c12bb8a7155c5c26e66bd8129c9b4a Mon Sep 17 00:00:00 2001
From: Alex Schofield
Date: Mon, 17 Feb 2025 14:21:21 +0000
Subject: update MVP section to include the minimum requirements of the project

---
 README.md | 6 ++++++
 1 file changed, 6 insertions(+)

(limited to 'README.md')

diff --git a/README.md b/README.md
index b82ccbb..222808e 100644
--- a/README.md
+++ b/README.md
@@ -11,7 +11,13 @@ A Python library designed to detect and remove Personally Identifiable Informati
 
 ## Minimum Viable Product (MVP)
 
+The MVP covers:
+1. Reading a JSON string containing the S3 location of the CSV file and the names of the fields that are required to be obfuscated
+2. Ingesting the CSV file containing data records (with a primary key) from an AWS S3 bucket
+3. Obfuscating chosen PII fields (e.g. `name`, `email_address`) by replacing their values with an obfuscated string (`***`)
+4. Producing an output CSV file (or a byte-stream) that maintains the original structure but with sensitive fields changed
 
+This meets the requirements under the General Data Protection Regulation [(GDPR)](https://ico.org.uk/media/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr-1-1.pdf) to ensure that all data containing information that can be used to identify an individual should be anonymised.
 
 ## Setup
 
--
cgit v1.2.3

From 98fc0c2b71ae1c900ecacc19eb185a2542d4e8c4 Mon Sep 17 00:00:00 2001
From: Alex Schofield
Date: Mon, 17 Feb 2025 14:25:27 +0000
Subject: add additional features section to README.md

---
 README.md | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

(limited to 'README.md')

diff --git a/README.md b/README.md
index 222808e..0a3857b 100644
--- a/README.md
+++ b/README.md
@@ -2,8 +2,9 @@
 
 1. [Overview](#overview)
 2. [Minimum Viable Product (MVP)](#minimum-viable-product-mvp)
-3. [Setup](#setup)
-4. [Usage](#usage)
+3. [Additional Features](#additional-features)
+4. [Setup](#setup)
+5. [Usage](#usage)
 
 ## Overview
 
@@ -19,6 +20,14 @@ The MVP covers:
 
 This meets the requirements under the General Data Protection Regulation [(GDPR)](https://ico.org.uk/media/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr-1-1.pdf) to ensure that all data containing information that can be used to identify an individual should be anonymised.
 
+### Additional Features
+
+*(Ranked in order of priority from high to low)*
+
+- [ ] **Support for JSON and Parquet formats**: Extend the library to support reading and writing data in JSON and Parquet formats
+- [ ] **Command-line interface**: Create a command-line interface to allow users to run the obfuscation process from the terminal
+- [ ] **Support for multiple sources**: Extend the library to support reading data from multiple sources (e.g. local file system)
+
 ## Setup
 
 ## Usage
--
cgit v1.2.3

From 83569d8facffeedb325d63364dab91abe71ba3c6 Mon Sep 17 00:00:00 2001
From: Alex Schofield
Date: Mon, 17 Feb 2025 14:27:52 +0000
Subject: add setup/prerequisites section

---
 README.md | 5 +++++
 1 file changed, 5 insertions(+)

(limited to 'README.md')

diff --git a/README.md b/README.md
index 0a3857b..113d841 100644
--- a/README.md
+++ b/README.md
@@ -30,4 +30,9 @@ This meets the requirements under the General Data Protection Regulation [(GDPR)
 
 ## Setup
 
+### Prerequisites
+
+- Python >= 3.13
+- Poetry >= 2.0.1
+
 ## Usage
--
cgit v1.2.3

From 00d940f72c4633855075cfb797732ef02588bba9 Mon Sep 17 00:00:00 2001
From: Alex Schofield
Date: Mon, 17 Feb 2025 14:32:05 +0000
Subject: add setup/installation section

---
 README.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

(limited to 'README.md')

diff --git a/README.md b/README.md
index 113d841..225a999 100644
--- a/README.md
+++ b/README.md
@@ -35,4 +35,22 @@ This meets the requirements under the General Data Protection Regulation [(GDPR)
 - Python >= 3.13
 - Poetry >= 2.0.1
 
+### Installation
+
+1. Clone the repository:
+
+```
+git clone --recurse-submodules https://github.com/ajschofield/gdpr-obfuscator.git
+cd gdpr-obfuscator
+```
+
+2. Install dependencies using poetry
+
+```
+# Production
+poetry install
+# Developer (optional)
+poetry install --dev
+```
+
 ## Usage
--
cgit v1.2.3

From 0e89a74646d1dcb23c313ca13052c73f9b6c7989 Mon Sep 17 00:00:00 2001
From: Alex Schofield
Date: Mon, 17 Feb 2025 14:32:24 +0000
Subject: update table of contents in README.md

---
 README.md | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'README.md')

diff --git a/README.md b/README.md
index 225a999..f575e59 100644
--- a/README.md
+++ b/README.md
@@ -4,6 +4,8 @@
 2. [Minimum Viable Product (MVP)](#minimum-viable-product-mvp)
 3. [Additional Features](#additional-features)
 4. [Setup](#setup)
+ 1. [Prerequisites](#prerequisites)
+ 2. [Installation](#installation)
 5. [Usage](#usage)
 
 ## Overview
--
cgit v1.2.3

From 4740873482831a77c253bee3c0b521e09a3059a9 Mon Sep 17 00:00:00 2001
From: Alex Schofield
Date: Mon, 17 Feb 2025 14:33:49 +0000
Subject: move additional features to subsection under MVP in README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

(limited to 'README.md')

diff --git a/README.md b/README.md
index f575e59..33c3987 100644
--- a/README.md
+++ b/README.md
@@ -2,7 +2,7 @@
 
 1. [Overview](#overview)
 2. [Minimum Viable Product (MVP)](#minimum-viable-product-mvp)
-3. [Additional Features](#additional-features)
+ 1. [Additional Features](#additional-features)
 4. [Setup](#setup)
  1. [Prerequisites](#prerequisites)
  2. [Installation](#installation)
--
cgit v1.2.3

From 6f58a724cfbf88fe12c96fdb4a038e65012a3b88 Mon Sep 17 00:00:00 2001
From: Alex Schofield
Date: Mon, 17 Feb 2025 14:35:37 +0000
Subject: update code block formatting in README.md

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

(limited to 'README.md')

diff --git a/README.md b/README.md
index 33c3987..5cd4bfb 100644
--- a/README.md
+++ b/README.md
@@ -41,14 +41,14 @@ This meets the requirements under the General Data Protection Regulation [(GDPR)
 
 1. Clone the repository:
 
-```
+```bash
 git clone --recurse-submodules https://github.com/ajschofield/gdpr-obfuscator.git
 cd gdpr-obfuscator
 ```
 
 2. Install dependencies using poetry
 
-```
+```bash
 # Production
 poetry install
 # Developer (optional)
--
cgit v1.2.3
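
Note on the MVP steps listed in the README patch in this series: the series only touches README.md, so the following is a minimal sketch of what steps 1, 3 and 4 might look like, not the project's implementation. The function name `obfuscate_csv`, the request keys `file_to_obfuscate` and `pii_fields`, the bucket name, and the sample data are all hypothetical. Step 2 (fetching the file from S3) is left out and replaced with an in-memory byte string so the example runs without AWS access.

```python
import csv
import io
import json


def obfuscate_csv(data: bytes, pii_fields: list[str], mask: str = "***") -> bytes:
    """Return CSV bytes with every value in the named columns replaced by `mask`."""
    reader = csv.reader(io.StringIO(data.decode("utf-8"), newline=""))
    header = next(reader, None)
    if header is None:
        return b""  # empty input, nothing to obfuscate
    wanted = set(pii_fields)
    # Column positions of the PII fields; names missing from the header are skipped.
    targets = [i for i, name in enumerate(header) if name in wanted]
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)  # the header and column order are kept unchanged
    for row in reader:
        for i in targets:
            if i < len(row):  # tolerate short rows
                row[i] = mask
        writer.writerow(row)
    return out.getvalue().encode("utf-8")


# Step 1: hypothetical request shape; the key names are assumptions.
request = json.loads(
    '{"file_to_obfuscate": "s3://example-bucket/students.csv",'
    ' "pii_fields": ["name", "email_address"]}'
)
# Stand-in for the bytes that step 2 would fetch from S3.
sample = b"student_id,name,email_address\n1,Jane Doe,jane@example.com\n"
# Steps 3 and 4: mask the chosen fields and produce a byte-stream.
result = obfuscate_csv(sample, request["pii_fields"])
print(result.decode("utf-8"))
```

The sketch leaves the primary key column (`student_id` here) untouched, since it is not listed in the PII fields, and returns bytes so the result could be written back to storage by the caller.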