author: Alex <git@ajschof.me> 2025-02-17 16:47:47 +0000
committer: GitHub <noreply@github.com> 2025-02-17 16:47:47 +0000
commit: 00917b8ecf67de9e955479be555d74fcc8257020 (patch)
tree: 17dd9b2e85866f85bdbb3702185463b13c911a28 /README.md
parent: bf323b8c2ebd47bb446ba773027f389a0887e325 (diff)
parent: e2b0f2553b8dfcbe39f6e6fdc86ca68cc63f5705 (diff)
Merge pull request #3 from ajschofield/add-docs
update README & add comments in src code
Diffstat (limited to 'README.md')
-rw-r--r-- README.md 54
1 files changed, 49 insertions, 5 deletions
diff --git a/README.md b/README.md
index 430fcdc..5cd4bfb 100644
--- a/README.md
+++ b/README.md
@@ -1,14 +1,58 @@
-- [gdpr-obfuscator](#gdpr-obfuscator)
- * [Minimum Viable Product (MVP)](#minimum-viable-product--mvp-)
- * [Setup](#setup)
- * [Usage](#usage)
+# GDPR Obfuscator - Launchpad Project
+
+1. [Overview](#overview)
+2. [Minimum Viable Product (MVP)](#minimum-viable-product-mvp)
+ 1. [Additional Features](#additional-features)
+4. [Setup](#setup)
+ 1. [Prerequisites](#prerequisites)
+ 2. [Installation](#installation)
+5. [Usage](#usage)
## Overview
-A Python library designed to detect and remove Personally Identifiable Information (PII) from CSV files stored in an AWS S3 bucket.
+A Python library designed to detect and remove Personally Identifiable Information (PII) from data formats such as CSV, JSON and Parquet formats.
## Minimum Viable Product (MVP)
+The MVP covers:
+1. Reading a JSON string containing the S3 location of the CSV file and the names of the fields that are required to be obfuscated
+2. Ingesting the CSV file containing data records (with a primary key) from an AWS S3 bucket
+3. Obfuscating chosen PII fields (e.g. `name`, `email_address`) by replacing their values with an obfuscated string (`***`)
+4. Producing an output CSV file (or a byte-stream) that maintains the original structure but with sensitive fields changed
+
+This meets the requirements under the General Data Protection Regulation [(GDPR)](https://ico.org.uk/media/for-organisations/guide-to-data-protection/guide-to-the-general-data-protection-regulation-gdpr-1-1.pdf) to ensure that all data containing information that can be used to identify an individual should be anonymised.
+
+### Additional Features
+
+*(Ranked in order of priority from high to low)*
+
+- [ ] **Support for JSON and Parquet formats**: Extend the library to support reading and writing data in JSON and Parquet formats
+- [ ] **Command-line interface**: Create a command-line interface to allow users to run the obfuscation process from the terminal
+- [ ] **Support for multiple sources**: Extend the library to support reading data from multiple sources (e.g. local file system)
+
## Setup
+### Prerequisites
+
+- Python >= 3.13
+- Poetry >= 2.0.1
+
+### Installation
+
+1. Clone the repository:
+
+```bash
+git clone --recurse-submodules https://github.com/ajschofield/gdpr-obfuscator.git
+cd gdpr-obfuscator
+```
+
+2. Install dependencies using poetry
+
+```bash
+# Production
+poetry install
+# Developer (optional)
+poetry install --dev
+```
+
## Usage
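
The diff leaves the Usage section empty. As a rough illustration of the MVP steps listed in the README above, the sketch below masks chosen columns of a CSV string and returns the result as bytes. It is not the library's actual API: the function name `obfuscate_csv`, the JSON keys `file_to_obfuscate` and `pii_fields`, and the sample bucket path are all hypothetical, and the S3 download is skipped so the example runs locally with only the standard library.

```python
import csv
import io
import json


def obfuscate_csv(csv_text: str, pii_fields: list[str], mask: str = "***") -> bytes:
    """Hypothetical helper: replace the values of the named columns with `mask`."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        # Empty input: nothing to obfuscate.
        return b""

    # Column positions of the fields to mask; names not in the header are ignored.
    pii_indexes = [i for i, name in enumerate(header) if name in pii_fields]

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    for row in reader:
        for i in pii_indexes:
            if i < len(row):
                row[i] = mask
        writer.writerow(row)
    return out.getvalue().encode("utf-8")


# Assumed request shape: a JSON string naming the file and the fields to mask.
request = json.loads(
    '{"file_to_obfuscate": "s3://example-bucket/students.csv",'
    ' "pii_fields": ["name", "email_address"]}'
)

# Stand-in for the CSV body that would be fetched from S3.
sample = "student_id,name,email_address\n1,Jane Doe,jane@example.com\n"
result = obfuscate_csv(sample, request["pii_fields"])
print(result.decode("utf-8"))
```

The primary key column (`student_id` here) is left untouched, so the output keeps the original structure with only the listed fields changed.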