diff --git a/README.md b/README.md
index 9c3ab0e..1ba52c4 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,25 @@
 # Report for Software Engineering for AI-Enabled System
+- [Report for Software Engineering for AI-Enabled System](#report-for-software-engineering-for-ai-enabled-system)
+  - [Section 1: ML Model Implementation](#section-1-ml-model-implementation)
+    - [Task 1.1: ML Canvas Design](#task-11-ml-canvas-design)
+    - [Task 1.2: Model Training Implementation](#task-12-model-training-implementation)
+      - [Input data](#input-data)
+      - [Fine-tuning loop](#fine-tuning-loop)
+      - [Validation methodology](#validation-methodology)
+        - [Validation During Fine-Tuning](#validation-during-fine-tuning)
+        - [Post-Fine-Tuning Evaluation](#post-fine-tuning-evaluation)
+    - [Task 1.4: Model Versioning and Experimentation](#task-14-model-versioning-and-experimentation)
+    - [Task 1.5 + 1.6: Model Explainability + Prediction Reasoning](#task-15--16-model-explainability--prediction-reasoning)
+      - [Traceable Prompting](#traceable-prompting)
+    - [Task 1.7: Model Deployment as a Service](#task-17-model-deployment-as-a-service)
+  - [Section 2: UI-Model Interface](#section-2-ui-model-interface)
+    - [Task 2.1 UI design](#task-21-ui-design)
+    - [Task 2.2: Demonstration](#task-22-demonstration)
+      - [Interface Testing and Implementation](#interface-testing-and-implementation)
+        - [Challenges](#challenges)
+
+
 
 ## Section 1: ML Model Implementation
 
 ### Task 1.1: ML Canvas Design
@@ -17,7 +37,7 @@ The Feedback section outlines how the model will learn over time by tracking met
 
 ### Task 1.2: Model Training Implementation
 
-I did not train the LLM model by myself but instead, I do fine-tuning on gemini-2.0-flash-lite-001 in vertex AI platform with supervised learning approach.
+I did not train the LLM model myself; instead, I fine-tuned `gemini-2.0-flash-lite-001` on the Vertex AI platform using a supervised learning approach.
 
 #### Input data
 
@@ -28,16 +48,16 @@ Here is example of training data I use to fine-tune the model:
 It is in JSONL or JSONLines format which suitable for large scale training data, these datas are combination from two sources
 
 1. Collected from my pipeline service
    - Combine the data output from pipeline with specific prompt to create user role and define the target canonical dataset for model role
-2. Generate with Gemini 2.5 Flash Preview 04-17 with this prompt
+2. Generated with `Gemini 2.5 Flash Preview 04-17` using this prompt
    - Craft prompt to more synthetic datas and cover more cases
 
 We need to do data generation because pipeline process take a lot of time to scrape data from web.
 
 Separate into 3 versions
 
-- `train-1.jsonl`: 1 samples (2207 tokens)
-- `train-2.jsonl`: 19 samples (33320 tokens) + 12 samples `evluation.jsonl`
-- `train-3.jsonl`: 25 samples (43443 tokens) + 12 samples `evluation.jsonl`
+- [`train-1.jsonl`](data/train/train-1.jsonl): 1 sample (2207 tokens)
+- [`train-2.jsonl`](data/train/train-2.jsonl): 19 samples (33320 tokens) + 12 samples from `evluation.jsonl`
+- [`train-3.jsonl`](data/train/train-3.jsonl): 25 samples (43443 tokens) + 12 samples from `evluation.jsonl`
 
 #### Fine-tuning loop
 
@@ -63,8 +83,8 @@ During fine-tuning, if we provide evaluation data, Vertex AI will calculate the
 
 We approach two methods
 
-1. JSON Syntactic Validity: Parse generated json string with json.loads()
-2. Pydantic Schema Conformance: If the generated output is valid JSON, try to instantiate your CanonicalRecord Pydantic model with the parsed dictionary: CanonicalRecord(**parsed_generated_json).
+1. JSON Syntactic Validity: Parse the generated JSON string with `json.loads()`
+2. Pydantic Schema Conformance: If the generated output is valid JSON, try to instantiate the [`CanonicalRecord` Pydantic model](schemas/canonical.py) with the parsed dictionary: `CanonicalRecord(**parsed_generated_json)`. A minimal sketch of both checks is shown below.
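+
+The sketch below illustrates the two checks on a single generated string. The import path is assumed from `schemas/canonical.py` and the helper name is illustrative; the full metric computation is in the evaluation code referenced below.
+
+```python
+import json
+
+from pydantic import ValidationError
+
+# Assumed import path, based on schemas/canonical.py in this repository.
+from schemas.canonical import CanonicalRecord
+
+
+def check_output(generated: str) -> tuple[bool, bool]:
+    """Return (is_valid_json, conforms_to_schema) for one generated string."""
+    try:
+        parsed = json.loads(generated)  # check 1: JSON syntactic validity
+    except json.JSONDecodeError:
+        return False, False
+    try:
+        CanonicalRecord(**parsed)  # check 2: Pydantic schema conformance
+        return True, True
+    except (ValidationError, TypeError):  # TypeError if the JSON is not an object
+        return True, False
+```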
 
 To calculate the metrics, I run the following code
 
@@ -173,5 +193,15 @@ We don't have any UI to gain feedback from user at this time, but we plan to add
 
 ### Task 2.2: Demonstration
 
-#### UI - Model Interface Design
-#### Interface Testing and Implementation
\ No newline at end of file
+#### Interface Testing and Implementation
+
+Here is a successful interaction that unifies input data from varied sources (API, file, scraped) into a single canonical record.
+
+```json
+
+```
+
+##### Challenges
+
+1. The prompt does not change dynamically based on the Pydantic model.
+   - We found out that we can embed the Pydantic schema into the prompt directly so it updates automatically when we change the Pydantic model.
\ No newline at end of file
diff --git a/SETUP.md b/SETUP.md
new file mode 100644
index 0000000..194b500
--- /dev/null
+++ b/SETUP.md
@@ -0,0 +1,34 @@
+# Set up the evaluation and explainability testing environment
+
+Here is the setup guide for the evaluation and explainability testing environment. If you want to observe the full pipeline service code, please take a look at the [Borbann repository](https://github.com/Sosokker/borbann/tree/main/pipeline).
+
+## Prerequisites
+
+You need the following tools to run the evaluation and explainability testing environment:
+
+- Python 3.12
+- Google Cloud SDK
+- Vertex AI SDK
+- UV
+
+Also, you need to modify the code in `vertex.py` to point to your own project ID and model name. Create your own model on the Vertex AI platform first, using `train-1.jsonl`, `train-2.jsonl`, and `train-3.jsonl` as training data and `evluation.jsonl` as evaluation data (see the sketch at the end of this guide).
+
+## Setup
+
+```bash
+uv sync
+```
+
+## Evaluation
+
+```bash
+gcloud auth application-default login
+uv run evaluate.py
+```
+
+## Explainability
+
+```bash
+gcloud auth application-default login
+uv run explainability.py
+```
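+
+If you prefer to create the tuned model from code instead of the Vertex AI console, the sketch below shows one way to launch the supervised tuning job with the Vertex AI Python SDK (`vertexai.tuning.sft`). The project ID, region, bucket paths, and display name are placeholders, not values from this repository; upload the JSONL files to a Cloud Storage bucket first.
+
+```python
+import time
+
+import vertexai
+from vertexai.tuning import sft
+
+# Placeholders: replace with your own project, region, and bucket.
+vertexai.init(project="your-project-id", location="us-central1")
+
+job = sft.train(
+    source_model="gemini-2.0-flash-lite-001",
+    train_dataset="gs://your-bucket/train/train-3.jsonl",
+    validation_dataset="gs://your-bucket/train/evluation.jsonl",
+    tuned_model_display_name="canonical-record-tuning",
+)
+
+# Tuning runs asynchronously; poll until the job finishes.
+while not job.has_ended:
+    time.sleep(60)
+    job.refresh()
+
+print(job.tuned_model_endpoint_name)  # endpoint to reference in vertex.py
+```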