15 KiB
System Prompt
You are an expert AI assistant creating high-quality training data to fine-tune another LLM. The target LLM will specialize in mapping raw real estate data from various formats to a standardized canonical JSON schema (CanonicalRecord).
Your Task: Given a RAW PROPERTY DATA SNIPPET and its ORIGINAL SOURCE TYPE and ORIGINAL SOURCE IDENTIFIER, you must generate a fine-tuning example in a specific conversational JSON format. The example should contain:
- A "user" turn:
- Clear instructions for the target LLM, including the request to output ONLY the mapped JSON.
- The complete
CanonicalRecordJSON schema definition. - The RAW PROPERTY DATA SNIPPET.
- Information about the
ORIGINAL_SOURCE_TYPEandORIGINAL_SOURCE_IDENTIFIER.
- A "model" turn:
- The ideal, gold-standard JSON output string, perfectly conforming to the
CanonicalRecordschema, accurately mapped from the RAW PROPERTY DATA SNIPPET.
- The ideal, gold-standard JSON output string, perfectly conforming to the
CanonicalRecord Schema Definition (for Real Estate):
{
"title": "CanonicalRecord",
"type": "object",
"properties": {
"canonical_record_id": {"type": "string", "description": "Unique identifier for this canonical record.", "examples": ["cre-SOME_UUID"]},
"original_source_identifier": {"type": "string", "description": "Identifier of the original source (e.g., URL, filename + row index)."},
"original_source_type": {"type": "string", "description": "Type of the original source adapter ('api', 'file', 'scrape')."},
"entity_type": {"type": "string", "enum": ["RealEstateListing", "NewsArticle", "Other"], "default": "Other", "description": "Classification of the source entity."},
"mapping_model_version": {"type": ["string", "null"], "description": "Version identifier of the ML model used for mapping."},
"mapping_timestamp": {"type": "string", "format": "date-time", "description": "Timestamp (UTC) when the mapping was performed."},
"address": {
"title": "Address", "type": ["object", "null"],
"properties": {
"street_address": {"type": ["string", "null"]}, "city": {"type": ["string", "null"]},
"state_province": {"type": ["string", "null"]}, "postal_code": {"type": ["string", "null"]},
"country": {"type": ["string", "null"], "default": "USA"}
}
},
"features": { /* ... (full features schema as before) ... */ },
"listing": { /* ... (full listing schema as before) ... */ },
"agent": { /* ... (full agent schema as before) ... */ },
"description": {"type": ["string", "null"]},
"image_urls": {"type": ["array", "null"], "items": {"type": "string", "format": "uri"}},
"raw_source_data": {"type": ["object", "null"], "description": "Original source data record (JSON representation)."}
},
"required": ["original_source_identifier", "original_source_type", "entity_type"]
}
Output Format (JSON): Your entire output must be a single JSON object matching this structure:
{
"contents": [
{
"role": "user",
"parts": [
{
"text": "You are a data mapping assistant. Your task is to map the provided 'Raw Property Data Snippet' to the 'CanonicalRecord Schema Definition'. \n\nRULES:\n- Your entire response must be ONLY the mapped JSON object string, conforming strictly to the CanonicalRecord schema.\n- Do NOT include any explanatory text before or after the JSON object.\n- Set 'original_source_type' to: <ORIGINAL_SOURCE_TYPE_PROVIDED_TO_YOU>\n- Set 'original_source_identifier' to: <ORIGINAL_SOURCE_IDENTIFIER_PROVIDED_TO_YOU>\n- Set 'entity_type' to 'RealEstateListing'.\n- For the 'raw_source_data' field in the output, include the exact 'Raw Property Data Snippet' you were given.\n- Perform necessary data transformations (e.g., string prices to numbers, parse dates, extract address components).\n- If information for a canonical field is not present, use `null` or omit optional fields as per the schema.\n\nCanonicalRecord Schema Definition:\n<PASTE_THE_FULL_JSON_SCHEMA_STRING_HERE>\n\nOriginal Source Type: <ORIGINAL_SOURCE_TYPE_PROVIDED_TO_YOU>\nOriginal Source Identifier: <ORIGINAL_SOURCE_IDENTIFIER_PROVIDED_TO_YOU>\n\nRaw Property Data Snippet:\n<RAW_PROPERTY_DATA_SNIPPET_JSON_STRING_HERE>\n\nNow, provide ONLY the mapped JSON object string:\n\n"
}
]
},
{
"role": "model",
"parts": [
{
"text": "<GOLD_STANDARD_CANONICALRECORD_JSON_STRING>"
}
]
}
]
}
Instructions for You (the Data Generator LLM):
- You will be given:
ORIGINAL_SOURCE_TYPE: e.g., "api", "file", "scrape"ORIGINAL_SOURCE_IDENTIFIER: e.g., "https://dummyjson.com/products/1", "property_data.csv-row-5", "https://www.baania.com/..."RAW_PROPERTY_DATA_SNIPPET: The actual JSON/dictionary data extracted from the source.
- Construct the "user" text: Carefully substitute the placeholders
<ORIGINAL_SOURCE_TYPE_PROVIDED_TO_YOU>,<ORIGINAL_SOURCE_IDENTIFIER_PROVIDED_TO_YOU>,<PASTE_THE_FULL_JSON_SCHEMA_STRING_HERE>, and<RAW_PROPERTY_DATA_SNIPPET_JSON_STRING_HERE>in the user prompt template. - Construct the "model" text: This part must be a JSON string representing the fully mapped
CanonicalRecord.- Set
original_source_typeandoriginal_source_identifierin the JSON string to the values provided. - Set
entity_typeto "RealEstateListing". - Set
canonical_record_idto a placeholder like "cre-GENERATED-UUID". - Set
raw_source_datain the JSON string to theRAW_PROPERTY_DATA_SNIPPETyou were given. - Meticulously map all other fields. Perform transformations. Use
nullfor missing optional data.
- Set
- Ensure the "model" part is a valid JSON string that, when parsed, conforms to the
CanonicalRecordschema.
Now, I will provide you with the input parts. Generate the fine-tuning example JSON using the JSON MAPPING meta-prompt.
INPUT FOR YOU (THE GENERATOR LLM):
ORIGINAL_SOURCE_TYPE: "scrape"ORIGINAL_SOURCE_IDENTIFIER: "https://www.baania.com/some-property-link-v2"RAW_PROPERTY_DATA_SNIPPET(as a JSON string for easy insertion into the user prompt):{ "projectName": "The Oasis Residence", "propertyType": "Condominium", "priceInfo": { "amount": 7800000, "unit": "THB" }, "location": { "building": "Tower A", "street": "Sukhumvit Soi 31", "district": "Wattana", "city": "Bangkok", "postalCode": "10110" }, "attributes": { "num_bedrooms": "2 Bedrooms", "num_bathrooms": "2", "area": "75 SQ.M.", "floor": "15th" }, "title": "Luxury 2 Bed Condo, High Floor, Sukhumvit", "full_description": "Modern luxury condominium unit on a high floor in Tower A of The Oasis Residence. Offers stunning city views, two spacious bedrooms, and contemporary finishes. Prime location in Sukhumvit 31, Wattana. Excellent amenities including pool, gym, and 24-hour security. Built 2018.", "images": ["https://cdn.baania.com/img/condo/A1.jpg", "https://cdn.baania.com/img/condo/A2.jpg"], "contact": {"agency": "Urban Living Thailand", "tel": "+66-81-234-5678"}, "date_online": "2024-02-10T00:00:00Z" }
**Expected Output from Generator LLM (for JSONL or JSONLines Mapping in Conversational Format):**
A valid JSON Lines (.jsonl) file:
Consists of multiple JSON objects, each on its own line.
Each line must be a complete JSON object (not arrays or multi-line blocks).
There must be no line breaks inside a single JSON object (unless escaped properly as \n).
```jsonl
{"contents":[{"role":"user","parts":[{"text":"You are a data mapping assistant. Your task is to map the provided 'Raw Property Data Snippet' to the 'CanonicalRecord Schema Definition'. \n\nRULES:\n- Your entire response must be ONLY the mapped JSON object string, conforming strictly to the CanonicalRecord schema.\n- Do NOT include any explanatory text before or after the JSON object.\n- Set 'original_source_type' to: scrape\n- Set 'original_source_identifier' to: https://www.baania.com/some-property-link-v2\n- Set 'entity_type' to 'RealEstateListing'.\n- For the 'raw_source_data' field in the output, include the exact 'Raw Property Data Snippet' you were given.\n- Perform necessary data transformations (e.g., string prices to numbers, parse dates, extract address components).\n- If information for a canonical field is not present, use `null` or omit optional fields as per the schema.\n\nCanonicalRecord Schema Definition:\n{\"title\": \"CanonicalRecord\", \"type\": \"object\", \"properties\": {\"canonical_record_id\": {\"type\": \"string\", \"description\": \"Unique identifier for this canonical record.\", \"examples\": [\"cre-SOME_UUID\"]}, \"original_source_identifier\": {\"type\": \"string\", \"description\": \"Identifier of the original source (e.g., URL, filename + row index).\"}, \"original_source_type\": {\"type\": \"string\", \"enum\": [\"api\", \"file\", \"scrape\"], \"description\": \"Type of the original source adapter ('api', 'file', 'scrape').\"}, \"entity_type\": {\"type\": \"string\", \"enum\": [\"RealEstateListing\", \"NewsArticle\", \"Other\"], \"default\": \"Other\", \"description\": \"Classification of the source entity.\"}, \"mapping_model_version\": {\"type\": [\"string\", \"null\"], \"description\": \"Version identifier of the ML model used for mapping.\"}, \"mapping_timestamp\": {\"type\": \"string\", \"format\": \"date-time\", \"description\": \"Timestamp (UTC) when the mapping was performed.\"}, \"address\": {\"title\": \"Address\", \"type\": [\"object\", \"null\"], \"properties\": {\"street_address\": {\"type\": [\"string\", \"null\"]}, \"city\": {\"type\": [\"string\", \"null\"]}, \"state_province\": {\"type\": [\"string\", \"null\"]}, \"postal_code\": {\"type\": [\"string\", \"null\"]}, \"country\": {\"type\": [\"string\", \"null\"], \"default\": \"USA\"}}}, \"features\": {\"title\": \"Features\", \"type\": [\"object\", \"null\"], \"properties\": {\"bedrooms\": {\"type\": [\"integer\", \"null\"]}, \"bathrooms\": {\"type\": [\"number\", \"null\"]}, \"area_sqft\": {\"type\": [\"number\", \"null\"], \"description\": \"Area in square feet.\"}, \"lot_size_sqft\": {\"type\": [\"number\", \"null\"], \"description\": \"Lot size in square feet.\"}, \"year_built\": {\"type\": [\"integer\", \"null\"]}, \"property_type\": {\"type\": [\"string\", \"null\"]}, \"has_pool\": {\"type\": [\"boolean\", \"null\"]}, \"has_garage\": {\"type\": [\"boolean\", \"null\"]}, \"stories\": {\"type\": [\"integer\", \"null\"]}}}, \"listing\": {\"title\": \"Listing Information\", \"type\": [\"object\", \"null\"], \"properties\": {\"price\": {\"type\": [\"number\", \"null\"]}, \"currency\": {\"type\": [\"string\", \"null\"], \"default\": \"USD\"}, \"listing_status\": {\"type\": [\"string\", \"null\"]}, \"listing_type\": {\"type\": [\"string\", \"null\"]}, \"listed_date\": {\"type\": [\"string\", \"null\", \"string\"], \"format\": \"date-time\"}, \"last_updated_date\": {\"type\": [\"string\", \"null\", \"string\"]}, \"listing_url\": {\"type\": [\"string\", \"null\", \"string\"]}, \"mls_id\": {\"type\": [\"string\", \"null\"]}}}, \"agent\": {\"title\": \"Agent Information\", \"type\": [\"object\", \"null\"], \"properties\": {\"name\": {\"type\": [\"string\", \"null\"]}, \"phone\": {\"type\": [\"string\", \"null\"]}, \"email\": {\"type\": [\"string\", \"null\"]}, \"brokerage_name\": {\"type\": [\"string\", \"null\"]}}}, \"description\": {\"type\": [\"string\", \"null\"]}, \"image_urls\": {\"type\": [\"array\", \"null\"], \"items\": {\"type\": \"string\", \"format\": \"uri\"}}, \"raw_source_data\": {\"type\": [\"object\", \"null\"], \"description\": \"Original source data record (JSON representation).\"}}, \"required\": [\"original_source_identifier\", \"original_source_type\", \"entity_type\"]}\n\nOriginal Source Type: scrape\nOriginal Source Identifier: https://www.baania.com/some-property-link-v2\n\nRaw Property Data Snippet:\n{\"projectName\": \"The Oasis Residence\", \"propertyType\": \"Condominium\", \"priceInfo\": {\"amount\": 7800000, \"unit\": \"THB\"}, \"location\": {\"building\": \"Tower A\", \"street\": \"Sukhumvit Soi 31\", \"district\": \"Wattana\", \"city\": \"Bangkok\", \"postalCode\": \"10110\"}, \"attributes\": {\"num_bedrooms\": \"2 Bedrooms\", \"num_bathrooms\": \"2\", \"area\": \"75 SQ.M.\", \"floor\": \"15th\"}, \"title\": \"Luxury 2 Bed Condo, High Floor, Sukhumvit\", \"full_description\": \"Modern luxury condominium unit on a high floor in Tower A of The Oasis Residence. Offers stunning city views, two spacious bedrooms, and contemporary finishes. Prime location in Sukhumvit 31, Wattana. Excellent amenities including pool, gym, and 24-hour security. Built 2018.\", \"images\": [\"https://cdn.baania.com/img/condo/A1.jpg\", \"https://cdn.baania.com/img/condo/A2.jpg\"], \"contact\": {\"agency\": \"Urban Living Thailand\", \"tel\": \"+66-81-234-5678\"}, \"date_online\": \"2024-02-10T00:00:00Z\"}"}]},{"role":"model","parts":[{"text":"{\"canonical_record_id\": \"cre-GENERATED-UUID\", \"original_source_identifier\": \"https://www.baania.com/some-property-link-v2\", \"original_source_type\": \"scrape\", \"entity_type\": \"RealEstateListing\", \"mapping_model_version\": null, \"mapping_timestamp\": \"2024-02-10T00:00:00Z\", \"address\": {\"street_address\": \"Sukhumvit Soi 31\", \"city\": \"Bangkok\", \"state_province\": \"Wattana\", \"postal_code\": \"10110\", \"country\": \"Thailand\"}, \"features\": {\"bedrooms\": 2, \"bathrooms\": 2.0, \"area_sqft\": 807.29, \"lot_size_sqft\": null, \"year_built\": 2018, \"property_type\": \"Condominium\", \"has_pool\": null, \"has_garage\": null, \"stories\": null}, \"listing\": {\"price\": 7800000.0, \"currency\": \"THB\", \"listing_status\": null, \"listing_type\": null, \"listed_date\": \"2024-02-10T00:00:00Z\", \"last_updated_date\": null, \"listing_url\": null, \"mls_id\": null}, \"agent\": {\"name\": null, \"phone\": \"+66-81-234-5678\", \"email\": null, \"brokerage_name\": \"Urban Living Thailand\"}, \"description\": \"Luxury 2 Bed Condo, High Floor, Sukhumvit. Modern luxury condominium unit on a high floor in Tower A of The Oasis Residence. Offers stunning city views, two spacious bedrooms, and contemporary finishes. Prime location in Sukhumvit 31, Wattana. Excellent amenities including pool, gym, and 24-hour security. Built 2018.\", \"image_urls\": [\"https://cdn.baania.com/img/condo/A1.jpg\", \"https://cdn.baania.com/img/condo/A2.jpg\"], \"raw_source_data\": {\"projectName\": \"The Oasis Residence\", \"propertyType\": \"Condominium\", \"priceInfo\": {\"amount\": 7800000, \"unit\": \"THB\"}, \"location\": {\"building\": \"Tower A\", \"street\": \"Sukhumvit Soi 31\", \"district\": \"Wattana\", \"city\": \"Bangkok\", \"postalCode\": \"10110\"}, \"attributes\": {\"num_bedrooms\": \"2 Bedrooms\", \"num_bathrooms\": \"2\", \"area\": \"75 SQ.M.\", \"floor\": \"15th\"}, \"title\": \"Luxury 2 Bed Condo, High Floor, Sukhumvit\", \"full_description\": \"Modern luxury condominium unit on a high floor in Tower A of The Oasis Residence. Offers stunning city views, two spacious bedrooms, and contemporary finishes. Prime location in Sukhumvit 31, Wattana. Excellent amenities including pool, gym, and 24-hour security. Built 2018.\", \"images\": [\"https://cdn.baania.com/img/condo/A1.jpg\", \"https://cdn.baania.com/img/condo/A2.jpg\"], \"contact\": {\"agency\": \"Urban Living Thailand\", \"tel\": \"+66-81-234-5678\"}, \"date_online\": \"2024-02-10T00:00:00Z\"}}"}]}]}
GEnerate 50 token with many details, include sample where multiple data sources (api, file, scrape)