diff --git a/Chapters/Chapter_05.tex b/Chapters/Chapter_05.tex new file mode 100644 index 0000000..10d0f8f --- /dev/null +++ b/Chapters/Chapter_05.tex @@ -0,0 +1,330 @@ +%====================================================================== +\chapter{AI Component Design} +%====================================================================== + +This chapter describes how Artificial Intelligence (AI) components are designed and +integrated into the overall system. Each section starts with a clear objective and provides +guidance on analysis, design, implementation, and evaluation of AI modules. + +%------------------------------ +\section{Business Context and AI Integration} +%------------------------------ + +\begin{figure}[htbp] + \centering + \includegraphics[width=1\textwidth]{assets/system-workflow.png} + \caption{Overview of the system workflow} + \label{fig:system-workflow} +\end{figure} + +Figure~\ref{fig:system-workflow} shows an overview of the system workflow and how the components work together. It illustrates that AI is used in four components: Data Integration Pipeline, Explainable Price Prediction, Property and Neighborhood Insight, and Pricing Explanation. + +For each component, we explain why using AI is both feasible and necessary. + +\subsection*{1. Data Integration Pipeline} +AI is suitable here because website content varies in the type of information it contains and in how it is presented, so AI is needed to parse the content, understand its context, and extract the relevant information. + +When aggregating multiple data sources, differences in data schemas present a significant challenge. We need to establish a centralized data schema, and AI can help users accomplish this more easily by suggesting related fields. + +For web scraping, the problem is complex and constantly changing because website structures differ from site to site, change over time, and are difficult to parse with fixed rules. 
For file and API sources, the challenge is less complex since we can extract data from files and fetch data from API endpoints directly. However, complexity increases when combining these sources into one unified pipeline. + +We can accept some incompleteness in this module. Missing fields in the aggregated data will not significantly affect the analysis when working with large volumes of data. + +\subsection*{2. Explainable Price Prediction} +AI is appropriate for price prediction because housing prices are influenced by numerous factors that interact in complex, non-linear ways. Traditional rule-based systems cannot effectively capture these intricate relationships. + +The housing market constantly evolves with changing economic conditions, buyer preferences, and seasonal variations. AI models can adapt to these shifts by identifying new patterns in the data that might not be obvious to human analysts. + +We can accept predictions that are not 100\% accurate since real estate valuation inherently involves some uncertainty. However, by using explainable AI approaches, we can provide confidence intervals and explain the key factors influencing each prediction, making the system valuable even with some margin of error. + +\subsection*{3. Property and Neighborhood Insight} +AI is well-suited for generating property and neighborhood insights because this requires analyzing diverse data types including text descriptions, images, geographic information, and numerical data. The relationships between these varied data sources are complex and difficult to codify with traditional rules. + +The characteristics that make a neighborhood desirable change over time and vary across different buyer segments. AI can identify emerging trends and personalized insights that static analysis would miss. 
+ +Perfect completeness is not essential, as providing valuable insights on the most significant factors affecting property desirability is more important than capturing every minor detail. Users benefit from focused, relevant information rather than exhaustive analysis. + +\subsection*{4. Pricing Explanation} +Using AI for pricing explanation is appropriate because explaining property valuations involves communicating complex relationships between numerous factors in an accessible way. These explanations need to adapt to each property's unique characteristics and the specific market context. + +The relative importance of different pricing factors changes across markets and over time. AI can generate contextual explanations that reflect these dynamics rather than relying on fixed templates. + +While we can accept explanations that might not cover every possible factor influencing price, the system provides significant value by highlighting the most important considerations and presenting them in a clear, understandable format for users. + +\newpage + +%------------------------------ +\section{Goal Hierarchy} +%------------------------------ + +This section outlines the hierarchical structure of goals for our real estate valuation system, beginning with organizational objectives and flowing down to specific AI model goals. For each level, we provide clear metrics to measure success. 
+ +\subsection{Organizational Goals} + +The primary organizational goals for this AI system are: + +\begin{itemize} + \item To establish a trusted platform for accurate and transparent real estate valuations + \item To build a reputation for innovation in applying AI to real estate challenges +\end{itemize} + +Success at the organizational level will be measured through: + +\begin{itemize} + \item Recognition through awards +\end{itemize} + +\subsection{System Goals} + +At the system level, our goals are: + +\begin{itemize} + \item To integrate diverse real estate data sources into a unified, high-quality dataset + \item To generate accurate property valuations with clear explanations for price factors + \item To provide insightful property and neighborhood analysis to inform decision-making + \item To deliver a reliable, responsive user experience across different devices and user types +\end{itemize} + +System success metrics include: + +\begin{itemize} + \item System uptime and response time statistics + \item Data completeness and quality indicators across different property types and regions + \item System error rates and exception handling effectiveness + \item System scalability under increasing data volumes and user numbers +\end{itemize} + +\subsection{User Goals} + +For our users, the goals are: + +\begin{itemize} + \item To receive insightful property valuations they can trust for decision-making + \item To understand the key factors influencing property prices in specific markets + \item To gain insights about properties and neighborhoods beyond basic statistics + \item To save time in researching and analyzing real estate opportunities +\end{itemize} + +User success metrics will be assessed through: + +\begin{itemize} + \item User satisfaction surveys and net promoter scores +\end{itemize} + +\subsection{AI Model Goals} + +At the most specific level, our AI model goals are: + +\begin{itemize} + \item To accurately predict property prices with clearly 
quantified confidence levels + \item To identify and weigh the most influential factors affecting property values + \item To provide explanations for predictions that are both technically sound and accessible to non-technical users +\end{itemize} + +AI model performance will be measured through: + +\begin{itemize} + \item Statistical accuracy metrics including mean absolute error, root mean squared error, and R-squared value compared to actual transaction prices + \item Explanation quality assessed through user comprehension testing + \item Comparative performance against benchmark models and traditional valuation methods + \item Drift detection metrics to identify when model retraining is necessary + \item Computational efficiency metrics including inference time and resource utilization +\end{itemize} + +%------------------------------ +\section{Task Requirements Analysis Using AI Canvas} +%------------------------------ + +This section presents a detailed analysis of our AI system's task requirements using the AI Canvas methodology. This approach helps us clearly articulate what each AI component should accomplish, how it will operate technically, and the conditions under which it will function. + +\subsection{AI Task Requirements} +For each AI component in our system, we analyze three key dimensions: +\begin{itemize} + \item \textbf{Requirements (REQ):} The specific objectives and functions each AI component must fulfill + \item \textbf{Specifications (SPEC):} The technical approach and methods the AI will employ + \item \textbf{Environment (ENV):} The operational conditions, constraints, and context in which the AI will function +\end{itemize} + +\subsubsection*{1. Data Integration Pipeline} + +\textbf{Requirements (REQ):} The AI component must extract structured data from diverse sources including property listing websites, government databases, and unstructured documents. 
It needs to identify property characteristics, pricing information, location data, and temporal attributes across varying formats. The system must normalize extracted data into a unified schema. + +\textbf{Specifications (SPEC):} This component will implement a hybrid approach combining rule-based extraction for standardized sources and deep learning models for unstructured content. Named entity recognition models will identify property attributes within text, while transformer-based architectures will handle context-dependent extraction tasks. + +\textbf{Environment (ENV):} The component will operate in a data environment characterized by high heterogeneity across sources, frequent changes in website structures, and varying data quality. It must process large volumes of daily updates while maintaining extraction accuracy. The system requires periodic retraining as new data sources are integrated and reasonable human oversight for exception handling of complex cases that fall outside standard patterns. + +\subsubsection*{2. Explainable Price Prediction} + +\textbf{Requirements (REQ):} This AI component must generate accurate property valuation estimates with quantified confidence intervals. It needs to identify key value drivers specific to each property, adapt to regional market dynamics, and provide predictions that remain reliable even with incomplete input data. The system must detect when it lacks sufficient information for reliable prediction and communicate these limitations transparently. + +\textbf{Specifications (SPEC):} The prediction engine will utilize an ensemble approach combining gradient-boosted decision trees for tabular data with graph neural networks to capture neighborhood relationships and spatial dependencies. SHAP values and counterfactual explanations will provide feature importance analysis. + +\textbf{Environment (ENV):} This component operates within a complex market environment. + +\subsubsection*{3. 
Property and Neighborhood Insight} + +\textbf{Requirements (REQ):} The AI must analyze multilayered neighborhood data to identify significant patterns and trends relevant to property valuation. It needs to generate insights about local amenities, infrastructure developments, community characteristics, and future growth potential. + +\textbf{Specifications (SPEC):} This component will utilize multimodal analysis techniques combining geospatial machine learning for location-based features with natural language processing for textual descriptions and sentiment analysis. The system will implement clustering algorithms to identify comparable neighborhoods and temporal modeling to detect emerging trends. + +\textbf{Environment (ENV):} The insight generation occurs in an information-rich but fragmented environment with data spanning multiple domains including transportation, education, commerce, and safety. The component must operate with varying data availability across neighborhoods and maintain cultural sensitivity when analyzing community characteristics. It needs to distinguish between transient and persistent neighborhood attributes while balancing detail with relevance in its output. + +\subsubsection*{4. Pricing Explanation} + +\textbf{Requirements (REQ):} This component must translate complex valuation models into clear, intuitive explanations. It needs to communicate uncertainty appropriately, highlight the most influential factors affecting a specific property's value, and provide comparative analysis with similar properties. The system should generate explanations that help users understand market dynamics without overwhelming them with technical details. + +\textbf{Specifications (SPEC):} The explanation system will implement a natural language generation pipeline built on a large language model fine-tuned for real estate domain knowledge. 
It will incorporate techniques from the field of explainable AI, including feature attribution visualization and importance ranking. The system will use templates for consistent structure while leveraging dynamic content generation for property-specific insights. + +\textbf{Environment (ENV):} This component operates at the interface between technical prediction systems and non-technical users. + +\newpage + +\subsection{AI Canvas Development} + +\noindent Figure~\ref{fig:ai-canvas} presents an AI Canvas diagram for an Explainable Price Prediction \& Context-Aware Analytics system tailored to the Thai real estate market. This strategic planning tool articulates how artificial intelligence creates value through structured components that guide implementation. + +The AI Canvas comprises nine interconnected sections that collectively define the system's purpose and operation. The Prediction section establishes the core functionality: estimating property market values based on input features, providing confidence intervals to quantify uncertainty, and delivering accessible explanations to users. This works in concert with the Judgment section, which articulates the critical trade-offs the system must evaluate, focusing on assessing prediction reliability through model confidence metrics, data completeness evaluation, and deviation analysis from comparable properties. + +The Action section defines how the system's outputs are translated into tangible steps: displaying valuation results with confidence levels, highlighting the top contributing features to predicted values, and explaining results through user-friendly descriptions. These actions lead to the Outcome section, which clarifies the ultimate value proposition: enabling users to make informed real estate decisions across pricing, investing, and negotiations, while simultaneously collecting usage statistics for ongoing system optimization. 
+ +The Input Data section catalogues the available information sources: property features including location, size, type, age, and amenities; environmental data encompassing infrastructure, flood zones, and news; and market data covering recent sales and current listings. Complementing this, the Training Data section defines the labeled examples powering the model: historical sales and rental prices that have undergone cleaning, normalization, and imputation processes to handle data quality issues. + +The Feedback section outlines how the model will learn over time by tracking metrics like average deviation between predictions and actual selling prices. The Intervention section establishes boundaries for human oversight, calling for expert involvement when predictions show low confidence or high deviation, while implementing fallback mechanisms when prediction certainty falls below acceptable thresholds. The Explanation section details the technical approaches for transparency: SHAP- or LIME-based visual feature attribution combined with Large Language Models to generate intuitive explanations. + +\newpage + +\begin{figure}[htbp] + \centering + \includegraphics[width=1\textwidth]{assets/ai/ai-canvas.png} + \caption{AI Canvas for Explainable Price Prediction and Context-Aware Analytics} + \label{fig:ai-canvas} +\end{figure} + +\newpage + +%------------------------------ +\section{User Experience Design with AI} +%------------------------------ + +This section explains how AI is used to improve the user experience in our real estate system. AI is not just a background tool: it helps users by performing tasks, adding helpful information, offering smart suggestions, and adapting to user needs. These features are built into key parts of the system, like data handling, neighborhood analysis, and price prediction. The design focuses on being clear, easy to use, and useful for different types of users. + +\subsection{Automated Pipelines} + +The pipeline system in BorBann is not just a backend tool. 
It is a dynamic, AI-driven interface that enables even non-technical users to build, manage, and leverage data workflows for training personalized real estate models. + +The system is designed using an \textbf{Automate} style that automatically handles repetitive technical tasks (such as schema inference and spider generation) but still gives users the power to manually adjust the process. + +\subsubsection*{1. Pipeline Creation and Management} + +Users start by selecting data sources (such as websites, files, or APIs) and use a no-code interface to configure their pipeline. The system generates scraping rules using an LLM and recommends schema alignments automatically. + +\begin{figure}[htbp] + \centering + \includegraphics[width=1\textwidth]{assets/ai/pipeline-1.png} + \caption{Pipeline Creation Interface} + \label{fig:pipeline-creation-ui} +\end{figure} + +\begin{figure}[htbp] + \centering + \includegraphics[width=1\textwidth]{assets/ai/pipeline-3.png} + \caption{Pipeline Creation Interface – Additional Prompt} + \label{fig:pipeline-creation-ui-2} +\end{figure} + +Figures~\ref{fig:pipeline-creation-ui} and~\ref{fig:pipeline-creation-ui-2} show how users enter data source URLs and set extraction options with help from the AI. + +\subsubsection*{2. Field Customization and Schema Annotation} + +After setting up the source, users can view, modify, or remove fields detected by the system. The interface visually maps fields across sources, includes data type validation, and supports custom field creation via formulas. + +\begin{figure}[htbp] + \centering + \includegraphics[width=1\textwidth]{assets/ai/pipeline-4.png} + \caption{Field Management Interface} + \label{fig:pipeline-creation-ui-3} +\end{figure} + +Figure~\ref{fig:pipeline-creation-ui-3} shows the functions for customizing the schema and data source integration. + +\subsubsection*{3. 
Pipeline Monitoring and Status Overview} + +Users can view a dashboard that summarizes the status of each pipeline—categorized as Active, Paused, Failed, or Completed. Each pipeline card is annotated with insights. + +\begin{figure}[htbp] + \centering + \includegraphics[width=1\textwidth]{assets/ai/pipeline-6.png} + \caption{Pipeline Dashboard Interface} + \label{fig:pipeline-creation-ui-4} +\end{figure} + +Figure \ref{fig:pipeline-creation-ui-4} shows how the interface represents pipeline statuses; successful pipelines are highlighted with green-bordered cards. + +\subsection{Neighborhood Insight} + +The Neighborhood Insight feature uses various APIs and tools to analyze environmental conditions, nearby facilities, local amenities, and even news sentiment. The goal is to support smarter decisions by making the neighborhood story as transparent and rich as the property details. + +This feature uses \textbf{Annotate}, which overlays real-time and historical context—like flood risk, school quality, and air pollution levels—into property views; and \textbf{Automate}, where the agent automatically pulls data from multiple sources and updates insights with minimal input. + +\subsubsection*{1. Local Context Analytics Interface} + +When users view a property, the system automatically presents key neighborhood metrics, including: + +\begin{itemize} + \item Flood risk history and air quality trends + \item Distance and quality scores for nearby schools, hospitals, and transit points + \item Local news sentiment +\end{itemize} + +\begin{figure}[htbp] + \centering + \includegraphics[width=1\textwidth]{assets/ai/insight-1.png} + \caption{Environmental Impact Analysis} + \label{fig:insight-ui-1} +\end{figure} + +Figure \ref{fig:insight-ui-1} shows the analytics interface consisting of environmental data, pricing, and property-specific details along with nearby facilities. 
+ +\subsection{Explainable Price Prediction} + +This feature blends predictive modeling with real-time visualizations and natural language explanations. Unlike typical black-box systems, BorBann’s model lets users interact with the logic behind the number. Users can explore, adjust, and challenge the AI's assumptions. + +The user experience combines: + +\begin{itemize} + \item \textbf{Annotate} – each prediction includes visual and textual breakdowns of contributing factors. + \item \textbf{Prompt} – the system suggests related factors for simulating “what-if” scenarios. + \item \textbf{Automate} – the system recalibrates predictions as more data is integrated. +\end{itemize} + +\subsubsection*{1. Prediction Overview} + +When a user selects a property, the system immediately displays: + +\begin{itemize} + \item Predicted price range (upper and lower bounds) + \item Confidence interval + \item Base prediction with timestamp of latest data sync +\end{itemize} + +\begin{figure}[htbp] + \centering + \includegraphics[width=1\textwidth]{assets/ai/price-prediction-1.png} + \caption{Prediction Overview Interface} + \label{fig:price-prediction-ui-1} +\end{figure} + +Figure \ref{fig:price-prediction-ui-1} shows the price range, confidence bands, and a clean summary layout. + +\subsubsection*{2. Feature Contribution Analysis} + +The system explains which factors influenced the prediction most — such as location, property size, developer reputation, and local context. + +\begin{figure}[htbp] + \centering + \includegraphics[width=1\textwidth]{assets/ai/price-prediction-2.png} + \caption{Feature Contribution Analysis Interface} + \label{fig:price-prediction-ui-2} +\end{figure} + +Figure \ref{fig:price-prediction-ui-2} shows an interface that visualizes the impact of each factor using bar graphs and percentage values. 
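+
+As a sketch of how the percentage values in Figure~\ref{fig:price-prediction-ui-2} could be derived, assume that the SHAP attributions named in the component specification are used. The symbols $\hat{y}$, $\phi_j$, $M$, and $p_j$ are notation introduced here for illustration only. SHAP decomposes the prediction $\hat{y}(x)$ for a property $x$ with $M$ features into a base value $\phi_0$ and per-feature contributions $\phi_j(x)$:
+\begin{equation}
+  \hat{y}(x) = \phi_0 + \sum_{j=1}^{M} \phi_j(x)
+\end{equation}
+One possible normalization, which is an illustrative assumption rather than the implemented method, reports the relative share of feature $j$ as
+\begin{equation}
+  p_j = \frac{|\phi_j(x)|}{\sum_{k=1}^{M} |\phi_k(x)|} \times 100\%
+\end{equation}
+provided at least one attribution is nonzero, so that the shares sum to 100\%. The sign of $\phi_j(x)$ can still be shown separately to indicate whether a factor raises or lowers the estimate.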
diff --git a/assets/ai/ai-canvas.png b/assets/ai/ai-canvas.png new file mode 100644 index 0000000..edf0916 Binary files /dev/null and b/assets/ai/ai-canvas.png differ diff --git a/assets/ai/insight-1.png b/assets/ai/insight-1.png new file mode 100644 index 0000000..759e8c6 Binary files /dev/null and b/assets/ai/insight-1.png differ diff --git a/assets/ai/pipeline-1.png b/assets/ai/pipeline-1.png new file mode 100644 index 0000000..14b2808 Binary files /dev/null and b/assets/ai/pipeline-1.png differ diff --git a/assets/ai/pipeline-2.png b/assets/ai/pipeline-2.png new file mode 100644 index 0000000..10f841e Binary files /dev/null and b/assets/ai/pipeline-2.png differ diff --git a/assets/ai/pipeline-3.png b/assets/ai/pipeline-3.png new file mode 100644 index 0000000..fced106 Binary files /dev/null and b/assets/ai/pipeline-3.png differ diff --git a/assets/ai/pipeline-4.png b/assets/ai/pipeline-4.png new file mode 100644 index 0000000..e5dc578 Binary files /dev/null and b/assets/ai/pipeline-4.png differ diff --git a/assets/ai/pipeline-5.png b/assets/ai/pipeline-5.png new file mode 100644 index 0000000..a401494 Binary files /dev/null and b/assets/ai/pipeline-5.png differ diff --git a/assets/ai/pipeline-6.png b/assets/ai/pipeline-6.png new file mode 100644 index 0000000..95e0962 Binary files /dev/null and b/assets/ai/pipeline-6.png differ diff --git a/assets/ai/price-prediction-1.png b/assets/ai/price-prediction-1.png new file mode 100644 index 0000000..5a3e996 Binary files /dev/null and b/assets/ai/price-prediction-1.png differ diff --git a/assets/ai/price-prediction-2.png b/assets/ai/price-prediction-2.png new file mode 100644 index 0000000..20db671 Binary files /dev/null and b/assets/ai/price-prediction-2.png differ diff --git a/assets/system-workflow.png b/assets/system-workflow.png new file mode 100644 index 0000000..1af1c97 Binary files /dev/null and b/assets/system-workflow.png differ diff --git a/document.pdf b/document.pdf index 0a36428..3b44614 100644 Binary 
files a/document.pdf and b/document.pdf differ diff --git a/document.tex b/document.tex index 714e03d..875bf9d 100644 --- a/document.tex +++ b/document.tex @@ -38,6 +38,7 @@ \include{Chapters/Chapter_02} \include{Chapters/Chapter_03} \include{Chapters/Chapter_04} +\include{Chapters/Chapter_05} \renewcommand*{\bibname}{References} \bibliographystyle{IEEEtran}