mirror of
https://github.com/soconnor0919/honors-thesis.git
synced 2026-05-08 15:18:54 -04:00
Compare commits
4 Commits
5017133cfb
...
main
| Author | SHA1 | Date | |
|---|---|---|---|
| 5d8ef0ce76 | |||
| 51009cd1ce | |||
| 28c852a867 | |||
| 1404945756 |
@@ -12,6 +12,7 @@
|
|||||||
*.synctex.gz
|
*.synctex.gz
|
||||||
*.dvi
|
*.dvi
|
||||||
*.pdf
|
*.pdf
|
||||||
|
!pdfs/**
|
||||||
!thesis/pdfs/**
|
!thesis/pdfs/**
|
||||||
|
|
||||||
# Build directory
|
# Build directory
|
||||||
|
|||||||
+25
-11
@@ -36,6 +36,7 @@
|
|||||||
\setlength{\parskip}{0.2in}
|
\setlength{\parskip}{0.2in}
|
||||||
\newcommand{\advisor}[1]{\newcommand{\advisorname}{#1}}
|
\newcommand{\advisor}[1]{\newcommand{\advisorname}{#1}}
|
||||||
\newcommand{\advisorb}[1]{\newcommand{\advisornameb}{#1}}
|
\newcommand{\advisorb}[1]{\newcommand{\advisornameb}{#1}}
|
||||||
|
\newcommand{\honorscouncilrep}[1]{\newcommand{\honorscouncilrepname}{#1}}
|
||||||
\newcommand{\chair}[1]{\newcommand{\chairname}{#1}}
|
\newcommand{\chair}[1]{\newcommand{\chairname}{#1}}
|
||||||
\newcommand{\department}[1]{\newcommand{\departmentname}{#1}}
|
\newcommand{\department}[1]{\newcommand{\departmentname}{#1}}
|
||||||
\newcommand{\butitle}[1]{\newcommand{\titletext}{#1}}
|
\newcommand{\butitle}[1]{\newcommand{\titletext}{#1}}
|
||||||
@@ -114,33 +115,46 @@ in Partial Fulfillment of the Requirements for the Degree of\\
|
|||||||
\today
|
\today
|
||||||
\end{center}
|
\end{center}
|
||||||
|
|
||||||
|
\vspace{0.03in}
|
||||||
|
{\small
|
||||||
\ifthenelse{\boolean{@twoadv}}{
|
\ifthenelse{\boolean{@twoadv}}{
|
||||||
\vspace{0.25in}
|
|
||||||
|
|
||||||
Approved: \hspace{0.2in}\underline{\hspace{2.5in}}\\
|
Approved: \hspace{0.2in}\underline{\hspace{2.5in}}\\
|
||||||
\mbox{\hspace{1.3in}}\advisorname\\
|
\mbox{\hspace{1.3in}}\advisorname\\
|
||||||
\mbox{\hspace{1.3in}}Thesis Advisor
|
\mbox{\hspace{1.3in}}Thesis Advisor
|
||||||
\vspace{0.25in}
|
\vspace{0.03in}
|
||||||
|
|
||||||
\mbox{\hspace{1.0in}}\underline{\hspace{2.5in}}\\
|
\mbox{\hspace{1.0in}}\underline{\hspace{2.5in}}\\
|
||||||
\mbox{\hspace{1.3in}}\advisornameb\\
|
\mbox{\hspace{1.3in}}\advisornameb\\
|
||||||
\mbox{\hspace{1.3in}}Second Reader
|
\mbox{\hspace{1.3in}}Reader
|
||||||
\vspace{0.25in}
|
\vspace{0.03in}
|
||||||
|
|
||||||
\mbox{\hspace{1.0in}}\underline{\hspace{2.5in}}\\
|
\mbox{\hspace{1.0in}}\underline{\hspace{2.5in}}\\
|
||||||
\mbox{\hspace{1.3in}}\chairname\\
|
\mbox{\hspace{1.3in}}\chairname\\
|
||||||
\mbox{\hspace{1.3in}}Chair of the Department of \departmentname}
|
\mbox{\hspace{1.3in}}Chair of the Department of \departmentname
|
||||||
{\vspace{1.0in}
|
\vspace{0.03in}
|
||||||
|
|
||||||
Approved: \hspace{0.2in}\underline{\hspace{2.5in}}\\
|
\mbox{\hspace{1.0in}}\underline{\hspace{2.5in}}\\
|
||||||
|
\mbox{\hspace{1.3in}}\honorscouncilrepname\\
|
||||||
|
\mbox{\hspace{1.3in}}Honors Council Representative}
|
||||||
|
{Approved: \hspace{0.2in}\underline{\hspace{2.5in}}\\
|
||||||
\mbox{\hspace{1.3in}}\advisorname \\
|
\mbox{\hspace{1.3in}}\advisorname \\
|
||||||
\mbox{\hspace{1.3in}}Thesis Advisor
|
\mbox{\hspace{1.3in}}Thesis Advisor
|
||||||
\vspace{0.5in}
|
\vspace{0.03in}
|
||||||
|
|
||||||
|
\mbox{\hspace{1.0in}}\underline{\hspace{2.5in}}\\
|
||||||
|
\mbox{\hspace{1.3in}}\advisornameb\\
|
||||||
|
\mbox{\hspace{1.3in}}Reader
|
||||||
|
\vspace{0.03in}
|
||||||
|
|
||||||
\mbox{\hspace{1.0in}}\underline{\hspace{2.5in}}\\
|
\mbox{\hspace{1.0in}}\underline{\hspace{2.5in}}\\
|
||||||
\mbox{\hspace{1.3in}}\chairname\\
|
\mbox{\hspace{1.3in}}\chairname\\
|
||||||
\mbox{\hspace{1.3in}}Chair of the Department of \departmentname}
|
\mbox{\hspace{1.3in}}Chair of the Department of \departmentname
|
||||||
|
\vspace{0.03in}
|
||||||
|
|
||||||
|
\mbox{\hspace{1.0in}}\underline{\hspace{2.5in}}\\
|
||||||
|
\mbox{\hspace{1.3in}}\honorscouncilrepname\\
|
||||||
|
\mbox{\hspace{1.3in}}Honors Council Representative}
|
||||||
|
}
|
||||||
\end{singlespace}
|
\end{singlespace}
|
||||||
\vfill
|
\vfill
|
||||||
\end{titlepage}}
|
\end{titlepage}}
|
||||||
|
|||||||
@@ -29,4 +29,4 @@ The central question this thesis addresses is: \emph{can the right software arch
|
|||||||
|
|
||||||
\section{Chapter Summary}
|
\section{Chapter Summary}
|
||||||
|
|
||||||
This chapter has established the context and objectives for this thesis. I identified two critical challenges facing WoZ-based HRI research. The first is the \emph{Accessibility Problem}: high technical barriers limit participation by non-programmers. The second is the \emph{Reproducibility Problem}: fragmented tooling makes results difficult to replicate across labs. I proposed a web-based framework approach that addresses these challenges through intuitive design interfaces, enforced experimental protocols, and platform-agnostic architecture. Finally, I posed the central research question and described how this thesis addresses it through formal design, a reference implementation, and a pilot validation study. The next chapters establish the technical and methodological foundations.
|
This chapter has established the context and objectives for this thesis. I identified two critical challenges facing WoZ-based HRI research. The first is the \emph{Accessibility Problem}: high technical barriers limit participation by non-programmers. The second is the \emph{Reproducibility Problem}: fragmented tooling makes results difficult to replicate across labs. I proposed a web-based framework approach that addresses these challenges through intuitive design interfaces, enforced experimental protocols, and platform-agnostic architecture. Finally, I posed the central research question and described how this thesis addresses it through formal design, a reference implementation, and a pilot validation study.
|
||||||
|
|||||||
@@ -17,9 +17,55 @@ Choregraphe \cite{Pot2009}, developed by Aldebaran Robotics for the NAO and Pepp
|
|||||||
|
|
||||||
Recent years have seen renewed interest in comprehensive WoZ frameworks. Gibert et al. \cite{Gibert2013} developed the Super Wizard of Oz (SWoOZ) platform. This system integrates facial tracking, gesture recognition, and real-time control capabilities to enable naturalistic human-robot interaction studies. Virtual and augmented reality have also emerged as complementary approaches to WoZ. Helgert et al. \cite{Helgert2024} demonstrated how VR-based WoZ environments can simplify experimental setup while providing researchers with precise control over environmental conditions and high-fidelity data collection.
|
Recent years have seen renewed interest in comprehensive WoZ frameworks. Gibert et al. \cite{Gibert2013} developed the Super Wizard of Oz (SWoOZ) platform. This system integrates facial tracking, gesture recognition, and real-time control capabilities to enable naturalistic human-robot interaction studies. Virtual and augmented reality have also emerged as complementary approaches to WoZ. Helgert et al. \cite{Helgert2024} demonstrated how VR-based WoZ environments can simplify experimental setup while providing researchers with precise control over environmental conditions and high-fidelity data collection.
|
||||||
|
|
||||||
This expanding landscape reveals a persistent fundamental gap in the design space of WoZ tools. Flexible, general-purpose platforms like Polonius and OpenWoZ offer powerful capabilities but present high technical barriers. Accessible, user-friendly tools like WoZ4U and Choregraphe lower those barriers but sacrifice cross-platform compatibility and longevity. Newer approaches such as VR-based frameworks attempt to bridge this gap, yet no existing tool successfully combines accessibility, flexibility, deployment portability, and built-in methodological rigor. By methodological rigor, I refer to systematic features that guide experimenters toward best practices: consistently following experimental protocols, maintaining comprehensive logging, and producing reproducible experimental designs.
|
This expanding landscape reveals a persistent fundamental gap in the design space of WoZ tools. Flexible, general-purpose platforms like Polonius and OpenWoZ offer powerful capabilities but present high technical barriers. Accessible, user-friendly tools like WoZ4U and Choregraphe lower those barriers but sacrifice cross-platform compatibility and longevity. Newer approaches such as VR-based frameworks attempt to bridge this gap, yet no existing tool successfully combines accessibility, flexibility, deployment portability, and built-in methodological rigor.
|
||||||
|
|
||||||
Moreover, few platforms directly address the methodological concerns raised by systematic reviews of WoZ research. Riek's influential analysis \cite{Riek2012} of 54 HRI studies uncovered widespread inconsistencies in how wizard behaviors were controlled and reported. Very few studies documented standardized wizard training procedures or measured wizard error rates, raising questions about internal validity---that is, whether observed outcomes can be attributed to the intended experimental manipulation rather than to uncontrolled variation in wizard behavior. The tools themselves often exacerbate this problem: poorly designed interfaces increase cognitive load on wizards, leading to timing errors and behavioral inconsistencies that can confound experimental results. Recent work by Strazdas et al. \cite{Strazdas2020} further demonstrates the importance of careful interface design in WoZ systems, showing that intuitive wizard interfaces directly improve both the quality of robot behavior and the reliability of collected data.
|
\begin{figure}[htbp]
|
||||||
|
\centering
|
||||||
|
\begin{tikzpicture}[
|
||||||
|
scale=1.0,
|
||||||
|
quadbox/.style={rectangle, draw=white, ultra thick, minimum width=5.5cm, minimum height=4.5cm, align=center},
|
||||||
|
title/.style={font=\small\bfseries, align=center},
|
||||||
|
desc/.style={font=\footnotesize, text=gray!60, align=center},
|
||||||
|
axislabel/.style={font=\small\bfseries, align=center}
|
||||||
|
]
|
||||||
|
|
||||||
|
% Quadrant Backgrounds
|
||||||
|
\fill[gray!20] (0, 4.5) rectangle (5.5, 9.0); % Top Left (HRIStudio)
|
||||||
|
\fill[gray!15] (5.5, 4.5) rectangle (11.0, 9.0); % Top Right (Polonius)
|
||||||
|
\fill[gray!10] (0, 0) rectangle (5.5, 4.5); % Bottom Left (WoZ4U)
|
||||||
|
\fill[gray!5] (5.5, 0) rectangle (11.0, 4.5); % Bottom Right (Choregraphe)
|
||||||
|
|
||||||
|
% Quadrant Lines
|
||||||
|
\draw[white, ultra thick] (5.5, 0) -- (5.5, 9.0);
|
||||||
|
\draw[white, ultra thick] (0, 4.5) -- (11.0, 4.5);
|
||||||
|
|
||||||
|
% Axis Labels
|
||||||
|
\node[axislabel, above] at (2.75, 9.2) {Low technical barrier};
|
||||||
|
\node[axislabel, above] at (8.25, 9.2) {High technical barrier};
|
||||||
|
\node[axislabel, left] at (-0.2, 6.75) {More rigorous};
|
||||||
|
\node[axislabel, left] at (-0.2, 2.25) {Less rigorous};
|
||||||
|
|
||||||
|
% Top Left: The Gap
|
||||||
|
\node[axislabel] at (2.75, 6.75) {\Huge ?};
|
||||||
|
|
||||||
|
% Top Right: Polonius, OpenWoZ, SWoOZ
|
||||||
|
\node[title] at (8.25, 7.4) {Polonius, OpenWoZ\\SWoOZ, VR Environments};
|
||||||
|
\node[desc] at (8.25, 6.0) {Flexible and powerful,\\but requires significant\\programming expertise};
|
||||||
|
|
||||||
|
% Bottom Left: WoZ4U
|
||||||
|
\node[title] at (2.75, 2.7) {WoZ4U};
|
||||||
|
\node[desc] at (2.75, 1.7) {Accessible, but\\platform-specific\\No methodological rigor};
|
||||||
|
|
||||||
|
% Bottom Right: Choregraphe
|
||||||
|
\node[title] at (8.25, 2.7) {Choregraphe};
|
||||||
|
\node[desc] at (8.25, 1.7) {Requires specialized\\training\\No methodological rigor};
|
||||||
|
|
||||||
|
\end{tikzpicture}
|
||||||
|
\caption{WoZ tool design space by technical barrier and methodological rigor.}
|
||||||
|
\label{fig:tool-matrix}
|
||||||
|
\end{figure}
|
||||||
|
|
||||||
|
The missing quadrant in Figure~\ref{fig:tool-matrix} matters because methodological rigor requires systematic features that guide experimenters toward best practices: consistently following experimental protocols, maintaining comprehensive logging, and producing reproducible experimental designs. Few platforms directly address the methodological concerns raised by systematic reviews of WoZ research. Riek's influential analysis \cite{Riek2012} of 54 HRI studies uncovered widespread inconsistencies in how wizard behaviors were controlled and reported. Very few studies documented standardized wizard training procedures or measured wizard error rates, raising questions about internal validity---that is, whether observed outcomes can be attributed to the intended experimental manipulation rather than to uncontrolled variation in wizard behavior. The tools themselves often exacerbate this problem: poorly designed interfaces increase cognitive load on wizards, leading to timing errors and behavioral inconsistencies that can confound experimental results. Recent work by Strazdas et al. \cite{Strazdas2020} further demonstrates the importance of careful interface design in WoZ systems, showing that intuitive wizard interfaces directly improve both the quality of robot behavior and the reliability of collected data.
|
||||||
|
|
||||||
\section{Requirements for Modern WoZ Infrastructure}
|
\section{Requirements for Modern WoZ Infrastructure}
|
||||||
|
|
||||||
|
|||||||
@@ -5,9 +5,10 @@ Having established the landscape of existing WoZ platforms and their limitations
|
|||||||
|
|
||||||
\section{Sources of Variability}
|
\section{Sources of Variability}
|
||||||
|
|
||||||
\emph{The Reproducibility Problem}, as introduced in Chapter~\ref{ch:intro}, encompasses two related challenges. The first concerns \emph{execution consistency}: whether a wizard reliably follows the same experimental script across multiple trials with different participants, producing comparable robot behavior in each. The second concerns \emph{cross-platform reproducibility}: whether the same experiment can be transferred to a different robot platform with minimal change to the implementing program. Both stem from gaps in current WoZ infrastructure and are examined in this chapter. A third interpretation of the term — independent replication of a published study by researchers at other institutions — is distinct from both and is not what this thesis evaluates. It is also worth noting that execution consistency, as defined here, corresponds to what the measurement literature sometimes calls \emph{repeatability}: the degree to which the same procedure produces consistent results when repeated across multiple trials of the same study.
|
\emph{The Reproducibility Problem}, as introduced in Chapter~\ref{ch:intro}, encompasses two related challenges. The first concerns \emph{execution consistency}: whether a wizard reliably follows the same experimental script across multiple trials with different participants, producing comparable robot behavior in each. The second concerns \emph{cross-platform reproducibility}: whether the same experiment can be transferred to a different robot platform with minimal change to the implementing program. Both stem from gaps in current WoZ infrastructure and are examined in this chapter. It is important to note that the term reproducibility may also refer to \emph{allowing independent replications of published studies}; this is not what this thesis evaluates. Execution consistency, as defined here, corresponds to what the measurement literature sometimes calls \emph{repeatability}: the degree to which the same procedure produces consistent results when repeated across multiple trials of the same study.
|
||||||
|
|
||||||
In WoZ-based HRI studies, multiple sources of variability can compromise execution consistency. The wizard is simultaneously the strength and weakness of the WoZ paradigm. While human control enables sophisticated, adaptive interactions, it also introduces inconsistency. Consider a wizard conducting multiple trials of the same experiment with different participants. Even with a detailed script, the wizard may vary in timing, with delays between a participant's action and the robot's response fluctuating based on the wizard's attention, fatigue, or interpretation of when to act. When a script allows for choices, different wizards may make different selections, or the same wizard may act differently across trials. Furthermore, a wizard may accidentally skip steps, trigger actions in the wrong order, or misinterpret experimental protocols.
|
In WoZ-based HRI studies, multiple sources of variability can compromise execution consistency. The wizard is simultaneously the strength and weakness of the WoZ paradigm. While human control enables sophisticated, adaptive interactions, it also introduces inconsistency. Consider a wizard conducting multiple trials of the same experiment with different participants.
|
||||||
|
Even with a detailed script, the wizard may vary in timing, with the delay between a participant's action and the robot's response fluctuating based on the wizard's attention, fatigue, or interpretation of when to act. When a script allows for choices, different wizards may make different selections, or the same wizard may act differently across trials. Furthermore, a wizard may accidentally skip steps, trigger actions in the wrong order, or misinterpret experimental protocols.
|
||||||
|
|
||||||
Riek's systematic review \cite{Riek2012} found that very few published studies reported measuring wizard error rates or providing standardized wizard training. Without such measures, it becomes impossible to determine whether experimental results reflect the intended interaction design or inadvertent variations in wizard behavior.
|
Riek's systematic review \cite{Riek2012} found that very few published studies reported measuring wizard error rates or providing standardized wizard training. Without such measures, it becomes impossible to determine whether experimental results reflect the intended interaction design or inadvertent variations in wizard behavior.
|
||||||
|
|
||||||
|
|||||||
@@ -13,7 +13,33 @@ Figure~\ref{fig:experiment-hierarchy} shows this hierarchical structure. Reading
|
|||||||
|
|
||||||
Figure~\ref{fig:trial-instantiation} illustrates how a protocol definition relates to its instantiation. The left column holds the protocol, defined before the study begins; the right column shows how the abstraction defined as a protocol is instantiated as independent trials. A dashed line marks the protocol/trial boundary: everything to its left was authored by the researcher before any participant arrived; everything to its right was generated during a live session. The \textit{instantiates} arrows from the experiment node fan out to each trial record, making the relationship explicit. This separation is central to reproducibility: the same experiment specification generates a distinct, timestamped record per participant, so researchers can compare across participants without conflating what was designed with what was executed.
|
Figure~\ref{fig:trial-instantiation} illustrates how a protocol definition relates to its instantiation. The left column holds the protocol, defined before the study begins; the right column shows how the abstraction defined as a protocol is instantiated as independent trials. A dashed line marks the protocol/trial boundary: everything to its left was authored by the researcher before any participant arrived; everything to its right was generated during a live session. The \textit{instantiates} arrows from the experiment node fan out to each trial record, making the relationship explicit. This separation is central to reproducibility: the same experiment specification generates a distinct, timestamped record per participant, so researchers can compare across participants without conflating what was designed with what was executed.
|
||||||
|
|
||||||
To illustrate the hierarchy with a concrete example, consider an interactive storytelling study with the research question: \emph{Does how the robot tells a story affect how a human will remember the story?} The two experiments use different robots: the NAO6, a humanoid robot with expressive gestures and a human-like form, and the TurtleBot, a wheeled mobile robot that is visibly machine-like with no social movement cues. The narrative task remains the same across both experiments; only how the robot delivers it changes.
|
To illustrate the hierarchy with a concrete example, consider an interactive storytelling study with the research question: \emph{Does how the robot tells a story affect how a human will remember the story?} The experiment might use different robots, for instance Pepper, NAO6, and TurtleBot. Figure~\ref{fig:robot-morphologies} shows the morphology of these three different robots: Pepper and NAO6 are humanoid social robots with expressive gestures and human-like forms, while TurtleBot is a wheeled mobile robot with a visibly machine-like form and no social movement cues. In the example below, the narrative task remains the same across two robot-specific experiments; only how the robot delivers it changes.
|
||||||
|
|
||||||
|
\begin{figure}[htbp]
|
||||||
|
\centering
|
||||||
|
\begin{subfigure}[b]{0.3\textwidth}
|
||||||
|
\centering
|
||||||
|
\includegraphics[width=\textwidth]{images/nao6.jpg}
|
||||||
|
\caption{NAO6 (Humanoid)}
|
||||||
|
\label{fig:robot-nao}
|
||||||
|
\end{subfigure}
|
||||||
|
\hfill
|
||||||
|
\begin{subfigure}[b]{0.3\textwidth}
|
||||||
|
\centering
|
||||||
|
\includegraphics[width=\textwidth]{images/pepper.png}
|
||||||
|
\caption{Pepper (Social)}
|
||||||
|
\label{fig:robot-pepper}
|
||||||
|
\end{subfigure}
|
||||||
|
\hfill
|
||||||
|
\begin{subfigure}[b]{0.3\textwidth}
|
||||||
|
\centering
|
||||||
|
\includegraphics[width=\textwidth]{images/turtlebot.png}
|
||||||
|
\caption{TurtleBot (Mechanical)}
|
||||||
|
\label{fig:robot-turtlebot}
|
||||||
|
\end{subfigure}
|
||||||
|
\caption{Three robot morphologies supported by the HRIStudio architecture.}
|
||||||
|
\label{fig:robot-morphologies}
|
||||||
|
\end{figure}
|
||||||
|
|
||||||
Figure~\ref{fig:example-hierarchy} maps the study presented above onto the hierarchical elements defined in Figure~\ref{fig:experiment-hierarchy}. The study branches into two experiments (TurtleBot with only voice, NAO6 with added gestures), each experiment uses the same sequence of ordered steps (Intro, Story Telling, Recall Test), and each step defines the specific actions the robot will perform. The figure expands only the Story Telling step to keep the diagram readable, but Intro and Recall Test follow the same structure.
|
Figure~\ref{fig:example-hierarchy} maps the study presented above onto the hierarchical elements defined in Figure~\ref{fig:experiment-hierarchy}. The study branches into two experiments (TurtleBot with only voice, NAO6 with added gestures), each experiment uses the same sequence of ordered steps (Intro, Story Telling, Recall Test), and each step defines the specific actions the robot will perform. The figure expands only the Story Telling step to keep the diagram readable, but Intro and Recall Test follow the same structure.
|
||||||
|
|
||||||
@@ -267,7 +293,7 @@ This separation of concerns provides two concrete benefits. First, each layer ca
|
|||||||
\centering
|
\centering
|
||||||
\begin{tikzpicture}[
|
\begin{tikzpicture}[
|
||||||
layer/.style={rectangle, draw=black, thick, fill, minimum width=6.5cm, minimum height=1cm, align=center, text width=6.2cm},
|
layer/.style={rectangle, draw=black, thick, fill, minimum width=6.5cm, minimum height=1cm, align=center, text width=6.2cm},
|
||||||
arrow/.style={->, thick, line width=1.5pt}]
|
arrow/.style={-, thick, line width=1.5pt}]
|
||||||
|
|
||||||
% Layer 1: UI
|
% Layer 1: UI
|
||||||
\node[layer, fill=gray!15] (ui) at (0, 3.5) {
|
\node[layer, fill=gray!15] (ui) at (0, 3.5) {
|
||||||
@@ -288,8 +314,8 @@ This separation of concerns provides two concrete benefits. First, each layer ca
|
|||||||
};
|
};
|
||||||
|
|
||||||
% Arrows (bidirectional)
|
% Arrows (bidirectional)
|
||||||
\draw[<->, thick, line width=1.5pt] (ui.south) -- (logic.north);
|
\draw[-, thick, line width=1.5pt] (ui.south) -- (logic.north);
|
||||||
\draw[<->, thick, line width=1.5pt] (logic.south) -- (data.north);
|
\draw[-, thick, line width=1.5pt] (logic.south) -- (data.north);
|
||||||
|
|
||||||
\end{tikzpicture}
|
\end{tikzpicture}
|
||||||
\caption{Three-layer architecture separates user interface, application logic, and data/robot control.}
|
\caption{Three-layer architecture separates user interface, application logic, and data/robot control.}
|
||||||
@@ -300,7 +326,7 @@ This separation of concerns provides two concrete benefits. First, each layer ca
|
|||||||
|
|
||||||
During the design phase, researchers create experiment specifications that are stored in the system database. During a trial, the system manages bidirectional communication between the wizard's interface and the robot control layer. All actions, sensor data, and events are streamed to a data logging service that stores complete records. After the trial, researchers can inspect these records through the Analysis interface.
|
During the design phase, researchers create experiment specifications that are stored in the system database. During a trial, the system manages bidirectional communication between the wizard's interface and the robot control layer. All actions, sensor data, and events are streamed to a data logging service that stores complete records. After the trial, researchers can inspect these records through the Analysis interface.
|
||||||
|
|
||||||
The flow of data during a trial proceeds through six distinct phases, as shown in Figure~\ref{fig:trial-dataflow}:
|
The flow of data during a trial proceeds through six distinct phases as discussed below; these phases are summarized in Figure~\ref{fig:trial-dataflow}:
|
||||||
|
|
||||||
\begin{enumerate}
|
\begin{enumerate}
|
||||||
\item A researcher creates an experiment protocol using the Design interface.
|
\item A researcher creates an experiment protocol using the Design interface.
|
||||||
@@ -335,7 +361,7 @@ This design creates automatically a comprehensive documentation of every trial,
|
|||||||
\draw[arrow] (s5.south) -- (s6.north);
|
\draw[arrow] (s5.south) -- (s6.north);
|
||||||
|
|
||||||
\end{tikzpicture}
|
\end{tikzpicture}
|
||||||
\caption{Trial data flow: from protocol design through execution and recording, to analysis and playback.}
|
\caption{Six-phase trial data flow.}
|
||||||
\label{fig:trial-dataflow}
|
\label{fig:trial-dataflow}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
|
|||||||
@@ -7,11 +7,81 @@ HRIStudio is a complete, operational platform that realizes the design principle
|
|||||||
|
|
||||||
HRIStudio follows the model of a web application. Users access it through a standard browser without installing specialized software, and the entire study team, including researchers, wizards, and observers, connect to the same shared system. This eliminates the need for a local installation and ensures the platform works identically on any operating system, directly addressing the low-technical-barrier requirement (R2, from Chapter~\ref{ch:background}). It also enables easy collaboration (R6): multiple team members can access experiment data and observe trials simultaneously from different machines without any additional configuration.
|
HRIStudio follows the model of a web application. Users access it through a standard browser without installing specialized software, and the entire study team, including researchers, wizards, and observers, connects to the same shared system. This eliminates the need for a local installation and ensures the platform works identically on any operating system, directly addressing the low-technical-barrier requirement (R2, from Chapter~\ref{ch:background}). It also enables easy collaboration (R6): multiple team members can access experiment data and observe trials simultaneously from different machines without any additional configuration.
|
||||||
|
|
||||||
I organized the system into three layers: User Interface, Application Logic, and Data \& Robot Control. This layered structure is presented in Chapter~\ref{ch:design} and shown in Figure~\ref{fig:three-tier}. In practice, the User Interface layer runs in each researcher's browser (the client), while the Application Logic and Data \& Robot Control layers run on a shared application server. It is essential that this server and the robot control hardware run on the same local network. This keeps communication latency low during trials: a noticeable delay between the wizard's input and the robot's response would break the interaction.
|
I organized the system into three layers: User Interface, Application Logic, and Data \& Robot Control. This layered structure is presented in Chapter~\ref{ch:design} and shown in Figure~\ref{fig:three-tier}. In practice, the User Interface layer runs in each researcher's browser (the client), while the Application Logic and Data \& Robot Control layers run on a shared application server.
|
||||||
|
|
||||||
|
While the system can run entirely on a single machine for local testing, this architecture allows the components to be distributed across different systems. The application server can be hosted centrally or even in a remote data center, enabling observers to connect to a live trial from any location with internet access. In such a configuration, it is essential that the robot control hardware and the client computer running the wizard's Execution interface stay on the same local network as the robot. This ensures that the WebSocket-based communication between the wizard and the robot bridge maintains low latency, as a noticeable delay between the wizard's input and the robot's response would break the interaction.
|
||||||
|
|
||||||
|
This flexibility of deployment also addresses the varying data security and compliance needs of different research institutions. A lab may choose to host HRIStudio on a public-facing server to prioritize collaborative ease and accessibility for remote team members. Alternatively, a lab with strict data privacy requirements or institutional review board (IRB) constraints can deploy the entire stack on a private, air-gapped network. Because the platform is self-contained and does not rely on external cloud services for its core execution logic, researchers have full control over where their experimental data is stored and who can access it.
|
||||||
|
|
||||||
I implemented all three layers in the same language: TypeScript~\cite{TypeScript2014}, a statically-typed superset of JavaScript. The single-language decision keeps the type system consistent across the full stack. When the structure of experiment data changes, the type checker surfaces inconsistencies across the entire codebase at compile time rather than allowing them to appear as runtime failures during a trial.
|
I implemented all three layers in the same language: TypeScript~\cite{TypeScript2014}, a statically-typed superset of JavaScript. The single-language decision keeps the type system consistent across the full stack. When the structure of experiment data changes, the type checker surfaces inconsistencies across the entire codebase at compile time rather than allowing them to appear as runtime failures during a trial.
|
||||||
|
|
||||||
HRIStudio is released as open-source software under the MIT License, with the application hosted at a public repository~\cite{HRIStudioRepo} and the companion robot plugin repository hosted separately~\cite{RobotPluginsRepo}. Both are available for inspection, extension, and deployment by other research groups.
|
HRIStudio is released as open-source software under the MIT License, with the application hosted at a public repository~\cite{HRIStudioRepo}. The companion robot plugin repository~\cite{RobotPluginsRepo} is maintained as a git submodule and is updated whenever HRIStudio requires schema or protocol changes. Both repositories are available for inspection, extension, and deployment by other research groups.
|
||||||
|
|
||||||
|
HRIStudio is implemented as a set of containerized services that work together to provide the platform's functionality. This modular architecture ensures that each component can be scaled or replaced independently as requirements change.
|
||||||
|
|
||||||
|
\begin{figure}[htbp]
|
||||||
|
\centering
|
||||||
|
\begin{tikzpicture}[
|
||||||
|
node distance=0.8cm and 1.8cm,
|
||||||
|
servicebox/.style={rectangle, draw=black, thick, fill=gray!15, align=center, font=\small, inner sep=5pt, minimum width=2.2cm},
|
||||||
|
containerbox/.style={rectangle, draw=black, thick, dashed, fill=gray!5, align=center, font=\small\bfseries, inner sep=12pt},
|
||||||
|
wsbox/.style={rectangle, draw=black, ultra thick, fill=white, align=center, font=\scriptsize\bfseries, inner sep=3pt},
|
||||||
|
arrow/.style={->, thick, >=stealth},
|
||||||
|
darrow/.style={<->, thick, >=stealth, dashed},
|
||||||
|
labelstyle/.style={font=\scriptsize\itshape, align=center}
|
||||||
|
]
|
||||||
|
|
||||||
|
% HRIStudio System Container Services
|
||||||
|
\node[servicebox] (nextjs) {Next.js\\Server};
|
||||||
|
\node[servicebox, below=of nextjs] (postgres) {PostgreSQL\\Database};
|
||||||
|
\node[servicebox, below=of postgres] (minio) {MinIO\\Object Storage};
|
||||||
|
\draw[arrow] (nextjs) -- (postgres);
|
||||||
|
\draw[arrow] (nextjs) -- (minio);
|
||||||
|
|
||||||
|
% HRIStudio Container Boundary
|
||||||
|
\begin{scope}[on background layer]
|
||||||
|
\node[containerbox, fit=(nextjs) (postgres) (minio), inner sep=15pt] (hri_cont) {};
|
||||||
|
\node[anchor=south, font=\small\bfseries, yshift=2pt] at (hri_cont.north) {HRIStudio System};
|
||||||
|
\end{scope}
|
||||||
|
|
||||||
|
% NAO6 Integration Bridge Container Services
|
||||||
|
\node[servicebox, right=4.5cm of nextjs] (driver) {NAOqi\\Driver};
|
||||||
|
\node[servicebox, below=of driver] (ros) {ROS 2\\Core};
|
||||||
|
\node[servicebox, below=of ros] (adapter) {HRIStudio\\Adapter};
|
||||||
|
\draw[darrow] (driver) -- (ros);
|
||||||
|
\draw[darrow] (ros) -- (adapter);
|
||||||
|
|
||||||
|
% Bridge Container Boundary
|
||||||
|
\begin{scope}[on background layer]
|
||||||
|
\node[containerbox, fit=(driver) (ros) (adapter), inner sep=15pt] (bridge_cont) {};
|
||||||
|
\node[anchor=south, font=\small\bfseries, yshift=2pt] at (bridge_cont.north) {NAO6 Bridge};
|
||||||
|
\end{scope}
|
||||||
|
|
||||||
|
% Client/Wizard
|
||||||
|
\node[servicebox] (client) at ($(hri_cont.north)!0.5!(bridge_cont.north) + (0, 2.2)$) {Wizard Browser};
|
||||||
|
|
||||||
|
% WebSocket Connections
|
||||||
|
\node[wsbox] (sys_ws) at ($(client.south)!0.5!(hri_cont.north)$) {System WebSocket};
|
||||||
|
\node[wsbox] (robot_ws) at ($(client.south)!0.5!(bridge_cont.north)$) {Robot WebSocket};
|
||||||
|
|
||||||
|
\draw[darrow] (client.south) -- (sys_ws.north);
|
||||||
|
\draw[darrow] (sys_ws.south) -- (hri_cont.north);
|
||||||
|
|
||||||
|
\draw[darrow] (client.south) -- (robot_ws.north);
|
||||||
|
\draw[darrow] (robot_ws.south) -- (bridge_cont.north);
|
||||||
|
|
||||||
|
% Hardware
|
||||||
|
\node[servicebox, right=1.5cm of bridge_cont] (robot) {NAO6\\Robot};
|
||||||
|
\draw[arrow] (bridge_cont.east) -- node[above, font=\scriptsize, align=center] {NAOqi\\API} (robot.west);
|
||||||
|
|
||||||
|
\end{tikzpicture}
|
||||||
|
\caption{Containerized HRIStudio and NAO6 integration architecture.}
|
||||||
|
\label{fig:system-architecture}
|
||||||
|
\end{figure}
|
||||||
|
|
||||||
|
The HRIStudio system consists of three primary services: a Next.js application server that handles the user interface and business logic, a PostgreSQL database for persistent storage of experiment and trial data, and a MinIO object storage service for managing large media files like video and audio recordings. For robot integration, the \texttt{nao6-hristudio-integration} bridge also employs a containerized structure consisting of the NAOqi driver, a ROS 2 core for message routing, and a specialized adapter that communicates with HRIStudio.
|
||||||
|
|
||||||
|
During a live trial, the wizard's browser establishes two independent WebSocket connections. The System WebSocket connects to the HRIStudio server to manage trial state, protocol progression, and logging. The Robot WebSocket connects directly to the integration bridge to provide low-latency control of the robot platform. This split-connection model ensures that system-level management does not introduce latency into the robot's physical responses.
|
||||||
|
|
||||||
\subsection{Working with AI Coding Assistants}
|
\subsection{Working with AI Coding Assistants}
|
||||||
\label{sec:ai-ws}
|
\label{sec:ai-ws}
|
||||||
@@ -126,7 +196,7 @@ Figure~\ref{fig:execution-view} shows the Execution interface as it appears to a
|
|||||||
|
|
||||||
\section{Robot Integration}
|
\section{Robot Integration}
|
||||||
|
|
||||||
A plugin file describes each robot platform, listing the actions it supports and specifying how each one maps to a command the robot understands. The execution engine reads this file at startup and uses it whenever it needs to dispatch a command: it looks up the action type, assembles the appropriate message, and sends it to the robot over a bridge process running on the local network. The web server itself has no knowledge of any specific robot; all hardware-specific logic lives in the plugin file.
|
A plugin file describes each robot platform, listing the actions it supports and specifying how each one maps to a command the robot understands. The execution engine reads this file at startup and uses it whenever it needs to dispatch a command: it looks up the action type, assembles the appropriate message, and sends it to the robot over a bridge process running on the local network. For the NAO6 platform, I developed a specialized ROS-based bridge called \texttt{nao6-hristudio-integration}~\cite{NaoIntegrationRepo} that translates HRIStudio commands into the NAOqi API calls required by the robot. The web server itself has no knowledge of any specific robot; all hardware-specific logic lives in the plugin file.
|
||||||
|
|
||||||
The execution engine treats control flow elements such as branches and conditionals, which function as elements of a computer program, the same way as robot actions. These control-flow elements appear as action groups in the experiment and are evaluated during the trial, so researchers can freely mix logical decisions and physical robot behaviors when designing an experiment without any special handling.
|
The execution engine treats control flow elements such as branches and conditionals, which function as elements of a computer program, the same way as robot actions. These control-flow elements appear as action groups in the experiment and are evaluated during the trial, so researchers can freely mix logical decisions and physical robot behaviors when designing an experiment without any special handling.
|
||||||
|
|
||||||
@@ -177,6 +247,10 @@ Figure~\ref{fig:plugin-architecture} illustrates this mapping using NAO6 and Tur
|
|||||||
\label{fig:plugin-architecture}
|
\label{fig:plugin-architecture}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
|
\subsection{Containerized Development Environment}
|
||||||
|
|
||||||
|
To support development and testing for the NAO platform, I also developed \texttt{nao-workspace}, a containerized workspace~\cite{NaoWorkspaceRepo}. This was motivated by the technical constraints of Choregraphe and its related libraries, which supported only x86-64 systems running Ubuntu 22.04. The containerized structure was the only way I could run the proprietary NAO development tools on modern hardware. While I developed this stack primarily to enable technical testing and material preparation during the project, the resulting tooling may be useful to other HRI researchers facing similar platform constraints.
|
||||||
|
|
||||||
\section{Access Control}
|
\section{Access Control}
|
||||||
|
|
||||||
I implemented access control using a role-based access control (RBAC) model with two layers. System-level roles govern what a user can do across the platform (administrator, researcher, wizard, observer), while study-level roles govern what a user can see and do within a specific study (owner, researcher, wizard, observer). The two layers are checked independently, so a user who is a wizard on one study can be an observer on another without any additional configuration. Within a study, the four study-level roles define a clear separation of capabilities: those who own the study, those who design it, those who run it, and those who observe it. This enforces need-to-know access at the study level so that each team member sees or is able to modify only what their role requires. The capabilities and constraints for each role are described below:
|
I implemented access control using a role-based access control (RBAC) model with two layers. System-level roles govern what a user can do across the platform (administrator, researcher, wizard, observer), while study-level roles govern what a user can see and do within a specific study (owner, researcher, wizard, observer). The two layers are checked independently, so a user who is a wizard on one study can be an observer on another without any additional configuration. Within a study, the four study-level roles define a clear separation of capabilities: those who own the study, those who design it, those who run it, and those who observe it. This enforces need-to-know access at the study level so that each team member sees or is able to modify only what their role requires. The capabilities and constraints for each role are described below:
|
||||||
|
|||||||
@@ -13,7 +13,7 @@ I hypothesized that HRIStudio would improve both accessibility and reproducibili
|
|||||||
|
|
||||||
\section{Study Design}
|
\section{Study Design}
|
||||||
|
|
||||||
I used what Bartneck et al.~\cite{Bartneck2024} call a between-subjects design, in which each participant is assigned to only one condition. To ensure that programming experience was balanced across conditions, I stratified assignment by self-reported programming background: each wizard was first classified into one of three strata (\emph{None}, \emph{Moderate}, or \emph{Extensive} programming experience), and then randomly assigned within their stratum to one of the two conditions (HRIStudio or Choregraphe). This produced a design in which each condition contained exactly one wizard at each experience level, allowing the tool effect to be evaluated without confounding from the distribution of programming experience. Both groups received the same task, the same time allocation, and a similar training structure. Measuring each participant in only one condition prevents carryover effects, meaning performance changes caused by prior exposure to another condition rather than by the assigned condition itself.
|
I used what Bartneck et al.~\cite{Bartneck2024} call a \emph{between-subjects design}, in which each participant is assigned to only one condition. To ensure that programming experience was balanced across conditions, I stratified assignment by self-reported programming background: each wizard was first classified as having \emph{None}, \emph{Moderate}, or \emph{Extensive} programming experience, and then randomly assigned within that stratum to HRIStudio or Choregraphe. This produced a design in which each condition contained exactly one wizard at each experience level, reducing the risk that tool effects would be confused with differences in programming experience. Both groups received the same task, the same time allocation, and a similar training structure. Because each wizard used only one tool, the design also avoided carryover effects from prior exposure to the other condition.
|
||||||
|
|
||||||
\section{Participants}
|
\section{Participants}
|
||||||
|
|
||||||
|
|||||||
@@ -5,37 +5,61 @@ This chapter presents the results of the pilot validation study described in Cha
|
|||||||
|
|
||||||
\section{Participant Overview}
|
\section{Participant Overview}
|
||||||
|
|
||||||
Table~\ref{tbl:sessions} summarizes the personas and their assigned conditions. Wizards are identified by code to protect confidentiality. All six participants were Bucknell University professors drawn from Computer Science, Chemical Engineering, Digital Humanities, and Logic and Philosophy of Science. Demographic information (programming background) was collected during recruitment.
|
Table~\ref{tbl:sessions} summarizes the participants and their assigned conditions. Wizards are identified by code to protect confidentiality. All six participants were Bucknell University professors drawn from Computer Science, Chemical Engineering, Digital Humanities, and Logic and Philosophy of Science. Demographic information (programming background) was collected during recruitment.
|
||||||
|
|
||||||
|
|
||||||
\begin{table}[htbp]
|
\begin{table}[htbp]
|
||||||
\centering
|
\centering
|
||||||
\footnotesize
|
\footnotesize
|
||||||
\begin{tabular}{|l|l|l|l|l|l|l|}
|
\begin{tabular}{|l|l|l|l|}
|
||||||
\hline
|
\hline
|
||||||
\textbf{ID} & \textbf{Condition} & \textbf{Background} & \makecell[l]{\textbf{Programming}\\\textbf{Experience}} & \textbf{DFS} & \textbf{ERS} & \textbf{SUS} \\
|
\textbf{ID} & \textbf{Condition} & \textbf{Background} & \makecell[l]{\textbf{Programming}\\\textbf{Experience}} \\
|
||||||
\hline
|
\hline
|
||||||
W-01 & Choregraphe & Digital Humanities & None & 42.5 & 65 & 60 \\
|
W-01 & Choregraphe & Digital Humanities & None \\
|
||||||
\hline
|
\hline
|
||||||
W-02 & HRIStudio & Logic and Philosophy of Science & Moderate & 100 & 95 & 90 \\
|
W-02 & HRIStudio & Logic and Philosophy of Science & Moderate \\
|
||||||
\hline
|
\hline
|
||||||
W-03 & Choregraphe & Computer Science & Extensive & 65 & 60 & 75 \\
|
W-03 & Choregraphe & Computer Science & Extensive \\
|
||||||
\hline
|
\hline
|
||||||
W-04 & Choregraphe & Chemical Engineering & Moderate & 62.5 & 75 & 42.5 \\
|
W-04 & Choregraphe & Chemical Engineering & Moderate \\
|
||||||
\hline
|
\hline
|
||||||
W-05 & HRIStudio & Chemical Engineering & None & 100 & 95 & 70 \\
|
W-05 & HRIStudio & Chemical Engineering & None \\
|
||||||
\hline
|
\hline
|
||||||
W-06 & HRIStudio & Computer Science & Extensive & 100 & 100 & 70 \\
|
W-06 & HRIStudio & Computer Science & Extensive \\
|
||||||
\hline
|
\hline
|
||||||
\end{tabular}
|
\end{tabular}
|
||||||
\caption{Summary of wizard participants, assigned conditions, and scores.}
|
\caption{Summary of wizard participants and assigned conditions.}
|
||||||
\label{tbl:sessions}
|
\label{tbl:sessions}
|
||||||
\end{table}
|
\end{table}
|
||||||
|
|
||||||
This table also presents numerical data representing the study's results, which is discussed next.
|
Table~\ref{tbl:primary-outcomes} presents the primary outcome scores, which are discussed next.
|
||||||
|
|
||||||
\section{Primary Measures}
|
\section{Primary Measures}
|
||||||
|
|
||||||
|
\begin{table}[htbp]
|
||||||
|
\centering
|
||||||
|
\footnotesize
|
||||||
|
\begin{tabular}{|l|l|r|r|r|}
|
||||||
|
\hline
|
||||||
|
\textbf{ID} & \textbf{Condition} & \textbf{DFS} & \textbf{ERS} & \textbf{SUS} \\
|
||||||
|
\hline
|
||||||
|
W-01 & Choregraphe & 42.5 & 65 & 60 \\
|
||||||
|
\hline
|
||||||
|
W-02 & HRIStudio & 100 & 95 & 90 \\
|
||||||
|
\hline
|
||||||
|
W-03 & Choregraphe & 65 & 60 & 75 \\
|
||||||
|
\hline
|
||||||
|
W-04 & Choregraphe & 62.5 & 75 & 42.5 \\
|
||||||
|
\hline
|
||||||
|
W-05 & HRIStudio & 100 & 95 & 70 \\
|
||||||
|
\hline
|
||||||
|
W-06 & HRIStudio & 100 & 100 & 70 \\
|
||||||
|
\hline
|
||||||
|
\end{tabular}
|
||||||
|
\caption{Primary outcome scores by wizard and condition.}
|
||||||
|
\label{tbl:primary-outcomes}
|
||||||
|
\end{table}
|
||||||
|
|
||||||
\subsection{Design Fidelity Score (DFS)}
|
\subsection{Design Fidelity Score (DFS)}
|
||||||
|
|
||||||
The Design Fidelity Score measures how completely and correctly each wizard implemented the written specification of their assigned experiment. Scores range from 0 to 100, with full points awarded only when a component — a rubric criterion representing a required speech action, gesture, or control-flow element — is both present and correct. (For a full description of rubric categories, see Section~\ref{sec:measures}.)
|
The Design Fidelity Score measures how completely and correctly each wizard implemented the written specification of their assigned experiment. Scores range from 0 to 100, with full points awarded only when a component---a rubric criterion representing a required speech action, gesture, or control-flow element---is both present and correct. (For a full description of rubric categories, see Section~\ref{sec:measures}.)
|
||||||
@@ -141,7 +165,7 @@ Figure~\ref{fig:results-chart} summarizes the three primary measures side-by-sid
|
|||||||
\node[anchor=west, font=\footnotesize] at (7.5, -1.125) {HRIStudio};
|
\node[anchor=west, font=\footnotesize] at (7.5, -1.125) {HRIStudio};
|
||||||
|
|
||||||
\end{tikzpicture}
|
\end{tikzpicture}
|
||||||
\caption{Mean scores by condition across the three primary outcome measures. Within each group, the left bar is Choregraphe and the right bar is HRIStudio.}
|
\caption{Mean scores by condition across the three primary outcome measures.}
|
||||||
\label{fig:results-chart}
|
\label{fig:results-chart}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
@@ -212,7 +236,7 @@ Figure~\ref{fig:timing-chart} compares the per-condition means for training, des
|
|||||||
\node[anchor=west, font=\footnotesize] at (7.5, -1.125) {HRIStudio};
|
\node[anchor=west, font=\footnotesize] at (7.5, -1.125) {HRIStudio};
|
||||||
|
|
||||||
\end{tikzpicture}
|
\end{tikzpicture}
|
||||||
\caption{Mean phase durations (in minutes) by condition. Within each group, the left bar is Choregraphe and the right bar is HRIStudio.}
|
\caption{Mean phase durations by condition.}
|
||||||
\label{fig:timing-chart}
|
\label{fig:timing-chart}
|
||||||
\end{figure}
|
\end{figure}
|
||||||
|
|
||||||
|
|||||||
@@ -1,16 +1,16 @@
|
|||||||
\chapter{AI-Assisted Development Workflow}
|
\chapter{AI-Assisted Development Workflow}
|
||||||
\label{app:ai_workflow}
|
\label{app:ai_workflow}
|
||||||
|
|
||||||
This appendix documents the role that AI coding assistants played in the construction of HRIStudio. It is included both for transparency about how the system was built and because the workflow itself is, in my view, one of the more interesting artifacts produced by the project. Section~\ref{sec:ai-ws} in Chapter~\ref{ch:implementation} introduces the topic briefly; here I describe the division of labor, the specific tools I used, the tasks each handled well, the limits I ran into, and the integrity controls I maintained between implementation work and the evaluation reported in Chapter~\ref{ch:results}.
|
This appendix documents the role that AI coding assistants played in the construction of HRIStudio. It is included both for transparency about how the system was built and because the workflow itself is, in my view, one of the more interesting artifacts produced by the project. Section~\ref{sec:ai-ws} in Chapter~\ref{ch:implementation} introduces the topic briefly; here I describe the specific responsibilities I kept for myself, the tasks I delegated to coding agents, the tools I used, the limits I encountered, and the integrity controls I maintained between implementation work and the evaluation reported in Chapter~\ref{ch:results}.
|
||||||
|
|
||||||
\section{Context}
|
\section{Context}
|
||||||
\label{sec:ai-context}
|
\label{sec:ai-context}
|
||||||
|
|
||||||
HRIStudio was built by a single undergraduate in parallel with a full course load, a thesis writeup, and the pilot validation study described in Chapter~\ref{ch:evaluation}. The feature surface described in Chapters~\ref{ch:design} and~\ref{ch:implementation} is larger than what a solo developer on that schedule could reasonably have produced without assistance, and the deadline constraints did not allow for the kind of team that a system of this scope would normally involve. AI coding assistants made the scope tractable. They did not replace design judgment, but they substantially reduced the cost of the mechanical work that sits between a well-specified design and a working feature: scaffolding new modules, implementing well-defined CRUD and validation code, applying consistent patterns across files, and producing the many small edits that a project of this size accumulates.
|
I built HRIStudio while also carrying a full course load, writing this thesis, and running the pilot validation study described in Chapter~\ref{ch:evaluation}. The feature surface described in Chapters~\ref{ch:design} and~\ref{ch:implementation} is larger than what I could reasonably have produced on that schedule without assistance, given both the scope and the level of ambition of the work. AI coding assistants made that scope tractable. They did not replace design judgment; they reduced the cost of the mechanical work that sits between a well-specified design and a working feature: scaffolding new modules, implementing well-defined create/read/update/delete (CRUD) and validation code, applying consistent patterns across files, and producing the many small edits that a project of this size accumulates.
|
||||||
|
|
||||||
The set of tools available to a solo developer changed substantially during the project's timeline. When I began, agentic coding tools were still early and most of my AI use was conversational. By the end of the project, multiple mature terminal- and editor-integrated agents were available. I changed tools as the landscape evolved and used what was available to me at each point. Tools overlapped in places, but I generally used one at a time for a given task; I did not operate a fleet of agents in parallel or maintain a consistent pipeline across tools.
|
The set of tools available to me as a solo developer changed substantially during the project's timeline. When I began, agentic coding tools were still early and most of my AI use was conversational, primarily through Cursor~\cite{CursorEditor} and Zed~\cite{ZedEditor}. By the end of the project, multiple mature terminal- and editor-integrated agents were available. I changed tools as the landscape evolved, eventually moving into a mixed workflow across Visual Studio Code, Antigravity~\cite{GoogleAntigravity}, Claude Code~\cite{AnthropicClaudeCode}, and OpenCode~\cite{OpenCode}.
|
||||||
|
|
||||||
\section{Tools Used}
|
\section{Tools and Hardware}
|
||||||
\label{sec:ai-tools}
|
\label{sec:ai-tools}
|
||||||
|
|
||||||
Table~\ref{tbl:ai-tools} lists the tools I used during development and the capacity in which I used each. The split between them was determined partly by capability and partly by availability over time.
|
Table~\ref{tbl:ai-tools} lists the tools I used during development and the capacity in which I used each. The split between them was determined partly by capability and partly by availability over time.
|
||||||
@@ -22,29 +22,31 @@ Table~\ref{tbl:ai-tools} lists the tools I used during development and the capac
|
|||||||
\hline
|
\hline
|
||||||
\textbf{Tool} & \textbf{Category} & \textbf{Primary use} \\
|
\textbf{Tool} & \textbf{Category} & \textbf{Primary use} \\
|
||||||
\hline
|
\hline
|
||||||
Claude~\cite{Anthropic2024Claude} & Chat model & Design discussions, architectural review, debugging assistance, refactoring proposals, occasional help drafting commit messages. \\
|
Claude~\cite{Anthropic2024Claude} & Chat model & Design discussions, architectural review, debugging assistance, and refactoring proposals. \\
|
||||||
\hline
|
\hline
|
||||||
Claude Code~\cite{AnthropicClaudeCode} & Terminal agent & Multi-file feature implementation against a written spec; codemod-style refactors; test scaffolding. \\
|
Claude Code~\cite{AnthropicClaudeCode} & Terminal agent & Multi-file feature implementation against a written spec; codemod-style refactors; and test scaffolding. \\
|
||||||
\hline
|
\hline
|
||||||
OpenCode~\cite{OpenCode} & Terminal agent & Same class of task as Claude Code, used when I preferred an open-source workflow or a different backing model. \\
|
OpenCode~\cite{OpenCode} & Terminal agent & Same class of task as Claude Code, used when I preferred an open-source workflow or a different backing model. \\
|
||||||
\hline
|
\hline
|
||||||
Gemini CLI~\cite{GeminiCLI} & Terminal agent & Occasional cross-check on changes produced by a different agent, and work against Google's models when I wanted a second reading of a larger diff. \\
|
Gemini CLI~\cite{GeminiCLI} & Terminal agent & Occasional cross-check on changes produced by a different agent, and work against Google's models when I wanted a second reading of a larger diff. \\
|
||||||
\hline
|
\hline
|
||||||
Google Antigravity~\cite{GoogleAntigravity} & IDE agent & Editor-integrated agentic coding work, primarily late in the project as the tool became available. \\
|
Antigravity~\cite{GoogleAntigravity} & IDE agent & Editor-integrated agentic coding work, primarily late in the project as the tool became available. \\
|
||||||
\hline
|
\hline
|
||||||
Zed~\cite{ZedEditor} & Editor & Day-to-day development environment; provided its own AI-assisted editing features alongside the agents listed above. \\
|
Cursor~\cite{CursorEditor} & Editor & Early development; AI-native editing and indexing. \\
|
||||||
|
\hline
|
||||||
|
Zed~\cite{ZedEditor} & Editor & High-performance editing; transition phase before moving to specialized agents. \\
|
||||||
\hline
|
\hline
|
||||||
\end{tabular}
|
\end{tabular}
|
||||||
\caption{AI tools used during HRIStudio development.}
|
\caption{AI tools used during HRIStudio development.}
|
||||||
\label{tbl:ai-tools}
|
\label{tbl:ai-tools}
|
||||||
\end{table}
|
\end{table}
|
||||||
|
|
||||||
I did not use these tools as a coordinated pipeline. I used whichever one fit the task in front of me at the moment, with the set of options expanding as the year progressed. Some of the work overlaps between tools --- any of the agents can, in principle, produce the same diff for a well-scoped task --- but I generally used one at a time and did not run multiple agents against the same code simultaneously.
|
Beyond cloud-hosted models, I experimented with local execution using \texttt{llama.cpp} to run various open-weights models on my local hardware (Apple M4 Pro, 14-core CPU, 48~GB RAM). While the hardware was capable of running 7B- and 14B-parameter models with high throughput, the reasoning performance of the local models frequently lagged behind that of state-of-the-art frontier models. I found that the additional cognitive overhead of correcting errors in local model output outweighed the benefits of offline execution, leading me to rely primarily on the cloud-hosted agents for complex implementation tasks.
|
||||||
|
|
||||||
\section{Division of Responsibility}
|
\section{Division of Responsibility}
|
||||||
\label{sec:ai-division}
|
\label{sec:ai-division}
|
||||||
|
|
||||||
My working rule throughout the project was that I did the engineering and the agents did the implementation. In practice, this meant that I was responsible for every decision that had downstream consequences for the shape of the system, and the agents were responsible for producing the code that realized those decisions. Concretely, I did the following work directly, without delegating it to an agent:
|
My working rule throughout the project was for me to handle the engineering and for the agents to flesh out the implementation. In practice, this meant that I was responsible for every decision that had downstream consequences for the shape of the system, and the agents were responsible for producing code that realized those decisions. Concretely, I did the following work directly, without delegating it to an agent:
|
||||||
|
|
||||||
\begin{itemize}
|
\begin{itemize}
|
||||||
\item \textbf{Architecture.} The three-tier structure described in Chapter~\ref{ch:design}, the separation between experiment specifications and trial records, the choice to route all robot communication through plugin files, and the overall shape of the event-driven execution model were mine. I wrote these decisions as prose before any code was written.
|
\item \textbf{Architecture.} The three-tier structure described in Chapter~\ref{ch:design}, the separation between experiment specifications and trial records, the choice to route all robot communication through plugin files, and the overall shape of the event-driven execution model were mine. I wrote these decisions as prose before any code was written.
|
||||||
@@ -53,36 +55,24 @@ My working rule throughout the project was that I did the engineering and the ag
|
|||||||
|
|
||||||
\item \textbf{Research design.} The pilot validation study in Chapter~\ref{ch:evaluation} was designed and analyzed entirely by me. The Observer Data Sheet, Design Fidelity Score rubric, and Execution Reliability Score rubric were written by hand. No AI tool was used to score sessions, compute results, or draft claims about what the data showed.
|
\item \textbf{Research design.} The pilot validation study in Chapter~\ref{ch:evaluation} was designed and analyzed entirely by me. The Observer Data Sheet, Design Fidelity Score rubric, and Execution Reliability Score rubric were written by hand. No AI tool was used to score sessions, compute results, or draft claims about what the data showed.
|
||||||
|
|
||||||
\item \textbf{The prose of this thesis.} Every chapter was written by me. AI tools occasionally helped me reword an awkward sentence or catch an inconsistency between sections, but the structure of the argument and the specific claims I make are my own.
|
\item \textbf{The prose of this thesis.} Every chapter was written by me. The structure of the argument and the specific claims I make are my own. While AI assisted with the nuances of \LaTeX{} formatting (particularly the generation of TikZ diagrams and complex chart syntax), the content is mine.
|
||||||
\end{itemize}
|
\end{itemize}
|
||||||
|
|
||||||
The agents handled the work that sat inside those decisions: implementing tRPC procedures from a written signature, generating the Drizzle migration files that matched a schema change I had specified, producing React components from a layout sketch and a list of props, writing the serializer that turned a plugin definition into the JSON format the runtime expected, and applying consistent edits across files when I changed a shared interface. I read every diff before accepting it. When a diff was wrong, I either explained what was wrong and asked for a revision with specifics, or I discarded it and wrote the code myself.
|
\section{Evolution of the Workflow}
|
||||||
|
|
||||||
\section{A Representative Interaction Pattern}
|
|
||||||
\label{sec:ai-pattern}
|
\label{sec:ai-pattern}
|
||||||
|
|
||||||
The typical loop I followed for a medium-sized feature proceeded in five steps.
|
My use of these tools changed over the course of the project, and evolved as the models improved. Early on, I treated the agent's output as a draft that required line-by-line review. The typical loop followed five steps: writing a specification, generating a diff, reading the diff, running the code, and then accepting or rejecting the change.
|
||||||
|
|
||||||
First, I wrote the specification. This was usually a short markdown document I kept in a scratch file: a statement of what the feature should do, the tRPC procedure signature it would expose, the tables it would touch, the React components that would consume it, and the acceptance criteria that would let me know it was complete. Writing the specification was design work, and I did it myself.
|
As the models improved and the agents became more reliable, the focus of my effort shifted. By the final stages of development, I spent significantly less time on manual line-by-line reviews and more time on empirical testing. I moved from being a ``code reviewer'' to a ``test-driven supervisor.'' If the agent produced a feature that passed my manual acceptance tests and integrated correctly with the existing system, I was more likely to accept the implementation without auditing every line of the program. This shift allowed me to increase the velocity of development significantly in the weeks leading up to the evaluation.
|
||||||
|
|
||||||
Second, I handed the specification to an agent with the repository open. The agent read the relevant existing files, produced a diff that implemented the specification, and reported what it had done.
|
|
||||||
|
|
||||||
Third, I read the diff. This step was non-negotiable: I did not accept code I had not read. For small changes I read directly; for larger ones I asked the agent for a summary first and then read the diff file by file.
|
|
||||||
|
|
||||||
Fourth, I ran the code. I ran the development server, exercised the feature manually, checked the database state where relevant, and ran whatever tests existed. If the feature did not work, I returned to step three with a specific failure to investigate.
|
|
||||||
|
|
||||||
Fifth, I either accepted the diff, asked for a revision, or discarded it. A revision request described the specific thing that was wrong, not a vague instruction to \textit{try again}. Discarding happened when the agent had misunderstood the specification in a way that made a revision more expensive than rewriting from scratch.
|
|
||||||
|
|
||||||
This loop is unremarkable. It is the same loop I would follow if I were reviewing a pull request from a junior engineer. The key point is that the agent's output was treated as a draft pull request that I, as the engineer, either accepted, requested changes on, or rejected --- not as finished work.
|
|
||||||
|
|
||||||
\section{What Worked and What Did Not}
|
\section{What Worked and What Did Not}
|
||||||
\label{sec:ai-limits}
|
\label{sec:ai-limits}
|
||||||
|
|
||||||
The tasks that agents handled well were those with a narrow and well-specified interface. Implementing a tRPC procedure from a signature, writing a Drizzle migration that matched a schema diff, adding a new field through an existing form, or applying a consistent rename across files --- these were cheap to specify and the agent's output was usually accepted on the first or second iteration. Agents were also good at scaffolding: producing the initial shape of a component, test file, or API route that I then edited to completion.
|
The tasks that agents handled well were those with a narrow and well-specified interface. Implementing a tRPC procedure from a signature, writing a Drizzle migration that matched a schema diff, adding a new field through an existing form, or applying a consistent rename across files: these were cheap to specify and the agent's output was usually accepted on the first or second iteration. Agents were also good at scaffolding: producing the initial shape of a component, test file, or API route that I then edited to completion.
|
||||||
|
|
||||||
The tasks that agents handled poorly were those that required reasoning across more of the system than the context window could hold, or that depended on a piece of context I had not written down. Cross-cutting changes to the experiment and trial data models, for example, required careful coordination across the schema, the tRPC procedures, the execution runtime, and the analysis interface; when I tried to delegate changes of this shape to an agent, the diffs were often locally plausible but globally inconsistent. I ended up doing that work myself. Subtle concurrency and timing questions in the execution layer were another category the agents did not handle well; the event-driven execution model in Chapter~\ref{ch:design} has enough non-obvious ordering constraints that an agent without the full picture tended to introduce races. Those parts of the codebase I wrote by hand.
|
The tasks that agents handled poorly were those that required reasoning across more of the system than the context window could hold, or that depended on a piece of context I had not written down. Cross-cutting changes to the experiment and trial data models, for example, required careful coordination across the schema, the tRPC procedures, the execution runtime, and the analysis interface. When I tried to delegate changes of this shape to an agent, the diffs were often locally plausible but globally inconsistent; I ended up doing that work myself. Subtle concurrency and timing questions in the execution layer were another category the agents did not handle well. The event-driven execution model in Chapter~\ref{ch:design} has enough non-obvious ordering constraints that an agent without the full picture tended to introduce races; those parts of the codebase I wrote by hand.
|
||||||
|
|
||||||
Across the full set of tools I used, the differences in capability for the work I asked of them were smaller than I expected. Any of the agents could, in principle, produce a correct diff for a well-scoped task, and when one tool failed it was usually because the task was underspecified rather than because of a difference in model capability. The practical differences between tools mattered more at the workflow level --- which shell integration I preferred, how the tool handled long diffs, how it behaved when it needed to ask for clarification --- than at the capability level.
|
Across the full set of tools I used, the differences in capability for the work I asked of them were smaller than I expected. Any of the agents could, in principle, produce a correct diff for a well-scoped task, and when one tool failed it was usually because the task was underspecified rather than because of a difference in model capability. The practical differences between tools mattered more at the workflow level---which shell integration I preferred, how the tool handled long diffs, and how it behaved when it needed to ask for clarification---than at the capability level.
|
||||||
|
|
||||||
\section{Research Integrity}
|
\section{Research Integrity}
|
||||||
\label{sec:ai-integrity}
|
\label{sec:ai-integrity}
|
||||||
@@ -94,7 +84,7 @@ Because this thesis reports an empirical evaluation, I treat the boundary betwee
|
|||||||
|
|
||||||
\item No AI tool produced the tables, means, or comparative claims in Chapter~\ref{ch:results}. The numbers were tabulated by hand from the completed Observer Data Sheets reproduced in Appendix~\ref{app:completed_materials}, and the claims about what those numbers support or do not support are mine.
|
\item No AI tool produced the tables, means, or comparative claims in Chapter~\ref{ch:results}. The numbers were tabulated by hand from the completed Observer Data Sheets reproduced in Appendix~\ref{app:completed_materials}, and the claims about what those numbers support or do not support are mine.
|
||||||
|
|
||||||
\item No AI tool drafted the prose of this thesis. The chapters were written by me, in my own voice, and I am responsible for every claim they make and every argument they advance. AI tools were occasionally used as a proofreading aid --- catching typos, flagging awkward phrasing, or suggesting an alternative word --- but the sentences are mine.
|
\item No AI tool drafted the prose of this thesis. The chapters were written by me, in my own voice, and I am responsible for every claim they make and every argument they advance. AI tools were occasionally used as a proofreading aid to catch typos, flag awkward phrasing, or suggest an alternative word; however, the sentences are mine.
|
||||||
|
|
||||||
\item The code that implements HRIStudio and that was the subject of the evaluation was written under the workflow described in Sections~\ref{sec:ai-division} and~\ref{sec:ai-pattern}. Agents produced drafts; I read, tested, and accepted or rejected every one. The final state of the code is the product of my engineering decisions, regardless of who wrote any particular line.
|
\item The code that implements HRIStudio and that was the subject of the evaluation was written under the workflow described in Sections~\ref{sec:ai-division} and~\ref{sec:ai-pattern}. Agents produced drafts; I read, tested, and accepted or rejected every one. The final state of the code is the product of my engineering decisions, regardless of who wrote any particular line.
|
||||||
\end{itemize}
|
\end{itemize}
|
||||||
|
|||||||
@@ -84,12 +84,12 @@ The Next.js application server and the Bun WebSocket server run outside Docker o
|
|||||||
The NAO6 integration stack is defined in a separate repository and provides three ROS~2 services that collectively bridge HRIStudio to the physical robot.
|
The NAO6 integration stack is defined in a separate repository and provides three ROS~2 services that collectively bridge HRIStudio to the physical robot.
|
||||||
|
|
||||||
\begin{enumerate}
|
\begin{enumerate}
|
||||||
\item The \textbf{nao\_driver} service runs the NaoQi driver ROS~2 node, which connects to the NAO's proprietary framework over the local network and publishes sensor data (joint states, camera feeds) as standard ROS~2 topics.
|
\item The \textbf{nao\_driver} service runs the NAOqi driver ROS~2 node, which connects to the NAO's proprietary framework over the local network and publishes sensor data (joint states, camera feeds) as standard ROS~2 topics.
|
||||||
\item The \textbf{ros\_bridge} service runs the rosbridge WebSocket server, which exposes all ROS~2 topics over a WebSocket interface on a configurable port (default~9090). This is the endpoint that the HRIStudio server connects to.
|
\item The \textbf{ros\_bridge} service runs the \texttt{rosbridge} WebSocket server, which exposes all ROS~2 topics over a WebSocket interface on a configurable port (default~9090). This is the endpoint that the HRIStudio server connects to.
|
||||||
\item The \textbf{ros\_api} service provides runtime introspection of available ROS~2 topics, services, and parameters.
|
\item The \textbf{ros\_api} service provides runtime introspection of available ROS~2 topics, services, and parameters.
|
||||||
\end{enumerate}
|
\end{enumerate}
|
||||||
|
|
||||||
All three services are built from a single Dockerfile based on the ROS~2 Humble base image (Ubuntu~22.04). The image installs the NaoQi driver and rosbridge server packages along with their dependencies (NaoQi libraries, bridge message types, OpenCV bridge, and TF2) and builds them with colcon. All services use host networking so that ROS~2 discovery and the NaoQi connection operate without port forwarding.
|
All three services are built from a single Dockerfile based on the ROS~2 Humble base image (Ubuntu~22.04). The image installs the NAOqi driver and \texttt{rosbridge} server packages along with their dependencies (NAOqi libraries, bridge message types, OpenCV bridge, and TF2) and builds them with \texttt{colcon}. All services use host networking so that ROS~2 discovery and the NAOqi connection operate without port forwarding.
|
||||||
|
|
||||||
Before starting the driver, an initialization script connects to the NAO via SSH and prepares it for external control:
|
Before starting the driver, an initialization script connects to the NAO via SSH and prepares it for external control:
|
||||||
|
|
||||||
@@ -103,7 +103,7 @@ Environment variables for the robot IP address, credentials, and bridge port are
|
|||||||
|
|
||||||
\subsection{Communication Between Stacks}
|
\subsection{Communication Between Stacks}
|
||||||
|
|
||||||
Figure~\ref{fig:deployment-arch} shows the relationship between the two Docker stacks and the components that run on the host. The HRIStudio server communicates with the robot integration stack over a single WebSocket connection to the \texttt{rosbridge\_websocket} endpoint. For actions that bypass ROS entirely (posture changes, animation playback), the server connects directly to the NAO via SSH and invokes NaoQi commands through the \texttt{qicli} command-line tool. Both communication paths are configured per-robot in the plugin file.
|
Figure~\ref{fig:deployment-arch} shows the relationship between the two Docker stacks and the components that run on the host. The HRIStudio server communicates with the robot integration stack over a single WebSocket connection to the \texttt{rosbridge\_websocket} endpoint. For actions that bypass ROS entirely (posture changes, animation playback), the server connects directly to the NAO via SSH and invokes NAOqi commands through the \texttt{qicli} command-line tool. Both communication paths are configured per-robot in the plugin file.
|
||||||
|
|
||||||
\begin{figure}[htbp]
|
\begin{figure}[htbp]
|
||||||
\centering
|
\centering
|
||||||
@@ -159,7 +159,7 @@ Figure~\ref{fig:deployment-arch} shows the relationship between the two Docker s
|
|||||||
|
|
||||||
%% ---- NAO Robot ----
|
%% ---- NAO Robot ----
|
||||||
\node[box, fill=gray!40, minimum width=2.8cm] (nao) at (0, -0.8)
|
\node[box, fill=gray!40, minimum width=2.8cm] (nao) at (0, -0.8)
|
||||||
{NAO6 Robot\\[-1pt]{\scriptsize NaoQi}};
|
{NAO6 Robot\\[-1pt]{\scriptsize NAOqi}};
|
||||||
|
|
||||||
%% ---- Arrows: browser to host ----
|
%% ---- Arrows: browser to host ----
|
||||||
\draw[arrow] (browser.south west) -- node[lbl, left] {HTTP} (nextjs.north);
|
\draw[arrow] (browser.south west) -- node[lbl, left] {HTTP} (nextjs.north);
|
||||||
@@ -231,13 +231,13 @@ Each action definition specifies:
|
|||||||
\item A ROS~2 dispatch block containing the target topic, message type, and a payload mapping.
|
\item A ROS~2 dispatch block containing the target topic, message type, and a payload mapping.
|
||||||
\end{itemize}
|
\end{itemize}
|
||||||
|
|
||||||
The payload mapping supports two modes. In \emph{static} mode, the plugin defines a fixed message template with placeholder tokens (e.g., \texttt{\{\{text\}\}}) that the execution engine fills from the researcher's parameters. In \emph{SSH} mode, the action bypasses ROS entirely and executes a shell command on the robot via SSH; this is used for NaoQi-native operations such as posture changes and animation playback that are not exposed as ROS~2 topics.
|
The payload mapping supports two modes. In \emph{static} mode, the plugin defines a fixed message template with placeholder tokens (e.g., \texttt{\{\{text\}\}}) that the execution engine fills from the researcher's parameters. In \emph{SSH} mode, the action bypasses ROS entirely and executes a shell command on the robot via SSH; this is used for NAOqi-native operations such as posture changes and animation playback that are not exposed as ROS~2 topics.
|
||||||
|
|
||||||
The NAO6 plugin defines 20 actions across three categories: speech (say text, say with emotion), movement (walk forward/backward, turn, stop, wake up, rest, stand, sit, crouch), and animation (bow, wave, nod, head shake, shrug, enthusiastic gesture, and others). Movement actions publish ROS~2 Twist messages to the velocity command topic. Animation actions publish animation path strings to the animation topic. Posture and lifecycle commands use SSH mode to call NaoQi services directly via the \texttt{qicli} command-line tool.
|
The NAO6 plugin defines 20 actions across three categories: speech (say text, say with emotion), movement (walk forward/backward, turn, stop, wake up, rest, stand, sit, crouch), and animation (bow, wave, nod, head shake, shrug, enthusiastic gesture, and others). Movement actions publish ROS~2 Twist messages to the velocity command topic. Animation actions publish animation path strings to the animation topic. Posture and lifecycle commands use SSH mode to call NAOqi services directly via the \texttt{qicli} command-line tool.
|
||||||
|
|
||||||
\subsection{Adding a New Robot}
|
\subsection{Adding a New Robot}
|
||||||
|
|
||||||
Adding support for a new robot platform requires writing a single JSON plugin file and placing it in the repository. No changes to the HRIStudio server code are required. The plugin author defines the robot's capabilities, maps each action to a ROS~2 topic or SSH command, and specifies the parameter schema for each action. After the repository is synced, the new robot's actions appear in the experiment designer and can be used in any study.
|
Adding support for a new robot platform requires writing a single JSON plugin file and placing it in the plugin repository. No changes to the HRIStudio server code are required. The plugin author defines the robot's capabilities, maps each action to a ROS~2 topic or SSH command, and specifies the parameter schema for each action. After the repository is synced, the new robot's actions appear in the experiment designer and can be used in any study.
|
||||||
|
|
||||||
\section{Database Schema}
|
\section{Database Schema}
|
||||||
|
|
||||||
|
|||||||
Binary file not shown.
|
After Width: | Height: | Size: 153 KiB |
Binary file not shown.
|
After Width: | Height: | Size: 165 KiB |
Binary file not shown.
+40
-16
@@ -240,55 +240,79 @@ doi = {10.1201/9781498710411-35}
|
|||||||
@misc{RobotPluginsRepo,
|
@misc{RobotPluginsRepo,
|
||||||
author = {O'Connor, Sean},
|
author = {O'Connor, Sean},
|
||||||
title = {{HRIStudio Robot Plugins Repository}},
|
title = {{HRIStudio Robot Plugins Repository}},
|
||||||
howpublished = {GitHub repository},
|
howpublished = {GitHub repository, maintained as a submodule of HRIStudio},
|
||||||
year = {2026},
|
year = {2026},
|
||||||
url = {https://github.com/soconnor0919/robot-plugins}
|
url = {https://github.com/soconnor0919/robot-plugins}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@misc{NaoWorkspaceRepo,
|
||||||
|
author = {O'Connor, Sean},
|
||||||
|
title = {{nao-workspace: A Containerized Choregraphe Development Environment}},
|
||||||
|
howpublished = {GitHub repository},
|
||||||
|
year = {2026},
|
||||||
|
url = {https://github.com/soconnor0919/nao-workspace}
|
||||||
|
}
|
||||||
|
|
||||||
|
@misc{NaoIntegrationRepo,
|
||||||
|
author = {O'Connor, Sean},
|
||||||
|
title = {{nao6-hristudio-integration: ROS/NAOqi Bridge for HRIStudio}},
|
||||||
|
howpublished = {GitHub repository},
|
||||||
|
year = {2026},
|
||||||
|
url = {https://github.com/soconnor0919/nao6-hristudio-integration}
|
||||||
|
}
|
||||||
|
|
||||||
@misc{Anthropic2024Claude,
|
@misc{Anthropic2024Claude,
|
||||||
author = {{Anthropic}},
|
author = {{Anthropic}},
|
||||||
title = {{Claude}},
|
title = {{Claude 3.5 Sonnet}},
|
||||||
howpublished = {Large language model},
|
howpublished = {Large Language Model},
|
||||||
year = {2024--2026},
|
year = {2024},
|
||||||
url = {https://www.anthropic.com/claude}
|
url = {https://www.anthropic.com/claude}
|
||||||
}
|
}
|
||||||
|
|
||||||
@misc{AnthropicClaudeCode,
|
@misc{AnthropicClaudeCode,
|
||||||
author = {{Anthropic}},
|
author = {{Anthropic}},
|
||||||
title = {{Claude Code}},
|
title = {{Claude Code}},
|
||||||
howpublished = {Agentic coding assistant},
|
howpublished = {Agentic CLI Developer Tool},
|
||||||
year = {2024--2026},
|
year = {2025},
|
||||||
url = {https://www.anthropic.com/claude-code}
|
url = {https://www.anthropic.com/claude-code}
|
||||||
}
|
}
|
||||||
|
|
||||||
@misc{OpenCode,
|
@misc{OpenCode,
|
||||||
author = {{sst}},
|
author = {{SST}},
|
||||||
title = {{OpenCode}},
|
title = {{OpenCode}},
|
||||||
howpublished = {Open-source AI coding agent},
|
howpublished = {Open-source AI Coding Agent},
|
||||||
year = {2024--2026},
|
year = {2024},
|
||||||
url = {https://opencode.ai}
|
url = {https://opencode.ai}
|
||||||
}
|
}
|
||||||
|
|
||||||
@misc{GeminiCLI,
|
@misc{GeminiCLI,
|
||||||
author = {{Google}},
|
author = {{Google}},
|
||||||
title = {{Gemini CLI}},
|
title = {{Gemini CLI}},
|
||||||
howpublished = {Open-source AI agent},
|
howpublished = {Agentic CLI Developer Tool},
|
||||||
year = {2025--2026},
|
year = {2024},
|
||||||
url = {https://github.com/google-gemini/gemini-cli}
|
url = {https://github.com/google-gemini/gemini-cli}
|
||||||
}
|
}
|
||||||
|
|
||||||
@misc{GoogleAntigravity,
|
@misc{GoogleAntigravity,
|
||||||
author = {{Google}},
|
author = {{Google}},
|
||||||
title = {{Antigravity}},
|
title = {{Antigravity}},
|
||||||
howpublished = {Agentic development platform},
|
howpublished = {Integrated Agentic Development Environment},
|
||||||
year = {2025--2026},
|
year = {2025},
|
||||||
url = {https://antigravity.google}
|
url = {https://antigravity.google}
|
||||||
}
|
}
|
||||||
|
|
||||||
|
@misc{CursorEditor,
|
||||||
|
author = {{Anysphere}},
|
||||||
|
title = {{Cursor Code Editor}},
|
||||||
|
howpublished = {AI-Native Code Editor},
|
||||||
|
year = {2023},
|
||||||
|
url = {https://cursor.com}
|
||||||
|
}
|
||||||
|
|
||||||
@misc{ZedEditor,
|
@misc{ZedEditor,
|
||||||
author = {{Zed Industries}},
|
author = {{Zed Industries}},
|
||||||
title = {{Zed}},
|
title = {{Zed Code Editor}},
|
||||||
howpublished = {Collaborative code editor},
|
howpublished = {High-performance Code Editor with AI Integration},
|
||||||
year = {2023--2026},
|
year = {2023},
|
||||||
url = {https://zed.dev}
|
url = {https://zed.dev}
|
||||||
}
|
}
|
||||||
|
|||||||
+6
-5
@@ -10,7 +10,7 @@
|
|||||||
\usepackage{makecell} %Multi-line table header cells
|
\usepackage{makecell} %Multi-line table header cells
|
||||||
\usepackage{tabularx} %Auto-width table columns
|
\usepackage{tabularx} %Auto-width table columns
|
||||||
\usepackage{tikz} %For programmatic diagrams
|
\usepackage{tikz} %For programmatic diagrams
|
||||||
\usetikzlibrary{shapes,arrows,positioning,fit,backgrounds,decorations.pathreplacing}
|
\usetikzlibrary{shapes,arrows,positioning,fit,backgrounds,decorations.pathreplacing,calc}
|
||||||
\usepackage[
|
\usepackage[
|
||||||
hidelinks,
|
hidelinks,
|
||||||
linktoc=all,
|
linktoc=all,
|
||||||
@@ -21,12 +21,13 @@
|
|||||||
\butitle{A Web-Based Wizard-of-Oz Platform for Collaborative and Reproducible Human-Robot Interaction Research}
|
\butitle{A Web-Based Wizard-of-Oz Platform for Collaborative and Reproducible Human-Robot Interaction Research}
|
||||||
\author{Sean O'Connor}
|
\author{Sean O'Connor}
|
||||||
\degree{Bachelor of Science}
|
\degree{Bachelor of Science}
|
||||||
\department{Computer Science}
|
\department{Computer Science and Engineering}
|
||||||
\advisor{L. Felipe Perrone}
|
\advisor{L. Felipe Perrone}
|
||||||
% \advisorb{Brian King}
|
\advisorb{Brian King}
|
||||||
|
\honorscouncilrep{Abigail Kopec}
|
||||||
\chair{Alan Marchiori}
|
\chair{Alan Marchiori}
|
||||||
\maketitle
|
% \maketitle
|
||||||
|
\includepdf[pages=-,pagecommand={}]{pdfs/CoverPage-Signed.pdf}
|
||||||
\frontmatter
|
\frontmatter
|
||||||
|
|
||||||
\acknowledgments{
|
\acknowledgments{
|
||||||
|
|||||||
Reference in New Issue
Block a user