Mirror of https://github.com/soconnor0919/honors-thesis.git, synced 2026-05-08 15:18:54 -04:00.
This approach represents a shift from the current paradigm of custom, robot-specific tools toward a unified platform that can serve as shared infrastructure for the HRI research community. By treating experiment design, execution, and analysis as distinct but integrated phases of a study, such a framework can systematically address both technical barriers and sources of variability that currently limit research quality and reproducibility.
The contributions of this thesis are the design principles of this approach: a hierarchical specification model, an event-driven execution model, and a protocol/trial separation with explicit deviation logging. Together, these principles form a coherent architecture for WoZ infrastructure that any implementation could adopt. The platform I developed, HRIStudio, is one implementation of this architecture: an open-source reference system that realizes those principles and serves as the instrument for empirical validation.
\section{Research Objectives}
This thesis builds upon foundational work presented in two prior peer-reviewed publications. Prof. Perrone and I first introduced the conceptual framework for HRIStudio at the 2024 IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) \cite{OConnor2024}, establishing the vision for a collaborative, web-based platform. Subsequently, we published the detailed system architecture and a first prototype at RO-MAN 2025 \cite{OConnor2025}, validating the technical feasibility of web-based robot control. Those publications established the vision and the prototype. This thesis formalizes the contribution: a set of design principles for WoZ infrastructure that simultaneously address the \textit{Accessibility} and \textit{Reproducibility} Problems, a reference implementation of those principles, and pilot empirical evidence that they produce measurably different outcomes in practice.
The central question this thesis addresses is: \emph{can the right software architecture make Wizard-of-Oz experiments more accessible to non-programmers and more reproducible across participants?} To answer it, I propose a hierarchical, event-driven specification model that separates protocol design from trial execution, enforces action sequences, and logs deviations automatically; implement it as HRIStudio; and evaluate it in a pilot study comparing design fidelity and execution reliability against a representative baseline tool. The goal is not to prove a statistical effect at scale, but to establish directional evidence that the architecture changes what researchers can do and how consistently they can do it.
\section{Chapter Summary}
This chapter has established the context and objectives for this thesis. I identified two critical challenges facing WoZ-based HRI research. The first is the \emph{Accessibility Problem}: high technical barriers limit participation by non-programmers. The second is the \emph{Reproducibility Problem}: fragmented tooling makes results difficult to replicate across labs. I proposed a web-based framework approach that addresses these challenges through intuitive design interfaces, enforced experimental protocols, and platform-agnostic architecture. Finally, I posed the central research question (can a hierarchical, event-driven specification model with explicit deviation logging lower the technical barrier and improve reproducibility of WoZ experiments?) and described how this thesis addresses it through formal design, a reference implementation, and a pilot validation study. The next chapters establish the technical and methodological foundations.
Over the last two decades, multiple frameworks to support and automate the WoZ paradigm have been reported in the literature. These frameworks can be broadly categorized based on their primary design emphases, generality, and the methodological practices they encourage. Foundational work by Steinfeld et al. \cite{Steinfeld2009} articulated the methodological importance of WoZ simulation, distinguishing between the human simulating the robot (Wizard of Oz) and the robot simulating the human. In the latter case (Oz of Wizard), the robot acts as if controlled by a person when it is actually autonomous. This distinction has influenced how subsequent tools approach the design and execution of WoZ experiments.
Early platform-agnostic tools focused on providing robust, flexible interfaces for technically sophisticated users. These systems were designed to work with multiple robot types rather than a single hardware platform. Polonius \cite{Lu2011}, built on the Robot Operating System (ROS) \cite{Quigley2009}, exemplifies this generation. It provides a graphical interface for defining finite state machine scripts that control robot behaviors, with integrated logging capabilities to streamline post-experiment analysis. The system was explicitly designed to enable robotics engineers to create experiments that their non-technical collaborators could then execute. However, the initial setup and configuration still required substantial programming expertise. Similarly, OpenWoZ \cite{Hoffman2016} introduced a cloud-based, runtime-configurable architecture using web protocols. Its design allows multiple operators or observers to connect simultaneously, and its plugin system enables researchers to extend functionality such as adding new robot behaviors or sensor integrations. Most importantly, OpenWoZ allows runtime modification of robot behaviors, enabling wizards to deviate from scripts when unexpected situations arise. While architecturally sophisticated and highly flexible, OpenWoZ requires programming knowledge to create custom behaviors and configure experiments, creating the \emph{Accessibility Problem} for non-technical researchers.
A second wave of tools shifted focus toward usability, often achieving accessibility by coupling tightly with specific hardware platforms. WoZ4U \cite{Rietz2021} was explicitly designed as an ``easy-to-use'' tool for conducting experiments with Aldebaran's Pepper robot. It provides an intuitive graphical interface that allows non-programmers to design interaction flows, and it successfully lowers the technical barrier. However, this usability comes at the cost of generalizability. WoZ4U is unusable with other robot platforms, and manufacturer-provided software follows a similar pattern.
\section{Requirements for Modern WoZ Infrastructure}
This thesis is the latest step in a multi-year effort to build infrastructure that addresses the challenges identified in the WoZ platform landscape. Based on the analysis of existing platforms and identified methodological gaps, I derived requirements for a modern WoZ research infrastructure. Through our preliminary work \cite{OConnor2024}, we identified six critical capabilities that a comprehensive platform should provide:
\begin{description}
\item[R1: Integrated workflow.] All phases of the experimental workflow (design, execution, and analysis) should be integrated within a single unified environment to minimize context switching and tool fragmentation.
\item[R6: Collaborative support.] Multiple team members should be able to contribute to experiment design and review execution data, supporting truly interdisciplinary research.
\end{description}
To the best of my knowledge, no existing platform satisfies all six requirements. Most critically, the trade-off between accessibility and flexibility remains unresolved. Few tools embed methodological best practices directly into their design to guide experimenters toward sound methodology by default.
This work builds on two prior peer-reviewed publications. We first introduced the concept for HRIStudio as a Late-Breaking Report at the 2024 IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) \cite{OConnor2024}. In that position paper, we identified the lack of accessible tooling as a primary barrier to entry in HRI and proposed the high-level vision of a web-based, collaborative platform. We established the core requirements listed above and argued for a web-based approach to achieve them.
Following the initial proposal, we published the detailed system architecture and preliminary prototype as a full paper at RO-MAN 2025 \cite{OConnor2025}. That publication validated the technical feasibility of our approach, detailing the communication protocols, data models, and plugin architecture necessary to support real-time robot control using standard web technologies while maintaining platform independence.
While those prior publications established the conceptual framework and technical architecture, this thesis formalizes those design principles, realizes them in a complete implementation, and tests them in a pilot validation study that compares design fidelity and execution reliability between HRIStudio and a representative baseline tool. The study shows whether these principles translate into measurably better outcomes for real researchers.
\section{Chapter Summary}
This chapter has established the technical and methodological context for this thesis. Existing WoZ platforms fall into two categories: general-purpose tools like Polonius and OpenWoZ that offer flexibility but high technical barriers, and platform-specific systems like WoZ4U and Choregraphe that prioritize usability at the cost of cross-platform generality. Recent approaches such as VR-based frameworks attempt to bridge this gap, yet to the best of my knowledge, no existing tool successfully combines accessibility, flexibility, and embedded methodological rigor. Based on this landscape analysis, I identified six critical requirements for modern WoZ infrastructure (R1--R6): integrated workflows, low technical barriers, real-time control across platforms, automated logging, platform-agnostic design, and collaborative support. These requirements are the standard against which the proposed design is evaluated in Chapter~\ref{ch:evaluation}. The next chapter examines the broader reproducibility challenges that justify why these requirements are essential.
\chapter{Reproducibility Challenges}
\label{ch:reproducibility}
Having established the landscape of existing WoZ platforms and their limitations, I now examine the factors that make WoZ experiments difficult to reproduce and how software infrastructure can address them. This chapter analyzes the sources of variability in WoZ studies and examines how current practices in infrastructure and reporting contribute to reproducibility problems. Understanding these challenges is essential for designing a system that supports reproducible, rigorous experimentation.
\section{Sources of Variability}
\section{Connecting Reproducibility Challenges to Infrastructure Requirements}
The reproducibility challenges identified above directly motivate the infrastructure requirements (R1--R6) established in Chapter~\ref{ch:background}. Inconsistent wizard behavior creates the need for enforced execution protocols (R1) that guide wizards step by step, and for automatic logging (R4) that captures any deviations that occur. Timing errors specifically motivate responsive, fine-grained real-time control (R3): a wizard working with a sluggish interface introduces latency that disrupts the interaction and confounds timing analysis. Technical fragmentation forces each lab to rebuild infrastructure as hardware changes, violating platform agnosticism (R5). Incomplete documentation reflects the need for self-documenting, code-free protocol specifications (R1, R2) that are simultaneously executable and shareable. Finally, the isolation of individual research groups motivates collaborative support (R6): allowing multiple team members to observe and review trials enables the shared scrutiny that reproducibility requires. As Chapter~\ref{ch:background} demonstrated, no existing platform simultaneously satisfies all six requirements. Addressing this gap requires rethinking how WoZ infrastructure is designed, prioritizing reproducibility and methodological rigor as first-class design goals rather than afterthoughts.
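To picture how enforced protocols and automatic deviation capture (R1, R4) might interact, consider an append-only log of departures from the scripted sequence. The record shape below is a hypothetical illustration for this discussion, not taken from any existing platform.

```typescript
// Hypothetical sketch: recording a wizard's departure from the script.
// The record shape is illustrative, not an actual platform format.

interface DeviationEntry {
  trialId: string;
  expectedAction: string;   // what the protocol prescribed next
  observedAction: string;   // what the wizard actually triggered
  timestamp: string;        // when the deviation occurred
}

// Append-only: deviations are recorded, never silently corrected,
// so the trial log preserves exactly what happened.
function logDeviation(log: readonly DeviationEntry[],
                      entry: DeviationEntry): DeviationEntry[] {
  return [...log, entry];
}

const log = logDeviation([], {
  trialId: "trial-07",
  expectedAction: "play audio greeting",
  observedAction: "repeat prompt",
  timestamp: "2025-01-15T10:32:04Z",
});
```

The point of the sketch is the discipline, not the data type: deviations become first-class records that later analysis can inspect, rather than undocumented improvisations.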
\section{Chapter Summary}
This chapter has analyzed the reproducibility challenges inherent in WoZ-based HRI research, identifying three primary sources of variability: inconsistent wizard behavior, fragmented technical infrastructure, and incomplete documentation. Rather than treating these challenges as inherent to the WoZ paradigm, I showed how each stems from gaps in current infrastructure. Software design can systematically mitigate these challenges through enforced experimental protocols, comprehensive automatic logging, self-documenting experiment designs, and platform-independent abstractions. These design goals directly address the six infrastructure requirements identified in Chapter~\ref{ch:background}. The following chapters describe the design, implementation, and pilot validation of a system that prioritizes reproducibility as a foundational design principle from inception.
\chapter{Architectural Design}
\label{ch:design}
Chapter~\ref{ch:background} established six requirements for modern WoZ infrastructure, labeled R1 through R6, and Chapter~\ref{ch:reproducibility} showed the reproducibility problems that motivate them. This chapter presents the architectural contribution of this thesis: a hierarchical specification model, an event-driven execution model, a modular interface architecture, and an integrated data flow that together address all six requirements. These are design principles, not implementation details; they apply to any system built with the same goals.
\section{Hierarchical Organization of Experiments}
WoZ studies involve multiple reusable conditions, shared protocol phases, and platform-specific behaviors that span the full research lifecycle. To organize these components without requiring researchers to write code, the system structures every study as a four-level hierarchy: \emph{study} $\rightarrow$ \emph{experiment} $\rightarrow$ \emph{step} $\rightarrow$ \emph{action}. This structure separates high-level protocol design from low-level execution behavior, keeping the authoring process code-free while integrating design, execution, and analysis into a single unified workflow.
The terms in this hierarchy are used in a strict way. A \emph{study} is the top-level research container that groups related protocol conditions. An \emph{experiment} is one reusable condition within that study (for example, a control versus experimental condition). A \emph{step} is one phase of the protocol timeline (for example, an introduction, telling a story, or testing recall). An \emph{action} is the smallest executable unit inside a step (for example, trigger a gesture, play audio, or speak a prompt).
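To make these definitions concrete, the containment relationships can be sketched as typed records. The sketch below is purely illustrative: it is written in TypeScript for compactness, and every field name is my own shorthand rather than the platform's actual data model.

```typescript
// Illustrative sketch of the study -> experiment -> step -> action
// hierarchy. Field names are hypothetical, not HRIStudio's schema.

// An action is the smallest executable unit inside a step.
interface Action {
  description: string;               // e.g. "trigger a gesture"
  target: "wizard" | "robot";        // who carries the action out
}

// A step is one phase of the protocol timeline.
interface Step {
  name: string;                      // e.g. "Introduction"
  actions: Action[];
}

// An experiment is one reusable condition within a study.
interface Experiment {
  name: string;                      // e.g. "control condition"
  steps: Step[];
}

// A study is the top-level container grouping related conditions.
interface Study {
  title: string;
  experiments: Experiment[];
}

// A minimal, well-formed instance of the hierarchy.
const study: Study = {
  title: "Example study",
  experiments: [{
    name: "control condition",
    steps: [{
      name: "Introduction",
      actions: [
        { description: "speak a prompt", target: "robot" },
        { description: "wait for participant, then continue", target: "wizard" },
      ],
    }],
  }],
};
```

Because each level contains only the level below it, a complete protocol is described by a single study value, which is part of what makes a specification simultaneously executable and shareable.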
Figure~\ref{fig:experiment-hierarchy} shows a representation of this hierarchical structure for social robotics studies. Reading top-down, one study contains one or more experiments, each experiment contains one or more steps, and each step contains one or more actions. Figure~\ref{fig:trial-instantiation} shows the protocol-versus-instance separation in isolation. The left column holds the protocol designed once before the study begins; the right column shows the separate trial records produced each time a participant runs it. A dashed line marks the protocol/trial boundary: everything to its left was authored by the researcher before any participant arrived; everything to its right was generated during a live session. The \textit{instantiates} arrows from the experiment node fan out to each trial record, making the relationship explicit. This separation is central to reproducibility: the same experiment specification generates a distinct, timestamped record per participant, so researchers can compare across participants without conflating what was designed with what was executed.
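The protocol/trial boundary can also be expressed as a short sketch: one experiment specification authored before the study begins, and a fresh timestamped record generated for each participant session. The function and field names here are hypothetical illustrations, not the system's actual API.

```typescript
// Illustrative sketch of protocol-versus-instance separation.
// Names are hypothetical, not HRIStudio's actual API.

// Authored once by the researcher before any participant arrives.
interface ExperimentProtocol {
  id: string;
  name: string;
}

// Generated during each live session; one record per participant.
interface TrialRecord {
  protocolId: string;     // which experiment this trial instantiates
  participantId: string;
  startedAt: string;      // ISO timestamp captured at session start
  events: string[];       // execution log filled in as the trial runs
}

function instantiateTrial(protocol: ExperimentProtocol,
                          participantId: string): TrialRecord {
  return {
    protocolId: protocol.id,
    participantId,
    startedAt: new Date().toISOString(),
    events: [],
  };
}

// The same protocol fans out to distinct trial records.
const protocol: ExperimentProtocol = { id: "exp-1", name: "Example condition" };
const trialA = instantiateTrial(protocol, "P01");
const trialB = instantiateTrial(protocol, "P02");
```

Because the protocol value is never mutated during a session, comparing two trial records compares what was executed, not what was designed.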
To illustrate how the schema can be used with a concrete example, consider an interactive storytelling study with the research question: \emph{Does robot interaction modality influence participant recall performance?} The two conditions differ in how the robot looks and behaves: NAO6 has a human-like form and uses expressive gestures, while TurtleBot is visibly machine-like with no social movement cues. This keeps the narrative task the same across both conditions while changing only how the robot delivers it.
Figure~\ref{fig:example-hierarchy} maps that study onto the same hierarchy. The study branches into two experiments (TurtleBot with only voice, NAO6 with added gestures), each experiment uses the same ordered steps (Intro, Story Telling, Recall Test), and each step contains actions. The figure expands only the Story Telling step to keep the diagram readable, but Intro and Recall Test follow the same structure. Figures~\ref{fig:experiment-hierarchy}, \ref{fig:trial-instantiation}, and~\ref{fig:example-hierarchy} together progress from abstract schema, to protocol-versus-instance separation, to a concrete instantiation.
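Written out as data, the same mapping looks like the sketch below (illustrative names only; as in the figure, only the Story Telling actions are filled in):

```python
recall_study = {
    "study": "Recall Study",
    "experiments": [
        {"name": "NAO6 with Gestures",
         "steps": [
             {"name": "Intro", "actions": []},
             {"name": "Story Telling",
              "actions": ["Gesture Hand", "Gesture Head", "Speak"]},
             {"name": "Recall Test", "actions": []},
         ]},
        {"name": "TurtleBot with Voice",
         "steps": [
             {"name": "Intro", "actions": []},
             {"name": "Story Telling",
              "actions": ["Play Audio", "Beep", "Speak"]},
             {"name": "Recall Test", "actions": []},
         ]},
    ],
}

# Both conditions share the same ordered step sequence; only the
# delivery actions differ between the two robots.
step_names = [[s["name"] for s in exp["steps"]]
              for exp in recall_study["experiments"]]
```

Holding the step sequence constant while varying only the actions is exactly what isolates interaction modality as the manipulated variable.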
\begin{figure}[htbp]
    \centering
    \begin{tikzpicture}[
        nodebox/.style={rectangle, draw=black, thick, fill=gray!15, align=center,
            text width=3.2cm, minimum height=1.0cm, font=\small, inner sep=4pt},
        nodeboxdark/.style={rectangle, draw=black, thick, fill=gray!35, align=center,
            text width=3.2cm, minimum height=1.0cm, font=\small, inner sep=4pt},
        arrow/.style={->, thick},
        label/.style={font=\small\itshape, fill=white, inner sep=2pt}]

        \node[nodebox] (study) at (0, 6.0) {Study};
        \node[nodebox] (exp) at (0, 4.0) {Experiment};
        \node[nodebox] (step) at (0, 2.0) {Step};
        \node[nodeboxdark] (action) at (0, 0.0) {Action};

        \draw[arrow] (study.south) -- node[label, right=6pt] {has one or more} (exp.north);
        \draw[arrow] (exp.south) -- node[label, right=6pt] {has one or more} (step.north);
        \draw[arrow] (step.south) -- node[label, right=6pt] {has one or more} (action.north);
    \end{tikzpicture}
    \caption{The four-level experiment specification hierarchy.}
    \label{fig:experiment-hierarchy}
\end{figure}

\begin{figure}[htbp]
    \centering
    \begin{tikzpicture}[
        spec/.style={rectangle, draw=black, thick, fill=gray!15, align=center,
            text width=3.2cm, minimum height=1.0cm, font=\small, inner sep=4pt},
        trial/.style={rectangle, draw=black, thick, dashed, fill=gray!5, align=center,
            text width=3.2cm, minimum height=1.0cm, font=\small, inner sep=4pt},
        arrow/.style={->, thick},
        darrow/.style={->, thick, dashed}]

        %% ---- Column headers ----
        \node[font=\small\bfseries] at (1.9, 7.0) {Protocol (designed once)};
        \node[font=\small\bfseries] at (7.9, 7.0) {Trials (run per participant)};

        %% ---- Protocol column ----
        \node[spec] (study) at (1.9, 5.8) {Study};
        \node[spec] (exp) at (1.9, 4.2) {Experiment};
        \node[spec] (step) at (1.9, 2.6) {Step};
        \draw[arrow] (study.south) -- (exp.north);
        \draw[arrow] (exp.south) -- (step.north);

        %% ---- Trial column ----
        \node[trial] (t1) at (7.9, 5.5) {Trial --- P01\\{\footnotesize timestamped log}};
        \node[trial] (t2) at (7.9, 4.2) {Trial --- P02\\{\footnotesize timestamped log}};
        \node[trial] (t3) at (7.9, 2.9) {Trial --- P03\\{\footnotesize timestamped log}};

        %% ---- Separator ----
        \draw[gray!60, thick, dashed] (4.85, 1.8) -- (4.85, 6.6);
        \node[font=\footnotesize\itshape, gray!80] at (4.85, 1.4) {protocol\,/\,trial boundary};

        %% ---- Instantiation arrows + label ----
        \node[font=\small\itshape] at (6.35, 6.3) {instantiates};
        \draw[darrow] (exp.east) -- (t1.west);
        \draw[darrow] (exp.east) -- (t2.west);
        \draw[darrow] (exp.east) -- (t3.west);
    \end{tikzpicture}
    \caption{One experiment protocol instantiated as a separate trial record per participant.}
    \label{fig:trial-instantiation}
\end{figure}

\begin{figure}[htbp]
    \centering
    \begin{tikzpicture}[
        nodebox/.style={rectangle, draw=black, thick, fill=gray!15, align=center, text width=2.0cm, font=\small, minimum height=1.2cm, inner sep=2pt},
        nodeboxdark/.style={rectangle, draw=black, thick, fill=gray!30, align=center, text width=1.6cm, font=\small, minimum height=1.2cm, inner sep=2pt},
        arrow/.style={->, thick}]

        % Study
        \node[nodebox] (study) at (0, 7.0) {\textit{Study}\\Recall Study};

        % Experiments
        \node[nodebox] (nao_exp) at (-3.8, 5.0) {\textit{Experiment}\\NAO6 with Gestures};
        \node[nodebox] (tb_exp) at (3.8, 5.0) {\textit{Experiment}\\TurtleBot with Voice};
        \draw[arrow] (study.south) -- (nao_exp.north);
        \draw[arrow] (study.south) -- (tb_exp.north);

        % NAO steps (independent branch)
        \node[nodebox] (nao_s1) at (-6.1, 3.0) {\textit{Step 1}\\Intro};
        \node[nodebox] (nao_s2) at (-3.8, 3.0) {\textit{Step 2}\\Story Telling};
        \node[nodebox] (nao_s3) at (-1.5, 3.0) {\textit{Step 3}\\Recall Test};
        \draw[arrow] (nao_exp.south) -- (nao_s1.north);
        \draw[arrow] (nao_exp.south) -- (nao_s2.north);
        \draw[arrow] (nao_exp.south) -- (nao_s3.north);

        % TurtleBot steps (independent branch)
        \node[nodebox] (tb_s1) at (1.5, 3.0) {\textit{Step 1}\\Intro};
        \node[nodebox] (tb_s2) at (3.8, 3.0) {\textit{Step 2}\\Story Telling};
        \node[nodebox] (tb_s3) at (6.1, 3.0) {\textit{Step 3}\\Recall Test};
        \draw[arrow] (tb_exp.south) -- (tb_s1.north);
        \draw[arrow] (tb_exp.south) -- (tb_s2.north);
        \draw[arrow] (tb_exp.south) -- (tb_s3.north);

        % NAO: Story Telling actions
        \node[nodeboxdark] (nao_a1) at (-5.9, 1.0) {\textit{Action 1}\\Gesture Hand};
        \node[nodeboxdark] (nao_a2) at (-3.8, 1.0) {\textit{Action 2}\\Gesture Head};
        \node[nodeboxdark] (nao_a3) at (-1.7, 1.0) {\textit{Action 3}\\Speak};
        \draw[arrow] (nao_s2.south) -- (nao_a1.north);
        \draw[arrow] (nao_s2.south) -- (nao_a2.north);
        \draw[arrow] (nao_s2.south) -- (nao_a3.north);

        % TurtleBot: Story Telling actions
        \node[nodeboxdark] (tb_a1) at (1.7, 1.0) {\textit{Action 1}\\Play Audio};
        \node[nodeboxdark] (tb_a2) at (3.8, 1.0) {\textit{Action 2}\\Beep};
        \node[nodeboxdark] (tb_a3) at (5.9, 1.0) {\textit{Action 3}\\Speak};
        \draw[arrow] (tb_s2.south) -- (tb_a1.north);
        \draw[arrow] (tb_s2.south) -- (tb_a2.north);
        \draw[arrow] (tb_s2.south) -- (tb_a3.north);
    \end{tikzpicture}
    \caption{A recall study with two conditions mapped onto the four-level hierarchy.}
    \label{fig:example-hierarchy}
\end{figure}

Together, these three figures motivate why the hierarchy is useful in practice. The layered structure lets researchers define protocols at any level of granularity without writing code, which keeps the tool accessible to non-programmers. The step and action levels also align naturally with trial flow, so the wizard stays guided by the protocol while retaining control over timing, which supports the real-time control requirement. Action-level execution provides a natural unit for timestamped logging and post-trial analysis, satisfying the automated logging requirement. Finally, keeping experiment definitions separate from trial instances means the same protocol can be reproduced across participants and conditions, supporting both the integrated workflow and collaborative support requirements.
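The protocol/trial separation can be made concrete with a short sketch: an immutable protocol object is shared by every trial, while each trial keeps its own timestamped event list. The names below are hypothetical, not the platform's API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class ExperimentProtocol:
    """Immutable specification, authored once before the study begins."""
    name: str
    steps: tuple

@dataclass
class Trial:
    """Per-participant record produced by running the protocol once."""
    protocol: ExperimentProtocol
    participant_id: str
    events: list = field(default_factory=list)

    def log(self, action, deviation=False):
        """Append a timestamped entry to this trial's own record."""
        self.events.append({
            "t": datetime.now(timezone.utc).isoformat(),
            "action": action,
            "deviation": deviation,
        })

proto = ExperimentProtocol("NAO6 with Gestures",
                           ("Intro", "Story Telling", "Recall Test"))
p01 = Trial(proto, "P01")
p02 = Trial(proto, "P02")   # same protocol object, separate record
```

Freezing the protocol type captures the design intent: what was designed cannot be mutated by what happened during a session, so cross-participant comparison always refers back to one fixed specification.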
\section{Event-Driven Execution Model}

To achieve real-time responsiveness while maintaining methodological rigor (R3, R5), the system uses an event-driven execution model rather than a time-driven one. In a time-driven approach, the system advances through actions on a fixed schedule regardless of what the participant is doing, so the robot might speak over a participant who is still talking, or move on before a response has been given. The event-driven model avoids this by letting the wizard trigger each action when the interaction is ready for it. Figure~\ref{fig:event-driven-timeline} contrasts the two approaches using the same four-action sequence: Greet (G), Begin Story (BS), Ask Question (AQ), and End (E). In the time-driven row, fixed intervals $t_0$ through $t_2$ define when each event fires, and dashed vertical lines show where those moments fall relative to the event-driven rows below. In both event-driven rows, the wizard fires the same four labeled events at different real-time positions --- T1 (a faster participant) finishes well before T2 (a slower one) --- while both preserve the same action order.
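A minimal sketch of the difference (illustrative code, not the platform's executor): the runner below never advances on its own; it waits for the wizard to fire the next event, so action order is fixed while timing is participant-paced. The two pacings reuse the T1 and T2 timings from the figure.

```python
class EventDrivenRunner:
    """Advance through a fixed action sequence only when the wizard
    signals readiness; order is preserved, timing is not prescribed.
    (Illustrative sketch, not the platform's implementation.)"""

    def __init__(self, actions):
        self.actions = list(actions)
        self.index = 0
        self.executed = []           # (action, time fired)

    def fire(self, t):
        """Wizard signals the interaction is ready for the next action."""
        if self.index >= len(self.actions):
            raise RuntimeError("protocol already complete")
        action = self.actions[self.index]
        self.index += 1
        self.executed.append((action, t))
        return action

protocol = ["Greet", "Begin Story", "Ask Question", "End"]
t1, t2 = EventDrivenRunner(protocol), EventDrivenRunner(protocol)
for t in (1.0, 2.5, 5.5, 7.8):      # faster participant (T1)
    t1.fire(t)
for t in (1.0, 4.3, 8.5, 10.8):     # slower participant (T2)
    t2.fire(t)
```

Both runs execute the identical action sequence, yet their recorded timelines differ, which is exactly the variation the figure visualizes.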
\begin{figure}[htbp]
    \centering
    \begin{tikzpicture}[
        dot/.style={circle, fill=black, minimum size=6pt, inner sep=0pt},
        tline/.style={->, thick}]

        % Row y positions:
        % 3.5 = Time-Driven, 2.0 = Event-Driven T1, 0.5 = Event-Driven T2

        % Timelines
        \draw[tline] (0, 3.5) -- (11.5, 3.5);
        \draw[tline] (0, 2.0) -- (11.5, 2.0);
        \draw[tline] (0, 0.5) -- (11.5, 0.5);

        % Row labels
        \node[font=\small, anchor=east] at (-0.15, 3.5) {Time-Driven};
        \node[font=\small, anchor=east] at (-0.15, 2.0) {Event-Driven (T1)};
        \node[font=\small, anchor=east] at (-0.15, 0.5) {Event-Driven (T2)};

        % Time-driven events at fixed positions
        \node[dot] at (1.0, 3.5) {};
        \node[dot] at (3.5, 3.5) {};
        \node[dot] at (7.0, 3.5) {};
        \node[dot] at (10.5, 3.5) {};

        % Action labels above time-driven row
        \node[font=\scriptsize, above=3pt] at (1.0, 3.5) {Greet};
        \node[font=\scriptsize, above=3pt] at (3.5, 3.5) {Begin Story};
        \node[font=\scriptsize, above=3pt] at (7.0, 3.5) {Ask Question};
        \node[font=\scriptsize, above=3pt] at (10.5, 3.5) {End};

        %% ---- Time interval braces below time-driven row ----
        \draw[decorate, decoration={brace, amplitude=4pt, mirror}]
            (1.0, 3.2) -- (3.5, 3.2) node[midway, below=6pt, font=\scriptsize] {$t_0$};
        \draw[decorate, decoration={brace, amplitude=4pt, mirror}]
            (3.5, 3.2) -- (7.0, 3.2) node[midway, below=6pt, font=\scriptsize] {$t_1$};
        \draw[decorate, decoration={brace, amplitude=4pt, mirror}]
            (7.0, 3.2) -- (10.5, 3.2) node[midway, below=6pt, font=\scriptsize] {$t_2$};

        % Dashed vertical alignment lines
        \draw[dashed, gray!70] (1.0, 3.35) -- (1.0, 0.35);
        \draw[dashed, gray!70] (3.5, 3.35) -- (3.5, 0.35);
        \draw[dashed, gray!70] (7.0, 3.35) -- (7.0, 0.35);
        \draw[dashed, gray!70] (10.5, 3.35) -- (10.5, 0.35);

        % Event-driven T1 (faster participant)
        \node[dot] at (1.0, 2.0) {};
        \node[dot] at (2.5, 2.0) {};
        \node[dot] at (5.5, 2.0) {};
        \node[dot] at (7.8, 2.0) {};
        \node[font=\scriptsize, below=3pt] at (1.0, 2.0) {G};
        \node[font=\scriptsize, below=3pt] at (2.5, 2.0) {BS};
        \node[font=\scriptsize, below=3pt] at (5.5, 2.0) {AQ};
        \node[font=\scriptsize, below=3pt] at (7.8, 2.0) {E};

        % Event-driven T2 (slower participant)
        \node[dot] at (1.0, 0.5) {};
        \node[dot] at (4.3, 0.5) {};
        \node[dot] at (8.5, 0.5) {};
        \node[dot] at (10.8, 0.5) {};
        \node[font=\scriptsize, below=3pt] at (1.0, 0.5) {G};
        \node[font=\scriptsize, below=3pt] at (4.3, 0.5) {BS};
        \node[font=\scriptsize, below=3pt] at (8.5, 0.5) {AQ};
        \node[font=\scriptsize, below=3pt] at (10.8, 0.5) {E};

        % Time axis label
        \node[font=\small\itshape] at (5.75, -0.25) {time};
    \end{tikzpicture}
    \caption{Time-driven (top) versus event-driven (bottom, two trials) execution of the same four-action protocol.}
    \label{fig:event-driven-timeline}
\end{figure}

This approach has several implications. First, not all trials of the same experiment will have identical timing or duration; the length of a learning task, for example, depends on the participant's progress. The system records the actual timing of actions, permitting researchers to capture these natural variations in their data. Second, the event-driven model enables the wizard to respond contextually without departing from the protocol; the wizard remains guided by the sequence of available actions while having control over when to advance based on participant cues.
The system guides the wizard through the protocol step-by-step, ensuring the intended sequence is followed. Every action is logged with a timestamp whether it was scripted or not, and anything outside the protocol is flagged as a deviation. This means inconsistent wizard behavior shows up in the data rather than disappearing into it.
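A minimal sketch of that logging rule (names are hypothetical): every event is appended with its timestamp, and membership in the protocol's action set determines whether it is flagged as a deviation.

```python
def log_event(trial_log, protocol_actions, action, timestamp):
    """Record every action; anything outside the scripted set is kept
    but flagged as a deviation, rather than silently dropped."""
    trial_log.append({
        "t": timestamp,
        "action": action,
        "deviation": action not in protocol_actions,
    })

scripted = {"Greet", "Begin Story", "Ask Question", "End"}
log = []
log_event(log, scripted, "Greet", 1.0)                  # scripted
log_event(log, scripted, "Answer side question", 3.2)   # unscripted
```

Because flagging happens at write time, the analyst can later partition every trial into its scripted and improvised halves with a single filter.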
\section{Modular Interface Architecture}
Researchers interact with the system through three interfaces, each one encapsulating a specific phase of an experimental study: designing a protocol, running a trial, and reviewing the results.
\subsection{Design Interface}
The \emph{Design} interface gives researchers a drag-and-drop canvas for building experiment protocols in a visual programming environment. Researchers drag pre-built action components, including robot movements, speech, wizard instructions, and conditional logic, onto the canvas and drop them into sequence. Clicking a component opens a side panel where its parameters can be set, such as the text for a speech action or the gesture name for a movement.
By treating experiment design as a visual specification task, the interface lowers technical barriers (R2). Researchers can assemble interaction logic by dragging components into sequence and setting parameters in plain language, without writing code. The resulting protocol specification is also human-readable and shareable alongside research results. The specification is stored in a structured format that can be displayed as a timeline for analysis and executed directly by the platform's runtime.
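As an illustration of what ``structured and executable'' means in practice, a protocol of this shape can round-trip through a plain serialization format such as JSON; the field names below are assumptions, not the platform's storage schema.

```python
import json

protocol = {
    "experiment": "NAO6 with Gestures",
    "steps": [
        {"name": "Intro",
         "actions": [{"type": "speak", "text": "Hello!"}]},
        {"name": "Story Telling",
         "actions": [{"type": "gesture", "gesture": "hand_wave"},
                     {"type": "speak", "text": "Once upon a time..."}]},
    ],
}

serialized = json.dumps(protocol)   # the stored, shareable form
restored = json.loads(serialized)   # the form the runtime executes
```

A single canonical representation is what lets the same artifact be rendered as a flowchart, attached to a publication, and fed to the execution engine without translation steps.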
\subsection{Execution Interface}
During trials, the Execution interface shows the wizard exactly where they are in the protocol: the current step, the available actions, and the robot's current state, all updated in real time as the trial progresses.
The Execution interface also exposes a set of manual controls for actions that fall outside the scripted protocol. Consider a participant who asks an unexpected question mid-trial: the wizard can trigger an unscripted speech response on the spot rather than leaving the interaction to stall. This keeps the interaction feeling natural for the participant. Critically, the system does not simply ignore these moments. Every unscripted action is timestamped and written to the trial log as an explicit deviation, giving researchers a complete picture of what actually happened versus what was planned. This makes unscripted actions a feature rather than a source of noise: the wizard retains real-time control over the interaction, and the logging infrastructure captures everything needed for post-trial analysis.
Additional researchers can simultaneously access this same live view through the platform's Dashboard by selecting a trial to ``spectate.'' Multiple researchers observing the same trial view the identical synchronized display of the wizard's controls, participant interactions, and robot state, supporting real-time collaboration and interdisciplinary observation (R6). Observers can take notes and mark significant moments without interfering with the wizard's control or the participant's experience.
\subsection{Analysis Interface}
After a trial concludes, the \emph{Analysis} interface lets researchers review everything that was recorded: video of the interaction, audio, timestamped action logs, and robot sensor data, all scrubbable from a single timeline. Researchers can annotate significant moments and export segments for further analysis. Because the same platform produced both the protocol and the recording, the interface can show exactly where the execution matched the design and where it deviated, without any manual cross-referencing.
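Synchronized playback reduces to mapping each logged event's absolute timestamp onto the recording's own timeline. A sketch, with an illustrative function name:

```python
def media_offset(event_time, recording_start):
    """Map a logged event's absolute timestamp to a playback offset so
    an annotation lands at the matching point in the video or audio."""
    offset = event_time - recording_start
    if offset < 0:
        raise ValueError("event precedes the start of the recording")
    return offset
```

For example, an event logged 12.5 seconds after the recording began maps to playback offset 12.5 seconds, regardless of the trial's wall-clock start time.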
\section{Data Flow and Infrastructure Implementation}

To keep data from every experimental phase traceable, the system organizes its internals into three architectural layers and defines a clear data pathway from protocol design through post-trial analysis. The following subsections describe the layers and trace how experiment specifications, control commands, and recorded data move through them.
\subsection{Architectural Layers}
The system is structured as a three-layer architecture, each with a specific responsibility:
\begin{description}
\item[User Interface layer.] Runs in researchers' web browsers and exposes the three interfaces (Design, Execution, Analysis), managing user interactions such as clicking buttons, dragging and dropping experiment components, and reviewing experimental results.
\item[Application Logic layer.] Operates as a server process that manages experiment data, coordinates trial execution, authenticates users, and orchestrates communication between the interface and the robot.
\item[Data and Robot Control layer.] Encompasses long-term storage of experiment protocols and trial data, as well as direct communication with robot hardware.
\end{description}
This separation of concerns provides two concrete benefits. First, each layer can evolve independently: improving the user interface requires no changes to robot control logic, and swapping in a different storage backend requires no changes to the execution engine. Second, the separation enforces clear responsibilities: the user interface never directly commands robot hardware; all robot actions flow through the application logic layer, which maintains consistent logging. Figure~\ref{fig:three-tier} shows that HRIStudio separates interface behavior, execution logic, and robot/data operations into distinct layers with explicit boundaries.
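The routing constraint can be sketched in a few lines (the class names are illustrative stand-ins for the three layers): the interface object holds no robot handle, so the only path to hardware runs through the logic layer, which logs each command before forwarding it.

```python
class RobotControlLayer:
    """Stand-in for the layer that talks to hardware (e.g. via ROS)."""
    def __init__(self):
        self.sent = []
    def send(self, command):
        self.sent.append(command)

class ApplicationLogicLayer:
    """Every robot command passes through here, so logging is guaranteed."""
    def __init__(self, robot):
        self.robot = robot
        self.log = []
    def execute(self, command):
        self.log.append(command)      # logged before it reaches hardware
        self.robot.send(command)

class UserInterfaceLayer:
    """The UI holds no robot reference; it can only go through the logic layer."""
    def __init__(self, logic):
        self.logic = logic
    def click_action(self, command):
        self.logic.execute(command)

robot = RobotControlLayer()
logic = ApplicationLogicLayer(robot)
ui = UserInterfaceLayer(logic)
ui.click_action({"type": "speak", "text": "Hello"})
```

Because the command log is written in the same method that forwards to hardware, no interface code path can reach the robot without leaving a record.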
\begin{figure}[htbp]
    \centering
    \begin{tikzpicture}[
        layer/.style={rectangle, draw=black, thick, minimum width=6.5cm, minimum height=1cm, align=center, text width=6.2cm},
        arrow/.style={->, thick, line width=1.5pt}]

        % Layer 1: UI
        \node[layer, fill=gray!15] (ui) at (0, 3.5) {
            \textbf{User Interface}\\[0.1cm]
            {\small Design, Execution, Analysis}
        };

        % Layer 2: Logic
        \node[layer, fill=gray!30] (logic) at (0, 1.8) {
            \textbf{Application Logic}\\[0.1cm]
            {\small Execution, Authentication, Logger}
        };

        % Layer 3: Data
        \node[layer, fill=gray!45] (data) at (0, 0.1) {
            \textbf{Data \& Robot Control}\\[0.1cm]
            {\small Database, File Storage, ROS}
        };

        % Arrows
        \draw[arrow] (ui.south) -- (logic.north);
        \draw[arrow] (logic.south) -- (data.north);
    \end{tikzpicture}
    \caption{Three-layer architecture separating user interface, application logic, and data/robot control.}
    \label{fig:three-tier}
\end{figure}

\subsection{Data Flow Through Experimental Phases}
During the design phase, researchers create experiment specifications that are stored in the system database. During a trial, the system manages bidirectional communication between the wizard's interface and the robot control layer. All actions, sensor data, and events are streamed to a data logging service that stores complete records. After the trial, researchers can inspect these records through the Analysis interface.
The flow of data during a trial proceeds through six distinct phases, as shown in Figure~\ref{fig:trial-dataflow}. First, a researcher creates an experiment protocol using the Design interface. Second, when a trial begins, the application server loads the protocol and begins stepping through it, sending commands to the robot and waiting for events such as wizard inputs, sensor readings, or timeouts. Third, every action, both planned protocol steps and unexpected events, is immediately written to the trial log with precise timing information. Fourth, the Execution interface continuously displays the current state, allowing the wizard and observers to monitor the progress of a trial in real-time. Fifth, when the trial concludes, all recorded media (video and audio) is transferred from the browser to the server and persisted in a database as part of the trial record. Sixth, the Analysis interface retrieves the stored trial data and reconstructs exactly what happened, synchronizing notable events with the video and audio recordings.
This design ensures comprehensive documentation of every trial, supporting both fine-grained analysis and reproducibility. Researchers can review not just what they intended to happen, but what actually did happen, including timing variations and unexpected events.
\begin{figure}[htbp]
    \centering
    \begin{tikzpicture}[
        stage/.style={rectangle, draw, thick, rounded corners, minimum width=3.5cm, minimum height=1cm, align=center, font=\footnotesize},
        arrow/.style={->, thick, line width=1.3pt}]

        % Six stages stacked vertically with descriptions inside
        \node[stage, fill=gray!10] (s1) at (0, 7.5) {1. Design Protocol\\{\scriptsize Researcher creates workflow}};
        \node[stage, fill=gray!15] (s2) at (0, 6) {2. Load \& Execute\\{\scriptsize System loads and runs trial}};
        \node[stage, fill=gray!20] (s3) at (0, 4.5) {3. Log Events\\{\scriptsize Actions recorded with timestamps}};
        \node[stage, fill=gray!25] (s4) at (0, 3) {4. Display Live State\\{\scriptsize Wizard sees real-time progress}};
        \node[stage, fill=gray!30] (s5) at (0, 1.5) {5. Transfer Media\\{\scriptsize Video/audio saved to server}};
        \node[stage, fill=gray!35] (s6) at (0, 0) {6. Analyze \& Playback\\{\scriptsize Review data with synchronized media}};

        % Downward arrows
        \draw[arrow] (s1.south) -- (s2.north);
        \draw[arrow] (s2.south) -- (s3.north);
        \draw[arrow] (s3.south) -- (s4.north);
        \draw[arrow] (s4.south) -- (s5.north);
        \draw[arrow] (s5.south) -- (s6.north);
    \end{tikzpicture}
    \caption{Trial data flow: from protocol design through execution and recording, to analysis and playback.}
    \label{fig:trial-dataflow}
\end{figure}

\subsection{Requirements Satisfaction}
The design choices described in this chapter were made to meet the requirements from Chapter~\ref{ch:background}. Having the researcher work through a single platform from protocol creation to post-trial review satisfies R1 (integrated workflow: design, execution, and analysis in one environment) without extra tooling. The visual drag-and-drop Design interface removes the need for programming knowledge, satisfying R2 (low technical barriers) by keeping the system accessible to researchers without a software background. Event-driven execution satisfies R3 (real-time control) by giving the wizard control over pacing while keeping the trial on protocol. All actions are logged automatically at the system level, satisfying R4 (automated logging) without requiring researchers to add logging by hand. The three-layer architecture decouples action specifications from robot-specific commands, satisfying R5 (platform agnosticism) by letting the same protocol run on different hardware without modification. Finally, shared live views and multi-user access let interdisciplinary teams observe and annotate the same trial simultaneously, satisfying R6 (collaborative support).

\section{Chapter Summary}

This chapter described the architectural design with emphasis on how each design choice directly implements the infrastructure requirements identified in Chapter~\ref{ch:background}. The hierarchical organization of experiment specifications enables intuitive, executable design. The event-driven execution model balances protocol consistency with realistic interaction dynamics. The modular interface architecture separates concerns across design, execution, and analysis phases while maintaining data coherence. The integrated data flow ensures that reproducibility is supported by design rather than as an afterthought. The following chapter presents HRIStudio as a reference implementation of these design principles, discussing specific technologies and architectural components.

\chapter{Implementation}
\label{ch:implementation}

HRIStudio is a reference implementation of the design principles established in Chapter~\ref{ch:design}. The central contribution of this work is not the tool itself but the design principles that underpin it: the hierarchical specification model, the event-driven execution model, and the integrated data flow. Any system built on those principles would satisfy the same requirements. This chapter explains how HRIStudio realizes them, covering the architectural choices and mechanisms behind how the platform stores experiments, executes trials, integrates robot hardware, and controls access. The specific technologies used in this implementation are presented in Appendix~\ref{app:tech_docs}.

\section{Platform Architecture}

HRIStudio follows the model of a web application. Users access it through a standard browser without installing specialized software, and the entire study team, including researchers, wizards, and observers, connects to the same shared system. This eliminates the need for a local installation and ensures the platform works identically on any operating system, directly addressing the low-technical-barrier requirement (R2, from Chapter~\ref{ch:background}). It also enables collaboration (R6): multiple team members can access experiment data and observe trials simultaneously from different machines without additional configuration.

I organized the system into three layers: User Interface, Application Logic, and Data \& Robot Control. This layered structure is shown in Figure~\ref{fig:three-tier}. In this implementation, the application server and the robot control hardware must run on the same local network. This keeps communication latency low during trials: a noticeable delay between the wizard's input and the robot's response would break the interaction.

I implemented all three layers in the same language, TypeScript~\cite{TypeScript2014}, a statically typed superset of JavaScript. The single-language decision keeps the type system consistent across the full stack. When the structure of experiment data changes, the type checker surfaces inconsistencies across the entire codebase at compile time rather than allowing them to appear as runtime failures during a trial.

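The compile-time guarantee described above can be illustrated with a minimal sketch. This is not HRIStudio source code; the type and function names (\texttt{TrialEvent}, \texttt{logEvent}, \texttt{summarize}) and the field layout are hypothetical, chosen only to show how one shared type definition ties server and client code together.

```typescript
// Hypothetical sketch: one shared type used by both the server-side logger
// and the client-side timeline renderer. If a field is added, removed, or
// renamed, every use site below fails to compile, so schema drift is caught
// before any live trial rather than during one.
interface TrialEvent {
  trialId: string;
  timestampMs: number; // milliseconds since trial start
  actionType: string;  // e.g. "speak", "raise_arm"
  deviation: boolean;  // true if triggered outside the scripted protocol
}

// Server side: append an event to a trial's action log.
function logEvent(log: TrialEvent[], event: TrialEvent): TrialEvent[] {
  return [...log, event];
}

// Client side: render one log entry for the observer timeline.
function summarize(event: TrialEvent): string {
  const flag = event.deviation ? " [deviation]" : "";
  return `${event.timestampMs}ms ${event.actionType}${flag}`;
}

const log = logEvent([], {
  trialId: "t-001",
  timestampMs: 1520,
  actionType: "speak",
  deviation: false,
});
console.log(summarize(log[0])); // "1520ms speak"
```

Because both functions consume the same \texttt{TrialEvent} declaration, there is no serialization boundary at which the two sides of the system can silently disagree about the data's shape.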
\section{Experiment Storage and Trial Logging}

The system saves experiments to persistent storage when a researcher completes them in the Design interface. A saved experiment is a complete, reusable specification that a researcher can run across any number of trials without modification. In this chapter, a trial means one concrete run of an experiment protocol with one human subject; this is where spontaneous wizard deviations can occur.

When a trial begins, the system creates a new trial record linked to that experiment. The system writes every action the wizard triggers, scripted or not, to that record with a precise timestamp, and flags unscripted actions as deviations. The Execution interface records video, audio, and robot sensor data alongside the action log for the duration of the trial. Because the trial record and the experiment reference the same underlying specification, the Analysis interface can directly compare what was planned against what was executed for any trial, without any manual work by the researcher. Figure~\ref{fig:trial-record} shows the structure of a completed trial record: action log entries, video, audio, and robot sensor data share a common timestamp reference so the Analysis interface can align them without manual synchronization; dashed lines mark step boundaries; and any deviation from the experiment specification is flagged at the appropriate position in the timeline.

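Because every logged action carries a deviation flag, separating the scripted portion of a trial from the wizard's interventions is a mechanical operation. The following sketch illustrates the idea under hypothetical names (\texttt{LoggedAction}, \texttt{splitLog}); it is not HRIStudio's actual analysis code.

```typescript
// Hypothetical sketch: split a trial's action log into scripted actions and
// flagged deviations, without consulting the video or the wizard's memory.
interface LoggedAction {
  actionType: string;
  timestampMs: number;
  deviation: boolean;
}

function splitLog(log: LoggedAction[]): {
  scripted: LoggedAction[];
  deviations: LoggedAction[];
} {
  return {
    scripted: log.filter((a) => !a.deviation),
    deviations: log.filter((a) => a.deviation),
  };
}

const trialLog: LoggedAction[] = [
  { actionType: "speak", timestampMs: 1200, deviation: false },
  { actionType: "speak", timestampMs: 8400, deviation: true }, // wizard improvised
  { actionType: "raise_arm", timestampMs: 9100, deviation: false },
];

const { scripted, deviations } = splitLog(trialLog);
console.log(scripted.length, deviations.length); // 2 1
```

The same flags drive the planned-versus-executed comparison in the Analysis interface: the scripted subset is checked against the experiment specification, while the deviation subset is surfaced explicitly to the researcher.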
\begin{figure}[htbp]
\centering
\begin{tikzpicture}[
dot/.style={circle, fill=black, minimum size=6pt, inner sep=0pt},
devdot/.style={rectangle, draw=black, thick, fill=gray!50, minimum size=7pt, inner sep=0pt, rotate=45},
stepbox/.style={rectangle, draw=black, thick, fill=gray!15, align=center, font=\scriptsize, inner sep=3pt, minimum height=0.55cm},
mediabar/.style={rectangle, draw=black, thick, fill=gray!30, minimum height=0.45cm},
track/.style={font=\small, anchor=east}]

% Time axis
\draw[->, thick] (0, -0.5) -- (11.5, -0.5) node[right, font=\small\itshape] {time};
\node[font=\small] at (0.1, -0.8) {$t_0$};
\node[font=\small] at (10.9, -0.8) {$t_n$};

% Track labels
\node[track] at (-0.2, 5.2) {Experiment};
\node[track] at (-0.2, 3.9) {Action Log};
\node[track] at (-0.2, 2.9) {Video};
\node[track] at (-0.2, 1.9) {Audio};
\node[track] at (-0.2, 0.9) {Sensor Data};

% Track dividers
\foreach \y in {4.5, 3.4, 2.4, 1.4, 0.4} {
\draw[gray!35, thin] (0, \y) -- (11.0, \y);
}

% Experiment step boxes
\node[stepbox, minimum width=2.5cm] at (1.5, 5.2) {Intro};
\node[stepbox, minimum width=4.0cm] at (5.2, 5.2) {Story Telling};
\node[stepbox, minimum width=2.5cm] at (9.5, 5.2) {Recall Test};

% Step boundary markers
\draw[dashed, gray!60] (3.0, 4.5) -- (3.0, 0.4);
\draw[dashed, gray!60] (7.5, 4.5) -- (7.5, 0.4);

% Scripted actions
\node[dot] at (0.5, 3.9) {};
\node[dot] at (1.4, 3.9) {};
\node[dot] at (2.3, 3.9) {};
\node[dot] at (3.8, 3.9) {};
\node[dot] at (5.0, 3.9) {};
\node[dot] at (6.1, 3.9) {};
\node[dot] at (7.2, 3.9) {};
\node[dot] at (9.0, 3.9) {};
\node[dot] at (10.5, 3.9) {};

% Deviation marker
\node[devdot] at (5.6, 3.9) {};
\node[font=\scriptsize, above=5pt] at (5.6, 3.9) {deviation};

% Video bar
\node[mediabar, minimum width=10.8cm] at (5.4, 2.9) {};

% Audio bar
\node[mediabar, minimum width=10.8cm, fill=gray!20] at (5.4, 1.9) {};

% Sensor data (continuous sampled line)
\draw[thick, gray!60] plot[smooth] coordinates {
(0.0, 0.90) (1.0, 0.97) (2.0, 0.84) (3.0, 1.01) (4.0, 0.87)
(5.0, 0.96) (6.0, 0.83) (7.0, 0.99) (8.0, 0.86) (9.0, 0.95)
(10.0, 0.88) (11.0, 0.93)
};

\end{tikzpicture}
\caption{Structure of a completed trial record, showing synchronized action log, media, and sensor tracks.}
\label{fig:trial-record}
\end{figure}

Video and audio are recorded locally in the researcher's browser during the trial rather than streamed to the server in real time. This prevents network delays or server load from dropping frames or degrading audio quality during the interaction. When the trial concludes, the browser transfers the complete recordings to the server and associates them with the trial record. The Analysis interface can align video and audio with the logged actions without any manual synchronization, because the timestamp when recording starts is logged alongside the action log.

The system stores structured and media data separately. Experiment specifications and trial records are stored in the same structured database, which makes it efficient to query across trials (for example, retrieving all trials for a specific participant or comparing action timing across conditions). Video and audio files are stored in a dedicated file store, since their size makes them unsuitable for a database and the system never queries their content directly.

\section{The Execution Engine}

The execution engine is the component that runs a trial: it loads the experiment, manages the wizard's connection, sends robot commands, and keeps all connected clients in sync.

When a trial begins, the server loads the experiment and maintains a live connection to the wizard's browser and any observer connections. The execution engine does not advance through the actions of an experiment on a timer; instead, the wizard controls when the trial advances from action to action. This preserves the natural pacing of the interaction: the wizard advances only when the participant is ready, while the experiment structure ensures the protocol is followed. When the wizard triggers an action, the server sends the corresponding command to the robot, writes the log entry, and pushes the updated experiment state to all connected clients in the same operation, keeping the wizard's view, the observer view, and the actual robot state synchronized in real time.
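The advance-on-wizard-input cycle can be sketched as follows. This is an illustrative fragment, not the platform's actual engine; the class and method names (\texttt{ExecutionEngine}, \texttt{wizardAdvance}, \texttt{dispatch}) are assumptions made for the example.

```typescript
// Hypothetical sketch of event-driven trial execution: nothing advances on a
// timer; each step runs only when the wizard triggers it, and dispatching the
// robot command, logging, and client synchronization happen in one operation.
type Listener = (state: { index: number; lastAction: string }) => void;

class ExecutionEngine {
  private index = 0;
  private log: { action: string; at: number }[] = [];
  private listeners: Listener[] = [];

  constructor(
    private protocol: string[],                  // ordered action names
    private dispatch: (action: string) => void,  // sends the robot command
  ) {}

  onStateChange(listener: Listener) {
    this.listeners.push(listener);
  }

  // Called only when the wizard clicks "next" in the Execution interface.
  wizardAdvance(now: number) {
    if (this.index >= this.protocol.length) return; // protocol exhausted
    const action = this.protocol[this.index++];
    this.dispatch(action);              // robot command
    this.log.push({ action, at: now }); // automatic log entry
    for (const l of this.listeners) {   // sync wizard and observer views
      l({ index: this.index, lastAction: action });
    }
  }
}

const sent: string[] = [];
const engine = new ExecutionEngine(["speak", "raise_arm"], (a) => sent.push(a));
engine.wizardAdvance(1000); // sent is now ["speak"]
```

However long the participant takes to respond, no further command is issued until the next \texttt{wizardAdvance} call, which is exactly the pacing behavior described above.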

No two human subjects respond identically to an experimental protocol. One subject gives a one-word answer; another offers a paragraph; a third asks the robot a question the script never anticipated. A fully programmed robot has no answer for that third subject: the interaction stalls, or immersion breaks. The wizard exists to fill that gap: where the program runs out of instructions, the wizard draws on their knowledge of human social interaction to keep the exchange coherent. Unscripted actions give the wizard the tools to exercise that judgment in the moment. The wizard triggers them via the manual controls in the Execution interface, the robot command runs, and the system logs the action with a deviation flag. This design preserves research value: the interaction gains the flexibility only a human can provide, and that flexibility appears explicitly in the record rather than disappearing from it.

\section{Robot Integration}

A configuration file describes each robot platform, listing the actions it supports and specifying how each one maps to a command the robot understands. The execution engine reads this file at startup and uses it whenever it needs to dispatch a command: it looks up the action type, assembles the appropriate message, and sends it to the robot over a bridge process running on the local network. The web server itself has no knowledge of any specific robot; all hardware-specific logic lives in the configuration file.
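The lookup the execution engine performs can be sketched as a simple table from action names to command topics. The topic strings and the \texttt{resolve} helper below are illustrative assumptions, not the contents of an actual HRIStudio plugin.

```typescript
// Hypothetical sketch: a per-platform configuration maps abstract action
// names to robot-specific command topics. Unsupported actions are declared
// explicitly (null) so they surface as errors instead of failing silently.
type ActionMap = Record<string, string | null>;

const turtlebot: ActionMap = {
  speak: "/tts/say",
  raise_arm: null, // explicitly unsupported on this platform
  move_forward: "/cmd_vel",
};

function resolve(config: ActionMap, action: string): string {
  if (!(action in config)) throw new Error(`unknown action: ${action}`);
  const topic = config[action];
  if (topic === null) throw new Error(`action not supported: ${action}`);
  return topic;
}

console.log(resolve(turtlebot, "speak")); // "/tts/say"
```

Swapping robots means swapping this table; the experiment's abstract action names never change.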

The execution engine treats control-flow elements such as branches and conditionals the same way as robot actions. These elements appear as action groups in the experiment and are evaluated during the trial, so researchers can freely mix logical decisions and physical robot behaviors when designing an experiment without any special handling.

Figure~\ref{fig:plugin-architecture} illustrates this mapping using NAO6 and TurtleBot as an example. Actions a platform does not support (such as \texttt{raise\_arm} on TurtleBot) appear as explicitly unsupported in the configuration file rather than silently failing. Because the mapping is confined to the configuration file, the experiment itself does not change between platforms.

\begin{figure}[htbp]
\centering
\begin{tikzpicture}[
expbox/.style={rectangle, draw=black, thick, fill=gray!10, align=left, font=\small, inner sep=10pt},
cfgbox/.style={rectangle, draw=black, thick, dashed, fill=white, align=center, font=\small\itshape, inner sep=6pt},
robotbox/.style={rectangle, draw=black, thick, fill=gray!25, align=left, font=\small, inner sep=10pt},
arrow/.style={->, thick}]

% Experiment box
\node[expbox] (exp) at (0, 0) {
\textbf{Experiment}\\[4pt]
\texttt{speak(text)}\\[2pt]
\texttt{raise\_arm()}\\[2pt]
\texttt{move\_forward()}
};

% Configuration file node (intermediate)
\node[cfgbox] (cfg) at (4.5, 0) {configuration\\file};

% NAO6 box
\node[robotbox] (nao) at (9.5, 1.6) {
\textbf{NAO6}\\[4pt]
\texttt{speak} $\to$ \texttt{/nao/tts}\\[2pt]
\texttt{raise\_arm} $\to$ \texttt{/nao/arm}\\[2pt]
\texttt{move} $\to$ \texttt{/nao/move}
};

% TurtleBot box
\node[robotbox] (tb) at (9.5, -1.6) {
\textbf{TurtleBot}\\[4pt]
\texttt{speak} $\to$ \texttt{/tts/say}\\[2pt]
\texttt{raise\_arm} $\to$ \textit{(not supported)}\\[2pt]
\texttt{move} $\to$ \texttt{/cmd\_vel}
};

% Arrows
\draw[arrow] (exp.east) -- (cfg.west);
\draw[arrow] (cfg.east) -- (nao.west);
\draw[arrow] (cfg.east) -- (tb.west);

\end{tikzpicture}
\caption{Abstract experiment actions translated to platform-specific robot commands through per-platform configuration files.}
\label{fig:plugin-architecture}
\end{figure}

\section{Access Control}

I implemented a role-based access control (RBAC) model. Each study has a membership list, and each member is assigned one of four roles that define a clear separation of capabilities: those who own the study, those who design it, those who run it, and those who observe it. This enforces need-to-know access at the study level, so each team member can see or modify only what their role requires.

\begin{description}
\item[Owner.] Full control over the study: can invite or remove members, configure the study settings, and access all data.
\item[Researcher.] Can create and modify experiment designs and review all collected trial data, but cannot manage team membership.
\item[Wizard.] Can trigger actions during a trial and view the execution interface, but cannot modify the experiment design or access other wizards' sessions.
\item[Observer.] Read-only access: can watch a trial in real time and annotate significant moments, but cannot trigger actions or modify any data.
\end{description}

||||||
|
|
||||||
The implementation demonstrates that the proposed framework is technically feasible: web-based control can achieve sufficient responsiveness for live Wizard-of-Oz experiments, and a plugin architecture can provide platform abstraction without sacrificing expressiveness.
|
The role definitions above determine who can view and change data during normal study operation. The role system also supports what is known as a double-blind design~\cite{Bartneck2024}, where neither the wizard nor the researcher has access to condition assignments or results until the study concludes. For example, the Owner can restrict a Wizard's view of which condition a human subject has been assigned to, and can prevent Researchers from accessing result data until all trials are complete, without any changes to the underlying experiment.
|
||||||
|
|
||||||
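The four roles above amount to a small permission matrix. The sketch below illustrates the idea only; the permission names and the `can` helper are assumptions for illustration, not HRIStudio's actual API:

```python
# Illustrative permission matrix for the four study roles described above.
# Role and permission names are hypothetical, not HRIStudio's real schema.
PERMISSIONS = {
    "owner":      {"manage_members", "edit_design", "trigger_actions",
                   "view_trials", "annotate", "view_results"},
    "researcher": {"edit_design", "view_trials", "view_results"},
    "wizard":     {"trigger_actions", "view_trials"},
    "observer":   {"view_trials", "annotate"},
}

def can(role: str, permission: str) -> bool:
    """Return True if the given role grants the given permission."""
    return permission in PERMISSIONS.get(role, set())
```

Under this sketch, a double-blind configuration is just a further restriction of the matrix for a particular study (e.g., removing `view_results` from the Researcher role until all trials complete).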
\section{Architectural Challenges}

The following two problems required specific solutions during implementation.

\begin{description}
\item[Execution latency.] During a trial, the execution engine must respond quickly to wizard input --- a noticeable delay between the button press and the robot's action can disrupt the interaction. I addressed this by maintaining a persistent network connection to the robot bridge for the duration of each trial. The connection is established once at trial start and kept open, eliminating per-action setup overhead.
\item[Multi-source synchronization.] The Analysis interface requires aligning data streams captured at different sampling rates by different components: video, audio, action logs, and sensor data. The solution is a shared time reference: every data source records its timestamps relative to the same trial start time, $t_0$, so the Analysis interface can align all tracks without requiring manual calibration.
\end{description}
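The shared-time-reference idea reduces to simple arithmetic: subtract the trial start time from every absolute timestamp, then merge the sources into one ordered track. The sketch below is illustrative only; the field names and `align` helper are assumptions, not HRIStudio's actual log schema:

```python
# Align events from multiple sources onto one trial-relative timeline.
# Field names ("t", "source", "what") are illustrative placeholders.
def align(events, t0):
    """Convert absolute timestamps to seconds since trial start (t0),
    then merge all sources into a single time-ordered track."""
    merged = [(e["t"] - t0, e["source"], e["what"]) for e in events]
    return sorted(merged)

events = [
    {"t": 100.50, "source": "video", "what": "frame 15"},
    {"t": 100.02, "source": "log",   "what": "wizard pressed 'speak'"},
    {"t": 100.35, "source": "robot", "what": "speech started"},
]
timeline = align(events, t0=100.0)
# The wizard action, robot response, and video frame now share one clock,
# regardless of each source's native sampling rate.
```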
\section{Implementation Status}

HRIStudio has reached minimum viable product status. The Design, Execution, and Analysis interfaces are operational. The execution engine handles scripted and unscripted actions with full timestamped logging, and I validated robot communication on the NAO6 platform during development. The platform can run a controlled WoZ study without modification to its core architecture or execution workflow.

Work remaining for future development includes broader validation of the configuration file approach on robot platforms beyond NAO6.
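The configuration file approach can be pictured as a per-platform mapping from abstract action names to native commands, with the execution engine knowing only the abstract names. Everything in this sketch is an assumption for illustration --- the action names, parameter shapes, and command structure are not the actual plugin format:

```python
# Hypothetical per-platform action map. The execution engine dispatches
# abstract actions; only the plugin knows platform-specific commands.
NAO6_ACTIONS = {
    "speak":     lambda p: {"api": "ALTextToSpeech.say", "args": [p["text"]]},
    "raise_arm": lambda p: {"api": "ALMotion.setAngles", "args": [p["side"]]},
}

def execute(platform_actions, action, params):
    """Translate an abstract action into a platform command; unknown
    actions fail loudly so protocol errors surface at design time."""
    if action not in platform_actions:
        raise KeyError(f"platform does not support action {action!r}")
    return platform_actions[action](params)

cmd = execute(NAO6_ACTIONS, "speak", {"text": "Hello"})
```

Supporting a new robot then means supplying a new mapping, leaving the engine untouched.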
\section{Chapter Summary}

This chapter described how HRIStudio realizes the design principles from Chapter~\ref{ch:design} in practice. Experiments are persistent, reusable specifications that produce complete, comparable trial records. The execution engine is event-driven rather than timer-driven, keeping the wizard in control of pacing while logging every action automatically. Per-platform configuration files keep the execution engine hardware-agnostic. The role system enforces access control at the study level. The platform is at minimum viable product status and can run a controlled WoZ study today. HRIStudio is one realization of these principles; the contribution lies in the design principles themselves, which any implementation could adopt.
\chapter{Pilot Validation Study}
\label{ch:evaluation}
This chapter presents the pilot validation study used to evaluate whether HRIStudio improves accessibility and reproducibility in WoZ-based HRI research. It defines the research questions, study design, participant roles, task, apparatus, procedure, and measurement instruments.
\section{Research Questions}

The evaluation targets the two problems established in Chapter~\ref{ch:background}. The first is the \emph{Accessibility Problem}: existing tools require substantial programming expertise, which prevents domain experts from conducting independent HRI studies. The second is the \emph{Reproducibility Problem}: without structured logging and protocol enforcement, experiment execution varies across participants and wizards in ways that are difficult to detect or control after the fact.

These problems give rise to two research questions. The first asks whether HRIStudio enables domain experts without prior robotics experience to successfully implement a robot interaction from a written specification. The second asks whether HRIStudio produces more reliable execution of that interaction compared to standard practice.

I hypothesized that wizards using HRIStudio would implement the written specification more completely and correctly, and that their designs would execute more reliably during the trial, than wizards using ad hoc programs created for specific social robotics experiments, with Choregraphe as the baseline tool in this study.
\section{Study Design}

I used what Bartneck et al.~\cite{Bartneck2024} call a between-subjects design, in which each participant is assigned to only one condition. I randomly assigned each wizard participant to one of two conditions: HRIStudio or Choregraphe. Both groups received the same task, the same time allocation, and the same training structure. Measuring each participant in only one condition prevents carryover effects, meaning performance changes caused by prior exposure to another condition rather than by the assigned condition itself.

In this study, I defined two types of participants with distinct roles. Wizards were faculty members drawn from across departments who designed and ran the robot interaction. Test subjects were undergraduate students who interacted with the robot during the trial. This separation ensures that the evaluation captures both the design experience and the quality of the resulting interaction. The next section details recruitment, inclusion criteria, and sample rationale for both groups.
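The balanced random assignment described above can be sketched in a few lines. This is an illustrative procedure under stated assumptions (participant codes and the `assign_conditions` helper are hypothetical), not the exact script used in the study:

```python
import random

def assign_conditions(participants, conditions=("HRIStudio", "Choregraphe"), seed=None):
    """Randomly split participants into equal-sized groups, one per
    condition (between-subjects: each person sees exactly one tool)."""
    rng = random.Random(seed)
    shuffled = participants[:]          # copy so the input is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {conditions[0]: shuffled[:half], conditions[1]: shuffled[half:]}

# Hypothetical participant codes W-01 ... W-08.
groups = assign_conditions([f"W-{i:02d}" for i in range(1, 9)], seed=42)
```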
\section{Participants}
\textbf{Wizards.} I recruited eight Bucknell University faculty members drawn from across departments to serve as wizards. I deliberately recruited from both ends of the programming experience spectrum: four had substantial programming backgrounds, and four described themselves as non-programmers or as having minimal coding experience. This cross-departmental recruitment was intentional. A primary claim of HRIStudio is that it lowers the technical barrier for domain experts who are not programmers; drawing wizards from outside computer science allows the data to speak to whether that claim holds for the intended user population.

The key inclusion criterion for all wizards was no prior experience with either the NAO robot or the Choregraphe software. This controls for tool familiarity so that performance differences reflect the tools themselves rather than prior exposure. I recruited wizards through direct email. Participation was framed as a voluntary software evaluation unrelated to any professional obligations.

\textbf{Test subjects.} I recruited eight undergraduate students from Bucknell University to act as the subjects for the experimental protocol coded by each wizard. To eliminate any risk of coercion, I screened participants to ensure that no test subject was enrolled in a course taught by the wizard they were paired with. Recruitment used campus flyers inviting volunteers to interact with a robot for approximately 15 minutes, and all participants received international snacks and refreshments upon arrival, regardless of whether they completed the full session.

\textbf{Sample size rationale.} With $N = 16$ total participants, this sample size is appropriate for a pilot validation study whose goal is directional evidence and failure-mode identification rather than effect-size estimation for a broad population. The size matches the scope and constraints of this honors thesis: two academic semesters, one undergraduate researcher, and no funded research assistant support. It also reflects the target population and recruitment context. Faculty domain experts outside computer science with no prior NAO or Choregraphe experience are a limited pool at a small liberal arts university and have high competing time demands; eight wizard participants represent the available pool without relaxing the inclusion criteria.

This scale is consistent with pilot and feasibility studies in HRI, where small-$N$ designs are common in early-stage tool validation~\cite{Steinfeld2009}. Findings should be interpreted as preliminary evidence and directional indicators rather than as conclusive proof.
\section{Task}

Both wizard groups received the same written task specification: the \emph{Interactive Storyteller} scenario. The specification described a robot that introduces an astronaut named Dara, narrates her discovery of an anomalous glowing rock on Mars, asks the human subject a comprehension question about the story, and delivers one of two responses depending on whether the answer is correct. The full specification, including exact robot speech, required gestures, and branching logic, is reproduced in Appendix~\ref{app:materials}.

The task was chosen because it requires several distinct capabilities: speech actions, gesture coordination, conditional branching based on human-subject input, and a defined conclusion. In both conditions, wizards had to translate the same written protocol into an executable interaction script, including action ordering, branching logic, and timing decisions. In Choregraphe, that meant assembling and connecting behavior nodes in a finite state machine. In HRIStudio, it meant building a sequential action timeline with conditional branches. This makes the task a direct comparison of how each tool supports coding the robot behavior required by the same protocol.
\section{Robot Platform and Software Apparatus}

Both conditions used the same NAO humanoid robot, a platform approximately 0.58 meters tall capable of speech synthesis, animated gestures, and head movement. Using the same hardware ensured that any differences in execution quality were attributable to the software, not the robot.

Figure~\ref{fig:platform-photo-placeholders} reserves space for final platform images. Replace these placeholders with the final NAO6 and TurtleBot photos when available.

\begin{figure}[htbp]
\centering
\begin{tikzpicture}
\draw[thick] (0,0) rectangle (6,4);
\node at (3,2.5) {\textbf{NAO6 Image Placeholder}};
\node at (3,1.7) {Humanoid platform photo};

\draw[thick] (7,0) rectangle (13,4);
\node at (10,2.5) {\textbf{TurtleBot Image Placeholder}};
\node at (10,1.7) {Mobile base platform photo};
\end{tikzpicture}
\caption{Placeholder image slots for NAO6 and TurtleBot platforms.}
\label{fig:platform-photo-placeholders}
\end{figure}

The control condition used Choregraphe~\cite{Pot2009}, a proprietary visual programming tool developed by Aldebaran Robotics and the standard software for NAO programming. Choregraphe organizes behavior as a finite state machine: nodes represent states and edges represent transitions triggered by conditions or timers.

The experimental condition used HRIStudio, described in Chapter~\ref{ch:implementation}. HRIStudio organizes behavior as a sequential action timeline with support for conditional branches. Unlike Choregraphe, it abstracts robot-specific commands through configuration files, though for this study both tools controlled the same NAO platform.
\section{Procedure}

Each wizard completed a single 75-minute session structured in four phases. Each session was run by one wizard and included one test subject during the trial phase, which lasted approximately 15 minutes.
\subsection{Phase 1: Training (15 minutes)}

I opened each session with a standardized tutorial tailored to the wizard's assigned tool. The tutorial covered how to create speech actions, specify gestures, define conditional branches, and save the completed design. Training was intentionally brief to simulate a domain expert encountering a new tool without dedicated onboarding. I answered clarification questions but did not offer hints about the design challenge.

\subsection{Phase 2: Design Challenge (30 minutes)}

The wizard received the paper specification and had 30 minutes to implement it using their assigned tool. I observed silently and recorded a screen capture of the wizard's workflow throughout. I noted time to completion, help requests, and any observable errors or misconceptions. If the wizard declared completion before the 30-minute limit, the remaining time was used to review and refine the design.

\subsection{Phase 3: Trial (15 minutes)}

After the design phase, a test subject entered the room and the wizard ran their completed program to control the robot during an actual interaction. I video-recorded the full trial to capture robot behavior and timing. I told the test subject that they were helping evaluate the robot's performance, not being evaluated themselves.

\subsection{Phase 4: Debrief (15 minutes)}

Following the trial, the wizard exported their completed project file and completed the System Usability Scale survey. The exported file and the video recording served as the primary artifacts for scoring.
\section{Measures}
\label{sec:measures}

The study collected four measures, two primary and two supplementary.
\subsection{Design Fidelity Score}

The Design Fidelity Score (DFS) measures how completely and correctly the wizard implemented the paper specification. I evaluated the exported project file against five criteria: whether all four interaction steps were present, whether robot speech matched the specification word-for-word, whether gestures were assigned to the correct steps, whether the conditional branch triggered on the correct condition, and whether both response branches were complete and correctly ordered. I scored each criterion as met or not met; the DFS is the proportion of criteria satisfied.

This measure is motivated by a gap identified by Riek~\cite{Riek2012}, whose systematic review of 54 published WoZ studies found that only 11\% constrained what the wizard could recognize and fewer than 6\% described wizard training procedures, meaning the vast majority of WoZ studies never verified whether the wizard's design matched any formal specification. Porfirio et al.~\cite{Porfirio2023} similarly argued that formal, verifiable behavior specifications are a prerequisite for reproducible HRI, and the preliminary design of HRIStudio identified specification adherence as a primary evaluation target~\cite{OConnor2024}. The DFS applies these recommendations as a rubric scored against the exported project file. The complete rubric is reproduced in Appendix~\ref{app:materials}. This measure addresses accessibility: did the tool allow a non-expert to produce a correct design?
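Once each criterion is marked met or not met, computing the DFS is mechanical. A minimal sketch, with criterion labels abbreviated from the five criteria listed above (the labels and helper name are illustrative, not the rubric's exact wording):

```python
# Design Fidelity Score: proportion of rubric criteria scored as met.
def design_fidelity_score(criteria_met: dict) -> float:
    """Return the fraction of criteria marked True (met)."""
    return sum(criteria_met.values()) / len(criteria_met)

# Example scoring: a design that misses only the branch condition.
example = {
    "all_four_steps_present":      True,
    "speech_matches_spec":         True,
    "gestures_on_correct_steps":   True,
    "branch_on_correct_condition": False,
    "both_branches_complete":      True,
}
```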
\subsection{Execution Reliability Score}

The Execution Reliability Score (ERS) measures whether the designed interaction executed as intended during the trial. I reviewed the video recording against the specification and the wizard's design. Evaluation criteria included whether the robot delivered the correct speech at each step, whether gestures executed and synchronized with speech, whether the conditional branch resolved correctly based on the test subject's answer, and whether any errors, disconnections, or hangs occurred. The score is the proportion of the interaction that executed without error.

This measure responds directly to Riek's~\cite{Riek2012} finding that only 3.7\% of published WoZ studies reported any measure of wizard error, making it nearly impossible to determine whether execution matched design intent. Without an execution-level metric, a study could report a technically correct design that nonetheless failed during the trial due to timing errors, disconnections, or mishandled branches; this is exactly the kind of problem HRIStudio was designed to detect and log~\cite{OConnor2024, OConnor2025}. The ERS captures those deviations quantitatively. The complete rubric is reproduced in Appendix~\ref{app:materials}. This measure addresses reproducibility: did the design translate reliably into execution?
\subsection{System Usability Scale}

The System Usability Scale (SUS) is a validated 10-item questionnaire measuring perceived usability~\cite{Brooke1996}. Wizards completed the SUS during the debrief phase. Scores range from 0 to 100, with higher scores indicating better perceived usability. The full questionnaire is reproduced in Appendix~\ref{app:materials}.
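SUS scoring follows Brooke's standard procedure: odd-numbered items contribute (response $-$ 1), even-numbered items contribute (5 $-$ response), and the sum is multiplied by 2.5 to yield a 0--100 score. A direct implementation of that formula:

```python
# Standard SUS scoring (Brooke, 1996).
def sus_score(responses):
    """Compute the SUS score from ten 1-5 Likert responses, item 1 first.
    Odd items: response - 1; even items: 5 - response; sum scaled by 2.5."""
    assert len(responses) == 10, "SUS has exactly ten items"
    total = sum((r - 1) if i % 2 == 0 else (5 - r)
                for i, r in enumerate(responses))
    return total * 2.5
```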
\subsection{Time-to-Completion and Help Requests}

Time-to-completion measures how long the wizard took to declare the design finished within the 30-minute window. Help request count and type capture where participants encountered difficulty. These supplementary measures provide context for interpreting the primary scores.
\section{Measurement Instruments}

Table~\ref{tbl:measurement_instruments} summarizes the four instruments, when they were collected, and which research question each addresses.

\begin{table}[htbp]
\centering
\footnotesize
\begin{tabular}{|p{3.2cm}|p{4.2cm}|p{2.4cm}|p{3cm}|}
\hline
\textbf{Instrument} & \textbf{What it captures} & \textbf{When collected} & \textbf{Research question} \\
\hline
Design Fidelity Score & Completeness and correctness of the wizard's implementation against the written specification & End of design phase & Accessibility \\
\hline
Execution Reliability Score & Whether the interaction executed as designed during the trial & Post-trial video review & Reproducibility \\
\hline
System Usability Scale & Wizard's perceived usability of the assigned tool & Debrief phase & User experience \\
\hline
Time-to-Completion \& Help Requests & Task duration and support requests during design & Throughout design phase & Supplementary \\
\hline
\end{tabular}
\caption{Measurement instruments used in the pilot validation study.}
\label{tbl:measurement_instruments}
\end{table}
\section{Chapter Summary}

This chapter described a pilot between-subjects study I designed to test whether the design principles formalized in Chapters~\ref{ch:design} and~\ref{ch:implementation} produce measurably different outcomes from existing practice. Eight wizard participants (four with programming backgrounds and four without) each designed and ran the Interactive Storyteller task on a NAO robot using either HRIStudio or Choregraphe. I measured design fidelity against the written specification, execution reliability during the trial, perceived usability via the SUS, and supplementary timing and help-request data. Chapter~\ref{ch:results} presents the results.
\chapter{Study Materials}
\label{app:materials}

This appendix contains the study materials used in the evaluation described in Chapter~\ref{ch:evaluation}, in the order they were presented to participants.
\section{Recruitment Materials}

\subsection*{Email Invitation (Wizard Participants)}
\textit{Subject: Invitation to evaluate Human-Robot Interaction software (International Snacks provided!)}

Dear [Professor Name],

I am conducting an honors thesis study to evaluate ``HRIStudio'', a new platform for designing human-robot interactions. I am seeking Computer Science faculty to act as expert reviewers by participating in a 75-minute Wizard-of-Oz design session.

You will be asked to spend 30 minutes programming a simple behavior on the NAO robot using either HRIStudio or Choregraphe, and then run it live with a student volunteer. No prior experience with the NAO robot is required.

International snacks and refreshments will be provided during the session. If you are willing to participate, please reply to schedule a time.

\hfill Sean O'Connor (\texttt{sso005@bucknell.edu})

\subsection*{Campus Flyer (Test Subject Participants)}

\begin{center}
\textbf{\large VOLUNTEERS NEEDED: INTERACT WITH A ROBOT!}

\vspace{0.4cm}
Participate in a short 15-minute session with a NAO humanoid robot.

\vspace{0.4cm}
\textbf{Snacks from around the world will be provided!}

\vspace{0.2cm}
Contact: \texttt{sso005@bucknell.edu}
\end{center}

\section{Informed Consent Forms}

\subsection*{Wizard Participant Consent Form}

\textbf{HRIStudio User Study --- Informed Consent (Faculty/Wizard Participant)}

\textbf{Introduction:} You are invited to participate in a research study evaluating a new software platform for the NAO robot. This study is conducted by Sean O'Connor (Student PI) and Dr.~L.~Felipe Perrone (Advisor) in the Department of Computer Science at Bucknell University.

\textbf{Purpose:} The purpose of this study is to compare the usability and reproducibility of a new visual programming tool (HRIStudio) against the standard software (Choregraphe).

\textbf{Procedures:} If you agree to participate, you will complete the following in a single 75-minute session:
\begin{enumerate}
  \item \textbf{Training (15 min):} A brief tutorial on your assigned software interface covering speech, gesture, and branching.
  \item \textbf{Design Challenge (30 min):} You will receive a written storyboard and program it on the NAO robot using your assigned tool.
  \item \textbf{Live Trial (15 min):} A student volunteer will enter the room, and you will run your program to deliver the story to them.
  \item \textbf{Debrief (15 min):} You will complete a short usability survey.
\end{enumerate}

\textbf{Data Collection:} Your workflow will be screen-recorded during the design phase. The live trial will be video-recorded to verify robot behavior. All data will be stored on encrypted drives, and your identity will be replaced with a numerical code (e.g., W-01).

\textbf{Risks and Benefits:} There are no known risks beyond those of normal computer use. You will receive international snacks and refreshments during the session. Your participation contributes to research on accessible tools for HRI.

\textbf{Voluntary Participation:} Participation is entirely voluntary and unrelated to any departmental obligations. You may withdraw at any time without penalty.

\textbf{Questions:} Contact Sean O'Connor (\texttt{sso005@bucknell.edu}) or the Bucknell IRB (\texttt{irb@bucknell.edu}).

\vspace{0.8cm}
\noindent\rule{0.55\textwidth}{0.4pt}\\
Signature of Participant \hspace{4cm} Date

\vspace{1.2cm}

\subsection*{Test Subject Consent Form}

\textbf{HRIStudio User Study --- Informed Consent (Student/Test Subject)}

\textbf{Introduction:} You are invited to participate in a 15-minute robot interaction session as part of a research study conducted in the Bucknell Computer Science Department.

\textbf{Procedure:} You will enter a lab room and listen to a short story told by a NAO humanoid robot. The robot will then ask you a comprehension question. The interaction takes approximately 5--10 minutes.

\textbf{Data Collection:} The session will be video-recorded to analyze the robot's timing and behavior. Your responses are not being graded; we are evaluating the robot's performance, not yours.

\textbf{Risks and Benefits:} This study involves minimal risk. You will receive international snacks and refreshments for your time.

\textbf{Voluntary Participation:} You may stop the interaction and leave at any time without penalty.

\vspace{0.8cm}
\noindent\rule{0.55\textwidth}{0.4pt}\\
Signature of Participant \hspace{4cm} Date

\section{Paper Specification: The Interactive Storyteller}

\textit{This document was given to each wizard participant at the start of the Design Phase.}

\textbf{Goal:} Program the robot to tell a short interactive story to a participant. The robot must introduce the story, deliver the narrative with appropriate gestures, ask a comprehension question, and respond to the participant's answer.

\textbf{Script and Logic Flow:}

\begin{enumerate}
  \item \textbf{Start State}
  \begin{itemize}
    \item Robot is standing and looking at the participant.
  \end{itemize}

  \item \textbf{Step 1 --- The Hook}
  \begin{itemize}
    \item \textbf{Speech:} ``Hello. I want to tell you about someone named Dara --- an astronaut who made a decision that changed what we thought we knew about Mars. Are you ready?''
    \item \textbf{Gesture:} Perform a slow open-hand gesture toward the participant, then lower both arms and stand still before continuing.
  \end{itemize}

  \item \textbf{Step 2 --- The Narrative}
  \begin{itemize}
    \item \textbf{Speech:} ``It was 2147. Dara's crew had been on the Martian surface for six days. Mission protocol said to collect samples, document the terrain, and stay on schedule. But on the sixth morning, while the rest of the crew ran diagnostics, Dara wandered off course. About forty meters from camp, she stopped. Half-buried in the dust was a rock she almost stepped on --- smooth, the size of a fist, and glowing a deep, steady red. Not reflecting sunlight. Glowing. She knelt down, picked it up, and said nothing to anyone.''
    \item \textbf{Gesture 1:} As the robot says ``stay on schedule,'' make a precise, dismissive hand wave.
    \item \textbf{Gesture 2:} As the robot says ``she stopped,'' pause all motion for one full second.
    \item \textbf{Gesture 3:} As the robot says ``glowing a deep, steady red,'' look slowly downward.
    \item \textbf{Gesture 4:} As the robot says ``said nothing to anyone,'' lean slightly forward and lower the voice.
  \end{itemize}

  \item \textbf{Step 3 --- Comprehension Check (Branching)}
  \begin{itemize}
    \item \textbf{Speech:} ``She brought it home. The mission report listed it as an anomalous geological sample. NASA has been running tests on it ever since. No one has published anything yet.''
    \item \textbf{Gesture:} Stand upright, look directly at the participant, and pause for one full second.
    \item \textbf{Question:} ``What color was the rock Dara found?''
    \item \textbf{Branch A (Correct answer: ``Red'' or ``red''):}
    \begin{itemize}
      \item \textbf{Speech:} ``Red. And it was still glowing when she landed.''
      \item \textbf{Gesture:} Robot nods once, slowly.
    \end{itemize}
    \item \textbf{Branch B (Any other answer):}
    \begin{itemize}
      \item \textbf{Speech:} ``Actually, red. Not reflecting light --- emitting it.''
      \item \textbf{Gesture:} Robot shakes head once.
    \end{itemize}
  \end{itemize}

  \item \textbf{Step 4 --- Conclusion}
  \begin{itemize}
    \item \textbf{Speech:} ``That was six years ago. The rock is in a lab in Houston. Dara still hasn't told anyone exactly where she found it. That's the end of the story.''
    \item \textbf{Gesture:} Stand still, lower arms to sides, and bow.
  \end{itemize}
\end{enumerate}

\section{Post-Study Questionnaire (System Usability Scale)}

\textit{Completed by wizard participants after the live trial. Circle the number that best reflects your agreement with each statement.}

\vspace{0.4cm}
\noindent
\renewcommand{\arraystretch}{2.2}
\begin{tabularx}{\linewidth}{X *{5}{>{\centering\arraybackslash}p{0.85cm}}}
\textbf{Statement} & \textbf{1} & \textbf{2} & \textbf{3} & \textbf{4} & \textbf{5} \\
\textit{\footnotesize (Circle one per row)}
 & \textit{\footnotesize SD} & & & & \textit{\footnotesize SA} \\
\hline
1.\enspace I think that I would like to use this system frequently.
 & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ \\
2.\enspace I found the system unnecessarily complex.
 & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ \\
3.\enspace I thought the system was easy to use.
 & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ \\
4.\enspace I think that I would need the support of a technical person to be able to use this system.
 & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ \\
5.\enspace I found the various functions in this system were well integrated.
 & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ \\
6.\enspace I thought there was too much inconsistency in this system.
 & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ \\
7.\enspace I would imagine that most people would learn to use this system very quickly.
 & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ \\
8.\enspace I found the system very cumbersome to use.
 & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ \\
9.\enspace I felt very confident using the system.
 & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ \\
10.\enspace I needed to learn a lot of things before I could get going with this system.
 & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ & $\bigcirc$ \\
\hline
\end{tabularx}
\renewcommand{\arraystretch}{1}

\vspace{0.4cm}
\noindent\textit{\footnotesize SD = Strongly Disagree \quad SA = Strongly Agree}

\section{Design Fidelity Score Rubric}

\textit{To be completed by the researcher after analyzing the exported project file.}

\vspace{0.3cm}
\noindent\textbf{Participant ID:} \underline{\hspace{3cm}} \hspace{1cm} \textbf{Condition:} \underline{\hspace{3cm}}

\vspace{0.4cm}
\renewcommand{\arraystretch}{1.6}
\begin{tabularx}{\linewidth}{X >{\centering\arraybackslash}p{1.4cm} >{\centering\arraybackslash}p{1.4cm} >{\centering\arraybackslash}p{1.4cm}}
\hline
\textbf{Component} & \textbf{Present} & \textbf{Correct} & \textbf{Points} \\
\hline
\multicolumn{4}{l}{\textbf{Speech Actions (40 points total)}} \\
\hline
1.\enspace Introduction speech (``Hello. I want to tell you about someone named Dara\ldots'') & Y~~/~~N & Y~~/~~N & ~~~~~/10 \\
2.\enspace Narrative speech (``It was 2147. Dara's crew\ldots'') & Y~~/~~N & Y~~/~~N & ~~~~~/10 \\
3.\enspace Question speech (``What color was the rock Dara found?'') & Y~~/~~N & Y~~/~~N & ~~~~~/10 \\
4.\enspace Response speeches (correct and incorrect branches) & Y~~/~~N & Y~~/~~N & ~~~~~/10 \\
\hline
\multicolumn{4}{l}{\textbf{Gestures and Actions (30 points total)}} \\
\hline
5.\enspace Open-hand gesture during introduction & Y~~/~~N & Y~~/~~N & ~~~~~/10 \\
6.\enspace At least two narrative gestures (pause, lean, gaze) & Y~~/~~N & Y~~/~~N & ~~~~~/10 \\
7.\enspace Nod (correct branch) or head shake (incorrect branch) & Y~~/~~N & Y~~/~~N & ~~~~~/10 \\
\hline
\multicolumn{4}{l}{\textbf{Control Flow and Logic (30 points total)}} \\
\hline
8.\enspace Conditional branch triggers on participant's answer & Y~~/~~N & Y~~/~~N & ~~~~~/15 \\
9.\enspace Correct sequencing of all four steps & Y~~/~~N & Y~~/~~N & ~~~~~/15 \\
\hline
\end{tabularx}
\renewcommand{\arraystretch}{1}

\vspace{0.4cm}
\noindent\textbf{Scoring:} Award full points if both Present \emph{and} Correct; 50\% if Present but not Correct; 0 if not Present.

\vspace{0.2cm}
\noindent\textbf{Total:} \underline{\hspace{2cm}} / 100 \hspace{1.5cm} \textbf{Design Fidelity Score:} \underline{\hspace{2cm}}\%

\vspace{0.3cm}
\noindent\textbf{Notes:}

\vspace{2.5cm}

\section{Execution Reliability Score Rubric}

\textit{To be completed by the researcher after reviewing the video recording of the live trial.}

\vspace{0.3cm}
\noindent\textbf{Participant ID:} \underline{\hspace{3cm}} \hspace{0.5cm} \textbf{Condition:} \underline{\hspace{3cm}}

\vspace{0.2cm}
\noindent\textbf{Video File:} \underline{\hspace{6cm}}

\vspace{0.4cm}
\renewcommand{\arraystretch}{1.6}
\begin{tabularx}{\linewidth}{X >{\centering\arraybackslash}p{1.4cm} >{\centering\arraybackslash}p{1.6cm} >{\centering\arraybackslash}p{1.4cm}}
\hline
\textbf{Behavior} & \textbf{Executed?} & \textbf{Correctly?} & \textbf{Points} \\
\hline
\multicolumn{4}{l}{\textbf{Speech Execution (40 points total)}} \\
\hline
1.\enspace Introduction speech delivered without errors & Y~~/~~N & Y~~/~~N & ~~~~~/10 \\
2.\enspace Narrative speech delivered without errors & Y~~/~~N & Y~~/~~N & ~~~~~/10 \\
3.\enspace Comprehension question delivered correctly & Y~~/~~N & Y~~/~~N & ~~~~~/10 \\
4.\enspace Appropriate branch response given & Y~~/~~N & Y~~/~~N & ~~~~~/10 \\
\hline
\multicolumn{4}{l}{\textbf{Gesture and Movement Execution (30 points total)}} \\
\hline
5.\enspace Introduction gesture executed completely & Y~~/~~N & Y~~/~~N & ~~~~~/10 \\
6.\enspace At least two narrative gestures executed & Y~~/~~N & Y~~/~~N & ~~~~~/10 \\
7.\enspace Nod or head shake executed correctly & Y~~/~~N & Y~~/~~N & ~~~~~/10 \\
\hline
\multicolumn{4}{l}{\textbf{Timing and Synchronization (20 points total)}} \\
\hline
8.\enspace Speech and gestures synchronized & Y~~/~~N & Y~~/~~N & ~~~~~/10 \\
9.\enspace Pause held before comprehension question & Y~~/~~N & Y~~/~~N & ~~~~~/10 \\
\hline
\multicolumn{4}{l}{\textbf{System Reliability (10 points --- deduct if problems occur)}} \\
\hline
10.\enspace No disconnections, crashes, or hangs occurred & Y~~/~~N & N/A & ~~~~~/10 \\
\hline
\end{tabularx}
\renewcommand{\arraystretch}{1}

\vspace{0.4cm}
\noindent\textbf{Scoring:} Award full points if both Executed \emph{and} Correct; 50\% if Executed but not Correct; 0 if not Executed. For item 10, award full points only if no errors occurred.

\vspace{0.2cm}
\noindent\textbf{Total:} \underline{\hspace{2cm}} / 100 \hspace{1.5cm} \textbf{Execution Reliability Score:} \underline{\hspace{2cm}}\%

\vspace{0.3cm}
\noindent\textbf{Notes:}

\vspace{2.5cm}

\chapter{Technical Documentation}
\label{app:tech_docs}

This appendix documents the specific technologies and libraries used to build HRIStudio, organized by the three architectural layers described in Chapter~\ref{ch:design}. The goal here is reference, not justification; Chapter~\ref{ch:implementation} explains the reasoning behind the major architectural choices.

\section{Technology Stack}

\subsection{User Interface Layer}

The frontend is built on Next.js (App Router) using React and TypeScript. TypeScript is used throughout the entire codebase, including the server and data access layers, so that type inconsistencies between layers are caught at compile time rather than at runtime. Styling is handled with Tailwind CSS and the shadcn/ui component library. The drag-and-drop canvas in the Design interface uses the \texttt{@dnd-kit} library (\texttt{@dnd-kit/core} and \texttt{@dnd-kit/sortable}) to manage nested drag operations for arranging steps and action blocks.

\subsection{Application Logic Layer}

The server runs as a Next.js Node.js process. API routes use tRPC over HTTP for typed request/response calls; real-time communication during live trials uses a persistent WebSocket connection via the \texttt{ws} package. Authentication and session management are handled by NextAuth.js (v5 beta) with the \texttt{@auth/drizzle-adapter} and bcryptjs for password hashing. At present, only credential-based (username and password) authentication is supported.

\subsection{Data and Robot Control Layer}

Experiment protocols, trial records, and user data are stored in PostgreSQL. The schema and all database queries are managed through Drizzle ORM, which provides compile-time type safety for database interactions. Action configuration parameters and plugin-specific fields are stored as JSONB columns, which allows the same schema to accommodate any robot's action types.

Video and audio recordings captured during trials are stored in a self-hosted MinIO instance, an S3-compatible object storage service. Recordings are captured in the browser using the native MediaRecorder API (assisted by \texttt{react-webcam}) and uploaded to MinIO as a chunked transfer when the trial concludes.

Robot communication is handled through a ROS Bridge (\texttt{rosbridge\_suite} or \texttt{ros2-web-bridge}) running on the robot's local network. The server connects to the bridge over a WebSocket and exchanges JSON-encoded ROS messages; it does not run as a ROS node itself. The bridge address is configured per robot in the plugin file (for example, \texttt{"rosbridgeUrl": "ws://localhost:9090"} in the NAO6 plugin).
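
For reference, a publish operation over the bridge is a JSON envelope of this shape; the \texttt{/speech} topic and \texttt{std\_msgs/String} type shown here follow the NAO6 plugin's speech action, and other actions differ only in topic, type, and payload:

\begin{verbatim}
{
  "op": "publish",
  "topic": "/speech",
  "type": "std_msgs/String",
  "msg": { "data": "Hello, how are you?" }
}
\end{verbatim}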

\section{Deployment}

The full stack is orchestrated using Docker Compose. The \texttt{docker-compose.yml} file defines three services: the PostgreSQL database (\texttt{postgres:15}), the MinIO storage instance, and the Next.js application server. Starting the entire system on any machine with Docker installed requires only a single \texttt{docker compose up} command. This configuration is intended for on-premises deployment, which is important for studies involving participant data that cannot leave the institution's network.
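
As an illustration, the database service in \texttt{docker-compose.yml} takes roughly the following form; the credentials shown are development defaults, not production values:

\begin{verbatim}
services:
  db:
    image: postgres:15
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: hristudio
    volumes:
      - postgres_data:/var/lib/postgresql/data
\end{verbatim}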

\section{Plugin Specification}

Robot capabilities are defined in JSON plugin files. Each file describes a robot platform and the actions it supports. The structure of a plugin file is as follows:
\begin{itemize}
  \item \textbf{Metadata}: name, version, and a human-readable description of the platform.
  \item \textbf{ROS configuration} (\texttt{ros2Config}): the bridge URL and any global connection parameters.
  \item \textbf{Actions}: an array of action definitions. Each action specifies:
  \begin{itemize}
    \item a unique action type identifier (e.g., \texttt{speak}, \texttt{raise\_arm});
    \item a human-readable label shown in the Design interface;
    \item a parameter schema defining the input fields the researcher configures;
    \item the target ROS topic and message type;
    \item a mapping from parameter names to message fields.
  \end{itemize}
\end{itemize}
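
For example, the ``Say Text'' action in the NAO6 mock plugin is defined as follows; the \texttt{\{\{text\}\}} template in the payload mapping is replaced with the configured parameter value at runtime:

\begin{verbatim}
{
  "id": "say_text",
  "name": "Say Text",
  "category": "speech",
  "parameterSchema": {
    "type": "object",
    "properties": {
      "text": {
        "type": "string",
        "description": "Text to speak",
        "default": "Hello"
      }
    },
    "required": ["text"]
  },
  "ros2": {
    "messageType": "std_msgs/String",
    "topic": "/speech",
    "payloadMapping": {
      "type": "template",
      "payload": { "data": "{{text}}" }
    }
  }
}
\end{verbatim}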

When the server dispatches a robot command, it loads the active plugin, locates the matching action definition, constructs the ROS message by applying the parameter mapping, and sends it to the bridge. Adding a new robot means writing a new plugin file; no server code changes are required.
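
The template-substitution step of that dispatch can be sketched as follows. This is a minimal illustration, not the actual HRIStudio implementation; \texttt{applyTemplate} and the \texttt{Json} type are hypothetical names.

```typescript
// Sketch of plugin payload-mapping application (names hypothetical).
// Given a template payload such as { data: "{{text}}" } and the parameters a
// researcher configured, produce the JSON message body sent to the bridge.
type Json = string | number | boolean | null | Json[] | { [k: string]: Json };

function applyTemplate(payload: Json, params: Record<string, string>): Json {
  if (typeof payload === "string") {
    // Replace every {{name}} placeholder with the matching parameter value.
    return payload.replace(/\{\{(\w+)\}\}/g, (_, name) => params[name] ?? "");
  }
  if (Array.isArray(payload)) {
    return payload.map((v) => applyTemplate(v, params));
  }
  if (payload !== null && typeof payload === "object") {
    return Object.fromEntries(
      Object.entries(payload).map(([k, v]) => [k, applyTemplate(v, params)]),
    );
  }
  return payload; // numbers, booleans, and null pass through unchanged
}

// Example: the say_text action's template payload.
const msg = applyTemplate({ data: "{{text}}" }, { text: "Hello, how are you?" });
// msg is { data: "Hello, how are you?" }
```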

\section{Role-Based Access Control}

HRIStudio uses a two-layer role system. System roles (\texttt{systemRoleEnum}) govern what a user can do across the platform: \emph{administrator}, \emph{researcher}, \emph{wizard}, and \emph{observer}. Study roles (\texttt{studyMemberRoleEnum}) govern what a user can see and do within a specific study: \emph{owner}, \emph{researcher}, \emph{wizard}, and \emph{observer}. A user's system role and study role are checked independently, so a user who is a wizard on one study can be an observer on another without any additional configuration.
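
A minimal sketch of such an independent per-study check follows; the type and function names are hypothetical, and only the role values mirror the enums above:

```typescript
// Sketch of a per-study permission check (names hypothetical).
type SystemRole = "administrator" | "researcher" | "wizard" | "observer";
type StudyRole = "owner" | "researcher" | "wizard" | "observer";

interface User {
  systemRole: SystemRole;
  // Study memberships are keyed by study id; roles are granted per study.
  studyRoles: Record<string, StudyRole>;
}

// A user may operate trials on a study only if their membership on *that*
// study grants it; roles held on other studies are never consulted.
function canRunTrials(user: User, studyId: string): boolean {
  const role = user.studyRoles[studyId];
  return role === "owner" || role === "researcher" || role === "wizard";
}

const alice: User = {
  systemRole: "researcher",
  studyRoles: { "study-a": "wizard", "study-b": "observer" },
};
// alice can run trials on study-a but not on study-b
```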

doi = {10.1145/3610978.3640741}
}

@InProceedings{TypeScript2014,
  author    = {Bierman, Gavin and Abadi, Mart{\'i}n and Torgersen, Mads},
  editor    = {Jones, Richard},
  title     = {Understanding TypeScript},
  booktitle = {ECOOP 2014 -- Object-Oriented Programming},
  year      = {2014},
  publisher = {Springer Berlin Heidelberg},
  address   = {Berlin, Heidelberg},
  pages     = {257--281},
  isbn      = {978-3-662-44202-9}
}
|
|
||||||
@misc{Nextjs2024,
|
@article{Brooke1996,
|
||||||
title={{Next.js: The React Framework for the Web}},
|
author = {Brooke, John},
|
||||||
author={Vercel},
|
year = {1995},
|
||||||
year={2024},
|
month = {11},
|
||||||
url={https://nextjs.org}
|
pages = {},
|
||||||
|
title = {SUS: A quick and dirty usability scale},
|
||||||
|
volume = {189},
|
||||||
|
journal = {Usability Eval. Ind.}
|
||||||
}
|
}
|
||||||
|
|
||||||
@misc{TypeScript2024,
|
|
||||||
title={{TypeScript: Typed JavaScript at Any Scale}},
|
|
||||||
author={{Microsoft and the TypeScript Community}},
|
|
||||||
year={2024},
|
|
||||||
url={https://www.typescriptlang.org}
|
|
||||||
}
|
|
||||||
|
|
||||||
@misc{tRPC2024,
|
|
||||||
title={{tRPC: Move fast and break nothing. End-to-end typesafe APIs made easy}},
|
|
||||||
author={Alex Johansson and community contributors},
|
|
||||||
year={2024},
|
|
||||||
url={https://trpc.io}
|
|
||||||
}
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
+10 −2
@@ -4,9 +4,16 @@
 %\usepackage{graphics} %Select graphics package
 \usepackage{graphicx} %
 %\usepackage{amsthm} %Add other packages as necessary
+\usepackage{array} %Extended column types and \arraybackslash
+\usepackage{tabularx} %Auto-width table columns
 \usepackage{tikz} %For programmatic diagrams
-\usetikzlibrary{shapes,arrows,positioning,fit,backgrounds}
+\usetikzlibrary{shapes,arrows,positioning,fit,backgrounds,decorations.pathreplacing}
-\usepackage[hidelinks]{hyperref} %Enable hyperlinks and \autoref, hide colored boxes
+\usepackage[
+hidelinks,
+linktoc=all,
+pdfpagemode=UseOutlines
+]{hyperref} %Enable hyperlinks and PDF bookmarks
+\hyphenation{HRIStudio}
 \begin{document}
 \butitle{A Web-Based Wizard-of-Oz Platform for Collaborative and Reproducible Human-Robot Interaction Research}
 \author{Sean O'Connor}
@@ -61,6 +68,7 @@
 %J.~Good Phys., {\bf 2}, 294 (2004).
 %\end{thebibliography}
 
+\makeatletter\@mainmattertrue\makeatother
 \appendix
 \include{chapters/app_materials}
 \include{chapters/app_tech_docs}