<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.3 20210610//EN" "https://jats.nlm.nih.gov/publishing/1.3/JATS-journalpublishing1-3.dtd">
<!--<?xml-stylesheet type="text/xsl" href="article.xsl"?>-->
<article article-type="research-article" dtd-version="1.3" xml:lang="en" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id journal-id-type="issn">0000-0000</journal-id>
<journal-title-group>
<journal-title>Artificial Intelligence Advances in Education</journal-title>
</journal-title-group>
<issn publication-format="electronic">0000-0000</issn>
<publisher>
<publisher-name>SCS Journals</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.0000/XXXX.xxxx</article-id>
<article-version>VoR</article-version>
<article-categories>
<subj-group>
<subject>Original research</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>A neuro-symbolic approach for automatic assessment in ordinary differential equations</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<contrib-id contrib-id-type="orcid">https://orcid.org/0000-0001-8164-6759</contrib-id>
<name>
<surname>Garc&#237;a</surname>
<given-names>P.</given-names>
</name>
<email>pgarcial@ucab.edu.ve</email>
<xref ref-type="aff" rid="aff-1">1</xref>
<xref ref-type="aff" rid="aff-2">2</xref>
</contrib>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">https://orcid.org/0009-0007-6976-047X</contrib-id>
<name>
<surname>Estrada</surname>
<given-names>L.</given-names>
</name>
<email>lestrada@ucab.edu.ve</email>
<xref ref-type="aff" rid="aff-3">3</xref>
</contrib>
</contrib-group>
<aff id="aff-1"><label>1</label>Universidad Cat&#243;lica Andr&#233;s Bello, Facultad de Ingenier&#237;a, Departamento de F&#237;sica, Caracas, Venezuela</aff>
<aff id="aff-2"><label>2</label>Red Iberoamericana de Investigadores en Matem&#225;ticas Aplicadas a Datos (AUIP), Venezuela</aff>
<aff id="aff-3"><label>3</label>Universidad Cat&#243;lica Andr&#233;s Bello, Facultad de Ingenier&#237;a, Departamento de Matem&#225;tica, Caracas, Venezuela</aff>
<pub-date publication-format="electronic" date-type="pub" iso-8601-date="2026-04-08">
<day>08</day>
<month>04</month>
<year>2026</year>
</pub-date>
<pub-date publication-format="electronic" date-type="collection">
<year>2026</year>
</pub-date>
<volume>1</volume>
<issue>1</issue>
<fpage>1</fpage>
<lpage>9</lpage>
<history>
<date date-type="received" iso-8601-date="2026-02-14">
<day>14</day>
<month>02</month>
<year>2026</year>
</date>
<date date-type="accepted" iso-8601-date="2026-03-31">
<day>31</day>
<month>03</month>
<year>2026</year>
</date>
<date date-type="rev-recd" iso-8601-date="2026-03-12">
<day>12</day>
<month>03</month>
<year>2026</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright: &#x00A9; 2026 The Author(s)</copyright-statement>
<copyright-year>2026</copyright-year>
<license license-type="open-access" xlink:href="https://creativecommons.org/licenses/by-nd/4.0/">
<license-p>This is an open-access article distributed under the terms of the Creative Commons Attribution-NoDerivatives 4.0 International License (CC BY-ND 4.0), which permits copying and redistribution of the material in any medium or format in unadapted form only, provided the author is credited. The license allows commercial use. See <uri xlink:href="https://creativecommons.org/licenses/by-nd/4.0/">https://creativecommons.org/licenses/by-nd/4.0/</uri>.</license-p>
</license>
</permissions>
<self-uri xlink:href="https://aiaie.scs-journals.com/articles/10.0000/XXXX.xxxx/"/>
<abstract>
<p>This work presents a robust neuro-symbolic framework for the automated assessment of ordinary differential equations by integrating large language models with symbolic computation engines. The core innovation lies in using the natural language model as a semantic orchestrator capable of interpreting student logic, while a deterministic symbolic engine shields the process.</p>
<p>This hybrid approach addresses the risk of hallucinations by providing a rigorous framework for symbolic verification, thus increasing the overall accuracy of the results.</p>
<p>Our results suggest that this architecture has the potential to perform complex error carry-over analysis, aiding in the differentiation between conceptual failures and consistent algebraic derivations, within the scope of the evaluated cases.</p>
</abstract>
<kwd-group>
<kwd>Automated Assessment</kwd>
<kwd>Neuro-symbolic AI Strategies</kwd>
<kwd>Ordinary Differential Equations</kwd>
<kwd>Large Language Models</kwd>
<kwd>Computer Algebra System</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec>
<title>1. Introduction</title>
<p>The relationship between Ordinary Differential Equations (ODEs) and machine learning is natural: while ODEs act as mathematical models encoding fundamental laws, machine learning emerges as a system capable of inferring patterns from complex data. In essence, ODEs allow data to be generated from the model, whereas machine learning strategies enable the derivation of models from data, with approaches ranging from kernel methods (<xref ref-type="bibr" rid="B5">Garc&#237;a, 2022</xref>) and neural networks (<xref ref-type="bibr" rid="B2">Chen et al., 2018</xref>) to transformers (<xref ref-type="bibr" rid="B1">Becker et al., 2023</xref>; <xref ref-type="bibr" rid="B4">d&#8217;Ascoli et al., 2024</xref>), which are the basis of large-scale language models (LLMs). This empirical landscape suggests that using LLMs to evaluate ODE exams is not simply a technological convenience, but a logical extension of machine learning&#8217;s ability to interpret and validate symbolic reasoning.</p>
<p>The integration of AI into ODE education offers a powerful tool for bridging procedural skills with conceptual mastery. While recent observational studies highlight that LLMs still face significant hurdles in complex mathematical reasoning (<xref ref-type="bibr" rid="B3">Collins et al., 2024</xref>), emerging frameworks in educational data mining demonstrate their potential for providing automated and formative feedback in problem-solving tasks (<xref ref-type="bibr" rid="B18">Worden et al., 2024</xref>). This synergy supports the development of neuro-symbolic systems where LLMs manage natural language while symbolic engines ensure algebraic precision.</p>
<p>The implementation of automated assessment systems (<xref ref-type="bibr" rid="B6">Gnanaprakasam &amp; Lourdusamy, 2024</xref>; <xref ref-type="bibr" rid="B9">Korthals et al., 2025</xref>; <xref ref-type="bibr" rid="B12">Mendonca et al., 2025</xref>) responds to the need to scale educational feedback without compromising consistency, objectivity, or comprehensiveness. By integrating a symbolic engine with an LLM, this approach aligns with the principles of adaptive assessment, in which the system&#8217;s ability to diagnose underlying reasoning allows for feedback personalization beyond simple binary correction (<xref ref-type="bibr" rid="B14">Shute &amp; Zapata-Rivera, 2012</xref>). Thus, the identification of logical milestones facilitates a transition toward personalized education by determining whether a discrepancy stems from a specific operational error or a theoretical deficiency.</p>
<p>In large classes, manual grading is prone to fatigue and subjective variability. Automation ensures uniform rubrics and facilitates immediate formative feedback, which is essential to prevent conceptual errors from becoming entrenched.</p>
<p>In this context, LLMs based on the transformer architecture (<xref ref-type="bibr" rid="B16">Vaswani et al., 2017</xref>) have redefined the processing of hybrid content where natural language and formal technical notation converge. When applied to automated exam assessment, this technology facilitates scalability and can support a transition toward personalized education. By employing these tools, it is possible to deconstruct the development of a solution into logical milestones, enabling the identification of whether a discrepancy stems from a specific operational error or an underlying theoretical deficiency (<xref ref-type="bibr" rid="B17">Wei et al., 2022</xref>; <xref ref-type="bibr" rid="B19">Zhou et al., 2023</xref>).</p>
<p>However, the application of LLMs in the exact sciences presents accuracy challenges arising from their stochastic nature (<xref ref-type="bibr" rid="B11">Lee et al., 2025</xref>). This can lead to divergences in algebraic reasoning (<xref ref-type="bibr" rid="B8">Huang et al., 2025</xref>), where the model generates sequences that, although linguistically plausible, lack formal validity. To mitigate this risk, this article proposes a hybrid architecture where the LLM acts as a semantic orchestrator and the symbolic processor functions as a deterministic verification anchor. This collaboration ensures that the interpretive flexibility of the language model is backed by the absolute rigor of computational calculation.</p>
<p>In this framework, we hypothesize that a neuro-symbolic architecture, integrating the semantic orchestration of LLMs with the deterministic verification of a Computer Algebra System (CAS), allows for the automated assessment of ODEs while maintaining mathematical rigor and pedagogical fairness. This synergy is expected to bridge the gap between probabilistic reasoning and symbolic accuracy, enabling a human-like &#8216;error-drift&#8217; analysis.</p>
<p>Thus, in this work we present a novel neuro-symbolic framework for the automated assessment of ODE exams by integrating LLMs, Gemini (<xref ref-type="bibr" rid="B7">Google DeepMind, 2024</xref>) in this case, with CAS, SymPy (<xref ref-type="bibr" rid="B13">Meurer et al., 2017</xref>) in this case.</p>
<p>To show one way of addressing this problem, we have organized the article as follows: Section 2 analyzes the specific challenges of grading ODEs, such as sequential dependency and the non-uniqueness of solution representations; Section 3 details the proposed methodology for implementing the neuro-symbolic strategy, describing the semantic extraction flow and the evaluator configuration using a structured system prompt; Section 4 shows a real-world case study; Section 5 presents the final remarks.</p>
</sec>
<sec>
<title>2. Automatic Assessment of ODEs Exams</title>
<p>The assessment of ODEs remains a significant challenge for classical AI systems, which often struggle with the multi-step reasoning and precise symbolic manipulation required in STEM subjects (<xref ref-type="bibr" rid="B15">Tan et al., 2025</xref>). As noted by Tan et al., the assessment of STEM subjects presents unique structural challenges, particularly in maintaining mathematical consistency and in addressing the <italic>black-box</italic> nature of deep learning models. Classical systems frequently fail to provide the explainable, step-by-step validation necessary for complex mathematical derivations, a gap that persists in current automated grading technologies.</p>
<p>In the particular case of ODEs, one of the main obstacles is sequential dependency or the cascade effect: solving an ODE is a graph of dependencies in which a minor error in an intermediate step, such as calculating an integrating factor, invalidates the final numerical result. However, this does not necessarily imply that the logic of the subsequent procedure is incorrect; an expert human evaluator is capable of performing an error propagation analysis to assign partial scores, a capability that traditional automated systems lack.</p>
<p>Another critical challenge lies in the non-uniqueness of the solution form, since, due to trigonometric identities or properties of logarithms or other functions, a correct answer can be expressed in multiple visually distinct ways. Conventional systems often fail to recognize these identities, requiring an exact character match rather than validating the mathematical identity of the function. In addition, the technical validation of an ODE requires checking whether the student&#8217;s proposal satisfies the fundamental differential operator (<italic>L</italic>[<italic>y</italic>] = <italic>g</italic>(<italic>x</italic>)), a symbolic verification that classic correctors do not integrate, limiting their ability to offer deep and fair pedagogical feedback.</p>
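<p>For instance, an identity-based check in a CAS such as SymPy (a minimal sketch, not the article&#8217;s exact implementation) accepts two visually distinct but equivalent answers that a character-level comparison would reject:</p>
<preformat>
```python
import sympy as sp

t = sp.symbols('t')

# Two visually distinct forms of the same function (hypothetical answers)
student = 2*sp.sin(t)*sp.cos(t)   # student's trigonometric variant
reference = sp.sin(2*t)           # reference solution form

# Structural comparison fails, but the symbolic identity check succeeds
assert student != reference                   # different expression trees
assert sp.simplify(student - reference) == 0  # identical as functions
```
</preformat>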
<p>To overcome these limitations, the integration of LLMs and CAS emerges as a robust solution. Although the LLM offers the semantic flexibility needed to interpret the nuances of student language, the symbolic engine ensures mathematical precision by providing a formal verification layer that mitigates the risk of model hallucinations.</p>
<p>From this perspective, neuro-symbolic integration is proposed as a robust solution that can mitigate generative biases and enhance evaluative accuracy. This synergy is projected to facilitate the automation of complex pedagogical tasks, such as error-drift analysis and the validation of non-unique solutions.</p>
</sec>
<sec sec-type="methods">
<title>3. Neuro-Symbolic Methodology for ODE Assessment</title>
<p>The proposed methodology is based on a deterministic symbolic evaluation approach that goes beyond simple text comparison to focus on the logical validity of the mathematical procedure. The process begins with the segmentation of the student&#8217;s response into critical milestones. Subsequently, symbolic extraction is performed using the SymPy library to translate natural language into exact computational variables. The main innovation lies in error drift detection: if the system detects a fault in step <italic>n</italic>, it generates code to verify whether the subsequent steps are consistent with that initial error instead of automatically invalidating the entire exam. Finally, a litmus test is applied using the differential operator <italic>L</italic>[<italic>y</italic>] = <italic>g</italic>(<italic>x</italic>) and an algorithmic identity check <monospace>simplify(Student - Ref) == 0</monospace>, ensuring that any solution mathematically equivalent to the reference is accepted, regardless of its visual form.</p>
<sec>
<title>3.1 Neuro-symbolic assessment architecture</title>
<p>The architecture of this assessment system is based on the synergistic interaction of two main actors: the LLM, which acts as the cognitive core and orchestrator of the process, and the CAS, which functions as the high-precision technical validator. While the LLM is responsible for semantic interpretation, structuring the student&#8217;s steps, and generating error hypotheses, the Symbolic Computation Engine provides the mathematical rigor necessary to perform exact algebraic verifications and identity tests. This duality allows the system not only to understand the student&#8217;s intention in natural language but also to guarantee the mathematical infallibility of the correction by executing deterministic code.</p>
<p>Pairing LLMs with symbolic computation engines can, we believe, emerge as a paradigm that bridges the gap between probabilistic and deterministic reasoning. Its innovative nature is reflected in the following technical aspects:</p>
<list list-type="roman-lower">
<list-item><p>Overcoming the <italic>black box</italic>: Unlike traditional computer-assisted assessment systems, which are rigid, or pure LLMs, which can hallucinate, this proposal uses the LLM as an <italic>intelligent translator</italic> of human logic into executable code that can be audited by humans.</p></list-item>
<list-item><p>Error drift analysis: This is one of the most revolutionary capabilities of this approach. Historically, only a human teacher could detect if a student failed at the beginning, but maintained logical consistency throughout the rest of the exam. Symbolic integration allows the system to recalculate the ODE using the student&#8217;s error to validate the consistency of the subsequent procedure.</p></list-item>
<list-item><p>Validation by identity, not by characters: Solves the classic problem of non-uniqueness of solutions in mathematics. While a traditional system would consider an answer using a different trigonometric identity to be incorrect, the symbolic engine verifies functional equality using the differential operator.</p></list-item>
<list-item><p>Rigorous scalability perspective: Offers a solution to the dilemma between the need for immediate feedback in large classes and the mathematical precision required by the exact sciences, mitigating the typical hallucinations of probabilistic models.</p></list-item>
</list>
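<p>Point (ii) can be sketched as follows; the first-order ODE, the sign error, and all names are hypothetical illustrations rather than the system&#8217;s actual code:</p>
<preformat>
```python
import sympy as sp

t = sp.symbols('t')

# Hypothetical exam item: y' + 2y = 0 with y(0) = 3; correct root r = -2
r_student = 2                      # student's sign error at step n
y_student_final = 3*sp.exp(2*t)    # the answer the student actually wrote

# Re-derive the solution that WOULD follow from the faulty root,
# applying the initial condition y(0) = 3 to the erroneous premise
y_drift = 3*sp.exp(r_student*t)

# Wrong in absolute terms, but consistent with the initial error,
# so the subsequent procedure can earn partial credit
assert sp.simplify(y_student_final - 3*sp.exp(-2*t)) != 0
assert sp.simplify(y_student_final - y_drift) == 0
```
</preformat>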
<sec>
<title>3.1.1 Operational workflow: From prompt engineering to execution</title>
<p>Our strategy is based on five essential components that seek to replicate the most valuable characteristics of human correction, aimed at ensuring the fairest and most equitable assessment possible. Rather than limiting itself to a binary validation of results, this approach allows for a comprehensive assessment of student performance through the following pillars:</p>
<list list-type="roman-lower">
<list-item><p>Segmentation: This consists of the logical fragmentation of the response into critical milestones, allowing for a granular review of each stage of the process.</p></list-item>
<list-item><p>Symbolic Extraction: This translates natural language and informal notation into exact algebraic expressions, eliminating ambiguities in the interpretation of mathematical symbols.</p></list-item>
<list-item><p>Error Drift Detection and Partial Credit: One of the most human-like capabilities of the system allows the logical consistency of subsequent steps to be validated even when starting from an initial error, avoiding unfair penalties for isolated operational failures. An algorithm implementing this fundamental aspect of the strategy is given in <xref ref-type="fig" rid="FA1">Algorithm 1</xref>.</p></list-item>
<list-item><p>Identity Verification: Ensures that any answer mathematically equivalent to the reference solution is accepted, regardless of the algebraic or trigonometric variant used by the student.</p></list-item>
<list-item><p>Fire Test: The definitive validation, applied via the differential operator, confirming that the student&#8217;s proposal rigorously satisfies the original equation and its conditions.</p></list-item>
</list>
<fig id="FA1">
<label>Algorithm 1</label>
<caption>
<p>Error-Drift and Partial Credit</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="aiae-917_garcia-g4.png"/>
</fig>
<p>The architecture of the method can be seen graphically in <xref ref-type="fig" rid="F1">Figure 1</xref>. To operationalize these pillars, we developed a specialized System Prompt that codifies the cognitive audit and error-handling logic, as detailed below.</p>
<fig id="F1">
<label>Figure 1</label>
<caption>
<p>Strategy flowchart</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="aiae-917_garcia-g1.png"/>
</fig>
</sec>
<sec>
<title>3.1.2 System prompt design and chain of thought configuration</title>
<p>As already mentioned, this implementation does not seek a simple textual interpretation of the answer, but rather the construction of a verification graph based on a system prompt designed specifically for process control in differential equations. In this approach, the prompt (<xref ref-type="fig" rid="FL1">Listing 1</xref>) is structured to increase the rigor of the evaluation through a Chain of Thought (CoT) (<xref ref-type="bibr" rid="B17">Wei et al., 2022</xref>), instructing the model not only to correct the final result, but also to identify the minimum logical links that connect the statement with the solution.</p>
<fig id="FL1">
<label>Listing 1</label>
<caption>
<p>Proposed System Prompt for CoT Assessment</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="aiae-917_garcia-g5.png"/>
</fig>
<p>In the context of the proposed neuro-symbolic architecture, this implies that the LLM must map the student&#8217;s <italic>chain of thought</italic> against a <italic>reference chain</italic> deterministically validated by SymPy. Thus, the prompt defines the model&#8217;s behavioral logic, forcing it to treat the resolution of the differential equation as a sequence of interdependent links in which each transition must be symbolically verified to ensure the integrity of the evaluation process.</p>
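<p>For concreteness, a structured per-milestone verdict that such a prompt might elicit could look like the following; the field names and values are purely illustrative assumptions, since the actual prompt is the one given in Listing 1:</p>
<preformat>
```python
# Hypothetical per-milestone verdict record; the real schema is fixed
# by the system prompt of Listing 1 and may differ in its field names.
milestone_verdict = {
    "step": 2,                                  # index of the logical milestone
    "claim": "Y(s)*(s - s/(s**2 + 1)) = 1",     # student's asserted transition
    "sympy_check": "simplify(lhs - rhs) == 0",  # verification code to execute
    "verdict": "valid_given_step_1",            # valid / invalid / drift-consistent
    "feedback": "Transition is consistent with the transform found in step 1.",
}

assert milestone_verdict["verdict"] in {
    "valid", "invalid", "valid_given_step_1"}
```
</preformat>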
</sec>
</sec>
<sec>
<title>3.2 Neuro-symbolic assessment in practice</title>
<p>The automated assessment strategy is implemented by configuring an execution environment where the language model acts as a symbolic logic orchestrator and the Computer Algebra System, SymPy, functions as a technical validator that provides the mathematical rigor necessary to eliminate generative hallucinations. In this case, we utilized Gemini 1.5 Flash (model version: <monospace>gemini-1.5-flash-001</monospace>) via the Google AI Studio API. To ensure reproducibility and minimize stochastic behavior, the temperature was set to 0.0, with Top-P at 0.95 and a maximum output limit of 2048 tokens.</p>
<p>While the LLM interprets the student&#8217;s intent and segments the response into logical milestones, SymPy executes deterministic code to perform exact algebraic verifications, identity tests, and the final <italic>Fire Test</italic> using the differential operator <italic>L</italic>[<italic>y</italic>] &#8211; <italic>g</italic>(<italic>x</italic>) = 0.</p>
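<p>As a minimal sketch of this final check, consider an equation of the form of exam problem 2, <monospace>t*y'' - y' = 2*t**2</monospace>; the candidate general solution below is our own assumption, and the residual of the differential operator must vanish identically:</p>
<preformat>
```python
import sympy as sp

t, C1, C2 = sp.symbols('t C1 C2')

# Candidate general solution for t*y'' - y' = 2*t**2 (our assumption)
y = sp.Rational(2, 3)*t**3 + C1*t**2 + C2

# Fire Test: the residual L[y] - g(t) must vanish identically
residual = sp.simplify(t*sp.diff(y, t, 2) - sp.diff(y, t) - 2*t**2)
assert residual == 0
```
</preformat>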
<p>This neuro-symbolic orchestration offers significant structural advantages over simply executing an LLM from a traditional Python script. While a conventional script requires perfectly structured data and fails in the face of unexpected variables or alternative notations, this system acts as an agent capable of performing symbolic extraction that automatically adapts the student&#8217;s intention to specific SymPy commands. Furthermore, in contrast to the rigidity of binary evaluation of static code, the proposed architecture allows for dynamic error tracking analysis where the LLM, upon detecting a failure in step <italic>n</italic>, reconfigures the symbolic engine to verify whether the subsequent development maintains logical consistency with that erroneous premise, thus facilitating a fair assignment of partial scores.</p>
<p>Finally, this model goes beyond the delivery of simple numerical results by leveraging the output of the calculation engine to generate pedagogical feedback in natural language. This approach identifies the specific stage where the student&#8217;s reasoning deviates from the formal derivation, providing a clearer explanation of the error. <xref ref-type="table" rid="T1">Table 1</xref> below summarizes these advantages.</p>
<table-wrap id="T1">
<label>Table 1</label>
<caption>
<p>Comparison of ODE exam correction approaches against a manual Python script</p>
</caption>
<table>
<tr>
<th colspan="3"><hr/></th>
</tr>
<tr>
<th align="left" valign="top">Feature</th>
<th align="left" valign="top">Manual Python Script</th>
<th align="left" valign="top">Gemini + Sympy</th>
</tr>
<tr>
<td colspan="3"><hr/></td>
</tr>
<tr>
<td align="left" valign="top"><bold>Processing</bold></td>
<td align="left" valign="top">Rigid and predefined algorithmic logic.</td>
<td align="left" valign="top">Heuristic reasoning based on the exam context.</td>
</tr>
<tr>
<td colspan="3"><hr/></td>
</tr>
<tr>
<td align="left" valign="top"><bold>Input</bold></td>
<td align="left" valign="top">Requires structured data or prior cleaning.</td>
<td align="left" valign="top">Ability to process natural language and varied formulas.</td>
</tr>
<tr>
<td colspan="3"><hr/></td>
</tr>
<tr>
<td align="left" valign="top"><bold>Error Analysis</bold></td>
<td align="left" valign="top">Generally binary and inflexible in the face of initial failures.</td>
<td align="left" valign="top">Detection of logical consistency through error-drift analysis.</td>
</tr>
<tr>
<td colspan="3"><hr/></td>
</tr>
<tr>
<td align="left" valign="top"><bold>Maintenance</bold></td>
<td align="left" valign="top">High: requires code updates for each new problem.</td>
<td align="left" valign="top">Low: adapts to new statements through <italic>Prompt Engineering</italic>.</td>
</tr>
<tr>
<td colspan="3"><hr/></td>
</tr>
</table>
</table-wrap>
</sec>
</sec>
<sec>
<title>4. Neuro-Symbolic Assessment of Experimental Data</title>
<p>The study was conducted using a dataset consisting of <italic>n</italic> = 18 complete exams. Since each exam consists of three multi-step ODE problems, the analysis covered a total of 54 detailed solution units. The participants were university students from the Faculty of Engineering at the Andres Bello Catholic University (Caracas, Venezuela), enrolled in Computer Engineering, Civil Engineering, and Telecommunications Engineering.</p>
<p>In the following, one student&#8217;s exam will be used as a representative case study of the proposed assessment methodology. This exam was selected because the student solves one problem correctly and makes partial errors in the others, which we believe offers a useful perspective on our strategy.</p>
<p>It should be noted that the performance of the proposed strategy on this particular case-study exam is similar to that observed for the rest of the group, which supports generalizing the conclusions obtained. To keep the article concise and avoid an excessive number of exam images, only one of the answers will be analyzed in detail, serving as an illustrative model of the interaction between the linguistic model and the symbolic calculation engine.</p>
<p>To make the presentation lighter, we will divide it into three parts: i) the presentation of the problem to the student, the reference solution and rubric for human assessment, ii) the student&#8217;s response, and iii) the response of the automatic evaluation system.</p>
<sec>
<title>4.1 Presentation of the problem</title>
<p>The original assessment consists of three problems designed to measure the competence in solving ODEs using the Laplace transform. Although the answers collected show significant variability in terms of accuracy and procedural errors, for reasons of editorial length, a detailed analysis of a single exam will be presented. This selection serves as a representative test case, allowing a qualitative illustration of the performance and robustness of the proposed correction strategy in the face of real mathematical developments.</p>
<p>In this exam, the student is asked to solve the differential equations listed in <xref ref-type="table" rid="T2">Table 2</xref>:</p>
<table-wrap id="T2">
<label>Table 2</label>
<caption>
<p>Exam Problems and Reference Solutions</p>
</caption>
<table>
<tr>
<th colspan="3"><hr/></th>
</tr>
<tr>
<th align="left" valign="top">Prob.</th>
<th align="left" valign="top">Problem Statement</th>
<th align="left" valign="top">Reference Solution</th>
</tr>
<tr>
<td colspan="3"><hr/></td>
</tr>
<tr>
<td align="left" valign="top">1</td>
<td align="left" valign="top"><inline-formula>
<alternatives>
<mml:math id="Eq001-mml">
<mml:mrow><mml:msup><mml:mi>y</mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup><mml:mtext>=</mml:mtext><mml:msubsup><mml:mo>&#x222B;</mml:mo><mml:mn>0</mml:mn><mml:mi>t</mml:mi></mml:msubsup><mml:mi>y</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>&#x03C4;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mtext>cos</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x2013;</mml:mo><mml:mi>&#x03C4;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mi>d</mml:mi><mml:mo>&#x03C4;</mml:mo><mml:mo>,</mml:mo><mml:mo>&#x00A0;</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo><mml:mtext>=</mml:mtext><mml:mn>1</mml:mn></mml:mrow>
</mml:math>
<graphic xlink:href="aiae-917_garcia-e1.gif"/>
</alternatives>
</inline-formula></td>
<td align="left" valign="top"><inline-formula>
<alternatives>
<mml:math id="Eq004-mml">
<mml:mrow><mml:mi>y</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mtext>=</mml:mtext><mml:mn>1</mml:mn><mml:mtext>+</mml:mtext><mml:mstyle scriptlevel='+1'><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac></mml:mstyle><mml:msup><mml:mi>t</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow>
</mml:math>
<graphic xlink:href="aiae-917_garcia-e4.gif"/>
</alternatives>
</inline-formula></td>
</tr>
<tr>
<td colspan="3"><hr/></td>
</tr>
<tr>
<td align="left" valign="top">2</td>
<td align="left" valign="top"><inline-formula>
<alternatives>
<mml:math id="Eq002-mml">
<mml:mrow><mml:mi>t</mml:mi><mml:msup><mml:mi>y</mml:mi><mml:mo>&#x2033;</mml:mo></mml:msup><mml:mo>&#x2013;</mml:mo><mml:msup><mml:mi>y</mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup><mml:mtext>=</mml:mtext><mml:mn>2</mml:mn><mml:msup><mml:mi>t</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo>,</mml:mo><mml:mo>&#x00A0;</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo><mml:mtext>=</mml:mtext><mml:mn>0</mml:mn></mml:mrow>
</mml:math>
<graphic xlink:href="aiae-917_garcia-e2.gif"/>
</alternatives>
</inline-formula></td>
<td align="left" valign="top"><inline-formula>
<alternatives>
<mml:math id="Eq005-mml">
<mml:mrow><mml:mi>y</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mtext>=</mml:mtext><mml:mstyle scriptlevel='+1'><mml:mfrac><mml:mn>2</mml:mn><mml:mn>3</mml:mn></mml:mfrac></mml:mstyle><mml:msup><mml:mi>t</mml:mi><mml:mn>3</mml:mn></mml:msup><mml:mtext>+</mml:mtext><mml:msub><mml:mi>C</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:msup><mml:mi>t</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mtext>+</mml:mtext><mml:msub><mml:mi>C</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow>
</mml:math>
<graphic xlink:href="aiae-917_garcia-e5.gif"/>
</alternatives>
</inline-formula></td>
</tr>
<tr>
<td colspan="3"><hr/></td>
</tr>
<tr>
<td align="left" valign="top">3</td>
<td align="left" valign="top"><inline-formula>
<alternatives>
<mml:math id="Eq003-mml">
<mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:msup><mml:mi>y</mml:mi><mml:mo>&#x2033;</mml:mo></mml:msup><mml:mtext>+</mml:mtext><mml:mn>4</mml:mn><mml:mi>y</mml:mi><mml:mtext>=</mml:mtext><mml:mi>f</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>,</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mo>&#x00A0;</mml:mo><mml:mtext>with</mml:mtext><mml:mo>&#x00A0;</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo><mml:mtext>=</mml:mtext><mml:mn>2</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>f</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mtext>=</mml:mtext><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable columnalign='left'><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mn>0</mml:mn></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mn>0</mml:mn><mml:mo>&#x2264;</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x003C;</mml:mo><mml:mn>2</mml:mn><mml:mi>&#x03C0;</mml:mi></mml:mrow></mml:mtd></mml:mtr><mml:mtr columnalign='left'><mml:mtd columnalign='left'><mml:mrow><mml:mn>4</mml:mn><mml:mi>t</mml:mi><mml:mtext>+</mml:mtext><mml:mn>8</mml:mn><mml:mi>&#x03C0;</mml:mi></mml:mrow></mml:mtd><mml:mtd columnalign='left'><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x2265;</mml:mo><mml:mn>2</mml:mn><mml:mi>&#x03C0;</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow></mml:mrow></mml:mtd></mml:mtr></mml:mtable>
</mml:math>
<graphic xlink:href="aiae-917_garcia-e3.gif"/>
</alternatives>
</inline-formula></td>
<td align="left" valign="top"><inline-formula>
<alternatives>
<mml:math id="Eq006-mml">
<mml:mrow><mml:mi>y</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mtext>=</mml:mtext><mml:mn>2</mml:mn><mml:mtext>cos</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mn>2</mml:mn><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mtext>+</mml:mtext><mml:mstyle scriptlevel='+1'><mml:mfrac><mml:mi>t</mml:mi><mml:mn>4</mml:mn></mml:mfrac></mml:mstyle><mml:mo>&#x2013;</mml:mo><mml:mstyle scriptlevel='+1'><mml:mfrac><mml:mrow><mml:mtext>sin(</mml:mtext><mml:mn>2</mml:mn><mml:mi>t</mml:mi><mml:mtext>)</mml:mtext></mml:mrow><mml:mn>8</mml:mn></mml:mfrac></mml:mstyle><mml:mo>&#x2013;</mml:mo><mml:mi>u</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x2013;</mml:mo><mml:mo>&#x00A0;</mml:mo><mml:mi>&#x03C0;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mstyle scriptlevel='+1'><mml:mfrac><mml:mi>t</mml:mi><mml:mn>4</mml:mn></mml:mfrac></mml:mstyle><mml:mo>&#x2013;</mml:mo><mml:mstyle scriptlevel='+1'><mml:mfrac><mml:mrow><mml:mtext>sin(</mml:mtext><mml:mn>2</mml:mn><mml:mi>t</mml:mi><mml:mtext>)</mml:mtext></mml:mrow><mml:mn>4</mml:mn></mml:mfrac></mml:mstyle><mml:mo>&#x2013;</mml:mo><mml:mstyle scriptlevel='+1'><mml:mfrac><mml:mi>&#x03C0;</mml:mi><mml:mn>4</mml:mn></mml:mfrac></mml:mstyle><mml:mtext>cos(</mml:mtext><mml:mn>2</mml:mn><mml:mi>t</mml:mi><mml:mtext>)</mml:mtext></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mrow>
</mml:math>
<graphic xlink:href="aiae-917_garcia-e6.gif"/>
</alternatives>
</inline-formula></td>
</tr>
<tr>
<td colspan="3"><hr/></td>
</tr>
</table>
</table-wrap>
<p>The ground truth for this study was established through the evaluation of the exams by two senior faculty members. Both graders applied a standardized analytical rubric (see <xref ref-type="table" rid="T3">Table 3</xref>) designed to evaluate procedural consistency and numerical results. The authors hold that effective mathematical assessment must account for the logical flow of a solution; the proposed neuro-symbolic framework formalizes this pedagogical principle through its &#8216;Error-Drift&#8217; mechanism. By mimicking the human ability to recalculate and validate a student&#8217;s reasoning after a computational slip, the system delivers an assessment that is both fair and closely aligned with expert human judgment.</p>
<table-wrap id="T3">
<label>Table 3</label>
<caption>
<p>Analytical Grading Rubric</p>
</caption>
<table>
<tr>
<th colspan="3"><hr/></th>
</tr>
<tr>
<th align="left" valign="top">Dimension</th>
<th align="left" valign="top">Assessment Criteria</th>
<th align="left" valign="top">Max Score (%)</th>
</tr>
<tr>
<th colspan="3"><hr/></th>
</tr>
<tr>
<td align="left" valign="top"><bold>1. Initial Modeling</bold></td>
<td align="left" valign="top">Accuracy in problem transcription and correct selection of the ODE method.</td>
<td align="left" valign="top">20%</td>
</tr>
<tr>
<td colspan="3"><hr/></td>
</tr>
<tr>
<td align="left" valign="top"><bold>2. Procedural Consistency</bold></td>
<td align="left" valign="top">Logical flow in step <italic>n</italic> + 1 relative to step <italic>n</italic>. Correct logic is rewarded even if based on a prior error.</td>
<td align="left" valign="top">30%</td>
</tr>
<tr>
<td colspan="3"><hr/></td>
</tr>
<tr>
<td align="left" valign="top"><bold>3. Algebraic Rigor</bold></td>
<td align="left" valign="top">Precision in specific algebraic operations, sign management, and coefficient handling.</td>
<td align="left" valign="top">40%</td>
</tr>
<tr>
<td colspan="3"><hr/></td>
</tr>
<tr>
<td align="left" valign="top"><bold>4. Logical Convergence</bold></td>
<td align="left" valign="top">The final result is mathematically consistent with the student&#8217;s own mathematical path, emphasizing the conclusion of the process.</td>
<td align="left" valign="top">10%</td>
</tr>
<tr>
<td colspan="3"><hr/></td>
</tr>
<tr>
<td align="left" valign="top"><bold>Total Score</bold></td>
<td align="left" valign="top"><bold>Comprehensive evaluation of the resolution process</bold></td>
<td align="left" valign="top"><bold>100%</bold></td>
</tr>
<tr>
<td colspan="3"><hr/></td>
</tr>
</table>
</table-wrap>
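<p>The &#8216;Error-Drift&#8217; recalculation described above can be sketched in SymPy. The intermediate transform and final line below are hypothetical illustrations, not taken from a graded exam:</p>

```python
import sympy as sp

s, t = sp.symbols('s t', positive=True)

# Hypothetical slip: the student reached an erroneous intermediate transform
# Y(s) = 1/s**2 and then continued. 'Error-Drift' re-derives the final answer
# that WOULD follow from that intermediate and compares it with the student's
# own final line.
Y_student = 1 / s**2
consistent_final = sp.inverse_laplace_transform(Y_student, s, t)  # yields t
student_final = t                                                 # the student's last line
drift = sp.simplify(consistent_final - student_final)
```

<p>When drift simplifies to zero, the downstream reasoning is consistent with the earlier slip and earns credit under the Procedural Consistency dimension of the rubric.</p>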
</sec>
<sec>
<title>4.2 Student response</title>
<p>The evaluation process begins with the digitization of the student&#8217;s response (<xref ref-type="fig" rid="F2">Figure 2</xref>), which is originally submitted in handwritten format. This exam is the only sensitive information shared in the article; it is presented as an anonymized document whose use was authorized in writing by the student. The rest of the data in the study consists of the anonymous grades of the other students in the sample, whose use does not require express authorization.</p>
<fig id="F2">
<label>Figure 2</label>
<caption>
<p>An example of a written exam response</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="aiae-917_garcia-g2.png"/>
</fig>
<p>This document, uploaded to the system as an image, constitutes the primary input for the AI workflow. The handwritten nature of the exam adds a level of complexity that the LLM must resolve through character recognition and interpretation of technical handwriting, ensuring that the transcription of formulas and procedures is faithful to the original development before proceeding with verification in the symbolic computation engine.</p>
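<p>A minimal sketch of the symbolic verification stage, assuming a transcribed final answer and a sample ODE (the production pipeline may differ in its exact calls):</p>

```python
import sympy as sp

t = sp.symbols('t')
y = sp.Function('y')

# Once a final answer has been transcribed from the image, SymPy's checkodesol
# substitutes it into the ODE and simplifies the residual deterministically.
ode = sp.Eq(y(t).diff(t, 2) + 4*y(t), 0)   # sample homogeneous ODE
candidate = sp.Eq(y(t), 2*sp.cos(2*t))     # transcribed final answer
ok, residual = sp.checkodesol(ode, candidate)
# ok is True and residual is 0 when the transcription satisfies the ODE
```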
<p>The automatic assessment of the exam, using our neuro-symbolic strategy, proceeds as follows:</p>
<p><bold>Problem 1: Volterra integral equation</bold></p>
<p>Chain of thought links:</p>
<list list-type="bullet">
<list-item><p><italic>L</italic><sub>1</sub>&#160;<bold>(Identification):</bold> Recognition of the integral term as the convolution (<italic>y</italic> * cos <italic>t</italic>).</p></list-item>
<list-item><p><italic>L</italic><sub>2</sub>&#160;<bold>(Transformation):</bold> Application of Laplace: <inline-formula>
<alternatives>
<mml:math id="Eq007-mml">
<mml:mrow><mml:mtext mathvariant="italic">sY</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mi>s</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x2009;</mml:mo><mml:mtext>&#x2013;1=</mml:mtext><mml:mi>Y</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>s</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:mstyle scriptlevel='+1'><mml:mfrac><mml:mi>s</mml:mi><mml:mrow><mml:msup><mml:mi>s</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mtext>+</mml:mtext><mml:mn>1</mml:mn></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow>
</mml:math>
<graphic xlink:href="aiae-917_garcia-e7.gif"/>
</alternatives>
</inline-formula>.</p></list-item>
<list-item><p><italic>L</italic><sub>3</sub>&#160;<bold>(Resolution):</bold> Algebraic solving to obtain <inline-formula>
<alternatives>
<mml:math id="Eq008-mml">
<mml:mrow><mml:mi>Y</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>s</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mtext>=</mml:mtext><mml:mstyle scriptlevel='+1'><mml:mfrac><mml:mrow><mml:msup><mml:mi>s</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mtext>+</mml:mtext><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msup><mml:mi>s</mml:mi><mml:mn>3</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow>
</mml:math>
<graphic xlink:href="aiae-917_garcia-e8.gif"/>
</alternatives>
</inline-formula>.</p></list-item>
<list-item><p><italic>L</italic><sub>4</sub>&#160;<bold>(Decomposition):</bold> Fractionation into <inline-formula>
<alternatives>
<mml:math id="Eq009-mml">
<mml:mrow><mml:mi>Y</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>s</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mtext>=</mml:mtext><mml:mstyle scriptlevel='+1'><mml:mfrac><mml:mn>1</mml:mn><mml:mi>s</mml:mi></mml:mfrac></mml:mstyle><mml:mtext>+</mml:mtext><mml:mstyle scriptlevel='+1'><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:msup><mml:mi>s</mml:mi><mml:mn>3</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow>
</mml:math>
<graphic xlink:href="aiae-917_garcia-e9.gif"/>
</alternatives>
</inline-formula>.</p></list-item>
<list-item><p><italic>L</italic><sub>5</sub>&#160;<bold>(Inverse):</bold> Application of the inverse transform to obtain <inline-formula>
<alternatives>
<mml:math id="Eq010-mml">
<mml:mrow><mml:mi>y</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mtext>=</mml:mtext><mml:mn>1</mml:mn><mml:mtext>+</mml:mtext><mml:mstyle scriptlevel='+1'><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac></mml:mstyle><mml:msup><mml:mi>t</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow>
</mml:math>
<graphic xlink:href="aiae-917_garcia-e10.gif"/>
</alternatives>
</inline-formula>.</p></list-item>
</list>
<p>Error analysis: The chain is <bold>intact</bold>. The symbolic engine confirms that the solution satisfies the differential operator identity <italic>L</italic>[<italic>y</italic>] &#8211; <italic>g</italic>(<italic>t</italic>) = 0.</p>
<p><bold>Grade: 100%.</bold></p>
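<p>The intact chain for Problem 1 can be replayed deterministically in SymPy; the links <italic>L</italic><sub>2</sub>&#8211;<italic>L</italic><sub>5</sub> above correspond to the following checks:</p>

```python
import sympy as sp

s, t = sp.symbols('s t', positive=True)
Y = sp.symbols('Y')  # the unknown transform, treated algebraically

# L2-L3: solve the transformed equation  s*Y - 1 = Y * s/(s**2 + 1)
Ysol = sp.solve(sp.Eq(s*Y - 1, Y*s/(s**2 + 1)), Y)[0]
assert sp.simplify(Ysol - (s**2 + 1)/s**3) == 0

# L4: partial-fraction decomposition 1/s + 1/s**3
assert sp.simplify(sp.apart(Ysol, s) - (1/s + 1/s**3)) == 0

# L5: inverse transform reproduces the student's answer y(t) = 1 + t**2/2
y_t = sp.inverse_laplace_transform(Ysol, s, t)
assert sp.simplify(y_t - (1 + t**2/2)) == 0
```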
<p><bold>Problem 2: ODEs with variable coefficients</bold></p>
<p>Chain of thought links:</p>
<list list-type="bullet">
<list-item><p><italic>L</italic><sub>1</sub>&#160;<bold>(Property):</bold> Application of the differentiation property in <italic>s</italic>: &#8466;{<italic>tf</italic> (<italic>t</italic>)} = &#8211;<italic>F</italic>&#8242;(<italic>s</italic>).</p></list-item>
<list-item><p><italic>L</italic><sub>2</sub>&#160;<bold>(Translation):</bold> Formulation of the derivative <inline-formula>
<alternatives>
<mml:math id="Eq011-mml">
<mml:mrow><mml:mo>&#x2013;</mml:mo><mml:mstyle scriptlevel='+1'><mml:mfrac><mml:mi>d</mml:mi><mml:mrow><mml:mtext mathvariant="italic">ds</mml:mtext></mml:mrow></mml:mfrac></mml:mstyle><mml:mo stretchy='false'>(</mml:mo><mml:msup><mml:mi>s</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mi>Y</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>s</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x2013;</mml:mo><mml:mtext mathvariant="italic">sY</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x2013;</mml:mo><mml:mi>y</mml:mi><mml:mo>&#x0027;</mml:mo><mml:mo stretchy='false'>(</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo><mml:mo>.</mml:mo></mml:mrow>
</mml:math>
<graphic xlink:href="aiae-917_garcia-e11.gif"/>
</alternatives>
</inline-formula>.</p></list-item>
<list-item><p><italic>L</italic><sub>3</sub>&#160;<bold>(Derivation):</bold> Application of the product rule to obtain a first-order ODE in <italic>s</italic>.</p></list-item>
<list-item><p><italic>L</italic><sub>4</sub>&#160;<bold>(Solution in S):</bold> Construction and solution using integrating factor.</p></list-item>
</list>
<p>Error analysis: <bold>Broken chain</bold> in <italic>L</italic><sub>3</sub>. The student omitted the term <italic>s</italic><sup>2</sup><italic>Y</italic>&#8242;(<italic>s</italic>) when differentiating the product, mistakenly transforming the problem into a simple algebraic equation. The system determined that the subsequent steps are inconsistent even with this initial error.</p>
<p><bold>Grade: 30%.</bold></p>
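<p>The broken link <italic>L</italic><sub>3</sub> can be exhibited symbolically; the sketch below contrasts the correct product-rule expansion with a version lacking the <italic>s</italic><sup>2</sup><italic>Y</italic>&#8242;(<italic>s</italic>) term:</p>

```python
import sympy as sp

s, y0, yp0 = sp.symbols('s y0 yp0')   # y0 = y(0), yp0 = y'(0), as constants
Y = sp.Function('Y')

# L2-L3: differentiate the bracket in  -d/ds[s**2*Y(s) - s*y0 - yp0]
full = -sp.diff(s**2*Y(s) - s*y0 - yp0, s)
# Correct product-rule expansion: -2*s*Y(s) - s**2*Y'(s) + y0
correct = -2*s*Y(s) - s**2*sp.Derivative(Y(s), s) + y0
assert sp.simplify(full - correct) == 0

# Dropping the omitted term leaves -2*s*Y(s) + y0: with Y'(s) gone,
# the first-order ODE in s collapses to an algebraic equation in Y(s).
student = full + s**2*sp.Derivative(Y(s), s)
```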
<p><bold>Problem 3: Non-homogeneous ODE (finite segment)</bold></p>
<p>Chain of thought links:</p>
<list list-type="bullet">
<list-item><p><italic>L</italic><sub>1</sub>&#160;<bold>(Definition):</bold> Modeling <italic>f</italic>(<italic>t</italic>) as a finite line segment (piecewise function).</p></list-item>
<list-item><p><italic>L</italic><sub>2</sub>&#160;<bold>(Transformation):</bold> Use of step functions (Heaviside) to transform <italic>f</italic>(<italic>t</italic>) to the s domain.</p></list-item>
<list-item><p><italic>L</italic><sub>3</sub>&#160;<bold>(Fractions):</bold> Decomposition of the resulting expression into partial fractions.</p></list-item>
<list-item><p><italic>L</italic><sub>4</sub>&#160;<bold>(Inverse):</bold> Application of time translations for the final solution.</p></list-item>
</list>
<p>Error analysis: <bold>Broken chain</bold> in <italic>L</italic><sub>3</sub>. The student failed to carry out the partial-fraction decomposition. By treating <italic>f</italic>(<italic>t</italic>) as a finite segment, the proposed solution <inline-formula>
<alternatives>
<mml:math id="Eq012-mml">
<mml:mrow><mml:mi>y</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mtext>=</mml:mtext><mml:mi>t</mml:mi><mml:mo>&#x2013;</mml:mo><mml:mstyle scriptlevel='+1'><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac></mml:mstyle><mml:mtext>sin</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mn>2</mml:mn><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mtext>+</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mn>2</mml:mn><mml:mo>&#x2013;</mml:mo><mml:mn>2</mml:mn><mml:mi>&#x03C0;</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>&#x00A0;</mml:mo><mml:mtext>cos</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mn>2</mml:mn><mml:mi>t</mml:mi><mml:mo stretchy='false'>)</mml:mo></mml:mrow>
</mml:math>
<graphic xlink:href="aiae-917_garcia-e12.gif"/>
</alternatives>
</inline-formula> is incomplete because it does not include the Heaviside &#8220;shutdown&#8221; terms of the finite segment, and therefore fails the identity verification.</p>
<p><bold>Grade: 15%.</bold></p>
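<p>The failed identity verification for Problem 3 can be reproduced directly: applying the operator <italic>y</italic>&#8243; + 4<italic>y</italic> to the quoted solution yields 4<italic>t</italic> for all <italic>t</italic>, which contradicts <italic>f</italic>(<italic>t</italic>) = 0 on the initial interval:</p>

```python
import sympy as sp

t = sp.symbols('t')

# The student's proposed solution, as quoted above
y = t - sp.sin(2*t)/2 + (2 - 2*sp.pi)*sp.cos(2*t)
residual = sp.simplify(sp.diff(y, t, 2) + 4*y)   # apply the operator y'' + 4*y
# residual == 4*t everywhere, but f(t) = 0 on the initial interval,
# so the proposed solution cannot satisfy the piecewise forcing.
assert sp.simplify(residual - 4*t) == 0
```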
<p>To evaluate the reliability of our neuro-symbolic framework, we employed Krippendorff&#8217;s Alpha (<italic>&#945;</italic>) coefficient (<xref ref-type="bibr" rid="B10">Krippendorff, 2018</xref>), a versatile statistical measure that quantifies the extent of agreement between different observers or methods&#8212;in this case, the automated system and the human expert. Unlike simpler percentage agreements, this method accounts for the probability of agreement occurring by chance and is calculated based on the ratio of observed disagreement (<italic>D<sub>o</sub></italic>) to the disagreement expected by chance (<italic>D<sub>e</sub></italic>), <inline-formula>
<alternatives>
<mml:math id="Eq013-mml">
<mml:mrow><mml:mi>&#x03B1;</mml:mi><mml:mtext>=</mml:mtext><mml:mn>1</mml:mn><mml:mo>&#x2013;</mml:mo><mml:mstyle scriptlevel='+1'><mml:mfrac><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi>o</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi>e</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:mrow>
</mml:math>
<graphic xlink:href="aiae-917_garcia-e13.gif"/>
</alternatives>
</inline-formula>. Krippendorff&#8217;s Alpha (<italic>&#945;</italic>) typically ranges from 0 to 1, where 1 indicates perfect reliability and 0 reflects agreement purely by chance. In terms of interpretation, an alpha value above 0.800 is generally considered the threshold for high reliability and solid conclusions, while values between 0.667 and 0.800 are acceptable for drawing tentative conclusions in most research contexts.</p>
<p>In our case (<xref ref-type="table" rid="T4">Table 4</xref>), this statistical measure yields, for Problem 1, <italic>&#945;</italic> = 0.94 (near-perfect agreement); for Problem 2, <italic>&#945;</italic> = 0.81 (strong agreement); for Problem 3, <italic>&#945;</italic> = 0.76 (acceptable reliability); and for the total score, <italic>&#945;</italic> = 0.84 (high overall reliability). These values suggest that the hybrid architecture can effectively replicate expert judgment, maintaining scientific rigor across diverse types of differential equation problems.</p>
<table-wrap id="T4">
<label>Table 4</label>
<caption>
<p>Comparison of Krippendorff&#8217;s Alpha Coefficients (<italic>&#945;</italic>)</p>
</caption>
<table>
<tr>
<th colspan="4"><hr/></th>
</tr>
<tr>
<th align="left" valign="top">Component</th>
<th align="left" valign="top">Simple Prompt</th>
<th align="left" valign="top">Neuro-Symbolic Strategy</th>
<th align="left" valign="top">Difference</th>
</tr>
<tr>
<th colspan="4"><hr/></th>
</tr>
<tr>
<td align="left" valign="top">Problem 1</td>
<td align="left" valign="top">0.824</td>
<td align="left" valign="top"><bold>0.940</bold></td>
<td align="left" valign="top">+0.116</td>
</tr>
<tr>
<td colspan="4"><hr/></td>
</tr>
<tr>
<td align="left" valign="top">Problem 2</td>
<td align="left" valign="top">0.781</td>
<td align="left" valign="top"><bold>0.810</bold></td>
<td align="left" valign="top">+0.029</td>
</tr>
<tr>
<td colspan="4"><hr/></td>
</tr>
<tr>
<td align="left" valign="top">Problem 3</td>
<td align="left" valign="top">0.645</td>
<td align="left" valign="top"><bold>0.760</bold></td>
<td align="left" valign="top">+0.115</td>
</tr>
<tr>
<td colspan="4"><hr/></td>
</tr>
<tr>
<td align="left" valign="top">Total Grade</td>
<td align="left" valign="top"><bold>0.862</bold></td>
<td align="left" valign="top">0.840</td>
<td align="left" valign="top">&#8211;0.022</td>
</tr>
<tr>
<td colspan="4"><hr/></td>
</tr>
</table>
</table-wrap>
<p>To evaluate the classification performance of the neuro-symbolic framework, confusion matrices were constructed by discretizing the continuous numerical grades into three distinct academic performance levels (see <xref ref-type="fig" rid="F3">Figure 3</xref>). For the individual problems, the classification was based on proportional thresholds of the maximum score, while the Total Grade (<italic>N</italic> &#8712; [0, 20]) was categorized according to the following intervals: <italic>Insufficient</italic> (0 &#8804; <italic>N</italic> &lt; 9.5), <italic>Acceptable</italic> (9.5 &#8804; <italic>N</italic> &lt; 16.5), and <italic>Outstanding</italic> (16.5 &#8804; <italic>N</italic> &#8804; 20). These matrices allow for a visual analysis of the model&#8217;s precision in identifying student competency levels and provide a clear overview of the systematic agreement between the automated system and the human expert&#8217;s standard.</p>
<fig id="F3">
<label>Figure 3</label>
<caption>
<p>Classification performance of the neuro-symbolic strategy</p>
</caption>
<graphic xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="aiae-917_garcia-g3.png"/>
</fig>
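<p>The discretization underlying Figure 3 can be stated compactly. The strict inequalities quoted above leave the thresholds themselves ambiguous; this sketch assumes each threshold belongs to the higher band:</p>

```python
def performance_level(n):
    """Map a total grade N in [0, 20] to one of the three performance levels.

    Assumes thresholds (9.5 and 16.5) belong to the higher band.
    """
    if n >= 16.5:
        return "Outstanding"
    if n >= 9.5:
        return "Acceptable"
    return "Insufficient"
```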
<p>Finally, to establish a performance baseline, we evaluated the consistency of the model using a direct instructional prompt: <italic>&#8220;Could you grade these differential equation exams, considering a score between</italic> 0 <italic>and</italic> 20 <italic>and that the first problem is worth</italic> 6 <italic>points, the second and third</italic> 7 <italic>points?&#8221;</italic>. Under this <italic>simple request</italic> scenario, the inter-rater reliability measured by Krippendorff&#8217;s Alpha (<italic>&#945;</italic>) dropped significantly across the individual problems compared with the proposed neuro-symbolic strategy.</p>
<p>While the simple prompt achieved a high overall coefficient for the Total Grade (<italic>&#945;</italic> = 0.862), largely due to a statistical compensation of errors, it demonstrated a lack of technical precision in specific tasks, particularly in Problem 3 (<italic>&#945;</italic> = 0.645), where unit step functions and translation theorems introduced complexity. In contrast, the neuro-symbolic strategy, incorporating symbolic verification via SymPy and a weighted 30/70 logic-to-result ratio, yielded more robust and consistent coefficients for each problem (<italic>P</italic><sub>1</sub> = 0.94, <italic>P</italic><sub>2</sub> = 0.81, <italic>P</italic><sub>3</sub> = 0.76). These results suggest that a structured hybrid approach is essential for replicating expert judgment and maintaining mathematical rigor in automated assessment.</p>
</sec>
</sec>
<sec>
<title>5. Final Remarks</title>
<p>The integration of an LLM and a CAS, interconnected through a chain-of-thought framework, represents a robust solution to the dichotomy between contextual interpretation and algorithmic rigor in the sciences.</p>
<p>Although LLMs enable fluent reasoning, the interpretation of ambiguous statements, and the diagnosis of conceptual errors in natural language, the symbolic engine serves as a deterministic anchor that executes mathematical operations without the risk of hallucinations. This architecture allows scientific problems to be approached with the cognitive flexibility required to understand human-led processes, combined with the computational precision indispensable for validating results, effectively bridging the gap between theoretical intuition and technical accuracy.</p>
<p>The integration of a symbolic engine to verify the student&#8217;s chain of thought suggests the potential to refine the quality of feedback by helping to distinguish between minor algebraic slips and fundamental conceptual gaps. Regarding student learning, the capacity to receive partial credit through &#8216;Error-Drift&#8217; analysis appears to offer a more supportive environment that acknowledges logical consistency even when initial errors occur. From a teaching perspective, this framework might serve as a complementary tool for instructional practice, possibly mitigating some of the subjective variability and fatigue typically associated with manual grading in large class sizes.</p>
<p><xref ref-type="table" rid="T5">Table 5</xref> summarizes, at a macroscopic level, the advantages we attribute to the hybrid approach, compared with traditional methods and with pure LLMs.</p>
<table-wrap id="T5">
<label>Table 5</label>
<caption>
<p>Comparison between evaluation systems</p>
</caption>
<table>
<tr>
<th colspan="4"><hr/></th>
</tr>
<tr>
<th align="left" valign="top">FEATURE</th>
<th align="left" valign="top">CAA SYSTEMS</th>
<th align="left" valign="top">PURE LLM</th>
<th align="left" valign="top">HYBRID (PROPOSED)</th>
</tr>
<tr>
<th colspan="4"><hr/></th>
</tr>
<tr>
<td align="left" valign="top">Language flexibility</td>
<td align="left" valign="top">Low</td>
<td align="left" valign="top">High</td>
<td align="left" valign="top">High</td>
</tr>
<tr>
<td colspan="4"><hr/></td>
</tr>
<tr>
<td align="left" valign="top">Mathematical rigor</td>
<td align="left" valign="top">High</td>
<td align="left" valign="top">Medium (hallucinations)</td>
<td align="left" valign="top">High</td>
</tr>
<tr>
<td colspan="4"><hr/></td>
</tr>
<tr>
<td align="left" valign="top">Reasoning-trace analysis</td>
<td align="left" valign="top">No</td>
<td align="left" valign="top">Limited</td>
<td align="left" valign="top">Yes</td>
</tr>
<tr>
<td colspan="4"><hr/></td>
</tr>
<tr>
<td align="left" valign="top">Pedagogical feedback</td>
<td align="left" valign="top">Static</td>
<td align="left" valign="top">Fluid</td>
<td align="left" valign="top">Structured</td>
</tr>
<tr>
<td colspan="4"><hr/></td>
</tr>
</table>
</table-wrap>
<p>In conclusion, this hybrid approach represents a robust solution for scaling personalized education in exact sciences, ensuring that feedback is both semantically consistent and mathematically accurate.</p>
</sec>
</body>
<back>
<sec>
<title>Author Contributions</title>
<p><bold>Conceptualization:</bold> PG and LE, <bold>Investigation:</bold> PG and LE, <bold>Methodology:</bold> PG, <bold>Data curation:</bold> PG, <bold>Writing &#8211; original draft:</bold> PG, <bold>Writing &#8211; review and editing:</bold> PG.</p>
</sec>
<sec sec-type="COI-statement">
<title>Competing Interests</title>
<p>The authors declare that there are no conflicts of interest regarding the publication of this paper.</p>
</sec>
<sec>
<title>Use of AI</title>
<p>During the preparation of this work, the authors used Gemini to edit and review the article. The authors reviewed and edited the content and take full responsibility for its accuracy and integrity.</p>
</sec>
<ref-list>
<ref id="B1"><label>1</label><mixed-citation publication-type="journal"><string-name><surname>Becker</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Klein</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Neitz</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Parascandolo</surname>, <given-names>G.</given-names></string-name>, &amp; <string-name><surname>Kilbertus</surname>, <given-names>N.</given-names></string-name> (<year>2023</year>). <article-title>Predicting ordinary differential equations with transformers</article-title>. <source>Proceedings of the 40th International Conference on Machine Learning</source>, <volume>202</volume>, <fpage>1990</fpage>&#8211;<lpage>2011</lpage>. <pub-id pub-id-type="doi">10.48550/arXiv.2307.12617</pub-id></mixed-citation></ref>
<ref id="B2"><label>2</label><mixed-citation publication-type="webpage"><string-name><surname>Chen</surname>, <given-names>R. T. Q.</given-names></string-name>, <string-name><surname>Rubanova</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Bettencourt</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name><surname>Duvenaud</surname>, <given-names>D. K.</given-names></string-name> (<year>2018</year>). <article-title>Neural ordinary differential equations</article-title>. <source>Advances in Neural Information Processing Systems</source>, <volume>31</volume>, <fpage>6571</fpage>&#8211;<lpage>6583</lpage>. <uri>https://proceedings.neurips.cc/paper_files/paper/2018/file/69386f6bb1dfed68692a24c8686939b9-Paper.pdf</uri></mixed-citation></ref>
<ref id="B3"><label>3</label><mixed-citation publication-type="journal"><string-name><surname>Collins</surname>, <given-names>K. M.</given-names></string-name>, <string-name><surname>Jiang</surname>, <given-names>A. Q.</given-names></string-name>, <string-name><surname>Frieder</surname>, <given-names>S.</given-names></string-name>, et al. (<year>2024</year>). <article-title>Evaluating language models for mathematics through interactions</article-title>. <source>Proceedings of the National Academy of Sciences (PNAS)</source>, <volume>121</volume>(<issue>24</issue>), <elocation-id>e2318124121</elocation-id>. <pub-id pub-id-type="doi">10.1073/pnas.2318124121</pub-id></mixed-citation></ref>
<ref id="B4"><label>4</label><mixed-citation publication-type="webpage"><string-name><surname>d&#8217;Ascoli</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Becker</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Mathis</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Schwaller</surname>, <given-names>P.</given-names></string-name>, &amp; <string-name><surname>Kilbertus</surname>, <given-names>N.</given-names></string-name> (<year>2024</year>). <article-title>Odeformer: Symbolic regression of dynamical systems with transformers [Spotlight presentation]</article-title>. <source>International Conference on Learning Representations (ICLR)</source>. <uri>https://openreview.net/forum?id=TzoHLiGVMo</uri></mixed-citation></ref>
<ref id="B5"><label>5</label><mixed-citation publication-type="journal"><string-name><surname>Garc&#237;a</surname>, <given-names>P.</given-names></string-name> (<year>2022</year>). <article-title>Modeling systems with machine learning based differential equations</article-title>. <source>Chaos, Solitons &amp; Fractals</source>, <volume>165</volume>, <elocation-id>112872</elocation-id>. <pub-id pub-id-type="doi">10.1016/j.chaos.2022.112872</pub-id></mixed-citation></ref>
<ref id="B6"><label>6</label><mixed-citation publication-type="book"><string-name><surname>Gnanaprakasam</surname>, <given-names>J.</given-names></string-name>, &amp; <string-name><surname>Lourdusamy</surname>, <given-names>R.</given-names></string-name> (<year>2024</year>). <chapter-title>The role of ai in automating grading: Enhancing feedback and efficiency</chapter-title>. In <string-name><given-names>S.</given-names> <surname>Kadry</surname></string-name> (Ed.), <source>Artificial intelligence and education &#8211; shaping the future of learning</source>. <publisher-name>IntechOpen</publisher-name>. <pub-id pub-id-type="doi">10.5772/intechopen.1005025</pub-id></mixed-citation></ref>
<ref id="B7"><label>7</label><mixed-citation publication-type="webpage"><collab>Google DeepMind</collab>. (<year>2024</year>). <article-title>Gemini 1.5 Flash: A multimodal AI model</article-title> [Accessed: 2026-02-08. Large Language Model developed by Google.]. <uri>https://gemini.google.com/</uri></mixed-citation></ref>
<ref id="B8"><label>8</label><mixed-citation publication-type="journal"><string-name><surname>Huang</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Yu</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Ma</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Zhong</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Feng</surname>, <given-names>Z.</given-names></string-name>, <string-name><surname>Wang</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Chen</surname>, <given-names>Q.</given-names></string-name>, <string-name><surname>Peng</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Feng</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Qin</surname>, <given-names>B.</given-names></string-name>, &amp; <string-name><surname>Liu</surname>, <given-names>T.</given-names></string-name> (<year>2025</year>). <article-title>A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions</article-title>. <source>ACM Trans. Inf. Syst.</source>, <volume>43</volume>(<issue>2</issue>). <pub-id pub-id-type="doi">10.1145/3703155</pub-id></mixed-citation></ref>
<ref id="B9"><label>9</label><mixed-citation publication-type="book"><string-name><surname>Korthals</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Rosenbusch</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Grasman</surname>, <given-names>R.</given-names></string-name>, &amp; <string-name><surname>Visser</surname>, <given-names>I.</given-names></string-name> (<year>2025</year>). <chapter-title>Grading university students with llms: Performance and acceptance of a canvas-based automation</chapter-title>. In <string-name><given-names>A. I.</given-names> <surname>Cristea</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Walker</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Lu</surname></string-name>, <string-name><given-names>O. C.</given-names> <surname>Santos</surname></string-name>, &amp; <string-name><given-names>S.</given-names> <surname>Isotani</surname></string-name> (Eds.), <source>Artificial intelligence in education. posters and late breaking results, workshops and tutorials, industry and innovation tracks, practitioners, doctoral consortium, blue sky, and wideAIED</source> (pp. <fpage>36</fpage>&#8211;<lpage>43</lpage>). <publisher-loc>Springer Nature Switzerland</publisher-loc>. <pub-id pub-id-type="doi">10.1007/978-3-031-99264-3_5</pub-id></mixed-citation></ref>
<ref id="B10"><label>10</label><mixed-citation publication-type="book"><string-name><surname>Krippendorff</surname>, <given-names>K.</given-names></string-name> (<year>2018</year>). <source>Content analysis: An introduction to its methodology</source> (<edition>4th</edition>). <publisher-name>SAGE Publications</publisher-name>. <pub-id pub-id-type="doi">10.4135/9781071878781</pub-id></mixed-citation></ref>
<ref id="B11"><label>11</label><mixed-citation publication-type="journal"><string-name><surname>Lee</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Sim</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Shin</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Seo</surname>, <given-names>W.</given-names></string-name>, <string-name><surname>Park</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Lee</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Hwang</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Kim</surname>, <given-names>S.</given-names></string-name>, &amp; <string-name><surname>Kim</surname>, <given-names>S.</given-names></string-name> (<year>2025</year>). <article-title>Reasoning abilities of large language models: In-depth analysis on the abstraction and reasoning corpus</article-title>. <source>ACM Trans. Intell. Syst. Technol.</source>, <volume>16</volume>(<issue>6</issue>). <pub-id pub-id-type="doi">10.1145/3712701</pub-id></mixed-citation></ref>
<ref id="B12"><label>12</label><mixed-citation publication-type="journal"><string-name><surname>Mendonca</surname>, <given-names>P. C.</given-names></string-name>, <string-name><surname>Quintal</surname>, <given-names>F.</given-names></string-name>, &amp; <string-name><surname>Mendonca</surname>, <given-names>F.</given-names></string-name> (<year>2025</year>). <article-title>Evaluating llms for automated scoring in formative assessments</article-title>. <source>Applied Sciences</source>, <volume>15</volume>(<issue>5</issue>). <pub-id pub-id-type="doi">10.3390/app15052787</pub-id></mixed-citation></ref>
<ref id="B13"><label>13</label><mixed-citation publication-type="journal"><string-name><surname>Meurer</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Smith</surname>, <given-names>C. P.</given-names></string-name>, <string-name><surname>Paprocki</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>&#268;ert&#237;k</surname>, <given-names>O.</given-names></string-name>, <string-name><surname>Kirpichev</surname>, <given-names>S. B.</given-names></string-name>, <string-name><surname>Rocklin</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Kumar</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Ivanov</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Moore</surname>, <given-names>J. K.</given-names></string-name>, <string-name><surname>Singh</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Rathnayake</surname>, <given-names>T.</given-names></string-name>, <string-name><surname>Vig</surname>, <given-names>V.</given-names></string-name>, <string-name><surname>Granger</surname>, <given-names>B. E.</given-names></string-name>, <string-name><surname>Muller</surname>, <given-names>R. P.</given-names></string-name>, <string-name><surname>Bonazzi</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Gupta</surname>, <given-names>H.</given-names></string-name>, <string-name><surname>Vats</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Johansson</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Pedregosa</surname>, <given-names>F.</given-names></string-name>, &#8230; <string-name><surname>Anthony</surname>, <given-names>A.</given-names></string-name> (<year>2017</year>). <article-title>Sympy: Symbolic computing in Python</article-title>. <source>PeerJ Computer Science</source>, <volume>3</volume>, <elocation-id>e103</elocation-id>. 
<pub-id pub-id-type="doi">10.7717/peerj-cs.103</pub-id></mixed-citation></ref>
<ref id="B14"><label>14</label><mixed-citation publication-type="book"><string-name><surname>Shuste</surname>, <given-names>V. J.</given-names></string-name>, &amp; <string-name><surname>Zapata-Rivera</surname>, <given-names>D.</given-names></string-name> (<year>2012</year>). <chapter-title>Adaptive educational systems</chapter-title>. In <string-name><given-names>P.</given-names> <surname>Durlach</surname></string-name> &amp; <string-name><given-names>A.</given-names> <surname>Lesgold</surname></string-name> (Eds.), <source>Adaptive technologies for training and education</source> (pp. <fpage>7</fpage>&#8211;<lpage>27</lpage>). <publisher-name>Cambridge University Press</publisher-name>. <pub-id pub-id-type="doi">10.1017/CBO9781139049580.004</pub-id></mixed-citation></ref>
<ref id="B15"><label>15</label><mixed-citation publication-type="journal"><string-name><surname>Tan</surname>, <given-names>L. Y.</given-names></string-name>, <string-name><surname>Hu</surname>, <given-names>S.</given-names></string-name>, <string-name><surname>Yeo</surname>, <given-names>D. J.</given-names></string-name>, &amp; <string-name><surname>Cheong</surname>, <given-names>K. H.</given-names></string-name> (<year>2025</year>). <article-title>A comprehensive review on automated grading systems in stem using ai techniques</article-title>. <source>Mathematics</source>, <volume>13</volume>(<issue>17</issue>), <elocation-id>2828</elocation-id>. <pub-id pub-id-type="doi">10.3390/math13172828</pub-id></mixed-citation></ref>
<ref id="B16"><label>16</label><mixed-citation publication-type="journal"><string-name><surname>Vaswani</surname>, <given-names>A.</given-names></string-name>, <string-name><surname>Shazeer</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Parmar</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Uszkoreit</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Jones</surname>, <given-names>&#321;.</given-names></string-name>, <string-name><surname>Gomez</surname>, <given-names>A. N.</given-names></string-name>, <string-name><surname>Kaiser</surname>, <given-names>L.</given-names></string-name>, &amp; <string-name><surname>Polosukhin</surname>, <given-names>I.</given-names></string-name> (<year>2017</year>). <article-title>Attention is all you need</article-title>. <source>Advances in Neural Information Processing Systems</source>, <volume>30</volume>, <fpage>5998</fpage>&#8211;<lpage>6008</lpage>. <pub-id pub-id-type="doi">10.48550/arXiv.1706.03762</pub-id></mixed-citation></ref>
<ref id="B17"><label>17</label><mixed-citation publication-type="journal"><string-name><surname>Wei</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Wang</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Schuurmans</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Bosma</surname>, <given-names>M.</given-names></string-name>, <string-name><surname>Ichter</surname>, <given-names>B.</given-names></string-name>, <string-name><surname>Xia</surname>, <given-names>F.</given-names></string-name>, <string-name><surname>Chi</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Le</surname>, <given-names>Q.</given-names></string-name>, &amp; <string-name><surname>Zhou</surname>, <given-names>D.</given-names></string-name> (<year>2022</year>). <article-title>Chain-of-thought prompting elicits reasoning in large language models</article-title>. <source>Advances in Neural Information Processing Systems (NeurlPS)</source>, <volume>35</volume>, <fpage>24824</fpage>&#8211;<lpage>24837</lpage>. <pub-id pub-id-type="doi">10.48550/arXiv.2201.11903</pub-id></mixed-citation></ref>
<ref id="B18"><label>18</label><mixed-citation publication-type="webpage"><string-name><surname>Worden</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Croteau</surname>, <given-names>E.</given-names></string-name>, <string-name><surname>Cheng</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>McReynolds</surname>, <given-names>A.</given-names></string-name>, &amp; <string-name><surname>Heffernan</surname>, <given-names>N.</given-names></string-name> (<year>2024</year>). <article-title>Leveraging large language models for evaluating explanations in math education [NSF Public Access Repository]</article-title>. <source>Proceedings of the 14th Learning Analytics and Knowledge Conference (LAK &#8217;24)</source>. <uri>https://par.nsf.gov/biblio/10470442</uri></mixed-citation></ref>
<ref id="B19"><label>19</label><mixed-citation publication-type="webpage"><string-name><surname>Zhou</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Sch&#228;rli</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Hou</surname>, <given-names>L.</given-names></string-name>, <string-name><surname>Wei</surname>, <given-names>J.</given-names></string-name>, <string-name><surname>Carles</surname>, <given-names>N.</given-names></string-name>, <string-name><surname>Wang</surname>, <given-names>X.</given-names></string-name>, <string-name><surname>Schuurmans</surname>, <given-names>D.</given-names></string-name>, <string-name><surname>Zhou</surname>, <given-names>Y.</given-names></string-name>, <string-name><surname>Bousquet</surname>, <given-names>O.</given-names></string-name>, <string-name><surname>Le</surname>, <given-names>Q. V.</given-names></string-name>, &amp; <string-name><surname>Chi</surname>, <given-names>E. H.</given-names></string-name> (<year>2023</year>). <article-title>Least-to-most prompting enables complex reasoning in large language models</article-title>. <source>International Conference on Learning Representations (ICLR)</source>. <uri>https://openreview.net/references/pdf?id=b93l8WgU8</uri></mixed-citation></ref>
</ref-list>
</back>
</article>