Description: Advances in technology are too great to ignore. Assessment and evaluation practices in Canada appear to be lagging behind, when compared to other jurisdictions. This panel was designed to discuss recent technology advances in assessment and evaluation for possible use in Canada.
The following questions were intended to inspire and generate ideas. The speakers did not need to address the questions directly.
What are the cutting edge technologies currently being used in assessment and evaluation?
What are the opportunities and challenges in taking advantage of these technological advances in learning and assessment?
With these technological advances, where do you see the assessment and evaluation field in 5 years?
How can we begin to integrate more technology into everyday assessment and evaluation practices?
Authors: Mark J. Gierl, PhD & Tahereh Firoozi
Institution: University of Alberta
Gierl, M. & Firoozi, T. (2022, January 27-28). Preparing Students for Careers Focused on Technology-Based Assessment and Evaluation [Paper presentation]. Advancing Assessment and Evaluation Virtual Conference: Queen’s University Assessment and Evaluation Group (AEG) and Educational Testing Services (ETS), Kingston, Ontario, Canada.
We are at the beginning of a data-enabled revolution. Massive amounts of data are produced every day. This data is collected using innovative software platforms and structured using advanced computational architectures so that machine learning algorithms can extract insights and yield outcomes that inform our decisions and guide our actions. This revolution is unfolding along two separate but related streams of research activity. The first stream is foundational research, which includes designing new data science, machine learning, and artificial intelligence methods and algorithms, as well as developing the computational platforms and software applications needed to implement them. The second stream is applied research, which implements the outcomes of foundational research to solve practical problems in specific content areas. These content areas are diverse, ranging from manufacturing and construction to health, energy technologies, law enforcement, business, and finance. Education is one of these applied content areas, and it will be profoundly affected by the data-enabled revolution. Educational measurement, research methodology, and evaluation now rely on data-mining techniques, machine learning algorithms, and learning analytic methods to analyze complex educational activities and outcomes, including cognitive, behavioural, and physiological data (e.g., computer log files; eye-tracking measures) readily and abundantly generated by teachers and students. These techniques, algorithms, and methods are used to discover patterns and features that can, for instance, predict learning outcomes, guide instructional methods, and inform curriculum development.
The information extracted from technology-based assessments will allow researchers and practitioners to create data-driven learning analytic models and applications to support learning and to guide teaching in digital learning environments that are becoming increasingly common as we begin the transition to an online educational world.
Unfortunately, many of the programs that focus on educational measurement and evaluation in Canada—particularly at the graduate level—do not offer the kind of training that will enable students to contribute to either the foundational or the applied research stream guiding this data revolution. In fact, most students in educational measurement and evaluation have little training in data-mining techniques, machine learning algorithms, and learning analytic methods. Yet these students will be expected to contribute to fields and areas of research, or to compete for jobs and professions, where these techniques, algorithms, and methods are already in place. Hence, we must quickly adapt our training programs to meet these changes, and we must adjust our focus so that students become aware of and proficient in the techniques and methods that guide both streams of research. In short, we must quickly adjust to the profound changes now occurring in the field of educational measurement and evaluation. These changes are motivated by the increased interest in and growing demand for extracting patterns and meaning from complex educational data sources that, increasingly, will guide evidence-based decision making at all levels of the educational system. We must therefore revise, augment, and update the focus of our programs in educational measurement and evaluation so that the students who complete their training can create, design, evaluate, and implement modern technology-based educational assessments along with the systems needed to support them.
Every measurement and evaluation program in Canada is unique. Applied research areas within education are focused on many problems in a range of different contexts. Hence, a uniform training program is neither required nor desirable to prepare students for careers in measurement and evaluation. But we do feel that some guiding principles may be helpful. Hence the purpose of our paper is to identify and to describe three key principles that can be used to create programs to train the next generation of students in measurement and evaluation, particularly those graduate students who intend to focus on technology-enhanced assessment. For each principle, we describe the main idea behind the principle and then we describe how the principle can be implemented or supported within a program.
PRINCIPLE 1 Interdisciplinary Focus: Technology is developing concurrently in many fields and disciplines. Technology is also developing along two separate but related streams of research activity (i.e., foundational and applied). Students must understand both streams of research activity and, ultimately, how the streams are connected.
Interdisciplinary coursework (e.g., coursework taken from Education, Computing Science, Electrical and Computer Engineering, and Linguistics) is required so that different methods, perspectives, and standards of practice are infused in each student's program of study.
Recruiting students who have diverse educational backgrounds (e.g., backgrounds in Computing Science, Mathematics, or Linguistics) and a wide range of experiences (e.g., work in industry prior to returning to graduate school) broadens the scope of a program for every faculty member and student, because the addition of different backgrounds, skills, research interests, and perspectives brings important benefits.
Graduate student committee representation should be interdisciplinary so different methods, perspectives, and standards of practice are infused in each student’s program of research.
Faculty from different fields and disciplines should contribute to the outcomes in a program, with a focus on diverse but complementary research programs and teaching specializations.
PRINCIPLE 2 Flexible Program Delivery: Shifts and changes in technology occur very quickly. As a result, programs must be flexible and must be capable of accommodating these constant changes. The purpose of offering a flexible program is to prepare students to be leaders in a rapidly changing world. The transition from a graduate program to a profession means that students must be proficient in using the latest techniques, algorithms, and methods.
Multiple lines of focus in a program must be available because diverse applied research streams mean that different methods will be required to solve different types of problems. Students must therefore have many choices in the courses they complete—these choices should reflect the student's particular interests and should map onto a specific educational context. Hence, a flexible program in assessment and evaluation will have more elective than required courses.
Multiple lines of focus in a program must be available because diverse applied research streams mean that different methods will be required to solve different types of problems. Students must therefore have many choices in the research activities they pursue. Hence, a flexible program in assessment and evaluation will ensure that students can pursue diverse lines of research. This research agenda should include faculty supervision from both inside and outside the program.
Students must have access to different kinds of technology-based resources to support their coursework and research. Some of these resources will only be available outside of the university environment (e.g., Google Cloud Platform; Amazon Cloud Computing).
PRINCIPLE 3 Culture for Creating Ideas: Technology and innovation are driven by ideas. Hence, programs must cultivate a culture where faculty and students can work together to create and develop their ideas. Ideas develop in environments where creativity and risk are both valued and encouraged. Ideas also develop in environments that foster collaboration along with a healthy interplay between research and practice.
Faculty members must initiate and maintain an active research program that includes funding for graduate students.
A cohort model of student training is optimal: faculty attract highly qualified students who can participate in their research program while also collaborating with other students and faculty in the program, in the university, and beyond it.
Programs should strive to offer internships and to develop relationships with leaders in industry so students can develop their skills and see—first hand—the link between theory and practice, the benefits of collaborative research with practitioners in industry, and the importance of gaining experience in different working environments before making a career choice upon graduation.
We predict that the data revolution will change the field of educational testing. The transition from paper- to computer-based tests almost guarantees our prediction will be realized, because educators can now access tremendous amounts of data from students and teachers using online educational platforms and testing systems. The assessment and evaluation methods and techniques used in the past to model educational outcomes and to make educational inferences will become irrelevant. Fortunately, many new methods and techniques are either under development or in use to help us understand our increasingly data-focused world. It is important to ensure that students receive training that equips them to use these contemporary methods and techniques, as well as to contribute to the creation of new ones, as the field of assessment and evaluation marches forward and, in the process, continues to change and evolve.
Author: Hollis Lai, PhD
Institution: University of Alberta
Lai, H. (2022, January 27-28). Leveraging Technology in Educational Measurement [Paper presentation]. Advancing Assessment and Evaluation Virtual Conference: Queen’s University Assessment and Evaluation Group (AEG) and Educational Testing Services (ETS), Kingston, Ontario, Canada.
We are two decades into the 21st century, and educational assessment has experienced many hurdles in adopting new technologies. Computer-based testing, adaptive testing, automated scoring, and situational judgment tests are just a few of the innovations now in mass adoption. However, with the proliferation of interconnectedness through the internet and technology, the skill set required for training in assessment and evaluation continues to diverge. To effectively leverage technology in the field of educational measurement, I provide a summary of what constitutes an expert in this field, what training is necessary for the field, and what actions the field needs to take to leverage technology.
1) What is an expert of educational measurement in the 21st century?
To know what changes are necessary to harness the change in technology, we must first reflect on what is deemed an expert of educational measurement today. A measurement expert today must not only possess traditional knowledge in measurement theory, test development, validity, and standard-setting; they must also be able to solve the assessment issues of today, such as addressing concerns of equity, diversity, and inclusiveness; providing strategies for, and solving issues related to, test security and cheating detection; and providing advice and guidance to evolve testing practices so they fit the evolving requirements of learners and the profession. Added to this list of skills and knowledge is a technical body of knowledge: understanding and implementing database queries, understanding information- and software-security paradigms, user experience and software design principles, and a range of statistical and computational methods. In my decade of experience in medical education, I have noticed that these are the skills commonly required to solve the problems facing institutions and testing organizations. Suffice it to say, the list is evolving and becoming longer every day.
Testing practices are shifting in many disciplines. One such example in medicine is the adoption of competency-based training. This shift toward competency-based education has created new opportunities to assess learners, mobilized many standard-setting panels, spurred the development of new assessment collection and reporting platforms, and produced new tools to facilitate and manage learner progression (these tools are the focus of my presentation). Through this change, the experience of student training has evolved, how instructors determine learner competence has evolved, and the process of monitoring learners has evolved. This example of change management also highlights some common themes occurring in other testing practices. First, expertise in measurement is required to develop and adapt to these changes, but in the absence of educational measurement experts, the changes will continue regardless. Second, the adoption of new technology and solutions can be low or high fidelity; that is, adopting technology does not necessarily incur years of development. Third, while educational measurement is still at the kernel of most issues facing assessment and student learning, identifying the underlying measurement problem often requires a measurement expert's nuanced understanding. In sum, there are many interesting problems and opportunities to be addressed in fields outside of educational measurement, and addressing them requires a well-rounded set of skills and an understanding of all aspects of test development on a day-to-day basis.
2) What are the skills needed in educational measurement in the coming decade?
The knowledge, skills, and abilities of educational measurement are evolving at an increasing pace. This is not unlike other fields of study, where an increasing number of skills are required during training to enter the field. There are three key attributes that I think will be needed by trainees of educational measurement entering into practice.
There is an increasing requirement for educational measurement to solve problems in different domain spaces. Skills related to equating, survey and test development, latent scoring, statistical modelling, and standard-setting are still needed, but they are now stacked alongside other required knowledge, such as machine learning techniques, image and text processing approaches, and constraint programming approaches. A subset of these skills will help to solve the problems emerging in the next decade.
There is also a shift in the types of studies and research ongoing in educational measurement. Specifically, there is an increasing focus toward a problem-solving framework in presenting novel solutions in the field. In this shift, there are more and more applications and solutions from other fields of study that are being applied to problems in educational measurement. As the field of educational measurement evolves to incorporate issues in data and learning sciences, a problem-solving approach in studies may help in establishing a paradigm of knowledge.
In educational measurement, learning needs to be captured at all levels of training. From early childhood education to the training of specialists, each stage or context of learning is different and requires different solutions for the evaluation of performance. Training in educational measurement should cover the nuances of each context but allow learners to apply their knowledge in an agnostic manner; that is, educational measurement experts should be well versed in the assessment of knowledge across all levels of learning. This is an important distinction, as measurement experts in the upcoming decade will likely be adapting to different learning contexts: training for different professions, languages, and cultures, and assessing the skills and knowledge of learners with different backgrounds.
3) How do we as a field leverage technology?
Technology is the application of scientific knowledge in a field of practice. For educational measurement, the challenge of applying new technology is threefold: the ability to create the technology required in the field, the ability to adapt existing technology for use in the field, and the ability to persuade stakeholders to adopt the available technology. Experts and learners in educational measurement require a new skill set to create new technologies. Although machine learning and natural language processing techniques are in high demand, they are not the only skills required to leverage technology. Other fields of knowledge, such as software architecture, cognitive science, design thinking, and user experience design, can contribute to the creation of new technology to better facilitate and capture the measurement of learning. Learning, and the evaluation of learning, is a task pervasive across many fields. By adapting paradigms and methods from different fields, educational measurement can better leverage and improve its methods for measuring learning.
As experts in educational measurement, developing and adapting new methods of assessment is an important role that will sustain the field. However, an equally needed role lies in knowledge translation of the new methods and technologies for assessment. This role requires an in-depth understanding of learning theories, validity arguments, and communication. With an ever-increasing depth of knowledge in every field, being able to translate the theories and methods we use so that stakeholders understand them is a missing piece in moving theory into practice. Venues such as this one help us communicate our background and contribute to learning among a team of experts in computer, data, cognitive, and statistical science.
In sum, my outlook remains positive: educational measurement is still needed even as it undergoes change. Our ability to adapt in a domain that requires knowledge of computer, data, cognitive, and statistical science relies on our ability to create and adapt new methods to improve the measurement of learning. Admittedly, the impact of the internet, social media, and the subsequent data revolution has created a demand to make sense of the burgeoning amount of data. It is now up to us, as experts in educational measurement, to design, prototype, validate, and implement solutions that fit our evolving needs for assessment.
Author: Cameron Montgomery, PhD
Organization: Education Quality and Accountability Office (EQAO)
Montgomery, C. (2022, January 27-28). EQAO Digitalization of Large-Scale Assessments [Paper presentation]. Advancing Assessment and Evaluation Virtual Conference: Queen’s University Assessment and Evaluation Group (AEG) and Educational Testing Services (ETS), Kingston, Ontario, Canada.
The field of education has evolved considerably in recent years, increasingly incorporating digital learning tools. As an agency of the Government of Ontario mandated to assess students at key stages of their learning journey, the Education Quality and Accountability Office (EQAO) strives to continually improve its support of positive learning outcomes for all students. The agency quickly realised it needed to modernize its operations to better reflect the new, fast-changing digital landscape. Digitalizing assessments ensures the agency continues to be responsive to the needs of the education community and that each student taking an assessment is offered the opportunity to fully demonstrate their understanding of the curriculum.
One of the challenges facing large-scale assessment today is creating an assessment system in which each test taker can fully demonstrate their knowledge and skills while the system maintains the reliability of data inherent to standardized testing. One digital assessment model, computer adaptive testing, more accurately reflects student achievement and the test taker's understanding of the curricula being assessed. At this time, for its province-wide standardized assessments, EQAO has adopted a Multi-Stage Computer Adaptive Testing (msCAT) model to assess mathematics and a testlet-based Linear-on-the-Fly (tLOFT) model to assess literacy.
Digitalized large-scale assessments offer several opportunities compared to the paper-based assessment model formerly used by EQAO:
Better alignment with the digital world that students engage with every day: In an age of rapid technological advancement, online assessments reflect both classroom and education experiences better.
Flexible assessment administration window: The computer assessment model allows for an assessment administration that can be available throughout the school year to better support boards' and schools' schedules. Schools and boards can administer large-scale assessments at their convenience, to several groups of students, at any time within the administration period.
Faster reporting: Computer assessments can offer immediate results for multiple-select items upon completion by the student. Additionally, remote online scoring of open-ended items is available at the scorer's convenience, which allows the agency to engage educators from across the province for all its scoring activities.
Inherent accommodations tools: Several accessibility tools are offered to every student through each assessment’s toolbar.
Smaller carbon footprint: Digitalized assessments help reduce the waste associated with paper-based products.
At this juncture, the agency has identified some challenges linked to digitalized, online large-scale assessments:
Unique information technology infrastructure of each school board: Ontario is a large province consisting of 72 boards and school authorities, each with its own technological needs, geography, and system availability.
Device availability: There are significant differences in the availability of in-person student devices across schools and boards. (This is being mitigated by offering longer administration windows of two to six months.) Additionally, the assessments need to be administered in person to a specific group in a specific physical setting and cannot be proctored remotely at this time.
New large-scale assessment process: Following the government’s directive, the provincial assessments’ online model and platform were developed and introduced rapidly after more than twenty years of administering paper-based assessments. EQAO needs to inform stakeholders in a very short time, and the agency continues to adapt learning modules aimed at administrators and other stakeholders as the online assessment initiative moves forward.
Equity is a key consideration in implementing digital assessments, and it is important that the assessments be aligned with the everyday experiences of students. The technology itself should not be the driving force; it is the data and insights from the assessments that are paramount for system and student improvement. The agency's approach is anchored in being as flexible as possible and allowing schools to schedule the assessments at times that suit them.
As the education and evaluation fields learn more about digital large-scale assessment platforms, and as data become available for analysis, we can expect technological advances to allow assessments in a few years to reflect student experiences much more accurately. Of particular note, assessments must continue to be responsive to the accessibility needs of each test taker, with particular attention to increasing assessment customization and including more accessibility tools. To better engage students as they write the assessment, and to help reduce the stress associated with "exams", digital assessments will likely explore in earnest the possibilities offered by gamification models.
2.5 Using Text Mining to Identify Authentic Content & Neural Network to Score the College Major Preference Assessment
Authors: Amery Wu (1), Shun-Ful Hu (1), & Jacob Stone (2)
Institutions: (1) The University of British Columbia; (2) Visier Inc.
Wu, A. D., Hu, S. F., & Stone, J. E. (2022, January 27-28). Using Text Mining to Identify Authentic Content & Neural Network to Score the College Major Preference Assessment [Paper presentation]. Advancing Assessment and Evaluation Virtual Conference: Queen’s University Assessment and Evaluation Group (AEG) and Educational Testing Services (ETS), Kingston, Ontario, Canada.
Background & Purpose
Traditional Psychometrics makes use of designed data, interpreted within the researchers’ measurement/assessment framework. In contrast, Data Science and Artificial Intelligence have little interest in, or dependence on, a framework for interpretation; they use natural data to uncover patterns/algorithms for future prediction.
Computational Psychometrics, as a field, emerged from traditional Psychometrics in response to the variety, velocity, and volume of data obtained from the complex assessment systems made possible by new digital technology (von Davier, Mislevy, & Hao, 2021). It incorporates devices from Data Science and Artificial Intelligence into traditional psychometric methodology.
This presentation reports two case studies of Computational Psychometrics utilizing the College Major Preference Assessment (CMPA; iKoda Research, 2017). The CMPA assists individuals in finding their top three favorites among 50 college majors commonly offered in North America. Case-1 used text mining techniques from Data Science to identify the CMPA assessment content. Case-2 used a multilabel neural network from machine learning to score a short version of the CMPA, with the aim of predicting as well as the original version.
Overview for CMPA
The design of the CMPA has two sections with different assessment formats, in sequence: Likert-type rating and forced-choice. The Likert-type format is designed to narrow the 50 majors down to a list of candidates. The forced-choice format is designed to further nail down one’s top three favorites. In addition, as a person progresses through the assessment, there are three steps at which the respondent’s low-scoring majors are screened out from any further assessment. This results in an adaptive test increasingly personalized to the respondent. The assessment results are intended for personal interpretation and use only. Wu (2021) reported good reliability and validity evidence for the CMPA.
Case-1: Using Text Mining Technique to Identify CMPA Assessment Materials
To be as realistic and authentic as possible, the content of CMPA was extracted from the natural texts posted on the official websites. What follows describes the methods for creating the assessment content.
First, the corpus was gathered from publicly available descriptions of the courses that constituted a university major. Data were collected from over 40 North American universities. The data collected for each major were compiled into a single, very large document of raw text.
Then, we computed the term frequency-inverse document frequency (TF-IDF) for each document following the formula TF-IDF = TF × ln(1/DF), where TF was the frequency of the term (word) in the document and DF was the number of documents for the different majors in which the term appeared (see Fan & Qin, 2018). The TF-IDF yielded large values for words that appeared frequently in only one major but not others. This helped identify the ten most unique words for each major.
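The TF-IDF step can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the whitespace tokenization and the toy corpus below are assumptions, and it uses the common ln(N/DF) form of the inverse document frequency (N being the number of major documents), which rewards terms that are both frequent within one major's document and rare across the others.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Per-document TF-IDF scores for every term.

    documents: {major_name: raw_text}. Uses the standard form
    TF-IDF = TF * ln(N / DF), where TF is the term's frequency in the
    document, N is the number of documents, and DF is the number of
    documents (majors) in which the term appears.
    """
    n_docs = len(documents)
    # Document frequency: in how many majors' documents each term appears.
    df = Counter()
    for text in documents.values():
        df.update(set(text.lower().split()))

    scores = {}
    for major, text in documents.items():
        tf = Counter(text.lower().split())
        scores[major] = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
    return scores

def top_terms(scores, major, k=10):
    """The k most characteristic (highest TF-IDF) terms for one major."""
    return sorted(scores[major], key=scores[major].get, reverse=True)[:k]
```

With a toy corpus in which "earthquake" appears only in the Geology document while "rock" appears everywhere, `top_terms` surfaces "earthquake" ahead of the shared word, mirroring how the unique words for each major were identified.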
Next, we traced back to the original textual content to find the phrases that the ten terms were originally embedded in. For example, the term “earthquake” had a high TF-IDF for the document Geology, and it was embedded in phrases such as “how earthquakes occur.”
This way, we identified phrases that best characterized the unique nature, activity, work, topic, and skills, associated with a major. These identified phrases were used as the basis for creating both the Likert-rating and forced-choice items. Other examples of the identified phrases are “how the mind controls behaviour” for Psychology and “how plastic reacts when stressed” for Material Sciences.
Case-2: Using Neural Network to Score a Short Version of CMPA
In machine learning, a neural network is a circuit of layers of artificial neurons, called nodes, connected by directional edges. The first layer is the input data, and the last layer is a set of nodes, each representing the predicted probability of a different outcome. The intermediate nodes are hidden (latent) variables that are functions of the nodes in the previous layer, where the edges represent the weights. The goal is to find the optimal structure, functions, and weights for the network (i.e., to train it) based on the input data, so that it produces the best outcomes for future predictions.
Case-2 used a supervised (labeled) machine learning technique of multilabel neural network to score the short version of CMPA. The short version consisted of only the first section, i.e., the 99 Likert items. As such, the neural network was regarded as a machine for scoring the 99 Likert items. If the scoring accuracy is satisfactorily high for all majors, it will be defensible to use only the short version to assess individuals’ preference for the 50 majors, which is more time-efficient and less cognitively burdensome on the respondents.
Thus, the task was to train the multilabel neural network on the input data so that the short version could identify the top three majors for future users as effectively as the original longer version scored with the original summing-up method. For each respondent, the trained network would output fifty probabilities (final scores), one for each major. To evaluate prediction accuracy, the top three majors identified by the predicted probabilities were compared to the actual top three identified by the original CMPA procedure, and the proportion of agreement was taken as the accuracy rate. See Gargiulo, Silvestri, Ciampi, and De Pietro (2019) for more explanation of multilabel neural networks.
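The scoring network and the agreement metric described above can be sketched as a NumPy forward pass. The dimensions (99 Likert inputs, two middle layers of 64 nodes, and 50 sigmoid outputs, one per major) follow the paper; the ReLU hidden activations, the random weights, and the respondent vector are placeholders standing in for a trained network and real responses, so this is a structural sketch rather than the authors' implementation.

```python
import numpy as np

def forward(x, weights, biases):
    """Forward pass: hidden layers with ReLU, then a sigmoid output layer
    giving one preference probability per major (multilabel output)."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(0.0, h @ W + b)        # hidden layers, ReLU
    logits = h @ weights[-1] + biases[-1]
    return 1.0 / (1.0 + np.exp(-logits))      # independent probabilities

def top3_agreement(pred_probs, true_top3):
    """Proportion of the predicted top-three majors that match the top
    three identified by the original CMPA scoring procedure."""
    pred_top3 = set(np.argsort(pred_probs)[-3:])
    return len(pred_top3 & set(true_top3)) / 3.0

# Dimensions from the paper: 99 Likert items in, two 64-node middle
# layers, 50 majors out. Weights here are random placeholders; in
# practice they would be learned from the labeled training data.
rng = np.random.default_rng(0)
sizes = [99, 64, 64, 50]
weights = [rng.normal(0, 0.1, (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

x = rng.normal(0, 1, 99)            # one respondent's 99 Likert responses
probs = forward(x, weights, biases)  # 50 preference probabilities
```

Averaging `top3_agreement` over all respondents in a held-out set yields the accuracy rates reported below.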
The results show that, with two middle layers, each with 64 nodes, the short version predicted the original outcomes exceedingly well. The minimum accuracy rate was 83% for Philosophy, and the maximum accuracy rate was 99% for Chemical Engineering, Electronic Engineering, and Materials Science. The overall accuracy had a median = 96% and mean = 95% across the 50 majors.
The results were also highly generalizable to future users of the CMPA. Generalizability was evaluated on a held-out dataset, separate from the training set, that was reserved for testing the trained multilabel neural network. The accuracy rates were nearly equal to those reported above for the training data: the minimum was 81% for Psychology and the maximum was 99% for Chemical Engineering, Electronic Engineering, and Materials Science. Across the 50 majors, the overall accuracy had a median of 95% and a mean of 94%.
Conclusion & Future Work
Both case studies showed that techniques from Data Science and Artificial Intelligence, when used carefully with human judgement, can contribute to Psychometrics in a substantive way. That said, a frequent criticism of neural networks is their black-box approach to prediction: the algorithm is entirely data driven and often hard to interpret. This conflicts with the focus of Psychometrics on explanation and evaluation. To tackle this problem, our future work will delve into how explanatory tools in Psychometrics can contribute to explainable Artificial Intelligence, a field that helps make sense of how the input data contribute to a prediction.
As presented in a very recent book edited by von Davier et al. (2021), there are already innovative examples of Computational Psychometrics. We anticipate a greater variety of marriages between Psychometrics and modern technology, creating more exciting assessment projects.
Fan, H., & Qin, Y. (2018, May). Research on text classification based on improved tf-idf algorithm. In X. Luo (Ed.), 2018 International Conference on Network, Communication, Computer Engineering (NCCE 2018) (pp. 501-506). Atlantis Press. https://doi.org/10.2991/ncce-18.2018.79
Gargiulo, F., Silvestri, S., Ciampi, M., & De Pietro, G. (2019). Deep neural network for hierarchical extreme multi-label text classification. Applied Soft Computing, 79, 125-138.
iKoda Research (2017). Found a major you love. College Major Preference Assessment. Retrieved from: https://www.i-koda.com/ikoda-college-website/.
von Davier, A. A., Mislevy, R. J., & Hao, J. (Eds.). (2021). Computational Psychometrics: New Methodologies for a New Generation of Digital Learning and Assessment. Methodology of Educational Measurement and Assessment. Springer, Cham.
von Davier, A. A., Mislevy, R. J., & Hao, J. (2021). Introduction to Computational Psychometrics: Towards a Principled Integration of Data Science and Machine Learning Techniques into Psychometrics. In A. A. von Davier, R. J. Mislevy, & J. Hao (Eds.), Computational Psychometrics: New Methodologies for a New Generation of Digital Learning and Assessment. Methodology of Educational Measurement and Assessment. Springer, Cham. https://doi.org/10.1007/978-3-030-74394-9_1
Wu, S. (2021). Comparing Likert-type and forced-choice formats for assessing preference: a validation study on the College Major Preference Assessment (CMPA) [Master’s thesis, University of British Columbia]. UBC Theses and Dissertation. Retrieved from https://open.library.ubc.ca/collections/ubctheses/24/items/1.0397018.
Author: Greg Roussel
Organization: Grand Erie District School Board
Roussel, G. (2022, January 27-28). Untitled [Paper presentation]. Advancing Assessment and Evaluation Virtual Conference: Queen’s University Assessment and Evaluation Group (AEG) and Educational Testing Services (ETS), Kingston, Ontario, Canada.
Advances in technology are too great to ignore. Assessment and evaluation practices in Canada appear to be lagging behind, when compared to other jurisdictions. This panel is designed to discuss recent technology advances in assessment and evaluation for possible use in Canada.
What are the cutting-edge technologies currently being used in assessment and evaluation?
The move to remote learning as a consequence of the COVID-19 pandemic has required many educators to adopt new tools and skills, including video conferencing, shared documents, and digital spaces. Although these tools were available to educators in the Grand Erie District School Board prior to the pandemic, they were not widely used. Similarly, although the board had implemented the learning management system Brightspace (see below), its usage was limited to the more technically savvy teachers and the teacher-consultants responsible for education technology. With the abrupt transition to remote learning in March of 2020, educators found themselves working in an unfamiliar context that required them to quickly develop new skills for the effective use of these tools in remote teaching.
Brightspace is a promising learning management system that allows educators to interact with students in a digital space using a variety of tools such as:
Posting assignments and rubrics
Creating small discussion groups and assignment spaces for students
Collecting completed work for assessment and evaluation
Allowing students to develop and maintain a portfolio of work to demonstrate learning
This interactive environment gives teachers and students the ability to share work and provide feedback in an asynchronous fashion. Features include:
Teachers can post assignment rubrics ahead of time for students to reference.
Students can upload assignments.
Teachers can provide feedback in a variety of ways, including notations in the document and a video response.
The Student Portfolio tool lets students provide evidence of their learning (e.g., a short audio clip explaining a concept, illustrations, notes), which teachers can use in their overall assessment.
One value-added feature of Brightspace is that it allows teachers to link specific expectations from the Ontario curriculum, which is available in the software, to an assignment. When evaluating the assignment, the teacher can see all the related expectations and assign the appropriate achievement level (Level 1, 2, 3, 4) for each expectation. For example, in Grade 9 Academic English an assignment may be designed to assess Reading for Meaning, with the specific expectation of Demonstrating Understanding of Content. From the Ontario curriculum:
1.3 identify the important ideas and supporting details in both simple and complex texts (e.g., select details from a story to create a profile of a character in the story; use a graphic organizer to categorize the ideas in an article).
Other tools in Brightspace include the ability to create online quizzes and surveys, set up small group discussions and assignment spaces, and build checklists to help students navigate and manage the course.
What are the opportunities and challenges in taking advantage of these technological advances in learning and assessment?
The use of digital tools such as Brightspace provides tremendous opportunities for educators to meet students where they are in terms of learning. Students today are “digital natives” and tend to possess technical skills more advanced than those of most adults. By reaching students on the devices they use daily (phones, tablets, laptops), teachers can make learning more engaging. The asynchronous nature of digital spaces and remote learning allows students and teachers to engage with each other at times that are convenient for both.
The biggest obstacle in leveraging these technological advancements has been the slow uptake of new digital tools by educators. During the most recent provincial shutdown, daily usage of Brightspace more than tripled at Grand Erie, suggesting that many teachers use the tool only during remote learning.
While some teachers use Brightspace on a daily basis, they take advantage of only a few of its tools. Teachers predominantly use the Assignments tool, by an almost 5:1 ratio over the next most popular tool, Quizzes. Viewed through the SAMR model (Substitution, Augmentation, Modification, and Redefinition), it is clear that most educators are still at the substitution or augmentation stage: teachers are collecting assignments digitally rather than on paper.
While school boards have devoted funds to making technology available to all students, the issue of equitable access is not easily addressed. Many remote communities do not have broadband internet, and families from lower socio-economic backgrounds may lack the financial resources to acquire the devices needed to access the technology.
With the technological advances where do you see the assessment and evaluation field in 5 years?
As a new generation of teachers who have grown up with advanced educational tools enters the classroom, we should expect them to use technology in their assessment and evaluation practices more often. Given that most current technology use merely substitutes pixels for paper, there is hope that the next wave of teachers will move to modifying and redefining how these advances are used. The time that teachers save using digital tools for assessment (administration and marking) will hopefully be redirected to deeper reflection on individual student needs and instruction relevant to those needs.
How can we begin to integrate more technology into everyday assessment and evaluation practices?
With the expectation of increased use of digital tools comes the requirement of increased training and support. It is not sufficient to hand teachers and educators new digital tools and expect that they will be implemented in the classroom. For effective and sustained use of technology, educators require ongoing support and training from the Ministry and school board administrations so they can see and explore the possibilities available to them in the classroom. It is also important to recognize how variable educators’ digital capacity is, which means training will require a tiered approach to meet staff where they are. Ideally, training would itself take advantage of digital technology, with on-demand sessions to accommodate varying schedules and both in-person and asynchronous opportunities to troubleshoot and share experiences.
Ontario Ministry of Education. (2007). The Ontario Curriculum, Grades 9 and 10: English. Retrieved January 5, 2022 from http://www.edu.gov.on.ca/eng/curriculum/secondary/english910currb.pdf
Romrell, D., Kidder, L., & Wood, E. (2014). The SAMR Model as a framework for evaluating mLearning. Online Learning Journal, 18(2). Retrieved January 5, 2022 from https://www.learntechlib.org/p/183753/
2.7 Discussant Summary: Advancing Assessment and Evaluation by Leveraging Technology: Perspectives from and Implications for Research, Practice, and Policy in a Rapidly Changing World
Author: Amanda Cooper, PhD
Institution: Queen’s University
Cooper, A. (2022, January 27-28). Advancing Assessment and Evaluation by Leveraging Technology: Perspectives from and Implications for Research, Practice, and Policy in a Rapidly Changing World [Discussant remarks]. Advancing Assessment and Evaluation Virtual Conference: Queen’s University Assessment and Evaluation Group (AEG) and Educational Testing Services (ETS), Kingston, Ontario, Canada.
The COVID-19 pandemic has caused widespread school closures that have necessitated virtual learning at scale across K-12 education systems around the world. Consequently, education systems globally are grappling with rapidly changing virtual learning environments and the challenges of leveraging technology to facilitate student learning. This theme explores advancing assessment and evaluation by leveraging technology and includes five panel members coming from different vantage points, including universities, policymaking environments, and school districts. Knowledge mobilization (KMb) explores how research is applied and used to inform policy and practice in public service sectors. KMb utilizes a whole-system perspective that explores three dimensions: research production (funders and universities), research use (practice and policymaking organizations such as ministries of education and school districts), and the intermediary organizations that shape education systems (such as the Education Quality and Accountability Office in Ontario). I utilize these KMb dimensions to explore the three vantage points offered by the diverse panelists – Research, Policy, and Practice – in relation to advancing assessment and evaluation by leveraging technology in education.
Contributors to Theme 2 Thought Papers
Dr. Mark Gierl, Professor of Educational Psychology, Faculty of Education, University of Alberta
Dr. Hollis Lai, Associate Professor, Faculty of Medicine and Dentistry, University of Alberta
Dr. Cameron Montgomery, Chair, Education Quality and Accountability Office (EQAO), Ontario
Dr. Amery Wu, Associate Professor, Faculty of Education, University of British Columbia
Greg Rousell, System Research Leader, Grand Erie District School Board
Discussant: Dr. Amanda Cooper, Associate Professor of Educational Policy and Leadership and Associate Dean of Research, Faculty of Education, Queen’s University
Short Paper Summaries
All five panelists describe the current context of technology and assessment in relation to rapid growth and development, including exponential growth of data across societal systems and virtual platforms. Drs. Gierl, Lai, and Wu explore technology and assessment in relation to research from university perspectives, whereas the final two panelists explore these issues from a policy perspective (Dr. Montgomery) and a practice perspective (Rousell) within a school district.
Dr. Gierl highlights that “we are at the beginning of a data-enabled revolution” (p.2). He outlines two separate but interrelated streams of research activity arising from this revolution:
Foundational Research: “Foundational research includes designing new data science, machine learning, and artificial intelligence methods and algorithms. It also includes developing appropriate computational platforms and software applications to implement these methods and algorithms.”
Applied Research: “Applied research implements the outcomes from the foundational research to solve practical problems in specific content areas. These content areas are not only different but also diverse, as they include fields that range from manufacturing to construction, health, energy technologies, and law enforcement, business to finance. Education is one of the applied content areas.”
Dr. Gierl outlines three key principles that are needed to train the next generation of students in measurement and evaluation: (1) Interdisciplinary, (2) Flexible Program Delivery, and (3) a Culture for Creating Ideas. Dr. Gierl proposes an intriguing idea of “creativity and risk” for the field of educational testing to meet the needs of the data revolution and the changing landscape of assessment in the midst of technological advancement.
Dr. Lai’s paper begins by highlighting the historical hurdles the field of educational assessment has faced in adopting new technologies. He outlines assessment innovations such as computer-based testing, adaptive testing, automated scoring, and situational judgement tests that now have widespread adoption. He structures his paper around three questions: (1) What is an expert of educational measurement? (2) What skills are needed? (3) How can technology be leveraged? The example of the shift in medical education to competency-based medical training reflects some of the changing landscape of assessment in professional training programs. Dr. Lai concludes his paper with a focus on LEARNING and the evaluation of learning as the common task and a unifying concept among disparate approaches, methods, and outcomes. He suggests that knowledge translation of the new methods and technologies needed for assessment is required, alongside new skills and technologies for experts of educational measurement.
Dr. Wu explores data science and artificial intelligence through two cases. Case 1 uses text mining techniques from data science to identify college majors across 40 universities in North America for the College Major Preference Assessment (CMPA). Case 2 uses a multilabel neural network in machine learning to score a short version of the CMPA (99 Likert items). This approach predicted outcomes exceedingly well, with an overall accuracy across the 50 majors of median 96% and mean 95%.
Policy and Practice
Dr. Montgomery’s paper provides a policy perspective on large-scale assessment through an overview of the move to digitize EQAO (an intermediary organization that provides standards for and implementation of literacy testing at scale across the province of Ontario). He outlines the benefits and challenges this year associated with the move to digital implementation of EQAO testing. Dr. Montgomery highlights benefits of this approach, including a more flexible assessment window, faster reporting of results, a smaller carbon footprint, and built-in accessibility tools. However, he also highlights the challenges of changing EQAO in relation to uneven technological capacity across the 72 school boards in the province, device availability in schools, and the capacity building needed to help principals and school leaders undertake this digitized assessment.
Greg Rousell, a system research leader from a school district, offers a much-needed perspective from practice. He outlines the challenges of virtual learning and leveraging technology with teachers on the frontlines. His paper explores the use of Brightspace as a platform that offered a range of tools to teachers, all of whom moved to virtual learning platforms during the COVID-19 school closures. While students’ capacity for technology often rivals that of older generations (suggesting that technology-mediated learning might prove more engaging for students), the technological capacity of teachers is variable. Consequently, more work with educators and students is needed if school districts are to optimize the use of technology in assessment, and in virtual learning more broadly.
Implications and Future Questions
The papers provoked ideas on how to leverage technology for assessment that varied depending on the vantage point from research, policy, or practice. I propose sparks for future thinking in relation to three areas: the good, the bad, and the unknown.
New Possibilities: Technology is enabling things in assessment we might never have imagined, such as artificial intelligence, machine learning, and connecting global learners across dynamic platforms.
A Call for Interdisciplinarity and Collaboration: New research opportunities are emerging that encourage diverse collaboration across sectors, improving our ability to solve complex social challenges.
Capacity-Building Needed: Capacity across sectors has not yet met the demands of this rapidly changing environment; as such, investment is needed across the system in both research-producing organizations (universities) and research-using organizations (schools).
Graduate Programs Must Become More Dynamic: In universities, new graduate programs are needed, with more dynamic pathways and diverse committee structures that push the boundaries of combining expertise from different fields.
Schools: In school districts, technological and internet infrastructure is needed alongside training and capacity building for stakeholders.
Policymaking: In governments and intermediaries, infrastructure and capacity building to meet the challenges of the digital age.
Equity remains an important concern across our public institutions, especially in education, in relation to how technology influences learning outcomes, assessment, and evaluation among diverse groups.
While many discuss the benefits of technology for equity in relation to assessment, there is little discussion of how technology might widen gaps across various learners in our society and actually contribute to further inequities. These two possibilities are not mutually exclusive: technology can offer positive contributions in some areas of equity while simultaneously deepening divides in other contexts.
More work is needed to explore how research, policy and practice are changing (or not) to meet the needs of a new era and how our public institutions can grapple with rapid and continual changes in technology.
As a scholar of knowledge mobilization and translation, I often talk about the connections between research, practice, and policy – the push and pull of competing demands across large-scale systems and diverse types of organizations and stakeholders. This panel and its diverse perspectives show that assessment and evaluation is a complex landscape involving diverse actors, organizations, and policies. As such, harnessing the benefits of technology to advance assessment will require alignment and collaboration across research systems and school systems, in conjunction with the policy landscape. In closing, I offer further questions to consider:
What role might policy play in harnessing technology for assessment from diverse vantage points in universities, government, and school districts?
What is the potential for “creativity and risk” (Gierl, 2022) in our public institutions?
What is the role of social innovation in advancing efforts to advance assessment by leveraging technology?
What potential exists to engage diverse actors to optimize these efforts, for instance, creating partnerships with industry?
What are the facilitators and barriers created by technology in assessment in relation to EDII?