Learning Analytics: How to Use Students’ Big Data to Improve Teaching
Learner interaction in an online educational environment (for example, logging into a virtual learning management system) leaves many digital traces behind. When these traces are aggregated across as many students as possible, the resulting data set is commonly referred to as Big Data. The measurement, collection, analysis and reporting of data about learners and their contexts, for the purposes of discovering patterns and of understanding and optimizing learning and the environments in which it occurs, is known as Learning Analytics. This article shows how Learning Analytics can be used to improve teaching and learning through description, diagnosis, prediction and prescription based on the collected student data.
2. Big Data Definition
Big Data can be defined as extremely large structured, semi-structured and unstructured data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions. Although there is no precise threshold, the term typically refers to data volumes in excess of one terabyte (TB).
Big Data has three main characteristics: Volume (amount of data), Velocity (speed of data in and out), Variety (range of data types and sources).
- Volume describes the amount of data generated by organizations or individuals. Big Data is usually associated with this characteristic.
- Velocity describes the frequency at which data is generated, captured and shared.
- Variety describes the range of data types and sources. Big Data means much more than rows and columns: it includes unstructured text, video and audio that can have an important impact on company decisions, if it is analyzed properly and in time.
An example of Big Data might be petabytes (1,024 terabytes) or exabytes (1,024 petabytes) of data consisting of billions to trillions of records about millions of people, all from different sources (e.g. the Web, Learning Management Systems, MOOCs, social media, mobile data and so on). The data is typically loosely structured and often incomplete or hard to access.
3. Learning Analytics Definition
There are many definitions of Learning Analytics. One popular definition states that learning analytics are “the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs” [Siemens, 2011]. Erik Duval [Duval, 2012] has proposed the following definition: “learning analytics is about collecting traces that learners leave behind and using those traces to improve learning”. Rebecca Ferguson [Ferguson, 2014] places learning analytics in a continuum:
- High-level figures: Provide an overview for internal and external reports and are used for organizational planning purposes.
- Academic analytics: Figures on retention and success, used by the institution to assess performance.
- Educational data mining: Searching for patterns in the data.
- Learning analytics: Use of data, which may include ‘big data’, to provide actionable intelligence for learners and teachers.
The most common use of learning analytics is to identify students who appear less likely to succeed academically and to enable targeted interventions to help them achieve better outcomes. According to Gartner [Gartner, 2013] there are four types of Learning Analytics:
- Descriptive Analytics: uses data mining to provide insight into the past and answer: “What has happened?”
- Diagnostic Analytics: examines data or content to answer the question: “Why did it happen?”
- Predictive Analytics: uses statistical models to anticipate the future and answer: “What could happen?”
- Prescriptive Analytics: uses optimization and simulation algorithms to advise on possible outcomes and answer: “What should we do?”
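As a rough illustration of the first two questions, the sketch below works on a small made-up data set (the student names, login counts and grades are invented for this example and do not come from any real system). The descriptive step summarises what happened; the diagnostic step checks whether low grades go together with low activity.

```python
from math import sqrt

# Hypothetical records: (student, logins during the term, final grade out of 100).
records = [
    ("ana", 42, 81),
    ("ben", 5, 38),
    ("cho", 30, 70),
    ("dev", 12, 52),
    ("eli", 50, 90),
]


def mean(values):
    return sum(values) / len(values)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


logins = [r[1] for r in records]
grades = [r[2] for r in records]

# Descriptive: "What has happened?"
average_grade = mean(grades)
failed = [name for name, _, grade in records if grade < 50]

# Diagnostic: "Why did it happen?" A correlation is a hint, not proof of a cause.
login_grade_correlation = pearson(logins, grades)

print(f"average grade: {average_grade:.1f}")
print(f"failed: {failed}")
print(f"correlation logins/grades: {login_grade_correlation:.2f}")
```

A strong positive correlation here would only suggest that activity and grades move together; deciding why still requires looking at the course and the students behind the numbers.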
Many educational institutions are still in the descriptive stage, using traditional Business Intelligence (BI) approaches: get all your data together and use visualization tools to get an overview on what has happened.
Gartner’s analytics model may be a good starting point to explain and prepare the transition to Artificial Intelligence (AI).
Diagnostic analytics is about figuring out why an event happened, and uses techniques such as data discovery, data mining, and correlations.
Predictive analytics goes a step further and tries to project what will happen. Usually this is done by using existing data to train predictive machine learning (ML) models. With the advent of Artificial Intelligence (AI), predictive and prescriptive analytics can become more precise in providing effective advice and actions. [Boobier, 2018]
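To make the idea of training a predictive model on existing data concrete, here is a minimal sketch of logistic regression fitted by stochastic gradient descent, using only the Python standard library. The features (logins per week, assignments submitted), the labels and the hyperparameters are all assumptions made up for this example; a real project would more likely rely on an established ML library and far more data.

```python
import math

# Hypothetical training data:
# (logins per week, assignments submitted) -> 1 = dropped out, 0 = completed.
training = [
    ((0.5, 0), 1), ((1.0, 1), 1), ((1.5, 0), 1), ((2.0, 1), 1),
    ((4.0, 3), 0), ((5.0, 4), 0), ((6.0, 3), 0), ((7.0, 5), 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Estimated probability that a student with these features drops out."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(data, epochs=2000, learning_rate=0.1):
    """Fit weights and bias with plain stochastic gradient descent."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in data:
            error = predict_proba(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


weights, bias = train(training)

inactive = predict_proba(weights, bias, (1.0, 0))
active = predict_proba(weights, bias, (6.0, 4))
print(f"inactive student: {inactive:.2f}, active student: {active:.2f}")
```

The model outputs a probability rather than a verdict, which is what allows a later prescriptive step to decide how strong a signal must be before anyone intervenes.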
4. Learning Analytics in Higher Education
Every time students interact with their educational institution, by logging into a virtual learning management system (like Moodle for instance), by submitting an online assessment, taking an online quiz or by visiting the library, they leave digital traces behind. Those digital traces can be collected and analyzed so that we can optimize teaching and learning.
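A first step in working with such traces is usually to aggregate them per student. The sketch below assumes a simplified, hypothetical event log of (student id, event type) pairs; real LMS logs carry many more fields, such as timestamps and course identifiers.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (student id, event type).
events = [
    ("s1", "login"), ("s1", "quiz_attempt"), ("s1", "assignment_submitted"),
    ("s2", "login"),
    ("s1", "login"), ("s3", "login"), ("s3", "library_visit"),
]


def summarise(log):
    """Count how often each event type occurs for each student."""
    per_student = defaultdict(Counter)
    for student, event_type in log:
        per_student[student][event_type] += 1
    return per_student


summary = summarise(events)
for student in sorted(summary):
    print(student, dict(summary[student]))
```

Counts like these are the raw material for the indicators used by the analytics approaches described in this article.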
There are many advantages in using Learning Analytics in Higher Education [Sclater et al., 2016]:
- As a tool for quality assurance and quality improvement – with many teaching staff using data to improve their own practice, and many institutions using learning analytics as a diagnostic tool on both an individual level (e.g. identifying issues) and a systematic level (e.g. informing the design of modules and degree programs).
- As a tool for boosting retention rates, with institutions using analytics to identify at-risk students and intervening with advice and support at an earlier stage than would otherwise be possible.
- As a tool for assessing and acting upon differential outcomes among the student population, with analytics being used to closely monitor the engagement and progress of sub-groups of students, such as BME students or students from low participation areas, relative to the whole student body, prior to assessment results being made available.
- As an enabler for the development and introduction of adaptive learning – i.e. personalised learning delivered at scale, whereby students are directed to learning materials on the basis of their previous interactions with, and understanding of, related content and tasks.
Figure 2: Learning Analytics in Education (Siemens, 2010)
5. Moodle and Learning Analytics
As the importance of data becomes universally recognized, Moodle and other open systems are starting to provide users the tools to capitalize on their own information and extract value from it without intermediaries. (Moodlenews, 2017)
Beginning in version 3.4, Moodle implemented open source, transparent next-generation learning analytics using machine learning backends that go beyond simple descriptive analytics to provide predictions of learner success, and ultimately diagnosis and prescriptions (advisements) to learners and teachers.
In Moodle 3.4, this system ships with two built-in models:
- Students at risk of dropping out
- No teaching activity
The system can be easily extended with new custom models, based on reusable targets, indicators, and other components (Moodle, 2018). Its features include:
- Two built-in prediction models: “Students at risk of dropping out” and “No teaching”.
- A set of student engagement indicators based on the Community of Inquiry.
- Built-in tools to evaluate models against your site’s data
- Proactive notifications for instructors using Events
- Instructors can easily send messages to students identified by the model, or jump to the Outline report for that student for more detail about student activity
- An API to build indicators and prediction models for third-party Moodle plugins
- Machine learning backend plugin type – supports PHP and Python, and can be extended to implement other ML backends
Moodle 3.4 gives access to a black box that, over time, is supposed to become better at predicting the future. Today, the forecast is binary: the student will either complete the course or drop out before it ends. You can view the sources the box uses to make its predictions, and you can experiment with them to test the performance and efficiency of the box.
Moodle can support multiple prediction models at once, even within the same course. Moodle core ships with two prediction models, Students at risk of dropping out and No teaching. Additional prediction models can be created by using the Analytics API. Each model is based on the prediction of a single, specific “target,” or outcome (whether desirable or undesirable), based on a number of selected indicators.
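The relationship between a model, its single target and its indicators can be sketched conceptually as follows. This is not Moodle's Analytics API (which is written in PHP); every class, function and field name below is hypothetical, and the choice to scale indicator values to the range [-1, 1] is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Sample = Dict[str, float]  # hypothetical per-student activity record


@dataclass
class Indicator:
    name: str
    calculate: Callable[[Sample], float]  # in this sketch: a value in [-1, 1]


@dataclass
class PredictionModel:
    target: str  # the single outcome this model predicts
    indicators: List[Indicator]

    def feature_row(self, sample: Sample) -> List[float]:
        """Turn one sample into the row of values a ML backend would receive."""
        return [indicator.calculate(sample) for indicator in self.indicators]


def scaled(key, maximum):
    """Map sample[key] from [0, maximum] to [-1, 1], capping at the maximum."""
    def calculate(sample):
        value = min(sample.get(key, 0.0), maximum)
        return 2.0 * value / maximum - 1.0
    return calculate


dropout_model = PredictionModel(
    target="student at risk of dropping out",
    indicators=[
        Indicator("forum posts", scaled("forum_posts", 10)),
        Indicator("quiz attempts", scaled("quiz_attempts", 5)),
    ],
)

row = dropout_model.feature_row({"forum_posts": 5, "quiz_attempts": 0})
print(row)
```

Keeping indicators separate from the target is what makes them reusable: the same "forum posts" indicator could feed a second model with a different target.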
Figure 3: Some actions that can be performed on a model in Moodle Analytics
Once you have trained a machine learning algorithm with the data available on the system, you will see insights (predictions) here for each “analysable.” In the included model “Students at risk of dropping out,” insights may be selected per course. Predictions are not limited to ongoing courses; this depends on the model.
The prediction model can be evaluated by gathering all the training data available on the site, calculating all the indicators and the target, and passing the resulting dataset to the machine learning backend. This process splits the dataset into training data and testing data and calculates the model's accuracy. Note that the evaluation process uses all information available on the site, even if it is very old. Because the site state changes over time, indicators are calculated most reliably immediately after the training data becomes available, so the accuracy returned by the evaluation process may be lower than the real model accuracy.
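The general idea of splitting a dataset and measuring accuracy on the held-out part can be sketched as below. This illustrates the technique in general, not Moodle's actual evaluation code; the dataset, the 25% test fraction and the simple threshold "model" are assumptions chosen to keep the example short.

```python
import random


def split_dataset(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and return (training rows, testing rows)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predict, rows):
    """Fraction of rows whose predicted label equals the recorded label."""
    correct = sum(1 for features, label in rows if predict(features) == label)
    return correct / len(rows)


def fit_threshold(rows):
    """'Train' a one-feature classifier: pick the threshold that fits best."""
    candidates = sorted({features for features, _ in rows})
    best = max(
        candidates,
        key=lambda t: accuracy(lambda x: 1 if x < t else 0, rows),
    )
    return lambda x: 1 if x < best else 0


# Hypothetical dataset: (weekly logins, 1 = dropped out, 0 = completed).
dataset = [(logins, 1 if logins < 3 else 0) for logins in range(12)]

train_rows, test_rows = split_dataset(dataset)
model = fit_threshold(train_rows)
test_accuracy = accuracy(model, test_rows)

print(f"train size: {len(train_rows)}, test size: {len(test_rows)}")
print(f"test accuracy: {test_accuracy:.2f}")
```

Measuring accuracy only on rows the model has not seen during training is what makes the figure a fair estimate of how the model might behave on new students.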
Figure 4: Moodle’s Analytics Evaluate Model
Models will start generating predictions at different points in time, depending on the prediction models enabled on the site and on the start and end dates of the site's courses.
Each model defines which predictions will generate insights and which predictions will be ignored. For example, the Students at risk of dropping out prediction model does not generate an insight if a student is predicted as “not at risk,” since the primary interest is which students are at risk of dropping out of courses, not which students are not at risk.
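The filtering described above can be sketched in a few lines. The prediction tuples, class labels and confidence values are hypothetical and only illustrate the principle that a model reports some predicted classes and silently drops the others.

```python
# Hypothetical predictions: (student, predicted class, model confidence).
predictions = [
    ("s1", "at risk", 0.91),
    ("s2", "not at risk", 0.88),
    ("s3", "at risk", 0.64),
]


def insights_from(predictions, reported_classes=("at risk",)):
    """Keep only the predictions whose class the model chooses to report."""
    return [p for p in predictions if p[1] in reported_classes]


insights = insights_from(predictions)
for student, label, confidence in insights:
    print(f"{student}: {label} ({confidence:.0%})")
```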
Each insight can have one or more actions defined. Actions provide a way to act on the insight as it is read. These actions may include a way to send a message to another user, a link to a report providing information about the sample the prediction has been generated for (e.g. a report for an existing student), or a way to view the details of the model prediction.
Figure 5: Insights and Actions in Moodle Analytics
Learning Analytics (LA) will have a key role in educational institutions in the near future. It will allow educators to identify students who appear less likely to succeed academically and to make targeted interventions that help them achieve better outcomes. Many Learning Management Systems are starting to incorporate LA into their core. One example is Moodle, an open source LMS that recently implemented transparent next-generation learning analytics using machine learning backends that go beyond simple descriptive analytics to provide predictions of learner success, and ultimately diagnosis and prescriptions (advisements) to learners and teachers.