With graduation season in full swing, this was the perfect opportunity to bring out our Random Chart of the Month to discuss the Times Higher Education (THE) World University Rankings!
THE is known for publishing a comprehensive list of university rankings every year. The formula behind each school's rank combines multiple factors, such as teaching quality, research influence, international outlook, staff, and more.
Based on the most recent scores provided by THE, the top three ranks go to the University of Oxford in first, Harvard University in second, and a tie for third between the University of Cambridge and Stanford University.
This month's analysis will dig deeper into some significant scoring factors contributing to these schools climbing the ranking ladder.
We will use a modeling method similar to the one in last month's blog, where we analyzed CO2 emissions from vehicles. For this university dataset, we want to find the significant predictors that drive a university's overall score.
Unlike previous iterations of our Random Chart of the Month, we start with the analysis before the visualization to lay out the backbone of our modeling method.
When we built a multiple linear regression model in our last blog, we introduced independent and dependent variables, the two building blocks of this data mining technique. As a refresher, independent variables are the set of predictors that influence the value we are trying to predict. That value, here THE's overall university score, is our dependent variable. We call it a multiple linear regression because it has more than one independent variable; a simple linear regression has exactly one. Either way, the model generally has a single dependent variable.
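To make the independent/dependent split concrete, here is a minimal sketch of fitting a multiple linear regression by ordinary least squares in plain Python. The data values, the two predictor names, and the `fit_linear_regression` helper are all illustrative assumptions; they are not THE's data or the tooling used for this analysis.

```python
def fit_linear_regression(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows of predictor values; y is the list of targets.
    Returns [intercept, b1, b2, ...].
    """
    rows = [[1.0] + list(r) for r in X]  # prepend a column of 1s for the intercept
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Made-up rows: [research score, student-to-staff ratio] (independent variables)
X = [[90, 11], [75, 14], [60, 20], [82, 9], [55, 25], [70, 16]]
# Made-up overall scores (dependent variable), generated from known coefficients
y = [10 + 0.5 * research - 0.3 * ratio for research, ratio in X]

coefs = fit_linear_regression(X, y)
print(coefs)  # intercept, research coefficient, ratio coefficient
```

Because the toy scores are generated without noise, the fit recovers the coefficients used to build them. Real data would include noise, and a statistics package would also report standard errors and p-values.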
To fit the model, we set THE's overall university score, which is used to rank the top schools, as the target of our prediction. The following index defines our variables:
| Variable | Description |
| --- | --- |
| Research | Derived from various factors reflecting the university's strength of research. |
| Citations | Measures impact of citations received by research publications affiliated with that university. |
| Industry Income | A higher value represents stronger connections and financial support from the industry (applied research, knowledge exchange, and technology transfer) for the university. |
| Student-to-Staff Ratio | Ratio of the number of students to academic staff or faculty members. |
| % International Students | The proportion of students that come from countries other than the country in which that university is located. |
| Overall University Score | THE's complete university score to rank their top schools annually. |
As we progress with our analysis, we will be able to gauge the statistical impact these independent variables have on a high overall university score, and which components allow schools such as Oxford, Harvard, Cambridge, and Stanford to score at the top.
Having used a ranked bar chart in our previous blog, which relied on a similar modeling method, we wanted to showcase another visualization that helps show how much impact a set of predictors can have on what we are trying to predict. This time, we constructed another bar chart that focuses on the significant independent variables from our analysis and their per-point impact on the university score:
After verifying the necessary assumptions required to run a multiple linear regression model on this data, we can estimate coefficients for each of the predictors we used for modeling THE's university scores.
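One check that is commonly run before trusting regression coefficients is a look at how strongly the predictors correlate with one another (multicollinearity). Which checks were run for this analysis is not stated, so the sketch below is only an illustration: the `pearson` and `flag_collinear` helpers, the 0.8 threshold, and the data values are all assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def flag_collinear(columns, threshold=0.8):
    """Return predictor pairs whose absolute correlation exceeds the threshold."""
    return [
        (a, b)
        for a, b in combinations(columns, 2)
        if abs(pearson(columns[a], columns[b])) > threshold
    ]


# Made-up predictor columns, one list per variable
predictors = {
    "research": [90, 75, 60, 82, 55, 70],
    "citations": [88, 76, 62, 80, 57, 69],
    "student_staff_ratio": [11, 14, 20, 9, 25, 16],
}

flagged = flag_collinear(predictors)
print(flagged)
```

Pairs that show up in `flagged` are candidates for a closer look, for example by dropping one of the two predictors or by computing variance inflation factors in a statistics package.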
The values at the top of each bar represent the coefficients we estimated for the predictors. One interpretation in our example would be: for every one-percent increase in a school's proportion of international students, we expect that school's overall score to increase by 4.921 (while keeping the other variables constant). If we were to write out the full estimated model as an equation, we would construct this:
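As a sketch of the general form such an equation takes, assuming all five predictors from the variable index stay in the final model: only the 4.921 coefficient is stated in the text, so the intercept and the remaining coefficients are left as symbols rather than filled in.

```latex
\widehat{\text{Score}}
  = \hat{\beta}_0
  + \hat{\beta}_1 \,\text{Research}
  + \hat{\beta}_2 \,\text{Citations}
  + \hat{\beta}_3 \,\text{IndustryIncome}
  + \hat{\beta}_4 \,\text{StudentStaffRatio}
  + 4.921 \,\text{PctInternational}
```

Here $\hat{\beta}_0$ is the intercept, and $\hat{\beta}_4$ is the student-to-staff coefficient, which is negative in the fitted model.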
From this equation, it becomes a matter of plugging in values for these variables to estimate where a school might end up in its overall score.
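A minimal sketch of that plug-in step follows. Only the 4.921 coefficient comes from the analysis above; the intercept, the other coefficients, the shortened predictor list, and the example school are hypothetical placeholders chosen so the code runs.

```python
# Only the 4.921 coefficient is taken from the text; every other number
# is a made-up placeholder.
COEFFICIENTS = {
    "intercept": 20.0,             # hypothetical
    "research": 0.30,              # hypothetical
    "student_staff_ratio": -0.20,  # hypothetical (negative, as in the text)
    "pct_international": 4.921,    # from the text
}


def predict_score(school, coefficients=COEFFICIENTS):
    """Intercept plus the sum of coefficient * value over the predictors."""
    return coefficients["intercept"] + sum(
        coefficients[name] * value for name, value in school.items()
    )


# Hypothetical school, and the same school with one more unit of
# international students while everything else is held constant
school = {"research": 80.0, "student_staff_ratio": 12.0, "pct_international": 5.0}
more_international = {**school, "pct_international": 6.0}

baseline = predict_score(school)
bumped = predict_score(more_international)
difference = bumped - baseline  # equals the pct_international coefficient
print(baseline, bumped, difference)
```

The difference between the two predictions is exactly the international-students coefficient, which is what "keeping other variables constant" means in practice.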
Based on the magnitude of each coefficient, we can see that a change in a school's proportion of international students has the largest per-point effect on the overall score. Internationalization appears to matter a great deal when it comes to raising a school's score. This aligns with the schools ranked in the top ten, which averaged about 30% of their student base coming from out-of-country.
Another variable worth noting is the school's student-to-staff ratio, which carries a negative coefficient in our model. As that ratio increases, the expected overall score decreases, so a smaller student-to-staff ratio is associated with a higher score. A smaller ratio could indicate a more individualized learning experience, more mentorship, and more.
Although these variables proved to be significant in explaining how THE constructs its university scores, plenty of other factors likely help place schools in the top tier. Still, they are great indicators of why you would expect some of these institutions to consistently make the top of various rankings.
Continue to tune into our 'Random Chart of the Month' each month! We're experts at analyzing all kinds of data, but especially social media. Let us help you build a social media campaign backed by data and results. Reach out to us below!