As we begin to close the curtains on Earth Month, this is an excellent opportunity to bring in April's iteration of Random Chart of the Month. This time, we will analyze the publicly available 2022 Fuel Economy data for vehicle manufacturers, provided by the Environmental Protection Agency (EPA).
Since the mid-1970s, the United States government has collected and published Fuel Economy data to keep track of the availability and cost of oil and the environmental impact of vehicle emissions. The EPA tests and measures fuel economy and emissions, and the results are then made publicly available on its government website.
We will be looking specifically at the 2022 iteration of the Fuel Economy data. It covers a wide range of variables for each vehicle recorded that year, including CO2 (carbon dioxide) emissions, which are our main interest in this analysis. We can visualize how much vehicles emit based on features such as brand and fuel type. We can then apply some data mining techniques to see which other vehicle components play significant roles in how much CO2 a car emits.
In this extensive dataset, three different variables measure each vehicle's CO2 emissions: city, highway, and combined. There is also a categorical variable naming each vehicle's manufacturer. With this information, we used a grouped bar chart to discover which manufacturers averaged the most CO2 emissions (for both city and highway):
In 2022, Rolls-Royce had the highest average combined CO2 emissions (across the city and highway driving cycles) of all manufacturers in this data. On the lower end, Mitsubishi Motors Corporation averaged the lowest combined CO2 emissions.
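For readers who want to reproduce this kind of comparison, here is a minimal sketch of the per-manufacturer averaging that sits behind a grouped bar chart. It is not the code we used for the chart above: the rows are made-up placeholder values, and the field names (`manufacturer`, `city_co2`, `hwy_co2`) are hypothetical stand-ins rather than the column names in the EPA file.

```python
from collections import defaultdict
from statistics import mean

# Made-up rows for illustration only; these are NOT values from the EPA data.
# The field names are hypothetical, not the EPA file's real column names.
rows = [
    {"manufacturer": "Brand A", "city_co2": 500, "hwy_co2": 400},
    {"manufacturer": "Brand A", "city_co2": 600, "hwy_co2": 420},
    {"manufacturer": "Brand B", "city_co2": 300, "hwy_co2": 250},
    {"manufacturer": "Brand B", "city_co2": 320, "hwy_co2": 270},
]


def average_by_manufacturer(rows):
    """Return {manufacturer: {"city": mean, "highway": mean}}."""
    grouped = defaultdict(lambda: {"city": [], "highway": []})
    for row in rows:
        grouped[row["manufacturer"]]["city"].append(row["city_co2"])
        grouped[row["manufacturer"]]["highway"].append(row["hwy_co2"])
    return {
        name: {cycle: mean(values) for cycle, values in cycles.items()}
        for name, cycles in grouped.items()
    }


averages = average_by_manufacturer(rows)

# Rank manufacturers from highest to lowest city average,
# which is the order a sorted bar chart would display them in.
ranked = sorted(averages, key=lambda name: averages[name]["city"], reverse=True)
```

Each manufacturer ends up with one city value and one highway value, which map directly onto the two bars per manufacturer in a grouped bar chart.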
Although it is fun to see how different manufacturers compare in a plot of CO2 emissions, there are further questions we can answer with this large government dataset. Now that we have a general picture of how manufacturers' 2022 models compare on average CO2 emissions, we can also look into other vehicle factors that contribute to higher emissions.
This is where we used some data mining techniques to fit a statistical model, specifically a multiple linear regression. Multiple linear regression is a statistical method that models the relationship between a dependent variable and two or more independent variables. Throughout the modeling process and the interpretation of its results, it is essential to understand what these variables mean:
Tying this into the Fuel Economy data, our dependent variable is each vehicle's combined CO2 emissions. We want to find out which other vehicle features in the data are statistically significant contributors to how much a car pollutes. After checking that certain statistical assumptions for this modeling technique are met (we will likely cover these techniques in a future analysis blog, so stay tuned!), we can fit a multiple linear regression model to determine those predictors.
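To make the mechanics concrete, here is a small sketch of fitting a multiple linear regression by ordinary least squares. It is an illustration under stated assumptions, not the model we ran: the five data points are invented, the two predictors (engine displacement and a 0/1 flag for a supercharged engine) are hypothetical choices, and `fit_ols` is a helper written for this example. In practice a statistics package such as statsmodels would do the fitting and also report the significance tests.

```python
def fit_ols(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in X) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        # Eliminate this column from every other row.
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Invented toy data, NOT from the EPA file. The response was constructed as
# co2 = 100 + 50 * displacement + 30 * supercharged, with no noise.
displacement = [2.0, 3.0, 2.0, 4.0, 5.0]   # hypothetical numeric predictor
supercharged = [0, 0, 1, 1, 0]              # categorical predictor coded as 0/1
co2 = [200.0, 250.0, 230.0, 330.0, 350.0]   # dependent variable

# Each design-matrix row is [1 (intercept), displacement, supercharged flag].
X = [[1.0, d, float(s)] for d, s in zip(displacement, supercharged)]

intercept, per_liter, supercharged_effect = fit_ols(X, co2)
# Because the toy data is exactly linear, the fit recovers the constructing
# coefficients (100, 50, 30) up to floating-point rounding.
```

The coefficient on the 0/1 flag reads as the average difference in the dependent variable between the two groups with the other predictor held fixed, which is how a categorical feature enters a model like this.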
After running this model, we identified a handful of predictors that impact combined CO2 emissions. One significant feature that strongly correlated with CO2 emissions was the engine's method of air aspiration. Across all vehicle brands, supercharged cars averaged higher levels of combined CO2 emissions. This is likely because this engine type requires more fuel for its combustion process, increasing the overall amount of pollutants. Turbocharged engines averaged the lowest combined CO2 emissions among the different air aspiration descriptions.
Another significant predictor of CO2 emissions, to no one's surprise, is the vehicle's class description. This feature directly relates to the size of the vehicle, which should broadly impact mass and, ultimately, how much power is required to move the car. Among all class descriptions, vans averaged the highest combined CO2 emissions, while compact cars emitted the least.
Although we could name some significant predictors of CO2 emissions, factors outside the Fuel Economy data can also contribute to substantial emissions. Variables the data does not record include a person's driving style, overall traffic conditions, and vehicle maintenance. A special shoutout to vehicles marked with the 'electricity' fuel usage, which averaged zero CO2 emissions. #SustainabilityRocks
Continue to tune into our 'Random Chart of the Month' each month! We're experts at analyzing all kinds of data, but especially social media. Let us help you build a social media campaign backed by data and results. Reach out to us below!