Take-home Exercise 1: Geospatial Analytics for Public Good

Important

This handout provides the context, the task, the expectation and the grading criteria of Take-home Exercise 1. Students must review and understand them before getting started with the take-home exercise.

Setting the Scene

As city-wide urban infrastructures such as buses, taxis, mass rapid transit, public utilities and roads become digital, the datasets obtained can be used as a framework for tracking movement patterns through space and time. This is particularly true with the recent trend of massive deployment of pervasive computing technologies such as GPS and RFID on the vehicles. For example, routes and ridership data were collected with the use of smart cards and Global Positioning System (GPS) devices available on the public buses. These massive movement data collected are likely to contain structure and patterns that provide useful information about characteristics of the measured phenomena. The identification, analysis and comparison of such patterns will provide greater insights on human movement and behaviours within a city. These understandings will potentially contribute to a better urban management and useful information for urban transport services providers both from the private and public sector to formulate informed decision to gain competitive advantage.

In real-world practices, the use of these massive locational aware data, however, tend to be confined to simple tracking and mapping with GIS applications. This is mainly due to a general lack of functions in conventional GIS which is capable of analysing and model spatial and spatio-temporal data effectively.

Objectives

Exploratory Spatial Data Analysis (ESDA) hold tremendous potential to address complex problems facing society. In this study, you are tasked to apply appropriate Local Indicators of Spatial Association (GLISA) and Emerging Hot Spot Analysis (EHSA) to undercover the spatial and spatio-temporal mobility patterns of public bus passengers in Singapore.

The Task

The specific tasks of this take-home exercise are as follows:

Geovisualisation and Analysis

  • With reference to the time intervals provided in the table below, compute the passenger trips generated by origin at the hexagon level,

    Peak hour period Bus tap on time
    Weekday morning peak 6am to 9am
    Weekday afternoon peak 5pm to 8pm
    Weekend/holiday morning peak 11am to 2pm
    Weekend/holiday evening peak 4pm to 7pm
  • Display the geographical distribution of the passenger trips by using appropriate geovisualisation methods,

  • Describe the spatial patterns revealed by the geovisualisation (not more than 200 words per visual).

Local Indicators of Spatial Association (LISA) Analysis

  • Compute LISA of the passengers trips generate by origin at hexagon level.
  • Display the LISA maps of the passengers trips generate by origin at hexagon level. The maps should only display the significant (i.e. p-value < 0.05)
  • With reference to the analysis results, draw statistical conclusions (not more than 200 words per visual).

Emerging Hot Spot Analysis(EHSA)

With reference to the passenger trips by origin at the hexagon level for the four time intervals given above:

  • Perform Mann-Kendall Test by using the spatio-temporal local Gi* values,
  • Prepared EHSA maps of the Gi* values of the passenger trips by origin at the hexagon level. The maps should only display the significant (i.e. p-value < 0.05).
  • With reference to the EHSA maps and data visualisation prepared, describe the spatial patterns reveled. (not more than 250 words per cluster).

The Data

Apstial data

For the purpose of this take-home exercise, Passenger Volume by Origin Destination Bus Stops downloaded from LTA DataMall will be used.

Geospatial data

Two geospatial data will be used in this study, they are:

  • Bus Stop Location from LTA DataMall. It provides information about all the bus stops currently being serviced by buses, including the bus stop code (identifier) and location coordinates.
  • hexagon, a hexagon layer of 250m (this distance is the perpendicular distance between the centre of the hexagon and its edges.) should be used to replace the relative coarse and irregular Master Plan 2019 Planning Sub-zone GIS data set of URA.

Grading Criteria

This exercise will be graded by using the following criteria:

  • Geospatial Data Wrangling (20 marks): This is an important aspect of geospatial analytics. You will be assessed on your ability to employ appropriate R functions from various R packages specifically designed for modern data science such as readr, tidyverse (tidyr, dplyr, ggplot2), sf just to mention a few of them, to perform the entire geospatial data wrangling processes, including. This is not limited to data import, data extraction, data cleaning and data transformation. Besides assessing your ability to use the R functions, this criterion also includes your ability to clean and derive appropriate variables to meet the analysis need. (Warning: All data are like vast grassland full of land mines. Your job is to clear those mines and not to step on them).

  • Geospatial Analysis (25 marks): In this exercise, you are expected to use the appropriate thematic and analytics mapping techniques and R functions introduced in class to analysis the geospatial data prepared. You will be assessed on your ability to derive analytical maps by using appropriate rate mapping techniques.

  • Geovisualisation and Geocommunication (25 marks): In this section, you will be assessed on your ability to communicate the complex spatial statistics results in business friendly visual representations. This course is geospatial centric, hence, it is important for you to demonstrate your competency in using appropriate geovisualisation techniques to reveal and communicate the findings of your analysis.

  • Reproducibility (20 marks): This is an important learning outcome of this exercise. You will be assessed on your ability to provide a comprehensive documentation of the analysis procedures in the form of code chunks of Markdown. It is important to note that it is not enough by merely providing the code chunk without any explanation on the purpose and R function(s) used.

  • Bonus (10 marks): Demonstrate your ability to employ methods beyond what you had learned in class to gain insights from the data. The methods used must be geospatial in nature.

Submission Instructions

  • The write-up of the take-home exercise must be in Quarto html document format. You are required to publish the write-up on Netlify.
  • The R project of the take-home exercise must be pushed onto your Github repository.
  • You are required to provide the links to Netlify service of the take-home exercise write-up and github repository on eLearn.

Due Date

3rd December 2023 (Sunday), 11.59pm (midnight).

Survival Tips

Learning from seniors

  • CHUA YAN TING Have done well in all five grading criteria especially the geocommunication criterion.

  • LIN SHUYAN Geospatial data wrangling is very comprehensively done especially identifying water points located outside Nigeria administrative boundary due to location precision issue.

  • LOH SI YING Have done well in all five grading criteria especially the followings: (i) the geospatial wrangling are very comprehensively done including to exclude LGAs without water points from the analysis, (ii) managed to compute the p-values, (iii) Start each analysis by explaining the purpose of the analysis. Managed to relate the analysis results to the location context.

  • ONG ZHI RONG JORDAN A good example to learn geospatial data wrangling. Full of useful code chunk to learn from. Alternative approach to derive significant Gi.

  • ZHU YITING Provide a useful discussion on how to extract and download both data sets from their respective sources.

  • AILYS TEE XYNYN. This is a take-home exercise submission for IS415 Geospatial Analytics and Applications (one of my undergraduate course). Overall very well prepared submission. The data preparation was very well done. The flow of analysis processes were appropriate.

Peer Learning