Take-home Exercise 1: Geospatial Analytics for Public Good
This handout provides the context, the task, the expectations and the grading criteria of Take-home Exercise 1. Students must review and understand them before getting started with the take-home exercise.
Setting the Scene
As city-wide urban infrastructures such as buses, taxis, mass rapid transit, public utilities and roads become digital, the datasets obtained can be used as a framework for tracking movement patterns through space and time. This is particularly true with the recent trend of massive deployment of pervasive computing technologies such as GPS and RFID on vehicles. For example, routes and ridership data are collected with the use of smart cards and Global Positioning System (GPS) devices available on public buses. These massive movement data are likely to contain structure and patterns that provide useful information about the characteristics of the measured phenomena. The identification, analysis and comparison of such patterns will provide greater insights into human movement and behaviour within a city. These understandings can contribute to better urban management and provide useful information for urban transport service providers, both private and public, to formulate informed decisions and gain competitive advantage.
In real-world practice, however, the use of these massive locationally aware data tends to be confined to simple tracking and mapping with GIS applications. This is mainly due to a general lack of functions in conventional GIS that are capable of analysing and modelling spatial and spatio-temporal data effectively.
Objectives
Exploratory Spatial Data Analysis (ESDA) holds tremendous potential to address complex problems facing society. In this study, you are tasked to apply appropriate Local Indicators of Spatial Association (LISA) and Emerging Hot Spot Analysis (EHSA) to uncover the spatial and spatio-temporal mobility patterns of public bus passengers in Singapore.
The Task
The specific tasks of this take-home exercise are as follows:
Geovisualisation and Analysis
- With reference to the time intervals provided in the table below, compute the passenger trips generated by origin at the hexagon level.

| Peak hour period | Bus tap on time |
|---|---|
| Weekday morning peak | 6am to 9am |
| Weekday afternoon peak | 5pm to 8pm |
| Weekend/holiday morning peak | 11am to 2pm |
| Weekend/holiday evening peak | 4pm to 7pm |

- Display the geographical distribution of the passenger trips by using appropriate geovisualisation methods.
- Describe the spatial patterns revealed by the geovisualisation (not more than 200 words per visual).
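The filtering and aggregation step above can be sketched as follows. This is a minimal illustration in Python (the exercise itself expects R), not a prescribed solution. The field names `DAY_TYPE`, `TIME_PER_HOUR`, `ORIGIN_PT_CODE` and `TOTAL_TRIPS` are assumed to match the downloaded origin-destination file, the `stop_to_hex` lookup is hypothetical, and treating each period as a half-open hour range is an assumption you should check against your own reading of the table.

```python
from collections import defaultdict

# Peak periods from the table, as (day type, start hour, end hour).
# Assumption: hours form a half-open range [start, end); use an inclusive
# end instead if that matches your interpretation of the hourly field.
PEAKS = {
    "weekday_morning": ("WEEKDAY", 6, 9),
    "weekday_afternoon": ("WEEKDAY", 17, 20),
    "weekend_morning": ("WEEKENDS/HOLIDAY", 11, 14),
    "weekend_evening": ("WEEKENDS/HOLIDAY", 16, 19),
}

def trips_by_hexagon(records, peak, stop_to_hex):
    """Sum trips per hexagon for one peak period.

    records     -- iterable of dicts with the assumed field names
    stop_to_hex -- hypothetical lookup from bus stop code to hexagon id
    """
    day_type, start, end = PEAKS[peak]
    totals = defaultdict(int)
    for row in records:
        if row["DAY_TYPE"] != day_type:
            continue
        if not (start <= row["TIME_PER_HOUR"] < end):
            continue
        hex_id = stop_to_hex.get(row["ORIGIN_PT_CODE"])
        if hex_id is None:
            continue  # bus stop falls outside the hexagon layer
        totals[hex_id] += row["TOTAL_TRIPS"]
    return dict(totals)

# Made-up sample rows, for illustration only.
records = [
    {"DAY_TYPE": "WEEKDAY", "TIME_PER_HOUR": 7, "ORIGIN_PT_CODE": "01012", "TOTAL_TRIPS": 40},
    {"DAY_TYPE": "WEEKDAY", "TIME_PER_HOUR": 8, "ORIGIN_PT_CODE": "01013", "TOTAL_TRIPS": 25},
    {"DAY_TYPE": "WEEKDAY", "TIME_PER_HOUR": 9, "ORIGIN_PT_CODE": "01012", "TOTAL_TRIPS": 10},
    {"DAY_TYPE": "WEEKENDS/HOLIDAY", "TIME_PER_HOUR": 12, "ORIGIN_PT_CODE": "01019", "TOTAL_TRIPS": 5},
]
stop_to_hex = {"01012": "H1", "01013": "H1", "01019": "H2"}

weekday_morning = trips_by_hexagon(records, "weekday_morning", stop_to_hex)
weekend_morning = trips_by_hexagon(records, "weekend_morning", stop_to_hex)
```

Keeping the four periods in one lookup table means the same function serves all four maps, which also makes it easier to apply a common classification scale later.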
Local Indicators of Spatial Association (LISA) Analysis
- Compute LISA of the passenger trips generated by origin at the hexagon level.
- Display the LISA maps of the passenger trips generated by origin at the hexagon level. The maps should only display the significant results (i.e. p-value < 0.05).
- With reference to the analysis results, draw statistical conclusions (not more than 200 words per visual).
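To make the statistic behind the LISA maps concrete, here is a minimal sketch of local Moran's I with a conditional permutation pseudo p-value, written in plain Python for illustration only (the exercise itself expects R packages). It assumes row-standardised weights, at least one neighbour per location and non-constant values. The function name, the neighbour-list structure and the sample data are hypothetical, and the folded two-sided p-value is one simple convention among several.

```python
import random

def local_moran(values, neighbours, nsim=999, seed=1):
    """Return a list of (I_i, pseudo p-value, quadrant) per location.

    values     -- one number per location (e.g. trips per hexagon)
    neighbours -- neighbours[i] is a non-empty list of neighbour indices
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # second moment, divisor n

    rng = random.Random(seed)
    results = []
    for i in range(n):
        k = len(neighbours[i])
        # Spatial lag under row-standardised weights: mean of neighbour deviations.
        lag = sum(z[j] for j in neighbours[i]) / k
        observed = z[i] / m2 * lag

        # Conditional permutation: hold location i fixed and draw its
        # neighbours' values at random from the remaining locations.
        others = z[:i] + z[i + 1:]
        extreme = 0
        for _ in range(nsim):
            sim_lag = sum(rng.sample(others, k)) / k
            if abs(z[i] / m2 * sim_lag) >= abs(observed):
                extreme += 1
        p_value = (extreme + 1) / (nsim + 1)

        quadrant = ("High" if z[i] > 0 else "Low") + "-" + ("High" if lag > 0 else "Low")
        results.append((observed, p_value, quadrant))
    return results

# Made-up example: six cells in a row, busy on the left and quiet on the right.
trips = [10, 12, 11, 1, 2, 1]
chain = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4]]
lisa = local_moran(trips, chain)
significant = [i for i, (_, p, _) in enumerate(lisa) if p < 0.05]
```

Only the locations listed in `significant` would be drawn on a LISA map. With so few cells the pseudo p-values are coarse, so the example shows the mechanics rather than a meaningful result.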
Emerging Hot Spot Analysis (EHSA)
With reference to the passenger trips by origin at the hexagon level for the four time intervals given above:
- Perform Mann-Kendall Test by using the spatio-temporal local Gi* values,
- Prepare EHSA maps of the Gi* values of the passenger trips by origin at the hexagon level. The maps should only display the significant results (i.e. p-value < 0.05).
- With reference to the EHSA maps and data visualisation prepared, describe the spatial patterns revealed (not more than 250 words per cluster).
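As a reminder of what the Mann-Kendall test measures, the sketch below applies the textbook version (normal approximation with tie correction) to the Gi* series of a single hexagon. It is written in Python for illustration only and is not the routine used by any particular R package. The function name and the Gi* values are made up.

```python
import math
from collections import Counter

def mann_kendall(series):
    """Return (S, Z, two-sided p-value) for a time-ordered series."""
    n = len(series)

    # S counts later-minus-earlier increases minus decreases over all pairs.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)

    # Variance of S, reduced for each group of tied values.
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in Counter(series).values())
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18

    # Continuity-corrected Z score; S == 0 means no monotonic trend.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0

    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p_value

# Hypothetical Gi* values of one hexagon over ten consecutive time steps.
gi_star = [0.2, 0.5, 0.4, 0.9, 1.3, 1.1, 1.8, 2.1, 2.0, 2.6]
s, z, p_value = mann_kendall(gi_star)
trend = "increasing" if s > 0 else "decreasing" if s < 0 else "none"
```

A positive S with a p-value below 0.05 indicates a significant upward trend in the Gi* values. The test says nothing about whether the hexagon is a hot spot or a cold spot, which is why EHSA combines the trend with the Gi* values themselves.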
The Data
Aspatial data
For the purpose of this take-home exercise, Passenger Volume by Origin Destination Bus Stops downloaded from LTA DataMall will be used.
Geospatial data
Two geospatial data sets will be used in this study:
- Bus Stop Location from LTA DataMall. It provides information about all the bus stops currently being serviced by buses, including the bus stop code (identifier) and location coordinates.
- Hexagon, a hexagon layer of 250m (this distance is the perpendicular distance between the centre of the hexagon and its edges) should be used to replace the relatively coarse and irregular Master Plan 2019 Planning Sub-zone GIS data set of URA.
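Because the 250m figure is the centre-to-edge distance (the apothem), the other dimensions of each hexagon follow from simple geometry. The sketch below works them out in Python for illustration only. The variable and function names are hypothetical, and the coordinates are assumed to be in a projected system measured in metres.

```python
import math

APOTHEM_M = 250.0  # perpendicular distance from the centre to an edge

edge_to_edge_m = 2 * APOTHEM_M                # distance between opposite edges
side_m = 2 * APOTHEM_M / math.sqrt(3)         # edge length, also centre-to-vertex distance
area_m2 = 2 * math.sqrt(3) * APOTHEM_M ** 2   # area of one hexagon

def hexagon_vertices(cx, cy, apothem=APOTHEM_M):
    """Vertices of a flat-topped regular hexagon centred on (cx, cy)."""
    radius = 2 * apothem / math.sqrt(3)
    return [
        (cx + radius * math.cos(math.radians(60 * k)),
         cy + radius * math.sin(math.radians(60 * k)))
        for k in range(6)
    ]

vertices = hexagon_vertices(0.0, 0.0)
```

The edge-to-edge distance works out to 500m and the edge length to roughly 288.7m. Grid-building functions differ in which of these they expect as the cell size, so check the documentation of the function you use before passing in 250.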
Grading Criteria
This exercise will be graded by using the following criteria:
Geospatial Data Wrangling (20 marks): This is an important aspect of geospatial analytics. You will be assessed on your ability to employ appropriate R functions from various R packages specifically designed for modern data science, such as readr, tidyverse (tidyr, dplyr, ggplot2) and sf, just to mention a few, to perform the entire geospatial data wrangling process, including but not limited to data import, data extraction, data cleaning and data transformation. Besides assessing your ability to use the R functions, this criterion also covers your ability to clean and derive appropriate variables to meet the analysis needs. (Warning: All data are like a vast grassland full of land mines. Your job is to clear those mines and not to step on them.)
Geospatial Analysis (25 marks): In this exercise, you are expected to use the appropriate thematic and analytical mapping techniques and R functions introduced in class to analyse the geospatial data prepared. You will be assessed on your ability to derive analytical maps by using appropriate rate mapping techniques.
Geovisualisation and Geocommunication (25 marks): In this section, you will be assessed on your ability to communicate complex spatial statistics results in business-friendly visual representations. This course is geospatial-centric; hence, it is important for you to demonstrate your competency in using appropriate geovisualisation techniques to reveal and communicate the findings of your analysis.
Reproducibility (20 marks): This is an important learning outcome of this exercise. You will be assessed on your ability to provide comprehensive documentation of the analysis procedures in the form of Markdown code chunks. Note that merely providing the code chunks, without any explanation of their purpose and the R function(s) used, is not enough.
Bonus (10 marks): Demonstrate your ability to employ methods beyond what you have learned in class to gain insights from the data. The methods used must be geospatial in nature.
Submission Instructions
- The write-up of the take-home exercise must be in Quarto HTML document format. You are required to publish the write-up on Netlify.
- The R project of the take-home exercise must be pushed onto your GitHub repository.
- You are required to provide the links to the Netlify site of the take-home exercise write-up and to the GitHub repository on eLearn.
Due Date
3rd December 2023 (Sunday), 11.59pm (midnight).
Survival Tips
Learning from seniors
CHUA YAN TING Did well in all five grading criteria, especially the geocommunication criterion.
LIN SHUYAN Geospatial data wrangling is very comprehensive, especially the identification of water points located outside Nigeria's administrative boundary due to location precision issues.
LOH SI YING Did well in all five grading criteria, especially the following: (i) the geospatial data wrangling is very comprehensive, including the exclusion of LGAs without water points from the analysis, (ii) managed to compute the p-values, (iii) starts each analysis by explaining the purpose of the analysis. Managed to relate the analysis results to the location context.
ONG ZHI RONG JORDAN A good example to learn geospatial data wrangling. Full of useful code chunks to learn from. Alternative approach to derive significant Gi.
ZHU YITING Provides a useful discussion on how to extract and download both data sets from their respective sources.
AILYS TEE XYNYN This is a take-home exercise submission for IS415 Geospatial Analytics and Applications (one of my undergraduate courses). Overall, a very well-prepared submission. The data preparation was very well done. The flow of the analysis process was appropriate.
Peer Learning
- CAI JINGHENG
- CHAI ZHIXUAN This is one of the two submissions that include steps on how to download the Passenger O-D data by using the LTA DataMall API and Postman. Refer to sub-section 3.1.1 Aspatial data. Although it is incomplete (Step 3 :)), it is still one of the best.
- CHAN JING WEI MAGDALENE
- CHENG CHUN CHIEH
- CHIA YONG SOON
- CHOCK WAN KEE
- DABBIE NEO WEN MIN
- FOONG CHUERN YUE DARREN
- GAN WEI SHENG
- GOH SI HUI
- HAILEY CHEONG SZE-YENN
- KE KE
- KOH CHIN WENG
- KRISTINE JOY PAAS Did well in all five grading criteria, especially the reproducibility, geovisualisation and geocommunication criteria. The Geospatial Analytics criterion can be improved by including a paragraph describing the purpose, concepts and methods of the geospatial analytics used.
- KWOK PEI SHAN
- KYLIE TAN JING YI Section 5: Spatial Association Analysis of this submission provides a comprehensive discussion of the methods used and analysis results.
- LANG SHUANG
- LIANG YAO
- LOH NIAN EN ALICIA
- LOW JI XIONG
- MAH LIAN KHYE
- MICHAEL BERLIAN
- MUHAMAD AMEER NOOR Did well in all five grading criteria, including a short write-up of the geospatial analytics methods used.
- NEO YI XIN This submission puts R's functional programming to good use. For example, subsection Processing the aspatial OD data processes data with the same structure repetitively, and Task 1: Geovisulisation and Analysis ensures that the same classification scale is used. Furthermore, sub-section Computing Distance-Based Spatial Weights Matrix serves as a good example of how to discuss the geospatial analytics methods used. At least three other students showed the spatial weights map, but theirs are far too messy.
- NOEL NG SER YING
- OH JIA WEN
- ONG JIA HU EDWARD
- PHAN HOANG LONG
- PHYO ZIN HTET
- PIRAPAT CHAIYA
- QIU RUILIU
- QUEK YOU TING
- SU SANDI CHO WIN
- TEN WEI PENG
- TOA ZI YING JANET
- TOH CHIN FOONG
- WANG YIZAO
- WANG YUHUI
- WIDYA TANTIYA YUTIKA
- WOO JIA JIAN
- XU LIN
- ZHANG CUNLEI
- ZHAO ZECHENG