Enhancing Data Accessibility and Usability: Insights from Observation Data at the Ieodo Ocean Research Station (2005-2023)
Abstract
The Ieodo Ocean Research Station (IORS) has been a critical site for collecting oceanic and atmospheric data over 19 years (2005-2023). This study analyzes key parameters including tide, wave, water temperature, salinity, wind, and air temperature. While these datasets have provided valuable insights into regional climatic and oceanographic phenomena, a substantial amount of incorrect data was identified, particularly in the records from 2015 to 2019. Issues such as overlapping data spanning multiple years, non-chronological ordering, and discrepancies in decimal precision were detected, likely due to inadequate data handling during the adjustment process in programs like Excel. These errors, which persisted for years without detection or correction, highlight a lack of responsibility and verification among both government and private sector employees involved in data management. Given the substantial investment of government resources in the IORS, the continuous presence of incorrect data on official platforms raises concerns that extend beyond the responsible parties to the broader scientific and public communities. To prevent future occurrences, the study emphasizes the need for stringent data management practices and ongoing accuracy in data handling, ensuring that the IORS continues to serve as a reliable resource for oceanographic research in Korea and globally.
1. Introduction
The Ieodo Ocean Research Station (IORS) is strategically situated at coordinates 125.182°E, 32.123°N, approximately 149 km southwest of Mara Island, the southernmost point of Korea, and to the southwest of Jeju Island (Fig. 1, Byun et al., 2021; IORS, 2024; Moon et al., 2010; Oh et al., 2014; Oh et al., 2006; Shim et al., 2004). Since its establishment in 2003, the IORS has served as a crucial monitoring site for oceanic and atmospheric parameters, providing valuable insights into various phenomena such as the Tsushima Warm Current pathway, freshwater discharge from the Changjiang River, and air-sea interactions in regions distant from land influences (e.g., Bae et al., 2022; Ha et al., 2019; Hwang and Jung, 2012; Kim et al., 2021; Lee et al., 2020; Lee et al., 2022; Yang et al., 2022; Yeo and Nam, 2020; You et al., 2011). This location is also particularly advantageous for observing precursors to the Changma (the Korean rainy season) and typhoons before they reach the Korean Peninsula, China, and Japan (e.g., Moon et al., 2010; Oh et al., 2014; Saranya et al., 2024; Woo et al., 2021; Yun et al., 2015).
Data collected at the IORS, including parameters such as aerosols, solar radiation, turbulent flux, wind, wave, current, tide, temperature, and salinity, have been extensively utilized in various oceanic and atmospheric research efforts (e.g., Hwang et al., 2008; Kang et al., 2017; Woo et al., 2018). These datasets have provided essential knowledge for understanding regional climatic and oceanographic phenomena, establishing the IORS as an indispensable resource for researchers (Kim et al., 2020; Park et al., 2014).
The primary objective of this study is to present a comprehensive overview of the observations at the IORS since 2005 and of their maintenance. While the consistent provision of data from the IORS since 2005 is commendable, the lack of rigorous quality control and continuous management of these data raises significant concerns that must be addressed. Ensuring the reliability and accessibility of these data is crucial for their effective use in both ongoing and future research. Therefore, this study not only documents the observation record but also highlights the urgent need for enhanced data management practices to maximize the utility of the data in scientific research (Han, 2020).
2. Data and Methods
This study analyzes hourly, 10-minute, and 1-minute observational data collected at the IORS for parameters including tide, wave, current, water temperature, salinity, wind, air temperature and pressure, relative humidity, solar radiation, and precipitation. The data, covering the period from 2005 to 2023, were downloaded from the Korean Hydrographic and Oceanographic Agency (KHOA) website (KHOA, 2024).
The dataset was plotted to identify usable data periods and calculate other parameters of interest. Specifically, the dataset comprised hourly data from 2005 to 2007, 10-minute data from 2008 to 2019, and 1-minute data from 2020 to 2023 (Table 1). Anomalies were observed in the dataset, such as 20-minute and 10-minute intervals before September 12, 2013, at 17:00 KST, followed by 1-minute intervals afterward. Additionally, from 2015 to 2019, the data included mixed and random entries that overlapped and were not in chronological order.
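The interval changes and out-of-order entries described above can also be screened programmatically before plotting. The following is a minimal sketch, assuming the timestamps of a file have already been parsed into Python `datetime` objects in file order; the function names and the sample timestamps are hypothetical and are not taken from the KHOA files.

```python
from collections import Counter
from datetime import datetime, timedelta


def interval_counts(timestamps):
    """Count the gaps (in minutes) between consecutive records, in file order."""
    counts = Counter()
    for prev, curr in zip(timestamps, timestamps[1:]):
        counts[(curr - prev) / timedelta(minutes=1)] += 1
    return counts


def out_of_order_indices(timestamps):
    """Return indices of records that are not later than the preceding record."""
    return [
        i
        for i in range(1, len(timestamps))
        if timestamps[i] <= timestamps[i - 1]
    ]


# Hypothetical timestamps for illustration only (not actual IORS records):
# one 20-minute step, two 1-minute steps, then an entry that jumps backwards.
t0 = datetime(2013, 9, 12, 16, 40)
sample = [
    t0,
    t0 + timedelta(minutes=20),
    t0 + timedelta(minutes=21),
    t0 + timedelta(minutes=22),
    t0 + timedelta(minutes=10),
]

print(interval_counts(sample))
print(out_of_order_indices(sample))
```

A file with a single sampling interval would yield one dominant key in the counter; several keys, or any negative gap, point to mixed intervals or non-chronological entries of the kind noted for 2015 to 2019.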
To check the accuracy of the records, the data downloaded from the KHOA website (KHOA, 2024) on July 8, 2020, were compared with the data downloaded from the same site on August 22, 2024.
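A comparison of two downloads of this kind can be sketched as follows. This is an illustration only, assuming each download has been reduced to a mapping from a timestamp string to a single numeric value; the function name, the `tolerance` parameter (intended to absorb rounding differences between versions), and the sample values are hypothetical.

```python
def compare_downloads(old, new, tolerance=0.5):
    """Compare two {timestamp: value} mappings from different download dates."""
    old_keys, new_keys = set(old), set(new)
    mismatched = sorted(
        key
        for key in old_keys & new_keys
        if abs(old[key] - new[key]) > tolerance
    )
    return {
        "only_old": sorted(old_keys - new_keys),  # records missing from the newer file
        "only_new": sorted(new_keys - old_keys),  # records added in the newer file
        "mismatched": mismatched,                 # shared records that disagree
    }


# Hypothetical values for illustration only (not actual IORS records).
old_download = {"2015-01-01 00:00": 12.0, "2015-01-01 00:10": 12.0}
new_download = {
    "2015-01-01 00:00": 12.31,
    "2015-01-01 00:10": 14.80,
    "2016-01-01 00:00": 11.90,
}

print(compare_downloads(old_download, new_download))
```

In this sketch, a record from a different year appearing only in the newer file is reported under `only_new`, which is one way to surface entries that do not belong to the labeled year.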
3. Results
3.1 Observation Status over 19 years (2005-2023)
Over the 19-year period from 2005 to 2023, the IORS has collected extensive observational data encompassing more than ten parameters. These include tide, wave, water temperature, salinity, wind, air temperature, relative humidity, air pressure, visibility, solar radiation, sunshine hours, and precipitation. The data were recorded at varying intervals (1-minute, 10-minute, 20-minute, and hourly). The number in parentheses for each parameter represents the percentage of the year covered by observations of that parameter (Figs. 2-10). Although it is difficult to distinguish erroneous data from the graphs, certain gaps and peaks are easily identifiable, particularly in the data from 2005 to 2014 (not shown; Han, 2020) and 2020 to 2023 (Figs. 7-10).
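One possible way to compute such a yearly percentage is sketched below. It assumes that coverage is defined as the number of valid records divided by the number of records expected at the nominal sampling interval for that year; this definition and the function names are assumptions made for illustration and may differ from the calculation used for the figures.

```python
from datetime import datetime, timedelta


def expected_records(year, interval_minutes):
    """Number of records a full calendar year would hold at a fixed interval."""
    span = datetime(year + 1, 1, 1) - datetime(year, 1, 1)
    return int(span / timedelta(minutes=interval_minutes))


def coverage_percent(valid_count, year, interval_minutes):
    """Percentage of the year covered by valid records (assumed definition)."""
    return 100.0 * valid_count / expected_records(year, interval_minutes)


# Hourly sampling in a non-leap year gives 365 * 24 expected records.
print(expected_records(2005, 60))
# A hypothetical count of 4380 valid hourly records would cover half the year.
print(coverage_percent(4380, 2005, 60))
```

Using the calendar to derive the denominator keeps leap years (such as 2020) from slightly overstating coverage.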
3.2 Data Anomalies between 2015 and 2019
Significant inconsistencies were identified in the dataset from 2015 to 2019 (Figs. 2-6). For instance, datasets labeled for a specific year contained overlapping data spanning multiple years, as seen in files like ‘data_2015_IE_IE_286_2015_KR.txt’, which included data from both 2015 and 2016 (Figs. 2a, 3a, 4a, 5a, and 6a). Additionally, these records were not in chronological order, leading to overlapping lines in plotted parameters such as water temperature and salinity (Fig. 3a).
To rectify these issues, we manually removed overlapping entries and reorganized the data chronologically, resulting in corrected plots that accurately represent single-year data (Figs. 2b, 3b, 4b, 5b, and 6b). These problematic datasets from 2015 to 2019 were downloaded from the KHOA website (KHOA, 2024) on August 22, 2024. In contrast, data downloaded on July 8, 2020, contained entries confined to their respective years without overlaps.
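The correction in this study was carried out manually. For readers who wish to automate a comparable step, the following is a minimal sketch, assuming records are available as (timestamp, value) pairs; the function name, the rule of keeping the first occurrence of a repeated timestamp, and the sample values are hypothetical choices, not the procedure used here.

```python
from datetime import datetime


def clean_year(records, year):
    """Keep records from one year, drop repeated timestamps, sort by time."""
    seen = set()
    kept = []
    for timestamp, value in records:
        # Skip entries from other years and repeated timestamps
        # (assumption: the first occurrence of a timestamp is retained).
        if timestamp.year != year or timestamp in seen:
            continue
        seen.add(timestamp)
        kept.append((timestamp, value))
    kept.sort(key=lambda record: record[0])
    return kept


# Hypothetical records for illustration only (not actual IORS data).
records = [
    (datetime(2015, 1, 1, 0, 10), 12.31),
    (datetime(2016, 1, 1, 0, 0), 11.90),   # belongs to another year
    (datetime(2015, 1, 1, 0, 0), 12.30),   # out of chronological order
    (datetime(2015, 1, 1, 0, 10), 12.31),  # repeated entry
]

for timestamp, value in clean_year(records, 2015):
    print(timestamp.isoformat(sep=" "), value)
```

If repeated timestamps carry different values, a keep-first rule silently discards information, so such cases are better flagged for inspection than resolved automatically.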
3.3 Discrepancies in Decimal Precision and Data Handling
A notable difference between the datasets downloaded on July 8, 2020, and August 22, 2024, is the variation in decimal precision. The 2020 data utilized zero decimal places, which cannot be used as scientific data, whereas the 2024 data employed two decimal places, which can be used as scientific data at a station. It is plausible that personnel at KHOA or junior employees from associated subcontracting companies attempted to standardize the data by adjusting the decimal precision using programs like Excel. During this process, incorrect data may have been included and saved.
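A difference in decimal precision of this kind can be detected directly from the text of a data file, before any values are converted to numbers. The sketch below assumes the values of one column are available as plain decimal strings exactly as written in the file; the function names and sample strings are hypothetical.

```python
from collections import Counter


def decimal_places(text):
    """Count the digits after the decimal point in a plain decimal string."""
    text = text.strip()
    if "." not in text:
        return 0
    return len(text.split(".", 1)[1])


def precision_profile(values):
    """Tally how many values were written with each number of decimal places."""
    return Counter(decimal_places(value) for value in values)


# Hypothetical column contents for illustration only.
print(precision_profile(["12", "13", "12"]))            # zero decimal places
print(precision_profile(["12.31", "13.05", "12.48"]))   # two decimal places
```

A column whose profile contains more than one key has mixed precision, which may indicate that part of the file was re-saved with different formatting.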
The updated data containing these errors were uploaded without thorough verification, and neither responsible parties nor data users identified or reported the inaccuracies. This oversight underscores two critical issues:
1. Lack of Responsibility and Verification: Both government officials and private company employees did not exhibit adequate diligence in verifying and ensuring the accuracy of the newly updated data.
2. Insufficient Data Utilization: The persistence of erroneous data over several years suggests limited usage and scrutiny of the datasets by the user community.
3.4 Implications and Recommendations
Given that the IORS serves as a prominent ocean research station in the maritime regions surrounding Korea, China, and Japan, and operates with substantial funding from the Korean government, the continuous presence of incorrect data on official platforms is concerning. This situation not only reflects on the responsible authorities but also impacts the broader scientific and public communities reliant on accurate data for research and decision-making.
To prevent the recurrence of such issues, it is imperative that all involved parties, including government officials and private sector employees, uphold stringent standards of accuracy and responsibility throughout and beyond the project timelines. Implementing robust data management and quality control protocols will ensure the integrity of the datasets and enhance their utility for various applications in oceanographic research and environmental monitoring.
4. Summary and Discussion
The 19-year data (2005-2023) from the KHOA website (KHOA, 2024) consist of raw data and lack sufficient metadata and data quality control flags. While the observation periods for parameters such as tide, wave, wind speed and direction, air temperature, and air pressure were extensive, spanning 19 years, making them potentially valuable for researchers, other datasets were less comprehensive. For instance, water speed and direction data cover less than four years, making them unsuitable for long-term trend analysis over periods such as 10 years. Additionally, there were no wave data available after 2011. However, there were sufficient water temperature and salinity data spanning 16 years, which could be used to study changes in temperature, salinity, and density.
The IORS plays a crucial role as an oceanic and atmospheric research station in the East China Sea (Shim et al., 2004). Since 2003, it has facilitated a variety of observations and research, including studies on aerosol, ozone, CO2, Changma, solar radiation, turbulent flux, wind, wave, fog, sea surface height, temperature, salinity, SST, underwater ambient noise, and typhoons. Previous studies utilizing data from the IORS have been conducted (e.g., Ha et al., 2019; Han, 2020; Moon et al., 2010; Oh et al., 2014; Yeo and Nam, 2020). Upon plotting the data downloaded from the KHOA website (Figs. 2-10), it was evident that the dataset lacked sufficient data across various fields. Notably, from 2015 to 2019, there were overlapping and non-chronological data (Figs. 2a, 3a, 4a, 5a, and 6a). To address this, we removed the overlapping data, corrected the chronological order, and re-plotted the data (Figs. 2b, 3b, 4b, 5b, and 6b).
We also identified issues related to data handling at the IORS. The data downloaded on July 8, 2020, were recorded with zero decimal places, likely using the Excel program, while the data downloaded on August 22, 2024, were recorded with two decimal places, probably with the same program. It is suspected that KHOA personnel or subcontractors inadvertently introduced errors during the decimal adjustment process in Excel. These incorrect data were uploaded without proper verification, and neither those responsible nor the users identified or reported the errors. This situation highlights two primary issues: a lack of responsibility and verification among both government and private sector employees, and a possible underutilization of data, given that the errors persisted for years.
As a critical research facility supported by significant government funding, the IORS raises important concerns regarding the persistent presence of incorrect data on official platforms. To address these issues and prevent recurrence, it is essential that all stakeholders uphold accuracy and responsibility, even after project completion. Extending project timelines to 10 or 20 years could significantly improve the quality and maintenance of long-term data. Moreover, implementing incentives for identifying errors would enhance accountability and precision. A data real-name system, retaining records of responsible individuals for 100 years, would further reinforce responsible data management practices. Lastly, it is crucial to provide clear guidance to future users (including elementary, middle, and high school students, undergraduate and graduate students, teachers, professors, and the general public) on how to effectively and confidently utilize IORS data for ocean science studies, both within Korea and globally.
Acknowledgements
Oceanic and atmospheric data at the IORS (http://www.khoa.go.kr/oceangrid/gis/category/reference/distribution.do#none) were used in this study. This research was funded by the Ministry of Trade, Industry, and Energy (MOTIE) of Korea under the “Regional Innovation Cluster Development Program (PN92300, P0025418)”, supervised by the Korea Institute for Advancement of Technology (KIAT). It was also supported by the Korea Institute of Ocean Science and Technology (PEA0231). Additionally, this study was supported by the project “Sustainable Research and Development of Dokdo (PG54141)” under the Ministry of Oceans and Fisheries, Korea.