Abstract
In the context of power system big data, cleaning power operation and maintenance data can effectively improve data quality and lay a sound foundation for data analysis. Two persistent technical difficulties in data cleaning are the accuracy of anomaly detection and the error introduced by data correction. To address these problems, we propose a data cleaning method based on Correlation isolation Forest and Attention-based LSTM (CiF-AL). The method constructs an isolation forest based on the correlation between data attributes to extract features of the training dataset and detect anomalous data, and then uses an improved LSTM neural network model with an attention mechanism to predict replacement values for the anomalous data. Experimental results show that the CiF-AL-based cleaning scheme for power operation and maintenance data improves the accuracy of both anomaly localization and data correction, and lowers training time and resource consumption.
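The detect-then-correct flow described above can be illustrated with a small, self-contained sketch. It is not the CiF-AL method itself. It assumes a single numeric attribute, uses a plain isolation forest in the style of Liu et al. rather than the correlation-based construction, and substitutes a neighbor-mean fill for the attention-based LSTM predictor. All function names, the toy `load` series, and the 0.7 score threshold are hypothetical choices made for illustration.

```python
import math
import random


def c_factor(n):
    """Average path length of an unsuccessful BST search over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n


def build_tree(values, depth, max_depth, rng):
    """Recursively partition 1-D values with uniformly random split points."""
    if depth >= max_depth or len(values) <= 1:
        return ("leaf", len(values))
    lo, hi = min(values), max(values)
    if lo == hi:
        return ("leaf", len(values))
    split = rng.uniform(lo, hi)
    left = [v for v in values if v < split]
    right = [v for v in values if v >= split]
    return ("node", split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))


def path_length(tree, x, depth=0):
    """Depth at which x lands, plus the usual adjustment for unsplit leaves."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, split, left, right = tree
    return path_length(left if x < split else right, x, depth + 1)


def anomaly_scores(values, n_trees=50, seed=0):
    """Isolation-forest score in (0, 1]; values near 1 are easy to isolate."""
    if len(values) < 2:
        return [0.0] * len(values)
    rng = random.Random(seed)
    psi = min(256, len(values))            # subsample size per tree
    max_depth = math.ceil(math.log2(psi))  # standard height limit
    trees = [build_tree(rng.sample(values, psi), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(psi)
    return [2.0 ** (-sum(path_length(t, v) for t in trees) / (n_trees * norm))
            for v in values]


def correct(values, flagged):
    """Stand-in for the predictor: mean of the nearest unflagged neighbors."""
    bad = set(flagged)
    cleaned = list(values)
    for i in flagged:
        left = next((values[j] for j in range(i - 1, -1, -1)
                     if j not in bad), None)
        right = next((values[j] for j in range(i + 1, len(values))
                      if j not in bad), None)
        neighbors = [v for v in (left, right) if v is not None]
        if neighbors:
            cleaned[i] = sum(neighbors) / len(neighbors)
    return cleaned


# Toy series: values cycle through 50..54, with one injected spike at index 17.
load = [50 + (i % 5) for i in range(40)]
load[17] = 500

scores = anomaly_scores(load)
flagged = [i for i, s in enumerate(scores) if s > 0.7]  # hypothetical threshold
cleaned = correct(load, flagged)
print(flagged, cleaned[17])
```

In this toy series the spike is separated from the rest by almost any random split, so its average path length is short and its score is high, while the repeated normal values end up in deeper, larger leaves. A sequence model such as the attention-based LSTM named in the abstract would replace `correct` in a fuller implementation.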
Acknowledgments
This work was supported by Guangdong Power Grid Co., Ltd. technology project funding (GDKJQQ20161191).
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Li, X., Cai, Y., Zhu, W. (2018). Power Data Cleaning Method Based on Isolation Forest and LSTM Neural Network. In: Sun, X., Pan, Z., Bertino, E. (eds) Cloud Computing and Security. ICCCS 2018. Lecture Notes in Computer Science, vol 11067. Springer, Cham. https://doi.org/10.1007/978-3-030-00018-9_47
DOI: https://doi.org/10.1007/978-3-030-00018-9_47
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00017-2
Online ISBN: 978-3-030-00018-9
eBook Packages: Computer Science, Computer Science (R0)