Chapter 93
Implementation of an Integrated Log Analysis System Through Statistics-Based Prediction Techniques
Kwangman Ko
Abstract Integrated log analysis systems, which collect, store and analyze large volumes of log and big data in real time by analyzing firewall logs, continue to expand into a variety of fields such as abnormal network behavior detection, usage pattern analysis through web log analysis, fraudulent order analysis and detection for Internet shopping malls, and inside information leakage analysis and detection. This paper presents the design and implementation of a prediction engine that applies statistics-based log analysis techniques (regression analysis, time-series analysis, cluster analysis, discriminant analysis, etc.). The engine is intended to overcome the problems encountered when implementing such analysis with GNU R or with mathematical and statistical libraries, and to support preemptive action, through concentrated monitoring during the period in which a security incident is expected, by analyzing and predicting security-related infrastructure logs.
Keywords Integrated log analysis · Statistical prediction analysis · Security · Big-data analysis
93.1 Introduction
As the computer communication environment advances rapidly and its performance improves, a wide range of security-related issues have emerged as serious problems. In particular, integrated log analysis systems, which collect, store and analyze large volumes of log and big data in real time by analyzing firewall logs, continue to expand into a variety of fields such as abnormal network behavior detection, usage pattern analysis through Web log analysis, fraudulent order analysis and detection for Internet shopping malls, and inside information leakage analysis and detection [1, 2].
Log analysis products fall into three groups: domestic simple security management products [3], foreign-made log analysis products, and existing ESM solutions supplied at home and abroad. ESM solutions, which cannot store raw logs, are expected to lose their value in the market as laws and regulations requiring raw log storage increase [4].
In addition, domestic simple security management products, which can store raw logs but offer insufficient correlation analysis functions, are unlikely to escape the low-price market. The domestic log-related market is divided into a high-priced integrated log analysis segment and a low-priced simple log management segment. In the former, domestic and foreign-made products compete; in the latter, domestic companies also face competition.
Anymon Plus [3], an integrated log analysis product, outperforms competing products in PoC evaluations thanks to its world-leading real-time big data collection, storage and analysis performance (processing 40,000 events per second), and is emerging as the best product in the market. In addition, Anymon Plus is an ESM product that stores raw logs, and it is growing into a product that could capture both the log analysis and ESM markets at the same time because it provides dynamic analysis functions as good as those of foreign-made products.
This paper presents the design and implementation of a prediction engine that applies statistics-based log analysis techniques (regression analysis, time-series analysis, cluster analysis, discriminant analysis [5], etc.). The engine is intended to overcome the problems encountered when implementing such analysis [6] with GNU R [7, 8] or with mathematical and statistical libraries, and to support preemptive action, through concentrated monitoring during the period in which a security incident is expected, by analyzing and predicting security-related infrastructure logs. The result can be used to develop solutions that respond preemptively to security incidents and threats and that small and medium-sized businesses can adopt for general use.
This paper is organized as follows. Section 93.2 introduces the design of the system model that forms the basis of the statistics-based integrated log analysis system developed in this paper, together with the statistics-based prediction technique to be implemented. Section 93.3 describes the implementation of the component modules and core prediction algorithms of the integrated log system, and then presents the result of an experiment. Section 93.4 draws a conclusion and discusses future work.
93.2 Background Work
93.2.1 Integrated Log Analysis System
A typical existing integrated log analysis system is mainly composed of a center manager and a site manager, as shown in Fig. 93.1.
The collection engines of the center and site managers gather various logs from the security equipment, servers and applications of the clients whose logs are to be collected. Agents are installed for this purpose, and because site managers can be added in parallel without limit, the structure supports distributed collection, storage and analysis.
93.2.2 Design of the Integrated Log Analysis System
The statistics-based integrated log analysis system designed and implemented in this study has the structure shown in Fig. 93.2. In practice, there are four main approaches to developing the statistical prediction engine.
The first approach is to connect to or share interfaces with existing products so as to interoperate with GNU R. It has the advantage of using a statistically well-proven tool, but real-time processing is inefficient because of interworking problems and operation speed. In addition, it consumes excessive system resources such as CPU and main memory, and the interfaces for source data and output are insufficient, which makes development difficult.
Fig. 93.1 General integrated log analysis system
The second approach is to use mathematical and statistical libraries, which offers generality in addition to supporting development of the prediction engine. However, it requires accepting costly per-server pricing, and the difficulty of implementing additional logic is a disadvantage. The third approach is to develop a new statistical prediction engine from scratch. All of the source code related to the implementation is then available, and the engine can be modified and optimized for characteristics such as the operating system and development language as needed. The disadvantage is the considerable resources and time required for development. Finally, the fourth approach is to develop the engine on top of a mathematical standard library. Using a standardized library minimizes verification effort, and the standard library can be substituted later, for example for parallel processing. It also allows effort to be concentrated on the prediction engine itself, which minimizes the development period. However, the license issues that may arise when selecting an open-source library, and the costs that may arise when using additional libraries, must be considered.
Fig. 93.2 Overview of the statistics-based prediction engine system
93.3 Statistics-Based Prediction Engine System
93.3.1 Generation of the Prediction Model Based on Statistics
To generate a model for analyzing and predicting abnormal data by periodically analyzing a large volume of collected log data, the analysis flow shown in Fig. 93.3 was applied.
As the basic log data, the full data set covering at least three weeks was used, a span long enough to determine how the data change from period to period. To decide whether data are abnormal, the analysis targeted the log data of each specific IP and the logs of each specific time period.
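The per-IP, per-time-period decision described above can be illustrated with a small sketch. It is not the engine's actual algorithm, which the paper does not spell out. It assumes hourly event counts, a mean and standard deviation baseline for each (IP, hour) slot, and a z-score threshold of 3. The record layout and the names build_baseline and is_abnormal are hypothetical, and the data are synthetic.

```python
from collections import defaultdict
from statistics import mean, stdev


def build_baseline(records):
    """Group historical counts by (ip, hour) and return (mean, stdev) per slot.

    records: iterable of (ip, hour_of_day, event_count) tuples, one per
    observed day and hour. This layout is an assumption for illustration.
    """
    groups = defaultdict(list)
    for ip, hour, count in records:
        groups[(ip, hour)].append(count)
    # stdev needs at least two observations, so sparse slots are skipped.
    return {key: (mean(vals), stdev(vals))
            for key, vals in groups.items() if len(vals) >= 2}


def is_abnormal(baseline, ip, hour, count, z_threshold=3.0):
    """Flag a count whose z-score against its slot baseline exceeds the threshold."""
    if (ip, hour) not in baseline:
        return False  # no history for this slot, so no decision is made
    mu, sigma = baseline[(ip, hour)]
    if sigma == 0:
        return count != mu  # constant history: any deviation is flagged
    return abs(count - mu) / sigma > z_threshold


# 21 days (three weeks) of synthetic counts for one IP in the 02:00 slot.
history = [("10.0.0.5", 2, 100 + (day % 5)) for day in range(21)]
baseline = build_baseline(history)
print(is_abnormal(baseline, "10.0.0.5", 2, 103))  # inside the usual range
print(is_abnormal(baseline, "10.0.0.5", 2, 500))  # far above the usual range
```

A fixed z-score threshold is only one possible decision rule. The regression and time-series techniques named in the paper would replace the simple mean baseline with a fitted model.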
93.3.2 Integrated Log Analysis System Modules
In this work, a mathematical standard library was developed on the basis of open-source libraries in such a way that its modules can gradually be replaced with in-house implementations. With it, the statistics-based prediction engine was designed as shown in Fig. 93.4, and the following modules were implemented.
• Data import module: imports data in xls, csv and text formats into internal memory
• Data editing module: removes nulls from the imported data, converts characters into numbers, and sets additional variables
• Basic stat module: computes basic statistics for the processed data (average, maximum, minimum, standard deviation, variance, etc.)
• Matrix module: computes the inverse matrix required in the regression analysis process
• Statistics module: computes basic statistics (used in parameter estimation)
• Parameter select module: selects the variables to be statistically analyzed
• Linear regression module: performs linear regression analysis
• Parameter estimating module: computes statistics such as ANOVA and R^2 to evaluate the regression analysis results
• Variables transform module: transforms linear variables with various functions (x → 1/x, log x, x^n, etc.)
• Residual analysis module: checks the obtained parameters for cross correlation (normality, equal variance and independence)
• Plot module: outputs graphs for various statistics
• Distribution function module: computes the distribution functions needed for ANOVA etc.
• Outlier analysis module: extracts or removes outlier data using various algorithms
Fig. 93.3 Flow of statistics-based prediction model generation
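How the matrix, linear regression and parameter estimating modules fit together can be sketched as follows. This is a minimal illustration under assumptions, not the implementation described in the paper: it uses ordinary least squares through the normal equations, a Gauss-Jordan matrix inverse, and R^2 as the only evaluation statistic. All function names are hypothetical and the data are synthetic.

```python
def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment the matrix with the identity: [M | I].
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * pv for v, pv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]  # right half is now the inverse


def fit_linear(xs, ys):
    """Ordinary least squares: beta = (X^T X)^-1 X^T y, with an intercept term."""
    X = [[1.0] + list(row) for row in xs]
    Xt = transpose(X)
    y = [[v] for v in ys]
    beta = matmul(matmul(inverse(matmul(Xt, X)), Xt), y)
    return [b[0] for b in beta]


def predict(beta, row):
    """Evaluate the fitted model for one row of explanatory variables."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def r_squared(beta, xs, ys):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - predict(beta, row)) ** 2 for row, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic data generated from y = 3 + 2x with no noise.
xs = [[0], [1], [2], [3], [4]]
ys = [3, 5, 7, 9, 11]
beta = fit_linear(xs, ys)
print([round(b, 6) for b in beta], round(r_squared(beta, xs, ys), 6))
```

Because the synthetic data lie exactly on a line, the fitted intercept and slope should be very close to 3 and 2, and R^2 very close to 1. A variables transform step such as log x or 1/x could be applied to each row of xs before calling fit_linear.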