An Evaluation of Text Mining Techniques in Sampling of Network Ports from IBR Traffic
- Chindipha, Stones D, Irwin, Barry V W, Herbert, Alan
- Authors: Chindipha, Stones D , Irwin, Barry V W , Herbert, Alan
- Date: 2019
- Language: English
- Type: text , article
- Identifier: http://hdl.handle.net/10962/427630 , vital:72452 , https://www.researchgate.net/profile/Stones-Chindi-pha/publication/335910179_An_Evaluation_of_Text_Mining_Techniques_in_Sampling_of_Network_Ports_from_IBR_Traffic/links/5d833084458515cbd1985a38/An-Evaluation-of-Text-Mining-Techniques-in-Sampling-of-Network-Ports-from-IBR-Traffic.pdf
- Description: Information retrieval (IR) has had techniques that have been used to gauge the extent to which certain keywords can be retrieved from a document. These techniques have been used to measure similarities in duplicated images, native language identification, optimize algorithms, among others. With this notion, this study proposes the use of four of the Information Retrieval Techniques (IRT/IR) to gauge the implications of sampling a/24 IPv4 ports into smaller subnet equivalents. Using IR, this paper shows how the ports found in a/24 IPv4 net-block relate to those found in the smaller subnet equivalents. Using Internet Background Radiation (IBR) data that was collected from Rhodes University, the study found compelling evidence of the viability of using such techniques in sampling datasets. Essentially, being able to identify the variation that comes with sampling the baseline dataset. It shows how the various samples are similar to the baseline dataset. The correlation observed in the scores proves how viable these techniques are to quantifying variations in the sampling of IBR data. In this way, one can identify which subnet equivalent best represents the unique ports found in the baseline dataset (IPv4 net-block dataset).
- Full Text:
- Date Issued: 2019
- Authors: Chindipha, Stones D , Irwin, Barry V W , Herbert, Alan
- Date: 2019
- Language: English
- Type: text , article
- Identifier: http://hdl.handle.net/10962/427630 , vital:72452 , https://www.researchgate.net/profile/Stones-Chindi-pha/publication/335910179_An_Evaluation_of_Text_Mining_Techniques_in_Sampling_of_Network_Ports_from_IBR_Traffic/links/5d833084458515cbd1985a38/An-Evaluation-of-Text-Mining-Techniques-in-Sampling-of-Network-Ports-from-IBR-Traffic.pdf
- Description: Information retrieval (IR) has had techniques that have been used to gauge the extent to which certain keywords can be retrieved from a document. These techniques have been used to measure similarities in duplicated images, native language identification, optimize algorithms, among others. With this notion, this study proposes the use of four of the Information Retrieval Techniques (IRT/IR) to gauge the implications of sampling a/24 IPv4 ports into smaller subnet equivalents. Using IR, this paper shows how the ports found in a/24 IPv4 net-block relate to those found in the smaller subnet equivalents. Using Internet Background Radiation (IBR) data that was collected from Rhodes University, the study found compelling evidence of the viability of using such techniques in sampling datasets. Essentially, being able to identify the variation that comes with sampling the baseline dataset. It shows how the various samples are similar to the baseline dataset. The correlation observed in the scores proves how viable these techniques are to quantifying variations in the sampling of IBR data. In this way, one can identify which subnet equivalent best represents the unique ports found in the baseline dataset (IPv4 net-block dataset).
- Full Text:
- Date Issued: 2019
Bolvedere: a scalable network flow threat analysis system
- Authors: Herbert, Alan
- Date: 2019
- Subjects: Bolvedere (Computer network analysis system) , Computer networks -- Scalability , Computer networks -- Measurement , Computer networks -- Security measures , Telecommunication -- Traffic -- Measurement
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/71557 , vital:29873
- Description: Since the advent of the Internet, and its public availability in the late 90’s, there have been significant advancements to network technologies and thus a significant increase of the bandwidth available to network users, both human and automated. Although this growth is of great value to network users, it has led to an increase in malicious network-based activities and it is theorized that, as more services become available on the Internet, the volume of such activities will continue to grow. Because of this, there is a need to monitor, comprehend, discern, understand and (where needed) respond to events on networks worldwide. Although this line of thought is simple in its reasoning, undertaking such a task is no small feat. Full packet analysis is a method of network surveillance that seeks out specific characteristics within network traffic that may tell of malicious activity or anomalies in regular network usage. It is carried out within firewalls and implemented through packet classification. In the context of the networks that make up the Internet, this form of packet analysis has become infeasible, as the volume of traffic introduced onto these networks every day is so large that there are simply not enough processing resources to perform such a task on every packet in real time. One could combat this problem by performing post-incident forensics; archiving packets and processing them later. However, as one cannot process all incoming packets, the archive will eventually run out of space. Full packet analysis is also hindered by the fact that some existing, commonly-used solutions are designed around a single host and single thread of execution, an outdated approach that is far slower than necessary on current computing technology. This research explores the conceptual design and implementation of a scalable network traffic analysis system named Bolvedere. Analysis performed by Bolvedere simply asks whether the existence of a connection, coupled with its associated metadata, is enough to conclude something meaningful about that connection. This idea draws away from the traditional processing of every single byte in every single packet monitored on a network link (Deep Packet Inspection) through the concept of working with connection flows. Bolvedere performs its work by leveraging the NetFlow version 9 and IPFIX protocols, but is not limited to these. It is implemented using a modular approach that allows for either complete execution of the system on a single host or the horizontal scaling out of subsystems on multiple hosts. The use of multiple hosts is achieved through the implementation of Zero Message Queue (ZMQ). This allows for Bolvedre to horizontally scale out, which results in an increase in processing resources and thus an increase in analysis throughput. This is due to ease of interprocess communications provided by ZMQ. Many underlying mechanisms in Bolvedere have been automated. This is intended to make the system more userfriendly, as the user need only tell Bolvedere what information they wish to analyse, and the system will then rebuild itself in order to achieve this required task. Bolvedere has also been hardware-accelerated through the use of Field-Programmable Gate Array (FPGA) technologies, which more than doubled the total throughput of the system.
- Full Text:
- Date Issued: 2019
- Authors: Herbert, Alan
- Date: 2019
- Subjects: Bolvedere (Computer network analysis system) , Computer networks -- Scalability , Computer networks -- Measurement , Computer networks -- Security measures , Telecommunication -- Traffic -- Measurement
- Language: English
- Type: text , Thesis , Doctoral , PhD
- Identifier: http://hdl.handle.net/10962/71557 , vital:29873
- Description: Since the advent of the Internet, and its public availability in the late 90’s, there have been significant advancements to network technologies and thus a significant increase of the bandwidth available to network users, both human and automated. Although this growth is of great value to network users, it has led to an increase in malicious network-based activities and it is theorized that, as more services become available on the Internet, the volume of such activities will continue to grow. Because of this, there is a need to monitor, comprehend, discern, understand and (where needed) respond to events on networks worldwide. Although this line of thought is simple in its reasoning, undertaking such a task is no small feat. Full packet analysis is a method of network surveillance that seeks out specific characteristics within network traffic that may tell of malicious activity or anomalies in regular network usage. It is carried out within firewalls and implemented through packet classification. In the context of the networks that make up the Internet, this form of packet analysis has become infeasible, as the volume of traffic introduced onto these networks every day is so large that there are simply not enough processing resources to perform such a task on every packet in real time. One could combat this problem by performing post-incident forensics; archiving packets and processing them later. However, as one cannot process all incoming packets, the archive will eventually run out of space. Full packet analysis is also hindered by the fact that some existing, commonly-used solutions are designed around a single host and single thread of execution, an outdated approach that is far slower than necessary on current computing technology. This research explores the conceptual design and implementation of a scalable network traffic analysis system named Bolvedere. Analysis performed by Bolvedere simply asks whether the existence of a connection, coupled with its associated metadata, is enough to conclude something meaningful about that connection. This idea draws away from the traditional processing of every single byte in every single packet monitored on a network link (Deep Packet Inspection) through the concept of working with connection flows. Bolvedere performs its work by leveraging the NetFlow version 9 and IPFIX protocols, but is not limited to these. It is implemented using a modular approach that allows for either complete execution of the system on a single host or the horizontal scaling out of subsystems on multiple hosts. The use of multiple hosts is achieved through the implementation of Zero Message Queue (ZMQ). This allows for Bolvedre to horizontally scale out, which results in an increase in processing resources and thus an increase in analysis throughput. This is due to ease of interprocess communications provided by ZMQ. Many underlying mechanisms in Bolvedere have been automated. This is intended to make the system more userfriendly, as the user need only tell Bolvedere what information they wish to analyse, and the system will then rebuild itself in order to achieve this required task. Bolvedere has also been hardware-accelerated through the use of Field-Programmable Gate Array (FPGA) technologies, which more than doubled the total throughput of the system.
- Full Text:
- Date Issued: 2019
Quantifying the accuracy of small subnet-equivalent sampling of IPv4 internet background radiation datasets
- Chindipha, Stones D, Irwin, Barry V W, Herbert, Alan
- Authors: Chindipha, Stones D , Irwin, Barry V W , Herbert, Alan
- Date: 2019
- Subjects: To be catalogued
- Language: English
- Type: text , article
- Identifier: http://hdl.handle.net/10962/430271 , vital:72679 , https://doi.org/10.1145/3351108.3351129
- Description: Network telescopes have been used for over a decade to aid in identifying threats by gathering unsolicited network traffic. This Internet Background Radiation (IBR) data has proved to be a significant source of intelligence in combating emerging threats on the Internet at large. Traditionally, operation has required a significant contiguous block of IP addresses. Continued operation of such sensors by researchers and adoption by organisations as part of its operation intelligence is becoming a challenge due to the global shortage of IPv4 addresses. The pressure is on to use allocated IP addresses for operational purposes. Future use of IBR collection methods is likely to be limited to smaller IP address pools, which may not be contiguous. This paper offers a first step towards evaluating the feasibility of such small sensors. An evaluation is conducted of the random sampling of various subnet sized equivalents. The accuracy of observable data is compared against a traditional 'small' IPv4 network telescope using a /24 net-block. Results show that for much of the IBR data, sensors consisting of smaller, non-contiguous blocks of addresses are able to achieve high accuracy rates vs. the base case. While the results obtained given the current nature of IBR, it proves the viability for organisations to utilise free IP addresses within their networks for IBR collection and ultimately the production of Threat intelligence.
- Full Text:
- Date Issued: 2019
- Authors: Chindipha, Stones D , Irwin, Barry V W , Herbert, Alan
- Date: 2019
- Subjects: To be catalogued
- Language: English
- Type: text , article
- Identifier: http://hdl.handle.net/10962/430271 , vital:72679 , https://doi.org/10.1145/3351108.3351129
- Description: Network telescopes have been used for over a decade to aid in identifying threats by gathering unsolicited network traffic. This Internet Background Radiation (IBR) data has proved to be a significant source of intelligence in combating emerging threats on the Internet at large. Traditionally, operation has required a significant contiguous block of IP addresses. Continued operation of such sensors by researchers and adoption by organisations as part of its operation intelligence is becoming a challenge due to the global shortage of IPv4 addresses. The pressure is on to use allocated IP addresses for operational purposes. Future use of IBR collection methods is likely to be limited to smaller IP address pools, which may not be contiguous. This paper offers a first step towards evaluating the feasibility of such small sensors. An evaluation is conducted of the random sampling of various subnet sized equivalents. The accuracy of observable data is compared against a traditional 'small' IPv4 network telescope using a /24 net-block. Results show that for much of the IBR data, sensors consisting of smaller, non-contiguous blocks of addresses are able to achieve high accuracy rates vs. the base case. While the results obtained given the current nature of IBR, it proves the viability for organisations to utilise free IP addresses within their networks for IBR collection and ultimately the production of Threat intelligence.
- Full Text:
- Date Issued: 2019
- «
- ‹
- 1
- ›
- »