Pivoting Detection dataset

Overview:

This is a (labelled!) snippet of the data used for the experiments in the journal paper "Detection and Threat Prioritization of Pivoting Attacks in Large Networks", which has been accepted for publication in IEEE Transactions on Emerging Topics in Computing (IEEE TETC). If you wish to read the paper, you can retrieve it here.

The dataset we disclose was collected in the major department of a large organization, and has been anonymized for privacy reasons. Nevertheless, the network data included can still be useful to conduct experiments on pivoting activities, and for any other task involving network traffic analysis.

 

Dataset Description:

The dataset contains network traffic data in the form of network flows collected in a large organization over an entire working day. These flows represent the communications among the internal hosts of the monitored network environment -- that is, the dataset only contains internal-to-internal network traffic. For privacy reasons, the true IP addresses of the hosts have been anonymized: they are now represented as integers.

All flow samples included in the dataset are associated with a binary label that denotes whether the corresponding flow is part of a pivoting activity or not. The labelling procedure was performed and verified manually.

The dataset is provided as a single compressed .tar.gz file of ~1.5GB. When extracted, it yields a single .CSV file of ~6GB containing nearly 75M network flows. The following is the list of features captured by our network flow collector and included in the dataset.

Feature Name  Description                                                      Type
(Number)      the number of the flow in the .CSV file                          int
src           the source IP address of the flow (anonymized)                   int
dst           the destination IP address of the flow (anonymized)              int
p_src         the port used by the source host                                 int
p_dst         the port used by the destination host                            int
b_in          the destination-to-source bytes sent                             int
b_out         the source-to-destination bytes sent                             int
d             the duration of the flow                                         float
timestamp     the timestamp at which the communication began                   timestamp
pkts_in       the destination-to-source packets sent                           int
pkts_out      the source-to-destination packets sent                           int
proto         the communication protocol used                                  int
is_pivoting   the label specifying if the flow is part of a pivoting activity  bool
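
As an illustration of how the features listed above might be consumed, the following is a minimal sketch that streams the .CSV file row by row with the Python standard library, so that the ~6GB file never has to fit in memory. It rests on several assumptions that are not confirmed here: that the file has a header row whose column names match the feature names, that the first (flow number) column is unnamed, and that the label is written as text such as "True"/"False" or "1"/"0". The helper names (parse_flow, iter_flows, count_labels) and the sample values are hypothetical.

```python
import csv
import io

# Column types follow the feature list; the exact header names and the
# textual encoding of the label are ASSUMPTIONS about the CSV layout.
INT_COLUMNS = ("src", "dst", "p_src", "p_dst", "b_in", "b_out",
               "pkts_in", "pkts_out", "proto")
TRUE_VALUES = {"1", "true", "t", "yes"}


def parse_flow(row):
    """Convert one raw CSV row (a dict of strings) into typed values."""
    flow = {name: int(row[name]) for name in INT_COLUMNS}
    flow["d"] = float(row["d"])
    flow["timestamp"] = row["timestamp"]  # kept as text: format not documented
    flow["is_pivoting"] = row["is_pivoting"].strip().lower() in TRUE_VALUES
    return flow


def iter_flows(handle):
    """Yield typed flows one at a time instead of loading the whole file."""
    for row in csv.DictReader(handle):
        yield parse_flow(row)


def count_labels(handle):
    """Return (total flows, flows labelled as pivoting)."""
    total = pivoting = 0
    for flow in iter_flows(handle):
        total += 1
        pivoting += flow["is_pivoting"]
    return total, pivoting


# Tiny in-memory sample with made-up values, standing in for the real file.
SAMPLE = io.StringIO(
    ",src,dst,p_src,p_dst,b_in,b_out,d,timestamp,pkts_in,pkts_out,proto,is_pivoting\n"
    "0,17,42,50312,3389,2048,512,1.5,2018-01-01 09:00:00,12,9,6,True\n"
    "1,42,99,50313,445,100,300,0.2,2018-01-01 09:00:01,3,4,6,False\n"
)
print(count_labels(SAMPLE))  # prints (2, 1)
```

With the real file, the same functions would be called on a handle obtained from open(path, newline=""), where path points to the extracted .CSV file. If the actual header or label encoding differs from the assumptions above, INT_COLUMNS and TRUE_VALUES are the two places to adjust.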

Feel free to augment the dataset with additional features that can be derived from the ones we provide (such as the ratio of incoming/outgoing bytes).
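
A derived feature such as the ratio of incoming/outgoing bytes could be computed along the following lines. This is only a sketch: the function names and the extra features (total packets, bytes per unit of duration) are illustrative choices rather than part of the dataset, the unit of the duration field d is not documented here, and returning None for a zero denominator is just one possible convention.

```python
def byte_ratio(b_in, b_out):
    """Ratio of incoming to outgoing bytes; None when no bytes were sent out."""
    if b_out == 0:
        return None
    return b_in / b_out


def add_derived_features(flow):
    """Return a copy of a flow dict extended with a few derived features."""
    enriched = dict(flow)  # leave the original flow untouched
    enriched["bytes_ratio"] = byte_ratio(flow["b_in"], flow["b_out"])
    enriched["pkts_total"] = flow["pkts_in"] + flow["pkts_out"]
    # Bytes per unit of duration; undefined for zero-length flows.
    total_bytes = flow["b_in"] + flow["b_out"]
    enriched["bytes_per_time"] = total_bytes / flow["d"] if flow["d"] > 0 else None
    return enriched


# Made-up values for a single flow, using the dataset's feature names.
example = {"b_in": 2048, "b_out": 512, "pkts_in": 12, "pkts_out": 9, "d": 1.5}
print(add_derived_features(example)["bytes_ratio"])  # prints 4.0
```

Flows with b_out equal to 0 or d equal to 0 need an explicit policy (here None); which value suits a given experiment depends on the learning algorithm used downstream.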

Most of the pivoting activities captured in the dataset involved the remote use of the "terminal" host, which was controlled by the "pivoter" host through some third-party application (such as one based on the Windows Remote Desktop Protocol). We remark that these are the "normal" pivoting activities that occurred in the monitored organization -- they are not malicious, and hence represent benign and rare events.

 

Obtaining the Dataset:

If you wish to use the dataset, feel free to write an email to giovanni.apruzzese@unimore.it with "Pivoting Dataset" in the subject, and remember to specify your institution in the email body!

If you use the dataset, you are kindly invited to cite our paper. Also, if you obtain this dataset through other means, we invite you to inform us so that we can update the list of institutions that have used it.

 

List of Institutions using the Dataset: