Accurate network traffic classification is essential for a variety of applications, such as security, quality of service, and monitoring. Classification can be applied at ingress ports to separate traffic into different flows or queues, which can then be handled with different policies.
Several methods have been applied to classify network traffic. The most basic is to characterize traffic by its source and destination ports. However, this is of limited use when services run on non-standard ports. Another method is to inspect the packet payload. This method also has drawbacks: it is computationally intensive, and it is impossible when traffic is encrypted. Because of these shortcomings, traffic classification has attracted the interest of machine learning researchers, who have applied a variety of metrics derived from labeled flow trace data to classify network traffic. We expand on this research, taking into account the fact that much of the world's network traffic data is unlabeled. In our approach, we use a weakly supervised method to improve on classification approaches that rely on labeled network traffic data. We take a statistical approach, which relies on packet features such as packet length and arrival times. In particular, we generate a set of heuristics based on traffic features to substantially increase the amount of labeled data and improve traffic classification accuracy.
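To make the idea of feature-based heuristics concrete, the following is a minimal Python sketch of how such heuristics might assign weak labels to unlabeled flows. It is an illustration under stated assumptions, not the method used in this work: the `Flow` record, the class names ("bulk", "interactive"), the thresholds, and the majority-vote combination are all hypothetical choices made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Optional

ABSTAIN = None  # a heuristic returns this when it has no opinion


@dataclass
class Flow:
    """Hypothetical flow record; assumes at least one packet."""
    packet_lengths: List[int]    # bytes per packet
    arrival_times: List[float]   # seconds, in increasing order


def mean_inter_arrival(flow: Flow) -> float:
    """Average gap between consecutive packets (0.0 for a single packet)."""
    gaps = [b - a for a, b in zip(flow.arrival_times, flow.arrival_times[1:])]
    return mean(gaps) if gaps else 0.0


# Each heuristic votes for a class or abstains.
# Thresholds are illustrative assumptions, not measured values.
def lf_large_packets(flow: Flow) -> Optional[str]:
    return "bulk" if mean(flow.packet_lengths) > 1000 else ABSTAIN


def lf_small_sparse_packets(flow: Flow) -> Optional[str]:
    small = mean(flow.packet_lengths) < 200
    sparse = mean_inter_arrival(flow) > 0.05
    return "interactive" if small and sparse else ABSTAIN


def weak_label(flow: Flow,
               heuristics: List[Callable[[Flow], Optional[str]]]) -> Optional[str]:
    """Combine heuristic votes by simple majority; abstain on no votes or a tie."""
    votes = [v for v in (h(flow) for h in heuristics) if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


HEURISTICS = [lf_large_packets, lf_small_sparse_packets]

bulk_flow = Flow([1400, 1500, 1450], [0.0, 0.001, 0.002])
chat_flow = Flow([60, 80, 100], [0.0, 0.2, 0.5])
unknown_flow = Flow([500, 600], [0.0, 0.01])

labels = [weak_label(f, HEURISTICS) for f in (bulk_flow, chat_flow, unknown_flow)]
```

Flows on which every heuristic abstains remain unlabeled. The weak labels produced for the remaining flows could then serve as training data for a conventional classifier. A weighted or probabilistic combination of votes would be a natural alternative to the majority vote assumed here.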