Cyber Crimes and Confusion Matrix
Cybercrime, or computer crime, is a crime that involves a computer and a network. The computer may have been used in the commission of a crime, or it may be the target. Cybercrime may harm someone’s security and financial health.
Most, but not all, cybercrime is committed by cybercriminals or hackers who want to make money. It may be carried out by individuals or by organizations.
Consider an application, or piece of software, that classifies each attempt to reach the server as either bad or good. An ideal application would classify every attempt correctly, giving 100% precision and:
- 100% capture rate
- 0% escape rate
- 0% False Positive
In practice, the software or application needs to be developed, analyzed and tested. A single headline number can mislead: software that classifies everything as bad would still have a 100% capture rate.
Let us also consider software that classifies every transaction as either bad or good. The perfect software would classify every transaction correctly, resulting in 100% Precision (everything classified as bad was actually bad), a 100% capture rate (every actual bad was classified as bad), a 0% escape rate (no bads classified as good), and a 0% False Positive rate (no goods classified as bad). The system needs to be developed, tested, and analyzed from a real-time perspective.
For example, the application could classify every attempt as bad and achieve a 100% capture rate and a 0% escape rate, but this would also result in poor Precision and a huge False Positive rate, and so would require significant enhancement to correct the improper predictions and improve efficiency. At the other extreme, the application could classify everything as good, be mostly right, and not catch any bads. Situations of this type are among the most common issues faced in the cybersecurity domain, and they can be analyzed with the help of a confusion matrix.
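The two extremes can be checked with a little arithmetic. The sketch below assumes a hypothetical batch of 1,000 requests of which 10 are truly bad (the counts are made up for illustration) and scores an "everything is bad" classifier and an "everything is good" classifier against it. The `score` helper is an illustrative name, not part of any library.

```python
# Hypothetical ground truth: 1,000 requests, 10 of them truly bad.
# True means "bad", False means "good". The counts are illustrative only.
actual = [True] * 10 + [False] * 990


def score(actual, predicted):
    """Tally the four outcomes and derive the rates discussed above."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # bad, flagged as bad
    fn = sum(1 for a, p in pairs if a and not p)      # bad, passed as good
    fp = sum(1 for a, p in pairs if not a and p)      # good, flagged as bad
    tn = sum(1 for a, p in pairs if not a and not p)  # good, passed as good
    return {
        "accuracy": (tp + tn) / len(pairs),
        "capture_rate": tp / (tp + fn),
        "escape_rate": fn / (tp + fn),
        # Precision is undefined when nothing is flagged; report None then.
        "precision": tp / (tp + fp) if (tp + fp) else None,
        "false_positive_rate": fp / (fp + tn),
    }


flag_everything = score(actual, [True] * len(actual))
flag_nothing = score(actual, [False] * len(actual))

print("flag everything:", flag_everything)
print("flag nothing:   ", flag_nothing)
```

Under these assumed counts, the first classifier captures every bad request but only 1% of what it flags is actually bad, while the second is 99% accurate and captures nothing.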
WHAT IS A CONFUSION MATRIX?
A Confusion matrix is an N x N matrix used for evaluating the performance of a machine learning classification model, where N is the number of target classes. The matrix compares the actual target values with those predicted by the machine learning model. This gives us a holistic view of how well our classification model is performing and what kinds of errors it is making.
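As a minimal sketch of how such a matrix is tallied, the function below counts (actual, predicted) pairs into an N x N table. The convention that rows are actual classes and columns are predicted classes is an assumption made here (some tools use the transpose), and the eight labelled requests are invented for illustration.

```python
def confusion_matrix(actual, predicted, labels):
    """Return an N x N table of counts, where N == len(labels).

    Convention assumed here: rows are actual classes, columns are
    predicted classes, both in the order given by `labels`.
    """
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


# Hypothetical labels for eight requests (illustrative data only).
labels = ["good", "bad"]
actual = ["bad", "good", "good", "bad", "good", "bad", "good", "good"]
predicted = ["bad", "good", "bad", "good", "good", "bad", "good", "good"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(f"actual {label:>4}: {row}")
```

With this layout the diagonal holds the correct predictions and every off-diagonal cell counts one specific kind of error. The same function works unchanged for more than two classes.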
The following rates are often computed from a confusion matrix for a binary classifier. Here TP, TN, FP and FN are the counts of true positives, true negatives, false positives and false negatives:
- Accuracy: Overall, how often is the classifier correct?
  - (TP+TN)/total
- Misclassification Rate: Overall, how often is it wrong?
  - (FP+FN)/total
  - equivalent to 1 minus Accuracy
  - also known as “Error Rate”
- True Positive Rate: When it’s actually yes, how often does it predict yes?
  - TP/actual yes
  - also known as “Sensitivity” or “Recall”
- False Positive Rate: When it’s actually no, how often does it predict yes?
  - FP/actual no
- True Negative Rate: When it’s actually no, how often does it predict no?
  - TN/actual no
  - equivalent to 1 minus False Positive Rate
  - also known as “Specificity”
- Precision: When it predicts yes, how often is it correct?
  - TP/predicted yes
- Prevalence: How often does the yes condition actually occur in our sample?
  - actual yes/total
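The rates listed above can be written out as a small function. This is a sketch only: `binary_rates` is an illustrative name, and the four counts passed to it are hypothetical numbers chosen to make the arithmetic easy to follow.

```python
def binary_rates(tp, fp, fn, tn):
    """Compute the rates listed above from the four confusion-matrix counts."""
    total = tp + fp + fn + tn
    actual_yes = tp + fn      # everything that is truly positive
    actual_no = fp + tn       # everything that is truly negative
    predicted_yes = tp + fp   # everything the classifier flagged
    return {
        "accuracy": (tp + tn) / total,
        "misclassification_rate": (fp + fn) / total,
        "true_positive_rate": tp / actual_yes,    # sensitivity, recall
        "false_positive_rate": fp / actual_no,
        "true_negative_rate": tn / actual_no,     # specificity
        "precision": tp / predicted_yes,
        "prevalence": actual_yes / total,
    }


# Hypothetical counts, for illustration only (total = 165).
rates = binary_rates(tp=100, fp=10, fn=5, tn=50)
for name, value in rates.items():
    print(f"{name:>24}: {value:.3f}")
```

The function assumes every denominator is non-zero. A real evaluation would need to decide what to report when, for example, the classifier predicts no positives at all.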
In a real-time scenario, the ground truth for each request is not available, so the confusion matrix cannot be completed from live traffic alone. The software can instead be evaluated on predefined or hand-crafted data, where the true label of every request is known and the confusion matrix can be filled in.
With this approach you know where every bad is (you have the ground truth), so you can complete all 4 quadrants of the confusion matrix. Only then can you conduct a system analysis and drive the application toward the real goal: tuning and optimizing Precision (TP/(TP+FP)) and Capture rate (TP/(TP+FN)), while at the same time minimizing Escapes (FN) and the False Positive rate (FP/(FP+TN)).
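One way such tuning might look is sketched below. It assumes the detector emits a suspicion score per request and that a request is flagged as bad when the score reaches a threshold. Neither the scoring scheme nor the threshold is described in this article, so both are assumptions, and the ten labelled samples are hand-crafted for illustration.

```python
# Hand-crafted evaluation set: (suspicion score from the detector, truly bad?).
# Scores and labels are made up for illustration.
samples = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True),
    (0.60, False), (0.40, True), (0.30, False), (0.20, False),
    (0.10, False), (0.05, False),
]


def evaluate(samples, threshold):
    """Flag a request as bad when score >= threshold, then fill the 4 quadrants."""
    tp = sum(1 for s, bad in samples if s >= threshold and bad)
    fp = sum(1 for s, bad in samples if s >= threshold and not bad)
    fn = sum(1 for s, bad in samples if s < threshold and bad)
    tn = sum(1 for s, bad in samples if s < threshold and not bad)
    return {
        "tp": tp, "fp": fp, "fn": fn, "tn": tn,
        # Precision is undefined when nothing is flagged; report None then.
        "precision": tp / (tp + fp) if (tp + fp) else None,
        "capture_rate": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
    }


# Sweep a few candidate thresholds and compare the trade-off.
results = {t: evaluate(samples, t) for t in (0.25, 0.50, 0.85)}
for t, r in results.items():
    print(f"threshold {t:.2f}: {r}")
```

On this toy data, raising the threshold improves Precision and lowers the False Positive rate but lets more bads escape, which is the trade-off the four quadrants make visible.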
Used in this way, the confusion matrix helps build accurate software for detecting cyberattacks on servers, so that attacks are identified properly and the company’s data and infrastructure are secured.