CISO’s Guide: 9 Machine Learning Terms You Must Know

In our previous blog posts, we introduced the concept of machine learning for cybersecurity, then took a deeper dive into the pillars of machine learning and explored how supervised and unsupervised machine learning work, so that you can move beyond the marketing claims and understand the key technical concepts. In this post, we present nine key terms you should know.

Machine learning - A type of AI that provides computers with the ability to learn without being explicitly programmed. Machine learning focuses on the development of computer programs that can teach themselves to grow and change when exposed to new data. Machine learning and AI are often used interchangeably, especially in marketing.

Data scientist - Typically holding PhDs, data scientists are experts in the use of AI techniques. Based on the goal or use case, they select the right models and algorithms, combine them with the right data and tune them to produce accurate results.

Data - This is the "raw feedstock" for all machine learning. For security machine learning, the input can be packets, flows, logs, alerts and text, such as performance reviews and external threat intelligence.

Feature - Individual data elements relevant to the specific machine learning model and use case. They can be directly extracted from the data or derived after passing the raw data through a pre-processing algorithm. Examples include access features (IP addresses and countries visited), time features (access start and end times) and counter features (bytes uploaded and downloaded).
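
To make this concrete, here is a minimal Python sketch of feature extraction. The AccessEvent record, its field names and the sample values are assumptions made up for illustration; real log schemas will differ.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class AccessEvent:
    """One raw record, e.g. parsed from a proxy log (hypothetical schema)."""
    user: str
    dest_ip: str
    country: str
    start: datetime
    end: datetime
    bytes_up: int
    bytes_down: int


def extract_features(events):
    """Turn the raw events for one user into a flat feature dictionary."""
    return {
        # Access features, taken directly from the data
        "distinct_ips": len({e.dest_ip for e in events}),
        "distinct_countries": len({e.country for e in events}),
        # Time feature, derived from the start and end times
        "total_session_minutes": sum(
            (e.end - e.start).total_seconds() for e in events
        ) / 60,
        # Counter features
        "bytes_uploaded": sum(e.bytes_up for e in events),
        "bytes_downloaded": sum(e.bytes_down for e in events),
    }


# Invented sample data for a single user
events = [
    AccessEvent("alice", "203.0.113.10", "US",
                datetime(2017, 12, 5, 8, 0), datetime(2017, 12, 5, 8, 30),
                1_000, 50_000),
    AccessEvent("alice", "198.51.100.7", "DE",
                datetime(2017, 12, 5, 9, 0), datetime(2017, 12, 5, 9, 15),
                4_000, 20_000),
]
features = extract_features(events)
print(features)
```

The distinct-IP and country counts come straight from the records, while the session length is derived from two raw fields, which mirrors the "extracted or derived" distinction above.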

Model - The model takes in features, along with parameters provided by the algorithms, and applies a specific machine learning calculation to them. The result of that calculation is the output of the model.
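
As a rough illustration, the Python sketch below uses a logistic function as a stand-in for whatever calculation a real model applies. The feature names, weights and bias are made-up values, not the output of any actual training run.

```python
import math

# Parameters as they might come out of a training algorithm (made-up values).
WEIGHTS = {"distinct_countries": 0.8, "gb_uploaded": 1.5}
BIAS = -4.0


def risk_score(features):
    """Logistic model: a weighted sum of features squashed into the range 0 to 1."""
    z = BIAS + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


# Two hypothetical users described by the same two features
quiet_user = {"distinct_countries": 1, "gb_uploaded": 0.1}
noisy_user = {"distinct_countries": 4, "gb_uploaded": 3.0}

print(round(risk_score(quiet_user), 3))
print(round(risk_score(noisy_user), 3))
```

The same calculation produces a low score for the quiet user and a high score for the noisy one; only the input features change.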

Supervised machine learning - Uses a "teaching" technique to develop a relationship between a known set of outputs and their inputs. Once the model is developed, it can be used to predict the output for a new set of inputs.
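
One of the simplest ways to show this is a nearest-centroid classifier, sketched below in Python. This is only one of many supervised techniques, chosen here for brevity, and the features (failed logins, gigabytes uploaded) and labels are invented for illustration.

```python
from collections import defaultdict
from math import dist


def train(examples):
    """Learn one centroid (average feature vector) per known label."""
    grouped = defaultdict(list)
    for features, label in examples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the new input."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Known inputs (failed_logins, gb_uploaded) paired with known outputs
labeled = [
    ((0, 0.1), "benign"),
    ((1, 0.3), "benign"),
    ((2, 0.2), "benign"),
    ((9, 4.0), "malicious"),
    ((7, 6.0), "malicious"),
]
centroids = train(labeled)

# Predict the output for inputs the model has not seen before
print(predict(centroids, (8, 5.0)))
print(predict(centroids, (1, 0.2)))
```

The "teaching" step is the call to train, which needs the labels; predict then works on new inputs that carry no label at all.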

Unsupervised machine learning - The algorithm is "self-learning," so no prior training or preparation is required before it's deployed. The algorithm automatically discovers relevant structure and relationships among the inputs.
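
Clustering is a common example of this kind of structure discovery. The Python sketch below runs a basic one-dimensional k-means over unlabeled numbers; the data (daily megabytes uploaded per host) is invented, and k-means is just one unsupervised technique among many.

```python
def kmeans_1d(values, k=2, iterations=10):
    """Group numbers into k clusters without any labels (basic k-means)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    step = (len(ordered) - 1) / max(k - 1, 1)
    centers = [ordered[round(i * step)] for i in range(k)]
    for _ in range(iterations):
        # Assign every value to its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster (unchanged if empty).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Unlabeled input: daily MB uploaded by eight hosts (invented numbers)
uploads = [12, 15, 11, 14, 13, 480, 510, 495]
centers, clusters = kmeans_1d(uploads, k=2)
print(centers)
print(clusters)
```

Nobody told the algorithm which hosts were heavy uploaders; the two groups emerge from the data alone. Deciding whether the heavy group is a problem is still up to an analyst or a downstream model.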

Baselining - Anomaly detection models typically build a profile of "expected" behavior for an entity like a user or a system. Once these baselines are established, the models look for deviations from the baseline.
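
A minimal sketch of the idea in Python, assuming the baseline is just a mean and standard deviation and that a deviation means "more than three standard deviations away". Both choices, and the sample history, are illustrative assumptions; production systems typically use richer profiles.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Summarize an entity's past behavior as a mean and standard deviation."""
    return {"mean": mean(history), "stdev": stdev(history)}


def is_anomalous(baseline, value, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    if baseline["stdev"] == 0:
        # A perfectly flat history: any change at all is a deviation.
        return value != baseline["mean"]
    z = abs(value - baseline["mean"]) / baseline["stdev"]
    return z > threshold


# Hypothetical daily upload volume (MB) for one user over two weeks
history = [100, 110, 95, 105, 98, 102, 107, 99, 101, 104, 96, 103, 108, 100]
baseline = build_baseline(history)

print(is_anomalous(baseline, 106))  # a day close to the usual volume
print(is_anomalous(baseline, 900))  # a day far outside the usual volume
```

Because the baseline is built per entity, the same 900 MB day could be normal for a backup server and highly unusual for this user.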

Training - This process prepares a supervised machine learning model by feeding it features extracted from training data. Training is repeated periodically as the training data or features are updated.

Learn More

In our next blog, we will present a use case for how machine learning and user entity behavioral analytics can find attacks that evaded other real-time systems.

Want to go deeper right now? Download the CISO's Guide to Machine Learning and User Entity Behavioral Analytics e-book.