Understanding IntroSpect’s Modular, Data-Agnostic and Scalable UEBA Architecture


In a recent blog, I made the process for selecting a user and entity behavior analytics (UEBA) solution easy. Simply evaluate solutions along three axes: scalability, multi-dimensional analytics, and the integration between human and machine intelligence.

Guided by these considerations, Aruba has designed a UEBA solution with a flexible architecture that delivers varying levels of machine assistance to complement analysts' needs for behavioral analytics.

Four Layers of Abstraction

As shown in Figure 1, IntroSpect's UEBA is built with four layers of abstraction—use case definition, feature selection, baseline profiling, and anomaly detection.


  • Use Case Definition. The first layer defines a behavioral use case (e.g., suspicious access to critical servers) that generally requires local context.
  • Feature Selection. The second layer selects meaningful feature categories (e.g., time, data volume, or counters) for each use case. While feature selection is mostly a human-driven effort, deep-learning algorithms (e.g., convolutional neural networks) that automatically extract discriminative features from a large volume of unlabeled training data can also be used.
  • Baseline Profiling. The third layer learns the "normal baseline" for each entity (i.e., user or host) for each use case along two dimensions of behavior: historical and peer group. The former uses the entity's own historical behavior, while the latter uses common behaviors across a peer group, which can be flexibly defined (e.g., from Active Directory or user-provided input) or derived through self-learning. IntroSpect also uses adaptive learning to incorporate analyst feedback into its behavioral models.
  • Anomaly Detection. The fourth layer detects behavioral anomalies for each entity by measuring how far its behavior deviates from its baselines. This part is fully automated and machine-driven. Because each behavior use case selects feature vectors of different dimensions and types, Aruba built different unsupervised machine learning models, each with a corresponding distance (i.e., deviation) calculation, to detect anomalies automatically.
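To make the third and fourth layers concrete, here is a minimal sketch of baseline profiling and deviation scoring. It is an illustration only, not IntroSpect's actual models: the z-score distance, the 3.0 threshold, the function names, and the sample numbers are all assumptions chosen for the example.

```python
from statistics import mean, pstdev

def baseline(values):
    """Summarize observations as (mean, std). The std is floored so a
    perfectly constant history does not cause a division by zero."""
    return mean(values), max(pstdev(values), 1e-9)

def deviation(value, profile):
    """Assumed distance measure: absolute z-score against a baseline."""
    mu, sigma = profile
    return abs(value - mu) / sigma

def score_entity(today, own_history, peer_history):
    """Score one observation along both baseline dimensions."""
    return {
        "historical": deviation(today, baseline(own_history)),
        "peer": deviation(today, baseline(peer_history)),
    }

# Hypothetical feature: MB sent to a critical server per day.
own_history = [10, 12, 11, 9, 13]        # the entity's own past behavior
peer_history = [8, 15, 11, 10, 12, 14]   # the same feature across its peer group

scores = score_entity(95, own_history, peer_history)
# Assumed rule: flag only when the value is unusual on both dimensions.
is_anomalous = all(d > 3.0 for d in scores.values())
print(scores, is_anomalous)
```

Requiring both dimensions to agree is just one possible policy; a real deployment might weight them differently or use a different distance per feature type.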

Three Critical Design Choices

Building a flexible behavioral analytics solution requires deliberate design choices and significant investment during product implementation. It's well worth it, as IntroSpect's analytics identify threats that evade other, more simplistic approaches. This payoff is enabled by three critical design choices.


As described above, Aruba has abstracted and decoupled the use case layers (the first and second layers), which are security-context-driven, from the detection layers (the third and fourth layers), which are machine-learning-driven. In addition, we have done all the heavy lifting to pre-tune and self-tune these machine-learning models, so that security analysts can start benefiting from the solution without a deep understanding of machine learning.

All four layers are built in a fully modular fashion, so that security analysts, whether they come from a security or a data science background, can always interact with and influence the results of UEBA, applying their own expertise to improve its overall accuracy.
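One way to picture this modularity is a use case whose layers are independent, swappable parts. The sketch below is hypothetical: the `UseCase` class, its field names, and the thresholds are invented for illustration and do not reflect IntroSpect's internal API.

```python
from dataclasses import dataclass, replace
from typing import Callable, Sequence

@dataclass(frozen=True)
class UseCase:
    name: str                                      # layer 1: use case definition
    extract: Callable[[dict], float]               # layer 2: feature selection
    profile: Callable[[Sequence[float]], float]    # layer 3: baseline profiling
    detect: Callable[[float, float], bool]         # layer 4: anomaly detection

    def evaluate(self, history, event):
        """Build a baseline from history, then test the new event against it."""
        base = self.profile([self.extract(e) for e in history])
        return self.detect(self.extract(event), base)

critical_access = UseCase(
    name="suspicious access to critical servers",
    extract=lambda e: e["logins"],
    profile=lambda xs: sum(xs) / len(xs),      # mean as a toy baseline
    detect=lambda x, base: x > 3 * base,       # assumed threshold
)

history = [{"logins": 2}, {"logins": 4}, {"logins": 3}]
flagged = critical_access.evaluate(history, {"logins": 20})

# An analyst changes only the detection layer; the other layers are untouched.
stricter = replace(critical_access, detect=lambda x, base: x > 10 * base)
flagged_strict = stricter.evaluate(history, {"logins": 20})
print(flagged, flagged_strict)
```

Because the layers only meet at narrow interfaces, a security expert can adjust the use case or features while a data scientist tunes profiling or detection, without either needing to touch the other's part.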


As explained in my "Three Considerations When Selecting a UEBA Solution" blog, a multi-dimensional UEBA solution that combines anomalous signals from different data sources can be far more effective than one that relies on a single source. IntroSpect's UEBA solution is built in a data-agnostic way. This means an analyst can add UEBA support for a new use case from existing or new data sources with some simple schema and use case-specific configurations.


If you compare the two behavior use cases pictured, suspicious access to critical servers (Figure 1) and suspicious access to buildings (Figure 2), you'll find that the main difference between them is the data source: the first draws on either server logs or network packets, and the second on badge reader logs. Otherwise, both use cases monitor the same temporal features (plus some other behavior-specific features) and detect similar behavioral anomalies.
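A rough sketch of what "data-agnostic" can mean in practice: each source gets a small schema mapping, and the same temporal feature extractor runs on the normalized events. The source names, field names, and sample records below are all hypothetical, not IntroSpect's actual configuration format.

```python
from datetime import datetime

# Hypothetical schema mappings: normalized field -> source-specific field.
SCHEMAS = {
    "server_log": {"entity": "user", "target": "dest_host", "time": "ts"},
    "badge_log": {"entity": "badge_owner", "target": "door_id", "time": "swipe_time"},
}

def normalize(record, source):
    """Rename source-specific fields to the common schema."""
    return {field: record[raw] for field, raw in SCHEMAS[source].items()}

def temporal_features(event):
    """Source-independent temporal features for one normalized event."""
    t = datetime.fromisoformat(event["time"])
    return {"hour": t.hour, "weekend": t.weekday() >= 5}

server_event = {"user": "alice", "dest_host": "db01", "ts": "2017-06-03T02:15:00"}
badge_event = {"badge_owner": "alice", "door_id": "lab-2",
               "swipe_time": "2017-06-03T02:40:00"}

server_features = temporal_features(normalize(server_event, "server_log"))
badge_features = temporal_features(normalize(badge_event, "badge_log"))
print(server_features, badge_features)
```

Under this assumption, supporting a new data source means adding one entry to the mapping; the feature, profiling, and detection code stays the same.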


IntroSpect's behavior analytics platform is built using a big data architecture, leveraging Apache Hadoop and Spark-based technologies. For data persistence, we use a mix of NoSQL key-value, columnar, and time-series databases to store a high volume of both raw and derived data in the most efficient format for different analytics uses.

A hierarchical data processing approach enables us to break down different analytics requirements, such as feature extraction and aggregation, and embed them into both the stream and batch processing layers. Each layer builds on results the previous one has already computed, which minimizes data read-write cost and improves scalability.
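The idea can be sketched in plain Python, standing in for the stream and batch layers (this is not Spark code, and the event fields and function names are assumptions): the stream layer reduces each micro-batch to small partial aggregates, and the batch layer merges those partials instead of re-reading raw events.

```python
from collections import Counter

def stream_aggregate(events):
    """Stream layer: reduce one micro-batch of raw events to
    per-(entity, hour) byte totals."""
    partial = Counter()
    for e in events:
        partial[(e["entity"], e["hour"])] += e["bytes"]
    return partial

def batch_merge(partials):
    """Batch layer: combine partial aggregates; raw events are never re-read."""
    total = Counter()
    for p in partials:
        total.update(p)   # Counter.update adds counts key by key
    return total

micro_batches = [
    [{"entity": "alice", "hour": 9, "bytes": 100},
     {"entity": "bob", "hour": 9, "bytes": 50}],
    [{"entity": "alice", "hour": 9, "bytes": 25}],
]

partials = [stream_aggregate(batch) for batch in micro_batches]
daily = batch_merge(partials)
print(dict(daily))
```

The batch step touches only one record per key per micro-batch rather than every raw event, which is where the read-write savings would come from in this simplified picture.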

Achieving Automatability

In "The Five Characteristics of an Intelligence-Driven Security Operations Center," Gartner's Neil MacDonald and Oliver Rochford make a central point about how they see the enterprise security operations center (SOC) evolving.

"Rather than seek full automation of all SOC activities, enterprises should seek 'automatability' — the capability of being automated as higher levels of confidence is achieved. Even then, analytics-driven, human-augmented security decision support systems will be used to provide the SOC analyst with the context of the recommended action, along with the details behind the verdict and recommended action." 

This is the foundation of IntroSpect's product vision. By designing a behavioral analytics solution that's modular, data-agnostic and scalable, we enable organizations to achieve that "automatability."

IntroSpect ships with a broad range of behavioral use cases, developed by our own security experts, leveraging the modular architecture. This enables organizations to get value from IntroSpect immediately upon deployment. Analysts can also influence and improve the quality of behavior detections in many different ways, and they can define their own behavior use cases to extend IntroSpect to fit their specific requirements.

With IntroSpect, it's not about replacing security analysts with automated systems (which is what resonated with Drew Conry-Murray of Packet Pushers, who wrote an article about IntroSpect). Rather, it's about enabling organizations to make optimal use of scarce SOC resources.

Ready to learn more? Get the CISO's guide to machine learning and user and entity behavioral analytics.

Jisheng Wang is the senior director of data science in the office of the CTO for Aruba, a Hewlett Packard Enterprise company.