Some recent papers
Search and Retrieval in Semantic-Structural Representations of Novel Malware
Abstract: “In this study, we present a novel representation for binary programs which captures semantic similarity and structural properties. This representation enables the search and retrieval of binary executable programs based on their similarity of behavioral properties. The proposed representation is composed in a bottom-up approach: we begin by extracting data dependency graphs (DDG), which are representative of both program structure and operational semantics. We then encode each program as a set of graph hashes representing isomorphic uniqueness, a method we have labeled DDG Fingerprinting. We present experimental results of search using k-Nearest Neighbors in a metric space constructed from a set of binary executables. Searches in the dataset are based on the operational semantics of specific malware examples.”
http://dx.doi.org/10.54364/aaiml.2024.41117
@article{musgrave2024search,
  title={Search and Retrieval in Semantic-Structural Representations of Novel Malware},
  author={Musgrave, John and Campan, Alina and Messay-Kebede, Temesguen and Kapp, David and Wang, Boyang},
  journal={Advances in Artificial Intelligence and Machine Learning},
  volume={4},
  number={1},
  pages={117},
  year={2024},
  doi={10.54364/aaiml.2024.41117}
}
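The abstract describes DDG Fingerprinting only at a high level, so the following is a minimal sketch of the general idea under stated assumptions, not the authors' implementation. Each program is assumed to arrive as a list of small data dependency graphs in a hypothetical dict encoding (node id mapped to an instruction label and its consumers). Each graph is hashed with a Weisfeiler-Lehman style hash, which is a swapped-in technique; the paper's own hashing scheme is not given in the abstract. Search is brute-force k-nearest neighbors under Jaccard distance, one possible metric over sets of hashes. The names `wl_hash`, `fingerprint`, `jaccard_distance`, and `knn` are illustrative only.

```python
import hashlib
from collections import Counter

# Hypothetical DDG encoding (an assumption, not the paper's format):
# node id -> (instruction label, ids of nodes that consume its output).
Graph = dict[str, tuple[str, list[str]]]


def wl_hash(graph: Graph, iterations: int = 3) -> str:
    """Weisfeiler-Lehman style graph hash (a stand-in technique).

    Label-preserving isomorphic graphs always receive the same hash;
    non-isomorphic graphs usually, but not always, receive different ones.
    """
    labels = {node: label for node, (label, _) in graph.items()}
    preds: dict[str, list[str]] = {node: [] for node in graph}
    for node, (_, succs) in graph.items():
        for succ in succs:
            preds[succ].append(node)

    # Relabeling uses only labels, never node ids, and the per-round label
    # multisets are hashed, so renaming nodes cannot change the result.
    history = [Counter(labels.values())]
    for _ in range(iterations):
        relabeled = {}
        for node, (_, succs) in graph.items():
            incoming = ",".join(sorted(labels[p] for p in preds[node]))
            outgoing = ",".join(sorted(labels[s] for s in succs))
            signature = f"{labels[node]}|{incoming}|{outgoing}"
            relabeled[node] = hashlib.sha1(signature.encode()).hexdigest()[:16]
        labels = relabeled
        history.append(Counter(labels.values()))

    canonical = ";".join(
        ",".join(sorted(f"{lbl}:{count}" for lbl, count in rnd.items()))
        for rnd in history
    )
    return hashlib.sha256(canonical.encode()).hexdigest()


def fingerprint(program: list[Graph]) -> frozenset[str]:
    """Encode a program as the set of hashes of its DDG subgraphs."""
    return frozenset(wl_hash(g) for g in program)


def jaccard_distance(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard distance, a true metric on finite sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def knn(
    query: frozenset[str], corpus: dict[str, frozenset[str]], k: int
) -> list[tuple[str, float]]:
    """Brute-force k-nearest-neighbor search over fingerprints."""
    scored = [(name, jaccard_distance(query, fp)) for name, fp in corpus.items()]
    scored.sort(key=lambda item: (item[1], item[0]))  # ties broken by name
    return scored[:k]


# Toy data: the same load -> add -> store flow under two sets of node names.
g_add = {"a": ("load", ["b"]), "b": ("add", ["c"]), "c": ("store", [])}
g_add_renamed = {"x": ("load", ["y"]), "y": ("add", ["z"]), "z": ("store", [])}
g_xor = {"a": ("load", ["b"]), "b": ("xor", ["c"]), "c": ("store", [])}
g_call = {"a": ("arg", ["c"]), "b": ("arg", ["c"]), "c": ("call", [])}

corpus = {
    "sample_1": fingerprint([g_add, g_xor]),
    "sample_2": fingerprint([g_xor]),
    "sample_3": fingerprint([g_call]),
}
print(knn(fingerprint([g_add_renamed]), corpus, k=2))
# [('sample_1', 0.5), ('sample_2', 1.0)]
```

Because renaming nodes does not change the hash, the query built from `g_add_renamed` shares one of two hashes with `sample_1`. A WL hash can collide for some non-isomorphic graphs, so an exact canonical labeling would be needed wherever strict isomorphism classes matter.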
Empirical Network Structure of Malicious Programs
Abstract: “A modern binary executable is a composition of various types of networks. Control flow graphs are a commonly used representation of an executable program used for classification tasks. Control flow and term frequency representations are widely adopted, but provide only a partial view of program semantics and present challenges to increases in resolution. By performing a quantitative analysis of program networks, we enable the identification of patterns within these features that are correlated to structure. This allows for increases in feature resolution and pattern recognition in classification tasks. These are necessary steps in order to obtain greater explainability in classification results. We demonstrate the presence of Scale-Free properties of network structure for program data dependency and control flow graphs, and show that data dependency graphs also have Small-World structural properties. We show that program data dependency graphs have a degree correlation that is structurally disassortative, and that control flow graphs have a neutral degree assortativity, indicating the use of random graphs to model the structural properties of program control flow graphs would show increased accuracy. An increase in feature resolution allows for the structural properties of program classes to be analyzed for patterns as well as their component parts. By providing an increase in feature resolution within labeled datasets of executable programs we provide a quantitative basis to interpret the results of classifiers trained on CFG graph features. By capturing a complete picture of program networks we can enable future work in mapping a program’s operational semantics to its structure.”
http://dx.doi.org/10.54364/aaiml.2024.41112
@article{musgrave2024empirical,
  title={Empirical Network Structure of Malicious Programs},
  author={Musgrave, John and Campan, Alina and Messay-Kebede, Temesguen and Kapp, David and Wang, Boyang},
  journal={Advances in Artificial Intelligence and Machine Learning},
  volume={4},
  number={1},
  pages={112},
  year={2024},
  doi={10.54364/aaiml.2024.41112}
}
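The assortativity findings can be made concrete with Newman's degree assortativity coefficient, the Pearson correlation between the degrees at the two ends of each edge. The sketch below is an illustration under assumptions, not the paper's analysis code: it treats a program graph as an undirected simple edge list, whereas DDGs and CFGs are directed and the paper may use a directed variant. The function name `degree_assortativity` and the toy graphs are hypothetical. A star graph, where one hub links only to leaves, is the textbook case of perfect disassortativity.

```python
from collections import defaultdict


def degree_assortativity(edges: list[tuple[int, int]]) -> float:
    """Newman's degree assortativity for an undirected simple graph.

    Negative values mean disassortative (hubs attach to low-degree nodes),
    values near zero mean neutral, positive values mean assortative.
    Returns NaN when every node has the same degree (r is undefined).
    """
    if not edges:
        raise ValueError("graph has no edges")

    degree: dict[int, int] = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    m = len(edges)
    # Edge-averaged terms from Newman's formula; j and k are endpoint degrees.
    mean_product = sum(degree[u] * degree[v] for u, v in edges) / m
    mean_half = sum((degree[u] + degree[v]) / 2 for u, v in edges) / m
    mean_sq = sum((degree[u] ** 2 + degree[v] ** 2) / 2 for u, v in edges) / m

    denominator = mean_sq - mean_half ** 2
    if denominator == 0:
        return float("nan")
    return (mean_product - mean_half ** 2) / denominator


star = [(0, 1), (0, 2), (0, 3), (0, 4)]  # one hub, four leaves
path = [(0, 1), (1, 2), (2, 3)]
print(degree_assortativity(star))  # -1.0
print(degree_assortativity(path))  # close to -0.5
```

A value near zero, as the abstract reports for CFGs, means connected nodes show no degree preference. For real program graphs, NetworkX's `degree_assortativity_coefficient` is a tested alternative that also accepts directed graphs.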