This is a living review of articles related to “program debloating” and “software debloating.”
Legend: Conference; Journal; Thesis.
Index
2024
- [ICSE] Efficiently Trimming the Fat: Streamlining Software Dependencies with Java Reflection and Dependency Analysis LinkNumerous third-party libraries introduced into client projects are not actually required, resulting in modern software being gradually bloated. Software developers may spend much unnecessary effort to manage the bloated dependencies: keeping the library versions up-to-date, making sure that heterogeneous licenses are compatible, and resolving dependency conflict or vulnerability issues. However, the prior debloating techniques can easily produce false alarms of bloated dependencies since they are less effective in analyzing Java reflections. Besides, the solutions given by the existing approaches for removing bloated dependencies may induce new issues that are not conducive to dependency management. To address the above limitations, in this paper, we developed a technique, Slimming, to remove bloated dependencies from software projects reliably. Slimming statically analyzes the Java reflections that are commonly leveraged by popular frameworks (e.g., Spring Boot) and resolves the reflective targets via parsing configuration files (*.xml, *.yml and *.properties). By modeling string manipulations, Slimming fully resolves the string arguments of our concerned reflection APIs to identify all the required dependencies. More importantly, it helps developers analyze the debloating solutions by weighing the benefits against the costs of dependency management. Our evaluation results show that the static reflection analysis capability of Slimming outperforms all the other existing techniques with 97.0% of Precision and 98.8% of Recall. Compared with the prior debloating techniques, Slimming can reliably remove the bloated dependencies with a 100% test passing ratio and improve the rationality of debloating solutions. In our large-scale study in the Maven ecosystem, Slimming reported 484 bloated dependencies to 66 open-source projects. 38 reports (57.6%) have been confirmed by developers.
2023
[SecDev] Evaluating Container Debloaters Link
DOCKER containers have been widely used by organizations because they are lightweight and single hardware can run multiple instances of a container. However, this ease of virtualization comes with weaker isolation as compared to virtual machines. A compromised container can allow the attacker to escape to the host and gain privileged access. Several approaches have been developed to reduce the attack surface of containers either through the reduction of system calls or through slimming container images. Unfortunately, measuring the performance of container debloaters is challenging as there exists no platform for this purpose. This paper aims to address this gap, by building a unified platform to benchmark them. Currently, our benchmark includes 7 workload applications, and 3 container debloaters, i.e., SPEAKER, CONFINE (syscalls reduction tools), and SLIMTOOLKIT (image size reduction tool). We added several evaluation metrics in the framework, which include category-based system call reduction, CVEs mitigated, size reduction, and execution correctness.Our evaluation revealed interesting insights into the existing techniques. Both the system call reduction tools were able to produce correct debloated containers as compared to SLIMTOOLKIT (tool to reduce image size) which worked well too by reducing almost 79 percent of the size of the image but it failed to produce correct results on 2 out of 7 applications.[ESEC/FSE] Dead Code Removal at Meta: Automatically Deleting Millions of Lines of Code and Petabytes of Deprecated Data Link
Software constantly evolves in response to user needs: new features are built, deployed, mature and grow old, and eventually their usage drops enough to merit switching them off. In any large codebase, this feature lifecycle can naturally lead to retaining unnecessary code and data. Removing these respects users’ privacy expectations, as well as helping engineers to work efficiently. In prior software engineering research, we have found little evidence of code deprecation or dead-code removal at industrial scale. We describe Systematic Code and Asset Removal Framework (SCARF), a product deprecation system to assist engineers working in large codebases. SCARF identifies unused code and data assets and safely removes them. It operates fully automatically, including committing code and dropping database tables. It also gathers developer input where it cannot take automated actions, leading to further removals. Dead code removal increases the quality and consistency of large codebases, aids with knowledge management and improves reliability. SCARF has had an important impact at Meta. In the last year alone, it has removed petabytes of data across 12.8 million distinct assets, and deleted over 104 million lines of code.[ESEC/FSE] Dead Code Removal at Meta: Automatically Deleting Millions of Lines of Code and Petabytes of Deprecated Data Link
Software constantly evolves in response to user needs: new features are built, deployed, mature and grow old, and eventually their usage drops enough to merit switching them off. In any large codebase, this feature lifecycle can naturally lead to retaining unnecessary code and data. Removing these respects users’ privacy expectations, as well as helping engineers to work efficiently. In prior software engineering research, we have found little evidence of code deprecation or dead-code removal at industrial scale. We describe Systematic Code and Asset Removal Framework (SCARF), a product deprecation system to assist engineers working in large codebases. SCARF identifies unused code and data assets and safely removes them. It operates fully automatically, including committing code and dropping database tables. It also gathers developer input where it cannot take automated actions, leading to further removals. Dead code removal increases the quality and consistency of large codebases, aids with knowledge management and improves reliability. SCARF has had an important impact at Meta. In the last year alone, it has removed petabytes of data across 12.8 million distinct assets, and deleted over 104 million lines of code.[ArXiv] Unleashing The Adversarial Facet of Software Debloating Link
Software debloating techniques are applied to craft a specialized version of the program based on the user’s requirements and remove irrelevant code accordingly. The debloated programs presumably maintain better performance and reduce the attack surface in contrast to the original programs. This work unleashes the effectiveness of applying software debloating techniques on the robustness of machine learning systems in the malware classification domain. We empirically study how an adversarial can leverage software debloating techniques to mislead machine learning malware classification models. We apply software debloating techniques to generate adversarial examples and demonstrate these adversarial examples can reduce the detection rate of VirusTotal. Our study opens new directions for research into adversarial machine learning not only in malware detection/classification but also in other software domains.[USENIX] AnimateDead: Debloating Web Applications Using Concolic Execution Link
Year over year, modern web applications evolve to cater to the needs of many users and support various runtime environments. The ever-growing need to appeal to as many users as possible and the reliance on third-party dependencies comes at the price of code-bloat. Previous research has highlighted the benefits of debloating mechanisms which produce smaller applications, customized to the real needs of their users with significant security improvements. Recognizing the limitations of dynamic and static debloating schemes (including high runtime overhead and lack of accuracy), we propose a hybrid approach based on concolic execution. We developed AnimateDead, a PHP emulator capable of concolic execution and designed a distributed analysis framework around it. By using the readily available web server logs as application entry points, we perform concolic reachability analysis and extract the code-coverage of target web applications in an abstract environment, which allows our results to generalize for all user inputs and database states. We demonstrate that debloating via concolic execution improves the security of web applications by shrinking the size of their code by an average of 47% and reducing critical API calls by 55%, while removing 35-65% of vulnerabilities for historic CVEs. We show that via concolic execution, we can debloat web applications with comparable security improvements of dynamic debloating schemes without suffering from the runtime overhead, and the need for a training phase. Moreover, AnimateDead-debloated web applications reduce the likelihood of breakage by allowing users to perform all actions reachable from the analyzed entry points.
2022
[ACSAC] DeView: Confining Progressive Web Applications by Debloating Web APIs Link
A progressive web application (PWA) becomes an attractive option for building universal applications based on feature-rich web Application Programming Interfaces (APIs). While flexible, such vast APIs inevitably bring a significant increase in an API attack surface, which commonly corresponds to a functionality that is neither needed nor wanted by the application. A promising approach to reduce the API attack surface is software debloating, a technique wherein an unused functionality is programmatically removed from an application. Unfortunately, debloating PWAs is challenging, given the monolithic design and non-deterministic execution of a modern web browser. In this paper, we present DeView, a practical approach that reduces the attack surface of a PWA by blocking unnecessary but accessible web APIs. DeView tackles the challenges of PWA debloating by i) record-and-replay web API profiling that identifies needed web APIs on an app-by-app basis by replaying (recorded) browser interactions and ii) compiler-assisted browser debloating that eliminates the entry functions of corresponding web APIs from the mapping between web API and its entry point in a binary. Our evaluation shows the effectiveness and practicality of DeView. DeView successfully eliminates 91.8% of accessible web APIs while i) maintaining original functionalities and ii) preventing 76.3% of known exploits on average.[ESEC/FSE] Diet code is healthy: simplifying programs for pre-trained models of code Link
Pre-trained code representation models such as CodeBERT have demonstrated superior performance in a variety of software engineering tasks, yet they are often heavy in complexity, quadratically with the length of the input sequence. Our empirical analysis of CodeBERT’s attention reveals that CodeBERT pays more attention to certain types of tokens and statements such as keywords and data-relevant statements. Based on these findings, we propose DietCode, which aims at lightweight leverage of large pre-trained models for source code. DietCode simplifies the input program of CodeBERT with three strategies, namely, word dropout, frequency filtering, and an attention-based strategy that selects statements and tokens that receive the most attention weights during pre-training. Hence, it gives a substantial reduction in the computational cost without hampering the model performance. Experimental results on two downstream tasks show that DietCode provides comparable results to CodeBERT with 40% less computational cost in fine-tuning and testing.[CACM] Set the configuration for the heart of the OS: on the practicality of operating system kernel debloating Link
This paper presents a study on the practicality of operating system (OS) kernel debloating, that is, reducing kernel code that is not needed by the target applications. Despite their significant benefits regarding security (attack surface reduction) and performance (fast boot time and reduced memory footprints), the state-of-the-art OS kernel debloating techniques are not widely adopted in practice, especially in production environments. We identify the limitations of existing kernel debloating techniques that hinder their practical adoption, such as both accidental and essential ones. To understand these limitations, we build an advanced debloating framework named Cozart that enables us to conduct a number of experiments on different types of OS kernels (such as Linux and the L4 microkernel) with a wide variety of applications (such as HTTPD, Memcached, MySQL, NGINX, PHP, and Redis). Our experimental results reveal the challenges and opportunities in making OS kernel debloating practical. We share these insights and our experience to shed light on addressing the limitations of kernel debloating techniques in future research and development efforts.[CACM] Technical perspective: ‘What is the ideal operating system? Link
The enormous consumer market for IBM PCs and compatibles has made them affordable. Now, with a free operating system called Linux, these inexpensive machines can be converted into powerful workstations for teaching, research, and software development. Linux is a full-fledged operating system that provides all the capabilities normally associated with commercial Unix systems and has excellent compilers for C, C++, Pascal, Modula-2 and -3, Oberon, Smalltalk, and Fortran. For professionals who use Unix-based workstations at work, Linux permits virtually identical working environments on their personal home machines. For cost-conscious educational institutions–especially in developing nations–Linux can create world-class computing environments from inexpensive, easily maintained PC clones. And, for university students–especially in science and engineering–Linux provides an essentially cost-free path into Unix and X Windows. Until recently, most information about Linux was found only on the Internet. This article attempts to remedy that situation by describing how to obtain, install, and use Linux, as well as the hardware requirements to run it. In addition to providing a survey of pertinent literature, the article describes several public-domain applications that the author has found useful.[TSE] XDebloat: Towards Automated Feature-Oriented App Debloating Link
Existing programming practices for building Android apps mainly follow the “one-size-fits-all” strategy to include lots of functions and adapt to most types of devices. However, this strategy can result in software bloat and many serious issues, such as slow download speed, and large attack surfaces. Existing solutions cannot effectively debloat an app as they either lack flexibility or require human efforts. This work proposes a novel feature-oriented debloating approach and builds a prototype, named XDebloat , to automate this process in a flexible manner. First, We propose three feature location approaches to mine features in an app. XDebloat supports feature location approaches at a fine granularity. It also makes the feature location results editable. Second, XDebloat considers several Android-oriented issues (i.e., callbacks) to perform a more precise analysis. Third, XDebloat supports two major debloating strategies: pruning-based debloating and module-based debloating. We evaluate XDebloat with 200 open-source and 1,000 commercial apps. The results show that XDebloat can successfully remove components from apps or transform apps into on-demand modules within 10 minutes. For the pruning-based debloating strategy, on average, XDebloat can remove 32.1% code from an app. For the module-based debloating strategy, XDebloat can help developers build instant apps or app bundles automatically.[ACMQUEUE] OCCAM-v2: Combining Static and Dynamic Analysis for Effective and Efficient Whole-program Specialization Link
Software has evolved to support diverse sets of features. Agile software engineering practices, such as code designed for reusability, introduce redundant code. In a common motif, entire libraries are linked where only a small number of functions are needed. The accumulation of extraneous code can negatively impact application users who need to access only a subset of these features.[ASPLOS] One size does not fit all: security hardening of MIPS embedded systems via static binary debloating for shared libraries Link
Embedded systems have become prominent targets for cyberattacks. To exploit firmware’s memory corruption vulnerabilities, cybercriminals harvest reusable code gadgets from the large shared library codebase (e.g., uClibc). Unfortunately, unlike their desktop counterparts, embedded systems lack essential computing resources to enforce security hardening techniques. Recently, we have witnessed a surge of software debloating as a new defense mechanism against code-reuse attacks; it erases unused code to significantly diminish the possibilities of constructing reusable gadgets. Because of the single firmware image update style, static library debloating shows promise to fortify embedded systems without compromising performance and forward compatibility. However, static library debloating on stripped binaries (e.g., firmware’s shared libraries) is still an enormous challenge. In this paper, we show that this challenge is not insurmountable for MIPS firmware. We develop a novel system, named uTrimmer, to identify and wipe out unused basic blocks from shared libraries’ binary code, without causing additional runtime overhead or memory consumption. We propose a new method to identify address-taken blocks/functions, which further help us maintain an inter-procedural control flow graph to conservatively include library code that could be potentially used by firmware. By capturing address access patterns for position-independent code, we circumvent the challenge of determining code-pointer targets and safely eliminate unused code. We run uTrimmer to debloat shared libraries for SPEC CPU2017 benchmarks, popular firmware applications (e.g., Apache, BusyBox, and OpenSSL), and a real-world wireless router firmware image. Our experiments show that not only does uTrimmer deliver functional programs, but also it can cut the exposed code surface and eliminate various reusable code gadgets remarkably. uTrimmer’s debloating capability can compete with the static linking results.[CCS] C2C: Fine-grained Configuration-driven System Call Filtering Link
Configuration options allow users to customize application features according to the desired requirements. While the code that corresponds to disabled features is never executed, it still resides in process memory and comprises part of the application’s attack surface, e.g., it can be reused for the construction of exploit code. Automatically reducing the attack surface of disabled application features according to a given configuration is thus a desirable defense-in-depth capability. The intricacies of modern software design and the complexities of popular programming languages, however, introduce significant challenges in automatically deriving the mapping of configuration options to their corresponding application code. In this paper, we present Configuration-to-Code (C2C), a generic configuration-driven attack surface reduction technique that automatically maps configuration options to application code using static code analysis and instrumentation. C2C operates at a fine-grained level by pruning configuration-dependent conditional branches in the control flow graph, allowing the precise identification of a given configuration option’s code at the basic block level. At runtime, C2C reduces the application’s attack surface by filtering any system calls required exclusively by disabled features. Using popular applications, we show how security-critical system calls (such as execve) can be automatically disabled when not needed, limiting an attacker’s vulnerability exploitation capabilities. System call filtering also reduces the exposed attack surface of the underlying Linux kernel, neutralizing 32 additional CVEs (for a total of 88) compared to previous software specialization techniques.[ICSR] Scratching the Surface of ./configure: Learning the Effects of Compile-Time Options on Binary Size and Gadgets Link
Numerous software systems are configurable through compile-time options and the widely used ./configure. However, the combined effects of these options on binary’s non-functional properties (size and attack surface) are often not documented, and or not well understood, even by experts. Our goal is to provide automated support for exploring and comprehending the configuration space (a.k.a., surface) of compile-time options using statistical learning techniques. In this paper, we perform an empirical study on four C-based configurable systems. We measure the variation of binary size and attack surface (by quantifying the number of code reuse gadgets) in over 400 compile-time configurations of a subject system. We then apply statistical learning techniques on top of our build infrastructure to identify how compile-time options relate to non-functional properties. Our results show that, by changing the default configuration, the system’s binary size and gadgets vary greatly (roughly to and to , respectively). Then, we found out that identifying the most influential options can be accurately learned with a small training set, while their relative importance varies across size and attack surface for the same system. Practitioners can use our approach and artifacts to explore the effects of compile-time options in order to take informed decisions when configuring a system with ./configure.[EISA] JSLIM: Reducing the Known Vulnerabilities of JavaScript Application by Debloating Link
As the complexity of software projects increases, more and more developers choose to package various external dependency libraries into software projects to simplify software development. Unfortunately, these introduced dependent libraries are likely to introduce many potential security risks. This phenomenon is called software bloat. One way to handle this increased threat is through software debloating, i.e., the removal of dead code and code corresponding to vulnerabilities introduced from external dependency libraries. In our paper, we proposed JSLIM, an effective vulnerability-aware software debloating system. First, JSLIM processes the public vulnerability information through natural language processing technology, obtains the mapping relationship between the vulnerability and the NPM package, and determines which function in the package causes a specific vulnerability. Then, according to the generated function call graph, determine whether the program calls the method that generates the vulnerability in the dependent library. JSLIM removes the code that isn’t called by the program and uses the sandbox to isolate the code that has vulnerabilities but cannot be removed. We conduct experiments on popular open-source JavaScript applications. The experimental results show that our method removes most of the code related to the known vulnerabilities of the application and effectively prevents attackers from exploiting known vulnerabilities in the NPM package to launch attacks on applications
2021
[Doctoral Thesis] Reducing Software’s Attack Surface With Code Debloating Link
Current software is designed to support a large spectrum of features to meet various users’ needs. However, each end-user only requires a small set of the features to perform required tasks, rendering the software bloated. The bloated code not only hinders optimal execution, but also leads to a larger attack surface. Code debloating technique, which aims to remove the unneeded features’ code, has been proposed to reduce the bloated software’s attack surface. However, there is a fundamental gap between the features needed by a user and the implementation, so it is challenging to identify the code that only supports the needed features. Previous works ask end-users to provide a set of sample inputs to demonstrate how they will use the software and generate a debloated version of the software that only contains the code triggered from running the sample inputs. Unfortunately, software debloated by this approach only supports running given inputs, presenting an unusable notion of debloating: if the debloated software only needs to support a fixed set of inputs, the debloating process is as simple as synthesizing a map from the input to the observed output. We call this Over-debloating Problem. This dissertation focuses on removing software’s unneeded code while providing high robustness for debloated software to run more inputs sharing the same functionalities with the given inputs, with approaches either using heuristics, feature-code map, or code partitioning. First, the thesis presents RAZOR, which first collects executed code for running the software on the given inputs and then uses heuristics to infer non-executed code related to the given inputs. In the end, RAZOR rewrites the software to keep not only the executed code but also the inferred code, which makes the debloated software support running other inputs besides the given ones. However, in RAZOR, the heuristics are syntax-based and can only infer a limited set of related code, which fails on debloating large-scale and complex software such as web browsers. Later, the thesis presents SLIMIUM, which uses a feature-code map to debloat the web browser Chromium at feature-level. In SLIMIUM, the feature-code map is initially created from manual analysis and then it is expanded using static program analysis. However, relying on manual efforts to identify features and relevant code is time-consuming and difficult to be applied to other software. Finally, the thesis presents DEPART, which provides a general approach to debloat large-scale and complex software written with object-oriented programming (OOP) languages without any manual efforts. DEPART performs pure static analysis to automatically partitions a program into distinct groups implementing different features, which is later used for debloating. The key idea of DEPART is to relate the software’s code and types (i.e., defined objects sharing unique behaviors in OOP) by analyzing the code’s various operations. Based on the relations, we propose several rules to describe the conditions that should be satisfied for including types and code into a same group.[TOSEM] Guided Feature Identification and Removal for Resource-constrained Firmware Link
Background. IoT firmware oftentimes incorporates third-party components, such as network-oriented middleware and media encoders/decoders. These components consist of large and mature codebases, shipping with a variety of non-critical features. Feature bloat increases code size, complicates auditing/debugging, and reduces stability. This is problematic for IoT devices, which are severely resource-constrained and must remain operational in the field for years. Unfortunately, identification and complete removal of code related to unwanted features requires familiarity with codebases of interest, cumbersome manual effort, and may introduce bugs. We address these difficulties by introducing PRAT, a system that takes as input the codebase of software of interest, identifies and maps features to code, presents this information to a human analyst, and removes all code belonging to unwanted features. PRAT solves the challenge of identifying feature-related code through a novel form of differential dynamic analysis and visualizes results as user-friendly feature graphs. Evaluation on diverse codebases shows superior code removal compared to both manual feature deactivation and state-of-art debloating tools, and generality across programming languages. Furthermore, a user study comparing PRAT to manual code analysis shows that it can significantly simplify the feature identification workflow.[PLDI] Logical bytecode reduction Link
Reducing a failure-inducing input to a smaller one is challenging for input with internal dependencies because most sub-inputs are invalid. Kalhauge and Palsberg made progress on this problem by mapping the task to a reduction problem for dependency graphs that avoids invalid inputs entirely. Their tool J-Reduce efficiently reduces Java bytecode to 24 percent of its original size, which made it the most effective tool until now. However, the output from their tool is often too large to be helpful in a bug report. In this paper, we show that more fine-grained modeling of dependencies leads to much more reduction. Specifically, we use propositional logic for specifying dependencies and we show how this works for Java bytecode. Once we have a propositional formula that specifies all valid sub-inputs, we run an algorithm that finds a small, valid, failure-inducing input. Our algorithm interleaves runs of the buggy program and calls to a procedure that finds a minimal satisfying assignment. Our experiments show that we can reduce Java bytecode to 4.6 percent of its original size, which is 5.3 times better than the 24.3 percent achieved by J-Reduce. The much smaller output is more suitable for bug reports.[USENIX] LIGHTBLUE: Automatic Profile-Aware Debloating of Bluetooth Stacks Link
The Bluetooth standard is ubiquitously supported by computers, smartphones, and IoT devices. Due to its complexity, implementations require large codebases, which are prone to security vulnerabilities, such as the recently discovered BlueBorne and BadBluetooth attacks. While defined by the standard, most of the Bluetooth functionality, as defined by different Bluetooth profiles, is not required in the common usage scenarios. Starting from this observation, we implement LIGHTBLUE, a framework performing automatic, profile-aware debloating of Bluetooth stacks, allowing users to automatically minimize their Bluetooth attack surface by removing unneeded Bluetooth features. L IGHT B LUE starts with a target Bluetooth application, detects the associated Bluetooth profiles, and applies a combination of control-flow and data-flow analysis to remove unused code within a Bluetooth host code. Furthermore, to debloat the Bluetooth firmware, LIGHTBLUE extracts the used Host Controller Interface (HCI) commands and patches the HCI dispatcher in the Bluetooth firmware automatically, so that the Bluetooth firmware avoids processing unneeded HCI commands. We evaluate LIGHTBLUE on four different Bluetooth hosts and three different Bluetooth controllers. Our evaluation shows that LIGHTB LUE achieves between 32% and 50% code reduction in the Bluetooth host code and between 57% and 83% HCI command reduction in the Bluetooth firmware. This code reduction leads to the prevention of attacks[ArXiv] Muzeel: A Dynamic JavaScript Analyzer for Dead Code Elimination in Today’s Web Link
JavaScript contributes to the increasing complexity of today’s web. To support user interactivity and accelerate the development cycle, web developers heavily rely on large general-purpose third-party JavaScript libraries. This practice increases the size and the processing complexity of a web page by bringing additional functions that are not used by the page but unnecessarily downloaded and processed by the browser. In this paper, an analysis of around 40,000 web pages shows that 70% of JavaScript functions on the median page are unused, and the elimination of these functions would contribute to the reduction of the page size by 60%. Motivated by these findings, we propose Muzeel (which means eliminator in Arabic); a solution for eliminating JavaScript functions that are not used in a given web page (commonly referred to as dead code). Muzeel extracts all of the page event listeners upon page load, and emulates user interactions using a bot that triggers each of these events, in order to eliminate the dead code of functions that are not called by any of these events. Our evaluation results spanning several Android mobile phones and browsers show that Muzeel speeds up the page load by around 30% on low-end phones, and by 25% on high-end phones under 3G network. It also reduces the speed index (which is an important user experience metric) by 23% and 21% under the same network on low-end, and high-end phones, respectively. Additionally, Muzeel reduces the overall download size while maintaining the visual content and interactive functionality of the pages.[EMSE] An Exploratory Study on Dead Methods in Open-source Java Desktop Applications Link
Background. Dead code is a code smell. It can refer to code blocks, fields, methods, etc. that are unused and/or unreachable. Empirical evidence shows that dead code harms source code comprehensibility and maintainability in software applications. Researchers have gathered little empirical evidence on the spread of dead code in software applications. Moreover, we know little about the role of this code smell during software evolution. Aims. Our goal is to gather preliminary empirical evidence on the spread and evolution of dead methods in open-source Java desktop applications. Given the exploratory nature of our investigation, we believe that its results can justify more resource and time-demanding research on dead methods. Method. We quantitatively analyzed the commit histories of 13 open-source Java desktop applications, whose software projects were hosted on GitHub, for a total of 1,044 commits. We focused on dead methods detected at a commit level to investigate the spread and evolution of dead methods in the studied applications. The perspective of our explorative study is that of both practitioners and researchers. Results. The most important take-away results can be summarized as follows: (i) dead methods seems to affect open-source Java desktop applications; (ii) dead methods generally survive for a long time, in terms of commits, before being “buried” or “revived;” (iii) dead methods are rarely revived; and (iv) most dead methods are dead since the creation of the corresponding methods. Conclusions. We conclude that developers should carefully handle dead methods in open-source Java desktop applications since this code smell is harmful, widespread, rarely revived, and survives for a long time in software applications. Our results also justify future research on dead methods.[ICSME] The Used, the Bloated, and the Vulnerable: Reducing the Attack Surface of an Industrial Application Link
Software reuse may result in software bloat when significant portions of application dependencies are effectively unused. Several tools exist to remove unused (byte)code from an application or its dependencies, thus producing smaller artifacts and, potentially, reducing the overall attack surface. In this paper we evaluate the ability of three debloating tools to distinguish which dependency classes are necessary for an application to function correctly from those that could be safely removed. To do so, we conduct a case study on a real-world commercial Java application. Our study shows that the tools we used were able to correctly identify a considerable amount of redundant code, which could be removed without altering the results of the existing application tests. One of the redundant classes turned out to be (formerly) vulnerable, confirming that this technique has the potential to be applied for hardening purposes. However, by manually reviewing the results of our experiments, we observed that none of the tools can handle a widely used default mechanism for dynamic class loading.[ArXiv] Stubbifier: Debloating Dynamic Server-Side JavaScript Applications Link
JavaScript is an increasingly popular language for server-side development, thanks in part to the Node.js runtime environment and its vast ecosystem of modules. With the Node.js package manager npm, users are able to easily include external modules as dependencies in their projects. However, npm installs modules with all of their functionality, even if only a fraction is needed, which causes an undue increase in code size. Eliminating this unused functionality from distributions is desirable, but the sound analysis required to find unused code is difficult due toJavaScript’s extreme dynamicity. We present a fully automatic technique that identifies unused code by constructing static or dynamic call graphs from the application’s tests, and replacing code deemed unreachable with either file- or function-level stubs. If a stub is called, it will fetch and execute the original code on-demand, thus relaxing the requirement that the call graph be sound. The technique also provides an optional guarded execution mode to guard application against injection vulnerabilities in untested code that resulted from stub expansion. This technique is implemented in an open source tool called Stubbifier, which supports the ECMAScript 2019 standard. In an empirical evaluation on 15 Node.js applications and 75 clients of these applications, Stubbifier reduced application size by 56% on average while incurring only minor performance overhead. The evaluation also shows that Stubbifier’s guarded execution mode is capable of preventing several known injection vulnerabilities that are manifested in stubbed-out code. Finally, Stubbifier can work alongside bundlers, popular JavaScript tools for bundling an application with its dependencies. For the considered subject applications, we measured an average size reduction of 37% in bundled distributions.[arXiv] Lightweight, Multi-Stage, Compiler-Assisted Application Specialization Link
Program debloating aims to enhance the performance and reduce the attack surface of bloated applications. Several techniques have been recently proposed to specialize programs. These approaches are either based on unsound strategies or demanding techniques, leading to unsafe results or a high overhead debloating process. In this paper, we address these limitations by applying partial-evaluation principles to generate specialized applications. Our approach relies on a simple observation that an application typically consists of configuration logic, followed by the main logic of the program. The configuration logic specifies what functionality in the main logic should be executed. LMCAS performs partial interpretation to capture a precise program state of the configuration logic based on the supplied inputs. LMCAS then applies partial-evaluation optimizations to generate a specialized program by propagating the constants in the captured partial state, eliminating unwanted code, and preserving the desired functionalities. Our evaluation of LMCAS on commonly used benchmarks and real-world applications shows that it successfully removes unwanted features while preserving the functionality and robustness of the deblated programs, runs faster than prior tools, and reduces the attack surface of specialized programs. LMCAS runs 1500x, 4.6x, and 1.2x faster than the state-of-the-art debloating tools CHISEL, RAZOR, and OCCAM, respectively; achieves 25% reduction in the binary size; reduces the attack surface of code-reuse attacks by removing 51.7% of the total gadgets and eliminating 83% of known CVE vulnerabilities[arXiv] The Used, the Bloated, and the Vulnerable: Reducing the Attack Surface of an Industrial Application Link
Software reuse may result in software bloat when significant portions of application dependencies are effectively unused. Several tools exist to remove unused (byte)code from an application or its dependencies, thus producing smaller artifacts and, potentially, reducing the overall attack surface. In this paper we evaluate the ability of three debloating tools to distinguish which dependency classes are necessary for an application to function correctly from those that could be safely removed. To do so, we conduct a case study on a real-world commercial Java application. Our study shows that the tools we used were able to correctly identify a considerable amount of redundant code, which could be removed without altering the results of the existing application tests. One of the redundant classes turned out to be (formerly) vulnerable, confirming that this technique has the potential to be applied for hardening purposes. However, by manually reviewing the results of our experiments, we observed that none of the tools can handle a widely used default mechanism for dynamic class loading.[TSE] Evolving JavaScript Code to Reduce Load Time Link
JavaScript is one of the most used programming languages for front-end development of Web applications. The increase in complexity of front-end features brings concerns about performance, especially the load and execution time of JavaScript code. In this paper, we propose an evolutionary program improvement technique to reduce the size of JavaScript programs and, therefore, the time required to load and execute them in Web applications. To guide the development of this technique, we performed an experimental study to characterize the patches applied to JavaScript programs to reduce their size while keeping the functionality required to pass all test cases in their test suites. We applied this technique to 19 JavaScript programs varying from 92 to 15,602 LOC and observed reductions from 0.2 to 73.8 percent of the original code, as well as a relationship between the quality of a program’s test suite and the ability to reduce the size of its source code.[ENASE] BloatLibD: Detecting Bloat Libraries in Java Applications Link
Third-party libraries (TPLs) provide ready-made implementations of various software functionalities and are frequently used in software development. However, as software development progresses through various iteractions, there often remains an unused set of TPLs referenced in the application’s distributable. These unused TPLs become a prominent source of software bloating and are responsible for excessive consumption of resources, such as CPU cycles, memory, and mobile devices’ battery-usage. Thus, the identification of such bloat-TPLs is essential. We present a rapid, storage-efficient, obfuscation-resilient method to detect the bloat TPLs. Our approach’s novel aspects are i) Computing a vector representation of a .class file using a model that we call Jar2Vec. The Jar2Vec model is trained using the Paragraph Vector Algorithm. ii) Before using it for training the Jar2Vec models, a .class file is converted to a normalized form via semantics-preserving transformations. iii) A Bloated Library Detector (BloatLibD) developed and tested with 27 different Jar2Vec models. These models were trained using different parameters and >30000 .class files taken from >100 different Java libraries available at MavenCentral.com. BloatLibD achieves an accuracy of 99% with an F1 score of 0.968 and outperforms the existing tools, viz., LibScout, LiteRadar, and LibD with an accuracy improvement of 74.5%, 30.33%, and 14.1%, respectively. Compared with LibD, BloatLibD achieves a response time improvement of 61.37% and a storage reduction of 87.93%. Our program artifacts are available at https://bit.ly/2WFALXf.[ESEC/FSE] A Longitudinal Analysis of Bloated Java Dependencies Link
Motivated by the negative impact of software bloat on security, performance, and maintenance, several works have proposed techniques to remove bloat. However, no work has analyzed how bloat evolves over time or how it emerges in software projects. In particular, a concern when removing bloated code is to know if it might be useful in subsequent versions of the application. In this work, we study the evolution and emergence of bloated Java dependencies. These are third-party libraries that are packaged in the application binary but are not needed to run the application. We analyze the history of 435 Java projects. This historical data includes 48,469 distinct dependencies, which we study across a total of 31,515 versions of Maven dependency trees. We empirically demonstrate the constant increase of the amount of bloated dependencies over time. A key finding of our analysis is that 89.2% of the direct dependencies that are bloated remain bloated in all subsequent versions of the studied projects. This empirical evidence suggests that developers can safely remove a bloated dependency. We further report novel insights regarding the unnecessary maintenance efforts induced by bloat, we identify that 22% of dependency updates are made on bloated dependencies.[EMSE] A comprehensive study of bloated dependencies in the Maven ecosystem Link
Build automation tools and package managers have a profound influence on software development. They facilitate the reuse of third-party libraries, support a clear separation between the application’s code and its external dependencies, and automate several software development tasks. However, the wide adoption of these tools introduces new challenges related to dependency management. In this paper, we propose an original study of one such challenge: the emergence of bloated dependencies. Bloated dependencies are libraries that are packaged with the application’s compiled code but that are actually not necessary to build and run the application. They artificially grow the size of the built binary and increase maintenance effort. We propose DepClean, a tool to determine the presence of bloated dependencies in Maven artifacts. We analyze 9,639 Java artifacts hosted on Maven Central, which include a total of 723,444 dependency relationships. Our key result is as follows: 2.7% of the dependencies directly declared are bloated, 15.4% of the inherited dependencies are bloated, and 57% of the transitive dependencies of the studied artifacts are bloated. In other words, it is feasible to reduce the number of dependencies of Maven artifacts to 1/4 of its current count. Our qualitative assessment with 30 notable open-source projects indicates that developers pay attention to their dependencies when they are notified of the problem. They are willing to remove bloated dependencies: 21/26 answered pull requests were accepted and merged by developers, removing 140 dependencies in total: 75 direct and 65 transitive.[TSE] TRIMMER: An Automated System for Configuration-based Software Debloating Link
Software bloat has negative implications for security, reliability, and performance. To counter bloat, we propose TRIMMER, a static analysis-based system for pruning unused functionality. TRIMMER removes code that is unused with respect to user-provided command-line arguments and application-specific configuration files (no low-level source annotations required). TRIMMER uses concrete memory tracking and a custom inter-procedural constant propagation analysis that facilitates dead code elimination. Our system supports both context-sensitive and context-insensitive constant propagation. We show that context-sensitive constant propagation is important for effective software pruning in most applications. We introduce sparse constant propagation that performs constant propagation only for configuration-hosting variables and show that it performs better (higher code size reductions) compared to constant propagation for all program variables. Overall, our results show that TRIMMER reduces binary sizes for real-world programs with reasonable analysis times. Across 20 evaluated programs, we observe a mean binary size reduction of 22.7% and a maximum reduction of 62.7%. For 5 programs, we observe performance speedups ranging from 5% to 53%. Moreover, we show that winnowing software applications can reduce the program attack surface by removing code that contains exploitable vulnerabilities. We find that debloating using TRIMMER removes CVEs in 4 applications.[CODASPY] Code Specialization through Dynamic Feature Observation Link
Modern software (both programs and libraries) provides large amounts of functionality, vastly exceeding what is needed fora single given task. This additional functionality results in an increased attack surface: first, an attacker can use bugs in the unnecessary functionality to compromise the software,and second, defenses such as control-flow integrity (CFI) relyon conservative analyses that gradually lose precision with growing code size.Removing unnecessary functionality is challenging as the debloating mechanism must remove as much code as possi-ble, while keeping code required for the program to function.Unfortunately, most software does not come with a formal description of the functionality that it provides, or even a mapping between functionality and code. We therefore require a mechanism that—given a set of representable inputs and con-figuration arameters—automatically infers the underlying functionality, and discovers all reachable code corresponding to this functionality.We propose Ancile, a code specialization technique that leverages fuzzing (based on user provided seeds) to discover the code necessary to perform the functionality required bythe user. From this, we remove all unnecessary code and tailor indirect control-flow transfers to the minimum necessary foreach location, vastly reducing the attack surface. We evaluate Ancile using real-world software known to have a large attack surface, including image libraries and network daemons like ng-inx. For example, our evaluation shows that Ancile can remove up to 93.66% of indirect call transfer targets and up to 78%of functions in libtiff’stiffcrop utility, while still maintaining its original functionality.
2020
[ICSE-SEIP] DECAF: Automatic, Adaptive De-bloating and Hardening of COTS Firmware Link
Once compromised, server firmware can surreptitiously and permanently take over a machine and any stack running thereon, with no hope for recovery, short of hardware-level intervention. To make things worse, modern firmware contains millions of lines of unnecessary code and hundreds of unnecessary modules as a result of a long firmware supply chain designed to optimize time-to-market and cost, but not security. As a result, off-the-shelf motherboards contain large, unnecessarily complex, closed-source vulnerability surfaces that can completely and irreversibly compromise systems. In this work, we address this problem by dramatically and automatically reducing the vulnerability surface. DECAF is an extensible platform for automatically pruning a wide class of commercial UEFI firmware. DECAF intelligently runs dynamic iterative surgery on UEFI firmware to remove a maximal amount of code with no regressive effects on the functionality and performance of higher layers in the stack (OS, applications). DECAF has successfully pruned over 70% of unnecessary, redundant, reachable firmware in leading server-grade motherboards with no effect on the upper layers, and increased resulting system performance and boot times.[ICSE-SEIP] Piranha: Reducing Feature Flag Debt at Uber Link
Feature flags are commonly used in mobile app development and can introduce technical debt related to deleting their usage from the codebase. This can adversely affect the overall reliability of the apps and increase their maintenance complexity. Reducing this debt without imposing additional overheads on the developers necessitates the design of novel tools and automated workflows. In this paper, we describe the design and implementation of Piranha, an automated code refactoring tool which is used to automatically generate differential revisions (a.k.a diffs) to delete code corresponding to stale feature flags. Piranha takes as input the name of the flag, expected treatment behavior, and the name of the flag’s author. It analyzes the ASTs of the program to generate appropriate refactorings which are packaged into a diff. The diff is assigned to the author of the flag for further processing, who can land it after performing any additional refactorings. We have implemented Piranha to delete code in Objective-C, Java, and Swift programs, and deployed it to handle stale flags in multiple Uber apps. We present our experiences with the deployment of Piranha from Dec 2017 to May 2019, including the following highlights: (a) generated code cleanup diffs for 1381 flags (17% of total flags), (b) 65% of the diffs landed without any changes, (c) over 85% of the generated diffs compile and pass tests successfully, (d) around 80% of the diffs affect more than one file, (e) developers process more than 88% of the generated diffs, (f) 75% of the generated diffs are processed within a week, and (g) Piranha diffs have been interacted with by ~200 developers across Uber. Piranha is available as open source at https://github.com/uber/piranha.[FSE] JShrink: In-Depth Investigation into Debloating Modern JavaApplications Link
Modern software is bloated. Demand for new functionality has led developers to include more and more features, many of which become unneeded or unused as software evolves. This phenomenon, known as software bloat, results in software consuming more resources than it otherwise needs to. How to effectively and automatically debloat software is a long-standing problem in software engineering. Various debloating techniques have been proposed since the late 1990s. However, many of these techniques are built upon pure static analysis and have yet to be extended and evaluated in the context of modern Java applications where dynamic language features are prevalent. To this end, we develop an end-to-end bytecode debloating framework called JShrink. It augments traditional static reachability analysis with dynamic profiling and type dependency analysis and renovates existing bytecode transformations to account for new language features in modern Java. We highlight several nuanced technical challenges that must be handled properly and examine behavior preservation of debloated software via regression testing. We find that (1) JShrink is able to debloat our real-world Java benchmark suite by up to 47% (14% on average); (2) accounting for dynamic language features is indeed crucial to ensure behavior preservation—reducing 98% of test failures incurred by a purely static equivalent, Jax, and 84% for ProGuard; and (3) compared with purely dynamic approaches, integrating static analysis with dynamic profiling makes the debloated software more robust to unseen test executions—in 22 out of 26 projects, the debloated software ran successfully under new tests.[APSEC] Don’t Trust Me, Test Me: 100% Code Coverage for a 3rd-party Android App Link
The incompleteness of 3rd-party app testing is an accepted fact in Software Engineering. This issue makes it impossible to verify the app functionality and to confirm its safety to the end-user. To solve this problem, enterprises developed strict policies. A company, willing to use modern apps, may perform an expensive security analysis, rely on trust or forbid the app. These strategies may lead companies to high direct and indirect spending with no guarantee of safety. In this work, we present a novel approach, called Dynamic Binary Shrinking, that allows a user to review app functionality and leave only tested code. The shrunk app produces 100% instruction coverage on observed behaviors and in this way guarantees the absence of unexplored, and therefore, potentially malicious code. On our running examples, we demonstrate that apps use less than 20% of the codebase. We developed an approach and the ACVCut tool to shrink Android apps towards the executed code.[Digital Threats: Research and Practice] Large-scale Debloating of Binary Shared Libraries Link
Developers nowadays have access to an arsenal of toolkits and libraries for rapid application prototyping. However,when an application loads a library, the entirety of that library’s code is mapped into the process address space,even if only a single function is actually needed. The unused portion isbloatthat can negatively impact softwaredefenses by unnecessarily inflating their overhead or increasing the attack surface. In this paper, we investigatewhether debloating is possible and practical at the binary level. To this end, we presentNibbler: a system thatidentifies and erases unused functions within dynamic shared libraries. Nibbler works in tandem with defenseslike continuous code re-randomization and control-flow integrity, enhancing them without incurring additionalrun-time overhead. We developed and tested a prototype of Nibbler on x86-64 Linux; Nibbler reduces the size ofshared libraries and the number of available functions, for real-world binaries and the SPEC CINT2006 suite,by up to 56% and 82%, respectively. We also demonstrate that Nibbler benefits defenses by showing that: (i) itimproves the deployability of a continuous re-randomization system for binaries, namely Shuffler, by increasingits efficiency by 20%, and (ii) it improves certain fast, but coarse and context-insensitive control-flow integrityschemes by reducing the number of gadgets reachable through indirect branch instructions by 75% and 49%, onaverage. Lastly, we apply Nibbler on≈30K C/C++ binaries and≈5K unique dynamic shared libraries (i.e., almostthe complete set of the Debiansiddistribution), as well as on9official Docker images (with millions of downloadsin Docker Hub), reporting entrancing findings regarding code bloat at large.[CCS] Slimium: Debloating the Chromium Browser with Feature Subsetting Link
Today, a web browser plays a crucial role in offering a broad spectrum of web experiences. The most popular browser, Chromium, has become an extremely complex application to meet ever-increasing user demands, exposing unavoidably large attack vectors due to its large code base. Code debloating attracts attention as a means of reducing such a potential attack surface by eliminating unused code. However, it is very challenging to perform sophisticated code removal without breaking needed functionalities because Chromium operates on a large number of closely connected and complex components, such as a renderer and JavaScript engine. In this paper, we present Slimium, a debloating framework for a browser (i.e., Chromium) that harnesses a hybrid approach for a fast and reliable binary instrumentation. The main idea behind Slimium is to determine a set of features as a debloating unit on top of a hybrid (i.e., static, dynamic, heuristic) analysis, and then leverage feature subsetting to code debloating. It aids in i) focusing on security-oriented features, ii) discarding unneeded code simply without complications, and iii)~reasonably addressing a non-deterministic path problem raised from code complexity. To this end, we generate a feature-code map with a relation vector technique and prompt webpage profiling results. Our experimental results demonstrate the practicality and feasibility of Slimium for 40 popular websites, as on average it removes 94 CVEs (61.4%) by cutting down 23.85 MB code (53.1%) from defined features (21.7% of the whole) in Chromium.[SecureComm] Hecate: Automated Customizationof Program and Communication Featuresto Reduce Attack Surfaces Link
Customizing program and communication features is a commonly adopted strategy to counter security threats that arise from rapid inflation of software features. In this paper, we propose Hecate, a novel framework that leverages dynamic execution and trace to create customized, self-contained programs, in order to minimize potential attack surface. It automatically identifies program features (i.e., independent, well-contained operations, utilities, or capabilities) relating to application binaries and their communication functions, tailors and eliminates the features to create customized program binaries in accordance with user needs, in a fully unsupervised fashion. Hecate makes novel use of deep learning to identify program features and their constituent functions by mapping dynamic instruction trace to functions in the binaries. It enables us to modularize program features and efficiently create customized program binaries at large scale. We implement a prototype of Hecate using a number of open source tools such as DynInst and TensorFlow. Evaluation using real-world executables including OpenSSL and LibreOffice demonstrates that Hecate can create a wide range of customized binaries for diverse feature requirements, with the highest accuracy up to 96.28% for feature/function identification and up to 67% reduction of program attack surface.[PLDI] BlankIt Library Debloating: Getting What You Want Instead of Cutting What You Don’t Link
Modern software systems make extensive use of libraries derived from C and C++. Because of the lack of memory safety in these languages, however, the libraries may suffer from vulnerabilities, which can expose the applications to potential attacks. For example, a very large number of return-oriented programming gadgets exist in glibc that allow stitching together semantically valid but malicious Turing-complete and -incomplete programs.While CVEs get discovered and often patched and remedied, such gadgets serve as building blocks of future undiscovered attacks, opening an ever-growing set of possibilities for generating malicious programs. Thus, significant reduction in the quantity and expressiveness (utility) of such gadgets for libraries is an important problem.In this work, we propose a new approach for handling an application’s library functions that focuses on the principle of getting only what you want. This is a significant departure from the current approaches that focus oncutting what is unwanted. Our approach focuses on activating/deactivating library functions on demand in order to reduce the dynamically linked code surface, so that the possibilities of constructing malicious programs diminishes substantially. The key idea is to load only the set of library functions that will be used at each library call site within the application at runtime. This approach of demand-driven loading relies on an input-aware oracle that predicts a near-exact set of library functions needed at a given call site during the execution. The predicted functions are loaded just in time and unloaded on return.We present a decision-tree based predictor, which acts as an oracle, and an optimized runtime system, which works directly with library binaries like GNU libc and libstdc++. We show that on average, the proposed scheme cuts the exposed code surface of libraries by 97.2%, reduces ROP gadgets present in linked libraries by 97.9%, achieves a prediction accuracy in most cases of at least 97%, and adds a runtime overhead of 18% on all libraries (16% for glibc, 2% for others) across all benchmarks of SPEC 2006. Further, we demonstrate BlankIt on two real-world applications, sshd and nginx, with a high amount of debloating and low overheads.[NIER] Program Debloating via Stochastic Optimization Link
Programs tend to provide a broad range of features, and different typologies of users tend to use only a subset of these features. For this reason, and because unnecessary functionality can be harmful in terms of both performance and security, recently we have witnessed an increasing interest in debloating techniques—techniques for reducing the size of a program by eliminating (possibly) unneeded features. Most existing debloating techniques tend to focus on program-size reduction alone, by producing a reduced program that behaves correctly for a provided set of inputs. Although effective with respect to their stated goal, these approaches ignore other important aspects of debloating and ultimately solve a simplified formulation of the problem. We believe that program debloating is a multifaceted issue, in which different, possibly conflicting goals must be considered and suitably accounted for. In this spirit, we propose a general approach that allows for formulating program debloating as a multi-objective optimization problem. Given a program to be debloated, our approach lets users specify (1) a usage profile for the program (i.e., a set of inputs with associated usage probabilities), (2) the factors of interest for the debloating task at hand, and (3) the relative importance of these factors. Based on this information, the approach defines a suitable objective function, so as to be able to associate a score to every possible reduced program, and tries to generate an optimal solution, that is, one that maximizes the objective function. To provide concrete evidence of the usefulness of our approach, we also present and evaluate Debop, a specific instance of the approach that considers three objectives: size reduction, attack surface reduction, and generality (i.e., extent to which the reduced program behaves correctly for the inputs in p’s usage profile). Our results, albeit still preliminary, are promising, in that they show that our approach can be effective in generating debloated programs that achieve good trade-offs between the different factors involved in the debloating process. Our results also provide insights on the performance of our general approach when compared to a specialized single-goal technique.[TOSEM] Is Static Analysis Able to Identify Unnecessary Source Code? Link
Grown software systems often contain code that is not necessary anymore. Such unnecessary code wastesresources during development and maintenance, for example, when preparing code for migration or certifi-cation. Running a profiler may reveal code that is not used in production, but it is often time-consuming toobtain representative data in this way. We investigate to what extent a static analysis approach, which is based on code stability and code centrality,is able to identify unnecessary code and whether its recommendations are relevant in practice. To study thefeasibility and usefulness of our approach, we conducted a study involving 14 open-source and closed-source software systems. As there is no perfect oracle for unnecessary code, we compared recommendations forunnecessary code with historical cleanups, runtime usage data, and feedback from 25 developers of fivesoftware projects. Our study shows that recommendations generated from stability and centrality informationpoint to unnecessary code that cannot be identified by dead code detectors. Developers confirmed that 34%of recommendations were indeed unnecessary and deleted 20% of the recommendations shortly after ourinterviews. Overall, our results suggest that static analysis can provide quick feedback on unnecessary codeand is useful in practice.[TSE] MoMIT: Porting a JavaScript Interpreter on a Quarter Coin Link
The Internet of Things (IoT) is a network of physical, connected devices providing services through private networks and the Internet. The devices connect through the Internet to Web servers and other devices. One of the popular programming languages for communicating Web pages and Web apps is JavaScript (JS). Hence, the devices would benefit from JS apps. However, porting JS apps to the many IoT devices, e.g., System-on-a-Chip (SoCs) devices (e.g., Arduino Uno), is challenging because of their limited memory, storage, and CPU capabilities. Also, some devices may lack hardware/software capabilities for running JS apps “as is”. Thus, we propose MoMIT, a multiobjective optimization approach to miniaturize JS apps to run on IoT devices. We implement MoMIT using three different search algorithms. We miniaturize a JS interpreter and measure the characteristics of 23 apps before/after applying MoMIT. We find reductions of code size, memory usage, and CPU time of 31%, 56%, and 36%, respectively (medians). We show that MoMIT allows apps to run on up to two additional devices in comparison to the original JS interpreter.[USENIX] Mininode: Reducing the Attack Surface of Node.js Applications Link
JavaScript has gained traction as a programming language that qualifies for both the client-side and the server-side logic of applications. A new ecosystem of server-side code written in JavaScript has been enabled by Node.js, the use of the V8 JavaScript engine and a collection of modules that provide various core functionality. Node.js comes with its package manager, called NPM, to handle the dependencies of modern applications, which allow developers to build Node.js applications with hundreds of dependencies on other modules.In this paper, we present Mininode, a static analysis tool for Node.js applications that measures and removes unused code and dependencies. Our tool can be integrated into the building pipeline of Node.js applications to produce applications with significantly reduced attack surface. We analyzed 672k Node.js applications and reported the current state of code bloating in the server-side JavaScript ecosystem. We leverage a vulnerability database to identify 1,660 vulnerable packages that are loaded from 119,433 applications as dependencies. Mininode is capable of removing 2,861 of these vulnerable dependencies. The complex expressiveness and the dynamic nature of the JavaScript language does not always allow us to statically resolve the dependencies and usage of modules. To evaluate the correctness of our reduction, we run Mininode against 37k Node.js applications that have unit tests and reduce correctly 95.4% of packages. Mininode was able to restrict access to the built-in fs and net modules in 79.4%and 96.2% of the reduced applications respectively.
2019
[Master Thesis] Static debloating of R applications: a case study Link
In the embedded domain, industrial sectors (i.e., automotive industry, avionics) are undergoing radical changes. They broadly adopt commodity hardware and move away from special-purpose control units. During this transition, heterogeneous software components are consolidated to run on commodity operating systems. To efficiently consolidate such components, a modular encapsulation of common functionality into reusable binary files (i.e., shared libraries) is essential. However, shared libraries are often unnecessarily large as they entail a lot of generic functionality that is not required in a narrowly defined scenario. As the source code of proprietary components is often unavailable and the industry is heading towards binary-only distribution, we propose an approach towards lightweight binary tailoring. As demonstrated in the evaluation, lightweight binary tailoring effectively reduces the amount of code in all shared libraries on a Linux-based system by 63 percent and shrinks their files by 17 percent. The reduction in size is beneficial to cut down costs (e.g., lower storage and memory footprint) and eases code analyses that are necessary for code audits.[TECS] Honey, I Shrunk the ELFs: Lightweight Binary Tailoring of Shared Libraries Link
In the embedded domain, industrial sectors (i.e., automotive industry, avionics) are undergoing radical changes. They broadly adopt commodity hardware and move away from special-purpose control units. During this transition, heterogeneous software components are consolidated to run on commodity operating systems. To efficiently consolidate such components, a modular encapsulation of common functionality into reusable binary files (i.e., shared libraries) is essential. However, shared libraries are often unnecessarily large as they entail a lot of generic functionality that is not required in a narrowly defined scenario. As the source code of proprietary components is often unavailable and the industry is heading towards binary-only distribution, we propose an approach towards lightweight binary tailoring. As demonstrated in the evaluation, lightweight binary tailoring effectively reduces the amount of code in all shared libraries on a Linux-based system by 63 percent and shrinks their files by 17 percent. The reduction in size is beneficial to cut down costs (e.g., lower storage and memory footprint) and eases code analyses that are necessary for code audits.[MOBILESoft] Identifying Features of Android Apps from Execution Traces Link
Understanding a program and the features it provides is essential for a number of software engineering tasks, including refactoring, debugging, and debloating. Unfortunately, program understanding and feature identification are also extremely challenging and time consuming activities. To support developers when they perform these activities, we propose FeatureFinder, an approach that aims to identify and understand the features of a program by analyzing its executions. Specifically, we defined our approach for Android apps, given their widespread use. Given an app, FeatureFinder generates traces that capture different properties of the app executions through instrumentation. It then leverages the user events in the trace to split the trace into segments, and clusters these segments based on their characteristics, using a classifier. Each identified cluster indicates a feature exercised in the execution. Finally, FeatureFinder suitably labels each identified cluster, so as to provide a human-readable description of the corresponding feature. We performed a case study in which we used FeatureFinder to identify features in two executions of the K-9 MAIL app. In the study, FeatureFinder was able to correctly identify 6 of the 11 manually identified features, which we believe is an encouraging result and motivates further research.[ACSAC] Nibbler: Debloating Binary Shared Libraries Link
Developers today have access to an arsenal of toolkits and libraries for rapid application prototyping. However, when an application loads a library, the entirety of that library’s code is mapped into the address space, even if only a single function is actually needed.The unused portion is bloat that can negatively impact software defenses by unnecessarily inflating their overhead or increasing their attack surface. Recent work has explored debloating as a way of alleviating the above problems, when source code is available.In this paper, we investigate whether debloating is possible and practical at the binary level. To this end, we presentNibbler: a system that identifies and erases unused functions within shared libraries. Nibbler works in tandem with defenses like continuous code re-randomization and control-flow integrity, enhancing them without incurring additional run-time overhead. We developed and tested a prototype of Nibbler on x86-64 Linux; Nibbler reduces the size of shared libraries and the number of available functions, for real-world binaries and the SPEC CINT2006 suite, by up to 56%and 82%, respectively. We also demonstrate that Nibbler benefits defenses by showing that: (i) it improves the deployability of a continuous re-randomization system for binaries, namely Shuffler,by increasing its efficiency by 20%, and (ii) it improves certain fast,but coarse and context-insensitive control-flow integrity schemes by reducing the number of gadgets reachable through returns and indirect calls by 75% and 49% on average.[USENIX] Less is More: Quantifying the Security Benefits of Debloating Web Applications Link
As software becomes increasingly complex, its attack surface expands enabling the exploitation of a wide range of vulnerabilities. Web applications are no exception since modern HTML5 standards and the ever-increasing capabilities of JavaScript are utilized to build rich web applications, often subsuming the need for traditional desktop applications. One possible way of handling this increased complexity is through the process of software debloating, i.e., the removal not only of dead code but also of code corresponding to features that a specific set of users do not require. Even though debloating has been successfully applied on operating systems, libraries, and compiled programs, its applicability on web applications has not yet been investigated. In this paper, we present the first analysis of the security benefits of debloating web applications. We focus on four popular PHP applications and we dynamically exercise them to obtain information about the server-side code that executes as a result of client-side requests. We evaluate two different debloating strategies (file-level debloating and function-level debloating) and we show that we can produce functional web applications that are 46% smaller than their original versions and exhibit half their original cyclomatic complexity. Moreover, our results show that the process of debloating removes code associated with tens of historical vulnerabilities and further shrinks a web application’s attack surface by removing unnecessary external packages and abusable PHP gadgets.[FEAST] Bloat Factors and Binary Specialization Link
Code bloating in software has been proven to be pervasive in recent research. However, each study provides a different approach to measure bloat. In this paper, we propose a system of metrics to effectively quantify bloat in binaries called bloat factors. Subsequently, we conducted an extensive study to calculate bloat factors for over 3000 Linux applications and 896 shared libraries. Using these metrics as pointers, we introduce a static approach to perform debloating for closed-source binaries by creating corresponding specialized versions to cater for a specific program requirements. We evaluated our debloating technique on large programs and achieved a maximum code reduction of 19.7%.[DIMVA] BinTrimmer: Towards Static Binary Debloating Through Abstract Interpretation Link
The increasing complexity of modern programs motivates software engineers to often rely on the support of third-party libraries. Although this practice allows application developers to achieve a compelling time-to-market, it often makes the final product bloated with conspicuous chunks of unused code. Other than making a program unnecessarily large, this dormant code could be leveraged by willful attackers to harm users. As a consequence, several techniques have been recently proposed to perform program debloating and remove (or secure) dead code from applications. However, state-of-the-art approaches are either based on unsound strategies, thus producing unreliable results, or pose too strict assumptions on the program itself. In this work, we propose a novel abstract domain, called Signedness-Agnostic Strided Interval, which we use as the cornerstone to design a novel and sound static technique, based on abstract interpretation, to reliably perform program debloating. Throughout the paper, we detail the specifics of our approach and show its effectiveness and usefulness by implementing it in a tool, called BinTrimmer, to perform static program debloating on binaries. Our evaluation shows that BinTrimmer can remove up to 65.6% of a library’s code and that our domain is, on average, 98% more precise than the related work.[ICSE] Poster: Recommending Unnecessary Source Code Based on Static Analysis Link
Grown software systems often contain code that is not necessary anymore. Unnecessary code wastes resources during development and maintenance, for example, when preparing code for migration or certification. Running a profiler may reveal code that is not used in production, but it is often time-consuming to obtain representative data this way. We investigate to what extent a static analysis approach which is based on code stability and code centrality, is able to identify unnecessary code and whether its recommendations are relevant in practice. To study the feasibility and usefulness of our static approach, we conducted a study involving 14 open-source and closed-source software systems. As there is no perfect oracle for unnecessary code, we compared recommendations of our approach with historical cleanup actions, runtime usage data, and feedback from 25 developers of 5 software projects. Our study shows that recommendations generated from stability and centrality information point to unnecessary code. Our results suggest that static analysis can provide quick feedback on unnecessary code that is useful in practice.[WWW] Unnecessarily identifiable: Quantifying the fingerprintability of browser extensions due to bloat Link
In this paper, we investigate to what extent the page modifications that make browser extensions fingerprintable are necessary for their operation. We characterize page modifications that are completely unnecessary for the extension’s functionality as extension bloat. By analyzing 58,034 extensions from the Google Chrome store, we discovered that 5.7% of them were unnecessarily identifiable because of extension bloat. To protect users against unnecessary extension fingerprinting due to bloat, we describe the design and implementation of an in-browser mechanism that provides coarse-grained access control for extensions on all websites. The proposed mechanism and its built-in policies, does not only protect users from fingerprinting, but also offers additional protection against malicious extensions exfiltrating user data from sensitive websites.[arXiv] Binary Debloating for Security via Demand Driven Loading Link
Modern software systems heavily use C/C++ based libraries. Because of the weak memory model of C/C++, libraries may suffer from vulnerabilities which can expose the applications to potential attacks. For example, a very large number of return oriented programming gadgets exist in glibc that allow stitching together semantically valid but malicious Turing-complete programs. In spite of significant advances in attack detection and mitigation, full defense is unrealistic against an ever-growing set of possibilities for generating such malicious programs. In this work, we create a defense mechanism by debloating libraries to reduce the dynamic functions linked so that the possibilities of constructing malicious programs diminishes significantly. The key idea is to locate each library call site within an application, and in each case to load only the set of library functions that will be used at that call site. This approach of demand-driven loading relies on an input-aware oracle that predicts a near-exact set of library functions needed at a given call site during the execution. The predicted functions are loaded just in time, and the complete call chain (of function bodies) inside the library is purged after returning from the library call back into the application. We present a decision-tree based predictor, which acts as an oracle, and an optimized runtime system, which works directly with library binaries like GNU libc and libstdc++. We show that on average, the proposed scheme cuts the exposed code surface of libraries by 97.2%, reduces ROP gadgets present in linked libraries by 97.9%, achieves a prediction accuracy in most cases of at least 97%, and adds a small runtime overhead of 18% on all libraries (16% for glibc, 2% for others) across all benchmarks of SPEC 2006, suggesting this scheme is practical.[USENIX] Is Less Really More? Towards Better Metrics for Measuring Security Improvements Realized Through Software Debloating Link
Nearly all modern software suffers from bloat that negatively impacts its performance and security. To combat this problem, several automated techniques have been proposed to debloat software. A key metric used in these works to demonstrate improved security is code reuse gadget count reduction. The use of this metric is based on the prevailing idea that reducing the number of gadgets available in a software package reduces its attack surface and makes mounting a gadget-based code reuse exploit such as return-oriented programming (ROP) more difficult for an attacker. In this paper, we challenge this idea and show through a variety of realistic debloating scenarios the flaws inherent to the gadget count metric. Specifically, we demonstrate that software debloating can achieve high gadget count reduction rates, yet fail to limit an attacker’s ability to construct an exploit. Worse yet, in some scenarios high gadget count reduction rates conceal instances in which software debloating makes security worse by introducing new quality gadgets. To address these issues, we propose new metrics based on quality rather than quantity for assessing the security impact of software debloaitng. We show that these metrics can be efficiently calculated with our Gadget Set Analyzer tool. Finally, we demonstrate the the utility of these metrics through a realistic debloating case study.[EuroSec] Configuration-Driven Software Debloating Link
With legitimate code becoming an attack surface due to the proliferation of code reuse attacks, software debloating is an effective mitigation that reduces the amount of instruction sequences that may be useful for an attacker, in addition to eliminating potentially exploitable bugs in the removed code. Existing debloating approaches either statically remove code that is guaranteed to not run (e.g., non-imported functions from shared libraries), or rely on profiling with realistic workloads to pinpoint and keep only the subset of code that was executed. In this work, we explore an alternative configuration-driven software debloating approach that removes feature-specific code that is exclusively needed only when certain configuration directives are specified—which are often disabled by default. Using a semi-automated approach, our technique identifies libraries solely needed for the implementation of a particular functionality and maps them to certain configuration directives. Based on this mapping, feature-specific libraries are not loaded at all if their corresponding directives are disabled. The results of our experimental evaluation with Nginx, VSFTPD, and OpenSSH show that using the default configuration in each case, configuration-driven debloating can remove 77% of the code for Nginx, 53% for VSFTPD, and 20% for OpenSSH, which represent a significant attack surface reduction.[arXiv] Trimming Mobile Applications for Bandwidth-Challenged Networks in Developing Regions Link
Despite continuous efforts to build and update network infrastructure, mobile devices in developing regions continue to be constrained by limited bandwidth. Unfortunately, this coincides with a period of unprecedented growth in the size of mobile applications. Thus it is becoming prohibitively expensive for users in developing regions to download and update mobile apps critical to their economic and educational development. Unchecked, these trends can further contribute to a large and growing global digital divide. Our goal is to better understand the source of this rapid growth in mobile app code size, whether it is reflective of new functionality, and identify steps that can be taken to make existing mobile apps more friendly bandwidth constrained mobile networks. We hypothesize that much of this growth in mobile apps is due to poor resource/code management, and do not reflect proportional increases in functionality. Our hypothesis is partially validated by mini-programs, apps with extremely small footprints gaining popularity in Chinese mobile networks. Here, we use functionally equivalent pairs of mini-programs and Android apps to identify potential sources of “bloat,” inefficient uses of code or resources that contribute to large package sizes. We analyze a large sample of popular Android apps and quantify instances of code and resource bloat. We develop techniques for automated code and resource trimming, and successfully validate them on a large set of Android apps. We hope our results will lead to continued efforts to streamline mobile apps, making them easier to access and maintain for users in developing regions.[CCS] Binary Control-Flow Trimming Link
A new method of automatically reducing the attack surfaces of binary software is introduced, affording code consumers the power to remove features that are unwanted or unused in a particular deployment context. The approach targets stripped binary native code with no source-derived metadata or symbols, can remove semantic features irrespective of whether they were intended and/or known to code developers, and anticipates consumers who can demonstrate desired features (e.g., via unit testing), but who may not know the existence of specific unwanted features, and who lack any formal specifications of the code’s semantics. Through a combination of runtime tracing, machine learning, in-lined reference monitoring, and contextual control-flow integrity enforcement, it is demonstrated that automated code feature removal is nevertheless feasible under these constraints, even for complex programs such as compilers and servers. The approach additionally accommodates consumers whose demonstration of desired features is incomplete; a tunable entropy-based metric detects coverage lapses and conservatively preserves unexercised but probably desired flows. A prototype implementation for Intel x86-64 exhibits low runtime overhead for trimmed binaries (about 1.87%), and case studies show that consumer-side control-flow trimming can successfully eliminate zero-day vulnerabilities.[JSS] Slimming javascript applications: An approach for removing unused functions from javascript libraries Link
Context: A common practice in JavaScript development is to ship and deploy an application as a large file, called bundle, which is the result of combining the application code along with the code of all the libraries the application depends on. Despite the benefits of having a single bundle per application, this approach leads to applications being shipped with significant portions of code that are actually not used, which unnecessarily inflates the JavaScript bundles and could slow down website loading because of the extra unused code. Although some static analysis techniques exist for removing unused code, our investigations suggest that there is still room for improvements. Objective: The goal of this paper is to address the problem of reducing the size of bundle files in JavaScript applications. Method: In this context, we define the notion of Unused Foreign Function (UFF) to denote a JavaScript function contained in dependent libraries that is not needed at runtime. Furthermore, we propose an approach based on dynamic analysis that assists developers to identify and remove UFFs from JavaScript bundles. Results: We report on a case-study performed over 22 JavaScript applications, showing evidence that our approach can produce size reductions of 26% on average (with reductions going up to 66% in some applications). Conclusion: It is concluded that removing unused foreign functions from JavaScript bundles helps reduce their size, and thus, it can boost the results of existing static analysis techniques.[arXiv] PolyDroid: Learning-Driven Specialization of Mobile Applications Link
The increasing prevalence of mobile apps has led to a proliferation of resource usage scenarios in which they are deployed. This motivates the need to specialize mobile apps based on diverse and varying preferences of users. We propose a system, called PolyDroid, for automatically specializing mobile apps based on user preferences. The app developer provides a number of candidate configurations, called reductions, that limit the resource usage of the original app. The key challenge underlying PolyDroid concerns learning the quality of user experience under different reductions. We propose an active learning technique that requires few user experiments to determine the optimal reduction for a given resource usage specification. On a benchmark suite comprising 20 diverse, open-source Android apps, we demonstrate that on average, PolyDroid obtains more than 85% of the optimal performance using just two user experiments.[arXiv] The Dynamics of Software Composition Analysis Link
Developers today use significant amounts of open source code, surfacing the need for ways to automatically audit and upgrade library dependencies and leading to the emergence of Software Composition Analysis (SCA). SCA products are concerned with three tasks: discovering dependencies, checking the reachability of vulnerable code for false positive elimination, and automated remediation. The latter two tasks rely on call graphs of library and application code to check whether vulnerable methods found in the open source components are called by applications. However, statically-constructed call graphs introduce both false positives and false negatives on real-world projects. In this paper, we develop a novel, modular means of combining statically- and dynamically-constructed call graphs via instrumentation to improve the performance of false positive elimination. Our experiments indicate significant performance improvements, but that instrumentation-based call graphs are less readily applicable in practice.[FSE] Binary reduction of dependency graphs Link
Delta debugging is a technique for reducing a failure-inducing input to a small input that reveals the cause of the failure. This has been successful for a wide variety of inputs including C programs, XML data, and thread schedules. However, for input that has many internal dependencies, delta debugging scales poorly. Such input includes C#, Java, and Java bytecode and they have presented a major challenge for input reduction until now. In this paper, we show that the core challenge is a reduction problem for dependency graphs, and we present a general strategy for reducing such graphs. We combine this with a novel algorithm for reduction called Binary Reduction in a tool called J-Reduce for Java bytecode. Our experiments show that our tool is 12x faster and achieves more reduction than delta debugging on average. This enabled us to create and submit short bug reports for three Java bytecode decompilers.[USENIX] RAZOR : A Framework for Post-deployment Software Debloating Link
Commodity software typically includes functionalities for a broad user population. However, each individual user usually only needs a subset of the supported functionalities. The bloated code not only hinders optimal execution, but also leads to a larger attack surface. Recent work explores program debloating as an emerging solution to this problem. Unfortunately, existing works require program source code, limiting their deployability. In this paper, we propose a practical debloating framework, RAZOR, that performs code reduction for deployed binaries. Based on users’ specification, our tool customizes the binary to generate a functional program with the minimal code size. Instead of only supporting given test cases, RAZOR takes several control-flow heuristics to infer complementary code that are necessary to support user-expected functionalities. We have evaluated RAZOR on commonly used benchmarks and real-world applications, including the web browser FireFox and the close-sourced PDF reader FoxitReader. The result shows that RAZOR is able to reduce over 70% of the code from the bloated binary. It produces functional programs and does not introduce new security issues. RAZOR is thus a practical framework for debloating real-world programs.[FEAST] CARVE: Practical Security-Focused Software Debloating Using Simple Feature Set Mappings Link
Software debloating is an emerging field of study aimed at improving the security and performance of software by removing excess library code and features that are not needed by the end user (called bloat). Software bloat is pervasive, and several debloating techniques have been proposed to address this problem. While these techniques are effective at reducing bloat, they are not practical for the average user, risk creating unsound programs and introducing vulnerabilities, and are not well suited for debloating complex software such as network protocol implementations. In this paper, we propose CARVE, a simple yet effective security-focused debloating technique that overcomes these limitations. CARVE employs static source code annotation to map software features source code, eliminating the need for advanced software analysis during debloating and reducing the overall level of technical sophistication required by the user. CARVE surpasses existing techniques by introducing debloating with replacement, a technique capable of preserving software interoperability and mitigating the risk of creating an unsound program or introducing a vulnerability. We evaluate CARVE in 12 debloating scenarios and demonstrate security and performance improvements that meet or exceed those of existing techniques.
2018
[SALAD] Fine-Grained Library Customization Link
Conduct a case study to understand the impact of code bloat in production-run software by analysing statically linked libraries. Leverage dependence analysis to trim the resultless code statements residing in a target library[USENIX] Debloating Software through Piece-Wise Compilation and Loading Link
Introduce a generic inter-modular late-stage debloating framework. It combines static (i.e., compile-time) and dynamic (i.e., load-time) approaches to systematically detect and automatically eliminate unused code from program memory. This can be thought of as a runtime extension to dead code elimination. Unused code is identified and removed by introducing a piece-wise compiler that not only compiles code modules (executables, shared and static objects), but also generates a dependency graph that retains all compiler knowledge on which function depends on what other function(s).[IST] Slimming javascript applications: An approach for removing unused functions from javascript libraries Link
Define the notion of Unused Foreign Function (UFF) to denote a JavaScript function contained in dependent libraries that is not needed at runtime. Also propose an approach based on dynamic analysis that assists developers to identify and remove UFFs from JavaScript bundles. The results show a reduction of JavaScript bundles of 26%.[ASE] TRIMMER: Application Specialization for Code Debloating Link
Proposes Trimmer, an application specialization tool that leverages user-provided configuration data to specialize an application to its deployment context. The specialization process attempts to eliminate the application functionality that is unused in the user-defined context.[ISSRE] RedDroid: Android Application Redundancy Customization Based on Static Analysis Link
The paper presents a comprehensive study of software bloat in Android applications, and categorize them into two types, compile-time redundancy and install-time redundancy. It also propose a static analysis based approach to identifying and removing software bloat from Android applications.[CCS] Effective Program Debloating via Reinforcement Learning Link
Uses reinforcement learning to improve Delta Debugging in terms of processing time by reducing the number of iterations necessary to remove redundant code. The approach aggressively removes redundant code even on the execution paths. The reduction is based on a test script with the specification of the functionalities that will be keep. The approach is implemented as a program reducer for C programs based on the syntax-guided Hierarchical Delta Debugging algorithm. The evaluation took into account the effectiveness, security and robustness of 10 reference programs from GNU packages.[ICSE] Perses: Syntax guided program reduction Link
Reduces programs by exploiting the formal syntax of the program. Perses considers only smaller, syntactically valid variants to avoid futile efforts on syntactically invalid variants. Evaluation was carried out using 20 C programs, and also Java applications.[FMICS] Wholly : A Build System For The Modern Software Stack Link
Wholly is designed for reproducible and verifiable builds of optimized and debloated software that runs uniformly on traditional desktops, the cloud, and IoT devices. Wholly uses Linux containers to ensure the integrity and reproducibility of the build environment. It uses the clang compiler to generate LLVM bitcode for all produced libraries and binaries to allow for whole program analysis, specialization, and optimization.
2017
[FEAST] A Multi-OS Cross-Layer Study of Bloating in User Programs, Kernel and Managed Execution Environments Link
Presents a study of bloating across the software stack (user-level programs, OS kernels and JVM). Employs (1) static measurements to detect limits to debloating, and (2) dynamic measurements to detect how much of the code available to a program is utilized under typical payloads. It uses a tracing procedure in ato measure the bloat in kernel, measuring the amount of kernel code that executes during the boot process and during the execution of popular system calls. The results show that bloating is pervasive and severe. A significant fraction of code across the software stack is never executed and provides scope for debloating.[FEAST] DamGate: Dynamic Adaptative Multi-feature Gating in Program Binaries Link
Presents DamGate, a framework for dynamic feature customization, allowing vigilant management of program features at runtime to prevent violation of privacy and security policies. At the heart of this technique is the selective placement of checker functions (gates) into feature-constituent functions that need to be protected. Through execution gating and feature validation on the fly, DamGate provides differentiated control policy for program features and enables flexible runtime reconfiguration. The proposed framework is prototyped and evaluated using LibreOffice The evaluation results show that it can achieve desired feature customization with negligible gating overhead.[FSE] Cimplifier: Automatically Debloating Containers Link
Propose a technique to debloating application containers running on Docker. They decompose a complicated container into multiple simpler containers with respect to a given user-defined constraint. Their technique is based on dynamic analysis to obtain information about application behaviors. The evaluation on real-world containers shows that this approachpreserves the original functionality, leads to reduction in image size of up to 95%, and processes even large containers in under thirty seconds.[FEAST] TOSS: Tailoring Online Server Systems through Binary Feature Customization Link
Propose an approach for automated customization of online servers and software systems, which are implemented using a client-server architecture based on the underlying network protocols. TOSS harnesses program tracing and tainting-guided symbolic execution to identify desired (feature-related) code from the original program binary, and apply static binary rewriting to remove redundant features and directly create customized program binary with only desired features. The evaluation was conducted in Mosquitto, TOSS was able to create a functional program binary with only desired features and significantly reduce potential attack surface by eliminating undesired protocol/program features.[FSE] Failure-Directed Program Trimming Link
Propose a program trimming technique that aims to reduce the number of execution paths while preserving safety. This allows any safety checker to be goal directed by pruning execution paths that cannot possibly result in an assertion violation.[SPA] Smartphone Bloatware: An Overlooked Privacy Problem Link
Provide findings of a user-study that was conducted to investigate the practical utility of smarthphone bloatware in personal and professional lives of users.
2016
[FEAST] Beyond Binary Program Transformation Link
Describe a number of program transformation and analysis tools developed at SRI to tackle bloatware from three angles: slicing binaries to exclude unnecessary components, transformation of different copies of the same binary to create diversity and reduce the potential impact of an attack, and verification-based super-optimization to prune unreachable code and harden vulnerable code.[HASE] Feature-based Software Customization: Preliminary Analysis, Formalization, and Methods Link
Proposes an approach to customizing Java bytecode by applying static dataflow analysis and enhanced programming slicing technique. This approach allows developers to customize Java programs based on various users’ requirements or remove unnecessary features from tangled code in legacy projects. We evaluate our approach by conducting case studies on removing cross cutting features from real world Java programs. The results show that our approach has the potential for practical use. Additionally, we find out that, by increasing the diversity of the software, our approach can help achieve moving target defense. Associated PhD thesis: Link.[COMPSAC] JRed: Program Customization and Bloatware Mitigation Based on Static Analysis Link
Proposes the JRed tool, which is built on top of the Soot framework to trim unused code from both Java applications and JRE automatically. It uses SPARK, a flexible points-to analysis framework for Java, to facilitate call graph construction. Evaluation was conducted using the DaCapo benchmark according to various criteria: code size, code complexity, memory footprint, execution and garbage collection time, and security. The experimental results show that, Java application size can be reduced by 44.5% on average and the JRE code can be reduced by more than 82.5% on average. The code complexity is significantly reduced according to a set of well-known metrics. Furthermore, we report that by trimming redundant code, 48.6% of the known security vulnerabilities in the Java Runtime Environment JRE 6 update 45 has been removed.[A-TEST] Modernizing Hierarchical Delta Debugging Link
Use extended context-free grammars (ANTLRv4) to improve the peformance of HDD. The tool, called Picireny, supports the outlined ideas: the reduced outputs are significantly smaller (by circa 25–40%) on the investigated test cases than those produced by the reference HDD implementation using standard context-free grammars. These results, together with the technical improvements that ease the use of the modernized tool, can hopefully help spreading the adaptation of HDD in practice.
2015
[GECCO] Removing the Kitchen Sink from Software Link
Propose to approaches to trimming software. The first one removes specific program features using dynamic tracing as a guide. This approach is safer than many alternatives, but is limited to removing code which is reachable in a trace when an undesirable feature is enabled. The second approach uses a genetic algorithm (GA) to mutate a program until a suitable variant is found. This approach can potentially remove any code that is not strictly required for proper execution, but may break program semantics in unpredictable ways.[SAC] Automated Software Winnowing Link
Propose winnowing, a static analysis and code specialization technique that uses partial evaluation. The process preserves the normal semantics of the original program – that is, any valid execution of the original program on specified inputs is preserved in its winnowed form. Invalid executions, such as those involving buffer overflows, may be executed differently. We also describe OCCAM, a tool that implements the techniques and present an experimental evaluation of its effectiveness.[SPLASH] Detecting redundant CSS rules in HTML5 applications: a tree rewriting approach Link
HTML5 applications normally have a large set of CSS (Cascading Style Sheets) rules for data display. Each CSS rule consists of a node selector and a declaration block (which assigns values to selected nodes’ display attributes). As web applications evolve, maintaining CSS files can easily become problematic. Some CSS rules will be replaced by new ones, but these obsolete (hence redundant) CSS rules often remain in the applications. Not only does this “bloat” the applications – increasing the bandwidth requirement – but it also significantly increases web browsers’ processing time. Most works on detecting redundant CSS rules in HTML5 applications do not consider the dynamic behaviours of HTML5 (specified in JavaScript); in fact, the only proposed method that takes these into account is dynamic analysis, which cannot soundly prove redundancy of CSS rules. In this paper, we introduce an abstraction of HTML5 applications based on monotonic tree-rewriting and study its “redundancy problem”. We establish the precise complexity of the problem and various subproblems of practical importance (ranging from P to EXP). In particular, our algorithm relies on an efficient reduction to an analysis of symbolic pushdown systems (for which highly optimised solvers are available), which yields a fast method for checking redundancy in practice. We implemented our algorithm and demonstrated its efficacy in detecting redundant CSS rules in HTML5 applications.
2014
- [GPCE] Automatic feature selection in large-scale system-software product lines LinkSystem software can typically be configured at compile time via a comfortable feature-based interface to tailor its functionality towards a specific use case. However, with the growing number of features, this tailoring process becomes increasingly difficult: As a prominent example, the Linux kernel in v3.14 provides nearly 14 000 configuration options to choose from. Even developers of embedded systems refrain from trying to build a minimized distinctive kernel configuration for their device – and thereby waste memory and money for unneeded functionality. In this paper, we present an approach for the automatic use-case specific tailoring of system software for special-purpose embedded systems. We evaluate the effectiveness of our approach on the example of Linux by generating tailored kernels for well-known applications of the Rasperry Pi and a Google Nexus 4 smartphone. Compared to the original configurations, our approach leads to memory savings of 15–70 percent and requires only very little manual intervention.
2013
[FSE] Cachetor: Detecting Cacheable Data to Remove Bloat Link
Modern object-oriented software commonly suffers from runtime bloat that significantly affects its performance and scalability. Studies have shown that one important pattern of bloat is the work repeatedly done to compute the same data values. Very often the cost of computation is very high and it is thus beneficial to memoize the invariant data values for later use. While this is a common practice in real-world development, manually finding invariant data values is a daunting task during development and tuning. To help the developers quickly find such optimization opportunities for performance improvement, we propose a novel run-time profiling tool, called Cachetor, which uses a combination of dynamic dependence profiling and value profiling to identify and report operations that keep generating identical data values. The major challenge in the design of Cachetor is that both dependence and value profiling are extremely expensive techniques that cannot scale to large, real-world applications for which optimizations are important. To overcome this challenge, we propose a series of novel abstractions that are applied to run-time instruction instances during profiling, yielding significantly improved analysis time and scalability. We have implemented Cachetor in Jikes Research Virtual Machine and evaluated it on a set of 14 large Java applications. Our experimental results suggest that Cachetor is effective in exposing caching opportunities and substantial performance gains can be achieved by modifying a program to cache the reported data.[OOPSLA] Combining Concern Input with Program Analysis for Bloat Detection Link
Introduce the use of concern information (feature information) in program analysis tasks and demonstrated its application in estimating the propensity for execution bloat of optional concerns in Java programs. The objective is to answer questions such as (1) whether a given set of optional features could lead to execution bloat and (2) which particular statements are the likely sources of bloat when those features are not required.[ISMM] A Bloat-Aware Design for Big Data Applications Link
Proposes a bloat-aware design paradigm towards the development of efficient and scalable Big Data applications in object-oriented GC enabled languages.It points out that the negative impact on performance caused by bloatware is being amplified by today’s big-data software usage nature. Perform a study on the impact of several typical memory bloat patterns. Investigate two data-intensive applications: Giraph and Hive.
2012
[SIGMETRICS] Does Lean Imply Green? A Study of the Power Performance Implications of Java Runtime Bloat Link
Conducts the first systematic experimental study of the joint power performance implications of bloat across a range of hardware and software configurations on modern server platforms. The study employs controlled experiments to expose different effects of a common type of Java runtime bloat, excess temporary objects, in the context of the SPECPower ssj2008 workload.[WOOT] Microgadgets: size does matter in turing-complete return-oriented programming Link
Return-oriented programming (ROP) has gained a lot of popularity lately, as an attack against currently implemented defenses in modern operating systems. Several kinds of ROP-based attacks and anti-ROP defenses have been proposed in recent years. The original attack technique depends on the existence of a hand-picked set of byte sequences (called gadgets) in the program, while subsequent approaches use complex scanners, which perform semantic analysis on the code to locate gadgets. The latter ones are efficient at finding gadgets and building an attack, but incur a significant cost in time. We propose a ROP attack technique, based on a handpicked but flexible and Turing-complete set of gadgets. One novelty in this approach is the use of microgadgets, which are gadgets restricted to 2 or 3 bytes in length. Our approach splits gadgets into several classes of varying sizes (from 1 to more than 800). Only a single gadget from each class is required for Turing-completeness. The short length of the gadgets, as well as the large size of the classes, increase the likelihood of finding all required gadgets. We also describe an efficient scanner which locates these gadgets in a given program. We then use this scanner on the /usr/bin directories from several Linux distributions, to show that many programs indeed contain a Turing-complete set of microgadgets, which attackers can use to perform arbitrary computations.
2011
[?] Manipulating Program Functionality to Eliminate Security Vulnerabilities Link
Present several mechanisms that can either excise or change system functionality in ways that may 1) eliminate security vulnerabilities while 2) enabling the system to continue to deliver acceptable service.[SCP] “Slimming” a Java virtual machine by way of cold code removal and optimistic partial program loading Link
Present a method to mitigate the bloatware problem in “always connected” embedded devices. Specifically, by storing the library code in a remote server. The classes that are needed will be downloaded on demand. In addition, by applying some more sophisticated analysis, some library code can be downloaded in advance before they are actually executed to improve the performance.[ECOOP] Reuse, Recycle to De-bloat Software Link
Describes a novel algorithm that detects bloat caused by the creation of temporary container and String objects within a loop. The analysis determines which objects created within a loop can be reused. Then it describes a source-to-source transformation that efficiently reuses such objects. Empirical evaluation indicates that our solution can reduce up to 40% of temporary object allocations in large programs, resulting in a performance improvement that can be as high as a 20% reduction in the run time, specifically when a program has a high churn rate or when the program is memory intensive and needs to run the GC often.[COMPUTER] Software Bloat and Wasted Joules: Is Modularity a Hurdle to Green Software? Link
The paper discusses that adopting an integrated analysis of software bloat and hardware platforms is necessary to realizing modular software that’s also green.
2010
[PLDI] Detecting Inefficiently-Used Containers to Avoid Bloat Link
Runtime bloat degrades significantly the performance and scalability of software systems. An important source of bloat is the inefficient use of containers. It is expensive to create inefficiently used containers and to invoke their associated methods, as this may ultimately execute large volumes of code, with call stacks dozens deep,and allocate many temporary objects.This paper presents practical static and dynamic tools that can find inappropriate use of containers in Java programs. At the core of these tools is a base static analysis that identifies, for each container,the objects that are added to this container and the key statements(i.e., heap loads and stores) that achieve the semantics of common container operations such as ADD and GET. The static tool finds problematic uses of containers by considering the nesting relation-ships among the loops where these semantics-achieving statements are located, while the dynamic tool can instrument these statements and find inefficiencies by profiling their execution frequencies.The high precision of the base analysis is achieved by taking advantage of a context-free language (CFL)-reachability formulation of points-to analysis and by accounting for container-specific properties. It is demand-driven and client-driven, facilitating refinement specific to each queried container object and increasing scalability.The tools built with the help of this analysis can be used both to avoid the creation of container-related performance problems early during development, and to help with diagnosis when problems are observed during tuning. Our experimental results show that the static tool has a low false positive rate and produces more relevant information than its dynamic counterpart. Further case studies suggest that significant optimization opportunities can be found by focusing on statically-identified containers for which high allocation frequency is observed at run time.[FOSER] Software Bloat Analysis: Finding, Removing, and Preventing Performance Problems in Modern Large-Scale Object-Oriented Applications Link
Describes software bloat, an emerging problem that has increasingly negative impact on large-scale object-oriented applications. It is argued that it is essentially a software engineering problem, and it is time for the SE community to start contributing new solutions for it. The paper survey some of the existing work on bloat analysis, describe challenges, and outline some promising future directions. The authors believe that there are larger opportunities than ever before for the SE community to make software more efficient, and this can happen entirely at the application level, without the help of ever-increasing hardware capabilities.
2006
[ICSE] A comparison of bloat control methods for genetic programming Link
Present HDD, a simple but effective algorithm that significantly speeds up Delta Debugging and increases its output quality on tree structured inputs such as XML. Instead of treating the inputs as one flat atomic lis, HDD applies DD to the very structure of the data, from the coarsest to the finest levels. This approach allows to pruene the large irrelevant portions of the input early.[Evolutionary Computation] HDD: Hierarchical Delta Debugging Link
Genetic programming has highlighted the problem of bloat, the uncontrolled growth of the average size of an individual in the population. The most common approach to dealing with bloat in tree-based genetic programming individuals is to limit their maximal allowed depth. An alternative to depth limiting is to punish individuals in some way based on excess size, and our experiments have shown that the combination of depth limiting with such a punitive method is generally more effective than either alone. Which such combinations are most effective at reducing bloat? In this article we augment depth limiting with nine bloat control methods and compare them with one another. These methods are chosen from past literature and from techniques of our own devising, esting with four genetic programming problems, we identify where each bloat control method performs well on a per-problem basis, and under what settings various methods are effective independent of problem. We report on the results of these tests, and discover an unexpected winner in the cross-platform category.
2002
[TSE] Simplifying and Isolating Failure-Inducing Input Link
This paper is the state-of-the-art publication of the Delta Debugging (DD) algorithm. DD aims to generalice and simplify some failing test case to a minimal test case that still produces the failure; it also isolates the difference between a passing and a failing test case. Mozilla web browser is used as a use case. The algorithm is applied to find failure-inducing parts in the program invocation (GCC options), in the program input (GCC, fuzz, and Mozilla input), and in the sequence of user interactions (Mozilla user actions).[TPLS] Practical extraction techniques for Java Link
Reducing application size is important for software that is distributed via the internet, in order to keep download times manageable, and in the domain of embedded systems, where applications are often stored in (Read-Only or Flash) memory. This paper explores extraction techniques such as the removal of unreachable methods and redundant fields, inlining of method calls, and transformation of the class hierarchy for reducing application size. We implemented a number of extraction techniques in Jax, an application extractor for Java, and evaluated their effectiveness on a set of large Java applications. We found that, on average, the class file archives for these benchmarks were reduced to 37.5% of their original size. Modeling dynamic language features such as reflection, and extracting software distributions other than complete applications requires additional user input. We present a uniform approach for supplying this input that relies on MEL, a modular specification language. We also discuss a number of issues and challenges associated with the extraction of embedded systems applications.[TPLS] Practical extraction techniques for Java Link
Reducing application size is important for software that is distributed via the internet, in order to keep download times manageable, and in the domain of embedded systems, where applications are often stored in (Read-Only or Flash) memory. This paper explores extraction techniques such as the removal of unreachable methods and redundant fields, inlining of method calls, and transformation of the class hierarchy for reducing application size. We implemented a number of extraction techniques in Jax, an application extractor for Java, and evaluated their effectiveness on a set of large Java applications. We found that, on average, the class file archives for these benchmarks were reduced to 37.5% of their original size. Modeling dynamic language features such as reflection, and extracting software distributions other than complete applications requires additional user input. We present a uniform approach for supplying this input that relies on MEL, a modular specification language. We also discuss a number of issues and challenges associated with the extraction of embedded systems applications.
2000
- [PGI] Are We All In the Same “Bloat”? Link“Bloat”, a term that has existed in the technical community for many years, has recently received attention in the popular press. The term has a negative connotation implying that human, or system performance is diminished in some way when “bloat” exists. Yet “bloat” is seldom clearly defined and is often a catch-all phrase to suggest that software is filled with unnecessary features. However, to date there are no studies that explore how users actually experience complex functionality-filled software applications and most importantly, the extent to which they experience them in similar/different ways. The significance of understanding users’ experience is in the implications this understanding has for design. Using both quantitative and qualitative methods, we carried out a study to gain a better understanding of the experiences of 53 members of the general population who use a popular word processor, Microsoft Word, Office 97. As a result we are able to further specify the term “bloat”, distinguishing an objective and subjective dimension. It is the discovery of the subjective dimension that opens the design space and raises new challenges for interface designers. There is certainly more to “bloat” than meets the eye.
1999
- [OOPSLA] Practical experience with an application extractor for Java LinkJava programs are routinely transmitted over low-bandwidth network connections as compressed class file archives (i.e., zip files and jar files). Since archive size is directly proportional to download time, it is desirable for applications to be as small as possible. This paper is concerned with the use of program transformations such as removal of dead methods and fields, inlining of method calls, and simplification of the class hierarchy for reducing application size. Such “extraction” techniques are generally believed to be especially useful for applications that use class libraries, since typically only a small fraction of a library’s functionality is used. By “pruning away” unused library functionality, application size can be reduced dramatically. We implemented a number of application extraction techniques in Jax, an application extractor for Java, and evaluate their effectiveness on a set of realistic benchmarks ranging from 27 to 2,332 classes (with archives ranging from 56,796 to 3,810,120 bytes). We report archive size reductions ranging from 13.4% to 90.2% (48.7% on average).
External resources
- Why Bloat Is Still Software’s Biggest Vulnerability in IEEE Spectrum
- Debloater Evaluation Dataset
- Sensenmann: Code Deletion at Scale at Google
- Java Bytecode Debloating for Size Reduction and Security
- Open-source minimalism?
- Technical Perspective: What Is the Ideal Operating System?
- OpenBSD did a big cleanup after heartbleed (2014)
- https://catless.ncl.ac.uk/Risks/30/84/#subj25
- http://tonsky.me/blog/disenchantment/
- https://bulldogjob.com/news/400-feature-creep
- IEEE Software: Code Inflation
- IEEE Software: Four Trends Leading to Java Runtime Bloat
- ACM Queue: A Generation Lost in the Bazaar
- Smallest x86 ELF Hello World
- Wikipedia’s JavaScript initialisation on a budget
- Don’t Put Fat Jars in Docker Images
- Hello world
- Following up on “Hello world”
- Building a self-contained game in C# under 8 kilobytes
- The cost of JavaScript in 2019
- The fault in our JARS
- How much code do you need?