Zhuo Zhang  张倬


Postdoctoral Researcher @ Purdue CS
Chief Research Scientist @ Offside Labs

I will be on the academic job market for the 2024-25 cycle!

I am a postdoctoral researcher at Purdue University, advised by Samuel Conte Professor Xiangyu Zhang. I completed my Ph.D. at Purdue University in 2023 and previously obtained my B.Sc. with Zhiyuan Honours from Shanghai Jiao Tong University (SJTU) in 2018.

I am passionate about hardcore hacking and currently active in Web3 security with Offside Labs. Before this, I was a core member of CTF teams 0ops and A*0*E. My efforts in securing the digital world have earned me over $?00,000 in bug bounties.

In my separate capacity, I build open-source projects for public good. Among these is a widely recognized Web3 bug dataset, which has gained stars on GitHub. Meanwhile, I co-lead the MEDGA team, which develops open-source debugging tools for smart contracts.

Oct, 2024 I am deeply honored to have received the prestigious ACM SIGSAC Doctoral Dissertation Award!
Jun, 2024 Our proposal, MEDGA, aimed at enhancing the debugging experience for Ethereum development, has been approved and supported by the Ethereum Foundation. I am proud to serve as the PI for this project!
Dec, 2023 We at Offside Labs rescued $2 million from an on-chain attack on Affine DeFi. See our blog post for more information!
May, 2023 We have released our Web3 bug dataset, which has received stars on GitHub!

As a dedicated security researcher and hands-on ethical hacker, I am eager to push the boundaries of how we secure digital systems in practice. At the core of my research are two fundamental limitations that hinder scalable security analysis:

  • Inherent Complexity of Software (Undecidability): Program analysis, a critical security analysis method, is fundamentally limited by the undecidability of non-trivial program properties. That means determining certain behaviors (e.g., data dependencies) for arbitrary programs is impossible without introducing false positives or false negatives.
  • Uncertainty in Real-World Security Analysis: A significant yet underexplored challenge in security analysis is managing uncertainty. Real-world analysis often suffers from incomplete information (e.g., missing source code when hardening legacy systems). The growing integration of AI in software systems further amplifies this uncertainty due to the opaque nature of AI models. As a result, analysis has to make educated guesses based on limited or ambiguous data.

My methodology synergizes learning and reasoning in a cohesive and reliable manner. Along this line, I have devised several security analysis techniques across various domains. These include analyzing binary executables without source code (OOPSLA'19, Oakland'21a, Oakland'21b), securing smart contracts (ICSE'23, Security'23b, PLDI'24), and red-teaming AI systems (Security'23a, Oakland'24).

Run the following command in a terminal (GNU):

$ echo "$(echo "ghNsgnm" | md5sum - | xxd -r -p | base64 | cut -c3-10)@purdue.edu" 
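
On systems without GNU md5sum or xxd (for example, a stock macOS shell), the following sketch should yield the same address, assuming the openssl and base64 command-line tools are installed. The function name purdue_email is hypothetical and used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original page): rebuilds the address
# with openssl instead of the GNU md5sum + xxd pair.
purdue_email() {
    # `echo "ghNsgnm"` emits the string plus a newline, so hash the same bytes.
    # `openssl md5 -binary` writes the raw 16-byte digest, which replaces
    # the `md5sum - | xxd -r -p` step of the original command.
    local_part=$(printf 'ghNsgnm\n' | openssl md5 -binary | base64 | cut -c3-10)
    printf '%s@purdue.edu\n' "$local_part"
}

purdue_email
```

The only change from the command above is how the raw digest is produced; the base64 and cut steps are unchanged.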

If necessary, click here to get my GPG key.

Please refer to my publications.
  • Revamping Binary Analysis with Sampling and Probabilistic Inference
    Zhuo Zhang
    PhD Dissertation, Purdue University, August 2023
    🏆   ACM SIGSAC Doctoral Dissertation Award
    🏆   Excellent Score in Code Delivery and Evaluation of Office of Naval Research (ONR)
    Keywords: Decompilation, Probabilistic Analysis
    Abstract:
         Binary analysis, a cornerstone technique in cybersecurity, enables the examination of binary executables, irrespective of source code availability. It plays a critical role in understanding program behaviors, detecting software bugs, and mitigating potential vulnerabilities, especially in situations where the source code remains out of reach. However, aligning the efficacy of binary analysis with that of source-level analysis remains a significant challenge, primarily due to the uncertainty caused by the loss of semantic information during the compilation process.
         This dissertation presents an innovative probabilistic approach, termed probabilistic binary analysis, designed to combat the intrinsic uncertainty in binary analysis. It builds on the fundamental principles of program sampling and probabilistic inference, enhanced further by an iterative refinement architecture. The dissertation suggests that a thorough and practical method of sampling program behaviors can yield a substantial quantity of hints which could be instrumental in recovering lost information, despite the potential inclusion of some inaccuracies. Consequently, a probabilistic inference technique is applied to systematically incorporate and process the collected hints, suppressing the incorrect ones, thereby enabling the interpretation of high-level semantics. Furthermore, an iterative refinement mechanism is deployed to augment the efficiency of the probabilistic analysis in subsequent applications, facilitating the progressive enhancement of analysis outcomes through an automated or human-guided feedback loop.
         This work offers an in-depth understanding of the challenges and solutions related to assessing low-level program representations and systematically handling the inherent uncertainty in binary analysis. It aims to contribute to the field by advancing the development of precise, reliable, and interpretable binary analysis solutions, thereby setting the groundwork for future exploration in this domain.


Binary Analysis

  • BDA: Practical Dependence Analysis for Binary Executables by Unbiased Whole-Program Path Sampling and Per-Path Abstract Interpretation
    Zhuo Zhang, Wei You, Guanhong Tao, Guannan Wei, Yonghwi Kwon, Xiangyu Zhang
    Proceedings of the ACM on Programming Languages Volume 3 Issue OOPSLA (OOPSLA 2019)
    Athens, Greece, October 2019   [artifact]   [bibtex]
    🏆   ACM SIGPLAN Distinguished Paper Award
    Keywords: Path Sampling, Abstract Interpretation, Binary Analysis, Data Dependence
    Abstract:
         Binary program dependence analysis determines dependence between instructions and hence is important for many applications that have to deal with executables without any symbol information. A key challenge is to identify if multiple memory read/write instructions access the same memory location. The state-of-the-art solution is the value set analysis (VSA) that uses abstract interpretation to determine the set of addresses that are possibly accessed by memory instructions. However, VSA is conservative and hence leads to a large number of bogus dependences and then substantial false positives in downstream analyses such as malware behavior analysis. Furthermore, existing public VSA implementations have difficulty scaling to complex binaries.
         In this paper, we propose a new binary dependence analysis called BDA enabled by a randomized abstract interpretation technique. It features a novel whole program path sampling algorithm that is not biased by path length, and a per-path abstract interpretation avoiding precision loss caused by merging paths in traditional analyses. It also provides probabilistic guarantees. Our evaluation on SPECINT2000 programs shows that it can handle complex binaries such as gcc whereas VSA implementations from state-of-the-art platforms have difficulty producing results for many SPEC binaries. In addition, the dependences reported by BDA are 75 and 6 times smaller than Alto, a scalable binary dependence analysis tool, and VSA, respectively, with only 0.19% of true dependences observed during dynamic execution missed (by BDA). Applying BDA to call graph generation and malware analysis shows that BDA substantially supersedes the commercial tool IDA in recovering indirect call targets and outperforms a state-of-the-art malware analysis tool Cuckoo by disclosing 3 times more hidden payloads.

  • OSPREY: Recovery of Variable and Data Structure via Probabilistic Analysis for Stripped Binary
    Zhuo Zhang, Yapeng Ye, Wei You, Guanhong Tao, Wen-chuan Lee, Yonghwi Kwon, Yousra Aafer, Xiangyu Zhang
    Proceedings of the 42nd IEEE Symposium on Security and Privacy (Oakland 2021)
    Virtually, May 2021   [bibtex]   [evaluation data]
    Keywords: Binary Analysis, Variable Recovery, Probabilistic Analysis, Reverse Engineering
    Abstract:
         Recovering variables and data structure information from stripped binary is a prominent challenge in binary program analysis. While various state-of-the-art techniques are effective in specific settings, such effectiveness may not generalize. This is mainly because the problem is inherently uncertain due to the information loss in compilation. Most existing techniques are deterministic and lack a systematic way of handling such uncertainty. We propose a novel probabilistic technique for variable and structure recovery. Random variables are introduced to denote the likelihood of an abstract memory location having various types and structural properties such as being a field of some data structure. These random variables are connected through probabilistic constraints derived through program analysis. Solving these constraints produces the posterior probabilities of the random variables, which essentially denote the recovery results. Our experiments show that our technique substantially outperforms a number of state-of-the-art systems, including IDA, Ghidra, Angr, and Howard. Our case studies demonstrate the recovered information improves binary code hardening and binary decompilation.

  • StochFuzz: Sound and Cost-effective Fuzzing of Stripped Binaries by Incremental and Stochastic Rewriting
    Zhuo Zhang, Wei You, Guanhong Tao, Yousra Aafer, Xuwei Liu, Xiangyu Zhang
    Proceedings of the 42nd IEEE Symposium on Security and Privacy (Oakland 2021)
    Virtually, May 2021   [benchmarks]   [bibtex]   [code]   [poster]
    🏆   CSAW 2021 Best Applied Security Paper Award TOP-10 Finalists
    Keywords: Fuzz, Binary Rewriting, Probabilistic Analysis
    Abstract:
         Fuzzing stripped binaries poses many hard challenges as fuzzers require instrumenting binaries to collect runtime feedback for guiding input mutation. However, due to the lack of symbol information, correct instrumentation is difficult on stripped binaries. Existing techniques either rely on hardware and expensive dynamic binary translation engines such as QEMU, or make impractical assumptions such as binaries do not have inlined data. We observe that fuzzing is a highly repetitive procedure providing a large number of trial-and-error opportunities. As such, we propose a novel incremental and stochastic rewriting technique STOCHFUZZ that piggy-backs on the fuzzing procedure. It generates many different versions of rewritten binaries whose validity can be approved/disapproved by numerous fuzzing runs. Probabilistic analysis is used to aggregate evidence collected through the sample runs and improve rewriting. The process eventually converges on a correctly rewritten binary. We evaluate STOCHFUZZ on two sets of real-world programs and compare with five other baselines. The results show that STOCHFUZZ outperforms state-of-the-art binary-only fuzzers (e.g., e9patch, ddisasm, and RetroWrite) in terms of soundness and cost-effectiveness and achieves performance comparable to source-based fuzzers. STOCHFUZZ is publicly available.

  • ReSym: Harnessing LLMs to Recover Variable and Data Structure Symbols from Stripped Binaries
    Danning Xie, Zhuo Zhang, Nan Jiang, Xiangzhe Xu, Lin Tan, Xiangyu Zhang
    Proceedings of the 31st Conference on Computer and Communications Security (CCS 2024)
    Salt Lake City, UT, October 2024
    🏆   ACM SIGSAC Distinguished Paper Award
    Keywords: Binary Analysis, Type Inference, Variable Recovery, Large Language Model
    Abstract:
         Decompilation aims to recover the source code form of a binary executable and hence has a wide range of applications in cyber security, such as malware analysis and legacy code hardening. A prominent challenge is to recover variables, including both primitive and complex variables such as user-defined data structures, and their symbol information, including names and types. Existing efforts focus on solving parts of the problem, e.g., recovering only types (without names) or only local variables (without user-defined structures). In this paper, we propose ReSym, a novel LLM-based technique to recover both names and types for local variables and structures including user-defined structures. It features fine-tuning two LLMs to handle local variables and structures, respectively. To overcome the input token limitations of existing LLMs, we also devise a novel Prolog-based algorithm to aggregate and cross-check results from multiple LLM queries, suppressing uncertainty and hallucinations. Our experiments show that ReSym is effective in recovering variable information and user-defined data structures, substantially outperforming the state-of-the-art methods OSPREY and DIRTY.

  • Unleashing the Power of Generative Model in Recovering Variable Names from Stripped Binary
    Xiangzhe Xu, Zhuo Zhang, Zian Su, Ziyang Huang, Shiwei Feng, Yapeng Ye, Nan Jiang, Danning Xie, Siyuan Cheng, Lin Tan, Xiangyu Zhang
    Proceedings of the 32nd Network and Distributed System Security Symposium (NDSS 2025)
    San Diego, CA, February 2025
    Keywords: Binary Analysis, Variable Name Recovery, Large Language Model
    Abstract:
         Decompilation aims to recover the source code form of a binary executable. It has many applications in security and software engineering such as malware analysis, vulnerability detection and code reuse. A prominent challenge in decompilation is to recover variable names. We propose a novel method that leverages the synergy of large language models (LLMs) and program analysis. Language models encode rich multi-modal knowledge, but their limited input size prevents providing sufficient global context for name recovery. We propose to divide the task into many LLM queries and use program analysis to correlate and propagate the query results, which in turn improves the performance of the LLM by providing additional contextual information. Our results show that 75% of the recovered names are considered good by users and our technique outperforms the state-of-the-art technique by 16.5% and 20.23% in precision and recall, respectively.

Blockchain Security

  • Demystifying Exploitable Bugs in Smart Contracts
    Zhuo Zhang, Brian Zhang, Wen Xu, Zhiqiang Lin
    Proceedings of the 45th ACM/IEEE International Conference on Software Engineering (ICSE 2023)
    Melbourne, Australia, May 2023   [dataset]
    Keywords: Smart Contract, Web3 Security, Blockchain
    Abstract:
         Exploitable bugs in smart contracts have caused significant monetary loss. Despite the substantial advances in smart contract bug finding, exploitable bugs and real-world attacks are still trending. In this paper we systematically investigate 516 unique real-world smart contract vulnerabilities in years 2021-2022, and study how many can be exploited by malicious users and cannot be detected by existing analysis tools. We further categorize the bugs that cannot be detected by existing tools into seven types and study their root causes, distributions, difficulties to audit, consequences, and repair strategies. For each type, we abstract them to a bug model (if possible), facilitating finding similar bugs in other contracts and future automation. We leverage the findings in auditing real-world smart contracts, and so far we have been rewarded with $102,660 in bug bounties for identifying 15 critical zero-day exploitable bugs, which could have caused up to $22.52 million in monetary loss if exploited.

  • Your Exploit is Mine: Instantly Synthesizing Counterattack Smart Contract
    Zhuo Zhang, Zhiqiang Lin, Marcelo Morales, Xiangyu Zhang, Kaiyuan Zhang
    Proceedings of the 32nd USENIX Security Symposium (Security 2023)
    Anaheim, CA, August, 2023   [bibtex]
    Keywords: Maximal Extractable Value, Web3 Security, Blockchain
    Abstract:
         Smart contracts are susceptible to exploitation due to their unique nature. Despite efforts to identify vulnerabilities using fuzzing, symbolic execution, formal verification, and manual auditing, exploitable vulnerabilities still exist and have led to billions of dollars in monetary losses. To address this issue, it is critical that runtime defenses are in place to minimize exploitation risk. In this paper, we present STING, a novel runtime defense mechanism against smart contract exploits. The key idea is to instantly synthesize counterattack smart contracts from attacking transactions and leverage the power of Maximal Extractable Value (MEV) to front run attackers. Our evaluation with 62 real-world recent exploits demonstrates its effectiveness, successfully countering 54 of the exploits (i.e., intercepting all the funds stolen by the attacker). In comparison, a general front-runner defense could only handle 12 exploits. Our results provide a clear proof-of-concept that STING is a viable defense mechanism against smart contract exploits and has the potential to significantly reduce the risk of exploitation in the smart contract ecosystem.

  • Nyx: Detecting Exploitable Front-Running Vulnerabilities in Smart Contracts
    Wuqi Zhang, Zhuo Zhang, Qingkai Shi, Lu Liu, Lili Wei, Yepang Liu, Xiangyu Zhang, Shing-Chi Cheung
    Proceedings of the 45th IEEE Symposium on Security and Privacy (Oakland 2024)
    San Francisco, CA, May, 2024   [code]
    Keywords: Frontrunning Attack, Web3 Security, Blockchain
    Abstract:
         Smart contracts are susceptible to front-running attacks, in which malicious users leverage prior knowledge of upcoming transactions to execute attack transactions in advance and benefit their own portfolios. Existing contract analysis techniques raise a number of false positives and false negatives in that they simplistically treat data races in a contract as front-running vulnerabilities and can only analyze contracts in isolation. In this work, we formalize the definition of exploitable front-running vulnerabilities based on previous empirical studies on historical attacks, and present Nyx, a novel static analyzer to detect them. Nyx features a Datalog-based preprocessing procedure that efficiently and soundly prunes a large part of the search space, followed by a symbolic validation engine that precisely locates vulnerabilities with an SMT solver. We evaluate Nyx using a large dataset that comprises 513 real-world front-running attacks in smart contracts. Compared to six state-of-the-art techniques, Nyx surpasses them by 32.64%-90.19% in terms of recall and 2.89%-70.89% in terms of precision. Nyx has also identified four zero-days in real-world smart contracts.

  • Consolidating Smart Contracts with Behavioral Contracts
    Guannan Wei, Danning Xie, Wuqi Zhang, Yongwei Yuan, Zhuo Zhang*
    Proceedings of the ACM on Programming Languages Volume 8 Issue PLDI (PLDI 2024)
    Copenhagen, Denmark, June, 2024
    Keywords: Behavioral Contract, Smart Contract, Runtime Monitoring
    Abstract:
         Ensuring the reliability of smart contracts is of vital importance due to the wide adoption of smart contract programs in decentralized financial applications. However, statically checking many rich properties of smart contract programs can be challenging. On the other hand, dynamic validation approaches have shown promise for widespread adoption in practice. Nevertheless, as part of the programming environment for smart contracts, existing dynamic validation approaches have not provided programmers with a notion to clearly articulate the interface between components, especially for addresses representing opaque contract instances. We argue that the "design-by-contract" approach should complement the development of smart contract programs. Unfortunately, there is limited linguistic support for it in existing smart contract languages.
         In this paper, we design a Solidity language extension ConSol that supports behavioral contracts. ConSol provides programmers with a modular specification and monitoring system for both functional and latent address behaviors. The key capability of ConSol is to attach specifications to first-class addresses and monitor violations when invoking these addresses. We evaluate ConSol using 20 real-world cases, demonstrating its effectiveness in expressing critical conditions and preventing attacks. Additionally, we assess ConSol's efficiency and compare gas consumption with manually inserted assertions, showing that our approach introduces only marginal gas overhead. By separating specifications and implementations using behavioral contracts, ConSol assists programmers in writing more robust and readable smart contracts.

AI Security

  • Pelican: Exploiting Backdoors of Naturally Trained Deep Learning Models In Binary Code Analysis
    Zhuo Zhang, Guanhong Tao, Guangyu Shen, Shengwei An, Qiuling Xu, Yingqi Liu, Yapeng Ye, Yaoxuan Wu, Xiangyu Zhang
    Proceedings of the 32nd USENIX Security Symposium (Security 2023)
    Anaheim, CA, August, 2023   [bibtex]
    Keywords: Binary Analysis, Deep Learning Security, Probabilistic Analysis
    Abstract:
         Deep Learning (DL) models are increasingly used in many cyber-security applications and achieve superior performance compared to traditional solutions. In this paper, we study backdoor vulnerabilities in naturally trained models used in binary analysis. These backdoors are not injected by attackers but rather products of defects in datasets and/or training processes. The attacker can exploit these vulnerabilities by injecting some small fixed input pattern (e.g., an instruction) called backdoor trigger to their input (e.g., a binary code snippet for a malware detection DL model) such that misclassification can be induced (e.g., the malware evades the detection). We focus on transformer models used in binary analysis. Given a model, we leverage a trigger inversion technique particularly designed for these models to derive trigger instructions that can induce misclassification. During attack, we utilize a novel trigger injection technique to insert the trigger instruction(s) to the input binary code snippet. The injection makes sure that the code snippets' original program semantics are preserved and the trigger becomes an integral part of such semantics and hence cannot be easily eliminated. We evaluate our prototype PELICAN on 5 binary analysis tasks and 15 models. The results show that PELICAN can effectively induce misclassification on all the evaluated models in both white-box and black-box scenarios. Our case studies demonstrate that PELICAN can exploit the backdoor vulnerabilities of two closed-source commercial tools.

  • On Large Language Models’ Resilience to Coercive Interrogation
    Zhuo Zhang, Guangyu Shen, Guanhong Tao, Siyuan Cheng, Xiangyu Zhang
    Proceedings of the 45th IEEE Symposium on Security and Privacy (Oakland 2024)
    San Francisco, CA, May, 2024   [code]   [website]
    Keywords: Jailbreaking, Model Alignment, Large Language Model
    Abstract:
         Large Language Models (LLMs) are increasingly employed in numerous applications. It is hence important to ensure that their ethical standard aligns with humans’. However, existing jail-breaking efforts show that such alignment could be compromised by well-crafted prompts. In this paper, we disclose a new threat to LLM alignment when a malicious actor has access to the top-k token predictions at each output position of the model, such as in all open-source LLMs and many commercial LLMs that provide the needed APIs (e.g., some GPT versions). It does not require crafting any prompt. Instead, it leverages the observation that even when an LLM declines a toxic query, the harmful response is concealed deep within the output logits. We can coerce the model to disclose it by forcefully using low-ranked output tokens during autoregressive output generation, and such forcing is only needed in a very small number of selected output positions. We call it model interrogation. Since our method operates differently from jail-breaking, it has better effectiveness than state-of-the-art jail-breaking techniques (92% versus 62%) and is 10 to 20 times faster. The toxic content elicited by our method is also of better quality. More importantly, it is complementary to jail-breaking, and a synergetic integration of the two exhibits superior performance over individual methods. We also find that with interrogation, harmful content can even be extracted from models customized for coding tasks.

Academic Awards

Selected Capture-The-Flag (CTF)

  • 1st place at Paradigm CTF 2023 (w/ Offside Labs)
  • 1st place at DEFCON CTF 2020 (w/ A*0*E)
  • 1st place at the 40th IEEE S&P Celebration Scavenger Hunt (solo)
  • 4th place at DEFCON CTF 2018 (w/ A*0*E)
  • 3rd place at DEFCON CTF 2017 (w/ A*0*E)

Selected Web3 Bug Bounties

  • Critical bug report for Anonymous Project, awarded $3,000 (my first web3 bug bounty)
  • Critical bug report for Duet Protocol, awarded $50,000
  • Critical bug report for Grizzly.fi, awarded $10,000
  • Critical bug report for ApeX Protocol, awarded ~$25,000
  • Critical bug report for Infinity NFT Marketplace, awarded $20,000
  • Critical bug reports for ENS, awarded ~$40,000
  • Program Committee Member
    USENIX Security Symposium, 2025
    The ACM Conference on Computer and Communications Security (CCS), 2024
    International Conference on Software Engineering (ICSE), 2025
    International Conference on Automated Software Engineering (ASE), 2024
    International Symposium on Software Testing and Analysis (ISSTA), 2024, 2025
    International Symposium on Research in Attacks, Intrusions and Defenses (RAID), 2024
    The ACM ASIA Conference on Computer and Communications Security (ASIACCS), 2024
    Workshop on Binary Analysis Research (BAR), 2022
  • Reviewer
    IEEE Transactions on Software Engineering
    IEEE Transactions on Information Forensics and Security
    IEEE/ACM Transactions on Networking
    The Association for Computational Linguistics (ACL) Rolling Review, 2023
  • Sub-reviewer
    USENIX Security Symposium
    IEEE Symposium on Security and Privacy (Oakland)
    The Network and Distributed System Security Symposium (NDSS)
    International Conference on Dependable Systems and Networks (DSN)
    International Conference on Automated Software Engineering (ASE)
    International Symposium on Software Testing and Analysis (ISSTA)
    International Symposium on the Foundations of Software Engineering (FSE)
    The ACM Conference on Computer and Communications Security (CCS)
    The ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA)

August 2017 - June 2019

Mentor: Anton Kochkov
Project: radeco
Radare2-based binary analysis framework

June 2016 - July 2017

Mentor: Sen Nie
Research Intern: Keen Security Lab of Tencent, Shanghai, China
FREE-FALL: HACKING TESLA FROM WIRELESS TO CAN BUS  [video]   [writeup]   [slides]