1Dataflow Analysis in Practice:
Program Analysis Frameworks,
Analysis Scope and Approximation 1 Announcements n HW1 due today n HW2 posted n Your task is to set the infrastructure locally n Start as soon as you can
n Ask questions on Submitty, in class, in office
hours n Run Soot on toy programs and study Jimple IR CSCI 4450/6450, A Milanova 2 2 2So Far and Moving On… n Dataflow analysis n Four classical dataflow problems n Dataflow frameworks n CFGs, lattices, transfer functions and properties,
worklist algorithm, MFP vs. MOP solutions n Non-distributive analysis n Constant propagation n Points-to analysis (will cover in catchup week!) n Program analysis in practice CSCI 4450/6450, A Milanova 3 3 Outline of Today’s Class n Constant propagation (recap) n Program analysis in practice n Program analysis frameworks n Soot program analysis framework n Ghidra framework n Analysis scope and approximation n Class analysis CSCI 4450/6450, A Milanova 4 4 3Constant Propagation fits into
Monotone Dataflow Framework n Property space n Product lattice L = Lx x Ly x … x Lz n L satisfies the ACC and
n Function space F: Là L is monotone n Thus, analysis fits into the monotone
dataflow framework and can be solved using
the worklist algorithm CSCI 4450/6450, A Milanova 5 T T … -2
-1
0
1
2
... Lx: 5 Example CSCI 4450/6450, A Milanova 6 2. x=1 y=2 3. x=2 y=1 4. z=x+y 1.
if (b>0) 5. w=10*z out(2):
out(3):
in(4):
out(4):
in(5):
in(1) is T =
6 4Constant Propagation is
Monotone but Not Distributive!
n f4(f2(f1(T))) computes zà 3 n f4(f3(f1(T))) computes zà 3 n Thus, MOP at 5 f4(f2(f1(T))) V f4(f3(f1(T))) computes zà 3 MFP at 5 computes zà T (i.e., z is NOT a const)
7 2. x=1 y=2 3. x=2 y=1 4. z=x+y 1.
if (b>0) 5. w=10*z out(2): xà1, yà2 out(3): xà2, yà1 in(4): xà T, yà T
out(4): zà T
in(5): zà T in(1) is T 7 More Product Lattices n Problem statement: Is integer variable x odd
or even at program point n? n Lx: CSCI 4450/6450, A Milanova (Example program from MIT OCW Program Analysis) 8 x=x+1 y=y+2 … if (x≥10) T F xà T, yà T
y=0 T T odd
even
xà T, yà even
xà T, yà even
xà T, yà even
8 5More Product Lattices n Problem statement: What sign does a
variable hold at a given program point, i.e., is
it positive, negative, or 0 n Lx: E.g., < xà+,yàT,zà0 > CSCI 4450/6450, A Milanova 9 T T +
0
- 9 So far and moving on n Intraprocedural dataflow analysis n CFGs, lattices, transfer functions, worklist
algorithm, etc. n Classical analyses n Interprocedural analysis n Analysis scope and approximation CSCI 4450/6450, A Milanova 10 10 6Program Analysis in Practice n Program analysis frameworks n LLVM n Ghidra n Soot n WALA, other CSCI 4450/6450, A Milanova 11 11 Soot: a framework for analysis and
optimization of Java/Dalvik bytecode n https://soot-oss.github.io/soot/ n History n Overview of Soot n From Java bytecode/Dalvik bytecode to typed 3-address code (Jimple) n 3-address code analysis and optimization n From Jimple to Java/Dalvik n Jimple n Analysis 12 12 7History n https://soot-oss.github.io/soot/ n Started by Prof. Laurie Hendren at McGill n First paper on Soot came in 1999 n Patrick Lam n Ondřej Lhoták n Eric Bodden n and other… n Now developed by Eric Bodden and his
group: https://github.com/soot-oss/soot 13CSCI 4450/6450, A Milanova 13 Overview of Soot Class files/APK JIMPLIFY Some IR ANALYSIS/ OPTIMIZATION Optimized jimple Class files/APK 14CSCI 4450/6450, A Milanova 14 8Advantages of Jimple and Soot n Jimple n Typed local variables n 16 simple 3-address statements (1 operator per
statement). Bridges gap from analysis
abstraction to analysis implementation n Soot provides n Itraprocedural dataflow analysis framework n Points-to analysis for Java n IR from Dalvik and taint analysis n Other analyses and optimizations 15 15 n Run soot: java soot.Main –jimple A
(need paths) public class A { main(String[] args) { A a = new A(); a.m(); } public void m() { }
} public class A extends java.lang.Object
{
public void () {
A r0; r0 := @this: A; specialinvoke r0. ()>(); return;
} … (continues on next slide…) 16 Jimple CSCI 4450/6450, A Milanova 16 9public class A { main(String[] args) { A a = new A(); a.m(); } public void m() { }
} … public void m() { A r0; r0 := @this: A; return; }
… Jimple CSCI 4450/6450, A Milanova 17 Java: Jimple: 17 public class A { main(String[] args) { A a = new A(); a.m(); } public void m() { }
} … main(java.lang.String[]) { java.lang.String[] r0; A $r1, r2; r0 := @parameter0: java.lang.String[]; $r1 = new A; specialinvoke $r1.()>(); r2 = $r1; virtualinvoke r2.(); return;
}
} Jimple CSCI 4450/6450, A Milanova 18 Java: Jimple: 18 10 n Abstracts program constructs n Some basic Soot classes and interfaces
n SootClass n SootMethod n SootMethod sm; sm.isMain(), sm.isStatic(), etc. n Local n Local l; … l.getType() n InstanceInvokeExpr n Represents an instance (as opposed to static) invoke
expression n InstanceInvokeExpr iie; … receiver = iie.getBase(); Soot Abstractions. Look up API! CSCI 4450/6450, A Milanova 19 19 n Github project:
https://github.com/soot-oss/soot n Javadoc:
https://soot-build.cs.uni-paderborn.de/public/origin/develop/soot/soot- develop/jdoc/ (old) https://javadoc.io/doc/ca.mcgill.sable/soot/latest/index.html Resources CSCI 4450/6450, A Milanova 20 20 11 n Constructor/Super Call: n Virtual Call: n Static Call: n Interface Call: 1. We should not need to worry about dynamicInvoke. (Soot
does support it.) A a = new A(); $r1 = new A; specialinvoke $r1.()>(); a.m(); virtualinvoke r2.(); sm(); staticinvoke (); interfaceinvoke r0.();x.m(); 4 Kinds of Calls1 21 21 Outline of Today’s Class n Program analysis in practice n Program analysis frameworks n Soot program analysis framework n Ghidra framework n Analysis scope and approximation n Overview of class analysis framework (HW2) n Class analysis CSCI 4450/6450, A Milanova 22 22 12 Analysis Scope n Intraprocedural analysis n Scope is the CFG of a single subroutine n Assumes no call and returns in routine, or
models calls and returns n What we did so far n Interprocedural analysis n Scope of analysis is the ICFG (Interprocedural
CFG), which models flow of control between
routines CSCI 4450/6450, A Milanova 23 23 Analysis Scope n Whole-program analysis n Usually, assumes entry point “main” n Application code + libraries n Intricate interdependences, e.g., Android apps n Modular analysis n Scope either a library without entry point n or application code with missing libraries n … or a library that depends on other missing
libraries
CSCI 4450/6450, A Milanova 24 24 13 Approximations n Once we tackle the “whole program”
maintaining a solution per program point (i.e.,
in(j) and out(j) sets) becomes too expensive n Approximations n Transfer function space n Lattice
n Context sensitivity n Flow sensitivity CSCI 4450/6450, A Milanova 25 25 Context Sensitivity n So far, we studied intraprocedural analysis
n Once we extend to interprocedural analysis
the issue of “context sensitivity” comes up n Interprocedural analysis can be context- insensitive or context-sensitive n In our Java homework, we’ll work with context- insensitive analyses n We’ll talk more about context-sensitive analysis 26CSCI 4450/6450, A Milanova 26 14 Context Insensitivity n Context-insensitive analysis makes one big
CFG; reduces the problem to standard
dataflow, which we know how to solve n Treats implicit assignment of actual-to- parameter and return-to-left_hand_side as
explicit assignment n E.g., x = id(y) where id: int id(int p) { return p; } adds p = y // flow of values from arg to param and
x = ret // flow of return to left_hand_side n Can be flow-sensitive or flow-insensitive 27 27 Context Insensitivity int id(int p) { return p;
} a = 5; 2: b = id(a); x = b*b; c = 6; 5: d = id(c);
CSCI 4450/6450, A Milanova 28 1. a = 5 2. p = a call id 3. return id b = ret 5. p = c call id 6. return id d = ret 4. x = b*b c = 6 7. entry id
8. ret = p
9. exit id
28 15 Flow Sensitivity n Flow-sensitive vs. flow-insensitive analysis n Flow-sensitive analysis maintains the CFG
and computes a solution per each node in
CFG (i.e. each program point) n Standard dataflow analysis is flow-sensitive n For large programs, maintaining CFG and
solution per program point does not scale CSCI 4450/6450, A Milanova 29 29 Flow Insensitivity n Flow-insensitive analysis discards CFG
edges and computes a single solution S n A “declarative” definition, i.e., specification: n Least solution S of equations S = fj(S) V S n Points-to analysis is an example where such a
solution makes sense! CSCI 4450/6450, A Milanova 30 30 16 Flow Insensitivity n An “operational” definition. A worklist
algorithm: S = 0, W = { 1, 2, … n } /* all nodes */ while W ≠ Ø do { remove j from W S = fj(S) V S if S changed then
W = W U { k | k is ”successor” of j }
} n “successor” is not CFG successor nodes, but
more generally, nodes k whose transfer
function fk may be affected as a result of the
change in S by j 31 31 Your Homework n A bunch of flow-insensitive, context- insensitive analyses for Java n RTA, 0-CFA, PTA, other n Simple property space n Simple transfer functions n E.g., in fact, RTA gets rid of most CFG nodes,
processes just 2 kinds of nodes! n Millions of lines of code in seconds CSCI 4450/6450, A Milanova 32 32 17 Homework n Install and run starter code n Please let me as soon as possible if you have
issues n Frameworks are very fragile. They anger a lot n Look into your git_repo/sootOutput directory
and study Jimple n Study framework code and API n Soot API n Class analysis framework API CSCI 4450/6450, A Milanova 33 33 Homework n Overview of class analysis framework n We’ll discuss more on Thursday n Come prepared with questions CSCI 4450/6450, A Milanova 34 34