程序代写案例-AIT 681

欢迎使用51辅导,51作业君孵化低价透明的学长辅导平台,服务保持优质,平均费用压低50%以上! 51fudao.top
AIT 681
Secure Software Engineering
Topic #17. Program Analysis: Basics
Instructor: Dr. Kun Sun
1
In Last Class
• Topic #15. Secure C
oding: Return-to-Libc
Attacks
– [Du]: Chapter 5
• Topic #16. Dirty COW Race Condition Attack
– [Du]: Chapter 8
2
Program Analysis
• The process of automatically analyzing the behavior of
computer programs regarding a property such as
correctness, robustness, safety and liveness.
• Program analysis focuses on two major areas
– program optimization: improving the program’s
performance while reducing the resource usage
– program correctness: ensuring that the program
does what it is supposed to do.
• Program analysis can be performed without executing
the program (static program analysis), during runtime
(dynamic program analysis) or in a combination of
both.
3
Program Analysis on Security
• Program analysis in the context of identifying
security vulnerabilities and defending security
attacks
• Need for program analysis
– A software related vulnerability is essentially a bug
in the software
• Identification
– Defending software oriented attacks
• Software transformation
• Guidance for bug fixing
4
Learning Goal
• Understand various state-of-the-art program
analysis approaches
• Useful Textbooks on Compiler
– Alfred V. Aho, Monical S. Lam, Ravi Sethi, Jeffrey D.
Ullman, Compilers Principles, Techniques, & Tools.
– Torben Ægidius Mogensen , Basics of Compiler Design,
http://hjemmesider.diku.dk/~torbenm/Basics/
5
Program Representation
Why Program Representations
• Original representations
– Source code (cross languages).
– Binaries (cross machines and platforms).
– Source code / binaries + test cases.
• They are hard for machines to analyze.
• Software is translated into certain
representations before analyses are applied.
7
Outline
• Control flow graphs
• Program dependence graphs
• Super control flow graphs
• Call graph
8
Control Flow Graph
• The most commonly used program
representation.
9
Program representation: Basic blocks
• A basic block in program P is a sequence of
consecutive statements with a single entry and a single
exit point. Thus, a block has unique entry and exit
points.
• Control always enters a basic block at its entry point
and exits from its exit point. There is no possibility of
exit or a halt at any point inside the basic block except
at its exit point.
• The entry and exit points of a basic block coincide
when the block contains only one statement.
10
Basic blocks: Example
Example: Computing x raised to y
11
Basic blocks: Example (contd.)
Basic blocks
12
Building Basic Blocks
• Identify leaders
– The first instruction in a procedure, or
– The target of any branch, or
– An instruction immediately following a
branch (implicit target)
• Gobble all subsequent instructions until the
next leader
13
Control Flow Graph (CFG)
• A control flow graph (or flow graph) G is
defined as a finite set N of nodes and a finite
set E of edges.
– An edge (i, j) in E connects two nodes ni
and nj in N.
– We often write G= (N, E) to denote a flow
graph G with nodes given by N and edges
by E.
14
Control Flow Graph (CFG)
• In a flow graph of a program, each basic block
becomes a node and edges are used to indicate
the flow of control between blocks.
• A directed edge (i, j) connecting basic blocks bi
and bj implies that control can go from block bi
to block bj.
• We also assume that there is a node labeled
Start in N that has no incoming edge, and
another node labeled End, also in N, that has
no outgoing edge.
15
CFG Example
N={Start, 1, 2, 3, 4, 5, 6, 7,
8, 9, End}
E={(Start,1), (1, 2), (1, 3), (2,4),
(3, 4), (4, 5), (5, 6), (6, 5),
(5, 7), (7, 8), (7, 9), (9, End)}
16
CFG Example
N={Start, 1, 2, 3, 4, 5, 6, 7,
8, 9, End}
E={(Start,1), (1, 2), (1, 3), (2,4),
(3, 4), (4, 5), (5, 6), (6, 5),
(5, 7), (7, 8), (7, 9), (9, End)}
Same CFG with statements
removed.
17
Building a CFG From Basic Block
• Construction
– Each CFG node represents a basic block
– There is an edge from node i to j if
• Last statement of block i branches to the
first statement of j, or
• Block i does not end with an
unconditional branch and is immediately
followed in program order by block j (fall
through)
18
Paths
19
Consider a flow graph G= (N, E). A sequence of k edges, k>0,
(e_1, e_2, … e_k) , denotes a path of length k through the flow
graph if the following sequence condition holds.
Given that np, nq, nr, and ns are nodes belonging to N,
and 0< inr. }
Complete path: a path from start to exit
Subpath: a subsequence of a complete path
Sample Paths
20
p1= ( Start, 1, 2, 4, 5, 6, 5, 7, 9, End)
p2= (Start, 1, 3, 4, 5, 6, 5, 7, 9, End)
Two feasible and complete paths:
Bold edges: complete path.
Dashed edges: subpath.
p1= ( (Start, 1), (1, 2), (2, 4), (4, 5), (5,
6), (6, 5), (5, 7), (7, 9), (9, End))
Specified unambiguously using edges:
Paths: (in)feasible paths
p1= ( Start, 1, 3, 4, 5, 6, 5, 7, 8, 9, End)
p2= (Start, 1, 2, 4, 5, 7, 9, End)
A path p through a flow graph for
program P is considered feasible if
there exists at least one test case which
when input to P causes p to be
traversed.
21
Number of paths
• There can be many distinct paths
through a program. A program with
no condition statements contains
exactly one path that begins at node
Start and terminates at node End.
• Each additional condition in the
program can increases the number of
distinct paths by at least one.
• Depending on their location,
conditions can have a multiplicative
effect on the number of paths.
22
Path Explosion
1. while (p1) {
2. if (p2) {
3. continue;
4. } else {
5. if (p3)
6. continue;
7. s1
8. }
9.}
23
A Simplified Version of CFG
• Each statement is represented by a node
– For readability.
– Not for efficient implementation.
24
Dominator
• X dominates Y if all possible program paths from
START to Y have to pass X.
1: sum=0
2: i=1
3: while ( i4: i=i+1
5: sum=sum+i
endwhile
6: print(sum)
3: while ( i1: sum=0
2: i=1
4: i=i+1
5: sum=sum+i
6: print (sum)DOM(6)={1,3, 6}
25
Note that a basic block is identified by the first statement in the block.
Dominator
• X strictly dominates Y if X dominates Y and X!=Y
1: sum=0
2: i=1
3: while ( i4: i=i+1
5: sum=sum+i
endwhile
6: print(sum)
3: while ( i1: sum=0
2: i=1
4: i=i+1
5: sum=sum+i
6: print (sum)SDOM(6)={1,3}
26
Dominator
• X is the immediate dominator of Y if X is the last
dominator of Y along a path from Start to Y.
1: sum=0
2: i=1
3: while ( i4: i=i+1
5: sum=sum+i
endwhile
6: print(sum)
3: while ( i1: sum=0
2: i=1
4: i=i+1
5: sum=sum+i
6: print (sum)IDOM(6)={3}
27
Is it possible X is not the last dominator of Y along another path?
Postdominator
• X post-dominates Y if every possible program path from Y to
End has to pass X.
– Strict post-dominator, immediate post-dominance.
1: sum=0
2: i=1
3: while ( i4: i=i+1
5: sum=sum+i
endwhile
6: print(sum)
3: while ( i1: sum=0
2: i=1
4: i=i+1
5: sum=sum+i
6: print (sum)PDOM(4)={3,6}
SPDOM(4)={3,6}
IPDOM(4)=3
28
Back Edges
• A back edge is an edge whose head dominates its tail
– Back edges often identify loops
3: while ( i1: sum=0
2: i=1
4: i=i+1
5: sum=sum+i
6: print (sum)
An directed edge (x, y) is considered to be directed from x to y; y is
called the head and x is called the tail of the arrow;
29
Outline
• Control flow graphs
• Program dependence graphs
• Super control flow graphs
• Call graph
30
Program Dependence Graph
• The second widely used program representation.
• Nodes are constituted by statements instead of basic
blocks.
• Two types of dependences between statements
– Data dependence
– Control dependence
• CFG focuses on the flow of control; PDG focuses on
dependences.
• Mainly used in debugging, security.
31
Data Dependence
• X is data dependent on Y if (1) there is a variable v that is
defined at Y and used at X and (2) there exists a path of
nonzero length from Y to X along which v is not re-defined.
3: while ( i1: sum=0
2: i=1
4: i=i+1
5: sum=sum+i
6: print (sum)
32
6 is data dependent on 5, and 1.
5 is data dependent on 1, and 4.
5 is not data dependent on 2,
since i is re-defined at 4.
Computing Data Dependence is Hard in General
• Aliasing
– A variable can refer to multiple memory locations/objects.
1: int x, y, z …;
2: int * p;
3: x=…;
4: y=…;
5: p = & x;
6: p = p +z;
7: … = *p;
33
Control Dependence
• Intuitively, Y is control-dependent on X iff X directly
determines whether Y executes (statements inside one branch
of a predicate are usually control dependent on the predicate)
– X is not strictly post-dominated by Y
– there exists a path from X to Y s.t. every node in the path other than X
and Y is post-dominated by Y
X
Y
Not post-dominated by Y
Every node is post-dominated by Y
There is a path from X to End that
does not pass Y or X==Y
No such paths for nodes in a path
between X and Y.
34
Control Dependence
• Intuitively, Y is control-dependent on X iff X directly
determines whether Y executes (statements inside one branch
of a predicate are usually control dependent on the predicate)
• The intuition:
1) X can control not to execute Y;
2) X can direct the control along a path that Y is promised to
be executed.
X
Y
Not post-dominated by Y
Every node is post-dominated by Y
35
Control Dependence - Example
1: sum=0
2: i=1
3: while ( i4: i=i+1
5: sum=sum+i
endwhile
6: print(sum)
3: while ( i1: sum=0
2: i=1
4: i=i+1
5: sum=sum+i
6: print (sum)CD(5)=3
CD(3)=3, tricky!
Y is control-dependent on X iff X directly determines whether
Y executes
X is not strictly post-dominated by Y
there exists a path from X to Y s.t. every node in the path
other than X and Y is post-dominated by Y
36
Control Dependence is not Syntactically Explicit
1: sum=0
2: i=1
3: while ( i4: i=i+1
5: if (i%2==0)
6: continue;
7: sum=sum+i
endwhile
8: print(sum)
3: while ( i1: sum=0
2: i=1
4: i=i+1
5: if (i%2==0)
8: print (sum)
7: sum=sum+i
Y is control-dependent on X iff X directly determines whether
Y executes
X is not strictly post-dominated by Y
there exists a path from X to Y s.t. every node in the path
other than X and Y is post-dominated by Y
37
CD(7) = 5
“Continue” will change the control dependency
Control Dependence is Tricky!
Can a statement control depend on two
predicates?
Y is control-dependent on X iff X directly determines whether
Y executes
X is not strictly post-dominated by Y
there exists a path from X to Y s.t. every node in the path
other than X and Y is post-dominated by Y
38
Control Dependence is Tricky!
1: if ( p1 || p2 )
2: s1;
3: s2;
1: ? p1
Can a statement control depend on two
predicates?
1: ? p2
2: s1
3: s2
What if ?
1: if ( p1 && p2 )
2: s1;
3: s2;
Y is control-dependent on X iff X directly determines whether
Y executes
X is not strictly post-dominated by Y
there exists a path from X to Y s.t. every node in the path
other than X and Y is post-dominated by Y
39
The Use of PDG
• A program dependence graph consists of control
dependence graph and data dependence graph
• Why it is so important to software reliability?
– In debugging, what could possibly induce the failure?
– In security, sensitive data leakage
p=getpassword( );

if (p==“zhang”) {
send (m);
}
p=getpassword( );

send (p);
40
Outline
• Control flow graphs
• Program dependence graphs
• Super control flow graphs
• Call graph
41
Super Control Flow Graph (SCFG)
• Also called interprocedural control flow graph (ICFG).
• Besides the normal intraprocedural control flow graph,
additional edges are added connecting
– Each call site to the beginning of the procedure it calls.
– The return statement back to the call site.
1: for (i=0; i2: t1= f(0);
3: t2 = f(243);
4: x[i] = t1 + t2 + t3;
5: }
6: int f (int v) {
7: return (v+1);
8: }
1
2
3
4
7
42The blue edges are called the calling edges. The green edges are called the return edges.
Outline
• Control flow graphs
• Program dependence graphs
• Super control flow graphs
• Call graph
43
Call Graph (CG)
• Each node represents a function; each edge
represents a function invocation
void A( ) {
B( );
C( );
}
void C ( ) {
D( );
A( );
}
void B( ) {
L1: D( );
L2: D( );
}
void D ( ) {
}
A
CB
D
44
Use of CG
• CFI (control flow integrity)
• Android framework access control
inconsistencies
• Hidden behavior detection in Android apps
45
Many Other Representations
• Points-to Graph.
• Static Single Assignment (SSA).
• Code Property Graph (CPG)
46
Tools
• C/C++: LLVM, CIL
• Java: SOOT, Wala
• Binary: Valgrind, Pin, DynamoRIO
47
Next Class
• Topic #18. Program Analysis: Dynamic Program
Analysis
– Program Tracing
– Dynamic Slicing
48

欢迎咨询51作业君
51作业君

Email:51zuoyejun

@gmail.com

添加客服微信: Fudaojun0228