CSE6350: Advanced Topics in Computer Architecture
Fall 2012
Instructor: Hao Che
Email: hche@cse.uta.edu
GTA: TBD
GTA email: TBD
Office Location: ERB536
Office Hours: Monday & Wednesday, 2:00 – 3:50pm, or by appointment
Phone: 817.272.3631
Fax: 817.272.3784
Required Text:
The main references for this course will include handouts and presentation slides prepared by the instructor. The following textbook is recommended as a reference book:
Computer Architecture: A Quantitative Approach by John Hennessy and David Patterson, Morgan Kaufmann Publishers, ISBN: 978-0-12-383872-8
Prerequisites:
CSE5350 and CSE5344, or consent of instructor.
Course Objectives and Outcomes:
This is a 6000-level course designed for students in both the networking and systems tracks. Accordingly, it covers two related subjects.

The first subject addresses a major implementation challenge for Internet routers: how to program router interface cards using state-of-the-art multithreaded multicore processors, or chip multiprocessors (CMPs), to achieve high-speed forwarding performance. This subject is a sequel to CSE5344. It also serves as a motivating case study for the second subject.

The second subject provides in-depth coverage of emerging CMP architectures and is a sequel to CSE5350. The aim is not only to cover known facts but also to stimulate research interest in the fundamental challenges facing the design and programming of CMPs. As the number of cores in a CMP keeps increasing, it becomes a challenge to design and program CMPs that achieve the desired performance for various workloads. Clearly, traditional uniprocessor analysis approaches, such as benchmark testing and cycle-accurate simulation, quickly become ineffective as the number of cores grows.

To tackle this challenge, this course will introduce a novel thread-level analysis methodology for large design space exploration of CMP architectures. In particular, the course will present initial results, based on this methodology, in four areas, all at the thread level:
- performance bound analysis
- bottleneck resource identification
- simulation
- analytical modeling

In a term project, students will explore these techniques further and apply them to the analysis of various aspects of CMP architectures.
Grading policy:
There will be 5 quizzes throughout the semester. Each quiz will be announced at least one week in advance (NO pop quizzes). The quiz with the lowest score will NOT be counted toward the final grade. There will also be a research-oriented term project.
Table of Contents:
1. Probability, Stochastic Processes, and Queuing Theory
2. Event-Driven Simulation Basics
3. Fundamentals of Chip Multiprocessors
   a. Background
      i. Single Issue
      ii. Superscalar
      iii. Simultaneous Multithreading
      iv. Multithreaded Multicore
   b. Fundamentals
      i. On-Chip Networks for Multicore Systems
      ii. Tiled Multicore Processors
      iii. General-Purpose Multicore Processors
      iv. Throughput-Oriented Multicore Processors
      v. Stream Processors
      vi. Speculative Multithreading
      vii. Memory Transactions for Multicore Systems
4. A Thread-Level Analysis Methodology for CMPs
   a. Methodology
      i. Generic CMP Organization
      ii. Code Path
      iii. Design Space
   b. Fast Performance Bound Estimation
      i. Instruction Budget Estimation
      ii. ALU Work-Conserving Condition
      iii. Conditions and Worst-Case Bounds for Wire-Speed Processing
   c. Large Design Space Exploration Based on Function Analysis
      i. Resource Bottleneck Identification
         1) Operational Analysis Basics
         2) Single Core
         3) Multicore
      ii. Stochastic Models
         1) Single Core
         2) Multicore
5. A Case Study: Router Interface Card Programming
   a. System Overview
      i. IP Networking Architecture Overview
      ii. Router Architecture
      iii. Network Interface Card
      iv. Network Processor and TCAM Coprocessor
   b. Protocol Stacks and Data Formats
      i. Data Link Layer: POS, Ethernet, VLAN, AAL5, PPP, ARP
      ii. Internetworking Layer: IP, IS-IS, OSPF, BGP, RIP, MPLS, DiffServ
      iii. Transport Layer: TCP, UDP, ICMP
   c. Data Plane and Control Plane Separation
      i. Routing and Forwarding
      ii. Multipath Routing and Equal-Cost Load Balancing
      iii. MPLS Signaling and Label Swapping
      iv. DiffServ Policy Control and Traffic Conditioning
      v. Issues with the PPP, ICMP, and ARP Protocols
   d. Data Plane Function Partitioning
      i. Ingress or Egress
      ii. Fast Path or Slow Path
   e. Data Plane Implementation Using Network Processors and TCAM Coprocessors
      i. Function Requirements
      ii. Network Processor Organization
      iii. Resource Mapping and Data Structures
      iv. Instruction Set
      v. Microcode Programming