CSE6350 Advanced Topics in Computer Architecture

Fall 2012

Monday & Wednesday, 4:00 – 5:30 PM, WH 208

Instructor: Hao Che

Email: hche@cse.uta.edu

GTA: TBD

GTA email: TBD

Office Location: ERB536

Office Hours: Monday & Wednesday, 2:00 – 3:50 PM, or by appointment

Phone: 817.272.3631

Fax: 817.272.3784

Required Text:

The main course materials will be handouts and presentation slides prepared by the instructor. The following textbook is recommended as a reference:

Computer Architecture: A Quantitative Approach by John Hennessy and David Patterson, Morgan Kaufmann Publishers, ISBN: 978-0-12-383872-8

Prerequisites:

CSE5350 and CSE5344 or consent of instructor.

Course Objectives and Outcomes:

This is a 6000-level course designed for students in both the networking and system tracks. Accordingly, it covers two related subjects. The first subject addresses a major implementation challenge for Internet routers: how to program router interface cards using state-of-the-art multithreaded multicore processors, or chip multiprocessors (CMPs), to achieve high-speed forwarding performance. This subject is a sequel to CSE5344, and it also serves as a motivating case study for the second subject. The second subject, a sequel to CSE5350, provides in-depth coverage of emerging CMP architectures. The aim is not only to cover known results but also to stimulate research interest in the fundamental challenges facing the design and programming of CMPs.

As the number of cores per CMP keeps increasing, designing and programming CMPs to achieve the desired performance for various workloads becomes a challenge. Traditional uniprocessor analysis approaches, such as benchmark testing and cycle-accurate simulation, quickly become ineffective as the core count grows. To tackle this challenge, this course will introduce a novel thread-level analysis methodology for large design space exploration of CMP architectures. In particular, it will present initial results, based on this methodology, on performance bound analysis, bottleneck resource identification, simulation, and analytical modeling techniques, all at the thread level. Students will further explore these techniques and apply them to the analysis of various aspects of CMP architectures in a term project.

Grading policy:

There will be 5 quizzes throughout the semester, each announced at least one week in advance (NO pop quizzes). The lowest quiz score will NOT be counted toward the final grade. There will also be a research-oriented term project.

Quiz: 40%
Term Project: 60%

Table of Contents:

1. Probability, Stochastic Process, and Queuing Theory

2. Event-Driven Simulation Basics

3. Fundamentals of Chip Multiprocessors

a. Background

i. Single Issue

ii. Superscalar

iii. Simultaneous Multithreading

iv. Multithreaded Multicore

b. Fundamentals

i. On-Chip Networks for Multicore Systems

ii. Tiled Multicore Processors

iii. General-Purpose Multicore Processors

iv. Throughput-Oriented Multicore Processors

v. Stream Processors

vi. Speculative Multithreading

vii. Memory Transactions for Multicore Systems

4. A Thread-Level Analysis Methodology for CMPs

a. Methodology

i. Generic CMP Organization

ii. Code Path

iii. Design Space

b. Fast Performance Bound Estimation

i. Instruction Budget Estimation

ii. ALU Work Conserving Condition

iii. Conditions and Worst-Case Bounds for Wire-Speed Processing

c. Large Design Space Exploration Based on Function Analysis

i. Resource Bottleneck Identification

1) Operational Analysis Basics

2) Single Core

3) Multicore

ii. Stochastic Models

1) Single Core

2) Multicore

5. A Case Study: Router Interface Card Programming

a. System Overview

i. IP Networking Architecture Overview

ii. Router Architecture

iii. Network Interface Card

iv. Network Processor and TCAM Coprocessor

b. Protocol Stacks and Data Formats

i. Data Link Layer: POS, Ethernet, VLAN, AAL5, PPP, ARP

ii. Internetworking Layer: IP, ICMP, IS-IS, OSPF, BGP, RIP, MPLS, DiffServ

iii. Transport Layer: TCP, UDP

c. Data Plane and Control Plane Separation

i. Routing and Forwarding

ii. Multipath Routing and Equal-Cost Load Balancing

iii. MPLS Signaling and Label Swapping

iv. DiffServ Policy Control and Traffic Conditioning

v. Issues with the PPP, ICMP, and ARP Protocols

d. Data Plane Function Partitioning

i. Ingress or Egress

ii. Fast Path or Slow Path

e. Data Plane Implementation Using Network Processors and TCAM Coprocessors

i. Functional Requirements

ii. Network Processor Organization

iii. Resource Mapping and Data Structures

iv. Instruction Set

v. Microcode Programming