CSE6350 Advanced Topics in Computer Architecture

Fall 2012

Monday & Wednesday, 4:00 – 5:30 PM, WH 208

 

 

Instructor:             Hao Che

Email:                    hche@cse.uta.edu

GTA:                      TBD

GTA email:           TBD

 

 

Office Location:  ERB536

Office Hours:       Monday & Wednesday, 2:00 – 3:50 PM

                                or by appointment

Phone:                   817.272.3631

Fax:                        817.272.3784

Required Text:

The main references of this course will include handouts and presentation slides prepared by the instructor. The following textbook is recommended as a reference book: 

 

Computer Architecture: A Quantitative Approach by John Hennessy and David Patterson, Morgan Kaufmann Publishers, ISBN: 978-0-12-383872-8

 

 

Prerequisites:

CSE5350 and CSE5344, or consent of the instructor.

 

Course Objectives and Outcomes:

 

This is a 6000-level course designed for students in both networking and system tracks. Accordingly, it covers two related subjects. The first subject addresses a major Internet router implementation challenge: how to program router interface cards using state-of-the-art multithreaded multicore processors, or chip multiprocessors (CMPs), to achieve high-speed forwarding performance. This subject is a sequel to CSE5344 and also serves as a motivating case study for the second subject. The second subject provides in-depth coverage of emerging CMP architectures and is a sequel to CSE5350. The aim is not only to cover known facts but also to stimulate research interest in the fundamental challenges facing the design and programming of CMPs.

As the number of cores in a CMP keeps increasing, designing and programming CMPs to achieve the desired performance for various workloads becomes a challenge. Traditional uniprocessor analysis approaches, such as benchmark testing and cycle-accurate simulation, quickly become ineffective as the core count grows. To tackle this challenge, this course introduces a novel thread-level analysis methodology for large design space exploration of CMP architectures. In particular, it presents initial results, based on this methodology, on performance bound analysis, bottleneck resource identification, simulation, and analytical modeling techniques, all at the thread level. Students will further explore these techniques and apply them to the analysis of various aspects of CMP architectures in a term project.

 

Grading policy:

 

There will be 5 quizzes throughout the semester, each announced at least one week in advance (there will be no pop quizzes). The quiz with the lowest score will not be counted toward the final grade. There will also be a research-oriented term project.

 

Table of Contents:

1.       Probability, Stochastic Process, and Queuing Theory

2.       Event-Driven Simulation Basics

3.       Fundamentals of Chip Multiprocessor

a.       Background

                                                            i.      Single Issue

                                                           ii.      Superscalar

                                                         iii.      Simultaneous Multithreading

                                                         iv.      Multithreaded Multicore

b.       Fundamentals

                                                            i.      On-Chip Networks for Multicore Systems

                                                           ii.      Tiled Multicore Processors

                                                         iii.       General-Purpose Multicore Processors

                                                         iv.      Throughput-Oriented Multicore Processors

                                                          v.      Stream Processors

                                                         vi.      Speculative Multithreading

                                                       vii.      Memory Transactions for Multicore Systems

 

4.       A Thread-Level Analysis Methodology for CMP

a.       Methodology

                                                            i.      Generic CMP Organization

                                                           ii.      Code Path

                                                         iii.      Design Space

b.       Fast Performance Bound Estimation

                                                            i.      Instruction Budget Estimation

                                                           ii.      ALU Work Conserving Condition

                                                         iii.      Conditions and Worst-Case Bounds for Wire-Speed Processing

c.        Large Design Space Exploration Based on Function Analysis

                                                            i.      Resource Bottleneck Identification

1)       Operational Analysis Basics

2)       Single Core

3)       Multicore

                                                           ii.      Stochastic Models

1)       Single Core

2)       Multicore

5.       A Case Study: Router Interface Card Programming

a.       System Overview

                                                            i.      IP Networking Architecture Overview

                                                           ii.      Router Architecture

                                                         iii.      Network Interface Card

                                                         iv.      Network Processor and TCAM Coprocessor

b.       Protocol Stacks and Data Formats

                                                            i.      Data Link Layer: POS, Ethernet, VLAN, AAL5, PPP, ARP

                                                           ii.      Internetworking Layer: IP, IS-IS, OSPF, BGP, RIP, MPLS, DiffServ

                                                         iii.      Transport Layer: TCP, UDP, ICMP

c.        Data Plane and Control Plane Separation

                                                            i.      Routing and Forwarding

                                                           ii.      Multipath Routing and Equal-Cost Load Balancing

                                                         iii.      MPLS Signaling and Label Swapping

                                                         iv.      DiffServ Policy Control and Traffic Conditioning

                                                          v.      Issues with PPP, ICMP, and ARP Protocols

d.       Data Plane Function Partitioning

                                                            i.      Ingress or Egress

                                                           ii.      Fast Path or Slow Path

e.        Data Plane Implementation Using Network Processors and TCAM Coprocessors

                                                            i.      Function Requirements

                                                           ii.      Network Processor Organization

                                                         iii.      Resource Mapping and Data Structures

                                                         iv.      Instruction Set

                                                          v.      Microcode Programming