1.1
| Network Systems And The Internet   1
|
1.2
| Applications Vs. Infrastructure   1
|
1.3
| Network Systems Engineering   2
|
1.4
| Packet Processing   2
|
1.5
| Achieving High Speed   3
|
1.6
| Network Speed   3
|
1.7
| Hardware, Software, And Hybrids   4
|
1.8
| Scope And Organization Of The Text   5
|
1.9
| Summary   5
|
| For Further Study   5
|
2.1
| Introduction   7
|
2.2
| Networks And Packets   7
|
2.3
| Connection-Oriented And Connectionless Paradigms   8
|
2.4
| Digital Circuits   8
|
2.5
| LAN And WAN Classifications   9
|
2.6
| The Internet And Heterogeneity   9
|
2.7
| Example Network Systems   9
|
2.8
| Broadcast Domains   10
|
2.9
| The Two Key Systems Used In The Internet   11
|
2.10
| Other Systems Used In The Internet   12
|
2.11
| Monitoring And Control Systems   13
|
2.12
| Summary   13
|
| For Further Study   13
|
3.1
| Introduction   15
|
3.2
| Protocols And Layering   15
|
3.3
| Layers 1 And 2 (Physical And Network Interface)   17
|
| 3.3.1
| Ethernet   17
|
| 3.3.2
| Ethernet Frame Format   17
|
| 3.3.3
| Ethernet Addresses   18
|
| 3.3.4
| Ethernet Type Field   19
|
3.4
| Layer 3 (Internet)   19
|
| 3.4.1
| The Internet Protocol   19
|
| 3.4.2
| IP Datagram Format   19
|
| 3.4.3
| IP Addresses   20
|
3.5
| Layer 4 (Transport)   20
|
| 3.5.1
| UDP And TCP   20
|
| 3.5.2
| UDP Datagram Format   21
|
| 3.5.3
| TCP Segment Format   22
|
3.6
| Protocol Port Numbers And Demultiplexing   23
|
3.7
| Encapsulation And Transmission   23
|
3.8
| Address Resolution Protocol   24
|
3.9
| Summary   24
|
| For Further Study   24
|
4.1
| Introduction   29
|
4.2
| A Conventional Computer System   29
|
4.3
| Network Interface Cards   30
|
4.4
| Definition Of A Bus   31
|
4.5
| The Bus Address Space   32
|
4.6
| The Fetch-Store Paradigm   33
|
4.7
| Network Interface Card Functionality   34
|
4.8
| NIC Optimizations For High Speed   34
|
4.9
| Onboard Address Recognition   35
|
| 4.9.1
| Unicast And Broadcast Recognition And Filtering   35
|
| 4.9.2
| Multicast Recognition And Filtering   35
|
4.10
| Onboard Packet Buffering   36
|
4.11
| Direct Memory Access   37
|
4.12
| Operation And Data Chaining   38
|
4.13
| Data Flow Diagram   39
|
4.14
| Promiscuous Mode   39
|
4.15
| Summary   40
|
| For Further Study   40
|
5.1
| Introduction   43
|
5.2
| State Information And Resource Exhaustion   43
|
5.3
| Packet Buffer Allocation   44
|
5.4
| Packet Buffer Size And Copying   45
|
5.5
| Protocol Layering And Copying   45
|
5.6
| Heterogeneity And Network Byte Order   46
|
5.7
| Bridge Algorithm   47
|
5.8
| Table Lookup And Hashing   49
|
5.9
| IP Datagram Fragmentation And Reassembly   50
|
| 5.9.1
| Interpretation Of The Flags Field   51
|
| 5.9.2
| Interpretation Of The Fragment Offset Field   51
|
| 5.9.3
| IP Fragmentation Algorithm   52
|
| 5.9.4
| Fragmenting A Fragment   53
|
| 5.9.5
| IP Reassembly   53
|
| 5.9.6
| Grouping Fragments Together   54
|
| 5.9.7
| Fragment Position   54
|
| 5.9.8
| IP Reassembly Algorithm   55
|
5.10
| IP Datagram Forwarding   56
|
5.11
| IP Forwarding Algorithm   57
|
5.12
| High-Speed IP Forwarding   57
|
5.13
| TCP Connection Recognition Algorithm   59
|
5.14
| TCP Splicing Algorithm   60
|
5.15
| Summary   63
|
| For Further Study   63
|
| Exercises   63
|
6.1
| Introduction   67
|
6.2
| Packet Processing   68
|
6.3
| Address Lookup And Packet Forwarding   68
|
6.4
| Error Detection And Correction   69
|
6.5
| Fragmentation, Segmentation, And Reassembly   70
|
6.6
| Frame And Protocol Demultiplexing   70
|
6.7
| Packet Classification   71
|
| 6.7.1
| Static And Dynamic Classification   71
|
| 6.7.2
| Demultiplexing Vs. Classification   71
|
| 6.7.3
| Optimized Packet Processing   72
|
| 6.7.4
| Classification Languages   72
|
6.8
| Queueing And Packet Discard   73
|
| 6.8.1
| Basic Queueing   73
|
| 6.8.2
| Priority Mechanisms   73
|
| 6.8.3
| Packet Discard   75
|
6.9
| Scheduling And Timing   75
|
6.10
| Security: Authentication And Privacy   76
|
6.11
| Traffic Measurement And Policing   76
|
6.12
| Traffic Shaping   77
|
6.13
| Timer Management   79
|
6.14
| Summary   80
|
| For Further Study   80
|
| Exercises   80
|
7.1
| Introduction   83
|
7.2
| Implementation Of Packet Processing In An Application   83
|
7.3
| Fast Packet Processing In Software   84
|
7.4
| Embedded Systems   84
|
7.5
| Operating System Implementations   85
|
7.6
| Software Interrupts And Priorities   85
|
7.7
| Multiple Priorities And Kernel Threads   87
|
7.8
| Thread Synchronization   88
|
7.9
| Software For Layered Protocols   88
|
| 7.9.1
| One Thread Per Layer   89
|
| 7.9.2
| One Thread Per Protocol   90
|
| 7.9.3
| Multiple Threads Per Protocol   90
|
| 7.9.4
| Separate Timer Management Threads   90
|
| 7.9.5
| One Thread Per Packet   91
|
7.10
| Asynchronous Vs. Synchronous Programming   92
|
7.11
| Summary   93
|
| For Further Study   93
|
| Exercises   93
|
8.1
| Introduction   97
|
8.2
| Network Systems Architecture   97
|
8.3
| The Traditional Software Router   98
|
8.4
| Aggregate Data Rate   99
|
8.5
| Aggregate Packet Rate   99
|
8.6
| Packet Rate And Software Router Feasibility   101
|
8.7
| Overcoming The Single CPU Bottleneck   103
|
8.8
| Fine-Grain Parallelism   104
|
8.9
| Symmetric Coarse-Grain Parallelism   104
|
8.10
| Asymmetric Coarse-Grain Parallelism   105
|
8.11
| Special-Purpose Coprocessors   105
|
8.12
| ASIC Coprocessor Implementation   106
|
8.13
| NICs With Onboard Processing   107
|
8.14
| Smart NICs With Onboard Stacks   108
|
8.15
| Cells And Connection-Oriented Addressing   108
|
8.16
| Data Pipelines   109
|
8.17
| Summary   111
|
| For Further Study   111
|
| Exercises   111
|
9.1
| Introduction   115
|
9.2
| Inherent Limits Of Demultiplexing   115
|
9.3
| Packet Classification   116
|
9.4
| Software Implementation Of Classification   117
|
9.5
| Optimizing Software-Based Classification   118
|
9.6
| Software Classification On Special-Purpose Hardware   119
|
9.7
| Hardware Implementation Of Classification   119
|
9.8
| Optimized Classification Of Multiple Rule Sets   120
|
9.9
| Classification Of Variable-Size Headers   122
|
9.10
| Hybrid Hardware/Software Classification   123
|
9.11
| Dynamic Vs. Static Classification   124
|
9.12
| Fine-Grain Flow Creation   125
|
9.13
| Flow Forwarding In A Connection-Oriented Network   126
|
9.14
| Connectionless Network Classification And Forwarding   126
|
9.15
| Second Generation Network Systems   127
|
9.16
| Embedded Processors In Second Generation Systems   128
|
9.17
| Classification And Forwarding Chips   129
|
9.18
| Summary   130
|
| For Further Study   130
|
| Exercises   130
|
10.1
| Introduction   133
|
10.2
| Bandwidth Of An Internal Fast Path   133
|
10.3
| The Switching Fabric Concept   134
|
10.4
| Synchronous And Asynchronous Fabrics   135
|
10.5
| A Taxonomy Of Switching Fabric Architectures   136
|
10.6
| Dedicated Internal Paths And Port Contention   136
|
10.7
| Crossbar Architecture   137
|
10.8
| Basic Queueing   139
|
10.9
| Time Division Solutions: Sharing Data Paths   141
|
10.10
| Shared Bus Architecture   141
|
10.11
| Other Shared Medium Architectures   142
|
10.12
| Shared Memory Architecture   143
|
10.13
| Multistage Fabrics   144
|
10.14
| Banyan Architecture   145
|
10.15
| Scaling A Banyan Switch   146
|
10.16
| Commercial Technologies   148
|
10.17
| Summary   148
|
| For Further Study   149
|
| Exercises   149
|
11.1
| Introduction   153
|
11.2
| The CPU In A Second Generation Architecture   153
|
11.3
| Third Generation Network Systems   154
|
11.4
| The Motivation For Embedded Processors   155
|
11.5
| RISC Vs. CISC   155
|
11.6
| The Need For Custom Silicon   156
|
11.7
| Definition Of A Network Processor   157
|
11.8
| A Fundamental Idea: Flexibility Through Programmability   158
|
11.9
| Instruction Set   159
|
11.10
| Scalability With Parallelism And Pipelining   159
|
11.11
| The Costs And Benefits Of Network Processors   160
|
11.12
| Network Processors And The Economics Of Success   161
|
11.13
| The Status And Future Of Network Processors   162
|
11.14
| Summary   162
|
| For Further Study   163
|
| Exercises   163
|
12.1
| Introduction   165
|
12.2
| Network Processor Functionality   165
|
12.3
| Packet Processing Functions   166
|
12.4
| Ingress And Egress Processing   167
|
| 12.4.1
| Ingress Processing   167
|
| 12.4.2
| Egress Processing   168
|
12.5
| Parallel And Distributed Architecture   170
|
12.6
| The Architectural Roles Of Network Processors   171
|
12.7
| Consequences For Each Architectural Role   171
|
12.8
| Macroscopic Data Pipelining And Heterogeneity   173
|
12.9
| Network Processor Design And Software Emulation   173
|
12.10
| Summary   174
|
| For Further Study   174
|
| Exercises   175
|
13.1
| Introduction   177
|
13.2
| Architectural Variety   177
|
13.3
| Primary Architectural Characteristics   178
|
| 13.3.1
| Processor Hierarchy   178
|
| 13.3.2
| Memory Hierarchy   179
|
| 13.3.3
| Internal Transfer Mechanisms   181
|
| 13.3.4
| External Interface And Communication Mechanisms   182
|
| 13.3.5
| Special-Purpose Hardware   183
|
| 13.3.6
| Polling And Notification Mechanisms   183
|
| 13.3.7
| Concurrent Execution Support   184
|
| 13.3.8
| Hardware Support For Programming   185
|
| 13.3.9
| Hardware And Software Dispatch Mechanisms   185
|
| 13.3.10
| Implicit Or Explicit Parallelism   186
|
13.4
| Architecture, Packet Flow, And Clock Rates   186
|
13.5
| Software Architecture   189
|
13.6
| Assigning Functionality To The Processor Hierarchy   189
|
13.7
| Summary   191
|
| For Further Study   192
|
| Exercises   192
|
14.1
| Introduction   195
|
14.2
| The Processing Hierarchy And Scaling   195
|
14.3
| Scaling By Making Processors Faster   196
|
14.4
| Scaling By Increasing The Number Of Processors   196
|
14.5
| Scaling By Increasing Processor Types   197
|
14.6
| Scaling A Memory Hierarchy   198
|
14.7
| Scaling By Increasing Memory Size   200
|
14.8
| Scaling By Increasing Memory Bandwidth   200
|
14.9
| Scaling By Increasing Types Of Memory   201
|
14.10
| Scaling By Adding Memory Caches   202
|
14.11
| Scaling With Content Addressable Memory   203
|
14.12
| Using CAM For Packet Classification   205
|
14.13
| Other Limitations On Scale   207
|
14.14
| Software Scalability   208
|
14.15
| Bottlenecks And Scale   209
|
14.16
| Summary   209
|
| For Further Study   210
|
| Exercises   210
|
15.1
| Introduction   213
|
15.2
| An Explosion Of Commercial Products   213
|
15.3
| A Selection Of Products   214
|
15.4
| Two-Stage Pipeline (Agere)   214
|
15.5
| Augmented RISC Processor (Alchemy)   216
|
15.6
| Embedded Processor Plus Coprocessors (AMCC)   218
|
15.7
| Pipeline Of Homogeneous Processors (Cisco)   219
|
15.8
| Pipeline Of Heterogeneous Processors (EZchip)   220
|
15.9
| Extensive And Diverse Processors (Hifn)   222
|
15.10
| Flexible RISC Plus Coprocessors (Motorola)   224
|
15.11
| Extremely Long Homogeneous Pipeline (Xelerated)   228
|
15.12
| Summary   228
|
| For Further Study   229
|
| Exercises   229
|
16.1
| Introduction   233
|
16.2
| Low Development Cost Vs. Performance   233
|
16.3
| Programmability Vs. Processing Speed   234
|
16.4
| Performance: Packet Rate, Data Rate, And Bursts   234
|
16.5
| Speed Vs. Functionality   235
|
16.6
| Per-Interface Rate Vs. Aggregate Data Rate   235
|
16.7
| Network Processor Speed Vs. Bandwidth   235
|
16.8
| Coprocessor Design: Lookaside Vs. Flow-Through   236
|
16.9
| Pipelining: Uniform Vs. Synchronized   236
|
16.10
| Explicit Parallelism Vs. Cost And Programmability   236
|
16.11
| Parallelism: Scale Vs. Packet Ordering   237
|
16.12
| Parallelism: Speed Vs. Stateful Classification   237
|
16.13
| Memory: Speed Vs. Programmability   237
|
16.14
| I/O Performance Vs. Pin Count   238
|
16.15
| Programming Languages: A Three-Way Tradeoff   238
|
16.16
| Multithreading: Throughput Vs. Programmability   238
|
16.17
| Traffic Management Vs. Blind Forwarding At Low Cost   239
|
16.18
| Generality Vs. Specific Architectural Role   239
|
16.19
| Memory Type: Special-Purpose Vs. General-Purpose   239
|
16.20
| Backward Compatibility Vs. Architectural Advances   240
|
16.21
| Parallelism Vs. Pipelining   240
|
16.22
| Summary   241
|
| Exercises   241
|
17.1
| Introduction   245
|
17.2
| Intel Terminology   245
|
17.3
| IXA: Internet Exchange Architecture   246
|
17.4
| IXP: Internet Exchange Processor   246
|
17.5
| Basic IXP2xxx Features   247
|
17.6
| External Connections   248
|
| 17.6.1
| Serial Line Interface   249
|
| 17.6.2
| PCI Bus   249
|
| 17.6.3
| Media Switch Fabric Interface   249
|
| 17.6.4
| DRAM Bus   250
|
| 17.6.5
| SRAM Buses   250
|
| 17.6.6
| Slowport Bus   250
|
17.7
| Internal Components   250
|
17.8
| IXP2xxx Processor Hierarchy   252
|
| 17.8.1
| General-Purpose Processor   253
|
| 17.8.2
| Embedded RISC Processor (XScale)   253
|
| 17.8.3
| I/O Processors (Microengines)   253
|
| 17.8.4
| Coprocessors And Other Functional Units   254
|
| 17.8.5
| Physical Interface Processors   254
|
17.9
| IXP2xxx Memories   254
|
17.10
| Word And Longword Accesses   256
|
17.11
| An Example Of Underlying Complexity   257
|
17.12
| Other Hardware Facilities   258
|
17.13
| Summary   258
|
| For Further Study   259
|
| Exercises   259
|
18.1
| Introduction   261
|
18.2
| Purpose Of An Embedded Processor   261
|
18.3
| XScale Architecture   263
|
18.4
| RISC Instruction Set And Registers   264
|
18.5
| XScale Memory Architecture   264
|
18.6
| XScale Memory Map   265
|
18.7
| Virtual Address Space And Memory Management   265
|
18.8
| Shared Memory And Address Translation   266
|
18.9
| Internal Peripheral Units   267
|
| 18.9.1
| Serial Connection Through UART Hardware   267
|
| 18.9.2
| Countdown Timers   268
|
| 18.9.3
| General-Purpose I/O Pins   268
|
18.10
| Other I/O   268
|
18.11
| User And Kernel Mode Operation   268
|
18.12
| Coprocessor 15   269
|
18.13
| Summary   269
|
| For Further Study   269
|
| Exercises   270
|
19.1
| Introduction   273
|
19.2
| The Purpose Of Microengines   273
|
19.3
| Microengine Architecture   274
|
19.4
| The Concept Of Microsequencing   274
|
19.5
| Microengine Instruction Set   275
|
19.6
| Separate Memory Address Spaces   277
|
19.7
| Execution Pipeline   278
|
19.8
| The Concept Of Instruction Stalls   279
|
19.9
| Conditional Branching And Pipeline Abort   281
|
19.10
| Memory Access Delay   281
|
19.11
| Hardware Threads And Context Switching   282
|
19.12
| Microengine Instruction Store   284
|
19.13
| Microengine Hardware Registers   285
|
19.14
| General-Purpose Registers   285
|
| 19.14.1
| Context-Relative Vs. Absolute Registers   285
|
| 19.14.2
| Register Banks   285
|
19.15
| Transfer Registers   287
|
19.16
| Next Neighbor Registers And Software Pipeline   287
|
19.17
| Local Memory   288
|
19.18
| Content Addressable Memory (CAM)   289
|
19.19
| Local Control And Status Registers (CSRs)   290
|
19.20
| Inter-Processor Communication   290
|
19.21
| SHaC Unit   291
|
19.22
| SHaC Architecture   292
|
19.23
| Scratchpad Memory   293
|
19.24
| Hash Unit   293
|
19.25
| Configuration, Control, And Status Registers   295
|
19.26
| Media Switch Fabric Interface   296
|
19.27
| Transmit And Receive BUFs   296
|
19.28
| Crypto Unit   297
|
19.29
| Summary   298
|
| For Further Study   298
|
| Exercises   299
|
20.1
| Introduction   301
|
20.2
| Reference Systems   301
|
20.3
| The Intel Reference System   302
|
| 20.3.1
| Intel's Hardware Testbed   302
|
| 20.3.2
| Intel's SDK And Related Software   303
|
20.4
| Operating System Used On The XScale   304
|
20.5
| External Host Operating System And Workbench   304
|
20.6
| External File Access And Storage   305
|
20.7
| Bootstrapping The Reference Hardware   306
|
20.8
| Running Software   306
|
20.9
| System Reboot   306
|
20.10
| Alternative Cross-Development Software   307
|
20.11
| Summary   307
|
| For Further Study   308
|
| Exercises   308
|
21.1
| Introduction   311
|
21.2
| Support Software And Overall Structure   311
|
21.3
| Pieces Of Software   312
|
21.4
| Microblocks, Interconnections, And Pipeline Organization   312
|
21.5
| Assignment Of Microblocks To Microengines   313
|
21.6
| Mpackets And Transfers   313
|
21.7
| Ingress (RX) And Egress (TX) Microblocks   314
|
21.8
| Microblocks And Parallel Execution   314
|
21.9
| Packet Buffers   315
|
21.10
| Buffer Queues And Buffer Allocation   316
|
21.11
| Buffer Handles And Packet Discard   318
|
21.12
| Packet Forwarding And Memory Rings   319
|
21.13
| Queue Array Hardware And Spilling   320
|
21.14
| Exceptions, Core Components, And XScale Processing   321
|
21.15
| Summary   321
|
| Exercises   322
|
22.1
| Introduction   325
|
22.2
| XScale Responsibilities   325
|
22.3
| Conceptual Organization Of XScale Software   326
|
22.4
| Core Component Infrastructure (CCI)   326
|
22.5
| Resource Manager (RM)   327
|
22.6
| Operating System Specific Library (OSSL)   328
|
22.7
| Hardware Abstraction Layer (HAL)   328
|
22.8
| Memory Management   328
|
22.9
| Allocation Of Local Memory   330
|
22.10
| Address Translation   330
|
22.11
| Ring And Queue Interface   331
|
22.12
| Buffer Management Facilities   332
|
22.13
| Organization Of Core Software   332
|
22.14
| Patching Symbols And Loading Microcode   334
|
22.15
| Summary   336
|
| Exercises   336
|
23.1
| Introduction   339
|
23.2
| Intel's Microengine Assembler   339
|
23.3
| Microengine Assembly Language Syntax   340
|
23.4
| Example Operand Syntax   341
|
23.5
| Symbolic Register Names And Allocation   344
|
23.6
| Register Types And Syntax   345
|
23.7
| Local Register Scope, Nesting, And Shadowing   346
|
23.8
| Register Assignments And Conflicts   347
|
23.9
| The Macro Preprocessor   348
|
23.10
| Macro Definition   348
|
23.11
| Repeated Generation Of A Code Segment   350
|
23.12
| Structured Programming Directives   351
|
23.13
| Instructions That Can Cause A Context Switch   353
|
23.14
| Indirect Reference   354
|
23.15
| External Transfers   355
|
23.16
| Summary   356
|
| For Further Study   357
|
| Exercises   357
|
24.1
| Introduction   359
|
24.2
| Specialized Memory Operations   359
|
24.3
| Ring And Queue Manipulation   360
|
24.4
| Processor Coordination Via Bit Testing   360
|
24.5
| Atomic Memory Operations   361
|
24.6
| Critical Sections And Folding   362
|
24.7
| Control And Status Registers   364
|
24.8
| Intel Dispatch Loop Macros   365
|
24.9
| Traffic Management And Packet Scheduling   366
|
24.10
| Accessing Fields In A Packet Header   366
|
24.11
| Dispatch Loop And Associated Variables   368
|
24.12
| Header Caching   369
|
24.13
| Packet I/O And The Concept Of Mpackets   369
|
24.14
| Ingress And Egress Packet Transfer   370
|
24.15
| I/O Details   371
|
24.16
| Summary   372
|
| For Further Study   373
|
| Exercises   373
|
25.1
| Introduction   375
|
25.2
| An Example Implementation Of NAT   375
|
25.3
| NAT Complexity And Simplifying Assumptions   377
|
25.4
| Network Address And Port Translation   377
|
25.5
| Ping Packets And Identifiers   378
|
25.6
| Dynamic NAT Table Creation And Management   378
|
25.7
| Organization Of The Code   379
|
25.8
| ARP Processing   381
|
25.9
| Implementation Of The NAT Microblock   382
|
25.10
| Header Caching And Alignment   384
|
25.11
| NAT Table Lookup   384
|
25.12
| Header Fields That NAT Changes   387
|
25.13
| Definition Of Constants For The Entire System   388
|
25.14
| Constants And Types For The User Interface   390
|
25.15
| Definitions Of Scratch Ring Constants   391
|
25.16
| Overall Organization Of The NAT Microblock   392
|
25.17
| Macros Used To Implement NAT   400
|
25.18
| Optimized ARP Table Lookup   410
|
25.19
| Header Modification And Checksum Computation   410
|
25.20
| Core Component   411
|
25.21
| Core Component Initialization   411
|
| 25.21.1
| Device Registration   411
|
| 25.21.2
| Other Initialization   411
|
| 25.21.3
| Patching And Loading Microcode   412
|
| 25.21.4
| Scratch Rings And Interfaces   413
|
| 25.21.5
| Hash Engine Initialization   413
|
| 25.21.6
| Starting Microengines And A Timer Thread   413
|
25.22
| Core Component Packet Processing   413
|
25.23
| Cleanup   414
|
25.24
| Structure Of The Core Component Kernel Module   415
|
| 25.24.1
| Protocol Declarations Used By The Core Component   415
|
| 25.24.2
| Core Component Initialization And Pseudo Device   417
|
| 25.24.3
| Core Component Packet Handler   431
|
25.25
| User Interface Application Code   445
|
25.26
| Summary   449
|
| Exercises   449
|