1.1
| Network Systems And The Internet   1
|
1.2
| Applications Vs. Infrastructure   1
|
1.3
| Network Systems Engineering   2
|
1.4
| Packet Processing   2
|
1.5
| Achieving High Speed   3
|
1.6
| Network Speed   3
|
1.7
| Hardware, Software, And Hybrids   4
|
1.8
| Scope And Organization Of The Text   5
|
1.9
| Summary   5
|
| For Further Study   5
|
2.1
| Introduction   7
|
2.2
| Networks And Packets   7
|
2.3
| Connection-Oriented And Connectionless Paradigms   8
|
2.4
| Digital Circuits   8
|
2.5
| LAN And WAN Classifications   9
|
2.6
| The Internet And Heterogeneity   9
|
2.7
| Example Network Systems   9
|
2.8
| Broadcast Domains   10
|
2.9
| The Two Key Systems Used In The Internet   11
|
2.10
| Other Systems Used In The Internet   12
|
2.11
| Monitoring And Control Systems   13
|
2.12
| Summary   13
|
| For Further Study   13
|
3.1
| Introduction   15
|
3.2
| Protocols And Layering   15
|
3.3
| Layers 1 And 2 (Physical And Network Interface)   17
|
| 3.3.1
| Ethernet   17
|
| 3.3.2
| Ethernet Frame Format   17
|
| 3.3.3
| Ethernet Addresses   18
|
| 3.3.4
| Ethernet Type Field   19
|
3.4
| Layer 3 (Internet)   19
|
| 3.4.1
| The Internet Protocol   19
|
| 3.4.2
| IP Datagram Format   19
|
| 3.4.3
| IP Addresses   20
|
3.5
| Layer 4 (Transport)   20
|
| 3.5.1
| UDP And TCP   20
|
| 3.5.2
| UDP Datagram Format   21
|
| 3.5.3
| TCP Segment Format   22
|
3.6
| Protocol Port Numbers And Demultiplexing   23
|
3.7
| Encapsulation And Transmission   23
|
3.8
| Address Resolution Protocol   24
|
3.9
| Summary   24
|
| For Further Study   24
|
4.1
| Introduction   29
|
4.2
| A Conventional Computer System   29
|
4.3
| Network Interface Cards   30
|
4.4
| Definition Of A Bus   31
|
4.5
| The Bus Address Space   32
|
4.6
| The Fetch-Store Paradigm   33
|
4.7
| Network Interface Card Functionality   34
|
4.8
| NIC Optimizations For High Speed   34
|
4.9
| Onboard Address Recognition   35
|
| 4.9.1
| Unicast And Broadcast Recognition And Filtering   35
|
| 4.9.2
| Multicast Recognition And Filtering   35
|
4.10
| Onboard Packet Buffering   36
|
4.11
| Direct Memory Access   37
|
4.12
| Operation And Data Chaining   38
|
4.13
| Data Flow Diagram   39
|
4.14
| Promiscuous Mode   39
|
4.15
| Summary   40
|
| For Further Study   40
|
5.1
| Introduction   43
|
5.2
| State Information and Resource Exhaustion   43
|
5.3
| Packet Buffer Allocation   44
|
5.4
| Packet Buffer Size And Copying   45
|
5.5
| Protocol Layering And Copying   45
|
5.6
| Heterogeneity And Network Byte Order   46
|
5.7
| Bridge Algorithm   47
|
5.8
| Table Lookup And Hashing   49
|
5.9
| IP Datagram Fragmentation And Reassembly   50
|
| 5.9.1
| Interpretation Of The Flags Field   51
|
| 5.9.2
| Interpretation Of The Fragment Offset Field   51
|
| 5.9.3
| IP Fragmentation Algorithm   52
|
| 5.9.4
| Fragmenting A Fragment   53
|
| 5.9.5
| IP Reassembly   53
|
| 5.9.6
| Grouping Fragments Together   54
|
| 5.9.7
| Fragment Position   54
|
| 5.9.8
| IP Reassembly Algorithm   55
|
5.10
| IP Datagram Forwarding   56
|
5.11
| IP Forwarding Algorithm   57
|
5.12
| High-Speed IP Forwarding   57
|
5.13
| TCP Connection Recognition Algorithm   59
|
5.14
| TCP Splicing Algorithm   60
|
5.15
| Summary   63
|
| For Further Study   63
|
| Exercises   63
|
6.1
| Introduction   67
|
6.2
| Packet Processing   68
|
6.3
| Address Lookup And Packet Forwarding   68
|
6.4
| Error Detection And Correction   69
|
6.5
| Fragmentation, Segmentation, And Reassembly   70
|
6.6
| Frame And Protocol Demultiplexing   70
|
6.7
| Packet Classification   71
|
| 6.7.1
| Static And Dynamic Classification   71
|
| 6.7.2
| Demultiplexing Vs. Classification   71
|
| 6.7.3
| Optimized Packet Processing   72
|
| 6.7.4
| Classification Languages   72
|
6.8
| Queueing And Packet Discard   73
|
| 6.8.1
| Basic Queueing   73
|
| 6.8.2
| Priority Mechanisms   74
|
| 6.8.3
| Packet Discard   75
|
6.9
| Scheduling And Timing   75
|
6.10
| Security: Authentication And Privacy   76
|
6.11
| Traffic Measurement And Policing   76
|
6.12
| Per-Flow And Aggregate Policing   77
|
6.13
| Traffic Characteristics   78
|
| 6.13.1
| Constant Bit Rate (CBR)   78
|
| 6.13.2
| Variable Bit Rate (VBR)   78
|
| 6.13.3
| Available Bit Rate (ABR)   79
|
| 6.13.4
| Unspecified Bit Rate (UBR)   79
|
6.14
| Traffic Shaping   80
|
6.15
| The Point Of Traffic Management   81
|
6.16
| Timer Management   82
|
6.17
| Summary   83
|
| For Further Study   83
|
| Exercises   83
|
7.1
| Introduction   87
|
7.2
| Implementation Of Packet Processing In An Application   87
|
7.3
| Fast Packet Processing In Software   88
|
7.4
| Embedded Systems   88
|
7.5
| Operating System Implementations   89
|
7.6
| Software Interrupts And Priorities   89
|
7.7
| Multiple Priorities And Kernel Threads   91
|
7.8
| Thread Synchronization   92
|
7.9
| Software For Layered Protocols   92
|
| 7.9.1
| One Thread Per Layer   93
|
| 7.9.2
| One Thread Per Protocol   94
|
| 7.9.3
| Multiple Threads Per Protocol   94
|
| 7.9.4
| Separate Timer Management Threads   94
|
| 7.9.5
| One Thread Per Packet   95
|
7.10
| Asynchronous Vs. Synchronous Programming   96
|
7.11
| Summary   97
|
| For Further Study   97
|
| Exercises   97
|
8.1
| Introduction   101
|
8.2
| Network Systems Architecture   101
|
8.3
| The Traditional Software Router   102
|
8.4
| Aggregate Data Rate   103
|
8.5
| Aggregate Packet Rate   103
|
8.6
| Packet Rate And Software Router Feasibility   105
|
8.7
| Overcoming The Single CPU Bottleneck   107
|
8.8
| Fine-Grain Parallelism   108
|
8.9
| Symmetric Coarse-Grain Parallelism   108
|
8.10
| Asymmetric Coarse-Grain Parallelism   109
|
8.11
| Special-Purpose Coprocessors   109
|
8.12
| ASIC Coprocessor Implementation   110
|
8.13
| NICs With Onboard Processing   111
|
8.14
| Smart NICs With Onboard Stacks   112
|
8.15
| Cells And Connection-Oriented Addressing   112
|
8.16
| Data Pipelines   113
|
8.17
| Summary   115
|
| For Further Study   115
|
| Exercises   115
|
9.1
| Introduction   119
|
9.2
| Inherent Limits Of Demultiplexing   119
|
9.3
| Packet Classification   120
|
9.4
| Software Implementation Of Classification   121
|
9.5
| Optimizing Software-Based Classification   122
|
9.6
| Software Classification On Special-Purpose Hardware   123
|
9.7
| Hardware Implementation Of Classification   123
|
9.8
| Optimized Classification Of Multiple Rule Sets   124
|
9.9
| Classification Of Variable-Size Headers   126
|
9.10
| Hybrid Hardware\|/\|Software Classification   127
|
9.11
| Dynamic Vs. Static Classification   128
|
9.12
| Fine-Grain Flow Creation   129
|
9.13
| Flow Forwarding In A Connection-Oriented Network   130
|
9.14
| Connectionless Network Classification And Forwarding   130
|
9.15
| Second Generation Network Systems   131
|
9.16
| Embedded Processors In Second Generation Systems   132
|
9.17
| Classification And Forwarding Chips   133
|
9.18
| Summary   134
|
| For Further Study   134
|
| Exercises   134
|
10.1
| Introduction   137
|
10.2
| Bandwidth Of An Internal Fast Path   137
|
10.3
| The Switching Fabric Concept   138
|
10.4
| Synchronous And Asynchronous Fabrics   139
|
10.5
| A Taxonomy Of Switching Fabric Architectures   140
|
10.6
| Dedicated Internal Paths And Port Contention   140
|
10.7
| Crossbar Architecture   141
|
10.8
| Basic Queueing   143
|
10.9
| Time Division Solutions: Sharing Data Paths   145
|
10.10
| Shared Bus Architecture   145
|
10.11
| Other Shared Medium Architectures   146
|
10.12
| Shared Memory Architecture   147
|
10.13
| Multistage Fabrics   148
|
10.14
| Banyan Architecture   149
|
10.15
| Scaling A Banyan Switch   150
|
10.16
| An Example Commercial Technology   152
|
10.17
| Protocol Independence: The Agere PI40   153
|
10.18
| Implementation Of The Agere Fabric   153
|
10.19
| Capacity Of The Agere Fabric   155
|
10.20
| Other Commercial Technologies   155
|
10.21
| Practical Issues   155
|
10.22
| Summary   156
|
| For Further Study   156
|
| Exercises   156
|
11.1
| Introduction   161
|
11.2
| The CPU In A Second Generation Architecture   161
|
11.3
| Third Generation Network Systems   162
|
11.4
| The Motivation For Embedded Processors   163
|
11.5
| RISC vs. CISC   163
|
11.6
| The Need For Custom Silicon   164
|
11.7
| Definition Of A Network Processor   165
|
11.8
| A Fundamental Idea: Flexibility Through Programmability   166
|
11.9
| Instruction Set   167
|
11.10
| Scalability With Parallelism And Pipelining   167
|
11.11
| The Costs And Benefits Of Network Processors   168
|
11.12
| Network Processors And The Economics Of Success   169
|
11.13
| The Status And Future Of Network Processors   170
|
11.14
| Summary   170
|
| For Further Study   171
|
| Exercises   171
|
12.1
| Introduction   173
|
12.2
| Network Processor Functionality   173
|
12.3
| Packet Processing Functions   174
|
12.4
| Ingress And Egress Processing   175
|
| 12.4.1
| Ingress Processing   175
|
| 12.4.2
| Egress Processing   176
|
12.5
| Parallel And Distributed Architecture   178
|
12.6
| The Architectural Roles Of Network Processors   179
|
12.7
| Consequences For Each Architectural Role   179
|
12.8
| Macroscopic Data Pipelining And Heterogeneity   181
|
12.9
| Network Processor Design And Software Emulation   181
|
12.10
| Summary   182
|
| For Further Study   182
|
| Exercises   183
|
13.1
| Introduction   185
|
13.2
| Architectural Variety   185
|
13.3
| Primary Architectural Characteristics   186
|
| 13.3.1
| Processor Hierarchy   186
|
| 13.3.2
| Memory Hierarchy   187
|
| 13.3.3
| Internal Transfer Mechanisms   189
|
| 13.3.4
| External Interface And Communication Mechanisms   190
|
| 13.3.5
| Special-Purpose Hardware   191
|
| 13.3.6
| Polling And Notification Mechanisms   191
|
| 13.3.7
| Concurrent Execution Support   192
|
| 13.3.8
| Hardware Support For Programming   193
|
| 13.3.9
| Hardware And Software Dispatch Mechanisms   193
|
| 13.3.10
| Implicit Or Explicit Parallelism   194
|
13.4
| Architecture, Packet Flow, And Clock Rates   194
|
13.5
| Software Architecture   197
|
13.6
| Assigning Functionality To The Processor Hierarchy   197
|
13.7
| Summary   199
|
| For Further Study   200
|
| Exercises   200
|
14.1
| Introduction   203
|
14.2
| The Processing Hierarchy And Scaling   203
|
14.3
| Scaling By Making Processors Faster   204
|
14.4
| Scaling By Increasing The Number of Processors   204
|
14.5
| Scaling By Increasing Processor Types   205
|
14.6
| Scaling A Memory Hierarchy   206
|
14.7
| Scaling By Increasing Memory Size   208
|
14.8
| Scaling By Increasing Memory Bandwidth   208
|
14.9
| Scaling By Increasing Types Of Memory   209
|
14.10
| Scaling By Adding Memory Caches   210
|
14.11
| Scaling With Content Addressable Memory   211
|
14.12
| Using CAM for Packet Classification   213
|
14.13
| Other Limitations On Scale   215
|
14.14
| Software Scalability   216
|
14.15
| Bottlenecks And Scale   217
|
14.16
| Summary   217
|
| For Further Study   218
|
| Exercises   218
|
15.1
| Introduction   221
|
15.2
| An Explosion Of Commercial Products   221
|
15.3
| A Selection of Products   222
|
15.4
| Augmented RISC Processor (Alchemy)   222
|
15.5
| Embedded Processor Plus Coprocessors (AMCC)   223
|
15.6
| Pipeline Of Homogeneous Processors (Cisco)   225
|
15.7
| Pipeline Of Heterogeneous Processors (EZchip)   226
|
15.8
| Extensive And Diverse Processors (Hifn)   227
|
15.9
| Homogeneous Parallel Processors Plus Controller (Intel)   230
|
15.10
| Flexible RISC Plus Coprocessors (Motorola)   233
|
15.11
| Extremely Long Homogeneous Pipeline (Xelerated)   237
|
15.12
| Summary   238
|
| For Further Study   239
|
| Exercises   239
|
16.1
| Introduction   241
|
16.2
| Low Development Cost Vs. Performance   241
|
16.3
| Programmability Vs. Processing Speed   242
|
16.4
| Performance: Packet Rate, Data Rate, And Bursts   242
|
16.5
| Speed Vs. Functionality   243
|
16.6
| Per-Interface Rate Vs. Aggregate Data Rate   243
|
16.7
| Network Processor Speed Vs. Bandwidth   243
|
16.8
| Coprocessor Design: Lookaside Vs. Flow-Through   244
|
16.9
| Pipelining: Uniform Vs. Synchronized   244
|
16.10
| Explicit Parallelism Vs. Cost And Programmability   244
|
16.11
| Parallelism: Scale Vs. Packet Ordering   245
|
16.12
| Parallelism: Speed Vs. Stateful Classification   245
|
16.13
| Memory: Speed Vs. Programmability   245
|
16.14
| I/O Performance Vs. Pin Count   246
|
16.15
| Programming Languages: A Three-Way Tradeoff   246
|
16.16
| Multithreading: Throughput Vs. Programmability   246
|
16.17
| Traffic Management Vs. Blind Forwarding At Low Cost   247
|
16.18
| Generality Vs. Specific Architectural Role   247
|
16.19
| Memory Type: Special-Purpose Vs. General-Purpose   247
|
16.20
| Backward Compatibility Vs. Architectural Advances   248
|
16.21
| Parallelism Vs. Pipelining   248
|
16.22
| Summary   249
|
| Exercises   249
|
17.1
| Introduction   253
|
17.2
| Agere Terminology   253
|
17.3
| Agere PayloadPlus (APP)   254
|
17.4
| Conceptual Pipeline   254
|
17.5
| Functional Block Characteristics   255
|
17.6
| First-Generation Agere Implementation   256
|
17.7
| Second Generation Agere Implementation (APP550)   257
|
17.8
| Basic APP550 Features   258
|
17.9
| External Connections   259
|
| 17.9.1
| Memory Interfaces   260
|
| 17.9.2
| Media Interfaces   260
|
| 17.9.3
| Switching Fabric Interface   261
|
| 17.9.4
| PCI Bus Interface   261
|
| 17.9.5
| Scheduling Interface   261
|
| 17.9.6
| Coprocessor Interface   262
|
17.10
| Memory Uses   262
|
17.11
| Internal Architecture   263
|
17.12
| Engines On The APP550   264
|
17.13
| High-Speed Full Duplex Operation   264
|
17.14
| The Underlying Complexity   265
|
17.15
| Summary   265
|
| For Further Study   266
|
| Exercises   266
|
18.1
| Introduction   269
|
18.2
| Hierarchical Structure And Vestigial Terminology   269
|
18.3
| Major Functional Units On The APP550   270
|
18.4
| Input Interface   271
|
18.5
| Pattern Processing Engine (PPE)   271
|
18.6
| Data Flow Through The Classifier   273
|
18.7
| PDU Assembler   274
|
18.8
| Reorder Buffer   274
|
18.9
| State Engine   275
|
18.10
| Traffic Manager   276
|
18.11
| Stream Editor   278
|
18.12
| Programming, Performance, And Global Pulse Rate   279
|
18.13
| Other Functional Units On The APP550   280
|
18.14
| Summary   280
|
| For Further Study   281
|
19.1
| Introduction   283
|
19.2
| Reference Systems   283
|
19.3
| Simulation Vs. Emulation   284
|
19.4
| The Agere Reference System   285
|
19.5
| Software Development Environment (SDE)   285
|
19.6
| System Performance Analyzer (SPA)   286
|
19.7
| Hardware Development System (HDS)   287
|
19.8
| HDS Bootstrap And Operation   288
|
19.9
| Run Time Environment (RTE)   288
|
19.10
| Testing Paradigms   289
|
19.11
| Summary   289
|
| For Further Study   290
|
20.1
| Introduction   293
|
20.2
| Support For High-Speed Classification   293
|
20.3
| Agere's Functional Programming Language (FPL)   294
|
20.4
| FPL Characteristics   294
|
20.5
| Two Pass Processing And The Division Into Blocks   296
|
20.6
| FPL Pass 1: The Root Program   296
|
20.7
| FPL Pass 2: The Replay Program   297
|
20.8
| Status Values Passed With A Packet   297
|
20.9
| Algorithms For First And Second Pass Processing   298
|
20.10
| Designating The First And Second Pass   299
|
20.11
| Pattern Matching Paradigm   300
|
20.12
| Using Patterns For Conditionals   301
|
20.13
| Symbolic Constants And Include Files   303
|
20.14
| Example FPL Code For Second Pass Processing   303
|
20.15
| Sequential Pattern Matching Paradigm   304
|
20.16
| Tree Functions And The BITS Default   306
|
20.17
| Return Values   306
|
20.18
| Passing Information To The Traffic Manager Block   306
|
20.19
| Sticky Bits   307
|
20.20
| Access To Built-in And External Functions   307
|
20.21
| FPL Constant Syntax   308
|
20.22
| Dotted Decimal Patterns And Longest Prefix Match   308
|
20.23
| FPL Support For Dynamic Classification   309
|
20.24
| Ordered Virtual Functions   310
|
20.25
| Ordered Functions And State Access   310
|
20.26
| Debugging FPL   311
|
20.27
| Summary   311
|
| For Further Study   312
|
| Exercises   312
|
21.1
| Introduction   315
|
21.2
| State Engine Functionality   315
|
21.3
| External Host Interface   316
|
21.4
| State Engine Control And Status Registers   316
|
21.5
| State Engine Memory And Address Space   317
|
21.6
| Onboard Configuration Bus And External Connection   317
|
21.7
| ASI Functions And Access From FPL   318
|
21.8
| Policing Engine And Policing Scripts   318
|
21.9
| Return Values From Policing   319
|
21.10
| Binding A Script ID To A Specific Function   319
|
21.11
| FPL Prototype Declaration And Example   320
|
21.12
| Initialization Of The Policing Database   320
|
21.13
| Register Files And Optimized Access   321
|
21.14
| C-NP Language Overview   322
|
| 21.14.1
| Lexical Conventions   322
|
| 21.14.2
| Data Declarations   323
|
| 21.14.3
| Expressions   324
|
| 21.14.4
| Statements   325
|
| 21.14.5
| Preprocessor Directives   325
|
| 21.14.6
| Script Structure   326
|
21.15
| An Example Policing Script   326
|
21.16
| Summary   328
|
| For Further Study   328
|
| Exercises   329
|
22.1
| Introduction   331
|
22.2
| Policing, Queueing, And Scheduling Decisions   331
|
22.3
| Buffer Management   332
|
22.4
| Completion Of Flow Policing   333
|
22.5
| A Unified Packet Discard Algorithm (WRED)   334
|
22.6
| Implementation Of Weighted RED On The APP550   335
|
22.7
| Scheduling And Traffic Shaping   338
|
22.8
| Traffic Shaping And The Agere Definition Of Scheduler   338
|
22.9
| CBR Shaping   340
|
22.10
| VBR Shaping   340
|
22.11
| Bandwidth Allocation In A Packet Switching System   341
|
22.12
| The Two Forms Of Bandwidth Allocation   341
|
22.13
| Fixed Bandwidth Allocation   342
|
22.14
| Implementation Of Fixed Allocation   342
|
22.15
| Proportional Bandwidth Allocation   343
|
22.16
| An Example Of Proportional Allocation   343
|
22.17
| Implementation Of Proportional Bandwidth Allocation   345
|
22.18
| Smoothed Deficit Weighted Round Robin   345
|
22.19
| A Hierarchy Of Queues   346
|
22.20
| Traffic Management Mechanisms On The APP550   346
|
22.21
| Summary   348
|
| For Further Study   349
|
23.1
| Introduction   351
|
23.2
| Motivation For An External Processor   351
|
23.3
| Role Of An External Host Processor   352
|
23.4
| Physical Interconnection To An External Host   353
|
23.5
| Packet Exchange And The Concept Of Pseudo Interface   353
|
23.6
| An Application Program Interface For External Hosts   353
|
23.7
| Programming Paradigm And Handle Declarations   354
|
23.8
| Initialization Functions   355
|
23.9
| Examples Of Object Functions   356
|
23.10
| A Dynamic Classification Example   357
|
23.11
| An Example Of Slow Path Packet Transfer   359
|
23.12
| Summary   361
|
| For Further Study   362
|
| Exercises   362
|
24.1
| Introduction   365
|
24.2
| An ISP Access Node   365
|
24.3
| Differentiated Services (DiffServ)   366
|
24.4
| Mapping SLA Requirements To DiffServ Classes   367
|
24.5
| Queues, Destination IDs, And Classification   369
|
24.6
| Policing, Coloring, And Flow IDs   374
|
24.7
| Example Code: Classification (FPL)   375
|
24.8
| Example Code: Policing Scripts   386
|
24.9
| Buffer Management And Packet Discard   388
|
24.10
| Example Code: Buffer Management And Discard (WRED)   389
|
24.11
| Scheduler   391
|
24.12
| Dynamic Scheduler   393
|
24.13
| Example Code: SDWRR Scheduler   393
|
24.14
| Packet Marking (Modification)   396
|
24.15
| Example Code: Packet Marking   396
|
24.16
| Example Code: Host Interface   397
|
24.17
| Summary   401
|
| For Further Study   402
|
| Exercises   402
|