SYSTEM ENGINEERING
Engineering 180
Systems Engineering
Embedded Processing Case Study
Lecture 1
May 21, 2015
Steve Kirsch
2015 Steve Kirsch- All Rights Reserved
Lecture 1
Overview of Embedded Subsystem Design
Case Study: Problem statement
Lecture 2
Conceptual Design
Lecture 3
Preliminary Design
Lecture 4
Detailed Design / Integration and Test
Lecture 1: Agenda
Overview
Problem statement
Identify stakeholders
Top-level requirements
Key Performance Parameters (KPPs)
Homework
Wikipedia:
An embedded system is a computer system with a
dedicated function within a larger mechanical or
electrical system, often with real-time computing
constraints
It is embedded as part of a complete device often
including hardware and mechanical parts.
Embedded systems control many devices in
common use today
Overview: continued
Figure: Amp -> ADC -> Embedded Digital Processor -> DAC -> Amp -> Load
Overview:
An embedded system engineer's job is very challenging
Size
Weight
Power
Life cycle costs
Non-recurring development cost
Recurring cost
Durability / Reliability
Maintainability
Supportability
Development schedule
Development test and integration environment
Infrastructure requirement (many application specific)
Build-time (Drivers, Libraries, Interfaces)
Run-time (Services, Clients, Servers, etc.)
User interfaces
Human to computer
Computer to computer
Real-time operation
Hard real-time: sec - msec response times
Testability / observability
Functional Performance
Mission environment
Support
Detection performance
Tracking accuracies
ID capabilities
Weapon support
Map characteristics
Offboard info
Communication requirements
Weapon characteristics
Design has to balance multiple desires and constraints
Maintenance concept
Reliability
Maintainability measures
Built-in test capability
Physical characteristics
Weight
Size (O&M)
Prime power utilization
Cooling required
Dissipation
EMI/C characteristics
Physical environment
Operating temperatures
Storage temperatures
Coolant characteristics
Vibration levels
Shock
Prime power characteristics
EMI/C requirements
Cost
Recurring cost
Development cost
Life-cycle cost
Programmatic Characteristics
Development plan
Production plan
Risks
Technology maturity
Source: Raytheon
HP-35 Wikipedia
The HP-35 was Hewlett-Packard's first pocket calculator and the
world's first scientific pocket calculator[1] (a calculator
with trigonometric and exponential functions). Like some of HP's
desktop calculators, it used reverse Polish notation. Introduced at
US$395,[2] the HP-35 was available from 1972 to 1975.
Case Study:
The Need Phase Proposal Phase
It is marketing, it is management,
It is a lot of engineering, and it is about managing risk
Current System
Predator
Conceptual Design
System Level
System Specification
Preliminary Design
System Architecture
Level #1 A
Subsystems
Subsystem Specifications
Preliminary Design B
Subsystem Architecture
Level # 2
Subsystems
B1
B2
B3
Frequency
Synthesis/
WF Gen
Power
Amplifier
Antenna
Receiver Exciter
Subsystem
Low Noise
Amplifier
DownConversion
A/D
Conversion
and Timing
Generator
Digital
Signal
Processing
Control
Image
Information
Detected Objects
Antenna
Subsystem
Processing
Subsystem
Commands,
Motion Data
Radar Results,
Health Info
System
Interfaces
Potential
Solutions
Conceptual Design
Baseline
Solution
Detailed,
Documented
Baseline
Preliminary Design
Includes:
Elicitation of need and requirements
Design through insight, invention, and successive
refinement
Management of complexity through partitioning and
creating well-posed lower-level design problems
Concept Refinement
Technology Development
System Development & Demonstration
Production & Deployment
Operations & Support
Materiel Developer
PM Total Life Cycle System Manager
Acquisition Framework
High Ability to Influence LCC (70-75% of Cost Decisions Made)
Less Ability to Influence LCC (85% of Cost Decisions Made) (10%-15%)
Little Ability to Influence LCC (90-95% of Cost Decisions Made) (5%-10%)
Mission scenarios
Threats
Operational environment
Platform resource allocation for the Radar system
Space
Weight
Cooling capacity
Operator interfaces
Mission stability (system shall run continuously for N hours)
Plus many more "-ilities" requirements
Case Study:
System Conceptual Design Phase Product
Products
Baseline design
Performance
Risk
Cost
Schedule
Subsystem spec
Case Study:
System Conceptual Design Review (SDR)
Space
Weight
Power
Cooling
"-ilities"
Transmit waveform specifications (PRF, num coherent pulses
transmitted/collected, sample rates, number of receive channels,
phase coding, etc.)
Processing algorithms (preliminary)
Interfaces (Sensor data, Sensor command, Nav, mission
computer, instrumentation system)
Range
Series of pulses with a phase relationship, transmitted and collected so that they can be coherently processed
Dwell time
Bars
Pulse modulation
Amplitude and phase modulation superimposed on the pulse for the duration of the pulse
Receive channels: radar antennas are typically partitioned into subarrays that have physically offset phase centers, each connected to a separate receiver and A/D
CPI M
pulse 0 - N
Number of CPIs/Dwell
Number of pulses/CPI
Pulse modulation: LFM (linear frequency modulation)
Number of receive channels
PRF
Number of swaths/scan area
Scan area rate
Case study:
GMTI Processing Algorithm (CPI processing)
I/Q Formation
Pulse Compression
Clutter Cancellation
Motion Compensation
Noise Estimation
Target Detection
Doppler Filtering
PDI Processing
Case study:
GMTI Processing Algorithm (PDI processing)
CPI Processing
False Alarm Control
M of N Processing
Ambiguity Resolving
Angle Estimation
Noise Estimation
Sidelobe Detection Rejection
Target Parameter Estimation
Hit List
Homework
Engineering 180
Systems Engineering
Embedded Processing Case Study
Lecture 2
May 26, 2015
Steve Kirsch
Lecture 2
Conceptual Design
Review Homework
Starting point for conceptual design of the
embedded processing
Feasibility and Requirements Analysis
Embedded processing design synthesis
Subsystem concept design review process
Homework
Homework 1: review
A custom design will have some logic dedicated to sub-functions that aren't active all the time. Microprocessor logic is reused for all sub-functions
Microprocessors are application agnostic, so we can leverage huge investments made by others. Application-specific logic can be implemented in software
Microprocessors can be faster than custom logic (seems almost counterintuitive!)
Utilizes the latest manufacturing processes
Resources available for access to the best experts and large design teams
Can overcome the overhead of interpreting instructions with clever utilization of parallelism
Homework 1: review
BAA (Broad Agency Announcement) typically precedes the RFP (request for proposal) when contracting with the US government
RFP let
Contract won!
Program manager
Subsystem architect (head technical subsystem engineer)
Development team leads (Tech leads)
Hardware unit lead
Mode software lead
Infrastructure software lead
Requirement sources
Customers
System Team
System Program Manager
Contracting organization
Procurement spec
Subsystem specs (Generated by tier 1 system team)
KPPs (Key Performance Parameters identified by
customer or system team)
System TPMs (Technical Performance Measures)
SRD (system requirements document)
SDD (system design description)
SRR (system requirements review material)
Vendor components specifications
Legacy systems components *
Standards *
Laws of Physics
Company development procedures, ethics, rules
Laws of the land and point of deployment
Common sense
* Potential requirement source
Key questions:
Can we design the embedded processing to run in real time while meeting the SWaP-C requirements?
SWaP-C (Space, Weight, and Power - Cost)
Understand the requirements and focus first on the primary requirements that will likely drive the top-level design
For our case study, the real-time signal processing requirement is key
The system requirement is to scan an area of interest (AoI) in N seconds, process the data in real time, and produce a hit report of all ground movers within the AoI with a false alarm rate of R and a probability of detection P
The system flowdown requirements have specified the waveforms and the signal processing algorithms that can achieve this system performance
As one begins to drill down to the next level of detail, some requirements might not be achievable within the scope of other requirements
Requirements can be modified at this stage to help achieve the primary system goals
Pulse Compression
Clutter Cancellation
Motion Compensation
Noise Estimation
Target Detection
Doppler Filtering
PDI Processing
A/D rates
A/D sample word size (often a function of data rate)
Number of input data channels
REX to processor network bandwidth and protocol
How is the data packaged and shipped?
How much extra bandwidth is needed for the protocol (e.g., error correction coding)?
What is the receive duty? (How much of the total time is data streaming?)
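The input parameters listed above combine into a first-order estimate of the average data rate the processor must absorb. The sketch below is only an illustration: the function name and every numeric value (sample rate, word size, channel count, protocol overhead, receive duty) are hypothetical and are not taken from the case study.

```python
def input_data_rate_bytes_per_sec(adc_rate_hz, bytes_per_sample, num_channels,
                                  protocol_overhead, receive_duty):
    """Average bytes/sec arriving from the REX (first-order estimate).

    protocol_overhead: extra bandwidth fraction for the protocol (0.25 = 25%).
    receive_duty: fraction of total time that data is streaming (0..1).
    """
    payload = adc_rate_hz * bytes_per_sample * num_channels
    return payload * (1.0 + protocol_overhead) * receive_duty


# Hypothetical values: 100 MS/s A/D, 4-byte samples, 4 channels,
# 25% protocol overhead, data streaming 50% of the time.
rate = input_data_rate_bytes_per_sec(100e6, 4, 4, 0.25, 0.5)
print(f"average input rate: {rate / 1e9:.2f} GB/s")
```

Setting `receive_duty` to 1.0 gives the peak rate during a receive window, which is the figure the network link itself has to sustain.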
Figure: N pulses, each followed by a receive window
Data Collection Time
Mode 1
Mode 2
Mode 1
Data Collection Time
D0
D1
Mode 1
Collection
Mode 2
Collection
Mode 1 D0
Processing
D2
Mode 1
Collection
Mode 2 D1
Processing
D0
D1
Mode 1
Collection
Mode 2
Collection
Mode 1 D0
Processing
D2
Mode 1
Collection
Mode 2 D1
Processing
Mode 2 D2
Processing
Parallel Processing
Memory size required could be larger than in the fully pipelined architecture
Processing performance per processor is reduced
Notice that the time to get the results from processing dwell D0 (latency) is longer in this case
D0
Mode 1
Collection
Dwells
D1
Mode 2
Collection
D2
Mode 1
Collection
Mode 1 D0
Processing
Mode 2 D1
Processing
Mode 1 D2
Processing
Memory Bandwidth
Network Bandwidth
I/O Bandwidth
CPU OPs (operations per sec)
Signal processing performance is usually expressed in FLOPS (floating-point operations per sec)
I/O Ports
Program
Memory
CPU
Clock
P Core
Core 1
Input
Data
Core N
Reg File
Output
Data
Instruction
ALU
0
ALU
N
General Purpose
Processors (GPPs)
Digital Signal
Processors (DSPs)
Field Programmable
Gate Arrays (FPGAs)
Lowest
Ease of Application
Programming
Full-featured support of
HOL programming
Limited support of
HOL programming
VHDL Required
Limited number of
products available
with floating point
High Performance
GPPs more expensive
than DSPs
Lowest
Highest
When to Consider
Using
ASIC FPGA
Yes
No
High
Moderate
Lengthy
Moderate
Moderate
High
High
Moderate
Lower
Higher
Higher
Lower
Functional Density
Mission
Computer
INS
GPS
Embedded Processor
General Purpose
Processing
High Speed
Instrumentation
System
REX
Signal Processing
Digital Entertainment
AES
DES
High-Pass Gray-Scale Filter
Huffman Decoding
MP3 Decode
MPEG-2 Decode
MPEG-2 Encode
MPEG-4 Decode
MPEG-4 Encode
RGB to CMYK Conversion
RGB to YIQ Conversion
RSA
Autocorrelation
Bit Allocation
Convolutional Encoder
Fast Fourier Transform (FFT)
Viterbi Decoder
Pulse Compression
Clutter Cancellation
Motion Compensation
Noise Estimation
Target Detection
Doppler Filtering
PDI Processing
TPMs
Validation by analysis
Validation at unit test level
Validation in the system integration lab
Validation in a deployed environment
Homework
Engineering 180
Systems Engineering
Embedded Processing Case Study
Lecture 3
May 28, 2015
Steve Kirsch
Lecture 3
Preliminary Design
Nested
Design Process for Complex Systems
Major subsystem interfaces defined
P1. Subsystem Requirements Analysis
Preliminary Subsystem Architecture
P2. Requirements Allocation
P3. Interface identification/design
P4. Subsystem-level synthesis
P5. Preliminary design review
To detailed design
All radar waveforms finalized by system CDR (not quite there yet)
Explore the full range of variability on interfaces
PCIe x8
High Speed
point to point
mesh
network
sFPDP x8
REX
Data
I/F
REX
Cntrl
I/F
10 Gb Ethernet
Ethernet
Controllers
System I/O
Custom
I/F
The Cell multi-core processor was a joint development between Sony, Toshiba, and IBM
The first application was Sony's PlayStation 3
First chips (90 nm version) available in 2005
65 nm version in 2007 and 45 nm version in 2009 (first chip used in Sony PlayStation)
Chip performance was way ahead of its time in 2005
Wikipedia:
Single instruction, multiple data (SIMD), is a class of parallel computers in Flynn's
taxonomy. It describes computers with multiple processing elements that perform the
same operation on multiple data points simultaneously. Thus, such machines
exploit data level parallelism, but not concurrency: there are simultaneous (parallel)
computations, but only a single process (instruction) at a given moment.
Register File
Instruction
Register
Decoder
ALU
ALU
ALU
ALU
Concurrency
Does it Imply Parallelism?
Sequential program
A single thread of control that executes one instruction at a time
The next instruction isn't executed until the prior one has completed
Concurrent program
A collection of autonomous sequential threads executing logically in
parallel
Parallelism
Physically simultaneous processing
Requires a multi-processor not just a multi-threaded single processor
Data Synchronization
Data Organization
Inter-process Communication Fundamentals
Parallel programs need to share data and results processed by
different processors. There are two typical ways to pass data
Shared memory Architecture
Message passing architecture
Shared memory
Architecture
PROCESSOR
Message Passing
Architecture
PROCESSOR
PROCESSOR
Processor
+ memory
Processor
+ memory
PROCESSOR
Processor
+ memory
Interconnection
Network
GLOBAL
MEMORY
PROCESSOR
Processor
+ memory
Processor
+ memory
PROCESSOR
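The two ways of passing data can be contrasted in a small sketch. This is an illustration only: it uses Python threads as stand-ins for processors, and the function names and the partial-sum workload are hypothetical. In the shared-memory version every worker updates one common variable and must synchronize access to it; in the message-passing version each worker keeps its data private and sends only its result.

```python
import queue
import threading


def shared_memory_sum(chunks):
    """Workers add partial sums into one shared variable guarded by a lock."""
    total = {"value": 0}
    lock = threading.Lock()

    def worker(chunk):
        partial = sum(chunk)
        with lock:                      # synchronize access to shared state
            total["value"] += partial

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["value"]


def message_passing_sum(chunks):
    """Workers keep their data private and send partial sums as messages."""
    mailbox = queue.Queue()

    def worker(chunk):
        mailbox.put(sum(chunk))         # only the result crosses the boundary

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(mailbox.get() for _ in chunks)


data = [[1, 2, 3], [4, 5], [6, 7, 8]]
print(shared_memory_sum(data), message_passing_sum(data))
```

Both functions return the same total; what differs is where the coordination cost sits (lock contention on shared state versus message traffic on the interconnect).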
Processing Domains
Channel
T7
T5
T3
T1
Thread 1
Slow Time
T6
T4
T2
Thread 0
T0
Fast Time
Data in Sequential Memory Locations
T6
Thread 1
Fast Time
T4
T2
Channel
T0
Thread 0
Slow Time
Auto-vectorization for data level parallelism (DLP) extraction has been difficult to automate
Many attempts (Intel C++ compiler, GCC, Green Hills Multi tools)
Experience shows these tools aren't particularly good
Still a big research area (too risky for our case study)
Shared memory
Message passing
GPU specific architecture
Programming Model:
Directed Acyclic Graphs (DAG)
Wikipedia Definition
5
10
11
3
2
9
7
Programming Model:
Directed Acyclic Graphs (DAG)
Placing vertices with no inputs on the left, those with no outputs on the right, and those with both inputs and outputs in the middle gives a more intuitive data flow diagram
5
10
11
2
9
7
10
8
2
7
11
The Directed Acyclic Graph (DAG) methodology is a perfect match for signal processing abstraction
DAGs are a good method for expressing parallelism and data flow relationships
The signal processing programming model focuses on exposing parallelism and processing precedence relationships
A vertex represents one or more signal processing functions, and directed edges are the data flow paths from one processing step to the next
The acyclic nature of a DAG is key to achieving an efficient processing structure
The invocation of the processing at a vertex depends only on the availability of its input data
Once processing at a vertex has been invoked, it runs to completion uninterrupted
Data flows through the processing steps at a rate determined solely by the latency of the processing at each vertex
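The firing rule described above (a vertex runs as soon as all of its inputs are available, then runs to completion) can be sketched as a small data-driven scheduler. This is a minimal illustration, not the case study's runtime: `run_dataflow`, the four-vertex graph, and the arithmetic stand-ins for processing functions are all hypothetical.

```python
from collections import deque


def run_dataflow(edges, functions, sources):
    """Fire each vertex once every one of its input edges carries data.

    edges:     dict vertex -> list of downstream vertices
    functions: dict vertex -> callable taking a list of input values
    sources:   dict source vertex -> initial input value
    Returns (firing order, output value per vertex).
    """
    preds = {v: [] for v in functions}
    for u, downstream in edges.items():
        for v in downstream:
            preds[v].append(u)

    pending = {v: len(p) for v, p in preds.items()}   # inputs still missing
    ready = deque(v for v in functions if pending[v] == 0)
    outputs, order = {}, []

    while ready:
        v = ready.popleft()
        inputs = [outputs[p] for p in preds[v]] if preds[v] else [sources[v]]
        outputs[v] = functions[v](inputs)             # runs to completion
        order.append(v)
        for w in edges.get(v, []):
            pending[w] -= 1
            if pending[w] == 0:                       # all inputs now available
                ready.append(w)

    if len(order) != len(functions):
        raise ValueError("graph has a cycle: some vertices never became ready")
    return order, outputs


# Hypothetical graph: a feeds b and c, which both feed d.
edges = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
functions = {
    "a": lambda xs: xs[0] + 1,
    "b": lambda xs: xs[0] * 2,
    "c": lambda xs: xs[0] * 3,
    "d": lambda xs: sum(xs),
}
order, outputs = run_dataflow(edges, functions, {"a": 1})
print(order, outputs["d"])
```

Vertices b and c have no dependency on each other, so a real scheduler could run them on different processors; this sketch runs them one after another only to keep the example short. A cycle would leave some vertices waiting forever, which is why the acyclic property matters.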
Poor Design
1+2
Good Design
PDI Graph
Post Detection Integration processing
1 subgraph per graph
CPI Graph
Ch0
Subgraph 0
Ch 1
Subgraph 1
CPI Graph
Subgraph 0
Ch 2
Subgraph 2
Ch 3
Subgraph 3
P2,1
PDI subgraph 0
P2,0
PDI subgraph 0
PDI subgraph 0
P1,1
CPI subgraph 3
CPI subgraph 3
CPI subgraph 3
P1,0
CPI subgraph 2
CPI subgraph 2
CPI subgraph 2
P0,1
CPI subgraph 1
CPI subgraph 1
CPI subgraph 1
P0,0
CPI subgraph 0
CPI subgraph 0
CPI subgraph 0
Processing
for Dwell 0
Processing
for Dwell 1
Processing
for Dwell 2
PX,Y
X= module number
Y= Processor number
Objectives:
Make sure the functional baseline
requirements have been adequately
addressed by the preliminary design
Physical architecture
Interfaces
Subsystem functional requirements
Real-time constraints
SWAP
"-ilities"
Key documents:
Subsystem description
Interface control documents (ICDs)
Preliminary Timing Analysis
Requirements traceability
Draft Requirement Compliance Matrix
Design review package
Homework
Read Paper:
Hybrid processor architectures meet demands for SWaP
By John Keller
Available in CCLE 15S-ENGR180-1 Information Folder
What are the pros and cons of using a hybrid processor architecture for our case study of
a Radar embedded processor?
Is a hybrid architecture a good potential solution to resolve our processing timeline
issue?
Backup Slides
Hardware Architecture
Signal Processing Software Architecture
Performance of the Architecture
Getting good performance requires a system approach
Let's drill down into the architecture
Standard programming languages (e.g., C++), if compiler technology supports automated vectorization of code
Predesigned signal processing libraries
Signal processing libraries are target-dependent code written using SIMD instruction sets
They are basically assembly-level code that can access the ISA (instruction set architecture) of the target processor
Examples of SIMD instruction sets are:
AltiVec: PowerPC architecture
SSE: x86 architecture
SPE intrinsics: IBM Cell SPE
SYSTEM ENGINEERING
Engineering 180
Systems Engineering
Embedded Processing Case Study
Lecture 4
June 2, 2015
Steve Kirsch
Lecture 4
Detailed Design / Integration and Test
Update of TPMs
PDR:
Performance was identified as a big risk!
Reported at PDR
Processing time will be longer than the collection time, and thus will not keep up with real time
Data
Processing
Data
Collection
1
1
2
2
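Why a processing time longer than the collection time is fatal for real-time operation can be shown with a short simulation. This is a sketch under simplifying assumptions: dwells are collected back to back, one pipeline processes them in order, and the function name and timing values are hypothetical.

```python
def result_lag(num_dwells, collection_time, processing_time):
    """Time between the end of the last dwell's collection and the moment
    its results are available, for back-to-back dwells on one pipeline."""
    finish = 0.0
    for k in range(num_dwells):
        data_ready = (k + 1) * collection_time        # dwell k fully collected
        finish = max(finish, data_ready) + processing_time
    return finish - num_dwells * collection_time


# Processing faster than collection: lag stays at one processing time.
print(result_lag(10, 1.0, 0.8))
# Processing slower than collection: lag grows with every dwell.
print(result_lag(10, 1.0, 1.2), result_lag(100, 1.0, 1.2))
```

With processing slower than collection, the lag grows by the difference between the two times on every dwell, so the unprocessed data (and the memory needed to hold it) grows without bound.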
Processor Enclosure
5 module slots available
4 used in baseline + 1 spare
System Design
Performance Growth:
Source of real-time performance issue
The preliminary Doppler Tune assessment accounted only for the application of the tuning parameters
Computation of the tuning parameters was initially ignored, which resulted in a large unaccounted processing load
T0 T1 T2 T3
S0
T0 T1 T2 T3
S1
T0 T1 T2 T3
S2
T0 T1 T2 T3
Time-shifted
Replicas of Tx(Sn)
S3
T0 T1 T2 T3
S4
T0 T1 T2 T3
S5
Received energy Rx
Convolution function
of Tx(sn) with Rx
S0 S1 S2 S3 S4 S5 S6 S7 S8 S9 S10 S11 S12 S13 S14 S15
Convolution / Correlation = (time-shifted replicas of Tx(Sn)) · Rx = Dot Product
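The idea that correlation is a series of dot products between time-shifted replicas of the transmit waveform and the received samples can be sketched directly. The 4-sample transmit code and the 16-sample receive vector below are hypothetical values chosen to mirror the T0-T3 and S0-S15 labels; they are not case-study data.

```python
def correlate(rx, tx):
    """Slide the transmit replica across rx; each output is one dot product
    of the (conjugated) replica with the received samples under it."""
    n_out = len(rx) - len(tx) + 1
    return [
        sum(r * t.conjugate() for r, t in zip(rx[shift:shift + len(tx)], tx))
        for shift in range(n_out)
    ]


tx = [1, -1, 1, 1]            # hypothetical transmit samples T0..T3
rx = [0] * 16                 # received samples S0..S15
rx[5:9] = tx                  # hypothetical echo starting at sample S5

compressed = correlate(rx, tx)
peak_shift = max(range(len(compressed)), key=lambda i: abs(compressed[i]))
print(peak_shift, compressed[peak_shift])
```

The output peaks at the shift where the replica lines up with the echo. This direct form costs one dot product per shift; fast convolution computes the same result using FFTs.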
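Given a set of N complex input samples $x_n$, where $n = 0, \dots, N-1$, the DFT filters are:
$$F_m = \sum_{n=0}^{N-1} W^{mn} x_n, \quad m = 0, \dots, N-1$$
where
$$W^{mn} = \exp\left(-\frac{2\pi j\, mn}{N}\right), \quad W = \exp\left(-\frac{2\pi j}{N}\right)$$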
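A direct evaluation of the DFT filter bank can be sketched as follows. The sketch assumes the standard forward-transform sign convention (negative exponent) and uses a hypothetical 8-sample test tone; it evaluates the sum term by term and is meant to show the definition, not an efficient FFT.

```python
import cmath


def dft(x):
    """Direct DFT: F[m] = sum over n of W**(m*n) * x[n], W = exp(-2*pi*j/N)."""
    n_samples = len(x)
    w = cmath.exp(-2j * cmath.pi / n_samples)
    return [
        sum(w ** (m * n) * x[n] for n in range(n_samples))
        for m in range(n_samples)
    ]


# Hypothetical input: a complex tone that falls exactly in filter m = 1.
tone = [cmath.exp(2j * cmath.pi * n / 8) for n in range(8)]
spectrum = dft(tone)
print([round(abs(value), 6) for value in spectrum])
```

All of the tone's energy lands in filter 1, which is how Doppler filtering separates returns by Doppler frequency. The direct form costs on the order of N squared operations, which is why an FFT is used in practice.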
The preliminary Doppler Tune assessment accounted only for the application of the tuning parameters
Computation of the tuning parameters was initially ignored, which resulted in a large unaccounted processing load
Performance was dominated by data movement, not computation cycles
The initial analysis didn't account for the simultaneous flow of REX data to main memory and of data produced between processing steps
Could pulse compression processing be done in the REX prior to sending data to the processor?
Can margin that was planned to reduce risk later in the program be used now to solve this performance problem?
If the spare slot is used for an additional signal processing card, will it solve the performance issues?
What are the options for increasing throughput and memory bandwidth on the signal processing card?
Increased development cost and NRE might be a big driver for the solution
Performance Trades:
Second Look at Requirements and Assumptions
Tailoring system requirements to address only the specific known radar mode was deemed a poor choice
The design requirement to accommodate new, undefined applications is very important
Though this approach could meet the SWAP requirements of the first application of the system, it was deemed too expensive and would exceed the SWAP for other potential applications
Partitioning a mode across multiple units, given limited box-to-box bandwidth, potentially wouldn't solve all the performance issues
Utilizing the spare slot for the additional performance would violate the processing margin requirement
The intent of the spare is for future programs and risk reduction during the test and integration phase
Program resources could be reallocated (i.e., $$, schedule, and engineering talent)
Module SWAP margin was a lower risk, and margin could be used earlier in the program
The next step is trade studies for the best way to improve module performance
Programming model not well understood by the engineer doing performance modeling
Figure: ping-pong buffering timeline. Labels: DMA to Ping, DMA to Pong, Ping Buf, Pong Buf, Processing, Time; data blocks t1-t8 in a data independent processing domain
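Ping-pong (double) buffering changes how a performance model must add up time: DMA into one buffer overlaps with processing of the other, so the slower of the two activities sets the pace. The timing model below is a simplified sketch with hypothetical block counts and durations, assuming one DMA engine, one processor, and exactly two buffers.

```python
def ping_pong_time(num_blocks, dma_time, proc_time):
    """Completion time for num_blocks blocks with two alternating buffers.

    Block k's DMA waits for the previous DMA and for its buffer to be freed
    (block k-2 processed); its processing waits for its own DMA and for the
    previous block's processing.
    """
    dma_done, proc_done = [], []
    for k in range(num_blocks):
        prev_dma = dma_done[k - 1] if k >= 1 else 0.0
        buffer_free = proc_done[k - 2] if k >= 2 else 0.0
        dma_done.append(max(prev_dma, buffer_free) + dma_time)
        prev_proc = proc_done[k - 1] if k >= 1 else 0.0
        proc_done.append(max(dma_done[k], prev_proc) + proc_time)
    return proc_done[-1]


def serial_time(num_blocks, dma_time, proc_time):
    """Completion time when each block is moved, then processed, with no overlap."""
    return num_blocks * (dma_time + proc_time)


# Hypothetical: 8 blocks (t1..t8), equal DMA and processing time per block.
print(serial_time(8, 1.0, 1.0), ping_pong_time(8, 1.0, 1.0))
# Hypothetical: DMA twice as slow as processing, so data movement dominates.
print(ping_pong_time(8, 2.0, 1.0))
```

A model that simply sums DMA and compute time overestimates a balanced pipeline, while a model that counts compute cycles alone misses the case where data movement is the bottleneck.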
FFT example
N=1024 Complex Floating Point Samples
Total FLOPs to perform pulse compression via fast convolution = 10,240 FLOPs
Assume CPU executes 1 FLOP/ns
Fast convolution time = 10,240 FLOPs / (1 FLOP/ns)
= 10.24 µsec
Assume memory bandwidth = 100MB/sec
Complex floating point sample = 8 bytes
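The numbers on this slide can be carried one step further to compare compute time with data movement time. The sketch below uses the slide's figures; the assumption that each input sample crosses the memory interface exactly once is mine, made only to get a lower bound on transfer time.

```python
def compute_time_sec(flops, flops_per_sec):
    """Time spent on arithmetic alone."""
    return flops / flops_per_sec


def transfer_time_sec(num_samples, bytes_per_sample, bandwidth_bytes_per_sec):
    """Time to move the samples across the memory interface once (assumption)."""
    return num_samples * bytes_per_sample / bandwidth_bytes_per_sec


compute = compute_time_sec(10_240, 1e9)        # 10,240 FLOPs at 1 FLOP/ns
transfer = transfer_time_sec(1024, 8, 100e6)   # 1024 samples, 8 bytes, 100 MB/sec

print(f"compute:  {compute * 1e6:.2f} usec")
print(f"transfer: {transfer * 1e6:.2f} usec")
print(f"transfer / compute = {transfer / compute:.1f}")
```

Under these assumptions a single pass over the input takes about eight times longer than the arithmetic, which fits the finding that performance was dominated by data movement rather than computation cycles.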
Trade-study results
Analysis error accounted for only a small fraction of the performance issue
Re-allocating processing requirements from the Cell to a new front-end processor looks promising
Large data rate reduction after front-end processing (reducing the processing load on the following stages)
Application-specific designs tend to have the highest performance per SWAP
Trades Conclusion
Additional investment to develop an application-specific solution for front-end processing functions
FPGA (Field Programmable Gate Array) solution is the best choice (other contenders: GPGPUs and DSP-specific COTS chips)
Biggest bang for the buck!
Front-end processing is fairly consistent between different mode applications
Greatly reduces the load on the IBM Cell
Main
Memory
CPU
IBM Cell
Distributed
Global
Bulk
Memory
Front-end
Processor
Network
Interface
Controller
Main
Memory
Main
Memory
CPU
IBM Cell
Distributed
Global
Bulk
Memory
CPU
IBM Cell
Distributed
Global
Bulk
Memory
Front-end
Processor
Network
Interface
Controller
Front-end
Processor
Network
Interface
Controller
sFPDP x8
High Speed
point to point
mesh
network
REX
Data
I/F
REX
Cntrl
I/F
Ethernet
Controllers
10 Gb Ethernet
Custom
I/F
System I/O
Very large
Computational
intensive functions
GBM functions
REX data stored in GBM instead of main memory
Decouples high bandwidth REX interface from impacting Cell computations
Lessons Learned
Hardware requirements
Software requirements
Interaction between hardware and software
Problems discovered later in the design process are much more costly (e.g., if the performance issues had been found in integration, the fix would have been very expensive)
Though in this case we weren't able to change the system requirements, it was worth exploring
A lower-cost solution might have been to give up design margin, but the consequences were too high and the probability of an occurrence wasn't low enough
Application-specific designs can be more SWAP efficient than general solutions, but are in general more costly
Increasing complexity
and cost of validation
Real-time debug tools for unit test and system integration lab
Does the IDE (Integrated Development Environment) support non-intrusive monitoring of OS and application software? (example follows)
System-level instrumentation (support for both SIL and field testing)
At the full system level, are sufficient interfaces and capabilities provided for non-intrusive real-time access?
Is there sufficient support for data reduction tools?
Sorting and understanding of the data of interest
Example of an IDE
Real-time Non-Intrusive Debug Tool
EventAnalyzer displays the length and frequency of RTOS and user events, making it quickly apparent what
operations take the most time and where optimization efforts should be focused
CDR purpose
Final design review prior to the official acceptance of the design
Opportunity for all stakeholders to assess the design's compliance to requirements
Opportunity to review risk assessment and mitigation results
All risks should be well understood and accepted at this time
Goal of a CDR
- Demonstrate the design meets the functional and performance requirements
- Assure that the test and evaluation strategies, procedures and support are in place for the next development phase
- Establishment of the Product Baseline
Successful completion of CDR is the green light for the next development phases
- Building Hardware
- Writing of Application Software
- Unit test
- System Test
The last four lectures stepped through the design development process for an embedded processing design
Concept development
Preliminary design
Detailed design
Integration and Test (briefly)
The case study used a real application of real-time, high-performance embedded processing in a highly SWAP-constrained environment
The goal was to provide insight into the system engineering process, the myriad complexities that the embedded system engineer needs to be aware of, and the skill set required