Welcome to SMC-IT

MINI-WORKSHOP ON

MULTICORE PROCESSORS FOR SPACE - OPPORTUNITIES AND CHALLENGES

Using Maestro for Early Prototyping

This 2 1/2 day workshop at the SMC-IT conference includes both general topics applicable to most multicore processors in space missions and in-depth technical topics from recent work on the Maestro processor. Please contact the organizing committee chair if you are interested in presenting or serving on a panel at the workshop. Additional topic suggestions are welcome!

Invited talks will address major themes in the workshop:

  • Session I: Programming for Parallelism in Space: Tools and Programming Models
    • Compilers
    • Parallel Libraries
    • Programming Models – How to Choose and Combine
  • Session II: Supporting Technologies and Tools for Multicore In Space, including
    • Run-time support for compilers
    • Programmability and Portability
    • Power Management
    • Performance optimization and assurance
  • Session III: Will It Work? Is It Still Working?
    • Verification and Validation of Multicore Processors for Space Missions
    • On-Board Fault Management for Multicore
  • Session IV: Space Applications on Multicore Processors
  • Session V: Operating Systems, Hypervisors, and I/O for Multicore In Space
  • Session VI: Architecture-Specific Optimization Techniques: Maestro

Preliminary workshop schedule:

  • Sunday 7/19
    • 1PM – 5PM Multicore in space background information – embedded multicore requirements, avionics system design and multicore processors, OPERA/Maestro overview, the vocabulary of multicore and parallel programming
    • evening – miniworkshop reception
  • Monday 7/20
    • 9:30AM Workshop Keynote
    • 10:30AM – noon Session I
    • 1:30PM – 5:00PM Sessions II and III
    • evening – Conference Reception
  • Tuesday 7/21
    • 9:30AM – noon Session IV
    • 1:30PM – 4:45PM Sessions V and VI
    • 4:45PM – 5:00PM Workshop Summary

Invited tutorials on Sunday afternoon 7/19 1PM – 5:30PM provide background information:

  • 1:00 – 2:00 PM A Rapid Tour Through Multicore Technologies (B. Scott Michel, Aerospace)
    This tutorial briefly examines the spectrum of multicore technologies, ranging from symmetric chip multicore to hybrid multicore and accelerator technologies (e.g., general-purpose graphics processing unit (GPGPU) acceleration). It also presents the problem areas to which each of these technologies is applied. Finally, various solutions to the multicore "programmability gap" are examined.

  • 2:00 – 2:45 PM Avionics System Design and Multicore – From R&D To Deployment – The Road To Flight (Rafi Some (speaker), Len Day, Dwight Geer, JPL)
    "The Road to Flight" will cover a few of the salient mission lifecycle technology infusion steps required to design and build a new processor part type into a fully flight-qualified avionics subsystem for deep space. Both hardware and software will be examined. A case study will be given as an example. Comments on the DoD perspective will be given by Mike Malone.

  • 2:45 – 4:00 PM OPERA/Maestro overview including RHBD (Mike Malone, Draper Laboratory and Steve Crago, USC/ISI-East)
    This presentation will describe the OPERA (Onboard Processing Expandable Reconfigurable Architecture) program.  Also discussed will be background on the RHBD (Radiation Hardened By Design) program, the original DARPA PCA (Polymorphic Computing Architectures) program, and the OPERA program’s space multicore processor - Maestro.  Finally, the presentation will outline the associated software and early architecture prototyping options. 

  • 4:00 – 5:30PM Leveraging Multicore and Virtualization (Mike Deliman, Wind River)
    This tutorial will introduce the concepts essential to understanding the role of the operating system and other system components on multicore platforms, and illustrate the differences between the software “stack” for a uniprocessor system and those necessary to provide SLA (service level agreement) assurance for a variety of multicore architectures.

The workshop keynote on Monday 7/20 morning kicks off the detailed technical sessions:

  • 9:30AM – 10:30AM Digital Space - Dr. Anant Agarwal, Professor of Electrical Engineering and Computer Science at MIT and Associate Director of the CSAIL Laboratory

Detailed program of invited talks, by session:

  • Session I: Programming for Parallelism in Space: Tools and Programming Models
    Session I Chair: Hans Zima; Co-Chair: B. Scott Michel

      Compilers – Session I-A Monday 7/20 10:30AM – noon

  • 10:30AM – 11:00AM Models and Tools for Spaceborne Computing (Hans P. Zima, zima@jpl.nasa.gov)
    The emergence of multi-core technology provides the opportunity for high-capability space-borne computing supporting autonomy and on-board science processing. The massive parallelism provided by systems such as Maestro requires a re-thinking of the ways future architectures will be programmed.

    This presentation will outline requirements and challenges for high-productivity programming languages for reliable flight computing. We will discuss the major ideas underlying the programming models of Partitioned Global Address Space (PGAS) languages as well as the Chapel language developed in DARPA's HPCS program and point out how these concepts can contribute to the development of a powerful new language for space-borne computing.

  • 11:00AM – 11:30AM Empirical Analysis of UPC in the Tile-64 Processor, Olivier Serres (serres@gwmail.gwu.edu), Ahmed Anbar (anbar@gwmail.gwu.edu), Saumil Merchant (smerchan@gwu.edu), Tarek El-Ghazawi (tarek@gwu.edu) (speaker)
    This paper presents preliminary implementations and analyses of the Unified Parallel C (UPC) programming language on the Tile 64 processor from Tilera Corporation. The chosen UPC design flow uses the Berkeley UPC translator and the Tilera C compiler to compile the UPC programs. The UPC runtime system is built on top of the GASNet communications infrastructure. Two approaches have been used to implement GASNet for the Tile 64 processor: (i) GASNet built using the pthreads conduit, and (ii) GASNet built using the MPI conduit. Each approach uses different on-chip, inter-core communication networks providing different latencies and bandwidths for inter-process communications. The paper presents the implementation details and empirical analyses of both approaches by comparing results from microbenchmarks and commonly used full-featured benchmarks. Analyses of the benchmark results reveal several bottlenecks and optimization opportunities. These are currently under study and the insights gained will be used to design a highly optimized UPC runtime system for the Tile 64 processor.

  • 11:30AM – noon R-Stream: Automatic Parallelization and Mapping for Space Processors (Richard Lethin, Reservoir Labs, lethin@reservoir.com)
    High computational efficiencies (ops/Watt) are available for emerging space-based processors but apparently at the cost of greater programming complexity, such as parallelization and the detailed choreography of hardware resources. This talk will present the R-Stream compiler, which can perform such mapping automatically from a high-level program input, and discuss some of the potential system benefits of this approach to software development for space processors.

       Parallel Libraries – Session I-B Monday 7/20 1:30PM – 2:30PM

  • 1:30PM – 2:00PM MPI Performance on Maestro, Steve Crago (crago@east.isi.edu), Mikyung Kang, and Jinwoo Suh (speaker), USC/ISI-East
    In this talk, we will describe our implementation of MPI (Message Passing Interface) 1.2 on Maestro. Our MPI library is implemented on top of a modified version of Tilera's iLib. We will describe performance in terms of latency and bandwidth for a range of message sizes, and will show the component costs and contributing performance characteristics. We will also describe the validation tests that we have run to ensure compliance with the MPI 1.2 standard.
  • 2:00PM – 2:30PM OpenMP for Tilera, Richard Lethin (speaker) and Vassily Litvinov, Reservoir Labs (vass@reservoir.com)
    OpenMP is a mainstream programming methodology for shared memory computers. This talk will describe the implementation of OpenMP in terms of Tilera shared memory abstractions, and show how to program the Tilera architecture using OpenMP.

    Session II: Supporting Technologies and Tools for Multicore In Space Monday 7/20 2:30PM – 4PM
    Session Chair: Richard Schooler; Co-Chair: Steve Crago
    • 2:30PM – 3:00PM Run-Time Monitoring – What, When, How – Dong-In Kang (speaker) and Steve Crago, USC/ISI-East (crago@east.isi.edu)
      In this talk, we will describe a run-time performance monitor for the Maestro processor. The Maestro processor is a 49-core multi-core processor for space being developed using radiation-hardened by design technology. The run-time performance monitor collects information from built-in hardware performance counters to be used by programmers, performance analysis and optimization tools, and the operating system or run-time system. Performance information related to the processor, memory system, and on-chip networks is collected.
    • 3:00PM – 3:15PM Managing Power on the TILE64 – Richard Schooler, Tilera (rschooler@tilera.com)
      This talk will discuss the power management options on the TILE64 and Maestro chips, providing an opportunity for Q&A about this architecture feature.
    • 3:15PM – 4:00PM Performance Analysis Tools for Maestro – Rob Kost, USC/ISI-East (speaker) and Steve Crago USC/ISI-East (crago@east.isi.edu)
      We will describe a suite of performance analysis tools for the Maestro processor in this talk. The Maestro processor is a 49-core multi-core processor for space being developed using radiation-hardened by design technology. Maestro is based on a homogeneous architecture, is relatively straightforward to program, and supports a variety of programming models. However, performance analysis and optimization for 49 cores can be a challenge for complex applications that run on many cores. Maestro programmers can use the performance analysis tools that are part of Tilera’s Multicore Development Environment (MDE), and we are developing additional tools for Maestro that extend this capability, with a focus on correlating performance characteristics to source code.
    • Session III: Will It Work? Is It Still Working? Monday 7/20 4PM – 5PM
      Session Chair: Joe Coughlan; Co-Chair: Hans Zima
    • 4:00PM – 4:30PM Introspection-Based Fault Tolerance, Mark L. James, Paul L. Springer, and Hans P. Zima (speaker), JPL (zima@jpl.nasa.gov)
      Future deep-space missions will need support for autonomy and enhanced science processing. The integration of emerging multi-core technology into space-borne systems can provide the required performance; however, protecting such systems against faults has become a critical research issue.

      In this talk we present the design of a generic framework for introspection that supports runtime monitoring and analysis of program execution as well as feedback-oriented recovery from faults. Introspection provides flexible software fault tolerance matched to the requirements and properties of applications by exploiting knowledge that is either contained in a knowledge base, provided by users, or automatically derived from specifications. The JPL-developed Spacecraft Health Inference Engine (SHINE) is at the center of the introspection system.

    • 4:30PM – 5:00PM Adaptive Software-based Fault Tolerance for Space Multicore Processing, Adam Jacobs (Jacobs@chrec.org), Grzegorz Cieslewski (cieslewski@chrec.org) (coordination: Herman Lam, hlam@ufl.edu) (30 minutes)
      Increasing demand for high-performance computing in space, coupled with limitations of device-level methods for SEU mitigation, is driving innovations in advanced space computing with system- and application-level fault tolerance. As devices increasingly feature multicore architectures, the space community must adapt and incorporate these devices into future missions. These multicore devices are an increasingly attractive option for processing in space-based systems due to their inherent advantages in performance, scalability, energy efficiency, size, and cost, but with them come challenges in attaining optimal performability.

      This presentation will highlight research activities at the University of Florida from two recent projects on this path, the NASA Dependable Multiprocessor (DM) developed at Florida and Honeywell, and the hybrid fault tolerance (HFT) framework of CHREC. The NASA Dependable Multiprocessor project features a multitude of system- and application-level techniques for fault tolerance to protect the system from SEU-induced errors, much of which is applicable to the needs of space multicore processing. The DM system consists of primary and secondary RadHard system controllers and a suite of COTS-based, data-processing boards featuring PPC, AltiVec, and FPGA processors, all connected through Gigabit Ethernet, similar to many traditional supercomputing clusters.

      Fault tolerance in DM can adapt to environmental radiation conditions, with an array of disparate and flexible modes, including SIFT at the highest level via high-availability middleware with manager and agent processes running on RadHard and COTS microprocessor technologies, respectively, along with a variety of modes for fault tolerance operating underneath, many available in either spatial or temporal form.  The high-availability middleware manages the health and status of multiple concurrent jobs, taking corrective action when necessary.  Application-level communication between nodes is facilitated through the use of Fault-Tolerant Embedded MPI (FEMPI), allowing for the recovery of a parallel job without the need to completely restart the application.  Application-level techniques for fault tolerance, such as replication, algorithm-based fault tolerance, and checkpoint/rollback are also featured and examined with a range of applications including LU decomposition, 2D-FFT, synthetic aperture radar, and hyperspectral imaging.  

      The hybrid fault tolerance or HFT framework is a new component in an on-going research project of CHREC entitled F6-09, Reconfigurable and Hybrid Fault Tolerance.  One of the tasks in current work on HFT that is applicable to the needs of space multicore processing is a new method of protecting microprocessor cores from SEU-induced errors via automated source-to-source (S2S) translation with high productivity.  Replication embedded in the application program instructs the processor to perform redundant calculations.  These calculations can then be compared and/or voted upon to detect and/or correct errors automatically.  Through the use of S2S translation, we present a method of performing this replication through a high-level language (in this case, C).  A translator would take an input program source code and output a fault-tolerant version of the same program (with very little or no user intervention) that could then be compiled with any valid compiler. 

      Additionally, we are exploring methods for software-based fault injection to examine the reliability of various microprocessor devices and the efficiency of newly proposed FT methods for them.  Our simple, portable fault injector (SPFI) allows us to emulate SEUs by injecting errors directly into processor registers of each processor core.  The injector software can work with any system that supports the GNU debugger, making the tool highly portable.  This approach allows us to quickly inject faults, test behavior, and estimate error rates expected without the need for expensive radiation testing at each step.
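
The two ideas in this abstract, replicated computation with voting and software fault injection, can be illustrated with a minimal sketch. It is not the HFT translator or SPFI. It is written in Python rather than C for brevity, it replicates at run time through a wrapper instead of by source-to-source translation, and it corrupts a result value rather than a processor register. All names (`flip_bit`, `vote`, `replicated`, `checksum`) are hypothetical.

```python
from collections import Counter


def flip_bit(value, bit, width=32):
    """Emulate a single-event upset by inverting one bit of a width-bit word."""
    if not 0 <= bit < width:
        raise ValueError("bit index out of range")
    return (value ^ (1 << bit)) & ((1 << width) - 1)


def vote(results):
    """Return (majority value, True if any replica disagreed)."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise RuntimeError("no majority among replicas")
    return value, count != len(results)


def replicated(func, *args, copies=3, fault_in=None, fault_bit=0):
    """Run func `copies` times and vote; optionally corrupt one replica.

    fault_in selects the replica whose result gets a bit flip (assumption:
    the fault hits the result word, not an intermediate register).
    """
    results = []
    for i in range(copies):
        result = func(*args)
        if i == fault_in:
            result = flip_bit(result, fault_bit)
        results.append(result)
    return vote(results)


def checksum(data):
    """Toy 32-bit integer workload standing in for application code."""
    total = 0
    for x in data:
        total = (total * 31 + x) & 0xFFFFFFFF
    return total


data = [3, 1, 4, 1, 5, 9]
clean, clean_disagreed = replicated(checksum, data)
voted, detected = replicated(checksum, data, fault_in=1, fault_bit=7)
```

With three copies, a single corrupted replica is outvoted, so `voted` matches `clean` while `detected` reports the disagreement. With two copies the same scheme could only detect an error, not correct it.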

    • Session IV: Space Applications on Multicore Processors Tuesday 7/21 9:30AM – noon
      Session Chair: Steve Crago; Co-Chair: Marti Bancroft
    • 9:30AM – 10:00AM Parallelizing Lunar Safe Landing Algorithms on the Tilera Tile 64 Processor (Carlos Villalpando, carlos.y.villalpando@jpl.nasa.gov, speaker, and Raphael Some, rsome@jpl.nasa.gov)
      NASA's Electronics Technology Development Program (ETDP) is developing an autonomous landing capability for the Altair Lunar Lander under the ALHAT (Autonomous Landing and Hazard Avoidance Technology) project. The Radiation Hardened Electronics for Space Environments (RHESE) project is supporting this work with the evaluation and selection of high performance multi-core processors for computationally intensive image processing under its High Performance Processing (HPP) task. In this presentation, we provide a snapshot of the ongoing work in parallelizing and porting the ALHAT hazard avoidance algorithms to the Tile64 processor and the results of scaling studies performed on these codes using the Tilera TILExpress-64 evaluation board.

    • 10:00AM – 10:45AM Space Applications on Tilera, Justin Richardson (richardson@chrec.org), Chris Massie (massie@chrec.org) speakers, contact is Dr. Herman Lam (hlam@chrec.org) - includes SAD (Sum of Absolute Differences), Steganography, and HSI (Hyperspectral Imaging)
      This extended abstract summarizes our presentation for the “Multicore Processors For Space - Opportunities and Challenges” workshop, at the IEEE Space Mission Challenges for Information Technology (SMC-IT) 2009 conference. The presentation begins with an overview of the NSF Center for High-Performance Reconfigurable Computing (CHREC) at the University of Florida and its research activities in reconfigurable computing (RC) and RC applications for space. In particular, we will focus on space applications being implemented on the Tilera TILE64 processor. These applications include: 1) A case study of a Sum of Absolute Differences (SAD) algorithm for a comparative analysis of the vector-based operations provided by the TILE64. 2) A steganography application for the TILE64 processor, highlighting alternative parallelization strategies. 3) A hyperspectral imaging (HSI) application to compare the TILE64’s shared memory and DMA operations with respect to memory homing.
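
For readers unfamiliar with the Sum of Absolute Differences kernel named above, here is a minimal scalar sketch in Python. It is not the CHREC implementation and does not use the TILE64 vector operations. The block-matching use shown in `best_match` is an assumption about a typical application, and all function names are hypothetical.

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized 2-D blocks."""
    if len(block_a) != len(block_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(block_a, block_b)
    ):
        raise ValueError("blocks must have the same shape")
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )


def best_match(frame, template):
    """Slide template over frame; return (row, col, score) of the lowest SAD."""
    t_rows, t_cols = len(template), len(template[0])
    best = None
    for r in range(len(frame) - t_rows + 1):
        for c in range(len(frame[0]) - t_cols + 1):
            window = [row[c:c + t_cols] for row in frame[r:r + t_rows]]
            score = sad(window, template)
            if best is None or score < best[2]:
                best = (r, c, score)
    return best


frame = [
    [0, 0, 0, 0],
    [0, 9, 8, 0],
    [0, 7, 6, 0],
    [0, 0, 0, 0],
]
template = [[9, 8], [7, 6]]
match = best_match(frame, template)
```

Each window position is independent of the others, which is why a search of this kind is a natural candidate for distribution across cores or for vector instructions.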

    • 10:45AM – 11:15AM OpenJPEG Performance on Maestro, Donald Yeung, University of Maryland (speaker)
      JPEG2K is an important image compression standard based on wavelets. OpenJPEG2K is an open source program that implements the standard. In this talk, we will describe the performance of a parallel implementation of the compression part of OpenJPEG2K. Several implementation alternatives were considered and characterized and will be discussed.

    • 11:15AM – 11:45AM Multi-Core Architectures for Emerging NASA Applications: Some Results for Tilera’s TILE64 and Maestro, (JPL: Paul Springer, paul.springer@jpl.nasa.gov and Ed Upchurch) (30 minutes)
      Multi-core architectures promise several orders of magnitude increase in compute performance over current mission processors. Our research is using dynamic discrete event models coupled with actual implementation of key kernel codes on multi-core chips to determine to what extent and under what conditions and workloads these promises are true. This presentation will discuss our results on the actual Tilera Tile 64 chip and the Maestro simulator for the FFT key kernel of a Support Vector Machine (SVM) application. Speedup curves and trace analysis show perfect scaling all the way to 56 nodes for both Tile64 and Maestro.

    • 11:45AM – noon FFTW, Karandeep Singh, USC/ISI-East (speaker)
      In this talk, we will describe the performance of FFTW on the Maestro processor. The Maestro processor is a 49-core multi-core processor for space being developed using radiation-hardened by design technology. Fast Fourier Transform is an important processing kernel used for many signal and image processing applications. FFTW is a fast, free, and portable self-optimizing implementation of FFT. Absolute performance of the FFTW using floating-point arithmetic on a single core will be reported and performance will be compared to other implementations of FFT that have been optimized for Maestro.
    • Session V: Operating Systems, Hypervisors, and I/O for Multicore In Space (Tuesday 7/21 1:30 – 3:00PM)
      Session Chair: Marti Bancroft; Co-Chair: Richard Schooler

    • 1:30PM – 2:00PM Multicore, Hypervisor, and Real Time (Mike Deliman, Wind River)
      This talk will provide detail about the costs / benefits of three major paradigms for approaching device control in a multicore environment:
      • Virtual devices in the OS with the hypervisor running the device
      • Collaboration where an OS runs directly on a core beside the Hypervisor
      • Modes where OSs under Hypervisor control "own" a given device ("native" driver mode)

    • 2:00PM – 2:15PM Operating-System Options on TILE64/Maestro – How To Choose (Richard Schooler, rschooler@tilera.com)
      This talk will briefly outline the operating system options on TILE64 and Maestro chips, from SMP Linux to bare metal.

    • 2:15PM – 3:00PM The OS, The Hypervisor, and I/O: How To Change/Add A New Device (Richard Schooler, rschooler@tilera.com)
      This talk will use code examples and white papers shipped with the latest MDE (Multicore Development Environment) to illustrate the areas of the operating system and hypervisor that need changes when adapting an existing driver to a new device, or when adding an entirely new driver. These examples apply to both TILE64 and Maestro.

    • Session VI: Architecture-Specific Optimization Techniques: Maestro (Tuesday 7/21 3:15 – 4:45PM)
      Session Chair: B. Scott Michel; Co-Chairs: Steve Crago, Marti Bancroft

    • 3:15PM – 3:45PM Protection Mechanisms For Domain Separation – Richard Schooler, Tilera
      This talk will address protection mechanisms for separation of processing and other domains in general and use the Hardwall™ to illustrate these concepts for TILE64 and Maestro chips.

    • 3:45PM – 4:15PM Using DMA for data-intensive applications – DMA Performance on Maestro – Karandeep Singh, USC/ISI-East
      In this talk, we will describe the performance of DMA on TILE64/Maestro. We will describe a simple co-addition algorithm used to illustrate and measure DMA performance, and will show performance for different numbers of parallel DMA transfers and transfer sizes. We will show that we were able to get near the peak memory bandwidth expected, and will compare DMA performance to processor-based memory accesses.
        
    • 4:15PM – 5:00PM MW5 WORKSHOP KEYNOTE: “Challenges and Opportunities in building high reliability systems for Aerospace”
      Ravi Iyer, Professor of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign; Director, Coordinated Science Laboratory; Co-Director, Center for Reliable and High Performance Computing

    Organizing Committee

    Marti Bancroft, MBC, chair 
    marti@dragonsden.com

    Larry A. Bergman, JPL (SMC-IT coordination)
    larry.a.bergman@jpl.nasa.gov

    Joseph C. Coughlan, NASA
    joseph.c.coughlan@nasa.gov

    Steve Crago, ISI-East
    crago@east.isi.edu

    B. Scott Michel, Aerospace
    scottm@aero.org

    Richard Schooler, Tilera
    rschooler@tilera.com

    Hans Zima, JPL
    zima@jpl.nasa.gov

    Principal Organizer

    Marti Bancroft, MBC, chair
    marti@dragonsden.com

    For more information contact: info@SMC-IT.org. Copyright 2009 SMC-IT 2009. All rights reserved. Webmaster: klittle@smc-it.org