

SMC-IT 2011
TOPICS OF INTEREST

Reliable Software

Autonomy & Automation

CubeSat Software

Space Cybersecurity

Robotics Software

Engineering Design Tools

Fault Management

Real-Time Embedded Systems

Machine Vision

Image Processing

Flight Computing

Novel Applications

Mission Architecture Design

Operations Technologies

Middleware Services

Knowledge Management

Integrated System Health Management

Astronaut Support IT

Science Software Applications

On-board vs Ground Computing

Space Communications

Smart Instruments

Mission Assurance IT

Software Architectures & Tools


SCHEDULE

OCTOBER 1, 2010
Call for Full Papers and Mini-Workshop Summaries

NOVEMBER 1, 2010
Author Submission Website Open

DECEMBER 31, 2010
Call for Full Papers and Mini-Workshop Summaries

MARCH 20, 2011
Author Acceptance Notification

MAY 19, 2011
Early Bird Registration Opens

MAY 19, 2011
Preliminary Program Announced

MAY 20, 2011
Camera Ready Manuscripts Due (incorporating reviewer comments) for upload to the IEEE CPS website

JULY 1, 2011
Regular Registration Opens

AUGUST 2 - 4, 2011
Conference


SMC-IT 2011 TOURS
(August 5, 2011)

USGS, Menlo Park Tour


 

NASA Ames Research Center Tour


 

Computer History Museum


 

Intel Museum (on your own)


 

Hiller Aviation Museum

NOTE: To receive future announcements, please send a blank email to:
smc-it-join@smc-it.org

MINI-WORKSHOP ON

RHBSSW (RADIATION HARDENING BY SYSTEMS AND SOFTWARE)

CALL FOR PAPERS

Held in conjunction with the Fourth International Conference on Space Mission Challenges for Information Technology (SMC-IT-2011), August 2 - 4, 2011, in Palo Alto, California, USA

Abstract:

This workshop is intended to explore the opportunities that the availability of many/multi-core technologies for space missions creates for achieving the desired goal: fail-operate, i.e., operate through failures. The workshop will have four main sessions (16 hours total, spread over 2 days). It will cover new fault tolerance techniques required for multicore computing and new fault management capabilities enabled by the high throughput that multicore computing affords. In addition, the workshop organizers believe that there is a large body of fault-tolerance research that may have been previously shelved due to the impracticality of deploying it using older technologies. With the advent of many/multi-core processing in space, much of this research should perhaps be revisited, as it might provide the insights needed to achieve operate-through using the newer technologies.

Theme and Goals:

Session 1: Problem Definition

Because most of the work in dealing with radiation effects has been at the component level (RHBP, RHBD), this session will attempt to define the fault set that needs to be addressed from that level up through the areas that affect the ability to fail-operate or operate-through. Examples include the hardware faults that we may see as we continue to shrink feature sizes, independent of those anticipated from radiation effects, including early component failure and degraded performance. The goal of this session will be to establish a common vocabulary for faults and multi/many-core processors, to identify a fault set likely to be seen in future multicore computers, and to identify areas of opportunity and challenges unique to deploying many/multi-core computers in space missions.

Session 2: Fault Tolerance (A Local Perspective)

Just as most RHBP and RHBD work has focused on the component level, so has much of fault tolerance focused on a more local, hard-fault viewpoint. Although this is important work and essential to understanding fault behavior, the advent of multi- and many-core processors creates both an opportunity for additional fault tolerance and challenges in fault containment and localization. In this area, it is possible that prior research work on fault identification and mitigation (work that was previously not practical to use in systems due to SWaP and/or raw processing power constraints) could be revisited and might provide some unique new approaches to leveraging multi/many-core space processing. Fault propagation is another area to be considered in this session: just as multi/many-core processing brings opportunities for localized fault tolerance, the technology also brings opportunities for fault propagation and the challenge of assessing single points of failure, fault effects, and fault detection, localization, and mitigation, especially in real-time, mission-critical applications.

Session 2 will close with a panel session summarizing the different perspectives and encouraging (or so the organizers hope) a lively discussion among the participants in preparation for Session 3.

Session 3: Fault Management (A System Perspective)

When a component fails, it does not necessarily follow that the system fails. Further, a component may not fail in the classic sense (completely stop operating or responding) but may still behave in a way that endangers the mission (reduced performance, slightly incorrect numerical results, excessive retransmits, etc.). As in Session 2, the organizers believe that it is possible that prior research work on fault identification and mitigation (work that was previously not practical to use in systems due to SWaP and/or raw processing power constraints) could be revisited and might provide some unique new approaches to leveraging multi/many-core space processing. Software is a key component in space processing systems: what approaches to managing software faults are appropriate and effective in a multi/many-core environment? This session will address architectures (both hardware and software) that aid operate-through, and also touch on the challenge of dealing with operate-through when the system has heterogeneous components (different processor types such as RAD750, Maestro, ARM, and Freescale; different accelerator types such as DSP, FPGA, and GPGPU; as well as different sensor types, etc.). Verification/validation of the effectiveness of operate-through approaches (beyond the traditional radiation testing environments) is part of this discussion.

Session 4: Commercial Practices – And The Gaps To Be Addressed By Space Processing

As feature sizes shrink, some faults that were formerly associated only with harsh environments are now seen at sea level in commercial processors/workstations. These, plus others that are related to the newer technologies as discussed in Session 1, are a design consideration for exascale terrestrial computer systems. Our quest for techniques to ensure that missions can operate through failures may therefore get increased assistance from commercial vendors. This can already be seen in the widespread availability of such techniques as hardware error correction for memories, registers, and caches, but there is likely more on the horizon in both hardware and software (systems software such as operating systems, and application techniques). Some of the commercial practices come at a cost: for example, some error correction techniques can involve timeouts that erode critical timelines before reporting an uncorrectable error. The organizers will be encouraging participation by leading commercial hardware and software vendors and will close the session with a panel that will debate the gaps that space processing research will need to fill, based on a ten-year projection of capabilities and areas of “opportunity”.

Relevance:

Incorporation of new technologies is often resisted due to the perception of increased risk. At the same time, failure to advance the performance of space missions can also increase risk. The emerging many-core options give us an opportunity to revisit the approaches to achieving mission success despite SWaP constraints. Just as some older applications research has been shown to provide useful insights into application performance on many-core processors, so may some prior research work on fault tolerance and fault management now be worth revisiting, as it might at last be deployable to aid mission success.

Organization:

Session Chair(s): Marti Bancroft, MBC, USA, marti@dragonsden.com
Rafi Some, NASA JPL, USA, raphael.r.some@jpl.nasa.gov