Tuesday 25th – Wednesday 26th February
(updated 23.2.14) – SCHEDULE
A New Kind of Parallelism: The Ubox Method
John Gustafson, CTO Ceranovo Inc.; USA
“It is time to overthrow a century of numerical analysis. Current methods are based on the acceptance of rounding error and sampling error, using numerical representations that were invented in 1914 and algorithms designed for a time when transistors were expensive. The pursuit of exascale floating point is ridiculous, since we do not need to be making 10^18 sloppy rounding errors per second; we need instead to get provable, valid results for the first time, by turning the speed of parallel computers into higher-quality answers instead of more junk per second. The ubox method, based on a new numerical format that uses metadata to store more information in fewer bits, creates the richest source of data parallelism since the Monte Carlo method, and redefines what is meant by ‘high performance’. Examples are given of practical applications to structural analysis, radiation transfer, the n-body problem, linear and nonlinear systems of equations, and Laplace’s equation, suggesting that the ubox method is general and can replace the ‘bag of tricks’ we currently use to solve technical computing problems.”
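The flavour of “provable, valid results” can be sketched with plain interval arithmetic. This is not Gustafson’s unum/ubox format, and all names below are illustrative; it only shows the underlying idea of carrying rigorous bounds instead of a single rounded value:

```python
from fractions import Fraction

# A minimal interval-arithmetic sketch (not the unum/ubox format itself):
# every quantity is carried as a [lo, hi] bound, so the final answer is a
# provable enclosure of the true value rather than a single rounded guess.

class Interval:
    def __init__(self, lo, hi=None):
        self.lo = Fraction(lo)
        self.hi = Fraction(hi if hi is not None else lo)

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        products = [self.lo * other.lo, self.lo * other.hi,
                    self.hi * other.lo, self.hi * other.hi]
        return Interval(min(products), max(products))

    def __contains__(self, x):
        return self.lo <= Fraction(x) <= self.hi

    def __repr__(self):
        return f"[{self.lo}, {self.hi}]"

# 0.1 has no exact binary floating-point representation, but an interval
# that brackets it still encloses every downstream result.
tenth = Interval(Fraction(1, 16), Fraction(1, 8))  # crude bounds on 0.1
total = tenth + tenth + tenth                      # provably encloses 0.3
assert Fraction(3, 10) in total
```

Uboxes go further by subdividing such bounds into sets of multidimensional boxes that can be processed in parallel, which is where the data parallelism claimed above comes from.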
The Revolution is On – Dependable Software is Becoming Affordable
Gernot Heiser – Leader of SSRG, the Software Systems Research Group at NICTA; Scientia Professor and John Lions Chair at University of New South Wales; Australia
“It’s been a truism for far too long: software is buggy and unreliable, and where reliability matters, people go to extraordinary expense to achieve it.
This is changing. A few years ago we completed the correctness proof of a complete operating-system kernel, seL4, and we have now extended this to a complete proof chain from high-level security and safety requirements down to the binary running on the hardware. An analysis of the cost reveals that it is well below that of traditionally engineered ‘high-assurance’ software, and not too far off that of industry-standard low-assurance software.
This talk will give an overview of NICTA’s software system verification activities, and our plans for making verified software cost-competitive with traditionally unreliable code.”
The Symbiotic Relationship between Game Technology and Heterogeneous HPC
Alex St. John – United States / New Zealand
In 1994 I wrote the following words in a pitch to Bill Gates for Microsoft to make consumer 3D graphics a strategic priority:
“The battle of the 3D API on the console and under DOS is just beginning. With the opportunity to buy the most respected player, we can immediately draw huge attention from the entire game industry to our platform by effectively declaring the winner and demonstrating our commitment to pursuing this market. The likely impact on new title generation under Windows from such an announcement will have a revolutionary impact on the PC game industry.”
Several months later we acquired Rendermorphics, moved the entire team of British engineers to Redmond, WA, and created the Direct3D gaming API. This resulted in the massive growth of GPU companies like AMD and Nvidia, and gave rise to the modern general-purpose programmable GPU used across the game industry for 3D entertainment applications and among researchers for scientific HPC applications.
The symbiotic relationship between gaming and scientific computing has been instrumental to the tremendously rapid innovation and production of powerful, inexpensive GPUs. In this presentation I will illustrate how intimately interdependent and complementary the two applications of GPU power have been, and will continue to be, far into the foreseeable future.
Embedded Languages for High-Performance Computing in Haskell
A/Prof Manuel Chakravarty, UNSW – Australia
Embedded languages are a convenient and expressive method to capture patterns of high-performance code in functional languages. These patterns can be turned into efficient low-level code by template instantiation of code skeletons, where code fusion combines individual skeleton instances to minimise the abstraction penalty.
In this talk, I will illustrate these concepts as used in the open-source framework Accelerate, an embedded language for general-purpose GPU computing in Haskell that delivers competitive performance with a fraction of the effort required in low-level GPGPU frameworks, such as CUDA or OpenCL.
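Accelerate itself is a Haskell EDSL, but the two core ideas named above, deep embedding and fusion, are language-neutral. As a toy sketch in Python (all names hypothetical, and not Accelerate’s API), operations build an AST instead of executing, and chained maps are fused into a single skeleton instance before running:

```python
# A toy deep-embedded array language: 'amap' builds AST nodes rather than
# computing; 'compile_fused' collapses Map(f, Map(g, xs)) into one
# traversal, mimicking skeleton instantiation plus fusion.

class Use:
    """A concrete array lifted into the embedded language."""
    def __init__(self, data):
        self.data = data

class Map:
    """An unevaluated elementwise operation over an embedded array."""
    def __init__(self, fn, src):
        self.fn, self.src = fn, src

def amap(fn, arr):
    return Map(fn, arr)

def compile_fused(ast):
    """Fuse a chain of Maps into one composed function, then run it once."""
    fns = []
    node = ast
    while isinstance(node, Map):
        fns.append(node.fn)
        node = node.src
    fns.reverse()  # innermost map applies first

    def fused(x):
        for f in fns:
            x = f(x)
        return x

    # One traversal, no intermediate arrays: the abstraction penalty of
    # composing two maps has been fused away.
    return [fused(x) for x in node.data]

xs = Use([1, 2, 3, 4])
prog = amap(lambda x: x + 1, amap(lambda x: x * 2, xs))
print(compile_fused(prog))  # → [3, 5, 7, 9]
```

In Accelerate the same strategy applies to GPU kernels: each skeleton (map, fold, scan, …) has a CUDA code template, and fusion decides how many templates actually get instantiated.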
Embedded Computing Systems in the Multi-Core Era
Wolfgang Schröder-Preikschat, Prof. Dr.-Ing. habil. Friedrich-Alexander-Universität – Germany
Oversimplified, an embedded system is a computerized product in the broadest sense. Given such a general interpretation, embedded systems appear almost everywhere in our lives: in any kind of consumer electronics, medical devices, or vehicles, including watercraft and aircraft, culminating in complex control systems for large industrial facilities or public services. From a technological point of view, embedded systems are assembled from varied-size microcontrollers, signal processors, common processors, or a mixture of these. In most of these cases, multi-core technology forms the basis, a characteristic trait that can hardly be circumvented, however much one might like to. In terms of hardware, the chips form a homogeneous or heterogeneous sphere of tightly coupled processing elements. The functional and, in particular, non-functional properties of these computing devices have an effect all the way bottom-up through the system software to the application software. Engineering as well as re-engineering software for such hardware is anything but easy, despite the numerous lessons that can be drawn from parallel systems development over the past decades.
The talk considers embedded systems, on the one hand, from the perspective of a special-purpose parallel system and, on the other, from the position of an operating-system engineer. General opportunities, problems, and challenges posed by the use of multi-core technology in this domain will be marked out. Difficulties in adopting legacy software are addressed, as well as the degree of transparency that can be expected, for example, from an operating system in order to aid this process. The focus is on embedded systems that have to operate under (soft, firm, hard) real-time constraints, in the field of control systems rather than consumer electronics.
Get off the grass with Multicore
Prof Shaun Hendy, FRSNZ – New Zealand
New Zealanders work harder and earn less than most other people in the developed world. As Ernest Rutherford put it: “We’ve got no money, so we’ve got to think.” Yet on a per capita basis, the OECD produces four times as many patents as New Zealand. Why is this? What determines a country’s capacity to innovate?
In this talk, I will take a quantitative and comparative look at New Zealand’s innovation ecosystem using data from international trade, patent and scientific databases. This analysis illustrates the important role that networks and collaboration play in the production of new knowledge. In particular, we find that large cities consistently produce more patents per capita than smaller cities because they connect people with diverse expertise and combine ideas in new, complementary ways. The challenge for New Zealand is to find ways to build connected communities of knowledge workers and businesses that are as effective as those in larger population centres.
We also find that high-productivity countries invent and export many highly complex products, while low-productivity countries tend to export a small number of commodity products. If New Zealand is to boost its long-run economic growth rate, then it must diversify its exports while building scale in its industries. New Zealand must open up the exchange of information and ideas within its innovation sector and become an exporter of knowledge rather than nature.
Can Multicore New Zealand lead the way in innovation by building scale and diversity through open collaboration?
Construction of Square Kilometre Array Computing
Tim Cornwell, SKA architect; UK
“The Square Kilometre Array is now in design and is due to move to construction in 2017. The array is actually three telescopes, each of which will provide a different window into the universe, yielding insights into pressing astrophysical questions. This groundbreaking scientific instrument will be one of the jewels of 21st-century science. From 2013 to 2016, the SKA is in its pre-construction phase, at the end of which ready-to-build designs must be available for all elements. From 2017 onwards, there will be a procurement phase followed by construction. In the pre-construction phase, consortia are responsible for developing the designs for each element. There is no assumed connection between the consortia responsible for pre-construction and the bodies responsible for construction. This means that in all areas, but particularly in computing, there will be opportunities for those not currently involved, at all levels of the construction phase. I will discuss what these opportunities might look like.”
(via video conference)
A tale of two projects — Partitioning to a massively parallel machine and Scaling out program analysis to millions of lines of code
Cristina Cifuentes – Architect and Research Director, Oracle Labs – Australia
Over the years, parallelisation has been essential for handling large amounts of program data in projects that involve code transformation, whether for generating machine code for thousands of processors or for analysing millions of lines of code and billions of facts in a timely manner.
This talk summarises key design decisions made in two program transformation projects, at Sun Microsystems Laboratories and at Oracle Labs, that allowed both to achieve results over the very large program data used in the transformations.
The first project relates to a cycle-based Verilog compiler that generated code for a massively parallel machine of more than 40,000 processors. The second project relates to analysing millions of lines of source code, and billions of facts, to find bugs and security issues in those programs.
Comparative Scalability I/O Studies in HPC Clusters
Prof. Andreas Wicenec, UWA – Australia
Typical I/O-to-FLOP ratios (Amdahl numbers) of current HPC clusters are orders of magnitude below one. Apart from these theoretical limitations, the actually achievable I/O rates, and thus the scalability of applications requiring access to very large data volumes, are very often limited by non-optimised configurations of hardware and/or the various software layers. In this paper we present the results of experiments showing the influence of configuration changes and the use of different I/O libraries, and compare cheap local, node-based storage with a high-end Lustre global file system. The results suggest that for certain extremely data-intensive and data-parallel problems, scalability can be achieved by using an extreme shared-nothing paradigm. On the other hand, it is also clear that proper configuration and the choice and optimisation of the underlying I/O software stack, including the OS I/O system, are equally important.
Szalay, A. S. (2011). Amdahl’s Laws and Extreme Data-Intensive Scientific Computing. ADASS, 442.
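As a back-of-the-envelope illustration of the Amdahl number (the figures below are hypothetical, not measurements from this work):

```python
# Amdahl's balance number: bits of I/O per second divided by instructions
# (here FLOPs) per second. A perfectly balanced system has a ratio near
# one; modern HPC clusters fall far short of that.

def amdahl_number(io_bytes_per_s, flops):
    return (io_bytes_per_s * 8) / flops

# Illustrative petaflop-class cluster with a 100 GB/s parallel file system:
ratio = amdahl_number(100e9, 1e15)
print(ratio)  # 0.0008: orders of magnitude below one
```

At such ratios an application that must stream its working set from storage is I/O-bound long before the floating-point units are saturated, which is why the configuration and software-stack issues above dominate scalability.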
Exploiting the new Kaveri CPU/GPU architecture
David Brebner, CEO Unlimited Realities – New Zealand
David will present how the UMAJIN application engine can take advantage of the Kaveri APU and the heterogeneous systems architecture to execute more code, faster.
Adaptive Integration of Hardware and Software Lock Elision Techniques
Mark Moir, Principal Investigator of the Scalable Synchronization Research Group, Oracle – USA / New Zealand
Hardware Transactional Memory (HTM) has recently entered mainstream computing. There has been significant research into ways to exploit HTM, ranging from supporting new transactional programming models, to supporting scalable concurrent data structures, to “transactional lock elision” (TLE) techniques, which use HTM to boost performance and scalability of applications based on traditional lock-based programming with no changes to application code.
In recent work, we have been exploring the use of software techniques that similarly aim to improve the scalability of lock-based programs. These techniques involve somewhat more programmer effort than TLE but work in the absence of HTM, and furthermore can provide benefits in cases in which HTM is available but not effective. Different combinations of these hardware and software techniques are most effective in different environments and for different workloads.
This talk introduces the Adaptive Lock Elision (ALE) library, which supports integration of these techniques and facilitates dynamic choices between them at runtime, guided by a pluggable policy. Results of preliminary evaluation on four different platforms—two of which support HTM—will be presented. Our results illustrate the need for policies that adapt to the platform and workload, and evaluate preliminary work in this direction.
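The software side of the idea can be sketched in miniature. The code below is not the ALE library’s API; it is a hedged illustration, using a seqlock-style version counter, of the common pattern: run the critical section optimistically, validate, and fall back to the real lock after repeated conflicts, much as TLE falls back when hardware transactions abort:

```python
import threading

# A minimal sketch of software lock elision for readers (illustrative
# only, not the ALE library): writers take the real lock and bump a
# version counter; readers run optimistically and validate the counter,
# falling back to the lock after too many conflicts.

class ElidedRWLock:
    def __init__(self):
        self._lock = threading.Lock()
        self._version = 0  # even: quiescent, odd: writer in progress

    def write(self, critical_section):
        with self._lock:
            self._version += 1          # now odd: readers will retry
            try:
                critical_section()
            finally:
                self._version += 1      # even again

    def read(self, critical_section, max_retries=3):
        for _ in range(max_retries):
            v = self._version
            if v % 2 == 0:              # no writer active
                result = critical_section()
                if self._version == v:  # unchanged: optimistic success
                    return result
        with self._lock:                # fallback: pessimistic read
            return critical_section()

counter = {"n": 0}
lock = ElidedRWLock()
lock.write(lambda: counter.update(n=counter["n"] + 1))
print(lock.read(lambda: counter["n"]))  # → 1
```

An adaptive policy of the kind the talk describes would sit above this, choosing at runtime between HTM-based elision, a software scheme like the one sketched here, and plain locking, based on observed abort and conflict rates.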
Scalable distributed storage with Ceph
Ricardo Rocha, Cloud Engineer, Catalyst IT – New Zealand
High Performance Computing (HPC) applications have always imposed significant requirements on data storage systems, both in the amount of data being stored and the need for low latency and high bandwidth for appropriate processing.
In recent years, a significant shift has occurred, with storage solutions being offered whose new interfaces and paradigms relax some features of traditional access protocols (strict POSIX, POSIX-like, …) to achieve greater scalability and availability.
In this presentation we introduce the interfaces available in one of the most popular solutions, the open-source Ceph data storage platform. We describe how both legacy and new applications can best exploit these new features, and we present results from a detailed performance evaluation comparing a variety of data-access workloads via traditional POSIX, block-level, and direct access to its distributed object store.
Big Data, e-infrastructure and the Economic Impact Opportunity
Prof. John Bancroft, Head of Project Development Asia-Pacific, STFC – UK – New Zealand
This talk focuses on the structure of the UK’s e-ecosystem, its recent large investments in e-infrastructure and Big Data, the reasons for them, and the economic impact opportunity they are targeted to produce. These opportunities will not be confined to the UK, but will impact research and commerce across the globe. Drawing on key partnerships between the United Kingdom and New Zealand, the talk will also describe some economic impact opportunities for New Zealand’s innovators and companies, especially those that may flow from the Square Kilometre Array project, in which NZ is a major partner.
Scaling HPC applications in a core rich environment
Kent Winchell – CTO Office, Cray, Inc. USA
Core-rich environments provide immediate help for scale-out applications but not for scale-up applications. Many applications have time-to-solution requirements that depend on certain computational performance levels. A case study of the computational intensity of Numerical Weather Prediction with respect to Xeon Phi and GPGPUs will be presented.
The problem of scale in computational nanoscience
Nicola Gaston, VUW – New Zealand
For the better part of a hundred years, we have known in principle how to describe the quantum mechanical interactions between atoms to a desired level of accuracy; however, the computational effort has remained impractical for large systems.
Nanoscience is defined by the scale of the system of interest, which for material systems ranges from hundreds to hundreds of thousands of atoms. This is a regime in which materials behave quite differently from the bulk, but in which the computational problem is already severe. I will discuss approaches to dealing with this problem, including the computational techniques for which the Nobel prize in Chemistry was awarded in 2013. Some of the most significant remaining challenges will be discussed, along with the future potential of computational nanoscience for a range of industrial applications.
HPC Technology Landscape Review
TN Chan, System Architect, Compucon New Zealand
Heterogeneous computing has now evolved into heterogeneous multiprocessing. We are no longer satisfied with acceleration by GPU alone, but are looking for static or dynamic allocation of resources from a mix of GPU, many-core, and FPGA devices in the same system. Will this evolution be driven by hardware, software, or users? This session attempts to provide a view of the situation.
SKA: Driving Innovation
Jasper Horrell, GM Innovation, SKA South Africa
Hosting a significant portion of the SKA in Africa has big implications for the continent. Involvement in the SKA has led to significant, well-managed, targeted investment from the South African government, which has already transformed the science and technology landscape in South Africa and in African partner countries. Not only has one of the best radio-quiet sites been established in the Karoo semi-desert region of the Northern Cape, but a vibrant community of world-leading scientists and engineers has been formed to support the initiative. The reach of the SKA SA project is extensive: innovations in computing, the development of the KAT-7 and MeerKAT radio telescope instruments at the Karoo site, a very significant astronomy-focused HCD programme, deep involvement in SKA design consortia, and radio astronomy development activities in African partner countries. SKA is an enabler: spawning big data projects; linking businesses, government and academia; enabling conversations that would not otherwise have been possible; the list goes on. An overview of this amazing growth is given, with a particular focus on the computing developments.