| Title | Date Added | Company | |
|---|---|---|---|
Using Intel Thread Profiler for Win32 Threads: Philosophy and Theory | 2008-05-02 | Intel |
| In the past, software profiling tools have concentrated on measuring the execution time of functions and procedures within applications. For serial applications this was useful information that guided the programmer to those parts of the code that could most benefit from optimization or threading. For applications that have already been threaded, this type of information often has little value. Information that relates directly to performance issues from threaded and parallel execution is of much more use. This is the first paper in a two-part series that explains what Intel Thread Profiler for Win32 Threads does and how to use it effectively on one's own explicitly threaded applications.
Tags: High Performance Computing |
|||
Black Belt Itanium Processor Performance: The Foundation (Part 1 of 5) | 2008-04-30 | Intel |
| Many optimizations will automatically be enabled with aggressive compiler flags. However, with a small amount of performance analysis, architectural insight and targeted code modification, the software developer can greatly improve the resulting application performance. Understanding and resolving pointer ambiguity is important on all architectures and will always improve performance, but on Itanium architecture-based processors the payoff can be enormous and can help release the full power of this new architecture.
Tags: Programming Languages, Application Development |
|||
Planning Considerations for HiperDispatch Mode | 2008-04-02 | IBM |
| For all levels of z/OS, a TCB or SRB may be dispatched on any logical processor of the type required (standard, zAAP or zIIP). A unit of work starts on one logical processor and subsequently may be dispatched on any other logical processor. The logical processors for one LPAR image receive an equal share for equal access to the physical processors under PR/SM LPAR control. For example, if the weight of a logical partition with four logical processors results in a share of two physical processors, or 200%, the LPAR hypervisor will manage each of the four logical processors with a 50% share of a physical processor. All logical processors are used if there is work available, and they typically have similar processing utilizations.
Tags: Mainframes |
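The share arithmetic in the abstract above can be checked with a short sketch. It assumes only what the abstract states: a partition's share is split equally among its logical processors. The function name `logical_processor_share` is hypothetical and is not part of any IBM tool or API.

```python
def logical_processor_share(partition_share_pct: float, logical_processors: int) -> float:
    """Return the share of one physical processor given to each logical processor.

    Hypothetical helper: assumes the partition's share is divided equally
    among its logical processors, as described in the abstract.
    """
    if logical_processors <= 0:
        raise ValueError("logical_processors must be positive")
    return partition_share_pct / logical_processors


# Example from the abstract: a 200% share (two physical processors)
# spread over four logical processors.
share = logical_processor_share(200.0, 4)
print(f"{share:.0f}% of a physical processor per logical processor")
```

With the abstract's figures, each of the four logical processors is managed with a 50% share of a physical processor.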
|||
Premier IT Magazine: Reinvented Transistors | 2008-01-01 | Intel |
| 45-nm Manufacturing Creating the Next Wave of Quad-Core Processors |
|||
Intel Dual-Core HPC Cluster Uses Next-Generation Intel Xeon Processors | 2008-01-01 | Intel |
| Intel has delivered on Moore's Law using dual-core processing to build a 128-node High-Performance Computing (HPC) cluster that delivers theoretical peak performance of 3.2 teraflops and sustained performance of over 2.1 teraflops. Based on off-the-shelf technologies, including the next-generation dual-core Intel Xeon processor and an InfiniBand interconnect, the cluster represents a new era that rapidly increases performance while reducing or holding steady the requirements for power, heat and floor space. Industry collaborators and end-users can access the machine through the Intel Remote Access Service and use it to test-drive their codes and accelerate their move to Intel multi-core computing.
Tags: High Performance Computing |
|||
Live Migration With AMD-V Extended Migration Technology | 2007-12-17 | Advanced Micro Devices (AMD) |
| Virtual Machine migration is a capability increasingly used in today's enterprise environments. With live migration, a Virtual Machine Monitor (VMM) moves a running Virtual Machine (VM) nearly instantaneously from one server to another for a seamless experience to the end user while maintaining guest uptime. However, if the processors in the pre- and post-migration environments are not identical, live migration can result in unexpected behavior of the guest software. Since the introduction of its AMD64 processor technology in 2003, AMD has worked closely with virtualization software developers to define the functionality necessary to ensure that live migration is possible across a broad range of AMD64 processors.
Tags: Upgrades and Migration, Virtualization |
|||
Tuning Symantec Brightmail AntiSpam on UltraSPARC T1 and T2 Processor-Powered Servers | 2007-12-07 | Sun Microsystems |
| Electronic mail is a business-critical function in virtually every enterprise, and it is also one that is under constant attack. Well-known viruses such as Melissa and worms like SoBig have propagated through email and disrupted user PCs and corporate networks worldwide. Fraudulent email messages find their way into inboxes and tempt unsuspecting users into divulging personal information at phishing sites. As companies recognize that their intellectual property can easily leave their premises through email messages, filtering outbound and internal messages is becoming as important as protecting an organization from incoming traffic.
Tags: Network Security, Spam - E-mail Fraud - Phishing |
|||
AMD Stream Computing: Software Stack | 2007-12-06 | Advanced Micro Devices (AMD) |
| Advanced Micro Devices, Inc. (AMD) - a leading global provider of innovative computing solutions - is working with other leading companies and academic institutions worldwide to deliver a complete, accelerated computing ecosystem with the software and tools necessary to turn its high-performance, low-cost supercomputing vision into reality. AMD's Stream Computing initiative is ushering processing technologies into the accelerated computing era through the integration of CPU, GPU and a complete software stack. AMD Stream Computing is a first step in harnessing the tremendous processing power of the GPU (Stream Processor) for high-performance, data-parallel computing in a wide range of business, scientific and consumer applications.
Tags: High Performance Computing |
|||
Cache Optimization for Coarse Grain Task Parallel Processing Using Inter-Array Padding | 2007-12-01 | Waseda University |
| The wide use of multiprocessor systems has made automatic parallelizing compilers more important. To further improve multiprocessor performance through the compiler, multigrain parallelization is important. In multigrain parallelization, coarse grain task parallelism among loops and subroutines and near fine grain parallelism among statements are used in addition to traditional loop parallelism. Locality optimization that uses the cache effectively is also important for performance. This paper describes inter-array padding to minimize cache conflict misses among macro-tasks, used with a data localization scheme that decomposes loops sharing the same arrays to fit the cache size and executes the decomposed loops consecutively on the same processor.
Tags: Parallel Processing |
|||
Parallel Processing Using Data Localization for MPEG2 Encoding on OSCAR Chip Multiprocessor | 2007-12-01 | Waseda University |
| The need for efficient processing of multimedia applications on PCs, mobile phones, game consoles and similar devices has been increasing. In particular, low-cost, low-power, high-performance processors for multimedia applications are expected. To satisfy these demands, chip multiprocessor architectures that provide scalability through multigrain parallelism are attracting much attention. However, to realize the performance of chip multiprocessor architectures, data locality optimization for the target applications is also required. This paper describes a parallel processing scheme for MPEG2 encoding that uses a data localization technique, which improves execution efficiency through global data locality optimization among different loops with coarse grain task parallel processing, and evaluates the performance of the proposed scheme on the OSCAR chip multiprocessor architecture.
Tags: Parallel Processing |