Linux Platform

Tcl/Tk Features

Clif Flynt , in Tcl/Tk (Third Edition), 2012

1.1.2 Documentation

On a UNIX/Linux platform, the man pages will be installed under installationDirectory/man/mann, and can be accessed using the man command. You may need to add the path to the installed manual pages to your MANPATH environment variable.
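If the pages do not show up, the path can be appended to MANPATH from the shell. A minimal sketch, assuming a hypothetical install prefix of /usr/local/tcl (substitute your actual installationDirectory):

```shell
# Hypothetical install prefix; replace with your actual installationDirectory.
TCL_HOME=/usr/local/tcl

# Append the installed man directory to MANPATH (creating it if unset)
# so the man command can locate the Tcl/Tk pages.
export MANPATH="${MANPATH:+$MANPATH:}$TCL_HOME/man"
echo "$MANPATH"
```

Adding the export line to your shell startup file (e.g., ~/.profile) makes the change persistent across sessions.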

On Microsoft Windows platforms, you can access the Tcl help from the Start menu, shown in the following illustration.

This will open a window for selecting which help you need. The window is shown in the following illustration.

Selecting Tcl Manual/Tcl Built-In Commands/List Handling from that menu will open a window like that shown in the following illustration. You can select the page of help you need from this window.

A Macintosh running Mac OS X comes with Tcl/Tk installed. You can access the man pages by opening a terminal window and typing man commandName as shown in the following illustration.

If you install a non-Apple version of the tclsh and wish interpreters (for example, the latest ActiveState release), the new man pages will be installed in a location defined by the installation.

The following image shows a Safari view of the ActiveState HTML man pages for Tcl 8.6.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780123847171000014

Networking in CCTV

Vlado Damjanovski , in CCTV (Third Edition), 2014

The IP check commands

Some software commands found on many platforms, Windows and Linux alike, should be known to CCTV users. They can help determine the IP address of the computer, whether a network device is present on the network, and whether it is visible to other computers. These are the "ping" and "ipconfig" commands.

Under Windows => Start => Run, type "Command" or "cmd."

This will open a DOS window where various commands can be typed in.

To find out the IP address and the MAC address of the computer you are on, type in "ipconfig."

ipconfig

The computer should come back with a response similar to that shown on the right, with your own numbers. In our example, the computer's IP address is 10.0.0.7. The ipconfig output also shows the MAC address of the device; if this were an IP camera, the MAC address might be needed if the network were more tightly controlled, or requested by the IT manager.

Let's assume we want to make up a little IP CCTV network and we want to give our IP camera an address that belongs to the domain we are a part of. One of the easiest ways to check which addresses are used on the LAN is the command "ipconfig /all."

ipconfig /all

This command lists the full IP configuration of the computer's network adapters, and we can then decide to use an IP address from the same range, but one that is not already taken. Let's assume we have decided to use 10.0.0.138. Before we do anything, the best suggestion is to test the (non)existence of this address on the network with the "ping" command.

ping <destination address>

If some device is using this IP address, a response should come back stating the ping response time. If no device is using this address, the ping should time out and no response will come back to us, only a statement such as "Destination host unreachable."

If the device is connected and has the IP address you have queried (in our case 10.0.0.138), it will answer with something similar to what is shown in the window below. The time taken to reply is shown in milliseconds (ms).
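The check can also be scripted. The sketch below wraps ping in a small helper; it uses the Linux ping flags (-c count, -W timeout) rather than the Windows ones, and the loopback address stands in for the candidate camera address you would actually probe:

```shell
# Return success (0) if the address answers a single one-second ping probe.
is_in_use() {
    ping -c 1 -W 1 "$1" > /dev/null 2>&1
}

# 127.0.0.1 is used here only so the example can run anywhere;
# replace it with the candidate address, e.g. 10.0.0.138.
if is_in_use 127.0.0.1; then
    echo "in use"
else
    echo "free"
fi
```

An address reported as "free" is a reasonable candidate for the new camera, though note that devices configured to ignore ping will still appear free.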

Another useful command is "netstat." The netstat (network statistics) command displays network connections (both incoming and outgoing), routing tables, and a number of network interface (network interface controller or software-defined network interface) and network protocol statistics. It is used for finding problems in the network and for determining the amount of traffic on the network as a performance measurement.

netstat

The screen on the right shows the response from my computer when this command is typed in.

And finally, if there is a problem with the connection, but it is not clear where the network stops going further, the "tracert" command may show where the problem is:

tracert <destination address>

For example, if we were to trace how a ping gets to the Google server, the command will be:

tracert www.google.com

And the result may look something like that shown here on the right.

Sometimes it is necessary to find out the MAC addresses of all devices connected to the same network. The "arp" command (from Address Resolution Protocol) can be used:

arp -a

Another variation of the "netstat" command that might be useful in IP CCTV is the one that helps find out which ports are open on a computer and which application is using them. This command is written as:

netstat -ano | find /i "listening"

A response from a command like that may look like the one shown on the right.

The first column shows whether the protocol used is TCP or UDP. The second column refers to the ports used by some of the processes, and the last column on the right shows the ID of the process using the corresponding port. These process IDs in the Windows Operating System are referred to as PIDs (Process IDs) and can be found via the Windows Task Manager, under the Services tab.
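The same port-to-PID lookup can be done without reading the table by eye. A sketch using awk over made-up sample lines in the `netstat -ano` format (the addresses and PIDs below are illustrative only):

```shell
# Two fabricated lines in the format printed by `netstat -ano`:
# protocol, local address, foreign address, state, PID.
sample='TCP    0.0.0.0:135    0.0.0.0:0    LISTENING    736
TCP    0.0.0.0:445    0.0.0.0:0    LISTENING    4'

port=135
# Print the PID (last field) of any line whose local address ends in :135.
echo "$sample" | awk -v p=":$port" '$2 ~ p"$" { print $NF }'   # prints 736
```

On a real Windows machine the same filtering idea applies, with `find` or `findstr` standing in for awk.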

In the example here, the process number 736 that listens on port 135 belongs to the "Remote Procedure Call (RPC)" program.

And, finally, a command, or rather graphical representation, showing the CPU load on a Windows-based computer can be found under the Windows Task Manager, under the Performance tab. On the screenshot shown at the bottom right it can be seen that this computer has two cores (the two separate windows under the CPU usage history) and that the current CPU usage is only 10% with 47 processes running in the background.

In VMS workstations where multiple streams are being decoded, the best place to see how the computer copes with such a load is to check this window under the Windows Task Manager.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780124045576500112

Domain IV: Monitoring

In Sarbanes-Oxley IT Compliance Using COBIT and Open Source Tools, 2005

Nagios Monitoring of Windows Hosts

Although Nagios is designed to run on a Linux platform, there is no limitation to the check_command plug-ins that can be performed on certain operating systems, including the Windows platform. There are two main approaches to integrating Windows host and service checks. The first is the NSClient, which runs on Windows NT4 and higher. The items that can be monitored include CPU load, disk usage, uptime, service states, process states, memory usage, file age, and most perflib counters. The latest copy is available from http://nsclient.ready2run.nl/. In order to configure this you must:

Copy pNSClient.exe, pdh.dll, psapi.dll, and counters.defs into any directory on the machine you want to monitor (e.g., c:\nsclient).

Run the pNSClient.exe /install command.

Type net start nsclient on the command line, or start the Nagios Agent service in the Services applet in the Control Panel.

Configure the check_nt plug-in on the Nagios server to monitor your desired items.
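As an illustration, the check_nt configuration on the Nagios side might look like the sketch below. The host name and thresholds are assumptions for this example, and the port shown is the original NSClient agent's default of TCP 1248; consult the check_nt documentation for the exact options your version supports.

```text
# Hypothetical Nagios object definitions for an NSClient CPU-load check.
define command{
    command_name    check_nt_cpuload
    command_line    $USER1$/check_nt -H $HOSTADDRESS$ -p 1248 -v CPULOAD -l 5,80,90
    }

define service{
    use                     generic-service
    host_name               winserver
    service_description     CPU Load
    check_command           check_nt_cpuload
    }
```

Here -l 5,80,90 asks for the 5-minute load average with warning and critical thresholds of 80% and 90%.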

The second approach is to trap Simple Network Management Protocol (SNMP) data. Windows has a lot of performance information that can be monitored; however, it is usually hard to monitor remotely. The best approach is to install SNMP services on all of your Windows servers. To expose the performance counters, you must also install any necessary performance Management Information Bases (MIBs) for the services you want to monitor. The Web site http://snmpboy.msft.net contains valuable information to assist in your setup.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9781597490368500112

Tooling Around with Nmap

Angela Orebaugh , Becky Pinkard , in Nmap in the Enterprise, 2008

Source and Install

RNmap is a UNIX-based application that will require other components in order to run. For the RNmap server and any clients, you will have to install the latest version of Python from www.python.org/download/. For our examples here, we are using the UNIX compressed source tarball version for our SUSE 10 RNmap server and for our separate SUSE 10 client. You will also have to make sure the latest version of Nmap is installed on your RNmap server.

Installation for the tarballs is straightforward for current installations of the Linux platform. For example, for the Python "more compressed" UNIX tarball:

bzip2 -cd Python-2.5.1.tar.bz2 | tar xvf -

cd Python-2.5.1

./configure

make

su root

make install

For the Nmap tarball:

bzip2 -cd nmap-4.52.tar.bz2 | tar xvf -

cd nmap-4.52

./configure

make

su root

make install

At this point, you have one final package to install: RNmap. The gzipped RNmap package is quite small, measuring in at less than 30 kilobytes. You can download the package from SourceForge here: http://sourceforge.net/projects/rnmap/. Installing it is also fairly easy. First you must extract the contents of the file:

gzip -dc rnmap_0.10.tar.gz | tar -xvf -

This will extract the file contents to a newly created ./rnmap directory on your hard drive. Once created, you must navigate to the server directory: /rnmap/server. In order to get started, you have to run the python rnmap-adduser script and tell RNmap the names and passwords of your remote users. In our case, we created a user 'Test1' with a password of 'test1'.

vmware1:/software/rnmap/server # python rnmap-adduser.py

Username: Test1

Password:

Retype password:

All done.

This creates a new file in the /server directory named users.list. The cool thing is that RNmap creates an md5 hash of your password and stores the hash so that your users' passwords aren't stored in clear text. Something else that is important to notice here is that RNmap also sets the UNIX permissions on the users.list file to 600 so that only the file owner has access. You do not want other users to be able to view or copy the file to download hashes for attempted cracking. Here is what our users.list file looks like after creating our Test1 user and associated password:

vmware1:/software/rnmap/server # more users.list

Test1:/jXrQhrwMhrbb7STyWp5Gw==
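An MD5-then-base64 scheme like the one the file suggests can be sketched with standard tools. This mimics the general idea only; RNmap's actual encoding is not documented here and may include a salt or differ in other details:

```shell
# Hash the password with MD5 and base64-encode the raw 16-byte digest,
# producing a users.list-style "name:hash" line. Illustrative only;
# this is not claimed to reproduce RNmap's exact algorithm.
pass='test1'
hash=$(printf '%s' "$pass" | openssl dgst -md5 -binary | base64)
echo "Test1:$hash"
```

A 16-byte MD5 digest always base64-encodes to 24 characters ending in "==", matching the shape of the stored value.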

Tip

You can learn more about UNIX file permissions by reading the man page for chmod. It's easy to find many sites pertaining to this topic simply by typing man chmod into the Google search bar. Understanding and properly using UNIX file permissions is integral to securing your files, users, and applications.
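The permission tightening RNmap performs can be reproduced by hand; a small sketch (GNU stat syntax assumed):

```shell
# Create a stand-in users.list and restrict it to the owner only,
# as RNmap does, then verify the mode with GNU stat.
touch users.list
chmod 600 users.list
stat -c '%a' users.list   # prints 600
```

Mode 600 means read and write for the owner, and no access for group or others.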

At this point, we are ready to start the RNmap server and test our rnmap.py client from our SUSE client! RNmap does come with the capability of utilizing SSL to encrypt the channel between the clients and server, and while this is certainly recommended, for the purposes of our test we'll be calling it without SSL support. Firing up the RNmap server now is pretty straightforward. First RNmap tells us we need to use the --nossl option:

vmware1:/software/rnmap/server # ./rnmapd

Can't find pyOpenSSL module. If you want to start non SSL Rnmap server

use '--nossl' command line option.

Retrying with this option, we are able to get the server started and then see it running by performing a quick grep for the service name in our process list:

vmware1:/software/rnmap/server # ./rnmapd --nossl

vmware1:/software/rnmap/server # ps -ef | grep rnmapd

root   8465   1 0 03:38 ?   00:00:00 python ./rnmapd --nossl

You can also run a quick netstat command to see if RNmap is showing up on its default TCP port of 3418:

vmware1:/software/rnmap/server # netstat -nap | grep 3418

tcp   0   0 0.0.0.0:3418   0.0.0.0:*   LISTEN 8465/python

Now it looks like we are in business on the server end. Let's copy the rnmap.py client and the RNmap /lib directory to our SUSE client workstation, where we have already installed Python, and try to connect back to the RNmap server. Again, for the purposes of our demo, we won't be using SSL, and we made a quick-access RNmap directory and dropped the client and the lib files into our Python directory structure to keep things simple. Now, let's see if the client is going to work (usage instructions edited for fit):

Vmware2:/Python/RNmap-lib # python rnmap.py --nossl

Rnmap cli client v. 0.10

Copyright (C) 2000–2003 Tuomo Makinen

Redistributable under the terms of the GNU General Public License

Usage:

-s address Rnmap server address
-p port Server port (default 3418)
-f file Filename to save the scanning result (no stdout)
-n cmd Nmap command
-u user Username
-c file Text file with Username, Password & Server:Port info (Newline separated)
--help Prints this message
--nossl Turns SSL support off
--mparseable Use machine parseable log form
--scriptkiddie Use script kiddie log form
--xml Use xml log form
--version Show version of rnmap cli client

example:

# rnmap.py -s 192.168.1.78 -n "-sS -p 1-65535 192.168.1.10" --mparseable

Looks good! Now we'll move on to some usage examples in the next section and see if we can get some results.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9781597492416000078

Setting Upward CUDA

Shane Cook , in CUDA Programming, 2013

Installing a Debugger

CUDA provides a debug environment called Parallel Nsight on the Windows and Linux platforms. This provides support for debugging CPU and GPU code and highlights areas where things are working less than efficiently. It also helps tremendously when trying to debug multithreaded applications.

Nsight is completely free and is a hugely useful tool. All it requires is that you register as a CUDA-registered developer, which is again entirely free. Once registered, you will be able to download the tool from the NVIDIA website.

Note that you must have Visual Studio 2008 or later (not the Express version) and you must have installed Service Pack 1. There is a link within the release notes of Nsight to the SP1 download you need to install. The Linux version integrates into Eclipse.

Parallel Nsight comes as two parts: an application that integrates itself into Visual Studio as shown in Figure 4.7, and a separate monitoring application. The monitoring application works in conjunction with the main application. The monitor is usually resident, but does not have to be, on the same machine as the development environment. Parallel Nsight works best with two CUDA-capable GPUs: a dedicated GPU to run the code on, and one to use as the regular display. Thus, the GPU running the target code cannot be used to run a second display. As most GPU cards have dual-monitor outputs, you can just run two monitors off the display card should you have a dual-monitor setup. Note that in the latest release, 2.2, the need for two GPUs was dropped.

Figure 4.7. Nsight integrated into Microsoft Visual Studio.

It's also possible to set up the tool to acquire data from a remote GPU. However, in most cases it's easier to buy a low-end GPU and install it into your PC or workstation. The first step needed to set up Parallel Nsight on Windows is to disable TDR (Figure 4.8). TDR (Timeout Detection and Recovery) is a mechanism in Windows that detects crashes in driver-level code. If the driver stops responding to events, Windows resets the driver. As the driver will halt when you define a breakpoint, this feature needs to be disabled.

Figure 4.8. Disabling Windows kernel timeout.

To set the value, simply run the monitor and click on the "Nsight Monitor Options" hyperlink at the bottom right of the monitor dialog box. This will bring up the dialog shown in Figure 4.8. Setting the "WDDM TDR enabled" option will modify the registry to disable this feature. Reboot your PC, and Parallel Nsight will no longer warn you that TDR is enabled.

To use Parallel Nsight on a remote machine, simply install the monitor package only on the remote Windows PC. When you first run the monitor, it will warn you that Windows Firewall has blocked "Public network" (Internet-based) access to the monitor, which is entirely what you want. However, the tool needs to have access to the local network, so allow this exception to any firewall rules you have set up on the monitor machine. As with a local node, you will have to fix the TDR issue and reboot once installed.

The next step is to run Visual Studio on the host PC and select a new analysis activity. You will see a section near the top of the window that looks like Figure 4.9. Notice the "Connection Name" says localhost, which simply means your local machine. Open Windows Explorer and browse the local network to see the name of the Windows PC you would like to use to debug remotely. Replace localhost with the name shown in Windows Explorer. Then press the "Connect" button. You should see two confirmations that the connection has been made, as shown in Figure 4.10.

Figure 4.9. Parallel Nsight remote connection.

Figure 4.10. Parallel Nsight connected remotely.

First, the "Connect" button will change to a "Disconnect." Second, the "Connection Status" box should plough greenish and evidence all the possible GPUs on the target auto (Effigy 4.xi). In this instance we're connecting to a test PC that has five GTX470 GPU cards prepare on information technology.

Figure 4.11. Parallel Nsight connection status.

Clicking on the "Launch" button on the "Application Control" panel adjacent to the "Connexion Condition" panel will remotely launch the application on the target machine. However, prior to this all the necessary files need to exist copied to the remote automobile. This takes a few seconds or so, only is all automatic. Overall, it'southward a remarkably simple way of analyzing/debugging a remote application.

You may wish to set up Parallel Nsight in this manner if, for example, you have a laptop and wish to debug, or simply remotely run, an application that will run on a GPU server. Such usage includes when a GPU server or servers are shared by people who use them at different times, in teaching classes, for example. You may also have remote developers who need to run code on specially set-up test servers, perhaps because those servers also contain huge quantities of data and it's not practical or desirable to transfer that data to a local development machine. It also means you don't need to install Visual C++ on each of the remote servers you might have.

On the Linux and Mac side, the debugger environment is CUDA-GDB. This provides an extended GNU debugger package. As with Parallel Nsight, it allows debugging of both host and CUDA code, which includes setting a breakpoint in the CUDA code, single-stepping, selecting a debug thread, etc. Both CUDA-GDB and the Visual Profiler tools are installed by default when you install the SDK, rather than being a separate download as with Parallel Nsight. As of 2012, Parallel Nsight was also released under the Eclipse environment for Linux.

The major difference between Windows and Mac/Linux was the profiling tool support. The Parallel Nsight tool is in this respect vastly superior to the Visual Profiler. The Visual Profiler is also available on Windows. It provides a fairly high-level overview and recommendations as to what to address in the code, and therefore is well suited to those starting out with CUDA. Parallel Nsight, by contrast, is aimed at a far more advanced user. We cover usage of both Parallel Nsight and the Visual Profiler in subsequent chapters. However, the focus throughout this text is on the use of Parallel Nsight as the primary debugging/analysis tool for GPU development.

For advanced CUDA development I'd strongly recommend using Parallel Nsight for debugging and analysis. For most people new to CUDA, the combination of the Visual Profiler and CUDA-GDB works well enough to allow for development.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780124159334000041

Virtual Private Networks and Remote Access

Eric Knipp , ... Edgar Danielyan Technical Editor , in Managing Cisco Network Security (Second Edition), 2002

Linux FreeS/WAN

The Secure Wide Area Network project, or FreeS/WAN, aims to make IPSec freely available on Linux platforms. It does so by providing free source code for IPSec. The project's official Web site can be found at www.freeswan.org. It all started with John Gilmore, the founder and main driving force behind FreeS/WAN, who wanted to make the Internet more secure and protect traffic against wiretapping.

To avoid export limitations on cryptographic products imposed by the U.S. Government, FreeS/WAN has been completely developed and maintained outside of the United States of America. As a result, the strong encryption supported by FreeS/WAN is exportable.

Those interested in large-scale FreeS/WAN implementations should read a paper called "Moat: a Virtual Private Network Appliance and Services Platform" that discusses a large VPN deployment using FreeS/WAN. It was written by John S. Denker, Steven M. Bellovin, Hugh Daniel, Nancy L. Mintz, Tom Killian, and Mark A. Plotnick, and is available for download from www.research.att.com/~smb/papers/index.html.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B978193183656250012X

Cloud Application Development

Dan C. Marinescu , in Cloud Computing, 2013

11.4 How to launch an EC2 Linux instance and connect to it

This section gives a step-by-step process to launch an EC2 Linux instance from a Linux platform.

A. Launch an instance

1.

From the AWS Management Console, select EC2 and, once signed in, go to the Launch Instance tab.

2.

To determine the processor architecture when you want to match the instance with the hardware, enter the command

uname -m

and choose an appropriate Amazon Linux AMI by pressing Select.
3.

Choose Instance Details to control the number, size, and other settings for instances.

4.

To learn how the system works, press Continue to select the default settings.

5.

Define the instance's security, as discussed in Section 11.3: In the Create Key Pair page, enter a name for the pair and then press Create and Download Key Pair.

6.

The key-pair file downloaded in the previous step is a .pem file, and it must be hidden to prevent unauthorized access. If the file is in the directory awcdir/dada.pem, enter the commands

cd awcdir

chmod 400 dada.pem

7.

Configure the firewall. Go to the page Configure firewall, select the option Create a New Security Group, and provide a Group Name. Normally we use ssh to communicate with the instance; the default port for ssh is port 22, and we can change the port and other rules by creating a new rule.

8.

Press Continue and examine the review page, which gives a summary of the instance.

9.

Press Launch and examine the confirmation page, then press Close to end the examination of the confirmation page.

10.

Press the Instances tab on the navigation console to view the instance.

11.

Look for your Public DNS name. Because by default some details of the instance are hidden, click on the Show/Hide tab at the top of the panel and select Public DNS.

12.

Record the Public DNS as PublicDNSname; it is needed to connect to the instance from the Linux terminal.

13.

Use the ElasticIP panel to assign an Elastic IP address if a permanent IP address is required.

B. Connect to the instance using ssh and the TCP transport protocol.

1.

Add a rule to iptables to allow ssh traffic using the TCP protocol. Without this step, either an access denied or permission denied error message appears when you're trying to connect to the instance.

sudo iptables -A INPUT -p tcp --dport ssh -j ACCEPT

2.

Enter the Linux command:

ssh -i abc.pem ec2-user@PublicDNSname

If you get the prompt Do you want to continue connecting?, respond Yes. A warning that the DNS name was added to the list of known hosts will appear.

3.

An icon of the Amazon Linux AMI will be displayed.

C. Gain root access to the instance

By default the user does not have root access to the instance; thus, the user cannot install any software. Once connected to the EC2 instance, use the following command to gain root privileges:

sudo -i

Then use yum install commands to install software, e.g., gcc to compile C programs on the cloud.

D. Run the service ServiceName

If the instance runs under Linux or Unix, the service is terminated when the ssh connection is closed. To avoid this early termination, use the command

nohup ServiceName

To run the service in the background and redirect stdout and stderr to files p.out and p.err, respectively, execute the command

nohup ServiceName > p.out 2> p.err &
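The redirection pattern can be checked with a stand-in command in place of ServiceName (the echoed strings below are illustrative only):

```shell
# Run a short-lived stand-in "service" in the background with nohup,
# sending stdout to p.out and stderr to p.err.
nohup sh -c 'echo started; echo oops >&2' > p.out 2> p.err &
wait                # wait for the background job to finish
grep started p.out  # the stdout line landed in p.out
grep oops p.err     # the stderr line landed in p.err
```

Because the output streams are redirected, the service keeps logging even after the ssh session that launched it is closed.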

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780124046276000117

Our First Python Forensics App

Chet Hosmer , in Python Forensics, 2014

Chapter review

In this chapter I created our first usable Python forensics application. The pfish.py program executes on both Windows and Linux platforms, and through some ingenuity I used only Python Standard Library modules, along with our own code, to accomplish this. I also scratched the surface of argparse, allowing us to not only parse the command line but also validate command line parameters before they were used by the application.

I also enabled the Python logger and reported events and errors to the logging system to provide a forensic record of our actions. I provided the user with the capability of selecting among the most popular one-way hashing algorithms, and the program extracted key attributes of each file that was processed. I also leveraged the csv module to create a nicely formatted output file that can be opened and processed by standard applications on both Windows and Linux systems. Finally, I implemented our first class in Python, with many more to come.
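The kind of one-way file hash pfish.py records can be reproduced at the command line for cross-checking; a sketch with an illustrative file name:

```shell
# Create a small sample "evidence" file and compute its SHA-256 digest,
# the same family of one-way hash the chapter's program offers.
printf 'hello' > evidence.bin
sha256sum evidence.bin
```

The digest of the byte string hello is 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824; any alteration of the file shows up immediately as a different digest, which is what makes such hashes useful as forensic records.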

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780124186767000037

Domain III: Delivery and Support

In Sarbanes-Oxley IT Compliance Using COBIT and Open Source Tools, 2005

Sample Configurations

http://xfld/builtright/?category_id=29

http://xfld/nustuff/?category_id=34

Because this chapter deals with operations, we have included sample configurations of some of the open source software that was discussed, to give you an idea of how to implement your own solutions. We have provided annotated examples of a short list of sample applications on the enclosed CD, which are geared toward the sample companies as appropriate. Although it is not an exhaustive list, it will give you an idea of the major open source projects and what it takes to set them up.

Authentication Cluster (NuStuff Electronics)

The following sample configurations illustrate how to build an LDAP+Samba authentication server for a cross-platform environment (all of these run on the Linux platform):

LDAP Primary/Slave Authentication

ISC Dynamic Host Configuration Protocol (DHCP)/Dynamic DNS with BIND 9

Samba Primary Domain Controller/Backup Domain Controller (PDC)/(BDC)

Heartbeat Cluster for LDAP/Samba

Web Server (BuiltRight Construction)

This sample configuration gives you everything you need to set up and run Apache and eGroupware on a Windows environment, including the following:

Apache Web server and Hypertext Preprocessor (PHP)

MySQL database

eGroupware awarding

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9781597490368500100

OpenCL Profiling and Debugging

Benedict R. Gaster , ... Dana Schaa , in Heterogeneous Computing with OpenCL (Second Edition), 2013

AMD Accelerated Parallel Processing Profiler

The AMD Accelerated Parallel Processing (APP) Profiler is a performance analysis tool that gathers data from the OpenCL runtime and AMD Radeon GPUs during the execution of an OpenCL application. We can then use this information to discover bottlenecks in an application and find ways to optimize the application's performance for AMD platforms. Hereafter, we refer to the AMD APP Profiler as the profiler.

The profiler can be installed as part of the AMD APP SDK installation or individually using its own installer package. You can download the profiler from the AMD developer website at http://developer.amd.com.

In this section, we describe the major features in Version 2.5 of the profiler; the described version is included with Version 2.7 of the AMD APP SDK. Because the profiler is still being rapidly developed, please consult the profiler documentation for the latest features of the tool.

The profiler supports two usage models:

1.

As a Microsoft Visual Studio 2008 or 2010 plug-in

2.

As a command line utility tool for both Windows and Linux platforms

Using the profiler as a Visual Studio plug-in is the recommended usage model because one can visualize and analyze the results in multiple ways. To start the profiler in the Visual Studio plug-in, simply load a solution into Visual Studio. Select a C/C++ project as the startup project, and click on the Collect Application Trace or Collect GPU Performance Counters button on the APP Profiler Session Explorer panel. By default, the APP Profiler Session Explorer panel will be docked in the same window panel as the Visual Studio Solution Explorer panel. No code or project modifications are required to profile the application. The profiler will query Visual Studio for all the project settings required to run the application. When the application completes, the profiler will generate and display the profile information.

The command line utility tool is a popular way to collect data for applications for which the source code is not available. The output text files generated by the profiler can be analyzed directly. They can also be loaded by the Visual Studio plug-in to be visualized.

Two modes of operation are supported by the profiler: collecting OpenCL application traces and collecting OpenCL kernel GPU performance counters.

Collecting OpenCL Application Trace

The OpenCL application trace lists all the OpenCL API calls made by the application. For each API call, the profiler records the input parameters and output results. In addition, the profiler also records the CPU timestamps for the host code and device timestamps retrieved from the OpenCL runtime. The output data is recorded in a text-based AMD custom file format called an Application Trace Profile file. Consult the tool documentation for the specification. This mode is especially useful in helping to understand the high-level structure of a complex application.

From the OpenCL application trace data, we can do the following:

Discover the high-level structure of the application with the Timeline View. From this view, we can determine the number of OpenCL contexts and command queues created and the relationships between these items in the application. The application code, kernel execution, and data transfer operations are shown in a timeline.

Determine whether the application is bound by kernel execution or data transfer operations, find the top 10 most expensive kernel and data transfer operations, and find the API hot spots (most frequently called or most expensive API calls) in the application with the Summary Pages View.

View and debug the input parameters and output results for all API calls made by the application with the API Trace View.

The Timeline View (Figure 13.1) provides a visual representation of the execution of the application. Along the top of the timeline is the time grid, which shows the total elapsed time of the application, in milliseconds, when fully zoomed out. Timing begins when the first OpenCL call is made by the application and ends when the final OpenCL call is made. Directly below the time grid, each host (OS) thread that made at least one OpenCL call is listed. For each host thread, the OpenCL API calls are plotted along the time grid, showing the start time and duration of each call. Below the host threads, the OpenCL tree shows all contexts and queues created by the application, along with data transfer operations and kernel execution operations for each queue. We can navigate in the Timeline View by zooming, panning, collapsing/expanding, or selecting a region of interest. From the Timeline View, we can also navigate to the corresponding API call in the API Trace View and vice versa.

Figure 13.1. The Timeline and API Trace View of AMD APP Profiler in Microsoft Visual Studio 2010.
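The time-grid arithmetic described above is simple to reproduce by hand: the elapsed time runs from the first call's start to the last call's end. A minimal sketch, assuming timestamps in nanoseconds:

```python
def elapsed_ms(calls):
    """calls: list of (start_ns, end_ns) pairs, one per OpenCL API call.
    Returns the total elapsed time shown by the time grid, in
    milliseconds: from the start of the earliest call to the end of
    the latest call."""
    starts = [start for start, _ in calls]
    ends = [end for _, end in calls]
    return (max(ends) - min(starts)) / 1_000_000

# Two calls: one from 0 to 5 ms, one from 2 to 9 ms -> 9 ms elapsed.
total = elapsed_ms([(0, 5_000_000), (2_000_000, 9_000_000)])
```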

The Timeline View can be useful for debugging your OpenCL application. The following are examples:

You can easily confirm that the high-level structure of your application is correct. By examining the timeline, you can verify that the number of queues and contexts created matches your expectations for the application.

You can gain confidence that synchronization has been performed properly in the application. For example, if kernel A execution depends on a buffer operation and on the output of kernel B execution, then kernel A execution should appear after the completion of the buffer operation and kernel B execution in the time grid. It can be difficult to find this type of synchronization error using traditional debugging techniques.

Finally, you can see whether the application has been utilizing the hardware efficiently. For example, the timeline should show that nondependent kernel executions and data transfer operations occur simultaneously.
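That last check, whether independent operations actually ran concurrently, reduces to interval-overlap testing on the recorded timestamps. A rough sketch:

```python
def overlaps(a, b):
    """True if (start, end) intervals a and b overlap in time."""
    return a[0] < b[1] and b[0] < a[1]

def any_concurrency(kernel_spans, transfer_spans):
    """Return True if at least one kernel execution overlapped a data
    transfer, i.e. compute and copy work proceeded simultaneously.
    Both arguments are lists of (start, end) timestamp pairs."""
    return any(overlaps(k, t) for k in kernel_spans
                              for t in transfer_spans)
```

A timeline in which `any_concurrency` never fires for operations that have no dependency between them suggests the hardware is being left idle unnecessarily.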

Summary Pages View

The Summary Pages View shows various statistics for your OpenCL application. It can provide a general idea of the location of the application's bottlenecks. It also provides useful information such as the number of buffers and images created on each context, the most expensive kernel call, and so on.

The Summary Pages View provides access to the following individual pages:

API Summary page: This page shows statistics for all OpenCL API calls made in the application for API hot spot identification.

Context Summary page: This page shows the statistics for all the kernel dispatch and data transfer operations for each context. It also shows the number of buffers and images created for each context. This is shown in Figure 13.2.

Figure 13.2. The Context Summary Page View of AMD APP Profiler in Microsoft Visual Studio 2010.

Kernel Summary page: This page shows statistics for all the kernels that are created in the application.

Top 10 Data Transfer Summary page: This page shows a sorted list of the ten most expensive individual data transfer operations.

Top 10 Kernel Summary page: This page shows a sorted list of the ten most expensive individual kernel execution operations.

Warning(south)/Error(southward) Page: The Warning(due south)/Error(s) Page shows potential problems in your OpenCL application. It can find unreleased OpenCL resources, OpenCL API failures and provide suggestions to reach better functioning. Clicking on a hyperlink takes you to the corresponding OpenCL API that generates the message.

From these summary pages, it is possible to determine whether the application is bound by kernel execution or data transfer (Context Summary page). If the application is bound by kernel execution, we can determine which device is the bottleneck. From the Kernel Summary page, we can find the name of the kernel with the highest total execution time. Or, from the Top 10 Kernel Summary page, we can find the individual kernel instance with the highest execution time. If the kernel execution on a GPU device is the bottleneck, the GPU performance counters can then be used to investigate the bottleneck inside the kernel. We describe the GPU performance counters view later in this chapter.

If the application is bound by data transfers, it is possible to determine the most expensive data transfer type (read, write, copy, or map) in the application from the Context Summary page. We can investigate whether we can minimize this type of data transfer by modifying the algorithm if necessary. With help from the Timeline View, we can investigate whether data transfers have been executed in the most efficient way, that is, concurrently with a kernel execution.
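The kernel-bound versus transfer-bound decision that the summary pages support can be sketched as straightforward aggregation over the recorded operations. The operation names and durations below are illustrative, not taken from a real session:

```python
from collections import defaultdict

def diagnose(kernel_ops, transfer_ops):
    """kernel_ops: (kernel_name, duration) pairs.
    transfer_ops: (transfer_type, duration) pairs, where transfer_type
    is one of 'read', 'write', 'copy', 'map'.
    Returns which side dominates, plus the detail needed to act on it."""
    kernel_total = sum(d for _, d in kernel_ops)
    transfer_total = sum(d for _, d in transfer_ops)
    if kernel_total >= transfer_total:
        # Kernel-bound: report the ten most expensive kernel operations,
        # mirroring the Top 10 Kernel Summary page.
        top10 = sorted(kernel_ops, key=lambda op: op[1], reverse=True)[:10]
        return "kernel-bound", top10
    # Transfer-bound: report the most expensive transfer type,
    # mirroring the Context Summary page.
    per_type = defaultdict(float)
    for ttype, duration in transfer_ops:
        per_type[ttype] += duration
    return "transfer-bound", max(per_type, key=per_type.get)
```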

API Trace View

The API Trace View (Figure 13.1) lists all the OpenCL API calls made by the application.

Each host thread that makes at least one OpenCL call is listed in a separate tab. Each tab contains a list of all the API calls made by that particular thread. For each call, the list displays the index of the call (representing execution order), the name of the API function, a semicolon-delimited list of parameters passed to the function, and the value returned by the function. When displaying parameters, the profiler will attempt to dereference pointers and decode enumeration values to give as much information as possible about the data being passed in or returned from the function. Double-clicking an item in the API Trace View will display and zoom into that API call in the Host Thread row in the Timeline View.

The view allows us to analyze and debug the input parameters and output results for each API call. For example, we can easily check that all the API calls are returning CL_SUCCESS or that all the buffers are created with the correct flags. We can also identify redundant API calls using this view.
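Both of those checks, failed return codes and redundant calls, are easy to automate once a trace has been parsed into (api, params, result) tuples. A sketch under that assumed representation:

```python
from collections import Counter

def find_problems(records):
    """records: (api_name, params, result) tuples from a parsed trace,
    with params as a hashable string. Returns the calls that did not
    return CL_SUCCESS, and the (api, params) combinations that were
    issued more than once and may be redundant."""
    failures = [r for r in records if r[2] != "CL_SUCCESS"]
    counts = Counter((api, params) for api, params, _ in records)
    redundant = [key for key, n in counts.items() if n > 1]
    return failures, redundant
```

Repeated identical calls are only a hint, not proof, of redundancy; the surrounding timeline context shows whether the repetition was intentional.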

Collecting OpenCL GPU Kernel Performance Counters

The GPU kernel performance counters can be used to find possible bottlenecks in the kernel execution. You can find the list of performance counters supported by AMD Radeon GPUs in the tool documentation.

Once we have used the trace data to discover which kernel is most in need of optimization, we can collect the GPU performance counters to drill down into the kernel execution on a GPU device. Using the performance counters, we can do the following:

Find the number of resources (General Purpose Registers, Local Memory size, and Flow Control Stack size) allocated for the kernel. These resources affect the possible number of in-flight wavefronts in the GPU. A higher number of wavefronts better hides data latency.

Determine the number of ALU, global, and local memory instructions executed by the GPU.

Determine the number of bytes fetched from and written to the global memory.

Determine the utilization of the SIMD engines and memory units in the system.

View the efficiency of the Shader Compiler in packing ALU instructions into the VLIW instructions used by AMD GPUs.

View any local memory (Local Data Share (LDS)) bank conflicts, where multiple lanes within a SIMD unit attempt to read or write the same LDS bank and have to be serialized, causing access latency.

The Session View (Figure 13.3) shows the performance counters for a profile session. The output data is recorded in a comma-separated-variable (csv) format. You can also click on the kernel name entry in the "Method" column to view the OpenCL kernel source, AMD Intermediate Language, GPU ISA, or CPU assembly code for that kernel.
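Because the session output is plain csv, it can also be post-processed outside the GUI. The sketch below sorts kernel rows by one counter column; the column names used here ("Method", "Time") are assumptions for illustration, so check the header row of your own session file:

```python
import csv
import io

def top_kernels(session_csv, counter="Time", n=10):
    """Return the n rows with the highest value in the given counter
    column of a performance-counter session file (csv text)."""
    rows = list(csv.DictReader(io.StringIO(session_csv)))
    return sorted(rows, key=lambda r: float(r[counter]), reverse=True)[:n]

# Hypothetical two-kernel session with an assumed header row.
sample = "Method,Time,Wavefronts\nkernA,12.5,64\nkernB,30.1,128\n"
worst = top_kernels(sample, n=1)[0]["Method"]  # the most expensive kernel
```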

Figure 13.3. The Session View of AMD APP Profiler in Microsoft Visual Studio 2010.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780124058941000139