gprofng: add an examples directory

This directory contains example programs for the user to experiment with.
Initially there is one application written in C.  The plan is to include
more examples, also in other languages, over time.
In addition to the sources and a make file, a sample script that shows how to
create a profile is included.  There is also a README.md file.

gprofng/ChangeLog
2024-01-08  Ruud van der Pas  <ruud.vanderpas@oracle.com>

	* examples: Top level directory.
	* examples/mxv-pthreads: Example program written in C.
Vladimir Mezentsev 2024-01-08 21:52:39 -08:00
parent 8fe04eeb2c
commit c49f224f9e
8 changed files with 1115 additions and 0 deletions

@ -0,0 +1,158 @@
# README for the matrix-vector multiplication demo code
## Synopsis
This program implements the multiplication of a matrix and a vector. It is
written in C and has been parallelized using the Pthreads parallel programming
model. Each thread gets assigned a contiguous set of rows of the matrix to
work on and the results are stored in the output vector.
The code initializes the data, executes the matrix-vector multiplication, and
checks the correctness of the results. If the check fails, the differences are
reported. In both cases the program prints a one line status message on the
screen.
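The row distribution described above can be sketched as follows. This is a minimal illustration and not the code used in the program itself: the helper name `split_rows` and the `row_range` type are hypothetical, and the sketch assumes that the first `m % t` threads each take one extra row so that all chunks stay contiguous.

```c
#include <stdint.h>

/* Hypothetical helper type: inclusive range of rows owned by one thread. */
typedef struct
{
  int64_t start;
  int64_t end;
} row_range;

/*
 * Split m rows over t threads in contiguous chunks. The first (m % t)
 * threads get one extra row. Assumes 0 <= tid < t and t <= m.
 */
row_range split_rows (int64_t m, int64_t t, int64_t tid)
{
  int64_t base  = m / t;
  int64_t extra = m % t;
  row_range r;

  if (tid < extra)
    r.start = tid * (base + 1);
  else
    r.start = extra * (base + 1) + (tid - extra) * base;

  r.end = r.start + ((tid < extra) ? base + 1 : base) - 1;
  return r;
}
```

For example, with 10 rows and 3 threads this gives the row ranges 0-3, 4-6 and 7-9.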
## About this code
This is a standalone code, not a library. It is meant as a simple example to
experiment with gprofng.
## Directory structure
There are four directories:
1. `bindir` - after the build, it contains the executable.
2. `experiments` - after the installation, it contains the executable and
also has an example profiling script called `profile.sh`.
3. `objects` - after the build, it contains the object files.
4. `src` - contains the source code and the make file to build, install,
and check correct functioning of the executable.
## Code internals
This is the main execution flow:
* Parse the user options.
* Compute the internal settings for the algorithm.
* Initialize the data and compute the reference results needed for the correctness
check.
* Create and execute the threads. Each thread performs the matrix-vector
multiplication on a pre-determined set of rows.
* Verify the results are correct.
* Print statistics and release the allocated memory.
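The verification step can be illustrated with a relative error test. The sketch below is a simplified stand-in for the check in the source, assuming a tolerance of `100 * DBL_EPSILON`. The function name `count_mismatches` is hypothetical.

```c
#include <float.h>
#include <math.h>
#include <stdint.h>

/*
 * Count the elements of c that differ from the reference values by more
 * than the tolerance. A relative error is used unless the reference value
 * is too close to zero, in which case the absolute error is used instead.
 */
int64_t count_mismatches (int64_t m, const double *c, const double *ref)
{
  const double tol  = 100.0 * DBL_EPSILON;
  const double tiny = 100.0 * DBL_MIN;
  int64_t mismatches = 0;

  for (int64_t i = 0; i < m; i++)
    {
      double err = fabs (c[i] - ref[i]);
      if (fabs (ref[i]) > tiny)
        err /= fabs (ref[i]);
      if (err > tol)
        mismatches++;
    }
  return mismatches;
}
```

A return value of zero means that all elements match the reference within the tolerance.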
## Installation
The Makefile in the `src` subdirectory can be used to build, install and check the
code.
Use `make` at the command line to (re)build the executable called `mxv-pthreads`. It will be
stored in the directory `bindir`:
```
$ make
gcc -o ../objects/main.o -c -g -O -Wall -Werror=undef -Wstrict-prototypes main.c
gcc -o ../objects/manage_data.o -c -g -O -Wall -Werror=undef -Wstrict-prototypes manage_data.c
gcc -o ../objects/workload.o -c -g -O -Wall -Werror=undef -Wstrict-prototypes workload.c
gcc -o ../objects/mxv.o -c -g -O -Wall -Werror=undef -Wstrict-prototypes mxv.c
gcc -o ../bindir/mxv-pthreads ../objects/main.o ../objects/manage_data.o ../objects/workload.o ../objects/mxv.o -lm -lpthread
ldd ../bindir/mxv-pthreads
linux-vdso.so.1 (0x0000ffff9ea8b000)
libm.so.6 => /lib64/libm.so.6 (0x0000ffff9e9ad000)
libc.so.6 => /lib64/libc.so.6 (0x0000ffff9e7ff000)
/lib/ld-linux-aarch64.so.1 (0x0000ffff9ea4e000)
$
```
The `make install` command installs the executable in directory `experiments`.
```
$ make install
Installed mxv-pthreads in ../experiments
$
```
The `make check` command may be used to verify the program works as expected:
```
$ make check
Running mxv-pthreads in ../experiments
mxv: error check passed - rows = 1000 columns = 1500 threads = 2
$
```
The `make clean` command removes the object files from the `objects` directory
and the executable from the `bindir` directory.
The `make veryclean` command implies `make clean`, but also removes the
executable from directory `experiments`.
## Usage
The code takes several options, but all have a default value. If the code is
executed without any options, these defaults will be used. To get an overview of
all the options supported, and the defaults, use the `-h` option:
```
$ ./mxv-pthreads -h
Usage: ./mxv-pthreads [-m <number of rows>] [-n <number of columns>] [-r <repeat count>] [-t <number of threads>] [-v] [-h]
-m - number of rows, default = 2000
-n - number of columns, default = 3000
-r - the number of times the algorithm is repeatedly executed, default = 200
-t - the number of threads used, default = 1
-v - enable verbose mode, off by default
-h - print this usage overview and exit
$
```
For more extensive run time diagnostic messages use the `-v` option.
As an example, these are the options to compute the product of a 2000x1000 matrix
with a vector of length 1000 and use 4 threads. Verbose mode has been enabled:
```
$ ./mxv-pthreads -m 2000 -n 1000 -t 4 -v
Verbose mode enabled
Allocated data structures
Initialized matrix and vectors
Defined workload distribution
Assigned work to threads
Thread 0 has been created
Thread 1 has been created
Thread 2 has been created
Thread 3 has been created
Matrix vector multiplication has completed
Verify correctness of result
Error check passed
mxv: error check passed - rows = 2000 columns = 1000 threads = 4
$
```
## Executing the examples
Directory `experiments` contains the `profile.sh` script. This script
checks that gprofng can be found and that the executable has been installed.
The script will then run a data collection experiment, followed by a series
of invocations of `gprofng display text` to show various views. The results
are printed on stdout.
To include the commands executed in the output of the script, and store the
results in a file called `LOG`, execute the script as follows:
```
$ bash -x ./profile.sh >& LOG
```
## Additional comments
* The reason that compiler-based inlining is disabled is to make the call tree
look more interesting. For the same reason, the core multiplication function
`mxv_core` has inlining disabled through the `__attribute__ ((noinline))`
attribute. You are free to change this. It does not affect the results
computed by the code.
* This distribution includes a script called `profile.sh`. It is in the
`experiments` directory and meant as an example for (new) users of gprofng.
It can be used to produce profiles at the command line. It is also suitable
as a starting point to develop your own profiling script(s).
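The effect of the `noinline` attribute mentioned above can be tried in isolation. This is a minimal sketch using the GCC/Clang attribute syntax. The names `dot_core` and `repeat_dot` are hypothetical and not part of the example program.

```c
#include <stdint.h>

/*
 * The attribute asks the compiler to keep this function as a separate
 * call, so that it may show up as its own entry in a profile.
 */
double __attribute__ ((noinline)) dot_core (int64_t n, const double *a,
                                            const double *b)
{
  double sum = 0.0;
  for (int64_t j = 0; j < n; j++)
    sum += a[j] * b[j];
  return sum;
}

/* Call the kernel repeatedly, similar to a repeat count loop. */
double repeat_dot (int64_t repeat_count, int64_t n, const double *a,
                   const double *b)
{
  double result = 0.0;
  for (int64_t r = 0; r < repeat_count; r++)
    result = dot_core (n, a, b);
  return result;
}
```

Removing the attribute and rebuilding with optimization enabled is one way to see whether the function still appears separately in the function overview.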

@ -0,0 +1,79 @@
#
# Copyright (C) 2021-2023 Free Software Foundation, Inc.
#
# This file is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; see the file COPYING3. If not see
# <http://www.gnu.org/licenses/>.
#
#------------------------------------------------------------------------------
# This script demonstrates how to use gprofng.
#
# After the experiment data has been generated, several views into the data
# are shown.
#------------------------------------------------------------------------------
#------------------------------------------------------------------------------
# Define the executable, algorithm parameters and gprofng settings.
#------------------------------------------------------------------------------
exe=../experiments/mxv-pthreads
rows=4000
columns=2000
threads=2
exp_directory=experiment.$threads.thr.er
#------------------------------------------------------------------------------
# Check if gprofng has been installed and can be executed.
#------------------------------------------------------------------------------
which gprofng > /dev/null 2>&1
if (test $? -eq 0) then
echo ""
echo "Version information of the gprofng release used:"
echo ""
gprofng --version
echo ""
else
echo "Error: gprofng cannot be found - if it was installed, check your path"
exit 1
fi
#------------------------------------------------------------------------------
# Check if the executable is present.
#------------------------------------------------------------------------------
if (! test -x $exe) then
echo "Error: executable $exe not found - check the make install command"
exit 1
fi
echo "-------------- Collect the experiment data -----------------------------"
gprofng collect app -O $exp_directory $exe -m $rows -n $columns -t $threads
#------------------------------------------------------------------------------
# Make sure that the collect experiment succeeded and created an experiment
# directory with the performance data.
#------------------------------------------------------------------------------
if (! test -d $exp_directory) then
echo "Error: experiment directory $exp_directory not found"
exit 1
fi
echo "-------------- Show the function overview -----------------------------"
gprofng display text -functions $exp_directory
echo "-------------- Show the function overview limited to the top 5 --------"
gprofng display text -limit 5 -functions $exp_directory
echo "-------------- Show the source listing of mxv_core ---------------------"
gprofng display text -metrics e.totalcpu -source mxv_core $exp_directory
echo "-------------- Show the disassembly listing of mxv_core ----------------"
gprofng display text -metrics e.totalcpu -disasm mxv_core $exp_directory

@ -0,0 +1,70 @@
#
# Copyright (C) 2021-2023 Free Software Foundation, Inc.
#
# This file is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; see the file COPYING3. If not see
# <http://www.gnu.org/licenses/>.
CC = gcc
WARNINGS = -Wall -Werror=undef -Wstrict-prototypes
OPT = -g -O
CFLAGS = $(OPT) $(WARNINGS)
LDFLAGS =
LIBS = -lm -lpthread
OBJDIR = ../objects
BINDIR = ../bindir
EXPDIR = ../experiments
EXE = mxv-pthreads
OBJECTS = $(OBJDIR)/main.o $(OBJDIR)/manage_data.o $(OBJDIR)/workload.o $(OBJDIR)/mxv.o
default: $(BINDIR)/$(EXE)

$(BINDIR)/$(EXE): $(OBJECTS)
	@mkdir -p $(BINDIR)
	$(CC) -o $(BINDIR)/$(EXE) $(LDFLAGS) $(OBJECTS) $(LIBS)
	ldd $(BINDIR)/$(EXE)

$(OBJDIR)/main.o: main.c
	@mkdir -p $(OBJDIR)
	$(CC) -o $(OBJDIR)/main.o -c $(CFLAGS) main.c

$(OBJDIR)/manage_data.o: manage_data.c
	@mkdir -p $(OBJDIR)
	$(CC) -o $(OBJDIR)/manage_data.o -c $(CFLAGS) manage_data.c

$(OBJDIR)/workload.o: workload.c
	@mkdir -p $(OBJDIR)
	$(CC) -o $(OBJDIR)/workload.o -c $(CFLAGS) workload.c

$(OBJDIR)/mxv.o: mxv.c
	@mkdir -p $(OBJDIR)
	$(CC) -o $(OBJDIR)/mxv.o -c $(CFLAGS) mxv.c

$(OBJECTS): mydefs.h

.c.o:
	$(CC) -c -o $@ $(CFLAGS) $<

check:
	@echo "Running $(EXE) in $(EXPDIR)"
	@./$(EXPDIR)/$(EXE) -m 1000 -n 1500 -t 2

install: $(BINDIR)/$(EXE)
	@/bin/cp $(BINDIR)/$(EXE) $(EXPDIR)
	@echo "Installed $(EXE) in $(EXPDIR)"

clean:
	@/bin/rm -f $(BINDIR)/$(EXE)
	@/bin/rm -f $(OBJECTS)

veryclean:
	@$(MAKE) clean
	@/bin/rm -f $(EXPDIR)/$(EXE)

@ -0,0 +1,374 @@
/* Copyright (C) 2021-2023 Free Software Foundation, Inc.
Contributed by Oracle.
This file is part of GNU Binutils.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, 51 Franklin Street - Fifth Floor, Boston,
MA 02110-1301, USA. */
/*
* -----------------------------------------------------------------------------
* This program implements the multiplication of an m by n matrix with a vector
* of length n. The Posix Threads parallel programming model is used to
* parallelize the core matrix-vector multiplication algorithm.
* -----------------------------------------------------------------------------
*/
#include "mydefs.h"
int main (int argc, char **argv)
{
bool verbose = false;
thread_data *thread_data_arguments;
pthread_t *pthread_ids;
int64_t remainder_rows;
int64_t rows_per_thread;
int64_t active_threads;
int64_t number_of_rows;
int64_t number_of_columns;
int64_t number_of_threads;
int64_t repeat_count;
double **A;
double *b;
double *c;
double *ref;
int64_t errors;
/*
* -----------------------------------------------------------------------------
* Start the ball rolling - Get the user options and parse them.
* -----------------------------------------------------------------------------
*/
(void) get_user_options (
argc,
argv,
&number_of_rows,
&number_of_columns,
&repeat_count,
&number_of_threads,
&verbose);
if (verbose) printf ("Verbose mode enabled\n");
/*
* -----------------------------------------------------------------------------
* Allocate storage for all data structures.
* -----------------------------------------------------------------------------
*/
(void) allocate_data (
number_of_threads, number_of_rows,
number_of_columns, &A, &b, &c, &ref,
&thread_data_arguments, &pthread_ids);
if (verbose) printf ("Allocated data structures\n");
/*
* -----------------------------------------------------------------------------
* Initialize the data.
* -----------------------------------------------------------------------------
*/
(void) init_data (number_of_rows, number_of_columns, A, b, c, ref);
if (verbose) printf ("Initialized matrix and vectors\n");
/*
* -----------------------------------------------------------------------------
* Determine the main workload settings.
* -----------------------------------------------------------------------------
*/
(void) get_workload_stats (
number_of_threads, number_of_rows,
number_of_columns, &rows_per_thread,
&remainder_rows, &active_threads);
if (verbose) printf ("Defined workload distribution\n");
for (int64_t TID=active_threads; TID<number_of_threads; TID++)
{
thread_data_arguments[TID].do_work = false;
}
for (int64_t TID=0; TID<active_threads; TID++)
{
thread_data_arguments[TID].thread_id = TID;
thread_data_arguments[TID].verbose = verbose;
thread_data_arguments[TID].do_work = true;
thread_data_arguments[TID].repeat_count = repeat_count;
(void) determine_work_per_thread (
TID, rows_per_thread, remainder_rows,
&thread_data_arguments[TID].row_index_start,
&thread_data_arguments[TID].row_index_end);
thread_data_arguments[TID].m = number_of_rows;
thread_data_arguments[TID].n = number_of_columns;
thread_data_arguments[TID].b = b;
thread_data_arguments[TID].c = c;
thread_data_arguments[TID].A = A;
}
if (verbose) printf ("Assigned work to threads\n");
/*
* -----------------------------------------------------------------------------
* Create and execute the threads. Note that this means that there will be
* <t+1> threads, with <t> the number of threads specified on the commandline,
* or the default if the -t option was not used.
*
* Per the pthread_create () call, the threads start executing right away.
* -----------------------------------------------------------------------------
*/
for (int TID=0; TID<active_threads; TID++)
{
int rc = pthread_create (&pthread_ids[TID], NULL, driver_mxv,
(void *) &thread_data_arguments[TID]);
if (rc != 0)
{
/* pthread_create returns an error number and does not set errno. */
printf ("Error creating thread %d: %s\n", TID, strerror (rc));
exit (-1);
}
else
{
if (verbose) printf ("Thread %d has been created\n", TID);
}
}
/*
* -----------------------------------------------------------------------------
* Wait for all threads to finish.
* -----------------------------------------------------------------------------
*/
for (int TID=0; TID<active_threads; TID++)
{
pthread_join (pthread_ids[TID], NULL);
}
if (verbose)
{
printf ("Matrix vector multiplication has completed\n");
printf ("Verify correctness of result\n");
}
/*
* -----------------------------------------------------------------------------
* Check the numerical results.
* -----------------------------------------------------------------------------
*/
if ((errors = check_results (number_of_rows, number_of_columns,
c, ref)) == 0)
{
if (verbose) printf ("Error check passed\n");
}
else
{
printf ("Error: %ld differences in the results detected\n", errors);
}
/*
* -----------------------------------------------------------------------------
* Print a summary of the execution.
* -----------------------------------------------------------------------------
*/
print_all_results (number_of_rows, number_of_columns, number_of_threads,
errors);
/*
* -----------------------------------------------------------------------------
* Release the allocated memory and end execution.
* -----------------------------------------------------------------------------
*/
for (int64_t i=0; i<number_of_rows; i++)
free (A[i]);
free (A);
free (b);
free (c);
free (ref);
free (thread_data_arguments);
free (pthread_ids);
return (0);
}
/*
* -----------------------------------------------------------------------------
* Parse user options and set variables accordingly. In case of an error, print
* a message, but do not bail out yet. In this way we can catch multiple input
* errors.
* -----------------------------------------------------------------------------
*/
int get_user_options (int argc, char *argv[],
int64_t *number_of_rows,
int64_t *number_of_columns,
int64_t *repeat_count,
int64_t *number_of_threads,
bool *verbose)
{
int opt;
int errors = 0;
int64_t default_number_of_threads = 1;
int64_t default_rows = 2000;
int64_t default_columns = 3000;
int64_t default_repeat_count = 200;
bool default_verbose = false;
*number_of_rows = default_rows;
*number_of_columns = default_columns;
*number_of_threads = default_number_of_threads;
*repeat_count = default_repeat_count;
*verbose = default_verbose;
while ((opt = getopt (argc, argv, "m:n:r:t:vh")) != -1)
{
switch (opt)
{
case 'm':
*number_of_rows = atol (optarg);
break;
case 'n':
*number_of_columns = atol (optarg);
break;
case 'r':
*repeat_count = atol (optarg);
break;
case 't':
*number_of_threads = atol (optarg);
break;
case 'v':
*verbose = true;
break;
case 'h':
default:
printf ("Usage: %s " \
"[-m <number of rows>] " \
"[-n <number of columns>] [-r <repeat count>] " \
"[-t <number of threads>] [-v] [-h]\n", argv[0]);
printf ("\t-m - number of rows, default = %ld\n",
default_rows);
printf ("\t-n - number of columns, default = %ld\n",
default_columns);
printf ("\t-r - the number of times the algorithm is " \
"repeatedly executed, default = %ld\n",
default_repeat_count);
printf ("\t-t - the number of threads used, default = %ld\n",
default_number_of_threads);
printf ("\t-v - enable verbose mode, %s by default\n",
(default_verbose) ? "on" : "off");
printf ("\t-h - print this usage overview and exit\n");
exit (0);
break;
}
}
/*
* -----------------------------------------------------------------------------
* Check for errors and bail out in case of problems.
* -----------------------------------------------------------------------------
*/
if (*number_of_rows <= 0)
{
errors++;
printf ("Error: The number of rows is %ld but should be strictly " \
"positive\n", *number_of_rows);
}
if (*number_of_columns <= 0)
{
errors++;
printf ("Error: The number of columns is %ld but should be strictly " \
"positive\n", *number_of_columns);
}
if (*repeat_count <= 0)
{
errors++;
printf ("Error: The repeat count is %ld but should be strictly " \
"positive\n", *repeat_count);
}
if (*number_of_threads <= 0)
{
errors++;
printf ("Error: The number of threads is %ld but should be strictly " \
"positive\n", *number_of_threads);
}
if (errors != 0)
{
printf ("There are %d input error(s)\n", errors); exit (-1);
}
return (errors);
}
/*
* -----------------------------------------------------------------------------
* Print a summary of the execution status.
* -----------------------------------------------------------------------------
*/
void print_all_results (int64_t number_of_rows,
int64_t number_of_columns,
int64_t number_of_threads,
int64_t errors)
{
printf ("mxv: error check %s - rows = %ld columns = %ld threads = %ld\n",
(errors == 0) ? "passed" : "failed",
number_of_rows, number_of_columns, number_of_threads);
}
/*
* -----------------------------------------------------------------------------
* Check whether the computations produced the correct results.
* -----------------------------------------------------------------------------
*/
int64_t check_results (int64_t m, int64_t n, double *c, double *ref)
{
char *marker;
int64_t errors = 0;
double relerr;
double TOL = 100.0 * DBL_EPSILON;
double SMALL = 100.0 * DBL_MIN;
if ((marker=(char *)malloc (m*sizeof (char))) == NULL)
{
perror ("array marker");
exit (-1);
}
for (int64_t i=0; i<m; i++)
{
if (fabs (ref[i]) > SMALL)
{
relerr = fabs ((c[i]-ref[i])/ref[i]);
}
else
{
relerr = fabs ((c[i]-ref[i]));
}
if (relerr <= TOL)
{
marker[i] = ' ';
}
else
{
errors++;
marker[i] = '*';
}
}
if (errors > 0)
{
printf ("Found %ld differences in results for m = %ld n = %ld:\n",
errors,m,n);
for (int64_t i=0; i<m; i++)
printf (" %c c[%ld] = %f ref[%ld] = %f\n",marker[i],i,c[i],i,ref[i]);
}
free (marker);
return (errors);
}

@ -0,0 +1,148 @@
/* Copyright (C) 2021-2023 Free Software Foundation, Inc.
Contributed by Oracle.
This file is part of GNU Binutils.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, 51 Franklin Street - Fifth Floor, Boston,
MA 02110-1301, USA. */
#include "mydefs.h"
bool verbose;
/*
* -----------------------------------------------------------------------------
* This function allocates the data and sets up the data structures to be used
* in the remainder.
* -----------------------------------------------------------------------------
*/
void allocate_data (int active_threads,
int64_t number_of_rows,
int64_t number_of_columns,
double ***A,
double **b,
double **c,
double **ref,
thread_data **thread_data_arguments,
pthread_t **pthread_ids)
{
if ((*b = (double *) malloc (number_of_columns * sizeof (double))) == NULL)
{
printf ("Error: allocation of vector b failed\n");
perror ("vector b");
exit (-1);
}
else
{
if (verbose) printf ("Vector b allocated\n");
}
if ((*c = (double *) malloc (number_of_rows * sizeof (double))) == NULL)
{
printf ("Error: allocation of vector c failed\n");
perror ("vector c");
exit (-1);
}
else
{
if (verbose) printf ("Vector c allocated\n");
}
if ((*ref = (double *) malloc (number_of_rows * sizeof (double))) == NULL)
{
printf ("Error: allocation of vector ref failed\n");
perror ("vector ref");
exit (-1);
}
if ((*A = (double **) malloc (number_of_rows * sizeof (double *))) == NULL)
{
printf ("Error: allocation of matrix A failed\n");
perror ("matrix A");
exit (-1);
}
else
{
for (int64_t i=0; i<number_of_rows; i++)
{
if (((*A)[i] = (double *) malloc (number_of_columns
* sizeof (double))) == NULL)
{
printf ("Error: allocation of matrix A columns failed\n");
perror ("matrix A[i]");
exit (-1);
}
}
if (verbose) printf ("Matrix A allocated\n");
}
if ((*thread_data_arguments = (thread_data *) malloc ((active_threads)
* sizeof (thread_data))) == NULL)
{
perror ("malloc thread_data_arguments");
exit (-1);
}
else
{
if (verbose) printf ("Structure thread_data_arguments allocated\n");
}
if ((*pthread_ids = (pthread_t *) malloc ((active_threads)
* sizeof (pthread_t))) == NULL)
{
perror ("malloc pthread_ids");
exit (-1);
}
else
{
if (verbose) printf ("Structure pthread_ids allocated\n");
}
}
/*
* -----------------------------------------------------------------------------
* This function initializes the data.
* -----------------------------------------------------------------------------
*/
void init_data (int64_t m,
int64_t n,
double **restrict A,
double *restrict b,
double *restrict c,
double *restrict ref)
{
(void) srand48 (2020L);
for (int64_t j=0; j<n; j++)
b[j] = 1.0;
for (int64_t i=0; i<m; i++)
{
ref[i] = n*i;
c[i] = -2022;
for (int64_t j=0; j<n; j++)
A[i][j] = drand48 ();
}
for (int64_t i=0; i<m; i++)
{
double row_sum = 0.0;
for (int64_t j=0; j<n; j++)
row_sum += A[i][j];
ref[i] = row_sum;
}
}

@ -0,0 +1,78 @@
/* Copyright (C) 2021-2023 Free Software Foundation, Inc.
Contributed by Oracle.
This file is part of GNU Binutils.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, 51 Franklin Street - Fifth Floor, Boston,
MA 02110-1301, USA. */
#include "mydefs.h"
/*
* -----------------------------------------------------------------------------
* Driver for the core computational part.
* -----------------------------------------------------------------------------
*/
void *driver_mxv (void *thread_arguments)
{
thread_data *local_data;
local_data = (thread_data *) thread_arguments;
bool do_work = local_data->do_work;
int64_t repeat_count = local_data->repeat_count;
int64_t row_index_start = local_data->row_index_start;
int64_t row_index_end = local_data->row_index_end;
int64_t m = local_data->m;
int64_t n = local_data->n;
double *b = local_data->b;
double *c = local_data->c;
double **A = local_data->A;
if (do_work)
{
for (int64_t r=0; r<repeat_count; r++)
{
(void) mxv_core (row_index_start, row_index_end, m, n, A, b, c);
}
}
return (NULL);
}
/*
* -----------------------------------------------------------------------------
* Computational heart of the algorithm.
*
* Disable inlining so that the compiler does not remove the repeat count loop.
* This is only done to make for a more interesting call tree.
* -----------------------------------------------------------------------------
*/
void __attribute__ ((noinline)) mxv_core (int64_t row_index_start,
int64_t row_index_end,
int64_t m,
int64_t n,
double **restrict A,
double *restrict b,
double *restrict c)
{
for (int64_t i=row_index_start; i<=row_index_end; i++)
{
double row_sum = 0.0;
for (int64_t j=0; j<n; j++)
row_sum += A[i][j] * b[j];
c[i] = row_sum;
}
}

@ -0,0 +1,117 @@
/* Copyright (C) 2021-2023 Free Software Foundation, Inc.
Contributed by Oracle.
This file is part of GNU Binutils.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, 51 Franklin Street - Fifth Floor, Boston,
MA 02110-1301, USA. */
#ifndef ALREADY_INCLUDED
#define ALREADY_INCLUDED
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>
#include <float.h>
#include <math.h>
#include <malloc.h>
#include <pthread.h>
struct thread_arguments_data {
int thread_id;
bool verbose;
bool do_work;
int64_t repeat_count;
int64_t row_index_start;
int64_t row_index_end;
int64_t m;
int64_t n;
double *b;
double *c;
double **A;
};
typedef struct thread_arguments_data thread_data;
void *driver_mxv (void *thread_arguments);
void __attribute__ ((noinline)) mxv_core (int64_t row_index_start,
int64_t row_index_end,
int64_t m,
int64_t n,
double **restrict A,
double *restrict b,
double *restrict c);
int get_user_options (int argc,
char *argv[],
int64_t *number_of_rows,
int64_t *number_of_columns,
int64_t *repeat_count,
int64_t *number_of_threads,
bool *verbose);
void init_data (int64_t m,
int64_t n,
double **restrict A,
double *restrict b,
double *restrict c,
double *restrict ref);
void allocate_data (int active_threads,
int64_t number_of_rows,
int64_t number_of_columns,
double ***A,
double **b,
double **c,
double **ref,
thread_data **thread_data_arguments,
pthread_t **pthread_ids);
int64_t check_results (int64_t m,
int64_t n,
double *c,
double *ref);
void get_workload_stats (int64_t number_of_threads,
int64_t number_of_rows,
int64_t number_of_columns,
int64_t *rows_per_thread,
int64_t *remainder_rows,
int64_t *active_threads);
void determine_work_per_thread (int64_t TID,
int64_t rows_per_thread,
int64_t remainder_rows,
int64_t *row_index_start,
int64_t *row_index_end);
void mxv (int64_t m,
int64_t n,
double **restrict A,
double *restrict b,
double *restrict c);
void print_all_results (int64_t number_of_rows,
int64_t number_of_columns,
int64_t number_of_threads,
int64_t errors);
extern bool verbose;
#endif

@ -0,0 +1,91 @@
/* Copyright (C) 2021-2023 Free Software Foundation, Inc.
Contributed by Oracle.
This file is part of GNU Binutils.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 3, or (at your option)
any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, 51 Franklin Street - Fifth Floor, Boston,
MA 02110-1301, USA. */
#include "mydefs.h"
/*
* -----------------------------------------------------------------------------
* This function determines the number of rows each thread will be working on
* and also how many threads will be active.
* -----------------------------------------------------------------------------
*/
void get_workload_stats (int64_t number_of_threads,
int64_t number_of_rows,
int64_t number_of_columns,
int64_t *rows_per_thread,
int64_t *remainder_rows,
int64_t *active_threads)
{
if (number_of_threads <= number_of_rows)
{
*remainder_rows = number_of_rows%number_of_threads;
*rows_per_thread = (number_of_rows - (*remainder_rows))/number_of_threads;
}
else
{
*remainder_rows = 0;
*rows_per_thread = 1;
}
*active_threads = number_of_threads < number_of_rows
? number_of_threads : number_of_rows;
if (verbose)
{
printf ("Rows per thread = %ld remainder = %ld\n",
*rows_per_thread, *remainder_rows);
printf ("Number of active threads = %ld\n", *active_threads);
}
}
/*
* -----------------------------------------------------------------------------
* This function determines which rows each thread will be working on.
* -----------------------------------------------------------------------------
*/
void determine_work_per_thread (int64_t TID, int64_t rows_per_thread,
int64_t remainder_rows,
int64_t *row_index_start,
int64_t *row_index_end)
{
int64_t chunk_per_thread;
if (TID < remainder_rows)
{
chunk_per_thread = rows_per_thread + 1;
*row_index_start = TID * chunk_per_thread;
*row_index_end = (TID + 1) * chunk_per_thread - 1;
}
else
{
chunk_per_thread = rows_per_thread;
*row_index_start = remainder_rows * (rows_per_thread + 1)
+ (TID - remainder_rows) * chunk_per_thread;
*row_index_end = remainder_rows * (rows_per_thread + 1)
+ (TID - remainder_rows) * chunk_per_thread
+ chunk_per_thread - 1;
}
if (verbose)
{
printf ("TID = %ld row_index_start = %ld row_index_end = %ld\n",
TID, *row_index_start, *row_index_end);
}
}